Scala Up and Running

Goal

Set up a Scala development environment and create a working project with SBT, including configuration management, testing, and packaging.

Installation

Java

## Install AdoptOpenJDK 11 (now distributed as Eclipse Temurin by Adoptium)
## Some other JDK distributions may fail at runtime with HTTPS issues
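
Verify the JDK is on the PATH:

java -version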

Scala

## Install SBT from the official site (https://www.scala-sbt.org/)
## Verify the install; this prints the Scala version configured for the current build
sbt scalaVersion

Project Setup

Structure

root/
├── build.sbt
├── project/
│   └── plugins.sbt
└── src/
    ├── main/
    │   ├── resources/
│   │   └── app.local.conf
    │   └── scala/bkr/data/spark/
    │       ├── App.scala
    │       └── AppConfig.scala
    └── test/
        ├── resources/
│   └── app.local.conf
        └── scala/bkr/data/spark/
            └── AppConfigTests.scala

build.sbt

ThisBuild / version      := "0.1.0"
ThisBuild / scalaVersion := "2.13.8"
ThisBuild / organization := "gs"
ThisBuild / scalacOptions ++= Seq("-unchecked", "-deprecation")

lazy val sparkProcessing = (project in file("."))
  .settings(
    name := "SparkProcessing",

    libraryDependencies ++= List(
      "org.apache.spark"  %% "spark-core"           % "3.2.0",
      "org.apache.spark"  %% "spark-sql"            % "3.2.0",
      "org.apache.spark"  %% "spark-sql-kafka-0-10" % "3.2.0",
      "org.apache.spark"  %% "spark-avro"           % "3.2.0",
      "org.apache.hadoop" %  "hadoop-common"        % "3.3.1",
      "org.apache.hadoop" %  "hadoop-azure"         % "3.3.1"
    ),

    // The schema-registry client is pulled directly from Confluent's repository
    libraryDependencies += "io.confluent" % "kafka-schema-registry-client" % "7.0.0" from "https://packages.confluent.io/maven/io/confluent/kafka-schema-registry-client/7.0.0/kafka-schema-registry-client-7.0.0.jar",

    // ScalaTest 3.1+ is required for the org.scalatest.funsuite.AnyFunSuite used in the tests
    libraryDependencies += "org.scalatest" %% "scalatest" % "3.2.10" % Test,
    libraryDependencies += "com.typesafe" % "config" % "1.4.1",

    assembly / assemblyJarName := s"${name.value}.jar",
    assembly / assemblyMergeStrategy := {
      case PathList("META-INF", xs @ _*) => MergeStrategy.discard
      case _                             => MergeStrategy.first
    }
  )
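
If pinning the artifact URL with from ever proves brittle, an alternative sketch (the resolver line is an assumption, not part of the original build) is to add Confluent's Maven repository and let sbt resolve the client normally:

resolvers += "Confluent" at "https://packages.confluent.io/maven/"

libraryDependencies += "io.confluent" % "kafka-schema-registry-client" % "7.0.0"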

project/plugins.sbt

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.15.0")
addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.9.2")

AppConfig.scala

package bkr.data.spark

import com.typesafe.config.{Config, ConfigFactory}

object AppConfig {

  // Select the environment from the "env" variable, defaulting to "local"
  private val environment: String = sys.env.getOrElse("env", "local")

  // Loads app.<environment>.conf from the classpath (app.local.conf by default)
  private val appConfig: Config = ConfigFactory.load(s"app.$environment.conf")

  def apply(): Config = appConfig
}
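
A minimal usage sketch (the spark.master key and its app.local.conf entry are assumptions for illustration):

// src/main/resources/app.local.conf contains: spark.master = "local[*]"
val sparkMaster: String = AppConfig().getString("spark.master")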

App.scala

package bkr.data.spark

object Application extends App {
  if (args.isEmpty) throw new IllegalArgumentException("No arguments specified")

  // Fetch the URL passed as the first argument and print the response body
  val url = args(0)

  val source = scala.io.Source.fromURL(url)
  try println(source.mkString)
  finally source.close()
}
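
For example, to fetch and print a page (the URL is just an illustration):

sbt "run https://www.scala-lang.org"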

AppConfigTests.scala

package bkr.data.spark

import org.scalatest.funsuite._

class AppConfigTests extends AnyFunSuite {
  // Placeholder test verifying the ScalaTest wiring; see the sketch below for a config test
  test("Hello should start with H") {
    assert("Hello".startsWith("H"))
  }
}
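
A test that exercises AppConfig itself could be added inside AppConfigTests; this sketch assumes the test resources define a spark.master key in app.local.conf:

  test("AppConfig loads the local config") {
    assert(AppConfig().hasPath("spark.master"))
  }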

.gitignore

target/

SBT Commands

Basic Operations

sbt compile          ## Compile main sources
sbt reload           ## Reload the build definition after editing build.sbt
sbt test             ## Compile and run the tests
sbt Test/compile     ## Compile test sources only (test:compile is the deprecated sbt 0.13 syntax)
sbt run              ## Run the main class
sbt projects         ## List all subprojects in the build
sbt dependencyTree   ## Print the dependency tree (needs the dependency-graph plugin, or sbt 1.4+)

Run with Arguments

sbt "run arg0Value arg1Value"

Subproject Commands

sbt [SUBPROJECT_NAME]/compile
## Example:
sbt helloCore/compile

Interactive Shell

sbt
## Opens the SBT shell, then run commands like:
console  # Opens the Scala REPL
:q       # Exit the Scala REPL
exit     # Exit the SBT shell

Packaging

ZIP Distribution

sbt dist
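
Note that dist is not a built-in sbt task; it comes from the sbt-native-packager plugin. A sketch of the required additions (the plugin version is an assumption):

## project/plugins.sbt
addSbtPlugin("com.github.sbt" % "sbt-native-packager" % "1.9.9")

## build.sbt - enable the packaging archetype on the project definition
.enablePlugins(JavaAppPackaging)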

Unzip the archive (created under target/universal/) and run the launcher script, which is named after the project (hello is used as the example here):

cd publish
./bin/hello

With a custom config file:

./bin/hello -Dconfig.file=/full/path/to/conf/application.prod.conf

Fat JAR

sbt assembly

The JAR is created in target/scala-2.13/[ProjectName].jar:

java -jar target/scala-2.13/SparkProcessing.jar arg0Value
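
If the JAR is destined for spark-submit on a cluster, a common refinement (not part of the original build) is to mark the Spark artifacts as provided so they are excluded from the assembly; note that sbt run then no longer sees them on the classpath:

libraryDependencies += "org.apache.spark" %% "spark-core" % "3.2.0" % "provided"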

Dockerize

FROM hseeberger/scala-sbt:8u302_1.5.5_2.13.6
WORKDIR /app
## Copy the build definition first so resolved dependencies are cached as a layer
COPY build.sbt ./
COPY project/ ./project/
RUN sbt update
## Then copy the sources and compile
COPY . ./
RUN sbt compile
ENTRYPOINT ["sbt", "run"]
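
Build and run the image (the spark-processing tag is an assumption; the URL argument is passed by overriding the entrypoint and quoting the whole run command, as in the sbt examples above):

docker build -t spark-processing .
docker run --rm --entrypoint sbt spark-processing "run https://www.scala-lang.org"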