
State of JVM languages

Java is still the top dog on the JVM, but there are plenty of alternatives for programmers looking for a change. How valid are these options, though? It's hard to make a case for learning a fringe language beyond personal enjoyment, so do these alternative languages offer valid career paths? For that you need a critical mass of developers using the language, enough to motivate tool makers to build proper tooling for it. Where there are users, libraries will follow.

I selected three of the most talked-about JVM languages apart from Java: Scala, Clojure and Kotlin.

Scala is the oldest of the three, released back in 2004. Clojure followed in 2007, and Kotlin is the most recent, unveiled in 2011 and reaching version 1.0 in early 2016.
 

New Interest

Past 12 Months:

[Google Trends chart, past 12 months. Blue: Scala tutorial, yellow: Kotlin tutorial, red: Clojure tutorial]

Link to most recent graph.

Past 5 years:

[Google Trends chart, past 5 years, same legend]

Scala still seems the most interesting to newcomers. Kotlin's popularity clearly spiked in mid-2017, but the hype has slowed down a bit since.

Job Market

How useful are these languages in the job market?

LinkedIn Job Search:

software engineer scala => Showing 5,540 results
software engineer clojure => Showing 684 results
software engineer kotlin => Showing 586 results

engineer scala => Showing 7,701 results
engineer clojure => Showing 778 results
engineer kotlin => Showing 433 results

data scala => Showing 10,076 results
data clojure => Showing 758 results
data kotlin => Showing 254 results

Based on a LinkedIn worldwide job search, Scala is mentioned in roughly ten times more job ads than Clojure. Kotlin seems to be catching on quite quickly; official support on Android apparently drives adoption. I would be surprised if it did not overtake Clojure in popularity within the next 6 months.

Scala also benefits from the growing data science/engineering market, as it is one of the most important languages in that domain alongside Python. Quite a few data processing tools (Spark, Kafka) are written in Scala, making it the most natural fit for working with them.

Salaries

How well do these languages pay?
It's hard to find reliable data on how jobs in a given language pay. Googling around, I found this article:
https://gooroo.io/GoorooTHINK/Article/16300/Programming-languages–salaries-and-demand-May-2015/18672#.WfEECxOCzXE
It looks like Clojure pays pretty well, and clearly better than Scala. Both, however, pay clearly better than many Java or JavaScript jobs. Jobs that require functional language knowledge are still fairly few, but if you manage to land one, you will be pretty well compensated.


Getting Started With Spark 2.x Streaming and Kafka

I've been digging into Spark more and more lately, and I had some trouble finding up-to-date tutorials on getting started with Kafka and Spark Streaming (especially for Spark 2.x and Kafka 0.10), particularly if you want to run your own code easily. While running streaming jobs with spark-shell is not really recommended, I find it very convenient for getting started, as you don't even need to compile the code.

Up and running with Kafka

First things first, you need a Kafka producer running. You can find the official quickstart guide here: https://kafka.apache.org/quickstart, but for the sake of simplicity I will repeat the relevant parts here.

1. Get the Kafka distribution:

2. Run ZooKeeper

  • $ [kafka_home]/bin/zookeeper-server-start.sh [kafka_home]/config/zookeeper.properties

3. Run the Kafka server

  • $ [kafka_home]/bin/kafka-server-start.sh [kafka_home]/config/server.properties

4. Create a topic

  • $ [kafka_home]/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

5. Run the Kafka console producer

  • $ [kafka_home]/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test

You can skip creating the console consumer; the Spark streaming job will act as our consumer.

Setup Spark 2.x

First we need to download and set up Spark.

1. Get Spark

2. Verify Spark works

  • You can verify that spark-shell works by launching [spark_home]/bin/spark-shell (see the quick check below)
  • Use CTRL + C to quit
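
If you want to double-check which version you are running, you can evaluate a couple of expressions inside spark-shell before quitting (a minimal check, assuming the default spark-shell session where sc and spark are already defined):

// Inside spark-shell: both print the version of the running Spark installation
sc.version
spark.version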

Create Spark Streaming Application

Let's create a new folder for our streaming applications. Let's call it "kafka-spark-stream-app".
Now the folder structure should look something like this:
/[kafka_home]
/[spark_home]
/kafka-spark-stream-app

Let's create a file for the word count streaming example. Use any text editor to create a file called /kafka-spark-stream-app/kafkaSparkStream.scala

import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.kafka010._
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

// Create the context with a 1 second batch size
val ssc = new StreamingContext(sc, Seconds(1))

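// Kafka consumer configuration: broker address, key/value deserializers and a consumer group id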
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "use_a_separate_group_id_for_each_stream"
)

val topics = Array("test") // must match the topic our console producer writes to

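// Create a direct stream that subscribes to the topic(s) above using the parameters defined earlier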
val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  PreferConsistent,
  Subscribe[String, String](topics, kafkaParams)
)

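// Classic word count over each one-second micro-batch, printed to the console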
val lines = stream.map(_.value)
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(x => (x, 1L)).reduceByKey(_ + _)
wordCounts.print()
ssc.start()
ssc.awaitTermination()

To run this, we still need to add a few jars to the classpath, so let's create a subfolder /kafka-spark-stream-app/jars and put two jars there: kafka-clients and spark-streaming-kafka-0-10. Both can be found, for example, on mvnrepository.com. Now the folder structure should look like this:
/[kafka_home]
/[spark_home]
/kafka-spark-stream-app
/kafka-spark-stream-app/kafkaSparkStream.scala
/kafka-spark-stream-app/jars/kafka-clients-0.10.2.1.jar
/kafka-spark-stream-app/jars/spark-streaming-kafka-0-10_2.11-2.1.1.jar

*In case you use a different version of Spark, make sure you have the corresponding version of the spark-streaming-kafka library as well.
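
If you would rather resolve these jars with a build tool than download them by hand, they correspond to the following coordinates (shown in sbt syntax purely as an illustration; adjust the versions to match your Spark and Kafka installation):

// Maven coordinates of the two jars above, expressed as sbt dependencies
libraryDependencies ++= Seq(
  "org.apache.kafka" % "kafka-clients" % "0.10.2.1",
  "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.1.1"
)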

Running the spark streaming script

Now all that is left is to run the script, assuming you have Kafka running as set up at the beginning of this article.

$ [spark_home]/bin/spark-shell --jars ../kafka-spark-stream-app/jars/spark-streaming-kafka-0-10_2.11-2.1.1.jar,../kafka-spark-stream-app/jars/kafka-clients-0.10.2.1.jar -i ../kafka-spark-stream-app/kafkaSparkStream.scala

Basically, we tell spark-shell that the script requires those two jars to run and where to find them, and with the -i switch we tell spark-shell to run the script from the given file. Now post some text from the Kafka console producer and watch the Spark streaming application print out the word counts of the given phrase.

Followup

Of course, it does not make sense to run any real Spark streaming application like this, but it's very convenient to be able to run scripts without having to set up a proper Scala project. I will follow up soon on how to wrap the application in a proper sbt project.

Power to PowerShell

The somewhat "new" and fancy PowerShell does have some rather nice features. This is not to say it's in any way better than a real Unix shell, but you can get some pretty neat stuff done with it in a rather simple manner. One of the old problems I've had (other than being stuck on Windows) is that whenever a new version of Java comes out, I need to juggle different versions depending on which project I am working on.

There are two environment variables you need to change to switch the current Java version on the command line:

  1. JAVA_HOME
  2. Path

The normal way to add the Java commands to the path is via the JAVA_HOME environment variable (i.e. JAVA_HOME\bin in Path). The problem is that Path is resolved when you start PowerShell, which replaces all the environment variables with their values, so changing JAVA_HOME alone is not enough. You also need to update Path, and updating both manually is quite tedious.

However, PowerShell provides a way to define functions in your profile, and they are a perfect way to manage Java versions:

function java8 {
  $env:JAVA_HOME="C:\Program Files\Java\jdk1.8.0"
  $env:Path=$env:JAVA_HOME + "\bin;" + $env:Path
}

function java7 {
  $env:JAVA_HOME="C:\Program Files\Java\jdk1.7.0_25"
  $env:Path=$env:JAVA_HOME + "\bin;" + $env:Path
}

This is certainly not perfect: if you change the Java version many times, the Path will get quite long, but I don't consider that much of a problem. I usually fire up a new instance anyway, and then Path reverts back to the original.