Naive Kafka and Clojure performance

Hardly rigorous, but I've written it down anyway...

17 April 2018

As per the code in my last post, I’ve been messing around with Kafka.

I’m lazily reading in an input file that’s over a GB in size, with about 25.5 million rows, then performing some transformations on each row before handing it over to Kafka.
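For anyone wondering what that lazy read, transform, send shape looks like, here’s a minimal sketch. It isn’t the actual code from this post: `parse-row` and `process-file!` are made-up names, the comma-split transform is a placeholder, and `send!` stands in for whatever actually produces to Kafka.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

;; Hypothetical per-row transform built with comp. comp applies right to left:
;; trim the line, split it on commas, then turn the first two fields into a map.
(def parse-row
  (comp (fn [[id value]] {:id id :value value})
        #(str/split % #",")
        str/trim))

(defn process-file!
  "Lazily reads `path` line by line, transforms each row with parse-row and
  hands the result to `send!`, a stand-in for the Kafka producer call.
  Returns the number of rows sent."
  [path send!]
  (with-open [rdr (io/reader path)]
    ;; line-seq is lazy, so it has to be fully consumed inside with-open,
    ;; before the reader is closed; reduce does that without building a result seq.
    (reduce (fn [n row]
              (send! (parse-row row))
              (inc n))
            0
            (line-seq rdr))))
```

Passing something like `#(swap! sent conj %)` as `send!` is a handy way to check the pipeline without a broker running.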

There’s also a consumer running in the terminal, to eyeball what’s happening.

Looking at serialization strategies, I did a quick comparison of nippy versus JSON (via Clojure’s data.json library). I also threw in Jsonista, as its benchmarks against Cheshire (itself already faster than the core JSON library) were impressive.
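To compare serializers without Kafka in the loop at all, a tiny timing harness along these lines could help. `time-serializer` is a hypothetical helper, and it uses `pr-str` (EDN) as a stand-in so it runs with plain Clojure; in practice you’d pass `taoensso.nippy/freeze`, `clojure.data.json/write-str` or `jsonista.core/write-value-as-bytes` instead. It also totals output bytes, since payload size affects what goes over the wire as well as raw speed.

```clojure
(defn time-serializer
  "Hypothetical helper: serializes every record in `records` with `serialize`
  and returns {:count n, :bytes total-output-bytes, :ms elapsed-millis}.
  `serialize` may return either a String (measured as UTF-8) or a byte array."
  [serialize records]
  (let [start (System/nanoTime)
        [n total] (reduce (fn [[n total] record]
                            (let [out  (serialize record)
                                  size (if (string? out)
                                         (alength (.getBytes ^String out "UTF-8"))
                                         (alength ^bytes out))]
                              [(inc n) (+ total size)]))
                          [0 0]
                          records)]
    {:count n
     :bytes total
     :ms    (/ (- (System/nanoTime) start) 1e6)}))

;; Realise the input first so building it isn't part of the timing.
(let [sample (mapv (fn [i] {:id i :name "row"}) (range 100000))]
  (time-serializer pr-str sample))
```

A single pass like this is subject to JIT warm-up, so treat the numbers as rough; Criterium is the usual tool for something more careful.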

Here’s the slightly noddy code:

These are just from my laptop: a couple of runs against a single broker, with a consumer in the terminal reading every message off. I saw the following.

I ran each a few times to check the times stayed within ±10% of each other, but only the final run is recorded here, and I’ve rounded the figures.




Although the time on this one wobbled around a bit, the quickest run took an astonishing ~168s, which works out at roughly 150k msgs/s. Whoa.
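For the record, the msgs/s figure is simply the row count divided by the run time, using the approximate numbers above:

```clojure
;; Throughput is just messages divided by wall-clock seconds.
(defn msgs-per-sec [msgs secs]
  (/ (double msgs) secs))

;; ~25.5 million rows in ~168 s
(msgs-per-sec 25500000 168)
;; roughly 151,786 msgs/s, i.e. the ~150k quoted above
```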

It’s worth noting that using non-lazy IO on a file this size blows up. The lazy hype is real, yo!
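The non-lazy version isn’t shown here, so the eager function below is only an assumption about what that kind of code typically looks like (`slurp` plus `split-lines`), set against a lazy `line-seq` equivalent. Both function names are made up for illustration.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

;; Eager: slurp reads the entire file into one String, and split-lines then
;; builds a vector of every line, so the whole file sits on the heap at once.
;; With a file over a GB, this is the sort of thing that can exhaust the heap.
(defn count-rows-eager [path]
  (count (str/split-lines (slurp path))))

;; Lazy: line-seq realises lines on demand, so as long as nothing holds on to
;; the head of the sequence, lines already processed can be garbage collected.
(defn count-rows-lazy [path]
  (with-open [rdr (io/reader path)]
    (count (line-seq rdr))))
```

On a small file both give the same answer; the difference only shows up once the file is big relative to the heap.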

I suppose you could also read a set number of lines at a time, then batch and parallelize them, so maybe I’ll try that to see how it performs against the comp’d functions.
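The batching idea might look something like this. It’s purely hypothetical and not something measured here: `process-file-batched!`, `transform` and `send-batch!` are illustrative names, and `send-batch!` stands in for whatever would produce a batch to Kafka. `partition-all` groups lines into fixed-size batches and `pmap` transforms them in parallel while keeping them in order.

```clojure
(require '[clojure.java.io :as io])

(defn process-file-batched!
  "Hypothetical batched variant: reads `path` lazily, groups lines into
  batches of `batch-size`, applies `transform` to each line with batches
  processed in parallel via pmap, and hands every transformed batch to
  `send-batch!` in order. Returns the number of rows processed."
  [path batch-size transform send-batch!]
  (with-open [rdr (io/reader path)]
    (->> (line-seq rdr)
         (partition-all batch-size)
         ;; pmap is semi-lazy: it only runs a bounded number of batches
         ;; ahead of the consumer, so the whole file is never in memory.
         (pmap #(mapv transform %))
         ;; reduce consumes everything before with-open closes the reader.
         (reduce (fn [n batch]
                   (send-batch! batch)
                   (+ n (count batch)))
                 0))))
```

Whether this beats the plain comp’d pipeline would depend on how expensive the transform is relative to the send, since `pmap` only pays off when each batch does enough work to outweigh its coordination overhead. In a standalone script, call `(shutdown-agents)` at the end so the worker threads don’t keep the JVM alive.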

Hardly that scientific, but it’s something.
