As per the code in my last post, I’ve been messing around with Kafka.
I’m lazily reading in an input file that’s over a GB in size (about 25.5 million rows), then performing some transformations on each row before handing it over to Kafka.
There’s also a consumer running in the terminal, to eyeball what’s happening.
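That consumer is nothing fancy; the stock console consumer that ships with Kafka does the job (the topic name here is a made-up stand-in):

```sh
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic lines --from-beginning
```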
Looking at serialization strategies, I did a quick comparison of Nippy versus JSON (via Clojure’s data.json library). I also threw in Jsonista, as their benchmarks against Cheshire (itself already faster than the core JSON library) were impressive.
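For reference, the three serialize fns look roughly like this; each takes the row map and returns a byte array for the producer (the UTF-8 charset choice for data.json is my assumption):

```clojure
(ns bench.serializers
  (:require [clojure.data.json :as json]
            [jsonista.core :as j]
            [taoensso.nippy :as nippy]))

(defn ->data-json [m]
  ;; data.json writes a string, so we encode it to bytes ourselves
  (.getBytes ^String (json/write-str m) "UTF-8"))

(defn ->nippy [m]
  ;; nippy freezes straight to a byte array
  (nippy/freeze m))

(defn ->jsonista [m]
  ;; jsonista (Jackson underneath) also goes straight to bytes
  (j/write-value-as-bytes m))
```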
Here’s the slightly noddy code, or at least a minimal sketch of its shape:
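I’ve assumed the plain Java KafkaProducer via interop here; transform-row, the topic, and the broker address are placeholders, and serialize-fn is any of the map-to-bytes fns above.

```clojure
(ns bench.producer
  (:require [clojure.java.io :as io])
  (:import (java.util Properties)
           (org.apache.kafka.clients.producer KafkaProducer ProducerRecord)
           (org.apache.kafka.common.serialization ByteArraySerializer StringSerializer)))

(defn make-producer []
  (KafkaProducer.
    (doto (Properties.)
      (.put "bootstrap.servers" "localhost:9092")
      (.put "key.serializer"   (.getName StringSerializer))
      (.put "value.serializer" (.getName ByteArraySerializer)))))

(defn transform-row
  "Stand-in for the real transformations; wraps the line in a map."
  [line]
  {:body line})

(defn produce-file!
  "Lazily walk `path` line by line, transform each row, serialize it
  (map -> bytes), and send the record to `topic`."
  [path topic serialize-fn]
  (with-open [producer (make-producer)
              rdr      (io/reader path)]
    (doseq [line (line-seq rdr)]
      (.send producer (ProducerRecord. topic (serialize-fn (transform-row line)))))))

;; e.g. (produce-file! "big-input.txt" "lines" ->jsonista)
```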
This was just on my laptop, but over a couple of runs against a single broker, with a consumer reading every message off in the terminal, I saw the following.
Each setup was run a few times to make sure the times stayed within +/- 10%, but only the final run is recorded here, and I’ve rounded the numbers.
[Timing results for JSON (data.json), Nippy, and Jsonista; each run serialized the payload into the :body key of a new hashmap]

Although the time on the Jsonista run wobbled around a bit, the quickest it ran in was an astonishing ~168s; 25.5 million messages in 168 seconds works out at ~150k msgs/s on that run. Whoa.
It’s worth noting that using non-lazy IO (reading the whole file into memory up front) blows up. The lazy hype is real, yo!
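To make that concrete: the eager version below holds the whole file (and all ~25.5M line strings) on the heap at once, while the lazy one only ever realizes a handful of lines at a time; handle is a stand-in for transform-and-send.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn handle [line] line) ; stand-in for transform + send

;; Eager: slurp pulls the entire >1GB file into one string, then
;; split-lines materializes every row at once. This is the one that blows up.
(def all-lines (str/split-lines (slurp "big-input.txt")))

;; Lazy: line-seq hands out lines one at a time as doseq walks them,
;; so already-processed lines can be garbage collected.
(with-open [rdr (io/reader "big-input.txt")]
  (doseq [line (line-seq rdr)]
    (handle line)))
```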
You could, I suppose, read a set number of lines at a time and then batch and parallelize, so maybe I’ll try that (a sketch follows) to see how it performs against the comp’d functions.
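A first cut at that might look like the following; the batch size and send-one! are guesses on my part.

```clojure
(require '[clojure.java.io :as io])

(defn produce-batched!
  "Lazily read `path`, chunk the lines into batches, and send each
  batch's messages in parallel. `send-one!` stands in for
  transform + serialize + .send."
  [path send-one!]
  (with-open [rdr (io/reader path)]
    (doseq [batch (partition-all 10000 (line-seq rdr))]
      ;; pmap is lazy, so dorun forces the sends before the next batch
      (dorun (pmap send-one! batch)))))
```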
Hardly that scientific, but it’s something.