Combining strict order with massive parallelism using Kafka
Getting the best of both worlds
Having been involved in several large-scale Kafka projects for different clients across a broad range of industries, I have heard my fair share of questions on Apache Kafka — ranging from the fundamental to the esoteric. One question that never seems to go out of fashion is: How can you maintain strict order, yet still process records in parallel?
And it’s a fair question. Strict order assumes linearizability, the very notion of which seems to contradict the objectives of parallelism.

Partial and total order
We will start by exploring the notion of order.
As expected of an event-streaming platform, Kafka preserves the order of published records, provided those records occupy the same partition. To understand what this means in practice, one needs to explore the architecture of Kafka topics and their underlying sharding mechanism: partitions.
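The sharding idea can be sketched in a few lines: a record's key is hashed to select a partition, so all records sharing a key land in the same partition and retain their relative order. This is a simplified illustration, not Kafka's actual implementation (the real producer hashes the key bytes with murmur2); the function name and hash choice here are assumptions for the sketch.

```python
import hashlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition index deterministically.

    Hypothetical sketch: uses MD5 for illustration, whereas the real
    Kafka producer uses murmur2 on the key bytes.
    """
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


# Records sharing a key always map to the same partition,
# so their relative order is preserved within that partition.
p1 = choose_partition(b"customer-42", 8)
p2 = choose_partition(b"customer-42", 8)
assert p1 == p2
```

Because the mapping is deterministic, keyed records for the same entity (here, a hypothetical customer ID) can never be split across partitions, which is exactly what makes per-key ordering possible.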
Note: Kafka uses the term ‘record’, where others might use ‘message’ or ‘event’.
A partition is a totally ordered sequence of records, and is fundamental to Kafka. A record has an ID (a 64-bit integer offset) and a millisecond-precise timestamp. It may also have a key and a value; both are byte arrays and both are optional.

The term ‘totally ordered’ means that for any two records in a partition, one definitively precedes the other. In practice, records from any given producer are written in the order they were emitted by the application: if record P was published before Q, then P will precede Q in the partition (assuming P and Q share a partition). Furthermore, they will be read in the same order by all consumers; P will always be read before Q, for every possible consumer.

This ordering guarantee is vital in the large majority of use cases: published records generally correspond to real-life events, and preserving the timeline of those events is often essential.
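The structure described above can be modelled as an append-only log that assigns monotonically increasing offsets. This is a toy model to make the total-order property concrete, not Kafka's storage implementation; the class and field names are assumptions for the sketch.

```python
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    offset: int                # position in the partition (64-bit int in Kafka)
    timestamp_ms: int          # millisecond-precise timestamp
    key: Optional[bytes]       # optional key
    value: Optional[bytes]     # optional value


class Partition:
    """Toy model of a partition: a totally ordered, append-only record log."""

    def __init__(self) -> None:
        self._records: List[Record] = []

    def append(self, key: Optional[bytes] = None,
               value: Optional[bytes] = None) -> int:
        # Offsets increase monotonically; writes preserve emission order.
        offset = len(self._records)
        self._records.append(
            Record(offset, int(time.time() * 1000), key, value))
        return offset

    def read_from(self, offset: int) -> List[Record]:
        # Every consumer reading from the same offset sees the same order.
        return self._records[offset:]


partition = Partition()
partition.append(value=b"P")   # published first, so it precedes Q
partition.append(value=b"Q")
assert [r.value for r in partition.read_from(0)] == [b"P", b"Q"]
```

Since every record's position is fixed by its offset at append time, any two consumers that replay the log from the same offset necessarily observe P before Q, which is the total-order guarantee in miniature.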
The diagram below illustrates the linear arrangement of records in a single partition.