Before we dive into the article, here’s a link to the original CRDT paper

CRDTs: Conflict-free Replicated Data Types

CRDTs are data structures that guarantee convergence under concurrent updates, without requiring coordination. But that’s only the definition. The deeper story is that CRDTs aren’t just some handy database hack, and thinking of them as such badly undersells their potential.

Why do they exist?

When we think of concurrency and replication, most devs default to databases. Why? Because, historically, most tutorials, examples, and even research papers framed the problem in terms of conflicts in databases. Even Shapiro’s foundational papers pitched CRDTs as tools for eventually consistent replicated data stores. At the time, this framing was sensible. But it trapped the narrative: “CRDTs are that thing you use when DBs fight over writes.” (We’ll come to this in a bit.) We see this in practice, too. Widely used apps like Figma and Excalidraw rely on CRDTs or CRDT-inspired merge techniques for their real-time collaboration features, but most users, and even many devs, don’t realise it.

What are they really?

Effectively, CRDTs are just data structures with special, deterministic merge rules:
Two replicas can be updated independently.
When they sync, their states converge deterministically, without requiring locks, coordination, or hand-written conflict resolution logic.

A CRDT’s merge operation must be:

Commutative - merge(a,b) = merge(b,a) (the order of replicas doesn’t matter)
Associative - merge(a,merge(b,c)) = merge(merge(a,b),c) (you can merge in any grouping)
Idempotent - merge(a,a) = a (merging the same state twice doesn’t change it)

These three properties guarantee that no matter how many replicas there are, in what order they sync, or how often they exchange their states, everyone converges to the same value once all updates have been exchanged.
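
To make the three laws concrete, here is a minimal sketch that brute-force checks them on a handful of sample values. The helper name `satisfies_merge_laws` and the sample values are my own illustration, not part of any CRDT library, and passing on a few samples is a sanity check rather than a proof.

```python
from itertools import product


def satisfies_merge_laws(merge, samples):
    """Return True if merge is commutative, associative and idempotent on samples."""
    for a, b, c in product(samples, repeat=3):
        if merge(a, b) != merge(b, a):
            return False  # order of arguments changed the result
        if merge(a, merge(b, c)) != merge(merge(a, b), c):
            return False  # grouping changed the result
    # merging a state with itself must leave it unchanged
    return all(merge(a, a) == a for a in samples)


set_samples = [frozenset(), frozenset({"cake"}), frozenset({"cake", "hats"})]
int_samples = [0, 1, 2, 5]

union_ok = satisfies_merge_laws(lambda a, b: a | b, set_samples)
max_ok = satisfies_merge_laws(max, int_samples)
# Addition is commutative and associative, but not idempotent: 1 + 1 != 1.
add_ok = satisfies_merge_laws(lambda a, b: a + b, int_samples)

print(union_ok, max_ok, add_ok)  # True True False
```

Set union and max pass all three checks, which is why they show up so often as merge functions. Plain addition fails idempotence, so syncing the same state twice would double-count.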

Example

Imagine two friends, Alice and Bob (yes, THAT Alice and Bob), are throwing a surprise birthday party for their friend Charlie. They are using a shared shopping list app on their phones, and they’re both offline, riding different metro trains.

The data structure: This app uses a simple CRDT called a G-Set (a grow-only set). In a G-Set you can only add items, and the merge rule is a simple union (∪) of the sets.

The initial state: both of them start with an empty list.
Alice: {}, Bob: {}

Alice’s actions (offline): Alice thinks we need cake and streamers to make it a nice party. She adds them to her list.
Alice’s local state: {Cake, Streamers}

Bob’s actions (offline): Bob thinks we need really nice pictures of this party! He adds Party Hats and Cake to his list.
Bob’s local state: {Party Hats, Cake}

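When they both get off their trains, their apps sync! The merge function simply combines the two lists. Here are the states so far, as the app sees them:

Alice: {Cake, Streamers}
Bob: {Party Hats, Cake}
Merged: {Cake, Streamers} ∪ {Party Hats, Cake} = {Cake, Streamers, Party Hats}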

Guaranteed convergence: this is the final converged state for both of them:
{Cake, Streamers, Party Hats}

Notice! The merge rule (union) is commutative, associative, and idempotent, so no matter how many times they sync, they’ll see the same list. Cake was added by both of them, but since a set only contains unique items, it appears only once. No CENTRAL SERVER was needed to decide whose update came first! The determinism of the data structure’s rules handled it automatically.
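
Here is the party example as a small Python sketch. The `GSet` class and its method names are made up for illustration (they are not taken from a real library), and the "sync" is just two in-process objects merging with each other.

```python
class GSet:
    """Grow-only set: items can be added but never removed."""

    def __init__(self, items=()):
        self._items = frozenset(items)

    def add(self, item):
        self._items = self._items | {item}

    def merge(self, other):
        """Return a new GSet holding the union of both replicas."""
        return GSet(self._items | other._items)

    def value(self):
        return set(self._items)


# Both phones start empty and are edited offline.
alice = GSet()
bob = GSet()

alice.add("Cake")
alice.add("Streamers")
bob.add("Party Hats")
bob.add("Cake")

# Sync in both directions: each side merges the other's state.
alice_view = alice.merge(bob)
bob_view = bob.merge(alice)

print(sorted(alice_view.value()))  # ['Cake', 'Party Hats', 'Streamers']
print(alice_view.value() == bob_view.value())  # True
```

Merging again with a state that was already seen changes nothing, which is the idempotence property doing its job.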

What does “databases fight” mean?

Now that we have an intuition, here’s the classic problem CRDTs were invented to avoid:

Imagine a normal replicated key-value store with 2 replicas of the same row: tacos_consumed = 3.
Replica A increments tacos_consumed to 4.
Replica B increments tacos_consumed to 4 at the SAME TIME.
When they sync, without special handling, one replica overwrites the other and you end up with tacos_consumed = 4 instead of 5.
That’s a lost update!

Traditional systems avoid this with locks or consensus, but that costs availability, since replicas have to wait on each other before accepting a write, and it rules out offline work. CRDTs solve the same problem in a different way: by designing the data type itself so merges are deterministic and no updates are lost.
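
As a rough sketch of the taco problem, the snippet below puts a naive overwrite next to a grow-only counter (usually called a G-Counter), one well-known CRDT design for this case. The class, the replica ids "A" and "B", and the way the starting value 3 is split across replicas are all assumptions made for the example.

```python
def naive_sync(local, remote):
    """Whole-value overwrite: whichever value arrives last wins."""
    return remote


class GCounter:
    """Grow-only counter: one slot per replica, merged by per-slot max."""

    def __init__(self, replica_id, counts=None):
        self.replica_id = replica_id
        self.counts = dict(counts or {})

    def increment(self):
        # A replica only ever bumps its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + 1

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        keys = self.counts.keys() | other.counts.keys()
        merged = {k: max(self.counts.get(k, 0), other.counts.get(k, 0)) for k in keys}
        return GCounter(self.replica_id, merged)


# Naive store: both replicas read 3 and write back 4.
naive_result = naive_sync(3 + 1, 3 + 1)

# G-Counter: the shared starting state already sums to 3 (assumed split).
shared = {"A": 2, "B": 1}
a = GCounter("A", shared)
b = GCounter("B", shared)
a.increment()
b.increment()

merged_a = a.merge(b)
merged_b = b.merge(a)
print(naive_result, merged_a.value(), merged_b.value())  # 4 5 5
```

Because each replica writes only to its own slot, the per-slot max never throws away an increment, and both sides land on 5 regardless of who syncs first.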

Beyond the Database: The Real Playground

Thinking of CRDTs as just another data structure opens up possibilities far beyond traditional databases. They’re a natural fit for any state that needs to be shared and updated from multiple sources.

Here are some of my musings on the topic:

Config syncing: Keep config files for a fleet of servers in sync without the need for a complex consensus protocol.
Multiplayer games: A player’s game state (position, inventory, health) can be modeled as a CRDT and synced with other players, reducing the reliance on a central authoritative game server for every action.
IoT: Edge devices can collect data offline and sync with a server when they reconnect, without losing any sensor readings that may be crucial to the workflow.
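
To give the config-syncing idea some shape, here is one possible sketch using a last-writer-wins map, where each key carries a (timestamp, replica id) stamp. This is my assumption about how it could be modeled, not the only option: the keys, stamps, and the `merge_config` helper are hypothetical, it assumes each stamp is unique, and it relies on timestamps that are meaningful across servers.

```python
def merge_config(local, remote):
    """Per-key merge: keep the entry with the larger (timestamp, replica_id) stamp."""
    merged = dict(local)
    for key, entry in remote.items():
        # entry is (timestamp, replica_id, value); compare stamps only
        if key not in merged or entry[:2] > merged[key][:2]:
            merged[key] = entry
    return merged


# Hypothetical configs held by two servers, each entry stamped by its writer.
server_a = {"log_level": (1, "a", "info"), "max_conns": (2, "a", 100)}
server_b = {"log_level": (3, "b", "debug"), "timeout_s": (1, "b", 30)}

ab = merge_config(server_a, server_b)
ba = merge_config(server_b, server_a)

print(ab == ba)  # True
print(ab["log_level"][2])  # debug
```

The trade-off is worth stating: when two servers write the same key concurrently, last-writer-wins keeps one value and deliberately drops the other. That is fine for many config settings, but it is a different guarantee from the counter and set examples, where nothing is discarded.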

Closing thoughts

The true potential of CRDTs is in shifting coordination from an infrastructure problem to a library problem. Instead of setting up complex locking systems, consensus protocols, or relying on a single database as the “source of truth”, we can use a data structure that has convergence built into its DNA.

We already have the tools to mitigate the pain of distributed state, but we need to start treating CRDTs as first-class programming primitives, the same way we think of maps, sets or lists, but designed for a distributed, multi-writer world.

Under the hood, state-based CRDTs are built on join-semilattices from order theory, i.e. partially ordered sets where any two elements have a least upper bound. The merge operation computes that bound. Understanding this lets us design our own CRDTs instead of relying only on pre-made ones!
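
As a small illustration of "design your own", the sketch below builds a custom state out of two simpler lattices: a grow-only set of items and a highest-guest-count-seen number. The state shape and the function names `leq`, `join`, and `join_is_least_upper_bound` are invented for this example, and the check only covers the listed samples.

```python
from itertools import product


# Hypothetical custom state: (set of items, highest guest count seen so far).
def leq(a, b):
    """Partial order: a is below b when both components are below."""
    return a[0] <= b[0] and a[1] <= b[1]


def join(a, b):
    """Merge, taken component by component: set union and max."""
    return (a[0] | b[0], max(a[1], b[1]))


samples = [
    (frozenset(), 0),
    (frozenset({"Cake"}), 4),
    (frozenset({"Party Hats"}), 2),
    (frozenset({"Cake", "Party Hats"}), 4),
]


def join_is_least_upper_bound(samples):
    for a, b in product(samples, repeat=2):
        j = join(a, b)
        if not (leq(a, j) and leq(b, j)):
            return False  # join is not even an upper bound
        for u in samples:
            if leq(a, u) and leq(b, u) and not leq(j, u):
                return False  # some upper bound sits below the join
    return True


lub_ok = join_is_least_upper_bound(samples)
merged = join((frozenset({"Cake"}), 4), (frozenset({"Party Hats"}), 2))
print(lub_ok, sorted(merged[0]), merged[1])  # True ['Cake', 'Party Hats'] 4
```

The useful takeaway is that pairing two lattices and merging them component by component gives another lattice, so new CRDTs can often be assembled from small, well-understood pieces.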

In a few years, CRDTs will probably be everywhere and invisible, embedded directly into programming language runtimes and communication protocols, much like garbage collection is today.

Databases may have been CRDTs’ first home, but they certainly won’t be their last. So the next time you build a distributed system, don’t just reach for a database. Think in CRDTs :)

Shift.