Vector Clocks
Introduction: The “Time” Problem
On a single computer, we have one clock. If Event A happens at 10:00:01 and Event B at 10:00:02, we know A happened before B.
In a Distributed System, server clocks are never perfectly synced. If Server 1 thinks it’s 10:00 and Server 2 thinks it’s 09:59, timestamps from different servers can no longer be compared reliably. If two people edit the same document on different servers, how do we know who was first? Or did the edits happen at the same time?
Vector Clocks provide a way to track Causality (what caused what) without needing a central clock.
What Problem does it solve?
- Input: Events occurring on different nodes.
- Output: A logical timestamp that determines if one event “happened before” another or if they are “concurrent” (conflicting).
- The Promise: Reliable conflict detection in distributed databases.
How it Works
- The Vector: Each node keeps a list (vector) of counters, one for every node in the system: [Node_A: 0, Node_B: 0, Node_C: 0].
- Updating: Every time a node performs an action, it increments its own counter in its vector.
- Syncing: When Node A sends data to Node B, it sends its whole vector along with it. Node B merges the two vectors by taking the maximum of each counter, then increments its own counter to record the receive event.
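The update and sync rules above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the helper names `tick`, `merge`, and `receive` are hypothetical, clocks are plain dictionaries, and the receiver is assumed to increment its own counter after merging (a common convention).

```python
from typing import Dict

VectorClock = Dict[str, int]

def tick(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of `clock` with `node`'s own counter incremented."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated

def merge(local: VectorClock, received: VectorClock) -> VectorClock:
    """Element-wise maximum of two clocks; a missing entry counts as 0."""
    nodes = set(local) | set(received)
    return {n: max(local.get(n, 0), received.get(n, 0)) for n in sorted(nodes)}

def receive(local: VectorClock, received: VectorClock, node: str) -> VectorClock:
    """Merge an incoming clock, then count the receive as a local event
    (assumed convention, see the lead-in)."""
    return tick(merge(local, received), node)

# Hypothetical three-node system, all counters starting at zero.
a = {"Node_A": 0, "Node_B": 0, "Node_C": 0}
b = dict(a)

a = tick(a, "Node_A")         # Node A performs a local action
a = tick(a, "Node_A")         # Node A sends a message (sending is an action too)
b = tick(b, "Node_B")         # Node B performs a local action
b = receive(b, a, "Node_B")   # Node B receives A's vector and merges it

print(a)  # {'Node_A': 2, 'Node_B': 0, 'Node_C': 0}
print(b)  # {'Node_A': 2, 'Node_B': 2, 'Node_C': 0}
```

After the merge, Node B's vector records that it has seen everything Node A had done up to the send, which is what later lets a comparison conclude that A's events happened before B's.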
Comparing Two Clocks:
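- Happened Before: If every counter in Vector A is ≤ the corresponding counter in Vector B, and at least one is strictly smaller, then A happened before B.
- Conflict: If some counters in A are higher than in B, and others are lower, neither event could have seen the other, so the events happened concurrently. We have a conflict!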
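The comparison rules can be written as a short function. The sketch below assumes clocks are dictionaries mapping node names to counters, with a missing entry treated as 0; the function name `compare` and its string return values are illustrative choices, not a standard API.

```python
from typing import Dict

def compare(a: Dict[str, int], b: Dict[str, int]) -> str:
    """Classify the relationship between two vector clocks."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"        # every counter matches
    if a_le_b:
        return "before"       # a happened before b
    if b_le_a:
        return "after"        # b happened before a
    return "concurrent"       # some counters higher, some lower: conflict

print(compare({"Node_A": 1, "Node_B": 0}, {"Node_A": 2, "Node_B": 1}))  # before
print(compare({"Node_A": 2, "Node_B": 0}, {"Node_A": 1, "Node_B": 1}))  # concurrent
```

Each call scans every counter once, so the cost grows linearly with the number of nodes tracked in the vectors.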
Typical Business Scenarios
✅ Collaborative Editing: Determining if two users edited the same paragraph simultaneously (the kind of problem that tools such as Figma or Google Docs have to solve).
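✅ Dynamo / Riak: These “AP” (High Availability) databases use vector clocks to detect when two conflicting versions of a record exist. On a read, they return the conflicting versions (Riak calls them siblings) and ask the application to resolve the conflict. Note that this refers to Amazon's original Dynamo design; the managed DynamoDB service does not expose vector clocks.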
✅ Distributed Version Control: Git tracks causality in a related way, using a directed acyclic graph (DAG) of commits rather than counters to record which change came before which.
❌ Large Clusters: If you have 10,000 nodes, the vector becomes 10,000 items long. This is too heavy. Solutions like Dotted Version Vectors are used instead.
Performance & Complexity
- Space: $O(N)$ per clock, where $N$ is the number of nodes.
- Comparison: $O(N)$.
Summary
“Vector Clocks are the ‘Logical Chronometers’ of the cloud. They allow us to reconstruct the story of our data and identify exactly when two events clashed, even without a shared clock.”
