Hash Tables & Collisions
Hash Tables & Collision Resolution
Introduction: The Promise
The Hash Table (or Hash Map) is arguably the most important data structure in programming. It promises constant time for insertion, deletion, and lookup.
However, since the space of possible keys is infinite and the array size is finite, two different keys will eventually map to the same index. This is a Collision. How a system handles collisions defines its performance and memory footprint.
What Problem does it solve?
- Input: A Key-Value pair.
- Output: Instant access to the Value using the Key.
- The Promise: Performance that doesn’t slow down as the dataset grows.
Core Idea (Intuition)
Think of a massive wall of Numbered Mailboxes.
- You take a name (“Luke”).
- A Hash Function turns “Luke” into a number (e.g., 42).
- You go directly to mailbox #42.
- If “Sarah” also hashes to 42, you have a problem.
Collision Resolution Strategies
1. Separate Chaining (Used by Java HashMap)
Each “bucket” is actually a Linked List.
- If multiple keys hash to index 42, they are stored in a list at that index.
- Pros: Easy to implement, never fills up.
- Cons: If many collisions occur, degrades to (searching the list).
2. Open Addressing (Used by Python dict)
If index 42 is taken, look for the next available spot.
- Linear Probing: Look at 43, 44, 45…
- Pros: Better cache locality (all data is in one array).
- Cons: “Clustering” (long chains of occupied spots slow everything down).
Typical Engineering Concerns
- Load Factor: When the table is 75% full, performance drops. Most tables will Resize (double in size) at this point.
- Hash Flooding Attack: If an attacker knows your hash function, they can send thousands of keys that all collide at index 0, turning your server into an snail. Modern languages use Randomized Hashing to prevent this.
Performance & Complexity
- Time (Average): .
- Time (Worst): (when everything collides).
- Space: .
Summary
“Hash Tables turn keys into array indices for instant access. Chaining and Open Addressing are the two ways we handle the ‘traffic jam’ when two keys want the same spot.”
