Hash Map
Hash Map: The Speed of Light
The Story: The Librarian with Teleportation
Imagine a library with 10 million books. In a normal library (an Array or a List), you have to walk past every shelf to find a book. Even in a sorted library (Binary Search), you still have to check 24 shelves.
In 1953, H.P. Luhn (the same genius behind hashing) proposed a different kind of library. In this library, the book’s title is the secret address. You say the title “The Great Gatsby,” apply a magic formula, and it tells you: “It’s on shelf #429.” You don’t walk; you teleport to shelf #429.
This is the Hash Map (or Hash Table). It is the most important data structure in modern software because it makes searching practically instantaneous ( time).
Why do we need it?
If you ever used a dict in Python, an Object in JavaScript, or a Map in Java, you used a Hash Map.
- Database Indexes: How does a database find your record among billions? It uses a Hash index.
- Caching: When you want to save a result (like a user’s profile) so you don’t have to fetch it again, you put it in a Map.
- Counting: Counting the frequency of every word in a book.
How the Algorithm “Thinks”
The algorithm is a matchmaker between keys and memory.
- The Formula (Hashing): It takes your Key (e.g., “Luke”) and turns it into a large number using a Hash Function.
- The Address (Compression): It takes that large number and shrinks it (usually with a Modulo
%operator) to fit the size of its internal storage (an Array). - The Storage: It puts the Value at that specific index.
- The Conflict Resolution: What if two different keys (e.g., “Luke” and “Sun”) want the same index?
- Chaining: It creates a small list at that index to hold both.
- Open Addressing: It looks for the next empty seat nearby.
Engineering Context: The Load Factor
A Hash Map is fast, but it is memory-hungry.
- Space vs Time: To keep it fast, the map must be mostly empty. If the map gets too full (a high “Load Factor”), collisions increase, and speed degrades into (the speed of a slow list).
- Resizing: When the map hits about 75% capacity, it “Resizes”—it creates a massive new warehouse and re-calculates every single address. This is a heavy operation, which is why we often pre-allocate space if we know the size.
Implementation (Python)
class SimpleHashMap:
def __init__(self, size=100):
self.size = size
# A list of lists to handle collisions via Chaining
self.table = [[] for _ in range(self.size)]
def _hash(self, key):
# Python's built-in hash() handles the dirty work
return hash(key) % self.size
def put(self, key, value):
index = self._hash(key)
# Update if exists, else add
for i, (k, v) in enumerate(self.table[index]):
if k == key:
self.table[index][i] = (key, value)
return
self.table[index].append((key, value))
def get(self, key):
index = self._hash(key)
for k, v in self.table[index]:
if k == key:
return v
return None # Not foundSummary
The Hash Map teaches us that knowledge is about location. By turning the content of a question into the address of its answer, we bypass the need for searching. It reminds us that in engineering, sometimes the best way to speed up a process is to spend a little more on space.
