Rabin-Karp Algorithm
Rabin-Karp: The Digital Fingerprint
The Story: The Efficiency of the Seal
In ancient times, if a King wanted to verify if a letter was authentic, he didnât read the whole letter first. He looked at the Wax Seal. If the seal matched his signet ring, he knew the letter was genuine.
In 1987, Michael O. Rabin and Richard M. Karp applied this ancient wisdom to computer science. They realized that comparing two strings character-by-character is slow. But comparing two numbers is instantaneous.
They designed a way to turn any string into a unique ânumberâ (a Hash). If the numbers match, the strings likely match. This turned string searching from a linguistic problem into a mathematical one.
Why do we need it?
Rabin-Karp shines in Multiple Pattern Search and Plagiarism Detection.
- Plagiarism Checkers: If you want to check if a student copied a paragraph from 10,000 different articles, Rabin-Karp can scan all of them simultaneously by comparing hashes.
- Intrusion Detection: Searching for thousands of virus signatures in a single pass of a network packet.
- Rolling Hash: It is the core of
rsync, allowing it to find which parts of a file have changed without sending the whole file.
How the Algorithm âThinksâ
The algorithm is a mathematician who uses a Rolling Hash.
The Fingerprint (Hash): It turns the pattern
appleinto a number, say12345.The Sliding Window: It slides a window of 5 characters over the text.
- âhelloâ
56789(Not a match) - âworldâ
43210(Not a match)
- âhelloâ
The âRollingâ Magic: This is the secret. To calculate the hash of the next window, it doesnât re-read all 5 characters. It mathematically ârolls outâ the first character and ârolls inâ the new one.
Hash(next) = (Hash(current) - FirstChar) * Base + NewChar. This makes the hash calculation for every step.The Final Verification: If the hash matches, it performs a quick character-by-character check just to be 100% sure there wasnât a âHash Collisionâ (two different strings with the same number).
Engineering Context: The Power of Parallelism
The brilliance of Rabin-Karp is that it turns string matching into something a GPU or a Specialized Chip can do very well.
- Complexity: Average case is , but it allows you to search for patterns in almost the same time as one.
- Trade-off: It is sensitive to the choice of the Hash Function. If the hash is bad, you get too many âfalse alarmsâ (collisions), and the speed drops.
Implementation (Python)
def rabin_karp(text, pattern):
n, m = len(text), len(pattern)
if m > n: return -1
# A simple rolling hash (polynomial rolling hash)
base = 256
prime = 101 # A prime number to keep hashes small
p_hash = 0 # Hash of pattern
t_hash = 0 # Hash of current window in text
h = pow(base, m - 1) % prime # The value of the highest digit
# 1. Initial Fingerprint
for i in range(m):
p_hash = (base * p_hash + ord(pattern[i])) % prime
t_hash = (base * t_hash + ord(text[i])) % prime
# 2. Slide and Roll
for i in range(n - m + 1):
if p_hash == t_hash:
# 3. Final Verification (to avoid collisions)
if text[i:i+m] == pattern:
return i
if i < n - m:
# 4. The Rolling Math: Roll out text[i], Roll in text[i+m]
t_hash = (base * (t_hash - ord(text[i]) * h) + ord(text[i+m])) % prime
if t_hash < 0: t_hash += prime
return -1
# Example Usage
# print(rabin_karp("AABAACAADAABAABA", "AABA")) # Output: 0Summary
Rabin-Karp teaches us that abstraction is a shortcut. By turning complex data into simple numbers, we can process the world at the speed of math. It reminds us that sometimes, the best way to understand a face is to look at its fingerprint.
