Cuckoo Filter
Cuckoo Filter
Introduction: Bloom Filter’s Missing Feature
Bloom Filters (Chapter 4.4) are great, but they have a fatal flaw: You cannot delete items. If you try to unset a bit, you might accidentally delete 10 other items that shared that bit.
Cuckoo Filter is a modern probabilistic data structure that solves this. It supports deletion, uses less space for low false-positive rates, and offers better lookup performance by utilizing a technique called “Cuckoo Hashing.”
What Problem does it solve?
- Input: An element.
- Output:
False(Definitely not present) orTrue(Probably present). - The Promise: Membership testing with deletion support and higher space efficiency for high-precision requirements.
Core Idea (Intuition)
Think of a Cuckoo Bird. It kicks other birds out of their nests to lay its own eggs.
- Buckets: Instead of a bit array, we have an array of buckets, each holding a few “Fingerprints” (small hashes of the items).
- Two Nests: Every item has two potential bucket locations.
- The Kick Out: If both potential buckets are full, the new item kicks out an existing resident. The kicked-out resident then flies to its own alternative bucket, potentially kicking out someone else. This “cuckoo” behavior continues until everyone finds a home.
Why use it over Bloom Filter?
- ✅ Deletion: You can safely remove an item by deleting its fingerprint from its bucket.
- ✅ Performance: It only requires checking 2 buckets, which is very cache-friendly compared to Bloom Filter’s random bit lookups.
- ✅ Space: It is more space-efficient than Bloom Filter when the target false-positive rate is very low (e.g., < 3%).
Typical Business Scenarios
✅ Dynamic Blacklists: A firewall that needs to block IPs temporarily and then unblock them (Deletion support is key).
✅ Inventory Management: Checking if a specific SKU is in a massive warehouse cache where items are frequently added and removed.
✅ Deduplication in Streams: Identifying if we have seen this specific transaction ID in the last hour.
❌ Full Capacity: If the filter gets too full, the “kicking out” process can enter an infinite loop or fail. You must keep the load factor below .
Performance & Complexity
- Time: for lookup, insertion, and deletion.
- Space: Slightly better than Bloom Filter for high precision; supports deletion.
Summary
“Cuckoo Filter is the ‘Smart Bloom Filter’. By storing fingerprints and allowing items to move between nests, it provides the one thing Bloom Filters can’t: a ‘Delete’ button.”
