Bloom Filter
Bloom Filter
Introduction: The “Maybe” Machine
Suppose you want to know if a username is already taken among 1 billion users.
- Hash Table: Needs 20GB+ of RAM. Too expensive.
- Bloom Filter: Needs 100MB. But it might occasionally say “Yes” when the answer is actually “No” (False Positive). However, if it says “No,” it is 100% certain.
A Bloom Filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set.
What Problem does it solve?
- Input: An element.
- Output:
False(Definitely not present) orTrue(Probably present). - The Promise: Massive memory savings for membership tests.
Core Idea (Intuition)
Imagine a Bit Array of size , all set to 0.
- Add an Item: Take the item, run it through different hash functions. Each hash gives you an index. Set the bits at those indices to 1.
- Query an Item: Hash it with the same functions.
- If any of the bits are 0, the item definitely was never added.
- If all bits are 1, the item might have been added (or other items just happened to flip those same bits).
Why use it?
It acts as a Cheap Filter in front of an Expensive Operation.
- Check Bloom Filter first.
- If “No”: Skip the expensive database/disk lookup.
- If “Yes”: Do the expensive lookup to be sure.
Typical Business Scenarios
✅ Weak Password Detection: Checking a password against a blacklist of 10 million leaked passwords without loading them all in RAM.
✅ Browser Safety: Google Chrome uses Bloom Filters to check if a URL is a known malicious site.
✅ Database Performance: Cassandra and BigTable use Bloom Filters to avoid searching data blocks on disk if the row key is not there.
✅ CDN Caching: Only caching a resource if it has been requested at least once before (One-Hit-Wonder protection).
❌ Deletion: Standard Bloom Filters do not support deletion. If you unset a bit, you might accidentally “delete” other items sharing that bit. Use a Counting Bloom Filter if deletion is required.
Performance & Complexity
- Time: where is the number of hash functions (constant time).
- Space: where is the bit array size. Usually around 10 bits per item for a 1% error rate.
Summary
“A Bloom Filter is the ultimate bouncer. It can tell you instantly who is definitely NOT on the list, saving you from searching the whole club for people who aren’t there.”
