B-Tree
B-Tree: The Disk Giant
The Story: The Short Librarian
Imagine a library with 1 billion books.
- Scenario A (Binary Tree): To find a book, you start at the entrance. âLeft or Right?â You go left. âLeft or Right?â You go right. You might have to walk through 30 doors (height of tree) to find the book.
- Scenario B (B-Tree): The librarian is short but has massive arms. At the entrance, he holds 1,000 index cards. He says: âThe book is in Room #452.â You walk to Room #452. There, another librarian holds 1,000 cards. âShelf #12.â You go to Shelf #12. Found it.
In Scenario B, you only walked through 3 doors. Why? Because each node (librarian) held way more information.
This is the B-Tree. It is designed for Hard Drives, where âwalkingâ (seeking) is slow, but âreadingâ (transferring data) is fast.
Why do we need it?
- Databases: MySQL (InnoDB), PostgreSQL, SQLite all use B-Trees (or B+ Trees) for their indexes.
- File Systems: NTFS (Windows), HFS+ (Mac), XFS all use B-Trees to track files.
- Why not BST? A BST on disk would require 30 random disk seeks to find a record. That takes ~300ms. A B-Tree needs only 3 seeks (~30ms).
How the Algorithm âThinksâ
The algorithm is a Multi-Way Manager.
- The Fat Node: Instead of 1 key and 2 children, a node has up to keys and children (e.g., M=1000).
- The Sort: Keys inside a node are sorted. We can use Binary Search inside the node (which is in RAM, so itâs fast).
- The Growth:
- Insertions happen at the Leaf.
- If a leaf is full, it splits in half. The middle key kicks up to the parent.
- If the parent is full, it splits too.
- The tree grows upwards, keeping the root at the top.
Engineering Context: B-Tree vs. B+ Tree
In practice, almost everyone uses the B+ Tree variant.
- B-Tree: Data is stored in all nodes.
- B+ Tree: Data is stored only in leaves. Internal nodes are just signposts (keys). Leaves are linked together in a chain.
- Benefit: Scanning âAll users from ID 100 to 200â is super fast. You just find 100 and walk the linked list of leaves.
Implementation (Conceptual Python)
Implementing a B-Tree is a 500-line project. Here is the search logic to show the âMulti-wayâ nature.
class BTreeNode:
def __init__(self, leaf=False):
self.leaf = leaf
self.keys = []
self.children = []
def search(node, k):
# Find the first key greater than or equal to k
i = 0
while i < len(node.keys) and k > node.keys[i]:
i += 1
# 1. Found it in this node?
if i < len(node.keys) and node.keys[i] == k:
return (node, i)
# 2. If leaf, we failed
if node.leaf:
return None
# 3. Recurse into the correct child
return search(node.children[i], k)
# The magic is that 'node.children[i]' narrows the search space
# by a factor of M (e.g., 1000) instead of just 2.Summary
The B-Tree teaches us that hardware dictates structure. When the cost of movement (Disk I/O) is high, you should carry as much as you can in one trip. It reminds us that efficient software isnât just about ; itâs about understanding the physical reality of the machine.
