Hash Functions: Why They're Not Encryption
1. Why Should You Care?
A developer stores user passwords like this:
```python
import hashlib

hashed = hashlib.sha256(password.encode()).hexdigest()
database.store(hashed)  # hypothetical storage call
```
“It’s secure,” they say. “I’m using SHA-256, a strong hash function.”
Then their database leaks. Within hours, attackers have recovered 80% of the passwords.
What went wrong?
The developer confused “hashing” with “secure password storage.” These are not the same thing. SHA-256 is a hash function, not a password storage solution. Using it directly for passwords is like using a hammer as a screwdriver—it’s the wrong tool for the job.
2. Definition
A hash function takes input of any size and produces a fixed-size output (the “hash” or “digest”). It’s designed to be one-way: computing the hash from the input is easy, but recovering the input from the hash is computationally infeasible.
Key properties of cryptographic hash functions:
- Deterministic: Same input always produces same output
- Fixed output size: SHA-256 always outputs 256 bits, regardless of input size
- One-way: Computationally infeasible to reverse
- Collision-resistant: Hard to find two different inputs with the same hash
- Avalanche effect: Small input change creates drastically different output
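Python's standard `hashlib` module shows the first two properties directly, with a glimpse of the avalanche effect. This is a small sketch; the input strings are arbitrary examples.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Deterministic: hashing the same input twice gives the same digest
assert sha256_hex(b"hello") == sha256_hex(b"hello")

# Fixed output size: 32 bytes (256 bits) whether the input is empty or 1 MB
for size in (0, 5, 1_000_000):
    digest = hashlib.sha256(b"x" * size).digest()
    print(size, "input bytes ->", len(digest) * 8, "output bits")

# Avalanche effect: a one-character change yields an unrelated digest
print(sha256_hex(b"hello"))
print(sha256_hex(b"hellp"))
```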
3. The Fundamental Difference
Encryption: Two-Way by Design
```
Plaintext ──[Encrypt with Key]──► Ciphertext ──[Decrypt with Key]──► Plaintext
```
Encryption is reversible. Given the key, you can always recover the original data.
Hashing: One-Way by Design
```
Input ──[Hash Function]──► Hash
                            ↓
                       (No way back)
```
Hashing is irreversible. There is no key. There is no decryption. The original data is mathematically destroyed; only a fingerprint remains.
Why the Confusion?
Both produce “garbled output” from readable input. But the purposes are completely different:
| Feature | Encryption | Hashing |
|---|---|---|
| Purpose | Keep data secret but recoverable | Create a permanent fingerprint |
| Reversible | Yes (with key) | No |
| Key required | Yes | No |
| Output size | Varies with input | Fixed |
| Use case | Protect data in transit/storage | Verify integrity, store passwords (with slow, salted hashing) |
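To make the table concrete, the sketch below contrasts a toy one-time pad (XOR with a random key as long as the message) with SHA-256. The one-time pad is an assumption chosen only because it fits in a few lines of standard-library Python. Real systems should use an authenticated cipher such as AES-GCM from a vetted library.

```python
import hashlib
import secrets


def otp_encrypt(plaintext: bytes, key: bytes) -> bytes:
    """Toy one-time pad: XOR each byte with the key (key must be as long as the data)."""
    if len(key) != len(plaintext):
        raise ValueError("one-time pad key must match the plaintext length")
    return bytes(p ^ k for p, k in zip(plaintext, key))


def otp_decrypt(ciphertext: bytes, key: bytes) -> bytes:
    # XOR is its own inverse, so decryption is the same operation
    return otp_encrypt(ciphertext, key)


message = b"Meet at noon"
key = secrets.token_bytes(len(message))  # never reuse a one-time pad key

ciphertext = otp_encrypt(message, key)
digest = hashlib.sha256(message).digest()

# Encryption: reversible with the key, output size tracks the input size
print(otp_decrypt(ciphertext, key))   # b'Meet at noon'
print(len(message), len(ciphertext))  # 12 12

# Hashing: no key, fixed 32-byte output, and no function that undoes it
print(len(digest))                    # 32
```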
4. How Hash Functions Work
The High-Level Process
```
1. Padding
   - Append a 1 bit, then zeros, then the input length, so the
     total is a multiple of the block size (512 bits for SHA-256)
2. Block Processing
   - Split the padded input into fixed-size blocks
   - Process each block through a compression function
   - Each block's output feeds into the next block
3. Finalization
   - Output the final internal state as the hash
```
This chained design is the Merkle-Damgård construction used by MD5, SHA-1, and SHA-2. SHA-3 uses a different "sponge" construction, but it likewise absorbs the input block by block.
The Avalanche Effect
This is what makes hashes useful for integrity checking:
```python
import hashlib

text1 = "Hello, World!"
text2 = "Hello, World."  # Just changed ! to .

print(hashlib.sha256(text1.encode()).hexdigest())
# dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f
print(hashlib.sha256(text2.encode()).hexdigest())
# f8c3bf62a9aa3e6fc1619c250e48abe7519373d3edf41be62eb5dc45199af2ef
```
One character change → completely different hash. Because a near-miss input produces an unrelated hash, a guess that is “almost right” looks no closer than one that is completely wrong. An attacker gets no signal to home in on the original input and is left testing candidates one at a time.
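One way to quantify the avalanche effect is to count how many of the 256 output bits differ between the two digests. For a well-behaved hash you would expect roughly half (about 128) to flip, though the exact count varies from input to input. A minimal sketch:

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bits that differ between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


d1 = hashlib.sha256("Hello, World!".encode()).digest()
d2 = hashlib.sha256("Hello, World.".encode()).digest()

diff = bit_difference(d1, d2)
total_bits = len(d1) * 8
print(f"{diff} of {total_bits} bits differ ({diff / total_bits:.0%})")
```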
5. Why You Can’t “Decrypt” a Hash
Information Loss
A hash function compresses arbitrary-length input into fixed-length output. Information is mathematically lost.
```
"Hello" (5 bytes)         → 256-bit hash
"War and Peace" (3 MB)    → 256-bit hash
Every possible file ever  → 256-bit hash
```
An effectively unlimited set of inputs maps onto a finite set of 2^256 outputs. By the pigeonhole principle, many different inputs must share each hash (collisions). You can’t reverse this because the hash alone doesn’t tell you which of those inputs was used.
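Collisions are guaranteed, not hypothetical. With the full 256-bit output they are far too rare to find. If you keep only the first 16 bits of SHA-256, though, there are just 65,536 possible outputs, so among 65,537 distinct inputs at least two must share a value. The sketch below truncates the digest deliberately (that truncation is the illustrative assumption) and searches for such a pair. Because of the birthday effect it typically succeeds after a few hundred inputs.

```python
import hashlib


def truncated_hash(data: bytes, n_bytes: int = 2) -> bytes:
    """First n_bytes of SHA-256: a deliberately tiny output space."""
    return hashlib.sha256(data).digest()[:n_bytes]


def find_collision(n_bytes: int = 2) -> tuple[bytes, bytes]:
    """Hash distinct inputs until two share the same truncated digest."""
    seen: dict[bytes, bytes] = {}
    i = 0
    while True:
        candidate = f"input-{i}".encode()
        h = truncated_hash(candidate, n_bytes)
        if h in seen:
            return seen[h], candidate
        seen[h] = candidate
        i += 1


a, b = find_collision()
print(a, b, truncated_hash(a).hex())
# Given only the 16-bit value, nothing tells you whether a or b (or countless
# other inputs) was the original: the information is simply not there.
```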
No Key, No Decryption
Encryption without the key is secure because finding the key is computationally infeasible.
Hashing has no key. There’s nothing to find. The “reversal” would require inverting a mathematical function designed to be non-invertible.
What “Breaking” a Hash Means
When we say a hash is “broken,” we mean:
- Collision attack: Found two different inputs with the same hash
- Preimage attack: Given a hash, found an input that produces it (not necessarily the original)
Neither means “decryption.” Even a broken hash function doesn’t become reversible.
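In practice, “cracking” a hash means guessing: hash candidate inputs and compare. The sketch below recovers a 4-digit PIN from its unsalted SHA-256 digest by trying all 10,000 possibilities. The PIN and the helper name are hypothetical. Nothing is decrypted. The search works only because the input space is tiny, which is the same situation as with short passwords.

```python
import hashlib
from typing import Optional


def crack_pin(target_hex: str) -> Optional[str]:
    """Brute-force a 4-digit PIN by hashing every candidate (hypothetical helper)."""
    for n in range(10_000):
        candidate = f"{n:04d}"
        if hashlib.sha256(candidate.encode()).hexdigest() == target_hex:
            return candidate
    return None  # the input was not a 4-digit PIN


leaked = hashlib.sha256(b"4821").hexdigest()  # what an attacker might find in a leak
print(crack_pin(leaked))  # 4821
```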
6. MD5: Why It Won’t Die
MD5 was designed in 1991. It’s been “broken” since 2004. Yet you still see it everywhere.
Why MD5 Is Broken
- Collision attacks are practical: You can create two different files with the same MD5 hash
- Chosen-prefix attacks work: Given any two prefixes, you can append data to each that results in the same hash
- It’s too fast: a single modern GPU computes billions of MD5 hashes per second, so brute-force guessing is cheap
Why MD5 Won’t Die
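```python
import hashlib

# Still seen in production systems:
file_checksum = hashlib.md5(file_content).hexdigest()  # "Just for integrity"
cache_key = hashlib.md5(query.encode()).hexdigest()    # "Just for cache keys"
```
People argue: “I’m not using it for security, just checksums.”
The problem: requirements change. Today’s “just a checksum” becomes tomorrow’s security control. And because MD5 collisions are cheap to manufacture, any “non-security” use that touches attacker-controlled input (a checksum on an uploaded file, a cache key built from a user’s query) can be abused: an attacker can craft two different inputs that share one checksum or key.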
When MD5 Is Actually Fine
- Comparing files you fully control
- Non-security checksums in closed systems
- Legacy system compatibility (with full awareness of risks)
When MD5 Is Not Fine
- Any security-sensitive application
- User-facing file verification
- Password hashing (never!)
- Digital signatures
- Certificate validation
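In Python, following this advice is usually a one-line change. The sketch below (function names are hypothetical) uses SHA-256 for checksums. For compact cache keys it uses BLAKE2b, which ships in the standard library; `digest_size=16` gives a 128-bit key, the same length as MD5's. For the legacy-compatibility case, Python 3.9+ accepts `usedforsecurity=False` on `hashlib.md5`. This flags the call as a non-security use, which some restricted environments require before they allow MD5 at all.

```python
import hashlib


def file_checksum(content: bytes) -> str:
    """Integrity checksum: SHA-256 instead of MD5."""
    return hashlib.sha256(content).hexdigest()


def cache_key(query: str) -> str:
    """Compact cache key: 128-bit BLAKE2b instead of MD5."""
    return hashlib.blake2b(query.encode(), digest_size=16).hexdigest()


def legacy_md5(content: bytes) -> str:
    """Only for interoperating with systems that still require MD5 (Python 3.9+)."""
    # usedforsecurity=False marks this as a non-security use; it does not
    # make MD5 any stronger.
    return hashlib.md5(content, usedforsecurity=False).hexdigest()


print(file_checksum(b"report.pdf contents"))
print(cache_key("SELECT * FROM users WHERE id = 42"))
```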
7. Password Storage: Hashing Done Right
Here’s why sha256(password) fails:
Problem 1: Speed
SHA-256 is designed to be fast. Very fast.
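Approximate single-GPU rates (illustrative; real numbers depend on the hardware and on the cost parameters chosen):
```
SHA-256: ~8,500,000,000 hashes/second (GPU)
bcrypt:  ~71,000 hashes/second (same GPU)
Argon2:  ~1,000 hashes/second (same GPU, tuned)
```
Fast hashing means fast cracking. An 8-character password drawn from the 95 printable ASCII characters has 95^8, about 6.6 quadrillion, possibilities. At 8.5 billion guesses per second, trying them all takes about 9 days, and on average the right one turns up in half that time.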
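The arithmetic is worth checking for yourself. The sketch assumes an 8-character password drawn uniformly from the 95 printable ASCII characters. It reuses the approximate per-GPU rates listed above, which are illustrative rather than benchmarks, and reports the worst case of trying every candidate.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY

keyspace = 95 ** 8  # 8 characters, 95 printable ASCII symbols each

# Approximate hashes per second on one GPU (rough figures from the text above)
rates = {
    "SHA-256": 8_500_000_000,
    "bcrypt": 71_000,
    "Argon2": 1_000,
}


def exhaust_time_seconds(rate: float) -> float:
    """Worst-case time to try every candidate at `rate` guesses per second."""
    return keyspace / rate


print(f"keyspace: {keyspace:,} candidates")
for name, rate in rates.items():
    seconds = exhaust_time_seconds(rate)
    if seconds < SECONDS_PER_YEAR:
        print(f"{name:8} {seconds / SECONDS_PER_DAY:,.1f} days")
    else:
        print(f"{name:8} {seconds / SECONDS_PER_YEAR:,.0f} years")
```

The gap between days and millennia comes entirely from the per-guess cost, which is why the choice of function matters more than the choice of hash length.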
Problem 2: No Salt
Without salt, identical passwords have identical hashes.
```
Database leak:
user1: 5e884898da280471... ← "password"
user2: 5e884898da280471... ← Also "password"
user3: 5e884898da280471... ← Also "password"
```
The leak also reveals which users share a password. Attackers precompute hashes of common passwords once, and a single lookup then compromises every account that uses one of them.
Problem 3: Rainbow Tables
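Rainbow tables are a compact form of precomputed lookup table. Chains of hash and “reduction” steps let a table of a few gigabytes cover billions of candidate passwords, trading some lookup time for storage. Because unsalted SHA-256 hashes a given password the same way in every database, one table built in advance cracks the weak passwords in any leak almost instantly. A unique salt per user makes that precomputation useless, since the attacker would need a separate table for every salt.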
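The effect of a salt is easy to demonstrate. In this sketch, unsalted SHA-256 gives identical output for identical passwords. `hashlib.scrypt` with a random 16-byte salt per user does not. `hashlib.scrypt` is a memory-hard password hashing function in Python's standard library, available when Python is built against OpenSSL 1.1 or newer. The cost parameters `n=2**14, r=8, p=1` are a common starting point, not a tuned recommendation.

```python
import hashlib
import os

password = b"password"

# Unsalted: every user with this password gets the same digest
unsalted = [hashlib.sha256(password).hexdigest() for _ in range(3)]
print(len(set(unsalted)), "distinct unsalted digest(s)")  # 1

# Salted: each user gets a fresh random salt, stored next to the hash
salted = []
for _ in range(3):
    salt = os.urandom(16)
    digest = hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1, dklen=32)
    salted.append((salt, digest))
print(len({d for _, d in salted}), "distinct salted digests")  # 3
```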
The Solution: Password Hashing Functions
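```python
import bcrypt  # third-party: pip install bcrypt
import argon2  # third-party: pip install argon2-cffi

password = "correct horse battery staple"

# bcrypt: proven, widely supported (work factor 12, i.e. 2^12 iterations)
bcrypt_hash = bcrypt.hashpw(password.encode(), bcrypt.gensalt(rounds=12))

# Argon2: winner of the 2015 Password Hashing Competition
ph = argon2.PasswordHasher(
    time_cost=2,
    memory_cost=102400,  # in KiB, i.e. 100 MiB
    parallelism=8,
)
argon2_hash = ph.hash(password)
```
These functions are: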
- Deliberately slow: Configurable work factor
- Memory-hard: (Argon2, and also scrypt) Each guess requires significant RAM, making massively parallel GPU and ASIC attacks far more expensive
- Automatically salted: Each hash is unique even for identical passwords
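If you can't add a dependency, Python's standard library offers `hashlib.scrypt`, which is also salted, slow, and memory-hard. The sketch below implements a register/verify flow with it. The function names, the `$`-separated record format (salt and parameters stored with the hash), and the cost parameters are assumptions for illustration. In production, bcrypt or Argon2 via a maintained library remains the simpler choice.

```python
import hashlib
import hmac
import os

# Cost parameters (assumed values; tune so one hash takes tens of milliseconds)
N, R, P, DKLEN = 2**14, 8, 1, 32


def hash_password(password: str) -> str:
    """Registration: random salt + slow hash, stored together in one string."""
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, n=N, r=R, p=P, dklen=DKLEN)
    # Hypothetical record format: scrypt$n$r$p$salt_hex$hash_hex
    return f"scrypt${N}${R}${P}${salt.hex()}${digest.hex()}"


def verify_password(stored: str, password: str) -> bool:
    """Verification: recompute with the stored salt and parameters, then compare."""
    try:
        scheme, n, r, p, salt_hex, hash_hex = stored.split("$")
        salt, expected = bytes.fromhex(salt_hex), bytes.fromhex(hash_hex)
        n_int, r_int, p_int = int(n), int(r), int(p)
    except ValueError:
        return False  # malformed record
    if scheme != "scrypt" or not expected:
        return False
    candidate = hashlib.scrypt(
        password.encode(), salt=salt, n=n_int, r=r_int, p=p_int, dklen=len(expected)
    )
    # Constant-time comparison avoids leaking how many bytes matched
    return hmac.compare_digest(candidate, expected)


record = hash_password("correct horse battery staple")
print(record[:40] + "...")
print(verify_password(record, "correct horse battery staple"))  # True
print(verify_password(record, "Tr0ub4dor&3"))                   # False
```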
The Correct Password Storage Flow
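```
Registration:
  password → [Salt + Slow Hash] → stored_hash
Verification:
  input_password + stored_salt → [Same Slow Hash] → compare with stored_hash
```
With bcrypt and Argon2, the salt and cost parameters are encoded in stored_hash itself (for example `$argon2id$v=19$m=65536,t=2,p=4$...`), so stored_salt is read back from the stored string rather than kept separately.
8. Hash Function Selection Guide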
| Use Case | Recommended | Avoid |
|---|---|---|
| Password storage | Argon2id, bcrypt, scrypt | SHA-*, MD5 |
| File integrity | SHA-256, SHA-3, BLAKE3 | MD5, SHA-1 |
| Digital signatures | SHA-256, SHA-3 | MD5, SHA-1 |
| HMAC | SHA-256, SHA-3 | MD5 |
| Non-security checksums | CRC32, xxHash | Nothing specific, as long as no attacker can influence the data |
| Content-addressable storage | SHA-256, BLAKE3 | MD5 |
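For the file-integrity row, a common pattern is to hash the file in fixed-size chunks so large files never need to fit in memory. Feeding data through `update()` piece by piece produces exactly the same digest as hashing it in one call, reflecting the block-by-block processing described in section 4. The function names and chunk size below are assumptions. On Python 3.11+, `hashlib.file_digest(f, "sha256")` does the same job.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file incrementally, chunk_size bytes at a time (1 MiB by default)."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)  # each call feeds more data into the running state
    return h.hexdigest()


def matches_published_checksum(path: Path, published_hex: str) -> bool:
    """Compare a file against a SHA-256 checksum published by its distributor."""
    return sha256_of_file(path) == published_hex.strip().lower()


# Usage: a tiny file hashed one byte at a time still gives the standard digest
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"abc")
    print(sha256_of_file(sample, chunk_size=1))
    # ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```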
9. Code Example: Proper Password Handling
```python
from argon2 import PasswordHasher, exceptions  # third-party: pip install argon2-cffi

# Configuration: adjust based on your server's capabilities
ph = PasswordHasher(
    time_cost=2,        # Number of iterations
    memory_cost=65536,  # Memory usage in KiB (64 MiB)
    parallelism=4,      # Number of parallel lanes/threads
    hash_len=32,        # Output hash length in bytes
    salt_len=16,        # Salt length in bytes
)


def hash_password(password: str) -> str:
    """Hash a password for storage (a random salt is generated and embedded)."""
    return ph.hash(password)


def verify_password(stored_hash: str, password: str) -> bool:
    """Verify a password against a stored hash."""
    try:
        return ph.verify(stored_hash, password)
    except exceptions.VerifyMismatchError:
        # Wrong password
        return False
    except (exceptions.VerificationError, exceptions.InvalidHashError):
        # Hash format is invalid or verification failed for another reason
        return False


def needs_rehash(stored_hash: str) -> bool:
    """Check whether the hash was created with outdated parameters."""
    return ph.check_needs_rehash(stored_hash)


# Usage
password = "user_password_here"

# Registration
hashed = hash_password(password)
print(f"Stored hash: {hashed}")
# $argon2id$v=19$m=65536,t=2,p=4$...

# Login
if verify_password(hashed, password):
    print("Login successful")
    # The plaintext is only available right after a successful login,
    # so this is the moment to upgrade an outdated hash
    if needs_rehash(hashed):
        new_hash = hash_password(password)
        # Update database with new_hash
```
10. Common Misconceptions
| Misconception | Reality |
|---|---|
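| “I can decrypt a hash with enough computing power” | No. Hashing destroys information, so there’s nothing to decrypt. Attackers can only guess inputs and compare hashes, which works against low-entropy inputs such as weak passwords. |
| “SHA-256 is good for password storage” | SHA-256 is too fast. Use bcrypt/Argon2. |
| “Longer hash = more secure” | Security depends on the algorithm, not just length. SHA-512 isn’t “twice as secure” as SHA-256. |
| “I’ll hash the password twice for extra security” | Two rounds of a fast hash are still fast: this only doubles the attacker’s work. Homemade constructions can also introduce weaknesses. Use a password hashing function with a tunable work factor. |
| “MD5 is fine for non-security purposes” | Until requirements change or an attacker can influence the input. Prefer SHA-256 or BLAKE3 even for “just checksums.” |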
11. Summary
Three things to remember:
Hashing is not encryption. Hash functions are one-way by design. You cannot and should not expect to “decrypt” a hash. There’s no key, no reversal, just a fingerprint.
Speed is the enemy for password storage. General-purpose hash functions like SHA-256 are designed to be fast. Password hashing functions like Argon2 are designed to be slow. Use the right tool for the job.
MD5 and SHA-1 are deprecated for security. Even for “just checksums,” prefer SHA-256 or BLAKE3. Requirements change, and you don’t want to be caught with broken cryptography when they do.
12. What’s Next
We’ve covered encryption, hashing, and their differences. But we’ve glossed over something critical: where do keys and salts come from?
In the next article, we’ll explore: Random numbers—the most underestimated component in cryptographic systems, and why rand() can kill your security.
