Security Glossary

Hash Collision

When two different inputs produce the same hash output, potentially allowing attackers to substitute one document for another while maintaining a valid signature.

Hash Collision occurs when two different inputs produce the same hash output. While collisions are theoretically inevitable (infinite inputs, finite outputs), cryptographic hash functions are designed to make finding collisions computationally infeasible.

Types of Collision Attacks

Collision Attack: Find any two inputs with same hash
Pre-image Attack: Find input for a given hash
Second Pre-image: Find another input matching a specific input's hash

Birthday Paradox

// For n-bit hash, collision expected after ~2^(n/2) attempts
MD5 (128-bit):  ~2^18 in practice (birthday bound 2^64; broken)
SHA-1 (160-bit): ~2^80 operations (broken, demonstrated)
SHA-256 (256-bit): ~2^128 operations (secure)

Real-World Attacks

MD5: Practical collisions since 2004
SHA-1: SHAttered attack (2017) - first practical collision
Flame malware: Used MD5 collision to forge Windows Update certificates

Impact

// If two documents have same hash:
hash(legitimate.pdf) == hash(malicious.pdf)

// Digital signature on legitimate.pdf
// Also validates malicious.pdf!

// Certificate collision:
// Attacker gets CA to sign benign cert
// Creates malicious cert with same hash

Prevention

Use SHA-256 or SHA-3 (no practical attacks)
Don't use MD5 or SHA-1 for security purposes
For signatures, include randomness (nonce)
Use collision-resistant commitments