Hash Collision

Hash Collision occurs when two different inputs produce the same hash output. While collisions are theoretically inevitable (infinite inputs, finite outputs), cryptographic hash functions are designed to make finding collisions computationally infeasible.

Types of Collision Attacks

  • Collision Attack: Find any two inputs with same hash
  • Pre-image Attack: Find input for a given hash
  • Second Pre-image: Find another input matching a specific input's hash

Birthday Paradox

// For n-bit hash, collision expected after ~2^(n/2) attempts
MD5 (128-bit):  ~2^64 operations (broken, practical)
SHA-1 (160-bit): ~2^80 operations (broken, demonstrated)
SHA-256 (256-bit): ~2^128 operations (secure)

Real-World Attacks

  • MD5: Practical collisions since 2004
  • SHA-1: SHAttered attack (2017) - first practical collision
  • Flame malware: Used MD5 collision to forge Windows Update certificates

Impact

// If two documents have same hash:
hash(legitimate.pdf) == hash(malicious.pdf)

// Digital signature on legitimate.pdf
// Also validates malicious.pdf!

// Certificate collision:
// Attacker gets CA to sign benign cert
// Creates malicious cert with same hash

Prevention

  • Use SHA-256 or SHA-3 (no practical attacks)
  • Don't use MD5 or SHA-1 for security purposes
  • For signatures, include randomness (nonce)
  • Use collision-resistant commitments

See Also