T-79.159 Cryptography and Data Security
Lecture 4: Hashes and Message Digests
Markku-Juhani O. Saarinen, Helsinki University of Technology
T-79.159 Cryptography and Data Security, 11.02.2004 Lecture 4: Hashes and Message Digests,
Markku-Juhani O. Saarinen
Cryptographic hash functions
• Maps a message M (a bit string of arbitrary length) to a “message digest” X = H(M) of fixed length, e.g. 128, 160, or 256 bits.
• Well-known examples: MD5, SHA-1, RIPEMD-160, SHA-256.
• Security requirement 1: One-wayness. Given a digest X, it should be “hard” to find a message M satisfying X = H(M).
• Security requirement 2: Collision resistance. It should be “hard” to find two messages $M_1 \neq M_2$ such that $H(M_1) = H(M_2)$.
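As a quick illustration (a sketch assuming Python 3 and its standard hashlib module; the helper name digest_bits is illustrative), the digest length depends only on the algorithm, not on the length of the message:

```python
import hashlib

def digest_bits(name: str, message: bytes) -> int:
    """Length in bits of H(message) for the named hash algorithm."""
    return len(hashlib.new(name, message).digest()) * 8

for name in ("md5", "sha1", "sha256"):
    # A 1-byte message and a 1 MB message give digests of the same size.
    print(name, digest_bits(name, b"a"), digest_bits(name, b"a" * 1_000_000))
```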
UNIX Password authentication
1. User enters a password (key):
Login: falken
Password: ******
2. The system looks up the user in the /etc/passwd file and finds the corresponding hashed key value and other relevant data:
falken:cV/h5TT95.pzQ:1085:1085:Prof. Falken
3. The first two characters, cV, are the salt. The system computes crypt(password, salt) with the crypt(3) library function and compares the output to the stored encrypted string:
char *crypt(const char *key, const char *salt);
UNIX Password authentication (2)
• No need to store the key itself, just the salt and H(salt || key).
• The password file /etc/passwd can be world-readable! (And often is, although this makes systems more vulnerable to dictionary attacks.)
• Salt slows down dictionary attacks. To check whether some user (from a large group) has a given password, the word has to be hashed with each one of the salts.
• UNIX crypt(3) is one-way, but not really collision resistant. It is based on DES and was developed by Robert Morris (Sr.) ca. 1975; it is still in use today.
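The salted-hash scheme can be sketched in Python as below. This is a stand-in, not crypt(3): Python's own crypt module is deprecated, so the sketch assumes SHA-256 over salt || key, a two-character salt drawn from the crypt(3) salt alphabet, and hex output. The names make_entry and check_password are hypothetical.

```python
import hashlib
import hmac
import secrets
import string

SALT_CHARS = string.ascii_letters + string.digits + "./"  # crypt(3)-style salt alphabet

def make_entry(key: str) -> str:
    """Return salt + hex(H(salt || key)); only this string is stored, never the key."""
    salt = "".join(secrets.choice(SALT_CHARS) for _ in range(2))
    return salt + hashlib.sha256((salt + key).encode()).hexdigest()

def check_password(entry: str, attempt: str) -> bool:
    """Re-hash the attempt with the stored salt (first 2 chars) and compare."""
    salt, stored = entry[:2], entry[2:]
    candidate = hashlib.sha256((salt + attempt).encode()).hexdigest()
    return hmac.compare_digest(candidate, stored)

entry = make_entry("joshua")
print(check_password(entry, "joshua"), check_password(entry, "falken"))  # True False
```

Because the salt is random, two users with the same password get different stored strings, so a dictionary word has to be hashed once per salt.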
SHA-1 and MD5 Fingerprints
• How do you know that your system files have not been tampered with (by viruses or trojans installed by intruders)?
• One way is to maintain a database of file fingerprints and compare them to known good values (e.g. www.knowngoods.org).
• Length checking is not sufficient, and simple “checksums” are not secure enough. One-wayness is clearly a requirement.
• Example: computing the 128-bit MD5 digest of the Linux kernel:
$ md5sum /boot/vmlinuz
95fb55766efa90bfe10c25cd2e9daaa4  /boot/vmlinuz
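A minimal Python sketch of fingerprint checking, roughly what md5sum plus a database of known good values does. The function names and the example database are hypothetical, and MD5 is used only to mirror the example above.

```python
import hashlib
from pathlib import Path

def md5_fingerprint(path: str) -> str:
    """Hex MD5 digest of a file, as printed by `md5sum`."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()

def tampered_files(known_good: dict[str, str]) -> list[str]:
    """Paths whose current fingerprint differs from the stored known-good value."""
    return [path for path, good in known_good.items()
            if md5_fingerprint(path) != good]

# Hypothetical database recorded while the system was known to be clean:
# known_good = {"/boot/vmlinuz": "95fb55766efa90bfe10c25cd2e9daaa4"}
# print(tampered_files(known_good))   # [] if nothing has changed
```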
Collision resistance
• What if the software distributor tries to cheat? Could he create a “good” file and a “bad” file (say, with a back door) such that they have the same digest?
• This is different from one-wayness, since the distributor can create both files (good and bad ones) simultaneously.
• If an n-bit hash is one-way, it takes about $2^n$ effort to find a message M satisfying H(M) = X, given just X.
• Even if an n-bit hash is collision-resistant, it takes no more than about $\sqrt{2^n} = 2^{n/2}$ effort to find two messages $M_1 \neq M_2$ such that $H(M_1) = H(M_2)$. Why?
Birthday paradox
Question:
“How many persons need to be in a room before we can expect two of them to have the same birthday?”
Answer:
23.
Why?
Birthday paradox (2)
n persons make up exactly $n(n-1)/2$ pairs.
Each pair has probability $364/365$ of not having the same birthday. Since these events are very close to being unrelated, the total probability of no-one having the same birthday is roughly $(364/365)^{n(n-1)/2}$.
Substituting $n = 23$ (253 pairs) we get $(364/365)^{253} \approx 0.499523$, so the probability that at least two persons share a birthday is just over 1/2.
(So this is not a “paradox” at all.)
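The calculation above can be checked numerically. This Python sketch (function names are illustrative) compares the pairwise approximation with the exact product $\prod_{k=0}^{n-1} (365-k)/365$:

```python
def approx_no_match(n: int, days: int = 365) -> float:
    """Pairwise approximation from the slide: (1 - 1/days)^(n(n-1)/2)."""
    return (1 - 1 / days) ** (n * (n - 1) // 2)

def exact_no_match(n: int, days: int = 365) -> float:
    """Exact probability that n people all have different birthdays."""
    p = 1.0
    for k in range(n):
        p *= (days - k) / days
    return p

print(round(approx_no_match(23), 6))     # 0.499523
print(round(1 - exact_no_match(23), 4))  # 0.5073
```

The exact probability of a match is slightly higher than the approximation because the pairwise events are not quite independent; both cross 1/2 at n = 23.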
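Birthday paradox (3)
More generally: we wish to find n (“number of persons”) as a function of m (“number of days in a year”) so that the probability of a match is 1/2:
$$\left(1 - \frac{1}{m}\right)^{n(n-1)/2} = \frac{1}{2},$$
taking logs:
$$\frac{n(n-1)}{2} \ln\left(1 - \frac{1}{m}\right) = -\ln 2.$$
When $x > 2$, there is a bound $-\frac{1}{x} - \frac{1}{x^2} < \ln\left(1 - \frac{1}{x}\right) < -\frac{1}{x}$.
Using $\ln(1 - 1/m) \approx -1/m$, we get the approximation $0.7213 \cdot (n^2 - n) \approx m$ (since $1/(2 \ln 2) \approx 0.7213$).
Asymptotically $n = O(\sqrt{m})$.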
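Solving the approximation for n gives $n \approx \sqrt{2 \ln 2 \cdot m} \approx 1.1774\sqrt{m}$ for large m. A small Python sketch (the function name is illustrative) that solves the quadratic $n^2 - n = 2 \ln 2 \cdot m$ for its positive root:

```python
import math

def people_for_even_odds(m: float) -> float:
    """Solve n^2 - n = 2 ln(2) m, i.e. 0.7213 (n^2 - n) = m, for the positive n."""
    c = 2 * math.log(2) * m
    return (1 + math.sqrt(1 + 4 * c)) / 2   # positive root of n^2 - n - c = 0

print(round(people_for_even_odds(365), 2))                     # 23.0
print(round(math.log2(people_for_even_odds(2.0 ** 128)), 2))   # 64.24
```

For a 128-bit hash ($m = 2^{128}$ possible values) about $2^{64.24}$ messages give even odds of a collision, which is where the "64-bit security" figure comes from.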
How to find collisions
The obvious (but very memory-intensive and hence inefficient) algorithm:
• Initialize a table that can hold $\sqrt{n}$ entries (values of x), where n is the number of possible hash values. The table is indexed by the first $\lg\sqrt{n} = \frac{1}{2}\lg n$ bits of H(x).
• For x = 1, 2, 3, …: compute H(x) and check whether the table position indexed by H(x) already has an entry. If an entry exists (say y), verify that H(x) = H(y) in full and, if so, quit. Otherwise just store x in that table position.
This takes about $O(\sqrt{n})$ time and $O(\sqrt{n})$ memory; e.g. if $n = 2^{128}$, roughly $2^{64}$ iterations and memory slots. Memory is the prohibitive factor, even if we manage to run the $2^{64}$ steps.
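A Python sketch of the table-based search, under two assumptions: SHA-256 truncated to 24 bits plays the role of the hash (so $n = 2^{24}$ and on the order of $2^{12}$ steps are expected), and a dictionary keyed by the full truncated hash replaces the fixed-size table indexed by half the bits, so every hit is a genuine collision. The names BITS, H and table_collision are hypothetical.

```python
import hashlib

BITS = 24  # hypothetical: SHA-256 truncated to 24 bits, so n = 2**24

def H(x: int) -> int:
    """Toy hash: the first BITS bits of SHA-256 over a 4-byte encoding of x."""
    digest = hashlib.sha256(x.to_bytes(4, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - BITS)

def table_collision() -> tuple[int, int]:
    """Store each x under H(x) until some H(x) is already present."""
    table: dict[int, int] = {}
    x = 1
    while True:
        h = H(x)
        if h in table:          # keyed by the full truncated hash: a real collision
            return table[h], x
        table[h] = x
        x += 1

y, x = table_collision()
print(H(y) == H(x), y != x)  # True True
```

The dictionary ends up holding every value tried, which is exactly the memory cost the slide warns about.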
Floyd’s cycle finding algorithm (1)
Consider a sequence where we start from some $x_0$ and iteratively compute a sequence $x_1, x_2, \ldots$, each value being the hash of the previous one:
$$x_{i+1} = H(x_i)$$
We have seen that after about $\sqrt{n}$ steps a collision will probably occur: there will be a pair $x_\alpha$ and $x_\beta$ such that $x_\alpha = x_\beta$ but $x_{\alpha-1} \neq x_{\beta-1}$.
$\alpha$ is called the tail of the cycle.
$\delta = \beta - \alpha$ is the cycle length.
Floyd’s cycle finding algorithm (2)
Here a collision occurs at $x_3 = x_{14}$. Hence the “tail” is $\alpha = 3$, $\beta = 14$, and the cycle length is $\delta = \beta - \alpha = 11$.
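Floyd’s cycle finding algorithm (3)
• Clearly $x_i = x_{i+\delta}$ when $i \geq \alpha$.
• Hence $x_i = x_{2i}$ whenever $i \geq \alpha$ and $i$ is a multiple of $\delta$ (so that $2i = i + k\delta$); if $\delta \geq \alpha$, the first such $i$ is $\delta$ itself (the cycle length).
Thus we can find the cycle length by starting with $(x_0, x_0)$ and computing $(x_1, x_2), (x_2, x_4), (x_3, x_6), \ldots, (x_i, x_{2i})$, stopping when $x_i = x_{2i}$.
Three hash function invocations are needed in each step. Then $i$ will be the cycle length $\delta$ (or, if $\alpha > \delta$, a multiple of it, which works equally well in the next step).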
Finding the collision
From the previous step, we have $x_\delta$. Now we compute the sequence
$$(x_0, x_\delta), (x_1, x_{\delta+1}), (x_2, x_{\delta+2}), \ldots, (x_\alpha, x_{\delta+\alpha}),$$
i.e. we stop when $H(x_i) = H(x_{\delta+i})$. Two hash function invocations are needed in each step. At the end $i = \alpha - 1$, and we have the collision, since $x_i \neq x_{\delta+i}$.
This simple algorithm requires about $3\delta + 2\alpha$ invocations of the hash function, i.e. $O(\sqrt{n})$, and is therefore asymptotically optimal. Unlike the table-based method, however, its memory requirement is very small!
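To check the two phases on the example from the figure, the following Python sketch uses a hypothetical toy mapping f on the integers 0 to 13 with $x_i = i$ for $i \le 13$ and $f(13) = 3$, so that $x_3 = x_{14}$. Besides the two phases described above, it adds a third loop (an addition, not from the slides) that measures $\delta$ exactly by walking once around the cycle.

```python
def successor() -> list[int]:
    """Hypothetical toy mapping: f[i] = i + 1 for i < 13, and f[13] = 3."""
    return list(range(1, 14)) + [3]

def floyd(f: list[int], x0: int = 0):
    """Return (alpha, delta, colliding pair) for the sequence x_{i+1} = f(x_i)."""
    # Phase 1: advance (x_i, x_2i) until they meet; i is then a multiple of delta.
    slow, fast = f[x0], f[f[x0]]
    while slow != fast:
        slow, fast = f[slow], f[f[fast]]
    # Phase 2: walk (x_j, x_{i+j}) in lockstep; they first agree at j = alpha.
    a, b, alpha = x0, slow, 0
    prev_a = prev_b = None
    while a != b:
        prev_a, prev_b = a, b
        a, b, alpha = f[a], f[b], alpha + 1
    # Phase 3 (extra): measure the cycle length by walking once around it.
    delta, c = 1, f[a]
    while c != a:
        c, delta = f[c], delta + 1
    return alpha, delta, (prev_a, prev_b)

print(floyd(successor()))  # (3, 11, (2, 13))
```

The colliding pair is (2, 13): both map to 3, i.e. $x_{\alpha-1} \neq x_{\beta-1}$ but $H(x_{\alpha-1}) = H(x_{\beta-1})$.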
Collision finding, pseudocode:
1. Initialize: a← 0, b← 0.
2. Do: a← H(a), b← H(H(b)) Until a = b.
3. Set: b← 0.
4. Do: Store (x, y)← (a, b). a← H(a), b← H(b) until a = b.
When the algorithm terminates: H(x) = H(y), but x ≠ y: a collision!
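The pseudocode translates almost line by line into Python. Here a hypothetical 24-bit truncation of SHA-256 stands in for H, and the names BITS, H and floyd_collision are illustrative. One deliberate deviation: instead of a ← 0, the walk starts from $2^{24}$, a value the truncated hash can never output, so the start cannot lie on the cycle and the final pair is guaranteed to differ.

```python
import hashlib

BITS = 24  # hypothetical: 24-bit truncation of SHA-256 keeps the run short

def H(v: int) -> int:
    """Toy hash: first BITS bits of SHA-256 of a 4-byte encoding of v."""
    digest = hashlib.sha256(v.to_bytes(4, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - BITS)

def floyd_collision(start: int = 1 << BITS) -> tuple[int, int]:
    """Steps 1-4 of the pseudocode, storing only a handful of integers."""
    a = b = start                     # 1. initialize a and b
    while True:                       # 2. a <- H(a), b <- H(H(b)) until a = b
        a, b = H(a), H(H(b))
        if a == b:
            break
    b = start                         # 3. reset b to the start value
    while True:                       # 4. store (x, y) <- (a, b), step both, until a = b
        x, y = a, b
        a, b = H(a), H(b)
        if a == b:
            return x, y

x, y = floyd_collision()
print(x != y, H(x) == H(y))  # True True
```

Only a few integers are live at any time, whereas the table-based search keeps thousands of entries for the same 24-bit hash.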
Rules of thumb
• As implied by the birthday paradox, there are algorithms that find a collision (birthday match) with $O(\sqrt{m})$ effort, where m is the number of possible hash values. These algorithms require negligible memory.
• Hence, to have collision resistance with n-bit security, the hash should be at least 2n bits long; e.g. 128-bit hashes give 64-bit security.
• If only one-wayness is required, then n bits are sufficient for n-bit security.
• Beware that some hash functions (like MD4) have been broken; they do not have the security level implied by their hash size.
How do hash functions actually work?
• An additional design requirement besides one-wayness and collision resistance: it should be possible to hash long messages without storing the whole thing in memory (e.g. when signing a backup tape).
• The long message is cut into pieces $M_i$ of equal size, and a state variable $X_i$ is maintained.
• The last piece $M_n$ is padded, with the length of the message included in the padding, and the final value of the state variable $X_n$ is the hash.
• Many other approaches have been proposed, but almost all practical hash functions work like this.
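Python's hashlib exposes this streaming interface: update() can be called once per piece, and the padding and final compression happen only when the digest is requested. A minimal sketch, assuming the data arrives as a binary stream (the function name hash_stream is illustrative):

```python
import hashlib
import io

def hash_stream(stream: io.BufferedIOBase, piece_size: int = 64) -> str:
    """SHA-256 of a stream, read one piece at a time; memory use stays constant."""
    h = hashlib.sha256()
    while piece := stream.read(piece_size):
        h.update(piece)        # feed the next piece into the running state
    return h.hexdigest()      # padding and the final compression happen here

data = b"backup tape contents " * 10_000
print(hash_stream(io.BytesIO(data)) == hashlib.sha256(data).hexdigest())  # True
```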
Davies-Meyer (1985)
• Use a block cipher E(K, P). Start with some initial value $X_0$ and update as $X_{i+1} = E(M_i, X_i) \oplus X_i$. The final value $X_n$ is the hash.
• Provably secure (if the block cipher is secure, i.e. behaves like an ideal cipher).
• Since each piece $M_i$ is used to key the block cipher, hashing speed is directly proportional to key size (rather than block size). The resulting hash size is equal to the block size.
• Most block ciphers are optimized for fast encryption rather than fast key initialization; hence the dedicated hash functions. In the context of these dedicated hash functions, $E(M_i, X_i) \oplus X_i$ is called the “compression function”.
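Davies-Meyer can be sketched in Python with one big assumption: the standard library has no block cipher, so a toy 8-round Feistel network whose round function is built from SHA-256 stands in for E. It is not a real or vetted cipher. The constants, the padding rule (0x80, zeros, 64-bit length) and all function names are hypothetical.

```python
import hashlib

HALF = 8         # toy cipher: 128-bit block split into two 64-bit halves
KEY_BYTES = 32   # each message piece M_i is used as a 256-bit key
ROUNDS = 8

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def _round(key: bytes, i: int, half: bytes) -> bytes:
    """Feistel round function built from SHA-256 (an assumption, not a real cipher)."""
    return hashlib.sha256(key + bytes([i]) + half).digest()[:HALF]

def encrypt(key: bytes, block: bytes) -> bytes:
    """Toy block cipher E(K, P): a balanced Feistel network."""
    left, right = block[:HALF], block[HALF:]
    for i in range(ROUNDS):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right

def decrypt(key: bytes, block: bytes) -> bytes:
    """Inverse of encrypt, showing E is a permutation for each key."""
    left, right = block[:HALF], block[HALF:]
    for i in reversed(range(ROUNDS)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right

def davies_meyer(message: bytes) -> bytes:
    """X_{i+1} = E(M_i, X_i) XOR X_i over the padded pieces M_i; returns X_n."""
    zeros = (KEY_BYTES - 9 - len(message)) % KEY_BYTES
    padded = message + b"\x80" + b"\x00" * zeros + (8 * len(message)).to_bytes(8, "big")
    x = bytes(2 * HALF)                      # initial value X_0 = all zeros
    for j in range(0, len(padded), KEY_BYTES):
        m = padded[j:j + KEY_BYTES]          # M_i keys the cipher
        x = _xor(encrypt(m, x), x)
    return x

print(davies_meyer(b"abc").hex())
```

Each call to encrypt consumes KEY_BYTES of message (the key size) and the result is a single 16-byte block, matching the remarks on speed and hash size above.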
Message Digest 5 (MD5)
• Very widely used hash function (message digest): fingerprints, PGP 2.x, X.509 PKI, etc.
• Designed by Ron Rivest (MIT), 1992. Specified in RFC 1321. MD5 means that this is Rivest’s fifth message digest design.
• Produces a 128-bit hash; has no more than 64-bit security. Processes messages in 512-bit blocks.
• Hans Dobbertin (BSI) found a flaw in the compression function of MD5 in 1996; hence the security proofs for the iterated construction, which assume a collision-resistant compression function, do not hold. However, collisions have not been computed yet. Do not use in new products.
Secure Hash Algorithm - 1 (SHA-1)
• U.S. NIST federal standard (FIPS 180-1, later FIPS 180-2). Currently the most popular cryptographic hash algorithm.
• Produces a 160-bit hash; 80-bit security. Processes messages in 512-bit blocks. Similar in design to MD4 and MD5.
• Designed by unknown persons at NSA in 1993 (the original design is known as SHA-0). Slightly modified for (then) unspecified reasons in 1995; the new version is known as SHA-1.
• Chabaud and Joux (CASSI/SCY/EC) published in 1998 an attack against SHA-0 (collisions with $2^{61}$ effort rather than $2^{80}$). The attack does not work against SHA-1, which showed that SHA-1 was indeed more secure than SHA-0.
Other dedicated hash algorithms
• RIPEMD-160 is a robust European hash function with a 160-bit hash.
• In 2000, NSA proposed new hash functions that produce 256- and 512-bit hashes, known as SHA-256 and SHA-512.
• Some speed measurements on a 1.4 GHz AMD Athlon running Linux:
  MD2            5 010 kB/s
  MD4          274 556 kB/s
  MD5          238 392 kB/s
  SHA-1        127 283 kB/s
  RIPEMD-160    84 896 kB/s
Message Authentication Codes (MACs)
• Protects against unauthorized or accidental message manipulation.
• Uses a secret key K to make sure that a message is actually from its assumed sender. The MAC is appended to the message; the recipient computes the MAC again from the message and K and verifies that it matches.
• It seems natural to use dedicated hash functions for computing MACs (fast!), especially if encryption isn’t needed.
• Many MACs have been proposed, the most common being HMAC (“hash MAC”), Krawczyk et al. (IBM), 1997.
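The tag-and-verify flow can be sketched with Python's standard hmac module; choosing HMAC with SHA-256 here is an assumption, and the helper names tag and verify are illustrative:

```python
import hashlib
import hmac

def tag(key: bytes, message: bytes) -> bytes:
    """Sender: compute the MAC that is appended to the message."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(key: bytes, message: bytes, received_tag: bytes) -> bool:
    """Recipient: recompute the MAC from the message and K, then compare."""
    return hmac.compare_digest(tag(key, message), received_tag)

k = b"shared secret K"
t = tag(k, b"transfer 10 EUR to Bob")
print(verify(k, b"transfer 10 EUR to Bob", t))   # True
print(verify(k, b"transfer 99 EUR to Eve", t))   # False
```

hmac.compare_digest compares in constant time, so the comparison itself does not leak how many leading bytes of a forged tag were right.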
A Stupid MAC
Question: “Hey! Why not just append the message after the key, hash the whole thing and use that as a MAC?” (i.e. A = H(K | M))
Answer: Eve sees the message M and the MAC A. Because of the way the Davies-Meyer mode works, she knows the state of the hash function $X_n = A$ at the end of the current message K | M (including its padding). Now she can just append anything after that, compute more iterations $X_{n+1}, X_{n+2}, \ldots$ with the compression function, and finally do a new padding. The original padding simply becomes part of her forged message.
MAC must detect changes in the message length as well!
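The attack can be demonstrated on a toy iterated hash in Python. Everything here is hypothetical: the 16-byte block and state, the truncated-SHA-256 compression function and the padding rule are stand-ins for a real MD5/SHA-1-style design, and Eve is assumed to know (or guess) the length of K.

```python
import hashlib

BLOCK = 16  # hypothetical toy hash: 16-byte blocks and a 16-byte state

def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function (truncated SHA-256), standing in for f(X_i, M_i)."""
    return hashlib.sha256(state + block).digest()[:BLOCK]

def padding(total_len: int) -> bytes:
    """Padding for a message of total_len bytes: 0x80, zeros, 8-byte bit length."""
    zeros = (BLOCK - 9 - total_len) % BLOCK
    return b"\x80" + b"\x00" * zeros + (8 * total_len).to_bytes(8, "big")

def toy_hash(data: bytes, state: bytes = bytes(BLOCK), done: int = 0) -> bytes:
    """Iterate compress over data + padding; (state, done) allow resuming mid-stream."""
    data = data + padding(done + len(data))
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state

# Alice sends M together with A = H(K | M); K is shared with Bob only.
K, M = b"secret key", b"amount=10"
A = toy_hash(K + M)

# Eve knows M, A and len(K), but not K.
glue = padding(len(K) + len(M))          # the original padding becomes part of her message
extra = b"&amount=9999"
forged_msg = M + glue + extra
forged_tag = toy_hash(extra, state=A, done=len(K) + len(M) + len(glue))

print(forged_tag == toy_hash(K + forged_msg))  # True: a valid MAC without knowing K
```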
HMAC
• Defined in RFC 2104. Can be used with many dedicated hash functions: HMAC-MD5, HMAC-SHA1, HMAC-RIPEMD.
• The output can be truncated by simply taking the first n bits of output (e.g. HMAC-SHA1-96 is used in the IPsec protocol).
• Uses two constants, ipad (64 0x36 bytes) and opad (64 0x5c bytes).
• Defined as H((K ⊕ opad) | H((K ⊕ ipad) | M)), where K is first padded with zero bytes to the 64-byte block length.
• Only slightly slower than computation of H(M) for long messages.
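The definition can be coded directly in Python and checked against the standard hmac module. The sketch assumes a hash with a 64-byte block (true for MD5, SHA-1 and SHA-256) and follows RFC 2104 in hashing keys longer than the block first; the function name hmac_from_definition is hypothetical.

```python
import hashlib
import hmac

B = 64  # block length in bytes of MD5, SHA-1 and SHA-256
IPAD = bytes([0x36] * B)
OPAD = bytes([0x5C] * B)

def hmac_from_definition(key: bytes, message: bytes, hash_fn=hashlib.sha1) -> bytes:
    """H((K xor opad) | H((K xor ipad) | M)), following RFC 2104."""
    if len(key) > B:
        key = hash_fn(key).digest()     # long keys are hashed first
    key = key.ljust(B, b"\x00")         # then zero-padded to the block length
    inner = hash_fn(bytes(k ^ i for k, i in zip(key, IPAD)) + message).digest()
    return hash_fn(bytes(k ^ o for k, o in zip(key, OPAD)) + inner).digest()

key, msg = b"key", b"The quick brown fox jumps over the lazy dog"
mine = hmac_from_definition(key, msg)
print(mine == hmac.new(key, msg, hashlib.sha1).digest())  # True
print(mine[:12].hex())  # first 96 bits, as in HMAC-SHA1-96
```

Only two extra compression-function calls (one per padded key block) are added on top of hashing M, which is why HMAC is only slightly slower than H(M) for long messages.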
Key generators
• Where do all of the cryptographic keys come from?
• Example: AES needs a 128-bit (16-byte) key, but 16 letters of English contain less than 32 bits of entropy. Directly using a human-understandable key is not a good idea.
• Solution: hash the key first. This way the input key can be of any length! Such long keys are often called passphrases.
• If protocols need random, unpredictable values (nonces), use proper random number generators. These are often based on hash functions.
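A minimal sketch of hashing a passphrase down to an AES-128 key in Python; using SHA-256 truncated to 16 bytes is an assumption, and the function name key_from_passphrase is hypothetical:

```python
import hashlib

def key_from_passphrase(passphrase: str, key_bytes: int = 16) -> bytes:
    """Hash a passphrase of any length down to a fixed-size key (16 bytes for AES-128)."""
    return hashlib.sha256(passphrase.encode("utf-8")).digest()[:key_bytes]

k = key_from_passphrase("Greetings, Professor Falken. Shall we play a game?")
print(len(k) * 8)  # 128
```

Hashing does not add entropy: a short or guessable passphrase still gives a guessable key.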
Pseudorandom Number Generators (PRNGs)
Cautionary tale of the Netscape PRNG in 1995.
• Netscape Navigator 1.1 had the first version of the now-popular SSL protocol. Keys for encryption were generated using a PRNG.
• The PRNG was initialized from time() on program startup, and the subsequent outputs were deterministically derived from this seed.
• Guess the 32-bit time value (which is not a secret; everyone has a clock) and you can predict all future outputs of the PRNG!
• Since the eavesdropper knows the outputs of the PRNG, she knows the keys and can decrypt the traffic, regardless of encryption strength.
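The attack can be simulated in Python. This is only an analogy: Python's random module (a Mersenne Twister) stands in for Netscape's generator, the seed is a hypothetical time() value, and the sketch assumes Eve can test each guess against some PRNG output she has observed. The function names are illustrative.

```python
import random

def session_key(seed: int) -> int:
    """Victim: a 128-bit 'key' from a PRNG seeded only with a time value."""
    return random.Random(seed).getrandbits(128)

def recover_seed(observed_output: int, approx_time: int, window: int = 3600):
    """Eavesdropper: try every time value near approx_time until the output matches."""
    for t in range(approx_time - window, approx_time + window + 1):
        if session_key(t) == observed_output:
            return t
    return None

t_victim = 1_076_457_600          # hypothetical time() value at program startup
key = session_key(t_victim)
print(recover_seed(key, approx_time=t_victim + 120) == t_victim)  # True
```

Searching an hour either side of a rough time estimate takes at most 7201 guesses, no matter how strong the cipher that later uses the key.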
PRNGs (2)
• Most operating systems nowadays have built-in cryptographic random number generators for key generation. On UNIX systems:
~> hexdump /dev/random
0000000 d938 cb3d e578 7525 292d 68e3 0bd6 16c4
0000010 9cbb d6dc c662 9e5b c326 501b [...]
• The randomness is contained in a random state (or pool), and it is constantly stirred by events that the operating system gathers: mouse and keyboard inputs, interrupt timings, network events, etc. Cryptographic hash functions are used to mix the pool (SHA-1 on Linux).
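From Python, the operating system's generator is available through os.urandom and the secrets module (which draws from the same OS source). A minimal sketch; the variable names are illustrative:

```python
import os
import secrets

key = os.urandom(16)           # 128-bit key from the OS cryptographic RNG
nonce = secrets.token_hex(8)   # 8 random bytes rendered as 16 hex characters
print(key.hex(), nonce)
```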
A Simple PRNG Based on a Hash Function
Stir new input data into the state:
State = H(State | counter++ | new input data)
Extract randomness:
Output = H(State | counter++)
.. of course it is good to remember ..
“Anyone who considers arithmetical methods of producing random digitsis, of course, in a state of sin.” – John von Neumann (1951)
.. and to use RNGs if available!
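The two formulas can be sketched as a small Python class. SHA-256 as H, the 32-byte all-zero initial state and the 8-byte counter encoding are all assumptions; the class name HashPRNG is hypothetical.

```python
import hashlib

class HashPRNG:
    """Sketch of State = H(State | counter++ | input) and Output = H(State | counter++)."""

    def __init__(self) -> None:
        self.state = bytes(32)   # assumed initial state: all zeros
        self.counter = 0

    def _next_counter(self) -> bytes:
        """Return the counter as 8 big-endian bytes, then increment it (counter++)."""
        c = self.counter.to_bytes(8, "big")
        self.counter += 1
        return c

    def stir(self, new_input: bytes) -> None:
        """Mix new input data (e.g. event timings) into the state."""
        self.state = hashlib.sha256(self.state + self._next_counter() + new_input).digest()

    def output(self) -> bytes:
        """Extract 32 bytes of output without revealing the state itself."""
        return hashlib.sha256(self.state + self._next_counter()).digest()

rng = HashPRNG()
rng.stir(b"keyboard event at t=123456")  # hypothetical event data
print(rng.output().hex())
```

Because each output includes a fresh counter value, consecutive outputs differ even when no new input has been stirred in; the quality of the output still depends entirely on the unpredictability of the stirred input.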
Digital Signatures
When signing a message using a public-key digital signature algorithm, it is not necessary to sign the message itself. It is sufficient to sign a cryptographic hash (message digest) of the message.
Signing:
Signature = Sign(SHA-1(Message), Private Key)
Verifying:
Verify(SHA-1(Message), Signature, Public Key) = OK/FAIL
Note: the signature algorithm doesn’t even need the message; its hash is sufficient. More on this in the next lecture.