
T-79.159 Cryptography and Data Security

Lecture 4: Hashes and Message Digests

Markku-Juhani O. Saarinen
Helsinki University of Technology

[email protected]

T-79.159 Cryptography and Data Security, 11.02.2004 Lecture 4: Hashes and Message Digests,

Markku-Juhani O. Saarinen


Cryptographic hash functions

• Maps a message M (a bit string of arbitrary length) to a “message digest” X = H(M) of constant length, e.g. 128, 160, or 256 bits.

• Well-known examples: MD5, SHA-1, RIPEMD-160, SHA-256.

• Security requirement 1: One-wayness. Given a digest X, it should be “hard” to find a message M satisfying X = H(M).

• Security requirement 2: Collision resistance. It should be “hard” to find two messages $M_1 \neq M_2$ such that $H(M_1) = H(M_2)$.




UNIX Password authentication

1. User enters a password (key):

    Login: falken
    Password: ******

2. System looks up the user in the /etc/passwd file and finds the corresponding hashed key value and other relevant data:

    falken: cV/h5TT95.pzQ :1085:1085:Prof. Falken

3. The first 2 chars, cV, are the salt. Now the system compares the output of the crypt system call to the encrypted string:

    char *crypt(const char *key, const char *salt);




UNIX Password authentication (2)

• No need to store the key itself, just H(salt || key).

• The password file /etc/passwd can be world-readable! (And often is, although this makes systems more vulnerable to dictionary attacks.)

• Salt slows down dictionary attacks. To check whether some user (from a large group) has a given password, the word has to be hashed with each one of the salts.

• UNIX crypt(3) is one-way, but not really collision resistant. Based on DES. Developed by Robert Morris (Sr.) ca. 1975 – still in use today.
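The idea of storing only H(salt || key) can be sketched as follows. This is a minimal illustration, not crypt(3) itself: it swaps in SHA-256 from Python's hashlib for the DES-based function, and the names `make_entry` and `check_password` are made up for this example.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple


def make_entry(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, H(salt || key)); draw a fresh random salt if none is given."""
    if salt is None:
        # 2 bytes only to mirror the 2-character salt of the slide (assumption).
        salt = os.urandom(2)
    digest = hashlib.sha256(salt + password.encode("utf-8")).digest()
    return salt, digest


def check_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Recompute H(salt || key) and compare it with the stored value."""
    _, candidate = make_entry(password, salt)
    return hmac.compare_digest(candidate, stored)
```

Because the salt is part of the hash input, the same password stored under two different salts gives two unrelated digests, which is what forces a dictionary attacker to hash each word once per salt.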




SHA-1 and MD5 Fingerprints

• How do you know that your system files have not been tampered with (by viruses or trojans installed by intruders)?

• One way is to maintain a database of file fingerprints and compare them to known good values (e.g. www.knowngoods.org).

• Length checking is not sufficient; simple “checksums” won’t be secure enough. One-wayness is clearly a requirement.

• Example: Computing a 128-bit MD5 digest of the Linux kernel:

    $ md5sum /boot/vmlinuz
    95fb55766efa90bfe10c25cd2e9daaa4 /boot/vmlinuz
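The same fingerprint check can be sketched with Python's hashlib. The helper names `file_digest` and `matches_known_good` are assumptions for this example; the file is read in chunks so that it never has to fit in memory.

```python
import hashlib


def file_digest(path: str, algorithm: str = "md5", chunk_size: int = 65536) -> str:
    """Return the hex digest of a file, reading it chunk by chunk."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_known_good(path: str, known_good: str, algorithm: str = "md5") -> bool:
    """Compare a file's fingerprint with a value from a trusted database."""
    return file_digest(path, algorithm) == known_good.strip().lower()
```

A call such as `matches_known_good("/boot/vmlinuz", "95fb55766efa90bfe10c25cd2e9daaa4")` would then reproduce the check in the example above.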




Collision resistance

• What if the software distributor tries to cheat? Could he create a “good” file and a “bad” file (say, with a back-door), such that they have the same digest?

• This is different from one-wayness, since the distributor can create both files (good and bad ones) simultaneously.

• If an n-bit hash is one-way, it takes about $2^n$ effort to find a message M satisfying H(M) = X, given just X.

• Even if an n-bit hash is collision-resistant, it takes no more than about $\sqrt{2^n} = 2^{n/2}$ effort to find two messages $M_1 \neq M_2$ such that $H(M_1) = H(M_2)$. Why?




Birthday paradox

Question:

“How many persons need to be in a room before we can expect two of them to have the same birthday?”




Answer:

23.

Why?




Birthday paradox (2)

n persons make up exactly $n(n-1)/2$ pairs.

Each pair has probability $364/365$ of not having the same birthday. Since these events are very close to being independent, the total probability of no two persons having the same birthday is roughly $(364/365)^{n(n-1)/2}$.

Substituting $n = 23$ we get $(364/365)^{253} \approx 0.499523$.

(So this is not a “paradox” at all.)
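The estimate above is easy to check numerically. The sketch below computes both the pairwise approximation used on the slide and the exact product $\prod_{k=0}^{n-1} (365-k)/365$ for comparison; the function names are assumptions for this example.

```python
def p_no_match_approx(n: int, days: int = 365) -> float:
    """Pairwise approximation: (1 - 1/days) ** (n(n-1)/2)."""
    pairs = n * (n - 1) // 2
    return (1 - 1 / days) ** pairs


def p_no_match_exact(n: int, days: int = 365) -> float:
    """Exact probability that n persons all have distinct birthdays."""
    p = 1.0
    for k in range(n):
        p *= (days - k) / days
    return p


def smallest_group(days: int = 365) -> int:
    """Smallest n for which a shared birthday is more likely than not."""
    n = 1
    while p_no_match_exact(n, days) > 0.5:
        n += 1
    return n
```

Both variants cross the 1/2 threshold at n = 23, so the independence assumption costs very little accuracy here.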




Birthday paradox (3)

More generally: We wish to find n (“number of persons”) as a function of m (“number of days in year”), so that the probability of a match is $\frac{1}{2}$:

$\left(1 - \frac{1}{m}\right)^{n(n-1)/2} = \frac{1}{2}$, taking logs: $\frac{n(n-1)}{2} \ln\left(1 - \frac{1}{m}\right) = -\ln 2$.

When $x > 2$, there is a bound $-\frac{1}{x} - \frac{1}{x^2} < \ln\left(1 - \frac{1}{x}\right) < -\frac{1}{x}$.

We get an approximation $0.7213 \cdot (n^2 - n) \approx m$.

Asymptotically $n = O(\sqrt{m})$.
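As a worked instance of the approximation, one can replace $\ln(1 - 1/m)$ by $-1/m$ and solve the resulting quadratic $n^2 - n - 2m \ln 2 = 0$ for n. The helper name `persons_for_half` is an assumption for this sketch.

```python
import math


def persons_for_half(m: float) -> float:
    """Positive root of n^2 - n - 2*m*ln(2) = 0, i.e. n with match probability ~1/2."""
    return (1 + math.sqrt(1 + 8 * m * math.log(2))) / 2
```

For m = 365 this gives a value just under 23, and for $m = 2^{128}$ it gives about $1.18 \cdot 2^{64}$, in line with $n = O(\sqrt{m})$.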




How to find collisions

The obvious (but very memory-intensive and hence inefficient) algorithm:

• Initialize a table that can hold $\sqrt{n}$ values of x, where n is the number of possible hash values. The table is indexed by the first $\lg \sqrt{n} = \frac{1}{2} \lg n$ bits of H(x).

• For x = 1, 2, 3, ...: Compute H(x) and check if the table position indexed by H(x) already has an entry. If an entry exists (say y), verify the collision H(x) = H(y) and quit. Otherwise just store x in the table position.

This will take about $O(\sqrt{n})$ time and $O(\sqrt{n})$ memory, e.g. if $n = 2^{128}$, roughly $2^{64}$ iterations and memory slots. The memory requirement is the prohibitive factor even if we manage to run the $2^{64}$ steps.
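The table-based search can be sketched on a deliberately weakened hash. Everything here is an assumption for illustration: `H` is SHA-256 truncated to a few bits so that the search finishes quickly, and a Python dict keyed by the full truncated hash stands in for the fixed-size table indexed by a hash prefix.

```python
import hashlib


def H(x: int, bits: int = 24) -> int:
    """Toy hash: the first `bits` bits of SHA-256 over the decimal encoding of x."""
    digest = hashlib.sha256(str(x).encode("ascii")).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def table_collision(bits: int = 24):
    """Return (y, x) with y != x and H(y) == H(x), remembering every hash seen."""
    seen = {}  # maps hash value -> first input that produced it
    x = 0
    while True:
        h = H(x, bits)
        if h in seen:
            return seen[h], x
        seen[h] = x
        x += 1
```

With 24-bit digests the loop typically stops after a few thousand inputs, but the dict grows by one entry per step, which is exactly the memory cost noted above.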



Floyd’s cycle finding algorithm (1)

Consider a sequence where we start from some $x_0$ and iteratively compute a sequence $x_1, x_2, \ldots$ as the hash of the previous value:

$x_{i+1} = H(x_i)$

We have seen that after about $\sqrt{n}$ steps, a collision will probably occur: there will be a pair $x_\alpha$ and $x_\beta$ with $\alpha < \beta$ so that $x_\alpha = x_\beta$ but $x_{\alpha-1} \neq x_{\beta-1}$.

$\alpha$ is the length of the tail leading into the cycle.

$\delta = \beta - \alpha$ is the cycle length.





Here a collision occurs at $x_3 = x_{14}$. Hence “tail” $\alpha = 3$, $\beta = 14$ and cycle length $\delta = \beta - \alpha = 11$.





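• Clearly $x_i = x_{i+\delta}$ when $i \geq \alpha$.

• Hence $x_i = x_{2i}$ when $2i = i + \delta$, i.e. $i = \delta$ (the cycle length), provided $\delta \geq \alpha$; in general this holds whenever $i \geq \alpha$ is a multiple of $\delta$.

Thus we can find the cycle length by starting with $(x_0, x_0)$ and computing $(x_1, x_2), (x_2, x_4), (x_3, x_6), \ldots, (x_i, x_{2i})$, stopping when $x_i = x_{2i}$.

Three hash function invocations are needed in each step. At the end i equals the cycle length $\delta$ (or, when the tail is longer than the cycle, a multiple of it, which serves equally well in what follows).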




Finding the collision

From the previous step, we have $x_\delta$. Now we compute the sequence

$(x_0, x_\delta), (x_1, x_{\delta+1}), (x_2, x_{\delta+2}), \ldots, (x_\alpha, x_{\delta+\alpha})$

i.e. we stop when $H(x_i) = H(x_{\delta+i})$. Two hash function invocations are needed in each step. At the end $i = \alpha - 1$, and hence we have the collision, since $x_i \neq x_{\delta+i}$.

This simple algorithm requires $3\delta + 2\alpha$ invocations of the hash function, which is $O(\sqrt{n})$ and therefore asymptotically optimal. Moreover, the memory requirement is very small!




Collision finding, pseudocode:

1. Initialize: a ← 0, b ← 0.

2. Do: a ← H(a), b ← H(H(b)) until a = b.

3. Set: b ← 0.

4. Do: store (x, y) ← (a, b); a ← H(a), b ← H(b) until a = b.

When the algorithm terminates: H(x) = H(y), but x ≠ y: a collision!
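The four steps can be tried out on a toy hash. In this sketch `H` is SHA-256 truncated to 24 bits (an assumption made only so that the search takes a few thousand steps), and the start value is a parameter rather than the fixed 0. One corner case is handled explicitly: if the start value already lies on the cycle, the tail is empty and no collision comes out, so another start value is tried.

```python
import hashlib

BITS = 24  # toy digest size; a real hash would make this search infeasible


def H(x: int) -> int:
    """Toy hash: the first BITS bits of SHA-256 over an 8-byte encoding of x."""
    digest = hashlib.sha256(x.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - BITS)


def floyd_collision(start: int):
    """Return (x, y) with x != y and H(x) == H(y), or None if the tail is empty."""
    # Steps 1-2: advance a by one step and b by two steps until they meet.
    a = b = start
    while True:
        a = H(a)
        b = H(H(b))
        if a == b:
            break
    # Step 3: restart b from the beginning of the sequence.
    b = start
    if a == b:
        return None  # start value is on the cycle, so there is no tail
    # Step 4: advance both by one step, remembering the previous pair.
    while True:
        x, y = a, b
        a, b = H(a), H(b)
        if a == b:
            return x, y


def find_collision():
    """Try start values 0, 1, 2, ... until one yields a collision."""
    start = 0
    while True:
        result = floyd_collision(start)
        if result is not None:
            return result
        start += 1
```

Only a handful of integers are kept in memory at any time, in contrast to the table-based search.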




Rules of thumb

• As implied by the birthday paradox, there are algorithms that find a collision (birthday match) with $O(\sqrt{m})$ effort, where m is the number of possible hash values. Negligible memory is required by these algorithms.

• Hence to have collision resistance with n-bit security, the hash should be at least 2n bits long; e.g. 128-bit hashes give 64-bit security.

• If only one-wayness is required, then n bits is sufficient for n-bit security.

• Beware that some hash functions (like MD4) have been broken; they do not have the security level implied by their hash size.




How do hash functions actually work?

• Additional design requirement besides one-wayness and collision resistance: it should be possible to hash long messages without storing the whole thing in memory (e.g. signing a backup tape).

• A long message is cut into pieces $M_i$ of equal size and a state variable $X_i$ is maintained.

• The last piece $M_n$ is padded with the length of the message and the final value of the state variable $X_n$ is the hash.

• Many other approaches have been proposed, but almost all practical hash functions work like this.
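The streaming property is visible in the interface of most hash libraries. The sketch below uses Python's hashlib; `hash_stream` is a name made up for this example, and SHA-1 is chosen only because it appears in this lecture.

```python
import hashlib


def hash_stream(chunks, algorithm: str = "sha1") -> str:
    """Hash an iterable of byte chunks while holding only the running state."""
    h = hashlib.new(algorithm)
    for chunk in chunks:
        h.update(chunk)  # folds the chunk into the internal state
    return h.hexdigest()
```

Feeding the pieces `b"a"` and `b"bc"` gives the same digest as hashing `b"abc"` in one call, since only the concatenated byte stream matters.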




Davies-Meyer (1985)

• Use a block cipher E(K, P). Start with some initial value $X_0$ and update as $X_{i+1} = E(M_i, X_i) \oplus X_i$. The final value $X_n$ is the hash.

• Provably secure (if the block cipher is secure).

• Since each piece $M_i$ is used to key the block cipher, hashing speed is directly proportional to key size (rather than block size). The resulting hash size is equal to the block size.

• Most block ciphers are optimized for fast encryption rather than fast key initialization; hence dedicated hash functions. $E(M_i, X_i) \oplus X_i$ is called the “compression function” in the context of these dedicated hash functions.
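The update rule $X_{i+1} = E(M_i, X_i) \oplus X_i$ can be sketched with a small stand-in cipher. Everything below is an illustration under stated assumptions: `toy_encrypt` follows the XTEA round structure (128-bit key, 64-bit block) but has not been checked against official test vectors, the padding rule (0x80, zeros, 64-bit bit length) is one common choice rather than anything prescribed above, and a 64-bit digest is far too short for real use.

```python
import struct

MASK = 0xFFFFFFFF


def toy_encrypt(key: bytes, block: bytes, rounds: int = 32) -> bytes:
    """XTEA-style cipher with a 16-byte key and an 8-byte block (illustration only)."""
    k = struct.unpack(">4I", key)
    v0, v1 = struct.unpack(">2I", block)
    total, delta = 0, 0x9E3779B9
    for _ in range(rounds):
        v0 = (v0 + ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (total + k[total & 3]))) & MASK
        total = (total + delta) & MASK
        v1 = (v1 + ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (total + k[(total >> 11) & 3]))) & MASK
    return struct.pack(">2I", v0, v1)


def pad(message: bytes) -> bytes:
    """Append 0x80, zeros, and the 64-bit bit length, up to a multiple of 16 bytes."""
    padded = message + b"\x80"
    padded += b"\x00" * (-(len(padded) + 8) % 16)
    return padded + struct.pack(">Q", 8 * len(message))


def davies_meyer(message: bytes, iv: bytes = b"\x00" * 8) -> bytes:
    """Iterate X_{i+1} = E(M_i, X_i) XOR X_i over 16-byte pieces M_i."""
    state = iv
    data = pad(message)
    for i in range(0, len(data), 16):
        piece = data[i:i + 16]                 # M_i is used as the cipher key
        encrypted = toy_encrypt(piece, state)  # E(M_i, X_i)
        state = bytes(a ^ b for a, b in zip(encrypted, state))
    return state
```

Each iteration consumes 16 bytes of message (the key size) and produces 8 bytes of state (the block size), which matches the remarks on speed and hash size above.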




Message Digest 5 (MD5)

• Very widely used hash function (message digest). Fingerprints, PGP 2.x, PKI X.509, etc.

• Designed by Ron Rivest (MIT), 1992. Specified in RFC 1321. MD5 means that this is Rivest’s fifth message digest design.

• Produces a 128-bit hash; has no more than 64-bit security. Processes messages in 512-bit blocks.

• Hans Dobbertin (BSI) found a flaw in the compression function of MD5 in 1996; hence its security proofs do not hold. However, collisions have not been computed yet. Do not use in new products.




Secure Hash Algorithm - 1 (SHA-1)

• U.S. / NIST federal standard 180-1/2. Currently the most popular cryptographic hash algorithm.

• Produces a 160-bit hash; 80-bit security. Processes messages in 512-bit blocks. Similar in design to MD4 and MD5.

• Designed by unknown persons at NSA in 1993 (the original design is known as SHA-0). Slightly modified for (then) unspecified reasons in 1995. The new version is known as SHA-1.

• Chabaud and Joux (CASSI/SCY/EC) published in 1998 an attack against SHA-0 (collisions with $2^{61}$ effort rather than $2^{80}$) that showed that SHA-1 was indeed more secure than SHA-0.




SHA - 1 (2)




Other dedicated hash algorithms

• RIPEMD-160 is a robust European hash function. 160-bit hash.

• In 2000, NSA proposed new hash functions that produce 256- and 512-bit hashes. Known as SHA-256 and SHA-512.

• Some speed measurements on a 1.4 GHz AMD Athlon running Linux:

    Algorithm     Speed (kB/s)
    MD2                  5 010
    MD4                274 556
    MD5                238 392
    SHA-1              127 283
    RIPEMD-160          84 896




Message Authentication Codes (MACs)

• Protects against unauthorized or accidental message manipulation.

• Uses a secret key K to make sure that a message is actually from its assumed sender. The MAC is appended to the message. The recipient computes the MAC again from the message and K and verifies it.

• It seems natural to use dedicated hash functions for computation of MACs (fast!), especially if encryption isn’t needed.

• Many MACs have been proposed, the most common being HMAC (“hash MAC”), Krawczyk et al. (IBM), 1997.




A Stupid MAC

Question: “Hey! Why not just append the message after the key, hash the whole thing and use that as a MAC?” (i.e. A = H(K | M))

Answer: Eve sees the message M and the MAC A. Because of the way the Davies-Meyer mode works, she has the state of the hash function $X_n = A$ at the end of the current message M. Now she can just append anything after that (following the original padding) and compute more iterations $X_{n+1}, X_{n+2}, \ldots$ with the compression function, and finally do a new padding.

A MAC must detect changes in the message length as well!
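Eve's forgery can be demonstrated on a toy iterated hash. This is a sketch under loud assumptions: `compress` is simply truncated SHA-256 standing in for a real compression function, the padding rule (0x80, zeros, 64-bit length) is chosen for the example, and all function names are hypothetical. The structure of the attack is what carries over to real iterated hashes.

```python
import hashlib
import struct

BLOCK = 64  # bytes per message piece


def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function (assumption): truncated SHA-256 of state || block."""
    return hashlib.sha256(state + block).digest()[:16]


def padding(total_len: int) -> bytes:
    """0x80, zeros, then the 64-bit byte length, ending on a BLOCK boundary."""
    zeros = -(total_len + 9) % BLOCK
    return b"\x80" + b"\x00" * zeros + struct.pack(">Q", total_len)


def iterate(state: bytes, data: bytes) -> bytes:
    """Run the compression function over data, one BLOCK at a time."""
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state


def toy_hash(message: bytes) -> bytes:
    return iterate(b"\x00" * 16, message + padding(len(message)))


def stupid_mac(key: bytes, message: bytes) -> bytes:
    """The flawed construction A = H(K | M)."""
    return toy_hash(key + message)


def extend(mac: bytes, key_len: int, message: bytes, suffix: bytes):
    """Forge a MAC for message || padding || suffix from the MAC and key length only."""
    glue = padding(key_len + len(message))       # the padding the sender's hash used
    forged_message = message + glue + suffix
    total = key_len + len(forged_message)
    forged_mac = iterate(mac, suffix + padding(total))  # resume from state X_n = A
    return forged_message, forged_mac
```

The forger never touches the key: `extend` resumes the iteration from the published MAC, and the receiver's own `stupid_mac(key, forged_message)` comes out equal to `forged_mac`.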




HMAC

• Defined in RFC 2104. Can be used with many dedicated hash functions: HMAC-MD5, HMAC-SHA1, HMAC-RIPEMD.

• The output can be truncated by simply taking the first n bits of output (e.g. HMAC-SHA1-96 is used in the IPsec protocol).

• Uses two constants, ipad (the byte 0x36 repeated 64 times) and opad (the byte 0x5c repeated 64 times).

• Defined as H(K ⊕ opad | H(K ⊕ ipad | M)), where K is first padded with zero bytes to 64 bytes.

• Only slightly slower than computation of H(M) for long messages.
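The definition translates almost line by line into code. The sketch below spells out HMAC-SHA1 with hashlib so that the two passes are visible; in practice one would call Python's standard `hmac` module instead. The function names are made up for this example.

```python
import hashlib


def hmac_sha1(key: bytes, message: bytes) -> bytes:
    """H(K xor opad | H(K xor ipad | M)) with H = SHA-1 and 64-byte blocks."""
    block_size = 64
    if len(key) > block_size:
        key = hashlib.sha1(key).digest()   # RFC 2104: long keys are hashed first
    key = key.ljust(block_size, b"\x00")   # pad K with zero bytes to 64 bytes
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha1(ipad + message).digest()
    return hashlib.sha1(opad + inner).digest()


def hmac_sha1_96(key: bytes, message: bytes) -> bytes:
    """Truncated variant: the first 96 bits (12 bytes) of the HMAC-SHA1 output."""
    return hmac_sha1(key, message)[:12]
```

The inner hash processes the whole message once; the outer hash only covers one key block plus a 20-byte digest, which is why the overhead is small for long messages.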




Key generators

• Where do all of the cryptographic keys come from?

• Example: AES needs a 128-bit (16 byte) key, but 16 letters of English contain less than 32 bits of entropy: directly using a human-understandable key is not a good idea.

• Solution: hash the key first. This way the input key can be of any length! Such long keys are often called passphrases.

• If protocols need random, unpredictable values (nonces), use proper random number generators. These are often based on hash functions.
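Hashing a passphrase down to a fixed-size key might look like the sketch below. The choice of SHA-256 truncated to 16 bytes and the name `key_from_passphrase` are assumptions for this example; note that hashing spreads the entropy of the passphrase over the key but cannot add any.

```python
import hashlib


def key_from_passphrase(passphrase: str, key_bytes: int = 16) -> bytes:
    """Derive a fixed-length key (16 bytes for AES-128) from a passphrase of any length."""
    digest = hashlib.sha256(passphrase.encode("utf-8")).digest()
    return digest[:key_bytes]
```

A long passphrase such as a full sentence then yields a 128-bit key, whatever its length.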




Pseudorandom Number Generators (PRNGs)

Cautionary tale of the Netscape PRNG in 1995.

• Netscape Navigator 1.1 had the first version of the now-popular SSL protocol. Keys for encryption were generated using a PRNG.

• The PRNG was initialized from time() on program startup and the subsequent outputs were deterministically based on this seed.

• Guess the 32-bit time value (which is not a secret; everyone has a clock) and you can predict all future outputs of the PRNG!

• Since the eavesdropper knows the outputs of the PRNG, she knows the keys and she can eavesdrop, regardless of encryption strength.
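The weakness is easy to reproduce in miniature. The sketch below does not model Netscape's actual generator: it uses Python's `random.Random` (a Mersenne Twister) seeded with a clock value as a stand-in, and the function names are invented for this example.

```python
import random


def make_key(seed_time: int) -> int:
    """Victim: seed a non-cryptographic PRNG with the clock and draw a 128-bit key."""
    return random.Random(seed_time).getrandbits(128)


def recover_seed(observed_key: int, earliest: int, latest: int):
    """Attacker: try every clock value in a plausible window until the key matches."""
    for guess in range(earliest, latest + 1):
        if make_key(guess) == observed_key:
            return guess
    return None
```

If the attacker knows the start time to within an hour, at most 3601 guesses are needed, although the key itself is 128 bits long.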




PRNGs (2)

• Most OS’s nowadays have built-in cryptographic random number generators for key generation. On UNIX systems:

    ~> hexdump /dev/random
    0000000 d938 cb3d e578 7525 292d 68e3 0bd6 16c4
    0000010 9cbb d6dc c662 9e5b c326 501b [...]

• The randomness is contained in a random state (or pool) and it is constantly stirred by events that the operating system gathers: mouse and keyboard inputs, interrupt timings, network events etc. Cryptographic hash functions are used to mix the pool (SHA-1 on Linux).
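From a program, the operating system's generator is usually reached through a library call rather than by reading the device file. A minimal sketch in Python, with helper names chosen for this example:

```python
import os
import secrets


def new_key(n_bytes: int = 16) -> bytes:
    """Draw key material from the operating system's cryptographic RNG."""
    return secrets.token_bytes(n_bytes)


def new_nonce_hex(n_bytes: int = 16) -> str:
    """Same source via os.urandom, returned as a hex string."""
    return os.urandom(n_bytes).hex()
```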




A Simple PRNG Based on a Hash Function

Stir new input data to state:

    State = H(State | counter++ | new input data)

Extract randomness:

    Output = H(State | counter++)

.. of course it is good to remember ..

“Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin.” – John von Neumann (1951)

.. and to use RNGs if available!
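The two operations can be sketched as a small class. The choices below are assumptions made for illustration: SHA-1 as H, an all-zero initial state, and an 8-byte big-endian counter encoding. As the slide says, a system RNG is preferable when one is available.

```python
import hashlib
import struct


class HashPRNG:
    """Toy PRNG: State = H(State | counter++ | input), Output = H(State | counter++)."""

    def __init__(self, seed: bytes = b""):
        self.state = b"\x00" * 20
        self.counter = 0
        self.stir(seed)

    def _next_counter(self) -> bytes:
        """Return the current counter as 8 bytes, then increment it."""
        value = struct.pack(">Q", self.counter)
        self.counter += 1
        return value

    def stir(self, data: bytes) -> None:
        """Mix new input data into the state."""
        self.state = hashlib.sha1(self.state + self._next_counter() + data).digest()

    def extract(self) -> bytes:
        """Produce 20 output bytes without revealing the state itself."""
        return hashlib.sha1(self.state + self._next_counter()).digest()
```

Because the counter changes on every call, two consecutive `extract()` calls give different outputs even if nothing new has been stirred in.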




Digital Signatures

When signing a message using a public key digital signature algorithm, it is not necessary to sign the message itself. It is sufficient to sign a cryptographic hash (message digest) of the message.

Signing:

    Signature = Sign(SHA-1(Message), Private Key)

Verifying:

    Verify(SHA-1(Message), Signature, Public Key) = OK/FAIL

Note: the signature algorithm doesn’t even need the message; its hash is sufficient. More on this in the next lecture.
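To make the hash-then-sign pattern concrete, here is a toy sketch that plugs SHA-1 into unpadded "textbook" RSA. Everything about the signature scheme is an assumption for illustration: the lecture does not fix an algorithm at this point, the two Mersenne primes give a modulus of under 200 bits, and no padding is applied, so this must not be mistaken for a secure scheme.

```python
import hashlib

# Toy key pair (illustration only): two Mersenne primes, far too small for real use.
P = 2**89 - 1
Q = 2**107 - 1
N = P * Q                            # public modulus, larger than any 160-bit digest
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)


def digest_as_int(message: bytes) -> int:
    """SHA-1(Message), interpreted as an integer below N."""
    return int.from_bytes(hashlib.sha1(message).digest(), "big")


def sign(message: bytes) -> int:
    """Signature = Sign(SHA-1(Message), Private Key)."""
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Verify(SHA-1(Message), Signature, Public Key)."""
    return pow(signature, E, N) == digest_as_int(message)
```

Only the 160-bit digest ever enters the modular exponentiation, regardless of how long the message is.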


