hash function - Politecnico di...

Hash Function


Definition

• Secure hash functions (aka message digest functions) are intended to provide proof of data integrity, by providing a verifiable fingerprint of the data
• A one-way hash function H operates on an arbitrary-length input message M, returning a fixed-length output h=H(M)

The important properties are:

1. Given M, easy to compute h=H(M)
2. Given h, hard to compute M such that h=H(M) -- "one-way", or "pre-image resistant"
3. Given M, hard to find M' (different from M) such that H(M)=H(M') -- "second-preimage resistant"
4. (Not always satisfied) Hard to find M,M' such that H(M)=H(M') -- "collision resistant"

• Note that 4 implies 3 (i.e. if we could solve 3 we could solve 4), but not conversely. Also, 3 implies 2 (at least when the inputs are much longer than the digest), but not conversely.
• The strange thing about hash functions is that there are typically billions of collisions, or perhaps infinitely many (if the hash function really does take arbitrary-length input; most have some huge limit). But it is computationally hard to find a single one.
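A small sketch of the definition in Python, assuming SHA-256 from the standard hashlib module stands in for H (the slides do not fix a particular function at this point). Inputs of very different length give digests of the same length, and changing a single character of M gives an unrelated-looking h.

```python
import hashlib

def H(message: bytes) -> bytes:
    """Stand-in for H: SHA-256 (an assumption, chosen only for illustration)."""
    return hashlib.sha256(message).digest()

def differing_bits(x: bytes, y: bytes) -> int:
    """Count the bit positions in which two equal-length digests differ."""
    return sum(bin(a ^ b).count("1") for a, b in zip(x, y))

h_short = H(b"a")                # 1-byte message
h_long = H(b"a" * 1_000_000)     # 1 MB message
h_near = H(b"b")                 # differs from the first message in one character

# Both digests have the same fixed length, whatever the input length.
print(len(h_short) * 8, len(h_long) * 8)
# A tiny change in the input changes a large share of the output bits.
print(differing_bits(h_short, h_near), "of 256 bits differ")
```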

Examples of usage

• unix passwords -- the system doesn't store the password p but only its digest h(p). Even if an attacker breaks into the computer and steals the password file, it is useless to him
– Which properties 1-4 does this scheme require?
• integrity of downloaded files: when I download a file (say some software), I would like to check that I have received it correctly, i.e.
– it wasn't accidentally or maliciously corrupted in transit
– it wasn't accidentally or maliciously corrupted on the server
To achieve this, the server can offer me H(f) corresponding to the file f. I can compute H(f') on the file f' I have received. If they match, I can assume I have the correct file
• fingerprints of public keys. Public keys are generally large (e.g., minimum 1024 bits, which is 128 bytes). Checking their integrity with H(k) is easier than checking k itself
• Signing messages. Sign the hash instead
• Commitments. Alice can commit to a value v by sending H(v,r) where r is some new random value. To prove she committed to v, she reveals r (together with v)
• File identification in peer-to-peer networks, video references on youtube, etc
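The commitment example can be sketched as follows. This is a minimal illustration, not a vetted protocol: SHA-256 stands in for H, and the way v and r are encoded into one input (a length prefix followed by v and r) is an assumption, since the slides only write H(v,r).

```python
import hashlib
import hmac
import secrets

def _encode(value: bytes, r: bytes) -> bytes:
    # Hypothetical encoding of the pair (v, r): the length prefix keeps
    # different (v, r) pairs from producing the same byte string.
    return len(value).to_bytes(8, "big") + value + r

def commit(value: bytes):
    """Alice: return the commitment H(v, r) and the fresh random r."""
    r = secrets.token_bytes(32)
    return hashlib.sha256(_encode(value, r)).digest(), r

def check(commitment: bytes, value: bytes, r: bytes) -> bool:
    """Bob: recompute H(v, r) once Alice reveals v and r."""
    expected = hashlib.sha256(_encode(value, r)).digest()
    return hmac.compare_digest(commitment, expected)

c, r = commit(b"heads")          # Alice sends c, keeps r secret
print(check(c, b"heads", r))     # opening with the committed value
print(check(c, b"tails", r))     # opening with a different value
```

The random r hides v from Bob before the opening (he cannot simply hash every candidate v), while second-preimage and collision resistance stop Alice from opening the commitment to a different value.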

Brute force attacks on hash functions

• The Birthday "paradox" is the surprising answer to this question: how many people must I gather in a room in order to have a probability >0.5 that two of them share the same birthday? (We assume that the birthdays are distributed randomly.) The answer is lower than one might guess: 23.
• Compare that with this question: how many people must I gather in a room in order to have a probability >0.5 that one of them shares the same birthday as me? The answer is much greater: 253
• That is because with 23 people in the room, there are 253 different pairs of people present!
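Both numbers can be checked with a few lines of Python. The sketch below assumes 365 equally likely birthdays; the function names are illustrative.

```python
def p_shared_pair(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a birthday."""
    p_all_different = 1.0
    for i in range(n):
        p_all_different *= (days - i) / days
    return 1.0 - p_all_different

def p_match_me(n: int, days: int = 365) -> float:
    """Probability that at least one of n people has my birthday."""
    return 1.0 - ((days - 1) / days) ** n

def smallest_n(probability) -> int:
    """Smallest n for which probability(n) exceeds 0.5."""
    n = 1
    while probability(n) <= 0.5:
        n += 1
    return n

print(smallest_n(p_shared_pair))   # 23
print(smallest_n(p_match_me))      # 253
print(23 * 22 // 2)                # 253 pairs among 23 people
```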

Brute force attacks on hash functions

Suppose there are N possible hash values from a set of strings X, and suppose that the output of a hash function is randomly distributed in this space. Take a subset of n strings.

• How big does n have to be in order to have a probability >0.5 of some string in that subset having a given hash value? A first rough answer is that n must be of the order of N. A more refined answer: the probability that none of the n strings hits the given value is (1-1/N)^n, approximately e^(-n/N); setting this to 0.5 gives n = (ln 2)*N (for a large N).

n.b.: for a 128 bit hash function, you need to test about 2^128 inputs (approximately 10^38) to get a 0.5 chance of pre-imaging the hash, that is to say, of getting a given hash value.

• How big does n have to be in order to have a probability >0.5 of two strings in that set having the same hash value? The probability of no duplicate is:

(N-1)/N * (N-2)/N * ... * (N-n+1)/N
= (1-1/N) * (1-2/N) * ... * (1-(n-1)/N)
< e^(-1/N) * e^(-2/N) * ... * e^(-(n-1)/N)
= e^(-n(n-1)/(2N))

The middle inequality comes from 1-x < e^(-x). Setting this to be 0.5, approximating n(n-1) as n^2 and solving for n gives

n=sqrt(2*(ln 2)*N)

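• To try to put these numbers into perspective: 10^19 microseconds is 317 000 years, while 10^38 microseconds is about 3*10^24 years (these are roughly the collision and pre-image efforts for a 128-bit hash, at one test per microsecond)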
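The two estimates can be evaluated numerically, and the birthday bound can be watched in action on a deliberately weakened hash. In this sketch the "weak" hash is SHA-256 truncated to 24 bits (an assumption made so that the search finishes in a fraction of a second); the same loop against a full-size digest would be infeasible.

```python
import hashlib
import math

def preimage_trials(bits: int) -> float:
    """n = (ln 2) * N trials for a 0.5 chance of hitting a given value."""
    return math.log(2) * 2 ** bits

def collision_trials(bits: int) -> float:
    """n = sqrt(2 * (ln 2) * N) trials for a 0.5 chance of a collision."""
    return math.sqrt(2 * math.log(2) * 2 ** bits)

def truncated_hash(message: bytes, bits: int) -> int:
    """Top `bits` bits of SHA-256, used here as a toy hash with N = 2**bits."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") >> (256 - bits)

def find_collision(bits: int = 24):
    """Hash the strings "0", "1", "2", ... until two truncated digests coincide."""
    seen = {}
    counter = 0
    while True:
        message = str(counter).encode()
        tag = truncated_hash(message, bits)
        if tag in seen:
            return seen[tag], message, counter + 1
        seen[tag] = message
        counter += 1

print(f"128-bit pre-image: about {preimage_trials(128):.2e} trials")
print(f"128-bit collision: about {collision_trials(128):.2e} trials")
m1, m2, trials = find_collision(24)
print(f"24-bit collision between {m1!r} and {m2!r} after {trials} trials")
print(f"birthday estimate for 24 bits: about {collision_trials(24):.0f} trials")
```

The number of trials actually needed varies from run to run of the experiment design (it depends on which inputs are tried), but it stays in the neighbourhood of sqrt(N) rather than N.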

Is collision-resistance necessary?

• One might argue that md5 is still secure for use where only properties 1-3 are required. Many of the applications that use cryptographic hashes, such as password storage or document signing, are in principle only minimally affected by a collision attack. In the case of document signing, for example, an attacker could not simply fake a signature from an existing document -- the attacker would have to fool the private key holder into signing a preselected document. Reversing password hashing (e.g. to obtain a password to try against a user's account elsewhere) does not require collision resistance. Constructing a password that hashes to a given value requires a preimage attack.

• Here are two reasons for believing that collision-resistance is important:

1. Ways of finding collisions can often easily be adapted for finding a useful collision. For example,

– Alice prepares two versions of a contract. C1 is favorable to her, C2 favorable to Bob.

– She makes several unnoticeable changes to each of C1 and C2, and computes the hash of each. The changes are things like extra space at end of line, or double space between words. By making or not making a single change to each of 32 lines, she could easily produce 2^32 versions.

– She compares the hash value of each variation of C1 with each variation of C2, until she finds a collision h(C1')=h(C2').

– She gets Bob to sign C2', and then later claims he signed C1'.

2. For some hash functions h, a random collision h(m1)=h(m2) can be made into a potentially more useful collision by prepending a fixed message p, and/or appending a fixed message q. In other words, h(m1)=h(m2) may imply h(p.m1)=h(p.m2), or h(m1.q)=h(m2.q), or even h(p.m1.q)=h(p.m2.q). For a dramatic application of this idea, read on.
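Alice's attack from reason 1 can be sketched in Python. To keep it runnable in well under a second, the sketch makes two loud assumptions: the hash is SHA-256 truncated to 16 bits, and each contract has only 12 lines (so 2^12 variants each instead of 2^32). The contract texts and all names are invented for the illustration.

```python
import hashlib
from itertools import product

def weak_hash(text: str) -> bytes:
    """Toy 16-bit hash: the first 2 bytes of SHA-256 (assumption)."""
    return hashlib.sha256(text.encode()).digest()[:2]

def variants(lines):
    """Every version obtained by adding, or not, a trailing space to each line."""
    for pads in product(("", " "), repeat=len(lines)):
        yield "\n".join(line + pad for line, pad in zip(lines, pads))

def find_contract_collision(c1_lines, c2_lines):
    """Return (C1', C2') with weak_hash(C1') == weak_hash(C2'), or None."""
    table = {weak_hash(v): v for v in variants(c1_lines)}
    for candidate in variants(c2_lines):
        match = table.get(weak_hash(candidate))
        if match is not None:
            return match, candidate
    return None

# Hypothetical contracts: C1 favours Alice, C2 favours Bob.
c1 = [f"Clause {i}: Bob pays Alice 1000 euro." for i in range(12)]
c2 = [f"Clause {i}: Alice pays Bob 1000 euro." for i in range(12)]

pair = find_contract_collision(c1, c2)
if pair is not None:
    c1_variant, c2_variant = pair
    print(weak_hash(c1_variant).hex(), weak_hash(c2_variant).hex())
```

With 2^12 variants on each side there are 2^24 pairs to compare against only 2^16 possible hash values, so a match is expected almost immediately. Against a 128-bit hash the same idea needs about 2^64 variants per side, which is the birthday bound again.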

Merkle–Damgård construction

• This is one of the most used constructions

• It relies on the use of a compression function, usually indicated as f

• f takes as inputs (1) part of the input message and (2) a chaining value to generate another chaining value as output

Fundations of Cryptography - hash function pp. 7 / 25

Generic Structure of a Hash Function

[Figure: chain of HASH function blocks. The first block takes the initial value and the padded message; each block outputs a temporary digest that feeds the next block; the last block outputs the digest.]

somewhat similar to a generic secret key algorithm (except for the key)
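The chaining structure can be sketched in Python as a toy Merkle–Damgård hash. Everything specific here is an assumption made for illustration: the compression function f is faked with truncated SHA-256, the block size is 64 bytes, the chaining value is 16 bytes, and the initial value is all zeros. It shows the data flow only and is not a secure design.

```python
import hashlib

BLOCK_SIZE = 64          # bytes of message consumed per step (assumption)
CHAIN_SIZE = 16          # bytes of chaining value / digest (assumption)
INITIAL_VALUE = bytes(CHAIN_SIZE)

def f(chaining_value: bytes, block: bytes) -> bytes:
    """Stand-in compression function: (chaining value, block) -> chaining value."""
    return hashlib.sha256(chaining_value + block).digest()[:CHAIN_SIZE]

def pad(message: bytes) -> bytes:
    """Append 0x80, zeros, and the 64-bit message length in bits."""
    padded = message + b"\x80"
    padded += b"\x00" * ((BLOCK_SIZE - 8 - len(padded)) % BLOCK_SIZE)
    return padded + (8 * len(message)).to_bytes(8, "big")

def md_hash(message: bytes) -> bytes:
    """Feed the padded message through f one block at a time."""
    state = INITIAL_VALUE                      # initial value
    padded = pad(message)
    for offset in range(0, len(padded), BLOCK_SIZE):
        state = f(state, padded[offset:offset + BLOCK_SIZE])  # temporary digest
    return state                               # final digest

print(md_hash(b"abc").hex())
print(md_hash(b"abd").hex())
```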


MD5 and SHA-1

• MD5 and SHA-1 are the two most used hash functions today
• both process the message in blocks of 512 bits
• the message is justified to a 512-bit boundary through padding (padding is always appended, even if the message length is already a multiple of 512 bits, because it also encodes the message length)
• then the message is processed one block at a time
• MD5 is considered not secure, particularly for the collision resistance
• a new study on the weaknesses of SHA-1 has been presented at Crypto 2005 (and has proved to be effective)
• new standard to replace SHA-1 has been approved:

– SHA-2 with digest length equal to 224, 256, 384 or 512 bits
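As a sketch of the padding step: in the MD5 and SHA-1 specifications the padding is a single 1 bit, then zero bits, then the 64-bit message length, so that the total is a multiple of 512 bits. The helper name md_pad is hypothetical; the two functions differ only in the byte order of the length field (little-endian for MD5, big-endian for SHA-1). The last loop reads the SHA-2 digest lengths from the standard hashlib module.

```python
import hashlib

def md_pad(message: bytes, byteorder: str) -> bytes:
    """Pad to a multiple of 64 bytes (512 bits): 0x80, zeros, 64-bit bit length."""
    bit_length = (8 * len(message)) % 2 ** 64
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    return padded + bit_length.to_bytes(8, byteorder)

# MD5-style padding ("little"); SHA-1 would use "big".
for size in (0, 55, 56, 64):
    print(size, "->", len(md_pad(b"a" * size, "little")), "bytes")

# Digest lengths of the SHA-2 family, as reported by hashlib.
for name in ("sha224", "sha256", "sha384", "sha512"):
    print(name, hashlib.new(name).digest_size * 8, "bits")
```

Messages of 0 and 55 bytes fit in one 64-byte block, while 56 and 64 bytes need a second block because the 0x80 byte and the 8-byte length no longer fit in the first one.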


MD5 hash function

• MD5 (Message Digest – version 5) is a simple and common hash algorithm (PGP uses it for message signatures)
• it is an instance of the general hash model
• the size of the digest of MD5 is 128 bits
• the digest of MD5 is composed of four 32-bit words, denoted A, B, C and D
• the initial value has been fixed by the designer of MD5 (Ron Rivest)
• MD5 can be implemented in SW and HW


MD5 structure

• the algorithm is composed of 4 steps
• each step is composed of 16 core operations (total 64 core operations)
• the core operation takes as input:
– the four digest words of the temporary digest computed in the previous step
– one more 32-bit word of the 512-bit message block
– and one constant 32-bit word
• the core operation permutes the four digest words and modifies one of them (see next)


MD5 core operation

[Figure: MD5 core operation, with input words A, B, C, D, function F, message word m_j, constant T_k, left rotation << S_p, additions (+), and output words A, B, C, D]

• A, B, C and D are the four digest words
• the core operation is parameterized by the following additional inputs
– m_j the j-th 32-bit word of the input message
– T_k a set of constants
– S_p the number of left rotations
where indices k and p are updated depending on the step and core operation
• all the internal additions are modulo 2^32
• function F changes at every step:
– step 1 F = (B and C) or (not B and D)
– step 2 F = (B and D) or (C and not D)
– etc …
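For reference, here is a compact Python sketch of MD5 built around the core operation. The slides only give F for steps 1 and 2; the functions for steps 3 and 4, the rotation amounts, the message word order, the sine-derived constants and the initial value are filled in from the published MD5 specification (RFC 1321), not from the slides. It is meant for reading, not for production use.

```python
import math
import struct

MASK = 0xFFFFFFFF
# S_p: rotation amounts, four per step, each group repeated four times.
S = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
# T_k: 64 constants, floor(abs(sin(k + 1)) * 2^32).
T = [int(abs(math.sin(k + 1)) * 2 ** 32) & MASK for k in range(64)]

def rotl(x: int, s: int) -> int:
    """Rotate a 32-bit word left by s bits."""
    return ((x << s) | (x >> (32 - s))) & MASK

def md5(message: bytes) -> bytes:
    a0, b0, c0, d0 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476  # initial value
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += struct.pack("<Q", (8 * len(message)) % 2 ** 64)
    for offset in range(0, len(padded), 64):
        m = struct.unpack("<16I", padded[offset:offset + 64])  # 16 message words
        a, b, c, d = a0, b0, c0, d0
        for i in range(64):
            if i < 16:                       # step 1
                f, j = (b & c) | (~b & d), i
            elif i < 32:                     # step 2
                f, j = (b & d) | (c & ~d), (5 * i + 1) % 16
            elif i < 48:                     # step 3
                f, j = b ^ c ^ d, (3 * i + 5) % 16
            else:                            # step 4
                f, j = c ^ (b | ~d), (7 * i) % 16
            # Core operation: all additions are modulo 2^32.
            total = (a + f + m[j] + T[i]) & MASK
            a, d, c = d, c, b                # permute the digest words
            b = (b + rotl(total, S[i])) & MASK
        a0, b0 = (a0 + a) & MASK, (b0 + b) & MASK
        c0, d0 = (c0 + c) & MASK, (d0 + d) & MASK
    return struct.pack("<4I", a0, b0, c0, d0)

print(md5(b"abc").hex())   # 900150983cd24fb0d6963f7d28e17f72
```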


MD5 security

• Collisions for MD5 were announced on 17 August 2004 at CRYPTO 2004, by Xiaoyun Wang, Dengguo Feng, Xuejia Lai and Hongbo Yu. Their work is a combination of theoretical analysis (based on differential cryptanalysis) which reduces the computational problem, and brute force attack on the reduced computational problem, and they received a standing ovation for their work. Later, on 1 March 2005, Arjen Lenstra, Xiaoyun Wang, and Benne de Weger demonstrated the construction of two X.509 certificates with different public keys and the same MD5 hash, a demonstrably practical collision. The construction included private keys for both public keys.
• Collision resistance
– A collision can be found with an effort of 2^24, much lower than the 2^64 expected
• Pre-image resistance
– It has been shown that a preimage can be found with complexity of 2^123 (see before)


SHA-1

• SHA-1 (Secure Hash Algorithm – version 1) is a popular hash function as well
• SHA-1 is a derivation of MD5
• SHA-1 was developed by NSA for NIST
• a previous version of SHA (named SHA-0) had been preliminarily published
• but SHA-0 was soon retired by NSA due to hidden faults -- never clearly explained!


SHA-1 structure

• SHA-1 uses five 32-bit temporary variables and outputs a longer digest than MD5: 160 bits
• MD5 takes only 64 iterations to process 16 message words
• SHA-1 takes 80 iterations to process 16 message words
• SHA-1 is composed of four steps
• each step is composed of 20 applications of the core operation (see next)

SHA-1 core operation

generalisation of MD5 core

[Figure: SHA-1 core operation, with input words A, B, C, D, E, function F, message word W_j, constant K_i, left rotations << 5 and << 30, additions (+), and five output words]
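A compact Python sketch of SHA-1 shows how the core operation is applied 80 times per block. The details that the slides do not list (the four step functions, the constants K_i, the initial value and the expansion of 16 message words into 80 words W_j) are taken from the published SHA-1 specification (FIPS 180). It is meant for reading, not for production use.

```python
import struct

MASK = 0xFFFFFFFF

def rotl(x: int, s: int) -> int:
    """Rotate a 32-bit word left by s bits."""
    return ((x << s) | (x >> (32 - s))) & MASK

def sha1(message: bytes) -> bytes:
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += struct.pack(">Q", (8 * len(message)) % 2 ** 64)
    for offset in range(0, len(padded), 64):
        # Expand the 16 message words of the block into 80 words W_j.
        w = list(struct.unpack(">16I", padded[offset:offset + 64]))
        for j in range(16, 80):
            w.append(rotl(w[j - 3] ^ w[j - 8] ^ w[j - 14] ^ w[j - 16], 1))
        a, b, c, d, e = h
        for j in range(80):
            if j < 20:                       # step 1
                f, k = (b & c) | (~b & d), 0x5A827999
            elif j < 40:                     # step 2
                f, k = b ^ c ^ d, 0x6ED9EBA1
            elif j < 60:                     # step 3
                f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
            else:                            # step 4
                f, k = b ^ c ^ d, 0xCA62C1D6
            # Core operation: additions modulo 2^32, rotations by 5 and 30.
            new_a = (rotl(a, 5) + f + e + k + w[j]) & MASK
            a, b, c, d, e = new_a, a, rotl(b, 30), c, d
        h = [(x + y) & MASK for x, y in zip(h, (a, b, c, d, e))]
    return struct.pack(">5I", *h)

print(sha1(b"abc").hex())   # a9993e364706816aba3e25717850c26c9cd0d89d
```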

Uses of Hash Functions

• hash functions are components of more complex cryptographic algorithms and protocols

• main uses of hash functions are the following:
– DSA Digital Signature
– MAC Message Authentication Code
– KDF Key Derivation Function

• but in general a hash function (of some type) is used whenever one needs to compress a msg


DSA

• DSA (digital signature algorithm) is a protocol to authenticate a message M

• DSA is based on public key cryptography:
– the signature of M is computed with the secret key
– the signature of M is verified with the public key

• actually, instead of computing the signature of the whole message M, only the digest H(M) of M is signed, as it is much shorter
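The "sign the digest" idea can be sketched in Python. Note the swap: this is not DSA but textbook RSA used as a stand-in, because it fits in a few lines. The key is a toy built from two well-known Mersenne primes and there is no signature padding, so the sketch is insecure by construction; all names are illustrative.

```python
import hashlib

# Toy key material (assumption): far too small and too structured for real use.
P = 2 ** 61 - 1
Q = 2 ** 89 - 1
N = P * Q                                   # public modulus
E = 65537                                   # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))           # secret exponent (needs Python 3.8+)

def digest_as_int(message: bytes) -> int:
    """H(M) as an integer, reduced to fit the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """Signature computed with the secret key, on H(M) instead of M."""
    return pow(digest_as_int(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Signature verified with the public key."""
    return pow(signature, E, N) == digest_as_int(message)

contract = b"Alice pays Bob 1000 euro. " * 10_000   # long message, short digest
signature = sign(contract)
print(verify(contract, signature))
print(verify(contract + b"0", signature))
```

The cost of signing does not grow with the size of M beyond the cost of hashing it, which is the point made on the slide. It also shows why collision resistance matters here: two messages with the same digest share the same signature.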


HMAC

• MAC (message authentication code) is a piece of information that authenticates the message (it is an alternative to using digital signature)

• HMAC is a MAC based on a hash function H
• the two entities must already have a secret key
• the HMAC of msg M is generated as follows:

HMAC (M) = H ( secret key || H ( secret key || M ) )

where operator || indicates concatenation

• the security of HMAC is based on two aspects:
– the difficulty of finding a message that maps on a given digest
– and the secrecy of the key

• the two occurrences of the secret key in the above formula are padded in two different ways
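The formula, including the two different paddings of the key, can be written out in Python and compared with the standard library. The sketch assumes SHA-256 as H; the constants 0x36 and 0x5C and the 64-byte block size come from the HMAC specification (RFC 2104), not from the slides.

```python
import hashlib
import hmac

BLOCK_SIZE = 64   # block size of SHA-256 in bytes

def hmac_sha256(key: bytes, message: bytes) -> bytes:
    """HMAC(M) = H(outer key || H(inner key || M)), written out by hand."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()        # long keys are hashed first
    key = key.ljust(BLOCK_SIZE, b"\x00")          # then zero-filled to one block
    inner_key = bytes(b ^ 0x36 for b in key)      # first way of padding the key
    outer_key = bytes(b ^ 0x5C for b in key)      # second way of padding the key
    inner = hashlib.sha256(inner_key + message).digest()
    return hashlib.sha256(outer_key + inner).digest()

key, msg = b"shared secret", b"transfer 100 euro to Bob"
mine = hmac_sha256(key, msg)
reference = hmac.new(key, msg, hashlib.sha256).digest()
print(hmac.compare_digest(mine, reference))
```

In practice the library call on the last lines is the one to use; the hand-written version is only there to make the structure of the formula visible.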



KDF

• KDF (key derivation function) is used to create a secret key from a shared secret, for example:
– from the result of a Diffie-Hellman key exchange
– or from a random number obtained from a not very secure random number generator (RNG)
• the common secret is processed by the hash function and, if more output is needed, a counter is added

KDF (Z) = H (Z, C0) || H (Z, C1) || H (Z, C2) …

where Z is the secret and Ci is the counter
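A minimal sketch of this counter-based KDF in Python. The choices below are assumptions made for the example: SHA-256 as H, H(Z, Ci) computed as the hash of Z followed by a 4-byte big-endian counter starting at 0, and the output cut to the requested length. Standardized KDFs fix these details differently, so this should not be taken as any particular standard.

```python
import hashlib

def kdf(z: bytes, length: int) -> bytes:
    """Return `length` bytes of H(Z, C0) || H(Z, C1) || H(Z, C2) ..."""
    output = b""
    counter = 0
    while len(output) < length:
        output += hashlib.sha256(z + counter.to_bytes(4, "big")).digest()
        counter += 1
    return output[:length]

shared_secret = b"result of a Diffie-Hellman exchange"   # placeholder value
encryption_key = kdf(shared_secret, 16)                  # e.g. a 128-bit key
key_material = kdf(shared_secret, 80)                    # needs three hash calls
print(encryption_key.hex())
print(len(key_material))
```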


Tree Hashing

• When the message is particularly long, part of it can change, and the new hash must then be computed, it is useful to construct a tree where leaves are message blocks and intermediate nodes are hashes of sub-trees
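A hash tree can be sketched in Python as follows. The specifics are assumptions for the illustration: SHA-256 as the hash, a 0x00 or 0x01 prefix to tell leaves from internal nodes, and duplication of the last node when a level has an odd number of nodes.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def tree_levels(blocks):
    """Return all levels of the tree, from the leaf hashes up to the root."""
    if not blocks:
        raise ValueError("at least one message block is required")
    level = [H(b"\x00" + block) for block in blocks]          # leaves
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]                       # pad odd levels
        level = [H(b"\x01" + level[i] + level[i + 1])         # hash of sub-trees
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def tree_hash(blocks) -> bytes:
    """The root of the tree is the hash of the whole message."""
    return tree_levels(blocks)[-1][0]

blocks = [f"block {i}".encode() for i in range(8)]
before = tree_levels(blocks)
blocks[5] = b"block 5, edited"
after = tree_levels(blocks)

# Only the nodes on the path from leaf 5 to the root have changed.
changed = [sum(x != y for x, y in zip(l1, l2)) for l1, l2 in zip(before, after)]
print(changed)   # [1, 1, 1, 1]
```

With 8 blocks, changing one block changes one node per level, so an update needs about log2(8) + 1 = 4 hash computations instead of rehashing the whole message.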


Date post:	24-Apr-2018
Category:	Documents
Upload:	lykiet