Cryptography - Institute of Computer Science

Cryptography

Prof. Dr. Carsten Damm, Dr. Henrik Brosenne

University of Goettingen, Institute of Computer Science

Winter 2013/2014

Table of Contents

Classical Cryptography

  Substitution Ciphers

  Transposition Ciphers

Elementary Cryptanalysis

Published Worksheet

Published worksheet 02 substitution ciphers.

Classical cryptosystems

Classical cryptosystems act on characters (not bits or bytes).

The main building blocks

Substitution of characters by others.

Transpositions, i.e. rearranging of character positions.

Sometimes combinations of Substitutions and Transpositions.

Notation

plaintext alphabet A

ciphertext alphabet A′

a_i denotes the i-th letter of the plaintext alphabet in its natural order (similarly a′_i)

m = m1m2 . . . sequence of plaintext letters

c = c1c2 . . . sequence of ciphertext letters

enciphering keys K or k

deciphering keys K ′ or k ′

enciphering map EK

deciphering map DK ′

We will consider 4 types of substitution ciphers. Each of them is easily seen to be a symmetric cipher: there is an efficient algorithm producing the deciphering key from the enciphering key.

Simple substitution ciphers

this is a monographic substitution (single characters are replaced by single characters) and a monoalphabetic one (each occurrence of some a_i is replaced by the same ciphertext letter a′_j): key = mapping A → A′

key = list of substitutes in the order of the plaintext alphabet; in case A = A′ it specifies a permutation π

for A = A′ = the classical alphabet a key could be specified like this: K = ULOIDTGKXYCRHBPMZJQVWNFSAE

the keyspace has size 26!; nevertheless simple substitution ciphers are easy to break

formula: c_i = π(m_i)

Example

Plaintext (paragraph from Kohel’s book after passing it through the map π):

SUPPOSETHATWEFIRSTENCODEAMESSAGEBYPURGINGALLNONALPHABETI

CCHARACTERSEGNUMBERSSPACESANDPUNCTUATIONANDCHANGINGALLCH

ARACTERSTOUPPERCASETHENTHEKEYSIZEWHICHBOUNDSTHESECURITYO

FTHESYSTEMISALPHABETICCHARACTERSTHEREFORETHETOTALNUMBERO

FKEYSISOFENORMOUSSIZENEVERTHELESSWEWILLSEETHATSIMPLESUBS

TITUTIONISVERYSUSCEPTIBLETOCRYPTANALYTICATTACKS

Key: ULOIDTGKXYCRHBPMZJQVWNFSAE

simple substitution ciphertext:

QWMMPQDVKUVFDTXJQVDBOPIDUHDQQUGDLAMWJGXBGURRBPBURMKULDVX

OOKUJUOVDJQDGBWHLDJQQMUODQUBIMWBOVWUVXPBUBIOKUBGXBGURROK

UJUOVDJQVPWMMDJOUQDVKDBVKDCDAQXEDFKXOKLPWBIQVKDQDOWJXVAP

TVKDQAQVDHXQURMKULDVXOOKUJUOVDJQVKDJDTPJDVKDVPVURBWHLDJP

TCDAQXQPTDBPJHPWQQXEDBDNDJVKDRDQQFDFXRRQDDVKUVQXHMRDQWLQ

VXVWVXPBXQNDJAQWQODMVXLRDVPOJAMVUBURAVXOUVVUOCQ

Easy statistical attack, based on the observation that the frequencies of ciphertext symbols correspond to the frequencies of the plaintext letters. Under the above key, D (the image of E) is the most frequent ciphertext symbol.
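The formula c_i = π(m_i) can be tried out with a few lines of plain Python. This is only a sketch: the function names `substitute` and `unsubstitute` are made up for illustration, and the text is assumed to be already encoded into capital letters A-Z.

```python
import string

ALPHABET = string.ascii_uppercase
KEY = "ULOIDTGKXYCRHBPMZJQVWNFSAE"  # the key from the example

def substitute(text, key=KEY):
    """Encipher: the i-th alphabet letter is replaced by key[i]."""
    return text.translate(str.maketrans(ALPHABET, key))

def unsubstitute(text, key=KEY):
    """Decipher: apply the inverse mapping key[i] -> i-th alphabet letter."""
    return text.translate(str.maketrans(key, ALPHABET))

print(substitute("SUPPOSETHATWE"))    # QWMMPQDVKUVFD
print(unsubstitute("QWMMPQDVKUVFD"))  # SUPPOSETHATWE
```

The deciphering key is obtained by simply swapping the two arguments of `str.maketrans`, which illustrates why the cipher is symmetric.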

Special cases

Affine ciphers: numerically encode the letters A, B, ..., Z as elements of {0, 1, ..., 25} = Z_26 := Z/26Z,

then operate on letters by transformations of the form x ↦ ax + b for any a coprime to 26; key = (a, b), c_i = a·m_i + b (mod 26)

Translation ciphers: the special case of an affine cipher with a = 1, also called shift cipher or additive cipher (e.g. Caesar’s cipher)
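A minimal sketch of the affine map x ↦ ax + b on Z/26Z in plain Python. The function names are illustrative only, the input is assumed to consist of capital letters A-Z, and the three-argument `pow` with exponent -1 needs Python 3.8 or later.

```python
def affine_encrypt(text, a, b):
    """c_i = a*m_i + b (mod 26) with A..Z encoded as 0..25."""
    return "".join(chr((a * (ord(ch) - 65) + b) % 26 + 65) for ch in text)

def affine_decrypt(text, a, b):
    """m_i = a^(-1) * (c_i - b) (mod 26); fails if a is not coprime to 26."""
    a_inv = pow(a, -1, 26)  # modular inverse of a
    return "".join(chr((a_inv * (ord(ch) - 65 - b)) % 26 + 65) for ch in text)

print(affine_encrypt("AFFINE", 5, 8))  # IHHWVC
print(affine_encrypt("CAESAR", 1, 3))  # a = 1 gives a translation cipher: FDHVDU
```

For an a that shares a factor with 26, `pow(a, -1, 26)` raises a `ValueError`, which mirrors the requirement that a be coprime to 26.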

Exercise 6

1 How many affine ciphers are there on the classical alphabet?

2 Read about the ROT13 cipher and its uses.

3 Vzcyrzrag gur EBG13 pvcure va Fntr.

4 Someone says: “OK, let’s agree on this substitution system. But to improve it, we better double the key size and double-encrypt messages (encipher the plaintext by the first key, and afterwards encipher the result by the second key).” Good idea?

5 Consider a cryptosystem with plaintext alphabet = ciphertext alphabet. A key in that system is called involutoric if double-encryption with the same key gives the plaintext. ROT13 is an example. Involutoric keys are convenient, but also weak in some sense.

- How many involutoric keys are there for the translation ciphers?

- How many involutoric keys are there for affine ciphers?

- Find some involutoric but non-affine keys for general substitution ciphers on the classical alphabet. Try to estimate the number of such keys by constructing as many involutions as possible.

Homophonic ciphers

precondition: #A′ > #A

advanced variant of substitution ciphers: each plaintext letter can be replaced by any one of a set of ciphertext letters

still monographic (single letters are replaced by single characters from A′)

thus: key = partition of A′ into #A sets; for the classical alphabet: partition A′ into 26 blocks, key = ordered list of blocks

enciphering: a_i is replaced by a random element from the i-th block

∀i : c_i ∈ {π_1(m_i), π_2(m_i)}

depending on the choice, the same plaintext can result in several ciphertexts!

deciphering: an element from the i-th block is deciphered as the i-th plaintext letter

Example

A′ = Klingonian alphabet, which contains 52 characters, as we all know

The font is not available, so we represent each Klingonian letter simply by a pair of classical letters (corresponding to the pronunciation ...) and we arrange them in two rows that specify our homophonic key:

A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z

LV MJ CW XP QO IG EZ NB YH UA DS RK TF MJ XO SL PE NU FV TC QD RK YH GW AB ZI

UD PY KG JN SH MC FT LX BQ EI VR ZA OW XP HO DJ CY RN ZV WT LA SF BM GU QK IE

To encipher the π-encoded message ALWAYSLOOKONTHEBRIGHTSIDEOFLIFE, we replace each occurrence of a_i by either the lower or the upper Klingonian pair in the i-th key column

The following are legal encipherings of the message:

1 LVRKYHLVABZVRKHOHOVRHOXPWTLXQOMJNUYHFTNBTCFVYHJNQOHOMCZABQMCSH

2 UDZAYHUDQKZVZAHOXODSXOMJTCLXSHMJRNBQFTNBWTZVBQXPQOHOIGZABQMCSH

3 LVRKYHUDQKZVRKXOXODSHOXPTCLXQOPYRNBQEZNBTCFVBQXPSHHOIGZAYHMCSH

4 LVZABMUDABFVRKHOHODSHOXPWTLXQOPYRNBQEZNBTCZVBQXPQOXOIGZABQMCQO

The high frequency of E in plaintext is distributed among several ciphertext letters

Polyalphabetic substitution ciphers

in a way similar to homophonic ciphers (several keys), but the choice of key is not random but based on the position of the character in the plaintext

c_i = π_{f(i)}(m_i)

a special case are periodic substitution ciphers: substitute the i-th plaintext letter using the (i mod t)-th key

c_i = π_{f(i mod t)}(m_i)

t is called the period: every t-th character is enciphered by the same simple substitution cipher

the shorter the period, the weaker the system

Example: Vigenere cipher

The Vigenere cipher is a periodic translation cipher.

each key specifies an affine translation

identify the standard alphabet with Z/26Z = {0, 1, ..., 25}

c_i = (m_i + k_{i mod t}) (mod 26)

Message “Human salvation lies in the hands of the creatively maladjusted.”

Gives the encoded plaintext

HUMANSALVATIONLIESINTHEHANDSOFTHECREATIVELYMALADJUSTED

With key UVLOID enciphering performs the column additions.

HUMANS ALVATI ONLIES INTHEH ANDSOF THECRE ATIVEL YMALAD JUSTED

UVLOID UVLOID UVLOID UVLOID UVLOID UVLOID UVLOID UVLOID UVLOID

--------------------------------------------------------------

BPXOVV UGGOBL IIWWMV CIEVMK UIOGWI NCPQZH UOTJMO SHLZIG DPDHMG
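The column additions can be reproduced with a short plain-Python sketch. The function names are illustrative; positions are counted from 0 here (an assumption of this sketch), so that letter i is shifted by key letter i mod t.

```python
def vigenere_encrypt(plaintext, key):
    """c_i = m_i + k_(i mod t) (mod 26), letters A..Z encoded as 0..25."""
    t = len(key)
    return "".join(
        chr((ord(m) - 65 + ord(key[i % t]) - 65) % 26 + 65)
        for i, m in enumerate(plaintext)
    )

def vigenere_decrypt(ciphertext, key):
    """m_i = c_i - k_(i mod t) (mod 26)."""
    t = len(key)
    return "".join(
        chr((ord(c) - ord(key[i % t])) % 26 + 65)
        for i, c in enumerate(ciphertext)
    )

print(vigenere_encrypt("HUMANSALVATI", "UVLOID"))  # BPXOVVUGGOBL
```

With a key of length 1 the same function is a translation cipher.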

Example: Running Key Cipher and Auto Key Cipher

Running key cipher

key K = k_1k_2... is a long, non-periodic stream of letters ∈ {0, 1, ..., 25}

best would be completely random letters in the key stream (Vernam cipher, to be talked about later), but this makes key exchange inconvenient

popular key agreements look like this: Alice in Wonderland, start reading at page 5, bottom line

ciphertext C = c_1c_2... is defined by c_i = m_i + k_i (mod 26)

Autokey cipher, similar but

the key stream consists of a short keyword k_1k_2...k_ℓ with the plaintext(!) appended: k_1k_2...k_ℓ m_1m_2...

c_i = m_i + k_i (mod 26) for i ≤ ℓ, else c_i = m_i + m_{i−ℓ} (mod 26)
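The autokey rule can be sketched as follows in plain Python. The function names and the sample keyword are illustrative assumptions; the text is assumed to consist of capital letters A-Z.

```python
def autokey_encrypt(plaintext, keyword):
    """Key stream = keyword followed by the plaintext itself."""
    stream = keyword + plaintext
    return "".join(
        chr((ord(m) - 65 + ord(k) - 65) % 26 + 65)
        for m, k in zip(plaintext, stream)
    )

def autokey_decrypt(ciphertext, keyword):
    """Recovered plaintext letters are appended to the key stream as we go."""
    stream = list(keyword)
    plain = []
    for i, c in enumerate(ciphertext):
        m = chr((ord(c) - ord(stream[i])) % 26 + 65)
        plain.append(m)
        stream.append(m)
    return "".join(plain)

print(autokey_encrypt("ATTACKATDAWN", "QUEENLY"))  # QNXEPVYTWTWP
```

Deciphering has to proceed letter by letter, because each recovered plaintext letter is needed as key material further down the stream.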

History

polyalphabetic substitutions were suggested by Alberti in the middle of the 15th century

tool: slide rules or adjustable metal discs containing the alphabet in unshifted and shifted versions:

...XYZABCDEFGHIJKLMNOPQRSTUVWXYZABC...

...ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEF...

Vigenere’s cipher was first described by Bellaso, a cryptologist in the service of the pope, in the 16th century

idea: specify the key by a single word

it was later misattributed to Vigenere, who instead invented the autokey cipher

Vigenere ciphers are reported to have been in military use until about 1940 (Wikipedia)

this type of cipher was successful for a long time

Example: Rotor machine

an electro-mechanical device for stream enciphering the plaintext

implements a sophisticated polyalphabetic substitution cipher, main ideas:

- each rotor defines a simple substitution

- after substituting a single character the rotor moves one cycle step, thus giving a period of 26

- the next rotor applies another substitution, but its cycle step is performed only after the previous rotor has cycled through 26 steps

- thus two, three, ... rotors give a period of 26^2, 26^3, ...

in use between 1920 and 1970

after its invention the design became known (some systems were patented, some were perhaps analyzed by secret service activities)

the key consists of the sequence of rotor substitutions and the initial state of the system

The Enigma

starting substitution P (an arbitrary permutation of the alphabet!) defined by a plugboard

3 rotors: after being substituted by the right, middle and left rotors (R, M, L) the symbol was “reflected” by a special rotor U in fixed position

after passing the reflector U the symbol went again through rotors 3, 2, 1 doing their inverse substitutions L^{-1}, M^{-1}, R^{-1} and finally through the inverse plugboard substitution P^{-1}

(see Wikipedia http://en.wikipedia.org/wiki/Enigma_machine)

if ρ is the cyclic permutation (A → B → C → ... → A) and if i, j, k are the actual cycle positions of the rotors, the overall substitution in this situation reads as

$$E_{P,i,j,k} = P\,(\rho^{i} R \rho^{-i})(\rho^{j} M \rho^{-j})(\rho^{k} L \rho^{-k})\,U\,(\rho^{k} L^{-1} \rho^{-k})(\rho^{j} M^{-1} \rho^{-j})(\rho^{i} R^{-1} \rho^{-i})\,P^{-1}$$

The Enigma

Some details omitted.

there were 5 possible rotors, but only 3 were used in the encryption process (wheel 1, 2, and 3)

the operator could change their position, e.g., wheel 1 in rotor bay M, wheel 2 in rotor bay R, and wheel 3 in bay L

the key consisted of:

- plugboard permutation

- selection of wheels (3 out of 5)

- wheel placement

- initial rotation positions of the wheels

Since the reflector’s substitution was an involution, the overall polyalphabetic substitution was involutoric too.

This property eased the key distribution process

Another feature of the reflector was that it never mapped a symbol to itself. This turned out to be a weakness of the system and helped to break it.

Exercise 7

1 Identify translation ciphers as special cases of Vigenere ciphers. Using this idea define ROT13 as a Vigenere cipher in Sage.

2 Running key is kind of an “infinite period version” of a Vigenere cipher. Make use of this observation and implement it in Sage. Hint: Open the “Data..” drop down menu of a Sage worksheet to find out how internet text sources (Alice in Wonderland from Project Gutenberg) can be invoked.

3 Autokey is not implemented in Sage. Provide your own implementation.

4 How many keys are there for the Enigma?

5 Rotor machines are not implemented in Sage. Provide your own implementation.

Polygram substitution ciphers

in contrast to monographic ciphers, polygram (or polygraphic) ciphers substitute blocks of characters by other blocks.

Example

- AA is mapped to NO

- AB to IR

- JU to AQ

- ...

digram, trigram, ..., n-gram = blocks of 2, 3, ..., n symbols

harder to break than simple substitutions, since single character frequencies are destroyed

one drawback for everyday use seems to be the large key size needed

special variants rely on smaller key spaces

Playfair ciphers

Playfair ciphers are digram substitutions.

The key is a 5x5 matrix containing all letters of the alphabet except J (it is infrequent and can be identified with I).

Plaintext is encrypted two letters at a time, so special rules apply in case of repeated letters:

omit one occurrence, e.g. BALLOON → BALON, or separate them with filler letters, e.g. BALLOON → BALXLOXON.

Example

M O N A R

C H Y B D

E F G I K

L P Q S T

U V W X Z

Substitute

Plaintext letters that fall in the same row are each replaced by the letter to the right (cyclically), e.g. PS → QT, TS → LT.

Plaintext letters in the same column are each replaced by the letter beneath (cyclically), e.g. ZR → RD, DT → KZ.

Otherwise, each plaintext letter in a pair is replaced by the letter that lies in its own row and the column occupied by the other plaintext letter, e.g. OK → RF, TH → PD.

One popular method to find and remember a key is to choose a sentence, write it into the square but omit repeated occurrences of letters, then fill up with the remaining alphabet letters in natural order, e.g. the key in the above example is MONARCHY.
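The key construction and the three substitution rules can be sketched in plain Python. The function names are illustrative, and the sketch assumes that the two letters of a pair are different (repeated letters are supposed to have been treated beforehand).

```python
def playfair_square(keyword):
    """5x5 square: keyword letters first (repeats dropped), then the rest; J counts as I."""
    seen = []
    for ch in keyword.upper() + "ABCDEFGHIKLMNOPQRSTUVWXYZ":
        ch = "I" if ch == "J" else ch
        if ch.isalpha() and ch not in seen:
            seen.append(ch)
    return [seen[r * 5:(r + 1) * 5] for r in range(5)]

def playfair_encrypt_pair(square, pair):
    """Encipher one digram of two different letters."""
    pos = {square[r][c]: (r, c) for r in range(5) for c in range(5)}
    (r1, c1), (r2, c2) = pos[pair[0]], pos[pair[1]]
    if r1 == r2:   # same row: take the letter to the right, cyclically
        return square[r1][(c1 + 1) % 5] + square[r2][(c2 + 1) % 5]
    if c1 == c2:   # same column: take the letter beneath, cyclically
        return square[(r1 + 1) % 5][c1] + square[(r2 + 1) % 5][c2]
    # otherwise: own row, column of the other letter
    return square[r1][c2] + square[r2][c1]

square = playfair_square("MONARCHY")
for row in square:
    print(" ".join(row))
print(playfair_encrypt_pair(square, "OK"))  # RF
```

Deciphering works the same way with "left" and "above" in place of "right" and "beneath"; the third rule is its own inverse.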

Hill cipher

the key is an invertible n×n matrix K over Z/26Z.

encryption is simply multiplying each block of n symbol numbers by K, decryption is multiplication by K^{-1}.

generalization: affine polygram cipher

- the key consists of an invertible matrix A and a length-n vector b.

- encryption: m = (m_1, ..., m_n) ↦ (c_1, ..., c_n) = mA + b

Example

Consider 2-grams

AA = (0,0)

AB = (0,1)

ZZ = (25,25)

Key: $A = \begin{pmatrix} 1 & 8 \\ 21 & 3 \end{pmatrix}$, b = (13, 14)

Mapping

AA = (0,0) is mapped to (13,14) = NO

ZZ = (25,25) is mapped to (17,3) = RD
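The digram mapping m ↦ mA + b can be checked with a small plain-Python sketch. The function name is illustrative. The matrix A used below is an assumption of this sketch: it is the matrix that, together with b = (13, 14), reproduces the mappings AA → NO, AB → IR and JU → AQ quoted in this section.

```python
A = [[1, 8], [21, 3]]  # assumed matrix, chosen to reproduce the quoted mappings
B = (13, 14)

def affine_digram_encrypt(block, a=A, b=B):
    """(c_1, c_2) = (m_1, m_2) * A + b (mod 26), row vector times matrix."""
    m = [ord(ch) - 65 for ch in block]
    c = [(m[0] * a[0][j] + m[1] * a[1][j] + b[j]) % 26 for j in range(2)]
    return "".join(chr(x + 65) for x in c)

for digram in ("AA", "AB", "JU", "ZZ"):
    print(digram, "->", affine_digram_encrypt(digram))
```

The determinant of this A is 17 modulo 26, which is coprime to 26, so the matrix is invertible over Z/26Z as the key definition requires.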

Exercise 8

1 If K = (A, b) is the encryption key, what is the decryption key K′? What is the complexity of computing K′ given K?

2 The Sage system offers an implementation of the Hill cryptosystem.

- type HillCryptosystem?? to view the source code of this constructor

- study the source code and discuss how to modify it to define general affine polygram ciphers

Transposition Ciphers

Published Worksheet

Published worksheet 02 transposition ciphers

Exercise 8

1 Statement: Transposition ciphers are a variant of substitution ciphers. True or false?

2 Statement: Substitution ciphers are a variant of transposition ciphers. True or false?

Permutations

symmetric group S_n = set of bijections from {1, ..., n} onto itself, group operation = composition of mappings

members of S_n = permutations

n-fold composition of π with itself = π^n

order of a permutation π = smallest k ≥ 1 such that π^k = identity

BTW: mathematicians reserve the notion transposition for permutations that exchange two elements and keep the others unchanged; cryptographers do not adopt this restriction for transposition ciphers

each permutation on n elements is a product of at most n transpositions

List notation for permutations

a permutation π : j ↦ i_j can be denoted by the list [i_1, i_2, ..., i_n]

this is called the list notation

same principle as for the notation of substitutions (only formal difference: for substitutions we wrote down a list of ciphertext symbols, not numbers)

Orbits and cycle notation

any permutation π ∈ S_n has a unique orbit decomposition

$$\{1, 2, \ldots, n\} = \bigcup_{k=1}^{t} \{\pi^{j}(i_k) : j \in \mathbb{Z}\},$$

where the union is taken over disjoint sets

the sets in the union are called the orbits of π and the sizes d_1, ..., d_t of the sets are called the cycle lengths of the orbits

associated to any orbit decomposition we can express a permutation as

$$\pi = (i_1, \pi(i_1), \ldots, \pi^{d_1-1}(i_1)) \cdots (i_t, \pi(i_t), \ldots, \pi^{d_t-1}(i_t))$$

orbits of cycle length 1 are omitted, the identity permutation is abbreviated as 1

a very compact and informative notation

a transposition (in the mathematical sense) has cycle notation (i, j)
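Converting list notation into cycle notation amounts to following each orbit until it closes. A plain-Python sketch, with illustrative function names; the helper `order` relies on the standard fact that the order of a permutation is the least common multiple of its cycle lengths, and `math.lcm` needs Python 3.9 or later.

```python
from math import lcm

def cycles_from_list(perm):
    """perm[j-1] = pi(j) for j = 1..n; orbits of length 1 are omitted."""
    seen = set()
    cycles = []
    for start in range(1, len(perm) + 1):
        cycle = []
        j = start
        while j not in seen:       # follow the orbit of `start` until it closes
            seen.add(j)
            cycle.append(j)
            j = perm[j - 1]
        if len(cycle) > 1:
            cycles.append(tuple(cycle))
    return cycles

def order(perm):
    """Smallest k >= 1 with pi^k = identity."""
    return lcm(*(len(c) for c in cycles_from_list(perm)))

print(cycles_from_list([3, 1, 2, 4]))  # [(1, 3, 2)]
print(order([3, 1, 2, 4]))             # 3
```

For the identity the list of cycles is empty and `lcm()` of no arguments returns 1, matching the convention that the identity is abbreviated as 1.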

Simple columnar transposition

(r, s)-simple columnar transposition: write the plaintext as blocks, each consisting of r rows of length s

enciphering is done by reading the blocks off in columns

more general columnar transpositions allow a permutation of the columns before reading them off

Example: simple columnar transposition (36 columns)

I was riding on the Mayflower
When I thought I spied some land
I yelled for Captain Ahab
I have yuh understand
Who came running to the deck
Said, “Boys, forget the whale
Look on over yonder
Cut the engines
Change the sail
Haul on the bowline”
We sang that melody
Like all tough sailors do
When they are far away at sea

IWASRIDINGONTHEMAYFLOWERWHENITHOUGHTISPIEDSOMELANDIYELLEDFORCAPTAINAHABIHAVEYUHUNDERSTANDWHOCAMERUNNINGTOTHEDECKSAIDBOYSFORGETTHEWHALELOOKONOVERYONDERCUTTHEENGINESCHANGETHESAILHAULONTHEBOWLINEWESANGTHATMELODYLIKEALLTOUGHSAILORSDOWHENTHEYAREFARAWAYATSEA

Ciphertext (the rows of 36 letters read off column by column):

IIHDYOOWSAEONUAPVCNTGSIEKDHHREYSEESIDUARBADSHICOIIOUDUWLNMNBTLOGEDOTIROLEYHNSNARSEEDTNSFEWOHDTONEWEIARGSHMYNGIAEAEDENNNYLWTEGTFLHTSTHLEOHCHEODCEHAYWFAWATAEOMHNMRRREAGEEWCRLELFHAUETOAEPNLHDRNTNOEYAIAIOSLWTINKAIAHNGOIKYOATNLEAUROOHATGATVALSHBHEULETIERLTA
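Reading blocks off in columns can be sketched in plain Python as follows. The function name is illustrative; how an incomplete last block is handled (short rows are simply skipped in the columns they do not reach) is an assumption of this sketch.

```python
def columnar_encrypt(text, r, s):
    """(r, s)-simple columnar transposition: r rows of length s, read by columns."""
    out = []
    block_len = r * s
    for start in range(0, len(text), block_len):
        block = text[start:start + block_len]
        rows = [block[i:i + s] for i in range(0, len(block), s)]
        for col in range(s):
            out.extend(row[col] for row in rows if col < len(row))
    return "".join(out)

print(columnar_encrypt("ABCDEF", 2, 3))  # rows ABC / DEF, columns AD BE CF -> ADBECF
```

In the example of this section the text fills exactly one block of r = 7 rows and s = 36 columns, so the first seven ciphertext letters are the first letters of the seven rows.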

Exercise 9

1 What is the block length of a simple (r, s)-columnar transposition? Give a formula description for a simple (r, s)-columnar transposition map: If m_0, m_1, m_2, ... is the plaintext, what is the cryptotext c_0, c_1, c_2, ...?

2 Determine the order of (r , r)- and of (3,5)-columnar transpositions.

General transposition cipher

there are only few columnar transpositions of 36 columns, but about 4 · 10^41 permutations of block length 36 (try the Sage command: factorial(36)*1.0)

for example

π =(1, 12, 5, 36, 30, 31, 4, 28, 33, 22, 26, 17, 10, 16, 14, 23, 18, 35, 32)

(2, 9, 3, 25, 15, 7, 21, 6, 29, 34, 11, 27, 19, 24)

(8, 13, 20)

applied to the above text gives

NNWNTIOTAMERLEDHGHRIIHYWEAFUGHSIWOOT

AMCTIADNPYPEEOSDEBRODALSIELRANIIFLAI

RNRNEICSVNNYOMHTDHEUUUWAADHOTGEHAETN

SBLOROEFCGLSHHIOOEADAETERETOVOKDWYNK

ETEELSHENIHECNCNTUGURTEOGNSHAIDYAHLA

ELLYTLAWTADEHMOEILEWBOGNSNTALKHOTNEI

DOFAAWYOGERSERIWREELAATUHNHTSYHOASAA

Example

π =(1, 12, 5, 36, 30, 31, 4, 28, 33, 22, 26, 17, 10, 16, 14, 23, 18, 35, 32)

(2, 9, 3, 25, 15, 7, 21, 6, 29, 34, 11, 27, 19, 24)

(8, 13, 20)

IWASRIDINGONTHEMAYFLOWERWHENITHOUGHT

ISPIEDSOMELANDIYELLEDFORCAPTAINAHABI

HAVEYUHUNDERSTANDWHOCAMERUNNINGTOTHE

DECKSAIDBOYSFORGETTHEWHALELOOKONOVER

YONDERCUTTHEENGINESCHANGETHESAILHAUL

ONTHEBOWLINEWESANGTHATMELODYLIKEALLT

OUGHSAILORSDOWHENTHEYAREFARAWAYATSEA

NNWNTIOTAMERLEDHGHRIIHYWEAFUGHSIWOOT

AMCTIADNPYPEEOSDEBRODALSIELRANIIFLAI

RNRNEICSVNNYOMHTDHEUUUWAADHOTGEHAETN

SBLOROEFCGLSHHIOOEADAETERETOVOKDWYNK

ETEELSHENIHECNCNTUGURTEOGNSHAIDYAHLA

ELLYTLAWTADEHMOEILEWBOGNSNTALKHOTNEI

DOFAAWYOGERSERIWREELAATUHNHTSYHOASAA
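A plain-Python sketch of applying such a permutation to one block. The function names are illustrative. The convention c_i = m_{π(i)} (ciphertext position i receives the plaintext letter from position π(i), positions counted from 1) is not stated in the text; it is an assumption read off from the example, whose first row it reproduces.

```python
CYCLES = [
    (1, 12, 5, 36, 30, 31, 4, 28, 33, 22, 26, 17, 10, 16, 14, 23, 18, 35, 32),
    (2, 9, 3, 25, 15, 7, 21, 6, 29, 34, 11, 27, 19, 24),
    (8, 13, 20),
]

def permutation_from_cycles(cycles, n):
    """Return pi as a dict on 1..n; points not in any cycle stay fixed."""
    pi = {j: j for j in range(1, n + 1)}
    for cyc in cycles:
        for a, b in zip(cyc, cyc[1:] + cyc[:1]):
            pi[a] = b
    return pi

def transpose_block(block, pi):
    """c_i = m_{pi(i)} with 1-based positions."""
    return "".join(block[pi[i] - 1] for i in range(1, len(block) + 1))

pi = permutation_from_cycles(CYCLES, 36)
print(transpose_block("IWASRIDINGONTHEMAYFLOWERWHENITHOUGHT", pi))
# NNWNTIOTAMERLEDHGHRIIHYWEAFUGHSIWOOT
```

Longer texts are handled by cutting them into blocks of 36 letters and applying `transpose_block` to each block.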

Exercise 10

1 Determine the cycle notation for a (4,5) simple columnar transposition cipher.

Table of Contents

Classical Cryptography

Elementary Cryptanalysis

  Classification of Cryptanalytic Attacks

  Stochastic structure of natural language - Part 1 (published worksheet)

  Cryptanalysis by Frequency Analysis (published worksheet)

  Breaking the Vigenere cipher (published worksheet)

  Statistical Measures (published worksheet)

  Cryptanalysis of Transposition Ciphers (published worksheet)

Reference

Bruce Schneier: Applied Cryptography (very comprehensive, recommended to get an overview of all of cryptography, not very detailed but very inspiring)

Starring

Alice = first person in all protocols (initiator)

Bob = second person in all protocols

Eve = an eavesdropper, i.e., passive attacker

Mallory = malicious active attacker

In this chapter we study passive attacks: Eve tries to get information about the plaintext while observing only ciphertext messages in a cryptographic protocol. All attacks rely on a fixed cryptosystem (E, D).

Ciphertext-only attack

This is the type of attack we will study in this chapter:

given ciphertexts C_1 = E_K(M_1), ..., C_t = E_K(M_t) of several messages, all generated by the same cipher E_K

wanted: an algorithm to infer M_{t+1} from C_{t+1} = E_K(M_{t+1})

weaker: recover some information about Mt+1

stronger: recover the key K (or at least information about it)

Known plaintext attack

additionally given M1, ...,Mt

scenario: disclosure of formerly classified documents

Chosen plaintext attack

instead given (limited) access to the cipher E_K, so that the analyst can choose M_1, ..., M_t and generate the corresponding ciphertexts C_1 = E_K(M_1), ..., C_t = E_K(M_t)

scenario: a spy who is able to plant some specially prepared messages on the Enigma operator

Adaptive-chosen-plaintext attack

special variant of chosen plaintext attack:

- the attacker doesn’t need to fix the chosen plaintexts in advance but rather can watch the outcome of chosen plaintext encryptions and based on that choose the next one(s)

scenario: before WWII Polish cryptanalysts (Biuro Szyfrow) were in possession of a copy of the Enigma machine

Simple observations

well known: each language (English, German, ...) has statistical characteristics that can be used to differentiate between various text sources:

- frequencies of letters and words

- of pairs, triples, ..., n-grams, or more general patterns

- starting/ending letters of words, starting/ending words of sentences

- lengths of words/sentences

- ...

Letter frequencies (in percent) of typical English text samples

A B C D E F G H I J K L M

7.3 0.9 3.0 4.4 13.0 2.8 1.6 3.5 7.4 0.2 0.3 3.5 2.5

N O P Q R S T U V W X Y Z

7.8 7.4 2.7 0.3 7.7 6.3 9.3 2.7 1.3 1.6 0.5 1.9 0.1

heavy vowels: {E, I, O, A} = more than 1/3

heavy consonants: {T, N, R, S} = almost 1/3

low frequency symbols {J, K, Q, X, Z} = less than 2/100

Popular frequency ordered alphabets

(cited from F.L.Bauer: Entzifferte Geheimnisse)

English (various sources)

- etaoins(h)r dlucmfwypvbgkqjxz (1884)

- etoanirs hdlcufmpywgbvkxjqz (1893)

- etaoinsr hldcumfpgwybvkxjqz (1982)

German (various sources)

- enrisdutaghlobmfzkcwvjpqxy (1840)

- enirsahtudlcgmwfbozkpjvqxy (1863)

- enisratduhglcmwobfzkvpjqxy (1955)

Artificial text samples

one can generate random text by drawing symbols according to the symbol frequencies in genuine text sources (0 order Markov source)

better: Shannon’s method (gives a 1st order Markov source)

1 take a large text sample (typical of the language)

2 select a random cursor position, σ = symbol at cursor

3 output σ, select a random cursor position

4 locate the first occurrence of σ after the cursor

5 σ = character following the cursor

6 back to 3. or STOP

see published worksheet for illustration

can be extended to 2nd, 3rd, ... order sources
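The six steps of Shannon’s method translate almost literally into plain Python. This is a sketch, not the worksheet’s code: the function name is illustrative, and two details are assumptions, namely that the search in step 4 wraps around at the end of the sample and that an occurrence exactly at the cursor counts.

```python
import random

def shannon_text(sample, length, rng=None):
    """Emit `length` symbols from `sample` by Shannon's cursor method."""
    rng = rng or random.Random()
    n = len(sample)
    doubled = sample + sample                 # lets the search wrap around (assumption)
    sigma = sample[rng.randrange(n)]          # step 2: symbol at a random cursor
    out = []
    for _ in range(length):
        out.append(sigma)                     # step 3: output sigma ...
        cursor = rng.randrange(n)             # ... and select a new random cursor
        pos = doubled.index(sigma, cursor) % n  # step 4: first sigma at or after cursor
        sigma = sample[(pos + 1) % n]         # step 5: character following it
    return "".join(out)

sample = "THEQUICKBROWNFOXJUMPSOVERTHELAZYDOG"
print(shannon_text(sample, 40, random.Random(2013)))
```

By construction every pair of adjacent output letters also occurs as an adjacent pair in the sample, which is what makes the output look more language-like than a 0 order source. A realistic experiment needs a much larger sample than the toy string used here.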

Law of large numbers

wanted: a suitable mathematical model for plaintext sources

a stochastic source over alphabet A is a device that randomly emits “infinite texts” X = X_1X_2... ∈ A^ω

the source is called memoryless if for every symbol a the probability P(X_n = a) =: p_a is independent of n and of all previous or future symbols emitted

let N_n(a, X) denote the number of occurrences i with X_i = a in the prefix X_1...X_n, and let f_n(a, X) := N_n(a, X)/n be the relative frequency of a in the prefix X_1X_2...X_n

Theorem

If X is a random emission from a memoryless source with symbol probabilities (p_a)_{a∈A}, then

$$\lim_{n\to\infty} f_n(a, X) = p_a$$

holds with probability 1.

this law also holds true for the relative frequency of pairs, triples, ..., and more general “patterns” in the prefix

important: the longer the text sample, the more stable are its stochastic features in terms of pattern frequency

Ergodic sources

a source is called stationary if the probability of occurrence of arbitrary “patterns” at position n of X is independent of n

generalization of memoryless sources: a source is called ergodic if it is stationary and the law of large numbers holds for arbitrary patterns

natural language sources are “close to” ergodic sources

one feature is that for an ergodic source the (infinite) emission is “almost surely typical” (where typicality has a precise mathematical meaning that we will discuss later)

Exercise 11

1 Implement a digram counter for text data and try to find some important “heavy pairs” by testing various text samples.

2 Extend this to triples (going much further probably doesn’t make much sense for cryptanalysis)

Breaking a simple substitution cipher

Ciphertext from a simple substitution cipher

QWMMPQDVKUVFDTXJQVDBOPIDUHDQQUGDLAMWJGXBGURRBPBURMKULDVX

OOKUJUOVDJQDGBWHLDJQQMUODQUBIMWBOVWUVXPBUBIOKUBGXBGURROK

UJUOVDJQVPWMMDJOUQDVKDBVKDCDAQXEDFKXOKLPWBIQVKDQDOWJXVAP

TVKDQAQVDHXQURMKULDVXOOKUJUOVDJQVKDJDTPJDVKDVPVURBWHLDJP

TCDAQXQPTDBPJHPWQQXEDBDNDJVKDRDQQFDFXRRQDDVKUVQXHMRDQWLQ

VXVWVXPBXQNDJAQWQODMVXLRDVPOJAMVUBURAVXOUVVUOCQ

most frequent cipher symbols are D, V, Q, U, O, J, K, B (conjecture: these correspond to the heavy symbols)

looks like the cipher takes E ↦ D and T ↦ V or T ↦ Q

rarest are E, N, S, Y, Z (conjecture: these correspond to the low frequency symbols)
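Such frequency rankings are easy to produce mechanically. A plain-Python sketch with an illustrative function name, using `collections.Counter` from the standard library; the string passed in below is just the first line of the ciphertext.

```python
from collections import Counter

def letter_frequencies(text):
    """Relative letter frequencies, most frequent first."""
    counts = Counter(ch for ch in text if ch.isalpha())
    total = sum(counts.values())
    return [(ch, n / total) for ch, n in counts.most_common()]

line = "QWMMPQDVKUVFDTXJQVDBOPIDUHDQQUGDLAMWJGXBGURRBPBURMKULDVX"
for symbol, freq in letter_frequencies(line)[:5]:
    print(symbol, round(100 * freq, 1))
```

Running it on the whole ciphertext and comparing the ranking with a frequency ordered alphabet of the source language gives first guesses for the substitution.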

Exercise 12

1 Complete the analysis of this ciphertext. Hint: It is useful to write recovered plaintext letters in lower case in the ciphertext, i.e. replacing D by e gives

QWMMPQeVKUVFeTXJQVeBOPIeUHeQQUGeLAMWJGXBGURRBPBURMKULeVX...

Once several letters have been identified it may help to first ignore the unidentified ones (as in the fictitious example below) and make a good guess.

t.etopo.t.et.reetreesisato..o.oneo.t.reetree.

2 Using a brute force attack is an option for Caesar ciphers. Suggest a method to avoid it. Implement it in Sage.

3 Using a brute force attack is an option for affine ciphers. Suggest a method to avoid it. Try to implement it in Sage.

4 Implement a digram counter for text data and try to find some important “heavy pairs” by testing various text samples

Analysis of Vigenere Ciphers

consider Vigenere ciphers as synonymous to periodic substitution ciphers on the standard alphabet with “short period”

methods apply in principle to any periodic substitution cipher

but are probably not powerful enough to break the Enigma or similar ciphers

The column trick

if (E, C) is a polyalphabetic cryptosystem and for a specific key K the cipher E_K has period ℓ, then each of the “plaintext columns”

M^(1) = M_1 M_{1+ℓ} M_{1+2ℓ} ...

M^(2) = M_2 M_{2+ℓ} M_{2+2ℓ} ...

...

M^(ℓ) = M_ℓ M_{2ℓ} M_{3ℓ} ...

is enciphered by the same monoalphabetic cipher.

the corresponding ciphertext columns C^(1), ..., C^(ℓ) can be deciphered as simple substitution ciphers

in particular:

- the symbol distributions in the columns are permuted versions of the source language symbol distribution

- the symbol distributions, sorted in decreasing order, are all very similar

Frequency analysis of periodic ciphers

Observation: periodic ciphers destroy the stochastic structure of the source language; the distribution looks “more random” than normal source language

the first task for the cryptanalyst is to determine the period

there are several methods of estimating the period

often a combination is to be applied

Decimation of a sequence

given a sequence S = s_0s_1s_2... of symbols and a positive integer m (the period)

for 0 ≤ k < m the k-th decimation of S is the sequence

S^(m)_k := s_k s_{k+m} s_{k+2m} ...

decimating a sequence is a kind of downsampling

Idea: if m is a candidate period, consider and compare the decimated symbol distributions:

- compare them to “typical” source language distributions

- compare the decimations among each other (e.g., by bar charts, if you have no other idea)

- more efficient: compare numerical parameters of the distributions: expectation of rank, variance of rank, entropy, index of coincidence (see below)

Reminder on entropy

binary entropy h(p) = −p log_2 p − (1 − p) log_2(1 − p)

maximum at p = 0.5 (uniform distribution)

general entropy $H(p_1, \ldots, p_N) = -\sum_{i=1}^{N} p_i \log_2 p_i$

maximum at p_1 = p_2 = ... = p_N = 1/N (uniform distribution)

Fact: the “more uncertain” a distribution, the larger the entropy

see published worksheet

symbol distributions of natural languages (or programming source code or ...) are pretty predictable

they should have small entropy values

Kasiski’s method

Kasiski was a Prussian officer (1805-1881)

assume: the key for the Vigenere cipher under consideration is a natural language word

idea

- sometimes frequent plaintext words (like ‘the’) are aligned at the same positions with respect to the keyword

- in this situation the resulting ciphertext fragments are the same

position:   012345678901234567890123456789

plain:      TOBEORNOTTOBETHATISTHEQUESTION

key stream: RUNRUNRUNRUNRUNRUNRUNRUNRUNRUN

cipher:     KIOVIEEIGKIOVNURNVJNUVKHVMGZIA

some coincidences: KIOV at positions 0 and 9 (distance 9), NU at positions 13 and 19 (distance 6)

- some of those coincidences are random, but the longer the fragments the more likely is a systematic origin (essential coincidences)

- the key length is a divisor of the distances of significant coincidences

Kasiski suggests the greatest common divisor (GCD) of the distances of (sufficiently long) coincidences, or a multiple thereof, as period of the system
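Collecting the distances between repeated ciphertext fragments is a small exercise in plain Python. The function name is illustrative, and measuring each occurrence against the previous occurrence of the same fragment is a choice of this sketch. Fragments of length 2 are used here only because the sample ciphertext is so short; in practice longer fragments are more trustworthy.

```python
from functools import reduce
from math import gcd

def repeat_distances(ciphertext, length):
    """Distances between consecutive occurrences of equal fragments of the given length."""
    last_seen = {}
    distances = []
    for i in range(len(ciphertext) - length + 1):
        frag = ciphertext[i:i + length]
        if frag in last_seen:
            distances.append(i - last_seen[frag])
        last_seen[frag] = i
    return distances

cipher = "KIOVIEEIGKIOVNURNVJNUVKHVMGZIA"
d = repeat_distances(cipher, 2)
print(d)               # [9, 9, 9, 6]
print(reduce(gcd, d))  # 3, the length of the keyword RUN
```

With fragments of length 4 only the repetition of KIOV at distance 9 remains, which by itself would suggest the period 9 or one of its divisors.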

Idea for Refinement

hard to distinguish between essential and inessential coincidences

idea: consider all distances of coincidences, but weigh them according to their length (the greater the length, the higher the weight)

problem: how to assign a period to such a weighted sum?

this problem was solved by William Friedman (US Army cryptographer, 1891-1969)

Notations and terminology

alphabet Σ, symbol values 0, 1, ...,m − 1

T ∈ Σ∗ a particular text of length N

N_i = N_i(T) = number of occurrences of symbol i in T

an i-twin is a pair of occurrences of symbol i in T; e.g. cabbage has an a-twin and a b-twin

Observation: T has exactly N_i(N_i − 1)/2 i-twins

i-twins for arbitrary i are commonly called twins

T has exactly $\sum_{i=0}^{m-1} N_i(N_i-1)/2$ twins

the index of coincidence of T is the relative frequency of twins among all N(N − 1)/2 position pairs:

$$\varphi(T) := \frac{\sum_{i=0}^{m-1} N_i(N_i-1)}{N(N-1)}$$

(the phi-value of the text)
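The phi-value is a one-liner once the symbol counts N_i are available. A plain-Python sketch with an illustrative function name; it assumes a text of at least two symbols.

```python
from collections import Counter

def index_of_coincidence(text):
    """phi(T) = sum_i N_i (N_i - 1) / (N (N - 1))."""
    n = len(text)
    counts = Counter(text)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

# "cabbage": one a-twin and one b-twin among 7*6/2 = 21 position pairs
print(index_of_coincidence("cabbage"))  # 2/21, about 0.095
```

Because only the multiset of counts enters the formula, renaming the symbols by a simple substitution leaves the value unchanged.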

Typical coincidence values

consider a stochastic source Q that emits an infinite stream ∈ Σ^ω

symbol i is emitted with probability p_i

if s, t are “independent positions” in the infinite stream, the probability of (s, t) being an i-twin is p_i^2

- if Q is a natural language source, the latter is certainly true if s, t are sufficiently far apart

- in this case the statement is true for the vast majority of position pairs

in general we conclude that the probability of a twin in positions s, t is (roughly)

$$\sum_{i=0}^{m-1} p_i^2 =: \kappa(Q)$$

(kappa-value of the source language)

if T is a long enough typical finite text sample (prefix of an infinite emission from Q), we expect by the law of large numbers

ϕ(T) ≈ κ(Q)

Properties of kappa

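The minimal value of $\kappa(p_0, \ldots, p_{m-1}) := \sum_{i} p_i^2$ (p = (p_0, ..., p_{m−1}) being a probability distribution) is κ_min = 1/m and is attained only for the uniform distribution p_0 = ... = p_{m−1} = 1/m.

in turn: kappa is the larger the more “uneven” p is

Proof.

for i ∈ {0, ..., m − 1} let p_i = 1/m − ε_i, where $\sum_i \varepsilon_i = 0$

then $\kappa(p) = \sum_i (\tfrac{1}{m} - \varepsilon_i)^2 = \tfrac{1}{m} - \tfrac{2}{m}\sum_i \varepsilon_i + \sum_i \varepsilon_i^2 = \kappa_{\min} + \sum_i \varepsilon_i^2$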

Relation to entropy

The kappa-value is related to the Renyi entropy, which is a generalization of the (so-called Shannon) entropy mentioned above. More precisely, for α ≥ 0, α ≠ 1

$$H_\alpha(p_0, \ldots, p_{m-1}) = \frac{1}{1-\alpha} \log_2\Big(\sum_{i=0}^{m-1} p_i^\alpha\Big)$$

for α → 1 this converges to the Shannon entropy

obviously κ = 2^{−H_2}

H_2 is also called collision entropy

Properties of phi

reminder:

- κ = quantity associated to a stochastic source. Example: kappa of English ≈ 0.066, kappa of German ≈ 0.076

- ϕ = quantity associated to a specific text sample

Observation: ϕ is invariant under simple substitutions, i.e., if σ : Σ_1 → Σ_2 is a substitution cipher then ϕ(T) = ϕ(σ(T)).

Application to periodic substitution ciphers

let T be a plaintext from a language source Q

let C be a ciphertext obtained from a periodic substitution cipher of unknown period ℓ

for a given candidate period m consider the corresponding decimations C^(0), C^(1), ..., C^(m−1) (“columns” of the ciphertext)

if m is a multiple of ℓ, then each column is a simple substitution ciphertext, hence the phi of each column should be ≈ κ(Q)

otherwise the mixture of substitutions should result in more randomness; we expect phi values closer to κ_min

phi-spectrum: plotting the phi-values for a range of periods m should show significant peaks at multiples of ℓ (see worksheet)
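A plain-Python sketch of the phi-spectrum. The function names are illustrative, and averaging the phi-values of the m columns into a single number per candidate period is an assumption of this sketch (the worksheet may summarize the columns differently). The candidate periods must be small enough that at least one column has two or more symbols.

```python
from collections import Counter

def index_of_coincidence(text):
    n = len(text)
    return sum(c * (c - 1) for c in Counter(text).values()) / (n * (n - 1))

def decimation(seq, m, k):
    """k-th decimation for period m: s_k, s_(k+m), s_(k+2m), ..."""
    return seq[k::m]

def phi_spectrum(cipher, max_period):
    """Average phi-value of the columns for each candidate period m."""
    spectrum = {}
    for m in range(1, max_period + 1):
        columns = [decimation(cipher, m, k) for k in range(m)]
        values = [index_of_coincidence(col) for col in columns if len(col) > 1]
        spectrum[m] = sum(values) / len(values)
    return spectrum

# artificial sequence of period 2: the peaks appear at m = 2 and m = 4
for m, phi in phi_spectrum("AB" * 20, 4).items():
    print(m, round(phi, 3))
```

On a real ciphertext the peaks are less extreme: at multiples of the true period the values approach κ(Q), elsewhere they stay closer to κ_min.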

1st and 2nd kind twins

if it is not possible or too time-consuming to plot the phi-spectrum, we can at least estimate the magnitude of ℓ

basic idea: compare ϕ(C) to ϕ(C^(i)), where C^(i) is one of the period-ℓ decimations (“columns”), for the correct period ℓ

ℓ is unknown to Eve, but in the beginning of the consideration we need to have it in the formulas

recall that ϕ(C) is the relative frequency of twins in the ciphertext of length N

we consider two kinds of twins:

- 1st kind: same symbol in positions s, t of the same column

- 2nd kind: same symbol in positions s, t of different columns

Expected numbers of twins of either kind

in each column we have ≈ N/ℓ symbols

this means there are ≈ $N \cdot (N/\ell - 1)/2 = \frac{N(N-\ell)}{2\ell}$ position pairs s, t from the same column

the expected number of 1st kind twins is

$Z_1 = \kappa(Q) \cdot \frac{N(N-\ell)}{2\ell}$

on the other hand there are ≈ $N \cdot (N - N/\ell)/2 = \frac{N^2(\ell-1)}{2\ell}$ position pairs from different columns

therefore the expected number of 2nd kind twins is about

$Z_2 = \kappa_{\min} \cdot \frac{N^2(\ell-1)}{2\ell}$

reasoning: while 1st kind twins occur systematically (according to the source statistics), 2nd kind twins are totally random events
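The two pair counts can be confirmed by brute force for small parameters. The sketch below assumes that the period divides N, in which case the approximations above are exact; the function name `count_pairs` is chosen for this illustration.

```python
from itertools import combinations

def count_pairs(n, period):
    """Count position pairs s < t in the same column and in different columns."""
    same = diff = 0
    for s, t in combinations(range(n), 2):
        if s % period == t % period:
            same += 1
        else:
            diff += 1
    return same, diff

N, period = 60, 5
same, diff = count_pairs(N, period)
print(same, N * (N - period) // (2 * period))       # same-column pairs
print(diff, N * N * (period - 1) // (2 * period))   # different-column pairs
print(same + diff, N * (N - 1) // 2)                # all position pairs
```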

Sinkov’s formula

recall that ϕ(C ) is the relative frequency of twins in C

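there is a total of N(N − 1)/2 position pairs s, t, hence if the text is long enough, we expect

$\varphi(C) \approx \frac{Z_1 + Z_2}{N(N-1)/2}$

substituting the above expected values for $Z_1, Z_2$ and solving for $\ell$ we find after some lines of computing:

$\ell \approx \frac{(\kappa(Q) - \kappa_{\min})\, N}{(N-1)\, \varphi(C) - \kappa_{\min} N + \kappa(Q)}$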

Abraham Sinkov (1907-1998) was a co-worker of Friedman
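Sinkov's formula translates into a one-line estimator. The sketch below assumes the 26-letter alphabet (so that the minimal kappa is 1/26) and uses the kappa of English quoted earlier; the function and parameter names are chosen for this illustration. As a consistency check it builds the expected ϕ(C) for a known period and recovers that period.

```python
def sinkov_period_estimate(phi_c, n, kappa_q, kappa_min=1 / 26):
    """Estimate the period from phi(C), the text length n and kappa(Q)."""
    numerator = (kappa_q - kappa_min) * n
    denominator = (n - 1) * phi_c - kappa_min * n + kappa_q
    return numerator / denominator

# Round trip: expected phi(C) for a known period, then the estimate.
N, period = 1000, 7
kappa_english, kappa_min = 0.076, 1 / 26
z1 = kappa_english * N * (N - period) / (2 * period)    # 1st kind twins
z2 = kappa_min * N * N * (period - 1) / (2 * period)    # 2nd kind twins
phi_c = (z1 + z2) / (N * (N - 1) / 2)

print(sinkov_period_estimate(phi_c, N, kappa_english))  # close to 7
```

On real ciphertexts the estimate only indicates the magnitude of the period, since ϕ(C) fluctuates around its expected value.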

Exercise 13

1 Take some text samples from Project Gutenberg and compute their coincidence index (after encoding them into the standard alphabet A-Z). Group the results by language or by author.

2 Do the same for typical source code files in the C or Java programming language compared to typical HTML files. Take care of the appropriate alphabet!


Kohel’s example

we consider the ciphertext from Kohel’s book

it is too short to estimate the period by Sinkov’s formula or to plot the spectrum, so Kasiski’s method has to be applied, and this gives period 11 (read in the book how this is done)

ciphertext arranged in blocks of length 11:

. 1 2 3 4

1 OOEXQGHXINM FRTRIFSSMZR WLYOWTTWTJI WMOBEDAXHVH

2 SFTRIQKMENX ZPNQWMCVEJT WJTOHTJXWYI FPSVIWEMNUV

4 WHMCXZTCLFS CVNDLWTENUH SYKVCTGMGYX SYELVAVLTZR

5 VHRUHAGICKI VAHORLWSUNL GZECLSSSWJL SKOGWVDXHDE

6 CLBBMYWXHFA OVUVHLWCSYE VVWCJGGQFFV EOAZTQHLONX

7 GAHOGDTERUE QDIDLLWCMLG ZJLOEJTVLZK ZAWRIFISUEW

8 WLIXKWNISKL QZHKHWHLIEI KZORSOLSUCH AZAIQACIEPI

9 KIELPWHWEUQ SKELCDDSKZR YVNDLWTMNKL WSIFMFVHAPA

0 ZLNSRVTEDEM YOTDLQUEIIM EWEBJWRXSYE VLTRVGJKHYI

1 SCYCPWTTOEW ANHDPWHWEPI KKODLKIEYRP DKAIWSGINZK

2 ZASDSKTITZP DPSOILWIERR VUIQLLHFRZK ZADKCKLLEEH

3 JLAWWVDWHFA LOEOQW

Counting frequent symbols in the decimations

decimations are rather short, so the “most frequent is e” trick may not work in every decimation . . .

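frequent symbols per decimation (original table columns: 9, 8, 7, 6, 5 occurrences; the symbols of each row are listed in that column order):

decimation  symbols
0           S W Z V
1           L A
2           F T
3           D O R
4           L I W
5           W L
6           T H W
7           I S X E
8           E H
9           Z E Y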

see published worksheet for further analysis steps
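Counting the frequent symbols of each decimation can be automated. The following sketch is not the worksheet code; the function name `frequent_symbols` and the threshold parameter `min_count` are assumptions, and the sample input is just the first line of the ciphertext above.

```python
from collections import Counter

def frequent_symbols(ciphertext, period, min_count):
    """For each decimation list the (symbol, count) pairs with count >= min_count."""
    text = "".join(ciphertext.split())      # drop the blanks between blocks
    table = {}
    for i in range(period):
        counts = Counter(text[i::period])
        table[i] = [(s, c) for s, c in counts.most_common() if c >= min_count]
    return table

sample = "OOEXQGHXINM FRTRIFSSMZR WLYOWTTWTJI WMOBEDAXHVH"
for i, symbols in frequent_symbols(sample, 11, 2).items():
    print(i, symbols)
```

Applied to the full ciphertext with `min_count=5`, this yields a table of the same kind as the one above.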

Exercise 14

1 Program a function in Sage that accepts a string from the standard alphabet and returns the relative amount of text that the string symbols typically contribute to English text. E.g., the string ETAOINRS (= heavy vowels and heavy consonants) would together contribute about 60% to the text, while JKQXZ would contribute less than 2%.

2 Using that function complete the Vigenere cipher analysis started in the published worksheet.


Brute-force attacks

recall: the system is known to the attacker

if the key space is small, the attacker can just try all keys

at least for a human it is easy (but time-consuming) to distinguish a typical natural language sample from random nonsense strings

see published worksheet

we will now study an easy approach to ranking candidate decipherments
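For a shift cipher the key space has only 26 elements, so trying all keys is trivial. The sketch below uses plain Python rather than the Sage classes from the worksheet; `shift_decrypt` and `brute_force_shift` are names chosen here.

```python
A = ord("A")

def shift_decrypt(ciphertext, key):
    """Undo a shift by key positions on upper-case A-Z text."""
    return "".join(chr((ord(ch) - A - key) % 26 + A) for ch in ciphertext)

def brute_force_shift(ciphertext):
    """Trial decipherments for all 26 keys."""
    return {key: shift_decrypt(ciphertext, key) for key in range(26)}

ciphertext = shift_decrypt("ATTACKATDAWN", -3)   # shifting by -(-3) encrypts
for key, trial in brute_force_shift(ciphertext).items():
    print(key, trial)
```

A human reader spots the meaningful line at once; a rank function is what lets a program make the same decision.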

Squared differences

let m be a plaintext of length N from a natural language source with symbol probabilities $p = (p_a)_{a \in A}$

let the ciphertext c be the result of encrypting m by some mono- or polyalphabetic substitution cipher

let $m_K := D(c, K)$ be a trial decipherment of c by candidate key K

further let $N(a, m_K)$ be the number of a’s in $m_K$ and $q_a := \frac{N(a, m_K)}{N}$ the corresponding empirical probability

if K was the correct key, we expect

N · qa ≈ N · pa

for every symbol a

the @squared differences rank@ (residual sum of squares) of K is

$r_{RSS}(K) := N^2 \sum_{a \in A} (q_a - p_a)^2$

it quantifies the deviation from the expected situation

in terms of statistical estimators it is related to the mean squared error (MSE) of an estimation
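The squared differences rank is easy to compute from symbol counts, because $N^2 \sum_a (q_a - p_a)^2 = \sum_a (N(a, m_K) - N \cdot p_a)^2$. The sketch below uses a made-up three-letter source for readability; the function name `rss_rank` and the toy probabilities are assumptions of this example.

```python
from collections import Counter

def rss_rank(candidate, p):
    """Squared differences rank of a trial decipherment.

    p maps each alphabet symbol to its source probability; symbols of the
    candidate that do not occur in p are ignored.
    """
    n = len(candidate)
    counts = Counter(candidate)
    return sum((counts[a] - n * p[a]) ** 2 for a in p)

toy_source = {"A": 0.5, "B": 0.3, "C": 0.2}
print(rss_rank("AAAAABBBCC", toy_source))   # counts match the source
print(rss_rank("CCCCCBBBAA", toy_source))   # A and C swapped: larger rank
```

Smaller values mean a better fit, so candidate keys are sorted by increasing rank.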

Other rank functions

@Chi-square test statistic@

$r_{\chi^2}(K) := \sum_{a \in A} \frac{(N(a, m_K) - N \cdot p_a)^2}{N \cdot p_a}$
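The chi-square rank can be combined with a key search, sketched here for a shift cipher. The letter percentages below are approximate, commonly quoted values for English and are an assumption of this example (the lecture does not fix a table); all function names are chosen for illustration.

```python
from collections import Counter

# Approximate English letter frequencies in percent (assumed values).
ENGLISH_PERCENT = {
    "A": 8.17, "B": 1.49, "C": 2.78, "D": 4.25, "E": 12.70, "F": 2.23,
    "G": 2.02, "H": 6.09, "I": 6.97, "J": 0.15, "K": 0.77, "L": 4.03,
    "M": 2.41, "N": 6.75, "O": 7.51, "P": 1.93, "Q": 0.10, "R": 5.99,
    "S": 6.33, "T": 9.06, "U": 2.76, "V": 0.98, "W": 2.36, "X": 0.15,
    "Y": 1.97, "Z": 0.07,
}
TOTAL = sum(ENGLISH_PERCENT.values())
P = {a: v / TOTAL for a, v in ENGLISH_PERCENT.items()}  # normalised to sum 1

def chi_square_rank(candidate, p=P):
    """Chi-square statistic of the symbol counts against the source p."""
    n = len(candidate)
    counts = Counter(candidate)
    return sum((counts[a] - n * p[a]) ** 2 / (n * p[a]) for a in p)

def shift_decrypt(ciphertext, key):
    """Undo a shift by key positions on upper-case A-Z text."""
    a = ord("A")
    return "".join(chr((ord(ch) - a - key) % 26 + a) for ch in ciphertext)

def rank_shift_keys(ciphertext):
    """All 26 shift keys, best (smallest chi-square rank) first."""
    return sorted(
        range(26),
        key=lambda k: chi_square_rank(shift_decrypt(ciphertext, k)),
    )

plain = ("SUPPOSETHATWEFIRSTENCODEAMESSAGEBYPURGINGALLNONALPHABETIC"
         "CHARACTERSEGNUMBERSSPACESANDPUNCTUATION")
cipher = shift_decrypt(plain, -7)        # encrypt with key 7
print(rank_shift_keys(cipher)[:3])       # best three candidate keys
```

Compared with the squared differences rank, dividing by $N \cdot p_a$ gives deviations on rare letters more weight.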

Exercise 15

Brute force deciphering is currently implemented in Sage only for shift and affine ciphers. In principle it is possible to implement it also for Vigenere ciphers (of known key length), but the full keyspace is too large.

1 Mount a “structured” brute force attack on Vigenere ciphers: run trial decipherments for all keys in a given set of keys (it is useful to read in the Sage documentation about the set constructor).

2 Combine this brute force search with an appropriate rank function.


Some ideas

transposition ciphers don’t change the symbol frequencies and are therefore easy to distinguish from substitutions

unfortunately there is no easy method to determine the block length

but transpositions change digram (and in general n-gram) frequencies: for example, q is almost always followed by u in English, and this pattern may be destroyed by transpositions

by “anagramming” one tries to rearrange the first few letters until some meaningful words occur (digram statistics help)

large blocks require a lot of memory for Alice and Bob as well, so chances are that the block length is small

guess an appropriate block length and apply the permutation found for the first block to the later blocks as well, until something meaningful occurs

automated versions:

- dictionary look-up
- MSE-ranking of candidate decipherments
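The dictionary look-up variant can be sketched as follows for a guessed, small block length. Everything here is illustrative: the tiny word list, the function names, and the convention that output position i of a block takes the symbol at input position perm[i] are assumptions of this example, and the text length is assumed to be a multiple of the block length.

```python
from itertools import permutations

def apply_block_permutation(text, perm):
    """Rearrange every block: output position i takes input position perm[i]."""
    b = len(perm)
    if len(text) % b:
        raise ValueError("text length must be a multiple of the block length")
    return "".join(text[s + j] for s in range(0, len(text), b) for j in perm)

def invert(perm):
    """Inverse permutation, which undoes apply_block_permutation."""
    inv = [0] * len(perm)
    for i, j in enumerate(perm):
        inv[j] = i
    return tuple(inv)

def dictionary_score(text, words):
    """Number of occurrences of dictionary words in the text."""
    return sum(text.count(w) for w in words)

def best_decipherments(ciphertext, block_length, words, top=3):
    """Try all block permutations and keep the best-scoring trials."""
    candidates = []
    for perm in permutations(range(block_length)):
        trial = apply_block_permutation(ciphertext, perm)
        candidates.append((dictionary_score(trial, words), perm, trial))
    candidates.sort(key=lambda c: -c[0])
    return candidates[:top]

WORDS = ["THE", "QUICK", "FOX", "OVER", "LAZY", "DOG"]
plain = "THEQUICKBROWNFOXJUMPSOVERTHELAZYDOGS"
cipher = apply_block_permutation(plain, (2, 0, 3, 1))
for score, perm, trial in best_decipherments(cipher, 4, WORDS):
    print(score, perm, trial)
```

The number of permutations grows as the factorial of the block length, so this exhaustive search is feasible only for small blocks; for larger ones, anagramming guided by digram statistics narrows the search.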
