Date post: 01-Feb-2021
  • RSA Implementation

    Çetin Kaya Koç

    Oregon State University

    1

  • Contents:

    • Exponentiation heuristics

    • Multiplication algorithms

    • Computation of GCD and Inverse

    • Chinese remainder algorithm

    • Primality testing

    2

  • RSA Encryption & Decryption

    $C := M^e \bmod n$

    $M := C^d \bmod n$

    where

    n : Modulus ($\log_2 n = k \ge 512$)

    e : Encryption exponent

    d : Decryption exponent

    M : Plaintext ($0 \le M \le n - 1$)

    C : Cryptotext ($0 \le C \le n - 1$)

    3

  • Modular Exponentiation

    We do NOT compute

    $C := M^e \pmod{n}$

    by first computing

    $M^e$

    and then computing the remainder

    $C := (M^e) \bmod n$

    Temporary results must be reduced modulo n at each step of the exponentiation.

    4

  • Exponentiation

    $M^{15}$

    • How many multiplications are needed?

    Naive Answer:

    $M \to M^2 \to M^3 \to M^4 \to M^5 \to \cdots \to M^{15}$

    Requires 14 multiplications.

    Binary Method:

    $M \to M^2 \to M^3 \to M^6 \to M^7 \to M^{14} \to M^{15}$

    Requires 6 multiplications.

    5

  • Let k be the number of bits of e, i.e.,

    $k = 1 + \lfloor \log_2 e \rfloor$

    $e = (e_{k-1} e_{k-2} \cdots e_1 e_0) = \sum_{i=0}^{k-1} e_i 2^i$

    for $e_i \in \{0, 1\}$.

    Binary Method

    Input: M, e, n.

    Output: C := M^e mod n.

    Step 1. if e_{k−1} = 1 then C := M else C := 1
    Step 2. for i = k − 2 downto 0
            2a. C := C · C (mod n)
            2b. if e_i = 1 then C := C · M (mod n)
    Step 3. return C
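The binary method above can be sketched in Python as follows. This is a minimal illustration, assuming e ≥ 1 so that the leading bit is 1; the function name `binary_modexp` is hypothetical and not part of the slides.

```python
def binary_modexp(M, e, n):
    """Left-to-right binary method: returns M**e mod n, assuming e >= 1."""
    bits = bin(e)[2:]          # e_{k-1} ... e_0, most significant bit first
    C = M % n                  # Step 1: the leading bit e_{k-1} is 1
    for bit in bits[1:]:       # Step 2: i = k-2 down to 0
        C = (C * C) % n        # 2a: square, reduced modulo n
        if bit == "1":
            C = (C * M) % n    # 2b: multiply, reduced modulo n
    return C
```

Every intermediate value stays below n², which is the point of reducing at each step rather than once at the end.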

    6

  • Example:

    e = 250 = (11111010)_2, thus k = 8.

    Initially, C = M since e_{k−1} = e_7 = 1.

    i   e_i   Step 2a               Step 2b
    7   1     M                     M
    6   1     (M)^2 = M^2           M^2 · M = M^3
    5   1     (M^3)^2 = M^6         M^6 · M = M^7
    4   1     (M^7)^2 = M^14        M^14 · M = M^15
    3   1     (M^15)^2 = M^30       M^30 · M = M^31
    2   0     (M^31)^2 = M^62       M^62
    1   1     (M^62)^2 = M^124      M^124 · M = M^125
    0   0     (M^125)^2 = M^250     M^250

    The number of multiplications: 7 + 5 = 12.

    7

  • The binary method requires:

    • Squarings: k − 1.

    (Step 2a)

    • Multiplications: The number of 1s in the binary expansion of e, excluding the MSB.

    (Step 2b)

    The total number of multiplications:

    Maximum: (k − 1) + (k − 1) = 2(k − 1)

    Minimum: (k − 1) + 0 = k − 1

    Average: (k − 1) + (1/2)(k − 1) = 1.5(k − 1)

    8

  • By scanning the bits of e

    2 at a time: quaternary method

    3 at a time: octal method

    etc.

    m at a time: m-ary method

    Consider the quaternary method:

    e = 250 = 11 11 10 10

    • Some preprocessing required.

    • At each step 2 squarings performed.

    9

  • Example: Quaternary method.

    bits   j   M^j
    00     0   1
    01     1   M
    10     2   M · M = M^2
    11     3   M^2 · M = M^3

    e = 250 = 11 11 10 10

    bits   Step 2a               Step 2b
    11     M^3                   M^3
    11     (M^3)^4 = M^12        M^12 · M^3 = M^15
    10     (M^15)^4 = M^60       M^60 · M^2 = M^62
    10     (M^62)^4 = M^248      M^248 · M^2 = M^250

    The number of multiplications: 2 + 6 + 3 = 11.

    10

  • Example: Octal method.

    bits   j   M^j
    000    0   1
    001    1   M
    010    2   M · M = M^2
    011    3   M^2 · M = M^3
    100    4   M^3 · M = M^4
    101    5   M^4 · M = M^5
    110    6   M^5 · M = M^6
    111    7   M^6 · M = M^7

    e = 250 = 011 111 010

    bits   Step 2a               Step 2b
    011    M^3                   M^3
    111    (M^3)^8 = M^24        M^24 · M^7 = M^31
    010    (M^31)^8 = M^248      M^248 · M^2 = M^250

    The number of multiplications: 6 + 6 + 2 = 14.

    (Compute only M^2 and M^7: 4 + 6 + 2 = 12)
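The quaternary and octal examples are instances of the m-ary method with m = 2^d. The sketch below is one possible Python rendering; the name `mary_modexp` and the parameter `d` are illustrative assumptions. Unlike the worked examples, it starts from C = 1 and also squares before the first digit, which costs d harmless squarings of 1 but keeps the loop uniform.

```python
def mary_modexp(M, e, n, d=2):
    """m-ary method with m = 2**d: scan e in d-bit digits, MSB first."""
    m = 1 << d
    # Preprocessing: table[j] = M**j mod n for j = 0 .. m-1
    table = [1 % n, M % n]
    for j in range(2, m):
        table.append((table[j - 1] * M) % n)
    # Split e into base-m digits, most significant digit first
    digits = []
    while e > 0:
        digits.append(e & (m - 1))
        e >>= d
    digits.reverse()
    C = 1 % n
    for digit in digits:
        for _ in range(d):              # Step 2a: d squarings
            C = (C * C) % n
        if digit:                       # Step 2b: multiply by M**digit
            C = (C * table[digit]) % n
    return C
```

With d = 2 this follows the quaternary example and with d = 3 the octal one; the table is computed in full here, without the reduction in preprocessing multiplications noted in the parenthetical remark.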

    11

  • Assume $2^d = m$ and $k/d$ is an integer.

    The average number of multiplications plus squarings required by the m-ary method:

    • Preprocessing Multiplications:

    $m - 2 = 2^d - 2$

    • Squarings:

    $(k/d - 1) \cdot d = k - d$

    • Multiplications:

    $\frac{m-1}{m} \cdot (k/d - 1) = (1 - 2^{-d}) \cdot (k/d - 1)$

    There is an optimum d for every k.

    12

  • Average number of multiplications

    k      BM     MM     d∗     Savings %
    8      11     10     2      9.1
    16     23     21     2      8.6
    32     47     43     2,3    8.5
    64     95     85     3      10.5
    128    191    167    3,4    12.6
    256    383    325    4      15.1
    512    767    635    5      17.2
    1024   1535   1246   5      18.8
    2048   3071   2439   6      20.6

    (BM: binary method, MM: m-ary method, d∗: optimum d)

    13

  • Reduction in the number of preprocessing

    multiplications:

    Consider the following exponent for k = 16

    and d = 4

    1011 0011 0111 1000

    which implies that we need to compute $M^w \pmod{n}$ for only w = 3, 7, 8, 11.

    M^2 = M · M
    M^3 = M^2 · M
    M^4 = M^2 · M^2
    M^7 = M^3 · M^4
    M^8 = M^4 · M^4
    M^11 = M^8 · M^3

    This requires 6 multiplications. Computing all of the exponent values would require 16 − 2 = 14 preprocessing multiplications.

    14

  • Sliding Window Techniques

    Based on adaptive (data dependent) m-ary partitioning of the exponent.

    • Constant length nonzero windows

    Rule: Partition the exponent into zero words

    of any length and nonzero words of length

    d.

    • Variable length nonzero windows

    Rule: Partition the exponent into zero words

    of length at least q and nonzero words of

    length at most d.

    15

  • Constant length nonzero windows:

    Example: For d = 3, we partition

    e = 3665 = (111001010001)_2

    as

    111 00 101 0 001

    Average number of multiplications

    k      m-ary   d∗   CLNW   d∗   %
    128    167     4    156    4    6.6
    256    325     4    308    5    5.2
    512    635     5    607    5    4.4
    1024   1246    5    1195   6    4.1
    2048   2439    6    2360   7    3.2

    16

  • First compute $M^j$ for odd j ∈ [1, m − 1].

    bits   j   M^j
    001    1   M
    010    2   M · M = M^2
    011    3   M · M^2 = M^3
    101    5   M^3 · M^2 = M^5
    111    7   M^5 · M^2 = M^7

    3665 = 111 00 101 0 001

    bits   Step 2a                  Step 2b
    111    M^7                      M^7
    00     (M^7)^4 = M^28           M^28
    101    (M^28)^8 = M^224         M^224 · M^5 = M^229
    0      (M^229)^2 = M^458        M^458
    001    (M^458)^8 = M^3664       M^3664 · M^1 = M^3665
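A sketch of the constant length nonzero window (CLNW) technique in Python follows. It assumes the partition is formed by scanning the exponent from the least significant bit, so that every nonzero window ends in a 1 and only odd powers are needed; this reproduces the partition 111 00 101 0 001 shown above. The function names are hypothetical.

```python
def clnw_partition(e, d):
    """Split e into (value, length) windows, most significant window first.

    Scanning starts at the least significant bit: a run of 0s forms a zero
    window of any length, and a 1 starts a nonzero window of exactly d bits.
    """
    windows = []
    while e > 0:
        if e & 1:
            windows.append((e & ((1 << d) - 1), d))
            e >>= d
        else:
            length = 0
            while (e & 1) == 0:
                e >>= 1
                length += 1
            windows.append((0, length))
    windows.reverse()
    return windows


def clnw_modexp(M, e, n, d=3):
    """Sliding-window exponentiation using precomputed odd powers of M."""
    M2 = (M * M) % n
    odd_powers = {1: M % n}
    for j in range(3, 1 << d, 2):           # M^3, M^5, ..., M^(2^d - 1)
        odd_powers[j] = (odd_powers[j - 2] * M2) % n
    C = 1 % n
    for value, length in clnw_partition(e, d):
        for _ in range(length):             # one squaring per bit of the window
            C = (C * C) % n
        if value:                           # nonzero windows are always odd
            C = (C * odd_powers[value]) % n
    return C
```

For e = 3665 and d = 3, `clnw_partition` returns the windows 7, 0 (two bits), 5, 0 (one bit), 1, matching the table above.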

    17

  • Variable length nonzero windows:

    Example: d = 5 and q = 2.

    101 0 11101 00 101

    10111 000000 1 00 111 000 1011

    Example: d = 10 and q = 4.

    1011011 0000 11 0000

    11110111 00 1111110101 0000 11011

    Average number of multiplications

           m-ary         VLNW for q∗
    k      d∗   T/k      d∗   T2/k     %
    128    4    1.31     4    1.20     7.8
    256    4    1.27     4    1.18     6.8
    512    5    1.24     5    1.16     6.4
    1024   5    1.22     6    1.15     5.8
    2048   6    1.19     6    1.13     5.0

    18

  • The Factor Method

    The factor method is based on factorization of the exponent e = rs where r is the smallest prime factor of e and s > 1.

    We compute $M^e$ by first computing $M^r$ and then raising this value to the sth power.

    $(M^r)^s = M^{rs} = M^e$

    If e is prime, we first compute $M^{e-1}$ then multiply this quantity by M.

    19

  • Factor method: 55 = 5 · 11.

    Compute $M \to M^2 \to M^4 \to M^5$;

    Assign $y := M^5$;

    Compute $y \to y^2$;

    Assign $z := y^2$;

    Compute $z \to z^2 \to z^4 \to z^5$;

    Compute $z^5 \to (z^5 y) = y^{11} = M^{55}$.

    Total: 8 multiplications.

    Binary method: e = 55 = (110111)_2

    5 + 4 = 9 multiplications.

    20

  • The Power Tree Method

    Consider the node e of the kth level, from left

    to right.

    Construct the (k+1)st level by attaching below

    node e the nodes

    $e + a_1, e + a_2, e + a_3, \ldots, e + a_k$

    where

    $a_1, a_2, a_3, \ldots, a_k$

    is the path from the root of the tree to e.

    (Note: $a_1 = 1$ and $a_k = e$)

    Discard any duplicates that have already appeared in the tree.

    21

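  • Power tree (figure; nodes listed level by level, left to right):

    Level 0: 1
    Level 1: 2
    Level 2: 3 4
    Level 3: 5 6 8
    Level 4: 7 10 9 12 16
    Level 5: 14 11 13 15 20 18 24 17 32
    Level 6 (partial): 19 21 28 22 23 26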

    22

  • Computation using power tree:

    Find e in the power tree. The sequence of exponents that occurs in the computation of $M^e$ is found on the path from the root to e.

    Example: e = 23 requires 6 multiplications.

    $M \to M^2 \to M^3 \to M^5 \to M^{10} \to M^{13} \to M^{23}$

    Since 23 = (10111)_2, the binary method requires 4 + 3 = 7 multiplications.

    Since 23 − 1 = 22 = 2 · 11, the factor method requires 1 + 5 + 1 = 7 multiplications:

    $M \to M^2$ (:= y)

    $y \to y^2 \to y^4 \to y^8 \to y^{10} \to y^{11}$

    $y^{11} \to M^{23}$
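The construction rule for the power tree can be sketched as a level-by-level search that records each node's parent. This is an illustrative Python version under the rule stated on the earlier slide (children e + a_1, ..., e + a_k in that order, duplicates discarded); the name `power_tree_path` is hypothetical.

```python
def power_tree_path(e):
    """Return the root-to-e path in the power tree, built level by level."""
    parent = {1: 0}                    # 0 marks "no parent" for the root
    level = [1]
    while e not in parent:
        next_level = []
        for node in level:             # left to right within the level
            path = []                  # a_1 = 1, ..., a_k = node
            x = node
            while x:
                path.append(x)
                x = parent[x]
            path.reverse()
            for a in path:
                child = node + a
                if child not in parent:    # discard duplicates
                    parent[child] = node
                    next_level.append(child)
        level = next_level
    chain = []
    x = e
    while x:
        chain.append(x)
        x = parent[x]
    return chain[::-1]
```

For e = 23 the returned path is 1, 2, 3, 5, 10, 13, 23, i.e. the six multiplications of the example. The whole tree down to the level of e is built, so this is practical only for small exponents.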

    23

  • Addition Chains

    Consider a sequence of integers

    $a_0, a_1, a_2, \ldots, a_r$

    with $a_0 = 1$ and $a_r = e$. The sequence is constructed in such a way that for all $k \ge 1$ there exist indices $i, j < k$ such that

    $a_k = a_i + a_j$

    The length of the chain is r.

    A short chain for a given e implies an efficient algorithm for computing $M^e$.

    Example: e = 55

    BM:  1 2 3 6 12 13 26 27 54 55
    QM:  1 2 3 6 12 13 26 52 55
    FM:  1 2 4 5 10 20 40 50 55
    PTM: 1 2 3 5 10 11 22 44 55
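The definition can be checked mechanically. The short Python helper below (the name `is_addition_chain` is an assumption for illustration) verifies that a sequence starts at 1, ends at e, and that every later element is the sum of two earlier ones, possibly the same one twice.

```python
def is_addition_chain(chain, e):
    """True if chain is a valid addition chain for e."""
    if not chain or chain[0] != 1 or chain[-1] != e:
        return False
    for k in range(1, len(chain)):
        earlier = chain[:k]
        # a_k must equal a_i + a_j for some earlier a_i, a_j
        if not any(chain[k] - x in earlier for x in earlier):
            return False
    return True


chains = {
    "BM":  [1, 2, 3, 6, 12, 13, 26, 27, 54, 55],
    "QM":  [1, 2, 3, 6, 12, 13, 26, 52, 55],
    "FM":  [1, 2, 4, 5, 10, 20, 40, 50, 55],
    "PTM": [1, 2, 3, 5, 10, 11, 22, 44, 55],
}
lengths = {name: len(c) - 1 for name, c in chains.items()}
```

The chain length r is the number of elements after the initial 1, so the binary-method chain for 55 has length 9 and the other three have length 8.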

    24

  • • Finding the shortest addition chain is NP-complete.

    • Upper-bound is given by binary method:

    $\lfloor \log_2 e \rfloor + H(e) - 1$

    H(e) is the Hamming weight of e

    • Lower bound given by Schönhage:

    $\log_2 e + \log_2 H(e) - 2.13$

    • Heuristics: binary, m-ary, adaptive m-ary, sliding windows, power tree, factor.

    • Statistical methods, such as simulated annealing, can be used to produce short addition chains for certain exponents.

    25

  • Vector Addition Chains

    A list of vectors with the following properties:

    • The initial vectors are the unit vectors: [1,0,0], [0,1,0], [0,0,1].

    • Each vector is the sum of two earlier vectors.

    • The last vector is equal to the given vector.

    This problem arises in conjunction with reducing the preprocessing multiplications in the m-ary methods and the sliding window techniques.

    26

  • For example, given the vector [7,15,23], we obtain a vector addition chain as

    [1,0,0] [0,1,0] [0,0,1]

    [0,1,1] [1,1,1] [0,1,2] [1,2,3]

    [1,3,5] [2,4,6] [3,7,11] [4,8,12] [7,15,23]

    which is of length 9 (the three initial unit vectors are not counted).

    An addition sequence is simply an addition chain where the i requested numbers occur somewhere in the chain.

    It has been established that an addition sequence of length r and i requested numbers can be converted to a vector addition chain of length r + i − 1 with dimension i.

    27

  • Addition-Subtraction Chains

    Convert the binary number to a signed-digit

    representation using the digits {0,1,−1} .

    These techniques use the identity

    $2^{i+j-1} + 2^{i+j-2} + \cdots + 2^i = 2^{i+j} - 2^i$

    to collapse a block of 1s in order to obtain a sparse representation of the exponent.

    Example:

    (011110) = $2^4 + 2^3 + 2^2 + 2^1$

    (10001̄0) = $2^5 - 2^1$

    These methods require that $M^{-1} \pmod{n}$ be supplied along with M.

    28

  • Recoding Binary Method

    Input: M, M^{−1}, e, n.
    Output: C := M^e mod n.

    Step 0. Obtain signed-digit recoding d of e.
    Step 1. if d_k = 1 then C := M else C := 1
    Step 2. for i = k − 1 downto 0
            2a. C := C · C (mod n)
            2b. if d_i = 1 then C := C · M (mod n)
                if d_i = 1̄ then C := C · M^{−1} (mod n)
    Step 3. return C

    This algorithm is especially useful for elliptic-curve cryptosystems since the inverse is available at no cost.

    29

  • Example:

    e = 119 = (1110111)_2.

    Binary method: 6 + 5 = 11 multiplications.

    Exponent:          01110111
    Recoded exponent:  10001̄001̄

    d_i   Step 2a                 Step 2b
    1     M                       M
    0     (M)^2 = M^2             M^2
    0     (M^2)^2 = M^4           M^4
    0     (M^4)^2 = M^8           M^8
    1̄     (M^8)^2 = M^16          M^16 · M^{−1} = M^15
    0     (M^15)^2 = M^30         M^30
    0     (M^30)^2 = M^60         M^60
    1̄     (M^60)^2 = M^120        M^120 · M^{−1} = M^119

    The number of multiplications: 7 + 2 = 9.
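The slides do not say which signed-digit recoding is used. The sketch below assumes the non-adjacent form (NAF), one common choice, which happens to reproduce the recoded exponent of this example. The names `naf` and `recoding_modexp` are hypothetical, and `pow(M, -1, n)` (Python 3.8+) stands in for the externally supplied inverse.

```python
def naf(e):
    """Non-adjacent form of e as digits in {-1, 0, 1}, most significant first."""
    digits = []
    while e > 0:
        if e & 1:
            d = 2 - (e % 4)    # +1 if e = 1 (mod 4), -1 if e = 3 (mod 4)
            e -= d
        else:
            d = 0
        digits.append(d)
        e >>= 1
    return digits[::-1]


def recoding_modexp(M, e, n):
    """Recoding binary method; requires gcd(M, n) = 1 so that M^-1 exists."""
    M_inv = pow(M, -1, n)              # assumed to be supplied in the slides
    C = 1 % n
    for d in naf(e):
        C = (C * C) % n                # Step 2a
        if d == 1:
            C = (C * M) % n            # Step 2b, digit 1
        elif d == -1:
            C = (C * M_inv) % n        # Step 2b, digit -1
    return C
```

For e = 119, `naf` yields 1 0 0 0 −1 0 0 −1, i.e. 128 − 8 − 1.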

    30

  • Modular Multiplication

    Computation of y = a · b (mod n) can be performed by using:

    • Multiply and then divide:

    Multiply: x := a · b (2k-bit number)

    Divide: y := x % n (k-bit number)

    • Blakley’s method:

    The multiplication steps are interleaved with reduction steps.

    • Montgomery’s method:

    Uses predominantly modulo $2^j$ arithmetic.

    31

  • Multiprecision Multiplication

    $a = (a_{s-1} a_{s-2} \cdots a_0)_W = \sum_{j=0}^{s-1} a_j W^j$

    $b = (b_{s-1} b_{s-2} \cdots b_0)_W = \sum_{j=0}^{s-1} b_j W^j$

    Radix $W = 2^w$, where w is the word size of the computer.

                              a3    a2    a1    a0
    ×                         b3    b2    b1    b0
                              t03   t02   t01   t00
                        t13   t12   t11   t10
                  t23   t22   t21   t20
    +       t33   t32   t31   t30
      t7    t6    t5    t4    t3    t2    t1    t0

    t_ij = b_i · a_j : (Carry, Sum) pairs.

    32

  • a, b : a_i, b_i for i = 0, 1, ..., s − 1

    t : t_i for i = 0, 1, ..., 2s − 1

    Standard Multiplication Algorithm

    for i = 0 to s − 1
    begin
        C := 0
        for j = 0 to s − 1
        begin
            (C, S) := t_{i+j} + a_j · b_i + C
            t_{i+j} := S
        end
        t_{i+j+1} := C
    end
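The standard multiplication algorithm translates almost line for line into Python. In this sketch the operands are little-endian lists of digits and the radix is a parameter (named `radix` here as an assumption, so that the decimal example can be reproduced); on a real machine the radix would be W = 2^w.

```python
def multiply_words(a, b, radix=2 ** 32):
    """Schoolbook product of two s-word numbers given as little-endian lists."""
    s = len(a)
    assert len(b) == s
    t = [0] * (2 * s)
    for i in range(s):
        C = 0
        for j in range(s):
            # inner-product step: (C, S) := t[i+j] + a[j]*b[i] + C
            C, S = divmod(t[i + j] + a[j] * b[i] + C, radix)
            t[i + j] = S
        t[i + s] = C              # after the inner loop, i + j + 1 = i + s
    return t
```

Calling `multiply_words([8, 4, 3], [7, 5, 8], radix=10)` multiplies 348 by 857 digit by digit and returns the digits of 298236, least significant first.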

    33

  • This algorithm requires $s^2 = (k/w)^2$ inner-product steps:

    (C, S) := t_{i+j} + a_j · b_i + C

    in other words, $O(k^2)$ bit operations.

    The variables t_{i+j}, a_j, b_i, C, and S each hold a single-word, or a w-bit number.

    From this operation, we obtain a double-word, or a 2w-bit number since

    $(2^w - 1) + (2^w - 1)(2^w - 1) + (2^w - 1) = 2^{2w} - 1$

    34

  • Example: a · b = 348 · 857

    i  j  Step                (C, S)   Partial t
    0  0  t0 + a0·b0 + C      (0, ∗)   000000
          0 + 8·7 + 0         (5, 6)   000006
       1  t1 + a1·b0 + C
          0 + 4·7 + 5         (3, 3)   000036
       2  t2 + a2·b0 + C
          0 + 3·7 + 3         (2, 4)   000436
                                       002436
    1  0  t1 + a0·b1 + C      (0, ∗)
          3 + 8·5 + 0         (4, 3)   002436
       1  t2 + a1·b1 + C
          4 + 4·5 + 4         (2, 8)   002836
       2  t3 + a2·b1 + C
          2 + 3·5 + 2         (1, 9)   009836
                                       019836
    2  0  t2 + a0·b2 + C      (0, ∗)
          8 + 8·8 + 0         (7, 2)   019236
       1  t3 + a1·b2 + C
          9 + 4·8 + 7         (4, 8)   018236
       2  t4 + a2·b2 + C
          1 + 3·8 + 4         (2, 9)   098236
                                       298236

    35

  • Squaring is easier:

    Squaring is an easier operation than multiplication since half of the single-precision multiplications can be skipped. This is due to the fact that t_ij = a_i · a_j = t_ji.

                              a3    a2    a1    a0
    ×                         a3    a2    a1    a0
                              t03   t02   t01   t00
                        t13   t12   t11   t01
                  t23   t22   t12   t02
    +       t33   t23   t13   t03

                              2t03  2t02  2t01  t00
                        2t13  2t12  t11
                  2t23  t22
    +       t33
      t7    t6    t5    t4    t3    t2    t1    t0

    36

  • Standard Squaring Algorithm

    1. for i = 0 to s − 1
    2.     (C, S) := t_{i+i} + a_i · a_i
    3.     for j = i + 1 to s − 1
    4.         (C, S) := t_{i+j} + 2 · a_j · a_i + C
    5.         t_{i+j} := S
    6.     t_{i+j+1} := C
    7. return (t_{2s−1} t_{2s−2} · · · t_0)

    However, note that the carry-sum pair produced by operation

    (C, S) := t_{i+j} + 2 · a_j · a_i + C

    in Step 4 may be 1 bit longer than a double-word, or a 2w-bit number since

    $(2^w - 1) + 2(2^w - 1)(2^w - 1) + (2^w - 1)$

    is equal to $2^{2w+1} - 2^{w+1}$ and

    $2^{2w} - 1 < 2^{2w+1} - 2^{w+1} < 2^{2w+1} - 1$

    37

  • Recursive Algorithms

    Assuming k = 2h, we decompose a and b into two equal-size parts:

    $a := 2^h a_1 + a_0$
    $b := 2^h b_1 + b_0$

    Then

    $t := a \cdot b$
    $\;\; := (2^h a_1 + a_0)(2^h b_1 + b_0)$
    $\;\; := 2^{2h} a_1 b_1 + 2^h (a_1 b_0 + a_0 b_1) + a_0 b_0$

    Standard Recursive Algorithm

    sra(a, b)

    t0 := sra(a0, b0);

    t1 := sra(a0, b1);

    t2 := sra(a1, b0);

    t3 := sra(a1, b1);

    return (2^{2h} t3 + 2^h (t1 + t2) + t0)

    38

  • The standard recursive algorithm breaks the multiplication of two 2h-bit integers into multiplication of four h-bit integers.

    The number of bit operations is found by solving the recursion

    $T(k) = 4T(k/2) + \alpha k$

    with T(1) = 1. This gives $T(k) = O(k^2)$ bit operations.

    Karatsuba-Ofman algorithm is based on the observation that three h-bit multiplications suffice:

    $t_0 := a_0 \cdot b_0$
    $t_3 := a_1 \cdot b_1$
    $t_1 + t_2 := (a_1 + a_0) \cdot (b_1 + b_0) - t_0 - t_3$

    39

  • Karatsuba-Ofman Recursive Algorithm

    kora(a, b)

    t0 := kora(a0, b0);

    t3 := kora(a1, b1);

    t12 := kora(a1 + a0, b1 + b0) − t0 − t3;

    return (2^{2h} t3 + 2^h t12 + t0)

    The number of bit operations is given by the recursion

    $T(k) = 3T(k/2) + \beta k$

    with T(1) = 1. This gives

    $O(k^{\log_2 3}) = O(k^{1.58})$

    bit operations.

    The Karatsuba-Ofman algorithm is asymptotically faster.

    40

  • However, due to the recursive nature of the

    algorithm, there is a large overhead.

    Karatsuba-Ofman algorithm starts paying off

    as k gets larger. After about k = 100, it starts

    being faster than the regular algorithm.

    We also have the option of stopping at any point during the recursion. For example, we may apply one level of recursion and then compute the required three multiplications using the standard algorithm.
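A compact Python sketch of the Karatsuba-Ofman recursion with an early stopping point is shown below. The cutoff parameter `threshold_bits` is an assumption chosen for illustration, and Python's built-in multiplication stands in for the standard algorithm once operands are small.

```python
def kora(a, b, threshold_bits=64):
    """Karatsuba-Ofman product of non-negative integers a and b."""
    if a.bit_length() <= threshold_bits or b.bit_length() <= threshold_bits:
        return a * b                      # stop recursing on small operands
    h = max(a.bit_length(), b.bit_length()) // 2
    mask = (1 << h) - 1
    a1, a0 = a >> h, a & mask             # a = 2^h * a1 + a0
    b1, b0 = b >> h, b & mask             # b = 2^h * b1 + b0
    t0 = kora(a0, b0, threshold_bits)
    t3 = kora(a1, b1, threshold_bits)
    # the middle term needs only one further multiplication
    t12 = kora(a1 + a0, b1 + b0, threshold_bits) - t0 - t3
    return (t3 << (2 * h)) + (t12 << h) + t0
```

Raising `threshold_bits` makes the recursion stop earlier, which corresponds to the option of applying only a few levels of recursion.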

    41

  • FFT-based Multiplication Algorithm

    The fastest multiplication algorithm is based on the fast Fourier transform. The FFT algorithm is used to multiply polynomials. Multiprecision integers can be considered as polynomials evaluated at the radix.

    For example: $348 = 3x^2 + 4x + 8$ at x = 10.

    Similarly, $857 = 8x^2 + 5x + 7$ at x = 10.

    In order to multiply 348 by 857, we first multiply the polynomials

    $(3x^2 + 4x + 8)(8x^2 + 5x + 7)$

    then evaluate the resulting polynomial

    $24x^4 + 47x^3 + 105x^2 + 68x + 56$

    at the radix x = 10 to obtain the product 348 · 857 = 298236.

    42

  • Let the polynomials a(x) and b(x)

    $a(x) = \sum_{i=0}^{k-1} a_i x^i$ , $b(x) = \sum_{i=0}^{k-1} b_i x^i$

    denote the multiprecision numbers

    $a = (a_{k-1} a_{k-2} \cdots a_0)$

    $b = (b_{k-1} b_{k-2} \cdots b_0)$

    represented in radix W where $a_i$ and $b_i$ are the ‘digits’ with the property $0 \le a_i, b_i \le W - 1$.

    Let the integer l = 2k be a power of 2. Given the primitive lth root of unity ω, the following algorithm computes the product

    $t = (t_{l-1} t_{l-2} \cdots t_0)$

    43

  • FFT-based Multiplication Algorithm

    1. Evaluate $a(\omega^i)$ and $b(\omega^i)$ for i = 0, 1, ..., l − 1 by calling the fast Fourier transform procedure.

    2. Multiply pointwise to obtain

    $\{a(1)b(1), a(\omega)b(\omega), \ldots, a(\omega^{l-1})b(\omega^{l-1})\}$

    3. Interpolate $t(x) = \sum_{i=0}^{l-1} t_i x^i$ by evaluating

    $l^{-1} \sum_{i=0}^{l-1} a(\omega^i) b(\omega^i) x^i$

    on $\{1, \omega^{-1}, \ldots, \omega^{-(l-1)}\}$ by calling the fast Fourier transform procedure.

    4. Return the coefficients

    $(t_{l-1}, t_{l-2}, \ldots, t_0)$

    44

  • Which field to use?

    The complex field is unsuitable for computer implementation since the lth primitive root of unity $e^{2\pi j/l}$ (where $j = \sqrt{-1}$) may be irrational. Such numbers cannot be represented or operated on exactly by computers, which perform finite-precision arithmetic.

    Pollard showed that a finite field in which $l^{-1}$ and a primitive lth root of unity exist can be used. The choice is the Galois field of p elements where p is a prime and l divides p − 1.

    Example:

    $p = 2130706433 = 127 \cdot 2^{24} + 1$. The field GF(p) can be used to compute FFTs of size $l = 2^{24} \approx 10^7$ on a 32-bit machine.
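To make the four steps concrete, here is a small Python sketch of the FFT over GF(p) with the prime from the example. It is an illustration only: the recursive transform is not optimized, and the choice of 3 as the element from which the root of unity ω is derived is an assumption that the code checks at run time (ω must have order exactly l). The result is the list of product coefficients t_i, still to be evaluated at the radix with carries.

```python
P = 2130706433        # 127 * 2**24 + 1, the prime from the example
G = 3                 # assumed: 3 generates a root of unity of order l


def ntt(a, omega):
    """Evaluate the polynomial a at 1, omega, omega^2, ... (len(a) a power of 2)."""
    n = len(a)
    if n == 1:
        return a[:]
    even = ntt(a[0::2], omega * omega % P)
    odd = ntt(a[1::2], omega * omega % P)
    out = [0] * n
    w = 1
    for i in range(n // 2):
        t = w * odd[i] % P
        out[i] = (even[i] + t) % P
        out[i + n // 2] = (even[i] - t) % P   # omega^(i + n/2) = -omega^i
        w = w * omega % P
    return out


def fft_multiply(a_digits, b_digits):
    """Product coefficients of two little-endian digit lists, via GF(P)."""
    l = 1
    while l < len(a_digits) + len(b_digits):
        l *= 2
    omega = pow(G, (P - 1) // l, P)
    assert l == 1 or pow(omega, l // 2, P) == P - 1   # order of omega is l
    fa = ntt(a_digits + [0] * (l - len(a_digits)), omega)   # Step 1
    fb = ntt(b_digits + [0] * (l - len(b_digits)), omega)
    pointwise = [x * y % P for x, y in zip(fa, fb)]         # Step 2
    inv_l = pow(l, -1, P)
    coeffs = ntt(pointwise, pow(omega, -1, P))              # Step 3
    return [c * inv_l % P for c in coeffs]                  # Step 4
```

The coefficients are exact as long as each true coefficient stays below p, which limits the combination of digit size and operand length.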

    45

  • The FFT-based multiplication algorithm was discovered by Schönhage and Strassen. It requires

    $O(k \log k \log\log k)$

    bit operations to multiply two k-bit numbers.

    However, the constant in front of the order function is very high. It starts paying off for numbers with several thousand bits.

    46

  • Computation of the Remainder

    Given t, we compute R which satisfies

    t = Q · n + R

    with R < n. Here t is a 2k-bit number and n is a k-bit number.

    The numbers t and n are positive, and so are the results Q and R.

    Since we are not interested in the quotient, steps of the division algorithm can be simplified.

    Two algorithms are of interest to us:

    • Restoring division

    • Nonrestoring division

    47

  • Restoring Division Algorithm

    R_0 := t
    n := 2^k n
    for i = 1 to k
        R_i := R_{i−1} − n
        if R_i < 0 then R_i := R_{i−1}
        n := n/2
    return R_k

    We give an example of the restoring division algorithm for computing 3019 mod 53, where

    3019 = (101111001011)_2

    53 = (110101)_2

    The result is

    51 = (110011)_2

    48

  • R0     101111 001011   t
    n      110101          subtract
         − 000110          negative rem.
    R1     101111 001011   restore
    n/2     11010 1        subtract
         +  10100 1        positive rem.
    R2      10100 101011   not restore
    n/2      1101 01       subtract
         +   0111 01       positive rem.
    R3       0111 011011   not restore
    n/2       110 101      subtract
         +    000 110      positive rem.
    R4        000 110011   not restore
    n/2        11 0101
    n/2         1 10101
    n/2           110101   subtract
         −        000010   negative rem.
    R5            110011   restore
    R             110011   Final rem.
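The restoring scheme can be sketched in Python as below. Following the worked example, which tries the modulus at every alignment from 2^k·n down to n itself, this sketch performs k + 1 trial subtractions; that loop bound and the name `restoring_mod` are assumptions of the sketch rather than a transcription of the pseudocode.

```python
def restoring_mod(t, n, k):
    """t mod n by restoring division; assumes 2**(k-1) <= n < 2**k and t < 2**(2*k)."""
    R = t
    shifted = n << k
    for _ in range(k + 1):       # alignments 2^k * n, ..., 2 * n, n
        trial = R - shifted      # subtract the shifted modulus
        if trial >= 0:
            R = trial            # positive remainder: keep it
        # negative remainder: "restore" by keeping the old R
        shifted >>= 1
    return R
```

In Python the restore step is simply not committing the subtraction; in hardware it corresponds to adding the modulus back or selecting the previous register value.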

    49

  • Nonrestoring Division Algorithm

    The nonrestoring division algorithm allows a negative remainder.

    Suppose $R_i = R_{i-1} - n < 0$, then the restoring algorithm assigns $R_i := R_{i-1}$ and performs a subtraction with the shifted n, obtaining

    $R_{i+1} = R_i - n/2 = R_{i-1} - n/2$

    However, if $R_i = R_{i-1} - n < 0$, then the nonrestoring algorithm lets $R_i$ remain negative and adds the shifted n in the following cycle. Thus, it obtains

    $R_{i+1} = R_i + n/2 = (R_{i-1} - n) + n/2$

    which is equal to $R_{i-1} - n/2$, i.e., the same value.

    50

  • R_0 := t
    n := 2^k n
    for i = 1 to k
        if R_{i−1} > 0 then R_i := R_{i−1} − n
        else R_i := R_{i−1} + n
        n := n/2
    return R_k

    Since the remainder is allowed to stay negative, we use 2’s complement coding to represent such numbers.

    Also, note that the nonrestoring division algorithm may require a final restoration cycle in which a negative remainder is corrected by adding the last value of n back to it.

    51

  • Example: Computation of 51 = 3019 mod 53

    using the nonrestoring division algorithm.

    R0     0101111 001011   t
    n      0110101          subtract
    R1     1111010          negative rem.
    n/2     011010 1        add
    R2      010100 1        positive rem.
    n/2      01101 01       subtract
    R3       00111 01       positive rem.
    n/2       0110 101      subtract
    R4        0000 110      positive rem.
    n/2        011 0101
    n/2         01 10101
    n/2          0 110101   subtract
    R5           1 111110   negative rem.
    n            0 110101   add (restore)
    R            0 110011   Final rem.

    52

  • Blakley’s Method

    Let $a_i$ and $b_i$ represent the bits of the k-bit numbers a and b, respectively. The product t (2k-bit number) can be written as

    $t = a \cdot b = \left( \sum_{i=0}^{k-1} a_i 2^i \right) \cdot b = \sum_{i=0}^{k-1} (a_i \cdot b) 2^i$

    This formulation yields the shift-add multiplication algorithm. Blakley’s algorithm uses this formulation and furthermore reduces the partial product modulo n at each step.

    R := 0
    for i = 0 to k − 1
    begin
        R := 2R + a_{k−1−i} · b
        R := R mod n
    end
    return R

    53

  • Assuming that 0 ≤ a, b, R ≤ n − 1, the new R will be in the range

    0 ≤ R ≤ 3n − 3

    since

    R := 2R + a_j · b ≤ 2(n − 1) + (n − 1) = 3n − 3

    At most 2 subtractions will be needed to bring the new R to the range [0, n − 1]. Thus, we can use

    If R ≥ n then R := R − n
    If R ≥ n then R := R − n

    Blakley’s algorithm computes the remainder R in k steps, where at each step one left shift, one addition, and at most two subtractions are performed; the operands involved in these computations are of length k bits.
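Blakley's method with the two conditional subtractions can be written directly in Python. The sketch assumes 0 ≤ a, b < n and that a fits in k bits; the function name `blakley_modmul` is hypothetical.

```python
def blakley_modmul(a, b, n, k):
    """a * b mod n, interleaving shift-add steps with reduction."""
    R = 0
    for i in range(k - 1, -1, -1):          # bits a_{k-1}, ..., a_0
        R = 2 * R + ((a >> i) & 1) * b      # R <= 3n - 3 at this point
        if R >= n:
            R -= n
        if R >= n:
            R -= n                          # now 0 <= R <= n - 1
    return R
```

No division appears anywhere: each step uses one shift, one addition, and at most two subtractions, as stated above.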

    54

  • Montgomery’s Method

    This method replaces division by n with division by $r = 2^k$.

    Assuming n is a k-bit integer, i.e.,

    $2^{k-1} < n < 2^k$

    we assign $r = 2^k$.

    Now, we map the integers a ∈ [0, n − 1] to the integers ā ∈ [0, n − 1] using the one-to-one mapping

    $\bar{a} := a \cdot r \pmod{n}$

    We call ā the n-residue of a.

    55

  • We now define the Montgomery product of two n-residues as

    $\mathrm{MonPro}(\bar{a}, \bar{b}) = \bar{a} \cdot \bar{b} \cdot r^{-1} \pmod{n}$

    where $r^{-1}$ is the inverse of r modulo n. Also we need n′ such that

    $r \cdot r^{-1} - n \cdot n' = 1$

    $r^{-1}$ and n′ are computed by the extended Euclid algorithm.

    function MonPro(ā, b̄)
    Step 1. t := ā · b̄
    Step 2. m := t · n′ mod r
    Step 3. u := (t + m · n)/r
    Step 4. if u ≥ n then return u − n
            else return u

    This subroutine requires only modulo r arithmetic, which is easily accomplished on a computer if $r = 2^j$.

    56

  • Theorem 1:

    If c = a · b (mod n)
    Then c̄ = MonPro(ā, b̄)

    Proof:
    c̄ = c · r (mod n)
      = a · b · r (mod n)
      = a · r · b · r · r^{−1} (mod n)
      = ā · b̄ · r^{−1} (mod n)
      = MonPro(ā, b̄)

    Theorem 2:

    c = MonPro(c̄, 1)

    Proof:
    c = c · r · r^{−1} (mod n)
      = c̄ · r^{−1} (mod n)
      = c̄ · 1 · r^{−1} (mod n)
      = MonPro(c̄, 1)

    57

  • MonPro procedure can be utilized to compute

    c := a · b (mod n)

    as follows:

    function ModMul(a, b, n) { n is odd }
    Step 1. Compute n′ using Euclid’s algorithm.
    Step 2. ā := a · r mod n
    Step 3. b̄ := b · r mod n
    Step 4. c̄ := MonPro(ā, b̄)
    Step 5. c := MonPro(c̄, 1)
    Step 6. return c

    58

  • Since preprocessing operations

    • computation of n′ and

    • conversion from ordinary to n-residue

    • conversion from n-residue to ordinary

    are time-consuming, it is not a good idea to

    use Montgomery’s method for a single modular

    multiplication.

    However, it is very suitable for modular exponentiation.

    59

  • function ModExp(M, e, n) { n is odd }
    Step 1. Compute n′ using Euclid’s algorithm.
    Step 2. M̄ := M · r mod n
    Step 3. C̄ := 1 · r mod n
    Step 4. for i = k − 1 down to 0 do
    Step 5.     C̄ := MonPro(C̄, C̄)
    Step 6.     if e_i = 1 then C̄ := MonPro(M̄, C̄)
    Step 7. C := MonPro(C̄, 1)
    Step 8. return C

    The above function uses the binary method. However, any one of the addition chain heuristics can be used here as well.
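MonPro and ModExp can be sketched in Python as follows. The helper names (`montgomery_setup`, `monpro`, `mont_modexp`) are assumptions, and n′ is obtained here with Python's built-in modular inverse `pow(n, -1, r)` (Python 3.8+) rather than an explicit extended Euclid routine; it satisfies the same identity r·r⁻¹ − n·n′ = 1.

```python
def montgomery_setup(n):
    """Return (k, r, n_prime) for odd n with 2**(k-1) <= n < 2**k and r = 2**k."""
    k = n.bit_length()
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r       # n * n' = -1 (mod r)
    return k, r, n_prime


def monpro(a_bar, b_bar, n, k, n_prime):
    """Montgomery product a_bar * b_bar * r^-1 mod n, using only mod-r steps."""
    r_mask = (1 << k) - 1
    t = a_bar * b_bar                    # Step 1
    m = (t * n_prime) & r_mask           # Step 2: reduction mod r is a mask
    u = (t + m * n) >> k                 # Step 3: division by r is a shift
    return u - n if u >= n else u        # Step 4


def mont_modexp(M, e, n):
    """M**e mod n for odd n, binary method on n-residues."""
    k, r, n_prime = montgomery_setup(n)
    M_bar = (M * r) % n
    C_bar = r % n                        # n-residue of 1
    for bit in bin(e)[2:]:               # bits of e, most significant first
        C_bar = monpro(C_bar, C_bar, n, k, n_prime)
        if bit == "1":
            C_bar = monpro(M_bar, C_bar, n, k, n_prime)
    return monpro(C_bar, 1, n, k, n_prime)
```

For n = 13 the setup gives r = 16 and n′ = 11, and `mont_modexp(7, 10, 13)` returns 4.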

    60

  • Example: Computation of $7^{10} \pmod{13}$

    $r = 2^k = 16$. Since

    16 · 9 − 13 · 11 = 1

    we have $r^{-1} = 9$ and n′ = 11.

    M = 7

    M̄ := M · r (mod n) = 7 · 16 (mod 13) = 8

    C = 1

    C̄ := C · r (mod n) = 1 · 16 (mod 13) = 3

    Thus, C̄ = 3 and M̄ = 8

    61

  • e_i   Step 5                 Step 6
    1     MonPro(3,3) = 3        MonPro(8,3) = 8
    0     MonPro(8,8) = 4
    1     MonPro(4,4) = 1        MonPro(8,1) = 7
    0     MonPro(7,7) = 12

    Step 7: C = MonPro(12,1) = 4

    • Computation of MonPro(3,3):

    t := 3 · 3 = 9
    m := 9 · 11 (mod 16) = 3
    u := (9 + 3 · 13)/16 = 48/16 = 3

    • Computation of MonPro(8,1):

    t := 8 · 1 = 8
    m := 8 · 11 (mod 16) = 8
    u := (8 + 8 · 13)/16 = 112/16 = 7

    62

  • Computation of GCD and Inverse

    Euclid’s algorithm is based on the following

    observation:

    gcd(a, b) = gcd(b, a mod b)

    Thus, given a and b, we compute gcd(a, b) by

    successive modular reductions:

    $r_1 = a \bmod b$

    $r_2 = b \bmod r_1$

    $r_3 = r_1 \bmod r_2$

    $r_4 = r_2 \bmod r_3$

    ...

    until we reach an $r_i = 0$.

    The greatest common divisor of a and b is then $\gcd(a, b) = r_{i-1}$.

    63

  • Example: Computation of gcd(1812,1572)

    240 = 1812 mod 1572

    132 = 1572 mod 240

    108 = 240 mod 132

    24 = 132 mod 108

    12 = 108 mod 24

    0 = 24 mod 12

    Thus, gcd(1812,1572) = 12.

    Given a and b with 0 ≤ a ≤ b, Euclid’s algorithm requires less than

    $\frac{\log_e(\sqrt{5}\, b)}{\log_e\left(\frac{1+\sqrt{5}}{2}\right)} - 1$

    steps to compute gcd(a, b).

    64

  • The binary Euclid algorithm replaces modular reductions by shift (division by 2) and subtraction operations. It is based on the following observations:

    For any a and b

    gcd(a, b) = gcd(b, b − a)

    If a and b are both even then

    gcd(a, b) = 2 · gcd(a/2, b/2)

    If a is odd and b is even then

    gcd(a, b) = gcd(a, b/2)

    65

  • Example: Computation of gcd(182,98)

    gcd(182,98) = 2 · gcd(91,49)
                = 2 · gcd(49,42)
                = 2 · gcd(49,21)
                = 2 · gcd(21,28)
                = 2 · gcd(21,14)
                = 2 · gcd(21,7)
                = 2 · gcd(7,14)
                = 2 · gcd(7,7)
                = 2 · 7
                = 14
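The three observations lead to the following Python sketch of the binary Euclid algorithm. The function name `binary_gcd` is an assumption; the order in which operands are swapped and subtracted is one of several equivalent choices and may not visit exactly the same intermediate pairs as the example.

```python
def binary_gcd(a, b):
    """gcd of non-negative integers using only shifts and subtractions."""
    if a == 0:
        return b
    if b == 0:
        return a
    shift = 0
    while ((a | b) & 1) == 0:      # both even: gcd(a, b) = 2 * gcd(a/2, b/2)
        a >>= 1
        b >>= 1
        shift += 1
    while (a & 1) == 0:            # make a odd
        a >>= 1
    while b:
        while (b & 1) == 0:        # a odd, b even: gcd(a, b) = gcd(a, b/2)
            b >>= 1
        if a > b:
            a, b = b, a
        b -= a                     # gcd(a, b) = gcd(a, b - a)
    return a << shift
```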

    66

  • Extended Euclid algorithm

    Given a and b, the extended Euclid algorithm

    computes g, u, and v such that

    g = gcd(a, b) = u · a + v · b

    This algorithm is used to compute the modular

    inverse. If g = 1, then

    1 = u · a + v · b

    implies that

    1 = u · a (mod b)
    1 = v · b (mod a)

    and therefore

    $u = a^{-1} \pmod{b}$

    $v = b^{-1} \pmod{a}$

    67

  • EEA(a, b, g, u, v)

    begin

    (g0, g1) = (a, b)
    (u0, u1) = (1, 0)
    (v0, v1) = (0, 1)
    while g1 ≠ 0 do
    begin
        q = g0 div g1
        (g0, g1) = (g1, g0 − g1 · q)
        (u0, u1) = (u1, u0 − u1 · q)
        (v0, v1) = (v1, v0 − v1 · q)
    end
    g = g0 ; u = u0 ; v = v0
    end

    68

  • Example: a = 21 and b = 16

    iteration   q   g0   g1   u0   u1   v0   v1
    0           -   21   16    1    0    0    1
    1           1   16    5    0    1    1   -1
    2           3    5    1    1   -3   -1    4
    3           5    1    0   -3   16    4  -21

    EEA returns g = 1, u = −3 and v = 4

    This implies

    1 = −3 · 21 + 4 · 16

    Therefore

    $21^{-1} = -3 \pmod{16}$
    $16^{-1} = 4 \pmod{21}$
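The EEA procedure maps onto Python tuple assignment almost verbatim. In this sketch the three outputs are returned as a tuple instead of being passed as parameters, and the name `eea` is kept only for readability.

```python
def eea(a, b):
    """Extended Euclid: returns (g, u, v) with g = gcd(a, b) = u*a + v*b."""
    g0, g1 = a, b
    u0, u1 = 1, 0
    v0, v1 = 0, 1
    while g1 != 0:
        q = g0 // g1
        g0, g1 = g1, g0 - g1 * q
        u0, u1 = u1, u0 - u1 * q
        v0, v1 = v1, v0 - v1 * q
    return g0, u0, v0


g, u, v = eea(21, 16)
inverse_of_21_mod_16 = u % 16    # a negative u is brought into [0, 16)
```

Reducing u modulo b gives the inverse as a non-negative residue; here −3 becomes 13, and 21 · 13 = 273 = 17 · 16 + 1.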

    69

  • Chinese remainder theorem

    Let $m_1$ and $m_2$ be relatively prime integers, i.e., $\gcd(m_1, m_2) = 1$. Given $x_1 \in [0, m_1)$ and $x_2 \in [0, m_2)$, there exists a unique integer $X \in [0, m_1 m_2)$ such that

    $X = x_1 \bmod m_1$

    $X = x_2 \bmod m_2$

    which is determined by

    $X = x_1 c_1 m_2 + x_2 c_2 m_1 \pmod{m_1 m_2}$

    where

    $c_1 = m_2^{-1} \bmod m_1$

    $c_2 = m_1^{-1} \bmod m_2$

    Another (simpler) formula for X:

    $X = x_1 + m_1 [(x_2 - x_1) c_2 \bmod m_2]$

    70

  • Fast Decryption using CRT

    Quisquater & Couvreur showed that the decryption process, i.e., computation of

    $M := C^d \pmod{n}$

    can be performed faster using the CRT.

    Knowing the factors p and q of n, we compute

    $M_1 := C^d \bmod p = C^{d_1} \bmod p$

    $M_2 := C^d \bmod q = C^{d_2} \bmod q$

    where

    $d_1 := d \bmod (p - 1)$
    $d_2 := d \bmod (q - 1)$

    71

  • M is then computed using the Chinese remainder algorithm:

    $M := M_1 + p \cdot [(M_2 - M_1) \cdot p^{-1} \bmod q]$

    The sizes of p and q are half of the size of n.

    This technique reduces a k-bit modular exponentiation into two k/2-bit modular exponentiations plus the CRT step given above.

    Also $p^{-1} \bmod q$ can be precomputed and saved.
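The CRT decryption steps can be sketched in Python as follows. The toy parameters (p = 61, q = 53, e = 17, d = 2753) are illustrative values that are far too small for real use, and the function name `rsa_crt_decrypt` is hypothetical.

```python
def rsa_crt_decrypt(C, d, p, q):
    """Recover M = C**d mod (p*q) from two half-size exponentiations."""
    d1 = d % (p - 1)
    d2 = d % (q - 1)
    M1 = pow(C, d1, p)
    M2 = pow(C, d2, q)
    p_inv = pow(p, -1, q)                  # can be precomputed and saved
    return M1 + p * (((M2 - M1) * p_inv) % q)


p, q, e, d = 61, 53, 17, 2753              # toy key, for illustration only
n = p * q
M = 65
C = pow(M, e, n)
recovered = rsa_crt_decrypt(C, d, p, q)
```

The result lies in [0, pq) without a final reduction, because M1 < p and the bracketed term is below q.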

    72

  • Typical Actual Timing Results

    Proc.    MHz   W    512-bit    CRT   Ref.
    56000    20    24   44 ms      yes   •
    68020    20    32   445 ms     yes   †
    80386    16    32   568 ms     yes   †
    2105     10    16   1320 ms    no    ∗
    80286    8     16   2500 ms    yes   †

    • Kaliski & Dussé 1990

    † Laurichesse & Blain 1991

    ∗ Koç 1992

    73

  • RSA Estimates for 512-bit with CRT

    Processor             MHz   W    T (sec)
    Motorola 56000        20    24   0.04
    TI TMS320C30          33    12   0.05
    NEC uPD77C25          8     15   0.38
    Analog Devices 2105   10    16   0.38
    TI TMS320C25          8     16   0.54
    Mitsubishi M3770      8     16   4.80
    Intel 8096            4     16   8.76
    Zilog Z80180          10    8    10.75
    Zilog Z80             10    8    48.55
    Motorola 6811         2     8    49.89
    Intel 8051            12    8    55.65
    Motorola 6805SC27     2     8    68.04
    Hitachi HD63701       2     8    75.80

    74

  • Primality Testing

    Fermat’s Theorem: If m is prime, then for any a such that gcd(a, m) = 1 we have

    $a^{m-1} = 1 \pmod{m}$

    If m is not prime, it is still possible (but probably not very likely) that the above holds.

    If m is an odd composite number and a is an integer such that gcd(a, m) = 1 and

    $a^{m-1} = 1 \pmod{m}$

    then we say that m is a pseudoprime to the base a.

    For example, 341 is a pseudoprime to the base 2 since

    $2^{341-1} = 1 \pmod{341}$

    A pseudoprime is a number that “pretends” to be a prime by passing Fermat’s test.

    75

  • Pseudoprimality Test

    Fermat’s theorem provides a simple and efficient test for discarding nonprimes. Any number that fails the test is not prime.

    We first check if m is a pseudoprime for the base 2. If not, then m is composite. If m passes for the base 2, we check for other bases. For example,

    $3^{341-1} = 56 \pmod{341}$

    and thus 341 is composite.

    If m fails Fermat’s test for a single base a ∈ [1, m), then m fails for at least half of the bases a ∈ [1, m).

    Thus, unless m happens to pass the test for all possible a with gcd(a, m) = 1, we have at least a 50 % chance that m will fail the test for a randomly chosen a.

    76

  • Pseudoprimality Test

    Given m

    • Randomly pick an a ∈ [1, m)
    • Compute g = gcd(a, m)
    • If g ≠ 1 then m is composite
    • If g = 1 then compute $h = a^{m-1} \bmod m$
    • If h ≠ 1 then m fails the test
    • If h = 1 then m passes the test

    If m passes the test then it is probably prime; if m fails the test then it is composite.

    Repeat this test for r different bases. The probability of a composite m passing r tests is at most 1 in $2^r$, unless m passes for every base to which it is relatively prime.

    Unfortunately, there are numbers which pass Fermat’s test for all bases to which they are relatively prime. These are Carmichael numbers and there are infinitely many of them.
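The pseudoprimality test is a few lines of Python. The sketch below takes the base as an argument instead of drawing it at random so that the examples are reproducible; the function name `fermat_test` is an assumption, and 561 = 3 · 11 · 17 is used as a well-known Carmichael number that does not appear in the slides.

```python
from math import gcd


def fermat_test(m, a):
    """True if m passes Fermat's test for base a, False if m is shown composite."""
    if gcd(a, m) != 1:
        return False                 # a shares a factor with m
    return pow(a, m - 1, m) == 1


passes_base_2 = fermat_test(341, 2)          # 341 = 11 * 31 is a pseudoprime to base 2
passes_base_3 = fermat_test(341, 3)          # base 3 exposes it
carmichael_passes_all = all(
    fermat_test(561, a) for a in range(1, 561) if gcd(a, 561) == 1
)
```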

    77

  • Strong Pseudoprimality Test

    Given m

    • Write $m - 1 = 2^s t$ where t is odd
    • Randomly pick an a ∈ [1, m)
    • Compute $h = a^t \bmod m$
    • If h = −1 or +1, then m passes
    • Compute $h_i = a^{2^i t} \bmod m$ for i = 1, 2, ..., s
    • If $h_i = -1$ for some i < s, then m passes
    • Otherwise m fails

    Repeat this test for r different bases. The probability of a composite m passing r tests is at most 1 in $4^r$.

    If m passes the test for 100 different bases, then the probability that m is composite is less than $10^{-60}$.

    Good news: There are no strong Carmichael numbers.

    78

  • Example: m = 25326001

    Write m − 1 = 25326000 = 16 · 1582875 = 2^4 · t

    Pick a = 2

    h = 2^t mod m gives h = −1

    PASS

    Pick a = 3

    h = 3^t mod m gives h = −1

    PASS

    Pick a = 7

    h = 7^t mod m gives h = 19453141

    h1 = 7^{2t} mod m gives h1 = 16857740

    h2 = 7^{4t} mod m gives h2 = 11448587

    h3 = 7^{8t} mod m gives h3 = 10127250

    FAIL

    Thus, m is composite.

    (25326001 = 2251 · 11251)

    79
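The strong pseudoprimality test for a single base can be sketched as below. The base is an explicit argument so that the example is reproducible, and the function name `strong_test` is an assumption.

```python
def strong_test(m, a):
    """True if odd m > 2 passes the strong pseudoprimality test for base a."""
    s, t = 0, m - 1
    while t % 2 == 0:              # write m - 1 = 2^s * t with t odd
        t //= 2
        s += 1
    h = pow(a, t, m)
    if h == 1 or h == m - 1:       # h = +1 or h = -1
        return True
    for _ in range(1, s):          # h_i = a^(2^i * t) for i = 1 .. s-1
        h = (h * h) % m
        if h == m - 1:
            return True
    return False


m = 25326001
results = {a: strong_test(m, a) for a in (2, 3, 7)}
```

For m = 25326001 the test passes for bases 2 and 3 and fails for base 7, in line with the example.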

  • Generating RSA Primes

    1. Pick a k-bit odd integer m uniformly at random, i.e., from $[2^{k-1}, 2^k)$

    2. Apply trial division on m by all primes less than a certain small prime

    3. If m passes the trial division test, then apply the (strong) pseudoprimality test for r different bases 2, 3, 5, 7, ...

    4. If m passes all r (strong) pseudoprimality tests, then m is a prime number with high probability

    5. If m fails, set m := m + 2 and go to Step 2
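The five steps can be put together as in the following sketch. Several details are assumptions made for illustration: the trial-division bound (primes up to 47), the number of bases, the use of Python's `secrets.randbits` as the random source, the restart when m outgrows k bits, and the requirement k ≥ 16 so that the small primes themselves are never candidates.

```python
import secrets

SMALL_PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]


def strong_test(m, a):
    """Strong pseudoprimality test of odd m for base a."""
    s, t = 0, m - 1
    while t % 2 == 0:
        t //= 2
        s += 1
    h = pow(a, t, m)
    if h == 1 or h == m - 1:
        return True
    for _ in range(1, s):
        h = (h * h) % m
        if h == m - 1:
            return True
    return False


def generate_prime(k, r=10):
    """Return a k-bit probable prime (k >= 16), following Steps 1 to 5."""
    bases = SMALL_PRIMES[:r]                    # bases 2, 3, 5, 7, ...
    while True:
        # Step 1: random odd k-bit candidate (top and bottom bits forced to 1)
        m = secrets.randbits(k) | (1 << (k - 1)) | 1
        while m.bit_length() == k:
            # Step 2: trial division by small primes
            if all(m % p != 0 for p in SMALL_PRIMES):
                # Steps 3 and 4: strong tests for r bases
                if all(strong_test(m, a) for a in bases):
                    return m
            m += 2                              # Step 5
```

Incrementing by 2 from a random start, as in Step 5, does not give a perfectly uniform distribution over primes; the sketch follows the slide rather than addressing that point.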
