Challenges in integer factorization D. J. Bernstein University …...2017/10/03  · 1 Challenges in...

  • 1

    Challenges in

    quantum algorithms for

    integer factorization

    D. J. Bernstein

    University of Illinois at Chicago

    Prelude: What is the fastest

    algorithm to sort an array?

    def blindsort(x):
        while not issorted(x):
            permuterandomly(x)

  • 2

    def bubblesort(x):
        for j in range(len(x)):
            for i in reversed(range(j)):
                x[i],x[i+1] = (
                    min(x[i],x[i+1]),
                    max(x[i],x[i+1])
                )

    bubblesort takes poly time.

    $\Theta(n^2)$ comparisons.

    Huge speedup over blindsort!

    Is this the end of the story?

    No, still not optimal.
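
The two sorting functions above can be made runnable as follows. This is a minimal sketch: the slides do not define the helpers `issorted` and `permuterandomly`, so the definitions here are assumptions, and the step counter returned by `bubblesort` is added only to make the quadratic cost visible.

```python
import random

def issorted(x):
    """Return True if x is in nondecreasing order (assumed helper)."""
    return all(x[i] <= x[i + 1] for i in range(len(x) - 1))

def permuterandomly(x):
    """Shuffle x in place, uniformly at random (assumed helper)."""
    random.shuffle(x)

def blindsort(x):
    # Keep shuffling until the array happens to be sorted.
    while not issorted(x):
        permuterandomly(x)

def bubblesort(x):
    """Sort x in place and return the number of compare-exchange steps."""
    steps = 0
    for j in range(len(x)):
        for i in reversed(range(j)):
            x[i], x[i + 1] = min(x[i], x[i + 1]), max(x[i], x[i + 1])
            steps += 1
    return steps

data = [5, 2, 4, 1, 3]
small = list(data)
blindsort(small)          # about n! shuffles on average for distinct entries
steps = bubblesort(data)  # exactly n(n-1)/2 compare-exchange steps
print(small, data, steps)
```

Keep the input to `blindsort` tiny: its expected running time grows factorially, which is the point of the comparison.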

  • 3

    Analogous: What is the fastest

    algorithm to factor integers?

    Shor’s algorithm takes poly time.

    Huge speedup over NFS!

    $b^2(\log b)^{1+o(1)}$ qubit operations

    to factor b-bit integer,

    using standard subroutines

    for fast integer arithmetic.

    Is this the end of the story?

    No, still not optimal.

    “Shor’s algorithm: the bubble sort

    of integer factorization.”

  • 4

    A simple exercise to illustrate

    suboptimality of Shor’s algorithm:

    Find a prime divisor of $\lfloor 10^{3009}\pi \rfloor$.

    3141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825342117067982148086513282306647093844609550582231725359408128481117450284102701938521105559644622948954930381964428810975665933446128475648233786783165271201909145648566923460348610454326648213393607260249141273724587006606315588174881520920962829254091715364367892590360011330530548820466521384146951941511609433057270365759591953092186117381932611793105118548074462379962749567351885752724891227938183011949129833673362440656643086021394946395224737190702179860943702770539217176293176752384674818467669405132000568127145263560827785771342757789609173637178721468440901224953430146549585371050792279689258923542019956112129021960864034418159813629774771309960518707211349999998372978049951059731732816096318595024459455346908302642522308253344685035261931188171010003137838752886587533208381420617177669147303598253490428755468731159562863882353787593751957781857780532171226806613001927876611195909216420198938095257201065485863278865936153381827968230301952035301852968995773622599413891249721775283479131515574857242454150695950829533116861727855889075098381754637464939319255060400927701671139009848824012858361603563707660104710181942955596198946767837449448255379774726847104047534646208046684259069491293313677028989152104752162056966024058038150193511253382430035587640247496473263914199272604269922796782354781636009341721641219924586315030286182974555706749838505494588586926995690927210797509302955321165344987202755960236480665499119881834797753566369807426542527862551818417574672890977772793800081647060016145249192173217214772350141441973568548161361157352552133475741849468438523323907394143334547762416862518983569485562099219222184272550254256887671790494601653466804988627232791786085784383827967976681454100953883786360950680064225125205117392984896084128488626945604241965285022210661186306744278622039194945047123713786960956364371917287467764657573962413890865832645995813390478027
590099465764078951269468398352595709825822620522489407726719478268482601476990902640136394437455305068203496252451749399651431429809190659250937221696461515709858387410597885959772975498930161753928468138268683868942774155991855925245953959431049972524680845987273644695848653836736222626099124608051243884390451244136549762780797715691435997700129616089441694868555848406353422072225828488648158456028506016842739452267467678895252138522549954666727823986456596116354886230577456498035593634568174324112515076069479451096596094025228879710893145669136867228748940560101503308617928680920874760917824938589009714909675985261365549781893129784821682998948722658804857564014270477555132379641451523746234364542858444795265867821051141354735739523113427166102135969536231442952484937187110145765403590279934403742007310578539062198387447808478489683321445713868751943506430218453191048481005370614680674919278191197939952061419663428754440643745123718192179998391015919561814675142691239748940907186494231961567945208
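
The decimal expansion printed above ends in the digit 8, so the integer is even and 2 is a prime divisor. No order-finding is needed. A minimal sketch of the kind of cheap check that settles such inputs is given below; the function name and the default bound are illustrative assumptions, and the test value is a stand-in built from the last six digits shown above, not the full number.

```python
def smallest_small_divisor(n, bound=1000):
    """Return the smallest divisor p of n with 2 <= p <= bound, or None.

    The smallest divisor greater than 1 is always prime, so no
    separate primality test is needed.
    """
    for p in range(2, bound + 1):
        if n % p == 0:
            return p
    return None

# Stand-in input: only the trailing digits matter for divisibility by 2.
tail = 945208
print(smallest_small_divisor(tail))
```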

  • 5

    Important variations in the

    factorization problem:

    • Maybe need one factor.
    • Maybe need all factors.
    • Maybe factors are small.
    • Maybe factors are large.
    • Maybe there are many inputs.
    • Maybe inputs in superposition.

    Important variations in metrics

    (even assuming perfect devices):

    • Qubits.
    • Area (“A”, including wire area).
    • Qubit operations (“gates”).
    • Depth.
    • Time (“T”: latency).

  • 6

    Short-term RSA security

    1995 Kitaev, 1996 Vedral–

    Barenco–Ekert, 1996 Beckman–

    Chari–Devabhaktuni–Preskill,

    1998 Zalka, 1999 Mosca–Ekert,

    2000 Parker–Plenio, 2001 Seifert,

    2002 Kitaev–Shen–Vyalyi, 2003

    Beauregard, 2006 Takahashi–

    Kunihiro, 2010 Ahmadi–Chiang,

    2014 Svore–Hastings–Freedman,

    2015 Grosshans–Lawson–Morain–

    Smith, 2016 Häner–Roetteler–

    Svore, 2017 Ekerå–Håstad, 2017

    Johnston: try to squeeze constant

    factors out of Shor’s algorithm.

  • 7

    2003 Beauregard: $2b + 3$ qubits.

    ... 2016 Häner–Roetteler–Svore:

    $2b + 2$ qubits; $64b^3(\lg b + O(1))$

    Toffoli gates; similar number of

    CNOT gates; depth $O(b^3)$.

    Conventional wisdom:

    cannot avoid 2b qubits

    for controlled mulmod.

    e.g. 4096 qubits for b = 2048,

    very common RSA key size.

    So 2048-bit factorization

    needs 4096 qubits?

    No: NFS uses 0 qubits.

  • 8

    NFS takes $L^{p+o(1)}$ operations

    with $p = \sqrt[3]{92 + 26\sqrt{13}}/3 > 1.9$,

    $\log L = (\log 2^b)^{1/3}(\log\log 2^b)^{2/3}$.

    Analysis for b = 2048 (not easy!):

    very roughly $2^{112}$ operations.

    2017 Bernstein–Biasse–Mosca:

    $L^{q+o(1)}$ operations

    with $q = \sqrt[3]{8/3} \approx 1.387$,

    using $b^{2/3+o(1)}$ qubits

    (and many non-quantum bits).

    Open: Analyze for b = 2048.

    Fewer than 4096 qubits?

    Fewer than 2048 qubits?
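
The two exponents on this slide, and the size of L for a given b, can be evaluated numerically. The sketch below uses natural logarithms in the definition of L (an assumption about the convention) and simply drops the o(1) terms; the function names are illustrative.

```python
import math

def nfs_exponent():
    """p = cuberoot(92 + 26*sqrt(13)) / 3, the NFS operation exponent."""
    return (92 + 26 * math.sqrt(13)) ** (1 / 3) / 3

def bbm_exponent():
    """q = cuberoot(8/3), the 2017 Bernstein-Biasse-Mosca exponent."""
    return (8 / 3) ** (1 / 3)

def log2_L(b):
    """log2 of L, where ln L = (ln 2^b)^(1/3) * (ln ln 2^b)^(2/3)."""
    ln_n = b * math.log(2)
    ln_L = ln_n ** (1 / 3) * math.log(ln_n) ** (2 / 3)
    return ln_L / math.log(2)

p = nfs_exponent()
q = bbm_exponent()
bits = log2_L(2048)
print(f"p = {p:.4f}, q = {q:.4f}")
print(f"log2(L^p) = {p * bits:.1f}, log2(L^q) = {q * bits:.1f} with o(1) dropped")
```

Because the o(1) terms are ignored, the printed values indicate scale only. They are not substitutes for a concrete analysis such as the "very roughly $2^{112}$" figure quoted above.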

  • 9

    Counting operations is an

    oversimplified cost model: ignores

    communication costs, parallelism.

    See, e.g., 1981 Brent–Kung AT

    theorem for realistic chip model.

    NFS suffers somewhat from

    communication costs inside

    big linear-algebra subroutine.

    2001 Bernstein:

    $AT = L^{p'+o(1)}$ with $p' \approx 1.976$.

    2017 Bernstein–Biasse–Mosca:

    $AT = L^{q'+o(1)}$ with $q' \approx 1.456$

    using $b^{2/3+o(1)}$ qubits.

    Open: Analyze for b = 2048.

  • 10

    Actually have many inputs.

    Lower cost for some output?

    Lower cost for many outputs?

    1993 Coppersmith:

    $L^{1.638\ldots+o(1)}$ operations

    after precomp(b) involving

    $L^{2.006\ldots+o(1)}$ operations.

    2014 Bernstein–Lange:

    $AT = L^{2.204\ldots+o(1)}$

    to factor $L^{0.5+o(1)}$ inputs;

    $L^{1.704\ldots+o(1)}$ per input.

    Open: Any quantum speedups

    for factoring many integers?

  • 11

    Long-term RSA security

    Long history of advances

    in integer factorization.

    Long history of RSA users

    switching to larger key sizes,

    not far beyond broken sizes.

    “Expert” cryptographers:

    “Obviously they won’t react to

    Shor’s algorithm this way! They’ll

    switch to codes, lattices, etc. long

    before quantum computers break

    RSA-2048! We don’t need to

    analyze the security of RSA-4096,

    RSA-8192, RSA-16384, etc.!”

  • 12

    We consider possible impact of

    quantum computers. Shouldn’t

    we also consider possible impact

    of users wanting to stick to RSA?

    2017 Bernstein–Heninger–Lou–

    Valenta “Post-quantum RSA”

    (pqRSA): Generated 1-terabyte

    RSA key; 2000000 core-hours.

    Shor’s algorithm: $>2^{100}$ gates.

    Bernstein–Fried–Heninger–Lou–

    Valenta: Draft NIST submission

    proposing 1-gigabyte RSA keys.

    Much faster to generate.

  • 13

    The secret primes are small:

    4096 bits in terabyte key;

    1024 bits in gigabyte key.

    Important time-saver in

    keygen, signing, decryption.

    Is this a weakness?

    ECM finds any prime

  • 14

    Analysis for $y \approx 2^{1024}$:

    $>2^{125}$ mulmods, huge depth;

    and $2^{33}$-bit mulmod is slow.

    $2^{23}$ target primes, but

    finding just one isn’t enough.
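
The count of $2^{23}$ target primes follows from the key and prime sizes stated earlier. The sketch below assumes binary units (1 gigabyte = $2^{30}$ bytes, 1 terabyte = $2^{40}$ bytes) and assumes the key is a product of equal-size primes; the terabyte figure is derived here and is not quoted from the slides.

```python
def prime_count(key_bits, prime_bits):
    """Number of equal-size primes whose product fills the key."""
    return key_bits // prime_bits

GIGABYTE_BITS = 8 * 2**30   # assumption: binary gigabyte, 2**33 bits
TERABYTE_BITS = 8 * 2**40   # assumption: binary terabyte, 2**43 bits

giga_primes = prime_count(GIGABYTE_BITS, 1024)   # 1024-bit primes
tera_primes = prime_count(TERABYTE_BITS, 4096)   # 4096-bit primes
print(giga_primes.bit_length() - 1, tera_primes.bit_length() - 1)
```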

    2017 Bernstein–Heninger–Lou–

    Valenta: Grover+ECM

    finds any prime

  • 15

    Minimum security level that

    NIST allows for post-quantum

    submissions: brute-force/Grover

    search for a 128-bit AES key.

    Is a gigabyte key so difficult for

    Shor’s algorithm to break?

    $64b^3 \lg b \approx 2^{110}$ for $b = 2^{33}$.

    Not totally implausible to argue

    that Grover’s algorithm could

    break AES-128 faster than this.

    But Shor’s algorithm can (with

    more qubits) use faster mulmods.
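
The $2^{110}$ figure on this slide can be checked directly from the leading term $64b^3 \lg b$ of the 2016 Häner–Roetteler–Svore Toffoli count. The function name below is illustrative, and the $O(1)$ term inside the parentheses is dropped.

```python
import math

def toffoli_leading_term(b):
    """Leading term 64 * b^3 * lg(b) of the Toffoli count for b-bit inputs."""
    return 64 * b**3 * math.log2(b)

for b in (2048, 2**33):
    print(f"b = 2^{int(math.log2(b))}: about 2^{math.log2(toffoli_leading_term(b)):.1f} Toffoli gates")
```

The same formula shows how steeply the cost grows between a 2048-bit key and a gigabyte key.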

  • 16

    NIST allows submissions to

    assume reasonable time limits:

    “Plausible values for MAXDEPTH

    range from $2^{40}$ logical gates (the

    approximate number of gates that

    presently envisioned quantum

    computing architectures are

    expected to serially perform in

    a year) through $2^{64}$ logical gates

    (the approximate number of gates

    that current classical computing

    architectures can perform serially

    in a decade), to no more than $2^{96}$

    logical gates ...”
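
The serial gate rates implied by the quoted depth limits can be worked out with a few lines of arithmetic. This is a sketch assuming a 365.25-day year; the function and variable names are illustrative.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def serial_rate(log2_gates, years):
    """Gates per second needed to perform 2**log2_gates gates serially."""
    return 2**log2_gates / (years * SECONDS_PER_YEAR)

quantum_rate = serial_rate(40, 1)      # 2^40 gates in one year
classical_rate = serial_rate(64, 10)   # 2^64 gates in one decade
print(f"{quantum_rate:.3g} gates/s, {classical_rate:.3g} gates/s")
```

The first limit corresponds to tens of thousands of serial gates per second, the second to tens of billions.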

  • 17

    What is the minimum time

    for b-bit integer multiplication?

    Light takes time $\Omega(b^{1/2})$

    to cross a $b^{1/2} \times b^{1/2}$ chip.

    1981 Brent–Kung AT theorem:

    $AT \geq \text{small constant} \cdot b^{3/2}$,

    even if wire latency is 0.

    (Work around obstacles using

    faster-than-light communication

    through long-distance EPR pairs?

    Haven’t seen plausible designs,

    even if reversible computation

    avoids FTL impossibility proofs.)

  • 18

    What is the minimum time

    for Shor’s algorithm?

    Main bottleneck: $a^e \bmod N$

    for 2b-bit superposition e.

    Traditional approach: series of

    controlled multiplications by

    $a$ and $1/a \bmod N$;

    $a^2 \bmod N$ and $1/a^2 \bmod N$;

    $a^4 \bmod N$ and $1/a^4 \bmod N$; etc.

    Can multiply these in parallel,

    using many more qubits;

    but hard to parallelize initial

    computation of $a^{2^i} \bmod N$.
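
The structure described on this slide can be seen in a purely classical sketch. The table of $a^{2^i} \bmod N$ is built by repeated squaring, where each entry depends on the previous one; the multiplications selected by the bits of e are independent of each other once the table exists. In Shor's algorithm e is in superposition and the multiplications are controlled, which this stand-in does not model. Function names and the small test values are assumptions.

```python
def repeated_squares(a, n, bits):
    """Return [a^(2^0), a^(2^1), ..., a^(2^(bits-1))] mod n.

    Each entry is the square of the previous one, so this loop is
    inherently serial.
    """
    powers = [a % n]
    for _ in range(bits - 1):
        powers.append(powers[-1] * powers[-1] % n)
    return powers

def modexp_from_table(powers, e, n):
    """Multiply together the table entries selected by the bits of e."""
    result = 1
    for i, power in enumerate(powers):
        if (e >> i) & 1:
            result = result * power % n
    return result

n, a, e = 3233, 5, 1234
table = repeated_squares(a, n, e.bit_length())
value = modexp_from_table(table, e, n)
print(value == pow(a, e, n))
```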

  • 19

    Why gigabyte keys are reasonable:

    big enough to push latency

    beyond the $2^{64}$ limit,

    under reasonable assumptions.

    Gigabyte inputs are

    millions of times larger

    than 2048-bit inputs.

    These algorithms will take

    billions of times longer.

    More cost to find all primes.

    Open: What is minimum time

    for integer factorization?

  • 20

    NIST’s middle security level

    is defined by an AES-192 key.

    With maximum depth $2^{64}$,

    finding an AES-192 key

    requires $\approx 2^{144}$ cores.

    This is nonsense! There is

    not enough time to broadcast

    the input to $2^{144}$ parallel

    computations, and not enough

    time to collect the results.

    Is NIST implicitly assuming

    a higher latency limit?

  • 21

    Some improvements to Shor

    (2017 Bernstein–Biasse–Mosca)

    Consider Shor’s algorithm

    factoring $N = p_1^{e_1} \cdots p_f^{e_f}$. Write

    $(p_j - 1)p_j^{e_j - 1}$ as $2^{t_j} u_j$ with $u_j$ odd.

    Unit group is isomorphic to

    $\mathbf{Z}/2^{t_1} \times \cdots \times \mathbf{Z}/2^{t_f} \times \mathbf{Z}/u_1 \times \cdots$.

    Shor’s algorithm (hopefully)

    computes order r of random unit.

    Order $2^{c_j}$ in $\mathbf{Z}/2^{t_j}$ is

    $2^{t_j}$ with probability 1/2;

    $2^{t_j-1}$ with probability 1/4; etc.
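
The probabilities on this slide can be confirmed by enumerating the additive group $\mathbf{Z}/2^t$, where the element x has order $2^t/\gcd(x, 2^t)$. The sketch below does this exactly with fractions; the function name and the choice t = 5 are illustrative.

```python
from collections import Counter
from fractions import Fraction
from math import gcd

def order_distribution(t):
    """Exact distribution of element orders in the cyclic group Z/2^t."""
    size = 2**t
    counts = Counter(size // gcd(x, size) for x in range(size))
    return {order: Fraction(count, size) for order, count in counts.items()}

dist = order_distribution(5)
for order in sorted(dist, reverse=True):
    print(order, dist[order])
```

Half of the elements have full order $2^t$, a quarter have order $2^{t-1}$, and so on down to the identity.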

  • 22

    Shor computes $\gcd\{N, a^{r/2} - 1\}$.

    Divisible by $p_j$ exactly when

    $c_j < \max\{c_1, \ldots, c_f\}$.

    Factorization fails iff all $c_j$ are

    equal. Chance $\leq 1/2^{f-1}$.

    More subtle problem:

    Factorization is likely to

    split off some of the

    primes with maximum $t_j$.

    Can iterate Shor’s algorithm

    enough times to completely

    factor. Many full-size iterations;

    many more for adversarial inputs.
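
The gcd step at the top of this slide can be simulated classically for tiny N. In the sketch below a brute-force order computation stands in for the quantum order-finding step, which is an assumption made purely for illustration; the function names are hypothetical.

```python
from math import gcd

def multiplicative_order(a, n):
    """Order of a in the unit group mod n, by brute force (classical stand-in)."""
    if gcd(a, n) != 1:
        raise ValueError("a must be a unit modulo n")
    r, x = 1, a % n
    while x != 1:
        x = x * a % n
        r += 1
    return r

def shor_gcd_step(n, a):
    """Return gcd(n, a^(r/2) - 1) if it is a nontrivial factor, else None."""
    r = multiplicative_order(a, n)
    if r % 2:
        return None  # odd order: every c_j is 0, so all c_j are equal
    g = gcd(n, pow(a, r // 2, n) - 1)
    return g if 1 < g < n else None

print(shor_gcd_step(15, 7))    # 7 has order 4 mod 15
print(shor_gcd_step(15, 14))   # 14 = -1 mod 3 and mod 5: equal c_j, no split
```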

  • 23

    Better method, inspired by

    primality testing: compute gcd

    with $a^{r/2} + 1$, $a^{r/4} + 1$, $a^{r/8} + 1$,

    ..., $a^d + 1$, $a^d - 1$, with odd d.

    This splits $p_j$ according to $c_j$.

    Any two primes have chance

    $\geq 1/2$ of being split.

    Factors are around half size.

    Much less overhead for recursion.

    Also “parallel construction”:

    Run several times in parallel,

    giving several factorizations.

    Then factor into coprimes.
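
The gcd chain described on this slide can be sketched classically for small N. As before, brute-force order computation stands in for the quantum step, and the function names are hypothetical. The order r is written as $2^k d$ with d odd, and N is compared against $a^d - 1$, $a^d + 1$, $a^{2d} + 1$, and so on up to $a^{r/2} + 1$.

```python
from math import gcd

def multiplicative_order(a, n):
    """Order of a in the unit group mod n, by brute force (classical stand-in)."""
    if gcd(a, n) != 1:
        raise ValueError("a must be a unit modulo n")
    r, x = 1, a % n
    while x != 1:
        x = x * a % n
        r += 1
    return r

def split_by_two_adic_order(n, a):
    """Return the nontrivial gcds of n with a^d - 1, a^d + 1, ..., a^(r/2) + 1."""
    r = multiplicative_order(a, n)
    d = r
    while d % 2 == 0:
        d //= 2                      # d is the odd part of r
    x = pow(a, d, n)
    parts = [gcd(n, x - 1)]          # collects primes p_j with c_j = 0
    exponent = d
    while exponent < r:
        parts.append(gcd(n, x + 1))  # collects primes p_j with 2^c_j = 2*exponent/d
        x = x * x % n
        exponent *= 2
    return [g for g in parts if g > 1]

print(split_by_two_adic_order(105, 2))
```

For N = 105 and a = 2 the orders modulo 3, 5, and 7 are 2, 4, and 3, so the three primes have distinct $c_j$ and land in three different gcds.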

  • 24

    These methods use >b qubits.

    Didn’t we claim $b^{2/3+o(1)}$ qubits?

    We actually use Grover’s method

    to search for smooth $b^{2/3+o(1)}$-bit

    numbers in NFS.

    Oracle for Grover’s method:

    factor thoroughly enough

    to recognize smooth inputs.

    We tweak (improved) Shor to

    work in superposition. Careful

    with qubit budget for continued

    fractions, power detection, etc.
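
What the oracle has to decide can be stated with a classical predicate. The sketch below recognizes smooth inputs by trial division, which is only a stand-in to make the definition concrete; the slides describe an oracle built from a modified Shor's algorithm running in superposition, not this method. The function name and bound are illustrative.

```python
def is_smooth(n, bound):
    """Return True if every prime factor of n is at most bound."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    for p in range(2, bound + 1):
        # Composite p never divides here: its prime factors were removed earlier.
        while n % p == 0:
            n //= p
    return n == 1

print(is_smooth(720, 5), is_smooth(14, 5))
```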

  • 25

    A different way to improve

    randomness of factorizations in

    Shor’s algorithm: replace group

    $(\mathbf{Z}/N)^*$ with $E(\mathbf{Z}/N)$

    for a random elliptic curve E.

    Gal Dor suggests unifying

    Grover+ECM with Shor: e.g.,

    compute esP on $E(\mathbf{Z}/N)$ where

    e is superposition of scalars,

    s is smooth scalar,

    E is superposition of curves.

    Open: What are minimum costs

    for this unification?

