Date post: 21-May-2020
High-speed elliptic-curve cryptography

D. J. Bernstein

Thanks to: University of Illinois at Chicago; NSF CCR–9983950; Alfred P. Sloan Foundation

Define $p = 2^{255} - 19$; prime. Define $A = 358990$. Define $\mathrm{Curve} : \mathbf{Z} \to \{0, 1, \ldots, p-1, \infty\}$ by $n \mapsto$ $x$-coordinate of $n$th multiple of $(2, \ldots)$ on the elliptic curve $y^2 = x^3 + Ax^2 + x$ over $\mathbf{F}_p$.

Main topic of this talk: Compute $n, \mathrm{Curve}(m) \mapsto \mathrm{Curve}(mn)$ in very few CPU cycles.

In particular, use floating point for fast arithmetic mod $p$.

Why cryptographers care

Each user has secret key $n$, public key $\mathrm{Curve}(n)$.

Users with secret keys $m, n$ exchange $\mathrm{Curve}(m), \mathrm{Curve}(n)$ through an authenticated channel; compute $\mathrm{Curve}(mn)$; hash it; use hash as shared secret to encrypt and authenticate messages.

Curve speed is important when number of messages is small.
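
The exchange described on this slide can be sketched in a few lines. This is only a toy under stated assumptions: `toy_public` stands in for Curve using the classical $2^n \bmod p$ map rather than the elliptic curve, SHA-256 is an arbitrary choice of hash, and all function names are hypothetical.

```python
import hashlib
import secrets

P = 2**255 - 19  # the prime p from the talk


def toy_public(n: int) -> int:
    """Stand-in for Curve(n): the classical 2^n mod p map, NOT the elliptic curve."""
    return pow(2, n, P)


def toy_shared(n: int, other_public: int) -> bytes:
    """Combine own secret key with the peer's public key, then hash the result."""
    shared = pow(other_public, n, P)  # both users arrive at 2^(mn) mod p
    return hashlib.sha256(shared.to_bytes(32, "little")).digest()


m = secrets.randbelow(P - 2) + 1  # first user's secret key
n = secrets.randbelow(P - 2) + 1  # second user's secret key
key_m = toy_shared(m, toy_public(n))
key_n = toy_shared(n, toy_public(m))
assert key_m == key_n  # both sides hold the same shared secret
```

Each user performs one public-key computation and one shared-secret computation, which is why the cost of the Curve function dominates when only a few messages follow.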

Analogous system using $2^n \bmod p$: 1976 Diffie, Hellman.

Using elliptic curves to avoid index-calculus attacks: 1986 Miller, 1987 Koblitz.

Using $x^3 + Ax^2 + x$ for speed: 1987 Montgomery (for ECM).

High precision from fp sums: 1968 Veltkamp, 1971 Dekker.

Speedups: 1999–2005 Bernstein.

Understanding CPU design

Computers are designed for music, movies, Photoshop, Doom 3, etc. Heavy use of fp arithmetic, i.e., approximate real arithmetic.

Example: Athlon, every cycle, does one add and one multiply of high-precision fp numbers.

Programmer paying attention to these CPU features can use them for cryptography.

A 53-bit fp number is a real number $2^e f$ with $e, f \in \mathbf{Z}$ and $|f| \le 2^{53}$.

Round each real number $z$ to closest 53-bit fp number, $\mathrm{fp}_{53}\, z$. Round halves to even.

Examples:

$\mathrm{fp}_{53}(8675309) = 8675309$;

$\mathrm{fp}_{53}(2^{127} + 8675309) = 2^{127}$;

$\mathrm{fp}_{53}(2^{127} - 8675309) = 2^{127}$.
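
The three rounding examples can be reproduced with ordinary double-precision floats. This sketch assumes that Python's `float` is an IEEE 754 double (53-bit significand, round-to-nearest-even), which holds on common platforms; the helper name `fp53` is borrowed from the slide and only handles integer inputs.

```python
def fp53(z: int) -> float:
    """Round an integer to the nearest double (53-bit significand, ties to even)."""
    return float(z)


# The three examples from the slide.
assert fp53(8675309) == 8675309
assert fp53(2**127 + 8675309) == 2.0**127
assert fp53(2**127 - 8675309) == 2.0**127
```

Unlike the idealised definition on the slide, a double also has a bounded exponent, which is the "limits on the exponent" caveat that matters on real hardware.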

Typical CPU: UltraSPARC III.

Every cycle, UltraSPARC III can do one fp multiplication $x, y \mapsto \mathrm{fp}_{53}(xy)$ and one fp addition $x, y \mapsto \mathrm{fp}_{53}(x + y)$, subject to limits on $e$.

“4-cycle fp-operation latency”: Results available after 4 cycles.

Can substitute subtraction for addition. I’ll count subtractions as additions.

Some variation among CPUs.

PowerPC RS64 IV: One addition or one multiplication or one “fused” $x, y, z \mapsto \mathrm{fp}_{53}(xy + z)$. Results available after 4 cycles.

Athlon: $\mathrm{fp}_{64}$ instead of $\mathrm{fp}_{53}$; one multiplication and one addition. Results available after 4 cycles.

I’ll focus on UltraSPARC III. Not the most important CPU, but it’s a good warmup.

Exact dot products

If $a, b \in \{-2^{20}, \ldots, 0, 1, \ldots, 2^{20}\}$ then $ab$ is a 53-bit fp number so $ab = \mathrm{fp}_{53}(ab)$.

If $a, b, c, d \in \{-2^{20}, \ldots, 2^{20}\}$ then $ab$, $cd$, $ab + cd$ are 53-bit fp numbers so $ab = \mathrm{fp}_{53}(ab)$, $cd = \mathrm{fp}_{53}(cd)$, $ab + cd = \mathrm{fp}_{53}(ab + cd)$.

UltraSPARC III computes $a, b, c, d \mapsto ab + cd$ with two fp mults, one fp add.
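
A quick way to convince oneself of the exactness claim is to compare the floating-point result with exact integer arithmetic. This is a minimal sketch assuming IEEE 754 doubles; the names `fp_dot` and `BOUND` and the letters a, b, c, d are illustrative choices.

```python
import random

BOUND = 2**20


def fp_dot(a: int, b: int, c: int, d: int) -> float:
    """Compute ab + cd with two double multiplications and one double addition."""
    return float(a) * float(b) + float(c) * float(d)


random.seed(1)
for _ in range(1000):
    a, b, c, d = (random.randint(-BOUND, BOUND) for _ in range(4))
    # Python compares float and int exactly, so this checks there was no rounding.
    assert fp_dot(a, b, c, d) == a * b + c * d
```

The worst case is $2^{40} + 2^{40} = 2^{41}$, far below $2^{53}$, so no rounding can occur for inputs in this range.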

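Bit extraction

Define $\alpha_i = 3 \cdot 2^{i+51}$, $\mathrm{top}_i\, r = \mathrm{fp}_{53}(\mathrm{fp}_{53}(r + \alpha_i) - \alpha_i)$, $\mathrm{bottom}_i\, r = \mathrm{fp}_{53}(r - \mathrm{top}_i\, r)$.

If $r$ is a 53-bit fp number and $|r| \le 2^{i+51}$ then $\mathrm{top}_i\, r \in 2^i \mathbf{Z}$; $|\mathrm{bottom}_i\, r| \le 2^{i-1}$; and $r = \mathrm{top}_i\, r + \mathrm{bottom}_i\, r$.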
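
The bit-extraction trick translates directly into double-precision code. The following sketch assumes IEEE 754 doubles with round-to-nearest-even and no extended-precision intermediate results; the function names `top` and `bottom` follow the slide, and the randomised check is an illustration, not a proof.

```python
import random


def top(i: int, r: float) -> float:
    """Multiple of 2^i nearest to r, obtained with two double additions."""
    alpha = 3.0 * 2.0 ** (i + 51)  # adding alpha forces rounding at bit position i
    return (r + alpha) - alpha


def bottom(i: int, r: float) -> float:
    """What is left of r after removing top(i, r): one more double addition."""
    return r - top(i, r)


random.seed(2)
i = 22
for _ in range(1000):
    # float() rounds the random integer to a 53-bit fp number within the bound.
    r = float(random.randint(-(2 ** (i + 51)), 2 ** (i + 51)))
    t, b = top(i, r), bottom(i, r)
    assert int(t) % 2**i == 0          # top is a multiple of 2^i
    assert abs(b) <= 2 ** (i - 1)      # bottom is small
    assert int(t) + int(b) == int(r)   # the split is exact
```

The constant works because $r + \alpha_i$ lands between $2^{i+52}$ and $2^{i+53}$, where consecutive doubles are exactly $2^i$ apart.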

Big integers as fp sums

Every integer mod $2^{255} - 19$ can be written as a sum $u_0 + u_{22} + u_{43} + u_{64} + u_{85} + u_{107} + u_{128} + u_{149} + u_{170} + u_{192} + u_{213} + u_{234}$ where $u_i / 2^i \in \{-2^{22}, \ldots, 2^{22}\}$.

Indices $i$ are $\lceil 255j/12 \rceil$ for $j \in \{0, 1, \ldots, 11\}$.

Representation is not unique; it’s not the input/output format. Uniqueness would cost cycles!
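
One possible way to produce such a representation is sketched below. It assumes nonnegative digits, which is only one of the many valid choices the slide allows, and the helper names `to_limbs` and `from_limbs` are hypothetical.

```python
P = 2**255 - 19
INDICES = [-(-255 * j // 12) for j in range(12)]  # ceil(255 j / 12): 0, 22, 43, ..., 234


def to_limbs(x: int) -> list[float]:
    """Split 0 <= x < 2^255 into 12 doubles u_i, each an integer multiple of 2^i."""
    bounds = INDICES + [255]
    limbs = []
    for lo, hi in zip(bounds, bounds[1:]):
        digit = (x >> lo) & ((1 << (hi - lo)) - 1)  # 21 or 22 bits wide
        limbs.append(float(digit << lo))            # exact: few significant bits
    return limbs


def from_limbs(limbs: list[float]) -> int:
    """Recover the integer mod p by summing the limbs with exact integer arithmetic."""
    return sum(int(u) for u in limbs) % P


x = P - 1
assert from_limbs(to_limbs(x)) == x
assert all(abs(int(u)) // 2**i <= 2**22 for u, i in zip(to_limbs(x), INDICES))
```

Because each limb has at most 22 significant bits, converting it to a double loses nothing, which is what lets the later products stay exact.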

Assume $u = \sum_i u_i$ as above, and similarly $v = \sum_j v_j$. Then $uv = w_0 + w_{22} + \cdots + w_{468}$ where

$w_0 = u_0 v_0$,

$w_{22} = u_0 v_{22} + u_{22} v_0$,

$w_{43} = u_0 v_{43} + u_{22} v_{22} + u_{43} v_0$,

etc.

Each $w_k$ is a 53-bit fp number.

Given $u_i$’s and $v_j$’s, can compute $w_k$’s using 144 fp mults, 121 fp adds.
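
The limb product can be checked against exact integer arithmetic. In this sketch the limbs are drawn at random within the stated range; `random_limbs` and `limb_product` are hypothetical names, and the loop accumulates into zero-initialised slots, so it performs a few more additions than the 121 counted on the slide.

```python
import random

INDICES = [-(-255 * j // 12) for j in range(12)]  # 0, 22, 43, ..., 234


def random_limbs(rng: random.Random) -> list[float]:
    """Hypothetical helper: 12 doubles u_i with u_i / 2^i in {-2^22, ..., 2^22}."""
    return [float(rng.randint(-(2**22), 2**22) * 2**i) for i in INDICES]


def limb_product(u: list[float], v: list[float]) -> list[float]:
    """Schoolbook product: slot m collects every u[a] * v[b] with a + b == m."""
    w = [0.0] * 23
    for a in range(12):
        for b in range(12):
            w[a + b] += u[a] * v[b]  # one double multiply, one double add
    return w


rng = random.Random(3)
u, v = random_limbs(rng), random_limbs(rng)
w = limb_product(u, v)
exact = sum(int(x) for x in u) * sum(int(x) for x in v)
assert sum(int(x) for x in w) == exact  # no rounding happened anywhere
```

Every partial sum in slot m is a multiple of a common power of two and spans fewer than 53 bits, which is why the double additions are exact.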

Furthermore, modulo $2^{255} - 19$, $uv \equiv r_0 + r_{22} + \cdots + r_{234}$ where

$r_0 = w_0 + 19 \cdot 2^{-255} w_{255}$,

$r_{22} = w_{22} + 19 \cdot 2^{-255} w_{277}$, etc.

Each $r_i$ is a 53-bit fp number. Example: $r_0$ is an integer; $|r_0| \le 381 \cdot 2^{44}$.

Computing $r_i$’s from $w_k$’s takes 11 fp mults, 11 fp adds.

Structure: $(\mathbf{Z}[t] \cap \mathbf{Z}[2^{255/12} t]) / (2^{255} t^{12} - 19) \to \mathbf{Z}/(2^{255} - 19)$.
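
One way to see both the reduction rule and the bound in the example is the short derivation below. It writes $w_k$ for the product limbs and $r_0$ for the reduced limb (names assumed here), and uses the limb range $|u_i|, |v_i| \le 2^{22} \cdot 2^i$. The slot $w_{255}$ collects 11 products $u_i v_j$: for 9 of them $i + j = 256$, and for the remaining 2 (indices 85 and 170, in either order) $i + j = 255$.

```latex
\begin{align*}
w_{255} &= 2^{255} \cdot \bigl(2^{-255} w_{255}\bigr)
         \equiv 19 \cdot 2^{-255} w_{255} \pmod{2^{255} - 19}, \\
2^{-255}\,|w_{255}| &\le (9 \cdot 2 + 2 \cdot 1) \cdot 2^{44} = 20 \cdot 2^{44}, \\
|r_0| &\le |w_0| + 19 \cdot 2^{-255}\,|w_{255}|
       \le 2^{44} + 19 \cdot 20 \cdot 2^{44} = 381 \cdot 2^{44}.
\end{align*}
```

Since $381 < 2^9$, the value $r_0$ is an integer below $2^{53}$ in absolute value and therefore fits in a 53-bit fp number.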

Carries

“Carry from $r_0$ to $r_{22}$”: replace $r_0$ and $r_{22}$ by $\mathrm{bottom}_{22}\, r_0$ and $r_{22} + \mathrm{top}_{22}\, r_0$.

This takes 4 fp adds, and guarantees $|r_0| \le 2^{21}$.

Series of 13 carries puts all $r_i$’s in range for subsequent products: from $r_{192}$ to $r_{213}$ to $r_{234}$ to $w_{255}$; then from $r_0$ to $r_{22}$ to $r_{43}$ to $\cdots$ to $r_{192}$ to $r_{213}$.

This takes 52 fp adds.
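
A single carry step can be written with exactly four double additions. This is a sketch assuming IEEE 754 doubles; the function name `carry` and the sample values are illustrative, with the low limb chosen near the $381 \cdot 2^{44}$ bound from the talk.

```python
def carry(i: int, r_lo: float, r_hi: float) -> tuple[float, float]:
    """Carry at bit position i using four double additions/subtractions."""
    alpha = 3.0 * 2.0 ** (i + 51)
    top = (r_lo + alpha) - alpha   # 2 adds: nearest multiple of 2^i
    bottom = r_lo - top            # 1 add: what stays in the low limb
    return bottom, r_hi + top      # 1 add: push the rest into the high limb


r0 = float(381 * 2**44 - 12345)    # an integer close to the bound on r_0
r22 = float(7 * 2**22)             # some multiple of 2^22
new_r0, new_r22 = carry(22, r0, r22)
assert abs(new_r0) <= 2**21                               # low limb back in range
assert int(new_r0) + int(new_r22) == int(r0) + int(r22)   # value unchanged
assert int(new_r22) % 2**22 == 0                          # high limb keeps its alignment
```

Thirteen such steps at 4 adds each account for the 52 fp adds quoted for the full carry chain.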

Total 155 mults, 184 adds to multiply modulo $2^{255} - 19$ in this representation.

At least 184 UltraSPARC III cycles, since the CPU does at most one fp add per cycle.

= 184 cycles? Two obstacles: fp-operation latency; “load/store” latency imposed by limited number of “registers.”

Schedule instructions carefully to bring cycles down to 184.
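
The totals follow from the three stages counted earlier (product, reduction, carries). As a tally, where the 121 additions arise from merging 144 products into 23 sums:

```latex
\begin{align*}
\text{mults} &= \underbrace{12 \cdot 12}_{\text{product}} + \underbrace{11}_{\text{reduction}} = 155, \\
\text{adds}  &= \underbrace{144 - 23}_{\text{product}} + \underbrace{11}_{\text{reduction}} + \underbrace{13 \cdot 4}_{\text{carries}} = 121 + 11 + 52 = 184.
\end{align*}
```

With one fp add and one fp multiply issued per cycle, the adds are the bottleneck, which is why the later speedups are measured in adds.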

Have developed qhasm, new programming language for high-speed computations. Includes range verification, guided register allocation, etc.

Lets me write desired code with much less human time than traditional asm, C compiler, etc.

Have also used for fast AES, fast Poly1305, fast Salsa20, etc.; see, e.g., http://cr.yp.to/mac/poly1305_athlon.s.

Speedup: Squarings

Often know in advance that $u = v$.

$u_0 u_{64} + u_{22} u_{43} + u_{43} u_{22} + u_{64} u_0$ is more efficiently computed as $2(u_0 u_{64} + u_{22} u_{43})$.

Even better: First compute $2u_0, 2u_{22}, \ldots, 2u_{234}$ and then compute $(2u_0) u_{64} + (2u_{22}) u_{43}$ etc.

130 fp adds instead of 184.

Makes carry time even more visible.
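
The doubling trick can be illustrated on plain coefficient lists. This sketch uses Python integers rather than fp limbs to keep the focus on the operation pattern; `schoolbook` and `square` are hypothetical names, and the code does not attempt to reproduce the exact add counts from the talk.

```python
import random


def schoolbook(u: list[int], v: list[int]) -> list[int]:
    """Reference product: every u[a] * v[b] is computed separately."""
    w = [0] * (len(u) + len(v) - 1)
    for a, ua in enumerate(u):
        for b, vb in enumerate(v):
            w[a + b] += ua * vb
    return w


def square(u: list[int]) -> list[int]:
    """Square using precomputed doubles, so each cross term is computed once."""
    doubled = [2 * x for x in u]
    w = [0] * (2 * len(u) - 1)
    for a in range(len(u)):
        w[2 * a] += u[a] * u[a]            # diagonal term, not doubled
        for b in range(a + 1, len(u)):
            w[a + b] += doubled[a] * u[b]  # (2 u_a) u_b covers u_a u_b and u_b u_a
    return w


rng = random.Random(5)
u = [rng.randint(-(2**22), 2**22) for _ in range(12)]
assert square(u) == schoolbook(u, u)
```

Roughly half of the products, and with them the additions that combine them, disappear, so the fixed cost of the carries becomes a larger share of the total.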

Speedup: Karatsuba’s method

Say $U_0 = u_0 + u_{22} t + \cdots + u_{107} t^5$, $U_1 = u_{128} + u_{149} t + \cdots + u_{234} t^5$, $V_0 = v_0 + \cdots$, $V_1 = v_{128} + \cdots$.

Original, 184 adds: Product is $U_0 V_0 + (U_0 V_1 + U_1 V_0) t^6 + U_1 V_1 t^{12}$.

Karatsuba, 182 adds: $((U_0 + U_1)(V_0 + V_1) - U_0 V_0 - U_1 V_1) t^6 + U_0 V_0 + U_1 V_1 t^{12}$.

Improved Karatsuba, 177 adds: $(U_0 + U_1)(V_0 + V_1) t^6 + (U_0 V_0 - U_1 V_1 t^6)(1 - t^6)$.
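
That the three formulas agree can be checked mechanically on random 6-coefficient halves. The sketch below represents polynomials in $t$ as coefficient lists of Python integers; the names `U0`, `U1`, `V0`, `V1` and all helper functions are assumptions made for illustration, and the code verifies the identities only, not the add counts.

```python
import random


def add(f, g):
    n = max(len(f), len(g))
    f, g = f + [0] * (n - len(f)), g + [0] * (n - len(g))
    return [x + y for x, y in zip(f, g)]


def sub(f, g):
    return add(f, [-y for y in g])


def mul(f, g):
    h = [0] * (len(f) + len(g) - 1)
    for a, x in enumerate(f):
        for b, y in enumerate(g):
            h[a + b] += x * y
    return h


def shift(f, k):
    """Multiply by t^k."""
    return [0] * k + f


def original(U0, U1, V0, V1):
    cross = add(mul(U0, V1), mul(U1, V0))
    return add(add(mul(U0, V0), shift(cross, 6)), shift(mul(U1, V1), 12))


def karatsuba(U0, U1, V0, V1):
    low, high = mul(U0, V0), mul(U1, V1)
    mid = sub(sub(mul(add(U0, U1), add(V0, V1)), low), high)
    return add(add(low, shift(mid, 6)), shift(high, 12))


def improved(U0, U1, V0, V1):
    low, high = mul(U0, V0), mul(U1, V1)
    one_minus_t6 = [1, 0, 0, 0, 0, 0, -1]
    combined = mul(sub(low, shift(high, 6)), one_minus_t6)
    return add(shift(mul(add(U0, U1), add(V0, V1)), 6), combined)


rng = random.Random(4)
U0, U1, V0, V1 = ([rng.randint(-(2**22), 2**22) for _ in range(6)] for _ in range(4))
assert original(U0, U1, V0, V1) == karatsuba(U0, U1, V0, V1) == improved(U0, U1, V0, V1)
```

The improved form replaces two separate subtractions of full-length products with one subtraction followed by a multiplication by $1 - t^6$, which is just a shifted subtraction.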

The Curve function

Overall strategy to compute

$n, \mathrm{Curve}(m) \mapsto \mathrm{Curve}(mn)$,

using arithmetic mod $p = 2^{255} - 19$:

For various integers $k$,

find $x_k, z_k$ such that

$\mathrm{Curve}(km) = x_k / z_k \pmod{p}$,

i.e., $z_k \, \mathrm{Curve}(km) \equiv x_k \pmod{p}$.

e.g. $x_1 = \mathrm{Curve}(m)$, $z_1 = 1$,

assuming $\mathrm{Curve}(m) \ne \infty$.

Can easily restrict $n, \mathrm{Curve}(m)$

to ensure that $\infty$ never appears.

We’ll see how to compute

$x_k, z_k \mapsto x_{2k}, z_{2k}$; and

$x_k, z_k, x_{k+1}, z_{k+1}, \mathrm{Curve}(m) \mapsto x_{2k+1}, z_{2k+1}$.

Combine to compute

$x_j, z_j, x_{j+1}, z_{j+1}, \mathrm{Curve}(m) \mapsto x_k, z_k, x_{k+1}, z_{k+1}$

where $j = \lfloor k/2 \rfloor$, $b = k \bmod 2$.

Conditional branches and

input-dependent load addresses

can leak via timing.

Replace with arithmetic:

e.g., $(1 - b) x_j + (b) x_{j+1}$.
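As a minimal sketch of the "replace with arithmetic" idea, the selection below picks one of two values using only multiplication and addition on the bit, with no `if` on secret data. The function name `select` is an assumption for illustration. Python integers are not constant-time, so this shows the shape of the computation rather than a timing-safe implementation.

```python
def select(b, x0, x1):
    """Return x0 when b == 0 and x1 when b == 1, without branching on b."""
    return (1 - b) * x0 + b * x1


# The same expression is evaluated whichever value the bit has.
assert select(0, 11, 22) == 11
assert select(1, 11, 22) == 22
```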

Eventually reach $k = n$.

Divide $x_n$ by $z_n$ modulo $p$

to obtain $\mathrm{Curve}(mn)$.

Simple division method: Fermat!

$x_n / z_n \equiv x_n z_n^{p-2} \pmod{p}$.

Euclid-type division methods

are faster but have

input-dependent timings.

Finally convert from

floating-point representation

to byte-string output format.
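The Fermat-style division can be sketched with Python's built-in three-argument `pow`. The name `divide_mod_p` is an assumption for illustration, and Python's big-integer arithmetic makes no timing guarantees, so this only demonstrates the identity $1/z \equiv z^{p-2} \pmod{p}$ for $z$ not divisible by $p$.

```python
P = 2**255 - 19


def divide_mod_p(x, z):
    """Return x / z modulo P, computing the inverse as z**(P - 2) mod P."""
    return x * pow(z, P - 2, P) % P


# Multiplying the quotient back by z recovers x modulo P.
q = divide_mod_p(16, 8)
assert q == 2
assert 8 * q % P == 16
```

The exponent $p - 2$ is a fixed public constant, so the sequence of squarings and multiplications does not depend on the secret input, which is the property the slide contrasts with Euclid-type methods.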

From $k$ to $2k$

In $\mathbf{Z}/p$:

$x_{2k} = (x_k^2 - z_k^2)^2$,

$z_{2k} = 4 x_k z_k (x_k^2 + A x_k z_k + z_k^2)$.

Compute as follows:

$(x_k - z_k)^2$; $(x_k + z_k)^2$;

$x_{2k} = (x_k - z_k)^2 (x_k + z_k)^2$;

$4 x_k z_k = (x_k + z_k)^2 - (x_k - z_k)^2$;

$(A - 2) x_k z_k = 89747 \cdot 4 x_k z_k$;

$z_{2k} = 4 x_k z_k ((x_k + z_k)^2 + (A - 2) x_k z_k)$.
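The doubling steps translate almost line for line into code. The sketch below follows the "compute as follows" sequence and compares it with the direct formulas. It uses Python integers in place of the talk's floating-point representation, and the function names are illustrative assumptions. The constant 89747 is $(A - 2)/4$ for $A = 358990$.

```python
P = 2**255 - 19
A = 358990


def double(x, z):
    """Map (x_k, z_k) to (x_2k, z_2k) using two squarings and two products."""
    d = (x - z) ** 2 % P                   # (x - z)^2
    s = (x + z) ** 2 % P                   # (x + z)^2
    x2 = d * s % P                         # (x^2 - z^2)^2
    xz4 = (s - d) % P                      # 4 x z
    z2 = xz4 * (s + 89747 * xz4) % P       # 4 x z (x^2 + A x z + z^2)
    return x2, z2


def double_direct(x, z):
    """The same map, written straight from the closed-form formulas."""
    x2 = (x * x - z * z) ** 2 % P
    z2 = 4 * x * z * (x * x + A * x * z + z * z) % P
    return x2, z2


assert 4 * 89747 == A - 2
assert double(2, 1) == double_direct(2, 1) == (9, 5743880)
```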

From $k, k+1$ to $2k+1$

$x_{2k+1} = 4(x_k x_{k+1} - z_k z_{k+1})^2$,

$z_{2k+1} = 4(x_k z_{k+1} - z_k x_{k+1})^2 \, \mathrm{Curve}(m)$.

Compute as follows:

$(x_k - z_k)(x_{k+1} + z_{k+1})$;

$(x_k + z_k)(x_{k+1} - z_{k+1})$;

$2(x_k x_{k+1} - z_k z_{k+1}) = \text{sum}$;

$2(x_k z_{k+1} - z_k x_{k+1}) = \text{difference}$;

$x_{2k+1} = (2(x_k x_{k+1} - z_k z_{k+1}))^2$;

$(2(x_k z_{k+1} - z_k x_{k+1}))^2$;

$z_{2k+1} = (\cdots) \, \mathrm{Curve}(m)$.
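Putting the two steps together gives a ladder over the bits of the scalar. The sketch below is one possible arrangement under stated assumptions: Python integers instead of floating-point pieces, function names chosen for illustration, the pair $(1, 0)$ used for index 0, bit selection done with the arithmetic form from the earlier slide, and a final Fermat division. It is not timing-safe, and it is not the talk's software.

```python
P = 2**255 - 19


def double(x, z):
    """(x_k, z_k) -> (x_2k, z_2k); 89747 is (A - 2) / 4 for A = 358990."""
    d = (x - z) ** 2 % P
    s = (x + z) ** 2 % P
    xz4 = (s - d) % P
    return d * s % P, xz4 * (s + 89747 * xz4) % P


def diff_add(xk, zk, xk1, zk1, x1):
    """(x_k, z_k), (x_k+1, z_k+1), Curve(m) -> (x_2k+1, z_2k+1)."""
    a = (xk - zk) * (xk1 + zk1) % P
    b = (xk + zk) * (xk1 - zk1) % P
    x = (a + b) ** 2 % P           # (2 (x_k x_k+1 - z_k z_k+1))^2
    z = (a - b) ** 2 * x1 % P      # (2 (x_k z_k+1 - z_k x_k+1))^2 * Curve(m)
    return x, z


def ladder(n, x1):
    """Return x_n / z_n mod P, starting from x_1 = x1, z_1 = 1."""
    xk, zk = 1, 0                  # index k = 0
    xk1, zk1 = x1 % P, 1           # index k + 1 = 1
    for i in reversed(range(n.bit_length())):
        b = (n >> i) & 1
        # Choose the point to double: index k if b == 0, index k + 1 if b == 1.
        xd = ((1 - b) * xk + b * xk1) % P
        zd = ((1 - b) * zk + b * zk1) % P
        x_odd, z_odd = diff_add(xk, zk, xk1, zk1, x1)    # index 2k + 1
        x_even, z_even = double(xd, zd)                  # index 2k or 2k + 2
        # The new pair has indices 2k + b and 2k + b + 1.
        xk, zk = ((1 - b) * x_even + b * x_odd) % P, ((1 - b) * z_even + b * z_odd) % P
        xk1, zk1 = ((1 - b) * x_odd + b * x_even) % P, ((1 - b) * z_odd + b * z_even) % P
    return xk * pow(zk, P - 2, P) % P


# Sanity checks: the first multiple is the input, and scalars commute.
m, n = 123456789, 987654321
assert ladder(1, 2) == 2
assert ladder(n, ladder(m, 2)) == ladder(m, ladder(n, 2))
```

The last assertion reflects the property the talk's main map relies on: applying $n$ to $\mathrm{Curve}(m)$ and applying $m$ to $\mathrm{Curve}(n)$ reach the same value.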

Total time

Slightly over 1600 fp adds

(520 from carries)

for each bit of $n$.

Total for 256-bit $n$:

413000 fp adds; plus

50000 fp adds for final division.

Aiming for 500000 cycles.

Still have to finish software.

Should end up even faster than

my NIST P-224 software,

despite 14% more bits!
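The totals on this slide can be cross-checked with a few lines of arithmetic. The variable names are illustrative. The sketch only adds up the figures quoted above and does not model how fp adds map to cycles.

```python
ladder_adds = 413000        # quoted total for a 256-bit scalar
division_adds = 50000       # quoted cost of the final division
bits = 256

per_bit = ladder_adds / bits            # "slightly over 1600" per bit
total = ladder_adds + division_adds     # all fp adds, ladder plus division
size_ratio = 255 / 224                  # 255-bit prime versus NIST P-224

print(per_bit, total, round(100 * (size_ratio - 1)))
```

The sum of the two quoted add counts stays below the 500000 figure named as the cycle target.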
