(Comparing) Hardware Complexity
of Cryptographic Algorithms
Liam Marnane
University College Cork
Claude Shannon Institute
Thanks
Work of the following Post-Graduate Students
• Dr Francis Crowe
• Maurice Keller
• Andrew Byrne
ECCUCD, 6th September 2007. 1
Outline
• Hardware & Measuring Hardware Complexity
• Comparing Elliptic Curve Implementations to Other
Cryptographic Systems
• Comparing Complexity of Different Elliptic Curve
Implementations
Why Hardware?
• Hardware allows exploitation of Parallelism in Algorithm.
• Hardware Implementation for Increase in Processing Speed:
– Increased Throughput:-
Number of bits processed per second.
– Reduce Time Taken:-
How long it takes to perform the calculation.
• Other Benefits:
– Reduce Power
– Increased Security
Hardware Complexity?
• Cost Benefit Analysis
– How do we measure the Benefit of the implementation?
– How do we measure the Cost of the implementation?
• Use Metrics
– Clock Speed, Throughput, Time taken
– Area
– Power, Energy
Cost:- Area
• No such thing as Free Silicon
[Figure: full adder cell with inputs a_i, b_i, c_i and outputs s_i, c_(i+1)]
• Number of Transistors
– Number of Boolean Gates or Combinational Logic
– Number of Flip Flops or Synchronous Logic (Registers, Memory)
• Wiring or Interconnect
– Can Dominate Area in Large Designs
Benefit:- Processing Speed
• Clock Speed of Design (MHz, GHz)
• Clock Speed is determined by the time the hardware takes to carry out an operation.
– Addition:- Very fast Circuit
– Multiplication:- Slower Circuit
• Critical Path of Circuit
– Change Input
– ⇒ Time through each gate and wire
– ⇒ Output available
Ripple Carry Adder
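[Figure: ripple-carry adder. The A and B registers feed a chain of full adders (FA); the carry ripples from stage to stage and the sum bits are captured in the S register. All registers are clocked by clk.]
Critical Path = Output Delay of A/B Register
+ Ripple of Carry through Combinational Full Adders
+ Setup time of S Register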
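The critical-path sum above can be turned into a rough clock-rate estimate. This is a minimal sketch only: the function name and every delay value are hypothetical placeholders, not figures from the talk.

```python
def ripple_carry_max_clock_mhz(m, t_clk_to_q_ns, t_full_adder_ns, t_setup_ns):
    """Estimate the maximum clock of an m-bit registered ripple-carry adder.

    Critical path = register output delay + carry ripple through m full
    adders + setup time of the sum register (all delays in nanoseconds).
    """
    critical_path_ns = t_clk_to_q_ns + m * t_full_adder_ns + t_setup_ns
    return 1000.0 / critical_path_ns  # a period in ns gives a frequency in MHz


# Hypothetical delays, chosen only to show the trend: a longer carry chain
# means a longer critical path and therefore a slower clock.
f_32 = ripple_carry_max_clock_mhz(32, 0.5, 0.25, 0.5)
f_64 = ripple_carry_max_clock_mhz(64, 0.5, 0.25, 0.5)
```

With these assumed delays the 64-bit adder clocks at roughly half the rate of the 32-bit one, because the carry ripple dominates the path.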
Throughput vs Time Taken
• Throughput
– Bits per second (hopefully Mbit/s or Gbit/s).
– How long it takes to encrypt a book using AES.
– How many public key signatures per second can be calculated using RSA on an e-commerce server.
• Time Taken
– Seconds (hopefully ms and µs).
– How long it takes to carry out a single key exchange using ECC on a PDA.
Low Power
• Cost or Benefit
• Mobility or Heat Dissipation
• Energy or Power
• Energy :- Current flowing throughout calculation
– Battery Lifetime
• Power:- Maximum Current flowing at particular time
– Battery Type
• Trade off in Battery design Power vs Energy
Hardware Complexity of ECC Implementations
• Compare FPGA Implementation of ECC to:
– Private Key Algorithm AES
– Hash Algorithm SHA
– RSA
• Use:- Area, Clock Speed, Throughput and Throughput per unit area.
Field Programmable Gate Arrays
• Excellent for Rapid Prototyping of Hardware Implementations of Signal Processing Algorithms.
• Industry:- time to market.
• University Research:- Cost and Reuse.
• Very large FPGAs available.
• Are Suitable for Implementations of Cryptographic Algorithms.
• Complete Security protocol on a single FPGA
Underlying FPGA Architecture
• Typically, Field Programmable Logic Devices consist of:
– 4-input Lookup Tables:- Boolean Logic
– Simple D-type Latches:- FSM & Memory
– Control Logic
• The device is arranged in an array of Configurable Blocks.
Xilinx Virtex Configurable Logic Block (CLB)
[Figure: Xilinx Virtex CLB with two slices (Slice 0 and Slice 1). Each slice contains two 4-input LUTs (inputs F1 to F4 and G1 to G4), carry and control logic, and two D-type storage elements (outputs XQ, YQ), with combinational outputs X, Y, XB, YB, inputs BX, BY, and carry signals CIN and COUT.]
Underlying FPGA Architecture
• The array of Configurable Blocks is linked by two kinds of interconnect:
– Local interconnect between adjacent CLBs:- Fast
– Global interconnect for buses and communication between functional blocks on the FPGA:- Slow
Local and Global Interconnect between CLBs
[Figure: grid of CLBs (three rows of four shown) linked by local and global interconnect]
Underlying FPGA Architecture
• Dedicated High-Speed Carry-Chain Propagation accelerates Arithmetic operations.
• Parallel multipliers, dedicated memory, RISC Processor.
Advanced Encryption Standard
• AES operates on 128 bits of data.
• Depending on the key size, AES repeats the basic round function 10, 12 or 14 times.
• SubBytes Look Up Table:-
– Dominates Area
– Number used dictates area and throughput.
[Figure: AES round. Input → ByteSub → ShiftRow → MixColumn → AddKey (with Round Key) → Output]
Architecture Types
[Figure: architecture types. (a) Iterative: a single Round block; (b) Unrolled: Round 1 … Round N; (c) Pipelined: Round 1 … Round N]
• Exploit Parallelism by Loop Unrolling.
• Increase Throughput through Pipelining:- Reduction in the length of critical path at cost of increased Latency.
Feedback Modes of Operation
[Figure: Cipher Block Chaining encryption. At time i, plaintext block P_i is XORed with the previous ciphertext C_(i-1) (the IV for the first block) and encrypted by AES under key K to give C_i, for i = 1 … N]
• Pipeline cannot be kept filled, as C1 is required before proceeding with P2.
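The cost of this dependency can be illustrated with a simple cycle count. The sketch below is an idealised model under stated assumptions (one pipeline stage per clock, no stalls other than the feedback dependency); the function names and the 10-stage example are illustrative and not taken from the talk.

```python
def cycles_non_feedback(blocks, stages):
    """Cycles to encrypt `blocks` independent blocks in a `stages`-deep pipeline."""
    # The pipeline fills once, then one block completes on every clock.
    return stages + blocks - 1


def cycles_feedback(blocks, stages):
    """Cycles in a feedback mode such as CBC, where block i needs C(i-1)."""
    # Each block must travel the whole pipeline before the next may enter.
    return stages * blocks


# Assumed example: a 10-stage pipeline processing 1000 blocks.
slowdown = cycles_feedback(1000, 10) / cycles_non_feedback(1000, 10)
```

In this model the feedback mode takes nearly ten times as many cycles, so the extra pipeline registers buy latency but no throughput.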
Secure Hash Standard
• SHA-512 operates on a message in 1024 bit blocks and produces a 512 bit hash value.
• Processing block operates on 64 bit words through 80 iterations of a compression function.
• Critical path is five 64 bit additions.
• Architecture choices: Loop unrolling and pipelining.
[Figure: SHA-512 core with Padding Block, Message Scheduler, Processing Block and Control Block; Message input, Start Hash and Hash Ready signals]
Modular Multiplication
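• Montgomery proposed modular multiplication through a series of additions & right-shifts ⇒ Suitable for hardware implementation.
• Bit lengths dictate bit serial or digit serial approach.
[Figure: bit-serial Montgomery multiplier datapath. Two adders combine R(i) with b_i·A and q_i·M (q_i taken from the lsb), followed by a divide by 2; bus widths range from m to m+2 bits]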
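The add-and-shift structure can be sketched in software. This is a minimal radix-2 (bit-serial) model, assuming an odd modulus and operands below it; the function name is illustrative, and the final conditional subtraction is one common way to normalise the result rather than necessarily what the hardware in this work does.

```python
def montgomery_multiply(a, b, modulus, m):
    """Return a * b * 2^(-m) mod modulus using only additions and right shifts.

    `modulus` must be odd, a and b must be below it, and b must fit in m bits.
    One loop iteration corresponds to one clock cycle of a bit-serial multiplier.
    """
    r = 0
    for i in range(m):
        b_i = (b >> i) & 1            # next bit of B, least significant first
        r += b_i * a                  # conditional addition of A
        q_i = r & 1                   # lsb decides whether M must be added
        r = (r + q_i * modulus) >> 1  # the sum is even, so the shift is exact
    # A final conditional subtraction brings the result into [0, modulus).
    return r - modulus if r >= modulus else r
```

The result carries an extra factor of 2^(-m), which is why operands are normally converted into and out of the Montgomery domain around a long sequence of multiplications.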
RSA Architectures
• Number of Multipliers:- Exponentiation Algorithm Used
– R-L Exponentiation:- 2 modular multipliers in parallel
– L-R Exponentiation:- Single Modular Multiplier.
• Addition of Large Numbers:- Carry Save versus Carry Propagate (Area versus time).
• Suitable for Extensive Pipelining to reduce the critical path.
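The link between the exponentiation algorithm and the number of multipliers can be seen in a short sketch. The function names are illustrative, and Python's `%` stands in for the hardware modular multiplier; this is not the implementation used in the talk.

```python
def modexp_right_to_left(base, exponent, modulus):
    """R-L binary exponentiation: scan exponent bits from the lsb."""
    result, square = 1, base % modulus
    while exponent:
        # The two products below do not depend on each other, so hardware
        # can evaluate them on two modular multipliers in the same step.
        new_result = (result * square) % modulus if exponent & 1 else result
        new_square = (square * square) % modulus
        result, square = new_result, new_square
        exponent >>= 1
    return result


def modexp_left_to_right(base, exponent, modulus):
    """L-R binary exponentiation: scan exponent bits from the msb."""
    result = 1
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus      # square first ...
        if bit == "1":
            result = (result * base) % modulus    # ... then multiply, in sequence
    return result
```

In the L-R form the multiply needs the output of the square, so a single multiplier is used twice per bit; the R-L form trades a second multiplier for fewer sequential steps.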
EC Design Choices for Q = [k]P
• Base Field: GF(p), GF(2^m), GF(3^m), GF(p^m) with p > 3
• Curve: Choose a, b
• Coordinates: Affine, or Projective (Homogeneous, Jacobian, Lopez-Dahab)
• Algorithm: Binary Double and Add, Addition/Subtraction (NAF), Montgomery Method, ...
EC Design Implemented
• Prime Field Fp of 256 bits (security equivalent to 3072-bit RSA)
• Co-ordinates:
– Affine requiring Modular Multiplication and Inversion
– Jacobian Projective
• Bit serial Montgomery Multiplier
• Extended Euclidean Algorithm requiring 512 clock cycles
Design Complexity
Algorithm     Area (CLBs)   Clock (MHz)   Throughput (Mbps)   Thpt/Area (bps/CLB)
AES           3,259         27.86         324                 99417
SHA-512       2,468         40.02         506                 205024
RSA-1024      8,064         51.84         0.051               6.32
ECC-256A      2,718         19.19         0.00873             3.21
ECC-256J      1,353         20.45         0.00439             3.24
• Single Round of AES, with no Pipelining
• Single compression core for SHA
• R-L Algorithm for RSA, 2 Multipliers
• Affine Co-ordinates in Fp for ECC with
multiplier, inverter and adder.
• Jacobian in Fp without conversion.
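The last column of the table is simply throughput divided by area. The short sketch below recomputes it from the tabulated area and throughput values; the dictionary layout and function name are illustrative.

```python
# Figures copied from the Design Complexity table: (area in CLBs, throughput in Mbps).
designs = {
    "AES": (3259, 324),
    "SHA-512": (2468, 506),
    "RSA-1024": (8064, 0.051),
    "ECC-256A": (2718, 0.00873),
    "ECC-256J": (1353, 0.00439),
}


def throughput_per_clb(area_clbs, throughput_mbps):
    """Throughput per unit area in bits per second per CLB."""
    return throughput_mbps * 1e6 / area_clbs


efficiency = {name: throughput_per_clb(*row) for name, row in designs.items()}
```

The recomputed values agree with the table to the precision shown, with the symmetric and hash cores several orders of magnitude ahead of the public key designs.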
Design Complexity: AES
• Single Round of AES
• Key Expansion in hardware
• Encryption and Decryption
• No Pipelining
• 16 asynchronous ROMs used (60% of CLBs)
Design Complexity: SHA-512
• Single Iterative Compression Block Used
• Carry Propagate Adders Used
• 4 Unrolled Architecture:- 3,650 CLBs
– Throughput:- 610 Mbps
– Clock:- 12.51 MHz
– Throughput per CLB of 167000 bps/CLB
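The trade-off between clock speed and cycles per block can be checked with a one-line throughput formula. The slides do not state the cycle counts; the values 81 and 21 below are assumptions (80 iterations, or 80/4 unrolled iterations, plus one extra cycle) that happen to land close to the quoted 506 Mbps and 610 Mbps.

```python
def throughput_mbps(block_bits, clock_mhz, cycles_per_block):
    """Throughput of a core that finishes one block every `cycles_per_block` clocks."""
    return block_bits * clock_mhz / cycles_per_block


# Clock rates are from the slides; the cycle counts are assumed, not quoted.
iterative = throughput_mbps(1024, 40.02, 81)
unrolled_x4 = throughput_mbps(1024, 12.51, 21)
```

Under these assumptions, unrolling cuts the cycle count by about four but lengthens the critical path by more than three, so the net throughput gain is modest while the area grows.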
Design Complexity: RSA-1024
• R-L Algorithm for RSA
• Requires 2 Montgomery Multipliers
• Bit Length of 1026 required
• Carry Propagate Adders Used
• Extensive Pipelining
• Maximum Carry Chain of 130 bits.
Design Complexity: ECC-256A
• Affine Co-ordinates in Fp
• 256 bit, Bit Serial Montgomery Multiplier
• Extended Euclidean Algorithm
• Point Addition:- Inversion, 3 Multiplications, 6 Additions
• Point Doubling:- Inversion, 4 Multiplications, 4 Additions
Design Complexity: ECC-256J
• Jacobian co-ordinates in Fp without conversion.
• Does not include cost of conversion to Affine.
• Point Addition:- 16 Multiplications, 7 Additions
• Point Doubling:- 10 Multiplications, 4 Additions
Design Complexity: Tate-256
Algorithm     Area (CLBs)   Clock (MHz)   Throughput (Mbps)   Thpt/Area (bps/CLB)
Tate-256      8,438         34.74         0.01868             2.21
• Miller's Algorithm
• Jacobian Co-ordinates for Point Addition and Doubling
• Security Multiplier k = 4
• Karatsuba's Method for Multiplication in F_(p^4)
• Includes final Modular Exponentiation
Summary
• Most Area demanding is the RSA algorithm, due to the large 1024 bit key size. (Note Security Level)
• Most Efficient in terms of throughput per CLB is the Hash algorithm.
• Mathematical complexity of ECC results in least efficient designs. (Note similar throughput per CLB figure for Jacobian and Affine.)
• Virtex-E 2000 has 19,200 CLBs and is suitable for implementing all of these algorithms.
Power Consumption & EC Design Choices
• What is the effect of the EC design choices on the Power and Energy consumption of the Hardware implementation?
• FPGA platform used
• (FPGAs are not suitable for Low Power implementations)
• Of interest is the difference between implementations, not the absolute value
EC Design Choices for Q = [k]P: Base Field
• GF(2^m) → GF(2^163): m = 163 ⇒ security equivalent to 1024-bit RSA
• NIST Recommended GF(2^163) Curve
EC Coordinate Systems
• Affine: P = (x, y)
• Projective: P = (X,Y, Z)
– Advantage: Point addition and doubling can be performed without any GF(2^m) division
– Affine to projective conversion: (x, y) → (x, y, 1)
– Generally converted back to affine for transmission
• Two types of projective coordinates used in this work:
– Jacobian: (X, Y, Z) → (X/Z^2, Y/Z^3)
– Lopez-Dahab: (X, Y, Z) → (X/Z, Y/Z^2)
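The two mappings back to affine can be sketched as follows. To keep the arithmetic readable this example works over a small prime field F_p, which is an assumption made for illustration only; the processor in this work operates over GF(2^m), where the same mappings apply with GF(2^m) arithmetic. The function names are illustrative.

```python
def jacobian_to_affine(X, Y, Z, p):
    """Map Jacobian (X, Y, Z) to affine (X / Z^2, Y / Z^3) over F_p."""
    z_inv = pow(Z, -1, p)             # the single field inversion (Python 3.8+)
    z_inv2 = (z_inv * z_inv) % p      # one squaring
    return (X * z_inv2) % p, (Y * z_inv2 * z_inv) % p


def lopez_dahab_to_affine(X, Y, Z, p):
    """Map Lopez-Dahab (X, Y, Z) to affine (X / Z, Y / Z^2) over F_p."""
    z_inv = pow(Z, -1, p)             # the single field inversion
    z_inv2 = (z_inv * z_inv) % p      # one squaring
    return (X * z_inv) % p, (Y * z_inv2) % p
```

Either way the conversion needs one inversion, which is why a design without a divider has to replace it with a long chain of multiplications and squarings.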
Cost of Point Operations
Coordinates    Addition        Doubling        Conversion      Conversion*
Affine         1M + 1D + 1S    1M + 1D + 1S    -               -
Jacobian       10M + 4S        5M + 5S         3M + 1D + 1S    12M + 163S
Lopez-Dahab    8M + 5S         4M + 5S         2M + 1D + 1S    11M + 163S
• M = multiplication, D = division, S = squaring
• * = Conversion with no divider
EC Point Scalar Multiplication Algorithm Cost
• Binary Double and Add:
– N_Binary = (m − 1) N_double + (m/2 − 1) N_add
• Addition/Subtraction – NAF
– N_NAF = (m − 1) N_double + (m/3 − 1) N_add
• Montgomery Method:
– N_Montgomery = N_double + (m − 1) N_loop + N_computey
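The first two formulas can be evaluated directly to see what NAF recoding saves. This is a small sketch with illustrative function names; the unit costs passed in are placeholders, and fractional results reflect that the formulas give average counts.

```python
def binary_cost(m, n_double, n_add):
    """Average cost of binary double-and-add for an m-bit scalar."""
    return (m - 1) * n_double + (m / 2 - 1) * n_add


def naf_cost(m, n_double, n_add):
    """Average cost of addition/subtraction (NAF) for an m-bit scalar."""
    return (m - 1) * n_double + (m / 3 - 1) * n_add


# Placeholder unit costs: count only point additions, with m = 163.
m = 163
saved_additions = binary_cost(m, 0, 1) - naf_cost(m, 0, 1)
```

The number of doublings is identical in both formulas; the saving is m/6 point additions on average, about 27 for m = 163.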
GF(2^m) Hardware Architectures
Operation        Architecture                    Clock Cycles
Addition         m XOR Gates                     Combinational
Multiplication   Digit-Serial, Digit size d      n = ⌈m/d⌉
Squaring         Bit-Parallel AND-XOR network    Combinational
Division         Extended Euclidean Algorithm    2m
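The digit-serial idea and its ⌈m/d⌉ cycle count can be modelled in software. This is a sketch of one common most-significant-digit-first formulation, not necessarily the architecture used in this work; field elements are held as integers whose bits are polynomial coefficients, and the function name is illustrative.

```python
from math import ceil


def gf2m_multiply_digit_serial(a, b, m, poly, d):
    """Multiply a and b in GF(2^m), consuming d bits of b per step.

    `poly` is the irreducible polynomial including its x^m term.
    Returns (product, steps), where steps = ceil(m / d).
    """
    steps = ceil(m / d)
    acc = 0
    for i in reversed(range(steps)):             # most significant digit first
        digit = (b >> (i * d)) & ((1 << d) - 1)
        for _ in range(d):                       # acc = acc * x^d mod poly
            acc <<= 1
            if (acc >> m) & 1:
                acc ^= poly
        partial = 0
        for j in range(d):                       # partial = a * digit (carry-less)
            if (digit >> j) & 1:
                partial ^= a << j
        for k in range(m + d - 2, m - 1, -1):    # reduce partial mod poly
            if (partial >> k) & 1:
                partial ^= poly << (k - m)
        acc ^= partial                           # field addition is XOR
    return acc, steps
```

Each outer iteration models one clock cycle, so for m = 163 a digit size of d = 16 needs 11 cycles where d = 1 needs 163, at the cost of a wider partial-product network.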
Power Comparison
• This work studies the effect of coordinate and algorithm choice on the power and energy consumption of an elliptic curve processor
• Coordinates investigated:
– Affine, Jacobian, Lopez-Dahab
• Algorithms investigated:
– Binary Double and Add
– Addition/Subtraction – NAF
– Montgomery Method
GF(2^m) Elliptic Curve Processor
[Figure: block diagram of the GF(2^m) elliptic curve processor. A Master Control FSM with ROM and Counters drives a control bus; memory0 and memory1 exchange m-bit data with the GF(2^m) adder, squarer, multiplier and divider. The build-up of this figure also shows variants without the divider and with two or three multipliers.]
Implementation Results
• Target technology: Xilinx Spartan 3L – xc3s1000l
– Low Power FPGA
– Hibernate mode
• Two digit sizes of GF(2^m) multiplier used:
– d = 1: area ≈ 3000 LUTs
– d = 16: area ≈ 5100 LUTs
– Divider area ≈ 1100 LUTs
• Minimum PPR Clock Frequency Reported = 80MHz
• Quiescent Power = 92mW
Power Dissipation
[Figure: bar chart of dynamic power (mW, axis from 100 to 250) for d = 1 and d = 16 across 17 architectures: Bin, Mont and NAF, each with Aff, Jac, Jac no div, LD and LD no div, plus Mont LD with 2 and 3 mults. Annotated values: 166.12mW and 150.97mW.]
Point Multiplication Time
[Figure: bar chart of average point multiplication time (ms, axis from 0.0 to 3.5) for d = 1 and d = 16 across 17 architectures: Bin, Mont and NAF, each with Aff, Jac, Jac no div, LD and LD no div, plus Mont LD with 2 and 3 mults.]
Energy Per Point Multiplication
[Figure: bar chart of energy per point multiplication (mJ, axis from 0.0 to 0.6) for d = 1 and d = 16 across 17 architectures: Bin, Mont and NAF, each with Aff, Jac, Jac no div, LD and LD no div, plus Mont LD with 2 and 3 mults. Annotated values: 0.036mJ and 0.17mJ.]
Summary
• Minimum Power: 150.97mW
– Binary Lopez-Dahab no divider, d = 1
– fCLK = 80MHz, Calculation time = 2.87ms
– Energy = 0.43mJ
• Minimum Energy: 0.036mJ
– Montgomery Lopez-Dahab 3 mults, d = 16
– fCLK = 80MHz, Calculation time = 0.18ms
– Power = 203.95mW
What is the “best” set of choices?
• What is most important, power or energy?
• Need a metric to compare designs...
• Power and energy requirements will determine battery size, therefore try to minimise both
• Look at Energy vs. Power
Energy vs. Power
[Figure: scatter plot of Power (mW, axis from 140 to 260) against Energy (mJ, axis from 0.0 to 0.6) at f = 80MHz, one point per architecture, numbered 1 to 34.]
Architecture numbering: 1 to 17 use d = 1 and 18 to 34 use d = 16, each group in the order Bin Aff, Bin Jac, Bin Jac no div, Bin LD, Bin LD no div, Mont Aff, Mont Jac, Mont Jac no div, Mont LD, Mont LD no div, Mont LD 2 mults, Mont LD 3 mults, NAF Aff, NAF Jac, NAF Jac no div, NAF LD, NAF LD no div.
Energy–Power Product
[Figure: bar chart of the EP product (mJ.mW, axis from 0 to 140) at f = 80MHz for architectures 1 to 34, numbered as in the Energy vs. Power plot: 1 to 17 use d = 1 and 18 to 34 use d = 16.]
EP Optimised Choices
               Montgomery LD,          Montgomery LD,
               three mults, d = 16     two mults, d = 16
EP Product:    7.4mJ.mW                7.5mJ.mW
Power:         203.95mW                192.7mW
Energy:        0.036mJ                 0.039mJ
Time:          177µs                   201µs
Area:          9393LUTs                6711LUTs
AT Product:    1.66                    1.35
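The derived rows of this table follow from power, time and area. The sketch below recomputes them; the function name is illustrative, and the unit of the AT product (LUT·seconds) is an assumption, since the slide gives no unit for it.

```python
def metrics(power_mw, time_us, area_luts):
    """Derive energy, energy-power product and area-time product."""
    energy_mj = power_mw * time_us * 1e-6      # mW * us = nJ; 1e-6 converts nJ to mJ
    ep_product = energy_mj * power_mw          # mJ.mW
    at_product = area_luts * time_us * 1e-6    # LUT.s (assumed unit)
    return energy_mj, ep_product, at_product


three_mults = metrics(203.95, 177, 9393)
two_mults = metrics(192.7, 201, 6711)
```

The recomputed figures match the table to its quoted precision: the three-multiplier design is marginally better on the EP product, while the two-multiplier design is clearly better on the AT product.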
Conclusion
• EC Implementation choices do have an effect on the Complexity of the Final Design
• Many Metrics Available to Determine Best Design
• Designer/Vendor will always choose the Metric that puts their design in the best light
• and their competitors in a bad light