Comparison of Hardware Performance of Selected Phase II eSTREAM Candidates
Kris Gaj
Gabriel Southern
Ramakrishna Bachimanchi
&
Fall 2006 GMU ECE 545: Introduction to VHDL class
George Mason University
Goal
Comparison of Profile II (hardware) Phase 2 Focus candidates:
• Grain
• Mickey-128
• Phelix
• Trivium
Two additional reference points:
• A5/1 (old & insecure GSM standard)
• AES (compact architecture)
Two hardware technologies:
• Xilinx Spartan 3 FPGAs
• TSMC 90 nm standard-cell library ASICs
Genesis & approach
• Part of GMU Fall 2006 graduate course ECE 545 Introduction to VHDL
• Individual 6-week project
• 4 students working independently on each eSTREAM cipher
• Best code for each algorithm selected at the end of the semester
• Selected designs verified and revised in order to assure:
  • correct functionality
  • standard interface & control
  • uniform design & coding style
Fixed interface
[Figure: eSTREAM cipher core with ports clk, reset, enc_dec; key_IV (k bits), key_IV_ready, key_IV_write; data_in (d bits), data_in_ready, data_in_write; data_out (d bits), write, full.]
Two independent parameters:
• d – number of bits processed per clock cycle (radix)
• k – number of bits of key/IV loaded per clock cycle
d affects: encryption/decryption throughput, area, # of pins
k affects: setup time (key & IV loading + initialization), area, # of pins
All results generated with k = d
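Because each core processes d bits per clock cycle, its throughput is d times the clock frequency, so a wider datapath only pays off if the clock frequency drops by less than a factor of d. A minimal Python sketch of this relationship follows; the function names and the clock frequencies in the example are hypothetical, not results of the designs compared here.

```python
def throughput_mbit_s(d: int, f_clk_mhz: float) -> float:
    """Throughput of a core processing d bits per clock cycle at f_clk MHz."""
    return d * f_clk_mhz


def break_even_clock_mhz(d: int, d_ref: int, f_ref_mhz: float) -> float:
    """Clock frequency a radix-d design needs to match the throughput of a
    radix-d_ref design running at f_ref MHz; above it, radix-d is faster."""
    return f_ref_mhz * d_ref / d


# Hypothetical clock frequencies, for illustration only (not measured results).
print(throughput_mbit_s(1, 200.0))         # 200.0
print(throughput_mbit_s(16, 150.0))        # 2400.0
print(break_even_clock_mhz(16, 1, 200.0))  # 12.5
```

In this toy example a radix-16 design clocked above 12.5 MHz already beats a radix-1 design at 200 MHz in throughput; the price is paid in area and pins.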
Methodology
Specification
• Execution Unit: block diagram → VHDL code
• Control Unit: Algorithmic State Machine → VHDL code
Methodology & tools
• VHDL simulation & debugging: Aldec Active HDL, ModelSim Xilinx Edition
• Logic synthesis: Synplicity Synplify Pro v. 8.5 (FPGA); Synopsys Design Analyzer X-2005.9 (ASIC)
• Implementation (mapping, placing & routing): Xilinx ISE v. 8.1i (FPGA); no physical implementation (ASIC)
• FPGA: all results after placing & routing; ASIC: all results after logic synthesis
Assumptions
• Only encryption/decryption, no MAC
• Maximum allowed key and IV sizes
• Key and IV need to be reloaded each time either of them changes
• No precomputation of the internal state outside of the circuit
Cipher        Key size   IV size   Internal state size   (all sizes in bits)
Grain         80         64        160
Mickey-128    128        128       320
Phelix        256        128       288
Trivium       80         80        288
A5/1          64         22        64
Three categories of stream ciphers are represented among those implemented:
• Based on LFSRs / NFSRs with serial inputs: Grain, Trivium, A5/1
• Based on LFSRs / NFSRs with parallel inputs: Mickey-128
• Based on the basic iterative architecture and component operations of block ciphers and hash functions: Phelix, AES in OFB or CTR mode
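One way to read the distinction between serial and parallel inputs is the difference between a Fibonacci-style shift register, where the feedback bit enters at one end only, and a Galois-style register, where the bit shifted out is XORed into several positions at once. The Python sketch below illustrates that assumption with hypothetical 4-bit toy registers (not the registers of any cipher above); both variants reach the maximal period of 2^4 - 1 = 15 states.

```python
def fibonacci_step(state: int, taps: tuple, n: int) -> int:
    """Serial input: the XOR of the tapped bits enters at the top bit only."""
    fb = 0
    for t in taps:
        fb ^= (state >> t) & 1
    return (state >> 1) | (fb << (n - 1))


def galois_step(state: int, mask: int) -> int:
    """Parallel inputs: the bit shifted out is XORed into every bit set in mask."""
    out = state & 1
    state >>= 1
    if out:
        state ^= mask
    return state


def period(step, state: int, *args) -> int:
    """Number of steps until a nonzero start state repeats."""
    start, count = state, 0
    while True:
        state = step(state, *args)
        count += 1
        if state == start:
            return count


# Toy 4-bit registers (hypothetical parameters chosen for maximal period).
# Fibonacci taps (0, 1) implement s[i+4] = s[i+1] ^ s[i], i.e. x^4 + x + 1.
print(period(fibonacci_step, 0b0001, (0, 1), 4))  # 15
print(period(galois_step, 0b0001, 0b1001))        # 15
```

In hardware, the serial form keeps the feedback logic in one place, while the parallel form spreads XOR gates along the register.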
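Optimizations for the first group of ciphers: Grain
LFSR feedback, + denotes XOR:
• d = 1: $s_{i+80} = s_{i+62} + s_{i+51} + s_{i+38} + s_{i+23} + s_{i+13} + s_i$
• d = 2: $s_{i+80}$ as for d = 1 and, in the same clock cycle, $s_{i+81} = s_{i+63} + s_{i+52} + s_{i+39} + s_{i+24} + s_{i+14} + s_{i+1}$
[Figure: LFSR $s_i \ldots s_{i+79}$; with d = 1 one new bit ($s_{i+80}$) enters per clock cycle, with d = 2 two new bits ($s_{i+80}$, $s_{i+81}$).]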
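The d = 2 equations generalize: bit $s_{i+80+j}$ depends on $s_{i+62+j}$ at the latest, so for the LFSR alone up to 18 feedback bits can be computed from the current register contents in one clock cycle (the Grain designs evaluated here go up to d = 16). The Python sketch below models only the 80-bit Grain LFSR, not the NFSR or output function, with taps from the Grain specification, $s_{i+80} = s_{i+62} + s_{i+51} + s_{i+38} + s_{i+23} + s_{i+13} + s_i$. It checks that shifting d bits per cycle yields the same bit sequence as d = 1; the function names are hypothetical.

```python
import random

# Grain LFSR feedback taps as offsets from s_i (assumed from the Grain spec):
# s[i+80] = s[i+62] ^ s[i+51] ^ s[i+38] ^ s[i+23] ^ s[i+13] ^ s[i]
TAPS = (62, 51, 38, 23, 13, 0)


def lfsr_step(state, d):
    """Advance the 80-bit LFSR by d positions in one 'clock cycle'.

    state[0] holds s_i and state[79] holds s_{i+79}. All d new bits are
    computed from the current contents, like d copies of the feedback logic.
    """
    # s[i+80+j] needs s[i+62+j], which must already be in the register.
    assert 1 <= d <= 18
    new_bits = []
    for j in range(d):
        fb = 0
        for t in TAPS:
            fb ^= state[t + j]
        new_bits.append(fb)
    return state[d:] + new_bits


def shifted_out_bits(state, d, n_bits):
    """Collect the first n_bits shifted out of the register (n_bits % d == 0)."""
    out = []
    for _ in range(n_bits // d):
        out.extend(state[:d])
        state = lfsr_step(state, d)
    return out


rng = random.Random(2006)
init = [rng.randint(0, 1) for _ in range(80)]
reference = shifted_out_bits(init, 1, 160)
for d in (2, 4, 8, 16):
    assert shifted_out_bits(init, d, 160) == reference
print("d = 2, 4, 8, 16 match d = 1")
```

The hardware counterpart duplicates the six-input XOR d times and shifts the register by d positions per clock cycle, which is why area grows with d.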
Optimizations for the third group of ciphers: Phelix
[Figure: four Phelix architectures built from half-block (HB) function units for key mixing and encryption: Block function, Sharing; Block function, No sharing; Half-Block function, Sharing; Half-Block function, No sharing. The half-block architectures need 2 x the number of clock cycles.]
Ease of design as perceived by students, based on the specification of each cipher
Cipher        Average score*   Number of students who selected the cipher as their first choice
Trivium       3.36             5
Mickey-128    3.32             3
Grain         3.00             4
Phelix        2.00             0
* 5 = very easy, 1 = very difficult
Throughput vs. area (FPGA: Xilinx Spartan 3 family)
[Figure: throughput [Mbit/s] vs. area [CLB slices] for Phelix, Trivium (T16, T32, T64), Grain (G1, G16), Mickey-128 and AES, with the "Best" and "Worst" regions marked.]
Throughput vs. area: Phelix (FPGA: Xilinx Spartan 3 family)
[Figure: throughput [Mbit/s] vs. area [CLB slices] for four Phelix architectures: Block function, No sharing; Block function, Sharing; Half-Block function, Sharing; Half-Block function, No sharing.]
Throughput vs. area: Grain, Mickey-128, Trivium, A5/1 (FPGA: Xilinx Spartan 3 family)
[Figure: throughput [Mbit/s] vs. area [CLB slices]; points labeled by cipher letter and radix d (A = A5/1, G = Grain, M = Mickey-128, T = Trivium): G1, G2, G4, G8, G16, M, T1 to T4, T8, T16, T32, T64.]
Throughput vs. area: throughput up to 3 Gbit/s (FPGA: Xilinx Spartan 3 family)
[Figure: throughput [Mbit/s] vs. area [CLB slices] for designs up to 3 Gbit/s; points labeled by cipher letter and radix d (A = A5/1, G = Grain, M = Mickey-128, T = Trivium): A1, A3, A4, G1, G2, G4, G8, G16, M, T1, T2, T4, T8, T16.]
Optimizations for minimum area (FPGA: Xilinx Spartan 3 family)
[Figure: throughput [Mbit/s] vs. area [CLB slices] for the minimum-area designs: Mickey-128, Trivium-1, Grain-1, A5/1-1 and Phelix (Half-block, No sharing).]
Optimizations for minimum area (FPGA: Xilinx Spartan 3 family)
Cipher (radix)                             Area [CLB slices]   Cipher area / Grain area
Mickey-128 (d=1)                           230                 x 1.89
Trivium (d=1)                              188                 x 1.54
Grain (d=1)                                122                 x 1.00
A5/1 (d=1)                                 57                  x 0.47
Phelix (d=32, Half-Block, No sharing)      1216                x 9.97
Optimizations for maximum throughput to area ratio (FPGA: Xilinx Spartan 3 family)
[Figure: throughput [Mbit/s] vs. area [CLB slices] for the designs with the highest throughput to area ratio: Mickey-128, Trivium-64, Grain-16, A5/1-1 and Phelix (Full block, Sharing).]
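Optimizations for maximum throughput to area ratio (FPGA: Xilinx Spartan 3 family)
[Figure: bar chart of throughput/area [Mbit/s per CLB slice] for Mickey-128 (d=1), Trivium (d=64), Grain (d=16), A5/1 (d=1) and Phelix (d=32, Full block, Sharing); each bar is annotated with the factor Trivium ratio / cipher ratio.]
Trivium (d=64) has the highest ratio, 31.34 Mbit/s per CLB slice (x 1.0). The other four designs reach 6.97, 3.05, 0.96 and 0.83 Mbit/s per CLB slice, i.e. Trivium's ratio is x 4.5, x 10.3, x 32.5 and x 37.7 higher, respectively.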
Setup Time = Key & IV Loading + Initialization Time (FPGA: Xilinx Spartan 3 family)
[Figure: setup time [ns] of Mickey-128, Trivium, Grain, A5/1 and Phelix for different numbers k of key/IV bits loaded per clock cycle (k from 1 to 64).]
Setup Time = Key & IV Loading + Initialization Time (FPGA: Xilinx Spartan 3 family)
[Figure: setup time in clock cycles and in nanoseconds for Mickey-128 (k=1), Trivium (k=1), Grain (k=1), A5/1 (k=1) and Phelix (k=32).]
Setup Time = Key & IV Loading + Initialization Time (FPGA: Xilinx Spartan 3 family)
[Figure: setup time in clock cycles and in nanoseconds for Mickey-128 (k=1), Trivium (k=64), Grain (k=16), A5/1 (k=1) and Phelix (k=32).]
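Setup time is the number of clock cycles spent loading the key and IV and running the initialization, multiplied by the clock period. The Python sketch below uses a deliberately simplified, hypothetical model: key and IV are shifted in back to back through the k-bit key_IV port, and the initialization rounds are then executed k (= d) at a time with no overlap. The initialization round counts (1152 = 4 x 288 for Trivium, 160 for Grain) come from the cipher specifications; the actual designs may schedule loading and initialization differently, so these numbers need not match the charts.

```python
import math


def setup_cycles(key_bits: int, iv_bits: int, init_rounds: int, k: int) -> int:
    """Key & IV loading plus initialization, in clock cycles (simplified model)."""
    load = math.ceil((key_bits + iv_bits) / k)  # k key/IV bits per cycle
    init = math.ceil(init_rounds / k)           # k (= d) rounds per cycle
    return load + init


def setup_time_ns(cycles: int, f_clk_mhz: float) -> float:
    """Convert clock cycles to nanoseconds at a given clock frequency."""
    return cycles * 1000.0 / f_clk_mhz


# Trivium: 80-bit key, 80-bit IV, 1152 initialization rounds
print(setup_cycles(80, 80, 1152, 1))   # 1312
print(setup_cycles(80, 80, 1152, 64))  # 21
# Grain: 80-bit key, 64-bit IV, 160 initialization rounds
print(setup_cycles(80, 64, 160, 1))    # 304
print(setup_cycles(80, 64, 160, 16))   # 19
# Hypothetical 200 MHz clock, for illustration only
print(setup_time_ns(21, 200.0))        # 105.0
```

Under this model, raising k = d shrinks both terms roughly in proportion, which is why the wide Trivium and Grain variants set up in tens of cycles instead of hundreds.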
Results for ASICs: TSMC 90 nm library
• Relative results comparable to those for FPGAs
• Absolute speed higher than for FPGAs by a factor of 3 to 10 (results after logic synthesis, before ASIC layout)
Conclusions
• Very large differences among candidate ciphers (much larger than among the five final candidates of the AES contest)
Possible reasons:
• variety of ciphers based on different design principles
• different internal state, key, and IV sizes
• early stage of the contest
Trivium and Grain outperform other eSTREAM ciphers in terms of:
• flexibility
• minimum area
• maximum throughput to area ratio
Once again, ciphers based on LFSRs and NFSRs show their superiority in hardware implementations.
Security analysis should focus first on the most efficient ciphers.