Good Practices for Designing
Cryptographic Primitives in Hardware
Miroslav Knežević
NXP Semiconductors
January 25, 2016
School on Design for a Secure IoT, Tenerife, 2016
THINGS
Smart Plugs
January 28, 2016
Smart Door Locks
Smart Thermostats
Smart Smoke Detectors
Smart Light Bulbs
Connected Cars
WHY HARDWARE?
…to run software, obviously!
But software alone is…
• Slow
• Insecure
• Energy inefficient
Super Powers of Embedded HW
• Performance
• Power (Energy)
• Area
• Security*
* In the original slide deck Superman was held responsible for Security. During the coffee break, students suggested that Batman would be a better representative and I agree he is. So sorry, Mr. Kent!
Area – Designing the Smallest Block Cipher
MOSFET Channel Length
[Figure: MOSFET with source (S), drain (D), and gate (G) terminals]
NAND gate
• Smallest logic gate with two inputs.
• GE (gate equivalent) = the physical area of a single two-input NAND gate.
• (Ab)used for comparing HW designs across different CMOS technologies.
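Converting a synthesized area into gate equivalents takes one division by the NAND2 cell area of the same library. The sketch below is minimal, and the cell areas in it are hypothetical placeholders, not values from any real library:

```python
def to_gate_equivalents(total_area_um2: float, nand2_area_um2: float) -> float:
    """Express a synthesized area in GE: total cell area divided by the NAND2 area."""
    if nand2_area_um2 <= 0:
        raise ValueError("NAND2 area must be positive")
    return total_area_um2 / nand2_area_um2


# Hypothetical libraries for illustration: (total design area in um^2, NAND2 area in um^2).
designs = {
    "library A": (2500.0, 2.5),
    "library B": (5000.0, 5.0),
}

for lib, (area, nand2) in designs.items():
    print(f"{lib}: {to_gate_equivalents(area, nand2):.0f} GE")
```

Both designs come out at 1000 GE, yet one occupies twice the silicon of the other. GE hides that difference, which is why it is only a rough yardstick across technologies.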
XOR gate
2-3 GE
Modern Lightweight Ciphers
< 1000 GE
AES (128-bit key, encryption only)
2500 GE
Advances in CMOS Technology
[Figure: CMOS technology nodes 140 nm, 40 nm, and 10 nm]
ARM Cortex M0 example (~20 kGE)
[Figure: ARM Cortex M0 in CMOS 140 nm vs CMOS 40 nm; dimension label: 16.25 mm]
Block Cipher – HW Perspective
Memory (key size ≥ 80 bits, block size ≥ 32 bits)
+ Round function
+ Key schedule
+ Control logic
Minimize!
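The minimum key and block sizes matter because their storage alone takes a large share of the area. The rough sketch below assumes that every key and state bit sits in a flip-flop of about 6 GE. That figure is hypothetical, since the per-flip-flop cost varies between cell libraries:

```python
def memory_area_ge(key_bits: int, block_bits: int, ge_per_flip_flop: float) -> float:
    """Area (GE) spent only on storing the key and the state in flip-flops."""
    return (key_bits + block_bits) * ge_per_flip_flop


# Assumption: roughly 6 GE per flip-flop; the real cost depends on the cell library.
minimal = memory_area_ge(key_bits=80, block_bits=32, ge_per_flip_flop=6.0)
aes_like = memory_area_ge(key_bits=128, block_bits=128, ge_per_flip_flop=6.0)
print(f"80-bit key, 32-bit block:   {minimal:.0f} GE of storage")
print(f"128-bit key, 128-bit block: {aes_like:.0f} GE of storage")
```

Under this assumption, storage alone uses about two thirds of a 1000 GE budget even at the minimum sizes. That leaves little room for the round function, key schedule, and control logic.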
KATAN – The Smallest Block Cipher
KATAN32 = 315 GE + 508 bits of ROM
Only 508 bits of expanded key!
KATAN in numbers
It’s (one of) the smallest known cipher(s): < 500 GE
But it’s not very fast: 254 clock cycles per encryption (one round per cycle)
Still scalable: 3 times faster for negligible area overhead
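The cycle count and the effect of scaling can be checked with a few lines of arithmetic. The sketch assumes that the speed-up comes from computing several of KATAN's 254 rounds per clock cycle. The 100 kHz clock is a hypothetical operating point, not a figure from the slides:

```python
import math

KATAN_ROUNDS = 254  # rounds per KATAN encryption


def cycles_per_block(rounds_per_cycle: int) -> int:
    """Clock cycles per encryption when the round logic is replicated rounds_per_cycle times."""
    return math.ceil(KATAN_ROUNDS / rounds_per_cycle)


def throughput_bps(block_bits: int, f_clk_hz: float, rounds_per_cycle: int) -> float:
    """Bits encrypted per second when blocks are processed back to back."""
    return block_bits * f_clk_hz / cycles_per_block(rounds_per_cycle)


# Hypothetical 100 kHz clock, KATAN32 (32-bit block).
for r in (1, 3):
    print(f"{r} round(s)/cycle: {cycles_per_block(r)} cycles, "
          f"{throughput_bps(32, 100e3, r) / 1e3:.1f} kbit/s")
```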
KATAN vs Competition
Cipher    Technology                EDA tool   Area
SPECK     IBM130                    Synopsys   ≥ 580 GE
KLEIN     TSMC180                   Synopsys   ≥ 1.3 kGE
Piccolo   130 nm                    Synopsys   ≥ 700 GE
LED       180 nm                    Synopsys   ≥ 700 GE
KATAN     NXP140                    Cadence    ≥ 460 GE
PRESENT   UMC180, IHP250, AMIS350   Synopsys   ~1 kGE
SIMON     IBM130                    Synopsys   ≥ 520 GE
Memory Elements in different CMOS Technologies
[Chart: area of a scan flip-flop (GE) in the NXP 90 nm, UMC 130 nm, UMC 180 nm, and NanGate 45 nm libraries; the plotted values range from below 5 GE to 7.6 GE]
SPONGENT in different CMOS Technologies
[Chart: area (GE) of five SPONGENT versions (Version 1 to Version 5) synthesized in different CMOS technologies]
up to 70% difference!
How can we do a fair comparison?
Difficult in practice.
But why not at least use an open cell library, such as NanGate's?
http://www.nangate.com/
Performance – Designing the Fastest Block Cipher
Latency vs Throughput
Serial processing
Latency = 15 s
Throughput = 0.067 beer/s
Latency vs Throughput
Parallel processing!
Latency = 15 s
Throughput = 0.2 beer/s
Latency vs Throughput
Pipelining!
Latency = 15 s
Throughput = 0.2 beer/s
Latency vs Throughput
Unrolling!
Latency = 5 s
Throughput = 0.2 beer/s
Bottoms up!
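The four beer scenarios fit a small toy model. The sketch assumes, hypothetically, that serving a beer consists of 3 steps of 5 s each, which reproduces the numbers on the slides. For unrolling it adopts the slide's assumption that all steps finish together within a single 5 s step:

```python
from dataclasses import dataclass


@dataclass
class Design:
    name: str
    latency_s: float   # time until one item (a beer, a ciphertext) is finished
    throughput: float  # finished items per second in steady state


def serial(stages: int, t: float) -> Design:
    # One worker performs all stages one after another.
    return Design("serial", stages * t, 1 / (stages * t))


def parallel(stages: int, t: float, workers: int) -> Design:
    # Independent workers; each item still goes through all stages.
    return Design("parallel", stages * t, workers / (stages * t))


def pipelined(stages: int, t: float) -> Design:
    # One worker per stage; once the pipe is full, an item leaves every t seconds.
    return Design("pipelined", stages * t, 1 / t)


def unrolled(t: float) -> Design:
    # Slide's assumption: all stages complete together within one step of length t.
    return Design("unrolled", t, 1 / t)


for d in (serial(3, 5.0), parallel(3, 5.0, 3), pipelined(3, 5.0), unrolled(5.0)):
    print(f"{d.name:9s} latency = {d.latency_s:4.1f} s, throughput = {d.throughput:.3f} beer/s")
```

Parallelism and pipelining raise throughput but leave latency unchanged. Only unrolling shortens the time until the first result, and latency is the metric that matters for the low-latency ciphers discussed next.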
Latency of Existing Ciphers – Is Lightweight = “Light + Wait”?
CIPHER     BLOCK-SIZE  KEY-SIZE      S-BOX  P-LAYER          K-SCHEDULE
AES        128         128           8      MDS              LIGHT
NOEKEON    128         128           4      BINARY           NO
MINI-AES   64          64            4      MDS              LIGHT
MCRYPTON   64          64, 96, 128   4      BINARY           LIGHT
PRESENT    64          80, 128       4      BIT PERMUTATION  LIGHT
KLEIN      64          64, 80, 96    4      MDS              LIGHT
LED        64          64, 128       4      MDS              NO
(Block, key, and S-box sizes in bits.)
Unrolled HW Architectures
Results – Latency (CMOS 90 nm)
[Chart: latency (ns) of the unrolled 1-cycle and 2-cycle implementations of the ciphers listed above]
Number of Rounds vs Key Size
Results – Area (CMOS 90 nm)
[Chart: area (kGE) of the unrolled 1-cycle and 2-cycle implementations of the ciphers listed above]
Low Latency Encryption
S-box
• Use small S-boxes (e.g., 5-bit, 4-bit, or 3-bit).
• Almost everything follows the normal distribution, and so do the implementation costs of S-boxes: pick one from the low-cost tail (“choose me!”).
Slide credit: Gregor Leander
Low Latency Encryption
Number of Rounds vs Round Complexity
• The round complexity should not be too low.
• Reduce the number of rounds at the cost of a (slightly) heavier round.
Low Latency Encryption
Key Schedule
• The number of rounds should be independent of the key schedule.
• Use constant addition instead of a key schedule (if possible).
Low Latency Encryption
Encryption vs Decryption
• Use involutions where possible: f(f(x)) = x.
• Make the encryption and decryption procedures similar.
• BUT: think application-oriented; sometimes asymmetric constructions are beneficial.
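For small components such as 4-bit S-boxes, the involution property f(f(x)) = x can be checked exhaustively. In this minimal sketch, the first table is a made-up involutive 4-bit permutation. The PRESENT S-box serves as a familiar example that is not an involution, so decrypting with it requires separate inverse-S-box hardware:

```python
def is_involution(f, domain) -> bool:
    """Return True if f(f(x)) == x for every x in the domain."""
    return all(f(f(x)) == x for x in domain)


# Hypothetical involutive 4-bit S-box: swaps 1<->2, 4<->8, 5<->9, 6<->10, 7<->11.
INVOLUTIVE_SBOX = [0x0, 0x2, 0x1, 0x3, 0x8, 0x9, 0xA, 0xB,
                   0x4, 0x5, 0x6, 0x7, 0xC, 0xD, 0xE, 0xF]

# The PRESENT S-box, used here only as a non-involutive example.
PRESENT_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

nibbles = range(16)
print(is_involution(lambda x: INVOLUTIVE_SBOX[x], nibbles))  # True
print(is_involution(lambda x: x ^ 0xA, nibbles))             # True: XOR with a constant
print(is_involution(lambda x: PRESENT_SBOX[x], nibbles))     # False: S(S(0)) = S(0xC) = 4
```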
Low Latency Encryption
Meet PRINCE
𝛼-reflection property:
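For reference, below is the α-reflection property as it is usually stated for PRINCE. PRINCE has a 64-bit block and a 128-bit key k = k_0 ‖ k_1, which is extended to 192 bits as k_0 ‖ k_0' ‖ k_1. Decryption equals encryption under a modified key, so one unrolled circuit can serve both directions. This statement comes from the published PRINCE design, not from the slide image, and should be checked against the specification:

```latex
D_{(k_0 \,\|\, k_0' \,\|\, k_1)}(\cdot) = E_{(k_0' \,\|\, k_0 \,\|\, k_1 \oplus \alpha)}(\cdot),
\qquad \alpha = \mathtt{0xc0ac29b7c97c50dd},
\qquad k_0' = (k_0 \ggg 1) \oplus (k_0 \gg 63)
```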
Low Latency Encryption
Meet PRINCE
[Chart: latency (ns) of the unrolled implementations compared above, with PRINCE added (the new bar reads 8 ns)]
Low Latency Encryption
Meet PRINCE
[Chart: area (kGE) of the unrolled implementations compared above, with PRINCE added (the new bar reads 17 kGE)]
Power/Energy – Future of Lightweight Crypto
History of Lightweight Crypto (Area)
[Chart: area (GE) of lightweight block ciphers by year, 1970 to 2020; labeled points include DES, AES, PRESENT, and KATAN]
History of Lightweight Crypto (Latency)
[Chart: latency (number of clock cycles, log scale) of lightweight block ciphers by year, 1970 to 2020]
History of Lightweight Crypto (Energy*)
[Chart: area × latency, as a proxy for energy (log scale), of lightweight block ciphers by year, 1970 to 2020]
Future of Lightweight Crypto (Energy*)
[Chart: area × latency (log scale) of lightweight block ciphers by year, 1970 to 2020, with PRINCE marked and an arrow labeled “Future of LWC”]
Energy = [Figure]
Power = [Figure]
Passive RFID: Low Power Applications
[Figure: RFID tag with control, memory, crypto, RF, and network interface blocks communicating with a reader]
Anything Battery Powered: Low Energy Applications
Every mW matters!
Total number of mobile devices in 2015 = 9.5 billion*
Average (regular) power consumption of a smartphone = 160 mW**
Total cost of the energy spent = €2.8 billion*** a year!
*** average electricity price in 2014 in EU was €0.208 per kWh.
** An Analysis of Power Consumption in a Smartphone, A Carroll, G Heiser, USENIX 2010.
* Mobile Statistics Report 2015-2019, The Radicati Group Inc.
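The €2.8 billion figure follows from the three footnoted numbers. The sketch makes the simplifying assumption that every device draws the average power around the clock for a whole year:

```python
HOURS_PER_YEAR = 365 * 24  # 8760 h, ignoring leap years

devices = 9.5e9            # mobile devices in 2015 (Radicati Group)
avg_power_w = 0.160        # average smartphone power, 160 mW (Carroll and Heiser)
price_eur_per_kwh = 0.208  # average EU electricity price in 2014

# Simplifying assumption: every device draws the average power all year long.
energy_kwh = devices * avg_power_w * HOURS_PER_YEAR / 1000
cost_eur = energy_kwh * price_eur_per_kwh
print(f"{energy_kwh / 1e9:.1f} billion kWh/year, about EUR {cost_eur / 1e9:.1f} billion/year")
```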
Cisco estimates…
… 50 billion* connected devices by 2020
* http://www.cisco.com/web/solutions/trends/iot/portfolio.html
Intel says…
… 200 billion* smart devices by 2020
* http://www.intel.com/content/www/us/en/internet-of-things/infographics/guide-to-iot.html
Moving Bits vs Moving People & Things
* http://www.tech-pundit.com/wp-content/uploads/2013/07/Cloud_Begins_With_Coal.pdf
World’s ICT Energy Consumption
* http://www.tech-pundit.com/wp-content/uploads/2013/07/Cloud_Begins_With_Coal.pdf
The ICT ecosystem uses about 1500 TWh of electricity annually, approaching 10% of world electricity generation!
What can Crypto do about it?
Become Lightweight Crypto!
Power Consumption in (Crypto) HW
$P_{tot} = P_{switching} + P_{leakage}$
$P_{switching} \approx C_{eff} \cdot V_{DD}^2 \cdot f_{clk} \cdot sw$
$P_{switching} \gg P_{leakage}$
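The quadratic dependence on V_DD makes voltage scaling especially effective. Below is a small sketch of the switching-power formula. The capacitance, voltages, clock, and activity factor are hypothetical values, not measurements:

```python
def switching_power_w(c_eff_f: float, vdd_v: float, f_clk_hz: float, sw: float) -> float:
    """Dynamic power P_switching ~ C_eff * V_DD^2 * f_clk * sw."""
    return c_eff_f * vdd_v ** 2 * f_clk_hz * sw


# Hypothetical operating points, for illustration only.
p_nominal = switching_power_w(c_eff_f=10e-12, vdd_v=1.2, f_clk_hz=100e6, sw=0.2)
p_low_vdd = switching_power_w(c_eff_f=10e-12, vdd_v=0.9, f_clk_hz=100e6, sw=0.2)
print(f"{p_nominal * 1e3:.3f} mW at 1.2 V, {p_low_vdd * 1e3:.3f} mW at 0.9 V")
```

Lowering V_DD from 1.2 V to 0.9 V cuts switching power to (0.9/1.2)² ≈ 56% with nothing else changed.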
Energy Consumption in (Crypto) HW
$E = P \cdot t = P \cdot \frac{N}{f_{clk}}$, where $N$ is the number of clock cycles per operation
$E \approx C_{eff} \cdot V_{DD}^2 \cdot N \cdot sw$
Reducing Power and Energy Consumption
$P \approx C_{eff} \cdot V_{DD}^2 \cdot sw \cdot f_{clk}$
$E \approx C_{eff} \cdot V_{DD}^2 \cdot sw \cdot N$
• Reduce circuit area (e.g., serializing): $C_{eff}$ ↓, but $N$ ↑
• Reduce switching activity (e.g., clock gating): $sw$ ↓
• Move to smaller CMOS technologies: $C_{eff}$ ↓, $V_{DD}$ ↓, but $P_{leakage}$ ↑
• Reduce the operating clock frequency: $f_{clk}$ ↓ (lowers power, not energy per operation)
• Reduce the latency: $N$ ↓
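The two formulas can be compared numerically to show that f_clk cancels out of the energy. The sketch also shows how serializing trades C_eff against N. The three architectures and all parameter values are made up for illustration:

```python
def switching_power_w(c_eff_f, vdd_v, f_clk_hz, sw):
    """P ~ C_eff * V_DD^2 * sw * f_clk."""
    return c_eff_f * vdd_v ** 2 * sw * f_clk_hz


def energy_j(c_eff_f, vdd_v, cycles, sw):
    """E ~ C_eff * V_DD^2 * sw * N, with N the number of clock cycles per operation."""
    return c_eff_f * vdd_v ** 2 * sw * cycles


# E = P * N / f_clk: the clock frequency cancels out.
for f_clk in (1e6, 100e6):
    p = switching_power_w(5e-12, 1.2, f_clk, 0.2)
    print(f"f_clk = {f_clk:.0e} Hz: E = {p * 12 / f_clk * 1e12:.2f} pJ")

# Hypothetical architectures of one cipher: (C_eff in farads, cycles per block).
ARCHITECTURES = {
    "serialized": (2e-12, 200),
    "round-based": (5e-12, 12),
    "unrolled": (40e-12, 1),
}
VDD, SW = 1.2, 0.2  # assumed equal for all three; ignores extra glitching in the unrolled core

for name, (c_eff, n_cycles) in ARCHITECTURES.items():
    print(f"{name:11s} {energy_j(c_eff, VDD, n_cycles, SW) * 1e12:7.2f} pJ per block")
```

With these numbers, the serialized core is the smallest yet spends the most energy per block, because N grows faster than C_eff shrinks. The unrolled core keeps its advantage only if the higher switching activity caused by glitches does not cancel the gain.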
Security – Know Your Enemy
Power (Current) Measurement Setup
[Figure: two measurement setups for a chip containing RAM, CPU, coprocessors, and peripherals: the current i is measured over a resistor R in the VDD line, or picked up with EM probes]
SIMPLE POWER ANALYSIS
SPA – Public Key Crypto
• Modular exponentiation: $x = m^k \bmod N$, with $k = (k_{n-1} \ldots k_0)_2$ and $k_{n-1} = 1$
    x ← m
    for i = n−2 down to 0
        x ← x² mod N
        if (k_i = 1) then x ← x · m mod N
    endfor
    return x
Power consumption depends on the value of the secret bit k_i!
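The algorithm above can be simulated to show what SPA exploits. In this Python sketch, the sequence of squarings (S) and multiplications (M) stands in for the power trace. This is an idealized assumption, since real traces are noisy. Reading the sequence back reveals the exponent:

```python
def square_and_multiply(m: int, k: int, n: int):
    """Left-to-right binary exponentiation m^k mod n, also returning the operation trace.

    The trace ('S' = square, 'M' = multiply) mimics what an SPA attacker sees:
    an 'M' right after an 'S' reveals a key bit equal to 1.
    """
    if k <= 0:
        raise ValueError("exponent must be positive")
    bits = bin(k)[2:]      # most significant bit first; bits[0] == '1'
    x = m % n              # x <- m handles the leading 1 bit
    trace = []
    for bit in bits[1:]:   # i = n-2 down to 0
        x = (x * x) % n    # x <- x^2 mod N
        trace.append("S")
        if bit == "1":     # if k_i = 1 then x <- x*m mod N
            x = (x * m) % n
            trace.append("M")
    return x, "".join(trace)


def key_bits_from_trace(trace: str) -> str:
    """Recover the exponent bits from an S/M trace."""
    bits = ["1"]  # the leading bit is always 1
    for i, op in enumerate(trace):
        if op == "S":
            bits.append("1" if i + 1 < len(trace) and trace[i + 1] == "M" else "0")
    return "".join(bits)


result, trace = square_and_multiply(7, 45, 101)  # hypothetical toy values
print(result, trace, key_bits_from_trace(trace))
```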
SPA – Symmetric Key Crypto (1)
[Figure: chip containing RAM, CPU, coprocessors, and peripherals]
DIFFERENTIAL POWER ANALYSIS
I-V characteristic of CMOS inverter
[Figure: drain current I_D as a function of the input voltage V_i of a CMOS inverter (supply VDD)]
CMOS timing delays (𝒕𝒓, 𝒕𝒇, 𝒕𝑷𝑯𝑳, 𝒕𝑷𝑳𝑯, 𝒕𝑻𝑯𝑳, 𝒕𝑻𝑳𝑯)
[Figure: CMOS inverter with input V_i, output V_o, and supply VDD]
Masking – XOR gate example
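[Figure: unmasked XOR gate (z = x ⊕ y) next to its masked version, which computes z1 from x1, y1 and z2 from x2, y2]
$x \oplus y = (x_1 \oplus x_2) \oplus (y_1 \oplus y_2) = (x_1 \oplus y_1) \oplus (x_2 \oplus y_2) = z_1 \oplus z_2 = z$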
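Because XOR is linear, the masked XOR gate simply works share by share. The minimal sketch below uses two-share Boolean masking on single bits and checks the equation exhaustively over all inputs and mask values:

```python
import secrets
from itertools import product


def share(bit: int) -> tuple[int, int]:
    """Split a bit into two shares whose XOR is the bit; the first share is uniformly random."""
    r = secrets.randbits(1)
    return r, bit ^ r


def masked_xor(x_shares, y_shares):
    """Share-wise XOR: z1 = x1 ^ y1, z2 = x2 ^ y2; each output share uses one share of each input."""
    (x1, x2), (y1, y2) = x_shares, y_shares
    return x1 ^ y1, x2 ^ y2


def check_masked_xor() -> bool:
    """Exhaustively verify z1 ^ z2 == x ^ y for all inputs and all mask values."""
    for x, y, rx, ry in product((0, 1), repeat=4):
        z1, z2 = masked_xor((rx, x ^ rx), (ry, y ^ ry))
        if z1 ^ z2 != x ^ y:
            return False
    return True


print(check_masked_xor())  # True
```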
Masking – AND gate example
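[Figure: unmasked AND gate (z = x · y) next to its masked version: the share products x1·y1, x1·y2, x2·y1, x2·y2 are combined with a fresh random mask z1 into the output shares z1 and z2]
$x \cdot y = (x_1 \oplus x_2) \cdot (y_1 \oplus y_2)$
$\quad = z_1 \oplus z_1 \oplus (x_1 \cdot y_1) \oplus (x_1 \cdot y_2) \oplus (x_2 \cdot y_1) \oplus (x_2 \cdot y_2)$
$\quad = z_1 \oplus z_2 = z$, where $z_2 = z_1 \oplus (x_1 \cdot y_1) \oplus (x_1 \cdot y_2) \oplus (x_2 \cdot y_1) \oplus (x_2 \cdot y_2)$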
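The masked AND from the equation above can be verified exhaustively over all shares and mask values. This sketch checks functional correctness only. As the next slide shows, it says nothing about leakage caused by delays in real gates:

```python
from itertools import product


def masked_and(x1, x2, y1, y2, z1):
    """Two-share AND: z1 is a fresh random mask and
    z2 = z1 ^ x1y1 ^ x1y2 ^ x2y1 ^ x2y2, so z1 ^ z2 = (x1 ^ x2) & (y1 ^ y2).
    Python evaluates the XOR chain left to right, starting from z1, so every
    partial result in this software model is masked by z1."""
    z2 = z1 ^ (x1 & y1) ^ (x1 & y2) ^ (x2 & y1) ^ (x2 & y2)
    return z1, z2


def check_masked_and() -> bool:
    """Verify the output shares recombine to x & y for all 32 input combinations."""
    for x1, x2, y1, y2, z1 in product((0, 1), repeat=5):
        a, b = masked_and(x1, x2, y1, y2, z1)
        if a ^ b != ((x1 ^ x2) & (y1 ^ y2)):
            return False
    return True


print(check_masked_and())  # True
```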
Masking – AND gate example (delays cause leakage)
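[Figure: the masked AND gate with shares x1, x2, y1, y2 and output shares z1, z2]
y1  y2  y  Gates switching
0   0   0  –
0   1   1  1 AND, 1 XOR
1   0   1  1 AND, 2 XOR
1   1   0  2 AND, 2 XOR
y = 0 ⇒ 2 AND and 2 XOR gates switching (summed over both sharings of y)
y = 1 ⇒ 2 AND and 3 XOR gates switching (summed over both sharings of y)
The switching activity depends on the unmasked value y, so delays make the masked gate leak.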
Masking with Sufficient Noise
Slide credit: Marcel Medwed
CORRELATION POWER ANALYSIS
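Correlation Power Analysis
[Figure: plaintext (p) and key (k) enter an 8-bit S-box, which produces the ciphertext (c)]
Hypothesis for key guess $k_i$ and trace $j$: $x_j^i = HW(k_i \oplus p_j)$, $0 \le i \le 255$, $0 \le j \le n-1$, where $n$ is the number of traces
Measured current: $y_j = f(HW(K \oplus p_j))$, where $K$ is the correct key
Pearson correlation:
$$\rho_{xy}^i = \frac{\sum_{j=0}^{n-1} (x_j^i - \bar{x}^i)(y_j - \bar{y})}{\sqrt{\sum_{j=0}^{n-1} (x_j^i - \bar{x}^i)^2 \sum_{j=0}^{n-1} (y_j - \bar{y})^2}}$$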
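The model HW(k_i ⊕ p_j) targets a linear operation, and that is a problem. Wrong key guesses that differ from the correct key in a few bits still correlate strongly, and the bitwise complement of the key gives ρ = −1 exactly. The sketch below demonstrates this. The key byte is hypothetical, and the correlations are computed exactly over all 256 plaintexts rather than from measurements:

```python
def hw(v: int) -> int:
    """Hamming weight of a non-negative integer."""
    return bin(v).count("1")


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5


def model_correlation(k_true: int, k_guess: int) -> float:
    """Correlation between HW(k_true ^ p) and HW(k_guess ^ p) over all 256 plaintexts."""
    ps = range(256)
    return pearson([hw(k_true ^ p) for p in ps], [hw(k_guess ^ p) for p in ps])


K = 0x3A  # hypothetical secret key byte
print(model_correlation(K, K))         # 1.0
print(model_correlation(K, K ^ 0x01))  # 0.75: one flipped bit
print(model_correlation(K, K ^ 0xFF))  # -1.0: the bitwise complement
```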
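Correlation Power Analysis
[Figure: plaintext (p) and key (k) enter an 8-bit S-box, which produces the ciphertext (c)]
Hypothesis for key guess $k_i$ and trace $j$: $x_j^i = HW(SBOX(k_i \oplus p_j))$, $0 \le i \le 255$, $0 \le j \le n-1$, where $n$ is the number of traces
Measured current: $y_j = f(HW(SBOX(K \oplus p_j)))$, where $K$ is the correct key
Pearson correlation:
$$\rho_{xy}^i = \frac{\sum_{j=0}^{n-1} (x_j^i - \bar{x}^i)(y_j - \bar{y})}{\sqrt{\sum_{j=0}^{n-1} (x_j^i - \bar{x}^i)^2 \sum_{j=0}^{n-1} (y_j - \bar{y})^2}}$$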
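The attack on this slide can be simulated end to end. To keep the sketch short, it attacks a 4-bit S-box (the PRESENT S-box) instead of the 8-bit S-box on the slide. It models the measured current as the Hamming weight of the S-box output plus Gaussian noise, which is a simplified choice of f, and it ranks key guesses by |ρ|. The secret nibble, noise level, trace count, and seed are arbitrary:

```python
import random

PRESENT_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hw(v: int) -> int:
    return bin(v).count("1")


def pearson(xs, ys) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5


def simulate_traces(key: int, n: int, noise_sigma: float, rng: random.Random):
    """Leakage y_j = HW(SBOX(K ^ p_j)) + Gaussian noise (a simplified model of f)."""
    plaintexts = [rng.randrange(16) for _ in range(n)]
    leakage = [hw(PRESENT_SBOX[key ^ p]) + rng.gauss(0.0, noise_sigma) for p in plaintexts]
    return plaintexts, leakage


def cpa(plaintexts, leakage):
    """Return the best key guess and |rho| for every guess."""
    scores = []
    for guess in range(16):
        hypothesis = [hw(PRESENT_SBOX[guess ^ p]) for p in plaintexts]
        scores.append(abs(pearson(hypothesis, leakage)))
    return max(range(16), key=lambda g: scores[g]), scores


rng = random.Random(2016)
secret = 0x9  # hypothetical secret key nibble
pts, ys = simulate_traces(secret, n=2000, noise_sigma=1.0, rng=rng)
best, scores = cpa(pts, ys)
print(f"recovered key nibble: {best:#x}, |rho| = {scores[best]:.2f}")
```

The S-box is non-linear, so the complement-key ambiguity of the previous model disappears and the correct guess stands out. With more noise, more traces are needed.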
COST OF COUNTERMEASURES
Security Comes at a Price – Area Overhead
Insecure: X GE
SCA-secure (side-channel analysis): 5X GE
SCA&FA-secure (side-channel and fault analysis): 10X GE
Security Comes at a Price – Performance Penalty
Insecure: N s
SCA-secure: ~3–5N s
SCA&FA-secure: ~8–10N s
Challenges – SCA Countermeasures
INCOMPLETE MODELS
• Circuit models
• Adversary models
• 1st- vs higher-order DPA
Challenges – FA Countermeasures
LACK OF CREATIVITY
• Redundant executions
• Dummy operations
• Light sensors
THANK YOU!
Thanks to the teams of
KATAN, SPONGENT, PRINCE, FIDES
Workshop on Crypto Design for IoT
https://www.cosic.esat.kuleuven.be/ecrypt_net_iot_workshop_2016/