Good Practices for Designing Cryptographic Primitives in Hardware Miroslav Knežević NXP Semiconductors [email protected] January 25, 2016 School on Design for a Secure IoT, Tenerife, 2016

Good Practices for Designing

Cryptographic Primitives in Hardware

Miroslav Knežević

NXP Semiconductors

[email protected]

January 25, 2016

School on Design for a Secure IoT, Tenerife, 2016


THINGS


Smart Plugs

January 28, 2016


Smart Door Locks


Smart Thermostats


Smart Smoke Detectors


Smart Light Bulbs


Connected Cars


WHY HARDWARE?


…to run software, obviously!


But software alone is…

• Slow
• Insecure
• Energy inefficient

Super Powers of Embedded HW

• Performance
• Power (Energy)
• Area
• Security*

* In the original slide deck Superman was held responsible for Security. During the coffee break, students suggested that Batman would be a better representative and I agree he is. So sorry, Mr. Kent!

Area – Designing the Smallest Block Cipher

MOSFET Channel Length

[Figure: CMOS gate schematic (input A, supply VDD) next to a MOSFET with source (S), drain (D) and gate (G) terminals.]

NAND gate

• Smallest logic gate with two inputs.
• GE (gate equivalent) = physical area of a single NAND gate.
• (Ab)used for comparing HW designs across different CMOS technologies.
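The GE metric above can be sketched as a simple normalization. This is a minimal illustration only: the function name and all area figures are hypothetical and are not taken from any real standard-cell library.

```python
def gate_equivalents(design_area_um2: float, nand2_area_um2: float) -> float:
    """Normalize a design's area by the area of one 2-input NAND gate."""
    if nand2_area_um2 <= 0:
        raise ValueError("NAND2 area must be positive")
    return design_area_um2 / nand2_area_um2


# Hypothetical numbers: the same design synthesized in two technologies.
NAND2_AREA_OLD_NODE = 4.0   # um^2, assumed value for an older node
NAND2_AREA_NEW_NODE = 0.5   # um^2, assumed value for a newer node

ge_old = gate_equivalents(3200.0, NAND2_AREA_OLD_NODE)
ge_new = gate_equivalents(450.0, NAND2_AREA_NEW_NODE)

print(f"old node: {ge_old:.0f} GE, new node: {ge_new:.0f} GE")
```

With these assumed figures the same design comes out at a different GE count in each technology, which is why GE comparisons across libraries should be read with care.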


XOR gate


2-3 GE


Modern Lightweight Ciphers


< 1000 GE


AES (128-bit key, ENC only)


2500 GE


Advances in CMOS Technology

140 nm 40 nm 10 nm


ARM Cortex M0 example (~20 kGE)

[Figure: size comparison of the core in CMOS 40 nm and CMOS 140 nm; the only dimension label recovered is 16.25 mm.]

Block Cipher – HW Perspective


Memory:
• Key size ≥ 80 bits
• Block size ≥ 32 bits

Round function + Key schedule + Control logic: Minimize!

KATAN – The Smallest Block Cipher


KATAN32 = 315 GE + 508 bits of ROM

Only 508 bits of expanded key!


KATAN in numbers


It’s (one of) the smallest known cipher(s): < 500 GE

But it’s not very fast: 254 clock cycles

Still scalable: 3 times faster for negligible area overhead
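To put the cycle count above into perspective, here is a rough throughput sketch. The 32-bit block and the 254 cycles come from the slides (KATAN32); the 100 kHz clock, the function names, and the model that "3 times faster" means processing three rounds per clock cycle are assumptions made for illustration.

```python
import math


def cycles_per_block(rounds: int, rounds_per_cycle: int) -> int:
    """Clock cycles per block when several rounds are computed per cycle."""
    return math.ceil(rounds / rounds_per_cycle)


def throughput_bps(block_bits: int, cycles: int, f_clk_hz: float) -> float:
    """Bits encrypted per second for a non-pipelined, one-block-at-a-time core."""
    return block_bits * f_clk_hz / cycles


BLOCK_BITS = 32      # KATAN32 block size
ROUNDS = 254         # one round per clock cycle in the smallest version
F_CLK_HZ = 100_000   # assumed clock frequency

for rounds_per_cycle in (1, 3):
    cycles = cycles_per_block(ROUNDS, rounds_per_cycle)
    rate = throughput_bps(BLOCK_BITS, cycles, F_CLK_HZ)
    print(f"{rounds_per_cycle} round(s)/cycle: {cycles} cycles, {rate / 1000:.1f} kbit/s")
```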


KATAN vs Competition

CIPHER    TECHNOLOGY                 TOOL       AREA
SPECK     IBM130                     Synopsys   ≥ 580 GE
KLEIN     TSMC180                    Synopsys   ≥ 1.3 kGE
Piccolo   130nm                      Synopsys   ≥ 700 GE
LED       180nm                      Synopsys   ≥ 700 GE
KATAN     NXP140                     Cadence    ≥ 460 GE
PRESENT   UMC180, IHP250, AMIS350    Synopsys   ~1 kGE
SIMON     IBM130                     Synopsys   ≥ 520 GE

Memory Elements in different CMOS Technologies

[Bar chart: area of a scan flip-flop in GE across standard-cell libraries (NXP 90 nm, UMC 130 nm, UMC 180 nm, NanGate 45 nm); the plotted values range from below 5 GE to 7.6 GE.]

SPONGENT in different CMOS Technologies

[Bar chart: area in GE of five SPONGENT versions (Version 1 to Version 5), each synthesized in several CMOS technologies.]

Up to 70% difference!

How can we do a fair comparison?


Difficult in practice.

But why not at least use an open cell library?

http://www.nangate.com/


Performance – Designing the Fastest Block Cipher


Latency vs Throughput

Serial processing

Latency = 15 s

Throughput = 0.067 beer/s

Parallel processing!

Latency = 15 s

Throughput = 0.2 beer/s

Pipelining!

Latency = 15 s

Throughput = 0.2 beer/s

Unrolling!

Latency = 5 s

Throughput = 0.2 beer/s

Bottom-up!
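The four beer-pouring scenarios can be reproduced with a small model. This is a sketch under stated assumptions: the factor of 3 (three parallel units, three pipeline stages, a threefold speed-up from unrolling) is inferred from the slide numbers, pipeline stages are assumed perfectly balanced, and register overhead is ignored. The function names are hypothetical.

```python
def serial(t_op: float) -> tuple[float, float]:
    """One unit, one item at a time: (latency, throughput)."""
    return t_op, 1.0 / t_op


def parallel(t_op: float, units: int) -> tuple[float, float]:
    """Several identical units: latency unchanged, throughput multiplied."""
    return t_op, units / t_op


def pipelined(t_op: float, stages: int) -> tuple[float, float]:
    """Balanced pipeline: same latency, one result per stage time."""
    return t_op, stages / t_op


def unrolled(t_op: float, speedup: int) -> tuple[float, float]:
    """One faster unit: latency divided, throughput multiplied."""
    return t_op / speedup, speedup / t_op


T_OP = 15.0  # seconds per beer, from the slides
FACTOR = 3   # assumed parallelism, pipeline depth and unrolling gain

results = {
    "serial": serial(T_OP),
    "parallel": parallel(T_OP, FACTOR),
    "pipelined": pipelined(T_OP, FACTOR),
    "unrolled": unrolled(T_OP, FACTOR),
}
for name, (latency, throughput) in results.items():
    print(f"{name:10s} latency = {latency:4.1f} s, throughput = {throughput:.3f} beer/s")
```

Only unrolling lowers the latency; parallel processing and pipelining raise throughput while each individual result still takes 15 s.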


Latency of Existing Ciphers – Is Lightweight = “Light + Wait”?

CIPHER     BLOCK-SIZE   KEY-SIZE      S-BOX   P-LAYER           K-SCHEDULE
AES        128          128           8       MDS               LIGHT
NOEKEON    128          128           4       BINARY            NO
MINI-AES   64           64            4       MDS               LIGHT
MCRYPTON   64           64, 96, 128   4       BINARY            LIGHT
PRESENT    64           80, 128       4       BIT PERMUTATION   LIGHT
KLEIN      64           64, 80, 96    4       MDS               LIGHT
LED        64           64, 128       4       MDS               NO

Unrolled HW Architectures


Results – Latency (CMOS 90 nm)

[Bar chart: latency in ns of the 1-cycle and 2-cycle unrolled implementations of the compared ciphers.]

Number of Rounds vs Key Size


Results – Area (CMOS 90 nm)

[Bar chart: area in kGE of the 1-cycle and 2-cycle unrolled implementations of the compared ciphers.]

Low Latency Encryption

S-box

• Use small S-boxes (e.g. 5-bit, 4-bit, 3-bit)
• Almost everything follows the normal distribution. So does the S-box!

[Figure: normal distribution curve with one S-box labelled "choose me!"]

Slide credit: Gregor Leander

Low Latency Encryption

Number of Rounds vs Round Complexity

• The round complexity should not be too low.
• Reduce the number of rounds at the cost of a (slightly) heavier round.

Low Latency Encryption

Key Schedule

• Number of rounds should be independent of the key schedule.
• Use constant addition instead of a key schedule (if possible).

Low Latency Encryption

Encryption vs Decryption

• Use involution where possible: $f(f(x)) = x$.
• Make Encryption and Decryption procedures similar.
• BUT: think application oriented – sometimes it is beneficial to have asymmetric constructions.

Low Latency Encryption

Meet PRINCE

𝛼-reflection property: $D_{(k_0 \| k_0' \| k_1)}(\cdot) = E_{(k_0' \| k_0 \| k_1 \oplus \alpha)}(\cdot)$, i.e. decryption is encryption under a related key.

[Bar chart: latency in ns of the previously compared unrolled ciphers, with one added bar at 8 ns.]

[Bar chart: area in kGE of the previously compared unrolled ciphers, with one added bar at 17 kGE.]

Power/Energy – Future of Lightweight Crypto


History of Lightweight Crypto (Area)

[Plot: area in GE (linear axis, 0 to 3500) versus year (1970 to 2020) for lightweight block ciphers; labelled points: DES, AES, PRESENT, KATAN.]

History of Lightweight Crypto (Latency)

[Plot: latency in clock cycles (log axis, 1 to 10000) versus year (1970 to 2020) for lightweight block ciphers.]

History of Lightweight Crypto (Energy*)

[Plot: area × latency (log axis, 1 to 100000000) versus year (1970 to 2020) for lightweight block ciphers.]

Future of Lightweight Crypto (Energy*)

[Plot: area × latency (log axis, 1 to 100000000) versus year (1970 to 2020) for lightweight block ciphers, with PRINCE labelled and an annotation reading "Future of LWC".]

Energy = [figure]

Power = [figure]

Passive RFID: Low Power Applications

[Block diagram: an RFID tag (control, memory, crypto, RF) communicating with a reader (network interface, RF).]

Anything Battery Powered: Low Energy Applications


Every mW matters!


Total number of mobile devices in 2015 = 9.5 billion*

Average (regular) power consumption of a smartphone = 160 mW**

Total energy cost ≈ €2.8 billion*** a year!

* Mobile Statistics Report 2015-2019, The Radicati Group Inc.

** An Analysis of Power Consumption in a Smartphone, A. Carroll, G. Heiser, USENIX 2010.

*** The average electricity price in the EU in 2014 was €0.208 per kWh.
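The yearly cost figure can be checked with a few lines of arithmetic. This sketch assumes every device draws the average power continuously for a full 365-day year; that assumption is not stated on the slide but reproduces its number. The variable names are hypothetical.

```python
DEVICES = 9.5e9             # mobile devices in 2015 (from the slide)
AVG_POWER_W = 0.160         # average smartphone power in watts (from the slide)
PRICE_EUR_PER_KWH = 0.208   # EU average electricity price in 2014 (from the slide)
HOURS_PER_YEAR = 365 * 24   # assumed: devices are powered all year round

total_power_kw = DEVICES * AVG_POWER_W / 1000.0
energy_kwh_per_year = total_power_kw * HOURS_PER_YEAR
cost_eur_per_year = energy_kwh_per_year * PRICE_EUR_PER_KWH

print(f"energy: {energy_kwh_per_year / 1e9:.1f} TWh per year")
print(f"cost:   EUR {cost_eur_per_year / 1e9:.2f} billion per year")
```

Under these assumptions the fleet consumes about 13.3 TWh a year, which at the quoted price comes to roughly €2.8 billion.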


Cisco estimates…


… 50 billion* connected devices by 2020

* http://www.cisco.com/web/solutions/trends/iot/portfolio.html


Intel says…


… 200 billion* smart devices by 2020

* http://www.intel.com/content/www/us/en/internet-of-things/infographics/guide-to-iot.html


Moving Bits vs Moving People & Things


* http://www.tech-pundit.com/wp-content/uploads/2013/07/Cloud_Begins_With_Coal.pdf


World’s ICT Energy Consumption

The ICT ecosystem uses about 1500 TWh of electricity annually and approaches 10% of world electricity generation!*

* http://www.tech-pundit.com/wp-content/uploads/2013/07/Cloud_Begins_With_Coal.pdf

What can Crypto do about it?


Become Lightweight Crypto!


Power Consumption in (Crypto) HW

$P_{tot} = P_{switching} + P_{leakage}$

$P_{switching} \approx C_{eff} \cdot V_{DD}^2 \cdot f_{clk} \cdot sw$

$P_{switching} \gg P_{leakage}$

Energy Consumption in (Crypto) HW

$E = P \cdot t = P \cdot \frac{N}{f_{clk}}$

$E \approx C_{eff} \cdot V_{DD}^2 \cdot N \cdot sw$
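The step from the first energy expression to the second can be written out explicitly. The derivation assumes, as on the power slide, that switching power dominates leakage, so $P \approx P_{switching}$; $N$ is the number of clock cycles per operation.

```latex
\begin{align*}
E &= P \cdot t = P \cdot \frac{N}{f_{clk}} \\
  &\approx \left( C_{eff} \cdot V_{DD}^2 \cdot f_{clk} \cdot sw \right) \cdot \frac{N}{f_{clk}} \\
  &= C_{eff} \cdot V_{DD}^2 \cdot N \cdot sw
\end{align*}
```

The clock frequency cancels, so under this approximation slowing the clock lowers power but leaves the energy per operation unchanged.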


Reducing Power and Energy Consumption

$P \approx C_{eff} \cdot V_{DD}^2 \cdot sw \cdot f_{clk}$

$E \approx C_{eff} \cdot V_{DD}^2 \cdot sw \cdot N$

• Reduce circuit area (e.g. serializing): $C_{eff} \downarrow$, but $N \uparrow$
• Reduce switching activity (e.g. clock gating): $sw \downarrow$
• Move to smaller CMOS technologies: $C_{eff} \downarrow$, $V_{DD} \downarrow$, but $P_{leakage} \uparrow$
• Reduce the operating clock frequency: $f_{clk} \downarrow$
• Reduce the latency: $N \downarrow$
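The effect of each knob can be explored numerically. The sketch below uses the two approximations from the slide; all parameter values are hypothetical, and the serialization case assumes an idealized trade where a 4× smaller effective capacitance costs exactly 4× more cycles, which real designs will not match exactly.

```python
import math


def switching_power(c_eff: float, v_dd: float, sw: float, f_clk: float) -> float:
    """P ~ C_eff * V_DD^2 * sw * f_clk (leakage neglected)."""
    return c_eff * v_dd ** 2 * sw * f_clk


def energy_per_operation(c_eff: float, v_dd: float, sw: float, n_cycles: int) -> float:
    """E ~ C_eff * V_DD^2 * sw * N (leakage neglected)."""
    return c_eff * v_dd ** 2 * sw * n_cycles


# Hypothetical baseline design.
C_EFF, V_DD, SW, F_CLK, N_CYCLES = 2e-12, 1.2, 0.1, 1e6, 32

p_base = switching_power(C_EFF, V_DD, SW, F_CLK)
e_base = energy_per_operation(C_EFF, V_DD, SW, N_CYCLES)

# Halving the clock: power halves, energy per operation is unchanged.
p_slow = switching_power(C_EFF, V_DD, SW, F_CLK / 2)
e_slow = p_slow * N_CYCLES / (F_CLK / 2)

# Idealized serialization: 4x less capacitance, 4x more cycles.
p_serial = switching_power(C_EFF / 4, V_DD, SW, F_CLK)
e_serial = energy_per_operation(C_EFF / 4, V_DD, SW, 4 * N_CYCLES)

# Halving the switching activity (e.g. by clock gating) halves both.
e_gated = energy_per_operation(C_EFF, V_DD, SW / 2, N_CYCLES)

print(f"baseline:   P = {p_base:.3e} W, E = {e_base:.3e} J")
print(f"slow clock: P = {p_slow:.3e} W, E = {e_slow:.3e} J")
print(f"serialized: P = {p_serial:.3e} W, E = {e_serial:.3e} J")
print(f"gated:      E = {e_gated:.3e} J")
```

In this toy model, serialization and frequency scaling help power-constrained devices, while only lower switching activity, lower voltage, or fewer cycles reduce energy.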


Security – Know Your Enemy


Power (Current) Measurement Setup

[Figure: two measurement setups for a chip containing RAM, CPU, coprocessors and peripherals: the supply current $i$ measured across a resistor R in the VDD line, and EM probes.]

SIMPLE POWER ANALYSIS

SPA – Public Key Crypto

• Modular exponentiation: $x = m^k \bmod N$

    x ← m
    for i = n−2 down to 0
        x ← x²
        if (k_i = 1) then x ← x · m
    endfor
    return x

Power consumption depends on the value of the secret bit $k_i$!
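The leak in this loop can be made visible with a short simulation. This is a sketch, not a model of real power traces: it assumes an attacker can tell squarings ("S") from multiplications ("M") in a single trace, and the function names are hypothetical.

```python
def square_and_multiply(m: int, k: int, n: int) -> tuple[int, str]:
    """Left-to-right square-and-multiply; returns (m^k mod n, operation trace)."""
    if k < 1:
        raise ValueError("exponent must be at least 1")
    bits = bin(k)[2:]      # most significant bit first, always '1'
    x = m % n
    trace = []
    for bit in bits[1:]:
        x = (x * x) % n    # squaring happens for every key bit
        trace.append("S")
        if bit == "1":
            x = (x * m) % n  # multiplication only when the key bit is 1
            trace.append("M")
    return x, "".join(trace)


def recover_exponent(trace: str) -> int:
    """Rebuild the exponent from the S/M pattern alone."""
    bits = "1"
    i = 0
    while i < len(trace):
        if i + 1 < len(trace) and trace[i + 1] == "M":
            bits += "1"
            i += 2
        else:
            bits += "0"
            i += 1
    return int(bits, 2)


SECRET_K = 0b1011
result, trace = square_and_multiply(7, SECRET_K, 1_000_003)
print(trace, bin(recover_exponent(trace)))
```

Because the operation sequence depends directly on the key bits, reading the pattern off one trace reveals the whole exponent.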


SPA – Symmetric Key Crypto (1)

[Figure: chip block diagram with RAM, CPU, coprocessors and peripherals.]

DIFFERENTIAL POWER ANALYSIS

I-V characteristic of CMOS inverter

[Figure: CMOS inverter with input $V_i$ and supply $V_{DD}$, showing the drain current $I_D$.]

CMOS timing delays ($t_r$, $t_f$, $t_{PHL}$, $t_{PLH}$, $t_{THL}$, $t_{TLH}$)

[Figure: CMOS inverter with input $V_i$, output $V_o$ and supply $V_{DD}$.]

Masking – XOR gate example

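[Figure: an XOR gate computing $z$ from $x$ and $y$, and its masked version with two XOR gates computing the shares $z_1$ (from $x_1$, $y_1$) and $z_2$ (from $x_2$, $y_2$).]

$x \oplus y = (x_1 \oplus x_2) \oplus (y_1 \oplus y_2) = (x_1 \oplus y_1) \oplus (x_2 \oplus y_2) = z_1 \oplus z_2 = z$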
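The identity above can be checked exhaustively in a few lines. This is a functional sketch only: it confirms that the shares recombine to the right value and says nothing about side-channel behaviour. The helper names are hypothetical.

```python
from itertools import product


def share(bit: int, mask: int) -> tuple[int, int]:
    """Split a bit into two Boolean shares whose XOR equals the bit."""
    return mask, bit ^ mask


def masked_xor(xs: tuple[int, int], ys: tuple[int, int]) -> tuple[int, int]:
    """XOR share-wise; shares of different index are never combined."""
    return xs[0] ^ ys[0], xs[1] ^ ys[1]


def unmask(zs: tuple[int, int]) -> int:
    return zs[0] ^ zs[1]


# Try every input pair and every choice of masks.
xor_is_correct = all(
    unmask(masked_xor(share(x, mx), share(y, my))) == x ^ y
    for x, y, mx, my in product((0, 1), repeat=4)
)
print("masked XOR correct for all inputs and masks:", xor_is_correct)
```

XOR is linear, so each output share depends on only one share of each input and no fresh randomness is needed.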


Masking – AND gate example

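[Figure: an AND gate computing $z$ from $x$ and $y$, and its masked version: four AND gates compute the products of the input shares, which are XORed together with the share $z_1$ to produce $z_2$.]

$x \cdot y = (x_1 \oplus x_2) \cdot (y_1 \oplus y_2)$

$= z_1 \oplus z_1 \oplus (x_1 \cdot y_1) \oplus (x_1 \cdot y_2) \oplus (x_2 \cdot y_1) \oplus (x_2 \cdot y_2)$

$= z_1 \oplus z_2 = z$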
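The masked AND can be checked the same way. The sketch below assumes $z_1$ is a fresh random bit and that $z_2$ absorbs the four share products, as the equation suggests; it verifies functional correctness only, and the function names are hypothetical.

```python
from itertools import product


def masked_and(xs: tuple[int, int], ys: tuple[int, int], z1: int) -> tuple[int, int]:
    """Masked AND with fresh random bit z1; returns the output shares (z1, z2)."""
    x1, x2 = xs
    y1, y2 = ys
    z2 = z1 ^ (x1 & y1) ^ (x1 & y2) ^ (x2 & y1) ^ (x2 & y2)
    return z1, z2


def check_masked_and() -> bool:
    """Exhaustively compare the recombined output with the plain AND."""
    for x, y, mx, my, z1 in product((0, 1), repeat=5):
        xs = (mx, x ^ mx)   # shares of x
        ys = (my, y ^ my)   # shares of y
        out1, out2 = masked_and(xs, ys, z1)
        if out1 ^ out2 != (x & y):
            return False
    return True


and_is_correct = check_masked_and()
print("masked AND correct for all inputs, masks and z1:", and_is_correct)
```

Unlike XOR, this gate mixes shares of different index, so correctness in software does not imply that a hardware implementation is free of leakage.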


Masking – AND gate example (delays cause leakage)

[Figure: masked AND gate with input shares $x_1$, $x_2$, $y_1$, $y_2$, the share $z_1$ and the output share $z_2$.]

y1   y2   y   switching
0    0    0   -
0    1    1   1 AND, 1 XOR
1    0    1   1 AND, 2 XOR
1    1    0   2 AND, 2 XOR

y = 0 => 2 AND, 2 XOR gates switching on average

y = 1 => 2 AND, 3 XOR gates switching on average


Masking with Sufficient Noise

Slide credit: Marcel Medwed

CORRELATION POWER ANALYSIS

Correlation Power Analysis

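[Figure: 8-bit S-box block with plaintext ($p$), key ($k$) and ciphertext ($c$).]

$x_j^i = HW(k_i \oplus p_j), \quad 0 \le i \le 255, \quad 0 \le j < n$ ($n$ = # of traces)

$y_j = f(HW(K \oplus p_j))$ (Measured current!)

Pearson correlation:

$\rho_{xy}^i = \frac{\sum_{j=0}^{n-1} (x_j^i - \bar{x}^i)(y_j - \bar{y})}{\sqrt{\sum_{j=0}^{n-1} (x_j^i - \bar{x}^i)^2} \, \sqrt{\sum_{j=0}^{n-1} (y_j - \bar{y})^2}}$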


Correlation Power Analysis

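[Figure: 8-bit S-box block with plaintext ($p$), key ($k$) and ciphertext ($c$).]

$x_j^i = HW(SBOX(k_i \oplus p_j)), \quad 0 \le i \le 255, \quad 0 \le j < n$ ($n$ = # of traces)

$y_j = f(HW(SBOX(K \oplus p_j)))$ (Measured current!)

Pearson correlation:

$\rho_{xy}^i = \frac{\sum_{j=0}^{n-1} (x_j^i - \bar{x}^i)(y_j - \bar{y})}{\sqrt{\sum_{j=0}^{n-1} (x_j^i - \bar{x}^i)^2} \, \sqrt{\sum_{j=0}^{n-1} (y_j - \bar{y})^2}}$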
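The attack can be simulated end to end on synthetic data. This sketch departs from the slide in labelled ways: it uses the 4-bit PRESENT S-box instead of an 8-bit one to keep the key space small, and the "measured current" is simulated as the Hamming weight of the S-box output plus Gaussian noise. The secret key, noise level, trace count and function names are all hypothetical.

```python
import math
import random

# 4-bit PRESENT S-box (stand-in for the 8-bit S-box on the slide).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hamming_weight(value: int) -> int:
    return bin(value).count("1")


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mean_x) ** 2 for x in xs)) * \
        math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return num / den if den else 0.0


def simulate_traces(secret_key, n_traces, noise_sigma, rng):
    """Synthetic leakage: HW(SBOX[K ^ p]) plus Gaussian noise."""
    plaintexts = [rng.randrange(16) for _ in range(n_traces)]
    traces = [hamming_weight(SBOX[secret_key ^ p]) + rng.gauss(0.0, noise_sigma)
              for p in plaintexts]
    return plaintexts, traces


def cpa_attack(plaintexts, traces):
    """Correlate the HW hypothesis of every key guess with the traces."""
    correlations = []
    for guess in range(16):
        hypothesis = [hamming_weight(SBOX[guess ^ p]) for p in plaintexts]
        correlations.append(pearson(hypothesis, traces))
    best = max(range(16), key=lambda g: correlations[g])
    return best, correlations


rng = random.Random(2016)   # fixed seed so the run is reproducible
SECRET_KEY = 0x9            # hypothetical 4-bit key
plaintexts, traces = simulate_traces(SECRET_KEY, 500, 0.5, rng)
best_guess, correlations = cpa_attack(plaintexts, traces)
print(f"best guess: {best_guess:#x}, correlation: {correlations[best_guess]:.2f}")
```

The guess whose hypothesis correlates best with the traces is taken as the key; with real measurements the leakage function $f$ is unknown and far more traces are usually needed.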


COST OF COUNTERMEASURES

Security Comes at a Price – Area Overhead

• Insecure: X GE
• SCA-secure: 5X GE
• SCA&FA-secure: 10X GE

Security Comes at a Price – Performance Penalty

• Insecure: N s
• SCA-secure: ~3-5N s
• SCA&FA-secure: ~8-10N s

Challenges – SCA Countermeasures

INCOMPLETE MODELS

• Circuit models
• Adversary models
• 1st vs higher order DPA

Challenges – FA Countermeasures

LACK OF CREATIVITY

• Redundant executions
• Dummy operations
• Light sensors

THANK YOU!

Thanks to the teams of KATAN, SPONGENT, PRINCE, FIDES

Workshop on Crypto Design for IoT


https://www.cosic.esat.kuleuven.be/ecrypt_net_iot_workshop_2016/

