Experimental Testing of the Gigabit IPSec-Compliant Implementations of Rijndael and Triple DES Using SLAAC-1V FPGA Accelerator Board

Paweł Chodowiec & Kris Gaj, George Mason University
Peter Bellows & Brian Schott, USC - Information Sciences Institute

http://ece.gmu.edu/crypto-text.htm

IPSec: Transport Mode

[Diagram: Host - Gateway - Internet - Gateway - Host, with the cryptographic end points marked]

IPSec: Tunnel Mode

[Diagram: Hosts - Security gateway - Internet - Security gateway - Hosts, with the cryptographic end points marked]

IPSec: Need for hardware accelerators

• large number of security associations processed by a single device

• cryptographic operations computationally expensive compared with regular IP operations

Cryptographic transformations in IPSec

Security services: confidentiality, authentication, key exchange

This project: confidentiality

IPSec: Cryptographic algorithms
Confidentiality (1)

Required (Document: RFC 2405):

Algorithm     Key length
DES           56 bits

Optional (Document: RFC 2451):

Algorithm     Key length [bits]   Popular sizes      Default size
Triple DES    168                 168                168
Blowfish      40..448             128                128
CAST-128      40..128             40, 64, 80, 128    128
IDEA          128                 128                128
RC5           40..2040            40, 64, 80, 128    128

Breaking DES: Deep Crack
Electronic Frontier Foundation, 1998

Total cost: $220,000
Average time of search: 4.5 days/key
1800 ASIC chips, 40 MHz clock

Triple DES
Diffie, Hellman, 1977

EDE mode:
plaintext -> DES encryption (K1, 56 bits) -> DES^-1 decryption (K2, 56 bits) -> DES encryption (K3, 56 bits) -> ciphertext

K = (K1, K2, K3): 168 bits of the key
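
The encrypt-decrypt-encrypt chaining on this slide can be sketched in a few lines. Python's standard library has no DES, so `toy_encrypt` and `toy_decrypt` below are hypothetical stand-ins: an invertible 64-bit toy function that is neither DES nor secure. Only the EDE ordering with K1, K2, K3 is taken from the slide.

```python
MASK = (1 << 64) - 1  # 64-bit block, as in DES


def rotl(x, r):
    return ((x << r) | (x >> (64 - r))) & MASK


def rotr(x, r):
    return ((x >> r) | (x << (64 - r))) & MASK


def toy_encrypt(block, key):
    # Hypothetical stand-in for DES encryption (NOT DES, NOT secure).
    return rotl((block + key) & MASK, 7) ^ key


def toy_decrypt(block, key):
    # Exact inverse of toy_encrypt, standing in for DES^-1.
    return (rotr(block ^ key, 7) - key) & MASK


def ede_encrypt(block, k1, k2, k3):
    # Encrypt with K1, decrypt with K2, encrypt with K3.
    return toy_encrypt(toy_decrypt(toy_encrypt(block, k1), k2), k3)


def ede_decrypt(block, k1, k2, k3):
    # Undo the three stages in reverse order.
    return toy_decrypt(toy_encrypt(toy_decrypt(block, k3), k2), k1)
```

With K1 = K2 the first two stages cancel, so the construction degenerates to a single encryption under K3, which is why the middle stage is a decryption.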

AES Contest Effort

June 1998: 15 candidates from USA, Canada, Belgium, France, Germany, Norway, UK, Israel, Korea, Japan, Australia, Costa Rica

Round 1: security, software efficiency

August 1999: 5 final candidates: Mars, RC6, Rijndael, Serpent, Twofish

Round 2: security, hardware efficiency

October 2000: 1 winner: Rijndael (Belgium)

IPSec: Cryptographic algorithms
Confidentiality (2)

Proposed (Document: Internet Draft, November 2000):

Algorithm        Key length [bits]   Popular sizes    Default size
AES (Rijndael)   128, 192, 256       128, 192, 256    128
MARS             128..448            128, 192, 256    128
RC6              ≤ 2040              128, 192, 256    128
Serpent          ≤ 256               128, 192, 256    128
Twofish          ≤ 256               128, 192, 256    128

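Modes of operation: CBC (RFC 2405)

[Diagram, encryption: each message block M1, M2, M3, ..., MN-1, MN is combined with the previous ciphertext block (the IV for M1) and passed through E, giving C1, C2, C3, ..., CN-1, CN]

[Diagram, decryption: each ciphertext block C1, C2, C3, ..., CN-1, CN is passed through D and combined with the previous ciphertext block (the IV for C1), giving M1, M2, M3, ..., MN-1, MN]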
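
A minimal sketch of the CBC chaining shown on this slide, where the combining operation is XOR. The block cipher E/D is replaced by a hypothetical invertible 64-bit toy function (`toy_encrypt`, `toy_decrypt`), which is an assumption made only so the example is self-contained; it is not DES and not secure.

```python
MASK = (1 << 64) - 1


def toy_encrypt(block, key):
    # Hypothetical stand-in for the block cipher E (NOT secure).
    x = (block + key) & MASK
    return (((x << 7) | (x >> 57)) & MASK) ^ key


def toy_decrypt(block, key):
    # Exact inverse of toy_encrypt, standing in for D.
    x = block ^ key
    return ((((x >> 7) | (x << 57)) & MASK) - key) & MASK


def cbc_encrypt(blocks, key, iv):
    out, prev = [], iv
    for m in blocks:
        prev = toy_encrypt(m ^ prev, key)  # C_i = E(M_i xor C_{i-1}), C_0 = IV
        out.append(prev)
    return out


def cbc_decrypt(blocks, key, iv):
    out, prev = [], iv
    for c in blocks:
        out.append(toy_decrypt(c, key) ^ prev)  # M_i = D(C_i) xor C_{i-1}
        prev = c
    return out
```

Encryption of block i needs the ciphertext of block i-1, so blocks of one packet are processed one after another; decryption of a block needs only two adjacent ciphertext blocks.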

Modes of operation: Current standard - CBC

[Diagram: CBC encryption chain; IV and message blocks M1, M2, M3, ..., MN-1, MN pass through E to give C1, C2, C3, ..., CN-1, CN]

Problems:
- No parallel processing of blocks from the same packet
- No speed-up by preprocessing
- No integrity or authentication

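Counter mode

[Diagram: counter values IV, IV+1, IV+2, ..., IV+N-1, IV+N are encrypted by E into key-stream blocks K0, K1, K2, ..., KN-1, KN, which are combined with message blocks M0, M1, M2, ..., MN-1, MN to give ciphertext blocks C0, C1, C2, ..., CN-1, CN]

Features:
+ Potential for parallel processing
+ Speed-up by preprocessing
- No integrity or authentication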
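
The counter-mode structure above can be sketched as follows. The block cipher E is again replaced by a hypothetical toy function (`toy_encrypt`, an assumption, not secure); the counter sequence IV, IV+1, ... and the XOR with the message blocks follow the slide.

```python
MASK = (1 << 64) - 1


def toy_encrypt(block, key):
    # Hypothetical stand-in for the block cipher E (NOT secure).
    x = (block + key) & MASK
    return (((x << 7) | (x >> 57)) & MASK) ^ key


def ctr_keystream(key, iv, n_blocks):
    # K_i = E(IV + i). Each block is independent of the others, so the
    # key stream can be computed in parallel or before the message arrives.
    return [toy_encrypt((iv + i) & MASK, key) for i in range(n_blocks)]


def ctr_crypt(blocks, key, iv):
    # C_i = M_i xor K_i. The same function decrypts, because XOR is its own inverse.
    stream = ctr_keystream(key, iv, len(blocks))
    return [m ^ k for m, k in zip(blocks, stream)]
```

Only the encryption direction of the cipher is used, for both encrypting and decrypting a message.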

Operating Modes Contest

April 2001: 4 old modes (CBC, CFB, OFB, ECB); 10 new candidates from Egypt, Estonia, Norway, Sweden, Thailand, USA

Summer 2001: counter mode; 5 standard modes

2002: new standard modes

IPSec: Why reconfigurable hardware?

Frequently changing algorithms and their parameters
- AES
- new modes of operation
- new hash functions
- parameters of public key cryptosystems

Capability for reconfiguration =
- algorithm agility
- scalable security
- flexible architecture
- remote error correction

Reconfigurability

External ROM and microprocessor enable changing an FPGA function in several milliseconds

[Diagram: one FPGA reconfigured in 5-15 ms per step.
Encryption vs. decryption vs. key scheduling: Key scheduling -> Encryption -> Decryption.
Various algorithms: Triple DES -> IDEA -> AES.]

SLAAC-1V

[Block diagram of the board. Labels: user programmed part; standard interface IF (PCI interface + control module); Xilinx FPGA devices X0, X1, X2; SRAM; 72-bit ring bus (64 bit data + 8 bit control); 72-bit shared bus; 64/66 PCI; configuration control device]

Target FPGA devices

Xilinx Virtex - XCV 1000

• 0.22 µm CMOS process
• 1 million equivalent logic gates
• 12 288 CLB slices
• 10 4-kbit block RAMs
• Up to 200 MHz clock

[Diagram labels: programmable interconnects, Configurable Logic Block slices (CLB slices), block RAMs]

Methodology and Tools

[Flow diagram with two columns, Implementation and Verification]

Code in VHDL
1. Functional simulation: Aldec, Active-HDL
2. Synthesis and implementation: Xilinx, Foundation Series v. 3.1i (produces a netlist with timing and a bitstream)
3. Timing simulation: Aldec, Active-HDL
4. Experimental testing: USC-ISI, SLAAC-1V FPGA board

Primary parameters of hardware implementations for secret-key block ciphers

Latency: time to encrypt/decrypt a single block of data
[Diagram: Mi -> Encryption/decryption -> Ci]

Throughput: number of bits encrypted/decrypted in a unit of time
[Diagram: Mi, Mi+1, Mi+2 -> Encryption/decryption -> Ci, Ci+1, Ci+2]

Throughput = Block_size · Number_of_blocks_processed_simultaneously / Latency
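
As a worked example of the throughput formula, the sketch below evaluates it for made-up numbers. The 128-bit block, 10 clock cycles per block and 50 MHz clock are illustrative assumptions, not measurements from this presentation.

```python
def throughput_bits_per_s(block_size_bits, blocks_in_flight, latency_s):
    # Throughput = Block_size * Number_of_blocks_processed_simultaneously / Latency
    return block_size_bits * blocks_in_flight / latency_s


# Hypothetical iterative design: 10 clock cycles per block at 50 MHz.
latency_s = 10 / 50e6  # 200 ns to process one block

one_block = throughput_bits_per_s(128, 1, latency_s)    # about 640 Mbit/s
four_blocks = throughput_bits_per_s(128, 4, latency_s)  # four blocks in flight
```

Keeping more blocks in flight, by pipelining or by parallel units, raises throughput without changing the latency of a single block.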

Dependence of the encryption time on latency and throughput

Encryption time = Latency + (Message_size - Block_size) / Throughput

[Plot: time versus message size]
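
A small sketch of this dependence, assuming the relation Encryption time = Latency + (Message_size - Block_size) / Throughput. The numeric values (200 ns latency, 640 Mbit/s throughput, 128-bit blocks) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def encryption_time_s(latency_s, message_bits, block_bits, throughput_bps):
    # The first block costs one full latency; every further bit is
    # delivered at the steady-state throughput.
    return latency_s + (message_bits - block_bits) / throughput_bps


latency_s = 2e-7        # hypothetical: 200 ns per block
throughput_bps = 640e6  # hypothetical: 640 Mbit/s

single_block = encryption_time_s(latency_s, 128, 128, throughput_bps)
ten_blocks = encryption_time_s(latency_s, 1280, 128, throughput_bps)
```

For a one-block message the time equals the latency; for long messages the throughput term dominates.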

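Typical Flow Diagram of a Secret-Key Block Cipher

1. Initial transformation, using Round Key[0]
2. i := 1
3. Cipher round, using Round Key[i]
4. If i < #rounds: i := i+1 and go back to step 3 (the cipher round is executed #rounds times)
5. Final transformation, using Round Key[#rounds+1]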
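
The flow diagram corresponds to a simple loop. In the sketch below the initial and final transformations are modelled as XOR with a round key and the cipher round is a hypothetical toy function (`toy_round`); these are assumptions for illustration and do not correspond to any real cipher.

```python
MASK = (1 << 64) - 1


def toy_round(state, round_key):
    # Hypothetical cipher round: add the round key, rotate left by 3.
    x = (state + round_key) & MASK
    return ((x << 3) | (x >> 61)) & MASK


def toy_round_inverse(state, round_key):
    # Inverse of toy_round: rotate right by 3, subtract the round key.
    x = ((state >> 3) | (state << 61)) & MASK
    return (x - round_key) & MASK


def encrypt_block(block, round_keys, n_rounds):
    # round_keys holds n_rounds + 2 entries: Round Key[0] .. Round Key[n_rounds + 1]
    state = block ^ round_keys[0]                # initial transformation
    for i in range(1, n_rounds + 1):             # cipher round, n_rounds times
        state = toy_round(state, round_keys[i])
    return state ^ round_keys[n_rounds + 1]      # final transformation


def decrypt_block(block, round_keys, n_rounds):
    state = block ^ round_keys[n_rounds + 1]     # undo final transformation
    for i in range(n_rounds, 0, -1):             # undo rounds in reverse order
        state = toy_round_inverse(state, round_keys[i])
    return state ^ round_keys[0]                 # undo initial transformation
```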

Basic iterative architecture

[Diagram: a multiplexer feeds a register, followed by combinational logic implementing one round, with a round key input]

Triple DES: Basic Architecture
Encryption/Decryption Core

[Diagram: 32-bit inputs Ln-1 and Rn-1; multiplexers mux1, mux2, mux3, mux4; function F with round key Kn; 32-bit outputs Ln and Rn]

Triple DES: Basic Architecture
Key scheduling

[Diagram: 64-bit key input -> PC-1 -> 56 bits; rotations <<<1, <<<2 (encryption) and >>>1, >>>2 (decryption); two PC-2 blocks (56 -> 48 bits), labelled e and d; four banks of key memory; 48-bit round key output]

AES - Rijndael: Basic Architecture

[Diagram: data input; encryption circuit and decryption circuit built from ByteSub & InvByteSub, ShiftRow, MixCol, InvShiftRow and InvMixCol, with round key inputs; registers R0, R1, R2a, R2b, R2c, R2d, R3, R4; IV inputs; two 16 x 128 bit buffers; B1, B2, B3, B4; mux1, mux2; M1, M2; data output]

AES - Rijndael: 3-in-1 Key Scheduling Unit

[Diagram: input 64 bits; Rot, Sub and Rcon(i/Nk) blocks; 32-bit words wi-8 ... wi-1, wi-Nk and wi-Nk+1 are combined to produce wi and wi+1; output 64 bits]

Banks of round keys

[Diagram: main key (in 64-bit words) -> 3-in-1 key scheduling unit -> two 256 x 64 bit RAMs holding 16 banks of round keys -> 128-bit round key]

Rijndael vs. Triple DES
External differences

           Triple DES    AES-Rijndael
Input      64 bits       128 bits
Output     64 bits       128 bits
Key        168 bits      128, 192, and 256 bits

Rijndael vs. Triple DES
Internal differences

Triple DES:
• Feistel network
• internal operations optimized for hardware
• the same circuit used for encryption and decryption
• compact design
• the same speed for encryption and decryption

Rijndael:
• Substitution-Linear Transformation Network
• internal operations optimized for software and hardware
• separate encryption and decryption units
• larger area
• different maximum encryption and decryption speeds

Rijndael vs. Triple DES
Functional differences

Triple DES:
• round keys generated from the main key in arbitrary order, one round key per clock cycle
• round keys can be computed on the fly

Rijndael:
• round keys generated from the main key in only one order, 1/4 or 1/2 of a round key per clock cycle
• round keys need to be precomputed and stored in internal memory

Testing Procedure

1. Functional testing

Tests based on NIST Special Publication 800-20
• Known Answer Tests
• Monte Carlo Test

2. Maximum clock frequency test

• clock frequency varied using binary search
• 1 GB of data encrypted or decrypted in the CBC mode
• results compared with results from software implementation

3. Maximum encryption/decryption throughput test

• maximum clock frequency
• 4 GB of data encrypted or decrypted in the CBC mode
• time necessary to complete all operations determined

Maximum Clock Frequency Test (1)

START
1. Generate and upload key, IV; set DMA to send and receive 1 GB of data
2. Perform reference encryption/decryption in software
3. Set upper and lower bounds for clock frequency
4. Test clock frequency = (upper bound + lower bound)/2
5. Encrypt/decrypt data in hardware at the given test clock frequency
6. Result same as in software? Y: lower bound = test clock frequency. N: upper bound = test clock frequency
7. Bounds close? N: go back to step 4. Y: leave the loop
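
The binary search in this flowchart can be sketched as below. The callback `passes_at` is a hypothetical stand-in for "encrypt/decrypt the data in hardware at this clock frequency and compare with the software reference"; the function name, the tolerance parameter and the simulated 91 MHz limit in the usage lines are illustrative assumptions, not part of the original test software.

```python
def find_max_clock_mhz(passes_at, lower_mhz, upper_mhz, tolerance_mhz=1.0):
    # Assumes the design works at lower_mhz and fails at upper_mhz.
    while upper_mhz - lower_mhz > tolerance_mhz:   # "Bounds close?"
        test_mhz = (upper_mhz + lower_mhz) / 2
        if passes_at(test_mhz):                    # result same as in software
            lower_mhz = test_mhz
        else:                                      # mismatch: clock too fast
            upper_mhz = test_mhz
    return lower_mhz                               # highest frequency known to pass


# Usage with a simulated board that works up to a hypothetical 91 MHz.
def simulated_board(freq_mhz):
    return freq_mhz <= 91.0


max_clock = find_max_clock_mhz(simulated_board, 25.0, 175.0)
```

Each iteration halves the interval, so only a handful of full hardware runs are needed to locate the limit to within the chosen tolerance.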

Results for basic architectures
Maximum clock frequency [MHz]

                         static analysis   experiment
Triple DES enc + dec     72                91
Rijndael enc + dec       47                52
Rijndael enc             60 (series not recoverable from the transcript)

[Bar chart, axis 0 to 175 MHz]

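Corresponding circuit throughputs
Throughput [Mbit/s]

                         static analysis   derived from experimental clock frequency
Triple DES enc + dec     91                116
Rijndael enc + dec       521               577
Rijndael enc             665 (series not recoverable from the transcript)

[Bar chart, axis 0 to 900 Mbit/s. A further series labelled "experiment" carries the values 108 and 404; their assignment to bars is not recoverable from the transcript.]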

Use of resources: basic architecture
Percentage of the Virtex 1000 device resources

[Bar chart, axis 0 to 100 %: CLBs and block RAMs used by Triple DES and Rijndael. Bar labels: 5 %, 10 %, 56 %]

Increasing throughput using parallel processing

[Diagram: three encryption/decryption units, each with its own memory of subkeys, process three packets at the same time: Packet 1 (IV1, a1, a2, ..., aK), Packet 2 (IV2, b1, b2, ..., bL), Packet 3 (IV3, c1, c2, ..., cM)]

Increasing throughput using pipelining

[Diagram with four panels:
a) register, MUX and combinational logic: one round, no pipelining
b) one round = k pipeline stages; k registers; MUX
c) round 1, round 2, ..., round K, each = k pipeline stages; K·k registers; MUX
d) round 1, round 2, ..., round #rounds, each = k pipeline stages; #rounds·k registers]

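Throughput [Mbit/s]

[Bar chart, axis 0 to 18,000 Mbit/s: throughput of Serpent, Twofish, RC6, Rijndael and 3DES for three architectures: basic, inner-round pipelining, mixed pipelining. Bar labels as extracted (assignment to bars not recoverable): 431, 414, 177, 143, 62; 16,768, 15,232, 13,056, 7,469, 3,805; 1,265, 994, 699; 12,160; 135]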

Area [CLB slices]

[Bar chart, axis 0 to 50,000 CLB slices: area of Serpent, Twofish, 3DES, Rijndael and RC6 for three architectures: basic, inner-round pipelining, mixed pipelining, with levels marked for 2, 3 and 4 devices. Bar labels as extracted (assignment to bars not recoverable): 356, 375, 12,288, 1,076, 1,711, 21,000, 1,137, 3,458, 46,800, 2,507, 2,057 + 8 RAMs, 12,600 + 80 RAMs, 4,507, 19,700, 5,623]

AES - Rijndael: Extended Architecture

[Diagram: data input; encryption circuit and decryption circuit built from ByteSub & InvByteSub, ShiftRow, MixCol, InvShiftRow and InvMixCol, with round key inputs; registers R0, R1, R2a, R2b, R2c, R2d, R3, R4 and the additional registers R5, R6; IV inputs; two 16 x 128 bit buffers; B1, B2, B3, B4; mux1, mux2; M1, M2; data output]

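Triple DES: Extended Architecture

[Diagram: 64-bit key input -> PC-1 -> 56 bits -> 16 banks of main keys; a chain of rounds round(1), round(2), ..., round(16); blocks Next key(1), Next key(2), ..., Next key(16) supply the round keys K1, K2, ..., K16]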

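Simplification of the key scheduling unit: extended architecture

[Diagram: Next key(n) takes the 56-bit value In-1, applies the rotation <<< m or >>> m (labels e, d) to give In, and passes it through PC-2 to produce the 48-bit round key Kn]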

Triple DES: Key scheduling in basic architecture

[Diagram: 64-bit key input -> PC-1 -> 56 bits; rotations <<<1, <<<2 (encryption) and >>>1, >>>2 (decryption); two PC-2 blocks (56 -> 48 bits), labelled e and d; four banks of key memory; 48-bit round key output]

Tentative results for extended architecture
Maximum clock frequency [MHz]

                           analysis   experiment
Basic architectures
  Triple DES enc + dec     72         91
  Rijndael enc + dec       47         52
  Rijndael enc             60 (series not recoverable from the transcript)
Extended architecture
  Rijndael enc + dec       76         90

[Bar chart, axis 0 to 175 MHz]

Corresponding circuit throughputs
Throughput [Mbit/s]

                           analysis   experiment
Basic architectures
  Triple DES enc + dec     91         116
  Rijndael enc + dec       521        577
  Rijndael enc             665 (series not recoverable from the transcript)
Extended architecture
  Rijndael enc + dec       843        998

[Bar chart, axis 0 to 1000 Mbit/s]

Use of resources by extended architectures
Percentage of the Virtex device resources

[Bar chart, axis 0 to 100 %: CLBs and block RAMs used by Triple DES and Rijndael. Basic architecture bar labels: 5 %, 10 %, 56 %. Extended architecture bar labels: 60 %, 19 %, 56 % (estimated)]

Conclusions

• High-speed IPSec-compliant implementations of Rijndael and Triple DES developed and tested experimentally using the SLAAC-1V FPGA accelerator board

• Encryption and decryption throughputs of Rijndael in the range of 1 Gbit/s (998 Mbit/s) demonstrated experimentally

• Integrated 1 Gbit/s implementation of Rijndael and Triple DES shown to require only 80% of resources of a single FPGA device Virtex XCV-1000

• SLAAC-1V accelerator board capable of supporting encryption & decryption throughputs in the range of 3 Gbit/s

