
Evaluation of the CAESAR Hardware API for Lightweight Implementations

Panasayya Yalla, Jens-Peter Kaps
Department of Electrical and Computer Engineering, George Mason University, Fairfax, Virginia 22030, USA

Abstract

The Competition for Authenticated Encryption: Security, Applicability, and Robustness (CAESAR) requires that all hardware implementations of candidate algorithms adhere to the CAESAR Hardware API [1]. The CAESAR Hardware API is supported by a development package which includes VHDL code for universal pre- and post-processors for high-speed and recently also for lightweight implementations. These processors are designed to make a cipher core compliant with the API. In this work we verify that the lightweight package has a smaller area footprint than the high-speed package. We also show that the overhead of using the generic lightweight pre- and post-processors over integrating their functionality into the cipher core is negligible. As part of these case studies, we have developed the first lightweight implementations of Ketje-Sr, Ascon-128, and Ascon-128a.

Introduction and Motivation

- CAESAR evaluates candidates for a final portfolio of new Authenticated Encryption with Associated Data (AEAD) algorithms.
- All candidates must adhere to the CAESAR hardware (HW) Application Programming Interface (API).
- The HW API is one component which enables a fair comparison among algorithms.
  - Independent FIFO inputs for public data (PDI) and secret data (SDI) and FIFO output (DO).
  - In-band signaling for commands and data types using a simple protocol.
- The CAESAR HW API is supported by an implementer's guide and development package [2].
  - Includes VHDL code for high-speed (HS) and lightweight (LW) implementations.
  - Pre- and PostProcessor separate protocol from cryptographic algorithm.
  - Bypass FIFO stores and passes header information to PostProcessor.
- It is generally assumed that having generic pre- and post-processors increases the area consumption over merging their functionality with the cipher cores.

Differences between HS and LW Packages

High-Speed
- Supports bus width 32 ≤ w ≤ 256 in multiples of 8.
- PreProcessor expands PDI and SDI data to full block size for CipherCore.
- PreProcessor stores one block of PDI and SDI data.
- PreProcessor contains universal padding unit.
- Tag comparison has to be performed in CipherCore.

Lightweight
- Supports bus width w of 8, 16, and 32.
- PreProcessor, CipherCore, Bypass FIFO, and PostProcessor have equal bus width.
- PreProcessor has no data storage.
- Assumes padding is performed in CipherCore.
- PostProcessor supports tag comparison.
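The bus-width rules listed above can be captured in a small check. This is only an illustrative sketch: the function name `supported_bus_width` and the package labels "HS" and "LW" are hypothetical and are not part of the development package.

```python
def supported_bus_width(package: str, w: int) -> bool:
    """Return True if bus width w (in bits) is allowed for the given package.

    The rules are taken from the comparison above; the function itself is a
    hypothetical helper, not part of the CAESAR development package.
    """
    if package == "HS":
        # High-speed package: 32 <= w <= 256, in multiples of 8.
        return 32 <= w <= 256 and w % 8 == 0
    if package == "LW":
        # Lightweight package: w is 8, 16, or 32.
        return w in (8, 16, 32)
    raise ValueError(f"unknown package: {package!r}")
```

For example, `supported_bus_width("LW", 16)` holds while `supported_bus_width("HS", 16)` does not, which is why narrow buses need the lightweight package.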

CAESAR High-Speed Block Diagram

[Figure: block diagram of the high-speed AEAD core. A PreProcessor, the CipherCore (Datapath and Controller), and a PostProcessor are connected in sequence, with a CMD FIFO (cmd, cmd_valid, cmd_ready) between PreProcessor and PostProcessor and an optional Two-Pass FIFO (fdi_data, fdi_valid, fdi_ready, fdo_data, fdo_valid, fdo_ready). External ports: pdi_data, pdi_valid, pdi_ready (width w); sdi_data, sdi_valid, sdi_ready (width sw); do_data, do_valid, do_ready (width w). PreProcessor to CipherCore: key (KEY_SIZE), key_valid, key_ready, key_update, decrypt, bdi (DBLK_SIZE), bdi_valid, bdi_ready, bdi_type, bdi_eot, bdi_eoi, bdi_partial, bdi_size (LBS_BYTES+1), bdi_valid_bytes (DBLK_SIZE/8), bdi_pad_loc (DBLK_SIZE/8). CipherCore to PostProcessor: bdo (DBLK_SIZE), bdo_valid, bdo_ready, bdo_size, msg_auth, msg_auth_valid, msg_auth_ready. The legend distinguishes required and optional signals.]

CAESAR Lightweight Block Diagram

[Figure: block diagram of the lightweight AEAD core. A PreProcessor, the CipherCore, and a PostProcessor with Tag Comparator are connected in sequence, with a Header/Tag FIFO (cmd, cmd_valid, cmd_ready) between PreProcessor and PostProcessor. External ports: pdi_data, pdi_valid, pdi_ready (width w); sdi_data, sdi_valid, sdi_ready (width sw); do_data, do_valid, do_ready (width w). PreProcessor to CipherCore: key, key_valid, key_ready, key_update, decrypt, bdi (w), bdi_valid, bdi_ready, bdi_type, bdi_eot, bdi_eoi, bdi_partial, bdi_size, bdi_valid_bytes (w/8), bdi_pad_loc (w/8). CipherCore to PostProcessor: bdo (w), bdo_valid, bdo_ready, bdo_type, bdo_valid_bytes (w/8), end_of_block, msg_auth, msg_auth_valid, msg_auth_ready. The legend distinguishes required and optional signals.]

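Protocol: Instruction

[Figure: 16-bit instruction format. With w=16, one word holds a 4-bit Opcode or Status field and 12 reserved bits. With w=8, the instruction is transferred as two bytes: one carrying the 4-bit Opcode or Status field plus reserved bits, and one carrying only reserved bits.]

[Figure: states for processing an instruction: Rd-Inst, Rd-Rsvd (w=8 only), and Rd-Hdr; for w=16 and w=32, Rd-Inst leads directly to Rd-Hdr.]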

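Protocol: Segment Header

[Figure: 32-bit header format: an 8-bit Info field (4-bit Segment Type plus the 1-bit flags Partial, EOI, EOT, and Last), 8 reserved bits, and a 16-bit Segment Length. With w=8 and w=16, the header is transferred in w-bit pieces: Info, Reserved, Segment Length (MSB), and Segment Length (LSB) for w=8; Info with Reserved, then Segment Length for w=16.]

[Figure: states for processing a header: Rd-Hdr, Rd-Rsvd, SegLen1, SegLen2 for w=8; Rd-Hdr, SegLen for w=16; Rd-Hdr for w=32; each sequence ends in Ld-Data.]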
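To make the header layout concrete, here is a minimal Python sketch that decodes a 32-bit segment header and splits it into w-bit bus words. The field widths come from the figure; the exact bit order inside the Info byte (Segment Type in the upper nibble, then Partial, EOI, EOT, Last) and the MSB-first transfer order are assumptions for illustration and should be checked against the CAESAR Hardware API specification [1]. The names `SegmentHeader`, `parse_header`, and `split_header` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SegmentHeader:
    seg_type: int   # 4-bit Segment Type
    partial: bool
    eoi: bool       # end of input
    eot: bool       # end of type
    last: bool
    length: int     # 16-bit Segment Length


def parse_header(word: int) -> SegmentHeader:
    """Decode a 32-bit header: Info (8) | Reserved (8) | Segment Length (16).

    ASSUMPTION: Info = Type[3:0], Partial, EOI, EOT, Last from MSB to LSB.
    """
    if not 0 <= word < (1 << 32):
        raise ValueError("header must fit in 32 bits")
    info = (word >> 24) & 0xFF
    return SegmentHeader(
        seg_type=(info >> 4) & 0xF,
        partial=bool((info >> 3) & 1),
        eoi=bool((info >> 2) & 1),
        eot=bool((info >> 1) & 1),
        last=bool(info & 1),
        length=word & 0xFFFF,
    )


def split_header(word: int, w: int) -> list:
    """Split the 32-bit header into 32 // w bus words (assumed MSB first)."""
    if w not in (8, 16, 32):
        raise ValueError("lightweight bus width must be 8, 16, or 32")
    n = 32 // w
    mask = (1 << w) - 1
    return [(word >> (w * (n - 1 - i))) & mask for i in range(n)]
```

With w=8 the header takes four transfers, with w=16 two, and with w=32 one, which matches the number of header states shown for each bus width.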

Case Study

1. Determine overhead of CAESAR LW package.
   - Ketje-Sr implementation with integrated support of CAESAR API.
   - Ketje-Sr implementation using new CAESAR lightweight development package.
2. Determine overhead of CAESAR LW package vs. HS package.
   - Implementation of Ascon using CAESAR LW package.
   - Using existing Ascon HS implementation.

Ketje-Sr

- Ketje [3] is built on a round-reduced Keccak-f permutation and uses a mode called MonkeyWrap.
- It has four variants, Ketje-Jr, Ketje-Sr, Ketje-Minor, and Ketje-Major, which use Keccak-p∗[200], Keccak-p∗[400], Keccak-p∗[800], and Keccak-p∗[1600], respectively.
- Each round of Keccak-p∗ consists of five steps: θ, ρ, π, χ, and ι.
  - In the θ step, each bit in the state is XORed with the parities of two columns.
  - In the ρ step, the bits of each lane are rotated by one of 25 different offsets.
  - Lanes are rearranged in π; χ is the nonlinear step, combining bits with AND (multiplication in GF(2)).
  - The last step is ι, where a round constant is added.
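As an illustration of two of these steps, the following Python sketch applies θ and χ to a 5×5 state of 16-bit lanes (the 400-bit width used by Ketje-Sr). It follows the standard Keccak step definitions on a lane array `a[x][y]`; it does not model the RAM-based 16-bit datapath of this work, it omits ρ, π, and ι, and it ignores any state re-ordering implied by the Keccak-p∗ notation. The helper names are hypothetical.

```python
LANE_BITS = 16                    # 400-bit state: 25 lanes of 16 bits
MASK = (1 << LANE_BITS) - 1


def rotl(v: int, n: int) -> int:
    """Rotate a lane left by n bits."""
    n %= LANE_BITS
    return ((v << n) | (v >> (LANE_BITS - n))) & MASK


def theta(a):
    """XOR every lane with the parities of two neighbouring columns."""
    c = [a[x][0] ^ a[x][1] ^ a[x][2] ^ a[x][3] ^ a[x][4] for x in range(5)]
    d = [c[(x - 1) % 5] ^ rotl(c[(x + 1) % 5], 1) for x in range(5)]
    return [[a[x][y] ^ d[x] for y in range(5)] for x in range(5)]


def chi(a):
    """Nonlinear step: a[x] ^= (NOT a[x+1]) AND a[x+2] along each row."""
    return [
        [a[x][y] ^ ((~a[(x + 1) % 5][y] & MASK) & a[(x + 2) % 5][y])
         for y in range(5)]
        for x in range(5)
    ]


def popcount(a) -> int:
    """Number of set bits in the whole state."""
    return sum(bin(lane).count("1") for col in a for lane in col)
```

Applying `theta` to a state with a single set bit yields 11 set bits (the original bit plus two full columns of five bits), which shows how quickly θ spreads a difference across the state.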

Ketje-Sr Datapath

[Figure: Ketje-Sr datapath with 16-bit buses. Blocks shown: dual-port RAM (Port-A, Port-B), Rho, rotate-left-by-1 (<<<1), Keypack, RAMK1 (MSB), RAMK2 (LSB), reg-K, Padding, reg-A, and rcon, connected to sdi_data, pdi_data, and do_data.]

- We implemented Ketje-Sr using a 16-bit datapath and interface.
- The datapath is the same for integrated CAESAR API support and for the CAESAR LW package.
- The state is stored in a dual-port memory (RAM) with one read/write port and one read-only port.
- To reduce the complexity of padding for the key, the key size is fixed to 128 bits.
- Two memory units (RAMK1 and RAMK2) with pre-stored values and a register (reg-K) are used for key storage and KeyPack operations.
- Padding for message and AD is done using multiplexers.
- The design needs 160 clock cycles to process a 32-bit block.
- $TP = \frac{32}{160} \cdot F$, where $F$ is the clock frequency.
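The throughput expression can be evaluated directly. The sketch below is a plain arithmetic helper (the name `throughput_mbps` is hypothetical); with the 122.4 MHz reported for the integrated Ketje-Sr design in Case Study 1, it reproduces the 24.48 Mbps listed there.

```python
def throughput_mbps(block_bits: int, cycles_per_block: int, freq_mhz: float) -> float:
    """Throughput in Mbps: bits per block / cycles per block * clock in MHz."""
    return block_bits / cycles_per_block * freq_mhz


# Ketje-Sr: a 32-bit block every 160 clock cycles at 122.4 MHz.
ketje_sr_tp = throughput_mbps(32, 160, 122.4)
```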

Ascon

- Ascon [4] is a permutation-based authenticated cipher.
- Ascon-128 and Ascon-128a are two variants with block sizes of 64 and 128 bits, respectively.
- In each round, three sub-transformations called constant-addition, substitution, and linear diffusion are applied.
  - Constant-addition is the first operation in the round, where a constant is added to one of the five words. Twelve round constants are used.
  - The substitution layer uses 5-bit S-boxes.
  - The linear diffusion layer provides diffusion within each of the five 64-bit words using circular shifts and XORs.
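The three sub-transformations can be sketched in software as follows. This is a word-level Python model, not the RAM-based datapath of this work. The bit-sliced S-box formula, the rotation amounts, and the constant schedule are written down from the Ascon v1.2 specification [4] and should be verified against it before reuse; all function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def ror(v: int, n: int) -> int:
    """Rotate a 64-bit word right by n bits."""
    return ((v >> n) | (v << (64 - n))) & MASK64


def round_constant(i: int) -> int:
    """Constant for round i (0..11): high nibble counts down, low nibble up."""
    return ((0xF - i) << 4) | i


def substitution(x):
    """Bit-sliced 5-bit S-box applied to all 64 bit positions in parallel."""
    x = list(x)
    x[0] ^= x[4]; x[4] ^= x[3]; x[2] ^= x[1]
    t = [(~x[i] & MASK64) & x[(i + 1) % 5] for i in range(5)]
    for i in range(5):
        x[i] ^= t[(i + 1) % 5]
    x[1] ^= x[0]; x[0] ^= x[4]; x[3] ^= x[2]
    x[2] = ~x[2] & MASK64
    return x


def linear_diffusion(x):
    """Each word is XORed with two rotated copies of itself."""
    rot = [(19, 28), (61, 39), (1, 6), (10, 17), (7, 41)]
    return [w ^ ror(w, r0) ^ ror(w, r1) for w, (r0, r1) in zip(x, rot)]


def ascon_round(x, i: int):
    """One round: constant addition (to word x2), substitution, diffusion."""
    x = list(x)
    x[2] ^= round_constant(i)
    return linear_diffusion(substitution(x))


def sbox(v: int) -> int:
    """5-bit S-box lookup derived from the bit-sliced form (x0 is the MSB)."""
    bits = [(v >> (4 - i)) & 1 for i in range(5)]
    out = substitution(bits)
    return sum((out[i] & 1) << (4 - i) for i in range(5))
```

The constant schedule mirrors the hardware description in the datapath section: one 4-bit value counts down while the other counts up, so only two small registers and adders are needed.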

Ascon Datapath

[Figure: Ascon datapath with a 64-bit internal data path and 32-bit pdi_data, sdi_data, and do_data buses. Blocks shown: dual-port RAM (Port-A, Port-B), RAMK, R1, Padding, LDiff, and rcon.]

- A 64-bit datapath and a 32-bit interface are used in this design.
- The state is stored in a dual-port RAM.
- The key is stored in a RAM.
- The substitution layer is implemented in a bit-slice fashion.
- Due to contention on the RAM ports, 18 operations take 33 clock cycles.
- The round constants are generated using two 4-bit registers and adders.
- Two 5-to-1 multiplexers are used to perform circular shifts in the linear diffusion step (LDiff).
- $TP = \frac{128}{33 \cdot 8} \cdot F$, where $F$ is the clock frequency.

Case Study 1: Integrated vs. LW Package

Implementation Results on Xilinx Spartan-6 FPGA using ATHENa [5]

Design          Slices  LUTs  Flip-Flops  Freq [MHz]  TP [Mbps]  TP/Area [Mbps/slice]
KETJE-SR (1)       140   436          98       122.4      24.48                  0.17
KETJE-SR (2)       155   450         114       120.1      24.03                  0.16
Overhead            15    14          16
ASCON-128 (2)      231   684         268       216.0      60.10                  0.26
ASCON-128a (2)     231   684         268       216.0     119.16                  0.52
Joltik [6] (3)     168   534         381       200.0     426.67                  2.54
ACORN [6] (4)      202   540         383       231.6   1,852.80                  9.17

(1) Dedicated CAESAR API; (2) CAESAR LW Package; (3) Not compliant with CAESAR API; (4) Tweaked CAESAR HS Package

- Using the CAESAR LW Package leads to a small area increase.
  - Three separate counters for the sdi, pdi, and do buses are used for simplicity and parallel operation.
  - The counter for sdi can be dropped if the cipher core provides an end_of_key signal.
- Comparing our designs with each other and with other reported implementations:
  - Compared with Ketje-Sr, Ascon-128a has more than 4 times the TP while consuming only about 50% more slices.
  - The Joltik implementation is not compliant with the CAESAR API but performs significantly better.
  - ACORN is based on a stream cipher, a class that typically performs very well in lightweight implementations.
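The overhead row and the TP/Area column can be recomputed from the other table entries. The sketch below uses only the two Ketje-Sr rows; the dictionary layout and function names are hypothetical.

```python
# Values copied from the Case Study 1 table (Xilinx Spartan-6).
INTEGRATED = {"slices": 140, "luts": 436, "ffs": 98, "tp_mbps": 24.48}
LW_PACKAGE = {"slices": 155, "luts": 450, "ffs": 114, "tp_mbps": 24.03}
AREA_KEYS = ("slices", "luts", "ffs")


def area_overhead(base, wrapped):
    """Absolute area increase of `wrapped` over `base` per resource type."""
    return {k: wrapped[k] - base[k] for k in AREA_KEYS}


def relative_overhead_percent(base, wrapped):
    """Area increase as a percentage of the base design, to one decimal."""
    return {k: round(100.0 * (wrapped[k] - base[k]) / base[k], 1)
            for k in AREA_KEYS}


def tp_per_slice(design):
    """Throughput-to-area ratio in Mbps per slice, to two decimals."""
    return round(design["tp_mbps"] / design["slices"], 2)
```

Relative to the integrated design, the LW package costs about 10.7% more slices, 3.2% more LUTs, and 16.3% more flip-flops, which quantifies the "small area increase" stated above.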

Case Study 2: Area Overhead High-Speed (HS) vs. Lightweight (LW) Packages

Implementation Results on Xilinx Spartan-6 FPGA using ATHENa [5]

Design        Top-level   Slices  LUTs  Flip-Flops
LW Ascon      AEAD (1)       231   684         268
              CipherCore     196   606         212
              Overhead        35    78          56
HS Ascon [6]  AEAD (2)       416  1282         792
              CipherCore     379  1033         529
              Overhead        37   249         263

(1) CAESAR LW Package; (2) CAESAR HS Package

- Adding CAESAR API support to
  - an LW core using the LW Package leads to a small area increase,
  - an HS core using the HS Package leads to a larger area increase.
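The two overhead rows follow from subtracting the CipherCore figures from the AEAD top-level figures. A short sketch of that arithmetic, with a hypothetical data layout and function name:

```python
# (slices, LUTs, flip-flops) copied from the Case Study 2 table.
RESULTS = {
    "LW": {"AEAD": (231, 684, 268), "CipherCore": (196, 606, 212)},
    "HS": {"AEAD": (416, 1282, 792), "CipherCore": (379, 1033, 529)},
}


def package_overhead(package: str):
    """Area added by the pre-/post-processors: top level minus cipher core."""
    r = RESULTS[package]
    return tuple(top - core for top, core in zip(r["AEAD"], r["CipherCore"]))


lw = package_overhead("LW")
hs = package_overhead("HS")
# Ratio of HS overhead to LW overhead per resource type.
ratios = tuple(round(h / l, 1) for h, l in zip(hs, lw))
```

The slice overheads are close (37 vs. 35), while the HS package adds roughly 3.2 times as many LUTs and 4.7 times as many flip-flops as the LW package.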

Conclusions

- The graph shows implementation results of Ketje-Sr on Spartan-6.
- Using the CAESAR LW Package leads to a small area increase over integrated designs.
- This small increase can easily be mitigated.

[Figure: Case Study 1 bar chart of Slices, LUTs, and FFs for the Integrated and LW Package designs; vertical axis from 0 to 500.]

- The graph shows the overhead incurred for implementations of Ascon on Spartan-6.
- The CAESAR HS Package leads to a much larger area increase than the LW Package as it expands the data and key buses to the full block size.

[Figure: Case Study 2 bar chart of Slices, LUTs, and FFs for LW Overhead and HS Overhead; vertical axis from 0 to 300.]

- The CAESAR LW Package allows for bus widths of 8 and 16 bits, which are not currently supported by the CAESAR HS Package.
- The CAESAR LW Package reduces the design time for LW implementations.
- The CAESAR LW Package will be included in the next release of the Development Package for the CAESAR Hardware API.
- The usage will be documented in the next release of the Implementer's Guide to the CAESAR Hardware API.

Acknowledgment

The CAESAR Lightweight API Support Package was developed in collaboration with Fabrizio De Santis and Michael Tempelmeier from

References

[1] E. Homsirikamol, W. Diehl, A. Ferozpuri, F. Farahmand, P. Yalla, J.-P. Kaps, and K. Gaj, “CAESAR hardware API,” Cryptology ePrint Archive, Report 2016/626, 2016, http://eprint.iacr.org/2016/626.

[2] “Development package for the CAESAR hardware API v1.2,” https://cryptography.gmu.edu/athena/AEAD/GMU_AEAD_HW_API_v1_2.zip, accessed: 2017-06-30.

[3] G. Bertoni, J. Daemen, M. Peeters, G. Van Assche, and R. Van Keer, “CAESAR submission: Ketje v2,” Submission to CAESAR (Round 3), September 2016, https://competitions.cr.yp.to/round3/ketjev2.pdf.

[4] C. Dobraunig, M. Eichlseder, F. Mendel, and M. Schläffer, “ASCON v1.2,” Submission to CAESAR (Round 3), September 2016.

[5] K. Gaj, J.-P. Kaps, V. Amirineni, M. Rogawski, E. Homsirikamol, and B. Y. Brewster, “ATHENa – automated tool for hardware evaluation: Toward fair and comprehensive benchmarking of cryptographic hardware using FPGAs,” in 20th International Conference on Field Programmable Logic and Applications - FPL 2010. IEEE, 2010, pp. 414–421, winner of the FPL Community Award.

[6] “ATHENa database of FPGA results for authenticated ciphers,” https://cryptography.gmu.edu/athenadb/fpga_auth_cipher/table_view, accessed: 2017-07-30.

Cryptographic Engineering Research Group (CERG) Department of Electrical and Computer Engineering George Mason University http://cryptography.gmu.edu