Using Matlab to Aid the Implementation of a Fast RSA Cryptocore

Carsten Siggaard, Senior Consultant, Danish Technological Institute (DTI)

© 2008 The MathWorks, Inc.

Danish Technological Institute (DTI)

• Knowledge Development

• Knowledge Application

• Knowledge Transfer

Why Implement RSA on a Field Programmable Gate Array?

• FPGAs are inherently parallel, which means they can be faster than general-purpose processors even at a much lower clock speed.

• Consider a system using RSA encryption: if you can place the encryption on a separate FPGA, then the CPU on this platform can perform other tasks.

• RSA is a difficult algorithm to implement on FPGAs, much more difficult than the Advanced Encryption Standard (Rijndael, AES) or Blowfish. Therefore, if you can implement RSA, virtually any encryption standard can be implemented.

• The core calculations in RSA are the same as those performed in other cryptographic schemes such as Diffie-Hellman key exchange and ElGamal.

Major Results

• Theoretical maximum: 3,150,000 ops/s (Altera Stratix IV E with 1360 16-bit multipliers).

• 50% usage (on a Xilinx XC4SX35).

• 1024-bit message.

• 1024-bit modulus, 5-bit public exponent.

• For comparison, an AMD Opteron at 2.8 GHz: 26,000 ops/s.

• At 200 MHz, 50,000 operations can be performed.

• Power consumption: 1 W (Xilinx power estimator using simulated data).

• The core can perform 35,000 cryptographic operations per second.

Max. 90 W

Used Toolboxes and Blocksets

• Matlab
  • Fixed Point Toolbox: modelling large integers.

• Simulink
  • Fixed Point Blockset: modelling (large) integers.

• Stateflow was used to implement the controller.

• hdlCoder: generating generic VHDL code.

• Xilinx Sysgen for HIL.

Development Issues

• In cryptography, all numbers are usually either bit fields or integers modulo n. Therefore, use a toolbox like the Fixed Point Toolbox to model these numbers.

• Model the algorithm in Matlab.

• Model the algorithm in Simulink/Stateflow, and compare the results with those from the Matlab model.

• Generate the code and run it.
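The compare-the-models step can be sketched outside Matlab as well. The Python snippet below is an illustration only, not the author's code: `reference_modmul` plays the role of the Matlab reference model, and `bitserial_modmul` (a hypothetical shift-and-add model) stands in for a hardware-oriented Simulink model. All names, the modulus and the random stimuli are assumptions.

```python
import random

def reference_modmul(a, b, n):
    """Reference model: rely on Python's arbitrary-precision integers."""
    return (a * b) % n

def bitserial_modmul(a, b, n, bits):
    """Hardware-style model: shift-and-add, reducing after every step.

    Assumes 0 <= a, b < n and that a fits in `bits` bits.
    """
    acc = 0
    for i in reversed(range(bits)):
        acc <<= 1                 # double the running sum
        if (a >> i) & 1:
            acc += b              # add b when the current bit of a is set
        while acc >= n:           # acc < 3n here, so at most two subtractions
            acc -= n
    return acc

def compare_models(n, bits, trials=20, seed=1):
    """Drive both models with the same random stimuli and count mismatches."""
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(trials):
        a, b = rng.randrange(n), rng.randrange(n)
        if reference_modmul(a, b, n) != bitserial_modmul(a, b, n, bits):
            mismatches += 1
    return mismatches

# An arbitrary odd 1024-bit modulus, chosen for illustration only.
n = (1 << 1023) + 1155
print("mismatches:", compare_models(n, bits=1024))
```

A mismatch count of zero over many stimuli is the signal that the second model can be carried forward to code generation.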

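RSA Key Exchange (RFC 4432)

Message flow (from the slide diagram):

1. Bob holds a, b, p and sends his public key (b, p).
2. The other party generates random bytes K and puts K into a message m.
3. The other party computes $c = m^b \bmod p$ and sends c.
4. Bob recovers $m = c^a \bmod p$.
5. Bob returns the signed exchange hash.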
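The two exponentiations in this exchange can be tried with a toy key. The sketch below is in Python and reuses the slide's symbols (a = private exponent, b = public exponent, p = modulus). The tiny primes `q1` and `q2`, the exponent 17 and the value chosen for K are assumptions for illustration; RFC 4432 padding, hashing and signing are left out.

```python
# Toy key pair; real keys use moduli of 1024 bits or more.
q1, q2 = 61, 53                  # hypothetical small primes
p = q1 * q2                      # modulus
phi = (q1 - 1) * (q2 - 1)
b = 17                           # public exponent
a = pow(b, -1, phi)              # private exponent (needs Python 3.8+)

def encrypt(m, b, p):
    """c = m^b mod p, performed by the party holding Bob's public key."""
    return pow(m, b, p)

def decrypt(c, a, p):
    """m = c^a mod p, performed by Bob with his private exponent."""
    return pow(c, a, p)

K = 65                           # stands in for the message m built from K
c = encrypt(K, b, p)
m = decrypt(c, a, p)
print("ciphertext:", c, "recovered:", m)
```

Both steps are the same operation, a modular exponentiation, which is why a single hardware engine can serve both sides.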

What is the engine in RSA, Diffie-Hellman and ElGamal?

$X^n \bmod m$

Discrete logarithm modulo m is DIFFICULT

The Usual approach

• To calculate exponentiation modulo m, repeatedly do:

  1. X*X (square and multiply)
  2. Reduce modulo m by trial division or Barrett's algorithm

• For small numbers this can be done efficiently.

• For large numbers this can become a bit difficult.
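The two-step loop above can be written out as a minimal Python sketch. The function name `modexp` is an assumption, and the `%` operator stands in for the reduction step (trial division or Barrett's algorithm in a real design).

```python
def modexp(x, n, m):
    """Left-to-right square-and-multiply: returns x**n mod m."""
    result = 1 % m
    x %= m
    for i in reversed(range(n.bit_length())):
        result = (result * result) % m      # square, then reduce modulo m
        if (n >> i) & 1:
            result = (result * x) % m       # multiply, then reduce modulo m
    return result

print(modexp(65, 17, 3233))
```

Every iteration costs one or two full-width multiplications plus a reduction. With 1024-bit operands the reduction (a division) dominates, which is the difficulty the slide points at.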

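The Montgomery Algorithm

[Block diagram: Montgomery multiplication data flow built from multiply (*), add (+), subtract (-) and divide (/) blocks; signal labels x, y, M, n', r, n, b, t, m, m2, y1, y2]

Calculates $(a \cdot r) \cdot (b \cdot r) \cdot r^{-1} \bmod n$

Result is $(a \cdot b \cdot r) \bmod n$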

Be aware of timing/power attacks!
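A minimal Python sketch of the Montgomery product follows, assuming r = 2^k and an odd modulus n. The function names (`montgomery_params`, `redc`, `mont_mul`) and the toy values are assumptions; this is the textbook reduction, not necessarily the exact data flow of the slide's diagram.

```python
def montgomery_params(n, k):
    """Return (r, n_prime) with r = 2**k and n * n_prime = -1 (mod r).

    n must be odd so that it is invertible modulo r (needs Python 3.8+).
    """
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r
    return r, n_prime

def redc(t, n, n_prime, k):
    """Montgomery reduction: t * r^-1 mod n, valid for 0 <= t < r * n."""
    mask = (1 << k) - 1
    m = ((t & mask) * n_prime) & mask   # m = (t mod r) * n' mod r
    u = (t + m * n) >> k                # t + m*n is divisible by r
    return u - n if u >= n else u       # data-dependent final subtraction

def mont_mul(a_bar, b_bar, n, n_prime, k):
    """Multiply two values that are already in Montgomery form."""
    return redc(a_bar * b_bar, n, n_prime, k)

n, k = 3233, 12                         # toy modulus, r = 4096 > n
r, n_prime = montgomery_params(n, k)
a, b = 65, 1234
a_bar, b_bar = (a * r) % n, (b * r) % n # operands a*r and b*r
ab_bar = mont_mul(a_bar, b_bar, n, n_prime, k)
print(ab_bar == (a * b * r) % n)
```

The only division left is a shift by k bits. The conditional subtraction at the end of `redc` is one example of a data-dependent step, the kind of behaviour the timing/power warning refers to.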

Matlab Development

• Matlab's built-in GCD is based upon floats (double).
  • A GCD must be created which uses the fi type.

• $R^2 \bmod n$ must be calculated.
  • Create a function which uses the fi type.

• A helper function generates stimuli structures for Simulink.

• The Montgomery algorithm was also developed in Matlab, so that its results can be compared with the results from Simulink.
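To show why an integer-exact GCD is needed, here is a Python sketch of the same helper calculations. Python integers are arbitrary precision, so they play the role of the fi type here. The function names and the modulus are assumptions, not the author's code.

```python
def extended_gcd(a, b):
    """Iterative extended Euclid: returns (g, x, y) with a*x + b*y == g."""
    x0, y0, x1, y1 = 1, 0, 0, 1
    while b:
        q, a, b = a // b, b, a % b
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return a, x0, y0

def montgomery_constants(n, k):
    """Return (n_prime, r2_mod_n) for r = 2**k and an odd modulus n."""
    r = 1 << k
    g, n_inv, _ = extended_gcd(n, r)    # n * n_inv = 1 (mod r)
    if g != 1:
        raise ValueError("n must be odd")
    n_prime = (-n_inv) % r              # n * n_prime = -1 (mod r)
    r2_mod_n = (r * r) % n              # used to convert into Montgomery form
    return n_prime, r2_mod_n

# A double holds integers exactly only up to 2**53, far short of 1024 bits.
print(float(2**53 + 1) == float(2**53))

n = (1 << 1023) + 1155                  # arbitrary odd 1024-bit modulus
n_prime, r2_mod_n = montgomery_constants(n, 1024)
print((n * n_prime) % (1 << 1024) == (1 << 1024) - 1)
```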

Important topics for the NumericType and fimath objects!

• Be aware of the rounding and overflow modes; they are intended to be used with signal processing.

• Be aware of how the numbers expand during the calculation, because:
  • The precision has an impact on the correctness.
  • The precision has an impact on the performance.
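The effect of word growth and overflow mode can be shown with plain integers. This Python sketch does not use the fimath API; `wrap` and `saturate` are hypothetical helpers that imitate the two overflow behaviours on unsigned words.

```python
def wrap(value, bits):
    """Keep only the low `bits` bits (modulo 2**bits overflow)."""
    return value & ((1 << bits) - 1)

def saturate(value, bits):
    """Clamp an unsigned value to the largest `bits`-bit word."""
    return min(value, (1 << bits) - 1)

a = b = 0xFFFF                       # two 16-bit operands
product = a * b                      # exact product: 0xFFFE0001

print(product.bit_length())          # word length the exact product needs
print(hex(wrap(product, 16)))        # low word only
print(hex(saturate(product, 16)))    # clamped to the 16-bit maximum
```

A 16 x 16-bit product needs 32 bits. Saturating, which is reasonable for a signal, gives a value that is wrong for modular arithmetic. Wrapping keeps the low word but loses the high word unless it is carried into the next limb.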

The Engine – Schoolbook multiplication
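Schoolbook multiplication builds a wide product from narrow partial products. The Python sketch below assumes 16-bit limbs (matching the 16-bit multipliers mentioned under Major Results); the helper names are assumptions, and the loops run sequentially where hardware would run partial products in parallel.

```python
LIMB_BITS = 16
LIMB_MASK = (1 << LIMB_BITS) - 1

def to_limbs(x, count):
    """Split x into `count` little-endian 16-bit limbs."""
    return [(x >> (LIMB_BITS * i)) & LIMB_MASK for i in range(count)]

def from_limbs(limbs):
    """Reassemble an integer from little-endian limbs."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))

def schoolbook_mul(a_limbs, b_limbs):
    """Multiply limb by limb, propagating a carry along each row."""
    result = [0] * (len(a_limbs) + len(b_limbs))
    for i, a in enumerate(a_limbs):
        carry = 0
        for j, b in enumerate(b_limbs):
            acc = result[i + j] + a * b + carry   # always fits in 32 bits
            result[i + j] = acc & LIMB_MASK
            carry = acc >> LIMB_BITS
        result[i + len(b_limbs)] = carry          # top limb of this row
    return result

x = (1 << 1024) - 12345
y = (1 << 1023) + 987654321
limbs = schoolbook_mul(to_limbs(x, 64), to_limbs(y, 64))
print(from_limbs(limbs) == x * y)
```

Two 1024-bit operands take 64 x 64 = 4096 limb products, which is where a large pool of hardware multipliers pays off.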

HW in the LOOP

JTAG

Perspectives

• The title is "Using Matlab to aid the implementation of a fast RSA Cryptocore".

• The title should have been "Using Matlab to do the implementation of a fast RSA Cryptocore".

• An advanced encryption algorithm can be implemented using Matlab/Simulink.

• For commercial SSL offload engines, certification is a must.

• The core can be implemented as an off-the-shelf service.

Conclusion

• Correct use of Simulink with the hdlCoder results in a FAST and efficient core.

• Simulink runs faster than a comparable VHDL simulation.

• More tests can be performed during the same time.

• Using a faster model-based approach makes programming more efficient.

• You must have knowledge of the mapping from Simulink blocks into HDL blocks, and the result will also depend on your synthesis tool!

• You do not need to spend time digging into subtle VHDL constructs.

• The result is virtually generic.

Questions?

• Teknologisk Institut Denmark: Taastrup, Aarhus, Kolding, Herning, Odense, Hirtshals

• Teknologisk Institut AB, formerly SIFU

• Swedcert AB

• FIRMA 2000 Poland

http://www.teknologisk.dk

http://www.teknologiskinstitut.se

[email protected]