Post on 28-Nov-2015
© 2008 The MathWorks, Inc.
Using Matlab to Aid the Implementation of a Fast RSA Cryptocore
Carsten Siggaard, Senior Consultant, Danish Technological Institute (DTI)
Danish Technological Institute (DTI)
• Knowledge Development
• Knowledge Application
• Knowledge Transfer
Why Implement RSA on a Field Programmable Gate Array?
• FPGAs are inherently parallel, which can make them faster than general-purpose processors even though they run at a much lower clock speed.
• Consider a system using RSA encryption: if you can place the encryption on a separate FPGA, the CPU on this platform can perform other tasks.
• RSA is a difficult algorithm to implement on FPGAs, much more difficult than the Advanced Encryption Standard (Rijndael, AES) or Blowfish. Therefore, if you can implement RSA, virtually any encryption standard can be implemented.
• The core calculations in RSA are the same as those performed in other cryptographic schemes such as Diffie-Hellman key exchange and El-Gamal.
Major Results
• Theoretical maximum: 3,150,000 ops/s (Altera Stratix IV E with 1360 16-bit multipliers).
• 50% usage (on Xilinx XC4SX35)
  • 1024-bit message
  • 1024-bit modulus, 5-bit public exponent
• Compare with AMD Opteron 2.8 GHz: 26,000 ops/s
• At 200 MHz, 50,000 operations can be performed.
• Power consumption: 1 W (Xilinx power estimator using simulated data).
• The core can perform 35,000 cryptographic operations per second.
MAX 90 W
Used Toolboxes and Blocksets
• Matlab
• Fixed Point Toolbox: modelling large integers.
• Simulink
  • Fixed Point Blockset: modelling (large) integers.
• Stateflow was used to implement the controller.
• hdlCoder: generating generic VHDL code.
• Xilinx System Generator (Sysgen) for hardware-in-the-loop (HIL).
Development Issues
• In cryptography all numbers are usually either bit fields or integers modulo n. Therefore, use a toolbox like Fixed Point Toolbox to model these numbers.
• Model the algorithm in Matlab.
• Model the algorithm in Simulink/Stateflow, and compare the results with the results from the Matlab model.
• Generate the code and run it.
RSA Key Exchange (RFC 4432)
[Figure: message flow of the RSA key exchange]
• Bob holds a, b, p; Bob's public key is b, p
• Random bytes K
• Put K into message m
• c = m^b mod p
• m = c^a mod p
• Signed exchange hash
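The two modular exponentiations in this exchange can be tried out with a toy example. The following is a minimal sketch using the slide's notation (p as modulus, b as public exponent, a as private exponent); the concrete numbers and the function names `encrypt` and `decrypt` are illustrative assumptions and are far too small for real use.

```python
# Toy RSA key transport using the slide's notation:
#   p = modulus, b = public exponent, a = private exponent.
# All values are illustrative assumptions; real moduli have 1024 bits or more.
p = 61 * 53   # modulus 3233, product of two small primes
b = 17        # public exponent (5 bits)
a = 2753      # private exponent, chosen so that (a * b) % 3120 == 1


def encrypt(m, b, p):
    """Sender side: c = m^b mod p."""
    return pow(m, b, p)


def decrypt(c, a, p):
    """Receiver side: m = c^a mod p."""
    return pow(c, a, p)


m = 65                          # stands in for the message carrying the random bytes K
c = encrypt(m, b, p)            # ciphertext sent to Bob
recovered = decrypt(c, a, p)    # Bob recovers the message with the private exponent
```

Only the holder of a can undo the exponentiation by b, which is why the random bytes K can be sent over an open channel.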
What is the engine in RSA, Diffie-Hellman and El-Gamal?
X^n mod m
Discrete logarithm modulo m is DIFFICULT
The Usual Approach
• To calculate an exponentiation modulo m, repeatedly do:
  1. X*X (square and multiply)
  2. Reduce modulo m by trial division or Barrett's algorithm
• For small numbers this can be done efficiently.
• For large numbers (such as the 1024-bit operands used here) the reduction step becomes costly.
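The square-and-multiply loop described above can be sketched as follows. This is a minimal illustration, not the core's implementation: the function name is hypothetical, and Python's `%` operator stands in for the reduction step (trial division or Barrett's algorithm).

```python
def modexp_square_and_multiply(x, n, m):
    """Compute x**n mod m by scanning the bits of n from most to least significant."""
    result = 1 % m
    for bit in bin(n)[2:]:
        result = (result * result) % m    # square, then reduce modulo m
        if bit == "1":
            result = (result * x) % m     # multiply, then reduce modulo m
    return result
```

Every iteration needs at least one full-width reduction, which is the expensive part for large operands and the motivation for a reduction method that avoids division.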
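The Montgomery Algorithm
[Figure: block diagram of the Montgomery multiplication datapath (multipliers, adder, subtractor, divider; signals x, y, n, n', r, t, m, y1, y2)]
• Calculates (a*r) * (b*r) * r^(-1) mod n
• Result is (a*b*r) mod n
• Be aware of timing/power attacks!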
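A Montgomery multiplication of this kind can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the core's datapath: r is taken to be a power of two (r = 2^k > n), n is odd, the function names are hypothetical, and `pow(n, -1, r)` requires Python 3.8 or later.

```python
def montgomery_setup(n, k):
    """Precompute constants for an odd modulus n and r = 2**k > n."""
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r    # n * n_prime == -1 (mod r)
    r2 = (r * r) % n                  # used to convert values into Montgomery form
    return r, n_prime, r2


def montgomery_multiply(x, y, n, n_prime, k):
    """Return x * y * r^-1 mod n, with r = 2**k, for 0 <= x, y < n."""
    mask = (1 << k) - 1
    t = x * y
    m = ((t & mask) * n_prime) & mask   # m = (t mod r) * n' mod r
    u = (t + m * n) >> k                # t + m*n is divisible by r, so the shift is exact
    return u - n if u >= n else u       # final conditional subtraction


# Usage with toy values (assumptions for illustration only).
n, k = 3233, 12
r, n_prime, r2 = montgomery_setup(n, k)
a, b = 65, 1234
a_bar = montgomery_multiply(a, r2, n, n_prime, k)           # a*r mod n
b_bar = montgomery_multiply(b, r2, n, n_prime, k)           # b*r mod n
ab_bar = montgomery_multiply(a_bar, b_bar, n, n_prime, k)   # a*b*r mod n
ab = montgomery_multiply(ab_bar, 1, n, n_prime, k)          # a*b mod n
```

The reduction uses only multiplications, a mask and a shift, which is what makes it attractive for hardware. The final conditional subtraction is data dependent, which is one place where the timing and power concerns mentioned above can arise.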
Matlab Development
• Matlab's built-in GCD is based upon floats (double).
  • A GCD must be created which uses the fi type.
• R^2 mod n must be calculated.
  • Create a function which uses the fi type.
• A helper function generates stimuli structures for Simulink.
• The Montgomery algorithm was implemented in Matlab so that its results could be compared with the results from Simulink.
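The helper computations listed above can be sketched with exact integer arithmetic. The sketch below uses Python integers as a stand-in for the fi type; the function names are hypothetical, and the assumption R = 2^k with odd n is mine rather than stated on the slide.

```python
def extended_gcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) and s*a + t*b == g, using integers only."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t


def montgomery_constants(n, k):
    """For odd n and R = 2**k, return (n_prime, r2) with
    n * n_prime == -1 (mod R) and r2 == R**2 mod n."""
    big_r = 1 << k
    g, s, _ = extended_gcd(n, big_r)
    if g != 1:
        raise ValueError("n must be odd so that gcd(n, R) == 1")
    n_prime = (-s) % big_r
    # R^2 mod n by 2k modular doublings, so no intermediate value reaches 2n.
    r2 = 1 % n
    for _ in range(2 * k):
        r2 <<= 1
        if r2 >= n:
            r2 -= n
    return n_prime, r2
```

Computing R^2 mod n by repeated doubling keeps every intermediate value below 2n, which bounds the word length needed in a fixed-point model.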
Important topics for the NumericType and fimath objects!
• Be aware of the round and overflow modes; they are intended to be used with signal processing.
• Be aware of how the numbers expand during the calculation, because:
  • The precision has an impact on the correctness.
  • The precision has an impact on the performance.
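A small worked example shows why floating-point types and default word lengths are not enough here. It is a sketch with illustrative values: Python integers stand in for fixed-point words, and k = 1024 matches the operand size quoted in the results.

```python
# A double has a 53-bit significand, so large integers lose their low bits.
x = 2**53 + 1
lost = int(float(x)) != x                 # True: the low bit is rounded away

# Word-length growth for k-bit operands.
k = 1024
a = (1 << k) - 1                          # largest k-bit operand
b = (1 << k) - 1
product_bits = (a * b).bit_length()       # a product needs 2k bits
sum_bits = (a * b + a * b).bit_length()   # adding two such products needs 2k + 1 bits
```

If a word is one bit too short, saturation or wrap-around silently gives a wrong result; if it is much wider than needed, the generated hardware grows accordingly.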
Perspectives
• The title is "Using Matlab to aid the implementation of a fast RSA Cryptocore".
• The title should have been "Using Matlab to do the implementation of a fast RSA Cryptocore".
• An advanced encryption algorithm can be implemented using Matlab/Simulink.
• For commercial SSL offload engines, certification is a must.
• The core can be implemented as an off-the-shelf service.
Conclusion
• Correct use of Simulink with the hdlCoder results in a FAST and efficient core.
• Simulink runs faster than a comparable VHDL simulation.
  • More tests can be performed during the same time.
• Using a faster model-based approach makes programming more efficient.
• You must have knowledge of the mapping from Simulink blocks into HDL blocks, and the result will also depend on your synthesis tool!
• You do not need to spend time digging into subtle VHDL constructs.
• The generated VHDL is virtually generic.