Running OpenSSL Crypto Algorithms in Simplescalar

Piyush Ranjan Satapathy

Department of Computer Science & Engineering

University of California Riverside

04/19/23 CS213: "Parallel processing Architecture" By Dr Laxmi Narayan Bhuyan (Winter 2005) University of California Riverside

Outline

- What are crypto algorithms?
- Why do we need to run them on SimpleScalar?
- Any previous work on this?
- Introducing OpenSSL 0.9.7e
- Introducing SimpleScalar version 2.0
- Selecting the crypto algorithms from OpenSSL
- Simulation settings and parameters
- Results & discussions
- An interesting comparison
- Demo
- Conclusion
- Acknowledgement and references
- Q&A

What Are Crypto Algorithms?

Algorithms meant for network security:
1. Authentication
2. Secrecy
3. Nonrepudiation
4. Integrity control

Kinds of crypto algorithms that address the above:
1. Public-key algorithms (e.g., RSA, DSS, LUC…)
2. Secret-key algorithms (e.g., AES, DES, RC4, SEAL…)
3. Cryptographic hash functions (e.g., MD5, SHA1…)
4. Random number generators (e.g., PGP, Noiz, SSH…)

Secret-key algorithms:
1. Block ciphers (e.g., IDEA, DES, AES, Blowfish…)
2. Stream ciphers (e.g., RC4, SEAL, A5)

Many commonly used ciphers (e.g., IDEA, DES, Blowfish) are block ciphers. This means that they take a fixed-size block of data (64 bits for these three ciphers) and transform it into another 64-bit block using a function selected by the key. For each key, the cipher therefore defines a one-to-one mapping (a permutation) on the set of 64-bit integers.
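The claim that a key selects a one-to-one mapping on 64-bit blocks can be made concrete with a Feistel network, the structure DES and Blowfish are built on. The sketch below is a hypothetical toy: the round function `toy_round_f`, the key schedule `toy_subkey` and the round count are invented for illustration, and the result is neither DES nor secure. It only shows that, whatever the round function computes, running the rounds backwards undoes them, so every key yields a permutation of the 64-bit values.

```c
#include <stdint.h>

#define TOY_ROUNDS 8  /* hypothetical round count */

/* Toy round function: mixes a 32-bit half with a 32-bit subkey.
   It need not be invertible; the Feistel structure provides invertibility. */
static uint32_t toy_round_f(uint32_t half, uint32_t subkey) {
    uint32_t x = half ^ subkey;
    x = (x << 7) | (x >> 25);   /* rotate left by 7 bits */
    return x * 0x9E3779B1u;     /* multiplicative mixing, wraps mod 2^32 */
}

/* Hypothetical key schedule: derives one 32-bit subkey per round. */
static uint32_t toy_subkey(uint64_t key, int round) {
    uint32_t half = (uint32_t)(key >> ((round % 2) ? 32 : 0));
    return half + (uint32_t)round * 0x7F4A7C15u;
}

/* Each round maps (L, R) to (R, L ^ F(R, k_r)). */
uint64_t toy_encrypt(uint64_t block, uint64_t key) {
    uint32_t left = (uint32_t)(block >> 32), right = (uint32_t)block;
    for (int r = 0; r < TOY_ROUNDS; r++) {
        uint32_t next_right = left ^ toy_round_f(right, toy_subkey(key, r));
        left = right;
        right = next_right;
    }
    return ((uint64_t)left << 32) | right;
}

/* Undoes the rounds in reverse order: (L', R') -> (R' ^ F(L', k_r), L'). */
uint64_t toy_decrypt(uint64_t block, uint64_t key) {
    uint32_t left = (uint32_t)(block >> 32), right = (uint32_t)block;
    for (int r = TOY_ROUNDS - 1; r >= 0; r--) {
        uint32_t prev_left = right ^ toy_round_f(left, toy_subkey(key, r));
        right = left;
        left = prev_left;
    }
    return ((uint64_t)left << 32) | right;
}
```

Decryption always recovers the input, so two different plaintexts can never encrypt to the same ciphertext under one key. That is exactly the one-to-one property described above.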
Why Run on SimpleScalar?

Architectural analysis of crypto algorithms: to design the best network processor, we need an architectural analysis of crypto algorithms with cycle-level accuracy.

SimpleScalar makes simulation easy: it is fast, flexible and accurate.

SimpleScalar provides cycle-level accurate simulation of a MIPS-like processor.

We are not concerned with parallel programming; otherwise we could have used Simics.

Previous Work on Architectural Analysis of Crypto Algorithms:

Analysis of widely available crypto algorithms by Haiyong Xie et al. (referred to here as “Average”).

Analysis of the performance of SSL crypto algorithms compared with SPECint and CommBench (Li Zhao et al.).

However, there has been no architectural analysis of the OpenSSL crypto algorithms.

OpenSSL has now become the standard benchmark for crypto engines.

So an architectural analysis of these algorithms helps in understanding the needs of modern network processors that deal with cryptography.

Introducing OpenSSL 0.9.7e

A widely used open-source implementation of crypto algorithms (I have used the most recent version).

OpenSSL is a cryptography toolkit implementing the Secure Sockets Layer (SSL v2/v3) and Transport Layer Security (TLS v1) network protocols and the related cryptography standards they require. The openssl program is a command-line tool for using the various cryptography functions of OpenSSL's crypto library from the shell. It can be used for:
- Creation of RSA, DH and DSA key parameters
- Creation of X.509 certificates, CSRs and CRLs
- Calculation of message digests
- Encryption and decryption with ciphers
- SSL/TLS client and server tests
- Handling of S/MIME signed or encrypted mail

I used the library to port the crypto algorithms to SimpleScalar.

Introducing SimpleScalar 2.0

Compiling: sslittle-na-sstrix-gcc foo.c -o foo

Running: sim-outorder foo

Selecting OpenSSL Crypto Algorithms:

Private-key block ciphers:

AES (key length: 128 bits; block size: 128 bits)

DES (key length: 56 bits, 64 with parity; block size: 64 bits)

3DES (key length: 168 bits; block size: 64 bits)

IDEA (key length: 128 bits; block size: 64 bits)

Stream cipher: RC4 (key length: 128 bits)

Hash functions: MD5 (block size: 512 bits; digest size: 128 bits)

SHA1 (block size: 512 bits; digest size: 160 bits)
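RC4, the stream cipher in the list above, is small enough to sketch in full. The code below is the textbook key-scheduling (KSA) and keystream-generation (PRGA) algorithm. It is not OpenSSL's own implementation, which the library exposes through `RC4_set_key()` and `RC4()`. The type and function names are hypothetical. Each output byte is produced by table lookups and a swap in a 256-byte state array plus one XOR.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t S[256];  /* permutation of 0..255 */
    uint8_t i, j;    /* keystream indices */
} rc4_state;

/* Key-scheduling algorithm (KSA); keylen must be 1..256 bytes
   (16 bytes for the 128-bit key listed on this slide). */
void rc4_init(rc4_state *st, const uint8_t *key, size_t keylen) {
    uint8_t j = 0;
    for (int k = 0; k < 256; k++)
        st->S[k] = (uint8_t)k;
    for (int k = 0; k < 256; k++) {
        j = (uint8_t)(j + st->S[k] + key[k % keylen]);
        uint8_t t = st->S[k];
        st->S[k] = st->S[j];
        st->S[j] = t;
    }
    st->i = st->j = 0;
}

/* Pseudo-random generation algorithm (PRGA): XORs the keystream into
   `data` in place, so the same call both encrypts and decrypts. */
void rc4_crypt(rc4_state *st, uint8_t *data, size_t len) {
    for (size_t n = 0; n < len; n++) {
        st->i = (uint8_t)(st->i + 1);
        st->j = (uint8_t)(st->j + st->S[st->i]);
        uint8_t t = st->S[st->i];
        st->S[st->i] = st->S[st->j];
        st->S[st->j] = t;
        data[n] ^= st->S[(uint8_t)(st->S[st->i] + st->S[st->j])];
    }
}
```

With the key "Key", the plaintext "Plaintext" encrypts to the bytes BB F3 16 E8 D9 40 AF 0A D3, a commonly published RC4 test vector. Re-initializing with the same key and running `rc4_crypt` again restores the plaintext.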

Simulation Settings & Parameters

Settings:
- Wrote a separate module for each algorithm using the OpenSSL crypto library.
- Compiled it with the SimpleScalar gcc 2.7.2.3 tool chain, ran the binary on the simulator and gave it a file as input.
- Input file length varies from 1 byte to 256 KB.
- Most readings are taken with a 1-byte input file.
- Changed different SimpleScalar parameters on the command line and observed the readings.

Parameters used:

Parameter | Values
ALU | 1, 2, 4, 8
IFQ size | 1, 2, 4, …, 32
ILP | ALU and IFQ changed at the same time
Branch prediction type | not taken, taken, 2lev, bimodal, combinational
L1 cache size (L1I & L1D) | 4, 8, …, 256 KB
L1 line size | 8, 16, …, 64 bytes
L1 sets (associativity) | 1, 2, 4, 8, 16
L1 replacement policy | l, r, f (LRU, random, FIFO)
Unified L2 cache size (UL2) | 4, 8, …, 2048 KB
UL2 replacement policy | l, f, r
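The table describes each cache by total size, line size and associativity. sim-outorder instead takes a cache as `<name>:<nsets>:<bsize>:<assoc>:<repl>`, for example via `-cache:il1`, so the number of sets has to be derived: nsets = size / (line size × associativity). Below is a minimal helper under three assumptions: the slides' "Sets" row means associativity, the simulator wants power-of-two set counts and line sizes, and the name `cache_spec` is hypothetical.

```c
#include <stdio.h>

static int is_pow2(unsigned long x) { return x != 0 && (x & (x - 1)) == 0; }

/* Writes "<name>:<nsets>:<bsize>:<assoc>:<repl>" into `out`.
   size_kb: total capacity in KB; line_bytes: block size; ways: associativity;
   repl: 'l' (LRU), 'f' (FIFO) or 'r' (random).
   Returns 0 on success, -1 if the geometry is invalid or `out` is too small. */
int cache_spec(char *out, size_t outlen, const char *name, unsigned long size_kb,
               unsigned long line_bytes, unsigned long ways, char repl) {
    if (line_bytes == 0 || ways == 0)
        return -1;
    unsigned long bytes = size_kb * 1024UL;
    unsigned long per_set = line_bytes * ways;    /* bytes held by one set */
    if (bytes % per_set != 0)
        return -1;
    unsigned long nsets = bytes / per_set;
    if (!is_pow2(nsets) || !is_pow2(line_bytes))  /* assumed simulator constraint */
        return -1;
    if (repl != 'l' && repl != 'f' && repl != 'r')
        return -1;
    int n = snprintf(out, outlen, "%s:%lu:%lu:%lu:%c", name, nsets, line_bytes, ways, repl);
    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}
```

Take the 256 KB, 32-byte-line, 4-way instruction cache used in the replacement-policy runs. For it, the helper gives `il1:2048:32:4:l`, i.e. the option `-cache:il1 il1:2048:32:4:l`. SimpleScalar's default configuration makes the L2 unified by pointing `-cache:il2` at `dl2`.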

Results & Discussions: (1)

1. Instruction Set Characteristics:

- Comparison with Average, SPECint & CommBench
- “Average” represents Li’s work
- SSLCrypto represents the average over all the OpenSSL algorithms I considered.

Observation:

* SSLCrypto algorithms have a significant share of memory references (~40%).
* Intensive arithmetic computation, but less than Average.

[Figure: Instruction Mix Comparison. Percentage (0 to 100%) of load, store, unconditional branch, conditional branch and integer computation instructions for SSLCrypto, Average, SPECint and CommBench.]
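The percentages in an instruction-mix chart like this are dynamic instruction counts per class divided by the total. The memory-reference share quoted above (~40%) is (loads + stores) / total. A small sketch, using a hypothetical `insn_mix_t` struct that holds the five classes in the legend. The slides do not say how the SSLCrypto column was averaged, so the unweighted mean here is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Dynamic instruction counts for the classes shown in the chart legend. */
typedef struct {
    uint64_t load, store, uncond_branch, cond_branch, int_compute;
} insn_mix_t;

static uint64_t mix_total(const insn_mix_t *m) {
    return m->load + m->store + m->uncond_branch + m->cond_branch + m->int_compute;
}

/* Percentage of memory references (loads + stores) in one program's mix. */
double mem_ref_pct(const insn_mix_t *m) {
    uint64_t total = mix_total(m);
    return total ? 100.0 * (double)(m->load + m->store) / (double)total : 0.0;
}

/* Unweighted mean of per-algorithm memory-reference percentages,
   one assumed way to build an "SSLCrypto" average column. */
double avg_mem_ref_pct(const insn_mix_t *mixes, size_t n) {
    double sum = 0.0;
    for (size_t k = 0; k < n; k++)
        sum += mem_ref_pct(&mixes[k]);
    return n ? sum / (double)n : 0.0;
}
```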

Results & Discussions: (2,3)

2. Comparison of Instruction Mix:

- Plotted the instruction mix for all the block, stream and hash algorithms.

Observation:

- DES and 3DES have a high share of memory references.
- IDEA has a significant share of conditional branches.

3. Cycles per Byte of Computation

- 3DES takes more cycles because it has to process the data 3 times with 3 different keys.
- Block ciphers require more cycles than stream ciphers and hash functions.

[Figure: Block, Stream & Hash Algorithm Instruction Mix. Percentage (0 to 100%) of load, store, conditional branch and integer computation instructions for AES, DES, 3DES, RC4, IDEA, MD5 and SHA1.]

[Figure: Computational Complexity of the Algorithms per Byte. y-axis: cycles (0 to 90); x-axis: AES, DES, 3DES, RC4, IDEA, MD5, SHA1.]
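Cycles per byte is the simulated cycle count divided by the number of input bytes processed. sim-outorder reports the total cycle count as its `sim_cycle` statistic. Combining it with a clock rate gives throughput. Below is a minimal sketch; the function names and the clock frequency in the example are hypothetical, not values from the study.

```c
#include <stdint.h>

/* Simulated cycles spent per input byte. */
double cycles_per_byte(uint64_t sim_cycles, uint64_t input_bytes) {
    return input_bytes ? (double)sim_cycles / (double)input_bytes : 0.0;
}

/* Throughput in MB/s (10^6 bytes per second) for a core clocked at clock_mhz:
   (clock_mhz * 10^6 cycles/s) / (cpb cycles/byte) / 10^6 = clock_mhz / cpb. */
double throughput_mb_per_s(double cpb, double clock_mhz) {
    return cpb > 0.0 ? clock_mhz / cpb : 0.0;
}
```

For example, 5,500 cycles over a 100-byte input is 55 cycles per byte, the block-cipher figure in the comparison table. At a hypothetical 1100 MHz clock, that is 20 MB/s.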

Results & Discussions: (4,5)

4. IPC vs. ALUs:

- IPC increases by 26%, 37% and 40% for block, stream and hash algorithms, respectively, when the number of ALUs increases from 1 to 2.
- It increases by 6%, 10% and 5% when the number of ALUs increases from 2 to 4.
- With more than 4 ALUs, IPC increases by less than 1%.

5. IPC vs. IFQ Size:

- IPC increases by 26%, 37% and 40% for block, stream and hash algorithms, respectively, when the instruction fetch queue (IFQ) size changes from 1 to 2.
- It increases by 6%, 10% and 5% when the IFQ size changes from 2 to 4.
- After that, it changes by no more than 2%.

[Figure: Average ILP with Different Numbers of Integer ALUs. y-axis: instructions per cycle (0 to 3); x-axis: ALUs (1, 2, 4, 8); series: AES, DES, 3DES, RC4, IDEA, MD5, SHA1.]

[Figure: Impact of Instruction Fetch Queue on Average ILP. y-axis: instructions per cycle (0 to 3); x-axis: IFQ size (1, 2, 4, 8, 16, 32); series: AES, DES, 3DES, RC4, IDEA, MD5, SHA1.]
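The "enough" conclusions above amount to finding where the marginal IPC gain drops below a threshold. Below is a sketch of that rule; the function names are hypothetical, and the threshold is the reader's choice. The IPC values in the usage example are invented, not read off the plots.

```c
#include <stddef.h>

/* Relative IPC change, in percent, from one configuration to the next
   (ipc_before must be > 0). */
double ipc_gain_pct(double ipc_before, double ipc_after) {
    return (ipc_after - ipc_before) / ipc_before * 100.0;
}

/* Returns the index of the first sweep point from which every further step
   gains less than threshold_pct, i.e. the smallest "good enough" setting.
   ipc[k] is the IPC measured at the k-th setting (e.g. 1, 2, 4, 8 ALUs). */
size_t knee_index(const double *ipc, size_t n, double threshold_pct) {
    if (n == 0)
        return 0;
    for (size_t i = 0; i + 1 < n; i++) {
        int later_gains_small = 1;
        for (size_t j = i; j + 1 < n; j++) {
            if (ipc_gain_pct(ipc[j], ipc[j + 1]) >= threshold_pct) {
                later_gains_small = 0;
                break;
            }
        }
        if (later_gains_small)
            return i;
    }
    return n - 1;  /* gains never fell below the threshold */
}
```

Suppose the IPC values are {1.00, 1.30, 1.40, 1.41} for 1, 2, 4 and 8 ALUs, with a 1% threshold. Then `knee_index` returns 2, i.e. 4 ALUs.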

Results & Discussions: (6)

6. IPC vs. ILP:

- An ILP of 4 means 4 ALUs and an IFQ size of 4 (both changed together).
- An ILP of 4 is enough to get the best instructions-per-cycle value.

[Figure: Overall Impact of ILP. y-axis: instructions per cycle (0 to 3); x-axis: ILP (1, 2, 4, 8); series: AES, DES, 3DES, RC4, IDEA, MD5, SHA1.]
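Each point of these sweeps is one simulator run with different command-line options. The sketch below formats such a run, assuming three sim-outorder options:
- `-res:ialu` sets the number of integer ALUs.
- `-fetch:ifqsize` sets the instruction fetch queue size.
- `-bpred` sets the predictor type: `nottaken`, `taken`, `bimod`, `2lev` or `comb`.

An "ILP" point sets the first two to the same value. The helper names and the binary name `foo` are illustrative.

```c
#include <stdio.h>
#include <string.h>

/* Formats one sim-outorder invocation. Returns 0 on success, -1 if the
   predictor name is not one of the types swept on these slides or the
   buffer is too small. */
int sim_command(char *out, size_t outlen, unsigned ialu, unsigned ifqsize,
                const char *bpred, const char *program) {
    static const char *const kinds[] = {"nottaken", "taken", "bimod", "2lev", "comb"};
    int known = 0;
    for (size_t k = 0; k < sizeof kinds / sizeof kinds[0]; k++)
        if (strcmp(bpred, kinds[k]) == 0)
            known = 1;
    if (!known)
        return -1;
    int n = snprintf(out, outlen, "sim-outorder -res:ialu %u -fetch:ifqsize %u -bpred %s %s",
                     ialu, ifqsize, bpred, program);
    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}

/* One point of the ILP sweep: ALUs and IFQ size changed together. */
int ilp_command(char *out, size_t outlen, unsigned ilp, const char *bpred,
                const char *program) {
    return sim_command(out, outlen, ilp, ilp, bpred, program);
}
```

Looping `ilp` over 1, 2, 4 and 8 yields the command lines for the ILP sweep. For other SimpleScalar versions, check the option names against the simulator's own option listing.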

Results & Discussions: (7)

7. Branch Prediction Hit Rate:

- Bimodal and combinational predictors give a better hit rate.
- The two-level (2lev) predictor gives an almost equally good hit rate.
- Simple taken or not-taken prediction does not do well.
- So more sophisticated branch predictors need to be considered.

[Figure: Branch Prediction Hit Rate. y-axis: hit rate % (0 to 100); x-axis: AES, DES, 3DES, RC4, IDEA, MD5, SHA1; predictors: not taken, 2lev, taken, combinational, bimodal.]
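A bimodal predictor keeps a table of 2-bit saturating counters indexed by branch address, so each branch learns its own bias. The static taken and not-taken schemes apply one guess to every branch. Below is a minimal sketch of that idea, driven by a hypothetical two-branch loop. The table size, initial counter value and index shift are assumptions, and this is not SimpleScalar's bpred code.

```c
#include <stddef.h>
#include <stdint.h>

#define BIMOD_ENTRIES 2048u  /* hypothetical table size; must be a power of two */

typedef struct {
    uint8_t counter[BIMOD_ENTRIES];  /* 2-bit saturating counters, values 0..3 */
} bimod_t;

void bimod_init(bimod_t *p) {
    for (size_t i = 0; i < BIMOD_ENTRIES; i++)
        p->counter[i] = 1;  /* assumption: every entry starts "weakly not taken" */
}

/* Index by branch address, dropping low alignment bits (shift of 3 assumed). */
static size_t bimod_index(uint32_t pc) {
    return (pc >> 3) & (BIMOD_ENTRIES - 1);
}

int bimod_predict(const bimod_t *p, uint32_t pc) {
    return p->counter[bimod_index(pc)] >= 2;  /* 2 or 3 means "taken" */
}

void bimod_update(bimod_t *p, uint32_t pc, int taken) {
    uint8_t *c = &p->counter[bimod_index(pc)];
    if (taken && *c < 3)
        (*c)++;
    else if (!taken && *c > 0)
        (*c)--;
}

/* Hypothetical workload: 10 runs of a 10-iteration loop. Each iteration
   executes a never-taken check branch (pc 0x1000) and the loop-closing
   branch (pc 0x2000), which is taken on all but the last iteration.
   Returns the number of correct predictions out of 200. */
unsigned bimod_demo_hits(void) {
    bimod_t p;
    unsigned hits = 0;
    bimod_init(&p);
    for (int run = 0; run < 10; run++) {
        for (int it = 0; it < 10; it++) {
            const uint32_t pcs[2] = {0x1000u, 0x2000u};
            const int outcomes[2] = {0, it < 9};
            for (int b = 0; b < 2; b++) {
                hits += (unsigned)(bimod_predict(&p, pcs[b]) == outcomes[b]);
                bimod_update(&p, pcs[b], outcomes[b]);
            }
        }
    }
    return hits;
}
```

On this trace the sketch predicts 189 of 200 branches correctly. Always predicting taken would be right 90 times, and always predicting not taken 110 times. A single static guess cannot fit both a mostly-taken and a never-taken branch.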

Results & Discussions: (8,9)

8. L1 Instruction Cache Size Behavior:

- Cache size changed while keeping a fixed 64-byte line size, 4-way set associativity and l (LRU) replacement.
- We can observe that 128 KB is enough to reach the best performance level.

9. L1 Instruction Cache Line Size:

- Line size changed while keeping a fixed 256 KB cache size, 4-way set associativity and l (LRU) replacement.
- We can observe that a 32-byte line size is enough to reach the lowest possible miss rate.

[Figure: Impact of Cache Size on Miss Rate (L1 instruction cache). y-axis: miss rate % (0 to 14); x-axis: cache size in KB (4, 8, 16, 32, 64, 128, 256); series: AES, DES, 3DES, RC4, IDEA, MD5, SHA1.]

[Figure: Impact of Block Size on Miss Rate (L1 instruction cache). y-axis: miss rate % (0 to 45); x-axis: block size in bytes (8, 16, 32, 64); series: AES, DES, 3DES, RC4, IDEA, MD5, SHA1.]
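Cache size, line size and associativity together fix how an address is split into tag, set index and byte offset:
- The offset uses log2(line size) bits.
- The set index uses log2(number of sets) bits, where number of sets = size / (line size × associativity).

Below is a small sketch with hypothetical names, assuming power-of-two geometry.

```c
#include <stdint.h>

typedef struct {
    uint32_t tag, set, offset;
} cache_addr_t;

/* log2 of a power of two no larger than 2^31. */
static unsigned log2_u32(uint32_t x) {
    unsigned bits = 0;
    while ((1u << bits) < x)
        bits++;
    return bits;
}

/* Splits `addr` for a cache of size_bytes capacity, line_bytes lines and
   `ways`-way associativity (all powers of two). */
cache_addr_t split_address(uint32_t addr, uint32_t size_bytes,
                           uint32_t line_bytes, uint32_t ways) {
    uint32_t nsets = size_bytes / (line_bytes * ways);
    unsigned off_bits = log2_u32(line_bytes);
    unsigned set_bits = log2_u32(nsets);
    cache_addr_t a;
    a.offset = addr & (line_bytes - 1);
    a.set = (addr >> off_bits) & (nsets - 1);
    a.tag = (uint32_t)((uint64_t)addr >> (off_bits + set_bits));
    return a;
}
```

At a fixed capacity and associativity, doubling the line size moves one bit from the set index to the offset. Each miss then brings in twice as many neighboring bytes, which is the spatial-locality effect a line-size sweep measures.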

Results & Discussions: (10,11)

10. L1 Instruction Cache Set Associativity Behavior:

- Set associativity changed while keeping a fixed 256 KB cache size, a 32-byte line size and l (LRU) replacement.
- We can observe that 2-way set associativity is enough to reach a miss rate below 5%.

11. L1 Instruction Cache Replacement Policy Behavior:

- Replacement policy changed while keeping a fixed 256 KB cache size, a 32-byte line size and 4-way set associativity.
- We can observe that LRU and FIFO give the same performance, so either one can be chosen.

[Figure: Impact of Set Associativity (L1 instruction cache). y-axis: miss rate % (0 to 8); x-axis: number of sets, i.e., associativity (1, 2, 4, 8, 16); series: AES, DES, 3DES, RC4, IDEA, MD5, SHA1.]

[Figure: Impact of Replacement Policy (L1 instruction cache). y-axis: miss rate % (0 to 3.5); x-axis: AES, DES, 3DES, RC4, IDEA, MD5, SHA1; policies: LRU, FIFO, random.]
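LRU and FIFO differ only in what a hit does. LRU refreshes the line's age, while FIFO keeps its fill time. The two therefore choose different victims only when a line filled early keeps being reused. Below is a minimal set-associative cache model that counts misses under either policy; random replacement is left out to keep the result deterministic. It is an illustrative sketch with hypothetical names, not SimpleScalar's cache module.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef enum { REPL_LRU, REPL_FIFO } repl_t;

typedef struct {
    uint32_t nsets, ways, line_bytes;
    repl_t policy;
    uint32_t *tag;    /* nsets * ways entries */
    uint8_t *valid;   /* 1 if the way holds a line */
    uint64_t *stamp;  /* LRU: time of last use; FIFO: time of fill */
    uint64_t clock;
} cache_t;

static int cache_init(cache_t *c, uint32_t nsets, uint32_t ways,
                      uint32_t line_bytes, repl_t policy) {
    size_t n = (size_t)nsets * ways;
    c->nsets = nsets; c->ways = ways; c->line_bytes = line_bytes;
    c->policy = policy; c->clock = 0;
    c->tag = calloc(n, sizeof *c->tag);
    c->valid = calloc(n, sizeof *c->valid);
    c->stamp = calloc(n, sizeof *c->stamp);
    return (c->tag && c->valid && c->stamp) ? 0 : -1;
}

static void cache_free(cache_t *c) {
    free(c->tag); free(c->valid); free(c->stamp);
}

/* Returns 1 on a hit, 0 on a miss (the missing line is then filled). */
static int cache_access(cache_t *c, uint32_t addr) {
    uint32_t block = addr / c->line_bytes;
    uint32_t set = block % c->nsets, tag = block / c->nsets;
    size_t base = (size_t)set * c->ways, victim = base;
    c->clock++;
    for (size_t w = base; w < base + c->ways; w++) {
        if (c->valid[w] && c->tag[w] == tag) {
            if (c->policy == REPL_LRU)
                c->stamp[w] = c->clock;  /* FIFO leaves the fill time alone */
            return 1;
        }
    }
    for (size_t w = base; w < base + c->ways; w++) {
        if (!c->valid[w]) { victim = w; break; }        /* prefer an empty way */
        if (c->stamp[w] < c->stamp[victim]) victim = w; /* else the oldest stamp */
    }
    c->valid[victim] = 1;
    c->tag[victim] = tag;
    c->stamp[victim] = c->clock;
    return 0;
}

/* Runs an address trace and returns the number of misses,
   or -1 on allocation failure. */
long count_misses(repl_t policy, uint32_t nsets, uint32_t ways,
                  uint32_t line_bytes, const uint32_t *trace, size_t n) {
    cache_t c;
    long misses = 0;
    if (cache_init(&c, nsets, ways, line_bytes, policy) != 0) {
        cache_free(&c);
        return -1;
    }
    for (size_t k = 0; k < n; k++)
        misses += !cache_access(&c, trace[k]);
    cache_free(&c);
    return misses;
}
```

Consider one 2-way set with 16-byte lines and the trace A, B, A, C, A (addresses 0, 16, 0, 32, 0). It misses 3 times under LRU but 4 times under FIFO. FIFO evicts A when C arrives even though A was just reused. When such reuse patterns are rare, the two policies give the same miss rate, as the slides report for these workloads.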

Results & Discussions: (12,13)

12. L1 Data Cache Size Behavior:

- Cache size changed while keeping a fixed 64-byte line size, 1-way (direct-mapped) set associativity and l (LRU) replacement.
- We can observe that 32 KB is enough to reach the best performance level.

13. L1 Data Cache Line Size:

- Line size changed while keeping a fixed 256 KB cache size, 1-way set associativity and l (LRU) replacement.
- We can observe that a 32-byte line size is enough to reach the lowest possible miss rate.

[Figure: Impact of Cache Size on Miss Rate (L1 data cache). y-axis: miss rate % (0 to 25); x-axis: cache size in KB (4, 8, 16, 32, 64, 128, 256); series: AES, DES, 3DES, RC4, IDEA, MD5, SHA1.]

[Figure: Impact of Block Size on Miss Rate (L1 data cache). y-axis: miss rate % (0 to 25); x-axis: block size in bytes (8, 16, 32, 64); series: AES, DES, 3DES, RC4, IDEA, MD5, SHA1.]
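Miss rates like these can be read from the simulator's statistics output. sim-outorder prints per-cache counters such as `dl1.accesses`, `dl1.misses` and `dl1.miss_rate`. Below is a sketch of extracting one value; it assumes each statistic is printed as `<name> <value> # <description>`. The function names are hypothetical, and the slides do not say how the readings were collected.

```c
#include <stdio.h>
#include <string.h>

/* Parses one statistics line of the assumed form "<name> <value> # <description>".
   Returns 1 and stores the value if the name equals `want`, else 0. */
int parse_stat_line(const char *line, const char *want, double *value) {
    char name[128];
    double v;
    if (sscanf(line, "%127s %lf", name, &v) != 2)
        return 0;
    if (strcmp(name, want) != 0)
        return 0;
    *value = v;
    return 1;
}

/* Scans a statistics file line by line for `want`; returns 1 if found. */
int find_stat(FILE *f, const char *want, double *value) {
    char line[512];
    while (fgets(line, sizeof line, f))
        if (parse_stat_line(line, want, value))
            return 1;
    return 0;
}
```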

Results & Discussions: (14,15)

14. L1 Data Cache Set Associativity Behavior:

- Set associativity changed while keeping a fixed 256 KB cache size, a 32-byte line size and l (LRU) replacement.
- We can observe that 2-way set associativity is enough for block and stream ciphers, while 4-way is enough for hash functions.

15. L1 Data Cache Replacement Policy Behavior:

- Replacement policy changed while keeping a fixed 256 KB cache size, a 32-byte line size and 4-way set associativity.
- We can observe that LRU and FIFO give the same performance, so either one can be chosen.

[Figure: Impact of Set Associativity (L1 data cache). y-axis: miss rate % (0 to 7); x-axis: number of sets, i.e., associativity (1, 2, 4, 8, 16); series: AES, DES, 3DES, RC4, IDEA, MD5, SHA1.]

[Figure: Impact of Replacement Policy (L1 data cache). y-axis: miss rate % (0 to 6); x-axis: AES, DES, 3DES, RC4, IDEA, MD5, SHA1; policies: LRU, FIFO, random.]

Results & Discussions: (16,17)

16. Unified L2 (UL2) Cache Size Behavior:

- Cache size changed while keeping a fixed 64-byte line size, 1-way set associativity and l (LRU) replacement.
- We can observe that 512 KB is enough to reach the best performance level.

17. UL2 Cache Replacement Policy Behavior:

- Replacement policy changed while keeping a fixed 512 KB cache size, a 64-byte line size and 4-way set associativity.
- We can observe that LRU and FIFO give the same performance, so either one can be chosen.

[Figure: Impact of Cache Size. y-axis: miss rate % (0 to 80); x-axis: cache size in KB (4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048); series: AES, DES, 3DES, RC4, IDEA, MD5, SHA1.]

[Figure: Impact of Replacement Policy. y-axis: miss rate % (0 to 50); x-axis: AES, DES, 3DES, RC4, IDEA, MD5, SHA1; policies: LRU, FIFO, random.]

An Interesting Comparison:

Observation | Li’s analysis (widely available crypto algorithms) | My analysis (OpenSSL crypto algorithms)
Instruction mix | 23% memory references; 60% arithmetic computation | 40-45% memory references; 68% arithmetic computation
Cycles per byte of computation | Block: 80; Stream: 20; Hash: 18 | Block: 55; Stream: 55; Hash: 30
ALU vs. IPC | Best with 4 ALUs | Best with 4 ALUs
IFQ vs. IPC | Best when IFQ size is 4 | Best when IFQ size is 4
ILP vs. IPC | Best when ILP is 4 | Best when ILP is 8
Branch prediction technique | Simple (taken or not taken) | Complex (bimodal or combinational)
L1 instruction cache parameters | 16 KB cache, 8-byte lines, 4-way, l replacement | 128 KB cache, 32-byte lines, 2-way, l replacement
L1 data cache parameters | 32 KB cache, 8-byte lines, 2-way, l replacement | 32 KB cache, 64-byte lines, 2-way, l replacement
UL2 unified cache parameters | 64 KB cache, l replacement | 512 KB cache, l replacement

Demo Time

Conclusion:

For better architectural performance, crypto engines using OpenSSL crypto algorithms should have:
* 128 KB L1 instruction cache
* 32 KB L1 data cache
* 512 KB UL2 cache
* 2-way set associativity
* l (LRU) replacement policy
* ILP of 8
* Advanced branch prediction schemes

Acknowledgement & References:

A big thanks to Li Zhao.

References:

SimpleScalar Tool Set: http://www.simplescalar.com

OpenSSL: http://www.openssl.org

Haiyong Xie et al., "Architectural Analysis of Cryptographic Applications for Network Processors."

Li Zhao, Ravi Iyer, Srihari Makineni, Laxmi Bhuyan, "Anatomy and Performance of SSL Processing."

Q&A

