Federal Democratic Republic of Ethiopia
Ministry of Defense
Defense University, College Of Engineering
Office of Postgraduate Programs and Research
M-Tech Thesis
Hardware Acceleration of Elliptic Curve Cryptography
(ECC) Algorithm: Design and Simulation
By
Alemayehu Tilahun
Supervisor: Manoj V.N.V (Dr.)
Department of Computer Information and Technology
Computer Engineering Specialization
June, 2014
Bishoftu
Alemayehu TilahunAlemayehu Tilahun Alemayehu Tilahun
II
Acknowledgments
I would like to express my deepest gratitude to my thesis guide, Dr. Manoj V.N.V, whose skillful
advice and follow-ups have been my motivation throughout the thesis. It is also a great
pleasure to express my appreciation and many thanks to all scholars whose
publications are referenced in my work, as this work could not have been completed without their
references.
My boundless respect and great gratitude go to Major Alemseged A. and Captain Abrham
J. for their endless support and encouragement during the process.
Last but not least, my sincere thanks go to all staff members of the CIT Department of Defense
Engineering College and to all my friends who have patiently extended all sorts of help towards
accomplishing this undertaking.
DECLARATION
I hereby declare that the thesis project entitled “Hardware Acceleration of Elliptic Curve
Cryptography (ECC) Algorithm: Design and Simulation” submitted for the M-Tech degree is my
original work and that the thesis project has not formed the basis for the award of any degree,
associateship, fellowship or any other similar title.
Signature of the student: Alemayehu Tilahun
Place: DUC, Bishoftu
Date: June 9, 2014
CERTIFICATE
This is to certify that the thesis project entitled “Hardware Acceleration of Elliptic Curve
Cryptography (ECC) Algorithm: Design and Simulation” is the work carried out by Alemayehu
Tilahun Haile, student of M-Tech, Defense Engineering College, Bishoftu, during the year
2013/2014, in partial fulfillment of the requirement for the award of the degree of M-Tech in
Computer Engineering, and that the project has not previously formed the basis for the award of
any degree, diploma, associateship, fellowship or any other similar title.
Signature of the Advisor: Manoj V.N.V(Dr.)
Place: DUC, Bishoftu
Date: June 9, 2014
Approval by Members of BoE
Name Signature
1. External Examiner:
2. Internal Examiner:
3. Chairperson/HoD:
4. Advisor:
Abstract
In today’s dynamically changing world, technological advancement has resulted in explosive growth of
communications and related computer engineering fields. Applications such as online banking,
personal digital assistants, mobile communication, and smartcards have emphasized the need for
security in resource-constrained environments. Elliptic curve cryptography (ECC) is well suited
to such environments because of its short key sizes and security comparable to that of other
standard public-key algorithms. However, to match the ever-increasing requirement for speed in
today’s applications, hardware acceleration of the cryptographic algorithms is a necessity. As a
further challenge, the designs have to be robust against attacks.
This thesis explores hardware acceleration of elliptic curve cryptography over binary Galois fields.
The efficiency is largely affected by the underlying arithmetic primitives. The thesis therefore
explores field programmable gate array (FPGA) designs for two of the most important field primitives,
namely multiplication and inversion. FPGAs are reconfigurable hardware
platforms that offer flexibility and low cost similar to software programs. However, designing on
FPGA platforms is challenging because of the large granularity, limited
resources, and large routing delay. The smallest programmable entity in an FPGA
is the look-up table (LUT). The arithmetic algorithms proposed in this thesis maximize the
utilization of look-up tables on the FPGA. A novel finite field multiplier
based on the Karatsuba multiplication algorithm is proposed. The proposed multiplier combines
two variants of Karatsuba, namely the general and the simple Karatsuba multipliers. The general
Karatsuba multiplier has a large gate count, but for small multiplications it is compact because
it utilizes look-up table resources efficiently. For large multiplications, the simple Karatsuba
multiplier is efficient as it requires fewer gates. The proposed hybrid multiplier performs the initial
recursion using the simple algorithm, while the final small multiplications are done using the general
algorithm. The multiplier thus obtained has the best area-time product compared to the reported
literature. The proposed primitives are organized around the Karatsuba multiplier, and the resulting
design has one of the best timings and area-time products compared to reported works. We conclude that
the performance of our multiplier is significantly enhanced if the underlying primitives are carefully designed.
Key Words:- Cryptography, Elliptic Curve Cryptography, Karatsuba Multiplier, Hardware
Acceleration
Table of contents Page
Acknowledgements II
Declaration III
Certificate IV
Approval of BoE’s V
Abstract VI
Table of Contents VII
Lists of Acronyms and Abbreviations XI
Lists of Tables XII
Lists of Figures XIII
Chapter -1 INTRODUCTION
1.1 Background 1
1.2 Quality and Needs for its Achievement 2
1.3 Statement Of the Problem 3
1.4 Objective of the Study 3
1.4.1 General Objective 3
1.4.2 Specific Objectives 3
1.5 Scope of the Study 3
1.6 Limitation of the Study 4
1.7 Significance of the Study 4
1.8 Organization of the study 4
Chapter-2 LITERATURE REVIEW
2.1 Literature Review at International Level 6
2.2 Literature Review at National Level 7
2.3 Concepts in Cryptography 7
2.4 ECC Cryptography 9
2.5 Mathematical Background of ECC 10
2.5.1 Groups 10
2.5.2 Rings 10
2.5.3 Fields and Vector Space 10
2.5.3.1 Finite Field 11
2.5.3.2 Prime Field Fp 12
2.5.3.3 Binary Field F(2m) 12
2.5.4 Polynomial Basis Representation of F(2m) 12
2.5.5 Normal basis Representation of F(2m) 13
2.6 Elliptic Curve Over Fp 14
2.7 Elliptic Curves Over F(2m) 15
2.8 Elliptic Curve Discrete Logarithm Problem 17
2.9 Application of Elliptic Curve in Key Exchange 17
2.9.1 ECC Domain Parameter 17
2.9.2 Elliptic Curve Protocols 17
2.9.3 Elliptic Curve Digital Signature Authentication 18
2.9.4 Elliptic Curve Authentication Encryption Scheme 21
2.10 Algorithms for Elliptic Curve Multiplication 22
2.11 Hierarchy of Elliptic Curve Cryptography 22
2.11.1 Point Multiplication 23
2.11.2 Point Addition 23
2.11.3 Point Doubling 24
2.12 Hardware Accelerator 24
2.13 FPGA Architecture 26
2.13.1 Look-Up Table 27
2.13.2 Configurable Logic Block 27
2.13.3 Input/Output Block 28
2.13.3 RAM Block 29
2.13.4 Programmable Routing 29
Chapter-3 METHODOLOGY
3.1 Design Cycle 30
3.2 Tools Used 32
3.2.1 Simulation and Verification 33
3.2.2 Synthesizing tools 33
3.2.3 Place and Route 34
3.2.4 Loading FPGA Program 35
Chapter- 4 DATA AND DATA ANALYSIS
4.1 Elliptic Curve Scalar Multiplication 36
4.2 Karatsuba Multiplier 37
4.2 Point Addition 40
4.3 Point Multiplication 41
4.4 Squaring 41
Chapter-5 RESULT AND DISCUSSIONS
5.1 Simulation Result for Karatsuba Multiplier 44
5.2 Resource Utilization of Polynomial Reducer 45
5.3 Simulation Result for Encryption Unit 46
5.4 Design of the Encryption Unit 47
5.5 Xpower Analysis for Karatsuba Multiplier 47
5.6 Resource Utilization for ECC Decryptor Unit 48
5.7 Xpower Analysis of ECC Component 49
5.8 Comparison with Other Related Works 49
Chapter-6 SUMMARY, CONCLUSION, RECOMMENDATION AND FUTURE RESEARCH WORK
6.1 Summary 51
6.2 Conclusion 52
6.3 Recommendation for Future Work 52
6.3.1 New Design Consideration 52
6.3.2 Implementation Alternatives 53
REFERENCE 54
Appendix-I Sample Algorithms 57
Appendix-II Sample Snapshots 59
Appendix-III Xilinx Synthesis Report 63
Appendix-IV Sample VHDL Code 71
Lists of Acronyms and Abbreviations
ASIC Application Specific Integrated Circuits
CLB Configurable Logic Block
DSA Digital Signature Algorithm
ECC Elliptic Curve Cryptography
ECDH Elliptic Curve Diffie-Hellman Protocol
ECDLP Elliptic Curve Discrete Logarithm Problem
ECDSA Elliptic Curve Digital Signature Authentication
ECAES Elliptic Curve Authentication Encryption Scheme
FPGA Field Programmable Gate Array
FDRE Federal Democratic Republic of Ethiopia
GF Galois Field (Finite Field)
GRM General Routing Matrix
HDL Hardware Description Language
JTAG Joint Test Action Group
LUT Look-Up Table
LSB Least Significant Bit
MUX Multiplexer
MoD Ministry Of Defense
PLD Programmable Logic Device
RTL Register Transfer Level
RSA Ron Rivest, Adi Shamir, and Leonard Adleman
SoC System On a Chip
VHDL Very High Speed Integrated Circuit Hardware Description Language
List of Tables
Table Number Description Page No
Table 2.1 Comparison of NIST Recommended Key size 9
Table 5.1 Resource Utilization of Karatsuba Multiplier 44
Table 5.2 Resource Utilization for Polynomial Reducer 45
Table 5.3 Resource Utilization of ECC Encryption Unit 46
Table 5.4 Xpower Report for Karatsuba Multiplier 47
Table 5.5 Decryption Unit Resource Utilization 48
Table 5.6 Xpower Analysis for ECC Components 49
Table 5.7 Comparison with other Related Works 50
List of Figures
Figure Number Description Page No
Figure 2.1 Two Party Communication 8
Figure 2.2 Illustration of Elliptic Curve Digital Signature 20
Figure 2.3 Illustration of Elliptic Curve Authentication 22
Figure 2.4 Hierarchy of Elliptic Curve Cryptography 23
Figure 2.5 Graph of Point Addition 24
Figure 2.6 Graph of Point Doubling 24
Figure 2.7 FPGA Architecture 26
Figure 2.8 Internal Architecture of FPGA 27
Figure 3.1 Design Flow Chart 31
Figure 3.2 FPGA Board 32
Figure 3.3 Synthesis Flow Diagram 33
Figure 3.4 Place and Route Processes 34
Figure 3.5 Programming FPGA Board 35
Figure 4.1 Typical Hierarchy of ECC 36
Figure 4.2 RTL Structure of Karatsuba Multiplier 40
Figure 4.3 RTL Structure of Squarer 43
Figure 5.1 Karatsuba based Encryptor 47
Figure 5.2 Karatsuba Based Decryptor 48
CHAPTER-1
INTRODUCTION
1.1 Background
The ever-increasing volume of communication over wired and wireless networks means that
thousands of transactions take place over the World Wide Web every day. Many of these transactions
carry critical data that must be kept confidential, and both the transactions and their users
must be validated and authenticated. These requirements call for a robust security framework [17]
[33] [35].
The idea of information security led to the evolution of cryptography. In other words,
cryptography is the science of keeping information secure. It involves encryption and
decryption of messages. Encryption is the process of converting plain text into cipher text,
and decryption is the process of recovering the original message from the encrypted text.
Cryptography, in addition to providing confidentiality, also provides authentication, integrity
and non-repudiation [1][5].
Many cryptographic algorithms are known. The crux of any cryptographic
algorithm is the “seed” or “key” used for encrypting and decrypting the information [20][34].
Many cryptographic algorithms are publicly available, though some organizations
prefer to keep the algorithm secret. The general practice is to use a publicly known
algorithm while keeping the key secret.
Based on the key, cryptosystems can be classified into two categories: symmetric and
asymmetric. In symmetric-key cryptosystems, the same key is used for both encryption and
the corresponding decryption [4][7][9][16].
Asymmetric, or public-key, cryptosystems use two different keys. One is used
for encryption while the other is used for decryption. In some schemes, such as RSA, the two
keys can be used interchangeably. One of the keys is made public (shared) while the other is kept
secret; that is, k1 and k2 are the public and private keys respectively (M. Kedir and Manoj V.N.V,
2008). In general, symmetric-key cryptosystems are preferred over public-key systems due to the
following factors:
I. Ease of computation
II. Smaller key length providing the same amount of security as compared to a larger key
in public-key systems.
Hence the common approach is to use a public-key system to securely transmit a “secret
key”. Once the key has been securely exchanged, it is used for encryption and
decryption with a symmetric-key algorithm.
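The two-step approach described above can be illustrated with a small sketch. It is a toy model only and not the scheme used in this thesis: a classical Diffie-Hellman exchange stands in for the public-key step, and an XOR keystream derived with SHA-256 stands in for the symmetric-key algorithm. The modulus `P`, the base `G` and all function names are assumptions chosen for illustration, and the parameters are far too small for real use.

```python
import hashlib
import secrets

# Toy public parameters (assumptions for illustration only; not secure).
P = 2**127 - 1   # a Mersenne prime used as the public modulus
G = 3            # public base

def keypair():
    """Return (private, public) for a toy Diffie-Hellman exchange."""
    private = secrets.randbelow(P - 3) + 2
    return private, pow(G, private, P)

def shared_key(my_private, their_public):
    """Derive a 32-byte symmetric key from the shared Diffie-Hellman value."""
    shared = pow(their_public, my_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()

def xor_cipher(key, data):
    """Symmetric step: XOR with a hash-based keystream (same call encrypts and decrypts)."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))

# Step 1: both parties agree on a secret key using only public values.
alice_priv, alice_pub = keypair()
bob_priv, bob_pub = keypair()
k_alice = shared_key(alice_priv, bob_pub)
k_bob = shared_key(bob_priv, alice_pub)

# Step 2: the agreed key drives the (much cheaper) symmetric cipher.
message = b"confidential transaction"
ciphertext = xor_cipher(k_alice, message)
recovered = xor_cipher(k_bob, ciphertext)
```

Only the short key agreement uses the expensive public-key arithmetic; the bulk data is handled by the symmetric step, which is the division of labour the paragraph above describes.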
The idea of using elliptic curves in cryptography was introduced independently by Victor Miller and
Neal Koblitz in 1985 as an alternative to established public-key systems such as DSA and RSA. The
Elliptic Curve Discrete Logarithm Problem (ECDLP) makes ECC harder to break than
RSA and DSA, whose underlying problems (integer factorization and the discrete logarithm problem,
respectively) can be solved in sub-exponential time. This means that significantly smaller parameters
can be used in ECC than in competing systems such as RSA and DSA, which gives
smaller key sizes and hence faster computations.
In this thesis we study hardware acceleration of elliptic curve cryptography.
We study the properties of finite fields and of elliptic curves over finite fields, and how these
properties can be used for efficient design and simulation of the encryption and decryption
process.
1.2 Quality and the Need for its Achievement
FPGAs are an attractive choice for implementing cryptographic algorithms in hardware
because of their low prototyping cost relative to ASICs. FPGAs are flexible when adopting
security protocol upgrades, as they can be re-programmed in place [13]. One FPGA series
is the Xilinx Spartan®-6, which delivers a balance of low risk, low cost, and low
power for cost-sensitive applications, with 42% less power consumption and 12%
increased performance over previous-generation devices. Part of Xilinx’s All
Programmable low-end portfolio, Spartan-6 FPGAs offer advanced power management
technology, up to 150K logic cells, integrated PCI Express® blocks, advanced memory
support, 250 MHz DSP slices, and 3.2 Gbps low-power transceivers. At the time of writing, Xilinx ISE
Design Suite 14.7 is the latest version of the hardware programming environment, and it provides
multiple features over the rival Quartus II Web Edition development package
(www.Xilinx.com, last viewed May 23, 2014, 3 PM).
1.3 Statement of the Problem
Scalar multiplication is the most time-consuming operation in elliptic curve cryptosystems,
whether implemented in hardware or in software, because it is mostly carried out
as a sequence of successive point additions [4][11][19][25][33].
ECC algorithms can be implemented efficiently in hardware
by applying multiplication optimization techniques such as Karatsuba. Karatsuba makes
ECC protocols more attractive by reducing the execution time and I/O usage of the
multiplication process. Therefore, while the general-purpose microprocessor performs its
routine tasks, the time-consuming operations can be executed on a Karatsuba co-processor
designed on reprogrammable hardware such as an FPGA.
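To make the Karatsuba idea concrete, the following is a minimal software sketch of the simple (recursive) Karatsuba multiplication of polynomials over GF(2). It is not the hardware design of this thesis. Polynomials are stored as Python integers (bit i holds the coefficient of x^i), the function names are hypothetical, and the `threshold` at which the recursion stops is an assumed value. A plain shift-and-XOR multiplier is used for the small products, as a stand-in for the general Karatsuba multiplier that the proposed hybrid design uses at that level.

```python
def clmul_schoolbook(a, b):
    """Shift-and-XOR product of two GF(2) polynomials stored as ints."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # adding a shifted copy of a is an XOR over GF(2)
        a <<= 1
        b >>= 1
    return result

def karatsuba_gf2(a, b, nbits, threshold=8):
    """Multiply two GF(2) polynomials with at most nbits coefficients each.

    threshold is a hypothetical cut-over size: operands this small are
    handed to the shift-and-XOR multiplier instead of being split again.
    """
    if nbits <= max(threshold, 1):
        return clmul_schoolbook(a, b)
    half = (nbits + 1) // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    # Three half-size products instead of four.
    lo = karatsuba_gf2(a_lo, b_lo, half, threshold)
    hi = karatsuba_gf2(a_hi, b_hi, nbits - half, threshold)
    mid = karatsuba_gf2(a_lo ^ a_hi, b_lo ^ b_hi, half, threshold)
    # Over GF(2) subtraction is XOR, so the cross term is mid ^ lo ^ hi.
    return lo ^ ((mid ^ lo ^ hi) << half) ^ (hi << (2 * half))

# (x^3 + x + 1) * (x^2 + x) computed both ways.
small = karatsuba_gf2(0b1011, 0b0110, 4)
reference = clmul_schoolbook(0b1011, 0b0110)
```

Each level of recursion replaces four half-size multiplications by three, at the cost of a few extra XOR operations, which is the source of the savings in multiplication effort discussed above.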
1.4 Objectives of the Study
1.4.1 General Objective
The general objective of this study is to design and simulate hardware acceleration of
Elliptic Curve Cryptography (ECC).
1.4.2 Specific Objectives
The specific objectives of the study are:
1. To design and simulate the Karatsuba multiplier.
2. To design and simulate finite field arithmetic units for binary fields using Xilinx ISE Design Suite.
3. To measure the efficiency of Karatsuba multiplication on Xilinx ISE Design Suite.
4. To integrate the finite field arithmetic units into an efficient hardware scalar multiplier.
5. To design and simulate the Karatsuba-based ECC encryptor/decryptor.
6. To compare the performance of the hardware multiplier with the software
implementation and other related works.
1.5 Scope of the Study
In this thesis, hardware units are designed for Karatsuba multiplication and
binary field arithmetic, and their performance is compared with that of software. These finite field
arithmetic units are then integrated to create elliptic curve cryptographic hardware
capable of computing scalar multiplication on elliptic curves and performing encryption and
decryption.
To measure the efficiency of the hardware, the design is translated into a hardware description
language, namely Verilog. Simulation is then done for functionality and timing analysis using
Xilinx design suite V14.5 software.
1.6 Limitations of the study
In conducting this thesis work, the researcher encountered the following challenges:
a) Windows 8 was the researcher's original operating system, but it does not support
any version of Xilinx ISE Design Suite.
b) Lack of current literature during the literature survey.
1.7 Significance of the study
With the rapid growth of the Internet and digital communication, the need to protect files and
other information stored on, and transmitted between, computers has become of vital
importance. To this extent, the requirement for trusted computing and secure
communication has become an important issue of the era. Therefore, this thesis on
hardware acceleration of elliptic curve cryptography is believed to be of
importance in advancing the security and performance of information communication and
dissemination activities in the FDRE MoD. Furthermore, the findings of the thesis and the
results obtained from the analysis will be helpful to other researchers wishing to conduct
experiments in the area.
1.8 Organization of the Thesis
The thesis is organized into six chapters. The first chapter introduces the thesis background,
research objectives, scope, significance, contribution, and organization.
The second chapter reviews the background of the research and presents related work.
A summary of the literature review is given to clarify the study rationale. The chapter also
gives a brief introduction to cryptography and elliptic curve cryptography, together with the
mathematical concepts of finite fields and elliptic curves. Various design styles of hardware
accelerators for elliptic curve arithmetic are described. The ECDH and ECAES
protocol schemes are also discussed in this chapter.
Chapter three presents the methodology followed to design and simulate the hardware
accelerator for the ECC primitives, namely the Karatsuba multiplier, the field arithmetic
level, and the point arithmetic level. The activities followed to design hardware for
the ECC primitives are also explained in this chapter.
The fourth chapter presents the details of the design and implementation of the ECC hardware
accelerator on reconfigurable hardware (FPGA). For each lower-level
operation of elliptic curve cryptography, the respective circuit has been designed
independently and then integrated into one accelerator module.
The fifth chapter covers the results and discussion of the thesis. The reports generated from the
Xilinx ISE Design Suite and PlanAhead platform for design verification, synthesis, and
implementation, together with test results and performance studies of the hardware-based ECC,
are presented in tables and charts. Test results of the elliptic curve
cryptosystem from Xilinx ISE Design Suite 14.7 and results from related hardware accelerators
are also compared and reported.
In the final chapter, the thesis work is summarized and concluded, and potential future
work is indicated.
CHAPTER-2
LITERATURE REVIEW
2.1 Literature reviewed at International Level
There have been several reported high performance FPGA processors for elliptic curve
cryptography. Various acceleration techniques have been used ranging from efficient
implementations to parallel and pipelined architectures. In [29] the Montgomery multiplier
[4] [13] [30] is used for scalar multiplication. The finite field multiplication is performed using
a digit-serial multiplier proposed in [31]. The Itoh-Tsujii algorithm is used for finite field
inversion [7] [19] [30] [35].
In [22], the ECC processor has squarers, adders, and multipliers in the data path. The
authors use a hybrid coordinate representation in affine, Jacobian, and López-Dahab
form.
In [34] an end-to-end system for ECC is developed, which has a hardware implementation for
ECC on an FPGA. The high performance is obtained with an optimized field multiplier. A
digit-serial shift-and-add multiplier is used for the purpose. Inversion is done with a dedicated
division circuit.
In [30], the finite field multiplier in the processor is prevented from becoming idle. The finite
field multiplier is the bottleneck of the design, so preventing it from becoming idle
improves the overall performance. Our design of the elliptic curve crypto processor (ECCP) is on
similar lines: the operations required for point addition and point doubling are scheduled so that
the finite field multiplier is always utilized.
Hankerson, Hernandez and Menezes (Hankerson, et al. 2000) wrote an excellent survey
discussing software algorithms for computing elliptic curve point multiplication. Many of
these algorithms can be adapted for use in hardware, but the survey does not address multiplication
optimization techniques on reconfigurable hardware, which this thesis does
(considering optimization parameters such as area, energy consumption and performance).
Großschädl and Kamendje (2003) propose a simple architectural change to the multiplier
within a RISC processor, together with software algorithms that make use of the modified multiplier for
point multiplication. These software implementations usually make use of a polynomial basis
over binary fields. In addition, Chester Rebeiro and Debdeep Mukhopadhyay, in their
publication on a high performance elliptic curve crypto processor, address the same issue
for software polynomial bases using the ordinary method for scalar multiplication.
Hardware implementations, on the other hand, often make use of an optimal normal basis
over binary fields. They also generally target FPGAs for the realization of the
proposed architectures. S. Janssens, et al. (2003) propose an architecture that makes use of
hardware/software co-design and targets the Atmel FPSLIC.
Okada, Torii, Itoh and Takenaka (2011) propose an elliptic curve coprocessor for arbitrary bit
length to be implemented on an FPGA. Sutikno, Surya and Effendi propose a processor to
compute point multiplication in $F_{2^{155}}$.
Leung, Ma, Wong and Leong propose an FPGA implementation of a micro-coded elliptic
curve processor for arbitrary key sizes. Many other hardware implementations also exist.
These implementations make use of microcode instructions to drive special-purpose
arithmetic units and store intermediate results in standard registers.
An outstanding work by Bahram Hakhamaneshi (Islamic Azad University, Iran, 2000),
Z. Guitouni, Chotin-Avot, M. Machhu, H. Mehrez and R. Tourki is one of the most similar
works among the publications available and reviewed so far. It implements scalar
multiplication over a finite field on an ASIC. Higher performance may be achievable
by using a multiplication optimization scheme as proposed in this thesis.
2.2 Literature Reviewed at National Level
A national-level work by Mubarek Kedir and Manoj V.N.V (April 2008) described
hardware acceleration of the elliptic curve cryptography algorithm using a
Montgomery multiplication scheme. Greater efficiency is expected from
implementing the scalar multiplication scheme described in this thesis.
2.3 Concepts in Cryptography
Cryptography uses mathematics to encrypt and decrypt data. It enables people to store or
transmit sensitive information over insecure networks. Cryptanalysis, on the other hand, is the
science of breaking secure communication. Consider two persons, Alice and Bob (since the
early days of cryptography, “A” and “B” have been used as handy abbreviations of the names),
who communicate over an insecure channel in a secure way. A third person, the eavesdropper
(Eve, abbreviated as E), should not be able to read the clear text or change it.
The goal of cryptography is to allow two people to exchange messages
that cannot be understood by other people (Wang, et al.). Figure 2.1
provides a simple model of two-party communication using encryption. In this model,
an entity is a person that sends, receives or manipulates data. A sender is an entity that
legitimately transmits the information. A receiver is an entity that is the intended
recipient of the information. An adversary is an entity, neither the sender nor the receiver, that
attempts to defeat the information security service provided between the sender and receiver. An
adversary may play the role of either the sender or the receiver. Other synonymous names for
adversary are attacker, enemy, eavesdropper, opponent and intruder (Jesper 2006).
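Figure 2.1 Two Party Communication
[Figure: Plain Text -> Encryption -> Unsecured Channel (observed by Adversary) -> Decryption -> Plain Text]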
Cryptographic strength can be measured by the resources and time needed to recover
the plain text. To encrypt the plaintext, a cryptographic algorithm works in
combination with a key to produce the ciphertext. The same plaintext encrypts to a different
ciphertext when a different key is used. The security of encrypted data
depends on the strength of the cryptographic algorithm and the confidentiality of the key (B.
Schneier 1996).
Since Whitfield Diffie and Martin E. Hellman published their famous article “New Directions
in Cryptography” [22], cryptographic algorithms have been divided into two
categories: symmetric-key cryptography and public-key cryptography. Symmetric-key
cryptography (also called private-key, single-key or one-key cryptography) is a cryptosystem where both
encryption and decryption are performed using the same key. In a public-key
cryptosystem there are two different keys, one which is public (the public key) and another which
is secret (the private key). The most famous public-key cryptosystem is probably RSA, which was
presented by Rivest, Shamir and Adleman in 1978 [14].
2.4 ECC Cryptography
Elliptic curve cryptography (ECC) was proposed in 1985 by Neal Koblitz and Victor Miller.
Elliptic curve cryptographic schemes are public-key mechanisms that provide the same
functionality as RSA schemes. Their security, however, is based on the difficulty of a different
problem, the Elliptic Curve Discrete Logarithm Problem (ECDLP). The best known algorithms
for solving the ECDLP take fully exponential time, whereas the integer
factorization problem can be solved by sub-exponential-time algorithms (Hankerson, et
al. 2004). As a result, elliptic curve cryptography offers security similar to that of the
traditional public-key schemes in use today, but with smaller key sizes and
memory requirements, as shown in Table 2.1 (Kumar 2006). For example, it is generally
accepted that a 1024-bit RSA key provides the same level of security as a 160-bit elliptic
curve key. The advantages of smaller key sizes include savings in storage, speed,
and efficient use of power and bandwidth. Shorter keys mean lower space
requirements for key storage and quicker arithmetic operations. These advantages are
essential when public-key cryptography is applied in constrained devices, such as mobile
devices or RFID tags, and they are the reason for choosing ECC as the cryptographic
system in this thesis.
Table 2.1 Comparison of NIST recommended key sizes (in bits)
Symmetric Key    ECC    RSA     Comment
64               128    700     Short period security
80               160    1024    Medium period security
128              256    2040    Long period security
In brief, ECC-based algorithms can easily be included in existing protocols to obtain the same
backward compatibility and security with fewer resources. Therefore, low-end
constrained devices that were previously considered unsuitable for such protocols can now use them.
Elliptic curves defined over a finite field provide a group structure that is used to implement
the cryptographic schemes. The elements of the group are the points on the
elliptic curve, together with a special point at infinity that acts as the identity element of the
group. The group operation is carried out using arithmetic operations in the underlying finite
field, as discussed in detail in the next section (Kumar 2006).
2.5 Mathematical Background of Elliptic Curve Cryptography
2.5.1 Groups
A mathematical structure consisting of a set $G$ and a binary operator $*$ on $G$ is a group if:
$\forall a, b \in G$, if $c = a * b$, then $c \in G$ (Closure)
$a * (b * c) = (a * b) * c$, $\forall a, b, c \in G$ (Associativity)
$\exists e \in G$ such that $\forall a \in G$, $a * e = e * a = a$ (Identity element)
$\forall a \in G$, $\exists a^{-1} \in G$ such that $a * a^{-1} = a^{-1} * a = e$. The element $a^{-1}$ is unique for each $a$ and is called the
inverse of $a$.
The group is represented as $\langle G, * \rangle$. Additionally, a group is said to be abelian if it also
satisfies the commutative property, i.e., $a * b = b * a$, $\forall a, b \in G$.
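As an illustration of the group axioms, the sketch below checks closure, associativity, identity and inverses by brute force on a small finite set. The helper names `is_group` and `is_abelian`, and the choice of the integers modulo 6, are assumptions made for this example only.

```python
from itertools import product

def is_group(elements, op):
    """Brute-force check of closure, associativity, identity and inverses."""
    elems = list(elements)
    # Closure: a * b must stay inside the set.
    if any(op(a, b) not in elems for a, b in product(elems, repeat=2)):
        return False
    # Associativity: a * (b * c) == (a * b) * c.
    if any(op(a, op(b, c)) != op(op(a, b), c)
           for a, b, c in product(elems, repeat=3)):
        return False
    # Identity: some e with a * e == e * a == a for every a.
    identities = [e for e in elems
                  if all(op(a, e) == a and op(e, a) == a for a in elems)]
    if not identities:
        return False
    e = identities[0]
    # Inverses: every a has some x with a * x == x * a == e.
    return all(any(op(a, x) == e and op(x, a) == e for x in elems)
               for a in elems)

def is_abelian(elements, op):
    """Check the commutative property a * b == b * a."""
    elems = list(elements)
    return all(op(a, b) == op(b, a) for a, b in product(elems, repeat=2))

def add_mod_6(a, b):
    return (a + b) % 6

def mul_mod_6(a, b):
    return (a * b) % 6

additive_ok = is_group(range(6), add_mod_6)           # integers mod 6 under +
multiplicative_ok = is_group(range(1, 6), mul_mod_6)  # nonzero residues mod 6 under x
```

The integers modulo 6 form an abelian group under addition, while the nonzero residues modulo 6 fail closure under multiplication (for example 2 x 3 = 0 mod 6), so they do not form a group.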
2.5.2 Rings
A ring is a set $R$ with two binary operations $+$ and $\times$ (addition and multiplication) defined
on $R$ such that the following conditions are satisfied:
$\langle R, + \rangle$ is an abelian group
$a \times (b \times c) = (a \times b) \times c$, $\forall a, b, c \in R$ (Associativity of $\times$)
$a \times (b + c) = (a \times b) + (a \times c)$, $\forall a, b, c \in R$ (Distributivity of $\times$ over $+$)
A ring in which $\times$ is commutative is called a commutative ring. Further, if the ring contains
an identity element with respect to $\times$, i.e. $\exists e \in R$ such that $\forall a \in R$, $a \times e = e \times a = a$, then $e$ is
called the identity element or the unity element and is represented by 1. If $R$ contains a unity
element, then $R$ is called a unitary ring.
2.5.3 Fields and Vector Spaces
A field $F$ is a commutative and unitary ring such that $F^* = \{a \mid a \in F \text{ and } a \neq 0\}$ is a
multiplicative group. The ring $Z_p$ is a field if and only if $p$ is a prime.
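The statement that $Z_p$ is a field exactly when $p$ is prime can be checked numerically for small moduli. The sketch below is only a brute-force illustration. The function names and the range of moduli tested are assumptions for this example.

```python
def nonzero_is_multiplicative_group(n):
    """True if every nonzero residue modulo n has a multiplicative inverse."""
    nonzero = range(1, n)
    return all(any((a * x) % n == 1 for x in nonzero) for a in nonzero)

def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

# Moduli n for which the nonzero elements of Z_n form a multiplicative group.
field_moduli = [n for n in range(2, 30) if nonzero_is_multiplicative_group(n)]
primes = [n for n in range(2, 30) if is_prime(n)]
```

For a composite modulus such as 6, the element 2 has no inverse because 2x is always even modulo 6, so the nonzero residues do not form a group and $Z_6$ is not a field.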
Let $F$ be a field. A subset $K$ of $F$ that is also a field under the operations of $F$ (with restriction
to $K$) is called a subfield of $F$. In this case, $F$ is called an extension field of $K$. If $K \neq F$ then
$K$ is a proper subfield of $F$. A field is called prime if it has no proper subfield.
If $F$ is a field and $V$ is an additive abelian group, then $V$ is called a vector space over $F$ if
an operation $F \times V \to V$ is defined such that:
$a(v + u) = av + au$
$(a + b)v = av + bv$
$a(bv) = (a \cdot b)v$
$1 \cdot v = v$
where $a, b \in F$ and $u, v \in V$.
The elements of $F$ are called the scalars and the elements of $V$ are called the vectors.
If $v_1, v_2, \dots, v_m \in V$ and $f_1, f_2, \dots, f_m \in F$, then the vector $v' = \sum_{i=1}^{m} f_i v_i$ is a
linear combination of the vectors $v_1, v_2, \dots, v_m$. The set of all such linear combinations is called the
span of $v_1, v_2, \dots, v_m$.
The vectors $v_1, v_2, \dots, v_m \in V$ are said to be linearly independent over $F$ if there exist no
scalars $f_1, f_2, \dots, f_m \in F$, not all zero, such that $\sum_{i=1}^{m} f_i v_i = 0$.
A set $S = \{u_1, u_2, \dots, u_n\}$ is said to be a basis of $V$ iff the elements of $S$ are linearly
independent and span $V$. If a vector space $V$ over a field $F$ has a basis of a finite number of
vectors, then this number is called the dimension of $V$ over $F$.
If $F$ is an extension field of a field $F_p$ then $F$ is a vector space over $F_p$. The dimension of $F$
over $F_p$ is called the degree of the extension of $F$ over $F_p$.
2.5.3.1 Finite Fields
A field with a finite number of elements is denoted $F_q$ or GF(q), where q is the number of elements. This is also known as a Galois Field.
The order of a finite field $F_q$ is the number of elements in $F_q$. Further, there exists a finite field $F_q$ of order q iff q is a prime power, i.e. either q is prime or $q = p^m$, where p is prime. In the latter case, p is called the characteristic of $F_q$, m is called the extension degree of $F_q$, and every element of $F_q$ is a root of the polynomial $x^{p^m} - x$ over $Z_p$.
Let us consider two classes of finite fields: $F_p$ (prime field, p is a prime number) and $F_{2^m}$ (binary finite field).
2.5.3.2 Prime Field Fp
The prime field $F_p$ consists of the set of integers $\{0, 1, 2, \ldots, p - 1\}$, with the following arithmetic operations defined over it.
Addition: $\forall a, b \in F_p$, $a + b = r \in F_p$, where $r = (a + b) \bmod p$
Multiplication: $\forall a, b \in F_p$, $a \cdot b = s \in F_p$, where $s = (a \cdot b) \bmod p$
2.5.3.3 Binary Finite Field F2m
The finite field $F_{2^m}$, called a characteristic two finite field or a binary finite field, can be viewed as a vector space of dimension m over $F_2$, which consists of the 2 elements 0 and 1. There exist m elements $\alpha_0, \alpha_1, \ldots, \alpha_{m-1}$ in $F_{2^m}$ such that each element $\alpha \in F_{2^m}$ can be uniquely represented as $\alpha = \sum_{i=0}^{m-1} a_i \alpha_i$, where $a_i \in \{0, 1\}$, $0 \le i < m$.
The set $\{\alpha_0, \alpha_1, \ldots, \alpha_{m-1}\}$ is called a basis of $F_{2^m}$ over $F_2$. Given such a basis, every field element can be represented as a bit string $(a_0 a_1 a_2 \ldots a_{m-1})$. Generally two kinds of bases are used to represent binary finite fields: polynomial basis and normal basis.
2.5.4 Polynomial basis representation of F2m
Let $f(x) = x^m + f_{m-1}x^{m-1} + \ldots + f_2 x^2 + f_1 x + f_0$, where $f_i \in \{0, 1\}$, $0 \le i < m$, be an irreducible polynomial of degree m over $F_2$. f(x) is called the reduction polynomial of $F_{2^m}$.
The finite field $F_{2^m}$ comprises all polynomials over $F_2$ of degree less than m, i.e.:
$F_{2^m} = \{a_{m-1}x^{m-1} + a_{m-2}x^{m-2} + \ldots + a_2 x^2 + a_1 x + a_0 : a_i \in \{0, 1\}\}$.
The field element $a_{m-1}x^{m-1} + a_{m-2}x^{m-2} + \ldots + a_2 x^2 + a_1 x + a_0$ is usually represented by the bit string $(a_{m-1} a_{m-2} \ldots a_2 a_1 a_0)$ of length m such that
$F_{2^m} = \{(a_{m-1} a_{m-2} \ldots a_2 a_1 a_0) : a_i \in \{0, 1\}\}$.
Thus, the elements of $F_{2^m}$ can be represented by the set of all binary strings of length m. The multiplicative identity 1 is represented by the bit string (00…001) and the bit string of all zeroes represents the additive identity 0.
The following operations are defined on the elements of $F_{2^m}$ when using f(x) as the reduction polynomial.
Addition: If $a = (a_{m-1} a_{m-2} \ldots a_2 a_1 a_0)$ and $b = (b_{m-1} b_{m-2} \ldots b_2 b_1 b_0)$ are elements of $F_{2^m}$, then $c = a + b = (c_{m-1} c_{m-2} \ldots c_2 c_1 c_0)$, where $c_i = (a_i + b_i) \bmod 2 = a_i \oplus b_i$.
Multiplication: If $a = (a_{m-1} a_{m-2} \ldots a_2 a_1 a_0)$ and $b = (b_{m-1} b_{m-2} \ldots b_2 b_1 b_0)$ are elements of $F_{2^m}$, then $c = a \cdot b = (c_{m-1} c_{m-2} \ldots c_2 c_1 c_0)$, where the polynomial $c_{m-1}x^{m-1} + c_{m-2}x^{m-2} + \ldots + c_2 x^2 + c_1 x + c_0$ is the remainder when the polynomial $(a_{m-1}x^{m-1} + a_{m-2}x^{m-2} + \ldots + a_1 x + a_0)(b_{m-1}x^{m-1} + b_{m-2}x^{m-2} + \ldots + b_1 x + b_0)$ is divided by f(x) over $F_2$.
Inversion: If a is a nonzero element in $F_{2^m}$, then the inverse of a, denoted $a^{-1}$, is the unique element $c \in F_{2^m}$ such that $a \cdot c = c \cdot a = 1$.
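The three operations above can be sketched in software by storing a field element as an integer whose bits are the coefficients $a_{m-1} \ldots a_0$. The sketch below assumes the toy field $F_{2^4}$ with reduction polynomial $x^4 + x + 1$; the function names and the inversion-by-exponentiation method are illustrative choices, not the hardware architecture of this thesis.

```python
M = 4                 # extension degree (toy value chosen for this sketch)
F_POLY = 0b10011      # f(x) = x^4 + x + 1, irreducible over F_2

def gf_add(a, b):
    """Addition is the bitwise XOR of the two coefficient strings."""
    return a ^ b

def gf_mul(a, b):
    """Shift-and-add multiplication with reduction modulo f(x)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> M:            # degree reached m: subtract (XOR) f(x)
            a ^= F_POLY
    return result

def gf_inv(a):
    """Inverse of a nonzero element, computed as a^(2^m - 2)."""
    if a == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    result = 1
    for _ in range(2**M - 2):
        result = gf_mul(result, a)
    return result

# x * x^3 = x^4, which reduces to x + 1 modulo f(x)
print(format(gf_mul(0b0010, 0b1000), "04b"))   # prints 0011
```

Addition needs no reduction because XOR never raises the degree, which is one reason binary fields are attractive for hardware.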
2.5.5 Normal basis representation of F2m
A normal basis of $F_{2^m}$ over $F_2$ is a basis of the form $\{\beta, \beta^2, \beta^{2^2}, \ldots, \beta^{2^{m-1}}\}$, where $\beta \in F_{2^m}$.
Any element $a \in F_{2^m}$ can be written as $a = \sum_{i=0}^{m-1} a_i \beta^{2^i}$, where $a_i \in \{0, 1\}$.
Gaussian Normal Bases (GNB): A GNB representation of $F_{2^m}$ exists if there exists a positive integer T such that p = Tm + 1 is prime and gcd(Tm/k, m) = 1, where k is the multiplicative order of 2 modulo p. The GNB representation is called a “type T GNB for $F_{2^m}$”.
The following operations are defined over $F_{2^m}$ when using a type T GNB representation.
Addition: If $a = (a_{m-1} a_{m-2} \ldots a_2 a_1 a_0)$ and $b = (b_{m-1} b_{m-2} \ldots b_2 b_1 b_0)$ are elements of $F_{2^m}$, then $c = a + b = (c_{m-1} c_{m-2} \ldots c_2 c_1 c_0)$, where $c_i = (a_i + b_i) \bmod 2 = a_i \oplus b_i$.
Squaring: Let $a = (a_{m-1} a_{m-2} \ldots a_2 a_1 a_0) \in F_{2^m}$. Squaring is a linear operation in $F_{2^m}$. Hence
$a^2 = \left(\sum_{i=0}^{m-1} a_i \beta^{2^i}\right)^2 = \sum_{i=0}^{m-1} a_i \beta^{2^{i+1}} = \sum_{i=0}^{m-1} a_{i-1} \beta^{2^i} = (a_{m-2} a_{m-3} \ldots a_1 a_0 a_{m-1})$,
where indices are reduced modulo m. Hence squaring a field element is simply a rotation of the vector representation.
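The rotation property can be checked numerically on a very small field. The sketch below assumes $F_{2^3}$ built with the polynomial $x^3 + x^2 + 1$ and takes $\beta = x$, for which $\{\beta, \beta^2, \beta^4\}$ happens to be a normal basis; these choices and all function names are illustrative only.

```python
M = 3
F_POLY = 0b1101       # x^3 + x^2 + 1, irreducible over F_2 (toy choice)

def gf_mul(a, b):
    """Polynomial-basis multiplication modulo F_POLY, used as the reference."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= F_POLY
    return result

BETA = 0b010          # beta = x
BASIS = [BETA]
for _ in range(M - 1):
    BASIS.append(gf_mul(BASIS[-1], BASIS[-1]))    # beta, beta^2, beta^4

def to_poly(coeffs):
    """Map normal-basis coefficients [a_0, ..., a_{m-1}] to a polynomial-basis integer."""
    value = 0
    for a_i, basis_elem in zip(coeffs, BASIS):
        if a_i:
            value ^= basis_elem
    return value

def nb_square(coeffs):
    """Squaring in a normal basis: a cyclic shift, so the new a_i is the old a_{i-1}."""
    return [coeffs[-1]] + coeffs[:-1]

a = [1, 0, 1]                     # a = beta + beta^4
print(nb_square(a))               # coefficients of a^2 = beta^2 + beta^8 = beta + beta^2
```

Comparing `to_poly(nb_square(c))` with `gf_mul(to_poly(c), to_poly(c))` for every coefficient vector `c` confirms that the shift and the field squaring agree.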
Multiplication: Let p = Tm + 1 and let $u \in F_p$ be an element of order T. Let us define a sequence F(1), F(2), …, F(p − 1) by $F(2^i u^j \bmod p) = i$, for $0 \le i < m$, $0 \le j < T$.
If $a = (a_{m-1} a_{m-2} \ldots a_2 a_1 a_0)$ and $b = (b_{m-1} b_{m-2} \ldots b_2 b_1 b_0)$ are elements of $F_{2^m}$, then the product $c = a \cdot b = (c_{m-1} c_{m-2} \ldots c_2 c_1 c_0)$, where
$c_i = \sum_{k=1}^{p-2} a_{F(k+1)+i}\, b_{F(p-k)+i}$, if T is even
$c_i = \sum_{k=1}^{m/2} \left(a_{k+i-1}\, b_{m/2+k+i-1} + a_{m/2+k+i-1}\, b_{k+i-1}\right) + \sum_{k=1}^{p-2} a_{F(k+1)+i}\, b_{F(p-k)+i}$, if T is odd
for each i, $0 \le i < m$, where indices are reduced modulo m.
Inversion: If a is a nonzero element in $F_{2^m}$, then the inverse of a, denoted $a^{-1}$, is the unique element $c \in F_{2^m}$ such that $a \cdot c = c \cdot a = 1$.
2.6 Elliptic Curves over Fp
An elliptic curve $E(F_p)$ over a finite field $F_p$ is defined by the parameters $a, b \in F_p$, which satisfy the relation $4a^3 + 27b^2 \neq 0 \pmod p$. It consists of the set of points (x, y), with $x, y \in F_p$, satisfying the equation $y^2 = x^3 + ax + b$, together with the point O, which is the point at infinity and the identity element under addition.
An addition operation is defined over $E(F_p)$, and it can be shown that $E(F_p)$ forms an abelian group under addition.
The addition operation in $E(F_p)$ is specified as follows.
$P + O = O + P = P$, $\forall P \in E(F_p)$
If $P = (x, y) \in E(F_p)$, then $(x, y) + (x, -y) = O$. (The point $(x, -y) \in E(F_p)$ is called the negative of P and is denoted –P.)
If $P = (x_1, y_1) \in E(F_p)$ and $Q = (x_2, y_2) \in E(F_p)$ and $P \neq \pm Q$, then $R = P + Q = (x_3, y_3) \in E(F_p)$, where $x_3 = \lambda^2 - x_1 - x_2$, $y_3 = \lambda(x_1 - x_3) - y_1$, and $\lambda = (y_2 - y_1)/(x_2 - x_1)$. Geometrically, the straight line passing through both points meets the curve in a third point, and R is the reflection of that point in the x-axis.
Let $P = (x, y) \in E(F_p)$ with $y \neq 0$. Then the point $Q = P + P = 2P = (x_1, y_1) \in E(F_p)$, where $x_1 = \lambda^2 - 2x$, $y_1 = \lambda(x - x_1) - y$, and $\lambda = (3x^2 + a)/(2y)$. This operation is also called doubling of a point. Geometrically, the tangent at P meets the elliptic curve in one further point, and 2P is the reflection of that point in the x-axis.
We can notice that addition over $E(F_p)$ requires one inversion, two multiplications, one squaring and six additions. Similarly, doubling a point on $E(F_p)$ requires one inversion, two multiplications, two squarings and eight additions.
Consider the set $E(F_p)$ under addition. We can see that
$\forall P, Q \in E(F_p)$, if R = P + Q, then $R \in E(F_p)$ (Closure)
$P + (Q + R) = (P + Q) + R$, $\forall P, Q, R \in E(F_p)$ (Associative)
$\exists O \in E(F_p)$ such that $\forall P \in E(F_p)$, P + O = O + P = P (Identity element)
$\forall P \in E(F_p)$, $\exists -P \in E(F_p)$ such that P + (–P) = (–P) + P = O (Inverse element)
$\forall P, Q \in E(F_p)$, P + Q = Q + P (Commutative)
Thus we see that $E(F_p)$ forms an abelian group under addition.
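The addition and doubling rules of this section can be tried out on a small curve. The following Python sketch assumes the toy curve $y^2 = x^3 + 2x + 2$ over $F_{17}$ and the point (5, 1); these values are chosen here only for illustration and are far too small for real use. It relies on the three-argument `pow` with exponent -1 (Python 3.8 or later) for the field inversion.

```python
P_MOD, A, B = 17, 2, 2      # toy curve y^2 = x^3 + 2x + 2 over F_17 (assumed example)
O = None                    # point at infinity

def add(P, Q):
    """Add two points of E(F_p) with the chord-and-tangent formulas."""
    if P is O:
        return Q
    if Q is O:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return O                                                      # Q = -P
    if P == Q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD      # tangent slope
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD   # chord slope
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)

G = (5, 1)
print(add(G, G))            # doubling: (6, 3)
print(add(G, add(G, G)))    # addition of distinct points: (10, 6)
```

Each call performs exactly one modular inversion, which is the dominant cost noted above and the main motivation for projective coordinates in many implementations.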
2.7 Elliptic curves over F2m
An elliptic curve $E(F_{2^m})$ over a finite field $F_{2^m}$ is defined by the parameters $a, b \in F_{2^m}$ with $b \neq 0$. It consists of the set of points (x, y), with $x, y \in F_{2^m}$, satisfying the equation $y^2 + xy = x^3 + ax^2 + b$, together with the point O, which is the point at infinity and the identity element under addition.
Similar to $E(F_p)$, addition is defined over $E(F_{2^m})$, and we can similarly verify that $E(F_{2^m})$ forms an abelian group under addition.
The addition operation in $E(F_{2^m})$ is specified as follows.
$P + O = O + P = P$, $\forall P \in E(F_{2^m})$
If $P = (x, y) \in E(F_{2^m})$, then $(x, y) + (x, x + y) = O$. (The point $(x, x + y) \in E(F_{2^m})$ is called the negative of P and is denoted –P.)
If $P = (x_1, y_1) \in E(F_{2^m})$ and $Q = (x_2, y_2) \in E(F_{2^m})$ and $P \neq \pm Q$, then $R = P + Q = (x_3, y_3) \in E(F_{2^m})$, where $x_3 = \lambda^2 + \lambda + x_1 + x_2 + a$, $y_3 = \lambda(x_1 + x_3) + x_3 + y_1$, and $\lambda = (y_1 + y_2)/(x_1 + x_2)$. Geometrically, the straight line passing through both points meets the curve in a third point, and R is the negative of that point.
Let $P = (x, y) \in E(F_{2^m})$ with $x \neq 0$. Then the point $Q = P + P = 2P = (x_1, y_1) \in E(F_{2^m})$, where $x_1 = \lambda^2 + \lambda + a$, $y_1 = \lambda(x + x_1) + x_1 + y$, and $\lambda = x + (y/x)$. This operation is also called doubling of a point. Geometrically, the tangent at P meets the elliptic curve in one further point, and 2P is the negative of that point.
We can notice that addition over $E(F_{2^m})$ requires one inversion, two multiplications, one squaring and eight additions. Similarly, doubling a point on $E(F_{2^m})$ requires one inversion, two multiplications, one squaring and six additions.
Similar to $E(F_p)$, consider addition over $E(F_{2^m})$:
$\forall P, Q \in E(F_{2^m})$, if R = P + Q, then $R \in E(F_{2^m})$ (Closure)
$P + (Q + R) = (P + Q) + R$, $\forall P, Q, R \in E(F_{2^m})$ (Associative)
$\exists O \in E(F_{2^m})$ such that $\forall P \in E(F_{2^m})$, P + O = O + P = P (Identity element)
$\forall P \in E(F_{2^m})$, $\exists -P \in E(F_{2^m})$ such that P + (–P) = (–P) + P = O (Inverse)
$\forall P, Q \in E(F_{2^m})$, P + Q = Q + P (Commutative)
Thus we see that $E(F_{2^m})$ forms an abelian group under addition.
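As a hedged software illustration of the binary-field group law, the sketch below uses the standard non-supersingular form $y^2 + xy = x^3 + ax^2 + b$ over the toy field $F_{2^4}$ (reduction polynomial $x^4 + x + 1$), in which the negative of (x, y) is (x, x + y). The curve coefficients `A` and `B` are arbitrary assumptions made for the example, and the field routines are simple reference code rather than the thesis architecture.

```python
M, F_POLY = 4, 0b10011        # toy field F_{2^4} with f(x) = x^4 + x + 1
A, B = 0b1000, 0b1001         # hypothetical curve coefficients, B != 0
O = None                      # point at infinity

def gf_mul(a, b):
    """Multiply two field elements modulo f(x)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= F_POLY
    return result

def gf_inv(a):
    """Inverse of a nonzero element as a^(2^m - 2)."""
    result = 1
    for _ in range(2**M - 2):
        result = gf_mul(result, a)
    return result

def on_curve(P):
    if P is O:
        return True
    x, y = P
    x2 = gf_mul(x, x)
    return gf_mul(y, y) ^ gf_mul(x, y) == gf_mul(x2, x) ^ gf_mul(A, x2) ^ B

def neg(P):
    return O if P is O else (P[0], P[0] ^ P[1])

def add(P, Q):
    """Group law on the binary curve; field addition is XOR."""
    if P is O:
        return Q
    if Q is O:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2:
        if y1 ^ y2 == x1:                       # Q = -P (covers x1 = 0 as well)
            return O
        lam = x1 ^ gf_mul(y1, gf_inv(x1))       # doubling slope x + y/x
        x3 = gf_mul(lam, lam) ^ lam ^ A
    else:
        lam = gf_mul(y1 ^ y2, gf_inv(x1 ^ x2))  # chord slope
        x3 = gf_mul(lam, lam) ^ lam ^ x1 ^ x2 ^ A
    y3 = gf_mul(lam, x1 ^ x3) ^ x3 ^ y1
    return (x3, y3)

points = [O] + [(x, y) for x in range(16) for y in range(16) if on_curve((x, y))]
print(len(points))            # curve order #E for this toy curve
```

Because the field is tiny, closure, commutativity and the inverse property can be confirmed exhaustively over `points`.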
Scalar Multiplication: Given an integer k and a point P on the elliptic curve, the elliptic scalar multiplication kP is the sum of k copies of the point P.
Order: The order of a point P on the elliptic curve is the smallest positive integer r such that rP = O. Further, if c and d are integers, then cP = dP iff $c \equiv d \pmod r$.
Curve Order: The number of points on the elliptic curve is called its curve order and is denoted #E.
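These definitions can be illustrated on a small example. The sketch below assumes the toy curve $y^2 = x^3 + 2x + 2$ over $F_{17}$ with the point (5, 1), values picked only for this illustration; `order` and `mul` are hypothetical helper names.

```python
P_MOD, A, B = 17, 2, 2        # toy curve y^2 = x^3 + 2x + 2 over F_17 (assumed example)
O = None                      # point at infinity

def add(P, Q):
    """Chord-and-tangent addition on E(F_p)."""
    if P is O:
        return Q
    if Q is O:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return O
    if P == Q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return (x3, (lam * (x1 - x3) - y1) % P_MOD)

def mul(k, P):
    """Scalar multiplication kP as the sum of k copies of P (naive version)."""
    R = O
    for _ in range(k):
        R = add(R, P)
    return R

def order(P):
    """Smallest positive r with rP = O, found by repeated addition."""
    r, R = 1, P
    while R is not O:
        R = add(R, P)
        r += 1
    return r

G = (5, 1)
print(order(G))                       # 19
print(mul(3, G) == mul(22, G))        # True, since 3 and 22 are congruent modulo 19
```

On this curve the point order equals the curve order #E = 19, so the point generates the whole group.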
2.8 Elliptical Curve Discrete Logarithm Problem
The strength of Elliptic Curve Cryptography lies in the Elliptic Curve Discrete Logarithm Problem (ECDLP). The statement of ECDLP is as follows.
Let E be an elliptic curve and $P \in E$ be a point of order n. Given a point $Q \in E$ with Q = mP, for a certain $m \in \{2, 3, \ldots, n - 2\}$, find the m for which the above equation holds.
When E and P are properly chosen, the ECDLP is thought to be infeasible. Note that for m = 0, 1 and n – 1, Q takes the values O, P and –P respectively. One of the conditions is that the order of P, i.e. n, be large so that it is infeasible to check all the possibilities of m.
The difference between the ECDLP and the Discrete Logarithm Problem (DLP) in a finite field is that the DLP, though a hard problem, is known to have a sub-exponential time solution, whereas no sub-exponential algorithm is known for the ECDLP on properly chosen curves. A DLP instance can therefore be solved faster than an ECDLP instance of comparable size. This property makes elliptic curves favorable for use in cryptography.
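To make the role of a large n concrete, the following sketch solves the ECDLP by exhaustive search on a toy curve ($y^2 = x^3 + 2x + 2$ over $F_{17}$, base point (5, 1) of order 19, all assumed here for illustration). The search takes up to n point additions, which is trivial for n = 19 and hopeless for the orders used in practice.

```python
P_MOD, A, B = 17, 2, 2        # toy curve y^2 = x^3 + 2x + 2 over F_17 (assumed example)
G, N = (5, 1), 19             # base point and its order on the toy curve
O = None                      # point at infinity

def add(P, Q):
    """Chord-and-tangent addition on E(F_p)."""
    if P is O:
        return Q
    if Q is O:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return O
    if P == Q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return (x3, (lam * (x1 - x3) - y1) % P_MOD)

def mul(k, P):
    """Compute kP by double-and-add."""
    R = O
    while k > 0:
        if k & 1:
            R = add(R, P)
        P = add(P, P)
        k >>= 1
    return R

def brute_force_dlog(P, Q):
    """Try m = 0, 1, 2, ... until mP = Q; feasible only because n is tiny."""
    R, m = O, 0
    while R != Q:
        R = add(R, P)
        m += 1
        if R is O:
            raise ValueError("Q is not a multiple of P")
    return m

Q = mul(7, G)
print(brute_force_dlog(G, Q))   # 7
```

Generic attacks such as baby-step giant-step or Pollard's rho reduce the work to roughly the square root of n, which is why n is chosen with hundreds of bits.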
2.9 Application of Elliptical Curves in Key Exchange
2.9.1 Elliptic Curve Cryptography (ECC) domain parameters
The public key cryptographic systems involve arithmetic operations on an elliptic curve over a finite field, which is determined by the elliptic curve domain parameters.
The ECC domain parameters over $F_q$ are defined by the septuple given below
D = (q, FR, a, b, G, n, h), where
q: prime power, that is q = p or $q = 2^m$, where p is a prime
FR: field representation, i.e. the method used for representing the elements of $F_q$
a, b: field elements that specify the equation of the elliptic curve E over $F_q$, $y^2 = x^3 + ax + b$ when q = p (and $y^2 + xy = x^3 + ax^2 + b$ when $q = 2^m$)
G: a base point $G = (x_g, y_g)$ on $E(F_q)$
n: order of the point G, that is, n is the smallest positive integer such that nG = O
h: cofactor, equal to the ratio $\#E(F_q)/n$, where $\#E(F_q)$ is the curve order
The primary security parameter in ECC is n; therefore the length of an ECC key is the bit length of n. For a comparable key length, ECC keys offer much higher security than those of other public-key cryptosystems. That is, for equivalent security, an ECC key is much shorter than the keys of other cryptosystems.
2.9.2 Elliptic Curve protocols
Generally in the process of encryption and decryption, we have two entities, one at the encryption side and the other at the decryption side. Let us assume that Alice is the person who is encrypting and Bob is the person decrypting.
Key generation: Alice’s (or Bob’s) public and private keys are associated with a particular set of elliptic curve domain parameters (q, FR, a, b, G, n, h).
Alice generates the public and private keys as follows
1. Select a random number d, $d \in [1, n - 1]$
2. Compute Q = dG.
3. Alice’s public key is Q and private key is d.
It should be noted that a public key needs to be validated to ensure that it satisfies the arithmetic requirements of an elliptic curve public key. A public key $Q = (x_q, y_q)$ associated with the domain parameters (q, FR, a, b, G, n, h) is validated using the following procedure
1. Check that $Q \neq O$
2. Check that $x_q$ and $y_q$ are properly represented elements of $F_q$
3. Check that Q lies on the elliptic curve defined by a and b.
4. Check that nQ = O
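The key generation and validation steps above can be sketched as follows. The domain parameters are toy values assumed for illustration ($y^2 = x^3 + 2x + 2$ over $F_{17}$, G = (5, 1), n = 19), and `generate_keypair` and `validate_public_key` are hypothetical names, not an API from the thesis.

```python
import secrets

P_MOD, A, B = 17, 2, 2        # toy curve y^2 = x^3 + 2x + 2 over F_17 (assumed example)
G, N = (5, 1), 19             # base point and its order on the toy curve
O = None                      # point at infinity

def add(P, Q):
    """Chord-and-tangent addition on E(F_p)."""
    if P is O:
        return Q
    if Q is O:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return O
    if P == Q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return (x3, (lam * (x1 - x3) - y1) % P_MOD)

def mul(k, P):
    """Compute kP by double-and-add."""
    R = O
    while k > 0:
        if k & 1:
            R = add(R, P)
        P = add(P, P)
        k >>= 1
    return R

def generate_keypair():
    d = secrets.randbelow(N - 1) + 1          # step 1: d in [1, n - 1]
    return d, mul(d, G)                       # steps 2 and 3: Q = dG

def validate_public_key(Q):
    if Q is O:                                                 # check 1
        return False
    x, y = Q
    if not (0 <= x < P_MOD and 0 <= y < P_MOD):                # check 2
        return False
    if (y * y - (x * x * x + A * x + B)) % P_MOD != 0:         # check 3
        return False
    return mul(N, Q) is O                                      # check 4

d, Q = generate_keypair()
print(validate_public_key(Q))        # True
print(validate_public_key((5, 2)))   # False: the point is not on the curve
```

Check 4 is the expensive one, since it costs a full scalar multiplication; when the cofactor h is 1 it follows from the other three.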
2.9.3 Elliptic Curve Digital Signature Authentication (ECDSA)
Alice, with domain parameters D = (q, FR, a, b, G, n, h), public key Q and private key d, performs the following steps to sign the message m
Step 1: Select a random number $k \in [1, n - 1]$
Step 2: Compute the point kG = (x, y) and r = x mod n; if r = 0, go to Step 1
Step 3: Compute $t = k^{-1} \bmod n$
Step 4: Compute e = SHA-1(m), where SHA-1 denotes the 160-bit hash function
Step 5: Compute $s = t(e + d \cdot r) \bmod n = k^{-1}(e + d \cdot r) \bmod n$; if s = 0, go to Step 1
Step 6: The signature of message m is the pair (r, s)
Step 7: Alice sends Bob the message m and her signature (r, s)
To verify Alice’s signature, Bob does the following (note that Bob knows the domain parameters D and Alice’s public key Q)
Step 1: Verify that r and s are integers in the range [1, n – 1]
Step 2: Compute e = SHA-1(m)
Step 3: Compute $w = s^{-1} \bmod n$
Step 4: Compute $u_1 = e \cdot w \bmod n$ and $u_2 = r \cdot w \bmod n$
Step 5: Compute the point $X = (x_1, y_1) = u_1 G + u_2 Q$
Step 6: If X = O, then reject the signature; else compute $v = x_1 \bmod n$
Step 7: Accept Alice’s signature iff v = r
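The signing and verification steps can be exercised end to end on a toy curve. The sketch below assumes $y^2 = x^3 + 2x + 2$ over $F_{17}$ with G = (5, 1) and n = 19, uses SHA-1 from `hashlib` as in the steps above, and reduces the hash modulo n instead of truncating it to the bit length of n; all function names are illustrative, and a 19-element group offers no security.

```python
import hashlib
import secrets

P_MOD, A, B = 17, 2, 2        # toy curve y^2 = x^3 + 2x + 2 over F_17 (assumed example)
G, N = (5, 1), 19             # base point and its order on the toy curve
O = None                      # point at infinity

def add(P, Q):
    """Chord-and-tangent addition on E(F_p)."""
    if P is O:
        return Q
    if Q is O:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return O
    if P == Q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return (x3, (lam * (x1 - x3) - y1) % P_MOD)

def mul(k, P):
    """Compute kP by double-and-add."""
    R = O
    while k > 0:
        if k & 1:
            R = add(R, P)
        P = add(P, P)
        k >>= 1
    return R

def hash_to_int(message):
    return int.from_bytes(hashlib.sha1(message).digest(), "big")

def sign(message, d):
    e = hash_to_int(message)
    while True:
        k = secrets.randbelow(N - 1) + 1          # step 1
        x, _ = mul(k, G)                          # step 2
        r = x % N
        if r == 0:
            continue
        s = pow(k, -1, N) * (e + d * r) % N       # steps 3 to 5
        if s != 0:
            return r, s

def verify(message, signature, Q):
    r, s = signature
    if not (1 <= r < N and 1 <= s < N):           # step 1
        return False
    e = hash_to_int(message)                      # step 2
    w = pow(s, -1, N)                             # step 3
    u1, u2 = e * w % N, r * w % N                 # step 4
    X = add(mul(u1, G), mul(u2, Q))               # step 5
    if X is O:                                    # step 6
        return False
    return X[0] % N == r                          # step 7

d = 7                          # example private key
Q = mul(d, G)
signature = sign(b"hello", d)
print(verify(b"hello", signature, Q))   # True
```

Note that each signature needs a fresh random k: reusing k for two messages lets anyone recover d from the two (r, s) pairs.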
Figure 2.2 Illustration of Elliptic Curve Digital Signature Algorithm
Proof for verification
If the message is indeed signed by Alice, then $s = k^{-1}(e + d \cdot r) \bmod n$.
That is, $k \equiv s^{-1}(e + d \cdot r) \equiv s^{-1} e + s^{-1} d \cdot r \equiv w \cdot e + w \cdot d \cdot r \equiv u_1 + u_2 \cdot d \pmod n$ ……[1]
Now consider $u_1 G + u_2 Q = u_1 G + u_2 dG = (u_1 + u_2 \cdot d)G = kG$, from [1].
In step 6 of the verification process, we have $v = x_1 \bmod n$, where the point $X = (x_1, y_1) = u_1 G + u_2 Q$. Thus we see that v = r, since r = x mod n, x is the x coordinate of the point kG, and we have already seen that $u_1 G + u_2 Q = kG$.
2.9.4 Elliptic Curve Authentication Encryption Scheme (ECAES)
Alice has the domain parameters D = (q, FR, a, b, G, n, h) and public key Q. Bob has the domain parameters D. Bob’s public key is $Q_B$ and private key is $d_B$. The ECAES mechanism is as follows.
Alice performs the following steps
Step 1: Selects a random integer r in [1, n – 1]
Step 2: Computes R = rG
Step 3: Computes $K = h \cdot r \cdot Q_B = (K_x, K_y)$ and checks that $K \neq O$
Step 4: Computes keys $k_1 \| k_2 = KDF(K_x)$, where KDF is a key derivation function, which derives cryptographic keys from a shared secret
Step 5: Computes $c = ENC_{k_1}(m)$, where m is the message to be sent and ENC is a symmetric encryption algorithm
Step 6: Computes $t = MAC_{k_2}(c)$, where MAC is a message authentication code
Step 7: Sends (R, c, t) to Bob
To decrypt a cipher text, Bob performs the following steps
Step 1: Performs a partial key validation on R (checks that $R \neq O$, that the coordinates of R are properly represented elements in $F_q$, and that R lies on the elliptic curve defined by a and b)
Step 2: Computes $K_B = h \cdot d_B \cdot R = (K_x, K_y)$ and checks that $K_B \neq O$
Step 3: Computes $k_1 \| k_2 = KDF(K_x)$
Step 4: Verifies that $t = MAC_{k_2}(c)$
Step 5: Computes $m = ENC^{-1}_{k_1}(c)$
We can see that $K = K_B$, since $K = h \cdot r \cdot Q_B = h \cdot r \cdot d_B \cdot G = h \cdot d_B \cdot r \cdot G = h \cdot d_B \cdot R = K_B$.
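The scheme can be sketched in a few dozen lines once concrete primitives are chosen. In the sketch below every primitive is an assumption made for illustration: the toy curve $y^2 = x^3 + 2x + 2$ over $F_{17}$ (G = (5, 1), n = 19, h = 1), SHA-256 as the KDF, a SHA-256 keystream XOR as ENC, and HMAC-SHA256 as the MAC. The thesis does not prescribe these choices.

```python
import hashlib
import hmac
import secrets

P_MOD, A, B = 17, 2, 2        # toy curve y^2 = x^3 + 2x + 2 over F_17 (assumed example)
G, N, H = (5, 1), 19, 1       # base point, its order, and the cofactor
O = None                      # point at infinity

def add(P, Q):
    """Chord-and-tangent addition on E(F_p)."""
    if P is O:
        return Q
    if Q is O:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return O
    if P == Q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return (x3, (lam * (x1 - x3) - y1) % P_MOD)

def mul(k, P):
    """Compute kP by double-and-add."""
    R = O
    while k > 0:
        if k & 1:
            R = add(R, P)
        P = add(P, P)
        k >>= 1
    return R

def kdf(kx):
    """Derive (k1, k2) from the shared x-coordinate; SHA-256 stands in for the KDF."""
    digest = hashlib.sha256(kx.to_bytes(2, "big")).digest()
    return digest[:16], digest[16:]

def stream_xor(key, data):
    """Toy symmetric cipher: XOR with a hash-derived keystream (its own inverse)."""
    stream, counter = b"", 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(x ^ y for x, y in zip(data, stream))

def encrypt(message, Q_B):
    r = secrets.randbelow(N - 1) + 1                     # step 1
    R = mul(r, G)                                        # step 2
    K = mul(H * r, Q_B)                                  # step 3
    if K is O:
        raise ValueError("invalid recipient key")
    k1, k2 = kdf(K[0])                                   # step 4
    c = stream_xor(k1, message)                          # step 5
    t = hmac.new(k2, c, hashlib.sha256).digest()         # step 6
    return R, c, t                                       # step 7

def decrypt(R, c, t, d_B):
    if R is O or (R[1] ** 2 - (R[0] ** 3 + A * R[0] + B)) % P_MOD != 0:
        raise ValueError("partial key validation failed")    # step 1
    K_B = mul(H * d_B, R)                                     # step 2
    if K_B is O:
        raise ValueError("invalid shared point")
    k1, k2 = kdf(K_B[0])                                      # step 3
    expected = hmac.new(k2, c, hashlib.sha256).digest()
    if not hmac.compare_digest(t, expected):                  # step 4
        raise ValueError("MAC check failed")
    return stream_xor(k1, c)                                  # step 5

d_B = 5                        # Bob's example private key
Q_B = mul(d_B, G)
R, c, t = encrypt(b"attack at dawn", Q_B)
print(decrypt(R, c, t, d_B))   # b'attack at dawn'
```

The MAC is checked before decryption, so a modified ciphertext is rejected without ever being decrypted.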
Figure 2.3 Illustration of Elliptic Curve Authentication Encryption Scheme
2.10 Algorithms for Elliptic Scalar Multiplication
In all the protocols that were discussed (ECDH, ECDSA, ECAES), the most time-consuming part of the computation is the scalar multiplication, that is, a calculation of the form
Q = kP = P + P + … + P (k times)
Here P is a curve point and k is an integer in the range [1, n – 1], where n is the order of P. P is either a fixed point that generates a large, prime-order subgroup of $E(F_q)$, or an arbitrary point in such a subgroup. Elliptic curves have some properties that allow optimization of scalar multiplications. The following sections describe some efficient algorithms for computing kP.
2.11 Hierarchy of Elliptic Curve Cryptography
Elliptic curve cryptosystems have a layered hierarchy as shown in Figure 2.4. The bottom layer, which constitutes the arithmetic on the underlying finite field, most prominently influences the area and critical delay of the overall implementation.
Figure 2.4 Hierarchy of Elliptic Curve Cryptography
2.11.1 Point Multiplication
Scalar point multiplication is a building block of all elliptic curve cryptosystems. It is an operation of the form k.P, where ‘P’ is a point on the elliptic curve and ‘k’ is a positive integer. Computing k.P means adding the point ‘P’ to itself k – 1 times, which results in another point ‘Q’ on the elliptic curve. Point multiplication uses two basic elliptic curve operations:
1- Point addition (adding two distinct points to find another point)
2- Point doubling (adding a point P to itself to find another point)
For example, to calculate kP = Q with k = 23, we have kP = 23P = 2(2(2(2P) + P) + P) + P, so point addition and point doubling are used repeatedly to get the result (Tata, 2007).
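The 23P example corresponds to the left-to-right double-and-add method: scan the bits of k from the most significant end, double for every bit and add P whenever the bit is 1. The following sketch assumes a toy curve ($y^2 = x^3 + 2x + 2$ over $F_{17}$, P = (5, 1) of order 19) purely for illustration and also counts the group operations.

```python
P_MOD, A, B = 17, 2, 2        # toy curve y^2 = x^3 + 2x + 2 over F_17 (assumed example)
O = None                      # point at infinity

def add(P, Q):
    """Chord-and-tangent addition on E(F_p); handles doubling when P == Q."""
    if P is O:
        return Q
    if Q is O:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return O
    if P == Q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return (x3, (lam * (x1 - x3) - y1) % P_MOD)

def scalar_mult(k, P):
    """Left-to-right double-and-add for k >= 1; returns (kP, doublings, additions)."""
    bits = bin(k)[2:]             # binary expansion, most significant bit first
    R, doublings, additions = P, 0, 0
    for bit in bits[1:]:
        R = add(R, R)             # point doubling for every remaining bit
        doublings += 1
        if bit == "1":
            R = add(R, P)         # point addition only for 1 bits
            additions += 1
    return R, doublings, additions

P = (5, 1)
print(scalar_mult(23, P))         # ((3, 1), 4, 3)
```

For k = 23 = 10111 in binary this takes 4 doublings and 3 additions instead of 22 additions, and in general about $\log_2 k$ doublings plus one addition per nonzero bit.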
2.11.2 Point Addition
Suppose that P and Q are two distinct points on an elliptic curve, and that P is not –Q. To add the points P and Q, a line is drawn through the two points. This line will intersect the elliptic curve in exactly one more point, called –R. The point –R is reflected in the x-axis to the point R. The law for addition in an elliptic curve group is P + Q = R. For example:
[Figure 2.4 layers, top to bottom: EC Primitives and Protocols; Scalar Multiplication (Karatsuba, Montgomery, ...); Elliptic Curve Group Operations (Point Addition and Point Doubling); Finite Field Operations (Addition, Multiplication, Inversion and Squaring)]
Figure 2.5 Point addition
2.11.3 Point Doubling
To add a point P to itself, a tangent line to the curve is drawn at the point P. If $y_P$ is not 0, then the tangent line intersects the elliptic curve at exactly one other point, –R. –R is reflected in the x-axis to R. This operation is called doubling the point P; the law for doubling a point on an elliptic curve group is defined by P + P = 2P = R:
Figure 2.6 Point Doubling
2.12 Hardware Accelerator
General purpose processors are not optimized for cryptographic arithmetic [4]. They also cannot provide the amount of parallelism required to compute the field arithmetic in scalar multiplication, on which elliptic curve based cryptographic systems depend. This results in degraded performance when compared to a hardware implementation. It is, therefore, important to use a hardware implementation to avoid such drawbacks. This can be done with two different hardware technologies.
They are:
I. Application Specific Integrated Circuits (ASICs)
II. Field Programmable Gate Arrays (FPGAs)
ASICs are typically used when a design is to be mass-produced or when performance is of the utmost importance. FPGAs, on the other hand, lend themselves nicely to research work where a design is being prototyped. The following attributes of the FPGA design flow are particularly advantageous.
a. Relatively small initial setup cost: A single FPGA is inexpensive when compared to
the manufacturing cost of an ASIC design.
b. Simplified implementation flow: In most cases, the FPGA vendor will provide a fully
integrated tool flow. This flow will have been fully tested for compatibility with the
FPGA and as a result fewer tool related problems can be expected.
c. Fast turnaround time: An FPGA can be programmed in less than a minute and can also
be reprogrammed many times. An ASIC on the other hand may take months to
fabricate.
d. Simplified integration: Whether using an ASIC or FPGA design flow, the design must be integrated into a hardware/software system. It is common for FPGAs to be sold within such a system, minimizing the integration task required of the designer.
FPGAs are reconfigurable devices offering parallelism and flexibility on one hand while being
low cost and easy to use on the other. Moreover, they have much shorter design cycle times
compared to ASICs. FPGAs were initially used as prototyping devices and in high
performance scientific applications, but the short time-to-market and on-site reconfigurability
features have expanded their application space.
These devices can now be found in various consumer electronic devices, high performance
networking applications, medical electronics and space applications. The reconfigurability
aspect of FPGAs also makes them suited for cryptography applications. Reconfigurability results in flexible implementations, allowing operating modes, encryption algorithms, curve constants, etc. to be configured. FPGAs do not require sophisticated equipment for production; they can be programmed in house. This is beneficial for cryptography as no untrusted party is involved in the production cycle (Chester Rebeiro, 2008).
2.13 FPGA Architecture
There are two main parts of the FPGA chip: the input/output (I/O) blocks and the core. The
I/O blocks are located around the periphery of the chip and are used to provide programmable
connectivity to the chip. The core of the chip consists of programmable logic blocks and
programmable routing architectures.
A popular architecture for the core, called island style architecture, is shown in Figure 2.7 below. Logic blocks, also called configurable logic blocks (CLBs), consist of logic circuitry for implementing logic. Each CLB is surrounded by routing channels connected through switch blocks and connection blocks. A switch block connects wires in adjacent channels through programmable switches.
Fig 2.7 FPGA Architecture
Logic blocks and interconnects can be programmed by the designer, after the FPGA is manufactured, to implement any logical function, hence the name “field-programmable”. FPGAs are usually slower than their application-specific integrated circuit (ASIC) counterparts, cannot accommodate designs of the same complexity, and draw more power (for any given semiconductor process). But their advantages include a shorter time to market, the ability to be re-programmed in the field to fix bugs, and lower non-recurring engineering costs.
Through the years, FPGA features have improved and their density has grown. Current FPGAs have embedded processors, gigabit serial transceivers, clock managers, analog-to-digital converters, dedicated digital signal processing blocks, Ethernet controllers, substantial memory capacity, and other dedicated functional blocks beyond the basic arrays of simple logic elements they started out with in the mid-1980s. The current high density of FPGAs allows complete systems (System-on-Chip or SoC) to be implemented on them.
In addition, the reconfiguration capability of FPGAs has increased. The main advantage of these devices, and the design opportunities they offer, reside in the way the reconfiguration is performed. FPGA reconfiguration is based on SRAM (Static Random Access Memory) technology. The configuration of the device is determined by data stored in the configuration memory. This content determines the interconnection among the configurable blocks and the function these blocks perform. Usually, the configuration memory stores just one configuration (one-context), but some devices can store more than one (multi-context). SRAM is volatile, so the FPGA must normally be configured from an external nonvolatile memory each time it is powered up.
Figure 2.8 Inter Architecture of FPGA
2.13.1 Look-Up Table
The way logic functions are implemented in an FPGA is another key feature. The logic blocks that carry out logical functions are look-up tables (LUTs), implemented as memory, or as a multiplexer and memory. Figure 2.8 shows this internal architecture of common FPGAs, together with the internal components used for some basic operations.
2.13.2 Configurable Logic Blocks (CLBs)
The basic building block of Xilinx CLBs is the slice. Virtex and Spartan II hold two slices in one CLB, while Virtex II and Spartan III hold four slices per CLB. Each slice contains two 4-input function generators (F/G), carry logic, and two storage elements. Each function generator output drives both the CLB output and the D-input of a flip-flop. Besides the four basic function generators, the Virtex/Spartan II CLB contains logic that combines function generators to provide functions of five or six inputs. The look-up tables and storage elements of the CLB have the following characteristics:
i. Look-Up Tables (LUTs): Xilinx function generators are implemented as 4-input look-
up tables. Beyond operating as a function generator, each LUT can be programmed as
a (16x1)-bit synchronous RAM. Furthermore, the two LUTs can be combined within
a slice to create a (16x2)-bit or (32x1)-bit synchronous RAM, or a (16x1)-bit dual-port
synchronous RAM. Finally, the LUT can also provide a 16-bit shift register, ideal for
capturing high-speed data.
ii. Storage Elements: The storage elements in a slice can be configured either as edge-
triggered D-type flip-flops or as level-sensitive latches. The D-inputs can be driven
either by the function generators within the slice or directly from the slice inputs,
bypassing the function generators. As well as clock and clock enable signals, each
slice has synchronous set and reset signals.
2.13.3 Input/Output Blocks (IOBs)
The Xilinx IOB includes inputs and outputs that support a wide variety of I/O signaling
standards. The IOB storage elements act either as D-type flip-flops or as latches. For each
flip-flop, the set/reset (SR) signals can be independently configured as synchronous set,
synchronous reset, asynchronous preset, or asynchronous clear. Pull-up and pull-down
resistors and an optional weak-keeper circuit can be attached to each pad. IOBs are
programmable and can be categorized as follows:
a. Input Path: A buffer in the IOB input path routes the input signal either directly to internal logic or through an optional input flip-flop.
b. Output Path: The output path includes a 3-state output buffer that drives the output
signal onto the pad. The output signal can be routed to the buffer directly from the
internal logic or through an optional IOB output flip-flop. The 3-state control of the
output can also be routed directly from the internal logic or through a flip-flop that
provides synchronous enable and disable signals.
2.13.4 RAM Blocks
Xilinx FPGAs incorporate several large RAM memories (block select RAM). These memory
blocks are organized in columns along the chip. The number of blocks, ranging from 8 up to
more than 100, depends on the device size and family. In Virtex/Spartan II, each block is a
fully synchronous dual-ported 4096-bit RAM, with independent control signals for each port.
The data width of the two ports can be configured independently. In Virtex II/Spartan III, each
block provides 18-kbit storage.
2.13.5 Programmable Routing
Adjacent to each CLB stands a general routing matrix (GRM). The GRM is a switch matrix
through which resources are connected; the GRM is also the means by which the CLB gains
access to the general-purpose routing. Horizontal and vertical routing resources for each row
or column include:
i. Long Lines: bidirectional wires that distribute signals across the device. Vertical and horizontal long lines span the full height and width of the device.
ii. Hex Lines: route signals to every third or sixth block away in all four directions.
iii. Double Lines: route signals to every first or second block away in all four directions.
iv. Direct Lines: route signals to neighboring blocks vertically, horizontally, and diagonally.
v. Fast Lines: internal CLB local interconnections from LUT outputs to LUT inputs.
The routing performance factor of internal signals is the longest delay path, which limits the speed of any worst-case design. Consequently, the Xilinx routing architecture and its place-and-route software were defined in a single optimization process. Xilinx devices provide high-speed, low-skew clock distribution. Virtex provides four primary global nets that drive any clock pin, whereas Virtex II has 16 global clock lines, eight per quadrant.
CHAPTER-3
METHODOLOGY
This chapter describes how the thesis work was defined, implemented, and verified, together with the choice of tools and supporting devices. Some remarks on the performance and effectiveness of the tools are also given.
3.1 The Design Cycle
The general design cycle for this work consisted of the following steps:
1. Studying the arithmetic functions.
2. Studying the elliptic curve constructs
3. HDL (VHDL) implementation of arithmetic functions
4. Commitment to a specific implementation of elliptic curve field representation.
5. Design of point multiplication elliptic curve engine.
6. Logic verification of the design.
7. Synthesis and logic optimization.
8. Device specific realization (place and route).
9. Back-annotated verification of the design.
The order of the steps outlined above is approximate. At some points in the project, steps had to be retraced to ensure an optimal or correct implementation. Since not all algorithms can be easily implemented in hardware, careful consideration of the implementation was necessary before committing to a specific option. The initial research into Galois field arithmetic operations and their hardware implementations yielded a few guidelines that aided the choice of Galois field representation and elliptic curve point representation. More specifically, standard basis representation for Galois field arithmetic was chosen and composite architectures were mapped to reconfigurable devices. Furthermore, for the scalar multiplication scheme, a Karatsuba-based multiplication optimization technique was chosen in order to avoid the most complex operation. Thus, at the end of the initial research, a commitment was made to realize the elliptic curve operation with maximum optimization and standard basis representation.
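For readers unfamiliar with the Karatsuba technique mentioned above, the following software sketch shows the idea for polynomials over $F_2$ stored as integers: one product of size n is replaced by three products of size n/2 plus a few XORs. It is only an illustration under assumed names and thresholds; the result is the unreduced product, and it does not represent the hardware multiplier designed in this thesis.

```python
def clmul(a, b):
    """Schoolbook carry-less product of two binary polynomials held as integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def karatsuba(a, b, n):
    """Product of two binary polynomials with fewer than n coefficients (n a power of two)."""
    if n <= 4:                                   # small operands: fall back to schoolbook
        return clmul(a, b)
    half = n // 2
    mask = (1 << half) - 1
    a0, a1 = a & mask, a >> half                 # a = a0 + a1 * x^half
    b0, b1 = b & mask, b >> half                 # b = b0 + b1 * x^half
    low = karatsuba(a0, b0, half)
    high = karatsuba(a1, b1, half)
    # (a0 + a1)(b0 + b1) + low + high gives the middle term a0*b1 + a1*b0
    mid = karatsuba(a0 ^ a1, b0 ^ b1, half) ^ low ^ high
    return low ^ (mid << half) ^ (high << (2 * half))

a, b = 0b1011001110001111, 0b1110000110101011
print(karatsuba(a, b, 16) == clmul(a, b))        # True
```

In characteristic two the subtractions of the integer Karatsuba method become XORs, so the recombination step costs no carries, which suits FPGA logic well.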
The next stage was the actual design of the digital system that realized the elliptic curve group operation. During this stage many revisions were made to better fit the design to a specific device. The Xilinx XC6SLX45 FPGA family of devices was chosen as the target platform.
Figure 3.1 Design Flow Chart
Verification of the design was first performed at the logic level. This step assured correct functionality when all combinatorial and net delays were ignored. Once the design was verified logically, synthesis and optimization were performed. Timing constraints were set for each component, and several iterations were carried out until the constraints were met. The next step was to map, place and route the design onto the reconfigurable device. The choice of a specific device within the Xilinx XC6SLX45 FPGA family depended on the area utilization report obtained through synthesis. Finally, the output of the place and route step was used to perform back-annotated simulation. This step verified correct operation with the net and combinatorial delays that resulted from the place and route process.
3.2 Tools
The results presented in this thesis were obtained by implementing the hardware designs in FPGA
technology. The targeted FPGA is the Spartan-6 XC6SLX45 from Xilinx. The basic logic unit of the
Spartan-6 device is the slice. Each slice (see figure 5.1) consists of four 6-input LUTs and
eight registers, together with embedded multiplexers and, depending on the slice type, carry
logic. This matches the 27,288 LUTs and 54,576 slice registers over 6,822 slices reported in
Chapter 5. (www.Xilinx.com last viewed May 27, 2014:4:00PM)
Configurable Logic Blocks (CLBs) in Spartan-6 FPGAs are made up of two slices. The function
generators are configurable as 6-input look-up tables (LUTs). In a subset of the slices the LUTs
can also be configured as shift registers or as distributed RAM. The storage elements are
edge-triggered D-type flip-flops, some of which can also be configured as level-sensitive
latches. Each CLB connects to a switch matrix to access the general routing resources.
The entire design, with the exception of vendor specific soft macros, was entered in VHDL format.
Once the design was developed in VHDL, the Boolean logic was verified and major timing errors
were checked by simulating the gate level description with the ISim (VHDL/Verilog) simulator.
The next step involved synthesis of the VHDL code with XST (VHDL/Verilog) Version 14.7 in the
XILINX ISE suite 14.7. The output of this step was an optimized netlist describing the gate
level design.
Figure 3.2 FPGA Board
[Figure labels: VHDL Libraries, NGC Netlist, NGD Netlist, UCF]
3.2.1 Simulation and Verification
As previously stated, the design is verified at two points. The first verification is applied to
the initial VHDL design and checks only the logic, without delays. The inputs to this
verification are a test bench written in VHDL and the VHDL model of the design. The test bench
is simulated together with the VHDL design, and the simulation results are then compared against
results from other published works in the same area. The post place and route verification
reuses the test bench with a few modifications, but the VHDL input model is different: here the
VHDL model is the one produced by the XILINX place and route tools.
3.2.2 Synthesis
The XST (VHDL/Verilog) Version 14.7 synthesis tools were used. The documentation that
accompanied these tools was extensive and very helpful. This and other literature helped in
developing script files that could be launched from within the FPGA analyzer. One advantage of
running XST (VHDL/Verilog) Version 14.7 in this way was that multiple jobs could be run
concurrently, resulting in faster turnaround and more time to try different optimization options.
Figure 3.3 Synthesis Flow diagram
3.2.3 Place and Route
The place and route tools were run on the Xilinx implementation workstation. The input to the
place and route tools is the design netlist and constraints files generated by XST
(VHDL/Verilog) Version 14.7, together with an optional user constraints file. The user
constraints have higher priority. The Xilinx implementation may include additional constraints
that relax the clock period or implement the pin assignment. The output of this process is a
bit-stream file that can be used to program the device directly, and a back-annotated design
that can be simulated for timing verification.
Figure 3.4: Place and Route Process
3.2.4 Loading the FPGA board
Once the programming files (.bit files) are generated, the next step is programming the device.
The target device, the Spartan-6 XC6slx45, is connected to the computer running the Xilinx ISE
Design Suite, from which the “iMPACT” tool is launched to initialize the FPGA USB port. After
the cable port is initialized, Karatsuba.bit, ECC_Eryptography.bit and the other related
programming files are loaded to the FPGA. To display the output, a HyperTerminal session on the
computer can be used.
Figure 3.5 Programming the FPGA board
CHAPTER-4
Design and Implementation
As discussed in section 2.11 of chapter two, the efficiency of Elliptic Curve Cryptography
depends heavily on the construction of the computationally intensive operations in the lower
levels of the hierarchy, namely scalar multiplication, the elliptic curve group operations and
the finite field operations. This chapter therefore illustrates the design of the three bottom
layers of Elliptic Curve Cryptography on FPGAs. Furthermore, the control, data, and processing
units are introduced as the basic building blocks of the elliptic curve (EC) implementation.
Figure 4.1 Typical Hierarchy of Elliptic Curve Cryptography
[Figure 4.1 levels, top to bottom: ECC schemes; point scalar multiplication; elliptic point
operations (point addition, point doubling); field operations (addition, subtraction,
multiplication, squaring); large integer arithmetic operations]
Clearly, the finite field operations in Figure 2.4 can be designed into hardware. One
possibility is to accelerate only the finite field arithmetic in hardware and use an
off-the-shelf microprocessor to perform the higher-level functions of elliptic curve point
arithmetic. It is important to note, however, that an efficient finite field multiplier does not
necessarily yield an efficient point multiplier: all layers of the hierarchy in Figure 4.1 need
to be optimized. This is because the parallel execution of field operations that is possible at
the curve operation level in hardware is not available if those operations are implemented in
software.
Moving point addition and doubling, and then point multiplication, to hardware provides a more
efficient ECC processor at the expense of more complexity. In all cases a combination of
efficient algorithms and efficient hardware architectures is required. Our design covers all but
the protocol level of the elliptic curve cryptosystem.
The basic method for computing scalar multiplication, or point multiplication, is the well-known
“add-and-double” method discussed in the literature survey, which requires m point doublings and
m/2 point additions on average. [27] proposed a fast algorithm for point multiplication over
GF(2^m) without pre-computation, based on the Montgomery ladder method [18]. One advantage of
this algorithm is that fewer field multiplications are involved on average than in the
traditional method. Secondly, since projective instead of affine coordinates are used, inversion
is performed only at the coordinate transformation step. In addition, it offers resistance
against simple side-channel attacks, because the same sequence of operations is executed for
every key bit. Therefore, we adopt it for our scalar multiplier [1].
4.1 Karatsuba multiplier
Multiplication is the most costly basic arithmetic function in a finite field. For an extension
field of degree m, m^2 subfield multiplications are required to multiply two values using
traditional polynomial multiplication. It is shown in [12] [17] [24] that this can be reduced
drastically in certain cases. Using a method developed by Karatsuba and Ofman [11], the number
of multiplications can be reduced in exchange for an increased number of additions. As long as
the time ratio for executing a multiplication vs. an addition is high, this tradeoff is
worthwhile.
A basic example of Karatsuba is given here to demonstrate its usefulness.
Given two degree-1 polynomials, A(x) and B(x), we can demonstrate the traditional and
the Karatsuba methods.
$A(x) = a_1 x + a_0$
$B(x) = b_1 x + b_0$
For the traditional method, we must calculate the product of each possible pair of
coefficients:
$D_0 = a_0 b_0$, $D_1 = a_0 b_1$, $D_2 = a_1 b_0$, $D_3 = a_1 b_1$
Now we can calculate the product $C(x) = A(x) \cdot B(x)$ as:
$C(x) = D_3 x^2 + (D_2 + D_1) x + D_0$
The Karatsuba method begins by taking the same two polynomials, and calculating the
following three products:
$E_0 = a_0 b_0$
$E_1 = a_1 b_1$
$E_2 = (a_0 + a_1)(b_0 + b_1)$
These are then used to assemble the result $C(x) = A(x) \cdot B(x)$:
$C(x) = E_1 x^2 + (E_2 - E_1 - E_0) x + E_0$ --- Equation 4.1
We can now look at how many operations are required for each method. The traditional
method requires four multiplications and one addition, while the Karatsuba method
requires three multiplications and four additions. Thus we have traded a single multiplication
for three additions. If the cost to multiply on the target platform is at least three times the
cost to add, then the method is effective. While this basic form of Karatsuba was presented
in the original paper, there are a number of ways this method may be expanded to handle
larger degree polynomials. This is shown in [9], where the authors give an in-depth study of
this method and its variations.
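The operation counts above can be checked with a minimal Python sketch. The function names
schoolbook and karatsuba are illustrative only and are not part of the VHDL design; coefficients
are plain integers here, whereas the thesis works with field elements.

```python
def schoolbook(a, b):
    """Multiply a1*x + a0 by b1*x + b0 with four coefficient products."""
    a0, a1 = a
    b0, b1 = b
    d0, d1, d2, d3 = a0 * b0, a0 * b1, a1 * b0, a1 * b1
    return (d0, d1 + d2, d3)            # coefficients of x^0, x^1, x^2


def karatsuba(a, b):
    """Same product with three coefficient products (Equation 4.1)."""
    a0, a1 = a
    b0, b1 = b
    e0 = a0 * b0
    e1 = a1 * b1
    e2 = (a0 + a1) * (b0 + b1)
    return (e0, e2 - e1 - e0, e1)       # coefficients of x^0, x^1, x^2


# (5x + 3)(2x + 7) = 10x^2 + 41x + 21 with either method
print(schoolbook((3, 5), (7, 2)), karatsuba((3, 5), (7, 2)))
```

Both functions return the coefficient triple (c0, c1, c2); the Karatsuba version spends two
additions forming the operands of E2 and two subtractions recovering the middle coefficient.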
In order to reduce the complexity of polynomial multiplication, the method of Karatsuba is
applied [12].
Classically, the coefficients of the product
$(a_1 x + a_0)(b_1 x + b_0) = a_1 b_1 x^2 + (a_1 b_0 + a_0 b_1) x + a_0 b_0$
are computed from the four input coefficients a0, a1, b0, b1 with four multiplications and one
addition, whereas the Karatsuba formula uses only three multiplications and four additions in
binary fields:
$(a_1 x + a_0)(b_1 x + b_0) = a_1 b_1 x^2 + ((a_1 \oplus a_0)(b_1 \oplus b_0) \oplus a_1 b_1 \oplus a_0 b_0) x + a_0 b_0$ ---Equation 4.2
When the Karatsuba method is applied to larger polynomials, the cost of the extra additions
becomes negligible compared to other multiplication schemes.
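As a hedged illustration of Equation 4.2 applied recursively, the following Python sketch
multiplies two binary polynomials stored as integers (bit i holds the coefficient of x^i). The
names clmul and karatsuba_gf2, the 8-bit recursion cut-off and the 163-bit operand length in the
usage line are assumptions made for this example; they are not taken from the VHDL design.

```python
def clmul(a, b):
    """Schoolbook carry-less product of two GF(2)[x] polynomials stored as ints."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def karatsuba_gf2(a, b, nbits):
    """Karatsuba product in GF(2)[x]; nbits bounds the operand length in bits."""
    if nbits <= 8:                      # small operands: fall back to schoolbook
        return clmul(a, b)
    half = (nbits + 1) // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    lo = karatsuba_gf2(a_lo, b_lo, half)
    hi = karatsuba_gf2(a_hi, b_hi, nbits - half)
    mid = karatsuba_gf2(a_lo ^ a_hi, b_lo ^ b_hi, half)
    mid ^= lo ^ hi                      # subtraction equals XOR in characteristic 2
    return (hi << (2 * half)) ^ (mid << half) ^ lo


# (x + 1)^2 = x^2 + 1 over GF(2)
print(bin(karatsuba_gf2(0b11, 0b11, 163)))
```

The result is the unreduced product of up to 2m-1 coefficients; reduction by the irreducible
polynomial is a separate step.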
Algorithm 4.1 Karatsuba Multiplier
Input: Two elements A, B ∈ GF(2^m), with m an arbitrary number, where A and B can be
expressed as: A = x^(m/2) A^H + A^L, B = x^(m/2) B^H + B^L
Output: A polynomial C = AB with up to 2m-1 coordinates, where C = x^m C^H + C^L
Procedure BK(C, A, B)
Begin
  k = ⌊log2 m⌋
  d = m - 2^k
  If (d == 0) then
    C = Kmul2^k(A, B)
    Return
  End if
  For i from 0 to d-1 do
    MA_i = A^L_i + A^H_i
    MB_i = B^L_i + B^H_i
  End for
  mul2^k(A^L, B^L, C^L)
  mul2^k(MA, MB, M)
  BK(C^H, A^H, B^H)
  For i from 2 to 2^k-2 do
    M_i = M_i + C^L_i + C^H_i
  End for
  For i from 2 to 2^k-2 do
    C_(k+i) = C_(k+i) + M_i
  End for
End
Figure 4.2: RTL Structure of Karatsuba Multiplier
4.2 Point Addition
Addition in the finite field GF(2^m) is very easy to compute. For the chosen field, the addition
of two elements is the simplest operation, since it is only a bitwise XOR of the two addends.
Therefore we need only m XOR gates and one clock cycle for this operation.
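A one-line Python sketch makes the point concrete. Field elements are assumed to be stored as
integers whose bit i is the coefficient of x^i; the name gf2m_add is illustrative.

```python
def gf2m_add(a, b):
    """Add two GF(2^m) elements: coefficient-wise addition modulo 2 is a bitwise XOR."""
    return a ^ b


# (x^3 + x + 1) + (x^3 + x^2) = x^2 + x + 1
print(bin(gf2m_add(0b1011, 0b1100)))
```

Because every element is its own additive inverse, the same circuit also performs subtraction.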
Algorithm 4.3 Double and Add/Subtract
Input: An integer k > 0 and a point P
Output: Q = k·P
1. k := (k_(n-1), …, k_1, k_0)_SD, k_i ∈ {0, 1, -1}
2. Q := P
3. for i from n - 2 downto 0 do
4.   Q := 2Q
5.   if k_i = "1" then
6.     Q := Q + P
7.   elseif k_i = "-1" then
8.     Q := Q - P
9. return Q
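The steps of Algorithm 4.3 can be sketched in Python as follows. Two assumptions are made for
the sake of a runnable example: the signed-digit form in step 1 is taken to be the non-adjacent
form (NAF), and the group is passed in through the hypothetical callbacks add and neg, so that
ordinary integers can stand in for curve points.

```python
def signed_digits(k):
    """Non-adjacent form of k > 0, least significant digit first, digits in {0, 1, -1}."""
    digits = []
    while k > 0:
        if k & 1:
            d = 2 - (k % 4)             # 1 if k = 1 (mod 4), -1 if k = 3 (mod 4)
            k -= d
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits


def double_and_add_subtract(k, p, add, neg):
    """Compute k*P by scanning the signed digits from the most significant end."""
    digits = signed_digits(k)
    q = p                               # the leading NAF digit is always 1
    for d in reversed(digits[:-1]):
        q = add(q, q)                   # Q := 2Q
        if d == 1:
            q = add(q, p)               # Q := Q + P
        elif d == -1:
            q = add(q, neg(p))          # Q := Q - P
    return q


# Integers under addition as a stand-in group: 7 * 5
print(double_and_add_subtract(7, 5, lambda a, b: a + b, lambda a: -a))
```

For k = 7 the digits are (1, 0, 0, -1), so the loop performs three doublings and a single
subtraction instead of the two additions that plain binary expansion would need.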
4.3 Point Multiplication
For the multiplication we chose a serial implementation in which the reduction by the
irreducible polynomial is integrated. We therefore need m - 1 XOR gates for the addition,
several additional gates for the integrated reduction with the irreducible polynomial, two shift
registers, one register for the multiplicand and two multiplexers.
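A software model of such a bit-serial multiplier with interleaved reduction might look like the
sketch below. It assumes an MSB-first scan of the multiplier and a reduction polynomial given
with its x^m term included; the name gf2m_mul_serial and the GF(2^8) example polynomial 0x11B
are chosen for illustration and do not come from the VHDL source.

```python
def gf2m_mul_serial(a, b, m, poly):
    """MSB-first shift-and-add product in GF(2^m); a and b must have degree below m."""
    acc = 0
    for i in reversed(range(m)):        # one iteration models one clock cycle
        acc <<= 1                       # multiply the accumulator by x
        if (acc >> m) & 1:              # degree reached m: subtract (XOR) the polynomial
            acc ^= poly
        if (b >> i) & 1:                # current multiplier bit selects the multiplicand
            acc ^= a
    return acc


# In GF(2^8) with x^8 + x^4 + x^3 + x + 1: 0x57 * 0x83
print(hex(gf2m_mul_serial(0x57, 0x83, 8, 0x11B)))
```

The accumulator never grows beyond m bits, which is why a single m-bit register plus the
reduction gates is sufficient in hardware.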
4.4 Squaring
Extension field squaring is similar to multiplication, except that the two inputs are equal.
By modifying the standard multiplication routine, we are able to take advantage of identical
inner product terms. For example, $c_2 = a_0 b_2 + a_1 b_1 + a_2 b_0 + \omega c_{19}$ can be
simplified to $c_2 = 2 a_0 a_2 + a_1^2 + \omega c_{19}$. Further gain is accomplished by doubling
only one coefficient, reducing it, and storing the new value. This approach saves us from
recalculating the doubled coefficient when it is needed again.
Algorithm 4.2 Squaring with Subfield Reduction
Require: A(x) = ∑ a_i x^i, B(x) = ∑ b_i x^i ∈ GF(239^17)/P(x), where P(x) = x^m − ω;
a_i, b_i ∈ GF(239); 0 ≤ i < 17
Ensure: C(x) = ∑ c_i x^i = A(x)B(x), c_i ∈ GF(239)
1: Define z[w] to mean the w-th 8-bit word of z
2: c_i ← 0
3: if i ≠ 16 then
4:   for j ← m − 1 downto i + 1 do
5:     c_i ← c_i + a_(i+m−j) b_j
6:   end for
7:   c_i ← 2c_i – multiply by ω = 2
8: end if
9: for j ← i downto 0 do
10:   c_i ← c_i + a_(i−j) b_j
11: end for
12: c_i ← c_i[2] ∗ 50 + c_i[1] ∗ 17 + c_i[0] – begin reduction, Equation (4.3)
13: t ← c_i[1] ∗ 17 – begin Equation (4.4)
14: if t ≥ 256 then
15:   t ← t[0] + 17
16: end if
17: c_i ← c_i[0] + t – end Equation (4.4)
18: if c_i ≥ 256 then
19:   c_i ← c_i[0] + 17
20:   if c_i ≥ 256 then
21:     c_i ← c_i[0] + 17
22:     terminate
23:   end if
24: end if
25: c_i ← c_i − 239
26: if c_i < 0 then
27:   c_i ← c_i + 239
28: end if
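The structure of Algorithm 4.2 can be modelled in Python as below. This is a sketch under stated
assumptions rather than a transcription: the constants P, M and OMEGA follow the Require line
(GF(239), degree 17, P(x) = x^17 − 2), the helper reduce_239 replaces the fixed unrolled
reduction of steps 12 to 28 with a loop based on the same identity 2^8 ≡ 17 (mod 239), and the
function names are illustrative.

```python
P = 239          # subfield prime, 2^8 - 17
M = 17           # extension degree
OMEGA = 2        # P(x) = x^M - OMEGA, so x^M = OMEGA in the field


def reduce_239(z):
    """Reduce a non-negative integer modulo 239 using 2^8 = 17 (mod 239)."""
    while z >= 256:
        z = (z >> 8) * 17 + (z & 0xFF)  # fold the high bytes back into the low byte
    return z - 239 if z >= 239 else z


def oef_mul(a, b):
    """Multiply two elements given as lists of M coefficients, lowest degree first."""
    c = [0] * M
    for i in range(M):
        acc = 0
        for j in range(i + 1, M):       # terms that wrap past x^(M-1)
            acc += a[i + M - j] * b[j]
        acc *= OMEGA                    # wrapped terms pick up the factor omega
        for j in range(i + 1):          # terms that land directly on x^i
            acc += a[i - j] * b[j]
        c[i] = reduce_239(acc)
    return c


def oef_square(a):
    """Squaring is multiplication with equal inputs."""
    return oef_mul(a, a)


# x * x^16 = x^17 = 2
x = [0, 1] + [0] * 15
x16 = [0] * 16 + [1]
print(oef_mul(x, x16))
```

Accumulating all inner products before a single reduction per coefficient mirrors the approach
of the algorithm, where the subfield reduction is applied once per output coefficient.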
Figure 4.3 RTL Schematics of Squarer
CHAPTER-5
RESULTS AND DISCUSSIONS
5.1 Simulation Result for Karatsuba Multiplier
Based on the simulation results and the reports generated by Xilinx ISE Suite 14.7, the device
utilization summary for the Karatsuba multiplier is presented in the table below. As the table
shows, the Karatsuba multiplier uses fewer resources than the designs in the surveyed literature
discussed in this thesis. This is illustrated by Table and Graph 5.1 shown below. According to
the reported values, the Karatsuba multiplier used 25 (1%) of the 27,288 available look-up
tables. This is the smallest number of look-up tables among the designs in the literature
reviewed [9][18][20][33][35].
Table 5.1 Resource Utilization of Karatsuba
Slice Logic Utilization            Used   Available   Utilization
Number of Slice LUTs                 25      27,288            1%
Number used as logic                 25      27,288            1%
Number of occupied Slices            11       6,822            1%
Number with an unused Flip Flop      25          25          100%
Number of bonded IOBs                31         218           14%
Graph 5.1 Resource Utilization of Karatsuba Multiplier (series: Used, Available, Utilization)
5.2 Resource Utilization for Polynomial Reducer
As the table below illustrates, the utilization report produced for the polynomial reducer shows
1% of the occupied slices, 100% of the flip-flops, and 8 (1%) of the look-up tables in use. From
this we can conclude that the utilization of the device is efficient compared to the reports
in [4][21][20][32].
Table 5.2 Resource Utilization of Polynomial Reducer
Slice Logic Utilization       Used   Available   Utilization
Number of Slice LUTs             8      27,288            1%
Number used as logic             8      27,288            1%
Number of occupied Slices        5       6,822            1%
Number of Flip-Flops             8           8          100%
Number of bonded IOBs           23         218           10%
Graph: Polynomial Reducer Resource Utilization (series: Used, Available, Utilization)
5.3 Resource Utilization of the Encryption Unit
As Table 5.3 summarizes, the resources used by the Karatsuba-based encryptor/decryptor unit are
well optimized compared to the publications reviewed in this thesis [4][11][22][28]. This can be
seen from the number of slice registers used in the encryptor unit, 715 (1%).
Table 5.3 Resource Utilization of ECC Encryption Unit
Slice Logic Utilization                                            Used   Available   Utilization
Number of Slice Registers                                           715      54,576            1%
Number of Slice LUTs                                              1,977      27,288            7%
Number used as logic                                              1,970      27,288            7%
Number of occupied Slices                                           601       6,822            8%
Number of MUXCYs used                                             1,396      13,644           10%
Number with an unused Flip Flop                                   1,294       2,000           64%
Number with an unused LUT                                            23       2,000            1%
Number of slice register sites lost to control set restrictions       7      54,576            1%
Graph: Resource Utilization of ECC Encryptor (series: Used, Available, Utilization; the graph
additionally reports 683 of 2,000 fully used LUT-FF pairs, 34%)
5.4 Design of ECC Encryption
The Karatsuba-based ECC encryptor designed in the Xilinx environment is presented in Figure 5.1
below.
Figure 5.1 Karatsuba based ECC Encryptor
As the figure above shows, the designed elliptic curve cryptography unit takes 15 bits of data
and 15 bits of public key for the encryption process. The output on the right side of the
circuit is also 15 bits long after the encryption process.
5.5 Xpower Analysis of the Karatsuba Multiplier
The XPower Analyzer from the Xilinx ISE Design Suite reports the power consumption of the
Karatsuba based ECC circuit unit.
Table 5.4 Power Report
On-Chip   Power (mW)   Used   Available   Utilization
Logics          0        25       27288            0%
IOs         36.14        31         218           14%
As indicated in the above table, the power consumption of the logic on the chip is reported as
0 mW, with 25 of the 27,288 logic resources (almost 0%) in use, while the IOs consume 36.14 mW.
This implies that the power consumption of the logic is efficiently optimized compared to the
publication results reviewed in this thesis [17][22][29].
5.6 Decryption Unit Resource Utilization Report
Figure 5.2 Karatsuba based ECC Decryptor
As Table 5.5 shows, every component in the unit uses the available resources efficiently. The
number of LUTs used as logic for the decryption process is 1,980 (7%); this analysis shows that
our design utilizes the available resources efficiently compared to the publications reviewed
in this thesis.
Table 5.5 Decryption Unit Resource Utilization
Slice Logic Utilization               Used   Available   Utilization
Number of Slice Registers              726      54,576            1%
Number of Slice LUTs                 1,988      27,288            7%
Number used as logic                 1,980      27,288            7%
Number of occupied Slices              607       6,822            8%
Number with an unused Flip Flop      1,364       2,077           65%
Number with an unused LUT               89       2,077            4%
Number of fully used LUT-FF pairs      624       2,077           30%
Number of bonded IOBs                   52         218           23%
5.7 Computation Time Obtained from the Experiments
Table 5.6 lists the power consumption and computation time of three components inside the
hardware accelerator. According to the table, the decryption process consumes 0.001 W more power
and takes 0.914 ns longer than the Karatsuba multiplier and polynomial reducer units.
Table 5.6 Xpower Analysis for ECC Components
Activities             Power consumption   Time Taken
Karatsuba Multiplier   0.036W              7.019ns
Polynomial Reducer     0.036W              7.019ns
Decryption             0.037W              7.933ns
5.8 Comparison with other Related Work
From the resource utilization table below (Table 5.7), we can infer that our proposed system is
more efficient than the implementations and literature reviewed in this thesis. For example,
[4] used 1918 flip-flops and 14527 look-up tables on their target device XCV200, considerably
more resources than our proposed system uses. Consequently, the reduction of resource
consumption in our experiment resulted in a reduction of power consumption, as illustrated in
section 5.6 of this chapter.
Graph: Decryption Unit Resource Utilization (series: Used, Available, Utilization)
Table 5.7 Comparison of Different Implementations
Implementation                   FPGA       Number of Flip-Flops   Number of LUTs   k·P time (ns)
M. Kider and Manoj V.N.V, 2008   XCV2000    1918                   14527            47
Orlando & Parr (2011)            XCV400E    Unspecified            Unspecified      210
N. Gura, et, al. (2007)          XCV2000E   6442                   19508            144
J. Luarz (2009)                  XCV2000E   1930                   10017            75
Chang Chu (2013)                 XCV2000E   7467                   25768            53
Our Design                       XC6SLX45   20                     20               9.05
Graph: Comparison of Different Works (series: number of flip-flops, number of LUTs, k·P time in
ns; method labels as printed: Montgomery, classical Knuth multiplication, Schönhage-Strassen
trick, Montgomery, Montgomery, Karatsuba)
CHAPTER-6
SUMMARY, CONCLUSION, RECOMMENDATIONS AND FUTURE
RESEARCH WORK
6.1 Summary
From a design point of view, FPGAs provide a suitable environment for our implementation. These
register rich devices can accommodate large memory structures and provide optimized macro cells
that improve the speed performance of the system. The fine grain device architecture allows for
synthesis tools to perform optimization almost at a gate level resulting in very efficient
implementations.
The concept of reconfigurable hardware for elliptic curves is very attractive for various reasons.
Reconfigurable hardware provides a versatile environment that is desirable when implementing
modern cryptographic protocols. In the work described here, we have shown that an elliptic curve
cryptosystem can in principle be implemented on reconfigurable devices. There is, however, one
limitation. The long compile times required to place and route the EC design into a specific
device are currently a bottleneck during the development cycle. The available tools are
improving very rapidly, and new, larger devices are being offered by many vendors every year.
These improvements will make it possible to implement large and very complicated designs in the
near future.
With the synthesis tools available, it was possible to obtain estimated results for all
architectures. Furthermore, comparison of synthesis and implementation results for various
large modules of our design shows that the synthesis results are very accurate. Thus the EC
crypto engine can be implemented on XILINX FPGAs at the estimated computation time.
6.2 Conclusion
As summarized in section 6.1, our work provides some insight into the hardware implementation of
complex cryptographic algorithms. Point multiplication on elliptic curves is one of the most
challenging computations used to implement public-key protocols, namely Elliptic Curve
Cryptography. This holds especially true for hardware implementations, of which very few have
been reported in the literature. It is our intention to acquaint the reader with the issues
concerning hardware acceleration of elliptic curve cryptography. Moreover, one of our goals was
to show that cryptographic protocols can be implemented in reconfigurable hardware. The wide
data-paths associated with elliptic curve implementations in hardware are of concern when trying
to use FPGA devices. However, the limitation lies more in the tools than in the resources
available to us.
In this thesis, we have shown that reconfigurable hardware is a viable solution for public-key
cryptography. In principle, elliptic curve point multiplication can be achieved on FPGAs,
resulting in a very flexible implementation with increased speed performance over current
software solutions. As security issues become more and more pronounced in the next few years and
the supporting FPGA tools improve, we hope that reconfigurable hardware and elliptic curves will
provide a viable solution.
6.3 Recommendations for Future works
This thesis concentrated on achieving point multiplication on elliptic curves in
reconfigurable hardware. To our knowledge, this approach has not been attempted before.
Below, we summarize some of the more important work that could still be done from a
design and implementation point of view.
6.3.1 New Design Considerations
We recommend investigating different alternatives for implementing the control structure. For
example, RAM and counters could be used to generate the control vectors.
Also, we would have liked to implement the system using two clocks to speed up
computation times.
Another important design alternative that should be researched further is the implementation
of multiple arithmetic processing elements. This would allow parallel operation, effectively
reducing the entire computation cycle by half. Such an alternative would also require more
routing resources.
Conversely, we would like to implement another design with a narrower datapath. Reducing the
datapath would result in a longer computation cycle. However, such a design would allow us to
use smaller FPGAs and possibly implement the general design on future smart cards.
6.3.2 Implementation Alternatives
From an implementation point of view, further research can be done to investigate other
reconfigurable devices. Soft macros can be remapped so that the design can be implemented in
EPLDs and CPLDs. Furthermore, devices from other vendors like ALTERA, AT&T and Motorola
could be used to implement our design. This would allow us to research other place and route
tools that may or may not perform better.
Future work could also concentrate on the actual system hardware implementation. For instance,
designing a PC plug-in board with reconfigurable cryptographic algorithms seems like an
attractive application.
Lastly, we would like to devote some time to trying out one of the new devices that will be
available from XILINX in the near future. The new Virtex family of devices uses a 0.25 micron,
five layer metal process technology, which will increase area, routing resources, and speed
performance.
REFERENCES
[1] A. Karatsuba and Y. Ofman. Multiplication of Multidigit Numbers on Automata.
Sov. Phys. Dokl. (English translation), 7(7):595–596, 1963.
[2] A. Woodbury, D. V. Bailey, and C. Paar. Elliptic Curve Cryptography on Smart Cards
Without Coprocessors. In IFIP CARDIS 2000, Fourth Smart Card Research and Advanced
Application Conference, Bristol, UK, September 20–22 2000. Kluwer.
[3] B. Schneier. Applied Cryptography. John Wiley and Sons, second edition, 1996.
[4] B. Sunar. Fast Galois Field Arithmetic for Elliptic Curve Cryptography and Error
Control Codes. PhD thesis, Department of Electrical & Computer Engineering, Oregon
State University, Corvallis, Oregon, USA, November 1998
[5] Cryptography and Elliptic Curves,
http://www.tcs.hut.fi/~helger/crypto/link/public/elliptic/
[6] David Seal. ARM Architecture Reference Manual. Addison-Wesley Longman
Publishing Co., Inc., Boston, MA, second edition, 2000.
[7] D. R. Stinson. Cryptography, Theory and Practice. Chapman & Hall/CRC, Boca Raton,
Florida, USA, second edition, 2002.
[8] D. V. Bailey and C. Paar. Optimal Extension Fields for Fast Arithmetic in Public-
Key Algorithms. In H. Krawczyk, editor, Advances in Cryptology — CRYPTO ’98,
volume LNCS 1462, pages 472–485, Berlin, Germany, 1998. Springer-Verlag.
[9] D. V. Bailey and C. Paar. Efficient Arithmetic in Finite Field Extensions with
Application in Elliptic Curve Cryptography. Journal of Cryptology, 14(3):153–176, 2001.
[10] I. Blake, G. Seroussi, and N. Smart. Elliptic Curves in Cryptography.
Cambridge University Press, London Mathematical Society Lecture Notes Series 265,
1999.
[11] J. Guajardo and C. Paar. Itoh-Tsujii Inversion in Standard Basis and Its
Application in Cryptography. Design, Codes, and Cryptography, (25):207–216, 2002.
[12] M. Kider and Manoj V.N.V. Hardware Acceleration of Elliptic Curve Cryptography,
Addis Ababa University, Ethiopia, 2008.
[13] T. ElGamal. A Public-Key Cryptosystem and a Signature Scheme Based on
Discrete Logarithms. IEEE Transactions on Information Theory, IT-31(4):469–472, 1985.
[14] S. T. J. Fenn, M. Benaissa, and D. Taylor. Finite Field Inversion Over the
Dual Base. IEEE Transactions on Very Large Scale Integration (VLSI) Systems,
4(1):134–136, March 1996.
[15] W. Geiselmann and D. Gollmann. Self-Dual Bases in F_(q^n). Designs, Codes and
Cryptography, 3:333–345, 1993.
[16] M. A. Hasan. Double-Basis Multiplicative Inversion Over GF(2^m). IEEE
Transactions on Computers, 47(9):960–970, September 1998.
[17] I. S. Hsu, T. K. Truong, L. J. Deutsch, and I. S. Reed. A Comparison of VLSI
Architecture of Finite Field Multipliers Using Dual-, Normal-, or Standard Bases.
IEEE Transactions on Computers, 37(6):735–739, June 1988.
[18] T. Itoh and S. Tsujii. A Fast Algorithm for Computing Multiplicative Inverses
in GF(2^m) Using Normal Bases. Information and Computation, 78:171–177, 1988.
[19] C. K. Koc and T. Acar. Montgomery Multiplication in GF(2^k). Design, Codes,
and Cryptography, 14(1):57–69, 1998.
[20] R. Lidl and H. Niederreiter. Finite Fields, volume 20 of Encyclopedia of Mathematics
and its Applications. Addison-Wesley, Reading, Massachusetts, USA, 1983.
[21] E. D. Mastrovito. VLSI Architectures for Computation in Galois Fields. PhD thesis,
Linkoping University, Department of Electrical Engineering, Linkoping, Sweden, 1991.
[22] A. J. Menezes, P. C. van Oorschot, and S. A. Vanstone. Handbook of Applied
Cryptography. CRC Press, Boca Raton, Florida, USA, 1997.
[23] R. L. Rivest, A. Shamir, and L. Adleman. A Method for Obtaining Digital Signatures
and Public-Key Cryptosystems. Communications of the ACM, 21(2):120–126,
February,1978.
[24] R. Schroeppel, H. Orman, S. O’Malley, and O. Spatscheck. Fast Key Exchange
with Elliptic Curve Systems. In D. Coppersmith, editor, Advances in Cryptology -
CRYPTO ’95, volume LNCS 963, pages 43–56, Berlin, Germany, 1995. Springer-Verlag.
[25] Julio Lopez and Ricardo Dahab, “An overview of elliptic curve cryptography”, May
2000.
[26] V. Miller, “Uses of elliptic curves in cryptography”, Advances in Cryptology -
CRYPTO'85, LNCS 218, pp.417-426, 1986.
[27] Jeffrey L. Vagle, “A Gentle Introduction to Elliptic Curve Cryptography”, BBN
Technologies
[28] Mugino Saeki, “Elliptic curve cryptosystems”, M.Sc. thesis, School of Computer
Science, McGill University, 1996. http://citeseer.nj.nec.com/saeki97elliptic.html
[29] J. Borst, “Public key cryptosystems using elliptic curves”, Master's thesis,
Eindhoven University of Technology, Feb. 1997.
http://citeseer.nj.nec.com/borst97public.html
[30] http://world.std.com/~franl/crypto.html
[31] Aleksandar Jurisic and Alfred Menezes, “Elliptic Curves and Cryptography”, Dr.
Dobb's Journal, April 1997, pp 26ff
[32] Robert Milson, “Introduction to Public Key Cryptography and Modular
Arithmetic”
[33] Aleksandar Jurisic and Alfred J. Menezes, Elliptic Curves and Cryptography
[34] William Stallings, Cryptography and Network Security-Principles and Practice
second edition, Prentice Hall publications.
[35] R. Schroeppel, H. Orman, S. O’Malley and O. Spatscheck, “Fast key exchange with
elliptic curve systems”, Advances in Cryptology, Proc. Crypto’95, LNCS 963, pp. 43-56,
Springer-Verlag, 1995.
APPENDIX I
Algorithm 1: Point Doubling
input : $P(x_2, y_2) \in E(F_q)$
output: $[2]P = (x_3, y_3) \in E(F_q)$
$x_2^2 = x_2 \cdot x_2$, $2x_2^2 = x_2^2 + x_2^2$;
$A = 3x_2^2 = 2x_2^2 + x_2^2$, $B = A + a$;
$2y_2 = y_2 + y_2$, $\mathrm{inv}(2y_2)$, $\lambda = B / 2y_2$;
$\lambda^2 = \lambda \cdot \lambda$, $2x_2 = x_2 + x_2$, $x_3 = \lambda^2 - 2x_2$;
$C = x_2 - x_3$, $D = \lambda \cdot C$, $y_3 = D - y_2$
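The doubling steps above can be cross-checked in software. The following Python sketch is a minimal model, assuming a prime field F_p and affine coordinates. The function name `point_double`, the inversion by Fermat's little theorem, and the toy curve y^2 = x^3 + 2x + 2 over F_17 are illustrative assumptions and are not taken from the hardware design in this thesis.

```python
def point_double(x, y, a, p):
    """Double the affine point (x, y) on y^2 = x^3 + a*x + b over F_p.

    Assumes p is prime and y != 0. Intermediate names follow Algorithm 1.
    """
    x_sq = (x * x) % p                  # x^2
    two_x_sq = (x_sq + x_sq) % p        # 2x^2
    A = (two_x_sq + x_sq) % p           # 3x^2
    B = (A + a) % p                     # 3x^2 + a
    two_y = (y + y) % p                 # 2y
    inv_two_y = pow(two_y, p - 2, p)    # inverse via Fermat (assumption: p prime)
    lam = (B * inv_two_y) % p           # slope of the tangent
    lam_sq = (lam * lam) % p
    two_x = (x + x) % p
    x3 = (lam_sq - two_x) % p
    C = (x - x3) % p
    D = (lam * C) % p
    y3 = (D - y) % p
    return x3, y3


# Toy example (assumption): P = (5, 1) on y^2 = x^3 + 2x + 2 over F_17.
print(point_double(5, 1, a=2, p=17))    # -> (6, 3)
```

Because every intermediate value has its own name, the model can be compared step by step against simulation waveforms of a hardware implementation.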
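Algorithm 2: Point Addition
input : $P(x_1, y_1)$, $Q(x_2, y_2) \in E(F_q)$
output: $P + Q = (x_3, y_3) \in E(F_q)$
$A = y_2 - y_1$, $B = x_2 - x_1$, $\mathrm{inv}(B)$;
$\lambda = A / B$, $\lambda^2 = \lambda \cdot \lambda$, $C = \lambda^2 - x_1$;
$x_3 = C - x_2$, $D = x_1 - x_3$, $E = D \cdot \lambda$, $y_3 = E - y_1$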
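The addition steps can be modelled in the same way. This Python sketch assumes a prime field F_p, affine coordinates, and two distinct points with different x-coordinates. The name `point_add` and the toy curve y^2 = x^3 + 2x + 2 over F_17 are assumptions chosen only for illustration.

```python
def point_add(x1, y1, x2, y2, p):
    """Add the affine points (x1, y1) and (x2, y2) over F_p.

    Assumes p is prime and x1 != x2. Intermediate names follow Algorithm 2.
    """
    A = (y2 - y1) % p
    B = (x2 - x1) % p
    inv_B = pow(B, p - 2, p)     # inverse via Fermat (assumption: p prime)
    lam = (A * inv_B) % p        # slope of the chord through P and Q
    lam_sq = (lam * lam) % p
    C = (lam_sq - x1) % p
    x3 = (C - x2) % p
    D = (x1 - x3) % p
    E = (D * lam) % p
    y3 = (E - y1) % p
    return x3, y3


# Toy example (assumption): P = (5, 1), Q = (6, 3) on y^2 = x^3 + 2x + 2 over F_17.
print(point_add(5, 1, 6, 3, p=17))    # -> (10, 6)
```

The sketch deliberately omits the special cases (point at infinity, P = Q, P = -Q), which a complete implementation would have to handle separately.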
Algorithm 4.2 Squaring with Subfield Reduction
Require: $A(x) = \sum a_i x^i$, $B(x) = \sum b_i x^i \in GF(239^{17})/P(x)$, where $P(x) = x^m - \omega$;
$a_i, b_i \in GF(239)$; $0 \le i < 17$
Ensure: $C(x) = \sum c_i x^i = A(x)B(x)$, $c_i \in GF(239)$
Define z[w] to mean the w-th 8-bit word of z
c_i ← 0
if i ≠ 16 then
    for j ← m − 1 downto i + 1 do
        c_i ← c_i + a_{i+m−j} b_j
    end for
    c_i ← 2c_i    // multiply by ω = 2
end if
for j ← i downto 0 do
    c_i ← c_i + a_{i−j} b_j
end for
c_i ← c_i[2] ∗ 50 + c_i[1] ∗ 17 + c_i[0]    // begin reduction, Equation (4.3)
t ← c_i[1] ∗ 17    // begin Equation (4.4)
if t ≥ 256 then
    t ← t[0] + 17
end if
c_i ← c_i[0] + t    // end Equation (4.4)
if c_i ≥ 256 then
    c_i ← c_i[0] + 17
    if c_i ≥ 256 then
        c_i ← c_i[0] + 17
        terminate
    end if
end if
c_i ← c_i − 239
if c_i ≤ 0 then
    c_i ← c_i + 239
end if
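The constants 17 and 50 in the reduction come from 2^8 ≡ 17 and 2^16 ≡ 50 (mod 239). The following Python sketch illustrates the idea rather than transcribing the algorithm line by line: `reduce_239` folds the upper bytes back into the low byte until one byte remains, and `field_mul` computes each coefficient of the product modulo x^17 − 2. Both function names and the list representation (lowest degree first) are assumptions made for this example.

```python
P = 239        # subfield prime, 2**8 - 17
M = 17         # extension degree
OMEGA = 2      # reduction polynomial is x**M - OMEGA


def reduce_239(c):
    """Reduce 0 <= c < 2**24 modulo 239 using byte folds only."""
    # 2**16 = 50 and 2**8 = 17 (mod 239): fold the three bytes together.
    c = (c >> 16) * 50 + ((c >> 8) & 0xFF) * 17 + (c & 0xFF)
    # Fold the high byte back in until the value fits in one byte.
    while c >= 256:
        c = (c >> 8) * 17 + (c & 0xFF)
    # c is now in 0..255, so at most one subtraction of 239 remains.
    if c >= P:
        c -= P
    return c


def field_mul(a, b):
    """Multiply two elements of GF(239^17) given as 17-entry coefficient lists."""
    c = []
    for i in range(M):
        # Products whose degree is i + M wrap around and are scaled by OMEGA.
        hi = sum(a[i + M - j] * b[j] for j in range(i + 1, M))
        # Products whose degree is exactly i.
        lo = sum(a[i - j] * b[j] for j in range(i + 1))
        c.append(reduce_239(OMEGA * hi + lo))
    return c


x = [0, 1] + [0] * 15      # the element x
x16 = [0] * 16 + [1]       # the element x**16
print(field_mul(x, x16))   # x**17 = 2, so only the constant coefficient is set, to 2
```

With coefficients below 239, the accumulated sum `OMEGA * hi + lo` stays below 2^22, which is why a three-byte fold is sufficient in this sketch.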
APPENDIX II : Sample Snapshot
A. ECC Encryptor
B. ECC Decryptor
C. Simulation for ECC Encryption
D. Simulation for Decryption
E. ISim Simulation of Karatsuba Multiplier
F. Design Result of Karatsuba Multiplier (Narrowed Design)
G. Detailed Karatsuba Multiplier RTL Circuit
H. Polynomial Reducer Circuit
I. Programming the Device on a Spartan-6 Family FPGA
APPENDIX III : Synthesis Report From Xilinx
Release 14.7 - xst P.20131013 (nt)
Copyright (c) 1995-2013 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to xst/projnav.tmp
Total REAL time to Xst completion: 0.00 secs
Total CPU time to Xst completion: 0.37 secs
--> Parameter xsthdpdir set to xst
Total REAL time to Xst completion: 0.00 secs
Total CPU time to Xst completion: 0.37 secs
--> Reading design: poly_reducer.prj
TABLE OF CONTENTS
1) Synthesis Options Summary
2) HDL Parsing
3) HDL Elaboration
4) HDL Synthesis
4.1) HDL Synthesis Report
5) Advanced HDL Synthesis
5.1) Advanced HDL Synthesis Report
6) Low Level Synthesis
7) Partition Report
8) Design Summary
8.1) Primitive and Black Box Usage
8.2) Device utilization summary
8.3) Partition Resource Summary
8.4) Timing Report
8.4.1) Clock Information
8.4.2) Asynchronous Control Signals Information
8.4.3) Timing Summary
8.4.4) Timing Details
8.4.5) Cross Clock Domains Report
===============================================================
* Synthesis Options Summary *
===============================================================
---- Source Parameters
Input File Name : "poly_reducer.prj"
Ignore Synthesis Constraint File : NO
---- Target Parameters
Output File Name : "poly_reducer"
Output Format : NGC
Target Device : xc6slx45-2-csg324
---- Source Options
Top Module Name : poly_reducer
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
Safe Implementation : No
FSM Style : LUT
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
Shift Register Extraction : YES
ROM Style : Auto
Resource Sharing : YES
Asynchronous To Synchronous : NO
Shift Register Minimum Size : 2
Use DSP Block : Auto
Automatic Register Balancing : No
---- Target Options
LUT Combining : Auto
Reduce Control Sets : Auto
Add IO Buffers : YES
Global Maximum Fanout : 100000
Add Generic Clock Buffer(BUFG) : 16
Register Duplication : YES
Optimize Instantiated Primitives : NO
Use Clock Enable : Auto
Use Synchronous Set : Auto
Use Synchronous Reset : Auto
Pack IO Registers into IOBs : Auto
Equivalent register Removal : YES
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Power Reduction : NO
Keep Hierarchy : No
Netlist Hierarchy : As_Optimized
RTL Output : Yes
Global Optimization : AllClockNets
Read Cores : YES
Write Timing Constraints : NO
Cross Clock Analysis : NO
Hierarchy Separator : /
Bus Delimiter : <>
Case Specifier : Maintain
Slice Utilization Ratio : 100
BRAM Utilization Ratio : 100
DSP48 Utilization Ratio : 100
Auto BRAM Packing : NO
Slice Utilization Ratio Delta : 5
===============================================================
===============================================================
* HDL Parsing *
===============================================================
Parsing VHDL file
"G:\Collection\Karatsuba_Monogomry_ECC_Cryptography\classic_multiplier.vhd" into
library work
Parsing package <classic_multiplier_parameters>.
Parsing package body <classic_multiplier_parameters>.
Parsing entity <poly_multiplier>.
Parsing architecture <simple> of entity <poly_multiplier>.
Parsing entity <poly_reducer>.
Parsing architecture <simple> of entity <poly_reducer>.
Parsing entity <classic_multiplication>.
Parsing architecture <simple> of entity <classic_multiplication>.
Parsing VHDL file
"G:\Collection\Karatsuba_Monogomry_ECC_Cryptography\classic_squarer.vhd" into
library work
Parsing package <classic_squarer_parameters>.
Parsing package body <classic_squarer_parameters>.
Parsing entity <poly_reducer>.
WARNING:HDLCompiler:685 -
"G:\Collection\Karatsuba_Monogomry_ECC_Cryptography\classic_squarer.vhd" Line
69: Overwriting existing primary unit poly_reducer
Parsing architecture <simple> of entity <poly_reducer>.
Parsing entity <classic_squarer>.
Parsing architecture <simple> of entity <classic_squarer>.
===============================================================
* HDL Elaboration *
===============================================================
Elaborating entity <poly_reducer> (architecture <simple>) from library <work>.
===============================================================
* HDL Synthesis *
===============================================================
Synthesizing Unit <poly_reducer>.
Related source file is
"G:\Collection\Karatsuba_Monogomry_ECC_Cryptography\classic_squarer.vhd".
Summary:
Unit <poly_reducer> synthesized.
===============================================================
HDL Synthesis Report
Macro Statistics
# Xors : 11
1-bit xor2 : 3
1-bit xor3 : 4
1-bit xor4 : 4
===============================================================
===============================================================
* Advanced HDL Synthesis *
===============================================================
===============================================================
Advanced HDL Synthesis Report
Macro Statistics
# Xors : 11
1-bit xor2 : 3
1-bit xor3 : 4
1-bit xor4 : 4
===============================================================
===============================================================
* Low Level Synthesis *
===============================================================
Optimizing unit <poly_reducer> ...
Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block poly_reducer, actual ratio is 0.
Final Macro Processing ...
===============================================================
Final Register Report
Found no macro
===============================================================
===============================================================
* Partition Report *
===============================================================
Partition Implementation Status
-------------------------------
No Partitions were found in this design.
===============================================================
* Design Summary *
===============================================================
Top Level Output File Name : poly_reducer.ngc
Primitive and Black Box Usage:
------------------------------
# BELS : 9
# LUT2 : 1
# LUT4 : 5
# LUT5 : 2
# LUT6 : 1
# IO Buffers : 23
# IBUF : 15
# OBUF : 8
Device utilization summary:
---------------------------
Selected Device : 6slx45csg324-2
Slice Logic Utilization:
Number of Slice LUTs: 9 out of 27288 0%
Number used as Logic: 9 out of 27288 0%
Slice Logic Distribution:
Number of LUT Flip Flop pairs used: 9
Number with an unused Flip Flop: 9 out of 9 100%
Number with an unused LUT: 0 out of 9 0%
Number of fully used LUT-FF pairs: 0 out of 9 0%
Number of unique control sets: 0
IO Utilization:
Number of IOs: 23
Number of bonded IOBs: 23 out of 218 10%
Specific Feature Utilization:
---------------------------
Partition Resource Summary:
---------------------------
No Partitions were found in this design.
===============================================================
Timing Report
NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE.
FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT
GENERATED AFTER PLACE-and-ROUTE.
Clock Information:
------------------
No clock signals found in this design
Asynchronous Control Signals Information:
----------------------------------------
No asynchronous control signals found in this design
Timing Summary:
---------------
Speed Grade: -2
Minimum period: No path found
Minimum input arrival time before clock: No path found
Maximum output required time after clock: No path found
Maximum combinational path delay: 7.019ns
Timing Details:
---------------
All values displayed in nanoseconds (ns)
===============================================================
Timing constraint: Default path analysis
Total number of paths / destination ports: 37 / 8
-------------------------------------------------------------------------
Delay: 7.019ns (Levels of Logic = 4)
Source: d<11> (PAD)
Destination: c<3> (PAD)
Data Path: d<11> to c<3>
                            Gate     Net
Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
----------------------------------------  ------------
IBUF:I->O              4   1.328   0.912  d_11_IBUF (d_11_IBUF)
LUT2:I0->O             1   0.250   0.682  Mxor_gen_xors[3].l1.aux_xo<0>_SW0 (N2)
LUT6:I5->O             1   0.254   0.681  Mxor_gen_xors[3].l1.aux_xo<0> (c_3_OBUF)
OBUF:I->O                  2.912          c_3_OBUF (c<3>)
----------------------------------------
Total                      7.019ns (4.744ns logic, 2.275ns route)
                                   (67.6% logic, 32.4% route)
===============================================================
Cross Clock Domains Report:
--------------------------
===============================================================
Total REAL time to Xst completion: 17.00 secs
Total CPU time to Xst completion: 17.19 secs
-->
Total memory usage is 185824 kilobytes
Number of errors : 0 ( 0 filtered)
Number of warnings : 1 ( 0 filtered)
Number of infos : 0 ( 0 filtered)
APPENDIX IV: Sample VHDL Code
-- VHDL Code for Karatsuba Multiplier
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
use ieee.std_logic_unsigned.all;
entity karatsuba_multiplier_even is
generic (M: integer:= 8);
port (
a, b: in std_logic_vector(M-1 downto 0);
d: out std_logic_vector(2*M-2 downto 0)
);
end karatsuba_multiplier_even;
architecture simple of karatsuba_multiplier_even is
component polynom_multiplier is
generic (M: integer:= 8);
port (
a, b: in std_logic_vector(M-1 downto 0);
d: out std_logic_vector(2*M-2 downto 0)
);
end component polynom_multiplier;
constant half_M :integer := M/2;
signal x0y0, x01y01: std_logic_vector(2*half_M-2 downto 0);
signal x1y1: std_logic_vector(2*half_M-2 downto 0);
signal x0_p_X1, y0_p_y1: std_logic_vector(half_M-1 downto 0);
begin
mult1: polynom_multiplier generic map(M => half_M)
port map(a => a(half_M-1 downto 0),
b => b(half_M-1 downto 0), d=> x0y0);
mult2: polynom_multiplier generic map(M => half_M)
port map(a => a(M-1 downto half_M),
b => b(M-1 downto half_M), d=> x1y1);
mult3: polynom_multiplier generic map(M => half_M)
port map(a => x0_p_X1,
b => y0_p_y1, d=> x01y01);
gen_x0x1y0y1: for i in 0 to half_M-1 generate
x0_p_X1(i) <= a(i) xor a(i + half_M);
y0_p_y1(i) <= b(i) xor b(i + half_M);
end generate;
gen_prod1: for i in 0 to half_M-2 generate
d(half_M + i) <= x01y01(i) xor x0y0(i) xor x1y1(i) xor x0y0(i+half_M);
end generate;
d(2*half_M-1) <= x01y01(half_M-1) xor x0y0(half_M-1) xor x1y1(half_M-1);
gen_prod2: for i in half_M to 2*half_M-2 generate
d(half_M + i) <= x01y01(i) xor x0y0(i) xor x1y1(i) xor x1y1(i-half_M) ;
end generate;
d(3*half_M-1) <= x1y1(half_M-1);
d(half_M-1 downto 0) <= x0y0(half_M-1 downto 0);
d(2*M-2 downto 3*half_M) <= x1y1(2*half_M-2 downto half_M);
end simple;
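The single Karatsuba level described by the entity above can be checked against a schoolbook product in software. The following Python sketch is a bit-level reference model, not a translation of the VHDL. Polynomials over GF(2) are held in integers (bit i is the coefficient of x^i), and the names `clmul` and `karatsuba_even` are assumptions introduced for this example.

```python
def clmul(a, b):
    """Schoolbook carry-less product of two GF(2) polynomials held in ints."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def karatsuba_even(a, b, m):
    """One Karatsuba level for m-bit operands (m even), using three half-size products."""
    h = m // 2
    mask = (1 << h) - 1
    a0, a1 = a & mask, a >> h            # low and high halves of a
    b0, b1 = b & mask, b >> h            # low and high halves of b
    x0y0 = clmul(a0, b0)                 # low product
    x1y1 = clmul(a1, b1)                 # high product
    x01y01 = clmul(a0 ^ a1, b0 ^ b1)     # product of the half sums
    middle = x01y01 ^ x0y0 ^ x1y1        # cross term; subtraction is XOR in GF(2)
    return x0y0 ^ (middle << h) ^ (x1y1 << (2 * h))


print(format(karatsuba_even(0xAA, 0xAA, 8), "015b"))   # -> 100010001000100
```

The three `clmul` calls correspond to the instances mult1, mult2 and mult3, and the final XOR of shifted terms corresponds to the generate statements that assemble the output d.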
--------------------------------------------------------------------------------
-- Simple testbench for "poly_multiplier" module (for m=8)
--
--------------------------------------------------------------------------------
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.std_logic_unsigned.all;
USE ieee.numeric_std.ALL;
use work.classic_multiplier_parameters.all;
ENTITY test_poly_mult_vhd IS
END test_poly_mult_vhd;
ARCHITECTURE behavior OF test_poly_mult_vhd IS
-- Component Declaration for the Unit Under Test (UUT)
COMPONENT poly_multiplier
PORT(
a : IN std_logic_vector(m-1 downto 0);
b : IN std_logic_vector(m-1 downto 0);
d : OUT std_logic_vector(2*m-2 downto 0)
);
END COMPONENT;
--Inputs
SIGNAL a : std_logic_vector(m-1 downto 0) := (others=>'0');
SIGNAL b : std_logic_vector(m-1 downto 0) := (others=>'0');
--Outputs
SIGNAL d : std_logic_vector(2*m-2 downto 0);
BEGIN
-- Instantiate the Unit Under Test (UUT)
uut: poly_multiplier PORT MAP( a => a, b => b, d => d );
tb : PROCESS
BEGIN
-- Wait 100 ns for global reset to finish
wait for 100 ns;
a <= "10101010";
b <= "10101010";
wait for 100 ns;
assert (d = "100010001000100") report "ERROR in mult" severity FAILURE;
a <= "10101010";
b <= "00000000";
wait for 100 ns;
assert (d = "000000000000000") report "ERROR in mult" severity FAILURE;
a <= "11111111";
b <= "10101010";
wait for 100 ns;
assert (d = "110011001100110") report "ERROR in mult" severity FAILURE;
a <= "10101010";
b <= "01010101";
wait for 100 ns;
assert (d = "010001000100010") report "ERROR in mult" severity FAILURE;
a <= "01010101";
b <= "01010101";
wait for 100 ns;
assert (d = "001000100010001") report "ERROR in mult" severity FAILURE;
wait; -- will wait forever
END PROCESS;
END;
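The expected values in the testbench assertions can be reproduced independently of the simulator. The following Python sketch takes the operands as the same MSB-first bit strings used in the testbench and returns the (2m − 1)-bit product over GF(2). The helper name `poly_mult_bits` is an assumption for this example.

```python
def poly_mult_bits(a_bits, b_bits):
    """Multiply two GF(2) polynomials given as equal-length MSB-first bit strings."""
    m = len(a_bits)
    a, b = int(a_bits, 2), int(b_bits, 2)
    d = 0
    for i in range(m):
        if (b >> i) & 1:
            d ^= a << i                     # add (XOR) a shifted copy of a
    return format(d, "0{}b".format(2 * m - 1))


# Operand pairs and expected products copied from the testbench above.
vectors = [
    ("10101010", "10101010", "100010001000100"),
    ("10101010", "00000000", "000000000000000"),
    ("11111111", "10101010", "110011001100110"),
    ("10101010", "01010101", "010001000100010"),
    ("01010101", "01010101", "001000100010001"),
]
for a, b, expected in vectors:
    assert poly_mult_bits(a, b) == expected, (a, b)
print("all testbench vectors match")
```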
--------------------------------------------------------------------------------
-- VHDL Testbench Comparing the Squarer Implementations
--------------------------------------------------------------------------------
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE IEEE.std_logic_arith.all;
USE ieee.std_logic_unsigned.all;
USE ieee.numeric_std.ALL;
USE ieee.std_logic_textio.ALL;
use ieee.math_real.all; -- for UNIFORM, TRUNC
USE std.textio.ALL;
--use work.classic_multiplier_parameters.all;
use work.LSB_first_squarer_package.all;
ENTITY test_square_comparac IS
END test_square_comparac;
ARCHITECTURE behavior OF test_square_comparac IS
-- Component Declaration for the Unit Under Test (UUT2)
COMPONENT classic_multiplication
PORT(
a : IN std_logic_vector(M-1 downto 0);
b : IN std_logic_vector(M-1 downto 0);
c : OUT std_logic_vector(M-1 downto 0)
);
END COMPONENT;
COMPONENT classic_squarer
PORT(
a : IN std_logic_vector(M-1 downto 0);
c : OUT std_logic_vector(M-1 downto 0)
);
END COMPONENT;
COMPONENT montgomery_squarer is
port (
a: in std_logic_vector (M-1 downto 0);
clk, reset, start: in std_logic;
z: out std_logic_vector (M-1 downto 0);
done: out std_logic
);
END COMPONENT;
COMPONENT montgomery_comb_squarer is
port (
a: in std_logic_vector (M-1 downto 0);
c: out std_logic_vector (M-1 downto 0)
);
END COMPONENT;
COMPONENT LSB_first_squarer is
port (
a: in std_logic_vector (M-1 downto 0);
clk, reset, start: in std_logic;
z: out std_logic_vector (M-1 downto 0);
done: out std_logic
);
END COMPONENT;
-- Internal signals
SIGNAL x, c, sq : std_logic_vector(M-1 downto 0) := (others=>'0');
SIGNAL clk, reset, start, done_montg, done_lsbf: std_logic;
SIGNAL montg_sq, r, montg_sq_adj, montg_sq_comb , montg_sq_comb_adj, lsbf_sq:
std_logic_vector(M-1 downto 0) := (others=>'0');
constant DELAY : time := 100 ns;
constant PERIOD : time := 200 ns;
constant DUTY_CYCLE : real := 0.5;
constant OFFSET : time := 0 ns;
constant NUMBER_TESTS: natural := 100;
BEGIN
-- Instantiate the Unit Under Test (UUT)
uut0: classic_multiplication PORT MAP( a => x, b => x, c => c );
uut1: classic_squarer PORT MAP( a => x, c => sq );
uut2: montgomery_squarer PORT MAP(A => x,
clk => clk, reset => reset, start => start,
z => montg_sq, done => done_montg);
r <= F; --2**K mod F = F
uut2b: classic_multiplication PORT MAP( a => montg_sq, b => r, c => montg_sq_adj );
uut3: montgomery_comb_squarer PORT MAP( a => x, c => montg_sq_comb );
uut3b: classic_multiplication PORT MAP( a => montg_sq_comb, b => r, c =>
montg_sq_comb_adj );
uut4: LSB_first_squarer PORT MAP(A => x,
clk => clk, reset => reset, start => start,
z => lsbf_sq, done => done_lsbf);
PROCESS -- clock process for clk
BEGIN
WAIT for OFFSET;
CLOCK_LOOP : LOOP
clk <= '0';
WAIT FOR (PERIOD *(1.0 - DUTY_CYCLE));
clk <= '1';
WAIT FOR (PERIOD * DUTY_CYCLE);
END LOOP CLOCK_LOOP;
END PROCESS;
tb_proc : PROCESS --generate values
PROCEDURE gen_random(X : out std_logic_vector (M-1 DownTo 0); w: natural; s1, s2:
inout Natural) IS
VARIABLE i_x, aux: integer;
VARIABLE rand: real;
BEGIN
aux := W/16;
for i in 1 to aux loop
UNIFORM(s1, s2, rand);
i_x := INTEGER(TRUNC(rand * real(2**16)));
x(i*16-1 downto (i-1)*16) := CONV_STD_LOGIC_VECTOR (i_x, 16);
end loop;
UNIFORM(s1, s2, rand);
i_x := INTEGER(TRUNC(rand * real(2**(w-aux*16))));
x(w-1 downto aux*16) := CONV_STD_LOGIC_VECTOR (i_x, (w-aux*16));
END PROCEDURE;
VARIABLE TX_LOC : LINE;
VARIABLE TX_STR : String(1 to 4096);
VARIABLE seed1, seed2: positive;
VARIABLE i_x, i_y, i_p, i_z, i_yz_modp: integer;
VARIABLE cycles, max_cycles, min_cycles, total_cycles: integer := 0;
VARIABLE avg_cycles: real;
VARIABLE initial_time, final_time: time;
VARIABLE xx: std_logic_vector (M-1 DownTo 0) ;
BEGIN
min_cycles:= 2**20;
start <= '0'; reset <= '1';
WAIT FOR PERIOD;
reset <= '0';
WAIT FOR PERIOD;
for I in 1 to NUMBER_TESTS loop
gen_random(xx, M, seed1, seed2);
x <= xx;
start <= '1'; initial_time := now;
WAIT FOR PERIOD;
start <= '0';
wait until done_montg = '1';
final_time := now;
cycles := (final_time - initial_time)/PERIOD;
total_cycles := total_cycles+cycles;
--ASSERT (FALSE) REPORT "Number of Cycles: " & integer'image(cycles) & " TotalCycles: " & integer'image(total_cycles) SEVERITY WARNING;
if cycles > max_cycles then max_cycles:= cycles; end if;
if cycles < min_cycles then min_cycles:= cycles; end if;
WAIT FOR 2*PERIOD;
IF ( c /= sq or c/= montg_sq_adj or c /= montg_sq_comb_adj or c /=lsbf_sq) THEN
write(TX_LOC,string'("ERROR!!! C=")); write(TX_LOC, c);
write(TX_LOC,string'("/= Z=")); write(TX_LOC, c);
write(TX_LOC,string'("/= sq=")); write(TX_LOC, sq);
write(TX_LOC,string'("/= montg_sq=")); write(TX_LOC, montg_sq);
write(TX_LOC,string'("/= montg_sq_Adj=")); write(TX_LOC, montg_sq_adj);
write(TX_LOC,string'(" (montg_comb=")); write(TX_LOC, montg_sq_comb);
write(TX_LOC,string'(") /= montg_combAdj=")); write(TX_LOC, montg_sq_comb_adj);
write(TX_LOC,string'(") using: ( A =")); write(TX_LOC, x);
write(TX_LOC, string'(", F = 1")); write(TX_LOC, F);
write(TX_LOC, string'(" )"));
TX_STR(TX_LOC.all'range) := TX_LOC.all;
Deallocate(TX_LOC);
ASSERT (FALSE) REPORT TX_STR SEVERITY ERROR;
END IF;
end loop;
WAIT FOR DELAY;
avg_cycles := real(total_cycles)/real(NUMBER_TESTS);
ASSERT (FALSE) REPORT
"Simulation successful!. MinCycles: " & integer'image(min_cycles) &
" MaxCycles: " & integer'image(max_cycles) & " TotalCycles: " &
integer'image(total_cycles) &
" AvgCycles: " & real'image(avg_cycles)
SEVERITY FAILURE;
END PROCESS;
END;
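The testbench above treats the product x·x from the classic multiplier as the reference for every squarer. The same equivalence can be shown in a few lines of Python. This sketch assumes M = 8 and the reduction polynomial x^8 + x^4 + x^3 + x + 1; the actual values of M and F live in the VHDL packages, which are not listed in this appendix, so both are placeholders. It covers the classic squarer only and does not model the Montgomery variants.

```python
M = 8
F = 0b100011011   # x^8 + x^4 + x^3 + x + 1 (assumption, for illustration only)


def reduce_mod_f(d):
    """Reduce a polynomial of degree <= 2*M - 2 modulo F."""
    for i in range(2 * M - 2, M - 1, -1):
        if (d >> i) & 1:
            d ^= F << (i - M)     # cancel the x^i term
    return d


def classic_mult(a, b):
    """Schoolbook GF(2) product followed by reduction modulo F."""
    d = 0
    for i in range(M):
        if (b >> i) & 1:
            d ^= a << i
    return reduce_mod_f(d)


def classic_square(a):
    """Square by moving bit i of a to bit 2*i, then reducing modulo F."""
    d = 0
    for i in range(M):
        if (a >> i) & 1:
            d |= 1 << (2 * i)
    return reduce_mod_f(d)


# Squaring must agree with multiplying an element by itself.
assert all(classic_square(a) == classic_mult(a, a) for a in range(1 << M))
print(hex(classic_mult(0x57, 0x83)))   # -> 0xc1
```

Squaring over GF(2) needs no partial products because all cross terms cancel, which is why a dedicated squarer costs far less than a general multiplier.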
----------------------------------------------------------------------------------------------------------------
-- Test Division algorithm
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE IEEE.std_logic_arith.all;
USE ieee.std_logic_unsigned.all;
USE ieee.numeric_std.ALL;
USE ieee.std_logic_textio.ALL;
use ieee.math_real.all; -- for UNIFORM, TRUNC
USE std.textio.ALL;
use work.binary_algorithm_polynomials_parameters.all;
ENTITY test_binary_division IS
END test_binary_division;
ARCHITECTURE behavior OF test_binary_division IS
-- a multiplier is instantiated to check the results
COMPONENT classic_multiplication
PORT(
a : IN std_logic_vector(M-1 downto 0);
b : IN std_logic_vector(M-1 downto 0);
c : OUT std_logic_vector(M-1 downto 0)
);
END COMPONENT;
-- Component Declaration for the Unit Under Test (UUT2)
COMPONENT binary_algorithm_polynomials is
port (
g, h: in std_logic_vector (M-1 downto 0);
clk, reset, start: in std_logic;
Z: out std_logic_vector (M-1 downto 0);
done: out std_logic
);
END COMPONENT binary_algorithm_polynomials;
-- Internal signals
SIGNAL x, y, z, z_by_y : std_logic_vector(M-1 downto 0) := (others=>'0');
SIGNAL clk, reset, start, done: std_logic;
constant ZERO: std_logic_vector(M-1 downto 0) := (others=>'0');
constant DELAY : time := 100 ns;
constant PERIOD : time := 200 ns;
constant DUTY_CYCLE : real := 0.5;
constant OFFSET : time := 0 ns;
constant NUMBER_TESTS: natural := 100;
BEGIN
-- Instantiate the Unit Under Test (UUT)
uut1: binary_algorithm_polynomials PORT MAP(g => x, h => y,
clk => clk, reset => reset, start => start,
z => z, done => done);
uut2: classic_multiplication PORT MAP( a => z, b => y, c => z_by_y );
PROCESS -- clock process for clk
BEGIN
WAIT for OFFSET;
CLOCK_LOOP : LOOP
clk <= '0';
WAIT FOR (PERIOD *(1.0 - DUTY_CYCLE));
clk <= '1';
WAIT FOR (PERIOD * DUTY_CYCLE);
END LOOP CLOCK_LOOP;
END PROCESS;
tb_proc : PROCESS --generate values
PROCEDURE gen_random(X : out std_logic_vector (M-1 DownTo 0); w: natural; s1, s2:
inout Natural) IS
VARIABLE i_x, aux: integer;
VARIABLE rand: real;
BEGIN
aux := W/16;
for i in 1 to aux loop
UNIFORM(s1, s2, rand);
i_x := INTEGER(TRUNC(rand * real(2**16)));
x(i*16-1 downto (i-1)*16) := CONV_STD_LOGIC_VECTOR (i_x, 16);
end loop;
UNIFORM(s1, s2, rand);
i_x := INTEGER(TRUNC(rand * real(2**(w-aux*16))));
x(w-1 downto aux*16) := CONV_STD_LOGIC_VECTOR (i_x, (w-aux*16));
END PROCEDURE;
VARIABLE TX_LOC : LINE;
VARIABLE TX_STR : String(1 to 4096);
VARIABLE seed1, seed2: positive;
VARIABLE i_x, i_y, i_p, i_z, i_yz_modp: integer;
VARIABLE cycles, max_cycles, min_cycles, total_cycles: integer := 0;
VARIABLE avg_cycles: real;
VARIABLE initial_time, final_time: time;
VARIABLE xx: std_logic_vector (M-1 DownTo 0) ;
BEGIN
min_cycles:= 2**20;
start <= '0'; reset <= '1';
WAIT FOR PERIOD;
reset <= '0';
WAIT FOR PERIOD;
for I in 1 to NUMBER_TESTS loop
gen_random(xx, M, seed1, seed2);
x <= xx;
gen_random(xx, M, seed1, seed2);
while (xx = ZERO) loop gen_random(xx, M, seed1, seed2); end loop;
y <= xx;
start <= '1'; initial_time := now;
WAIT FOR PERIOD;
start <= '0';
wait until done = '1';
final_time := now;
cycles := (final_time - initial_time)/PERIOD;
total_cycles := total_cycles+cycles;
--ASSERT (FALSE) REPORT "Number of Cycles: " & integer'image(cycles) & " TotalCycles: " & integer'image(total_cycles) SEVERITY WARNING;
if cycles > max_cycles then max_cycles:= cycles; end if;
if cycles < min_cycles then min_cycles:= cycles; end if;
WAIT FOR 2*PERIOD;
IF ( x /= z_by_y ) THEN
write(TX_LOC,string'("ERROR!!! z_by_y=")); write(TX_LOC, z_by_y);
write(TX_LOC,string'("/= x=")); write(TX_LOC, x);
write(TX_LOC,string'("( z=")); write(TX_LOC, z);
write(TX_LOC,string'(") using: ( A =")); write(TX_LOC, x);
write(TX_LOC, string'(", B =")); write(TX_LOC, y);
write(TX_LOC, string'(", F = 1")); write(TX_LOC, F);
write(TX_LOC, string'(" )"));
TX_STR(TX_LOC.all'range) := TX_LOC.all;
Deallocate(TX_LOC);
ASSERT (FALSE) REPORT TX_STR SEVERITY ERROR;
END IF;
end loop;
WAIT FOR DELAY;
avg_cycles := real(total_cycles)/real(NUMBER_TESTS);
ASSERT (FALSE) REPORT
"Simulation successful!. MinCycles: " & integer'image(min_cycles) &
" MaxCycles: " & integer'image(max_cycles) & " TotalCycles: " &
integer'image(total_cycles) &
" AvgCycles: " & real'image(avg_cycles)
SEVERITY FAILURE;
END PROCESS;
END;
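The division testbench verifies a quotient z = x / y by multiplying it back by y and comparing with x. The following Python sketch applies the same check in software. It does not implement the binary algorithm that the testbench exercises: the inverse is computed by exponentiation (y^(2^M − 2) = y^(−1)), which was chosen here only because it is short. M = 8 and the polynomial x^8 + x^4 + x^3 + x + 1 are assumptions, as are the names `gf_mul` and `gf_div`.

```python
M = 8
F = 0b100011011   # x^8 + x^4 + x^3 + x + 1 (assumption, for illustration only)


def gf_mul(a, b):
    """Interleaved (MSB-first) shift-and-add multiplication in GF(2^M)."""
    r = 0
    for i in range(M - 1, -1, -1):
        r <<= 1                   # multiply the accumulator by x
        if r >> M:
            r ^= F                # reduce when degree M is reached
        if (b >> i) & 1:
            r ^= a
    return r


def gf_div(g, h):
    """Return g / h in GF(2^M) for h != 0, using h^(2^M - 2) as the inverse."""
    inv = 1
    for _ in range(M - 1):
        h = gf_mul(h, h)          # h^2, h^4, ..., h^(2^(M-1))
        inv = gf_mul(inv, h)      # running product gives h^(2^M - 2)
    return gf_mul(g, inv)


# Same check as the testbench: (x / y) * y must equal x for every nonzero y.
for x in (0x00, 0x01, 0x57, 0xFF):
    for y in range(1, 1 << M):
        assert gf_mul(gf_div(x, y), y) == x
print("division check passed")
```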
----------------------------------------------------------------------------------------------------
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity multiplier is
port( B :in std_logic_vector(15 downto 0);
Product :out std_logic_vector(15 downto 0)
);
end multiplier;
architecture Behavioral of multiplier is
signal P1:std_logic_vector(15 downto 0);
signal P2:std_logic_vector(15 downto 0);
signal P3:std_logic_vector(15 downto 0);
signal P4:std_logic_vector(15 downto 0);
signal P5:std_logic_vector(15 downto 0);
signal P6:std_logic_vector(15 downto 0);
signal P7:std_logic_vector(15 downto 0);
signal P8:std_logic_vector(15 downto 0);
signal P9:std_logic_vector(15 downto 0);
component wallace_structure is
port(P1,P2,P3,P4,P5,P6,P7,P8,P9 :in std_logic_vector( 15 downto 0);
product :out std_logic_vector( 15 downto 0));
end component;
begin
-- partial products reduced to 9 from 16 due to the multiplication of B and 2B+1
P1 <= B;
gen1:for i in 13 downto 1 generate
p2(i+2) <= B(0) and B(i); --P2 = {{B[3:15] & {13{B[15]}}},1'd0,B[15],1'd0};
end generate;
p2(2 downto 0)<=('0' & B(0) & '0');
gen2:for i in 12 downto 2 generate
p3(i+3) <= B(1) and B(i); --P3 <= {{B[5:15] & {11{B[14]}}},1'd0,B[14],3'd0};
end generate;
p3(4 downto 0)<=( '0' & B(1) & "000");
gen3:for i in 11 downto 3 generate
p4(i+4) <= B(2) and B(i); -- P4 <= {{B[4:12] & {9{B[13]}}},1'd0,B[13],5'd0};
end generate;
p4(6 downto 0)<=('0'& B(2) & "00000");
gen5:for i in 10 downto 4 generate
p5(i+5) <= B(3) and B(i); -- P5 <= {{B[5:11] & {7{B[12]}}},1'd0,B[12],7'd0};
end generate;
p5(8 downto 0)<=('0' & B(3) & "0000000");
gen6:for i in 9 downto 5 generate
p6(i+6) <= B(4) and B(i); --P6 <= {{B[6:10] & {5{B[11]}}},1'd0,B[11],9'd0};
end generate;
p6(10 downto 0)<=( '0' & B(4) & "000000000");
gen7:for i in 8 downto 6 generate
p7(i+7) <= B(5) and B(i); -- P7 <= {{B[7:9] & {3{B[10]}}},1'd0,B[10],11'd0};
end generate;
p7(12 downto 0)<=( '0' & B(5) & "00000000000");
P8 <= ((B(7) AND B(6)) & '0'& B(6) & "0000000000000");
P9 <= (B(7) & "000000000000000");
w1: wallace_structure port map (P1,P2,P3,P4,P5,P6,P7,P8,P9,product); --Wallace tree
end Behavioral;