Implementation of Wddl Gates for Secure Ic Applications Sss 2004 (1)

Date post: 08-Apr-2015
Upload: navneetareddy
CMR INSTITUTE OF TECHNOLOGY

MINI PROJECT REPORT ON

IMPLEMENTATION OF WDDL GATES FOR

SECURE IC APPLICATIONS

A Mini Project Report Submitted in partial fulfillment of the requirement for the award of degree of

BACHELOR OF TECHNOLOGY IN

ELECTRONICS AND COMMUNICATION ENGINEERING

By

R SATISH KUMAR 06R01A0440

M SHESHU BINDU 06R01A0441

M SRAVAN KUMAR 06R01A0444

S KHAJA MOHIDDIN, M.Tech, Internal Guide, CMRIT

Prof. K RAMANAIAH, HOD ECE, CMRIT

Date __________

CERTIFICATE

This is to certify that the mini project entitled "IMPLEMENTATION OF WDDL GATES FOR SECURE IC APPLICATIONS" was successfully carried out by

R SATISH KUMAR 06R01A0440

M SHESHU BINDU 06R01A0441

M SRAVAN KUMAR 06R01A0444

in partial fulfillment of the requirement for the award of Bachelor of Technology in "Electronics and Communication Engineering" from "Jawaharlal Nehru Technological University" during the academic year 2006-2010.

Internal Guide: S KHAJA MOHIDDIN, M.Tech

Head of the Department: Prof. K Ramanaiah

External Examiner

Principal: Dr M Janga Reddy

ACKNOWLEDGMENT

We express our sincere thanks to the management of VEDIC SCHOOL OF VLSI DESIGN for giving us this opportunity to work in their organization.

We express our immense gratitude to Mr M R K Arjun, FPGA Design Engineer (Simpli5ng Semiconductor Pvt Ltd). His inspiring remarks, stimulating guidance, valuable suggestions and encouragement helped us greatly in the completion of our project "IMPLEMENTATION OF WDDL GATES FOR SECURE IC APPLICATIONS".

We wish to thank the internal guide of our project, Mr S KHAJA MOHIDDIN, Department of Electronics, for his constant inspiration and advice throughout our project work.

We express our sincere gratitude to respected Mr JANGA REDDY, Principal of CMRIT, and Mr K RAMANAIAH, HOD of the ECE department, for their valuable guidance, encouragement and suggestions.

INDEX

ABSTRACT

CHAPTER 1 INTRODUCTION AND OBJECTIVE
1.1 INTRODUCTION
1.2 OBJECTIVE

CHAPTER 2 REVIEW OF LITERATURE
2.1 INTRODUCTION TO DIGITAL DESIGN FLOW
2.2 SECURE DIGITAL DESIGN FLOW

CHAPTER 3 HARDWARE DESCRIPTION LANGUAGE (VHDL)

CHAPTER 4 SMART CARD OVERVIEW

CHAPTER 5 SIDE CHANNEL ATTACKS
5.1 CLASSIFICATION OF SIDE CHANNEL ATTACKS
5.2 POWER ANALYSIS ATTACKS
5.2.1 SIMPLE POWER ANALYSIS (SPA)
5.2.2 DIFFERENTIAL POWER ANALYSIS (DPA)

CHAPTER 6 CONSTANT-POWER CONSUMING LOGIC STYLES
6.1 CURRENT MODE LOGIC
6.2 VOLTAGE MODE LOGIC (CMOS CIRCUIT STYLES)
6.3 DYNAMIC DIFFERENTIAL LOGIC
6.3.1 SENSE AMPLIFIER BASED LOGIC (SABL)
6.3.2 WAVE DYNAMIC DIFFERENTIAL LOGIC GATES (WDDL)

CHAPTER 7 DESIGN OF WDDL GATES
7.1 WDDL GATES
7.1.1 WDDL OR GATE
7.1.2 WDDL AND GATE
7.1.3 WDDL NAND GATE
7.1.4 WDDL NOR GATE
7.1.5 WDDL XOR GATE

CHAPTER 8 FRONT END RESULTS

CHAPTER 9 SUMMARY AND CONCLUSION
9.1 SUMMARY
9.2 CONCLUSION

CHAPTER 10 REFERENCES

ABSTRACT

Every electronic device needs security, from the smallest RFID tags to larger hand-held devices. Security is needed for financial, medical, consumer, automotive and other applications. Small embedded integrated circuits (ICs) such as smart cards are vulnerable to so-called side-channel attacks (SCAs). Side-channel attacks are a class of attacks that derive information from an integrated circuit while it is in operation. The attacker can gain information by monitoring the power consumption, execution time, electromagnetic radiation and other information leaked by the switching behavior of digital complementary metal–oxide–semiconductor (CMOS) gates. For example, execution times that depend on the values of the data and/or key reveal what the circuit is doing. Simple timing or power attacks give visual information on the circuit. This project presents a digital very large scale integrated (VLSI) design flow to create secure, power-analysis-attack-resistant ICs. The root cause of the problem is that standard CMOS is power efficient: it consumes dynamic power only when nodes are switching.

The idea is to create digital circuit styles whose switching behavior is independent of the data, or the sequence of data, that they are processing. A logic style called "Wave Dynamic Differential Logic (WDDL)" is used to implement the basic logic gates used in cryptographic processors. The design flow runs from a normal design in a hardware description language such as VHDL to a Side Channel Attack (SCA) resistant layout.

Figure: WDDL pre-charge wave generation

CHAPTER 1 INTRODUCTION AND OBJECTIVE

1.1 Introduction

Small embedded integrated circuits (ICs) such as smart cards are vulnerable to so-called side-channel attacks (SCAs). The attacker can gain information by monitoring the power consumption, execution time, electromagnetic radiation and other information leaked by the switching behavior of digital complementary metal–oxide–semiconductor (CMOS) gates. This project presents a digital very large scale integrated (VLSI) design flow to create secure, power-analysis-attack-resistant ICs.

The idea is to create digital circuit styles whose switching behavior is independent of the data, or the sequence of data, that they are processing. A logic style called Wave Dynamic Differential Logic (WDDL) is used to implement the basic logic gates used in cryptographic processors. The design flow runs from a normal design in a hardware description language such as VHDL to a Side Channel Attack (SCA) resistant layout.

Depending on the parameter considered, side-channel attacks are classified as probing attacks, fault induction attacks, timing attacks, power analysis attacks, electromagnetic analysis attacks, etc. One side-channel attack in particular, Differential Power Analysis (DPA), is of great concern. It is very effective in finding the secret key and can be mounted quickly with off-the-shelf devices. The attack is based on the fact that logic operations have power characteristics that depend on the input data. It relies on statistical analysis to extract from the power consumption the information that is correlated to the secret key. As the variations actually originate at the logic level, implementing the encryption and decryption modules in a logic style in which a logic gate has constant power consumption at all times, independently of signal transitions, removes the foundation of DPA and is an effective means to halt it.

1.2 Objective of the Project

The main objectives of this dissertation are:

Study of constant-power logic styles

Description of WDDL Gates

Implementation of WDDL Logic Gates

Verification of the functionality of WDDL Logic Gates

Synthesis of the design

Analysis of the reports obtained during simulation and synthesis

CHAPTER 2 REVIEW OF LITERATURE

2.1 Introduction to Digital Design Flow

A typical digital design flow for any IC is as follows: design entry (specification, architecture, RTL coding and RTL verification); synthesis and post-synthesis verification; back end (floor planning, place and route, layout); and tape-out to the foundry to get the end product. All modern digital designs start with a designer writing a hardware description of the IC in a hardware description language (HDL) such as Verilog or VHDL. A Verilog or VHDL program essentially describes the hardware (logic gates, flip-flops, counters, etc.), the interconnect of the circuit blocks and the functionality. Various CAD tools are available to synthesize a circuit based on the HDL.

2.2 Secure Digital Design Flow

The secure digital design flow is depicted in Figure 2.1. In addition to the regular steps in an IC design (logic design, logic synthesis, place & route, stream out and verifications), one can recognize two additional steps, namely 1) "cell substitution" and 2) "interconnect decomposition". These operations have been inserted in the back end of the flow and do not interfere with the creative part of a design, indicated by the "logic design" task.

Figure 2.1 Secure Digital Design Flow

During the cell substitution step, cells designed in a constant-power logic style replace the conventional CMOS gates. This is what secures the ICs against power analysis attacks.
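The cell substitution step can be pictured as a simple lookup over a gate-level netlist. The sketch below is only an illustration in Python: the netlist format and the cell names (`AND2`, `WDDL_AND2`, and so on) are hypothetical and are not taken from the report or from any real cell library.

```python
# Hypothetical cell names: neither the library names nor the netlist format
# come from the report; they only illustrate the substitution step.
SECURE_CELL = {
    "AND2": "WDDL_AND2",
    "OR2": "WDDL_OR2",
    "NAND2": "WDDL_NAND2",
    "NOR2": "WDDL_NOR2",
    "XOR2": "WDDL_XOR2",
}


def substitute_cells(netlist):
    """Return a copy of the netlist with every cell swapped for its
    constant-power counterpart; cells without a counterpart raise an error."""
    secured = []
    for instance, cell, pins in netlist:
        if cell not in SECURE_CELL:
            raise KeyError(f"no constant-power replacement for {cell}")
        secured.append((instance, SECURE_CELL[cell], pins))
    return secured


# Each entry: (instance name, cell type, connected nets).
netlist = [("u1", "AND2", ("a", "b", "n1")), ("u2", "XOR2", ("n1", "c", "y"))]
secured = substitute_cells(netlist)
```

The connectivity is left untouched in this sketch; splitting each net into a differential pair would belong to the separate interconnect decomposition step.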

CHAPTER 3 HARDWARE DESCRIPTION LANGUAGE (VHDL)

Why (V)HDL?

Interoperability

Technology independence

Design reuse

Several levels of abstraction

Readability

Standard language

Widely supported

What is VHDL?

VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed Integrated Circuit)

Design specification language

Design entry language

Design simulation language

Design documentation language

An alternative to schematics

Brief History

VHDL was developed in the early 1980s for managing design problems that involved large circuits and multiple teams of engineers

Funded by the US Department of Defense

The first publicly available version was released in 1985

In 1986 IEEE (Institute of Electrical and Electronics Engineers, Inc.) was presented with a proposal to standardize VHDL

In 1987 standardization => IEEE 1076-1987

An improved version of the language was released in 1994 => IEEE Standard 1076-1993

Related Standards

IEEE 1076 doesn't support simulation conditions such as unknown and high-impedance

Soon after IEEE 1076-1987 was released, simulator companies began using their own non-standard types => VHDL was becoming nonstandard

The IEEE 1164 standard was then developed by IEEE

IEEE 1164 contains definitions for a nine-valued data type, std_logic

IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate to actual hardware

Defines e.g. two numeric types: signed and unsigned

VHDL Environment


Design Units

Segments of VHDL code can be compiled separately and stored in a library

Entities

A black box with interface definition

Defines the inputs/outputs of a component (defines pins)

A way to represent modularity in VHDL

Similar to a symbol in a schematic

Entity declaration describes the entity

E.g.

library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;

Ports

Provide channels of communication between the component and its environment

Each port must have a name direction and a type

An entity may have NO port declaration

Port directions

In: The value of a port can be read inside the component but cannot be assigned. Multiple reads of the port are allowed

Out: Assignments can be made to a port, but data from the port cannot be read. Multiple assignments are allowed

Inout: Bi-directional; assignments can be made and data can be read. Multiple assignments are allowed

Buffer: An out port with read capability. May have at most one assignment (not recommended)

Architectures

Every entity has at least one architecture

One entity can have several architectures

Architectures can describe a design using
– Behavior
– Structure
– Dataflow

Architectures can describe a design on many levels
– Gate level
– RTL (Register Transfer Level)
– Behavioral level

Configuration declaration links architecture to entity

E.g.

architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;

Configurations

Links entity declaration and architecture body together

Concept of default configuration is a bit messy in VHDL '87
– Last architecture analyzed links to entity

Can be used to change simulation behavior without re-analyzing the VHDL source

Complex configuration declarations are ignored in synthesis

Some entities can have e.g. a gate-level architecture and a behavioral architecture

Are always optional

Packages

Packages contain information common to many design units

1. Package declaration
– Constant declarations
– Type and subtype declarations
– Function and procedure declarations
– Global signal declarations
– File declarations
– Component declarations

2. Package body
– Is not necessarily needed
– Function bodies
– Procedure bodies

Packages are meant for encapsulating data which can be shared globally among several design units. They consist of a declaration part and an optional body part.

Package declaration can contain
– Type and subtype declarations
– Subprograms
– Constants
– Alias declarations
– Global signal declarations
– File declarations
– Component declarations

Package body consists of
– Subprogram declarations and bodies
– Type and subtype declarations
– Deferred constants
– File declarations

Libraries

Collection of VHDL design units (database)

1 Packages

package declaration

package body

2 Entities (entity declaration)

3 Architectures (architecture body)

4 Configurations (configuration declarations)

Usually directory in UNIX file system

Can be also any other kind of database

Levels of Abstraction

VHDL supports many possible styles of design description which differ primarily in how

closely they relate to the HW

It is possible to describe a circuit in a number of ways, listed here from lower to higher level of abstraction:

Structural

Dataflow

Behavioral

Structural VHDL description

Circuit is described in terms of its components

From a low-level description (eg transistor-level description) to a high level

description (eg block diagram)

For large circuits low-level descriptions quickly become impractical

Dataflow VHDL Description

Circuit is described in terms of how data moves through the system

In the dataflow style you describe how information flows between registers in the system

The combinational logic is described at a relatively high level; the placement and operation of registers is specified quite precisely

The behavior of the system over time is defined by registers

There are no built-in registers in the VHDL language
– Either a lower level description
– or a behavioral description of sequential elements is needed

The lower level register descriptions must be created or obtained

If there are no third-party models for registers => you must write the behavioral description of registers

The behavioral description can be provided in the form of subprograms (functions or procedures)

Behavioral VHDL Description

Circuit is described in terms of its operation over time

Representation might include e.g. state diagrams, timing diagrams and algorithmic descriptions

The concept of time may be expressed precisely using delays (e.g. A <= B after 10 ns)

If no actual delays are used, the order of sequential operations is defined

In the lower levels of abstraction (e.g. RTL) synthesis tools ignore detailed timing specifications

The actual timing results depend on the implementation technology and the efficiency of the synthesis tool

There are a few tools for behavioral synthesis

Concurrent Vs Sequential

Processes

Basic simulation concept in VHDL

VHDL description can always be broken up into interconnected processes

Quite similar to a UNIX process

Process keyword in VHDL

Process statement is a concurrent statement

Statements inside process statements are sequential statements

A process must contain either a sensitivity list or wait statement(s), but NOT both

The sensitivity list or wait statement(s) contain the signals which wake the process up

General Format

process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;

CHAPTER 4 SMART CARD OVERVIEW

This section very briefly introduces the concept of a smart card. Basically, a smart card is a computer embedded in a safe. It consists of a (typically 8-bit or 32-bit) processor together with ROM, EEPROM and a small amount of RAM, and is therefore capable of performing computations. The main goal of a smart card is to allow the execution of cryptographic operations involving some secret parameter (the key) while not revealing this parameter to the outside world. Conversely, the goal of the attacker is to recover this secret parameter. The processor is embedded in a chip and connected to the outside world through eight wires, the role, use and position of which are standardized. In addition to the input/output wires, the parts we will be most interested in are the following:

1. Power supply: Smart cards do not have an internal battery. The current they need is provided by the smart card reader. This makes the smart card's power consumption fairly easy for the attacker to measure.

2. Clock: Similarly, smart cards do not have an internal clock either. The clock ticks must also be provided from the outside world. As a consequence, the attacker can measure the card's running time with very good precision.

Smart cards are usually equipped with protection mechanisms composed of a shield (the passivation layer), whose goal is to hide the internal behavior of the chip, and possibly sensors that react when the shield is removed by destroying all sensitive data and preventing the card from functioning properly.

CHAPTER 5 SIDE CHANNEL ATTACKS

"Side channel attacks" are attacks that are based on "side channel information". Side channel information is information that can be retrieved from the encryption device and that is neither the plaintext to be encrypted nor the ciphertext resulting from the encryption process.

In the past, an encryption device was perceived as a unit that receives plaintext input and produces ciphertext output, and vice versa. Attacks were therefore based on either knowing the ciphertext (such as ciphertext-only attacks), or knowing both (such as known plaintext attacks), or on the ability to define what plaintext is to be encrypted and then seeing the results of the encryption (known as chosen plaintext attacks). Today it is known that encryption devices have additional outputs, and often additional inputs, which are not the plaintext or the ciphertext.

Encryption devices produce timing information (information about the time that operations take) that is easily measurable, radiation of various sorts, power consumption statistics (that can be easily measured as well), and more. Often the encryption device also has additional "unintentional" inputs, such as voltage, that can be modified to cause predictable outcomes. Side channel attacks make use of some or all of this information, along with other (known) cryptanalytic techniques, to recover the key the device is using.

Side channel analysis techniques are of concern because the attacks can be mounted quickly and can sometimes be implemented using readily available hardware costing from only a few hundred dollars to thousands of dollars.

5.1 Classification of side channel attacks

The literature usually classifies side channel attacks along two orthogonal axes:

1. Invasive vs. non-invasive

Invasive attacks require de-packaging the chip to get direct access to its components. A typical example of this is the connection of a wire to a data bus to observe the data transfers.

A non-invasive attack only exploits externally available information (the emission of which is, however, often unintentional) such as running time or power consumption.

A more recent distinction adds semi-invasive attacks. These attacks are specific in that they require de-packaging of the chip to get access to the chip surface but do not tamper with the passivation layer (i.e. they do not require electrical contact to the metal surface).

2. Active vs. passive

Active attacks try to tamper with the card's proper functioning. For example, fault induction attacks try to induce errors in the computation.

In contrast, passive attacks simply observe the card's behavior during its processing, without disturbing it.

Note that these two axes are indeed orthogonal: an invasive attack may completely avoid disturbing the card's behavior, and a passive attack may require a preliminary de-packaging for the required information to be observable. These attacks are of course not mutually exclusive: an invasive attack may, for example, serve as a preliminary step for a non-invasive one by giving a detailed description of the chip's architecture that helps to find out where to put external probes.

Smart cards are usually equipped with protection mechanisms that are supposed to react to invasive attacks (although several invasive attacks are nonetheless capable of defeating these mechanisms, as will be illustrated below). On the other hand, it is worth pointing out that a non-invasive attack is completely undetectable: there is, for example, no way for a smart card to figure out that its running time is currently being measured. Other countermeasures are therefore necessary. From an economic point of view, invasive attacks are usually more expensive to deploy on a large scale, since they require individual processing of each attacked device. In this sense, non-invasive attacks constitute a bigger menace for the smart card industry.

Invasive attacks involve a relatively high capital investment for lab equipment plus a moderate investment of effort for each individual chip attacked. Non-invasive attacks require only a moderate capital investment plus a moderate investment of effort in designing an attack on a particular type of device. Thereafter, the cost per device attacked is low. Semi-invasive attacks can be carried out using very cheap and simple equipment.

The attacker can gain information through:

1. Probing attacks

2. Fault induction attacks

3. Timing attacks

4. Power analysis attacks, and

5. Electromagnetic analysis attacks

These attacks are performed while the digital complementary metal–oxide–semiconductor (CMOS) gates are switching. Of all these, the power analysis attack is of major concern.

5.2 Power analysis attacks

The power consumption of a cryptographic device may provide much information about the operations that take place and the parameters involved. This is the idea behind simple and differential power analysis, first introduced by Kocher et al. Like the clock ticks, the card's energy is provided by the terminal and can therefore easily be measured. Basically, to measure a circuit's power consumption, a small (e.g. 50 ohm) resistor is inserted in series with the power or ground input. The voltage difference across the resistor divided by the resistance yields the current. Well-equipped electronics labs have equipment that can digitally sample voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than 1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC can be bought for less than US$ 400.
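The measurement described above is plain Ohm's law. As a minimal sketch, assuming a 50 ohm shunt as in the text and a 5 V supply (the supply voltage and the sample values are assumptions for illustration, not figures from the report), the sampled voltage drops convert to current and power as follows:

```python
def supply_current(v_shunt, r_shunt=50.0):
    """Ohm's law: current through the series resistor, in amperes."""
    return v_shunt / r_shunt


def instantaneous_power(v_shunt, v_supply, r_shunt=50.0):
    """Power drawn by the card itself: the voltage left after the shunt
    drop, multiplied by the current flowing through the shunt."""
    return (v_supply - v_shunt) * supply_current(v_shunt, r_shunt)


# Assumed oscilloscope samples of the voltage across the shunt, in volts.
samples_v = [0.10, 0.25, 0.15]
currents_ma = [supply_current(v) * 1e3 for v in samples_v]   # about 2, 5, 3 mA
power_w = [instantaneous_power(v, v_supply=5.0) for v in samples_v]
```

Because the attacker only needs relative variations between samples, the shunt voltage samples themselves are usually enough; the conversion matters mainly when comparing traces taken on different setups.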

Power analysis attacks are of two types:

1. Simple Power Analysis (SPA) attack, and

2. Differential Power Analysis (DPA) attack

SPA attacks on smart cards typically take a few seconds per card, while DPA attacks can take several hours. In general, from a somewhat academic perspective, we may consider the entire internal state of the block cipher to be all the intermediate results and values that are never included in the output in normal operation. For example, DES has 16 rounds; we can consider the intermediate states state[1..15] after each round except the last as a secret internal state. Side channels typically give information about these internal states, or about the operations used in the transition of this internal state from one round to another. The type of side channel will of course determine what information is available to the attacker about these states. The attacks typically work by finding some information about the internal state of the cipher which can be learned both by guessing part of the key and checking the value directly, and additionally by some statistical property of the cipher that makes that checkable value slightly nonrandom.

5.2.1 Simple Power Analysis attack (SPA)

Simple Power Analysis is generally based on looking at a visual representation of the power consumption of a unit while an encryption operation is being performed. It is a technique that involves direct interpretation of power consumption measurements collected during cryptographic operations. SPA can yield information about a device's operation as well as key material.

A trace refers to a set of power consumption measurements taken across a cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a trace containing 5000 points. The figure below, for example, shows an SPA trace from a smart card performing a DES operation.
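The trace length quoted above is simply the operation time multiplied by the sampling rate. A one-function sketch (the function name is illustrative only):

```python
def samples_in_trace(duration_s, sample_rate_hz):
    """Number of points in a trace: operation time times sampling rate."""
    return round(duration_s * sample_rate_hz)


# The example from the text: a 1 ms operation sampled at 5 MHz.
points = samples_in_trace(1e-3, 5e6)
```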

Figure: SPA monitoring from a single DES operation performed by a typical smart card. The upper trace shows the entire encryption operation, including the initial permutation, the 16 DES rounds, and the final permutation. The lower trace is a detailed view of the second and third rounds.

Because SPA can reveal the sequence of instructions executed, it can be used to break cryptographic implementations in which the execution path depends on the data being processed. For example:

DES key schedule: The DES key schedule computation involves rotating 28-bit key registers. A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will contain different SPA features if the execution paths take different branches for each.

DES permutations: DES implementations perform a variety of bit permutations. Conditional branching in software or microcode can cause significant power consumption differences for "0" and "1" bits.

Comparisons: String or memory comparison operations typically perform a conditional branch when a mismatch is found. This conditional branching causes large SPA (and sometimes timing) characteristics.

Multipliers: Modular multiplication circuits tend to leak a great deal of information about the data they process. The leakage functions depend on the multiplier design but are often strongly correlated to operand values and Hamming weights.

Exponentiators: A simple modular exponentiation function scans across the exponent, performing a squaring operation in every iteration with an additional multiplication operation for each exponent bit that is equal to "1". The exponent can be compromised if squaring and multiplication operations have different power consumption characteristics, take different amounts of time, or are separated by different code. Modular exponentiation functions that operate on two or more exponent bits at a time may have more complex leakage functions.
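The exponentiator case can be made concrete with a small sketch. It assumes an idealized observer who can tell a squaring ("S") from a multiplication ("M") in the trace; the operation log stands in for the power trace, and the function names are illustrative only.

```python
def modexp_trace(base, exponent, modulus):
    """Left-to-right square-and-multiply.
    Returns the result and the sequence of operations performed."""
    result = 1
    ops = []
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus      # squaring in every iteration
        ops.append("S")
        if bit == "1":                            # extra multiply for a 1 bit
            result = (result * base) % modulus
            ops.append("M")
    return result, ops


def recover_exponent(ops):
    """What an SPA observer reads off: 'S' alone is a 0 bit, 'SM' is a 1 bit."""
    bits = []
    for op in ops:
        if op == "S":
            bits.append("0")
        else:                                     # 'M' marks the previous bit as 1
            bits[-1] = "1"
    return int("".join(bits), 2)


result, ops = modexp_trace(7, 0b101101, 1009)
leaked = recover_exponent(ops)                    # equals the secret exponent
```

In this idealized setting a single trace reveals the whole exponent, which is why the text notes that squaring and multiplication must not be distinguishable.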

5.2.2 Differential Power Analysis attack (DPA)

In addition to large-scale power variations due to the instruction sequence, there are effects correlated to the data values being manipulated. These variations tend to be smaller and are sometimes overshadowed by measurement errors and other noise. In such cases it is still often possible to break the system using statistical functions tailored to the target algorithm.

To implement the DPA attack, an attacker first observes m encryption operations and captures power traces $T_{1..m}[1..k]$ containing k samples each. In addition, the attacker records the ciphertexts $C_{1..m}$. No knowledge of the plaintext is required. DPA analysis uses power consumption measurements to determine whether a key block guess $K_s$ is correct. The attacker computes a k-sample differential trace $\Delta_D[1..k]$ by finding the difference between the average of the traces for which a certain intermediate value V is one and the average of the traces for which V is zero, where $V = D(C_i, b, K_s)$ is the value of target bit b predicted by the selection function D from ciphertext $C_i$ under the guess $K_s$. Thus $\Delta_D[j]$ is the average over $C_{1..m}$ of the effect due to the value represented by the selection function D on the power consumption at point j. In particular,

\[
\Delta_D[j] = \frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}{\sum_{i=1}^{m} D(C_i, b, K_s)} - \frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)}
\]

If $K_s$ is incorrect, the bit computed using D will differ from the actual target bit for about half of the ciphertexts $C_i$. The selection function is thus effectively uncorrelated to what was actually computed by the target device. If a random function is used to divide a set into two subsets, the difference in the averages of the subsets should approach zero as the subset sizes approach infinity.

Thus, for an incorrect $K_s$, trace components uncorrelated to D diminish with $1/\sqrt{m}$, causing the differential trace to become flat (the actual trace may not be completely flat, as D with an incorrect $K_s$ may have a weak correlation to D with the correct $K_s$). If $K_s$ is correct, however, the computed value for $D(C_i, b, K_s)$ will equal the actual value of target bit b with probability 1. The selection function is thus correlated to the value of the bit considered. The contributions of other data values, measurement errors, etc. that are not correlated to D approach zero. Because power consumption is correlated to data bit values, the plot of $\Delta_D$ will be flat, with spikes in regions where D is correlated to the values being processed. The correct value of $K_s$ can thus be identified from the spikes in its differential trace. Four values of b correspond to each S-box, providing confirmation of key block guesses. Finding all eight $K_s$ yields the entire 48-bit round subkey. The remaining 8 key bits can be found easily using exhaustive search or by analyzing one additional round. Triple DES keys can be found by analyzing an outer DES operation first, using the resulting key to decrypt the ciphertext, and attacking the next DES key. DPA can use known plaintext or known ciphertext and can find encryption or decryption keys.
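The difference-of-means procedure can be tried out on simulated data. The sketch below is a toy model, not an attack on DES: the 8-bit S-box is a seeded random permutation, the "device" leaks one intermediate bit at a single sample position on top of Gaussian noise, and the attacker partitions traces by known input data rather than by ciphertext. All names and parameters are assumptions made for the illustration.

```python
import random


def make_sbox(seed=1):
    """Stand-in 8-bit S-box: a seeded random permutation, not a real cipher."""
    box = list(range(256))
    random.Random(seed).shuffle(box)
    return box


def select_bit(sbox, data, key_guess):
    """Selection function D: predicted target bit for one key guess."""
    return sbox[data ^ key_guess] & 1


def simulate_traces(sbox, key, m, k, leak_at, seed=2, noise=0.5):
    """Toy device: k-sample traces of Gaussian noise; the sample at index
    leak_at is raised by 1 whenever the target bit is 1."""
    rng = random.Random(seed)
    data = [rng.randrange(256) for _ in range(m)]
    traces = []
    for d in data:
        trace = [rng.gauss(0.0, noise) for _ in range(k)]
        trace[leak_at] += select_bit(sbox, d, key)
        traces.append(trace)
    return data, traces


def differential_trace(sbox, data, traces, key_guess):
    """Mean of the traces with D = 1 minus mean of the traces with D = 0."""
    k = len(traces[0])
    sums = [[0.0] * k, [0.0] * k]
    counts = [0, 0]
    for d, trace in zip(data, traces):
        b = select_bit(sbox, d, key_guess)
        counts[b] += 1
        row = sums[b]
        for j, sample in enumerate(trace):
            row[j] += sample
    return [sums[1][j] / max(counts[1], 1) - sums[0][j] / max(counts[0], 1)
            for j in range(k)]


def recover_key(sbox, data, traces):
    """Return the key guess whose differential trace has the largest spike."""
    def peak(guess):
        return max(abs(x) for x in differential_trace(sbox, data, traces, guess))
    return max(range(256), key=peak)
```

With, say, `data, traces = simulate_traces(make_sbox(), key=0x3C, m=800, k=8, leak_at=5)`, the differential trace for the correct guess shows a spike near 1.0 at sample 5 and stays close to zero elsewhere, while wrong guesses give much smaller peaks, so `recover_key` returns the secret key byte.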

CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES

The power consumption of traditional standard cells and logic depends on the signal activity. When the output of a logic gate makes a 0 to 1 transition, current flows from the power supply and charges the output capacitance. On the other hand, when the output sees a 1 to 0, a 0 to 0 or a 1 to 1 transition, no energy or only a limited amount of energy (due to short-circuit or leakage current) is consumed from the power supply. This is the fundamental reason why information is leaked through the power supply and why power attacks are possible. The basis of a secure digital design flow is a logic style with constant power consumption.
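A minimal sketch of this data dependence, under a simplified model in which only a 0 to 1 output transition draws energy from the supply (C·Vdd² per event) and short-circuit and leakage currents are ignored; the unit capacitance and supply voltage are placeholder values:

```python
def supply_energy(outputs, c_load=1.0, vdd=1.0, initial=0):
    """Energy drawn from the supply for each step of an output sequence.
    Only a 0 -> 1 transition charges the load capacitance from the supply;
    short-circuit and leakage currents are ignored in this sketch."""
    energy = []
    prev = initial
    for out in outputs:
        energy.append(c_load * vdd ** 2 if (prev, out) == (0, 1) else 0.0)
        prev = out
    return energy


# Two different data sequences give visibly different supply profiles.
profile_a = supply_energy([1, 1, 0, 1, 0, 0])   # [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
profile_b = supply_energy([0, 1, 0, 1, 0, 1])   # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
```

An observer of the supply who sees these per-cycle profiles learns when the output rose, which is exactly the leakage that a constant-power logic style has to remove.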

6.1 Current Mode Logic

Current mode logic (CML), e.g. current steering logic, seems the ideal solution. This type of logic continuously draws a current from the supply and encodes its state in the path that the current takes. A gate has constant power consumption if it draws a perfectly constant current from the power supply, independently of the input and output signals. To build a current source capable of generating a constant current, special circuit techniques that minimize channel length modulation have to be used.

The decisive drawback of CML, however, is its static power consumption. Even when the logic gate is not processing any data it burns current, which makes this logic style unacceptable for embedded, battery-operated devices.

6.2 Voltage Mode Logic (CMOS circuit styles)

Voltage mode logic (VML), e.g. static CMOS logic, only draws current from the supply to change state and encodes its state in the amount of charge stored on a capacitance. A regular standard CMOS circuit only consumes power when a capacitance is charged and later discharged, i.e. when a gate switches state. This is the main reason that CMOS is the style of choice for every battery-operated or low-power device. This is illustrated in the figure below for a simple inverter. Thus static CMOS is the preferred logic style because of its low power consumption and high noise margins.

Standard CMOS inverter

Yet two conditions must be satisfied for VML to have constant power consumption

namely

1) A logic gate must have exactly one switching event per signal transition

2) The logic gate must charge a constant capacitance in that switching event

28

Here above all the four transitions of CMOS inverter can be distinguished when

monitoring the power supply
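The data dependence described above can be made concrete with a small model. The sketch below is a simplified illustration, not a circuit simulation: it assumes that only a 0 to 1 output transition draws energy from the supply and ignores short-circuit and leakage currents. The names `C_LOAD`, `VDD`, `supply_energy` and `energy_trace` are hypothetical and the units are normalised.

```python
# Toy energy model of a static CMOS output node (normalised units).
# Assumption: only a 0 -> 1 transition draws C * VDD^2 from the supply;
# short-circuit and leakage currents are ignored.
C_LOAD = 1.0  # hypothetical load capacitance
VDD = 1.0     # hypothetical supply voltage


def supply_energy(prev: int, curr: int) -> float:
    """Energy drawn from the supply when the output goes prev -> curr."""
    return C_LOAD * VDD ** 2 if (prev, curr) == (0, 1) else 0.0


def energy_trace(outputs):
    """Per-cycle supply energy for a sequence of output values."""
    return [supply_energy(a, b) for a, b in zip(outputs, outputs[1:])]


if __name__ == "__main__":
    for prev in (0, 1):
        for curr in (0, 1):
            print(f"{prev} -> {curr}: {supply_energy(prev, curr)}")
    print(energy_trace([0, 1, 1, 0, 1, 0, 0]))
```

Even in this crude model the supply trace follows the data sequence, which is the leak that a constant-power logic style has to remove.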

6.3 Dynamic Differential Logic

Dynamic differential logic, sometimes also referred to as dual-rail with pre-charge logic, fulfills the first condition. A differential logic family uses both the true and the false representation of the input and output signals, and a dynamic logic family alternates pre-charge and evaluation phases. Since both outputs (true and false) are pre-charged to 1, exactly one of the two output nodes evaluates to 0 in the evaluation phase, which yields a differential output signal. The discharged output node is charged back to 1 in the following pre-charge phase, so that both outputs are again at 1. In other words, every signal transition, including the events in which the input signals remain constant, is represented by an actual switching event in which the logic gate charges a capacitance. The logic families that have been introduced to thwart differential power analysis (DPA) use dynamic differential logic in the following techniques:

1. Sense Amplifier Based Logic (SABL), and

2. Wave Dynamic Differential Logic (WDDL) gates.
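The "exactly one switching event per cycle" property of dual-rail pre-charged logic can be checked with a few lines of code. This is a minimal behavioral sketch under the convention stated above (both rails pre-charged to 1); the function name `ddl_cycle_events` is hypothetical.

```python
def ddl_cycle_events(value: int) -> dict:
    """Count rail events in one pre-charge/evaluate cycle of a dual-rail output.

    Both rails start pre-charged to 1. Evaluation discharges exactly one
    rail (the false rail if value == 1, the true rail if value == 0), and
    the following pre-charge phase recharges that rail.
    """
    true_rail, false_rail = 1, 1  # pre-charged state
    eval_true, eval_false = (1, 0) if value else (0, 1)
    discharges = (true_rail != eval_true) + (false_rail != eval_false)
    recharges = (eval_true != 1) + (eval_false != 1)  # back to (1, 1)
    return {"discharges": discharges, "recharges": recharges}


if __name__ == "__main__":
    for bit in [0, 0, 1, 1, 0, 1]:
        print(bit, ddl_cycle_events(bit))
```

Whatever the data sequence, every cycle shows one discharge and one recharge, so the number of switching events carries no information about the processed value.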

6.3.1 Sense Amplifier Based Logic (SABL)

The main advantage of SABL is that it has balanced input and output nodes and that all internal nodes connect to an output. The output capacitances can be balanced. Systematic methods have been developed to make sure that both branches of the differential pull-down network are balanced and that no memory effects are present in the network. Sense Amplifier Based Logic is illustrated in the figure below.

Figure: Sense Amplifier Based Logic AND/NAND gate

This circuit style does, however, require a full custom characterization and layout. It also suffers from a high clock load, common to all dynamic logic gates.

6.3.2 Wave Dynamic Differential Logic Gates (WDDL)

WDDL can be implemented with static CMOS logic. Static CMOS standard cells are combined to form secure compound standard cells, which have a reduced power signature. WDDL has many advantages. It can be readily implemented from an existing standard cell library. The design flow is fully supported with accurate EDA library files that come directly from the vendor. WDDL also results in a dynamic differential logic with only a small load capacitance on the pre-charge control signal and with the low power consumption and high noise margins of static CMOS.

Advantages of the WDDL logic style are as follows:

A major advantage of the proposed logic style is that it can be incorporated into the common Electronic Design Automation (EDA) tool flow.

No special design rules are involved in the interconnection of WDDL gates.

The switching factor of WDDL is 100%.

A WDDL gate consists of a parallel combination of two positive complementary gates, one calculating the true output using the true inputs, the other calculating the false output using the false inputs. A positive gate produces a zero output for an all-zero input. The AND gate and the OR gate are examples of positive gates. A complementary gate, sometimes also referred to as a dual gate, expresses the false output of the original logic gate using the false inputs of the original gate. The AND gate fed with true input signals and the OR gate fed with false input signals are two dual gates. The figure shows the WDDL AND gate and the WDDL OR gate.

In the evaluation phase, each input signal is differential and the WDDL gate calculates its differential output. In the pre-charge phase, the inputs to the WDDL gate are set to 0, which puts the output of the gate at 0. A module in WDDL pre-charges without distributing the pre-charge signal to each individual gate. During the pre-charge phase, the input vector of the combinatorial logic is set to all 0s. Each individual gate will eventually have all its inputs at 0, evaluate its output to 0, and pass this 0 value to the next gate. One could say that the pre-charge signal travels over the combinatorial logic as a 0-wave, hence the name WDDL. There are several ways to launch the pre-charge wave. In the figure, a pre-charge operator is inserted at the start of every combinatorial logic tree, i.e. at the inputs of the encryption module and at the outputs of the registers. These operators produce an all-zero output in the pre-charge phase (clk signal high) but let the differential signal through during the evaluation phase (clk signal low).
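The behavior described above, dual positive gates on the two rails plus a pre-charge operator at the inputs only, can be sketched as a small behavioral model. This is an illustration under stated assumptions rather than the report's VHDL: signals are `(true_rail, false_rail)` tuples, and the names `encode`, `precharge`, `wddl_and`, `wddl_or` and the two-level example `network` are hypothetical.

```python
from itertools import product


def encode(bit):
    """Differential encoding (true rail, false rail) of a single bit."""
    return (bit, 1 - bit)


def precharge(clk, signal):
    """Pre-charge operator: force both rails to 0 while clk is high."""
    return (0, 0) if clk else signal


def wddl_and(a, b):
    """WDDL AND: AND on the true rails, its dual (OR) on the false rails."""
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a, b):
    """WDDL OR: OR on the true rails, its dual (AND) on the false rails."""
    return (a[0] | b[0], a[1] & b[1])


def network(clk, a, b, c):
    """Two-level example computing (a AND b) OR c; only the inputs see clk."""
    a, b, c = (precharge(clk, encode(x)) for x in (a, b, c))
    return wddl_or(wddl_and(a, b), c)


if __name__ == "__main__":
    for a, b, c in product((0, 1), repeat=3):
        print(a, b, c, "precharge:", network(1, a, b, c),
              "evaluate:", network(0, a, b, c))
```

With clk high the all-zero input vector propagates through both gate levels as the 0-wave, and with clk low exactly one of the two output rails rises for every input combination.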

Figure: WDDL pre-charge wave generation

CHAPTER 7

WDDL GATES

The methodology used in the project is a bottom-up approach. Lower-level modules are designed first and later integrated to form larger modules, whose further integration leads to the final top module. Since logic gates form the lowest-level modules, the logic gates required for the design are implemented in WDDL style first. WDDL demands a parallel combination of two positive complementary gates, one calculating the true value and the other the false value. Logic gates such as OR, AND and XOR have been implemented. In addition, a full adder, a 32-bit XOR, etc. have been implemented.

7.1 WDDL OR gate

A WDDL OR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.

Figure 4.1 WDDL OR Gate

The NAND and NOT gates used act as the pre-charge operator, injecting the signal '0' into the gates when the 'clk' signal is high, i.e. during the pre-charge phase.

7.2 WDDL AND gate

A WDDL AND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.

Figure 4.2 WDDL AND Gate

7.3 WDDL NAND gate

A WDDL NAND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.

Figure 4.3 WDDL NAND Gate

7.4 WDDL NOR gate

A WDDL NOR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.
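The NAND and NOR gates above use the same AND/OR pairs as the AND and OR gates. One way to read this, assumed here because the figures are not available in this text, is that the inversion is obtained by exchanging the true and false output rails, so that no inverter (which is not a positive gate) is needed. The sketch below models that reading; `swap_rails`, `wddl_nand` and `wddl_nor` are hypothetical names.

```python
from itertools import product


def encode(bit):
    """Differential encoding (true rail, false rail) of a single bit."""
    return (bit, 1 - bit)


def wddl_and(a, b):
    """AND on the true rails, OR on the false rails."""
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a, b):
    """OR on the true rails, AND on the false rails."""
    return (a[0] | b[0], a[1] & b[1])


def swap_rails(signal):
    """Assumed inversion: exchange the true and the false rail."""
    return (signal[1], signal[0])


def wddl_nand(a, b):
    return swap_rails(wddl_and(a, b))


def wddl_nor(a, b):
    return swap_rails(wddl_or(a, b))


if __name__ == "__main__":
    for x, y in product((0, 1), repeat=2):
        print(x, y, wddl_nand(encode(x), encode(y)), wddl_nor(encode(x), encode(y)))
```

Because swapping rails maps the pre-charge value (0, 0) onto itself, the 0-wave still passes through an inverting gate built this way.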

Figure 4.4 WDDL NOR Gate

7.5 WDDL XOR gate

The XOR function can be implemented by the Boolean equation A XOR B = AB' + A'B. The XOR gate is therefore implemented in terms of AND and OR gates, as shown in figure 4.5. It can also be implemented by instantiating WDDL AND and WDDL OR gates, but the number of gates involved in that approach is greater than in the former. Therefore the first method of implementation is followed rather than the second.
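A possible gate-level reading of the XOR equation above is sketched below. It is an assumption-laden illustration, not the report's netlist: the true rail is computed as AB' + A'B directly from the four input rails, and the false rail is taken to be the dual expression (A' + B)(A + B'), so that only positive AND and OR gates are used. The name `wddl_xor` is hypothetical.

```python
from itertools import product


def encode(bit):
    """Differential encoding (true rail, false rail) of a single bit."""
    return (bit, 1 - bit)


def wddl_xor(a, b):
    """XOR on differential signals using only AND and OR gates.

    true rail  = A.B' + A'.B
    false rail = (A' + B).(A + B')   (the dual expression, equal to XNOR)
    """
    a_t, a_f = a
    b_t, b_f = b
    true_out = (a_t & b_f) | (a_f & b_t)
    false_out = (a_f | b_t) & (a_t | b_f)
    return (true_out, false_out)


if __name__ == "__main__":
    for x, y in product((0, 1), repeat=2):
        print(x, y, wddl_xor(encode(x), encode(y)))
    print("precharge:", wddl_xor((0, 0), (0, 0)))
```

Since the complemented inputs are simply the false rails, no inverters appear in either expression, and an all-zero input still produces an all-zero output.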

Figure 4.5 WDDL XOR gate

With the help of the above basic gates, a full adder circuit has been designed by instantiating the WDDL gates designed above. During the implementation of the Blowfish algorithm, a 32-bit XOR gate and a 32-bit adder circuit are required. They can easily be implemented by instantiating the corresponding lower-level module 32 times.

CHAPTER 8 FRONT END

RESULTS

WDDL OR GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name : wddlor.ngr
Top Level Output File Name : wddlor
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 5

Cell Usage
BELS : 2
LUT3 : 2
IO Buffers : 5
IBUF : 3
OBUF : 2

Device utilization summary
Selected Device : 3s250eft256-4
Number of Slices : 1 out of 2448 (0%)
Number of 4 input LUTs : 2 out of 4896 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 172 (2%)

Timing Summary
Speed Grade : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result

WDDL AND GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name : wddlgates.ngr
Top Level Output File Name : wddlgates
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 5

Cell Usage
BELS : 2
LUT3 : 2
IO Buffers : 5
IBUF : 3
OBUF : 2

Device utilization summary
Selected Device : 3s250etq144-4
Number of Slices : 1 out of 2448 (0%)
Number of 4 input LUTs : 2 out of 4896 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 108 (4%)

Timing Summary
Speed Grade : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result

WDDL NAND GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name : wddlnand1.ngr
Top Level Output File Name : wddlnand1
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 5

Cell Usage
BELS : 2
LUT3 : 2
IO Buffers : 5
IBUF : 3
OBUF : 2

Device utilization summary
Selected Device : 3s500efg320-4
Number of Slices : 1 out of 4656 (0%)
Number of 4 input LUTs : 2 out of 9312 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 232 (2%)

Timing Summary
Speed Grade : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result

WDDL XOR GATE

Simulation Result

Synthesis Result

WDDL XOR GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name : wddlxorgate.ngr
Top Level Output File Name : wddlxorgate
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 5

Cell Usage
BELS : 2
LUT3 : 2
IO Buffers : 5
IBUF : 3
OBUF : 2

Device utilization summary
Selected Device : 3s250eft256-4
Number of Slices : 1 out of 2448 (0%)
Number of 4 input LUTs : 2 out of 4896 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 172 (2%)

Timing Summary
Speed Grade : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result

CHAPTER 9 SUMMARY AND CONCLUSION

9.1 Summary

In order to secure ICs against side-channel attacks, especially Differential Power Analysis (DPA), it is necessary to implement the design in a logic style that renders constant power dissipation irrespective of the input combination. WDDL has proved advantageous over other styles and is therefore of great significance. In this dissertation work, an architecture for the Blowfish algorithm is designed and implemented in WDDL style. A bottom-up approach is used in this implementation: the low-level entities are designed first and are later combined to form the entire module. The key scheduling is online. The sub-keys generated for a particular key can be used for the encryption of all the data to be encrypted with that key. The sub-keys are applied in reverse order in the decryption data path. Initially, logic gates are implemented in WDDL, and then higher-level modules are designed by instantiating the WDDL gates to form the entire module, resulting in constant power dissipation irrespective of the input data combination. The entire design works in two phases, namely the pre-charge phase and the evaluation phase. In the pre-charge phase all the signals of the design are zeroed, and during the evaluation phase the functionality of the design is achieved. This sort of design has been found simple and very effective in thwarting the side-channel attack known as Differential Power Analysis (DPA).

9.2 Conclusion

The crypto processor has been designed for a key size of 448 bits and a plain text of 64 bits. The code for the implementation has been written in VHDL. Functional verification has been done using the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The back end of the design is done using SOC Encounter.

The desired functionality has been achieved according to the specifications. In the output, during the evaluation phase, there is the same number of transitions in every cycle, resulting in constant power dissipation. During synthesis it has been observed that a simple WDDL gate comprises many conventional gates. Therefore the area of the design has grown nearly three-fold compared to the design implemented in conventional CMOS logic; this is the cost of the security incorporated into the IC against Differential Power Analysis (DPA). Due to the constant power dissipation at the output, an attacker cannot apply the Differential Power Analysis (DPA) scheme to find the secret key that is being used in the crypto-processor. Thus security against DPA is incorporated into the IC at the hardware level by implementing the design in WDDL style, which is quite simple and effective.



INDEX

ABSTRACT

CHAPTER 1 INTRODUCTION AND OBJECTIVE
1.1 INTRODUCTION
1.2 OBJECTIVE

CHAPTER 2 REVIEW OF LITERATURE
2.1 INTRODUCTION TO DIGITAL DESIGN FLOW
2.2 SECURE DIGITAL DESIGN FLOW

CHAPTER 3 HARDWARE DESCRIPTION LANGUAGE (VHDL)

CHAPTER 4 SMART CARD OVERVIEW

CHAPTER 5 SIDE CHANNEL ATTACKS
5.1 CLASSIFICATION OF SIDE CHANNEL ATTACKS
5.2 POWER ANALYSIS ATTACKS
5.2.1 SIMPLE POWER ANALYSIS (SPA)
5.2.2 DIFFERENTIAL POWER ANALYSIS (DPA)

CHAPTER 6 CONSTANT-POWER CONSUMING LOGIC STYLES
6.1 CURRENT MODE LOGIC
6.2 VOLTAGE MODE LOGIC (CMOS CIRCUIT STYLES)
6.3 DYNAMIC DIFFERENTIAL LOGIC
6.3.1 SENSE AMPLIFIER BASED LOGIC (SABL)
6.3.2 WAVE DYNAMIC DIFFERENTIAL LOGIC GATES (WDDL)

CHAPTER 7 DESIGN OF WDDL GATES
7.1 WDDL GATES
7.1.1 WDDL OR GATE
7.1.2 WDDL AND GATE
7.1.3 WDDL NAND GATE
7.1.4 WDDL NOR GATE
7.1.5 WDDL XOR GATE

CHAPTER 8 FRONT END RESULTS

CHAPTER 9 SUMMARY AND CONCLUSION
9.1 SUMMARY
9.2 CONCLUSION

CHAPTER 10 REFERENCES

ABSTRACT

Every electronic device needs security, from the smallest RFID tags to larger hand-held devices. Security is needed for financial, medical, consumer, automotive and other applications. Small embedded integrated circuits (ICs) such as smart cards are vulnerable to so-called side-channel attacks (SCAs). Side-channel attacks are a class of attacks that derive information from an integrated circuit while it is in operation. The attacker can gain information by monitoring the power consumption, execution time, electromagnetic radiation and other information leaked by the switching behavior of digital complementary metal-oxide-semiconductor (CMOS) gates. For example, execution times that depend on the values of data and/or key reveal what the circuit is doing. Simple timing or power attacks give visual information on the circuit. This project presents a digital very large scale integrated (VLSI) design flow to create secure, power-analysis-attack-resistant ICs. The root cause of the problem is that standard CMOS is power efficient: it consumes dynamic power only when nodes are switching.

The idea is to create digital circuit styles whose switching behavior is independent of the data, or sequence of data, that they are processing. A logic style called "Wave Dynamic Differential Logic (WDDL)" is used for the implementation of the basic logic gates which are used in cryptographic processors. The design flow starts from a normal design in a hardware description language such as VHDL and ends with a Side Channel Attack (SCA) resistant layout.

Figure: WDDL pre-charge wave generation

CHAPTER 1 INTRODUCTION AND OBJECTIVE

1.1 Introduction

Small embedded integrated circuits (ICs) such as smart cards are vulnerable to so-called side-channel attacks (SCAs). The attacker can gain information by monitoring the power consumption, execution time, electromagnetic radiation and other information leaked by the switching behavior of digital complementary metal-oxide-semiconductor (CMOS) gates. This project presents a digital very large scale integrated (VLSI) design flow to create secure, power-analysis-attack-resistant ICs.

The idea is to create digital circuit styles whose switching behavior is independent of the data, or sequence of data, that they are processing. A logic style called Wave Dynamic Differential Logic (WDDL) is used for the implementation of the basic logic gates which are used in cryptographic processors. The design flow starts from a normal design in a hardware description language such as VHDL and ends with a Side Channel Attack (SCA) resistant layout.

Depending on the parameter considered, side-channel attacks are classified as probing attacks, fault induction attacks, timing attacks, power analysis attacks, electromagnetic analysis attacks, etc. One side-channel attack in particular, Differential Power Analysis (DPA), is of great concern. It is very effective in finding the secret key and can be mounted quickly with off-the-shelf devices. The attack is based on the fact that logic operations have power characteristics that depend on the input data. It relies on statistical analysis to extract, from the power consumption, the information that is correlated to the secret key. As the variations actually originate at the logic level, implementing the encryption and decryption modules in a logic style for which a logic gate has constant power consumption at all times, independently of signal transitions, removes the foundation of DPA and is an effective means to halt it.
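The statistical idea behind DPA can be illustrated with a deliberately small toy model. Everything in it is an assumption made for illustration and is not taken from the report: the device only XORs a known 8-bit plaintext with a secret key byte, the "power sample" is the number of 1-bits in the result, and there is no noise. The names `leakage`, `mean_differences` and `recover_key` are hypothetical.

```python
def leakage(plaintext: int, key: int) -> int:
    """Toy power model: number of 1-bits in plaintext XOR key."""
    return bin(plaintext ^ key).count("1")


def mean_differences(traces: dict, width: int = 8) -> list:
    """For each bit, mean power of plaintexts with that bit set minus the rest."""
    diffs = []
    for bit in range(width):
        ones = [e for p, e in traces.items() if (p >> bit) & 1]
        zeros = [e for p, e in traces.items() if not (p >> bit) & 1]
        diffs.append(sum(ones) / len(ones) - sum(zeros) / len(zeros))
    return diffs


def recover_key(traces: dict, width: int = 8) -> int:
    """A negative difference means the key bit flipped the plaintext bit."""
    diffs = mean_differences(traces, width)
    return sum(1 << bit for bit, d in enumerate(diffs) if d < 0)


if __name__ == "__main__":
    secret = 0xA7  # hypothetical key byte
    leaky = {p: leakage(p, secret) for p in range(256)}
    print(hex(recover_key(leaky)))

    constant = {p: 8 for p in range(256)}  # idealised constant-power logic
    print(mean_differences(constant))
```

With data-dependent samples the sign of each difference exposes one key bit. With constant samples every difference is zero, so the statistics no longer separate key hypotheses, which is the effect a constant-power logic style aims for.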

1.2 Objective of the Project

The main objectives of this dissertation are:

Study of constant-power logic styles

Description of WDDL Gates

Implementation of WDDL Logic Gates

Verification of the functionality of WDDL Logic Gates

Synthesis of the design

Analysis of the reports obtained during simulation and synthesis


CHAPTER 2 REVIEW OF LITERATURE

2.1 Introduction to Digital Design Flow

A typical digital design flow for any IC is as follows: design entry (specification, architecture, RTL coding and RTL verification), synthesis and post-synthesis verification, back end (floor planning, place and route, layout), and tape-out to the foundry to get the end product. All modern digital designs start with a designer writing a hardware description of the IC in a hardware description language (HDL) such as Verilog or VHDL. A Verilog or VHDL program essentially describes the hardware (logic gates, flip-flops, counters, etc.), the interconnection of the circuit blocks and the functionality. Various CAD tools are available to synthesize a circuit based on the HDL.

2.2 Secure Digital Design Flow

The secure digital design flow is depicted in Fig. 2.1. In addition to the regular steps in an IC design (logic design, logic synthesis, place & route, stream out and verification), one can recognize two additional steps, namely 1) "cell substitution" and 2) "interconnect decomposition". These operations have been inserted in the back end of the flow and do not interfere with the creative part of a design, indicated by the "logic design" task.

Figure 2.1 Secure Digital Design Flow

During the cell substitution step, cells designed in a constant-power logic style replace the conventional CMOS gates. This ensures the security of the ICs against power analysis attacks.

CHAPTER 3 HARDWARE DESCRIPTION LANGUAGE (VHDL)

Why (V)HDL?

Interoperability

Technology independence

Design reuse

Several levels of abstraction

Readability

Standard language

Widely supported

What is VHDL?

VHDL = VHSIC Hardware Description Language (VHSIC = Very High Speed Integrated Circuit)

Design specification language

Design entry language

Design simulation language

Design documentation language

An alternative to schematics

Brief History

VHDL was developed in the early 1980s for managing design problems that involved large circuits and multiple teams of engineers.

Funded by the US Department of Defense.

The first publicly available version was released in 1985.

In 1986 the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was presented with a proposal to standardize VHDL.

In 1987, standardization => IEEE 1076-1987.

An improved version of the language was released in 1994 => IEEE standard 1076-1993.

Related Standards

IEEE 1076 doesn't support simulation conditions such as unknown and high-impedance.

Soon after IEEE 1076-1987 was released, simulator companies began using their own non-standard types => VHDL was becoming non-standard.

The IEEE 1164 standard was developed by the IEEE.

IEEE 1164 contains definitions for a nine-valued data type, std_logic.

IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate to actual hardware.

It defines, e.g., two numeric types: signed and unsigned.

VHDL Environment


Design Units

Segments of VHDL code can be compiled separately and stored in a library.

Entities

A black box with an interface definition

Defines the inputs/outputs of a component (defines pins)

A way to represent modularity in VHDL

Similar to a symbol in a schematic

An entity declaration describes the entity

E.g.

library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;

Ports

Provide channels of communication between the component and its environment.

Each port must have a name, a direction and a type.

An entity may have NO port declaration.

Port directions

In: the value of a port can be read inside the component but cannot be assigned. Multiple reads of the port are allowed.

Out: assignments can be made to a port but data from a port cannot be read. Multiple assignments are allowed.

Inout: bi-directional; assignments can be made and data can be read. Multiple assignments are allowed.

Buffer: an out port with read capability. May have at most one assignment (buffer ports are not recommended).

Architectures

Every entity has at least one architecture.

One entity can have several architectures.

Architectures can describe a design using
- Behavior
- Structure
- Dataflow

Architectures can describe a design on many levels
- Gate level
- RTL (Register Transfer Level)
- Behavioral level

A configuration declaration links an architecture to an entity.

E.g.

architecture Comparator1 of Comparator is
begin
  -- EQ is '1' only when both input vectors are equal
  EQ <= '1' when (A = B) else '0';
end Comparator1;

Configurations

Link an entity declaration and an architecture body together.

The concept of a default configuration is a bit messy in VHDL '87
- The last architecture analyzed links to the entity

Can be used to change simulation behavior without re-analyzing the VHDL source.

Complex configuration declarations are ignored in synthesis.

Some entities can have, e.g., a gate-level architecture and a behavioral architecture.

Configurations are always optional.

Packages

Packages contain information common to many design units.

1. Package declaration
- Constant declarations
- Type and subtype declarations
- Function and procedure declarations
- Global signal declarations
- File declarations
- Component declarations

2. Package body
- Is not necessarily needed
- Function bodies
- Procedure bodies

Packages are meant for encapsulating data which can be shared globally among several design units. They consist of a declaration part and an optional body part.

A package declaration can contain
- Type and subtype declarations
- Subprograms
- Constants
- Alias declarations
- Global signal declarations
- File declarations
- Component declarations

A package body consists of
- Subprogram declarations and bodies
- Type and subtype declarations
- Deferred constants
- File declarations

Libraries

Collection of VHDL design units (database)

1 Packages

package declaration

package body

2 Entities (entity declaration)

3 Architectures (architecture body)

4 Configurations (configuration declarations)

Usually directory in UNIX file system

Can be also any other kind of database

Levels of Abstraction

VHDL supports many possible styles of design description, which differ primarily in how closely they relate to the hardware.

It is possible to describe a circuit in a number of ways, listed here from lower to higher level of abstraction:

Structural

Dataflow

Behavioral

Structural VHDL Description

The circuit is described in terms of its components.

From a low-level description (e.g. transistor-level description) to a high-level description (e.g. block diagram).

For large circuits, low-level descriptions quickly become impractical.

Dataflow VHDL Description

The circuit is described in terms of how data moves through the system.

In the dataflow style you describe how information flows between registers in the system.

The combinational logic is described at a relatively high level; the placement and operation of registers is specified quite precisely.

The behavior of the system over time is defined by registers.

There are no built-in registers in the VHDL language
- Either a lower-level description
- or a behavioral description of sequential elements is needed

The lower-level register descriptions must be created or obtained.

If there are no third-party models for registers => you must write the behavioral description of the registers.

The behavioral description can be provided in the form of subprograms (functions or procedures).

Behavioral VHDL Description

The circuit is described in terms of its operation over time.

The representation might include, e.g., state diagrams, timing diagrams and algorithmic descriptions.

The concept of time may be expressed precisely using delays (e.g. A <= B after 10 ns).

If no actual delays are used, only the order of sequential operations is defined.

In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed timing specifications.

The actual timing results depend on the implementation technology and the efficiency of the synthesis tool.

There are a few tools for behavioral synthesis.

Concurrent vs. Sequential

Processes

Basic simulation concept in VHDL

A VHDL description can always be broken up into interconnected processes.

Quite similar to a UNIX process

Process keyword in VHDL

A process statement is a concurrent statement.

Statements inside process statements are sequential statements.

A process must contain either a sensitivity list or wait statement(s), but NOT both.

The sensitivity list or wait statement(s) contain the signals which wake the process up.

General Format

process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;

CHAPTER 4 SMART CARD OVERVIEW

This section very briefly introduces the concept of a smart card. Basically, a smart card is a computer embedded in a safe. It consists of a (typically 8-bit or 32-bit) processor together with ROM, EEPROM and a small amount of RAM, and is therefore capable of performing computations. The main goal of a smart card is to allow the execution of cryptographic operations involving some secret parameter (the key) while not revealing this parameter to the outside world. Conversely, the goal of the attacker is to recover this secret parameter. The processor is embedded in a chip and connected to the outside world through eight wires, whose role, use and position are standardized. In addition to the input/output wires, the parts we will be most interested in are the following:

1. Power supply: smart cards do not have an internal battery. The current they need is provided by the smart card reader. This makes the smart card's power consumption fairly easy for the attacker to measure.

2. Clock: similarly, smart cards do not have an internal clock either. The clock ticks must also be provided from the outside world. As a consequence, the attacker can measure the card's running time with very good precision.

Smart cards are usually equipped with protection mechanisms composed of a shield (the passivation layer), whose goal is to hide the internal behavior of the chip, and possibly sensors that react when the shield is removed by destroying all sensitive data and preventing the card from functioning properly.

CHAPTER 5 SIDE CHANNEL ATTACKS

"Side channel attacks" are attacks that are based on "side channel information". Side channel information is information that can be retrieved from the encryption device and that is neither the plaintext to be encrypted nor the cipher text resulting from the encryption process.

In the past, an encryption device was perceived as a unit that receives plaintext input and produces cipher text output, and vice versa. Attacks were therefore based on either knowing the cipher text (such as cipher-text-only attacks), or knowing both (such as known plaintext attacks), or on the ability to define what plaintext is to be encrypted and then seeing the results of the encryption (known as chosen plaintext attacks). Today it is known that encryption devices have additional outputs, and often additional inputs, which are not the plaintext or the cipher text.

Encryption devices produce timing information (information about the time that operations take) that is easily measurable, radiation of various sorts, power consumption statistics (that can be easily measured as well), and more. Often the encryption device also has additional "unintentional" inputs, such as voltage, that can be modified to cause predictable outcomes. Side channel attacks make use of some or all of this information, along with other (known) cryptanalytic techniques, to recover the key the device is using.

Side channel analysis techniques are of concern because the attacks can be mounted quickly and can sometimes be implemented using readily available hardware costing from only a few hundred dollars to thousands of dollars.

5.1 Classification of side channel attacks

The literature usually classifies side channel attacks along two orthogonal axes.

1. Invasive vs. non-invasive

Invasive attacks require de-packaging the chip to get direct access to its components.
A typical example is the connection of a wire to a data bus to observe the data transfers.

A non-invasive attack only exploits externally available information (the emission of
which is, however, often unintentional), such as running time or power consumption.

A more recent distinction adds semi-invasive attacks. These attacks require de-packaging
the chip to get access to the chip surface, but do not tamper with the passivation layer (they
do not require electrical contact to the metal surface).

2. Active vs. passive

Active attacks try to tamper with the card's proper functioning. For example, fault
induction attacks try to induce errors in the computation.

Passive attacks, by contrast, simply observe the card's behavior during its processing,
without disturbing it.

Note that these two axes are indeed orthogonal: an invasive attack may completely avoid
disturbing the card's behavior, and a passive attack may require a preliminary de-packaging
for the required information to be observable.

These attacks are, of course, not mutually exclusive: an invasive attack may, for example,
serve as a preliminary step for a non-invasive one by giving a detailed description of the chip's
architecture that helps to find out where to put external probes.

Smart cards are usually equipped with protection mechanisms that are supposed to
react to invasive attacks (although several invasive attacks are nonetheless capable of defeating
these mechanisms). On the other hand, it is worth pointing out that a non-invasive attack is
completely undetectable: there is, for example, no way for a smart card to figure out that its
running time is currently being measured. Other countermeasures are therefore necessary.
From an economic point of view, invasive attacks are usually more expensive to deploy on a
large scale, since they require individual processing of each attacked device. In this sense,
non-invasive attacks constitute a bigger menace for the smart card industry.

Invasive attacks involve a relatively high capital investment for lab equipment plus a
moderate investment of effort for each individual chip attacked. Non-invasive attacks require
only a moderate capital investment plus a moderate investment of effort in designing an attack
on a particular type of device; thereafter, the cost per device attacked is low. Semi-invasive
attacks can be carried out using very cheap and simple equipment.

The attacker can gain information by

1. Probing attacks,
2. Fault induction attacks,
3. Timing attacks,
4. Power analysis attacks, and
5. Electromagnetic analysis attacks.

These attacks exploit information leaked by the switching behavior of digital
complementary metal-oxide-semiconductor (CMOS) gates. Of all these, the power analysis
attack is of major concern.

5.2 Power analysis attacks

The power consumption of a cryptographic device may provide much information
about the operations that take place and the parameters involved. This is the idea of simple and
differential power analysis, first introduced by Kocher et al. Like the clock ticks, the card's
energy is provided by the terminal and can therefore easily be measured. Basically, to
measure a circuit's power consumption, a small (e.g. 50 ohm) resistor is inserted in series with
the power or ground input. The voltage difference across the resistor divided by the resistance
yields the current. Well-equipped electronics labs have equipment that can digitally sample
voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than
1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC
can be bought for less than US$ 400.
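The measurement described above is just Ohm's law applied to the series resistor. The following is a minimal sketch of that conversion, not part of the original project; the 5 V supply voltage and the sample values are hypothetical, and the resistor is assumed to sit in the supply line.

```python
def shunt_current(v_drop, r_shunt=50.0):
    """Ohm's law: current (A) through the series resistor for a voltage drop (V)."""
    if r_shunt <= 0:
        raise ValueError("shunt resistance must be positive")
    return v_drop / r_shunt


def supply_power(v_drop, v_supply, r_shunt=50.0):
    """Instantaneous power (W) delivered to the card behind the resistor."""
    current = shunt_current(v_drop, r_shunt)
    # The card only sees the supply voltage minus the drop across the resistor.
    return (v_supply - v_drop) * current


# Hypothetical voltage drops (in volts) sampled across a 50 ohm resistor.
v_samples = [0.10, 0.25, 0.15]
currents = [shunt_current(v) for v in v_samples]
powers = [supply_power(v, v_supply=5.0) for v in v_samples]
```

A sequence of such samples taken over one cryptographic operation is what the following sections call a trace.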

Power analysis attacks are of two types:

1. Simple Power Analysis (SPA) attack, and
2. Differential Power Analysis (DPA) attack.

SPA attacks on smart cards typically take a few seconds per card, while DPA attacks
can take several hours. In general, from a somewhat academic perspective, we may consider
the entire internal state of the block cipher to be all the intermediate results and values that are
never included in the output in normal operation. For example, DES has 16 rounds; we can
consider the intermediate states state[1..15] after each round except the last as a secret internal
state. Side channels typically give information about these internal states, or about the
operations used in the transition of this internal state from one round to another. The type of
side channel will, of course, determine what information is available to the attacker about these
states. The attacks typically work by finding some information about the internal state of the
cipher that can be learned both by guessing part of the key and checking the value directly,
and additionally by some statistical property of the cipher that makes that checkable value
slightly nonrandom.

5.2.1 Simple Power Analysis attack (SPA)

Simple Power Analysis is a technique that involves direct interpretation of power
consumption measurements collected during cryptographic operations, generally by looking at
a visual representation of the power consumption of a unit while an encryption operation is
being performed. SPA can yield information about a device's operation as well as key material.

A trace refers to a set of power consumption measurements taken across a
cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a
trace containing 5000 points. The figure below shows an SPA trace from a smart card
performing a DES operation.

Figure: SPA monitoring from a single DES operation performed by a typical smart card. The
upper trace shows the entire encryption operation, including the initial permutation, the 16
DES rounds and the final permutation. The lower trace is a detailed view of the second and
third rounds.

Because SPA can reveal the sequence of instructions executed, it can be used to break
cryptographic implementations in which the execution path depends on the data being
processed. For example:

DES key schedule: The DES key schedule computation involves rotating 28-bit key registers.
A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can
be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will
contain different SPA features if the execution paths take different branches for each.

DES permutations: DES implementations perform a variety of bit permutations. Conditional
branching in software or microcode can cause significant power consumption differences for
"0" and "1" bits.

Comparisons: String or memory comparison operations typically perform a conditional
branch when a mismatch is found. This conditional branching causes large SPA (and
sometimes timing) characteristics.

Multipliers: Modular multiplication circuits tend to leak a great deal of information about the
data they process. The leakage functions depend on the multiplier design, but are often strongly
correlated to operand values and Hamming weights.

Exponentiators: A simple modular exponentiation function scans across the exponent,
performing a squaring operation in every iteration with an additional multiplication operation
for each exponent bit that is equal to "1". The exponent can be compromised if squaring and
multiplication operations have different power consumption characteristics, take different
amounts of time, or are separated by different code. Modular exponentiation functions that
operate on two or more exponent bits at a time may have more complex leakage functions.
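The exponentiator case can be made concrete with a small sketch. It is an illustration, not code from the project: the function names are hypothetical, and the string of 'S' and 'M' letters stands in for what an attacker would read off a power trace if squarings and multiplications were distinguishable.

```python
def square_and_multiply(base, exponent, modulus):
    """Left-to-right binary modular exponentiation.

    Returns (result, ops), where ops records 'S' for every squaring and
    'M' for every multiplication, in execution order.
    """
    result = 1
    ops = []
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus   # squaring in every iteration
        ops.append("S")
        if bit == "1":                         # extra multiply only for 1 bits
            result = (result * base) % modulus
            ops.append("M")
    return result, "".join(ops)


def recover_exponent(ops):
    """Rebuild the exponent from the observed operation sequence."""
    bits = []
    i = 0
    while i < len(ops):
        if i + 1 < len(ops) and ops[i + 1] == "M":
            bits.append("1")   # 'S' followed by 'M' means the bit was 1
            i += 2
        else:
            bits.append("0")   # a lone 'S' means the bit was 0
            i += 1
    return int("".join(bits), 2)


result, ops = square_and_multiply(7, 0b101101, 1009)
leaked = recover_exponent(ops)
```

Here the operation sequence alone is enough to rebuild the exponent, which is why the execution path should not depend on secret data.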

5.2.2 Differential Power Analysis attack (DPA)

In addition to large-scale power variations due to the instruction sequence, there are
effects correlated to the data values being manipulated. These variations tend to be smaller and
are sometimes overshadowed by measurement errors and other noise. In such cases, it is still
often possible to break the system using statistical functions tailored to the target algorithm.

To implement the DPA attack, an attacker first observes $m$ encryption operations and captures
power traces $T_{1..m}[1..k]$ containing $k$ samples each. In addition, the attacker records the
ciphertexts $C_{1..m}$. No knowledge of the plaintext is required. DPA analysis uses power
consumption measurements to determine whether a key block guess $K_s$ is correct. The attacker
computes a $k$-sample differential trace $\Delta_D[1..k]$ by finding the difference between the
average of the traces for which a certain intermediate value $V = D(C_i, b, K_s)$ is one and the
average of the traces for which $V$ is zero. Thus $\Delta_D[j]$ is the average over $C_{1..m}$ of the
effect due to the value represented by the selection function $D$ on the power consumption at
point $j$. In particular,

$$\Delta_D[j] = \frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}{\sum_{i=1}^{m} D(C_i, b, K_s)} - \frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)}$$

If $K_s$ is incorrect, the bit computed using $D$ will differ from the actual target bit for about half
of the ciphertexts $C_i$. The selection function is thus effectively uncorrelated to what was actually
computed by the target device. If a random function is used to divide a set into two subsets, the
difference in the averages of the subsets should approach zero as the subset sizes approach
infinity. Thus, if $K_s$ is incorrect, $\Delta_D[j]$ approaches zero, because trace components
uncorrelated to $D$ diminish with $1/\sqrt{m}$, causing the differential trace to become flat (the
actual trace may not be completely flat, as $D$ with $K_s$ incorrect may have a weak correlation to
$D$ with the correct $K_s$).

If $K_s$ is correct, however, the computed value for $D(C_i, b, K_s)$ will equal the actual value of
target bit $b$ with probability 1. The selection function is thus correlated to the value of the bit
considered. The contributions of other data values, measurement errors, etc. that are not
correlated to $D$ approach zero. Because power consumption is correlated to data bit values, the
plot of $\Delta_D$ will be flat, with spikes in regions where $D$ is correlated to the values being
processed. The correct value of $K_s$ can thus be identified from the spikes in its differential
trace. Four values of $b$ correspond to each S-box, providing confirmation of key block guesses.
Finding all eight $K_s$ yields the entire 48-bit round subkey. The remaining 8 key bits can be found
easily using exhaustive search or by analyzing one additional round. Triple DES keys can be
found by analyzing an outer DES operation first, using the resulting key to decrypt the
ciphertext, and attacking the next DES key. DPA can use known plaintext or known ciphertext
and can find encryption or decryption keys.
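The difference-of-means procedure above can be tried on simulated data. The sketch below is a toy model under loudly stated assumptions, not the DES attack itself: the S-box is a randomly shuffled 8-bit table standing in for a real cipher S-box, the simulated device leaks only the target bit plus Gaussian noise at one sample point, and the known inputs play the role of the recorded ciphertexts. All names are hypothetical.

```python
import random


def simulate_traces(n_traces, key, sbox, rng, n_samples=5, leak_at=2, sigma=0.5):
    """Toy device: leaks bit 0 of sbox[x ^ key] at one sample point, plus noise."""
    inputs, traces = [], []
    for _ in range(n_traces):
        x = rng.randrange(256)
        trace = [rng.gauss(0.0, sigma) for _ in range(n_samples)]
        trace[leak_at] += sbox[x ^ key] & 1
        inputs.append(x)
        traces.append(trace)
    return inputs, traces


def differential_trace(inputs, traces, sbox, guess):
    """Mean of traces with predicted bit 1 minus mean of traces with predicted bit 0."""
    n_samples = len(traces[0])
    sums = [[0.0] * n_samples, [0.0] * n_samples]
    counts = [0, 0]
    for x, trace in zip(inputs, traces):
        d = sbox[x ^ guess] & 1          # selection function D for this key guess
        counts[d] += 1
        for j, value in enumerate(trace):
            sums[d][j] += value
    if 0 in counts:                      # degenerate partition, no information
        return [0.0] * n_samples
    return [sums[1][j] / counts[1] - sums[0][j] / counts[0]
            for j in range(n_samples)]


def recover_key(inputs, traces, sbox):
    """Return the guess whose differential trace has the largest spike."""
    peaks = [max(abs(v) for v in differential_trace(inputs, traces, sbox, g))
             for g in range(256)]
    best = max(range(256), key=lambda g: peaks[g])
    return best, peaks


rng = random.Random(2004)
SBOX = list(range(256))
rng.shuffle(SBOX)                        # stand-in S-box, not a real cipher table
SECRET_KEY = 0x3C
inputs, traces = simulate_traces(1000, SECRET_KEY, SBOX, rng)
best_guess, peaks = recover_key(inputs, traces, SBOX)
```

For the correct guess the partition matches what the device computed, so the differential trace shows a spike of roughly the leakage amplitude at the leaking sample; for wrong guesses the two subsets are nearly random and the trace stays close to flat.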

CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES

The power consumption of traditional standard cells and logic is dependent on the
signal activity. When the output of the logic gate makes a 0 to 1 transition, a current comes
from the power supply and charges the output capacitance. On the other hand, when the output
sees a 1 to 0, a 0 to 0 or a 1 to 1 transition, no energy or only a limited amount of energy (due
to short circuit or leakage) is consumed from the power supply. This is the fundamental reason
why information is leaked through the power supply and why power attacks are possible. The
basis of a secure digital design flow is a logic style with constant power consumption.

6.1 Current Mode Logic

Current mode logic (CML), e.g. current steering logic, seems the ideal solution. This
type of logic continuously draws a current from the supply and measures its state through the
path that the current takes. A gate has constant power consumption if it draws a perfectly
constant current from the power supply independently of the input and output signals. To build
a current source capable of generating a constant current, special circuit techniques that
minimize channel length modulation have to be used.

The decisive drawback of CML, however, is its static power consumption. Even when
the logic gate is not processing any data, it burns the current, which makes this logic style
unacceptable for embedded battery-operated devices.

6.2 Voltage Mode Logic (CMOS circuit styles)

Voltage mode logic (VML), e.g. static CMOS logic, only draws a current from the
supply to change state, and measures its state by the amount of charge it stores on a
capacitance. A regular standard CMOS circuit will only consume power when a capacitance
gets charged and later discharged, i.e. when a gate switches state. This is the main reason that
CMOS is the style of choice for every battery-operated or low-power device, and it is illustrated
in the figure below for a simple inverter. Thus static CMOS is the preferred logic style because
of its low power consumption and high noise margins.

Figure: Standard CMOS inverter

Yet two conditions must be satisfied for VML to have constant power consumption,
namely:

1) A logic gate must have exactly one switching event per signal transition.
2) The logic gate must charge a constant capacitance in that switching event.

As the figure above shows, all four transitions of the CMOS inverter can be distinguished
when monitoring the power supply.

6.3 Dynamic Differential Logic

Dynamic differential logic, sometimes also referred to as dual rail with pre-charge
logic, fulfills the first condition. A differential logic family uses the true and the false
representation of the input and output signals, and a dynamic logic family alternates pre-charge
and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1,
exactly one of the two output nodes evaluates to 0 in the evaluation phase, so that the output
signal is differential. The discharged output node is charged to 1 in the following pre-charge
phase, so that both outputs are again pre-charged to 1. In other words, every signal transition,
including the events in which the input signals remain constant, is represented by an actual
switching event in which the logic gate charges a capacitance. The logic families that have been
introduced to thwart differential power analysis (DPA) using dynamic differential logic are the
following:

1. Sense Amplifier Based Logic (SABL), and
2. Wave Dynamic Differential Logic (WDDL) gates.
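The contrast between single-rail static CMOS and pre-charged dual-rail logic can be sketched with a simple energy model. This is an idealized illustration, not a circuit simulation: capacitance and supply voltage are normalized (hypothetical values), both rails are assumed to carry the same load, and short-circuit and leakage currents are ignored.

```python
C_LOAD = 1.0   # normalized load capacitance (hypothetical)
VDD = 1.0      # normalized supply voltage (hypothetical)
E_CHARGE = C_LOAD * VDD ** 2   # energy drawn from the supply to charge C_LOAD to VDD


def static_cmos_energy(outputs):
    """Supply energy per transition for a single-rail static CMOS output sequence."""
    # Only a 0 -> 1 output transition draws charge from the supply.
    return [E_CHARGE if (prev, cur) == (0, 1) else 0.0
            for prev, cur in zip(outputs, outputs[1:])]


def dual_rail_precharge_energy(outputs):
    """Supply energy per cycle for a dual-rail output pre-charged to 1.

    In each evaluation phase exactly one rail (true or false) discharges to 0,
    and the following pre-charge phase recharges that single rail.
    """
    energies = []
    for bit in outputs:
        true_rail, false_rail = bit, 1 - bit
        discharged_rails = [true_rail, false_rail].count(0)
        energies.append(discharged_rails * E_CHARGE)
    return energies


data = [0, 1, 1, 0, 0, 1]
single_rail = static_cmos_energy(data)
dual_rail = dual_rail_precharge_energy(data)
```

In this model the single-rail energy sequence depends on the data, while the dual-rail sequence is the same constant in every cycle, which is the first of the two conditions stated in section 6.2.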

6.3.1 Sense Amplifier Based Logic (SABL)

The main advantage of SABL is that it has balanced input and output nodes and that all
internal nodes connect to an output. The output capacitances can be balanced. Systematic
methods have been developed to make sure that both branches of the differential pull-down
network are balanced and that no memory effects are present in the network. Sense Amplifier
Based Logic is illustrated in the figure below.

Figure: Sense Amplifier Based Logic AND/NAND gate

This circuit style does, however, require full custom characterization and layout. It also
suffers from a high clock load, common to all dynamic logic gates.

6.3.2 Wave Dynamic Differential Logic Gates (WDDL)

WDDL logic can be implemented with static CMOS logic. Static CMOS standard cells
are combined to form secure compound standard cells, which have a reduced power signature.
WDDL has many advantages. It can be readily implemented from an existing standard cell
library. The design flow is fully supported with accurate EDA library files that come directly
from the vendor. WDDL also results in a dynamic differential logic with only a small load
capacitance on the pre-charge control signal and with the low power consumption and the high
noise margins of static CMOS.

Advantages of the WDDL logic style are as follows:

1. A major advantage of the proposed logic style is that it can be incorporated into the common
Electronic Design Automation (EDA) tool flow.
2. No special design rules are involved in the interconnection of WDDL gates.
3. The switching factor of WDDL is 100%.

A WDDL gate consists of a parallel combination of two positive complementary gates,
one calculating the true output using the true inputs, the other the false output using the false
inputs. A positive gate produces a zero output for an all-zero input. The AND gate and the OR
gate are examples of positive gates. A complementary gate, sometimes also referred to as a dual
gate, expresses the false output of the original logic gate using the false inputs of the original
gate. The AND gate fed with true input signals and the OR gate fed with false input signals are
two dual gates. The figure shows the WDDL AND gate and the WDDL OR gate.

In the evaluation phase, each input signal is differential and the WDDL gate calculates
its differential output. In the pre-charge phase, the inputs to the WDDL gate are set at 0. This
puts the output of the gate at 0. A module in WDDL pre-charges without distributing the
pre-charge signal to each individual gate. During the pre-charge phase, the input vector of the
combinatorial logic is set at all 0s. Each individual gate will eventually have all its inputs at 0,
evaluate its output to 0 and pass this 0 value to the next gate. One could say that the pre-charge
signal travels over the combinatorial logic as a 0-wave, hence WDDL. There are several ways
to launch the pre-charge wave. In the pre-charge wave generation figure, a pre-charge operator
is inserted at the start of every combinatorial logic tree, i.e. at the inputs of the encryption
module and the outputs of the registers. These operators produce an all-zero output in the
pre-charge phase (clk signal high), but let the differential signal through during the evaluation
phase (clk signal low).
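The gate construction and the 0-wave can be checked with a small behavioral model. This is a sketch in Python rather than the project's VHDL; signals are modeled as (true rail, false rail) pairs, and the example tree y = (a AND b) OR c is a hypothetical circuit chosen only for illustration.

```python
from itertools import product


def encode(bit):
    """Differential encoding: (true rail, false rail)."""
    return (bit, 1 - bit)


def precharge_operator(signal, clk):
    """Force both rails to 0 while clk is high; pass the signal while clk is low."""
    return (0, 0) if clk else signal


def wddl_and(a, b):
    """AND on the true rails in parallel with its dual (OR) on the false rails."""
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a, b):
    """OR on the true rails in parallel with its dual (AND) on the false rails."""
    return (a[0] | b[0], a[1] & b[1])


def circuit(a, b, c, clk):
    """Example tree y = (a AND b) OR c with pre-charge operators at its inputs.

    Returns the rail pair of every gate output, last entry being y.
    """
    a, b, c = (precharge_operator(encode(v), clk) for v in (a, b, c))
    n1 = wddl_and(a, b)
    y = wddl_or(n1, c)
    return [n1, y]


vectors = list(product((0, 1), repeat=3))
# Evaluation phase (clk low): the output is the differential encoding of the function.
evaluation_ok = all(circuit(a, b, c, clk=0)[-1] == encode((a & b) | c)
                    for a, b, c in vectors)
# Pre-charge phase (clk high): the 0-wave reaches every gate output.
precharge_ok = all(node == (0, 0)
                   for a, b, c in vectors
                   for node in circuit(a, b, c, clk=1))
# Number of rails that rise from 0 to 1 per evaluation phase, for each input vector.
rising_rails = {sum(t + f for t, f in circuit(a, b, c, clk=0))
                for a, b, c in vectors}
```

Only the inputs see the clock; the internal gates pre-charge because AND and OR are positive gates. In each evaluation exactly one rail per gate rises, so the number of switching events is the same for every input vector.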

Figure: WDDL Pre-charge wave generation

CHAPTER 7 WDDL GATES

The methodology used in the project is the bottom-up approach. Lower modules are
designed and later integrated to form larger modules, whose further integration leads to the
final top module. Since logic gates form the lowest-level modules, the logic gates required for
the design are first implemented in WDDL style. WDDL demands a parallel combination of
two positive complementary gates, one calculating the true value and the other the false value.
Logic gates such as OR, AND and XOR have been implemented. In addition, a full adder, a
32-bit XOR, etc. have been implemented.

7.1 WDDL OR gate

A WDDL OR gate is constructed by placing a conventional OR gate in parallel with its
complementary gate, i.e. an AND gate, as shown in the following figure.

Figure 4.1 WDDL OR Gate

The NAND and NOT gates used act as the pre-charge operator, injecting the signal '0' into
the gates when the 'clk' signal is high, i.e. during the pre-charge phase.

7.2 WDDL AND gate

A WDDL AND gate is constructed by placing a conventional AND gate in parallel with its
complementary gate, i.e. an OR gate, as shown in the following figure.

Figure 4.2 WDDL AND Gate

7.3 WDDL NAND gate

A WDDL NAND gate is constructed by placing a conventional AND gate in parallel with its
complementary gate, i.e. an OR gate, as shown in the following figure.

Figure 4.3 WDDL NAND Gate

7.4 WDDL NOR gate

A WDDL NOR gate is constructed by placing a conventional OR gate in parallel with its
complementary gate, i.e. an AND gate, as shown in the following figure.

Figure 4.4 WDDL NOR Gate

7.5 WDDL XOR gate

The XOR function can be implemented by the Boolean equation A ⊕ B = AB' + A'B.
Therefore the XOR gate is implemented in terms of AND gates and OR gates, as shown in
Figure 4.5. It can also be implemented by instantiating a WDDL AND gate and a WDDL OR
gate, but the number of gates involved in the latter is greater than in the former. Therefore the
first method of implementation is followed rather than the second.
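As an illustration of the Boolean equation, the following sketch composes an XOR from WDDL AND and OR gates, i.e. the alternative construction mentioned above, not necessarily the circuit of Figure 4.5. It assumes that inversion in dual-rail logic is done by exchanging the true and false rails; the function names are hypothetical.

```python
from itertools import product


def encode(bit):
    """Differential encoding: (true rail, false rail)."""
    return (bit, 1 - bit)


def invert(a):
    """Dual-rail inversion (assumption): exchange the true and false rails."""
    return (a[1], a[0])


def wddl_and(a, b):
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a, b):
    return (a[0] | b[0], a[1] & b[1])


def wddl_xor(a, b):
    """A xor B = A.B' + A'.B built from two WDDL AND gates and one WDDL OR gate."""
    return wddl_or(wddl_and(a, invert(b)), wddl_and(invert(a), b))


xor_ok = all(wddl_xor(encode(a), encode(b)) == encode(a ^ b)
             for a, b in product((0, 1), repeat=2))
precharged = wddl_xor((0, 0), (0, 0))
```

Because the rail exchange needs no gate, the compound stays a positive function: an all-zero input still produces an all-zero output, so the pre-charge wave passes through the XOR as well.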

Figure 4.5 WDDL XOR gate

With the help of the above basic gates, a full adder circuit has been designed by
instantiating the WDDL gates designed above. During the implementation of the Blowfish
algorithm, a 32-bit XOR gate and a 32-bit adder circuit are required. They can be easily
implemented by instantiating the corresponding lower module 32 times.

CHAPTER 8 FRONT END RESULTS

WDDL OR GATE

Synthesis Report

=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name     : wddlor.ngr
Top Level Output File Name         : wddlor
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
IOs                                : 5

Cell Usage
BELS                               : 2
     LUT3                          : 2
IO Buffers                         : 5
     IBUF                          : 3
     OBUF                          : 2
=========================================================================
Device utilization summary
---------------------------
Selected Device                    : 3s250eft256-4
Number of Slices                   : 1 out of 2448   0%
Number of 4 input LUTs             : 2 out of 4896   0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 172    2%

Timing Summary
---------------
Speed Grade                        : -4
Maximum combinational path delay   : 6.236ns

Simulation Result

Synthesis Result

WDDL AND GATE

Synthesis Report

=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name     : wddlgates.ngr
Top Level Output File Name         : wddlgates
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
IOs                                : 5

Cell Usage
BELS                               : 2
     LUT3                          : 2
IO Buffers                         : 5
     IBUF                          : 3
     OBUF                          : 2
=========================================================================
Device utilization summary
---------------------------
Selected Device                    : 3s250etq144-4
Number of Slices                   : 1 out of 2448   0%
Number of 4 input LUTs             : 2 out of 4896   0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 108    4%

Timing Summary
---------------
Speed Grade                        : -4
Maximum combinational path delay   : 6.236ns

Simulation Result

Synthesis Result

WDDL NAND GATE

Synthesis Report

=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name     : wddlnand1.ngr
Top Level Output File Name         : wddlnand1
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
IOs                                : 5

Cell Usage
BELS                               : 2
     LUT3                          : 2
IO Buffers                         : 5
     IBUF                          : 3
     OBUF                          : 2
=========================================================================
Device utilization summary
---------------------------
Selected Device                    : 3s500efg320-4
Number of Slices                   : 1 out of 4656   0%
Number of 4 input LUTs             : 2 out of 9312   0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 232    2%

Timing Summary
---------------
Speed Grade                        : -4
Maximum combinational path delay   : 6.236ns

Simulation Result

Synthesis Result

WDDL XOR GATE

Synthesis Report

=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name     : wddlxorgate.ngr
Top Level Output File Name         : wddlxorgate
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
IOs                                : 5

Cell Usage
BELS                               : 2
     LUT3                          : 2
IO Buffers                         : 5
     IBUF                          : 3
     OBUF                          : 2
=========================================================================
Device utilization summary
---------------------------
Selected Device                    : 3s250eft256-4
Number of Slices                   : 1 out of 2448   0%
Number of 4 input LUTs             : 2 out of 4896   0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 172    2%

Timing Summary
---------------
Speed Grade                        : -4
Maximum combinational path delay   : 6.236ns

Simulation Result

Synthesis Result

CHAPTER 9 SUMMARY AND CONCLUSION

9.1 Summary

In order to secure ICs against side-channel attacks, especially Differential Power Analysis
(DPA), it is necessary to implement the design in a logic style that renders constant power
dissipation irrespective of the input combination. WDDL has proved advantageous compared
with the other styles and is therefore of great significance. In this work, an architecture for the
Blowfish algorithm is designed and implemented in WDDL style. In this implementation, the
bottom-up approach is used: the low-level entities are designed first, and they are later combined
to form the entire module. The key scheduling is online. The sub-keys generated for a particular
key can be used for the encryption of all the data to be encrypted with that key. The sub-keys
are applied in reverse order for the decryption data path. Initially, logic gates are implemented
in WDDL, and then higher modules are designed by instantiating the WDDL gates to form the
entire module, thus resulting in constant power dissipation irrespective of the input data
combination. The entire design works in two phases, namely the pre-charge phase and the
evaluation phase. In the pre-charge phase, all the signals of the design are zeroed, and during
the evaluation phase the functionality of the design is achieved. This sort of design has been
found simple and very effective in thwarting the side-channel attack known as Differential
Power Analysis (DPA).

9.2 Conclusion

The crypto processor has been designed for a key size of 448 bits and a plaintext of 64 bits.
The code for the implementation has been written in VHDL. Functional verification has been
done using the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler.
The back end of the design is done using SOC Encounter. According to the specifications, the
desired functionality has been achieved. In the output during the evaluation phase, the number
of transitions has been the same in every cycle, thus resulting in constant power dissipation.
During synthesis, it has been observed that a simple WDDL gate comprises many conventional
gates. Therefore the area of the design has grown nearly three-fold when compared to the
design implemented in conventional CMOS logic; this is the cost of the security incorporated
into the IC against Differential Power Analysis (DPA). Due to the constant power dissipation
at the output, an attacker cannot apply the DPA scheme to find the secret key that is being used
in the crypto-processor. Thus security against DPA is incorporated into the IC at hardware
level by implementing the design in WDDL style,

which is quite simple and effective.

CHAPTER 10 REFERENCES

10.1 Referred Technical Papers

[1] Kris Tiri, Member, IEEE, and Ingrid Verbauwhede, Senior Member, IEEE, "A Digital Design Flow for Secure Integrated Circuits", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 25, No. 7, July 2006.
[2] Prof. Jean-Jacques Quisquater, Math RiZK, "Side Channel Attacks", October 2002.
[3] Ross Anderson, Mike Bond, Jolyon Clulow and Sergei Skorobogatov, "Crypto processors - A Survey", IEEE Proceedings.
[4] Noohul Basheer Zain Ali, James M. Noras, "Optimal Data path Design for a Cryptographic Processor: The Blowfish Algorithm", Malaysian Journal of Computer Science, Vol. 14, No. 1, June 2001, pp. 16-27.
[5] Dan Rinehimer, Derek Wilson, "Summary of B. Schneier's Description of New Variable Length Key, 64-Bit Block Cipher (Blowfish)".
[6] Kris Tiri and Ingrid Verbauwhede, "A Dynamic and Differential CMOS Logic Style to Resist Power and Timing Attacks on Security IC's".
[7] Discretix Technologies Limited, "Introduction to Side Channel Attacks".
[8] Kris Tiri, Moonmoon Akmal and Ingrid Verbauwhede, "A Dynamic and Differential Logic with Signal Independent Power Consumption to withstand Differential Power Analysis on Smart Cards".

Reference Books

[1] William Stallings, "Cryptography and Network Security: Principles and Practices", Pearson Education, 2003.
[2] Samir Palnitkar, "Verilog HDL: A Guide to Digital Design and Synthesis", Prentice Hall.

Referred Web Pages

[1] http://www.schneier.com/blowfish.html
[2] http://www.nist.gov
[3] http://www.discretix.com/PDF/Introduction%20to%20Side%20Channel%20Attacks.pdf
[4] http://www.wipo.int/pctdb/en/wo.jsp?IA=WO2005081085&DISPLAY=CLAIMS


ACKNOWLEDGMENT

We express our sincere thanks to the management of VEDIC SCHOOL OF VLSI DESIGN
for giving us this opportunity to work in their organization.

We express our immense gratitude to Mr. M. R. K. Arjun, FPGA Design Engineer
(Simpli5ng Semiconductor Pvt. Ltd.). His inspiring remarks, stimulating guidance, valuable
suggestions and encouragement helped us greatly in the completion of our project
"IMPLEMENTATION OF WDDL GATES FOR SECURE IC APPLICATIONS".

We wish to thank the internal guide of our project, Mr. S. KHAJA MOHIDDIN,
Department of Electronics, for his constant inspiration and advice throughout our project work.

We express our sincere gratitude to respected Mr. JANGA REDDY, Principal of CMRIT,
and Mr. K. RAMANAIAH, HOD of ECE department, for their valuable guidance,
encouragement and suggestions.

INDEX

ABSTRACT

CHAPTER 1 INTRODUCTION AND OBJECTIVE
1.1 INTRODUCTION
1.2 OBJECTIVE

CHAPTER 2 REVIEW OF LITERATURE
2.1 INTRODUCTION TO DIGITAL DESIGN FLOW
2.2 SECURE DIGITAL DESIGN FLOW

CHAPTER 3 HARDWARE DESCRIPTION LANGUAGE (VHDL)

CHAPTER 4 SMART CARD OVERVIEW

CHAPTER 5 SIDE CHANNEL ATTACKS
5.1 CLASSIFICATION OF SIDE CHANNEL ATTACKS
5.2 POWER ANALYSIS ATTACKS
5.2.1 SIMPLE POWER ANALYSIS (SPA)
5.2.2 DIFFERENTIAL POWER ANALYSIS (DPA)

CHAPTER 6 CONSTANT-POWER CONSUMING LOGIC STYLES
6.1 CURRENT MODE LOGIC
6.2 VOLTAGE MODE LOGIC (CMOS CIRCUIT STYLES)
6.3 DYNAMIC DIFFERENTIAL LOGIC
6.3.1 SENSE AMPLIFIER BASED LOGIC (SABL)
6.3.2 WAVE DYNAMIC DIFFERENTIAL LOGIC GATES (WDDL)

CHAPTER 7 DESIGN OF WDDL GATES
7.1 WDDL GATES
7.1.1 WDDL OR GATE
7.1.2 WDDL AND GATE
7.1.3 WDDL NAND GATE
7.1.4 WDDL NOR GATE
7.1.5 WDDL XOR GATE

CHAPTER 8 FRONT END RESULTS

CHAPTER 9 SUMMARY AND CONCLUSION
9.1 SUMMARY
9.2 CONCLUSION

CHAPTER 10 REFERENCES

ABSTRACT

Every electronic device needs security, from the smallest RFID tags to larger handheld devices. Security is needed for financial, medical, consumer, automotive, and other applications. Small embedded integrated circuits (ICs) such as smart cards are vulnerable to so-called side-channel attacks (SCAs). Side-channel attacks are a class of attacks that derive information from an integrated circuit while it is in operation. The attacker can gain information by monitoring the power consumption, execution time, electromagnetic radiation, and other information leaked by the switching behavior of digital complementary metal-oxide-semiconductor (CMOS) gates. For example, execution times that depend on the values of the data and/or the key reveal what the circuit is doing. Simple timing or power attacks give visual information about the circuit. This project presents a digital very large scale integrated (VLSI) design flow to create secure, power-analysis-attack-resistant ICs. The root cause of the problem is that standard CMOS is power efficient: it consumes dynamic power only when nodes are switching, so its power consumption depends on the data being processed.

The idea is to create digital circuit styles whose switching behavior is independent of the data, and of the sequence of the data, that they are processing. A logic style called "Wave Dynamic Differential Logic (WDDL)" is used to implement the basic logic gates used in cryptographic processors. The design flow starts from a normal design in a hardware description language such as VHDL and ends with a side-channel-attack (SCA) resistant layout.


Figure WDDL Pre-charge wave generation


CHAPTER 1 INTRODUCTION AND OBJECTIVE

1.1 Introduction

Small embedded integrated circuits (ICs) such as smart cards are vulnerable to so-called side-channel attacks (SCAs). The attacker can gain information by monitoring the power consumption, execution time, electromagnetic radiation, and other information leaked by the switching behavior of digital complementary metal-oxide-semiconductor (CMOS) gates. This project presents a digital very large scale integrated (VLSI) design flow to create secure, power-analysis-attack-resistant ICs.

The idea is to create digital circuit styles whose switching behavior is independent of the data, and of the sequence of the data, that they are processing. A logic style called Wave Dynamic Differential Logic (WDDL) is used to implement the basic logic gates used in cryptographic processors. The design flow starts from a normal design in a hardware description language such as VHDL and ends with a side-channel-attack (SCA) resistant layout.

Depending on the parameter considered, side-channel attacks are classified as probing attacks, fault induction attacks, timing attacks, power analysis attacks, electromagnetic analysis attacks, etc. One side-channel attack in particular, Differential Power Analysis (DPA), is of great concern. It is very effective in finding the secret key and can be mounted quickly with off-the-shelf devices. The attack is based on the fact that logic operations have power characteristics that depend on the input data. It relies on statistical analysis to extract, from the power consumption, the information that is correlated to the secret key. As the variations actually originate at the logic level, implementing the encryption and decryption modules in a logic style in which a logic gate has constant power consumption at all times, independently of signal transitions, removes the foundation of DPA and is an effective means of halting it.

1.2 Objective of the Project

The main objectives of this dissertation are:

Study of constant-power logic styles

Description of WDDL Gates

Implementation of WDDL Logic Gates

Verification of the functionality of WDDL Logic Gates

Synthesis of the design

Analysis of the reports obtained during simulation and synthesis

CHAPTER 2 REVIEW OF LITERATURE

2.1 Introduction to Digital Design Flow

A typical digital design flow for any IC is as follows: design entry (specification, architecture, RTL coding, and RTL verification); synthesis and post-synthesis verification; back end (floor planning, place and route, layout); and tape-out to the foundry to get the end product. All modern digital designs start with a designer writing a hardware description of the IC in a hardware description language (HDL) such as Verilog or VHDL. A Verilog or VHDL program essentially describes the hardware (logic gates, flip-flops, counters, etc.), the interconnection of the circuit blocks, and the functionality. Various CAD tools are available to synthesize a circuit from the HDL description.

2.2 Secure Digital Design Flow

The secure digital design flow is depicted in Fig. 2.1. In addition to the regular steps in an IC design (logic design, logic synthesis, place & route, stream out, and verifications), one can recognize two additional steps, namely 1) "cell substitution" and 2) "interconnect decomposition". These operations have been inserted in the back end of the flow and do not interfere with the creative part of a design, indicated by the "logic design" task.

Figure 2.1 Secure Digital Design Flow

During the cell substitution step, cells designed in a constant-power logic style replace the conventional CMOS gates. This is what protects the ICs against power analysis attacks.

CHAPTER 3 HARDWARE DESCRIPTION LANGUAGE (VHDL)

Why (V)HDL?

Interoperability

Technology independence

Design reuse

Several levels of abstraction

Readability

Standard language

Widely supported

What is VHDL?

VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed Integrated Circuit)

Design specification language

Design entry language

Design simulation language

Design documentation language

An alternative to schematics

Brief History

VHDL was developed in the early 1980s for managing design problems that involved large circuits and multiple teams of engineers

Funded by the US Department of Defense

The first publicly available version was released in 1985

In 1986 the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was presented with a proposal to standardize VHDL

In 1987 standardization => IEEE 1076-1987

An improved version of the language was released in 1994 => IEEE Standard 1076-1993

Related Standards

IEEE 1076 doesn't support simulation conditions such as unknown and high-impedance

Soon after IEEE 1076-1987 was released, simulator companies began using their own non-standard types => VHDL was becoming non-standard

The IEEE 1164 standard was developed by the IEEE to address this

IEEE 1164 contains definitions for a nine-valued data type, std_logic

IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate to actual hardware

Defines e.g. two numeric types: signed and unsigned

VHDL Environment


Design Units

Segments of VHDL code can be compiled separately and stored in a library

Entities

A black box with interface definition

Defines the inputs/outputs of a component (defines the pins)

A way to represent modularity in VHDL

Similar to symbol in schematic

Entity declaration describes entity

E.g.

library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;

Ports

Provide channels of communication between the component and its environment

Each port must have a name direction and a type

An entity may have NO port declaration

Port directions

In: the value of the port can be read inside the component but cannot be assigned. Multiple reads of the port are allowed

Out: assignments can be made to the port, but data cannot be read from it. Multiple assignments are allowed

Inout: bi-directional; assignments can be made and data can be read. Multiple assignments are allowed

Buffer: an out port with read capability. May have at most one assignment (buffer ports are not recommended)

Architectures

Every entity has at least one architecture

One entity can have several architectures

Architectures can describe a design using

- Behavior
- Structure
- Dataflow

Architectures can describe a design on many levels

- Gate level
- RTL (Register Transfer Level)
- Behavioral level

Configuration declaration links architecture to entity

E.g.

architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;

Configurations

Links entity declaration and architecture body together

Concept of default configuration is a bit messy in VHDL '87

- Last architecture analyzed links to entity

Can be used to change simulation behavior without re-analyzing the VHDL source

Complex configuration declarations are ignored in synthesis

Some entities can have e.g. a gate-level architecture and a behavioral architecture

Are always optional
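As a minimal sketch of how a configuration selects between architectures, the example below gives the Comparator entity used in the earlier examples two architectures and binds one of them in a configuration declaration. The names Comparator2 and Comparator_cfg are hypothetical and chosen only for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;

-- Dataflow architecture
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;

-- Behavioral architecture (hypothetical alternative with the same function)
architecture Comparator2 of Comparator is
begin
  process (A, B)
  begin
    if A = B then
      EQ <= '1';
    else
      EQ <= '0';
    end if;
  end process;
end Comparator2;

-- Configuration declaration: binds the entity to one chosen architecture
configuration Comparator_cfg of Comparator is
  for Comparator2
  end for;
end Comparator_cfg;
```

Changing the architecture name in the configuration switches the simulated model without touching the entity or either architecture body.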

Packages

Packages contain information common to many design units

1. Package declaration

- Constant declarations
- Type and subtype declarations
- Function and procedure declarations
- Global signal declarations
- File declarations
- Component declarations

2. Package body

- Is not necessarily needed
- Function bodies
- Procedure bodies

Packages are meant for encapsulating data that can be shared globally among several design units. A package consists of a declaration part and an optional body part.

A package declaration can contain

- Type and subtype declarations
- Subprograms
- Constants
- Alias declarations
- Global signal declarations
- File declarations
- Component declarations

A package body consists of

- Subprogram declarations and bodies
- Type and subtype declarations
- Deferred constants
- File declarations

Libraries

Collection of VHDL design units (database)

1. Packages

- package declaration
- package body

2. Entities (entity declaration)

3. Architectures (architecture body)

4. Configurations (configuration declarations)

Usually a directory in the UNIX file system

Can also be any other kind of database

Levels of Abstraction

VHDL supports many possible styles of design description, which differ primarily in how closely they relate to the hardware

It is possible to describe a circuit in a number of ways, listed here in order of increasing level of abstraction

Structural

Dataflow

Behavioral

Structural VHDL description

Circuit is described in terms of its components

From a low-level description (e.g. transistor-level description) to a high-level description (e.g. block diagram)

For large circuits, low-level descriptions quickly become impractical
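To make the structural style concrete, here is a minimal sketch of a half adder described purely as an interconnection of two gate components. The entities and2, xor2, and half_adder are hypothetical names introduced for this example; the direct entity instantiation used here assumes VHDL-93 or later.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity and2 is
  port (a, b : in std_logic; y : out std_logic);
end and2;

architecture dataflow of and2 is
begin
  y <= a and b;
end dataflow;

library ieee;
use ieee.std_logic_1164.all;

entity xor2 is
  port (a, b : in std_logic; y : out std_logic);
end xor2;

architecture dataflow of xor2 is
begin
  y <= a xor b;
end dataflow;

library ieee;
use ieee.std_logic_1164.all;

entity half_adder is
  port (a, b : in std_logic; sum, carry : out std_logic);
end half_adder;

-- Structural architecture: contains only component instances and wiring
architecture structural of half_adder is
begin
  u_sum   : entity work.xor2 port map (a => a, b => b, y => sum);
  u_carry : entity work.and2 port map (a => a, b => b, y => carry);
end structural;
```

The half_adder architecture says nothing about how the sum is computed; it only names the components and connects their ports, which is what distinguishes it from the dataflow and behavioral styles.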

Dataflow VHDL Description

Circuit is described in terms of how data moves through the system

In the dataflow style you describe how information flows between registers in the system

The combinational logic is described at a relatively high level; the placement and operation of registers is specified quite precisely

The behavior of the system over time is defined by registers

There are no built-in registers in the VHDL language

- Either a lower-level description
- or a behavioral description of sequential elements is needed

The lower-level register descriptions must be created or obtained

If there are no third-party models for registers => you must write the behavioral description of the registers

The behavioral description can be provided in the form of subprograms (functions or procedures)
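Since VHDL has no built-in register, a behavioral description such as the following is commonly written. This is a minimal sketch of a D flip-flop; the entity name dff and the synchronous, active-high reset are assumptions made for the example, not details taken from the project.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity dff is
  port (clk   : in  std_logic;
        reset : in  std_logic;  -- synchronous, active high (assumption)
        d     : in  std_logic;
        q     : out std_logic);
end dff;

architecture behavioral of dff is
begin
  -- The process wakes up on every clk event and updates q on the rising edge
  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        q <= '0';
      else
        q <= d;
      end if;
    end if;
  end process;
end behavioral;
```

Synthesis tools generally recognize this clocked-process template and map it to a flip-flop of the target library.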

Behavioral VHDL Description

Circuit is described in terms of its operation over time

Representation might include e.g. state diagrams, timing diagrams, and algorithmic descriptions

The concept of time may be expressed precisely using delays (e.g. A <= B after 10 ns)

If no actual delays are used, the order of sequential operations is defined

In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed timing specifications

The actual timing results depend on the implementation technology and the efficiency of the synthesis tool

There are a few tools for behavioral synthesis

Concurrent Vs Sequential

Processes

Basic simulation concept in VHDL

A VHDL description can always be broken up into interconnected processes

Quite similar to a UNIX process

Process keyword in VHDL

A process statement is a concurrent statement

Statements inside a process statement are sequential statements

A process must contain either a sensitivity list or wait statement(s), but NOT both

The sensitivity list or wait statement(s) contain the signals that wake the process up

General Format

process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
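The two alternatives in the general format can be illustrated with a small sketch: a 2-to-1 multiplexer written once with a sensitivity list and once with an explicit wait statement. The entity mux2 and the architecture names are hypothetical and serve only as an example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity mux2 is
  port (a, b, sel : in std_logic; y : out std_logic);
end mux2;

-- Variant 1: the sensitivity list names the signals that wake the process
architecture with_sensitivity_list of mux2 is
begin
  process (a, b, sel)
  begin
    if sel = '1' then
      y <= b;
    else
      y <= a;
    end if;
  end process;
end with_sensitivity_list;

-- Variant 2: no sensitivity list; a wait statement at the end of the
-- process suspends it until one of the listed signals changes
architecture with_wait of mux2 is
begin
  process
  begin
    if sel = '1' then
      y <= b;
    else
      y <= a;
    end if;
    wait on a, b, sel;
  end process;
end with_wait;
```

Both architectures behave identically in simulation: a process with a sensitivity list is equivalent to the same process ending in a wait on those signals.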

CHAPTER 4 SMART CARD OVERVIEW

This section very briefly introduces the concept of a smart card. Basically, a smart card is a computer embedded in a safe. It consists of a (typically 8-bit or 32-bit) processor together with ROM, EEPROM, and a small amount of RAM, and is therefore capable of performing computations. The main goal of a smart card is to allow the execution of cryptographic operations involving some secret parameter (the key) without revealing this parameter to the outside world. The goal of the attacker, by contrast, is to recover this secret parameter. The processor is embedded in a chip and connected to the outside world through eight wires, whose role, use, and position are standardized. In addition to the input/output wires, the parts we are most interested in are the following:

1. Power supply: smart cards do not have an internal battery. The current they need is provided by the smart card reader. This makes the smart card's power consumption fairly easy for the attacker to measure.

2. Clock: similarly, smart cards do not have an internal clock either. The clock ticks must also be provided from the outside world. As a consequence, the attacker can measure the card's running time with very good precision.

Smart cards are usually equipped with protection mechanisms consisting of a shield (the passivation layer), whose goal is to hide the internal behavior of the chip, and possibly sensors that react when the shield is removed by destroying all sensitive data and preventing the card from functioning properly.

CHAPTER 5 SIDE CHANNEL ATTACKS

"Side channel attacks" are attacks that are based on "side channel information". Side channel information is information that can be retrieved from the encryption device and that is neither the plaintext to be encrypted nor the ciphertext resulting from the encryption process.

In the past, an encryption device was perceived as a unit that receives plaintext input and produces ciphertext output, and vice versa. Attacks were therefore based on either knowing the ciphertext (such as ciphertext-only attacks), or knowing both the plaintext and the ciphertext (such as known-plaintext attacks), or on the ability to define what plaintext is to be encrypted and then seeing the results of the encryption (known as chosen-plaintext attacks). Today, it is known that encryption devices have additional outputs, and often additional inputs, which are not the plaintext or the ciphertext.

Encryption devices produce timing information (information about the time that operations take) that is easily measurable, radiation of various sorts, power consumption statistics (that can be easily measured as well), and more. Often the encryption device also has additional "unintentional" inputs, such as voltage, that can be modified to cause predictable outcomes. Side channel attacks make use of some or all of this information, along with other (known) cryptanalytic techniques, to recover the key the device is using.

Side channel analysis techniques are of concern because the attacks can be mounted quickly and can sometimes be implemented using readily available hardware costing from only a few hundred dollars to thousands of dollars.

5.1 Classification of side channel attacks

The literature usually classifies side channel attacks along two orthogonal axes:

1. Invasive vs. non-invasive

Invasive attacks require de-packaging the chip to get direct access to its components. A typical example is connecting a wire to a data bus to observe the data transfers.

A non-invasive attack exploits only externally available information (the emission of which is, however, often unintentional), such as running time or power consumption.

A further distinction has been drawn for semi-invasive attacks. These attacks require de-packaging of the chip to get access to the chip surface but do not tamper with the passivation layer (they do not require electrical contact with the metal surface).

2. Active vs. passive

Active attacks try to tamper with the card's proper functioning. For example, fault induction attacks try to induce errors in the computation.

Passive attacks, by contrast, simply observe the card's behavior during its processing, without disturbing it.

Note that these two axes are indeed orthogonal: an invasive attack may completely avoid disturbing the card's behavior, and a passive attack may require a preliminary de-packaging for the required information to be observable.

These attacks are, of course, not mutually exclusive: an invasive attack may, for example, serve as a preliminary step for a non-invasive one by giving a detailed description of the chip's architecture that helps to find out where to put external probes.

Smart cards are usually equipped with protection mechanisms that are supposed to react to invasive attacks (although several invasive attacks are nonetheless capable of defeating these mechanisms). On the other hand, it is worth pointing out that a non-invasive attack is completely undetectable: there is, for example, no way for a smart card to figure out that its running time is currently being measured. Other countermeasures are therefore necessary. From an economic point of view, invasive attacks are usually more expensive to deploy on a large scale, since they require individual processing of each attacked device. In this sense, non-invasive attacks constitute a bigger menace for the smart card industry.

Invasive attacks involve a relatively high capital investment in lab equipment plus a moderate investment of effort for each individual chip attacked. Non-invasive attacks require only a moderate capital investment plus a moderate investment of effort in designing an attack on a particular type of device; thereafter, the cost per device attacked is low. Semi-invasive attacks can be carried out using very cheap and simple equipment.

The attacker can gain information through:

1. Probing attacks

2. Fault induction attacks

3. Timing attacks

4. Power analysis attacks

5. Electromagnetic analysis attacks

These attacks exploit information leaked during the switching of digital complementary metal-oxide-semiconductor (CMOS) gates. Of all these, the power analysis attack is of major concern.

5.2 Power analysis attacks

The power consumption of a cryptographic device may provide much information about the operations that take place and the parameters involved. This is the idea behind simple and differential power analysis, first introduced by Kocher et al. Like the clock ticks, the card's energy is provided by the terminal and can therefore easily be measured. Basically, to measure a circuit's power consumption, a small (e.g. 50 ohm) resistor is inserted in series with the power or ground input. The voltage difference across the resistor divided by the resistance yields the current. Well-equipped electronics labs have equipment that can digitally sample voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than 1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC can be bought for less than US$400.

Power analysis attacks are of two types:

1. Simple Power Analysis (SPA) attack

2. Differential Power Analysis (DPA) attack

SPA attacks on smart cards typically take a few seconds per card, while DPA attacks can take several hours. In general, from a somewhat academic perspective, we may consider the entire internal state of the block cipher to be all the intermediate results and values that are never included in the output in normal operation. For example, DES has 16 rounds; we can consider the intermediate states state[1..15] after each round except the last as a secret internal state. Side channels typically give information about these internal states, or about the operations used in the transition of this internal state from one round to another. The type of side channel will, of course, determine what information about these states is available to the attacker. The attacks typically work by finding some information about the internal state of the cipher that can be learned both by guessing part of the key and checking the value directly, and additionally by some statistical property of the cipher that makes that checkable value slightly nonrandom.

5.2.1 Simple Power Analysis attack (SPA)

Simple Power Analysis is a technique that involves direct interpretation of power consumption measurements collected during cryptographic operations, generally by looking at a visual representation of the power consumption of a unit while an encryption operation is being performed. SPA can yield information about a device's operation as well as key material.

A trace refers to a set of power consumption measurements taken across a cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a trace containing 5000 points. The figure, for example, shows an SPA trace from a smart card performing a DES operation.

Figure: SPA monitoring from a single DES operation performed by a typical smart card. The upper trace shows the entire encryption operation, including the initial permutation, the 16 DES rounds, and the final permutation. The lower trace is a detailed view of the second and third rounds.

Because SPA can reveal the sequence of instructions executed, it can be used to break cryptographic implementations in which the execution path depends on the data being processed. For example:

DES key schedule: The DES key schedule computation involves rotating 28-bit key registers. A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will contain different SPA features if the execution paths take different branches for each.

DES permutations: DES implementations perform a variety of bit permutations. Conditional branching in software or microcode can cause significant power consumption differences for "0" and "1" bits.

Comparisons: String or memory comparison operations typically perform a conditional branch when a mismatch is found. This conditional branching causes large SPA (and sometimes timing) characteristics.

Multipliers: Modular multiplication circuits tend to leak a great deal of information about the data they process. The leakage functions depend on the multiplier design but are often strongly correlated to operand values and Hamming weights.

Exponentiators: A simple modular exponentiation function scans across the exponent, performing a squaring operation in every iteration with an additional multiplication operation for each exponent bit that is equal to "1". The exponent can be compromised if squaring and multiplication operations have different power consumption characteristics, take different amounts of time, or are separated by different code. Modular exponentiation functions that operate on two or more exponent bits at a time may have more complex leakage functions.

5.2.2 Differential Power Analysis attack (DPA)

In addition to large-scale power variations due to the instruction sequence, there are effects correlated to the data values being manipulated. These variations tend to be smaller and are sometimes overshadowed by measurement errors and other noise. In such cases, it is still often possible to break the system using statistical functions tailored to the target algorithm.

To implement the DPA attack, an attacker first observes m encryption operations and captures power traces T_1..m[1..k] containing k samples each. In addition, the attacker records the ciphertexts C_1..m. No knowledge of the plaintext is required. DPA analysis uses power consumption measurements to determine whether a key block guess K_s is correct. The attacker computes a k-sample differential trace Δ_D[1..k] by finding the difference between the average of the traces for which a certain intermediate value V is one and the average of the traces for which V is zero. Here V is the value predicted by a selection function D(C_i, b, K_s), which computes target bit b of the intermediate state from the ciphertext C_i under the key block guess K_s. Thus Δ_D[j] is the average over C_1..m of the effect due to the value represented by the selection function D on the power consumption at point j.
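The difference of means described above can also be written as a formula. The following is a sketch in the notation of this section, following the usual formulation of DPA attributed to Kocher et al.; it assumes that the selection function D(C_i, b, K_s) takes the value 0 or 1 for each ciphertext C_i and that T_i[j] denotes sample j of trace i.

```latex
\Delta_D[j] =
  \frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}
       {\sum_{i=1}^{m} D(C_i, b, K_s)}
  \;-\;
  \frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}
       {\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)}
```

The first fraction is the average of the traces for which the predicted bit is one, and the second is the average of the traces for which it is zero.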

If K_s is incorrect, the bit computed using D will differ from the actual target bit for about half of the ciphertexts C_i. The selection function is thus effectively uncorrelated to what was actually computed by the target device. If a random function is used to divide a set into two subsets, the difference in the averages of the subsets should approach zero as the subset sizes approach infinity. Thus, trace components uncorrelated to D diminish with 1/√m, causing the differential trace to become flat (the actual trace may not be completely flat, as D with K_s incorrect may have a weak correlation to D with the correct K_s).

If K_s is correct, however, the computed value for D(C_i, b, K_s) will equal the actual value of target bit b with probability 1. The selection function is thus correlated to the value of the bit considered. Other data values, measurement errors, etc. that are not correlated to D approach zero. Because power consumption is correlated to data bit values, the plot of Δ_D will be flat with spikes in regions where D is correlated to the values being processed. The correct value of K_s can thus be identified from the spikes in its differential trace. Four values of b correspond to each S-box, providing confirmation of key block guesses. Finding all eight K_s yields the entire 48-bit round subkey. The remaining 8 key bits can be found easily using exhaustive search or by analyzing one additional round. Triple DES keys can be found by analyzing an outer DES operation first, using the resulting key to decrypt the ciphertexts, and attacking the next DES key. DPA can use known plaintext or known ciphertext and can find encryption or decryption keys.

CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES

The power consumption of traditional standard cells and logic depends on the signal activity. When the output of a logic gate makes a 0-to-1 transition, current flows from the power supply and charges the output capacitance. On the other hand, when the output sees a 1-to-0, a 0-to-0, or a 1-to-1 transition, no energy or only a limited amount of energy (due to short-circuit or leakage currents) is drawn from the power supply. This is the fundamental reason why information is leaked through the power supply and why power attacks are possible. The basis of a secure digital design flow is therefore a logic style with constant power consumption.

6.1 Current Mode Logic

Current mode logic (CML), e.g. current steering logic, seems the ideal solution. This type of logic continuously draws a current from the supply and represents its state through the path that the current takes. A gate has constant power consumption if it draws a perfectly constant current from the power supply, independently of the input and output signals. To build a current source capable of generating a constant current, special circuit techniques that minimize channel length modulation have to be used.

The decisive drawback of CML, however, is its static power consumption. Even when the logic gate is not processing any data, it burns current, which makes this logic style unacceptable for embedded battery-operated devices.

6.2 Voltage Mode Logic (CMOS circuit styles)

Voltage mode logic (VML), e.g. static CMOS logic, draws a current from the supply only to change state and represents its state by the amount of charge stored on a capacitance. A regular standard CMOS circuit consumes power only when a capacitance gets charged and later discharged, i.e. when a gate switches state. This is the main reason that CMOS is the style of choice for every battery-operated or low-power device. This is illustrated in the figure below for a simple inverter. Thus static CMOS is the preferred logic style because of its low power consumption and high noise margins.

Figure: Standard CMOS inverter

Yet two conditions must be satisfied for VML to have constant power consumption, namely:

1) A logic gate must have exactly one switching event per signal transition.

2) The logic gate must charge a constant capacitance in that switching event.

In the inverter figure above, all four transitions of the CMOS inverter can be distinguished when monitoring the power supply.
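A first-order energy model helps to show why these two conditions matter. The sketch below assumes a single output node with load capacitance $C_L$ that swings fully between ground and the supply voltage $V_{DD}$, and it neglects short-circuit and leakage currents; the symbols are introduced here for illustration and are not taken from the project's measurements.

```latex
E_{0 \rightarrow 1} = C_L V_{DD}^{2}, \qquad
E_{1 \rightarrow 0} \approx E_{0 \rightarrow 0} \approx E_{1 \rightarrow 1} \approx 0
```

Under this model the energy drawn from the supply in a clock cycle depends on the data. If a gate instead charges the same capacitance exactly once in every cycle, the supply delivers $C_L V_{DD}^{2}$ per cycle whatever the data, which is what the two conditions ask for.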

6.3 Dynamic Differential Logic

Dynamic differential logic, sometimes also referred to as dual-rail logic with pre-charge, fulfills the first condition. A differential logic family uses both the true and the false representation of the input and output signals, and a dynamic logic family alternates pre-charge and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1, exactly one of the two output nodes evaluates to 0 in the evaluation phase to produce a differential output signal. The discharged output node is charged back to 1 in the following pre-charge phase, so that both outputs are again at 1. In other words, every signal transition, including the events in which the input signals remain constant, is represented by an actual switching event in which the logic gate charges a capacitance. The logic families that have been introduced to thwart differential power analysis (DPA) using dynamic differential logic include:

1. Sense Amplifier Based Logic (SABL)

2. Wave Dynamic Differential Logic (WDDL) gates

6.3.1 Sense Amplifier Based Logic (SABL)

The main advantage of SABL is that it has balanced input and output nodes and that all internal nodes connect to an output. The output capacitances can be balanced. Systematic methods have been developed to make sure that both branches of the differential pull-down network are balanced and that no memory effects are present in the network. Sense amplifier based logic is illustrated in the figure below.

Figure: Sense Amplifier Based Logic AND/NAND gate

This circuit style does, however, require full-custom characterization and layout. It also suffers from a high clock load, common to all dynamic logic gates.

6.3.2 Wave Dynamic Differential Logic Gates (WDDL)

WDDL can be implemented with static CMOS logic. Static CMOS standard cells are combined to form secure compound standard cells, which have a reduced power signature. WDDL has many advantages. It can be readily implemented from an existing standard cell library. The design flow is fully supported with accurate EDA library files that come directly from the vendor. WDDL also results in a dynamic differential logic with only a small load capacitance on the pre-charge control signal and with the low power consumption and the high noise margins of static CMOS.

The advantages of the WDDL logic style are as follows:

A major advantage of the proposed logic style is that it can be incorporated into the common Electronic Design Automation (EDA) tool flow.

No special design rules are involved in the interconnection of WDDL gates.

The switching factor of WDDL is 100%.

A WDDL gate consists of a parallel combination of two positive complementary gates, one calculating the true output using the true inputs, the other the false output using the false inputs. A positive gate produces a zero output for an all-zero input. The AND gate and the OR gate are examples of positive gates. A complementary gate, sometimes also referred to as a dual gate, expresses the false output of the original logic gate using the false inputs of the original gate. The AND gate fed with true input signals and the OR gate fed with false input signals are two dual gates. The figure shows the WDDL AND gate and the WDDL OR gate. In the evaluation phase, each input signal is differential and the WDDL gate calculates its differential output. In the pre-charge phase, the inputs to the WDDL gate are set at 0. This puts the output of the gate at 0.

A module in WDDL pre-charges without distributing the pre-charge signal to each individual gate. During the pre-charge phase, the input vector of the combinatorial logic is set to all 0s. Each individual gate will eventually have all its inputs at 0, evaluate its output to 0, and pass this 0 value to the next gate. One could say that the pre-charge signal travels over the combinatorial logic as a 0-wave, hence the name Wave Dynamic Differential Logic. There are several ways to launch the pre-charge wave. In the pre-charge wave generation figure, a pre-charge operator is inserted at the start of every combinatorial logic tree, i.e. at the inputs of the encryption module and at the outputs of the registers. These operators produce an all-zero output in the pre-charge phase (clk signal high) but let the differential signal through during the evaluation phase (clk signal low).
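The gate construction and the pre-charge operator described above can be sketched in VHDL as follows. This is a minimal illustration under stated assumptions, not the project's actual source: the entity names (wddl_precharge, wddl_and2, wddl_or2) and the port naming with _t and _f suffixes for the true and false rails are hypothetical, and the pre-charge operator is modeled simply as an AND with the inverted clock.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Pre-charge operator: forces both rails to '0' while clk is high
-- (pre-charge phase) and passes the differential signal while clk is low
entity wddl_precharge is
  port (clk      : in  std_logic;
        d_t, d_f : in  std_logic;   -- differential input (true / false rail)
        q_t, q_f : out std_logic);  -- differential output
end wddl_precharge;

architecture dataflow of wddl_precharge is
begin
  q_t <= d_t and not clk;
  q_f <= d_f and not clk;
end dataflow;

library ieee;
use ieee.std_logic_1164.all;

-- WDDL AND: an AND gate on the true rails in parallel with its dual,
-- an OR gate, on the false rails
entity wddl_and2 is
  port (a_t, a_f, b_t, b_f : in  std_logic;
        z_t, z_f           : out std_logic);
end wddl_and2;

architecture dataflow of wddl_and2 is
begin
  z_t <= a_t and b_t;
  z_f <= a_f or b_f;
end dataflow;

library ieee;
use ieee.std_logic_1164.all;

-- WDDL OR: an OR gate on the true rails in parallel with its dual,
-- an AND gate, on the false rails
entity wddl_or2 is
  port (a_t, a_f, b_t, b_f : in  std_logic;
        z_t, z_f           : out std_logic);
end wddl_or2;

architecture dataflow of wddl_or2 is
begin
  z_t <= a_t or b_t;
  z_f <= a_f and b_f;
end dataflow;
```

With all four inputs at '0', both outputs of each gate are '0', so the pre-charge value propagates from gate to gate without a dedicated pre-charge signal. With valid differential inputs (exactly one rail of each pair at '1'), exactly one output rail goes to '1'. Note that a synthesis tool is free to restructure such logic, which is one reason the secure flow substitutes compound cells after synthesis rather than relying on RTL like this alone.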

Figure: WDDL Pre-charge wave generation

CHAPTER 7 WDDL GATES

The methodology used in the project is a bottom-up approach. Lower-level modules are designed first and later integrated to form larger modules, whose further integration leads to the final top module. Since logic gates form the lowest-level modules, the logic gates required for the design are first implemented in WDDL style. WDDL demands a parallel combination of two positive complementary gates, one calculating the true value and the other the false value. The OR, AND, and XOR logic gates have been implemented. In addition, a full adder, a 32-bit XOR, etc. have been implemented.

7.1 WDDL OR gate

A WDDL OR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.

Figure 4.1 WDDL OR Gate

The NAND and NOT gates used act as the Precharge operator, injecting the signal '0' into the gates when the 'clk' signal is high, i.e. during the Precharge phase.

7.2 WDDL AND gate

A WDDL AND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.

Figure 4.2 WDDL AND Gate

7.3 WDDL NAND gate

A WDDL NAND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.

Figure 4.3 WDDL NAND Gate

7.4 WDDL NOR gate

A WDDL NOR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.

Figure 4.4 WDDL NOR Gate

7.5 WDDL XOR gate

The XOR function can be implemented by the Boolean equation A ⊕ B = AB' + A'B. Therefore, the XOR gate is implemented in terms of AND and OR gates, as shown in Figure 4.5. It can also be implemented by instantiating a WDDL AND gate and a WDDL OR gate, but the number of gates involved in the latter approach is greater than in the former. Therefore, the first method of implementation is followed rather than the second.
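The Boolean equation for XOR maps directly onto differential rails, because the complement of a signal is simply its other rail. The sketch below assumes differential inputs that have already passed a pre-charge operator; the entity name wddl_xor_sketch and all port names are hypothetical and are not taken from the project's code.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical WDDL XOR cell; entity and port names are illustrative only.
entity wddl_xor_sketch is
  port (
    a_t, a_f : in  std_logic;  -- true and false rails of A
    b_t, b_f : in  std_logic;  -- true and false rails of B
    y_t, y_f : out std_logic   -- true and false rails of A xor B
  );
end entity wddl_xor_sketch;

architecture dataflow of wddl_xor_sketch is
begin
  -- True rail: AB' + A'B, where each complement is taken from the other rail.
  y_t <= (a_t and b_f) or (a_f and b_t);
  -- False rail: the complement of XOR, AB + A'B', again built from AND and OR only.
  y_f <= (a_t and b_t) or (a_f and b_f);
end architecture dataflow;
```

With all four inputs at 0, both outputs are 0, so the cell passes the pre-charge wave. For any valid differential input, exactly one of y_t and y_f is 1.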

Figure 4.5 WDDL XOR gate

With the help of the above basic gates, a full adder circuit has been designed by instantiating the WDDL gates designed above. During the implementation of the Blowfish algorithm, a 32-bit XOR gate and a 32-bit adder circuit are required. They can be easily implemented by instantiating the corresponding lower-level module 32 times.

CHAPTER 8 FRONT END RESULTS

WDDL OR GATE

Synthesis Report
==========================================================
Final Report
==========================================================
Final Results
RTL Top Level Output File Name : wddlor.ngr
Top Level Output File Name : wddlor
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 5

Cell Usage
BELS : 2
  LUT3 : 2
IO Buffers : 5
  IBUF : 3
  OBUF : 2
==========================================================
Device utilization summary
---------------------------
Selected Device : 3s250eft256-4
Number of Slices : 1 out of 2448 (0%)
Number of 4 input LUTs : 2 out of 4896 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 172 (2%)

Timing Summary
---------------
Speed Grade : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result

WDDL AND GATE

Synthesis Report
==========================================================
Final Report
==========================================================
Final Results
RTL Top Level Output File Name : wddlgates.ngr
Top Level Output File Name : wddlgates
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 5

Cell Usage
BELS : 2
  LUT3 : 2
IO Buffers : 5
  IBUF : 3
  OBUF : 2
==========================================================
Device utilization summary
---------------------------
Selected Device : 3s250etq144-4
Number of Slices : 1 out of 2448 (0%)
Number of 4 input LUTs : 2 out of 4896 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 108 (4%)

Timing Summary
---------------
Speed Grade : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result

WDDL NAND GATE

Synthesis Report
==========================================================
Final Report
==========================================================
Final Results
RTL Top Level Output File Name : wddlnand1.ngr
Top Level Output File Name : wddlnand1
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 5

Cell Usage
BELS : 2
  LUT3 : 2
IO Buffers : 5
  IBUF : 3
  OBUF : 2
==========================================================
Device utilization summary
---------------------------
Selected Device : 3s500efg320-4
Number of Slices : 1 out of 4656 (0%)
Number of 4 input LUTs : 2 out of 9312 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 232 (2%)

Timing Summary
---------------
Speed Grade : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result

WDDL XOR GATE

Simulation Result

Synthesis Result

WDDL XOR GATE

Synthesis Report
==========================================================
Final Report
==========================================================
Final Results
RTL Top Level Output File Name : wddlxorgate.ngr
Top Level Output File Name : wddlxorgate
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 5

Cell Usage
BELS : 2
  LUT3 : 2
IO Buffers : 5
  IBUF : 3
  OBUF : 2
==========================================================
Device utilization summary
---------------------------
Selected Device : 3s250eft256-4
Number of Slices : 1 out of 2448 (0%)
Number of 4 input LUTs : 2 out of 4896 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 172 (2%)

Timing Summary
---------------
Speed Grade : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result

CHAPTER 9 SUMMARY AND CONCLUSION

9.1 Summary

To secure ICs against side-channel attacks, especially Differential Power Analysis (DPA), it is necessary to implement the design in a logic style whose power dissipation is constant irrespective of the input combination. WDDL has proved advantageous compared with other styles and is therefore of great significance. In this dissertation work, an architecture for the Blowfish algorithm is designed and implemented in WDDL style. In this implementation, a bottom-up approach is used. The low-level entities are designed first and later combined to form the entire module. The key scheduling is online. The sub-keys generated for a particular key can be used to encrypt all the data to be encrypted with that key. The sub-keys are supplied in reverse order for the decryption data path. Initially, logic gates are implemented in WDDL, and then higher-level modules are designed by instantiating the WDDL gates to form the entire module, thus resulting in constant power dissipation irrespective of the input data combination. The entire design works in two phases, namely the Precharge phase and the Evaluation phase. In the Precharge phase, all the signals of the design are zeroed, and during the Evaluation phase the functionality of the design is achieved. This sort of design has been found simple and very effective in thwarting the side-channel attack known as Differential Power Analysis (DPA).

9.2 Conclusion

The crypto processor has been designed for a key size of 448 bits and a plaintext of 64 bits. The code for the implementation has been written in VHDL. Functional verification has been done using the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The back end of the design is done using SOC Encounter. The desired functionality has been achieved according to the specifications. In the output during the Evaluation phase, the number of transitions is always the same, thus resulting in constant power dissipation. During synthesis, it was observed that a simple WDDL gate comprises many conventional gates. Therefore, the area of the design has grown nearly three-fold compared with the design implemented in conventional CMOS logic; this is the cost of the security incorporated into the IC against Differential Power Analysis (DPA). Due to the constant power dissipation at the output, an attacker cannot apply the Differential Power Analysis (DPA) scheme to find the secret key that is being used in the crypto-processor. Thus, security against DPA is incorporated into the IC at the hardware level by implementing the design in WDDL style, which is quite simple and effective.


INDEX

ABSTRACT

CHAPTER 1 INTRODUCTION AND OBJECTIVE
  1.1 INTRODUCTION
  1.2 OBJECTIVE

CHAPTER 2 REVIEW OF LITERATURE
  2.1 INTRODUCTION TO DIGITAL DESIGN FLOW
  2.2 SECURE DIGITAL DESIGN FLOW

CHAPTER 3 HARDWARE DESCRIPTION LANGUAGE (VHDL)

CHAPTER 4 SMART CARD OVERVIEW

CHAPTER 5 SIDE CHANNEL ATTACKS
  5.1 CLASSIFICATION OF SIDE CHANNEL ATTACKS
  5.2 POWER ANALYSIS ATTACKS
    5.2.1 SIMPLE POWER ANALYSIS (SPA)
    5.2.2 DIFFERENTIAL POWER ANALYSIS (DPA)

CHAPTER 6 CONSTANT-POWER CONSUMING LOGIC STYLES
  6.1 CURRENT MODE LOGIC
  6.2 VOLTAGE MODE LOGIC (CMOS CIRCUIT STYLES)
  6.3 DYNAMIC DIFFERENTIAL LOGIC
    6.3.1 SENSE AMPLIFIER BASED LOGIC (SABL)
    6.3.2 WAVE DYNAMIC DIFFERENTIAL LOGIC GATES (WDDL)

CHAPTER 7 DESIGN OF WDDL GATES
  7.1 WDDL GATES
    7.1.1 WDDL OR GATE
    7.1.2 WDDL AND GATE
    7.1.3 WDDL NAND GATE
    7.1.4 WDDL NOR GATE
    7.1.5 WDDL XOR GATE

CHAPTER 8 FRONT END RESULTS

CHAPTER 9 SUMMARY AND CONCLUSION
  9.1 SUMMARY
  9.2 CONCLUSION

CHAPTER 10 REFERENCES

ABSTRACT

Every electronic device needs security, from the smallest RFID tags to larger hand-held devices. Security is needed for financial, medical, consumer, automotive, and other applications. Small embedded integrated circuits (ICs) such as smart cards are vulnerable to so-called side-channel attacks (SCAs). Side-channel attacks are a class of attacks that derive information from an integrated circuit while it is in operation. The attacker can gain information by monitoring the power consumption, execution time, electromagnetic radiation, and other information leaked by the switching behavior of digital complementary metal-oxide-semiconductor (CMOS) gates. For example, execution times that depend on the values of the data and/or key reveal what the circuit is doing. Simple timing or power attacks give visual information on the circuit. This project presents a digital very-large-scale integrated (VLSI) design flow to create secure, power-analysis-attack-resistant ICs. The root cause of this problem is that standard CMOS is power efficient: it consumes dynamic power only when nodes are switching.

The idea is to create digital circuit styles whose switching behavior is independent of the data, or sequence of data, that they are processing. A logic style called "Wave Dynamic Differential Logic (WDDL)" is used for the implementation of the basic logic gates used in cryptographic processors. The design flow starts from a normal design in a hardware description language such as VHDL and ends with a side-channel-attack (SCA) resistant layout.

Figure: WDDL Pre-charge wave generation

CHAPTER 1 INTRODUCTION AND OBJECTIVE

1.1 Introduction

Small embedded integrated circuits (ICs) such as smart cards are vulnerable to so-called side-channel attacks (SCAs). The attacker can gain information by monitoring the power consumption, execution time, electromagnetic radiation, and other information leaked by the switching behavior of digital complementary metal-oxide-semiconductor (CMOS) gates. This project presents a digital very-large-scale integrated (VLSI) design flow to create secure, power-analysis-attack-resistant ICs.

The idea is to create digital circuit styles whose switching behavior is independent of the data, or sequence of data, that they are processing. A logic style called Wave Dynamic Differential Logic (WDDL) is used for the implementation of the basic logic gates used in cryptographic processors. The design flow starts from a normal design in a hardware description language such as VHDL and ends with a side-channel-attack (SCA) resistant layout.

Depending on the parameter considered, side-channel attacks are classified as probing attacks, fault induction attacks, timing attacks, power analysis attacks, electromagnetic analysis attacks, etc. One side-channel attack in particular, Differential Power Analysis (DPA), is of great concern. It is very effective in finding the secret key and can be mounted quickly with off-the-shelf devices. The attack is based on the fact that logic operations have power characteristics that depend on the input data. It relies on statistical analysis to extract from the power consumption the information that is correlated with the secret key. As the variations actually originate at the logic level, implementing the encryption and decryption modules in a logic style in which a logic gate has constant power consumption at all times, independently of signal transitions, removes the foundation of DPA and is an effective means to halt DPA.
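The statistical step can be made concrete with the difference-of-means test that is commonly used to describe DPA. The notation below (traces T_i, selection bit D_i, key guess k) is introduced here for illustration only and does not appear in the report.

```latex
% Difference-of-means DPA test (notation assumed for illustration).
% T_i(t): i-th measured power trace; D_i(k) in {0,1}: bit predicted from
% the known input of trace i and the key guess k.
\[
\Delta_k(t) =
\frac{\sum_{i=1}^{N} D_i(k)\, T_i(t)}{\sum_{i=1}^{N} D_i(k)}
-
\frac{\sum_{i=1}^{N} \bigl(1 - D_i(k)\bigr)\, T_i(t)}{\sum_{i=1}^{N} \bigl(1 - D_i(k)\bigr)}
\]
% If every trace is the same regardless of the data, T_i(t) = c(t), then
\[
\Delta_k(t) = c(t) - c(t) = 0 \quad \text{for every key guess } k .
\]
```

In a data-dependent implementation, a correct guess tends to produce a peak in the difference trace. A gate with constant power consumption gives no peak for any guess, which is the sense in which such a logic style removes the foundation of DPA.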

1.2 Objective of the Project

The main objectives of this dissertation are:

Study of constant-power logic styles
Description of WDDL gates
Implementation of WDDL logic gates
Verification of the functionality of WDDL logic gates
Synthesis of the design
Analysis of the reports obtained during simulation and synthesis

CHAPTER 2 REVIEW OF LITERATURE

2.1 Introduction to Digital Design Flow

A typical digital design flow for any IC is as follows: design entry (specification, architecture, RTL coding, and RTL verification), synthesis and post-synthesis verification, back end (floor planning, place and route, layout), and tape-out to the foundry to get the end product. All modern digital designs start with a designer writing a hardware description of the IC in a hardware description language (HDL) such as Verilog or VHDL. A Verilog or VHDL program essentially describes the hardware (logic gates, flip-flops, counters, etc.), the interconnection of the circuit blocks, and the functionality. Various CAD tools are available to synthesize a circuit based on the HDL description.

2.2 Secure Digital Design Flow

The secure digital design flow is depicted in Figure 2.1. In addition to the regular steps in an IC design (logic design, logic synthesis, place & route, stream out, and verification), one can recognize two additional steps, namely 1) "cell substitution" and 2) "interconnect decomposition". These operations have been inserted in the back end of the flow and do not interfere with the creative part of a design, indicated by the "logic design" task.

Figure 2.1 Secure Digital Design Flow

During the cell substitution step, cells designed in any constant-power logic style replace the conventional CMOS gates. This ensures the security of the ICs against power analysis attacks.

CHAPTER 3 HARDWARE DESCRIPTION LANGUAGE (VHDL)

Why (V)HDL?

Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported

What is VHDL?

VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed IC)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics

Brief History

VHDL was developed in the early 1980s for managing design problems that involved large circuits and multiple teams of engineers
Funded by the US Department of Defense
The first publicly available version was released in 1985
In 1986, the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was presented with a proposal to standardize VHDL
In 1987, standardization => IEEE 1076-1987
An improved version of the language was released in 1994 => IEEE Standard 1076-1993

Related Standards

IEEE 1076 doesn't support simulation conditions such as unknown and high-impedance
Soon after IEEE 1076-1987 was released, simulator companies began using their own non-standard types => VHDL was becoming non-standard
The IEEE 1164 standard was developed by the IEEE
IEEE 1164 contains definitions for a nine-valued data type, std_logic
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate to actual hardware
Defines, e.g., two numeric types: signed and unsigned

VHDL Environment

Design Units

Segments of VHDL code can be compiled separately and stored in a library

Entities

A black box with an interface definition
Defines the inputs/outputs of a component (defines pins)
A way to represent modularity in VHDL
Similar to a symbol in a schematic
An entity declaration describes the entity

E.g.:

library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;

Ports

Provide channels of communication between the component and its environment
Each port must have a name, a direction, and a type
An entity may have no port declaration

Port directions

in: The value of the port can be read inside the component but cannot be assigned. Multiple reads of the port are allowed
out: Assignments can be made to the port, but data from the port cannot be read. Multiple assignments are allowed
inout: Bi-directional; assignments can be made and data can be read. Multiple assignments are allowed
buffer: An out port with read capability. May have at most one assignment (not recommended)

Architectures

Every entity has at least one architecture
One entity can have several architectures
Architectures can describe a design using
- Behavior
- Structure
- Dataflow
Architectures can describe a design on many levels
- Gate level
- RTL (Register Transfer Level)
- Behavioral level
A configuration declaration links an architecture to an entity

E.g.:

architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;

Configurations

Link the entity declaration and the architecture body together
The concept of a default configuration is a bit messy in VHDL '87
- The last architecture analyzed is linked to the entity
Can be used to change simulation behavior without re-analyzing the VHDL source
Complex configuration declarations are ignored in synthesis
Some entities can have, e.g., a gate-level architecture and a behavioral architecture
Are always optional
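A configuration declaration for the Comparator entity might look like the sketch below. The second architecture Comparator2 and the configuration name Comparator_cfg are hypothetical and were added only to show how one of several architectures is selected.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;

-- Dataflow architecture, matching the Comparator1 example in the text.
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;

-- Hypothetical second architecture describing the same behavior in a process.
architecture Comparator2 of Comparator is
begin
  process (A, B)
  begin
    if A = B then
      EQ <= '1';
    else
      EQ <= '0';
    end if;
  end process;
end Comparator2;

-- Hypothetical configuration that binds the entity to Comparator2.
configuration Comparator_cfg of Comparator is
  for Comparator2
  end for;
end Comparator_cfg;
```

Selecting Comparator1 instead only requires changing the architecture name in the for clause; the entity and both architecture bodies stay untouched.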

Packages

Packages contain information common to many design units

1. Package declaration
- Constant declarations
- Type and subtype declarations
- Function and procedure declarations
- Global signal declarations
- File declarations
- Component declarations

2. Package body
- Is not necessarily needed
- Function bodies
- Procedure bodies

Packages are meant for encapsulating data that can be shared globally among several design units. They consist of a declaration part and an optional body part.

A package declaration can contain
- Type and subtype declarations
- Subprograms
- Constants
- Alias declarations
- Global signal declarations
- File declarations
- Component declarations

A package body consists of
- Subprogram declarations and bodies
- Type and subtype declarations
- Deferred constants
- File declarations
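The split between a package declaration and a package body can be sketched as follows. The package name wddl_types_sketch, the record type diff_bit, the constant PRECHARGED, and the function to_diff are all hypothetical names chosen for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package; all names are illustrative only.
package wddl_types_sketch is
  -- Type declaration shared by several design units.
  type diff_bit is record
    t : std_logic;  -- true rail
    f : std_logic;  -- false rail
  end record;

  -- Deferred constant: its value is given in the package body.
  constant PRECHARGED : diff_bit;

  -- Function declaration; the function body is in the package body.
  function to_diff (x : std_logic) return diff_bit;
end package wddl_types_sketch;

package body wddl_types_sketch is
  constant PRECHARGED : diff_bit := (t => '0', f => '0');

  function to_diff (x : std_logic) return diff_bit is
  begin
    return (t => x, f => not x);
  end function to_diff;
end package body wddl_types_sketch;
```

Assuming the package is compiled into the library work, a design unit makes these names visible with `use work.wddl_types_sketch.all;`.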

Libraries

Collection of VHDL design units (database)
1. Packages
- Package declaration
- Package body
2. Entities (entity declaration)
3. Architectures (architecture body)
4. Configurations (configuration declarations)
Usually a directory in a UNIX file system
Can also be any other kind of database

Levels of Abstraction

VHDL supports many possible styles of design description, which differ primarily in how closely they relate to the hardware.

It is possible to describe a circuit in a number of ways:

Structural -> Dataflow -> Behavioral (increasing level of abstraction)

Structural VHDL description

A circuit is described in terms of its components
Ranges from a low-level description (e.g. transistor-level description) to a high-level description (e.g. block diagram)
For large circuits, low-level descriptions quickly become impractical

Dataflow VHDL Description

A circuit is described in terms of how data moves through the system
In the dataflow style, you describe how information flows between registers in the system
The combinational logic is described at a relatively high level; the placement and operation of registers is specified quite precisely
The behavior of the system over time is defined by registers
There are no built-in registers in the VHDL language
- Either a lower-level description
- or a behavioral description of sequential elements is needed
The lower-level register descriptions must be created or obtained
If there are no third-party models for registers => you must write the behavioral description of the registers
The behavioral description can be provided in the form of subprograms (functions or procedures)

Behavioral VHDL Description

A circuit is described in terms of its operation over time
The representation might include, e.g., state diagrams, timing diagrams, and algorithmic descriptions
The concept of time may be expressed precisely using delays (e.g. A <= B after 10 ns)
If no actual delays are used, the order of sequential operations is defined
In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed timing specifications
The actual timing results depend on the implementation technology and the efficiency of the synthesis tool
There are a few tools for behavioral synthesis

Concurrent vs. Sequential

Processes

Basic simulation concept in VHDL
A VHDL description can always be broken up into interconnected processes
Quite similar to a UNIX process

The process keyword in VHDL

A process statement is a concurrent statement
Statements inside a process statement are sequential statements
A process must contain either a sensitivity list or wait statement(s), but NOT both
The sensitivity list or wait statement(s) contain the signals that wake the process up

General Format

process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
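The general format can be filled in with a small clocked register, which also illustrates the behavioral description of a sequential element. The entity name reg8_sketch and its port names are hypothetical and chosen only for this sketch.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 8-bit register; entity and port names are illustrative only.
entity reg8_sketch is
  port (
    clk : in  std_logic;
    d   : in  std_logic_vector(7 downto 0);
    q   : out std_logic_vector(7 downto 0)
  );
end entity reg8_sketch;

architecture behavioral of reg8_sketch is
begin
  -- The sensitivity list wakes the process on every event on clk.
  store : process (clk)
  begin
    -- Statements inside the process execute sequentially.
    if rising_edge(clk) then
      q <= d;
    end if;
  end process store;
end architecture behavioral;
```

An equivalent form drops the sensitivity list and places `wait until rising_edge(clk);` as the first statement of the process. Using both in the same process is not allowed.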

CHAPTER 4 SMART CARD OVERVIEW

This section very briefly introduces the concept of a smart card. Basically, a smart card is a computer embedded in a safe. It consists of a (typically 8-bit or 32-bit) processor together with ROM, EEPROM, and a small amount of RAM, and is therefore capable of performing computations. The main goal of a smart card is to allow the execution of cryptographic operations involving some secret parameter (the key) while not revealing this parameter to the outside world. The goal of the attacker, in contrast, is to recover this secret parameter. The processor is embedded in a chip and connected to the outside world through eight wires, whose role, use, and position are standardized. In addition to the input/output wires, the parts we will be most interested in are the following:

1. Power supply: Smart cards do not have an internal battery. The current they need is provided by the smart card reader. This makes the smart card's power consumption fairly easy for the attacker to measure.

2. Clock: Similarly, smart cards do not have an internal clock either. The clock ticks must also be provided from the outside world. As a consequence, the attacker can measure the card's running time with very good precision.

Smart cards are usually equipped with protection mechanisms composed of a shield (the passivation layer), whose goal is to hide the internal behavior of the chip, and possibly sensors that react when the shield is removed by destroying all sensitive data and preventing the card from functioning properly.

CHAPTER 5 SIDE CHANNEL ATTACKS

"Side channel attacks" are attacks that are based on "side channel information". Side channel information is information that can be retrieved from the encryption device that is neither the plaintext to be encrypted nor the ciphertext resulting from the encryption process.

In the past, an encryption device was perceived as a unit that receives plaintext input and produces ciphertext output, and vice versa. Attacks were therefore based on either knowing the ciphertext (such as ciphertext-only attacks), or knowing both (such as known-plaintext attacks), or on the ability to define what plaintext is to be encrypted and then seeing the results of the encryption (known as chosen-plaintext attacks). Today, it is known that encryption devices have additional outputs, and often additional inputs, which are not the plaintext or the ciphertext.

Encryption devices produce timing information (information about the time that operations take) that is easily measurable, radiation of various sorts, power consumption statistics (that can be easily measured as well), and more. Often the encryption device also has additional "unintentional" inputs, such as voltage, that can be modified to cause predictable outcomes. Side channel attacks make use of some or all of this information, along with other (known) cryptanalytic techniques, to recover the key the device is using.

Side channel analysis techniques are of concern because the attacks can be mounted quickly and can sometimes be implemented using readily available hardware costing from only a few hundred dollars to thousands of dollars.

5.1 Classification of side channel attacks

The literature usually classifies side channel attacks along two orthogonal axes.

1. Invasive vs. non-invasive

Invasive attacks require de-packaging the chip to get direct access to its components. A typical example is the connection of a wire to a data bus to observe the data transfers.

A non-invasive attack only exploits externally available information (the emission of which is, however, often unintentional), such as running time and power consumption.

A further category, semi-invasive attacks, has also been introduced. These attacks require de-packaging of the chip to get access to the chip surface but do not tamper with the passivation layer (they do not require electrical contact to the metal surface).

2. Active vs. passive

Active attacks try to tamper with the card's proper functioning. For example, fault induction attacks try to induce errors in the computation.

Passive attacks, in contrast, simply observe the card's behavior during its processing without disturbing it.

Note that these two axes are indeed orthogonal: an invasive attack may completely avoid disturbing the card's behavior, and a passive attack may require a preliminary de-packaging for the required information to be observable.

These attacks are of course not mutually exclusive: an invasive attack may, for example, serve as a preliminary step for a non-invasive one by giving a detailed description of the chip's architecture that helps to find out where to put external probes.

Smart cards are usually equipped with protection mechanisms that are supposed to react to invasive attacks (although several invasive attacks are nonetheless capable of defeating these mechanisms, as will be illustrated below). On the other hand, it is worth pointing out that a non-invasive attack is completely undetectable: there is, for example, no way for a smart card to figure out that its running time is currently being measured. Other countermeasures are therefore necessary. From an economic point of view, invasive attacks are usually more expensive to deploy on a large scale, since they require individual processing of each attacked device. In this sense, non-invasive attacks constitute a bigger menace for the smart card industry.

Invasive attacks involve a relatively high capital investment for lab equipment plus a moderate investment of effort for each individual chip attacked. Non-invasive attacks require only a moderate capital investment plus a moderate investment of effort in designing an attack on a particular type of device; thereafter, the cost per device attacked is low. Semi-invasive attacks can be carried out using very cheap and simple equipment.

The attacker can gain information by:

1. Probing attacks
2. Fault induction attacks
3. Timing attacks
4. Power analysis attacks
5. Electromagnetic analysis attacks

These attacks are mounted while the digital complementary metal-oxide-semiconductor (CMOS) gates are switching. Of all these, the power analysis attack is of major concern.

5.2 Power analysis attacks

The power consumption of a cryptographic device may provide much information about the operations that take place and the parameters involved. This is the idea of simple and differential power analysis, first introduced by Kocher et al. Like the clock ticks, the card's energy is provided by the terminal and can therefore easily be measured. Basically, to measure a circuit's power consumption, a small (e.g. 50 ohm) resistor is inserted in series with the power or ground input. The voltage difference across the resistor divided by the resistance yields the current. Well-equipped electronics labs have equipment that can digitally sample voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than 1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC can be bought for less than US$ 400.
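The measurement described above amounts to Ohm's law. In the sketch below, the symbols R, V_R, and V_DD and the numerical values other than the 50 ohm resistor are assumptions chosen only to illustrate the arithmetic; the power estimate neglects the small voltage drop across the resistor itself.

```latex
% Shunt-resistor measurement; V_R = 10 mV and V_DD = 5 V are assumed example values.
\[
I(t) = \frac{V_R(t)}{R}, \qquad P(t) \approx V_{DD}\, I(t)
\]
\[
R = 50\,\Omega,\quad V_R = 10\,\text{mV}
\;\Rightarrow\;
I = \frac{10 \times 10^{-3}\,\text{V}}{50\,\Omega} = 0.2\,\text{mA},
\qquad
P \approx 5\,\text{V} \times 0.2\,\text{mA} = 1\,\text{mW}
\]
```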

Power analysis attacks are of two types:

1. Simple Power Analysis (SPA) attack
2. Differential Power Analysis (DPA) attack

SPA attacks on smart cards typically take a few seconds per card, while DPA attacks can take several hours. In general, from a somewhat academic perspective, we may consider the internal state of a block cipher to be all the intermediate results and values that are never included in the output in normal operation. For example, DES has 16 rounds; we can consider the intermediate states, state[1..15], after each round except the last as secret internal state. Side channels typically give information about these internal states, or about the operations used in the transition of this internal state from one round to another. The type of side channel will of course determine what information is available to the attacker about these states. The attacks typically work by finding some information about the internal state of the cipher that can be learned both by guessing part of the key and checking the value directly, and additionally by some statistical property of the cipher that makes that checkable value slightly nonrandom.

5.2.1 Simple Power Analysis attack (SPA)

Simple Power Analysis is a technique that involves direct interpretation of power consumption measurements collected during cryptographic operations, generally by looking at a visual representation of the power consumption of a unit while an encryption operation is being performed. SPA can yield information about a device's operation as well as key material.

A trace refers to a set of power consumption measurements taken across a cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a trace containing 5000 points. The figure shows an SPA trace from a smart card performing a DES operation.

Figure SPA monitoring from a single DES operation performed by a typical smart card The

upper trace shows the entire encryption operation including the initial permutation the 16

DES rounds and the final permutation The lower trace is a detailed view of the second and

third rounds
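The trace length quoted above is simply the operation duration multiplied by the sampling rate. A minimal Python sketch of that arithmetic follows. The helper names, the count of 1000 traces, and the one-byte sample size are illustrative assumptions, not values from the report.

```python
def trace_length(duration_s: float, sample_rate_hz: float) -> int:
    """Number of samples in one trace: duration times sampling rate."""
    return round(duration_s * sample_rate_hz)


def capture_size_bytes(num_traces: int, duration_s: float,
                       sample_rate_hz: float, bytes_per_sample: int = 1) -> int:
    """Storage needed for a set of traces (assumes fixed-size samples)."""
    return num_traces * trace_length(duration_s, sample_rate_hz) * bytes_per_sample


if __name__ == "__main__":
    # The 1 ms operation sampled at 5 MHz from the text.
    print(trace_length(1e-3, 5e6))
    # Assumed: 1000 such traces stored with one byte per sample.
    print(capture_size_bytes(1000, 1e-3, 5e6))
```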

Because SPA can reveal the sequence of instructions executed, it can be used to break cryptographic implementations in which the execution path depends on the data being processed. For example:

DES key schedule: The DES key schedule computation involves rotating 28-bit key registers. A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will contain different SPA features if the execution paths take different branches for each.

DES permutations: DES implementations perform a variety of bit permutations. Conditional branching in software or microcode can cause significant power consumption differences for "0" and "1" bits.

Comparisons: String or memory comparison operations typically perform a conditional branch when a mismatch is found. This conditional branching causes large SPA (and sometimes timing) characteristics.

Multipliers: Modular multiplication circuits tend to leak a great deal of information about the data they process. The leakage functions depend on the multiplier design but are often strongly correlated to operand values and Hamming weights.

Exponentiators: A simple modular exponentiation function scans across the exponent, performing a squaring operation in every iteration with an additional multiplication operation for each exponent bit that is equal to "1". The exponent can be compromised if squaring and multiplication operations have different power consumption characteristics, take different amounts of time, or are separated by different code. Modular exponentiation functions that operate on two or more exponent bits at a time may have more complex leakage functions.
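The exponentiator case can be illustrated with a short Python sketch. It assumes an idealized attacker who can tell squarings ("S") from multiplications ("M") in a trace; the operation log stands in for the power trace, and the function names are hypothetical.

```python
def square_and_multiply(base: int, exponent: int, modulus: int):
    """Left-to-right binary exponentiation that also records the operation sequence."""
    if exponent < 0 or modulus <= 0:
        raise ValueError("exponent must be >= 0 and modulus > 0")
    result = 1 % modulus
    ops = []
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus   # squaring in every iteration
        ops.append("S")
        if bit == "1":
            result = (result * base) % modulus  # extra multiply only for "1" bits
            ops.append("M")
    return result, ops


def recover_exponent(ops) -> int:
    """Rebuild the exponent from the observed S/M sequence: 'S' then 'M' means a 1 bit."""
    bits = []
    for i, op in enumerate(ops):
        if op == "S":
            following = ops[i + 1] if i + 1 < len(ops) else None
            bits.append("1" if following == "M" else "0")
    return int("".join(bits), 2)


if __name__ == "__main__":
    value, ops = square_and_multiply(7, 0b101101, 1009)
    print("".join(ops))
    print(recover_exponent(ops) == 0b101101)
```

In this model the whole exponent is readable from a single trace, which is why implementations that must resist SPA avoid key-dependent operation sequences.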

5.2.2 Differential Power Analysis attack (DPA)

In addition to large-scale power variations due to the instruction sequence, there are effects correlated to the data values being manipulated. These variations tend to be smaller and are sometimes overshadowed by measurement errors and other noise. In such cases it is still often possible to break the system using statistical functions tailored to the target algorithm.

To implement the DPA attack, an attacker first observes $m$ encryption operations and captures power traces $T_{1..m}[1..k]$ containing $k$ samples each. In addition, the attacker records the ciphertexts $C_{1..m}$. No knowledge of the plaintext is required. DPA uses power consumption measurements to determine whether a key block guess $K_s$ is correct. The attacker computes a $k$-sample differential trace $\Delta_D[1..k]$ by finding the difference between the average of the traces for which a certain intermediate value $V$ is one and the average of the traces for which $V$ is zero, where $V = D(C_i, b, K_s)$ is the value of the target bit $b$ that the selection function $D$ predicts from ciphertext $C_i$ under the guess $K_s$. Thus $\Delta_D[j]$ is the average over $C_{1..m}$ of the effect due to the value represented by the selection function $D$ on the power consumption at point $j$. In particular,

$$\Delta_D[j] = \frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}{\sum_{i=1}^{m} D(C_i, b, K_s)} - \frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)}$$

If $K_s$ is incorrect, the bit computed using $D$ will differ from the actual target bit for about half of the ciphertexts $C_i$. The selection function is thus effectively uncorrelated to what was actually computed by the target device. If a random function is used to divide a set into two subsets, the difference in the averages of the subsets should approach zero as the subset sizes approach infinity. Thus, for an incorrect $K_s$, the trace components uncorrelated to $D$ diminish with $1/\sqrt{m}$, causing the differential trace to become flat (the actual trace may not be completely flat, as $D$ with $K_s$ incorrect may have a weak correlation to $D$ with the correct $K_s$).

If $K_s$ is correct, however, the computed value for $D(C_i, b, K_s)$ will equal the actual value of the target bit $b$ with probability 1. The selection function is thus correlated to the value of the bit considered. Other data values, measurement errors, etc. that are not correlated to $D$ approach zero. Because power consumption is correlated to data bit values, the plot of $\Delta_D$ will be flat with spikes in regions where $D$ is correlated to the values being processed. The correct value of $K_s$ can thus be identified from the spikes in its differential trace. Four values of $b$ correspond to each S-box, providing confirmation of key block guesses. Finding all eight $K_s$ yields the entire 48-bit round subkey. The remaining 8 key bits can be found easily using exhaustive search or by analyzing one additional round. Triple DES keys can be found by analyzing an outer DES operation first, using the resulting key to decrypt the ciphertexts, and attacking the next DES key. DPA can use known plaintext or known ciphertext and can find encryption or decryption keys.
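The difference-of-means computation can be sketched in Python on simulated data. This is a toy model and not an attack on DES: the 4-bit key, the nonlinear bit `target_bit`, the known random inputs (used in place of ciphertexts), the Gaussian noise, and the single leaking sample are all assumptions chosen so that the example stays short and self-contained.

```python
import random


def target_bit(x: int, key: int) -> int:
    """Toy selection function: one nonlinear bit of y = x XOR key (y0*y1 XOR y2*y3)."""
    y = (x ^ key) & 0xF
    return ((y & 1) & ((y >> 1) & 1)) ^ (((y >> 2) & 1) & ((y >> 3) & 1))


def simulate_traces(secret_key, m, k, leak_index, noise_sigma, rng):
    """Simulate m noisy traces of k samples; one sample leaks the target bit."""
    inputs, traces = [], []
    for _ in range(m):
        x = rng.randrange(16)
        trace = [rng.gauss(0.0, noise_sigma) for _ in range(k)]
        trace[leak_index] += target_bit(x, secret_key)
        inputs.append(x)
        traces.append(trace)
    return inputs, traces


def differential_trace(inputs, traces, key_guess):
    """Mean of traces with predicted bit 1 minus mean of traces with predicted bit 0."""
    k = len(traces[0])
    sums = {0: [0.0] * k, 1: [0.0] * k}
    counts = {0: 0, 1: 0}
    for x, trace in zip(inputs, traces):
        d = target_bit(x, key_guess)
        counts[d] += 1
        acc = sums[d]
        for j, sample in enumerate(trace):
            acc[j] += sample
    if counts[0] == 0 or counts[1] == 0:
        return [0.0] * k
    return [sums[1][j] / counts[1] - sums[0][j] / counts[0] for j in range(k)]


def recover_key(inputs, traces):
    """Pick the guess whose differential trace has the largest spike."""
    def peak(guess):
        return max(abs(v) for v in differential_trace(inputs, traces, guess))
    return max(range(16), key=peak)


if __name__ == "__main__":
    rng = random.Random(2024)
    inputs, traces = simulate_traces(0b1011, 2000, 20, 7, 0.5, rng)
    print(recover_key(inputs, traces))
```

In this model the correct guess gives a spike of about 1 at the leaking sample, while wrong guesses give differential traces that stay close to zero, which mirrors the flat-versus-spike behavior described above.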


CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES

The power consumption of traditional standard cells and logic depends on the signal activity. When the output of a logic gate makes a 0-to-1 transition, a current flows from the power supply and charges the output capacitance. On the other hand, when the output sees a 1-to-0, a 0-to-0, or a 1-to-1 transition, no energy or only a limited amount of energy (due to short-circuit or leakage currents) is consumed from the power supply. This is the fundamental reason why information is leaked through the power supply and why power attacks are possible. The basis of a secure digital design flow is therefore a logic style with constant power consumption.
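The data dependence described above can be made concrete with a first-order Python model. It assumes that only a 0-to-1 output transition draws energy from the supply (C times Vdd squared) and ignores short-circuit and leakage currents. The 10 fF load and 1.8 V supply are illustrative values, not figures from the report.

```python
def supply_energy(prev_out: int, next_out: int,
                  c_load: float = 10e-15, vdd: float = 1.8) -> float:
    """Energy in joules drawn from the supply for one output transition."""
    if prev_out == 0 and next_out == 1:
        return c_load * vdd * vdd   # charging the output capacitance
    return 0.0                      # 1->0, 0->0 and 1->1 draw (almost) nothing


def energy_profile(output_bits, c_load: float = 10e-15, vdd: float = 1.8):
    """Per-cycle supply energy for a sequence of output values."""
    return [supply_energy(a, b, c_load, vdd)
            for a, b in zip(output_bits, output_bits[1:])]


if __name__ == "__main__":
    # The profile differs from cycle to cycle, so it reveals the data.
    print(energy_profile([0, 1, 1, 0, 0, 1]))
```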

6.1 Current Mode Logic

Current mode logic (CML), e.g., current steering logic, seems the ideal solution. This type of logic continuously draws a current from the supply and measures its state through the path that the current takes. A gate has constant power consumption if it draws a perfectly constant current from the power supply, independently of the input and output signals. To build a current source capable of generating a constant current, special circuit techniques that minimize channel length modulation have to be used.

The decisive drawback of CML, however, is its static power consumption. Even when the logic gate is not processing any data, it burns the current, which makes this logic style unacceptable for embedded, battery-operated devices.

6.2 Voltage Mode Logic (CMOS circuit styles)

Voltage mode logic (VML), e.g., static CMOS logic, only draws a current from the supply to change state and measures its state by the amount of charge it stores on a capacitance. A regular standard CMOS circuit only consumes power when a capacitance gets charged and later discharged, i.e., when a gate switches state. This is the main reason that CMOS is the style of choice for every battery-operated or low-power device. This is illustrated in the figure below for a simple inverter. Thus static CMOS is the preferred logic style because of its low power consumption and high noise margins.

Figure: Standard CMOS inverter

Yet two conditions must be satisfied for VML to have constant power consumption, namely:

1) A logic gate must have exactly one switching event per signal transition.

2) The logic gate must charge a constant capacitance in that switching event.

As shown above, all four transitions of the CMOS inverter can be distinguished when monitoring the power supply.

6.3 Dynamic Differential Logic

Dynamic differential logic, sometimes also referred to as dual-rail with pre-charge logic, fulfills the first condition. A differential logic family uses both the true and the false representation of the input and output signals, and a dynamic logic family alternates pre-charge and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1, exactly one of the two output nodes evaluates to 0 in the evaluation phase to produce a differential output signal. The discharged output node is charged to 1 again in the following pre-charge phase, so that both outputs are pre-charged to 1. In other words, every signal transition, including the events in which the input signals remain constant, is represented by an actual switching event in which the logic gate charges a capacitance. The logic families that have been introduced to thwart differential power analysis (DPA) using dynamic differential logic include the following:

1. Sense Amplifier Based Logic (SABL)

2. Wave Dynamic Differential Logic (WDDL) gates
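The first condition can be checked with a small Python model that counts charging events per clock cycle. It follows the pre-charge-to-1 convention of this section. The function names are hypothetical, and the single-rail comparison is added only for contrast.

```python
def dual_rail_cycle(value: int):
    """Return the (true, false) rails after pre-charge and after evaluation."""
    precharged = (1, 1)                           # both rails pre-charged to 1
    evaluated = (1, 0) if value else (0, 1)       # exactly one rail discharges
    return precharged, evaluated


def dual_rail_charge_events(values):
    """Nodes that must be recharged (0 -> 1) in the pre-charge phase after each value."""
    events = []
    for value in values:
        precharged, evaluated = dual_rail_cycle(value)
        events.append(sum(1 for p, e in zip(precharged, evaluated)
                          if e == 0 and p == 1))
    return events


def single_rail_charge_events(values, initial: int = 0):
    """Charging events (0 -> 1) of an ordinary single-rail output, for comparison."""
    events, previous = [], initial
    for value in values:
        events.append(1 if previous == 0 and value == 1 else 0)
        previous = value
    return events


if __name__ == "__main__":
    data = [0, 0, 1, 1, 0]
    print(dual_rail_charge_events(data))    # one event per cycle, whatever the data
    print(single_rail_charge_events(data))  # events depend on the data sequence
```

The model only counts events. Whether each event charges the same capacitance is the second condition, which depends on the circuit and layout rather than on the logic.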

6.3.1 Sense Amplifier Based Logic (SABL)

The main advantage of SABL is that it has balanced input and output nodes and that all internal nodes connect to an output. The output capacitances can be balanced. Systematic methods have been developed to make sure that both branches of the differential pull-down network are balanced and that no memory effects are present in the network. Sense Amplifier Based Logic is illustrated in the following figure.

Figure: Sense Amplifier Based Logic AND/NAND gate

This circuit style does, however, require full-custom characterization and layout. It also suffers from a high clock load, common to all dynamic logic gates.

6.3.2 Wave Dynamic Differential Logic (WDDL) Gates

WDDL can be implemented with static CMOS logic. Static CMOS standard cells are combined to form secure compound standard cells, which have a reduced power signature. WDDL has many advantages. It can be readily implemented from an existing standard cell library. The design flow is fully supported with accurate EDA library files that come directly from the vendor. WDDL also results in a dynamic differential logic with only a small load capacitance on the pre-charge control signal and with the low power consumption and the high noise margins of static CMOS.

The advantages of the WDDL logic style are as follows:

A major advantage of the proposed logic style is that it can be incorporated into the common Electronic Design Automation (EDA) tool flow.

No special design rules are involved in the interconnection of WDDL gates.

The switching factor of WDDL is 100%.

A WDDL gate consists of a parallel combination of two positive complementary gates, one calculating the true output using the true inputs, the other the false output using the false inputs. A positive gate produces a zero output for an all-zero input. The AND gate and the OR gate are examples of positive gates. A complementary gate, sometimes also referred to as a dual gate, expresses the false output of the original logic gate using the false inputs of the original gate. The AND gate fed with true input signals and the OR gate fed with false input signals are two dual gates. The figure shows the WDDL AND gate and the WDDL OR gate. In the evaluation phase, each input signal is differential and the WDDL gate calculates its differential output. In the pre-charge phase, the inputs to the WDDL gate are set at 0. This puts the output of the gate at 0.

A module in WDDL pre-charges without distributing the pre-charge signal to each individual gate. During the pre-charge phase, the input vector of the combinatorial logic is set at all 0s. Each individual gate will eventually have all its inputs at 0, evaluate its output to 0, and pass this 0 value on to the next gate. One could say that the pre-charge signal travels over the combinatorial logic as a 0-wave, hence the name WDDL. There are several ways to launch the pre-charge wave. In the figure, a pre-charge operator is inserted at the start of every combinatorial logic tree, i.e., at the inputs of the encryption module and at the outputs of the registers. These operators produce an all-zero output in the pre-charge phase (clk signal high) but let the differential signal through during the evaluation phase (clk signal low).
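The gate construction and the 0-wave can be modeled in a few lines of Python. This is a behavioral sketch, not the report's VHDL: signals are (true, false) rail pairs, the pre-charge operator is modeled as an AND with the inverted clock, and the two-gate chain in `evaluate` is an arbitrary example circuit chosen for illustration.

```python
def encode(bit: int):
    """Differential encoding of a logic value as a (true, false) rail pair."""
    return (bit, 1 - bit)


def precharge_operator(signal, clk: int):
    """Force both rails to 0 while clk is high; pass the signal while clk is low."""
    true_rail, false_rail = signal
    enable = 1 - clk
    return (true_rail & enable, false_rail & enable)


def wddl_and(a, b):
    """AND on the true rails in parallel with its dual, OR, on the false rails."""
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a, b):
    """OR on the true rails in parallel with its dual, AND, on the false rails."""
    return (a[0] | b[0], a[1] & b[1])


def evaluate(a_bit: int, b_bit: int, clk: int):
    """Example chain (a AND b) OR b with pre-charge operators only at the inputs."""
    a = precharge_operator(encode(a_bit), clk)
    b = precharge_operator(encode(b_bit), clk)
    n1 = wddl_and(a, b)
    out = wddl_or(n1, b)
    return n1, out


if __name__ == "__main__":
    for clk in (1, 0):
        for a_bit in (0, 1):
            for b_bit in (0, 1):
                print(clk, a_bit, b_bit, evaluate(a_bit, b_bit, clk))
```

With clk high, every internal signal is (0, 0) even though only the inputs are gated, which is the 0-wave. With clk low, every signal has exactly one rail at 1.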

Figure: WDDL pre-charge wave generation

CHAPTER 7 WDDL GATES

The methodology used in the project is a bottom-up approach. Lower-level modules are designed and later integrated to form larger modules, whose further integration leads to the final top module. Since logic gates form the lowest-level modules, the logic gates required for the design are first implemented in WDDL style. WDDL demands a parallel combination of two positive complementary gates, one calculating the true value and the other the false value. Logic gates such as OR, AND, and XOR have been implemented. In addition, a full adder, a 32-bit XOR, etc. have been implemented.

7.1 WDDL OR gate

A WDDL OR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e., an AND gate, as shown in the following figure.

Figure 4.1 WDDL OR Gate

The NAND and NOT gates used act as the pre-charge operator, injecting the signal '0' into the gates when the 'clk' signal is high, i.e., during the pre-charge phase.

7.2 WDDL AND gate

A WDDL AND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e., an OR gate, as shown in the following figure.

Figure 4.2 WDDL AND Gate

7.3 WDDL NAND gate

A WDDL NAND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e., an OR gate, as shown in the following figure.

Figure 4.3 WDDL NAND Gate

7.4 WDDL NOR gate

A WDDL NOR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e., an AND gate, as shown in the following figure.

Figure 4.4 WDDL NOR Gate

7.5 WDDL XOR gate

The XOR function can be implemented by the Boolean equation A ⊕ B = AB' + A'B. Therefore the XOR gate is implemented in terms of AND gates and OR gates, as shown in Figure 4.5. It can also be implemented by instantiating a WDDL AND gate and a WDDL OR gate, but the number of gates involved in the latter method is greater than in the former. Therefore the first method of implementation is followed rather than the second.
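The XOR equation above maps onto dual-rail signals as in the following Python sketch. It is one possible reading, not the report's netlist: the complemented inputs A' and B' are assumed to be taken from the false rails, and the false output is assumed to be the dual expression (AND and OR exchanged, true and false rails exchanged).

```python
def encode(bit: int):
    """Differential encoding of a logic value as a (true, false) rail pair."""
    return (bit, 1 - bit)


def wddl_xor(a, b):
    """Dual-rail XOR built from AND/OR gates only, as in A xor B = AB' + A'B."""
    a_true, a_false = a
    b_true, b_false = b
    out_true = (a_true & b_false) | (a_false & b_true)    # AB' + A'B
    out_false = (a_false | b_true) & (a_true | b_false)   # dual expression (XNOR)
    return (out_true, out_false)


if __name__ == "__main__":
    for a_bit in (0, 1):
        for b_bit in (0, 1):
            print(a_bit, b_bit, wddl_xor(encode(a_bit), encode(b_bit)))
    # Pre-charged inputs (all rails 0) give a pre-charged output.
    print(wddl_xor((0, 0), (0, 0)))
```

Because only AND and OR gates are used, all-zero inputs produce an all-zero output, so the gate passes the pre-charge wave like the other WDDL gates.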

Figure 4.5 WDDL XOR gate

With the help of the above basic gates, a full adder circuit has been designed by instantiating the WDDL gates designed above. During the implementation of the Blowfish algorithm, a 32-bit XOR gate and a 32-bit adder circuit are required. They can be easily implemented by instantiating the corresponding lower-level module 32 times.

CHAPTER 8 FRONT END

RESULTS

WDDL OR GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name   : wddlor.ngr
Top Level Output File Name       : wddlor
Output Format                    : NGC
Optimization Goal                : Speed
Keep Hierarchy                   : NO

Design Statistics
IOs                              : 5

Cell Usage
BELS                             : 2
  LUT3                           : 2
IO Buffers                       : 5
  IBUF                           : 3
  OBUF                           : 2

Device utilization summary
Selected Device                  : 3s250eft256-4
Number of Slices                 : 1 out of 2448 (0%)
Number of 4 input LUTs           : 2 out of 4896 (0%)
Number of IOs                    : 5
Number of bonded IOBs            : 5 out of 172 (2%)

Timing Summary
Speed Grade                      : -4
Maximum combinational path delay : 6.236 ns

Simulation Result

Synthesis Result

WDDL AND GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name   : wddlgates.ngr
Top Level Output File Name       : wddlgates
Output Format                    : NGC
Optimization Goal                : Speed
Keep Hierarchy                   : NO

Design Statistics
IOs                              : 5

Cell Usage
BELS                             : 2
  LUT3                           : 2
IO Buffers                       : 5
  IBUF                           : 3
  OBUF                           : 2

Device utilization summary
Selected Device                  : 3s250etq144-4
Number of Slices                 : 1 out of 2448 (0%)
Number of 4 input LUTs           : 2 out of 4896 (0%)
Number of IOs                    : 5
Number of bonded IOBs            : 5 out of 108 (4%)

Timing Summary
Speed Grade                      : -4
Maximum combinational path delay : 6.236 ns

Simulation Result

Synthesis Result

WDDL NAND GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name   : wddlnand1.ngr
Top Level Output File Name       : wddlnand1
Output Format                    : NGC
Optimization Goal                : Speed
Keep Hierarchy                   : NO

Design Statistics
IOs                              : 5

Cell Usage
BELS                             : 2
  LUT3                           : 2
IO Buffers                       : 5
  IBUF                           : 3
  OBUF                           : 2

Device utilization summary
Selected Device                  : 3s500efg320-4
Number of Slices                 : 1 out of 4656 (0%)
Number of 4 input LUTs           : 2 out of 9312 (0%)
Number of IOs                    : 5
Number of bonded IOBs            : 5 out of 232 (2%)

Timing Summary
Speed Grade                      : -4
Maximum combinational path delay : 6.236 ns

Simulation Result

Synthesis Result

WDDL XOR GATE

Simulation Result

Synthesis Result

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name   : wddlxorgate.ngr
Top Level Output File Name       : wddlxorgate
Output Format                    : NGC
Optimization Goal                : Speed
Keep Hierarchy                   : NO

Design Statistics
IOs                              : 5

Cell Usage
BELS                             : 2
  LUT3                           : 2
IO Buffers                       : 5
  IBUF                           : 3
  OBUF                           : 2

Device utilization summary
Selected Device                  : 3s250eft256-4
Number of Slices                 : 1 out of 2448 (0%)
Number of 4 input LUTs           : 2 out of 4896 (0%)
Number of IOs                    : 5
Number of bonded IOBs            : 5 out of 172 (2%)

Timing Summary
Speed Grade                      : -4
Maximum combinational path delay : 6.236 ns

Simulation Result

Synthesis Result

CHAPTER 9 SUMMARY AND CONCLUSION

9.1 Summary

In order to secure ICs against side-channel attacks, especially Differential Power Analysis (DPA), it is necessary to implement the design in a logic style that renders constant power dissipation irrespective of the input combination. WDDL has proved advantageous over the other styles and is therefore of great significance. In this dissertation work, an architecture for the Blowfish algorithm is designed and implemented in WDDL style. A bottom-up approach is used in this implementation: the low-level entities are designed first and are later combined to form the entire module. The key scheduling is online. The sub-keys generated for a particular key can be used for the encryption of all the data to be encrypted with that key. The sub-keys are supplied in reverse order for the decryption data path. Initially, logic gates are implemented in WDDL, and then higher-level modules are designed by instantiating the WDDL gates to form the entire module, thus resulting in constant power dissipation irrespective of the input data combination. The entire design works in two phases, namely the pre-charge phase and the evaluation phase. In the pre-charge phase all the signals of the design are zeroed, and during the evaluation phase the functionality of the design is achieved. This sort of design has been found simple and very effective in thwarting the side-channel attack known as Differential Power Analysis (DPA).

9.2 Conclusion

The crypto processor has been designed for a key size of 448 bits and a plaintext size of 64 bits. The code for the implementation has been written in VHDL. Functional verification has been done using the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The back end of the design is done using SOC Encounter. The desired functionality has been achieved according to the specifications. In the output, during the evaluation phase, there has been the same number of transitions, thus resulting in constant power dissipation. During synthesis, it has been observed that a simple WDDL gate comprises many conventional gates. Therefore the area of the design has grown nearly three-fold compared to the design implemented in conventional CMOS logic; this is the cost of the security incorporated into the IC against Differential Power Analysis (DPA). Due to the constant power dissipation at the output, an attacker cannot apply the DPA scheme to find the secret key that is being used in the crypto processor. Thus security against DPA is incorporated into the IC at the hardware level by implementing the design in WDDL style, which is quite simple and effective.



ABSTRACT

Every electronic device needs security, from the smallest RFID tags to larger hand-held devices. Security is needed for financial, medical, consumer, automotive, and other applications. Small embedded integrated circuits (ICs), such as smart cards, are vulnerable to so-called side-channel attacks (SCAs). Side-channel attacks are a class of attacks that derive information from an integrated circuit while it is in operation. The attacker can gain information by monitoring the power consumption, execution time, electromagnetic radiation, and other information leaked by the switching behavior of digital complementary metal-oxide-semiconductor (CMOS) gates. For example, execution times that depend on the values of the data and/or the key reveal what the circuit is doing. Simple timing or power attacks give visual information on the circuit. This project presents a digital very large scale integrated (VLSI) design flow to create secure, power-analysis-attack-resistant ICs. The root cause of the problem is that standard CMOS is power efficient: it only consumes dynamic power when nodes are switching.

The idea is to create digital circuit styles that have a switching behavior independent of the data, or the sequence of the data, that they are processing. A logic style called "Wave Dynamic Differential Logic (WDDL)" is used for the implementation of the basic logic gates, which are used in cryptographic processors. The design flow goes from a normal design in a hardware description language such as VHDL to the side-channel attack (SCA) resistant layout.

Figure WDDL Pre-charge wave generation


CHAPTER 1 INTRODUCTION AND OBJECTIVE

1.1 Introduction

Small embedded integrated circuits (ICs), such as smart cards, are vulnerable to so-called side-channel attacks (SCAs). The attacker can gain information by monitoring the power consumption, execution time, electromagnetic radiation, and other information leaked by the switching behavior of digital complementary metal-oxide-semiconductor (CMOS) gates. This project presents a digital very large scale integrated (VLSI) design flow to create secure, power-analysis-attack-resistant ICs.

The idea is to create digital circuit styles that have a switching behavior independent of the data, or the sequence of the data, that they are processing. A logic style called Wave Dynamic Differential Logic (WDDL) is used for the implementation of the basic logic gates, which are used in cryptographic processors. The design flow goes from a normal design in a hardware description language such as VHDL to the side-channel attack (SCA) resistant layout.

Depending on the parameter considered, side-channel attacks are classified as probing attacks, fault induction attacks, timing attacks, power analysis attacks, electromagnetic analysis attacks, etc. One side-channel attack in particular, namely Differential Power Analysis (DPA), is of great concern. It is very effective in finding the secret key and can be mounted quickly with off-the-shelf devices. The attack is based on the fact that logic operations have power characteristics that depend on the input data. It relies on statistical analysis to extract from the power consumption the information that is correlated to the secret key. As the variations actually originate at the logic level, implementing the encryption and decryption modules in a logic style in which a logic gate has constant power consumption at all times, independently of signal transitions, removes the foundation of DPA and is an effective means to halt it.

1.2 Objective of the Project

The main objectives of this dissertation are:

Study of constant-power logic styles

Description of WDDL Gates

Implementation of WDDL Logic Gates

Verification of the functionality of WDDL Logic Gates

Synthesis of the design

Analysis of the reports obtained during simulation and synthesis


CHAPTER 2 REVIEW OF LITERATURE

2.1 Introduction to Digital Design Flow

A typical digital design flow for any IC is as follows: design entry (specification, architecture, RTL coding, and RTL verification), synthesis and post-synthesis verification, back end (floor planning, place and route, layout), and tape-out to the foundry to obtain the end product. All modern digital designs start with a designer writing a hardware description of the IC in a hardware description language (HDL) such as Verilog or VHDL. A Verilog or VHDL program essentially describes the hardware (logic gates, flip-flops, counters, etc.), the interconnect of the circuit blocks, and the functionality. Various CAD tools are available to synthesize a circuit based on the HDL.

2.2 Secure Digital Design Flow

The secure digital design flow is depicted in Figure 2.1. In addition to the regular steps in an IC design (logic design, logic synthesis, place & route, stream out, and verifications), one can recognize two additional steps, namely 1) "cell substitution" and 2) "interconnect decomposition". These operations have been inserted in the back end of the flow and do not interfere with the creative part of a design, indicated by the "logic design" task.

Figure 2.1 Secure Digital Design Flow

During the cell substitution step, cells that are designed in a constant-power logic style replace the conventional CMOS gates. This ensures the security of the ICs against power analysis attacks.
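The cell substitution step can be pictured as a rewrite of the synthesized netlist. The Python sketch below is a simplified illustration under stated assumptions: the netlist format, the cell names (AND2, WDDL_AND2, and so on), and the `_t`/`_f` net suffixes are all hypothetical and do not correspond to any particular EDA tool or library.

```python
# Hypothetical mapping from conventional cells to compound WDDL cells.
WDDL_CELL = {
    "AND2": "WDDL_AND2",
    "OR2": "WDDL_OR2",
    "XOR2": "WDDL_XOR2",
}


def differential(net: str):
    """Split a single-ended net name into its true and false rails."""
    return (net + "_t", net + "_f")


def substitute_cells(netlist):
    """Replace each conventional gate by its WDDL counterpart with dual-rail nets.

    netlist: list of (cell_type, output_net, [input_nets]) tuples.
    """
    secure = []
    for cell_type, output_net, input_nets in netlist:
        if cell_type not in WDDL_CELL:
            raise KeyError("no WDDL substitute defined for cell " + cell_type)
        secure.append((WDDL_CELL[cell_type],
                       differential(output_net),
                       [differential(net) for net in input_nets]))
    return secure


if __name__ == "__main__":
    design = [("AND2", "n1", ["a", "b"]), ("OR2", "y", ["n1", "c"])]
    for gate in substitute_cells(design):
        print(gate)
```

Because the substitution works on the gate-level netlist, the logic design itself stays unchanged, which matches the statement that the extra steps do not interfere with the logic design task.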


CHAPTER 3 HARDWARE DESCRIPTION LANGUAGE (VHDL)

Why (V)HDL?

Interoperability

Technology independence

Design reuse

Several levels of abstraction

Readability

Standard language

Widely supported

What is VHDL?

VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed Integrated Circuit)

Design specification language

Design entry language

Design simulation language

Design documentation language

An alternative to schematics

Brief History

VHDL was developed in the early 1980s for managing design problems that involved large circuits and multiple teams of engineers

Funded by the US Department of Defense

The first publicly available version was released in 1985

In 1986 the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was presented with a proposal to standardize VHDL

In 1987, standardization => IEEE 1076-1987

An improved version of the language was released in 1994 => IEEE Standard 1076-1993

Related Standards

IEEE 1076 doesn't support simulation conditions such as unknown and high-impedance

Soon after IEEE 1076-1987 was released, simulator companies began using their own non-standard types => VHDL was becoming non-standard

The IEEE 1164 standard was developed by the IEEE

IEEE 1164 contains definitions for a nine-valued data type, std_logic

IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate to actual hardware

Defines, e.g., two numeric types: signed and unsigned

VHDL Environment

Design Units

Segments of VHDL code can be compiled separately and stored in a library

Entities

A black box with interface definition

Defines the inputs/outputs of a component (defines pins)

A way to represent modularity in VHDL

Similar to a symbol in a schematic

The entity declaration describes the entity

E.g.

  library ieee;
  use ieee.std_logic_1164.all;

  entity Comparator is
    port (A, B : in  std_logic_vector(7 downto 0);
          EQ   : out std_logic);
  end Comparator;

Ports

Provide channels of communication between the component and its environment

Each port must have a name direction and a type

An entity may have NO port declaration

Port directions

In: The value of a port can be read inside the component but cannot be assigned. Multiple reads of the port are allowed

Out: Assignments can be made to a port, but data from the port cannot be read. Multiple assignments are allowed

Inout: Bi-directional; assignments can be made and data can be read. Multiple assignments are allowed

Buffer: An out port with read capability. May have at most one assignment (not recommended)

Architectures

Every entity has at least one architecture

One entity can have several architectures

Architectures can describe a design using

– Behavior

– Structure

– Dataflow

Architectures can describe a design on many levels

– Gate level

– RTL (Register Transfer Level)

– Behavioral level

A configuration declaration links an architecture to an entity

E.g.

  architecture Comparator1 of Comparator is
  begin
    EQ <= '1' when (A = B) else '0';
  end Comparator1;

Configurations

Link the entity declaration and the architecture body together

The concept of a default configuration is a bit messy in VHDL '87

– The last architecture analyzed links to the entity

Can be used to change simulation behavior without re-analyzing the VHDL source

Complex configuration declarations are ignored in synthesis

Some entities can have, e.g., a gate-level architecture and a behavioral architecture

Are always optional

Packages

Packages contain information common to many design units

1. Package declaration

– Constant declarations

– Type and subtype declarations

– Function and procedure declarations

– Global signal declarations

– File declarations

– Component declarations

2. Package body

– Is not necessarily needed

– Function bodies

– Procedure bodies

Packages are meant for encapsulating data that can be shared globally among several design units. A package consists of a declaration part and an optional body part.

A package declaration can contain

– Type and subtype declarations

– Subprograms

– Constants

– Alias declarations

– Global signal declarations

– File declarations

– Component declarations

A package body consists of

– Subprogram declarations and bodies

– Type and subtype declarations

– Deferred constants

– File declarations

Libraries

A library is a collection of VHDL design units (a database):

1. Packages
- Package declaration
- Package body
2. Entities (entity declaration)
3. Architectures (architecture body)
4. Configurations (configuration declarations)

A library is usually a directory in the UNIX file system, but it can also be any other kind of database.

Levels of Abstraction

VHDL supports many possible styles of design description, which differ primarily in how closely they relate to the hardware.

It is possible to describe a circuit in a number of ways, listed here in order of increasing level of abstraction:
- Structural
- Dataflow
- Behavioral

Structural VHDL Description

The circuit is described in terms of its components.

The description can range from a low-level description (e.g. a transistor-level description) to a high-level description (e.g. a block diagram).

For large circuits, low-level descriptions quickly become impractical.

Dataflow VHDL Description

The circuit is described in terms of how data moves through the system.

In the dataflow style, you describe how information flows between registers in the system.

The combinational logic is described at a relatively high level, while the placement and operation of registers are specified quite precisely.

The behavior of the system over time is defined by the registers.

There are no built-in registers in the VHDL language:
- Either a lower-level description
- or a behavioral description of sequential elements is needed.

The lower-level register descriptions must be created or obtained.

If there are no third-party models for registers, you must write the behavioral description of the registers yourself.

The behavioral description can be provided in the form of subprograms (functions or procedures).

Behavioral VHDL Description

The circuit is described in terms of its operation over time.

The representation might include, e.g., state diagrams, timing diagrams and algorithmic descriptions.

The concept of time may be expressed precisely using delays (e.g. A <= B after 10 ns).

If no actual delays are used, only the order of sequential operations is defined.

At the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed timing specifications.

The actual timing results depend on the implementation technology and the efficiency of the synthesis tool.

There are a few tools for behavioral synthesis.

Concurrent Vs Sequential

Processes

The process is the basic simulation concept in VHDL.

A VHDL description can always be broken up into interconnected processes.

A process is quite similar to a UNIX process.

Process keyword in VHDL

A process statement is a concurrent statement.

Statements inside a process statement are sequential statements.

A process must contain either a sensitivity list or wait statement(s), but NOT both.

The sensitivity list or the wait statement(s) contain the signals that wake the process up.

General Format

process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;

CHAPTER 4 SMART CARD OVERVIEW

This section very briefly introduces the concept of a smart card. Basically, a smart card is a computer embedded in a safe. It consists of a (typically 8-bit or 32-bit) processor together with ROM, EEPROM and a small amount of RAM, and is therefore capable of performing computations. The main goal of a smart card is to allow the execution of cryptographic operations involving some secret parameter (the key) while not revealing this parameter to the outside world. The goal of the attacker, conversely, is to recover this secret parameter. The processor is embedded in a chip and connected to the outside world through eight wires, whose role, use and position are standardized. In addition to the input/output wires, the parts we will be most interested in are the following:

1. Power supply: Smart cards do not have an internal battery. The current they need is provided by the smart card reader. This makes the smart card's power consumption fairly easy for the attacker to measure.

2. Clock: Similarly, smart cards do not have an internal clock either. The clock ticks must also be provided from the outside world. As a consequence, the attacker can measure the card's running time with very good precision.

Smart cards are usually equipped with protection mechanisms composed of a shield (the passivation layer), whose goal is to hide the internal behavior of the chip, and possibly sensors that react when the shield is removed by destroying all sensitive data and preventing the card from functioning properly.

CHAPTER 5 SIDE CHANNEL ATTACKS

"Side channel attacks" are attacks that are based on "side channel information". Side channel information is information that can be retrieved from the encryption device and that is neither the plaintext to be encrypted nor the ciphertext resulting from the encryption process.

In the past, an encryption device was perceived as a unit that receives plaintext input and produces ciphertext output, and vice versa. Attacks were therefore based on either knowing the ciphertext (ciphertext-only attacks), knowing both the plaintext and the ciphertext (known plaintext attacks), or being able to define what plaintext is to be encrypted and then seeing the results of the encryption (chosen plaintext attacks). Today, it is known that encryption devices have additional outputs, and often additional inputs, which are neither the plaintext nor the ciphertext.

Encryption devices produce timing information (information about the time that operations take) that is easily measurable, radiation of various sorts, power consumption statistics (which can be easily measured as well), and more. Often the encryption device also has additional "unintentional" inputs, such as the supply voltage, that can be modified to cause predictable outcomes. Side channel attacks make use of some or all of this information, along with other (known) cryptanalytic techniques, to recover the key the device is using.

Side channel analysis techniques are of concern because the attacks can be mounted quickly and can sometimes be implemented using readily available hardware costing from only a few hundred dollars to thousands of dollars.

5.1 Classification of side channel attacks

The literature usually classifies side channel attacks along two orthogonal axes.

1. Invasive vs. non-invasive

Invasive attacks require de-packaging the chip to get direct access to its components. A typical example is connecting a wire to a data bus to observe the data transfers.

A non-invasive attack exploits only externally available information (the emission of which is, however, often unintentional), such as running time and power consumption.

A further category, semi-invasive attacks, has also been distinguished. These attacks require de-packaging the chip to get access to its surface, but they do not tamper with the passivation layer (they do not require electrical contact with the metal surface).

2. Active vs. passive

Active attacks try to tamper with the card's proper functioning. For example, fault induction attacks try to induce errors in the computation.

Passive attacks, by contrast, simply observe the card's behavior during its processing without disturbing it.

Note that these two axes are indeed orthogonal.

An invasive attack may completely avoid disturbing the card's behavior, and a passive attack may require a preliminary de-packaging for the required information to be observable. These attacks are of course not mutually exclusive: an invasive attack may, for example, serve as a preliminary step for a non-invasive one by giving a detailed description of the chip's architecture that helps to find out where to put external probes.

Smart cards are usually equipped with protection mechanisms that are supposed to react to invasive attacks (although several invasive attacks are nonetheless capable of defeating these mechanisms, as will be illustrated below). On the other hand, it is worth pointing out that a non-invasive attack is completely undetectable: there is, for example, no way for a smart card to figure out that its running time is currently being measured. Other countermeasures are therefore necessary. From an economic point of view, invasive attacks are usually more expensive to deploy on a large scale, since they require individual processing of each attacked device. In this sense, non-invasive attacks constitute a bigger menace for the smart card industry.

Invasive attacks involve a relatively high capital investment for lab equipment plus a moderate investment of effort for each individual chip attacked. Non-invasive attacks require only a moderate capital investment plus a moderate investment of effort in designing an attack on a particular type of device; thereafter, the cost per device attacked is low. Semi-invasive attacks can be carried out using very cheap and simple equipment.

The attacker can gain information by means of:

1. Probing attacks
2. Fault induction attacks
3. Timing attacks
4. Power analysis attacks
5. Electromagnetic timing attacks

These attacks exploit the switching behavior of digital complementary metal-oxide-semiconductor (CMOS) gates. Of all these, the power analysis attack is of major concern.

5.2 Power analysis attacks

The power consumption of a cryptographic device may provide much information about the operations that take place and the parameters involved. This is the idea behind simple and differential power analysis, first introduced by Kocher et al. Like the clock ticks, the card's energy is provided by the terminal and can therefore easily be measured. Basically, to measure a circuit's power consumption, a small (e.g. 50 ohm) resistor is inserted in series with the power or ground input. The voltage difference across the resistor divided by the resistance yields the current. Well-equipped electronics labs have equipment that can digitally sample voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than 1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC can be bought for less than US$ 400.
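The measurement arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the function names, the 5 V supply and the sample voltage drops are assumptions chosen for the example, not values taken from an actual measurement setup.

```python
def current_from_shunt(v_drop, r_shunt=50.0):
    """Current (A) through the series resistor, from Ohm's law: I = V / R."""
    return v_drop / r_shunt


def supply_power(v_drop, vdd=5.0, r_shunt=50.0):
    """Approximate instantaneous power (W) drawn from the supply: P = Vdd * I."""
    return vdd * current_from_shunt(v_drop, r_shunt)


def samples_in_trace(duration_s, sample_rate_hz):
    """Number of points captured when sampling for duration_s seconds."""
    return round(duration_s * sample_rate_hz)


# Hypothetical voltage drops (in volts) sampled across a 50 ohm resistor.
v_drops = [0.10, 0.25, 0.15]
currents = [current_from_shunt(v) for v in v_drops]   # amperes
powers = [supply_power(v) for v in v_drops]           # watts
points = samples_in_trace(1e-3, 20e6)                 # 1 ms captured at 20 MHz
```

With these assumed numbers, a 0.10 V drop corresponds to a current of 2 mA, and a 1 ms capture at 20 MHz yields 20000 points per trace.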

Power analysis attacks are of two types:

1. Simple Power Analysis (SPA) attack
2. Differential Power Analysis (DPA) attack

SPA attacks on smart cards typically take a few seconds per card, while DPA attacks can take several hours. In general, from a somewhat academic perspective, we may consider the entire internal state of the block cipher to be all the intermediate results and values that are never included in the output in normal operation. For example, DES has 16 rounds; we can consider the intermediate states state[1..15] after each round except the last as a secret internal state. Side channels typically give information about these internal states, or about the operations used in the transition of this internal state from one round to another. The type of side channel will of course determine what information is available to the attacker about these states. The attacks typically work by finding some information about the internal state of the cipher that can be learned both by guessing part of the key and checking the value directly, and additionally by some statistical property of the cipher that makes that checkable value slightly nonrandom.

5.2.1 Simple Power Analysis attack (SPA)

Simple Power Analysis is a technique that involves direct interpretation, usually by visual inspection, of power consumption measurements collected during cryptographic operations. SPA can yield information about a device's operation as well as key material.

A trace refers to a set of power consumption measurements taken across a cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a trace containing 5000 points. The figure, for example, shows an SPA trace from a smart card performing a DES operation.

Figure: SPA monitoring from a single DES operation performed by a typical smart card. The upper trace shows the entire encryption operation, including the initial permutation, the 16 DES rounds and the final permutation. The lower trace is a detailed view of the second and third rounds.

Because SPA can reveal the sequence of instructions executed, it can be used to break cryptographic implementations in which the execution path depends on the data being processed. For example:

DES key schedule: The DES key schedule computation involves rotating 28-bit key registers. A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will contain different SPA features if the execution paths take different branches for each.

DES permutations: DES implementations perform a variety of bit permutations. Conditional branching in software or microcode can cause significant power consumption differences for "0" and "1" bits.

Comparisons: String or memory comparison operations typically perform a conditional branch when a mismatch is found. This conditional branching causes large SPA (and sometimes timing) characteristics.

Multipliers: Modular multiplication circuits tend to leak a great deal of information about the data they process. The leakage functions depend on the multiplier design, but are often strongly correlated to operand values and Hamming weights.

Exponentiators: A simple modular exponentiation function scans across the exponent, performing a squaring operation in every iteration with an additional multiplication operation for each exponent bit that is equal to "1". The exponent can be compromised if squaring and multiplication operations have different power consumption characteristics, take different amounts of time, or are separated by different code. Modular exponentiation functions that operate on two or more exponent bits at a time may have more complex leakage functions.
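The exponentiator case can be illustrated with a short Python sketch. It assumes an idealized observer who can tell a squaring ("S") from a multiplication ("M") perfectly; the function names and the small numbers are hypothetical and chosen only for the example.

```python
def modexp_with_trace(base, exponent, modulus):
    """Left-to-right square-and-multiply that records its operation sequence."""
    result = 1
    ops = []
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus      # squaring in every iteration
        ops.append("S")
        if bit == "1":
            result = (result * base) % modulus    # extra multiply for a 1 bit
            ops.append("M")
    return result, "".join(ops)


def exponent_from_ops(ops):
    """Recover the exponent from the S/M sequence, as an SPA observer might."""
    bits = []
    i = 0
    while i < len(ops):
        # Each iteration starts with "S"; a following "M" marks a 1 bit.
        if i + 1 < len(ops) and ops[i + 1] == "M":
            bits.append("1")
            i += 2
        else:
            bits.append("0")
            i += 1
    return int("".join(bits), 2)


value, ops = modexp_with_trace(7, 0b101101, 1009)
recovered_exponent = exponent_from_ops(ops)
```

Here the recorded sequence is enough to rebuild the whole exponent, which is why implementations try to make the two operations indistinguishable or to execute them unconditionally.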

5.2.2 Differential Power Analysis attack (DPA)

In addition to large-scale power variations due to the instruction sequence, there are effects correlated to the data values being manipulated. These variations tend to be smaller and are sometimes overshadowed by measurement errors and other noise. In such cases, it is still often possible to break the system using statistical functions tailored to the target algorithm.

To implement the DPA attack, an attacker first observes m encryption operations and captures power traces T_1..m[1..k] containing k samples each. In addition, the attacker records the ciphertexts C_1..m. No knowledge of the plaintext is required. DPA uses the power consumption measurements to determine whether a key block guess K_s is correct. A selection function D(C_i, b, K_s) computes, from the ciphertext C_i and the guess K_s, the predicted value V of a certain intermediate target bit b. The attacker computes a k-sample differential trace Δ_D[1..k] by finding the difference between the average of the traces for which V is one and the average of the traces for which V is zero. Thus Δ_D[j] is the average over C_1..m of the effect due to the value represented by the selection function D on the power consumption at point j. In particular,

\[
\Delta_D[j] = \frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}{\sum_{i=1}^{m} D(C_i, b, K_s)} - \frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)}
\]

If K_s is incorrect, the bit computed using D will differ from the actual target bit for about half of the ciphertexts C_i. The selection function is thus effectively uncorrelated to what was actually computed by the target device. If a random function is used to divide a set into two subsets, the difference in the averages of the subsets should approach zero as the subset sizes approach infinity. Thus, if K_s is incorrect, Δ_D[j] approaches zero as m grows, because trace components uncorrelated to D diminish with 1/√m, causing the differential trace to become flat (the actual trace may not be completely flat, as D with K_s incorrect may have a weak correlation to D with the correct K_s).

If K_s is correct, however, the computed value for D(C_i, b, K_s) will equal the actual value of target bit b with probability 1. The selection function is thus correlated to the value of the bit considered. Other data values, measurement errors, etc. that are not correlated to D approach zero. Because power consumption is correlated to data bit values, the plot of Δ_D will be flat with spikes in regions where D is correlated to the values being processed. The correct value of K_s can thus be identified from the spikes in its differential trace. Four values of b correspond to each S-box, providing confirmation of key block guesses. Finding all eight K_s yields the entire 48-bit round subkey. The remaining 8 key bits can be found easily using exhaustive search or by analyzing one additional round. Triple DES keys can be found by analyzing an outer DES operation first, using the resulting key to decrypt the ciphertexts, and attacking the next DES key. DPA can use known plaintext or known ciphertext and can find encryption or decryption keys.
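The difference-of-means procedure can be tried out on simulated data. The Python sketch below is a toy model under stated assumptions, not an attack on DES: it uses a hypothetical 4-bit substitution table in place of a DES S-box, a 4-bit key block, known input data instead of ciphertexts, and Gaussian noise with one data-dependent sample per trace. All names and parameters are illustrative.

```python
import random

# Hypothetical 4-bit substitution table standing in for a DES S-box.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

K_SAMPLES = 8    # samples per trace (k)
LEAK_INDEX = 3   # sample at which the target bit influences the power
TARGET_BIT = 3   # bit b of the substitution output used for selection


def selection(data, key_guess):
    """Selection function D: predicted value of the target bit."""
    return (SBOX[data ^ key_guess] >> TARGET_BIT) & 1


def simulate_traces(secret_key, m, rng, noise=1.0):
    """Return m (data, trace) pairs from a simulated leaky device."""
    observations = []
    for _ in range(m):
        data = rng.randrange(16)
        trace = [rng.gauss(0.0, noise) for _ in range(K_SAMPLES)]
        trace[LEAK_INDEX] += selection(data, secret_key)  # data-dependent power
        observations.append((data, trace))
    return observations


def differential_trace(observations, key_guess):
    """Average of the traces with D = 1 minus average of the traces with D = 0."""
    sums = {0: [0.0] * K_SAMPLES, 1: [0.0] * K_SAMPLES}
    counts = {0: 0, 1: 0}
    for data, trace in observations:
        d = selection(data, key_guess)
        counts[d] += 1
        for j, sample in enumerate(trace):
            sums[d][j] += sample
    return [sums[1][j] / counts[1] - sums[0][j] / counts[0]
            for j in range(K_SAMPLES)]


def recover_key(observations):
    """Pick the key guess whose differential trace shows the largest spike."""
    def peak(guess):
        return max(abs(x) for x in differential_trace(observations, guess))
    return max(range(16), key=peak)


rng = random.Random(2004)
obs = simulate_traces(secret_key=0xA, m=2000, rng=rng)
recovered = recover_key(obs)
```

In this toy setting the differential trace of the correct guess shows a spike of roughly the leakage amplitude at the leaking sample and stays near zero elsewhere, while wrong guesses produce noticeably smaller spikes.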


CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES

The power consumption of traditional standard cells and logic depends on the signal activity. When the output of a logic gate makes a 0 to 1 transition, a current flows from the power supply and charges the output capacitance. On the other hand, when the output sees a 1 to 0, a 0 to 0 or a 1 to 1 transition, no energy or only a limited amount of energy (due to short circuit or leakage) is consumed from the power supply. This is the fundamental reason why information is leaked through the power supply and why power attacks are possible. The basis of a secure digital design flow is a logic style with constant power consumption.
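The asymmetry between the four output transitions can be written down as a first-order model. The Python sketch below assumes an ideal static CMOS gate in which only a 0 to 1 transition draws C·Vdd² from the supply and in which short-circuit and leakage currents are ignored; the load capacitance and supply voltage are hypothetical example values.

```python
def supply_energy(prev_out, next_out, c_load=10e-15, vdd=1.8):
    """Energy (J) drawn from the supply for one output transition.

    First-order model: only a 0 -> 1 transition charges the load
    capacitance, drawing C * Vdd^2 from the supply. Short-circuit and
    leakage currents are ignored.
    """
    if prev_out == 0 and next_out == 1:
        return c_load * vdd ** 2
    return 0.0


def energy_trace(outputs):
    """Energy drawn at each step of a sequence of output values."""
    return [supply_energy(a, b) for a, b in zip(outputs, outputs[1:])]


# Output sequence 0,1,1,0,0,1 covers all four kinds of transition.
trace = energy_trace([0, 1, 1, 0, 0, 1])
```

An observer of the supply sees energy drawn only at the first and last steps of this sequence, so the energy pattern reveals when the output rose from 0 to 1.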

6.1 Current Mode Logic

Current mode logic (CML), e.g. current steering logic, seems the ideal solution. This type of logic continuously draws a current from the supply and measures its state through the path that the current takes. A gate has constant power consumption if it draws a perfectly constant current from the power supply, independently of the input and output signals. To build a current source capable of generating a constant current, special circuit techniques that minimize channel length modulation have to be used.

The decisive drawback of CML, however, is its static power consumption. Even when the logic gate is not processing any data, it burns the current, which makes this logic style unacceptable for embedded battery-operated devices.

6.2 Voltage Mode Logic (CMOS circuit styles)

Voltage mode logic (VML), e.g. static CMOS logic, only draws a current from the supply to change state and measures its state by the amount of charge it stores on a capacitance. A regular standard CMOS circuit only consumes power when a capacitance gets charged and later discharged, i.e. when a gate switches state. This is the main reason that CMOS is the style of choice for every battery-operated or low-power device. This is illustrated in the figure below for a simple inverter. Thus, static CMOS is the preferred logic style because of its low power consumption and high noise margins.

Figure: Standard CMOS inverter

Yet two conditions must be satisfied for VML to have constant power consumption, namely:

1) A logic gate must have exactly one switching event per signal transition.
2) The logic gate must charge a constant capacitance in that switching event.

In the CMOS inverter above, all four transitions can be distinguished when monitoring the power supply.

6.3 Dynamic Differential Logic

Dynamic differential logic, sometimes also referred to as dual rail with pre-charge logic, fulfills the first condition. A differential logic family uses the true and the false representation of the input and output signals, and a dynamic logic family alternates pre-charge and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1, exactly one of the two output nodes evaluates to 0 in the evaluation phase to give a differential output signal. The discharged output node is charged to 1 in the following pre-charge phase, so that both outputs are again pre-charged to 1. In other words, every signal transition, including the events in which the input signals remain constant, is represented by an actual switching event in which the logic gate charges a capacitance. The logic families that have been introduced to thwart differential power analysis (DPA) by using dynamic differential logic include the following:

1. Sense Amplifier Based Logic (SABL)
2. Wave Dynamic Differential Logic (WDDL) gates
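The claim that every cycle contains exactly one charging event, whatever the data, can be checked with a small behavioral model. The Python sketch below is an idealization that only tracks the logic levels of the two output nodes; the function names are hypothetical.

```python
def evaluate(value):
    """Evaluation phase: both nodes start at 1 and exactly one discharges.

    Returns the levels of the (true node, false node) after evaluation.
    """
    return (1, 0) if value else (0, 1)


def charges_per_cycle(values):
    """Number of nodes recharged in the pre-charge phase after each evaluation."""
    counts = []
    for value in values:
        true_node, false_node = evaluate(value)
        # Pre-charge drives both nodes back to 1; every node left at 0 is charged.
        counts.append((1 - true_node) + (1 - false_node))
    return counts


# The data sequence includes repeated values (0,0 and 1,1) and both transitions.
counts = charges_per_cycle([0, 0, 1, 1, 0, 1])
```

The count is 1 in every cycle, including the cycles in which the data value does not change, which is the first of the two conditions for constant power consumption.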

6.3.1 Sense Amplifier Based Logic (SABL)

The main advantage of SABL is that it has balanced input and output nodes and that all internal nodes connect to an output. The output capacitances can be balanced. Systematic methods have been developed to make sure that both branches of the differential pull-down network are balanced and that no memory effects are present in the network. Sense Amplifier Based Logic is illustrated in the following figure.

Figure: Sense Amplifier Based Logic AND/NAND gate

This circuit style does, however, require full custom characterization and layout. It also suffers from a high clock load, common to all dynamic logic gates.

6.3.2 Wave Dynamic Differential Logic Gates (WDDL)

WDDL can be implemented with static CMOS logic. Static CMOS standard cells are combined to form secure compound standard cells, which have a reduced power signature. WDDL has many advantages. It can be readily implemented from an existing standard cell library. The design flow is fully supported with accurate EDA library files that come directly from the vendor. WDDL also results in a dynamic differential logic with only a small load capacitance on the pre-charge control signal, and with the low power consumption and the high noise margins of static CMOS.

The advantages of the WDDL logic style are as follows:

A major advantage of the proposed logic style is that it can be incorporated into the common Electronic Design Automation (EDA) tool flow.

No special design rules are involved in the interconnection of WDDL gates.

The switching factor of WDDL is 100%. A WDDL gate consists of a parallel combination of two positive complementary gates, one calculating the true output using the true inputs, the other the false output using the false inputs. A positive gate produces a zero output for an all-zero input. The AND gate and the OR gate are examples of positive gates. A complementary gate, sometimes also referred to as a dual gate, expresses the false output of the original logic gate using the false inputs of the original gate. The AND gate fed with true input signals and the OR gate fed with false input signals are two dual gates. The figure shows the WDDL AND gate and the WDDL OR gate. In the evaluation phase, each input signal is differential and the WDDL gate calculates its differential output. In the pre-charge phase, the inputs to the WDDL gate are set to 0. This puts the output of the gate at 0. A module in WDDL pre-charges without distributing the pre-charge signal to each individual gate. During the pre-charge phase, the input vector of the combinatorial logic is set to all 0s. Each individual gate will eventually have all its inputs at 0, evaluate its output to 0 and pass this 0 value to the next gate. One could say that the pre-charge signal travels over the combinatorial logic as a 0-wave, hence WDDL. There are several ways to launch the pre-charge wave. In the figure, a pre-charge operator is inserted at the start of every combinatorial logic tree, i.e. at the inputs of the encryption module and at the outputs of the registers. These operators produce an all-zero output in the pre-charge phase (clk signal high) but let the differential signal through during the evaluation phase (clk signal low).
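The gate construction and the 0-wave can be modelled behaviorally. The Python sketch below represents each WDDL signal as a (true, false) pair of rails and builds a small example network, (a AND b) OR c, which is an assumption made for illustration and not a circuit from this report. All function names are hypothetical.

```python
def wddl_and(a, b):
    """WDDL AND: AND gate on the true rails, OR gate on the false rails."""
    (at, af), (bt, bf) = a, b
    return (at & bt, af | bf)


def wddl_or(a, b):
    """WDDL OR: OR gate on the true rails, AND gate on the false rails."""
    (at, af), (bt, bf) = a, b
    return (at | bt, af & bf)


def precharge_operator(value, clk):
    """All-zero output while clk is high, differential pair while clk is low."""
    if clk:
        return (0, 0)
    return (value, 1 - value)


def network(a, b, c, clk):
    """(a AND b) OR c built from WDDL gates, with pre-charge at the inputs only."""
    a, b, c = (precharge_operator(x, clk) for x in (a, b, c))
    n1 = wddl_and(a, b)       # internal node
    out = wddl_or(n1, c)      # output node
    return n1, out


precharged = network(1, 1, 0, clk=1)   # every node is (0, 0)
evaluated = network(1, 1, 0, clk=0)    # every node has exactly one rail at 1
```

Because AND and OR are positive gates, zeros applied at the inputs propagate to every node without a pre-charge signal at each gate, and in the evaluation phase exactly one rail of every node rises, independently of the data.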

Figure: WDDL pre-charge wave generation

CHAPTER 7 WDDL GATES

The methodology used in the project is the bottom-up approach. Lower modules are designed and later integrated to form larger modules, whose further integration leads to the final top module. Since logic gates form the lowest-level modules, the logic gates required for the design are first implemented in WDDL style. WDDL demands a parallel combination of two positive complementary gates, one calculating the true value and the other the false (complementary) value. Logic gates such as OR, AND and XOR have been implemented. In addition, a full adder, a 32-bit XOR, etc. have been implemented.

7.1 WDDL OR gate

A WDDL OR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.

Figure 4.1 WDDL OR Gate

The NAND and NOT gates used act as the precharge operator, injecting the signal '0' into the gates when the 'clk' signal is high, i.e. during the precharge phase.

7.2 WDDL AND gate

A WDDL AND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.

Figure 4.2 WDDL AND Gate

7.3 WDDL NAND gate

A WDDL NAND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.

Figure 4.3 WDDL NAND Gate

7.4 WDDL NOR gate

A WDDL NOR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.

Figure 4.4 WDDL NOR Gate

7.5 WDDL XOR gate

The XOR function can be implemented by the Boolean equation A ⊕ B = AB' + A'B. Therefore, the XOR gate is implemented in terms of AND gates and OR gates, as shown in Figure 4.5. It can also be implemented by instantiating a WDDL AND gate and a WDDL OR gate, but the number of gates involved in the latter method is greater than in the former. Therefore, the first method of implementation is followed rather than the second one.
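One way to read the equation above in dual-rail form is sketched below in Python. This is an assumption about how the AND/OR structure could be arranged, not a transcription of the figure: the true output computes AB' + A'B on the rails, and the false output uses the dual expression (A + B')(A' + B). The function names are hypothetical.

```python
def encode(bit, clk):
    """Precharge operator: (0, 0) while clk is high, (bit, not bit) otherwise."""
    return (0, 0) if clk else (bit, 1 - bit)


def wddl_xor(a, b):
    """WDDL XOR on (true, false) rail pairs, using only AND and OR gates."""
    (at, af), (bt, bf) = a, b
    true_out = (at & bf) | (af & bt)     # AB' + A'B
    false_out = (at | bf) & (af | bt)    # dual gate: (A + B')(A' + B)
    return (true_out, false_out)


# Evaluation phase (clk low) for all input combinations.
table = {(a, b): wddl_xor(encode(a, 0), encode(b, 0))
         for a in (0, 1) for b in (0, 1)}
# Precharge phase (clk high): both rails are zero regardless of the data.
idle = wddl_xor(encode(1, 1), encode(0, 1))
```

Since the inverted inputs are taken from the complementary rails, no inverter is needed, both halves remain positive gates, and an all-zero input still produces an all-zero output in the precharge phase.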

Figure 4.5 WDDL XOR gate

With the help of the above basic gates, a full adder circuit has been designed by instantiating the WDDL gates designed above. During the implementation of the Blowfish algorithm, a 32-bit XOR gate and a 32-bit adder circuit are required. They can be easily implemented by instantiating the corresponding lower module 32 times.

CHAPTER 8 FRONT END RESULTS

WDDL OR GATE

Synthesis Report

=========================================================
Final Report
=========================================================
Final Results
RTL Top Level Output File Name   : wddlor.ngr
Top Level Output File Name       : wddlor
Output Format                    : NGC
Optimization Goal                : Speed
Keep Hierarchy                   : NO

Design Statistics
IOs                              : 5

Cell Usage
BELS                             : 2
  LUT3                           : 2
IO Buffers                       : 5
  IBUF                           : 3
  OBUF                           : 2
=========================================================
Device utilization summary
---------------------------
Selected Device                  : 3s250eft256-4
Number of Slices                 : 1 out of 2448   0%
Number of 4 input LUTs           : 2 out of 4896   0%
Number of IOs                    : 5
Number of bonded IOBs            : 5 out of 172    2%

Timing Summary
---------------
Speed Grade                      : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result

WDDL AND GATE

Synthesis Report

=========================================================
Final Report
=========================================================
Final Results
RTL Top Level Output File Name   : wddlgates.ngr
Top Level Output File Name       : wddlgates
Output Format                    : NGC
Optimization Goal                : Speed
Keep Hierarchy                   : NO

Design Statistics
IOs                              : 5

Cell Usage
BELS                             : 2
  LUT3                           : 2
IO Buffers                       : 5
  IBUF                           : 3
  OBUF                           : 2
=========================================================
Device utilization summary
---------------------------
Selected Device                  : 3s250etq144-4
Number of Slices                 : 1 out of 2448   0%
Number of 4 input LUTs           : 2 out of 4896   0%
Number of IOs                    : 5
Number of bonded IOBs            : 5 out of 108    4%

Timing Summary
---------------
Speed Grade                      : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result

WDDL NAND GATE

Synthesis Report

=========================================================
Final Report
=========================================================
Final Results
RTL Top Level Output File Name   : wddlnand1.ngr
Top Level Output File Name       : wddlnand1
Output Format                    : NGC
Optimization Goal                : Speed
Keep Hierarchy                   : NO

Design Statistics
IOs                              : 5

Cell Usage
BELS                             : 2
  LUT3                           : 2
IO Buffers                       : 5
  IBUF                           : 3
  OBUF                           : 2
=========================================================
Device utilization summary
---------------------------
Selected Device                  : 3s500efg320-4
Number of Slices                 : 1 out of 4656   0%
Number of 4 input LUTs           : 2 out of 9312   0%
Number of IOs                    : 5
Number of bonded IOBs            : 5 out of 232    2%

Timing Summary
---------------
Speed Grade                      : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result

WDDL XOR GATE

Simulation Result

Synthesis Result

WDDL XOR GATE

Synthesis Report

=========================================================
Final Report
=========================================================
Final Results
RTL Top Level Output File Name   : wddlxorgate.ngr
Top Level Output File Name       : wddlxorgate
Output Format                    : NGC
Optimization Goal                : Speed
Keep Hierarchy                   : NO

Design Statistics
IOs                              : 5

Cell Usage
BELS                             : 2
  LUT3                           : 2
IO Buffers                       : 5
  IBUF                           : 3
  OBUF                           : 2
=========================================================
Device utilization summary
---------------------------
Selected Device                  : 3s250eft256-4
Number of Slices                 : 1 out of 2448   0%
Number of 4 input LUTs           : 2 out of 4896   0%
Number of IOs                    : 5
Number of bonded IOBs            : 5 out of 172    2%

Timing Summary
---------------
Speed Grade                      : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result

CHAPTER 9 SUMMARY AND CONCLUSION

9.1 Summary

In order to secure ICs against side-channel attacks, especially Differential Power Analysis (DPA), it is necessary to implement the design in a logic style that gives constant power dissipation irrespective of the input combination. WDDL has proved to be advantageous over other logic styles and is therefore of great significance. In this dissertation work, an architecture for the Blowfish algorithm is designed and implemented in WDDL style. In this implementation, the bottom-up approach is used: the low-level entities are designed and later combined to form the entire module. The key scheduling is online. The sub-keys generated for a particular key can be used for the encryption of all the data to be encrypted with that key. The sub-keys are applied in reverse order for the decryption data path. Initially, logic gates are implemented in WDDL, and then higher modules are designed by instantiating the WDDL gates to form the entire module, thus resulting in constant power dissipation irrespective of the input data combination. The entire design works in two phases, namely the Precharge phase and the Evaluation phase. In the Precharge phase, all the signals of the design are zeroed, and during the Evaluation phase the functionality of the design is achieved. This sort of design has been found simple and very effective in thwarting the side-channel attack known as Differential Power Analysis (DPA).

9.2 Conclusion

The crypto processor has been designed for a key size of 448 bits and a plaintext of 64 bits. The code for the implementation has been written in VHDL. The functional verification has been done using the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The backend of the design is done using SOC Encounter.

According to the specifications, the desired functionality has been achieved. At the output, during the Evaluation phase, the number of transitions is always the same, thus resulting in constant power dissipation. During synthesis, it has been observed that a simple WDDL gate comprises many conventional gates. Therefore, the area of the design has grown nearly three-fold compared to the design implemented in conventional CMOS logic; this is the cost of the security incorporated into the IC against Differential Power Analysis (DPA). Due to the constant power dissipation at the output, a hacker cannot apply the Differential Power Analysis (DPA) scheme to find the secret key that is being used in the crypto processor. Thus, security against DPA is incorporated into the IC at the hardware level by implementing the design in WDDL style, which is quite simple and effective.

CHAPTER 10

REFERENCES

10.1 Referred Technical Papers

[1] Kris Tiri, Member, IEEE, and Ingrid Verbauwhede, Senior Member, IEEE, "A Digital Design Flow for Secure Integrated Circuits", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 25, No. 7, July 2006.
[2] Prof. Jean-Jacques Quisquater, Math RiZK, «Side Channel Attacks», October 2002.
[3] Ross Anderson, Mike Bond, Jolyon Clulow and Sergei Skorobogatov, "Crypto processors - A Survey", IEEE Proceedings.
[4] Noohul Basheer Zain Ali, James M. Noras, "Optimal Data path Design for a Cryptographic Processor: The Blowfish Algorithm", Malaysian Journal of Computer Science, Vol. 14, No. 1, June 2001, pp. 16-27.
[5] Dan Rinehimer, Derek Wilson, "Summary of B. Schneier's Description of New Variable Length Key 64-Bit Block Cipher (Blowfish)".
[6] Kris Tiri and Ingrid Verbauwhede, "A Dynamic and Differential CMOS Logic Style to Resist Power and Timing Attacks on Security IC's".
[7] Discretix Technologies Limited, "Introduction to Side Channel Attacks".
[8] Kris Tiri, Moonmoon Akmal and Ingrid Verbauwhede, "A Dynamic and Differential Logic with Signal Independent Power Consumption to withstand Differential Power Analysis on Smart Cards".

Reference Books

[1] William Stallings, "Cryptography and Network Security: Principles and Practices", Pearson Education, 2003.
[2] Samir Palnitkar, "Verilog HDL: A Guide to Digital Design and Synthesis", Prentice Hall.

Referred Web Pages

[1] http://www.schneier.com/blowfish.html
[2] http://www.nist.gov
[3] http://www.discretix.com/PDF/Introduction%20to%20Side%20Channel%20Attacks.pdf
[4] http://www.wipo.int/pctdb/en/wo.jsp?IA=WO2005081085&DISPLAY=CLAIMS


CHAPTER 1 INTRODUCTION AND OBJECTIVE

1.1 Introduction

Small embedded integrated circuits (ICs) such as smart cards are vulnerable to so-called side-channel attacks (SCAs). The attacker can gain information by monitoring the power consumption, execution time, electromagnetic radiation, and other information leaked by the switching behavior of digital complementary metal-oxide-semiconductor (CMOS) gates. This project presents a digital very large scale integrated (VLSI) design flow to create secure, power-analysis-attack-resistant ICs.

The idea is to create digital circuit styles whose switching behavior is independent of the data, or sequence of data, that they are processing. A logic style called Wave Dynamic Differential Logic (WDDL) is used to implement the basic logic gates used in cryptographic processors. The design flow leads from a normal design in a hardware description language such as VHDL to a layout that is resistant to side-channel attacks.

Depending on the parameter considered, side-channel attacks are classified as probing attacks, fault induction attacks, timing attacks, power analysis attacks, electromagnetic analysis attacks, etc. One side-channel attack in particular, namely Differential Power Analysis (DPA), is of great concern. It is very effective in finding the secret key and can be mounted quickly with off-the-shelf devices. The attack is based on the fact that logic operations have power characteristics that depend on the input data. It relies on statistical analysis to extract, from the power consumption, the information that is correlated to the secret key. As the variations originate at the logic level, implementing the encryption and decryption modules in a logic style in which every logic gate has constant power consumption at all times, independent of signal transitions, removes the foundation of DPA and is an effective means to halt it.

1.2 Objective of the Project

The main objectives of this dissertation are:

Study of constant-power logic styles

Description of WDDL Gates

Implementation of WDDL Logic Gates

Verification of the functionality of WDDL Logic Gates

Synthesis of the design

Analysis of the reports obtained during simulation and synthesis


CHAPTER 2 REVIEW OF LITERATURE

2.1 Introduction to Digital Design Flow

A typical digital design flow for any IC is as follows: design entry (specification, architecture, RTL coding, and RTL verification); synthesis and post-synthesis verification; back end (floor planning, place and route, layout); and tape-out to the foundry to get the end product. All modern digital designs start with a designer writing a hardware description of the IC in a hardware description language (HDL) such as Verilog or VHDL. A Verilog or VHDL program essentially describes the hardware (logic gates, flip-flops, counters, etc.), the interconnection of the circuit blocks, and the functionality. Various CAD tools are available to synthesize a circuit from the HDL description.

2.2 Secure Digital Design Flow

The secure digital design flow is depicted in Figure 2.1. In addition to the regular steps in an IC design (logic design, logic synthesis, place & route, stream out, and verification), one can recognize two additional steps, namely 1) "cell substitution" and 2) "interconnect decomposition". These operations have been inserted in the back end of the flow and do not interfere with the creative part of a design, indicated by the "logic design" task.

Figure 2.1 Secure Digital Design Flow

During the cell substitution step, cells designed in a constant-power logic style replace the conventional CMOS gates. This ensures the security of the ICs against power analysis attacks.

CHAPTER 3 HARDWARE DESCRIPTION LANGUAGE (VHDL)

Why (V)HDL?

Interoperability

Technology independence

Design reuse

Several levels of abstraction

Readability

Standard language

Widely supported

What is VHDL?

VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed Integrated Circuit)

Design specification language

Design entry language

Design simulation language

Design documentation language

An alternative to schematics

Brief History

VHDL was developed in the early 1980s for managing design problems that involved large circuits and multiple teams of engineers

Funded by the US Department of Defense

The first publicly available version was released in 1985

In 1986 the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was presented with a proposal to standardize VHDL

In 1987 standardization => IEEE 1076-1987

An improved version of the language was released in 1994 => IEEE Standard 1076-1993

Related Standards

IEEE 1076 doesn't support simulation conditions such as unknown and high-impedance

Soon after IEEE 1076-1987 was released, simulator companies began using their own non-standard types => VHDL was becoming nonstandard

The IEEE 1164 standard was developed

IEEE 1164 contains definitions for a nine-valued data type, std_logic

IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate to actual hardware

Defines e.g. two numeric types: signed and unsigned

VHDL Environment


Design Units

Segments of VHDL code can be compiled separately and stored in a library

Entities

A black box with interface definition

Defines the inputs/outputs of a component (defines the pins)

A way to represent modularity in VHDL

Similar to a symbol in a schematic

The entity declaration describes the entity

E.g.

library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;

Ports

Provide channels of communication between the component and its environment

Each port must have a name, a direction, and a type

An entity may have no port declaration

Port directions:

In: The value of the port can be read inside the component but cannot be assigned. Multiple reads of the port are allowed

Out: Assignments can be made to the port, but data from the port cannot be read. Multiple assignments are allowed

Inout: Bi-directional; assignments can be made and data can be read. Multiple assignments are allowed

Buffer: An out port with read capability. May have at most one assignment (buffer ports are not recommended)

Architectures

Every entity has at least one architecture

One entity can have several architectures

Architectures can describe a design using:
- Behavior
- Structure
- Dataflow

Architectures can describe a design on many levels:
- Gate level
- RTL (Register Transfer Level)
- Behavioral level

A configuration declaration links an architecture to an entity

E.g.

architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;

Configurations

Link the entity declaration and the architecture body together

The concept of a default configuration is a bit messy in VHDL '87
- The last architecture analyzed links to the entity

Can be used to change simulation behavior without re-analyzing the VHDL source

Complex configuration declarations are ignored in synthesis

Some entities can have e.g. a gate-level architecture and a behavioral architecture

Configurations are always optional

Packages

Packages contain information common to many design units

1. Package declaration
- Constant declarations
- Type and subtype declarations
- Function and procedure declarations
- Global signal declarations
- File declarations
- Component declarations

2. Package body
- Is not necessarily needed
- Function bodies
- Procedure bodies

Packages are meant for encapsulating data that can be shared globally among several design units. A package consists of a declaration part and an optional body part.

A package declaration can contain:
- Type and subtype declarations
- Subprograms
- Constants
- Alias declarations
- Global signal declarations
- File declarations
- Component declarations

A package body consists of:
- Subprogram declarations and bodies
- Type and subtype declarations
- Deferred constants
- File declarations

Libraries

Collection of VHDL design units (database)

1. Packages
- Package declaration
- Package body

2. Entities (entity declaration)

3. Architectures (architecture body)

4. Configurations (configuration declarations)

Usually a directory in a UNIX file system

Can also be any other kind of database

Levels of Abstraction

VHDL supports many possible styles of design description, which differ primarily in how closely they relate to the hardware

It is possible to describe a circuit in a number of ways, listed here from lower to higher level of abstraction:

Structural

Dataflow

Behavioral

Structural VHDL Description

Circuit is described in terms of its components

Ranges from a low-level description (e.g. transistor-level description) to a high-level description (e.g. block diagram)

For large circuits, low-level descriptions quickly become impractical

Dataflow VHDL Description

Circuit is described in terms of how data moves through the system

In the dataflow style you describe how information flows between registers in the system

The combinational logic is described at a relatively high level; the placement and operation of registers is specified quite precisely

The behavior of the system over time is defined by registers

There are no built-in registers in the VHDL language
- Either a lower-level description
- or a behavioral description of sequential elements is needed

The lower-level register descriptions must be created or obtained

If there are no third-party models for registers => you must write the behavioral description of the registers

The behavioral description can be provided in the form of subprograms (functions or procedures)

Behavioral VHDL Description

Circuit is described in terms of its operation over time

Representation might include e.g. state diagrams, timing diagrams, and algorithmic descriptions

The concept of time may be expressed precisely using delays (e.g. A <= B after 10 ns)

If no actual delays are used, the order of sequential operations is defined

In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed timing specifications

The actual timing results depend on the implementation technology and the efficiency of the synthesis tool

There are a few tools for behavioral synthesis

Concurrent vs. Sequential

Processes

Basic simulation concept in VHDL

A VHDL description can always be broken up into interconnected processes

Quite similar to a UNIX process

Process keyword in VHDL

A process statement is a concurrent statement

Statements inside a process statement are sequential statements

A process must contain either a sensitivity list or wait statement(s), but NOT both

The sensitivity list or wait statement(s) contain the signals that wake the process up

General Format

process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;

CHAPTER 4 SMART CARD OVERVIEW

This section very briefly introduces the concept of a smart card. Basically, a smart card is a computer embedded in a safe. It consists of a (typically 8-bit or 32-bit) processor together with ROM, EEPROM, and a small amount of RAM, and is therefore capable of performing computations. The main goal of a smart card is to allow the execution of cryptographic operations involving some secret parameter (the key) without revealing this parameter to the outside world. The goal of the attacker, conversely, is to recover this secret parameter. The processor is embedded in a chip and connected to the outside world through eight wires, whose role, use, and position are standardized. In addition to the input/output wires, the parts of most interest here are the following:

1. Power supply: Smart cards do not have an internal battery. The current they need is provided by the smart card reader. This makes the smart card's power consumption fairly easy for the attacker to measure.

2. Clock: Similarly, smart cards do not have an internal clock either. The clock ticks must also be provided from the outside world. As a consequence, the attacker can measure the card's running time with very good precision.

Smart cards are usually equipped with protection mechanisms composed of a shield (the passivation layer), whose goal is to hide the internal behavior of the chip, and possibly sensors that react when the shield is removed by destroying all sensitive data and preventing the card from functioning properly.

CHAPTER 5 SIDE CHANNEL ATTACKS

"Side channel attacks" are attacks that are based on "side channel information". Side channel information is information that can be retrieved from the encryption device and that is neither the plaintext to be encrypted nor the ciphertext resulting from the encryption process.

In the past, an encryption device was perceived as a unit that receives plaintext input and produces ciphertext output, and vice versa. Attacks were therefore based on knowing the ciphertext (ciphertext-only attacks), on knowing both the plaintext and the ciphertext (known-plaintext attacks), or on the ability to define what plaintext is to be encrypted and then see the results of the encryption (chosen-plaintext attacks). Today it is known that encryption devices have additional outputs, and often additional inputs, that are neither the plaintext nor the ciphertext.

Encryption devices produce timing information (information about the time that operations take) that is easily measurable, radiation of various sorts, power consumption statistics (which can be easily measured as well), and more. Often the encryption device also has additional "unintentional" inputs, such as voltage, that can be modified to cause predictable outcomes. Side channel attacks make use of some or all of this information, along with other (known) cryptanalytic techniques, to recover the key the device is using.

Side channel analysis techniques are of concern because the attacks can be mounted quickly and can sometimes be implemented using readily available hardware costing from only a few hundred dollars to thousands of dollars.

5.1 Classification of side channel attacks

The literature usually classifies side channel attacks along two orthogonal axes.

1. Invasive vs. non-invasive

Invasive attacks require de-packaging the chip to get direct access to its components. A typical example is connecting a wire to a data bus to observe the data transfers.

A non-invasive attack exploits only externally available information (the emission of which is, however, often unintentional), such as running time or power consumption.

A more recent distinction adds semi-invasive attacks. These attacks require de-packaging the chip to get access to the chip surface but do not tamper with the passivation layer (they do not require electrical contact with the metal surface).

2. Active vs. passive

Active attacks try to tamper with the card's proper functioning. For example, fault induction attacks try to induce errors in the computation.

Passive attacks, by contrast, simply observe the card's behavior during its processing without disturbing it.

Note that these two axes are indeed orthogonal: an invasive attack may completely avoid disturbing the card's behavior, and a passive attack may require preliminary de-packaging for the required information to be observable. These attacks are, of course, not mutually exclusive: an invasive attack may, for example, serve as a preliminary step for a non-invasive one by giving a detailed description of the chip's architecture that helps to find out where to put external probes.

Smart cards are usually equipped with protection mechanisms that are supposed to react to invasive attacks (although several invasive attacks are nonetheless capable of defeating these mechanisms). On the other hand, a non-invasive attack is completely undetectable: there is, for example, no way for a smart card to figure out that its running time is currently being measured. Other countermeasures are therefore necessary. From an economic point of view, invasive attacks are usually more expensive to deploy on a large scale, since they require individual processing of each attacked device. In this sense, non-invasive attacks constitute a bigger menace for the smart card industry.

Invasive attacks involve a relatively high capital investment for lab equipment plus a moderate investment of effort for each individual chip attacked. Non-invasive attacks require only a moderate capital investment plus a moderate investment of effort in designing an attack on a particular type of device; thereafter the cost per device attacked is low. Semi-invasive attacks can be carried out using very cheap and simple equipment.

The attacker can gain information by:

1. Probing attacks

2. Fault induction attacks

3. Timing attacks

4. Power analysis attacks

5. Electromagnetic analysis attacks

These attacks exploit information leaked by the switching behavior of digital complementary metal-oxide-semiconductor (CMOS) gates. Of all these, the power analysis attack is of major concern.

5.2 Power analysis attacks

The power consumption of a cryptographic device may provide much information about the operations that take place and the parameters involved. This is the idea of simple and differential power analysis, first introduced by Kocher et al. Like the clock ticks, the card's energy is provided by the terminal and can therefore easily be measured. Basically, to measure a circuit's power consumption, a small (e.g. 50 ohm) resistor is inserted in series with the power or ground input. The voltage difference across the resistor divided by the resistance yields the current. Well-equipped electronics labs have equipment that can digitally sample voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than 1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC can be bought for less than US$ 400.
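The measurement described above is Ohm's law applied to the series resistor. The following Python sketch illustrates the arithmetic only; the 5 V supply, the 50 ohm resistor value, and the sampled voltage drops are hypothetical numbers chosen for the example, not measurements from this project.

```python
def supply_current(v_shunt, r_shunt=50.0):
    """Current through the series resistor in amperes (Ohm's law: I = V / R)."""
    return v_shunt / r_shunt

def card_power(v_shunt, v_supply=5.0, r_shunt=50.0):
    """Power delivered to the card: the voltage left after the resistor times the current."""
    return (v_supply - v_shunt) * supply_current(v_shunt, r_shunt)

# Hypothetical voltage drops sampled across the resistor, in volts.
samples = [0.10, 0.25, 0.15]
currents = [supply_current(v) for v in samples]
powers = [card_power(v) for v in samples]
```

A sequence of such power values, taken at a fixed sampling rate during one cryptographic operation, forms one power trace.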

Power analysis attacks are of two types:

1. Simple Power Analysis (SPA) attack

2. Differential Power Analysis (DPA) attack

SPA attacks on smart cards typically take a few seconds per card, while DPA attacks can take several hours. From a general, somewhat academic perspective, we may consider the internal state of a block cipher to be all the intermediate results and values that are never included in the output in normal operation. For example, DES has 16 rounds; we can consider the intermediate states state[1..15] after each round except the last as secret internal state. Side channels typically give information about these internal states, or about the operations used in the transition of the internal state from one round to the next. The type of side channel of course determines what information is available to the attacker about these states. The attacks typically work by finding some information about the internal state of the cipher that can be learned both by guessing part of the key and checking the value directly, and additionally by some statistical property of the cipher that makes that checkable value slightly non-random.

5.2.1 Simple Power Analysis attack (SPA)

Simple Power Analysis is a technique that involves direct interpretation of power consumption measurements collected during cryptographic operations, generally by inspecting a visual representation of the power consumption of a unit while an encryption operation is being performed. SPA can yield information about a device's operation as well as key material.

A trace refers to a set of power consumption measurements taken across a cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a trace containing 5000 points. The following figure shows an SPA trace from a smart card performing a DES operation.

Figure: SPA monitoring from a single DES operation performed by a typical smart card. The upper trace shows the entire encryption operation, including the initial permutation, the 16 DES rounds, and the final permutation. The lower trace is a detailed view of the second and third rounds.

Because SPA can reveal the sequence of instructions executed, it can be used to break cryptographic implementations in which the execution path depends on the data being processed. For example:

DES key schedule: The DES key schedule computation involves rotating 28-bit key registers. A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will contain different SPA features if the execution paths take different branches for each.

DES permutations: DES implementations perform a variety of bit permutations. Conditional branching in software or microcode can cause significant power consumption differences for "0" and "1" bits.

Comparisons: String or memory comparison operations typically perform a conditional branch when a mismatch is found. This conditional branching causes large SPA (and sometimes timing) characteristics.

Multipliers: Modular multiplication circuits tend to leak a great deal of information about the data they process. The leakage functions depend on the multiplier design but are often strongly correlated to operand values and Hamming weights.

Exponentiators: A simple modular exponentiation function scans across the exponent, performing a squaring operation in every iteration with an additional multiplication operation for each exponent bit that is equal to "1". The exponent can be compromised if squaring and multiplication operations have different power consumption characteristics, take different amounts of time, or are separated by different code. Modular exponentiation functions that operate on two or more exponent bits at a time may have more complex leakage functions.
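The exponentiator case can be illustrated with a short Python sketch. It assumes an idealized attacker who can tell squarings ("S") from multiplications ("M") in a single trace; the function names, the operation log, and the example numbers are illustrative assumptions, not part of the project.

```python
def modexp_with_op_log(base, exponent, modulus):
    """Left-to-right square-and-multiply that also records the operation sequence."""
    result = 1
    ops = []
    for bit in bin(exponent)[2:]:              # most significant bit first
        result = (result * result) % modulus   # squaring in every iteration
        ops.append("S")
        if bit == "1":                         # extra multiplication only for "1" bits
            result = (result * base) % modulus
            ops.append("M")
    return result, "".join(ops)

def recover_exponent(ops):
    """Rebuild the exponent from an observed S/M sequence: 'SM' is a 1, a lone 'S' is a 0."""
    bits = ops.replace("SM", "1").replace("S", "0")
    return int(bits, 2)

value, ops = modexp_with_op_log(7, 0b101101, 1009)
leaked = recover_exponent(ops)
```

In this idealized model the operation sequence alone reveals every exponent bit, which is why implementations try to make the two operations indistinguishable or perform them unconditionally.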

5.2.2 Differential Power Analysis attack (DPA)

In addition to large-scale power variations due to the instruction sequence, there are effects correlated to the data values being manipulated. These variations tend to be smaller and are sometimes overshadowed by measurement errors and other noise. In such cases it is still often possible to break the system using statistical functions tailored to the target algorithm.

To implement the DPA attack, an attacker first observes $m$ encryption operations and captures power traces $T_{1..m}[1..k]$ containing $k$ samples each. In addition, the attacker records the ciphertexts $C_{1..m}$. No knowledge of the plaintext is required. DPA uses the power consumption measurements to determine whether a key block guess $K_s$ is correct. The attacker computes a $k$-sample differential trace $\Delta_D[1..k]$ by finding the difference between the average of the traces for which a certain intermediate value $V$, predicted by the selection function $D(C_i, b, K_s)$, is one and the average of the traces for which $V$ is zero. Thus $\Delta_D[j]$ is the average over $C_{1..m}$ of the effect due to the value represented by the selection function $D$ on the power consumption at point $j$. In particular,

$$\Delta_D[j] = \frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}{\sum_{i=1}^{m} D(C_i, b, K_s)} - \frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)}$$

If $K_s$ is incorrect, the bit computed using $D$ will differ from the actual target bit for about half of the ciphertexts $C_i$. The selection function is thus effectively uncorrelated to what was actually computed by the target device. If a random function is used to divide a set into two subsets, the difference in the averages of the subsets should approach zero as the subset sizes approach infinity.

Thus, trace components uncorrelated to $D$ diminish with $1/\sqrt{m}$, causing the differential trace to become flat (the actual trace may not be completely flat, as $D$ with an incorrect $K_s$ may have a weak correlation to $D$ with the correct $K_s$). If $K_s$ is correct, however, the computed value of $D(C_i, b, K_s)$ will equal the actual value of target bit $b$ with probability 1. The selection function is thus correlated to the value of the bit considered. Other data values, measurement errors, etc. that are not correlated to $D$ approach zero. Because power consumption is correlated to data bit values, the plot of $\Delta_D$ will be flat with spikes in regions where $D$ is correlated to the values being processed. The correct value of $K_s$ can thus be identified from the spikes in its differential trace. Four values of $b$ correspond to each S-box, providing confirmation of key block guesses. Finding all eight $K_s$ yields the entire 48-bit round subkey. The remaining 8 key bits can be found easily using exhaustive search or by analyzing one additional round. Triple DES keys can be found by analyzing an outer DES operation first, using the resulting key to decrypt the ciphertexts, and attacking the next DES key. DPA can use known plaintext or known ciphertext and can find encryption or decryption keys.
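The difference-of-means procedure described above can be sketched in Python on a deliberately simplified toy target. Everything specific here is an assumption made for illustration: the 4-bit S-box of the PRESENT cipher stands in for a DES S-box, each trace has a single sample, the device leaks only the target bit plus Gaussian noise, and the attacker knows the 4-bit data value fed into the S-box. It is not the attack setup used in this project.

```python
import random

# 4-bit S-box of the PRESENT cipher, used only as a small stand-in for a DES S-box.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def selection(data, key_guess):
    """Selection function D: predicted target bit (bit 1 of the S-box output)."""
    return (SBOX[data ^ key_guess] >> 1) & 1

def simulate_traces(secret_key, count, rng, noise=0.5):
    """Toy device: one power sample per operation = target bit + Gaussian noise."""
    traces = []
    for _ in range(count):
        data = rng.randrange(16)
        power = selection(data, secret_key) + rng.gauss(0.0, noise)
        traces.append((data, power))
    return traces

def differential(traces, key_guess):
    """Mean power of the traces where D = 1 minus mean power where D = 0."""
    ones = [p for d, p in traces if selection(d, key_guess) == 1]
    zeros = [p for d, p in traces if selection(d, key_guess) == 0]
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)

def recover_key(traces):
    """Return the key guess whose differential shows the largest spike."""
    return max(range(16), key=lambda guess: differential(traces, guess))

rng = random.Random(1)                      # fixed seed so the run is repeatable
traces = simulate_traces(secret_key=0xB, count=2000, rng=rng)
recovered = recover_key(traces)
```

For the correct guess the two groups are split exactly by the leaked bit, so the differential is close to 1; for wrong guesses the split is only partly related to the leaked bit and the differential stays visibly smaller, which is the behavior the text describes for the spikes in the differential trace.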


CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES

The power consumption of traditional standard cells and logic depends on the signal activity. When the output of a logic gate makes a 0-to-1 transition, a current flows from the power supply and charges the output capacitance. On the other hand, when the output makes a 1-to-0, a 0-to-0, or a 1-to-1 transition, no energy or only a limited amount of energy (due to short-circuit or leakage currents) is consumed from the power supply. This is the fundamental reason why information is leaked through the power supply and why power attacks are possible. The basis of a secure digital design flow is a logic style with constant power consumption.
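The data dependence described above can be made concrete with a small Python sketch. It assumes an idealized gate that draws C * Vdd^2 from the supply on each 0-to-1 output transition and nothing otherwise; the load capacitance, the supply voltage, and the two output sequences are hypothetical values, not figures from this project.

```python
def rising_transitions(outputs):
    """Count the 0 -> 1 transitions in a sequence of gate output values."""
    return sum(1 for prev, cur in zip(outputs, outputs[1:]) if (prev, cur) == (0, 1))

def supply_energy(outputs, c_load=10e-15, vdd=1.8):
    """Idealized energy drawn from the supply: C * Vdd^2 per 0 -> 1 transition."""
    return rising_transitions(outputs) * c_load * vdd ** 2

# Two output sequences of equal length produced by different input data.
busy = [0, 1, 0, 1, 0, 1]
quiet = [0, 0, 0, 0, 1, 1]
energy_busy = supply_energy(busy)
energy_quiet = supply_energy(quiet)
```

The two sequences draw different amounts of energy although the gate and the number of clock cycles are the same, which is exactly the information a power attack exploits.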

6.1 Current Mode Logic

Current mode logic (CML), e.g. current steering logic, seems the ideal solution. This type of logic continuously draws a current from the supply and measures its state through the path that the current takes. A gate has constant power consumption if it draws a perfectly constant current from the power supply, independent of the input and output signals. To build a current source capable of generating a constant current, special circuit techniques that minimize channel length modulation have to be used.

The decisive drawback of CML, however, is its static power consumption. Even when the logic gate is not processing any data, it burns current, which makes this logic style unacceptable for embedded, battery-operated devices.

6.2 Voltage Mode Logic (CMOS circuit styles)

Voltage mode logic (VML), e.g. static CMOS logic, only draws a current from the supply to change state and measures its state by the amount of charge it stores on a capacitance. A regular standard CMOS circuit only consumes power when a capacitance gets charged and later discharged, i.e. when a gate switches state. This is the main reason that CMOS is the style of choice for every battery-operated or low-power device, and it is illustrated in the figure below for a simple inverter. Static CMOS is thus the preferred logic style because of its low power consumption and high noise margins.

Figure: Standard CMOS inverter

Yet two conditions must be satisfied for VML to have constant power consumption, namely:

1) A logic gate must have exactly one switching event per signal transition.

2) The logic gate must charge a constant capacitance in that switching event.

For the inverter shown above, all four transitions can be distinguished when monitoring the power supply.

6.3 Dynamic Differential Logic

Dynamic differential logic, sometimes also referred to as dual-rail with pre-charge logic, fulfills the first condition. A differential logic family uses the true and the false representation of the input and output signals, and a dynamic logic family alternates pre-charge and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1, exactly one of the two output nodes evaluates to 0 in the evaluation phase to produce a differential output signal. The discharged output node is charged back to 1 in the following pre-charge phase, so that both outputs are again pre-charged to 1. In other words, every signal transition, including the events in which the input signals remain constant, is represented by an actual switching event in which the logic gate charges a capacitance. The logic families introduced to thwart differential power analysis (DPA) with dynamic differential logic include:

1. Sense Amplifier Based Logic (SABL)

2. Wave Dynamic Differential Logic (WDDL)

6.3.1 Sense Amplifier Based Logic (SABL)

The main advantage of SABL is that it has balanced input and output nodes and that all internal nodes connect to an output. The output capacitances can be balanced. Systematic methods have been developed to make sure that both branches of the differential pull-down network are balanced and that no memory effects are present in the network. A Sense Amplifier Based Logic gate is illustrated below.

Figure: Sense Amplifier Based Logic AND/NAND gate

This circuit style does, however, require full-custom characterization and layout. It also suffers from a high clock load, common to all dynamic logic gates.

6.3.2 Wave Dynamic Differential Logic Gates (WDDL)

WDDL can be implemented with static CMOS logic. Static CMOS standard cells are combined to form secure compound standard cells, which have a reduced power signature. WDDL has many advantages. It can be readily implemented from an existing standard cell library. The design flow is fully supported with accurate EDA library files that come directly from the vendor. WDDL also results in a dynamic differential logic with only a small load capacitance on the pre-charge control signal and with the low power consumption and the high noise margins of static CMOS.

The advantages of the WDDL logic style are as follows:

A major advantage of the proposed logic style is that it can be incorporated into the common Electronic Design Automation (EDA) tool flow.

No special design rules are involved in the interconnection of WDDL gates.

The switching factor of WDDL is 100%.

A WDDL gate consists of a parallel combination of two positive complementary gates, one calculating the true output using the true inputs, the other the false output using the false inputs. A positive gate produces a zero output for an all-zero input. The AND gate and the OR gate are examples of positive gates. A complementary gate, sometimes also referred to as a dual gate, expresses the false output of the original logic gate using the false inputs of the original gate. The AND gate fed with true input signals and the OR gate fed with false input signals are two dual gates. The figure shows the WDDL AND gate and the WDDL OR gate. In the evaluation phase, each input signal is differential and the WDDL gate calculates its differential output. In the pre-charge phase, the inputs to the WDDL gate are set at 0. This puts the output of the gate at 0.

A module in WDDL pre-charges without distributing the pre-charge signal to each individual gate. During the pre-charge phase, the input vector of the combinatorial logic is set at all 0s. Each individual gate will eventually have all its inputs at 0, evaluate its output to 0 and pass this 0 value to the next gate. One could say that the pre-charge signal travels over the combinatorial logic as a 0-wave, hence the name WDDL. There are several ways to launch the pre-charge wave. In the figure, a pre-charge operator is inserted at the start of every combinatorial logic tree, i.e. at the inputs of the encryption module and at the outputs of the registers. These operators produce an all-zero output in the pre-charge phase (clk signal high) but let the differential signal through during the evaluation phase (clk signal low).
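The behavior of a pre-charge operator can be sketched in VHDL as follows. This is only an illustrative sketch: the entity name `wddl_precharge` and the port names are hypothetical, and the report's own operator may be built from different primitive gates. It assumes a single-ended data input that is converted into a differential pair.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical pre-charge operator (names are illustrative, not from the report).
-- clk = '1' : pre-charge phase, both rails are forced to '0'.
-- clk = '0' : evaluation phase, the rails carry d and not d.
entity wddl_precharge is
  port (
    clk : in  std_logic;  -- pre-charge control
    d   : in  std_logic;  -- single-ended data input
    q_t : out std_logic;  -- true rail
    q_f : out std_logic   -- false rail
  );
end entity wddl_precharge;

architecture dataflow of wddl_precharge is
begin
  q_t <= d and not clk;        -- follows d while clk is low
  q_f <= (not d) and not clk;  -- follows not d while clk is low
end architecture dataflow;
```

With clk high both rails are 0, which launches the 0-wave into the logic that follows; with clk low exactly one of the two rails is 1.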

Figure: WDDL pre-charge wave generation

CHAPTER 7 WDDL GATES

The methodology used in the project is a bottom-up approach. Lower-level modules are designed first and later integrated to form larger modules, whose further integration leads to the final top module. Since logic gates form the lowest-level modules, the logic gates required for the design are implemented first in WDDL style. WDDL demands a parallel combination of two positive complementary gates, one calculating the true value and the other the false value. Logic gates such as OR, AND and XOR have been implemented. In addition, a full adder, a 32-bit XOR, etc. have been implemented.

7.1 WDDL OR gate

A WDDL OR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.

Figure 4.1: WDDL OR gate

The NAND and NOT gates act as the pre-charge operator, injecting the signal '0' into the gates when the 'clk' signal is high, i.e. during the pre-charge phase.

7.2 WDDL AND gate

A WDDL AND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.

Figure 4.2: WDDL AND gate

7.3 WDDL NAND gate

A WDDL NAND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.
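The AND-based construction described in Sections 7.2 and 7.3 can be sketched in VHDL as below. The sketch assumes differential inputs that have already passed through a pre-charge operator; the entity name `wddl_and2` and the `_t`/`_f` port suffixes are hypothetical and not taken from the report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical WDDL AND gate: an AND gate on the true inputs in parallel
-- with its dual, an OR gate on the false inputs.
entity wddl_and2 is
  port (
    a_t, a_f : in  std_logic;  -- differential input A (true / false rail)
    b_t, b_f : in  std_logic;  -- differential input B (true / false rail)
    y_t, y_f : out std_logic   -- differential output
  );
end entity wddl_and2;

architecture dataflow of wddl_and2 is
begin
  y_t <= a_t and b_t;  -- true output from the true inputs
  y_f <= a_f or b_f;   -- false output from the false inputs (dual gate)
end architecture dataflow;
```

When all four inputs are 0 (pre-charge), both outputs are 0; for any valid differential input exactly one output rises. Under the same assumptions, a NAND would presumably use the same two gates with the roles of `y_t` and `y_f` exchanged.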

Figure 4.3: WDDL NAND gate

7.4 WDDL NOR gate

A WDDL NOR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.

Figure 4.4: WDDL NOR gate

7.5 WDDL XOR gate

The XOR function can be implemented by the Boolean equation A ⊕ B = AB' + A'B. The XOR gate is therefore implemented in terms of AND gates and OR gates, as shown in Figure 4.5. It can also be implemented by instantiating a WDDL AND gate and a WDDL OR gate, but the number of gates involved in the latter method is greater than in the former. Therefore the first method of implementation is followed rather than the second.
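One way to read the first method is sketched below, under the assumption that the true output is built directly from AB' + A'B and the false output from the dual network. The entity and port names are hypothetical, and the report's actual netlist may differ.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical WDDL XOR gate built only from positive AND/OR operations.
entity wddl_xor2 is
  port (
    a_t, a_f : in  std_logic;  -- differential input A
    b_t, b_f : in  std_logic;  -- differential input B
    y_t, y_f : out std_logic   -- differential output
  );
end entity wddl_xor2;

architecture dataflow of wddl_xor2 is
begin
  -- True output: A xor B = AB' + A'B, with the complements taken from the false rails.
  y_t <= (a_t and b_f) or (a_f and b_t);
  -- False output: dual network (AND and OR exchanged); equals A xnor B for valid inputs.
  y_f <= (a_t or b_f) and (a_f or b_t);
end architecture dataflow;
```

Because no inverters are used, an all-zero input still produces an all-zero output, so the pre-charge wave passes through the gate.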

Figure 4.5: WDDL XOR gate

With the help of the above basic gates, a full adder circuit has been designed by instantiating the WDDL gates designed above. During the implementation of the Blowfish algorithm, a 32-bit XOR gate and a 32-bit adder circuit are required. They can be easily implemented by instantiating the corresponding lower-level module 32 times.

CHAPTER 8 FRONT END

RESULTS

WDDL OR GATE

Synthesis Report

============================================================
Final Report
============================================================
Final Results
RTL Top Level Output File Name     : wddlor.ngr
Top Level Output File Name         : wddlor
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
IOs                                : 5

Cell Usage
BELS                               : 2
  LUT3                             : 2
IO Buffers                         : 5
  IBUF                             : 3
  OBUF                             : 2
============================================================
Device utilization summary
---------------------------
Selected Device                    : 3s250eft256-4
Number of Slices                   : 1 out of 2448   0%
Number of 4 input LUTs             : 2 out of 4896   0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 172    2%

Timing Summary
---------------
Speed Grade                        : -4
Maximum combinational path delay   : 6.236ns

Figure: Simulation result

Figure: Synthesis result

WDDL AND GATE

Synthesis Report

============================================================
Final Report
============================================================
Final Results
RTL Top Level Output File Name     : wddlgates.ngr
Top Level Output File Name         : wddlgates
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
IOs                                : 5

Cell Usage
BELS                               : 2
  LUT3                             : 2
IO Buffers                         : 5
  IBUF                             : 3
  OBUF                             : 2
============================================================
Device utilization summary
---------------------------
Selected Device                    : 3s250etq144-4
Number of Slices                   : 1 out of 2448   0%
Number of 4 input LUTs             : 2 out of 4896   0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 108    4%

Timing Summary
---------------
Speed Grade                        : -4
Maximum combinational path delay   : 6.236ns

Figure: Simulation result

Figure: Synthesis result

WDDL NAND GATE

Synthesis Report

============================================================
Final Report
============================================================
Final Results
RTL Top Level Output File Name     : wddlnand1.ngr
Top Level Output File Name         : wddlnand1
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
IOs                                : 5

Cell Usage
BELS                               : 2
  LUT3                             : 2
IO Buffers                         : 5
  IBUF                             : 3
  OBUF                             : 2
============================================================
Device utilization summary
---------------------------
Selected Device                    : 3s500efg320-4
Number of Slices                   : 1 out of 4656   0%
Number of 4 input LUTs             : 2 out of 9312   0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 232    2%

Timing Summary
---------------
Speed Grade                        : -4
Maximum combinational path delay   : 6.236ns

Figure: Simulation result

Figure: Synthesis result

WDDL XOR GATE

Synthesis Report

============================================================
Final Report
============================================================
Final Results
RTL Top Level Output File Name     : wddlxorgate.ngr
Top Level Output File Name         : wddlxorgate
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
IOs                                : 5

Cell Usage
BELS                               : 2
  LUT3                             : 2
IO Buffers                         : 5
  IBUF                             : 3
  OBUF                             : 2
============================================================
Device utilization summary
---------------------------
Selected Device                    : 3s250eft256-4
Number of Slices                   : 1 out of 2448   0%
Number of 4 input LUTs             : 2 out of 4896   0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 172    2%

Timing Summary
---------------
Speed Grade                        : -4
Maximum combinational path delay   : 6.236ns

Figure: Simulation result

Figure: Synthesis result

CHAPTER 9 SUMMARY AND CONCLUSION

9.1 Summary

In order to secure ICs against side-channel attacks, especially Differential Power Analysis (DPA), it is necessary to implement the design in a logic style that renders constant power dissipation irrespective of the input combination. WDDL has proved advantageous compared with other styles and is therefore of great significance. In this dissertation work, an architecture for the Blowfish algorithm is designed and implemented in WDDL style. A bottom-up approach is used in this implementation: the low-level entities are designed first and later combined to form the entire module. The key scheduling is online. The sub-keys generated for a particular key can be used for the encryption of all the data to be encrypted with that key. The sub-keys are supplied in reverse order for the decryption data path. Initially, logic gates are implemented in WDDL, and then higher-level modules are designed by instantiating the WDDL gates to form the entire module, thus resulting in constant power dissipation irrespective of the input data combination. The entire design works in two phases, namely the pre-charge phase and the evaluation phase. In the pre-charge phase all the signals of the design are set to zero, and during the evaluation phase the functionality of the design is achieved. This sort of design has been found simple and very effective in thwarting the side-channel attack known as Differential Power Analysis (DPA).

9.2 Conclusion

The crypto processor has been designed for a key size of 448 bits and a plaintext block of 64 bits. The code for the implementation has been written in VHDL. Functional verification has been done using the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The back end of the design is done using SOC Encounter. The desired functionality has been achieved according to the specifications. In the output, during the evaluation phase, the number of transitions has been the same in every case, thus resulting in constant power dissipation. During synthesis it has been observed that a simple WDDL gate comprises many conventional gates. The area of the design has therefore grown nearly three-fold compared with the same design implemented in conventional CMOS logic; this is the cost of the security incorporated into the IC against Differential Power Analysis (DPA). Due to the constant power dissipation at the output, an attacker cannot apply the DPA scheme to find the secret key that is being used in the crypto-processor. Thus security against DPA is incorporated into the IC at hardware level by implementing the design in WDDL style,

which is quite simple and effective.


CHAPTER 1 INTRODUCTION AND OBJECTIVE

1.1 Introduction

Small embedded integrated circuits (ICs) such as smart cards are vulnerable to so-called side-channel attacks (SCAs). The attacker can gain information by monitoring the power consumption, execution time, electromagnetic radiation and other information leaked by the switching behavior of digital complementary metal-oxide-semiconductor (CMOS) gates. This project presents a digital very-large-scale integration (VLSI) design flow to create secure, power-analysis-attack-resistant ICs.

The idea is to create digital circuit styles whose switching behavior is independent of the data, or the sequence of the data, that they are processing. A logic style called Wave Dynamic Differential Logic (WDDL) is used for the implementation of the basic logic gates used in cryptographic processors. The design flow runs from a normal design in a hardware description language such as VHDL to the side-channel-attack (SCA) resistant layout.

Depending on the parameter considered, side-channel attacks are classified as probing attacks, fault induction attacks, timing attacks, power analysis attacks, electromagnetic analysis attacks, etc. One side-channel attack in particular, namely Differential Power Analysis (DPA), is of great concern. It is very effective in finding the secret key and can be mounted quickly with off-the-shelf devices. The attack is based on the fact that logic operations have power characteristics that depend on the input data. It relies on statistical analysis to extract from the power consumption the information that is correlated to the secret key. As the variations actually originate at the logic level, implementing the encryption and decryption modules in a logic style in which a logic gate has constant power consumption at all times, independently of signal transitions, removes the foundation of DPA and is an effective means to halt it.

1.2 Objective of the Project

The main objectives of this dissertation are:

Study of constant-power logic styles

Description of WDDL Gates

Implementation of WDDL Logic Gates

Verification of the functionality of WDDL Logic Gates

Synthesis of the design

Analysis of the reports obtained during simulation and synthesis


CHAPTER 2 REVIEW OF LITERATURE

2.1 Introduction to Digital Design Flow

A typical digital design flow for any IC is as follows: design entry (specification, architecture, RTL coding and RTL verification); synthesis and post-synthesis verification; back end (floor planning, place and route, layout); and tape-out to the foundry to get the end product. All modern digital designs start with a designer writing a hardware description of the IC in a hardware description language (HDL) such as Verilog or VHDL. A Verilog or VHDL program essentially describes the hardware (logic gates, flip-flops, counters, etc.), the interconnection of the circuit blocks and the functionality. Various CAD tools are available to synthesize a circuit based on the HDL.

2.2 Secure Digital Design Flow

The secure digital design flow is depicted in Figure 2.1. In addition to the regular steps in an IC design (logic design, logic synthesis, place & route, stream out and verifications), one can recognize two additional steps, namely 1) "cell substitution" and 2) "interconnect decomposition". These operations have been inserted in the back end of the flow and do not interfere with the creative part of a design, indicated by the "logic design" task.

Figure 2.1: Secure Digital Design Flow

During the cell substitution step, cells designed in a constant-power logic style replace the conventional CMOS gates. This ensures the security of the ICs against power analysis attacks.

CHAPTER 3 HARDWARE DESCRIPTIVE LANGUAGE (VHDL)

Why (V)HDL?

Interoperability

Technology independence

Design reuse

Several levels of abstraction

Readability

Standard language

Widely supported

What is VHDL?

VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed IC)

Design specification language

Design entry language

Design simulation language

Design documentation language

An alternative to schematics

Brief History

VHDL was developed in the early 1980s for managing design problems that involved large circuits and multiple teams of engineers

Funded by the US Department of Defense

The first publicly available version was released in 1985

In 1986 the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was presented with a proposal to standardize VHDL

In 1987: standardization => IEEE 1076-1987

An improved version of the language was released in 1994 => IEEE Standard 1076-1993

Related Standards

IEEE 1076 doesn't support simulation conditions such as unknown and high-impedance

Soon after IEEE 1076-1987 was released, simulator companies began using their own non-standard types => VHDL was becoming non-standard

The IEEE 1164 standard was developed by the IEEE

IEEE 1164 contains definitions for a nine-valued data type, std_logic

IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate to actual hardware

Defines e.g. two numeric types: signed and unsigned

VHDL Environment


Design Units

Segments of VHDL code can be compiled separately and stored in a library

Entities

A black box with interface definition

Defines the inputs/outputs of a component (defines pins)

A way to represent modularity in VHDL

Similar to symbol in schematic

Entity declaration describes entity

E.g.

library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;

Ports

Provide channels of communication between the component and its environment

Each port must have a name direction and a type

An entity may have NO port declaration

Port directions

In: The value of the port can be read inside the component but cannot be assigned. Multiple reads of the port are allowed.

Out: Assignments can be made to the port, but data from the port cannot be read. Multiple assignments are allowed.

Inout: Bi-directional; assignments can be made and data can be read. Multiple assignments are allowed.

Buffer: An out port with read capability. May have at most one assignment (not recommended).

Architectures

Every entity has at least one architecture

One entity can have several architectures

Architectures can describe design using

- Behavior
- Structure
- Dataflow

Architectures can describe design on many levels
- Gate level
- RTL (Register Transfer Level)
- Behavioral level

Configuration declaration links architecture to entity

E.g.

architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;

Configurations

Links entity declaration and architecture body together

The concept of default configuration is a bit messy in VHDL '87
- The last architecture analyzed links to the entity

Can be used to change simulation behavior without re-analyzing the VHDL source

Complex configuration declarations are ignored in synthesis

Some entities can have e.g. a gate-level architecture and a behavioral architecture

Are always optional

Packages

Packages contain information common to many design units

1. Package declaration
- Constant declarations
- Type and subtype declarations
- Function and procedure declarations
- Global signal declarations
- File declarations
- Component declarations

2. Package body
- Is not necessarily needed
- Function bodies
- Procedure bodies

Packages are meant for encapsulating data which can be shared globally among several design units. They consist of a declaration part and an optional body part.

Package declaration can contain
- Type and subtype declarations
- Subprograms
- Constants
- Alias declarations
- Global signal declarations
- File declarations
- Component declarations

Package body consists of
- Subprogram declarations and bodies
- Type and subtype declarations
- Deferred constants
- File declarations

Libraries

Collection of VHDL design units (database)

1 Packages

package declaration

package body

2 Entities (entity declaration)

3 Architectures (architecture body)

4 Configurations (configuration declarations)

Usually directory in UNIX file system

Can be also any other kind of database

Levels of Abstraction

VHDL supports many possible styles of design description which differ primarily in how

closely they relate to the HW

It is possible to describe a circuit in a number of ways

Structural

Dataflow

Behavioral

(listed in order of increasing level of abstraction)

Structural VHDL description

Circuit is described in terms of its components

From a low-level description (e.g. a transistor-level description) to a high-level description (e.g. a block diagram)

For large circuits low-level descriptions quickly become impractical

Dataflow VHDL Description

Circuit is described in terms of how data moves through the system

In the dataflow style you describe how information flows between registers in the system

The combinational logic is described at a relatively high level; the placement and operation of registers are specified quite precisely

The behavior of the system over time is defined by registers

There are no built-in registers in the VHDL language
- Either a lower-level description
- or a behavioral description of sequential elements is needed

The lower-level register descriptions must be created or obtained

If there are no third-party models for registers => you must write the behavioral description of registers

The behavioral description can be provided in the form of subprograms (functions or procedures)
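A behavioral register description of the kind mentioned above might look like the following minimal sketch. The entity name `reg8` and the 8-bit width are assumptions chosen for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 8-bit register described behaviorally.
entity reg8 is
  port (
    clk : in  std_logic;
    d   : in  std_logic_vector(7 downto 0);
    q   : out std_logic_vector(7 downto 0)
  );
end entity reg8;

architecture behavioral of reg8 is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      q <= d;  -- capture the input on every rising clock edge
    end if;
  end process;
end architecture behavioral;
```

A dataflow description can then instantiate such a register wherever state has to be held between combinational stages.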

Behavioral VHDL Description

Circuit is described in terms of its operation over time

Representation might include e.g. state diagrams, timing diagrams and algorithmic descriptions

The concept of time may be expressed precisely using delays (e.g. A <= B after 10 ns)

If no actual delays are used, the order of sequential operations is defined

In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed timing specifications

The actual timing results depend on implementation technology and efficiency of

synthesis tool

There are a few tools for behavioral synthesis

Concurrent Vs Sequential

Processes

Basic simulation concept in VHDL

VHDL description can always be broken up to interconnected processes

Quite similar to UNIX process


Process keyword in VHDL

Process statement is concurrent statement

Statements inside process statements are sequential statements

A process must contain either a sensitivity list or wait statement(s), but NOT both

The sensitivity list or the wait statement(s) contain the signals which wake the process up

General Format

process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
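The two ways of waking a process up can be compared in a small sketch. The entity `process_demo` and its signal names are hypothetical; both processes compute the same AND function, one with a sensitivity list and one with an explicit wait statement.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical demo entity: two equivalent processes.
entity process_demo is
  port (
    a, b : in  std_logic;
    y1   : out std_logic;
    y2   : out std_logic
  );
end entity process_demo;

architecture behavioral of process_demo is
begin
  -- Variant 1: sensitivity list, no wait statement.
  with_list : process (a, b)
  begin
    y1 <= a and b;
  end process with_list;

  -- Variant 2: no sensitivity list, explicit wait at the end of the body.
  with_wait : process
  begin
    y2 <= a and b;
    wait on a, b;  -- suspend until a or b changes
  end process with_wait;
end architecture behavioral;
```

In simulation, `y1` and `y2` follow the same waveform, since a sensitivity list behaves like a `wait on` statement placed at the end of the process body.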


CHAPTER 4 SMART CARD OVERVIEW

This section very briefly introduces the concept of a smart card. Basically, a smart card is a computer embedded in a safe. It consists of a (typically 8-bit or 32-bit) processor together with ROM, EEPROM and a small amount of RAM, and is therefore capable of performing computations. The main goal of a smart card is to allow the execution of cryptographic operations involving some secret parameter (the key) while not revealing this parameter to the outside world. Conversely, the goal of the attacker is to recover this secret parameter. The processor is embedded in a chip and connected to the outside world through eight wires, whose role, use and position are standardized. In addition to the input/output wires, the parts of most interest here are the following:

1. Power supply: Smart cards do not have an internal battery. The current they need is provided by the smart card reader. This makes the smart card's power consumption fairly easy for the attacker to measure.

2. Clock: Similarly, smart cards do not have an internal clock either. The clock ticks must also be provided from the outside world. As a consequence, the attacker can measure the card's running time with very good precision.

Smart cards are usually equipped with protection mechanisms composed of a shield (the passivation layer), whose goal is to hide the internal behavior of the chip, and possibly sensors that react when the shield is removed by destroying all sensitive data and preventing the card from functioning properly.

CHAPTER 5 SIDE CHANNEL ATTACKS

"Side channel attacks" are attacks that are based on "side channel information". Side channel information is information that can be retrieved from the encryption device that is neither the plaintext to be encrypted nor the ciphertext resulting from the encryption process.

In the past, an encryption device was perceived as a unit that receives plaintext input and produces ciphertext output, and vice versa. Attacks were therefore based on knowing the ciphertext (ciphertext-only attacks), on knowing both the plaintext and the ciphertext (known-plaintext attacks), or on the ability to define what plaintext is to be encrypted and then seeing the results of the encryption (chosen-plaintext attacks). Today it is known that encryption devices have additional outputs, and often additional inputs, which are neither the plaintext nor the ciphertext.

Encryption devices produce timing information (information about the time that operations take) that is easily measurable, radiation of various sorts, power consumption statistics (which can be easily measured as well), and more. Often the encryption device also has additional "unintentional" inputs, such as voltage, that can be modified to cause predictable outcomes. Side channel attacks make use of some or all of this information, along with other (known) cryptanalytic techniques, to recover the key the device is using.

Side channel analysis techniques are of concern because the attacks can be mounted quickly and can sometimes be implemented using readily available hardware costing from only a few hundred dollars to thousands of dollars.

5.1 Classification of side channel attacks

The literature usually classifies side channel attacks along two orthogonal axes.

1. Invasive vs. non-invasive

Invasive attacks require de-packaging the chip to get direct access to its components. A typical example is the connection of a wire to a data bus to observe the data transfers.

A non-invasive attack exploits only externally available information (the emission of which is, however, often unintentional), such as running time and power consumption.

A further category, semi-invasive attacks, has also been distinguished. These attacks require de-packaging of the chip to get access to the chip surface but do not tamper with the passivation layer (they do not require electrical contact with the metal surface).

2. Active vs. passive

Active attacks try to tamper with the card's proper functioning. For example, fault induction attacks try to induce errors in the computation.

Passive attacks, by contrast, simply observe the card's behavior during its processing, without disturbing it.

Note that these two axes are indeed orthogonal: an invasive attack may completely avoid disturbing the card's behavior, and a passive attack may require a preliminary de-packaging for the required information to be observable.

These attacks are of course not mutually exclusive: an invasive attack may, for example, serve as a preliminary step for a non-invasive one by giving a detailed description of the chip's architecture that helps to find out where to put external probes.

Smart cards are usually equipped with protection mechanisms that are supposed to react to invasive attacks (although several invasive attacks are nonetheless capable of defeating these mechanisms, as will be illustrated below). On the other hand, it is worth pointing out that a non-invasive attack is completely undetectable: there is, for example, no way for a smart card to figure out that its running time is currently being measured. Other countermeasures are therefore necessary. From an economic point of view, invasive attacks are usually more expensive to deploy on a large scale, since they require individual processing of each attacked device. In this sense, non-invasive attacks constitute a bigger menace for the smart card industry.

Invasive attacks involve a relatively high capital investment for lab equipment plus a moderate investment of effort for each individual chip attacked. Non-invasive attacks require only a moderate capital investment plus a moderate investment of effort in designing an attack on a particular type of device; thereafter the cost per device attacked is low. Semi-invasive attacks can be carried out using very cheap and simple equipment.

The attacker can gain information through:

1. Probing attacks

2. Fault induction attacks

3. Timing attacks

4. Power analysis attacks, and

5. Electromagnetic analysis attacks

These attacks are performed during the switching activity of digital complementary metal-oxide-semiconductor (CMOS) gates. Of all these, the power analysis attack is of major concern.

5.2 Power analysis attacks

The power consumption of a cryptographic device may provide much information about the operations that take place and the parameters involved. This is the idea of simple and differential power analysis, first introduced by Kocher et al. Like the clock ticks, the card's energy is also provided by the terminal and can therefore easily be measured. Basically, to measure a circuit's power consumption, a small (e.g. 50 ohm) resistor is inserted in series with the power or ground input. The voltage difference across the resistor divided by the resistance yields the current. Well-equipped electronics labs have equipment that can digitally sample voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than 1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC can be bought for less than US$ 400.
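The shunt-resistor measurement can be written out as a short worked example. The symbols $V_R$ (voltage across the resistor), $R$, $I$, $V_{DD}$ (supply voltage) and $P$ are introduced here for illustration. The 100 mV reading and the 5 V supply are assumed values, not figures from the report; only the 50 ohm resistor comes from the text.

```latex
\begin{align*}
I &= \frac{V_R}{R}, &
P &= (V_{DD} - V_R)\, I \\[4pt]
I &= \frac{0.1\,\mathrm{V}}{50\,\Omega} = 2\,\mathrm{mA}, &
P &= (5\,\mathrm{V} - 0.1\,\mathrm{V}) \times 2\,\mathrm{mA} = 9.8\,\mathrm{mW}
\end{align*}
```

Sampling $V_R$ over time therefore gives a trace that is proportional to the instantaneous current drawn by the card.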

Power analysis attacks are of two types:

1. Simple Power Analysis (SPA) attack, and

2. Differential Power Analysis (DPA) attack

SPA attacks on smart cards typically take a few seconds per card, while DPA attacks can take several hours. In general, from a somewhat academic perspective, we may consider the entire internal state of the block cipher to be all the intermediate results and values that are never included in the output in normal operation. For example, DES has 16 rounds; we can consider the intermediate states, state[1..15], after each round except the last as a secret internal state. Side channels typically give information about these internal states, or about the operations used in the transition of this internal state from one round to another. The type of side channel will of course determine what information is available to the attacker about these states. The attacks typically work by finding some information about the internal state of the cipher which can be learned both by guessing part of the key and checking the value directly, and additionally by some statistical property of the cipher that makes that checkable value slightly nonrandom.

5.2.1 Simple Power Analysis attack (SPA)

Simple Power Analysis is generally based on looking at a visual representation of the power consumption of a unit while an encryption operation is being performed. It is a technique that involves direct interpretation of power consumption measurements collected during cryptographic operations. SPA can yield information about a device's operation as well as key material.

A trace refers to a set of power consumption measurements taken across a cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a trace containing 5000 points. The figure, for example, shows an SPA trace from a smart card performing a DES operation.
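The trace length quoted above follows directly from the sampling rate and the duration of the operation. The symbols $N$, $f_s$, $T$ and $\Delta t$ are introduced here only for illustration.

```latex
\begin{align*}
N &= f_s \, T = (5 \times 10^{6}\,\mathrm{Hz}) \times (1 \times 10^{-3}\,\mathrm{s}) = 5000 \text{ points} \\
\Delta t &= \frac{1}{f_s} = 0.2\,\mu\mathrm{s} \text{ between consecutive points}
\end{align*}
```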

Figure: SPA monitoring from a single DES operation performed by a typical smart card. The upper trace shows the entire encryption operation, including the initial permutation, the 16 DES rounds and the final permutation. The lower trace is a detailed view of the second and third rounds.

Because SPA can reveal the sequence of instructions executed, it can be used to break
cryptographic implementations in which the execution path depends on the data being
processed. For example:

DES key schedule: The DES key schedule computation involves rotating 28-bit key registers.
A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can
be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will
contain different SPA features if the execution paths take different branches for each.

DES permutations: DES implementations perform a variety of bit permutations. Conditional
branching in software or microcode can cause significant power consumption differences for
"0" and "1" bits.

Comparisons: String or memory comparison operations typically perform a conditional
branch when a mismatch is found. This conditional branching causes large SPA (and
sometimes timing) characteristics.

Multipliers: Modular multiplication circuits tend to leak a great deal of information about the
data they process. The leakage functions depend on the multiplier design, but are often strongly
correlated to operand values and Hamming weights.

Exponentiators: A simple modular exponentiation function scans across the exponent,
performing a squaring operation in every iteration with an additional multiplication operation
for each exponent bit that is equal to "1". The exponent can be compromised if squaring and
multiplication operations have different power consumption characteristics, take different
amounts of time, or are separated by different code. Modular exponentiation functions that
operate on two or more exponent bits at a time may have more complex leakage functions.
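The exponentiator case can be illustrated with a short Python sketch. It assumes an idealized attacker who can tell squarings ("S") from multiplications ("M") in the trace; the function names and the S/M labelling are illustrative assumptions, not part of the report.

```python
def modexp_with_trace(base, exponent, modulus):
    """Left-to-right square-and-multiply that also records the operation sequence."""
    result = 1
    ops = []
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus  # squaring in every iteration
        ops.append("S")
        if bit == "1":
            result = (result * base) % modulus  # extra multiply only for '1' bits
            ops.append("M")
    return result, "".join(ops)


def recover_exponent(ops):
    """Rebuild the exponent from the observed operation sequence."""
    bits = ops.replace("SM", "1").replace("S", "0")
    return int(bits, 2)


value, ops = modexp_with_trace(7, 0b101101, 1009)
print(ops)                         # SMSSMSMSSM
print(bin(recover_exponent(ops)))  # 0b101101
```

Every "S" followed by "M" marks a "1" bit and every lone "S" marks a "0" bit, which is why the secret exponent can be read off a single trace when the two operations are distinguishable.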

5.2.2 Differential Power Analysis attack (DPA)

In addition to large-scale power variations due to the instruction sequence, there are
effects correlated to the data values being manipulated. These variations tend to be smaller and are
sometimes overshadowed by measurement errors and other noise. In such cases it is still often
possible to break the system using statistical functions tailored to the target algorithm.

To implement the DPA attack, an attacker first observes m encryption operations and captures
power traces $T_{1..m}[1..k]$ containing k samples each. In addition, the attacker records the
ciphertexts $C_{1..m}$. No knowledge of the plaintext is required. DPA analysis uses power
consumption measurements to determine whether a key block guess $K_s$ is correct. The attacker
computes a k-sample differential trace $\Delta_D[1..k]$ by finding the difference between the
average of the traces for which a certain intermediate value V is one and the average of the
traces for which V is zero. Thus $\Delta_D[j]$ is the average over $C_{1..m}$ of the effect due to the value
represented by the selection function D on the power consumption at point j. In particular,
writing $D_i = D(C_i, b, K_s)$ for the predicted value of V in trace i,

$$\Delta_D[j] = \frac{\sum_{i=1}^{m} D_i \, T_i[j]}{\sum_{i=1}^{m} D_i} - \frac{\sum_{i=1}^{m} (1 - D_i) \, T_i[j]}{\sum_{i=1}^{m} (1 - D_i)}$$

If $K_s$ is incorrect, the bit computed using D will differ from the actual target bit for about half
of the ciphertexts $C_i$. The selection function is thus effectively uncorrelated to what was actually
computed by the target device. If a random function is used to divide a set into two subsets, the
difference in the averages of the subsets should approach zero as the subset sizes approach
infinity. Thus, trace components uncorrelated to D diminish with $1/\sqrt{m}$, causing the
differential trace to become flat (the actual trace may not be completely flat, as D with $K_s$
incorrect may have a weak correlation to D with the correct $K_s$).

If $K_s$ is correct, however, the computed value for $D(C_i, b, K_s)$ will equal the actual value of
target bit b with probability 1. The selection function is thus correlated to the value of the bit
considered. Other data values, measurement errors, etc. that are not correlated to D approach
zero. Because power consumption is correlated to data bit values, the plot of $\Delta_D$ will be flat,
with spikes in regions where D is correlated to the values being processed. The correct value of
$K_s$ can thus be identified from the spikes in its differential trace. Four values of b correspond
to each S-box, providing confirmation of key block guesses. Finding all eight $K_s$ yields the
entire 48-bit round subkey. The remaining 8 key bits can be found easily using exhaustive search
or by analyzing one additional round. Triple DES keys can be found by analyzing an outer DES
operation first, using the resulting key to decrypt the ciphertexts, and attacking the next DES
key. DPA can use known plaintext or known ciphertext and can find encryption or decryption keys.
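The difference-of-means procedure can be tried out on synthetic data. The Python sketch below is a toy model under stated assumptions, not the attack on DES itself: a 4-bit S-box stands in for a DES S-box, the known value c stands in for the relevant ciphertext bits, the key guess is 4 bits instead of 6, and one sample of each simulated trace leaks the target bit plus Gaussian noise. All function names and parameters are illustrative.

```python
import random

# Toy 4-bit S-box used as a stand-in for a DES S-box (assumption for brevity).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def select_bit(c, key_guess):
    """Selection function D: predicted target bit for known value c and a key guess."""
    return (SBOX[c ^ key_guess] >> 3) & 1


def simulate_traces(secret_key, m, k, leak_at, noise_sigma, rng):
    """Return m known values and m synthetic k-sample power traces."""
    inputs, traces = [], []
    for i in range(m):
        c = i % 16  # cycle through all known 4-bit values
        trace = [rng.gauss(0.0, noise_sigma) for _ in range(k)]
        trace[leak_at] += select_bit(c, secret_key)  # data-dependent power
        inputs.append(c)
        traces.append(trace)
    return inputs, traces


def differential_trace(inputs, traces, key_guess):
    """Average of traces with D = 1 minus average of traces with D = 0."""
    k = len(traces[0])
    sums = [[0.0] * k, [0.0] * k]
    counts = [0, 0]
    for c, trace in zip(inputs, traces):
        v = select_bit(c, key_guess)
        counts[v] += 1
        for j in range(k):
            sums[v][j] += trace[j]
    return [sums[1][j] / counts[1] - sums[0][j] / counts[0] for j in range(k)]


def recover_key(inputs, traces):
    """Pick the key guess whose differential trace has the largest spike."""
    def peak(guess):
        return max(abs(d) for d in differential_trace(inputs, traces, guess))
    return max(range(16), key=peak)


rng = random.Random(1)
inputs, traces = simulate_traces(secret_key=0xB, m=3200, k=8, leak_at=5,
                                 noise_sigma=1.0, rng=rng)
print(hex(recover_key(inputs, traces)))
```

In this model the correct guess produces a spike close to 1 at the leaking sample. Wrong guesses stay near 0 or near ±0.5 in the noise-free case, which matches the remark above that an incorrect key may still show a weak correlation.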

CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES

The power consumption of traditional standard cells and logic depends
on the signal activity. When the output of a logic gate makes
a 0 to 1 transition, a current flows from the power supply and charges the
output capacitance. On the other hand, when the output makes a 1 to 0, a 0
to 0, or a 1 to 1 transition, no energy or only a limited amount of energy (due to
short-circuit or leakage currents) is consumed from the power supply. This is the
fundamental reason why information is leaked through the power supply
and why power attacks are possible. The basis of a secure digital design
flow is a logic style with constant power consumption.
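The data dependence described above can be captured in a small idealized model. The Python sketch below assumes that only a 0 to 1 output transition draws energy from the supply (C·Vdd² for a load capacitance C) and ignores short-circuit and leakage currents; the function names and the normalized default values are illustrative.

```python
def cmos_supply_energy(prev_out, new_out, c_load=1.0, vdd=1.0):
    """Idealized energy drawn from the supply for one output transition.

    Only a 0 -> 1 transition charges the load from the supply (C * Vdd^2);
    short-circuit and leakage currents are ignored in this model.
    """
    if (prev_out, new_out) == (0, 1):
        return c_load * vdd ** 2
    return 0.0


def supply_profile(outputs):
    """Energy drawn in each cycle for a sequence of output values."""
    return [cmos_supply_energy(p, n) for p, n in zip(outputs, outputs[1:])]


print(supply_profile([0, 1, 1, 0, 0, 1]))  # [1.0, 0.0, 0.0, 0.0, 1.0]
```

The profile depends on the output sequence, so an observer of the supply current learns when the gate output rises. That dependence is what a constant-power logic style has to remove.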

6.1 Current Mode Logic

Current mode logic (CML), e.g., current steering logic, seems the
ideal solution. This type of logic continuously draws a current from the
supply and represents its state by the path that the current takes. A
gate has constant power consumption if it draws a perfectly constant
current from the power supply, independently of the input and output
signals. To build a current source capable of generating a constant current,
special circuit techniques that minimize channel length modulation have to
be used.

The decisive drawback of CML, however, is its static power
consumption. Even when the logic gate is not processing any data, it burns
current, which makes this logic style unacceptable for embedded, battery-operated
devices.

6.2 Voltage Mode Logic (CMOS circuit styles)

Voltage mode logic (VML), e.g., static CMOS logic, only draws a current from the
supply to change state, and represents its state by the amount of charge it stores on a
capacitance. A regular standard CMOS circuit will only consume power when a capacitance
gets charged and later discharged, i.e., when a gate switches state. This is the main reason that
CMOS is the style of choice for every battery-operated or low-power device. This is illustrated
in the figure below for a simple inverter. Thus static CMOS is the preferred logic style because
of its low power consumption and high noise margins.

Standard CMOS inverter

Yet two conditions must be satisfied for VML to have constant power consumption,
namely:

1) A logic gate must have exactly one switching event per signal transition.

2) The logic gate must charge a constant capacitance in that switching event.

In the inverter shown above, all four transitions of the CMOS inverter can be distinguished by
monitoring the power supply.

6.3 Dynamic Differential Logic

Dynamic differential logic, sometimes also referred to as dual rail with pre-charge
logic, fulfills the first condition. A differential logic family uses the true and the false
representation of the input and output signals, and a dynamic logic family alternates pre-charge
and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1,
exactly one of the two output nodes evaluates to 0 in the evaluation phase, giving a
differential output signal. The discharged output node is charged to 1 in the following pre-charge phase,
so that both outputs are again at 1. In other words, every signal transition, including the events in
which the input signals remain constant, is represented by an actual switching event in
which the logic gate charges a capacitance. The logic families that have been introduced to
thwart differential power analysis (DPA) use dynamic differential logic. Two such
techniques are considered here:

1. Sense Amplifier Based Logic (SABL) and

2. Wave Dynamic Differential Logic (WDDL) gates
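The first condition can be checked with a simple count of charging events. The Python sketch below is an abstract model under the assumptions stated in the text (both rails pre-charged to 1, exactly one rail discharged per evaluation); it does not model any particular circuit, and the function names are illustrative.

```python
def static_cmos_charge_events(values):
    """Count 0 -> 1 output transitions, the events that draw supply charge."""
    return sum(1 for p, n in zip(values, values[1:]) if (p, n) == (0, 1))


def dynamic_differential_charge_events(values):
    """Count charging events for a pre-charged dual-rail output.

    Each cycle: both rails are pre-charged to 1, then exactly one rail
    discharges during evaluation and must be recharged in the next pre-charge.
    """
    events = 0
    for v in values:
        true_rail, false_rail = (1, 0) if v else (0, 1)  # state after evaluation
        events += [true_rail, false_rail].count(0)       # rails to recharge
    return events


a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 1, 0, 1, 0, 1, 0, 1]
print(static_cmos_charge_events(a), static_cmos_charge_events(b))  # 1 4
print(dynamic_differential_charge_events(a),
      dynamic_differential_charge_events(b))                       # 8 8
```

The single-rail count depends on the data sequence, while the dual-rail count is one event per cycle for any data. The second condition, a constant capacitance per event, is a layout and circuit property that this count does not capture.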

6.3.1 Sense Amplifier Based Logic (SABL)

The main advantage of SABL is that it has balanced input and output nodes and that all
internal nodes connect to an output. The output capacitances can be balanced. Systematic
methods have been developed to make sure that both branches of the differential pull-down
network are balanced and that no memory effects are present in the network. Sense Amplifier
Based Logic is illustrated in the following figure.

Figure: Sense Amplifier Based Logic AND/NAND gate

This circuit style does require, however, a full custom characterization and layout. It also
suffers from a high clock load, common to all dynamic logic gates.

6.3.2 Wave Dynamic Differential Logic Gates (WDDL)

WDDL logic can be implemented with static CMOS logic. Static CMOS
standard cells are combined to form secure compound standard cells,
which have a reduced power signature. WDDL has many advantages. It can
be readily implemented from an existing standard cell library. The design
flow is fully supported with accurate EDA library files that come directly
from the vendor. WDDL also results in a dynamic differential logic with only
a small load capacitance on the pre-charge control signal and with the low
power consumption and the high noise margins of static CMOS.

Advantages of the WDDL logic style are as follows:

A major advantage of the proposed logic style is that it can be incorporated into the common
Electronic Design Automation (EDA) tool flow.

No special design rules are involved in the interconnection of WDDL gates.

The switching factor of WDDL is 100%. A WDDL gate consists of a parallel
combination of two positive complementary gates, one calculating the
true output using the true inputs, the other the false output using the
false inputs. A positive gate produces a zero output for an all-zero input.
The AND gate and the OR gate are examples of positive gates. A
complementary gate, sometimes also referred to as a dual gate,
expresses the false output of the original logic gate using the false
inputs of the original gate. The AND gate fed with true input signals and
the OR gate fed with false input signals are two dual gates. The figure shows
the WDDL AND gate and the WDDL OR gate. In the evaluation phase,
each input signal is differential and the WDDL gate calculates its
differential output. In the pre-charge phase, the inputs to the WDDL gate
are set to 0. This puts the output of the gate at 0. A module in WDDL
pre-charges without distributing the pre-charge signal to each individual
gate. During the pre-charge phase, the input vector of the combinatorial
logic is set to all 0s. Each individual gate will eventually have all its
inputs at 0, evaluate its output to 0, and pass this 0 value to the next
gate. One could say that the pre-charge signal travels over the
combinatorial logic as a 0-wave, hence the name WDDL. There are several ways
to launch the pre-charge wave. In the figure, pre-charge operators are inserted
at the start of every combinatorial logic tree, i.e., at the inputs of the
encryption module and at the outputs of the registers. They produce an
all-zero output in the pre-charge phase (clk signal high) but let the
differential signal through during the evaluation phase (clk signal low).
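The pre-charge wave described above can be modelled at the bit level. The Python sketch below represents each WDDL signal as a (true, false) pair of rails; the function names and the small example tree are illustrative assumptions and do not come from the report's VHDL code.

```python
def wddl_and(a, b):
    """WDDL AND: AND gate on the true rails, dual OR gate on the false rails."""
    (a_t, a_f), (b_t, b_f) = a, b
    return (a_t & b_t, a_f | b_f)


def wddl_or(a, b):
    """WDDL OR: OR gate on the true rails, dual AND gate on the false rails."""
    (a_t, a_f), (b_t, b_f) = a, b
    return (a_t | b_t, a_f & b_f)


def precharge_operator(value, clk):
    """clk high: all-zero output; clk low: differential encoding of value."""
    return (0, 0) if clk else (value, 1 - value)


def small_tree(a_bit, b_bit, clk):
    """Example logic tree (A AND B) OR A, fed through pre-charge operators."""
    a = precharge_operator(a_bit, clk)
    b = precharge_operator(b_bit, clk)
    return wddl_or(wddl_and(a, b), a)


for a_bit in (0, 1):
    for b_bit in (0, 1):
        pre = small_tree(a_bit, b_bit, clk=1)   # pre-charge phase
        out = small_tree(a_bit, b_bit, clk=0)   # evaluation phase
        print(a_bit, b_bit, pre, out, "rails high:", sum(out))
```

For every input combination the pre-charge phase yields (0, 0) at the output without any gate receiving its own pre-charge signal, and the evaluation phase leaves exactly one rail high. Each cycle therefore contains one rising output rail regardless of the data.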

Figure: WDDL pre-charge wave generation

CHAPTER 7 WDDL GATES

The methodology used in the project is a bottom-up approach. Lower-level
modules are designed and later integrated to form larger modules, whose further integration
leads to the final top module. Since logic gates form the lowest-level modules,
the logic gates required for the design are first implemented in WDDL style. WDDL
demands a parallel combination of two positive complementary gates, one calculating the
true value and the other the false value. Logic gates such as OR, AND and XOR have been
implemented, as well as a full adder, a 32-bit XOR, etc.

7.1 WDDL OR gate

A WDDL OR gate is constructed by placing a conventional
OR gate in parallel with its complementary gate, i.e., an AND gate, as shown in the following
figure.

Figure 4.1 WDDL OR Gate

The NAND and NOT gates used act as the pre-charge operator, injecting
the signal '0' into the gates when the 'clk' signal is high, i.e., during the pre-charge phase.

7.2 WDDL AND gate

A WDDL AND gate is constructed by placing a conventional
AND gate in parallel with its complementary gate, i.e., an OR gate, as shown in the following
figure.

Figure 4.2 WDDL AND Gate

7.3 WDDL NAND gate

A WDDL NAND gate is constructed by
placing a conventional AND gate in parallel with its complementary gate, i.e., an OR gate, as
shown in the following figure.

Figure 4.3 WDDL NAND Gate

7.4 WDDL NOR gate

A WDDL NOR gate is constructed by
placing a conventional OR gate in parallel with its complementary gate, i.e., an AND gate, as
shown in the following figure.

Figure 4.4 WDDL NOR Gate

7.5 WDDL XOR gate

The XOR function can be implemented by the
Boolean equation A ⊕ B = AB' + A'B. Therefore, the XOR gate is implemented
in terms of AND gates and OR gates, as shown in Figure 4.5. It can also be implemented
by instantiating a WDDL AND gate and a WDDL OR gate, but the number of gates
involved in the latter is greater than in the former. Therefore, the first method of
implementation is followed rather than the second.
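The Boolean equation above can be written out for differential signals. The Python sketch below is a bit-level model, not the report's VHDL: it assumes that the complemented inputs A' and B' are taken from the false rails, and that the false output is produced by the dual network (A + B')(A' + B). The function names are illustrative.

```python
def encode(bit):
    """Differential (true, false) encoding of a single bit."""
    return (bit, 1 - bit)


def wddl_xor(a, b):
    """XOR on differential signals using only AND and OR operations."""
    (a_t, a_f), (b_t, b_f) = a, b
    true_out = (a_t & b_f) | (a_f & b_t)    # A.B' + A'.B
    false_out = (a_t | b_f) & (a_f | b_t)   # dual network: (A + B').(A' + B)
    return (true_out, false_out)


def wddl_xor32(x, y):
    """32-bit XOR built by applying the one-bit gate to each bit position."""
    out = 0
    for i in range(32):
        t, f = wddl_xor(encode((x >> i) & 1), encode((y >> i) & 1))
        assert t + f == 1  # exactly one rail is high after evaluation
        out |= t << i
    return out


print(wddl_xor((0, 0), (0, 0)))                     # (0, 0)
print(hex(wddl_xor32(0xDEADBEEF, 0x12345678)))
```

With all-zero (pre-charged) inputs both output rails are 0, so the gate passes the pre-charge wave on. The 32-bit version mirrors the approach of instantiating the one-bit module once per bit.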

Figure 4.5 WDDL XOR gate

With the help of the above basic gates, a full adder circuit has been
designed by instantiating the WDDL gates designed above. During the implementation of
the Blowfish algorithm, a 32-bit XOR gate and a 32-bit adder circuit are required. They can
be easily implemented by instantiating the corresponding lower-level module 32 times.

CHAPTER 8 FRONT END RESULTS

WDDL OR GATE

Synthesis Report

=========================================================================
                              Final Report
=========================================================================
Final Results
RTL Top Level Output File Name     : wddlor.ngr
Top Level Output File Name         : wddlor
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
IOs                                : 5

Cell Usage
BELS                               : 2
    LUT3                           : 2
IO Buffers                         : 5
    IBUF                           : 3
    OBUF                           : 2
=========================================================================
Device utilization summary
---------------------------
Selected Device                    : 3s250eft256-4
Number of Slices                   : 1 out of 2448   0%
Number of 4 input LUTs             : 2 out of 4896   0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 172    2%

Timing Summary
---------------
Speed Grade                        : -4
Maximum combinational path delay   : 6.236ns

Simulation Result

Synthesis Result

WDDL AND GATE

Synthesis Report

=========================================================================
                              Final Report
=========================================================================
Final Results
RTL Top Level Output File Name     : wddlgates.ngr
Top Level Output File Name         : wddlgates
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
IOs                                : 5

Cell Usage
BELS                               : 2
    LUT3                           : 2
IO Buffers                         : 5
    IBUF                           : 3
    OBUF                           : 2
=========================================================================
Device utilization summary
---------------------------
Selected Device                    : 3s250etq144-4
Number of Slices                   : 1 out of 2448   0%
Number of 4 input LUTs             : 2 out of 4896   0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 108    4%

Timing Summary
---------------
Speed Grade                        : -4
Maximum combinational path delay   : 6.236ns

Simulation Result

Synthesis Result

WDDL NAND GATE

Synthesis Report

=========================================================================
                              Final Report
=========================================================================
Final Results
RTL Top Level Output File Name     : wddlnand1.ngr
Top Level Output File Name         : wddlnand1
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
IOs                                : 5

Cell Usage
BELS                               : 2
    LUT3                           : 2
IO Buffers                         : 5
    IBUF                           : 3
    OBUF                           : 2
=========================================================================
Device utilization summary
Selected Device                    : 3s500efg320-4
Number of Slices                   : 1 out of 4656   0%
Number of 4 input LUTs             : 2 out of 9312   0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 232    2%

Timing Summary
Speed Grade                        : -4
Maximum combinational path delay   : 6.236ns

Simulation Result

Synthesis Result

WDDL XOR GATE

Simulation Result

Synthesis Result

WDDL XOR GATE

Synthesis Report

=========================================================================
                              Final Report
=========================================================================
Final Results
RTL Top Level Output File Name     : wddlxorgate.ngr
Top Level Output File Name         : wddlxorgate
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
IOs                                : 5

Cell Usage
BELS                               : 2
    LUT3                           : 2
IO Buffers                         : 5
    IBUF                           : 3
    OBUF                           : 2
=========================================================================
Device utilization summary
---------------------------
Selected Device                    : 3s250eft256-4
Number of Slices                   : 1 out of 2448   0%
Number of 4 input LUTs             : 2 out of 4896   0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 172    2%

Timing Summary
---------------
Speed Grade                        : -4
Maximum combinational path delay   : 6.236ns

Simulation Result

Synthesis Result

CHAPTER 9 SUMMARY AND CONCLUSION

9.1 Summary

In order to secure ICs against side-channel attacks, especially
Differential Power Analysis (DPA), it is necessary to implement the design in a logic style that
renders constant power dissipation irrespective of the input combination. WDDL has
proved advantageous over the other styles and is therefore of great significance. In this
dissertation work, an architecture for the Blowfish algorithm is designed and implemented in
WDDL style. In this implementation a bottom-up approach is used. The low-level entities
are designed first and are later combined to form the entire module. The key
scheduling is online. The subkeys generated for a particular key can be used for the
encryption of all the data to be encrypted with that key. The subkeys are supplied in
reverse order for the decryption data path. Initially, logic gates are implemented in
WDDL, and then higher-level modules are designed by instantiating the WDDL gates to
form the entire module, thus resulting in constant power dissipation irrespective of the
input data combination. The entire design works in two phases, namely the pre-charge phase and
the evaluation phase. In the pre-charge phase all the signals of the design are zeroed, and
during the evaluation phase the functionality of the design is achieved. This sort of design
has been found simple and very effective in thwarting the side-channel attack known as
Differential Power Analysis (DPA).

9.2 Conclusion

The crypto processor has been
designed for a key size of 448 bits and a plaintext of 64 bits. The code for the
implementation has been written in VHDL. Functional verification has been done using
the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The
back end of the design is done using SoC Encounter. The desired functionality has been
achieved according to the specifications. In the output, during the evaluation phase, there
has been the same number of transitions, thus resulting in constant power dissipation. During
synthesis it has been observed that a simple WDDL gate comprises many conventional
gates. Therefore the area of the design has grown nearly three-fold compared to the
design implemented in conventional CMOS logic; this is the cost of the security incorporated into
the IC against Differential Power Analysis (DPA). Due to the constant power dissipation at
the output, an attacker cannot apply the DPA scheme to find the
secret key that is being used in the crypto-processor. Thus security against DPA is
incorporated into the IC at the hardware level by implementing the design in WDDL style,
which is quite simple and effective.


consumption independently of signal transitions removes the foundation of DPA and is an

effective means to halt DPA

1.2 Objective of the Project

The main objectives of this dissertation are:

Study of constant-power logic styles

Description of WDDL Gates

Implementation of WDDL Logic Gates

Verification of the functionality of WDDL Logic Gates

Synthesis of the design

Analysis of the reports obtained during simulation and synthesis

CHAPTER 2 REVIEW OF LITERATURE

2.1 Introduction to Digital Design Flow

A typical digital design flow for any IC is as follows: design entry (specification,
architecture, RTL coding and RTL verification); synthesis and post-synthesis
verification; back end (floor planning, place and route, layout); and tape-out to the foundry to get
the end product. All modern digital designs start with a designer writing a hardware description
of the IC in an HDL (Hardware Description Language) such as Verilog or VHDL. A Verilog or
VHDL program essentially describes the hardware (logic gates, flip-flops, counters, etc.), the
interconnection of the circuit blocks, and the functionality. Various CAD tools are available to
synthesize a circuit based on the HDL.

2.2 Secure Digital Design Flow

The secure digital design flow is depicted in Figure 2.1. In addition to the
regular steps in an IC design (logic design, logic synthesis, place & route,
stream out and verifications), one can recognize two additional steps,
namely 1) "cell substitution" and 2) "interconnect decomposition". These
operations have been inserted in the back end of the flow and do not
interfere with the creative part of a design, indicated by the "logic design"
task.

Figure 2.1 Secure Digital Design Flow

During the cell substitution step, cells designed in a constant-power logic style
replace the conventional CMOS gates. This ensures the security of the ICs against power
analysis attacks.

CHAPTER 3 HARDWARE DESCRIPTIVE LANGUAGE (VHDL)

Why (V) HDL

Interoperability

Technology independence

Design reuse

Several levels of abstraction

Readability

Standard language

Widely supported

What is VHDL

VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed IC)

Design specification language

Design entry language

Design simulation language

Design documentation language

An alternative to schematics

Brief History

VHDL was developed in the early 1980s for managing design problems that involved
large circuits and multiple teams of engineers

Funded by the US Department of Defense

The first publicly available version was released in 1985

In 1986 IEEE (Institute of Electrical and Electronics Engineers, Inc.) was presented
with a proposal to standardize VHDL

In 1987 standardization => IEEE 1076-1987

An improved version of the language was released in 1994 => IEEE standard 1076-1993

Related Standards

IEEE 1076 doesn't support simulation conditions such as unknown and high-impedance

Soon after IEEE 1076-1987 was released, simulator companies began using their own
non-standard types => VHDL was becoming nonstandard

The IEEE 1164 standard was developed by the IEEE

IEEE 1164 contains definitions for a nine-valued data type, std_logic

IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate to actual
hardware

Defines e.g. two numeric types: signed and unsigned

VHDL Environment

Design Units

Segments of VHDL code can be compiled separately and stored in a library

Entities

A black box with interface definition

Defines the inputs/outputs of a component (define pins)

A way to represent modularity in VHDL

Similar to symbol in schematic

Entity declaration describes entity

E.g.

entity Comparator is
    port (A, B : in  std_logic_vector(7 downto 0);
          EQ   : out std_logic);
end Comparator;

Ports

Provide channels of communication between the component and its environment

Each port must have a name direction and a type

An entity may have NO port declaration

Port directions

In: The value of a port can be read inside the component but cannot be assigned.
Multiple reads of the port are allowed.

Out: Assignments can be made to a port, but data from a port cannot be read. Multiple
assignments are allowed.

Inout: Bi-directional; assignments can be made and data can be read. Multiple
assignments are allowed.

Buffer: An out port with read capability. May have at most one assignment (buffer ports
are not recommended).

Architectures

Every entity has at least one architecture

One entity can have several architectures

Architectures can describe a design using

– Behavior
– Structure
– Dataflow

Architectures can describe a design on many levels

– Gate level
– RTL (Register Transfer Level)
– Behavioral level

Configuration declaration links architecture to entity

E.g.

architecture Comparator1 of Comparator is
begin
    EQ <= '1' when (A = B) else '0';
end Comparator1;

Configurations

Links entity declaration and architecture body together

Concept of default configuration is a bit messy in VHDL '87

– Last architecture analyzed links to entity

Can be used to change simulation behavior without re-analyzing the VHDL source

Complex configuration declarations are ignored in synthesis

Some entities can have e.g. a gate-level architecture and a behavioral architecture

Are always optional

Packages

Packages contain information common to many design units

1. Package declaration

– Constant declarations
– Type and subtype declarations
– Function and procedure declarations
– Global signal declarations
– File declarations
– Component declarations

2. Package body

– Is not necessarily needed
– Function bodies
– Procedure bodies

Packages are meant for encapsulating data which can be shared globally among several design
units. A package consists of a declaration part and an optional body part.

Package declaration can contain

– Type and subtype declarations
– Subprograms
– Constants
– Alias declarations
– Global signal declarations
– File declarations
– Component declarations

Package body consists of

– Subprogram declarations and bodies
– Type and subtype declarations
– Deferred constants
– File declarations

Libraries

Collection of VHDL design units (database)

1 Packages

package declaration

package body

2 Entities (entity declaration)

3 Architectures (architecture body)

4 Configurations (configuration declarations)

Usually directory in UNIX file system

Can be also any other kind of database

Levels of Abstraction

VHDL supports many possible styles of design description which differ primarily in how

closely they relate to the HW

It is possible to describe a circuit in a number of ways

Structural-------

Dataflow ------- Higher level of abstraction

Behavioral -------

Structural VHDL description

Circuit is described in terms of its components

From a low-level description (eg transistor-level description) to a high level

description (eg block diagram)

For large circuits low-level descriptions quickly become impractical

Dataflow VHDL Description

Circuit is described in terms of how data moves through the system

In the dataflow style you describe how information flows between registers in the
system

The combinational logic is described at a relatively high level; the placement and
operation of registers is specified quite precisely

The behavior of the system over time is defined by registers

There are no built-in registers in the VHDL language

– Either a lower-level description
– or a behavioral description of sequential elements is needed

The lower-level register descriptions must be created or obtained

If there are no third-party models for registers => you must write the behavioral
description of the registers

The behavioral description can be provided in the form of subprograms (functions or
procedures)

Behavioral VHDL Description

Circuit is described in terms of its operation over time

Representation might include eg state diagrams timing diagrams and algorithmic

descriptions

The concept of time may be expressed precisely using delays (e.g. A <= B after 10 ns)

If no actual delays are used, only the order of sequential operations is defined

In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed timing
specifications

The actual timing results depend on the implementation technology and the efficiency of the
synthesis tool

There are a few tools for behavioral synthesis

Concurrent Vs Sequential

Processes

Basic simulation concept in VHDL

VHDL description can always be broken up to interconnected processes

Quite similar to UNIX process

Process keyword in VHDL

Process statement is a concurrent statement

Statements inside process statements are sequential statements

A process must contain either a sensitivity list or wait statement(s), but NOT both

The sensitivity list or wait statement(s) contain the signals which wake the process up

General Format

process [(sensitivity_list)]
    process_declarative_part
begin
    process_statements
    [wait_statement]
end process;

CHAPTER 4 SMART CARD OVERVIEW

This section very briefly introduces the concept of a smart card. Basically, a smart
card is a computer embedded in a safe. It consists of a (typically 8-bit or 32-bit) processor
together with ROM, EEPROM and a small amount of RAM, and is therefore capable of
performing computations. The main goal of a smart card is to allow the execution of
cryptographic operations involving some secret parameter (the key) while not revealing this
parameter to the outside world. Conversely, the goal of the attacker is to recover this secret
parameter. The processor is embedded in a chip and connected to the outside world through
eight wires, whose role, use and position are standardized. In addition to the input/output wires,
the parts we will be most interested in are the following:

1. Power supply: Smart cards do not have an internal battery. The current they need is
provided by the smart card reader. This makes the smart card's power consumption
fairly easy for the attacker to measure.

2. Clock: Similarly, smart cards do not have an internal clock either. The clock ticks
must also be provided from the outside world. As a consequence, the
attacker can measure the card's running time with very good precision.

Smart cards are usually equipped with protection mechanisms composed of a shield (the
passivation layer), whose goal is to hide the internal behavior of the chip, and possibly sensors
that react when the shield is removed by destroying all sensitive data and preventing the card
from functioning properly.


CHAPTER 5 SIDE CHANNEL ATTACKS

"Side channel attacks" are attacks that are based on "side channel information". Side channel information is information that can be retrieved from the encryption device that is neither the plaintext to be encrypted nor the ciphertext resulting from the encryption process.

In the past, an encryption device was perceived as a unit that receives plaintext input and produces ciphertext output, and vice versa. Attacks were therefore based on knowing the ciphertext (ciphertext-only attacks), on knowing both the plaintext and the ciphertext (known-plaintext attacks), or on the ability to choose the plaintext to be encrypted and then see the results of the encryption (chosen-plaintext attacks). Today it is known that encryption devices have additional outputs, and often additional inputs, which are neither the plaintext nor the ciphertext.

Encryption devices produce timing information (information about the time that operations take) that is easily measurable, radiation of various sorts, power consumption statistics (that can be easily measured as well), and more. Often the encryption device also has additional "unintentional" inputs, such as voltage, that can be modified to cause predictable outcomes. Side channel attacks make use of some or all of this information, along with other (known) cryptanalytic techniques, to recover the key the device is using.

Side channel analysis techniques are of concern because the attacks can be mounted quickly and can sometimes be implemented using readily available hardware costing from only a few hundred dollars to thousands of dollars.

5.1 Classification of side channel attacks

The literature usually classifies side channel attacks along two orthogonal axes.

1. Invasive vs. non-invasive

Invasive attacks require de-packaging the chip to get direct access to its components. A typical example is connecting a wire to a data bus to observe the data transfers.

A non-invasive attack only exploits externally available information (the emission of which is, however, often unintentional), such as running time or power consumption.

A more recent distinction adds semi-invasive attacks. These attacks require de-packaging the chip to get access to the chip surface, but they do not tamper with the passivation layer (they do not require electrical contact to the metal surface).

2. Active vs. passive

Active attacks try to tamper with the card's proper functioning. For example, fault induction attacks try to induce errors in the computation.

In contrast, passive attacks simply observe the card's behavior during its processing, without disturbing it.

Note that these two axes are indeed orthogonal: an invasive attack may completely avoid disturbing the card's behavior, and a passive attack may require a preliminary de-packaging for the required information to be observable.

These attacks are of course not mutually exclusive. An invasive attack may, for example, serve as a preliminary step for a non-invasive one by giving a detailed description of the chip's architecture that helps to find out where to put external probes.

Smart cards are usually equipped with protection mechanisms that are supposed to react to invasive attacks (although several invasive attacks are nonetheless capable of defeating these mechanisms). On the other hand, it is worth pointing out that a non-invasive attack is completely undetectable: there is, for example, no way for a smart card to figure out that its running time is currently being measured. Other countermeasures are therefore necessary. From an economic point of view, invasive attacks are usually more expensive to deploy on a large scale, since they require individual processing of each attacked device. In this sense, non-invasive attacks constitute a bigger menace for the smart card industry.

Invasive attacks involve a relatively high capital investment for lab equipment plus a moderate investment of effort for each individual chip attacked. Non-invasive attacks require only a moderate capital investment plus a moderate investment of effort in designing an attack on a particular type of device; thereafter, the cost per device attacked is low. Semi-invasive attacks can be carried out using very cheap and simple equipment.

The attacker can gain information by:

1. Probing attacks

2. Fault induction attacks

3. Timing attacks

4. Power analysis attacks and

5. Electromagnetic timing attacks

These attacks exploit the switching behavior of digital complementary metal-oxide-semiconductor (CMOS) gates. Of all these, the power analysis attack is of major concern.

5.2 Power analysis attacks

The power consumption of a cryptographic device may provide much information about the operations that take place and the parameters involved. This is the idea of simple and differential power analysis, first introduced by Kocher et al. Like the clock ticks, the card's energy is provided by the terminal and can therefore easily be measured. Basically, to measure a circuit's power consumption, a small (e.g., 50 ohm) resistor is inserted in series with the power or ground input. The voltage difference across the resistor divided by the resistance yields the current. Well-equipped electronics labs have equipment that can digitally sample voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than 1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC can be bought for less than US$ 400.

Power analysis attacks are of two types:

1. Simple Power Analysis (SPA) attack and

2. Differential Power Analysis (DPA) attack

SPA attacks on smart cards typically take a few seconds per card, while DPA attacks can take several hours. In general, from a somewhat academic perspective, we may consider the entire internal state of the block cipher to be all the intermediate results and values that are never included in the output in normal operation. For example, DES has 16 rounds; we can consider the intermediate states state[1..15] after each round except the last as a secret internal state. Side channels typically give information about these internal states, or about the operations used in the transition of this internal state from one round to another. The type of side channel will of course determine what information is available to the attacker about these states. The attacks typically work by finding some information about the internal state of the cipher that can be learned both by guessing part of the key and checking the value directly, and additionally by some statistical property of the cipher that makes that checkable value slightly nonrandom.

5.2.1 Simple Power Analysis attack (SPA)

Simple Power Analysis is a technique that involves direct interpretation, generally by visual inspection, of power consumption measurements collected while a cryptographic operation is being performed. SPA can yield information about a device's operation as well as key material.

A trace refers to a set of power consumption measurements taken across a cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a trace containing 5000 points. The figure shows an SPA trace from a smart card performing a DES operation.

Figure: SPA monitoring from a single DES operation performed by a typical smart card. The upper trace shows the entire encryption operation, including the initial permutation, the 16 DES rounds, and the final permutation. The lower trace is a detailed view of the second and third rounds.

Because SPA can reveal the sequence of instructions executed, it can be used to break cryptographic implementations in which the execution path depends on the data being processed. For example:

DES key schedule: The DES key schedule computation involves rotating 28-bit key registers. A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will contain different SPA features if the execution paths take different branches for each.

DES permutations: DES implementations perform a variety of bit permutations. Conditional branching in software or microcode can cause significant power consumption differences for "0" and "1" bits.

Comparisons: String or memory comparison operations typically perform a conditional branch when a mismatch is found. This conditional branching causes large SPA (and sometimes timing) characteristics.

Multipliers: Modular multiplication circuits tend to leak a great deal of information about the data they process. The leakage functions depend on the multiplier design but are often strongly correlated to operand values and Hamming weights.

Exponentiators: A simple modular exponentiation function scans across the exponent, performing a squaring operation in every iteration with an additional multiplication operation for each exponent bit that is equal to "1". The exponent can be compromised if squaring and multiplication operations have different power consumption characteristics, take different amounts of time, or are separated by different code. Modular exponentiation functions that operate on two or more exponent bits at a time may have more complex leakage functions.
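The exponentiator case can be made concrete with a small sketch. It assumes an idealized attacker who can tell a squaring ("S") from a multiplication ("M") in the trace; the function names and the operation log are illustrative only and do not describe any real device.

```python
def square_and_multiply(base, exponent, modulus):
    """Left-to-right binary exponentiation that also logs the operation sequence."""
    result = 1
    ops = []
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus   # squaring in every iteration
        ops.append("S")
        if bit == "1":
            result = (result * base) % modulus  # extra multiplication for a 1 bit
            ops.append("M")
    return result, "".join(ops)


def recover_exponent(ops):
    """Read the exponent back from the S/M sequence alone."""
    bits = []
    i = 0
    while i < len(ops):
        # Each iteration starts with "S"; a following "M" marks a 1 bit.
        if i + 1 < len(ops) and ops[i + 1] == "M":
            bits.append("1")
            i += 2
        else:
            bits.append("0")
            i += 1
    return int("".join(bits), 2)


value, ops = square_and_multiply(7, 0b101101, 1009)
print(ops, recover_exponent(ops))
```

The recovery step never looks at the data values, only at the order of operations, which is why a data-dependent execution path is enough for SPA to succeed.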

5.2.2 Differential Power Analysis attack (DPA)

In addition to large-scale power variations due to the instruction sequence, there are effects correlated to the data values being manipulated. These variations tend to be smaller and are sometimes overshadowed by measurement errors and other noise. In such cases, it is still often possible to break the system using statistical functions tailored to the target algorithm.

To implement the DPA attack, an attacker first observes m encryption operations and captures power traces $T_{1..m}[1..k]$ containing k samples each. In addition, the attacker records the ciphertexts $C_{1..m}$. No knowledge of the plaintext is required. DPA uses power consumption measurements to determine whether a key block guess $K_s$ is correct. A selection function $D(C_i, b, K_s)$ computes, from ciphertext $C_i$ and the guess $K_s$, the value of a certain intermediate bit V (the target bit b). The attacker computes a k-sample differential trace $\Delta_D[1..k]$ by finding the difference between the average of the traces for which V is one and the average of the traces for which V is zero. Thus $\Delta_D[j]$ is the average over $C_{1..m}$ of the effect due to the value represented by the selection function D on the power consumption at point j. In particular,

$$\Delta_D[j] = \frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}{\sum_{i=1}^{m} D(C_i, b, K_s)} - \frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)}$$

If $K_s$ is incorrect, the bit computed using D will differ from the actual target bit for about half of the ciphertexts $C_i$. The selection function is thus effectively uncorrelated to what was actually computed by the target device. If a random function is used to divide a set into two subsets, the difference in the averages of the subsets should approach zero as the subset sizes approach infinity. Thus, if $K_s$ is incorrect, $\Delta_D[j]$ approaches zero, because trace components uncorrelated to D diminish with $1/\sqrt{m}$, causing the differential trace to become flat (the actual trace may not be completely flat, as D with $K_s$ incorrect may have a weak correlation to D with the correct $K_s$).

If $K_s$ is correct, however, the computed value for $D(C_i, b, K_s)$ will equal the actual value of target bit b with probability 1. The selection function is thus correlated to the value of the bit considered. Other data values, measurement errors, etc. that are not correlated to D approach zero. Because power consumption is correlated to data bit values, the plot of $\Delta_D$ will be flat with spikes in regions where D is correlated to the values being processed. The correct value of $K_s$ can thus be identified from the spikes in its differential trace. Four values of b correspond to each S box, providing confirmation of key block guesses. Finding all eight $K_s$ yields the entire 48-bit round subkey. The remaining 8 key bits can be found easily using exhaustive search or by analyzing one additional round. Triple DES keys can be found by analyzing an outer DES operation first, using the resulting key to decrypt the ciphertext, and attacking the next DES key. DPA can use known plaintext or known ciphertext and can find encryption or decryption keys.
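The difference-of-means procedure can be tried out on simulated data. The sketch below is a toy model under stated assumptions, not the DES attack itself: the target is a single 4-bit S-box lookup (the PRESENT cipher S-box is borrowed only as a small nonlinear table), the key block is 4 bits, each trace has 5 samples, and only one sample leaks the target bit on top of Gaussian noise. All names and parameters are hypothetical.

```python
import random

# 4-bit S-box of the PRESENT cipher, used here only as a small nonlinear toy target.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
SAMPLES = 5      # k samples per trace (assumed)
LEAK_INDEX = 2   # sample at which the target bit influences the power (assumed)


def select(c, key_guess):
    """Selection function D: predicted target bit (bit 1 of the S-box output)."""
    return (SBOX[c ^ key_guess] >> 1) & 1


def capture_traces(secret_key, m, rng, noise=0.5):
    """Simulate m traces: Gaussian noise everywhere plus the target bit at LEAK_INDEX."""
    data, traces = [], []
    for _ in range(m):
        c = rng.randrange(16)
        trace = [rng.gauss(0.0, noise) for _ in range(SAMPLES)]
        trace[LEAK_INDEX] += select(c, secret_key)
        data.append(c)
        traces.append(trace)
    return data, traces


def differential_trace(data, traces, key_guess):
    """Mean of the traces with D = 1 minus mean of the traces with D = 0, per sample."""
    ones = [t for c, t in zip(data, traces) if select(c, key_guess) == 1]
    zeros = [t for c, t in zip(data, traces) if select(c, key_guess) == 0]
    return [sum(t[j] for t in ones) / len(ones) - sum(t[j] for t in zeros) / len(zeros)
            for j in range(SAMPLES)]


def recover_key(data, traces):
    """Pick the key guess whose differential trace shows the largest spike."""
    return max(range(16),
               key=lambda g: max(abs(x) for x in differential_trace(data, traces, g)))


rng = random.Random(1)
data, traces = capture_traces(secret_key=0xB, m=2000, rng=rng)
print(hex(recover_key(data, traces)))
```

In this toy model the correct guess produces a spike of about 1 at the leaking sample, while wrong guesses stay near 0.5 or below because their predictions are only weakly correlated with the real bit.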


CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES

The power consumption of traditional standard cells and logic depends on the signal activity. When the output of a logic gate makes a 0 to 1 transition, a current comes from the power supply and charges the output capacitance. On the other hand, when the output sees a 1 to 0, a 0 to 0, or a 1 to 1 transition, no energy or only a limited amount of energy (due to short circuit or leakage) is consumed from the power supply. This is the fundamental reason why information is leaked through the power supply and why power attacks are possible. The basis of a secure digital design flow is a logic style with constant power consumption.
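The data dependence described above can be illustrated with a first-order model. The sketch assumes an ideal static CMOS gate that draws C·Vdd² from the supply on a 0 to 1 output transition and nothing otherwise (short-circuit and leakage currents are ignored); the function names and the normalized values of C and Vdd are illustrative.

```python
def supply_energy(prev_out, new_out, c_load=1.0, vdd=1.0):
    """Energy drawn from the supply for one output transition (first-order model)."""
    if prev_out == 0 and new_out == 1:
        return c_load * vdd ** 2   # the output capacitance is charged from the supply
    return 0.0                     # 1->0, 0->0 and 1->1 draw (almost) nothing


def supply_profile(output_values, c_load=1.0, vdd=1.0):
    """Energy drawn per clock step for a sequence of gate output values."""
    return [supply_energy(prev, new, c_load, vdd)
            for prev, new in zip(output_values, output_values[1:])]


# The supply profile follows the data: only the 0->1 steps show up.
print(supply_profile([0, 1, 1, 0, 0, 1]))
```

Because the profile differs for different output sequences, an observer of the supply current learns something about the data, which is exactly what a constant-power logic style is meant to prevent.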

6.1 Current Mode Logic

Current mode logic (CML), e.g., current steering logic, seems the ideal solution. This type of logic continuously draws a current from the supply and measures its state through the path that the current takes. A gate has constant power consumption if it draws a perfectly constant current from the power supply, independently of the input and output signals. To build a current source capable of generating a constant current, special circuit techniques that minimize channel length modulation have to be used.

The decisive drawback of CML, however, is its static power consumption. Even when the logic gate is not processing any data, it burns the current, which makes this logic style unacceptable for embedded battery-operated devices.

6.2 Voltage Mode Logic (CMOS circuit styles)

Voltage mode logic (VML), e.g., static CMOS logic, only draws a current from the supply to change state and measures its state by the amount of charge it stores on a capacitance. A regular standard CMOS circuit only consumes power when a capacitance gets charged and later discharged, i.e., when a gate switches state. This is the main reason that CMOS is the style of choice for every battery-operated or low-power device. It is illustrated in the figure below for a simple inverter. Thus static CMOS is the preferred logic style because of its low power consumption and high noise margins.

Figure: Standard CMOS inverter

Yet two conditions must be satisfied for VML to have constant power consumption, namely:

1) A logic gate must have exactly one switching event per signal transition.

2) The logic gate must charge a constant capacitance in that switching event.

In the figure above, all four transitions of the CMOS inverter can be distinguished when monitoring the power supply.

6.3 Dynamic Differential Logic

Dynamic differential logic, sometimes also referred to as dual-rail with pre-charge logic, fulfills the first condition. A differential logic family uses the true and the false representation of the input and output signals, and a dynamic logic family alternates pre-charge and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1, exactly one of the two output nodes evaluates to 0 in the evaluation phase to produce a differential output signal. The discharged output node is charged back to 1 in the following pre-charge phase, so that both outputs are again pre-charged to 1. In other words, every signal transition, including the events in which the input signals remain constant, is represented by an actual switching event in which the logic gate charges a capacitance. The logic families that have been introduced to thwart differential power analysis (DPA) using dynamic differential logic are:

1. Sense Amplifier Based Logic (SABL) and

2. Wave Dynamic Differential Logic (WDDL) gates
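The "one switching event per cycle" property can be checked with a short behavioral sketch. It models a single dual-rail output that is pre-charged to 1 as described above and counts how many rails must be recharged in each cycle; the function name is hypothetical and capacitances are not modeled.

```python
def charging_events_per_cycle(data_bits):
    """Count the rails charged in the pre-charge phase that follows each evaluation."""
    events = []
    out_true, out_false = 1, 1          # both rails start pre-charged to 1
    for bit in data_bits:
        # Evaluation phase: exactly one of the two rails is discharged to 0.
        out_true, out_false = bit, 1 - bit
        # Pre-charge phase: every rail that sits at 0 is charged back to 1.
        charged = int(out_true == 0) + int(out_false == 0)
        out_true, out_false = 1, 1
        events.append(charged)
    return events


# The count is 1 in every cycle, whatever the data sequence is.
print(charging_events_per_cycle([0, 0, 1, 1, 0]))
```

The count does not depend on the data, which is the first condition for constant power consumption; the second condition (equal capacitance on both rails) is a layout matter that this sketch does not capture.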

6.3.1 Sense Amplifier Based Logic (SABL)

The main advantage of SABL is that it has balanced input and output nodes and that all internal nodes connect to an output. The output capacitances can be balanced. Systematic methods have been developed to make sure that both branches of the differential pull-down network are balanced and that no memory effects are present in the network. Sense Amplifier Based Logic is illustrated in the following figure.

Figure: Sense Amplifier Based Logic AND/NAND gate

This circuit style does, however, require full custom characterization and layout. It also suffers from a high clock load, which is common to all dynamic logic gates.

6.3.2 Wave Dynamic Differential Logic Gates (WDDL)

WDDL can be implemented with static CMOS logic. Static CMOS standard cells are combined to form secure compound standard cells, which have a reduced power signature. WDDL has many advantages. It can be readily implemented from an existing standard cell library. The design flow is fully supported with accurate EDA library files that come directly from the vendor. WDDL also results in a dynamic differential logic with only a small load capacitance on the pre-charge control signal and with the low power consumption and the high noise margins of static CMOS.

Advantages of the WDDL logic style are as follows:

A major advantage of the proposed logic style is that it can be incorporated into the common Electronic Design Automation (EDA) tool flow.

No special design rules are involved in the interconnection of WDDL gates.

The switching factor of WDDL is 100%. A WDDL gate consists of a parallel combination of two positive complementary gates, one calculating the true output using the true inputs, the other the false output using the false inputs. A positive gate produces a zero output for an all-zero input. The AND gate and the OR gate are examples of positive gates. A complementary gate, sometimes also referred to as a dual gate, expresses the false output of the original logic gate using the false inputs of the original gate. The AND gate fed with true input signals and the OR gate fed with false input signals are two dual gates. The figure shows the WDDL AND gate and the WDDL OR gate. In the evaluation phase, each input signal is differential and the WDDL gate calculates its differential output. In the pre-charge phase, the inputs to the WDDL gate are set at 0. This puts the output of the gate at 0. A module in WDDL pre-charges without distributing the pre-charge signal to each individual gate. During the pre-charge phase, the input vector of the combinatorial logic is set at all 0s. Each individual gate will eventually have all its inputs at 0, evaluate its output to 0, and pass this 0 value to the next gate. One could say that the pre-charge signal travels over the combinatorial logic as a 0-wave, hence WDDL. There are several ways to launch the pre-charge wave. In the figure, a pre-charge operator is inserted at the start of every combinatorial logic tree, i.e., at the inputs of the encryption module and at the outputs of the registers. These operators produce an all-zero output in the pre-charge phase (clk signal high) but let the differential signal through during the evaluation phase (clk signal low).
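The gate construction and the 0-wave can be modeled behaviorally. In this sketch a WDDL signal is a pair (true rail, false rail); the pre-charge operator and the gate functions follow the description above, but the function names are illustrative and nothing here models timing or capacitance.

```python
def precharge(value, clk):
    """Pre-charge operator: all-zero rails while clk is high, differential pair when low."""
    if clk == 1:
        return (0, 0)
    return (value, 1 - value)


def wddl_and(a, b):
    """AND on the true rails in parallel with its dual, OR, on the false rails."""
    (a_t, a_f), (b_t, b_f) = a, b
    return (a_t & b_t, a_f | b_f)


def wddl_or(a, b):
    """OR on the true rails in parallel with its dual, AND, on the false rails."""
    (a_t, a_f), (b_t, b_f) = a, b
    return (a_t | b_t, a_f & b_f)


for clk in (1, 0):
    a, b, c = precharge(1, clk), precharge(0, clk), precharge(1, clk)
    # Small logic tree: (a AND b) OR c
    print(clk, wddl_or(wddl_and(a, b), c))
```

With clk high the all-zero inputs propagate through both gates without any per-gate pre-charge signal, and with clk low exactly one rail of every output is 1.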

Figure: WDDL pre-charge wave generation

CHAPTER 7 WDDL GATES

The methodology used in the project is the bottom-up approach. Lower modules are designed and later integrated to form larger modules, whose further integration leads to the final top module. Since logic gates form the lowest-level modules, the logic gates required for the design are first implemented in WDDL style. WDDL demands a parallel combination of two positive complementary gates, one calculating the true value and the other the false value. Logic gates such as OR, AND, and XOR have been implemented. Besides these, a full adder, a 32-bit XOR, etc. have also been implemented.

7.1 WDDL OR gate

A WDDL OR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e., an AND gate, as shown in the following figure.

Figure 4.1 WDDL OR Gate

The NAND and NOT gates used act as the pre-charge operator, injecting the signal '0' into the gates when the 'clk' signal is high, i.e., during the pre-charge phase.

7.2 WDDL AND gate

A WDDL AND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e., an OR gate, as shown in the following figure.

Figure 4.2 WDDL AND Gate

7.3 WDDL NAND gate

A WDDL NAND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e., an OR gate, as shown in the following figure.

Figure 4.3 WDDL NAND Gate

7.4 WDDL NOR gate

A WDDL NOR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e., an AND gate, as shown in the following figure.

Figure 4.4 WDDL NOR Gate

7.5 WDDL XOR gate

The XOR function can be implemented by the Boolean equation A ⊕ B = AB' + A'B. Therefore the XOR gate is implemented in terms of AND gates and OR gates, as shown in Figure 4.5. It can also be implemented by instantiating a WDDL AND gate and a WDDL OR gate, but the number of gates involved in the latter is greater than in the former. Therefore the first method of implementation is followed rather than the second.
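A rail-level sketch of the XOR equation follows. It assumes that the complemented inputs A' and B' are taken from the false rails and that the false output is the dual expression (A' + B)(A + B'); this is one plausible reading of the Boolean equation, not a transcription of the report's schematic, and the function names are hypothetical.

```python
def encode(bit):
    """Differential encoding used in the evaluation phase: (true rail, false rail)."""
    return (bit, 1 - bit)


def wddl_xor(a, b):
    """True rail: A.B' + A'.B; false rail: the dual expression (A' + B).(A + B')."""
    (a_t, a_f), (b_t, b_f) = a, b
    out_true = (a_t & b_f) | (a_f & b_t)
    out_false = (a_f | b_t) & (a_t | b_f)
    return (out_true, out_false)


def wddl_xor_word(xs, ys):
    """Bitwise XOR of two equally long lists of rail pairs, one gate instance per bit."""
    return [wddl_xor(x, y) for x, y in zip(xs, ys)]


for a in (0, 1):
    for b in (0, 1):
        print(a, b, wddl_xor(encode(a), encode(b)))
print(wddl_xor((0, 0), (0, 0)))   # pre-charge inputs give a pre-charged output
```

Both rails are built only from AND and OR, so all-zero inputs still yield an all-zero output and the pre-charge wave passes through the gate.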

Figure 4.5 WDDL XOR gate

With the help of the above basic gates, a full adder circuit has been designed by instantiating the WDDL gates designed above. During the implementation of the Blowfish algorithm, a 32-bit XOR gate and a 32-bit adder circuit are required. They can be easily implemented by instantiating the corresponding lower module 32 times.

CHAPTER 8 FRONT END RESULTS

WDDL OR GATE

Synthesis Report

=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : wddlor.ngr
Top Level Output File Name     : wddlor
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO

Design Statistics
IOs                            : 5

Cell Usage
BELS                           : 2
    LUT3                       : 2
IO Buffers                     : 5
    IBUF                       : 3
    OBUF                       : 2
=========================================================================
Device utilization summary
---------------------------
Selected Device : 3s250eft256-4
Number of Slices       : 1 out of 2448   0%
Number of 4 input LUTs : 2 out of 4896   0%
Number of IOs          : 5
Number of bonded IOBs  : 5 out of 172    2%

Timing Summary
---------------
Speed Grade : -4
Maximum combinational path delay : 6.236ns

Figure: Simulation Result

Figure: Synthesis Result

WDDL AND GATE

Synthesis Report

=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : wddlgates.ngr
Top Level Output File Name     : wddlgates
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO

Design Statistics
IOs                            : 5

Cell Usage
BELS                           : 2
    LUT3                       : 2
IO Buffers                     : 5
    IBUF                       : 3
    OBUF                       : 2
=========================================================================
Device utilization summary
---------------------------
Selected Device : 3s250etq144-4
Number of Slices       : 1 out of 2448   0%
Number of 4 input LUTs : 2 out of 4896   0%
Number of IOs          : 5
Number of bonded IOBs  : 5 out of 108    4%

Timing Summary
---------------
Speed Grade : -4
Maximum combinational path delay : 6.236ns

Figure: Simulation Result

Figure: Synthesis Result

WDDL NAND GATE

Synthesis Report

=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : wddlnand1.ngr
Top Level Output File Name     : wddlnand1
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO

Design Statistics
IOs                            : 5

Cell Usage
BELS                           : 2
    LUT3                       : 2
IO Buffers                     : 5
    IBUF                       : 3
    OBUF                       : 2
=========================================================================
Device utilization summary
---------------------------
Selected Device : 3s500efg320-4
Number of Slices       : 1 out of 4656   0%
Number of 4 input LUTs : 2 out of 9312   0%
Number of IOs          : 5
Number of bonded IOBs  : 5 out of 232    2%

Timing Summary
---------------
Speed Grade : -4
Maximum combinational path delay : 6.236ns

Figure: Simulation Result

Figure: Synthesis Result

WDDL XOR GATE

Figure: Simulation Result

Figure: Synthesis Result

WDDL XOR GATE

Synthesis Report

=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : wddlxorgate.ngr
Top Level Output File Name     : wddlxorgate
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO

Design Statistics
IOs                            : 5

Cell Usage
BELS                           : 2
    LUT3                       : 2
IO Buffers                     : 5
    IBUF                       : 3
    OBUF                       : 2
=========================================================================
Device utilization summary
---------------------------
Selected Device : 3s250eft256-4
Number of Slices       : 1 out of 2448   0%
Number of 4 input LUTs : 2 out of 4896   0%
Number of IOs          : 5
Number of bonded IOBs  : 5 out of 172    2%

Timing Summary
---------------
Speed Grade : -4
Maximum combinational path delay : 6.236ns

Figure: Simulation Result

Figure: Synthesis Result

CHAPTER 9 SUMMARY AND CONCLUSION

9.1 Summary

In order to secure ICs against side-channel attacks, especially Differential Power Analysis (DPA), it is necessary to implement the design in a logic style that dissipates constant power irrespective of the input combination. WDDL has proved advantageous over the alternatives and is therefore of great significance. In this dissertation work, an architecture for the Blowfish algorithm is designed and implemented in WDDL style. A bottom-up approach is used in this implementation: the low-level entities are designed first and are later combined to form the entire module. The key scheduling is online. The sub-keys generated for a particular key can be used for the encryption of all the data to be encrypted with that key. The sub-keys are applied in reverse order for the decryption data path. Initially, logic gates are implemented in WDDL, and then higher modules are designed by instantiating the WDDL gates to form the entire module, thus resulting in constant power dissipation irrespective of the input data combination. The entire design works in two phases, namely the pre-charge phase and the evaluation phase. In the pre-charge phase, all the signals of the design are zeroed, and during the evaluation phase the functionality of the design is achieved. This sort of design has been found simple and very effective in thwarting the side-channel attack known as Differential Power Analysis (DPA).

9.2 Conclusion

The crypto processor has been designed for a key size of 448 bits and a plaintext of 64 bits. The code for the implementation has been written in VHDL. The functional verification has been done using the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The back end of the design is done using SOC Encounter. The desired functionality has been achieved according to the specifications. In the output, during the evaluation phase, there is the same number of transitions, thus resulting in constant power dissipation. During synthesis, it has been observed that a simple WDDL gate comprises many conventional gates. Therefore the area of the design has grown nearly three-fold compared with the design implemented in conventional CMOS logic; this is the cost of the security incorporated into the IC against Differential Power Analysis (DPA). Due to the constant power dissipation at the output, an attacker cannot apply the DPA scheme to find the secret key that is being used in the crypto-processor. Thus security against DPA is incorporated into the IC at the hardware level by implementing the design in WDDL style, which is quite simple and effective.


  • INDEX
  • CHAPTER 1 INTRODUCTION AND OBJECTIVE
  • CHAPTER 2 REVIEW OF LITERATURE
  • CHAPTER 3 HARDWARE DESCRIPTIVE LANGUAGE (VHDL)
  • CHAPTER 4 SMART CARD OVERVIEW
  • CHAPTER 5 SIDE CHANNEL ATTACKS
  • CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES
Page 9: Implementation of Wddl Gates for Secure Ic Applications Sss 2004 (1)

CHAPTER 2 REVIEW OF LITERATURE

2.1 Introduction to Digital Design Flow

A typical digital design flow for any IC is as follows: design entry (specification, architecture, RTL coding, and RTL verification); synthesis and post-synthesis verification; back end (floor planning, place and route, layout); and tape-out to the foundry to get the end product. All modern digital designs start with a designer writing a hardware description of the IC in an HDL (Hardware Description Language) such as Verilog or VHDL. A Verilog or VHDL program essentially describes the hardware (logic gates, flip-flops, counters, etc.), the interconnect of the circuit blocks, and the functionality. Various CAD tools are available to synthesize a circuit based on the HDL.

2.2 Secure Digital Design Flow

The secure digital design flow is depicted in Figure 2.1. In addition to the regular steps in an IC design (logic design, logic synthesis, place & route, stream out, and verifications), one can recognize two additional steps, namely 1) "cell substitution" and 2) "interconnect decomposition". These operations have been inserted in the back end of the flow and do not interfere with the creative part of a design, indicated by the "logic design" task.

Figure 2.1 Secure Digital Design Flow

During the cell substitution step, cells designed in a constant-power logic style replace the conventional CMOS gates. This ensures the security of the ICs against power analysis attacks.
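As a rough illustration of what cell substitution does to a netlist, the sketch below rewrites a list of single-ended gates into differential compound cells. The netlist format, the cell names, and the `_t`/`_f` rail suffixes are all hypothetical and chosen only for this example; real flows operate on the tool's own netlist and library formats.

```python
# Hypothetical mapping from conventional cells to secure compound cells.
SECURE_CELLS = {"AND2": "WDDL_AND2", "OR2": "WDDL_OR2"}


def substitute_cells(netlist):
    """Replace each (cell, inputs, output) gate with its differential counterpart.

    Every single-ended net n becomes a pair of rails (n_t, n_f).
    """
    secure = []
    for cell, inputs, output in netlist:
        rails_in = [(f"{name}_t", f"{name}_f") for name in inputs]
        rails_out = (f"{output}_t", f"{output}_f")
        secure.append((SECURE_CELLS[cell], rails_in, rails_out))
    return secure


netlist = [("AND2", ["a", "b"], "n1"), ("OR2", ["n1", "c"], "y")]
for gate in substitute_cells(netlist):
    print(gate)
```

The logic designer's netlist is left unchanged; the substitution is a mechanical rewrite applied afterwards, which is why it does not interfere with the logic design task.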


CHAPTER 3 HARDWARE DESCRIPTIVE LANGUAGE (VHDL)

Why (V)HDL?

Interoperability

Technology independence

Design reuse

Several levels of abstraction

Readability

Standard language

Widely supported

What is VHDL?

VHDL = VHSIC Hardware Description Language (VHSIC = Very High Speed Integrated Circuit)

Design specification language

Design entry language

Design simulation language

Design documentation language

An alternative to schematics

Brief History

VHDL was developed in the early 1980s for managing design problems that involved large circuits and multiple teams of engineers

Funded by the US Department of Defense

The first publicly available version was released in 1985

In 1986 the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was presented with a proposal to standardize VHDL

In 1987 standardization => IEEE 1076-1987

An improved version of the language was released in 1994 => IEEE standard 1076-1993

Related Standards

IEEE 1076 doesn't support simulation conditions such as unknown and high-impedance

Soon after IEEE 1076-1987 was released, simulator companies began using their own
non-standard types => VHDL was becoming non-standard

The IEEE 1164 standard was developed by the IEEE

IEEE 1164 contains definitions for a nine-valued data type, std_logic

IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate to actual
hardware

Defines, e.g., two numeric types: signed and unsigned

VHDL Environment


Design Units

Segments of VHDL code can be compiled separately and stored in a library

Entities

A black box with interface definition

Defines the inputs/outputs of a component (defines pins)

A way to represent modularity in VHDL

Similar to symbol in schematic

Entity declaration describes entity

E.g.:

library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;

Ports

Provide channels of communication between the component and its environment

Each port must have a name direction and a type

An entity may have NO port declaration

Port directions

In: The value of the port can be read inside the component but cannot be assigned.
Multiple reads of the port are allowed.

Out: Assignments can be made to the port, but data from the port cannot be read. Multiple
assignments are allowed.

Inout: Bi-directional; assignments can be made and data can be read. Multiple
assignments are allowed.

Buffer: An out port with read capability. It may have at most one assignment (buffer ports
are not recommended).

Architectures

Every entity has at least one architecture

One entity can have several architectures

Architectures can describe a design using
– Behavior
– Structure
– Dataflow

Architectures can describe a design on many levels
– Gate level
– RTL (Register Transfer Level)
– Behavioral level

A configuration declaration links an architecture to an entity

E.g.:

architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;

Configurations

Links entity declaration and architecture body together

The concept of a default configuration is a bit messy in VHDL '87
– The last architecture analyzed links to the entity

Can be used to change simulation behavior without re-analyzing the VHDL source

Complex configuration declarations are ignored in synthesis

Some entities can have, e.g., a gate-level architecture and a behavioral architecture

Configurations are always optional

Packages

Packages contain information common to many design units

1. Package declaration
– Constant declarations
– Type and subtype declarations
– Function and procedure declarations
– Global signal declarations
– File declarations
– Component declarations

2. Package body
– Is not always needed
– Function bodies
– Procedure bodies

Packages are meant for encapsulating data that can be shared globally among several design
units. They consist of a declaration part and an optional body part.

A package declaration can contain
– Type and subtype declarations
– Subprograms
– Constants
– Alias declarations
– Global signal declarations
– File declarations
– Component declarations

A package body consists of
– Subprogram declarations and bodies
– Type and subtype declarations
– Deferred constants
– File declarations

Libraries

Collection of VHDL design units (database)

1. Packages
– package declaration
– package body

2. Entities (entity declaration)

3. Architectures (architecture body)

4. Configurations (configuration declarations)

Usually a directory in the UNIX file system

Can also be any other kind of database

Levels of Abstraction

VHDL supports many possible styles of design description, which differ primarily in how
closely they relate to the hardware.

It is possible to describe a circuit in a number of ways, listed here from the lowest to the
highest level of abstraction:

Structural
Dataflow
Behavioral

Structural VHDL description

Circuit is described in terms of its components

From a low-level description (e.g. transistor-level description) to a high-level
description (e.g. block diagram)

For large circuits, low-level descriptions quickly become impractical

Dataflow VHDL Description

Circuit is described in terms of how data moves through the system

In the dataflow style you describe how information flows between registers in the
system

The combinational logic is described at a relatively high level; the placement and
operation of registers is specified quite precisely

The behavior of the system over time is defined by registers

There are no built-in registers in the VHDL language
– Either a lower-level description
– or a behavioral description of sequential elements is needed

The lower-level register descriptions must be created or obtained

If there are no third-party models for registers => you must write the behavioral
description of the registers

The behavioral description can be provided in the form of subprograms (functions or
procedures)

Behavioral VHDL Description

Circuit is described in terms of its operation over time

Representation might include, e.g., state diagrams, timing diagrams, and algorithmic
descriptions

The concept of time may be expressed precisely using delays (e.g. A <= B after 10 ns)

If no actual delays are used, the order of sequential operations is defined

In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed timing
specifications

The actual timing results depend on the implementation technology and the efficiency of
the synthesis tool

There are a few tools for behavioral synthesis

Concurrent Vs Sequential

Processes

Basic simulation concept in VHDL

A VHDL description can always be broken up into interconnected processes

Quite similar to a UNIX process

Process keyword in VHDL

A process statement is a concurrent statement

Statements inside a process statement are sequential statements

A process must contain either a sensitivity list or wait statement(s), but NOT both

The sensitivity list or wait statement(s) contain the signals that wake the process up

General Format

[label:] process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process [label];

CHAPTER 4 SMART CARD OVERVIEW

This section very briefly introduces the concept of a smart card. Basically, a smart
card is a computer embedded in a safe. It consists of a (typically 8-bit or 32-bit) processor
together with ROM, EEPROM, and a small amount of RAM, and is therefore capable of
performing computations. The main goal of a smart card is to allow the execution of
cryptographic operations involving some secret parameter (the key) while not revealing this
parameter to the outside world. Conversely, the goal of the attacker is to recover this secret
parameter. The processor is embedded in a chip and connected to the outside world through
eight wires, whose role, use, and position are standardized. In addition to the input/output
wires, the parts we are most interested in are the following:

1. Power supply: Smart cards do not have an internal battery. The current they need is
provided by the smart card reader. This makes the smart card's power consumption fairly
easy for the attacker to measure.

2. Clock: Similarly, smart cards do not have an internal clock either. The clock ticks
must also be provided from the outside world. As a consequence, the attacker can
measure the card's running time with very good precision.

Smart cards are usually equipped with protection mechanisms composed of a shield (the
passivation layer), whose goal is to hide the internal behavior of the chip, and possibly sensors
that react when the shield is removed by destroying all sensitive data and preventing the card
from functioning properly.

CHAPTER 5 SIDE CHANNEL ATTACKS

"Side channel attacks" are attacks that are based on "side channel information". Side
channel information is information that can be retrieved from the encryption device that is
neither the plaintext to be encrypted nor the ciphertext resulting from the encryption process.

In the past, an encryption device was perceived as a unit that receives plaintext input
and produces ciphertext output, and vice versa. Attacks were therefore based on either
knowing the ciphertext (such as ciphertext-only attacks), or knowing both (such as
known-plaintext attacks), or on the ability to define what plaintext is to be encrypted and then
seeing the results of the encryption (known as chosen-plaintext attacks). Today, it is known
that encryption devices have additional outputs, and often additional inputs, which are neither
the plaintext nor the ciphertext.

Encryption devices produce timing information (information about the time that
operations take) that is easily measurable, radiation of various sorts, power consumption
statistics (that can be easily measured as well), and more. Often the encryption device also has
additional "unintentional" inputs, such as voltage, that can be modified to cause predictable
outcomes. Side channel attacks make use of some or all of this information, along with other
(known) cryptanalytic techniques, to recover the key the device is using.

Side channel analysis techniques are of concern because the attacks can be mounted
quickly and can sometimes be implemented using readily available hardware costing from only
a few hundred dollars to thousands of dollars.

5.1 Classification of side channel attacks

The literature usually classifies side channel attacks along two orthogonal axes.

1. Invasive vs. non-invasive

Invasive attacks require de-packaging the chip to get direct access to its components.
A typical example is the connection of a wire to a data bus to observe the data transfers.

A non-invasive attack only exploits externally available information (the emission of
which is, however, often unintentional), such as running time or power consumption.

A more recent distinction adds semi-invasive attacks. These attacks require de-packaging
the chip to get access to its surface but do not tamper with the passivation layer (they do
not require electrical contact to the metal surface).

2. Active vs. passive

Active attacks try to tamper with the card's proper functioning. For example, fault
induction attacks try to induce errors in the computation.

Passive attacks, by contrast, simply observe the card's behavior during its
processing, without disturbing it.

Note that these two axes are indeed orthogonal: an invasive attack may completely avoid
disturbing the card's behavior, and a passive attack may require a preliminary de-packaging
for the required information to be observable. These attacks are of course not mutually
exclusive: an invasive attack may, for example, serve as a preliminary step for a non-invasive
one by giving a detailed description of the chip's architecture that helps to find out where to
put external probes.

Smart cards are usually equipped with protection mechanisms that are supposed to
react to invasive attacks (although several invasive attacks are nonetheless capable of
defeating these mechanisms). On the other hand, it is worth pointing out that a non-invasive
attack is completely undetectable: there is, for example, no way for a smart card to figure out
that its running time is currently being measured. Other countermeasures are therefore
necessary. From an economic point of view, invasive attacks are usually more expensive to
deploy on a large scale, since they require individual processing of each attacked device. In
this sense, non-invasive attacks constitute a bigger menace for the smart card industry.

Invasive attacks involve a relatively high capital investment for lab equipment plus a
moderate investment of effort for each individual chip attacked. Non-invasive attacks require
only a moderate capital investment plus a moderate investment of effort in designing an attack
on a particular type of device; thereafter, the cost per device attacked is low. Semi-invasive
attacks can be carried out using very cheap and simple equipment.

The attacker can gain information through:

1. Probing attacks
2. Fault induction attacks
3. Timing attacks
4. Power analysis attacks, and
5. Electromagnetic timing attacks

These attacks target the switching behavior of digital
complementary metal–oxide–semiconductor (CMOS) gates. Of all these, the power analysis
attack is of major concern.

5.2 Power analysis attacks

The power consumption of a cryptographic device may provide much information
about the operations that take place and the parameters involved. This is the idea of simple and
differential power analysis, first introduced by Kocher et al. Like the clock ticks, the card's
energy is provided by the terminal and can therefore easily be measured. Basically, to
measure a circuit's power consumption, a small (e.g. 50 ohm) resistor is inserted in series with
the power or ground input. The voltage difference across the resistor divided by the resistance
yields the current. Well-equipped electronics labs have equipment that can digitally sample
voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less
than 1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a
PC can be bought for less than US$ 400.
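The measurement described above is plain Ohm's-law arithmetic, which the following minimal Python sketch makes concrete. The 5 V supply voltage, the sample values, and the function names `shunt_current` and `power_trace` are assumptions made for illustration; only the 50 ohm resistor comes from the text.

```python
def shunt_current(v_shunt, r_shunt=50.0):
    """Ohm's law: current through the series resistor, in amperes."""
    return v_shunt / r_shunt


def power_trace(v_samples, r_shunt=50.0, v_supply=5.0):
    """Convert sampled shunt voltages to approximate device power, in watts.

    The device sees the supply voltage minus the drop across the resistor.
    v_supply = 5.0 is an assumed value, not taken from the report.
    """
    trace = []
    for v in v_samples:
        current = shunt_current(v, r_shunt)
        trace.append((v_supply - v) * current)
    return trace


# Made-up voltage samples across the resistor, in volts
samples = [0.10, 0.25, 0.15]
currents = [shunt_current(v) for v in samples]
powers = power_trace(samples)
print(currents)
print(powers)
```

A sequence of such power values, one per sampling instant, is what the following sections call a trace.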

Power analysis attacks are of two types:

1. Simple Power Analysis (SPA) attack, and
2. Differential Power Analysis (DPA) attack

SPA attacks on smart cards typically take a few seconds per card, while DPA attacks
can take several hours. In general, from a somewhat academic perspective, we may consider
the internal state of a block cipher to be all the intermediate results and values that are
never included in the output in normal operation. For example, DES has 16 rounds; we can
consider the intermediate states state[1..15] after each round except the last as a secret
internal state. Side channels typically give information about these internal states, or about the
operations used in the transition of this internal state from one round to another. The type of
side channel will of course determine what information is available to the attacker about these
states. The attacks typically work by finding some information about the internal state of the
cipher that can be learned both by guessing part of the key and checking the value directly,
and additionally by some statistical property of the cipher that makes that checkable value
slightly nonrandom.

5.2.1 Simple Power Analysis attack (SPA)

Simple Power Analysis is a technique that involves direct interpretation, generally visual,
of power consumption measurements collected while a cryptographic operation is being
performed. SPA can yield information about a device's operation as well as key material.

A trace refers to a set of power consumption measurements taken across a
cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a
trace containing 5000 points. The figure, for example, shows an SPA trace from a smart card
performing a DES operation.

Figure: SPA monitoring from a single DES operation performed by a typical smart card. The
upper trace shows the entire encryption operation, including the initial permutation, the 16
DES rounds, and the final permutation. The lower trace is a detailed view of the second and
third rounds.

Because SPA can reveal the sequence of instructions executed, it can be used to break
cryptographic implementations in which the execution path depends on the data being
processed. For example:

DES key schedule: The DES key schedule computation involves rotating 28-bit key registers.
A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can
be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will
contain different SPA features if the execution paths take different branches for each.

DES permutations: DES implementations perform a variety of bit permutations. Conditional
branching in software or microcode can cause significant power consumption differences for
"0" and "1" bits.

Comparisons: String or memory comparison operations typically perform a conditional
branch when a mismatch is found. This conditional branching causes large SPA (and
sometimes timing) characteristics.

Multipliers: Modular multiplication circuits tend to leak a great deal of information about the
data they process. The leakage functions depend on the multiplier design but are often strongly
correlated to operand values and Hamming weights.

Exponentiators: A simple modular exponentiation function scans across the exponent,
performing a squaring operation in every iteration with an additional multiplication operation
for each exponent bit that is equal to "1". The exponent can be compromised if squaring and
multiplication operations have different power consumption characteristics, take different
amounts of time, or are separated by different code. Modular exponentiation functions that
operate on two or more exponent bits at a time may have more complex leakage functions.
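The exponentiator case can be illustrated with a small Python sketch. It assumes an idealized attacker who can tell a squaring ("S") from a multiplication ("M") in the trace without error; the operation log stands in for the power trace, and the names `modexp_trace` and `exponent_from_ops` are hypothetical.

```python
def modexp_trace(base, exponent, modulus):
    """Left-to-right square-and-multiply that logs 'S' and 'M' operations."""
    result = 1
    ops = []
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus   # squaring in every iteration
        ops.append("S")
        if bit == "1":
            result = (result * base) % modulus  # extra multiply for a 1 bit
            ops.append("M")
    return result, ops


def exponent_from_ops(ops):
    """Rebuild the exponent from the observed operation sequence."""
    bits = []
    for op in ops:
        if op == "S":
            bits.append("0")      # each squaring starts a new exponent bit
        else:
            bits[-1] = "1"        # a multiply marks that bit as 1
    return int("".join(bits), 2)


secret = 0b101101
value, ops = modexp_trace(7, secret, 1009)
recovered = exponent_from_ops(ops)
print("".join(ops), recovered == secret)
```

In this idealized setting a single trace is enough to read the whole exponent, which is why the text singles out implementations whose execution path depends on secret data.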

5.2.2 Differential Power Analysis attack (DPA)

In addition to large-scale power variations due to the instruction sequence, there are
effects correlated to the data values being manipulated. These variations tend to be smaller and
are sometimes overshadowed by measurement errors and other noise. In such cases, it is still
often possible to break the system using statistical functions tailored to the target algorithm.

To implement the DPA attack, an attacker first observes $m$ encryption operations and captures
power traces $T_{1..m}[1..k]$ containing $k$ samples each. In addition, the attacker records the
ciphertexts $C_{1..m}$. No knowledge of the plaintext is required. DPA analysis uses power
consumption measurements to determine whether a key block guess $K_s$ is correct. The attacker
computes a $k$-sample differential trace $\Delta_D[1..k]$ by finding the difference between the
average of the traces for which a certain intermediate value $V$ is one and the average of the
traces for which $V$ is zero. Here $V = D(C_i, b, K_s)$ is the value of target bit $b$ predicted by
the selection function $D$ from ciphertext $C_i$ under the guess $K_s$. Thus $\Delta_D[j]$ is the
average over $C_{1..m}$ of the effect due to the value represented by the selection function $D$
on the power consumption at point $j$. In particular,

$$\Delta_D[j] = \frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}{\sum_{i=1}^{m} D(C_i, b, K_s)} - \frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)}$$

If $K_s$ is incorrect, the bit computed using $D$ will differ from the actual target bit for about
half of the ciphertexts $C_i$. The selection function is thus effectively uncorrelated to what was
actually computed by the target device. If a random function is used to divide a set into two
subsets, the difference in the averages of the subsets should approach zero as the subset sizes
approach infinity.

Thus, trace components uncorrelated to $D$ diminish with $1/\sqrt{m}$, causing the
differential trace to become flat (the actual trace may not be completely flat, as $D$ with $K_s$
incorrect may have a weak correlation to $D$ with the correct $K_s$). If $K_s$ is correct, however,
the computed value for $D(C_i, b, K_s)$ will equal the actual value of target bit $b$ with
probability 1. The selection function is thus correlated to the value of the bit considered. Other
data values, measurement errors, etc. that are not correlated to $D$ approach zero. Because power
consumption is correlated to data bit values, the plot of $\Delta_D$ will be flat with spikes in
regions where $D$ is correlated to the values being processed. The correct value of $K_s$ can thus
be identified from the spikes in its differential trace. Four values of $b$ correspond to each
S box, providing confirmation of key block guesses. Finding all eight $K_s$ yields the entire
48-bit round subkey. The remaining 8 key bits can be found easily using exhaustive search or by
analyzing one additional round. Triple DES keys can be found by analyzing an outer DES operation
first, using the resulting key to decrypt the ciphertext, and attacking the next DES key. DPA can
use known plaintext or known ciphertext and can find encryption or decryption keys.
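The difference-of-means procedure can be sketched on simulated data. The Python example below is a toy model, not an attack on DES: it assumes a 4-bit key, uses the 4-bit PRESENT S-box as a stand-in nonlinear function, takes the most significant S-box output bit as the target bit, and adds the leak to one sample of otherwise Gaussian noise. All names and parameters are assumptions for illustration.

```python
import random

# 4-bit S-box of the PRESENT cipher, used only as a small nonlinear stand-in
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def selection(data, key_guess):
    """Selection function D: predicted target bit (MSB of the S-box output)."""
    return (SBOX[data ^ key_guess] >> 3) & 1


def simulate_traces(secret_key, m, k, leak_index, rng):
    """Return m known data values and m noisy k-sample traces (toy model)."""
    data = [i % 16 for i in range(m)]
    traces = []
    for d in data:
        trace = [rng.gauss(0.0, 1.0) for _ in range(k)]
        trace[leak_index] += selection(d, secret_key)  # data-dependent power
        traces.append(trace)
    return data, traces


def differential_trace(data, traces, key_guess):
    """Mean of traces with D = 1 minus mean of traces with D = 0."""
    k = len(traces[0])
    sums = [[0.0] * k, [0.0] * k]
    counts = [0, 0]
    for d, t in zip(data, traces):
        bit = selection(d, key_guess)
        counts[bit] += 1
        for j in range(k):
            sums[bit][j] += t[j]
    return [sums[1][j] / counts[1] - sums[0][j] / counts[0] for j in range(k)]


def recover_key(data, traces):
    """Pick the guess whose differential trace has the largest spike."""
    peaks = {g: max(abs(x) for x in differential_trace(data, traces, g))
             for g in range(16)}
    return max(peaks, key=peaks.get), peaks


rng = random.Random(1)
SECRET = 0xB
data, traces = simulate_traces(SECRET, m=1600, k=20, leak_index=7, rng=rng)
best_guess, peaks = recover_key(data, traces)
print(hex(best_guess))
```

For the correct guess the differential trace shows a spike close to the leak amplitude at the leaking sample, while for this S-box bit the wrong guesses stay at roughly half that height or below, mirroring the weak residual correlation mentioned above.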


CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES

The power consumption of traditional standard cells and logic depends
on the signal activity. When the output of a logic gate makes
a 0 to 1 transition, a current comes from the power supply and charges the
output capacitance. On the other hand, when the output sees a 1 to 0, a 0
to 0, or a 1 to 1 transition, no energy or only a limited amount (due to
short circuit or leakage currents) is consumed from the power supply. This is
the fundamental reason why information is leaked through the power supply
and why power attacks are possible. The basis of a secure digital design
flow is a logic style with constant power consumption.
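A rough Python model of this data dependence is sketched below. It assumes an ideal gate that draws C·Vdd² from the supply on a 0 to 1 output transition and nothing otherwise (short-circuit and leakage currents are ignored); the 10 fF load and 1.8 V supply are assumed values, and the function names are hypothetical.

```python
def supply_energy(prev_out, next_out, c_load=10e-15, vdd=1.8):
    """Energy in joules drawn from the supply for one output transition."""
    if (prev_out, next_out) == (0, 1):
        return c_load * vdd ** 2      # charging the output capacitance
    return 0.0                        # 1->0, 0->0 and 1->1 draw (almost) nothing


def energy_profile(outputs, c_load=10e-15, vdd=1.8):
    """Per-cycle supply energy for a sequence of gate output values."""
    return [supply_energy(p, n, c_load, vdd)
            for p, n in zip(outputs, outputs[1:])]


profile = energy_profile([0, 1, 1, 0, 0, 1])
print(profile)
```

Only the two 0 to 1 transitions show up in the profile, so an observer of the supply learns when the output rose, which is exactly the leak a constant-power logic style has to remove.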

6.1 Current Mode Logic

Current mode logic (CML), e.g. current steering logic, seems the
ideal solution. This type of logic continuously draws a current from the
supply and measures its state through the path that the current takes. A
gate has constant power consumption if it draws a perfectly constant
current from the power supply, independently of the input and output
signals. To build a current source capable of generating a constant current,
special circuit techniques that minimize channel length modulation have to
be used.

The decisive drawback of CML, however, is its static power
consumption. Even when the logic gate is not processing any data, it burns
current, which makes this logic style unacceptable for embedded
battery-operated devices.

6.2 Voltage Mode Logic (CMOS circuit styles)

Voltage mode logic (VML), e.g. static CMOS logic, only draws a current from the
supply to change state and measures its state by the amount of charge it stores on a
capacitance. A regular standard CMOS circuit only consumes power when a capacitance
gets charged and later discharged, i.e. when a gate switches state. This is the main reason that
CMOS is the style of choice for every battery-operated or low-power device, and it is
illustrated in the figure below for a simple inverter. Static CMOS is thus the preferred logic
style because of its low power consumption and high noise margins.

Figure: Standard CMOS inverter

Yet two conditions must be satisfied for VML to have constant power consumption,
namely:

1) A logic gate must have exactly one switching event per signal transition.
2) The logic gate must charge a constant capacitance in that switching event.

As the figure above shows, all four transitions of the CMOS inverter can be distinguished
when monitoring the power supply.

6.3 Dynamic Differential Logic

Dynamic differential logic, sometimes also referred to as dual rail with pre-charge
logic, fulfills the first condition. A differential logic family uses the true and the false
representation of the input and output signals, and a dynamic logic family alternates
pre-charge and evaluation phases. As a result, since both outputs (true and false) are
pre-charged to 1, exactly one of the two output nodes evaluates to 0 in the evaluation phase to
produce a differential output signal. The discharged output node is charged to 1 again in the
following pre-charge phase, so that both outputs are pre-charged to 1. In other words, every
signal transition, including the events in which the input signals remain constant, is
represented by an actual switching event in which the logic gate charges a capacitance. The
logic families introduced to thwart differential power analysis (DPA) using dynamic
differential logic include the following:

1. Sense Amplifier Based Logic (SABL), and
2. Wave Dynamic Differential Logic (WDDL) gates
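The "exactly one switching event per cycle" property described above can be checked with a few lines of Python. This is a behavioral sketch of a single dual-rail output pre-charged to 1, under the assumption of ideal rails; the function names and the data sequence are made up for illustration.

```python
def evaluate(value):
    """Evaluation phase: both rails start at 1 and exactly one discharges."""
    return (1, 0) if value else (0, 1)    # (true rail, false rail)


def recharge_events(rails):
    """Rails that must be charged back to 1 in the next pre-charge phase."""
    return sum(1 for rail in rails if rail == 0)


data = [0, 0, 1, 1, 0, 1]
events_per_cycle = [recharge_events(evaluate(bit)) for bit in data]
print(events_per_cycle)
```

Every cycle shows one charging event regardless of the data, including the cycles in which the value does not change. The second condition, a constant capacitance per event, is not captured by this model and has to be met at the circuit and layout level.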

6.3.1 Sense Amplifier Based Logic (SABL)

The main advantage of SABL is that it has balanced input and output nodes and that all
internal nodes connect to an output. The output capacitances can be balanced. Systematic
methods have been developed to make sure that both branches of the differential pull-down
network are balanced and that no memory effects are present in the network. Sense Amplifier
Based Logic is illustrated in the figure below.

Figure: Sense Amplifier Based Logic AND/NAND gate

This circuit style does, however, require full custom characterization and layout. It also
suffers from a high clock load, common to all dynamic logic gates.

6.3.2 Wave Dynamic Differential Logic Gates (WDDL)

WDDL can be implemented with static CMOS logic. Static CMOS
standard cells are combined to form secure compound standard cells,
which have a reduced power signature. WDDL has many advantages. It can
be readily implemented from an existing standard cell library. The design
flow is fully supported with accurate EDA library files that come directly
from the vendor. WDDL also results in a dynamic differential logic with only
a small load capacitance on the pre-charge control signal and with the low
power consumption and the high noise margins of static CMOS.

Further advantages of the WDDL logic style are as follows:

A major advantage of the logic style is that it can be incorporated into the common
Electronic Design Automation (EDA) tool flow.

No special design rules are involved in the interconnection of WDDL gates.

The switching factor of WDDL is 100%. A WDDL gate consists of a parallel
combination of two positive complementary gates, one calculating the
true output using the true inputs, the other the false output using the
false inputs. A positive gate produces a zero output for an all-zero input.
The AND gate and the OR gate are examples of positive gates. A
complementary gate, sometimes also referred to as a dual gate,
expresses the false output of the original logic gate using the false
inputs of the original gate. The AND gate fed with true input signals and
the OR gate fed with false input signals are two dual gates. The figure shows
the WDDL AND gate and the WDDL OR gate. In the evaluation phase,
each input signal is differential and the WDDL gate calculates its
differential output. In the pre-charge phase, the inputs to the WDDL gate
are set at 0. This puts the output of the gate at 0. A module in WDDL
pre-charges without distributing the pre-charge signal to each individual
gate. During the pre-charge phase, the input vector of the combinatorial
logic is set at all 0s. Each individual gate will eventually have all its
inputs at 0, evaluate its output to 0, and pass this 0 value to the next
gate. One could say that the pre-charge signal travels over the
combinatorial logic as a 0-wave, hence the name WDDL. There are several
ways to launch the pre-charge wave. In the figure, a pre-charge operator is
inserted at the start of every combinatorial logic tree, i.e. at the inputs of
the encryption module and the outputs of the registers. These operators
produce an all-zero output in the pre-charge phase (clk signal high) but let
the differential signal through during the evaluation phase (clk signal low).
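The gate construction and the 0-wave described above can be modeled behaviorally in Python. The sketch below is an assumption-laden illustration, not the report's VHDL: signals are (true rail, false rail) pairs, the pre-charge operator sits only at the inputs, and the small network f = (a AND b) OR c is a made-up example.

```python
from itertools import product


def precharge(rail_pair, clk):
    """Pre-charge operator: all-zero output while clk is high, transparent when low."""
    return (0, 0) if clk else rail_pair


def wddl_and(a, b):
    """WDDL AND: AND on the true rails, dual OR on the false rails."""
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a, b):
    """WDDL OR: OR on the true rails, dual AND on the false rails."""
    return (a[0] | b[0], a[1] & b[1])


def encode(bit):
    """Differential encoding of a logic value as (true rail, false rail)."""
    return (bit, 1 - bit)


def network(a, b, c, clk):
    """Two-level example f = (a AND b) OR c with operators only at the inputs."""
    a, b, c = (precharge(x, clk) for x in (a, b, c))
    return wddl_or(wddl_and(a, b), c)


results = {}
for a, b, c in product((0, 1), repeat=3):
    inputs = [encode(v) for v in (a, b, c)]
    results[(a, b, c)] = (network(*inputs, clk=1), network(*inputs, clk=0))
print(results[(1, 1, 0)])
```

With clk high the all-zero input propagates through both gates without any gate seeing the clock, which is the 0-wave; with clk low exactly one output rail rises, whatever the input values are.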

Figure: WDDL pre-charge wave generation

CHAPTER 7 WDDL GATES

The methodology used in the project is the bottom-up approach. Lower-level
modules are designed first and later integrated to form larger modules, whose further
integration leads to the final top module. Since logic gates form the lowest-level modules,
the logic gates required for the design are implemented first in WDDL style. WDDL
demands a parallel combination of two positive complementary gates, one calculating the
true value and the other the false value. The OR, AND, and XOR logic gates have been
implemented, along with a full adder, a 32-bit XOR, etc.

7.1 WDDL OR gate

A WDDL OR gate is constructed by placing a conventional OR gate in parallel with its
complementary gate, i.e. an AND gate, as shown in the following figure.

Figure 4.1 WDDL OR Gate

The NAND and NOT gates used act as the pre-charge operator, injecting a '0' signal into
the gates when the 'clk' signal is high, i.e. during the pre-charge phase.

7.2 WDDL AND gate

A WDDL AND gate is constructed by placing a conventional AND gate in parallel with its
complementary gate, i.e. an OR gate, as shown in the following figure.

Figure 4.2 WDDL AND Gate

7.3 WDDL NAND gate

A WDDL NAND gate is constructed by placing a conventional AND gate in parallel with
its complementary gate, i.e. an OR gate, as shown in the following figure.

Figure 4.3 WDDL NAND Gate

7.4 WDDL NOR gate

A WDDL NOR gate is constructed by placing a conventional OR gate in parallel with its
complementary gate, i.e. an AND gate, as shown in the following figure.

Figure 4.4 WDDL NOR Gate

7.5 WDDL XOR gate

The XOR function can be implemented by the Boolean equation A ⊕ B = AB' + A'B.
The XOR gate is therefore implemented in terms of AND and OR gates, as shown in
Figure 4.5. It can also be implemented by instantiating a WDDL AND gate and a WDDL OR
gate, but the number of gates involved in the latter approach is greater than in the former.
Therefore the first method of implementation is followed rather than the second.
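As a behavioral cross-check of the NAND and XOR constructions, the Python sketch below models signals as (true rail, false rail) pairs. The gate-level figures are not reproduced here, so this is one plausible reading: NAND is taken as the AND/OR pair with the output rails swapped, and XOR as AB' + A'B on the true rail with its dual on the false rail. The function names are hypothetical.

```python
def wddl_nand(a, b):
    """AND on the true rails, OR on the false rails, then swap the output rails."""
    and_true = a[0] & b[0]
    and_false = a[1] | b[1]
    return (and_false, and_true)


def wddl_xor(a, b):
    """XOR from A.B' + A'.B on the true rail and its dual on the false rail."""
    true_out = (a[0] & b[1]) | (a[1] & b[0])
    false_out = (a[0] | b[1]) & (a[1] | b[0])
    return (true_out, false_out)


def encode(bit):
    """Differential encoding of a logic value as (true rail, false rail)."""
    return (bit, 1 - bit)


table = {(a, b): (wddl_nand(encode(a), encode(b)), wddl_xor(encode(a), encode(b)))
         for a in (0, 1) for b in (0, 1)}
print(table)
```

Both gates use only AND and OR operations on the rails, so an all-zero input still yields an all-zero output, and the pre-charge wave passes through them unchanged.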

Figure 4.5 WDDL XOR gate

With the help of the above basic gates, a full adder circuit has been
designed by instantiating the WDDL gates designed above. The implementation of
the Blowfish algorithm requires a 32-bit XOR gate and a 32-bit adder circuit. They can
be easily implemented by instantiating the corresponding lower-level module 32 times.

CHAPTER 8 FRONT END RESULTS

WDDL OR GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name : wddlor.ngr
Top Level Output File Name : wddlor
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 5

Cell Usage
BELS : 2
  LUT3 : 2
IO Buffers : 5
  IBUF : 3
  OBUF : 2

Device utilization summary
Selected Device : 3s250eft256-4
Number of Slices : 1 out of 2448 (0%)
Number of 4 input LUTs : 2 out of 4896 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 172 (2%)

Timing Summary
Speed Grade : -4
Maximum combinational path delay : 6.236 ns

Simulation Result

Synthesis Result

WDDL AND GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name : wddlgates.ngr
Top Level Output File Name : wddlgates
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 5

Cell Usage
BELS : 2
  LUT3 : 2
IO Buffers : 5
  IBUF : 3
  OBUF : 2

Device utilization summary
Selected Device : 3s250etq144-4
Number of Slices : 1 out of 2448 (0%)
Number of 4 input LUTs : 2 out of 4896 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 108 (4%)

Timing Summary
Speed Grade : -4
Maximum combinational path delay : 6.236 ns

Simulation Result

Synthesis Result

WDDL NAND GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name : wddlnand1.ngr
Top Level Output File Name : wddlnand1
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 5

Cell Usage
BELS : 2
  LUT3 : 2
IO Buffers : 5
  IBUF : 3
  OBUF : 2

Device utilization summary
Selected Device : 3s500efg320-4
Number of Slices : 1 out of 4656 (0%)
Number of 4 input LUTs : 2 out of 9312 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 232 (2%)

Timing Summary
Speed Grade : -4
Maximum combinational path delay : 6.236 ns

Simulation Result

Synthesis Result

WDDL XOR GATE

Simulation Result

Synthesis Result

WDDL XOR GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name : wddlxorgate.ngr
Top Level Output File Name : wddlxorgate
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 5

Cell Usage
BELS : 2
  LUT3 : 2
IO Buffers : 5
  IBUF : 3
  OBUF : 2

Device utilization summary
Selected Device : 3s250eft256-4
Number of Slices : 1 out of 2448 (0%)
Number of 4 input LUTs : 2 out of 4896 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 172 (2%)

Timing Summary
Speed Grade : -4
Maximum combinational path delay : 6.236 ns

Simulation Result

Synthesis Result

CHAPTER 9 SUMMARY AND CONCLUSION

9.1 Summary

To secure ICs against side-channel attacks, especially Differential Power Analysis (DPA), the design must be implemented in a logic style whose power dissipation is constant irrespective of the input combination. WDDL has been shown to have advantages over the alternatives and is therefore of great significance. In this work an architecture for the Blowfish algorithm is designed and implemented in WDDL style. A bottom-up approach is used: the low-level entities are designed first and are later combined to form the entire module. The key scheduling is online. The sub-keys generated for a particular key can be used to encrypt all the data that is to be encrypted with that key. The sub-keys are applied in reverse order for the decryption data path. Initially the logic gates are implemented in WDDL, and the higher modules are then designed by instantiating the WDDL gates, resulting in constant power dissipation irrespective of the input data combination. The entire design works in two phases, namely the Precharge phase and the Evaluation phase. In the Precharge phase all the signals of the design are set to zero, and during the Evaluation phase the functionality of the design is achieved. This kind of design has been found simple and very effective in thwarting the side-channel attack known as Differential Power Analysis (DPA).

9.2 Conclusion

The crypto processor has been designed for a key size of 448 bits and a plaintext of 64 bits. The code for the implementation has been written in VHDL. Functional verification has been done using the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The back end of the design is done using SOC Encounter. The desired functionality has been achieved according to the specifications. In the output during the Evaluation phase there is the same number of transitions for every input, resulting in constant power dissipation. During synthesis it was observed that a single WDDL gate comprises several conventional gates. The area of the design has therefore grown nearly three-fold compared with the same design implemented in conventional CMOS logic; this is the cost of the security incorporated into the IC against DPA. Because the power dissipation at the output is constant, an attacker cannot apply DPA to find the secret key that is being used in the crypto-processor. Thus security against DPA is incorporated into the IC at the hardware level by implementing the design in WDDL style

which is quite simple and effective.


Figure 2.1 Secure Digital Design Flow

During the cell substitution step, cells designed in a constant-power logic style replace the conventional CMOS gates. This protects the ICs against power analysis attacks.

CHAPTER 3 HARDWARE DESCRIPTION LANGUAGE (VHDL)

Why (V)HDL?

Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported

What is VHDL?

VHDL = VHSIC Hardware Description Language (VHSIC = Very High Speed Integrated Circuit)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics

Brief History

VHDL was developed in the early 1980s for managing design problems that involved large circuits and multiple teams of engineers.
It was funded by the US Department of Defense.
The first publicly available version was released in 1985.
In 1986 the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was presented with a proposal to standardize VHDL.
In 1987 the language was standardized => IEEE 1076-1987.
An improved version of the language was released in 1994 => IEEE Standard 1076-1993.

Related Standards

IEEE 1076 itself does not define types for simulation conditions such as unknown and high impedance.
Soon after IEEE 1076-1987 was released, simulator companies began using their own non-standard types => VHDL was becoming non-standard.
The IEEE 1164 standard was developed by the IEEE in response. IEEE 1164 contains definitions for a nine-valued data type, std_logic.
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate to actual hardware.
It defines, e.g., two numeric types: signed and unsigned.
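To make the nine-valued type more concrete, the following Python sketch models how two or more drivers on a single std_logic signal are resolved. The table is the resolution table of the IEEE 1164 package as it is commonly published; the Python names (STD_LOGIC, RESOLUTION, resolve) are illustrative assumptions and do not belong to any VHDL tool.

```python
from functools import reduce

# The nine std_logic values in IEEE 1164 order:
# U uninitialized, X forcing unknown, 0, 1, Z high impedance,
# W weak unknown, L weak 0, H weak 1, - don't care.
STD_LOGIC = "UX01ZWLH-"

# Resolution table: row = first driver, column = second driver.
RESOLUTION = [
    "UUUUUUUUU",  # U dominates everything
    "UXXXXXXXX",  # X
    "UX0X0000X",  # 0
    "UXX11111X",  # 1
    "UX01ZWLHX",  # Z yields to any other driver
    "UX01WWWWX",  # W
    "UX01LWLWX",  # L
    "UX01HWWHX",  # H
    "UXXXXXXXX",  # -
]


def resolve2(a, b):
    """Resolve two driver values into the value seen on the signal."""
    return RESOLUTION[STD_LOGIC.index(a)][STD_LOGIC.index(b)]


def resolve(drivers):
    """Resolve any number of drivers; a single driver is passed through."""
    drivers = list(drivers)
    if len(drivers) == 1:
        return drivers[0]
    return reduce(resolve2, drivers, "Z")
```

For example, resolve("01") models a bus conflict between a forcing 0 and a forcing 1, and resolve("0Z") models one active driver next to a tri-stated one.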

VHDL Environment

Design Units

Segments of VHDL code can be compiled separately and stored in a library.

Entities

A black box with an interface definition
Defines the inputs/outputs of a component (defines the pins)
A way to represent modularity in VHDL
Similar to a symbol in a schematic
An entity declaration describes the entity

E.g.

library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;

Ports

Provide channels of communication between the component and its environment.
Each port must have a name, a direction and a type.
An entity may have no port declaration.

Port directions

In: The value of the port can be read inside the component but cannot be assigned. Multiple reads of the port are allowed.
Out: Assignments can be made to the port, but data from the port cannot be read. Multiple assignments are allowed.
Inout: Bi-directional; assignments can be made and data can be read. Multiple assignments are allowed.
Buffer: An out port with read capability. It may have at most one assignment (buffer ports are not recommended).

Architectures

Every entity has at least one architecture.
One entity can have several architectures.
Architectures can describe a design using
- Behavior
- Structure
- Dataflow
Architectures can describe a design on many levels
- Gate level
- RTL (Register Transfer Level)
- Behavioral level
A configuration declaration links an architecture to an entity.

E.g.

architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;

Configurations

Link the entity declaration and the architecture body together.
The concept of a default configuration is a bit messy in VHDL '87
- The last architecture analyzed is linked to the entity
Can be used to change simulation behavior without re-analyzing the VHDL source.
Complex configuration declarations are ignored in synthesis.
Some entities can have, e.g., a gate-level architecture and a behavioral architecture.
Configurations are always optional.

Packages

Packages contain information common to many design units.

1. Package declaration
- Constant declarations
- Type and subtype declarations
- Function and procedure declarations
- Global signal declarations
- File declarations
- Component declarations

2. Package body
- Is not necessarily needed
- Function bodies
- Procedure bodies

Packages are meant for encapsulating data that can be shared globally among several design units. They consist of a declaration part and an optional body part.

A package declaration can contain
- Type and subtype declarations
- Subprograms
- Constants
- Alias declarations
- Global signal declarations
- File declarations
- Component declarations

A package body consists of
- Subprogram declarations and bodies
- Type and subtype declarations
- Deferred constants
- File declarations

Libraries

A collection of VHDL design units (a database)

1. Packages
- package declaration
- package body
2. Entities (entity declaration)
3. Architectures (architecture body)
4. Configurations (configuration declarations)

Usually a directory in a UNIX file system
Can also be any other kind of database

Levels of Abstraction

VHDL supports many possible styles of design description, which differ primarily in how closely they relate to the hardware.

It is possible to describe a circuit in a number of ways:

Structural -> Dataflow -> Behavioral (increasing level of abstraction)

Structural VHDL Description

The circuit is described in terms of its components.
The description can range from low level (e.g. a transistor-level description) to high level (e.g. a block diagram).
For large circuits, low-level descriptions quickly become impractical.

Dataflow VHDL Description

The circuit is described in terms of how data moves through the system.
In the dataflow style you describe how information flows between the registers in the system.
The combinational logic is described at a relatively high level; the placement and operation of registers is specified quite precisely.
The behavior of the system over time is defined by the registers.
There are no built-in registers in the VHDL language
- Either a lower-level description
- or a behavioral description of sequential elements is needed
The lower-level register descriptions must be created or obtained.
If there are no third-party models for registers => you must write the behavioral description of the registers.
The behavioral description can be provided in the form of subprograms (functions or procedures).

Behavioral VHDL Description

The circuit is described in terms of its operation over time.
The representation might include, e.g., state diagrams, timing diagrams and algorithmic descriptions.
The concept of time may be expressed precisely using delays (e.g. A <= B after 10 ns).
If no actual delays are used, only the order of the sequential operations is defined.
At the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed timing specifications.
The actual timing results depend on the implementation technology and on the efficiency of the synthesis tool.
There are a few tools for behavioral synthesis.

Concurrent vs Sequential

Processes

The basic simulation concept in VHDL
A VHDL description can always be broken up into interconnected processes
Quite similar to a UNIX process

Process keyword in VHDL

A process statement is a concurrent statement.
Statements inside a process statement are sequential statements.
A process must contain either a sensitivity list or wait statement(s), but NOT both.
The sensitivity list or the wait statement(s) name the signals that wake the process up.

General Format

process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;

CHAPTER 4 SMART CARD OVERVIEW

This section briefly introduces the concept of a smart card. Basically, a smart card is a computer embedded in a safe. It consists of a (typically 8-bit or 32-bit) processor together with ROM, EEPROM and a small amount of RAM, and is therefore capable of performing computations. The main goal of a smart card is to allow the execution of cryptographic operations involving some secret parameter (the key) while not revealing this parameter to the outside world. Conversely, the goal of the attacker is to recover this secret parameter. The processor is embedded in a chip and connected to the outside world through eight wires, whose role, use and position are standardized. In addition to the input/output wires, the parts we are most interested in are the following:

1. Power supply: Smart cards do not have an internal battery. The current they need is provided by the smart card reader. This makes the smart card's power consumption fairly easy for the attacker to measure.

2. Clock: Similarly, smart cards do not have an internal clock. The clock ticks must also be provided from the outside world. As a consequence, the attacker can measure the card's running time with very good precision.

Smart cards are usually equipped with protection mechanisms composed of a shield (the passivation layer), whose goal is to hide the internal behavior of the chip, and possibly sensors that react when the shield is removed by destroying all sensitive data and preventing the card from functioning properly.

CHAPTER 5 SIDE CHANNEL ATTACKS

"Side channel attacks" are attacks that are based on "side channel information". Side channel information is information that can be retrieved from the encryption device and that is neither the plaintext to be encrypted nor the ciphertext resulting from the encryption process.

In the past, an encryption device was perceived as a unit that receives plaintext input and produces ciphertext output, and vice versa. Attacks were therefore based on knowing the ciphertext (ciphertext-only attacks), on knowing both plaintext and ciphertext (known-plaintext attacks), or on the ability to define what plaintext is to be encrypted and then seeing the results of the encryption (chosen-plaintext attacks). Today it is known that encryption devices have additional outputs, and often additional inputs, which are neither the plaintext nor the ciphertext.

Encryption devices produce timing information (information about the time that operations take) that is easily measurable, radiation of various sorts, power consumption statistics (that can be easily measured as well), and more. Often the encryption device also has additional "unintentional" inputs, such as the supply voltage, that can be modified to cause predictable outcomes. Side channel attacks make use of some or all of this information, along with other (known) cryptanalytic techniques, to recover the key the device is using.

Side channel analysis techniques are of concern because the attacks can be mounted quickly and can sometimes be implemented using readily available hardware costing from only a few hundred dollars to thousands of dollars.

5.1 Classification of side channel attacks

The literature usually classifies side channel attacks along two orthogonal axes.

1. Invasive vs non-invasive

Invasive attacks require de-packaging the chip to get direct access to its components. A typical example is connecting a wire to a data bus to observe the data transfers.

A non-invasive attack exploits only externally available information (whose emission is, however, often unintentional), such as running time and power consumption.

A further category is the semi-invasive attack. These attacks require de-packaging of the chip to get access to the chip surface but do not tamper with the passivation layer (i.e. they do not require electrical contact with the metal surface).

2. Active vs passive

Active attacks try to tamper with the card's proper functioning. For example, fault induction attacks try to induce errors in the computation.

Passive attacks, in contrast, simply observe the card's behavior during its processing without disturbing it.

Note that these two axes are indeed orthogonal: an invasive attack may completely avoid disturbing the card's behavior, and a passive attack may require a preliminary de-packaging for the required information to be observable.

These attacks are of course not mutually exclusive: an invasive attack may, for example, serve as a preliminary step for a non-invasive one by giving a detailed description of the chip's architecture that helps to find out where to put external probes.

Smart cards are usually equipped with protection mechanisms that are supposed to react to invasive attacks (although several invasive attacks are nonetheless capable of defeating these mechanisms). On the other hand, it is worth pointing out that a non-invasive attack is completely undetectable: there is, for example, no way for a smart card to figure out that its running time is currently being measured. Other countermeasures are therefore necessary. From an economic point of view, invasive attacks are usually more expensive to deploy on a large scale, since they require individual processing of each attacked device. In this sense non-invasive attacks constitute a bigger menace for the smart card industry.

Invasive attacks involve a relatively high capital investment in lab equipment plus a moderate investment of effort for each individual chip attacked. Non-invasive attacks require only a moderate capital investment plus a moderate investment of effort in designing an attack on a particular type of device; thereafter the cost per device attacked is low. Semi-invasive attacks can be carried out using very cheap and simple equipment.

The attacker can gain information by

1. Probing attacks
2. Fault induction attacks
3. Timing attacks
4. Power analysis attacks, and
5. Electromagnetic timing attacks

These attacks exploit the switching behavior of digital complementary metal-oxide-semiconductor (CMOS) gates. Of all these, the power analysis attack is of major concern.

5.2 Power analysis attacks

The power consumption of a cryptographic device may provide much information about the operations that take place and the parameters involved. This is the idea of simple and differential power analysis, first introduced by Kocher et al. Like the clock ticks, the card's energy is also provided by the terminal and can therefore easily be measured. Basically, to measure a circuit's power consumption, a small (e.g. 50 ohm) resistor is inserted in series with the power or ground input. The voltage difference across the resistor divided by the resistance yields the current. Well-equipped electronics labs have equipment that can digitally sample voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than 1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC can be bought for less than US$ 400.
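The measurement described above is plain Ohm's law, which the short Python sketch below works through. The numeric values (5 V supply, 0.1 V across the resistor) and the function names are assumptions chosen only for illustration; they are not taken from the report.

```python
def supply_current(v_shunt, r_shunt):
    """Current through the series resistor (amperes), by Ohm's law."""
    return v_shunt / r_shunt


def instantaneous_power(v_supply, v_shunt, r_shunt):
    """Power delivered to the card (watts).

    The card sees the supply voltage minus the drop across the resistor,
    and carries the same current as the resistor.
    """
    current = supply_current(v_shunt, r_shunt)
    return (v_supply - v_shunt) * current


# Assumed example: 5 V supply, 50 ohm resistor, 0.1 V measured across it.
current = supply_current(0.1, 50.0)
power = instantaneous_power(5.0, 0.1, 50.0)
print(f"current = {current * 1e3:.2f} mA, power = {power * 1e3:.2f} mW")
```

Sampling v_shunt repeatedly during one cryptographic operation and converting each sample this way gives the power trace that the attacks in this chapter work on.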

Power analysis attacks are of two types:

1. Simple Power Analysis (SPA) attack, and
2. Differential Power Analysis (DPA) attack

SPA attacks on smart cards typically take a few seconds per card, while DPA attacks can take several hours. In general, from a somewhat academic perspective, we may consider the internal state of a block cipher to be all the intermediate results and values that are never included in the output in normal operation. For example, DES has 16 rounds; we can consider the intermediate states state[1..15] after each round except the last as a secret internal state. Side channels typically give information about these internal states, or about the operations used in the transition of the internal state from one round to the next. The type of side channel determines what information about these states is available to the attacker. The attacks typically work by finding some information about the internal state of the cipher that can be learned both by guessing part of the key and checking the value directly, and additionally through some statistical property of the cipher that makes that checkable value slightly nonrandom.

5.2.1 Simple Power Analysis attack (SPA)

Simple Power Analysis is a technique that involves direct interpretation of power consumption measurements collected during cryptographic operations, generally by looking at a visual representation of the power consumption of a unit while an encryption operation is being performed. SPA can yield information about a device's operation as well as key material.

A trace refers to a set of power consumption measurements taken across a cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a trace containing 5000 points. The figure, for example, shows an SPA trace from a smart card performing a DES operation.
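The trace length quoted above follows directly from duration times sampling rate. The small Python sketch below reproduces that arithmetic; the helper names and the one-byte-per-sample storage figure are assumptions made for illustration.

```python
def samples_in_trace(duration_us, sample_rate_mhz):
    """Number of points in one trace.

    Microseconds times megahertz gives a plain sample count, so integer
    inputs keep the arithmetic exact.
    """
    return duration_us * sample_rate_mhz


def storage_bytes(num_traces, points_per_trace, bytes_per_point=1):
    """Rough storage needed for a set of traces (assumed 1 byte per sample)."""
    return num_traces * points_per_trace * bytes_per_point


points = samples_in_trace(1000, 5)    # the 1 ms operation sampled at 5 MHz
print(points, storage_bytes(1000, points))
```

The same helper shows that a 20 MHz sampler would record 20000 points for the same 1 ms operation.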

Figure: SPA monitoring from a single DES operation performed by a typical smart card. The upper trace shows the entire encryption operation, including the initial permutation, the 16 DES rounds and the final permutation. The lower trace is a detailed view of the second and third rounds.

Because SPA can reveal the sequence of instructions executed, it can be used to break cryptographic implementations in which the execution path depends on the data being processed. For example:

DES key schedule: The DES key schedule computation involves rotating 28-bit key registers. A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will contain different SPA features if the execution paths take different branches for each.

DES permutations: DES implementations perform a variety of bit permutations. Conditional branching in software or microcode can cause significant power consumption differences for "0" and "1" bits.

Comparisons: String or memory comparison operations typically perform a conditional branch when a mismatch is found. This conditional branching causes large SPA (and sometimes timing) characteristics.

Multipliers: Modular multiplication circuits tend to leak a great deal of information about the data they process. The leakage functions depend on the multiplier design but are often strongly correlated to operand values and Hamming weights.

Exponentiators: A simple modular exponentiation function scans across the exponent, performing a squaring operation in every iteration with an additional multiplication operation for each exponent bit that is equal to "1". The exponent can be compromised if squaring and multiplication operations have different power consumption characteristics, take different amounts of time, or are separated by different code. Modular exponentiation functions that operate on two or more exponent bits at a time may have more complex leakage functions.
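The exponentiator case can be illustrated with a minimal Python sketch of left-to-right square-and-multiply. The operation log ("S" for square, "M" for multiply) stands in for what an attacker might read off a power trace if the two operations look different; the function names and the log format are assumptions for illustration, not part of any real implementation.

```python
def modexp_with_log(base, exponent, modulus):
    """Left-to-right square-and-multiply that records its operation sequence."""
    result = 1
    ops = []
    for bit in bin(exponent)[2:]:          # exponent bits, most significant first
        result = (result * result) % modulus
        ops.append("S")                    # a squaring happens for every bit
        if bit == "1":
            result = (result * base) % modulus
            ops.append("M")                # a multiply happens only for 1 bits
    return result, "".join(ops)


def exponent_from_ops(ops):
    """Rebuild the exponent from the S/M sequence alone."""
    bits = []
    for op in ops:
        if op == "S":
            bits.append("0")               # assume a 0 bit until an M follows
        else:
            bits[-1] = "1"                 # an M marks the current bit as 1
    return int("".join(bits), 2)


value, ops = modexp_with_log(7, 0b101101, 1000003)
print(ops, exponent_from_ops(ops))
```

The secret exponent is recovered without touching the result of the computation, which is why implementations try to make squarings and multiplications indistinguishable.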

5.2.2 Differential Power Analysis attack (DPA)

In addition to large-scale power variations due to the instruction sequence, there are effects correlated to the data values being manipulated. These variations tend to be smaller and are sometimes overshadowed by measurement errors and other noise. In such cases it is still often possible to break the system using statistical functions tailored to the target algorithm.

To implement the DPA attack, an attacker first observes $m$ encryption operations and captures power traces $T_{1..m}[1..k]$ containing $k$ samples each. In addition, the attacker records the ciphertexts $C_{1..m}$. No knowledge of the plaintext is required. DPA analysis uses power consumption measurements to determine whether a key block guess $K_s$ is correct. The attacker computes a $k$-sample differential trace $\Delta_D[1..k]$ by finding the difference between the average of the traces for which a certain intermediate value V is one and the average of the traces for which V is zero, where V is the target bit $b$ predicted by the selection function $D(C_i, b, K_s)$. Thus $\Delta_D[j]$ is the average over $C_{1..m}$ of the effect due to the value represented by the selection function $D$ on the power consumption at point $j$. In particular,

\[
\Delta_D[j] = \frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}{\sum_{i=1}^{m} D(C_i, b, K_s)} - \frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)}
\]

If $K_s$ is incorrect, the bit computed using $D$ will differ from the actual target bit for about half of the ciphertexts $C_i$. The selection function is thus effectively uncorrelated to what was actually computed by the target device. If a random function is used to divide a set into two subsets, the difference in the averages of the subsets should approach zero as the subset sizes approach infinity.

Thus trace components uncorrelated to $D$ diminish with $1/\sqrt{m}$, causing the differential trace to become flat (the actual trace may not be completely flat, as $D$ with an incorrect $K_s$ may have a weak correlation to $D$ with the correct $K_s$). If $K_s$ is correct, however, the computed value for $D(C_i, b, K_s)$ will equal the actual value of target bit $b$ with probability 1. The selection function is thus correlated to the value of the bit considered. Other data values, measurement errors, etc. that are not correlated to $D$ average out toward zero. Because power consumption is correlated to data bit values, the plot of $\Delta_D$ will be flat with spikes in regions where $D$ is correlated to the values being processed. The correct value of $K_s$ can thus be identified from the spikes in its differential trace. Four values of $b$ correspond to each S-box, providing confirmation of key block guesses. Finding all eight $K_s$ yields the entire 48-bit round sub-key. The remaining 8 key bits can be found easily using exhaustive search or by analyzing one additional round. Triple DES keys can be found by analyzing an outer DES operation first, using the resulting key to decrypt the ciphertext, and attacking the next DES key. DPA can use known plaintext or known ciphertext and can find encryption or decryption keys.
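The difference-of-means procedure described above can be tried on a toy model in Python. This is only a sketch under stated assumptions: the target is a single 4-bit S-box lookup (the PRESENT cipher S-box, used here purely as a convenient nonlinear table) rather than DES, the attacker knows the S-box input data rather than the ciphertext, and the simulated device leaks only the target bit plus Gaussian noise at one sample position. All names and parameters are hypothetical.

```python
import random

# 4-bit S-box of the PRESENT cipher, used only as a toy nonlinear function.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

TARGET_BIT = 1    # S-box output bit predicted by the selection function
TRACE_LEN = 8     # k samples per trace
LEAK_SAMPLE = 5   # sample index at which the toy device leaks


def selection(data, key_guess):
    """Selection function D: predicted target bit for this data and key guess."""
    return (SBOX[data ^ key_guess] >> TARGET_BIT) & 1


def simulate_traces(secret_key, m, noise=0.5, seed=1):
    """Produce m noisy traces from a device that leaks the target bit."""
    rng = random.Random(seed)
    inputs, traces = [], []
    for _ in range(m):
        data = rng.randrange(16)
        trace = [rng.gauss(0.0, noise) for _ in range(TRACE_LEN)]
        trace[LEAK_SAMPLE] += selection(data, secret_key)
        inputs.append(data)
        traces.append(trace)
    return inputs, traces


def differential_trace(inputs, traces, key_guess):
    """Mean of traces with D = 1 minus mean of traces with D = 0, per sample."""
    ones = [t for d, t in zip(inputs, traces) if selection(d, key_guess) == 1]
    zeros = [t for d, t in zip(inputs, traces) if selection(d, key_guess) == 0]
    return [sum(t[j] for t in ones) / len(ones)
            - sum(t[j] for t in zeros) / len(zeros)
            for j in range(TRACE_LEN)]


def recover_key(inputs, traces):
    """Return the guess with the largest spike, plus every guess's peak."""
    peaks = {g: max(abs(v) for v in differential_trace(inputs, traces, g))
             for g in range(16)}
    return max(peaks, key=peaks.get), peaks


inputs, traces = simulate_traces(secret_key=0xB, m=2000)
best_guess, peaks = recover_key(inputs, traces)
print(hex(best_guess))
```

In this model the differential trace of the correct guess shows a spike close to 1 at the leaking sample, while wrong guesses stay near 0 or near 0.5 because of the weak correlations mentioned above.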

CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES

The power consumption of traditional standard cells and logic depends on the signal activity. When the output of a logic gate makes a 0 to 1 transition, a current flows from the power supply and charges the output capacitance. On the other hand, when the output sees a 1 to 0, a 0 to 0 or a 1 to 1 transition, no energy or only a limited amount of energy (due to short-circuit or leakage currents) is consumed from the power supply. This is the fundamental reason why information is leaked through the power supply and why power attacks are possible. The basis of a secure digital design flow is a logic style with constant power consumption.
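The data dependence described above can be seen in a few lines of Python. The sketch assumes an idealized static CMOS gate in which only a 0 to 1 output transition draws energy from the supply, namely C times Vdd squared, and it ignores short-circuit and leakage currents; the load capacitance and supply voltage are illustrative values, not figures from the report.

```python
def supply_energy(prev_out, next_out, c_load, vdd):
    """Energy drawn from the supply for one output transition (joules).

    Idealized model: only a 0 -> 1 transition charges the load capacitance.
    """
    if prev_out == 0 and next_out == 1:
        return c_load * vdd ** 2
    return 0.0


def energy_profile(outputs, c_load=10e-15, vdd=1.8):
    """Supply energy for each consecutive pair in a sequence of output values."""
    return [supply_energy(a, b, c_load, vdd)
            for a, b in zip(outputs, outputs[1:])]


profile = energy_profile([0, 1, 1, 0, 0, 1])
print([e > 0 for e in profile])
```

An observer of the supply can tell which clock cycles contained a rising output, and this is exactly the signal activity that a constant-power logic style has to hide.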

6.1 Current Mode Logic

Current mode logic (CML), e.g. current steering logic, seems the ideal solution. This type of logic continuously draws a current from the supply and measures its state through the path that the current takes. A gate has constant power consumption if it draws a perfectly constant current from the power supply, independently of the input and output signals. To build a current source capable of generating a constant current, special circuit techniques that minimize channel length modulation have to be used.

The decisive drawback of CML, however, is its static power consumption. Even when the logic gate is not processing any data it burns current, which makes this logic style unacceptable for embedded battery-operated devices.

6.2 Voltage Mode Logic (CMOS circuit styles)

Voltage mode logic (VML), e.g. static CMOS logic, draws a current from the supply only to change state and measures its state by the amount of charge it stores on a capacitance. A regular standard CMOS circuit consumes power only when a capacitance gets charged and later discharged, i.e. when a gate switches state. This is the main reason that CMOS is the style of choice for every battery-operated or low-power device. This is illustrated in the figure below for a simple inverter. Thus static CMOS is the preferred logic style because of its low power consumption and high noise margins.

Standard CMOS inverter

Yet two conditions must be satisfied for VML to have constant power consumption, namely:

1) A logic gate must have exactly one switching event per signal transition.
2) The logic gate must charge a constant capacitance in that switching event.

As shown above, all four transitions of the CMOS inverter can be distinguished when monitoring the power supply.

6.3 Dynamic Differential Logic

Dynamic differential logic, sometimes also referred to as dual rail with pre-charge logic, fulfills the first condition. A differential logic family uses the true and the false representation of the input and output signals, and a dynamic logic family alternates pre-charge and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1, exactly one of the two output nodes evaluates to 0 in the evaluation phase to give a differential output signal. The discharged output node is charged back to 1 in the following pre-charge phase, so that both outputs are again pre-charged to 1. In other words, every signal transition, including the events in which the input signals remain constant, is represented by an actual switching event in which the logic gate charges a capacitance. The logic families that have been introduced to thwart differential power analysis (DPA) using dynamic differential logic include:

1. Sense Amplifier Based Logic (SABL), and
2. Wave Dynamic Differential Logic (WDDL) gates
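The constant-activity argument above can be checked with a small Python model of one pre-charge/evaluate cycle. The sketch assumes an idealized dual-rail AND/NAND gate whose two output rails are pre-charged to 1; the function name and the event counting are illustrative assumptions, not a circuit-level simulation.

```python
from itertools import product


def switching_events(a, b):
    """Count rail transitions in one cycle of a dual-rail AND/NAND gate.

    Both rails start pre-charged to 1. In the evaluation phase the gate
    leaves the rail that represents the result at 1 and discharges the other.
    """
    precharged = (1, 1)
    result = a & b
    evaluated = (result, 1 - result)              # (true rail, false rail)
    discharged = sum(p == 1 and e == 0
                     for p, e in zip(precharged, evaluated))
    recharged = sum(e == 0 for e in evaluated)    # rails charged in next pre-charge
    return discharged, recharged


for a, b in product((0, 1), repeat=2):
    print(a, b, switching_events(a, b))
```

Every input combination gives one discharge and one recharge, which is the first condition of section 6.2. Whether the charged capacitance is also constant (the second condition) depends on the circuit and layout and is outside this model.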

6.3.1 Sense Amplifier Based Logic (SABL)

The main advantage of SABL is that it has balanced input and output nodes and that all internal nodes connect to an output. The output capacitances can be balanced. Systematic methods have been developed to make sure that both branches of the differential pull-down network are balanced and that no memory effects are present in the network. Sense Amplifier Based Logic is illustrated in the following figure.

Sense Amplifier Based Logic AND/NAND gate

This circuit style does, however, require full custom characterization and layout. It also suffers from a high clock load, which is common to all dynamic logic gates.

632 Wave Dynamic Differential Logic Gates (WDDL)

WDDL logic can be implemented with static CMOS logic Static CMOS

standard cells are combined to form secure compound standard cells

which have a reduced power signature WDDL has many advantages It can

be readily implemented from an existing standard cell library The design

flow is fully supported with accurate EDA library files that come directly

from the vendor WDDL also results in a dynamic differential logic with only

a small load capacitance on the pre-charge control signal and with the low

power consumption and the high noise margins of static CMOS

Advantages of WDDL logic style are as follows

30

A major advantage of the proposed logic style is that it can be incorporated by the common

Electronic Design Automation (EDA) tool flow

No special design rules are involved in the interconnection of WDDL gates

The switching factor of WDDL is 100%. A WDDL gate consists of a parallel combination of two positive complementary gates, one calculating the true output using the true inputs, the other the false output using the false inputs. A positive gate produces a zero output for an all-zero input. The AND gate and the OR gate are examples of positive gates. A complementary gate, sometimes also referred to as a dual gate, expresses the false output of the original logic gate using the false inputs of the original gate. The AND gate fed with true input signals and the OR gate fed with false input signals are two dual gates. The figure shows the WDDL AND gate and the WDDL OR gate.

In the evaluation phase, each input signal is differential and the WDDL gate calculates its differential output. In the pre-charge phase, the inputs to the WDDL gate are set to 0, which puts the output of the gate at 0. A module in WDDL pre-charges without distributing the pre-charge signal to each individual gate. During the pre-charge phase, the input vector of the combinatorial logic is set to all 0s. Each individual gate will eventually have all its inputs at 0, evaluate its output to 0, and pass this 0 value to the next gate. One could say that the pre-charge signal travels over the combinatorial logic as a 0-wave, hence the name WDDL.

There are several ways to launch the pre-charge wave. In the figure, a pre-charge operator is inserted at the start of every combinatorial logic tree, i.e. at the inputs of the encryption module and at the outputs of the registers. These operators produce an all-zero output in the pre-charge phase (clk signal high) but let the differential signal through during the evaluation phase (clk signal low).
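The duality described above can be written out for the WDDL AND gate as a short derivation. The rail names $a_t, a_f, b_t, b_f, y_t, y_f$ are notation introduced here for illustration and are not taken from the report; the sketch assumes that in the evaluation phase each false rail is the complement of its true rail.

```latex
\begin{aligned}
\text{gate:}\quad & y_t = a_t \cdot b_t, \qquad y_f = a_f + b_f \\
\text{evaluation } (a_f = \overline{a_t},\ b_f = \overline{b_t}):\quad
& y_f = \overline{a_t} + \overline{b_t} = \overline{a_t \cdot b_t} = \overline{y_t} \\
\text{pre-charge } (a_t = a_f = b_t = b_f = 0):\quad
& y_t = 0, \qquad y_f = 0
\end{aligned}
```

Under these assumptions exactly one of the two output rails rises from 0 to 1 in every evaluation phase and returns to 0 in the following pre-charge phase, whatever the input values are, which is what a switching factor of 100% means.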

Figure: WDDL pre-charge wave generation

CHAPTER 7

WDDL GATES

The methodology used in the project is a bottom-up approach. Lower-level modules are designed first and later integrated to form larger modules, whose further integration leads to the final top module. Since logic gates form the lowest-level modules, the logic gates required for the design are first implemented in WDDL style. WDDL demands a parallel combination of two positive complementary gates, one calculating the true value and the other the false value. Logic gates such as OR, AND and XOR have been implemented, along with a full adder, a 32-bit XOR, etc.

7.1 WDDL OR gate

A WDDL OR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.

Figure 4.1 WDDL OR Gate

The NAND and NOT gates act as the precharge operator, injecting the signal '0' into the gates when the 'clk' signal is high, i.e. during the precharge phase.
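As an illustration of the structure just described, a minimal VHDL sketch of a WDDL OR gate with a built-in precharge operator might look as follows. The entity name `wddl_or_sketch`, the port names, and the use of single-ended inputs `a` and `b` with internally generated complements are assumptions made here for illustration; the report's actual source code is not reproduced in this section.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical names; a sketch, not the report's actual source.
entity wddl_or_sketch is
  port (
    a, b : in  std_logic;  -- single-ended data inputs
    clk  : in  std_logic;  -- '1' = precharge phase, '0' = evaluation phase
    y_t  : out std_logic;  -- true output rail
    y_f  : out std_logic   -- false output rail
  );
end entity wddl_or_sketch;

architecture dataflow of wddl_or_sketch is
  signal a_t, a_f, b_t, b_f : std_logic;
begin
  -- Precharge operator: both rails of each input are forced to '0'
  -- while clk is high and carry the value and its complement otherwise.
  a_t <= a and (not clk);
  a_f <= (not a) and (not clk);
  b_t <= b and (not clk);
  b_f <= (not b) and (not clk);

  -- Positive OR gate on the true rails, dual AND gate on the false rails.
  y_t <= a_t or b_t;
  y_f <= a_f and b_f;
end architecture dataflow;
```

With `clk` high both outputs are '0'; with `clk` low, `y_t` carries `a or b` and `y_f` its complement. Swapping the two output expressions (AND on the true rails, OR on the false rails) gives the corresponding WDDL AND gate.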

7.2 WDDL AND gate

A WDDL AND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.

Figure 4.2 WDDL AND Gate

7.3 WDDL NAND gate

A WDDL NAND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.

Figure 4.3 WDDL NAND Gate

7.4 WDDL NOR gate

A WDDL NOR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.

Figure 4.4 WDDL NOR Gate

7.5 WDDL XOR gate

The XOR function can be implemented by the Boolean equation A ⊕ B = AB' + A'B. Therefore, the XOR gate is implemented in terms of AND gates and OR gates, as shown in Figure 4.5. It can also be implemented by instantiating a WDDL AND gate and a WDDL OR gate, but the number of gates involved in the latter approach is greater than in the former. Therefore, the first method of implementation is followed rather than the second.
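The Boolean equation above can be mapped onto differential rails using only positive gates. The following VHDL fragment is a sketch under the assumption that both rails of each input are already available and already precharged; the entity and port names are hypothetical and do not come from the report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical names; assumes differential inputs whose rails are both
-- '0' during the precharge phase and complementary during evaluation.
entity wddl_xor_sketch is
  port (
    a_t, a_f : in  std_logic;  -- true / false rail of A
    b_t, b_f : in  std_logic;  -- true / false rail of B
    y_t, y_f : out std_logic   -- true / false rail of A xor B
  );
end entity wddl_xor_sketch;

architecture dataflow of wddl_xor_sketch is
begin
  -- True rail: A.B' + A'.B, with each complement taken from the false rail.
  y_t <= (a_t and b_f) or (a_f and b_t);
  -- False rail: the dual expression (A + B').(A' + B), i.e. XNOR.
  y_f <= (a_t or b_f) and (a_f or b_t);
end architecture dataflow;
```

No inverters are needed because the complements are taken from the false rails, so an all-zero input still produces an all-zero output and the precharge wave can propagate through the gate.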

Figure 4.5 WDDL XOR gate

With the help of the above basic gates, a full adder circuit has been designed by instantiating the WDDL gates. The implementation of the Blowfish algorithm requires a 32-bit XOR gate and a 32-bit adder circuit. They can be easily implemented by instantiating the corresponding lower-level module 32

times.

CHAPTER 8 FRONT END RESULTS

WDDL OR GATE

Synthesis Report

==========================================================
Final Report
==========================================================
Final Results
RTL Top Level Output File Name : wddlor.ngr
Top Level Output File Name     : wddlor
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO

Design Statistics
IOs                            : 5

Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2
==========================================================
Device utilization summary
---------------------------
Selected Device        : 3s250eft256-4
Number of Slices       : 1 out of 2448   0%
Number of 4 input LUTs : 2 out of 4896   0%
Number of IOs          : 5
Number of bonded IOBs  : 5 out of 172    2%

Timing Summary
---------------
Speed Grade                      : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result

WDDL AND GATE

Synthesis Report

==========================================================
Final Report
==========================================================
Final Results
RTL Top Level Output File Name : wddlgates.ngr
Top Level Output File Name     : wddlgates
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO

Design Statistics
IOs                            : 5

Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2
==========================================================
Device utilization summary
---------------------------
Selected Device        : 3s250etq144-4
Number of Slices       : 1 out of 2448   0%
Number of 4 input LUTs : 2 out of 4896   0%
Number of IOs          : 5
Number of bonded IOBs  : 5 out of 108    4%

Timing Summary
---------------
Speed Grade                      : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result

WDDL NAND GATE

Synthesis Report

==========================================================
Final Report
==========================================================
Final Results
RTL Top Level Output File Name : wddlnand1.ngr
Top Level Output File Name     : wddlnand1
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO

Design Statistics
IOs                            : 5

Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2
==========================================================
Device utilization summary
---------------------------
Selected Device        : 3s500efg320-4
Number of Slices       : 1 out of 4656   0%
Number of 4 input LUTs : 2 out of 9312   0%
Number of IOs          : 5
Number of bonded IOBs  : 5 out of 232    2%

Timing Summary
---------------
Speed Grade                      : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result

WDDL XOR GATE

Simulation Result

Synthesis Result

Synthesis Report

==========================================================
Final Report
==========================================================
Final Results
RTL Top Level Output File Name : wddlxorgate.ngr
Top Level Output File Name     : wddlxorgate
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO

Design Statistics
IOs                            : 5

Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2
==========================================================
Device utilization summary
---------------------------
Selected Device        : 3s250eft256-4
Number of Slices       : 1 out of 2448   0%
Number of 4 input LUTs : 2 out of 4896   0%
Number of IOs          : 5
Number of bonded IOBs  : 5 out of 172    2%

Timing Summary
---------------
Speed Grade                      : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result

CHAPTER 9 SUMMARY AND CONCLUSION

9.1 Summary

In order to secure ICs against side-channel attacks, especially Differential Power Analysis (DPA), it is necessary to implement the design in a logic style whose power dissipation is constant irrespective of the input combination. WDDL has been shown to have advantages over other such styles and is therefore of great significance. In this work, an architecture for the Blowfish algorithm is designed and implemented in WDDL style using a bottom-up approach: the low-level entities are designed first and later combined to form the entire module. The key scheduling is online. The sub-keys generated for a particular key can be used for the encryption of all the data to be encrypted with that key. The sub-keys are applied in reverse order for the decryption data path. Initially, logic gates are implemented in WDDL, and then higher-level modules are designed by instantiating the WDDL gates to form the entire module, thus resulting in constant power dissipation irrespective of the input data combination. The entire design works in two phases, namely the precharge phase and the evaluation phase. In the precharge phase all the signals of the design are set to zero, and during the evaluation phase the functionality of the design is achieved. This sort of design has been found simple and very effective in thwarting the side-channel attack known as Differential Power Analysis (DPA).

9.2 Conclusion

The crypto processor has been designed for a key size of 448 bits and a plaintext block of 64 bits. The code for the implementation has been written in VHDL. Functional verification has been done using the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The back end of the design is done using SOC Encounter. The desired functionality has been achieved according to the specifications. At the output, the number of transitions during the evaluation phase is always the same, thus resulting in constant power dissipation. During synthesis, it has been observed that a simple WDDL gate comprises many conventional gates. Therefore, the area of the design has grown nearly three-fold compared to the design implemented in conventional CMOS logic; this is the cost of the security incorporated into the IC against Differential Power Analysis (DPA). Due to the constant power dissipation at the output, an attacker cannot apply DPA to find the secret key that is being used in the crypto processor. Thus security against DPA is incorporated into the IC at the hardware level by implementing the design in WDDL style, which is quite simple and effective.



CHAPTER 3 HARDWARE DESCRIPTION LANGUAGE (VHDL)

Why (V)HDL?

Interoperability

Technology independence

Design reuse

Several levels of abstraction

Readability

Standard language

Widely supported

What is VHDL?

VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed Integrated Circuit)

Design specification language

Design entry language

Design simulation language

Design documentation language

An alternative to schematics

Brief History

VHDL was developed in the early 1980s for managing design problems that involved large circuits and multiple teams of engineers

Funded by the US Department of Defense

The first publicly available version was released in 1985

In 1986 the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was presented with a proposal to standardize VHDL

In 1987 standardization => IEEE 1076-1987

An improved version of the language was released in 1994 => IEEE Standard 1076-1993

Related Standards

IEEE 1076 doesn't support simulation conditions such as unknown and high-impedance

Soon after IEEE 1076-1987 was released, simulator companies began using their own non-standard types => VHDL was becoming non-standard

The IEEE 1164 standard was developed by the IEEE

IEEE 1164 contains definitions for a nine-valued data type, std_logic

IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate to actual hardware

Defines e.g. two numeric types: signed and unsigned

VHDL Environment

Design Units

Segments of VHDL code that can be compiled separately and stored in a library

Entities

A black box with an interface definition

Defines the inputs/outputs of a component (defines the pins)

A way to represent modularity in VHDL

Similar to a symbol in a schematic

The entity declaration describes the entity

E.g.

library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
  port (
    A, B : in  std_logic_vector(7 downto 0);  -- 8-bit operands
    EQ   : out std_logic                      -- '1' when A = B
  );
end Comparator;

Ports

Provide channels of communication between the component and its environment

Each port must have a name, a direction and a type

An entity may have NO port declaration

Port directions

in: The value of the port can be read inside the component but cannot be assigned. Multiple reads of the port are allowed.

out: Assignments can be made to the port, but data from the port cannot be read. Multiple assignments are allowed.

inout: Bi-directional; assignments can be made and data can be read. Multiple assignments are allowed.

buffer: An out port with read capability. May have at most one assignment (buffer ports are not recommended).
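The four port directions listed above can be seen side by side in a small sketch. The entity `port_modes_sketch` and its port names are hypothetical and serve only to illustrate which operations each mode allows.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity used only to illustrate the four port modes.
entity port_modes_sketch is
  port (
    d_in  : in     std_logic;  -- may be read, never assigned
    d_out : out    std_logic;  -- may be assigned, not read back
    d_bus : inout  std_logic;  -- may be assigned and read
    d_buf : buffer std_logic   -- assigned here and read back inside
  );
end entity port_modes_sketch;

architecture dataflow of port_modes_sketch is
begin
  d_buf <= d_in;                            -- single driver of the buffer port
  d_out <= not d_buf;                       -- reading a buffer port is legal
  d_bus <= d_in when d_buf = '1' else 'Z';  -- drive the bus or release it
end architecture dataflow;
```

Replacing `d_buf` on the right-hand side of the second assignment with `d_out` would be rejected under the rules stated above, because an out port cannot be read inside the component.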

Architectures

Every entity has at least one architecture

One entity can have several architectures

Architectures can describe the design using

- Behavior
- Structure
- Dataflow

Architectures can describe the design on many levels

- Gate level
- RTL (Register Transfer Level)
- Behavioral level

A configuration declaration links an architecture to an entity

E.g.

architecture Comparator1 of Comparator is
begin
  -- EQ is '1' only when both operands are equal
  EQ <= '1' when (A = B) else '0';
end Comparator1;

Configurations

Link the entity declaration and the architecture body together

The concept of a default configuration is a bit messy in VHDL '87

- The last architecture analyzed links to the entity

Can be used to change simulation behavior without re-analyzing the VHDL source

Complex configuration declarations are ignored in synthesis

Some entities can have e.g. a gate-level architecture and a behavioral architecture

Are always optional
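To make the role of a configuration concrete, the following sketch declares one entity with two architectures and a configuration that binds one of them explicitly. All names (`inv_sketch`, `behavioral`, `dataflow`, `inv_sketch_cfg`) are hypothetical, and the syntax shown is the VHDL-93 form.

```vhdl
-- Hypothetical entity with two architectures and a configuration
-- that selects one of them explicitly.
entity inv_sketch is
  port (a : in bit; y : out bit);
end entity inv_sketch;

architecture behavioral of inv_sketch is
begin
  y <= not a;
end architecture behavioral;

architecture dataflow of inv_sketch is
begin
  y <= '1' when a = '0' else '0';
end architecture dataflow;

-- Binds the entity to the "behavioral" architecture instead of relying
-- on the default, which would be the last architecture analyzed.
configuration inv_sketch_cfg of inv_sketch is
  for behavioral
  end for;
end configuration inv_sketch_cfg;
```

Simulating `inv_sketch_cfg` rather than `inv_sketch` fixes the choice of architecture regardless of the order in which the source was analyzed.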

Packages

Packages contain information common to many design units

1. Package declaration

- Constant declarations
- Type and subtype declarations
- Function and procedure declarations
- Global signal declarations
- File declarations
- Component declarations

2. Package body

- Is not necessarily needed
- Function bodies
- Procedure bodies

Packages are meant for encapsulating data that can be shared globally among several design units. They consist of a declaration part and an optional body part.

A package declaration can contain

- Type and subtype declarations
- Subprograms
- Constants
- Alias declarations
- Global signal declarations
- File declarations
- Component declarations

A package body consists of

- Subprogram declarations and bodies
- Type and subtype declarations
- Deferred constants
- File declarations
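The split between a package declaration and a package body can be sketched as follows. The package name `util_sketch_pkg`, the constant, the subtype and the `parity` function are all hypothetical examples chosen here for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package; names are illustrative only.
package util_sketch_pkg is
  constant WORD_WIDTH : natural := 32;
  subtype word_t is std_logic_vector(WORD_WIDTH - 1 downto 0);
  function parity (w : word_t) return std_logic;  -- declaration only
end package util_sketch_pkg;

package body util_sketch_pkg is
  -- XOR of all bits: '1' when the number of '1' bits is odd.
  function parity (w : word_t) return std_logic is
    variable p : std_logic := '0';
  begin
    for i in w'range loop
      p := p xor w(i);
    end loop;
    return p;
  end function parity;
end package body util_sketch_pkg;
```

A design unit compiled into the same library would make these declarations visible with `use work.util_sketch_pkg.all;`.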

Libraries

Collection of VHDL design units (database)

1. Packages

- package declaration
- package body

2. Entities (entity declaration)

3. Architectures (architecture body)

4. Configurations (configuration declarations)

Usually a directory in a UNIX file system

Can also be any other kind of database

Levels of Abstraction

VHDL supports many possible styles of design description, which differ primarily in how closely they relate to the hardware

It is possible to describe a circuit in a number of ways, listed here from lower to higher level of abstraction

Structural

Dataflow

Behavioral

Structural VHDL description

Circuit is described in terms of its components

From a low-level description (e.g. transistor-level description) to a high-level description (e.g. block diagram)

For large circuits, low-level descriptions quickly become impractical

Dataflow VHDL Description

Circuit is described in terms of how data moves through the system

In the dataflow style you describe how information flows between registers in the system

The combinational logic is described at a relatively high level; the placement and operation of registers is specified quite precisely

The behavior of the system over time is defined by registers

There are no built-in registers in the VHDL language

- Either a lower-level description
- or a behavioral description of sequential elements is needed

The lower-level register descriptions must be created or obtained

If there are no third-party models for registers => you must write the behavioral description of the registers

The behavioral description can be provided in the form of subprograms (functions or procedures)
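Since the language has no built-in register, a behavioral description has to be written when no model is available. The sketch below uses a clocked process rather than a subprogram, which is one common way to do it; the entity name `reg8_sketch` and the 8-bit width are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 8-bit register; width and names are illustrative.
entity reg8_sketch is
  port (
    clk : in  std_logic;
    d   : in  std_logic_vector(7 downto 0);
    q   : out std_logic_vector(7 downto 0)
  );
end entity reg8_sketch;

architecture behavioral of reg8_sketch is
begin
  -- The process wakes up on clk events and stores d on the rising edge;
  -- q keeps its value at all other times, which is what makes it a register.
  process (clk)
  begin
    if rising_edge(clk) then
      q <= d;
    end if;
  end process;
end architecture behavioral;
```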

Behavioral VHDL Description

Circuit is described in terms of its operation over time

Representation might include e.g. state diagrams, timing diagrams and algorithmic descriptions

The concept of time may be expressed precisely using delays (e.g. A <= B after 10 ns)

If no actual delays are used, only the order of sequential operations is defined

In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed timing specifications

The actual timing results depend on the implementation technology and the efficiency of the synthesis tool

There are a few tools for behavioral synthesis

Concurrent vs. Sequential

Processes

Basic simulation concept in VHDL

A VHDL description can always be broken up into interconnected processes

Quite similar to a UNIX process

Process keyword in VHDL

A process statement is a concurrent statement

Statements inside a process statement are sequential statements

A process must contain either a sensitivity list or wait statement(s), but NOT both

The sensitivity list or wait statement(s) contain the signals that wake the process up

General Format

process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
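The two alternatives in the general format, a sensitivity list or wait statements, can be compared directly. In the sketch below both processes describe the same AND function; the entity name, port names and process labels are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity; both processes describe the same AND function.
entity process_sketch is
  port (
    a, b   : in  std_logic;
    y1, y2 : out std_logic
  );
end entity process_sketch;

architecture behavioral of process_sketch is
begin
  -- Variant 1: sensitivity list, no wait statement.
  with_list : process (a, b)
  begin
    y1 <= a and b;
  end process with_list;

  -- Variant 2: no sensitivity list, explicit wait at the end.
  with_wait : process
  begin
    y2 <= a and b;
    wait on a, b;
  end process with_wait;
end architecture behavioral;
```

Both processes run once at the start of simulation and then again whenever `a` or `b` changes, so `y1` and `y2` always carry the same value.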

CHAPTER 4 SMART CARD OVERVIEW

This section very briefly introduces the concept of a smart card. Basically, a smart card is a computer embedded in a safe. It consists of a (typically 8-bit or 32-bit) processor together with ROM, EEPROM and a small amount of RAM, and is therefore capable of performing computations. The main goal of a smart card is to allow the execution of cryptographic operations involving some secret parameter (the key) while not revealing this parameter to the outside world. Conversely, the goal of the attacker is to recover this secret parameter. The processor is embedded in a chip and connected to the outside world through eight wires, whose role, use and position are standardized. In addition to the input/output wires, the parts of most interest here are the following:

1. Power supply: Smart cards do not have an internal battery. The current they need is provided by the smart card reader. This makes the smart card's power consumption fairly easy for the attacker to measure.

2. Clock: Similarly, smart cards do not have an internal clock either. The clock ticks must also be provided from the outside world. As a consequence, the attacker can measure the card's running time with very good precision.

Smart cards are usually equipped with protection mechanisms composed of a shield (the passivation layer), whose goal is to hide the internal behavior of the chip, and possibly sensors that react when the shield is removed by destroying all sensitive data and preventing the card from functioning properly.

CHAPTER 5 SIDE CHANNEL ATTACKS

"Side channel attacks" are attacks that are based on "side channel information". Side channel information is information that can be retrieved from the encryption device that is neither the plaintext to be encrypted nor the ciphertext resulting from the encryption process.

In the past, an encryption device was perceived as a unit that receives plaintext input and produces ciphertext output, and vice versa. Attacks were therefore based on either knowing the ciphertext (ciphertext-only attacks), knowing both the plaintext and the ciphertext (known-plaintext attacks), or on the ability to define what plaintext is to be encrypted and then seeing the results of the encryption (chosen-plaintext attacks). Today, it is known that encryption devices have additional outputs, and often additional inputs, which are neither the plaintext nor the ciphertext.

Encryption devices produce timing information (information about the time that operations take) that is easily measurable, radiation of various sorts, power consumption statistics (that can be easily measured as well), and more. Often the encryption device also has additional "unintentional" inputs, such as voltage, that can be modified to cause predictable outcomes. Side channel attacks make use of some or all of this information, along with other (known) cryptanalytic techniques, to recover the key the device is using.

Side channel analysis techniques are of concern because the attacks can be mounted quickly and can sometimes be implemented using readily available hardware costing from only a few hundred dollars to thousands of dollars.

5.1 Classification of side channel attacks

The literature usually classifies side channel attacks along two orthogonal axes.

1. Invasive vs. non-invasive

Invasive attacks require de-packaging the chip to get direct access to its components. A typical example is the connection of a wire to a data bus to observe the data transfers.

A non-invasive attack only exploits externally available information (the emission of which is, however, often unintentional), such as running time or power consumption.

A more recent distinction is that of semi-invasive attacks. These attacks require de-packaging of the chip to get access to the chip surface, but do not tamper with the passivation layer (they do not require electrical contact to the metal surface).

2. Active vs. passive

Active attacks try to tamper with the card's proper functioning. For example, fault induction attacks try to induce errors in the computation.

Passive attacks, by contrast, simply observe the card's behavior during its processing, without disturbing it.

Note that these two axes are indeed orthogonal: an invasive attack may completely avoid disturbing the card's behavior, and a passive attack may require a preliminary de-packaging for the required information to be observable.

These attacks are of course not mutually exclusive: an invasive attack may, for example, serve as a preliminary step for a non-invasive one by giving a detailed description of the chip's architecture that helps to find out where to put external probes.

Smart cards are usually equipped with protection mechanisms that are supposed to react to invasive attacks (although several invasive attacks are nonetheless capable of defeating these mechanisms). On the other hand, it is worth pointing out that a non-invasive attack is completely undetectable: there is, for example, no way for a smart card to figure out that its running time is currently being measured. Other countermeasures are therefore necessary. From an economic point of view, invasive attacks are usually more expensive to deploy on a large scale, since they require individual processing of each attacked device. In this sense, non-invasive attacks constitute a bigger menace for the smart card industry.

Invasive attacks involve a relatively high capital investment for lab equipment plus a moderate investment of effort for each individual chip attacked. Non-invasive attacks require only a moderate capital investment plus a moderate investment of effort in designing an attack on a particular type of device; thereafter, the cost per device attacked is low. Semi-invasive attacks can be carried out using very cheap and simple equipment.

The attacker can gain information by

1. Probing attacks

2. Fault induction attacks

3. Timing attacks

4. Power analysis attacks, and

5. Electromagnetic timing attacks

These attacks are performed during the switching activity of digital complementary metal-oxide-semiconductor (CMOS) gates. Of all these, the power analysis attack is of major concern.

5.2 Power analysis attacks

The power consumption of a cryptographic device may provide much information about the operations that take place and the parameters involved. This is the idea behind simple and differential power analysis, first introduced by Kocher et al. Like the clock ticks, the card's energy is provided by the terminal and can therefore easily be measured. Basically, to measure a circuit's power consumption, a small (e.g. 50 ohm) resistor is inserted in series with the power or ground input. The voltage difference across the resistor divided by the resistance yields the current. Well-equipped electronics labs have equipment that can digitally sample voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than 1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC can be bought for less than US$ 400.
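The measurement principle is Ohm's law applied to the series resistor. In the sketch below, $R$ is the series resistance, $V_R(t)$ the sampled voltage across it and $V_{DD}$ the supply voltage; these symbols are introduced here for illustration, and the 5 mV reading is an assumed value chosen only to make the arithmetic concrete.

```latex
\begin{aligned}
I(t) &= \frac{V_R(t)}{R}, \qquad P(t) \approx V_{DD}\, I(t) \\
\text{e.g. } R = 50\ \Omega,\ V_R = 5\ \text{mV}: \quad
I &= \frac{5 \times 10^{-3}\ \text{V}}{50\ \Omega} = 1 \times 10^{-4}\ \text{A} = 0.1\ \text{mA}
\end{aligned}
```

Each sample of the voltage across the resistor therefore gives one point of the power trace, up to the constant factor $V_{DD}/R$.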

Power analysis attacks are of two types:

1. Simple Power Analysis (SPA) attack, and

2. Differential Power Analysis (DPA) attack

SPA attacks on smart cards typically take a few seconds per card, while DPA attacks can take several hours. In general, from a somewhat academic perspective, we may consider the entire internal state of the block cipher to be all the intermediate results and values that are never included in the output in normal operation. For example, DES has 16 rounds; we can consider the intermediate states state[1..15] after each round except the last as a secret internal state. Side channels typically give information about these internal states, or about the operations used in the transition of this internal state from one round to another. The type of side channel will of course determine what information is available to the attacker about these states. The attacks typically work by finding some information about the internal state of the cipher that can be learned both by guessing part of the key and checking the value directly, and additionally by some statistical property of the cipher that makes that checkable value slightly non-random.

5.2.1 Simple Power Analysis attack (SPA)

Simple Power Analysis is a technique that involves direct interpretation of power consumption measurements collected during cryptographic operations, generally by looking at a visual representation of the power consumption of a unit while an encryption operation is being performed. SPA can yield information about a device's operation as well as key material.

A trace refers to a set of power consumption measurements taken across a cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a trace containing 5000 points. The figure, for example, shows an SPA trace from a smart card performing a DES operation.

Figure: SPA monitoring from a single DES operation performed by a typical smart card. The upper trace shows the entire encryption operation, including the initial permutation, the 16 DES rounds and the final permutation. The lower trace is a detailed view of the second and third rounds.

Because SPA can reveal the sequence of instructions executed, it can be used to break cryptographic implementations in which the execution path depends on the data being processed. For example:

DES key schedule: The DES key schedule computation involves rotating 28-bit key registers. A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will contain different SPA features if the execution paths take different branches for each.

DES permutations: DES implementations perform a variety of bit permutations. Conditional branching in software or microcode can cause significant power consumption differences for "0" and "1" bits.

Comparisons: String or memory comparison operations typically perform a conditional branch when a mismatch is found. This conditional branching causes large SPA (and sometimes timing) characteristics.

Multipliers: Modular multiplication circuits tend to leak a great deal of information about the data they process. The leakage functions depend on the multiplier design, but are often strongly correlated to operand values and Hamming weights.

Exponentiators: A simple modular exponentiation function scans across the exponent, performing a squaring operation in every iteration with an additional multiplication operation for each exponent bit that is equal to "1". The exponent can be compromised if squaring and multiplication operations have different power consumption characteristics, take different amounts of time, or are separated by different code. Modular exponentiation functions that operate on two or more exponent bits at a time may have more complex leakage functions.

5.2.2 Differential Power Analysis attack (DPA)

In addition to large-scale power variations due to the instruction sequence, there are effects correlated to the data values being manipulated. These variations tend to be smaller and are sometimes overshadowed by measurement errors and other noise. In such cases, it is still often possible to break the system using statistical functions tailored to the target algorithm.

To implement the DPA attack, an attacker first observes $m$ encryption operations and captures power traces $T_{1..m}[1..k]$ containing $k$ samples each. In addition, the attacker records the ciphertexts $C_{1..m}$. No knowledge of the plaintext is required. DPA uses power consumption measurements to determine whether a key block guess $K_s$ is correct. The attacker computes a $k$-sample differential trace $\Delta_D[1..k]$ by finding the difference between the average of the traces for which a certain intermediate value $V$ is one and the average of the traces for which $V$ is zero, where $V = D(C_i, b, K_s)$ is the value of target bit $b$ predicted by the selection function $D$ from ciphertext $C_i$ under the key guess $K_s$. Thus $\Delta_D[j]$ is the average over $C_{1..m}$ of the effect due to the value represented by the selection function $D$ on the power consumption at point $j$.
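Written out, the difference of means described above takes the following form. This is the standard difference-of-means expression for DPA as given by Kocher et al., restated here on the assumption that the selection function $D(C_i, b, K_s)$ returns 0 or 1 for each ciphertext; the report's own rendering of the formula is not available in this text.

```latex
\Delta_D[j] =
\frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}
     {\sum_{i=1}^{m} D(C_i, b, K_s)}
\;-\;
\frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}
     {\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)}
```

The first term is the average of the traces for which the predicted bit is one, the second the average of the traces for which it is zero, evaluated separately at each sample index $j$ from 1 to $k$.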

If $K_s$ is incorrect, the bit computed using $D$ will differ from the actual target bit for about half of the ciphertexts $C_i$. The selection function is thus effectively uncorrelated to what was actually computed by the target device. If a random function is used to divide a set into two subsets, the difference in the averages of the subsets should approach zero as the subset sizes approach infinity.

Thus, if $K_s$ is incorrect, $\Delta_D[j]$ approaches zero as $m$ grows, because trace components uncorrelated to $D$ diminish with $1/\sqrt{m}$, causing the differential trace to become flat. (The actual trace may not be completely flat, as $D$ with $K_s$ incorrect may have a weak correlation to $D$ with the correct $K_s$.) If $K_s$ is correct, however, the computed value for $D(C_i, b, K_s)$ will equal the actual value of target bit $b$ with probability 1. The selection function is thus correlated to the value of the bit considered. Other data values, measurement errors, etc. that are not correlated to $D$ approach zero. Because power consumption is correlated to data bit values, the plot of $\Delta_D$ will be flat with spikes in regions where $D$ is correlated to the values being processed.

The correct value of $K_s$ can thus be identified from the spikes in its differential trace. Four values of $b$ correspond to each S-box, providing confirmation of key block guesses. Finding all eight $K_s$ yields the entire 48-bit round subkey. The remaining 8 key bits can be found easily using exhaustive search or by analyzing one additional round. Triple DES keys can be found by analyzing an outer DES operation first, using the resulting key to decrypt the ciphertexts, and attacking the next DES key. DPA can use known plaintext or known ciphertext and can find encryption or decryption keys.
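The difference-of-means procedure described above can be sketched in Python on simulated data. This is a toy model under stated assumptions, not the DES attack itself: a 4-bit S-box (the PRESENT cipher S-box) stands in for a DES S-box, the key block is 4 bits, the attacker's known data word plays the role of the ciphertext, and the device is assumed to leak the most significant S-box output bit plus Gaussian noise at one sample position. The names `select`, `simulate_traces`, `differential_trace`, `recover_key`, `K_SAMPLES` and `LEAK_AT` are hypothetical.

```python
import random

# Toy 4-bit S-box (the PRESENT cipher S-box), standing in for a DES S-box.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
K_SAMPLES = 8   # k: samples per trace
LEAK_AT = 3     # sample index at which the target bit leaks (assumed)


def select(data, key_guess):
    """Selection function D: predicted most significant S-box output bit."""
    return (SBOX[data ^ key_guess] >> 3) & 1


def simulate_traces(secret_key, m, rng, noise=0.5):
    """Return m known data words and m noisy k-sample power traces."""
    words, traces = [], []
    for _ in range(m):
        data = rng.randrange(16)
        trace = [rng.gauss(0.0, noise) for _ in range(K_SAMPLES)]
        trace[LEAK_AT] += select(data, secret_key)  # data-dependent power
        words.append(data)
        traces.append(trace)
    return words, traces


def differential_trace(words, traces, key_guess):
    """Average of traces with D = 1 minus average of traces with D = 0."""
    sums = [[0.0] * K_SAMPLES, [0.0] * K_SAMPLES]
    counts = [0, 0]
    for data, trace in zip(words, traces):
        d = select(data, key_guess)
        counts[d] += 1
        for j, value in enumerate(trace):
            sums[d][j] += value
    return [sums[1][j] / counts[1] - sums[0][j] / counts[0]
            for j in range(K_SAMPLES)]


def recover_key(words, traces):
    """Pick the key guess whose differential trace has the largest spike."""
    def spike(guess):
        return max(abs(v) for v in differential_trace(words, traces, guess))
    return max(range(16), key=spike)


rng = random.Random(2004)  # fixed seed so the run is repeatable
words, traces = simulate_traces(secret_key=0xB, m=2000, rng=rng)
print("recovered key guess:", hex(recover_key(words, traces)))
```

In this model the differential trace for the correct guess shows a spike of about one unit at the leaking sample, while wrong guesses give smaller spikes. The residual spikes for some wrong guesses illustrate the weak correlation mentioned in the text.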

CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES

The power consumption of traditional standard cells and logic depends on the signal activity. When the output of a logic gate makes a 0-to-1 transition, a current flows from the power supply and charges the output capacitance. On the other hand, when the output makes a 1-to-0, a 0-to-0, or a 1-to-1 transition, no energy or only a limited amount of energy (due to short-circuit or leakage currents) is drawn from the power supply. This is the fundamental reason why information leaks through the power supply and why power attacks are possible. The basis of a secure digital design flow is therefore a logic style with constant power consumption.

6.1 Current Mode Logic

Current mode logic (CML), e.g. current steering logic, seems the ideal solution. This type of logic continuously draws a current from the supply and determines its state by the path that the current takes. A gate has constant power consumption if it draws a perfectly constant current from the power supply, independently of the input and output signals. To build a current source capable of generating a constant current, special circuit techniques that minimize channel length modulation have to be used.

The decisive drawback of CML, however, is its static power consumption. Even when the logic gate is not processing any data, it burns the current, which makes this logic style unacceptable for embedded, battery-operated devices.

6.2 Voltage Mode Logic (CMOS circuit styles)

Voltage mode logic (VML), e.g. static CMOS logic, only draws a current from the supply to change state and determines its state by the amount of charge stored on a capacitance. A regular standard CMOS circuit only consumes power when a capacitance gets charged and later discharged, i.e. when a gate switches state. This is the main reason that CMOS is the style of choice for every battery-operated or low-power device, as illustrated in the figure below for a simple inverter. Static CMOS is thus the preferred logic style because of its low power consumption and high noise margins.

Figure: Standard CMOS inverter

Yet two conditions must be satisfied for VML to have constant power consumption, namely:

1) A logic gate must have exactly one switching event per signal transition.

2) The logic gate must charge a constant capacitance in that switching event.

In the inverter shown above, all four transitions of the CMOS inverter can be distinguished when monitoring the power supply.

6.3 Dynamic Differential Logic

Dynamic differential logic, sometimes also referred to as dual-rail with pre-charge logic, fulfills the first condition. A differential logic family uses the true and the false representation of the input and output signals, and a dynamic logic family alternates pre-charge and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1, exactly one of the two output nodes evaluates to 0 in the evaluation phase, which produces a differential output signal. The discharged output node is charged back to 1 in the following pre-charge phase, so that both outputs are again at 1. In other words, every signal transition, including the events in which the input signals remain constant, is represented by an actual switching event in which the logic gate charges a capacitance. The logic families introduced to thwart differential power analysis (DPA) using dynamic differential logic include:

1. Sense Amplifier Based Logic (SABL) and

2. Wave Dynamic Differential Logic (WDDL) gates
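The first condition can be illustrated with a small counting model in Python. It is a sketch under simplifying assumptions: only capacitor-charging events (a node going from 0 to 1) are counted as supply activity, all capacitances are taken as equal, and the function names are hypothetical. A single-rail static CMOS output charges only on 0-to-1 transitions, whereas a dual-rail pre-charged output recharges exactly one node per cycle whatever the data.

```python
def single_rail_charge_events(outputs, initial=0):
    """Count 0->1 transitions of a static CMOS output (data dependent)."""
    count, previous = 0, initial
    for value in outputs:
        if previous == 0 and value == 1:
            count += 1
        previous = value
    return count


def dual_rail_charge_events(outputs):
    """Count recharged rails of a dual-rail output pre-charged to 1."""
    count = 0
    for value in outputs:
        true_rail, false_rail = 1, 1      # pre-charge phase: both rails at 1
        if value == 1:
            false_rail = 0                # evaluation discharges one rail
        else:
            true_rail = 0
        # The next pre-charge phase recharges every discharged rail.
        count += (1 - true_rail) + (1 - false_rail)
    return count


for data in ([0, 0, 0, 0], [1, 0, 1, 0], [1, 1, 1, 1]):
    print(data, single_rail_charge_events(data), dual_rail_charge_events(data))
```

In this model the single-rail count varies with the data sequence, while the dual-rail count always equals the number of clock cycles.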

6.3.1 Sense Amplifier Based Logic (SABL)

The main advantage of SABL is that it has balanced input and output nodes and that all internal nodes connect to an output. The output capacitances can be balanced. Systematic methods have been developed to make sure that both branches of the differential pull-down network are balanced and that no memory effects are present in the network. Sense Amplifier Based Logic is illustrated below.

Figure: Sense Amplifier Based Logic AND/NAND gate

This circuit style does, however, require full-custom characterization and layout. It also suffers from a high clock load, which is common to all dynamic logic gates.

6.3.2 Wave Dynamic Differential Logic Gates (WDDL)

WDDL can be implemented with static CMOS logic. Static CMOS standard cells are combined to form secure compound standard cells, which have a reduced power signature. WDDL has many advantages. It can be readily implemented from an existing standard cell library. The design flow is fully supported with accurate EDA library files that come directly from the vendor. WDDL also results in a dynamic differential logic with only a small load capacitance on the pre-charge control signal and with the low power consumption and the high noise margins of static CMOS.

Advantages of the WDDL logic style are as follows:

- A major advantage of the proposed logic style is that it can be incorporated into the common Electronic Design Automation (EDA) tool flow.

- No special design rules are involved in the interconnection of WDDL gates.

The switching factor of WDDL is 100%. A WDDL gate consists of a parallel combination of two positive complementary gates, one calculating the true output using the true inputs, the other the false output using the false inputs. A positive gate produces a zero output for an all-zero input. The AND gate and the OR gate are examples of positive gates. A complementary gate, sometimes also referred to as a dual gate, expresses the false output of the original logic gate using the false inputs of the original gate. The AND gate fed with true input signals and the OR gate fed with false input signals are two dual gates. The figure shows the WDDL AND gate and the WDDL OR gate.

In the evaluation phase, each input signal is differential and the WDDL gate calculates its differential output. In the pre-charge phase, the inputs to the WDDL gate are set at 0. This puts the output of the gate at 0. A module in WDDL pre-charges without distributing the pre-charge signal to each individual gate. During the pre-charge phase, the input vector of the combinatorial logic is set at all 0s. Each individual gate will eventually have all its inputs at 0, evaluate its output to 0, and pass this 0 value to the next gate. One could say that the pre-charge signal travels over the combinatorial logic as a 0-wave, hence WDDL.

There are several ways to launch the pre-charge wave. In the figure, a pre-charge operator is inserted at the start of every combinatorial logic tree, i.e. at the inputs of the encryption module and at the outputs of the registers. These operators produce an all-zero output in the pre-charge phase (clk signal high) but let the differential signal through during the evaluation phase (clk signal low).
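The gate construction and the pre-charge wave described above can be modelled in a few lines of Python. This is a behavioural sketch, not the project's VHDL: each signal is represented as a (true rail, false rail) pair, and the names `encode`, `wddl_and`, `wddl_or`, `precharge_operator` and `evaluate_cycle` are hypothetical. The small network (a AND b) OR c is chosen only for illustration.

```python
PRECHARGE = (0, 0)   # both rails low during the pre-charge phase


def encode(bit):
    """Differential encoding used in evaluation: (true rail, false rail)."""
    return (bit, 1 - bit)


def wddl_and(a, b):
    """AND on the true rails in parallel with its dual (OR) on the false rails."""
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a, b):
    """OR on the true rails in parallel with its dual (AND) on the false rails."""
    return (a[0] | b[0], a[1] & b[1])


def precharge_operator(signal, clk):
    """Force (0, 0) while clk is high; pass the signal while clk is low."""
    return PRECHARGE if clk else signal


def evaluate_cycle(a_bit, b_bit, c_bit):
    """One clock cycle of (a AND b) OR c; returns the output in both phases."""
    results = {}
    for phase, clk in (("precharge", 1), ("evaluation", 0)):
        a = precharge_operator(encode(a_bit), clk)
        b = precharge_operator(encode(b_bit), clk)
        c = precharge_operator(encode(c_bit), clk)
        results[phase] = wddl_or(wddl_and(a, b), c)
    return results


print(evaluate_cycle(1, 1, 0))
```

Only the inputs are forced to zero in the pre-charge phase. The zeros then propagate through the positive gates without any per-gate pre-charge signal, and in the evaluation phase exactly one rail of the output rises for every input combination.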

Figure: WDDL pre-charge wave generation

CHAPTER 7 WDDL GATES

The methodology used in the project is a bottom-up approach. Lower-level modules are designed first and later integrated to form larger modules, whose further integration leads to the final top module. Since logic gates form the lowest-level modules, the logic gates required for the design are implemented first in WDDL style. WDDL demands a parallel combination of two positive complementary gates, one calculating the true value and the other the false value. Logic gates such as OR, AND, and XOR have been implemented. In addition, a full adder, a 32-bit XOR, etc. have been implemented.

7.1 WDDL OR gate

A WDDL OR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.

Figure 4.1 WDDL OR Gate

The NAND and NOT gates used act as the Precharge operator, injecting the signal '0' into the gates when the 'clk' signal is high, i.e. during the Precharge phase.

7.2 WDDL AND gate

A WDDL AND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.

Figure 4.2 WDDL AND Gate

7.3 WDDL NAND gate

A WDDL NAND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.

Figure 4.3 WDDL NAND Gate

7.4 WDDL NOR gate

A WDDL NOR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.

Figure 4.4 WDDL NOR Gate

7.5 WDDL XOR gate

The XOR function can be implemented by the Boolean equation A ⊕ B = AB' + A'B. The XOR gate is therefore implemented directly in terms of AND gates and OR gates, as shown in Figure 4.5. It can also be implemented by instantiating a WDDL AND gate and a WDDL OR gate, but the number of gates involved in the latter method is greater than in the former. Therefore the first method of implementation is followed rather than the second one.
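The inverting gates and the XOR equation above can be checked with a short Python model. It is a sketch that assumes the usual WDDL convention that inversion is obtained by swapping the true and false rails, which would explain why the NAND and NOR gates reuse the AND/OR pair. The figures are not available here, so this wiring is an assumption, and all function names are hypothetical.

```python
def encode(bit):
    """Differential encoding: (true rail, false rail)."""
    return (bit, 1 - bit)


def wddl_not(a):
    """Inversion by swapping the rails; no inverter gate is needed."""
    return (a[1], a[0])


def wddl_and(a, b):
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a, b):
    return (a[0] | b[0], a[1] & b[1])


def wddl_nand(a, b):
    """Same AND/OR pair as the WDDL AND gate, with the outputs swapped."""
    return wddl_not(wddl_and(a, b))


def wddl_nor(a, b):
    """Same OR/AND pair as the WDDL OR gate, with the outputs swapped."""
    return wddl_not(wddl_or(a, b))


def wddl_xor(a, b):
    """A xor B = A.B' + A'.B, built only from positive gates and rail swaps."""
    return wddl_or(wddl_and(a, wddl_not(b)), wddl_and(wddl_not(a), b))


for x in (0, 1):
    for y in (0, 1):
        print(x, y, wddl_nand(encode(x), encode(y)),
              wddl_nor(encode(x), encode(y)), wddl_xor(encode(x), encode(y)))
```

Because a rail swap maps the pre-charge value (0, 0) to itself, the pre-charge wave still passes through these gates unchanged.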

Figure 4.5 WDDL XOR gate

With the help of the above basic gates, a full adder circuit has been designed by instantiating the WDDL gates designed above. The implementation of the Blowfish algorithm requires a 32-bit XOR gate and a 32-bit adder circuit. They can easily be implemented by instantiating the corresponding lower-level module 32 times.

CHAPTER 8 FRONT END RESULTS

WDDL OR GATE

Synthesis Report

==========================================================
                      Final Report
==========================================================
Final Results
RTL Top Level Output File Name   : wddlor.ngr
Top Level Output File Name       : wddlor
Output Format                    : NGC
Optimization Goal                : Speed
Keep Hierarchy                   : NO

Design Statistics
IOs                              : 5

Cell Usage
BELS                             : 2
  LUT3                           : 2
IO Buffers                       : 5
  IBUF                           : 3
  OBUF                           : 2
==========================================================
Device utilization summary
---------------------------
Selected Device                  : 3s250eft256-4
Number of Slices                 : 1 out of 2448   0%
Number of 4 input LUTs           : 2 out of 4896   0%
Number of IOs                    : 5
Number of bonded IOBs            : 5 out of 172    2%

Timing Summary
---------------
Speed Grade                      : -4
Maximum combinational path delay : 6.236 ns

Simulation Result

Synthesis Result

WDDL AND GATE

Synthesis Report

==========================================================
                      Final Report
==========================================================
Final Results
RTL Top Level Output File Name   : wddlgates.ngr
Top Level Output File Name       : wddlgates
Output Format                    : NGC
Optimization Goal                : Speed
Keep Hierarchy                   : NO

Design Statistics
IOs                              : 5

Cell Usage
BELS                             : 2
  LUT3                           : 2
IO Buffers                       : 5
  IBUF                           : 3
  OBUF                           : 2
==========================================================
Device utilization summary
---------------------------
Selected Device                  : 3s250etq144-4
Number of Slices                 : 1 out of 2448   0%
Number of 4 input LUTs           : 2 out of 4896   0%
Number of IOs                    : 5
Number of bonded IOBs            : 5 out of 108    4%

Timing Summary
---------------
Speed Grade                      : -4
Maximum combinational path delay : 6.236 ns

Simulation Result

Synthesis Result

WDDL NAND GATE

Synthesis Report

==========================================================
                      Final Report
==========================================================
Final Results
RTL Top Level Output File Name   : wddlnand1.ngr
Top Level Output File Name       : wddlnand1
Output Format                    : NGC
Optimization Goal                : Speed
Keep Hierarchy                   : NO

Design Statistics
IOs                              : 5

Cell Usage
BELS                             : 2
  LUT3                           : 2
IO Buffers                       : 5
  IBUF                           : 3
  OBUF                           : 2
==========================================================
Device utilization summary
---------------------------
Selected Device                  : 3s500efg320-4
Number of Slices                 : 1 out of 4656   0%
Number of 4 input LUTs           : 2 out of 9312   0%
Number of IOs                    : 5
Number of bonded IOBs            : 5 out of 232    2%

Timing Summary
---------------
Speed Grade                      : -4
Maximum combinational path delay : 6.236 ns

Simulation Result

Synthesis Result

WDDL XOR GATE

Synthesis Report

==========================================================
                      Final Report
==========================================================
Final Results
RTL Top Level Output File Name   : wddlxorgate.ngr
Top Level Output File Name       : wddlxorgate
Output Format                    : NGC
Optimization Goal                : Speed
Keep Hierarchy                   : NO

Design Statistics
IOs                              : 5

Cell Usage
BELS                             : 2
  LUT3                           : 2
IO Buffers                       : 5
  IBUF                           : 3
  OBUF                           : 2
==========================================================
Device utilization summary
---------------------------
Selected Device                  : 3s250eft256-4
Number of Slices                 : 1 out of 2448   0%
Number of 4 input LUTs           : 2 out of 4896   0%
Number of IOs                    : 5
Number of bonded IOBs            : 5 out of 172    2%

Timing Summary
---------------
Speed Grade                      : -4
Maximum combinational path delay : 6.236 ns

Simulation Result

Synthesis Result

CHAPTER 9 SUMMARY AND CONCLUSION

9.1 Summary

In order to secure ICs against side-channel attacks, especially Differential Power Analysis (DPA), it is necessary to implement the design in a logic style that gives constant power dissipation irrespective of the input combination. WDDL has proved to be advantageous over other styles and is therefore of great significance. In this dissertation work, an architecture for the Blowfish algorithm is designed and implemented in WDDL style. A bottom-up approach is used in this implementation: the low-level entities are designed first and are later combined to form the entire module. The key scheduling is online. The sub-keys generated for a particular key can be used for the encryption of all the data to be encrypted with that key. The sub-keys are applied in reverse order for the decryption data path.

Initially, logic gates are implemented in WDDL, and the higher-level modules are then designed by instantiating the WDDL gates to form the entire module, resulting in constant power dissipation irrespective of the input data combination. The entire design works in two phases, namely the Precharge phase and the Evaluation phase. In the Precharge phase, all the signals of the design are zeroed, and during the Evaluation phase the functionality of the design is achieved. This sort of design has been found simple and very effective in thwarting the side-channel attack known as Differential Power Analysis (DPA).

9.2 Conclusion

The crypto processor has been designed for a key size of 448 bits and a plaintext size of 64 bits. The code for the implementation has been written in VHDL. The functional verification has been done using the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The back end of the design is done using SOC Encounter. The desired functionality has been achieved according to the specifications. At the output, the same number of transitions occurs in every Evaluation phase, resulting in constant power dissipation.

During synthesis, it has been observed that a simple WDDL gate comprises many conventional gates. The area of the design has therefore grown nearly three-fold compared to the design implemented in conventional CMOS logic. This is the cost of the security incorporated into the IC against Differential Power Analysis (DPA). Due to the constant power dissipation at the output, an attacker cannot apply the DPA scheme to find the secret key that is being used in the crypto processor. Security against DPA is thus incorporated into the IC at the hardware level by implementing the design in WDDL style, which is quite simple and effective.

CHAPTER 10

REFERENCES

10.1 Referred Technical Papers

[1] Kris Tiri and Ingrid Verbauwhede, "A Digital Design Flow for Secure Integrated Circuits," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 25, No. 7, July 2006.

[2] Prof. Jean-Jacques Quisquater, Math RiZK, "Side Channel Attacks," October 2002.

[3] Ross Anderson, Mike Bond, Jolyon Clulow and Sergei Skorobogatov, "Crypto processors - A Survey," IEEE Proceedings.

[4] Noohul Basheer Zain Ali, James M. Noras, "Optimal Data path Design for a Cryptographic Processor: The Blowfish Algorithm," Malaysian Journal of Computer Science, Vol. 14, No. 1, June 2001, pp. 16-27.

[5] Dan Rinehimer, Derek Wilson, "Summary of B. Schneier's Description of New Variable Length Key, 64-Bit Block Cipher (Blowfish)."

[6] Kris Tiri and Ingrid Verbauwhede, "A Dynamic and Differential CMOS Logic Style to Resist Power and Timing Attacks on Security IC's."

[7] Discretix Technologies Limited, "Introduction to Side Channel Attacks."

[8] Kris Tiri, Moonmoon Akmal and Ingrid Verbauwhede, "A Dynamic and Differential Logic with Signal Independent Power Consumption to withstand Differential Power Analysis on Smart Cards."

Reference Books

[1] William Stallings, "Cryptography and Network Security: Principles and Practices," Pearson Education, 2003.

[2] Samir Palnitkar, "Verilog HDL: A Guide to Digital Design and Synthesis," Prentice Hall.

Referred Web Pages

[1] http://www.schneier.com/blowfish.html

[2] http://www.nist.gov

[3] http://www.discretix.com/PDF/Introduction%20to%20Side%20Channel%20Attacks.pdf

[4] http://www.wipo.int/pctdb/en/wo.jsp?IA=WO2005081085&DISPLAY=CLAIMS

  • INDEX
  • CHAPTER 1 INTRODUCTION AND OBJECTIVE
  • CHAPTER 2 REVIEW OF LITERATURE
  • CHAPTER 3 HARDWARE DESCRIPTIVE LANGUAGE (VHDL)
  • CHAPTER 4 SMART CARD OVERVIEW
  • CHAPTER 5 SIDE CHANNEL ATTACKS
  • CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES
The first publicly available version was released in 1985.

In 1986, IEEE (Institute of Electrical and Electronics Engineers, Inc.) was presented with a proposal to standardize VHDL.

In 1987, standardization => IEEE 1076-1987.

An improved version of the language was released in 1994 => IEEE Standard 1076-1993.

Related Standards

IEEE 1076 doesn't support simulation conditions such as unknown and high-impedance.

Soon after IEEE 1076-1987 was released, simulator companies began using their own non-standard types => VHDL was becoming nonstandard.

The IEEE 1164 standard was developed by the IEEE.

IEEE 1164 contains definitions for a nine-valued data type, std_logic.

IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate to actual hardware.

It defines, e.g., two numeric types: signed and unsigned.

VHDL Environment

Design Units

Segments of VHDL code can be compiled separately and stored in a library.

Entities

A black box with an interface definition.

Defines the inputs/outputs of a component (defines pins).

A way to represent modularity in VHDL.

Similar to a symbol in a schematic.

The entity declaration describes the entity.

E.g.

    library ieee;
    use ieee.std_logic_1164.all;

    -- 8-bit equality comparator: interface only
    entity Comparator is
      port (A, B : in  std_logic_vector(7 downto 0);
            EQ   : out std_logic);
    end Comparator;

Ports

Provide channels of communication between the component and its environment.

Each port must have a name, a direction, and a type.

An entity may have NO port declaration.

Port directions

In: The value of a port can be read inside the component but cannot be assigned. Multiple reads of the port are allowed.

Out: Assignments can be made to a port, but data from a port cannot be read. Multiple assignments are allowed.

Inout: Bi-directional; assignments can be made and data can be read. Multiple assignments are allowed.

Buffer: An out port with read capability. May have at most one assignment (not recommended).

Architectures

Every entity has at least one architecture.

One entity can have several architectures.

Architectures can describe a design using:
- Behavior
- Structure
- Dataflow

Architectures can describe a design on many levels:
- Gate level
- RTL (Register Transfer Level)
- Behavioral level

A configuration declaration links an architecture to an entity.

E.g.

    -- Dataflow architecture: EQ is '1' only when both inputs are equal
    architecture Comparator1 of Comparator is
    begin
      EQ <= '1' when (A = B) else '0';
    end Comparator1;

Configurations

Link the entity declaration and the architecture body together.

The concept of a default configuration is a bit messy in VHDL '87:
- The last architecture analyzed links to the entity.

Can be used to change simulation behavior without re-analyzing the VHDL source.

Complex configuration declarations are ignored in synthesis.

Some entities can have, e.g., a gate-level architecture and a behavioral architecture.

Are always optional.

Packages

Packages contain information common to many design units.

1. Package declaration
- Constant declarations
- Type and subtype declarations
- Function and procedure declarations
- Global signal declarations
- File declarations
- Component declarations

2. Package body
- Is not necessarily needed
- Function bodies
- Procedure bodies

Packages are meant for encapsulating data that can be shared globally among several design units. They consist of a declaration part and an optional body part.

A package declaration can contain:
- Type and subtype declarations
- Subprograms
- Constants
- Alias declarations
- Global signal declarations
- File declarations
- Component declarations

A package body consists of:
- Subprogram declarations and bodies
- Type and subtype declarations
- Deferred constants
- File declarations

Libraries

A collection of VHDL design units (database):

1. Packages
- package declaration
- package body

2. Entities (entity declaration)

3. Architectures (architecture body)

4. Configurations (configuration declarations)

Usually a directory in a UNIX file system.

Can also be any other kind of database.

Levels of Abstraction

VHDL supports many possible styles of design description, which differ primarily in how closely they relate to the hardware.

It is possible to describe a circuit in a number of ways, listed here from lower to higher level of abstraction:
- Structural
- Dataflow
- Behavioral

Structural VHDL description

The circuit is described in terms of its components.

Ranges from a low-level description (e.g. transistor-level description) to a high-level description (e.g. block diagram).

For large circuits, low-level descriptions quickly become impractical.

Dataflow VHDL Description

The circuit is described in terms of how data moves through the system.

In the dataflow style, you describe how information flows between registers in the system.

The combinational logic is described at a relatively high level; the placement and operation of registers is specified quite precisely.

The behavior of the system over time is defined by registers.

There are no built-in registers in the VHDL language:
- Either a lower-level description
- or a behavioral description of sequential elements is needed.

The lower-level register descriptions must be created or obtained.

If there are no third-party models for registers => you must write the behavioral description of the registers.

The behavioral description can be provided in the form of subprograms (functions or procedures).

Behavioral VHDL Description

The circuit is described in terms of its operation over time.

The representation might include, e.g., state diagrams, timing diagrams, and algorithmic descriptions.

The concept of time may be expressed precisely using delays (e.g. A <= B after 10 ns).

If no actual delays are used, the order of sequential operations is defined.

In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed timing specifications.

The actual timing results depend on the implementation technology and the efficiency of the synthesis tool.

There are a few tools for behavioral synthesis.

Concurrent Vs Sequential

Processes

The basic simulation concept in VHDL.

A VHDL description can always be broken up into interconnected processes.

Quite similar to a UNIX process.

Process keyword in VHDL

A process statement is a concurrent statement.

Statements inside a process statement are sequential statements.

A process must contain either a sensitivity list or wait statement(s), but NOT both.

The sensitivity list or wait statement(s) contain the signals that wake the process up.

General Format

    process [(sensitivity_list)]
      process_declarative_part
    begin
      process_statements
      [wait_statement]
    end process;

CHAPTER 4 SMART CARD OVERVIEW

This section very briefly introduces the concept of a smart card. Basically, a smart card is a computer embedded in a safe. It consists of a (typically 8-bit or 32-bit) processor together with ROM, EEPROM, and a small amount of RAM, and is therefore capable of performing computations. The main goal of a smart card is to allow the execution of cryptographic operations involving some secret parameter (the key) while not revealing this parameter to the outside world. Conversely, the goal of the attacker is to recover this secret parameter. The processor is embedded in a chip and connected to the outside world through eight wires, whose role, use, and position are standardized. In addition to the input/output wires, the parts we will be most interested in are the following:

1. Power supply: Smart cards do not have an internal battery. The current they need is provided by the smart card reader. This makes the smart card's power consumption fairly easy for an attacker to measure.

2. Clock: Similarly, smart cards do not have an internal clock either. The clock ticks must also be provided from the outside world. As a consequence, the attacker can measure the card's running time with very good precision.

Smart cards are usually equipped with protection mechanisms composed of a shield (the passivation layer), whose goal is to hide the internal behavior of the chip, and possibly sensors that react when the shield is removed by destroying all sensitive data and preventing the card from functioning properly.

CHAPTER 5 SIDE CHANNEL ATTACKS

"Side channel attacks" are attacks that are based on "side channel information". Side channel information is information that can be retrieved from the encryption device and that is neither the plaintext to be encrypted nor the ciphertext resulting from the encryption process.

In the past, an encryption device was perceived as a unit that receives plaintext input and produces ciphertext output, and vice versa. Attacks were therefore based on knowing the ciphertext (ciphertext-only attacks), on knowing both plaintext and ciphertext (known-plaintext attacks), or on the ability to define what plaintext is to be encrypted and then seeing the results of the encryption (chosen-plaintext attacks). Today, it is known that encryption devices have additional outputs, and often additional inputs, which are neither the plaintext nor the ciphertext.

Encryption devices produce timing information (information about the time that operations take) that is easily measurable, radiation of various sorts, power consumption statistics (that can be easily measured as well), and more. Often the encryption device also has additional "unintentional" inputs, such as voltage, that can be modified to cause predictable outcomes. Side channel attacks make use of some or all of this information, along with other (known) cryptanalytic techniques, to recover the key the device is using.

Side channel analysis techniques are of concern because the attacks can be mounted quickly and can sometimes be implemented using readily available hardware costing from only a few hundred dollars to thousands of dollars.

5.1 Classification of side channel attacks

The literature usually classifies side channel attacks along two orthogonal axes.

1. Invasive vs. non-invasive

Invasive attacks require de-packaging the chip to get direct access to its components. A typical example of this is the connection of a wire to a data bus to see the data transfers.

A non-invasive attack only exploits externally available information (the emission of which is, however, often unintentional), such as running time or power consumption.

A more recent category is that of semi-invasive attacks. These attacks require de-packaging of the chip to get access to the chip surface but do not tamper with the passivation layer (i.e. they do not require electrical contact to the metal surface).

2. Active vs. passive

Active attacks try to tamper with the card's proper functioning. For example, fault induction attacks try to induce errors in the computation.

Passive attacks, by contrast, simply observe the card's behavior during its processing without disturbing it.

Note that these two axes are indeed orthogonal. An invasive attack may completely avoid disturbing the card's behavior, and a passive attack may require a preliminary de-packaging for the required information to be observable.

These attacks are of course not mutually exclusive: an invasive attack may, for example, serve as a preliminary step for a non-invasive one by giving a detailed description of the chip's architecture that helps to find out where to put external probes.

Smart cards are usually equipped with protection mechanisms that are supposed to react to invasive attacks (although several invasive attacks are nonetheless capable of defeating these mechanisms, as will be illustrated below). On the other hand, it is worth pointing out that a non-invasive attack is completely undetectable: there is, for example, no way for a smart card to figure out that its running time is currently being measured. Other countermeasures will therefore be necessary. From an economic point of view, invasive attacks are usually more expensive to deploy on a large scale, since they require individual processing of each attacked device. In this sense, non-invasive attacks constitute a bigger menace for the smart card industry.

Invasive attacks involve a relatively high capital investment for lab equipment plus a moderate investment of effort for each individual chip attacked. Non-invasive attacks require only a moderate capital investment plus a moderate investment of effort in designing an attack on a particular type of device. Thereafter, the cost per device attacked is low. Semi-invasive attacks can be carried out using very cheap and simple equipment.

The attacker can gain information by:

1. Probing attacks

2. Fault induction attacks

3. Timing attacks

4. Power analysis attacks and

5. Electromagnetic timing attacks

These attacks exploit the switching behavior of digital complementary metal-oxide-semiconductor (CMOS) gates. Of all these, the power analysis attack is of major concern.

5.2 Power analysis attacks

The power consumption of a cryptographic device may provide much information about the operations that take place and the parameters involved. This is the idea of simple and differential power analysis, first introduced by Kocher et al. Like the clock ticks, the card's energy is provided by the terminal and can therefore easily be measured. Basically, to measure a circuit's power consumption, a small (e.g. 50 ohm) resistor is inserted in series with the power or ground input. The voltage difference across the resistor divided by the resistance yields the current. Well-equipped electronics labs have equipment that can digitally sample voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than 1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC can be bought for less than US$ 400.
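The measurement arithmetic described above can be written out as a short Python sketch. The 50 ohm value comes from the text. The 5 V supply, the sample voltages, and the function names are illustrative assumptions, and the resistor is assumed to sit in the power line, so that the card sees the supply voltage minus the drop across the resistor.

```python
def supply_current(v_drop, r_shunt=50.0):
    """Ohm's law: current through the series resistor, in amperes."""
    return v_drop / r_shunt


def instantaneous_power(v_drop, v_supply=5.0, r_shunt=50.0):
    """Power drawn by the card: current times the voltage left across it."""
    current = supply_current(v_drop, r_shunt)
    return current * (v_supply - v_drop)


# Hypothetical voltage drops (in volts) sampled across the resistor.
samples = [0.10, 0.25, 0.15]
for v in samples:
    print(v, supply_current(v), instantaneous_power(v))
```

A sequence of such power values, one per sampling instant, forms the power trace that the attacks discussed here analyze.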

Power analysis attacks are of two types:

1. Simple Power Analysis attack and

2. Differential Power Analysis attack

SPA attacks on smart cards typically take a few seconds per card, while DPA attacks can take several hours. In general, from a somewhat academic perspective, we may consider the entire internal state of the block cipher to be all the intermediate results and values that are never included in the output in normal operation. For example, DES has 16 rounds; we can consider the intermediate states state[1..15] after each round except the last as a secret internal state. Side channels typically give information about these internal states or about the operations used in the transition of this internal state from one round to another. The type of side channel will of course determine what information is available to the attacker about these states. The attacks typically work by finding some information about the internal state of the cipher that can be learned both by guessing part of the key and checking the value directly, and additionally by some statistical property of the cipher that makes that checkable value slightly nonrandom.

5.2.1 Simple Power Analysis attack (SPA)

Simple Power Analysis is a technique that involves direct interpretation of power consumption measurements collected during cryptographic operations, generally by looking at a visual representation of the power consumption of a unit while an encryption operation is being performed. SPA can yield information about a device's operation as well as key material.

A trace refers to a set of power consumption measurements taken across a cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a trace containing 5000 points. The figure, for example, shows an SPA trace from a smart card performing a DES operation.

Figure: SPA monitoring from a single DES operation performed by a typical smart card. The upper trace shows the entire encryption operation, including the initial permutation, the 16 DES rounds, and the final permutation. The lower trace is a detailed view of the second and third rounds.

Because SPA can reveal the sequence of instructions executed, it can be used to break cryptographic implementations in which the execution path depends on the data being processed. For example:

DES key schedule: The DES key schedule computation involves rotating 28-bit key registers. A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will contain different SPA features if the execution paths take different branches for each.

DES permutations: DES implementations perform a variety of bit permutations. Conditional branching in software or microcode can cause significant power consumption differences for "0" and "1" bits.

Comparisons: String or memory comparison operations typically perform a conditional branch when a mismatch is found. This conditional branching causes large SPA (and sometimes timing) characteristics.

Multipliers: Modular multiplication circuits tend to leak a great deal of information about the data they process. The leakage functions depend on the multiplier design, but are often strongly correlated to operand values and Hamming weights.

Exponentiators: A simple modular exponentiation function scans across the exponent, performing a squaring operation in every iteration with an additional multiplication operation for each exponent bit that is equal to "1". The exponent can be compromised if squaring and multiplication operations have different power consumption characteristics, take different amounts of time, or are separated by different code. Modular exponentiation functions that operate on two or more exponent bits at a time may have more complex leakage functions.
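To make the exponentiator leak concrete, the following worked example traces a left-to-right square-and-multiply for the exponent $e = 11 = (1011)_2$. The exponent value and the scan direction are assumptions chosen for illustration; $S$ denotes a squaring and $M$ a multiplication by the base $x$ (all arithmetic is modulo $n$).

```latex
\begin{align*}
\text{bit } 1:&\quad 1 \xrightarrow{\;S\;} 1 \xrightarrow{\;M\;} x \\
\text{bit } 0:&\quad x \xrightarrow{\;S\;} x^{2} \\
\text{bit } 1:&\quad x^{2} \xrightarrow{\;S\;} x^{4} \xrightarrow{\;M\;} x^{5} \\
\text{bit } 1:&\quad x^{5} \xrightarrow{\;S\;} x^{10} \xrightarrow{\;M\;} x^{11}
\end{align*}
```

The operation sequence is $SM,\ S,\ SM,\ SM$. If an observer can tell $S$ from $M$ in the power trace, each $SM$ pair reads as a "1" bit and each lone $S$ as a "0" bit, which gives back the exponent 1011.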

5.2.2 Differential Power Analysis attack (DPA)

In addition to large-scale power variations due to the instruction sequence, there are effects correlated to the data values being manipulated. These variations tend to be smaller and are sometimes overshadowed by measurement errors and other noise. In such cases, it is still often possible to break the system using statistical functions tailored to the target algorithm.

To implement the DPA attack, an attacker first observes $m$ encryption operations and captures power traces $T_{1..m}[1..k]$ containing $k$ samples each. In addition, the attacker records the ciphertexts $C_{1..m}$. No knowledge of the plaintext is required. DPA analysis uses power consumption measurements to determine whether a key block guess $K_s$ is correct. The attacker computes a $k$-sample differential trace $\Delta_D[1..k]$ by finding the difference between the average of the traces for which a certain intermediate value $V$ is one and the average of the traces for which $V$ is zero. Here $V$ is the bit predicted by the selection function $D(C_i, b, K_s)$ from the ciphertext $C_i$, the target bit $b$ and the key block guess $K_s$. Thus $\Delta_D[j]$ is the average over $C_{1..m}$ of the effect due to the value represented by the selection function $D$ on the power consumption at point $j$.
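The difference of averages described above can be written out explicitly. The expression below is a sketch in the standard difference-of-means form used by Kocher et al.; the summation index $i$ is notation assumed here, and $D(C_i, b, K_s) \in \{0, 1\}$ is the predicted value of the target bit for trace $i$.

```latex
\[
\Delta_D[j] \;=\;
\frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}
     {\sum_{i=1}^{m} D(C_i, b, K_s)}
\;-\;
\frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}
     {\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)}
\]
```

The first term is the mean of the traces for which the predicted bit is one, and the second term is the mean of the traces for which it is zero.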

If $K_s$ is incorrect, the bit computed using $D$ will differ from the actual target bit for about half of the ciphertexts $C_i$. The selection function is thus effectively uncorrelated to what was actually computed by the target device. If a random function is used to divide a set into two subsets, the difference in the averages of the subsets should approach zero as the subset sizes approach infinity.

Thus, trace components uncorrelated to $D$ will diminish with $1/\sqrt{m}$, causing the differential trace to become flat (the actual trace may not be completely flat, as $D$ with $K_s$ incorrect may have a weak correlation to $D$ with the correct $K_s$). If $K_s$ is correct, however, the computed value for $D(C_i, b, K_s)$ will equal the actual value of target bit $b$ with probability 1. The selection function is thus correlated to the value of the bit considered. Other data values, measurement errors, etc. that are not correlated to $D$ approach zero. Because power consumption is correlated to data bit values, the plot of $\Delta_D$ will be flat with spikes in regions where $D$ is correlated to the values being processed. The correct value of $K_s$ can thus be identified from the spikes in its differential trace. Four values of $b$ correspond to each S box, providing confirmation of key block guesses. Finding all eight $K_s$ yields the entire 48-bit round subkey. The remaining 8 key bits can be found easily using exhaustive search or by analyzing one additional round. Triple DES keys can be found by analyzing an outer DES operation first, using the resulting key to decrypt the ciphertexts, and attacking the next DES key. DPA can use known plaintext or known ciphertext and can find encryption or decryption keys.
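The bit counts in the paragraph above follow from the structure of DES, in which each of the eight S boxes takes 6 bits of the 48-bit round subkey and the full key has 56 bits. As a rough tally of the attacker's work (the total is an illustrative estimate, not a figure from the report):

```latex
\begin{align*}
\text{guesses per S box} &= 2^{6} = 64 \\
\text{round subkey bits} &= 8 \times 6 = 48 \\
\text{remaining key bits} &= 56 - 48 = 8 \;\Rightarrow\; 2^{8} = 256 \text{ candidates} \\
\text{total guesses} &\approx 8 \times 64 + 256 = 768 \;\ll\; 2^{56}
\end{align*}
```

The attack is practical because each S box can be attacked separately, so the search is a sum of small searches rather than a product.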


CHAPTER 6 CONSTANT POWER CONSUMING

LOGIC STYLES

The power consumption of traditional standard cells and logic is dependent on the signal activity. When the output of the logic gate makes a 0 to 1 transition, a current comes from the power supply and charges the output capacitance. On the other hand, when the output sees a 1 to 0, a 0 to 0, or a 1 to 1 transition, no energy or only a limited amount of energy (due to short circuit or leakage) is consumed from the power supply. This is the fundamental reason why information is leaked through the power supply and why power attacks are possible. The basis of a secure digital design flow is a logic style with constant power consumption.
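The asymmetry described above can be stated with the usual first-order CMOS model. In the sketch below, $C_L$ (output load capacitance), $V_{DD}$ (supply voltage) and $i_{DD}(t)$ (supply current) are symbols assumed here, and short-circuit and leakage currents are neglected.

```latex
\begin{align*}
E_{0 \to 1} &= \int V_{DD}\, i_{DD}(t)\, dt = C_L V_{DD}^{2} \\
E_{1 \to 0} &\approx 0, \qquad E_{0 \to 0} \approx 0, \qquad E_{1 \to 1} \approx 0
\end{align*}
```

Only the 0 to 1 event draws a charge $C_L V_{DD}$ from the supply, so the supply current reveals which output transition took place.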

6.1 Current Mode Logic

Current mode logic (CML), e.g. current steering logic, seems the ideal solution. This type of logic continuously draws a current from the supply and measures its state through the path that the current takes. A gate has constant power consumption if it draws a perfectly constant current from the power supply, independently of the input and output signals. To build a current source capable of generating a constant current, special circuit techniques that minimize channel length modulation have to be used.

The decisive drawback of CML, however, is its static power consumption. Even when the logic gate is not processing any data, it burns the current, which makes this logic style unacceptable for embedded battery-operated devices.

6.2 Voltage Mode Logic (CMOS circuit styles)

Voltage mode logic (VML), e.g. static CMOS logic, only draws a current from the supply to change state and measures its state by the amount of charge it stores on a capacitance. A regular standard CMOS circuit will only consume power when a capacitance gets charged and later discharged, i.e. when a gate switches state. This is the main reason that CMOS is the style of choice for every battery-operated or low-power device. This is illustrated in the figure below for a simple inverter. Thus static CMOS is the preferred logic style because of its low power consumption and high noise margins.

Figure: Standard CMOS inverter

Yet two conditions must be satisfied for VML to have constant power consumption, namely:

1) A logic gate must have exactly one switching event per signal transition.

2) The logic gate must charge a constant capacitance in that switching event.

As shown above, all four transitions of the CMOS inverter can be distinguished when monitoring the power supply, so a standard CMOS gate meets neither condition.

6.3 Dynamic Differential Logic

Dynamic differential logic, sometimes also referred to as dual rail with pre-charge logic, fulfills the first condition. A differential logic family uses the true and the false representation of the input and output signals, and a dynamic logic family alternates pre-charge and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1, exactly one of the two output nodes evaluates to 0 in the evaluation phase to produce a differential output signal. The discharged output node is charged to 1 again in the following pre-charge phase, so that both outputs are pre-charged to 1. In other words, every signal transition, including the events in which the input signals remain constant, is represented by an actual switching event in which the logic gate charges a capacitance. The logic families that have been introduced to thwart differential power analysis (DPA) by using dynamic differential logic include the following:

1. Sense Amplifier Based Logic (SABL), and

2. Wave Dynamic Differential Logic (WDDL) gates

6.3.1 Sense Amplifier Based Logic (SABL)

The main advantage of SABL is that it has balanced input and output nodes and that all internal nodes connect to an output. The output capacitances can be balanced. Systematic methods have been developed to make sure that both branches of the differential pull-down network are balanced and that no memory effects are present in the network. Sense Amplifier Based Logic is illustrated in the figure below.

Figure: Sense Amplifier Based Logic AND-NAND gate

This circuit style does, however, require full custom characterization and layout. It also suffers from a high clock load, common to all dynamic logic gates.

6.3.2 Wave Dynamic Differential Logic Gates (WDDL)

WDDL can be implemented with static CMOS logic. Static CMOS standard cells are combined to form secure compound standard cells, which have a reduced power signature. WDDL has many advantages. It can be readily implemented from an existing standard cell library. The design flow is fully supported with accurate EDA library files that come directly from the vendor. WDDL also results in a dynamic differential logic with only a small load capacitance on the pre-charge control signal and with the low power consumption and the high noise margins of static CMOS.

Advantages of the WDDL logic style are as follows:

A major advantage of the proposed logic style is that it can be incorporated into the common Electronic Design Automation (EDA) tool flow.

No special design rules are involved in the interconnection of WDDL gates.

The switching factor of WDDL is 100%.

A WDDL gate consists of a parallel combination of two positive complementary gates, one calculating the true output using the true inputs, the other the false output using the false inputs. A positive gate produces a zero output for an all-zero input. The AND gate and the OR gate are examples of positive gates. A complementary gate, sometimes also referred to as a dual gate, expresses the false output of the original logic gate using the false inputs of the original gate. The AND gate fed with true input signals and the OR gate fed with false input signals are two dual gates. The figure shows the WDDL AND gate and the WDDL OR gate. In the evaluation phase, each input signal is differential and the WDDL gate calculates its differential output. In the pre-charge phase, the inputs to the WDDL gate are set at 0. This puts the output of the gate at 0.

A module in WDDL pre-charges without distributing the pre-charge signal to each individual gate. During the pre-charge phase, the input vector of the combinatorial logic is set at all 0s. Each individual gate will eventually have all its inputs at 0, evaluate its output to 0, and pass this 0 value to the next gate. One could say that the pre-charge signal travels over the combinatorial logic as a 0-wave, hence WDDL. There are several ways to launch the pre-charge wave. In the figure, a pre-charge operator is inserted at the start of every combinatorial logic tree, i.e. at the inputs of the encryption module and the outputs of the registers. These operators produce an all-zero output in the pre-charge phase (clk signal high), but let the differential signal through during the evaluation phase (clk signal low).
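The pre-charge operator and the compound AND gate described above can be sketched in VHDL as follows. This is a minimal illustration under stated assumptions, not the project's actual source: the entity names (wddl_precharge, wddl_and2) and the port names are hypothetical, and the pre-charge operator is written here as a plain AND with the inverted clock.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Pre-charge operator (hypothetical name): forces both rails to '0'
-- while clk = '1' (pre-charge phase) and passes the differential
-- pair through while clk = '0' (evaluation phase).
entity wddl_precharge is
  port (clk : in  std_logic;
        d   : in  std_logic;   -- single-ended input
        q_t : out std_logic;   -- true rail
        q_f : out std_logic);  -- false rail
end entity wddl_precharge;

architecture dataflow of wddl_precharge is
begin
  q_t <= d and (not clk);
  q_f <= (not d) and (not clk);
end architecture dataflow;

library ieee;
use ieee.std_logic_1164.all;

-- WDDL AND gate (hypothetical name): an AND gate on the true inputs
-- in parallel with its dual, an OR gate on the false inputs.
entity wddl_and2 is
  port (a_t, a_f : in  std_logic;
        b_t, b_f : in  std_logic;
        y_t, y_f : out std_logic);
end entity wddl_and2;

architecture dataflow of wddl_and2 is
begin
  y_t <= a_t and b_t;  -- true output from true inputs
  y_f <= a_f or b_f;   -- false output from false inputs
end architecture dataflow;
```

With all four inputs at '0', both outputs of wddl_and2 are '0', so the 0-wave launched by the pre-charge operators passes through the gate without a separate pre-charge signal. In the evaluation phase exactly one of y_t and y_f rises to '1'.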

Figure: WDDL pre-charge wave generation

CHAPTER 7 WDDL GATES

The methodology used in the project is a bottom-up approach. Lower-level modules are designed and later integrated to form larger modules, whose further integration leads to the final top module. Since logic gates form the lowest-level modules, the logic gates required for the design are first implemented in WDDL style. WDDL demands a parallel combination of two positive complementary gates, one calculating the true value and the other the false value. Logic gates such as OR, AND and XOR have been implemented. In addition, a Full Adder, a 32-bit XOR, etc. have been implemented.

7.1 WDDL OR gate

A WDDL OR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.

Figure 4.1 WDDL OR Gate

The NAND and NOT gates used act as the Precharge operator, injecting signal '0' into the gates when the 'clk' signal is high, i.e. during the Precharge phase.

7.2 WDDL AND gate

A WDDL AND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.

Figure 4.2 WDDL AND Gate

7.3 WDDL NAND gate

A WDDL NAND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.

Figure 4.3 WDDL NAND Gate

7.4 WDDL NOR gate

A WDDL NOR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.

Figure 4.4 WDDL NOR Gate

7.5 WDDL XOR gate

The XOR function can be implemented by the Boolean equation $A \oplus B = AB' + A'B$. Therefore the XOR gate is implemented in terms of AND and OR gates, as shown in Figure 4.5. It can also be implemented by instantiating a WDDL AND gate and a WDDL OR gate, but the number of gates involved in the latter method is greater than in the former. Therefore the first method of implementation is followed rather than the second.

Figure 4.5 WDDL XOR gate

With the help of the above basic gates, a Full Adder circuit has been designed by instantiating the WDDL gates designed above. During the implementation of the Blowfish algorithm, a 32-bit XOR gate and a 32-bit Adder circuit are required. They can be easily implemented by instantiating the corresponding lower module 32 times.

CHAPTER 8 FRONT END

RESULTS

WDDL OR GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name : wddlor.ngr
Top Level Output File Name : wddlor
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 5

Cell Usage
BELS : 2
LUT3 : 2
IO Buffers : 5
IBUF : 3
OBUF : 2

Device utilization summary
Selected Device : 3s250eft256-4
Number of Slices : 1 out of 2448 (0%)
Number of 4 input LUTs : 2 out of 4896 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 172 (2%)

Timing Summary
Speed Grade : -4
Maximum combinational path delay : 6.236 ns

Simulation Result

Synthesis Result

WDDL AND GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name : wddlgates.ngr
Top Level Output File Name : wddlgates
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 5

Cell Usage
BELS : 2
LUT3 : 2
IO Buffers : 5
IBUF : 3
OBUF : 2

Device utilization summary
Selected Device : 3s250etq144-4
Number of Slices : 1 out of 2448 (0%)
Number of 4 input LUTs : 2 out of 4896 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 108 (4%)

Timing Summary
Speed Grade : -4
Maximum combinational path delay : 6.236 ns

Simulation Result

Synthesis Result

WDDL NAND GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name : wddlnand1.ngr
Top Level Output File Name : wddlnand1
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 5

Cell Usage
BELS : 2
LUT3 : 2
IO Buffers : 5
IBUF : 3
OBUF : 2

Device utilization summary
Selected Device : 3s500efg320-4
Number of Slices : 1 out of 4656 (0%)
Number of 4 input LUTs : 2 out of 9312 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 232 (2%)

Timing Summary
Speed Grade : -4
Maximum combinational path delay : 6.236 ns

Simulation Result

Synthesis Result

WDDL XOR GATE

Simulation Result

Synthesis Result

WDDL XOR GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name : wddlxorgate.ngr
Top Level Output File Name : wddlxorgate
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 5

Cell Usage
BELS : 2
LUT3 : 2
IO Buffers : 5
IBUF : 3
OBUF : 2

Device utilization summary
Selected Device : 3s250eft256-4
Number of Slices : 1 out of 2448 (0%)
Number of 4 input LUTs : 2 out of 4896 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 172 (2%)

Timing Summary
Speed Grade : -4
Maximum combinational path delay : 6.236 ns

Simulation Result

Synthesis Result


CHAPTER 9 SUMMARY AND CONCLUSION

9.1 Summary

In order to secure ICs against side-channel attacks, especially Differential Power Analysis (DPA), it is necessary to implement the design in a logic style that renders constant power dissipation irrespective of the input combination. WDDL has proved to be advantageous over the other styles and is therefore of great significance. In this dissertation work, an architecture for the Blowfish algorithm is designed and implemented in WDDL style. In this implementation, a bottom-up approach is used: the low-level entities are designed and later combined to form the entire module. The key scheduling is online. The sub-keys generated for a particular key can be used for the encryption of all the data to be encrypted with that key. The sub-keys are supplied in reverse order for the decryption data path. Initially, logic gates are implemented in WDDL, and higher-level modules are then designed by instantiating the WDDL gates to form the entire module, resulting in constant power dissipation irrespective of the input data combination. The entire design works in two phases, namely the Precharge phase and the Evaluation phase. In the Precharge phase all the signals of the design are zeroed, and during the Evaluation phase the functionality of the design is achieved. This kind of design has been found simple and very effective in thwarting the side-channel attack known as Differential Power Analysis (DPA).

9.2 Conclusion

The crypto processor has been designed for a key size of 448 bits and a plaintext block of 64 bits. The code for the implementation has been written in VHDL. The functional verification has been done using the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The back end of the design is done using SOC Encounter. The desired functionality has been achieved according to the specifications. At the output, during the Evaluation phase, there is always the same number of transitions, resulting in constant power dissipation. During synthesis it has been observed that a simple WDDL gate comprises many conventional gates. Therefore the area of the design has grown nearly three-fold compared to the design implemented in conventional CMOS logic; this is the cost of the security incorporated into the IC against Differential Power Analysis (DPA). Due to the constant power dissipation at the output, an attacker cannot apply the Differential Power Analysis (DPA) scheme to find the secret key that is being used in the crypto processor. Thus security against DPA is incorporated into the IC at hardware level by implementing the design in WDDL style, which is quite simple and effective.


Design Units

Segments of VHDL code can be compiled separately and stored in a library

Entities

A black box with interface definition

Defines the inputs/outputs of a component (define pins)

A way to represent modularity in VHDL

Similar to symbol in schematic

Entity declaration describes entity

E.g.

library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;

Ports

Provide channels of communication between the component and its environment

Each port must have a name direction and a type

An entity may have NO port declaration

Port directions

In: A value of a port can be read inside the component but cannot be assigned. Multiple reads of the port are allowed.

Out: Assignments can be made to a port, but data from a port cannot be read. Multiple assignments are allowed.

Inout: Bi-directional; assignments can be made and data can be read. Multiple assignments are allowed.

Buffer: An out port with read capability. May have at most one assignment (not recommended).

Architectures

Every entity has at least one architecture

One entity can have several architectures

Architectures can describe design using

- Behavior
- Structure
- Dataflow

Architectures can describe design on many levels

- Gate level
- RTL (Register Transfer Level)
- Behavioral level

Configuration declaration links architecture to entity

E.g.

architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;

Configurations

Links entity declaration and architecture body together

Concept of default configuration is a bit messy in VHDL '87

- Last architecture analyzed links to entity

Can be used to change simulation behavior without re-analyzing the VHDL source

Complex configuration declarations are ignored in synthesis

Some entities can have e.g. a gate level architecture and a behavioral architecture

Are always optional
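As an illustration of the points above, the sketch below gives the Comparator entity two architectures and uses a configuration declaration to select one of them. The second architecture (Comparator2) and the configuration name (Comparator_cfg) are hypothetical names introduced here for the example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end entity Comparator;

-- Dataflow architecture
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end architecture Comparator1;

-- A second, behavioral architecture for the same entity (hypothetical)
architecture Comparator2 of Comparator is
begin
  process (A, B)
  begin
    if A = B then
      EQ <= '1';
    else
      EQ <= '0';
    end if;
  end process;
end architecture Comparator2;

-- Configuration declaration binding the entity to Comparator2
configuration Comparator_cfg of Comparator is
  for Comparator2
  end for;
end configuration Comparator_cfg;
```

Selecting Comparator_cfg as the top-level unit in the simulator makes the choice of architecture explicit instead of relying on the last architecture analyzed.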

Packages

Packages contain information common to many design units

1. Package declaration

- Constant declarations
- Type and subtype declarations
- Function and procedure declarations
- Global signal declarations
- File declarations
- Component declarations

2. Package body

- Is not necessarily needed
- Function bodies
- Procedure bodies

Packages are meant for encapsulating data which can be shared globally among several design units. They consist of a declaration part and an optional body part.

Package declaration can contain

- Type and subtype declarations
- Subprograms
- Constants
- Alias declarations
- Global signal declarations
- File declarations
- Component declarations

Package body consists of

- Subprogram declarations and bodies
- Type and subtype declarations
- Deferred constants
- File declarations
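A small package showing both parts might look as follows. This is only a sketch: the package name (wddl_types), the constant, the subtype and the parity function are hypothetical and are not taken from the project source.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Package declaration: constants, subtypes and subprogram declarations
package wddl_types is
  constant WORD_WIDTH : natural := 32;
  subtype word_t is std_logic_vector(WORD_WIDTH - 1 downto 0);
  function parity (w : word_t) return std_logic;
end package wddl_types;

-- Package body: the bodies of the subprograms declared above
package body wddl_types is
  function parity (w : word_t) return std_logic is
    variable p : std_logic := '0';
  begin
    -- XOR all bits of the word together
    for i in w'range loop
      p := p xor w(i);
    end loop;
    return p;
  end function parity;
end package body wddl_types;
```

A design unit compiled into the same library would make these names visible with "use work.wddl_types.all;". The body is needed here only because the package declares a function.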

Libraries

Collection of VHDL design units (database)

1 Packages

package declaration

package body

2 Entities (entity declaration)

3 Architectures (architecture body)

4 Configurations (configuration declarations)

Usually directory in UNIX file system

Can be also any other kind of database

Levels of Abstraction

VHDL supports many possible styles of design description which differ primarily in how

closely they relate to the HW

It is possible to describe a circuit in a number of ways, listed here in order of increasing level of abstraction:

Structural

Dataflow

Behavioral

Structural VHDL description

Circuit is described in terms of its components

From a low-level description (e.g. transistor-level description) to a high-level description (e.g. block diagram)

For large circuits, low-level descriptions quickly become impractical

Dataflow VHDL Description

Circuit is described in terms of how data moves through the system

In the dataflow style you describe how information flows between registers in the system

The combinational logic is described at a relatively high level; the placement and operation of registers is specified quite precisely

The behavior of the system over time is defined by registers

There are no built-in registers in the VHDL language

- Either a lower-level description
- or a behavioral description of sequential elements is needed

The lower-level register descriptions must be created or obtained

If there are no 3rd-party models for registers => you must write the behavioral description of registers

The behavioral description can be provided in the form of subprograms (functions or procedures)
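Since VHDL has no built-in register, a behavioral description such as the following is commonly written instead. This is a minimal sketch of an 8-bit register; the entity name (reg8), the width and the port names are assumptions made for the example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity reg8 is
  port (clk : in  std_logic;
        d   : in  std_logic_vector(7 downto 0);
        q   : out std_logic_vector(7 downto 0));
end entity reg8;

architecture behavioral of reg8 is
begin
  -- The process wakes up on every event on clk and
  -- stores d only on the rising edge.
  process (clk)
  begin
    if rising_edge(clk) then
      q <= d;
    end if;
  end process;
end architecture behavioral;
```

Synthesis tools generally infer a bank of D flip-flops from this pattern, so the dataflow description of the surrounding logic can treat reg8 as the register between combinational stages.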

Behavioral VHDL Description

Circuit is described in terms of its operation over time

Representation might include e.g. state diagrams, timing diagrams and algorithmic descriptions

The concept of time may be expressed precisely using delays (e.g. A <= B after 10 ns)

If no actual delays are used, the order of sequential operations is defined

In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed timing specifications

The actual timing results depend on the implementation technology and the efficiency of the synthesis tool

There are a few tools for behavioral synthesis

Concurrent Vs Sequential

Processes

Basic simulation concept in VHDL

VHDL description can always be broken up to interconnected processes

Quite similar to UNIX process


Process keyword in VHDL

Process statement is concurrent statement

Statements inside process statements are sequential statements

Process must contain either a sensitivity list or wait statement(s), but NOT both

The sensitivity list or wait statement(s) contain the signals which wake the process up

General Format

process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
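The two alternatives in the general format can be compared side by side. In the sketch below, both processes describe the same AND function, one with a sensitivity list and one with a wait statement; the entity name (and2_proc) and the port names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity and2_proc is
  port (a, b   : in  std_logic;
        y1, y2 : out std_logic);
end entity and2_proc;

architecture behavioral of and2_proc is
begin
  -- Form 1: sensitivity list, no wait statement
  with_list : process (a, b)
  begin
    y1 <= a and b;
  end process with_list;

  -- Form 2: no sensitivity list, wait statement at the end
  with_wait : process
  begin
    y2 <= a and b;
    wait on a, b;
  end process with_wait;
end architecture behavioral;
```

A process with a sensitivity list behaves as if it ended with a "wait on" statement naming the same signals, so y1 and y2 always carry the same value.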


CHAPTER 4 SMART

CARD OVERVIEW

This section very briefly introduces the concept of a smart card. Basically, a smart card is a computer embedded in a safe. It consists of a (typically 8-bit or 32-bit) processor together with ROM, EEPROM and a small amount of RAM, and is therefore capable of performing computations. The main goal of a smart card is to allow the execution of cryptographic operations involving some secret parameter (the key) while not revealing this parameter to the outside world. Conversely, the goal of the attacker is to recover this secret parameter. The processor is embedded in a chip and connected to the outside world through eight wires, the role, use and position of which are standardized. In addition to the input/output wires, the parts we will be most interested in are the following:

1. Power supply: Smart cards do not have an internal battery. The current they need is provided by the smart card reader. This makes the smart card's power consumption fairly easy for the attacker to measure.

2. Clock: Similarly, smart cards do not have an internal clock either. The clock ticks must also be provided from the outside world. As a consequence, the attacker can measure the card's running time with very good precision.

Smart cards are usually equipped with protection mechanisms composed of a shield (the passivation layer), whose goal is to hide the internal behavior of the chip, and possibly sensors that react when the shield is removed by destroying all sensitive data and preventing the card from functioning properly.

CHAPTER 5 SIDE

CHANNEL ATTACKS

"Side channel attacks" are attacks that are based on "side channel information". Side channel information is information that can be retrieved from the encryption device that is neither the plaintext to be encrypted nor the ciphertext resulting from the encryption process.

In the past, an encryption device was perceived as a unit that receives plaintext input and produces ciphertext output, and vice versa. Attacks were therefore based on either knowing the ciphertext (such as ciphertext-only attacks), or knowing both (such as known plaintext attacks), or on the ability to define what plaintext is to be encrypted and then seeing the results of the encryption (known as chosen plaintext attacks). Today it is known that encryption devices have additional outputs, and often additional inputs, which are not the plaintext or the ciphertext.

Encryption devices produce timing information (information about the time that operations take) that is easily measurable, radiation of various sorts, power consumption statistics (that can be easily measured as well), and more. Often the encryption device also has additional "unintentional" inputs, such as voltage, that can be modified to cause predictable outcomes. Side channel attacks make use of some or all of this information, along with other (known) cryptanalytic techniques, to recover the key the device is using.

Side channel analysis techniques are of concern because the attacks can be mounted quickly and can sometimes be implemented using readily available hardware costing from only a few hundred dollars to thousands of dollars.

5.1 Classification of side channel attacks

The literature usually classifies side channel attacks along two orthogonal axes.

1. Invasive vs. non-invasive

Invasive attacks require de-packaging the chip to get direct access to its components. A typical example is connecting a wire to a data bus to observe the data transfers.

A non-invasive attack exploits only externally available information (the emission of which is, however, often unintentional), such as running time or power consumption.

A more recent distinction is the semi-invasive attack. These attacks require de-packaging the chip to get access to its surface, but they do not tamper with the passivation layer (they do not require electrical contact with the metal surface).

2. Active vs. passive

Active attacks try to tamper with the card's proper functioning. For example, fault induction attacks try to induce errors in the computation.

In contrast, passive attacks simply observe the card's behavior during its processing, without disturbing it.

Note that these two axes are indeed orthogonal: an invasive attack may completely avoid disturbing the card's behavior, and a passive attack may require a preliminary de-packaging for the required information to be observable.

These attacks are, of course, not mutually exclusive: an invasive attack may, for example, serve as a preliminary step for a non-invasive one, by giving a detailed description of the chip's architecture that helps to find out where to put external probes.

Smart cards are usually equipped with protection mechanisms that are supposed to react to invasive attacks (although several invasive attacks are nonetheless capable of defeating these mechanisms, as will be illustrated below). On the other hand, it is worth pointing out that a non-invasive attack is completely undetectable: there is, for example, no way for a smart card to figure out that its running time is currently being measured. Other countermeasures are therefore necessary. From an economic point of view, invasive attacks are usually more expensive to deploy on a large scale, since they require individual processing of each attacked device. In this sense, non-invasive attacks constitute a bigger menace for the smart card industry.

Invasive attacks involve a relatively high capital investment for lab equipment, plus a moderate investment of effort for each individual chip attacked. Non-invasive attacks require only a moderate capital investment, plus a moderate investment of effort in designing an attack on a particular type of device; thereafter, the cost per device attacked is low. Semi-invasive attacks can be carried out using very cheap and simple equipment.

The attacker can gain information by:

1. Probing attacks
2. Fault induction attacks
3. Timing attacks
4. Power analysis attacks, and
5. Electromagnetic timing attacks

These attacks exploit the switching behavior of digital complementary metal-oxide-semiconductor (CMOS) gates. Of all these, the power analysis attack is of major concern.

5.2 Power analysis attacks

The power consumption of a cryptographic device may provide much information about the operations that take place and the parameters involved. This is the idea behind simple and differential power analysis, first introduced by Kocher et al. Like the clock signal, the card's energy is provided by the terminal and can therefore easily be measured. Basically, to measure a circuit's power consumption, a small (e.g., 50 ohm) resistor is inserted in series with the power or ground input. The voltage difference across the resistor divided by the resistance yields the current. Well-equipped electronics labs have equipment that can digitally sample voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than 1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC can be bought for less than US$ 400.
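The measurement described above amounts to Ohm's law applied to each voltage sample. The following is a minimal Python sketch of that conversion; the function names, the 5 V supply value and the sample values are assumptions made for illustration and do not come from the report.

```python
def supply_current(v_shunt, r_shunt=50.0):
    """Current (A) through the series resistor, from the voltage drop across it."""
    if r_shunt <= 0:
        raise ValueError("shunt resistance must be positive")
    return v_shunt / r_shunt


def power_trace(v_samples, r_shunt=50.0, v_supply=5.0):
    """Convert sampled shunt voltages (V) into approximate supply power (W).

    v_supply is an assumed nominal supply voltage; the small drop across
    the shunt itself is neglected.
    """
    return [v_supply * supply_current(v, r_shunt) for v in v_samples]


if __name__ == "__main__":
    samples = [0.10, 0.25, 0.05]  # hypothetical voltage drops in volts
    for v, p in zip(samples, power_trace(samples)):
        print(f"{v:.2f} V across shunt -> {p * 1000:.1f} mW")
```

A larger resistor gives a larger voltage signal but also disturbs the supply more, which is one reason a small value such as the 50 ohm mentioned above is used.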

Power analysis attacks are of two types:

1. Simple Power Analysis (SPA) attack, and
2. Differential Power Analysis (DPA) attack

SPA attacks on smart cards typically take a few seconds per card, while DPA attacks can take several hours. In general, from a somewhat academic perspective, we may consider the entire internal state of the block cipher to be all the intermediate results and values that are never included in the output in normal operation. For example, DES has 16 rounds; we can consider the intermediate states state[1..15] after each round except the last as a secret internal state. Side channels typically give information about these internal states, or about the operations used in the transition of this internal state from one round to another. The type of side channel will, of course, determine what information is available to the attacker about these states. The attacks typically work by finding some information about the internal state of the cipher that can be learned both by guessing part of the key and checking the value directly, and additionally by some statistical property of the cipher that makes that checkable value slightly nonrandom.

5.2.1 Simple Power Analysis attack (SPA)

Simple Power Analysis is generally based on looking at a visual representation of the power consumption of a unit while an encryption operation is being performed. It is a technique that involves direct interpretation of power consumption measurements collected during cryptographic operations. SPA can yield information about a device's operation as well as key material.

A trace refers to a set of power consumption measurements taken across a cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a trace containing 5000 points. The figure, for example, shows an SPA trace from a smart card performing a DES operation.

Figure: SPA monitoring from a single DES operation performed by a typical smart card. The upper trace shows the entire encryption operation, including the initial permutation, the 16 DES rounds, and the final permutation. The lower trace is a detailed view of the second and third rounds.
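The trace length quoted above is simply the operation's duration multiplied by the sampling rate. A small Python sketch of that arithmetic follows; the helper names are hypothetical.

```python
def trace_length(duration_s, sample_rate_hz):
    """Number of power samples captured over one operation."""
    return round(duration_s * sample_rate_hz)


def sample_times(duration_s, sample_rate_hz):
    """Time stamp in seconds of each sample in the trace."""
    n = trace_length(duration_s, sample_rate_hz)
    return [i / sample_rate_hz for i in range(n)]


if __name__ == "__main__":
    # The example from the text: a 1 ms operation sampled at 5 MHz.
    print(trace_length(1e-3, 5e6))
```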

Because SPA can reveal the sequence of instructions executed, it can be used to break cryptographic implementations in which the execution path depends on the data being processed. For example:

DES key schedule: The DES key schedule computation involves rotating 28-bit key registers. A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will contain different SPA features if the execution paths take different branches for each.

DES permutations: DES implementations perform a variety of bit permutations. Conditional branching in software or microcode can cause significant power consumption differences for "0" and "1" bits.

Comparisons: String or memory comparison operations typically perform a conditional branch when a mismatch is found. This conditional branching causes large SPA (and sometimes timing) characteristics.

Multipliers: Modular multiplication circuits tend to leak a great deal of information about the data they process. The leakage functions depend on the multiplier design, but are often strongly correlated to operand values and Hamming weights.

Exponentiators: A simple modular exponentiation function scans across the exponent, performing a squaring operation in every iteration, with an additional multiplication operation for each exponent bit that is equal to "1". The exponent can be compromised if squaring and multiplication operations have different power consumption characteristics, take different amounts of time, or are separated by different code. Modular exponentiation functions that operate on two or more exponent bits at a time may have more complex leakage functions.
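The exponentiator case can be illustrated with a short Python sketch. It assumes an idealized attacker who can tell every squaring ("S") from every multiplication ("M") in the trace; the function names and the S/M encoding are illustrative and not taken from the report.

```python
def modexp_with_ops(base, exponent, modulus):
    """Left-to-right square-and-multiply that also records the operation sequence."""
    result = 1
    ops = []
    for bit in bin(exponent)[2:]:          # scan exponent bits, most significant first
        result = (result * result) % modulus
        ops.append("S")                    # a squaring happens for every bit
        if bit == "1":
            result = (result * base) % modulus
            ops.append("M")                # an extra multiplication only for 1 bits
    return result, ops


def exponent_from_ops(ops):
    """Rebuild the exponent from an observed S/M sequence."""
    bits = []
    for op in ops:
        if op == "S":
            bits.append("0")               # assume a 0 bit until an M follows
        else:
            bits[-1] = "1"                 # the M marks the preceding bit as 1
    return int("".join(bits), 2)


if __name__ == "__main__":
    value, ops = modexp_with_ops(7, 0b101101, 1009)
    print("".join(ops), bin(exponent_from_ops(ops)))
```

The sketch shows why the leak is total in this idealized setting: the operation sequence is a direct encoding of the exponent bits, so no statistics are needed.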

5.2.2 Differential Power Analysis attack (DPA)

In addition to large-scale power variations due to the instruction sequence, there are effects correlated to the data values being manipulated. These variations tend to be smaller and are sometimes overshadowed by measurement errors and other noise. In such cases, it is still often possible to break the system using statistical functions tailored to the target algorithm.

To implement the DPA attack, an attacker first observes $m$ encryption operations and captures power traces $T_{1 \ldots m}[1 \ldots k]$ containing $k$ samples each. In addition, the attacker records the ciphertexts $C_{1 \ldots m}$. No knowledge of the plaintext is required. DPA uses the power consumption measurements to determine whether a key block guess $K_s$ is correct. The attacker computes a $k$-sample differential trace $\Delta_D[1 \ldots k]$ by finding the difference between the average of the traces for which a certain intermediate value $V$, predicted by the selection function $D(C_i, b, K_s)$, is one and the average of the traces for which $V$ is zero. Thus $\Delta_D[j]$ is the average over $C_{1 \ldots m}$ of the effect due to the value represented by the selection function $D$ on the power consumption at point $j$. In particular:

\[
\Delta_D[j] = \frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}{\sum_{i=1}^{m} D(C_i, b, K_s)} - \frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)}
\]

If $K_s$ is incorrect, the bit computed using $D$ will differ from the actual target bit for about half of the ciphertexts $C_i$. The selection function is thus effectively uncorrelated to what was actually computed by the target device. If a random function is used to divide a set into two subsets, the difference in the averages of the subsets should approach zero as the subset sizes approach infinity.

Thus, trace components uncorrelated to $D$ will diminish with $1/\sqrt{m}$, causing the differential trace to become flat (the actual trace may not be completely flat, as $D$ with $K_s$ incorrect may have a weak correlation to $D$ with the correct $K_s$). If $K_s$ is correct, however, the computed value for $D(C_i, b, K_s)$ will equal the actual value of target bit $b$ with probability 1. The selection function is thus correlated to the value of the bit considered. Other data values, measurement errors, etc. that are not correlated to $D$ approach zero. Because power consumption is correlated to data bit values, the plot of $\Delta_D$ will be flat, with spikes in regions where $D$ is correlated to the values being processed. The correct value of $K_s$ can thus be identified from the spikes in its differential trace. Four values of $b$ correspond to each S-box, providing confirmation of key block guesses. Finding all eight $K_s$ yields the entire 48-bit round subkey. The remaining 8 key bits can be found easily using exhaustive search or by analyzing one additional round. Triple DES keys can be found by analyzing an outer DES operation first, using the resulting key to decrypt the ciphertexts, and attacking the next DES key. DPA can use known plaintext or known ciphertext and can find encryption or decryption keys.
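The difference-of-means procedure described above can be sketched in Python. This is a toy simulation under stated assumptions, not the attack on DES itself: the 4-bit `SBOX`, the leakage model (one sample point that depends on a single intermediate bit, plus Gaussian noise), the known data nibble standing in for the recorded ciphertext, and all function names are illustrative choices that do not come from the report.

```python
import random

# Toy 4-bit S-box standing in for a cipher's nonlinear step (illustrative only).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
TARGET_BIT = 1  # the S-box output bit that the selection function predicts


def selection(x, key_guess):
    """Selection function D: predicted target bit for known data x and a key guess."""
    return (SBOX[x ^ key_guess] >> TARGET_BIT) & 1


def simulate_traces(secret_key, m, k=5, leak_at=2, noise=1.0, seed=1):
    """Simulate m traces of k samples; only sample leak_at depends on the target bit."""
    rng = random.Random(seed)
    data, traces = [], []
    for _ in range(m):
        x = rng.randrange(16)
        trace = [rng.gauss(0.0, noise) for _ in range(k)]
        trace[leak_at] += selection(x, secret_key)  # assumed leakage model
        data.append(x)
        traces.append(trace)
    return data, traces


def differential_trace(data, traces, key_guess):
    """Mean of traces with D = 1 minus mean of traces with D = 0, per sample point."""
    k = len(traces[0])
    ones = [t for x, t in zip(data, traces) if selection(x, key_guess) == 1]
    zeros = [t for x, t in zip(data, traces) if selection(x, key_guess) == 0]
    return [sum(t[j] for t in ones) / len(ones)
            - sum(t[j] for t in zeros) / len(zeros)
            for j in range(k)]


def recover_key(data, traces):
    """Return the key guess whose differential trace shows the largest spike."""
    def peak(guess):
        return max(abs(d) for d in differential_trace(data, traces, guess))
    return max(range(16), key=peak)


if __name__ == "__main__":
    data, traces = simulate_traces(secret_key=0xB, m=2000)
    print(hex(recover_key(data, traces)))
```

In this model the differential trace for the correct guess has a spike of about one unit at the leaking sample and stays near zero elsewhere, while wrong guesses produce smaller spikes. Real devices leak less cleanly, which is why many more traces are usually needed in practice.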

CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES

The power consumption of traditional standard cells and logic is dependent on the signal activity. When the output of a logic gate makes a 0-to-1 transition, a current is drawn from the power supply to charge the output capacitance. On the other hand, when the output makes a 1-to-0, a 0-to-0 or a 1-to-1 transition, no energy or only a limited amount of energy (due to short-circuit or leakage currents) is consumed from the power supply. This is the fundamental reason why information is leaked through the power supply and why power attacks are possible. The basis of a secure digital design flow is therefore a logic style with constant power consumption.
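The data dependence described above can be made concrete with a first-order Python model. It assumes the textbook approximation that a 0-to-1 output transition draws an energy of C times Vdd squared from the supply and that every other transition draws nothing (short-circuit and leakage currents are ignored); the capacitance and voltage defaults are arbitrary illustration values.

```python
def supply_energy(prev_out, next_out, c_load=10e-15, v_dd=1.8):
    """Energy (J) drawn from the supply for one output transition of a static CMOS gate."""
    if (prev_out, next_out) == (0, 1):
        return c_load * v_dd ** 2   # only a rising output charges the load capacitance
    return 0.0                      # 1->0, 0->0 and 1->1 draw (almost) nothing


def energy_trace(outputs, **kwargs):
    """Supply energy per clock cycle for a sequence of output values."""
    return [supply_energy(a, b, **kwargs) for a, b in zip(outputs, outputs[1:])]


if __name__ == "__main__":
    # The cycles with a 0 -> 1 transition stand out in the supply energy.
    print(energy_trace([0, 1, 1, 0, 0, 1]))
```

An observer of this energy sequence learns exactly when the output rises, which is the leak that a constant-power logic style is meant to remove.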

6.1 Current Mode Logic

Current mode logic (CML), e.g., current steering logic, seems the ideal solution. This type of logic continuously draws a current from the supply and represents its state through the path that the current takes. A gate has constant power consumption if it draws a perfectly constant current from the power supply, independently of the input and output signals. To build a current source capable of generating a constant current, special circuit techniques that minimize channel length modulation have to be used.

The decisive drawback of CML, however, is its static power consumption. Even when the logic gate is not processing any data, it burns the current, which makes this logic style unacceptable for embedded battery-operated devices.

6.2 Voltage Mode Logic (CMOS circuit styles)

Voltage mode logic (VML), e.g., static CMOS logic, only draws a current from the supply to change state, and represents its state by the amount of charge it stores on a capacitance. A regular standard CMOS circuit only consumes power when a capacitance gets charged and later discharged, i.e., when a gate switches state. This is the main reason that CMOS is the style of choice for every battery-operated or low-power device. This is illustrated in the figure below for a simple inverter. Thus static CMOS is the preferred logic style because of its low power consumption and high noise margins.

Standard CMOS inverter

Yet two conditions must be satisfied for VML to have constant power consumption, namely:

1) A logic gate must have exactly one switching event per signal transition.
2) The logic gate must charge a constant capacitance in that switching event.

For the inverter shown above, all four transitions of the CMOS inverter can be distinguished when monitoring the power supply.

6.3 Dynamic Differential Logic

Dynamic differential logic, sometimes also referred to as dual-rail with pre-charge logic, fulfills the first condition. A differential logic family uses both the true and the false representation of the input and output signals, and a dynamic logic family alternates pre-charge and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1, exactly one of the two output nodes evaluates to 0 in the evaluation phase to produce a differential output signal. The discharged output node is charged to 1 again in the following pre-charge phase, so that both outputs are pre-charged to 1. In other words, every signal transition, including the events in which the input signals remain constant, is represented by an actual switching event in which the logic gate charges a capacitance. The logic families that have been introduced to thwart differential power analysis (DPA) by using dynamic differential logic include the following:

1. Sense Amplifier Based Logic (SABL), and
2. Wave Dynamic Differential Logic (WDDL) gates

6.3.1 Sense Amplifier Based Logic (SABL)

The main advantage of SABL is that it has balanced input and output nodes and that all internal nodes connect to an output. The output capacitances can be balanced. Systematic methods have been developed to make sure that both branches of the differential pull-down network are balanced and that no memory effects are present in the network. Sense Amplifier Based Logic is illustrated in the following figure.

Figure: Sense Amplifier Based Logic AND/NAND gate

This circuit style does, however, require full-custom characterization and layout. It also suffers from a high clock load, common to all dynamic logic gates.

6.3.2 Wave Dynamic Differential Logic (WDDL) gates

WDDL can be implemented with static CMOS logic. Static CMOS standard cells are combined to form secure compound standard cells, which have a reduced power signature. WDDL has many advantages. It can be readily implemented from an existing standard cell library. The design flow is fully supported with accurate EDA library files that come directly from the vendor. WDDL also results in a dynamic differential logic with only a small load capacitance on the pre-charge control signal, and with the low power consumption and the high noise margins of static CMOS.

Advantages of the WDDL logic style are as follows:

- A major advantage of the proposed logic style is that it can be incorporated into the common Electronic Design Automation (EDA) tool flow.
- No special design rules are involved in the interconnection of WDDL gates.

The switching factor of WDDL is 100%. A WDDL gate consists of a parallel combination of two positive complementary gates, one calculating the true output using the true inputs, the other the false output using the false inputs. A positive gate produces a zero output for an all-zero input. The AND gate and the OR gate are examples of positive gates. A complementary gate, sometimes also referred to as a dual gate, expresses the false output of the original logic gate using the false inputs of the original gate. The AND gate fed with true input signals and the OR gate fed with false input signals are two dual gates. The figure shows the WDDL AND gate and the WDDL OR gate.

In the evaluation phase, each input signal is differential and the WDDL gate calculates its differential output. In the pre-charge phase, the inputs to the WDDL gate are set to 0, which puts the outputs of the gate at 0. A module in WDDL pre-charges without distributing the pre-charge signal to each individual gate. During the pre-charge phase, the input vector of the combinatorial logic is set to all 0s. Each individual gate will eventually have all its inputs at 0, evaluate its output to 0, and pass this 0 value on to the next gate. One could say that the pre-charge signal travels through the combinatorial logic as a 0-wave, hence the name WDDL. There are several ways to launch the pre-charge wave. In the figure, a pre-charge operator is inserted at the start of every combinatorial logic tree, i.e., at the inputs of the encryption module and at the outputs of the registers. These operators produce an all-zero output in the pre-charge phase (clk signal high) but let the differential signal through during the evaluation phase (clk signal low).
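The gate behavior described above can be checked with a small behavioral model. The sketch below is written in Python rather than the VHDL used in the project, models each signal as a (true, false) rail pair, and uses hypothetical function names; it illustrates the principle and is not the project's implementation.

```python
def wddl_and(a, b):
    """WDDL AND: AND gate on the true rails, its dual (OR) on the false rails."""
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a, b):
    """WDDL OR: OR gate on the true rails, its dual (AND) on the false rails."""
    return (a[0] | b[0], a[1] & b[1])


def precharge(value, clk):
    """Pre-charge operator: (0, 0) while clk is high, the differential pair while clk is low."""
    return (0, 0) if clk else (value, 1 - value)


def rising_rails_per_cycle(gate, a, b):
    """Count output rails that go 0 -> 1 between the pre-charge and evaluation phases."""
    pre = gate(precharge(a, 1), precharge(b, 1))
    ev = gate(precharge(a, 0), precharge(b, 0))
    return sum(1 for p, e in zip(pre, ev) if p == 0 and e == 1)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b,
                  wddl_and(precharge(a, 0), precharge(b, 0)),
                  rising_rails_per_cycle(wddl_and, a, b))
```

For every input combination exactly one output rail rises per cycle, which is the 100% switching factor mentioned in the text. Because both gates are positive, the all-zero pre-charge input propagates to an all-zero output without any extra control signal.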

Figure: WDDL pre-charge wave generation

CHAPTER 7 WDDL GATES

The methodology used in the project is a bottom-up approach. Lower-level modules are designed and later integrated to form larger modules, whose further integration leads to the final top module. Since logic gates form the lowest-level modules, the logic gates required for the design are first implemented in WDDL style. WDDL demands a parallel combination of two positive complementary gates, one calculating the true value and the other the false value. The logic gates OR, AND and XOR have been implemented. In addition, a full adder, a 32-bit XOR, etc. have been implemented.

7.1 WDDL OR gate

A WDDL OR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e., an AND gate, as shown in the following figure.

Figure 4.1 WDDL OR Gate

The NAND and NOT gates used act as the pre-charge operator, injecting the signal '0' into the gates when the 'clk' signal is high, i.e., during the pre-charge phase.

7.2 WDDL AND gate

A WDDL AND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e., an OR gate, as shown in the following figure.

Figure 4.2 WDDL AND Gate

7.3 WDDL NAND gate

A WDDL NAND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e., an OR gate, as shown in the following figure.

Figure 4.3 WDDL NAND Gate

7.4 WDDL NOR gate

A WDDL NOR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e., an AND gate, as shown in the following figure.

Figure 4.4 WDDL NOR Gate

7.5 WDDL XOR gate

The XOR function can be implemented by the Boolean equation A ⊕ B = AB' + A'B. Therefore, the XOR gate is implemented in terms of AND gates and OR gates, as shown in Figure 4.5. It can also be implemented by instantiating a WDDL AND gate and a WDDL OR gate, but the number of gates involved in the latter approach is greater than in the former. Therefore, the first method of implementation is followed rather than the second.
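One way to read the Boolean equation in dual-rail form is sketched below in Python. It assumes that the complemented inputs A' and B' are taken directly from the false rails, so that only AND and OR gates are needed; the report's actual gate-level netlist is not given in the text, so this is an illustration of the idea rather than a description of it.

```python
def wddl_xor(a, b):
    """Dual-rail XOR on (true, false) rail pairs, built only from AND and OR."""
    a_t, a_f = a
    b_t, b_f = b
    out_t = (a_t & b_f) | (a_f & b_t)   # true rail:  A.B' + A'.B
    out_f = (a_t & b_t) | (a_f & b_f)   # false rail: A.B + A'.B' (the complement)
    return (out_t, out_f)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, wddl_xor((a, 1 - a), (b, 1 - b)))
    print("pre-charge:", wddl_xor((0, 0), (0, 0)))
```

Because no inverter appears in either rail, an all-zero (pre-charged) input still produces an all-zero output, so the 0-wave passes through the XOR gate as it does through the AND and OR gates.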

Figure 4.5 WDDL XOR gate

With the help of the above basic gates, a full adder circuit has been designed by instantiating the WDDL gates designed above. During the implementation of the Blowfish algorithm, a 32-bit XOR gate and a 32-bit adder circuit are required. They can be easily implemented by instantiating the corresponding lower-level module 32 times.

CHAPTER 8 FRONT END RESULTS

WDDL OR GATE

Synthesis Report

===========================================================
Final Report
===========================================================
Final Results
RTL Top Level Output File Name : wddlor.ngr
Top Level Output File Name     : wddlor
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO

Design Statistics
IOs                            : 5

Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2
===========================================================
Device utilization summary
---------------------------
Selected Device                : 3s250eft256-4
Number of Slices               : 1 out of 2448 (0%)
Number of 4 input LUTs         : 2 out of 4896 (0%)
Number of IOs                  : 5
Number of bonded IOBs          : 5 out of 172 (2%)

Timing Summary
---------------
Speed Grade                    : -4
Maximum combinational path delay : 6.236 ns

Simulation Result

Synthesis Result

WDDL AND GATE

Synthesis Report

===========================================================
Final Report
===========================================================
Final Results
RTL Top Level Output File Name : wddlgates.ngr
Top Level Output File Name     : wddlgates
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO

Design Statistics
IOs                            : 5

Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2
===========================================================
Device utilization summary
---------------------------
Selected Device                : 3s250etq144-4
Number of Slices               : 1 out of 2448 (0%)
Number of 4 input LUTs         : 2 out of 4896 (0%)
Number of IOs                  : 5
Number of bonded IOBs          : 5 out of 108 (4%)

Timing Summary
---------------
Speed Grade                    : -4
Maximum combinational path delay : 6.236 ns

Simulation Result

Synthesis Result

WDDL NAND GATE

Synthesis Report

===========================================================
Final Report
===========================================================
Final Results
RTL Top Level Output File Name : wddlnand1.ngr
Top Level Output File Name     : wddlnand1
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO

Design Statistics
IOs                            : 5

Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2
===========================================================
Device utilization summary
Selected Device                : 3s500efg320-4
Number of Slices               : 1 out of 4656 (0%)
Number of 4 input LUTs         : 2 out of 9312 (0%)
Number of IOs                  : 5
Number of bonded IOBs          : 5 out of 232 (2%)

Timing Summary
Speed Grade                    : -4
Maximum combinational path delay : 6.236 ns

Simulation Result

Synthesis Result

WDDL XOR GATE

Simulation Result

Synthesis Result

Synthesis Report

===========================================================
Final Report
===========================================================
Final Results
RTL Top Level Output File Name : wddlxorgate.ngr
Top Level Output File Name     : wddlxorgate
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO

Design Statistics
IOs                            : 5

Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2
===========================================================
Device utilization summary
---------------------------
Selected Device                : 3s250eft256-4
Number of Slices               : 1 out of 2448 (0%)
Number of 4 input LUTs         : 2 out of 4896 (0%)
Number of IOs                  : 5
Number of bonded IOBs          : 5 out of 172 (2%)

Timing Summary
---------------
Speed Grade                    : -4
Maximum combinational path delay : 6.236 ns

CHAPTER 9 SUMMARY AND CONCLUSION

9.1 Summary

In order to secure ICs against side-channel attacks, especially Differential Power Analysis (DPA), it is necessary to implement the design in a logic style that yields constant power dissipation irrespective of the input combination. WDDL has proved to be advantageous over other styles and is therefore of great significance. In this work, an architecture for the Blowfish algorithm is designed and implemented in WDDL style. A bottom-up approach is used in this implementation: the low-level entities are designed first and are later combined to form the entire module. The key scheduling is online. The sub-keys generated for a particular key can be used for the encryption of all the data to be encrypted with that key. The sub-keys are applied in reverse order for the decryption data path. Initially, logic gates are implemented in WDDL, and then higher-level modules are designed by instantiating the WDDL gates to form the entire module, resulting in constant power dissipation irrespective of the input data combination. The entire design works in two phases, namely the pre-charge phase and the evaluation phase. In the pre-charge phase, all the signals of the design are set to zero, and during the evaluation phase the functionality of the design is achieved. This sort of design has been found simple and very effective in thwarting the side-channel attack known as Differential Power Analysis (DPA).

9.2 Conclusion

The crypto processor has been designed for a key size of 448 bits and a plaintext block of 64 bits. The code for the implementation has been written in VHDL. Functional verification has been done using the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The back end of the design is done using SoC Encounter. The desired functionality has been achieved according to the specifications. At the output, during the evaluation phase, there is always the same number of transitions, resulting in constant power dissipation. During synthesis, it has been observed that a simple WDDL gate comprises many conventional gates. Therefore, the area of the design has grown nearly three-fold compared to the design implemented in conventional CMOS logic; this is the cost of the security incorporated into the IC against DPA. Due to the constant power dissipation at the output, an attacker cannot apply the DPA scheme to find the secret key that is being used in the crypto processor. Thus, security against DPA is incorporated into the IC at the hardware level by implementing the design in WDDL style, which is quite simple and effective.

CHAPTER 10 REFERENCES

10.1 Referred Technical papers

[1] Kris Tiri and Ingrid Verbauwhede, "A Digital Design Flow for Secure Integrated Circuits", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 25, No. 7, July 2006.
[2] Prof. Jean-Jacques Quisquater, Math RiZK, "Side Channel Attacks", October 2002.
[3] Ross Anderson, Mike Bond, Jolyon Clulow and Sergei Skorobogatov, "Crypto processors - A Survey", IEEE Proceedings.
[4] Noohul Basheer Zain Ali, James M. Noras, "Optimal Data path Design for a Cryptographic Processor: The Blowfish Algorithm", Malaysian Journal of Computer Science, Vol. 14, No. 1, June 2001, pp. 16-27.
[5] Dan Rinehimer, Derek Wilson, "Summary of B. Schneier's Description of New Variable Length Key, 64-Bit Block Cipher (Blowfish)".
[6] Kris Tiri and Ingrid Verbauwhede, "A Dynamic and Differential CMOS Logic Style to Resist Power and Timing Attacks on Security IC's".
[7] Discretix Technologies Limited, "Introduction to Side Channel Attacks".
[8] Kris Tiri, Moonmoon Akmal and Ingrid Verbauwhede, "A Dynamic and Differential Logic with Signal Independent Power Consumption to withstand Differential Power Analysis on Smart Cards".

Reference books

[1] William Stallings, "Cryptography and Network Security: Principles and Practices", Pearson Education, 2003.
[2] Samir Palnitkar, "Verilog HDL: A Guide to Digital Design and Synthesis", Prentice Hall.

Referred Web Pages

[1] http://www.schneier.com/blowfish.html
[2] http://www.nist.gov
[3] http://www.discretix.com/PDF/Introduction%20to%20Side%20Channel%20Attacks.pdf
[4] http://www.wipo.int/pctdb/en/wo.jsp?IA=WO2005081085&DISPLAY=CLAIMS


Ports

Provide channels of communication between the component and its environment.
Each port must have a name, a direction and a type.
An entity may have NO port declaration.

Port directions

In: The value of a port can be read inside the component but cannot be assigned. Multiple reads of the port are allowed.
Out: Assignments can be made to a port, but data from the port cannot be read. Multiple assignments are allowed.
Inout: Bi-directional. Assignments can be made and data can be read. Multiple assignments are allowed.
Buffer: An out port with read capability. May have at most one assignment (not recommended).

Architectures

Every entity has at least one architecture.
One entity can have several architectures.
Architectures can describe a design using:
- Behavior
- Structure
- Dataflow
Architectures can describe a design on many levels:
- Gate level
- RTL (Register Transfer Level)
- Behavioral level
A configuration declaration links an architecture to an entity.

E.g.:

architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;

Configurations

Link the entity declaration and the architecture body together.
The concept of a default configuration is a bit messy in VHDL '87:
- The last architecture analyzed links to the entity.
Can be used to change simulation behavior without re-analyzing the VHDL source.
Complex configuration declarations are ignored in synthesis.
Some entities can have, e.g., a gate-level architecture and a behavioral architecture.
Are always optional.

Packages

Packages contain information common to many design units.

1. Package declaration
- Constant declarations
- Type and subtype declarations
- Function and procedure declarations
- Global signal declarations
- File declarations
- Component declarations

2. Package body
- Is not necessarily needed
- Function bodies
- Procedure bodies

Packages are meant for encapsulating data that can be shared globally among several design units. They consist of a declaration part and an optional body part.

A package declaration can contain:
- Type and subtype declarations
- Subprograms
- Constants
- Alias declarations
- Global signal declarations
- File declarations
- Component declarations

A package body consists of:
- Subprogram declarations and bodies
- Type and subtype declarations
- Deferred constants
- File declarations

Libraries

A collection of VHDL design units (database):

1. Packages
- package declaration
- package body
2. Entities (entity declaration)
3. Architectures (architecture body)
4. Configurations (configuration declarations)

Usually a directory in a UNIX file system.
Can also be any other kind of database.

Levels of Abstraction

VHDL supports many possible styles of design description, which differ primarily in how closely they relate to the hardware.
It is possible to describe a circuit in a number of ways:

Structural -> Dataflow -> Behavioral (increasing level of abstraction)

Structural VHDL description

The circuit is described in terms of its components.
The description can range from low level (e.g., a transistor-level description) to high level (e.g., a block diagram).
For large circuits, low-level descriptions quickly become impractical.

Dataflow VHDL Description

The circuit is described in terms of how data moves through the system.
In the dataflow style, you describe how information flows between registers in the system.
The combinational logic is described at a relatively high level; the placement and operation of registers is specified quite precisely.
The behavior of the system over time is defined by registers.
There are no built-in registers in the VHDL language:
- Either a lower-level description
- or a behavioral description of sequential elements is needed.
The lower-level register descriptions must be created or obtained.
If there are no third-party models for registers, you must write the behavioral description of the registers.
The behavioral description can be provided in the form of subprograms (functions or procedures).

Behavioral VHDL Description

The circuit is described in terms of its operation over time.
The representation might include, e.g., state diagrams, timing diagrams and algorithmic descriptions.
The concept of time may be expressed precisely using delays (e.g., A <= B after 10 ns).
If no actual delays are used, the order of sequential operations is defined.
In the lower levels of abstraction (e.g., RTL), synthesis tools ignore detailed timing specifications.
The actual timing results depend on the implementation technology and the efficiency of the synthesis tool.
There are a few tools for behavioral synthesis.

Concurrent vs. Sequential

Processes

The process is the basic simulation concept in VHDL.

A VHDL description can always be broken up into interconnected processes.

A process is quite similar to a UNIX process.

Process keyword in VHDL

A process statement is a concurrent statement.

Statements inside a process statement are sequential statements.

A process must contain either a sensitivity list or wait statement(s), but NOT both.

The sensitivity list or wait statement(s) name the signals that wake the process up.

General Format

process [(sensitivity_list)]
    process_declarative_part
begin
    process_statements
    [wait_statement]
end process;

CHAPTER 4 SMART CARD OVERVIEW

This section very briefly introduces the concept of a smart card. Basically, a smart card is a computer embedded in a safe. It consists of a (typically 8-bit or 32-bit) processor together with ROM, EEPROM, and a small amount of RAM, and is therefore capable of performing computations. The main goal of a smart card is to allow the execution of cryptographic operations involving some secret parameter (the key) while not revealing this parameter to the outside world. The goal of the attacker, by contrast, is to recover this secret parameter. The processor is embedded in a chip and connected to the outside world through eight wires whose role, use, and position are standardized. In addition to the input/output wires, the parts we are most interested in are the following:

1. Power supply: smart cards do not have an internal battery. The current they need is provided by the smart card reader. This makes the smart card's power consumption fairly easy for the attacker to measure.

2. Clock: similarly, smart cards do not have an internal clock. The clock ticks must also be provided from the outside world. As a consequence, the attacker can measure the card's running time with very good precision.

Smart cards are usually equipped with protection mechanisms composed of a shield (the passivation layer), whose goal is to hide the internal behavior of the chip, and possibly sensors that react when the shield is removed by destroying all sensitive data and preventing the card from functioning properly.

CHAPTER 5 SIDE CHANNEL ATTACKS

"Side channel attacks" are attacks that are based on "side channel information". Side channel information is information that can be retrieved from the encryption device that is neither the plaintext to be encrypted nor the ciphertext resulting from the encryption process.

In the past, an encryption device was perceived as a unit that receives plaintext input and produces ciphertext output, and vice versa. Attacks were therefore based on knowing the ciphertext only (ciphertext-only attacks), on knowing both plaintext and ciphertext (known-plaintext attacks), or on the ability to choose the plaintext to be encrypted and then see the result of the encryption (chosen-plaintext attacks). Today it is known that encryption devices have additional outputs, and often additional inputs, which are neither the plaintext nor the ciphertext.

Encryption devices produce timing information (information about the time that operations take) that is easily measurable, radiation of various sorts, power consumption statistics (which can be easily measured as well), and more. Often the encryption device also has additional "unintentional" inputs, such as voltage, that can be modified to cause predictable outcomes. Side channel attacks make use of some or all of this information, along with other (known) cryptanalytic techniques, to recover the key the device is using.

Side channel analysis techniques are of concern because the attacks can be mounted quickly and can sometimes be implemented using readily available hardware costing from only a few hundred dollars to thousands of dollars.

5.1 Classification of side channel attacks

The literature usually classifies side channel attacks along two orthogonal axes.

1. Invasive vs. non-invasive

Invasive attacks require de-packaging the chip to get direct access to its components. A typical example is connecting a wire to a data bus to observe the data transfers.

A non-invasive attack only exploits externally available information (the emission of which is, however, often unintentional), such as running time or power consumption.

A more recent distinction adds semi-invasive attacks. These attacks require de-packaging the chip to get access to its surface, but they do not tamper with the passivation layer (they do not require electrical contact with the metal surface).

2. Active vs. passive

Active attacks try to tamper with the card's proper functioning. For example, fault induction attacks try to induce errors in the computation.

Passive attacks, by contrast, simply observe the card's behavior during its processing without disturbing it.

Note that these two axes are indeed orthogonal: an invasive attack may completely avoid disturbing the card's behavior, and a passive attack may require a preliminary de-packaging for the required information to be observable.

These attacks are of course not mutually exclusive. An invasive attack may, for example, serve as a preliminary step for a non-invasive one by giving a detailed description of the chip's architecture that helps to find out where to put external probes.

Smart cards are usually equipped with protection mechanisms that are supposed to react to invasive attacks (although several invasive attacks are nonetheless capable of defeating these mechanisms). On the other hand, it is worth pointing out that a non-invasive attack is completely undetectable: there is, for example, no way for a smart card to figure out that its running time is currently being measured. Other countermeasures are therefore necessary. From an economic point of view, invasive attacks are usually more expensive to deploy on a large scale, since they require individual processing of each attacked device. In this sense, non-invasive attacks constitute a bigger menace for the smart card industry.

Invasive attacks involve a relatively high capital investment for lab equipment plus a moderate investment of effort for each individual chip attacked. Non-invasive attacks require only a moderate capital investment plus a moderate investment of effort in designing an attack on a particular type of device; thereafter, the cost per device attacked is low. Semi-invasive attacks can be carried out using very cheap and simple equipment.

The attacker can gain information through:

1. Probing attacks
2. Fault induction attacks
3. Timing attacks
4. Power analysis attacks
5. Electromagnetic timing attacks

These attacks exploit the switching behavior of digital complementary metal-oxide-semiconductor (CMOS) gates. Of all these, the power analysis attack is of major concern.

5.2 Power analysis attacks

The power consumption of a cryptographic device may provide much information about the operations that take place and the parameters involved. This is the idea behind simple and differential power analysis, first introduced by Kocher et al. Like the clock ticks, the card's energy is provided by the terminal and can therefore easily be measured. Basically, to measure a circuit's power consumption, a small (e.g. 50 ohm) resistor is inserted in series with the power or ground input. The voltage difference across the resistor divided by the resistance yields the current. Well-equipped electronics labs have equipment that can digitally sample voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than 1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC can be bought for less than US$ 400.
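The measurement described above is plain Ohm's law, and a small Python sketch may make the arithmetic concrete. The function names, the 5 V supply, and the 0.25 V sample reading used in the usage note are illustrative assumptions, not values from the report; only the 50 ohm shunt comes from the text.

```python
def supply_current(v_shunt: float, r_shunt: float = 50.0) -> float:
    """Current through the series (shunt) resistor, from Ohm's law I = V / R."""
    if r_shunt <= 0:
        raise ValueError("shunt resistance must be positive")
    return v_shunt / r_shunt


def instantaneous_power(v_shunt: float, v_supply: float, r_shunt: float = 50.0) -> float:
    """Power delivered to the chip: supply current times the voltage left across the chip."""
    current = supply_current(v_shunt, r_shunt)
    # The shunt drops v_shunt, so the chip itself only sees v_supply - v_shunt.
    return current * (v_supply - v_shunt)
```

For example, a 0.25 V drop across the 50 ohm resistor corresponds to 5 mA; with an assumed 5 V supply, `instantaneous_power(0.25, 5.0)` gives about 23.75 mW for that sample. Repeating this for every digitized sample yields one power trace.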

Power analysis attacks are of two types:

1. Simple Power Analysis (SPA) attack
2. Differential Power Analysis (DPA) attack

SPA attacks on smart cards typically take a few seconds per card, while DPA attacks can take several hours. In general, from a somewhat academic perspective, we may consider the internal state of the block cipher to be all the intermediate results and values that are never included in the output in normal operation. For example, DES has 16 rounds, so we can consider the intermediate states state[1..15] after each round except the last as secret internal state. Side channels typically give information about these internal states, or about the operations used in the transition of this internal state from one round to another. The type of side channel will of course determine what information is available to the attacker about these states. The attacks typically work by finding some information about the internal state of the cipher that can be learned both by guessing part of the key and checking the value directly, and additionally by some statistical property of the cipher that makes that checkable value slightly nonrandom.

5.2.1 Simple Power Analysis attack (SPA)

Simple Power Analysis is a technique that involves direct, usually visual, interpretation of power consumption measurements collected while a cryptographic operation is being performed. SPA can yield information about a device's operation as well as key material.

A trace refers to a set of power consumption measurements taken across a cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a trace containing 5000 points. The figure shows an SPA trace from a smart card performing a DES operation.

Figure: SPA monitoring from a single DES operation performed by a typical smart card. The upper trace shows the entire encryption operation, including the initial permutation, the 16 DES rounds, and the final permutation. The lower trace is a detailed view of the second and third rounds.

Because SPA can reveal the sequence of instructions executed, it can be used to break cryptographic implementations in which the execution path depends on the data being processed. For example:

DES key schedule: The DES key schedule computation involves rotating 28-bit key registers. A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will contain different SPA features if the execution paths take different branches for each.

DES permutations: DES implementations perform a variety of bit permutations. Conditional branching in software or microcode can cause significant power consumption differences for "0" and "1" bits.

Comparisons: String or memory comparison operations typically perform a conditional branch when a mismatch is found. This conditional branching causes large SPA (and sometimes timing) characteristics.

Multipliers: Modular multiplication circuits tend to leak a great deal of information about the data they process. The leakage functions depend on the multiplier design but are often strongly correlated to operand values and Hamming weights.

Exponentiators: A simple modular exponentiation function scans across the exponent, performing a squaring operation in every iteration with an additional multiplication operation for each exponent bit that is equal to "1". The exponent can be compromised if squaring and multiplication operations have different power consumption characteristics, take different amounts of time, or are separated by different code. Modular exponentiation functions that operate on two or more exponent bits at a time may have more complex leakage functions.
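The exponentiator case can be illustrated with a short Python sketch. It is a toy model under a strong assumption: that squarings and multiplications are perfectly distinguishable in the trace, so the log of "S" and "M" symbols stands in for what an attacker would read off the power signal. The function names are hypothetical.

```python
def square_and_multiply(base: int, exponent: int, modulus: int):
    """Left-to-right binary exponentiation that also logs the operation sequence.

    The returned log models an idealized SPA trace in which each squaring ("S")
    and each multiplication ("M") can be told apart.
    """
    result = 1
    ops = []
    for bit in bin(exponent)[2:]:            # scan exponent bits, MSB first
        result = (result * result) % modulus
        ops.append("S")
        if bit == "1":                       # key-dependent branch: this is the leak
            result = (result * base) % modulus
            ops.append("M")
    return result, "".join(ops)


def recover_exponent(ops: str) -> int:
    """Rebuild the exponent from the observed S/M sequence."""
    bits = []
    i = 0
    while i < len(ops):
        # Every iteration starts with "S"; a directly following "M" marks a 1 bit.
        if i + 1 < len(ops) and ops[i + 1] == "M":
            bits.append("1")
            i += 2
        else:
            bits.append("0")
            i += 1
    return int("".join(bits), 2)
```

With the exponent 45 (binary 101101) the logged sequence is `SMSSMSMSSM`, and `recover_exponent` maps it back to 45 without ever seeing the exponent itself. Real traces are noisier, so this only shows why a data-dependent execution path is dangerous.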

5.2.2 Differential Power Analysis attack (DPA)

In addition to large-scale power variations due to the instruction sequence, there are effects correlated to the data values being manipulated. These variations tend to be smaller and are sometimes overshadowed by measurement errors and other noise. In such cases, it is still often possible to break the system using statistical functions tailored to the target algorithm.

To implement the DPA attack, an attacker first observes $m$ encryption operations and captures power traces $T_{1..m}[1..k]$ containing $k$ samples each. In addition, the attacker records the ciphertexts $C_{1..m}$. No knowledge of the plaintext is required. DPA uses the power consumption measurements to determine whether a key block guess $K_s$ is correct. A selection function $D(C_i, b, K_s)$ computes the value of a target intermediate bit $b$ from the ciphertext $C_i$ and the key guess $K_s$. The attacker computes a $k$-sample differential trace $\Delta_D[1..k]$ by finding the difference between the average of the traces for which $D$ is one and the average of the traces for which $D$ is zero. Thus $\Delta_D[j]$ is the average over $C_{1..m}$ of the effect due to the value represented by the selection function $D$ on the power consumption at point $j$. In particular,

$$\Delta_D[j] = \frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}{\sum_{i=1}^{m} D(C_i, b, K_s)} - \frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)}$$

If $K_s$ is incorrect, the bit computed using $D$ will differ from the actual target bit for about half of the ciphertexts $C_i$. The selection function is thus effectively uncorrelated to what was actually computed by the target device. If a random function is used to divide a set into two subsets, the difference in the averages of the subsets should approach zero as the subset sizes approach infinity. Thus, trace components uncorrelated to $D$ diminish with $1/\sqrt{m}$, causing the differential trace to become flat (the actual trace may not be completely flat, as $D$ with $K_s$ incorrect may have a weak correlation to $D$ with the correct $K_s$).

If $K_s$ is correct, however, the computed value for $D(C_i, b, K_s)$ will equal the actual value of the target bit $b$ with probability 1. The selection function is thus correlated to the value of the bit considered. Other data values, measurement errors, etc. that are not correlated to $D$ approach zero. Because power consumption is correlated to data bit values, the plot of $\Delta_D$ will be flat with spikes in regions where $D$ is correlated to the values being processed. The correct value of $K_s$ can thus be identified from the spikes in its differential trace.

Four values of $b$ correspond to each S-box, providing confirmation of key block guesses. Finding all eight $K_s$ yields the entire 48-bit round subkey. The remaining 8 key bits can be found easily using exhaustive search or by analyzing one additional round. Triple DES keys can be found by analyzing an outer DES operation first, using the resulting key to decrypt the ciphertext, and attacking the next DES key. DPA can use known plaintext or known ciphertext and can find encryption or decryption keys.
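The difference-of-means procedure can be tried out on simulated data. The following Python sketch is a toy model, not an attack on DES: the 8-bit S-box is a seeded random permutation, the leakage model (Hamming weight of the S-box output plus Gaussian noise) is an assumption, each trace has a single sample (k = 1), and the attack uses known plaintexts. All names are hypothetical.

```python
import random


def hamming_weight(x: int) -> int:
    return bin(x).count("1")


def make_sbox(seed: int = 1) -> list:
    """Toy 8-bit S-box: a seeded random permutation (NOT a DES S-box)."""
    rng = random.Random(seed)
    table = list(range(256))
    rng.shuffle(table)
    return table


def simulate_traces(sbox, key, m, noise_sigma=0.5, seed=2):
    """One power sample per operation: HW(sbox[p ^ key]) plus Gaussian noise."""
    rng = random.Random(seed)
    plaintexts = [rng.randrange(256) for _ in range(m)]
    traces = [hamming_weight(sbox[p ^ key]) + rng.gauss(0.0, noise_sigma)
              for p in plaintexts]
    return plaintexts, traces


def differential(sbox, plaintexts, traces, key_guess, bit=0):
    """Mean of the traces with D = 1 minus mean of the traces with D = 0."""
    ones, zeros = [], []
    for p, t in zip(plaintexts, traces):
        d = (sbox[p ^ key_guess] >> bit) & 1     # selection function D
        (ones if d else zeros).append(t)
    if not ones or not zeros:
        return 0.0
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)


def dpa_attack(sbox, plaintexts, traces):
    """Return the key guess whose differential has the largest magnitude."""
    return max(range(256),
               key=lambda k: abs(differential(sbox, plaintexts, traces, k)))
```

A possible use is `pts, trs = simulate_traces(make_sbox(), 0x3C, 4000)` followed by `dpa_attack(make_sbox(), pts, trs)`. For the correct guess the differential is close to the contribution of one bit to the Hamming weight, while wrong guesses split the traces almost at random and their differentials stay small, which is the flattening described above.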


CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES

The power consumption of traditional standard cells and logic depends on the signal activity. When the output of a logic gate makes a 0 to 1 transition, a current flows from the power supply and charges the output capacitance. On the other hand, when the output sees a 1 to 0, a 0 to 0, or a 1 to 1 transition, no energy or only a limited amount of energy (due to short circuit or leakage currents) is consumed from the power supply. This is the fundamental reason why information is leaked through the power supply and why power attacks are possible. The basis of a secure digital design flow is therefore a logic style with constant power consumption.
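A first-order model of this data dependence fits in a few lines of Python. It uses the standard approximation that each 0 to 1 output transition draws C * Vdd^2 from the supply and ignores short-circuit and leakage currents; the load capacitance and supply voltage defaults are illustrative assumptions, not values from the report.

```python
def supply_energy(outputs, c_load=10e-15, vdd=1.8):
    """Energy (in joules) drawn from the supply by one gate output over a value sequence.

    Only 0 -> 1 transitions charge the load capacitance from the supply;
    each one is modeled as drawing c_load * vdd**2.
    """
    rising = sum(1 for prev, cur in zip(outputs, outputs[1:])
                 if prev == 0 and cur == 1)
    return rising * c_load * vdd ** 2
```

Under this model the output sequence `[0, 1, 0, 1]` costs twice as much supply energy as `[0, 0, 0, 1]`, and `[1, 1, 1]` costs nothing, so an observer of the supply current learns something about the data being processed.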

6.1 Current Mode Logic

Current mode logic (CML), e.g. current steering logic, seems the ideal solution. This type of logic continuously draws a current from the supply, and its state is determined by the path that the current takes. A gate has constant power consumption if it draws a perfectly constant current from the power supply, independently of the input and output signals. To build a current source capable of generating a constant current, special circuit techniques that minimize channel length modulation have to be used.

The decisive drawback of CML, however, is its static power consumption. Even when the logic gate is not processing any data, it burns current, which makes this logic style unacceptable for embedded battery-operated devices.

6.2 Voltage Mode Logic (CMOS circuit styles)

Voltage mode logic (VML), e.g. static CMOS logic, only draws a current from the supply to change state, and its state is determined by the amount of charge stored on a capacitance. A regular standard CMOS circuit only consumes power when a capacitance gets charged and later discharged, i.e. when a gate switches state. This is the main reason that static CMOS, with its low power consumption and high noise margins, is the style of choice for battery-operated and low-power devices. The figure below illustrates this for a simple inverter.

Figure: Standard CMOS inverter

Yet two conditions must be satisfied for VML to have constant power consumption, namely:

1) A logic gate must have exactly one switching event per signal transition.

2) The logic gate must charge a constant capacitance in that switching event.

In the standard CMOS inverter above, all four output transitions (0 to 0, 0 to 1, 1 to 0, and 1 to 1) can be distinguished by monitoring the power supply.

6.3 Dynamic Differential Logic

Dynamic differential logic, sometimes also referred to as dual-rail logic with pre-charge, fulfills the first condition. A differential logic family uses both the true and the false representation of the input and output signals, and a dynamic logic family alternates pre-charge and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1, exactly one of the two output nodes evaluates to 0 in the evaluation phase to produce a differential output signal. The discharged output node is charged back to 1 in the following pre-charge phase, so that both outputs are at 1 again. In other words, every signal transition, including the events in which the input signals remain constant, is represented by an actual switching event in which the logic gate charges a capacitance. The following logic families have been introduced to thwart differential power analysis (DPA) by using dynamic differential logic:

1. Sense Amplifier Based Logic (SABL)
2. Wave Dynamic Differential Logic (WDDL)

6.3.1 Sense Amplifier Based Logic (SABL)

The main advantage of SABL is that it has balanced input and output nodes and that all internal nodes connect to an output. The output capacitances can be balanced. Systematic methods have been developed to make sure that both branches of the differential pull-down network are balanced and that no memory effects are present in the network. A SABL AND/NAND gate is illustrated in the figure.

Figure: Sense Amplifier Based Logic AND/NAND gate

This circuit style does, however, require full-custom characterization and layout. It also suffers from a high clock load, common to all dynamic logic gates.

6.3.2 Wave Dynamic Differential Logic (WDDL)

WDDL can be implemented with static CMOS logic. Static CMOS standard cells are combined to form secure compound standard cells, which have a reduced power signature. WDDL has many advantages. It can be readily implemented from an existing standard cell library. The design flow is fully supported with accurate EDA library files that come directly from the vendor. WDDL also results in a dynamic differential logic with only a small load capacitance on the pre-charge control signal, and with the low power consumption and high noise margins of static CMOS.

Advantages of the WDDL logic style are as follows:

- A major advantage of the proposed logic style is that it can be incorporated into the common Electronic Design Automation (EDA) tool flow.
- No special design rules are involved in the interconnection of WDDL gates.
- The switching factor of WDDL is 100%.

A WDDL gate consists of a parallel combination of two positive complementary gates, one calculating the true output using the true inputs, the other the false output using the false inputs. A positive gate produces a zero output for an all-zero input. The AND gate and the OR gate are examples of positive gates. A complementary gate, sometimes also referred to as a dual gate, expresses the false output of the original logic gate using the false inputs of the original gate. The AND gate fed with true input signals and the OR gate fed with false input signals are two dual gates. The figure shows the WDDL AND gate and the WDDL OR gate.

In the evaluation phase, each input signal is differential and the WDDL gate calculates its differential output. In the pre-charge phase, the inputs to the WDDL gate are set to 0. This puts the output of the gate at 0. A module in WDDL pre-charges without distributing the pre-charge signal to each individual gate. During the pre-charge phase, the input vector of the combinatorial logic is set to all 0s. Each individual gate will eventually have all its inputs at 0, evaluate its output to 0, and pass this 0 value to the next gate. One could say that the pre-charge signal travels over the combinatorial logic as a 0-wave, hence the name WDDL.

There are several ways to launch the pre-charge wave. In the figure, a pre-charge operator is inserted at the start of every combinatorial logic tree, i.e. at the inputs of the encryption module and at the outputs of the registers. These operators produce an all-zero output in the pre-charge phase (clk signal high) but let the differential signal through during the evaluation phase (clk signal low).
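The gate construction and the pre-charge behavior described above can be checked with a small behavioral model. This Python sketch represents every signal as a (true, false) pair of rails; it models logic values only, not timing or capacitances, and the function names are hypothetical.

```python
def wddl_and(a, b):
    """WDDL AND: AND gate on the true rails, dual OR gate on the false rails."""
    (at, af), (bt, bf) = a, b
    return (at & bt, af | bf)


def wddl_or(a, b):
    """WDDL OR: OR gate on the true rails, dual AND gate on the false rails."""
    (at, af), (bt, bf) = a, b
    return (at | bt, af & bf)


def precharge(bit, clk):
    """Pre-charge operator: (0, 0) while clk is 1, the differential pair while clk is 0."""
    if clk:
        return (0, 0)
    return (bit, 1 - bit)


def rising_edges_per_cycle(gate, a_bit, b_bit):
    """Count 0 -> 1 output transitions over one pre-charge/evaluation cycle."""
    pre = gate(precharge(a_bit, 1), precharge(b_bit, 1))
    ev = gate(precharge(a_bit, 0), precharge(b_bit, 0))
    return sum(1 for p, e in zip(pre, ev) if p == 0 and e == 1)
```

For every input combination, `rising_edges_per_cycle` returns 1 for both gates: exactly one output rail rises in each evaluation phase, whatever the data. That is the "100% switching factor" property; making the charged capacitance equal on both rails is a separate layout concern that this model does not capture.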

Figure: WDDL pre-charge wave generation

CHAPTER 7 WDDL GATES

The methodology used in the project is the bottom-up approach. Lower modules are designed and later integrated to form larger modules, whose further integration leads to the final top module. Since logic gates form the lowest-level modules, the logic gates required for the design are first implemented in WDDL style. WDDL demands a parallel combination of two positive complementary gates, one calculating the true value and the other the false value. The logic gates OR, AND, and XOR have been implemented, as well as a full adder, a 32-bit XOR, etc.

7.1 WDDL OR gate

A WDDL OR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.

Figure 4.1 WDDL OR Gate

The NAND and NOT gates act as the Precharge operator, injecting the signal '0' into the gates when the 'clk' signal is high, i.e. during the Precharge phase.

7.2 WDDL AND gate

A WDDL AND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.

Figure 4.2 WDDL AND Gate

7.3 WDDL NAND gate

A WDDL NAND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.

Figure 4.3 WDDL NAND Gate

7.4 WDDL NOR gate

A WDDL NOR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.

Figure 4.4 WDDL NOR Gate

7.5 WDDL XOR gate

The XOR function can be implemented by the Boolean equation A ⊕ B = AB' + A'B. Therefore the XOR gate is implemented in terms of AND gates and OR gates, as shown in Figure 4.5. It can also be implemented by instantiating a WDDL AND gate and a WDDL OR gate, but the number of gates involved in the latter is greater than in the former. Therefore the first method of implementation is followed rather than the second.
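The XOR equation above maps onto dual-rail logic without any inverters, because the complement of a signal is simply its false rail. The Python sketch below is one possible behavioral model under that assumption; it is not taken from the report's VHDL, and the helper names are hypothetical. It also shows the bottom-up idea of building a 32-bit XOR by replicating the 1-bit gate.

```python
def encode(bit):
    """Differential (true, false) encoding used in the evaluation phase."""
    return (bit, 1 - bit)


def wddl_xor(a, b):
    """WDDL XOR from A xor B = AB' + A'B, using only AND/OR gates on each rail."""
    (at, af), (bt, bf) = a, b
    true_rail = (at & bf) | (af & bt)      # AB' + A'B
    false_rail = (af | bt) & (at | bf)     # dual network: (A' + B)(A + B')
    return (true_rail, false_rail)


def wddl_xor32(x, y):
    """32-bit XOR built by instantiating the 1-bit gate once per bit position."""
    return [wddl_xor(a, b) for a, b in zip(x, y)]


def to_rails(word):
    """Split a 32-bit integer into 32 differential pairs, LSB first."""
    return [encode((word >> i) & 1) for i in range(32)]


def from_rails(rails):
    """Reassemble an integer from the true rails, LSB first."""
    return sum(t << i for i, (t, _f) in enumerate(rails))
```

As a usage example, `from_rails(wddl_xor32(to_rails(x), to_rails(y)))` equals `x ^ y` for 32-bit `x` and `y`, and feeding the all-zero pre-charge pair `(0, 0)` into `wddl_xor` yields `(0, 0)`, so the 0-wave propagates through this gate as well.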

Figure 4.5 WDDL XOR gate

With the help of the above basic gates, a full adder circuit has been designed by instantiating the WDDL gates designed above. The implementation of the Blowfish algorithm requires a 32-bit XOR gate and a 32-bit adder circuit. They can be easily implemented by instantiating the corresponding lower module 32 times.

CHAPTER 8 FRONT END RESULTS

WDDL OR GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name   : wddlor.ngr
Top Level Output File Name       : wddlor
Output Format                    : NGC
Optimization Goal                : Speed
Keep Hierarchy                   : NO

Design Statistics
IOs                              : 5

Cell Usage
BELS                             : 2
  LUT3                           : 2
IO Buffers                       : 5
  IBUF                           : 3
  OBUF                           : 2

Device utilization summary
Selected Device                  : 3s250eft256-4
Number of Slices                 : 1 out of 2448 (0%)
Number of 4 input LUTs           : 2 out of 4896 (0%)
Number of IOs                    : 5
Number of bonded IOBs            : 5 out of 172 (2%)

Timing Summary
Speed Grade                      : -4
Maximum combinational path delay : 6.236 ns

Figure: Simulation result

Figure: Synthesis result

WDDL AND GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name   : wddlgates.ngr
Top Level Output File Name       : wddlgates
Output Format                    : NGC
Optimization Goal                : Speed
Keep Hierarchy                   : NO

Design Statistics
IOs                              : 5

Cell Usage
BELS                             : 2
  LUT3                           : 2
IO Buffers                       : 5
  IBUF                           : 3
  OBUF                           : 2

Device utilization summary
Selected Device                  : 3s250etq144-4
Number of Slices                 : 1 out of 2448 (0%)
Number of 4 input LUTs           : 2 out of 4896 (0%)
Number of IOs                    : 5
Number of bonded IOBs            : 5 out of 108 (4%)

Timing Summary
Speed Grade                      : -4
Maximum combinational path delay : 6.236 ns

Figure: Simulation result

Figure: Synthesis result

WDDL NAND GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name   : wddlnand1.ngr
Top Level Output File Name       : wddlnand1
Output Format                    : NGC
Optimization Goal                : Speed
Keep Hierarchy                   : NO

Design Statistics
IOs                              : 5

Cell Usage
BELS                             : 2
  LUT3                           : 2
IO Buffers                       : 5
  IBUF                           : 3
  OBUF                           : 2

Device utilization summary
Selected Device                  : 3s500efg320-4
Number of Slices                 : 1 out of 4656 (0%)
Number of 4 input LUTs           : 2 out of 9312 (0%)
Number of IOs                    : 5
Number of bonded IOBs            : 5 out of 232 (2%)

Timing Summary
Speed Grade                      : -4
Maximum combinational path delay : 6.236 ns

Figure: Simulation result

Figure: Synthesis result

WDDL XOR GATE

Figure: Simulation result

Figure: Synthesis result

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name   : wddlxorgate.ngr
Top Level Output File Name       : wddlxorgate
Output Format                    : NGC
Optimization Goal                : Speed
Keep Hierarchy                   : NO

Design Statistics
IOs                              : 5

Cell Usage
BELS                             : 2
  LUT3                           : 2
IO Buffers                       : 5
  IBUF                           : 3
  OBUF                           : 2

Device utilization summary
Selected Device                  : 3s250eft256-4
Number of Slices                 : 1 out of 2448 (0%)
Number of 4 input LUTs           : 2 out of 4896 (0%)
Number of IOs                    : 5
Number of bonded IOBs            : 5 out of 172 (2%)

Timing Summary
Speed Grade                      : -4
Maximum combinational path delay : 6.236 ns

CHAPTER 9 SUMMARY AND CONCLUSION

9.1 Summary

In order to secure ICs against side-channel attacks, especially Differential Power Analysis (DPA), it is necessary to implement the design in a logic style that renders constant power dissipation irrespective of the input combination. WDDL has proved to be advantageous over other styles and is therefore of great significance. In this project work, an architecture for the Blowfish algorithm is designed and implemented in WDDL style. The implementation uses a bottom-up approach: the low-level entities are designed first and are later combined to form the entire module. The key scheduling is online. The sub-keys generated for a particular key can be used for the encryption of all the data to be encrypted with that key. The sub-keys are supplied in reverse order for the decryption data path. Initially, logic gates are implemented in WDDL, and then higher modules are designed by instantiating the WDDL gates to form the entire module, thus resulting in constant power dissipation irrespective of the input data combination. The entire design works in two phases, namely the Precharge phase and the Evaluation phase. In the Precharge phase all the signals of the design are zeroed, and during the Evaluation phase the functionality of the design is achieved. This sort of design has been found simple and very effective in thwarting the side-channel attack known as Differential Power Analysis (DPA).

9.2 Conclusion

The crypto processor has been designed for a key size of 448 bits and a plaintext block of 64 bits. The code for the implementation has been written in VHDL. The functional verification has been done using the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The back end of the design is done using SOC Encounter. The desired functionality has been achieved according to the specifications. In the output, there is the same number of transitions during every Evaluation phase, thus resulting in constant power dissipation. During synthesis it has been observed that a simple WDDL gate comprises many conventional gates. Therefore the area of the design has grown nearly three-fold compared to the design implemented in conventional CMOS logic; this is the cost of the security incorporated into the IC against Differential Power Analysis (DPA). Due to the constant power dissipation at the output, an attacker cannot apply the DPA scheme to find the secret key that is being used in the crypto processor. Thus security against DPA is incorporated into the IC at the hardware level by implementing the design in WDDL style,

which is quite simple and effective.

CHAPTER 10 REFERENCES

10.1 Referred technical papers

[1] Kris Tiri and Ingrid Verbauwhede, "A Digital Design Flow for Secure Integrated Circuits", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 25, No. 7, July 2006.
[2] Prof. Jean-Jacques Quisquater, Math RiZK, "Side Channel Attacks", October 2002.
[3] Ross Anderson, Mike Bond, Jolyon Clulow and Sergei Skorobogatov, "Crypto processors - A Survey", IEEE Proceedings.
[4] Noohul Basheer Zain Ali, James M. Noras, "Optimal Data path Design for a Cryptographic Processor: The Blowfish Algorithm", Malaysian Journal of Computer Science, Vol. 14, No. 1, June 2001, pp. 16-27.
[5] Dan Rinehimer, Derek Wilson, "Summary of B. Schneier's Description of New Variable Length Key, 64-Bit Block Cipher (Blowfish)".
[6] Kris Tiri and Ingrid Verbauwhede, "A Dynamic and Differential CMOS Logic Style to Resist Power and Timing Attacks on Security IC's".
[7] Discretix Technologies Limited, "Introduction to Side Channel Attacks".
[8] Kris Tiri, Moonmoon Akmal and Ingrid Verbauwhede, "A Dynamic and Differential Logic with Signal Independent Power Consumption to withstand Differential Power Analysis on Smart Cards".

Reference books

[1] William Stallings, "Cryptography and Network Security: Principles and Practices", Pearson Education, 2003.
[2] Samir Palnitkar, "Verilog HDL: A Guide to Digital Design and Synthesis", Prentice Hall.

Referred web pages

[1] http://www.schneier.com/blowfish.html
[2] http://www.nist.gov
[3] http://www.discretix.com/PDF/Introduction%20to%20Side%20Channel%20Attacks.pdf
[4] http://www.wipo.int/pctdb/en/wo.jsp?IA=WO2005081085&DISPLAY=CLAIMS


The concept of a default configuration is a bit messy in VHDL '87

- The last architecture analyzed links to the entity

Can be used to change simulation behavior without re-analyzing the VHDL source

Complex configuration declarations are ignored in synthesis

Some entities can have, e.g., a gate-level architecture and a behavioral architecture

Are always optional

Packages

Packages contain information common to many design units

1. Package declaration

- Constant declarations
- Type and subtype declarations
- Function and procedure declarations
- Global signal declarations
- File declarations
- Component declarations

2. Package body

- Is not always needed
- Function bodies
- Procedure bodies

Packages are meant for encapsulating data that can be shared globally among several design units. A package consists of a declaration part and an optional body part.

Package declaration can contain

- Type and subtype declarations
- Subprograms
- Constants
- Alias declarations
- Global signal declarations
- File declarations
- Component declarations

Package body consists of

- Subprogram declarations and bodies
- Type and subtype declarations
- Deferred constants
- File declarations

Libraries

Collection of VHDL design units (database)

1. Packages

- package declaration
- package body

2. Entities (entity declaration)

3. Architectures (architecture body)

4. Configurations (configuration declarations)

Usually a directory in the UNIX file system

Can also be any other kind of database

Levels of Abstraction

VHDL supports many possible styles of design description, which differ primarily in how closely they relate to the hardware

It is possible to describe a circuit in a number of ways, listed here from the lowest to the highest level of abstraction

- Structural
- Dataflow
- Behavioral

Structural VHDL description

Circuit is described in terms of its components

From a low-level description (e.g. transistor-level description) to a high-level description (e.g. block diagram)

For large circuits, low-level descriptions quickly become impractical

Dataflow VHDL Description

Circuit is described in terms of how data moves through the system

In the dataflow style you describe how information flows between registers in the system

The combinational logic is described at a relatively high level; the placement and operation of registers is specified quite precisely

The behavior of the system over time is defined by registers

There are no built-in registers in the VHDL language

- Either a lower-level description
- or a behavioral description of sequential elements is needed

The lower-level register descriptions must be created or obtained

If there are no third-party models for registers, you must write the behavioral description of the registers

The behavioral description can be provided in the form of subprograms (functions or procedures)

Behavioral VHDL Description

Circuit is described in terms of its operation over time

Representation might include e.g. state diagrams, timing diagrams, and algorithmic descriptions

The concept of time may be expressed precisely using delays (e.g. A <= B after 10 ns)

If no actual delays are used, only the order of sequential operations is defined

In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed timing specifications

The actual timing results depend on the implementation technology and the efficiency of the synthesis tool

There are a few tools for behavioral synthesis

Concurrent Vs Sequential

Processes

Basic simulation concept in VHDL

A VHDL description can always be broken up into interconnected processes

Quite similar to a UNIX process

Process keyword in VHDL

A process statement is a concurrent statement

Statements inside a process statement are sequential statements

A process must contain either a sensitivity list or wait statement(s), but NOT both

The sensitivity list or wait statement(s) contain the signals that wake the process up

General Format

process [(sensitivity_list)]
    process_declarative_part
begin
    process_statements
    [wait_statement]
end process;

CHAPTER 4 SMART CARD OVERVIEW

This section very briefly introduces the concept of a smart card. Basically, a smart card is a computer embedded in a safe. It consists of a (typically 8-bit or 32-bit) processor together with ROM, EEPROM, and a small amount of RAM, and is therefore capable of performing computations. The main goal of a smart card is to allow the execution of cryptographic operations involving some secret parameter (the key) while not revealing this parameter to the outside world. Conversely, the goal of the attacker is to recover this secret parameter. The processor is embedded in a chip and connected to the outside world through eight wires whose role, use, and position are standardized. In addition to the input/output wires, the parts we are most interested in are the following:

1. Power supply: Smart cards do not have an internal battery. The current they need is provided by the smart card reader. This makes the smart card's power consumption fairly easy for the attacker to measure.

2. Clock: Similarly, smart cards do not have an internal clock either. The clock ticks must also be provided from the outside world. As a consequence, the attacker can measure the card's running time with very good precision.

Smart cards are usually equipped with protection mechanisms composed of a shield (the passivation layer), whose goal is to hide the internal behavior of the chip, and possibly sensors that react when the shield is removed by destroying all sensitive data and preventing the card from functioning properly.

CHAPTER 5 SIDE CHANNEL ATTACKS

"Side channel attacks" are attacks that are based on "side channel information". Side channel information is information that can be retrieved from the encryption device that is neither the plaintext to be encrypted nor the ciphertext resulting from the encryption process.

In the past, an encryption device was perceived as a unit that receives plaintext input and produces ciphertext output, and vice versa. Attacks were therefore based on either knowing the ciphertext (such as ciphertext-only attacks), or knowing both plaintext and ciphertext (such as known plaintext attacks), or on the ability to define what plaintext is to be encrypted and then seeing the results of the encryption (known as chosen plaintext attacks). Today, it is known that encryption devices have additional outputs, and often additional inputs, which are not the plaintext or the ciphertext.

Encryption devices produce timing information (information about the time that operations take) that is easily measurable, radiation of various sorts, power consumption statistics (that can be easily measured as well), and more. Often the encryption device also has additional "unintentional" inputs, such as voltage, that can be modified to cause predictable outcomes. Side channel attacks make use of some or all of this information, along with other (known) cryptanalytic techniques, to recover the key the device is using.

Side channel analysis techniques are of concern because the attacks can be mounted quickly and can sometimes be implemented using readily available hardware costing from only a few hundred dollars to thousands of dollars.

5.1 Classification of side channel attacks

The literature usually classifies side channel attacks along two orthogonal axes.

1. Invasive vs. non-invasive

Invasive attacks require de-packaging the chip to get direct access to its components. A typical example is the connection of a wire to a data bus to observe the data transfers.

A non-invasive attack only exploits externally available information (the emission of which is, however, often unintentional), such as running time or power consumption.

A further category, called semi-invasive attacks, has also been distinguished. These attacks require de-packaging of the chip to get access to the chip surface, but do not tamper with the passivation layer (they do not require electrical contact to the metal surface).

2. Active vs. passive

Active attacks try to tamper with the card's proper functioning. For example, fault induction attacks try to induce errors in the computation.

Passive attacks, by contrast, simply observe the card's behavior during its processing without disturbing it.

Note that these two axes are indeed orthogonal: an invasive attack may completely avoid disturbing the card's behavior, and a passive attack may require a preliminary de-packaging for the required information to be observable.

These attacks are of course not mutually exclusive: an invasive attack may, for example, serve as a preliminary step for a non-invasive one by giving a detailed description of the chip's architecture that helps to find out where to put external probes.

Smart cards are usually equipped with protection mechanisms that are supposed to react to invasive attacks (although several invasive attacks are nonetheless capable of defeating these mechanisms). On the other hand, it is worth pointing out that a non-invasive attack is completely undetectable: there is, for example, no way for a smart card to figure out that its running time is currently being measured. Other countermeasures are therefore necessary. From an economic point of view, invasive attacks are usually more expensive to deploy on a large scale, since they require individual processing of each attacked device. In this sense, non-invasive attacks constitute a bigger menace for the smart card industry.

Invasive attacks involve a relatively high capital investment for lab equipment plus a moderate investment of effort for each individual chip attacked. Non-invasive attacks require only a moderate capital investment plus a moderate investment of effort in designing an attack on a particular type of device; thereafter, the cost per device attacked is low. Semi-invasive attacks can be carried out using very cheap and simple equipment.

The attacker can gain information by

1. Probing attacks
2. Fault induction attacks
3. Timing attacks
4. Power analysis attacks and
5. Electromagnetic timing attacks

These attacks exploit the switching behavior of digital complementary metal-oxide-semiconductor (CMOS) gates. Of all these, the power analysis attack is of major concern.

5.2 Power analysis attacks

The power consumption of a cryptographic device may provide much information about the operations that take place and the parameters involved. This is the idea behind simple and differential power analysis, first introduced by Kocher et al. Like the clock ticks, the card's energy is provided by the terminal and can therefore easily be measured. Basically, to measure a circuit's power consumption, a small (e.g. 50 ohm) resistor is inserted in series with the power or ground input. The voltage difference across the resistor divided by the resistance yields the current. Well-equipped electronics labs have equipment that can digitally sample voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than 1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC can be bought for less than US$ 400.
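The measurement described above reduces to Ohm's law. As a minimal sketch (the function names and the 5 V supply used in the example are illustrative assumptions, not values from this report), the supply current and an estimate of the power delivered to the card can be derived from one sampled voltage drop as follows:

```python
def supply_current(v_drop, r_shunt=50.0):
    """Current through the series resistor (A), from the voltage drop (V) by Ohm's law."""
    if r_shunt <= 0:
        raise ValueError("shunt resistance must be positive")
    return v_drop / r_shunt


def card_power(v_drop, v_supply, r_shunt=50.0):
    """Power delivered to the card (W): voltage left after the resistor times the current."""
    return (v_supply - v_drop) * supply_current(v_drop, r_shunt)


# Hypothetical reading: 0.25 V across the 50 ohm resistor with an assumed 5 V supply
print(supply_current(0.25))    # current in amperes
print(card_power(0.25, 5.0))   # power in watts
```

Applying the same conversion to every sample of an acquisition turns the recorded voltage waveform into a power trace.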

Power analysis attacks are of two types:

1. Simple Power Analysis (SPA) attack and
2. Differential Power Analysis (DPA) attack

SPA attacks on smart cards typically take a few seconds per card, while DPA attacks can take several hours. In general, from a somewhat academic perspective, we may consider the entire internal state of the block cipher to be all the intermediate results and values that are never included in the output in normal operation. For example, DES has 16 rounds; we can consider the intermediate states state[1..15] after each round except the last as a secret internal state. Side channels typically give information about these internal states, or about the operations used in the transition of this internal state from one round to another. The type of side channel will of course determine what information is available to the attacker about these states. The attacks typically work by finding some information about the internal state of the cipher that can be learned both by guessing part of the key and checking the value directly, and additionally by some statistical property of the cipher that makes that checkable value slightly nonrandom.

5.2.1 Simple Power Analysis attack (SPA)

Simple Power Analysis is a technique that involves direct interpretation, generally by visual inspection, of power consumption measurements collected while a cryptographic operation is being performed. SPA can yield information about a device's operation as well as key material.

A trace refers to a set of power consumption measurements taken across a cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a trace containing 5000 points. The figure shows an SPA trace from a smart card performing a DES operation.
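The trace length quoted above is simply the duration multiplied by the sampling rate. A small sketch of that arithmetic (the helper names are hypothetical):

```python
def samples_in_trace(duration_s, sample_rate_hz):
    """Number of points in a trace of the given duration at the given sampling rate."""
    return round(duration_s * sample_rate_hz)


def sample_times(duration_s, sample_rate_hz):
    """Time stamp (in seconds) of every sample in the trace."""
    count = samples_in_trace(duration_s, sample_rate_hz)
    return [index / sample_rate_hz for index in range(count)]


# The example from the text: a 1 ms operation sampled at 5 MHz
print(samples_in_trace(1e-3, 5e6))
```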

Figure: SPA monitoring from a single DES operation performed by a typical smart card. The upper trace shows the entire encryption operation, including the initial permutation, the 16 DES rounds, and the final permutation. The lower trace is a detailed view of the second and third rounds.

Because SPA can reveal the sequence of instructions executed, it can be used to break cryptographic implementations in which the execution path depends on the data being processed. For example:

DES key schedule: The DES key schedule computation involves rotating 28-bit key registers. A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will contain different SPA features if the execution paths take different branches for each.

DES permutations: DES implementations perform a variety of bit permutations. Conditional branching in software or microcode can cause significant power consumption differences for "0" and "1" bits.

Comparisons: String or memory comparison operations typically perform a conditional branch when a mismatch is found. This conditional branching causes large SPA (and sometimes timing) characteristics.

Multipliers: Modular multiplication circuits tend to leak a great deal of information about the data they process. The leakage functions depend on the multiplier design, but are often strongly correlated to operand values and Hamming weights.

Exponentiators: A simple modular exponentiation function scans across the exponent, performing a squaring operation in every iteration with an additional multiplication operation for each exponent bit that is equal to "1". The exponent can be compromised if squaring and multiplication operations have different power consumption characteristics, take different amounts of time, or are separated by different code. Modular exponentiation functions that operate on two or more exponent bits at a time may have more complex leakage functions.
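The exponentiator case can be illustrated with a short model. The sketch below assumes an idealized attacker who can tell a squaring ("S") from a multiplication ("M") in the trace; the operation log stands in for the power trace, and the function names are hypothetical.

```python
def square_and_multiply(base, exponent, modulus):
    """Left-to-right binary exponentiation that also logs the operation sequence."""
    if exponent < 0:
        raise ValueError("exponent must be non-negative")
    result = 1
    operations = []
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus      # squaring in every iteration
        operations.append("S")
        if bit == "1":
            result = (result * base) % modulus    # extra multiplication for a 1 bit
            operations.append("M")
    return result, operations


def recover_exponent(operations):
    """Rebuild the exponent from the S/M sequence alone."""
    bits = []
    for index, operation in enumerate(operations):
        if operation == "S":
            followed_by_multiply = (index + 1 < len(operations)
                                    and operations[index + 1] == "M")
            bits.append("1" if followed_by_multiply else "0")
    return int("".join(bits), 2)


value, ops = square_and_multiply(7, 0b101101, 1009)
print("".join(ops))
print(bin(recover_exponent(ops)))
```

In this model the secret exponent is read directly off the operation sequence, which is why implementations try to make the two operations indistinguishable or to perform both unconditionally.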

5.2.2 Differential Power Analysis attack (DPA)

In addition to large-scale power variations due to the instruction sequence, there are effects correlated to the data values being manipulated. These variations tend to be smaller and are sometimes overshadowed by measurement errors and other noise. In such cases, it is still often possible to break the system using statistical functions tailored to the target algorithm.

To implement the DPA attack, an attacker first observes $m$ encryption operations and captures power traces $T_{1..m}[1..k]$ containing $k$ samples each. In addition, the attacker records the ciphertexts $C_{1..m}$. No knowledge of the plaintext is required. DPA analysis uses power consumption measurements to determine whether a key block guess $K_s$ is correct. The attacker computes a $k$-sample differential trace $\Delta_D[1..k]$ by finding the difference between the average of the traces for which a certain intermediate value $V = D(C_i, b, K_s)$ is one and the average of the traces for which $V$ is zero. Thus $\Delta_D[j]$ is the average over $C_{1..m}$ of the effect due to the value represented by the selection function $D$ on the power consumption at point $j$. In particular,

$$\Delta_D[j] = \frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}{\sum_{i=1}^{m} D(C_i, b, K_s)} - \frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)}$$

If $K_s$ is incorrect, the bit computed using $D$ will differ from the actual target bit for about half of the ciphertexts $C_i$. The selection function is thus effectively uncorrelated to what was actually computed by the target device. If a random function is used to divide a set into two subsets, the difference in the averages of the subsets should approach zero as the subset sizes approach infinity.

Thus, if $K_s$ is incorrect, $\Delta_D[j]$ approaches zero as $m$ grows, because trace components uncorrelated to $D$ diminish with $1/\sqrt{m}$, causing the differential trace to become flat (the actual trace may not be completely flat, as $D$ with $K_s$ incorrect may have a weak correlation to $D$ with the correct $K_s$). If $K_s$ is correct, however, the computed value for $D(C_i, b, K_s)$ will equal the actual value of target bit $b$ with probability 1. The selection function is thus correlated to the value of the bit considered. Other data values, measurement errors, etc. that are not correlated to $D$ approach zero. Because power consumption is correlated to data bit values, the plot of $\Delta_D$ will be flat, with spikes in regions where $D$ is correlated to the values being processed. The correct value of $K_s$ can thus be identified from the spikes in its differential trace. Four values of $b$ correspond to each S-box, providing confirmation of key block guesses. Finding all eight $K_s$ yields the entire 48-bit round subkey. The remaining 8 key bits can be found easily using exhaustive search or by analyzing one additional round. Triple DES keys can be found by analyzing an outer DES operation first, using the resulting key to decrypt the ciphertexts, and attacking the next DES key. DPA can use known plaintext or known ciphertext and can find encryption or decryption keys.
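The difference-of-means procedure described above can be tried on simulated data. The following is a toy sketch under loudly stated assumptions, not the DES attack itself: it uses a 4-bit key, a 4-bit S-box borrowed from the PRESENT cipher, known input data instead of ciphertexts, Gaussian noise, and a single sample (`LEAK_POINT`) at which the target bit adds to the power. All names and parameters are hypothetical.

```python
import random

# Toy 4-bit S-box (the PRESENT cipher's S-box), used only for illustration
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
SAMPLES = 20      # k: samples per trace
LEAK_POINT = 7    # assumed sample index where the target bit influences the power


def selection(data, key_guess):
    """Selection function D: predicted most significant bit of the S-box output."""
    return (SBOX[data ^ key_guess] >> 3) & 1


def simulate_traces(secret_key, m, rng, noise=1.0):
    """Simulate m observations, each a (known data value, noisy power trace) pair."""
    observations = []
    for _ in range(m):
        data = rng.randrange(16)
        trace = [rng.gauss(0.0, noise) for _ in range(SAMPLES)]
        trace[LEAK_POINT] += selection(data, secret_key)   # data-dependent power
        observations.append((data, trace))
    return observations


def differential_trace(observations, key_guess):
    """Average of the traces with D = 1 minus average of the traces with D = 0."""
    sums = [[0.0] * SAMPLES, [0.0] * SAMPLES]
    counts = [0, 0]
    for data, trace in observations:
        d = selection(data, key_guess)
        counts[d] += 1
        for j, sample in enumerate(trace):
            sums[d][j] += sample
    if 0 in counts:
        raise ValueError("both subsets must be non-empty")
    return [sums[1][j] / counts[1] - sums[0][j] / counts[0] for j in range(SAMPLES)]


def recover_key(observations):
    """Return the key guess whose differential trace shows the largest spike."""
    peaks = {guess: max(abs(x) for x in differential_trace(observations, guess))
             for guess in range(16)}
    return max(peaks, key=peaks.get), peaks


rng = random.Random(2004)
observations = simulate_traces(secret_key=0xB, m=2000, rng=rng)
best_guess, peaks = recover_key(observations)
print(hex(best_guess), round(peaks[best_guess], 2))
```

In this model the differential trace for the correct guess has a spike of roughly the leakage amplitude at `LEAK_POINT`, while wrong guesses give smaller residual peaks that come from the weak correlation between selection functions mentioned in the text.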

CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES

The power consumption of traditional standard cells and logic is dependent on the signal activity. When the output of a logic gate makes a 0 to 1 transition, a current comes from the power supply and charges the output capacitance. On the other hand, when the output sees a 1 to 0, a 0 to 0, or a 1 to 1 transition, no energy or only a limited amount of energy (due to short circuit or leakage) is consumed from the power supply. This is the fundamental reason why information is leaked through the power supply and why power attacks are possible. The basis of a secure digital design flow is a logic style with constant power consumption.
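The data dependence described above can be made concrete with a simplified energy model. The sketch below assumes that only a 0 to 1 output transition draws energy C * Vdd^2 from the supply and ignores short-circuit and leakage currents; the function name and the unit values are illustrative assumptions.

```python
def supply_energy(outputs, c_load=1.0, vdd=1.0):
    """Energy drawn from the supply over a sequence of gate output values.

    Simplified model: only a 0 -> 1 transition draws C * Vdd^2 from the supply;
    short-circuit and leakage currents are ignored.
    """
    energy = 0.0
    for previous, current in zip(outputs, outputs[1:]):
        if previous == 0 and current == 1:
            energy += c_load * vdd ** 2
    return energy


# Two output sequences of equal length draw different amounts of energy,
# so the supply current reveals something about the data being processed.
print(supply_energy([0, 1, 0, 1, 0, 1]))   # three charging events
print(supply_energy([1, 1, 1, 0, 0, 0]))   # no charging event
```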

6.1 Current Mode Logic

Current mode logic (CML), e.g. current steering logic, seems the ideal solution. This type of logic continuously draws a current from the supply and measures its state through the path that the current takes. A gate has constant power consumption if it draws a perfectly constant current from the power supply, independently of the input and output signals. To build a current source capable of generating a constant current, special circuit techniques that minimize channel length modulation have to be used.

The decisive drawback of CML, however, is its static power consumption. Even when the logic gate is not processing any data, it burns current, which makes this logic style unacceptable for embedded battery-operated devices.

6.2 Voltage Mode Logic (CMOS circuit styles)

Voltage mode logic (VML), e.g. static CMOS logic, only draws a current from the supply to change state and measures its state by the amount of charge it stores on a capacitance. A regular standard CMOS circuit will only consume power when a capacitance gets charged and later discharged, i.e. when a gate switches state. This is the main reason that CMOS is the style of choice for every battery-operated or low-power device, as illustrated in the figure below for a simple inverter. Thus static CMOS is the preferred logic style because of its low power consumption and high noise margins.

Figure: Standard CMOS inverter

Yet two conditions must be satisfied for VML to have constant power consumption, namely:

1) A logic gate must have exactly one switching event per signal transition
2) The logic gate must charge a constant capacitance in that switching event

As the inverter figure above shows, all four transitions of the CMOS inverter can be distinguished when monitoring the power supply.

6.3 Dynamic Differential Logic

Dynamic differential logic, sometimes also referred to as dual-rail with pre-charge logic, fulfills the first condition. A differential logic family uses the true and the false representation of the input and output signals, and a dynamic logic family alternates pre-charge and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1, exactly one of the two output nodes evaluates to 0 in the evaluation phase to produce a differential output signal. The discharged output node is charged back to 1 in the following pre-charge phase, so that both outputs are again pre-charged to 1. In other words, every signal transition, including the events in which the input signals remain constant, is represented by an actual switching event in which the logic gate charges a capacitance. The logic families that have been introduced to thwart differential power analysis (DPA) by using dynamic differential logic include the following:

1. Sense Amplifier Based Logic (SABL) and
2. Wave Dynamic Differential Logic (WDDL) gates
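The effect of combining differential and dynamic operation can be checked with a small counting model. The sketch below is a behavioural abstraction (one output, ideal rails, hypothetical function names): it counts charging events for a single-rail static CMOS output and for a dual-rail output that is pre-charged to 1 in every cycle.

```python
def charging_events_single_rail(bits, initial=0):
    """Static CMOS: a charging event happens only on a 0 -> 1 output transition."""
    events, previous = 0, initial
    for bit in bits:
        events += int(previous == 0 and bit == 1)
        previous = bit
    return events


def charging_events_dual_rail(bits):
    """Dual rail with pre-charge: count the recharging events cycle by cycle."""
    events = 0
    for bit in bits:
        # Evaluation phase: both rails start at 1 and exactly one discharges.
        true_rail, false_rail = (1, 0) if bit else (0, 1)
        # Pre-charge phase: every discharged rail is charged back to 1.
        events += int(true_rail == 0) + int(false_rail == 0)
    return events


for data in ([0, 0, 0, 0], [1, 0, 1, 1]):
    print(data, charging_events_single_rail(data), charging_events_dual_rail(data))
```

The single-rail count depends on the data, while the dual-rail count always equals the number of cycles. This covers only the first condition; whether each event charges the same capacitance is a property of the circuit and layout that this model does not capture.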

6.3.1 Sense Amplifier Based Logic (SABL)

The main advantage of SABL is that it has balanced input and output nodes and that all internal nodes connect to an output. The output capacitances can be balanced. Systematic methods have been developed to make sure that both branches of the differential pull-down network are balanced and that no memory effects are present in the network. Sense Amplifier Based Logic is illustrated in the following figure.

Figure: Sense Amplifier Based Logic AND/NAND gate

This circuit style does, however, require full custom characterization and layout. It also suffers from a high clock load, common to all dynamic logic gates.

6.3.2 Wave Dynamic Differential Logic Gates (WDDL)

WDDL can be implemented with static CMOS logic. Static CMOS standard cells are combined to form secure compound standard cells, which have a reduced power signature. WDDL has many advantages. It can be readily implemented from an existing standard cell library. The design flow is fully supported with accurate EDA library files that come directly from the vendor. WDDL also results in a dynamic differential logic with only a small load capacitance on the pre-charge control signal, and with the low power consumption and the high noise margins of static CMOS.

Advantages of the WDDL logic style are as follows:

A major advantage of the proposed logic style is that it can be incorporated into the common Electronic Design Automation (EDA) tool flow.

No special design rules are involved in the interconnection of WDDL gates.

The switching factor of WDDL is 100%.

A WDDL gate consists of a parallel combination of two positive complementary gates, one calculating the true output using the true inputs, the other the false output using the false inputs. A positive gate produces a zero output for an all-zero input. The AND gate and the OR gate are examples of positive gates. A complementary gate, sometimes also referred to as a dual gate, expresses the false output of the original logic gate using the false inputs of the original gate. The AND gate fed with true input signals and the OR gate fed with false input signals are two dual gates. The figure shows the WDDL AND gate and the WDDL OR gate. In the evaluation phase, each input signal is differential and the WDDL gate calculates its differential output. In the pre-charge phase, the inputs to the WDDL gate are set at 0. This puts the output of the gate at 0. A module in WDDL pre-charges without distributing the pre-charge signal to each individual gate. During the pre-charge phase, the input vector of the combinatorial logic is set at all 0s. Each individual gate will eventually have all its inputs at 0, evaluate its output to 0, and pass this 0 value to the next gate. One could say that the pre-charge signal travels over the combinatorial logic as a 0-wave, hence WDDL. There are several ways to launch the pre-charge wave. In the pre-charge wave generation figure, a pre-charge operator is inserted at the start of every combinatorial logic tree, i.e. the inputs of the encryption module and the outputs of the registers. They produce an all-zero output in the pre-charge phase (clk-signal high), but let the differential signal through during the evaluation phase (clk-signal low).
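The gate construction and the 0-wave described above can be modelled at the logic level. The sketch below represents each signal as a (true rail, false rail) pair; the function names and the small example module are illustrative assumptions, and timing and capacitances are not modelled.

```python
def encode(bit):
    """Differential encoding of a logic value: (true rail, false rail)."""
    return (bit, 1 - bit)


def precharge_operator(signal, clk):
    """Force both rails to 0 while clk is high, pass the signal while clk is low."""
    return (0, 0) if clk else signal


def wddl_and(a, b):
    """AND on the true rails in parallel with its dual (OR) on the false rails."""
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a, b):
    """OR on the true rails in parallel with its dual (AND) on the false rails."""
    return (a[0] | b[0], a[1] & b[1])


def example_module(a, b, c, clk):
    """(a AND b) OR c, with pre-charge operators only at the module inputs."""
    a, b, c = (precharge_operator(encode(x), clk) for x in (a, b, c))
    return wddl_or(wddl_and(a, b), c)


print(example_module(1, 1, 0, clk=1))   # pre-charge phase
print(example_module(1, 1, 0, clk=0))   # evaluation phase
```

Because AND and OR are positive gates, all-zero inputs give all-zero outputs, so the pre-charge value reaches the module output without any gate receiving the clock. In the evaluation phase exactly one rail of every signal rises, which is the 100% switching factor mentioned above.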

Figure: WDDL pre-charge wave generation

CHAPTER 7 WDDL GATES

The methodology used in the project is the bottom-up approach. Lower modules are designed and later integrated to form larger modules, whose further integration leads to the final top module. Since logic gates form the lowest-level modules, the logic gates required for the design are first implemented in WDDL style. WDDL demands a parallel combination of two positive complementary gates, one calculating the true value and the other the false value. Logic gates such as OR, AND, and XOR have been implemented, as well as a full adder, a 32-bit XOR, etc.

7.1 WDDL OR gate

A WDDL OR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.

Figure 4.1 WDDL OR Gate

The NAND and NOT gates used act as the Precharge operator, injecting the signal '0' into the gates when the 'clk' signal is high, i.e. during the Precharge phase.

7.2 WDDL AND gate

A WDDL AND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.

Figure 4.2 WDDL AND Gate

7.3 WDDL NAND gate

A WDDL NAND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.

Figure 4.3 WDDL NAND Gate

7.4 WDDL NOR gate

A WDDL NOR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.

Figure 4.4 WDDL NOR Gate

7.5 WDDL XOR gate

The XOR function can be implemented by the Boolean equation A ⊕ B = AB' + A'B. Therefore the XOR gate is implemented in terms of AND gates and OR gates, as shown in Figure 4.5. It can also be implemented by instantiating a WDDL AND gate and a WDDL OR gate, but the number of gates involved in the latter is greater than in the former. Therefore the first method of implementation is followed rather than the second.
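The Boolean equation above maps onto differential signals without any inverter, because the complement of each input is already available on its false rail. The following logic-level sketch assumes each signal is a (true rail, false rail) pair; the names are hypothetical and the netlist is one possible arrangement, not necessarily the one in the report's figure.

```python
def encode(bit):
    """Differential encoding of a logic value: (true rail, false rail)."""
    return (bit, 1 - bit)


def wddl_xor(a, b):
    """XOR built from positive gates only; a and b are (true, false) rail pairs."""
    a_t, a_f = a
    b_t, b_f = b
    true_out = (a_t & b_f) | (a_f & b_t)     # AB' + A'B on the true side
    false_out = (a_f | b_t) & (a_t | b_f)    # dual network fed with the opposite rails
    return (true_out, false_out)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, wddl_xor(encode(a), encode(b)))
```

With all rails at 0 both outputs are 0, so this gate also passes the pre-charge wave.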

Figure 4.5 WDDL XOR gate

With the help of the above basic gates, a full adder circuit has been designed by instantiating the WDDL gates designed above. During the implementation of the Blowfish algorithm, a 32-bit XOR gate and a 32-bit adder circuit are required. They can be easily implemented by instantiating the corresponding lower module 32 times.

CHAPTER 8 FRONT END RESULTS

WDDL OR GATE

Synthesis Report

===========================================================
                       Final Report
===========================================================
Final Results
RTL Top Level Output File Name     : wddlor.ngr
Top Level Output File Name         : wddlor
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
IOs                                : 5

Cell Usage
BELS                               : 2
    LUT3                           : 2
IO Buffers                         : 5
    IBUF                           : 3
    OBUF                           : 2
===========================================================

Device utilization summary
---------------------------
Selected Device : 3s250eft256-4

Number of Slices                   : 1 out of 2448    0%
Number of 4 input LUTs             : 2 out of 4896    0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 172     2%

Timing Summary
---------------
Speed Grade                        : -4
Maximum combinational path delay   : 6.236ns

Simulation Result

Synthesis Result

WDDL AND GATE

Synthesis Report

===========================================================
                       Final Report
===========================================================
Final Results
RTL Top Level Output File Name     : wddlgates.ngr
Top Level Output File Name         : wddlgates
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
IOs                                : 5

Cell Usage
BELS                               : 2
    LUT3                           : 2
IO Buffers                         : 5
    IBUF                           : 3
    OBUF                           : 2
===========================================================

Device utilization summary
---------------------------
Selected Device : 3s250etq144-4

Number of Slices                   : 1 out of 2448    0%
Number of 4 input LUTs             : 2 out of 4896    0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 108     4%

Timing Summary
---------------
Speed Grade                        : -4
Maximum combinational path delay   : 6.236ns

Simulation Result

Synthesis Result

WDDL NAND GATE

Synthesis Report

===========================================================
                       Final Report
===========================================================
Final Results
RTL Top Level Output File Name     : wddlnand1.ngr
Top Level Output File Name         : wddlnand1
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
IOs                                : 5

Cell Usage
BELS                               : 2
    LUT3                           : 2
IO Buffers                         : 5
    IBUF                           : 3
    OBUF                           : 2
===========================================================

Device utilization summary
Selected Device : 3s500efg320-4

Number of Slices                   : 1 out of 4656    0%
Number of 4 input LUTs             : 2 out of 9312    0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 232     2%

Timing Summary
Speed Grade                        : -4
Maximum combinational path delay   : 6.236ns

Simulation Result

Synthesis Result

WDDL XOR GATE

Simulation Result

Synthesis Result

WDDL XOR GATE

Synthesis Report

===========================================================
                       Final Report
===========================================================
Final Results
RTL Top Level Output File Name     : wddlxorgate.ngr
Top Level Output File Name         : wddlxorgate
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
IOs                                : 5

Cell Usage
BELS                               : 2
    LUT3                           : 2
IO Buffers                         : 5
    IBUF                           : 3
    OBUF                           : 2
===========================================================

Device utilization summary
---------------------------
Selected Device : 3s250eft256-4

Number of Slices                   : 1 out of 2448    0%
Number of 4 input LUTs             : 2 out of 4896    0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 172     2%

Timing Summary
---------------
Speed Grade                        : -4
Maximum combinational path delay   : 6.236ns

Simulation Result

Synthesis Result

CHAPTER 9 SUMMARY AND CONCLUSION

9.1 Summary

In order to secure ICs against side-channel attacks, especially Differential Power Analysis (DPA), it is necessary to implement the design in a logic style that yields constant power dissipation irrespective of the input combination. WDDL has proved advantageous over other styles and is therefore of great significance. In this dissertation work, an architecture for the Blowfish algorithm is designed and implemented in WDDL style. In this implementation, a bottom-up approach is used: the low-level entities are designed first and later combined to form the entire module. The key scheduling is online. The sub-keys generated for a particular key can be used for the encryption of all the data to be encrypted with that key. The sub-keys are supplied in reverse order for the decryption data path. Initially, logic gates are implemented in WDDL, and then higher modules are designed by instantiating the WDDL gates to form the entire module, resulting in constant power dissipation irrespective of the input data combination. The entire design works in two phases, namely the Precharge phase and the Evaluation phase. In the Precharge phase, all the signals of the design are zeroed, and during the Evaluation phase the functionality of the design is achieved. This sort of design has been found simple and very effective in thwarting the side-channel attack known as Differential Power Analysis (DPA).

9.2 Conclusion

The crypto processor has been designed for a key size of 448 bits and a plaintext of 64 bits. The code for the implementation has been written in VHDL. Functional verification has been done using the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The back end of the design is done using SOC Encounter.

According to the specifications, the desired functionality has been achieved. In the output, during the Evaluation phase, there has been the same number of transitions, thus resulting in constant power dissipation. During synthesis, it has been observed that a simple WDDL gate comprises many conventional gates. Therefore the area of the design has grown nearly three-fold compared to the design implemented in conventional CMOS logic; this is the cost of the security incorporated into the IC against Differential Power Analysis (DPA). Due to the constant power dissipation at the output, an attacker cannot apply the Differential Power Analysis (DPA) scheme to find the secret key that is being used in the crypto-processor. Thus security against DPA is incorporated into the IC at hardware level by implementing the design in WDDL style

which is quite simple and effective.


– Subprogram declarations and bodies
– Type and subtype declarations
– Deferred constants
– File declarations

Libraries

Collection of VHDL design units (database):

1. Packages (package declaration, package body)
2. Entities (entity declaration)
3. Architectures (architecture body)
4. Configurations (configuration declarations)

A library is usually a directory in the UNIX file system, but it can also be any other kind of database.

Levels of Abstraction

VHDL supports many possible styles of design description, which differ primarily in how closely they relate to the hardware. It is possible to describe a circuit in a number of ways, listed here from the lowest to the highest level of abstraction:

Structural
Dataflow
Behavioral

Structural VHDL Description

The circuit is described in terms of its components, anywhere from a low-level description (e.g. transistor level) to a high-level description (e.g. block diagram). For large circuits, low-level descriptions quickly become impractical.

Dataflow VHDL Description

The circuit is described in terms of how data moves through the system. In the dataflow style you describe how information flows between registers in the system. The combinational logic is described at a relatively high level, while the placement and operation of registers is specified quite precisely. The behavior of the system over time is defined by the registers.

There are no built-in registers in the VHDL language, so either a lower-level description or a behavioral description of the sequential elements is needed. The lower-level register descriptions must be created or obtained; if there are no third-party models for registers, you must write the behavioral description of the registers yourself. The behavioral description can be provided in the form of subprograms (functions or procedures).

Behavioral VHDL Description

The circuit is described in terms of its operation over time. The representation might include, e.g., state diagrams, timing diagrams and algorithmic descriptions. The concept of time may be expressed precisely using delays (e.g. A <= B after 10 ns). If no actual delays are used, only the order of the sequential operations is defined.

At the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed timing specifications; the actual timing results depend on the implementation technology and on the efficiency of the synthesis tool. There are only a few tools for behavioral synthesis.

Concurrent vs. Sequential

Processes

The process is the basic simulation concept in VHDL: a VHDL description can always be broken up into interconnected processes, quite similar to UNIX processes.

Process keyword in VHDL

– The process statement is a concurrent statement.
– Statements inside a process statement are sequential statements.
– A process must contain either a sensitivity list or wait statement(s), but NOT both.
– The sensitivity list or the wait statement(s) contain the signals that wake the process up.

General Format

process [(sensitivity_list)]
    process_declarative_part
begin
    process_statements
    [wait_statement]
end process;
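The wake-up rule for processes can be mimicked in a few lines of Python. This is only an illustrative sketch, not VHDL and not how a real simulator is built: the `Simulator` class, its `process` decorator and its `assign` method are hypothetical names invented here, and delta cycles are approximated by repeating until no signal changes.

```python
class Simulator:
    """Toy event-driven kernel (hypothetical helper, not a real VHDL simulator)."""

    def __init__(self, signals):
        self.signals = dict(signals)      # current value of every signal
        self.processes = []               # (sensitivity set, function) pairs

    def process(self, *sensitivity):
        """Register a function that wakes up when a listed signal changes."""
        def register(fn):
            self.processes.append((set(sensitivity), fn))
            return fn
        return register

    def assign(self, **updates):
        """Apply signal updates, then run delta cycles until nothing changes."""
        pending = dict(updates)
        while pending:
            changed = {name for name, value in pending.items()
                       if self.signals[name] != value}
            self.signals.update(pending)
            pending = {}
            for sensitivity, fn in self.processes:
                if sensitivity & changed:
                    # The body of fn runs sequentially; the values it drives
                    # only become visible in the next delta cycle.
                    pending.update(fn(self.signals))


sim = Simulator({"a": 0, "b": 0, "y": 0, "z": 1})

@sim.process("a", "b")          # plays the role of the sensitivity list (a, b)
def and_gate(s):
    return {"y": s["a"] & s["b"]}

@sim.process("y")               # plays the role of the sensitivity list (y)
def inverter(s):
    return {"z": 1 - s["y"]}

sim.assign(a=1, b=1)
print(sim.signals)
```

The two registered functions behave like concurrent processes: neither calls the other, and the inverter only runs because the signal `y` in its sensitivity list changed.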

CHAPTER 4 SMART CARD OVERVIEW

This section very briefly introduces the concept of a smart card. Basically, a smart card is a computer embedded in a safe. It consists of a (typically 8-bit or 32-bit) processor together with ROM, EEPROM and a small amount of RAM, and is therefore capable of performing computations. The main goal of a smart card is to allow the execution of cryptographic operations involving some secret parameter (the key) while not revealing this parameter to the outside world. Conversely, the goal of the attacker is to recover this secret parameter. The processor is embedded in a chip and connected to the outside world through eight wires whose role, use and position are standardized. In addition to the input/output wires, the parts we are most interested in are the following:

1. Power supply: Smart cards do not have an internal battery. The current they need is provided by the smart card reader. This makes the smart card's power consumption fairly easy for the attacker to measure.

2. Clock: Similarly, smart cards do not have an internal clock either. The clock ticks must also be provided from the outside world. As a consequence, the attacker can measure the card's running time with very good precision.

Smart cards are usually equipped with protection mechanisms composed of a shield (the passivation layer), whose goal is to hide the internal behavior of the chip, and possibly sensors that react when the shield is removed by destroying all sensitive data and preventing the card from functioning properly.

CHAPTER 5 SIDE CHANNEL ATTACKS

"Side channel attacks" are attacks that are based on "side channel information". Side channel information is information that can be retrieved from the encryption device that is neither the plaintext to be encrypted nor the ciphertext resulting from the encryption process.

In the past, an encryption device was perceived as a unit that receives plaintext input and produces ciphertext output, and vice versa. Attacks were therefore based on knowing the ciphertext (ciphertext-only attacks), on knowing both plaintext and ciphertext (known-plaintext attacks), or on the ability to define what plaintext is to be encrypted and then seeing the results of the encryption (chosen-plaintext attacks). Today it is known that encryption devices have additional outputs, and often additional inputs, which are neither the plaintext nor the ciphertext.

Encryption devices produce timing information (information about the time that operations take) that is easily measurable, radiation of various sorts, power consumption statistics (that can be easily measured as well), and more. Often the encryption device also has additional "unintentional" inputs, such as voltage, that can be modified to cause predictable outcomes. Side channel attacks make use of some or all of this information, along with other (known) cryptanalytic techniques, to recover the key the device is using.

Side channel analysis techniques are of concern because the attacks can be mounted quickly and can sometimes be implemented using readily available hardware costing from only a few hundred dollars to thousands of dollars.

5.1 Classification of side channel attacks

The literature usually classifies side channel attacks along two orthogonal axes.

1. Invasive vs. non-invasive

Invasive attacks require de-packaging the chip to get direct access to its components. A typical example is connecting a wire to a data bus to observe the data transfers.

A non-invasive attack only exploits externally available information (the emission of which is, however, often unintentional), such as running time or power consumption.

A further distinction is the class of semi-invasive attacks. These attacks require de-packaging the chip to get access to the chip surface, but they do not tamper with the passivation layer (they do not require electrical contact with the metal surface).

2. Active vs. passive

Active attacks try to tamper with the card's proper functioning. For example, fault induction attacks try to induce errors in the computation.

Passive attacks, by contrast, simply observe the card's behavior during its processing, without disturbing it.

Note that these two axes are indeed orthogonal: an invasive attack may completely avoid disturbing the card's behavior, and a passive attack may require a preliminary de-packaging for the required information to be observable. These attacks are of course not mutually exclusive: an invasive attack may, for example, serve as a preliminary step for a non-invasive one by giving a detailed description of the chip's architecture that helps to find out where to put external probes.

Smart cards are usually equipped with protection mechanisms that are supposed to react to invasive attacks (although several invasive attacks are nonetheless capable of defeating these mechanisms). On the other hand, a non-invasive attack is completely undetectable: there is, for example, no way for a smart card to figure out that its running time is currently being measured. Other countermeasures are therefore necessary. From an economic point of view, invasive attacks are usually more expensive to deploy on a large scale, since they require individual processing of each attacked device. In this sense, non-invasive attacks constitute a bigger menace for the smart card industry.

Invasive attacks involve a relatively high capital investment for lab equipment plus a moderate investment of effort for each individual chip attacked. Non-invasive attacks require only a moderate capital investment plus a moderate investment of effort in designing an attack on a particular type of device; thereafter the cost per device attacked is low. Semi-invasive attacks can be carried out using very cheap and simple equipment.

The attacker can gain information by:

1. Probing attacks
2. Fault induction attacks
3. Timing attacks
4. Power analysis attacks
5. Electromagnetic timing attacks

These attacks are performed while the digital complementary metal-oxide-semiconductor (CMOS) gates are switching. Of all these, the power analysis attack is of major concern.

5.2 Power analysis attacks

The power consumption of a cryptographic device may provide much information about the operations that take place and the parameters involved. This is the idea of simple and differential power analysis, first introduced by Kocher et al. Like the clock ticks, the card's energy is provided by the terminal and can therefore easily be measured. Basically, to measure a circuit's power consumption, a small (e.g. 50 ohm) resistor is inserted in series with the power or ground input. The voltage difference across the resistor divided by the resistance yields the current. Well-equipped electronics labs have equipment that can digitally sample voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than 1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC can be bought for less than US$ 400.
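The measurement described above is plain Ohm's law, which the following sketch applies to a list of sampled voltages. The function names, the sample values and the 5 V supply are assumptions made for illustration only; the text specifies just the 50 ohm series resistor.

```python
def current_trace(shunt_volts, r_shunt=50.0):
    """Supply current in amperes: I = V_shunt / R_shunt for each sample."""
    return [v / r_shunt for v in shunt_volts]

def power_trace(shunt_volts, v_supply=5.0, r_shunt=50.0):
    """Power delivered to the card in watts: I * (V_supply - V_shunt).

    v_supply = 5.0 V is an assumed value, not taken from the report.
    """
    return [(v / r_shunt) * (v_supply - v) for v in shunt_volts]

samples = [0.05, 0.10, 0.25]      # made-up voltages measured across the resistor
print(current_trace(samples))
print(power_trace(samples))
```

A real acquisition would deliver millions of such samples per second, but the conversion from voltage to current stays the same for each one.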

Power analysis attacks are of two types:

1. Simple Power Analysis (SPA) attack
2. Differential Power Analysis (DPA) attack

SPA attacks on smart cards typically take a few seconds per card, while DPA attacks can take several hours. In general, and with a somewhat academic perspective, we may consider the entire internal state of the block cipher to be all the intermediate results and values that are never included in the output in normal operation. For example, DES has 16 rounds; we can consider the intermediate states state[1..15] after each round except the last as a secret internal state. Side channels typically give information about these internal states, or about the operations used in the transition of this internal state from one round to another. The type of side channel will of course determine what information is available to the attacker about these states. The attacks typically work by finding some information about the internal state of the cipher which can be learned both by guessing part of the key and checking the value directly, and additionally by some statistical property of the cipher that makes that checkable value slightly non-random.

5.2.1 Simple Power Analysis attack (SPA)

Simple Power Analysis is a technique that involves direct interpretation, generally by visual inspection, of power consumption measurements collected while a cryptographic operation is being performed. SPA can yield information about a device's operation as well as key material.

A trace refers to a set of power consumption measurements taken across a cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a trace containing 5000 points. The figure, for example, shows an SPA trace from a smart card performing a DES operation.

Figure: SPA monitoring from a single DES operation performed by a typical smart card. The upper trace shows the entire encryption operation, including the initial permutation, the 16 DES rounds and the final permutation. The lower trace is a detailed view of the second and third rounds.

Because SPA can reveal the sequence of instructions executed, it can be used to break cryptographic implementations in which the execution path depends on the data being processed. For example:

DES key schedule: The DES key schedule computation involves rotating 28-bit key registers. A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will contain different SPA features if the execution paths take different branches for each.

DES permutations: DES implementations perform a variety of bit permutations. Conditional branching in software or microcode can cause significant power consumption differences for "0" and "1" bits.

Comparisons: String or memory comparison operations typically perform a conditional branch when a mismatch is found. This conditional branching causes large SPA (and sometimes timing) characteristics.

Multipliers: Modular multiplication circuits tend to leak a great deal of information about the data they process. The leakage functions depend on the multiplier design, but are often strongly correlated to operand values and Hamming weights.

Exponentiators: A simple modular exponentiation function scans across the exponent, performing a squaring operation in every iteration with an additional multiplication operation for each exponent bit that is equal to "1". The exponent can be compromised if squaring and multiplication operations have different power consumption characteristics, take different amounts of time, or are separated by different code. Modular exponentiation functions that operate on two or more exponent bits at a time may have more complex leakage functions.
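The exponentiator case can be made concrete with a small sketch. It assumes, purely for illustration, that an attacker can tell squarings ("S") from multiplications ("M") in a trace; the operation log below stands in for that visual inspection, and both function names are hypothetical.

```python
def modexp_with_ops(base, exponent, modulus):
    """Left-to-right square-and-multiply that also logs each operation."""
    result, ops = 1, []
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus      # squaring in every iteration
        ops.append("S")
        if bit == "1":
            result = (result * base) % modulus    # extra multiply for a 1 bit
            ops.append("M")
    return result, ops

def recover_exponent(ops):
    """Rebuild the exponent from the S/M sequence an SPA trace would expose."""
    bits = []
    for op in ops:
        if op == "S":
            bits.append("0")      # assume a 0 bit until an M follows
        else:
            bits[-1] = "1"        # an M always follows the S of a 1 bit
    return int("".join(bits), 2)

value, ops = modexp_with_ops(7, 0b101101, 1009)
print("".join(ops), bin(recover_exponent(ops)))
```

The secret exponent is read directly off the operation sequence, which is why implementations that branch on key bits are vulnerable even to a single trace.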

5.2.2 Differential Power Analysis attack (DPA)

In addition to large-scale power variations due to the instruction sequence, there are effects correlated to the data values being manipulated. These variations tend to be smaller and are sometimes overshadowed by measurement errors and other noise. In such cases it is still often possible to break the system using statistical functions tailored to the target algorithm.

To implement the DPA attack, an attacker first observes $m$ encryption operations and captures power traces $T_{1..m}[1..k]$ containing $k$ samples each. In addition, the attacker records the ciphertexts $C_{1..m}$. No knowledge of the plaintext is required. DPA uses power consumption measurements to determine whether a key block guess $K_s$ is correct. A selection function $D(C_i, b, K_s)$ computes, from ciphertext $C_i$ and the guess $K_s$, the predicted value $V$ of an intermediate target bit $b$. The attacker computes a $k$-sample differential trace $\Delta_D[1..k]$ by finding the difference between the average of the traces for which $V$ is one and the average of the traces for which $V$ is zero. Thus $\Delta_D[j]$ is the average over $C_{1..m}$ of the effect due to the value represented by the selection function $D$ on the power consumption at point $j$. In particular,

\[ \Delta_D[j] = \frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}{\sum_{i=1}^{m} D(C_i, b, K_s)} - \frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)} \]

If $K_s$ is incorrect, the bit computed using $D$ will differ from the actual target bit for about half of the ciphertexts $C_i$. The selection function is thus effectively uncorrelated to what was actually computed by the target device. If a random function is used to divide a set into two subsets, the difference in the averages of the subsets should approach zero as the subset sizes approach infinity. Thus, trace components uncorrelated to $D$ will diminish with $1/\sqrt{m}$, causing the differential trace to become flat (the actual trace may not be completely flat, as $D$ with $K_s$ incorrect may have a weak correlation to $D$ with the correct $K_s$).

If $K_s$ is correct, however, the computed value for $D(C_i, b, K_s)$ will equal the actual value of target bit $b$ with probability 1. The selection function is thus correlated to the value of the bit considered. Other data values, measurement errors, etc. that are not correlated to $D$ approach zero. Because power consumption is correlated to data bit values, the plot of $\Delta_D$ will be flat with spikes in regions where $D$ is correlated to the values being processed. The correct value of $K_s$ can thus be identified from the spikes in its differential trace.

For DES, four values of $b$ correspond to each S-box, providing confirmation of key block guesses. Finding all eight $K_s$ yields the entire 48-bit round subkey. The remaining 8 key bits can be found easily using exhaustive search or by analyzing one additional round. Triple DES keys can be found by analyzing an outer DES operation first, using the resulting key to decrypt the ciphertexts, and attacking the next DES key. DPA can use known plaintext or known ciphertext and can find encryption or decryption keys.
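The difference-of-means procedure can be tried on a deliberately small simulated target. The sketch below is not an attack on DES and rests on several assumptions chosen for brevity: a 4-bit key block, the 4-bit S-box of the PRESENT cipher as a stand-in nonlinear function, traces of a single sample (k = 1), known input data in place of ciphertexts, and a leakage model in which the target bit contributes one unit of power on top of Gaussian noise. All function names are hypothetical.

```python
import random

# 4-bit S-box of the PRESENT cipher, used only as a convenient nonlinear toy.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
TARGET_BIT = 3

def selection(data, key_guess):
    """Selection function D: predicted target bit for a key-block guess."""
    return (SBOX[data ^ key_guess] >> TARGET_BIT) & 1

def capture_traces(secret_key, m, noise=1.0, rng=None):
    """Simulate m one-sample traces: target-bit contribution plus noise."""
    rng = rng or random.Random(0)
    data = [rng.randrange(16) for _ in range(m)]
    traces = [selection(d, secret_key) + rng.gauss(0.0, noise) for d in data]
    return data, traces

def differential(data, traces, key_guess):
    """Mean of the traces with D = 1 minus mean of the traces with D = 0."""
    ones = [t for d, t in zip(data, traces) if selection(d, key_guess)]
    zeros = [t for d, t in zip(data, traces) if not selection(d, key_guess)]
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)

def recover_key(data, traces):
    """Return the guess whose differential has the largest magnitude."""
    return max(range(16), key=lambda g: abs(differential(data, traces, g)))

data, traces = capture_traces(secret_key=0x9, m=4000)
print(hex(recover_key(data, traces)))
```

For the correct guess the two averages differ by about one unit, while for wrong guesses the noise averages out and only a weaker residual correlation remains, mirroring the flat trace with spikes described above.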

CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES

The power consumption of traditional standard cells and logic depends on the signal activity. When the output of a logic gate makes a 0 to 1 transition, a current flows from the power supply and charges the output capacitance. On the other hand, when the output makes a 1 to 0, a 0 to 0, or a 1 to 1 transition, no energy or only a limited amount of energy (due to short-circuit or leakage current) is drawn from the power supply. This is the fundamental reason why information is leaked through the power supply and why power attacks are possible. The basis of a secure digital design flow is therefore a logic style with constant power consumption.
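The data dependence described above can be illustrated by counting the charging events of a single static CMOS output. The sketch uses a simplified model that assumes only 0 to 1 transitions draw charge and ignores short-circuit and leakage currents; the function name is hypothetical.

```python
def charging_events(outputs, initial=0):
    """Count 0 -> 1 output transitions, the events that draw supply charge."""
    count, previous = 0, initial
    for value in outputs:
        if previous == 0 and value == 1:
            count += 1
        previous = value
    return count

print(charging_events([0, 1, 0, 1, 0, 1]))   # alternating data
print(charging_events([1, 1, 1, 1, 1, 1]))   # constant data
```

Two output sequences of the same length draw different amounts of charge, and that difference is what a power attack measures.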

6.1 Current Mode Logic

Current mode logic (CML), e.g. current steering logic, seems the ideal solution. This type of logic continuously draws a current from the supply and measures its state through the path that the current takes. A gate has constant power consumption if it draws a perfectly constant current from the power supply, independently of the input and output signals. To build a current source capable of generating a constant current, special circuit techniques that minimize channel length modulation have to be used.

The decisive drawback of CML, however, is its static power consumption. Even when the logic gate is not processing any data, it burns current, which makes this logic style unacceptable for embedded, battery-operated devices.

6.2 Voltage Mode Logic (CMOS circuit styles)

Voltage mode logic (VML), e.g. static CMOS logic, only draws a current from the supply to change state, and measures its state by the amount of charge it stores on a capacitance. A regular standard CMOS circuit only consumes power when a capacitance gets charged and later discharged, i.e. when a gate switches state. This is the main reason that CMOS is the style of choice for every battery-operated or low-power device, and it is illustrated in the figure below for a simple inverter. Static CMOS is thus the preferred logic style because of its low power consumption and high noise margins.

Figure: Standard CMOS inverter

Yet two conditions must be satisfied for VML to have constant power consumption, namely:

1) A logic gate must have exactly one switching event per signal transition.
2) The logic gate must charge a constant capacitance in that switching event.

Static CMOS does not satisfy these conditions: all four transitions of the CMOS inverter shown above can be distinguished when monitoring the power supply.

6.3 Dynamic Differential Logic

Dynamic differential logic, sometimes also referred to as dual-rail logic with pre-charge, fulfills the first condition. A differential logic family uses the true and the false representation of the input and output signals, and a dynamic logic family alternates pre-charge and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1, exactly one of the two output nodes evaluates to 0 in the evaluation phase to produce a differential output signal. The discharged output node is charged back to 1 in the following pre-charge phase, so that both outputs are again pre-charged to 1. In other words, every signal transition, including the events in which the input signals remain constant, is represented by an actual switching event in which the logic gate charges a capacitance. The logic families that have been introduced to thwart differential power analysis (DPA) using dynamic differential logic include:

1. Sense Amplifier Based Logic (SABL)
2. Wave Dynamic Differential Logic (WDDL)
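The claim that a dual-rail, pre-charged gate has exactly one switching event per cycle can be checked with a very small behavioral model. The sketch assumes ideal rails pre-charged to 1, as in the description above, and its function names are hypothetical.

```python
def dual_rail_cycle(bit):
    """One pre-charge/evaluation cycle of a dual-rail output pair."""
    true_rail, false_rail = 1, 1              # pre-charge phase: both rails at 1
    if bit:
        false_rail = 0                        # evaluation: exactly one rail discharges
    else:
        true_rail = 0
    # Rails that must be charged back to 1 in the next pre-charge phase.
    recharged = (1 - true_rail) + (1 - false_rail)
    return (true_rail, false_rail), recharged

def charging_events_per_cycle(bits):
    """Number of rails recharged in each cycle for a sequence of data bits."""
    return [dual_rail_cycle(b)[1] for b in bits]

print(charging_events_per_cycle([0, 1, 1, 0, 1]))
```

Whatever the data sequence, one rail is recharged in every cycle, so the number of charging events no longer depends on the data. The second condition, equal capacitance on both rails, is a layout matter that this model does not capture.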

6.3.1 Sense Amplifier Based Logic (SABL)

The main advantage of SABL is that it has balanced input and output nodes and that all internal nodes connect to an output. The output capacitances can be balanced. Systematic methods have been developed to make sure that both branches of the differential pull-down network are balanced and that no memory effects are present in the network. Sense Amplifier Based Logic is illustrated below.

Figure: Sense Amplifier Based Logic AND/NAND gate

This circuit style does, however, require full-custom characterization and layout. It also suffers from a high clock load, which is common to all dynamic logic gates.

6.3.2 Wave Dynamic Differential Logic (WDDL) Gates

WDDL can be implemented with static CMOS logic: static CMOS standard cells are combined to form secure compound standard cells which have a reduced power signature. WDDL has many advantages. It can be readily implemented from an existing standard cell library. The design flow is fully supported with accurate EDA library files that come directly from the vendor. WDDL also results in a dynamic differential logic with only a small load capacitance on the pre-charge control signal, and with the low power consumption and the high noise margins of static CMOS.

Advantages of the WDDL logic style are as follows:

– A major advantage of the proposed logic style is that it can be incorporated into the common Electronic Design Automation (EDA) tool flow.
– No special design rules are involved in the interconnection of WDDL gates.
– The switching factor of WDDL is 100%.

A WDDL gate consists of a parallel combination of two positive complementary gates, one calculating the true output using the true inputs, the other the false output using the false inputs. A positive gate produces a zero output for an all-zero input; the AND gate and the OR gate are examples of positive gates. A complementary gate, sometimes also referred to as a dual gate, expresses the false output of the original logic gate using the false inputs of the original gate. The AND gate fed with true input signals and the OR gate fed with false input signals are two dual gates. The figure shows the WDDL AND gate and the WDDL OR gate.

In the evaluation phase, each input signal is differential and the WDDL gate calculates its differential output. In the pre-charge phase, the inputs to the WDDL gate are set to 0, which puts the outputs of the gate at 0. A module in WDDL pre-charges without distributing the pre-charge signal to each individual gate. During the pre-charge phase, the input vector of the combinatorial logic is set to all 0s. Each individual gate will eventually have all its inputs at 0, evaluate its output to 0, and pass this 0 value to the next gate. One could say that the pre-charge signal travels over the combinatorial logic as a 0-wave, hence the name Wave Dynamic Differential Logic.

There are several ways to launch the pre-charge wave. In the figure, a pre-charge operator is inserted at the start of every combinatorial logic tree, i.e. at the inputs of the encryption module and at the outputs of the registers. These operators produce an all-zero output in the pre-charge phase (clk signal high) but let the differential signal through during the evaluation phase (clk signal low).
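The gate construction and the pre-charge operator described above can be modeled behaviorally as follows. This is a sketch in Python rather than the VHDL used in the project; signals are (true, false) pairs, and the function names are hypothetical.

```python
def precharge(bit, clk):
    """Pre-charge operator: (0, 0) while clk is high, (bit, not bit) while low."""
    return (0, 0) if clk else (bit, 1 - bit)

def wddl_and(a, b):
    """WDDL AND: AND on the true rails, its dual gate (OR) on the false rails."""
    (a_t, a_f), (b_t, b_f) = a, b
    return (a_t & b_t, a_f | b_f)

def wddl_or(a, b):
    """WDDL OR: OR on the true rails, its dual gate (AND) on the false rails."""
    (a_t, a_f), (b_t, b_f) = a, b
    return (a_t | b_t, a_f & b_f)

def wddl_eval(gate, a_bit, b_bit, clk):
    """Drive a WDDL gate through pre-charge operators on both inputs."""
    return gate(precharge(a_bit, clk), precharge(b_bit, clk))

for a_bit in (0, 1):
    for b_bit in (0, 1):
        print(a_bit, b_bit,
              wddl_eval(wddl_and, a_bit, b_bit, clk=1),   # pre-charge phase
              wddl_eval(wddl_and, a_bit, b_bit, clk=0))   # evaluation phase
```

In the pre-charge phase both rails are 0 for every input, and in the evaluation phase exactly one rail rises, which is the 100% switching factor listed among the advantages. Because AND and OR are positive gates, the all-zero inputs propagate as the 0-wave without any pre-charge signal reaching the gate itself.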

Figure: WDDL pre-charge wave generation

CHAPTER 7 WDDL GATES

The methodology used in the project is the bottom-up approach: lower modules are designed and later integrated to form larger modules, whose further integration leads to the final top module. Since logic gates form the lowest-level modules, the logic gates required for the design are first implemented in WDDL style. WDDL demands a parallel combination of two positive complementary gates, one calculating the true value and the other the false value. Logic gates such as OR, AND and XOR have been implemented, along with a full adder, a 32-bit XOR, etc.

7.1 WDDL OR gate

A WDDL OR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.

Figure 4.1 WDDL OR Gate

The NAND and NOT gates used act as the Precharge operator, injecting the signal '0' into the gates when the 'clk' signal is high, i.e. during the Precharge phase.

7.2 WDDL AND gate

A WDDL AND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.

Figure 4.2 WDDL AND Gate

7.3 WDDL NAND gate

A WDDL NAND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.

Figure 4.3 WDDL NAND Gate

7.4 WDDL NOR gate

A WDDL NOR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.

Figure 4.4 WDDL NOR Gate

7.5 WDDL XOR gate

The XOR function can be implemented by the Boolean equation A ⊕ B = AB' + A'B. Therefore the XOR gate is implemented in terms of AND gates and OR gates, as shown in Figure 4.5. It can also be implemented by instantiating a WDDL AND gate and a WDDL OR gate, but the number of gates involved in the latter is greater than in the former. Therefore the first method of implementation is followed rather than the second.
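One way to read the equation A ⊕ B = AB' + A'B in dual-rail form is sketched below. The true rail implements the equation directly with AND and OR gates, and the false rail uses the dual network on the complementary inputs. This is an assumed construction for illustration and is not taken from the report's schematic; the function names are hypothetical.

```python
def to_dual_rail(bit, clk):
    """Pre-charge operator: (0, 0) while clk is high, (bit, not bit) while low."""
    return (0, 0) if clk else (bit, 1 - bit)

def wddl_xor(a, b):
    """Dual-rail XOR built only from positive (AND/OR) gates."""
    (a_t, a_f), (b_t, b_f) = a, b
    out_t = (a_t & b_f) | (a_f & b_t)     # A.B' + A'.B on the true rail
    out_f = (a_t | b_f) & (a_f | b_t)     # dual network on the false rail
    return out_t, out_f

for a_bit in (0, 1):
    for b_bit in (0, 1):
        evaluated = wddl_xor(to_dual_rail(a_bit, 0), to_dual_rail(b_bit, 0))
        precharged = wddl_xor(to_dual_rail(a_bit, 1), to_dual_rail(b_bit, 1))
        print(a_bit, b_bit, evaluated, precharged)
```

No inverter is needed for B' and A', because the complemented values are already available on the false rails of the inputs.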

Figure 4.5 WDDL XOR gate

With the help of the above basic gates, a full adder circuit has been designed by instantiating the WDDL gates designed above. During the implementation of the Blowfish algorithm, a 32-bit XOR gate and a 32-bit adder circuit are required. They can be easily implemented by instantiating the corresponding lower module 32 times.

CHAPTER 8 FRONT END RESULTS

WDDL OR GATE

Synthesis Report
===========================================================
Final Report
===========================================================
Final Results
RTL Top Level Output File Name: wddlor.ngr
Top Level Output File Name: wddlor
Output Format: NGC
Optimization Goal: Speed
Keep Hierarchy: NO

Design Statistics
IOs: 5

Cell Usage
BELS: 2
  LUT3: 2
IO Buffers: 5
  IBUF: 3
  OBUF: 2
===========================================================
Device utilization summary
---------------------------
Selected Device: 3s250eft256-4
Number of Slices: 1 out of 2448 (0%)
Number of 4 input LUTs: 2 out of 4896 (0%)
Number of IOs: 5
Number of bonded IOBs: 5 out of 172 (2%)

Timing Summary
---------------
Speed Grade: -4
Maximum combinational path delay: 6.236ns

Simulation Result

Synthesis Result

WDDL AND GATE

Synthesis Report
===========================================================
Final Report
===========================================================
Final Results
RTL Top Level Output File Name: wddlgates.ngr
Top Level Output File Name: wddlgates
Output Format: NGC
Optimization Goal: Speed
Keep Hierarchy: NO

Design Statistics
IOs: 5

Cell Usage
BELS: 2
  LUT3: 2
IO Buffers: 5
  IBUF: 3
  OBUF: 2
===========================================================
Device utilization summary
---------------------------
Selected Device: 3s250etq144-4
Number of Slices: 1 out of 2448 (0%)
Number of 4 input LUTs: 2 out of 4896 (0%)
Number of IOs: 5
Number of bonded IOBs: 5 out of 108 (4%)

Timing Summary
---------------
Speed Grade: -4
Maximum combinational path delay: 6.236ns

Simulation Result

Synthesis Result

WDDL NAND GATE

Synthesis Report
===========================================================
Final Report
===========================================================
Final Results
RTL Top Level Output File Name: wddlnand1.ngr
Top Level Output File Name: wddlnand1
Output Format: NGC
Optimization Goal: Speed
Keep Hierarchy: NO

Design Statistics
IOs: 5

Cell Usage
BELS: 2
  LUT3: 2
IO Buffers: 5
  IBUF: 3
  OBUF: 2
===========================================================
Device utilization summary
Selected Device: 3s500efg320-4
Number of Slices: 1 out of 4656 (0%)
Number of 4 input LUTs: 2 out of 9312 (0%)
Number of IOs: 5
Number of bonded IOBs: 5 out of 232 (2%)

Timing Summary
Speed Grade: -4
Maximum combinational path delay: 6.236ns

Simulation Result

Synthesis Result

WDDL XOR GATE

Simulation Result

Synthesis Result

WDDL XOR GATE

Synthesis Report
===========================================================
Final Report
===========================================================
Final Results
RTL Top Level Output File Name: wddlxorgate.ngr
Top Level Output File Name: wddlxorgate
Output Format: NGC
Optimization Goal: Speed
Keep Hierarchy: NO

Design Statistics
IOs: 5

Cell Usage
BELS: 2
  LUT3: 2
IO Buffers: 5
  IBUF: 3
  OBUF: 2
===========================================================
Device utilization summary
---------------------------
Selected Device: 3s250eft256-4
Number of Slices: 1 out of 2448 (0%)
Number of 4 input LUTs: 2 out of 4896 (0%)
Number of IOs: 5
Number of bonded IOBs: 5 out of 172 (2%)

Timing Summary
---------------
Speed Grade: -4
Maximum combinational path delay: 6.236ns

Simulation Result

Synthesis Result

CHAPTER 9 SUMMARY AND CONCLUSION

9.1 Summary

To secure ICs against side-channel attacks, especially Differential Power Analysis (DPA), the design must be implemented in a logic style whose power dissipation is constant irrespective of the input combination. WDDL has proved advantageous over the alternatives and is therefore of great significance. In this dissertation work, an architecture for the Blowfish algorithm is designed and implemented in WDDL style. A bottom-up approach is used: the low-level entities are designed first and are later combined to form the entire module. The key scheduling is online. The sub-keys generated for a particular key can be used to encrypt all the data that is to be encrypted with that key. The sub-keys are supplied in reverse order for the decryption data path. Initially, logic gates are implemented in WDDL; higher modules are then designed by instantiating the WDDL gates to form the entire module, which results in constant power dissipation irrespective of the input data combination. The entire design works in two phases, namely the Precharge phase and the Evaluation phase. In the Precharge phase all the signals of the design are zeroed, and during the Evaluation phase the functionality of the design is achieved. This kind of design has been found simple and very effective in thwarting the side-channel attack known as Differential Power Analysis (DPA).

9.2 Conclusion

The crypto processor has been designed for a key size of 448 bits and a plaintext of 64 bits. The code for the implementation has been written in VHDL. The functional verification has been done using the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The back end of the design is done using SOC Encounter. The desired functionality has been achieved according to the specifications. In the output, during the Evaluation phase, the number of transitions is always the same, which results in constant power dissipation. During synthesis it was observed that a simple WDDL gate comprises many conventional gates. The area of the design has therefore grown nearly three-fold compared with the same design implemented in conventional CMOS logic; this is the cost of the security incorporated into the IC against Differential Power Analysis (DPA). Because the power dissipation at the output is constant, an attacker cannot apply the DPA scheme to find the secret key used in the crypto processor. Thus security against DPA is incorporated into the IC at hardware level by implementing the design in WDDL style, which is quite simple and effective.



In the dataflow style you describe how information flows between registers in the system.

The combinational logic is described at a relatively high level; the placement and operation of registers is specified quite precisely.

The behavior of the system over time is defined by registers.

There are no built-in registers in the VHDL language:

- either a lower-level description
- or a behavioral description of sequential elements is needed.

The lower-level register descriptions must be created or obtained.

If there are no third-party models for registers, you must write the behavioral description of the registers yourself.

The behavioral description can be provided in the form of subprograms (functions or procedures).

Behavioral VHDL Description

The circuit is described in terms of its operation over time.

The representation might include, e.g., state diagrams, timing diagrams and algorithmic descriptions.

The concept of time may be expressed precisely using delays (e.g. A <= B after 10 ns).

If no actual delays are used, only the order of the sequential operations is defined.

At the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed timing specifications.

The actual timing results depend on the implementation technology and the efficiency of the synthesis tool.

There are a few tools for behavioral synthesis.

Concurrent vs. Sequential

Processes

The process is the basic simulation concept in VHDL.

A VHDL description can always be broken up into interconnected processes.

This is quite similar to a UNIX process.

Process keyword in VHDL

A process statement is a concurrent statement.

Statements inside a process statement are sequential statements.

A process must contain either a sensitivity list or wait statement(s), but NOT both.

The sensitivity list or the wait statement(s) contain the signals that wake the process up.

General Format

process [(sensitivity_list)]
    process_declarative_part
begin
    process_statements
    [wait_statement]
end process;

CHAPTER 4 SMART CARD OVERVIEW

This section very briefly introduces the concept of a smart card. Basically, a smart card is a computer embedded in a safe. It consists of a (typically 8-bit or 32-bit) processor together with ROM, EEPROM and a small amount of RAM, and is therefore capable of performing computations. The main goal of a smart card is to allow the execution of cryptographic operations involving some secret parameter (the key) while not revealing this parameter to the outside world. Conversely, the goal of the attacker is to recover this secret parameter. The processor is embedded in a chip and connected to the outside world through eight wires whose role, use and position are standardized. In addition to the input/output wires, the parts we are most interested in are the following:

1. Power supply: smart cards do not have an internal battery. The current they need is provided by the smart card reader. This makes the smart card's power consumption fairly easy for the attacker to measure.

2. Clock: similarly, smart cards do not have an internal clock either. The clock ticks must also be provided from the outside world. As a consequence, the attacker can measure the card's running time with very good precision.

Smart cards are usually equipped with protection mechanisms composed of a shield (the passivation layer), whose goal is to hide the internal behavior of the chip, and possibly sensors that react when the shield is removed by destroying all sensitive data and preventing the card from functioning properly.

CHAPTER 5 SIDE CHANNEL ATTACKS

"Side channel attacks" are attacks that are based on "side channel information". Side channel information is information that can be retrieved from the encryption device and that is neither the plaintext to be encrypted nor the ciphertext resulting from the encryption process.

In the past, an encryption device was perceived as a unit that receives plaintext input and produces ciphertext output, and vice versa. Attacks were therefore based on knowing the ciphertext (ciphertext-only attacks), on knowing both the plaintext and the ciphertext (known-plaintext attacks), or on the ability to define what plaintext is to be encrypted and then see the results of the encryption (chosen-plaintext attacks). Today it is known that encryption devices have additional outputs, and often additional inputs, which are neither the plaintext nor the ciphertext.

Encryption devices produce timing information (information about the time that operations take) that is easily measurable, radiation of various sorts, power consumption statistics (that can be easily measured as well), and more. Often the encryption device also has additional "unintentional" inputs, such as voltage, that can be modified to cause predictable outcomes. Side channel attacks make use of some or all of this information, along with other (known) cryptanalytic techniques, to recover the key the device is using.

Side channel analysis techniques are of concern because the attacks can be mounted quickly and can sometimes be implemented using readily available hardware costing from only a few hundred dollars to thousands of dollars.

5.1 Classification of side channel attacks

The literature usually classifies side channel attacks along two orthogonal axes.

1. Invasive vs. non-invasive

Invasive attacks require de-packaging the chip to get direct access to its components. A typical example is connecting a wire to a data bus to observe the data transfers.

A non-invasive attack exploits only externally available information (the emission of which is, however, often unintentional), such as running time or power consumption.

A further distinction has been introduced for semi-invasive attacks. These attacks require de-packaging the chip to get access to its surface, but they do not tamper with the passivation layer (they do not require electrical contact to the metal surface).

2. Active vs. passive

Active attacks try to tamper with the card's proper functioning. For example, fault induction attacks try to induce errors in the computation.

Passive attacks, by contrast, simply observe the card's behavior during its processing without disturbing it.

Note that these two axes are indeed orthogonal: an invasive attack may completely avoid disturbing the card's behavior, and a passive attack may require a preliminary de-packaging for the required information to be observable.

These attacks are of course not mutually exclusive. An invasive attack may, for example, serve as a preliminary step for a non-invasive one by giving a detailed description of the chip's architecture that helps to find out where to put external probes.

Smart cards are usually equipped with protection mechanisms that are supposed to react to invasive attacks (although several invasive attacks are nonetheless capable of defeating these mechanisms). On the other hand, it is worth pointing out that a non-invasive attack is completely undetectable: there is, for example, no way for a smart card to figure out that its running time is currently being measured. Other countermeasures are therefore necessary. From an economic point of view, invasive attacks are usually more expensive to deploy on a large scale, since they require individual processing of each attacked device. In this sense, non-invasive attacks constitute a bigger menace for the smart card industry.

Invasive attacks involve a relatively high capital investment in lab equipment plus a moderate investment of effort for each individual chip attacked. Non-invasive attacks require only a moderate capital investment plus a moderate investment of effort in designing an attack on a particular type of device; thereafter, the cost per device attacked is low. Semi-invasive attacks can be carried out using very cheap and simple equipment.

The attacker can gain information through:

1. Probing attacks
2. Fault induction attacks
3. Timing attacks
4. Power analysis attacks
5. Electromagnetic timing attacks

These attacks are carried out on the switching behavior of digital complementary metal-oxide-semiconductor (CMOS) gates. Of all these, the power analysis attack is of major concern.

5.2 Power analysis attacks

The power consumption of a cryptographic device may provide much information about the operations that take place and the parameters involved. This is the idea behind simple and differential power analysis, first introduced by Kocher et al. Like the clock ticks, the card's energy is provided by the terminal and can therefore easily be measured. Basically, to measure a circuit's power consumption, a small (e.g. 50 ohm) resistor is inserted in series with the power or ground input. The voltage difference across the resistor divided by the resistance yields the current. Well-equipped electronics labs have equipment that can digitally sample voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than 1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC can be bought for less than US$ 400.
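The measurement described above is plain Ohm's-law arithmetic. A minimal sketch in Python follows, assuming a 50 ohm series resistor; the function names and the voltage samples are made up for illustration and do not come from any real measurement.

```python
def shunt_current(v_drop_volts, r_ohms=50.0):
    """Current through the series resistor: I = V / R."""
    return v_drop_volts / r_ohms


def samples_in_trace(duration_s, sample_rate_hz):
    """Number of points captured when sampling for duration_s seconds."""
    return round(duration_s * sample_rate_hz)


# Hypothetical voltage drops (in volts) measured across the 50 ohm resistor.
v_samples = [0.10, 0.25, 0.15]
currents_ma = [shunt_current(v) * 1000 for v in v_samples]
print(currents_ma)  # supply current in milliamperes, one value per sample

# Points captured for a 1 ms operation sampled at 20 MHz.
print(samples_in_trace(1e-3, 20e6))
```

The series of currents obtained this way, one value per sampling instant, is what the following sections treat as a power trace.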

Power analysis attacks are of two types:

1. Simple Power Analysis (SPA) attack
2. Differential Power Analysis (DPA) attack

SPA attacks on smart cards typically take a few seconds per card, while DPA attacks can take several hours. In general, and from a somewhat academic perspective, we may consider the entire internal state of the block cipher to be all the intermediate results and values that are never included in the output in normal operation. For example, DES has 16 rounds; we can consider the intermediate states state[1..15] after each round except the last as a secret internal state. Side channels typically give information about these internal states, or about the operations used in the transition of this internal state from one round to another. The type of side channel of course determines what information about these states is available to the attacker. The attacks typically work by finding some information about the internal state of the cipher that can be learned both by guessing part of the key and checking the value directly, and additionally through some statistical property of the cipher that makes that checkable value slightly non-random.

5.2.1 Simple Power Analysis attack (SPA)

Simple Power Analysis is generally based on looking at a visual representation of the power consumption of a unit while an encryption operation is being performed. It is a technique that involves direct interpretation of power consumption measurements collected during cryptographic operations. SPA can yield information about a device's operation as well as key material.

A trace refers to a set of power consumption measurements taken across a cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a trace containing 5000 points. The figure, for example, shows an SPA trace from a smart card performing a DES operation.

Figure: SPA monitoring from a single DES operation performed by a typical smart card. The upper trace shows the entire encryption operation, including the initial permutation, the 16 DES rounds and the final permutation. The lower trace is a detailed view of the second and third rounds.

Because SPA can reveal the sequence of instructions executed, it can be used to break cryptographic implementations in which the execution path depends on the data being processed. For example:

DES key schedule: The DES key schedule computation involves rotating 28-bit key registers. A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will contain different SPA features if the execution paths take different branches for each.

DES permutations: DES implementations perform a variety of bit permutations. Conditional branching in software or microcode can cause significant power consumption differences for "0" and "1" bits.

Comparisons: String or memory comparison operations typically perform a conditional branch when a mismatch is found. This conditional branching causes large SPA (and sometimes timing) characteristics.

Multipliers: Modular multiplication circuits tend to leak a great deal of information about the data they process. The leakage functions depend on the multiplier design, but are often strongly correlated to operand values and Hamming weights.

Exponentiators: A simple modular exponentiation function scans across the exponent, performing a squaring operation in every iteration with an additional multiplication operation for each exponent bit that is equal to "1". The exponent can be compromised if squaring and multiplication operations have different power consumption characteristics, take different amounts of time, or are separated by different code. Modular exponentiation functions that operate on two or more exponent bits at a time may have more complex leakage functions.
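The exponentiation leak can be illustrated with a toy model. The sketch below (Python, with hypothetical function names) performs a simple left-to-right square-and-multiply, logs each operation as 'S' or 'M', and then rebuilds the exponent from that log alone. The log stands in for what an attacker could read off a trace if squarings and multiplications were distinguishable; it is an idealisation, not a model of a real power trace.

```python
def square_and_multiply(base, exponent, modulus):
    """Left-to-right binary exponentiation that logs its operations.

    Returns (result, ops), where ops is a string of 'S' (square) and
    'M' (multiply) in the order the operations were executed.
    """
    result = 1
    ops = []
    for bit in bin(exponent)[2:]:        # most significant bit first
        result = (result * result) % modulus
        ops.append("S")
        if bit == "1":                   # data-dependent extra multiplication
            result = (result * base) % modulus
            ops.append("M")
    return result, "".join(ops)


def exponent_from_ops(ops):
    """Rebuild the exponent: each 'S' opens a bit, a following 'M' sets it to 1."""
    bits = []
    for op in ops:
        if op == "S":
            bits.append("0")
        else:
            bits[-1] = "1"
    return int("".join(bits), 2)


secret = 0b101101
value, ops = square_and_multiply(7, secret, 1009)
print(ops)                               # operation sequence seen by the attacker
print(exponent_from_ops(ops) == secret)  # the secret exponent is fully recovered
```

The recovery works only because the multiplication is executed conditionally on the exponent bit, which is exactly the data-dependent execution path the text warns about.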

5.2.2 Differential Power Analysis attack (DPA)

In addition to large-scale power variations due to the instruction sequence, there are effects correlated to the data values being manipulated. These variations tend to be smaller and are sometimes overshadowed by measurement errors and other noise. In such cases it is still often possible to break the system using statistical functions tailored to the target algorithm.

To implement the DPA attack, an attacker first observes m encryption operations and captures power traces T_{1..m}[1..k] containing k samples each. In addition, the attacker records the ciphertexts C_{1..m}. No knowledge of the plaintext is required. DPA uses power consumption measurements to determine whether a key block guess K_s is correct. The selection function D(C_i, b, K_s) computes, from the ciphertext C_i and the guess K_s, the predicted value of a target bit b of an intermediate value. The attacker computes a k-sample differential trace \Delta_D[1..k] by finding the difference between the average of the traces for which D is one and the average of the traces for which D is zero. Thus \Delta_D[j] is the average over C_{1..m} of the effect due to the value represented by the selection function D on the power consumption at point j. In particular,

\Delta_D[j] = \frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}{\sum_{i=1}^{m} D(C_i, b, K_s)} - \frac{\sum_{i=1}^{m} (1 - D(C_i, b, K_s))\, T_i[j]}{\sum_{i=1}^{m} (1 - D(C_i, b, K_s))}

If K_s is incorrect, the bit computed using D will differ from the actual target bit for about half of the ciphertexts C_i. The selection function is thus effectively uncorrelated to what was actually computed by the target device. If a random function is used to divide a set into two subsets, the difference in the averages of the subsets should approach zero as the subset sizes approach infinity. Thus, if K_s is incorrect, \Delta_D[j] approaches zero as m grows, because trace components uncorrelated to D diminish with 1/\sqrt{m}, causing the differential trace to become flat (the actual trace may not be completely flat, as D with K_s incorrect may have a weak correlation to D with the correct K_s).

If K_s is correct, however, the computed value for D(C_i, b, K_s) will equal the actual value of target bit b with probability 1. The selection function is thus correlated to the value of the bit considered. The contributions of other data values, measurement errors, etc. that are not correlated to D approach zero. Because power consumption is correlated to data bit values, the plot of \Delta_D will be flat with spikes in regions where D is correlated to the values being processed. The correct value of K_s can thus be identified from the spikes in its differential trace. Four values of b correspond to each S-box, providing confirmation of key block guesses. Finding all eight K_s yields the entire 48-bit round sub-key. The remaining 8 key bits can be found easily using exhaustive search or by analyzing one additional round. Triple DES keys can be found by analyzing an outer DES operation first, using the resulting key to decrypt the ciphertexts, and attacking the next DES key. DPA can use known plaintext or known ciphertext and can find encryption or decryption keys.
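The difference-of-means procedure can be tried out on simulated data. The sketch below is a toy model in Python, not an attack on DES: it assumes a 4-bit key, the 4-bit S-box of the PRESENT cipher as a stand-in nonlinear step, known input data instead of ciphertexts, and a device that leaks the Hamming weight of the S-box output at one sample point plus Gaussian noise. All names (capture_traces, LEAK_POINT, and so on) are hypothetical.

```python
import random

# 4-bit S-box of the PRESENT cipher, used here only as a toy nonlinear step.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
LEAK_POINT = 1   # sample index at which the simulated device leaks
K_SAMPLES = 3    # samples per trace (k in the text)


def selection(data, key_guess):
    """Selection function D: predicted top bit of the S-box output."""
    return (SBOX[data ^ key_guess] >> 3) & 1


def capture_traces(key, m, rng):
    """Simulate m traces: noise everywhere, Hamming-weight leak at one point."""
    data, traces = [], []
    for i in range(m):
        d = i % 16                                   # known 4-bit input
        trace = [rng.gauss(0.0, 0.2) for _ in range(K_SAMPLES)]
        trace[LEAK_POINT] += bin(SBOX[d ^ key]).count("1")
        data.append(d)
        traces.append(trace)
    return data, traces


def differential_trace(data, traces, key_guess):
    """Mean of the traces with D = 1 minus mean of the traces with D = 0."""
    ones = [t for d, t in zip(data, traces) if selection(d, key_guess) == 1]
    zeros = [t for d, t in zip(data, traces) if selection(d, key_guess) == 0]
    return [sum(t[j] for t in ones) / len(ones)
            - sum(t[j] for t in zeros) / len(zeros)
            for j in range(K_SAMPLES)]


def recover_key(data, traces):
    """Return the guess whose differential trace has the highest positive spike."""
    return max(range(16),
               key=lambda g: max(differential_trace(data, traces, g)))


rng = random.Random(0)
true_key = 0xB
data, traces = capture_traces(true_key, 1600, rng)
print(recover_key(data, traces) == true_key)
```

In this toy model the positive spike is used for ranking because, under the assumed leakage, a predicted 1 bit corresponds to higher consumption; wrong guesses can still produce sizeable negative spikes, which echoes the remark above that the differential trace for an incorrect K_s need not be completely flat.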

CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES

The power consumption of traditional standard cells and logic depends on the signal activity. When the output of a logic gate makes a 0 to 1 transition, a current flows from the power supply and charges the output capacitance. On the other hand, when the output sees a 1 to 0, a 0 to 0 or a 1 to 1 transition, no energy or only a limited amount of energy (due to short circuit or leakage) is consumed from the power supply. This is the fundamental reason why information is leaked through the power supply and why power attacks are possible. The basis of a secure digital design flow is a logic style with constant power consumption.
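The data dependence can be made concrete with a simplified energy model. The Python sketch below assumes that only a 0 to 1 transition draws energy from the supply (C times Vdd squared per charging event) and ignores short-circuit and leakage currents; the function name and the normalised values of C and Vdd are illustrative assumptions.

```python
def supply_energy(outputs, c_load=1.0, vdd=1.0):
    """Energy drawn from the supply by one gate output over a bit sequence.

    Simplified model: only a 0 -> 1 transition charges the load, costing
    C * Vdd^2; short-circuit and leakage currents are ignored.
    """
    energy = 0.0
    for prev, curr in zip(outputs, outputs[1:]):
        if prev == 0 and curr == 1:
            energy += c_load * vdd ** 2
    return energy


# Two output sequences of equal length carrying different data.
quiet = [0, 0, 0, 0, 1, 1, 1, 1]
busy = [0, 1, 0, 1, 0, 1, 0, 1]
print(supply_energy(quiet), supply_energy(busy))
```

Both sequences have the same length, yet the energy drawn differs with the data, and this difference is what a power attack observes.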

6.1 Current Mode Logic

Current mode logic (CML), e.g. current steering logic, seems the ideal solution. This type of logic continuously draws a current from the supply and measures its state through the path that the current takes. A gate has constant power consumption if it draws a perfectly constant current from the power supply, independently of the input and output signals. To build a current source capable of generating a constant current, special circuit techniques that minimize channel length modulation have to be used.

The decisive drawback of CML, however, is its static power consumption. Even when the logic gate is not processing any data it burns current, which makes this logic style unacceptable for embedded battery-operated devices.

6.2 Voltage Mode Logic (CMOS circuit styles)

Voltage mode logic (VML), e.g. static CMOS logic, draws a current from the supply only to change state, and measures its state by the amount of charge it stores on a capacitance. A regular standard CMOS circuit consumes power only when a capacitance gets charged and later discharged, i.e. when a gate switches state. This is the main reason that CMOS is the style of choice for every battery-operated or low-power device. This is illustrated in the figure below for a simple inverter. Thus static CMOS is the preferred logic style because of its low power consumption and high noise margins.

Figure: Standard CMOS inverter

Yet two conditions must be satisfied for VML to have constant power consumption, namely:

1) A logic gate must have exactly one switching event per signal transition.

2) The logic gate must charge a constant capacitance in that switching event.

In the figure above, all four transitions of the CMOS inverter can be distinguished when monitoring the power supply.

6.3 Dynamic Differential Logic

Dynamic differential logic, sometimes also referred to as dual-rail with pre-charge logic, fulfills the first condition. A differential logic family uses the true and the false representation of the input and output signals, and a dynamic logic family alternates pre-charge and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1, exactly one of the two output nodes evaluates to 0 so as to produce a differential output signal in the evaluation phase. The discharged output node is charged to 1 again in the following pre-charge phase, so that both outputs are pre-charged to 1. In other words, every signal transition, including the events in which the input signals remain constant, is represented by an actual switching event in which the logic gate charges a capacitance. The logic families that have been introduced to thwart differential power analysis (DPA) by using dynamic differential logic include the following:

1. Sense Amplifier Based Logic (SABL)

2. Wave Dynamic Differential Logic (WDDL)
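The first condition can be checked with a small counting model. The Python sketch below compares an ordinary single-rail output with a pre-charged differential pair, following the convention of the paragraph above (both rails pre-charged to 1, exactly one rail discharged during evaluation). The function names are illustrative, and the model counts charging events only; it says nothing about the capacitance balance required by the second condition.

```python
def single_rail_charges(bits):
    """Charging events (0 -> 1) per cycle on an ordinary single-rail output."""
    previous = [0] + list(bits[:-1])
    return [int(p == 0 and c == 1) for p, c in zip(previous, bits)]


def dual_rail_charges(bits):
    """Charging events per cycle on a pre-charged differential output pair."""
    charges = []
    for bit in bits:
        # Evaluation: starting from (1, 1), the rail that does not carry
        # the value is discharged to 0.
        true_rail, false_rail = (1, 0) if bit else (0, 1)
        # Following pre-charge: each rail sitting at 0 is charged back to 1.
        charges.append((1 - true_rail) + (1 - false_rail))
    return charges


data = [1, 1, 0, 1, 0, 0]
print(single_rail_charges(data))  # depends on the data sequence
print(dual_rail_charges(data))    # one charging event in every cycle
```

Whatever the data, the differential pair shows exactly one charging event per cycle, whereas the single-rail count follows the data.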

6.3.1 Sense Amplifier Based Logic (SABL)

The main advantage of SABL is that it has balanced input and output nodes and that all internal nodes connect to an output. The output capacitances can be balanced. Systematic methods have been developed to make sure that both branches of the differential pull-down network are balanced and that no memory effects are present in the network. Sense Amplifier Based Logic is illustrated below.

Figure: Sense Amplifier Based Logic AND/NAND gate

This circuit style does, however, require full custom characterization and layout. It also suffers from a high clock load, common to all dynamic logic gates.

6.3.2 Wave Dynamic Differential Logic (WDDL)

WDDL can be implemented with static CMOS logic. Static CMOS standard cells are combined to form secure compound standard cells, which have a reduced power signature. WDDL has many advantages. It can be readily implemented from an existing standard cell library. The design flow is fully supported with accurate EDA library files that come directly from the vendor. WDDL also results in a dynamic differential logic with only a small load capacitance on the pre-charge control signal and with the low power consumption and the high noise margins of static CMOS.

The advantages of the WDDL logic style are as follows:

A major advantage of the logic style is that it can be incorporated into the common Electronic Design Automation (EDA) tool flow.

No special design rules are involved in the interconnection of WDDL gates.

The switching factor of WDDL is 100%.

A WDDL gate consists of a parallel combination of two positive complementary gates, one calculating the true output using the true inputs, the other the false output using the false inputs. A positive gate produces a zero output for an all-zero input. The AND gate and the OR gate are examples of positive gates. A complementary gate, sometimes also referred to as a dual gate, expresses the false output of the original logic gate using the false inputs of the original gate. The AND gate fed with true input signals and the OR gate fed with false input signals are two dual gates. The figure shows the WDDL AND gate and the WDDL OR gate. In the evaluation phase, each input signal is differential and the WDDL gate calculates its differential output. In the pre-charge phase, the inputs to the WDDL gate are set to 0. This puts the output of the gate at 0.

A module in WDDL pre-charges without distributing the pre-charge signal to each individual gate. During the pre-charge phase, the input vector of the combinatorial logic is set to all 0s. Each individual gate will eventually have all its inputs at 0, evaluate its output to 0 and pass this 0 value to the next gate. One could say that the pre-charge signal travels over the combinatorial logic as a 0-wave, hence the name WDDL. There are several ways to launch the pre-charge wave. In the figure, a pre-charge operator is inserted at the start of every combinatorial logic tree, i.e. at the inputs of the encryption module and the outputs of the registers. These operators produce an all-zero output in the pre-charge phase (clk signal high) but let the differential signal through during the evaluation phase (clk signal low).
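The behavior described above can be modelled at the logic level in a few lines. The following Python sketch is a behavioral illustration only, not the VHDL used in the project; signals are (true, false) pairs, and the names precharge, wddl_and, wddl_or, encode and circuit are assumptions made for this example.

```python
def precharge(clk, pair):
    """Pre-charge operator: all-zero output while clk is high, else pass through."""
    return (0, 0) if clk else pair


def wddl_and(a, b):
    """WDDL AND: AND on the true rails, its dual (OR) on the false rails."""
    (at, af), (bt, bf) = a, b
    return (at & bt, af | bf)


def wddl_or(a, b):
    """WDDL OR: OR on the true rails, its dual (AND) on the false rails."""
    (at, af), (bt, bf) = a, b
    return (at | bt, af & bf)


def encode(bit):
    """Differential encoding of a single-rail bit as (true, false)."""
    return (bit, 1 - bit)


def circuit(clk, a, b, c):
    """Two-level example, (a AND b) OR c, with pre-charge operators at the inputs."""
    pa, pb, pc = (precharge(clk, encode(x)) for x in (a, b, c))
    return wddl_or(wddl_and(pa, pb), pc)


for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            # Pre-charge phase: the 0-wave reaches the output of the second gate.
            assert circuit(1, a, b, c) == (0, 0)
            # Evaluation phase: the output is differential and logically correct.
            assert circuit(0, a, b, c) == encode((a & b) | c)
print("0-wave and differential evaluation hold for all inputs")
```

Only the input operators see the clock in this model; the gates themselves pre-charge because positive gates map all-zero inputs to a zero output, which is the wave property the text describes. In every evaluation exactly one rail of each pair rises from 0 to 1.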

Figure: WDDL pre-charge wave generation

CHAPTER 7 WDDL GATES

The methodology used in the project is the bottom-up approach. Lower modules are designed and later integrated to form larger modules, whose further integration leads to the final top module. Since logic gates form the lowest-level modules, the logic gates required for the design are implemented in WDDL style first. WDDL demands a parallel combination of two positive complementary gates, one calculating the true value and the other the false value. Logic gates such as OR, AND and XOR have been implemented, as well as a full adder, a 32-bit XOR, etc.

7.1 WDDL OR gate

A WDDL OR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.

Figure 4.1 WDDL OR Gate

The NAND and NOT gates used act as the Precharge operator, injecting the signal '0' into the gates when the 'clk' signal is high, i.e. during the Precharge phase.

7.2 WDDL AND gate

A WDDL AND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.

Figure 4.2 WDDL AND Gate

7.3 WDDL NAND gate

A WDDL NAND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.

Figure 4.3 WDDL NAND Gate

7.4 WDDL NOR gate

A WDDL NOR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.

Figure 4.4 WDDL NOR Gate

7.5 WDDL XOR gate

The XOR function can be implemented by the Boolean equation A XOR B = AB' + A'B. Therefore the XOR gate is implemented in terms of AND and OR gates, as shown in Figure 4.5. It can also be implemented by instantiating a WDDL AND gate and a WDDL OR gate, but the number of gates involved in the latter is greater than in the former. Therefore the first method of implementation is followed rather than the second.
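The Boolean equation can be tried out on differential signals. The Python sketch below is a behavioral model under stated assumptions: signals are (true, false) pairs, and the complemented inputs A' and B' are taken from the false rails, a property of differential logic that the text does not spell out. The false output uses the dual expression of AB' + A'B. The names wddl_xor and encode are illustrative and this is not the project's VHDL.

```python
def encode(bit):
    """Differential encoding of a single-rail bit as (true, false)."""
    return (bit, 1 - bit)


def wddl_xor(a, b):
    """XOR on differential pairs built from AND and OR only.

    True rail:  A.B' + A'.B, with A' and B' read from the false rails.
    False rail: the dual expression (A' + B).(A + B'), which equals XNOR.
    """
    (at, af), (bt, bf) = a, b
    true_rail = (at & bf) | (af & bt)
    false_rail = (af | bt) & (at | bf)
    return (true_rail, false_rail)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, wddl_xor(encode(a), encode(b)))

# All-zero (pre-charged) inputs give an all-zero output, as for any positive gate.
print(wddl_xor((0, 0), (0, 0)))
```

Because only AND and OR operations appear on each rail, the gate stays positive: pre-charged inputs yield a pre-charged output, and in evaluation exactly one of the two rails goes to 1.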

Figure 4.5 WDDL XOR gate

With the help of the above basic gates, a full adder circuit has been designed by instantiating the WDDL gates designed above. During the implementation of the Blowfish algorithm, a 32-bit XOR gate and a 32-bit adder circuit are required. They can easily be implemented by instantiating the corresponding lower module 32 times.

CHAPTER 8 FRONT END RESULTS

WDDL OR GATE

Synthesis Report
============================================================
Final Report
============================================================
Final Results
RTL Top Level Output File Name : wddlor.ngr
Top Level Output File Name     : wddlor
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO

Design Statistics
IOs                            : 5

Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2
============================================================
Device utilization summary
---------------------------
Selected Device                : 3s250eft256-4
Number of Slices               : 1 out of 2448   0%
Number of 4 input LUTs         : 2 out of 4896   0%
Number of IOs                  : 5
Number of bonded IOBs          : 5 out of 172    2%

Timing Summary
---------------
Speed Grade                      : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result

WDD

L AND GATESynthesis

Report==========================================================

== Final Report

============================================================Fina

l ResultsRTL Top Level Output File Name wddlgatesngrTop Level Output File

Name wddlgatesOutput Format NGCOptimization Goal

SpeedKeep Hierarchy NODesign Statistics IOs

5Cell Usage BELS 2 LUT3 2 IO Buffers

5 IBUF 3 OBUF

2===========================================================Devic

e utilization summary---------------------------Selected Device 3s250etq144-4 Number

of Slices 1 out of 2448 0 Number of 4 input LUTs 2

out of 4896 0 Number of IOs 5 Number of bonded IOBs

5 out of 108 4 Timing Summary---------------Speed Grade -4Maximum

combinational path delay 6236nsSimulation Result

39

Sy

nthesis Result

WDDL NAND GATE

Synthesis Report
==========================================================
Final Report
==========================================================
Final Results
RTL Top Level Output File Name   : wddlnand1.ngr
Top Level Output File Name       : wddlnand1
Output Format                    : NGC
Optimization Goal                : Speed
Keep Hierarchy                   : NO

Design Statistics
IOs                              : 5

Cell Usage
BELS                             : 2
  LUT3                           : 2
IO Buffers                       : 5
  IBUF                           : 3
  OBUF                           : 2
==========================================================
Device utilization summary
---------------------------
Selected Device                  : 3s500efg320-4
Number of Slices                 : 1 out of 4656   0%
Number of 4 input LUTs           : 2 out of 9312   0%
Number of IOs                    : 5
Number of bonded IOBs            : 5 out of 232    2%

Timing Summary
---------------
Speed Grade                      : -4
Maximum combinational path delay : 6.236ns

[Figure: Simulation Result]
[Figure: Synthesis Result]

WDDL XOR GATE

[Figure: Simulation Result]
[Figure: Synthesis Result]

WDDL XOR GATE

Synthesis Report
==========================================================
Final Report
==========================================================
Final Results
RTL Top Level Output File Name   : wddlxorgate.ngr
Top Level Output File Name       : wddlxorgate
Output Format                    : NGC
Optimization Goal                : Speed
Keep Hierarchy                   : NO

Design Statistics
IOs                              : 5

Cell Usage
BELS                             : 2
  LUT3                           : 2
IO Buffers                       : 5
  IBUF                           : 3
  OBUF                           : 2
==========================================================
Device utilization summary
---------------------------
Selected Device                  : 3s250eft256-4
Number of Slices                 : 1 out of 2448   0%
Number of 4 input LUTs           : 2 out of 4896   0%
Number of IOs                    : 5
Number of bonded IOBs            : 5 out of 172    2%

Timing Summary
---------------
Speed Grade                      : -4
Maximum combinational path delay : 6.236ns

[Figure: Simulation Result]
[Figure: Synthesis Result]

CHAPTER 9 SUMMARY AND CONCLUSION

9.1 Summary

To secure ICs against side-channel attacks, especially Differential Power Analysis (DPA), the design must be implemented in a logic style whose power dissipation is constant irrespective of the input combination. WDDL has proved advantageous over the other styles and is therefore of great significance. In this dissertation work, an architecture for the Blowfish algorithm is designed and implemented in WDDL style. A bottom-up approach is used in this implementation: the low-level entities are designed first and are later combined to form the entire module. The key scheduling is online. The sub-keys generated for a particular key can be used to encrypt all the data to be encrypted with that key. The sub-keys are supplied in reverse order for the decryption data path. Initially the logic gates are implemented in WDDL, and the higher modules are then designed by instantiating the WDDL gates to form the entire module, thus resulting in constant power dissipation irrespective of any input data combination. The entire design works in two phases, namely the Precharge phase and the Evaluation phase. In the Precharge phase all the signals of the design are zeroed, and during the Evaluation phase the functionality of the design is achieved. This sort of design has been found simple and very effective in thwarting the side-channel attack known as Differential Power Analysis (DPA).

9.2 Conclusion

The crypto processor has been designed for a key size of 448 bits and a plaintext block of 64 bits. The code for the implementation has been written in VHDL. Functional verification has been done using the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The back end of the design is done using SOC Encounter. The desired functionality has been achieved according to the specifications. At the output, during the Evaluation phase, the number of transitions has been the same in every case, thus resulting in constant power dissipation. During synthesis it has been observed that a simple WDDL gate comprises many conventional gates. The area of the design has therefore grown nearly three-fold compared to the same design implemented in conventional CMOS logic; this is the cost of the security incorporated into the IC against Differential Power Analysis (DPA). Because of the constant power dissipation at the output, a hacker cannot apply the DPA scheme to find the secret key that is being used in the crypto-processor. Thus security against DPA is incorporated into the IC at hardware level by implementing the design in WDDL style,

which is quite simple and effective.


At the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed timing specifications.

The actual timing results depend on the implementation technology and on the efficiency of the synthesis tool.

There are only a few tools for behavioral synthesis.

Concurrent vs. Sequential

Processes

The process is the basic simulation concept in VHDL.

A VHDL description can always be broken up into interconnected processes.

This is quite similar to a UNIX process.

The process keyword in VHDL

A process statement is a concurrent statement.

Statements inside a process statement are sequential statements.

A process must contain either a sensitivity list or wait statement(s), but NOT both.

The sensitivity list or the wait statement(s) contain the signals that wake the process up.

General Format

    process [(sensitivity_list)]
        process_declarative_part
    begin
        process_statements
        [wait_statement]
    end process;

CHAPTER 4 SMART CARD OVERVIEW

This section very briefly introduces the concept of a smart card. Basically, a smart card is a computer embedded in a safe. It consists of a (typically 8-bit or 32-bit) processor together with ROM, EEPROM and a small amount of RAM, and is therefore capable of performing computations. The main goal of a smart card is to allow the execution of cryptographic operations involving some secret parameter (the key) while not revealing this parameter to the outside world. The goal of the attacker, conversely, is to recover this secret parameter. The processor is embedded in a chip and connected to the outside world through eight wires whose role, use and position are standardized. In addition to the input/output wires, the parts we are most interested in are the following:

1. Power supply: Smart cards do not have an internal battery. The current they need is provided by the smart card reader. This makes the smart card's power consumption fairly easy for the attacker to measure.

2. Clock: Similarly, smart cards do not have an internal clock either. The clock ticks must also be provided from the outside world. As a consequence, the attacker can measure the card's running time with very good precision.

Smart cards are usually equipped with protection mechanisms composed of a shield (the passivation layer), whose goal is to hide the internal behavior of the chip, and possibly sensors that react when the shield is removed by destroying all sensitive data and preventing the card from functioning properly.

CHAPTER 5 SIDE CHANNEL ATTACKS

"Side channel attacks" are attacks that are based on "side channel information". Side channel information is information that can be retrieved from the encryption device and that is neither the plaintext to be encrypted nor the ciphertext resulting from the encryption process.

In the past, an encryption device was perceived as a unit that receives plaintext input and produces ciphertext output, and vice versa. Attacks were therefore based on knowing the ciphertext (ciphertext-only attacks), on knowing both plaintext and ciphertext (known-plaintext attacks), or on the ability to define what plaintext is to be encrypted and then see the results of the encryption (chosen-plaintext attacks). Today it is known that encryption devices have additional outputs, and often additional inputs, which are neither the plaintext nor the ciphertext.

Encryption devices produce timing information (information about the time that operations take) that is easily measurable, radiation of various sorts, power consumption statistics (which can be easily measured as well), and more. Often the encryption device also has additional "unintentional" inputs, such as the supply voltage, that can be modified to cause predictable outcomes. Side channel attacks make use of some or all of this information, along with other (known) cryptanalytic techniques, to recover the key the device is using.

Side channel analysis techniques are of concern because the attacks can be mounted quickly and can sometimes be implemented using readily available hardware costing from only a few hundred dollars to thousands of dollars.

5.1 Classification of side channel attacks

The literature usually classifies side channel attacks along two orthogonal axes.

1. Invasive vs. non-invasive

Invasive attacks require de-packaging the chip to get direct access to its components. A typical example is connecting a wire to a data bus to observe the data transfers.

A non-invasive attack only exploits externally available information (the emission of which is, however, often unintentional), such as running time or power consumption.

A more recent distinction is the class of semi-invasive attacks. These attacks require de-packaging of the chip to get access to the chip surface but do not tamper with the passivation layer (they do not require electrical contact to the metal surface).

2. Active vs. passive

Active attacks try to tamper with the card's proper functioning. For example, fault induction attacks try to induce errors in the computation.

Passive attacks, in contrast, simply observe the card's behavior during its processing without disturbing it.

Note that these two axes are indeed orthogonal. An invasive attack may completely avoid disturbing the card's behavior, and a passive attack may require a preliminary de-packaging for the required information to be observable.

These attacks are of course not mutually exclusive: an invasive attack may, for example, serve as a preliminary step for a non-invasive one by giving a detailed description of the chip's architecture that helps to find out where to put external probes.

Smart cards are usually equipped with protection mechanisms that are supposed to react to invasive attacks (although several invasive attacks are nonetheless capable of defeating these mechanisms). On the other hand, a non-invasive attack is completely undetectable: there is, for example, no way for a smart card to figure out that its running time is currently being measured. Other countermeasures are therefore necessary. From an economic point of view, invasive attacks are usually more expensive to deploy on a large scale, since they require individual processing of each attacked device. In this sense non-invasive attacks constitute a bigger menace for the smart card industry.

Invasive attacks involve a relatively high capital investment in lab equipment plus a moderate investment of effort for each individual chip attacked. Non-invasive attacks require only a moderate capital investment plus a moderate investment of effort in designing an attack on a particular type of device; thereafter the cost per device attacked is low. Semi-invasive attacks can be carried out using very cheap and simple equipment.

The attacker can gain information by:

1. Probing attacks

2. Fault induction attacks

3. Timing attacks

4. Power analysis attacks and

5. Electromagnetic timing attacks

These attacks exploit the switching behavior of digital complementary metal-oxide-semiconductor (CMOS) gates. Of all these, the power analysis attack is of major concern.

5.2 Power analysis attacks

The power consumption of a cryptographic device may provide much information about the operations that take place and the parameters involved. This is the idea behind simple and differential power analysis, first introduced by Kocher et al. Like the clock ticks, the card's energy is provided by the terminal and can therefore easily be measured. Basically, to measure a circuit's power consumption, a small (e.g. 50 ohm) resistor is inserted in series with the power or ground input. The voltage difference across the resistor divided by the resistance yields the current. Well-equipped electronics labs have equipment that can digitally sample voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than 1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC can be bought for less than US$ 400.
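The measurement described above is plain Ohm's law, and it can be sketched in a few lines of Python. The function names, the 5 V supply and the sample readings are assumptions made for illustration only; they do not come from the report.

```python
def shunt_current(v_drop_volts: float, r_shunt_ohms: float = 50.0) -> float:
    """Ohm's law: supply current (A) from the voltage drop across the series resistor."""
    return v_drop_volts / r_shunt_ohms


def instantaneous_power(v_drop_volts: float, v_supply_volts: float,
                        r_shunt_ohms: float = 50.0) -> float:
    """Power (W) reaching the card: voltage left after the resistor times the current."""
    current = shunt_current(v_drop_volts, r_shunt_ohms)
    return (v_supply_volts - v_drop_volts) * current


# Hypothetical oscilloscope readings of the drop across the resistor, in volts.
readings = [0.10, 0.25, 0.15]
# Assumed 5 V supply; each reading becomes one point of a power trace.
trace = [instantaneous_power(v, v_supply_volts=5.0) for v in readings]
print(trace)
```

Each sampled voltage becomes one point of a power trace, which is the raw material for the SPA and DPA attacks discussed in this chapter.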

Power analysis attacks are of two types:

1. Simple Power Analysis attack and

2. Differential Power Analysis attack

SPA attacks on smart cards typically take a few seconds per card, while DPA attacks can take several hours. In general, and from a somewhat academic perspective, we may consider the entire internal state of the block cipher to be all the intermediate results and values that are never included in the output in normal operation. For example, DES has 16 rounds; we can consider the intermediate states state[1..15] after each round except the last as secret internal state. Side channels typically give information about these internal states, or about the operations used in the transition of this internal state from one round to another. The type of side channel will of course determine what information is available to the attacker about these states. The attacks typically work by finding some information about the internal state of the cipher that can be learned both by guessing part of the key and checking the value directly, and additionally by some statistical property of the cipher that makes that checkable value slightly nonrandom.

5.2.1 Simple Power Analysis attack (SPA)

Simple Power Analysis is generally based on looking at a visual representation of the power consumption of a unit while an encryption operation is being performed. It is a technique that involves direct interpretation of power consumption measurements collected during cryptographic operations. SPA can yield information about a device's operation as well as key material.

A trace refers to a set of power consumption measurements taken across a cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a trace containing 5000 points. The figure, for example, shows an SPA trace from a smart card performing a DES operation.

Figure: SPA monitoring from a single DES operation performed by a typical smart card. The upper trace shows the entire encryption operation, including the initial permutation, the 16 DES rounds, and the final permutation. The lower trace is a detailed view of the second and third rounds.

Because SPA can reveal the sequence of instructions executed, it can be used to break cryptographic implementations in which the execution path depends on the data being processed. For example:

DES key schedule: The DES key schedule computation involves rotating 28-bit key registers. A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will contain different SPA features if the execution paths take different branches for each.

DES permutations: DES implementations perform a variety of bit permutations. Conditional branching in software or microcode can cause significant power consumption differences for "0" and "1" bits.

Comparisons: String or memory comparison operations typically perform a conditional branch when a mismatch is found. This conditional branching causes large SPA (and sometimes timing) characteristics.

Multipliers: Modular multiplication circuits tend to leak a great deal of information about the data they process. The leakage functions depend on the multiplier design but are often strongly correlated to operand values and Hamming weights.

Exponentiators: A simple modular exponentiation function scans across the exponent, performing a squaring operation in every iteration with an additional multiplication operation for each exponent bit that is equal to "1". The exponent can be compromised if squaring and multiplication operations have different power consumption characteristics, take different amounts of time, or are separated by different code. Modular exponentiation functions that operate on two or more exponent bits at a time may have more complex leakage functions.
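The exponentiator case can be illustrated with a small Python sketch. It assumes an idealized attacker who can tell a squaring ("S") from a multiplication ("M") in the trace; the function names and the example numbers are hypothetical and serve only to show why the operation sequence gives away the exponent.

```python
def modexp_with_trace(base: int, exponent: int, modulus: int):
    """Left-to-right square-and-multiply that also records the operation sequence."""
    result = 1
    ops = []
    for bit in bin(exponent)[2:]:          # scan exponent bits, most significant first
        result = (result * result) % modulus
        ops.append("S")                    # a squaring happens in every iteration
        if bit == "1":
            result = (result * base) % modulus
            ops.append("M")                # extra multiplication only for "1" bits
    return result, ops


def recover_exponent(ops) -> int:
    """Rebuild the exponent from the observed sequence: 'S' then 'M' is a 1, a lone 'S' is a 0."""
    bits = []
    for i, op in enumerate(ops):
        if op == "S":
            followed_by_mul = i + 1 < len(ops) and ops[i + 1] == "M"
            bits.append("1" if followed_by_mul else "0")
    return int("".join(bits), 2)


value, ops = modexp_with_trace(7, 45, 1009)   # 45 = 0b101101, chosen arbitrarily
print("".join(ops), recover_exponent(ops))
```

In a real trace the attacker sees power patterns rather than labelled operations, so this only models the best case for the attacker.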

5.2.2 Differential Power Analysis attack (DPA)

In addition to large-scale power variations due to the instruction sequence, there are effects correlated to the data values being manipulated. These variations tend to be smaller and are sometimes overshadowed by measurement errors and other noise. In such cases it is still often possible to break the system using statistical functions tailored to the target algorithm.

To implement the DPA attack, an attacker first observes m encryption operations and captures power traces T_1..m[1..k] containing k samples each. In addition, the attacker records the ciphertexts C_1..m. No knowledge of the plaintext is required. DPA analysis uses power consumption measurements to determine whether a key block guess K_s is correct. The attacker computes a k-sample differential trace Δ_D[1..k] by finding the difference between the average of the traces for which a certain intermediate value V is one and the average of the traces for which V is zero, where V is the value of the target bit b computed by the selection function D(C_i, b, K_s). Thus Δ_D[j] is the average over C_1..m of the effect due to the value represented by the selection function D on the power consumption at point j. In particular,

\[
\Delta_D[j] = \frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}{\sum_{i=1}^{m} D(C_i, b, K_s)} - \frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)}
\]

If K_s is incorrect, the bit computed using D will differ from the actual target bit for about half of the ciphertexts C_i. The selection function is thus effectively uncorrelated to what was actually computed by the target device. If a random function is used to divide a set into two subsets, the difference in the averages of the subsets should approach zero as the subset sizes approach infinity.

Trace components uncorrelated to D thus diminish with 1/√m, causing the differential trace to become flat (the actual trace may not be completely flat, as D with K_s incorrect may have a weak correlation to D with the correct K_s). If K_s is correct, however, the computed value for D(C_i, b, K_s) will equal the actual value of target bit b with probability 1. The selection function is thus correlated to the value of the bit considered. Other data values, measurement errors, etc. that are not correlated to D approach zero. Because power consumption is correlated to data bit values, the plot of Δ_D will be flat with spikes in regions where D is correlated to the values being processed. The correct value of K_s can thus be identified from the spikes in its differential trace. Four values of b correspond to each S box, providing confirmation of key block guesses. Finding all eight K_s yields the entire 48-bit round subkey. The remaining 8 key bits can be found easily using exhaustive search or by analyzing one additional round. Triple DES keys can be found by analyzing an outer DES operation first, using the resulting key to decrypt the ciphertext, and attacking the next DES key. DPA can use known plaintext or known ciphertext and can find encryption or decryption keys.
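The difference-of-means computation can be sketched in Python on synthetic data. This is a toy model under stated assumptions, not the attack on DES itself: each trace is Gaussian noise plus a contribution of the target bit at a single sample (`leak_point`), a correct key guess is modelled by a selection function that reproduces the target bit exactly, and a wrong guess by independent random bits. All names and parameters are hypothetical.

```python
import random


def differential_trace(traces, selection_bits):
    """Mean of the traces selected with bit 1 minus mean of the traces selected with bit 0."""
    k = len(traces[0])
    ones = [t for t, d in zip(traces, selection_bits) if d == 1]
    zeros = [t for t, d in zip(traces, selection_bits) if d == 0]
    return [
        sum(t[j] for t in ones) / len(ones) - sum(t[j] for t in zeros) / len(zeros)
        for j in range(k)
    ]


def simulate(m=2000, k=20, leak_point=7, seed=1):
    """Build m synthetic traces of k samples; the target bit leaks only at leak_point."""
    rng = random.Random(seed)
    target_bits = [rng.randint(0, 1) for _ in range(m)]
    traces = []
    for bit in target_bits:
        trace = [rng.gauss(0.0, 1.0) for _ in range(k)]  # measurement noise
        trace[leak_point] += bit                          # data-dependent power
        traces.append(trace)
    # A wrong key guess is modelled as a selection uncorrelated with the target bit.
    wrong_bits = [rng.randint(0, 1) for _ in range(m)]
    return traces, target_bits, wrong_bits


traces, right_bits, wrong_bits = simulate()
delta_right = differential_trace(traces, right_bits)
delta_wrong = differential_trace(traces, wrong_bits)
print("correct guess, largest |delta|:", max(abs(x) for x in delta_right))
print("wrong guess,   largest |delta|:", max(abs(x) for x in delta_wrong))
```

With the correct selection the differential trace shows a spike at the leaking sample, while the wrong selection stays close to zero everywhere, which is the behaviour the text describes.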

CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES

The power consumption of traditional standard cells and logic depends on the signal activity. When the output of a logic gate makes a 0 to 1 transition, a current flows from the power supply and charges the output capacitance. On the other hand, when the output sees a 1 to 0, a 0 to 0 or a 1 to 1 transition, no energy, or only a limited amount of energy (due to short circuit or leakage), is consumed from the power supply. This is the fundamental reason why information is leaked through the power supply and why power attacks are possible. The basis of a secure digital design flow is a logic style with constant power consumption.
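A minimal Python sketch of this model: only 0 to 1 output transitions are counted as drawing charge from the supply, and short-circuit and leakage currents are ignored. The function name and the two example sequences are assumptions made for illustration.

```python
def charging_events(outputs):
    """Count the 0 -> 1 transitions in a sequence of gate output values."""
    return sum(1 for prev, cur in zip(outputs, outputs[1:]) if (prev, cur) == (0, 1))


# Two hypothetical output sequences of the same length but with different data.
busy = [0, 1, 0, 1, 0, 1]
quiet = [0, 0, 0, 0, 1, 1]
print(charging_events(busy), charging_events(quiet))
```

The two sequences draw charge a different number of times, so the supply current depends on the data; this is the dependence that power analysis exploits.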

6.1 Current Mode Logic

Current mode logic (CML), e.g. current steering logic, seems the ideal solution. This type of logic continuously draws a current from the supply and measures its state through the path that the current takes. A gate has constant power consumption if it draws a perfectly constant current from the power supply independently of the input and output signals. To build a current source capable of generating a constant current, special circuit techniques that minimize channel length modulation have to be used.

The decisive drawback of CML, however, is its static power consumption. Even when the logic gate is not processing any data it burns current, which makes this logic style unacceptable for embedded battery-operated devices.

6.2 Voltage Mode Logic (CMOS circuit styles)

Voltage mode logic (VML), e.g. static CMOS logic, only draws a current from the supply to change state and measures its state by the amount of charge it stores on a capacitance. A regular standard CMOS circuit only consumes power when a capacitance gets charged and later discharged, i.e. when a gate switches state. This is the main reason that CMOS is the style of choice for every battery-operated or low-power device. It is illustrated in the figure below for a simple inverter. Static CMOS is thus the preferred logic style because of its low power consumption and high noise margins.

Figure: Standard CMOS inverter

Yet two conditions must be satisfied for VML to have constant power consumption, namely:

1) A logic gate must have exactly one switching event per signal transition.

2) The logic gate must charge a constant capacitance in that switching event.

As shown above, the four output transitions of the CMOS inverter can be distinguished when monitoring the power supply.

6.3 Dynamic Differential Logic

Dynamic differential logic, sometimes also referred to as dual rail with pre-charge logic, fulfills the first condition. A differential logic family uses the true and the false representation of the input and output signals, and a dynamic logic family alternates pre-charge and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1, exactly one of the two output nodes evaluates to 0 in the evaluation phase to produce a differential output signal. The discharged output node is charged to 1 again in the following pre-charge phase, so that both outputs are pre-charged to 1. In other words, every signal transition, including the events in which the input signals remain constant, is represented by an actual switching event in which the logic gate charges a capacitance. The logic families that have been introduced to thwart differential power analysis (DPA) use dynamic differential logic in the following techniques:

1. Sense Amplifier Based Logic (SABL) and

2. Wave Dynamic Differential Logic (WDDL) gates
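The pre-charge and evaluation behaviour described above can be sketched in Python as follows. The model is an idealization: both rails are pre-charged to 1, exactly one rail discharges during evaluation, and a rail counts as one charging event when it is brought back to 1. The function name is hypothetical.

```python
def dual_rail_charging_events(bits):
    """For each evaluated bit, count the rails that must be recharged in the next pre-charge."""
    events = []
    for bit in bits:
        # Evaluation phase: starting from (1, 1), exactly one rail discharges to 0.
        true_rail, false_rail = (1, 0) if bit else (0, 1)
        # Following pre-charge phase: every rail sitting at 0 is charged back to 1.
        events.append((1 - true_rail) + (1 - false_rail))
    return events


print(dual_rail_charging_events([0, 0, 1, 1, 0]))
```

Whatever the data, exactly one rail is recharged per cycle, including the cycles in which the logical value does not change. That satisfies the first condition of section 6.2; the second condition (equal capacitance on both rails) is a layout matter that this sketch does not model.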

6.3.1 Sense Amplifier Based Logic (SABL)

The main advantage of SABL is that it has balanced input and output nodes and that all internal nodes connect to an output. The output capacitances can be balanced. Systematic methods have been developed to make sure that both branches of the differential pull-down network are balanced and that no memory effects are present in the network. Sense Amplifier Based Logic is illustrated below.

Figure: Sense Amplifier Based Logic AND/NAND gate

This circuit style does, however, require full custom characterization and layout. It also suffers from a high clock load, which is common to all dynamic logic gates.

6.3.2 Wave Dynamic Differential Logic Gates (WDDL)

WDDL can be implemented with static CMOS logic. Static CMOS standard cells are combined to form secure compound standard cells, which have a reduced power signature. WDDL has many advantages. It can be readily implemented from an existing standard cell library. The design flow is fully supported with accurate EDA library files that come directly from the vendor. WDDL also results in a dynamic differential logic with only a small load capacitance on the pre-charge control signal and with the low power consumption and the high noise margins of static CMOS.

Advantages of the WDDL logic style are as follows:

A major advantage of the proposed logic style is that it can be incorporated into the common Electronic Design Automation (EDA) tool flow.

No special design rules are involved in the interconnection of WDDL gates.

The switching factor of WDDL is 100%.

A WDDL gate consists of a parallel combination of two positive complementary gates, one calculating the true output using the true inputs, the other the false output using the false inputs. A positive gate produces a zero output for an all-zero input. The AND gate and the OR gate are examples of positive gates. A complementary gate, sometimes also referred to as a dual gate, expresses the false output of the original logic gate using the false inputs of the original gate. The AND gate fed with true input signals and the OR gate fed with false input signals are two dual gates. The figure shows the WDDL AND gate and the WDDL OR gate. In the evaluation phase each input signal is differential and the WDDL gate calculates its differential output. In the pre-charge phase the inputs to the WDDL gate are set at 0. This puts the output of the gate at 0.

A module in WDDL pre-charges without distributing the pre-charge signal to each individual gate. During the pre-charge phase the input vector of the combinatorial logic is set at all 0s. Each individual gate will eventually have all its inputs at 0, evaluate its output to 0 and pass this 0 value to the next gate. One could say that the pre-charge signal travels over the combinatorial logic as a 0-wave, hence the name WDDL. There are several ways to launch the pre-charge wave. In the figure, a pre-charge operator is inserted at the start of every combinatorial logic tree, i.e. at the inputs of the encryption module and at the outputs of the registers. These operators produce an all-zero output in the pre-charge phase (clk signal high) but let the differential signal through during the evaluation phase (clk signal low).
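The behaviour described above can be modelled in Python with each WDDL signal as a pair (true rail, false rail). The function names, the `encode` helper and the small two-gate tree are assumptions made for illustration; the sketch models only logic values, not timing or capacitance.

```python
def encode(bit):
    """Differential encoding used in the evaluation phase: (true rail, false rail)."""
    return (bit, 1 - bit)


def precharge(signal, clk):
    """Pre-charge operator: all-zero output while clk is high, pass-through while clk is low."""
    return (0, 0) if clk else signal


def wddl_and(a, b):
    """AND on the true rails in parallel with its dual, OR, on the false rails."""
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a, b):
    """OR on the true rails in parallel with its dual, AND, on the false rails."""
    return (a[0] | b[0], a[1] & b[1])


def tree(x, y, z, clk):
    """Hypothetical logic tree (x AND y) OR z with pre-charge operators only at its inputs."""
    xs, ys, zs = (precharge(encode(v), clk) for v in (x, y, z))
    return wddl_or(wddl_and(xs, ys), zs)


print(tree(1, 0, 1, clk=1))  # pre-charge phase: the 0-wave reaches the output
print(tree(1, 0, 1, clk=0))  # evaluation phase: differential result
```

Because both halves of each gate are positive gates, forcing only the tree inputs to 0 is enough to drive every internal node to 0, so no pre-charge signal has to be routed to the individual gates.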

Figure: WDDL pre-charge wave generation

CHAPTER 7 WDDL GATES

The methodology used in the project is the bottom-up approach. Lower modules are designed first and later integrated to form larger modules, whose further integration leads to the final top module. Since logic gates form the lowest-level modules, the logic gates required for the design are first implemented in WDDL style. WDDL demands a parallel combination of two positive complementary gates, one calculating the true value and the other the false value. Logic gates such as OR, AND and XOR have been implemented. In addition, a full adder, a 32-bit XOR, etc. have been implemented.

7.1 WDDL OR gate

A WDDL OR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.

Figure 4.1 WDDL OR Gate

The NAND and NOT gates used act as the Precharge operator, injecting the signal '0' into the gates when the 'clk' signal is high, i.e. during the Precharge phase.

7.2 WDDL AND gate

A WDDL AND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.

Figure 4.2 WDDL AND Gate

7.3 WDDL NAND gate

A WDDL NAND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.

Figure 4.3 WDDL NAND Gate

7.4 WDDL NOR gate

A WDDL NOR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.

Figure 4.4 WDDL NOR Gate

7.5 WDDL XOR gate

The XOR function can be implemented by the Boolean equation A ⊕ B = AB' + A'B. The XOR gate is therefore implemented directly in terms of AND gates and OR gates, as shown in Figure 4.5. It can also be implemented by instantiating WDDL AND gates and WDDL OR gates, but the number of gates involved in the latter method is greater than in the former. Therefore the first method of implementation is followed rather than the second.
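A possible Python model of the XOR equation in dual-rail form is sketched below. It assumes that the complemented inputs A' and B' are taken from the false rails, so that no inverter is needed, and that the false output is computed as the XNOR, AB + A'B'; the report does not spell out these details, so they are assumptions. The 32-bit helper mirrors the idea of instantiating the one-bit gate 32 times.

```python
def encode(bit):
    """Differential encoding of one bit as (true rail, false rail)."""
    return (bit, 1 - bit)


def wddl_xor(a, b):
    """Dual-rail XOR built only from AND and OR terms on the available rails."""
    a_t, a_f = a
    b_t, b_f = b
    true_out = (a_t & b_f) | (a_f & b_t)    # A B' + A' B
    false_out = (a_t & b_t) | (a_f & b_f)   # A B + A' B', the complement of XOR
    return (true_out, false_out)


def wddl_xor32(a_word, b_word):
    """32-bit XOR: the one-bit gate applied once per bit position."""
    return [wddl_xor(a, b) for a, b in zip(a_word, b_word)]


def encode_word(value, width=32):
    """Encode an integer as a list of rail pairs, least significant bit first."""
    return [encode((value >> i) & 1) for i in range(width)]


def decode_word(word):
    """Read the integer back from the true rails."""
    return sum(t << i for i, (t, _f) in enumerate(word))


result = wddl_xor32(encode_word(0x12345678), encode_word(0x0F0F0F0F))
print(hex(decode_word(result)))
```

Both outputs are built from positive terms only, so all-zero inputs in the pre-charge phase give an all-zero output, as WDDL requires.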

Figure 4.5 WDDL XOR gate

With the help of the above basic gates, a full adder circuit has been designed by instantiating the WDDL gates designed above. The implementation of the Blowfish algorithm requires a 32-bit XOR gate and a 32-bit adder circuit. They can be easily implemented by instantiating the corresponding lower module 32 times.

CHAPTER 8 FRONT END

RESULTS

WDDL OR GATE

Synthesis Report
==========================================================
Final Report
==========================================================
Final Results
RTL Top Level Output File Name   : wddlor.ngr
Top Level Output File Name       : wddlor
Output Format                    : NGC
Optimization Goal                : Speed
Keep Hierarchy                   : NO

Design Statistics
IOs                              : 5

Cell Usage
BELS                             : 2
  LUT3                           : 2
IO Buffers                       : 5
  IBUF                           : 3
  OBUF                           : 2
==========================================================
Device utilization summary
---------------------------
Selected Device                  : 3s250eft256-4
Number of Slices                 : 1 out of 2448   0%
Number of 4 input LUTs           : 2 out of 4896   0%
Number of IOs                    : 5
Number of bonded IOBs            : 5 out of 172    2%

Timing Summary
---------------
Speed Grade                      : -4
Maximum combinational path delay : 6.236ns

[Figure: Simulation Result]
[Figure: Synthesis Result]

WDDL AND GATE

Synthesis Report
==========================================================
Final Report
==========================================================
Final Results
RTL Top Level Output File Name   : wddlgates.ngr
Top Level Output File Name       : wddlgates
Output Format                    : NGC
Optimization Goal                : Speed
Keep Hierarchy                   : NO

Design Statistics
IOs                              : 5

Cell Usage
BELS                             : 2
  LUT3                           : 2
IO Buffers                       : 5
  IBUF                           : 3
  OBUF                           : 2
==========================================================
Device utilization summary
---------------------------
Selected Device                  : 3s250etq144-4
Number of Slices                 : 1 out of 2448   0%
Number of 4 input LUTs           : 2 out of 4896   0%
Number of IOs                    : 5
Number of bonded IOBs            : 5 out of 108    4%

Timing Summary
---------------
Speed Grade                      : -4
Maximum combinational path delay : 6.236ns

[Figure: Simulation Result]
[Figure: Synthesis Result]

WDDL NAND GATE

Synthesis Report
==========================================================
Final Report
==========================================================
Final Results
RTL Top Level Output File Name   : wddlnand1.ngr
Top Level Output File Name       : wddlnand1
Output Format                    : NGC
Optimization Goal                : Speed
Keep Hierarchy                   : NO

Design Statistics
IOs                              : 5

Cell Usage
BELS                             : 2
  LUT3                           : 2
IO Buffers                       : 5
  IBUF                           : 3
  OBUF                           : 2
==========================================================
Device utilization summary
---------------------------
Selected Device                  : 3s500efg320-4
Number of Slices                 : 1 out of 4656   0%
Number of 4 input LUTs           : 2 out of 9312   0%
Number of IOs                    : 5
Number of bonded IOBs            : 5 out of 232    2%

Timing Summary
---------------
Speed Grade                      : -4
Maximum combinational path delay : 6.236ns

[Figure: Simulation Result]
[Figure: Synthesis Result]

WDDL XOR GATE

Figure: Simulation Result

Figure: Synthesis Result

WDDL XOR GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name : wddlxorgate.ngr
Top Level Output File Name : wddlxorgate
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 5

Cell Usage
BELS : 2
  LUT3 : 2
IO Buffers : 5
  IBUF : 3
  OBUF : 2

Device utilization summary
Selected Device : 3s250eft256-4
Number of Slices : 1 out of 2448 (0%)
Number of 4 input LUTs : 2 out of 4896 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 172 (2%)

Timing Summary
Speed Grade : -4
Maximum combinational path delay : 6.236 ns

Figure: Simulation Result

Figure: Synthesis Result

CHAPTER 9 SUMMARY AND CONCLUSION

9.1 Summary

To secure ICs against side-channel attacks, especially Differential Power Analysis (DPA), the design must be implemented in a logic style whose power dissipation is constant irrespective of the input combination. WDDL has been shown to be advantageous over the other logic styles and is therefore of great significance. In this dissertation work, an architecture for the Blowfish algorithm is designed and implemented in WDDL style. A bottom-up approach is used in this implementation: the low-level entities are designed first and are later combined to form the entire module. The key scheduling is online. The sub-keys generated for a particular key can be used to encrypt all the data to be encrypted with that key. The sub-keys are applied in reverse order for the decryption data path. Initially, logic gates are implemented in WDDL, and the higher modules are then designed by instantiating the WDDL gates to form the entire module, resulting in constant power dissipation irrespective of the input data combination. The entire design works in two phases, namely the Precharge phase and the Evaluation phase. In the Precharge phase all the signals of the design are zeroed, and during the Evaluation phase the functionality of the design is achieved. This kind of design has been found simple and very effective in thwarting the side-channel attack known as Differential Power Analysis (DPA).

9.2 Conclusion

The crypto processor has been designed for a key size of 448 bits and a plaintext block of 64 bits. The code for the implementation has been written in VHDL. Functional verification has been done using the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The back end of the design is done using SOC Encounter. The desired functionality has been achieved according to the specifications. In the output, during the Evaluation phase, the number of transitions has been the same, resulting in constant power dissipation. During synthesis it has been observed that a simple WDDL gate comprises many conventional gates. Therefore the area of the design has grown nearly three-fold compared with the same design implemented in conventional CMOS logic; this is the cost of the security incorporated into the IC against Differential Power Analysis (DPA). Because the power dissipation at the output is constant, an attacker cannot apply DPA to find the secret key that is being used in the crypto processor. Thus security against DPA is incorporated into the IC at the hardware level by implementing the design in WDDL style, which is quite simple and effective.

CHAPTER 10 REFERENCES

10.1 Referred Technical Papers

[1] Kris Tiri and Ingrid Verbauwhede, "A Digital Design Flow for Secure Integrated Circuits", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 25, No. 7, July 2006.
[2] Jean-Jacques Quisquater, Math RiZK, "Side Channel Attacks", October 2002.
[3] Ross Anderson, Mike Bond, Jolyon Clulow and Sergei Skorobogatov, "Crypto processors - A Survey", IEEE Proceedings.
[4] Noohul Basheer Zain Ali, James M. Noras, "Optimal Data path Design for a Cryptographic Processor: The Blowfish Algorithm", Malaysian Journal of Computer Science, Vol. 14, No. 1, June 2001, pp. 16-27.
[5] Dan Rinehimer, Derek Wilson, "Summary of B. Schneier's Description of New Variable Length Key, 64-Bit Block Cipher (Blowfish)".
[6] Kris Tiri and Ingrid Verbauwhede, "A Dynamic and Differential CMOS Logic Style to Resist Power and Timing Attacks on Security IC's".
[7] Discretix Technologies Limited, "Introduction to Side Channel Attacks".
[8] Kris Tiri, Moonmoon Akmal and Ingrid Verbauwhede, "A Dynamic and Differential Logic with Signal Independent Power Consumption to withstand Differential Power Analysis on Smart Cards".

Reference books

[1] William Stallings, "Cryptography and Network Security: Principles and Practices", Pearson Education, 2003.
[2] Samir Palnitkar, "Verilog HDL: A Guide to Digital Design and Synthesis", Prentice Hall.

Referred Web Pages

[1] http://www.schneier.com/blowfish.html
[2] http://www.nist.gov
[3] http://www.discretix.com/PDF/Introduction%20to%20Side%20Channel%20Attacks.pdf
[4] http://www.wipo.int/pctdb/en/wo.jsp?IA=WO2005081085&DISPLAY=CLAIMS


Process keyword in VHDL

A process statement is a concurrent statement.

Statements inside a process statement are sequential statements.

A process must contain either a sensitivity list or wait statement(s), but NOT both.

The sensitivity list or wait statement(s) name the signals that wake the process up.

General Format

process [(sensitivity_list)]
    process_declarative_part
begin
    process_statements
    [wait_statement]
end process;

CHAPTER 4 SMART CARD OVERVIEW

This section very briefly introduces the concept of a smart card. Basically, a smart card is a computer embedded in a safe. It consists of a (typically 8-bit or 32-bit) processor together with ROM, EEPROM and a small amount of RAM, and is therefore capable of performing computations. The main goal of a smart card is to allow the execution of cryptographic operations involving some secret parameter (the key) while not revealing this parameter to the outside world. Conversely, the goal of the attacker is to recover this secret parameter. The processor is embedded in a chip and connected to the outside world through eight wires, whose role, use and position are standardized. In addition to the input/output wires, the parts of most interest here are the following:

1. Power supply: Smart cards do not have an internal battery. The current they need is provided by the smart card reader. This makes the smart card's power consumption fairly easy for an attacker to measure.

2. Clock: Similarly, smart cards do not have an internal clock. The clock ticks must also be provided from the outside world. As a consequence, the attacker can measure the card's running time with very good precision.

Smart cards are usually equipped with protection mechanisms composed of a shield (the passivation layer), whose goal is to hide the internal behavior of the chip, and possibly sensors that react when the shield is removed by destroying all sensitive data and preventing the card from functioning properly.

CHAPTER 5 SIDE CHANNEL ATTACKS

"Side channel attacks" are attacks that are based on "side channel information". Side channel information is information that can be retrieved from the encryption device and that is neither the plaintext to be encrypted nor the ciphertext resulting from the encryption process.

In the past, an encryption device was perceived as a unit that receives plaintext input and produces ciphertext output, and vice versa. Attacks were therefore based on knowing the ciphertext (such as ciphertext-only attacks), on knowing both plaintext and ciphertext (such as known-plaintext attacks), or on the ability to define what plaintext is to be encrypted and then seeing the results of the encryption (known as chosen-plaintext attacks). Today it is known that encryption devices have additional outputs, and often additional inputs, which are neither the plaintext nor the ciphertext.

Encryption devices produce timing information (information about the time that operations take) that is easily measurable, radiation of various sorts, power consumption statistics (that can be easily measured as well), and more. Often the encryption device also has additional "unintentional" inputs, such as voltage, that can be modified to cause predictable outcomes. Side channel attacks make use of some or all of this information, along with other (known) cryptanalytic techniques, to recover the key the device is using.

Side channel analysis techniques are of concern because the attacks can be mounted quickly and can sometimes be implemented using readily available hardware costing from only a few hundred dollars to thousands of dollars.

5.1 Classification of side channel attacks

The literature usually classifies side channel attacks along two orthogonal axes.

1. Invasive vs. non-invasive

Invasive attacks require de-packaging the chip to get direct access to its components. A typical example is connecting a wire to a data bus to observe the data transfers.

A non-invasive attack only exploits externally available information (the emission of which is, however, often unintentional), such as running time or power consumption.

A further distinction is the class of semi-invasive attacks. These attacks require de-packaging of the chip to get access to the chip surface, but they do not tamper with the passivation layer (they do not require electrical contact to the metal surface).

2. Active vs. passive

Active attacks try to tamper with the card's proper functioning. For example, fault induction attacks try to induce errors in the computation.

In contrast, passive attacks simply observe the card's behavior during its processing, without disturbing it.

Note that these two axes are indeed orthogonal: an invasive attack may completely avoid disturbing the card's behavior, and a passive attack may require a preliminary de-packaging for the required information to be observable.

These attacks are of course not mutually exclusive: an invasive attack may, for example, serve as a preliminary step for a non-invasive one by giving a detailed description of the chip's architecture that helps to find out where to put external probes.

Smart cards are usually equipped with protection mechanisms that are supposed to react to invasive attacks (although several invasive attacks are nonetheless capable of defeating these mechanisms). On the other hand, a non-invasive attack is completely undetectable: there is, for example, no way for a smart card to figure out that its running time is currently being measured. Other countermeasures are therefore necessary. From an economic point of view, invasive attacks are usually more expensive to deploy on a large scale, since they require individual processing of each attacked device. In this sense, non-invasive attacks constitute a bigger menace for the smart card industry.

Invasive attacks involve a relatively high capital investment for lab equipment plus a moderate investment of effort for each individual chip attacked. Non-invasive attacks require only a moderate capital investment plus a moderate investment of effort in designing an attack on a particular type of device; thereafter, the cost per device attacked is low. Semi-invasive attacks can be carried out using very cheap and simple equipment.

The attacker can gain information by:

1. Probing attacks
2. Fault induction attacks
3. Timing attacks
4. Power analysis attacks
5. Electromagnetic timing attacks

These attacks are performed while the digital complementary metal-oxide-semiconductor (CMOS) gates are switching. Of all these, the power analysis attack is of major concern.

5.2 Power analysis attacks

The power consumption of a cryptographic device may provide much information about the operations that take place and the parameters involved. This is the idea of simple and differential power analysis, first introduced by Kocher et al. Like the clock ticks, the card's energy is provided by the terminal and can therefore easily be measured. Basically, to measure a circuit's power consumption, a small (e.g. 50 ohm) resistor is inserted in series with the power or ground input. The voltage difference across the resistor divided by the resistance yields the current. Well-equipped electronics labs have equipment that can digitally sample voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than 1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC can be bought for less than US$ 400.

Power analysis attacks are of two types:

1. Simple Power Analysis (SPA) attack
2. Differential Power Analysis (DPA) attack

SPA attacks on smart cards typically take a few seconds per card, while DPA attacks can take several hours. In general, from a somewhat academic perspective, we may consider the entire internal state of the block cipher to be all the intermediate results and values that are never included in the output in normal operation. For example, DES has 16 rounds; we can consider the intermediate states state[1..15] after each round except the last as a secret internal state. Side channels typically give information about these internal states, or about the operations used in the transition of this internal state from one round to another. The type of side channel will of course determine what information is available to the attacker about these states. The attacks typically work by finding some information about the internal state of the cipher that can be learned both by guessing part of the key and checking the value directly, and additionally by some statistical property of the cipher that makes that checkable value slightly non-random.

5.2.1 Simple Power Analysis attack (SPA)

Simple Power Analysis is a technique that involves direct interpretation of power consumption measurements collected during cryptographic operations, generally by looking at a visual representation of the power consumption of a unit while an encryption operation is being performed. SPA can yield information about a device's operation as well as key material.

A trace refers to a set of power consumption measurements taken across a cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a trace containing 5000 points. The figure, for example, shows an SPA trace from a smart card performing a DES operation.

Figure: SPA monitoring from a single DES operation performed by a typical smart card. The upper trace shows the entire encryption operation, including the initial permutation, the 16 DES rounds and the final permutation. The lower trace is a detailed view of the second and third rounds.
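The measurement arithmetic described above can be sketched in a few lines. This is only an illustration: the function names and the 100 mV voltage drop are assumptions made for the example, not values from the report.

```python
def samples_in_trace(duration_s: float, sample_rate_hz: float) -> int:
    """Number of points in a trace of the given duration and sample rate."""
    return round(duration_s * sample_rate_hz)


def shunt_current(voltage_drop_v: float, resistance_ohm: float) -> float:
    """Supply current inferred from the voltage across the series resistor (Ohm's law)."""
    return voltage_drop_v / resistance_ohm


# A 1 ms operation sampled at 5 MHz, as in the text.
print(samples_in_trace(1e-3, 5e6))

# Hypothetical reading: a 100 mV drop across the 50 ohm resistor.
print(shunt_current(0.1, 50.0))
```

Each point of a trace is one such current (or voltage) reading, so the trace length is simply the operation time multiplied by the sampling rate.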

Because SPA can reveal the sequence of instructions executed, it can be used to break cryptographic implementations in which the execution path depends on the data being processed. For example:

DES key schedule: The DES key schedule computation involves rotating 28-bit key registers. A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will contain different SPA features if the execution paths take different branches for each.

DES permutations: DES implementations perform a variety of bit permutations. Conditional branching in software or microcode can cause significant power consumption differences for "0" and "1" bits.

Comparisons: String or memory comparison operations typically perform a conditional branch when a mismatch is found. This conditional branching causes large SPA (and sometimes timing) characteristics.

Multipliers: Modular multiplication circuits tend to leak a great deal of information about the data they process. The leakage functions depend on the multiplier design, but are often strongly correlated to operand values and Hamming weights.

Exponentiators: A simple modular exponentiation function scans across the exponent, performing a squaring operation in every iteration with an additional multiplication operation for each exponent bit that is equal to "1". The exponent can be compromised if squaring and multiplication operations have different power consumption characteristics, take different amounts of time, or are separated by different code. Modular exponentiation functions that operate on two or more exponent bits at a time may have more complex leakage functions.
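The exponentiator case can be made concrete with a small sketch. It assumes an idealized attacker who can tell squarings ("S") from multiplications ("M") perfectly in the trace; the function names and the operation log are illustrative only and do not come from the report.

```python
def square_and_multiply(base: int, exponent: int, modulus: int):
    """Left-to-right binary exponentiation that also logs the operation sequence."""
    result = 1
    ops = []
    for bit in bin(exponent)[2:]:          # scan exponent bits, MSB first
        result = (result * result) % modulus
        ops.append("S")                    # a squaring happens for every bit
        if bit == "1":
            result = (result * base) % modulus
            ops.append("M")                # extra multiply only for "1" bits
    return result, "".join(ops)


def exponent_from_ops(ops: str) -> int:
    """Recover the exponent from an observed S/M sequence."""
    bits = ops.replace("SM", "1").replace("S", "0")
    return int(bits, 2)


value, observed = square_and_multiply(7, 11, 1000)
print(observed)                      # -> SMSSMSM for exponent 11 (binary 1011)
print(exponent_from_ops(observed))   # -> 11
```

Every "SM" pair corresponds to a "1" bit and every lone "S" to a "0" bit, which is why distinguishable squaring and multiplication operations give away the exponent.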

5.2.2 Differential Power Analysis attack (DPA)

In addition to large-scale power variations due to the instruction sequence, there are effects correlated to the data values being manipulated. These variations tend to be smaller and are sometimes overshadowed by measurement errors and other noise. In such cases, it is still often possible to break the system using statistical functions tailored to the target algorithm.

To implement the DPA attack, an attacker first observes $m$ encryption operations and captures power traces $T_{1..m}[1..k]$ containing $k$ samples each. In addition, the attacker records the ciphertexts $C_{1..m}$. No knowledge of the plaintext is required. DPA uses power consumption measurements to determine whether a key block guess $K_s$ is correct. A selection function $D(C_i, b, K_s)$ computes the value $V$ of an intermediate target bit $b$ from the ciphertext $C_i$ and the key guess $K_s$. The attacker computes a $k$-sample differential trace $\Delta_D[1..k]$ by finding the difference between the average of the traces for which $V$ is one and the average of the traces for which $V$ is zero. Thus $\Delta_D[j]$ is the average over $C_{1..m}$ of the effect due to the value represented by the selection function $D$ on the power consumption at point $j$. In particular,

$$\Delta_D[j] = \frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}{\sum_{i=1}^{m} D(C_i, b, K_s)} - \frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)}$$

If $K_s$ is incorrect, the bit computed using $D$ will differ from the actual target bit for about half of the ciphertexts $C_i$. The selection function is thus effectively uncorrelated to what was actually computed by the target device. If a random function is used to divide a set into two subsets, the difference in the averages of the subsets should approach zero as the subset sizes approach infinity.

Thus, for an incorrect $K_s$, $\Delta_D[j]$ tends to zero, because trace components uncorrelated to $D$ diminish with $1/\sqrt{m}$, causing the differential trace to become flat (the actual trace may not be completely flat, as $D$ with $K_s$ incorrect may have a weak correlation to $D$ with the correct $K_s$). If $K_s$ is correct, however, the computed value for $D(C_i, b, K_s)$ will equal the actual value of target bit $b$ with probability 1. The selection function is thus correlated to the value of the bit considered. Other data values, measurement errors, etc. that are not correlated to $D$ approach zero. Because power consumption is correlated to data bit values, the plot of $\Delta_D$ will be flat, with spikes in regions where $D$ is correlated to the values being processed. The correct value of $K_s$ can thus be identified from the spikes in its differential trace. Four values of $b$ correspond to each S-box, providing confirmation of key block guesses. Finding all eight $K_s$ yields the entire 48-bit round sub-key. The remaining 8 key bits can be found easily using exhaustive search or by analyzing one additional round. Triple DES keys can be found by analyzing an outer DES operation first, using the resulting key to decrypt the ciphertext, and attacking the next DES key. DPA can use known plaintext or known ciphertext and can find encryption or decryption keys.
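The difference-of-means procedure above can be tried on synthetic data. The following is a minimal sketch under strong assumptions, none of which come from the report: the target is a hypothetical 8-bit S-box (a seeded random permutation, not DES), each "trace" is a single sample equal to the target bit plus Gaussian noise, and a "1" bit is assumed to draw more power than a "0" bit.

```python
import random


def make_sbox(seed: int = 1) -> list:
    """Hypothetical 8-bit S-box: a seeded random permutation (not DES)."""
    rng = random.Random(seed)
    sbox = list(range(256))
    rng.shuffle(sbox)
    return sbox


def selection_bit(sbox, data: int, key_guess: int) -> int:
    """Selection function D: predicted target bit for a given key guess."""
    return sbox[data ^ key_guess] & 1


def simulate_traces(sbox, key: int, m: int, noise: float, seed: int = 2):
    """Single-sample traces: target bit plus Gaussian noise (assumed leakage model)."""
    rng = random.Random(seed)
    data = [rng.randrange(256) for _ in range(m)]
    traces = [selection_bit(sbox, d, key) + rng.gauss(0.0, noise) for d in data]
    return data, traces


def differential(sbox, data, traces, key_guess: int) -> float:
    """Mean of the traces with D = 1 minus mean of the traces with D = 0."""
    ones = [t for d, t in zip(data, traces) if selection_bit(sbox, d, key_guess)]
    zeros = [t for d, t in zip(data, traces) if not selection_bit(sbox, d, key_guess)]
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)


def recover_key(sbox, data, traces) -> int:
    """Pick the key guess that produces the largest differential spike."""
    return max(range(256), key=lambda k: differential(sbox, data, traces, k))


SBOX = make_sbox()
SECRET_KEY = 0x3C                      # illustrative secret, unknown to the attacker
DATA, TRACES = simulate_traces(SBOX, SECRET_KEY, m=2000, noise=0.5)
RECOVERED = recover_key(SBOX, DATA, TRACES)
print(hex(RECOVERED))
```

For the correct guess the two subsets are split exactly by the leaking bit, so the differential stays close to the full leakage amplitude, while for wrong guesses the partition is nearly random and the differential shrinks toward zero as more traces are averaged.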


CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES

The power consumption of traditional standard cells and logic is dependent on the signal activity. When the output of a logic gate makes a 0 to 1 transition, a current comes from the power supply and charges the output capacitance. On the other hand, when the output sees a 1 to 0, a 0 to 0 or a 1 to 1 transition, no energy or only a limited amount of energy (due to short circuit or leakage) is consumed from the power supply. This is the fundamental reason why information is leaked through the power supply and why power attacks are possible. The basis of a secure digital design flow is a logic style with constant power consumption.
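The data dependence described above can be illustrated with a simplified model. The sketch assumes an ideal static CMOS output that draws C * Vdd^2 from the supply on each 0 to 1 transition and nothing otherwise (short-circuit and leakage currents are ignored); the function name and the unit values of capacitance and voltage are illustrative assumptions.

```python
def supply_energy(output_sequence, c_load=1.0, vdd=1.0):
    """Energy drawn from the supply by one gate output over a sequence of values.

    Simplified model: only a 0 -> 1 transition charges the load capacitance
    from the supply (drawing C * Vdd^2); all other transitions draw nothing.
    """
    energy = 0.0
    for prev, new in zip(output_sequence, output_sequence[1:]):
        if prev == 0 and new == 1:
            energy += c_load * vdd ** 2
    return energy


# Two output sequences of the same length draw different supply energy.
print(supply_energy([0, 1, 0, 1]))
print(supply_energy([1, 1, 0, 0]))
```

Because the energy depends on which values the output takes, an observer of the supply current learns something about the processed data, which is exactly what a constant power logic style has to prevent.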

6.1 Current Mode Logic

Current mode logic (CML), e.g. current steering logic, seems the ideal solution. This type of logic continuously draws a current from the supply and measures its state through the path that the current takes. A gate has constant power consumption if it draws a perfectly constant current from the power supply, independently of the input and output signals. To build a current source capable of generating a constant current, special circuit techniques that minimize channel length modulation have to be used.

The decisive drawback of CML, however, is its static power consumption. Even when the logic gate is not processing any data, it burns the current, which makes this logic style unacceptable for embedded battery-operated devices.

6.2 Voltage Mode Logic (CMOS circuit styles)

Voltage mode logic (VML), e.g. static CMOS logic, only draws a current from the supply to change state and measures its state by the amount of charge it stores on a capacitance. A regular standard CMOS circuit only consumes power when a capacitance gets charged and later discharged, i.e. when a gate switches state. This is the main reason that CMOS is the style of choice for every battery-operated or low-power device. This is illustrated in the figure below for a simple inverter. Thus static CMOS is the preferred logic style because of its low power consumption and high noise margins.

Figure: Standard CMOS inverter

Yet two conditions must be satisfied for VML to have constant power consumption, namely:

1) A logic gate must have exactly one switching event per signal transition.

2) The logic gate must charge a constant capacitance in that switching event.

For the CMOS inverter above, all four transitions can be distinguished when monitoring the power supply.

6.3 Dynamic Differential Logic

Dynamic differential logic, sometimes also referred to as dual rail with pre-charge logic, fulfills the first condition. A differential logic family uses the true and the false representation of the input and output signals, and a dynamic logic family alternates pre-charge and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1, exactly one of the two output nodes evaluates to 0 to produce a differential output signal in the evaluation phase. The discharged output node is charged to 1 in the following pre-charge phase, so that both outputs are pre-charged to 1 again. In other words, every signal transition, including the events in which the input signals remain constant, is represented by an actual switching event in which the logic gate charges a capacitance. The logic families that have been introduced to thwart differential power analysis (DPA) using dynamic differential logic are the following:

1. Sense Amplifier Based Logic (SABL)
2. Wave Dynamic Differential Logic (WDDL) gates

6.3.1 Sense Amplifier Based Logic (SABL)

The main advantage of SABL is that it has balanced input and output nodes and that all internal nodes connect to an output. The output capacitances can be balanced. Systematic methods have been developed to make sure that both branches of the differential pull-down network are balanced and that no memory effects are present in the network. Sense Amplifier Based Logic is illustrated in the following figure.

Figure: Sense Amplifier Based Logic AND/NAND gate

This circuit style does, however, require a full custom characterization and layout. It also suffers from a high clock load, common to all dynamic logic gates.

6.3.2 Wave Dynamic Differential Logic Gates (WDDL)

WDDL can be implemented with static CMOS logic. Static CMOS standard cells are combined to form secure compound standard cells, which have a reduced power signature. WDDL has many advantages. It can be readily implemented from an existing standard cell library. The design flow is fully supported with accurate EDA library files that come directly from the vendor. WDDL also results in a dynamic differential logic with only a small load capacitance on the pre-charge control signal, and with the low power consumption and the high noise margins of static CMOS.

Advantages of the WDDL logic style are as follows:

A major advantage of the proposed logic style is that it can be incorporated into the common Electronic Design Automation (EDA) tool flow.

No special design rules are involved in the interconnection of WDDL gates.

The switching factor of WDDL is 100%. A WDDL gate consists of a parallel combination of two positive complementary gates, one calculating the true output using the true inputs, the other the false output using the false inputs. A positive gate produces a zero output for an all-zero input. The AND gate and the OR gate are examples of positive gates. A complementary gate, sometimes also referred to as a dual gate, expresses the false output of the original logic gate using the false inputs of the original gate. The AND gate fed with true input signals and the OR gate fed with false input signals are two dual gates. The figure shows the WDDL AND gate and the WDDL OR gate.

In the evaluation phase, each input signal is differential and the WDDL gate calculates its differential output. In the pre-charge phase, the inputs to the WDDL gate are set at 0. This puts the output of the gate at 0. A module in WDDL pre-charges without distributing the pre-charge signal to each individual gate. During the pre-charge phase, the input vector of the combinatorial logic is set at all 0s. Each individual gate will eventually have all its inputs at 0, evaluate its output to 0 and pass this 0 value to the next gate. One could say that the pre-charge signal travels over the combinatorial logic as a 0-wave, hence the name WDDL.

There are several ways to launch the pre-charge wave. In the figure, a pre-charge operator is inserted at the start of every combinatorial logic tree, i.e. at the inputs of the encryption module and the outputs of the registers. These operators produce an all-zero output in the pre-charge phase (clk signal high) but let the differential signal through during the evaluation phase (clk signal low).
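The behavior described above can be checked with a small bit-level model. This is only a behavioral sketch, not the VHDL used in the project: a signal is represented as a hypothetical (true_rail, false_rail) pair, and all function names are assumptions made for the illustration.

```python
from itertools import product


def precharge(signal: int, clk: int):
    """Pre-charge operator: (0, 0) while clk is high, differential pair while clk is low."""
    if clk == 1:
        return (0, 0)
    return (signal, 1 - signal)


def wddl_and(a, b):
    """WDDL AND: AND on the true rails, its dual (OR) on the false rails."""
    (a_t, a_f), (b_t, b_f) = a, b
    return (a_t & b_t, a_f | b_f)


def wddl_or(a, b):
    """WDDL OR: OR on the true rails, its dual (AND) on the false rails."""
    (a_t, a_f), (b_t, b_f) = a, b
    return (a_t | b_t, a_f & b_f)


def rising_outputs(gate, a_bit: int, b_bit: int) -> int:
    """Count output rails that rise when going from pre-charge to evaluation."""
    pre = gate(precharge(a_bit, 1), precharge(b_bit, 1))
    ev = gate(precharge(a_bit, 0), precharge(b_bit, 0))
    return sum(1 for p, e in zip(pre, ev) if p == 0 and e == 1)


for gate in (wddl_and, wddl_or):
    for a_bit, b_bit in product((0, 1), repeat=2):
        print(gate.__name__, a_bit, b_bit, rising_outputs(gate, a_bit, b_bit))
```

In this model every input combination causes exactly one output rail to rise per evaluation, which is the 100% switching factor mentioned above. The model only counts transitions; it says nothing about how well the two rails are balanced in capacitance.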

Figure: WDDL pre-charge wave generation

CHAPTER 7 WDDL GATES

The methodology used in the project is a bottom-up approach. Lower modules are designed and later integrated to form larger modules, whose further integration leads to the final top module. Since logic gates form the lowest-level modules, the logic gates required for the design are first implemented in WDDL style. WDDL demands a parallel combination of two positive complementary gates, one calculating the true value and the other the false value. Logic gates such as OR, AND and XOR have been implemented. In addition, a full adder, a 32-bit XOR, etc. have been implemented.

7.1 WDDL OR gate

A WDDL OR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.

Figure 4.1 WDDL OR Gate

The NAND and NOT gates used act as the Precharge operator, injecting signal '0' into the gates when the 'clk' signal is high, i.e. during the Precharge phase.

7.2 WDDL AND gate

A WDDL AND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.

Figure 4.2 WDDL AND Gate

7.3 WDDL NAND gate

A WDDL NAND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.

Figure 4.3 WDDL NAND Gate

7.4 WDDL NOR gate

A WDDL NOR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.

Figure 4.4 WDDL NOR Gate

7.5 WDDL XOR gate

The XOR function can be implemented by the Boolean equation $A \oplus B = AB' + A'B$. Therefore the XOR gate is implemented in terms of AND gates and OR gates, as shown in Figure 4.5. It can also be implemented by instantiating a WDDL AND gate and a WDDL OR gate, but the number of gates involved in the latter is greater than in the former. Therefore the first method of implementation is followed rather than the second.
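One way to read the XOR construction is sketched below as a bit-level model. It is an assumption about how the rails are combined, not the circuit from the report's figure: signals are hypothetical (true_rail, false_rail) pairs, a complemented input is taken from the opposite rail, and the false output uses the dual network of the true output.

```python
def encode(bit: int):
    """Differential encoding of a bit as (true_rail, false_rail)."""
    return (bit, 1 - bit)


def wddl_xor(a, b):
    """XOR built from positive gates only, using A xor B = AB' + A'B.

    Inversion is obtained by using the opposite rail, so B' is simply b_f.
    """
    (a_t, a_f), (b_t, b_f) = a, b
    out_t = (a_t & b_f) | (a_f & b_t)      # AB' + A'B
    out_f = (a_f | b_t) & (a_t | b_f)      # dual network: (A' + B)(A + B')
    return (out_t, out_f)


def wddl_not(a):
    """Inversion in a differential style: swap the two rails."""
    a_t, a_f = a
    return (a_f, a_t)


for a_bit in (0, 1):
    for b_bit in (0, 1):
        print(a_bit, b_bit, wddl_xor(encode(a_bit), encode(b_bit)))
```

Because only AND and OR operations are used, an all-zero input still produces an all-zero output, so the pre-charge wave passes through this gate as it does through the AND and OR gates.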

Figure 4.5 WDDL XOR gate

With the help of the above basic gates, a full adder circuit has been designed by instantiating the WDDL gates designed above. During the implementation of the Blowfish algorithm, a 32-bit XOR gate and a 32-bit adder circuit are required. They can be easily implemented by instantiating the corresponding lower module 32 times.

CHAPTER 8 FRONT END RESULTS

WDDL OR GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name : wddlor.ngr
Top Level Output File Name : wddlor
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 5

Cell Usage
BELS : 2
  LUT3 : 2
IO Buffers : 5
  IBUF : 3
  OBUF : 2

Device utilization summary
Selected Device : 3s250eft256-4
Number of Slices : 1 out of 2448 (0%)
Number of 4 input LUTs : 2 out of 4896 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 172 (2%)

Timing Summary
Speed Grade : -4
Maximum combinational path delay : 6.236 ns

Figure: Simulation Result

Figure: Synthesis Result

WDDL AND GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name : wddlgates.ngr
Top Level Output File Name : wddlgates
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 5

Cell Usage
BELS : 2
  LUT3 : 2
IO Buffers : 5
  IBUF : 3
  OBUF : 2

Device utilization summary
Selected Device : 3s250etq144-4
Number of Slices : 1 out of 2448 (0%)
Number of 4 input LUTs : 2 out of 4896 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 108 (4%)

Timing Summary
Speed Grade : -4
Maximum combinational path delay : 6.236 ns

Figure: Simulation Result

Figure: Synthesis Result

WDDL NAND GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name : wddlnand1.ngr
Top Level Output File Name : wddlnand1
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 5

Cell Usage
BELS : 2
  LUT3 : 2
IO Buffers : 5
  IBUF : 3
  OBUF : 2

Device utilization summary
Selected Device : 3s500efg320-4
Number of Slices : 1 out of 4656 (0%)
Number of 4 input LUTs : 2 out of 9312 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 232 (2%)

Timing Summary
Speed Grade : -4
Maximum combinational path delay : 6.236 ns

Figure: Simulation Result

Figure: Synthesis Result

WDDL XOR GATE

Figure: Simulation Result

Figure: Synthesis Result

WDDL XOR GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name : wddlxorgate.ngr
Top Level Output File Name : wddlxorgate
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 5

Cell Usage
BELS : 2
  LUT3 : 2
IO Buffers : 5
  IBUF : 3
  OBUF : 2

Device utilization summary
Selected Device : 3s250eft256-4
Number of Slices : 1 out of 2448 (0%)
Number of 4 input LUTs : 2 out of 4896 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 172 (2%)

Timing Summary
Speed Grade : -4
Maximum combinational path delay : 6.236 ns

Figure: Simulation Result

Figure: Synthesis Result

CHAPTER 9 SUMMARY AND CONCLUSION 91

SummaryIn order to provide security to ICs against side-channel attacks especially

Differential Power Analysis (DPA) it is necessary to implement the design in a logic that

can render constant power dissipation irrespective of the input combination WDDL is

proved to be advantageous to others and therefore is of great significance In this

dissertation work architecture for Blowfish Algorithm is designed and implemented in

WDDL style In this implementation bottom-up approach is used The low level entities

are designed and later they are all combined to form the entire module The key

scheduling is online The sub-keys generated for a particular key can be used for the

encryption of the entire data to be encrypted with that key The sub keys are given in

reverse direction for the decryption data path Initially logic gates are implemented in

WDDL and then higher modules have been designed by instantiating the WDDL gates to

form the entire module thus resulting in constant power dissipation irrespective of any

input data combination The entire design works in two phases namely Precharge phase and

Evaluation phase In the Precharge phase all the signals of the design are zeroed and

during the Evaluation phase the functionality of the design is achieved This sort of design

has been found simple and very effective in thwarting the side-channel attack namely

Differential Power analysis (DPA)92 ConclusionThe crypto processor has been

44

designed for the key size of 448 bits and plain text of 64 bits The code for the

implementation has been written in VHDL The functional verification has been done using

the Cadence (NC Launch) Simulation package and Synthesis using RTL Compiler The

Backend of the design is done using the SOC EncounterAccording to the specifications

desired functionality has been achieved In the output during the Evaluation phase there

has been same number of transitions thus resulting in constant power dissipation During

Synthesis it has been observed that a simple WDDL gate comprised many conventional

gates Therefore the area of the design has grown nearly three-fold when compared to the

design implemented in conventional CMOS logic at the cost of security incorporated into

the IC against Differential Power Analysis (DPA) Due to the constant power dissipation at

the output hacker cannot apply Differential Power Analysis (DPA) scheme to find the

secret key that is being used in the crypto-processor Thus security against DPA is

incorporated into the IC at hardware level by implementing the design in WDDL style

which is quite simple and effectiveCHAPTER 10

REFERENCES 101 Referred Technical papers[1] Kris Tiri Member

IEEE and Ingrid Verbauwhede Senior Member IEEE ldquoA Digital Design Flow for

Secure Integrated Circuitsrdquo IEEE Transaction on Computer-Aided Design of Integrated

Circuits and Systems Vol 25 No 7 July 2006[2] Prof Jean-Jacques Quisquater Math

RiZK laquo Side Channel Attacks raquo October 2002[3] Ross Anderson Mike Bond Jolyon

Clulow and Sergei Skorobogatov ldquoCrypto processors ndash A Surveyrdquo IEEE Proceedings[4]

Noohul Basheer Zain Ali James M Noras ldquoOptimal Data path Design for a Cryptographic

Processor The Blowfish Algorithmrdquo Malaysian Journal of Computer Science Vol 14 No

1 June 2001 pp 16-27[5] Dan Rinehimer Derek Wilson ldquoSummary of B Schneierrsquos

Description of New Variable Length Key 64-Bit Block Cipher (Blowfish)rdquo[6] Kris Tiri and

Ingrid Verbauwhede ldquoA Dynamic and Differential CMOS Logic Style to Resist Power and

Timing Attacks on Security ICrsquosrdquo[7] Discretix Technologies Limited ldquoIntroduction to Side

45

Channel Attacksrdquo[8] Kris Tiri Moonmoon Akmal and Ingrid Verbauwhede ldquoA Dynamic

and Differential Logic with Signal Independent Power Consumption to withstand

Differential Power Analysis on Smart CardsrdquoReference books[1] William Stallings

ldquoCryptography and Network Security Principles and Practicesrdquo Pearson Education

2003[2] Samir Palnitkar ldquoVerilog HDL A Guide to Digital Design and Synthesisrdquo

Prentice Hall Referred Web Pages[1] httpwwwschneiercomblowfishhtml[2]

httpwwwnistgov[3] httpwwwdiscretixcomPDFIntroduction20to20Side20Channel

20Attackspdf[4] httpwwwwipointpctdbenwojsp

IA=WO2005081085ampDISPLAY=CLAIMS

46


CHAPTER 4 SMART CARD OVERVIEW

This section very briefly introduces the concept of a smart card. Basically, a smart card is a computer embedded in a safe. It consists of a processor (typically 8-bit or 32-bit) together with ROM, EEPROM and a small amount of RAM, and is therefore capable of performing computations. The main goal of a smart card is to allow the execution of cryptographic operations involving some secret parameter (the key) while not revealing this parameter to the outside world. The goal of the attacker, conversely, is to recover this secret parameter. The processor is embedded in a chip and connected to the outside world through eight wires whose role, use and position are standardized. In addition to the input/output wires, the parts of most interest here are the following:

1. Power supply: Smart cards do not have an internal battery. The current they need is provided by the smart card reader. This makes the smart card's power consumption fairly easy for an attacker to measure.

2. Clock: Similarly, smart cards do not have an internal clock. The clock ticks must also be provided from the outside world. As a consequence, the attacker can measure the card's running time with very good precision.

Smart cards are usually equipped with protection mechanisms composed of a shield (the passivation layer), whose goal is to hide the internal behavior of the chip, and possibly sensors that react when the shield is removed by destroying all sensitive data and preventing the card from functioning properly.

CHAPTER 5 SIDE CHANNEL ATTACKS

"Side channel attacks" are attacks that are based on "side channel information". Side channel information is information that can be retrieved from the encryption device and that is neither the plaintext to be encrypted nor the cipher text resulting from the encryption process.

In the past, an encryption device was perceived as a unit that receives plaintext input and produces cipher text output, and vice versa. Attacks were therefore based on knowing the cipher text (cipher text-only attacks), on knowing both the plaintext and the cipher text (known plaintext attacks), or on the ability to define what plaintext is to be encrypted and then seeing the results of the encryption (chosen plaintext attacks). Today it is known that encryption devices have additional outputs, and often additional inputs, which are neither the plaintext nor the cipher text.

Encryption devices produce timing information (information about the time that operations take) that is easily measurable, radiation of various sorts, power consumption statistics (that can be easily measured as well), and more. Often the encryption device also has additional "unintentional" inputs, such as voltage, that can be modified to cause predictable outcomes. Side channel attacks make use of some or all of this information, along with other (known) cryptanalytic techniques, to recover the key the device is using.

Side channel analysis techniques are of concern because the attacks can be mounted quickly and can sometimes be implemented using readily available hardware costing from only a few hundred dollars to thousands of dollars.

5.1 Classification of side channel attacks

The literature usually classifies side channel attacks along two orthogonal axes.

1. Invasive vs. non-invasive

Invasive attacks require de-packaging the chip to get direct access to its components. A typical example is connecting a wire to a data bus to observe the data transfers.

A non-invasive attack exploits only externally available information (the emission of which is, however, often unintentional), such as running time and power consumption.

A more recent distinction adds semi-invasive attacks. These attacks require de-packaging of the chip to get access to the chip surface, but they do not tamper with the passivation layer (they do not require electrical contact with the metal surface).

2. Active vs. passive

Active attacks try to tamper with the card's proper functioning. For example, fault induction attacks try to induce errors in the computation.

Passive attacks, by contrast, simply observe the card's behavior during its processing without disturbing it.

Note that these two axes are indeed orthogonal: an invasive attack may completely avoid disturbing the card's behavior, and a passive attack may require a preliminary de-packaging for the required information to be observable.

These attacks are of course not mutually exclusive. An invasive attack may, for example, serve as a preliminary step for a non-invasive one by giving a detailed description of the chip's architecture that helps to find out where to put external probes.

Smart cards are usually equipped with protection mechanisms that are supposed to react to invasive attacks (although several invasive attacks are nonetheless capable of defeating these mechanisms). On the other hand, it is worth pointing out that a non-invasive attack is completely undetectable: there is, for example, no way for a smart card to figure out that its running time is currently being measured. Other countermeasures are therefore necessary. From an economic point of view, invasive attacks are usually more expensive to deploy on a large scale, since they require individual processing of each attacked device. In this sense, non-invasive attacks constitute a bigger menace for the smart card industry.

Invasive attacks involve a relatively high capital investment for lab equipment plus a moderate investment of effort for each individual chip attacked. Non-invasive attacks require only a moderate capital investment plus a moderate investment of effort in designing an attack on a particular type of device. Thereafter, the cost per device attacked is low. Semi-invasive attacks can be carried out using very cheap and simple equipment.

The attacker can gain information by:

1. Probing attacks
2. Fault induction attacks
3. Timing attacks
4. Power analysis attacks
5. Electromagnetic timing attacks

These attacks exploit the switching behavior of digital complementary metal-oxide-semiconductor (CMOS) gates. Of all these, the power analysis attack is of major concern.

5.2 Power analysis attacks

The power consumption of a cryptographic device may provide much information about the operations that take place and the parameters involved. This is the idea of simple and differential power analysis, first introduced by Kocher et al. Like the clock ticks, the card's energy is provided by the terminal and can therefore easily be measured. Basically, to measure a circuit's power consumption, a small (e.g. 50 ohm) resistor is inserted in series with the power or ground input. The voltage difference across the resistor divided by the resistance yields the current. Well-equipped electronics labs have equipment that can digitally sample voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than 1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC can be bought for less than US$ 400.
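The measurement described above amounts to applying Ohm's law to every sampled voltage. The following Python sketch illustrates this under stated assumptions: the function names are hypothetical, the 50 ohm default comes from the example in the text, and the voltage samples are invented for illustration rather than measured on a real card.

```python
def supply_current(v_drop, r_shunt=50.0):
    """Current through the series resistor: voltage drop divided by resistance."""
    return v_drop / r_shunt


def trace_to_current(voltage_samples, r_shunt=50.0):
    """Convert a list of sampled voltage drops (volts) into currents (amperes)."""
    return [supply_current(v, r_shunt) for v in voltage_samples]


# Hypothetical voltage drops sampled across the 50 ohm resistor, in volts.
samples = [0.00, 0.05, 0.10, 0.05]
currents = trace_to_current(samples)
print(currents)
```

The resulting list of currents is what the later sections call a power trace, up to a constant factor given by the supply voltage.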

Power analysis attacks are of two types:

1. Simple Power Analysis (SPA) attack
2. Differential Power Analysis (DPA) attack

SPA attacks on smart cards typically take a few seconds per card, while DPA attacks can take several hours. In general, from a somewhat academic perspective, we may consider the entire internal state of the block cipher to be all the intermediate results and values that are never included in the output in normal operation. For example, DES has 16 rounds; we can consider the intermediate states state[1..15] after each round except the last as a secret internal state. Side channels typically give information about these internal states, or about the operations used in the transition of this internal state from one round to another. The type of side channel will of course determine what information is available to the attacker about these states. The attacks typically work by finding some information about the internal state of the cipher that can be learned both by guessing part of the key and checking the value directly, and additionally by some statistical property of the cipher that makes that checkable value slightly non-random.

5.2.1 Simple Power Analysis attack (SPA)

Simple Power Analysis is a technique that involves direct interpretation of power consumption measurements collected during cryptographic operations, generally by looking at a visual representation of the power consumption of a unit while an encryption operation is being performed. SPA can yield information about a device's operation as well as key material.

A trace refers to a set of power consumption measurements taken across a cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a trace containing 5000 points. The figure shows an SPA trace from a smart card performing a DES operation.

Figure: SPA monitoring from a single DES operation performed by a typical smart card. The upper trace shows the entire encryption operation, including the initial permutation, the 16 DES rounds and the final permutation. The lower trace is a detailed view of the second and third rounds.

Because SPA can reveal the sequence of instructions executed, it can be used to break cryptographic implementations in which the execution path depends on the data being processed. For example:

DES key schedule: The DES key schedule computation involves rotating 28-bit key registers. A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will contain different SPA features if the execution paths take different branches for each.

DES permutations: DES implementations perform a variety of bit permutations. Conditional branching in software or microcode can cause significant power consumption differences for "0" and "1" bits.

Comparisons: String or memory comparison operations typically perform a conditional branch when a mismatch is found. This conditional branching causes large SPA (and sometimes timing) characteristics.

Multipliers: Modular multiplication circuits tend to leak a great deal of information about the data they process. The leakage functions depend on the multiplier design, but are often strongly correlated to operand values and Hamming weights.

Exponentiators: A simple modular exponentiation function scans across the exponent, performing a squaring operation in every iteration with an additional multiplication operation for each exponent bit that is equal to "1". The exponent can be compromised if squaring and multiplication operations have different power consumption characteristics, take different amounts of time, or are separated by different code. Modular exponentiation functions that operate on two or more exponent bits at a time may have more complex leakage functions.
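The exponentiator case can be illustrated with a small Python sketch. It is a toy model, not an attack on a real device: it assumes the attacker can tell squarings ("S") from multiplications ("M") in the trace, and the function names `square_and_multiply` and `recover_exponent` are hypothetical.

```python
def square_and_multiply(base, exponent, modulus):
    """Left-to-right binary exponentiation that also logs its operation sequence."""
    result = 1
    ops = []
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus   # a squaring in every iteration
        ops.append("S")
        if bit == "1":
            result = (result * base) % modulus  # extra multiplication for a 1 bit
            ops.append("M")
    return result, ops


def recover_exponent(ops):
    """Rebuild the exponent from the observed S/M sequence."""
    bits = []
    for op in ops:
        if op == "S":
            bits.append(0)   # assume a 0 bit until a multiplication follows
        else:
            bits[-1] = 1     # an "M" marks the preceding squaring as a 1 bit
    return int("".join(str(b) for b in bits), 2)


value, ops = square_and_multiply(7, 0b101101, 1009)
print("".join(ops), recover_exponent(ops) == 0b101101)
```

The operation sequence alone determines the exponent, which is why the text notes that the exponent is compromised as soon as squarings and multiplications can be told apart.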

5.2.2 Differential Power Analysis attack (DPA)

In addition to large-scale power variations due to the instruction sequence, there are effects correlated to the data values being manipulated. These variations tend to be smaller and are sometimes overshadowed by measurement errors and other noise. In such cases, it is still often possible to break the system using statistical functions tailored to the target algorithm.

To implement the DPA attack, an attacker first observes $m$ encryption operations and captures power traces $T_{1..m}[1..k]$ containing $k$ samples each. In addition, the attacker records the cipher texts $C_{1..m}$. No knowledge of the plaintext is required. DPA uses power consumption measurements to determine whether a key block guess $K_s$ is correct. The attacker computes a $k$-sample differential trace $\Delta_D[1..k]$ by finding the difference between the average of the traces for which a certain intermediate value $V$ is one and the average of the traces for which $V$ is zero, where $V$ is predicted by a selection function $D(C_i, b, K_s)$ that computes the target bit $b$ from the cipher text $C_i$ and the key block guess $K_s$. Thus $\Delta_D[j]$ is the average over $C_{1..m}$ of the effect due to the value represented by the selection function $D$ on the power consumption at point $j$. In particular:

\[
\Delta_D[j] = \frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}{\sum_{i=1}^{m} D(C_i, b, K_s)} - \frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)}
\]

If $K_s$ is incorrect, the bit computed using $D$ will differ from the actual target bit for about half of the cipher texts $C_i$. The selection function is thus effectively uncorrelated to what was actually computed by the target device. If a random function is used to divide a set into two subsets, the difference in the averages of the subsets should approach zero as the subset sizes approach infinity.

Thus, trace components uncorrelated to $D$ diminish with $1/\sqrt{m}$, causing the differential trace to become flat (the actual trace may not be completely flat, as $D$ with $K_s$ incorrect may have a weak correlation to $D$ with the correct $K_s$). If $K_s$ is correct, however, the computed value for $D(C_i, b, K_s)$ will equal the actual value of target bit $b$ with probability 1. The selection function is thus correlated to the value of the bit considered. Other data values, measurement errors, etc. that are not correlated to $D$ approach zero. Because power consumption is correlated to data bit values, the plot of $\Delta_D$ will be flat with spikes in regions where $D$ is correlated to the values being processed. The correct value of $K_s$ can thus be identified from the spikes in its differential trace. Four values of $b$ correspond to each S box, providing confirmation of key block guesses. Finding all eight $K_s$ yields the entire 48-bit round sub key. The remaining 8 key bits can be found easily using exhaustive search or by analyzing one additional round. Triple DES keys can be found by analyzing an outer DES operation first, using the resulting key to decrypt the cipher text, and attacking the next DES key. DPA can use known plaintext or known cipher text and can find encryption or decryption keys.
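The difference-of-means computation can be tried out on synthetic data. The Python sketch below is a toy simulation under explicit assumptions, not an attack on DES: the traces are Gaussian noise, the target bit adds one unit of power at a single hypothetical sample position `LEAK_INDEX`, and a wrong key guess is modelled as a random split of the traces. All names are illustrative.

```python
import random

LEAK_INDEX = 17  # hypothetical sample at which the target bit affects the power


def differential_trace(traces, selection_bits):
    """Per-sample mean of traces with D = 1 minus mean of traces with D = 0."""
    k = len(traces[0])
    sums = [[0.0] * k, [0.0] * k]
    counts = [0, 0]
    for trace, bit in zip(traces, selection_bits):
        counts[bit] += 1
        for j, sample in enumerate(trace):
            sums[bit][j] += sample
    return [sums[1][j] / counts[1] - sums[0][j] / counts[0] for j in range(k)]


def simulate_traces(m=4000, k=40, seed=1):
    """Gaussian noise traces; the target bit adds one unit of power at LEAK_INDEX."""
    rng = random.Random(seed)
    target_bits = [rng.randrange(2) for _ in range(m)]
    traces = []
    for bit in target_bits:
        trace = [rng.gauss(0.0, 1.0) for _ in range(k)]
        trace[LEAK_INDEX] += bit
        traces.append(trace)
    return traces, target_bits, rng


traces, target_bits, rng = simulate_traces()
# Selection bits that match what the simulated device computed (correct guess).
right = differential_trace(traces, target_bits)
# Selection bits unrelated to the device (stand-in for an incorrect guess).
wrong = differential_trace(traces, [rng.randrange(2) for _ in target_bits])

print("value at leak position, correct selection:", round(right[LEAK_INDEX], 2))
print("largest magnitude, random selection:", round(max(abs(v) for v in wrong), 2))
```

With the matching selection bits the differential trace shows a spike at the leaking sample and stays near zero elsewhere, while the random split stays near zero everywhere, which mirrors the argument made above for correct and incorrect key block guesses.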


CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES

The power consumption of traditional standard cells and logic depends on the signal activity. When the output of a logic gate makes a 0 to 1 transition, a current comes from the power supply and charges the output capacitance. On the other hand, when the output sees a 1 to 0, a 0 to 0 or a 1 to 1 transition, no energy or only a limited amount of energy (due to short circuit or leakage currents) is consumed from the power supply. This is the fundamental reason why information is leaked through the power supply and why power attacks are possible. The basis of a secure digital design flow is therefore a logic style with constant power consumption.
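The data dependence described above can be made concrete with a small Python sketch. It uses the standard first-order model in which charging a load capacitance C to the supply voltage V draws an energy of C times V squared from the supply, and it ignores short-circuit and leakage currents. The capacitance and voltage values, like the function names, are assumptions chosen only for illustration.

```python
C_LOAD = 10e-15   # assumed load capacitance in farads
V_DD = 1.8        # assumed supply voltage in volts


def supply_energy(prev, cur, c_load=C_LOAD, v_dd=V_DD):
    """Energy drawn from the supply for one output transition (first-order model)."""
    if prev == 0 and cur == 1:
        return c_load * v_dd ** 2   # only a 0 to 1 transition charges the load
    return 0.0


def energy_profile(outputs):
    """Energy drawn in each clock cycle for a sequence of output values."""
    return [supply_energy(p, c) for p, c in zip(outputs, outputs[1:])]


# Two output sequences of equal length draw different energy: this is the leak.
print(sum(energy_profile([0, 1, 0, 1])))
print(sum(energy_profile([0, 0, 0, 1])))
```

An observer of the supply current can therefore distinguish the two sequences, even though both end in the same output value.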

6.1 Current Mode Logic

Current mode logic (CML), e.g. current steering logic, seems the ideal solution. This type of logic continuously draws a current from the supply and determines its state through the path that the current takes. A gate has constant power consumption if it draws a perfectly constant current from the power supply, independently of the input and output signals. To build a current source capable of generating a constant current, special circuit techniques that minimize channel length modulation have to be used.

The decisive drawback of CML, however, is its static power consumption. Even when the logic gate is not processing any data, it still burns current, which makes this logic style unacceptable for embedded, battery-operated devices.

6.2 Voltage Mode Logic (CMOS circuit styles)

Voltage mode logic (VML), e.g. static CMOS logic, only draws a current from the supply to change state and represents its state by the amount of charge it stores on a capacitance. A regular standard CMOS circuit only consumes power when a capacitance gets charged and later discharged, i.e. when a gate switches state. This is the main reason that CMOS is the style of choice for every battery-operated or low-power device. This is illustrated in the figure below for a simple inverter. Thus, static CMOS is the preferred logic style because of its low power consumption and high noise margins.

Figure: Standard CMOS inverter

Yet two conditions must be satisfied for VML to have constant power consumption, namely:

1) A logic gate must have exactly one switching event per signal transition.
2) The logic gate must charge a constant capacitance in that switching event.

For the inverter above, all four transitions can be distinguished when monitoring the power supply, so a standard CMOS gate satisfies neither condition.

6.3 Dynamic Differential Logic

Dynamic differential logic, sometimes also referred to as dual rail with pre-charge logic, fulfills the first condition. A differential logic family uses the true and the false representation of the input and output signals, and a dynamic logic family alternates pre-charge and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1, exactly one of the two output nodes evaluates to 0 to produce a differential output signal in the evaluation phase. The discharged output node is charged to 1 in the following pre-charge phase so that both outputs are again at 1. In other words, every signal transition, including the events in which the input signals remain constant, is represented by an actual switching event in which the logic gate charges a capacitance. The logic families that have been introduced to thwart differential power analysis (DPA) using dynamic differential logic include the following:

1. Sense Amplifier Based Logic (SABL)
2. Wave Dynamic Differential Logic (WDDL) gates
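The one-switching-event-per-cycle property can be checked with a short behavioural model in Python. This is only a sketch of the pre-charge and evaluation cycle described above (both output nodes pre-charged to 1, exactly one discharged during evaluation); it does not model capacitances or any particular logic family, and the function name is hypothetical.

```python
def dual_rail_charge_events(outputs):
    """For each cycle, count the output nodes that the next pre-charge must recharge."""
    events = []
    for bit in outputs:
        true_rail, false_rail = 1, 1   # pre-charge phase: both output nodes at 1
        if bit:
            false_rail = 0             # evaluation: the false node discharges
        else:
            true_rail = 0              # evaluation: the true node discharges
        # Every node left at 0 is recharged in the following pre-charge phase.
        events.append((1 - true_rail) + (1 - false_rail))
    return events


print(dual_rail_charge_events([0, 0, 1, 1, 0, 1]))
print(dual_rail_charge_events([1, 1, 1, 1, 1, 1]))
```

Whatever the data sequence, each cycle contains exactly one charging event, which is the first of the two conditions for constant power consumption. The second condition, a constant charged capacitance, depends on the circuit and layout and is outside this model.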

6.3.1 Sense Amplifier Based Logic (SABL)

The main advantage of SABL is that it has balanced input and output nodes and that all internal nodes connect to an output. The output capacitances can be balanced. Systematic methods have been developed to make sure that both branches of the differential pull-down network are balanced and that no memory effects are present in the network. Sense Amplifier Based Logic is illustrated in the following figure.

Figure: Sense Amplifier Based Logic AND/NAND gate

This circuit style does, however, require full custom characterization and layout. It also suffers from a high clock load, which is common to all dynamic logic gates.

6.3.2 Wave Dynamic Differential Logic (WDDL) gates

WDDL can be implemented with static CMOS logic. Static CMOS standard cells are combined to form secure compound standard cells, which have a reduced power signature. WDDL has many advantages. It can be readily implemented from an existing standard cell library. The design flow is fully supported with accurate EDA library files that come directly from the vendor. WDDL also results in a dynamic differential logic with only a small load capacitance on the pre-charge control signal and with the low power consumption and the high noise margins of static CMOS.

The advantages of the WDDL logic style are as follows:

1. It can be incorporated into the common Electronic Design Automation (EDA) tool flow.
2. No special design rules are involved in the interconnection of WDDL gates.
3. The switching factor of WDDL is 100%.

A WDDL gate consists of a parallel combination of two positive complementary gates, one calculating the true output using the true inputs, the other the false output using the false inputs. A positive gate produces a zero output for an all-zero input. The AND gate and the OR gate are examples of positive gates. A complementary gate, sometimes also referred to as a dual gate, expresses the false output of the original logic gate using the false inputs of the original gate. The AND gate fed with true input signals and the OR gate fed with false input signals are two dual gates. The figure shows the WDDL AND gate and the WDDL OR gate.

In the evaluation phase, each input signal is differential and the WDDL gate calculates its differential output. In the pre-charge phase, the inputs to the WDDL gate are set to 0. This puts the output of the gate at 0. A module in WDDL pre-charges without distributing the pre-charge signal to each individual gate. During the pre-charge phase, the input vector of the combinatorial logic is set to all 0s. Each individual gate will eventually have all its inputs at 0, evaluate its output to 0 and pass this 0 value to the next gate. One could say that the pre-charge signal travels over the combinatorial logic as a 0-wave, hence the name WDDL.

There are several ways to launch the pre-charge wave. In the figure, a pre-charge operator is inserted at the start of every combinatorial logic tree, i.e. at the inputs of the encryption module and at the outputs of the registers. These operators produce an all-zero output in the pre-charge phase (clk signal high), but let the differential signal through during the evaluation phase (clk signal low).
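The behaviour described above can be sketched as a small bit-level model in Python. It is a functional illustration only, under the assumptions that each signal is a (true, false) pair of rails and that the example circuit (x AND y) OR z is a hypothetical logic tree chosen for demonstration; it does not model timing or capacitances.

```python
from itertools import product


def precharge_operator(value, clk):
    """Rail pair (true, false): all zero while clk is high, differential otherwise."""
    if clk:
        return (0, 0)
    return (value, 1 - value)


def wddl_and(a, b):
    """True rail: AND of the true inputs. False rail: dual OR of the false inputs."""
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a, b):
    """True rail: OR of the true inputs. False rail: dual AND of the false inputs."""
    return (a[0] | b[0], a[1] & b[1])


def circuit(x, y, z, clk):
    """Hypothetical two-level tree (x AND y) OR z behind pre-charge operators."""
    xr, yr, zr = (precharge_operator(v, clk) for v in (x, y, z))
    return wddl_or(wddl_and(xr, yr), zr)


for x, y, z in product((0, 1), repeat=3):
    print((x, y, z), circuit(x, y, z, clk=1), circuit(x, y, z, clk=0))
```

With clk high the zeros at the inputs propagate through both gate levels without any gate receiving the clock, which is the 0-wave. With clk low exactly one of the two output rails is 1 for every input combination.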

Figure: WDDL pre-charge wave generation

CHAPTER 7 WDDL GATES

The methodology used in the project is the bottom-up approach. Lower modules are designed and later integrated to form larger modules, whose further integration leads to the final top module. Since logic gates form the lowest-level modules, the logic gates required for the design are implemented in WDDL style first. WDDL demands a parallel combination of two positive complementary gates, one calculating the true value and the other the false value. Logic gates such as OR, AND and XOR have been implemented, along with a full adder, a 32-bit XOR, etc.

7.1 WDDL OR gate

A WDDL OR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.

Figure 4.1 WDDL OR Gate

The NAND and NOT gates used act as the Precharge operator, injecting the signal '0' into the gates when the 'clk' signal is high, i.e. during the Precharge phase.

7.2 WDDL AND gate

A WDDL AND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.

Figure 4.2 WDDL AND Gate

7.3 WDDL NAND gate

A WDDL NAND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure (in WDDL, an inversion is obtained by exchanging the true and false outputs).

Figure 4.3 WDDL NAND Gate

7.4 WDDL NOR gate

A WDDL NOR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.

Figure 4.4 WDDL NOR Gate

7.5 WDDL XOR gate

The XOR function can be implemented by the Boolean equation A ⊕ B = AB' + A'B. Therefore, the XOR gate is implemented in terms of AND gates and OR gates, as shown in Figure 4.5. It can also be implemented by instantiating a WDDL AND gate and a WDDL OR gate, but the number of gates involved in the latter method is greater than in the former. Therefore, the first method of implementation is followed rather than the second one.
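The equation above can be turned into a behavioural Python sketch of a WDDL XOR gate. It assumes that each signal is a (true, false) rail pair, so that A' and B' are simply the false rails, and it takes the false output as the dual of the true-output network. The function names are hypothetical, and this is a functional model rather than the project's VHDL.

```python
def wddl_xor(a, b):
    """WDDL XOR on (true, false) rail pairs, built only from AND and OR operations."""
    a_t, a_f = a
    b_t, b_f = b
    true_out = (a_t & b_f) | (a_f & b_t)    # AB' + A'B
    false_out = (a_f | b_t) & (a_t | b_f)   # dual network, evaluates to XNOR
    return (true_out, false_out)


def wddl_xor32(xs, ys):
    """32-bit XOR obtained by instantiating the 1-bit gate once per bit position."""
    return [wddl_xor(x, y) for x, y in zip(xs, ys)]


def to_rails(word, width=32):
    """Differential encoding of an integer, least significant bit first."""
    return [((word >> i) & 1, 1 - ((word >> i) & 1)) for i in range(width)]


def from_rails(rails):
    """Read the integer back from the true rails."""
    return sum(t << i for i, (t, _) in enumerate(rails))


result = wddl_xor32(to_rails(0xDEADBEEF), to_rails(0x12345678))
print(hex(from_rails(result)))
```

Because only AND and OR operations are used, all-zero inputs in the pre-charge phase give an all-zero output pair, so the gate passes the 0-wave like the other WDDL gates.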

Figure 4.5 WDDL XOR gate

With the help of the above basic gates, a full adder circuit has been designed by instantiating the WDDL gates designed above. During the implementation of the Blowfish algorithm, a 32-bit XOR gate and a 32-bit adder circuit are required. They can be easily implemented by instantiating the corresponding lower module 32 times.

CHAPTER 8 FRONT END RESULTS

WDDL OR GATE

Synthesis Report
=========================================================================
                              Final Report
=========================================================================
Final Results
RTL Top Level Output File Name     : wddlor.ngr
Top Level Output File Name         : wddlor
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
IOs                                : 5

Cell Usage
BELS                               : 2
   LUT3                            : 2
IO Buffers                         : 5
   IBUF                            : 3
   OBUF                            : 2
=========================================================================
Device utilization summary
---------------------------
Selected Device                    : 3s250eft256-4
Number of Slices                   : 1 out of 2448    0%
Number of 4 input LUTs             : 2 out of 4896    0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 172     2%

Timing Summary
---------------
Speed Grade                        : -4
Maximum combinational path delay   : 6.236 ns

Simulation Result

Synthesis Result

WDDL AND GATE

Synthesis Report
=========================================================================
                              Final Report
=========================================================================
Final Results
RTL Top Level Output File Name     : wddlgates.ngr
Top Level Output File Name         : wddlgates
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
IOs                                : 5

Cell Usage
BELS                               : 2
   LUT3                            : 2
IO Buffers                         : 5
   IBUF                            : 3
   OBUF                            : 2
=========================================================================
Device utilization summary
---------------------------
Selected Device                    : 3s250etq144-4
Number of Slices                   : 1 out of 2448    0%
Number of 4 input LUTs             : 2 out of 4896    0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 108     4%

Timing Summary
---------------
Speed Grade                        : -4
Maximum combinational path delay   : 6.236 ns

Simulation Result

Synthesis Result

WDDL NAND GATE

Synthesis Report
=========================================================================
                              Final Report
=========================================================================
Final Results
RTL Top Level Output File Name     : wddlnand1.ngr
Top Level Output File Name         : wddlnand1
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
IOs                                : 5

Cell Usage
BELS                               : 2
   LUT3                            : 2
IO Buffers                         : 5
   IBUF                            : 3
   OBUF                            : 2
=========================================================================
Device utilization summary
Selected Device                    : 3s500efg320-4
Number of Slices                   : 1 out of 4656    0%
Number of 4 input LUTs             : 2 out of 9312    0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 232     2%

Timing Summary
Speed Grade                        : -4
Maximum combinational path delay   : 6.236 ns

Simulation Result

Synthesis Result

WDDL XOR GATE

Simulation Result

Synthesis Result

Synthesis Report
=========================================================================
                              Final Report
=========================================================================
Final Results
RTL Top Level Output File Name     : wddlxorgate.ngr
Top Level Output File Name         : wddlxorgate
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
IOs                                : 5

Cell Usage
BELS                               : 2
   LUT3                            : 2
IO Buffers                         : 5
   IBUF                            : 3
   OBUF                            : 2
=========================================================================
Device utilization summary
---------------------------
Selected Device                    : 3s250eft256-4
Number of Slices                   : 1 out of 2448    0%
Number of 4 input LUTs             : 2 out of 4896    0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 172     2%

Timing Summary
---------------
Speed Grade                        : -4
Maximum combinational path delay   : 6.236 ns

Simulation Result

Synthesis Result

CHAPTER 9 SUMMARY AND CONCLUSION

9.1 Summary

In order to secure ICs against side-channel attacks, especially Differential Power Analysis (DPA), it is necessary to implement the design in a logic style that renders constant power dissipation irrespective of the input combination. WDDL has proved advantageous over other such logic styles and is therefore of great significance. In this dissertation work, an architecture for the Blowfish algorithm is designed and implemented in WDDL style. In this implementation, the bottom-up approach is used: the low-level entities are designed first and are later combined to form the entire module. The key scheduling is online. The sub-keys generated for a particular key can be used for the encryption of all the data to be encrypted with that key. The sub-keys are supplied in reverse order for the decryption data path. Initially, logic gates are implemented in WDDL, and then higher modules are designed by instantiating the WDDL gates to form the entire module, thus resulting in constant power dissipation irrespective of the input data combination. The entire design works in two phases, namely the Precharge phase and the Evaluation phase. In the Precharge phase, all the signals of the design are zeroed, and during the Evaluation phase the functionality of the design is achieved. This sort of design has been found simple and very effective in thwarting the side-channel attack known as Differential Power Analysis (DPA).

9.2 Conclusion

The crypto processor has been designed for a key size of 448 bits and a plain text size of 64 bits. The code for the implementation has been written in VHDL. The functional verification has been done using the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The back end of the design has been done using SOC Encounter.

The desired functionality has been achieved according to the specifications. At the output, the number of transitions during the Evaluation phase is always the same, resulting in constant power dissipation. During synthesis, it has been observed that a simple WDDL gate comprises many conventional gates. Therefore, the area of the design has grown nearly three-fold compared to the design implemented in conventional CMOS logic; this is the cost of the security incorporated into the IC against Differential Power Analysis (DPA). Due to the constant power dissipation at the output, an attacker cannot apply the DPA scheme to find the secret key that is being used in the crypto processor. Thus, security against DPA is incorporated into the IC at the hardware level by implementing the design in WDDL style, which is quite simple and effective.

CHAPTER 10

REFERENCES 101 Referred Technical papers[1] Kris Tiri Member

IEEE and Ingrid Verbauwhede Senior Member IEEE ldquoA Digital Design Flow for

Secure Integrated Circuitsrdquo IEEE Transaction on Computer-Aided Design of Integrated

Circuits and Systems Vol 25 No 7 July 2006[2] Prof Jean-Jacques Quisquater Math

RiZK laquo Side Channel Attacks raquo October 2002[3] Ross Anderson Mike Bond Jolyon

Clulow and Sergei Skorobogatov ldquoCrypto processors ndash A Surveyrdquo IEEE Proceedings[4]

Noohul Basheer Zain Ali James M Noras ldquoOptimal Data path Design for a Cryptographic

Processor The Blowfish Algorithmrdquo Malaysian Journal of Computer Science Vol 14 No

1 June 2001 pp 16-27[5] Dan Rinehimer Derek Wilson ldquoSummary of B Schneierrsquos

Description of New Variable Length Key 64-Bit Block Cipher (Blowfish)rdquo[6] Kris Tiri and

Ingrid Verbauwhede ldquoA Dynamic and Differential CMOS Logic Style to Resist Power and

Timing Attacks on Security ICrsquosrdquo[7] Discretix Technologies Limited ldquoIntroduction to Side

45

Channel Attacksrdquo[8] Kris Tiri Moonmoon Akmal and Ingrid Verbauwhede ldquoA Dynamic

and Differential Logic with Signal Independent Power Consumption to withstand

Differential Power Analysis on Smart CardsrdquoReference books[1] William Stallings

ldquoCryptography and Network Security Principles and Practicesrdquo Pearson Education

2003[2] Samir Palnitkar ldquoVerilog HDL A Guide to Digital Design and Synthesisrdquo

Prentice Hall Referred Web Pages[1] httpwwwschneiercomblowfishhtml[2]

httpwwwnistgov[3] httpwwwdiscretixcomPDFIntroduction20to20Side20Channel

20Attackspdf[4] httpwwwwipointpctdbenwojsp

IA=WO2005081085ampDISPLAY=CLAIMS

46

  • INDEX
  • CHAPTER 1 INTRODUCTION AND OBJECTIVE
  • CHAPTER 2 REVIEW OF LITERATURE
  • CHAPTER 3 HARDWARE DESCRIPTIVE LANGUAGE (VHDL)
  • CHAPTER 4 SMART CARD OVERVIEW
  • CHAPTER 5 SIDE CHANNEL ATTACKS
  • CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES
Page 21: Implementation of Wddl Gates for Secure Ic Applications Sss 2004 (1)

CHAPTER 5 SIDE CHANNEL ATTACKS

"Side channel attacks" are attacks that are based on "side channel information". Side channel information is information that can be retrieved from the encryption device that is neither the plaintext to be encrypted nor the ciphertext resulting from the encryption process.

In the past, an encryption device was perceived as a unit that receives plaintext input and produces ciphertext output, and vice versa. Attacks were therefore based on knowing the ciphertext (ciphertext-only attacks), on knowing both plaintext and ciphertext (known-plaintext attacks), or on the ability to choose the plaintext to be encrypted and then see the results of the encryption (chosen-plaintext attacks). Today it is known that encryption devices have additional outputs, and often additional inputs, which are neither the plaintext nor the ciphertext.

Encryption devices produce timing information (information about the time that operations take) that is easily measurable, radiation of various sorts, power consumption statistics (which can be easily measured as well), and more. Often the encryption device also has additional "unintentional" inputs, such as voltage, that can be modified to cause predictable outcomes. Side channel attacks make use of some or all of this information, along with other (known) cryptanalytic techniques, to recover the key the device is using.

Side channel analysis techniques are of concern because the attacks can be mounted quickly and can sometimes be implemented using readily available hardware costing from only a few hundred dollars to thousands of dollars.

5.1 Classification of side channel attacks

The literature usually classifies side channel attacks along two orthogonal axes.

1. Invasive vs. non-invasive

Invasive attacks require de-packaging the chip to get direct access to its components. A typical example is connecting a wire to a data bus to observe the data transfers.

A non-invasive attack only exploits externally available information (the emission of which is, however, often unintentional), such as running time or power consumption.

A further category, semi-invasive attacks, has also been distinguished. These attacks require de-packaging the chip to get access to the chip surface, but they do not tamper with the passivation layer (they do not require electrical contact to the metal surface).

2. Active vs. passive

Active attacks try to tamper with the card's proper functioning. For example, fault induction attacks try to induce errors in the computation.

Passive attacks, by contrast, simply observe the card's behavior during its processing, without disturbing it.

Note that these two axes are indeed orthogonal: an invasive attack may completely avoid disturbing the card's behavior, and a passive attack may require a preliminary de-packaging for the required information to be observable. These attacks are, of course, not mutually exclusive. An invasive attack may, for example, serve as a preliminary step for a non-invasive one by giving a detailed description of the chip's architecture that helps to find out where to put external probes.

Smart cards are usually equipped with protection mechanisms that are supposed to react to invasive attacks (although several invasive attacks are nonetheless capable of defeating these mechanisms). On the other hand, it is worth pointing out that a non-invasive attack is completely undetectable: there is, for example, no way for a smart card to figure out that its running time is currently being measured. Other countermeasures are therefore necessary. From an economic point of view, invasive attacks are usually more expensive to deploy on a large scale, since they require individual processing of each attacked device. In this sense, non-invasive attacks constitute a bigger menace for the smart card industry.

Invasive attacks involve a relatively high capital investment in lab equipment, plus a moderate investment of effort for each individual chip attacked. Non-invasive attacks require only a moderate capital investment, plus a moderate investment of effort in designing an attack on a particular type of device; thereafter, the cost per device attacked is low. Semi-invasive attacks can be carried out using very cheap and simple equipment.

The attacker can gain information by means of:

1. Probing attacks
2. Fault induction attacks
3. Timing attacks
4. Power analysis attacks
5. Electromagnetic timing attacks

These attacks exploit the switching behavior of digital complementary metal-oxide-semiconductor (CMOS) gates. Of all these, the power analysis attack is of major concern.

5.2 Power analysis attacks

The power consumption of a cryptographic device may provide much information about the operations that take place and the parameters involved. This is the idea behind simple and differential power analysis, first introduced by Kocher et al. Like the clock signal, the card's energy is provided by the terminal and can therefore easily be measured. To measure a circuit's power consumption, a small (e.g., 50 ohm) resistor is inserted in series with the power or ground input. The voltage difference across the resistor divided by the resistance yields the current. Well-equipped electronics labs have equipment that can digitally sample voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than 1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC can be bought for less than US$ 400.
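The measurement described above amounts to an application of Ohm's law. As a minimal sketch in Python, the conversion from sampled voltages to a current and power trace might look as follows. The 5 V supply voltage, the sample values and the function names are illustrative assumptions and are not taken from the report.

```python
def supply_current(v_shunt, r_shunt=50.0):
    """Current through the series resistor in amperes (Ohm's law)."""
    return v_shunt / r_shunt


def device_power(v_shunt, v_supply=5.0, r_shunt=50.0):
    """Power drawn by the device in watts.

    v_supply = 5.0 V is an assumed value. The voltage left across the
    device is the supply voltage minus the drop over the resistor.
    """
    return supply_current(v_shunt, r_shunt) * (v_supply - v_shunt)


# Hypothetical oscilloscope samples of the voltage across the resistor, in volts.
shunt_samples = [0.050, 0.075, 0.120, 0.060]

current_trace = [supply_current(v) for v in shunt_samples]
power_trace = [device_power(v) for v in shunt_samples]

for v, i, p in zip(shunt_samples, current_trace, power_trace):
    print(f"{v * 1e3:6.1f} mV -> {i * 1e3:5.2f} mA, {p * 1e3:6.3f} mW")
```

A sequence of such samples taken across one cryptographic operation is what the following sections call a trace.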

Power analysis attacks are of two types:

1. Simple Power Analysis (SPA) attack
2. Differential Power Analysis (DPA) attack

SPA attacks on smart cards typically take a few seconds per card, while DPA attacks can take several hours. In general, from a somewhat academic perspective, we may consider the internal state of the block cipher to be all the intermediate results and values that are never included in the output in normal operation. For example, DES has 16 rounds; we can consider the intermediate states state[1..15] after each round except the last as secret internal state. Side channels typically give information about these internal states, or about the operations used in the transition of this internal state from one round to another. The type of side channel will, of course, determine what information is available to the attacker about these states. The attacks typically work by finding some information about the internal state of the cipher that can be learned both by guessing part of the key and checking the value directly, and additionally by some statistical property of the cipher that makes that checkable value slightly nonrandom.

5.2.1 Simple Power Analysis attack (SPA)

Simple Power Analysis is a technique that involves direct interpretation, generally by visual inspection, of power consumption measurements collected during cryptographic operations. SPA can yield information about a device's operation as well as key material.

A trace refers to a set of power consumption measurements taken across a cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a trace containing 5000 points. The figure shows an SPA trace from a smart card performing a DES operation.

Figure: SPA monitoring from a single DES operation performed by a typical smart card. The upper trace shows the entire encryption operation, including the initial permutation, the 16 DES rounds, and the final permutation. The lower trace is a detailed view of the second and third rounds.

Because SPA can reveal the sequence of instructions executed, it can be used to break cryptographic implementations in which the execution path depends on the data being processed. For example:

DES key schedule: The DES key schedule computation involves rotating 28-bit key registers. A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will contain different SPA features if the execution paths take different branches for each.

DES permutations: DES implementations perform a variety of bit permutations. Conditional branching in software or microcode can cause significant power consumption differences for "0" and "1" bits.

Comparisons: String or memory comparison operations typically perform a conditional branch when a mismatch is found. This conditional branching causes large SPA (and sometimes timing) characteristics.

Multipliers: Modular multiplication circuits tend to leak a great deal of information about the data they process. The leakage functions depend on the multiplier design, but are often strongly correlated to operand values and Hamming weights.

Exponentiators: A simple modular exponentiation function scans across the exponent, performing a squaring operation in every iteration, with an additional multiplication operation for each exponent bit that is equal to "1". The exponent can be compromised if squaring and multiplication operations have different power consumption characteristics, take different amounts of time, or are separated by different code. Modular exponentiation functions that operate on two or more exponent bits at a time may have more complex leakage functions.
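The exponentiator case can be illustrated with a short sketch. The Python code below is an idealized model, not a measurement: it assumes that an attacker can tell a squaring ("S") from a multiplication ("M") in the trace, and simply logs the operation sequence instead of simulating power. The function names and the example numbers are hypothetical.

```python
def square_and_multiply(base, exponent, modulus):
    """Left-to-right binary exponentiation that logs every operation."""
    result = 1
    operations = []
    for bit in bin(exponent)[2:]:          # exponent bits, most significant first
        result = (result * result) % modulus
        operations.append("S")             # a squaring happens for every bit
        if bit == "1":
            result = (result * base) % modulus
            operations.append("M")         # a multiplication only for "1" bits
    return result, "".join(operations)


def exponent_from_operations(operations):
    """Recover the exponent: 'SM' encodes a 1 bit, a lone 'S' encodes a 0 bit."""
    bits = []
    i = 0
    while i < len(operations):
        if operations[i:i + 2] == "SM":
            bits.append("1")
            i += 2
        else:
            bits.append("0")
            i += 1
    return int("".join(bits), 2)


secret_exponent = 0b101101
value, leaked = square_and_multiply(7, secret_exponent, 1009)
print(leaked)
print(bin(exponent_from_operations(leaked)))
```

In this model the operation sequence alone determines the exponent, which is why implementations that branch on key bits are vulnerable to SPA.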

5.2.2 Differential Power Analysis attack (DPA)

In addition to large-scale power variations due to the instruction sequence, there are effects correlated to the data values being manipulated. These variations tend to be smaller and are sometimes overshadowed by measurement errors and other noise. In such cases, it is still often possible to break the system using statistical functions tailored to the target algorithm.

To implement the DPA attack, an attacker first observes m encryption operations and captures power traces $T_{1..m}[1..k]$ containing k samples each. In addition, the attacker records the ciphertexts $C_{1..m}$. No knowledge of the plaintext is required. DPA uses power consumption measurements to determine whether a key block guess $K_s$ is correct. For each guess, a selection function $D(C_i, b, K_s)$ computes from the ciphertext $C_i$ the value that a target bit b of an intermediate result would have if $K_s$ were the correct key block. The attacker computes a k-sample differential trace $\Delta_D[1..k]$ by finding the difference between the average of the traces for which D is one and the average of the traces for which D is zero. Thus $\Delta_D[j]$ is the average over $C_{1..m}$ of the effect due to the value represented by the selection function D on the power consumption at point j. In particular,

\[
\Delta_D[j] = \frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}{\sum_{i=1}^{m} D(C_i, b, K_s)} - \frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)}
\]

If $K_s$ is incorrect, the bit computed using D will differ from the actual target bit for about half of the ciphertexts $C_i$. The selection function is thus effectively uncorrelated to what was actually computed by the target device. If a random function is used to divide a set into two subsets, the difference in the averages of the subsets should approach zero as the subset sizes approach infinity. Thus, trace components uncorrelated to D diminish with $1/\sqrt{m}$, causing the differential trace to become flat. (The actual trace may not be completely flat, as D with $K_s$ incorrect may have a weak correlation to D with the correct $K_s$.)

If $K_s$ is correct, however, the computed value for $D(C_i, b, K_s)$ will equal the actual value of target bit b with probability 1. The selection function is thus correlated to the value of the bit considered. Other data values, measurement errors, etc., that are not correlated to D approach zero. Because power consumption is correlated to data bit values, the plot of $\Delta_D$ will be flat with spikes in regions where D is correlated to the values being processed. The correct value of $K_s$ can thus be identified from the spikes in its differential trace.

For DES, four values of b correspond to each S-box, providing confirmation of key block guesses. Finding all eight $K_s$ yields the entire 48-bit round subkey. The remaining 8 key bits can be found easily using exhaustive search or by analyzing one additional round. Triple DES keys can be found by analyzing an outer DES operation first, using the resulting key to decrypt the ciphertexts, and attacking the next DES key. DPA can use known plaintext or known ciphertext and can find encryption or decryption keys.
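The difference-of-means procedure can be tried out on simulated data. The following Python sketch is a toy model under several assumptions that are not part of the report: the device leaks the Hamming weight of an S-box output plus Gaussian noise, each trace consists of a single sample, the attacker knows the data byte that is XORed with the key, and the AES S-box is used instead of DES purely as a convenient nonlinear function. All function names and parameter values are hypothetical.

```python
import random


def rotl8(x, n):
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def aes_sbox():
    """Generate the AES S-box (used here only as a toy nonlinear function)."""
    sbox = [0] * 256
    p = q = 1
    while True:
        p = (p ^ (p << 1) ^ (0x1B if p & 0x80 else 0)) & 0xFF  # p = p * 3 in GF(2^8)
        q = (q ^ (q << 1)) & 0xFF                               # q = q / 3 in GF(2^8)
        q = (q ^ (q << 2)) & 0xFF
        q = (q ^ (q << 4)) & 0xFF
        if q & 0x80:
            q ^= 0x09
        sbox[p] = q ^ rotl8(q, 1) ^ rotl8(q, 2) ^ rotl8(q, 3) ^ rotl8(q, 4) ^ 0x63
        if p == 1:
            break
    sbox[0] = 0x63                                              # 0 has no inverse
    return sbox


SBOX = aes_sbox()


def capture_traces(secret_key, n_traces, noise_sigma, rng):
    """Simulated device: power sample = Hamming weight of S[data ^ key] + noise."""
    data = [rng.randrange(256) for _ in range(n_traces)]
    traces = [bin(SBOX[d ^ secret_key]).count("1") + rng.gauss(0.0, noise_sigma)
              for d in data]
    return data, traces


def selection(d, key_guess, bit=0):
    """Selection function D: predicted value of one S-box output bit."""
    return (SBOX[d ^ key_guess] >> bit) & 1


def differential(data, traces, key_guess):
    """Mean of the traces with D = 1 minus mean of the traces with D = 0."""
    sums = [0.0, 0.0]
    counts = [0, 0]
    for d, t in zip(data, traces):
        v = selection(d, key_guess)
        sums[v] += t
        counts[v] += 1
    return sums[1] / counts[1] - sums[0] / counts[0]


def dpa_attack(data, traces):
    """Return the key guess with the largest differential, plus all differentials."""
    diffs = [differential(data, traces, g) for g in range(256)]
    best = max(range(256), key=lambda g: abs(diffs[g]))
    return best, diffs


rng = random.Random(2024)
data, traces = capture_traces(secret_key=0x3C, n_traces=4000, noise_sigma=1.0, rng=rng)
recovered, diffs = dpa_attack(data, traces)
print(f"recovered key byte: {recovered:#04x}")
```

In this model the differential for the correct guess is expected to be close to one Hamming-weight unit, while wrong guesses produce smaller values that shrink further as more traces are averaged.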

CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES

The power consumption of traditional standard cells and logic is dependent on the signal activity. When the output of a logic gate makes a 0-to-1 transition, a current comes from the power supply and charges the output capacitance. On the other hand, when the output sees a 1-to-0, a 0-to-0, or a 1-to-1 transition, no energy or only a limited amount of energy (due to short circuit or leakage) is consumed from the power supply. This is the fundamental reason why information is leaked through the power supply and why power attacks are possible. The basis of a secure digital design flow is a logic style with constant power consumption.
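The data dependence described above can be made concrete with a first-order model. The Python sketch below assumes that charging a load capacitance C to the supply voltage draws an energy of C * Vdd^2 from the supply and that all other transitions draw nothing (short-circuit and leakage currents are ignored). The capacitance and voltage values and the function names are illustrative assumptions.

```python
def supply_energy(prev_out, next_out, c_load=10e-15, vdd=1.8):
    """Energy in joules drawn from the supply for one output transition.

    First-order model: only a 0 -> 1 transition charges the load
    capacitance, drawing c_load * vdd^2 from the supply.
    """
    if prev_out == 0 and next_out == 1:
        return c_load * vdd * vdd
    return 0.0


def energy_profile(outputs, c_load=10e-15, vdd=1.8):
    """Supply energy for each consecutive pair of output values."""
    return [supply_energy(a, b, c_load, vdd) for a, b in zip(outputs, outputs[1:])]


def charging_events(outputs):
    """Number of 0 -> 1 transitions in an output sequence."""
    return sum(1 for a, b in zip(outputs, outputs[1:]) if (a, b) == (0, 1))


# Two hypothetical output sequences of the same gate for different input data.
toggling = [0, 1, 0, 1, 1, 0]
constant = [0, 0, 0, 0, 0, 0]
print(charging_events(toggling), charging_events(constant))
```

The two sequences draw different amounts of energy from the supply, and this difference is what an attacker observes.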

6.1 Current Mode Logic

Current mode logic (CML), e.g., current steering logic, seems the ideal solution. This type of logic continuously draws a current from the supply and measures its state through the path that the current takes. A gate has constant power consumption if it draws a perfectly constant current from the power supply, independently of the input and output signals. To build a current source capable of generating a constant current, special circuit techniques that minimize channel length modulation have to be used.

The decisive drawback of CML, however, is its static power consumption. Even when the logic gate is not processing any data, it burns current, which makes this logic style unacceptable for embedded battery-operated devices.

6.2 Voltage Mode Logic (CMOS circuit styles)

Voltage mode logic (VML), e.g., static CMOS logic, only draws a current from the supply to change state and measures its state by the amount of charge it stores on a capacitance. A regular standard CMOS circuit only consumes power when a capacitance gets charged and later discharged, i.e., when a gate switches state. This is the main reason that CMOS is the style of choice for battery-operated and low-power devices. This is illustrated in the figure below for a simple inverter. Thus static CMOS is the preferred logic style because of its low power consumption and high noise margins.

Figure: Standard CMOS inverter

Yet two conditions must be satisfied for VML to have constant power consumption, namely:

1) A logic gate must have exactly one switching event per signal transition.
2) The logic gate must charge a constant capacitance in that switching event.

In the figure above, all four transitions of the CMOS inverter can be distinguished when monitoring the power supply.

6.3 Dynamic Differential Logic

Dynamic differential logic, sometimes also referred to as dual rail with pre-charge logic, fulfills the first condition. A differential logic family uses the true and the false representation of the input and output signals, and a dynamic logic family alternates pre-charge and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1, exactly one of the two output nodes evaluates to 0 in the evaluation phase, producing a differential output signal. The discharged output node is charged to 1 again in the following pre-charge phase, so that both outputs are again at 1. In other words, every signal transition, including the events in which the input signals remain constant, is represented by an actual switching event in which the logic gate charges a capacitance. The following logic families have been introduced to thwart differential power analysis (DPA) by using dynamic differential logic:

1. Sense Amplifier Based Logic (SABL)
2. Wave Dynamic Differential Logic (WDDL)

6.3.1 Sense Amplifier Based Logic (SABL)

The main advantage of SABL is that it has balanced input and output nodes and that all internal nodes connect to an output. The output capacitances can be balanced. Systematic methods have been developed to make sure that both branches of the differential pull-down network are balanced and that no memory effects are present in the network. Sense Amplifier Based Logic is illustrated in the figure below.

Figure: Sense Amplifier Based Logic AND/NAND gate

This circuit style does, however, require full custom characterization and layout. It also suffers from a high clock load, which is common to all dynamic logic gates.

6.3.2 Wave Dynamic Differential Logic (WDDL) gates

WDDL can be implemented with static CMOS logic. Static CMOS standard cells are combined to form secure compound standard cells, which have a reduced power signature. WDDL has many advantages. It can be readily implemented from an existing standard cell library. The design flow is fully supported with accurate EDA library files that come directly from the vendor. WDDL also results in a dynamic differential logic with only a small load capacitance on the pre-charge control signal and with the low power consumption and the high noise margins of static CMOS.

Advantages of the WDDL logic style are as follows:

1. A major advantage of the proposed logic style is that it can be incorporated into the common Electronic Design Automation (EDA) tool flow.
2. No special design rules are involved in the interconnection of WDDL gates.
3. The switching factor of WDDL is 100%.

A WDDL gate consists of a parallel combination of two positive complementary gates, one calculating the true output using the true inputs, the other the false output using the false inputs. A positive gate produces a zero output for an all-zero input. The AND gate and the OR gate are examples of positive gates. A complementary gate, sometimes also referred to as a dual gate, expresses the false output of the original logic gate using the false inputs of the original gate. The AND gate fed with true input signals and the OR gate fed with false input signals are two dual gates. The figure shows the WDDL AND gate and the WDDL OR gate.

In the evaluation phase, each input signal is differential and the WDDL gate calculates its differential output. In the pre-charge phase, the inputs to the WDDL gate are set at 0. This puts the output of the gate at 0. A module in WDDL pre-charges without distributing the pre-charge signal to each individual gate. During the pre-charge phase, the input vector of the combinatorial logic is set at all 0s. Each individual gate will eventually have all its inputs at 0, evaluate its output to 0, and pass this 0 value to the next gate. One could say that the pre-charge signal travels over the combinatorial logic as a 0-wave, hence the name WDDL.

There are several ways to launch the pre-charge wave. In the figure below, a pre-charge operator is inserted at the start of every combinatorial logic tree, i.e., at the inputs of the encryption module and at the outputs of the registers. These operators produce an all-zero output in the pre-charge phase (clk signal high) but let the differential signal through during the evaluation phase (clk signal low).
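The behavior of the pre-charge wave can be checked with a small behavioral model. The Python sketch below is not the VHDL of the project. It represents each differential signal as a pair (true rail, false rail), models the pre-charge operator as a function that forces both rails to 0 while clk is high, and uses a hypothetical two-gate circuit y = (a AND b) OR c. All names are assumptions made for illustration.

```python
from itertools import product


def encode(bit):
    """Differential encoding of a logic value: (true rail, false rail)."""
    return (bit, 1 - bit)


def precharge_operator(clk, signal):
    """Force both rails to 0 while clk is high, pass the signal while clk is low."""
    return (0, 0) if clk else signal


def wddl_and(a, b):
    """AND on the true rails in parallel with its dual (OR) on the false rails."""
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a, b):
    """OR on the true rails in parallel with its dual (AND) on the false rails."""
    return (a[0] | b[0], a[1] & b[1])


def circuit(clk, a, b, c):
    """y = (a AND b) OR c; returns every differential signal in the circuit."""
    a_s = precharge_operator(clk, encode(a))
    b_s = precharge_operator(clk, encode(b))
    c_s = precharge_operator(clk, encode(c))
    n1 = wddl_and(a_s, b_s)
    y = wddl_or(n1, c_s)
    return [a_s, b_s, c_s, n1, y]


def rising_edges(a, b, c):
    """Count rails that go 0 -> 1 between the pre-charge and evaluation phases."""
    pre = circuit(1, a, b, c)
    evaluated = circuit(0, a, b, c)
    return sum(1
               for p_sig, e_sig in zip(pre, evaluated)
               for p, e in zip(p_sig, e_sig)
               if p == 0 and e == 1)


edge_counts = {rising_edges(a, b, c) for a, b, c in product((0, 1), repeat=3)}
print(edge_counts)
```

In this model the all-zero inputs propagate through both gates without any gate receiving the clock, and every differential signal contributes exactly one rising rail per cycle, whatever the input data.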

Figure: WDDL pre-charge wave generation

CHAPTER 7 WDDL GATES

The methodology used in the project is a bottom-up approach. Lower modules are designed and later integrated to form larger modules, whose further integration leads to the final top module. Since logic gates form the lowest-level modules, the logic gates required for the design are first implemented in WDDL style. WDDL demands a parallel combination of two positive complementary gates, one calculating the true value and the other the false value. Logic gates such as OR, AND and XOR have been implemented. In addition, a full adder, a 32-bit XOR, etc. have been implemented.

7.1 WDDL OR gate

A WDDL OR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e., an AND gate, as shown in the following figure.

Figure 4.1 WDDL OR Gate

The NAND and NOT gates act as the Precharge operator, injecting the signal '0' into the gates when the 'clk' signal is high, i.e., during the Precharge phase.

7.2 WDDL AND gate

A WDDL AND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e., an OR gate, as shown in the following figure.

Figure 4.2 WDDL AND Gate

7.3 WDDL NAND gate

A WDDL NAND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e., an OR gate, as shown in the following figure.

Figure 4.3 WDDL NAND Gate

7.4 WDDL NOR gate

A WDDL NOR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e., an AND gate, as shown in the following figure.

Figure 4.4 WDDL NOR Gate

7.5 WDDL XOR gate

The XOR function can be implemented by the Boolean equation A ⊕ B = AB' + A'B. Therefore the XOR gate is implemented in terms of AND gates and OR gates, as shown in Figure 4.5. It can also be implemented by instantiating WDDL AND gates and WDDL OR gates, but the number of gates involved in the latter is greater than in the former. Therefore the first method of implementation is followed rather than the second.
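The XOR construction can be sketched as a behavioral model in Python. The true output follows the equation A ⊕ B = AB' + A'B using the differential rails, so no inverter is needed. The false output shown here is the dual network obtained by De Morgan's rule, which is an assumption about the structure and not a transcription of the report's figure. The 32-bit variant simply instantiates the one-bit gate 32 times. All function names are hypothetical.

```python
def encode(bit):
    """Differential encoding of a logic value: (true rail, false rail)."""
    return (bit, 1 - bit)


def wddl_xor(a, b):
    """One-bit WDDL XOR built only from AND and OR operations on the rails."""
    (a_t, a_f), (b_t, b_f) = a, b
    true_out = (a_t & b_f) | (a_f & b_t)    # A.B' + A'.B
    false_out = (a_f | b_t) & (a_t | b_f)   # dual network: (A' + B).(A + B')
    return (true_out, false_out)


def encode_word(value, width=32):
    """Differential encoding of an integer, least significant bit first."""
    return [encode((value >> i) & 1) for i in range(width)]


def decode_word(rails):
    """Read the integer value back from the true rails."""
    return sum(t << i for i, (t, _f) in enumerate(rails))


def wddl_xor32(x, y):
    """32-bit XOR obtained by instantiating the one-bit gate for every bit."""
    return [wddl_xor(a, b) for a, b in zip(x, y)]


result = wddl_xor32(encode_word(0xDEADBEEF), encode_word(0x12345678))
print(hex(decode_word(result)))
```

With all input rails at 0, as in the Precharge phase, both outputs of the gate are 0. With valid differential inputs exactly one output rail is 1.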

Figure 4.5 WDDL XOR gate

With the help of the above basic gates, a full adder circuit has been designed by instantiating the WDDL gates designed above. During the implementation of the Blowfish algorithm, a 32-bit XOR gate and a 32-bit adder circuit are required. They can easily be implemented by instantiating the corresponding lower module 32 times.

CHAPTER 8 FRONT END RESULTS

WDDL OR GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name     : wddlor.ngr
Top Level Output File Name         : wddlor
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
IOs                                : 5

Cell Usage
BELS                               : 2
    LUT3                           : 2
IO Buffers                         : 5
    IBUF                           : 3
    OBUF                           : 2

Device utilization summary
Selected Device                    : 3s250eft256-4
Number of Slices                   : 1 out of 2448 (0%)
Number of 4 input LUTs             : 2 out of 4896 (0%)
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 172 (2%)

Timing Summary
Speed Grade                        : -4
Maximum combinational path delay   : 6.236 ns

Figure: Simulation Result

Figure: Synthesis Result

WDDL AND GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name     : wddlgates.ngr
Top Level Output File Name         : wddlgates
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
IOs                                : 5

Cell Usage
BELS                               : 2
    LUT3                           : 2
IO Buffers                         : 5
    IBUF                           : 3
    OBUF                           : 2

Device utilization summary
Selected Device                    : 3s250etq144-4
Number of Slices                   : 1 out of 2448 (0%)
Number of 4 input LUTs             : 2 out of 4896 (0%)
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 108 (4%)

Timing Summary
Speed Grade                        : -4
Maximum combinational path delay   : 6.236 ns

Figure: Simulation Result

Figure: Synthesis Result

WDDL NAND GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name     : wddlnand1.ngr
Top Level Output File Name         : wddlnand1
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
IOs                                : 5

Cell Usage
BELS                               : 2
    LUT3                           : 2
IO Buffers                         : 5
    IBUF                           : 3
    OBUF                           : 2

Device utilization summary
Selected Device                    : 3s500efg320-4
Number of Slices                   : 1 out of 4656 (0%)
Number of 4 input LUTs             : 2 out of 9312 (0%)
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 232 (2%)

Timing Summary
Speed Grade                        : -4
Maximum combinational path delay   : 6.236 ns

Figure: Simulation Result

Figure: Synthesis Result

WDDL XOR GATE

Figure: Simulation Result

Figure: Synthesis Result

WDDL XOR GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name     : wddlxorgate.ngr
Top Level Output File Name         : wddlxorgate
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
IOs                                : 5

Cell Usage
BELS                               : 2
    LUT3                           : 2
IO Buffers                         : 5
    IBUF                           : 3
    OBUF                           : 2

Device utilization summary
Selected Device                    : 3s250eft256-4
Number of Slices                   : 1 out of 2448 (0%)
Number of 4 input LUTs             : 2 out of 4896 (0%)
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 172 (2%)

Timing Summary
Speed Grade                        : -4
Maximum combinational path delay   : 6.236 ns

Figure: Simulation Result

Figure: Synthesis Result

CHAPTER 9 SUMMARY AND CONCLUSION

9.1 Summary

In order to provide security to ICs against side-channel attacks, especially Differential Power Analysis (DPA), it is necessary to implement the design in a logic style that renders power dissipation constant irrespective of the input combination. WDDL has been shown to have advantages over other such logic styles and is therefore of great significance. In this dissertation work, an architecture for the Blowfish algorithm is designed and implemented in WDDL style. A bottom-up approach is used in this implementation: the low-level entities are designed first and are later combined to form the entire module. The key scheduling is online. The sub-keys generated for a particular key can be used for the encryption of all the data to be encrypted with that key. The sub-keys are supplied in reverse order for the decryption data path. Initially, logic gates are implemented in WDDL, and then higher modules are designed by instantiating the WDDL gates to form the entire module, thus resulting in constant power dissipation irrespective of the input data combination. The entire design works in two phases, namely the Precharge phase and the Evaluation phase. In the Precharge phase all the signals of the design are zeroed, and during the Evaluation phase the functionality of the design is achieved. This sort of design has been found simple and very effective in thwarting the side-channel attack known as Differential Power Analysis (DPA).

9.2 Conclusion

The crypto processor has been designed for a key size of 448 bits and a plaintext size of 64 bits. The code for the implementation has been written in VHDL. Functional verification has been done using the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The back end of the design is done using SOC Encounter. The desired functionality has been achieved according to the specifications. In the output, during the Evaluation phase, there is the same number of transitions in every cycle, resulting in constant power dissipation. During synthesis it was observed that a simple WDDL gate comprises many conventional gates. Therefore the area of the design has grown nearly three-fold compared to the design implemented in conventional CMOS logic; this is the cost of the security incorporated into the IC against Differential Power Analysis (DPA). Due to the constant power dissipation at the output, an attacker cannot apply the Differential Power Analysis (DPA) scheme to find the secret key that is being used in the crypto-processor. Thus security against DPA is incorporated into the IC at the hardware level by implementing the design in WDDL style,
which is quite simple and effective.


Invasive attacks require de-packaging the chip to get direct access to its components. A typical example is connecting a wire to a data bus to observe the data transfers.

A non-invasive attack exploits only externally available information (the emission of which is, however, often unintentional), such as running time or power consumption.

A further category is the semi-invasive attack. These attacks require de-packaging the chip to get access to its surface, but they do not tamper with the passivation layer (they do not require electrical contact to the metal surface).

2. Active vs. passive

Active attacks try to tamper with the card's proper functioning. For example, fault induction attacks try to induce errors in the computation.

Passive attacks, in contrast, simply observe the card's behavior during its processing without disturbing it.

Note that these two axes are orthogonal. An invasive attack may completely avoid disturbing the card's behavior, and a passive attack may require a preliminary de-packaging for the required information to be observable.

These attacks are of course not mutually exclusive: an invasive attack may, for example, serve as a preliminary step for a non-invasive one by giving a detailed description of the chip's architecture that helps to find out where to put external probes.

Smart cards are usually equipped with protection mechanisms that are supposed to react to invasive attacks (although several invasive attacks are nonetheless capable of defeating these mechanisms, as will be illustrated below). On the other hand, it is worth pointing out that a non-invasive attack is completely undetectable: there is, for example, no way for a smart card to figure out that its running time is currently being measured. Other countermeasures are therefore necessary. From an economic point of view, invasive attacks are usually more expensive to deploy on a large scale, since they require individual processing of each attacked device. In this sense, non-invasive attacks constitute a bigger menace for the smart card industry.

Invasive attacks involve a relatively high capital investment for lab equipment plus a moderate investment of effort for each individual chip attacked. Non-invasive attacks require only a moderate capital investment plus a moderate investment of effort in designing an attack on a particular type of device; thereafter the cost per device attacked is low. Semi-invasive attacks can be carried out using very cheap and simple equipment.

The attacker can gain information by:

1. Probing attacks
2. Fault induction attacks
3. Timing attacks
4. Power analysis attacks
5. Electromagnetic timing attacks

These attacks exploit the switching behavior of digital complementary metal-oxide-semiconductor (CMOS) gates. Of all these, the power analysis attack is of major concern.

5.2 Power analysis attacks

The power consumption of a cryptographic device may provide much information about the operations that take place and the parameters involved. This is the idea of simple and differential power analysis, first introduced by Kocher et al. As with the clock signal, the card's energy is provided by the terminal and can therefore easily be measured. Basically, to measure a circuit's power consumption, a small (e.g., 50 ohm) resistor is inserted in series with the power or ground input. The voltage difference across the resistor divided by the resistance yields the current. Well-equipped electronics labs have equipment that can digitally sample voltage differences at extraordinarily high rates (over 1 GHz) with excellent accuracy (less than 1% error). Devices capable of sampling at 20 MHz or faster and transferring the data to a PC can be bought for less than US$ 400.
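The measurement arithmetic described above can be sketched in Python. This is only a minimal illustration, not a capture tool: the function names and the sample voltages are assumptions made for the example, and the 50 ohm default simply mirrors the value quoted in the text.

```python
def shunt_current(v_drop, r_shunt=50.0):
    """Current through the series resistor, from Ohm's law (I = V / R)."""
    return v_drop / r_shunt


def current_trace(voltage_samples, r_shunt=50.0):
    """Convert sampled voltage drops (volts) into a current trace (amperes)."""
    return [shunt_current(v, r_shunt) for v in voltage_samples]


# Hypothetical oscilloscope samples of the voltage across the resistor, in volts.
samples = [0.10, 0.25, 0.40, 0.15]
trace = current_trace(samples)
print(trace)
```

A larger resistor gives a larger voltage swing per unit of current, but it also lowers the supply voltage seen by the card, which is presumably why a small value is used.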

Power analysis attacks are of two types:

1. Simple Power Analysis (SPA) attack
2. Differential Power Analysis (DPA) attack

SPA attacks on smart cards typically take a few seconds per card, while DPA attacks can take several hours. In general, from a somewhat academic perspective, we may consider the internal state of the block cipher to be all the intermediate results and values that are never included in the output in normal operation. For example, DES has 16 rounds; we can consider the intermediate states state[1..15] after each round except the last as secret internal state. Side channels typically give information about these internal states, or about the operations used in the transition of the internal state from one round to another. The type of side channel will of course determine what information about these states is available to the attacker. The attacks typically work by finding some information about the internal state of the cipher that can be learned both by guessing part of the key and checking the value directly, and additionally by some statistical property of the cipher that makes that checkable value slightly nonrandom.

5.2.1 Simple Power Analysis attack (SPA)

Simple Power Analysis is a technique that involves direct interpretation of power consumption measurements collected during cryptographic operations, generally by looking at a visual representation of the power consumption of a unit while an encryption operation is being performed. SPA can yield information about a device's operation as well as key material.

A trace refers to a set of power consumption measurements taken across a cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a trace containing 5000 points. The figure shows an SPA trace from a smart card performing a DES operation.

Figure: SPA monitoring from a single DES operation performed by a typical smart card. The upper trace shows the entire encryption operation, including the initial permutation, the 16 DES rounds, and the final permutation. The lower trace is a detailed view of the second and third rounds.

Because SPA can reveal the sequence of instructions executed, it can be used to break cryptographic implementations in which the execution path depends on the data being processed. For example:

DES key schedule: The DES key schedule computation involves rotating 28-bit key registers. A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will contain different SPA features if the execution paths take different branches for each.

DES permutations: DES implementations perform a variety of bit permutations. Conditional branching in software or microcode can cause significant power consumption differences for "0" and "1" bits.

Comparisons: String or memory comparison operations typically perform a conditional branch when a mismatch is found. This conditional branching causes large SPA (and sometimes timing) characteristics.

Multipliers: Modular multiplication circuits tend to leak a great deal of information about the data they process. The leakage functions depend on the multiplier design but are often strongly correlated to operand values and Hamming weights.

Exponentiators: A simple modular exponentiation function scans across the exponent, performing a squaring operation in every iteration with an additional multiplication operation for each exponent bit that is equal to "1". The exponent can be compromised if squaring and multiplication operations have different power consumption characteristics, take different amounts of time, or are separated by different code. Modular exponentiation functions that operate on two or more exponent bits at a time may have more complex leakage functions.
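The exponentiator case can be made concrete with a short Python sketch. It assumes an idealized attacker who can tell a squaring ("S") from a multiplication ("M") in the trace; the function names and the logged operation list are illustrative stand-ins for that observation, not part of the original design.

```python
def square_and_multiply(base, exponent, modulus):
    """Left-to-right binary exponentiation that also logs its operations."""
    result = 1
    ops = []
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus   # squaring in every iteration
        ops.append("S")
        if bit == "1":
            result = (result * base) % modulus  # extra multiply for a 1 bit
            ops.append("M")
    return result, ops


def recover_exponent(ops):
    """Rebuild the exponent from the observed operation sequence."""
    bits = []
    for op in ops:
        if op == "S":
            bits.append("0")   # assume a 0 bit until a multiply follows
        else:
            bits[-1] = "1"     # a multiply marks the current bit as 1
    return int("".join(bits), 2)


value, ops = square_and_multiply(7, 0b101101, 1009)
print("".join(ops))
print(recover_exponent(ops))
```

Each "S" followed by "M" marks a 1 bit and each lone "S" marks a 0 bit, so the operation sequence alone determines the secret exponent.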

5.2.2 Differential Power Analysis attack (DPA)

In addition to large-scale power variations due to the instruction sequence, there are effects correlated to the data values being manipulated. These variations tend to be smaller and are sometimes overshadowed by measurement errors and other noise. In such cases it is still often possible to break the system using statistical functions tailored to the target algorithm.

To implement the DPA attack, an attacker first observes $m$ encryption operations and captures power traces $T_{1..m}[1..k]$ containing $k$ samples each. In addition, the attacker records the ciphertexts $C_{1..m}$. No knowledge of the plaintext is required. DPA uses power consumption measurements to determine whether a key block guess $K_s$ is correct. A selection function $D(C_i, b, K_s)$ computes, from ciphertext $C_i$ and the key block guess $K_s$, the predicted value of an intermediate target bit $b$. The attacker computes a $k$-sample differential trace $\Delta_D[1..k]$ by finding the difference between the average of the traces for which $D$ is one and the average of the traces for which $D$ is zero. Thus $\Delta_D[j]$ is the average over $C_{1..m}$ of the effect due to the value represented by the selection function $D$ on the power consumption at point $j$. In particular,

$$\Delta_D[j] = \frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}{\sum_{i=1}^{m} D(C_i, b, K_s)} - \frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)}$$

If $K_s$ is incorrect, the bit computed using $D$ will differ from the actual target bit for about half of the ciphertexts $C_i$. The selection function is thus effectively uncorrelated to what was actually computed by the target device. If a random function is used to divide a set into two subsets, the difference in the averages of the subsets should approach zero as the subset sizes approach infinity. Thus, if $K_s$ is incorrect, $\Delta_D[j]$ approaches zero, because trace components uncorrelated to $D$ diminish with $1/\sqrt{m}$, causing the differential trace to become flat. (The actual trace may not be completely flat, as $D$ with $K_s$ incorrect may have a weak correlation to $D$ with the correct $K_s$.)

If $K_s$ is correct, however, the computed value for $D(C_i, b, K_s)$ will equal the actual value of target bit $b$ with probability 1. The selection function is thus correlated to the value of the bit considered. Other data values, measurement errors, etc. that are not correlated to $D$ approach zero. Because power consumption is correlated to data bit values, the plot of $\Delta_D$ will be flat with spikes in regions where $D$ is correlated to the values being processed. The correct value of $K_s$ can thus be identified from the spikes in its differential trace.

Four values of $b$ correspond to each S-box, providing confirmation of key block guesses. Finding all eight $K_s$ yields the entire 48-bit round subkey. The remaining 8 key bits can be found easily using exhaustive search or by analyzing one additional round. Triple DES keys can be found by analyzing an outer DES operation first, using the resulting key to decrypt the ciphertexts, and attacking the next DES key. DPA can use known plaintext or known ciphertext and can find encryption or decryption keys.
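The difference-of-means procedure can be illustrated with a small simulation. The sketch below is a toy model under several assumptions that are not in the report: a 4-bit key, the 4-bit S-box of the PRESENT cipher used purely as a convenient nonlinear table instead of a DES S-box, known input data, Gaussian noise, and a single trace sample (LEAK_SAMPLE) at which the target bit leaks. All names are hypothetical.

```python
import random

# 4-bit S-box of the PRESENT cipher, used here only as a small nonlinear table.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

LEAK_SAMPLE = 3   # assumed sample index at which the target bit leaks
TRACE_LEN = 8     # k: number of samples per trace


def select(data, key_guess):
    """Selection function D: predicted top bit of the S-box output."""
    return (SBOX[data ^ key_guess] >> 3) & 1


def capture_traces(secret_key, m, rng, noise=0.5):
    """Simulate m traces: Gaussian noise plus one data-dependent sample."""
    inputs, traces = [], []
    for _ in range(m):
        data = rng.randrange(16)
        trace = [rng.gauss(0.0, noise) for _ in range(TRACE_LEN)]
        trace[LEAK_SAMPLE] += select(data, secret_key)  # leak of the target bit
        inputs.append(data)
        traces.append(trace)
    return inputs, traces


def differential_trace(inputs, traces, key_guess):
    """Mean of traces with D = 1 minus mean of traces with D = 0, per sample."""
    sums = [[0.0] * TRACE_LEN, [0.0] * TRACE_LEN]
    counts = [0, 0]
    for data, trace in zip(inputs, traces):
        d = select(data, key_guess)
        counts[d] += 1
        for j, sample in enumerate(trace):
            sums[d][j] += sample
    return [sums[1][j] / counts[1] - sums[0][j] / counts[0]
            for j in range(TRACE_LEN)]


def recover_key(inputs, traces):
    """Return the key guess whose differential trace has the largest spike."""
    def peak(guess):
        return max(abs(x) for x in differential_trace(inputs, traces, guess))
    return max(range(16), key=peak)


rng = random.Random(1)
inputs, traces = capture_traces(secret_key=0xB, m=4000, rng=rng)
print(recover_key(inputs, traces))
```

In this model the differential trace for the correct guess shows a spike of roughly one unit at the leaking sample, while wrong guesses stay closer to zero, which is the behaviour the text describes.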

CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES

The power consumption of traditional standard cells and logic depends on the signal activity. When the output of a logic gate makes a 0-to-1 transition, a current flows from the power supply and charges the output capacitance. On the other hand, when the output sees a 1-to-0, a 0-to-0 or a 1-to-1 transition, no energy or only a limited amount (due to short-circuit or leakage currents) is drawn from the power supply. This is the fundamental reason why information leaks through the power supply and why power attacks are possible. The basis of a secure digital design flow is therefore a logic style with constant power consumption.
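A minimal Python model makes this data dependence visible. It assumes an idealized gate that draws C·Vdd² from the supply on a 0-to-1 output transition and nothing otherwise (short-circuit and leakage currents are ignored); the function names and the unit values for the load capacitance and supply voltage are illustrative assumptions.

```python
def supply_energy(prev, new, c_load=1.0, vdd=1.0):
    """Energy drawn from the supply for one output transition (idealized)."""
    if prev == 0 and new == 1:
        return c_load * vdd ** 2   # charging the output capacitance
    return 0.0                     # 1->0, 0->0 and 1->1 draw (almost) nothing


def energy_trace(outputs, initial=0):
    """Per-cycle supply energy for a sequence of gate output values."""
    trace, prev = [], initial
    for value in outputs:
        trace.append(supply_energy(prev, value))
        prev = value
    return trace


print(energy_trace([1, 1, 0, 1]))
print(energy_trace([0, 0, 0, 0]))
```

The two output sequences produce different energy traces, so an observer of the supply can distinguish them. A constant-power logic style aims to make these traces identical.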

6.1 Current Mode Logic

Current mode logic (CML), e.g., current steering logic, seems the ideal solution. This type of logic continuously draws a current from the supply and measures its state through the path that the current takes. A gate has constant power consumption if it draws a perfectly constant current from the power supply, independently of the input and output signals. To build a current source capable of generating a constant current, special circuit techniques that minimize channel length modulation have to be used.

The decisive drawback of CML, however, is its static power consumption. Even when the logic gate is not processing any data, it burns current, which makes this logic style unacceptable for embedded battery-operated devices.

6.2 Voltage Mode Logic (CMOS circuit styles)

Voltage mode logic (VML), e.g., static CMOS logic, only draws a current from the supply to change state, and measures its state by the amount of charge it stores on a capacitance. A regular standard CMOS circuit consumes power only when a capacitance gets charged and later discharged, i.e., when a gate switches state. This is illustrated in the figure below for a simple inverter. Its low power consumption and high noise margins are the main reasons why static CMOS is the logic style of choice for battery-operated and low-power devices.

Figure: Standard CMOS inverter

Yet two conditions must be satisfied for VML to have constant power consumption:

1) A logic gate must have exactly one switching event per signal transition.
2) The logic gate must charge a constant capacitance in that switching event.

For the standard CMOS inverter shown above, the four possible output transitions can be distinguished by monitoring the power supply.

6.3 Dynamic Differential Logic

Dynamic differential logic, sometimes also referred to as dual-rail with pre-charge logic, fulfills the first condition. A differential logic family uses the true and the false representation of the input and output signals, and a dynamic logic family alternates pre-charge and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1, exactly one of the two output nodes evaluates to 0 in the evaluation phase to produce a differential output signal. The discharged output node is charged to 1 again in the following pre-charge phase. In other words, every signal transition, including the events in which the input signals remain constant, is represented by an actual switching event in which the logic gate charges a capacitance. Logic families that have been introduced to thwart differential power analysis (DPA) using dynamic differential logic include:

1. Sense Amplifier Based Logic (SABL)
2. Wave Dynamic Differential Logic (WDDL) gates

6.3.1 Sense Amplifier Based Logic (SABL)

The main advantage of SABL is that it has balanced input and output nodes and that all internal nodes connect to an output. The output capacitances can be balanced. Systematic methods have been developed to make sure that both branches of the differential pull-down network are balanced and that no memory effects are present in the network. Sense Amplifier Based Logic is illustrated in the following figure.

Figure: Sense Amplifier Based Logic AND/NAND gate

This circuit style does, however, require full custom characterization and layout. It also suffers from a high clock load, common to all dynamic logic gates.

6.3.2 Wave Dynamic Differential Logic Gates (WDDL)

WDDL can be implemented with static CMOS logic. Static CMOS standard cells are combined to form secure compound standard cells, which have a reduced power signature. WDDL has many advantages. It can be readily implemented from an existing standard cell library. The design flow is fully supported with accurate EDA library files that come directly from the vendor. WDDL also results in a dynamic differential logic with only a small load capacitance on the pre-charge control signal and with the low power consumption and high noise margins of static CMOS.

Advantages of the WDDL logic style are as follows:

A major advantage of the proposed logic style is that it can be incorporated into the common Electronic Design Automation (EDA) tool flow.

No special design rules are involved in the interconnection of WDDL gates.

The switching factor of WDDL is 100%.

A WDDL gate consists of a parallel combination of two positive complementary gates, one calculating the true output using the true inputs, the other the false output using the false inputs. A positive gate produces a zero output for an all-zero input. The AND gate and the OR gate are examples of positive gates. A complementary gate, sometimes also referred to as a dual gate, expresses the false output of the original logic gate using the false inputs of the original gate. The AND gate fed with true input signals and the OR gate fed with false input signals are two dual gates. The figure shows the WDDL AND gate and the WDDL OR gate.

In the evaluation phase, each input signal is differential and the WDDL gate calculates its differential output. In the pre-charge phase, the inputs to the WDDL gate are set to 0. This puts the output of the gate at 0. A module in WDDL pre-charges without distributing the pre-charge signal to each individual gate. During the pre-charge phase, the input vector of the combinatorial logic is set to all 0s. Each individual gate will eventually have all its inputs at 0, evaluate its output to 0, and pass this 0 value to the next gate. One could say that the pre-charge signal travels over the combinatorial logic as a 0-wave, hence the name WDDL.

There are several ways to launch the pre-charge wave. In the figure, a pre-charge operator is inserted at the start of every combinatorial logic tree, i.e., at the inputs of the encryption module and the outputs of the registers. These operators produce an all-zero output in the pre-charge phase (clk signal high) but let the differential signal through during the evaluation phase (clk signal low).
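The behaviour described above can be checked with a small behavioural model in Python. This is a sketch under assumptions, not the project's VHDL: each differential signal is modelled as a (true, false) pair of bits, the function names are hypothetical, and gate delays and capacitances are ignored.

```python
from itertools import product


def wddl_and(a, b):
    """WDDL AND: AND on the true rails, dual OR on the false rails."""
    (a_t, a_f), (b_t, b_f) = a, b
    return (a_t & b_t, a_f | b_f)


def wddl_or(a, b):
    """WDDL OR: OR on the true rails, dual AND on the false rails."""
    (a_t, a_f), (b_t, b_f) = a, b
    return (a_t | b_t, a_f & b_f)


def precharge(bit, clk):
    """Pre-charge operator: (0, 0) while clk is high, (bit, not bit) otherwise."""
    if clk:
        return (0, 0)
    return (bit, 1 - bit)


def rising_edges(gate, a_bit, b_bit):
    """Count 0->1 output transitions over one pre-charge/evaluation cycle."""
    pre = gate(precharge(a_bit, 1), precharge(b_bit, 1))
    ev = gate(precharge(a_bit, 0), precharge(b_bit, 0))
    return sum(1 for p, e in zip(pre, ev) if p == 0 and e == 1)


for gate in (wddl_and, wddl_or):
    for a_bit, b_bit in product((0, 1), repeat=2):
        print(gate.__name__, a_bit, b_bit, rising_edges(gate, a_bit, b_bit))
```

For every input combination exactly one of the two output rails rises during evaluation, which is the 100% switching factor mentioned in the list of advantages. Because both sub-gates are positive gates, an all-zero input also yields an all-zero output, so the pre-charge value propagates without a separate control signal per gate.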

Figure: WDDL pre-charge wave generation

CHAPTER 7 WDDL GATES

The methodology used in the project is a bottom-up approach. Lower-level modules are designed first and later integrated to form larger modules, whose further integration leads to the final top module. Since logic gates form the lowest-level modules, the logic gates required for the design are implemented in WDDL style first. WDDL demands a parallel combination of two positive complementary gates, one calculating the true value and the other the false value. The OR, AND and XOR logic gates have been implemented. In addition, a full adder, a 32-bit XOR and other modules have been implemented.

7.1 WDDL OR gate

A WDDL OR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e., an AND gate, as shown in the following figure.

Figure 4.1 WDDL OR Gate

The NAND and NOT gates act as the pre-charge operator, injecting the signal '0' into the gates when the 'clk' signal is high, i.e., during the pre-charge phase.

7.2 WDDL AND gate

A WDDL AND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e., an OR gate, as shown in the following figure.

Figure 4.2 WDDL AND Gate

7.3 WDDL NAND gate

A WDDL NAND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e., an OR gate, as shown in the following figure.

Figure 4.3 WDDL NAND Gate

7.4 WDDL NOR gate

A WDDL NOR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e., an AND gate, as shown in the following figure.

Figure 4.4 WDDL NOR Gate

7.5 WDDL XOR gate

The XOR function can be implemented by the Boolean equation A ⊕ B = AB' + A'B. The XOR gate is therefore implemented directly in terms of AND and OR gates, as shown in Figure 4.5. It can also be implemented by instantiating WDDL AND and WDDL OR gates, but that approach requires more gates than the direct one. Therefore the first method of implementation is followed rather than the second.
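The XOR equation can be modelled on differential (true, false) rail pairs in Python. The sketch assumes that the complemented inputs A' and B' are simply the false rails, and that the false output is produced by the dual network (A + B')(A' + B). Both are modelling assumptions about how the gate in the figure is wired, and the function names are hypothetical.

```python
def encode(bit):
    """Differential encoding used in the evaluation phase: (true, false)."""
    return (bit, 1 - bit)


def wddl_xor(a, b):
    """XOR on differential rails using only AND and OR operations."""
    (a_t, a_f), (b_t, b_f) = a, b
    true_out = (a_t & b_f) | (a_f & b_t)    # A.B' + A'.B
    false_out = (a_t | b_f) & (a_f | b_t)   # dual network: (A + B').(A' + B)
    return (true_out, false_out)


for a_bit in (0, 1):
    for b_bit in (0, 1):
        print(a_bit, b_bit, wddl_xor(encode(a_bit), encode(b_bit)))

# During pre-charge all rails are 0, and so are both outputs.
print(wddl_xor((0, 0), (0, 0)))
```

Because only AND and OR operations are used, the gate stays positive: an all-zero input in the pre-charge phase gives an all-zero output, so the 0-wave passes through it like through the AND and OR gates.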

Figure 4.5 WDDL XOR gate

With the help of the above basic gates, a full adder circuit has been designed by instantiating the WDDL gates. The implementation of the Blowfish algorithm requires a 32-bit XOR gate and a 32-bit adder circuit. They can be easily implemented by instantiating the corresponding lower-level module 32 times.

CHAPTER 8 FRONT END RESULTS

WDDL OR GATE

Synthesis Report

==========================================================
Final Report
==========================================================
Final Results
RTL Top Level Output File Name : wddlor.ngr
Top Level Output File Name : wddlor
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 5

Cell Usage
BELS : 2
LUT3 : 2
IO Buffers : 5
IBUF : 3
OBUF : 2
==========================================================
Device utilization summary
---------------------------
Selected Device : 3s250eft256-4
Number of Slices : 1 out of 2448 0%
Number of 4 input LUTs : 2 out of 4896 0%
Number of IOs : 5
Number of bonded IOBs : 5 out of 172 2%

Timing Summary
---------------
Speed Grade : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result

WDDL AND GATE

Synthesis Report

==========================================================
Final Report
==========================================================
Final Results
RTL Top Level Output File Name : wddlgates.ngr
Top Level Output File Name : wddlgates
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 5

Cell Usage
BELS : 2
LUT3 : 2
IO Buffers : 5
IBUF : 3
OBUF : 2
==========================================================
Device utilization summary
---------------------------
Selected Device : 3s250etq144-4
Number of Slices : 1 out of 2448 0%
Number of 4 input LUTs : 2 out of 4896 0%
Number of IOs : 5
Number of bonded IOBs : 5 out of 108 4%

Timing Summary
---------------
Speed Grade : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result

WDDL NAND GATE

Synthesis Report

==========================================================
Final Report
==========================================================
Final Results
RTL Top Level Output File Name : wddlnand1.ngr
Top Level Output File Name : wddlnand1
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 5

Cell Usage
BELS : 2
LUT3 : 2
IO Buffers : 5
IBUF : 3
OBUF : 2
==========================================================
Device utilization summary
---------------------------
Selected Device : 3s500efg320-4
Number of Slices : 1 out of 4656 0%
Number of 4 input LUTs : 2 out of 9312 0%
Number of IOs : 5
Number of bonded IOBs : 5 out of 232 2%

Timing Summary
---------------
Speed Grade : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result

WDDL XOR GATE

Simulation Result

Synthesis Result

WDDL XOR GATE

Synthesis Report

==========================================================
Final Report
==========================================================
Final Results
RTL Top Level Output File Name : wddlxorgate.ngr
Top Level Output File Name : wddlxorgate
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 5

Cell Usage
BELS : 2
LUT3 : 2
IO Buffers : 5
IBUF : 3
OBUF : 2
==========================================================
Device utilization summary
---------------------------
Selected Device : 3s250eft256-4
Number of Slices : 1 out of 2448 0%
Number of 4 input LUTs : 2 out of 4896 0%
Number of IOs : 5
Number of bonded IOBs : 5 out of 172 2%

Timing Summary
---------------
Speed Grade : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result

CHAPTER 9 SUMMARY AND CONCLUSION

9.1 Summary

In order to secure ICs against side-channel attacks, especially Differential Power Analysis (DPA), it is necessary to implement the design in a logic style that renders constant power dissipation irrespective of the input combination. WDDL has proved advantageous over the other logic styles and is therefore of great significance. In this dissertation work, an architecture for the Blowfish algorithm is designed and implemented in WDDL style. A bottom-up approach is used in this implementation: the low-level entities are designed first and later combined to form the entire module. The key scheduling is online. The sub-keys generated for a particular key can be used for the encryption of all the data to be encrypted with that key. The sub-keys are applied in reverse order for the decryption data path. Initially, logic gates are implemented in WDDL, and then higher-level modules are designed by instantiating the WDDL gates to form the entire module, resulting in constant power dissipation irrespective of the input data combination. The entire design works in two phases, namely the pre-charge phase and the evaluation phase. In the pre-charge phase all the signals of the design are zeroed, and during the evaluation phase the functionality of the design is achieved. This sort of design has been found simple and very effective in thwarting the side-channel attack known as Differential Power Analysis (DPA).

9.2 Conclusion

The crypto processor has been designed for a key size of 448 bits and a plaintext block of 64 bits. The code for the implementation has been written in VHDL. Functional verification has been done using the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The back end of the design is done using SOC Encounter. The desired functionality has been achieved according to the specifications. During the evaluation phase the output shows the same number of transitions in every cycle, resulting in constant power dissipation. During synthesis it was observed that even a simple WDDL gate comprises many conventional gates. The area of the design has therefore grown nearly three-fold compared to the same design implemented in conventional CMOS logic; this is the cost of the security incorporated into the IC against Differential Power Analysis (DPA). Because the power dissipation at the output is constant, an attacker cannot apply DPA to find the secret key used in the crypto processor. Thus security against DPA is incorporated into the IC at the hardware level by implementing the design in WDDL style,

which is quite simple and effective.


1 Probing attacks

2 Fault induction attacks

3 Timing attacks

4 Power analysis attacks and

5 Electromagnetic timing attacks

These attacks are performed during the switching behavior of digital

complementary metalndashoxidendashsemiconductor (CMOS) gates Of all these Power analysis attack

is of major concern

52 Power analysis attacks

The power consumption of a cryptographic device may provide much information

about the operations that take place and the involved parameters This is the idea of simple and

differential power analysis first introduced by Kocher et al As the clock ticks the cards

energy is also provided by the terminal and can therefore easily be measured Basically to

measure a circuits power consumption a small (eg 50 ohm) resistor is inserted in series with

the power or ground input The voltage difference across the resistor divided by the resistance

yields the current Well-equipped electronics labs have equipment that can digitally sample

voltage differences at extraordinarily high rates (over 1GHz) with excellent accuracy (less than

1 error) Devices capable of sampling at 20MHz or faster and transferring the data to a PC

can be bought for less than US$ 400

Power analysis attacks are of two types

1 Simple power analysis attack and

2 Differential Power Analysis attack

SPA attacks on smartcards typically take a few seconds per card while DPA attacks

can take several hours In a general with a somewhat academic perspective we may consider

the entire internal state of the block cipher to be all the intermediate results and values that are

never included in the output in normal operations For example DES has 16 rounds we can

consider the intermediate states state [115] after each round except the last as a secret internal

state Side channels typically give information about these internal states or about the

operations used in the transition of this internal state from one round to another The type of

side-channel will of course determine what information is available to the attacker about these

states The attacks typically work by finding some information about the internal state of the

cipher which can be learned both by guessing part of the key and checking the value directly

23

and additionally by some statistical property of the cipher that makes that checkable value

slightly nonrandom

521 Simple Power Analysis attack (SPA)

Simple Power Analysis is generally based on looking at the visual representation of the

power consumption of a unit while an encryption operation is being performed Simple Power

Analysis is a technique that involves direct interpretation of power consumption measurements

collected during cryptographic operations SPA can yield information about a devices

operation as well as key material

A trace refers to a set of power consumption measurements taken across a cryptographic operation. For example, a 1 millisecond operation sampled at 5 MHz yields a trace containing 5000 points. The figure, for example, shows an SPA trace from a smart card performing a DES operation.

Figure: SPA monitoring from a single DES operation performed by a typical smart card. The upper trace shows the entire encryption operation, including the initial permutation, the 16 DES rounds, and the final permutation. The lower trace is a detailed view of the second and third rounds.

Because SPA can reveal the sequence of instructions executed, it can be used to break cryptographic implementations in which the execution path depends on the data being processed. For example:

DES key schedule: The DES key schedule computation involves rotating 28-bit key registers. A conditional branch is commonly used to check the bit shifted off the end so that "1" bits can be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will contain different SPA features if the execution paths take different branches for each.

DES permutations: DES implementations perform a variety of bit permutations. Conditional branching in software or microcode can cause significant power consumption differences for "0" and "1" bits.

Comparisons: String or memory comparison operations typically perform a conditional branch when a mismatch is found. This conditional branching causes large SPA (and sometimes timing) characteristics.

Multipliers: Modular multiplication circuits tend to leak a great deal of information about the data they process. The leakage functions depend on the multiplier design, but are often strongly correlated to operand values and Hamming weights.

Exponentiators: A simple modular exponentiation function scans across the exponent, performing a squaring operation in every iteration with an additional multiplication operation for each exponent bit that is equal to "1". The exponent can be compromised if squaring and multiplication operations have different power consumption characteristics, take different amounts of time, or are separated by different code. Modular exponentiation functions that operate on two or more exponent bits at a time may have more complex leakage functions.
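The exponentiator case can be illustrated with a small Python sketch. It is only a model, not code from the project: the operation log stands in for what an SPA trace might reveal if squarings and multiplications are visually distinguishable, and the function names, base, modulus, and secret exponent are all hypothetical.

```python
def square_and_multiply(base, exponent, modulus):
    """Left-to-right binary exponentiation that also logs each operation.

    The log is a stand-in for an SPA trace in which squarings ("S") and
    multiplications ("M") have visibly different power signatures.
    """
    result = 1
    operations = []
    for bit in bin(exponent)[2:]:            # scan exponent bits, MSB first
        result = (result * result) % modulus
        operations.append("S")
        if bit == "1":                        # key-dependent branch
            result = (result * base) % modulus
            operations.append("M")
    return result, operations


def exponent_from_operations(operations):
    """Rebuild the exponent from the S/M sequence alone."""
    bits = []
    for op in operations:
        if op == "S":
            bits.append("0")                  # assume 0 until an M follows
        else:
            bits[-1] = "1"                    # an M marks the current bit as 1
    return int("".join(bits), 2)


secret_exponent = 0b101101                    # hypothetical secret value
value, ops = square_and_multiply(7, secret_exponent, 1009)
recovered = exponent_from_operations(ops)
print("".join(ops), recovered == secret_exponent)
```

Every "S" followed by an "M" corresponds to a "1" bit and every lone "S" to a "0" bit, which is why an implementation whose two operations look different leaks the whole exponent from a single trace.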

5.2.2 Differential Power Analysis attack (DPA)

In addition to large-scale power variations due to the instruction sequence, there are effects correlated to the data values being manipulated. These variations tend to be smaller and are sometimes overshadowed by measurement errors and other noise. In such cases, it is still often possible to break the system using statistical functions tailored to the target algorithm.

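To implement the DPA attack, an attacker first observes $m$ encryption operations and captures power traces $T_{1 \ldots m}[1 \ldots k]$ containing $k$ samples each. In addition, the attacker records the ciphertexts $C_{1 \ldots m}$. No knowledge of the plaintext is required. DPA analysis uses power consumption measurements to determine whether a key block guess $K_s$ is correct. The attacker computes a $k$-sample differential trace $\Delta_D[1 \ldots k]$ by finding the difference between the average of the traces for which a certain intermediate value $V$ is one and the average of the traces for which $V$ is zero, where $V = D(C_i, b, K_s)$ is the value of target bit $b$ predicted by the selection function $D$ from ciphertext $C_i$ and the guess $K_s$. Thus $\Delta_D[j]$ is the average over $C_{1 \ldots m}$ of the effect due to the value represented by the selection function $D$ on the power consumption at point $j$. In particular,

$$\Delta_D[j] = \frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}{\sum_{i=1}^{m} D(C_i, b, K_s)} - \frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)}$$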

If $K_s$ is incorrect, the bit computed using $D$ will differ from the actual target bit for about half of the ciphertexts $C_i$. The selection function is thus effectively uncorrelated to what was actually computed by the target device. If a random function is used to divide a set into two subsets, the difference in the averages of the subsets should approach zero as the subset sizes approach infinity.

Thus, if $K_s$ is incorrect, $\Delta_D[j]$ approaches zero as $m$ grows, because trace components uncorrelated to $D$ diminish with $1/\sqrt{m}$, causing the differential trace to become flat (the actual trace may not be completely flat, as $D$ with $K_s$ incorrect may have a weak correlation to $D$ with the correct $K_s$). If $K_s$ is correct, however, the computed value for $D(C_i, b, K_s)$ will equal the actual value of target bit $b$ with probability 1. The selection function is thus correlated to the value of the bit considered. The contributions of other data values, measurement errors, etc. that are not correlated to $D$ approach zero. Because power consumption is correlated to data bit values, the plot of $\Delta_D$ will be flat with spikes in regions where $D$ is correlated to the values being processed. The correct value of $K_s$ can thus be identified from the spikes in its differential trace. Four values of $b$ correspond to each S box, providing confirmation of key block guesses. Finding all eight $K_s$ yields the entire 48-bit round subkey. The remaining 8 key bits can be found easily using exhaustive search or by analyzing one additional round. Triple DES keys can be found by analyzing an outer DES operation first, using the resulting key to decrypt the ciphertexts, and attacking the next DES key. DPA can use known plaintext or known ciphertext and can find encryption or decryption keys.
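The difference-of-means procedure described above can be sketched in Python on a toy target. This is an illustrative simulation under stated assumptions, not an attack on DES: the 8-bit S-box is a random permutation, the simulated device leaks one intermediate bit at a single sample point with Gaussian noise, and all names (`selection`, `capture_traces`, `run_demo`) and parameters are hypothetical.

```python
import random


def selection(c, b, ks, sbox):
    """D(C, b, Ks): predicted bit b of the intermediate value sbox[C xor Ks]."""
    return (sbox[c ^ ks] >> b) & 1


def capture_traces(secret, sbox, m, k, leak_at, rng, noise=0.5):
    """Simulate m traces of k samples; only sample leak_at depends on the data."""
    data, traces = [], []
    for _ in range(m):
        c = rng.randrange(256)                         # known value C_i
        trace = [rng.gauss(0.0, noise) for _ in range(k)]
        trace[leak_at] += selection(c, 0, secret, sbox)  # assumed 1-bit leak
        data.append(c)
        traces.append(trace)
    return data, traces


def differential_trace(data, traces, b, ks, sbox):
    """Mean of traces with D = 1 minus mean of traces with D = 0, per sample."""
    k = len(traces[0])
    sums = [[0.0] * k, [0.0] * k]
    counts = [0, 0]
    for c, trace in zip(data, traces):
        d = selection(c, b, ks, sbox)
        counts[d] += 1
        for j in range(k):
            sums[d][j] += trace[j]
    return [sums[1][j] / counts[1] - sums[0][j] / counts[0] for j in range(k)]


def recover_key(data, traces, b, sbox):
    """Pick the key guess whose differential trace shows the largest spike."""
    def peak(ks):
        return max(abs(v) for v in differential_trace(data, traces, b, ks, sbox))
    return max(range(256), key=peak)


def run_demo(seed=1, m=1500, k=8, leak_at=5):
    rng = random.Random(seed)
    sbox = list(range(256))
    rng.shuffle(sbox)                                  # toy S-box, not DES
    secret = rng.randrange(256)
    data, traces = capture_traces(secret, sbox, m, k, leak_at, rng)
    guess = recover_key(data, traces, 0, sbox)
    return secret, guess, differential_trace(data, traces, 0, guess, sbox)
```

Calling `run_demo()` returns the simulated secret, the recovered guess, and the differential trace for that guess. For a wrong guess the trace stays close to flat, while for the right guess a spike appears at the sample where the bit leaks.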


CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES

The power consumption of traditional standard cells and logic is dependent on the signal activity. When the output of a logic gate makes a 0 to 1 transition, a current comes from the power supply and charges the output capacitance. On the other hand, when the output sees a 1 to 0, a 0 to 0, or a 1 to 1 transition, no energy or only a limited amount of energy (due to short circuit or leakage) is consumed from the power supply. This is the fundamental reason why information is leaked through the power supply and why power attacks are possible. The basis of a secure digital design flow is a logic style with constant power consumption.
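A minimal Python sketch of this asymmetry follows. It uses the first-order model in which a 0 to 1 output transition draws $C \cdot V_{dd}^2$ from the supply and every other transition draws nothing (short-circuit and leakage energy are ignored). The capacitance and supply voltage are assumed values chosen only for illustration.

```python
def supply_energy(previous, current, capacitance, vdd):
    """Energy drawn from the supply for one output transition of a static CMOS gate.

    Only a 0 -> 1 transition charges the output capacitance from the supply
    (C * Vdd^2 in total); short-circuit and leakage energy are ignored here.
    """
    if previous == 0 and current == 1:
        return capacitance * vdd ** 2
    return 0.0


def energy_profile(bits, capacitance=10e-15, vdd=1.8):
    """Per-cycle supply energy for a sequence of output values (assumed C, Vdd)."""
    return [supply_energy(a, b, capacitance, vdd) for a, b in zip(bits, bits[1:])]


profile_a = energy_profile([0, 1, 0, 1, 0, 1])
profile_b = energy_profile([0, 0, 0, 1, 1, 1])
print(profile_a)
print(profile_b)
```

The two output sequences produce different energy profiles, so an observer of the supply current can tell them apart. This is the data dependence that a constant-power logic style has to remove.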

6.1 Current Mode Logic

Current mode logic (CML), e.g. current steering logic, seems the ideal solution. This type of logic continuously draws a current from the supply and measures its state through the path that the current takes. A gate has constant power consumption if it draws a perfectly constant current from the power supply, independently of the input and output signals. To build a current source capable of generating a constant current, special circuit techniques that minimize channel length modulation have to be used.

The decisive drawback of CML, however, is its static power consumption. Even when the logic gate is not processing any data, it burns the current, which makes this logic style unacceptable for embedded, battery-operated devices.

6.2 Voltage Mode Logic (CMOS circuit styles)

Voltage mode logic (VML), e.g. static CMOS logic, only draws a current from the supply to change state and measures its state by the amount of charge it stores on a capacitance. A regular standard CMOS circuit will only consume power when a capacitance gets charged and later discharged, i.e. when a gate switches state. This is the main reason that CMOS is the style of choice for battery-operated and low-power devices. It is illustrated in the figure below for a simple inverter. Thus static CMOS is the preferred logic style because of its low power consumption and high noise margins.

Figure: Standard CMOS inverter

Yet two conditions must be satisfied for VML to have constant power consumption, namely:

1) A logic gate must have exactly one switching event per signal transition.

2) The logic gate must charge a constant capacitance in that switching event.

For the CMOS inverter above, all four transitions (0 to 0, 0 to 1, 1 to 0, and 1 to 1) can be distinguished when monitoring the power supply.

6.3 Dynamic Differential Logic

Dynamic differential logic, sometimes also referred to as dual rail with pre-charge logic, fulfills the first condition. A differential logic family uses the true and the false representation of the input and output signals, and a dynamic logic family alternates pre-charge and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1, exactly one of the two output nodes evaluates to 0 in the evaluation phase to produce a differential output signal. The discharged output node is charged back to 1 in the following pre-charge phase, so that both outputs are again pre-charged to 1. In other words, every signal transition, including the events in which the input signals remain constant, is represented by an actual switching event in which the logic gate charges a capacitance. The logic families that have been introduced to thwart differential power analysis (DPA) using dynamic differential logic include the following:

1. Sense Amplifier Based Logic (SABL)

2. Wave Dynamic Differential Logic (WDDL) gates
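The "exactly one switching event per cycle" property can be checked with a short Python model. It is a behavioural sketch only: it counts charging events rather than modelling any circuit, and the function names and the data sequence are hypothetical.

```python
def static_cmos_charges(bits):
    """Charging events (0 -> 1) per transition for a single-rail static gate."""
    return [int(a == 0 and b == 1) for a, b in zip(bits, bits[1:])]


def dynamic_differential_charges(bits):
    """Charging events per cycle for a dual-rail gate pre-charged to 1.

    In each cycle, evaluation discharges exactly one of the two output nodes
    (the true or the false rail) and the next pre-charge phase recharges it.
    """
    charges = []
    for bit in bits:
        true_rail, false_rail = 1, 1           # pre-charged state
        if bit == 1:
            false_rail = 0                     # true output stays high
        else:
            true_rail = 0                      # false output stays high
        # the following pre-charge recharges whichever node was discharged
        charges.append((1 - true_rail) + (1 - false_rail))
    return charges


data = [0, 0, 1, 1, 0, 1]
print(static_cmos_charges(data))
print(dynamic_differential_charges(data))
```

The single-rail count depends on the data sequence, while the dual-rail count is one per cycle for any data. The second condition, a constant capacitance in each event, is not captured by this count and has to be ensured by the circuit and layout.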

6.3.1 Sense Amplifier Based Logic (SABL)

The main advantage of SABL is that it has balanced input and output nodes and that all internal nodes connect to an output. The output capacitances can be balanced. Systematic methods have been developed to make sure that both branches of the differential pull-down network are balanced and that no memory effects are present in the network. A Sense Amplifier Based Logic gate is illustrated in the figure below.

Figure: Sense Amplifier Based Logic AND/NAND gate

This circuit style does, however, require full-custom characterization and layout. It also suffers from a high clock load, common to all dynamic logic gates.

6.3.2 Wave Dynamic Differential Logic Gates (WDDL)

WDDL can be implemented with static CMOS logic. Static CMOS standard cells are combined to form secure compound standard cells, which have a reduced power signature. WDDL can be readily implemented from an existing standard cell library, and the design flow is fully supported with accurate EDA library files that come directly from the vendor. WDDL also results in a dynamic differential logic with only a small load capacitance on the pre-charge control signal, and with the low power consumption and the high noise margins of static CMOS.

The advantages of the WDDL logic style are as follows:

1. It can be incorporated into the common Electronic Design Automation (EDA) tool flow.

2. No special design rules are involved in the interconnection of WDDL gates.

3. The switching factor of WDDL is 100%.

A WDDL gate consists of a parallel combination of two positive complementary gates, one calculating the true output using the true inputs, the other calculating the false output using the false inputs. A positive gate produces a zero output for an all-zero input. The AND gate and the OR gate are examples of positive gates. A complementary gate, sometimes also referred to as a dual gate, expresses the false output of the original logic gate using the false inputs of the original gate. The AND gate fed with true input signals and the OR gate fed with false input signals are two dual gates. The figure shows the WDDL AND gate and the WDDL OR gate. In the evaluation phase, each input signal is differential and the WDDL gate calculates its differential output. In the pre-charge phase, the inputs to the WDDL gate are set at 0. This puts the output of the gate at 0.

A module in WDDL pre-charges without distributing the pre-charge signal to each individual gate. During the pre-charge phase, the input vector of the combinatorial logic is set at all 0s. Each individual gate will eventually have all its inputs at 0, evaluate its output to 0, and pass this 0 value to the next gate. One could say that the pre-charge signal travels over the combinatorial logic as a 0-wave, hence the name WDDL. There are several ways to launch the pre-charge wave. In the figure, a pre-charge operator is inserted at the start of every combinatorial logic tree, i.e. at the inputs of the encryption module and the outputs of the registers. These operators produce an all-zero output in the pre-charge phase (clk signal high), but let the differential signal through during the evaluation phase (clk signal low).
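The behaviour described above can be modelled in a few lines of Python. This is a behavioural sketch, not the project's VHDL: signals are represented as (true, false) pairs, and the helper names and the example logic tree `(x AND y) OR z` are hypothetical.

```python
def encode(bit):
    """Differential encoding of a logic value as a (true, false) pair."""
    return (bit, 1 - bit)


def precharge(signal, clk):
    """Pre-charge operator: all-zero output while clk is high, transparent otherwise."""
    true_rail, false_rail = signal
    if clk == 1:
        return (0, 0)
    return (true_rail, false_rail)


def wddl_and(a, b):
    """WDDL AND: AND on the true rails, its dual (OR) on the false rails."""
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a, b):
    """WDDL OR: OR on the true rails, its dual (AND) on the false rails."""
    return (a[0] | b[0], a[1] & b[1])


def run_cycle(x, y, z):
    """One clock cycle of the example tree (x AND y) OR z.

    Returns the output pair in the pre-charge phase and in the evaluation phase.
    """
    outputs = []
    for clk in (1, 0):                         # pre-charge, then evaluation
        a, b, c = (precharge(encode(v), clk) for v in (x, y, z))
        outputs.append(wddl_or(wddl_and(a, b), c))
    return outputs


print(run_cycle(1, 1, 0))
```

Only the inputs of the tree see the clock. The all-zero input vector propagates through the positive gates on its own, which is the 0-wave, and in the evaluation phase exactly one rail of the output rises whatever the data.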

Figure: WDDL pre-charge wave generation

CHAPTER 7 WDDL GATES

The methodology used in the project is a bottom-up approach. Lower-level modules are designed first and later integrated to form larger modules, whose further integration leads to the final top module. Since logic gates form the lowest-level modules, the logic gates required for the design are implemented first in WDDL style. WDDL demands a parallel combination of two positive complementary gates, one calculating the true value and the other the false value. Logic gates such as OR, AND, and XOR have been implemented, as well as a Full Adder, a 32-bit XOR, etc.

7.1 WDDL OR gate

A WDDL OR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.

Figure 4.1 WDDL OR Gate

The NAND and NOT gates used act as the Precharge operator, injecting the signal '0' into the gates when the 'clk' signal is high, i.e. during the Precharge phase.

7.2 WDDL AND gate

A WDDL AND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.

Figure 4.2 WDDL AND Gate

7.3 WDDL NAND gate

A WDDL NAND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.

Figure 4.3 WDDL NAND Gate

7.4 WDDL NOR gate

A WDDL NOR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.
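The NAND and NOR gates are described with the same AND/OR pairs as the AND and OR gates. One way to obtain the inverted function from such a pair, assumed here for illustration and not stated in the text, is to exchange the true and false output rails, so that no inverting (non-positive) gate is needed. The Python names below are hypothetical.

```python
def wddl_and(a, b):
    """AND on the true rails, OR on the false rails; signals are (true, false)."""
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a, b):
    """OR on the true rails, AND on the false rails."""
    return (a[0] | b[0], a[1] & b[1])


def swap_rails(signal):
    """Differential inversion: exchange the true and false rails."""
    return (signal[1], signal[0])


def wddl_nand(a, b):
    """Same AND/OR pair as the WDDL AND gate, with the outputs exchanged."""
    return swap_rails(wddl_and(a, b))


def wddl_nor(a, b):
    """Same OR/AND pair as the WDDL OR gate, with the outputs exchanged."""
    return swap_rails(wddl_or(a, b))


one, zero = (1, 0), (0, 1)
print(wddl_nand(one, one), wddl_nor(zero, zero))
```

Because the swap only relabels wires, an all-zero pre-charge input still yields an all-zero output, so the 0-wave passes through these gates as well.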

Figure 4.4 WDDL NOR Gate

7.5 WDDL XOR gate

The XOR function can be implemented by the Boolean equation $A \oplus B = AB' + A'B$. Therefore, the XOR gate is implemented in terms of AND gates and OR gates, as shown in Figure 4.5. It can also be implemented by instantiating WDDL AND gates and a WDDL OR gate, but the number of gates involved in the latter approach is greater than in the former. Therefore, the first method of implementation is followed rather than the second.
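Both options can be written out as a behavioural Python sketch. The rail equations below are one possible reading of "in terms of AND and OR gates": the true rail computes $AB' + A'B$ from the available rails and the false rail computes its dual. The function names are hypothetical, and the sketch says nothing about gate counts in the actual netlist.

```python
def wddl_and(a, b):
    """AND on the true rails, OR on the false rails; signals are (true, false)."""
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a, b):
    """OR on the true rails, AND on the false rails."""
    return (a[0] | b[0], a[1] & b[1])


def swap_rails(signal):
    """Differential inversion by exchanging the two rails (an assumption here)."""
    return (signal[1], signal[0])


def wddl_xor_direct(a, b):
    """XOR written directly with AND/OR on the rails."""
    a_t, a_f = a
    b_t, b_f = b
    true_rail = (a_t & b_f) | (a_f & b_t)      # A.B' + A'.B
    false_rail = (a_t | b_f) & (a_f | b_t)     # dual expression, equals XNOR
    return (true_rail, false_rail)


def wddl_xor_composed(a, b):
    """XOR built by instantiating WDDL AND and WDDL OR gates."""
    return wddl_or(wddl_and(a, swap_rails(b)), wddl_and(swap_rails(a), b))


print(wddl_xor_direct((1, 0), (0, 1)), wddl_xor_composed((1, 0), (0, 1)))
```

Both forms use only positive gates, so an all-zero pre-charge input gives an all-zero output, and in evaluation exactly one output rail is 1.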

Figure 4.5 WDDL XOR gate

With the help of the above basic gates, a Full Adder circuit has been designed by instantiating the WDDL gates designed above. During the implementation of the Blowfish algorithm, a 32-bit XOR gate and a 32-bit Adder circuit are required. They can be easily implemented by instantiating the corresponding lower-level module 32 times.

CHAPTER 8 FRONT END RESULTS

WDDL OR GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name : wddlor.ngr
Top Level Output File Name : wddlor
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 5

Cell Usage
BELS : 2
  LUT3 : 2
IO Buffers : 5
  IBUF : 3
  OBUF : 2

Device utilization summary
Selected Device : 3s250eft256-4
Number of Slices : 1 out of 2448 0%
Number of 4 input LUTs : 2 out of 4896 0%
Number of IOs : 5
Number of bonded IOBs : 5 out of 172 2%

Timing Summary
Speed Grade : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result

WDDL AND GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name : wddlgates.ngr
Top Level Output File Name : wddlgates
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 5

Cell Usage
BELS : 2
  LUT3 : 2
IO Buffers : 5
  IBUF : 3
  OBUF : 2

Device utilization summary
Selected Device : 3s250etq144-4
Number of Slices : 1 out of 2448 0%
Number of 4 input LUTs : 2 out of 4896 0%
Number of IOs : 5
Number of bonded IOBs : 5 out of 108 4%

Timing Summary
Speed Grade : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result

WDDL NAND GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name : wddlnand1.ngr
Top Level Output File Name : wddlnand1
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 5

Cell Usage
BELS : 2
  LUT3 : 2
IO Buffers : 5
  IBUF : 3
  OBUF : 2

Device utilization summary
Selected Device : 3s500efg320-4
Number of Slices : 1 out of 4656 0%
Number of 4 input LUTs : 2 out of 9312 0%
Number of IOs : 5
Number of bonded IOBs : 5 out of 232 2%

Timing Summary
Speed Grade : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result

WDDL XOR GATE

Simulation Result

Synthesis Result

WDDL XOR GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name : wddlxorgate.ngr
Top Level Output File Name : wddlxorgate
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 5

Cell Usage
BELS : 2
  LUT3 : 2
IO Buffers : 5
  IBUF : 3
  OBUF : 2

Device utilization summary
Selected Device : 3s250eft256-4
Number of Slices : 1 out of 2448 0%
Number of 4 input LUTs : 2 out of 4896 0%
Number of IOs : 5
Number of bonded IOBs : 5 out of 172 2%

Timing Summary
Speed Grade : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result

CHAPTER 9 SUMMARY AND CONCLUSION

9.1 Summary

In order to secure ICs against side-channel attacks, especially Differential Power Analysis (DPA), it is necessary to implement the design in a logic style that renders constant power dissipation irrespective of the input combination. WDDL has proved advantageous over the alternatives and is therefore of great significance. In this dissertation work, an architecture for the Blowfish algorithm is designed and implemented in WDDL style. A bottom-up approach is used in this implementation: the low-level entities are designed first and are later combined to form the entire module. The key scheduling is online. The sub-keys generated for a particular key can be used for the encryption of all the data to be encrypted with that key. The sub-keys are applied in reverse order for the decryption data path. Initially, logic gates are implemented in WDDL, and then higher-level modules are designed by instantiating the WDDL gates to form the entire module, thus resulting in constant power dissipation irrespective of the input data combination. The entire design works in two phases, namely the Precharge phase and the Evaluation phase. In the Precharge phase all the signals of the design are zeroed, and during the Evaluation phase the functionality of the design is achieved. This design has been found simple and very effective in thwarting the side-channel attack known as Differential Power Analysis (DPA).

9.2 Conclusion

The crypto processor has been designed for a key size of 448 bits and a plaintext of 64 bits. The code for the implementation has been written in VHDL. The functional verification has been done using the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The backend of the design is done using SOC Encounter. The desired functionality has been achieved according to the specifications. In the output during the Evaluation phase, the number of transitions has been the same in every cycle, thus resulting in constant power dissipation. During synthesis it has been observed that a simple WDDL gate comprises many conventional gates. Therefore, the area of the design has grown nearly three-fold compared to the design implemented in conventional CMOS logic; this is the cost of the security incorporated into the IC against Differential Power Analysis (DPA). Due to the constant power dissipation at the output, an attacker cannot apply the DPA scheme to find the secret key used in the crypto-processor. Thus security against DPA is incorporated into the IC at the hardware level by implementing the design in WDDL style, which is quite simple and effective.

Page 24: Implementation of Wddl Gates for Secure Ic Applications Sss 2004 (1)

and additionally by some statistical property of the cipher that makes that checkable value

slightly nonrandom

521 Simple Power Analysis attack (SPA)

Simple Power Analysis is generally based on looking at the visual representation of the

power consumption of a unit while an encryption operation is being performed Simple Power

Analysis is a technique that involves direct interpretation of power consumption measurements

collected during cryptographic operations SPA can yield information about a devices

operation as well as key material

A trace refers to a set of power consumption measurements taken across a

cryptographic operation For example a 1 millisecond operation sampled at 5 MHz yields a

trace containing 5000 points Figure for example shows an SPA trace from a smart card

performing a DES operation

Figure SPA monitoring from a single DES operation performed by a typical smart card The

upper trace shows the entire encryption operation including the initial permutation the 16

DES rounds and the final permutation The lower trace is a detailed view of the second and

third rounds

Because SPA can reveal the sequence of instructions executed it can be used to break

cryptographic implementations in which the execution path depends on the data being

processed For example

DES key schedule the DES key schedule computation involves rotating 28-bit key registers

A conditional branch is commonly used to check the bit shifted off the end so that ldquo1 bits can

24

be wrapped around The resulting power consumption traces for a ldquo1 bit and a ldquo0 bit will

contain different SPA features if the execution paths take different branches for each

DES permutations DES implementations perform a variety of bit permutations Conditional

branching in software or microcode can cause significant power consumption differences for

ldquo0 and ldquo1 bits

Comparisons String or memory comparison operations typically perform a conditional

branch when a mismatch is found This conditional branching causes large SPA (and

sometimes timing) characteristics

Multipliers Modular multiplication circuits tend to leak a great deal of information about the

data they process The leakage functions depend on the multiplier design but are often strongly

correlated to operand values and Hamming weights

Exponentiators A simple modular exponentiation function scans across the exponent

performing a squaring operation in every iteration with an additional multiplication operation

for each exponent bit that is equal to ldquo1 The exponent can be compromised if squaring and

multiplication operations have different power consumption characteristics take different

amounts of time or are separated by different code Modular exponentiation functions that

operate on two or more exponent bits at a time may have more complex leakage functions

522Differential Power Analysis attack (DPA)

In addition to large-scale power variations due to the instruction sequence there are

effects correlated to data values being manipulated These variations tend to be smaller and are

sometimes overshadowed by measurement errors and other noise In such cases it is still often

possible to break the system using statistical functions tailored to the target algorithm

To implement the DPA attack an attacker first observes m encryption operations and captures

power traces T1 m [1 k] containing k samples each In addition the attacker records the

cipher text C1 m No knowledge of the plain text is required DPA analysis uses power

consumption measurements to determine whether a key block guess Ks is correct The attacker

computes a k-sample differential trace centD [1 k] by finding the difference between the

average of the traces for which a certain intermediate value V is one and the average of the

traces for which V is zero Thus cent D[j) is the average over C1m of the effect due to the value

represented by the selection function D on the power consumption at point j In particular25

If Ks is incorrect the bit computed using D will differ from the actual target bit for about half

of the ciphertext Ci The selection function is thus effectively uncorrelated to what was actually

computed by the target device If a random function is used to divide a set into two subsets the

difference in the averages of the subsets should approach zero as the subset sizes approach

infinity

Thus because trace components uncorrelated to D will diminish with 1 pm causing the

differential trace to become at (the actual trace may not be completely at as D with Ks

incorrect may have a weak correlation to D with the correct Ks) If Ks is correct however the

computed value for D (Ci bKs) will equal the actual value of target bit b with probability 1

The selection function is thus correlated to the value of the bit considered Other data values

measurement errors etc that are not correlated to D approach zero Because power

consumption is correlated to data bit values the plot of centD will be degat with spikes in regions

where D is correlated to the values being processed The correct value of Ks can thus be

identified from the spikes in its differential trace Four values of b correspond to each S box

providing confirmation of key block guesses Finding all eight Ks yields the entire 48-bit round

sub key The remaining 8 key bits can be found easily using exhaustive search or by analyzing

one additional round Triple DES keys can be found by analyzing an outer DES operation first

using the resulting key to decrypt the cipher text and attacking the next DES key DPA can use

known plaintext or known cipher text and can find encryption or decryption keys

26

CHAPTER 6 CONSTANT POWER CONSUMING

LOGIC STYLES

The power consumption of traditional standard cells and logic is

dependent on the signal activity When the output of the logic gate makes

a 0 to 1 transition a current comes from the power supply and charges the

output capacitance On the other hand when the output sees a 1 to 0 a 0

to 0 or a 1 to 1 transition no or only a limited amount of energy (due to

short circuit or leakage) is consumed from the power supply This is the

fundamental reason why information is leaked through the power supply

and why power attacks are possible The basis of a secure digital design

flow is a logic style with constant power consumption

61 Current Mode Logic

Current mode logic (CML) eg current steering logic seems the

ideal solution This type of logic continuously draws a current from the

supply and measures its state through the path that the current takes A

gate has constant power consumption if it draws a perfectly constant

current from the power supply independently of the input and output

signals To build a current source capable of generating a constant current

special circuit techniques that minimize channel length modulation have to

be used

The decisive drawback of CML however is its static power

consumption When the logic gate is not processing any data it burns the

27

current which makes this logic style unacceptable for embedded battery-

operated devices

62 Voltage Mode Logic (CMOS circuit styles)

Voltage mode logic (VML) eg static CMOS logic only draws a current from the

supply to change state and measures its state by the amount of charge it stores on a

capacitance A regular standard CMOS circuit will only consume power when a capacitance

gets charged and later discharged ie when a gate switches state It is the main reason that

CMOS is the style of choice for every battery operated or low power device This is illustrated

in the figure below for simple inverter Thus static CMOS is the preferred logic style because

of its low power consumption and high noise margins

Standard CMOS inverter

Yet two conditions must be satisfied for VML to have constant power consumption

namely

1) A logic gate must have exactly one switching event per signal transition

2) The logic gate must charge a constant capacitance in that switching event

28

Here above all the four transitions of CMOS inverter can be distinguished when

monitoring the power supply

63 Dynamic Differential Logic

Dynamic differential logic sometimes also referred to as dual rail with pre-charge

logic fulfills the first condition A differential logic family uses the true and the false

representation of the input and output signals and a dynamic logic family alternates pre-charge

and evaluation phases As a result since both outputs (true and false) are pre-charged to 1

exactly one of the two output nodes evaluates to 0 to have a differential output signal in the

evaluation phase The discharged output node is charged to 1 in the following pre-charge phase

to pre-charge both outputs to 1 In other words every signal transition including the events in

which the input signals remain constant is represented with an actual switching event in

which the logic gate charges a capacitance All the logic families that have been introduced to

thwart the differential power analysis (DPA) by using dynamic differential logic in the

following techniques

1 Sense Amplifier Based Logic (SABL) and

2 Wave Dynamic Differential Logic (WDDL) gates

631 Sense Amplifier Based logic (SABL)

SABL has its main advantage that it has balanced input and output nodes and that all

internal nodes connect to an output The output capacitances can be balanced Systematic

methods have been developed to make sure that both branches of the differential pull down

network are balanced and that no memory effects are present in the network Sense Amplifier

Based logic is illustrated as

29

Sense Amplifier Based Logic

ANDNAND gate

This circuit style does require however a full custom characterization and layout It also

suffers from a high clock load common to all dynamic logic gates

6.3.2 Wave Dynamic Differential Logic (WDDL) Gates

WDDL can be implemented with static CMOS logic. Static CMOS standard cells are combined to form secure compound standard cells, which have a reduced power signature. WDDL has several advantages. It can be readily implemented from an existing standard cell library. The design flow is fully supported with accurate EDA library files that come directly from the vendor. WDDL also results in a dynamic differential logic with only a small load capacitance on the pre-charge control signal, and with the low power consumption and the high noise margins of static CMOS.

The advantages of the WDDL logic style are as follows:

1. A major advantage of the proposed logic style is that it can be incorporated into the common Electronic Design Automation (EDA) tool flow.

2. No special design rules are involved in the interconnection of WDDL gates.

3. The switching factor of WDDL is 100%.

A WDDL gate consists of a parallel combination of two positive complementary gates, one calculating the true output using the true inputs, the other the false output using the false inputs. A positive gate produces a zero output for an all-zero input. The AND gate and the OR gate are examples of positive gates. A complementary gate, sometimes also referred to as a dual gate, expresses the false output of the original logic gate using the false inputs of the original gate. The AND gate fed with true input signals and the OR gate fed with false input signals are two dual gates. The figure shows the WDDL AND gate and the WDDL OR gate.

In the evaluation phase, each input signal is differential and the WDDL gate calculates its differential output. In the pre-charge phase, the inputs to the WDDL gate are set to 0. This puts the outputs of the gate at 0. A module in WDDL pre-charges without distributing the pre-charge signal to each individual gate. During the pre-charge phase, the input vector of the combinatorial logic is set to all 0s. Each individual gate will eventually have all its inputs at 0, evaluate its output to 0, and pass this 0 value to the next gate. One could say that the pre-charge signal travels over the combinatorial logic as a 0-wave, hence the name WDDL.

There are several ways to launch the pre-charge wave. In the figure, a pre-charge operator is inserted at the start of every combinatorial logic tree, i.e. at the inputs of the encryption module and at the outputs of the registers. These operators produce an all-zero output in the pre-charge phase (clk signal high) but let the differential signal through during the evaluation phase (clk signal low).
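The behaviour of the compound gates and of the pre-charge wave can be sketched in a few lines of Python. This is a behavioural illustration only, not the VHDL of the project: a differential signal is modelled as a (true, false) pair, and the names `encode`, `precharge_operator` and `small_tree` are hypothetical helpers introduced for this example.

```python
def encode(bit):
    """Differential encoding of a logic value as (true rail, false rail)."""
    return (bit, 1 - bit)


def wddl_and(a, b):
    """WDDL AND: AND on the true rails, its dual (OR) on the false rails."""
    (at, af), (bt, bf) = a, b
    return (at & bt, af | bf)


def wddl_or(a, b):
    """WDDL OR: OR on the true rails, its dual (AND) on the false rails."""
    (at, af), (bt, bf) = a, b
    return (at | bt, af & bf)


def precharge_operator(signal, clk):
    """Force both rails to 0 while clk is high, pass the signal while low."""
    return (0, 0) if clk else signal


def small_tree(a, b, c, clk):
    """(a AND b) OR c, with pre-charge operators only at the tree inputs."""
    a, b, c = (precharge_operator(s, clk) for s in (a, b, c))
    return wddl_or(wddl_and(a, b), c)


if __name__ == "__main__":
    args = (encode(1), encode(1), encode(0))
    print(small_tree(*args, clk=1))   # pre-charge: the 0-wave reaches the output
    print(small_tree(*args, clk=0))   # evaluation: differential result
```

Because AND and OR are positive gates, forcing only the tree inputs to 0 is enough to drive every internal node and the output to (0, 0). No pre-charge signal has to be routed to the individual gates in this model.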

Figure: WDDL pre-charge wave generation

CHAPTER 7 WDDL GATES

The methodology used in the project is a bottom-up approach. Lower-level modules are designed first and later integrated to form larger modules, whose further integration leads to the final top module. Since logic gates form the lowest-level modules, the logic gates required for the design are first implemented in WDDL style. WDDL demands a parallel combination of two positive complementary gates, one calculating the true value and the other the false value. Logic gates such as OR, AND and XOR have been implemented. In addition, a Full Adder, a 32-bit XOR, etc. have been implemented.

7.1 WDDL OR gate

A WDDL OR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.

Figure 4.1 WDDL OR Gate

The NAND and NOT gates used act as the Precharge operator, injecting the signal '0' into the gates when the 'clk' signal is high, i.e. during the Precharge phase.

7.2 WDDL AND gate

A WDDL AND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.

Figure 4.2 WDDL AND Gate

7.3 WDDL NAND gate

A WDDL NAND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.

Figure 4.3 WDDL NAND Gate

7.4 WDDL NOR gate

A WDDL NOR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.

Figure 4.4 WDDL NOR Gate

7.5 WDDL XOR gate

The XOR function can be implemented by the Boolean equation A ⊕ B = AB' + A'B. Therefore, the XOR gate is implemented in terms of AND gates and OR gates, as shown in Figure 4.5. It can also be implemented by instantiating WDDL AND gates and WDDL OR gates. However, the number of gates involved in the latter method is greater than in the former. Therefore, the first method of implementation is followed rather than the second.
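One possible reading of the AND/OR based XOR construction is sketched below in Python. It is an assumption made for illustration, not the gate netlist of the project: the complemented inputs A' and B' are taken directly from the false rails, so no inverter is needed, and the false output is computed with the dual expression AB + A'B'. The helpers `encode_word`, `decode_word` and `wddl_xor32` are hypothetical names.

```python
def wddl_xor(a, b):
    """Dual-rail XOR built only from AND and OR operations.

    a and b are (true rail, false rail) pairs. The true output implements
    AB' + A'B, the false output implements the complement AB + A'B'.
    """
    (at, af), (bt, bf) = a, b
    true_out = (at & bf) | (af & bt)
    false_out = (at & bt) | (af & bf)
    return (true_out, false_out)


def encode_word(value, width=32):
    """Split an integer into a list of dual-rail bits, LSB first."""
    return [((value >> i) & 1, 1 - ((value >> i) & 1)) for i in range(width)]


def decode_word(rails):
    """Rebuild the integer from the true rails of a dual-rail word."""
    return sum(t << i for i, (t, _f) in enumerate(rails))


def wddl_xor32(x, y):
    """32-bit XOR obtained by instantiating the 1-bit gate once per bit."""
    return [wddl_xor(a, b) for a, b in zip(x, y)]


if __name__ == "__main__":
    x, y = 0xDEADBEEF, 0x12345678
    print(hex(decode_word(wddl_xor32(encode_word(x), encode_word(y)))))
```

With all rails at 0 the gate outputs (0, 0), so the pre-charge wave passes through it like through the AND and OR gates, and a wider operator is obtained by repeating the 1-bit gate.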

Figure 4.5 WDDL XOR gate

With the help of the above basic gates, a Full Adder circuit has been designed by instantiating the WDDL gates designed above. During the implementation of the Blowfish algorithm, a 32-bit XOR gate and a 32-bit Adder circuit are required. They can be easily implemented by instantiating the corresponding lower-level module 32 times.

CHAPTER 8 FRONT END RESULTS

WDDL OR GATE

Synthesis Report

=========================================================================
*                            Final Report                               *
=========================================================================
Final Results
RTL Top Level Output File Name     : wddlor.ngr
Top Level Output File Name         : wddlor
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
# IOs                              : 5

Cell Usage :
# BELS                             : 2
#      LUT3                        : 2
# IO Buffers                       : 5
#      IBUF                        : 3
#      OBUF                        : 2
=========================================================================

Device utilization summary:
---------------------------
Selected Device : 3s250eft256-4

 Number of Slices:                 1  out of   2448     0%
 Number of 4 input LUTs:           2  out of   4896     0%
 Number of IOs:                    5
 Number of bonded IOBs:            5  out of    172     2%

Timing Summary:
---------------
Speed Grade: -4
   Maximum combinational path delay: 6.236ns

Simulation Result

Synthesis Result

WDDL AND GATE

Synthesis Report

=========================================================================
*                            Final Report                               *
=========================================================================
Final Results
RTL Top Level Output File Name     : wddlgates.ngr
Top Level Output File Name         : wddlgates
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
# IOs                              : 5

Cell Usage :
# BELS                             : 2
#      LUT3                        : 2
# IO Buffers                       : 5
#      IBUF                        : 3
#      OBUF                        : 2
=========================================================================

Device utilization summary:
---------------------------
Selected Device : 3s250etq144-4

 Number of Slices:                 1  out of   2448     0%
 Number of 4 input LUTs:           2  out of   4896     0%
 Number of IOs:                    5
 Number of bonded IOBs:            5  out of    108     4%

Timing Summary:
---------------
Speed Grade: -4
   Maximum combinational path delay: 6.236ns

Simulation Result

Synthesis Result

WDDL NAND GATE

Synthesis Report

=========================================================================
*                            Final Report                               *
=========================================================================
Final Results
RTL Top Level Output File Name     : wddlnand1.ngr
Top Level Output File Name         : wddlnand1
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
# IOs                              : 5

Cell Usage :
# BELS                             : 2
#      LUT3                        : 2
# IO Buffers                       : 5
#      IBUF                        : 3
#      OBUF                        : 2
=========================================================================

Device utilization summary:
Selected Device : 3s500efg320-4

 Number of Slices:                 1  out of   4656     0%
 Number of 4 input LUTs:           2  out of   9312     0%
 Number of IOs:                    5
 Number of bonded IOBs:            5  out of    232     2%

Timing Summary:
Speed Grade: -4
   Maximum combinational path delay: 6.236ns

Simulation Result

Synthesis Result

WDDL XOR GATE

Simulation Result

Synthesis Result

WDDL XOR GATE

Synthesis Report

=========================================================================
*                            Final Report                               *
=========================================================================
Final Results
RTL Top Level Output File Name     : wddlxorgate.ngr
Top Level Output File Name         : wddlxorgate
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
# IOs                              : 5

Cell Usage :
# BELS                             : 2
#      LUT3                        : 2
# IO Buffers                       : 5
#      IBUF                        : 3
#      OBUF                        : 2
=========================================================================

Device utilization summary:
---------------------------
Selected Device : 3s250eft256-4

 Number of Slices:                 1  out of   2448     0%
 Number of 4 input LUTs:           2  out of   4896     0%
 Number of IOs:                    5
 Number of bonded IOBs:            5  out of    172     2%

Timing Summary:
---------------
Speed Grade: -4
   Maximum combinational path delay: 6.236ns

Simulation Result

Synthesis Result

CHAPTER 9 SUMMARY AND CONCLUSION

9.1 Summary

In order to secure ICs against side-channel attacks, especially Differential Power Analysis (DPA), it is necessary to implement the design in a logic style that renders constant power dissipation irrespective of the input combination. WDDL has been shown to be advantageous over the other styles and is therefore of great significance. In this work, an architecture for the Blowfish algorithm is designed and implemented in WDDL style. A bottom-up approach is used in this implementation: the low-level entities are designed first and are later combined to form the entire module. The key scheduling is online. The sub-keys generated for a particular key can be used to encrypt all the data that is to be encrypted with that key. The sub-keys are applied in reverse order for the decryption data path. Initially, logic gates are implemented in WDDL, and then higher-level modules are designed by instantiating the WDDL gates to form the entire module, thus resulting in constant power dissipation irrespective of the input data combination. The entire design works in two phases, namely the Precharge phase and the Evaluation phase. In the Precharge phase all the signals of the design are zeroed, and during the Evaluation phase the functionality of the design is achieved. This kind of design has been found simple and very effective in thwarting the side-channel attack known as Differential Power Analysis (DPA).

9.2 Conclusion

The crypto processor has been designed for a key size of 448 bits and a plain text of 64 bits. The code for the implementation has been written in VHDL. Functional verification has been done using the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The back end of the design is done using SOC Encounter.

According to the specifications, the desired functionality has been achieved. At the output, during the Evaluation phase, there is always the same number of transitions, thus resulting in constant power dissipation. During synthesis it was observed that a single WDDL gate comprises several conventional gates. Therefore, the area of the design has grown nearly three-fold compared to the design implemented in conventional CMOS logic; this is the cost of the security incorporated into the IC against Differential Power Analysis (DPA). Due to the constant power dissipation at the output, an attacker cannot apply the DPA scheme to find the secret key that is being used in the crypto-processor. Thus security against DPA is incorporated into the IC at the hardware level by implementing the design in WDDL style, which is quite simple and effective.

CHAPTER 10

REFERENCES

10.1 Referred Technical Papers

[1] Kris Tiri, Member, IEEE, and Ingrid Verbauwhede, Senior Member, IEEE, "A Digital Design Flow for Secure Integrated Circuits", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 25, No. 7, July 2006.

[2] Prof. Jean-Jacques Quisquater, Math RiZK, "Side Channel Attacks", October 2002.

[3] Ross Anderson, Mike Bond, Jolyon Clulow and Sergei Skorobogatov, "Crypto processors - A Survey", IEEE Proceedings.

[4] Noohul Basheer Zain Ali, James M. Noras, "Optimal Data path Design for a Cryptographic Processor: The Blowfish Algorithm", Malaysian Journal of Computer Science, Vol. 14, No. 1, June 2001, pp. 16-27.

[5] Dan Rinehimer, Derek Wilson, "Summary of B. Schneier's Description of New Variable Length Key, 64-Bit Block Cipher (Blowfish)".

[6] Kris Tiri and Ingrid Verbauwhede, "A Dynamic and Differential CMOS Logic Style to Resist Power and Timing Attacks on Security IC's".

[7] Discretix Technologies Limited, "Introduction to Side Channel Attacks".

[8] Kris Tiri, Moonmoon Akmal and Ingrid Verbauwhede, "A Dynamic and Differential Logic with Signal Independent Power Consumption to withstand Differential Power Analysis on Smart Cards".

Reference books

[1] William Stallings, "Cryptography and Network Security: Principles and Practices", Pearson Education, 2003.

[2] Samir Palnitkar, "Verilog HDL: A Guide to Digital Design and Synthesis", Prentice Hall.

Referred Web Pages

[1] http://www.schneier.com/blowfish.html

[2] http://www.nist.gov

[3] http://www.discretix.com/PDF/Introduction%20to%20Side%20Channel%20Attacks.pdf

[4] http://www.wipo.int/pctdb/en/wo.jsp?IA=WO2005081085&DISPLAY=CLAIMS


be wrapped around. The resulting power consumption traces for a "1" bit and a "0" bit will contain different SPA features if the execution paths take different branches for each.

DES permutations: DES implementations perform a variety of bit permutations. Conditional branching in software or microcode can cause significant power consumption differences for "0" and "1" bits.

Comparisons: String or memory comparison operations typically perform a conditional branch when a mismatch is found. This conditional branching causes large SPA (and sometimes timing) characteristics.

Multipliers: Modular multiplication circuits tend to leak a great deal of information about the data they process. The leakage functions depend on the multiplier design, but are often strongly correlated to operand values and Hamming weights.

Exponentiators: A simple modular exponentiation function scans across the exponent, performing a squaring operation in every iteration, with an additional multiplication operation for each exponent bit that is equal to "1". The exponent can be compromised if squaring and multiplication operations have different power consumption characteristics, take different amounts of time, or are separated by different code. Modular exponentiation functions that operate on two or more exponent bits at a time may have more complex leakage functions.
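The exponentiator leak described above can be made concrete with a short Python sketch. It assumes an idealized attacker who can tell squarings ("S") from multiplications ("M") perfectly in the power trace; the function names and the S/M string representation are illustrative choices, not taken from the original text.

```python
def square_and_multiply(base, exponent, modulus):
    """Left-to-right binary exponentiation that also logs its operations."""
    result = 1
    ops = []
    for bit in bin(exponent)[2:]:            # scan exponent bits, MSB first
        result = (result * result) % modulus
        ops.append("S")                      # squaring in every iteration
        if bit == "1":
            result = (result * base) % modulus
            ops.append("M")                  # extra multiplication for a 1 bit
    return result, "".join(ops)


def recover_exponent(ops):
    """Rebuild the exponent from the observed S/M sequence."""
    bits = []
    i = 0
    while i < len(ops):
        if i + 1 < len(ops) and ops[i + 1] == "M":
            bits.append("1")                 # "SM" encodes a 1 bit
            i += 2
        else:
            bits.append("0")                 # a lone "S" encodes a 0 bit
            i += 1
    return int("".join(bits), 2)


if __name__ == "__main__":
    value, ops = square_and_multiply(7, 0b101101, 1009)
    print(ops, bin(recover_exponent(ops)))
```

In this model a single trace is enough to read the whole exponent, which is why the text singles out implementations whose squaring and multiplication steps are distinguishable.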

5.2.2 Differential Power Analysis attack (DPA)

In addition to large-scale power variations due to the instruction sequence, there are effects correlated to the data values being manipulated. These variations tend to be smaller and are sometimes overshadowed by measurement errors and other noise. In such cases, it is still often possible to break the system using statistical functions tailored to the target algorithm.

To implement the DPA attack, an attacker first observes $m$ encryption operations and captures power traces $T_{1..m}[1..k]$ containing $k$ samples each. In addition, the attacker records the ciphertexts $C_{1..m}$. No knowledge of the plaintext is required. DPA analysis uses power consumption measurements to determine whether a key block guess $K_s$ is correct. The attacker computes a $k$-sample differential trace $\Delta_D[1..k]$ by finding the difference between the average of the traces for which a certain intermediate value $V$ is one and the average of the traces for which $V$ is zero, where $V = D(C_i, b, K_s)$ is the value that the selection function $D$ predicts for target bit $b$. Thus $\Delta_D[j]$ is the average over $C_{1..m}$ of the effect due to the value represented by the selection function $D$ on the power consumption at point $j$. In particular,

$$\Delta_D[j] = \frac{\sum_{i=1}^{m} D(C_i, b, K_s)\, T_i[j]}{\sum_{i=1}^{m} D(C_i, b, K_s)} - \frac{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)\, T_i[j]}{\sum_{i=1}^{m} \bigl(1 - D(C_i, b, K_s)\bigr)}$$

If $K_s$ is incorrect, the bit computed using $D$ will differ from the actual target bit for about half of the ciphertexts $C_i$. The selection function is thus effectively uncorrelated to what was actually computed by the target device. If a random function is used to divide a set into two subsets, the difference in the averages of the subsets should approach zero as the subset sizes approach infinity.

Thus, trace components uncorrelated to $D$ will diminish with $1/\sqrt{m}$, causing the differential trace to become flat (the actual trace may not be completely flat, as $D$ with $K_s$ incorrect may have a weak correlation to $D$ with the correct $K_s$). If $K_s$ is correct, however, the computed value for $D(C_i, b, K_s)$ will equal the actual value of target bit $b$ with probability 1. The selection function is thus correlated to the value of the bit considered. Other data values, measurement errors, etc. that are not correlated to $D$ approach zero. Because power consumption is correlated to data bit values, the plot of $\Delta_D$ will be flat with spikes in regions where $D$ is correlated to the values being processed. The correct value of $K_s$ can thus be identified from the spikes in its differential trace. Four values of $b$ correspond to each S-box, providing confirmation of key block guesses. Finding all eight $K_s$ yields the entire 48-bit round subkey. The remaining 8 key bits can be found easily using exhaustive search or by analyzing one additional round. Triple DES keys can be found by analyzing an outer DES operation first, using the resulting key to decrypt the ciphertexts, and attacking the next DES key. DPA can use known plaintext or known ciphertext and can find encryption or decryption keys.
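The difference-of-means procedure above can be tried out on synthetic data. The following Python sketch is a toy model under stated assumptions, not an attack on DES: it uses a 4-bit key block, the 4-bit S-box of the PRESENT cipher as a stand-in for a DES S-box, Gaussian noise, and a leak of one unit at a single sample whenever the target bit is 1. All function names and parameters are hypothetical.

```python
import random

# Toy 4-bit S-box (from the PRESENT cipher), used here only to make the
# selection function nonlinear. The text above targets DES S-boxes.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def selection(c, key_guess):
    """Selection function D: predicted target bit (MSB of the S-box output)."""
    return (SBOX[c ^ key_guess] >> 3) & 1


def simulate_traces(secret_key, m, k, leak_sample, rng):
    """Generate m synthetic traces of k samples: noise plus one leaking sample."""
    inputs, traces = [], []
    for _ in range(m):
        c = rng.randrange(16)
        trace = [rng.gauss(0.0, 0.5) for _ in range(k)]
        trace[leak_sample] += selection(c, secret_key)   # data-dependent bump
        inputs.append(c)
        traces.append(trace)
    return inputs, traces


def differential_trace(inputs, traces, key_guess):
    """Average of traces with D = 1 minus average of traces with D = 0."""
    k = len(traces[0])
    sums = [[0.0] * k, [0.0] * k]
    counts = [0, 0]
    for c, trace in zip(inputs, traces):
        d = selection(c, key_guess)
        counts[d] += 1
        for j, sample in enumerate(trace):
            sums[d][j] += sample
    return [sums[1][j] / counts[1] - sums[0][j] / counts[0] for j in range(k)]


def recover_key(inputs, traces):
    """Return the key guess whose differential trace has the largest spike."""
    peaks = {g: max(abs(v) for v in differential_trace(inputs, traces, g))
             for g in range(16)}
    return max(peaks, key=peaks.get)


if __name__ == "__main__":
    rng = random.Random(1)
    inputs, traces = simulate_traces(0xB, 2000, 10, 4, rng)
    print(hex(recover_key(inputs, traces)))
```

For the correct guess the differential trace shows a spike of roughly the leak amplitude at the leaking sample, while wrong guesses give noticeably smaller values, which mirrors the flat-with-spikes behaviour described in the text.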

CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES

The power consumption of traditional standard cells and logic depends on the signal activity. When the output of a logic gate makes a 0 to 1 transition, a current flows from the power supply and charges the output capacitance. On the other hand, when the output sees a 1 to 0, a 0 to 0, or a 1 to 1 transition, no energy or only a limited amount of energy (due to short-circuit or leakage currents) is drawn from the power supply. This is the fundamental reason why information is leaked through the power supply and why power attacks are possible. The basis of a secure digital design flow is a logic style with constant power consumption.
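The asymmetry between the four transitions can be written down as a very small energy model. The Python sketch below is an idealization made for illustration: it charges the usual C·Vdd² to the supply for a 0 to 1 output transition and neglects short-circuit and leakage currents entirely. The function name and the default values of `c_load` and `vdd` are assumptions.

```python
def supply_energy(outputs, c_load=1.0, vdd=1.0, initial=0):
    """Energy drawn from the supply in each cycle of a static CMOS output.

    Only a 0 -> 1 transition charges the load capacitance from the supply.
    The other three transitions are modelled as drawing no energy.
    """
    energy = []
    prev = initial
    for out in outputs:
        if (prev, out) == (0, 1):
            energy.append(c_load * vdd ** 2)   # charge C from the supply
        else:
            energy.append(0.0)                 # 1->0, 0->0, 1->1
        prev = out
    return energy


if __name__ == "__main__":
    # The supply trace reveals in which cycles the output rose from 0 to 1.
    print(supply_energy([1, 1, 0, 0, 1]))
```

Since the per-cycle energy depends on the data sequence, an observer of the supply current learns something about the processed values, which is exactly the leak that a constant-power logic style has to remove.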

6.1 Current Mode Logic

Current mode logic (CML), e.g. current steering logic, seems the ideal solution. This type of logic continuously draws a current from the supply and measures its state through the path that the current takes. A gate has constant power consumption if it draws a perfectly constant current from the power supply, independently of the input and output signals. To build a current source capable of generating a constant current, special circuit techniques that minimize channel length modulation have to be used.

The decisive drawback of CML, however, is its static power consumption. Even when the logic gate is not processing any data, it burns current, which makes this logic style unacceptable for embedded, battery-operated devices.

6.2 Voltage Mode Logic (CMOS circuit styles)

Voltage mode logic (VML), e.g. static CMOS logic, only draws a current from the supply to change state and measures its state by the amount of charge it stores on a capacitance. A regular standard CMOS circuit will only consume power when a capacitance gets charged and later discharged, i.e. when a gate switches state. This is the main reason that CMOS is the style of choice for every battery-operated or low-power device, as illustrated in the figure below for a simple inverter. Thus static CMOS is the preferred logic style because of its low power consumption and high noise margins.

Standard CMOS inverter

Yet two conditions must be satisfied for VML to have constant power consumption, namely:

1) A logic gate must have exactly one switching event per signal transition.

2) The logic gate must charge a constant capacitance in that switching event.


If Ks is incorrect, the bit computed using D will differ from the actual target bit for about half of the ciphertexts Ci. The selection function is thus effectively uncorrelated to what was actually computed by the target device. If a random function is used to divide a set into two subsets, the difference in the averages of the subsets should approach zero as the subset sizes approach infinity.

Thus, for an incorrect Ks, the differential trace approaches zero, because trace components uncorrelated to D diminish with $1/\sqrt{m}$ (m being the number of traces), causing the differential trace to become flat. (The actual trace may not be completely flat, as D with Ks incorrect may have a weak correlation to D with the correct Ks.) If Ks is correct, however, the computed value for D(Ci, b, Ks) will equal the actual value of target bit b with probability 1. The selection function is thus correlated to the value of the bit considered. Other data values, measurement errors, etc. that are not correlated to D approach zero. Because power consumption is correlated to data bit values, the plot of $\Delta_D$ will be flat, with spikes in regions where D is correlated to the values being processed. The correct value of Ks can thus be identified from the spikes in its differential trace.

Four values of b correspond to each S-box, providing confirmation of key block guesses. Finding all eight Ks yields the entire 48-bit round subkey. The remaining 8 key bits can be found easily using exhaustive search or by analyzing one additional round. Triple DES keys can be found by analyzing an outer DES operation first, using the resulting key to decrypt the ciphertexts, and attacking the next DES key. DPA can use known plaintext or known ciphertext and can find encryption or decryption keys.
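The difference-of-means step described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names `differential_trace` and `best_key_guess`, the `selection` callback and the list-of-lists trace format are hypothetical and do not come from the report, and no real measurement data is involved.

```python
from statistics import mean


def differential_trace(traces, selection_bits):
    """Point-wise difference between the mean of the traces whose
    selection bit is 1 and the mean of those whose selection bit is 0."""
    ones = [t for t, d in zip(traces, selection_bits) if d == 1]
    zeros = [t for t, d in zip(traces, selection_bits) if d == 0]
    if not ones or not zeros:
        raise ValueError("both subsets must be non-empty")
    samples = len(traces[0])
    return [mean(t[j] for t in ones) - mean(t[j] for t in zeros)
            for j in range(samples)]


def best_key_guess(traces, ciphertexts, selection, key_guesses):
    """Return the key guess whose differential trace has the largest spike.

    selection(c, k) is a hypothetical stand-in for D(Ci, b, Ks)."""
    def peak(k):
        bits = [selection(c, k) for c in ciphertexts]
        return max(abs(v) for v in differential_trace(traces, bits))
    return max(key_guesses, key=peak)
```

For a wrong guess the two subsets are split more or less at random, so their means cancel; for the right guess the data-dependent sample survives the averaging and shows up as a spike.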

CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES

The power consumption of traditional standard cells and logic depends on the signal activity. When the output of a logic gate makes a 0 to 1 transition, a current comes from the power supply and charges the output capacitance. On the other hand, when the output sees a 1 to 0, a 0 to 0, or a 1 to 1 transition, no energy or only a limited amount of energy (due to short-circuit or leakage currents) is consumed from the power supply. This is the fundamental reason why information is leaked through the power supply and why power attacks are possible. The basis of a secure digital design flow is therefore a logic style with constant power consumption.

6.1 Current Mode Logic

Current mode logic (CML), e.g. current steering logic, seems the ideal solution. This type of logic continuously draws a current from the supply and measures its state through the path that the current takes. A gate has constant power consumption if it draws a perfectly constant current from the power supply, independently of the input and output signals. To build a current source capable of generating a constant current, special circuit techniques that minimize channel length modulation have to be used.

The decisive drawback of CML, however, is its static power consumption. Even when the logic gate is not processing any data it burns the current, which makes this logic style unacceptable for embedded, battery-operated devices.

6.2 Voltage Mode Logic (CMOS circuit styles)

Voltage mode logic (VML), e.g. static CMOS logic, only draws a current from the supply to change state and measures its state by the amount of charge it stores on a capacitance. A regular standard CMOS circuit only consumes power when a capacitance gets charged and later discharged, i.e. when a gate switches state. This is the main reason that CMOS is the style of choice for battery-operated or low-power devices: static CMOS is the preferred logic style because of its low power consumption and high noise margins. The behaviour is illustrated in the figure below for a simple inverter.

Figure: Standard CMOS inverter

Yet two conditions must be satisfied for VML to have constant power consumption, namely:

1) A logic gate must have exactly one switching event per signal transition.
2) The logic gate must charge a constant capacitance in that switching event.

As the figure above shows, all four transitions of the CMOS inverter can be distinguished when monitoring the power supply.
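To make the data dependence concrete, the following Python sketch counts the clock steps in which a static CMOS output draws charge from the supply. It is a deliberately simplified model and an assumption of this example, not part of the report: only a 0 to 1 output transition is counted as a supply event, and short-circuit and leakage currents are ignored. The function name `supply_charge_events` is hypothetical.

```python
def supply_charge_events(outputs):
    """For a sequence of gate output values, return 1 for every step in
    which the output goes 0 -> 1 (the output capacitance is charged from
    the supply) and 0 for every other transition."""
    return [1 if (prev, cur) == (0, 1) else 0
            for prev, cur in zip(outputs, outputs[1:])]


# Two different data sequences give two different supply profiles,
# which is exactly what a power attack exploits.
profile_a = supply_charge_events([0, 1, 1, 0, 0, 1])
profile_b = supply_charge_events([0, 0, 0, 0, 0, 1])
```

Here `profile_a` contains two charging events and `profile_b` only one, so the supply current reveals something about the processed data.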

6.3 Dynamic Differential Logic

Dynamic differential logic, sometimes also referred to as dual-rail with pre-charge logic, fulfills the first condition. A differential logic family uses both the true and the false representation of the input and output signals, and a dynamic logic family alternates pre-charge and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1, exactly one of the two output nodes evaluates to 0 in the evaluation phase to produce a differential output signal. The discharged output node is charged back to 1 in the following pre-charge phase. In other words, every signal transition, including the events in which the input signals remain constant, is represented by an actual switching event in which the logic gate charges a capacitance. The logic families that have been introduced to thwart differential power analysis (DPA) by means of dynamic differential logic include the following:

1. Sense Amplifier Based Logic (SABL), and
2. Wave Dynamic Differential Logic (WDDL) gates
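The constant switching activity of a dual-rail pre-charged output, as described in Section 6.3, can be sketched as follows. This is a behavioural toy model written for illustration only; the function names are hypothetical, both rails are assumed to be pre-charged to 1 as in the text, and capacitances are not modelled.

```python
def dual_rail_cycle(value):
    """Return the (true, false) rail values after the pre-charge phase
    and after the evaluation phase for one data bit."""
    precharged = (1, 1)
    evaluated = (value, 1 - value)  # exactly one rail discharges to 0
    return precharged, evaluated


def charge_events_per_cycle(data):
    """Count, for each data bit, how many rails must be recharged 0 -> 1
    in the pre-charge phase that follows its evaluation."""
    events = []
    for bit in data:
        precharged, evaluated = dual_rail_cycle(bit)
        events.append(sum(1 for p, e in zip(precharged, evaluated)
                          if e == 0 and p == 1))
    return events
```

Whatever the data sequence, every cycle produces exactly one charging event, which is the first of the two conditions listed in Section 6.2.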

6.3.1 Sense Amplifier Based Logic (SABL)

The main advantage of SABL is that it has balanced input and output nodes and that all internal nodes connect to an output. The output capacitances can be balanced. Systematic methods have been developed to make sure that both branches of the differential pull-down network are balanced and that no memory effects are present in the network. Sense Amplifier Based Logic is illustrated in the figure below.

Figure: Sense Amplifier Based Logic AND/NAND gate

This circuit style does, however, require a full-custom characterization and layout. It also suffers from a high clock load, which is common to all dynamic logic gates.

6.3.2 Wave Dynamic Differential Logic Gates (WDDL)

WDDL can be implemented with static CMOS logic. Static CMOS standard cells are combined to form secure compound standard cells, which have a reduced power signature. WDDL has many advantages. It can be readily implemented from an existing standard cell library. The design flow is fully supported with accurate EDA library files that come directly from the vendor. WDDL also results in a dynamic differential logic with only a small load capacitance on the pre-charge control signal, and with the low power consumption and the high noise margins of static CMOS.

Advantages of the WDDL logic style are as follows:

1) It can be incorporated into the common Electronic Design Automation (EDA) tool flow.
2) No special design rules are involved in the interconnection of WDDL gates.
3) The switching factor of WDDL is 100%.

A WDDL gate consists of a parallel combination of two positive complementary gates, one calculating the true output using the true inputs, the other the false output using the false inputs. A positive gate produces a zero output for an all-zero input. The AND gate and the OR gate are examples of positive gates. A complementary gate, sometimes also referred to as a dual gate, expresses the false output of the original logic gate using the false inputs of the original gate. The AND gate fed with true input signals and the OR gate fed with false input signals are two dual gates. The figure shows the WDDL AND gate and the WDDL OR gate.

In the evaluation phase each input signal is differential and the WDDL gate calculates its differential output. In the pre-charge phase the inputs to the WDDL gate are set at 0, which puts the output of the gate at 0. A module in WDDL pre-charges without distributing the pre-charge signal to each individual gate. During the pre-charge phase the input vector of the combinatorial logic is set at all 0s. Each individual gate will eventually have all its inputs at 0, evaluate its output to 0, and pass this 0 value to the next gate. One could say that the pre-charge signal travels over the combinatorial logic as a 0-wave, hence the name WDDL.

There are several ways to launch the pre-charge wave. In the figure, a pre-charge operator is inserted at the start of every combinatorial logic tree, i.e. at the inputs of the encryption module and at the outputs of the registers. These operators produce an all-zero output in the pre-charge phase (clk signal high) but let the differential signal through during the evaluation phase (clk signal low).
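The WDDL AND and OR gates and the 0-wave can be modelled behaviourally in Python. The sketch below is an illustration under assumptions rather than the report's VHDL: a signal is represented as a hypothetical `(true, false)` tuple, and the helper names `encode`, `precharge`, `wddl_and`, `wddl_or` and `evaluate` are invented for this example.

```python
def encode(bit):
    """Differential encoding of a single bit as (true_rail, false_rail)."""
    return (bit, 1 - bit)


def precharge(signal, clk):
    """Pre-charge operator: force both rails to 0 while clk is high,
    pass the differential signal through while clk is low."""
    return (0, 0) if clk else signal


def wddl_and(a, b):
    """AND on the true rails in parallel with its dual, OR, on the false rails."""
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a, b):
    """OR on the true rails in parallel with its dual, AND, on the false rails."""
    return (a[0] | b[0], a[1] & b[1])


def evaluate(a, b, c, clk):
    """Small logic tree (a AND b) OR c with pre-charge operators only at
    the inputs; the gates themselves never see the clock."""
    a, b, c = (precharge(encode(x), clk) for x in (a, b, c))
    return wddl_or(wddl_and(a, b), c)
```

With `clk` high the all-zero inputs ripple through both gates and the output becomes `(0, 0)`; with `clk` low exactly one output rail rises, whatever the data, which is the 100% switching factor mentioned above.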

Figure: WDDL pre-charge wave generation

CHAPTER 7 WDDL GATES

The methodology used in the project is a bottom-up approach. Lower-level modules are designed first and later integrated to form larger modules, whose further integration leads to the final top module. Since logic gates form the lowest-level modules, the logic gates required for the design are implemented in WDDL style first. WDDL demands a parallel combination of two positive complementary gates, one calculating the true value and the other the false value. Logic gates such as OR, AND and XOR have been implemented, as well as a full adder, a 32-bit XOR, etc.

7.1 WDDL OR gate

A WDDL OR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.

Figure 4.1 WDDL OR Gate

The NAND and NOT gates used act as the pre-charge operator, injecting the signal '0' into the gates when the 'clk' signal is high, i.e. during the pre-charge phase.

7.2 WDDL AND gate

A WDDL AND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.

Figure 4.2 WDDL AND Gate

7.3 WDDL NAND gate

A WDDL NAND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.

Figure 4.3 WDDL NAND Gate

7.4 WDDL NOR gate

A WDDL NOR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.

Figure 4.4 WDDL NOR Gate

7.5 WDDL XOR gate

The XOR function can be implemented by the Boolean equation $A \oplus B = AB' + A'B$. The XOR gate is therefore implemented in terms of AND and OR gates, as shown in Figure 4.5. It can also be implemented by instantiating a WDDL AND gate and a WDDL OR gate, but the number of gates involved in that approach is greater. Therefore the first method of implementation is followed rather than the second one.
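The inverting gates and the XOR gate of this chapter can be sketched behaviourally as follows. The sketch rests on assumptions that are not stated in the report: a signal is a hypothetical `(true, false)` tuple, inversion (NAND, NOR and the complemented XOR inputs) is obtained by interchanging the two rails, and all function names are invented for illustration. It is not the VHDL used in the project.

```python
def encode(bit):
    """Differential encoding of a single bit as (true_rail, false_rail)."""
    return (bit, 1 - bit)


def wddl_and(a, b):
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a, b):
    return (a[0] | b[0], a[1] & b[1])


def wddl_not(a):
    """Assumed inversion: swap the true and false rails (no gate needed)."""
    return (a[1], a[0])


def wddl_nand(a, b):
    return wddl_not(wddl_and(a, b))


def wddl_nor(a, b):
    return wddl_not(wddl_or(a, b))


def wddl_xor(a, b):
    """True rail: AB' + A'B built from AND/OR on the rails.
    False rail: the dual expression (A' + B)(A + B')."""
    true_rail = (a[0] & b[1]) | (a[1] & b[0])
    false_rail = (a[1] | b[0]) & (a[0] | b[1])
    return (true_rail, false_rail)


def wddl_xor_word(x, y, width=32):
    """Instantiate the 1-bit WDDL XOR `width` times, least significant bit first."""
    return [wddl_xor(encode((x >> i) & 1), encode((y >> i) & 1))
            for i in range(width)]


def decode_word(pairs):
    """Rebuild an integer from the true rails of a list of (true, false) pairs."""
    return sum(t << i for i, (t, _f) in enumerate(pairs))
```

Because every gate here is built only from AND and OR on the rails, an all-zero pre-charge input still yields an all-zero output, so the 0-wave passes through these gates as well.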

Figure 4.5 WDDL XOR Gate

With the help of the above basic gates, a full adder circuit has been designed by instantiating the WDDL gates designed above. The implementation of the Blowfish algorithm requires a 32-bit XOR gate and a 32-bit adder circuit. They can be easily implemented by instantiating the corresponding lower-level module 32 times.

CHAPTER 8 FRONT END RESULTS

WDDL OR GATE

Synthesis Report

Final Results
  RTL Top Level Output File Name : wddlor.ngr
  Top Level Output File Name     : wddlor
  Output Format                  : NGC
  Optimization Goal              : Speed
  Keep Hierarchy                 : NO

Design Statistics
  IOs                            : 5

Cell Usage
  BELS                           : 2
    LUT3                         : 2
  IO Buffers                     : 5
    IBUF                         : 3
    OBUF                         : 2

Device utilization summary
  Selected Device                : 3s250eft256-4
  Number of Slices               : 1 out of 2448   0%
  Number of 4 input LUTs         : 2 out of 4896   0%
  Number of IOs                  : 5
  Number of bonded IOBs          : 5 out of 172    2%

Timing Summary
  Speed Grade                    : -4
  Maximum combinational path delay : 6.236ns

Figure: Simulation Result
Figure: Synthesis Result

WDDL AND GATE

Synthesis Report

Final Results
  RTL Top Level Output File Name : wddlgates.ngr
  Top Level Output File Name     : wddlgates
  Output Format                  : NGC
  Optimization Goal              : Speed
  Keep Hierarchy                 : NO

Design Statistics
  IOs                            : 5

Cell Usage
  BELS                           : 2
    LUT3                         : 2
  IO Buffers                     : 5
    IBUF                         : 3
    OBUF                         : 2

Device utilization summary
  Selected Device                : 3s250etq144-4
  Number of Slices               : 1 out of 2448   0%
  Number of 4 input LUTs         : 2 out of 4896   0%
  Number of IOs                  : 5
  Number of bonded IOBs          : 5 out of 108    4%

Timing Summary
  Speed Grade                    : -4
  Maximum combinational path delay : 6.236ns

Figure: Simulation Result
Figure: Synthesis Result

WDDL NAND GATE

Synthesis Report

Final Results
  RTL Top Level Output File Name : wddlnand1.ngr
  Top Level Output File Name     : wddlnand1
  Output Format                  : NGC
  Optimization Goal              : Speed
  Keep Hierarchy                 : NO

Design Statistics
  IOs                            : 5

Cell Usage
  BELS                           : 2
    LUT3                         : 2
  IO Buffers                     : 5
    IBUF                         : 3
    OBUF                         : 2

Device utilization summary
  Selected Device                : 3s500efg320-4
  Number of Slices               : 1 out of 4656   0%
  Number of 4 input LUTs         : 2 out of 9312   0%
  Number of IOs                  : 5
  Number of bonded IOBs          : 5 out of 232    2%

Timing Summary
  Speed Grade                    : -4
  Maximum combinational path delay : 6.236ns

Figure: Simulation Result
Figure: Synthesis Result

WDDL XOR GATE

Synthesis Report

Final Results
  RTL Top Level Output File Name : wddlxorgate.ngr
  Top Level Output File Name     : wddlxorgate
  Output Format                  : NGC
  Optimization Goal              : Speed
  Keep Hierarchy                 : NO

Design Statistics
  IOs                            : 5

Cell Usage
  BELS                           : 2
    LUT3                         : 2
  IO Buffers                     : 5
    IBUF                         : 3
    OBUF                         : 2

Device utilization summary
  Selected Device                : 3s250eft256-4
  Number of Slices               : 1 out of 2448   0%
  Number of 4 input LUTs         : 2 out of 4896   0%
  Number of IOs                  : 5
  Number of bonded IOBs          : 5 out of 172    2%

Timing Summary
  Speed Grade                    : -4
  Maximum combinational path delay : 6.236ns

Figure: Simulation Result
Figure: Synthesis Result

CHAPTER 9 SUMMARY AND CONCLUSION

9.1 Summary

To protect ICs against side-channel attacks, especially Differential Power Analysis (DPA), the design must be implemented in a logic style whose power dissipation is constant irrespective of the input combination. WDDL offers advantages over the other logic styles considered and is therefore of great significance. In this work, an architecture for the Blowfish algorithm is designed and implemented in WDDL style using a bottom-up approach: the low-level entities are designed first and later combined to form the entire module. The key scheduling is online. The sub-keys generated for a particular key can be used for the encryption of all the data to be encrypted with that key. The sub-keys are supplied in reverse order for the decryption data path.

Initially the logic gates are implemented in WDDL, and the higher modules are then designed by instantiating the WDDL gates to form the entire module, resulting in constant power dissipation irrespective of the input data combination. The entire design works in two phases, namely the pre-charge phase and the evaluation phase. In the pre-charge phase all the signals of the design are zeroed, and during the evaluation phase the functionality of the design is achieved. This sort of design has been found simple and very effective in thwarting the side-channel attack known as Differential Power Analysis (DPA).

9.2 Conclusion

The crypto processor has been designed for a key size of 448 bits and a plaintext block of 64 bits. The code for the implementation has been written in VHDL. Functional verification has been done using the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The back end of the design is done using SOC Encounter.

The desired functionality has been achieved according to the specifications. At the output, the number of transitions during the evaluation phase is the same in every cycle, resulting in constant power dissipation. During synthesis it was observed that even a simple WDDL gate comprises many conventional gates. The area of the design has therefore grown nearly three-fold compared to the same design implemented in conventional CMOS logic; this is the price paid for the security incorporated into the IC against DPA. Because the power dissipation at the output is constant, an attacker cannot apply DPA to find the secret key used in the crypto processor. Security against DPA is thus incorporated into the IC at the hardware level by implementing the design in WDDL style, which is quite simple and effective.



Evaluation phase In the Precharge phase all the signals of the design are zeroed and

during the Evaluation phase the functionality of the design is achieved This sort of design

has been found simple and very effective in thwarting the side-channel attack namely

Differential Power analysis (DPA)92 ConclusionThe crypto processor has been

44

designed for the key size of 448 bits and plain text of 64 bits The code for the

implementation has been written in VHDL The functional verification has been done using

the Cadence (NC Launch) Simulation package and Synthesis using RTL Compiler The

Backend of the design is done using the SOC EncounterAccording to the specifications

desired functionality has been achieved In the output during the Evaluation phase there

has been same number of transitions thus resulting in constant power dissipation During

Synthesis it has been observed that a simple WDDL gate comprised many conventional

gates Therefore the area of the design has grown nearly three-fold when compared to the

design implemented in conventional CMOS logic at the cost of security incorporated into

the IC against Differential Power Analysis (DPA) Due to the constant power dissipation at

the output hacker cannot apply Differential Power Analysis (DPA) scheme to find the

secret key that is being used in the crypto-processor Thus security against DPA is

incorporated into the IC at hardware level by implementing the design in WDDL style

which is quite simple and effectiveCHAPTER 10

REFERENCES 101 Referred Technical papers[1] Kris Tiri Member

IEEE and Ingrid Verbauwhede Senior Member IEEE ldquoA Digital Design Flow for

Secure Integrated Circuitsrdquo IEEE Transaction on Computer-Aided Design of Integrated

Circuits and Systems Vol 25 No 7 July 2006[2] Prof Jean-Jacques Quisquater Math

RiZK laquo Side Channel Attacks raquo October 2002[3] Ross Anderson Mike Bond Jolyon

Clulow and Sergei Skorobogatov ldquoCrypto processors ndash A Surveyrdquo IEEE Proceedings[4]

Noohul Basheer Zain Ali James M Noras ldquoOptimal Data path Design for a Cryptographic

Processor The Blowfish Algorithmrdquo Malaysian Journal of Computer Science Vol 14 No

1 June 2001 pp 16-27[5] Dan Rinehimer Derek Wilson ldquoSummary of B Schneierrsquos

Description of New Variable Length Key 64-Bit Block Cipher (Blowfish)rdquo[6] Kris Tiri and

Ingrid Verbauwhede ldquoA Dynamic and Differential CMOS Logic Style to Resist Power and

Timing Attacks on Security ICrsquosrdquo[7] Discretix Technologies Limited ldquoIntroduction to Side

45

Channel Attacksrdquo[8] Kris Tiri Moonmoon Akmal and Ingrid Verbauwhede ldquoA Dynamic

and Differential Logic with Signal Independent Power Consumption to withstand

Differential Power Analysis on Smart CardsrdquoReference books[1] William Stallings

ldquoCryptography and Network Security Principles and Practicesrdquo Pearson Education

2003[2] Samir Palnitkar ldquoVerilog HDL A Guide to Digital Design and Synthesisrdquo

Prentice Hall Referred Web Pages[1] httpwwwschneiercomblowfishhtml[2]

httpwwwnistgov[3] httpwwwdiscretixcomPDFIntroduction20to20Side20Channel

20Attackspdf[4] httpwwwwipointpctdbenwojsp

IA=WO2005081085ampDISPLAY=CLAIMS

46

  • INDEX
  • CHAPTER 1 INTRODUCTION AND OBJECTIVE
  • CHAPTER 2 REVIEW OF LITERATURE
  • CHAPTER 3 HARDWARE DESCRIPTIVE LANGUAGE (VHDL)
  • CHAPTER 4 SMART CARD OVERVIEW
  • CHAPTER 5 SIDE CHANNEL ATTACKS
  • CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES
Page 28: Implementation of Wddl Gates for Secure Ic Applications Sss 2004 (1)

current, which makes this logic style unacceptable for embedded battery-operated devices.

6.2 Voltage Mode Logic (CMOS circuit styles)

Voltage mode logic (VML), e.g. static CMOS logic, draws current from the supply only to change state, and represents its state by the amount of charge stored on a capacitance. A regular standard CMOS circuit consumes power only when a capacitance is charged and later discharged, i.e. when a gate switches state. This is the main reason that CMOS is the style of choice for every battery-operated or low-power device. This is illustrated in the figure below for a simple inverter. Thus static CMOS is the preferred logic style because of its low power consumption and high noise margins.

Standard CMOS inverter

Yet two conditions must be satisfied for VML to have constant power consumption, namely:

1) A logic gate must have exactly one switching event per signal transition.

2) The logic gate must charge a constant capacitance in that switching event.

As the figure above shows, all four transitions of the CMOS inverter can be distinguished by monitoring the power supply.
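The data dependence of a static CMOS gate can be illustrated with a first-order energy model. The sketch below is written in Python rather than the project's VHDL. It assumes that the supply delivers C * Vdd^2 only when the output rises from 0 to 1; the capacitance and supply values are hypothetical, and short-circuit and leakage currents are not modelled.

```python
# First-order model (an assumption of this sketch): a static CMOS gate draws
# C * Vdd^2 from the supply only when its output rises from 0 to 1.
C_LOAD = 10e-15   # hypothetical load capacitance in farads
VDD = 1.8         # hypothetical supply voltage in volts


def inverter(a):
    """Logical function of the inverter."""
    return 1 - a


def supply_energy(prev_out, new_out, c_load=C_LOAD, vdd=VDD):
    """Energy drawn from the supply for one output transition."""
    return c_load * vdd ** 2 if (prev_out, new_out) == (0, 1) else 0.0


def trace(inputs, initial_out=0):
    """Supply energy per cycle for a sequence of input bits."""
    out = initial_out
    energies = []
    for a in inputs:
        new_out = inverter(a)
        energies.append(supply_energy(out, new_out))
        out = new_out
    return energies


if __name__ == "__main__":
    for bits in ([0, 1, 0, 1], [1, 1, 1, 1]):
        print(bits, trace(bits))
```

The two input sequences give different energy traces, which is the kind of leakage a power analysis attack relies on.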

6.3 Dynamic Differential Logic

Dynamic differential logic, sometimes also referred to as dual-rail with pre-charge logic, fulfills the first condition. A differential logic family uses the true and the false representation of the input and output signals, and a dynamic logic family alternates pre-charge and evaluation phases. As a result, since both outputs (true and false) are pre-charged to 1, exactly one of the two output nodes evaluates to 0 in the evaluation phase, giving a differential output signal. The discharged output node is charged to 1 again in the following pre-charge phase, so that both outputs are pre-charged to 1. In other words, every signal transition, including the events in which the input signals remain constant, is represented by an actual switching event in which the logic gate charges a capacitance. The logic families that have been introduced to thwart differential power analysis (DPA) using dynamic differential logic are:

1. Sense Amplifier Based Logic (SABL), and

2. Wave Dynamic Differential Logic (WDDL) gates
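The "one switching event per cycle" property described above can be checked with a small behavioral model. This is only a sketch in Python, not the project's VHDL: the function names are illustrative, and a rail is treated as an ideal node that is either 0 or 1.

```python
def ddl_cycle(value):
    """Simulate one pre-charge/evaluation cycle of a dual-rail output.

    Returns (evaluated_rails, recharge_events), where evaluated_rails is the
    (true, false) pair after evaluation and recharge_events is the number of
    rails that must be charged back to 1 in the next pre-charge phase.
    """
    precharged = (1, 1)                        # both rails start at 1
    evaluated = (1, 0) if value else (0, 1)    # exactly one rail discharges
    recharge_events = sum(p - e for p, e in zip(precharged, evaluated))
    return evaluated, recharge_events


def single_rail_events(bits, initial=0):
    """Count 0-to-1 charging events of an ordinary single-rail signal."""
    events, prev = [], initial
    for b in bits:
        events.append(1 if (prev, b) == (0, 1) else 0)
        prev = b
    return events


if __name__ == "__main__":
    data = [0, 0, 1, 1, 0, 1]
    print("dual rail  :", [ddl_cycle(b)[1] for b in data])
    print("single rail:", single_rail_events(data))
```

The dual-rail count is the same in every cycle, while the single-rail count follows the data. The second condition, a constant capacitance, is a layout matter that this model does not capture.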

6.3.1 Sense Amplifier Based Logic (SABL)

The main advantage of SABL is that it has balanced input and output nodes and that all internal nodes connect to an output. The output capacitances can be balanced. Systematic methods have been developed to make sure that both branches of the differential pull-down network are balanced and that no memory effects are present in the network. Sense Amplifier Based Logic is illustrated below.

Sense Amplifier Based Logic AND/NAND gate

This circuit style does, however, require full custom characterization and layout. It also suffers from a high clock load, common to all dynamic logic gates.

6.3.2 Wave Dynamic Differential Logic Gates (WDDL)

WDDL can be implemented with static CMOS logic. Static CMOS standard cells are combined to form secure compound standard cells, which have a reduced power signature. WDDL has many advantages. It can be readily implemented from an existing standard cell library. The design flow is fully supported with accurate EDA library files that come directly from the vendor. WDDL also results in a dynamic differential logic with only a small load capacitance on the pre-charge control signal and with the low power consumption and the high noise margins of static CMOS.

The advantages of the WDDL logic style are as follows:

A major advantage of the proposed logic style is that it can be incorporated into the common Electronic Design Automation (EDA) tool flow.

No special design rules are involved in the interconnection of WDDL gates.

The switching factor of WDDL is 100%.

A WDDL gate consists of a parallel combination of two positive complementary gates, one calculating the true output using the true inputs, the other the false output using the false inputs. A positive gate produces a zero output for an all-zero input. The AND gate and the OR gate are examples of positive gates. A complementary gate, sometimes also referred to as a dual gate, expresses the false output of the original logic gate using the false inputs of the original gate. The AND gate fed with true input signals and the OR gate fed with false input signals are two dual gates. The figure shows the WDDL AND gate and the WDDL OR gate. In the evaluation phase, each input signal is differential and the WDDL gate calculates its differential output. In the pre-charge phase, the inputs to the WDDL gate are set to 0. This puts the outputs of the gate at 0.

A module in WDDL pre-charges without distributing the pre-charge signal to each individual gate. During the pre-charge phase, the input vector of the combinatorial logic is set to all 0s. Each individual gate will eventually have all its inputs at 0, evaluate its output to 0 and pass this 0 value to the next gate. One could say that the pre-charge signal travels over the combinatorial logic as a 0-wave, hence the name WDDL. There are several ways to launch the pre-charge wave. In the figure, a pre-charge operator is inserted at the start of every combinatorial logic tree, i.e. at the inputs of the encryption module and at the outputs of the registers. These operators produce an all-zero output in the pre-charge phase (clk signal high) but let the differential signal through during the evaluation phase (clk signal low).
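The behaviour of the WDDL gates and of the pre-charge wave described above can be sketched as a behavioral model. The code is Python rather than the project's VHDL, signals are (true, false) rail pairs, and the function names and the small example circuit (a AND b) OR c are illustrative assumptions, not taken from the project.

```python
from itertools import product


def precharge_operator(value, clk):
    """Return the (true, false) rails launched into the logic.

    clk high = pre-charge phase: both rails are forced to 0.
    clk low  = evaluation phase: the differential pair is passed through.
    """
    return (0, 0) if clk else (value, 1 - value)


def wddl_and(a, b):
    """WDDL AND: AND on the true rails, OR (its dual) on the false rails."""
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a, b):
    """WDDL OR: OR on the true rails, AND (its dual) on the false rails."""
    return (a[0] | b[0], a[1] & b[1])


def circuit(a, b, c, clk):
    """Hypothetical two-level tree computing (a AND b) OR c."""
    ra, rb, rc = (precharge_operator(v, clk) for v in (a, b, c))
    return wddl_or(wddl_and(ra, rb), rc)


if __name__ == "__main__":
    for a, b, c in product((0, 1), repeat=3):
        print(a, b, c,
              "precharge:", circuit(a, b, c, clk=1),
              "evaluate:", circuit(a, b, c, clk=0))
```

In this model only the inputs see the clock. The all-zero vector reaches the output through the gates themselves, and in the evaluation phase exactly one output rail rises for every input combination.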

Figure: WDDL Pre-charge wave generation

CHAPTER 7 WDDL GATES

The methodology used in the project is the bottom-up approach. Lower-level modules are designed and later integrated to form larger modules, whose further integration leads to the final top module. Since logic gates form the lowest-level modules, the logic gates required for the design are first implemented in WDDL style. WDDL demands a parallel combination of two positive complementary gates, one calculating the true value and the other the false value. Logic gates such as OR, AND and XOR have been implemented. In addition, a Full Adder, a 32-bit XOR, etc. have been implemented.

7.1 WDDL OR gate

A WDDL OR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.

Figure 4.1 WDDL OR Gate

The NAND and NOT gates used act as the Precharge operator, injecting signal '0' into the gates when the 'clk' signal is high, i.e. during the Precharge phase.

7.2 WDDL AND gate

A WDDL AND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.

Figure 4.2 WDDL AND Gate

7.3 WDDL NAND gate

A WDDL NAND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.

Figure 4.3 WDDL NAND Gate

7.4 WDDL NOR gate

A WDDL NOR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.

Figure 4.4 WDDL NOR Gate

7.5 WDDL XOR gate

The XOR function can be implemented by the Boolean equation A ⊕ B = AB' + A'B. Therefore the XOR gate is implemented in terms of AND gates and OR gates, as shown in Figure 4.5. It can also be implemented by instantiating a WDDL AND gate and a WDDL OR gate, but the number of gates involved in the latter method is greater than in the former. Therefore the first method of implementation is followed rather than the second.
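The two XOR constructions mentioned above can be compared in a short behavioral sketch. It is written in Python, not the project's VHDL, and all function names are illustrative. The false-rail expression of the direct form is derived here as the dual of AB' + A'B and is an assumption, since the project's figure is not reproduced in the text. Inversion is modelled as a swap of the true and false rails.

```python
from itertools import product


def encode(v):
    """Differential (true, false) encoding of a single bit."""
    return (v, 1 - v)


def wddl_not(a):
    """Inversion in dual-rail logic: exchange the two rails."""
    return (a[1], a[0])


def wddl_and(a, b):
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a, b):
    return (a[0] | b[0], a[1] & b[1])


def wddl_xor_direct(a, b):
    """XOR built directly from AND and OR gates on the rails."""
    (at, af), (bt, bf) = a, b
    true_rail = (at & bf) | (af & bt)     # A.B' + A'.B
    false_rail = (at | bf) & (af | bt)    # dual expression (assumed here)
    return (true_rail, false_rail)


def wddl_xor_composed(a, b):
    """XOR built by instantiating WDDL AND and WDDL OR gates."""
    return wddl_or(wddl_and(a, wddl_not(b)), wddl_and(wddl_not(a), b))


if __name__ == "__main__":
    for a, b in product((0, 1), repeat=2):
        print(a, b, wddl_xor_direct(encode(a), encode(b)))
    print("precharge:", wddl_xor_direct((0, 0), (0, 0)))
```

Both forms give the same rail values in this model, so the choice between them is a matter of gate count, as stated above.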

Figure 4.5 WDDL XOR gate

With the help of the above basic gates, a Full Adder circuit has been designed by instantiating the WDDL gates designed above. During the implementation of the Blowfish algorithm, a 32-bit XOR gate and a 32-bit Adder circuit are required. They can easily be implemented by instantiating the corresponding lower module 32 times.

CHAPTER 8 FRONT END RESULTS

WDDL OR GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name     : wddlor.ngr
Top Level Output File Name         : wddlor
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
IOs                                : 5

Cell Usage
BELS                               : 2
  LUT3                             : 2
IO Buffers                         : 5
  IBUF                             : 3
  OBUF                             : 2

Device utilization summary
Selected Device                    : 3s250eft256-4
Number of Slices                   : 1 out of 2448    0%
Number of 4 input LUTs             : 2 out of 4896    0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 172     2%

Timing Summary
Speed Grade                        : -4
Maximum combinational path delay   : 6.236ns

Simulation Result

Synthesis Result

WDDL AND GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name     : wddlgates.ngr
Top Level Output File Name         : wddlgates
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
IOs                                : 5

Cell Usage
BELS                               : 2
  LUT3                             : 2
IO Buffers                         : 5
  IBUF                             : 3
  OBUF                             : 2

Device utilization summary
Selected Device                    : 3s250etq144-4
Number of Slices                   : 1 out of 2448    0%
Number of 4 input LUTs             : 2 out of 4896    0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 108     4%

Timing Summary
Speed Grade                        : -4
Maximum combinational path delay   : 6.236ns

Simulation Result

Synthesis Result

WDDL NAND GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name     : wddlnand1.ngr
Top Level Output File Name         : wddlnand1
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
IOs                                : 5

Cell Usage
BELS                               : 2
  LUT3                             : 2
IO Buffers                         : 5
  IBUF                             : 3
  OBUF                             : 2

Device utilization summary
Selected Device                    : 3s500efg320-4
Number of Slices                   : 1 out of 4656    0%
Number of 4 input LUTs             : 2 out of 9312    0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 232     2%

Timing Summary
Speed Grade                        : -4
Maximum combinational path delay   : 6.236ns

Simulation Result

Synthesis Result

WDDL XOR GATE

Simulation Result

Synthesis Result

WDDL XOR GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name     : wddlxorgate.ngr
Top Level Output File Name         : wddlxorgate
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
IOs                                : 5

Cell Usage
BELS                               : 2
  LUT3                             : 2
IO Buffers                         : 5
  IBUF                             : 3
  OBUF                             : 2

Device utilization summary
Selected Device                    : 3s250eft256-4
Number of Slices                   : 1 out of 2448    0%
Number of 4 input LUTs             : 2 out of 4896    0%
Number of IOs                      : 5
Number of bonded IOBs              : 5 out of 172     2%

Timing Summary
Speed Grade                        : -4
Maximum combinational path delay   : 6.236ns

Simulation Result

Synthesis Result

CHAPTER 9 SUMMARY AND CONCLUSION

9.1 Summary

In order to secure ICs against side-channel attacks, especially Differential Power Analysis (DPA), it is necessary to implement the design in a logic style that gives constant power dissipation irrespective of the input combination. WDDL has been shown to have advantages over the other styles and is therefore of great significance. In this dissertation work, an architecture for the Blowfish algorithm is designed and implemented in WDDL style. In this implementation a bottom-up approach is used: the low-level entities are designed first and are later combined to form the entire module. The key scheduling is online. The sub-keys generated for a particular key can be used for the encryption of all the data to be encrypted with that key. The sub-keys are applied in reverse order for the decryption data path. Initially the logic gates are implemented in WDDL, and then the higher modules are designed by instantiating the WDDL gates to form the entire module, thus resulting in constant power dissipation irrespective of the input data combination. The entire design works in two phases, namely the Precharge phase and the Evaluation phase. In the Precharge phase all the signals of the design are zeroed, and during the Evaluation phase the functionality of the design is achieved. This sort of design has been found simple and very effective in thwarting the side-channel attack known as Differential Power Analysis (DPA).

9.2 Conclusion

The crypto processor has been designed for a key size of 448 bits and a plain text of 64 bits. The code for the implementation has been written in VHDL. The functional verification has been done using the Cadence (NC Launch) simulation package and synthesis using RTL Compiler. The backend of the design is done using SOC Encounter. The desired functionality has been achieved according to the specifications. At the output, during the Evaluation phase, there is the same number of transitions for every input, thus resulting in constant power dissipation. During synthesis it was observed that a simple WDDL gate comprises many conventional gates. Therefore the area of the design has grown nearly three-fold compared to the design implemented in conventional CMOS logic; this is the cost of the security incorporated into the IC against Differential Power Analysis (DPA). Due to the constant power dissipation at the output, an attacker cannot apply the DPA scheme to find the secret key being used in the crypto-processor. Thus security against DPA is incorporated into the IC at hardware level by implementing the design in WDDL style, which is quite simple and effective.


  • INDEX
  • CHAPTER 1 INTRODUCTION AND OBJECTIVE
  • CHAPTER 2 REVIEW OF LITERATURE
  • CHAPTER 3 HARDWARE DESCRIPTIVE LANGUAGE (VHDL)
  • CHAPTER 4 SMART CARD OVERVIEW
  • CHAPTER 5 SIDE CHANNEL ATTACKS
  • CHAPTER 6 CONSTANT POWER CONSUMING LOGIC STYLES
Page 29: Implementation of Wddl Gates for Secure Ic Applications Sss 2004 (1)

Here above all the four transitions of CMOS inverter can be distinguished when

monitoring the power supply

63 Dynamic Differential Logic

Dynamic differential logic sometimes also referred to as dual rail with pre-charge

logic fulfills the first condition A differential logic family uses the true and the false

representation of the input and output signals and a dynamic logic family alternates pre-charge

and evaluation phases As a result since both outputs (true and false) are pre-charged to 1

exactly one of the two output nodes evaluates to 0 to have a differential output signal in the

evaluation phase The discharged output node is charged to 1 in the following pre-charge phase

to pre-charge both outputs to 1 In other words every signal transition including the events in

which the input signals remain constant is represented with an actual switching event in

which the logic gate charges a capacitance All the logic families that have been introduced to

thwart the differential power analysis (DPA) by using dynamic differential logic in the

following techniques

1 Sense Amplifier Based Logic (SABL) and

2 Wave Dynamic Differential Logic (WDDL) gates

631 Sense Amplifier Based logic (SABL)

SABL has its main advantage that it has balanced input and output nodes and that all

internal nodes connect to an output The output capacitances can be balanced Systematic

methods have been developed to make sure that both branches of the differential pull down

network are balanced and that no memory effects are present in the network Sense Amplifier

Based logic is illustrated as

29

Sense Amplifier Based Logic

ANDNAND gate

This circuit style does require however a full custom characterization and layout It also

suffers from a high clock load common to all dynamic logic gates

632 Wave Dynamic Differential Logic Gates (WDDL)

WDDL logic can be implemented with static CMOS logic Static CMOS

standard cells are combined to form secure compound standard cells

which have a reduced power signature WDDL has many advantages It can

be readily implemented from an existing standard cell library The design

flow is fully supported with accurate EDA library files that come directly

from the vendor WDDL also results in a dynamic differential logic with only

a small load capacitance on the pre-charge control signal and with the low

power consumption and the high noise margins of static CMOS

Advantages of WDDL logic style are as follows

30

A major advantage of the proposed logic style is that it can be incorporated by the common

Electronic Design Automation (EDA) tool flow

No special design rules are involved in the interconnection of WDDL gates

The switching factor of WDDL is 100 A WDDL gate consists of a parallel

combination of two positive complementary gates one calculating the

true output using the true inputs the other the false output using the

false inputs A positive gate produces a zero output for an all zero input

The AND gate and the OR gate are examples of positive gates A

complementary gate sometimes also referred to as a dual gate

expresses the false output of the original logic gate using the false

inputs of the original gate The AND gate fed with true input signals and

the OR gate fed with false input signals are two dual gates Fig shows

the WDDL AND gate and the WDDL OR gate In the evaluation phase

each input signal is differential and the WDDL gate calculates its

differential output In the pre-charge phase the inputs to the WDDL gate

are set at 0 This puts the output of the gate at 0 A module in WDDL

pre-charges without distributing the pre-charge signal to each individual

gate During the pre-charge phase the input vector of the combinatorial

logic is set at all 0s Each individual gate will eventually have all its

inputs at 0 evaluate its output to 0 and pass this 0 value to the next

gate One could say that the pre-charge signal travels over the

combinatorial logic as a 0-wave hence WDDL There are several ways

to launch to pre-charge wave In Fig a pre-charge operator is inserted

at the start of every combinatorial logic tree ie the inputs of the

encryption module and the outputs of the registers They produce an all-

zero output in the pre-charge phase (clk-signal high) but let the

31

differential signal through during the evaluation phase (clk-signal low)

Fig

ure WDDL Pre-charge wave generationCHAPTER 7

WDDL GATESThe methodology used in the project is bottom-up approach Lower

modules are designed and later integrated to form larger modules whose further integration

leads to the final top module As it is a fact that logic gates form lower level modules

initially logic gates required for the design are implemented in WDDL style WDDL

demands a parallel combination of two positive complementary gates one calculating the

true value and the other negative value The logic gates like OR AND XOR have been

implemented Besides there is even implementation of Full Adder 32-bit XOR

etc71WDDL OR gateA WDDL OR gate is constructed by considering conventional

OR gate in parallel to its complementary gate ie AND gate as shown in the following

32

figure Figure

41 WDDL OR GateThe NAND and NOT gates used act as Precharge operator injecting

signal lsquo0rsquo into the gates when lsquoclkrsquo signal is high ie during the Precharge phase72

WDDL AND gateA WDDL AND gate is constructed by considering conventional

AND gate in parallel to its complementary gate ie OR gate as shown in the following

33

figure Figure

42 WDDL AND Gate73 WDDL NAND gateA WDDL NAND gate is constructed by

considering conventional AND gate in parallel to its complementary gate ie OR gate as

shown in the following figure

34

Figure

43 WDDL NAND Gate74 WDDL NOR gateA WDDL NOR gate is constructed by

considering conventional OR gate in parallel to its complementary gate ie AND gate as

shown in the following figure

35

Figure 44 WDDL

NOR Gate 75 WDDL XOR gate XOR function can be implemented by the

Boolean Equation A B = ABrsquo + ArsquoB Therefore XOR Gate is implemented

in terms of AND gate and OR gate as shown in the figure 44 It can also be implemented

by instantiating a WDDL AND gate and WDDL OR gate But the number of gates

involved in the latter one is greater than the former one Therefore the first method of

implementation is followed rather than the second one

36

Figure 45

WDDL XOR gateWith the help of the above basic gates Full adder circuit has been

designed by instantiating the above designed WDDL gates During the implementation of

the Blowfish algorithm a 32-bit XOR gate and 32-bit Adder circuit are required They can

be easily implemented by instantiating the corresponding lower module 32 number of

timesCHAPTER 8 FRONT END

RESULTSWDDL OR GATESynthesis

Report==========================================================

= Final Report

===========================================================Final

ResultsRTL Top Level Output File Name wddlorngrTop Level Output File Name

wddlorOutput Format NGCOptimization Goal SpeedKeep

Hierarchy NODesign Statistics IOs 5Cell Usage

BELS 2 LUT3 2 IO Buffers 5

37

IBUF 3 OBUF

2============================================================Devi

ce utilization summary---------------------------Selected Device 3s250eft256-4 Number

of Slices 1 out of 2448 0 Number of 4 input LUTs 2

out of 4896 0 Number of IOs 5 Number of bonded IOBs

5 out of 172 2 Timing Summary---------------Speed Grade -4Maximum

combinational path delay 6236nsSimulation Result

S

ynthesis Result

38

WDD

L AND GATESynthesis

Report==========================================================

== Final Report

============================================================Fina

l ResultsRTL Top Level Output File Name wddlgatesngrTop Level Output File

Name wddlgatesOutput Format NGCOptimization Goal

SpeedKeep Hierarchy NODesign Statistics IOs

5Cell Usage BELS 2 LUT3 2 IO Buffers

5 IBUF 3 OBUF

2===========================================================Devic

e utilization summary---------------------------Selected Device 3s250etq144-4 Number

of Slices 1 out of 2448 0 Number of 4 input LUTs 2

out of 4896 0 Number of IOs 5 Number of bonded IOBs

5 out of 108 4 Timing Summary---------------Speed Grade -4Maximum

combinational path delay 6236nsSimulation Result

39

Sy

nthesis Result

WDDL NAND GATESynthesis

Report==========================================================

== Final Report

============================================================Fina

l ResultsRTL Top Level Output File Name wddlnand1ngrTop Level Output File

Name wddlnand1Output Format NGCOptimization Goal

SpeedKeep Hierarchy NODesign Statistics IOs

5Cell Usage BELS 2 LUT3 2 IO Buffers

40

5 IBUF 3 OBUF

2============================================================Devi

ce utilization summarySelected Device 3s500efg320-4 Number of Slices

1 out of 4656 0 Number of 4 input LUTs 2 out of 9312 0

Number of IOs 5 Number of bonded IOBs 5 out of 232

2 Timing SummarySpeed Grade -4Maximum combinational path delay

6236nsSimulation Result

Synthesis Result

WD

41

DL XOR GATESimulation Result

Synthesis Result

WDDL XOR GATESynthesis

Report==========================================================

== Final Report

===========================================================Final

42

ResultsRTL Top Level Output File Name wddlxorgatengrTop Level Output File

Name wddlxorgateOutput Format NGCOptimization Goal

SpeedKeep Hierarchy NODesign Statistics IOs

5Cell Usage BELS 2 LUT3 2 IO Buffers

5 IBUF 3 OBUF

2============================================================Devi

ce utilization summary---------------------------Selected Device 3s250eft256-4 Number

of Slices 1 out of 2448 0 Number of 4 input LUTs 2

out of 4896 0 Number of IOs 5 Number of bonded IOBs

5 out of 172 2 Timing Summary---------------Speed Grade -4Maximum

combinational path delay 6236nsSimulation Result

Synthesis Result

43



Figure: Sense Amplifier Based Logic AND/NAND gate

This circuit style does, however, require full-custom characterization and layout. It also suffers from the high clock load common to all dynamic logic gates.

6.3.2 Wave Dynamic Differential Logic (WDDL) Gates

WDDL can be implemented with static CMOS logic. Static CMOS standard cells are combined to form secure compound standard cells, which have a reduced power signature. WDDL has many advantages. It can be readily implemented from an existing standard cell library. The design flow is fully supported with accurate EDA library files that come directly from the vendor. WDDL also results in a dynamic differential logic with only a small load capacitance on the pre-charge control signal, and with the low power consumption and the high noise margins of static CMOS.

Advantages of the WDDL logic style are as follows:

• A major advantage of the proposed logic style is that it can be incorporated into the common Electronic Design Automation (EDA) tool flow.

• No special design rules are involved in the interconnection of WDDL gates.

• The switching factor of WDDL is 100%.

A WDDL gate consists of a parallel combination of two positive complementary gates, one calculating the true output using the true inputs, the other the false output using the false inputs. A positive gate produces a zero output for an all-zero input.

The AND gate and the OR gate are examples of positive gates. A complementary gate, sometimes also referred to as a dual gate, expresses the false output of the original logic gate using the false inputs of the original gate. The AND gate fed with true input signals and the OR gate fed with false input signals are two dual gates. The figure shows the WDDL AND gate and the WDDL OR gate. In the evaluation phase, each input signal is differential and the WDDL gate calculates its differential output. In the pre-charge phase, the inputs to the WDDL gate are set at 0. This puts the output of the gate at 0.

A module in WDDL pre-charges without distributing the pre-charge signal to each individual gate. During the pre-charge phase, the input vector of the combinatorial logic is set at all 0s. Each individual gate will eventually have all its inputs at 0, evaluate its output to 0, and pass this 0 value to the next gate. One could say that the pre-charge signal travels over the combinatorial logic as a 0-wave, hence the name Wave Dynamic Differential Logic.

There are several ways to launch the pre-charge wave. In the figure, a pre-charge operator is inserted at the start of every combinatorial logic tree, i.e. at the inputs of the encryption module and at the outputs of the registers. These operators produce an all-zero output in the pre-charge phase (clk signal high) but let the differential signal through during the evaluation phase (clk signal low).
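The two-phase behaviour described above can be checked with a small behavioural model. The following Python sketch is only an illustration and is not the project's VHDL: each rail is modelled as a 0/1 integer, and the function names (precharge, wddl_and, wddl_or, rising_rails) are hypothetical. It counts how many output rails rise between the pre-charge phase and the evaluation phase for every input combination.

```python
from itertools import product


def precharge(clk, a, a_bar):
    """Pre-charge operator: force both rails to 0 while clk is high."""
    if clk:
        return 0, 0
    return a, a_bar


def wddl_and(a, a_bar, b, b_bar):
    """WDDL AND: AND on the true rails, dual OR on the false rails."""
    return a & b, a_bar | b_bar


def wddl_or(a, a_bar, b, b_bar):
    """WDDL OR: OR on the true rails, dual AND on the false rails."""
    return a | b, a_bar & b_bar


def rising_rails(gate, a, b):
    """Count output rails that go 0 -> 1 from pre-charge to evaluation."""
    # Pre-charge phase (clk high): every input rail is forced to 0.
    pre = gate(*precharge(1, a, 1 - a), *precharge(1, b, 1 - b))
    # Evaluation phase (clk low): the differential inputs pass through.
    ev = gate(*precharge(0, a, 1 - a), *precharge(0, b, 1 - b))
    assert pre == (0, 0)  # positive gates give 0 for all-zero inputs
    return sum(1 for p, e in zip(pre, ev) if p == 0 and e == 1)


for gate in (wddl_and, wddl_or):
    counts = [rising_rails(gate, a, b) for a, b in product((0, 1), repeat=2)]
    print(gate.__name__, counts)
```

In this model exactly one of the two output rails rises in every evaluation phase, whatever the inputs are, which is what a switching factor of 100% means for a differential gate.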

Figure: WDDL pre-charge wave generation

CHAPTER 7 WDDL GATES

The methodology used in the project is the bottom-up approach. Lower modules are designed and later integrated to form larger modules, whose further integration leads to the final top module. Since logic gates form the lowest-level modules, the logic gates required for the design are implemented first in WDDL style. WDDL demands a parallel combination of two positive complementary gates, one calculating the true value and the other the false value. The OR, AND and XOR logic gates have been implemented, along with a full adder, a 32-bit XOR, etc.

7.1 WDDL OR gate

A WDDL OR gate is constructed by placing a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in the following figure.

Figure 4.1 WDDL OR Gate

The NAND and NOT gates used act as the Precharge operator, injecting the signal '0' into the gates when the 'clk' signal is high, i.e. during the Precharge phase.

7.2 WDDL AND gate

A WDDL AND gate is constructed by placing a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure.

Figure 4.2 WDDL AND Gate

7.3 WDDL NAND gate

A WDDL NAND gate is constructed from a conventional AND gate in parallel with its complementary gate, i.e. an OR gate, as shown in the following figure. In WDDL, the inversion itself is obtained by exchanging the true and false outputs, so no separate inverter is needed.

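Figure 4.3 WDDL NAND Gate

7.4 WDDL NOR gate

A WDDL NOR gate is constructed from a conventional OR gate in parallel with its complementary gate, i.e. an AND gate, as shown in Figure 4.4. In WDDL, the inversion itself is obtained by exchanging the true and false outputs of the gate pair, so no separate inverter is needed.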
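The relation between the inverting gates and the AND/OR pairs can be sketched in a few lines of Python. This is a behavioural illustration under stated assumptions, not the project's VHDL: it assumes the usual WDDL convention that inversion is realized by exchanging the true and false rails, and the names swap, wddl_nand and wddl_nor are hypothetical.

```python
from itertools import product


def wddl_and(a, a_bar, b, b_bar):
    """AND on the true rails, dual OR on the false rails."""
    return a & b, a_bar | b_bar


def wddl_or(a, a_bar, b, b_bar):
    """OR on the true rails, dual AND on the false rails."""
    return a | b, a_bar & b_bar


def swap(pair):
    """WDDL inversion (assumed convention): exchange the two rails."""
    true_rail, false_rail = pair
    return false_rail, true_rail


def wddl_nand(a, a_bar, b, b_bar):
    return swap(wddl_and(a, a_bar, b, b_bar))


def wddl_nor(a, a_bar, b, b_bar):
    return swap(wddl_or(a, a_bar, b, b_bar))


# Evaluation phase: the true rail carries NAND/NOR, the false rail AND/OR.
for a, b in product((0, 1), repeat=2):
    nand_t, nand_f = wddl_nand(a, 1 - a, b, 1 - b)
    nor_t, nor_f = wddl_nor(a, 1 - a, b, 1 - b)
    assert nand_t == 1 - (a & b) and nand_f == (a & b)
    assert nor_t == 1 - (a | b) and nor_f == (a | b)

# Pre-charge phase: all-zero inputs still give all-zero outputs.
assert wddl_nand(0, 0, 0, 0) == (0, 0)
assert wddl_nor(0, 0, 0, 0) == (0, 0)
```

Because the swap involves no logic gate, the all-zero pre-charge value passes through the inverting gates unchanged, so the 0-wave is not interrupted.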

Figure 4.4 WDDL NOR Gate

7.5 WDDL XOR gate

The XOR function can be expressed by the Boolean equation A ⊕ B = AB' + A'B. The XOR gate is therefore implemented directly in terms of AND and OR gates, as shown in Figure 4.5. It can also be implemented by instantiating a WDDL AND gate and a WDDL OR gate, but the number of gates involved in the latter approach is greater than in the former. Therefore the first method of implementation is followed rather than the second.
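The Boolean equation above can be turned into a behavioural sketch. The Python below is an illustration only, not the project's VHDL, and the names wddl_xor, wddl_xor_word, to_rails and from_rails are hypothetical. The true rail evaluates AB' + A'B directly from the differential inputs; the false rail is assumed to be the dual network (AND and OR exchanged, fed with the complementary rails), which evaluates to XNOR. The word-level helper repeats the one-bit gate once per bit, in the way the 32-bit XOR mentioned earlier is built.

```python
from itertools import product


def wddl_xor(a, a_bar, b, b_bar):
    """One-bit WDDL XOR built only from AND and OR on the rails."""
    true_rail = (a & b_bar) | (a_bar & b)    # A.B' + A'.B
    false_rail = (a | b_bar) & (a_bar | b)   # dual network, equals XNOR
    return true_rail, false_rail


def to_rails(value, width=32):
    """Split an integer into (true, false) rail pairs, LSB first."""
    return [((value >> i) & 1, 1 - ((value >> i) & 1)) for i in range(width)]


def from_rails(pairs):
    """Rebuild an integer from the true rails of the pairs."""
    return sum(t << i for i, (t, _) in enumerate(pairs))


def wddl_xor_word(xs, ys):
    """Word-wide XOR: one wddl_xor instance per bit position."""
    return [wddl_xor(xt, xf, yt, yf) for (xt, xf), (yt, yf) in zip(xs, ys)]


for a, b in product((0, 1), repeat=2):
    t, f = wddl_xor(a, 1 - a, b, 1 - b)
    assert t == a ^ b and f == 1 - (a ^ b)

# All-zero (pre-charge) inputs give an all-zero output.
assert wddl_xor(0, 0, 0, 0) == (0, 0)

x, y = 0xDEADBEEF, 0x12345678
assert from_rails(wddl_xor_word(to_rails(x), to_rails(y))) == x ^ y
```

No inverter appears in either network because the complemented inputs are already available as the false rails.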

Figure 4.5 WDDL XOR gate

With the help of the above basic gates, a full adder circuit has been designed by instantiating the WDDL gates designed above. During the implementation of the Blowfish algorithm, a 32-bit XOR gate and a 32-bit adder circuit are required. They can be easily implemented by instantiating the corresponding lower module 32 times.

CHAPTER 8 FRONT END RESULTS

WDDL OR GATE

Synthesis Report

===========================================================
Final Report
===========================================================
Final Results
RTL Top Level Output File Name : wddlor.ngr
Top Level Output File Name : wddlor
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 5

Cell Usage
BELS : 2
    LUT3 : 2
IO Buffers : 5
    IBUF : 3
    OBUF : 2
===========================================================
Device utilization summary
---------------------------
Selected Device : 3s250eft256-4
Number of Slices : 1 out of 2448 (0%)
Number of 4 input LUTs : 2 out of 4896 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 172 (2%)

Timing Summary
---------------
Speed Grade : -4
Maximum combinational path delay : 6.236ns

Figure: Simulation Result

Figure: Synthesis Result

WDDL AND GATE

Synthesis Report

===========================================================
Final Report
===========================================================
Final Results
RTL Top Level Output File Name : wddlgates.ngr
Top Level Output File Name : wddlgates
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 5

Cell Usage
BELS : 2
    LUT3 : 2
IO Buffers : 5
    IBUF : 3
    OBUF : 2
===========================================================
Device utilization summary
---------------------------
Selected Device : 3s250etq144-4
Number of Slices : 1 out of 2448 (0%)
Number of 4 input LUTs : 2 out of 4896 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 108 (4%)

Timing Summary
---------------
Speed Grade : -4
Maximum combinational path delay : 6.236ns

Figure: Simulation Result

Figure: Synthesis Result

WDDL NAND GATE

Synthesis Report

===========================================================
Final Report
===========================================================
Final Results
RTL Top Level Output File Name : wddlnand1.ngr
Top Level Output File Name : wddlnand1
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 5

Cell Usage
BELS : 2
    LUT3 : 2
IO Buffers : 5
    IBUF : 3
    OBUF : 2
===========================================================
Device utilization summary
---------------------------
Selected Device : 3s500efg320-4
Number of Slices : 1 out of 4656 (0%)
Number of 4 input LUTs : 2 out of 9312 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 232 (2%)

Timing Summary
---------------
Speed Grade : -4
Maximum combinational path delay : 6.236ns

Figure: Simulation Result

Figure: Synthesis Result

WDDL XOR GATE

Figure: Simulation Result

Figure: Synthesis Result

WDDL XOR GATE

Synthesis Report

===========================================================
Final Report
===========================================================
Final Results
RTL Top Level Output File Name : wddlxorgate.ngr
Top Level Output File Name : wddlxorgate
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 5

Cell Usage
BELS : 2
    LUT3 : 2
IO Buffers : 5
    IBUF : 3
    OBUF : 2
===========================================================
Device utilization summary
---------------------------
Selected Device : 3s250eft256-4
Number of Slices : 1 out of 2448 (0%)
Number of 4 input LUTs : 2 out of 4896 (0%)
Number of IOs : 5
Number of bonded IOBs : 5 out of 172 (2%)

Timing Summary
---------------
Speed Grade : -4
Maximum combinational path delay : 6.236ns

Figure: Simulation Result

Figure: Synthesis Result
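The utilization percentages in the reports above (0, 2 and 4) can be cross-checked from the raw counts. The short Python sketch below assumes that the synthesis tool prints only the integer part of each percentage; this rule is inferred from the reported figures and is not stated in the reports. The function name utilization_percent is hypothetical.

```python
def utilization_percent(used, available):
    """Integer part of the utilization percentage (assumed reporting rule)."""
    return (100 * used) // available


# (resource, used, available) triples copied from the reports above.
rows = [
    ("slices, 3s250e", 1, 2448),
    ("4 input LUTs, 3s250e", 2, 4896),
    ("bonded IOBs, 3s250eft256", 5, 172),
    ("bonded IOBs, 3s250etq144", 5, 108),
    ("bonded IOBs, 3s500efg320", 5, 232),
]

for name, used, available in rows:
    print(f"{name}: {used} out of {available} = "
          f"{utilization_percent(used, available)}%")
```

Under this assumption one slice and two LUTs round down to 0% on every device, while the five bonded IOBs give 2%, 4% and 2% on the three packages, matching the reports.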

CHAPTER 9 SUMMARY AND CONCLUSION

9.1 Summary

In order to secure ICs against side-channel attacks, especially Differential Power Analysis (DPA), it is necessary to implement the design in a logic style that yields constant power dissipation irrespective of the input combination. WDDL has proved advantageous over the other logic styles and is therefore of great significance. In this dissertation work, an architecture for the Blowfish algorithm is designed and implemented in WDDL style. A bottom-up approach is used in this implementation: the low-level entities are designed first and are later combined to form the entire module. The key scheduling is online. The sub-keys generated for a particular key can be used to encrypt all the data to be encrypted with that key. The sub-keys are applied in reverse order for the decryption data path. Initially, logic gates are implemented in WDDL, and then the higher modules are designed by instantiating the WDDL gates to form the entire module, resulting in constant power dissipation irrespective of the input data combination. The entire design works in two phases, namely the Precharge phase and the Evaluation phase. In the Precharge phase all the signals of the design are zeroed, and during the Evaluation phase the functionality of the design is achieved. This sort of design has been found simple and very effective in thwarting the side-channel attack known as Differential Power Analysis (DPA).

9.2 Conclusion

The crypto processor has been designed for a key size of 448 bits and a plain text of 64 bits. The code for the implementation has been written in VHDL. Functional verification has been done using the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The back end of the design has been done using SOC Encounter.

The desired functionality has been achieved according to the specifications. In the output, the number of transitions during the Evaluation phase has been the same in every case, resulting in constant power dissipation. During synthesis it has been observed that a simple WDDL gate comprises many conventional gates. The area of the design has therefore grown nearly three-fold compared to the design implemented in conventional CMOS logic; this is the cost of the security incorporated into the IC against Differential Power Analysis (DPA). Due to the constant power dissipation at the output, a hacker cannot apply the DPA scheme to find the secret key that is being used in the crypto-processor. Thus security against DPA is incorporated into the IC at the hardware level by implementing the design in WDDL style, which is quite simple and effective.





differential signal through during the evaluation phase (clk-signal low)

Fig

ure WDDL Pre-charge wave generationCHAPTER 7

WDDL GATESThe methodology used in the project is bottom-up approach Lower

modules are designed and later integrated to form larger modules whose further integration

leads to the final top module As it is a fact that logic gates form lower level modules

initially logic gates required for the design are implemented in WDDL style WDDL

demands a parallel combination of two positive complementary gates one calculating the

true value and the other negative value The logic gates like OR AND XOR have been

implemented Besides there is even implementation of Full Adder 32-bit XOR

etc71WDDL OR gateA WDDL OR gate is constructed by considering conventional

OR gate in parallel to its complementary gate ie AND gate as shown in the following

32

figure Figure

41 WDDL OR GateThe NAND and NOT gates used act as Precharge operator injecting

signal lsquo0rsquo into the gates when lsquoclkrsquo signal is high ie during the Precharge phase72

WDDL AND gateA WDDL AND gate is constructed by considering conventional

AND gate in parallel to its complementary gate ie OR gate as shown in the following

33

figure Figure

42 WDDL AND Gate73 WDDL NAND gateA WDDL NAND gate is constructed by

considering conventional AND gate in parallel to its complementary gate ie OR gate as

shown in the following figure

34

Figure

43 WDDL NAND Gate74 WDDL NOR gateA WDDL NOR gate is constructed by

considering conventional OR gate in parallel to its complementary gate ie AND gate as

shown in the following figure

35

Figure 44 WDDL

NOR Gate 75 WDDL XOR gate XOR function can be implemented by the

Boolean Equation A B = ABrsquo + ArsquoB Therefore XOR Gate is implemented

in terms of AND gate and OR gate as shown in the figure 44 It can also be implemented

by instantiating a WDDL AND gate and WDDL OR gate But the number of gates

involved in the latter one is greater than the former one Therefore the first method of

implementation is followed rather than the second one

36

Figure 45

WDDL XOR gateWith the help of the above basic gates Full adder circuit has been

designed by instantiating the above designed WDDL gates During the implementation of

the Blowfish algorithm a 32-bit XOR gate and 32-bit Adder circuit are required They can

be easily implemented by instantiating the corresponding lower module 32 number of

timesCHAPTER 8 FRONT END

RESULTSWDDL OR GATESynthesis

Report==========================================================

Final Report

Final Results
RTL Top Level Output File Name : wddlor.ngr
Top Level Output File Name     : wddlor
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO

Design Statistics
IOs                            : 5

Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2

Device utilization summary
Selected Device                : 3s250eft256-4
Number of Slices               : 1 out of 2448   0%
Number of 4 input LUTs         : 2 out of 4896   0%
Number of IOs                  : 5
Number of bonded IOBs          : 5 out of 172    2%

Timing Summary
Speed Grade                    : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result

WDDL AND GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name : wddlgates.ngr
Top Level Output File Name     : wddlgates
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO

Design Statistics
IOs                            : 5

Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2

Device utilization summary
Selected Device                : 3s250etq144-4
Number of Slices               : 1 out of 2448   0%
Number of 4 input LUTs         : 2 out of 4896   0%
Number of IOs                  : 5
Number of bonded IOBs          : 5 out of 108    4%

Timing Summary
Speed Grade                    : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result

WDDL NAND GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name : wddlnand1.ngr
Top Level Output File Name     : wddlnand1
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO

Design Statistics
IOs                            : 5

Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2

Device utilization summary
Selected Device                : 3s500efg320-4
Number of Slices               : 1 out of 4656   0%
Number of 4 input LUTs         : 2 out of 9312   0%
Number of IOs                  : 5
Number of bonded IOBs          : 5 out of 232    2%

Timing Summary
Speed Grade                    : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result

WDDL XOR GATE

Simulation Result

Synthesis Result

WDDL XOR GATE

Synthesis Report

Final Report

Final Results
RTL Top Level Output File Name : wddlxorgate.ngr
Top Level Output File Name     : wddlxorgate
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO

Design Statistics
IOs                            : 5

Cell Usage
BELS                           : 2
  LUT3                         : 2
IO Buffers                     : 5
  IBUF                         : 3
  OBUF                         : 2

Device utilization summary
Selected Device                : 3s250eft256-4
Number of Slices               : 1 out of 2448   0%
Number of 4 input LUTs         : 2 out of 4896   0%
Number of IOs                  : 5
Number of bonded IOBs          : 5 out of 172    2%

Timing Summary
Speed Grade                    : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result

CHAPTER 9 SUMMARY AND CONCLUSION

9.1 Summary

In order to secure ICs against side-channel attacks, especially Differential Power Analysis (DPA), the design must be implemented in a logic style whose power dissipation is constant irrespective of the input combination. WDDL has proved advantageous over the other such styles and is therefore of great significance. In this dissertation work, an architecture for the Blowfish algorithm is designed and implemented in WDDL style. A bottom-up approach is used: the low-level entities are designed first and are later combined to form the entire module. The key scheduling is online. The sub-keys generated for a particular key can be used to encrypt all the data that is to be encrypted with that key. The sub-keys are applied in reverse order for the decryption data path.

Initially the logic gates are implemented in WDDL, and the higher modules are then designed by instantiating the WDDL gates to form the entire module, resulting in constant power dissipation irrespective of the input data combination. The entire design works in two phases, namely the Precharge phase and the Evaluation phase. In the Precharge phase all the signals of the design are zeroed, and during the Evaluation phase the functionality of the design is achieved. This sort of design has been found simple and very effective in thwarting the side-channel attack known as Differential Power Analysis (DPA).

9.2 Conclusion

The crypto processor has been designed for a key size of 448 bits and a plain text of 64 bits. The code for the implementation has been written in VHDL. Functional verification has been done using the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The back end of the design is done using SOC Encounter.

The desired functionality has been achieved according to the specifications. In the output, during the Evaluation phase, there has been the same number of transitions, resulting in constant power dissipation. During synthesis it was observed that a simple WDDL gate comprises many conventional gates. Therefore the area of the design has grown nearly three-fold compared to the design implemented in conventional CMOS logic; this is the cost of the security incorporated into the IC against Differential Power Analysis (DPA). Due to the constant power dissipation at the output, a hacker cannot apply the DPA scheme to find the secret key that is being used in the crypto-processor. Thus security against DPA is incorporated into the IC at the hardware level by implementing the design in WDDL style, which is quite simple and effective.
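The claim that the number of transitions in the Evaluation phase does not depend on the data can be checked on a simplified model. The sketch below is written in Python rather than VHDL, and every name in it (`enc`, `w_and`, `w_or`, `w_xor`, `full_adder`, `ripple_adder`, `rising_transitions`) is hypothetical. It builds a 32-bit ripple-carry adder from dual-rail gates and counts the rails that rise from the precharged value 0. The adder structure is an assumption, and the count is an idealised tally of logical transitions, not a power measurement.

```python
from itertools import product


def enc(bit):
    """Dual-rail encoding: (true rail, false rail)."""
    return (bit, 1 - bit)


def w_and(a, b):
    return (a[0] & b[0], a[1] | b[1])


def w_or(a, b):
    return (a[0] | b[0], a[1] & b[1])


def w_xor(a, b):
    return ((a[0] & b[1]) | (a[1] & b[0]), (a[0] & b[0]) | (a[1] & b[1]))


def full_adder(a, b, cin):
    """Return every dual-rail node of one adder stage: [p, g, t, sum, cout]."""
    p = w_xor(a, b)
    g = w_and(a, b)
    t = w_and(p, cin)
    return [p, g, t, w_xor(p, cin), w_or(g, t)]


def ripple_adder(x, y, width=32):
    """Chain `width` full adders; return the sum modulo 2**width and all nodes."""
    carry, nodes, total = enc(0), [], 0
    for i in range(width):
        stage = full_adder(enc((x >> i) & 1), enc((y >> i) & 1), carry)
        total |= stage[3][0] << i
        carry = stage[4]
        nodes.extend(stage)
    return total, nodes


def rising_transitions(nodes):
    """All rails start at 0 after precharge, so each rail at 1 is one 0 -> 1 edge."""
    return sum(t + f for t, f in nodes)


# One full adder, all eight input combinations.
wddl_counts = {rising_transitions(full_adder(enc(a), enc(b), enc(c)))
               for a, b, c in product((0, 1), repeat=3)}
# The same nodes seen as single-rail logic (true rails only).
single_counts = {sum(n[0] for n in full_adder(enc(a), enc(b), enc(c)))
                 for a, b, c in product((0, 1), repeat=3)}
```

In this model every dual-rail node raises exactly one of its two rails, so `wddl_counts` contains a single value, while `single_counts` contains several. For the 32-bit adder the count is five nodes per stage times 32 stages for any pair of operands.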


Page 35: Implementation of Wddl Gates for Secure Ic Applications Sss 2004 (1)

Figure

43 WDDL NAND Gate74 WDDL NOR gateA WDDL NOR gate is constructed by

considering conventional OR gate in parallel to its complementary gate ie AND gate as

shown in the following figure

35

Figure 44 WDDL

NOR Gate 75 WDDL XOR gate XOR function can be implemented by the

Boolean Equation A B = ABrsquo + ArsquoB Therefore XOR Gate is implemented

in terms of AND gate and OR gate as shown in the figure 44 It can also be implemented

by instantiating a WDDL AND gate and WDDL OR gate But the number of gates

involved in the latter one is greater than the former one Therefore the first method of

implementation is followed rather than the second one

36

Figure 45

WDDL XOR gateWith the help of the above basic gates Full adder circuit has been

designed by instantiating the above designed WDDL gates During the implementation of

the Blowfish algorithm a 32-bit XOR gate and 32-bit Adder circuit are required They can

be easily implemented by instantiating the corresponding lower module 32 number of

timesCHAPTER 8 FRONT END

RESULTSWDDL OR GATESynthesis

Report==========================================================

= Final Report

===========================================================Final

ResultsRTL Top Level Output File Name wddlorngrTop Level Output File Name

wddlorOutput Format NGCOptimization Goal SpeedKeep

Hierarchy NODesign Statistics IOs 5Cell Usage

BELS 2 LUT3 2 IO Buffers 5

37

IBUF 3 OBUF

2============================================================Devi

ce utilization summary---------------------------Selected Device 3s250eft256-4 Number

of Slices 1 out of 2448 0 Number of 4 input LUTs 2

out of 4896 0 Number of IOs 5 Number of bonded IOBs

5 out of 172 2 Timing Summary---------------Speed Grade -4Maximum

combinational path delay 6236nsSimulation Result

S

ynthesis Result

38

WDD

L AND GATESynthesis

Report==========================================================

== Final Report

============================================================Fina

l ResultsRTL Top Level Output File Name wddlgatesngrTop Level Output File

Name wddlgatesOutput Format NGCOptimization Goal

SpeedKeep Hierarchy NODesign Statistics IOs

5Cell Usage BELS 2 LUT3 2 IO Buffers

5 IBUF 3 OBUF

2===========================================================Devic

e utilization summary---------------------------Selected Device 3s250etq144-4 Number

of Slices 1 out of 2448 0 Number of 4 input LUTs 2

out of 4896 0 Number of IOs 5 Number of bonded IOBs

5 out of 108 4 Timing Summary---------------Speed Grade -4Maximum

combinational path delay 6236nsSimulation Result

39

Sy

nthesis Result

WDDL NAND GATESynthesis

Report==========================================================

== Final Report

============================================================Fina

l ResultsRTL Top Level Output File Name wddlnand1ngrTop Level Output File

Name wddlnand1Output Format NGCOptimization Goal

SpeedKeep Hierarchy NODesign Statistics IOs

5Cell Usage BELS 2 LUT3 2 IO Buffers

40

5 IBUF 3 OBUF

2============================================================Devi

ce utilization summarySelected Device 3s500efg320-4 Number of Slices

1 out of 4656 0 Number of 4 input LUTs 2 out of 9312 0

Number of IOs 5 Number of bonded IOBs 5 out of 232

2 Timing SummarySpeed Grade -4Maximum combinational path delay

6236nsSimulation Result

Synthesis Result

WD

41

DL XOR GATESimulation Result

Synthesis Result

WDDL XOR GATESynthesis

Report==========================================================

== Final Report

===========================================================Final

42

ResultsRTL Top Level Output File Name wddlxorgatengrTop Level Output File

Name wddlxorgateOutput Format NGCOptimization Goal

SpeedKeep Hierarchy NODesign Statistics IOs

5Cell Usage BELS 2 LUT3 2 IO Buffers

5 IBUF 3 OBUF

2============================================================Devi

ce utilization summary---------------------------Selected Device 3s250eft256-4 Number

of Slices 1 out of 2448 0 Number of 4 input LUTs 2

out of 4896 0 Number of IOs 5 Number of bonded IOBs

5 out of 172 2 Timing Summary---------------Speed Grade -4Maximum

combinational path delay 6236nsSimulation Result

Synthesis Result

43

CHAPTER 9 SUMMARY AND CONCLUSION

9.1 Summary

To protect ICs against side-channel attacks, especially Differential Power Analysis (DPA), the design must be implemented in a logic style whose power dissipation is constant irrespective of the input combination. WDDL has proved advantageous over the other such logic styles and is therefore of great significance. In this dissertation work, an architecture for the Blowfish algorithm is designed and implemented in WDDL style. A bottom-up approach is used: the low-level entities are designed first and are later combined to form the entire module. The key scheduling is online. The sub-keys generated for a particular key can be used to encrypt all the data that is to be encrypted with that key. For the decryption data path the sub-keys are supplied in reverse order.

Initially the logic gates are implemented in WDDL, and the higher modules are then designed by instantiating these WDDL gates to form the entire module, resulting in constant power dissipation irrespective of the input data combination. The entire design works in two phases, namely the Precharge phase and the Evaluation phase. In the Precharge phase all the signals of the design are zeroed, and during the Evaluation phase the functionality of the design is achieved. This sort of design has been found simple and very effective in thwarting the side-channel attack known as Differential Power Analysis (DPA).

9.2 Conclusion

The crypto processor has been designed for a key size of 448 bits and a plaintext block of 64 bits. The code for the implementation has been written in VHDL. Functional verification has been done using the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The back end of the design is done using SOC Encounter.

The desired functionality has been achieved according to the specifications. At the output, the number of transitions during the Evaluation phase has been the same in every case, resulting in constant power dissipation. During synthesis it has been observed that a single WDDL gate comprises many conventional gates. The area of the design has therefore grown nearly three-fold compared to the same design implemented in conventional CMOS logic; this is the cost of the security incorporated into the IC against DPA. Because the power dissipation at the output is constant, an attacker cannot apply the DPA scheme to find the secret key used in the crypto processor. Thus security against DPA is incorporated into the IC at hardware level by implementing the design in WDDL style, which is quite simple and effective.
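The bottom-up construction and the two-phase operation summarized above can be illustrated with a small logic-level model. The Python sketch below is not the VHDL written for this work. It is a minimal model under stated assumptions: every signal is a (true, false) rail pair, the Precharge phase drives both rails to 0, and the 32-bit adder is a ripple-carry chain whose final carry is discarded (Blowfish adds modulo 2^32). All function names are hypothetical.

```python
def encode(bit, evaluate=True):
    """Rail pair (true, false) for a bit; both rails are 0 during precharge."""
    return (bit, 1 - bit) if evaluate else (0, 0)

def invert(a):
    """Inversion in a differential style is a swap of the two rails."""
    return (a[1], a[0])

def wddl_and(a, b):
    """True rail: AND of true rails. False rail: OR of false rails."""
    return (a[0] & b[0], a[1] | b[1])

def wddl_or(a, b):
    """True rail: OR of true rails. False rail: AND of false rails."""
    return (a[0] | b[0], a[1] & b[1])

def wddl_xor(a, b):
    """XOR as AB' + A'B, built only from the gates above."""
    return wddl_or(wddl_and(a, invert(b)), wddl_and(invert(a), b))

def wddl_full_adder(a, b, cin):
    """Full adder assembled from the lower-level gates (bottom-up)."""
    p = wddl_xor(a, b)
    s = wddl_xor(p, cin)
    cout = wddl_or(wddl_and(a, b), wddl_and(p, cin))
    return s, cout

def wddl_add32(x, y, evaluate=True):
    """Instantiate the full adder 32 times; returns 32 sum rail pairs, LSB first."""
    carry = encode(0, evaluate)
    pairs = []
    for i in range(32):
        a = encode((x >> i) & 1, evaluate)
        b = encode((y >> i) & 1, evaluate)
        s, carry = wddl_full_adder(a, b, carry)
        pairs.append(s)
    return pairs

def decode(pairs):
    """Read the integer value from the true rails."""
    return sum(t << i for i, (t, _) in enumerate(pairs))

def rising_rails(pairs):
    """Rails at 1 after evaluation; equals the 0-to-1 transitions since precharge."""
    return sum(t + f for t, f in pairs)
```

In this model every evaluation raises exactly one rail of each of the 32 sum pairs, whatever the operands, while the precharge call returns all rails at 0. That is the logic-level property behind the constant number of transitions reported above. The model does not capture routing capacitance or glitches, which a real power measurement would include.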



Figure 44 WDDL NOR Gate

7.5 WDDL XOR gate

The XOR function can be implemented by the Boolean equation $A \oplus B = AB' + A'B$. The XOR gate is therefore implemented in terms of AND gates and OR gates, as shown in Figure 45. It can also be implemented by instantiating a WDDL AND gate and a WDDL OR gate, but the number of gates involved in the latter approach is greater than in the former. Therefore the first method of implementation is followed rather than the second.
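The two ways of building the XOR gate described above can be compared with a short behavioural model. The following Python sketch is an illustration only, not the VHDL used in this work. It assumes that each signal is carried as a (true, false) rail pair and that both rails are 0 during precharge; the function names are hypothetical. The first function writes each rail directly as a sum of products, the second composes the XOR from WDDL AND and WDDL OR gates.

```python
from itertools import product

def encode(bit, evaluate=True):
    """Rail pair (true, false) for a bit; both rails are 0 during precharge."""
    return (bit, 1 - bit) if evaluate else (0, 0)

def invert(a):
    """Inversion is a swap of the two rails."""
    return (a[1], a[0])

def wddl_and(a, b):
    """True rail: AND of true rails. False rail: OR of false rails."""
    return (a[0] & b[0], a[1] | b[1])

def wddl_or(a, b):
    """True rail: OR of true rails. False rail: AND of false rails."""
    return (a[0] | b[0], a[1] & b[1])

def wddl_xor_direct(a, b):
    """First method: each rail written directly with AND and OR operations."""
    q = (a[0] & b[1]) | (a[1] & b[0])      # AB' + A'B
    q_bar = (a[0] & b[0]) | (a[1] & b[1])  # AB + A'B' (the complement, XNOR)
    return (q, q_bar)

def wddl_xor_composed(a, b):
    """Second method: instantiate WDDL AND and WDDL OR gates."""
    return wddl_or(wddl_and(a, invert(b)), wddl_and(invert(a), b))

def check_xor():
    """Both methods must match XOR in evaluation and output (0, 0) in precharge."""
    for x, y in product((0, 1), repeat=2):
        a, b = encode(x), encode(y)
        expected = encode(x ^ y)
        assert wddl_xor_direct(a, b) == expected
        assert wddl_xor_composed(a, b) == expected
    pre = encode(0, evaluate=False)
    assert wddl_xor_direct(pre, pre) == (0, 0)
    assert wddl_xor_composed(pre, pre) == (0, 0)
    return True
```

Both forms use only AND and OR operations on the rails, so an all-zero precharge input propagates to an all-zero output in either case. The sketch checks function only; it makes no statement about the gate counts of the two methods in the synthesized design.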

Figure 45 WDDL XOR gate

With the help of the above basic gates, a full adder circuit has been designed by instantiating the WDDL gates designed above. The implementation of the Blowfish algorithm requires a 32-bit XOR gate and a 32-bit adder circuit. They can easily be implemented by instantiating the corresponding lower-level module 32 times.

CHAPTER 8 FRONT END RESULTS

WDDL OR GATE

Synthesis Report
============================================================
Final Report
============================================================
Final Results
RTL Top Level Output File Name : wddlor.ngr
Top Level Output File Name     : wddlor
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO

Design Statistics
  IOs : 5

Cell Usage
  BELS       : 2
    LUT3     : 2
  IO Buffers : 5
    IBUF     : 3
    OBUF     : 2
============================================================
Device utilization summary
---------------------------
Selected Device : 3s250eft256-4
  Number of Slices       : 1 out of 2448   0%
  Number of 4 input LUTs : 2 out of 4896   0%
  Number of IOs          : 5
  Number of bonded IOBs  : 5 out of 172    2%

Timing Summary
---------------
Speed Grade : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result
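The cell usage above (three input buffers, two output buffers, two LUT3s) can be related to the gate function with a small model. The report does not list the ports, so the following Python sketch rests on an assumption: the cell has single-ended inputs a and b plus a precharge control, and drives the true and complementary outputs. The precharge polarity, the input ordering inside the LUT and all names are hypothetical.

```python
def wddl_or_3in(a, b, precharge):
    """Hypothetical three-input WDDL OR cell; returns (q, q_bar)."""
    if precharge:
        return (0, 0)          # both output rails held at 0 during precharge
    q = a | b
    return (q, 1 - q)          # complementary outputs during evaluation

def lut3_init(func, output_index):
    """Pack one output of a 3-input function into an 8-bit truth table.

    Bit i of the result is the output for a = bit 0 of i, b = bit 1 of i,
    precharge = bit 2 of i (an assumed input ordering).
    """
    init = 0
    for i in range(8):
        a, b, pre = i & 1, (i >> 1) & 1, (i >> 2) & 1
        init |= func(a, b, pre)[output_index] << i
    return init

# One truth table per output: two 3-input look-up tables in total.
Q_INIT = lut3_init(wddl_or_3in, 0)
Q_BAR_INIT = lut3_init(wddl_or_3in, 1)
```

Under this assumption each output is a function of the same three inputs, so each fits in one 3-input look-up table. This would be consistent with a count of two LUT3s and five IOs, although other port arrangements could give the same figures.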

WDDL AND GATE

Synthesis Report
============================================================
Final Report
============================================================
Final Results
RTL Top Level Output File Name : wddlgates.ngr
Top Level Output File Name     : wddlgates
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO

Design Statistics
  IOs : 5

Cell Usage
  BELS       : 2
    LUT3     : 2
  IO Buffers : 5
    IBUF     : 3
    OBUF     : 2
============================================================
Device utilization summary
---------------------------
Selected Device : 3s250etq144-4
  Number of Slices       : 1 out of 2448   0%
  Number of 4 input LUTs : 2 out of 4896   0%
  Number of IOs          : 5
  Number of bonded IOBs  : 5 out of 108    4%

Timing Summary
---------------
Speed Grade : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result

WDDL NAND GATE

Synthesis Report
============================================================
Final Report
============================================================
Final Results
RTL Top Level Output File Name : wddlnand1.ngr
Top Level Output File Name     : wddlnand1
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO

Design Statistics
  IOs : 5

Cell Usage
  BELS       : 2
    LUT3     : 2
  IO Buffers : 5
    IBUF     : 3
    OBUF     : 2
============================================================
Device utilization summary
---------------------------
Selected Device : 3s500efg320-4
  Number of Slices       : 1 out of 4656   0%
  Number of 4 input LUTs : 2 out of 9312   0%
  Number of IOs          : 5
  Number of bonded IOBs  : 5 out of 232    2%

Timing Summary
---------------
Speed Grade : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result

WDDL XOR GATE

Simulation Result

Synthesis Result

WDDL XOR GATE

Synthesis Report
============================================================
Final Report
============================================================
Final Results
RTL Top Level Output File Name : wddlxorgate.ngr
Top Level Output File Name     : wddlxorgate
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO

Design Statistics
  IOs : 5

Cell Usage
  BELS       : 2
    LUT3     : 2
  IO Buffers : 5
    IBUF     : 3
    OBUF     : 2
============================================================
Device utilization summary
---------------------------
Selected Device : 3s250eft256-4
  Number of Slices       : 1 out of 2448   0%
  Number of 4 input LUTs : 2 out of 4896   0%
  Number of IOs          : 5
  Number of bonded IOBs  : 5 out of 172    2%

Timing Summary
---------------
Speed Grade : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result
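The trailing figure on each utilization line of the reports above can be checked by simple arithmetic. The Python sketch below assumes that the figure is the used/available ratio expressed as a percentage and truncated to an integer; this is an inference from the numbers in the reports, and the helper name is hypothetical.

```python
def utilization_percent(used, available):
    """Integer-truncated utilization percentage (assumed reporting convention)."""
    return used * 100 // available

# (used, available) for bonded IOBs, keyed by the device named in each report.
bonded_iobs = {
    "3s250eft256-4": (5, 172),
    "3s250etq144-4": (5, 108),
    "3s500efg320-4": (5, 232),
}

iob_percent = {dev: utilization_percent(u, n) for dev, (u, n) in bonded_iobs.items()}

# One slice and two LUTs are a negligible share of either device.
slice_percent = utilization_percent(1, 2448)
lut_percent = utilization_percent(2, 4896)
```

With truncation the three packages give 2, 4 and 2 for the bonded IOBs, and 0 for slices and LUTs, which agrees with the figures listed in the device utilization summaries.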

Page 39: Implementation of Wddl Gates for Secure Ic Applications Sss 2004 (1)

WDD

L AND GATESynthesis

Report==========================================================

== Final Report

============================================================Fina

l ResultsRTL Top Level Output File Name wddlgatesngrTop Level Output File

Name wddlgatesOutput Format NGCOptimization Goal

SpeedKeep Hierarchy NODesign Statistics IOs

5Cell Usage BELS 2 LUT3 2 IO Buffers

5 IBUF 3 OBUF

2===========================================================Devic

e utilization summary---------------------------Selected Device 3s250etq144-4 Number

of Slices 1 out of 2448 0 Number of 4 input LUTs 2

out of 4896 0 Number of IOs 5 Number of bonded IOBs

5 out of 108 4 Timing Summary---------------Speed Grade -4Maximum

combinational path delay 6236nsSimulation Result

39

Sy

nthesis Result

WDDL NAND GATESynthesis

Report==========================================================

== Final Report

============================================================Fina

l ResultsRTL Top Level Output File Name wddlnand1ngrTop Level Output File

Name wddlnand1Output Format NGCOptimization Goal

SpeedKeep Hierarchy NODesign Statistics IOs

5Cell Usage BELS 2 LUT3 2 IO Buffers

40

5 IBUF 3 OBUF

2============================================================Devi

ce utilization summarySelected Device 3s500efg320-4 Number of Slices

1 out of 4656 0 Number of 4 input LUTs 2 out of 9312 0

Number of IOs 5 Number of bonded IOBs 5 out of 232

2 Timing SummarySpeed Grade -4Maximum combinational path delay

6236nsSimulation Result

Synthesis Result

WD

41

DL XOR GATESimulation Result

Synthesis Result

WDDL XOR GATESynthesis

Report==========================================================

== Final Report

===========================================================Final

42

ResultsRTL Top Level Output File Name wddlxorgatengrTop Level Output File

Name wddlxorgateOutput Format NGCOptimization Goal

SpeedKeep Hierarchy NODesign Statistics IOs

5Cell Usage BELS 2 LUT3 2 IO Buffers

5 IBUF 3 OBUF

2============================================================Devi

ce utilization summary---------------------------Selected Device 3s250eft256-4 Number

of Slices 1 out of 2448 0 Number of 4 input LUTs 2

out of 4896 0 Number of IOs 5 Number of bonded IOBs

5 out of 172 2 Timing Summary---------------Speed Grade -4Maximum

combinational path delay 6236nsSimulation Result

Synthesis Result

43

CHAPTER 9 SUMMARY AND CONCLUSION

9.1 Summary

To secure ICs against side-channel attacks, especially Differential Power Analysis (DPA), the design must be implemented in a logic style whose power dissipation is constant irrespective of the input combination. WDDL has proved advantageous over other such logic styles and is therefore of great significance. In this work, an architecture for the Blowfish algorithm is designed and implemented in WDDL style using a bottom-up approach: the low-level entities are designed first and are later combined to form the entire module. The key scheduling is online. The sub-keys generated for a particular key can be used to encrypt all the data that is to be encrypted with that key, and they are applied in reverse order in the decryption data path.

Logic gates are first implemented in WDDL, and the higher modules are then designed by instantiating these WDDL gates to form the entire module, resulting in constant power dissipation irrespective of the input data combination. The entire design works in two phases, the Precharge phase and the Evaluation phase. In the Precharge phase all the signals of the design are zeroed, and during the Evaluation phase the functionality of the design is achieved. This style of design has been found simple and very effective in thwarting the side-channel attack known as Differential Power Analysis (DPA).

9.2 Conclusion

The crypto processor has been designed for a key size of 448 bits and a plaintext block of 64 bits. The code for the implementation has been written in VHDL. Functional verification was done using the Cadence (NC Launch) simulation package, and synthesis using RTL Compiler. The back end of the design was done using SOC Encounter.

The desired functionality has been achieved according to the specifications. At the output, every Evaluation phase shows the same number of transitions, resulting in constant power dissipation. During synthesis it was observed that even a simple WDDL gate comprises many conventional gates. The area of the design therefore grew nearly three-fold compared with the same design implemented in conventional CMOS logic; this is the cost of the security incorporated into the IC against Differential Power Analysis (DPA). Because the power dissipation at the output is constant, an attacker cannot apply DPA to find the secret key used in the crypto processor. Security against DPA is thus incorporated into the IC at the hardware level by implementing the design in WDDL style, which is quite simple and effective.
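The constant-transition claim above can be illustrated with a small behavioural model. The following is a minimal Python sketch, not the report's VHDL; the function names (`wddl_and`, `wddl_toggles`, `cmos_toggles`) are hypothetical, and a generic WDDL AND gate is assumed purely for illustration. It counts output toggles for a single-rail AND gate and for its WDDL counterpart, where every cycle consists of a Precharge phase (all rails zeroed) followed by an Evaluation phase.

```python
def wddl_and(a, a_n, b, b_n):
    """WDDL AND: true rail is the AND of the true rails,
    false rail is the OR of the false rails."""
    return a & b, a_n | b_n


def wddl_toggles(inputs):
    """Rail toggles per cycle: Precharge -> Evaluation -> Precharge."""
    counts = []
    for a, b in inputs:
        precharged = wddl_and(0, 0, 0, 0)          # all rails zeroed
        evaluated = wddl_and(a, 1 - a, b, 1 - b)   # complementary rails applied
        rising = sum(p != e for p, e in zip(precharged, evaluated))
        falling = rising                           # the same rails return to 0
        counts.append(rising + falling)
    return counts


def cmos_toggles(inputs):
    """Output toggles of a single-rail AND gate between consecutive inputs."""
    outputs = [a & b for a, b in inputs]
    return [prev ^ cur for prev, cur in zip(outputs, outputs[1:])]


# Arbitrary illustrative input sequence (not taken from the report).
sequence = [(1, 1), (0, 1), (1, 1), (1, 1), (0, 0)]
print("single-rail:", cmos_toggles(sequence))
print("WDDL:       ", wddl_toggles(sequence))
```

For this sequence the single-rail gate toggles 1, 1, 0 and 1 times, so its switching activity depends on the data, while the WDDL gate toggles exactly twice in every cycle. The model only counts logic transitions; it does not account for unequal wire loads on the two rails.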


Synthesis Result

WDDL NAND GATE

Synthesis Report

=========================================================================
                              Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : wddlnand1.ngr
Top Level Output File Name     : wddlnand1
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO

Design Statistics
IOs                            : 5

Cell Usage
BELS                           : 2
    LUT3                       : 2
IO Buffers                     : 5
    IBUF                       : 3
    OBUF                       : 2
=========================================================================
Device utilization summary

Selected Device : 3s500efg320-4

Number of Slices       : 1 out of 4656   0%
Number of 4 input LUTs : 2 out of 9312   0%
Number of IOs          : 5
Number of bonded IOBs  : 5 out of 232    2%

Timing Summary

Speed Grade : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result
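The NAND report lists three input buffers, two output buffers and two LUT3 cells, but the gate's port list is not shown in this section. One reading that is consistent with those counts, and it is only an assumption, is two data inputs plus a precharge control driving a differential output pair. The Python sketch below (not the report's VHDL; all names are hypothetical) models such a gate: the inputs are converted to complementary rails, and the NAND is built from one OR and one AND, that is, a WDDL AND with its output rails swapped.

```python
from itertools import product


def to_rails(x, precharge):
    """Single-rail bit -> (true, false) rails; both rails are 0 during precharge."""
    if precharge:
        return 0, 0
    return x, 1 - x


def wddl_nand(a, a_n, b, b_n):
    """Dual-rail NAND from positive gates only (WDDL AND with rails swapped)."""
    q = a_n | b_n   # true rail: NOT(a AND b), by De Morgan
    q_n = a & b     # false rail: a AND b
    return q, q_n


def nand_gate(a, b, precharge):
    """Assumed 3-input / 2-output view of the gate (a, b, precharge -> q, q_n)."""
    a_t, a_f = to_rails(a, precharge)
    b_t, b_f = to_rails(b, precharge)
    return wddl_nand(a_t, a_f, b_t, b_f)


for a, b in product((0, 1), repeat=2):
    print(a, b, "precharge:", nand_gate(a, b, 1), "evaluate:", nand_gate(a, b, 0))
```

Under this assumption each output is a function of three signals, so it would fit in one 3-input LUT, which agrees with the two LUT3 cells reported.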

WDDL XOR GATE

Simulation Result

Synthesis Result

WDDL XOR GATE

Synthesis Report

=========================================================================
                              Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : wddlxorgate.ngr
Top Level Output File Name     : wddlxorgate
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO

Design Statistics
IOs                            : 5

Cell Usage
BELS                           : 2
    LUT3                       : 2
IO Buffers                     : 5
    IBUF                       : 3
    OBUF                       : 2
=========================================================================
Device utilization summary
---------------------------
Selected Device : 3s250eft256-4

Number of Slices       : 1 out of 2448   0%
Number of 4 input LUTs : 2 out of 4896   0%
Number of IOs          : 5
Number of bonded IOBs  : 5 out of 172    2%

Timing Summary
---------------
Speed Grade : -4
Maximum combinational path delay : 6.236ns

Simulation Result

Synthesis Result
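In WDDL as it is usually described, every gate is a positive (monotone) function of its rails, so that the all-zero Precharge state at the inputs propagates to the outputs. The sketch below, in Python rather than the report's VHDL and with hypothetical names, builds a dual-rail XOR from AND and OR gates on complementary rails and checks both properties by brute force. The `naive_xor` variant is not from the report; it is included only as a counter-example.

```python
from itertools import product


def wddl_xor(a, a_n, b, b_n):
    """Dual-rail XOR using only AND/OR on true and false rails."""
    q = (a & b_n) | (a_n & b)     # true rail: a XOR b
    q_n = (a & b) | (a_n & b_n)   # false rail: a XNOR b
    return q, q_n


def naive_xor(a, a_n, b, b_n):
    """Counter-example: ignores the false rails and inverts internally."""
    x = a ^ b
    return x, 1 - x


def is_positive_monotone(gate, n_inputs=4):
    """True if raising any input from 0 to 1 never lowers any output."""
    vectors = list(product((0, 1), repeat=n_inputs))
    for lo in vectors:
        for hi in vectors:
            if all(l <= h for l, h in zip(lo, hi)):
                out_lo, out_hi = gate(*lo), gate(*hi)
                if any(x > y for x, y in zip(out_lo, out_hi)):
                    return False
    return True


print("WDDL XOR monotone: ", is_positive_monotone(wddl_xor))
print("naive XOR monotone:", is_positive_monotone(naive_xor))
print("precharge output:  ", wddl_xor(0, 0, 0, 0))
```

The AND/OR form stays at (0, 0) when all rails are precharged, whereas the naive form drives its false rail high, so it would not pass the precharge wave on to the next gate.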






designed for the key size of 448 bits and plain text of 64 bits The code for the

implementation has been written in VHDL The functional verification has been done using

the Cadence (NC Launch) Simulation package and Synthesis using RTL Compiler The

Backend of the design is done using the SOC EncounterAccording to the specifications

desired functionality has been achieved In the output during the Evaluation phase there

has been same number of transitions thus resulting in constant power dissipation During

Synthesis it has been observed that a simple WDDL gate comprised many conventional

gates Therefore the area of the design has grown nearly three-fold when compared to the

design implemented in conventional CMOS logic at the cost of security incorporated into

the IC against Differential Power Analysis (DPA) Due to the constant power dissipation at

the output hacker cannot apply Differential Power Analysis (DPA) scheme to find the

secret key that is being used in the crypto-processor Thus security against DPA is

incorporated into the IC at hardware level by implementing the design in WDDL style

which is quite simple and effectiveCHAPTER 10

REFERENCES

10.1 Referred Technical Papers

[1] Kris Tiri, Member, IEEE, and Ingrid Verbauwhede, Senior Member, IEEE, “A Digital Design Flow for Secure Integrated Circuits,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 25, No. 7, July 2006.

[2] Prof. Jean-Jacques Quisquater, Math RiZK, « Side Channel Attacks », October 2002.

[3] Ross Anderson, Mike Bond, Jolyon Clulow and Sergei Skorobogatov, “Crypto processors – A Survey,” IEEE Proceedings.

[4] Noohul Basheer Zain Ali, James M. Noras, “Optimal Data path Design for a Cryptographic Processor: The Blowfish Algorithm,” Malaysian Journal of Computer Science, Vol. 14, No. 1, June 2001, pp. 16-27.

[5] Dan Rinehimer, Derek Wilson, “Summary of B. Schneier’s Description of New Variable Length Key 64-Bit Block Cipher (Blowfish).”

[6] Kris Tiri and Ingrid Verbauwhede, “A Dynamic and Differential CMOS Logic Style to Resist Power and Timing Attacks on Security IC’s.”

[7] Discretix Technologies Limited, “Introduction to Side Channel Attacks.”

[8] Kris Tiri, Moonmoon Akmal and Ingrid Verbauwhede, “A Dynamic and Differential Logic with Signal Independent Power Consumption to withstand Differential Power Analysis on Smart Cards.”

Reference Books

[1] William Stallings, “Cryptography and Network Security: Principles and Practices,” Pearson Education, 2003.

[2] Samir Palnitkar, “Verilog HDL: A Guide to Digital Design and Synthesis,” Prentice Hall.

Referred Web Pages

[1] http://www.schneier.com/blowfish.html

[2] http://www.nist.gov

[3] http://www.discretix.com/PDF/Introduction%20to%20Side%20Channel%20Attacks.pdf

[4] http://www.wipo.int/pctdb/en/wo.jsp?IA=WO2005081085&DISPLAY=CLAIMS

