
On the Vulnerability of FPGA Bitstream Encryption against Power Analysis Attacks

Extracting Keys from Xilinx Virtex-II FPGAs

Amir Moradi
Horst Görtz Institute for IT-Security, Ruhr University Bochum

Alessandro Barenghi
Dipartimento di Elettronica e Informazione, Politecnico di Milano

Timo Kasper
Horst Görtz Institute

Christof Paar
Horst Görtz Institute

ABSTRACT

Over the last two decades FPGAs have become central components for many advanced digital systems, e.g., video signal processing, network routers, data acquisition and military systems. In order to protect the intellectual property and to prevent fraud, e.g., by cloning a design embedded into an FPGA or manipulating its content, many current FPGAs employ a bitstream encryption feature. We develop a successful attack on the bitstream encryption engine integrated in the widespread Virtex-II Pro FPGAs from Xilinx, using side-channel analysis. After measuring the power consumption of a single power-up of the device and a modest amount of off-line computation, we are able to recover all three different keys used by its triple DES module. Our method allows extracting secret keys from any real-world device where the bitstream encryption feature of Virtex-II Pro is enabled. As a consequence, the target product can be cloned and manipulated at the will of the attacker since no side-channel protection was included into the design of the decryption module. Also, more advanced attacks such as reverse engineering or the introduction of hardware Trojans become potential threats. While performing the side-channel attack, we were able to deduce a hypothetical architecture of the hardware encryption engine. To our knowledge, this is the first attack against the bitstream encryption of a commercial FPGA reported in the open literature.

Categories and Subject Descriptors

K.6.5 [Security and Protection]: [Unauthorized access]; D.4.6 [Security and Protection]: [Cryptographic controls]

General Terms

Security, Experimentation

Keywords

Side-channel attacks, FPGA, bitstream encryption, triple DES

1. INTRODUCTION

Electronic devices have become essential parts of our private life (e.g., cell phones, e-readers or set-top boxes), at work (e.g., laptops or routers) and in industrial environments (e.g., control systems or sensors). Virtually all of today’s IT, communication and consumer electronic systems employ digital technology. As a flip side of this well-known development, reverse-engineering and product piracy have become important issues for a wide variety of electronic products since the creation of exact copies of digital information is often straightforward. Examples include consumer products, network routers and set-top boxes, or military systems. The threat is not limited to merely counterfeiting products; once a product has been reverse-engineered it becomes vulnerable to various other attacks. For instance, ill-intended malfunctioning of the device or circumvention of business models based on the electronic content, as regularly happens in the pay-TV sector, become possible. Not only consumer products, but also industrial and military applications can be affected. For instance, the recent discovery of details of the STUXNET virus shows the potential implications that embedded malware can have [12]. Another flavor of malicious manipulation of digital systems was described in a 2005 report by the US Defense Science Board, where the clandestine introduction of hardware Trojans was underlined as a serious threat [1]. In summary, protection of digital secrets and intellectual property is a key factor for developing and successfully marketing electronic products nowadays.

1.1 FPGA Basics

When developing embedded systems, the main target platform choices are software, i.e., running the application on a microprocessor, or hardware, i.e., realizing an application-specific integrated circuit (ASIC). A third form of device, the Field Programmable Gate Array (FPGA), combines some advantages of software (fast development, low non-recurring engineering costs) with those of hardware (performance, power efficiency). These advantages have made FPGAs an important fixture in embedded system design, especially for applications that require heavy processing, e.g., for routing, signal processing or encryption. Modern high-end FPGAs have the functional equivalent of several tens of millions of Boolean gates, making them formidable devices for a large spectrum of applications. Most of today’s FPGAs are (re)configured via a blob of binary data, called the bitstream, which completely determines the functionality of the device. As described in Sect. 2.1, programming them is quite similar to developing software for a microprocessor, allowing for fast development and a quick time-to-market.

Nowadays FPGAs have applications in banking, defence, aerospace, and many sophisticated commercial technical applications such as video signal processing, e.g., for HDTV, or network routing. Even hitherto totally mechanical devices such as guns nowadays incorporate FPGAs, e.g., the XM25 Individual Airburst Weapon System [8]. Noteworthy for this contribution, satellite communication and other mission-critical systems as well as high-security applications also employ FPGAs [20]. As a key advantage, a product relying on FPGAs can be regularly improved during its life cycle by simply changing the bitstream, e.g., if a design bug is found or to provide new functionality. To name two examples, today’s network routers are easily upgraded in the field via the Internet, and set-top boxes obtain updated firmware via cable or a satellite link. One of the disadvantages of FPGAs, especially with respect to custom hardware such as ASICs, is that an attacker who has access to the bitstream can clone the system and extract the intellectual property of the design. Note that in the vast majority of systems the bitstream is stored externally to the FPGA in a dedicated configuration memory and is loaded from there into the FPGA on every power-up or reset. An adversary wire-tapping the relevant data signals can hence easily monitor the bitstream. The main answer of the industry for protecting the design is a security feature called bitstream encryption.

1.2 Bitstream Encryption Basics

The idea of bitstream encryption is to establish end-to-end confidentiality by means of symmetric cryptography. It protects the entire path of an FPGA bitstream: the development environment of the manufacturer, the insecure channel into the product, the storage inside the product, and finally the loading of the design into the FPGA.

Figure 1 illustrates an example for a secure firmware upgrade of an FPGA-based network router. As a prerequisite, the manufacturer and the FPGA in the network router possess the same secret key, which should be created individually for every device and securely stored both inside the FPGA (k_FPGA) and at the manufacturer site (k_design). After generating the bitstream, the designer encrypts it with a secure symmetric cipher such as triple DES or AES using a secret key k_design. This encrypted bitstream can now be safely sent, e.g., via the Internet, to the configuration memory of the target network router, from where it is loaded into the FPGA. The latter possesses an internal decryption engine and uses its secret key k_FPGA to decrypt each bitstream. Finally, the FPGA configures its internal circuitry according to the decrypted bitstream. The configuration is successful if and only if the secret keys used for the encryption and decryption of the bitstream are identical, i.e., k_design = k_FPGA.

[Figure 1: Bitstream encryption enables the secure transfer of the content of an FPGA, e.g., a firmware update for a network router via the Internet. The diagram shows the designer sending the encrypted bitstream via Internet, satellite, etc. to the configuration memory of the product (network router, set-top box, etc.), from which it is loaded into the FPGA.]

A third party who gets hold of the encrypted bitstream (e.g., from the Internet or by wire-tapping the internal data bus that is used for the configuration inside the product) will not be able to extract any useful information. Without knowing the secret key for the decryption, she will not be able to deduce the design or configure another FPGA correctly. As a major consequence, counterfeiting or attacking the product becomes infeasible, and no confidential or proprietary information contained in the bitstream can fall into the hands of the attacker.

1.3 Side-Channel Attacks

Side-channel attacks exploit physical information leakage of a cryptographic implementation in order to extract secret information, in particular the cryptographic key used. In the case of power analysis, the consumption of electric current or the electromagnetic (EM) emanations of the cryptographic device are used as a side channel for key extraction. Since both power consumption and EM emissions of a digital device depend on the values being computed, side-channel attacks employ a divide-and-conquer approach: a part of the circuit is modeled using guesses on small parts of the key (e.g., 6 bits). The consumption (or EM emission) models are subsequently correlated with real-world measurements: only the model that depends on the correct key guess will predict the behaviour of the actual device. Contrary to mathematical or brute-force attacks, where typically pairs of plaintexts and ciphertexts are required for a key recovery, side-channel attacks require only one of them, i.e., either the plaintext or the ciphertext [15, 18]. While the open literature reports a number of successful attacks against cryptographic architectures implemented on the FPGA fabric [25, 27], no attack on the bitstream encryption itself has been publicly reported. We also refer to a number of surveys dealing with FPGA security which have not addressed bitstream reverse engineering as a real-world threat [14, 30].

1.4 Content of this Paper

In this paper we investigate the level of security provided by the bitstream encryption used in the widespread Virtex-II family of FPGAs produced by Xilinx. A detailed description of these FPGAs and their bitstream encryption feature, as well as an introduction to power analysis, are given in Sect. 2. The manufacturer claims that the bitstream encryption as implemented in Xilinx Virtex-II Pro FPGAs can thwart even most Class III attacks [28], i.e., attacks by well-funded intelligence agencies [5]. However, our research does not support this claim, as shown in Sect. 3: starting without any previous knowledge about the implementation, i.e., in a typical black-box scenario, we demonstrate step by step how to infer the detailed internal structure of the decryption hardware from the power consumption and timing behaviour of the FPGA. We found this hardware to have no side-channel protection. Finally, we develop a side-channel attack that exploits the power consumption of the FPGA during the decryption of one bitstream to recover the secret keys used for the bitstream encryption. Some of the dramatic implications of our attack are illustrated in Sect. 4.

2. PRELIMINARIES

This section introduces Field Programmable Gate Arrays (FPGAs) in general, and in particular the Virtex-II Pro device and its security features. We further describe the side-channel analysis and digital signal processing techniques used for our attack.

2.1 Flexible Hardware: FPGA Details

An FPGA is a reconfigurable integrated circuit that can host highly complex digital circuitry, e.g., a complete microcontroller, digital signal processing algorithms, or almost any other design that can be put in silicon.

In the worldwide FPGA market, Xilinx (51.2 %) and Altera (35.5 %) together account for more than 85 % market share, followed by Actel and Lattice with about 6 % market share each (Gartner Inc., 2008) [13]. Both market leaders offer security solutions for commercial and military environments that aim at protecting the bitstream by means of symmetric encryption. Altera programs the respective secret keys in non-volatile e-fuses on the silicon die of the FPGA that are covered by layers of metal to hinder extraction of the key by invasive physical attacks [7]. Xilinx, on the other hand, states that programmable fuses are easy to reverse-engineer [20] and instead promotes a battery-backed solution to store the keys in volatile memory, ensuring that the key is instantly zeroed if the FPGA is removed from the product, e.g., to conduct physical attacks. In this paper we focus on the solution of Xilinx, the leading manufacturer in terms of FPGAs sold, which is advertised to be the more secure one.

The essential building blocks of an FPGA are configurable logic blocks (CLBs), which consist of slices. Each slice in turn contains small look-up tables (LUTs) that realize arbitrary combinational functions, i.e., small Boolean circuits, and usually also simple memory elements, e.g., latches storing one bit. Programmable switches allow altering of the signal path inside the slices, as well as routing the data to other CLBs on the FPGA. Depending on the particular type of FPGA, various other auxiliary resources are available, for instance, multiply-and-add units, BlockRAMs that can be accessed by several CLBs, and even entire CPUs. By appropriately configuring the slices and combining the inputs and outputs of tens of thousands of CLBs, an FPGA can realize the most elaborate tasks in parallel and hence achieves very high computational power. Obviously, the vast complexity of this configuration process cannot be managed manually, thus software tools are required for the design of an application. The tools ultimately generate the configuration information, i.e., the bitstream, for the FPGA.

The design flow for reconfigurable hardware starts with expressing the function to be realized by means of a high-level hardware description language (HDL) such as VHDL or Verilog. The FPGA manufacturers provide integrated development environments for their products, e.g., the Xilinx ISE Design Suite [3]. They allow HDL descriptions to be synthesized into a schematic representing the internal wiring of the corresponding FPGA, commonly called a netlist. The synthesized netlist can be simulated on the development PC for debugging the implementation. Finally, the netlist is translated into a low-level circuit description, the bitstream, that specifies the exact content of each slice, each programmable switch, and every other configurable component inside the FPGA. Configuring the target device with this bitstream uniquely sets the initial state of the whole circuitry to realize the designed function.

2.2 Device Under Attack: Virtex-II Pro

As the target for our analysis we opted for a Virtex-II Pro XC2VP7 FPGA. According to the data sheet [33], its silicon die consists of nine metal layers manufactured in a 130 nm process with 90 nm high-speed transistors, allowing for designs with clock frequencies up to 600 MHz for internal components. Amongst others, the FPGA provides more than 11,000 internal registers and LUTs and an embedded PowerPC 405 processor core, wired to the fabric, in case a general-purpose processor is needed alongside application-specific hardware. For configuring all of its internal circuitry the FPGA requires a bitstream consisting of 560,700 bytes.

During our analysis, the power supply pins of the FPGA are of special interest, as they form the entry point for the side channel. The pins are divided into three main groups according to the parts of the device they provide energy to: VCCINT for a 1.5 V supply of the internal core logic, VCCAUX for a 3.3 V supply of auxiliary circuits, and VCCO for a 3.3 V supply of the input and output buffers. Note that the 1.5 V core voltage VCCINT can be distinguished from the other 3.3 V supply voltages and can hence be easily identified on the unknown printed circuit board of a real-world product.

Both Xilinx and Altera FPGAs rely on volatile static random access memory (SRAM) for holding the configuration information and, as a consequence, lose it when the power is switched off. In order to retain the crucial cryptographic key, in Xilinx products a lithium battery providing 1.0–3.6 V has to be connected to the VCCBatt pin. The battery supplies the SRAM-based secret key storage used to decrypt the encrypted bitstream and provides the basis for using Virtex-II Pro devices in FIPS 140-2 Level 4 security devices, the highest security level specified by NIST. The standard mandates that all cryptographic keys must be instantly zeroed if physical tamper actions are detected on the secure device. If the battery is disconnected while the FPGA is not powered, the keys are instantly erased, forcing the customer to return the device to the producer for re-keying. While power is applied to the VCCAUX pin of a Virtex-II Pro, the battery is not required to buffer the key storage and can be replaced [31].

2.2.1 Configuration Process and Protocols

As stated above, the configuration of the FPGA is completely erased every time the FPGA is powered off. It is thus necessary to reprogram the device after every boot, even when it is deployed in the field. Xilinx provides five different configuration protocols to configure the device, namely Master or Slave Serial mode, Master or Slave SelectMAP mode and IEEE-1149.1-2001 standard JTAG boundary scanning mode [4]. The first two modes (Master and Slave Serial) input the bitstream into the device, one bit per clock cycle, through a specific pin which serves as a data-in line. The SelectMAP modes represent an evolution of the simple serial mode as they are able to send a byte of the bitstream at a time, thus significantly decreasing the time needed for configuration. The JTAG mode allows the whole bitstream to be input into the device by treating the internal configuration memory as one of the data registers of a standard JTAG chain. It is thus possible to insert the configuration information one bit at a time via JTAG, taking proper care to set all the other devices on the JTAG chain into bypass mode.
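To give a rough feeling for the difference between the bit-serial and byte-wide modes, the following minimal sketch counts the clock cycles needed to shift in a bitstream of the size quoted in Sect. 2.2. It assumes exactly one bit (serial modes, JTAG) or one byte (SelectMAP) per clock cycle and ignores all protocol overhead; the function name is illustrative and not part of any Xilinx tool.

```python
BITSTREAM_BYTES = 560_700  # XC2VP7 bitstream size quoted in Sect. 2.2


def configuration_cycles(num_bytes: int, bits_per_cycle: int) -> int:
    """Clock cycles needed to transfer num_bytes, rounded up to whole cycles.

    Assumption: a fixed number of bits is accepted per clock cycle and
    no additional protocol overhead is counted.
    """
    total_bits = num_bytes * 8
    return -(-total_bits // bits_per_cycle)  # ceiling division


serial_cycles = configuration_cycles(BITSTREAM_BYTES, 1)     # serial or JTAG
selectmap_cycles = configuration_cycles(BITSTREAM_BYTES, 8)  # SelectMAP
print(serial_cycles, selectmap_cycles)
```

Under these assumptions the byte-wide mode needs one eighth of the cycles of the bit-serial modes.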

Due to the widespread adoption and standardization of the JTAG protocol, we chose the JTAG interface of the FPGA for supplying the bitstream during our attack. The vast majority of commercial products retain the JTAG line contacts for post-production testing, hence this port is also the most suitable for real-world attacks. The JTAG interface is based on four lines: two for data input and output (TDI and TDO), one to supply the clock to the whole JTAG chain (TCK) and one to supply commands to the finite state machine implementing the JTAG port on every device (TMS). All the modules belonging to a JTAG chain, e.g., the configuration memory and the FPGA itself, have their TDI and TDO ports daisy-chained together and form a single loop. The number of devices connected to the JTAG testing chain and the order in which they are linked to the JTAG loop can be easily reverse-engineered [26].

2.2.2 Bitstream Encryption in the Virtex-II

The bitstream encryption feature [19] makes it possible to configure the Virtex-II Pro FPGA with a bitstream that is encrypted by means of the symmetric-key cipher triple DES [32]. Triple DES is a block cipher constructed by chaining three subsequent executions of the Data Encryption Standard [23] (DES), using either two or three different DES keys. In order to provide backwards compatibility, the three executions of DES follow an encryption-decryption-encryption pattern: in this way, if the same key is employed for all three cipher instances, triple DES is effectively reduced to single DES. The DES cipher encrypts blocks of 64 bits employing a 56-bit key, hence the possible key lengths of triple DES are 112 or 168 bits. Whilst single DES is vulnerable to brute-force attacks due to its short key length, e.g., using the COPACOBANA code-breaking machine [17] or commodity hardware such as GPUs [6], there are no theoretical or brute-force attacks against triple DES known with any realistic chance of success. Hence, neither mathematical cryptanalysis nor an exhaustive search can endanger the secrecy of a design protected by the bitstream encryption.
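As a compact restatement of the encryption-decryption-encryption construction described above, the following block writes E_k and D_k for single DES encryption and decryption under a 56-bit key k, with P a plaintext block and C a ciphertext block. This notation is introduced here for illustration only and is not taken from the Xilinx documentation.

```latex
\begin{gather*}
C = E_{k_3}\bigl(D_{k_2}\bigl(E_{k_1}(P)\bigr)\bigr), \qquad
P = D_{k_1}\bigl(E_{k_2}\bigl(D_{k_3}(C)\bigr)\bigr), \\
k_1 = k_2 = k_3 = k \;\Longrightarrow\;
C = E_k\bigl(D_k\bigl(E_k(P)\bigr)\bigr) = E_k(P), \\
2 \cdot 56 = 112 \text{ bits (two keys)}, \qquad
3 \cdot 56 = 168 \text{ bits (three keys)}.
\end{gather*}
```

The second line shows why the pattern is backwards compatible: the inner decryption cancels the first encryption when all keys coincide, leaving a single DES encryption.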

Since the bitstream is processed by the cipher in blocks of 64 bits, Cipher Block Chaining (CBC) [22] is implemented to combine subsequent blocks of the bitstream. During a decryption, this mode of operation defines that the ciphertext of one block is XORed bitwise with the output of the triple DES decryption of the following block to obtain the plaintext. Since the first block to be decrypted does not have a predecessor, a 64-bit initialization vector (IV) is used to mask it. This IV may be publicly known and can hence be transmitted in plain without loss of security. The analyzed FPGA can store up to six single DES keys, which can be independently marked for use in either the first, the second or the third DES run of the triple DES. This feature allows for either 3 key sets (for standard 2-key triple DES) or 2 key sets (for use with 3-key triple DES) and enables the manufacturer to update the bitstream even if a key set has been compromised. Xilinx states that there is no read port for the key storage, except for the one internally employed by the decryption engine [32]. The only way to program the keys into the device is through the JTAG programming mode, although no details of the key entering mode are provided by Xilinx in the technical specification. In order to avoid incorrect configuration of the FPGA due to a faulty decrypted bitstream and the subsequent possible damage to the device, Xilinx states that the FPGA performs a CRC check on the decrypted bitstream. For this purpose, a standard CRC16 checksum embedded in the configuration file is used. In fact, there is no other authentication scheme implemented in the bitstream (encryption) mechanism of Virtex-II.
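The CBC decryption rule described above can be sketched in a few lines. The block cipher used here is a deliberately trivial, insecure stand-in (byte-wise addition of the key) so that the example stays self-contained; it is not DES or triple DES, and all function names are hypothetical. Only the chaining logic is meant to mirror the text.

```python
BLOCK_BYTES = 8  # 64-bit blocks, as processed by DES and triple DES


def toy_encrypt_block(block: bytes, key: bytes) -> bytes:
    """Insecure placeholder for a block cipher encryption (NOT triple DES)."""
    return bytes((b + k) % 256 for b, k in zip(block, key))


def toy_decrypt_block(block: bytes, key: bytes) -> bytes:
    """Inverse of toy_encrypt_block."""
    return bytes((b - k) % 256 for b, k in zip(block, key))


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def cbc_encrypt(plaintext: bytes, key: bytes, iv: bytes) -> bytes:
    """Encrypt whole blocks in CBC mode (no padding handled in this sketch)."""
    if len(plaintext) % BLOCK_BYTES:
        raise ValueError("plaintext must be a multiple of the block size")
    out = bytearray()
    previous = iv
    for off in range(0, len(plaintext), BLOCK_BYTES):
        block = plaintext[off:off + BLOCK_BYTES]
        previous = toy_encrypt_block(xor_bytes(block, previous), key)
        out += previous
    return bytes(out)


def cbc_decrypt(ciphertext: bytes, key: bytes, iv: bytes) -> bytes:
    """Decrypt each block, then XOR with the previous ciphertext block (or IV)."""
    if len(ciphertext) % BLOCK_BYTES:
        raise ValueError("ciphertext must be a multiple of the block size")
    out = bytearray()
    previous = iv
    for off in range(0, len(ciphertext), BLOCK_BYTES):
        block = ciphertext[off:off + BLOCK_BYTES]
        out += xor_bytes(toy_decrypt_block(block, key), previous)
        previous = block
    return bytes(out)
```

Note that the value entering the final XOR of each block is a known ciphertext block (or the public IV), which is consistent with the observation in Sect. 1.3 that a side-channel attacker needs only the ciphertext.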

2.3 Power Analysis and Power Models

The power consumption of modern CMOS-based devices mostly depends on the ongoing switching activity. It is possible to exploit this relation in order to gain insights into the values involved in a computation. For power analysis, the power dissipation is measured during the regular functioning of a secure device in order to infer the secret keys. The literature divides this class of attacks into two general families, “simple” (SPA) and “differential” (DPA) attacks [18]. SPA attacks involve visually interpreting power consumption measurements as a function of time in order to detect dependencies between the computed value and the power consumption in precise intervals of time. This methodology assumes that it is possible to discern a subset of keys thanks to their peculiar consumption behavior, e.g., to distinguish a square-and-multiply step in a modular exponentiation from a simple squaring. In public-key algorithms such as RSA this can lead to a complete leakage of the secret key. SPA is rarely applicable against symmetric ciphers such as the one used in our target device. DPA attacks rely on building a power model of a computing circuit, employing the input values to the circuit as inputs to the model, and a set of possible values of the unknown key as a parameter. To verify a key hypothesis, the attacker correlates the predicted values with the actual measurements from the device under attack through a statistical tool of her choice. Commonly, for a DPA it is assumed that the values of the measurement set and the hypothesis set are normally distributed random variables with mean µ and standard deviation σ, where µ is the mean consumption of the circuit for a precise key value at a specific time instant. Since the dynamic power consumption of a device is caused by the switching activity of the logic gates [16], a proper dynamic consumption model tries to express the power consumption depending on the intensity of the switching activity. This switching activity may be caused, e.g., by the combinational logic computing the result of a Boolean function, or by a latch storing a single bit value. A first-order approximation of the switching activity of many circuits is provided by the Hamming weight (HW) of the input value. This results in a rather coarse model but has the advantage of requiring no precise knowledge about the implementation, since only the data processed at one instant in time has to be predicted. The switching activity induced by a latch storing a value is often better modeled by the Hamming distance (HD) between the former value stored by the latch and the new one. A latch toggles internally (and thereby consumes a noticeable amount of energy) only if the previous value being held is different from the new one to be memorized, i.e., when their HD is 1 [11]. Consequently, the model requires predicting two intermediate values, i.e., those stored in a register before and after the targeted operation. This requires some additional knowledge about the implementation, compared to a HW model.
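The two power models just described reduce to simple bit counts. The sketch below is a minimal illustration with hypothetical helper names; the example register values are arbitrary and not taken from the target device.

```python
def hamming_weight(value: int) -> int:
    """Number of bits set to 1 in a non-negative integer."""
    return bin(value).count("1")


def hamming_distance(old_value: int, new_value: int) -> int:
    """Number of bit positions that toggle when old_value is overwritten
    by new_value, i.e., the Hamming weight of their XOR."""
    return hamming_weight(old_value ^ new_value)


# HW model: only the value processed at one instant is needed.
hw_prediction = hamming_weight(0b1011)

# HD model: the register content before and after the update is needed.
hd_prediction = hamming_distance(0b1010, 0b0110)
```

The HD model needs both the old and the new register content, which is why it requires more knowledge about the implementation than the HW model.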

Once a power model has been chosen, the attacker predicts an intermediate value depending on both a known input value and a part of the secret key. She then computes a set of hypothetical power consumption values, one for each of the possible values taken by the part of the secret key. The size of the part of the secret key hypothesized during one step of the attack is a trade-off between building an accurate model of the power consumption, which requires more key bits to be considered at a time, and the computational complexity involved in computing a power hypothesis for every possible value which may be taken by the key portion. However, since attacking one key portion is performed independently from attacking the others, the attacker is free to divide the whole key into portions that are small enough to be processed. To recover the whole key, multiple attacks are performed (on the same set of measurements). After collecting a large number of measurements during the cryptographic operation to be attacked, the adversary employs a statistical tool to compare the hypothesized power values with those that have been measured, in order to infer the correct value of the key part. The typical methodology to correlate the predicted and actual power consumption of the circuit is to employ Pearson’s linear correlation coefficient: if the model correctly predicts the consumption of the circuit, the linear correlation between the synthetic values and the recorded ones will be close to 1, while a wrong model is expected to yield negligible correlation values. Conducting the correlation analysis time-wise, i.e., for each time instance of the consumption values recorded from the circuit, gives the attacker additional information about the exact moment when a side-channel leakage occurs in the power traces: the values of the correlation coefficient of the correct key hypothesis spike only in the time interval in which the operation is executed.
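The procedure above can be illustrated on simulated data. The following sketch is not the attack on the Virtex-II Pro: the 6-bit-to-4-bit table is a randomly generated stand-in for a nonlinear function (not a real DES S-box), the leakage is a synthetic Hamming weight plus Gaussian noise, only a single time instant is considered, and all names and parameters are assumptions made for the example.

```python
import random

rng = random.Random(2011)  # fixed seed so the simulation is repeatable
# Hypothetical nonlinear table (6-bit input, 4-bit output); NOT a DES S-box.
TOY_SBOX = [rng.randrange(16) for _ in range(64)]


def hamming_weight(value: int) -> int:
    return bin(value).count("1")


def pearson(xs, ys) -> float:
    """Pearson's linear correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def simulate_measurements(secret_part: int, count: int, noise_sigma: float = 1.0):
    """Simulated single-sample 'traces': HW of the table output plus noise."""
    inputs = [rng.randrange(64) for _ in range(count)]
    samples = [hamming_weight(TOY_SBOX[x ^ secret_part]) + rng.gauss(0.0, noise_sigma)
               for x in inputs]
    return inputs, samples


def recover_key_part(inputs, samples):
    """Correlate the HW hypothesis of every 6-bit guess with the samples."""
    scores = {}
    for guess in range(64):
        hypothesis = [hamming_weight(TOY_SBOX[x ^ guess]) for x in inputs]
        scores[guess] = pearson(hypothesis, samples)
    best_guess = max(scores, key=scores.get)
    return best_guess, scores
```

Calling `recover_key_part(*simulate_measurements(0b101101, 2000))` is expected to return the simulated key part as the best guess, because only the correct hypothesis tracks the simulated leakage. In a real attack the same correlation would be computed for every time sample of the traces.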

2.3.1 Processing the Measurements

For a successful power analysis attack the collected power traces need to fulfill a number of requirements. First and foremost, due to the time-wise nature of the analysis, the traces must be perfectly synchronized to obtain correct results. Due to either instrumental issues or the fact that the instant at which a sensitive operation begins is not always under full control of the attacker, it can be necessary to process the obtained traces in order to compensate for possible phase shifts. The problem of realignment can be solved by choosing a single trace as a reference and detecting the most likely time delay that needs to be applied to each other measurement in order to achieve correct synchronization. Since the measurements are taken in very comparable situations (i.e., encryptions of different plaintexts with the same algorithm), the most natural figure of merit to detect the appropriate time delays is cross-correlation. Cross-correlation is a measure of the similarity of two signals as a function of a time lag applied to one of them. The time delay maximizing the cross-correlation between the reference and the current trace is taken as the optimal time delay to achieve alignment.
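A minimal sketch of this alignment step is given below. It uses a plain, unnormalized cross-correlation over a bounded range of lags and pads shifted traces with zeros; real measurements would typically have their mean removed first. The function names and the `max_lag` parameter are illustrative assumptions.

```python
def cross_correlation(reference, trace, lag: int) -> float:
    """Sum of reference[i] * trace[i + lag] over the overlapping samples."""
    total = 0.0
    for i, ref_sample in enumerate(reference):
        j = i + lag
        if 0 <= j < len(trace):
            total += ref_sample * trace[j]
    return total


def best_lag(reference, trace, max_lag: int) -> int:
    """Lag in [-max_lag, max_lag] that maximizes the cross-correlation."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda lag: cross_correlation(reference, trace, lag))


def align(trace, lag: int):
    """Shift trace by lag samples so it lines up with the reference,
    filling samples that fall outside the recording with zeros."""
    n = len(trace)
    return [trace[i + lag] if 0 <= i + lag < n else 0.0 for i in range(n)]


reference = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
delayed = [0, 0, 0, 0, 1, 3, 1, 0, 0, 0]  # same shape, two samples later
lag = best_lag(reference, delayed, max_lag=4)
aligned = align(delayed, lag)
```

In this toy case the detected lag is two samples and the shifted trace coincides with the reference.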

A further key point for a successful power analysis is minimizing the amount of noise in the traces. Regardless of the quality of the measurement setup, the measurements always contain noise, since the traces represent an aggregate measurement of the power consumption of the entire chip and its environment. Digital signal processing techniques can be employed to separate the side-channel leakage from the unwanted signals and thereby extract only the relevant information. Their usage turns out to be of crucial importance when dealing with large digital components as in [9]. Proper signal filtering can also be useful to remove any components of the measured trace that do not pertain to the hardware under attack, provided they occupy different harmonic components [10]; in this way even large amounts of unrelated signal can be removed.

3. ATTACKING A BLACK BOX

This section provides a detailed description of the steps performed to successfully reverse-engineer and break the protection scheme of Xilinx’s Virtex-II Pro. The first subsection illustrates the implementation details of the encrypted bitstream that were gathered through the analysis of the output of the Xilinx ISE synthesis tool. After presenting the in-vitro analysis of the bitstream, the section describes the custom communication device we built to program the FPGA and the measurement setup employed to record the power consumption profiles of the Virtex-II Pro. Subsequently we describe the preliminary chip analysis aimed at identifying the moment in time when the decryption is performed and provide an analysis of the power profile of the device. The section then infers the inner architecture of the DES decryption engine according to our findings about the time instances in which the intermediate values are processed. Finally, the precise attack techniques that allow recovery of the whole triple DES key used for the decryption of the bitstream are illustrated and the minimal number of traces required for the full-key recovery is determined.

3.1 Reverse Engineering of the Bitstream

As a first step, the precise format of the bitstream needs to be analyzed. Understanding the bitstream is a prerequisite to find out how the configuration information is processed by the internal decryption engine of the device. We assume that the attacker is able to monitor the entire encrypted bitstream of the target FPGA, since the bitstream of an FPGA retains the same size regardless of the design implemented (the FPGA needs to be configured in full at least during its boot phase). Moreover, in case the bitstream encryption feature is enabled, the device cannot use the partial reconfiguration features; it is hence forced to perform a full reconfiguration for each update. To obtain the full (encrypted) bitstream an attacker has two options: i) she can wire-tap the data pin and command lines in order to eavesdrop on the communication while the FPGA is being configured, or ii) she can read out the content of the non-volatile memory attached to the FPGA which is responsible for configuring the FPGA on power-ups. Note that this external memory is not necessarily connected via JTAG, and feeding the encrypted bitstream using different configuration schemes, e.g., Slave Serial, is also possible, but this does not have any impact on the attack that we present here. From now on, we assume that the attacker is in possession of the full encrypted bitstream through one of these methods.

[Figure 2: Comparison between the structure of an encrypted and an unencrypted bitstream. Unencrypted bitstream: FPGA logic synchronization and startup; write plain bitstream; CRC-16 and finalization. Triple DES encrypted bitstream: FPGA logic synchronization and startup; select first key + key ID; write IV + IV value; write encrypted bitstream; CRC-16 and finalization.]

Comparing Bitstreams.

While the Virtex-II Pro user guide [31] from Xilinx documents all the common configuration registers and gives a brief description of the inner addressing modes of the device, the internal registers driving the decryption are not explicitly mentioned. The Virtex-II Pro user guide [31] describes the bitstream as split into packets which may target a specific configuration register to set configuration options, write to the SRAM configuration memory, or toggle internal signals, and are formed by a 32-bit header and a variable length body. Depending on the packet length they are split into two types: Type 1 packets may only handle a body of up to 2^11 − 1 32-bit words (i.e., 8 kB of data), while Type 2 packets, used to write the bitstream to the inner configuration mechanisms, handle up to 2^27 − 1 words (i.e., 512 MB of data). A peculiarity of the architecture mandates that only Type 1 packets may change the destination of a read or write operation; the destination is not changed during a Type 2 packet operation. In order to understand which parts of the bitstream drive the decryption, we start the reverse engineering on the basis of a very simple test design comprising just a single boolean gate. We synthesized the test design once without activating the bitstream encryption feature, to obtain an unencrypted bitstream, and then again with varying keys for the encryption, producing several bitstreams that are encrypted with the different keys. In the following we reveal the relevant details of the configuration by comparing the files containing the different bitstreams.
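The packet size limits and the file comparison described above can be illustrated with a small sketch. It is not the tooling used for the analysis; the function names are hypothetical, and the only facts taken from the text are the 32-bit word size and the 2^11 − 1 and 2^27 − 1 word limits.

```python
WORD_BYTES = 4  # the packet body is counted in 32-bit words


def max_body_bytes(count_bits: int) -> int:
    """Largest packet body in bytes for a word counter of `count_bits` bits."""
    return ((1 << count_bits) - 1) * WORD_BYTES


def differing_ranges(a: bytes, b: bytes):
    """Return (start, end) byte ranges, end exclusive, where a and b differ."""
    n = min(len(a), len(b))
    ranges, start = [], None
    for i in range(n):
        if a[i] != b[i]:
            if start is None:
                start = i          # a new differing run begins here
        elif start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, n))
    if len(a) != len(b):
        ranges.append((n, max(len(a), len(b))))  # unmatched tail
    return ranges


type1_limit = max_body_bytes(11)   # just under 8 kB
type2_limit = max_body_bytes(27)   # just under 512 MB
```

Applied to two bitstreams of the same design generated with different keys, `differing_ranges` would point at the fields that depend on the key and at the encrypted payload, which is the kind of comparison the text relies on.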

The differences deduced from the comparison are summarized in Fig. 2. The content of the bitstream is organized in packets, preceded by a single 32-bit synchronization word (namely, 0xAA995566). This initialization header is the same for both encrypted and unencrypted bitstreams and serves to synchronize the FPGA logic and reset the inner programming logic to a default state. After this phase, the unencrypted variant of the bitstream simply consists of a large packet with the whole configuration information. In contrast, the encrypted bitstream contains some extra information for initializing the cryptographic engine. The first value indexes the first key in the key storage to be employed for the triple DES decryption. The two remaining keys are determined by flags marking two cells of the 6-key buffer as “middle” and “last”. After the key indexes, the value of the IV required for the CBC mode is passed in plain. Following the IV value, the bitstream continues with a large packet containing the whole configuration information encrypted with triple DES. The write command is the same as for the unencrypted bitstream, with the exception of an extra bit which is set to 1: we thus infer that setting the bit effectively enables the decryption engine.

The next step was to reverse-engineer how the 64-bit secret keys input to the ISE synthesis tool are processed before being used for the encryption of a bitstream. In contrast to the DES standard, which suggests either to discard the eighth bit of every byte of the DES key or to employ it for a parity check, we have discovered that the ISE tool discards the full first byte of the 64-bit supplied key in order to obtain the 56-bit key for the DES encryption/decryption. This is done regardless of the fact that the manuals point out that the first byte is employed only for key indexing in the storage.

As a final check to all our inferences and in order to understand the endianness of the bitstream, we generated a special bitstream that is encrypted with triple DES employing one of the known weak keys of DES, the same for all three executions of DES. These special weak keys have the property that a DES encryption involving one of them becomes an involutory operation, i.e., encrypting a second time with the same weak key is identical to a decryption and hence yields the plaintext. By generating a bitstream that is encrypted with weak keys and with the IV of the CBC mode set to zero, due to the properties of the CBC mode the second 64-bit block contained in the bitstream is equal to the plaintext corresponding to the first block in the bitstream; thus we obtain a correctly deciphered plaintext in the generated bitstream file. Comparing this deciphered plaintext with the available unencrypted bitstream we confirmed that we correctly understood the triple DES engine input format and in particular the order in which the bits are processed internally.
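The two ways of reducing a 64-bit key to 56 bits can be contrasted in a few lines. This is only a sketch: the function names are hypothetical, "first byte" is assumed to mean the leading byte of the key as supplied, and the bit permutation (PC-1) that the DES key schedule applies afterwards is deliberately left out.

```python
def strip_parity_bits(key64: bytes) -> int:
    """DES-standard view: drop the eighth (least significant) bit of each byte."""
    if len(key64) != 8:
        raise ValueError("expected an 8-byte key")
    key56 = 0
    for byte in key64:
        key56 = (key56 << 7) | (byte >> 1)   # keep the upper 7 bits
    return key56


def drop_first_byte(key64: bytes) -> int:
    """Behaviour observed for the ISE tool: discard the whole first byte."""
    if len(key64) != 8:
        raise ValueError("expected an 8-byte key")
    return int.from_bytes(key64[1:], "big")   # remaining 7 bytes = 56 bits


supplied = bytes.fromhex("0123456789abcdef")
standard_key = strip_parity_bits(supplied)
ise_key = drop_first_byte(supplied)
```

The two results differ for almost every input, which is why the key processing had to be established before any known-key profiling could be trusted.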

Timing Issues.

The last information we were able to infer before tackling the device pertains to the minimum throughput of the internal decryption engine. The Xilinx user guide reports that the maximum clock rate to be externally supplied during JTAG programming of an encrypted bitstream is fclk = 33 MHz. In every clock cycle, one new bit of the bitstream is supplied to the FPGA. This in turn implies that the triple DES engine must be able to perform a full triple DES decryption of a 64-bit block in a time less than 64 × Tclk, where Tclk = 1/fclk denotes the duration of a clock cycle. This gives us the expected time for a full triple DES of around 64 × Tclk = 1.94 µs which implies that, assuming all 48 rounds of the triple DES take roughly the same time, a single round of the DES must be executed in around 40 ns. Accordingly, the hypothesis of the DES being performed by means of a software implementation running on an internal microcontroller becomes highly unlikely, since it would be too slow; instead we now assume a hardware implementation. If the decryption was realized by implementing one DES round in hardware and executing it 48 times for the triple DES, the above details about the timing give a lower bound for the clock rate of the inner cryptographic engine of 24.75 MHz.
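The timing bounds above follow from simple arithmetic, reproduced here as a short sketch. The constant names are ours; the values (33 MHz, 64-bit blocks, 48 rounds) are those given in the text.

```python
F_CLK_HZ = 33e6     # maximum JTAG clock for encrypted bitstreams
BLOCK_BITS = 64     # one bit is clocked in per JTAG cycle
ROUNDS = 48         # 3 x 16 DES rounds

t_clk = 1.0 / F_CLK_HZ              # duration of one JTAG clock cycle
t_block = BLOCK_BITS * t_clk        # time budget for one triple DES
t_round = t_block / ROUNDS          # budget per round, if all rounds are equal
f_engine_min = ROUNDS / t_block     # one round per engine clock cycle

print(f"block budget : {t_block * 1e6:.2f} us")
print(f"round budget : {t_round * 1e9:.1f} ns")
print(f"engine clock : {f_engine_min / 1e6:.2f} MHz (lower bound)")
```

The last figure assumes one round per engine clock cycle; an engine that needs several cycles per round would have to run correspondingly faster.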

3.2 Customizing the Measurement Setup

After analyzing the format of the bitstream and the specifications of the target device, we moved on to develop a measurement workbench in order to record the power profile of our target FPGA during its operation in a real-world scenario. The first step in this direction was to develop a customized communication module that is able to correctly configure an FPGA via JTAG. Hence, we designed an in-system programmable board that is based on an Atmel ATMega256 8-bit microcontroller and provides a JTAG port, a universal serial bus (USB) as well as a dedicated pin for triggering the oscilloscope. We implemented a framework on the microcontroller comprising the JTAG protocol and a serial protocol for communicating via USB. The firmware allows freezing the configuration process by stopping the clock signal fed to the Virtex-II Pro and makes it possible to issue the trigger signal to the oscilloscope before the clocking in of any chosen bit, i.e., with a resolution of 125 ns. While the fixed header of the bitstream (see Fig. 2) fits into the memory of the microcontroller, the remaining bits to be sent are provided by the control PC and sent to the communication module. The latter then wraps the bitstream in the JTAG protocol and forwards it to the device under attack while issuing trigger signals at the appropriate time instants.

As the target platform for attacking the bitstream encryption we chose the customized FPGA development board SASEBO [2]. It contains an XC2VP7 Virtex-II Pro FPGA and provides stable and suitable voltages by means of on-board voltage regulators. A JTAG connector is provided to configure both the FPGA and a dedicated PROM which are connected in a daisy chain. During the analyses we slightly modified the board by inserting resistors that allow measuring the power consumption of the VCCINT, VCCAUX and GND paths. Similar modifications are also required for attacking other real-world products comprising a bitstream encryption.

A LeCroy WP715Zi digital oscilloscope with a maximum sampling rate of 20 GSamples/s and an analog bandwidth of 1.5 GHz was employed to record the instantaneous power consumption of the target FPGA at a maximum vertical precision of 2 mV/division. All the traces were acquired at 10 GSamples/s to avoid any possible aliasing due to undersampling. The program feeding the encrypted bitstreams to our customized programmer and controlling the acquisition of the power consumption traces runs on the oscilloscope itself, i.e., the communication module is connected to it via USB. The employed probe connecting the oscilloscope to the measurement resistor on the board is a LeCroy AP033 active differential probe, which includes a low noise 10x analog amplifier that boosts the effective vertical resolution of our measurement subsystem to 200 µV/division.

For verifying that our setup allows correctly configuring the target FPGA with custom (encrypted) bitstreams, we again synthesized our test design comprising a single boolean gate and connected the appropriate pins of the FPGA to two external switches serving as inputs and an LED displaying the output of the logical operation. Experimenting with various self-generated bitstreams, e.g., by means of arbitrary IVs, we used the simple circuit as a practical means to debug the proper configuration of the FPGA and successfully verified the functionality of our setup.

3.3 Timing and Power Profile Analysis

With the framework for configuring the FPGA and acquiring its power consumption at hand we now proceed to analyze the power profile of the target FPGA, in order to identify the point in time when the targeted triple DES decryption takes place. We started our analyses from the VCCINT line, which, according to the specifications provided by Xilinx, powers the whole FPGA fabric and inner circuits. This line turned out to be the actual line feeding the decryption engine. For the sake of completeness, the detection analyses have been repeated also on VCCAUX, which did not show any change in the power consumption whether or not the decryption engine is enabled. We may thus conclude that only VCCINT is effectively powering the FPGA decryption engine. To record the power consumption of the internal circuits connected to VCCINT, there are two options: the power consumption can be sampled by measuring the voltage drop across a resistor placed either between VCCINT and the device, or between the device and GND.

Trying the second option first, we detected a strong echo from the ground plane, implying a strong variation of the offset of the measured power consumption that rendered the power traces useless for further analysis. The disturbing effect turned out to be related to the activities at the input and output ports of the FPGA during the configuration, i.e., the communication via JTAG: the power consumption of all voltage supply inputs (VCCINT, VCCAUX and VCCO) is summed up when measuring at the GND path; as a consequence, the data and clock signals of the JTAG port caused the undesired variations in the offset. As a remedy for the problem we finally decided to acquire all our measurements between VCCINT and the device and thereby solely record the power consumption of this pin, separated from all other activities on the other power pins.

To learn more about the internal configuration process we performed tests with bitstreams that were generated intentionally wrong, i.e., configuring an FPGA with them results in an unpredictable behavior. During the tests, the consumption of the FPGA spiked after a while, the device heated up and the configuration process effectively stopped: we proceeded no further in order to avoid damage to the device. These findings suggest that the configuration information is written to the SRAM memory of the configuration fabric instantly upon reception of each configuration block, even before the actual boot command is given and before the CRC checksum is properly verified.

The first step in the detection of the time period in which the decryption takes place was done by recording power traces during the loading of 64 bits of the bitstream. Two long traces have been collected: one from an encrypted bitstream and the other one from a plain bitstream.

Figure 3: Raw measurements of the power consumption at VCCINT during the time period of the decryption. Panels: (a) not encrypted, (b) encrypted.

Through comparison we spotted the key difference in power consumption after the second bit of the 64-bit block is clocked in. Fig. 3 depicts the power consumption of the FPGA for the case of an encrypted bitstream (at the bottom) and its unencrypted counterpart (at the top), revealing a clear increment in the dynamic power consumption occurring at this time instant only if the bitstream encryption feature is used.

Filtering the Measurements.

A strong oscillating signal with a frequency of 295 MHz can be noticed in both traces. Since it is also present in the unencrypted bitstream trace, we followed the intuition that it is not related to the decryption engine and tried to remove it by means of a band block filter. The latter is realized in the digital domain and suppresses the components of the signal at 295 MHz, employing a narrow Chebyshev type 2 window in order to minimize the effects of the aliasing on other harmonic components. The results of the filtering are depicted in Fig. 4: it is now possible to clearly distinguish the shape of the three DES executions, followed by the same peak in power consumption regarding what we suppose is the writeback operation of the 64-bit word into the configuration fabric.
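As a rough illustration of the band block step, the following sketch removes a 295 MHz tone from a signal sampled at 10 GSamples/s. It swaps in a simple second-order notch (biquad) rather than the Chebyshev type 2 design mentioned above; the quality factor of 5, the function names and the synthetic test tones are assumptions made for this example only.

```python
import math


def notch_coefficients(f0_hz, fs_hz, q):
    """Normalised coefficients of a second-order notch centred at f0_hz."""
    w0 = 2.0 * math.pi * f0_hz / fs_hz
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0)   # numerator
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)   # denominator (a1, a2)
    return b, a


def apply_biquad(samples, b, a):
    """Filter a sequence with a direct-form I biquad, starting from rest."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x0 in samples:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


FS_HZ = 10e9                      # acquisition rate used for the traces
N = 4000
b, a = notch_coefficients(295e6, FS_HZ, q=5.0)

tone_295 = [math.sin(2 * math.pi * 295e6 * i / FS_HZ) for i in range(N)]
tone_20 = [math.sin(2 * math.pi * 20e6 * i / FS_HZ) for i in range(N)]

# Skip the first half of the output so the filter transient has decayed.
rms_295 = rms(apply_biquad(tone_295, b, a)[N // 2:])
rms_20 = rms(apply_biquad(tone_20, b, a)[N // 2:])
```

In this synthetic setting the 295 MHz tone is suppressed while a 20 MHz component, which lies in the band where the DES activity is later found to reside, passes essentially unchanged.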

Figure 4: Filtered power consumption measured at VCCINT during the time period of the decryption. Panels: (a) not encrypted, with the writeback (WB) marked; (b) encrypted, with the three DES executions (DES-D, DES-E, DES-D) and the writeback (WB) marked.

In order to confirm that we did not discard any information relevant for further power analysis, we computed the variance of both the filtered measurements and also the part of the measurements that has been discarded by the filter, i.e., containing the frequency components that have been blocked by the band block filter. We recall that, since differential power analysis relies on exploiting the differences in the power consumption caused by different inputs, the time-wise variance computed over a significant number of traces is a reasonable index of the information contained in the signal.
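The time-wise variance used as an information index can be computed as follows. This is a minimal sketch with hypothetical names and a tiny made-up data set; each inner list stands for one aligned trace.

```python
from statistics import pvariance


def pointwise_variance(traces):
    """Variance across traces, computed separately for every time sample."""
    return [pvariance(column) for column in zip(*traces)]


# Three toy traces: only the middle sample depends on the processed data.
toy_traces = [
    [0.2, 1.0, 0.7],
    [0.2, 3.0, 0.7],
    [0.2, 5.0, 0.7],
]
variance_trace = pointwise_variance(toy_traces)
```

Samples whose value does not change with the input yield a variance of zero, whereas data-dependent samples stand out, which is the property exploited when comparing the kept and discarded parts of the signal.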

Figure 5 presents the computed variances for the time window pertaining to the decryption-encryption-decryption operations, based on processing 50,000 different inputs. As we can see, the variance of the part of the signal kept by the filter features three distinct peaks at the beginning of each DES execution and is higher during the encryption process, with respect to the lower values assumed before and after. In contrast, the variance of the discarded signal is practically flat, suggesting that no information relevant to the decryption process has been omitted. Therefore it is conceivable that the oscillating power consumption is actually the power profile of the PowerPC 405 core embedded in the FPGA. The core has a reference working frequency of 300 MHz which seems compatible with our measurements.

The analysis suggests the presence of a large buffer employed to store the result of a whole DES decryption (or encryption) operation, due to the high variance at the beginning of every DES computation. It is also possible to infer the duration of a single DES execution from the variance figure by simply computing the distance between the peaks: exactly one DES execution is encompassed between them. Through calculating this distance we obtain a computation time for the DES of 217 ns, corresponding to 651 ns for the execution of three full DES computations. This figure is well within the maximum bounds enforced by the JTAG input rate calculated in Sect. 3.1 which mandates the triple DES to be faster than 1.94 µs. The peak to peak amplitude of the decryption signal is roughly 2.8 mV, thus the measurement setup should have enough sensitivity to capture the differences in the consumptions of the decryptions of different plaintexts.

Figure 5: Variance of the power consumption for the part of the signal kept and discarded by the filter. Panels: (a) kept by the filter, (b) discarded by the filter.

3.3.1 Power-Profiling the Decryption Engine

Performing a power analysis attack requires knowing the architecture of the target device in order to select a power model which matches its power consumption characteristics. This information is missing when attacking a black box. Having identified the point in time where the triple DES decryption is performed by the device we thus proceed with investigating the correct model for the power consumption. During this profiling phase, which aims at determining the underlying hardware structure of the decryption engine, the secret keys used for the 3DES are known. Accordingly, we are able to fully compute the intermediate results of the cipher in every encryption/decryption round.

We selected three different keys in the Xilinx ISE tools and generated the corresponding key file and an encrypted bitstream. After configuring the keys into the FPGA using a Xilinx programming device we collected 50,000 power traces for the same number of ciphertexts (64-bit blocks)¹ while sending the encrypted bitstream to the target device by means of our customized configuration module. In order to perform a power analysis of the acquired measurements one needs to align them. For this purpose we applied the cross correlation alignment explained in Sect. 2.3.1 on the whole trace after applying the band block filter.
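One common form of cross-correlation alignment is sketched below; the procedure of Sect. 2.3.1 may differ in its details. The function names, the search window and the toy traces are assumptions of this example, and real traces would normally be mean-adjusted before correlating.

```python
def best_shift(reference, trace, max_shift):
    """Shift in [-max_shift, max_shift] that maximises the cross-correlation."""
    best, best_score = 0, float("-inf")
    for shift in range(-max_shift, max_shift + 1):
        score = 0.0
        for i, ref_value in enumerate(reference):
            j = i + shift
            if 0 <= j < len(trace):
                score += ref_value * trace[j]
        if score > best_score:
            best, best_score = shift, score
    return best


def align(trace, shift):
    """Move the trace `shift` samples to the left, padding with zeros."""
    n = len(trace)
    return [trace[i + shift] if 0 <= i + shift < n else 0.0 for i in range(n)]


reference = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0]
delayed = [0.0, 0.0, 0.0, 0.0, 1.0, 3.0, 1.0, 0.0]   # same shape, 2 samples late

shift = best_shift(reference, delayed, max_shift=3)
aligned = align(delayed, shift)
```

Every acquired trace would be aligned against one reference trace in this way before any statistics are computed across traces.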

Section 3.3 pinpoints the relevant part of the configuration process, i.e., when the second bit of a 64-bit ciphertext packet is sent to the FPGA. The acquired power traces must hence be related to one of the previous ciphertexts (depending on the size of the internal buffer of the device). In order to determine which ciphertext belongs to which power trace we calculated the intermediate values of every DES round, considering the first ciphertext block in the bitstream and the known key as inputs.

¹ When configuring this FPGA, a maximum of around 70,000 ciphertexts are available.

Figure 6: Correlation for hypothesizing the HD between the output of the first two rounds of each DES module

Figure 7: Correlation peaks for HD between the outputs of all the 16 rounds of the second DES

The power consumption model for the analysis was chosen based on an assumption about the internal DES hardware, i.e., the common design choice that the state of the cipher is saved in a buffer in every round. Consequently, we computed the correlation coefficient between the processed measurements and the HD of two consecutive round outputs of the first DES decryption, which is a typical hypothetical power model when attacking hardware. We tested different hypotheses about the relation between the ciphertexts and the power traces. For the hypothesis that the previously sent ciphertext is processed when sending the current 64-bit packet a strong correlation appears right after the trigger point (see Fig. 6). As a result, we could deduce that a 64-bit ciphertext block is processed two clock cycles after it is fully input into the device. To gain further insights into the hardware architecture we repeated the same for the input of the second and the third DES executions. The results are shown in Fig. 6: clearly, the correlation corresponding to each of the three DES runs can be spotted.

In order to further confirm the presence of a buffer between every round of the DES cipher, we ran a series of power analyses, this time hypothesizing the HD between the input and the output of every DES round. Fig. 7 depicts the correlation related to the 16 rounds of the second DES execution. The correlation peak for the inner buffers is clearly present and the position of the peak shifts forward in time, increasing with the number of the hypothesized round.

An important point here is the role of the initial permutation (IP) of DES. By trial-and-error we found that reliable correlation results are obtained when excluding IP and IP^−1 in the hypothesis. A self-evident hypothetical power model for attacking the first round of the first DES is HD = HW(IP(C) ⊕ Round(IP(C), K)), where C is a 64-bit ciphertext, “Round” represents the 64-bit DES round function, and K is the last 48-bit subkey of the first DES. Accordingly, all the correlation results shown above are based on this hypothetical power model.
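The Hamming-distance model and the correlation over time can be sketched as follows. The register values and the three-sample traces are invented purely for illustration (in the real analysis the old and new values come from the known-key DES computation), and all function names are hypothetical.

```python
import math


def hamming_weight(value: int) -> int:
    return bin(value).count("1")


def hamming_distance(old: int, new: int) -> int:
    """Number of register bits that toggle when `old` is overwritten by `new`."""
    return hamming_weight(old ^ new)


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlation_over_time(hypotheses, traces):
    """One correlation coefficient per time sample."""
    return [pearson(hypotheses, column) for column in zip(*traces)]


# Invented (old, new) register contents, one pair per trace.
transitions = [(0x00, 0xFF), (0x0F, 0x0F), (0xAA, 0x55), (0x01, 0x03), (0xF0, 0x3C)]
hd = [hamming_distance(old, new) for old, new in transitions]

# Toy traces: sample 0 is constant, sample 1 leaks the HD, sample 2 is unrelated.
unrelated = [0.3, 0.1, 0.2, 0.3, 0.1]
traces = [[1.0, 0.5 + 0.25 * h, u] for h, u in zip(hd, unrelated)]

corr = correlation_over_time(hd, traces)
leak_sample = max(range(len(corr)), key=lambda t: corr[t])
```

The time index with the highest coefficient marks the instant at which the hypothesized register update takes place, which is how the round-by-round shift of the peaks is observed.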

Observing that the correlation figures do not exhibit any high frequency features, we tried a more restrictive filtering strategy: an additional low-pass filter with a cut-off frequency of 50 MHz was applied to all the collected traces and the HD-based correlation analysis was repeated. The results did not show any significant change in the shape and the amplitude of the correlations, thus we proceed in the actual attack with low-pass filtered traces.

3.4 The V2 DES Engine Internals

We conclude the profiling with some assumptions about the hardware implementation of the 3DES engine inside the Virtex-II Pro. It is conceivable that it employs a round function executing a round of the cipher per clock cycle, since DES employs 8 different S-boxes that cannot be shared during a round computation. The correlation peaks in Fig. 7 moving forward in time with the targeted round of DES support this assumption. The corresponding hypothetical architecture is depicted in Fig. 8. Due to the Feistel structure of the cipher, the encryption and decryption can be realized by 48 successive rounds, supplying the subkeys in the correct order, and applying the IP before the start of the first round and IP^−1 after the last round.

Figure 8: Our hypothetical architecture of the internal triple DES module of Virtex-II Pro FPGA (block diagram; the signal labels comprise 64-bit data paths, a 48-bit subkey input, CLK and start)

3.5 Extracting Keys from the Virtex-II

After choosing the proper information leakage model in the previous section, we now proceed to perform the attack in a real-world scenario: this time, an FPGA with an unknown set of keys is attacked. During a normal power-up of the target device, the adversary measures the power traces and collects the corresponding ciphertexts contained in the bitstream. Since the bitstream is encrypted with a secure cipher, even the simplest design results in the ciphertexts being uniformly distributed. This allows the attacker to measure the encryption operation of a whole codebook of plaintexts for each reasonably sized key hypothesis.

In the case of DES, each 48-bit subkey can be divided into eight 6-bit parts, since the value of a single bit of the output of a DES round is influenced by the 6 key bits added to the input of the S-box computing it, because of the input width of the DES S-boxes. This allowed us to choose a 6-bit wide key hypothesis; to reveal the first round key the attack therefore has to be repeated eight times, once for each key portion considered in the attack. After the 48 bits of the first round key have been recovered, the remaining 8 bits of the 56-bit DES key are exposed by attacking the second round of the DES operation. We performed a full key recovery involving 50,000 traces by hypothesizing the HD between a single bit of the output from a DES round and the previous value being stored in the round buffer (for simplicity consider a bit of the round buffer in Fig. 8). Once the 56-bit key for the first DES decryption is known the input for the subsequent DES encryption is computed and its 56-bit key (and finally that of the last DES decryption) is revealed similarly.
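The recovery of one 6-bit key portion can be illustrated on simulated data. The sketch below is a simplification and not the attack code: it uses only DES S-box S1, omits the expansion and permutation layers, predicts the Hamming distance of all four S-box output bits rather than a single bit, and generates the leakage itself with Gaussian noise. The subkey value, the noise level, the trace count and all names are assumptions of the example.

```python
import math
import random

# DES S-box S1 (rows are selected by the outer bits, columns by the inner four).
S1 = [
    [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7],
    [0, 15, 7, 4, 14, 2, 13, 1, 10, 6, 12, 11, 9, 5, 3, 8],
    [4, 1, 14, 8, 13, 6, 2, 11, 15, 12, 9, 7, 3, 10, 5, 0],
    [15, 12, 8, 2, 4, 9, 1, 7, 5, 11, 3, 14, 10, 0, 6, 13],
]


def sbox1(x6: int) -> int:
    row = ((x6 >> 4) & 0b10) | (x6 & 1)
    col = (x6 >> 1) & 0xF
    return S1[row][col]


def predicted_hd(x6: int, mask4: int, guess: int) -> int:
    """Toggling bits if the buffer bits change by mask4 XOR S1(x6 XOR guess).

    mask4 stands for the known part of the update (old buffer bits XOR the
    left-half bits of the Feistel round).
    """
    return bin(mask4 ^ sbox1(x6 ^ guess)).count("1")


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx and vy else 0.0


rng = random.Random(1)
TRUE_SUBKEY = 0b101101          # secret 6-bit chunk used by the simulation
N_TRACES = 3000

inputs = [(rng.randrange(64), rng.randrange(16)) for _ in range(N_TRACES)]
leakage = [predicted_hd(x, m, TRUE_SUBKEY) + rng.gauss(0.0, 0.5) for x, m in inputs]

# Rank all 64 guesses by the correlation between prediction and leakage.
scores = [
    pearson([predicted_hd(x, m, guess) for x, m in inputs], leakage)
    for guess in range(64)
]
recovered = max(range(64), key=lambda guess: scores[guess])
```

In the real attack the same ranking is repeated for each of the eight S-boxes and evaluated at the time sample where the leakage was located during profiling.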

To quantify the minimum sampling frequency required for a successful key recovery, we further investigated whether the attack still works if the traces are sampled at a considerably lower frequency. Since we had applied a low-pass filter at 50 MHz, we applied a 100-fold decimation to the traces, i.e., used only every hundredth sampling point and discarded the rest, and repeated all the attacks in order to see if the reduced traces still contain enough information. As a result, all the attacks mentioned up to now are repeatable on the decimated traces, thus yielding a significant speedup in the required computation time and a reduction in the memory footprint.

It is possible to recover a 6-bit DES subkey in less than 4 seconds of computation time on a common desktop PC, with a memory footprint smaller than 20 MB. The whole 112-bit key of a 2-key triple DES is hence revealed by our attack in roughly two minutes and the 168-bit key of a 3-key triple DES in three minutes. The trace decimation also impacts the minimum sampling frequency required for the digital oscilloscope: since the decimated traces are now sampled at a rate of 100 MS/s this can be regarded as a new lower bound for the required sampling frequency of the oscilloscope.
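The decimation and the resulting sampling rate can be written down in a few lines. The names are ours; the 10 GSamples/s acquisition rate, the factor of 100 and the 50 MHz cut-off are the values stated in the text.

```python
ACQUISITION_RATE_HZ = 10e9   # rate at which the traces were recorded
DECIMATION_FACTOR = 100      # keep every hundredth sample
LOWPASS_CUTOFF_HZ = 50e6     # low-pass filter applied before decimating


def decimate(trace, factor):
    """Keep every `factor`-th sample and discard the rest."""
    return trace[::factor]


effective_rate_hz = ACQUISITION_RATE_HZ / DECIMATION_FACTOR   # 100 MS/s
nyquist_hz = effective_rate_hz / 2                            # 50 MHz

toy_trace = list(range(1000))
reduced = decimate(toy_trace, DECIMATION_FACTOR)
```

Plain subsampling of this kind is only reasonable because the traces were low-pass filtered beforehand: the 50 MHz cut-off coincides with the Nyquist frequency of the 100 MS/s decimated signal.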

3.5.1 Practical Results

Figure 9: Results of the DPA attack on 50,000 traces employing non-decimated low-pass filtered traces targeting the (a) first (b) second (c) third DES execution (correct key guess in black, the others in gray)

So far, the same set of 50,000 power traces (acquired from loading a single bitstream into the FPGA) was used to profile the power consumption characteristics of the device and to perform the actual attacks. To determine the minimum number of traces required to reveal the complete triple DES key we have examined all subkeys of all three DES executions accordingly. As illustrated in Fig. 9 the power consumption right after the trigger signal is disturbed by the I/O activity, which makes a key-recovery attack on the first DES execution harder than on the others. The result of the attack over the number of traces for the worst case is shown in Fig. 10, indicating that with the current attack setup a minimum of 25,000 measurements is required for the correct hypothesis to emerge, albeit the statistical confidence margin is not high.

The minimum number of required traces is largely affected by the measurement setup and environmental noise. Depending on these parameters the attacker may therefore need more traces. Our target FPGA, i.e., XC2VP7, is one of the smallest variants on the market. Still, the size of its configuration bitstream (independent of the size of the design) suffices to acquire more than 70,000 measurements of different ciphertexts being processed by the 3DES engine. Hence, loading a single bitstream into our target FPGA during a power-up suffices to acquire close to three times as many measurements as required for a successful recovery of the full secret key (70,000 versus 25,000); this indicates that our attack can be conducted on real-world products even in a very noisy measurement environment, and while large scale ASIC components are embedded in the FPGA chip. Note that repeating the measurements for more power-ups does not increase the number of different ciphertexts; still, averaging over the power-ups allows reducing the noise level.

4. IMPLICATIONS

In the previous section we have shown that the bitstream encryption feature aiming to provide IP protection can be circumvented by extracting the secret keys used for the encryption via power analysis. The complete content of any Virtex-II Pro protected with bitstream encryption can fall into the hands of a competitor or criminal; this may imply system-wide damage, if IP such as encryption schemes and keys programmed into the FPGA, or similar secrets, are misused or become public. The attacks hence have a devastating impact on the security of products employing bitstream encryption in the real world.

For performing our key-recovery, an attacker needs knowledge about side-channel cryptanalysis, a basic lab measurement setup and physical access to the target device in which the Virtex-II Pro FPGA is embedded. The only modification required to be performed on the product to be attacked is removing the capacitors which have a negative effect on power measurements. Also, the lithium battery must remain connected to the FPGA to ensure that the secret keys are not zeroed. The adversary connects her measurement equipment and powers up the product-under-attack once, while monitoring the loading of the encrypted bitstream. Finally, she performs the statistical evaluation on a standard PC to recover the full secret key of the 3DES. Applying the methods described in this paper, the keys used for the bitstream encryption can be recovered with modest efforts in less than one hour, using the measurements of only one single configuration process.

Figure 10: Result of the DPA attacks targeting the first DES as a function of number of traces (correct key guess in black, the others in gray)

After the key recovery the attacker can decrypt the protected bitstream from the Virtex-II Pro FPGA, and hence possesses its complete content. As a consequence, she can produce a clone of the original FPGA on the basis of an empty device by simply configuring it with the extracted bitstream, even without using the encryption feature, for example to replicate the product of a competitor.

Furthermore, by reverse-engineering the internal wiring of the FPGA from the bitstream a stolen design can be analyzed, the internal secrets extracted and used for malicious purposes. The stolen design could even be improved and disguised as an own product that outrivals the original. While Xilinx tries to establish security by obscurity by keeping the exact mapping of the bitstream to the internal circuitry of their product confidential, there already exist methods to recover the original design of an FPGA from a bitstream [24, 30].

In addition to lost IP, in a security-critical environment reprogramming the attacked FPGA with an ill-intended modified code, e.g., to cause malfunctions or to implement a hardware trojan unnoticed [21], is a particularly damaging option. In the following we illustrate possible implications that our attacks may have by means of two examples from the commercial sector², without naming manufacturers.

4.1 Real-World Example 1: Set Top Box

Set-top boxes are widely used in the field for receiving digital TV and radio programmes and pay-per-view special events; they allow watching recent movies on a subscription basis and enable video on demand, i.e., remotely renting a video including the ability to pause, rewind, and fast forward. A set-top box is an ideal candidate for using FPGAs with bitstream encryption: in addition to the computationally highly demanding demodulating and decompressing algorithms for descrambling the transmitted TV and radio programmes, the manufacturers often employ their proprietary encryption schemes, e.g., for the above mentioned applications. The secure bitstream encryption feature makes it possible to regularly update and improve the firmware running on the set-top boxes in the households, for example by cable or a satellite link, and establish corresponding business models without disclosing the descrambling schemes or secret keys to potential attackers.

Our key-recovery attack puts set-top boxes relying on the investigated or similar FPGAs using bitstream encryption at a high risk: an attacker extracting the firmware of a set-top box knows amongst others the scrambling scheme and secret keys, can gain access to all digital content without paying and is thus able to circumvent the above mentioned business models. A criminal can make high financial gain (and cause high financial losses for the service provider) by producing new set-top boxes or modifying the firmware of existing set-top boxes to enable the usage of the services free of cost by the customers.

² For obvious reasons we do not cover military applications here.

4.2 Real-World Example 2: Router

Network routers redirect data traffic with a speed of hundreds of Gigabits per second in local networks and often constitute the interface to the outside world, e.g., the Internet, for companies, government agencies and private households. In addition to routing data, an FPGA in common products often realizes security-relevant functions such as a firewall to separate private local networks from other insecure ones. In case of a bug in the firmware or if a security fix is required, some routers can be updated remotely; again, the bitstream encryption is the basis for establishing this functionality. Assuming a router that is based on a Virtex-II Pro, an attacker having one-time physical access to it can perform our key-recovery attack. From there on, she can remotely initiate firmware updates, e.g., via the Internet, to modify the functionality of the router according to her demands.

A denial-of-service attack that aims to make the device or other connected equipment malfunction, or to destroy it, is one option. More likely, a modified firmware can be used to open a covert channel that allows the adversary to spy on and access data from the internally connected computers and other devices, e.g., by implementing a Trojan horse that secretly forwards all internal traffic to the adversary. The user typically has no means to verify whether the firmware is original or has been manipulated; hence, depending on the location where the manipulated router is used, a modified firmware can have a devastating impact on the privacy and security, and even the safety, of the attacked individual or company.

4.3 Scope of our Findings

As university researchers, we intend with this paper to inform about possible vulnerabilities when using bitstream encryption and to warn the end-users of products incorporating this feature of possible damage. We believe our findings can have fairly severe real-world implications (depending on the commercial devices in which Virtex-II Pro FPGAs are used). Furthermore, it seems highly likely that certain determined attackers, e.g., foreign intelligence services, are already capable of extracting and altering FPGA designs using power analysis techniques. Since the success of a power analysis attack in general does not depend on the cipher employed, it is conceivable that similar attacks can be applied to newer generations of FPGAs and to devices from different manufacturers that employ different ciphers, such as the Virtex-4 family or Altera products using AES-256 in their bitstream encryption schemes [7, 29].

5. CONCLUSION

We presented the first attacks in the literature that target the bitstream encryption of FPGAs. By profiling the power consumption of the target FPGA, we reverse-engineered all relevant details of the security feature and pinpointed the time instant at which the ciphertext blocks of the bitstream are decrypted by dedicated hardware contained in the FPGA. We identified the appropriate power model for attacking the triple DES engine by means of power analysis and deduced the internal architecture of the hardware. Our proposed techniques for the digital signal processing of the power measurements enable a highly efficient power analysis attack that allows the full secret key of the triple DES to be extracted in only 3 minutes of computation time from only 25,000 measurements, obtainable with a single boot-up of the device. Our findings indicate that the attacks are possible using a low-cost oscilloscope with a sample rate as low as 100 MS/s.
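As an illustration of the kind of correlation-based key recovery summarized above, the following toy simulation may help. It is a minimal sketch under our own assumptions, not the measurement setup or the exact power model used in this work: the traces are synthetic, the leakage is modelled as the Hamming distance between a known previous 4-bit register value and the output of DES S-box S1 plus Gaussian noise, and all function names (`sbox1`, `simulate`, `recover_subkey`) as well as the trace count and noise level are hypothetical choices.

```python
import math
import random

# DES S-box S1 as specified in FIPS 46-3 (4 rows x 16 columns).
S1 = [
    [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7],
    [0, 15, 7, 4, 14, 2, 13, 1, 10, 6, 12, 11, 9, 5, 3, 8],
    [4, 1, 14, 8, 13, 6, 2, 11, 15, 12, 9, 7, 3, 10, 5, 0],
    [15, 12, 8, 2, 4, 9, 1, 7, 5, 11, 3, 14, 10, 0, 6, 13],
]


def sbox1(v):
    """6-bit input: the two outer bits select the row, the inner four the column."""
    row = ((v >> 4) & 0b10) | (v & 1)
    col = (v >> 1) & 0xF
    return S1[row][col]


def hamming_distance(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def simulate(key, n_traces, noise_sigma, rng):
    """Synthetic leakage (assumption): HD(previous register, S1(x ^ key)) + noise."""
    inputs = [rng.randrange(64) for _ in range(n_traces)]
    prev = [rng.randrange(16) for _ in range(n_traces)]
    traces = [
        hamming_distance(p, sbox1(x ^ key)) + rng.gauss(0.0, noise_sigma)
        for x, p in zip(inputs, prev)
    ]
    return inputs, prev, traces


def recover_subkey(inputs, prev, traces):
    """Correlate the predicted Hamming distance of every 6-bit guess with the traces."""
    scores = []
    for guess in range(64):
        predicted = [hamming_distance(p, sbox1(x ^ guess)) for x, p in zip(inputs, prev)]
        scores.append(abs(pearson(predicted, traces)))
    best = max(range(64), key=lambda g: scores[g])
    return best, scores


rng = random.Random(2011)   # fixed seed so the simulation is reproducible
secret = 0b101101           # hypothetical 6-bit round-key chunk
inputs, prev, traces = simulate(secret, 3000, 2.0, rng)
best, scores = recover_subkey(inputs, prev, traces)
print("recovered subkey matches:", best == secret)
```

In this toy setting the correct 6-bit subkey stands out as the guess with the largest absolute correlation. A real attack would apply the same procedure to measured and preprocessed traces, one S-box and one round key at a time.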

In particular, we have successfully demonstrated our attack on the bitstream decryption of a Virtex-II Pro XC2VP7 FPGA manufactured by the market leader Xilinx. We are able to recover all three different keys used by its triple DES module from a single power-up of the device in a real-world scenario. The lithium battery for the key storage, which provides extra security according to Xilinx, does not have to be removed for the attack. It is highly conceivable that similar attacks can be applied to other series of FPGAs, e.g., from other manufacturers.

Besides the obscurity that results from Xilinx keeping the details of their bitstream files secret, we encountered no countermeasures against side-channel analysis. This is an alarming fact considering the protection of many devices deployed in the field: an attacker knowing the secret key used for the encryption of the bitstream can obtain all secrets and intellectual property contained in commercial products, e.g., proprietary encryption schemes or processing algorithms.

As a consequence of our attacks, cloning of the products protected by the bitstream encryption scheme becomes straightforward. After reverse engineering the content of the FPGA from the bitstream, a competitor could market improved products that outrival the original. Much worse, the content of commercial products can be updated remotely, e.g., via the Internet, with maliciously modified new firmware. This poses a severe threat to the reliability of the products, puts the privacy of individuals and companies at high risk, and further makes it possible to infect the devices with embedded malware.

6. REFERENCES

[1] Defense Science Board. http://www.acq.osd.mil/dsb/.

[2] Side-channel Attack Standard Evaluation Board (SASEBO). Further information is available via http://www.rcis.aist.go.jp/special/SASEBO/.

[3] Xilinx ISE Design Suite. http://www.xilinx.com/tools/designtools.htm.

[4] IEEE Standard Test Access Port and Boundary-Scan Architecture. IEEE Std 1149.1-2001, pages i–200, 2001.

[5] D. Abraham, G. Dolan, G. Double, and J. Stevens. Transaction Security System. In IBM Systems Journal 30, pages 206–229, 1991.

[6] G. Agosta, A. Barenghi, F. D. Santis, and G. Pelosi. Record Setting Software Implementation of DES Using CUDA. Information Technology: New Generations, Third International Conference on, pages 748–755, 2010.

[7] ALTERA. Using the Design Security Feature in Stratix II and Stratix II GX Devices (AN 341 version 2.3). Technical report, August 2009. http://www.altera.com/literature/an/an341.pdf.

[8] ATK. XM25 Counter Defilade Target Engagement System. http://www.atk.com/customer_solutions_missionsystems/documents/sw_iw_xm25.pdf, May 2009. Post “FPGAs in interesting places – the XM25 Airburst Weapon System” by Saar Drimer on www.fpgasecurity.com.

[9] A. Barenghi, G. Pelosi, and Y. Teglia. Improving first order differential power attacks through digital signal processing. In ACM-SIGSAC International Conference on Security of Information and Networks, pages 124–133. ACM, 2010.

[10] A. Barenghi, G. Pelosi, and Y. Teglia. Information leakage discovery techniques to enhance secure chip design. In C. A. Ardagna and J. Zhou, editors, WISTP, volume 6633 of Lecture Notes in Computer Science, pages 128–143. Springer, 2011.

[11] E. Brier, C. Clavier, and F. Olivier. Correlation Power Analysis with a Leakage Model. In CHES 2004, volume 3156 of LNCS, pages 16–29. Springer, 2004.

[12] W. J. Broad, J. Markoff, and D. E. Sanger. Israeli Test on Worm Called Crucial in Iran Nuclear Delay. Technical report, New York Times, January 2011. http://www.nytimes.com/2011/01/16/world/middleeast/16stuxnet.html.

[13] O. Coudert. Why FPGA startups keep failing, 2009. FPGA market shares according to Gartner Inc, 2008.

[14] S. Drimer. Security for volatile FPGAs. Technical Report UCAM-CL-TR-763, University of Cambridge, Computer Laboratory, November 2009. ISSN 1476-2986. http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-763.pdf.

[15] T. Eisenbarth, T. Kasper, A. Moradi, C. Paar, M. Salmasizadeh, and M. T. M. Shalmani. On the Power of Power Analysis in the Real World: A Complete Break of the KeeLoq Code Hopping Scheme. In CRYPTO 2008, volume 5157 of LNCS, pages 203–220. Springer, 2008.

[16] E. Peeters, F.-X. Standaert, and J.-J. Quisquater. Power and Electromagnetic Analysis: Improved Model, Consequences and Comparisons. Integr. VLSI J., 40(1):52–60, 2007.

[17] T. Guneysu, T. Kasper, M. Novotny, C. Paar, and A. Rupp. Cryptanalysis with COPACOBANA. IEEE Transactions on Computers, 57(11):1498–1513, 2008.

[18] P. Kocher, J. Jaffe, and B. Jun. Differential Power Analysis. In CRYPTO 99, volume 1666 of LNCS, pages 388–397. Springer, 1999.

[19] R. Krueger. Application Note XAPP766: Using High Security Features in Virtex-II Series FPGAs. Technical report, XILINX, 2004. http://www.xilinx.com/support/documentation/application_notes/xapp766.pdf.

[20] A. Lesea. IP Security in FPGAs, White Paper WP261. Technical report, XILINX, February 2007.

[21] L. Lin, M. Kasper, T. Guneysu, C. Paar, and W. Burleson. Trojan Side-Channels: Lightweight Hardware Trojans through Side-Channel Engineering. In CHES, volume 5747 of LNCS, pages 382–395. Springer, 2009.

[22] A. Menezes, P. C. van Oorschot, and S. A. Vanstone. Handbook of Applied Cryptography. CRC Press, 1996.

[23] NIST. FIPS-46-3: Data Encryption Standard (DES), 1999.

[24] J.-B. Note and E. Rannaud. From the bitstream to the netlist. In M. Hutton and P. Chow, editors, 16th International Symposium on Field Programmable Gate Arrays, FPGA 2008. ACM, 2008.

[25] S. B. Ors, E. Oswald, and B. Preneel. Power-Analysis Attacks on an FPGA - First Experimental Results. In CHES 2003, volume 2779 of LNCS, pages 35–50. Springer, 2003.

[26] Recurity Labs. Embedded Analysis. 27th Chaos Communication Congress, Dec. 2010. http://events.ccc.de/congress/2010/wiki/Embedded_Analysis.

[27] F.-X. Standaert, S. B. Ors, J.-J. Quisquater, and B. Preneel. Power Analysis Attacks Against FPGA Implementations of the DES. In FPL 2004, volume 3203 of LNCS, pages 84–94. Springer, 2004.

[28] A. Telikepalli. Is Your FPGA Design Secure? XCell Journal, XILINX, Fall 2003.

[29] C. W. Tseng. Lock Your Designs with the Virtex-4 Security Solution. XCell Journal, XILINX, Spring 2005.

[30] T. J. Wollinger, J. Guajardo, and C. Paar. Security on FPGAs: State-of-the-art implementations and attacks. ACM Transactions in Embedded Computing Systems (TECS), 3(3):534–574, 2004.

[31] XILINX. Virtex-2 Platform FPGA User Guide (UG002 version 2.2). Technical report, November 2007. http://www.xilinx.com/support/documentation/user_guides/ug002.pdf.

[32] XILINX. Virtex-II Pro and Virtex-II Pro X FPGA User Guide. Technical report, 2007. http://www.xilinx.com/support/documentation/user_guides/ug012.pdf.

[33] XILINX. Virtex-II Pro Platform FPGAs: Complete Data Sheet (DS 083 version 4.7). Technical report, November 2007. http://www.xilinx.com/support/documentation/data_sheets/ds083.pdf.
