Efficient Architecture Design for the AES-128 Algorithm on Embedded Systems

Rupam Mondal, Hau Ngo, James Shey, Ryan Rakvic, Owens Walker, Dane Brown
United States Naval Academy, Electrical and Computer Engineering Department
Annapolis, United States of America
{m204302,ngo,shey,rakvic,owalker,dabrown}@usna.edu

ABSTRACT

Many applications make use of the edge devices in wireless sensor networks (WSNs), including video surveillance, traffic monitoring and enforcement, personal and health care, gaming, habitat monitoring, and industrial process control. However, these edge devices are resource-limited embedded systems that require a low-cost, low-power, and high-performance encryption/decryption solution to prevent attacks such as eavesdropping, message modification, and impersonation. This paper proposes a field-programmable gate array (FPGA) based design and implementation of the Advanced Encryption Standard (AES) algorithm for encryption and decryption using a parallel-pipeline architecture with a data forwarding mechanism that efficiently utilizes on-chip memory modules and massive parallel processing units to support a high throughput rate. Hardware designs that optimize the implementation of the AES algorithm are proposed to minimize resource allocation and maximize throughput. These designs are shown to outperform existing solutions in the literature. Additionally, a rapid prototype of a complete system-on-chip (SoC) solution that employs the proposed design on a configurable platform has been developed and proven to be suitable for real-time applications.

CCS CONCEPTS

• Hardware → Reconfigurable logic and FPGAs; • Security and privacy → Embedded systems security.

KEYWORDS

AES, cybersecurity, FPGA, embedded systems

ACM Reference Format:
Rupam Mondal, Hau Ngo, James Shey, Ryan Rakvic, Owens Walker, Dane Brown. 2020. Efficient Architecture Design for the AES-128 Algorithm on Embedded Systems. In 17th ACM International Conference on Computing Frontiers (CF ’20), May 11–13, 2020, Catania, Italy. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3387902.3392624

This paper is authored by an employee(s) of the United States Government and is in the public domain. Non-exclusive copying or redistribution is allowed, provided that the article citation is given and the authors and agency are clearly identified as its source.
CF ’20, May 11–13, 2020, Catania, Italy
2020. ACM ISBN 978-1-4503-7956-4/20/05.
https://doi.org/10.1145/3387902.3392624

1 INTRODUCTION

Edge devices in a typical wireless sensor network (WSN) are used to gather information, process the data, and communicate with each other [14]. These devices are used in many domains, including video surveillance, traffic monitoring and enforcement, personal and health care, gaming, habitat monitoring, and industrial process control [1]. Each edge device is typically an embedded system that integrates different electronic sensors with an on-chip processing unit [6]. These devices are susceptible to security threats such as eavesdropping, message modification, and impersonation [10]. One of the most common methods to protect these edge devices against those threats is to encrypt the data with a standard encryption algorithm. However, these encryption algorithms generally require significant computing resources such as CPU time, memory usage, and power consumption [13]. Accordingly, data encryption in edge devices can be a challenging problem because of the limited resources in these embedded systems.

There are many algorithms that are suitable for massive data encryption, including the Advanced Encryption Standard (AES), Triple Data Encryption Algorithm (3DES), TwoFish, and Rivest-Shamir-Adleman (RSA). Bahnasawi et al. provided a detailed study across many application-specific implementations of these algorithms, and they reported that the AES algorithm is the most suitable algorithm for embedded devices [2].

The work presented in this paper provides integrated designs and implementations of the AES algorithm that support both encryption and decryption capabilities on a field-programmable gate array (FPGA) device. The proposed design in this paper uses a parallel-pipeline architecture with a data forwarding mechanism that efficiently utilizes on-chip memory modules tightly coupled with the massive parallel processing units to support a high throughput rate. In addition, novel architecture designs that optimize the implementation of the AES algorithm to minimize resource consumption and maximize throughput are presented in this paper. The proposed architecture is designed to smoothly integrate with an embedded system via standard memory-mapped or streaming protocol of a standard system-on-chip-based (SoC) FPGA platform. This approach allows considerable flexibility for deployment while maintaining a high-performance capability without requiring fabrication of a custom integrated circuit (IC). The proposed design achieves both superior throughput when compared to other modern designs and efficient utilization of the available computational resources.

This paper is organized as follows. A brief review of the AES-128 algorithm is provided in Section 2 while Section 3 discusses related work. The proposed design of a high-performance architecture for the AES-128 algorithm is presented in Section 4, and the results are discussed in Section 5. The prototyping of a real-time encryption/decryption unit in an embedded system is discussed in Section 6. Finally, conclusions and potential future work are provided in Section 7.


CF ’20, May 11–13, 2020, Catania, Italy Rupam Mondal et al.

2 OVERVIEW OF THE AES-128 ALGORITHM

The AES algorithm is a symmetric block cipher used to encrypt and decrypt digital data. The original data is called plaintext, and the encrypted data is called ciphertext. A block of data and a cipher key are 128 bits each for the AES-128 standard [18]. AES-128 is secure enough to protect information up to the secret level [7]. Information is classified as secret by the U.S. Government if its unauthorized disclosure can cause serious damage to national security [16]. A block is represented in an array of 16 bytes known as the State [18]. In this paper, each byte in an input block is referenced using the form $in_n$, where $n$ is a value between 0 and 15. An input block is presented in the following form: $in_{15}\, in_{14}\, in_{13} \ldots in_0$. Similarly, each byte in an output block is referenced using the form $out_n$, and an output block is presented in the following form: $out_{15}\, out_{14}\, out_{13} \ldots out_0$. Each byte in a key is referenced using the form $k_n$, and a key is presented in the following form: $k_{15}\, k_{14}\, k_{13} \ldots k_0$. Figure 1 shows the overall structure of the AES algorithm.

AES-128 performs 10 rounds of encryption/decryption. Each round uses a different key called a Round Key. The 10 Round Keys are generated by a Key Scheduler using the provided cipher key [18].
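The Round Key generation can be illustrated in software. The following is a minimal Python sketch of the key expansion defined in FIPS-197 [18], not the hardware Key Scheduler used in this paper. The names `gf_mul`, `build_sbox`, and `expand_key` are hypothetical, and the byte order follows FIPS-197, which is an assumption that may differ from the indexing convention used above. The returned list holds the cipher key at index 0, followed by the 10 Round Keys.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(256) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def build_sbox() -> list:
    """Build the S-box: multiplicative inverse followed by the affine map."""
    sbox = []
    for value in range(256):
        # Brute-force inverse search; 0 has no inverse and maps to 0.
        inv = next((c for c in range(256) if gf_mul(value, c) == 1), 0)
        out = 0x63
        for shift in range(5):
            out ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(out)
    return sbox


SBOX = build_sbox()


def expand_key(key: bytes) -> list:
    """Return [cipher key, Round Key 1, ..., Round Key 10] for AES-128."""
    if len(key) != 16:
        raise ValueError("AES-128 uses a 16-byte cipher key")
    words = [key[4 * i:4 * i + 4] for i in range(4)]
    rcon = 1
    for i in range(4, 44):
        temp = words[i - 1]
        if i % 4 == 0:
            temp = temp[1:] + temp[:1]                # rotate word left by one byte
            temp = bytes(SBOX[b] for b in temp)       # substitute each byte
            temp = bytes([temp[0] ^ rcon]) + temp[1:]  # add the round constant
            rcon = gf_mul(rcon, 2)
        words.append(bytes(x ^ y for x, y in zip(words[i - 4], temp)))
    return [b"".join(words[4 * r:4 * r + 4]) for r in range(11)]


if __name__ == "__main__":
    for index, round_key in enumerate(expand_key(bytes(range(16)))):
        print(index, round_key.hex())
```

In a design that embeds the Round Keys in hardware, a routine of this kind would run once at design time; in a design that accepts keys at runtime, the processor would run it and pass the results to the encryption unit.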

In the Add Round Key (AddRoundKey) operation, Galois Field of order 256 addition, or GF(256) addition, is performed to add the input block to the Round Key. A bitwise XOR is used to perform GF(256) addition [18]. A bitwise XOR takes two bit patterns of equal length and performs the logical exclusive OR operation on each pair of corresponding bits.
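As a brief illustration of AddRoundKey, the sketch below XORs a 16-byte block with a 16-byte Round Key in Python. The function name `add_round_key` is hypothetical, and the sample values are taken from the worked example in FIPS-197 [18]. Because XOR is its own inverse, the same function serves both encryption and decryption.

```python
def add_round_key(block: bytes, round_key: bytes) -> bytes:
    """GF(256) addition of a block and a Round Key, i.e. a bytewise XOR."""
    if len(block) != 16 or len(round_key) != 16:
        raise ValueError("AES-128 blocks and Round Keys are 16 bytes each")
    return bytes(b ^ k for b, k in zip(block, round_key))


if __name__ == "__main__":
    block = bytes.fromhex("3243f6a8885a308d313198a2e0370734")
    key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")
    mixed = add_round_key(block, key)
    print(mixed.hex())
    # Applying the same key a second time recovers the original block.
    print(add_round_key(mixed, key) == block)
```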

In the Substitute Bytes (SubBytes) operation, each input byte is substituted by a different value defined by a substitution table (S-box). The input byte value is used as the index to look up the output byte value in the S-box. The Inverse Substitute Bytes (InvSubBytes) operation is the same as the SubBytes operation with the exception that it uses the inverse S-box instead [18].
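The table lookup can be sketched as follows. Rather than listing all 256 entries, this Python sketch derives the S-box from its algebraic definition in FIPS-197 [18] (multiplicative inverse in GF(256) followed by an affine map) and builds the inverse table by swapping index and value. The helper names are hypothetical; a hardware design such as the one in this paper would store the finished tables in memory rather than compute them.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(256) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def build_sbox() -> list:
    """Build the 256-entry S-box from its definition."""
    sbox = []
    for value in range(256):
        inv = next((c for c in range(256) if gf_mul(value, c) == 1), 0)
        out = 0x63
        for shift in range(5):
            # XOR the inverse with its left rotations by 0..4 bits.
            out ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(out)
    return sbox


SBOX = build_sbox()
INV_SBOX = [0] * 256
for index, value in enumerate(SBOX):
    INV_SBOX[value] = index


def sub_bytes(block: bytes) -> bytes:
    """SubBytes: each byte is the index into the S-box."""
    return bytes(SBOX[b] for b in block)


def inv_sub_bytes(block: bytes) -> bytes:
    """InvSubBytes: the same lookup with the inverse S-box."""
    return bytes(INV_SBOX[b] for b in block)


if __name__ == "__main__":
    sample = bytes(range(16))
    print(sub_bytes(sample).hex())
    print(inv_sub_bytes(sub_bytes(sample)) == sample)
```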

In the Shift Rows (ShiftRows) operation, each row of the State array is cyclically shifted left by a set amount. The 1st row is unchanged, the 2nd row is cyclically shifted left by one position, the 3rd row is cyclically shifted left by two positions, and the 4th row is cyclically shifted left by three positions. The Inverse Shift Rows (InvShiftRows) operation is the same as the ShiftRows operation with the exception that each row is cyclically shifted right instead [18].
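The row shifts can be written as a pure index permutation. The Python sketch below assumes the column-major State layout of FIPS-197 [18], in which byte `r + 4*c` sits in row `r` and column `c`; this layout and the function names are assumptions for illustration and may not match the byte numbering of this paper's figures.

```python
def shift_rows(state: bytes) -> bytes:
    """Cyclically shift row r of the 4x4 State left by r positions."""
    out = bytearray(16)
    for col in range(4):
        for row in range(4):
            out[row + 4 * col] = state[row + 4 * ((col + row) % 4)]
    return bytes(out)


def inv_shift_rows(state: bytes) -> bytes:
    """Cyclically shift row r of the 4x4 State right by r positions."""
    out = bytearray(16)
    for col in range(4):
        for row in range(4):
            out[row + 4 * col] = state[row + 4 * ((col - row) % 4)]
    return bytes(out)


if __name__ == "__main__":
    state = bytes(range(16))
    print(list(shift_rows(state)))
    print(inv_shift_rows(shift_rows(state)) == state)
```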

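The Mix Columns (MixColumns) operation on each column of the State array is achieved using

\[
\begin{bmatrix} d_0 \\ d_1 \\ d_2 \\ d_3 \end{bmatrix}
=
\begin{bmatrix}
2 & 3 & 1 & 1 \\
1 & 2 & 3 & 1 \\
1 & 1 & 2 & 3 \\
3 & 1 & 1 & 2
\end{bmatrix}
\begin{bmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \end{bmatrix}
\tag{1}
\]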

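and the Inverse Mix Columns (InvMixColumns) operation on each column of the State array is achieved using

\[
\begin{bmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \end{bmatrix}
=
\begin{bmatrix}
14 & 11 & 13 & 9 \\
9 & 14 & 11 & 13 \\
13 & 9 & 14 & 11 \\
11 & 13 & 9 & 14
\end{bmatrix}
\begin{bmatrix} d_0 \\ d_1 \\ d_2 \\ d_3 \end{bmatrix}
\tag{2}
\]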
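Equations (1) and (2) are matrix products over GF(256), where each multiplication is a field multiplication and each addition is an XOR. The Python sketch below applies both matrices to a single column using a generic field multiplier; the names `gf_mul`, `apply_matrix`, `MIX`, and `INV_MIX` are hypothetical, and the sample column is a commonly used MixColumns test vector rather than data from this paper.

```python
MIX = [
    [2, 3, 1, 1],
    [1, 2, 3, 1],
    [1, 1, 2, 3],
    [3, 1, 1, 2],
]

INV_MIX = [
    [14, 11, 13, 9],
    [9, 14, 11, 13],
    [13, 9, 14, 11],
    [11, 13, 9, 14],
]


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(256) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def apply_matrix(matrix, column):
    """Multiply a 4x4 coefficient matrix by a 4-byte column over GF(256)."""
    out = []
    for row in matrix:
        acc = 0
        for coeff, byte in zip(row, column):
            acc ^= gf_mul(coeff, byte)  # GF(256) addition is XOR
        out.append(acc)
    return out


if __name__ == "__main__":
    b = [0xDB, 0x13, 0x53, 0x45]
    d = apply_matrix(MIX, b)
    print([hex(v) for v in d])
    print(apply_matrix(INV_MIX, d) == b)
```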

Figure 1: Overall structure of the AES-128 algorithm.

3 RELATED WORK

Many researchers have proposed different designs and implementations of the AES encryption algorithm on an FPGA device. Features like parallel digital signal processing (DSP) units, embedded memory blocks and registers, and high-speed storage interfaces make FPGA devices a suitable platform to facilitate a complete system-on-chip solution for edge devices. Chen et al. presented a deep pipeline and full expansion method to maximize throughput and minimize latency for big data applications on an FPGA device [4]. Zodpe and Sapkal proposed a way to generate S-box values and encryption keys with a PN Sequence Generator to enhance encryption quality and maximize throughput [20]. Khose and Raut stored the S-box values using Block RAMs in an FPGA device to achieve low area and power consumption while maintaining a high throughput rate [9]. Rao et al. also used Block RAMs in their design to optimize power, speed, and area for FPGA-based WSNs [15].

Intellectual property (IP) cores for FPGAs that perform AES encryption and decryption are also available. They are provided by both Intel and Xilinx [8, 19].

Other researchers have presented their implementations of the AES algorithm on platforms other than FPGAs. Schwabe and Stoffelen proposed an AES-based method to protect against timing and side-channel attacks using an ARM assembly implementation that targeted the ARM Cortex-M3 and M4 embedded microprocessors [17]. Luo et al. provided an implementation of the AES algorithm on a Graphics Processing Unit (GPU), and they found that parallel computing hardware systems such as GPUs are vulnerable to power-based side-channel attacks [11]. Dao et al. implemented the AES algorithm on a low area 8-bit AES encryption core with an optimized S-box for wireless networks [5].

Other researchers focused their efforts on optimizing the designs and implementations of the AES algorithm for specific low power consumption applications. For example, Banik et al. presented a method to dynamically adjust the number of AES encryption rounds based on the application’s configurations/requirements. To reduce the energy consumption, they proposed a clock and power gating technique to reduce the dynamic switching rate in the computational modules [3].

4 DESIGN OF A HIGH PERFORMANCE AND RESOURCE-EFFICIENT ARCHITECTURE FOR AES-128 ALGORITHM

This section presents the methods used to design and implement a high-performance and resource-efficient architecture for the AES-128 algorithm. Contributions include a complete parallel-pipeline architecture with a data forwarding mechanism that takes advantage of the computational resources provided by an FPGA device, as well as novel architecture designs that optimize the implementation of the AES algorithm.

4.1 Complete Parallel-Pipeline Architecture with Data Forwarding

The parallel-pipeline architecture and data forwarding mechanism implemented in this paper take advantage of the on-chip memory modules and massive parallel processing units provided by an FPGA device to maximize the throughput. To implement the pipeline architecture, each round of encryption/decryption is treated as a pipeline stage. Figure 4 and Figure 5 show the block diagrams of the overall pipeline architecture for the AES encryption algorithm and AES decryption algorithm respectively.
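The effect of pipelining on timing can be illustrated with a small behavioral model. The Python sketch below treats the pipeline as a chain of registers that advances once per clock cycle and accepts one new block per cycle. The function name `run_pipeline` is hypothetical, and the 20-stage depth used in the example is an assumption taken from the 20-cycle latency reported in Tables 4 and 5; how those cycles are divided among the rounds is not modeled here.

```python
def run_pipeline(blocks, depth):
    """Return (cycle, block) pairs marking when each block reaches the last stage."""
    if depth < 1:
        raise ValueError("a pipeline needs at least one stage")
    stages = [None] * depth
    feed = iter(blocks)
    finished = []
    cycle = 0
    while len(finished) < len(blocks):
        cycle += 1
        # Every stage hands its block to the next one; a new block enters stage 1.
        stages = [next(feed, None)] + stages[:-1]
        if stages[-1] is not None:
            finished.append((cycle, stages[-1]))
    return finished


if __name__ == "__main__":
    results = run_pipeline(list(range(1000)), depth=20)
    first_cycle, _ = results[0]
    last_cycle, _ = results[-1]
    print("first block done at cycle", first_cycle)
    print("last block done at cycle", last_cycle)
```

Under this model the first block completes after `depth` cycles, and each later block completes one cycle after its predecessor, so `n` blocks take `depth + n - 1` cycles in total.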

This paper implements full parallelism in its architecture design to maximize throughput. There are 16 parallel data paths, one for each byte of the data block, so that all bytes are processed individually and simultaneously. An S-box table is integrated into every data path as shown in Figure 4, and an Inverse S-box table is integrated into every data path as shown in Figure 5. High performance embedded memory blocks of the FPGA device are used to support the SubBytes and InvSubBytes operations. Also, hardware units for multiplication are implemented into each data path to support the MixColumns and InvMixColumns operations as shown in Figure 4 and Figure 5 respectively. Finally, four hardware units for the MixColumns and AddRoundKey operations are integrated into the encryption architecture design as shown in Figure 4, and four hardware units for the InvMixColumns operations are integrated into the decryption architecture design as shown in Figure 5. A significant improvement in throughput is achieved through the implementation of full parallelism.

The design presented in this work implements an optimization known as data forwarding to perform the ShiftRows and InvShiftRows operations. This optimization does not require additional computational resources. For the ShiftRows operation, the output of each S-box table is forwarded to a specified MixColumns and AddRoundKey hardware unit based on the layout shown in Figure 2. The same optimization is used for the InvShiftRows operation using a different layout. For round 10 of encryption/decryption, the same optimization is used to forward the output of each S-box or Inverse S-box to a specified XOR gate as shown in Figure 6 and Figure 7.
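Data forwarding works because ShiftRows is a fixed permutation, so it can be realized as wiring rather than logic. The Python sketch below computes, for each S-box lane, which MixColumns unit and which input port of that unit receives its output. It assumes the column-major byte numbering of FIPS-197 [18]; the actual layout of Figure 2 is not reproduced here, and the name `forwarding_map` is hypothetical.

```python
def forwarding_map():
    """Map each S-box lane to the (MixColumns unit, input port) it feeds."""
    route = {}
    for unit in range(4):        # one MixColumns unit per State column
        for port in range(4):    # one input port per State row
            source_lane = port + 4 * ((unit + port) % 4)
            route[source_lane] = (unit, port)
    return route


if __name__ == "__main__":
    for lane, (unit, port) in sorted(forwarding_map().items()):
        print(f"S-box lane {lane:2d} -> MixColumns unit {unit}, port {port}")
```

Since the map is constant, it costs no logic elements in hardware: each S-box output is simply connected to its destination.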

Figure 2: ShiftRows layout.

4.2 Optimization for MixColumns and AddRoundKey Implementation

Equation (1) shows that the MixColumns operation requires GF(256) multiplication by two, and Figure 4 shows that this multiplication operation is performed extensively. Figure 3 shows the conventional and proposed implementations of GF(256) multiplication by 2. The proposed implementation uses fewer logic elements (LEs) than the conventional implementation. Also, the proposed implementation has a shorter critical path than the conventional implementation.
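For reference, GF(256) multiplication by 2 can be expressed in two equivalent ways, sketched below in Python. The first is the usual shift followed by a conditional XOR with 0x1B; the second writes each output bit directly as a function of the input bits, which needs only three 2-input XORs. These are standard formulations offered as an illustration only: the circuits in Figure 3 are not recoverable from the text, so neither function should be read as the paper's conventional or proposed design. The function names are hypothetical.

```python
def xtime_shift(x: int) -> int:
    """Multiply by 2: shift left, then reduce by 0x1B if the top bit was set."""
    shifted = (x << 1) & 0xFF
    return shifted ^ 0x1B if x & 0x80 else shifted


def xtime_wired(x: int) -> int:
    """Multiply by 2 with each output bit written as wiring plus XORs."""
    b = [(x >> i) & 1 for i in range(8)]
    out_bits = [
        b[7],          # bit 0
        b[0] ^ b[7],   # bit 1
        b[1],          # bit 2
        b[2] ^ b[7],   # bit 3
        b[3] ^ b[7],   # bit 4
        b[4],          # bit 5
        b[5],          # bit 6
        b[6],          # bit 7
    ]
    return sum(bit << i for i, bit in enumerate(out_bits))


if __name__ == "__main__":
    print(all(xtime_shift(x) == xtime_wired(x) for x in range(256)))
```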

For the first nine rounds of the AES algorithm, the MixColumns operation and the subsequent AddRoundKey operation both require GF(256) addition. These two operations are performed simultaneously using a 6-input XOR gate as shown in Figure 4. The inputs to each XOR gate come from the hardware units for GF(256) multiplication by 2 and the Round Keys as shown in Figure 4. Combining these operations reduces both the allocation of LEs in the FPGA and the critical path in the architecture.
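One way to arrive at six XOR inputs is to expand the first row of Equation (1) and append the Round Key byte:

\[
d_0 + k_0 = (b_0 \times 2) + (b_1 \times 2) + b_1 + b_2 + b_3 + k_0
\]

since $b_1 \times 3 = (b_1 \times 2) + b_1$. The Python sketch below implements this expansion for a whole column. It is an interpretation of the text under that assumption, not a transcription of Figure 4, and the function names are hypothetical.

```python
def xtime(x: int) -> int:
    """GF(256) multiplication by 2."""
    shifted = (x << 1) & 0xFF
    return shifted ^ 0x1B if x & 0x80 else shifted


def mix_add_byte(b0: int, b1: int, b2: int, b3: int, key_byte: int) -> int:
    """One output byte of MixColumns fused with AddRoundKey as a 6-input XOR."""
    return xtime(b0) ^ xtime(b1) ^ b1 ^ b2 ^ b3 ^ key_byte


def mix_add_column(column, key_column):
    """Fused MixColumns and AddRoundKey on one 4-byte column."""
    return [
        mix_add_byte(
            column[i],
            column[(i + 1) % 4],
            column[(i + 2) % 4],
            column[(i + 3) % 4],
            key_column[i],
        )
        for i in range(4)
    ]


if __name__ == "__main__":
    column = [0xDB, 0x13, 0x53, 0x45]
    print([hex(v) for v in mix_add_column(column, [0, 0, 0, 0])])
```

Each row of the matrix in Equation (1) is a rotation of the first row, which is why rotating the column indices is enough to produce all four output bytes.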

Figure 3: Implementations of GF(256) multiplication by two.


Figure 4: Architecture design for rounds 1 through 9 of the AES encryption algorithm.

4.3 Optimization for InvMixColumns and AddRoundKey Implementation

Equation (2) shows that the InvMixColumns operation requires GF(256) multiplication by 9, 11, 13 and 14. These operations are achieved with hardware designs for GF(256) addition and GF(256) multiplication by 2 using

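\begin{align*}
x \times 9 &= (((x \times 2) \times 2) \times 2) + x \tag{3a} \\
x \times 11 &= ((((x \times 2) \times 2) + x) \times 2) + x \tag{3b} \\
x \times 13 &= ((((x \times 2) + x) \times 2) \times 2) + x \tag{3c} \\
x \times 14 &= ((((x \times 2) + x) \times 2) + x) \times 2 \tag{3d}
\end{align*}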
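Equations (3a) to (3d) can be checked exhaustively in software. The Python sketch below builds each product from multiplication by 2 and XOR exactly as the equations are written, then compares the result against a generic GF(256) multiplier for all 256 byte values. The function names are hypothetical.

```python
def xtime(x: int) -> int:
    """GF(256) multiplication by 2."""
    shifted = (x << 1) & 0xFF
    return shifted ^ 0x1B if x & 0x80 else shifted


def gf_mul(a: int, b: int) -> int:
    """Generic GF(256) multiplication, used only as a reference."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def times9(x: int) -> int:
    return xtime(xtime(xtime(x))) ^ x           # Equation (3a)


def times11(x: int) -> int:
    return xtime(xtime(xtime(x)) ^ x) ^ x       # Equation (3b)


def times13(x: int) -> int:
    return xtime(xtime(xtime(x) ^ x)) ^ x       # Equation (3c)


def times14(x: int) -> int:
    return xtime(xtime(xtime(x) ^ x) ^ x)       # Equation (3d)


def chains_match_reference() -> bool:
    """Compare every chain with the reference multiplier for all bytes."""
    chains = {9: times9, 11: times11, 13: times13, 14: times14}
    return all(
        chain(x) == gf_mul(x, factor)
        for factor, chain in chains.items()
        for x in range(256)
    )


if __name__ == "__main__":
    print(chains_match_reference())
```

Every chain uses exactly three multiplications by 2, so all four products have the same multiplier depth and differ only in where the XORs are placed.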

Equations (3a) to (3d) are implemented using the proposed architecture design for GF(256) multiplication by 2 discussed previously. GF(256) addition is performed with XOR gates as a bitwise XOR. In addition to the proposed approach, an AddRoundKey operation and a multiplication operation are performed by a single hardware unit through the use of multiple-input XOR gates. Using this combination design with the optimized multiplication units decreases resource consumption considerably. This optimized design also shortens the critical path by eliminating the extra hardware otherwise needed to perform the AddRoundKey operation separately. Figure 8 illustrates a block diagram for the implementation of GF(256) multiplication by 9 combined with the AddRoundKey operation. The implementations of the other multiplication and AddRoundKey operations are achieved using a similar method.

Similar to the implementation of the MixColumns operation, the InvMixColumns operation described by Equation (2) is implemented using XOR gates to perform GF(256) addition. The inputs to each XOR gate come from the hardware units for GF(256) multiplication by 9, 11, 13, and 14 as shown in Figure 5.

Figure 5: Architecture design for rounds 1 through 9 of the AES decryption algorithm.

5 RESULTS

The design presented in this paper was implemented using the Very High Speed Integrated Circuit Hardware Description Language (VHDL) with Intel Quartus Prime software.

Table 1 shows the allocation of LEs and the critical path for the conventional and proposed implementations of GF(256) multiplication by 2. These designs were implemented on the Intel 10CL040YF484C6G device to provide a proof of concept for the proposed implementation. Table 1 demonstrates that the proposed implementation uses about half the LEs (16 versus 33) and shortens the critical path from 1.02 ns to 0.90 ns.

Figure 6: Architecture design for round 10 of the AES encryption algorithm.

The encryption/decryption designs were implemented on the Intel 5AGTMD3G3F31I3, 5SGXMA3E1H29C1, and EP2AGZ300FF35C3 devices. The Intel 5AGTMD3G3F31I3 device was chosen because it is similar to the Xilinx XC7K325T device used in [4]. Both FPGAs are mid-range devices based on 28 nm technology, and they have similar amounts of available computational resources. The Intel 5SGXMA3E1H29C1 device was chosen to provide performance results on a high-end device based on 28 nm technology. Finally, the Intel EP2AGZ300FF35C3 device was chosen to provide performance results on a mid-range device based on 40 nm technology.

For the encryption/decryption design implementations on the Intel 5AGTMD3G3F31I3 and 5SGXMA3E1H29C1 devices, general-purpose input/output (GPIO) pins were included to interface with external devices. This implementation allows for the Round Keys to be changed at runtime, but it requires additional registers and lengthens the critical path. For the encryption/decryption design implementations on the Intel EP2AGZ300FF35C3 device, the Round Keys were embedded in hardware. This implementation does not require additional registers and shortens the critical path, but it does not allow for the keys to be changed at runtime.

Figure 7: Architecture design for round 10 of the AES decryption algorithm.

Table 2 and Table 3 show the logic utilization, total registers, and total block memory bits on each device for the encryption design implementation and the decryption design implementation respectively. They demonstrate that the resource allocation is small, at no more than 2% of the logic and block memory bits on any device, and, accordingly, the available computational resources for both the AES encryption and decryption units are utilized efficiently.


Figure 8: Architecture design for GF(256) multiplication by 9.

Table 1: GF(256) multiplication by 2 implementation comparison

                Conventional   Proposed
LEs             33             16
Crit. Path      1.02 ns        0.90 ns

Table 2: Resource utilization for encryption

Device      Logic Utilization (% Usage)   Total Registers   Total Block Memory Bits (% Usage)
5AGTD3      1,541 ALMs (1%)               2,708             327,680 (2%)
5SGXA3      1,540 ALMs (1%)               2,708             327,680 (2%)
EP2AGZ300   1,440 LEs (<1%)               1,300             327,680 (2%)

Table 3: Resource utilization for decryption

Device      Logic Utilization (% Usage)   Total Registers   Total Block Memory Bits (% Usage)
5AGTD3      2,528 ALMs (2%)               2,708             327,680 (2%)
5SGXA3      2,347 ALMs (2%)               2,708             327,680 (2%)
EP2AGZ300   1,460 LEs (<1%)               1,300             327,680 (2%)

Table 4 shows the device, critical path, latency, and throughput for the encryption implementations reported by [4, 9, 15, 20] and this paper. The design in [20] includes enhanced security features with its AES implementation, which most likely affects its performance. Table 5 shows the device, critical path, latency, and throughput for the decryption implementations reported by [8, 9, 12, 19] and this paper. The implementation presented in [19] includes automatic Round Key generation in its design, which may affect its performance. Also, its latency is not reported. The specific device used, the exact throughput, and the exact critical path are not reported for the implementation presented in [8]. As shown in the tables, the proposed designs for encryption and decryption achieved lower or comparable critical paths when compared against other designs that used FPGA devices in a similar category and, consequently, achieved higher or comparable throughput. In addition, the results in the tables show that the designs presented in this paper have a comparable latency for both encryption and decryption when compared against most previous designs.

Table 4: Encryption performance comparison

Design      Device        Crit. Path (ns)   Latency (cycles)   Throughput (Gbps)
[4]         XC7K325T      4.09              62                 31.296
[9]         XC6SLX16      9.71              10                 13.183
[15]        XC7A200T      3.21              59                 0.6763
[20]        XC5VLX110T    4.30              10                 29.73
[20]        XC6VLX240T    2.16              10                 59.31
This Work   5AGTD3        3.88              20                 33.015
This Work   5SGXA3        2.36              20                 54.237
This Work   EP2AGZ300     2.92              20                 43.776

Table 5: Decryption performance comparison

Design      Device                                    Crit. Path (ns)   Latency (cycles)   Throughput (Gbps)
[8]         Arria 10 Device (Model not reported)      Not reported      Not reported       20 to 40
[9]         XC6SLX16                                  13.8              10                 9.279
[12]        XC6VLX240T                                3.08              Not reported       41.549
[19]        XC7A100T                                  5.26              Not reported       0.304
This Work   5AGTD3                                    5.09              20                 25.133
This Work   5SGXA3                                    3.06              20                 41.802
This Work   EP2AGZ300                                 3.74              20                 34.234
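The link between critical path and throughput for a fully pipelined design can be approximated with simple arithmetic. If one 128-bit block completes every clock cycle and the clock period equals the critical path, the throughput in Gbps is 128 divided by the critical path in nanoseconds. The Python sketch below applies this estimate to the critical paths reported for this work; it is an approximation under those assumptions, and the reported values differ from it slightly, presumably because they were derived from the clock frequency reported by the tools. The function name is hypothetical.

```python
BLOCK_BITS = 128  # AES block size in bits


def throughput_gbps(critical_path_ns: float, blocks_per_cycle: float = 1.0) -> float:
    """Estimate throughput when the clock period equals the critical path."""
    if critical_path_ns <= 0:
        raise ValueError("critical path must be positive")
    return BLOCK_BITS * blocks_per_cycle / critical_path_ns


if __name__ == "__main__":
    # (device, critical path in ns, reported throughput in Gbps) for encryption
    reported = [
        ("5AGTD3", 3.88, 33.015),
        ("5SGXA3", 2.36, 54.237),
        ("EP2AGZ300", 2.92, 43.776),
    ]
    for device, path_ns, gbps in reported:
        estimate = throughput_gbps(path_ns)
        print(f"{device}: estimated {estimate:.3f} Gbps, reported {gbps} Gbps")
```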

6 PROTOTYPING OF A REAL-TIME ENCRYPTION-DECRYPTION UNIT IN AN EMBEDDED SYSTEM

In addition to simulating the full potential of the parallel-pipeline architecture using a data streaming protocol that is capable of providing a data block every clock cycle, a prototype of a video surveillance system employing the proposed design has also been developed. The proposed architecture interfaces with the embedded Cortex-A9 processor on a low-cost Intel SoC FPGA device via a standard memory-mapped protocol. An edge-triggered handshaking mechanism was incorporated in the design to allow low overhead communication between the processor and the AES encryption-decryption unit. The processor provides the Round Keys to the AES encryption-decryption unit at runtime. The complete system-on-chip solution was realized with the aid of the Platform Designer Tool in the Intel FPGA’s Quartus Prime software. Figure 9 shows the prototype system with two FPGA platforms acting as two edge devices (or sensor nodes) that employ the encryption and decryption unit on-board. The first node system captures a frame from a live video stream and encrypts it in real-time. On command, the encrypted image is sent to the second node system over the air using two XBee wireless transmission devices. The second node system receives the image, decrypts it, and displays the recovered image on a monitor.

Figure 9: A real-time prototype of the proposed AES-128 encryption/decryption design.

7 CONCLUSION

The design of a resource-aware, high performance architecture for the AES encryption/decryption algorithm was presented in this paper. The proposed design and implementation require minimal computational resources, making them ideal for embedded systems that have limited resources and power. The proposed parallel and pipeline architecture achieves the high throughput and low latency that is desirable for real-time applications. Employing a high speed encryption unit in an embedded system of an edge device enhances its ability to protect data against security threats such as electronic eavesdropping, impersonation, and message modification. The work presented in this paper is showcased in a real-time prototype of a video surveillance system.

For future work, it is necessary to test this design’s resistance to side-channel power analysis. Also, this design can be improved upon by using the AES-192 and/or the AES-256 standards. AES-192 and AES-256 use 192 bit keys and 256 bit keys respectively [18]. These algorithms are secure enough to protect information up to the top secret level [7]. The methods used by this paper to implement the AES-128 standard can be used to implement the AES-192 and AES-256 standards.

REFERENCES

[1] Ian F Akyildiz, Tommaso Melodia, and Kaushik R Chowdhury. 2008. Wireless multimedia sensor networks: Applications and testbeds. Proc. IEEE 96, 10 (2008), 1588–1605.

[2] M. A. Bahnasawi, K. Ibrahim, A. Mohamed, M. K. Mohamed, A. Moustafa, K. Abdelmonem, Y. Ismail, and H. Mostafa. 2016. ASIC-oriented comparative review of hardware security algorithms for internet of things applications. In 2016 28th International Conference on Microelectronics (ICM). 285–288.

[3] Subhadeep Banik, Andrey Bogdanov, Tiziana Fanni, Carlo Sau, Luigi Raffo, Francesca Palumbo, and Francesco Regazzoni. 2016. Adaptable AES Implementation with Power-Gating Support. In Proceedings of the ACM International Conference on Computing Frontiers (Como, Italy) (CF ’16). Association for Computing Machinery, New York, NY, USA, 331–334. https://doi.org/10.1145/2903150.2903488

[4] S. Chen, W. Hu, and Z. Li. 2019. High Performance Data Encryption with AES Implementation on FPGA. In 2019 IEEE 5th Intl Conference on Big Data Security on Cloud (BigDataSecurity), IEEE Intl Conference on High Performance and Smart Computing, (HPSC) and IEEE Intl Conference on Intelligent Data and Security (IDS). 149–153.

[5] V. Dao, A. Nguyen, V. Hoang, and T. Tran. 2015. An ASIC implementation of low area AES encryption core for wireless networks. In 2015 International Conference on Communications, Management and Telecommunications (ComManTel). 99–102.

[6] Panu Hämäläinen, Marko Hännikäinen, and Timo D Hämäläinen. 2007. Review of hardware architectures for advanced encryption standard implementations considering wireless sensor networks. In International Workshop on Embedded Computer Systems. Springer, 443–453.

[7] Lynn Hathaway. 2003. National policy on the use of the advanced encryption standard (AES) to protect national security systems and national security information. National Security Agency 23 (2003).

[8] Intel. 2010. AES-XTS: Advanced Encryption Standard Core. https://www.intel.com/content/www/us/en/programmable/solutions/partners/partner-profile/cast-inc-/ip/aes-xts--advanced-encryption-standard-core.html

[9] P. N. Khose and V. G. Raut. 2015. Implementation of AES algorithm on FPGA for low area consumption. In 2015 International Conference on Pervasive Computing (ICPC). 1–4.

[10] MooSeop Kim, Juhan Kim, and Yongje Choi. 2005. Low power circuit architecture of AES crypto module for wireless sensor network. Proceedings of the World Academy of Science, Engineering and Technology 8 (2005), 146–150.

[11] C. Luo, Y. Fei, P. Luo, S. Mukherjee, and D. Kaeli. 2015. Side-channel power analysis of a GPU AES implementation. In 2015 33rd IEEE International Conference on Computer Design (ICCD). 281–288.

[12] OpenCores. 2012. Overview :: AES :: OpenCores. https://opencores.org/projects/tiny_aes

[13] P. Prasithsangaree and P. Krishnamurthy. 2003. Analysis of energy consumption of RC4 and AES algorithms in wireless LANs. In GLOBECOM ’03. IEEE Global Telecommunications Conference (IEEE Cat. No.03CH37489), Vol. 3. 1445–1449.

[14] Daniele Puccinelli and Martin Haenggi. 2005. Wireless sensor networks: applications and challenges of ubiquitous sensing. IEEE Circuits and Systems Magazine 5, 3 (2005), 19–31.

[15] M. Rao, T. Newe, and I. Grout. 2015. AES implementation on Xilinx FPGAs suitable for FPGA based WBSNs. In 2015 9th International Conference on Sensing Technology (ICST). 773–778.

[16] R Reagan. 1982. Executive Order 12356, “National Security Information.” The White House (1982).

[17] Peter Schwabe and Ko Stoffelen. 2017. All the AES You Need on Cortex-M3 and M4. 180–194. https://doi.org/10.1007/978-3-319-69453-5_10

[18] NIST-FIPS Standard. 2001. Announcing the advanced encryption standard (AES). Federal Information Processing Standards Publication 197, 1-51 (2001), 3–3.

[19] Xilinx. [n.d.]. AES Encryption / Decryption. https://www.xilinx.com/products/intellectual-property/1-3fs9k9.html

[20] Harshali Zodpe and Ashok Sapkal. 2018. An Efficient AES Implementation using FPGA with Enhanced Security Features. Journal of King Saud University - Engineering Sciences 32 (07 2018). https://doi.org/10.1016/j.jksues.2018.07.002
