1 Introduction

The Cryptographic Accelerator and Signaling Processing Engine with RAM-sharing (CASPER) peripheral provides acceleration to asymmetric cryptographic algorithms as well as to certain signal processing algorithms.

The accelerator is faster, more efficient, and lower power than a software-only implementation. It performs the hard tasks of large-scale math through a combination of speed and using fewer resources. The processor may be idle or sleeping while the accelerator runs, or it may be doing other related or unrelated tasks. Further, the system may be able to run using a slower clock to reduce power consumption.

This application note introduces the CASPER on the security devices of the LPC5500 series: LPC55S6x, LPC55S2x, and LPC55S1x. Because these devices use exactly the same CASPER, the examples shown in this document are based on the SDK for the lead part, LPC55S69, for simplicity.

1.1 Asymmetric cryptographic algorithms

The CASPER defined here is intended to be a very general engine that, in combination with software, can be applied to all manner of cryptographic algorithms, including asymmetric public-key algorithms (for example, RSA and ECC), the related Diffie-Hellman key exchange methods, generator exponentials, and non-standard large-number algorithms.

1.2 Signal processing algorithms

CASPER can also optionally be parameterized to perform signal processing operations such as FFT, DCT, iFFT, most matrix operations, and SIMD-based blending and scaling for graphics.

1.3 Model of CASPER accelerator and facilities it provides

The accelerator provides six facilities to improve the efficiency and speed of algorithms, usually by an order of magnitude in speed. Figure 1 shows the block layout.

• An AHB bus and Armv8-M Co-Processor (CP) interface for loading the information needed to perform an operation.

• Fast shared memory access, allowing up to 128 bits to be moved at a time, as shown in Figure 1.

• Two 32×32 multipliers.

• A secondary bank of adders and registers to allow MAC-type (multiply then accumulate) operations.

• A mask facility to allow side-channel countermeasures by never storing plain values in flops.

• A state machine to sequence the steps required by each operation.

Contents

1 Introduction
1.1 Asymmetric cryptographic algorithms
1.2 Signal processing algorithms
1.3 Model of CASPER accelerator and facilities it provides
2 Approach
3 Operations
3.1 Modes
3.2 Internal steps taken and flow by two example modes
4 RAM interface
5 Performance numbers
6 SDK implementations
6.1 ModExp algorithm
6.2 Elliptic curve multiplication
7 CASPER usage in mbedTLS
8 Revision history

AN12445
Asymmetric Cryptographic Accelerator CASPER
Rev. 3, 7 January 2020
Application Note

Figure 1. Showing how CASPER fits into typical system

2 Approach

The approach to the CASPER accelerator is based on the fundamental properties shown in Figure 2.

Figure 2. Showing block diagram of CASPER accelerator

• A group of 4 data registers of 32 bits each (A/B/C/D), used to feed the two multipliers. The multipliers can apply an XOR mask for side-channel uses.

• A group of 4 result registers (Res[3]/Res[2]/Res[1]/Res[0]), which can be used with 4 adders and can also perform Add-Mask and XOR operations.

• Special access to 2 or 4 RAMs (up to 8 KB) in parallel.

  - The block uses a RAM interface to these RAMs which also supports AHB, so that the application may access the RAMs normally at any time.

  - The AHB bus sees pairs of the RAMs as combined by interleaving (one holds the even words and one holds the odd words), whereas the accelerator sees them separately, allowing 64b word pairs to be accessed in one go.

  - The block can access these two or four banks simultaneously, allowing two or four operations in parallel, that is, 64 or 128 bits at a time.

• Two control words (not shown here) used to launch the accelerator.

• An optional mask register for creating an XOR mask for unmasking ABCD and masking output for side channel protection.

• Two multipliers supporting 32b × 32b with 64b output each.

• Four Adders using ADD, XOR.

• A carry bit (C) used when full-sum is performed.
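The XOR masking idea in the list above can be sketched in software. This is a minimal model, assuming a hypothetical 32-bit mask value and hypothetical helper names (`mask_word`, `unmask_word`, `masked_mul32`); it does not represent the CASPER register interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of XOR masking: only the masked form is ever stored. */
uint32_t mask_word(uint32_t plain, uint32_t mask)    { return plain ^ mask; }
uint32_t unmask_word(uint32_t stored, uint32_t mask) { return stored ^ mask; }

/* Multiplies two masked 32-bit operands. The operands are unmasked only at
 * the multiplier input, and each 32-bit half of the product is masked again
 * before it is returned. */
uint64_t masked_mul32(uint32_t a_masked, uint32_t b_masked, uint32_t mask)
{
    uint64_t product = (uint64_t)unmask_word(a_masked, mask) *
                       unmask_word(b_masked, mask);
    uint32_t lo = mask_word((uint32_t)product, mask);
    uint32_t hi = mask_word((uint32_t)(product >> 32), mask);

    /* Sanity check of the model: masking is its own inverse. */
    assert(unmask_word(lo, mask) == (uint32_t)product);
    return ((uint64_t)hi << 32) | lo;
}
```

In this model a caller stores `mask_word(value, mask)` and never the plain value; the same mask recovers the result halves with `unmask_word`.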

3 Operations

3.1 Modes

The following operation modes are supported:

1. 64b×64b (MUL)

2. 64b×64b + 64b×64b (MUL+ADD)

3. 64b + 64b (ADD)

4. 64b - 64b (SUB)

5. 64b ^ 64b (XOR)

6. 32b>> (Shift Right)

7. 32b<< (Shift Left)

8. Others: Copy, Remask, Fill, Zero, Compare, and so on
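To illustrate how a 64-bit mode extends to the large numbers used in public-key cryptography, the sketch below applies the 64b + 64b ADD step limb by limb with a carry. The function name `bignum_add64` and the least-significant-limb-first layout are assumptions made for this sketch, not the CASPER programming model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adds two n-limb numbers stored least significant limb first.
 * Returns the carry out of the most significant limb (0 or 1). */
unsigned bignum_add64(uint64_t *r, const uint64_t *a, const uint64_t *b, size_t n)
{
    unsigned carry = 0;

    assert(r != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        uint64_t s = a[i] + b[i];   /* wraps modulo 2^64 */
        unsigned c1 = s < a[i];     /* carry out of a[i] + b[i] */
        uint64_t t = s + carry;
        unsigned c2 = t < s;        /* carry out of adding the carry-in */

        r[i] = t;
        carry = c1 | c2;            /* at most one of the two can be set */
    }
    return carry;
}
```

A 256-bit addition, for example, would be four iterations of this loop over 64-bit limbs.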

3.2 Internal steps taken and flow by two example modes

3.2.1 MUL (64b×64b = 128b)

Steps:

1. Read ABCD

2. Step1:

a. Compute DB and DA, sum=DBH+DAL

b. Res[0] = DBL

c. Res[1] = sum

d. Res[2] = DAH

e. Res[3] = 0

3. Step2:

a. Compute CB and CA, sum=CBH+CAL

b. Res[1] += CBL (generate Carry bit)

c. Res[2] += sum + Carry bit (generate Carry bit)

d. Res[3] = CAH + Carry bit

4. Step3:

a. Write Res[3:0]

b. Done
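The steps above can be modeled in plain C. This is a sketch under stated assumptions: DB is read as the 64-bit product D×B, the H and L suffixes as its high and low 32-bit halves, and C:D and A:B as the two 64-bit operands with C and A as the high words. That reading, the names `casper_res_t` and `mul64x64_model`, and the way the carry out of each 33-bit sum is folded into the next word are assumptions; the step list does not spell them out.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the four 32-bit result registers; res[0] is the least significant. */
typedef struct {
    uint32_t res[4];
} casper_res_t;

/* Software model of the MUL mode: (C:D) x (A:B) = Res[3:0]. */
casper_res_t mul64x64_model(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    casper_res_t r;
    uint64_t db = (uint64_t)d * b;
    uint64_t da = (uint64_t)d * a;
    uint64_t cb = (uint64_t)c * b;
    uint64_t ca = (uint64_t)c * a;
    uint64_t sum, t;

    /* Step1: low operand word D times A:B. */
    sum = (db >> 32) + (uint32_t)da;                  /* DBH + DAL, up to 33 bits */
    r.res[0] = (uint32_t)db;                          /* DBL */
    r.res[1] = (uint32_t)sum;
    r.res[2] = (uint32_t)((da >> 32) + (sum >> 32));  /* DAH plus carry of sum */
    r.res[3] = 0;

    /* Step2: high operand word C times A:B, added one word higher. */
    sum = (cb >> 32) + (uint32_t)ca;                  /* CBH + CAL, up to 33 bits */
    t = (uint64_t)r.res[1] + (uint32_t)cb;            /* Res[1] += CBL */
    r.res[1] = (uint32_t)t;
    t = (uint64_t)r.res[2] + sum + (t >> 32);         /* Res[2] += sum + carry */
    r.res[2] = (uint32_t)t;
    r.res[3] = (uint32_t)((ca >> 32) + (t >> 32));    /* CAH + carry */

    return r;                                         /* Step3: write Res[3:0] */
}
```

For example, calling the model with all four inputs set to 0xFFFFFFFF squares the largest 64-bit value and yields Res[3:0] = FFFFFFFF, FFFFFFFE, 00000000, 00000001.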

Figure 3 shows the Step1 flow.

Figure 3. Step1 flow for 64b×64b

Figure 4 shows the Step2 flow.

Figure 4. Step2 flow for 64b×64b

3.2.2 MUL+ADD (64b×64b + 64b×64b)

Steps:

1. Read ABCD

2. Step1:

a. Compute DB and DA, sum=DBH+DAL

b. Res[0] += DBL (generate Carry bit)

c. Res[1] += sum + Carry bit (generate Carry bit)

d. Res[2] += DAH + Carry bit

e. Res[3] += Carry bit

3. Step2:

a. Compute CB and CA, sum=CBH+CAL

b. Res[1] += CBL (generate Carry bit)

c. Res[2] += sum + Carry bit (generate Carry bit)

d. Res[3] += CAH + Carry bit

4. Step3:

a. Write Res[3:0]

b. Done
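The accumulate variant can be modeled the same way. The sketch below assumes the same reading of the notation as for MUL (DB is D×B, H and L are the high and low 32-bit halves, C:D and A:B are the operands); the function name `mac64x64_model` is hypothetical, and any overflow past 128 bits simply wraps in this model.

```c
#include <assert.h>
#include <stdint.h>

/* Software model of the MUL+ADD mode: Res[3:0] += (C:D) x (A:B).
 * res[0] is the least significant 32-bit word. */
void mac64x64_model(uint32_t res[4], uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    uint64_t db = (uint64_t)d * b;
    uint64_t da = (uint64_t)d * a;
    uint64_t cb = (uint64_t)c * b;
    uint64_t ca = (uint64_t)c * a;
    uint64_t sum, t;

    assert(res != 0);

    /* Step1: accumulate D x (A:B) starting at Res[0]. */
    sum = (db >> 32) + (uint32_t)da;            /* DBH + DAL */
    t = (uint64_t)res[0] + (uint32_t)db;        /* Res[0] += DBL */
    res[0] = (uint32_t)t;
    t = (uint64_t)res[1] + sum + (t >> 32);     /* Res[1] += sum + carry */
    res[1] = (uint32_t)t;
    t = (uint64_t)res[2] + (da >> 32) + (t >> 32); /* Res[2] += DAH + carry */
    res[2] = (uint32_t)t;
    res[3] += (uint32_t)(t >> 32);              /* Res[3] += carry */

    /* Step2: accumulate C x (A:B) starting at Res[1]. */
    sum = (cb >> 32) + (uint32_t)ca;            /* CBH + CAL */
    t = (uint64_t)res[1] + (uint32_t)cb;        /* Res[1] += CBL */
    res[1] = (uint32_t)t;
    t = (uint64_t)res[2] + sum + (t >> 32);     /* Res[2] += sum + carry */
    res[2] = (uint32_t)t;
    res[3] += (uint32_t)((ca >> 32) + (t >> 32)); /* Res[3] += CAH + carry */
}
```

Starting from a zeroed `res`, two calls accumulate a sum of products, which is the inner step of a multi-word multiplication.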

Figure 5 shows the Step1 flow.

Figure 5. Step1 flow of (64b×64b + 64b×64b)

Figure 6 shows the Step2 flow.

Figure 6. Step2 flow of (64b×64b + 64b×64b)

4 RAM interface

The RAM model is set up to allow for 2 or 4 RAMs (as shown in Figure 7). This means that the accelerator has access to 2 or 4 banks at the same time, allowing 2 or 4 parallel accesses to those RAMs, that is, up to 128 bits read or written at once. The AHB bus, however, still sees a single 32-bit access.

Figure 7. RAM in system with accelerator
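The even/odd word interleaving of a RAM pair can be sketched as an address mapping. This is an illustrative model only: the type `bank_loc_t`, the function `ahb_offset_to_bank`, and the choice of bank 0 for the even words are assumptions, and only a single pair of RAMs is modeled.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned bank;  /* 0 = bank assumed to hold even words, 1 = odd words */
    uint32_t row;   /* word index inside that bank */
} bank_loc_t;

/* Maps a word-aligned byte offset, as seen from the AHB side, onto one
 * bank of an interleaved RAM pair. */
bank_loc_t ahb_offset_to_bank(uint32_t byte_offset)
{
    bank_loc_t loc;
    uint32_t word;

    assert((byte_offset & 3u) == 0u);  /* the sketch handles aligned words only */
    word = byte_offset >> 2;           /* 32-bit word index on the AHB side */
    loc.bank = word & 1u;              /* even words in one bank, odd in the other */
    loc.row = word >> 1;               /* same row for both words of a pair */
    return loc;
}
```

Offsets 0 and 4 land on row 0 of different banks, which is why a block that sees the banks separately can fetch both words of a 64-bit pair in one access.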

5 Performance numbers

The CASPER accelerator is several times faster than a pure multiplier for cryptographic purposes. Actual speed varies with the algorithm, the number of RAMs, whether they are interleaved, and how software has placed its buffers.

Figure 8 compares the performance of the CASPER-accelerated and pure software implementations on the LPC55S69.

Figure 8. Performance comparison for asymmetric cryptographic algorithms implemented by software and CASPER

Remark:

1. The CASPER performance is similar on the other security parts of the LPC5500 series, such as LPC55S2x and LPC55S1x.

2. IDE version: IAR8.32.2

Figure 9. IAR version

3. Optimizations: High, Speed, No size constraints

Figure 10. IAR Optimizations

4. SDK Version: 2.6.3 (2019-10-11)

SDK Tag: REL_SDK_NIOBE4_2.6.3_RFP3.0_RC2_3

5. System clock: 150 MHz

6. Example version: release version of the project.

7. A "-" entry means that the operation is not CASPER accelerated, so the hardware and software figures are the same.

8. If running from flash, the LPC55S69_cm33_core0_flash.icf linker file must be used.

If running from RAM, the LPC55S69_cm33_core0_ram.icf linker file must be used.

9. Board: LPCXpresso55S69, LPC55S69-EVK

10. Silicon revision: 1B

6 SDK implementations

The CASPER example can be found in the SDK package.

After downloading the latest version of the SDK from the NXP website, open the CASPER example at the following location:

SDK_2.6.3_LPCXpresso55S69\boards\lpcxpresso55s69\driver_examples\casper\cm33_core0

Compile the project and open the casper.c file. Its application functions fall into two groups:

• ModExp algorithm

• Elliptic curve Secp256r1 multiplication

6.1 ModExp algorithm

Modular exponentiation is exponentiation performed over a modulus. It is useful in computer science, especially in the field of public-key cryptography.

The following example shows how to verify a signature by using the public key (E and N), following the formula in Figure 11.

Figure 11. Verify signature formula

The example function code is shown in Figure 12.

Figure 12. ModExp code

The implementation involves a series of complicated data conversions. It is based on the classic ModExp algorithm, including Montgomery modular multiplication; details are widely available online. Ultimately, the algorithm reduces to basic multiplication, addition, and subtraction operations, which CASPER can perform. Some basic application code is shown in Figure 13.
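To make the reduction to multiply and reduce steps concrete, here is a minimal sketch of modular exponentiation by square-and-multiply on 32-bit values. It is not the SDK code and does not use Montgomery multiplication; the function name `modexp_u32` and the toy key sizes are assumptions chosen so that every intermediate product fits in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Computes (base ^ exp) mod mod using right-to-left square-and-multiply. */
uint32_t modexp_u32(uint32_t base, uint32_t exp, uint32_t mod)
{
    uint64_t result;
    uint64_t b;

    assert(mod != 0u);
    result = 1u % mod;      /* handles mod == 1 */
    b = base % mod;
    while (exp != 0u) {
        if (exp & 1u) {
            result = (result * b) % mod;  /* multiply step */
        }
        b = (b * b) % mod;                /* square step */
        exp >>= 1;
    }
    return (uint32_t)result;
}
```

With a toy RSA key (N = 3233, E = 17, private exponent 2753), a signature s = m^2753 mod N verifies when `modexp_u32(s, 17, 3233)` returns m. Real RSA uses the same loop structure over multi-word numbers, where each multiply and reduce is built from the word-level operations described in Operations.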

Figure 13. CASPER application

Because CASPER accelerates these operations, RSA signature verification is fast. The functions contain a number of CASPER operations. As shown in Figure 14, these correspond to the operation modes described in Operations.

Figure 14. CASPER operations

6.2 Elliptic curve multiplication

The functions perform ECC secp384r1 point single scalar multiplication, [resX; resY] = scalar * [X; Y], and ECC secp384r1 point double scalar multiplication, [resX; resY] = scalar1 * [X1; Y1] + scalar2 * [X2; Y2]. These are the basis of elliptic-curve cryptography (ECC); details about ECC are widely available online.
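The two operations can be illustrated on a toy curve. The sketch below uses the textbook curve y^2 = x^3 + 2x + 2 over GF(17) with base point (5, 1), not a NIST curve, and a plain double-and-add loop; all names (`point_t`, `ec_add`, `ec_mul`, `ec_mul_add`) are hypothetical and unrelated to the SDK API.

```c
#include <assert.h>
#include <stdint.h>

/* Toy curve y^2 = x^3 + 2x + 2 over GF(17); for illustration only. */
enum { P_MOD = 17, CURVE_A = 2 };

typedef struct {
    int x, y;
    int inf;   /* nonzero marks the point at infinity */
} point_t;

static int mod_p(int v) { v %= P_MOD; return v < 0 ? v + P_MOD : v; }

/* Modular inverse via Fermat's little theorem: v^(p-2) mod p. */
static int inv_p(int v)
{
    int r = 1;
    v = mod_p(v);
    assert(v != 0);
    for (int i = 0; i < P_MOD - 2; i++) {
        r = mod_p(r * v);
    }
    return r;
}

/* Point addition, including doubling and the point at infinity. */
point_t ec_add(point_t p, point_t q)
{
    point_t r = {0, 0, 1};
    int lambda;

    if (p.inf) return q;
    if (q.inf) return p;
    if (p.x == q.x && mod_p(p.y + q.y) == 0) return r;  /* P + (-P) */
    if (p.x == q.x) {
        lambda = mod_p((3 * p.x * p.x + CURVE_A) * inv_p(2 * p.y)); /* tangent */
    } else {
        lambda = mod_p((q.y - p.y) * inv_p(q.x - p.x));             /* chord */
    }
    r.x = mod_p(lambda * lambda - p.x - q.x);
    r.y = mod_p(lambda * (p.x - r.x) - p.y);
    r.inf = 0;
    return r;
}

/* Single scalar multiplication k * P by double-and-add. */
point_t ec_mul(unsigned k, point_t p)
{
    point_t r = {0, 0, 1};
    while (k != 0u) {
        if (k & 1u) r = ec_add(r, p);
        p = ec_add(p, p);
        k >>= 1;
    }
    return r;
}

/* Double scalar multiplication k1 * P1 + k2 * P2. */
point_t ec_mul_add(unsigned k1, point_t p1, unsigned k2, point_t p2)
{
    return ec_add(ec_mul(k1, p1), ec_mul(k2, p2));
}
```

On this curve, `ec_mul(2, g)` with g = (5, 1) gives (6, 3). A production implementation would use constant-time field arithmetic on multi-word numbers, which this sketch does not attempt.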

The function code is shown in Figure 15.

Figure 15. Elliptic curve multiplication

As with ModExp, the ECC multiplication is built on the CASPER API, as shown in Figure 16.

Figure 16. CASPER API

Finally, after running the example code, the output string is printed in CommAssistant, as shown in Figure 17.

Figure 17. CASPER example results

7 CASPER usage in mbedTLS

The mbedTLS example can be found in the SDK package at the following location:

SDK_2.6.3_LPCXpresso55S69\boards\lpcxpresso55s69\mbedtls_examples\mbedtls_benchmark\cm33_core0

The demo application runs a set of cryptographic algorithms, both symmetric and asymmetric. CASPER hardware acceleration is used for RSA-1024 encryption, ECDSA-secp256r1 signing and verification, ECDHE-secp256r1 key exchange, and ECDH-secp256r1 key exchange.

After downloading and running the code, the debug port output is as shown in Figure 18.

Figure 18. CASPER HW accelerated result

If FSL_FEATURE_SOC_CASPER_COUNT is set to 0 in LPC55S69_cm33_core0_features.h, the example switches to the software implementation. The result is shown in Figure 19.

Figure 19. Software implementation result
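The effect of the feature macro can be sketched as a compile-time switch. This is an illustration only: the helper `modexp_backend` is hypothetical, the macro is defined locally just to keep the sketch self-contained, and the exact guard expression used inside the SDK sources is an assumption.

```c
#include <assert.h>
#include <string.h>

/* In the SDK this macro comes from LPC55S69_cm33_core0_features.h.
 * It is defined here only so that the sketch compiles on its own. */
#ifndef FSL_FEATURE_SOC_CASPER_COUNT
#define FSL_FEATURE_SOC_CASPER_COUNT 1
#endif

/* Hypothetical helper: reports which path a build with this setting takes. */
const char *modexp_backend(void)
{
#if defined(FSL_FEATURE_SOC_CASPER_COUNT) && (FSL_FEATURE_SOC_CASPER_COUNT > 0)
    return "CASPER";    /* hardware-accelerated path is compiled in */
#else
    return "software";  /* portable software path is compiled in */
#endif
}
```

Because the selection happens in the preprocessor, changing the macro value requires rebuilding the project before the benchmark numbers change.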

If FSL_FEATURE_SOC_CASPER_COUNT is set to 1 in LPC55S69_cm33_core0_features.h, stepping through the code in the debugger reveals the following function calls:

1. The implementation of RSA-1024: 196.33 public/s.

CASPER_ModExp() is a low-level CASPER driver API and is used in the RSA-1024 encryption, as shown in the call stack in Figure 20.

Figure 20. RSA-1024 encryption implementation

2. The implementation of ECDSA-secp256r1: 3.33 sign/s.

CASPER_ECC_SECP256R1_Mul() is a low-level CASPER driver API and is used in ECDSA-secp256r1 signing, as shown in the call stack in Figure 21.

Figure 21. ECDSA-secp256r1 Signing implementation

3. The implementation of ECDSA-secp256r1: 3.33 verify/s.

CASPER_ECC_SECP256R1_MulAdd() is a low-level CASPER driver API and is used in ECDSA-secp256r1 verification, as shown in the call stack in Figure 22.

Figure 22. ECDSA-secp256r1 verification implementation

4. The implementation of ECDHE-secp256r1: 2.00 handshake/s.

CASPER_ECC_SECP256R1_Mul() is a low-level CASPER driver API and is used in the ECDHE-secp256r1 key exchange, as shown in the call stack in Figure 23.

Figure 23. ECDHE-secp256r1 key exchange implementation

5. The implementation of ECDH-secp256r1: 3.67 handshake/s.

CASPER_ECC_SECP256R1_Mul() is a low-level CASPER driver API and is used in the ECDH-secp256r1 key exchange, as shown in the call stack in Figure 24.

Figure 24. ECDH-secp256r1 key exchange implementation

8 Revision history

Table 1. Revision history

Rev. No.   Date              Description
0          20 April 2019     Initial release
1          15 July 2019      General updates
2          23 October 2019   Parameter update
3          7 January 2020    Include LPC55S2x/1x

© NXP B.V. 2020. All rights reserved.

Date of release: 7 January 2020
Document identifier: AN12445

