
LCU14 310- Cisco ODP v2

Date post: 18-Nov-2014
Upload: linaro
Description:
LCU14 310- Cisco ODP

Speaker: Robbie King
Date: September 17, 2014

★ Session Summary ★
Cisco to present their experience using ODP to provide portable accelerated access to crypto functions on various SoCs.

★ Resources ★
Zerista: http://lcu14.zerista.com/event/member/137757
Google Event: https://plus.google.com/u/0/events/ckmld1hll5jjijq11frbqmptet8
Video: https://www.youtube.com/watch?v=eFlTmslVK-Y&list=UUIVqQKxCyQLJS6xvSmfndLA
Etherpad: http://pad.linaro.org/p/lcu14-310

★ Event Details ★
Linaro Connect USA - #LCU14
September 15-19th, 2014
Hyatt Regency San Francisco Airport

http://www.linaro.org
http://connect.linaro.org
Transcript
Page 1: LCU14 310- Cisco ODP v2

LCU14 BURLINGAME

Robbie King, LCU14

LCU14-310: Cisco ODP

Page 2: LCU14 310- Cisco ODP v2

Agenda
● Cisco’s Data Plane
    ○ Background
    ○ Today
    ○ Initial Merchant Silicon Deployment
    ○ Subsequent Deployments
● OpenDataPlane Project
    ○ Cisco’s Interest
    ○ Cisco Crypto API
    ○ ODP Crypto API
● ODP Crypto API Status
    ○ Definition
    ○ Applications
    ○ IPsec Example App
    ○ Performance Test App & Results
    ○ HW Implementations
● Cisco DP on ODP
    ○ Introduction
    ○ Block Diagram
    ○ Status
● Going Forward

Page 3: LCU14 310- Cisco ODP v2

Cisco Data Plane - Background
● Originally developed for Cisco QFP ASIC, ASR1000 series routers
● Over one bazillion lines of code (OK, not a bazillion, but a LOT)
● Deployed on ASICs ranging from 160 to 1024 threads
● New ASICs continue to be developed
● Software leverages assists via hardware abstraction layer
    ○ Work distribution
    ○ Packet order preservation
    ○ Ordered / atomic code sections
    ○ Classification
● Crypto operations originally performed by external coprocessor
● Things have changed over the last decade...

Page 4: LCU14 310- Cisco ODP v2

Cisco Data Plane - Today
● Deployed on a variety of merchant silicon devices
    ○ x86
    ○ MIPS
    ○ PPC
    ○ ARMv7 and ARMv8
● Deployed in a variety of environments
    ○ Bare Metal (QFP ASIC)
    ○ Bare Metal (Merchant Silicon)
    ○ Native Linux Process
● Crypto operations offloaded in a variety of ways
    ○ Synchronous in place
    ○ Asynchronous coprocessor

Page 5: LCU14 310- Cisco ODP v2

Cisco Data Plane - Initial Merchant Silicon
● Initial merchant silicon deployment was (still is) “bare metal”
● Device provides several hardware assists
    ○ Work distribution
    ○ Packet ordering
    ○ Ordered / atomic code sections
    ○ Hierarchical queuing and scheduling
    ○ Cryptographic operations
● Drawbacks
    ○ Control plane runs on separate device (difficult to partition Linux / bare metal)
    ○ Huge investment in time and resources
        ○ Lack of OS required instrumenting large amount of infrastructure
        ○ Required reworking existing Cisco hardware abstraction API
        ○ HW abstraction implementation written from scratch (did not leverage vendor SDK)

Page 6: LCU14 310- Cisco ODP v2

Cisco Data Plane - Later Merchant Silicon
● Subsequent deployments run as multithreaded Linux process
● Advantages
    ○ Control plane able to run on same device
    ○ Rapid deployment velocity on new architectures
    ○ Consistent infrastructure (file I/O, core files, etc)
● Drawbacks
    ○ Kernel interaction must be kept to a minimum for performance
    ○ Hardware assists (if available) are difficult to leverage
        ○ If assist controlled by kernel then high interaction price
        ○ If assist directly accessible from user space, inconsistent API across vendors

Page 7: LCU14 310- Cisco ODP v2

OpenDataPlane - Cisco’s Interest
● Deploying on merchant silicon makes good business sense
    ○ Allows our ASIC teams to focus on high end differentiation
    ○ Allows us to take advantage of “economies of scale” using off the shelf silicon
● Difficult to compare devices today
    ○ Often unable to consider a device’s HW assists due to SW effort required to leverage
● Goal is to compare devices based on throughput, power and cost
● Desire well defined, common APIs for hardware assists
● Common APIs are good for everyone
    ○ Common APIs accelerate the development of both proprietary and open source apps
    ○ Well crafted APIs allow vendors to differentiate while maintaining portability
    ○ Facilitates device selection based on all merits

Page 8: LCU14 310- Cisco ODP v2

OpenDataPlane - Cisco Crypto API
● Defining HW assist APIs is a daunting task - prioritize
● Crypto performance is becoming increasingly important
● Getting crypto working “on the next device” has been challenging
● Cisco developed “Crypto Device Abstraction Layer” (CDAL)
    ○ Initial version defines symmetric key operations
    ○ Session creation and per packet APIs, both synchronous and asynchronous
    ○ CDAL has been / is being implemented by multiple HW vendors
● ODP project presents an awesome opportunity
    ○ Helps Cisco accelerate crypto development and participate in open source community
● Cisco requested Crypto API become an ODP priority at LCA14

Page 9: LCU14 310- Cisco ODP v2

OpenDataPlane - ODP Crypto API
● Goals for the ODP Crypto API
    ○ Level of functionality (but not necessarily semantics) similar to CDAL API
    ○ Develop within existing ODP constructs (i.e. don’t force ODP to be CDAL)
    ○ Be usable “à la carte”, i.e. don’t require wholesale conversion of app to ODP
● Deliverables
    ○ Crypto API Specification
    ○ Linux-generic reference implementation
    ○ Example application to evaluate API definition
● Stretch Goals
    ○ Test application to evaluate performance across implementations
    ○ Cisco data plane using ODP crypto API

Page 10: LCU14 310- Cisco ODP v2

ODP Crypto API Status - Definition
● Document version 1.0 available today (opendataplane.org)
● Reference implementation also available (git.linaro.org/lng/odp.git)
    ○ Patches accepted for “linux-generic” in August
    ○ Implements 3DES cipher and MD5 hash for authentication using OpenSSL libraries
    ○ Supports sync and async versions of per session and per packet operations
    ○ Supports multiple models for storing results of per packet operations
        ○ Result into same buffer (i.e. in place)
        ○ Result into new buffer (supplied by application)
        ○ Result into new buffer (allocated by implementation)
● Open issues / work items
    ○ Resolve packet / completion event relationship questions
    ○ Ability to query implementation capacities and capabilities
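
The three result-storage models listed on this slide can be pictured with a small Python sketch. This is only an illustration of the idea, not the ODP crypto API: `ResultMode`, `toy_transform`, and `crypto_operation` are hypothetical names, and the XOR "cipher" is a placeholder with no security value.

```python
from enum import Enum
from typing import Optional


class ResultMode(Enum):
    """Hypothetical labels for the three result-storage models."""
    IN_PLACE = "result into same buffer"
    APP_BUFFER = "result into new buffer supplied by application"
    IMPL_BUFFER = "result into new buffer allocated by implementation"


def toy_transform(data: bytes) -> bytes:
    """Placeholder for a cipher: XOR each byte with 0x5A (not real crypto)."""
    return bytes(b ^ 0x5A for b in data)


def crypto_operation(src: bytearray, mode: ResultMode,
                     dst: Optional[bytearray] = None) -> bytearray:
    """Transform src and store the result according to the chosen model."""
    out = toy_transform(bytes(src))
    if mode is ResultMode.IN_PLACE:
        src[:] = out                      # overwrite the input buffer
        return src
    if mode is ResultMode.APP_BUFFER:
        if dst is None or len(dst) < len(out):
            raise ValueError("application must supply a large enough buffer")
        dst[:len(out)] = out              # caller owns the output buffer
        return dst
    return bytearray(out)                 # implementation allocates the output
```

The trade-off the sketch hints at: in-place avoids a second buffer, an application-supplied buffer keeps allocation under the caller's control, and an implementation-allocated buffer lets the crypto engine pick memory that suits it.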

Page 11: LCU14 310- Cisco ODP v2

ODP Crypto API Status - Applications
● IPsec example application
    ○ Patches reviewed and ready for implementation
    ○ Vehicle to evaluate “robustness” of ODP crypto API
    ○ Implements IPsec ESP and AH protocols using 3DES and MD5/96
● Performance test application
    ○ Initial version functioning, more work to do before submitting patches
    ○ Measures throughput for various payload sizes
    ○ Preliminary results (next slides)
● Cisco DP using ODP crypto API
    ○ Start gated by Cisco data plane work items
    ○ Pending DP port to ODP infrastructure
    ○ Pending DP support for configuring crypto in “headless” environment

Page 12: LCU14 310- Cisco ODP v2

ODP Crypto API Status - IPsec Example App
● Configuration driven from command line (modeled after “setkey”)
● IPv4 forwarding between ports based on configured routing table
● IPsec encode/decode based on configured SA/SP database entries
● Currently transport mode only, tunnel mode to be added (Bug 641)
● Supports live traffic (demos on multiple platforms this week)
● Supports standalone traffic generation / verification
    ○ Generates packets internally, captures and verifies results without need for packet IO
● Utilizes key features of ODP
    ○ Runs on multiple cores, utilizing either odp_schedule or polled queues
    ○ Utilizes ORDERED and ATOMIC queues to maintain ordering
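
To make the ordering point concrete, here is a minimal Python sketch of the behavior an ordered queue is meant to provide: packets may finish processing on different cores in any order, yet they leave in their arrival order. It models only the observable effect with a reorder buffer keyed by sequence number; it is an assumption-laden illustration, not how any ODP implementation realizes ORDERED queues, and `ReorderBuffer` is a hypothetical name.

```python
class ReorderBuffer:
    """Releases completed packets strictly in arrival (sequence) order."""

    def __init__(self):
        self.next_seq = 0      # sequence number allowed to leave next
        self.pending = {}      # finished packets waiting for earlier ones

    def complete(self, seq, packet):
        """Record a finished packet and return those now releasable, in order."""
        self.pending[seq] = packet
        released = []
        while self.next_seq in self.pending:
            released.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return released
```

For example, if packet 1 finishes before packet 0, `complete(1, ...)` releases nothing, and the later `complete(0, ...)` releases both packets in order.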

Page 13: LCU14 310- Cisco ODP v2

ODP Crypto API Status - Performance Test App
● All testing performed on TI Keystone II eval system (4xA15)
● Compare “linux-generic” (SW) versus “linux-keystone2” (HW)
● Test loops, invoking per packet crypto API, measures elapsed time
    ○ Single encode/encrypt session used for testing
    ○ Session specifies both cipher (3DES) and authentication (MD5-96)
    ○ Async test saturates pipeline with parallel encrypt operations, polls for responses
● Caveats
    ○ The “linux-generic” as tested focuses on functionality not performance
    ○ The “linux-keystone2” as tested has yet to be performance optimized (but will be soon)
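
The shape of such a measurement loop can be sketched in Python. This is not the ODP performance test application: it times only a software HMAC-MD5 truncated to 96 bits (the 3DES cipher is omitted because the Python standard library does not provide it), the all-zero key and payload are arbitrary, and `hmac_md5_96` and `measure` are hypothetical names. Throughput is expressed here as units of 1,024 bytes per second, which is an assumption about the unit.

```python
import hashlib
import hmac
import time


def hmac_md5_96(key: bytes, payload: bytes) -> bytes:
    """HMAC-MD5 truncated to 96 bits (12 bytes)."""
    return hmac.new(key, payload, hashlib.md5).digest()[:12]


def measure(payload_size: int, iterations: int = 1000):
    """Return (mean elapsed microseconds per operation, throughput in KB/s)."""
    key = bytes(16)                 # arbitrary all-zero key for the sketch
    payload = bytes(payload_size)   # arbitrary all-zero payload
    start = time.perf_counter()
    for _ in range(iterations):
        hmac_md5_96(key, payload)
    elapsed_us = (time.perf_counter() - start) * 1e6 / iterations
    throughput_kb_s = payload_size / (elapsed_us * 1e-6) / 1024
    return elapsed_us, throughput_kb_s


if __name__ == "__main__":
    for size in (16, 64, 256, 1024, 8192, 16384):
        us, kbs = measure(size)
        print(f"{size:>6} bytes  {us:10.3f} us  {kbs:12.0f} KB/s")
```

Numbers from this sketch depend entirely on the host and say nothing about the platforms measured in the talk.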

Page 14: LCU14 310- Cisco ODP v2

ODP Crypto API Status - Perf Test App Results

                   linux-generic                     linux-keystone2
payload (bytes)    elapsed (us)   throughput (KB/s)  elapsed (us)   throughput (KB/s)
16                 14.447         1,081              2.782          5,615
64                 22.132         2,823              2.804          22,290
256                52.910         4,725              2.867          87,198
1,024              176.745        5,657              7.349          136,076
8,192              1,331.475      6,008              56.250         142,221
16,384             2,652.426      6,032              112.500        142,221

● In summary, H/W assist is ~22 times faster for sizeable payloads
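
The table's columns can be cross-checked with a few lines of Python. The sketch assumes the throughput column is payload size divided by elapsed time, expressed in units of 1,024 bytes per second; that assumption reproduces the linux-generic figures and comes within a few units of the linux-keystone2 ones. The function names are hypothetical.

```python
# (payload bytes, linux-generic elapsed us, linux-keystone2 elapsed us)
ROWS = [
    (16, 14.447, 2.782),
    (64, 22.132, 2.804),
    (256, 52.910, 2.867),
    (1024, 176.745, 7.349),
    (8192, 1331.475, 56.250),
    (16384, 2652.426, 112.500),
]


def throughput_kb_per_s(payload_bytes: int, elapsed_us: float) -> float:
    """Payload per second, in units of 1,024 bytes (assumed unit)."""
    return payload_bytes / (elapsed_us * 1e-6) / 1024


def speedup(sw_elapsed_us: float, hw_elapsed_us: float) -> float:
    """How many times faster the hardware path is for the same payload."""
    return sw_elapsed_us / hw_elapsed_us


if __name__ == "__main__":
    for size, sw_us, hw_us in ROWS:
        print(size,
              int(throughput_kb_per_s(size, sw_us)),
              int(throughput_kb_per_s(size, hw_us)),
              round(speedup(sw_us, hw_us), 1))
```

Computed from these rounded table values, the ratio for payloads of 1,024 bytes and up comes out at roughly 23 to 24, in the same range as the summary figure on the slide.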

Page 15: LCU14 310- Cisco ODP v2

ODP Crypto API Status - HW Implementations
● Several vendors demoing this week using IPsec example app
● Texas Instruments - Keystone II
    ○ Asynchronous / new buffer mode
● Cavium - Octeon CN66XX
    ○ Synchronous / in place mode
● Freescale - P4080DS
    ○ Asynchronous / in place mode
● Avago - AXM5500
    ○ Asynchronous / in place mode

Page 16: LCU14 310- Cisco ODP v2

Cisco DP on ODP - Introduction
● With Crypto API defined, where do we focus next?
● As core counts grow, HW assists critical to core over core scaling
● Leveraging merchant silicon HW assists proves challenging
    ○ Large resource investment for each device / SDK targeted
    ○ Different operating environments
    ○ Different levels of abstraction
● ODP potentially allows Cisco to quickly leverage critical assists
    ○ Work distribution - odp_schedule()
    ○ Packet ordering - ODP_SCHED_SYNC_ORDERED queues
    ○ Ordered / atomic code sections - ODP_SCHED_SYNC_ATOMIC queues
    ○ Buffer management
    ○ Crypto operations

Page 17: LCU14 310- Cisco ODP v2

Cisco DP on ODP - Block Diagram

[Block diagram. Labels: RX IF, TX IF, Crypto Assist, SCHEDULER, BUFFER MANAGER, ORDERED, ATOMIC, Core 0, Core 1, Core N]

Loop doing the following
● Call odp_schedule for new work
● Process as much as possible
● Call odp_queue_enq to send to
    ○ Output interface or
    ○ Crypto assist engine or
    ○ Ordered queue or
    ○ Atomic queue
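
The per-core loop on this slide can be mimicked with a single-threaded Python toy. It only mirrors the control flow (get work, process, enqueue onward); the real calls named on the slide, odp_schedule and odp_queue_enq, are C functions whose signatures are not shown in the talk. Every name below (`ToyScheduler`, `make_packet`, `worker_loop`, `crypto_assist`, `run`) is hypothetical, and ordered and atomic queues are left out for brevity.

```python
from collections import deque


class ToyScheduler:
    """Stand-in for the scheduler: hands out queued work in FIFO order."""

    def __init__(self):
        self._work = deque()

    def enqueue(self, item):
        self._work.append(item)

    def schedule(self):
        """Return the next work item, or None when nothing is pending."""
        return self._work.popleft() if self._work else None


def make_packet(pkt_id, needs_crypto):
    return {"id": pkt_id, "needs_crypto": needs_crypto,
            "encrypted": False, "hops": 0}


def worker_loop(sched, crypto_q, tx_q):
    """Ask for work, process it, then enqueue it to the next destination."""
    while True:
        pkt = sched.schedule()                  # plays the role of odp_schedule
        if pkt is None:
            break
        pkt["hops"] += 1                        # "process as much as possible"
        if pkt["needs_crypto"] and not pkt["encrypted"]:
            crypto_q.append(pkt)                # enqueue to the crypto assist
        else:
            tx_q.append(pkt)                    # enqueue to the output interface


def crypto_assist(crypto_q, sched):
    """Toy crypto engine: mark packets done and hand them back as new work."""
    while crypto_q:
        pkt = crypto_q.popleft()
        pkt["encrypted"] = True
        sched.enqueue(pkt)


def run(packets):
    sched, crypto_q, tx_q = ToyScheduler(), deque(), deque()
    for pkt in packets:
        sched.enqueue(pkt)
    worker_loop(sched, crypto_q, tx_q)   # first pass: forward or send to crypto
    crypto_assist(crypto_q, sched)       # crypto engine completes its work
    worker_loop(sched, crypto_q, tx_q)   # second pass: forward completed packets
    return list(tx_q)
```

A packet that needs crypto passes through the worker loop twice, once before and once after the crypto engine, which is why the scheduler is the single entry point for all work in this picture.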

Page 18: LCU14 310- Cisco ODP v2

Cisco DP on ODP - Status
● Currently forwarding IPv4 on x86 and ARM using “linux-generic”
● Development started on ARM using “linux-keystone2”
● Demoing on ARM this week
● Next steps
    ○ Target additional platforms as ODP implementations become available
    ○ Performance analysis and optimizations
    ○ End to end QoS analysis (priority, over subscription, etc)
    ○ Integration of CDAL / crypto API

Page 19: LCU14 310- Cisco ODP v2

Going Forward
● For ODP 1.0
    ○ Quickly finalize the basic APIs
    ○ Strive for functionality not perfection
    ○ Define tear down APIs for normal application exit
    ○ Define abnormal cleanup APIs / mechanisms for abnormal exit
    ○ Complete API compliance test suite
● Post ODP 1.0
    ○ Focus on performance and HW implementations
    ○ Verify 1.0 APIs can be implemented efficiently across member hardware
    ○ Verify 1.0 APIs can be used to build a non-trivial application considering
        ○ Portability
        ○ Performance
        ○ Quality of Service (for example, behavior of overall system when over-subscribed)

Page 20: LCU14 310- Cisco ODP v2

More about Linaro Connect: connect.linaro.org
Linaro members: www.linaro.org/members
More about Linaro: www.linaro.org/about/

