External Use
TM
Introduction to Data Path
Acceleration Architecture (DPAA)
FTF-NET-F0146
April 2014
Mary Kung | Digital Networking Applications Engineering
Session Introduction
• This session will provide:
− Introduction to the QorIQ Data Path Acceleration Architecture (DPAA)
− Discussion of how the DPAA components interact with the cores and with each other
Session Objectives
• After completing this session you will be able to:
− Understand the purpose of DPAA
− Describe the building blocks of DPAA
− Understand how the DPAA blocks interact with each other
− Understand DPAA implementations on various Freescale devices
Agenda
• Overall DPAA Architecture
• DPAA Implementation Differences
• FMAN
• BMAN
• QMAN
• SEC
• PME
• DCE
• RMAN
• Life of an Ingress Packet
• Additional Product Diagrams…
Multicore Data Path Issues and Requirements
Multicore SoCs have a number of new requirements related to packet processing when compared to single core SoCs:
− Load spreading of arriving packets across pools of cores for parallel processing
− Packet ordering issues after processing
− Pipelined processing of packets using cores
− Network I/O sharing between cores
− Hardware accelerator “virtualization”
− Inter-core communication
[Figure: several cores (each with D$ and I$ caches) sharing network I/O and a hardware accelerator]
More Multicore Data Path Requirements
• Addressing these requirements can lead to new requirements:
− Hardware-managed queues
− Hardware-supported active queue management
− Network interfaces must be able to parse, classify, and distribute frames
• High-bandwidth network I/O on QorIQ devices also drives data path requirements:
− Queue congestion driven flow control
− Resource depletion driven flow control
− Hardware buffer management
What is the Data Path Acceleration Architecture (DPAA)?
The QorIQ DPAA is a comprehensive architecture which integrates
all aspects of packet processing in the SoC
− Addresses issues and requirements resulting from the multicore nature of QorIQ SoCs
The DPAA includes:
− Network and Packet I/Os
− Hardware offload accelerators
− Infrastructure required to facilitate the flow of packets between the above
Example HW Difference: Buffer Descriptor Rings vs. DPAA
DPAA infrastructure replaces descriptor rings:
• Queuing split from buffer management
• Queues can be shared by multiple cores
• Data reception no longer throttled by how fast software can service ring entries
• Data can be stashed into cache just before it is processed
[Figure: cores (each with D$ and I$ caches) connected to the Queue Manager and Buffer Manager, which sit between the cores and the Ethernet network I/O]
QorIQ DPAA Fundamental Components
• Infrastructure components:
− QMan: Queue Manager
− BMan: Buffer Manager
• Hardware accelerators:
− SEC: Security Engine
− PME: Pattern Matching Engine
• Network and packet I/O:
− FMan: Frame Manager (Ethernet)
− RMan: RapidIO Message Manager (RapidIO messaging)
[Figure: the DPAA at the center, linking the cores with the components above, and more]
DPAA Ethernet MAC Component Differences
• QorIQ P class devices have both:
− Datapath Three-Speed Ethernet Controller (dTSEC)
− 10-Gigabit Ethernet Media Access Controller (10GEC)
• QorIQ T class devices have:
− Multi-rate Ethernet Media Access Controller (mEMAC)
QorIQ P4080 DPAA Components
[Figure: QorIQ P4080 block diagram showing the e500-mc cores (32KB D-cache, 32KB I-cache, 128KB backside L2 cache), the CoreNet coherency fabric with PAMUs, two 1024KB frontside L3 caches with 64-bit DDR-2/3 memory controllers, an 18-lane 5GHz SerDes (PCIe, SRIO), and the DPAA blocks: two Frame Managers (each with four 1GE MACs and one 10GE MAC), Queue Manager, Buffer Manager, SEC, and PME]
QorIQ T4xxx DPAA Components
Hardware accelerators: saving CPU cycles for higher-value work
• FMAN (Frame Manager): 50 Gbps aggregate parse, classify, distribute
• BMAN (Buffer Manager): 64 buffer pools
• QMAN (Queue Manager): up to 2^24 queues
• RMAN (RapidIO Manager): seamless mapping of sRIO to DPAA
• SEC (Security): 40 Gbps IPSec, SSL; public key 25K/s 1024b RSA
• PME (Pattern Matching): 10 Gbps aggregate
• DCE (Data Compression): 20 Gbps aggregate
Benefits called out on the slide (some blocks are tagged “New” or “Enhanced”):
• Compress and decompress traffic across the Internet
• Protects against internal and external Internet attacks
• Frees CPU from draining repetitive RSA, VPN and HTTPS traffic
• Identifies traffic and targets CPU or accelerator
• Line rate 50 Gbps networking
• Quality of Service for FCoE in converged data center networking
Network I/O: FMAN
Frame Manager (FMan) supports:
• (P4080) One 10GE MAC and four GE MACs
− Max 12xGE parse+classify
• (T4xxx) Two 10GE MACs and six GE MACs
• L2/L3/L4 protocol parse and validate
− User-defined protocols supported
• Hash-based queue selection for load spreading
• Exact-match classification queue selection
• IEEE 1588 timestamping
• RMON/ifMIB stats
• Color-aware dual-rate, 3-color policing
• “Right size” buffer acquisition from BMan buffer pools
• Per port egress rate limiting
• TCP/UDP TX checksum calculation
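The hash-based queue selection listed above can be sketched in software as follows. This is only an illustration of the idea: the hash function (CRC32), the 5-tuple key, and the names `select_frame_queue`, `base_fqid`, and `num_queues` are assumptions made for the example, not the FMan Keygen's actual hash or configuration interface.

```python
import zlib

def select_frame_queue(five_tuple, base_fqid, num_queues):
    """Map a flow's 5-tuple onto one of num_queues frame queues.

    CRC32 is a stand-in for the hardware hash (an assumption); any
    deterministic hash keeps all frames of one flow on one queue.
    """
    key = "|".join(str(field) for field in five_tuple).encode()
    return base_fqid + (zlib.crc32(key) % num_queues)

# Hypothetical flows: (src IP, dst IP, protocol, src port, dst port)
flow_a = ("10.0.0.1", "10.0.0.2", 6, 1234, 80)
flow_b = ("10.0.0.3", "10.0.0.2", 6, 5555, 80)

fq_a = select_frame_queue(flow_a, base_fqid=0x100, num_queues=8)
fq_b = select_frame_queue(flow_b, base_fqid=0x100, num_queues=8)
print(hex(fq_a), hex(fq_b))
```

Because the mapping depends only on the flow key, frames of one flow stay in order on one queue while different flows spread across the queue range.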
[Figure: FMan block diagram: one 10GE and four GE MACs feed the Parser, Classifier, Keygen (distribution), and Policer; the BMI, QMI, and DMA connect FMan to BMan, QMan, and buffer memory over CoreNet]
FMan Modular Architecture Processing Pipeline
Rx pipeline stages, in order:
• MAC Rx and validate
• BMI streams the incoming frame and allocates an internal buffer for the frame + IC (Internal Context)
• Calculate raw L4 checksum for parser
• Based upon Layer-2 packet size, BMI requests a “right-sized” buffer from BMan
• BMI instructs DMA to transfer frame to external buffer
• Parse / Classify / Distribute: determine queue ID
• Per-group policing (RFC 2698 / RFC 4115)
• BMI instructs DMA to write frame IC and header to external buffer
• QMI instructs QMan to enqueue the FD
• BMI releases internal buffer on completion
• QMan active queue management (WRED) and scheduling
• Core dequeue and processing
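The policing stage references RFC 2698, the two-rate three-color marker. A minimal sketch of that marker in color-blind mode is shown below. The class name, the byte-based units, and the explicit `now` timestamps are assumptions for the example, and the FMan policer itself is color-aware and configured through hardware profiles rather than code like this.

```python
class TwoRateThreeColorMarker:
    """Color-blind trTCM following the RFC 2698 metering rules."""

    def __init__(self, cir, cbs, pir, pbs):
        self.cir, self.cbs = cir, cbs    # committed rate (bytes/s), burst (bytes)
        self.pir, self.pbs = pir, pbs    # peak rate (bytes/s), burst (bytes)
        self.tc, self.tp = cbs, pbs      # both token buckets start full
        self.last = 0.0                  # time of the previous update

    def mark(self, now, size):
        # Refill both buckets for the elapsed time, capped at the burst sizes.
        elapsed = now - self.last
        self.last = now
        self.tc = min(self.cbs, self.tc + self.cir * elapsed)
        self.tp = min(self.pbs, self.tp + self.pir * elapsed)
        if self.tp < size:               # exceeds peak rate
            return "red"
        if self.tc < size:               # within peak, exceeds committed rate
            self.tp -= size
            return "yellow"
        self.tc -= size                  # within committed rate
        self.tp -= size
        return "green"

marker = TwoRateThreeColorMarker(cir=1000, cbs=1500, pir=2000, pbs=3000)
colors = [marker.mark(now=0.0, size=1000) for _ in range(4)]
print(colors)
```

A policer would then discard red frames, or pass the color on so that later stages can apply different drop preferences.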
[Figure: FMan on the QorIQ T4240: 1GE and 10GE MACs, BMI, QMI, DMA, FPM, FMC, Parser, Keygen, Classify/Distribute, Policer, and internal-context shared memory, with links to BMan (buffer pools, BP) and QMan (frame descriptors, FD). Each pipeline stage is labeled with its owning block: MAC, BMI, BMI/BMan, BMI/DMA, PCD, Policer, BMI/DMA, QMI/BMI, QMan/BMan, QMan, Core/SW]
DPAA Infrastructure: BMAN
Buffer Manager (BMan) supports:
• 64 pools of buffer pointers
− All buffers in a pool are expected to have “like” characteristics
− BMan places no restrictions on these characteristics
• Hardware (and software) acquire and release of buffer pointers from/to pools
− BMan is primarily intended to reduce the buffer management load on SW
• Pool depletion thresholds for pool replenishment and lossless flow control
− All thresholds have hysteresis
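The acquire/release model and the depletion thresholds with hysteresis can be illustrated with a small software model. The class `BufferPool` and its threshold names are hypothetical and only mimic the behavior described above; BMan implements this in hardware and is accessed through portals.

```python
class BufferPool:
    """Toy model of one buffer pool with a depletion threshold pair."""

    def __init__(self, buffers, enter_depleted, exit_depleted):
        # Hysteresis: the exit threshold sits above the entry threshold.
        assert enter_depleted < exit_depleted
        self.free = list(buffers)
        self.enter_depleted = enter_depleted
        self.exit_depleted = exit_depleted
        self.depleted = len(self.free) <= enter_depleted

    def _update_state(self):
        count = len(self.free)
        if not self.depleted and count <= self.enter_depleted:
            self.depleted = True      # e.g. trigger replenishment or flow control
        elif self.depleted and count >= self.exit_depleted:
            self.depleted = False

    def acquire(self):
        buf = self.free.pop() if self.free else None
        self._update_state()
        return buf

    def release(self, buf):
        self.free.append(buf)
        self._update_state()

pool = BufferPool(buffers=[0x1000, 0x2000, 0x3000, 0x4000],
                  enter_depleted=1, exit_depleted=3)
taken = [pool.acquire() for _ in range(3)]
state_after_acquire = pool.depleted       # one buffer left: depleted
pool.release(taken.pop())
state_after_one_release = pool.depleted   # two buffers: still depleted
pool.release(taken.pop())
state_after_two_releases = pool.depleted  # three buffers: recovered
```

The gap between the two thresholds keeps the depletion signal from toggling on every acquire and release near the boundary.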
[Figure: BMan block diagram: hardware portals (FMan, FMan, SEC, PME) and software portals (to the cores over CoreNet) access the list engines and the internal stockpile]
Terminology…
• Buffer: Unit of contiguous memory, allocated by software
• Frame: Buffer(s) that hold a data element (generally a packet)
− Frames can be single buffers or multiple buffers (scatter/gather lists)
− A “simple frame” has one delimited data element
− A “multi-buffer frame” has two or more data elements
• Frame Descriptor (FD): Proxy structure used to represent frames
• Frame Queue:
− FIFO of related Frame Descriptors (e.g. those of one TCP session)
− The basic queuing structure supported by QMan
• Frame Queue Descriptor (FQD): Structure used to manage Frame Queues
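To make the terminology concrete, the sketch below models a buffer, a scatter/gather entry, and a frame descriptor as plain data classes. The field names and layout are invented for illustration and do not reflect the hardware FD format.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Buffer:
    data: bytearray              # contiguous memory allocated by software

@dataclass
class SGEntry:
    buffer: Buffer               # which buffer holds this data element
    offset: int                  # where the element starts in the buffer
    length: int                  # element length in bytes

@dataclass
class FrameDescriptor:
    entries: List[SGEntry]       # one entry: simple frame; several: multi-buffer

    @property
    def is_simple(self) -> bool:
        return len(self.entries) == 1

    def payload(self) -> bytes:
        # Gather the frame's data elements in order.
        return b"".join(
            bytes(e.buffer.data[e.offset:e.offset + e.length])
            for e in self.entries
        )

buf_a = Buffer(bytearray(b"....headers"))
buf_b = Buffer(bytearray(b"payload...."))
simple = FrameDescriptor([SGEntry(buf_a, 4, 7)])
multi = FrameDescriptor([SGEntry(buf_a, 4, 7), SGEntry(buf_b, 0, 7)])
print(simple.payload(), multi.payload())
```

The descriptor is what moves through queues; the frame data itself stays in the buffers.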
[Figure: an Ethernet frame (preamble, destination address, source address, type, data, CRC) stored in one or more buffers; each frame is referenced by an FD, and an FQD heads a list of FDs]
Queue “Building Blocks”
• Frame Queues (FQs) are the basic queuing structure supported by QMan
− FIFO lists of Frame Descriptors (FDs)
− Each FD describes a frame, which is a delineated piece of data (e.g. a packet) in buffer(s) in memory
− Multi-buffer frames are described using Scatter/Gather Tables
− FQs are in turn enqueued on Work Queues (WQs)
• Channels are a collection of 8 WQs which have relative priority
− Class scheduling is performed at a channel
− FQs are an ordered list of frames which need to be processed in the same way
− WQs are an ordered list of FQs which all have the same priority
• A portal is an interface used to access QMan facilities (e.g. enqueue or dequeue), possibly for multiple channels
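The FQ, WQ, and channel hierarchy can be modeled in a few lines. The sketch assumes plain strict priority with WQ0 highest, which is a simplification chosen for the example; the hardware's class scheduler is more elaborate. The class names and the `dest_wq` attribute are hypothetical.

```python
from collections import deque

class FrameQueue:
    def __init__(self, fqid, dest_wq):
        self.fqid = fqid
        self.dest_wq = dest_wq       # work queue this FQ is scheduled on
        self.fds = deque()           # FIFO of frame descriptors

class Channel:
    def __init__(self):
        self.work_queues = [deque() for _ in range(8)]   # WQ0..WQ7

    def enqueue(self, fq, fd):
        was_empty = not fq.fds
        fq.fds.append(fd)
        if was_empty:                # an FQ joins its WQ when it gains work
            self.work_queues[fq.dest_wq].append(fq)

    def dequeue(self):
        # Assumed policy: serve the lowest-numbered non-empty WQ first.
        for wq in self.work_queues:
            if wq:
                fq = wq.popleft()
                fd = fq.fds.popleft()
                if fq.fds:           # still has frames: back of the same WQ
                    wq.append(fq)
                return fq.fqid, fd
        return None

channel = Channel()
fq_hi = FrameQueue(fqid=1, dest_wq=0)
fq_lo = FrameQueue(fqid=2, dest_wq=5)
channel.enqueue(fq_lo, "a")
channel.enqueue(fq_hi, "b")
channel.enqueue(fq_lo, "c")
order = [channel.dequeue() for _ in range(4)]
print(order)
```

Frames within one FQ always leave in FIFO order; priority only decides which FQ is served next.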
[Figure: QMan data structures (portal, channels, WQ0 to WQ7, FQs, FDs) referencing user memory (buffers, scatter/gather table (SGT), context)]
DPAA Infrastructure: QMAN
Queue Manager (QMan) supports:
• Low latency, prioritized queuing of descriptors between cores, network I/O, and accelerators
• Lockless shared queues for load spreading and device “virtualization”
• Order restoration as well as order preservation through queue affinity
• Active queue management (WRED)
• Optimized core interface which can pre-position data/context/descriptors in core’s cache
• Delivery of per-queue accelerator specific commands and context information to offload accelerators along with dequeued descriptors
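Active queue management with WRED can be illustrated with the classic drop-probability curve. The function below is a generic textbook WRED sketch; the parameter names and the per-color profiles are assumptions for the example and are not QMan register fields.

```python
def wred_drop_probability(avg_depth, min_th, max_th, max_p):
    """Drop probability as a function of the average queue depth."""
    if avg_depth < min_th:
        return 0.0                   # below the minimum threshold: never drop
    if avg_depth >= max_th:
        return 1.0                   # at or above the maximum: always drop
    # Linear ramp from 0 to max_p between the two thresholds.
    return max_p * (avg_depth - min_th) / (max_th - min_th)

# "Weighted": each color gets its own (hypothetical) profile, so red
# frames are dropped earlier and more aggressively than green ones.
profiles = {
    "green": dict(min_th=40, max_th=80, max_p=0.1),
    "red": dict(min_th=10, max_th=40, max_p=0.5),
}

for color, profile in profiles.items():
    print(color, [wred_drop_probability(d, **profile) for d in (5, 25, 60, 90)])
```

Dropping probabilistically before the queue is full signals congestion early instead of tail-dropping bursts once the queue overflows.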
[Figure: QMan block diagram: hardware portals (FMan, FMan, SEC, PME) and software portals (to the cores over CoreNet) exchange frame descriptors with the queuing engines, which are backed by the FQD cache and FD memory]
Core Interface: QMan Software Portals
• Software portals provide the DPAA interface to cores and software
− Portal per core
− Can be used by a core to access multiple channels or queues directly
• Low latency, lock free dequeue and enqueue of descriptors
• Portals can work closely with a core to (optionally) position the following in L1 or L2 cache:
− Descriptors
− Packet data
− Software-defined per-queue context or state information
• Queues can be “held” on a portal to ensure temporary affinity for order preservation
[Figure: two Power Architecture cores, each with D-cache, I-cache, L2 cache, and a software portal into QMan; each portal serves a dedicated channel and a shared pool channel (WQ0 to WQ7 each) plus further channels, and QMan tracks held FQs]
SEC 5.x
• Public Key Hardware Accelerators (PKHA)
− RSA and Diffie-Hellman (to 4096b)
− Elliptic curve cryptography (1023b)
− Supports Run Time Equalization
• Data Encryption Standard Accelerators (DESA)
− DES, 3DES (2K, 3K)
− ECB, CBC, OFB modes
• Advanced Encryption Standard Accelerators (AESA)
− Key lengths of 128, 192, and 256 bits
− ECB, CBC, CTR, CCM, GCM, CMAC, OFB, CFB, and XTS
• Message Digest Hardware Accelerators (MDHA)
− SHA-1, SHA-2 256-, 384-, and 512-bit digests
− MD5 128-bit digest
− HMAC with all algorithms
• ARC Four Hardware Accelerators (AFHA)
− Compatible with RC4 algorithm
• Kasumi/F8 Hardware Accelerators (KFHA)
− F8, F9 as required for 3GPP
− A5/3 for GSM and EDGE
− GEA-3 for GPRS
• Snow 3G Hardware Accelerators (STHA)
− Implements Snow 3.0
• CRC Unit
− CRC32, CRC32C, 802.16e OFDMA CRC
• Random Number Generator, random IV generation
• Header & Trailer off-load for the following Security Protocols:
− IPSec, 802.1ae, SSL/TLS, SRTP, 802.11i, 802.16e
• Modular & Scalable with simplified device driver
[Figure: SEC block diagram: Queue Manager Interface, Job Queue Controller, Descriptor Controllers, CHAs, and RTIC, attached to QMan/BMan and to CoreNet through the on-chip system interface]
Pattern Matching Engine (PME) 2.x
[Figure: Pattern Matching Engine components: Pattern Matcher Frame Agent (PMFA), Key Element Scanning Engine (KES), Data Examination Engine (DXE), Stateful Rule Engine (SRE), hash tables, caches, user-definable reports, QMan and BMan interfaces, and the on-chip system bus interface giving access to pattern descriptors and state]
• Regex support plus extensions:
− Patterns can be split into 256 sets, each of which can contain 16 subsets
− 32K patterns of up to 128B length
− 9.6 Gbps raw performance
• Combined hash/NFA technology
− No “explosion” in number of patterns due to wildcards
− Low system memory utilization
− Fast pattern database compilations and incremental updates
• Pattern identification in streamed data by matching across “work units”
• Utilizes a pipeline of processing blocks to provide a complete pattern matching solution
Data Compression Engine (DCE)
• Deflate
− RFC1951
• GZIP
− RFC1952
• Zlib
− RFC1950
− Interoperability with Zlib 1.2.5 compression library
• Encode
− RFC4648: Supports Base 64 encoding and decoding
• Operates at up to 600 MHz
− 10Gbps Compression rate
− 10Gbps Decompression rate
− 20Gbps Aggregate
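The three container formats and the Base64 encoding listed above can be exercised in software with Python's standard `zlib` and `base64` modules, which is a convenient way to produce reference data for interoperability checks. This sketch shows only the formats; it does not use or model the DCE hardware, and the helper names are assumptions for the example.

```python
import base64
import zlib

# zlib's wbits argument selects the container around the deflate stream.
WBITS = {
    "deflate": -15,   # raw deflate stream (RFC 1951)
    "zlib": 15,       # zlib header and Adler-32 trailer (RFC 1950)
    "gzip": 31,       # gzip header and CRC-32 trailer (RFC 1952)
}

def compress(data: bytes, fmt: str) -> bytes:
    compressor = zlib.compressobj(level=6, wbits=WBITS[fmt])
    return compressor.compress(data) + compressor.flush()

def decompress(blob: bytes, fmt: str) -> bytes:
    return zlib.decompress(blob, wbits=WBITS[fmt])

payload = b"DPAA " * 200
blobs = {fmt: compress(payload, fmt) for fmt in WBITS}
for fmt, blob in blobs.items():
    print(fmt, len(payload), "->", len(blob), "bytes")

# RFC 4648 Base64 encoding and decoding of a compressed blob.
encoded = base64.b64encode(blobs["gzip"])
decoded = base64.b64decode(encoded)
```

The same deflate stream sits inside all three outputs; only the header and trailer differ.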
[Figure: DCE block diagram: frame agent with QMan, BMan, and bus interfaces; compressor and decompressor with 4KB and 32KB history memories; QMan and BMan portals; connection to CoreNet]
RapidIO Message Manager (RMan)
[Figure: RMan block diagram: RapidIO inbound traffic passes through inbound rule matching, classification units, and reassembly units (with reassembly contexts) into QMan channels (WQ0 to WQ7); RapidIO outbound traffic is taken from QMan channels through an arbiter (ARB) and segmentation units (with disassembly contexts). QMan also connects the Frame Manager (four 1GE, one 10GE), SEC, PME, and the cores]
RMan Unit Comparison
Feature | QorIQ P4080 | QorIQ P2040, P3, P5, T2, T4
Outbound
Transactions supported | Type 10 Doorbells; Type 11 Messaging | Type 5 NWRITE; Type 6 SWRITE; Type 8 Port-Write; Type 9 Data Streaming; Type 10 Doorbells; Type 11 Messaging
Queues | 1 Type 10 Doorbell; 2 Type 11 Messaging | Thousands of queues supporting Type 5, 6, 8-11
Queue arbitration | Round Robin | Data Path Acceleration Architecture: 3+3+1 SP+WRR
Segmentation resources | 2 Segmentation Units | 4 Segmentation Units
Multicast support | Type 11 256B PDU to 16 destinations | Type 11 256B PDU to 32 destinations
Inbound
Transactions supported | Type 8 Port-Write; Type 10 Doorbells; Type 11 Messaging | Type 8 Port-Write; Type 9 Data Streaming; Type 10 Doorbells; Type 11 Messaging
Queues | 1 Type 8 Port-Write; 1 Type 10 Doorbell; 2 Type 11 Messaging | 1 Type 8 Port-Write; 1000s Type 9-11
Classification | 2 rules (fixed); Type 11: [mbox] | 64 rules (exact or wildcards), or map selected header fields to queue ID
Simultaneous reassembly contexts | 2 Type 11 | 16 Type 9, 11
Additional Features
Traffic management | N/A | Type 9: end-to-end XON/XOFF per-queue flow control
Life of an Ingress Packet
• FMan receives packets
− Allocates internal buffers
− Retrieves data from MAC
• BMI
− Acquires a buffer from BMan
− Uses DMA to store data in it
• Parse+classify+keygen select a queue and policer profile
• Policer “colors” and optionally discards frame
• QMan applies active queue management and enqueues frame
• Frame is enqueued to one of a pool of cores
• An available core dequeues the FD for processing
[Figure: ingress path through MAC, BMI, Parser, Classifier, Keygen, Policer, QMI, WRED, enqueue, and dequeue. FMan requests a buffer from the Buffer Manager, which returns a buffer pointer; DMA writes the packet to DDR; QMI enqueues the FD to a Queue Manager work queue (WQ0 to WQ7); one of four Power Architecture cores (D-cache, I-cache, L2 cache) dequeues it]
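The ingress steps can be tied together in a toy end-to-end model. Everything here is a software stand-in chosen for illustration: the module-level pool, the dictionary used as memory, the CRC32 hash, and the function names are all hypothetical, and policing and active queue management are omitted for brevity.

```python
import zlib
from collections import deque

free_buffers = deque(range(4))               # stand-in for a BMan buffer pool
memory = {}                                  # buffer index -> frame bytes ("DDR")
frame_queues = [deque() for _ in range(2)]   # one frame queue per worker core

def fman_receive(frame, flow_key):
    """Receive path: acquire a buffer, store the frame, enqueue an FD."""
    if not free_buffers:
        return None                          # pool depleted: frame is dropped
    buf = free_buffers.popleft()             # BMI acquires a buffer
    memory[buf] = frame                      # DMA stores the data in it
    fq = zlib.crc32(flow_key) % len(frame_queues)   # keygen selects a queue
    frame_queues[fq].append({"buf": buf, "len": len(frame)})  # enqueue the FD
    return fq

def core_dequeue(fq):
    """Core side: dequeue an FD, read the frame, release the buffer."""
    if not frame_queues[fq]:
        return None
    fd = frame_queues[fq].popleft()
    frame = memory.pop(fd["buf"])
    free_buffers.append(fd["buf"])           # buffer returns to the pool
    return frame

queue_index = fman_receive(b"payload", flow_key=b"10.0.0.1->10.0.0.2:80")
received = core_dequeue(queue_index)
print(queue_index, received)
```

Only the small descriptor passes through the queue; the frame is written to memory once and read in place by the core that dequeues it.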
Channel Enqueue / Dequeue Example (QorIQ P4080)
[Figure: QMan on the QorIQ P4080 with software portals n and n+1 (each with EQCR and DQRR rings) serving Core0 and Core1, DCP portal 0 serving FMan 1 (QMI, PCD, 1GE, 10GEC), and DCP portal 3 serving the PME. Each portal has a dedicated channel (WQ0 to WQ7), and a pool channel is shared. Each enqueue is steered to a work queue by FQD[Dest_WQ]; dequeues are delivered through the portal that serves the channel]
DCP = Direct Connect Portal / Hardware Portal
QorIQ P4040
[Figure: QorIQ P4040 block diagram: e500-mc cores (32-Kbyte D-cache, 32-Kbyte I-cache, 128-Kbyte backside L2 cache), two 512-Kbyte frontside L3 caches with 64-bit DDR-2/3 memory controllers, CoreNet coherency fabric with PAMUs, two Frame Managers (each with four 1GE and one 10GE, with parse, classify, distribute), Queue Manager, Buffer Manager, Security 4.0, Pattern Match Engine 2.0, and an 18-lane 5GHz SerDes (PCIe, SRIO)]
QorIQ P3 Series – P3041 Block Diagram
• Quad e500mc Power Architecture
− 4 cores (up to 1.5GHz)
− Each with 128KB backside L2 cache
− 1MB Shared L3 Cache w/ECC
• Memory Controller
− DDR3/3L SDRAM up to 1.3 GHz
− 32/64 bit data bus w/ECC
• High Speed Interconnect
− 4 PCIe 2.0 Controllers
− 2 sRapidIO 2.1 Controllers, Type 9 and 11 messaging
− 2 SATA 2.0
• CoreNet Switch Fabric
• Ethernet
− 5 x 10/100/1000 Ethernet Controllers, or 4x 2.5Gb/s SGMII
− 1 x 10GE Controller
− All w/ Classification, H/W Queuing, Policing, and Buffer Management, Checksum Offload, QoS, Lossless Flow Control, IEEE 1588
− Up to 1 XAUI, 4 SGMII or 2.5Gb/s SGMII, 2 RGMII
• Device
− 45nm SOI Process
− 1295-pin package, 37.5x37.5mm, pin compatible with P4040
[Figure: QorIQ P3041 block diagram: e500-mc cores (32 KB D-cache, 32 KB I-cache, 128 KB backside L2 cache), 1024 KB frontside L3 cache with 64-bit DDR3/3L memory controller, CoreNet coherency fabric with PAMUs, Frame Manager (five 1GE, one 10GE; parse, classify, distribute), Queue Manager, Buffer Manager, SEC 4.0, Pattern Match Engine 2.0, RMan, and an 18-lane 5 GHz SerDes (PCIe, SRIO, SATA 2.0)]
QorIQ P5 Series – P5020 DPAA Components
• Dual e500mc-64 Power Architecture
− 2x 64-bit e500mc cores (up to 2 GHz)
− Each with 512 KB backside L2 cache
− Dual 1MB Shared L3 Cache w/ECC
− Supports up to 64GB addressability (36 bit physical addressing)
• Memory Controller
− Dual DDR3, 3L up to 1.3 GHz
− 32/64 bit data bus w/ECC
• High Speed Interconnect
− 4 PCIe 2.0 Controllers
− 2 SRIO 2.1 Controllers, Type 9 and 11 messaging
− 2 SATA 3Gb/s
− 2 USB 2.0 with PHY
• CoreNet Switch Fabric
• Ethernet
− 5 x 10/100/1000 Ethernet Controllers
− 1 x 10GE Controller (XAUI)
− All w/ Classification, H/W Queuing, Policing, and Buffer Management, Checksum Offload, QoS, Lossless Flow Control, IEEE 1588v2
− 4 SGMII, QSGMII
• Data Path Acceleration
− SEC 4
− PME 2
− RapidIO Messaging
• Device
− 45nm SOI Process
− 1295-pin package
[Figure: QorIQ P5020 block diagram: e500mc-64 2GHz cores (32 KB D-cache, 32 KB I-cache, 512 KB backside L2 cache), 1024 KB frontside L3 cache with 64-bit DDR-3 memory controller, CoreNet coherency fabric with PAMUs, Frame Manager (five 1GE, one 10GE; parse, classify, distribute), Queue Manager, Buffer Manager, SEC 4, Pattern Match Engine 2, SRIO Manager, RAID 5/6 engine, and an 18-lane 5 GHz SerDes (PCIe, SRIO, SATA 2.0)]
DPAA Component Comparison Reference
Component | QorIQ P3041 | QorIQ P4040/80 | QorIQ P5020/40 | QorIQ T4240 / T2080
Cores | 4 | 4/8 | 2/4 | 12 cores, 24 threads / 4 cores, 8 threads
QMan | 100M ops/sec, 256 CongGrp, 10 SP | 100M ops/sec, 256 CongGrp, 10 SP | 100M ops/sec, 256 CongGrp, 10 SP | 295M ops/sec, 256 CongGrp, 50 SP
BMan | 64 BufferPool | 64 BufferPool | 64 BufferPool | 64 BufferPool
Network IO
FMan | 18Mpps | 2 * 18Mpps | 18Mpps | 2 * 37.2 Mpps / 1 * 27.2 Mpps
Accelerator
SEC | 5Gbps (v4.2) | 10Gbps (v4.0) | 10Gbps (v4.2) | 40Gbps (v5)
PME | 5Gbps | 9.6Gbps | 9.6Gbps | 9.6Gbps
RE | n/a | n/a | Yes | n/a
RMan | 1x, 2x, 4x @ 1.25, 2.5, 3.125 & 5G baud | n/a (SRIO Rev 1.2) | 1x, 2x, 4x @ 1.25, 2.5, 3.125 & 5G baud | 1x, 2x, 4x @ 1.25, 2.5, 3.125 & 5G baud
DCE | n/a | n/a | n/a | 20Gbps
DCB | n/a | n/a | n/a | Yes
Session Summary
• The Data Path Acceleration Architecture components include:
− Frame Manager
− Buffer Manager
− Queue Manager
− Hardware Accelerators (SEC, PME, DCE, RMan)
• These components are integrated to address multicore requirements such as:
− Load spreading
− Packet ordering
− Device virtualization
− Inter-core communication
− HW buffer management
For Further Information
• Freescale Website: DPAA
− http://www.freescale.com/webapp/sps/site/overview.jsp?code=QORIQ_DPAA
• Freescale Website: DPAA Reference Manual rev 2.0
− See individual device’s webpage
• Freescale Infocenter: SDK / USDPAA Information
− http://www.freescale.com/infocenter
• FTF Presentations
− FTF-NET-F0147 Data Path Acceleration Architecture (DPAA) Usage Scenarios
− FTF-NET-F0148 Data Path Acceleration Architecture (DPAA) Debug
− FTF-NET-F0031 QorIQ T4240 Communications Processor Deep Dive
− FTF-NET-F0111 Overview of Autonomous IPSec with QorIQ T Series Processors
− FTF-NET-F0246 Troubleshooting Techniques for QorIQ eTSEC and DPAA Platforms
− FTF-SDS-F0004 QorIQ Optimization Suite (QOS) Packet Analysis Tool
Session Closing
By now, you should be able to:
• Describe, at a high level, the DPAA module and how it is used in
Freescale’s devices
• Apply the knowledge gained in this presentation to begin or refine
your design efforts
Introducing the QorIQ LS2 Family
Breakthrough, software-defined approach to advance the world’s new virtualized networks
• New, high-performance architecture built with ease-of-use in mind
− Groundbreaking, flexible architecture that abstracts hardware complexity and enables customers to focus their resources on innovation at the application level
• Optimized for software-defined networking applications
− Balanced integration of CPU performance with network I/O and C-programmable datapath acceleration that is right-sized (power/performance/cost) to deliver advanced SoC technology for the SDN era
• Extending the industry’s broadest portfolio of 64-bit multicore SoCs
− Built on the ARM® Cortex®-A57 architecture with integrated L2 switch enabling interconnect and peripherals to provide a complete system-on-chip solution
QorIQ LS2 Family Key Features
Unprecedented performance and
ease of use for smarter, more
capable networks
High performance cores with leading
interconnect and memory bandwidth
• 8x ARM Cortex-A57 cores, 2.0GHz, 4MB L2 cache, w/ NEON SIMD
• 1MB L3 platform cache w/ECC
• 2x 64b DDR4 up to 2.4GT/s
A high performance datapath designed
with software developers in mind
• New datapath hardware and abstracted
acceleration that is called via standard Linux
objects
• 40 Gbps Packet processing performance with
20Gbps acceleration (crypto, Pattern
Match/RegEx, Data Compression)
• Management complex provides all
init/setup/teardown tasks
Leading network I/O integration
• 8x1/10GbE + 8x1G, MACSec on up to 4x 1/10GbE
• Integrated L2 switching capability for cost savings
• 4 PCIe Gen3 controllers, 1 with SR-IOV support
• 2 x SATA 3.0, 2 x USB 3.0 with PHY
Target markets shown: SDN/NFV, Switching, Data Center, Wireless Access
See the LS2 Family First in the Tech Lab!
4 new demos built on QorIQ LS2 processors:
Performance Analysis Made Easy
Leave the Packet Processing To Us
Combining Ease of Use with Performance
Tools for Every Step of Your Design
© 2014 Freescale Semiconductor, Inc. | External Use
www.Freescale.com