“A PROGRAMMABLE AND HIGHLY PIPELINED PPP ARCHITECTURE FOR GIGABIT IP OVER SDH/SONET”
Ciaran Toal, Sakir Sezer
10th Reconfigurable Architecture Workshop 2003
Outline
Introduction to IP over SONET/SDH
The Point-to-Point Protocol (PPP)
PPP System Architecture
PPP Data-Path
  32-Bit Programmable CRC Unit
  Escape Detect and Escape Generate Units
Synthesis and Circuit Analysis
Comparison between 8-bit and 32-bit implementations
Conclusions
IP over SDH/SONET Layer 2 Technologies
[Figure: protocol stack. Network Layer: Internet Protocol. Data Link Layer: Ethernet (Bridge, VLAN), PPP, GFP*, Frame Relay, ATM (MPOA, MPLS, ...), grouped into Legacy Protocols and Emerging Protocols. PHY Layer: SDH/SONET.]
Point-to-Point Protocol based Networking
[Figure: protocol stack with the Internet Protocol (Network Layer) carried over PPP (Data Link Layer) on the PHY Layer. PPP rates shown: 56 kbps, 0.5-2 Mbps, 155 Mbps, 622 Mbps and 2.4 Gbps. Physical links shown: Dialup Modem, ADSL Modem, T1/E1 Leased Line, STM-1/OC-3, STM-4/OC-12 and STM-16/OC-48 over SDH/SONET (optical layer).]
The Point-to-Point Protocol
The most efficient layer 2 protocol for encapsulating IP datagrams
Key applications include ADSL, dialup modems, encapsulated Ethernet, Virtual Local Area Networks (VLAN), Virtual Private Networks (VPN) etc.
Key functions:
  Framing and error control method for encapsulating data over physical layer point-to-point links.
  Link Control Protocol (LCP) functions to establish, negotiate, configure and terminate PPP links between two network nodes.
  Network Control Protocol (NCP) functions for upper network protocols, optional for ATM, IP, Ethernet etc.
The PPP Frame Format
The PPP frame is made up of the following fields:
Flag - A single byte which indicates the beginning or end of a frame. The flag field always consists of the binary sequence 01111110
Address - A single byte that contains the binary sequence 11111111. This is the standard broadcast address
Control - A single byte that contains the binary sequence 00000011, which requests user data transmission in an unsequenced frame
Protocol - Two bytes (one byte if protocol field compression is negotiated) that identify the protocol encapsulated in the information field of the frame
Payload - Zero or more bytes that contain the datagram for the protocol specified in the protocol field
Frame Check Sequence (FCS) - Normally 16 bits (2 bytes) but can be 32 bits (4 bytes) for improved error detection

Field      Flag      Address   Control   Protocol  Payload   Checksum  Flag
Bytes      1         1         1         1 or 2    Variable  2 or 4    1
Value      01111110  11111111  00000011                                01111110
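The field layout above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' hardware: the helper name `build_ppp_frame` is hypothetical, the 32-bit FCS is assumed to be the standard CRC-32 that `zlib.crc32` computes, and it is assumed to be sent least significant byte first. Escaping of payload bytes is left out here.

```python
import zlib

FLAG = 0x7E      # 01111110, frame delimiter
ADDRESS = 0xFF   # 11111111, standard broadcast address
CONTROL = 0x03   # 00000011, unsequenced user data

def build_ppp_frame(protocol: int, payload: bytes) -> bytes:
    """Assemble an unescaped PPP frame with a 32-bit FCS (hypothetical helper)."""
    # The FCS covers Address, Control, Protocol and Payload, not the flags.
    body = bytes([ADDRESS, CONTROL]) + protocol.to_bytes(2, "big") + payload
    # Assumption: FCS-32 equals zlib's CRC-32 and is sent low byte first.
    fcs = zlib.crc32(body).to_bytes(4, "little")
    return bytes([FLAG]) + body + fcs + bytes([FLAG])

# 0x0021 is the PPP protocol number for IPv4; the payload is a dummy stub.
frame = build_ppp_frame(0x0021, b"\x45\x00")
print(frame.hex(" "))
```

With a two-byte payload the frame is 12 bytes long: two flags, Address, Control, two Protocol bytes, the payload and four FCS bytes.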
Programmable SoPC System Architecture for PPP processing
The programmable SoPC architecture for PPP processing is composed of three architecturally independent units:
Protocol Data-Path Unit
  A highly pipelined and parallel frame processing circuit
Protocol Control-Path Unit
  Data path control, register and control protocol FIFO circuit
Embedded Microcontroller Unit
  Responsible for control and management protocol processing and data-path configuration.
PPP SoPC System Architecture
[Figure: system block diagram split into Software and Hardware. Software: Higher Layer Protocol Processing, e.g. Router. Hardware: Soft µP Core (Master) on a Local µP Bus with MEM and Peripherals (Slaves); Local µP Bus Interface (Slave); Protocol Control Path with OAM, Control, FIFO Control and Memory Control blocks and two SRAMs; Protocol Data Path with PPP RxD and PPP TxD, each connected to a PHY.]
P5 Protocol Data-Path Units
32-bit wide circuit, implemented completely in hardware
Operates at 78.125 MHz, offering a data transfer rate of 2.5 Gbps (32 bits × 78.125 MHz)
TxD and RxD units are independent, parallel and pipelined
This presentation focuses on this unit. An 8-bit version has also been implemented
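The relation between data-path width, clock and throughput can be checked with a short calculation. This is a sketch that assumes one full word is transferred every clock cycle; the function name `line_rate_bps` is illustrative only.

```python
CLOCK_HZ = 78.125e6  # data-path clock quoted on the slide

def line_rate_bps(width_bits: int, clock_hz: float = CLOCK_HZ) -> float:
    """Raw throughput, assuming one full word per clock cycle."""
    return width_bits * clock_hz

print(line_rate_bps(32) / 1e9)  # 32-bit data path, in Gbps
print(line_rate_bps(8) / 1e6)   # 8-bit data path at the same clock, in Mbps
```

The 32-bit path gives 2.5 Gbps. At the same clock an 8-bit path would give 625 Mbps, which is an inference from the arithmetic rather than a figure stated in the slides.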
[Figure: data-path block diagrams. Transmitter: Transmitter Control, Transmitter CRC Control and Escape Generate, each with its own State Machine, on a 32-bit Data Path. Receiver: Receiver Control, Receiver CRC Control and Escape Detect, each with its own State Machine, on a 32-bit Data Path.]
Embedded Microprocessor Unit
Based on a soft IP core processor (e.g. Nios, MicroBlaze or LEON).
Processes the majority of the control and management protocols and the control / housekeeping functions.
Interfaced to a local bus via a standard embedded bus architecture, e.g. AMBA
All implemented functions are software based and can be reprogrammed without interrupting the data transmission path.
[Figure: embedded processor subsystem. Labels: Leon µP (Master), AMBA bus, MEM, Peripherals, Slave ports.]
Protocol Control-Path Unit
Handles the interaction between the embedded processor and the Layer 2 protocol data path
Includes an Operation, Administration and Maintenance (OAM) unit
Accommodates a transmitter and a receiver FIFO for temporary storage of control protocols.
[Figure: Protocol Control-Path Unit. Labels: AMBA Bus Interface, OAM, Control, two FIFO Control blocks, two Memory Control blocks, two SRAMs.]
The P5 TxD Data-Path
Consists of three main units: a control unit, a CRC processing unit and an Escape Generate unit. The data-path is heavily pipelined.
[Figure: Transmitter Control, Transmitter CRC Control and Escape Generate, each with its own State Machine, on a 32-bit Data Path.]
The P5 RxD Data-Path
Essentially the reverse operation of the TxD. The circuit of each block is very different from the TxD sub-blocks.
[Figure: Receiver Control, Receiver CRC Control and Escape Detect, each with its own State Machine, on a 32-bit Data Path.]
[Figure: byte sequence of a transmitted frame: F, A, C, P1, P2, payload bytes 1 to 21, four CRC bytes and a closing F, followed by the start of the next frame (F, A, C, P1, P2, payload bytes 1 to 4). Labels: Transmit, Start of Frame.]
Configurable 32-Bit parallel CRC computation
The CRC unit must be able to process 1, 2, 3 or 4 input bytes per cycle
The need for configurability requires further pipelining
Wait state insertion for backpressure must be included
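The 1, 2, 3 or 4 byte cases arise because a frame's length is not always a multiple of four, so the last word may be only partly valid. The software model below is a sketch of that behaviour, not the slide's circuit: it assumes the standard reflected CRC-32 polynomial 0xEDB88320 with initial value 0xFFFFFFFF and a final complement, and it updates the register bit by bit, where the hardware would use a parallel XOR matrix. The names `crc32_step` and `fcs32` are illustrative.

```python
POLY = 0xEDB88320  # reflected CRC-32 polynomial (assumed to match the FCS-32)

def crc32_step(crc: int, chunk: bytes) -> int:
    """Advance the CRC register over 1 to 4 valid bytes: one data-path cycle."""
    assert 1 <= len(chunk) <= 4
    for byte in chunk:
        crc ^= byte
        for _ in range(8):  # bit-serial equivalent of the parallel XOR matrix
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
    return crc

def fcs32(data: bytes) -> int:
    """Feed the data in 32-bit words; the last word may hold 1 to 3 bytes."""
    crc = 0xFFFFFFFF
    for i in range(0, len(data), 4):
        crc = crc32_step(crc, data[i:i + 4])
    return crc ^ 0xFFFFFFFF

print(hex(fcs32(b"123456789")))  # nine bytes: two full words plus one byte
```

The result does not depend on how the bytes are grouped into words, which is the property the configurable unit has to preserve.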
[Figure: Configurable 32-Bit CRC Calculator Circuit. Labels: Input Data (32 bits), Input Data Buffer, Data Bytes Select (2 bits), Byte Reorder Mechanism, CRC Computational Matrix with byte lanes Bits 0-7, Bits 8-15, Bits 16-23 and Bits 24-31, Output Data Buffer, CRC Output (32 bits).]
PPP Frame Delimiter
A safeguard is required for any instance of the flag character 0x7E outside the flag fields
Escape character: 0x7D
The flag character 0x7E is replaced by the escape character 0x7D followed by the original byte with its sixth bit complemented (0x7E XOR 0x20 = 0x5E).
e.g. consider the following data sequence:
Data before transmission: 6C 12 7E 51 4A
Data stream transmitted:  6C 12 7D 5E 51 4A
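The escaping rule can be expressed as a byte-wise sketch in Python. The function names are illustrative. Escaping the escape character 0x7D itself is an assumption taken from standard PPP framing practice; the slide only mentions the flag character.

```python
FLAG, ESC, XOR_MASK = 0x7E, 0x7D, 0x20  # XOR_MASK flips the sixth bit

def escape(data: bytes) -> bytes:
    """Transmit side: replace each reserved byte by ESC plus the masked byte."""
    out = bytearray()
    for b in data:
        if b in (FLAG, ESC):  # assumption: ESC itself is escaped as well
            out += bytes([ESC, b ^ XOR_MASK])
        else:
            out.append(b)
    return bytes(out)

def unescape(data: bytes) -> bytes:
    """Receive side: drop each ESC and restore the byte that follows it."""
    out = bytearray()
    pending = False  # True when the previous byte was ESC
    for b in data:
        if pending:
            out.append(b ^ XOR_MASK)
            pending = False
        elif b == ESC:
            pending = True
        else:
            out.append(b)
    return bytes(out)

sent = escape(bytes.fromhex("6C127E514A"))
print(sent.hex(" "))  # the 7E in the middle becomes 7D 5E
```

Each escaped byte lengthens the stream by one byte, which is what makes the fixed-width 32-bit version of this operation harder than the 8-bit one.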
32-Bit Escape Generate
Consider 4 bytes of data to be processed
Byte location 2 contains a flag character 0x7E
The Escape generator inserts a 0x7D (Esc) character.

Byte location    Input   Output
1 (7-0)          A1      A1
2 (15-8)         7E      7D
3 (23-16)        12      5E
4 (31-24)        8C      12
Extra byte               8C
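A word-level software model makes the extra byte visible. This is a behavioural sketch under assumptions, not the circuit: words are lists of four bytes with index 0 standing for byte location 1, the generator name `escape_generate` is illustrative, and the escape character 0x7D is assumed to be escaped too.

```python
FLAG, ESC = 0x7E, 0x7D

def escape_generate(words):
    """Yield 4-byte output words; overflow bytes are carried to the next word."""
    carry = []  # bytes waiting for the next output word
    for word in words:  # word[0] is byte location 1 (bits 7-0)
        for b in word:
            carry += [ESC, b ^ 0x20] if b in (FLAG, ESC) else [b]
        while len(carry) >= 4:
            yield carry[:4]
            carry = carry[4:]
    if carry:
        yield carry  # final, partly filled word

out = list(escape_generate([[0xA1, 0x7E, 0x12, 0x8C]]))
print([[f"{b:02X}" for b in w] for w in out])
```

For the example word the model emits A1 7D 5E 12 and carries 8C into the next word. A word made of four flag characters expands to eight bytes, so one input word can produce two output words, and the input side then has to wait for a cycle.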
32-Bit Escape Generate Circuit
Extra pipelining is added to reduce the critical path.
A data reorder mechanism is introduced to insert Escape characters and to rearrange the frame data on the 4-byte data path.
A buffer is introduced to enable a full-cycle back-pressure mechanism after inserting 4 Escape characters.
[Figure: Escape Generate circuit. Labels: Data In (32 bits, Bytes 1-4), Data Latch (64 bits, Bytes 1-8), Flag Detect, Data Fill, XOR + Insert, MUX, Buffer Fill, Buffer Store (64 bits, Bytes 1-8), Feedback Buffer (32 bits), Data Out (32 bits, Bytes 1-4), Data reorganising mechanism controller.]
32-Bit Escape Detect
Consider 4 bytes at the receiver
Byte location 2 contains the escape character 0x7D

Byte location    Input   Output
1 (7-0)          A1      A1
2 (15-8)         7D      7E
3 (23-16)        5E      8C
4 (31-24)        8C      Empty
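The receive-side case can be modelled in the same word-oriented way. This is a sketch under assumptions: words are lists of four bytes with index 0 standing for byte location 1, the function name `escape_detect` is illustrative, and an escape character in byte location 4 is assumed to apply to the first byte of the next word.

```python
ESC = 0x7D

def escape_detect(words):
    """Remove escape characters word by word; shortened words model empty lanes."""
    out = []
    pending = False  # True when the previous byte was ESC
    for word in words:
        kept = []
        for b in word:
            if pending:
                kept.append(b ^ 0x20)  # restore the original byte
                pending = False
            elif b == ESC:
                pending = True  # drop the escape character itself
            else:
                kept.append(b)
        out.append(kept)
    return out

result = escape_detect([[0xA1, 0x7D, 0x5E, 0x8C]])
print([[f"{b:02X}" for b in w] for w in result])
```

The example word comes out as A1 7E 8C with one lane left empty. In hardware that empty lane has to be filled from the following word, which is where the reorder and buffer stages come in.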
32-Bit Escape Detect Unit
Extra pipelining is added to reduce the critical path.
A data reorder mechanism is introduced to eliminate inserted Escape characters.
A buffer is introduced to enable full-cycle wait states.
[Figure: Escape Detect circuit. Labels: Data In (32 bits, Bytes 1-4), Data Latch (32 bits, Bytes 1-4), Flag Detect, Data Fill, XOR Next Byte, MUX, Buffer Fill, Buffer Store (64 bits), Feedback Buffer (64 bits, Bytes 1-4), Feedback Buffer Fill, No Data, Data Out (32 bits, Bytes 1-4), Data reorganising mechanism controller.]
8-Bit System Synthesis Results

8-Bit System
Device      Pre-layout Synthesis                 Post-layout Synthesis
XCV50-4     95.3 MHz, 184 LUTs, 84 Registers     79.5 MHz, 130 Slices, 191 LUTs
XC2V40-6    128.4 MHz, 179 LUTs, 84 Registers    91.5 MHz, 124 Slices, 185 LUTs

The 8-bit PPP meets the required speed of 78.125 MHz with both Virtex and Virtex II technology
Virtex II enables considerable speed-up over Virtex.
Critical path analysis revealed the same number of LUTs in the critical path for both technologies. Thus the speed-up is achieved because of the technological advantage of Virtex II
32-Bit System Synthesis Results

32-Bit System
Device        Pre-layout Synthesis                   Post-layout Synthesis
XCV600-4      73 MHz, 2641 LUTs, 841 Registers       65 MHz, 1208 Slices, 2563 LUTs
XC2V1000-6    125.9 MHz, 2230 LUTs, 689 Registers    78.66 MHz, 1144 Slices, 2157 LUTs

The 32-bit PPP implementation meets the required speed of 78.125 MHz with Virtex II technology only.
Again, critical path analysis revealed the same number of LUTs in the critical path for both technologies.
The 32-bit implementation is about 11 times larger than the 8-bit implementation.
Escape Generate Unit Synthesis

Escape Generate Module (XC2V40-6)
             32-Bit Implementation   8-Bit Implementation
LUTs         492 (96%)               22 (4%)
Registers    168 (32%)               6 (~1%)
Size increase: × 22.36

The 32-bit Escape Generate unit requires 22 times more combinational logic and 28 times as many flip-flops as the 8-bit version.
Therefore the large size increase for the 32-bit PPP is due to the decision logic and data reordering mechanisms included within the CRC and Escape units.
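The quoted size ratios can be reproduced from the reported resource counts. The sketch below uses the Virtex II post-layout figures for the complete systems and the Escape Generate module figures; the dictionary layout and the helper name `growth` are illustrative.

```python
# Post-layout Virtex II figures for the complete PPP systems.
SYSTEM = {"8-bit": {"luts": 185, "slices": 124},
          "32-bit": {"luts": 2157, "slices": 1144}}
# Escape Generate module figures.
ESCAPE_GEN = {"8-bit": {"luts": 22, "registers": 6},
              "32-bit": {"luts": 492, "registers": 168}}

def growth(table: dict, resource: str) -> float:
    """Ratio of the 32-bit resource count to the 8-bit count."""
    return table["32-bit"][resource] / table["8-bit"][resource]

for name, table in (("system", SYSTEM), ("escape generate", ESCAPE_GEN)):
    for resource in table["8-bit"]:
        print(f"{name:16s} {resource:10s} x{growth(table, resource):.2f}")
```

The LUT ratio for the complete system comes out between 11 and 12, and the Escape Generate ratios are 22.36 for LUTs and 28 for registers, so a fourfold wider data path costs far more than four times the logic.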
Conclusions
We have shown that programmable Layer 2 network processing at a 2.5 Gbps throughput rate, including control protocol processing, is feasible using the latest SoPC technology.
The circuit study has revealed that the 32-bit PPP implementation is 11 times larger than the 8-bit version.
Further analysis has shown that this increase is mainly due to the byte sorter and buffering mechanisms included in the 32-bit design, which are heavy in combinational logic.
Programmability of PPP processing is achieved by reconfiguring the programmable logic blocks and by reprogramming the firmware of the embedded processor.
Future Work
ASIC implementation of the current PPP SoPC architecture as a complete SoC (System on a Chip) solution.
Investigating new technologies and system architectures for programmable / configurable network processing.
Investigating trade-offs using off-the-shelf embedded processor and FPGA technology for network processing.
Development of a new generation of programmable packet processing elements by combining configurable logic with custom processing technology.