A Comparison of Five Different Multiprocessor SoC Bus Architectures
Kyeong Keol Ryu, Eung Shin and Vincent J. Mooney III
School of Electrical and Computer Engineering
Georgia Institute of Technology, Atlanta, USA
{kkryu, eung, mooney}@ece.gatech.edu
Outline
Introduction
Motivation and Previous Work
Five Bus Architectures for SoC: BFBA, GBIA, GBIIA, CSBA, and CCBA
Application Examples: OFDM transmitter and MPEG2 decoder
Experiment Environment
Comparison in View of Algorithm and Architecture
Comparison of Throughput of the Bus Architectures
Conclusion
Introduction
[Figure: four processor blocks, A to D, each with an MPC750, SRAM, registers, and a Bi-FIFO on its own CPU bus (CPU Bus A to CPU Bus D); labeled PCB]
Motivation and Previous Work (I)
CoreConnect (IBM):
  • Processor Local Bus (PLB)
  • On-chip Peripheral Bus (OPB)
AMBA (ARM):
  • Advanced High-performance Bus (AHB)
  • Advanced Peripheral Bus (APB)
Intellectual Property (IP)
[Figure: IP1, IP2, and IP3 attached to the PLB; IP1, IP2, and IP3 attached to the AHB]
Motivation and Previous Work (II)
Sonics uNetwork
  • TDMA arbitration
  • IP reuse and integration
Wishbone architecture (Silicore)
  • one bus for all
  • supports multiple masters
In terms of bus topology, uNetwork and Wishbone are similar to AMBA and CoreConnect
Five Bus Architectures for 4-Processor System (I)
Bi-FIFO Bus Architecture (BFBA)
Global Bus I Architecture (GBIA)
[Figure: block diagrams of the two architectures; four blocks, A to D, each with an MPC750, SRAM, registers, and a Bi-FIFO on its own CPU bus (CPU Bus A to CPU Bus D)]
Five Bus Architectures for 4-Processor System (II)
Global Bus II Architecture (GBIIA)
Crossbar Switch Bus Architecture (CSBA)
IBM CoreConnect Bus Architecture (CCBA)
Application Examples (I)
OFDM Transmitter
  • Block Diagram
  • Data Format: 32 guard samples and 128 data samples
  • Function Assignment
Reference: D. Kim and G. L. Stüber, "Performance of Multiresolution OFDM on Frequency-selective Fading Channels," IEEE Transactions on Vehicular Technology, vol. 48, no. 5, pp. 1740-1746, September 1999.
[Figure: function assignment over time. Compute nodes A1 to A4, B1 to B4, C1 to C4, and D1 to D4 are staggered along the time axis across Pro_A, Pro_B, Pro_C, and Pro_D.]
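The staggered compute nodes in the function-assignment figure suggest a pipelined schedule. The following is a minimal sketch of such a schedule, not the authors' scheduler. It assumes that each letter names the function assigned to one processor (A on Pro_A through D on Pro_D), that each digit is a packet index, and that every stage takes the same hypothetical unit of time; none of these details are stated on the slide, and `pipeline_schedule` and `makespan` are illustrative names.

```python
def pipeline_schedule(num_stages=4, num_packets=4, stage_time=1):
    """Ideal pipeline: stage s may start packet p once stage s-1 has
    finished packet p and stage s has finished packet p-1.
    ASSUMPTION: all stages take the same time (stage_time)."""
    schedule = {}
    for stage in range(num_stages):
        for packet in range(num_packets):
            start = (stage + packet) * stage_time
            label = f"{chr(ord('A') + stage)}{packet + 1}"  # e.g. "A1", "B3"
            schedule[label] = (start, start + stage_time)
    return schedule


def makespan(schedule):
    """Time at which the last compute node finishes."""
    return max(end for _, end in schedule.values())


if __name__ == "__main__":
    sched = pipeline_schedule()
    # Print compute nodes in order of start time, one line per node.
    for label, (start, end) in sorted(sched.items(), key=lambda kv: kv[1]):
        print(f"{label}: t={start}..{end}")
    print("makespan:", makespan(sched))
```

Under these assumptions, once the pipeline is full all four processors are busy in every time step, which is the situation in which fast processor-to-processor transfers matter most.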
Application Examples (II)
MPEG2 Decoder
Video Processing Example
  • 16 x 16 pixel resolution, M=1, N=2
[Figure: two compute-node schedules over time on Pro_A to Pro_D, each built from repeated SH, I, P sequences; one schedule for BFBA and GBIA, the other for GBIIA, CSBA, and CCBA]
SH: Sequence header, I: Intra decoding frame, P: Predictive decoding frame
Experiment Environment
Co-simulation Environment
  Seamless CVE
    • A co-simulator from Mentor Graphics
  VCS
    • A Verilog HDL simulator from Synopsys
  XRAY
    • A high-level debugger from Mentor Graphics
  PowerPC C cross compiler
    • GCC
  External Clock of PowerPC 750
    • 83.33 MHz (the internal clock speed can be much faster, e.g., 400 MHz)
Comparison in View of Algorithm and Architecture
Algorithm
  OFDM Transmitter
    • Strong output-data dependency between functions using many local variables
    • Many short loops
    • Few global variables
  MPEG2 Decoder
    • Many global variables for header information
    • Hierarchical data structure which has a long loop with many nested loops
Architecture
  BFBA and GBIA
    • No method to access global data
    • Fast data transfer between processor blocks
  GBIIA, CSBA, and CCBA
    • Efficient access of global data
Comparison of Throughput of the Bus Architectures (I)
OFDM Transmitter
[Figure: bar chart of throughput in Mbps (axis range 1.02 to 1.14) for BFBA, GBIA, GBIIA, CSBA, and CCBA]
Bus Architecture   Exe. Cycles/Packet   Exe. Time/Packet   Throughput
BFBA               378,348              4.5402 ms          1.1277 Mbps
GBIA               403,000              4.8360 ms          1.0588 Mbps
GBIIA              381,061              4.5727 ms          1.1197 Mbps
CSBA               380,199              4.5624 ms          1.1222 Mbps
CCBA               380,686              4.5682 ms          1.1208 Mbps
Reference: 128 data samples and 32 guard samples per packet
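The execution-time and throughput columns for the OFDM transmitter appear to follow from the cycle counts and the 83.33 MHz external clock. The sketch below recomputes them under two assumptions that the slides do not state: the clock is taken as exactly 250/3 MHz (a 12 ns period), and each of the 160 samples per packet is taken to be 32 bits wide, which gives 5120 bits per packet. These values were chosen because they reproduce the reported figures to within rounding of the last digit; the function names are illustrative.

```python
# ASSUMPTION: 83.33 MHz is treated as exactly 250/3 MHz (12 ns period).
CLOCK_HZ = 250e6 / 3
# ASSUMPTION: 128 data + 32 guard samples, 32 bits per sample (not stated in the slides).
BITS_PER_PACKET = (128 + 32) * 32

# Execution cycles per packet, as reported for the OFDM transmitter.
ofdm_cycles = {
    "BFBA": 378_348,
    "GBIA": 403_000,
    "GBIIA": 381_061,
    "CSBA": 380_199,
    "CCBA": 380_686,
}


def exec_time_ms(cycles, clock_hz=CLOCK_HZ):
    """Execution time per packet in milliseconds."""
    return cycles / clock_hz * 1e3


def throughput_mbps(cycles, bits=BITS_PER_PACKET, clock_hz=CLOCK_HZ):
    """Throughput in Mbps: bits per packet divided by time per packet."""
    return bits * clock_hz / cycles / 1e6


for name, cycles in ofdm_cycles.items():
    print(f"{name:6s} {exec_time_ms(cycles):.4f} ms  {throughput_mbps(cycles):.4f} Mbps")
```

Because throughput is inversely proportional to the cycle count, the ranking of the architectures does not depend on the assumed sample width; only the absolute Mbps values do.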
Comparison of Throughput of the Bus Architectures (II)
MPEG2 Decoder
[Figure: bar chart of throughput in Mbps (axis range 0 to 0.7) for BFBA, GBIA, GBIIA, CSBA, and CCBA]
Bus Architecture   Exe. Cycles/Packet   Exe. Time/Packet   Throughput
BFBA               507,853              6.0942 ms          0.5041 Mbps
GBIA               527,545              6.3305 ms          0.4852 Mbps
GBIIA              377,562              4.5307 ms          0.6780 Mbps
CSBA               377,548              4.5306 ms          0.6781 Mbps
CCBA               378,181              4.5382 ms          0.6769 Mbps
Reference: 128 data samples and 32 guard samples per packet
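The gap between the two groups of architectures for the MPEG2 decoder is easier to see as a ratio of cycle counts. The sketch below derives relative speedups from the reported execution cycles only, so it needs no assumption about clock rate or packet size; `speedup_over` and `fastest` are illustrative names, not part of the original work.

```python
# Execution cycles per packet, as reported for the MPEG2 decoder.
mpeg2_cycles = {
    "BFBA": 507_853,
    "GBIA": 527_545,
    "GBIIA": 377_562,
    "CSBA": 377_548,
    "CCBA": 378_181,
}


def speedup_over(baseline, cycles=mpeg2_cycles):
    """Speedup of every architecture relative to `baseline`
    (baseline cycles divided by that architecture's cycles)."""
    base = cycles[baseline]
    return {name: base / count for name, count in cycles.items()}


def fastest(cycles=mpeg2_cycles):
    """Architecture with the fewest cycles per packet."""
    return min(cycles, key=cycles.get)


if __name__ == "__main__":
    for name, ratio in speedup_over("GBIA").items():
        print(f"{name:6s} {ratio:.3f}x relative to GBIA")
    print("fastest:", fastest())
```

The same two functions can be applied to the OFDM cycle counts by passing a different dictionary as the `cycles` argument.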
Conclusion
Five bus architectures evaluated
  • BFBA, GBIA, GBIIA, CSBA, and CCBA
Two application programs
  • OFDM transmitter and MPEG2 decoder
Pipeline or parallel operation improves performance
BFBA best for OFDM
  • pipelined applications
CSBA best for MPEG2
  • parallel applications
Bus architecture performance heavily dependent on
  • distribution of computation load
  • algorithm style
Future work: combine the bus architectures with switching logic to maximize performance according to application characteristics