High Performance Baseband Processor and LTE/LTE-A System Solution
CASIA National Engineering & Technology Research Center for ASIC Design
Kaiyang Liu ([email protected])
April 2017
INTRODUCTION TO THE HIGH PERFORMANCE BASEBAND PROCESSOR MAPU
Introduction to the Algebraic Processor MaPU
800 MHz ARM core
1.2 GHz APE core
40 nm LP process
Size: 42 mm x 42 mm
Peak computational performance: 768 GOPS @ 16-bit
Power:
• Typical 5 V, 2.5 A
• 3 W for a single core
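The headline figures can be related with simple arithmetic. The sketch below assumes the 768 GOPS peak is shared equally by four APE cores at 1.2 GHz (the block diagram shows four APE local memories) and treats 5 V x 2.5 A as the typical input power; both are assumptions rather than datasheet statements, and the function names are illustrative.

```c
/* Peak operations per clock cycle per core, ASSUMING the peak rate is
 * split evenly across `cores` cores running at `clock_ghz`. */
double ops_per_cycle_per_core(double peak_gops, int cores, double clock_ghz)
{
    return peak_gops / (cores * clock_ghz);
}

/* Electrical power in watts from supply voltage and current. */
double power_watts(double volts, double amps)
{
    return volts * amps;
}

/* Energy efficiency in GOPS per watt. */
double gops_per_watt(double peak_gops, double watts)
{
    return peak_gops / watts;
}
```

With these assumptions, `ops_per_cycle_per_core(768.0, 4, 1.2)` gives 160 operations per cycle per core, and `gops_per_watt(768.0, power_watts(5.0, 2.5))` gives about 61.4 GOPS/W at the typical 12.5 W.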
[Figure: MaPU SoC block diagram. Four APE cores with FFT instructions, each with APE local memory, linked by a high-speed network to DDR3 controllers 0 and 1, PCIe 0 and 1, and RapidIO 0 and 1; a Cortex-A8 with shared memory; and, on the L1/L2 buses, Ethernet, a DDR3 controller, GPU, codec, GPIO, UART, external bus interface, I2C/SPI, IIS, timer, watchdog, interrupt controller, JTAG and reset.]
MaPU: A Novel Concept
Combines the merits of general-purpose CPUs, ASICs and FPGAs
• Parallel computing and memory, reconfigurable programming, flexibility
• Algorithm-driven design: high performance and low power
• Data-intensive computing
• Hardware computing
• Innovative multi-granularity parallel memory
• “Concentric Circles” model
• Instruction-set (IS) based design
Novel Instruction Set Architecture (ISA) of MaPU
AppAISArcTM ISA: Application Algorithm Instruction Set Architecture
MaPU Pipeline
• Hard: micro-ops directly control the computing cores
• Soft: the micro-op array pipeline supports multiple algorithms
• The bottom-level microcode pipeline reaches ASIC-like efficiency
• The top-level scalar pipeline supports multiple application instructions
Advantages of MaPU in Mobile Networks
MaPU-based software base station platform supporting FDD-LTE, TD-LTE, TD-SCDMA, WCDMA, CDMA and GSM
The MaPU SoC architecture and its mobile-communication instruction set provide very high computing power (307.2 GFLOPS).
Multi-protocol SDR platform with smooth evolution through software upgrades.
MaPU Merits in Mobile eNB Baseband
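Powerful algebraic computing, from computing units up to the LTE PHY:
• Computing units: fixed-point ALU, fixed-point MAC, single-precision ALU, single-precision MAC, double-precision ALU, double-precision MAC
• Algebraic computing: vector add, vector multiply, matrix add, matrix multiply, matrix transpose, FFT, 2D filter
• Typical algorithm operations: OFDM, channel estimation, synchronization, scrambling/descrambling, precoding, MIMO
• Application: LTE PHY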
Novel “Concentric Circle” model: a space-time concept
• Storage rings: compute core, register file, on-chip (SoC) memory, DDR
• Parallelism layers: data-level, task-level, algorithm-level and user-level parallelism
• LTE PHY layer example: channel encoding/decoding, scrambling/descrambling, modulation/demodulation, channel estimation
Novel AppAISArcTM ISA (compute units and memory, compared by ISA level)
• Standard processor: single instruction, single operation (low ISA level)
• SIMD/VLIW processor: multiple parallel operations per instruction (low ISA level)
• Novel MaPU processor: compound operations (CompaOpt) for complex algorithms at a high ISA level, where a single microcode (MC) program performs multiple tasks, e.g. modulation + scrambling
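The "modulation + scrambling" example can be illustrated in plain C by fusing both steps into one pass over the bits, so intermediate scrambled bits are never written back to memory. This is a minimal sketch, not MaPU microcode: it assumes the scrambling sequence `c` is already generated, uses the LTE QPSK rule (bit 0 maps to +1, bit 1 to -1 on I and Q) without the 1/sqrt(2) normalization, and the function name `scramble_qpsk_fused` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One QPSK symbol with unnormalized components in {-1, +1}
 * (the 1/sqrt(2) scale factor is left to a later gain stage). */
typedef struct {
    int8_t i;
    int8_t q;
} qpsk_sym;

/* Hypothetical fused kernel: scramble (XOR with c) and map to QPSK in
 * a single loop. `bits` and `c` hold n_bits values of 0 or 1; n_bits
 * should be even. Writes n_bits / 2 symbols to `out`. */
void scramble_qpsk_fused(const uint8_t *bits, const uint8_t *c,
                         size_t n_bits, qpsk_sym *out)
{
    for (size_t k = 0; k + 1 < n_bits; k += 2) {
        uint8_t b0 = bits[k] ^ c[k];         /* scrambled bit carried on I */
        uint8_t b1 = bits[k + 1] ^ c[k + 1]; /* scrambled bit carried on Q */
        out[k / 2].i = (int8_t)(1 - 2 * b0); /* 0 -> +1, 1 -> -1 */
        out[k / 2].q = (int8_t)(1 - 2 * b1);
    }
}
```

Run as two separate stages, the same work would store and reload the whole scrambled bit stream; the fused loop touches each bit once, which is the memory-traffic saving a single compound operation aims for.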
MaPU Key Features in Wireless Communications
• Strong algebraic computing power
• Novel “Concentric Circle” model
• Novel AppAISArcTM ISA
The novel MaPU architecture combines great flexibility with ASIC-like performance and energy efficiency. This provides a strong guarantee of the real-time, high-throughput operation that wireless communication systems require.
Towards 2020: Processor Roadmap
• MaPU (2015, 40 nm)
• JiGuang-HPP1.0 (supercomputing, 2017, 16 nm)
• JiGuang-C1.0 (towards 5G wireless communications, 2017, 28 nm)
• JiGuang-M1.0 (multimedia, 2017, 28 nm)
Common foundation: the AppAISArcTM ISA; a novel processor design platform with independent intellectual property; design tools, manufacturing technology, and test & verification.
JiGuang-C1.0: Towards 5G
• Ten times the computational performance-to-power ratio of the TMS320C6670
• Architecture: 1 ARM core + 8 UCP cores
• UCP core performance: 1.2 GHz, 1843.2 GOPS @ 16-bit
• 28 nm process; 192 Mb shared memory; CSCP telecom coprocessor; SRIO, PCIe and CPRI interfaces; power < 8 W
JiGuang-C1.0 Advantages
[Figure: JiGuang-C1.0 deployment, with elements labelled BS1 to BS3, W1 to W3 and H1 to H3]
• Powerful computation: broadband communications, multimedia, massive connections
• Complete system solution: supports LTE/LTE-A, WiFi, etc.; supports MIMO, CA and CoMP
• High reliability and security: independent intellectual property; national security, commercial confidentiality, personal privacy
HIGH PERFORMANCE LTE/LTE-A BASEBAND SOLUTION BASED ON MAPU
LTE flat network framework
[Figure: LTE flat network architecture (EPS). In the EPC, MME/S-GW nodes connect to the eNode Bs of the E-UTRAN over the S1 interface (TS 36.411, 412, 413, 414); eNode Bs interconnect over the X2 interface (TS 36.421, 422, 423, 424); the UE reaches an eNode B over the Uu air interface. Radio access network evolution: Node B + RNC = eNode B, flattening the framework. MaPU is shown at the eNode B.]
• MME: Mobility Management Entity
• S-GW: Serving Gateway
• UMTS: Universal Mobile Telecommunications System
• E-UTRAN: Evolved UMTS Terrestrial Radio Access Network
• EPC: Evolved Packet Core
• EPS: Evolved Packet System
• RNC: Radio Network Controller (3G only)
• UE: User Equipment
LTE Software Base Station Solution Based on MaPU
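Reference system: compilation process
• MaPU source code is split into ARM, SPU and MPU programs by code coloring and code decomposition.
• Precompilation (toolchain: llvm-mc, ld.gold, clang, arm-none-eabi-gcc) produces a binary code group for each of ARM, SPU and MPU.
• An ARM-SPU-MPU control-flow relation database is built by looking up undefined function symbols top-down, layer by layer.
• Post-compilation turns the binary code groups and their relation table into ARM, SPU and MPU executables.
• Final steps: control-flow relation database generation, scheduling-code generation, parameter-passing code generation and program release.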
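The top-down, layer-by-layer symbol lookup can be pictured with the sketch below. It is a hypothetical illustration, not the MaPU toolchain: it assumes each layer (ARM, SPU, MPU) exposes lists of defined and undefined function symbols, that an undefined symbol in one layer must be defined in the layer directly below, and that each match becomes one row of the control-flow relation database. All type and function names are made up for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical view of one compiled layer (ARM, SPU or MPU). */
typedef struct {
    const char *name;
    const char **defined;   /* function symbols defined in this layer */
    size_t n_defined;
    const char **undefined; /* symbols referenced here but defined below */
    size_t n_undefined;
} layer;

/* One row of the hypothetical control-flow relation database. */
typedef struct {
    const char *caller_layer;
    const char *callee_layer;
    const char *symbol;
} relation;

/* Resolve undefined symbols top-down: layer k is matched against the
 * definitions of layer k + 1. Returns the number of rows written to db,
 * or -1 if a symbol is not found in the layer below or db is full. */
int build_relations(const layer *layers, size_t n_layers,
                    relation *db, size_t cap)
{
    size_t count = 0;
    for (size_t k = 0; k + 1 < n_layers; k++) {
        const layer *up = &layers[k];
        const layer *down = &layers[k + 1];
        for (size_t u = 0; u < up->n_undefined; u++) {
            const char *sym = up->undefined[u];
            int found = 0;
            for (size_t d = 0; d < down->n_defined; d++) {
                if (strcmp(sym, down->defined[d]) == 0) {
                    found = 1;
                    break;
                }
            }
            if (!found || count == cap)
                return -1;
            db[count++] = (relation){up->name, down->name, sym};
        }
    }
    return (int)count;
}
```

A linear search keeps the sketch short; a real linker-side tool would more likely index each layer's definitions in a hash table.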
Simulator prototype system
[Figure: source tree of the LTE project, lte_proj]
• Directories/modules: arm, spu, mpu, lte, include, common, dl_ms, dl_bs, ul_ms, ul_bs, lte_core0, lte_core1, fft, channel, ChanEst_MMSE, DeModScr, equalizer, DeRatematch, idft, slib, mlib, build, build_spu, build_mpu, m5out, images, u-boot, fs_root, boot_rom, ……
• Build scripts: ARM build script, SPU build script, MPU build script, per-module build scripts, *.sh
• C sources: main.c, lte.c, lte_uplink.c, lte_downlink.c, lte_bs_recv.c, ULRecvFFT.c, ULRecvRscDeMap.c, ULRecvChEst.c, ULRecvMIMO.c, ULRecvDeModScr.c, *.h, ……
• SPU sources: Init.spu.s, fft.spu.c, RscDeMap.spu.c, freqShift.spu.c, freqMove.spu.c, ChanEst_LS.spu.c, ChanEst_LI.spu.c, ChanEst_MMSE.spu.c, NoisePowEst.spu.c, Equalizer.spu.c, Idft.spu.c, DeModScr.spu.c, ……
• MPU microcode: fft2048.mpu.s, ChanEst_MMSE.mpu.s, DeModScr.mpu.s, equalizer.mpu.s, DeRatematch.mpu.s, idft.mpu.s, ……
LTE Demo System Based on MaPU
Performance:
• Supports 2x2 MIMO
• Peak data rate: 150 Mbps
Development goal:
• Transmit and receive LTE PUSCH data on MaPU
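Supporting 2x2 MIMO means the receiver must separate two spatial layers on every subcarrier. A common way to do this is a linear MMSE equalizer, x_hat = (H^H H + σ²I)^(-1) H^H y, sketched below for one subcarrier. This is an illustrative floating-point example with a hand-rolled complex type, not the Equalizer.spu.c or equalizer.mpu.s code; it assumes H and the noise variance σ² (`sigma2`) come from the channel-estimation and noise-power-estimation stages, and the function name `mmse_2x2` is hypothetical.

```c
/* Minimal complex type (illustrative; real kernels use fixed point). */
typedef struct { double re, im; } cplx;

static cplx c_add(cplx a, cplx b) { return (cplx){a.re + b.re, a.im + b.im}; }
static cplx c_sub(cplx a, cplx b) { return (cplx){a.re - b.re, a.im - b.im}; }
static cplx c_mul(cplx a, cplx b)
{
    return (cplx){a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re};
}
static cplx c_conj(cplx a) { return (cplx){a.re, -a.im}; }
static cplx c_div(cplx a, cplx b)
{
    double d = b.re * b.re + b.im * b.im;
    cplx n = c_mul(a, c_conj(b));
    return (cplx){n.re / d, n.im / d};
}

/* 2x2 MMSE equalization for one subcarrier:
 * x_hat = (H^H H + sigma2 * I)^(-1) H^H y.
 * h[r][c] is the gain from transmit layer c to receive antenna r. */
void mmse_2x2(cplx h[2][2], const cplx y[2], double sigma2, cplx x_hat[2])
{
    /* G = H^H H + sigma2 * I, with G[i][j] = sum_r conj(h[r][i]) * h[r][j] */
    cplx g[2][2];
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++)
            g[i][j] = c_add(c_mul(c_conj(h[0][i]), h[0][j]),
                            c_mul(c_conj(h[1][i]), h[1][j]));
    g[0][0].re += sigma2;
    g[1][1].re += sigma2;

    /* z = H^H y */
    cplx z[2];
    for (int i = 0; i < 2; i++)
        z[i] = c_add(c_mul(c_conj(h[0][i]), y[0]),
                     c_mul(c_conj(h[1][i]), y[1]));

    /* Solve G x = z with the closed-form 2x2 inverse:
     * inv([[a, b], [c, d]]) = [[d, -b], [-c, a]] / (ad - bc). */
    cplx det = c_sub(c_mul(g[0][0], g[1][1]), c_mul(g[0][1], g[1][0]));
    x_hat[0] = c_div(c_sub(c_mul(g[1][1], z[0]), c_mul(g[0][1], z[1])), det);
    x_hat[1] = c_div(c_sub(c_mul(g[0][0], z[1]), c_mul(g[1][0], z[0])), det);
}
```

With `sigma2 = 0` this reduces to zero-forcing; a positive `sigma2` regularizes the inverse and shrinks the estimates toward zero when noise dominates, which is why MMSE is usually preferred on ill-conditioned channels.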
[Figure: simulator-based tool flow and screenshot. Flow: code decomposition, precompiling, post-compiling, database generation, scheduling-code generation, parameter-passing code generation and program release. MaPU source code runs on the ARM simulator and the APE simulator, with a UART terminal and a trace-information display showing cycles, APE index, MPU instructions and register data.]
Prototype system of the LTE software base station based on the simulator
Rx Processing of LTE/LTE-A PUSCH
Uplink received-signal processing procedure
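[Figure: pie chart of the processing-time share of algebraic instructions, with slices of 19%, 3%, 3%, 3%, 4%, 2%, 6% and 60%]
Processing stages: OFDM demodulation; resource demapping + AGC; channel estimation + noise variance estimation; equalization/MIMO; IDFT; demodulation and descrambling; channel deinterleaving + de-rate-matching; turbo decoder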
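The channel-estimation stage can be sketched as a least-squares (LS) estimate at the pilot (DMRS) resource elements, followed by linear interpolation in time between the two DMRS symbols of a subframe, in the spirit of the ChanEst_LS and ChanEst_LI file names. This is a minimal floating-point sketch under stated assumptions: a hand-rolled complex type, nonzero pilots known at the receiver, and time-domain interpolation only. The function names are hypothetical; the real kernels are fixed-point SPU/MPU code.

```c
#include <stddef.h>

/* Minimal complex type (illustrative; real kernels use fixed point). */
typedef struct { double re, im; } cplx;

static cplx cmul(cplx a, cplx b)
{
    return (cplx){a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re};
}

static cplx cconj(cplx a)
{
    return (cplx){a.re, -a.im};
}

/* Least-squares estimate on n pilot subcarriers:
 * H[k] = Y[k] / X[k] = Y[k] * conj(X[k]) / |X[k]|^2.
 * Pilots are assumed nonzero. */
void ls_estimate(const cplx *y, const cplx *x, size_t n, cplx *h)
{
    for (size_t k = 0; k < n; k++) {
        double p = x[k].re * x[k].re + x[k].im * x[k].im;
        cplx num = cmul(y[k], cconj(x[k]));
        h[k].re = num.re / p;
        h[k].im = num.im / p;
    }
}

/* Linear interpolation over time between estimates h_a (symbol ta) and
 * h_b (symbol tb), evaluated at symbol t for each of n subcarriers.
 * For t outside [ta, tb] this extrapolates linearly. */
void lin_interp(const cplx *h_a, const cplx *h_b, int ta, int tb, int t,
                size_t n, cplx *h_t)
{
    double w = (double)(t - ta) / (double)(tb - ta);
    for (size_t k = 0; k < n; k++) {
        h_t[k].re = (1.0 - w) * h_a[k].re + w * h_b[k].re;
        h_t[k].im = (1.0 - w) * h_a[k].im + w * h_b[k].im;
    }
}
```

For a normal-CP subframe the PUSCH DMRS occupies symbols 3 and 10, so `ta = 3` and `tb = 10`; data symbols before 3 and after 10 are then extrapolated. An MMSE estimator (as in ChanEst_MMSE) would additionally use channel and noise statistics to suppress noise in the LS estimate.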
Algebraic instruction performance

Turbo decoder algebraic instruction
Processor      Unit            Throughput (Mbps)
MaPU           Turbo decoder   496
TMS320C6670    TCP3d           348

2048-point fixed-point FFT algebraic instruction
Processor      Unit            Throughput (Msps)
MaPU           APE core        1168
TMS320C6670    FFTC            431
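For reference, the computation behind a fixed-point FFT benchmark can be sketched in portable C as an in-place radix-2 decimation-in-time FFT on Q15 data, scaling by 1/2 at every stage so the output is X[k]/n and values do not grow. This is a generic textbook algorithm, not MaPU's FFT instruction or the fft2048.mpu.s microcode. It assumes the caller precomputes the Q15 twiddle table and that signed right shifts are arithmetic (true for GCC and Clang); the names `cq15` and `fft_q15` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Complex sample in Q15 fixed point. */
typedef struct { int16_t re, im; } cq15;

/* Q15 multiply with rounding: (a * b) / 2^15. */
static int32_t q15_mul(int32_t a, int32_t b)
{
    return (a * b + (1 << 14)) >> 15;
}

/* In-place radix-2 DIT FFT on n points (n a power of two).
 * tw[m] must hold W_n^m = exp(-2*pi*i*m/n) in Q15 for m < n/2.
 * Each stage halves its outputs, so the result is X[k] / n. */
void fft_q15(cq15 *x, size_t n, const cq15 *tw)
{
    /* Bit-reversal permutation of the input order. */
    for (size_t i = 1, j = 0; i < n; i++) {
        size_t bit = n >> 1;
        for (; j & bit; bit >>= 1)
            j ^= bit;
        j ^= bit;
        if (i < j) {
            cq15 t = x[i];
            x[i] = x[j];
            x[j] = t;
        }
    }
    /* Butterfly stages of size 2, 4, ..., n. */
    for (size_t len = 2; len <= n; len <<= 1) {
        size_t half = len / 2;
        size_t step = n / len; /* stride into the twiddle table */
        for (size_t start = 0; start < n; start += len) {
            for (size_t k = 0; k < half; k++) {
                cq15 w = tw[k * step];
                cq15 a = x[start + k];
                cq15 b = x[start + k + half];
                /* t = b * w (complex, Q15) */
                int32_t tr = q15_mul(b.re, w.re) - q15_mul(b.im, w.im);
                int32_t ti = q15_mul(b.re, w.im) + q15_mul(b.im, w.re);
                x[start + k].re = (int16_t)((a.re + tr) >> 1);
                x[start + k].im = (int16_t)((a.im + ti) >> 1);
                x[start + k + half].re = (int16_t)((a.re - tr) >> 1);
                x[start + k + half].im = (int16_t)((a.im - ti) >> 1);
            }
        }
    }
}
```

For n = 4 the twiddle table is {(32767, 0), (0, -32768)}, i.e. 1 and -j in Q15. An impulse of amplitude 8192 at index 1 then transforms to 2048 * (1, -j, -1, j), as expected for a delayed impulse scaled by 1/4. Per-stage scaling trades some precision for guaranteed headroom; block-floating-point schemes scale only when a stage would overflow.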
[Figure: speedup of MaPU over the TMS320C6670 for FFT, MMSE channel estimation, demodulation and descrambling (64QAM), equalization, and channel-estimation linear interpolation; bar values 8.47, 9.74, 22.96, 20.12 and 37.17 (y-axis: speedup, 0 to 40)]
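The demodulation-and-descrambling kernel benchmarked above can be illustrated with a fused soft demodulator. For brevity the sketch uses QPSK rather than 64QAM: each bit's log-likelihood ratio (LLR) is proportional to one I or Q component, and descrambling just flips the LLR sign wherever the scrambling bit is 1. It is a minimal sketch under assumptions (float arithmetic, scrambling sequence `c` already generated, caller-supplied `scale`), and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Complex received symbol (illustrative float type). */
typedef struct { float re, im; } cplxf;

/* Hypothetical fused QPSK soft demodulation and descrambling.
 * LLR convention: log(P(b = 0) / P(b = 1)), so a positive value favours
 * bit 0, matching the LTE QPSK mapping (bit 0 -> positive component).
 * For unit-energy QPSK in complex noise of variance sigma2,
 * scale = 2 * sqrt(2) / sigma2 gives exact LLRs.
 * c holds 2 * n_sym scrambling bits; llr receives 2 * n_sym values. */
void qpsk_demod_descramble(const cplxf *sym, size_t n_sym, const uint8_t *c,
                           float scale, float *llr)
{
    for (size_t k = 0; k < n_sym; k++) {
        float l0 = scale * sym[k].re; /* bit 2k, carried on I */
        float l1 = scale * sym[k].im; /* bit 2k+1, carried on Q */
        /* Descrambling: b = b_scrambled XOR c, i.e. flip the LLR sign if c = 1. */
        llr[2 * k] = c[2 * k] ? -l0 : l0;
        llr[2 * k + 1] = c[2 * k + 1] ? -l1 : l1;
    }
}
```

For 64QAM each symbol carries six bits and the per-bit LLRs are piecewise-linear functions of I and Q (under the max-log approximation), but the descrambling step is the same sign flip.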
Reference: Donglin Wang et al., “MaPU: A Novel Mathematical Computing Architecture,” IEEE HPCA, 2016.
Q&A
Thank you!