Zynq UltraScale+ Architecture
Stephanie Soldavini and Andrew Ramsey
CMPE-550
Dec 2017
Soldavini, Ramsey (CMPE-550) Zynq Ultrascale+ Architecture Dec 2017 1 / 17
Agenda
Heterogeneous Computing
Zynq UltraScale+
- History
- Architecture
- Applications
Problem: Flexibility/Performance Trade Off
[Figure: computing elements placed between Programmability/Flexibility (software end) and Specialization, Development cost/time, Performance/Chip Area/Watt, i.e. Computational Efficiency (hardware end)]
Computing elements shown:
- General Purpose Processors (GPPs)
- Application-Specific Processors (ASPs)
- Co-Processors
- Application Specific Integrated Circuits (ASICs)
- Configurable Hardware, e.g. FPGAs
Selection Factors:
- Type and complexity of computational algorithms (general purpose vs. specialized)
- Desired level of flexibility
- Performance
- Development cost
- System cost
- Power requirements
- Real-time constraints
Solution: Use some of each in a single system
Heterogeneous Computing
Combine the use of different devices, for example:
- Hardware accelerator used to speed up one function in a program
- Offload matrix calculations to a GPU
- Cloud system with GPP, GPU, and/or FPGA resources
Allows for each part of a task to run on the device it is best suited for
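How much the whole program gains from accelerating one function can be estimated with Amdahl's law. The sketch below is illustrative only: the function name and the 80% / 10x workload figures are assumptions, not measurements from this presentation.

```python
def overall_speedup(accelerated_fraction: float, accelerator_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when only part of the run time is accelerated."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if accelerator_speedup <= 0.0:
        raise ValueError("accelerator_speedup must be positive")
    # Time left on the general-purpose processor, as a fraction of the original run time.
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / accelerator_speedup)


# Hypothetical workload: one function takes 80% of the run time and
# runs 10x faster on the accelerator.
print(f"{overall_speedup(0.80, 10.0):.2f}x")  # prints 3.57x
```

The part left on the GPP bounds the gain: with 20% of the run time not accelerated, the overall speedup cannot exceed 5x however fast the accelerator is.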
Zynq UltraScale+ History
Made by Xilinx
“Microheterogeneous”
- Integrates GPP, GPU, FPGA, Co-Proc, & ASIC in one SoC
- Increases speed by reducing off-chip data transfer
Predecessors
- Kintex-UltraScale and Virtex-UltraScale (20/16nm FPGA fabric)
- Zynq-7000 (Dual-core ARM Cortex A9 & 28nm FPGA fabric)
General Architecture
Processing System (PS)
- Application Processing Unit (APU): 64-bit quad-core or dual-core ARM Cortex-A53
- Real-time Processing Unit (RPU): 32-bit dual-core ARM Cortex-R5
- Graphics Processing Unit (GPU): ARM Mali-400
- On-Chip Memory (OCM): 256 kB RAM with Error-Correcting Codes (ECC)
Programmable Logic (PL)
- 16nm FinFET+ programmable logic
- Configurable Logic Blocks (CLB)
- 36 kb Block RAMs
- UltraRAM
- DSP Blocks
[Figure labels: Processing System (PS), Programmable Logic (PL), Interconnects & I/O]
Application Processing Unit (APU)
64-bit quad-core or dual-core ARM Cortex-A53
Up to 1.5 GHz
ARMv8-A Architecture
- 64-bit mode: A64 instruction set
- 32-bit mode: A32/T32 instruction set
Single/double precision floating point unit (FPU)
Cache
- IL1: 32 kB 2-way set-assoc with parity (independent for each CPU)
- DL1: 32 kB 4-way set-assoc with ECC (independent for each CPU)
- L2: 1 MB 16-way set-assoc with ECC (shared between CPUs)
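The cache sizes and associativities above determine the number of sets once a line size is fixed: sets = size / (ways × line size). A minimal sketch, assuming 64-byte cache lines (the slide does not state the line size); `num_sets` and `APU_CACHES` are illustrative names.

```python
LINE_BYTES = 64  # assumption: 64-byte cache lines, not stated on the slide


def num_sets(size_bytes: int, ways: int, line_bytes: int = LINE_BYTES) -> int:
    """Number of sets in a set-associative cache."""
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("cache size is not a multiple of ways * line size")
    return sets


# (size in bytes, associativity) taken from the APU cache list above.
APU_CACHES = {
    "IL1": (32 * 1024, 2),
    "DL1": (32 * 1024, 4),
    "L2": (1024 * 1024, 16),
}

for name, (size, ways) in APU_CACHES.items():
    sets = num_sets(size, ways)
    index_bits = sets.bit_length() - 1  # log2(sets) when sets is a power of two
    print(f"{name}: {sets} sets, {index_bits} index bits")
```

Under this assumption the IL1 has 256 sets, the DL1 has 128, and the shared L2 has 1024.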
Real-time Processing Unit (RPU)
32-bit dual-core ARM Cortex-R5
Up to 600 MHz
ARMv7-R Architecture: A32/T32 instruction set
Single/double precision FPU
Caches/Tightly Coupled Memory (TCM)
- L1: 32 kB 4-way set-assoc with ECC (independent for each CPU)
- TCM: 128 kB (independent, but can be combined into one 256 kB)
Graphics Processing Unit (GPU)
ARM Mali-400
Up to 667 MHz
One geometry processor
Two pixel processors
Supports OpenGL ES 1.1 & 2.0, OpenVG 1.1
Advanced anti-aliasing support
Cache: L2: 64 kB
Programmable Logic (PL)
16nm FinFET+ programmable logic
Configurable Logic Blocks (CLB)
- Look Up Tables (LUT)
- Flip flops (FF)
- Cascadable adders
36 kb Block RAMs
- True dual-port
- Up to 72 bits wide
- Configurable as dual 18 kb
UltraRAM
- 288 kb
- 72 bits wide
- ECC
DSP Blocks
- 27×18 signed multiply
- 48-bit adder/accumulator
- 27-bit pre-adder
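The DSP block widths above can be illustrated with a simplified behavioral model of a multiply-accumulate: acc + (a + d) × b, with a and d as 27-bit signed inputs, b as an 18-bit signed input, and a 48-bit accumulator. This is a sketch, not the Xilinx primitive: the function names are hypothetical, and two's-complement wrap-around on overflow is an assumption of the model.

```python
def fits_signed(value: int, bits: int) -> bool:
    """True if value is representable as a signed two's-complement number of this width."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def wrap_signed(value: int, bits: int) -> int:
    """Wrap value to a signed two's-complement number of this width (assumed overflow behavior)."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def dsp_mac(acc: int, a: int, b: int, d: int = 0) -> int:
    """Simplified model: acc + (a + d) * b with 27-bit a/d, 18-bit b, 48-bit acc."""
    if not (fits_signed(a, 27) and fits_signed(d, 27)):
        raise ValueError("a and d must fit in 27 signed bits")
    if not fits_signed(b, 18):
        raise ValueError("b must fit in 18 signed bits")
    pre_added = wrap_signed(a + d, 27)            # 27-bit pre-adder
    return wrap_signed(acc + pre_added * b, 48)   # 27x18 multiply, 48-bit accumulate


# Dot product of two short vectors, one multiply-accumulate per element.
acc = 0
for a, b in zip([3, -4, 5], [2, 6, -1]):
    acc = dsp_mac(acc, a, b)
print(acc)  # prints -23
```

A 27×18 signed product needs at most 45 bits, so the 48-bit accumulator leaves a few guard bits for summing several products before overflow becomes possible.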
Vivado Design Suite
Bright green shows configurable components
Vivado Design Suite
Customize components, for instance the DDR controller
Applications
Data Center: Networked Storage/Service Platform [2]
Multimedia, video encoding/decoding [1]
Particle physics [4]
Automotive driver assistance, driver information, and infotainment.
LTE radio and baseband.
Medical diagnostics and imaging.
Video and night vision equipment.
Wireless radio.
Single-chip computer.
Application: Data Center
A sample configuration used for a networked storage platform
4.5X performance speed-up & 20X power reduction over x86 implementations
References
[1] Gosain, Y. and A. Gupta. 2017. “Xilinx Advanced Multimedia Solutions with Video Codec/Graphics Engines,” Zynq UltraScale+ MPSoC. Xilinx, October 23. https://www.xilinx.com/support/documentation/white_papers/wp497-multimedia.pdf
[2] Hansen, L. 2016. “Unleash the Unparalleled Power and Flexibility of Zynq UltraScale+ MPSoCs,” Zynq UltraScale+ MPSoC. Xilinx, June 15. https://www.xilinx.com/support/documentation/white_papers/wp470-ultrascale-plus-power-flexibility.pdf
[3] Shaaban, M. “Basics of Computer Design.” Lecture, CMPE-550, Rochester, NY, August 29, 2017.
[4] Stamen, R. “The Development of the Global Feature eXtractor (gFEX) for the ATLAS Level 1 Calorimeter Trigger at the LHC.” Presented at TWEPP 2017, Santa Cruz, CA, 2017.
[5] Xilinx, “Overview,” Zynq UltraScale+ MPSoC Data Sheet, July 2017. https://www.xilinx.com/support/documentation/data_sheets/ds891-zynq-ultrascale-plus-overview.pdf
[6] Xilinx, “Zynq UltraScale+ Device,” Technical Reference Manual, November 2017. https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf