Embedded Computer Architecture 5KK73
Wrap-Up
Henk Corporaal
www.ics.ele.tue.nl/~heco/courses/EmbeddedComputerArchitecture
TU Eindhoven, January 2015
04/21/23 ACA H.Corporaal 2
Trends:
• #transistors follows Moore's law
• but frequency and performance/core do not
(Figure annotations: Core i7, 3 GHz, 100 W, 5)
Crisis?
Today: Final lecture
• Many MPSoC examples
• Educational boards
• Small experiment
• Final remarks
– course exam procedure
– 5HC99: Embedded Visual Control
– internships
MPSoC examples
• ARM Cortex-A15
• TI OMAP 2420
• CELL
• Xilinx
• NVIDIA: Tegra K1
• TI: OMAP5430
• Samsung: Exynos octo
• Apple A7, A8(x)
• Qualcomm: Snapdragon 800 (couldn’t find much)
• AMD Jaguar 8 core (PS4, Xbox One)
ARM: Cortex-A15
From: Travis Lanier, ARM
Cortex-A15
Cortex-A15 overall pipeline
Cortex-A15 Execution pipeline
TI OMAP 2420 architecture
04/21/23 ECA - 5KK73. H.Corporaal and B. Mesman 10
CELL - the history
• Sony/Toshiba/IBM consortium
– Austin, TX – March 2001
– Initial investment: $400,000,000
• Official name: STI Cell Broadband Engine
– Also goes by Cell BE, STI Cell, Cell
• In production for:
– PlayStation 3 from Sony
– Mercury’s blades
CELL – the architecture
1 x PPE 64-bit PowerPC
L1: 32 KB I$ + 32 KB D$
L2: 512 KB
8 x SPE cores:
Local store: 256 KB
128 x 128 bit vector registers
Hybrid memory model:
PPE: Rd/Wr
SPEs: Asynchronous DMA
• EIB: 205 GB/s sustained aggregate bandwidth
• Processor-to-memory bandwidth: 25.6 GB/s
• Processor-to-processor: 20 GB/s in each direction
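To put the CELL figures above in perspective, here is a small back-of-envelope sketch. It is not part of the original slides. It assumes KB means 1024 bytes and GB/s means 1e9 bytes per second, and it treats the quoted bandwidth as fully achievable, which real DMA transfers will not reach. The function names are illustrative only.

```python
# Back-of-envelope numbers derived from the CELL figures on the slide.
# Assumptions (not from the slide): KB = 1024 bytes, GB/s = 1e9 bytes/s,
# and the quoted bandwidth is fully achievable.

LOCAL_STORE_BYTES = 256 * 1024   # SPE local store: 256 KB
MEM_BW_BYTES_PER_S = 25.6e9      # processor-to-memory bandwidth
NUM_SPES = 8
VECTOR_REGS = 128                # vector registers per SPE
VECTOR_REG_BITS = 128            # bits per vector register


def register_file_bytes(num_regs=VECTOR_REGS, bits=VECTOR_REG_BITS):
    """Size of one SPE vector register file in bytes."""
    return num_regs * bits // 8


def fill_time_us(num_bytes, bandwidth=MEM_BW_BYTES_PER_S):
    """Lower bound, in microseconds, on moving num_bytes at the given bandwidth."""
    return num_bytes / bandwidth * 1e6


if __name__ == "__main__":
    print("register file per SPE: %d bytes" % register_file_bytes())
    print("fill one local store:  %.2f us" % fill_time_us(LOCAL_STORE_BYTES))
    print("fill all local stores: %.2f us"
          % fill_time_us(NUM_SPES * LOCAL_STORE_BYTES))
```

Under these assumptions each SPE register file holds 2 KB, and filling a single local store from memory takes at least about 10 microseconds. That is one reason the SPEs overlap asynchronous DMA with computation.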
CELL chip
SPE (Synergistic Processing Element)
SPE pipeline
Xilinx goes multi-core as well: Zynq
Nvidia Tegra K1
• Integrated ARM CPU + Nvidia GPU
• Pictures & slides from http://www.anandtech.com/show/7622/nvidia-tegra-k1
Nvidia Tegra K1
CPU option 1: Quad-Core ARM Cortex A15
CPU option 1: Quad-Core ARM Cortex A15
CPU option 2: Dual-Core 64-bit Nvidia Denver
Tegra K1: GPU
Tegra K1: GPU
Tegra K1: Image Signal Processor (ISP)
TI OMAP 5430 mobile platform
Exynos 5410 Octo core (Samsung)
Apple A7 (from iPhone 5s)
(chipworks.com)
Apple A6
(chipworks.com)
A7 floorplan
(chipworks.com)
28 nm
Apple iPhone 6 / iPad Air 2: A8 / A8X
• > 9 Sept 2014 / > 16 Oct 2014
• 64 bit
• 1.4 GHz
• 20 nm
• CPU: 2 / 3 cores
• GPU: PowerVR 6XT series:
– GX6450 quad / GXA6850 octa-core
• Performance ~ 1.5 x A7
Qualcomm Snapdragon 810
• 20 nm
• ARMv8-A (64-bit) based cores
– 4+4 (Cortex-A57/A53), 2.8 GHz
• GPU: Adreno 430, 650 MHz
– OpenCL supported
• DSP: Hexagon V56: 4-way VLIW, 3-way multi-threaded, 32-bit, 800 MHz
• 2 ISPs: Image Signal Processors
PS4 / Xbox One / AMD Jaguar
• PS4 SoC reverse engineered
• 328 mm^2
• 28 nm
• 2x quad Jaguar
– 1.6 - ? GHz
• Radeon GPU with 20 cores @ 800 MHz
– 1840 GOPS
• BW 176 GB/s
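The GPU throughput and memory bandwidth figures above can be combined into a rough roofline-style estimate. The sketch below is not from the slides. It assumes the quoted 1840 figure is peak operations per second (in units of 1e9), that the 176 GB/s is peak memory bandwidth, and that all 20 cores contribute equally. The function names are illustrative only.

```python
# Roofline-style estimate from the PS4 figures on the slide.
# Assumption: the slide values are peak numbers; real kernels achieve less.

PEAK_GOPS = 1840.0     # GPU peak throughput, in 1e9 operations per second
MEM_BW_GBS = 176.0     # memory bandwidth, in 1e9 bytes per second
GPU_CORES = 20
GPU_CLOCK_GHZ = 0.8    # 800 MHz


def ops_per_core_per_cycle():
    """Operations each GPU core must complete per clock cycle at peak."""
    return PEAK_GOPS / (GPU_CORES * GPU_CLOCK_GHZ)


def ridge_point_ops_per_byte():
    """Operational intensity above which a kernel is compute-bound."""
    return PEAK_GOPS / MEM_BW_GBS


def attainable_gops(ops_per_byte):
    """Roofline bound: min(peak compute, bandwidth * operational intensity)."""
    return min(PEAK_GOPS, MEM_BW_GBS * ops_per_byte)


if __name__ == "__main__":
    print("ops per core per cycle: %.1f" % ops_per_core_per_cycle())
    print("ridge point: %.2f ops/byte" % ridge_point_ops_per_byte())
    for intensity in (1.0, 4.0, 16.0):
        print("intensity %5.1f ops/byte -> at most %7.1f GOPS"
              % (intensity, attainable_gops(intensity)))
```

Under these assumptions a kernel needs roughly 10 operations per byte fetched before the GPU, rather than the memory system, becomes the bottleneck. This is why data reuse matters on such a platform.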
AMD Jaguar core
• 28 nm
• 3.1 mm^2 (per core)
• 2-way integer
• 2-way 128-bit floating-point/packed integer
• L1: 32+32 KB
• L2: 1-2 MB
• no multi-threading
Some educational boards
• Raspberry Pi
• Beagle board
• Panda board
• Arndale board
• Zedboard
Some educational / prototype boards
Raspberry Pi
• ARMv6 32-bit 700 MHz
• GPU (VideoCore IV) @ 250MHz
• ~ 30 $
Some educational / prototype boards
Beagle board (TI)
• ARM Cortex A8 1GHz
• C64x DSP (VLIW)
• PowerVR GPU
• ~ 150$
Some educational / prototype boards
Panda board
• OMAP4430
• dual core ARM Cortex-A9, 1.2 GHz
• PowerVR 384 MHz
• IVA3 multimedia DSP
• ~ 180 $
Some educational / prototype boards
Arndale
• SoC: Exynos 5250
• Dual ARM Cortex-A15
• GPU: ARM Mali-T604
• Runs OpenCL
• Inside Chromebook laptop
Some educational / prototype boards
• Zedboard
• Xilinx Zynq FPGA
• Dual ARM core
• ~ 300 $ (Universities)
Questions
• What are the major things you learned?
• What were your favorite topics?
• What are the key issues?
• What topics did you miss?
• What should I change next year?
• Check our website 5kk73:
– www.es.ele.tue.nl/~heco/courses/EmbeddedComputerArchitecture/
Crucial Topics Treated
• Processor components
• The energy / power law
• Memory hierarchy
• Reuse & Loop transformations
• ASIPs and Accelerators
• Parallelism
• Multi-Processing
• Embedded Systems: MPSoCs
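As a reminder of why the energy / power topic connects to parallelism and multi-processing, here is a minimal sketch. It is not from the slides. It assumes the "energy / power law" refers to the standard CMOS dynamic power relation P = a * C * V^2 * f, and it uses an idealized model in which supply voltage can be scaled down in proportion to frequency. Real designs have a minimum voltage and leakage power, so the numbers are optimistic. The function names are illustrative only.

```python
# Sketch of the CMOS dynamic power relation P = a * C * V^2 * f.
# Assumption: this is the relation meant by "the energy / power law",
# and voltage can be scaled together with frequency (idealized).


def dynamic_power(activity, capacitance, voltage, frequency):
    """Dynamic switching power in watts for the given parameters."""
    return activity * capacitance * voltage ** 2 * frequency


def relative_power(voltage_scale, frequency_scale):
    """Power relative to the baseline when V and f are scaled by the given factors."""
    return voltage_scale ** 2 * frequency_scale


if __name__ == "__main__":
    # One core at full speed versus two cores at half frequency and half
    # voltage, delivering the same nominal throughput on parallel code.
    one_core = relative_power(1.0, 1.0)
    two_cores = 2 * relative_power(0.5, 0.5)
    print("one fast core: %.3f" % one_core)
    print("two slow cores: %.3f" % two_cores)
```

In this idealized model two half-speed cores use a quarter of the power of one full-speed core. This is the basic argument for moving from higher clock frequencies to more cores.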
Finally
• Project-based course on quadcopters: 5HC99
– http://www.es.ele.tue.nl/~heco/courses/EmbeddedVisualControl/
– http://www.es.ele.tue.nl/education/5HC99/wiki/index.php
• Student assignments: see PARSE website, go to student projects:
– http://parse.ele.tue.nl
• Exam procedure