© Copyright 2013 Xilinx .
Ivo Bolsens, Senior Vice President & CTO
The All Programmable SoC FPGA for
Networking and Computing in Big Data
Infrastructure
Moore’s Law: The Technology Pipeline
Industry Debates on Transistor Cost
Design Cost
[Figure: Estimated Chip Design Cost, by Process Node, Worldwide, 2011. Stacked bars for the 180-nm, 130-nm, 90-nm, 65-nm, 45-nm, 32-nm, and 28/22-nm nodes; axis in $ million, 0 to 180. Cost components: design cost, mask cost, embedded software, yield ramp-up cost. Callouts: 28nm = 2x 45nm cost; > $170 M.]
Growing Problems for ASIC & ASSP Offerings
Growing ASSP Gaps
Eroding customer confidence in vendors
High cost burden from over-design for diverse needs
No ability to differentiate or customize
>50% of Top 16 ASSP Vendors Losing Money
Source – Public reports, Xilinx estimates
Trend Mobile Infrastructure:
Scalable Platforms
[Figure: cell types arranged from indoor to outdoor along coverage and capacity axes.]
Residential Femto: Home; 4-16 users; <100mW
Enterprise Femto: Office; 30-60 users; <250mW
Picocell: Dense indoor (malls, transport hubs); 30-200 users; <1W
Microcell: Urban infill; ~200 users; <1-10W; single sector
Macrocell: Wide area; ~200 users/sector; 20-100W; multi-sector
Macrocell + Active Antennas: Wide area; 2-3x data capacity of RRU
Trend Services: Different Figures of Merit
[Figure: services (Computing, Communicating, Storing, Gaming, Streaming), each annotated with its figures of merit. Annotations as extracted: Latency (ms); GB, Latency (s); GB, BW (mean, var); GB, BW (mean, var); Latency (ms); Latency (ms), BW (mean).]
Trend Wired Infrastructure:
Software Defined Networks
Slide credit: From “Virtualizing the Net” by Jon Turner (2004)
Software Defined Networking gains industry mindshare
“The best thing about OpenFlow or SDN, is that it’s brought back a new hope to networking. Networking is cool again.” - Jayshree, CEO, Arista Networks
Trend Data Center Infrastructure:
Cloud Computing
Big Data: Increasing Volume, Velocity, and Variety
Security: Both outside and inside
Low power: Reduce operation and cooling costs
Impact of trends
(1) Networking
New network fabrics
• Faster, Fatter, and Flatter
Software defined
networking
• Software control plane
• Hardware data plane
Content-aware
networking
• Deep packet inspection
• Enhanced security
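The split between a software control plane and a hardware data plane can be sketched as a match-action table: software decides policy and installs rules, while the fast path only performs lookups. This is a minimal Python model under stated assumptions. The names `FlowTable`, `Controller`, `send`, and `default_port`, and the addresses and port numbers, are hypothetical and are not taken from OpenFlow or from any Xilinx product.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class FlowTable:
    """Toy data plane: exact-match lookup on (src, dst) pairs."""
    rules: Dict[Tuple[str, str], int] = field(default_factory=dict)

    def forward(self, src: str, dst: str) -> Optional[int]:
        # Data plane: a single table lookup, no policy decisions.
        return self.rules.get((src, dst))


class Controller:
    """Toy control plane: decides policy and programs the table."""

    def __init__(self, table: FlowTable, default_port: int = 0):
        self.table = table
        self.default_port = default_port  # hypothetical fallback policy

    def install(self, src: str, dst: str, out_port: int) -> None:
        self.table.rules[(src, dst)] = out_port

    def handle_miss(self, src: str, dst: str) -> int:
        # Unknown flow: apply the fallback policy and cache it as a rule,
        # so later packets of the same flow stay in the data plane.
        self.install(src, dst, self.default_port)
        return self.default_port


def send(table: FlowTable, controller: Controller, src: str, dst: str) -> int:
    """Forward one packet, punting to software only on a table miss."""
    port = table.forward(src, dst)
    if port is None:
        port = controller.handle_miss(src, dst)
    return port
```

In this model only the first packet of an unknown flow reaches the controller; every later packet is resolved by the table lookup alone, which is the part that would map to hardware.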
Impact of trends
(2) Compute
ARM-based microservers
• Improved performance per watt
Hybrid SoC
• CPU+accelerators+fabric
• Cost and power reduction
Larger memory
• Hybrid NVRAM and DRAM
• Latency reduction
Impact of trends
(3) Storage
Specialized functions
• Compression, encryption, memcached
Custom SSD controllers
• Higher performance
• Reduced latency
Data-aware storage
• Integrated database support
• Offload from processor
Future Data Centre Architecture
[Figure: future data centre topology. Block labels: Internet; “Intelligent Appliances” (router, firewall); TOR switches; racks of servers with direct-attached storage; FibreChannel SAN; network attached storage. Callouts: intelligence in the network; further convergence; fewer tiers; convergence; low latency NICs; new topologies (torus NIC).]
Generic Data Center
[Figure: generic data center node. Tiers: Coherent Memory Bus (memory bus coherency, lowest latency; QPI, AXI ACE); I/O Bus and Fabrics (I/O bus; PCIe, Fabrics); Core Network (public and private; Ethernet). Block labels: x86 SOC; DRAM; 40/80G NIC; I/O Attach Accel.; Switch 100/400G; FPGA Storage Ctrl for Direct Attach Storage (NVM, HD); FPGA Storage Ctrl for Network Attach Storage (NVM, HD).]
The New Data Center
[Figure: the new data center node. Tiers: Coherent Memory Bus (QPI, AXI ACE); I/O Bus and Fabrics (PCIe, Fabrics); Core Network (Ethernet). Block labels: x86; FPGA Acc; FPGA + ARM (Zynq SOC); Fabric Bridge; Node Ctrl; DRAM; FPGA NIC 40/80G; FPGA Storage Ctrl; FPGA Switch 100/400G; NVM; HD.]
Compute: Transcode; Search / Database; DSP Filters; Large Memory; Hybrid NVM/DRAM
Storage: Custom SSD; Low latency; Memcache appliance; Data aware storage; Encrypt, Compress
Networking: Security, DPI; Low latency switch/bridge; Custom fabrics; SDN control plane
Trend : More Intelligence in Embedded Systems
[Figure: collage of press headlines]
MACHINES THAT UNDERSTAND
The Next Big, Digital Economy;
‘Smart Energy’: The energy market is undergoing a major transformation…
Smart Factories: For factory management in the future, it will become essential to strive to implement smart capabilities…
SMART Data Center Revolution: New Opportunities to Control Costs and Increase Strategic Advantage…
Smart wireless networks to the rescue: Carriers are turning toward more intelligent network management…
Programmable & Smart Across All Markets
Wireless Comms
All Programmable:
• Multiple Spectrums
• Multiple Standards (LTE, 3G)
• Multiple Levels of QoS
Smarter:
• Self Organizing Networks (SON)
• Cognitive Radio
• Smart Antenna
Wired Comms
All Programmable:
• Network Function Virtualization (NFV)
• Multiple Stds (400Gb etc.)
• Dynamic QoS Provisioning
Smarter:
• Context Aware Network Services
• Self-Healing Networks
• Video Caching at the Edge
Data Center
All Programmable:
• Software Defined Networks (SDN)
• Multiple Stds (FCoE, iSCSI ...)
• Config Storage (SAN, NAS, SSD…)
Smarter:
• Data Pre-Processing & Analytics
• Virtualized Resource Optimization
• Intelligent Appliances
Embedded
All Programmable:
• Changing Resolutions (MPixel, Fps)
• Emerging Video Stds (UHD, 8K/4K)
• Evolving Video Processing Algorithms
Smarter:
• Object Detection & Analytics
• Automotive Collision Avoidance
• Industrial Machine Vision
Industry Mandates
Programmable Imperative
Programmable Systems Integration
Insatiable Intelligent Bandwidth
The All Programmable Platform
Security : Bit level operations
Packet Processing : Wide Datapaths
DSP Processing : Pipelined Datapaths
Graphics Processing : Parallel Micro-Engines
System Management : Finite State machines
The Heterogeneous MPSoC Platform
[Figure: block labels: Host (X86 or ARM); ARM multicore; MicroBlaze micro engines (uEngine); DSP engines; Dedicated Accelerators (Accelerator).]
Heterogeneous
Connected
Scalable
Parallel
Configurable
The Era of Heterogeneous Processing Unit
The UltraSCALE FPGA SoC
Programming the Heterogeneous SoC
CPU + FPGA Use Models
0. Pipelined datapath: HDL programmed (diagram: FPGA only)
1. Pipelined datapath with SW control: CPU sets register values (diagram: Control Processor, FPGA)
2. CPU + FPGA co-processing: FPGA part of explicit address space (diagram: CPU, FPGA, Memory)
3. CPU + FPGA peer processing: Cache Coherency (diagram: CPU and FPGA, each labeled with a cache ($) and “Virtual”, sharing Memory)
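Use model 1, where the CPU sets register values that steer a fixed datapath, can be illustrated with a small software model. This is a sketch only: the `RegisterFile` class, the `GAIN` and `OFFSET` register offsets, and the `datapath` function are hypothetical stand-ins, not an actual Xilinx register map or driver API.

```python
class RegisterFile:
    """Stand-in for memory-mapped control registers (hypothetical layout)."""

    GAIN = 0x00    # hypothetical register offset
    OFFSET = 0x04  # hypothetical register offset

    def __init__(self):
        # Reset values: unity gain, zero offset.
        self._regs = {self.GAIN: 1, self.OFFSET: 0}

    def write(self, addr, value):
        if addr not in self._regs:
            raise ValueError(f"no register at offset {addr:#x}")
        self._regs[addr] = value

    def read(self, addr):
        return self._regs[addr]


def datapath(samples, regs):
    """Stand-in for the pipelined datapath: one result per input sample.

    The structure of the computation is fixed, as it would be in HDL;
    only the register contents change its behavior.
    """
    gain = regs.read(RegisterFile.GAIN)
    offset = regs.read(RegisterFile.OFFSET)
    return [gain * s + offset for s in samples]


# The "CPU" side: software touches only the registers, never the datapath.
regs = RegisterFile()
regs.write(RegisterFile.GAIN, 3)
scaled = datapath([1, 2, 3], regs)
```

In use model 2 the same idea extends from a few control registers to FPGA buffers that appear in the CPU's explicit address space.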
CPU + FPGA Evolution
[Figure: evolution of CPU + FPGA coupling. Labels as extracted: QPI; PCIe; IO-Connected; ARM; AXI; Coherent; Integrated.]
Complexity of building MPSoC systems
High-Speed Analog Design
High-Speed Interface Design
DSP Hardware Design
Software Development
SOC System Assembly
Requires distinct design skills!
Platform IP Integrator
Build/Re-Use IP Subsystems
Link/Assemble Subsystems
Generate SW Drivers
[Figure labels: CPU; Radio pipeline]
HW/SW Design Flow
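[Figure: HW/SW design flow across CPU and FPGA. Layers: Application (Concurrent SW); Middleware; Hardware; together forming the Platform. Flow labels: C-compiler, SW-drivers, Libraries; C-synthesis, AXI, Wires. Hardware blocks: CPU, Memory, Video Codec, Encryption, LTE Modem, Data Movement, Interconnect.]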
High Level Synthesis (HLS)
Create IP from C/C++/SystemC algorithm specification
Abstract algorithm verification to the specification level
Traditional FPGA design experience not required
From C Algorithm to FPGA Implementation
[Chart: video frames/second. DSP: 5.1; C2FPGA: 196.]
[Chart: FPGA resources, RTL vs. C2FPGA (axis 0 to 6).]
FPGA: >38 times better performance than DSP video processor
QOR: C2FPGA equal to or better than RTL synthesis
Ease-of-use: C2FPGA 2x fewer lines of C code than DSP processor
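The ">38 times" figure follows directly from the two frame rates on the chart. The short check below uses only those two numbers; the variable names are illustrative.

```python
# Frame rates read from the chart on this slide.
dsp_fps = 5.1       # DSP video processor
c2fpga_fps = 196.0  # C-to-FPGA implementation

speedup = c2fpga_fps / dsp_fps
print(f"speedup: {speedup:.1f}x")  # prints "speedup: 38.4x"
```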
Quality of Results
Programming Heterogeneous Multi-core
[Figure: programming stack for a heterogeneous multi-core device. Application-specific FPGA accelerators: Video codec, Encryption, Packet Processing, FFT, Search. ARM Processor: A9, A9. Layers and tools: HS-SW Interfacing; Domain Specific API; Hardware / Software partitioning & interfacing; Commercial Software Ecosystem; OpenCL; C; Compile / Debug; C-HLS; Accelerator synth.]
HW/SW Design Flow: SW Programmer View
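[Figure: HW/SW design flow across CPU and FPGA, with the Application layer marked “Application Programming”. Layers: Application (Concurrent SW); Middleware; Hardware. Flow labels: C-compiler, SW-drivers, Libraries; C-synthesis, AXI, Wires. Hardware blocks: CPU, Memory, Video Codec, Encryption, LTE Modem, Data Movement, Interconnect.]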
Programmable Platform:
CPU + FPGA Peer Processing
[Figure: block labels: Core with L1 Cache (x2); Accel with L1 Cache (x2); Shared L2 Cache; DDR MemCon; SRAM; Logic; I/O Device DMA; Coherency Engine over NOC Interconnect.]
Capabilities:
Coherent Caches for HW
Coherent Caches for SW
Coherency Management
Coherency Benefits:
Peer Processing: Direct Cache-2-Cache data movement
Latency: Very low latency access to CPU (FPGA) data
Usability: No SW cache flush needed
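The "no SW cache flush needed" benefit can be seen in a toy model of a write-back cache. Without coherency, an accelerator that reads memory directly sees stale data until software flushes the CPU cache; with coherency, the read is served from the CPU cache. All names here (`Memory`, `WriteBackCache`, `accel_read_noncoherent`, `accel_read_coherent`) are hypothetical, and real coherency protocols are considerably more involved than this sketch.

```python
class Memory:
    """Toy main memory: address -> value."""

    def __init__(self):
        self.data = {}


class WriteBackCache:
    """Toy CPU cache: writes stay local (dirty) until flushed."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}  # dirty lines: address -> value

    def write(self, addr, value):
        self.lines[addr] = value

    def flush(self):
        # Software-managed coherency: push dirty lines out to memory.
        self.memory.data.update(self.lines)
        self.lines.clear()


def accel_read_noncoherent(memory, addr):
    # The accelerator sees only what has reached main memory.
    return memory.data.get(addr)


def accel_read_coherent(cpu_cache, memory, addr):
    # Stand-in for a coherency engine: check the CPU cache first
    # (a cache-to-cache transfer), then fall back to memory.
    if addr in cpu_cache.lines:
        return cpu_cache.lines[addr]
    return memory.data.get(addr)


mem = Memory()
mem.data[0x100] = 1           # old value in memory
cache = WriteBackCache(mem)
cache.write(0x100, 42)        # CPU updates the value in its cache

stale = accel_read_noncoherent(mem, 0x100)      # old value, no flush yet
fresh = accel_read_coherent(cache, mem, 0x100)  # new value, no flush needed
```

In the non-coherent case, software would have to call `cache.flush()` before handing the buffer to the accelerator; forgetting that step is exactly the kind of bug that hardware coherency removes.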
Domain Specific Abstractions
Abstraction
IP
Automation
IPI
ZED Board
ZED Board
– Zynq Evaluation and Development Kit
– Low cost Zynq based community board (XC7Z020)
– Partnership between Avnet, Digilent, Xilinx
– Digilent will fulfill academic market for Xilinx University Program
www.ZEDboard.org
Open source SW and IP
– Linux
– Eclipse based IDE
– Vivado HLS: C to FPGA
– Reference designs
Conclusions
New Markets Require Heterogeneous Multi-Core SoCs
Modern FPGAs are All Programmable SoCs
Software Centric Design Flow Becoming Possible
Democratizing SoC Design : Targeted Teaching Platform