Design center Vienna
Donau-City-Str. 1
A-1220 Vienna
Vers. 1.12
SVEN: Scalable Video Engine
Gerald Krottendorfer
SVEN Video Decoder/Encoder
Fully programmable, multistandard / multiformat: MPEG-2, MPEG-4, H.264, WMV9/VC-1
SVEN Architecture
[Figure: SVEN block diagram. Blocks: TS/PS Stream, Control Processor, Video Processor, DMA Bus, Memory Controller, DRAM, Host Controller, Audio DSP, Video Interface.]
SVEN Architecture
[Figure: SVEN block diagram. Blocks: Stream Processor, Video Processor, DMA Bus, CTL Bus, Memory Controller, DRAM, Host Controller, Audio DSP.]
Video Processor
[Figure: Video Processor block diagram. Blocks: Program Sequencer, Global Arithmetic Unit, I/O Unit, VSP Bus, Slice Bus / Switch Matrix, Internal Data Bus & Cross Matrix, Data Bus, DMA Controller, DMA Bus. Slice Cluster: Slice #1, #2, #3 … #N (scalable), each slice with Address Gen, Data Mem A, Data Mem B, Slice Arithmetic Unit and DMA Port.]
Stream Processor
[Figure: Stream Processor block diagram. Blocks: Program Sequencer, Control Processor, Slice Bus / Switch Matrix, Internal Data Bus, Control Bus & Data Bus, Data Stream, Buffer, Plug-In Extensions, VLC Tables, DMA Controller, DMA Bus. Slots: Slot #1, #2 … #N (scalable), each slot with Slot Arithmetic Unit, Data Mem and Parse Register.]
Memory Controller
[Figure: Memory Controller block diagram. Blocks: several DMA Controllers (…), DMA Bus, Video Interface, DRAM Controller, PHY, DRAM.]
Requirements
Video Decoder/Encoder:
Programmable
Processing power
Bandwidth
Scalability
Programmable
Multi-standard compliance: H.264, VC-1, MPEG-2, MPEG-4, DivX, etc.
High complexity of application
Standard compliance tests after tape-out (TO)
Minimize design risks & time to market
Increased flexibility
Processing Power
Enhanced compression standards: H.264, VC-1: higher compression rates at substantially higher processing power requirements
High processing power requirements
In combination with High Definition Digital TV standards:
VERY high processing power needs
Scalable Processor
[Figure: two charts of required processing power. By application: Decoder; Decoder + Image processing; Decoder + Encoder. By decoding standard: MPEG-2; MPEG-4; H.264.]
Power & Core Size
[Figure: two charts versus number of slices (2, 4, 8, 16, 32): power dissipation (mW, axis 0 to 600) and core area (mm2, axis 0 to 6).]
Power and area scale with processing power requirements
Bandwidth Requirements H.264
H.264: VERY high data bandwidth needs
[Figure: H.264 decoder data flow. Blocks: Bitstream Decoder, IQ + ITrans, Inter/intra prediction, Deblocking, Memory Controller, DRAM. Labelled data rates: 20 Mbit/s, Parameter, 94 MB/s (on several links), 220 MB/s, 250 MB/s, 490 MB/s.]
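The recurring 94 MB/s label is consistent with moving one uncompressed HD picture stream once. The Python sketch below reproduces that arithmetic. The 1920x1088 coded frame size, 8-bit 4:2:0 sampling (1.5 bytes per pixel) and 30 frames/s are assumptions chosen for illustration; the slide does not state how the figure was derived.

```python
def raw_video_bandwidth(width, height, fps, bytes_per_pixel=1.5):
    """Bytes per second needed to read or write one uncompressed picture stream once."""
    return width * height * bytes_per_pixel * fps

# ASSUMPTION: 1080-line HD coded as 1920x1088 (height padded to a multiple of 16),
# 8-bit 4:2:0 sampling, 30 frames per second.
bw = raw_video_bandwidth(1920, 1088, 30)
print(f"{bw / 1e6:.1f} MB/s per full-picture read or write")
```

Under these assumptions, every stage that reads or writes a complete picture adds roughly this amount of DRAM traffic, which is one plausible reading of why the same value appears on several links.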
Solution
Video Decoder/Encoder:
Programmable
Processing power
Bandwidth
Scalability
Dual Core Solution
Dual Core Architecture:
Control Processor (Controller):
stream parsing
data flow control
HW accelerator
Datapath: ALU (RISC)
Video Processor (Number Cruncher):
Transform operation
inter / intra prediction
filtering
Datapath: MAC (DSP)
SVEN Performance
[Figure: processing power needed for an H.264 decoder by picture format, compared with processor performance. Picture formats: CIF 352x288 Baseline; D1 720x486 Baseline profile; 1080i/720p Main Profile (enhanced DTV, HDTV, Film). Markers: Embedded DSP Performance Limit; Standard DSP Performance Limit; ADI Blackfin; dual Blackfin; TI C64; Equator; SVEN.]
SVEN versus DSP: Scalability
Linear Scalability
[Figure: effective processing power (axis 0 to 8) versus number of parallel processing units (1 to 8), with one curve for the scalable architecture and one for a DSP.]
Effective processing power:
Scalable architecture: R * N
DSP/RISC architecture: R / [ (1-k) + k/N ]
Legend:
N = number of parallel computing entities
R = processing power of a single computing entity
k = factor from 0 to 1, determining the fraction of commands which can be parallelized
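The two expressions can be compared numerically. The Python sketch below evaluates both for a range of N; the second expression has the form of Amdahl's law. The values R = 1 and k = 0.9 are illustrative assumptions, not figures from the deck, and the function names are made up for this example.

```python
def scalable_power(R, N):
    """Effective processing power of the scalable architecture: R * N."""
    return R * N

def dsp_power(R, N, k):
    """Effective processing power of a DSP/RISC architecture: R / ((1 - k) + k / N)."""
    return R / ((1 - k) + k / N)

R = 1.0   # ASSUMPTION: normalized power of a single computing entity
k = 0.9   # ASSUMPTION: 90% of the commands can be parallelized

for N in (1, 2, 4, 8):
    print(f"N={N}: scalable={scalable_power(R, N):.2f}  DSP/RISC={dsp_power(R, N, k):.2f}")
```

With these assumed values the DSP/RISC curve flattens as N grows, because the non-parallel share (1-k) bounds it at R / (1-k), while the scalable expression keeps growing linearly.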
Bandwidth
Flexible data exchange between slices via the cross matrix
[Figure: SLICE 0, SLICE 1 … SLICE N, each with a register (REG), connected by the Streamline Bus, the Broadcast Bus and the Cross Matrix.]
Data transport between slices/slots does not steal processing performance
I/O Capabilities
Stream Processor: Each Slot has its own Data memory
[Figure: SLOT 0, SLOT 1 … SLOT N, each with its own DATA_MEM.]
Video Processor: Each Slice has its own Data memories
[Figure: SLICE 0, SLICE 1 … SLICE N, each with DATA_MEM A and DATA_MEM B.]
DMA Access
[Figure: SLICE 0, SLICE 1 … SLICE N, each with DATA_MEM A and DATA_MEM B; DMA ENGINE; DMA IF.]
Direct Access to Video Buffer from all Slices via DMA port at each Slice Data Memory
Video Processor: Each Slice has its own 2 Data memories
Scalable Bandwidth
Scalable Data Bandwidth
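Because every slice brings its own data memories, peak local memory bandwidth grows with the slice count. The Python sketch below is a rough illustration of that scaling only. The two memories per slice and the 200 MHz clock come from this deck; the 4-byte access width and one access per memory per cycle are hypothetical assumptions, not SVEN specifications.

```python
CLOCK_HZ = 200e6      # clock speed quoted in this deck
MEMS_PER_SLICE = 2    # Data Mem A and Data Mem B per slice, per this deck
WORD_BYTES = 4        # ASSUMPTION: hypothetical access width, not from the deck

def local_memory_bandwidth(num_slices, word_bytes=WORD_BYTES):
    """Peak bytes/s if every data memory serves one access per cycle (assumed)."""
    return num_slices * MEMS_PER_SLICE * word_bytes * CLOCK_HZ

for n in (2, 4, 8, 16, 32):
    print(f"{n:2d} slices: {local_memory_bandwidth(n) / 1e9:.1f} GB/s peak local bandwidth")
```

Whatever the real access width is, the model is linear in the number of slices, which is the point the slide makes.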
Roadmap
Timeline: 2003, 2004, 2005, 2006
IP-Core:
• VSP IP available
• VSP S8-32 proof of concept: manufactured IC
• SVEN IP available
• SVEN IP proof of concept: manufactured IC
• IP core shipping
Library / Applications:
• MPEG2
• H.264, VC-1, MPEG4, DivX, MJPEG, JPEG2000, etc.
Programming Toolchain
Controller-Debugger SVEN-Debugger
Eclipse based IDE
Simulator / Debugger
Assembler / C-Style Code
Compiler
Allows integration of external Processor cores (Host, Audio DSP)
Multistandard HDTV Decoder
Requirements:
HDTV compliant: 1080i60 (30 fps), 720p60 (60 fps)
Multistandard decoder: MPEG2, H.264, VC-1
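To put the HDTV requirement into numbers, the Python sketch below estimates the macroblock rate for both formats and the resulting clock-cycle budget per macroblock. The picture sizes 1920x1080 and 1280x720 are assumptions (the usual sizes for these format names), the 16x16 macroblock is common to the listed standards, and the 200 MHz clock is the value quoted on the Core Size & Power slide.

```python
import math

MB_SIZE = 16        # macroblock edge in pixels (16x16 luma samples)
CLOCK_HZ = 200e6    # clock speed quoted in this deck

def macroblocks_per_second(width, height, fps):
    """Macroblocks to process per second; partial macroblocks are rounded up."""
    per_frame = math.ceil(width / MB_SIZE) * math.ceil(height / MB_SIZE)
    return per_frame * fps

# ASSUMPTION: picture sizes are not given in the deck.
formats = {
    "1080i60 (30 fps)": (1920, 1080, 30),
    "720p60 (60 fps)": (1280, 720, 60),
}

for name, (w, h, fps) in formats.items():
    rate = macroblocks_per_second(w, h, fps)
    print(f"{name}: {rate} macroblocks/s, about {CLOCK_HZ / rate:.0f} clock cycles per macroblock")
```

The cycle count is a budget for the whole core at one clock; work done in parallel across slices and slots shares that budget rather than each unit having to meet it alone.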
Core Size & Power
Block                                      Core Area [mm2]   Code Memory   Data Memory
Video Processor                            7.7               64k           128k
Stream Processor, core                     1.5               64k           8k
Stream Processor, extension (CABAC, VLC)   1.4
Memory Controller                          0.6               -             -
Power: 750 mW
Technology: 130nm TSMC
Clock speed: 200MHz
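The per-block figures can be added up for a quick overview. The Python sketch below sums the core areas and derives each block's share; treating the 750 mW as covering exactly these four blocks is an assumption, since the deck does not say what the power figure includes.

```python
# Core areas in mm2, as listed in the table above
core_area_mm2 = {
    "Video Processor": 7.7,
    "Stream Processor, core": 1.5,
    "Stream Processor, extension (CABAC, VLC)": 1.4,
    "Memory Controller": 0.6,
}
power_mw = 750  # ASSUMPTION: taken to cover all four blocks listed here

total_area = sum(core_area_mm2.values())
print(f"Total core area: {total_area:.1f} mm2")
for block, area in core_area_mm2.items():
    print(f"  {block}: {100 * area / total_area:.0f}% of area")
print(f"Power density: {power_mw / total_area:.0f} mW/mm2")
```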
Summary
Fully Programmable
All video formats: H.264, VC-1, MPEG-2, ...
Scalable Architecture
Enables software programmable HDTV video codec
Small Core Size / Low Power
For high volume markets
ON DEMAND Microelectronics
Design Center Vienna, Techgate, Donau-City-Str. 1, A-1220 Vienna / Austria, www.ondemand.co.at