VLSI DSP 2010 Y.T.HWANG 1-1
VLSI Designs for Digital Signal Processing
Instructor: Yin-Tsung Hwang
Department of Electrical Engineering
National Chung Hsing University
Chapter 1: Introduction to VLSI DSP
SoC Design Era
SoC architecture
SoC Example: Bluetooth (1)
SoC Example: Bluetooth (2)
Die size: 40 mm²
CMOS 0.25 μm technology
5 metal layers
Customized function unit & glue logic
SoC Example
Structured ASIC Base Design
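[Figure: block diagram of the structured ASIC base design]
Processor: ARM926 (I-cache, D-cache, MMU), AMBA interface, JTAG
Memory and storage: SDR SDRAM controller, flash controller, IDE, SD/MMC, 16 KB dual-port SRAM
Connectivity: USB 2.0 OTG PHY, USB 2.0 host controller, PCI host bridge, 10/100 Ethernet MAC, UART, SPI, I2C, I2S
System blocks: interrupt controller, system control unit, APB bridge, DMA, GPIO, timer, watchdog timer, real-time clock, PWM
Analog and display: 10-bit 8-channel ADC, ADC controller, LCD controller
Also shown: audio CODEC, EEPROM, Ethernet PHY, SDR SDRAM, NOR flash, 32.768 kHz and 33 MHz clocks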
Customization of Structured ASIC
[Figure: the structured ASIC customized into the Cheetah connectivity chip]
Blocks: ARM926EJ-S, USB 2.0 PHY, PLL, 10-bit ADC, interrupt controller, 10/100M Ethernet MAC controller, AHB-to-PCI host bridge, USB 2.0 host controller, ST/SDR memory controller, AHB-to-APB bridge, 4-CH DMA controller, arbiter/decoder, system control unit (SCU), UART0/1/2, SPI0/1, I2C, I2S, SD/MMC, Timer0/1/2, RTC, WDT, 8-bit GPIO, TFT/STN LCD controller, IDE controller, PWM0/1, ADC controller, 16 KB DP SRAM, 134-bit GPIO, AHB expansion to FPGA
Buses: AHB bus, APB bus
Verification Platform
[Figure: verification board]
Cheetah test chip and Xilinx FPGA (Spartan-3 xc3s4000)
JTAG ports, memory expansion, I/O expansion, SDRAM, SD port
ADC connector, debug port, logic analyzer (LA) connector, LCD connector and LCD
GPIO/I2S/I2C, Mini PCI, IDE, NOR flash, audio in/out, USB A/B, RS-232, PCI, Ethernet MAC
Power adapter
Hardwired vs. Programmable Approaches
System design is often a tradeoff among the following (the approach favored on each criterion is in parentheses):
Cost (hardwired)
Time to market (programmable processor)
Area/power (hardwired)
Design flexibility (programmable)
Performance (hardwired)
HW/SW partitioning
Software modules: executed on a CPU (e.g. ARM, PowerPC), a microcontroller (e.g. 8051), or a DSP
Hardware modules: perform customized or specific functions
2G CDMA Base Station Example
Turbo coding, spectrum spreading/despreading
Multi-user detection, Rake receiver, smart antennas
[Figure: partitioning of these functions into HW components and SW components]
3G Base Station Example
DSP for reality processing (1)
Features of DSP algorithms (filtering, transforms, coding, and so on):
Real-time processing ⇒ throughput requirements
Require high-speed and massive computing capabilities
Large volume of data
Sophisticated algorithms ⇒ reduced communication bandwidth
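As a rough illustration of "real time ⇒ throughput": an N-tap FIR filter needs N multiply-accumulate (MAC) operations per output, and one output is due per input sample, so the sustained MAC rate is N × fs. The sketch below assumes a 64-tap filter and two illustrative sample rates; neither figure comes from the slides.

```python
def required_mac_rate(num_taps: int, sample_rate_hz: float) -> float:
    """MAC operations per second a direct-form FIR filter needs in real time.

    Each output y[n] = sum_{k=0}^{N-1} h[k] * x[n-k] costs num_taps
    multiply-accumulates, and one output is due per input sample.
    """
    return num_taps * sample_rate_hz


def fir_direct(h, x):
    """Reference direct-form FIR filter with zero initial state."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * x[n - k]   # one MAC
        y.append(acc)
    return y


# Illustrative (assumed) operating points for a 64-tap filter.
audio = required_mac_rate(64, 48_000)           # 3.072e6 MAC/s
baseband = required_mac_rate(64, 20_000_000)    # 1.28e9 MAC/s
print(f"audio: {audio:.3e} MAC/s, baseband: {baseband:.3e} MAC/s")
```

At the baseband rate a single sequential MAC unit would need a clock above 1 GHz, which is the kind of gap that parallel and pipelined hardware is meant to close.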
DSP for reality processing (2)
The roles of DSP
Design by VLSI
Merits:
High integration
High parallelism
High data bandwidth
Reduced power consumption
Suitable for modular design
FFT processor design
What is VLSI DSP?
Implementing specific DSP algorithms in VLSI
Converting DSP algorithms to VLSI circuitry
A hardwired solution
Exploiting the merits of VLSI design
Goals:
To meet the computing demands
To reduce the power consumption
To reduce the chip area
To reduce the production cost
To achieve wide data bandwidth
To enhance the performance
Why VLSI DSP? Computing demand (1)
Performance requirements driven by broadband communication
Why VLSI DSP? Computing demand (2)
Current programmable processor solution
Why VLSI DSP? Computing demand (3)
Current ASIC (FPGA) solution
ASIC (FPGA) vs. programmable processor (1)
ASIC (FPGA) vs. programmable processor (2)
TI C6x processor vs. Xilinx XC4000 series
Why VLSI DSP? Power issue (1)
Power consumption of leading microprocessors continues to increase
Power delivery and heat dissipation will become prohibitive
Why VLSI DSP? Power issue (2)
Chip power density
[Figure: chip power density trend, with the Sun's surface marked as a reference level]
Why VLSI DSP? Power issue (3)
Why VLSI DSP? Bandwidth issue
Dedicated interconnect among modules to increase data bandwidth
Applications of VLSI DSP
Video signal processing (2D, 3D filters)
Digital communications
Neural networks
Signal synthesis
Modulation/demodulation
Fast Fourier transforms
And more ...
Application Examples Overview
Consumer and broadcast applications:
DTV/DVB (HDTV and SDTV), cable, satellite, DVD
Broadcast studio (nonlinear editing)
Digital cinema, video-on-demand
Video-over-wireless, video-over-IP, video conferencing
Industrial applications:
Real-time pattern recognition
Surveillance systems
Medical and military applications:
Real-time noise reduction
Real-time enhancement, resizing & rotation
Lossless compression, digital archiving
DTV/DVB Transmitter Example
DTV (ATSC, USA), DVB (Europe), ISDB-T (Japan)
One HDTV program or several SDTV programs per channel
DTV/DVB Receiver Example
DTV/DVB Transmit & Receive System Diagram
DVB and COFDM System Diagram
DVB and COFDM System Data (Japan)
FFT sizes: 2K, 4K, 8K
Modulation: QPSK, 16QAM, 64QAM, DQPSK
Guard interval: 1/4, 1/8, 1/16, 1/32 of symbol duration
In-band pilots: 1/12 scattered pilots
Coding: RS(204,188); punctured convolutional codes, rates 1/2, 2/3, 3/4, 5/6, 7/8
Data throughput: 4.9-31 Mb/s over an 8 MHz channel
Data throughput: 3.6-23.2 Mb/s over a 6 MHz channel
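To show where the FFT and the guard interval sit in a COFDM system, here is a minimal sketch of one OFDM symbol: QPSK symbols are placed on N subcarriers, an inverse DFT produces the time-domain symbol, and its last fraction is copied to the front as a cyclic-prefix guard interval. N = 8 subcarriers and a 1/4 guard ratio are toy assumptions (real modes use 2K to 8K points), and a plain O(N²) DFT stands in for the FFT.

```python
import cmath


def dft(x, inverse=False):
    """Direct O(N^2) DFT; the inverse includes the 1/N scaling."""
    n_pts = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n_pts):
        acc = sum(x[n] * cmath.exp(sign * 2j * cmath.pi * k * n / n_pts)
                  for n in range(n_pts))
        out.append(acc / n_pts if inverse else acc)
    return out


# Gray-mapped QPSK constellation (unnormalized).
QPSK = {(0, 0): 1 + 1j, (0, 1): -1 + 1j, (1, 1): -1 - 1j, (1, 0): 1 - 1j}


def ofdm_modulate(bits, n_sub=8, guard=1/4):
    """Map bit pairs to QPSK, IDFT to time domain, prepend a cyclic prefix."""
    symbols = [QPSK[(bits[2 * i], bits[2 * i + 1])] for i in range(n_sub)]
    time = dft(symbols, inverse=True)
    cp_len = int(n_sub * guard)
    return time[-cp_len:] + time, symbols


def ofdm_demodulate(rx, n_sub=8, guard=1/4):
    """Drop the cyclic prefix and return to the frequency domain."""
    cp_len = int(n_sub * guard)
    return dft(rx[cp_len:cp_len + n_sub])


bits = [0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0]
tx, sent = ofdm_modulate(bits)
received = ofdm_demodulate(tx)
print(len(tx))  # 8 symbol samples + 2 guard samples = 10
```

Because the prefix is a copy of the symbol tail, a channel delay spread shorter than the guard interval acts as a circular shift, which the receiver can undo with one complex gain per subcarrier; a longer guard ratio (1/4) tolerates more delay spread at the cost of throughput relative to 1/32.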
Digital Camera and Video-on-Demand Examples
Digital cinema
JPEG2000 compression standard
Other compression systems
4096x3072p, 12-bit
130,000 cinemas go digital
Video-on-demand
Home theater
Cable
Satellite
Hotel chains
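The 4096x3072p, 12-bit format shows why compression is unavoidable. The sketch computes the uncompressed bit rate; three full-resolution color components per pixel and 24 f/s are assumptions for illustration, since the slide gives neither.

```python
def raw_bit_rate(width, height, bits_per_sample, components=3, fps=24):
    """Uncompressed video bit rate in bits per second.

    components=3 assumes full-resolution color (no chroma decimation);
    fps=24 is an assumed cinema frame rate.
    """
    return width * height * components * bits_per_sample * fps


rate = raw_bit_rate(4096, 3072, 12)
print(f"{rate / 1e9:.2f} Gb/s")   # 10.87 Gb/s

# For comparison, CIF video-conferencing frames (8-bit, assumed 30 f/s).
cif = raw_bit_rate(352, 288, 8, fps=30)
print(f"{cif / 1e6:.1f} Mb/s")
```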
Video-over-IP Example
MPEG-4, H.263+, H.26L
Video conferencing
CIF 352x288
H.324: video telephony
H.323: video-over-IP
Industrial Application Examples
Edge detection
Pattern recognition
Image registration
Image segmentation
Automatic surveillance
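Edge detection is a typical window (neighborhood) operation. A minimal sketch, assuming the Sobel operator (one common choice; the slide does not name an operator) and a grayscale image stored as a list of rows:

```python
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal gradient
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical gradient


def sobel_magnitude(img):
    """Gradient magnitude |Gx| + |Gy| for interior pixels; borders stay 0.

    The L1 form |Gx| + |Gy| replaces sqrt(Gx^2 + Gy^2); with the kernel
    weights 1 and 2, a hardware datapath needs only adders, subtractors
    and 1-bit shifts.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = gy = 0
            for i in range(3):
                for j in range(3):
                    p = img[r + i - 1][c + j - 1]
                    gx += SOBEL_X[i][j] * p
                    gy += SOBEL_Y[i][j] * p
            out[r][c] = abs(gx) + abs(gy)
    return out


def edge_map(img, threshold):
    """Binary edge map: 1 where the gradient magnitude exceeds threshold."""
    return [[1 if v > threshold else 0 for v in row] for row in sobel_magnitude(img)]


# Vertical step edge between columns 1 and 2 (illustrative data).
step = [[0, 0, 10, 10, 10] for _ in range(5)]
print(edge_map(step, 20)[2])   # [0, 1, 1, 0, 0]
```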
Medical Application Examples
Real-time X-ray
Cardiology, angiography, fluoroscopy, mammography
Ultrasound
512x512 to 2048x2048 pixels
1-60 f/s, 8-16 bit grayscale
DICOM standard & JPEG2000
Lossless compression
Temporal filtering
Spatial enhancement
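Temporal filtering of an X-ray image sequence suppresses noise by averaging across frames. A minimal sketch, assuming a first-order recursive (exponential) filter y[n] = y[n-1] + k·(x[n] - y[n-1]) applied per pixel with an illustrative gain k = 0.25. It needs one frame store and one multiply per pixel, which suits a hardware pipeline.

```python
import random


def recursive_temporal_filter(frames, k=0.25):
    """Per-pixel recursive filter y[n] = y[n-1] + k * (x[n] - y[n-1]).

    frames: list of frames, each a flat list of pixel values.
    The first output frame equals the first input frame.
    """
    state = list(frames[0])
    out = [list(state)]
    for frame in frames[1:]:
        state = [s + k * (x - s) for s, x in zip(state, frame)]
        out.append(list(state))
    return out


def std(values):
    """Population standard deviation."""
    mean = sum(values) / len(values)
    return (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5


# Synthetic scene (assumption): static, uniform value 100 plus Gaussian
# noise with sigma = 10, 40 frames of 256 pixels each.
rng = random.Random(0)
frames = [[100 + rng.gauss(0, 10) for _ in range(256)] for _ in range(40)]
filtered = recursive_temporal_filter(frames)
print(round(std(frames[-1]), 2), round(std(filtered[-1]), 2))
```

For a static scene the steady-state noise variance drops by a factor k/(2 - k), about 0.14 for k = 0.25; a smaller k removes more noise but makes moving structures lag and blur.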
Medical Video Examples
Mosaic feature
Power zoom
Aspect ratio
X-Ray System Diagram
Image and Video Compression
RGB to YCrCb color space conversion
Transforms and code blocks
Pixel depth: 8, 10, 12 bits
Frame sizes: 4096x3072 down to 176x144 (QCIF)
Frame rates: 10 to 60 f/s
Lossless compression, lossy compression
Decimation (subsampling) of color components
Higher sensitivity of the human visual system (HVS) to lower frequencies
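A minimal sketch of the first and sixth items: color conversion followed by decimation of the color components. It assumes the full-range ITU-R BT.601 coefficients used by JPEG/JFIF and 4:2:0 decimation by 2x2 averaging; other standards use different coefficients and decimation filters.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 (JFIF) conversion for 8-bit samples.

    Results are not clipped to [0, 255] (saturated red gives Cr = 255.5).
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def decimate_420(plane):
    """Average each 2x2 block of a chroma plane (even height and width)."""
    return [[(plane[r][c] + plane[r][c + 1] + plane[r + 1][c] + plane[r + 1][c + 1]) / 4
             for c in range(0, len(plane[0]), 2)]
            for r in range(0, len(plane), 2)]


# 4x4 RGB image: left half red, right half white (illustrative data).
image = [[(255, 0, 0)] * 2 + [(255, 255, 255)] * 2 for _ in range(4)]
ycc = [[rgb_to_ycbcr(*px) for px in row] for row in image]
luma = [[p[0] for p in row] for row in ycc]            # kept at full resolution
cb = decimate_420([[p[1] for p in row] for row in ycc])
cr = decimate_420([[p[2] for p in row] for row in ycc])
samples = sum(map(len, luma)) + sum(map(len, cb)) + sum(map(len, cr))
print(samples)   # 16 + 4 + 4 = 24 samples instead of 48 for RGB
```

Halving the chroma resolution in both directions halves the raw sample count here (48 to 24), relying on the HVS being less sensitive to fine color detail than to fine luminance detail.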
Baseline JPEG Application Examples
Block based
[Figure panels: original, compressed]
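Baseline JPEG is block based: each 8x8 block is transformed and quantized independently. A minimal sketch, assuming an orthonormal 2-D DCT-II computed as two 1-D passes and a single uniform quantizer step for all coefficients (real JPEG uses an 8x8 quantization table followed by entropy coding):

```python
import math

N = 8


def dct_matrix(n=N):
    """Orthonormal DCT-II basis: C[k][i] = a(k) * cos((2i + 1) * k * pi / (2n))."""
    c = []
    for k in range(n):
        a = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        c.append([a * math.cos((2 * i + 1) * k * math.pi / (2 * n)) for i in range(n)])
    return c


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


C = dct_matrix()


def dct2(block):
    """Separable 2-D DCT: C * X * C^T (a 1-D DCT along each dimension)."""
    return matmul(matmul(C, block), transpose(C))


def idct2(coef):
    """Inverse 2-D DCT: C^T * Y * C, since C is orthonormal."""
    return matmul(matmul(transpose(C), coef), C)


def quantize(coef, step):
    return [[round(v / step) for v in row] for row in coef]


def dequantize(q, step):
    return [[v * step for v in row] for row in q]


# Smooth 8x8 ramp, level-shifted by -128 as in JPEG (illustrative data).
block = [[(8 * r + c) * 2 - 128 for c in range(N)] for r in range(N)]
q = quantize(dct2(block), step=16)
nonzero = sum(1 for row in q for v in row if v != 0)
recon = idct2(dequantize(q, 16))
print(nonzero)   # a smooth block leaves only a few nonzero coefficients
```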
JPEG 2000 Application Examples
Bit-plane based
MPEG-2 Application Examples
H.262
VLSI DSP design issues
Algorithm aspects
Architecture aspects
Circuit implementation aspects (not addressed in this course)
Algorithm to architecture mapping
Algorithm Aspect (1)
Point type:
gray-scale transformation
histogram equalization
quantization
Filter type:
template matching
window technique (e.g. Hamming window)
Convolution / correlation
linear phase filtering
median filtering
moving average
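Median filtering and moving averages are both sliding-window filters but behave very differently on impulse noise. A minimal 1-D sketch, assuming a 3-sample window and replicated edge samples (illustrative choices):

```python
def sliding(x, width, reduce):
    """Apply reduce() to each centered window; edges use replicated samples."""
    half = width // 2
    padded = [x[0]] * half + list(x) + [x[-1]] * half
    return [reduce(padded[i:i + width]) for i in range(len(x))]


def median(window):
    s = sorted(window)
    return s[len(s) // 2]      # odd window width assumed


def mean(window):
    return sum(window) / len(window)


signal = [10, 10, 10, 90, 10, 10, 20, 20, 20]   # one impulse, then a step
print(sliding(signal, 3, median))   # [10, 10, 10, 10, 10, 10, 20, 20, 20]
print(sliding(signal, 3, mean))     # impulse smeared over 3 samples, step blurred
```

The median removes the impulse and keeps the step edge sharp, while the moving average spreads the impulse and softens the edge; in hardware the median needs a small sorting network per window instead of an adder tree.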
Algorithm Aspect (2)
Filter type (cont.):
Wiener filtering
optimal for stationary signals to remove additive noise
Kalman filtering
good for non-stationary signals
state vectors, measurement equations, errors
Inverse filtering
special case of Kalman filtering
Adaptive filtering
least mean squares (LMS), recursive least squares (RLS)
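A minimal sketch of the LMS update w ← w + μ·e[n]·x[n], assuming a system-identification setup: the unknown system is a hypothetical 3-tap FIR filter, the input is white Gaussian noise, there is no measurement noise, and μ = 0.05. LMS costs O(N) operations per sample, against O(N²) for standard RLS, which is one reason it is common in hardware.

```python
import random


def lms_identify(x, d, num_taps, mu):
    """Adapt FIR weights w so that w . [x[n], x[n-1], ...] tracks d[n]."""
    w = [0.0] * num_taps
    buf = [0.0] * num_taps          # tap-delay line, newest sample first
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))
        e = dn - y                  # estimation error
        w = [wi + mu * e * bi for wi, bi in zip(w, buf)]
    return w


rng = random.Random(1)
h_true = [0.5, -0.3, 0.1]           # hypothetical unknown system
x = [rng.gauss(0, 1) for _ in range(3000)]
d = [sum(h_true[k] * x[n - k] for k in range(3) if n - k >= 0) for n in range(len(x))]
w = lms_identify(x, d, num_taps=3, mu=0.05)
print([round(v, 3) for v in w])     # converges toward h_true
```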
Algorithm Aspect (3)
Matrix algebra type:
Singular value decomposition
image processing (coding & enhancement)
channel estimation
Maximum entropy estimation
stochastic pointer estimation
Sorter type:
bitonic sort
bubble sort
insertion sort
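Bitonic sort suits VLSI because its sequence of compare-exchange operations is fixed in advance, so it maps directly onto a sorting network of comparator stages. A minimal sketch for power-of-two lengths that also counts the compare-exchange operations:

```python
def bitonic_sort(data):
    """Sort a list whose length is a power of two with a bitonic network.

    Returns (sorted_list, compare_exchange_count). The index pairs that
    are compared never depend on the data, so each inner 'j' loop is one
    stage of n/2 comparators that could all work in parallel.
    """
    a = list(data)
    n = len(a)
    assert n & (n - 1) == 0, "length must be a power of two"
    count = 0
    k = 2
    while k <= n:                   # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:                # comparator distance within one stage
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
                    count += 1
            j //= 2
        k *= 2
    return a, count


print(bitonic_sort([5, 1, 7, 3, 8, 2, 6, 4]))   # ([1, 2, 3, 4, 5, 6, 7, 8], 24)
```

For n = 8 the network has 6 stages of 4 comparators; bubble and insertion sort use fewer comparators in total but their data-dependent control flow is harder to pipeline.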
Algorithm Aspect (4)
Transform type:
Fourier transform
DFT, FFT, IFFT
Time domain vs. frequency domain transformation
Data modulation: OFDM
Cosine transform
Discrete cosine transform (MPEG, JPEG)
Modified discrete cosine transform (MP3, MPEG-4)
Wavelet transform
Multi-resolution sub-band coding, JPEG2000, MPEG-4
Hough transform
Detects straight lines and curves in an image
Hadamard transform
Speech processing, word recognition, data compression
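To make "FFT in lieu of DFT" concrete, here is a minimal sketch of the recursive radix-2 decimation-in-time FFT, checked against the direct DFT, together with the complex-multiplication counts: N² for the direct DFT versus (N/2)·log2 N butterfly multiplications for the FFT. The power-of-two length is a requirement of this radix-2 form.

```python
import cmath


def dft(x):
    """Direct DFT: N^2 complex multiply-adds."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def fft(x):
    """Recursive radix-2 DIT FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]   # twiddle multiply
        out[k] = even[k] + t                             # butterfly
        out[k + n // 2] = even[k] - t
    return out


def mult_counts(n):
    """Complex multiplications: direct DFT vs. radix-2 FFT (trivial twiddles counted)."""
    return n * n, (n // 2) * (n.bit_length() - 1)


x = [complex(v) for v in [1, 2, 3, 4, 0, -1, -2, -3]]
assert all(abs(a - b) < 1e-9 for a, b in zip(fft(x), dft(x)))
print(mult_counts(1024))   # (1048576, 5120)
```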
Algorithm Aspect (5)
Algorithm selection:
Performance evaluation
e.g. LMS vs. RLS
Computing complexity
e.g. full search vs. fast algorithm in motion estimation
Data bandwidth complexity
e.g. memory bandwidth, data storage size, I/O pin count
Numerical property
e.g. FIR vs. IIR, rounding effect
Parallelism
e.g. Schur vs. Levinson-Durbin algorithm
Hardware module reuse
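As an example of computing complexity in motion estimation, a minimal sketch of full-search block matching. It assumes the sum of absolute differences (SAD, the L1 norm) as the matching cost, a 4x4 block and a ±2-pixel search range; these are toy sizes (coders commonly use 16x16 blocks and larger ranges). Full search evaluates all (2p+1)² candidates, and that count is what fast search algorithms reduce.

```python
def pattern(r, c):
    """Synthetic texture (assumption): only the true displacement gives SAD 0
    in the search window used below."""
    return (7 * r + 3 * c * c) % 50


def sad(cur, ref, bx, by, dx, dy, bs):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by (dx, dy)."""
    return sum(abs(cur[by + r][bx + c] - ref[by + dy + r][bx + dx + c])
               for r in range(bs) for c in range(bs))


def full_search(cur, ref, bx, by, bs=4, p=2):
    """Evaluate every displacement in [-p, p] x [-p, p] that stays in the frame.

    Returns ((best_sad, dx, dy), number_of_candidates_evaluated).
    """
    h, w = len(ref), len(ref[0])
    best, evaluated = None, 0
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            if not (0 <= by + dy and by + dy + bs <= h and 0 <= bx + dx and bx + dx + bs <= w):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, bs)
            evaluated += 1
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best, evaluated


# Current frame = reference content moved so the true motion vector is (1, -1).
ref = [[pattern(r, c) for c in range(12)] for r in range(12)]
cur = [[pattern(r - 1, c + 1) for c in range(12)] for r in range(12)]
print(full_search(cur, ref, bx=4, by=4))   # ((0, 1, -1), 25)
```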
Algorithm Aspect (6)
Algorithm refinement:
Transform for parallelism
e.g. look-ahead transform
Computing complexity reduction
Fast computing algorithm, e.g. FFT in lieu of DFT
Relaxation in computation, e.g. L1 norm in lieu of L2 norm
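A minimal sketch of the look-ahead transform, applied to a first-order IIR filter y[n] = a·y[n-1] + x[n] (an illustrative choice). Substituting the recursion into itself once gives y[n] = a²·y[n-2] + a·x[n-1] + x[n]: the feedback loop now contains two delays, so it can be pipelined or its even and odd outputs computed in parallel, at the cost of one extra multiplier in the non-recursive part.

```python
def iir_direct(x, a):
    """y[n] = a*y[n-1] + x[n], zero initial state."""
    y, prev = [], 0.0
    for xn in x:
        prev = a * prev + xn
        y.append(prev)
    return y


def iir_lookahead2(x, a):
    """Two-step look-ahead form: y[n] = a^2*y[n-2] + a*x[n-1] + x[n].

    The recursion only needs y[n-2], so the loop holds two delay
    elements instead of one.
    """
    a2 = a * a                     # precomputed loop coefficient
    y = []
    for n, xn in enumerate(x):
        y_nm2 = y[n - 2] if n >= 2 else 0.0
        x_nm1 = x[n - 1] if n >= 1 else 0.0
        y.append(a2 * y_nm2 + a * x_nm1 + xn)
    return y


x = [1.0, 0.5, -0.25, 2.0, 0.0, 1.0]
print(iir_direct(x, 0.9))
print(iir_lookahead2(x, 0.9))      # same sequence up to rounding
```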
Architecture Aspect (1)
Architecture classification:
Customized function unit
Array processor
Customized function unit
For fine-grain-level parallelism
Exploits computing concurrency within each processing iteration
[Figure: customized function unit example with channel estimation, coefficient update, matched filter, feed-forward filter, feedback filter and slicer blocks, a 64×12-bit dual-port RAM, a 256×10-bit LUT, RAMs and registers (R)]
Architecture Aspect (2)
Array processor: systolic architecture
For massive data parallelism
Exploits computing concurrency across processing iterations
Array processor vs. SIMD & MIMD architectures
[Figure: 4×4 triangular systolic array of multiply/divide (×÷) and subtraction cells connected through delay elements (D), labeled with matrix elements c11 through c44 and a21 through a43]
Architecture Aspect (3)
VLSI array processors:
Parallel + pipelined processing
Wide internal communication bandwidth
Locally connected
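A locally connected, pipelined array can be illustrated with a linear systolic array for FIR filtering. The sketch below is a cycle-by-cycle simulation of one common design (others exist): each processing element (PE) holds a fixed weight, input samples move right through two registers per PE boundary and partial sums through one register per PE, so every PE reads only its left neighbour. The weights and samples are illustrative.

```python
def systolic_fir(weights, x):
    """Cycle-level model of a linear systolic FIR array.

    PE k holds weights[k]. Returns y[n] = sum_k weights[k] * x[n - k].
    """
    n_pe = len(weights)
    xr = [[0.0, 0.0] for _ in range(n_pe - 1)]   # two x registers between PEs
    yr = [0.0] * n_pe                            # one partial-sum register per PE
    out = []
    stream = list(x) + [0.0] * (n_pe - 1)        # zeros flush the pipeline
    for t, sample in enumerate(stream):
        # Combinational phase: each PE reads only its left neighbour's registers.
        x_in = [sample] + [xr[k][1] for k in range(n_pe - 1)]
        y_in = [0.0] + yr[:-1]
        y_out = [y_in[k] + weights[k] * x_in[k] for k in range(n_pe)]
        # Clock edge: all registers update together.
        yr = y_out
        xr = [[x_in[k], xr[k][0]] for k in range(n_pe - 1)]
        if t >= n_pe - 1:                        # latency of n_pe - 1 cycles
            out.append(yr[-1])
    return out


def fir_reference(weights, x):
    """Direct-form FIR for comparison."""
    return [sum(w * x[n - k] for k, w in enumerate(weights) if n - k >= 0)
            for n in range(len(x))]


weights = [0.5, -1.0, 2.0, 0.25]
samples = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5]
print(systolic_fir(weights, samples))
print(fir_reference(weights, samples))
```

Because a partial sum that enters PE 0 at cycle s reaches PE k at cycle s + k, while the sample there is x[s - k], each output accumulates exactly the FIR terms; the critical path is one multiply-add regardless of the filter length.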
VLSI DSP design flow
[Figure: design flow]
Levels: application & functional spec. → algorithm → architecture → circuit implementation → IP authoring
Spec. refinement
Algorithm simulation, transform, simplification
Architecture mapping: resource allocation, scheduling, pipelining, retiming
Processor element design
Verilog coding, timing closure
Test bench, behavioral model
Algorithm to architecture mapping
Mapping: from the algorithm domain to the architecture domain
Resource allocation
Scheduling
Architecture refinement:
Pipelining
Advantages: (1) higher clock rate, (2) power saving
Parallel processing
Multi-rate
Retiming
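The power-saving advantage of pipelining can be estimated with a common first-order CMOS model (an assumption here, not from the slides): dynamic power P = C·V²·f and gate delay proportional to C·V/(V - Vt)². With M pipeline stages the critical path charges roughly 1/M of the capacitance, so at the same clock rate the supply can be lowered from V0 to β·V0 where M·(β·V0 - Vt)² = β·(V0 - Vt)², and power scales by β² (pipeline-latch overhead ignored). The sketch solves for β; V0 = 5 V and Vt = 0.6 V are illustrative values.

```python
import math


def pipelined_voltage_scale(m, v0, vt):
    """Solve m*(b*v0 - vt)^2 = b*(v0 - vt)^2 for the supply scale factor b.

    Expanding gives a quadratic in b:
        m*v0^2 * b^2 - (2*m*v0*vt + (v0 - vt)^2) * b + m*vt^2 = 0.
    Only a root with b*v0 above the threshold voltage is physical.
    """
    a = m * v0 ** 2
    b_coef = -(2 * m * v0 * vt + (v0 - vt) ** 2)
    c = m * vt ** 2
    disc = math.sqrt(b_coef ** 2 - 4 * a * c)
    roots = [(-b_coef + disc) / (2 * a), (-b_coef - disc) / (2 * a)]
    return max(r for r in roots if r * v0 > vt)


for m in (1, 2, 3, 4):
    beta = pipelined_voltage_scale(m, v0=5.0, vt=0.6)
    print(f"M={m}: beta={beta:.3f}, power ratio={beta ** 2:.3f}")
```

M = 1 returns β = 1 (no change), and deeper pipelines allow lower supply voltages; in practice the gain is limited by latch overhead and by how close the supply can approach Vt.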
Performance gain at different levels
Algorithm (improvement > 10 times)
Better algorithm (e.g. QR factorization vs. matrix inversion)
Fast algorithm (e.g. FFT in lieu of DFT)
Architecture (improvement about 1~2 times)
Pipelining
Parallel processing
Multi-rate
Retiming
Circuit (improvement about 10%~30%)
Fast circuit design
Improved technology
Conclusion
Algorithm mapping is crucial!
Good command of algorithms
Skills of Architecture designs
Bridging the gap between algorithm design and hardware implementation