EEC 281, B. Baas
Today
• Administrative items
• Syllabus and course overview
• My background
• Digital signal processing overview
• Read Programmable DSP Architectures, Part I by E. A. Lee
Course Communication
• Email
– Urgent announcements
• Web page
– http://www.ece.ucdavis.edu/~bbaas/281/
• Office hours
– After lecture Tuesday
– Tentatively Wednesday afternoon
– After lecture Thursday
Course Workload
• 4 unit graduate course
• This course requires significant effort and time
– Multi-disciplinary field coverage
• DSP algorithms
• Digital processor architectures
• Arithmetic
– Utilizes robust industry-standard CAD tools (but we will make use of only the core essential features)
• Verilog
• Synthesis tool
• Matlab
Course Overview
• EEC 281 web page contents
– Reading materials and references
– Hwk/Project descriptions
– Handouts
– http://www.ece.ucdavis.edu/~bbaas/281/
Course Overview
• Canvas
– Grades posted here
• Let me know if you ever see a score different from what you expect
– Upload electronic portions of hwk/projects here
• Syllabus
– Posted on course web page
Lectures
• Ask questions at any time
• Please hold conversations outside of class
• Please silence phones
• International Solid-State Circuits Conference (ISSCC), February 12–14
– Quiz and guest lecture on Tue, Feb 13, or possibly a special make-up lecture
Letter Grade Assignments
• I assign a letter grade only for the final course grade
• I look at the final exams and course record of the class and assign two key dividing points, the A/A+ and (probably) B/B+ boundaries, and assign course grades from there using equally-sized intervals
– No required numbers of any particular letter grades
– Absolute scores are not important; the boundaries shift according to the difficulty of the exams in any quarter
– Ignore any letter grades you might see on Canvas
[Figure: example with hypothetical data, marking the A/A+ and B/B+ dividing points]
© B. Baas
Critical Challenges Facing Industry
• Energy Efficiency
• Performance
• Software development cost and time
• Hardware development cost and time
• Opportunity: Critical workloads sometimes/frequently have relatively simple tasks as critical kernels (e.g., machine learning, digital signal processing, multimedia, data record processing, pattern matching)
– Embedded (e.g., IoT)
– Mobile
– Datacenter
Number of Processors on a Single Die vs. Year
[Figure: number of processors on a single die vs. year; legend: Academic, Industry]
Note: Each processor capable of independent program execution
Processor Eras
• Transistor Era: the Intel 4004 was the first commercial single-chip microprocessor and it contained 2300 hand-drawn transistors
• Single/Multi-Processor Era: focus on components of single processors and multi-processors, which generally scale well to only small numbers of processors
• 1000-Processor Era: focus on making systems scalable and working with processors as building blocks. The 32 nm 1000-processor KiloCore chip would contain approximately 2300-3700 processors if its area were the same as a 32 nm Intel Core i7 processor, or 11,000 processors if its area were the same as an Nvidia GP100
Future Fabrication Technologies
• Basic trends
– Number of available devices: continually increasing
– Energy dissipation per operation: decreasing too slowly
• There are a lot of ways to place and connect a billion transistors
• The most efficient implementations (throughput, energy, area) will have:
– Processor sizes that capture computational kernels with few excess circuits
– Optimized clock frequencies and supply voltages matched to dynamic workloads
[Figure: diagram with supply-voltage labels VDD1, VDD2, VDD3]
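As a rough illustration of why matching supply voltage and clock frequency to the workload matters, the sketch below uses the standard first-order CMOS model in which switching energy per operation scales with C x VDD^2. The function names, the effective capacitance, and the voltage values are all hypothetical and are not taken from any chip discussed in these slides.

```python
def dynamic_energy_per_op(c_eff_farads, vdd_volts):
    """First-order CMOS switching energy per operation: E = C_eff * VDD^2."""
    return c_eff_farads * vdd_volts ** 2


def dynamic_power(c_eff_farads, vdd_volts, clock_hz, activity=1.0):
    """First-order dynamic power: P = activity * C_eff * VDD^2 * f."""
    return activity * dynamic_energy_per_op(c_eff_farads, vdd_volts) * clock_hz


C_EFF = 20e-12  # hypothetical effective switched capacitance, in farads

# Lowering VDD reduces energy per operation quadratically in this model.
for vdd in (1.1, 0.9, 0.7):  # hypothetical supply voltages, in volts
    energy_pj = dynamic_energy_per_op(C_EFF, vdd) * 1e12
    print(f"VDD = {vdd:.1f} V -> {energy_pj:.2f} pJ per operation")
```

The model ignores leakage and the fact that the maximum clock frequency drops as VDD is lowered, so it only motivates the trade-off; it does not size a real design.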
Optimal Computational Tile Size
• The most efficient implementations (energy, throughput, chip area) have:
– Processor sizes that capture computational kernels with few excess circuits
[Figure: energy efficiency, clock rate, and area efficiency plotted against tile size; annotations: "Unused or low benefit-per-cost circuits" and "Inter-processor interconnect"]
Single Processor            Cores per area of ARM A9    22 mW/GHz per 0.055 mm^2 area
ARM Cortex-A9               1                           1643 mW/GHz
Intel Atom Clover Trail     1.5                         1120 mW/GHz
ARM Cortex-A15              7.8                         212 mW/GHz
MIT RAW                     8.3                         198 mW/GHz
UC Davis KiloCore           74.7                        22 mW/GHz
Some Benefits of Fine-Grain Many-Core
• Die drawn approximately to scale
[Figure: die outlines of UC Davis KiloCore, MIT RAW, ARM Cortex-A9, ARM Cortex-A15, and Intel Atom Clover Trail (Saltwell)]
KiloCore Chip
[Figure: die photo with labeled dimensions 7.82 mm, 7.67 mm, 8 mm, 8 mm]
Technology      32 nm IBM PDSOI CMOS
Num. Procs.     1000
Num. Mems.      12
Num. Oscs.      2012
Die Area        64 mm^2
Array Area      60 mm^2
Transistors     621 Million
C4 Bumps        564 (162 I/O)
Package         676 Pad Flip-Chip BGA
Advancing CMOS Technologies
• Moore’s “Law” (Observation) was made in 1965 and notes that transistor density ~doubles every year (every 1.5 years now)
• "Cramming more components onto integrated circuits," Gordon Moore, Electronics, April 19, 1965.
Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten
New plot and data collected for 2010-2015 by K. Rupp
New data added by B. Baas
[Figure: microprocessor trend plot; labeled series include Number of Logical Cores and Transistors (thousands)]
Digital Signal Processing
• Digital
– Discrete time
– Discrete valued
• Signal
– 1, 2, 3,… dimensional
• Processing
– Analysis
– Synthesis
– Enhancement
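To make "discrete time" and "discrete valued" concrete, here is a minimal sketch that samples a sine wave at fixed time steps and rounds each sample to a signed integer code. The function name, the 1 kHz tone, the 8 kHz sample rate, and the 8-bit word width are hypothetical values chosen only for illustration.

```python
import math


def sample_and_quantize(freq_hz, sample_rate_hz, num_samples, bits):
    """Sample a unit-amplitude sine wave and quantize it to signed integer codes."""
    max_code = 2 ** (bits - 1) - 1  # largest positive code, e.g. 127 for 8 bits
    codes = []
    for n in range(num_samples):
        t = n / sample_rate_hz                    # discrete time: only t = n / fs exists
        x = math.sin(2 * math.pi * freq_hz * t)   # ideal value in [-1, 1]
        codes.append(round(x * max_code))         # discrete value: nearest integer code
    return codes


# One period of a hypothetical 1 kHz tone sampled at 8 kHz with 8-bit codes.
codes = sample_and_quantize(1000.0, 8000.0, 8, 8)
print(codes)
```

Everything a DSP system computes afterward operates on integer sequences like `codes`, not on the original continuous waveform.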
DSP Workloads
• Often “real-time”
– Data producer and consumer cannot be paused or held up
• Examples: antenna, controller, camera, video monitor,…
– Very strict minimum performance levels
– Performance above that minimum is often of little value
[Figure: Data producer → DSP system → Data consumer]
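One way to read the "strict minimum performance" requirement is as a fixed cycle budget per sample: the system keeps up only if the work per sample fits in the clock cycles available between samples. The sketch below assumes a single processor handling one sample at a time; the function names and all numbers are hypothetical.

```python
def cycles_available_per_sample(clock_hz, sample_rate_hz):
    """Clock cycles that elapse between two consecutive input samples."""
    return clock_hz / sample_rate_hz


def meets_real_time(clock_hz, sample_rate_hz, cycles_needed_per_sample):
    """True if the per-sample work fits within the per-sample cycle budget."""
    budget = cycles_available_per_sample(clock_hz, sample_rate_hz)
    return cycles_needed_per_sample <= budget


# Hypothetical system: 1 GHz clock and an input arriving at 250 MSamples/sec.
CLOCK_HZ = 1e9
SAMPLE_RATE_HZ = 250e6
print(cycles_available_per_sample(CLOCK_HZ, SAMPLE_RATE_HZ))
print(meets_real_time(CLOCK_HZ, SAMPLE_RATE_HZ, cycles_needed_per_sample=4))
print(meets_real_time(CLOCK_HZ, SAMPLE_RATE_HZ, cycles_needed_per_sample=5))
```

Needing fewer cycles than the budget brings no further benefit to the data consumer, which is why extra performance above the minimum is often of little value.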
DSP Workloads
[Figure: two block diagrams]
Synthesis. Ex: music keyboard (DSP system → Data consumer)
Analysis. Ex: anti-lock brakes (Data producer → DSP system)
Data rate labels: "Maybe MSamples/sec" and "Maybe 1 Sample/sec"
DSP Workloads
• Data stream can be considered infinite duration
– Length of data stream >> any buffering
– Ex: high-pass filter, automotive collision-detection radar distance measurement system
[Figure: continuous stream of bits flowing through the DSP system]
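Because the stream is far longer than any buffer, a streaming design keeps only a small, fixed amount of state and emits each output as its input arrives. The sketch below shows this with a generic first-order high-pass filter written as a Python generator; the coefficient `alpha` and the test signal are hypothetical, and this is not a filter taken from the course.

```python
from itertools import count, islice


def high_pass(samples, alpha=0.9):
    """First-order high-pass filter: y[n] = alpha * (y[n-1] + x[n] - x[n-1]).

    Only two values of state are kept, no matter how long the stream is.
    """
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        y = alpha * (prev_y + x - prev_x)
        prev_x, prev_y = x, y
        yield y


def endless_input():
    """Hypothetical never-ending stream: a DC offset of 5 plus an alternating +/-1."""
    for n in count():
        yield 5.0 + (1.0 if n % 2 == 0 else -1.0)


# The input never ends, so we simply stop looking after 1000 outputs.
first_outputs = list(islice(high_pass(endless_input()), 1000))
print(first_outputs[-2:])
```

The filter never sees the whole signal at once, which mirrors how a radar or audio front end must process data that keeps arriving.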
DSP Workloads
• Digital signal processing
• Typically very numerically intensive
– Lots of +, -, x
[Figure: continuous stream of bits flowing through the DSP system]
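A small example of what "lots of +, -, x" means in practice: a direct-form FIR filter with N taps performs N multiplications and N-1 additions for every output sample. The sketch below counts them explicitly; the 4-tap averaging coefficients and the input values are hypothetical.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR filter that also counts its multiplies and additions."""
    history = [0.0] * len(coeffs)  # most recent input first
    outputs = []
    multiplies = 0
    additions = 0
    for x in samples:
        history = [x] + history[:-1]  # shift the new sample into the delay line
        acc = 0.0
        for k, (c, h) in enumerate(zip(coeffs, history)):
            product = c * h
            multiplies += 1
            if k == 0:
                acc = product        # first product needs no addition
            else:
                acc += product
                additions += 1
        outputs.append(acc)
    return outputs, multiplies, additions


# Hypothetical 4-tap moving average applied to five input samples.
outputs, multiplies, additions = fir_filter([4.0] * 5, [0.25] * 4)
print(outputs, multiplies, additions)
```

Scaling the same count up, a hypothetical 64-tap filter running at 250 MSamples/sec would need 16 billion multiplications per second.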
DSP Compared with Analog Processing
• Digital signal processing
– Compare with analog signal processing
• If possible in analog domain (at required precision), analog processing will likely require far fewer devices
• If possible in analog domain, either domain may produce the most energy-efficient solution
• Many algorithms are possible only with DSP (arbitrarily high precision, non-causal, …)
• DSP arithmetic is completely stable over process, temperature, and voltage variations
– Ex: 2.0000 + 3.0000 = 5.0000 will always be true as long as the circuit is functioning correctly
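The 2.0000 + 3.0000 example can be sketched with fixed-point integers, which is one common way digital hardware represents such values. The 16 fractional bits and the helper names below are hypothetical choices for illustration.

```python
FRACTION_BITS = 16  # hypothetical fixed-point format with 16 fractional bits


def to_fixed(value):
    """Convert a real number to an integer fixed-point code."""
    return round(value * (1 << FRACTION_BITS))


def from_fixed(code):
    """Convert an integer fixed-point code back to a real number."""
    return code / (1 << FRACTION_BITS)


a = to_fixed(2.0)
b = to_fixed(3.0)
total = a + b  # plain integer addition: the same bits every time it runs
print(from_fixed(total))
```

The sum is an integer operation on bit patterns, so it does not drift with temperature or supply voltage the way an analog summing circuit can, provided the digital circuit is functioning correctly.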
DSP Compared with Analog Processing
• Digital signal processing
– Compare with analog signal processing
• DSP energy-efficiencies are rapidly increasing
• Once a DSP processor has been designed in a portable format (gate netlist, HDL, software), very little effort is required to “port” (re-target) the design to a different processing technology. Analog circuits typically require a nearly-complete re-design.
• DSP capabilities are rapidly increasing
– Analog A/D speed x resolution product doubles every 5 years
– Digital processing performance doubles every 18-24 months (6x to 10x every 5 years)
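The growth figures above follow from simple compounding: a quantity that doubles every T years grows by a factor of 2^(5/T) in five years. The short sketch below checks the quoted ranges; the function name is hypothetical.

```python
def growth_factor(years, doubling_period_years):
    """Growth of a quantity that doubles every doubling_period_years."""
    return 2.0 ** (years / doubling_period_years)


# A/D speed x resolution product: doubles every 5 years.
print(f"A/D over 5 years: {growth_factor(5, 5.0):.1f}x")

# Digital performance: doubles every 18 to 24 months.
print(f"Digital, 24-month doubling: {growth_factor(5, 2.0):.1f}x")
print(f"Digital, 18-month doubling: {growth_factor(5, 1.5):.1f}x")
```

The 24-month case gives about 5.7x (rounded to 6x on the slide) and the 18-month case gives about 10x, compared with 2x for the A/D product over the same five years.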
Common DSP Applications
• Early applications
– Instrumentation
– Radar
– Communication
– Imaging
• Current applications
– Consumer audio, video
– Networking
– Telecommunications
– Machine learning
– Imaging
– Many many more…
Consumer Products’ Trends
• Analog based → Digital based
– Music: records, tapes → CDs, MP3s
– Video: VHS, 8mm → DVD, Blu-ray, H.264, H.265
– Telephony: analog mobile (1G) → digital (4G, LTE,…)
– Television: NTSC/PAL → digital (DVB, ATSC, ISDB, …)
– Many products use digital data and “speak” digital: computers, networks, digital appliances
• Impacts
– Processing
– Transmission
– Storage
– etc.
Future Applications
• Very limited power budgets
• Require significant digital signal processing
Key Design Metrics
1) Performance
a) Throughput (high); e.g., 250 MSamples/sec
b) Latency (low); e.g., 2.7 µsec from first sample in -> first out
c) Numerical precision
2) Chip area (cost); e.g., mm^2 die area, area of standard cell netlist
3) Energy dissipation per workload; e.g., Joules per JPEG image
4) Design complexity
– Design time = lower performance
– Software more important as systems become more complex
5) Suitability for future fabrication technologies
– Many transistors
– Faulty devices
i) During manufacturing process
ii) Device wear-out due to effects such as NBTI
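The first and third metrics can be computed directly from measurements. The sketch below shows one way to express throughput, pipeline latency, and energy per workload; the function names and every number in it are hypothetical and do not describe a real design.

```python
def throughput_samples_per_sec(samples_processed, elapsed_sec):
    """Average throughput over a measurement window."""
    return samples_processed / elapsed_sec


def pipeline_latency_sec(pipeline_stages, clock_hz):
    """Time from first sample in to first sample out for a simple pipeline,
    assuming one stage per clock cycle."""
    return pipeline_stages / clock_hz


def energy_per_workload_joules(average_power_watts, workload_time_sec):
    """Energy dissipated to finish one workload, e.g. one JPEG image."""
    return average_power_watts * workload_time_sec


# Hypothetical design: 250 million samples in 1 s, 27 stages at 10 MHz,
# and 0.5 W drawn for the 4 ms it takes to process one image.
print(throughput_samples_per_sec(250e6, 1.0))
print(pipeline_latency_sec(27, 10e6))
print(energy_per_workload_joules(0.5, 0.004))
```

Energy per workload, rather than power alone, is the quantity that determines how many images a fixed battery charge can process.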