ECE 475/CS 416 Computer Architecture - Introduction
Edward Suh Computer Systems Laboratory [email protected]
Today’s Agenda
Question 1: What is this course about? What will I learn from it?
Question 2: How will the course be run? What do I need to know?
ECE 475/CS 416 — Computer Architecture, Fall 2008, Suh
Title = “Computer Architecture” What is “Computer Architecture”?
Old definition (80s)=
Today’s architects must do more; implementation hurdles are more challenging than those in instruction set design
Role of the Computer Architect
To design and engineer the various levels of a computer system to maximize “performance” and programmability within limits of technology and cost.
Architect must be aware of • application characteristics and benchmarks • measures of cost and performance • technology trends • software and hardware interaction
“Performance”? Desktop computers
• Largest market in dollar terms
Web servers • Amazon.com had $1.35MM revenue / hour (2005)
Embedded / mobile computers
Single-Processor Performance
From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th Edition, 2006
Technology “If […] history […] teaches us anything, it is that man, in his quest for
knowledge and progress, is determined and cannot be deterred.” John F. Kennedy (1962)
Amazing yearly advances
• ~60% more devices per chip (doubles every 18 months)
• ~15% faster devices (doubles every 5 years)
• disks increase ~60% in capacity
• circuit boards increase ~5% in wire density
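The doubling times quoted above follow from compound growth. A minimal sketch that checks them, assuming a constant annual rate (the function name is illustrative, not from the course material):

```python
import math

def doubling_time_years(annual_growth: float) -> float:
    """Years needed to double at a constant compound annual growth rate.

    annual_growth is a fraction, e.g. 0.60 for 60% per year.
    """
    return math.log(2) / math.log(1 + annual_growth)

# Rates quoted on the slide.
for label, rate in [("devices per chip", 0.60), ("device speed", 0.15)]:
    years = doubling_time_years(rate)
    print(f"{label}: {rate:.0%}/year -> doubles in {years * 12:.0f} months")
```

Under this assumption, 60% per year comes out close to the quoted 18 months and 15% per year close to the quoted 5 years.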
Faster devices and advances in circuit design improve performance
Clock Frequency Growth Rate
Source: Intel
30% per year
Architecture Contribution
Part I. Single-Core Processors
What kinds of architectural innovations enabled the uni-processor performance improvement over the past 20 years?
Same program (binary) Runs 1.58x faster each year!!
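One rough way to read the two numbers together (30% per year from clock frequency, 1.58x per year overall) is sketched below. It assumes, as a simplification that the slides do not state, that yearly speedup is the product of a clock-frequency factor and a per-clock (architecture) factor:

```python
def per_clock_gain(total_gain: float, clock_gain: float) -> float:
    """Yearly speedup left over after dividing out clock-frequency growth.

    Assumes total speedup = clock speedup * per-clock speedup (a simplification).
    """
    return total_gain / clock_gain

total = 1.58   # same binary runs 1.58x faster each year (from the slide)
clock = 1.30   # clock frequency grows 30% per year (from the slide)
arch = per_clock_gain(total, clock)
print(f"per-clock improvement: about {arch:.2f}x per year")
```

Under that assumption, roughly 1.2x per year is not explained by clock frequency alone.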
Moore’s Law: 2X transistors / “year”
“Cramming More Components onto Integrated Circuits”, Gordon Moore, Electronics, 1965
The number of transistors per cost-effective integrated circuit doubles every N months (12 ≤ N ≤ 24)
Source UCB EECS 252 notes
CPUs: Archaic vs. Modern
1982 Intel 80286
• 12.5 MHz
• 2 MIPS (peak)
• Latency 320 ns
• 134,000 xtors, 47 mm²
• 16-bit data bus, 68 pins
• Microcode interpreter, separate FPU chip (no caches)
2001 Intel Pentium 4
• 1500 MHz (120X)
• 4500 MIPS (peak) (2250X)
• Latency 15 ns (20X)
• 42,000,000 xtors (310X), 217 mm²
• 64-bit data bus, 423 pins
• 3-way superscalar, dynamic translation to RISC, superpipelined (22 stages), out-of-order execution
• On-chip 8KB data cache, 96KB instr. trace cache, 256KB L2 cache
Source UCB EECS 252 notes
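The two transistor counts above can be compared against the Moore's Law range of 12 to 24 months per doubling. A minimal sketch, assuming steady exponential growth between the two data points (the function name is illustrative):

```python
import math

def implied_doubling_months(count_start: float, count_end: float, years: float) -> float:
    """Months per doubling implied by two transistor counts `years` apart,
    assuming steady exponential growth in between."""
    doublings = math.log2(count_end / count_start)
    return years * 12 / doublings

# 80286 (1982, 134,000 xtors) vs. Pentium 4 (2001, 42,000,000 xtors), from the slide.
months = implied_doubling_months(134_000, 42_000_000, 2001 - 1982)
print(f"implied doubling period: {months:.1f} months")
```

For this particular pair of chips the implied period is a little over 24 months, slightly slower than the 12 ≤ N ≤ 24 range.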
Memory: Archaic vs. Modern
1980 DRAM (async)
• 0.06 Mbits/chip
• 64,000 xtors, 35 mm²
• 16-bit data bus per module, 16 pins/chip
• 13 Mbytes/sec
• Latency: 225 ns
• (no block transfer)
2000 DDR SDRAM (clocked)
• 256.00 Mbits/chip (4000X)
• 256,000,000 xtors, 204 mm²
• 64-bit data bus per DIMM, 66 pins/chip (4X)
• 1600 Mbytes/sec (120X)
• Latency: 52 ns (4X)
• Block transfers (page mode)
Source UCB EECS 252 notes
Disk: Archaic vs. Modern
CDC Wren I, 1983
• 3600 RPM
• 0.03 GBytes capacity
• Tracks/Inch: 800
• Bits/Inch: 9550
• Three 5.25” platters
• Bandwidth: 0.6 MBytes/sec
• Latency: 48.3 ms
• Cache: none
Seagate 373453, 2003
• 15000 RPM (4X)
• 73.4 GBytes (2500X)
• Tracks/Inch: 64000 (80X)
• Bits/Inch: 533,000 (60X)
• Four 2.5” platters (in 3.5” form factor)
• Bandwidth: 86 MBytes/sec (140X)
• Latency: 5.7 ms (8X)
• Cache: 8 MBytes
Source UCB EECS 252 notes
LANs: Archaic vs. Modern
Ethernet 802.3, 1978
• Bandwidth: 10 Mbits/s
• Latency: 3000 µsec
• Shared media
• Coaxial cable
Ethernet 802.3ae, 2003
• Bandwidth: 10,000 Mbits/s (1000X)
• Latency: 190 µsec (15X)
• Switched media
• Category 5 copper wire
Source UCB EECS 252 notes
Coaxial cable: copper core, insulator, braided outer conductor, plastic covering
Twisted pair: copper, 1mm thick, twisted to avoid antenna effect; "Cat 5" is 4 twisted pairs in a bundle
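The four "Archaic vs. Modern" comparisons (CPU, memory, disk, LAN) each list a throughput figure and a latency figure. A small sketch that recomputes the improvement factors from the slide numbers; treating peak MIPS as the CPU's "bandwidth" is an assumption made here for the comparison:

```python
# (old bandwidth, new bandwidth, old latency, new latency), copied from the slides.
# Units differ per row but cancel inside each ratio.
COMPARISONS = {
    "CPU (80286 -> Pentium 4)": (2, 4500, 320, 15),        # peak MIPS, ns
    "Memory (1980 DRAM -> 2000 SDRAM)": (13, 1600, 225, 52),  # Mbytes/sec, ns
    "Disk (Wren I -> Seagate)": (0.6, 86, 48.3, 5.7),      # MBytes/sec, ms
    "LAN (802.3 -> 802.3ae)": (10, 10_000, 3000, 190),     # Mbits/s, latency as listed
}

def improvement_factors(old_bw, new_bw, old_lat, new_lat):
    """Return (bandwidth gain, latency gain), both as 'times better'."""
    return new_bw / old_bw, old_lat / new_lat

for name, row in COMPARISONS.items():
    bw_gain, lat_gain = improvement_factors(*row)
    print(f"{name}: bandwidth {bw_gain:.0f}X, latency {lat_gain:.1f}X")
```

In every row the bandwidth gain is far larger than the latency gain, which is the pattern these slides illustrate.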
How Did We Get Performance? Trade-off transistors and bandwidth for latency Take advantage of parallelism
Principle of locality
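To make the principle of locality concrete, here is a toy sketch of a direct-mapped cache. The cache size, block size, and access patterns are all illustrative assumptions, not parameters from the course labs:

```python
import random

def hit_rate(addresses, num_lines=64, block_words=8):
    """Hit rate of a toy direct-mapped cache for a sequence of word addresses."""
    lines = [None] * num_lines          # block number currently held by each line
    hits = 0
    for addr in addresses:
        block = addr // block_words     # which memory block the word lives in
        index = block % num_lines       # which cache line that block maps to
        if lines[index] == block:
            hits += 1
        else:
            lines[index] = block        # miss: fetch the whole block
    return hits / len(addresses)

sequential = list(range(4096))                                 # strong spatial locality
rng = random.Random(0)
scattered = [rng.randrange(1 << 20) for _ in range(4096)]      # almost no locality

print(f"sequential: {hit_rate(sequential):.3f}")
print(f"scattered:  {hit_rate(scattered):.3f}")
```

The sequential walk misses once per 8-word block and hits on the other 7 words; the scattered accesses rarely find their block already in the cache.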
Pipelined Instruction Execution
[Figure: pipeline diagram. Vertical axis: instruction order. Horizontal axis: time (clock cycles), Cycle 1 through Cycle 7. Each of the four instructions passes through the stages Ifetch, Reg, ALU, DMem, Reg.]
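The overlap in the pipeline figure can be sketched in a few lines. This assumes an ideal five-stage pipeline (Ifetch, Reg, ALU, DMem, Reg) that starts one instruction per cycle with no stalls or hazards; the function names are illustrative:

```python
STAGES = ["Ifetch", "Reg", "ALU", "DMem", "Reg"]

def pipeline_schedule(num_instructions):
    """Per instruction, map cycle number -> stage, assuming one instruction
    enters the pipeline per cycle and nothing ever stalls."""
    return [
        {i + s + 1: stage for s, stage in enumerate(STAGES)}
        for i in range(num_instructions)
    ]

def total_cycles(num_instructions, pipelined=True):
    """Cycles to finish all instructions, with or without overlap."""
    depth = len(STAGES)
    if pipelined:
        return depth + num_instructions - 1
    return depth * num_instructions

# Print a text version of the diagram for four instructions.
n = 4
schedule = pipeline_schedule(n)
for i, row in enumerate(schedule):
    cells = [row.get(c, "").ljust(7) for c in range(1, total_cycles(n) + 1)]
    print(f"instr {i}: " + "".join(cells))

print("pipelined:", total_cycles(n), "cycles; unpipelined:", total_cycles(n, False), "cycles")
```

Each instruction still takes five cycles of latency; pipelining improves throughput by overlapping them.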
Why Slowdown?
From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th Edition, 2006
Architecture at a Crossroads How many cores does your computer have?
Uniprocessor performance now 2x / 5(?) years
• “Power wall”: power consumption limits the transistors that can be turned on
• “ILP wall”: law of diminishing returns on more HW for ILP
• “Memory wall”: off-chip memory accesses take hundreds of CPU cycles
Change in chip design: multiple “cores”: Thread Level Parallelism (TLP)
All microprocessor companies switch to multiprocessors (AMD, Intel, IBM, Sun; all new Apples have 2 CPUs)
“We are dedicating all of our future product development to multicore designs. … This is a sea change in computing”
Paul Otellini, President, Intel (2004)
A Peek at the Syllabus
• Cost and performance
• In-order processors
• Memory hierarchy
• Out-of-order processors
• Branch prediction
• Speculative execution
• Superscalar processors
• VLIW, Vector
• Simultaneous multithreading (a.k.a. Hyperthreading™)
• Multicore hardware, parallel processing
• Virtual machines, I/O, networks
Labs Verilog design projects
• incredibly useful language to know — industry loves Verilog • projects done in teams of two
Expand on a basic MIPS R3000 processor
• Lab 0: Welcome to Verilog (not graded)
• Lab 1: Get used to processor model, fix bugs, add instructions
• Lab 2: Pipeline model, add forwarding logic
• Lab 3: Add caches and cache controller
• Lab 4: Final Lab (next)
Final Lab Superscalar (dual-issue) pipeline Design a processor extension of your choosing
• branch prediction • dynamic scheduling • hardware prefetchers • speculative loads • multiple-level caches • instruction set extensions • [your idea here]
Project report required
What You Will Learn How to evaluate architectural decisions?
• You will need to choose among different designs
Architectural techniques in modern microprocessors
• Go from 1986 (314) to 2002
• Apply to your own designs
Why processors are moving towards “multi-cores”
Problems and solutions in multi-core systems
What Do I Need to Know? You are expected to know MIPS ISA and Verilog
• alternatively, you are expected to learn them quickly and on your own
What about C/C++?
• as a computer engineer you should know C
• we use small C programs to test Verilog designs
What about Unix/Linux? • basic Unix skills you should have or acquire:
– elementary tasks: logging in, changing password, manipulating files, etc. – familiarity with a Unix text editor of your choosing (e.g., vi, emacs)
ECE 475/CS 416 Requirements Prerequisites
• ENGRD 230 or equivalent, and ECE 314 or equivalent – logic design, FSM design – basic computer organization
Assets • passion for computer hardware • prior exposure to Unix and/or Verilog • ability to work nonstop for extended periods of time
You should not take this course if any of these apply • you do not meet the prerequisites • your schedule and/or lifestyle won’t fit a(nother) high-workload course
Staff Instructor: Edward Suh, 338 Rhodes, office hours TuTh 11am-Noon
Teaching assistants: (office hours TuWTh 7-10pm, PH329)
• Jiho Choi
• Mark Cianchetti
• Richard Hough
• Yuan Ning
• KK Yu
If you must, use the staff’s email: [email protected] • but we may post your question (and our answer) on Blackboard
Computing Resources Blackboard is used for course communication
• Announcements (e.g. errata, date changes, etc.)
• Handouts and lecture notes: print out before coming to lectures
• Questions / Answers
http://blackboard.cornell.edu/
All assignments handled through CMS
http://cscms.cit.cornell.edu/
ECE Computing Labs for lab assignments
Course Components Lectures: TuTh 2:55-4:10, PH219
• Download notes from Blackboard “Course Documents” • 5 min break in the middle
4 Homeworks • Individual assignment
4 Labs • Group of (one or) two
2 Exams • Prelim & Final
Grading
Grade distribution:
• Homework 15%
– Individual
• Midterm 15% (Oct. 16)
• Final 25% (TBA)
• Verilog projects 40% (5% + 5% + 10% + 20%)
– Group of one or two
• Class participation 5% / Half grade at my discretion
Late policy: 1 min late = not submitted = zero (I’m not kidding)
• but you have one lifeline on one assignment
– 24 hours
– all parties involved must have lifeline available
A Few Rules When in trouble with the material
• Use Blackboard! It’s likely your question has been asked and answered – do not send me questions by email
• Observe office hours — we are all very busy – do not randomly drop by
• Ask in class! – good citizen’s hallmark: in-class participation
I have a keen eye and no tolerance for cheating • disciplinary hearings are no fun • check Cornell’s Code of Academic Integrity
http://cuinfo.cornell.edu/Academic/AIC.html
Textbook
Computer Architecture: A Quantitative Approach, 4th Ed. by John L. Hennessy and David A. Patterson
Morgan Kaufmann Publishers
FAQ I have a question about ECE 475/CS 416
• office hours: TuTh 11-Noon, 338 Rhodes Hall
I have a question about conducting research in your group • office hours: TuTh 11-Noon, 338 Rhodes Hall
What courses complement ECE 475/CS 416? • ECE 474 (VLSI), CS 412/413 (Compilers), CS 414 (OS)