Status of Microprocessor Technology
Advanced Computer Architecture
Spring 2013, Kyushu University
Lecturer: Farhad Mehdipour
Email: [email protected]
Web: http://www.c.csce.kyushu-u.ac.jp/~farhad
A Typical Computer Organization
CPU: Central Processing Unit
RF: Register File
ALU: Arithmetic & Logic Unit
I/O: Input/Output
Designing Computers
Almost all computers are based on the same basic design: the von Neumann architecture!
The Von Neumann Architecture
• Model for designing and building computers, based on the following three characteristics:
1) The computer consists of four main sub-systems:
• Memory
• ALU (Arithmetic/Logic Unit)
• Control Unit
• Input/Output System (I/O)
2) The program is stored in memory during execution.
3) Program instructions are executed sequentially.
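A small simulation can make the stored-program idea and sequential execution concrete. The following is a minimal Python sketch of a toy von Neumann machine. The instruction set (LOAD, ADD, STORE, HALT), the single accumulator and the memory layout are hypothetical and exist only for illustration. Program and data share one memory list, and the control unit fetches instructions one at a time from the address held in the program counter.

```python
def run(memory):
    """Execute a toy stored-program machine until HALT; returns the memory."""
    pc = 0    # program counter: address of the next instruction
    acc = 0   # accumulator: the only data register in this toy design
    while True:
        opcode, addr = memory[pc]  # fetch the instruction from the shared memory
        pc += 1                    # sequential execution: move to the next address
        if opcode == "LOAD":       # memory -> accumulator
            acc = memory[addr]
        elif opcode == "ADD":      # ALU operation: accumulator + memory word
            acc = acc + memory[addr]
        elif opcode == "STORE":    # accumulator -> memory
            memory[addr] = acc
        elif opcode == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode: {opcode!r}")


# Addresses 0-3 hold the program and addresses 4-6 hold the data, all in one memory.
memory = [
    ("LOAD", 4),     # 0: acc <- mem[4]
    ("ADD", 5),      # 1: acc <- acc + mem[5]
    ("STORE", 6),    # 2: mem[6] <- acc
    ("HALT", None),  # 3: stop
    2,               # 4: first operand
    3,               # 5: second operand
    0,               # 6: result
]
print(run(memory)[6])  # the "output" step: prints 5
```

The program is stored in the same memory as the data, so changing the program means changing memory contents rather than rewiring the machine.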
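The Von Neumann Architecture
[Figure: block diagram of memory, processor (CPU) and input/output connected by a bus]
• Memory: stores data and program
• Processor (CPU): executes the program
– Control Unit
– ALU: does the arithmetic/logic operations requested by the program
• Input/Output: communicates with the "outside world", e.g., screen, keyboard, storage devices, ...
• Bus: connects the sub-systems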
Classes of Computers
• 1960s - large mainframes
– Costing millions of dollars
– Stored in computer rooms
– Multiple operators
– Typical applications: business data processing and large-scale scientific computing
• 1970s - the birth of the minicomputer
– A smaller and cheaper computer
• Also the emergence of supercomputers
– High-performance computers for scientific computing
• 1980s - the rise of the desktop computer based on microprocessors
– Personal computers
– Workstations
• 1990s - the emergence of
– The Internet and the World Wide Web
– The first successful handheld computing devices (personal digital assistants or PDAs)
– High-performance digital consumer electronics
– Cell phones and smart phones
Personal Mobile Device (PMD)
• Wireless devices with multimedia interfaces, such as cell phones, smartphones and tablet computers
• Requirements
– Cost
– Energy efficiency
– Real-time performance
– Minimized memory
Desktop Computers
• One of the largest markets in dollar terms
• Low-end (<$500) to high-end ($5K) systems
• Optimized for price-performance
– Performance measured in the number of calculations and graphics operations
– Price is what matters to customers
Servers
• Provide large-scale and more reliable file and computing services (e.g., Web servers)
• Key requirements
– Dependability: effectively provide service 24/7/365 (Yahoo!, Google, eBay)
– Scalability: server systems grow over time, so the ability to scale up the computing capacity is crucial
– Performance: transactions per minute
Clusters/Warehouse-Scale Computers
• Software as a Service (SaaS)
– Search
– Social networking
– Video sharing
– Multiplayer games
• Each node runs its own OS, and nodes communicate using a network protocol.
• The largest clusters are called warehouse-scale computers (WSCs); tens of thousands of servers can act as one.
• Power: 80% of the cost of a $90M WSC is associated with power and cooling.
• As clusters grow in popularity, the number of conventional supercomputers is shrinking.
[Figure: Google’s data center]
Embedded Computers
• Computers as parts of other devices where their presence is not obviously visible
– e.g., home appliances, printers, smart cards, cell phones, set-top boxes, gaming consoles, network routers
• The fastest-growing portion of the market
• Wide range of processing power and cost
– $0.1 (8-bit, 16-bit processors)
– $10 (32-bit, capable of executing 50M instructions per second)
– $100-$200 (high-end video game consoles and network switches)
• Requirements
– Real-time performance (e.g., the time to process a video frame is limited)
– Minimized memory
– Minimized power
– Price, weight, size
Classes of Computers
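• These changes in computer use have led to five different computing markets: personal mobile devices, desktop computers, servers, clusters/warehouse-scale computers and embedded computers.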
Exciting Change
ENIAC, 1946
Occupied a 17 m × 10 m room, weighed 30 tons, contained 18,000 electronic valves (vacuum tubes), consumed 150 kW of electrical power, and could perform 5,000 additions per second
Computers now impact every aspect of human life.
PlayStation Portable (PSP)
• Size: approx. 170 mm (L) × 74 mm (W) × 23 mm (D)
• Weight: approx. 260 g (including battery)
• CPU: PSP CPU (clock frequency 1~333 MHz)
• Main memory: 32 MB
• Embedded DRAM: 4 MB
• Profile: Game, Audio, Video
Evolution of Computers
First generation (1939-1954) - vacuum tube
Second generation (1954-1959) - transistor
Third generation (1959-1971) - IC
Fourth generation (1971-present) - microprocessor
Technology Used in Computers
Vacuum Tube
Transistors
Microprocessor VLSI* chips
*VLSI: Very-large-scale integration
Integrated Circuit (IC)
Wafer & Die
[Figure: a wafer (20~30 cm) divided into dies (x mm, e.g., 100 mm), with on-die dimensions of X nm (nanometers)]
Evolution of Computers
First Generation: ENIAC, 1946 (U of Penn), Vacuum Tubes
• The first programmable, general-purpose electronic digital computer
• 18,000 vacuum tubes
• 30 tons, 30 m × 2.5 m × 1 m
• 5,000 additions per second
• 20 words of 10 decimal digits each
• Programmed by 3,000 switches
• Cost: almost $500,000 (approximately $6,000,000 today)
• Became a stored-program computer in 1948, following von Neumann's advice
Evolution of Computers
Second generation (1954-1959) - Transistor
http://history.acusd.edu/gen/recording/computer1.html
http://www.computer50.org/kgill/transistor/trans.html
Manchester University Experimental Transistor Computer
Commercialization in the 1950s
• UNIVAC, 1951, the first commercial computer
– Contract price $400K, actual cost ~$1M, sold 48 copies
• IBM 701, 1952, shipped 19 copies
– Leased at $12K per month
• IBM 650, 1953, mass-produced ~2,000 units
– $200K ~ $400K
• IBM System/360, 1964
– A family of binary-compatible computers
– 19 combinations of varying speed and memory capacity, from $200K ~ $2M
– Still lives on today as the “highly profitable” IBM z900 series
Evolution of Computers
Third generation (1959-1971) - IC
http://history.acusd.edu/gen/recording/computer1.html
http://www.piercefuller.com/collect/pdp8.html
PDP-8, Digital Equipment Corporation
• Thanks to the use of ICs, the DEC PDP-8 was the least expensive general-purpose small computer of the 1960s
Cheaper or Faster in the 60s and 70s
• Minicomputers
– DEC PDP-8, 1965, $20K, the size of a large refrigerator
– Less powerful than “mainframes”, but 10x cheaper
– Departmental computers: the PDP-11 and VAX enjoyed extreme popularity in the 70s and 80s
• Supercomputers
– Performance at all cost!!
– Biggest customers: national security, nuclear weapons, cryptography (also aerospace, petroleum, automotive, pharmaceutical, sciences); check out www.top500.org
Evolution of Computers
Fourth generation (1971-present) - microprocessor
• In 1971, Intel developed the 4-bit 4004 chip for calculator applications.
[Figure: block diagram of the Intel 4004 (ALU, instruction decoder, registers, program counter, I/O, refresh logic, system bus, control logic, ROM/RAM buffer, timing, reset) and the 4004 chip layout]
http://www.intel.com
A good review article: The History of the Microprocessor, Bell Labs Technical Journal, 1997.
Early Examples
DEC PDP-8, 1965: an early minicomputer
Xerox Alto, 1973: an early “PC” with a mouse
Cray 3, 1993
• Up to 16 processors and up to 2 gigawords (16 GB) of memory
• Power consumption: 90 kW
• 15 GFLOPS (1 second on the Cray 3 ≈ 67 years on ENIAC)
• $30,000,000
Microprocessor Generations
• First generation: 1971-78
– Behind the power curve (16-bit, <50k transistors)
• Second generation: 1979-85
– Becoming “real” computers (32-bit, >50k transistors)
• Third generation: 1985-89
– Challenging the “establishment” (Reduced Instruction Set Computer/RISC, >100k transistors)
• Fourth generation: 1990-
– Architectural and performance leadership (64-bit, >1M transistors, Intel/AMD translate into RISC internally)
Intel 4004 in the 1970s
• Intel 4004, the first single-chip CPU
– 4-bit processor for a calculator
– 2,300 transistors
– 16-pin DIP package
– 740 kHz clock (eight clock cycles per CPU cycle of 10.8 microseconds)
– ~100K operations per second
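The 4004 clock figures are consistent with each other and with the "~100K operations per second" estimate. The following Python check makes an assumption not stated on the slide: as an upper bound, one operation completes per CPU cycle.

```python
CLOCK_HZ = 740_000     # 4004 clock frequency: 740 kHz
CLOCKS_PER_CYCLE = 8   # eight clock periods per CPU cycle

cpu_cycle_us = CLOCKS_PER_CYCLE / CLOCK_HZ * 1e6   # one CPU cycle, in microseconds
cycles_per_second = CLOCK_HZ / CLOCKS_PER_CYCLE    # CPU cycles completed per second

print(f"CPU cycle: {cpu_cycle_us:.1f} microseconds")       # 10.8
print(f"CPU cycles per second: {cycles_per_second:,.0f}")  # 92,500
```

About 92,500 cycles per second sits just under the rounded "~100K" figure. Operations that need more than one CPU cycle would lower the real rate further.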
Intel Itanium 9500 Series
• 64-bit processor
• 3.1 billion transistors
• 2.53 GHz, issues up to 12 instructions per cycle
• 8 cores
• 54 MB of cache!!
In ~40 years, transistor count and performance have grown about 1,000,000 times!
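The following sketch checks the "~1,000,000 times" claim against the figures quoted on the Intel 4004 and Itanium 9500 slides. It also derives the doubling period that this growth implies. One input is an assumption not stated on the slide: the Itanium 9500 series is dated to 2012.

```python
import math

# Figures from the Intel 4004 and Itanium 9500 slides.
TRANSISTORS_4004 = 2_300
TRANSISTORS_ITANIUM = 3.1e9
CLOCK_4004_HZ = 740e3
CLOCK_ITANIUM_HZ = 2.53e9
YEARS = 2012 - 1971  # assumption: Itanium 9500 series shipped in 2012

transistor_growth = TRANSISTORS_ITANIUM / TRANSISTORS_4004
clock_growth = CLOCK_ITANIUM_HZ / CLOCK_4004_HZ
# Number of doublings needed for the observed growth, spread evenly over YEARS.
months_per_doubling = 12 * YEARS / math.log2(transistor_growth)

print(f"transistor count grew {transistor_growth:,.0f}x")
print(f"clock frequency grew {clock_growth:,.0f}x")
print(f"implied doubling period: {months_per_doubling:.1f} months")
```

The transistor count grew by about 1.35 million times, which corresponds to a doubling roughly every 24 months. The clock rate grew only a few thousand times. Most of any performance gain of that size would therefore have to come from the architecture, for example the 12-instruction issue width and the 8 cores listed above.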
Key Architectural Trends
• Performance increases 1.6x per year (2x every 1.5 years), true from 1985 to the present
• A combination of technology and architectural enhancements
– Technology provides faster transistors, and faster transistors lead to higher clock rates
– More transistors (“Moore’s Law”): architectural ideas turn transistors into performance and are responsible for about half of the yearly performance growth
• Two key architectural directions
– Sophisticated memory hierarchies
– Exploiting instruction-level parallelism
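Because growth compounds, the two rates in the first bullet express the same figure in two ways. The following Python check uses only the 1.6x-per-year figure from the slide:

```python
import math

ANNUAL_GROWTH = 1.6  # performance multiplier per year, from the slide

growth_over_18_months = ANNUAL_GROWTH ** 1.5                  # compounded over 1.5 years
doubling_time_years = math.log(2) / math.log(ANNUAL_GROWTH)   # years needed to reach 2x
annual_rate_for_2x_in_18_months = 2 ** (1 / 1.5)              # the reverse direction

print(f"{growth_over_18_months:.2f}x in 1.5 years")
print(f"doubling time: {doubling_time_years:.2f} years")
print(f"2x per 1.5 years = {annual_rate_for_2x_in_18_months:.2f}x per year")
```

At 1.6x per year, performance grows about 2.02x in 1.5 years, so "2x every 1.5 years" is a rounding of the same rate.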
Moore’s Law
Transistor count doubles every 18-24 months!
Transistor Count: Intel Processors
[Figure: transistor count of Intel processors over time; transistor count doubles every 18-24 months]
Processor Transistor Count
• Intel 4004: 2,300 transistors (1971)
• Intel P4: 55M transistors (2001)
• Intel McKinley: 221M transistors (2001)
• Intel Core 2 Extreme quad-core: 2 × 291M transistors (2006)
Microprocessors (Y2K-2014)
Year of 1st shipment | 1997 | 1999 | 2002 | 2005 | 2008 | 2011 | 2014
Clock frequency (GHz) | 0.75 | 1.2 | 1.6 | 2 | 2.5 | 3 | 3.674
Chip size (mm²) | 300 | 340 | 430 | 520 | 620 | 750 | 901
Transistors per chip | 11M | 21M | 76M | 200M | 520M | 1.4B | 3.62B
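The table also shows where the extra transistors came from. Dividing transistor count by chip size gives transistor density. The following Python sketch compares the first column of the table (1997) with the last (2014):

```python
# Data from the "Microprocessors (Y2K-2014)" table: first and last columns.
YEAR_FIRST, YEAR_LAST = 1997, 2014
TRANSISTORS_FIRST, TRANSISTORS_LAST = 11e6, 3.62e9
AREA_FIRST_MM2, AREA_LAST_MM2 = 300, 901

count_growth = TRANSISTORS_LAST / TRANSISTORS_FIRST
area_growth = AREA_LAST_MM2 / AREA_FIRST_MM2
density_first = TRANSISTORS_FIRST / AREA_FIRST_MM2   # transistors per mm^2
density_last = TRANSISTORS_LAST / AREA_LAST_MM2
density_growth = density_last / density_first

print(f"{YEAR_FIRST}-{YEAR_LAST}: transistors x{count_growth:.0f}, "
      f"chip area x{area_growth:.1f}, density x{density_growth:.0f}")
```

Chip area only about tripled, while transistors per mm² rose by roughly 110 times. Shrinking transistors therefore carried most of the growth in transistor count. This is why the end of scaling listed under Future of Computers matters.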
Towards RISCs
• Two significant changes:
– The virtual elimination of assembly language programming reduced the need for object-code compatibility
– The creation of standardized, vendor-independent operating systems (UNIX and Linux)
• These changes made possible a new set of architectures with simpler instructions, called RISC (Appendix I), in the early 1980s.
• RISC-based machines focused on
– The exploitation of pipelining (Appendix II) and instruction-level parallelism (Appendix III)
– The use of caches
Growth in Processor Performance
[Figure: growth in processor performance over time, with the RISC era and the later move to multiprocessors marked]
• Driven by advances in technology and innovations in computer design
• RISC: ILP (pipelining, multiple instruction issue) and the use of caches
• RISC forced prior architectures to keep up or disappear
– Digital Equipment's VAX was replaced by a RISC architecture
– Intel rose to the challenge, primarily by translating x86 (or IA-32) instructions into RISC-like instructions internally
• Then three limits were reached, leading to the move to multiprocessors:
– Little ILP left to exploit efficiently (ILP wall)
– Almost unchanged memory latency (memory wall, Appendix IV)
– Maximum power dissipation of air-cooled chips (power wall, Appendix V)
Multiprocessor
• “We are dedicating all of our future product development to multicore designs. … This is a sea change in computing” (Paul Otellini, President, Intel, 2005)
• All microprocessor companies switch to MP (2x CPUs every 2 years)
Manufacturer/Year | AMD/’05 | Intel/’06 | IBM/’04 | Sun/’05
Processors/chip | 2 | 2 | 2 | 8
Threads/processor | 1 | 2 | 2 | 4
Threads/chip | 2 | 4 | 4 | 32
Future of Computers
• End of Moore’s law
– The future of VLSI technology after 2015 is unknown: transistor size will be measured in atoms and node charge in electrons!! This doesn't mean VLSI is finished, just that there will be no more scaling
• Non-von Neumann architectures toward:
– Parallel and distributed processing
– Reconfigurable hardware computing
• Non-silicon technologies
– Nanotechnologies: carbon nanotubes, molecular switches
– Biological/cellular computers: DNA, proteins and enzymes
– Quantum computers: magnetic resonance and quantum dots
• New ways of using computers!!!
Thank you!
Appendix I: RISC (Reduced Instruction Set Computer) Architectures
• Properties of RISC architectures:
– All operations on data apply to data in registers and typically change the entire register (32 bits or 64 bits).
– The only operations that affect memory are load/store operations: memory to register, and register to memory.
– Load and store operations on data smaller than a full register (32, 16 or 8 bits) are often available.
– Instructions are usually few in number (this can be relative) and are typically all one size.
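The load/store property can be illustrated with a small Python sketch of a hypothetical register machine. The opcodes, the register count and the 32-bit register width are assumptions for illustration and do not describe a real ISA. A high-level statement a = b + c, which some older memory-to-memory architectures could express as a single instruction, becomes two loads, one register-register ADD and one store:

```python
MASK_32 = 0xFFFF_FFFF  # assumption: 32-bit registers


def execute(program, memory, num_regs=8):
    """Run a toy load/store program; only LOAD and STORE touch memory."""
    regs = [0] * num_regs
    for op, *args in program:
        if op == "LOAD":        # memory -> register
            rd, addr = args
            regs[rd] = memory[addr] & MASK_32
        elif op == "STORE":     # register -> memory
            rs, addr = args
            memory[addr] = regs[rs]
        elif op == "ADD":       # register-register ALU op; writes the whole register
            rd, rs1, rs2 = args
            regs[rd] = (regs[rs1] + regs[rs2]) & MASK_32
        elif op == "SUB":
            rd, rs1, rs2 = args
            regs[rd] = (regs[rs1] - regs[rs2]) & MASK_32
        else:
            raise ValueError(f"unknown opcode: {op!r}")
    return regs


# a = b + c: two loads, one ALU operation on registers, one store.
memory = {"a": 0, "b": 7, "c": 5}
program = [
    ("LOAD", 1, "b"),
    ("LOAD", 2, "c"),
    ("ADD", 3, 1, 2),
    ("STORE", 3, "a"),
]
execute(program, memory)
print(memory["a"])  # 12
```

The ALU operations never see a memory address. Masking every result to 32 bits mirrors the property that an operation changes the entire register.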
Appendix II: Pipelining
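[Figure: execution timing of Load instructions on a single-cycle CPU, a multiple-cycle CPU and a pipelined CPU; stages IF: instruction fetch, Dec: decode, EX: execute, Mem: memory access, WB: write back]
• Multiple Cycle CPU: each Load passes through IF, Dec, EX, Mem and WB, one stage per cycle (Cycles 1-5), before the next Load starts
• Pipelined CPU: four Loads start one cycle apart and overlap their IF, Dec, EX, Mem and WB stages, so all four complete within Cycles 1-8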
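The cycle counts in the pipelining diagrams follow a simple rule. The following sketch assumes five stages, one cycle per stage, and no hazards or stalls. Under those assumptions, n instructions take 5n cycles on the multiple-cycle CPU but only 5 + (n - 1) cycles when pipelined, which is why four Loads fit in Cycles 1-8. A Python sketch:

```python
STAGES = ["IF", "Dec", "EX", "Mem", "WB"]  # assumption: one cycle per stage


def multicycle_cycles(n, k=len(STAGES)):
    """Multiple-cycle CPU: each instruction finishes all k stages before the next starts."""
    return n * k


def pipelined_cycles(n, k=len(STAGES)):
    """Pipelined CPU: the first instruction takes k cycles, each later one adds 1."""
    return k + (n - 1)


def pipeline_diagram(n):
    """One text row per instruction, one column per cycle."""
    total = pipelined_cycles(n)
    rows = []
    for i in range(n):
        cells = [""] * total
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage  # instruction i is in stage s during cycle i + s + 1
        rows.append("Load " + " ".join(f"{c:<4}" for c in cells))
    return rows


for row in pipeline_diagram(4):
    print(row)
print("multiple-cycle:", multicycle_cycles(4), "cycles; pipelined:", pipelined_cycles(4), "cycles")
```

Each instruction still takes five cycles from start to finish. Pipelining improves throughput by completing one instruction per cycle once the pipeline is full.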
Appendix III: Instruction Level Parallelism
• An architectural technique that allows the overlap of individual machine operations (add, mul, load, store, …)
• Multiple operations execute in parallel (simultaneously)
• Goal: speed up execution
• Example:
instr. 1: sub R1, R1, 1 (R1 ← R1 - 1)
instr. 2: add R4, R1, R3
instr. 3: add R5, R3, R2
• Sequential execution (without ILP): each instruction takes one cycle
– Total execution time: 3 cycles
• ILP execution (overlapped execution): instr. 2 needs the new value of R1 produced by instr. 1, but instr. 3 depends on neither, so instr. 1 or instr. 2 can run simultaneously with instr. 3
– Total execution time: 2 cycles
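The 2-cycle result can be derived mechanically from the register dependences of the three-instruction example. The following Python sketch makes three simplifying assumptions. Every instruction has a one-cycle latency. Enough functional units are available. Only true (read-after-write) dependences are checked, which is sufficient here because the example has no other kind.

```python
# The three-instruction example: (name, destination register, source registers).
PROGRAM = [
    ("instr. 1", "R1", ["R1"]),        # sub R1, R1, 1
    ("instr. 2", "R4", ["R1", "R3"]),  # add R4, R1, R3
    ("instr. 3", "R5", ["R3", "R2"]),  # add R5, R3, R2
]


def schedule(program):
    """Earliest issue cycle per instruction, honoring read-after-write dependences only."""
    ready = {}   # register -> first cycle in which its new value can be read
    cycles = []
    for _name, dest, srcs in program:
        start = max((ready.get(r, 1) for r in srcs), default=1)
        cycles.append(start)
        ready[dest] = start + 1  # assumption: one-cycle latency for every instruction
    return cycles


cycles = schedule(PROGRAM)
for (name, _dest, _srcs), c in zip(PROGRAM, cycles):
    print(f"{name}: cycle {c}")
print("sequential:", len(PROGRAM), "cycles; with ILP:", max(cycles), "cycles")
```

Instr. 1 and instr. 3 are scheduled in cycle 1, and instr. 2 waits until cycle 2 for the new R1. Real processors must also handle write-after-read and write-after-write conflicts, for example through register renaming.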
Appendix IV: Memory Wall
Appendix V: Power Wall