
Chapter 1 Computer Abstractions and Technology 20100906

Date post: 07-Apr-2018
Upload: ying-jou-chen
  • 8/4/2019 Chapter 1 Computer Abstractions and Technology 20100906

    1/71

    Computer Abstractions and Technology


    The Computer Revolution

    Progress in computer technology

    Underpinned by Moore's Law

    Makes novel applications feasible

    Computers in automobiles

    Cell phones

    Human genome project

    World Wide Web

    Search Engines

    Computers are pervasive

    Section 1.1: Introduction

    Classes of Computers

    Desktop computers

    General purpose, variety of software

    Subject to cost/performance tradeoff

    Server computers

    Network based

    High capacity, performance, reliability

    Range from small servers to building sized

    Embedded computers

    Hidden as components of systems

    Stringent power/performance/cost constraints


    The Computer/Processor Market


    What You Will Learn

    Computer organization

    Design a faster and cheaper computer system

    How programs are translated into machine language

    And how the hardware executes them

    The hardware/software interface

    What determines program performance

    And how it can be improved

    How hardware designers improve performance

    What is parallel processing


    Below Your Program

    Application software

    Written in high-level language (HLL)

    E.g., MS Word, PowerPoint

    System software

    Compiler: translates HLL code to machine code

    Operating System: service code

    Handling input/output

    Managing memory and storage

    Scheduling tasks & sharing resources

    Hardware

    Processor, memory, I/O controllers

    Section 1.2: Below Your Program


    Levels of Program Code

    High-level language

    Level of abstraction closer to problem domain

    Provides for productivity and portability

    Assembly language

    Textual representation of instructions

    Hardware representation

    Binary digits (bits)

    Encoded instructions and data


    Levels of Representation: From High-Level Language to Hardware Language

    High-Level Language Program (e.g., C)

        temp = v[k];
        v[k] = v[k+1];
        v[k+1] = temp;

    Compiler

    Assembly Language Program (e.g., MIPS)

        lw $t0, 0($2)
        lw $t1, 4($2)
        sw $t1, 0($2)
        sw $t0, 4($2)

    Assembler

    Machine Language Program (MIPS)

        0000 1001 1100 0110 1010 1111 0101 1000
        1010 1111 0101 1000 0000 1001 1100 0110
        1100 0110 1010 1111 0101 1000 0000 1001
        0101 1000 0000 1001 1100 0110 1010 1111

    Machine Interpretation

    Hardware Architecture Description (e.g., Verilog Language)

        wire [31:0] dataBus;
        regFile registers (databus);
        ALU ALUBlock (inA, inB, databus);

    Architecture Implementation

    Logic Circuit Description (Verilog Language)

        wire w0;
        XOR (w0, a, b);
        AND (s, w0, a);


    Abstractions

    * Coordination of many levels of abstraction

    [Figure: layers of abstraction. Software: Application (Netscape), Operating System (Windows), Compiler, Assembler. Interface: Instruction Set Architecture. Hardware: Processor, Memory, I/O system; Datapath & Control; Digital Design; Circuit Design; transistors.]


    Instruction Set Architecture

    Also called architecture

    A very important abstraction: the interface between hardware and low-level software

    Includes instructions, registers, memory access, I/O, and so on

    Advantage: different implementations of the same architecture

    Disadvantage: sometimes prevents using innovations

    True or False: Binary compatibility is extraordinarily important?

    Modern instruction set architectures: IA-32, PowerPC, MIPS, SPARC, ARM, and others


    Components of a Computer

    Same components for all kinds of computer

    Desktop, server, embedded

    Input/output includes

    User-interface devices

    Display, keyboard, mouse

    Storage devices

    Hard disk, CD/DVD, flash

    Network adapters

    For communicating with other computers

    Section 1.3: Under the Covers


    Anatomy of a Computer

    [Figure labels: output device, input devices, network cable]


    Opening the Box


    A Safe Place for Data

    Volatile main memory

    Loses instructions and data when powered off

    Non-volatile secondary memory

    Magnetic disk

    Flash memory

    Optical disk (CD-ROM, DVD)


    Networks

    Communication and resource sharing

    Local area network (LAN): Ethernet

    Within a building

    Wide area network (WAN): the Internet

    Wireless network: WiFi, Bluetooth


    Inside the Processor (CPU)

    Datapath: performs operations on data

    Control: sequences datapath, memory, ...

    Cache memory

    Small, fast SRAM memory for immediate access to data


    Inside the Processor

    AMD Barcelona: 4 processor cores


    Why do all computers look alike?


    Back to History: Milestones

    1936: Alan Turing, universal computing machine

    1944/1946: ENIAC, the world's first general-purpose electronic computer; John von Neumann, stored program

    1947: Transistors

    1958: Integrated circuits

    1961: IBM Stretch, instruction pipeline

    1964: IBM 360, family of compatible computers

    1970: C.mmp

    1971: Intel 4004, first microprocessor (uP)

    1977: Apple I, personal computer

    1980: RISC, load-store architecture (SUN/MIPS)

    Information from wiki


    von Neumann Architecture

    A stored-program digital computer that uses a central processing unit (CPU) and a single separate storage structure ("memory") to hold both instructions and data.

    von Neumann bottleneck

    Throughput between CPU and memory

    Memory speed is much slower than the CPU (needs a cache memory; see Chapter 5)


    Other CPUs beyond von Neumann Arch.

    Modified Harvard architecture (in DSPs)

    Separate instruction and data storage

    Found in DSPs such as the TI OMAP in handsets

    Dataflow architecture

    No program counter

    Found in graphics processing, network routing


    Abstractions

    Abstraction helps us deal with complexity

    Hide lower-level detail

    Instruction set architecture (ISA)

    The hardware/software interface

    Application binary interface

    The ISA plus system software interface

    Implementation

    The details underlying an interface


    Technology Trends

    Electronics technology continues to evolve

    Increased capacity and performance

    Reduced cost

    Year    Technology                    Relative performance/cost
    1951    Vacuum tube                   1
    1965    Transistor                    35
    1975    Integrated circuit (IC)       900
    1995    Very large scale IC (VLSI)    2,400,000
    2005    Ultra large scale IC          6,200,000,000

    [Figure: DRAM capacity]


    Technology Trends: Memory Capacity (Single-Chip DRAM)

    [Figure: DRAM size (log scale, 1,000 to 1,000,000,000) vs. year (1970 to 2000)]

    Year    Size (Mbit)
    1980    0.0625
    1983    0.25
    1986    1
    1989    4
    1992    16
    1996    64
    1998    128
    2000    256
    2002    512

    Now 1.4X/yr, or 2X every 2 years.

    8000X since 1980!


    Technology Trends: Microprocessor Complexity

    [Figure: transistors per chip (log scale, 1,000 to 100,000,000) vs. year (1970 to 2000), plotting i4004, i8080, i8086, i80286, i80386, i80486, Pentium]

    2X transistors/chip every 1.5 years, called Moore's Law

    Alpha 21264: 15 million
    Pentium Pro: 5.5 million
    PowerPC 620: 6.9 million
    Alpha 21164: 9.3 million
    Sparc Ultra: 5.2 million
    Athlon (K7): 22 million
    Itanium 2: 410 million


    Technology Trends: Processor Performance

    [Figure: performance measure (0 to 900) vs. year (1987 to 1997), growing 1.54X/yr; labeled points: IBM POWER 100, DEC Alpha 4/266, DEC Alpha 5/300, DEC Alpha 5/500, DEC Alpha 21264/600]

    Intel P4 2000 MHz (Fall 2001)

    We'll talk about processor performance later on


    Computer Technology - Dramatic Change!

    Memory

    DRAM capacity: 2x / 2 years (since '96); 64x size improvement in last decade.

    Processor

    Speed: 2x / 1.5 years (since '85); 100X performance in last decade.

    Disk

    Capacity: 2x / 1 year (since '97); 250X size in last decade.
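As a rough check on how a doubling period turns into growth over a decade, here is a minimal sketch. The function names are illustrative, and it assumes a constant doubling period over the whole span, which the slide does not claim (its rates are quoted only "since" a given year, so the DRAM and disk decade figures need not follow from them).

```python
def growth_factor(doubling_period_years: float, span_years: float) -> float:
    """Total growth after span_years if the quantity doubles every doubling_period_years."""
    return 2.0 ** (span_years / doubling_period_years)


def annual_rate(doubling_period_years: float) -> float:
    """Equivalent per-year multiplier for a given doubling period."""
    return 2.0 ** (1.0 / doubling_period_years)


if __name__ == "__main__":
    # Processor speed: 2x every 1.5 years, projected over one decade.
    print(f"processor over 10 years: {growth_factor(1.5, 10):.0f}x")
    # DRAM: 2x every 2 years expressed as a yearly multiplier.
    print(f"DRAM per year: {annual_rate(2):.2f}x")
```

Under that assumption, doubling every 1.5 years gives roughly 100X per decade, in line with the processor figure, and doubling every 2 years is about 1.4X per year.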


    Why do you need to know the technology trend? Or: what is the impact of the technology trend on computer design?


    The meaning of technology drive

    Forecast the expected design performance within the next few years

    It changes the way that you design your computer and software

    Multi-processor, VLIW, parallel processor, efficient access via caches

    Size, speed, probability

    Other technologies may change the way computers/processors are designed

    Quantum computer, DNA computer, bio-computer


    Performance? How fast is my computer?


    Understanding Performance

    Algorithm

    Determines number of operations executed

    Programming language, compiler, architecture

    Determine number of machine instructions executed per operation

    Processor and memory system

    Determine how fast instructions are executed

    I/O system (including OS)

    Determines how fast I/O operations are executed


    Defining Performance

    Which airplane has the best performance?

    [Figure: four bar charts comparing the Douglas DC-8-50, BAC/Sud Concorde, Boeing 747, and Boeing 777 by passenger capacity (0 to 500), cruising range (0 to 10,000 miles), cruising speed (0 to 1,500 mph), and passengers x mph (0 to 400,000)]

    Section 1.4: Performance


    Response Time and Throughput

    Response time

    How long it takes to do a task

    Throughput

    Total work done per unit time

    e.g., tasks/transactions per hour

    How are response time and throughput affected by

    Replacing the processor with a faster version?

    Adding more processors?

    We'll focus on response time for now


    Relative Performance

    Define Performance = 1/Execution Time

    "X is n times faster than Y"

    $$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$

    Example: time taken to run a program

    10s on A, 15s on B

    Execution Time_B / Execution Time_A = 15s / 10s = 1.5

    So A is 1.5 times faster than B
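The definition above can be sketched in a few lines. The function name is illustrative and not from the slides; it simply applies Performance = 1/Execution Time.

```python
def relative_performance(time_x: float, time_y: float) -> float:
    """Return n such that X is n times faster than Y.

    Since Performance = 1 / Execution Time,
    Performance_X / Performance_Y = time_y / time_x.
    """
    if time_x <= 0 or time_y <= 0:
        raise ValueError("execution times must be positive")
    return time_y / time_x


if __name__ == "__main__":
    # Slide example: 10 s on A, 15 s on B.
    print(relative_performance(time_x=10, time_y=15))
```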


    Measuring Execution Time

    Elapsed time

    Total response time, including all aspects

    Processing, I/O, OS overhead, idle time

    Determines system performance

    CPU time

    Time spent processing a given job

    Discounts I/O time, other jobs' shares

    Comprises user CPU time and system CPU time

    Different programs are affected differently by CPU and system performance


    CPU Clocking

    Operation of digital hardware is governed by a constant-rate clock

    [Figure: clock waveform; each clock period (cycle) comprises data transfer and computation, followed by a state update]

    Clock period: duration of a clock cycle

    e.g., 250ps = 0.25ns = $250 \times 10^{-12}$ s

    Clock frequency (rate): cycles per second

    e.g., 4.0GHz = 4000MHz = $4.0 \times 10^{9}$ Hz
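Clock period and clock rate are reciprocals, so the two examples on this slide describe the same clock. A small sketch with illustrative helper names:

```python
def clock_period_s(clock_rate_hz: float) -> float:
    """Clock period in seconds for a given clock rate in hertz."""
    return 1.0 / clock_rate_hz


def clock_rate_hz(clock_period_s_: float) -> float:
    """Clock rate in hertz for a given clock period in seconds."""
    return 1.0 / clock_period_s_


if __name__ == "__main__":
    period = clock_period_s(4.0e9)           # 4.0 GHz clock
    print(f"{period * 1e12:.0f} ps")         # period expressed in picoseconds
    print(f"{clock_rate_hz(250e-12) / 1e9:.1f} GHz")
```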


    CPU Time

    $$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$$

    Performance improved by

    Reducing number of clock cycles

    Increasing clock rate

    Hardware designer must often trade off clock rate against cycle count


    CPU Time Example

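    Computer A: 2GHz clock, 10s CPU time

    Designing Computer B

    Aim for 6s CPU time

    Can do faster clock, but causes 1.2 × clock cycles

    How fast must Computer B clock be?

    $$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}}$$

    $$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^{9}$$

    $$\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^{9}}{6\,\text{s}} = \frac{24 \times 10^{9}}{6\,\text{s}} = 4\,\text{GHz}$$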
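The Computer B calculation can be replayed numerically. This is only a sketch of the arithmetic on the slide; the helper names are made up for illustration.

```python
def clock_cycles(cpu_time_s: float, clock_rate_hz: float) -> float:
    """Clock cycles = CPU time x clock rate."""
    return cpu_time_s * clock_rate_hz


def required_clock_rate(cycles: float, target_cpu_time_s: float) -> float:
    """Clock rate needed to finish the given number of cycles in the target time."""
    return cycles / target_cpu_time_s


# Computer A: 2 GHz clock, 10 s CPU time.
cycles_a = clock_cycles(cpu_time_s=10, clock_rate_hz=2e9)
# Computer B needs 1.2x as many cycles and must finish in 6 s.
rate_b = required_clock_rate(cycles=1.2 * cycles_a, target_cpu_time_s=6)

if __name__ == "__main__":
    print(f"Computer B clock rate: {rate_b / 1e9:.1f} GHz")
```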


    Instruction Count and CPI

    Instruction Count for a program

    Determined by program, ISA and compiler

    Average cycles per instruction (CPI)

    Determined by CPU hardware

    If different instructions have different CPI

    Average CPI affected by instruction mix

    $$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$$

    $$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$


    CPI Example

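    Computer A: Cycle Time = 250ps, CPI = 2.0

    Computer B: Cycle Time = 500ps, CPI = 1.2

    Same ISA

    Which is faster, and by how much?

    $$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}$$

    A is faster...

    $$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$$

    $$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2$$

    ...by this much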
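A minimal sketch of this comparison follows. Because both computers run the same ISA, the instruction count I is identical and cancels, so the value used for it below is an arbitrary placeholder.

```python
def cpu_time_s(instruction_count: float, cpi: float, cycle_time_s: float) -> float:
    """CPU time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_s


INSTRUCTIONS = 1_000_000  # arbitrary: it cancels in the ratio below

time_a = cpu_time_s(INSTRUCTIONS, cpi=2.0, cycle_time_s=250e-12)
time_b = cpu_time_s(INSTRUCTIONS, cpi=1.2, cycle_time_s=500e-12)
ratio_b_over_a = time_b / time_a  # how many times faster A is than B

if __name__ == "__main__":
    print(f"A is {ratio_b_over_a:.1f} times faster than B")
```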


    CPI in More Detail

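    If different instruction classes take different numbers of cycles

    $$\text{Clock Cycles} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \text{Instruction Count}_i\right)$$

    Weighted average CPI

    $$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}}\right)$$

    The ratio $\text{Instruction Count}_i / \text{Instruction Count}$ is the relative frequency of class $i$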


    CPI Example

    Alternative compiled code sequences using instructions in classes A, B, C

    Class               A    B    C
    CPI for class       1    2    3
    IC in sequence 1    2    1    2
    IC in sequence 2    4    1    1

    Sequence 1: IC = 5

    Clock Cycles = 2×1 + 1×2 + 2×3 = 10

    Avg. CPI = 10/5 = 2.0

    Sequence 2: IC = 6

    Clock Cycles = 4×1 + 1×2 + 1×3 = 9

    Avg. CPI = 9/6 = 1.5
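The two code sequences can be checked with a short sketch of the weighted-CPI formula. The dictionary layout and function names are illustrative choices, not part of the slides.

```python
# CPI per instruction class, as given in the table.
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class for the two compiled sequences.
SEQUENCE_1 = {"A": 2, "B": 1, "C": 2}
SEQUENCE_2 = {"A": 4, "B": 1, "C": 1}


def total_cycles(counts: dict) -> int:
    """Clock cycles = sum over classes of CPI_i x instruction count_i."""
    return sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())


def average_cpi(counts: dict) -> float:
    """Weighted average CPI = clock cycles / total instruction count."""
    return total_cycles(counts) / sum(counts.values())


if __name__ == "__main__":
    for name, seq in (("sequence 1", SEQUENCE_1), ("sequence 2", SEQUENCE_2)):
        print(name, total_cycles(seq), average_cpi(seq))
```

Sequence 2 executes more instructions but fewer cycles, which is why instruction count alone is not a reliable performance measure.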


    Performance Summary

    Performance depends on

    Algorithm: affects IC, possibly CPI

    Programming language: affects IC, CPI

    Compiler: affects IC, CPI

    Instruction set architecture: affects IC, CPI, Tc

    $$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$


    What improvement can I gain if I speed up something?


    Limit of speedup Gain


    Enhancement example: 0.5 + 0.5 = 1

    Enhanced part takes 4 before enhancement => F = 4/(4+1) = 0.8

    Enhanced part takes 1 after enhancement => S = 4/1 = 4

    $$\text{speedup} = \frac{4 + 1}{1 + 1} = 2.5$$

    By formula (Amdahl's Law):

    $$\text{speedup} = \frac{1}{(1 - F) + F/S} = \frac{1}{(1 - 0.8) + 0.8/4} = 2.5$$

    When $S \to \infty$, speedup $\to 5$


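    Amdahl's Law

    Speedup due to enhancement E:

    $$\text{Speedup}(E) = \frac{\text{Execution Time w/o } E}{\text{Execution Time w/ } E} = \frac{\text{Performance w/ } E}{\text{Performance w/o } E}$$

    Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected. Then:

    $$\text{Execution Time}(\text{w/ } E) = \left((1 - F) + \frac{F}{S}\right) \times \text{Execution Time}(\text{w/o } E)$$

    $$\text{Speedup}(\text{w/ } E) = \frac{1}{(1 - F) + \frac{F}{S}}, \qquad \lim_{S \to \infty} \text{Speedup}(\text{w/ } E) = \frac{1}{1 - F}$$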
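A minimal sketch of Amdahl's Law as stated on these slides, where F is the fraction of the task that is enhanced and S is the factor by which that fraction is accelerated. The function names are illustrative.

```python
def amdahl_speedup(fraction: float, factor: float) -> float:
    """Overall speedup = 1 / ((1 - F) + F / S)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction F must lie in [0, 1]")
    if factor <= 0.0:
        raise ValueError("factor S must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / factor)


def speedup_limit(fraction: float) -> float:
    """Upper bound on the speedup as S grows without bound: 1 / (1 - F)."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction F must lie in [0, 1)")
    return 1.0 / (1.0 - fraction)


if __name__ == "__main__":
    # Values from the worked example: F = 0.8, S = 4.
    print(f"speedup: {amdahl_speedup(0.8, 4):.2f}")
    print(f"limit as S grows: {speedup_limit(0.8):.2f}")
```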


    Pitfall: Amdahl's Law

    Improving an aspect of a computer and expecting a proportional improvement in overall performance

    $$T_{\text{improved}} = \frac{T_{\text{affected}}}{\text{improvement factor}} + T_{\text{unaffected}}$$

    Example: multiply accounts for 80s/100s

    How much improvement in multiply performance to get 5× overall?

    $$20 = \frac{80}{n} + 20$$

    Can't be done!

    Corollary: make the common case fast

    Section 1.8: Fallacies and Pitfalls
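The multiply example can be turned around to ask what improvement factor a target overall speedup would require. The sketch below solves T_improved = T_affected / n + T_unaffected for n and returns None when no finite n exists; the function name and the None convention are assumptions made for illustration.

```python
from typing import Optional


def required_improvement_factor(
    t_affected: float, t_unaffected: float, target_speedup: float
) -> Optional[float]:
    """Factor n by which the affected part must speed up, or None if impossible."""
    total = t_affected + t_unaffected
    target_time = total / target_speedup
    # Time budget left for the affected part once the unaffected part is paid for.
    budget = target_time - t_unaffected
    if budget <= 0:
        return None  # the unaffected time alone already uses up the target
    return t_affected / budget


if __name__ == "__main__":
    # Slide example: multiply is 80 s of 100 s, target is 5x overall.
    print(required_improvement_factor(80, 20, 5))
    # A 4x overall target, by contrast, is reachable.
    print(required_improvement_factor(80, 20, 4))
```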


    CPU time is great, but how fast is my computer? Or how much faster is my computer than that one?


    Performance Measurement

    Two different machines X and Y.

    X is n times faster than Y

    $$\frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$

    Since execution time is the reciprocal of performance

    $$n = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = \frac{\text{Performance}_X}{\text{Performance}_Y}$$

    Let n - 1 = m/100

    Then X is m% faster than Y
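The conversion from "n times faster" to "m% faster" is a one-liner. A sketch with an illustrative function name, using the earlier 10 s versus 15 s example as input:

```python
def percent_faster(time_x: float, time_y: float) -> float:
    """Return m such that X is m% faster than Y, where n - 1 = m / 100."""
    n = time_y / time_x  # X is n times faster than Y
    return (n - 1.0) * 100.0


if __name__ == "__main__":
    # X takes 10 s, Y takes 15 s.
    print(f"X is {percent_faster(10, 15):.0f}% faster than Y")
```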


    What Programs for Comparison?

    What's wrong with this program as a workload?

    integer A[][], B[][], C[][];

    for (I=0; I


    5 Levels of Programs Used for Evaluation

    Real applications

    Portability, compiler, OS

    Modified (or scripted) applications

    To enhance portability or to focus on one particular aspect of system performance

    Kernels

    Small, key pieces from real programs

    Best way to isolate performance of individual features

    Toy benchmark

    10~100 code lines

    Usually, the user already knows the evaluation results

    Synthetic benchmark

    Whetstone, Dhrystone

    Created artificially to match an average execution profile

    No user runs it


    SPEC CPU Benchmark

    Programs used to measure performance

    Supposedly typical of actual workload

    Standard Performance Evaluation Corp (SPEC)

    Develops benchmarks for CPU, I/O, Web, ...

    SPEC CPU2006

    Elapsed time to execute a selection of programs

    Negligible I/O, so focuses on CPU performance

    Normalize relative to reference machine

    Summarize as geometric mean of performance ratios

    CINT2006 (integer) and CFP2006 (floating-point)

    $$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$


    CINT2006 for Opteron X4 2356

    Name        Description                     IC×10^9   CPI     Tc (ns)   Exec time   Ref time   SPECratio
    perl        Interpreted string processing   2,118     0.75    0.40      637         9,777      15.3
    bzip2       Block-sorting compression       2,389     0.85    0.40      817         9,650      11.8
    gcc         GNU C Compiler                  1,050     1.72    0.40      724         8,050      11.1
    mcf         Combinatorial optimization      336       10.00   0.40      1,345       9,120      6.8
    go          Go game (AI)                    1,658     1.09    0.40      721         10,490     14.6
    hmmer       Search gene sequence            2,783     0.80    0.40      890         9,330      10.5
    sjeng       Chess game (AI)                 2,176     0.96    0.40      837         12,100     14.5
    libquantum  Quantum computer simulation     1,623     1.61    0.40      1,047       20,720     19.8
    h264avc     Video compression               3,102     0.80    0.40      993         22,130     22.3
    omnetpp     Discrete event simulation       587       2.94    0.40      690         6,250      9.1
    astar       Games/path finding              1,082     1.79    0.40      773         7,020      9.1
    xalancbmk   XML parsing                     1,058     2.70    0.40      1,143       6,900      6.0
    Geometric mean                                                                                 11.7

    Figure annotation: high cache miss rates
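The table's summary row can be reproduced from its SPECratio column. The sketch below takes the geometric mean of the twelve ratios as printed, and also recomputes the perl execution time from IC × CPI × Tc as a sanity check. Small differences from the table are expected, since its entries are rounded.

```python
import math

# SPECratio column of the CINT2006 table, in row order (perl ... xalancbmk).
SPEC_RATIOS = [15.3, 11.8, 11.1, 6.8, 14.6, 10.5, 14.5, 19.8, 22.3, 9.1, 9.1, 6.0]


def geometric_mean(values):
    """n-th root of the product of n positive values, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def exec_time_s(ic_billions: float, cpi: float, tc_ns: float) -> float:
    """Execution time in seconds; the 10^9 (IC) and 10^-9 (ns) factors cancel."""
    return ic_billions * cpi * tc_ns


if __name__ == "__main__":
    print(f"geometric mean of SPECratios: {geometric_mean(SPEC_RATIOS):.1f}")
    perl_time = exec_time_s(2118, 0.75, 0.40)
    print(f"perl: {perl_time:.0f} s, SPECratio about {9777 / perl_time:.1f}")
```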


    Reporting Performance

    Guiding principle: reproducible

    List everything another experimenter would need to duplicate the results (especially the input set)

    Example

    Hardware

    CPU: 3.2-GHz Pentium 4 Extreme Edition
    L3 cache size: 2048KB (I+D) on chip
    Memory: 4 x 512 MB
    Disk subsystem: 1 x 80GB ATA/100 7200RPM

    Software

    OS: Windows XP Professional SP1
    Compiler: Intel C++ Compiler 7.1


    Compare/Summarize Performance

    WLOG, 2 different ways

    1. Arithmetic mean

    $$\frac{1}{n} \sum_{i=1}^{n} \text{Time}_i$$

    Time_i is the execution time for the ith program in the workload

    Weighted arithmetic mean

    $$\sum_{i=1}^{n} \text{Weight}_i \times \text{Time}_i$$

    Weight_i factors add up to 1

    2. Geometric mean

    $$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$

    To normalize to a reference machine (e.g. SPEC)

    Execution time ratio_i is the execution time normalized to the reference machine, for the ith program

    Which one is better? Or is there any difference?


    Example

    1. The arithmetic mean performance varies from ref. to ref.

    2. The geometric mean performance is consistent
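The slide's own example did not survive in this transcript, so the sketch below uses made-up execution times for two hypothetical machines A and B on two programs. It normalizes each machine's times to a reference machine, then compares A to B using the arithmetic mean and the geometric mean of the normalized times.

```python
import math

# Hypothetical execution times in seconds for programs P1 and P2.
TIMES = {"A": [1.0, 1000.0], "B": [10.0, 100.0]}


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalized(times, reference):
    """Per-program execution time ratios relative to the reference machine."""
    return [t / r for t, r in zip(times, reference)]


def a_over_b(mean, reference_name):
    """Summary of A divided by summary of B, both normalized to one reference."""
    ref = TIMES[reference_name]
    return mean(normalized(TIMES["A"], ref)) / mean(normalized(TIMES["B"], ref))


if __name__ == "__main__":
    for ref in ("A", "B"):
        print(
            f"reference {ref}: arithmetic {a_over_b(arithmetic_mean, ref):.3f}, "
            f"geometric {a_over_b(geometric_mean, ref):.3f}"
        )
```

With these numbers the arithmetic mean of normalized times ranks A ahead of B under one reference and behind it under the other, while the geometric mean gives the same ratio under both.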


    Remark

    SPECRatio is just a ratio rather than an absolute execution time

    Note that when comparing 2 computers as a ratio, execution times on the reference computer drop out, so choice of reference computer is irrelevant

    e.g.,

    $$1.25 = \frac{\text{SPECRatio}_A}{\text{SPECRatio}_B} = \frac{\text{ExecutionTime}_{\text{reference}} / \text{ExecutionTime}_A}{\text{ExecutionTime}_{\text{reference}} / \text{ExecutionTime}_B} = \frac{\text{ExecutionTime}_B}{\text{ExecutionTime}_A} = \frac{\text{Performance}_A}{\text{Performance}_B}$$

    SPEC CINT2000 and CFP2000 Rating for Pentium III and 4 at Different Clock Rates

    1. Performance scales with the clock frequency. This is not the usual case.

    Losses in the memory system are not shown.

    2. Pentium III performs better for CINT2000 than for CFP2000.

    Pentium 4 is the reverse.


    Pitfall: MIPS as a Performance Metric

    MIPS: Millions of Instructions Per Second

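    Doesn't account for

    Differences in ISAs between computers

    Differences in complexity between instructions

    $$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^{6}} = \frac{\text{Instruction count}}{\dfrac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} \times 10^{6}} = \frac{\text{Clock rate}}{\text{CPI} \times 10^{6}}$$

    CPI varies between programs on a given CPU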
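The two forms of the MIPS formula can be checked against each other. The clock rate, CPI values, and instruction count below are hypothetical, chosen only to show that the same CPU reports a different MIPS rating when a program's CPI changes.

```python
def mips_from_time(instruction_count: float, execution_time_s: float) -> float:
    """MIPS = instruction count / (execution time x 10^6)."""
    return instruction_count / (execution_time_s * 1e6)


def mips_from_rate(clock_rate_hz: float, cpi: float) -> float:
    """MIPS = clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


if __name__ == "__main__":
    clock_rate = 4e9       # hypothetical 4 GHz CPU
    instructions = 1e9     # hypothetical program length
    for cpi in (1.0, 2.0):  # two hypothetical programs on the same CPU
        exec_time = instructions * cpi / clock_rate
        print(cpi, mips_from_time(instructions, exec_time), mips_from_rate(clock_rate, cpi))
```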


    Why do we need parallel processing? What are the facts behind multiprocessors?


    Uniprocessor Performance

    Constrained by power, instruction-level parallelism, memory latency

    Section 1.6: The Sea Change: The Switch to Multiprocessors


    Power Trends

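    In CMOS IC technology

    $$\text{Power} = \text{Capacitive load} \times \text{Voltage}^{2} \times \text{Frequency}$$

    Figure annotations: power ×30, voltage 5V → 1V, frequency ×1000

    Section 1.5: The Power Wall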


    Power is a Limiter

    [Figure: power (watts, log scale, 0.1 to 100,000) vs. year (1971 to 2008) for Intel processors 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium, P6; projected levels of 500W, 1.5KW, 5KW, 18KW; annotation: transition from NMOS to CMOS]

    Power delivery and dissipation will be prohibitive!

    Source: Borkar, De (Intel)


    Power Density will Increase

    [Figure: power density (W/cm^2, log scale, 1 to 10,000) vs. year (1970 to 2010) for Intel processors 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium, P6; reference levels: Hot Plate, Nuclear Reactor, Rocket Nozzle, Sun's Surface]

    Power densities too high to keep junctions at low temps

    Source: Borkar, De (Intel)


    Reducing Power

    Suppose a new CPU has

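    85% of capacitive load of old CPU

    15% voltage and 15% frequency reduction

    $$\frac{P_{\text{new}}}{P_{\text{old}}} = \frac{C_{\text{old}} \times 0.85 \times (V_{\text{old}} \times 0.85)^{2} \times F_{\text{old}} \times 0.85}{C_{\text{old}} \times V_{\text{old}}^{2} \times F_{\text{old}}} = 0.85^{4} = 0.52$$

    The power wall

    We can't reduce voltage further

    We can't remove more heat

    How else can we improve performance?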
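The power ratio above follows directly from Power = Capacitive load × Voltage² × Frequency. A small sketch with illustrative function names; the absolute values of C, V, and F cancel, so only the scale factors matter.

```python
def dynamic_power(capacitive_load: float, voltage: float, frequency: float) -> float:
    """Power = capacitive load x voltage^2 x frequency."""
    return capacitive_load * voltage ** 2 * frequency


def power_ratio(cap_scale: float, volt_scale: float, freq_scale: float) -> float:
    """P_new / P_old when each quantity is multiplied by the given scale factor."""
    return cap_scale * volt_scale ** 2 * freq_scale


if __name__ == "__main__":
    # 85% capacitive load, 15% lower voltage, 15% lower frequency.
    print(f"P_new / P_old = {power_ratio(0.85, 0.85, 0.85):.2f}")
```

Voltage enters squared, which is why lowering it was the main lever before the power wall.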


    Multiprocessors

    Multicore microprocessors

    More than one processor per chip

    Requires explicitly parallel programming

    Compare with instruction level parallelism

    Hardware executes multiple instructions at once

    Hidden from the programmer

    Hard to do

    Programming for performance

    Load balancing

    Optimizing communication and synchronization


    SPEC Power Benchmark

    Power consumption of server at different workload levels

    Performance: ssj_ops/sec

    Power: Watts (Joules/sec)

    $$\text{Overall ssj\_ops per Watt} = \left(\sum_{i=0}^{10} \text{ssj\_ops}_i\right) \Bigg/ \left(\sum_{i=0}^{10} \text{power}_i\right)$$


    SPECpower_ssj2008 for X4

    Target Load %        Performance (ssj_ops/sec)   Average Power (Watts)
    100%                 231,867                     295
    90%                  211,282                     286
    80%                  185,803                     275
    70%                  163,427                     265
    60%                  140,160                     256
    50%                  118,324                     246
    40%                  92,035                      233
    30%                  70,500                      222
    20%                  47,126                      206
    10%                  23,066                      180
    0%                   0                           141
    Overall sum          1,283,590                   2,605
    ∑ssj_ops / ∑power                                493
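The overall figure can be recomputed from the rows of the table. In the sketch below, the 40% row is read as 92,035 ssj_ops, and the fused last row is read as 10% at 180 W followed by 0% at 141 W; both readings are assumptions about garbled source text, although they agree with the printed column sums.

```python
# Rows from 100% target load down to 0% (active idle).
SSJ_OPS = [231867, 211282, 185803, 163427, 140160, 118324,
           92035, 70500, 47126, 23066, 0]
POWER_WATTS = [295, 286, 275, 265, 256, 246, 233, 222, 206, 180, 141]


def overall_ssj_ops_per_watt(ops, watts) -> float:
    """Sum of ssj_ops over all load levels divided by sum of average power."""
    return sum(ops) / sum(watts)


if __name__ == "__main__":
    print(sum(SSJ_OPS), sum(POWER_WATTS))
    print(f"{overall_ssj_ops_per_watt(SSJ_OPS, POWER_WATTS):.0f} ssj_ops per watt")
```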


    Fallacy: Low Power at Idle

    Look back at X4 power benchmark

    At 100% load: 295W

    At 50% load: 246W (83%)

    At 10% load: 180W (61%)

    Google data center

    Mostly operates at 10% to 50% load

    At 100% load less than 1% of the time

    Consider designing processors to make power proportional to load


    Concluding Remarks

    Cost/performance is improving

    Due to underlying technology development

    Hierarchical layers of abstraction

    In both hardware and software

    Instruction set architecture

    The hardware/software interface

    Execution time: the best performance measure

    Power is a limiting factor

    Use parallelism to improve performance

    Section 1.9: Concluding Remarks

