Measuring and Discussing Computer Performance

Date post: 17-Jan-2022
Page 1: Measuring and Discussing Computer Performance

CSE141 - Carro

Measuring and Discussing Computer Performance

Before that, some announcements!

TAs' office hours:
  Ryan: M W, 2:15-3:15, R3349A
  Jessica: Tu Th, 1:00-2:00
  Fritz: F, 1:30-3:30
  Misaki (tutor): to be defined

There are 15 seats available in 141, and 21 in 141L; I will accept the remaining 8 students (total of 108). Now you should:

  drop from the waiting list;
  add into the class.

Welcome!

The course web page is online; please check it frequently

Performance

Measure, Report, and Summarize
  Make intelligent choices
  See through the marketing hype
  Key to understanding underlying organizational motivation

Why is some hardware better than others for different programs?

What factors of system performance are hardware related? (e.g., Do we need a new machine, or a new operating system?)

How does the machine’s instruction set affect performance?

Which of these airplanes has the best performance?

Airplane           Passengers  Range (mi)  Speed (mph)
Boeing 737-100            101         630          598
Boeing 747                470        4150          610
BAC/Sud Concorde          132        4000         1350
Douglas DC-8-50           146        8720          544

How much faster is the Concorde compared to the 747?

How much bigger is the 747 than the Douglas DC-8?

Which one to pick for a 500-mile trip: 747 or 737?

737: 50 min; 747: 49 min!

It is hard to say based only on one parameter; real life needs a combination of various characteristics

Combining information:

Airplane           Passengers  Range (mi)  Speed (mph)  Passengers x mph
Boeing 737-100            101         630          598            60,398
Boeing 747                470        4150          610           286,700
BAC/Sud Concorde          132        4000         1350           178,200
Douglas DC-8-50           146        8720          544            79,424

More combinations can be made:

Speed / gallons of fuel;

Range / gallons of fuel;

Passenger*Range; Passenger*Range/gallons.
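
The airplane comparisons can be checked with a few lines of Python. This is only a sketch: the figures are copied from the table, while the dictionary layout and the helper names (`trip_minutes`, `passengers_times_mph`) are made up for illustration.

```python
# Figures copied from the airplane table; names and layout are illustrative.
PLANES = {
    "Boeing 737-100": {"passengers": 101, "range_mi": 630, "speed_mph": 598},
    "Boeing 747": {"passengers": 470, "range_mi": 4150, "speed_mph": 610},
    "BAC/Sud Concorde": {"passengers": 132, "range_mi": 4000, "speed_mph": 1350},
    "Douglas DC-8-50": {"passengers": 146, "range_mi": 8720, "speed_mph": 544},
}


def trip_minutes(name, distance_mi):
    """Flight time in minutes at cruise speed, ignoring takeoff and landing."""
    return distance_mi / PLANES[name]["speed_mph"] * 60


def passengers_times_mph(name):
    """The combined 'passengers x mph' metric from the second table."""
    plane = PLANES[name]
    return plane["passengers"] * plane["speed_mph"]


# Ratios asked for on the slide.
concorde_vs_747_speed = (
    PLANES["BAC/Sud Concorde"]["speed_mph"] / PLANES["Boeing 747"]["speed_mph"]
)
b747_vs_dc8_size = (
    PLANES["Boeing 747"]["passengers"] / PLANES["Douglas DC-8-50"]["passengers"]
)

if __name__ == "__main__":
    print(f"Concorde vs 747 speed: {concorde_vs_747_speed:.2f}x")
    print(f"747 vs DC-8 capacity:  {b747_vs_dc8_size:.2f}x")
    for name in ("Boeing 737-100", "Boeing 747"):
        print(f"{name}: {trip_minutes(name, 500):.1f} min for 500 miles")
```

Which plane "wins" depends on the metric chosen: the Concorde on speed, the DC-8 on range, the 747 on passengers x mph.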

Computer Performance: TIME, TIME, TIME

Response Time (latency)

Throughput

If we upgrade a machine with a new processor, what do we increase?

If we add a new machine to the lab, what do we increase?

Execution Time

Elapsed Time
  counts everything (disk and memory accesses, I/O, etc.)
  a useful number, but often not good for comparison purposes

CPU time
  doesn’t count I/O or time spent running other programs
  can be broken up into system time and user time

Our focus: user CPU time
  time spent executing the lines of code that are "in" our program

Relative Performance

can be confusing
  A runs in 12 seconds
  B runs in 20 seconds

  A/B = 0.6, so A is 40% faster, or 1.4X faster, or B is 40% slower
  B/A = 1.67, so A is 67% faster, or 1.67X faster, or B is 67% slower

needs a precise definition

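Book’s Definition of Performance

For some program running on machine X,

\[ \text{Performance}_X = \frac{1}{\text{Execution time}_X} \]

"X is n times faster than Y" means

\[ \frac{\text{Performance}_X}{\text{Performance}_Y} = n \]

Problem:
  machine A runs a program in 20 seconds
  machine B runs the same program in 25 seconds

\[ \frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\text{Execution time}_B}{\text{Execution time}_A} = n = \frac{25}{20} = 1.25 \]

A is 1.25X faster than B!

(For the earlier example, where A runs in 12 seconds and B in 20 seconds, n = 20/12 = 1.67, so A is 1.67X faster than B.)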
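
The definition translates directly into code. The sketch below assumes execution times in seconds; the function names are illustrative and not taken from the book.

```python
def performance(execution_time_s):
    """Performance is defined as the reciprocal of execution time."""
    return 1.0 / execution_time_s


def times_faster(time_x_s, time_y_s):
    """Return n such that 'X is n times faster than Y'.

    n = Performance_X / Performance_Y = time_y_s / time_x_s
    """
    return performance(time_x_s) / performance(time_y_s)


if __name__ == "__main__":
    # Run times of 20 s and 25 s give 25/20 = 1.25.
    print(f"{times_faster(20, 25):.2f}")
    # Run times of 12 s and 20 s give 20/12, about 1.67.
    print(f"{times_faster(12, 20):.2f}")
```

Because the ratio always divides the slower time by the faster one, the "40% faster or 67% faster?" ambiguity disappears.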

Clock Cycles

Instead of reporting execution time in seconds, we often use cycles

\[ \frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}} \]

cycle time = time between ticks = seconds per cycle
clock rate (frequency) = cycles per second (1 Hz = 1 cycle/sec)

A 200 MHz clock has a cycle time of

\[ \frac{1}{200 \times 10^{6}} \times 10^{9} = 5 \text{ nanoseconds} \]

To make it clear:

A certain program needs 6000 cycles, and the clock is running at 200 MHz. How long will it take to complete the program?

Time for program = cycles the program needs * seconds/cycle = $6000 \times \frac{1}{200 \times 10^{6}}$ s = $30 \times 10^{-6}$ s = 30 microseconds!

Piece of cake!
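
As a quick check of the arithmetic, here is a minimal sketch of the two conversions used so far. The function names are hypothetical; the formulas are the ones on the slides (time = cycles x seconds/cycle, and X MHz corresponds to 1000/X ns).

```python
def execution_time_s(cycles, clock_rate_hz):
    """Seconds per program = cycles per program * seconds per cycle."""
    seconds_per_cycle = 1.0 / clock_rate_hz
    return cycles * seconds_per_cycle


def cycle_time_ns(clock_rate_mhz):
    """Cycle time in nanoseconds for a clock rate given in MHz."""
    return 1000.0 / clock_rate_mhz


if __name__ == "__main__":
    t = execution_time_s(6000, 200e6)
    print(f"6000 cycles at 200 MHz: {t * 1e6:.0f} microseconds")
    print(f"200 MHz cycle time: {cycle_time_ns(200):.0f} ns")
```

Note that 6000 cycles at 200 MHz come to 30 microseconds, not 30 seconds.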

How to Improve Performance

\[ \frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}} \]

So, to improve performance (everything else being equal) you can either

  decrease the # of required cycles for a program, or

  decrease the clock cycle time or, said another way,

  increase the clock rate.

How many cycles are required for a program?

Could assume that # of cycles = # of instructions

This assumption is incorrect: different instructions take different amounts of time on different machines.

Why?

[Figure: the 1st, 2nd, 3rd, 4th, 5th, 6th, ... instructions laid out along a time axis]

Different numbers of cycles for different instructions

Multiplication takes more time than addition

Floating point operations take longer than integer ones

Accessing memory takes more time than accessing registers

Important point: changing the cycle time often changes the number of cycles required for various instructions (more later)

Brief review: what is time

CPU Execution Time = CPU clock cycles * Clock cycle time

Every conventional processor has a clock with an associated clock cycle time or clock rate
Every program runs in an integral number of clock cycles

MHz = millions of cycles/second
X MHz = 1000/X nanoseconds cycle time

How many clock cycles?

Number of CPU cycles = Instructions executed * Average Clock Cycles per Instruction (CPI)

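All Together Now

\[ \text{CPU Execution Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} \]

\[ \text{seconds} = \text{instructions} \times \frac{\text{cycles}}{\text{instruction}} \times \frac{\text{seconds}}{\text{cycle}} \]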

Example

"Our favorite program runs in 10 seconds on computer A, which has a 400 MHz clock. We are trying to help a computer designer build a new machine B that will run this program in 6 seconds. The designer can use new (or perhaps more expensive) technology to substantially increase the clock rate, but has informed us that this increase will affect the rest of the CPU design, causing machine B to require 1.2 times as many clock cycles as machine A for the same program. What clock rate should we tell the designer to target?"

Don’t panic, we can easily work this out from basic principles

A: 400 MHz, 10 s, X clock cycles;
B: ? MHz, 6 s, 1.2*X clock cycles;

10 s = X * 1/(400 MHz), so the program takes X = $4000 \times 10^{6}$ cycles on A
B takes $1.2 \times 4000 \times 10^{6} = 4800 \times 10^{6}$ cycles
since B requires 6 s, $4800 \times 10^{6}$ cycles / 6 s = $800 \times 10^{6}$ cycles/s,

OR: 800 MHz! See that we must double the clock frequency so that B is 10/6 = 1.67 times faster than A.
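
The same steps can be written as a small function. This is a sketch of the reasoning above, with hypothetical parameter names; it assumes the only changes between the two machines are the clock rate and the cycle count.

```python
def required_clock_rate_hz(time_a_s, clock_a_hz, cycle_factor, time_b_s):
    """Clock rate machine B needs in order to finish in time_b_s.

    cycles_A = time_A * clock_rate_A
    cycles_B = cycle_factor * cycles_A
    clock_rate_B = cycles_B / time_B
    """
    cycles_a = time_a_s * clock_a_hz
    cycles_b = cycle_factor * cycles_a
    return cycles_b / time_b_s


if __name__ == "__main__":
    rate = required_clock_rate_hz(
        time_a_s=10, clock_a_hz=400e6, cycle_factor=1.2, time_b_s=6
    )
    print(f"Target clock rate: {rate / 1e6:.0f} MHz")
```

The clock rate doubles while the speedup is only 10/6, because 20% of the gain is spent on the extra cycles.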

Now that we understand cycles

A given program will require

  some number of instructions (machine instructions)
  some number of cycles
  some number of seconds

We have a vocabulary that relates these quantities:

  cycle time (seconds per cycle)
  clock rate (cycles per second)
  CPI (cycles per instruction)
    a floating point intensive application might have a higher CPI
  MIPS (millions of instructions per second)
    this would be higher for a program using simple instructions

Performance

Performance is determined by execution time
Do any of the other variables equal performance?

  # of cycles to execute program?
  # of instructions in program?
  # of cycles per second?
  average # of cycles per instruction?
  average # of instructions per second?

Common pitfall: thinking one of the variables is indicative of performance

CPI Example

Suppose we have two implementations of the same instruction set architecture (ISA).

For some program,

  Machine A has a clock cycle time of 10 ns and a CPI of 2.0
  Machine B has a clock cycle time of 20 ns and a CPI of 1.2

Which machine is faster for this program, and by how much?

Same ISA, same instructions! With I instructions in the program:

  A: I * 2.0 * 10 ns = 20*I ns
  B: I * 1.2 * 20 ns = 24*I ns
  PerformanceA/PerformanceB = execTimeB/execTimeA = 24/20 = 1.2

A is 1.2 times faster than B.
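
A short sketch of this comparison follows, using CPU time = instruction count x CPI x cycle time. The instruction count is unknown, so the code uses an arbitrary placeholder value (an assumption); it cancels out of the ratio.

```python
def cpu_time_s(instruction_count, cpi, cycle_time_s):
    """CPU execution time = instruction count * CPI * clock cycle time."""
    return instruction_count * cpi * cycle_time_s


# Placeholder instruction count: both machines run the same program on the
# same ISA, so any positive value gives the same ratio.
INSTRUCTIONS = 1_000_000

time_a = cpu_time_s(INSTRUCTIONS, cpi=2.0, cycle_time_s=10e-9)
time_b = cpu_time_s(INSTRUCTIONS, cpi=1.2, cycle_time_s=20e-9)

# Performance_A / Performance_B = time_B / time_A
a_times_faster_than_b = time_b / time_a

if __name__ == "__main__":
    print(f"A: {time_a * 1e3:.0f} ms, B: {time_b * 1e3:.0f} ms")
    print(f"A is {a_times_faster_than_b:.1f} times faster than B")
```

Machine B has the lower CPI, yet A wins because its shorter cycle time more than compensates.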

Who Affects Performance?

\[ \text{CPU Execution Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} \]

  programmer
  compiler
  instruction-set architect
  machine architect
  hardware designer
  materials scientist/physicist/silicon engineer

Performance Variation

\[ \text{CPU Execution Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} \]

                                             Number of instructions  CPI        Clock Cycle Time
Same machine, different programs             different               similar    same
Same programs, different machines, same ISA  same                    different  different
Same programs, different machines            somewhat different      different  different

# of Instructions Example

A compiler designer is trying to decide between two code sequences for a particular machine. Based on the hardware implementation, there are three different classes of instructions: Class A, Class B, and Class C, and they require one, two, and three cycles (respectively).

The first code sequence has 5 instructions: 2 of A, 1 of B, and 2 of C.
The second sequence has 6 instructions: 4 of A, 1 of B, and 1 of C.

Which sequence will be faster? How much?
What is the CPI for each sequence?

Solution

Instruction class  CPI for the instruction class
A                  1
B                  2
C                  3

Instruction count for sequences

Code sequence  A  B  C  Total
1              2  1  2  = 5 inst
2              4  1  1  = 6 inst

CPU_CK_CYCLES1 = 2*1 + 1*2 + 2*3 = 10 cycles

CPU_CK_CYCLES2 = 4*1 + 1*2 + 1*3 = 9 cycles
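
The solution stops at the cycle counts; the remaining questions (how much faster, and the CPI of each sequence) follow from them. The sketch below is one way to finish the arithmetic. The dictionary layout and function names are illustrative assumptions.

```python
# Cycles per instruction for each class, from the problem statement.
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}


def total_cycles(counts):
    """Sum of (instructions in class * cycles for that class)."""
    return sum(n * CYCLES_PER_CLASS[cls] for cls, n in counts.items())


def average_cpi(counts):
    """Average CPI = total cycles / total instructions."""
    return total_cycles(counts) / sum(counts.values())


sequence_1 = {"A": 2, "B": 1, "C": 2}
sequence_2 = {"A": 4, "B": 1, "C": 1}

if __name__ == "__main__":
    for name, seq in (("sequence 1", sequence_1), ("sequence 2", sequence_2)):
        print(f"{name}: {total_cycles(seq)} cycles, CPI = {average_cpi(seq):.2f}")
    speedup = total_cycles(sequence_1) / total_cycles(sequence_2)
    print(f"sequence 2 is {speedup:.2f} times faster")
```

Sequence 2 executes more instructions but fewer cycles, so on the same clock it is faster by a factor of 10/9, with a CPI of 1.5 against 2.0 for sequence 1.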

MIPS example

Two different compilers are being tested for a 500 MHz machine with three different classes of instructions: Class A, Class B, and Class C, which require one, two, and three cycles (respectively). Both compilers are used to produce code for a large piece of software.

The first compiler’s code uses 5 billion Class A instructions, 1 billion Class B instructions, and 1 billion Class C instructions.

The second compiler’s code uses 10 billion Class A instructions, 1 billion Class B instructions, and 1 billion Class C instructions.

Which sequence will be faster according to MIPS?
Which sequence will be faster according to execution time?

MIPS example II

Instruction count (billions) for instruction class

Code from    A   B   C
Compiler 1   5   1   1
Compiler 2   10  1   1

CPU_CK_CYCLES1 = (5*1 + 1*2 + 1*3) $\times 10^{9}$ = $10 \times 10^{9}$

CPU_CK_CYCLES2 = (10*1 + 1*2 + 1*3) $\times 10^{9}$ = $15 \times 10^{9}$

EXEC_TIME1 = $10 \times 10^{9} / (500 \times 10^{6})$ = 20 s

EXEC_TIME2 = $15 \times 10^{9} / (500 \times 10^{6})$ = 30 s

MIPS example III

MIPS = Instruction count / (execution time $\times 10^{6}$)
MIPS1 = $(5+1+1) \times 10^{9} / (20 \times 10^{6})$ = 350
MIPS2 = $(10+1+1) \times 10^{9} / (30 \times 10^{6})$ = 400

Code from compiler 2 has higher MIPS, but code from compiler 1 runs faster!
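
The MIPS pitfall is easy to reproduce. The sketch below uses the instruction counts from the solution table (in billions) and the 500 MHz clock; the helper names are illustrative.

```python
# Cycles per instruction for each class and the clock rate, from the problem.
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}
CLOCK_RATE_HZ = 500e6


def execution_time_s(counts):
    """Execution time = total clock cycles / clock rate."""
    cycles = sum(n * CYCLES_PER_CLASS[cls] for cls, n in counts.items())
    return cycles / CLOCK_RATE_HZ


def mips(counts):
    """MIPS = instruction count / (execution time * 10^6)."""
    return sum(counts.values()) / (execution_time_s(counts) * 1e6)


# Instruction counts per class, taken from the solution table (x 10^9).
compiler_1 = {"A": 5e9, "B": 1e9, "C": 1e9}
compiler_2 = {"A": 10e9, "B": 1e9, "C": 1e9}

if __name__ == "__main__":
    for name, counts in (("compiler 1", compiler_1), ("compiler 2", compiler_2)):
        print(f"{name}: {execution_time_s(counts):.0f} s, {mips(counts):.0f} MIPS")
```

Compiler 2 scores higher on MIPS only because it executes many more cheap Class A instructions; the program itself takes longer.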

Benchmarks

Performance best determined by running a real application
  Use programs typical of expected workload
  Or, typical of expected class of applications
    e.g., compilers/editors, scientific applications, graphics, etc.

Small benchmarks
  nice for architects and designers
  easy to standardize
  can be abused

SPEC (System Performance Evaluation Cooperative)
  companies have agreed on a set of real programs and inputs
  valuable indicator of performance (and compiler technology)

[Figure: SPEC performance ratio (axis from 0 to 800) per benchmark (gcc, espresso, spice, doduc, nasa7, li, eqntott, matrix300, fpppp, tomcatv), comparing "Compiler" with "Enhanced compiler"]

Benchmark  Description
go         Artificial intelligence; plays the game of Go
m88ksim    Motorola 88k chip simulator; runs test program
gcc        The GNU C compiler generating SPARC code
compress   Compresses and decompresses file in memory
li         Lisp interpreter
ijpeg      Graphic compression and decompression
perl       Manipulates strings and prime numbers in the special-purpose programming language Perl
vortex     A database program
tomcatv    A mesh generation program
swim       Shallow water model with 513 x 513 grid
su2cor     Quantum physics; Monte Carlo simulation
hydro2d    Astrophysics; hydrodynamic Navier-Stokes equations
mgrid      Multigrid solver in 3-D potential field
applu      Parabolic/elliptic partial differential equations
turb3d     Simulates isotropic, homogeneous turbulence in a cube
apsi       Solves problems regarding temperature, wind velocity, and distribution of pollutant
fpppp      Quantum chemistry
wave5      Plasma physics; electromagnetic particle simulation

Does doubling the clock rate double the performance?

Can a machine with a slower clock rate have better performance?

[Figure: two plots, SPECint and SPECfp (axes from 0 to 10), against clock rate (50 to 250 MHz) for the Pentium and the Pentium Pro]

Amdahl’s Law (or: common sense as math)

Execution Time After Improvement =

Execution Time Unaffected + ( Execution Time Affected / Amount of Improvement )

Example:

"Suppose a program runs in 100 seconds on a machine, with multiply responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?"

100 s / 4 = (100 - 80) s + 80 s / x, that is, 25 = 20 + 80/x

x = 80/5 = 16, so multiplication should improve by a factor of 16
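
The example can be verified numerically. The following is a minimal sketch of the formula above; the function names and the error handling are assumptions made for illustration.

```python
def time_after_improvement(unaffected_s, affected_s, improvement):
    """Amdahl's Law: only the affected part of the run time shrinks."""
    return unaffected_s + affected_s / improvement


def required_improvement(total_s, affected_s, target_speedup):
    """Improvement factor needed on the affected part to hit target_speedup."""
    target_time = total_s / target_speedup
    unaffected = total_s - affected_s
    budget = target_time - unaffected  # time left for the affected part
    if budget <= 0:
        raise ValueError("target speedup is not reachable")
    return affected_s / budget


if __name__ == "__main__":
    x = required_improvement(total_s=100, affected_s=80, target_speedup=4)
    print(f"multiplication must be {x:.0f} times faster")
    print(f"new run time: {time_after_improvement(20, 80, x):.0f} s")
```

Plugging the answer back in gives 20 + 80/16 = 25 seconds, which is the 4x target.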


Amdahl’s Law II

How about making the program run 5 times faster?
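
Working this question through the same equation (a sketch of the arithmetic, assuming the same program with 80 of its 100 seconds spent in multiplication, and writing $x$ for the improvement factor) suggests the target cannot be met:

```latex
\[
\frac{100}{5} = (100 - 80) + \frac{80}{x}
\;\Longrightarrow\;
20 = 20 + \frac{80}{x}
\;\Longrightarrow\;
\frac{80}{x} = 0
\]
```

No finite $x$ satisfies this: the 20 seconds that do not involve multiplication already use up the whole 20-second budget, so improving multiplication alone can approach a speedup of 5 but never reach it.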

Example

Suppose we enhance a machine making all floating-point instructions run five times faster. If the execution time of some benchmark before the floating-point enhancement is 10 seconds, what will the speedup be if half of the 10 seconds is spent executing floating-point instructions?

We are looking for a benchmark to show off the new floating-point unit described above, and want the overall benchmark to show a speedup of 3. One benchmark we are considering runs for 100 seconds with the old floating-point hardware. How much of the execution time would floating-point instructions have to account for in this program in order to yield our desired speedup on this benchmark?
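
Both questions can be answered with the fraction form of Amdahl's Law, speedup = 1 / ((1 - f) + f/k), where f is the fraction of time affected and k the enhancement factor. The slides do not give the answers, so the sketch below is only one way to work them out; the function names are hypothetical.

```python
def overall_speedup(fraction_enhanced, enhancement):
    """Overall speedup when a fraction of the run time gets faster."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / enhancement)


def fraction_needed(target_speedup, enhancement):
    """Fraction of run time that must be enhanced to reach target_speedup.

    Solves 1/S = (1 - f) + f/k for f.
    """
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / enhancement)


if __name__ == "__main__":
    # First question: half of 10 s is floating point, made 5 times faster.
    print(f"speedup: {overall_speedup(0.5, 5):.2f}")
    # Second question: share of a 100 s run needed for a speedup of 3.
    f = fraction_needed(3, 5)
    print(f"floating point must account for {f * 100:.1f} s of the 100 s")
```

Under these assumptions the first benchmark speeds up by about 1.67 (10 s becomes 5 + 1 = 6 s), and the second needs roughly 83.3 of its 100 seconds in floating-point code to show a speedup of 3.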

Remember

Performance is specific to a particular program or set of programs

Total execution time is a consistent summary of performance

For a given architecture, performance increases come from:

  increases in clock rate (without adverse CPI effects)
  improvements in processor organization that lower CPI
  compiler enhancements that lower CPI and/or instruction count

