
Post on 26-Dec-2015


INTRODUCTION

Jehan-François Pâris
jparis@uh.edu

An evolving field

• Computer architectures keep changing
  – Building faster computers
    • Supercomputers and data centers
  – Building cheaper, smaller computers
    • Laptops, notebooks, netbooks, smartbooks
  – Putting computer systems everywhere
    • Cars, cell phones, HDTV: embedded computers

An analogy

• Electrical motors
  – Replaced the single steam engine powering many machines through transmission belts and pulleys
  – One electrical motor per machine
  – Domestic appliances, car starters, …
  – Power tools
  – Power windows, electrical toothbrushes, …

The coming revolution

• Cannot keep increasing CPU clock frequency much above 2 to 3 GHz without running into severe heat dissipation problems
  – Switch to multicore architectures
    • Two, four, eight, … CPUs per chip
  – Creates new problems
    • Hardware: cache synchronization
    • Software: programming these beasts. Ouch!

Other challenges

• Reducing power consumption of data centers
  – Often contain archival data that are very rarely accessed
• Finding new ways to keep increasing magnetic disk capacity
• Dealing with physical limits to SDRAM density
  – Will never get 8 TB SODIMM modules
• Finding a replacement for hard drives

Classical computer components

• Input
• Output
• Memory
• Datapath
• Control
  – Datapath + Control = Processor
• Storage subsystem is missing!

A laptop motherboard

The course philosophy

• Showing you how computers work is fine

• Showing you how to make them faster is better!

PERFORMANCE ISSUES

• Defining performance
• Measuring it
  – Not an easy task
• Evaluating the impact of
  – Amount of work done by each instruction
  – Time instructions take to run
  – CPU clock speed

Measuring Performance

• Inverse of execution time of a benchmark
  Performance = 1 / Execution Time
• If computers A and B are such that
  Execution Time_A < Execution Time_B
  for the same benchmark, then
  Performance_A > Performance_B
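The definition above can be sketched in a few lines of Python. The timings are made-up values chosen for illustration, not measurements, and the function name `performance` is an assumption of this sketch.

```python
def performance(execution_time_s: float) -> float:
    """Performance defined as the inverse of execution time."""
    if execution_time_s <= 0:
        raise ValueError("execution time must be positive")
    return 1.0 / execution_time_s


# Hypothetical timings (seconds) of the same benchmark on computers A and B.
time_a = 8.0
time_b = 10.0

perf_a = performance(time_a)
perf_b = performance(time_b)

# The performance ratio is the inverse of the execution time ratio.
relative = perf_a / perf_b
print(f"A is {relative:.2f} times as fast as B")
```

Because A has the shorter execution time, it has the higher performance, and the ratio of performances equals Execution Time_B / Execution Time_A.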

SPEC CPU Benchmark

• SPEC CPU2006
  – Set of 12 integer and 17 floating-point benchmarks
  – Results are normalized:
    Execution time on a reference processor / Execution time on benchmarked processor
  – Single value is geometric mean of these ratios

How it is computed (I)

• Two new processors P and Q compared to a reference processor R
• Execution times for n benchmarks
  – P_1, P_2, …, P_n
  – Q_1, Q_2, …, Q_n
  – R_1, R_2, …, R_n

How it is computed (II)

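• SPEC value for processor P is

  $$\mathrm{SPEC}_P = \sqrt[n]{\prod_{i=1}^{n} \frac{R_i}{P_i}}$$

• Observe that

  $$\frac{\mathrm{SPEC}_P}{\mathrm{SPEC}_Q} = \sqrt[n]{\prod_{i=1}^{n} \frac{Q_i}{P_i}}$$

  (property of geometric mean)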
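As a quick check of the geometric-mean property, here is a minimal Python sketch. The execution times are invented for illustration and `spec_rating` is a hypothetical helper, not part of any SPEC tool.

```python
import math


def spec_rating(reference_times, measured_times):
    """Geometric mean of the ratios R_i / P_i over all benchmarks."""
    if not reference_times or len(reference_times) != len(measured_times):
        raise ValueError("need two non-empty lists of equal length")
    ratios = [r / p for r, p in zip(reference_times, measured_times)]
    return math.prod(ratios) ** (1.0 / len(ratios))


# Made-up execution times (seconds) for n = 3 benchmarks.
r_times = [100.0, 200.0, 400.0]  # reference processor R
p_times = [50.0, 100.0, 100.0]   # processor P
q_times = [100.0, 100.0, 200.0]  # processor Q

spec_p = spec_rating(r_times, p_times)
spec_q = spec_rating(r_times, q_times)

# Geometric mean of Q_i / P_i, computed without the reference processor.
direct = spec_rating(q_times, p_times)

print(f"SPEC_P / SPEC_Q = {spec_p / spec_q:.4f}")
print(f"Geometric mean of Q_i / P_i = {direct:.4f}")
```

The two printed values agree: the reference times cancel out, so the ratio of two SPEC values does not depend on the choice of reference processor.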

Impact of Instruction Set

• Execution Time = Number of Instructions × Mean Instruction Execution Time
  – Gave birth to the idea of more complex instruction sets
    • Each instruction does more
    • Fewer instructions

Impact of Clock Speed

• Execution Time = Number of Clock Cycles × Clock Cycle Time

  same as

  Execution Time = Number of Clock Cycles / Clock Frequency

Putting everything together

• Execution Time = Number of Instructions × Number of Clock Cycles per Instruction × Clock Cycle Time

• Gives us three ways to reduce program execution time
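The sketch below plugs hypothetical numbers into the execution time equation to show how each of the three factors affects the result. The instruction counts, CPI values and clock rates are assumptions chosen for easy arithmetic, not data about any real processor.

```python
def execution_time(instruction_count, cycles_per_instruction, clock_hz):
    """Execution time in seconds.

    Clock cycle time is 1 / clock frequency, so multiplying by the cycle
    time is the same as dividing by the frequency.
    """
    if clock_hz <= 0:
        raise ValueError("clock frequency must be positive")
    return instruction_count * cycles_per_instruction / clock_hz


# Hypothetical baseline: 2 billion instructions, CPI of 1.5, 2 GHz clock.
baseline = execution_time(2.0e9, 1.5, 2.0e9)

# 1. Fewer instructions (25 percent fewer, same CPI and clock).
fewer_instructions = execution_time(1.5e9, 1.5, 2.0e9)

# 2. Faster clock (3 GHz instead of 2 GHz).
faster_clock = execution_time(2.0e9, 1.5, 3.0e9)

# 3. Fewer clock cycles per instruction (CPI of 1.0 instead of 1.5).
lower_cpi = execution_time(2.0e9, 1.0, 2.0e9)

for label, t in [
    ("baseline", baseline),
    ("fewer instructions", fewer_instructions),
    ("faster clock", faster_clock),
    ("lower CPI", lower_cpi),
]:
    print(f"{label:>20}: {t:.3f} s")
```

In practice the three factors are not independent: a more complex instruction set may reduce the instruction count while raising the number of cycles per instruction.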

1. Using fewer instructions

• VAX
  – Super minicomputer designed in the late 70’s
  – Had a complicated instruction set (CISC)
  – Idea was to use more powerful instructions in order to reduce the number of instructions used to perform the most frequent tasks
  – Poor pipelining performance

2. Using a faster clock

• Major reason for explosion of CPU performance in the 80’s and 90’s
  – IBM PC (1981): Intel 8088 @ 4.77 MHz
  – IBM PC AT (1984): Intel 80286 @ 6 and 8 MHz
  – Nowadays up to 3 GHz
• Cannot get much higher!

3. Using better instructions

• Best strategy is to reduce the average number of clock cycles per instruction
  – Favoring fast instructions
  – Using fixed-size instructions to allow pipelining
  – Trying to execute as many tasks as possible in parallel

Amdahl’s Law (I)

• Examples:
  – Supersonic jet
    • Could fly from Houston to Washington in thirty minutes
    • Total travel time would be dominated by travel time to airport and check-in procedures
  – Today's laptops:
    • Disk access times are the bottleneck

Amdahl’s Law (II)

• Assume that we have a technique for improving the performance of some part of a system.

• Let
  – T_o be the time originally spent in the part of the system that can be improved
  – T_i be the time spent in that part once the improvement has been applied
  – T_n be the time spent in the part of the system that remains unaffected

Amdahl’s Law (III)

• The total speedup for the whole system will be

  $$\text{Speedup} = \frac{T_n + T_o}{T_n + T_i}$$

• The maximum possible speedup is reached when T_i → 0

  $$\text{Speedup} = \frac{T_n + T_o}{T_n}$$

An example

• Flying to Washington National Airport takes three hours

• Going to the airport and waiting for the flight takes a minimum of two hours

• Going from the airport to Washington downtown takes a minimum of 30 minutes

• What is the maximum speedup that could be achieved using much faster planes?


Answer

• Current travel time:
  – To airport and wait: 2 hours
  – Plane: 3 hours
  – To downtown by DC metro: 30 minutes
  – Total: 5 hours 30 minutes

Answer

• Assume plane travels at speed of light:
  – To airport and wait: 2 hours
  – Plane: negligible
  – To downtown by DC metro: 30 minutes
  – Total: 2 hours 30 minutes
• Maximum speedup would be

  5h30 / 2h30 = 2.2
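The travel example maps directly onto Amdahl's law: the flight is the part that can be improved and the ground legs are unaffected. The Python sketch below uses the figures from the example; the function and variable names are assumptions made for illustration.

```python
def speedup(t_o, t_i, t_n):
    """Amdahl's law: (T_n + T_o) / (T_n + T_i).

    t_o: time originally spent in the part that can be improved
    t_i: time spent in that part after the improvement
    t_n: time spent in the part that remains unaffected
    """
    if t_n + t_i <= 0:
        raise ValueError("total improved time must be positive")
    return (t_n + t_o) / (t_n + t_i)


# Times in hours, taken from the travel example.
t_n = 2.0 + 0.5  # to airport and wait, plus metro to downtown
t_o = 3.0        # current flight time

# Faster and faster planes: the speedup levels off below the limit.
for t_i in (3.0, 1.5, 0.5, 0.0):
    print(f"flight time {t_i:.1f} h -> speedup {speedup(t_o, t_i, t_n):.2f}")

max_speedup = speedup(t_o, 0.0, t_n)
```

Even an instantaneous flight cannot push the speedup beyond (T_n + T_o) / T_n, because the ground legs still take two and a half hours.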

Trains and buses

• Commuter trains and city buses spend a significant amount of trip time letting travelers get on and off
  – Have wide doors
• Not true for Amtrak trains and intercity buses
  – Fewer, narrower doors


A problem

• Assume we have a technique that reduces the execution time of floating-point operations by 20 percent

• What will be the overall CPU speedup if we expect the CPU to spend 10 percent of its time executing floating-point operations?

• How would that speedup be affected if the CPU spends 30 percent of its time executing floating-point operations?

Solution (I)

• First case:
  – Baseline time = 0.9 × 1 + 0.1 × 1 = 1
  – After improvement = 0.9 × 1 + 0.1 × 0.8 = 0.98
  – Speedup = 1/0.98 = 1.02
• A 2 percent improvement!

Solution (II)

• Second case:
  – Baseline time = 0.7 × 1 + 0.3 × 1 = 1
  – After improvement = 0.7 × 1 + 0.3 × 0.8 = 0.94
  – Speedup = 1/0.94 = 1.064
• A 6.4 percent improvement!
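Both cases can be checked with a short Python sketch that normalizes the baseline time to 1. The function name `overall_speedup` and its parameters are assumptions introduced for this illustration.

```python
def overall_speedup(fraction_improved, time_factor):
    """Overall speedup when a fraction of the run time is scaled by time_factor.

    fraction_improved: share of the baseline time spent in the improved part
    time_factor: new time of that part divided by its old time (0.8 means
                 the part now takes 80 percent of what it used to)
    """
    if not 0.0 <= fraction_improved <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    new_time = (1.0 - fraction_improved) + fraction_improved * time_factor
    return 1.0 / new_time


# Floating-point operations take 10 percent, then 30 percent, of the time.
for fraction in (0.10, 0.30):
    s = overall_speedup(fraction, 0.8)
    print(f"{fraction:.0%} floating point -> speedup {s:.3f}")
```

The larger the share of time spent in floating-point operations, the more the same 20 percent reduction pays off overall.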

REVIEW PROBLEMS

Problem

• Consider a huge program that consists of a purely sequential part that takes two hours and another part that takes eight hours.

What is the maximum speedup we can achieve by parallelizing the second part of the program?

Answer

• Current run time:
  – Sequential part: 2 hours
  – Other part: 8 hours
  – Total: 10 hours
• Minimum run time:
  – Sequential part: 2 hours
  – Other part: negligible
  – Total: 2 hours


• Maximum speedup: 10/2 = 5
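To see how this limit is approached, the Python sketch below assumes that the eight-hour part divides perfectly evenly across N processors with no overhead. That is an idealization added here for illustration, not something the problem states.

```python
def run_time_hours(processors, sequential_hours=2.0, parallel_hours=8.0):
    """Run time when the parallel part is split evenly over the processors.

    Assumes perfect parallelization with no communication overhead.
    """
    if processors < 1:
        raise ValueError("need at least one processor")
    return sequential_hours + parallel_hours / processors


baseline = run_time_hours(1)  # 10 hours on a single processor

speedups = {n: baseline / run_time_hours(n) for n in (1, 2, 4, 8, 64, 1024)}

for n, s in speedups.items():
    print(f"{n:>5} processors -> speedup {s:.2f}")
```

The speedup keeps growing with the number of processors but stays below 5, because the two-hour sequential part never shrinks.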

Problem

• Server motherboard A has a SPEC CPU2006 rating of 31.4 while server motherboard B has a rating of 29.7. Which one of the two motherboards is faster?

Answer


• Motherboard A, because a higher SPEC value means better performance

Fun problem

• Shanghai maglev train runs at 268 mph
• How would it compare to an airplane for going between Houston and Washington, DC?

Fun answer

• Current travel time:
  – To airport and wait: 2 hours
  – Plane: 3 hours
  – To downtown by DC metro: 30 minutes
  – Total: 5 hours 30 minutes
• With maglev:
  – To station: 1 hour
  – Train to downtown DC: 6 hours 30 minutes
  – Total: 7 hours 30 minutes


• Plane is still faster for very long trips