CS 230: Computer Organization and Assembly Language
Aviral Shrivastava
Department of Computer Science and Engineering
School of Computing and Informatics
Arizona State University
Slides courtesy: Prof. Yann Hang Lee, ASU; Prof. Mary Jane Irwin, PSU; Ande Carle, UCB
Announcements
• Alternate Project
  – Due today
• Real examples
• Finals
  – Tuesday, Dec 08, 2009
  – Please come on time (you'll need all the time)
  – Open book, notes, and internet
  – No communication with any other human
Time, Time, Time
• Making a single-cycle implementation is very easy
  – The difficulty and excitement are in making it fast
• Two fundamental methods to make computers fast
  – Pipelining
  – Caches
[Figure: single-cycle datapath — PC and Instruction Memory feed register addresses into the Register File; read data goes through the ALU to address the Data Memory, with read-data and write-data paths back to the Register File]
Effect of high memory latency
• Single-cycle implementation
  – Cycle time becomes very large
  – Operations that do not need memory also slow down
[Figure: the same single-cycle datapath, with the slow Instruction Memory and Data Memory accesses on the critical path]
Effect of high memory latency
[Figure: multi-cycle datapath — a single memory for instructions and data, with intermediate registers (IR, MDR, A, B, ALUout) between the PC, Register File, and ALU]
• Multi-cycle implementation
  – Cycle time becomes long
• But
  – Can make the memory access itself take multiple cycles
  – Avoids penalizing instructions that do not use memory
Effects of high memory latency
[Figure: pipelined datapath — IM, Reg, ALU, DM, Reg stages]
• Pipelined implementation
  – Cycle time becomes long
• But
  – Can make the memory access itself take multiple cycles
  – Avoids penalizing instructions that do not use memory
  – Can overlap execution of other instructions with a memory operation
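To make the single-cycle vs. pipelined contrast concrete, here is a small Python sketch. The stage latencies are hypothetical round numbers chosen for illustration, not figures from these slides.

```python
# Hypothetical stage latencies in ns (illustrative only, not from the slides).
stage_ns = {"IM": 200, "Reg": 100, "ALU": 200, "DM": 200, "WB": 100}

def single_cycle_total(n_instr):
    # Single cycle: the clock must cover the entire path, so every
    # instruction pays the sum of all stage latencies.
    cycle = sum(stage_ns.values())  # 800 ns
    return n_instr * cycle

def pipelined_total(n_instr):
    # Pipelined: the clock only covers the slowest stage; once the
    # pipeline fills, one instruction completes per cycle.
    cycle = max(stage_ns.values())  # 200 ns
    n_stages = len(stage_ns)
    return (n_instr + n_stages - 1) * cycle

print(single_cycle_total(1000))  # 800000 ns
print(pipelined_total(1000))     # 200800 ns
```

With these latencies the pipeline is roughly 4x faster on a long run of instructions, limited by the 200 ns memory stages — which is exactly why memory latency still matters.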
Kinds of Memory
Technology       Capacity      Access time    Cost
CPU registers    100s bytes    < 10s ns
SRAM             K bytes       10-20 ns       $.00003/bit
DRAM             M bytes       50-100 ns      $.00001/bit
Disk             G bytes       ms             10^-6 cents/bit
Tape             infinite      sec-min
(Flip-flops, SRAM, DRAM, disk, tape: faster toward the top, larger toward the bottom)
Memories
• CPU registers, latches
  – Flip-flops: very fast, but very small
• SRAM – static RAM
  – Very fast, low power, but small
  – Data persists as long as there is power
• DRAM – dynamic RAM
  – Very dense
  – Like vanishing ink: the data disappears with time
  – Need to refresh the contents
Flip Flops
• Fastest form of memory
  – Store data using logic gates with feedback (no capacitors involved)
• SR, JK, T, and D flip-flops
SRAM Cell
[Figure: SRAM cell — the computer scientist's view (a stored bit) vs. the electrical engineering view (cross-coupled inverters connected to bit lines b and b')]
A 4-bit SRAM
[Figure: a 4-bit SRAM — one word line selects four SRAM cells; write drivers for Din 0-3, gated by WrEn, and precharge logic drive the bit lines]
A 16x4 Static RAM (SRAM)
[Figure: a 16x4 SRAM — a 4-bit address (A0-A3) feeds an address decoder that asserts one of 16 word lines (Word 0 through Word 15); each word line enables a row of four SRAM cells; sense amps produce Dout 0-3, while write drivers with WrEn and precharge drive Din 0-3 onto the bit lines]
Dynamic RAM (DRAM)
• The value is stored on a capacitor
  – Discharges with time
  – Needs to be refreshed regularly
  – A dummy read recharges the capacitor
• Very high density
  – The newest process technology is tried on DRAMs first
• Intel became popular because of DRAM
  – Was the biggest vendor of DRAM
[Figure: DRAM cell — a pass transistor, gated by the word line, connects the capacitor to the bit line]
Why Not Only DRAM?
• Not large enough for some things
  – Backed up by storage (disk)
  – Virtual memory, paging, etc.
  – Will get back to this
• Not fast enough for processor accesses
  – Takes hundreds of cycles to return data
  – OK in very regular applications
    • Can use SW pipelining, vectors
  – Not OK in most other applications
Is there a problem with DRAM?
[Figure: processor-DRAM memory gap (latency), 1980-2000 — processor performance grows 60%/yr (2x every 1.5 years, "Moore's Law") while DRAM performance grows 9%/yr (2x every 10 years); the processor-memory performance gap grows 50% per year]
Memory Hierarchy Analogy: Library (1/2)
• You're writing an Anthropology term paper at a table in Hayden
• Hayden Library is equivalent to disk
  – Essentially limitless capacity
  – Very slow to retrieve a book
• The table is memory
  – Smaller capacity: you must return a book when the table fills up
  – Easier and faster to find a book there once you've already retrieved it
Memory Hierarchy Analogy: Library (2/2)
• Open books on the table are the cache
  – Smaller capacity: only a few open books fit on the table; again, when the table fills up, you must close a book
  – Much, much faster to retrieve data
• Illusion created: the whole library is open on the tabletop
  – Keep as many recently used books open on the table as possible, since they are likely to be used again
  – Also keep as many books on the table as possible, since that is faster than going to the library
Memory Hierarchy: Goals
• Fact: large memories are slow; fast memories are small
• How do we create a memory that gives the illusion of being large, cheap, and fast (most of the time)?
Memory Hierarchy: Insights
• Temporal locality (locality in time)
  => Keep the most recently accessed data items closer to the processor
• Spatial locality (locality in space)
  => Move blocks consisting of contiguous words to the upper levels
[Figure: the upper-level memory exchanges blocks (Blk X, Blk Y) with the lower-level memory; the processor reads from and writes to the upper level]
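Both kinds of locality show up in ordinary loop code. The Python sketch below contrasts a row-major matrix traversal (good spatial locality: consecutive accesses touch adjacent elements) with a column-major one (consecutive accesses jump between rows). In Python the timing effect is muted by the list-of-lists layout, so treat this as an illustration of the access pattern rather than a benchmark; in C the difference is dramatic.

```python
N = 100
matrix = [[i * N + j for j in range(N)] for i in range(N)]

def sum_row_major(m):
    # Good spatial locality: the inner loop walks consecutive elements
    # of one row, so nearby accesses fall in the same cache blocks.
    total = 0
    for row in m:
        for x in row:
            total += x
    return total

def sum_col_major(m):
    # Poor spatial locality: consecutive accesses jump from row to row,
    # touching a different block on (almost) every access.
    total = 0
    for j in range(len(m[0])):
        for i in range(len(m)):
            total += m[i][j]
    return total

# Same result either way; only the memory access pattern differs.
print(sum_row_major(matrix) == sum_col_major(matrix))  # True
```

The repeated use of `total` in the inner loop is temporal locality at work: a recently used value stays in a register or the cache.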
Memory Hierarchy: Solution
Level          Capacity     Access time            Cost                      Staging/xfer unit              Managed by
Registers      100s bytes   < 10s ns                                        instr. operands (1-8 bytes)    program/compiler
Cache          K bytes      10-100 ns              1-0.1 cents/bit           blocks (8-128 bytes)           cache controller
Main memory    M bytes      200-500 ns             $.0001-.00001 cents/bit   pages (4K-16K bytes)           OS
Disk           G bytes      10 ms (10,000,000 ns)  10^-5 - 10^-6 cents/bit   files (Mbytes)                 user/operator
Tape           infinite     sec-min                10^-8 cents/bit
(Upper levels are faster; lower levels are larger. Our current focus: the cache.)
Memory Hierarchy: Terminology
• Hit: the data appears in some block in the upper level (Block X)
  – Hit rate: fraction of memory accesses found in the upper level
  – Hit time: time to access the upper level, which consists of
    • RAM access time + time to determine hit/miss
• Miss: the data needs to be retrieved from a block in the lower level (Block Y)
  – Miss rate = 1 - (hit rate)
  – Miss penalty: time to replace a block in the upper level + time to deliver the block to the processor
  – Hit time << miss penalty
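These terms combine into the standard average memory access time (AMAT) formula, AMAT = hit time + miss rate x miss penalty. The formula is textbook material rather than something stated on this slide; the numbers below reuse the figures from the worked example that follows (2-cycle cache, 10% miss rate, 100-cycle memory latency).

```python
def amat(hit_time, miss_rate, miss_penalty):
    # Average memory access time: every access pays the hit time to
    # check the upper level; misses additionally pay the miss penalty.
    return hit_time + miss_rate * miss_penalty

print(amat(2, 0.10, 100))  # 12.0 cycles per memory access on average
```

Note the modeling choice: here the miss penalty is the cost paid on top of the hit check, which is the usual textbook convention.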
Memory Hierarchy: Show Me Numbers
• Consider an application where
  – 30% of instructions are loads/stores
  – Memory latency = 100 cycles
  – Time to execute 100 instructions = 70*1 + 30*100 = 3070 cycles
• Add a cache with a 2-cycle latency
  – Suppose the hit rate is 90%
  – Time to execute 100 instructions = 70*1 + 27*2 + 3*100 = 70 + 54 + 300 = 424 cycles
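The arithmetic above can be checked with a short Python sketch. The function names are mine; the model matches the slide, where non-memory instructions take 1 cycle and a cache miss costs the full memory latency.

```python
def cycles_without_cache(n_instr, mem_frac, mem_latency):
    # Non-memory instructions take 1 cycle; every load/store pays
    # the full memory latency.
    mem_ops = round(n_instr * mem_frac)
    return (n_instr - mem_ops) * 1 + mem_ops * mem_latency

def cycles_with_cache(n_instr, mem_frac, mem_latency, cache_latency, hit_rate):
    # Hits pay the cache latency; misses pay the full memory latency.
    mem_ops = round(n_instr * mem_frac)
    hits = round(mem_ops * hit_rate)
    misses = mem_ops - hits
    return (n_instr - mem_ops) * 1 + hits * cache_latency + misses * mem_latency

print(cycles_without_cache(100, 0.30, 100))        # 3070
print(cycles_with_cache(100, 0.30, 100, 2, 0.90))  # 424
```

Even a modest 90% hit rate cuts execution time by more than 7x, which is why the rest of the course focuses on caches.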
Yoda says…
"You will find only what you bring in."