8/14/2019 Random Access Memory and Problems
Problem 1: Interleaving factor
What is the optimal interleaving factor if the memory cycle is 8 CC (1 + 6 + 1)?
Assume 4 banks
[Timing diagram: 1st, 2nd, and 3rd read requests issued to banks b1-b4 over a 20-CC timeline, with the bank accesses overlapped]
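The timing question can be explored with a small simulation. This is a sketch of my own, not from the slides: the function name and the model (one request issued per clock cycle, consecutive addresses striped round-robin across banks, each bank busy for the full 8 CC memory cycle) are assumptions. Under this model, 8 banks keep the bus fully busy, while 4 banks force the CPU to wait on busy banks:

```python
def read_time(n_requests, n_banks, bank_cycle=8):
    """Cycle at which the last of n_requests consecutive-address reads
    completes. Address i maps to bank i % n_banks; a bank is busy for
    bank_cycle CC per request, and at most one request issues per CC."""
    bank_free = [0] * n_banks   # cycle at which each bank becomes free
    issue = 0                   # earliest cycle the next request may issue
    finish = 0
    for i in range(n_requests):
        bank = i % n_banks
        start = max(issue, bank_free[bank])  # wait for the bank if busy
        bank_free[bank] = start + bank_cycle
        finish = start + bank_cycle
        issue = start + 1
    return finish

# With 8 banks a new request can start every cycle and none ever waits;
# with 4 banks each bank is revisited before its 8 CC cycle has elapsed.
print(read_time(16, 8), read_time(16, 4))
```

The simulation suggests the rule of thumb that the interleaving factor should match the memory cycle (8 here) so that a bank is free again by the time it is re-addressed.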
Problem 2 (page 452 in your textbook)
Block size = 1 word
Memory bus size = 1 word
Miss rate = 3%
Memory access per instruction = 1.2
Cache miss penalty = 64 CC
Average cycles per instruction = 2
[Diagram: CPU → Cache → simple memory; 64 CC miss penalty]
Assume 1000 instructions in your Program
If no miss then execution time is 2000 CC
One instruction needs 1.2 memory accesses, so 1000 instructions make 1200 accesses. If the miss rate is 3%, then the number of misses for 1200 accesses is 36.
Execution time = 2000 + 36 × 64 = 4304 CC
Average cycles per instruction = 4304 / 1000 = 4.3
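The arithmetic above generalizes to a one-line stall model. A minimal sketch (function and parameter names are mine, not the textbook's):

```python
def total_cycles(instructions, base_cpi, accesses_per_instr,
                 miss_rate, miss_penalty):
    """Base execution cycles plus memory-stall cycles."""
    base = instructions * base_cpi
    misses = instructions * accesses_per_instr * miss_rate
    return base + misses * miss_penalty

cycles = total_cycles(1000, 2, 1.2, 0.03, 64)   # Problem 2: 2000 + 36 * 64
cpi = cycles / 1000
print(cycles, cpi)
```

The same function reproduces the wider-bus variant by swapping in its miss rate and penalty, e.g. `total_cycles(1000, 2, 1.2, 0.025, 128)`.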
Problem 2 (wider bus: 2 words)
Block size = 4 word
Memory bus size = 2 word
Miss rate = 2.5%
Memory access per instruction = 1.2
Cache miss penalty = 128 CC
Average cycles per instruction = 2
[Diagram: CPU → Cache → interleaved memory]
Assume 1000 instructions in your Program
If no miss then execution time is 2000 CC
One instruction needs 1.2 memory accesses, so 1000 instructions make 1200 accesses. If the miss rate is 2.5%, then the number of misses for 1200 accesses is 30.
Execution time = 2000 + 30 × 128 = 5840 CC
Average cycles per instruction = 5840 / 1000 = 5.84
Problem 2 (2-way interleaving)
Block size = 4 word
Memory bus size = 1 word
Miss rate = 2.5%
Memory access per instruction = 1.2
Cache miss penalty = 68 × 2 CC
Average cycles per instruction = 2
[Diagram: CPU → Cache → 2-way interleaved memory]
Assume 1000 instructions in your Program
If no miss then execution time is 2000 CC
One instruction needs 1.2 memory accesses, so 1000 instructions make 1200 accesses. If the miss rate is 2.5%, then the number of misses for 1200 accesses is 30.
Execution time = 2000 + 30 × 68 × 2 = 6080 CC
Average cycles per instruction = 6080 / 1000 = 6.08
DRAM (Dynamic Random Access Memory)
Address Multiplexing
RAS and CAS
Dynamic refreshing and its implications: data must be written back after a READ!
Amdahl's rule of thumb:
A 1000 MIPS machine should have 1000 MB of memory
Read your textbook for DRAM, SRAM, RAMBUS, and Flash.
SRAM (Static Random Access Memory)
Uses six transistors per bit.
Static
NO NEED to write data back after a READ
Comparison versus DRAM:
More transistors
Lower capacity (about 1/8 that of DRAM)
8-10 times more expensive
No refresh needed
8-16 times faster
Used in caches instead of main memory
Flash
Similar to EEPROM
Ideal choice for embedded systems: low power requirement, no moving parts
Faster READ time, slower WRITE time
Comparison versus EEPROM:
Finer and simpler control
Better capacity
A WRITE operation requires an entire block to be erased and rewritten. Writes are 50 to 100 times slower than READs.
NOR type flash and NAND type flash
Virtual Memory
[Figure: virtual pages A, B, C, D at virtual addresses 0K-12K; C maps to physical address 4K, A to 16K, B to 24K, and D resides on disk. Virtual addresses are translated to physical addresses.]
Virtual Memory versus First-Level Cache

Parameter         First-Level Cache     Virtual Memory
Block/Page size   16-128 B              4K-64K B
Hit time          1-3 CC                50-150 CC
Miss penalty      8-150 CC              1-10 M CC
  Access time     6-130 CC              0.8-8 M CC
  Transfer time   2-20 CC               0.2-2 M CC
Miss rate         0.1-10%               0.00001-0.001%
Address mapping   25-45 bit physical    32-64 bit virtual
                  address to 14-20 bit  address to 25-45 bit
                  cache address         physical address
Memory Hierarchy Questions
Where can a block be placed?
How is a block found if it is in the upper level?
Which block should be replaced?
What happens on a write?
Segmentation versus Paging
Remark                   Page                          Segment
Words per address        One                           Two (segment and offset)
Programmer visible       Invisible to programmer       Visible to programmer
Replacing a block        Trivial                       Hard
Memory use inefficiency  Internal fragmentation        External fragmentation
Efficient disk traffic   Yes (easy to tune page size)  Not always (small segments)

[Figure: code and data split into fixed-size pages versus variable-size segments]
CPU Execution Time
CPU execution time = (CPU cycles + stall cycles) × cycle time
Stall cycles = misses × penalty
Misses are given either as misses per 1000 instructions or as misses per memory access (AKA miss rate).
Instruction count, cycles per instruction, and misses are all required to compute CPU execution time.
Average Access Time with Cache
Average access time = hit time + miss rate × penalty
Multi-level cache:
Avg access time = hit time(L1) + miss rate(L1) × penalty(L1)
penalty(L1) = hit time(L2) + miss rate(L2) × penalty(L2)
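The two-level formula composes the same expression with itself: the L1 miss penalty is the average access time seen at L2. A small sketch; the numeric values below are illustrative assumptions, not from the slides:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time in clock cycles."""
    return hit_time + miss_rate * miss_penalty

# The L1 penalty is itself the average access time of the L2 level.
penalty_l1 = amat(hit_time=10, miss_rate=0.05, miss_penalty=100)  # L2 side
overall = amat(hit_time=1, miss_rate=0.02, miss_penalty=penalty_l1)
print(penalty_l1, overall)
```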
I/O Systems
Objective
To understand mainly disk storage technologies.
Storage Systems
Individual Disks
SCSI Subsystem
RAID
Storage Area Networks (NAS & SAN)
Disk Storage
Disk storage is slower than memory.
Disk offers better volume at a lower unit price.
Disks fit nicely into the virtual memory concept.
A critical piece of overall system performance.
Courtesy: www.shortcourses.com/choosing/storage/06.htm
Disk System
Disk pack & cache
Disk embedded controller
SCSI host adapter
OS (kernel/driver)
User program

[Diagram: user program → OS (kernel/driver) → SCSI host adapter on the host I/O bus → SCSI bus → disk controllers → disks]
Individual Disks
Reference: http://www.stjulians.com/cs/diskstoragenotes.html
Components of Disk Access Time
Seek Time
Rotational Latency
Internal transfer time
Other delays
That is,
Avg access time = avg seek time + avg rotational delay + transfer time + other overhead
Problem (page 684)
Seek time = 5 ms/100 tracks
RPM = 10,000
Transfer rate = 40 MB/sec
Other delays = 0.1 ms
Sector size = 0.5 KB (512 B)
Average access time = average seek time (5 ms) + average rotational delay (time for 1/2 revolution) + transfer time (512 / (40 × 10⁶) s) + other overhead (0.1 ms)
= 5 + 10³ × 60 / (2 × 10000) + 512 × 10³ / (40 × 10⁶) + 0.1 ms
= 5 + 3 + 0.0128 + 0.1 = 8.1128 ms
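The same computation as a function (a sketch of my own; the function and parameter names are assumptions):

```python
def avg_access_ms(seek_ms, rpm, sector_bytes, transfer_bytes_per_s,
                  overhead_ms):
    """Average disk access time in milliseconds."""
    rotational_ms = 0.5 * 60_000 / rpm                    # half a revolution
    transfer_ms = sector_bytes / transfer_bytes_per_s * 1000
    return seek_ms + rotational_ms + transfer_ms + overhead_ms

t = avg_access_ms(seek_ms=5, rpm=10_000, sector_bytes=512,
                  transfer_bytes_per_s=40e6, overhead_ms=0.1)
print(t)   # 5 + 3 + 0.0128 + 0.1
```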
RAID
RAID stands for Redundant Array of Inexpensive/Independent Disks.
Uses a number of small disks instead of one large disk.
Six+ types of RAID (0-5).
Terminologies
Mean Time Between Failures (MTBF)
Mean Time To Data Loss (MTTDL)
Mean Time To Data Inaccessibility (MTTDI)
RAID-0 Performance
Throughput: best case, nearly n × the single-disk value
Utilization: worst case, nearly (1/n) × the single-disk value
Data reliability: rⁿ, where r is the reliability of one disk (r ≤ 1)
Sequential access: fast
Random access: multithreaded random access offers better performance
When r = 0.8 and n = 2, reliability is 0.64
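The reliability claim is easy to check numerically (function name is mine):

```python
def raid0_reliability(r, n):
    """RAID-0 survives only if every one of its n disks survives."""
    return r ** n

print(raid0_reliability(0.8, 2))   # the slide's example: 0.64
```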
RAID-1 Mirroring
Issue addressed: RAID-0's reliability problem.
Shadowing or mirroring is used.
Data is not lost when a disk goes down.
One or more disks can be used to mirror the primary disk.
Writes are posted to the primary and shadowing disks.
Reads can be served from any of them.

[Figure: blocks 01-03 duplicated on the primary disk and its mirror]
RAID-1 Performance
Reliability is improved with mirroring: 1 − (1 − r)(1 − r)
Example: when r is 0.8, the reliability of RAID-1 is 0.96.
Writes are more complex: they must be committed to the primary and all shadowing disks.
Writes are much slower due to the atomicity requirement.
Expensive due to 1-to-1 redundancy.
RAID 0+1 - Striping & Mirroring

[Figure: RAID 1 mirrors two RAID-0 stripes of n disks each; blocks 01-06 are striped identically across both sets]
Performance RAID-0+1
Let the reliability of a RAID-0 sub-tree be R.
Then the reliability of the RAID-1 tree = 1 − (1 − R)²
R = r² (where r is the reliability of a single disk)
Throughput is the same as RAID-0, but with 2 × n disks.
Utilization is lower than RAID-0 due to mirroring.
Writes are marginally slower due to atomicity.
When r = 0.9, R = 0.81, and the reliability is 1 − (0.19)² ≈ 0.96.
RAID 1+0 - Mirroring & Striping

[Figure: RAID 0 stripes across RAID-1 mirrored pairs; blocks 01/03/05 on one mirrored pair, 02/04/06 on the other]
Performance RAID-1+0
Let the reliability of a RAID-1 sub-tree be R.
Then the reliability of the RAID-0 tree = R²
R = 1 − (1 − r)² (where r is the reliability of a single disk)
Throughput is the same as RAID-0, but with 2 × n disks.
Utilization is lower than RAID-0 due to mirroring.
Writes are marginally slower due to atomicity.
When r = 0.9, R = 0.99, and the reliability is (0.99)² ≈ 0.98.
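The two orderings can be compared side by side (a sketch; function names are mine, and n is the number of disks per stripe or pairs per stripe respectively):

```python
def raid01_reliability(r, n):
    """RAID 0+1: two RAID-0 stripes of n disks, mirrored."""
    stripe = r ** n                   # a stripe needs all n of its disks
    return 1 - (1 - stripe) ** 2      # the array needs either stripe

def raid10_reliability(r, n):
    """RAID 1+0: n mirrored pairs, striped."""
    pair = 1 - (1 - r) ** 2           # a pair needs either of its disks
    return pair ** n                  # the array needs every pair

print(raid01_reliability(0.9, 2))    # 1 - 0.19^2, the slide's ~0.96
print(raid10_reliability(0.9, 2))    # 0.99^2, the slide's ~0.98
```

The comparison makes the slides' point concrete: with the same 2 × n disks, mirroring below the stripe (1+0) survives more failure patterns than mirroring above it (0+1).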
RAID-2 Hamming Code Arrays
Low commercial interest due to the complex nature of the Hamming code computation.
RAID-3 Striping with Parity

[Figure: logical blocks 01-12 striped as single bits or words across four data disks (stripe width 4), with parity P0-P2 stored on a dedicated parity disk]
RAID-3 Operation
Based on the principle of a reversible form of parity computation (XOR).
Parity P = C0 ⊕ C1 ⊕ … ⊕ Cn
Missing stripe Cm = P ⊕ C0 ⊕ C1 ⊕ … ⊕ Cm−1 ⊕ Cm+1 ⊕ … ⊕ Cn
RAID-3 Performance
RAID-1's 1-to-1 redundancy issue is addressed by one parity disk for n data disks. Less expensive than RAID-1.
The rest of the performance is similar to RAID-0.
RAID-3 can withstand the failure of one of its disks.
Reliability = P(all disks working) + P(exactly one disk failed)
= rⁿ + n × rⁿ⁻¹ × (1 − r)
When r = 0.9 and n = 5:
= 0.9⁵ + 5 × 0.9⁴ × (1 − 0.9) = 0.59 + 0.33 ≈ 0.92
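The "at most one failure" reliability formula in code (a sketch; the function name is mine):

```python
from math import comb

def raid3_reliability(r, n):
    """Probability that all n disks work, or exactly one has failed."""
    all_ok = r ** n
    one_failed = comb(n, 1) * r ** (n - 1) * (1 - r)
    return all_ok + one_failed

print(raid3_reliability(0.9, 5))   # ~0.918 for the slide's example
```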
RAID-4 Performance
Similar to RAID-3, but supports larger chunks.
Performance measures are similar to RAID-3.
RAID-5 (Distributed Parity)
[Figure: logical blocks 01-12 striped in chunks across five disks, with parity blocks P0-P2 distributed among the disks rather than on a dedicated parity disk]
Reading Assignment
Optical Disk
RAID study material
Worked-out problems from the textbook: p 452, p 537, p 539, p 561, p 684, p 691, p 704
Amdahl's Speedup
Amdahl's Speedup for Multiprocessors
Speedup = 1 / (Fraction_e / Speedup_e + (1 − Fraction_e))
Speedup_e: the number of processors
Fraction_e: the fraction of the program that runs in parallel on Speedup_e processors
Assumption: the program runs either in fully parallel (enhanced) mode, using all processors, or in non-enhanced (serial) mode.
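The formula in executable form (a sketch; the example values are illustrative, not from the slides):

```python
def amdahl_speedup(fraction_e, speedup_e):
    """Overall speedup when fraction_e of the work runs in parallel on
    speedup_e processors and the remaining (1 - fraction_e) runs serially."""
    return 1 / (fraction_e / speedup_e + (1 - fraction_e))

print(amdahl_speedup(0.8, 16))   # 80% parallel on 16 processors -> ~4x
```

Note how the serial fraction dominates: even with infinitely many processors, an 80%-parallel program cannot exceed a speedup of 5.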
Multiprocessor Architectures
Single Instruction Stream, Single Data Stream (SISD)
Single Instruction Stream, Multiple Data Stream (SIMD)
Multiple Instruction Streams, Single Data Stream (MISD)
Multiple Instruction Streams, Multiple Data Streams (MIMD)
Shared Memory versus Message Passing

#  Shared Memory                                   Message Passing
1  Compatibility with the well-understood shared-  Simpler hardware compared to
   memory mechanism used in centralized            scalable shared memory
   multiprocessor systems
2  Simplifies compiler design                      Communication is explicit, which
                                                   makes it easier to understand
3  No need to learn a messaging protocol           Improved modularity
4  Low communication overhead                      No need for the expensive and complex
                                                   synchronization mechanisms used in
                                                   shared memory
5  Caching to improve latency
Two types of Shared Memory architectures
[Figure: centralized shared-memory architecture - several processors, each with a cache, share one main memory and I/O system over a bus; distributed shared-memory architecture - processor+cache nodes, each with local memory and I/O, connected by an interconnection network]
Symmetric versus Distributed Memory MP
SMP: uses shared memory for inter-process communication
Advantages:
Close coupling due to shared memory
Sharing of data is faster between processors
Disadvantages:
Scaling: memory is a bottleneck
High and unpredictable memory latency
Distributed: uses message passing for inter-process communication
Advantages:
Low memory latency
Scaling: scales better than SMP
Disadvantages:
Control and management are more complex due to distributed memory
Performance Metrics
Communication bandwidth
Communication latency: ideally, latency is as low as possible. Communication latency
= sender overhead + time of flight + transmission time + receiver overhead
Communication Latency Hiding
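The latency sum can be written directly as a function. This is a sketch of my own; the parameter names and the example numbers (a 1 KB message over a 1 Gb/s link with 1 µs overheads and 1 µs time of flight) are assumptions for illustration:

```python
def comm_latency_s(sender_overhead_s, time_of_flight_s,
                   message_bits, bandwidth_bits_per_s,
                   receiver_overhead_s):
    """Total one-way communication latency in seconds."""
    transmission_s = message_bits / bandwidth_bits_per_s
    return (sender_overhead_s + time_of_flight_s +
            transmission_s + receiver_overhead_s)

t = comm_latency_s(1e-6, 1e-6, 8 * 1024, 1e9, 1e-6)
print(t)
```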
Cache Coherence Problem
Time  Event                 Cache contents  Cache contents  Memory contents
                            for CPU A       for CPU B       for location X
0                                                           1
1     CPU A reads X         1                               1
2     CPU B reads X         1               1               1
3     CPU A stores 0 in X   0               1               0
Cache Coherence
A Memory System is coherent if
1. A read by a processor to location X that follows a write by another processor to X returns the written value if the read and write are sufficiently separated in time and no other writes to X occur between the two accesses. This defines a coherent view of memory.
2. Writes to the same location are serialized; that is, two writes to the same location by any two processors are seen in the same order by all processors. For example, if the values 1 and then 2 are written to a location, processors can never read the value of the location as 2 and then later read it as 1.
Features supported by Coherent Multiprocessors
Migration: shared data in a cache can be moved to another cache directly (without going through main memory). This is referred to as migration. It is used in a transparent fashion. It reduces both latency (from going to another cache every time the data item is accessed) and precious memory bandwidth.
Replication: caches also provide replication for shared data items that are being simultaneously read, since each cache makes a copy of the data item locally. Replication reduces both the latency of access and contention for a read-shared data item.
Migration and Replication
[Figure: centralized shared-memory architecture (SMP) - processors with caches on a shared bus to main memory and I/O, with a cache controller (cc) per cache implementing cache coherence (CC)]
Cache Coherent (CC) Protocols
CC protocols implement cache coherence.
Two types:
Snooping (replicated): there are multiple copies of the sharing status. Every cache that has a copy of the data from a block of physical memory also has a copy of the sharing status of the block, and no centralized state is kept.
Directory based (logically centralized): there is only one copy of the sharing status of a block of physical memory. All the processors use this one copy. This copy could be in any of the participating processors.
Snooping Protocol
[Figure: SMP - processors with caches on a shared bus; each cache controller (cc) snoops the bus]
Two ways:
1. Write invalidation
2. Write broadcast
Reading Assignment
Section 7.4 (Reliability, Availability, & Dependability) in your textbook
Pages 554 and 555
Section 7.11 I/O Design: attempt all 5 problems in this section.
Invalidation versus Write Distribute
Multiple writes to the same data item with no intervening reads require multiple write broadcasts in an update protocol, but only one initial invalidation in a write-invalidate protocol.
With multiword cache blocks, each word written in a cache block requires a write broadcast in an update protocol, although only the first write to any word in the block needs to generate an invalidate in an invalidation protocol.
An invalidation protocol works on cache blocks, while an update protocol must work on individual words.
The delay between writing a word in one processor and reading the written value in another processor is usually less in a write-update scheme, since written data are immediately updated in the reader's cache. By comparison, in an invalidation protocol, the reader is invalidated first, then later reads the data and is stalled until a copy can be read and returned to the processor.
Use of Valid, Shared, and Dirty Bits
Valid bit: every time a block is loaded into a cache from memory, the tag for the block is saved in the cache and the valid bit is set to TRUE. A write update to the same block in a different processor may reset this valid bit due to write invalidate. Thus, when a cache block is accessed for READ or WRITE, the tag should match AND the value of the valid bit should be TRUE. If the tag matches but the valid bit is reset, then it is a cache miss.
Shared bit: when a memory block is loaded into a cache block for the first time, the shared bit is set to FALSE. When some other cache loads the same block, it is turned to TRUE. When this block is updated, write invalidate uses the value of the shared bit to decide whether to send a write-invalidate message or not. If the shared bit is set, then an invalidate message is sent; otherwise not.
Dirty bit: the dirty bit is set to FALSE when a block is loaded into cache memory. It is set to TRUE when the block is updated for the first time. When another processor wants to load this block, the block is migrated to that processor instead of being loaded from memory.
Summary of Snooping Mechanism

Request     Source     State of cache block  Function and explanation
Read hit    processor  shared or exclusive   Read data in cache
Read miss   processor  invalid               Place read miss on bus
Read miss   processor  shared                Address conflict miss: place read miss on bus
Read miss   processor  exclusive             Address conflict miss: write back block, then place read miss on bus
Write hit   processor  exclusive             Write data in cache
Write hit   processor  shared                Place write miss on bus
Write miss  processor  invalid               Place write miss on bus
Write miss  processor  shared                Address conflict miss: place write miss on bus
Write miss  processor  exclusive             Address conflict miss: write back block, then place write miss on bus
Read miss   bus        shared                No action; allow memory to service the read miss
Read miss   bus        exclusive             Attempt to share data: place cache block on bus and change state to shared
Write miss  bus        shared                Attempt to write shared block; invalidate the block
Write miss  bus        exclusive             Attempt to write a block that is exclusive elsewhere: write back the block and make its state invalid
State Transition
Cache state transitions based on requests from the CPU:
[State diagram: states Invalid, Shared (read only), Exclusive (read/write). CPU read miss: Invalid → Shared, place read miss on bus. CPU write: Invalid or Shared → Exclusive, place write miss on bus. CPU read hit in Shared or Exclusive and CPU write hit in Exclusive: no transition. CPU read miss while Exclusive: write back the cache block, place read miss on bus, go to Shared. CPU write miss while Exclusive: write back the block, place write miss on bus.]
State Transition
Cache state transitions based on requests from the bus:
[State diagram: a write miss on the bus for this block moves Shared → Invalid, and moves Exclusive → Invalid after writing back the block and aborting the memory access; a read miss on the bus for this block moves Exclusive → Shared, writing back the block and aborting the memory access.]
Some Terminologies
Polling: a process periodically checks whether there is a message it needs to handle. This method of awaiting a message is called polling. Polling reduces processor utilization.
Interrupt: a process is notified when a message arrives via a built-in interrupt mechanism. Interrupts increase processor utilization in comparison to polling.
Synchronous: a process sends a message and waits for the response to come before sending another message or carrying out other tasks. This way of waiting is referred to as synchronous communication.
Asynchronous: a process sends a message and continues to carry out other tasks while the requested message is processed. This is referred to as asynchronous communication.
Communication Infrastructure
Multiprocessor systems with shared memory can have two types of communication infrastructure:
Shared bus
Interconnect
Directory Based Protocol
[Figure: nodes 1 through n, each holding a directory, connected by an interconnect]
State of the Block
Shared: one or more processors have the block cached, and the value in memory is up to date.
Uncached: no processor has a copy of the block.
Exclusive: exactly one processor has a copy of the cache block, and it has written the block, so the memory copy is out of date. That processor is called the owner of the block.
Local, Remote, Home Node
Local node: the node where the request originates.
Home node: the node where the memory location and the directory entry of an address reside.
Remote node: a node that has a copy of the cache block.
Uncached State Operation
Read miss: the requesting processor is sent the requested data from memory, and the requestor is made the only sharing node. The state of the block is made shared.
Write miss: the requesting processor is sent the requested data and becomes the sharing node. The block is made exclusive to indicate that the only valid copy is cached. Sharers indicates the identity of the owner.
Exclusive State Operation
Read miss: the owner processor is sent a data fetch message, which causes the state of the block in the owner's cache to transition to shared and causes the owner to send the data to the directory, where it is written to memory and sent back to the requesting processor. The identity of the requesting processor is added to the set Sharers, which still contains the identity of the processor that was the owner (since it still has a readable copy).
Data write back: the owner processor is replacing the block and therefore must write it back. This write back makes the memory copy up to date (the home directory essentially becomes the owner), the block is now uncached, and the Sharers set is empty.
Write miss: the block has a new owner. A message is sent to the old owner causing the cache to invalidate the block and send the value to the directory, from which it is sent to the requesting processor, which becomes the new owner. Sharers is set to the identity of the new owner, and the state of the block remains exclusive.
Cache State Transition
[State diagram: block states Uncached, Shared (read only), Exclusive (read/write). Read miss on an Uncached block: data value reply; Sharers = {P}; → Shared. Read miss on a Shared block: data value reply; Sharers += {P}. Write miss on an Uncached or Shared block: data value reply (with invalidates to Sharers when Shared); Sharers = {P}; → Exclusive. Data write back from Exclusive: Sharers = {}; → Uncached. Read miss on an Exclusive block: fetch; data value reply; Sharers += {P}; → Shared. Write miss on an Exclusive block: fetch/invalidate; data value reply; Sharers = {P}.]
True Sharing Miss & False Sharing Miss
Time  Processor P1  Processor P2
1     Write X1
2                   Read X2
3     Write X1
4                   Write X2
5                   Read X2
Synchronization
We need a set of hardware primitives with the ability to atomically read and modify a memory location.

Example 1: Atomic Exchange
Interchanges a value in a register for a value in memory.
You could build a lock with this: if the memory location contains 1, then the lock is held.
[Figure: register A exchanging the values 0 and 1 with a main-memory location]
Other Operations
Test-and-Set
Fetch-and-Increment
Load Linked & Set Conditional
Load Linked: load linked is a primitive operation that loads the content of a specified location into cache.
Set Conditional: the set conditional operation is related to the load linked operation by operating on the same memory location. This operation sets the location to a new value only if the location still contains the same value read by load linked; otherwise it fails.
Operation of LL & SC
try:  LL     R2, 0(R1)   ; load linked
      DADDUI R3, R2, #1  ; increment
      SC     R3, 0(R1)   ; store conditional
      BEQZ   R3, try     ; branch if store fails

The address of the memory location loaded by LL is kept in a register.
If there is an interrupt or an invalidate, this register is cleared.
SC checks whether this register is zero: (a) if it is, SC fails; (b) otherwise, it stores the content of R3 at that memory address.
Multiprocessor System
Uses cache coherence
Bus-based or interconnect-based systems
The coherence system arbitrates writes (invalidation / write distribute)
Thus, it serializes writes!
Spin Locks
Locks that a processor continuously tries to acquire, spinning around a loop until it succeeds.
Used when a lock is expected to be held for a short amount of time.
Spin Lock implementation
lockit: LD     R2, 0(R1)  ; load the lock
        BNEZ   R2, lockit ; not available, spin
        DADDUI R2, R0, #1 ; load locked value
        EXCH   R2, 0(R1)  ; swap
        BNEZ   R2, lockit ; branch if lock wasn't 0
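The same spin-and-swap idea can be sketched in Python rather than assembly. This is a toy model of my own (the `SpinLock` class and the worker counts are assumptions): `threading.Lock.acquire(blocking=False)` plays the role of the atomic exchange, atomically testing and setting the flag in one step.

```python
import threading

class SpinLock:
    """Toy spin lock modeled on the EXCH-based loop above."""
    def __init__(self):
        self._flag = threading.Lock()

    def acquire(self):
        # Spin until the atomic test-and-set succeeds, like BNEZ/EXCH.
        while not self._flag.acquire(blocking=False):
            pass

    def release(self):
        self._flag.release()

counter = 0
lock = SpinLock()

def worker():
    global counter
    for _ in range(10_000):
        lock.acquire()
        counter += 1    # critical section
        lock.release()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)   # 40000: the lock serializes all increments
```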
Cache Coherence Steps

Step 1: P0 has the lock; P1 and P2 spin, testing if lock = 0. Lock state: Shared. Bus/Directory activity: none.
Step 2: P0 sets lock to 0; P1 and P2 receive invalidates. Lock state: Exclusive (P0). Activity: write invalidate of the lock variable from P0.
Step 3: P1 and P2 both get cache misses. Lock state: Shared. Activity: bus/directory services P2's cache miss; write back from P0.
Step 4: P1 waits for the bus; P2 reads lock = 0. Lock state: Shared. Activity: cache miss for P2 satisfied.
Step 5: P1 reads lock = 0; P2 executes the swap and gets a cache miss. Lock state: Shared. Activity: cache miss for P1 satisfied.
Step 6: P1 executes the swap and gets a cache miss; P2 completes the swap, which returns 0 and sets lock = 1. Lock state: Exclusive (P2). Activity: bus/directory services P2's cache miss; generates an invalidate.
Step 7: P1's swap completes, returning 1 and setting lock = 1; P2 enters its critical section. Lock state: Exclusive (P1). Activity: bus/directory services P1's cache miss; generates a write back.
Step 8: P1 spins, testing if lock = 0. Activity: none.
Multi-threading
Multi-threading: why and how?
Fine-grain:
Almost round-robin thread switching
High overhead (compared to coarse-grain)
Better throughput than coarse-grain
Coarse-grain:
Threads switched only on long stalls (e.g., an L2 cache miss)
Low overhead
Individual threads get better performance
Reading Assignment
Memory Consistency Notes (Answers)
Problem on page 596
An example that uses LL (Load Linked) and SC (Set Conditional).
IETF - Internet Engineering Task Force
Loosely self-organized groups (working groups)
Two types of documents: I-Ds and RFCs
An I-D's life is shorter than an RFC's life
RFCs include proposed standards and standards
RFC examples: RFC 1213, RFC 2543
I-D -> Proposed Standard -> Standard
SIP Evolution
SIP I-D: v1 '97, v2 '98
SIP RFC 2543: '99
SIP RFC 3261: '02, obsoletes RFC 2543
SIP working groups: SIPPING, SIMPLE, PINT, SPIRITS
SIPit: SIP interoperability test
Predictor for SPEC92 Benchmarks
[Chart: instructions between mispredictions (0-300) for the SPEC92 benchmarks compress, eqntott, espresso, gcc, li, doduc, ear, hydro2d, mdljdp, and su2cor, comparing a predicted scheme against a profile-based scheme]