
Random Access Memory and Problems

Date post: 30-May-2018
Upload: ainugiri

Transcript
  • 8/14/2019 Random Access Memory and Problems

    1/89

    Problem 1: Interleaving factor

    What is the optimal interleaving factor if the memory cycle is 8 CC (1 + 6 + 1)?

    Assume 4 banks.

    [Timing diagram: clock cycles 1-20, with the 1st, 2nd, and 3rd read requests issued to banks b1-b4; each bank is busy for a full memory cycle, so a request to a busy bank (X) must wait.]

    2/89

    Problem 2 (page 452 in your text book)

    Block size = 1 word
    Memory bus size = 1 word
    Miss rate = 3%
    Memory accesses per instruction = 1.2
    Cache miss penalty = 64 CC
    Avg cycles per instruction = 2

    [Diagram: CPU - Cache - Simple Memory, 64 CC per access.]

    Assume 1000 instructions in your program.

    If no miss, then the execution time is 2000 CC.

    One instruction needs 1.2 memory accesses -> 1000 instructions need 1200 accesses. If the miss rate is 3%, then the number of misses for 1200 accesses is 36.

    The execution time is 2000 + 36 x 64 = 4304 CC.

    Average cycles per instruction = 4304/1000 = 4.3
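As a sanity check, the calculation above can be reproduced with a few lines of Python (a sketch; the variable names are mine, the numbers are from the slide):

```python
# Problem 2: execution time with a simple memory.
instructions = 1000
base_cpi = 2
accesses_per_instr = 1.2
miss_rate = 0.03
miss_penalty = 64  # clock cycles per miss

accesses = instructions * accesses_per_instr   # 1200 accesses
misses = accesses * miss_rate                  # 36 misses
exec_time = instructions * base_cpi + misses * miss_penalty
cpi = exec_time / instructions
# exec_time = 2000 + 36 x 64 = 4304 CC, cpi = 4.304
```

Changing `miss_rate` and `miss_penalty` reproduces the wider-bus and interleaved variants that follow.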

    3/89

    Problem 2 (wider bus, 2 words)

    Block size = 4 words
    Memory bus size = 2 words
    Miss rate = 2.5%
    Memory accesses per instruction = 1.2
    Cache miss penalty = 128 CC
    Avg cycles per instruction = 2

    [Diagram: CPU - Cache - Interleaved Memory, 64 CC per access.]

    Assume 1000 instructions in your program.

    If no miss, then the execution time is 2000 CC.

    One instruction needs 1.2 memory accesses -> 1000 instructions need 1200 accesses. If the miss rate is 2.5%, then the number of misses for 1200 accesses is 30.

    The execution time is 2000 + 30 x 128 = 5840 CC.

    Average cycles per instruction = 5840/1000 = 5.84

    4/89

    Problem 2 (2-way interleaving)

    Block size = 4 words
    Memory bus size = 1 word
    Miss rate = 2.5%
    Memory accesses per instruction = 1.2
    Cache miss penalty = 68 x 2 CC
    Avg cycles per instruction = 2

    [Diagram: CPU - Cache - 2-way Interleaved Memory, 64 CC per access.]

    Assume 1000 instructions in your program.

    If no miss, then the execution time is 2000 CC.

    One instruction needs 1.2 memory accesses -> 1000 instructions need 1200 accesses. If the miss rate is 2.5%, then the number of misses for 1200 accesses is 30.

    The execution time is 2000 + 30 x 68 x 2 = 6080 CC.

    Average cycles per instruction = 6080/1000 = 6.08

    5/89

    DRAM (Dynamic Random Access Memory)

    Address multiplexing

    RAS and CAS

    Dynamic refreshing and its implications - data must be written back after a READ!

    Amdahl's rule of thumb: 1000 MIPS should have 1000 MB of memory.

    Read your text book for DRAM, SRAM, RAMBUS, and Flash.

    6/89

    SRAM (Static Random Access Memory)

    Uses six transistors per bit.

    Static: NO NEED to write data back after a READ.

    Comparison versus DRAM:
    - More transistors
    - Less capacity (down to 1/8)
    - More expensive (8-10 times)
    - No refresh
    - Faster (8-16 times)
    - Used in caches instead of main memory

    7/89

    Flash

    Similar to EEPROM.

    Ideal choice for embedded systems: low power requirement, no moving parts.

    Faster READ time, slower WRITE time.

    Comparison versus EEPROM:
    - Finer and simpler control
    - Better capacity

    A WRITE operation requires an entire block to be erased and replaced. Writes are 50 to 100 times slower than READs.

    NOR type flash and NAND type flash.

    8/89

    Virtual Memory

    [Figure: a 16 KB virtual address space with pages A (0), B (4K), C (8K), D (12K) mapped into physical main memory (C at 4K, A at 16K, B at 24K) or to disk (D); a virtual address is translated to a physical address.]

    9/89

    Virtual Memory versus First-Level Cache

    Parameter         First-level cache        Virtual memory
    Block/page size   16-128 B                 4K-64K B
    Hit time          1-3 CC                   50-150 CC
    Miss penalty      8-150 CC                 1-10 M CC
      Access time     6-130 CC                 0.8-8 M CC
      Transfer time   2-20 CC                  0.2-2 M CC
    Miss rate         0.1-10%                  0.00001-0.001%
    Address mapping   25-45 bit physical       32-64 bit virtual address
                      address to 14-20 bit     to 25-45 bit physical
                      cache address            address

    10/89

    Memory Hierarchy Questions

    Where can a block be placed?

    How is a block found if it is in?

    Which block should be replaced?

    What happens on a write?

    11/89

    Segmentation versus Paging

    Remark                    Page                        Segment
    Words per address         One                         Two (segment and offset)
    Programmer visible        Invisible to programmer     Visible to programmer
    Replacing a block         Trivial                     Hard
    Memory use inefficiency   Internal fragmentation      External fragmentation
    Efficient disk traffic    Yes (easy to tune the       Not always (small segments)
                              page size)

    [Figure: paging divides code and data into fixed-size pages; segmentation divides them into variable-size code and data segments.]

    12/89

    CPU Execution Time

    CPU Execution Time = (CPU Cycles + Stall Cycles) x Cycle Time

    Stall Cycles = misses x penalty

    Misses are given either as misses/1000 instructions or as misses/memory-access, AKA the miss rate. Instruction count, cycles per instruction, and misses are also required to compute CPU execution time.

    13/89

    Average Access Time with Cache

    Average Access Time = Hit Time + Miss Rate x Miss Penalty

    Multi-level cache:

    Avg Access Time = Hit Time_L1 + Miss Rate_L1 x Penalty_L1

    Penalty_L1 = Hit Time_L2 + Miss Rate_L2 x Penalty_L2
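The two formulas compose naturally; here is a small sketch with hypothetical hit times and miss rates (none of these numbers come from the slides):

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: hit time + miss rate x miss penalty."""
    return hit_time + miss_rate * miss_penalty

# Hypothetical two-level cache: an L1 miss is served by L2.
penalty_l1 = amat(hit_time=10, miss_rate=0.05, miss_penalty=100)  # = 15 CC
overall = amat(hit_time=1, miss_rate=0.02, miss_penalty=penalty_l1)
# overall = 1 + 0.02 x 15 = 1.3 CC
```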

    14/89

    I/O Systems

    Objective

    To understand mainly disk storage technologies.

    15/89

    Storage Systems

    Individual Disks

    SCSI Subsystem

    RAID

    Storage Area Networks (NAS & SAN)

    16/89

    Disk Storage

    Disk storage is slower than memory.

    Disk offers better volume at a lower unit price.

    Fits nicely into the virtual memory concept.

    A critical piece of overall performance.

    Courtesy: www.shortcourses.com/choosing/storage/06.htm
    17/89

    Disk System

    [Figure: the I/O path - User Program -> OS (kernel/driver) -> SCSI Host Adapter -> Host I/O Bus -> SCSI Bus (terminated, T) -> per-disk embedded controllers -> disk packs & caches.]

    18/89

    Individual Disks

    Reference: http://www.stjulians.com/cs/diskstoragenotes.html

    19/89

    Components of Disk Access Time

    Seek Time

    Rotational Latency

    Internal Transfer Time

    Other Delays

    That is,
    Avg Access Time = Avg Seek Time + Avg Rotational Delay + Transfer Time + Other Overhead

    20/89

    Problem (page 684)

    Seek time = 5 ms (average)
    RPM = 10000
    Transfer rate = 40 MB/sec
    Other delays = 0.1 ms
    Sector size = 512 bytes

    Average Access Time
    = Average Seek Time (5 ms) + Average Rotational Delay (time for 1/2 revolution) + Transfer Time (512/(40x10^6) s) + Other Overhead (0.1 ms)
    = 5 + 10^3 x 60/(2x10000) + 512x10^3/(40x10^6) + 0.1 ms
    = 5 + 3 + 0.0128 + 0.1 = 8.2128 ms

    21/89

    RAID

    RAID stands for Redundant Array of Inexpensive/Independent Disks.

    Uses a number of little disks instead of one large disk.

    Six+ types of RAID (0-5).

    22/89

    Terminologies

    Mean Time Between Failures (MTBF)

    Mean Time To Data Loss (MTTDL)

    Mean Time To Data Inaccessibility (MTTDI)

    23/89

    24/89

    RAID-0 Performance

    Throughput: best case - nearly n x the single-disk value.

    Utilization: worst case - nearly (1/n) x the single-disk value.

    Data reliability: r^n, where r is the reliability of a disk (r <= 1).

    Sequential access: fast.

    Random access: multithreaded random access offers better performance.

    When r = 0.8 and n = 2, the reliability is 0.64.
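The reliability claim is just r multiplied n times, since striping has no redundancy; a one-liner makes it easy to try other disk counts (a sketch, names are mine):

```python
def raid0_reliability(r, n):
    """All n disks must survive; RAID-0 striping has no redundancy."""
    return r ** n

assert abs(raid0_reliability(0.8, 2) - 0.64) < 1e-9  # the example above
```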

    25/89

    RAID-1 Mirroring

    Issue addressed: RAID-0's reliability problem.

    Shadowing or mirroring is used.

    Data is not lost when a disk goes down.

    One or more disks can be used to mirror the primary disk.

    Writes are posted to the primary and shadowing disks.

    Read from any of them.

    [Figure: blocks 01, 02, 03 duplicated on a primary disk and its mirror.]

    26/89

    RAID-1 Performance

    Reliability is improved with mirroring: 1 - (1-r)(1-r)

    Example: when r is 0.8, the reliability of RAID-1 is 0.96.

    Writes are more complex - they must be committed to the primary and all shadowing disks.

    Writes are much slower due to the atomicity requirement.

    Expensive due to 1-to-1 redundancy.


    27/89

    RAID 0+1 - Striping & Mirroring

    [Figure: a RAID 1 mirror of two RAID 0 arrays of n disks each; blocks 01-06 are striped across one array and duplicated on the other.]

    28/89

    Performance RAID-0+1

    Let the reliability of a RAID 0 sub-tree be R.

    Then the reliability of the RAID 1 tree = 1 - (1-R)(1-R)

    Reliability R is: R = r^2 (reliability of a single disk is r)

    Throughput is the same as RAID-0, however with 2 x n disks.

    Utilization is lower than RAID-0 due to mirroring.

    Writes are marginally slower due to atomicity.

    When r = 0.9, R = 0.81, and the overall reliability = 1 - (0.19)^2 = 0.96


    29/89

    RAID 1+0 - Mirroring & Striping

    [Figure: a RAID 0 stripe across two RAID 1 mirrored pairs; blocks 01/03/05 on one mirrored pair, 02/04/06 on the other.]

    30/89

    Performance RAID-1+0

    Let the reliability of a RAID 1 sub-tree be R.

    Then the reliability of the RAID 0 tree = R^2

    Reliability R is: R = 1 - (1-r)^2 (reliability of a single disk is r)

    Throughput is the same as RAID-0, however with 2 x n disks.

    Utilization is lower than RAID-0 due to mirroring.

    Writes are marginally slower due to atomicity.

    When r = 0.9, R = 0.99, and the overall reliability = (0.99)^2 = 0.98
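The two four-disk layouts can be compared side by side; with r = 0.9, striping across mirrored pairs (1+0) comes out ahead of mirroring two stripes (0+1), matching the numbers on these slides:

```python
r = 0.9
# RAID 0+1: mirror two striped arrays (each stripe needs both of its disks).
stripe = r ** 2                  # 0.81
raid01 = 1 - (1 - stripe) ** 2   # 1 - 0.19^2 = 0.9639
# RAID 1+0: stripe across two mirrored pairs (each pair survives one failure).
pair = 1 - (1 - r) ** 2          # 0.99
raid10 = pair ** 2               # 0.9801
```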


    31/89

    RAID-2 Hamming Code Arrays

    Low commercial interest due to the complex nature of Hamming code computation.

    32/89

    RAID-3 Striping with Parity

    [Figure: logical blocks 01-12 striped as single bits or words across four data disks (01/05/09, 02/06/10, 03/07/11, 04/08/12) plus a dedicated parity disk holding P0, P1, P2; stripe width = 4.]

    33/89

    RAID-3 Operation

    Based on the principle of a reversible form of parity computation.

    Parity P = C0 xor C1 xor ... xor Cn-1

    Missing stripe Cm = P xor C0 xor C1 xor ... xor Cm-1 xor Cm+1 xor ... xor Cn-1
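The reversible-parity rule is plain XOR; a tiny demonstration with made-up data words:

```python
from functools import reduce

stripes = [0b0101, 0b0011, 0b1110]            # data held on the data disks
parity = reduce(lambda a, b: a ^ b, stripes)  # P = C0 ^ C1 ^ C2
# Suppose the disk holding stripes[1] fails; rebuild its contents
# from the parity and the surviving stripes.
rebuilt = parity ^ stripes[0] ^ stripes[2]
assert rebuilt == stripes[1]
```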

    34/89

    RAID-3 Performance

    RAID-1's 1-to-1 redundancy issue is addressed by a 1-for-n parity disk. Less expensive than RAID-1.

    The rest of the performance is similar to RAID-0.

    RAID-3 can withstand the failure of one of its disks.

    Reliability = P(all disks working) + P(exactly one failed)
    = r^n + nC1 x r^(n-1) x (1-r)

    When r = 0.9 and n = 5:
    = 0.9^5 + 5 x 0.9^4 x (1 - 0.9)
    = 0.59 + 5 x 0.656 x 0.1
    = 0.59 + 0.33 = 0.92
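The reliability expression (all disks up, or exactly one down) written as a function (a sketch; `math.comb` supplies the nC1 term):

```python
from math import comb

def raid3_reliability(r, n):
    """Survives zero or one disk failure out of n disks."""
    return r ** n + comb(n, 1) * r ** (n - 1) * (1 - r)

assert round(raid3_reliability(0.9, 5), 2) == 0.92  # the example above
```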

    35/89

    RAID-4 Performance

    Similar to RAID-3, but supports larger chunks.

    Performance measures are similar to RAID-3.

    36/89

    RAID-5 (Distributed Parity)

    [Figure: logical blocks 01-12 with parity chunks P0-P2 distributed across the disks (e.g., 01/P1/09, 02/05/P2, 03/06/10, 04/07/11, P0/08/12), so no single disk is a dedicated parity disk; stripe width = 4 data chunks plus parity.]

    37/89

    38/89

    Reading Assignment

    Optical Disk

    RAID study material

    Worked out problems from the textbook: p 452, p 537, p 539, p 561, p 684, p 691, p 704


    39/89

    Amdahl's Speedup - Multiprocessors

    Speedup = 1/(Fraction_e/Speedup_e + (1 - Fraction_e))

    Speedup_e - number of processors

    Fraction_e - fraction of the program that runs in parallel on Speedup_e processors

    Assumption: the program runs either in fully parallel (enhanced) mode, making use of all the processors, or in non-enhanced mode.
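As a quick check of the formula (the numbers below are hypothetical):

```python
def amdahl_speedup(fraction_e, speedup_e):
    """Speedup when fraction_e of the program runs on speedup_e processors."""
    return 1 / (fraction_e / speedup_e + (1 - fraction_e))

# If 90% of the program parallelizes across 10 processors:
s = amdahl_speedup(0.9, 10)  # 1/(0.09 + 0.1), about 5.26x - far short of 10x
```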

    40/89

    Multiprocessor Architectures

    Single Instruction Stream, Single Data Stream (SISD)

    Single Instruction Stream, Multiple Data Streams (SIMD)

    Multiple Instruction Streams, Single Data Stream (MISD)

    Multiple Instruction Streams, Multiple Data Streams (MIMD)

    41/89


    42/89

    Shared Memory versus Message Passing

    No.  Shared Memory                            Message Passing
    1    Compatibility with the well-understood   Simpler hardware compared to scalable
         shared memory mechanism used in          shared memory
         centralized multiprocessor systems
    2    Simplifies compiler design               Communication is explicit, which makes
                                                  it easier to understand
    3    No need to learn a messaging protocol    Improved modularity
    4    Low communication overhead               No need for the expensive and complex
                                                  synchronization mechanisms used in
                                                  shared memory
    5    Caching to improve latency

    43/89

    Two types of Shared Memory architectures

    [Figure: Centralized Shared-Memory Architecture - several processors, each with a cache, sharing one main memory and I/O system over a bus. Distributed Shared-Memory Architecture - processor+cache nodes, each with its own memory and I/O, connected by an interconnection network.]

    44/89

    Symmetric versus Distributed Memory MP

    SMP: uses shared memory for inter-process communication.

    Advantages:
    - Close coupling due to shared memory
    - Sharing of data is faster between processors

    Disadvantages:
    - Scaling: memory is a bottleneck
    - High and unpredictable memory latency

    Distributed: uses message passing for inter-process communication.

    Advantages:
    - Low memory latency
    - Scaling: scales better than SMP

    Disadvantages:
    - Control and management are more complex due to distributed memory

    45/89

    Performance Metrics

    Communication bandwidth

    Communication latency: ideally, latency is as low as possible. Communication latency
    = Sender Overhead + Time of Flight + Transmission Time + Receiver Overhead

    Communication latency hiding
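The latency sum is easy to evaluate; the figures below are hypothetical, chosen only to show the shape of the formula:

```python
def comm_latency(sender_oh_s, flight_s, msg_bytes, bandwidth_bps, receiver_oh_s):
    """Latency = sender overhead + time of flight + transmission + receiver overhead."""
    transmission_s = msg_bytes / bandwidth_bps
    return sender_oh_s + flight_s + transmission_s + receiver_oh_s

# 1 us overheads on each side, 0.5 us time of flight, 4 KB message at 1 GB/s:
lat = comm_latency(1e-6, 0.5e-6, 4096, 1e9, 1e-6)  # about 6.6 microseconds
```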

    46/89

    Cache Coherence Problem

    Time  Event                  Cache of CPU A  Cache of CPU B  Memory at X
    0                                                            1
    1     CPU A reads X          1                               1
    2     CPU B reads X          1               1               1
    3     CPU A stores 0 in X    0               1               0

    47/89

    Cache Coherence

    A memory system is coherent if:

    1. A read by a processor to location X that follows a write by another processor to X returns the written value if the read and write are sufficiently separated in time and no other writes to X occur between the two accesses. This defines a coherent view of memory.

    2. Writes to the same location are serialized; that is, two writes to the same location by any two processors are seen in the same order by all processors. For example, if the values 1 and then 2 are written to a location, processors can never read the value of the location as 2 and then later read it as 1.

    48/89

    49/89

    Features supported by Coherent Multiprocessors

    Migration: shared data in a cache can be moved to another cache directly (without going through the main memory). This is referred to as migration, and it is done transparently. It reduces both latency (going to another cache every time the data item is accessed) and precious memory bandwidth.

    Replication: the caches also provide replication for shared data items that are being simultaneously read, since each cache makes a local copy of the data item. Replication reduces both the latency of access and contention for a read-shared data item.

    50/89

    Migration and Replication

    [Figure: Centralized Shared-Memory Architecture (SMP) - processors with caches, each cache managed by a cache controller (cc), connected over a bus to main memory and I/O systems.]

    51/89

    Cache Coherent (CC) Protocols

    CC protocols implement cache coherency.

    Two types:

    Snooping (replicated): there are multiple copies of the sharing status. Every cache that has a copy of the data from a block of physical memory also has a copy of the sharing status of the block, and no centralized state is kept.

    Directory based (logically centralized): there is only one copy of the sharing status of a block of physical memory. All the processors use this one copy. This copy could be in any of the participating processors.

    52/89

    Snooping Protocol

    [Figure: the same SMP organization - processors with caches and cache controllers (cc) on a shared bus to main memory and I/O systems.]

    Two ways:
    1. Write Invalidation
    2. Write Broadcast

    53/89

    54/89

    55/89

    Reading Assignment

    Section 7.4 (Reliability, Availability, & Dependability) in your text book

    Pages 554 and 555

    Section 7.11 I/O Design - attempt all the 5 problems in this section.


    56/89

    Invalidation versus Write Distribute

    - Multiple writes to the same data item with no intervening reads require multiple write broadcasts in an update protocol, but only one initial invalidation in a write-invalidate protocol.

    - With multiword cache blocks, each word written in a cache block requires a write broadcast in an update protocol, although only the first write to any word in the block needs to generate an invalidate in an invalidation protocol.

    - An invalidation protocol works on cache blocks, while an update protocol must work on individual words.

    - The delay between writing a word in one processor and reading the written value in another processor is usually less in a write update scheme, since written data are immediately updated in the reader's cache. By comparison, in an invalidation protocol, the reader is invalidated first, then later reads the data and is stalled until a copy can be read and returned to the processor.

    57/89

    58/89

    Use of Valid, Shared, and Dirty bits

    Valid bit: every time a block is loaded into a cache from memory, the tag for the block is saved in the cache and the valid bit is set to TRUE. A write update to the same block in a different processor may reset this valid bit due to write invalidate. Thus, when a cache block is accessed for READ or WRITE, the tag should match AND the value of the valid bit should be TRUE. If the tag matches but the valid bit is reset, then it's a cache miss.

    Shared bit: when a memory block is loaded into a cache block for the first time, the shared bit is set to FALSE. When some other cache loads the same block, it is turned to TRUE. When this block is updated, write invalidate uses the value of the shared bit to decide whether to send a write-invalidate message or not. If the shared bit is set then an invalidate message is sent, otherwise not.

    Dirty bit: the dirty bit is set to FALSE when a block is loaded into cache memory. It is set to TRUE when the block is updated the first time. When another processor wants to load this block, the block is migrated to that processor instead of being loaded from memory.


    59/89

    Summary of Snooping Mechanism

    Request     Source     State of block       Function and explanation
    Read hit    processor  shared or exclusive  Read data in cache
    Read miss   processor  invalid              Place read miss on bus
    Read miss   processor  shared               Address conflict miss: place read miss on bus
    Read miss   processor  exclusive            Address conflict miss: write back block, then place read miss on bus
    Write hit   processor  exclusive            Write data in cache
    Write hit   processor  shared               Place write miss on bus
    Write miss  processor  invalid              Place write miss on bus
    Write miss  processor  shared               Address conflict miss: place write miss on bus
    Write miss  processor  exclusive            Address conflict miss: write back block, then place write miss on bus
    Read miss   bus        shared               No action; allow memory to service read miss
    Read miss   bus        exclusive            Attempt to share data: place cache block on bus and change state to shared
    Write miss  bus        shared               Attempt to write shared block; invalidate the block
    Write miss  bus        exclusive            Attempt to write block that is exclusive elsewhere: write back block and make its state invalid

    60/89

    State Transition

    [Figure: cache state transitions based on requests from the CPU, over the states Invalid, Shared (read only), and Exclusive (read/write).
    - Invalid -> Shared on a CPU read miss: place read miss on bus.
    - Invalid -> Exclusive on a CPU write: place write miss on bus.
    - Shared: CPU read hits stay put; a CPU read miss places a read miss on the bus.
    - Shared -> Exclusive on a CPU write hit: place write miss on bus.
    - Exclusive: CPU read and write hits stay put.
    - Exclusive -> Shared on a CPU read miss: write back cache block, place read miss on bus.
    - On a CPU write miss from Exclusive: write back cache block, place write miss on bus.]

    61/89

    State Transition

    [Figure: cache state transitions based on requests from the bus, over the same three states.
    - Shared -> Invalid on a write miss for this block.
    - Exclusive -> Invalid on a write miss for this block: write back the block; abort memory access.
    - Exclusive -> Shared on a read miss for this block: write back the block; abort memory access.]

    62/89

    Some Terminologies

    Polling: a process periodically checks if there is a message that it needs to handle. This method of awaiting a message is called polling. Polling reduces processor utilization.

    Interrupt: a process is notified when a message arrives using a built-in interrupt register. Interrupts increase processor utilization in comparison to polling.

    Synchronous: a process sends a message and waits for the response to come before sending another message or carrying out other tasks. This way of waiting is referred to as synchronous communication.

    Asynchronous: a process sends a message and continues to carry out other tasks while the requested message is processed. This is referred to as asynchronous communication.


    63/89

    Communication Infrastructure

    Multiprocessor systems with shared memory can have two types of communication infrastructure:
    - Shared Bus
    - Interconnect

    64/89

    Directory Based Protocol

    [Figure: nodes 1, 2, ..., n-1, n, each with its own directory, connected by an interconnect.]

    65/89

    State of the Block

    Shared: one or more processors have the block cached, and the value in memory is up to date.

    Uncached: no processor has a copy of the block.

    Exclusive: exactly one processor has a copy of the cache block, and it has written the block, so the memory copy is out of date. The processor is called the owner of the block.

    66/89

    Local, Remote, Home Node

    Local node: the node where the request originates.

    Home node: the node where the memory location and the directory entry of an address reside.

    Remote node: a node that has a copy of a cache block.

    67/89

    68/89

    Uncached State Operation

    Read miss: the requesting processor is sent the requested data from memory, and the requestor is made the only sharing node. The state of the block is made shared.

    Write miss: the requesting processor is sent the requested data and becomes the sharing node. The block is made exclusive to indicate that the only valid copy is cached. Sharer indicates the identity of the owner.

    69/89

    Exclusive State Operation

    Read miss: the owner processor is sent a data fetch message, which causes the state of the block in the owner's cache to transition to shared and causes the owner to send the data to the directory, where it is written to memory and sent back to the requesting processor. The identity of the requesting processor is added to the set Sharer, which still contains the identity of the processor that was the owner (since it still has a readable copy).

    Data write back: the owner processor is replacing the block and therefore must write it back. This write back makes the memory copy up to date (the home directory essentially becomes the owner), the block is now uncached, and the Sharer set is empty.

    Write miss: the block has a new owner. A message is sent to the old owner causing the cache to invalidate the block and send the value to the directory, from which it is sent to the requesting processor, which becomes the new owner. Sharer is set to the identity of the new owner, and the state of the block remains exclusive.

    70/89

    Cache State Transition

    [Figure: directory state transitions over the states Uncached, Shared (read only), and Exclusive (read/write).
    - Uncached -> Shared on a read miss: data value reply; Sharers = {P}.
    - Uncached -> Exclusive on a write miss: data value reply; Sharers = {P}.
    - Shared -> Shared on a read miss: data value reply; Sharers += {P}.
    - Shared -> Exclusive on a write miss: invalidate Sharers; data value reply; Sharers = {P}.
    - Exclusive -> Uncached on a data write back: Sharers = {}.
    - Exclusive -> Shared on a read miss: fetch; data value reply; Sharers += {P}.
    - Exclusive -> Exclusive on a write miss: fetch/invalidate; data value reply; Sharers = {P}.]

    71/89

    True Sharing Miss & False Sharing Miss

    Time  Processor P1  Processor P2
    1     Write X1
    2                   Read X2
    3     Write X1
    4                   Write X2
    5     Read X2

    72/89

    Synchronization

    We need a set of hardware primitives with the ability to atomically read and modify a memory location.


    73/89

    Example 1: Atomic Exchange

    Interchanges a value in a register for a value in memory.

    You could build a lock with this. If the memory location contains 1, then the lock is on.

    [Figure: register A holding 0 is swapped with a main-memory word holding 1.]

    74/89

    Other Operations

    Test-and-Set

    Fetch-and-Increment

    Load Linked & Set Conditional

    75/89

    Load Linked & Set Conditional

    Load Linked: load linked is a primitive operation that loads the content of a specified location into cache.

    Set Conditional: the set conditional operation is related to the load linked operation by operating on the same memory location. This operation sets the location to a new value only if the location still contains the same value read by load linked; otherwise it will not.

    76/89

    Operation of LL & SC

    try: LL     R2, 0(R1)   ; load linked
         DADDUI R3, R2, #1  ; increment
         SC     R3, 0(R1)   ; store conditional
         BEQZ   R3, try     ; branch if store fails

    The address of the memory location loaded by LL is kept in a register.

    If there is an interrupt or invalidate, then this register is cleared.

    SC checks if this register is zero:
    a. If it is, then SC fails.
    b. Otherwise, it simply stores the content of R3 at that memory address.

    77/89

    78/89

    79/89

    80/89

    Multiprocessor System

    Uses cache coherence: bus-based systems or interconnect-based systems.

    The coherence system arbitrates writes (invalidation/write distribute).

    Thus, it serializes writes!

    81/89

    Spin Locks

    Locks that a processor continuously tries to acquire, spinning around a loop until it succeeds.

    Used when a lock is to be held for a short amount of time.

    82/89

    Spin Lock Implementation

    lockit: LD     R2, 0(R1)  ; load of lock
            BNEZ   R2, lockit ; not available - spin
            DADDUI R2, R0, #1 ; load locked value
            EXCH   R2, 0(R1)  ; swap
            BNEZ   R2, lockit ; branch if lock wasn't 0
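The loop above can be mimicked in Python to see why the swap alone suffices to build a lock. This is only a simulation: the `threading.Lock` stands in for the atomicity that the EXCH instruction gets from the hardware, and `time.sleep(0)` plays the role of the spin.

```python
import threading
import time

_mem = {"lock": 0}          # toy memory: 0 = lock free, 1 = lock held
_atomic = threading.Lock()  # models the hardware's atomic-swap guarantee

def exchange(addr, new):
    """Atomic swap, like EXCH: store new, return the old value."""
    with _atomic:
        old = _mem[addr]
        _mem[addr] = new
    return old

def spin_lock(addr):
    # Keep swapping 1 in until the value swapped out was 0 (lock was free).
    while exchange(addr, 1) != 0:
        time.sleep(0)  # yield so the lock holder can make progress

def spin_unlock(addr):
    _mem[addr] = 0  # a plain store releases the lock

counter = 0

def worker():
    global counter
    for _ in range(100):
        spin_lock("lock")
        counter += 1  # critical section
        spin_unlock("lock")

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert counter == 200  # no increments lost
```

The final assert holds because every increment happens between a successful swap and the releasing store.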

    83/89

    Cache Coherence Steps

    Step 1: P0 has the lock; P1 and P2 spin, testing if lock = 0. Lock state: shared. Bus/directory activity: none.

    Step 2: P0 sets lock to 0; P1 and P2 receive invalidates. Lock state: exclusive (P0). Activity: write invalidate of the lock variable from P0.

    Step 3: P1 and P2 take cache misses. Lock state: shared. Activity: bus/directory services P2's cache miss; write back from P0.

    Step 4: P1 waits for the bus; P2 sees lock = 0. Lock state: shared. Activity: cache miss for P2 satisfied.

    Step 5: P1 sees lock = 0; P2 executes the swap, gets a cache miss. Lock state: shared. Activity: cache miss for P1 satisfied.

    Step 6: P1 executes the swap, gets a cache miss; P2 completes the swap, returns 0 and sets lock = 1. Lock state: exclusive (P2). Activity: bus/directory services P2's cache miss; generates invalidate.

    Step 7: P1's swap completes and returns 1, and sets lock = 1; P2 enters the critical section. Lock state: exclusive (P1). Activity: bus/directory services P1's cache miss; generates write back.

    Step 8: P1 spins, testing if lock = 0. Activity: none.

    84/89

    Multi-threading

    Multi-threading - why and how?

    Fine-grain:
    - Almost round robin
    - High overhead (compared to coarse grain)
    - Better throughput than coarse grain

    Coarse-grain:
    - Threads switched only on long stalls (L2 cache miss)
    - Low overhead
    - Individual threads get better performance

    85/89

    Reading Assignment

    Memory Consistency Notes (Answers)

    Problem on page 596

    An example that uses LL (Load Linked) and SC (Set Conditional).

    86/89

    87/89

    IETF - Internet Engineering Task Force

    Loosely self-organized groups (working groups).

    Two types of documents: I-Ds and RFCs.

    An I-D's life is shorter compared to an RFC's life.

    RFCs include proposed standards and standards.

    RFC examples: RFC 1213, RFC 2543

    I-D -> Proposed Standard -> Standard

    88/89

    SIP Evolution

    SIP I-D v1 '97, v2 '98

    SIP RFC 2543 '99

    SIP RFC 3261 '02, obsoletes RFC 2543

    SIP working groups: SIPPING, SIMPLE, PINT, SPIRITS

    SIPit - SIP interoperability test


    89/89

    Predictor for SPEC92 Benchmarks

    [Chart: instructions between mispredictions (y-axis, 0-300) for a predicted scheme versus a profile-based scheme, across SPEC92 benchmarks: compress, eqntott, espresso, gcc, li, doduc, ear, hydro2d, mdljdp, su2cor.]


Recommended