Date posted: 08-Aug-2015
Category: Engineering
Uploaded by: pietro-piscione
Cache coherence and consistency models in multiprocessor architecture
Computer Architecture
Authors: Pietro Piscione, Alessio Villardita
Degree: Computer Engineering, A.Y. 2014/2015
Introduction
● Multiprocessor architecture overview
● Coherence vs. Consistency
○ Coherence protocols
○ Snooping and Directory models
○ Consistency models
○ Sequential Consistency

Why multiprocessor architecture?
● Clock frequency wall
● More throughput
● More efficiency

● Shared memory
● Distributed memory
The bus is the bottleneck
More processors (16 threads)
More cache memory (20 MB L3)
More complexity (1.86 billion transistors)
i7-990x (2011): 12 threads, 12 MB cache, 1.17 billion transistors
Cache design factors
Traditionally, memory-hierarchy designers focused on:
● Optimizing average memory access time
● Miss rate
● Miss penalty
More recently, power consumption has become a major consideration.
Coherence and consistency: they are different
● The cache coherence model specifies HOW memory accesses are coordinated among CPUs.
● The cache consistency model specifies WHEN a memory write becomes visible to another CPU.
Cache coherence: definition
“For any given memory location, at any given (logical) time, there is either a single core that may write it (and that may also read it) or some number of cores that may read it.”
Two fundamental invariants:
● Single-Writer–Multiple-Reader (SWMR)
● Data-Value
Cache coherence: epochs
● A given memory location’s lifetime is divided into epochs
● SWMR alone is not enough: the Data-Value invariant is also needed
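The two invariants above can be made concrete with a small checker. This is an illustrative sketch, not a real simulator: the epoch-trace format (`kind`, set of cores, value at entry, value at exit) is an assumption chosen to mirror the slide's read-write / read-only epochs.

```python
# Hypothetical checker for the SWMR and Data-Value invariants over a trace
# of per-location epochs. The trace format is an illustrative assumption.

def check_epochs(epochs):
    """Each epoch is (kind, cores, value_in, value_out).
    kind: 'RW' (single writer) or 'RO' (readers only).
    SWMR: an 'RW' epoch has exactly one core.
    Data-Value: each epoch starts with the value the previous epoch left."""
    last_value = None
    for kind, cores, value_in, value_out in epochs:
        if kind == 'RW' and len(cores) != 1:
            return False          # SWMR violated: more than one writer
        if last_value is not None and value_in != last_value:
            return False          # Data-Value violated: stale value observed
        last_value = value_out
    return True

# One location's lifetime: core 0 writes 5, cores 1-2 read 5, core 1 writes 7.
trace = [('RW', {0}, 0, 5), ('RO', {1, 2}, 5, 5), ('RW', {1}, 5, 7)]
```

A trace with two simultaneous writers, or a read-only epoch that observes a value no write epoch produced, fails the check.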
Coherence controller behavior
● Accepts loads and stores from the core and returns load values to it
● Initiates a coherence transaction when a cache miss occurs, by issuing a coherence request for the block the core asked for
● Receives coherence requests and coherence responses and processes them
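The controller's three duties can be sketched as a toy class. This is an assumption-laden illustration: the message names (GetS, GetM, Inv) follow common textbook convention, the `Memory` stand-in plays the role of the interconnect plus main memory, and write-back of dirty data is omitted for brevity.

```python
# Illustrative sketch of a cache-side coherence controller's three duties:
# accept core requests, issue a coherence request on a miss, and process
# incoming coherence messages. Not a real design; names are assumptions.

class CacheController:
    def __init__(self, bus):
        self.bus = bus          # interconnect used to issue requests
        self.cache = {}         # block -> value for blocks currently held

    def load(self, block):
        if block not in self.cache:                # miss: start a transaction
            self.cache[block] = self.bus.request('GetS', block)
        return self.cache[block]                   # return the value to the core

    def store(self, block, value):
        if block not in self.cache:
            self.bus.request('GetM', block)        # obtain write permission
        self.cache[block] = value

    def snoop(self, msg, block):
        if msg == 'Inv' and block in self.cache:
            del self.cache[block]                  # invalidate our copy

class Memory:
    """Stand-in interconnect + memory that answers GetS/GetM requests."""
    def __init__(self):
        self.data = {}
    def request(self, msg, block):
        return self.data.get(block, 0)
```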
Coherence Protocols: basics
When a write occurs to a specific address, what happens next? Two alternatives:
● Write invalidate (most common): invalidate all other copies
● Write update (broadcast): update all the cached copies
Invalidate vs. Update protocols
Invalidate:
● One message to achieve coherence
● Significantly less bandwidth
● Easy to implement
Update:
● Lower read latency
● Larger messages
● More bandwidth
● More complex implementation
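The bandwidth gap between the two policies can be made concrete with a back-of-the-envelope model. The message sizes below (8-byte address, 64-byte block) are illustrative assumptions; the point is that invalidate pays once per sharing episode while update pays on every write.

```python
# Toy bandwidth comparison of the two write policies. Sizes are
# illustrative assumptions, not measurements.

ADDR_BYTES, BLOCK_BYTES = 8, 64

def total_bytes(policy, n_writes):
    """Bytes put on the interconnect when one core performs n_writes
    consecutive writes to a block that other caches are sharing."""
    if policy == 'invalidate':
        # The first write invalidates the sharers (address-only message);
        # later writes hit in the now-exclusive copy and send nothing.
        return ADDR_BYTES
    else:  # 'update'
        # Every write broadcasts address + new data to the sharers.
        return n_writes * (ADDR_BYTES + BLOCK_BYTES)
```

For a burst of ten writes the invalidate policy moves 8 bytes against the update policy's 720, which is why the slide calls its bandwidth "significantly less".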
Coherence Protocols: basics
● Directory based: the sharing status of physical memory blocks is stored in one centralized location
● Snooping: every cache tracks the sharing status of the blocks of physical memory it holds
Snooping protocol: main features
● Distributed architecture
● Message broadcasting
● Not very scalable
● Total order of coherence requests across all blocks
● The interconnection network must serialize these requests into some total order
Snooping protocol: Write Invalidate
● Write to shared data:
○ An invalidate is sent to all caches, which snoop and invalidate any copy
● Read miss:
○ Write-through: memory is always up to date
○ Write-back: force the other cache to update the copy in main memory, then snoop that value
Can use a separate invalidate bus for write traffic
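The write-invalidate scheme above can be sketched as caches snooping a shared bus. This is a minimal sketch assuming the write-through case (so memory is always up to date on a read miss); class and method names are illustrative.

```python
# Sketch of write-invalidate snooping over a broadcast bus, assuming
# write-through caches. All names are illustrative assumptions.

class Bus:
    def __init__(self):
        self.caches = []
    def broadcast_invalidate(self, writer, addr):
        for c in self.caches:
            if c is not writer:
                c.lines.pop(addr, None)   # every cache snoops, drops its copy

class SnoopingCache:
    def __init__(self, bus, memory):
        self.bus, self.memory, self.lines = bus, memory, {}
        bus.caches.append(self)
    def write(self, addr, value):
        self.bus.broadcast_invalidate(self, addr)  # invalidate other copies
        self.lines[addr] = value
        self.memory[addr] = value                  # write-through
    def read(self, addr):
        if addr not in self.lines:                 # read miss
            self.lines[addr] = self.memory[addr]   # memory is up to date
        return self.lines[addr]
```

After one cache writes, a reader that previously held the block misses and re-fetches the new value, which is exactly the invalidate behavior described above.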
Snooping protocol: Write Update
● Write to shared data:
○ Broadcast on the bus; processors snoop and update their copies
● Read miss:
○ Memory is always up to date
● Higher bandwidth (data + address are transmitted), but lower latency for readers (behaves like a write-through cache)
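For contrast, here is the write-update counterpart as the same kind of toy sketch: a write broadcasts the new value, and snooping caches refresh rather than drop their copies, so a subsequent read hits locally. Names remain illustrative assumptions.

```python
# Counterpart sketch for write-update snooping: the larger broadcast
# message carries the data, and sharers update their copies in place.

class UpdateBus:
    def __init__(self):
        self.caches = []
    def broadcast_update(self, writer, addr, value):
        for c in self.caches:
            if c is not writer and addr in c.lines:
                c.lines[addr] = value        # refresh the cached copy

class UpdateCache:
    def __init__(self, bus, memory):
        self.bus, self.memory, self.lines = bus, memory, {}
        bus.caches.append(self)
    def write(self, addr, value):
        self.lines[addr] = value
        self.memory[addr] = value                     # keep memory current
        self.bus.broadcast_update(self, addr, value)  # address + data on bus
    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]
```

Unlike the invalidate sketch, the reader's line stays valid across the remote write, which is the "lower latency for readers" trade-off noted above.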
Directory protocol: basic idea
● Global view of cache states
● Centralized in a directory
● Unicast messages
● More scalable

When the directory receives a message, what happens? Reply or forward. Possible cases:
● One request → one reply
● One request → K forwards → K replies
● Point-to-point ordering
Directory protocol: example
1. The requestor sends GetM to the directory
2. The directory sends an AckCount to the requestor
3. The directory sends K Invalidate messages to the sharers
4. The sharers send an AckInv to the requestor
5. The requestor modifies the block
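The five-step GetM transaction above can be sketched as synchronous message passing. This is an illustrative model, not a real protocol implementation: message and field names follow the slide, and the "wait for all AckInvs" step collapses to a check because the sketch is single-threaded.

```python
# The five-step GetM transaction, sketched as message passing between a
# directory and caches. Names are illustrative assumptions.

class Directory:
    def __init__(self):
        self.sharers = {}                            # block -> set of caches

    def get_m(self, requestor, block):
        sharers = self.sharers.get(block, set()) - {requestor}
        requestor.ack_count = len(sharers)           # step 2: AckCount reply
        for s in sharers:                            # step 3: K invalidates
            s.invalidate(block, requestor)
        self.sharers[block] = {requestor}            # requestor now owns it

class Cache:
    def __init__(self):
        self.blocks, self.ack_count, self.acks = {}, None, 0

    def invalidate(self, block, requestor):
        self.blocks.pop(block, None)
        requestor.acks += 1                          # step 4: AckInv to requestor

    def write(self, directory, block, value):
        directory.get_m(self, block)                 # step 1: GetM request
        assert self.acks == self.ack_count           # all AckInvs collected
        self.blocks[block] = value                   # step 5: modify the block
```

Note that the sharers acknowledge the requestor directly (point-to-point), not the directory, which is why the requestor needs the AckCount to know when it may write.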
Snooping vs. Directory coherence
Snooping solution (snoopy bus):
● Send all requests for data to all processors (broadcast)
● Scaling limited by cache-miss and write traffic saturating the bus

Directory-based schemes:
● Send point-to-point requests to processors (unicast)
● Keep track of what is being shared in a directory
● Distributed memory → distributed directory (reducing bottlenecks)
Hybrid Designs
There are protocols that combine aspects of:
● Snooping and directory protocols
● Invalidate and update protocols
achieving the advantages of both approaches.
Consistency model: definition
(aka memory consistency model, or simply memory model)
● A specification of the allowed behaviors of multithreaded programs executing with shared memory
● Multiple correct behaviors are usually allowed
One fundamental cause of this: out-of-order execution
Cores might reorder memory accesses
Sequential execution model (von Neumann): operations to the same address usually execute in the original program order.
Possible reorderings (to different addresses):
● Store-Store: non-FIFO write buffer
● Load-Load
● Load-Store and Store-Load: local bypass
Multiple executions allowed → non-determinism
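The non-determinism can be seen in the classic "store buffering" litmus test: core 0 runs {x=1; r1=y} and core 1 runs {y=1; r2=x}. Enumerating every interleaving that respects each core's program order shows which outcomes sequential consistency permits; the outcome (r1, r2) = (0, 0) requires exactly the Store-Load reordering a write buffer introduces. The sketch below is an illustration of this enumeration, not a memory-model verifier.

```python
# Enumerate the sequentially consistent outcomes of the store-buffering
# litmus test: core 0 runs {x=1; r1=y}, core 1 runs {y=1; r2=x}.

from itertools import permutations

def sc_outcomes():
    prog0 = [('store', 'x'), ('load', 'y', 'r1')]
    prog1 = [('store', 'y'), ('load', 'x', 'r2')]
    outcomes = set()
    # Every interleaving that keeps per-core program order: choose which
    # two of the four global slots core 0 occupies.
    for slots in permutations([0, 0, 1, 1]):
        mem, regs, idx = {'x': 0, 'y': 0}, {}, [0, 0]
        for core in slots:
            prog = prog0 if core == 0 else prog1
            op = prog[idx[core]]
            idx[core] += 1
            if op[0] == 'store':
                mem[op[1]] = 1
            else:
                regs[op[2]] = mem[op[1]]
        outcomes.add((regs['r1'], regs['r2']))
    return outcomes
```

Three outcomes are possible under sequential consistency; (0, 0) never appears, so observing it on real hardware is direct evidence of a Store-Load reordering.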
(Figure: pending stores S1, S2, S7 sit in the write buffer while a read of R1 bypasses them)
Sequential consistency: basic idea
● “The result of an execution is the same as if the operations had been executed in the order specified by the program.” (Lamport, 1979)
● Memory order must respect program order
● Every load gets its value from the last store to the same address before it (in global memory order)
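Lamport's two conditions can be turned into a small checker: given each core's program and a proposed global memory order, verify that (1) the global order respects every program order and (2) each load returns the value of the most recent store to its address. The trace format (tuples of kind, address, value, with the load's value being what it observed) is an assumption made for illustration.

```python
# Checker for the two sequential-consistency conditions above. The trace
# format is an illustrative assumption; each op tuple must be unique.

def is_sequentially_consistent(global_order, programs):
    # (1) the global memory order must respect each core's program order
    pos = {op: i for i, op in enumerate(global_order)}
    for prog in programs:
        for a, b in zip(prog, prog[1:]):
            if pos[a] > pos[b]:
                return False
    # (2) every load sees the last store to its address in the global order
    mem = {}
    for kind, addr, val in global_order:
        if kind == 'st':
            mem[addr] = val
        elif mem.get(addr, 0) != val:    # memory starts zeroed
            return False
    return True
```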
Sequential consistency: Atomicity
● Need for instructions that atomically perform a read–modify–write (e.g. test-and-set)
● Simplistic approach: the core effectively locks the whole memory system → sacrifices performance
● Aggressive approach: the read–modify–write only needs to appear as a single point in the total memory order
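A test-and-set primitive is enough to build a spinlock. Python has no hardware test-and-set, so the sketch below simulates its atomicity with a helper lock; the spinlock structure built on top of it is the point, and all names are illustrative.

```python
# Sketch of a spinlock built on an atomic test-and-set. The hardware
# atomicity is simulated here with threading.Lock; everything else
# follows the classic test-and-set spinlock structure.

import threading

class TestAndSetLock:
    def __init__(self):
        self.flag = 0
        self._atomic = threading.Lock()   # stands in for hardware atomicity

    def test_and_set(self):
        with self._atomic:                # read-modify-write as one unit
            old, self.flag = self.flag, 1
            return old

    def acquire(self):
        while self.test_and_set() == 1:   # spin until we flip 0 -> 1
            pass

    def release(self):
        self.flag = 0

counter = 0
lock = TestAndSetLock()

def worker():
    global counter
    for _ in range(1000):
        lock.acquire()
        counter += 1                      # critical section
        lock.release()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the lock the four threads' read-modify-writes on `counter` could interleave and lose updates; with it, all 4000 increments survive.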