Implementation and Verification of a Cache Coherence Protocol Using Spin
Steven Farago
Goal
• To use Spin to design a “plausible” cache coherence protocol
– Introduce nothing in the Spin model that would not be realistic in hardware (e.g. instant global knowledge shared between unrelated state machines)
• To verify the correctness of the protocol
Background
• Definition: A cache is a small, high-speed memory used by a single processor. All of that processor’s memory accesses go through the cache.
• Problem:
– In a multiprocessor system, each processor could have its own cache.
– Each cache could contain (potentially different) data for the same addresses.
– Given this, how can we ensure that all processors see a consistent picture of memory?
Coherence protocol
• A coherence protocol specifies how caches communicate with processors and with each other so that processors have a predictable view of memory.
• Caches that always provide this “predictable view of memory” are said to be coherent.
A Definition of Coherence
• A “view of memory” is coherent if the following property holds:
– Given cacheline A, no two processors may see storage accesses to A in conflicting orders.
– Example (Processor 0 performs both stores; Processors 1, 2 and 3 each load A twice):
    Processor 0    Processor 1    Processor 2    Processor 3
    Store A, 0     Load A, 0      Load A, 0      Load A, 1
    Store A, 1     Load A, 1      Load A, 0      Load A, 0
                   Coherent       Coherent       NOT coherent
• Informally, a processor may not see “old” data after seeing “new” data.
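The example can be checked mechanically. The sketch below is in Python rather than the Promela of the Spin model, and `is_coherent` is a hypothetical helper. It assumes a single writer with distinct stored values, so the order of Processor 0's stores serves as the common order that every processor's loads must respect: a processor's loads are coherent if the values it observes never move backwards in that order.

```python
def is_coherent(store_order, observed_loads):
    """Return True if one processor's loads never go from newer data back to older data.

    store_order: values written to cacheline A, in the order the stores happened.
    observed_loads: values returned to one processor's successive loads of A.
    Assumes every stored value is distinct, so a value identifies its store.
    """
    version = {value: i for i, value in enumerate(store_order)}
    seen = [version[v] for v in observed_loads]
    # Coherent iff the observed versions never decrease ("old" after "new" is forbidden).
    return all(a <= b for a, b in zip(seen, seen[1:]))


stores = [0, 1]  # Processor 0: Store A, 0 then Store A, 1
for name, loads in [("Processor 1", [0, 1]), ("Processor 2", [0, 0]), ("Processor 3", [1, 0])]:
    print(name, "coherent" if is_coherent(stores, loads) else "NOT coherent")
```

Processor 2 only ever sees the older value, which is allowed; Processor 3 sees the new value and then the old one, which is exactly what the definition rules out.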
Standard Coherence Protocol
• MESI (Modified, Exclusive, Shared, Invalid): a standard protocol intended to guarantee cache coherence
• Each cacheline in a cache is marked with one of these states.
• Cacheline accesses are allowed only if the cache states are “correct” with respect to the coherence protocol.
• Examples:
– A cacheline marked “Invalid” may not provide data to a processor.
– Cacheline data may not be updated unless the line is in the Exclusive or Modified state.
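The two example rules can be written as predicates over a line’s state. This is a minimal Python sketch, not code from the Spin model; the names `Mesi`, `may_supply_data`, `may_update_data` and `store` are hypothetical. It also encodes the standard MESI consequence that a permitted store leaves the line Modified.

```python
from enum import Enum


class Mesi(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def may_supply_data(state):
    # An Invalid line may not provide data to the processor.
    return state is not Mesi.INVALID


def may_update_data(state):
    # Line data may be written only in the Exclusive or Modified state.
    return state in (Mesi.EXCLUSIVE, Mesi.MODIFIED)


def store(state):
    """Return the line's state after a processor store, or raise if the store is not permitted."""
    if not may_update_data(state):
        raise ValueError(f"store not allowed in {state.name}; request permission first")
    return Mesi.MODIFIED  # Exclusive -> Modified; Modified stays Modified


for s in Mesi:
    print(s.name, "supply:", may_supply_data(s), "update:", may_update_data(s))
```

A store to a Shared or Invalid line raises here; in the protocol, that is the point at which the cache must ask the memory controller for permission.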
System Model
• Initial version
• Three state machines
– ProcessorModel: Nondeterministically issues Loads and Stores to its cache forever
– CacheModel: Two parts, initially combined into a single process
  • MainCache: Services processor requests
  • Snooper: Responds to messages from the memory controller
– MemoryController: Services requests from each cache and maintains coherence among all caches
[Figure: system block diagram showing the MemoryController and two Processor / MainCache / Snooper groups]
ProcessorModel
• Simple
• Continually issues Load/Store requests to its associated cache
– Communication is done via the bus model
– Read (Load) requests are blocking
• Coherence is checked each time a Load receives data (via a Spin assert statement)
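A rough Python analogue of ProcessorModel, for illustration only: Spin would loop forever and pick Load or Store nondeterministically, whereas this sketch bounds the loop and draws the choice from a seeded random generator. The names `processor_model` and `toy_cache`, the message tuples, and the one-bit value `NEW` (after the data model described later in the slides) are all assumptions. The toy cache is a single-cache stand-in, so it only shows the structure: a blocking Load followed by the assertion.

```python
import queue
import random
import threading

NEW = 1  # one-bit data value; every store writes NEW (hypothetical encoding)


def processor_model(requests, replies, steps, rng):
    """Bounded, hypothetical analogue of ProcessorModel."""
    for _ in range(steps):
        if rng.random() < 0.5:  # stands in for Spin's nondeterministic choice
            requests.put(("STORE", NEW))
        else:
            requests.put(("LOAD", None))
            data = replies.get()  # a Load blocks until the cache replies
            assert data == NEW, "Load returned stale data"  # the coherence check
    requests.put(("STOP", None))


def toy_cache(requests, replies):
    """Single-cache stand-in that simply remembers the last stored value."""
    value = NEW
    while True:
        op, data = requests.get()
        if op == "STOP":
            return
        if op == "STORE":
            value = data
        else:
            replies.put(value)


# Depth-1 queues, mirroring the depth-1 Spin queues used for the busses.
requests, replies = queue.Queue(maxsize=1), queue.Queue(maxsize=1)
cache = threading.Thread(target=toy_cache, args=(requests, replies))
cache.start()
processor_model(requests, replies, steps=100, rng=random.Random(0))
cache.join()
```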
CacheModel
• Two parts: MainCache and Snooper
– MainCache services ProcessorModel Load and Store requests and contacts the MemoryController when it encounters an “invalid” cache state
– Snooper services independent requests from the MemoryController, which the MemoryController needs in order to coordinate coherence responses
MemoryControllerModel
• Responsible for servicing Cache requests
• Three types of requests
– Data request: the cache requires up-to-date data to supply to its processor
– Permission-to-store: a cache may not transition to the Modified state without the MemoryController’s (MC’s) permission
– A combination of these two
• Any type of request may require the MC to communicate with all caches in the system (via their Snooper processes) to ensure coherence
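The slides do not give the MC’s transition rules, so the following is only one plausible reading that follows textbook MESI, written in Python. `Request` and `service` are hypothetical names; the function consults every other cache (as the MC would through their Snoopers) and returns the new state of each cache. Moving the data itself (supplying or writing back the line) is not modeled.

```python
from enum import Enum, auto


class Request(Enum):
    DATA = auto()                 # cache needs up-to-date data for its processor
    PERMISSION_TO_STORE = auto()  # cache wants to enter the Modified state
    DATA_AND_PERMISSION = auto()  # both at once


def service(states, requester, request):
    """Hypothetical MC handler: return the new MESI state ('M', 'E', 'S', 'I') of every cache."""
    new = list(states)
    others = [i for i in range(len(states)) if i != requester]
    if request is Request.DATA:
        holders = [i for i in others if states[i] != "I"]
        for i in holders:
            new[i] = "S"                          # other valid copies drop to Shared
        new[requester] = "S" if holders else "E"  # a sole copy may be held Exclusive
    else:
        for i in others:
            new[i] = "I"                          # storing requires every other copy invalid
        new[requester] = "M"
    return new


print(service(["I", "M", "I"], 0, Request.DATA))                 # ['S', 'S', 'I']
print(service(["S", "S", "S"], 1, Request.PERMISSION_TO_STORE))  # ['I', 'M', 'I']
```

Both branches touch every other cache, which is why any request may require the MC to talk to all Snoopers.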
Implementation of Busses
• All processes represent independent state machines, so a communication mechanism is needed.
• Spin queues (channels) of depth 1 are used to simulate the busses.
• Because reads from Spin queues are destructive and blocking, a global bool is needed to indicate bus activity (so that processes can poll the bus).
– A global shared between processes is acceptable here because it only makes up for differences between Spin queues and real busses.
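A minimal Python sketch of one bus under the approach described above: a one-message slot standing in for a depth-1 Spin queue, plus an `active` flag playing the role of the global bool. The class and method names are hypothetical. `receive()` consumes the message and blocks when the slot is empty, so a process that only wants to check for traffic calls `poll()`. Slot and flag are updated under one lock so they can never disagree, roughly as a Spin send or receive is a single atomic step.

```python
import threading


class Bus:
    """Hypothetical depth-1 bus: one message slot plus an 'active' flag for polling."""

    def __init__(self):
        self._cond = threading.Condition()
        self._slot = None
        self.active = False  # the global "bus activity" bool

    def send(self, msg):
        with self._cond:
            while self.active:  # depth 1: wait until the previous message is taken
                self._cond.wait()
            self._slot, self.active = msg, True
            self._cond.notify_all()

    def poll(self):
        with self._cond:
            return self.active  # non-destructive, never blocks

    def receive(self):
        with self._cond:
            while not self.active:  # blocking read
                self._cond.wait()
            msg, self._slot, self.active = self._slot, None, False  # destructive read
            self._cond.notify_all()
            return msg


bus = Bus()
bus.send(("LOAD", "A"))
print(bus.poll())     # True: a message is pending
print(bus.receive())  # ('LOAD', 'A')
print(bus.poll())     # False: the read consumed it
```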
Problems - Part 1
• MainCache and Snooper initially implemented as a single process.
• At each iteration, the process nondeterministically chooses which of the two parts to execute
• Communication between Processor/Cache and Cache/Memory done with blocking queues
• Blocked receive in MainCache --> Snooper cannot execute
• Leads to deadlock in certain situations
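One way to see why combining the two parts can deadlock is a wait-for graph, in which a deadlock is a cycle of processes, each blocked on the next. The Python sketch below assumes one hypothetical scenario consistent with the bullets above: Cache 0 is blocked in its MainCache receive, waiting for the MemoryController, while the MemoryController waits for a snoop response from Cache 0. The process names and `find_deadlock` are illustrative and are not taken from the Spin model.

```python
def find_deadlock(waits_for):
    """Return the processes forming a wait-for cycle, or None if there is none.

    waits_for maps each blocked process to the single process it is waiting on.
    """
    for start in waits_for:
        path, node = [], start
        # Follow "is waiting on" edges until we leave the graph or revisit a node.
        while node in waits_for and node not in path:
            path.append(node)
            node = waits_for[node]
        if node in path:
            return path[path.index(node):]
    return None


# MainCache and Snooper in one process: the blocked receive also blocks snooping.
combined = {"Cache0": "MC", "MC": "Cache0"}
# Separate processes: Snooper0 is not waiting on anything, so it can answer the MC.
split = {"MainCache0": "MC", "MC": "Snooper0"}

print(find_deadlock(combined))  # ['Cache0', 'MC']
print(find_deadlock(split))     # None
```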
Solution 1
• Split MainCache and Snooper into separate processes.
• Both can access “global” cacheData and cacheState variables independently
--> Problems - Part 2
• As separate processes, Snooper and MainCache could change cache state unpredictably.
• Race condition: the Snooper changes cache state/data while MainCache is mid-transaction --> MainCache returns invalidated data to the processor.
Solution 2
• Add a locking mechanism to the cache
– MainCache or Snooper may access the cache only after first locking it
• Locking mechanism: for simplicity, “cheated” by using Spin’s atomic keyword to implement test-and-set on a shared variable.
• Assumption: real hardware would have some similar mechanism available to lock caches.
• Open question: is the revised model still equivalent to the original?
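A test-and-set lock like the one described can be sketched in Python as follows. This is an analogy, not the Promela code: `CacheLock` is a hypothetical name, and the private `threading.Lock` plays the role of Spin’s `atomic` keyword by making the read and the write of the shared flag one indivisible step. MainCache and Snooper would each call `acquire()` before touching the cache data and state, and `release()` afterwards.

```python
import threading
import time


class CacheLock:
    """Hypothetical test-and-set lock guarding one cache's data and state."""

    def __init__(self):
        self._atomic = threading.Lock()  # stands in for Spin's `atomic` block
        self.locked = False              # the shared lock variable

    def test_and_set(self):
        with self._atomic:
            was_locked = self.locked
            self.locked = True
            return was_locked  # False means the caller now owns the cache

    def acquire(self):
        while self.test_and_set():  # spin until the flag was seen clear
            time.sleep(0)           # yield so the other process can run

    def release(self):
        with self._atomic:
            self.locked = False
```

Without the atomic step, MainCache and Snooper could both read `locked == False` and both enter, which is the race the lock is meant to remove.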
--> Problem 3
• The memory controller allows multiple outstanding requests from caches.
• The Snooper of a cache whose MainCache has a request outstanding cannot respond to MC queries about other outstanding requests, because the cacheline is locked.
• Deadlock.
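The deadlock can be reproduced in a small step model. This Python sketch rests on simplifying assumptions that are not stated in the slides: the MC services outstanding requests one at a time in a fixed order; each request needs a snoop answer from every other cache’s Snooper; and a Snooper cannot answer while its own MainCache holds the cache lock waiting for the MC. `run_mc` and its arguments are hypothetical.

```python
def run_mc(requests, lock_holder):
    """Hypothetical step model of the MC servicing outstanding requests in order.

    requests: cache ids with outstanding requests, in the order the MC services them.
    lock_holder: maps each cache id to "MainCache" if its MainCache holds the lock, else None.
    Returns (completed requests, description of where the MC got stuck or None).
    """
    holders = dict(lock_holder)
    done = []
    for requester in requests:
        others = [c for c in holders if c != requester]
        locked_out = [c for c in others if holders[c] == "MainCache"]
        if locked_out:
            return done, f"MC stuck on cache {requester}'s request: Snooper(s) {locked_out} locked out"
        done.append(requester)
        holders[requester] = None  # the requester's MainCache completes and unlocks
    return done, None


# Both caches have requests outstanding and both MainCaches hold their locks.
print(run_mc([1, 0], {0: "MainCache", 1: "MainCache"}))
# Only cache 1 has a request outstanding.
print(run_mc([1], {0: None, 1: "MainCache"}))  # ([1], None)
```

In the first case, the MC waits on Snooper 0, Snooper 0 waits on the lock held by MainCache 0, and MainCache 0 waits on the MC, so no request ever completes.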
Solution 3
• Disallow multiple outstanding Cache/MC transactions.
• Introduce a global bool variable shared across all caches: outstandingBusOp.
• A cache may issue a request to the memory controller only if no request from another cache is outstanding.
• Global knowledge across all caches is unrealistic.
• Open question: is this equivalent to “retries” from the MC?
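The outstandingBusOp rule can be sketched in Python as a gate that every cache consults before issuing. `MemoryControllerGate`, `try_issue` and `complete` are hypothetical names; the check-and-set of the flag is done in one locked step, as a Spin `atomic` block would do, because otherwise two caches could both see the flag clear and issue at once.

```python
import threading


class MemoryControllerGate:
    """Hypothetical sketch: at most one cache/MC transaction in flight system-wide."""

    def __init__(self):
        self._atomic = threading.Lock()
        self.outstanding_bus_op = False  # the global bool visible to every cache

    def try_issue(self, cache_id, request, log):
        with self._atomic:
            if self.outstanding_bus_op:
                return False  # another cache's request is pending: wait and try again
            self.outstanding_bus_op = True
            log.append((cache_id, request))  # request handed to the memory controller
            return True

    def complete(self):
        with self._atomic:
            self.outstanding_bus_op = False  # MC finished; another cache may now issue


gate, log = MemoryControllerGate(), []
print(gate.try_issue(0, "DATA", log))  # True
print(gate.try_issue(1, "DATA", log))  # False: cache 0's request is outstanding
gate.complete()
print(gate.try_issue(1, "DATA", log))  # True
```

The flag is precisely the kind of instant global knowledge the Goal slide wanted to avoid; in hardware it would presumably have to live in a single place such as the MC.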
--> Problem 4
• The previous problems caused Spin simulations to fail within 1000 steps.
• With the last solution, random simulations no longer fail within the first 3000 steps.
• Verification still fails after ~20000 steps.
• Cause of problem as yet unresolved
Verification
• How can coherence be verified in general?
• Verify something stronger: if a processor always sees the newest data available in the system, it can never see a conflicting ordering of that data.
• For all loads, assert that data is “new”
Modeling of Data
• Concern that modeling data as a random integer would blow up the state space and cause Spin to run out of memory
• Model data as a bit with values OLD and NEW.
• All processor Stores store NEW data.
• When a cache transitions to the Modified state, it changes all other copies of the data (in memory and in other caches) to OLD
– This global access to data is strictly part of the verification effort, not of the algorithm, and is therefore allowed.
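The one-bit data model and the Load-time check can be sketched in Python for a single cacheline. `DataModel`, `on_modified`, `fill` and `check_load` are hypothetical names. The assumption that every copy starts out NEW is also made only for this illustration. As in the slides, the global writes in `on_modified` are verification bookkeeping, not part of the protocol.

```python
OLD, NEW = 0, 1


class DataModel:
    """Hypothetical one-bit data bookkeeping: memory and each cache hold OLD or NEW."""

    def __init__(self, n_caches):
        self.memory = NEW
        self.cache = [NEW] * n_caches  # assumption: every copy starts out current

    def on_modified(self, writer):
        # Entering Modified: every other copy, including memory, becomes OLD;
        # the processor's store then leaves NEW in the writer's copy.
        self.memory = OLD
        self.cache = [OLD] * len(self.cache)
        self.cache[writer] = NEW

    def fill(self, reader, source=None):
        # A refill copies the bit from memory, or from another cache if one is given.
        self.cache[reader] = self.memory if source is None else self.cache[source]

    def check_load(self, reader):
        data = self.cache[reader]
        assert data == NEW, f"cache {reader} supplied OLD data"  # the Load-time check
        return data
```

A correct protocol never lets cache 1 supply its stale copy after cache 0 becomes Modified: the line must be Invalid there, and a refill must come from the Modified owner (`fill(1, source=0)`) rather than from the now-OLD memory.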
Debugging
• Debugging parallel processes proved difficult.
• Made much easier by Spin’s message sequence charts
– Graphically show the sends and receives of all messages
– Require the use of Spin queues rather than globals for interprocess communication
Future work
• Make the existing protocol completely bug-free
• Activate additional “features” disabled for debugging purposes (e.g. bus transaction types)
• Verify protocol-specific rules
– No two caches may be simultaneously Modified
– If a cache is Modified or Exclusive, no other cache is Shared
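The two proposed rules are invariants over the vector of cache states for one cacheline. The Python checker below is a hypothetical illustration (`check_invariants` is not part of the model). In the Spin model, such rules would presumably be added as assert statements or as LTL properties checked at every reachable state.

```python
def check_invariants(states):
    """Hypothetical checker for the two proposed rules over one cacheline.

    states: one MESI letter ('M', 'E', 'S', 'I') per cache.
    Returns a list of violated rules (empty if both hold).
    """
    violations = []
    if states.count("M") > 1:
        violations.append("two or more caches Modified")
    for i, s in enumerate(states):
        if s in ("M", "E") and any(t == "S" for j, t in enumerate(states) if j != i):
            violations.append(f"cache {i} is {s} while another cache is Shared")
    return violations


print(check_invariants(["S", "S", "I"]))  # []
print(check_invariants(["M", "M", "I"]))  # ['two or more caches Modified']
print(check_invariants(["E", "S"]))       # ['cache 0 is E while another cache is Shared']
```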