
Post-Silicon Verification under Limited Observability

Date post: 01-Jan-2016
Page 1: Post-Silicon Verification under Limited  Observability

1

Post-Silicon Verification under Limited Observability

Ganesh Gopalakrishnan

School of Computing, University of Utah,

Salt Lake City, UT 84112

Ching Tsun Chou

Intel Corporation, 3600 Juliette Lane,

Santa Clara, CA

Supported in part by NSF award CCR 0219805

Page 2: Post-Silicon Verification under Limited  Observability

2

Why Post-Silicon Verification?

Why verify the silicon? Isn’t doing FV enough? (!)

– FV cannot be applied to entire MP systems yet

• MP systems contain several CPUs and several “chip-sets”

We cannot verify the silicon exhaustively - so why bother?

– Formal analysis applied to particular executions can yield far more insights than ad hoc criteria applied to executions

• e.g. “Runtime Verification” of software (Havelund, Rosu, Lee, ...)

Page 3: Post-Silicon Verification under Limited  Observability

3

Why Post-Silicon Verification?

Runtime verification can cover more!

– 1 GHz in silicon instead of 100 Hz during simulation

– With well-designed “stress tests” one often finds out a lot

Page 4: Post-Silicon Verification under Limited  Observability

4

Where Post-Si Verification fits in the Hardware Verification Flow

[Figure: hardware verification flow, divided into pre-manufacture and post-manufacture phases. Labels shown: Spec, Specification Validation, Design Verification, Testing for Fabrication Faults, Post-Silicon Verification, product. Annotation: “Does functionality match designed behavior?”]

Page 5: Post-Silicon Verification under Limited  Observability

5

More Facts about Post-Silicon Verification

• Post-Si Verification can be for uniprocessor functionality

• .. or to determine if MP Orderings are being obeyed

• ... or to check if cache coherence protocols are behaving

• Directly impacts the time to market

• The industry spends huge amounts of effort in this area

• Great opportunities to apply FV

Page 6: Post-Silicon Verification under Limited  Observability

6

How Formal Methods can enhance Post-Si Verification

• Reduces manual effort

• Helps in test-case selection

• Helps analyze execution results comprehensively

Page 7: Post-Silicon Verification under Limited  Observability

7

Overview of the talk

• How the paradigm for post-Si verification must change

• How Limited Observability impacts post-Si verification

• The use of Constraints

• A paper design for a Post-Si verification system based on constraints

- based on actual experience developing prototypes in an industrial context

• Concluding Remarks

Page 8: Post-Silicon Verification under Limited  Observability

8

Post-Si Verification for Cache Protocol Execution

• PRESENT-DAY

• Assume there is a “front-side bus”

• Record bus transactions in response to test programs

• Generate detailed cache states from bus transactions

• See if behavior matches the cache coherence protocol that was supposedly realized

[Figure: several CPUs (cpu, cpu, cpu, ...) and memory (mem) on a “Front-side Bus”]

Page 9: Post-Silicon Verification under Limited  Observability

9

Post-Si Verification for Cache Protocol Execution

• Future

• CANNOT assume there is a “front-side bus”

• CANNOT record all link traffic

• CAN ONLY generate sets of possible cache states

• HOW BEST can one match against designed behavior?

[Figure: four CPUs; some “miss” traffic is visible and some is invisible]

Page 10: Post-Silicon Verification under Limited  Observability

10

Potential Carry-over of Techniques

Runtime verification of distributed embedded systems

• Hundreds of processors, FPGAs, SoCs, ... interacting

• Cannot assume system will work correctly on its own

• Must detect onset of crashes, intrusions, ... EARLY

• Cannot easily observe all the nodes

• Even if observable, information corrupts

- bandwidth limitations (need to compress / discard)

- time uncertainties

Page 11: Post-Silicon Verification under Limited  Observability

11

Back to our specific problem domain...

Verify the operation of systems at runtime

when we can’t see all transactions

Could also be offline analysis of a partial log of activities

[Figure: events labeled a, b, x, y, c, d, with the sequence “a x c d y b …”]

Page 12: Post-Silicon Verification under Limited  Observability

12

Possible Outcomes of Post-Si Verification

Observed Behavior is “Definitely wrong”

“Potentially dangerous” (rely on statistics to give this verdict?)

“Worth noting” (based on past experience and bug logs?)

…..

“Totally benign” (not even worth noting event)

Caveat: we are partially observing a potentially incorrect system

Page 13: Post-Silicon Verification under Limited  Observability

13

Concrete example: Coherence Protocol Verification

[Figure: Requester, Home, and Potential Owners, with packets labeled req, sreq, and sresp, plus “Retries or Completion” and “Direct Supply of Data”]

Page 14: Post-Silicon Verification under Limited  Observability

14

Packet encodings, and example trace-file

[Figure: Req, Home, and Users, with packets labeled req, sreq, and resp]

Packet formats:

req / sreq: Pkt_type mid tid sender dest addr data

resp: Pkt_type mid tid sender dest data

• All the packets pertaining to a transaction share the same mid and tid

• Address not shipped with responses

A transaction and the various packets it may involve:

req, first-snoop-req, subseq-snoop-reqs, subseq-snoop-resps, Data, Completion
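
The packet layouts on this slide can be sketched as a small record type, together with the grouping step that the shared mid/tid makes possible. This is a minimal sketch, not the authors' trace format: the field types, the example values, and the helper names `group_by_transaction` and `transaction_address` are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    pkt_type: str                # "req", "sreq" or "resp", as named on the slide
    mid: int                     # field types are assumed, not given by the slide
    tid: int
    sender: str
    dest: str
    addr: Optional[int] = None   # responses do not carry an address
    data: Optional[int] = None


def group_by_transaction(trace):
    """Group an interleaved trace by (mid, tid), keeping trace order."""
    groups = defaultdict(list)
    for pkt in trace:
        groups[(pkt.mid, pkt.tid)].append(pkt)
    return dict(groups)


def transaction_address(packets):
    """Recover the address from any address-bearing packet of a transaction.

    Returns None when every address-bearing packet was invisible.
    """
    for pkt in packets:
        if pkt.addr is not None:
            return pkt.addr
    return None


# Hypothetical three-packet trace with two transactions interleaved.
trace = [
    Packet("req", mid=1, tid=7, sender="cpu0", dest="home", addr=0x40),
    Packet("req", mid=2, tid=3, sender="cpu1", dest="home", addr=0x80),
    Packet("resp", mid=1, tid=7, sender="cpu2", dest="cpu0", data=99),
]
groups = group_by_transaction(trace)
```

Because responses omit the address, a response can only be tied to a cache line through the request that shares its mid and tid; if that request was not observed, the address stays unknown.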

Page 15: Post-Silicon Verification under Limited  Observability

15

The actual trace-file is an interleaving of the packets of all active transactions:

The actual trace-file analyzed looks something like this:

The transactions may pertain to the same address (or not); many of the shown events may be missing…

Individual transactions and their possible temporal overlap

Page 16: Post-Silicon Verification under Limited  Observability

16

Transaction (packet) semantics:

[Figure: Requester and Potential Owners, with each packet labeled “p”]

• Each packet “p” can only be issued under certain cache-line states

• After issuing it, the cache-line state often changes

• After receiving a packet, the cache-line state changes

• These details are VERY complex, and often need to be extracted from cache protocol tables...

Page 17: Post-Silicon Verification under Limited  Observability

17

Verification consists of abstract interpretation driven by transaction history:

[Figure: caches c1, c2, c3, c4 shown in five successive snapshots]

Knowing the transaction (packet) semantics, we can compute the set of possible states each cache line can be in after each packet goes by (well, during offline analysis). An error is flagged when an inconsistency is noted in the sets of cache states.
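
The idea of carrying a set of possible states forward through the trace can be sketched as follows. The three-state table below is a hypothetical MSI-style toy, not the protocol discussed in the talk, and the sketch assumes every packet affecting the line is visible; `step` and `analyze` are illustrative names.

```python
# Hypothetical toy protocol: (state before, packet) -> state after.
TRANSITIONS = {
    ("I", "read_req"): "S",
    ("I", "write_req"): "M",
    ("S", "write_req"): "M",
    ("S", "flush"): "I",
    ("M", "flush"): "I",
}

ALL_STATES = frozenset({"M", "S", "I"})


def step(possible, packet):
    """Image of a set of possible states under one observed packet."""
    return frozenset(
        TRANSITIONS[(state, packet)]
        for state in possible
        if (state, packet) in TRANSITIONS
    )


def analyze(packets, start=ALL_STATES):
    """Return (index of first inconsistent packet or None, final state set)."""
    possible = start
    for index, packet in enumerate(packets):
        possible = step(possible, packet)
        if not possible:
            # No state is consistent with this packet: flag an error.
            return index, possible
    return None, possible
```

Starting from the full state set models knowing nothing about the line. An empty set after some packet means no state could have produced that packet, which is the inconsistency the slide refers to.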

Page 18: Post-Silicon Verification under Limited  Observability

18

General approach: Know all possible communication patterns of the various transactions, and how to record progress along a particular pattern; use constraints to bridge the gap.

[Figure: communication patterns, and the state within a communication pattern]

Page 19: Post-Silicon Verification under Limited  Observability

19

How many of the packets can be invisible? At first cut (and based on some practical experience) having one missing in any “causal loop” seems tolerable – more than one appears TOO under-constrained.

[Figure: four example cases, three marked “OK” and one marked “Not OK”]
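
The rule of thumb above (at most one invisible packet per causal loop) can be written as a simple check. This is a sketch under assumptions: a causal loop is represented as a list of (sender, receiver) links, and the loop and link names are hypothetical.

```python
def invisible_in_loop(loop, invisible_links):
    """Links of the loop whose packets cannot be observed."""
    return [link for link in loop if link in invisible_links]


def tolerable(loop, invisible_links, max_missing=1):
    """True when the loop has at most max_missing invisible links."""
    return len(invisible_in_loop(loop, invisible_links)) <= max_missing


# Hypothetical causal loop: requester -> home -> owner -> requester.
loop = [("requester", "home"), ("home", "owner"), ("owner", "requester")]
```

The threshold is kept as a parameter because the slide presents “one per loop” as a first-cut judgment from experience, not as a proven bound.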

Page 20: Post-Silicon Verification under Limited  Observability

20

General statements pertaining to invisibility

OR

In a “fork/join” situation, how many responses can be invisible?

Generally there are invariants governing the responses (e.g., “at most one supplier of the value”)

If one response is invisible, we can assume it met the invariant, and remember this to cross-check against future behavior

If more than one response is invisible, we will have to increase the space of assumptions

If we do not see a response, we have to delay “closing out” the transaction till another pertinent event involving the same address occurs
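
One way to picture how the space of assumptions grows with the number of invisible responses is to enumerate the supplier scenarios that remain consistent with the “at most one supplier” invariant. This is an illustrative sketch only: the responder names, the boolean “supplied data” flag, and the use of `None` for “no cache supplied the value” are all assumptions.

```python
def supplier_scenarios(expected, seen):
    """List the possible suppliers consistent with 'at most one supplier'.

    expected: names of all responders in the fork/join.
    seen: dict mapping each visible responder to True if it supplied data.
    Returns [] when the visible responses already violate the invariant.
    """
    suppliers = sorted(name for name, supplied in seen.items() if supplied)
    if len(suppliers) > 1:
        return []                 # two visible suppliers: definite violation
    if suppliers:
        return suppliers          # invisible responders assumed not to supply
    missing = sorted(set(expected) - set(seen))
    # Either no cache supplied, or exactly one of the invisible ones did.
    return [None] + missing


expected = {"c1", "c2", "c3"}
```

With one invisible response there are at most two scenarios to remember and cross-check later; each further invisible response adds another.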

Page 21: Post-Silicon Verification under Limited  Observability

21

Verification of Mutual Exclusion of Resource Usage (proper arbitration):

Possible idea: Assume that the “first snoop request” tells who won the arbitration

Check: Transaction 1 must “close out” before transactions 2 and 3 are found to make progress

[Figure: expected overlap of transactions Tr 1, Tr 2, and Tr 3 under proper arbitration, with the snoop of transaction 1 marked]

Problem: What if the first snoop request was on an invisible link?
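
The arbitration check can be sketched over a per-address event list. The sketch assumes the first snoop request is visible, which is exactly the weakness the slide points out; the event kinds `first_snoop`, `progress`, and `close` are hypothetical labels, and the list is assumed to contain only events that count as progress.

```python
def check_arbitration(events):
    """Return the index of the first event that violates mutual exclusion.

    events: list of (kind, transaction_id) for one address, where kind is
    "first_snoop", "progress" or "close". Returns None if no violation.
    """
    winner = None
    for index, (kind, txn) in enumerate(events):
        if winner is None:
            if kind == "first_snoop":
                winner = txn      # assumed to identify the arbitration winner
        elif txn == winner:
            if kind == "close":
                winner = None     # the resource is free again
        else:
            return index          # another transaction progressed too early
    return None


good = [("first_snoop", 1), ("progress", 1), ("close", 1),
        ("first_snoop", 2), ("close", 2)]
bad = [("first_snoop", 1), ("first_snoop", 2), ("close", 1)]
```

If the winning snoop was on an invisible link, `winner` is never set and the check silently passes, so a real checker would need an additional source of evidence for who won.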

Page 22: Post-Silicon Verification under Limited  Observability

22

Approach initially tried

Wrote a prototype in OCaml to analyze a given cache protocol execution trace

For each new packet read, its corresponding communication pattern and its state within that pattern were determined

For each packet, we obtained WP and SP

– WP: Weakest Precondition (in a sense)

– The most general set of cache states under which the packet could be generated

– SP: Strongest Postcondition (in a sense)

– The tightest set of states the cache could be in after the packet is sent

– Many transaction types and “conflict situations” made state maintenance and update highly unstructured (about 8 versions of the code were written, each soon becoming ugly)
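
For a protocol given as a transition table, the WP and SP sets described above can be read off directly. The sketch below is in Python rather than the OCaml of the prototype, and the table is a hypothetical toy; it is not the protocol table the authors worked from.

```python
# Hypothetical sender-side table: (state before, packet sent) -> state after.
TABLE = {
    ("I", "read_req"): "S",
    ("I", "write_req"): "M",
    ("S", "write_req"): "M",
    ("M", "flush"): "I",
}


def wp(table, packet):
    """All states from which the packet could have been generated."""
    return frozenset(state for (state, pkt) in table if pkt == packet)


def sp(table, packet):
    """All states the cache could be in once the packet has been sent."""
    return frozenset(
        after for (state, pkt), after in table.items() if pkt == packet
    )
```

For example, `wp(TABLE, "write_req")` is the set containing I and S, and `sp(TABLE, "write_req")` is the set containing only M. A packet with an empty WP set cannot legally be sent from any state.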

Page 23: Post-Silicon Verification under Limited  Observability

23

A Conflict Scenario (for example)

[Figure: Requester, Home, and Potential Owners, with packets labeled req, sreq, and sresp, plus “Retries or Completion” and “Direct Supply of Data”]

• Requester issues “flush” packet

• Arbitration conflict at home

• Packet sent back for re-issue

• Meanwhile another request gets past home

• Home sends new request to requester

• New request “hijacks” flush-line away!

• Transaction never gets reissued

Page 24: Post-Silicon Verification under Limited  Observability

24

Constraints to the rescue.... but....

Constraint-programming was viewed as a possible solution

– Would permit local behavior to be expressed in terms of constraints

– Constraint formalisms can “solve” for missing information

But, traditional constraint frameworks found inadequate

– After extensive search, we could not find a constraint paradigm that can deal with interacting automata

– What we need is a method for back-tracing precursors to observed actions

– When multiple observations trace back to the same precursor, we can ‘vote the precursor up or down’

– Conditional probabilities of events are involved in guiding search

Page 25: Post-Silicon Verification under Limited  Observability

25

Approach being planned for implementation

Given a packet, determine comm pattern and state within comm pattern

Trace precursors along comm pattern till we reach origin of transaction (which is at a cache where the transaction missed and issued)

Determine the cache state for the particular transaction using the WP rule for the packet
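
Tracing precursors along a communication pattern can be pictured as walking a predecessor map back to the step that has no predecessor. The linear pattern below is a hypothetical example; real patterns may branch, and the step names are assumptions.

```python
# Hypothetical linear pattern: each step names the step that precedes it.
PATTERN = {
    "req": None,            # origin: issued by the cache that missed
    "sreq": "req",
    "sresp": "sreq",
    "completion": "sresp",
}


def trace_precursors(pattern, step):
    """Return the precursors of a step, nearest first, ending at the origin."""
    chain = []
    current = pattern[step]
    while current is not None:
        chain.append(current)
        current = pattern[current]
    return chain
```

Here `trace_precursors(PATTERN, "sresp")` yields `["sreq", "req"]`, so observing a snoop response implies an earlier snoop request and an original request even if neither was visible.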

Page 26: Post-Silicon Verification under Limited  Observability

26

Approach being planned for implementation

If cache state not previously determined, mark it speculative

If cache state previously determined and present WP determines a compatible cache state, convert “speculative” to committed

If previously determined cache state is being contradicted by present WP, mark cache state unknown and trigger backtracing (cancel this precursor computation path and explore another)
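
The three rules on this slide can be sketched as one update function over a store of per-line verdicts. This is a sketch under assumptions: “compatible” is taken to mean a non-empty intersection of state sets, a committed entry is narrowed to that intersection, and the names `record` and `store` are hypothetical.

```python
def record(store, cache, addr, wp_states):
    """Apply the speculative / committed / unknown rules for one WP result.

    store maps (cache, addr) to (status, set of possible states).
    Returns the new status; "unknown" tells the caller to backtrace.
    """
    key = (cache, addr)
    wp_states = frozenset(wp_states)
    if key not in store:
        store[key] = ("speculative", wp_states)
        return "speculative"
    _, previous = store[key]
    common = previous & wp_states
    if common:
        # Assumption: compatibility means the two state sets overlap.
        store[key] = ("committed", common)
        return "committed"
    store[key] = ("unknown", frozenset())
    return "unknown"


store = {}
first = record(store, "c1", 0x40, {"I", "S"})
second = record(store, "c1", 0x40, {"S", "M"})
third = record(store, "c1", 0x40, {"M"})
```

In this run the first observation is speculative, the second confirms it and narrows the line to S, and the third contradicts it, which is the point at which the current precursor path would be cancelled and another explored.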

Page 27: Post-Silicon Verification under Limited  Observability

27

A cache agent that was a “responder” for one transaction may be the “originator” for another...

Responder to two different transactions

How two precursor computations may lead back in time to a common node, and how we will have to “vote” on its cache state (red deposits a speculative state; purple votes it up or down...)

Page 28: Post-Silicon Verification under Limited  Observability

28

Why today’s constraint approaches don’t give these capabilities readily..

Today’s constraint solving approaches (“CSP”) appear to be about “static” situations

Various algorithms based on arc consistency and propagators can be found in the literature

Temporal Concurrent Constraint Programming is in its infancy

(I also don’t know much about these areas... tell me if I’m wrong! But I’ve not seen very much despite intense literature searches...)

Constraint Solving in the context of Coupled Reactive Processes can have multiple uses

Environments such as Comet (van Hentenryck) may offer a powerful way to organize such a constraint-based system

Page 29: Post-Silicon Verification under Limited  Observability

29

Constraint Languages Surveyed (and some evaluated...)

GNU Prolog, SICStus Prolog, Mozart / Oz, Erlang, FaCiLe, ... or even Murphi perhaps?

Reading List (Books / Papers...)

Stuckey’s book on Constraint Logic Programming

Dechter’s book on Constraints

Modeler++ / Localizer++ / Comet

Ultimately will roll our own constraint system

Page 30: Post-Silicon Verification under Limited  Observability

30

Concluding Remarks

Limited Observability is going to be a central concern in future system verification

Plenty of opportunities for formal methods, constraint-solving methods, and abstract interpretation methods to work in concert

Formal Methods communities must talk to other communities to significantly enhance the scope and relevance of what they are doing

– testing communities

– diagnosis communities

