Date post: 20-Dec-2015
Static Specification Analysis for Termination of Specification-Based Data Structure Repair Brian Demsky Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

Static Specification Analysis for Termination of Specification-Based Data Structure Repair

Brian Demsky, Martin Rinard

Laboratory for Computer Science, Massachusetts Institute of Technology

Motivation

[Figure: broken data structure with field values F = 20, G = 5; F = 20, G = 10; I = 5; J = 2]

Errors
• Missing elements
• Inappropriate sharing
• Dangling references
• Out of bounds array indices
• Inconsistent values

Goal

[Figure: the repair algorithm, given consistency properties from the developer, turns the broken data structure (F = 20, G = 5; F = 20, G = 10; I = 5; J = 2) into a consistent data structure (F = 10, G = 5; F = 20, G = 10; I = 3; J = 2; F = 2, G = 1)]

What Does Repair Algorithm Produce?

• Data structure that
  • Satisfies consistency properties, and
  • Is heuristically close to the broken data structure
• Not necessarily the same data structure as a (hypothetical) correct program would produce
• But enough to keep the program operating successfully

Precursors

• Data structure repair has historically appeared in systems with extreme reliability goals
  • 5ESS switch – hand-coded audit routines
  • IBM MVS operating system – hand-coded failure recovery routines
• Key component of these systems

Where Is This Likely To Be Useful?

• Not for systems with slack - can just reboot
  • Cause of error must go away after reboot
  • Must be OK to lose volatile state
  • Must be OK to wait for reboot
• Persistent data structures (file systems, application files)
• Autonomous and/or safety critical systems
• Monitor/control unstable physical phenomena
• Largely independent subcomputations
• Moving time window

Architecture

[Figure: broken bits → broken abstract model → repaired abstract model → repaired bits. Labels on the stages: model definition & translation, internal consistency properties, external consistency properties]

Architecture Rationale

Why go through the abstract model?

• Simple, uniform structure
  • Sets of objects
  • Relations between objects
• Simplifies both
  • Expression of consistency properties
  • Repair algorithm
• Enables system to support full range of efficient, heavily encoded data structures

File System Example

[Figure: directory entries abst (firstBlock = 0) and intro (firstBlock = 2); disk blocks 0 to 3 with nextBlock = 1, -5, 1, -1]

    struct Entry {
        byte name[Length];
        int firstBlock;
    }
    struct Block {
        int nextBlock;
        byte data[BlockSize];
    }
    struct Disk {
        Entry dir[NumEntries];
        Block block[NumBlocks];
    }
    Disk D;

Model Definition

• Sets of objects
    set blocks of integer : partition used | free;
• Relations between objects – values of object fields, referencing relationships between objects
    relation next : used, used;

[Figure: set blocks partitioned into used and free; relation next from used to used]

Model Translation

Bits translated to sets and relations in abstract model using statements of the form:

    Quantifiers, Condition ⇒ Inclusion Constraint

    for i in 0..NumEntries, 0 ≤ D.dir[i].firstBlock and D.dir[i].firstBlock < NumBlocks ⇒ D.dir[i].firstBlock in used
    for b in used, 0 ≤ D.block[b].nextBlock and D.block[b].nextBlock < NumBlocks ⇒ ⟨b, D.block[b].nextBlock⟩ in next
    for ⟨b,n⟩ in next, true ⇒ n in used
    for b in 0..NumBlocks, not (b in used) ⇒ b in free
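
The four translation rules can be read as a small fixed-point computation. The following Python sketch is an illustration only, not the authors' tool: it assumes the example disk has entries abst and intro with firstBlock 0 and 2 and nextBlock values 1, -5, 1, -1 for blocks 0 to 3, and the names build_model, first_block and next_block are made up for this sketch.

```python
# Hypothetical in-memory stand-in for the example disk D (assumed values).
NUM_BLOCKS = 4
first_block = {"abst": 0, "intro": 2}   # D.dir[i].firstBlock
next_block = [1, -5, 1, -1]             # D.block[b].nextBlock


def build_model(first_block, next_block, num_blocks):
    """Apply the four translation rules until no set or relation grows."""
    used, nxt = set(), set()

    def valid(b):
        return 0 <= b < num_blocks

    # Rule 1: a valid firstBlock is in used.
    for fb in first_block.values():
        if valid(fb):
            used.add(fb)

    # Rules 2 and 3 feed each other, so iterate to a fixed point.
    changed = True
    while changed:
        changed = False
        for b in list(used):
            n = next_block[b]
            if valid(n) and (b, n) not in nxt:
                nxt.add((b, n))          # Rule 2: <b, nextBlock> in next
                changed = True
        for _, n in list(nxt):
            if n not in used:
                used.add(n)              # Rule 3: targets of next are used
                changed = True

    # Rule 4: every block that is not used is free.
    free = {b for b in range(num_blocks) if b not in used}
    return used, free, nxt


used, free, nxt = build_model(first_block, next_block, NUM_BLOCKS)
print(sorted(used), sorted(free), sorted(nxt))
# prints: [0, 1, 2] [3] [(0, 1), (2, 1)]
```

Rule 4 is evaluated only after used has stopped growing, because it negates membership in used.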

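Model in Example

[Figure: abstract model of the example disk: blocks = {0, 1, 2, 3}; used = {0, 1, 2}; free = {3}; next = {⟨0,1⟩, ⟨2,1⟩}. Shown next to the disk: directory entries abst and intro; values 0 2 1 and -5 1 -1]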

Internal Consistency Properties

Quantifiers, Body

• Body is first-order property of basic propositions
  • Inequality constraints on values of numeric fields
    • V.R = E, V.R < E, V.R ≤ E, V.R ≥ E, V.R > E
  • Presence of required number of objects
    • size(S) = C, size(S) ≤ C, size(S) ≥ C
  • Topology of region surrounding each object
    • size(V.R) = C, size(V.R) ≤ C, size(V.R) ≥ C
    • size(R.V) = C, size(R.V) ≤ C, size(R.V) ≥ C
  • Inclusion constraints: V in S, V1 in V2.R, ⟨V1,V2⟩ in R
• Example: for b in used, size(next.b) ≤ 1

Internal Consistency Violations

Evaluate consistency properties, find violations

for b in used, size(next.b) ≤ 1 is false for b = 1

[Figure: abstract model with used = {0, 1, 2}, free = {3}, and two next pairs, ⟨0,1⟩ and ⟨2,1⟩, both ending at block 1]
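
Finding the violation amounts to evaluating the body for every binding of the quantified variable. A minimal Python sketch, assuming the example model (used = {0, 1, 2}, next = {⟨0,1⟩, ⟨2,1⟩}) and reading next.b as the set of objects that have b as their next; the function names are illustrative, not part of the specification language:

```python
# Abstract model of the example (assumed from the slides' figure).
used = {0, 1, 2}
nxt = {(0, 1), (2, 1)}          # relation next as a set of pairs


def inverse_image(relation, b):
    """next.b in the slides' notation: all objects a with <a, b> in relation."""
    return {a for (a, target) in relation if target == b}


def violations(used, relation, bound=1):
    """Bindings of b for which size(next.b) <= bound is false."""
    return sorted(b for b in used if len(inverse_image(relation, b)) > bound)


print(violations(used, nxt))    # prints: [1]
```

Each returned value is a binding for b that the repair step can start from.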

Repairing Violations of Internal Consistency Properties

• Violation provides binding for quantified variables
• Convert Body to disjunctive normal form
    (p1 ∧ … ∧ pn) ∨ … ∨ (q1 ∧ … ∧ qm)
    p1 … pn, q1 … qm are basic propositions
• Choose a conjunction to satisfy
• Repair violated basic propositions in conjunction
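
The steps above can be sketched in Python as follows. The slides only say "choose a conjunction"; picking the conjunction with the fewest violated propositions is an assumption of this sketch, and the example body and field names (x.f, x.g) are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Proposition = Tuple[str, Callable[[Dict], bool]]   # (label, test on a binding)
Conjunction = List[Proposition]


def violated(conj: Conjunction, env: Dict) -> List[str]:
    """Labels of the basic propositions in conj that are false under env."""
    return [label for label, test in conj if not test(env)]


def choose_conjunction(dnf: List[Conjunction], env: Dict) -> Tuple[int, List[str]]:
    """Pick the conjunction needing the fewest repairs (assumed heuristic)."""
    costs = [violated(conj, env) for conj in dnf]
    index = min(range(len(dnf)), key=lambda i: len(costs[i]))
    return index, costs[index]


# Hypothetical body in DNF: (x.f = 0 and x.g <= 5) or (x.f >= 10)
dnf = [
    [("x.f = 0", lambda e: e["f"] == 0), ("x.g <= 5", lambda e: e["g"] <= 5)],
    [("x.f >= 10", lambda e: e["f"] >= 10)],
]
binding = {"f": 3, "g": 9}      # supplied by the violation
print(choose_conjunction(dnf, binding))
# prints: (1, ['x.f >= 10'])
```

The returned labels are the basic propositions that still have to be repaired for the chosen conjunction to hold.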

Repairing Violations of Basic Propositions

• Inequality constraints on values of numeric fields
  • V.R = E, V.R < E, V.R ≤ E, V.R ≥ E, V.R > E
  • Compute value of expression, assign field
• Presence of required number of objects
  • size(S) = C, size(S) ≤ C, size(S) ≥ C
  • Remove or insert objects from/to set
• Topology of region surrounding each object
  • size(V.R) = C, size(V.R) ≤ C, size(V.R) ≥ C
  • size(R.V) = C, size(R.V) ≤ C, size(R.V) ≥ C
  • Remove or insert pairs from/to relation
• Inclusion constraints: V in S, V1 in V2.R, ⟨V1,V2⟩ in R
  • Remove or add the object or pair from/to set or relation

Repair in Example

for b in used, size(next.b) ≤ 1 is false for b = 1

Must repair size(next.1) ≤ 1

Can remove either ⟨0,1⟩ or ⟨2,1⟩ from next

[Figure: abstract model before and after the repair; before, two next pairs end at block 1; after, only one next pair remains]
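
A minimal Python sketch of this repair, assuming the relation is held as a set of pairs. Which pair to drop is a policy decision the slide leaves open; this sketch keeps the pairs with the smallest source and drops the rest, which is an arbitrary choice made only for illustration.

```python
def repair_inverse_size(relation, b, bound=1):
    """Remove pairs <a, b> until size(relation.b) <= bound.

    Returns the repaired relation and the list of removed pairs.
    The input relation is left unchanged.
    """
    incoming = sorted(pair for pair in relation if pair[1] == b)
    to_remove = incoming[bound:]          # keep the first `bound` pairs
    return relation - set(to_remove), to_remove


nxt = {(0, 1), (2, 1)}
repaired, removed = repair_inverse_size(nxt, b=1)
print(repaired, removed)
# prints: {(0, 1)} [(2, 1)]
```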

Acyclic Repair Dependences

• Questions
  • Isn’t it possible for the repair of one constraint to invalidate another constraint?
  • What about infinite repair loops?
  • What about unsatisfiable specifications?
• Answer
  • We require specifications to have no cyclic repair dependences between constraints
  • So all repair sequences terminate
  • Repair can fail only because of resource limitations

Formalizing Repair Dependences: Constraint Dependence Graph

• Nodes: conjunctions from DNF
• Edges: from a conjunction to a dependent conjunction
  • if repairing the first conjunction could falsify the dependent conjunction, or
  • if repairing the first conjunction could increase the dependent conjunction's quantifier scope
• Absence of cycles implies valid repair schedule
• Conjunction removal for cycle elimination (must leave at least one conjunction per constraint)

[Figure: dependence graph over the conjunctions (a1 ∧ … ∧ an), (b1 ∧ … ∧ bn), (c1 ∧ … ∧ cn), (d1 ∧ … ∧ dn), (e1 ∧ … ∧ en), (f1 ∧ … ∧ fn); in the final build the conjunction (f1 ∧ … ∧ fn) has been removed]
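
The acyclicity check and conjunction removal can be sketched in Python as below. This is a sketch under stated assumptions, not the authors' analysis: the edges are taken as given (computing them from the specification is not shown), the example graph is hypothetical, and the policy of removing the last removable node on the first cycle found is an arbitrary choice.

```python
def find_cycle(nodes, edges):
    """Return a list of nodes forming a cycle, or None if the graph is acyclic."""
    succ = {n: [] for n in nodes}
    for src, dst in edges:
        if src in succ and dst in succ:   # ignore edges of removed nodes
            succ[src].append(dst)
    state = {n: 0 for n in nodes}         # 0 = unvisited, 1 = on stack, 2 = done
    path = []

    def visit(n):
        state[n] = 1
        path.append(n)
        for m in succ[n]:
            if state[m] == 1:             # back edge closes a cycle
                return path[path.index(m):]
            if state[m] == 0:
                cycle = visit(m)
                if cycle:
                    return cycle
        state[n] = 2
        path.pop()
        return None

    for n in nodes:
        if state[n] == 0:
            cycle = visit(n)
            if cycle:
                return cycle
    return None


def eliminate_cycles(constraints, edges):
    """Drop conjunctions until the graph is acyclic.

    constraints maps a constraint name to its list of conjunction nodes.
    Every constraint must keep at least one conjunction.
    """
    remaining = {c: list(conjs) for c, conjs in constraints.items()}
    owner = {n: c for c, conjs in constraints.items() for n in conjs}
    while True:
        nodes = [n for conjs in remaining.values() for n in conjs]
        cycle = find_cycle(nodes, edges)
        if cycle is None:
            return remaining
        removable = [n for n in cycle if len(remaining[owner[n]]) > 1]
        if not removable:
            raise ValueError("cannot break cycle: %r" % (cycle,))
        victim = removable[-1]            # assumed policy, see lead-in
        remaining[owner[victim]].remove(victim)


# Hypothetical graph: three constraints with two conjunctions each.
constraints = {"C1": ["a", "b"], "C2": ["c", "d"], "C3": ["e", "f"]}
edges = [("a", "c"), ("c", "f"), ("f", "a"), ("b", "d")]
result = eliminate_cycles(constraints, edges)
print(result)
# prints: {'C1': ['a', 'b'], 'C2': ['c', 'd'], 'C3': ['e']}
```

If every node on some cycle is the last conjunction of its constraint, the sketch raises an error, which corresponds to a specification that fails the acyclicity requirement.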

External Consistency Constraints

Quantifiers, Condition ⇒ Body

• Body of form V = E, V.F = E, V.F[I] = E
• Example
    for b in free, true ⇒ D.block[b].nextBlock = -2
    for ⟨i,j⟩ in next, true ⇒ D.block[i].nextBlock = j
    for b in used, size(b.next) = 0 ⇒ D.block[b].nextBlock = -1
• Repair simply performs assignments
• Translates model repairs to bit repairs

Repair in Example

[Figure: inconsistent file system: directory entries abst (firstBlock = 0) and intro (firstBlock = 2), nextBlock = 1, -5, 1, -1. Repaired file system: same directory entries, nextBlock = 1, -1, -1, -2]
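
The three example constraints can be applied as plain assignments. A minimal Python sketch, assuming the repaired model is used = {0, 1, 2}, free = {3}, next = {⟨0,1⟩} and the broken nextBlock values are 1, -5, 1, -1; the function name and the list representation of the disk are illustrative only.

```python
def apply_external_constraints(used, free, nxt, next_block):
    """Return a repaired copy of the nextBlock array for the given model."""
    repaired = list(next_block)
    for b in free:                       # for b in free => nextBlock = -2
        repaired[b] = -2
    for i, j in nxt:                     # for <i,j> in next => nextBlock = j
        repaired[i] = j
    has_successor = {i for i, _ in nxt}
    for b in used:                       # size(b.next) = 0 => nextBlock = -1
        if b not in has_successor:
            repaired[b] = -1
    return repaired


print(apply_external_constraints({0, 1, 2}, {3}, {(0, 1)}, [1, -5, 1, -1]))
# prints: [1, -1, -1, -2]
```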

When to Test for Consistency and Repair

• Persistent data structures
  • Repair can be independent activity, or
  • Repair when data written out or read in
• Volatile data structures in running program
  • Under programmer control
  • Transaction-based approach
    • Identify transaction start and end
    • Repair at start, end, or both
  • Failure-based approach
    • Wait until program fails
    • Repair and restart from latest safe point

Experience

• We acquired four benchmarks (written in C/C++)
  • CTAS (air-traffic control tool)
  • Simplified Linux file system
  • Freeciv interactive game
  • Microsoft Word files
• We developed specifications for all four
  • Very little development time (days, not weeks)
  • Most of time spent figuring out Freeciv and CTAS
• Each benchmark has
  • Workload
  • Fault insertion methodology
• Ran benchmarks with and without repair

CTAS

• Set of air-traffic control tools
  • Traffic management
  • Arrival planning
  • Flow visualization
  • Shortcut planning
• Deployed in centers around country (Dallas/Ft. Worth, Los Angeles, Denver, Miami, Minneapolis/St. Paul, Atlanta, Oakland)
• Approximately 1 million lines of C/C++ code

[Figure: CTAS screen shot]

Results

• Workload – recorded radar feed from DFW
• Fault insertion
  • Simulate error in flight plan processing
  • Bad airport index in flight plan data structure
• Without repair
  • System crashes – segmentation fault
• With repair
  • Aircraft has different origin or destination
  • System continues to execute
  • Anomaly eventually flushed from system

Aspects of CTAS

• Lots of independent subcomputations
  • System processes hundreds of aircraft – problem with one should not affect others
  • Multipurpose system (visualization, arrival planning, shortcuts, …) – problem in one purpose should not affect others
• Sliding time window: anomalies eventually flushed
• Rebooting ineffective – system will crash again as soon as it sees the problematic flight plan

Simplified Linux File System

[Figure: disk blocks: superblock, group block, inode bitmap block, block bitmap block, inode block (inode, inode, …), directory block]

Some Consistency Properties
• inode bitmap consistent with inode usage
• block bitmap consistent with block usage
• directory entries refer to valid inodes
• files contain valid blocks only
• files do not share blocks

Results

• Workload – write and verify several files
• Fault insertion – crash file system
  • Inode and block bitmap errors
  • Partially initialized directory and inode entries
• Without repair
  • Incorrect file contents because of inode and disk block sharing
• With repair
  • Bitmaps repaired preventing illegal sharing, correct file contents

Freeciv

[Figure: terrain grid with cells PO MM / OO MP / PO MM / PP MP (O = Ocean, P = Plain, M = Mountain); city structures with loc: 3,0 and loc: 2,3]

Consistency Properties
• Tiles have valid terrain values
• Cities are not in the ocean
• Each city has exactly one reference from city location grid
• City locations are consistent in
  • City structures and
  • tile grid

Results

• Workload – Freeciv software plays against itself
• Fault insertion – randomly corrupt terrain values
• Without repair – program fails (seg fault)
• With repair
  • Game runs just fine
  • But game plays out differently because of the different terrain values

Microsoft Word Files

• Files consist of a sequence of streams
• Streams stored using FAT-based data structure
• Consistency Properties
  • FAT blocks exist and contain valid entries
  • FAT streams are properly terminated
  • Free blocks properly marked
  • Streams contain valid blocks
  • No sharing of blocks between streams

[Figure: directory entries (abst, intro), FAT, and disk blocks]

Results

• Workload – several Microsoft Word files
• Fault insertion – scramble FAT
• Without repair
  • If blocks containing the FAT were incorrectly marked as free, Word successfully loads file
  • Otherwise, “The document name or path is not valid”
• With repair
  • Word loads all files

Recent Work

[Figure: broken bits → broken abstract model → repaired abstract model → repaired bits, with model definition & translation, internal consistency properties, and external consistency properties]

• External consistency properties translate model repairs to data structure repairs
• Errors may cause data structures to remain inconsistent even after repair
• Current strategy
  • Eliminate external consistency properties
  • Analyze model definition rules and internal consistency properties
  • Automatically generate data structure repairs

[Figure: broken bits and broken abstract model (linked by model definition & translation); each abstract repair on the model is paired with an automatically generated concrete repair on the bits, ending in the repaired abstract model and repaired bits]

Result: Repaired bits guaranteed to satisfy consistency constraints

Recent Work

• Efficient evaluation of consistency properties
  • Compilation to remove interpreter overhead (4.7x speedup)
  • Fixed point elimination (210x speedup)
  • Relation construction elimination (500x speedup)
  • Set construction elimination (3900x speedup)
• Model-based error localization
  • User study shows benefit from approach
  • Users with tool take 11 minutes on average to find and fix a bug
  • Users without tool mostly failed to find a bug within the hour allocated

Related Work

• Hand-coded repair
  • Lucent 5ESS switch
  • IBM MVS operating system
• Integrity Maintenance in Databases (Ceri, Widom, Urban)
• Self-stabilizing algorithms
• Log-based recovery for database systems
• Recovery-oriented computing
  • Recursive restartability
  • Undo framework

Conclusion

• Data structure repair interesting way to (potentially) improve reliability

• Specification-based approach promises to make technique more widely applicable

• Moving towards more robust, probabilistic, continuous concept of system behavior

