
Post on 21-Dec-2015



Static Specification Mining Using Automata-Based Abstractions

Sharon Shoham, Eran Yahav, Stephen Fink, Marco Pistoia

IBM T.J. Watson Research Center; Technion, Israel

Finding What’s There (but is hard to find)


Component APIs Are Complicated

There is only one thing more painful than learning from experience and that is not learning from experience. – Archibald MacLeish

Temporal API Specifications

Legal interactions with a component: what methods could be called at every state

[Figure: automaton with states 0 to 5 whose transitions are labeled config, connect, finishConnect, read/write, and close]

java.nio.channels.SocketChannel (partial spec)
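A temporal specification like this one can be read as a finite automaton whose transitions list the calls permitted in each state. The Java sketch below assumes a hypothetical, simplified transition table. It is not the exact automaton from the figure. The sketch checks whether a sequence of calls stays within the specification. TemporalSpec and its methods are illustrative names only.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified SocketChannel protocol: states are ints, events are method names.
final class TemporalSpec {
    private final Map<Integer, Map<String, Integer>> delta = new HashMap<>();

    void allow(int from, String event, int to) {
        delta.computeIfAbsent(from, k -> new HashMap<>()).put(event, to);
    }

    /** Returns true if every call is permitted in the state reached so far. */
    boolean isLegal(List<String> calls) {
        int state = 0;
        for (String call : calls) {
            Integer next = delta.getOrDefault(state, Map.of()).get(call);
            if (next == null) {
                return false; // call not permitted in this state
            }
            state = next;
        }
        return true;
    }

    // Simplification: config, connect, finishConnect, then reads/writes, then close.
    static TemporalSpec socketChannelSketch() {
        TemporalSpec spec = new TemporalSpec();
        spec.allow(0, "config", 1);
        spec.allow(1, "connect", 2);
        spec.allow(2, "finishConnect", 3);
        spec.allow(3, "finishConnect", 3);
        spec.allow(3, "read", 3);
        spec.allow(3, "write", 3);
        spec.allow(3, "close", 4);
        spec.allow(2, "close", 4);
        return spec;
    }

    public static void main(String[] args) {
        TemporalSpec spec = socketChannelSketch();
        System.out.println(spec.isLegal(List.of("config", "connect", "finishConnect", "read", "close")));
        System.out.println(spec.isLegal(List.of("config", "read")));
    }
}
```

The first check is legal under this table. The second is not, because read is not permitted before connect and finishConnect.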

Applications

• Program understanding
• Regression
• Deviant behaviors
• Specs for verification
• …

Mining Temporal Specifications

• Component-side mining
  • Infer usage from the component implementation
  • Relies on error conditions in the component implementation
• Client-side mining
  • Infer usage from existing clients using the component

Real usage scenarios << permitted scenarios

connect; close;

close

connect; write; write; close;

connect; write; close;

connect; read; close;
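The simplest client-side baseline just records the observed traces. One way to do that is a prefix tree with one state per distinct prefix. Here is a minimal sketch of that naive baseline. It is not the mining approach presented in these slides. PrefixTreeMiner and its methods are hypothetical names. Given the five traces above, the tree has 10 states.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Naive baseline: a prefix-tree automaton with one state per distinct observed prefix.
final class PrefixTreeMiner {
    private final List<Map<String, Integer>> children = new ArrayList<>();

    PrefixTreeMiner() {
        children.add(new HashMap<>()); // state 0: the empty prefix
    }

    void addTrace(List<String> trace) {
        int state = 0;
        for (String event : trace) {
            Integer next = children.get(state).get(event);
            if (next == null) {
                next = children.size();          // allocate a fresh state for the new prefix
                children.add(new HashMap<>());
                children.get(state).put(event, next);
            }
            state = next;
        }
    }

    int stateCount() {
        return children.size();
    }

    /** True if the sequence is a prefix of some recorded trace. */
    boolean observedPrefix(List<String> seq) {
        int state = 0;
        for (String event : seq) {
            Integer next = children.get(state).get(event);
            if (next == null) return false;
            state = next;
        }
        return true;
    }

    public static void main(String[] args) {
        PrefixTreeMiner miner = new PrefixTreeMiner();
        miner.addTrace(List.of("connect", "close"));
        miner.addTrace(List.of("close"));
        miner.addTrace(List.of("connect", "write", "write", "close"));
        miner.addTrace(List.of("connect", "write", "close"));
        miner.addTrace(List.of("connect", "read", "close"));
        System.out.println(miner.stateCount() + " states");
    }
}
```

Such a tree admits only what clients actually did. For instance, it has no path for connect; write; write; write; close, even though the API permits it.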

Dynamic vs. Static Specification Mining

• Dynamic
  • Mine the specification from representative executions
  • Requires running the program (with varying inputs)
  • Incomplete coverage of behaviors
• Static
  • Covers all client behaviors
  • Challenging

Our approach: static client-side specification mining
• Bad news: this is hard
• Good news: we can still make it work

Example

How do I use a java.nio.channels.SocketChannel?

(In the code, ? stands for an unknown condition and … marks elided code.)

void example() {
    Collection<SocketChannel> chnls = createChannels();
    for (SocketChannel sc : chnls) {
        sc.connect(new …);
        while (!sc.finishConnect()) {
            /* ... wait for connection ... */
        }
        if (?) {
            receive(sc);
        } else {
            send(sc);
        }
    }
    closeAll(chnls);
}

Collection<SocketChannel> createChannels() {
    List<SocketChannel> list = new LinkedList<SocketChannel>();
    list.add(createChannel(" ", 80));
    // … more channels added to list …
    return list;
}

SocketChannel createChannel(String hostName, int port) {
    SocketChannel sc = SocketChannel.open();
    sc.configureBlocking(false);
    return sc;
}

void receive(SocketChannel x) {
    // …
    FileOutputStream fos = new …;
    ByteBuffer dst = …;
    int numBytesRead = 0;
    while (numBytesRead >= 0) {
        numBytesRead = x.read(dst);
        fos.write(dst.array());
    }
    fos.close();
}

void send(SocketChannel x) {
    for (?) {
        …
        int numWritten = x.write(buf);
    }
}

void closeAll(Collection<SocketChannel> chnls) {
    for (SocketChannel sc : chnls) {
        sc.close();
    }
}

Bad News

• Interprocedural flow
• Flow sensitivity
• Context sensitivity
• Non-trivial aliasing

[Figure: the example code annotated with the event history of sc. The history is empty after sc = open(). It becomes cfg after sc.cfg, cfg cnc after sc.cnc, and cfg cnc fin after sc.fin. Each iteration of the finishConnect loop adds one more fin, and inside receive each call to x.read adds one more rd.]

SocketChannel Specification

[Figure: the partial SocketChannel automaton again: states 0 to 5, with transitions labeled config, connect, finishConnect, read/write, and close]

(Partial specification)

Challenges

• Dynamically allocated objects
  • unbounded number of objects
  • aliasing
  • objects flow through complex heap-allocated data structures
  ⇒ heap abstraction
• Unbounded length of event sequences
  • the event sequence observed for an object might be unbounded
  ⇒ event sequence abstraction
• Noise
  • analysis imprecision and/or incorrect client programs
  ⇒ noise reduction

Overview

Abstract Trace Collection

Abstract interpretation, where each abstract value combines:
• Heap abstraction: abstracts the unbounded heap
• Trace abstraction: abstracts unbounded sequences of operations

Initial heap abstraction: partition the heap into a fixed partition (based on allocation site)
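With an allocation-site partition, one abstract object stands for every object created at that site. A call on it therefore cannot simply overwrite its history. The analysis must keep both the updated and the old histories, which is a weak update. Here is a minimal sketch. It assumes histories are stored as explicit finite lists of event names rather than as automata. AllocationSiteHeap is a hypothetical name.

```java
import java.util.*;

// Allocation-site heap abstraction with weak updates (illustrative only).
final class AllocationSiteHeap {
    // For each allocation site, the set of histories an object allocated there may have.
    private final Map<String, Set<List<String>>> histories = new HashMap<>();

    /** A new object starts with the empty history; objects allocated earlier keep theirs. */
    void allocate(String site) {
        histories.computeIfAbsent(site, k -> new HashSet<>()).add(List.of());
    }

    /** Weak update: each object from the site may or may not be the receiver. */
    void invoke(String site, String event) {
        Set<List<String>> old = histories.getOrDefault(site, Set.of());
        Set<List<String>> updated = new HashSet<>(old);   // case: the receiver was another object
        for (List<String> h : old) {
            List<String> extended = new ArrayList<>(h);
            extended.add(event);
            updated.add(List.copyOf(extended));          // case: the receiver was this object
        }
        histories.put(site, updated);
    }

    Set<List<String>> historiesOf(String site) {
        return histories.getOrDefault(site, Set.of());
    }

    public static void main(String[] args) {
        AllocationSiteHeap heap = new AllocationSiteHeap();
        heap.allocate("AS1");
        heap.invoke("AS1", "cfg");
        heap.invoke("AS1", "cnc");
        System.out.println(heap.historiesOf("AS1"));
    }
}
```

After these calls, the site AS1 holds four histories. One of them is the spurious [cnc], a connect without configureBlocking. This is the kind of imprecision that motivates the refined heap abstraction.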

[Figure: heap objects partitioned by allocation site (AS1, AS2, AS3). Each partition carries histories such as cfg cnc fin.]

In the example, the allocation SocketChannel.open() in createChannel is labeled AS1.

[Figure: abstract values <AS1, history> along the path sc = open(), sc.cfg, sc.cnc, …. The resulting automaton has states labeled with the event sets [write, connect, close, finCon, config, read] and [write, connect, close, finCon, read].]

Heap data for an “abstract object” o
• unique = true: the abstract value represents a single object
• must = {x.f}: the access path x.f must point to o
• mustNot = {y.g}: the access path y.g must not point to o
• …

Must points-to information allows strong updates

<AS1, must: { sc }, >
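Must points-to information lets the analysis replace a history instead of joining it. The sketch below assumes an abstract object records only a uniqueness flag and a set of must access paths. If the receiver is in the must set of a unique abstract object, the update is strong. Otherwise it falls back to the weak update. AbstractObject and its fields are illustrative, not the tool's data structures.

```java
import java.util.*;

// Illustrative abstract object: uniqueness flag, must access paths, possible histories.
final class AbstractObject {
    final boolean unique;            // represents exactly one concrete object
    final Set<String> must;          // access paths that must point to this object
    Set<List<String>> histories = new HashSet<>();

    AbstractObject(boolean unique, Set<String> must) {
        this.unique = unique;
        this.must = must;
        histories.add(List.of());    // starts with the empty history
    }

    /** Models receiver.event() where receiver may point to this abstract object. */
    void invoke(String receiver, String event) {
        Set<List<String>> next = new HashSet<>();
        for (List<String> h : histories) {
            List<String> extended = new ArrayList<>(h);
            extended.add(event);
            next.add(List.copyOf(extended));
        }
        boolean strong = unique && must.contains(receiver);
        if (!strong) {
            next.addAll(histories);  // weak update: the call may have hit a different object
        }
        histories = next;
    }

    public static void main(String[] args) {
        AbstractObject focused = new AbstractObject(true, Set.of("sc"));
        focused.invoke("sc", "cfg");
        focused.invoke("sc", "cnc");
        System.out.println(focused.histories);
    }
}
```

With must = { sc }, the two calls leave exactly one history, [cfg, cnc]. A summary object without must information would keep four.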

Refined Heap Abstraction

[Figure: sc = open() yields <AS1, must: { sc }, >. After sc.cfg, the object that sc must point to has history cfg, tracked separately from <AS1, >.]

History Abstraction

• Abstract history: an automaton over-approximating unbounded event sequences
• Quotient-based abstractions for histories: automaton states that are equivalent w.r.t. a given equivalence relation R are merged
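A quotient construction merges each equivalence class of R into a single state and redirects transitions accordingly. In this minimal sketch, the relation is assumed to be given as a class id per state, which is a hypothetical representation. Every path of the original automaton survives in the quotient, and merging can only add paths. That is why the result over-approximates the original sequences.

```java
import java.util.*;

// Quotient of an automaton by an equivalence given as a class id per state (illustrative).
final class Quotient {
    record Edge(int src, String label, int dst) {}

    /** Renames every state to its class id; duplicate edges collapse because records compare by value. */
    static Set<Edge> quotient(List<Edge> edges, int[] classOf) {
        Set<Edge> result = new LinkedHashSet<>();
        for (Edge e : edges) {
            result.add(new Edge(classOf[e.src()], e.label(), classOf[e.dst()]));
        }
        return result;
    }

    public static void main(String[] args) {
        // History cfg cnc fin fin, with R relating the two states entered by fin (3 and 4).
        List<Edge> history = List.of(new Edge(0, "cfg", 1), new Edge(1, "cnc", 2),
                                     new Edge(2, "fin", 3), new Edge(3, "fin", 4));
        int[] classOf = {0, 1, 2, 3, 3};
        System.out.println(quotient(history, classOf));
    }
}
```

Merging states 3 and 4 turns the second fin transition into a fin self-loop on state 3. The resulting automaton also accepts any number of fin events.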

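[Figure: abstract objects (<allocated at AS1, >, <o1, >, <o2, >, …, <allocated at ASk, >), each paired with a history automaton such as cfg cnc fin. Open question (?): how to represent a history that keeps growing (… fin read …).]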

History Abstraction

Past-Future Abstraction: (q1, q2) ∈ R[k1,k2] if q1 and q2 share both an incoming sequence of length k1 and an outgoing sequence of length k2
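One way to compute the Past-Future relation is to collect, for each state, the label sequences of length k1 that end in it and the label sequences of length k2 that start from it. States whose sets overlap in both are related. This sketch closes the relation transitively with union-find so that it is an equivalence. It also assumes that a state with no incoming (or outgoing) path of the required length stays in its own class. The initial state for k1 = 1 is an example. PastFuture and its methods are hypothetical names.

```java
import java.util.*;

// Computes equivalence classes for R[k1,k2] on a small automaton (illustrative sketch).
final class PastFuture {
    record Edge(int src, String label, int dst) {}

    /** For each state: label sequences of length k ending in it (incoming) or starting from it (outgoing). */
    static List<Set<List<String>>> words(int n, List<Edge> edges, int k, boolean incoming) {
        List<Set<List<String>>> cur = new ArrayList<>();
        for (int q = 0; q < n; q++) {
            Set<List<String>> init = new HashSet<>();
            init.add(List.of());                        // the length-0 sequence
            cur.add(init);
        }
        for (int step = 0; step < k; step++) {
            List<Set<List<String>>> next = new ArrayList<>();
            for (int q = 0; q < n; q++) next.add(new HashSet<>());
            for (Edge e : edges) {
                int from = incoming ? e.src() : e.dst();
                int to = incoming ? e.dst() : e.src();
                for (List<String> w : cur.get(from)) {
                    List<String> x = new ArrayList<>(w);
                    if (incoming) x.add(e.label()); else x.add(0, e.label());
                    next.get(to).add(List.copyOf(x));
                }
            }
            cur = next;
        }
        return cur;
    }

    private static int find(int[] p, int x) {
        while (p[x] != x) { p[x] = p[p[x]]; x = p[x]; }
        return x;
    }

    /** Class id per state for R[k1,k2], made transitive with union-find. */
    static int[] classes(int n, List<Edge> edges, int k1, int k2) {
        List<Set<List<String>>> in = words(n, edges, k1, true);
        List<Set<List<String>>> out = words(n, edges, k2, false);
        int[] parent = new int[n];
        for (int q = 0; q < n; q++) parent[q] = q;
        for (int a = 0; a < n; a++) {
            for (int b = a + 1; b < n; b++) {
                boolean sharePast = !Collections.disjoint(in.get(a), in.get(b));
                boolean shareFuture = !Collections.disjoint(out.get(a), out.get(b));
                if (sharePast && shareFuture) parent[find(parent, a)] = find(parent, b);
            }
        }
        int[] cls = new int[n];
        for (int q = 0; q < n; q++) cls[q] = find(parent, q);
        return cls;
    }
}
```

Consider the chain cfg cnc fin fin with states 0 to 4. Past-1 (k1 = 1, k2 = 0) merges states 3 and 4, because both are entered by fin. Future-1 (k1 = 0, k2 = 1) instead merges states 2 and 3, because both are left by fin.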

[Figure: Past-1 examples and a Future-1 example on small automata over the events a, b, and c]

Abstract Semantics

• Initial abstract history: the empty-sequence automaton
• When an API method is invoked, the history is extended: append the event, then construct the quotient
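The abstract transformer for a call can be sketched as follows. Add a fresh state reached from the current state(s) by the new event, then re-apply the quotient. This sketch fixes the Past-1 abstraction, so states that share an incoming event are merged. It tracks only the event structure, not heap data. History is a hypothetical class. Appending cfg, cnc, and then fin any number of times stabilizes at four states and four transitions, mirroring the while (!sc.finCon) loop in the example.

```java
import java.util.*;

// History automaton under the Past-1 abstraction (illustrative sketch).
final class History {
    record Edge(int src, String label, int dst) {}

    private int states = 1;                                 // state 0: empty history
    private Set<Edge> edges = new LinkedHashSet<>();
    private Set<Integer> current = new HashSet<>(Set.of(0));

    /** Abstract transformer for a call: append the event, then take the Past-1 quotient. */
    void append(String event) {
        int fresh = states++;
        for (int c : current) edges.add(new Edge(c, event, fresh));
        current = new HashSet<>(Set.of(fresh));
        quotientPast1();
    }

    /** Merges states that share an incoming event, closed transitively via union-find. */
    private void quotientPast1() {
        int[] parent = new int[states];
        for (int i = 0; i < states; i++) parent[i] = i;
        Map<String, Integer> firstTarget = new HashMap<>();
        for (Edge e : edges) {
            Integer other = firstTarget.putIfAbsent(e.label(), e.dst());
            if (other != null) parent[find(parent, e.dst())] = find(parent, other);
        }
        Map<Integer, Integer> id = new HashMap<>();          // dense renumbering of classes
        for (int q = 0; q < states; q++) id.putIfAbsent(find(parent, q), id.size());
        Set<Edge> merged = new LinkedHashSet<>();
        for (Edge e : edges) {
            merged.add(new Edge(id.get(find(parent, e.src())), e.label(), id.get(find(parent, e.dst()))));
        }
        Set<Integer> cur = new HashSet<>();
        for (int c : current) cur.add(id.get(find(parent, c)));
        states = id.size();
        edges = merged;
        current = cur;
    }

    private static int find(int[] p, int x) {
        while (p[x] != x) { p[x] = p[p[x]]; x = p[x]; }
        return x;
    }

    int stateCount() { return states; }
    int edgeCount() { return edges.size(); }

    public static void main(String[] args) {
        History h = new History();
        h.append("cfg");
        h.append("cnc");
        for (int i = 0; i < 5; i++) h.append("fin");       // any number of loop iterations
        System.out.println(h.stateCount() + " states, " + h.edgeCount() + " edges");
    }
}
```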

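[Figure: for sc = open; sc.config; sc.connect; while (!sc.finCon) { … }, the history grows to cfg cnc fin fin. The two fin states are Past-1 equivalent, so the quotient folds them into cfg cnc fin with a fin loop. This is the history at the end of the while loop.]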

Are We Done?

• Bounded histories are great, but not enough
• Merge histories at control-flow join points to speed up convergence
  • Merge all histories that have identical heap data and satisfy a given merge criterion
  • Merge: union construction followed by quotient construction
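A merge at a join point can be sketched the same way. First build the union of the two history automata, identifying their initial states and keeping both sets of current states. Then apply the quotient. The sketch assumes the Past-1 abstraction and merges unconditionally, which corresponds to merging all histories with identical heap data. A finer criterion such as exterior merge is not modeled. JoinMerge and its methods are hypothetical names.

```java
import java.util.*;

// Merge at a join point: union construction followed by a Past-1 quotient (illustrative).
final class JoinMerge {
    record Edge(int src, String label, int dst) {}

    /** A history automaton: state 0 is initial, current holds the state(s) reached so far. */
    record Automaton(int states, Set<Edge> edges, Set<Integer> current) {}

    /** Union: identify the initial states (assumed to have no incoming edges), renumber b's others. */
    static Automaton union(Automaton a, Automaton b) {
        int offset = a.states() - 1;
        Set<Edge> edges = new LinkedHashSet<>(a.edges());
        Set<Integer> current = new HashSet<>(a.current());
        for (Edge e : b.edges()) edges.add(new Edge(shift(e.src(), offset), e.label(), shift(e.dst(), offset)));
        for (int c : b.current()) current.add(shift(c, offset));
        return new Automaton(a.states() + b.states() - 1, edges, current);
    }

    private static int shift(int q, int offset) {
        return q == 0 ? 0 : q + offset;
    }

    /** Past-1 quotient: states sharing an incoming label are merged (transitively). */
    static Automaton quotientPast1(Automaton m) {
        int[] parent = new int[m.states()];
        for (int i = 0; i < parent.length; i++) parent[i] = i;
        Map<String, Integer> firstTarget = new HashMap<>();
        for (Edge e : m.edges()) {
            Integer other = firstTarget.putIfAbsent(e.label(), e.dst());
            if (other != null) parent[find(parent, e.dst())] = find(parent, other);
        }
        Map<Integer, Integer> id = new HashMap<>();
        for (int q = 0; q < parent.length; q++) id.putIfAbsent(find(parent, q), id.size());
        Set<Edge> edges = new LinkedHashSet<>();
        for (Edge e : m.edges()) {
            edges.add(new Edge(id.get(find(parent, e.src())), e.label(), id.get(find(parent, e.dst()))));
        }
        Set<Integer> current = new HashSet<>();
        for (int c : m.current()) current.add(id.get(find(parent, c)));
        return new Automaton(id.size(), edges, current);
    }

    private static int find(int[] p, int x) {
        while (p[x] != x) { p[x] = p[p[x]]; x = p[x]; }
        return x;
    }

    public static void main(String[] args) {
        Automaton histA = new Automaton(2, Set.of(new Edge(0, "a", 1)), Set.of(1));
        Automaton histAB = new Automaton(3, Set.of(new Edge(0, "a", 1), new Edge(1, "b", 2)), Set.of(2));
        System.out.println(quotientPast1(union(histA, histAB)));
    }
}
```

For the histories a and ab, the union has two a-successors of the initial state. The Past-1 quotient merges them, which leaves the three-state automaton 0 -a-> 1 -b-> 2 with current states {1, 2}.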

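[Figure: merge examples. The first shows the histories a and ab arising from x.a(), x.b(), and if(?). The second shows histories over cfg, cnc, and fin at the end of the while loop, combined by union followed by quotient (quo).]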

Example: Past Abstraction with Exterior Merge

[Figure: union of cfg cnc fin histories]

Recap: Abstraction Dimensions

[Figure: SocketChannel automata mined under different abstraction choices. With the Base heap abstraction, the results have two or three states, labeled with event sets such as [write, connect, close, finCon, config, read], [write, connect, close, finCon, read], [write, connect, config, finCon, read], and [write, connect, finCon, read]. With APFocus (the refined heap abstraction) and Past/Exterior, the result is a seven-state automaton (states 0 to 6) over config, connect, fincon, read, write, and close. Merge criteria shown: Past/Total and Past/Exterior.]

Third dimension: a different history abstraction, not shown here.

Dimensions: heap abstraction, merge criteria, history abstraction

Summarization Phase: Noise

• Analysis imprecision
• Bugs in the training corpus

[Figure: trace collection results for java.security.Signature. Automata over initS (initSign), initV (initVerify), up (update), sign, and verify occur n, k, and 1 times. They are shown next to the real API, in which initS is followed by up and then sign, and initV by up and then verify.]

Naïve Union

[Figure: the union of all trace collection results, including the automaton that occurs only once]

No noise reduction; sound summary

Weighted Union

• Label each transition with the number of input automata that contain it
• Remove transitions whose weight is below a threshold

[Figure: the weighted union, with transitions labeled by weights such as n, k+1, and 1]
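The weighted union described above takes only a few lines once transitions from different automata can be identified. This sketch assumes the input automata already name their states consistently. It treats each automaton as a set of transitions. With threshold 1 it reduces to the naive union. WeightedUnion, its methods, and the noisy sample trace are hypothetical.

```java
import java.util.*;

// Weighted union: count in how many input automata each transition occurs, drop rare ones.
final class WeightedUnion {
    // Assumes consistent state names across automata, so equal transitions can be identified.
    record Transition(String src, String label, String dst) {}

    /** Weight of a transition = number of input automata that contain it. */
    static Map<Transition, Integer> weights(List<Set<Transition>> automata) {
        Map<Transition, Integer> weight = new LinkedHashMap<>();
        for (Set<Transition> automaton : automata) {
            for (Transition t : automaton) weight.merge(t, 1, Integer::sum);
        }
        return weight;
    }

    /** Keeps transitions whose weight reaches the threshold; threshold 1 is the naive union. */
    static Set<Transition> summarize(List<Set<Transition>> automata, int threshold) {
        Set<Transition> kept = new LinkedHashSet<>();
        weights(automata).forEach((t, w) -> {
            if (w >= threshold) kept.add(t);
        });
        return kept;
    }

    static List<Set<Transition>> sampleCorpus() {
        Set<Transition> signing = Set.of(new Transition("0", "initS", "1"),
                new Transition("1", "up", "2"), new Transition("2", "sign", "3"));
        Set<Transition> verifying = Set.of(new Transition("0", "initV", "1'"),
                new Transition("1'", "up", "2'"), new Transition("2'", "verify", "3'"));
        // Hypothetical noisy client: calls sign after initVerify.
        Set<Transition> noisy = Set.of(new Transition("0", "initV", "1'"),
                new Transition("1'", "up", "2'"), new Transition("2'", "sign", "3''"));
        return List.of(signing, signing, signing, verifying, verifying, noisy);
    }

    public static void main(String[] args) {
        System.out.println(summarize(sampleCorpus(), 1).size() + " transitions in the naive union");
        System.out.println(summarize(sampleCorpus(), 2).size() + " transitions with threshold 2");
    }
}
```

In this toy corpus, the sign transition that follows initV appears in only one automaton. Threshold 2 removes it, while the naive union keeps all seven transitions.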

Clustering

[Figure: the trace collection results partitioned into clusters]

• Automata are partitioned into clusters of “similar” automata, and each cluster is summarized separately
• Similarity = language inclusion
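Language inclusion between deterministic automata can be checked by exploring reachable state pairs. L(A) ⊆ L(B) fails exactly when some pair has an accepting A-state while B is in a rejecting or dead state. The slides fix only the similarity notion. The greedy strategy below is an assumption: each automaton joins the first cluster whose representative includes it or is included by it. InclusionClustering and its helpers are hypothetical names.

```java
import java.util.*;

// Clustering automata by language inclusion (illustrative; the greedy strategy is an assumption).
final class InclusionClustering {
    /** A DFA with initial state 0 and partial transitions (a missing transition means "dead"). */
    record Dfa(Map<Integer, Map<String, Integer>> delta, Set<Integer> accepting) {}

    record Pair(int qa, int qb) {}   // qb == -1 encodes the dead state of b

    /** L(a) ⊆ L(b), checked over the state pairs reachable from (0, 0). */
    static boolean included(Dfa a, Dfa b) {
        Deque<Pair> work = new ArrayDeque<>(List.of(new Pair(0, 0)));
        Set<Pair> seen = new HashSet<>(work);
        while (!work.isEmpty()) {
            Pair p = work.pop();
            if (a.accepting().contains(p.qa()) && !b.accepting().contains(p.qb())) return false;
            for (Map.Entry<String, Integer> e : a.delta().getOrDefault(p.qa(), Map.of()).entrySet()) {
                int nb = p.qb() < 0 ? -1 : b.delta().getOrDefault(p.qb(), Map.of()).getOrDefault(e.getKey(), -1);
                Pair next = new Pair(e.getValue(), nb);
                if (seen.add(next)) work.push(next);
            }
        }
        return true;
    }

    /** Greedy clustering: join the first cluster whose representative is comparable by inclusion. */
    static List<List<Dfa>> cluster(List<Dfa> automata) {
        List<List<Dfa>> clusters = new ArrayList<>();
        for (Dfa d : automata) {
            List<Dfa> home = null;
            for (List<Dfa> c : clusters) {
                Dfa rep = c.get(0);
                if (included(d, rep) || included(rep, d)) { home = c; break; }
            }
            if (home == null) { home = new ArrayList<>(); clusters.add(home); }
            home.add(d);
        }
        return clusters;
    }

    /** A single-path DFA whose states are all accepting (a prefix-closed history). */
    static Dfa chain(String... events) {
        Map<Integer, Map<String, Integer>> delta = new HashMap<>();
        Set<Integer> accepting = new HashSet<>(Set.of(0));
        for (int i = 0; i < events.length; i++) {
            delta.computeIfAbsent(i, k -> new HashMap<>()).put(events[i], i + 1);
            accepting.add(i + 1);
        }
        return new Dfa(delta, accepting);
    }

    public static void main(String[] args) {
        Dfa signFull = chain("initS", "up", "sign");
        Dfa signShort = chain("initS", "up");
        Dfa verify = chain("initV", "up", "verify");
        System.out.println(cluster(List.of(signFull, verify, signShort)).size() + " clusters");
    }
}
```

Here the truncated signing history falls into the cluster of the full signing history, and the verifying history forms its own cluster.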

Experimental Results

Mined various APIs from a suite of benchmarks:
• APIs from Java libraries: java.security.Signature, java.security.KeyAgreement, …
• Ganymed: Session, Connection, ConnectionManager, …
• FlickrAPI: Photo, Auth, …

[Figures: mined automata for java.security.Signature (Base/Past/Total, Base/Past/Exterior, APFocus/Past/Exterior) and Ganymed Session (Base/Past/Exterior, APFocus/Past/Exterior)]

(All results here are actual images produced by the tool.)

Lessons from Experiments

• Precise heap abstractions AND history abstractions are needed
• Pragmatics
  • Summaries other than union do not guarantee an over-approximation of behaviors, but are still useful
  • With a timeout, the trace collection result is not an over-approximation, but is still useful
• Limitations
  • Results can be too detailed (print, println)
  • Scalability remains a challenge
  • Single-object vs. multiple-object specs

Summary

• Client-side specification mining
  • based on flow-sensitive, context-sensitive abstract interpretation
  • combined domain abstracting both aliasing and event sequences
• Novel family of abstractions to represent unbounded event sequences
• Novel summarization algorithms
• Preliminary experimental results

Invited Questions

1) How do you get the API in the motivation slide from the example program you showed?
2) Can you give an example of the effect of past vs. future?
3) I didn’t get merge; can you show another example?
4) Can you say when the results are precise?
5) Can you say something more about the experimental results?
6) Related work?

API in motivation slide vs. one from example

[Figure: the partial specification from the motivation slide (states 0 to 5 over config, connect, finCon, read/write, close) next to the automaton mined from the example program (states 0 to 6 over config, connect, fincon, read, write, close)]

• Elements in the list are not known to be unique
  • connect can be repeated
  • close can be repeated
• Read and write never happen together
  • thus they are kept in separate parts of the automaton
• This is not a bad result for an automated tool (and a single example program!)
• All of these would be “washed away” with a sufficient number of other examples

Example: Past Abstraction with Exterior Merge

[Figure: starting from cfg cnc fin (with a fin loop), the then-branch runs a while loop over x.read and the else-branch a while loop over x.write, yielding histories that end in rd and wr loops. At the end of the for loop, after if(?), these histories are at first not merged ("No merge!"). When they are merged, the rd and wr loops stay in separate branches after fin.]

Example: Future Abstraction with Exterior Merge

[Figure: SocketChannel specifications mined with the Past abstraction and with the Future abstraction, over the events cfg, cnc, fin, rd, wr, and cl]

In this example, the automata differ, but they accept the same language.

Merge Criteria

[Figure: Total Merge vs. Exterior Merge of the histories a and ab under the Past-1 history abstraction, each shown as union followed by quotient (quo)]

Can you say when the results are precise?

• When there exists an automaton in which the chosen equivalence relation uniquely characterizes each state

Experimental Results

States, edges, and density (edges per state) of the mined automata for each configuration. Configurations combine the heap abstraction (Base or APF = APFocus), the history abstraction (Past or Fut = Future), and the merge criterion (Tot = Total or Ext = Exterior). Each cell: states / edges / density.

API            Base/Past/Tot  Base/Past/Ext  Base/Fut/Ext  APF/Past/Tot  APF/Past/Ext  APF/Fut/Ext
Auth           2/3/1.50       2/3/1.5        2/3/1.5       2/2/1.00      2/2/1.00      2/2/1.00
Channel        2/6/3.00       3/6/2.00       3/6/2.00      3/3/1.00      3/3/1.00      3/3/1.00
ChannelMgr     2/11/5.50      5/18/3.60      6/19/3.17     4/7/1.75      5/9/1.80      5/9/1.80
Cipher         1/5/5.00       4/14/3.50      6/12/2.00     7/10/1.43     7/10/1.43     7/10/1.43
Connection     3/12/4.00      4/12/3.00      4/12/3.00     5/7/1.40      5/7/1.40      5/7/1.40
KeyAgreement   2/5/2.50       4/6/1.50       4/6/1.5       4/3/0.75      4/3/0.75      4/3/0.8
LineAndShape   3/12/4.00      6/15/2.50      6/15/2.50     6/8/1.33      6/8/1.33      6/8/1.33
MsgDigest      1/2/2.00       2/2/1.00       2/2/1.00      2/2/1.00      2/2/1.00      2/2/1.00
Photo          1/12/12.00     1/12/12.00     1/8/8.00      8/8/1.00      8/8/1.00      8/8/1.00
PrintWriter    1/3/3.00       2/3/1.50       2/3/1.50      6/11/1.83     3/5/1.67      3/5/1.67
Session        2/7/3.50       5/10/2.00      5/10/2.00     5/4/0.80      5/4/0.80      5/4/0.80
Signature      2/8/4.00       5/12/2.40      5/12/2.40     4/6/1.50      4/6/1.50      4/6/1.50
TransportMgr   9/24/2.67      2/19/9.50      8/27/3.38     9/26/2.89     9/24/2.67     9/24/2.67
URLConnection  2/9/4.50       4/10/2.5       3/6/2         4/7/1.75      NA            NA

Average density  4.08  3.46  2.57  1.39  1.33  1.33
Std dev          2.54  3.22  1.71  0.56  0.52  0.52
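The density column is edges divided by states. For example, Auth under Base/Past/Tot has 3 / 2 = 1.50. As a quick arithmetic check, recomputing the mean and the sample standard deviation (dividing by n - 1) of the Base/Past/Tot density column reproduces the reported 4.08 and 2.54. The sketch assumes the Average and Std dev rows are plain, unweighted statistics over the listed APIs. DensityStats is a hypothetical helper.

```java
import java.util.Locale;

// Recomputes the summary rows of the results table for one column (illustrative check).
final class DensityStats {
    // Density column for Base/Past/Tot, copied from the table.
    static final double[] BASE_PAST_TOT = {1.50, 3.00, 5.50, 5.00, 4.00, 2.50, 4.00,
                                          2.00, 12.00, 3.00, 3.50, 4.00, 2.67, 4.50};

    static double density(int states, int edges) {
        return (double) edges / states;
    }

    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    /** Sample standard deviation (divides by n - 1). */
    static double sampleStdDev(double[] xs) {
        double m = mean(xs);
        double squares = 0;
        for (double x : xs) squares += (x - m) * (x - m);
        return Math.sqrt(squares / (xs.length - 1));
    }

    public static void main(String[] args) {
        System.out.println(String.format(Locale.ROOT, "Auth, Base/Past/Tot: %.2f", density(2, 3)));
        System.out.println(String.format(Locale.ROOT, "average %.2f, std dev %.2f",
                mean(BASE_PAST_TOT), sampleStdDev(BASE_PAST_TOT)));
    }
}
```

Running it prints "average 4.08, std dev 2.54" on the second line, matching the table.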

Japanese Toilet API

The two buttons linked together (next to the floating woman) are given the group label (well, "bottom" or "posterior"), one with the word "mild" and the other with "powerful." The icon on each button indicates a water jet.

I can't see the third character labeling the jog shuttle, but that appears to be a "flow" control for a water jet - not sure though.

There are several opportunities for mode errors here which (I hope) are mitigated by the LCD display: the button above the jog shuttle labeled "wide jet" is toggled on/off, and the "dryer" button cycles through three strengths. My experience with toilet UI (although not great) indicates that mode errors are a problem though. If that jet feels rather, er, surprising, a lack of mode data makes you reluctant to try to alter it...

(Some) Related Work

• Dynamic: DAIKON (…), Perracotta (ICSE06), DIDUCE (ICSE02), Strauss (Ammons et al., POPL02), Whaley et al. (ISSTA02), …
• Static: JIST (Alur et al., POPL05), Whaley et al. (ISSTA02), …