
Mining Branch-Time Scenarios From Execution Logs

Date post: 05-Dec-2014
Upload: dirk-fahland
Description:
This presentation was given at the International Conference on Automated Software Engineering (ASE 2013) in Palo Alto, November 2013. We describe a technique for automatically extracting specifications from execution traces of an application. The specifications that we extract are scenarios in the form of conditional existential Live Sequence Charts (LSCs), which are similar to UML Sequence Diagrams. The technique is implemented in a tool and was evaluated on two real-life event logs.
Transcript
Page 2: Mining Branch-Time Scenarios From Execution Logs

Mining Branching-Time Scenarios

Dirk Fahland, Eindhoven University of Technology
David Lo, Singapore Management University
Shahar Maoz, Tel Aviv University

Page 5: Mining Branch-Time Scenarios From Execution Logs

Understanding Existing Applications

[Diagram labels: specification; understanding of objects and object interplay; fix bugs, add features, test, document, …]

Usually there is no specification or, worse, if one exists, it is outdated.

Page 7: Mining Branch-Time Scenarios From Execution Logs

Understanding Existing Applications

[Diagram labels: source code (marked with question marks); specification; understanding of objects and object interplay; fix bugs, add features, test, document, …]

Understanding applications from source code is

• laborious
• time-consuming
• error-prone

Page 8: Mining Branch-Time Scenarios From Execution Logs

Specification Mining

[Diagram labels: source code (marked with question marks); automatically extract; specification; understanding of objects and object interplay]

Page 9: Mining Branch-Time Scenarios From Execution Logs

Specification Mining from Event Logs

[Diagram labels: source code (marked with question marks); log; automatically extract; specification; understanding of objects and object interplay]

Page 11: Mining Branch-Time Scenarios From Execution Logs

This talk

[Diagram labels: source code (marked with question marks); log; automatically extract; specification; understanding of objects and object interplay]

What is a good specification language to get an overview of how an application works?

Page 12: Mining Branch-Time Scenarios From Execution Logs

Understanding Object Interplay

FTP server: How does Login/Logout work?

In object-oriented applications, the hard part is to understand how the different objects relate to and interact with each other. Here are some classes of an FTP server. How does login/logout work?

Page 13: Mining Branch-Time Scenarios From Execution Logs

Understanding Object Interplay

Scenario: Login/Logout

[Sequence diagram: lifelines A, B, C, UserCmd; message onConnect()]

An object of class A invokes method onConnect() on an object of class B.

Page 15: Mining Branch-Time Scenarios From Execution Logs

Understanding Object Interplay

Scenario: Login/Logout

[Sequence diagram: lifelines A, B, C, UserCmd; messages onConnect(), onLogin(), setUser(), setLogout()]

• non-local behavior: multiple objects
• logically related

Page 16: Mining Branch-Time Scenarios From Execution Logs

When does it happen?

Scenario: Login/Logout

[Sequence diagram: lifelines A, B, C, UserCmd; messages onConnect(), onLogin(), setUser(), setLogout()]

This scenario tells us which objects and methods are involved in login/logout. But it does not tell us when this scenario occurs in the application.

Page 17: Mining Branch-Time Scenarios From Execution Logs

When does it happen?

Scenario: Login/Logout

[Sequence diagram: lifelines A, B, C, UserCmd; messages onConnect(), onLogin(), setUser(), setLogout()]

Whenever the prechart happens, eventually the mainchart will happen.

Page 18: Mining Branch-Time Scenarios From Execution Logs

Linear-Time LSCs - Invariants

[Figure labels: pre; Login; Login; 2 runs]

There can be other behaviors between the behaviors shown in the LSC.

Page 19: Mining Branch-Time Scenarios From Execution Logs

Linear-Time LSCs - Invariants

[Figure labels: pre; Login; Login; 2 runs]

This run does not continue with the complete mainchart of the LSC.
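The linear-time reading (whenever the prechart happens, eventually the mainchart happens, with other behavior allowed in between) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' tool: a trace is taken to be a list of method names, prechart and mainchart are plain event sequences, and matching is simple subsequence matching. The function names are hypothetical.

```python
def is_subsequence(pattern, events):
    """True if the events of `pattern` occur in `events` in order, gaps allowed."""
    it = iter(events)
    return all(any(e == p for e in it) for p in pattern)


def holds_linear(prechart, mainchart, trace):
    """Check one trace: every completed prechart must be followed by the mainchart."""
    for i, event in enumerate(trace):
        # The prechart completes at position i if its last event occurs here
        # and its earlier events occurred (in order) somewhere before.
        if event == prechart[-1] and is_subsequence(prechart[:-1], trace[:i]):
            if not is_subsequence(mainchart, trace[i + 1:]):
                return False
    return True


pre = ["onConnect"]
main = ["onLogin", "setUser", "setLogout"]
complete_run = ["onConnect", "other", "onLogin", "setUser", "other", "setLogout"]
broken_run = ["onConnect", "onLogin"]
print(holds_linear(pre, main, complete_run), holds_linear(pre, main, broken_run))
```

The first run satisfies the LSC despite the interleaved events; the second run stops before the mainchart is complete and therefore violates it.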

Page 20: Mining Branch-Time Scenarios From Execution Logs

Understanding Everything

[Figure labels: FTP download; FTP delete; FTP rename; 2 alternative runs; alternative behaviors]

Not everything is an invariant.

Page 21: Mining Branch-Time Scenarios From Execution Logs

Linear-Time is Insufficient

[Figure: scenario for FTP delete command; 2 runs]

This LSC for the delete command does not hold in every run.

Page 22: Mining Branch-Time Scenarios From Execution Logs

Branching Time

[Figure: execution tree]

We merge all runs on their joint prefixes into an execution tree.
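Merging runs on their joint prefixes amounts to building a prefix tree. The following is a minimal sketch, assuming each run is a list of event names and representing a tree node as a nested dict (both are illustrative choices, not taken from the talk):

```python
def build_execution_tree(traces):
    """Merge traces on shared prefixes; each node maps an event to its subtree."""
    root = {}
    for trace in traces:
        node = root
        for event in trace:
            # Reuse the child for this event if it exists, else create it.
            node = node.setdefault(event, {})
    return root


runs = [
    ["login", "delete"],
    ["login", "download"],
    ["login", "rename"],
]
tree = build_execution_tree(runs)
print(tree)
```

All three runs share the node for login, which then branches into the three alternative commands.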

Page 23: Mining Branch-Time Scenarios From Execution Logs

Branching-Time LSCs

[Figure: execution tree]

Whenever the prechart happens, there exists a branch where the mainchart happens [Sibay, Uchitel, Braberman ICSE 2008].
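The existential reading can be checked on a prefix tree. The sketch below is a simplification under stated assumptions and is not the authors' implementation: runs are lists of event names, the tree is a nested dict, and charts are matched as subsequences along a path. All function names are hypothetical.

```python
def build_execution_tree(traces):
    """Merge traces on shared prefixes into a nested-dict tree."""
    root = {}
    for trace in traces:
        node = root
        for event in trace:
            node = node.setdefault(event, {})
    return root


def exists_branch(node, pattern):
    """True if some path below `node` contains `pattern` as a subsequence."""
    if not pattern:
        return True
    for event, child in node.items():
        rest = pattern[1:] if event == pattern[0] else pattern
        if exists_branch(child, rest):
            return True
    return False


def holds_branching(tree, prechart, mainchart):
    """Wherever the prechart completes, at least one branch must show the mainchart."""
    prefix, last = prechart[:-1], prechart[-1]

    def visit(node, k):
        # k = number of leading prechart events already matched on this path
        for event, child in node.items():
            if k == len(prefix) and event == last:
                if not exists_branch(child, mainchart):
                    return False
            next_k = k + 1 if k < len(prefix) and event == prefix[k] else k
            if not visit(child, next_k):
                return False
        return True

    return visit(tree, 0)


tree = build_execution_tree([["login", "delete"], ["login", "download"]])
print(holds_branching(tree, ["login"], ["delete"]))
print(holds_branching(tree, ["login"], ["download"]))
print(holds_branching(tree, ["login"], ["rename"]))
```

Both delete and download hold as branching LSCs on this tree even though neither holds in every run; rename has no matching branch.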

Page 24: Mining Branch-Time Scenarios From Execution Logs

Describing Alternatives

[Figure: execution tree]

We can define an LSC for the download command that is an alternative to delete.

Page 25: Mining Branch-Time Scenarios From Execution Logs

Describing Alternatives

[Figure: execution tree]

… and also an LSC for the rename command.

Page 28: Mining Branch-Time Scenarios From Execution Logs

LSC Mining from Event Logs

[Diagram labels: log; automatically extract; complete set of LSCs (linear / branching); understanding of objects and object interplay]

We want to discover a set of LSCs that describes all behaviors (or as many of the behaviors as possible).

Page 29: Mining Branch-Time Scenarios From Execution Logs

Logs

[Diagram labels: log; automatically extract; complete set of LSCs (linear / branching)]

Log method calls:

caller1, callee1, method1(…)
caller2, callee2, method2(…)
…

Each execution of the application gives one trace. Run the application multiple times to obtain a log.
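As an illustration of such a log, the sketch below parses one trace into events. The concrete text layout is an assumption: the slide only shows the schematic form "caller, callee, method(…)", so the one-event-per-line format, the `Event` type, the `parse_trace` function and the sample lines are all hypothetical.

```python
from typing import NamedTuple


class Event(NamedTuple):
    caller: str
    callee: str
    method: str


def parse_trace(lines):
    """Parse lines of the assumed form 'caller, callee, method(args)'."""
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split only twice so that commas inside the argument list survive.
        caller, callee, call = (part.strip() for part in line.split(",", 2))
        method = call.split("(", 1)[0]  # drop the argument list
        events.append(Event(caller, callee, method))
    return events


# Made-up sample lines, for illustration only.
trace = parse_trace([
    "A, B, onConnect()",
    "",
    "B, UserCmd, onLogin(user, password)",
])
print(trace)
```

A log is then simply a list of such traces, one per execution of the application.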

Page 30: Mining Branch-Time Scenarios From Execution Logs

Desired Outcome

[Diagram labels: log = tree; automatically extract; complete set of LSCs (occurring at least s times = support)]

Page 31: Mining Branch-Time Scenarios From Execution Logs

Mining Algorithm

[Figure: tree]

github.com/scenario-based-tools/sam/

variant of [Lo, Maoz, Khoo ASE 2007]

Page 32: Mining Branch-Time Scenarios From Execution Logs

Mining Algorithm

[Figure: tree; candidate words over the events onConnect(), onLogin(), setLogin(), setLogout()]

1. enumerate all sequences of events occurring ≥ s times (candidate words)

github.com/scenario-based-tools/sam/

Starting from sequences of length 1, recursively append events and check whether the extended sequence still occurs often enough. An efficient implementation uses branch and bound and some heuristics.
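Step 1 can be sketched as a depth-first enumeration that only extends words that are still frequent. This is a minimal sketch, not the tool's algorithm: it assumes that support is counted as the number of traces containing the word as a subsequence, and it uses only the basic pruning rule (an infrequent word cannot have a frequent extension) rather than the branch-and-bound heuristics mentioned above. Function names are hypothetical.

```python
def is_subsequence(pattern, events):
    """True if the events of `pattern` occur in `events` in order, gaps allowed."""
    it = iter(events)
    return all(any(e == p for e in it) for p in pattern)


def support(word, traces):
    """Assumed support measure: number of traces that contain the word."""
    return sum(is_subsequence(word, trace) for trace in traces)


def frequent_words(traces, s):
    """Enumerate all words with support >= s, growing them one event at a time."""
    if s < 1:
        raise ValueError("support threshold s must be at least 1")
    alphabet = sorted({event for trace in traces for event in trace})
    result = []

    def extend(word):
        for event in alphabet:
            candidate = word + [event]
            # Prune: if the candidate is infrequent, so is every extension of it.
            if support(candidate, traces) >= s:
                result.append(candidate)
                extend(candidate)

    extend([])
    return result


traces = [["a", "b", "c"], ["a", "c"], ["b", "c"]]
words = frequent_words(traces, 2)
print(words)
```

The recursion terminates because a word longer than the longest trace has support 0.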

Page 33: Mining Branch-Time Scenarios From Execution Logs

Mining Algorithm

[Figure: tree; candidate words over the events onConnect(), onLogin(), setLogin(), setLogout()]

1. enumerate all sequences of events occurring ≥ s times (candidate words)
2. generate all candidate LSCs

github.com/scenario-based-tools/sam/

For each candidate word, generate every split: from the LSC with a prechart of length 1 to the LSC with a mainchart of length 1.

Page 34: Mining Branch-Time Scenarios From Execution Logs

Mining Algorithm

[Figure: tree; candidate words over the events onConnect(), onLogin(), setLogin(), setLogout()]

1. enumerate all sequences of events occurring ≥ s times (candidate words)
2. generate all candidate LSCs
3. test for each LSC if satisfied ≥ c%

github.com/scenario-based-tools/sam/

Page 36: Mining Branch-Time Scenarios From Execution Logs

LSC Mining from Event Logs

[Diagram labels: log; automatically extract; complete set of LSCs (linear and branching); understanding of objects and object interplay]

What do branching scenarios add to specification mining?

Page 37: Mining Branch-Time Scenarios From Execution Logs

Experiments

CrossFTP server: 54 traces, 50 event types

s    Linear LSC   covered events   avg. length   time [s]   Branching LSC   covered events   avg. length   time [s]
20   7            90%              7             3          7+0             90%              7             3
14   9            90%              5             31         9+12            95%              13            26
10   9            90%              5             1008       9+18            95%              18            685

The branching LSCs contain the linear LSCs plus some strictly branching LSCs that were not found before (for example, 9+12 reads as 9 linear plus 12 strictly branching LSCs). Branching LSCs are less frequent: they appear only at lower support thresholds.

Page 38: Mining Branch-Time Scenarios From Execution Logs

Experiments

Branching LSCs cover more events of the log than linear LSCs alone (95% vs. 90% of the events for s = 14 and s = 10 on CrossFTP).

Page 39: Mining Branch-Time Scenarios From Execution Logs

Experiments

Branching LSCs are longer than linear LSCs. In other words, they show more detail for a particular behavior.

Page 40: Mining Branch-Time Scenarios From Execution Logs

Experiments

Running times for extraction are feasible. Note that the LSCs shown here are those left after removing subsumed ones. Originally, the algorithm finds around 6 million branching LSCs in 685 seconds.

Page 41: Mining Branch-Time Scenarios From Execution Logs

Experiments

CrossFTP server: 54 traces, 50 event types

s    Linear LSC   covered events   avg. length   time [s]   Branching LSC   covered events   avg. length   time [s]
20   7            90%              7             3          7+0             90%              7             3
14   9            90%              5             31         9+12            95%              13            26
10   9            90%              5             1008       9+18            95%              18            685

Columba mail client: 104 traces, 79 event types

s / c   Linear LSC   covered events   avg. length   time [s]   Branching LSC   covered events   avg. length   time [s]
20      57           70%              4             159        57+1            71%              9             154
10      205          72%              6             2191       205+53          75%              9             2055
10/.5   163          78%              6             2256       163+44          84%              6             2125

Full data sets and results:

http://dx.doi.org/10.4121/uuid:aa7db920-aae6-4750-8975-cb739262f432

Page 43: Mining Branch-Time Scenarios From Execution Logs

Linear vs. Branching: CrossFTP

[Figure labels: application life-cycle from end to end; connect; login; logout; clean up]

What is the qualitative contribution of branching LSCs to specification mining?

Page 44: Mining Branch-Time Scenarios From Execution Logs

Linear vs. Branching: CrossFTP

[Figure labels: application life-cycle from end to end; short invariants of individual FTP commands; invariant of RENAME]

Page 45: Mining Branch-Time Scenarios From Execution Logs

Linear vs. Branching: CrossFTP

[Figure labels: application life-cycle from end to end; login; rename command; FTP command + where triggered; short invariants of individual FTP commands]

The branching LSC fills the gap between large and small invariants.

Page 46: Mining Branch-Time Scenarios From Execution Logs

Linear vs. Branching: CrossFTP

[Figure labels: application life-cycle from end to end; individual FTP commands + where they are triggered (shown twice); all FTP commands + can be triggered in the same situation; short invariants of individual FTP commands]

We found all FTP commands supported by the server as alternative LSCs.

Page 47: Mining Branch-Time Scenarios From Execution Logs

Linear vs. Branching: CrossFTP

[Figure labels: application life-cycle from end to end; individual FTP commands + where they are triggered (shown twice); all FTP commands + can be triggered in the same situation; short invariants of individual FTP commands; cycles: rename, delete]

… and we could discover cyclic behavior: after rename, there could be another delete command.

Page 48: Mining Branch-Time Scenarios From Execution Logs

Take Home Points

[Diagram labels: log; complete set of LSCs; understanding of objects and object interplay]

• mining branching scenarios: alternatives, cycles
• combined with linear: comprehensive specification
• future work: visualizing results, distributed scenarios

http://github.com/scenario-based-tools/sam/

Page 49: Mining Branch-Time Scenarios From Execution Logs

Mining Branching-Time Scenarios

about.me/dirk.fahland
@dfahland

Page 50: Mining Branch-Time Scenarios From Execution Logs

Q&A

… is branching time really necessary?

[Figure: linear LSC with disjunction: if … then delete or download]

Yes, here is a linear LSC showing a disjunction for continuing after the prechart.

Page 51: Mining Branch-Time Scenarios From Execution Logs

Branching Time vs. Disjunction

The full execution tree satisfies both this linear LSC with disjunction and the two branching LSCs that describe the two alternatives separately.

Page 52: Mining Branch-Time Scenarios From Execution Logs

Branching Time vs. Disjunction

Removing one branch from the tree (the execution of the download command) violates the branching LSCs, because the one for download no longer has a matching branch. The pruned tree still satisfies the disjunctive linear LSC, because only one of the alternatives has to hold.
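The difference can be reproduced on a toy example. The sketch below uses a deliberately simplified model, stated here as an assumption: every run starts with the prechart and the mainchart must follow immediately, so the branches after the prechart are just the continuations of the runs. Function names and event names are hypothetical.

```python
def continues_with(run, prechart, mainchart):
    """True if the run starts with the prechart, directly followed by the mainchart."""
    n = len(prechart)
    return run[:n] == prechart and run[n:n + len(mainchart)] == mainchart


def disjunctive_linear_holds(runs, prechart, alternatives):
    """Every run with the prechart must continue with at least one alternative."""
    relevant = [run for run in runs if run[:len(prechart)] == prechart]
    return all(
        any(continues_with(run, prechart, alt) for alt in alternatives)
        for run in relevant
    )


def branching_holds(runs, prechart, mainchart):
    """Among the runs with the prechart, at least one must continue with the mainchart."""
    relevant = [run for run in runs if run[:len(prechart)] == prechart]
    return not relevant or any(
        continues_with(run, prechart, mainchart) for run in relevant
    )


pre = ["login"]
full_tree = [["login", "delete"], ["login", "download"]]
pruned_tree = [["login", "delete"]]  # download branch removed

for runs in (full_tree, pruned_tree):
    print(
        disjunctive_linear_holds(runs, pre, [["delete"], ["download"]]),
        branching_holds(runs, pre, ["delete"]),
        branching_holds(runs, pre, ["download"]),
    )
```

On the pruned tree the disjunctive linear LSC still holds, while the branching LSC for download fails: the branching LSCs record that both alternatives are actually possible.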

