Supporting Software Integration Activities with Fine-grained Code Changes

Martín Dias

Advisor: Stéphane Ducasse

Co-Advisor: Damien Cassou

Research team: RMoD - Inria Lille

Begin: November 30th, 2012

End: November 29th, 2015

Funding: INRIA doctoral grant

Software Integration

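[Figure: developers commit to branches and the integrator merges them; diagram labels: Developers, Integrator, Branch, Merge, Commit]

• Text-based
• Semantic conflicts
• Impacts
• Cherry-picking

[Figure: cherry-picking between the developers' and the integrator's versions, labeled 1, 2, 1+𝝙 and 2+𝝙]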

• Authorship: Who is the owner of this changed code?

• Code structure: Which entities (e.g. classes, methods) have been changed?

• Change intention: What is the intention of this commit?

• Bug tracking: What bug fixes also affected the entities impacted by this change?

• Change sequence: Does this commit depend on previous ones?

Do Tools Support Code Integration? A Survey

Martín Dias (1), Stéphane Ducasse (1), Damien Cassou (1), Verónica Uquillas-Gómez (2)

(1): RMoD Inria Lille–Nord Europe, University of Lille, CRIStAL, France
(2): Norizzk.com, Belgium

(Under submission to Journal of Technology)

RQ: What questions do integrators ask?

➡ Open call in 3 development mailing lists
➡ Literature survey

RQ: What is the importance and tool support of each question?

➡ Survey of experts (42 integrators)

Most important questions without tool support

• Understanding Change Impact
• Understanding Change Dependencies when Cherry-picking
• Understanding Change Scattering

Impact (ripple effects)? Tangled changes?

Untangling Fine-Grained Code Changes

Martín Dias (1), Alberto Bacchelli (2), Georgios Gousios (3), Damien Cassou (1), Stéphane Ducasse (1)

(1): RMoD Inria Lille–Nord Europe, University of Lille, CRIStAL, France
(2): SORCERERS @ Software Engineering Research Group, Delft University of Technology, The Netherlands
(3): Digital Security Group, Radboud Universiteit Nijmegen, The Netherlands

(SANER’15)

[Figure: development timeline (t). While working on two tasks, "Feature #6" and "Fix Bug", the developer makes five fine-grained changes and stores them in the VCS as a single tangled commit:]

CH1 A.toString() (Feature #6)
CH2 B.getFoo() (Fix Bug)
CH3 B.toString() (Fix Bug)
CH4 C.getBar() (Feature #6)
CH5 B.toString() (Feature #6)

[Figure: at integration time the tangled commit is reviewed as a whole (labels: -1, +1). An untangler tool splits the tangled commit into two commits: {CH1, CH4, CH5} and {CH2, CH3}]

Herzig and Zeller (MSR 2013)

VCS repositories of 6 Java projects

Tangled commits: 20%

Untangling algorithm using features of code changes

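Features of code changes?

[Figure: each pair of changes, e.g. (CH4, CH5), (CH1, CH3), (CH1, CH2), is scored with features such as Call Graph Distance and File Distance, combined by Linear Regression; the scored pairs are then grouped into the clusters {CH1, CH4, CH5} and {CH2, CH3}]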

Herzig and Zeller (MSR 2013)

Limitations

dynamically-typed languages

light static analysis

Shadowing

[Figure: timeline (t) of CH1 to CH5, highlighting CH3 B.toString() and CH5 B.toString(), two changes to the same method]

Negara et al. (ECOOP’12)

“We found that 37% of code changes are shadowed by other changes, and are not stored in VCS.”
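The shadowing effect can be illustrated with the five changes of the running example. The sketch below is only an illustration, not the measurement procedure of Negara et al.: the `Change` record and the `shadowed_changes` helper are hypothetical names, and it assumes that all five changes end up in one commit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One fine-grained change, in recording order (hypothetical record)."""
    name: str    # e.g. "CH3"
    entity: str  # the method that was changed, e.g. "B.toString()"


def shadowed_changes(changes):
    """Return the changes that a later change to the same entity overwrites.

    Assumption: all changes belong to one commit, so the VCS only stores
    the last version of each entity.
    """
    last_index = {c.entity: i for i, c in enumerate(changes)}
    return [c for i, c in enumerate(changes) if last_index[c.entity] != i]


# The five changes of the running example from the slides.
log = [
    Change("CH1", "A.toString()"),
    Change("CH2", "B.getFoo()"),
    Change("CH3", "B.toString()"),
    Change("CH4", "C.getBar()"),
    Change("CH5", "B.toString()"),
]

shadowed = shadowed_changes(log)
print([c.name for c in shadowed])  # CH3 is shadowed by CH5
```

In this example only CH3 is lost: the commit stores the version of B.toString() produced by CH5.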

Timestamps

[Figure: timeline (t) of CH1 to CH5, each change placed at the time it was made]

VCSs don’t have this information

Test Runner Activity

[Figure: timeline (t) of CH1 to CH5 interleaved with unit test runs, one failing (☒) and one passing (☑)]

Overcoming such limitations

Epicea plugin: fine-grained code changes & IDE events logging

[Figure: the plugin records the fine-grained code changes CH1 A.toString(), CH2 B.getFoo(), CH3 B.toString(), CH4 C.getBar() and CH5 B.toString() on a timeline (t), together with unit test runs (☒ failing, ☑ passing)]

Epicea Model: Events

Epicea Model: Code Changes

Epicea Log Browser
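One way to picture such a log is as a single time-ordered list that mixes code changes with IDE events. The sketch below is an illustration in Python under stated assumptions, not Epicea's actual model: the class names `CodeChange` and `UnitTestRun`, their fields, and all timestamps are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class CodeChange:
    """A fine-grained code change (hypothetical fields)."""
    name: str
    entity: str
    timestamp: float  # seconds since the start of the session


@dataclass(frozen=True)
class UnitTestRun:
    """An IDE event: the developer ran the unit tests (hypothetical fields)."""
    passed: bool
    timestamp: float


Event = Union[CodeChange, UnitTestRun]


def code_changes(log: List[Event]) -> List[CodeChange]:
    """Keep only the code changes, preserving recording order."""
    return [e for e in log if isinstance(e, CodeChange)]


# Made-up session: five changes and two test runs, ordered by time.
session: List[Event] = [
    CodeChange("CH1", "A.toString()", 0.0),
    CodeChange("CH2", "B.getFoo()", 290.0),
    UnitTestRun(passed=False, timestamp=295.0),
    CodeChange("CH3", "B.toString()", 300.0),
    UnitTestRun(passed=True, timestamp=305.0),
    CodeChange("CH4", "C.getBar()", 600.0),
    CodeChange("CH5", "B.toString()", 610.0),
]

print([c.name for c in code_changes(session)])
```

Keeping both kinds of entries in one ordered list preserves what a VCS commit does not: the order of the changes, their timestamps, and the test runs between them.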

[Figure: the fine-grained code changes and unit test runs logged by the plugin are the input of the untangler tool]

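Epicea Untangler

[Figure: pairs of changes, e.g. (CH4, CH5), (CH1, CH3), (CH1, CH2), are classified by any binary classifier using fine-grained features; the result groups the changes into the clusters {CH1, CH4, CH5} and {CH2, CH3}]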
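The step from pairwise predictions to clusters can be sketched as follows. This is a simplified stand-in, not necessarily the clustering used by Epicea Untangler: it merges changes with a union-find structure whenever a pair is predicted to belong to the same task. The function name `cluster_changes` and the predicted pairs are assumptions, chosen to reproduce the two groups of the running example.

```python
def cluster_changes(changes, same_task_pairs):
    """Group changes into clusters, given pairs predicted as 'same task'.

    Uses union-find (connected components), a simplification chosen for
    this sketch.
    """
    parent = {c: c for c in changes}

    def find(c):
        # Follow parent links to the representative, compressing the path.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b in same_task_pairs:
        parent[find(a)] = find(b)

    clusters = {}
    for c in changes:
        clusters.setdefault(find(c), []).append(c)
    return sorted(clusters.values())


changes = ["CH1", "CH2", "CH3", "CH4", "CH5"]
# Hypothetical classifier output: pairs predicted to belong together.
predicted = [("CH1", "CH4"), ("CH4", "CH5"), ("CH2", "CH3")]

print(cluster_changes(changes, predicted))
# [['CH1', 'CH4', 'CH5'], ['CH2', 'CH3']]
```

Connected components are the most permissive choice: a single wrong "same task" prediction merges two clusters, which is one reason a real tool may prefer a more conservative clustering.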

Epicea Untangler: Features

[Figure: the logged timeline (CH1 A.toString(), CH2 B.getFoo(), CH3 B.toString(), CH4 C.getBar(), CH5 B.toString(), with unit test runs ☒ and ☑) and example feature values for pairs of changes such as (CH4, CH5), (CH2, CH3) and (CH1, CH3)]

Ordered distance: 2, 1, 1
Time difference: 300”, 10”, 10”
Same test run: F, T, F

Static Code Analysis

• same class
• same package
• same method name
• # shared variable accesses
• # shared method calls
• # shared variable accesses in delta
• # shared method calls in delta
• # variable accesses
• reciprocal method calls
• both cosmetic changes

Fine-grained Code Change Analysis

• ordered distance
• timestamp difference
• same test run
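A few of these features can be computed directly from two logged changes. The sketch below is a minimal illustration under stated assumptions: the `Change` record and its fields are hypothetical, and "same test run" is interpreted here as "both changes fall in the same window between test runs", which is an assumption, not a definition taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """A logged fine-grained change (hypothetical record for this sketch)."""
    index: int        # position in the recording order (CH1 -> 1, ...)
    package: str
    class_name: str
    method: str
    timestamp: float  # seconds
    test_run: int     # assumed id of the test-run window the change fell in


def pair_features(a: Change, b: Change) -> dict:
    """Compute a few of the listed features for one pair of changes."""
    return {
        "same class": (a.package, a.class_name) == (b.package, b.class_name),
        "same package": a.package == b.package,
        "same method name": a.method == b.method,
        "ordered distance": abs(a.index - b.index),
        "timestamp difference": abs(a.timestamp - b.timestamp),
        "same test run": a.test_run == b.test_run,
    }


# Made-up values for two changes of the running example.
ch3 = Change(3, "app", "B", "toString", timestamp=300.0, test_run=1)
ch5 = Change(5, "app", "B", "toString", timestamp=610.0, test_run=2)

features = pair_features(ch3, ch5)
print(features)
```

The static features that need code analysis (shared variable accesses, shared method calls, cosmetic changes) are left out because they require a parser for the target language.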

Epicea Untangler: Classifiers

• binary logistic regression
• naïve Bayes
• random forests

These classifiers make different assumptions about the underlying data and model.
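To make the classification step concrete, here is a minimal sketch of the first classifier, binary logistic regression, trained with plain batch gradient descent. It is not the authors' setup: the training pairs, the choice of three features, and the hyperparameters `epochs` and `rate` are all made up for illustration.

```python
import math


def predict_probability(weights, bias, x):
    """Probability that a pair of changes belongs to the same task."""
    z = bias + sum(w * value for w, value in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(rows, labels, epochs=5000, rate=0.1):
    """Fit a binary logistic regression with batch gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = predict_probability(weights, bias, x) - y
            grad_w = [g + error * value for g, value in zip(grad_w, x)]
            grad_b += error
        weights = [w - rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= rate * grad_b / len(rows)
    return weights, bias


# Made-up pairs: [ordered distance, time difference (minutes), same class]
rows = [[1, 0.2, 1], [1, 0.1, 0], [2, 0.5, 1],
        [4, 5.0, 0], [3, 6.0, 0], [5, 8.0, 1]]
labels = [1, 1, 1, 0, 0, 0]  # 1 = same task, 0 = different tasks

weights, bias = train_logistic_regression(rows, labels)
predictions = [int(predict_probability(weights, bias, x) >= 0.5) for x in rows]
print(predictions)
```

Logistic regression assumes a linear decision boundary in the feature space, whereas random forests do not; this is the kind of difference in assumptions the slide refers to.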

Epicea Untangler: binary classifier + fine-grained features

RQ: Which features are dominant?

RQ: Most effective classifier?

Data collection: 4 months × 2 developers using the plugin, followed by manual untangling

Published: http://dx.doi.org/10.6084/m9.figshare.1241571

RQ: Most effective classifier?

Training and testing setups (developers 1 and 2):

setup      training   testing
same       1          1
crossed    1          2
combined   1          1 and 2

classifier                   AUC   ACC   PREC  REC   F.MEASURE  G.MEAN
binary logistic regression   0.92  0.68  0.43  0.96  0.60       0.76
naïve bayes                  0.88  0.65  0.41  0.94  0.57       0.73
random forests               0.99  0.96  0.96  0.88  0.92       0.93
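The columns of the results table, except AUC, can be derived from a binary confusion matrix. The sketch below uses the standard definitions, with G-mean taken as the geometric mean of recall and specificity, which is an assumption about the definition used in the study. The counts are hypothetical, picked only so that the rounded values land near the random forests row; they are not the study's data.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from the counts of a binary confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # also called sensitivity
    specificity = tn / (tn + fp)
    return {
        "ACC": (tp + tn) / (tp + fp + tn + fn),
        "PREC": precision,
        "REC": recall,
        "F.MEASURE": 2 * precision * recall / (precision + recall),
        "G.MEAN": math.sqrt(recall * specificity),
    }


# Hypothetical counts of pairs: tp = pairs correctly predicted "same task".
metrics = classification_metrics(tp=88, fp=4, tn=296, fn=12)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

AUC is omitted because it needs the classifier's scores for every pair, not just the four counts.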

RQ: Which features are dominant?

classifier                                    AUC   ACC   PREC  REC   F.MEASURE  G.MEAN
binary logistic regression                    0.92  0.68  0.43  0.96  0.60       0.76
naïve bayes                                   0.88  0.65  0.41  0.94  0.57       0.73
random forests                                0.99  0.96  0.96  0.88  0.92       0.93
random forests w/ dominant features only      0.98  0.95  0.96  0.82  0.88       0.90

Dominant features (simple features!):

• time difference
• ordered distance
• same class

Accuracy by number of changes (CH):

• CH × 200: 95% accuracy
• CH × 800: 97% accuracy

Only 2 days of work!


RQ: Is it effective with new data from real users?

Good with initial data

Setup: plugin + EpiceaUntangler, then manual sorting; 2 weeks × 6 users

Success rate = (# successfully clustered changes) / (# changes)

➡ Median success rate: 91%

➡ Qualitative feedback:

• “It works good in many cases, especially for not so big change sets”

• “It was a bit painful to check everything”
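The success rate and its median across participants can be computed as in the following sketch. The per-session counts are invented for illustration and are not the study's data; `success_rate` is a hypothetical helper name.

```python
from statistics import median


def success_rate(successfully_clustered, total_changes):
    """Fraction of a participant's changes that were clustered correctly."""
    return successfully_clustered / total_changes


# Hypothetical (successfully clustered, total) counts for six sessions.
sessions = [(18, 20), (40, 50), (10, 10), (27, 30), (19, 20), (9, 10)]
rates = [success_rate(ok, total) for ok, total in sessions]

print(f"median success rate: {median(rates):.0%}")
```

The median is less sensitive than the mean to a single session where the tool clusters poorly, which matters with only a handful of participants.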

Conclusion

Supporting Software Integration Activities with Fine-grained Code Changes

Martín Dias, RMoD Inria-Lille — University of Lille 1, Cristal
