
Revision Control System Using Delta Script of Syntax Tree

Date post: 30-Dec-2015
Page 1: Revision Control System Using Delta Script of Syntax Tree

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Revision Control System Using Delta Script of Syntax Tree

Yasuhiro Hayase

Makoto Matsushita

Katsuro Inoue

Graduate School of Information Science and Technology,

Osaka University, Japan

Page 2

Contents
Revision Control System
Problems in Merging Source Code
Research Goal
Merging the Trees
  Step 1. Converting the Source Code into a Tree
  Step 2. Computing the Delta of the Trees
  Step 3. Merging
Implementation of the System
Experiments
Conclusion and Future Work

Page 3

Open Source Software Development

Open-source development is attracting increasing attention.
Developers use the following tools:
Revision Control System: stores the history of the source code and documents throughout the development process (e.g., CVS, Subversion …)
Mailing List: developers and users hold their discussions on mailing lists
Bug-Tracking System

Page 4

Merging on Parallel Development

[Figure: Developer A and Developer B both check out version X from the repository. Developer A edits it into X1 and checks X1 in. Developer B edits X into X2. The modification of Developer A would be lost if X2 were checked in, so Developer B checks out the newest version (= X1), the revision control system merges X1 and X2 into X3, and X3 is checked in.]

Page 5

Problems
The existing revision control systems used in open-source development merge files line by line. Line-by-line merging sometimes produces inaccurate output when applied to source code:
1. Detecting false conflicts when the same line is changed by both developers.
2. Overlooking real conflicts when the changes occur in different lines.
If the system fails to merge the two files, the developers have to fix the result by hand.

Page 6

Problem 1. False Conflict
Developers A and B are editing working copies of the same file concurrently. If both developers change the same line, the revision control system detects a conflict. However, changes to the same line do not always conflict; they can be compatible.

Original:         int refs;
Developer A:      int refs; /* reference count */
Developer B:      int refs=0;
Expected result:  int refs=0; /* reference count */
Line-by-line merging fails (a conflict is reported).

Page 7

Problem 2. Overlooking Conflicts
Developers A and B are editing working copies of the same file concurrently. If the developers do not change the same line, the revision control system does not detect a conflict. However, changes to different lines may still conflict.

Original:             int num, sum, avg;
Developer A:          int num, sum;
Developer B:          int num, sum, avg;  …  avg = sum/num;
Line-by-line merge:   int num, sum;  …  avg = sum/num;
The merge produces illegal output: avg is used but no longer declared.

Page 8

Our Research Goal
Build an intelligent merging system that reduces the load on developers:
Avoiding false conflicts on merging: finer-grained merging.
Reducing problems caused by merging: checking that each use of a variable corresponds to its declaration.
Allowing developers to keep their working habits: developers can use any editor to edit source code, and the usability of the new system should be similar to that of existing systems.


Page 10

Merging Source Code by Recognizing Its Tree Structure
Differences are computed and merged on tree structures:
Step 1. Analyze the source code and convert it into trees.
Step 2. Compute the delta of the trees.
Step 3. Apply the delta to the target tree.

[Figure: the delta is computed from the tree of the origin source code to the tree of the destination source code, and then applied to the tree of the target source code.]

Page 11

Step 1. Source Code Conversion
The source code is parsed and an augmented parse tree is built:
The tree includes white-space and comment nodes.
Each node has a string value.
A unique ID is assigned to each node. The current tree is compared with the previous version of the tree stored in the repository: if a corresponding node exists, the same ID is assigned; otherwise, a new unique ID is assigned.
Each node corresponding to the use of a variable is linked to the node corresponding to the declaration of that variable.

Augmented parse tree of the block { int i; i;} (node ID, then string value):
1 Block
  2 {
  3 <WS>
  4 Declare
    5 int
    6 <WS>
    7 i
    8 ;
  9 <WS>
  10 Statement
    11 i
    12 ;
  13 <WS>
  14 }
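The conversion described on this slide can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the Node class, assign_ids and the decl field are hypothetical names, each <WS> leaf is assumed to hold a single space, and the correspondence with the stored tree (the matched argument) is assumed to be computed elsewhere. Building the tree of the block { int i; i;} and numbering new nodes in pre-order reproduces IDs 1 to 14.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass(eq=False)  # eq=False keeps identity hashing, so nodes can be dict keys
class Node:
    label: str                                   # node kind or token, e.g. "Declare", "int", "<WS>"
    text: str = ""                               # source text of a leaf (white space and comments too)
    children: List["Node"] = field(default_factory=list)
    id: Optional[int] = None                     # unique ID, kept stable across revisions
    decl: Optional["Node"] = None                # hypothetical link from a variable use to its declaration


def preorder(node: Node) -> Iterator[Node]:
    yield node
    for child in node.children:
        yield from preorder(child)


def to_source(tree: Node) -> str:
    """Concatenate leaf texts; keeping <WS> and comment leaves makes this lossless."""
    return "".join(n.text for n in preorder(tree) if not n.children)


def assign_ids(tree: Node, matched: Dict[Node, int], next_id: int) -> int:
    """Reuse the ID of the corresponding node in the stored tree, else allocate a new one.

    `matched` maps nodes of `tree` to IDs of the previous revision; how that
    correspondence is found is outside this sketch. Returns the next unused ID.
    """
    for n in preorder(tree):
        if n in matched:
            n.id = matched[n]
        else:
            n.id = next_id
            next_id += 1
    return next_id


def ws() -> Node:
    return Node("<WS>", " ")  # assumption: each white-space leaf is a single space


decl_i, use_i = Node("i", "i"), Node("i", "i")
block = Node("Block", children=[
    Node("{", "{"), ws(),
    Node("Declare", children=[Node("int", "int"), ws(), decl_i, Node(";", ";")]),
    ws(),
    Node("Statement", children=[use_i, Node(";", ";")]),
    ws(), Node("}", "}"),
])
use_i.decl = decl_i  # the use of i points at its declaration

next_id = assign_ids(block, matched={}, next_id=1)  # first revision: every node is new
print(to_source(block))
```

Because white-space and comment leaves are part of the tree, to_source gives back the exact text ("{ int i; i; }" with single-space <WS> leaves), which is what lets developers keep editing plain source files. On a later revision, passing the matched nodes keeps their old IDs and only genuinely new nodes receive fresh ones.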

Page 12

Step 2. Delta Computation
The delta of two trees is computed.
Editing operations:
Insertion of a node: insert(NewID, String, ParentID, Index)
Deletion of a leaf node: delete(ID)
Update of a node's string: update(ID, NewString)
Move of a sub-tree: move(ID, ParentID, Index)
Editing script: a sequence of editing operations that represents all the operations needed to transform a tree A into a tree B.

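[Figure: a tree A with the nodes 1 Block, 2 Declare, 3 int, 4 i, 5 ;, 6 Statement, 7 i, 8 <WS> and 9 ; is transformed into tree B by the editing script insert(10, Declare, 1, 0), delete(8), update(3, long), move(2, 1, 0); in tree B, node 3 reads "long" and a new node 10 Declare appears.]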
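The four editing operations can be illustrated with a small Python sketch. The Tree class and its method names are hypothetical, and operations are written as tuples mirroring the slide's notation. The example replays the script delete(4), delete(3), insert(5, if, 0, 1), insert(6, then, 5, 0), move(1, 6, 0) from the delta computation example on the tree Block(doA(x), doB(y)), assuming, as that figure suggests, that x and y are children of doA and doB. Because delete accepts only leaves, node 4 has to be deleted before node 3.

```python
from typing import Dict, List, Optional, Sequence, Tuple


class Tree:
    """Ordered tree whose nodes are addressed by unique integer IDs (illustrative sketch)."""

    def __init__(self, root_id: int, root_string: str) -> None:
        self.root = root_id
        self.label: Dict[int, str] = {root_id: root_string}
        self.parent: Dict[int, Optional[int]] = {root_id: None}
        self.children: Dict[int, List[int]] = {root_id: []}

    def insert(self, new_id: int, string: str, parent_id: int, index: int) -> None:
        """insert(NewID, String, ParentID, Index): add a new leaf at position index."""
        self.label[new_id] = string
        self.parent[new_id] = parent_id
        self.children[new_id] = []
        self.children[parent_id].insert(index, new_id)

    def delete(self, node_id: int) -> None:
        """delete(ID): only leaf nodes may be deleted."""
        if self.children[node_id]:
            raise ValueError(f"delete({node_id}): node is not a leaf")
        self.children[self.parent[node_id]].remove(node_id)
        del self.label[node_id], self.parent[node_id], self.children[node_id]

    def update(self, node_id: int, new_string: str) -> None:
        """update(ID, NewString): change the node's string."""
        self.label[node_id] = new_string

    def move(self, node_id: int, parent_id: int, index: int) -> None:
        """move(ID, ParentID, Index): re-attach the whole sub-tree rooted at node_id."""
        self.children[self.parent[node_id]].remove(node_id)
        self.children[parent_id].insert(index, node_id)
        self.parent[node_id] = parent_id

    def apply(self, script: Sequence[Tuple]) -> None:
        """Apply an editing script, i.e. a sequence of (operation, *arguments) tuples."""
        for op, *args in script:
            getattr(self, op)(*args)

    def render(self, node_id: Optional[int] = None) -> str:
        """Compact dump such as Block(doA(x), doB(y))."""
        node_id = self.root if node_id is None else node_id
        kids = self.children[node_id]
        if not kids:
            return self.label[node_id]
        return f"{self.label[node_id]}({', '.join(self.render(k) for k in kids)})"


# Tree A: Block(doA(x), doB(y))
tree = Tree(0, "Block")
tree.apply([("insert", 1, "doA", 0, 0), ("insert", 2, "x", 1, 0),
            ("insert", 3, "doB", 0, 1), ("insert", 4, "y", 3, 0)])

# Editing script from the delta computation example
tree.apply([("delete", 4), ("delete", 3), ("insert", 5, "if", 0, 1),
            ("insert", 6, "then", 5, 0), ("move", 1, 6, 0)])
print(tree.render())  # Block(if(then(doA(x))))
```

Since move carries the whole sub-tree, doA keeps its child x without any extra operations, which is one reason a move is cheaper than a delete followed by re-inserts.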

Page 13

Editing Script
The differences between tree A and tree B are expressed by an editing script.
When determining the editing script, care must be taken not to include unnecessary operations:
Assign a cost to each editing operation.
Define the cost of an editing script as the sum of the costs of its editing operations.
Minimize the cost of the editing script.
An extended version of FMES*, an existing approximation algorithm, is used to compute the delta between the trees.

* S. S. Chawathe, A. Rajaraman, H. Garcia-Molina, and J. Widom. Change detection in hierarchically structured information. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 493–504, 1996.

Page 14

Delta Computation Algorithm
Cost of the editing operations:
insert = delete = move = 1
update = from 0 to 2, depending on the strings before and after the update operation:

$$\mathrm{cost}(\mathrm{update}) = 2\left(1 - \frac{2\,\mathrm{length}(\mathrm{LCS}(\mathit{before}, \mathit{after}))}{\mathrm{length}(\mathit{before}) + \mathrm{length}(\mathit{after})}\right)$$
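The cost model can be checked numerically with a short Python sketch; the function names are hypothetical and the LCS is computed with the standard dynamic-programming recurrence. Pricing the Step 2 example script, where node 3 changes from int to long, gives three unit-cost operations plus an update cost of 2(1 − 2·1/7) = 10/7, because the LCS of "int" and "long" is "n", of length 1.

```python
from typing import Dict, Sequence, Tuple


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def update_cost(before: str, after: str) -> float:
    """2 * (1 - 2 * length(LCS) / (length(before) + length(after))), ranging from 0 to 2."""
    total = len(before) + len(after)
    if total == 0:
        return 0.0  # both strings empty: nothing changes
    return 2 * (1 - 2 * lcs_length(before, after) / total)


UNIT_COST = {"insert": 1.0, "delete": 1.0, "move": 1.0}


def script_cost(script: Sequence[Tuple], strings: Dict[int, str]) -> float:
    """Sum of operation costs; `strings` holds each node's string before the script runs."""
    current = dict(strings)
    cost = 0.0
    for op, *args in script:
        if op == "update":
            node_id, new_string = args
            cost += update_cost(current[node_id], new_string)
            current[node_id] = new_string
        else:
            cost += UNIT_COST[op]
    return cost


step2_script = [("insert", 10, "Declare", 1, 0), ("delete", 8),
                ("update", 3, "long"), ("move", 2, 1, 0)]
print(update_cost("int", "long"))             # 10/7, about 1.43
print(script_cost(step2_script, {3: "int"}))  # 3 + 10/7, about 4.43
```

An unchanged string costs 0 and a completely different one costs 2, the same as deleting and re-inserting a node, so a minimum-cost script prefers an update only when the old and new strings share enough characters.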

Algorithm
Determine the pairs of matching nodes:
Leaf nodes: by string similarity.
Inner nodes other than identifier nodes: by the match ratio of their leaf nodes.
Identifier nodes: by an exactly identical string or by matching of their descendant nodes.
Build the editing script.
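The matching rules can be sketched as follows. The slide gives no thresholds or exact formulas, so the 0.6 leaf threshold, the greedy pairing and the inner-node ratio common / max(|leaves of A|, |leaves of B|) (the form used in FMES-style matching) are assumptions for illustration. Identifier leaves are matched only on an identical string; the alternative rule based on descendant nodes is not modeled here, since these identifiers have no descendants.

```python
from typing import Dict, Optional, Set


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]: 2 * length(LCS) / (length(a) + length(b))."""
    total = len(a) + len(b)
    return 1.0 if total == 0 else 2 * lcs_length(a, b) / total


def match_leaves(leaves_a: Dict[int, str], leaves_b: Dict[int, str],
                 identifiers: Set[int], threshold: float = 0.6) -> Dict[int, int]:
    """Greedily pair each leaf of A with the most similar unused leaf of B.

    Identifier leaves only match an identical string; other leaves need a
    similarity of at least `threshold` (an assumed value).
    """
    matching: Dict[int, int] = {}
    used: Set[int] = set()
    for ia, sa in leaves_a.items():
        best: Optional[int] = None
        best_sim = -1.0
        for ib, sb in leaves_b.items():
            if ib in used:
                continue
            if ia in identifiers or ib in identifiers:
                sim = 1.0 if sa == sb else 0.0
            else:
                sim = similarity(sa, sb)
            if sim >= threshold and sim > best_sim:
                best, best_sim = ib, sim
        if best is not None:
            matching[ia] = best
            used.add(best)
    return matching


def inner_match_ratio(leaves_under_a: Set[int], leaves_under_b: Set[int],
                      matching: Dict[int, int]) -> float:
    """Fraction of the leaves under two inner nodes that are matched to each other."""
    common = sum(1 for ia in leaves_under_a if matching.get(ia) in leaves_under_b)
    return common / max(len(leaves_under_a), len(leaves_under_b), 1)


# Declaration "int count;" in tree A versus "int counter;" in tree B
leaves_a = {3: "int", 4: "count", 5: ";"}
leaves_b = {13: "int", 14: "counter", 15: ";"}
matching = match_leaves(leaves_a, leaves_b, identifiers={4, 14})
ratio = inner_match_ratio({3, 4, 5}, {13, 14, 15}, matching)
print(matching, ratio)  # matching {3: 13, 5: 15}, ratio 2/3
```

Although "count" and "counter" are similar strings, they are identifiers and so are not paired, while the two Declare nodes still share two of their three leaves; whether that is enough to match the inner nodes depends on the (unstated) ratio threshold.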

Page 15

Example: Delta Computation

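Tree A: 0 Block with the children 1 doA (with leaf 2 x) and 3 doB (with leaf 4 y)
Tree B (IDs not yet assigned): Block containing if, which contains then, which contains doA with leaf x
Editing script: delete(4), delete(3), insert(5, if, 0, 1), insert(6, then, 5, 0), move(1, 6, 0)
Result: 0 Block containing 5 if, which contains 6 then, which contains 1 doA with leaf 2 x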

Page 16

Step 3. Merging
The editing script that converts tree A into tree B is applied to tree C.
Problem: for some operations in the editing script, there may be no corresponding node in tree C. If no node with a matching ID is present in tree C, a similar node is searched for. Similarity is based on:
Matching of the parent node or sibling nodes
Similar strings
If a suitable node is found, the original ID in the editing script is replaced with the ID of the node found.
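The search for a substitute node can be sketched as below. The scoring (one point for the same parent, one for a shared sibling, plus the string similarity) and the 0.5 threshold are assumptions; the slide only names the criteria. Trees are hypothetical id → (string, parent id) maps shaped like the merging example on the next slide, with doB under else in tree A and doC under else in tree C. Nodes of tree C that also exist in tree A under the same ID are not considered as substitutes, and only the target ID of an operation is rewritten.

```python
from typing import Dict, Optional, Set, Tuple

# id -> (string, parent id); child order is ignored in this sketch
Tree = Dict[int, Tuple[str, Optional[int]]]


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    total = len(a) + len(b)
    return 1.0 if total == 0 else 2 * lcs_length(a, b) / total


def siblings(tree: Tree, node_id: int) -> Set[int]:
    parent = tree[node_id][1]
    return {n for n, (_, p) in tree.items() if p == parent and n != node_id}


def find_substitute(node_id: int, tree_a: Tree, tree_c: Tree,
                    threshold: float = 0.5) -> Optional[int]:
    """Return node_id if it exists in tree C, else the most similar new node, else None."""
    if node_id in tree_c:
        return node_id
    string, parent = tree_a[node_id]
    sibs_a = siblings(tree_a, node_id)
    best: Optional[int] = None
    best_score = threshold
    for cid, (cstring, cparent) in tree_c.items():
        if cid in tree_a:
            continue  # already corresponds to a node of tree A
        score = similarity(string, cstring)
        if parent is not None and cparent == parent:
            score += 1.0  # same parent
        if sibs_a & siblings(tree_c, cid):
            score += 1.0  # shares a sibling
        if score > best_score:
            best, best_score = cid, score
    return best


def rewrite_op(op: Tuple, tree_a: Tree, tree_c: Tree) -> Optional[Tuple]:
    """Replace the operation's target ID by its substitute; None if nothing fits."""
    name, target, *rest = op
    sub = find_substitute(target, tree_a, tree_c)
    return None if sub is None else (name, sub, *rest)


tree_a: Tree = {0: ("if", None), 1: ("then", 0), 2: ("doA", 1), 3: ("x", 2),
                4: ("else", 0), 5: ("doB", 4), 6: ("y", 5)}
tree_c: Tree = {0: ("if", None), 1: ("then", 0), 2: ("doA", 1), 3: ("x", 2),
                4: ("else", 0), 8: ("doC", 4), 9: ("z", 8)}

for op in [("update", 6, "i"), ("move", 5, 1, 0), ("delete", 4)]:
    print(op, "->", rewrite_op(op, tree_a, tree_c))
```

Here move(5, 1, 0) becomes move(8, 1, 0) because doC has the same parent and a similar string, while update(6, i) finds no substitute. The system described in the slides goes further than this sketch: when only a weakly similar node is found, it builds one candidate tree with the rewritten operation and one without it, and the developer chooses.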

Page 17

Example of Merging

[Figure: Tree A is 0 if with the branches 1 then (containing 2 doA with leaf 3 x) and 4 else (containing 5 doB with leaf 6 y). Tree B is obtained from A by the editing script update(6, i), move(5, 1, 0), delete(4). In tree C, the else branch contains 8 doC with leaf 9 z instead of doB. Applying the script to tree C:
update(6, i): no node can be substituted for node 6, so the operation is not applied.
move(5, 1, 0): node 5 is missing, but node 8 is somewhat similar, so two trees are built: C1, with the operation applied to node 8 (move(8, 1, 0)), and C2, without it.
delete(4): in tree C2, node 4 still has a child node, so two trees are built: one to which the operation is not applied, and one in which the sub-tree rooted at node 4 is deleted.
This yields the candidate trees D1, D2 and D3; the developer selects one of them.]


Page 19

System Implementation
The implementation of our system is based on the existing revision control system Subversion.
Client-server system: the delta computation and the merge operations are performed on the client side.
The target programming language is Java.
The repository stores the augmented parse trees instead of the raw source files.
The trees are stored in XML format.
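The slides do not show the XML schema, so the element name node and the label and id attributes below are hypothetical. This sketch uses Python's standard xml.etree.ElementTree to serialize a parse tree with and without node IDs (the two forms that appear in the data-flow figures) and to recover the source text from the leaf elements.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional, Tuple


class Node(NamedTuple):
    label: str
    node_id: Optional[int] = None
    text: str = ""                      # source text, used for leaves only
    children: Tuple["Node", ...] = ()


def to_element(node: Node, with_ids: bool = True) -> ET.Element:
    """Convert a parse tree to XML using a hypothetical <node label="..." id="..."> schema."""
    elem = ET.Element("node", {"label": node.label})
    if with_ids and node.node_id is not None:
        elem.set("id", str(node.node_id))
    if node.children:
        for child in node.children:
            elem.append(to_element(child, with_ids))
    else:
        elem.text = node.text
    return elem


def source_from_xml(xml: str) -> str:
    """Recover the source text by concatenating leaf elements in document order."""
    root = ET.fromstring(xml)
    return "".join(e.text or "" for e in root.iter() if len(e) == 0)


declare = Node("Declare", 4, children=(
    Node("int", 5, "int"), Node("<WS>", 6, " "), Node("i", 7, "i"), Node(";", 8, ";")))

with_ids = ET.tostring(to_element(declare), encoding="unicode")
without_ids = ET.tostring(to_element(declare, with_ids=False), encoding="unicode")
print(with_ids)
print(source_from_xml(without_ids))  # int i;
```

Keeping white-space leaves as element text means the source file can be regenerated exactly on check-out, while the id attributes carry node identity between revisions.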

Page 20

System Overview

[Figure: The Subversion server holds the repository. The developer works through the Subversion client, which is extended with components for delta computation, delta application, mutual conversion between XML and source code, node matching, and XML merging.]

Page 21

Check-in and Check-out of Source Code
[Figure: Data flow on check-out and on check-in between the Subversion server (repository), the Subversion client, and the developer. On check-out, the original XML file is converted into source code for the developer to edit. On check-in, the edited source code is converted into an XML file without node IDs; node matching against the original XML file turns it into an XML file with node IDs, which is checked in. The delta computation and delta application components are also shown.]

Page 22

Merging

[Figure: Data flow on merging. The edited source code is converted into an XML file without node IDs, and node matching against the original XML file produces an XML file with node IDs. The delta between the original XML file and this file is computed and applied to the newest version of the XML file from the repository. The resulting merged XML files are sorted, converted back into source code, and offered to the developer as sorted merge candidates.]


Page 24

Experiment 1
Checking the proper functionality of the system with a trivial test case:
A small source file was written (Original).
Three variants were derived from Original:
Variant 1: the variable avg was deleted.
Variant 2: a method accessing the variable avg was added.
Variant 3: the variable avg was renamed to average.
The deltas between Original and each of the three variants were computed (Delta 1 … 3).
Each delta was applied to each of the other variants.

Original:   class C { double num, sum, avg; …}
Variant 1:  class C { double num, sum; …}
Variant 2:  class C { double num, sum, avg; … m() { … avg … } …}
Variant 3:  class C { double num, sum, average; …}

Page 25

Result of Experiment 1

Line-by-line merging
            Delta 1            Delta 2            Delta 3
Variant 1   -                  Illegal output     Conflict detected
Variant 2   Illegal output     -                  Illegal output
Variant 3   Conflict detected  Illegal output     -

Our algorithm
            Delta 1            Delta 2                                     Delta 3
Variant 1   -                  Failed (too many candidates are generated)  Success
Variant 2   Conflict detected  -                                           Success
Variant 3   Success            Success                                     -

Our algorithm gave a correct result in 5 of the 6 cases. In the one remaining case, it failed to find a valid substitute node and generated too many candidates.

Page 26

Experiment 2
Evaluating the effectiveness of the algorithm in actual software development
Two open-source projects were selected as test cases:
Jakarta Project (22,606 files, 162,683 revisions)
Eclipse Project (19,420 files, 103,358 revisions)
84 pairs of check-ins in which a merge occurred were identified.
Line-by-line merging and our algorithm were compared on these cases.

Page 27

Result of Experiment 2
Line-by-line merging  Count  Our algorithm  Count
Success               71     Success        71
Failure               13     Success        9
                             Failure        4

Our algorithm succeeded in all the cases in which line-by-line merging succeeded.
It also succeeded in 9 of the 13 cases in which line-by-line merging failed.

Page 28

Result of Experiment 2: Details of the cases in which line-by-line merging failed
Cause of failure of line-by-line merging               Count  Our algorithm  Count  Comment
Addition or deletion of white space on the same line   4      Success        4
Semantic change and reform                             1      Success        1
EOL code change                                        1      Success        1
Overlapped semantic change                             2      Success        2      Many candidates are generated in one case.
Overwriting prior change                               2      Success        1
                                                              Failure        1      Too many candidates are generated.
Semantic conflict                                      2      Failure        2
Broken source code                                     1      Failure        1      Cannot be parsed into a tree.

3 of the 4 cases in which our algorithm failed are real conflicts. In the remaining case, our algorithm failed to find substitute nodes and positions and generated too many candidates. In one of the cases in which our algorithm succeeded, many candidates were also generated.

Page 29

Conclusion and Future Work
Summary of this presentation:
Problems with existing revision control systems used in open-source development
Syntactic merging of source code as a solution
Implementation of the system
Two evaluations
Future work:
Improving the precision of the search algorithm
Improving the user interface for selecting a merge result, e.g., highlighting the differences between the candidates
Making inter-file links

