WoPDaSD ~.WoPDaSD ~.11
Sulayman K. Sowe, I. Samoladas, I. Stamelos, L. Angelis
Dept. of Informatics, Aristotle University, Greece.
3rd International Workshop on Public Data about Software Development (WoPDaSD)
10th September 2008, Milan, Italy.
Are FLOSS Developers Committing to CVS/SVN as much as they are Talking in Mailing Lists?
Challenges for Integrating Data from Multiple Repositories
This research is partially sponsored by the FLOSSMetrics project (Ref. No. FP6-IST5-033547), http://flossmetrics.org/ and the SQO-OSS project (Ref. No. FP6-IST-5-033331), http://www.sqo-oss.eu/
In this presentation...
➲ Nomadic life of FLOSS developers
  Motivation for this research: research hypothesis
➲ Methodology in brief
  Data & source
  Identification of developers from SVN & lists
➲ Results & Discussion
➲ Summary & conclusion
  Ongoing research
Nomadic life of FLOSS developers
➲ Like the Fulani nomads of the West African plains, FLOSS developers are not bound to a single territory and are free to:
  participate in other projects or communities,
  use and reuse software or bits of code from other projects,
  suggest, argue for or against requirements, specs., etc. in projects where they have the least commit rights,
  use different identities (usernames, email addresses), etc.
Motivation for this research
➲ Why research FLOSS developers or nomads?
  Understand the collaborative nature of developing FLOSS in terms of developer participation (code commits and email postings) in multiple repositories: SVN and mailing lists.
➲ Research hypothesis:
  If mailing lists are the main communication veins in most projects, then CVS/SVN is a collection of arteries. Thus, FLOSS developers both code and participate in list discussions:
  H0: "FLOSS developers contribute equally to the code repository and mailing lists"; alternative
  H1: "FLOSS developers contribute more to the code repository than to mailing lists".
Methodology…Data & Source
➲ Retrieve data for 14 projects from the FLOSSMetrics retrieval system
  Mailing list data dumps (.sql file format)
  SVN data dumps (.sql file format)
Initial (Raw) Data
➲ How many SVN committers and mailing list posters are there in each project?
[Chart legend: SVN Commits, ML Posts]
Methodology…Identification of developers
➲ The main problem in studying developers' activities in multiple repositories is identification:
➲ Is committer A in the SVN of project X the same person as poster A in the mailing lists of project X?
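One way to approach the identification question is to reduce SVN usernames and mailing-list addresses to a comparable key and intersect the two sets. The slides do not say which heuristics were actually used, so the following is only a minimal sketch: the `normalise` rule, the function names, and the sample identities are all assumptions made for illustration.

```python
import re

def normalise(identity: str) -> str:
    """Reduce a username or e-mail address to a comparable key (hypothetical heuristic)."""
    local = identity.strip().lower().split("@", 1)[0]  # drop any e-mail domain
    return re.sub(r"[^a-z0-9]", "", local)             # drop dots, dashes, underscores

def co_occurring(committers, posters):
    """Return the normalised keys present in both the SVN and mailing-list data."""
    svn_keys = {normalise(c) for c in committers}
    ml_keys = {normalise(p) for p in posters}
    return svn_keys & ml_keys

# Illustrative, made-up identities (not taken from the study's data).
svn_committers = ["jdoe", "a.smith", "mkhan"]
ml_posters = ["J.Doe@example.org", "asmith@example.com", "someone@example.net"]

matches = co_occurring(svn_committers, ml_posters)
share = 100 * len(matches) / len(svn_committers)
print(sorted(matches), f"{share:.2f}% of committers also post")
```

A rule this simple will both miss people who use unrelated identities and merge distinct people with similar names, so in practice its matches would need manual checking.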
Results & Discussion…1
➲ The query result for each project gave us the developers who co-occur in both SVN and the mailing list
➲ N=486 for all 14 projects. Percentage of developers in both repositories:
  In 8 projects = 57.14%
  In 4 projects = 90.11%
  In 2 projects = 80.21%
➲ What is going on in ibatis and turbine?
Results & Discussion...2
➲ Distribution of commits & posts
  Domination of commits over posts
  Mean commits per developer > mean posts per developer
  Developers are committing more to SVN than they are posting to mailing lists, EXCEPT in ibatis and turbine.
Results & Discussion...3
➲ Relationship between commits and posts
➲ Overall correlation between commits and posts shows statistical significance (marked with * for p < 0.05).
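The slides do not name the correlation coefficient that was used. As an illustration only, the sketch below assumes a Spearman rank correlation between per-developer commit and post counts, computed as the Pearson correlation of the ranks. The counts are made up, and the significance test behind the p-value is left out.

```python
from math import sqrt

def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation applied to the ranks."""
    return pearson(ranks(x), ranks(y))

# Made-up per-developer counts for one hypothetical project.
commits = [120, 15, 60, 8, 200]
posts = [40, 30, 20, 5, 90]
rho = spearman(commits, posts)
print(f"Spearman rho = {rho:.2f}")
```

A rank-based coefficient is a common choice for activity counts because a handful of very prolific developers would otherwise dominate the result.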
Results & Discussion...4
➲ Developers' contribution in terms of commits and posts
  Wilcoxon signed-rank test applied on mean values shows an almost 50-50 split between projects where commits = posts (green) and commits > posts (yellow), with only the turbine project showing otherwise.
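For readers unfamiliar with the Wilcoxon signed-rank test, the sketch below shows how it compares paired commit and post counts. It is not the authors' analysis script. It assumes one (commits, posts) pair per developer, uses the normal approximation without tie or continuity correction, and runs on made-up numbers.

```python
from math import sqrt, erfc

def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test using the normal approximation.

    Zero differences are dropped; tied absolute differences get average ranks.
    No tie or continuity correction is applied, so the p-value is only rough,
    especially for small samples.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    rank = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            rank[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, d in zip(rank, diffs) if d > 0)   # ranks where commits > posts
    w_minus = sum(r for r, d in zip(rank, diffs) if d < 0)  # ranks where posts > commits
    mean_w = n * (n + 1) / 4
    sd_w = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean_w) / sd_w
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return w_plus, w_minus, p_value

# Made-up paired counts for eight hypothetical developers.
commits = [120, 15, 60, 8, 200, 33, 75, 12]
posts = [40, 30, 20, 5, 90, 30, 70, 25]
w_plus, w_minus, p_value = wilcoxon_signed_rank(commits, posts)
print(w_plus, w_minus, round(p_value, 3))
```

A p-value above 0.05 corresponds to the "commits = posts" outcome on the slide. A small p-value with the positive rank sum dominating corresponds to "commits > posts".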
Summary & conclusion
➲ FLOSS developers are coding as much as they are talking. They contribute equally to code repositories and mailing lists; H0 supported.
➲ However, in almost all the projects, developers made more commits than posts; H1 supported.
➲ Why are turbine and ibatis outliers?
  Maybe a highly prolific developer is making more posts than commits, in a ratio of 4:1.
  Something peculiar about the composition of Apache-related projects.
➲ Ongoing aspects of this research
  Automate the data collection and identification process.
  Analyze a total of 60 or more projects from the FM retrieval system.
  Add a quality dimension to the committers variable:
    Categorize commits: modifications, deletions, additions, code related, documentation (reports, readme, etc.)
    Time scale/sliding frames: the evolution of commits and posts over a given period.
Thank you for your attention
Questions? Comments?
Suggestions for improvement