Evidence for the Pareto principle in OSS Activity
Mathieu Goeminne & Tom Mens
Service de Génie Logiciel, Institut d’Informatique
Faculté des Sciences, Université de Mons
Research topic
• Study of open source software evolution.
• Taking into account the community (social network) of persons surrounding the software project (developers, users).
• Looking for recurrent behaviour in this community.
Tuesday 2011-03-01, SQM workshop, Oldenburg, Germany
Goals
• Long-term goal
  • Understand how the software community and the software product/project co-evolve
  • Provide guidelines and tools to support this
• Short-term goal
  • Study how development activity is distributed over the different stakeholders.
  • Find evidence for the Pareto principle in evolving OSS.
Questions
• Is there a core group (of developers and/or users) that is significantly more active than the others?
• How does the activity distribution evolve over time?
• Is there an overlap between the different activities?
• How does the activity distribution vary across different projects?
Methodology
• Exploiting available data from source code repositories, mailing lists and bug trackers.
• Use of econometric indices that measure the (in)equality of a distribution.
• Empirical study of 3 OSS projects: Brasero, Evince and Wine.
Econometrics
• Gini, Hoover and Theil (normalised) indices: each expresses the inequality of a distribution.
• Values lie between 0 and 1
• 0 reflects perfect equality
• 1 reflects perfect inequality
• The three indices behave similarly.
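As a rough illustration of the three indices, the sketch below computes them for a list of per-person activity counts. The formulas are the standard textbook ones. Dividing the Theil index by ln(n) is only one common normalisation and is an assumption here, since the slides do not say which normalisation was used. The function names and the sample counts are hypothetical.

```python
import math

def gini(values):
    """Gini index from the rank-weighted sum of the sorted values."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

def hoover(values):
    """Hoover index: share of activity to redistribute to reach equality."""
    n, total = len(values), sum(values)
    if n == 0 or total == 0:
        return 0.0
    return 0.5 * sum(abs(x / total - 1 / n) for x in values)

def theil_normalised(values):
    """Theil T index divided by ln(n) (assumed normalisation)."""
    n, total = len(values), sum(values)
    if n < 2 or total == 0:
        return 0.0
    mean = total / n
    # Terms with x == 0 contribute 0 (limit of x * ln x as x -> 0).
    t = sum((x / mean) * math.log(x / mean) for x in values if x > 0) / n
    return t / math.log(n)

# Hypothetical commit counts per person, not data from the studied projects.
commits = [120, 45, 12, 5, 3, 2, 1, 1]
for index in (gini, hoover, theil_normalised):
    print(index.__name__, round(index(commits), 3))
```

With a finite number of persons n, the Gini and Hoover values computed this way reach at most (n - 1)/n when one person does everything, so they approach 1 only for large communities.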
The 3 notions of activity we used
• Number of commits made
• Number of mails sent
• Number of bug status changes
All of them are related to typical developer activities
Activity distributions (Gini index)
[Figure: Gini index (vertical axis, 0 to 1) over time (horizontal axis) for commits, mails and bug report changes; one panel each for Brasero, Evince and Wine]
Pareto principle
• Most of the activity is carried out by a small group of persons.
• Typically: 20% of the persons do 80% of the work.
• This does not necessarily imply that the activity distribution follows a Pareto law.
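One way to check the 80/20 rule of thumb on a project is to rank persons by activity and measure the share produced by the most active 20%. The sketch below does this and also builds a cumulative curve (fraction of persons against fraction of activity). The function names and the sample counts are illustrative assumptions, not data from the studied projects.

```python
import math

def top_share(counts, fraction=0.2):
    """Share of total activity produced by the most active `fraction` of persons."""
    ranked = sorted(counts, reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0.0
    k = max(1, math.ceil(fraction * len(ranked)))
    return sum(ranked[:k]) / total

def cumulative_curve(counts):
    """Points (fraction of persons, fraction of activity), most active first."""
    ranked = sorted(counts, reverse=True)
    total = sum(ranked)
    points = [(0.0, 0.0)]
    if total == 0:
        return points
    running = 0
    for i, count in enumerate(ranked, start=1):
        running += count
        points.append((i / len(ranked), running / total))
    return points

# Hypothetical activity counts for 10 persons.
activity = [500, 300, 50, 40, 30, 25, 20, 15, 10, 10]
print(top_share(activity))  # 0.8: the top 2 of 10 persons did 800 of 1000
```

A share close to 0.8 at `fraction=0.2` matches the rule of thumb, but says nothing about whether the underlying counts follow a Pareto law.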
Pareto principle (cont.)
[Figure: one panel each for Brasero, Evince and Wine; both axes run from 0 to 1; one curve each for commits, mails and bug report (br) changes]
Core groups
• Display Venn diagrams of the most active (top 20) persons according to each definition of activity.
• For each person, show the percentage of activity attributable to this person.
• Use heuristics to detect and merge multiple identities that represent the same real person.
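The steps above can be sketched as follows: merge identities, keep the most active persons per activity with their percentage of the total, then split the three groups into Venn regions. The slides do not describe the merging heuristics, so the `alias_of` lookup table here is a hypothetical stand-in for them. All function names and sample data are assumptions.

```python
from collections import Counter

def merge_identities(counts, alias_of):
    """Sum activity per real person, mapping each identity through `alias_of`."""
    merged = Counter()
    for identity, count in counts.items():
        merged[alias_of.get(identity, identity)] += count
    return merged

def top_persons(activity, k=20):
    """Return {person: percentage of total activity} for the k most active persons."""
    total = sum(activity.values())
    if total == 0:
        return {}
    return {p: 100 * c / total for p, c in Counter(activity).most_common(k)}

def venn_regions(commits, mails, bugs):
    """Group persons by the exact combination of core groups they belong to."""
    groups = (("commits", commits), ("mails", mails), ("bugs", bugs))
    regions = {}
    for person in commits | mails | bugs:
        key = tuple(name for name, group in groups if person in group)
        regions.setdefault(key, set()).add(person)
    return regions

# Hypothetical data: two identities belong to the same person.
raw_commits = {"alice": 60, "alice@example.org": 20, "bob": 20}
commits = merge_identities(raw_commits, {"alice@example.org": "alice"})
print(top_persons(commits, k=1))  # {'alice': 80.0}
```

Calling `venn_regions` on the key sets of the three `top_persons` results gives, for each region of the diagram, the persons who fall in it.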
Core groups (cont.)
[Figure: Venn diagrams of the most active persons per activity, one each for Brasero, Evince and Wine]
Conclusions
• Activity seems to become more and more unequally distributed over time.
• The Pareto principle is clearly present in the studied projects.
• For Brasero and Evince, the activity is led by a limited number of persons involved in 2 or 3 of the defined activities.
• For Wine, this does not seem to be the case.
Future work
• Determine if the core group of a project evolves over time.
• Use sliding windows to ignore inactive persons and discover new active persons.
• Study “bus factor” and the persons involved.
• Automatic generation of Venn diagrams including all persons involved in software evolution.
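The sliding-window idea mentioned above could look roughly like this: for each window, collect the persons with at least one dated event inside it, then compare consecutive windows to see who appeared and who became inactive. The 180-day window length, the event format `(person, date)` and the function names are all assumptions made for illustration.

```python
from datetime import date, timedelta

def active_persons(events, window_end, window_days=180):
    """Persons with at least one event in (window_end - window_days, window_end]."""
    start = window_end - timedelta(days=window_days)
    return {person for person, day in events if start < day <= window_end}

def window_changes(events, window_ends, window_days=180):
    """Yield (window_end, newly active persons, persons gone inactive)."""
    previous = set()
    for end in window_ends:
        current = active_persons(events, end, window_days)
        yield end, current - previous, previous - current
        previous = current

# Hypothetical events: (person, date of a commit, mail or bug change).
events = [
    ("alice", date(2010, 1, 10)),
    ("bob", date(2010, 5, 1)),
    ("alice", date(2010, 9, 1)),
]
for end, joined, left in window_changes(events, [date(2010, 6, 30), date(2010, 12, 31)]):
    print(end, sorted(joined), sorted(left))
```

Applying the core-group selection to each window's events, rather than to the whole history, is one way to see whether the core group itself changes over time.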
Future work (cont.)
• Study correlation between community structure (social network) and source code quality (as computed using software metrics).
• Automatic statistical analysis to determine the distributions fitting the data.
• Extend and refine the types of activity. For instance:
  • different types of commit activity (documentation, source code, tests, etc.)
  • different types of mail activity (informing, asking, answering, etc.)
  • different types of bug repository activity (bug creation, modification and commenting)
Thank you