Date post: 12-Jan-2016
SOFTWARE COLLABORATION NETWORKS By Chris Zachor
Page 1: By Chris Zachor. Introduction, Background, Changes, Methodology, Data Collection, Network Topologies, Measures, Tools, Conclusion, Questions.

SOFTWARE COLLABORATION NETWORKS

By Chris Zachor

Overview

Introduction
Background
  Changes
Methodology
  Data Collection
  Network Topologies
  Measures
  Tools
Conclusion
Questions

Introduction

Use network analysis to better understand the developers in the SourceForge and GitHub communities

Identify key differences (if any) between the two communities

Examine the diversity of collaborations within these two communities

Changes

The addition of GitHub to the study
  Contains some of the same attributes as SourceForge, which allows for a comparison

Other communities were looked at, but they either were not large enough or did not provide enough public data.

Data Collection

Crawling the websites using a simple Perl script and regular expressions

Collect a project list from SourceForge
  www.sourceforge.net/projects/projectTitle
  No specified request limit
  Check for duplicates
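
The slides mention a Perl script but do not show it. As a rough illustration of the regular-expression and duplicate-check steps, here is a minimal Python sketch. The link pattern and the helper name `extract_project_names` are assumptions based only on the URL form shown above, not the author's actual script.

```python
import re

# Assumed pattern: project links look like /projects/<name>, following the
# URL form on the slide. Real pages may need a different expression.
PROJECT_LINK = re.compile(r"/projects/([A-Za-z0-9_.-]+)")

def extract_project_names(html, seen):
    """Return project names in html that are not already in the seen set."""
    new_names = []
    for name in PROJECT_LINK.findall(html):
        key = name.lower()          # treat names case-insensitively
        if key not in seen:         # duplicate check
            seen.add(key)
            new_names.append(name)
    return new_names

# Hypothetical page fragment used only for illustration.
page = ('<a href="/projects/alpha/">Alpha</a> '
        '<a href="/projects/beta/">Beta</a> '
        '<a href="/projects/alpha/files/">Files</a>')
seen = set()
print(extract_project_names(page, seen))
```

Passing the same `seen` set across pages keeps the project list free of duplicates for the whole crawl.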

SourceForge Project Page

GitHub Crawling

The GitHub API provides our data
  Limited to 60 API calls per minute
  Use multiple computers to collect all 1.5 million projects
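
To see why multiple computers are needed, the crawl time can be estimated from the figures above. This sketch assumes one API call per project and that each machine gets its own 60-calls-per-minute allowance. Neither assumption is stated in the slides.

```python
def crawl_days(projects, calls_per_minute, machines=1, calls_per_project=1):
    """Estimate crawl duration in days under a per-machine rate limit."""
    total_calls = projects * calls_per_project
    minutes = total_calls / (calls_per_minute * machines)
    return minutes / (60 * 24)

# 1.5 million projects at 60 calls per minute, for a few machine counts.
for machines in (1, 4, 10):
    print(machines, "machine(s):", round(crawl_days(1_500_000, 60, machines), 1), "days")
```

Under these assumptions a single machine would need a little over 17 days, and the time falls in proportion to the number of machines.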

GitHub Project Page

GitHub API

Developer/Project Network

Project-Developer Network

Measures and Metrics

Degree
Clustering Coefficient
Modularity
Power Law
Small World Phenomenon

Degree

Average number of projects worked on by a developer

Average number of collaborations

Average number of developers on a project
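
The three averages above can be computed directly from project membership data. This is a minimal sketch: the input format (a dictionary from project to its set of developers) is hypothetical, and "collaborations" is read here as the number of distinct co-developers, which is one possible interpretation.

```python
from itertools import combinations

def degree_summary(project_members):
    """project_members: dict mapping project name -> set of developer names."""
    dev_projects = {}     # developer -> projects they work on
    collaborators = {}    # developer -> distinct co-developers
    for project, devs in project_members.items():
        for dev in devs:
            dev_projects.setdefault(dev, set()).add(project)
            collaborators.setdefault(dev, set())
        # Every pair of developers on the same project collaborates.
        for a, b in combinations(sorted(devs), 2):
            collaborators[a].add(b)
            collaborators[b].add(a)
    n_dev = len(dev_projects)
    n_proj = len(project_members)
    return {
        "avg_projects_per_developer":
            sum(len(p) for p in dev_projects.values()) / n_dev,
        "avg_collaborators_per_developer":
            sum(len(c) for c in collaborators.values()) / n_dev,
        "avg_developers_per_project":
            sum(len(d) for d in project_members.values()) / n_proj,
    }

# Toy data for illustration only.
example = {"p1": {"ann", "bob", "cy"}, "p2": {"bob", "dee"}}
print(degree_summary(example))
```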

Clustering Coefficient

Examine how likely developers are to stick together in groups

Examine both the average clustering coefficient for the entire network and the local clustering coefficient for nodes of interest
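
For reference, here is a small sketch of both quantities using the usual definition of the local clustering coefficient: the fraction of a node's neighbor pairs that are themselves connected. The study itself uses Pajek for this. The adjacency-set representation and function names below are illustrative assumptions.

```python
def local_clustering(adj, node):
    """Fraction of pairs of node's neighbors that are directly linked."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention: undefined cases count as zero
    links = sum(1 for u in neighbors for v in neighbors
                if u < v and v in adj[u])
    return 2.0 * links / (k * (k - 1))

def average_clustering(adj):
    """Mean of the local clustering coefficient over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)

# Toy undirected developer network: a triangle a-b-c plus d attached to c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print(local_clustering(adj, "c"), average_clustering(adj))
```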

Modularity

Provides a measure of how diverse developer collaborations are.

Range: -1 < Q < 1
  Values closer to one show less diversity in collaboration choices
  Values closer to negative one show more diversity in collaboration choices
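
The slides do not say which modularity definition or which grouping of developers is used. As one possible reading, the sketch below computes the Newman-Girvan modularity of a given grouping: for each group, the fraction of edges inside it minus the fraction expected from the group's total degree. The `community` mapping is a hypothetical input. For this definition Q lies between -1/2 and 1, which is within the range quoted above.

```python
def modularity(edges, community):
    """Newman-Girvan modularity of an undirected graph for a given grouping.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping node -> group label (assumed input).
    """
    m = len(edges)
    inside = {}       # group -> number of edges with both ends in the group
    degree_sum = {}   # group -> sum of degrees of its nodes
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())

# Two triangles joined by one edge, grouped by triangle (toy data).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(round(modularity(edges, groups), 4))
```

A high value here means developers mostly collaborate inside their own group, which matches the "less diversity" reading above.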

Power Law

Previous studies have found that the SourceForge community follows a power law

No such study has been done on the GitHub community

Few developers should be a part of many projects, while many developers should be involved with only one project
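
One quick, informal way to check for this pattern is to fit a straight line to the degree histogram on log-log axes. A power law shows up as a roughly linear trend with a negative slope. This is only a rough diagnostic, not a rigorous statistical test, and the histogram format and function name below are assumptions for illustration.

```python
import math

def loglog_slope(histogram):
    """Least-squares slope of log(count) against log(degree).

    histogram: dict mapping degree (projects per developer) -> number of
    developers with that degree. Zero entries are skipped.
    """
    points = [(math.log(k), math.log(c))
              for k, c in histogram.items() if k > 0 and c > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx

# Synthetic histogram where count is proportional to degree ** -2.
synthetic = {1: 1024, 2: 256, 4: 64, 8: 16}
print(round(loglog_slope(synthetic), 3))
```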

Small World Phenomenon

Previous studies have shown that the SourceForge community exhibits small world properties

Once again, no study has been done on the GitHub community

Using Pajek, I will create a random network with the same number of nodes and edges

Then, compare the clustering coefficient and the average shortest path of the two networks
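
The study does this comparison in Pajek. Purely as an illustration of the idea, the Python sketch below builds a random graph with a given number of nodes and edges and measures average shortest path length and average clustering. All function names are hypothetical, and the random-graph builder enumerates every node pair, so it only suits small examples.

```python
import random
from collections import deque

def average_shortest_path(adj):
    """Mean BFS distance over all ordered pairs of mutually reachable nodes."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

def average_clustering(adj):
    """Mean local clustering coefficient (zero for nodes with degree < 2)."""
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
            total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

def random_graph(n, m, seed=0):
    """Random undirected graph with n nodes and exactly m distinct edges."""
    rng = random.Random(seed)
    possible = [(i, j) for i in range(n) for j in range(i + 1, n)]
    adj = {i: set() for i in range(n)}
    for i, j in rng.sample(possible, m):
        adj[i].add(j)
        adj[j].add(i)
    return adj

# Compare a toy "observed" network with a random one of the same size.
observed = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
n = len(observed)
m = sum(len(v) for v in observed.values()) // 2
baseline = random_graph(n, m)
print(average_clustering(observed), average_shortest_path(observed))
print(average_clustering(baseline), average_shortest_path(baseline))
```

A small world network is usually expected to show much higher clustering than the random baseline while keeping a similarly short average path.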

Tools

Perl
Pajek
cURL
wget
GUESS

Conclusion

Through the use of network analysis, we hope to gain a better understanding of the developers in the SourceForge and GitHub communities.

Questions?

Suggestions? Comments?

