Date posted: 26-May-2015 | Category: Technology | Uploaded by: bram-adams
The Evolution of the R Software Ecosystem
Daniel M. German, University of Victoria
Bram Adams, École Polytechnique de Montréal
Ahmed E. Hassan, Queen's University
An Ecosystem is ...
Jansen et al., ICSE '09:
"a set of (1) businesses functioning as a unit and interacting with a shared market for (2) software and services, together with (3) the relationships among [the businesses]."
In Other Words
• core platform
• ecosystem infrastructure
• user contributions building on the platform
[Figure: the CRAN ecosystem, with example packages such as ggplot, ggplot2, wethepeople, data.table, Sim.DiffProc, Sim.DiffProcGUI, randomForest, rbundler, foreach, RODBC, rms, WGCNA, minpack.lm, fields, caret, heavy, plm and rv]
Bosch, SPLC '09
Desktop ecosystems for end-user programming are the holy grail of software platforms!
Sources: http://www.tiobe.com, http://www.rexeranalytics.com/Data-Miner-Survey-Results-2011.html
But How Did They Get This Far?
• Very successful statistical analysis system
• Created by Ross Ihaka and Robert Gentleman in 1993
• One of the most successful languages for non-programmers
Robert Gentleman, 1993
# Goals: A first look at R objects - vectors, lists, matrices, data frames.

# To make vectors "x", "y", "year" and "names"
x <- c(2, 3, 7, 9)
y <- c(9, 7, 3, 2)
year <- 1990:1993
names <- c("payal", "shraddha", "kritika", "itida")
# Accessing the 1st and last elements of y
y[1]
y[length(y)]

# To make a list "person"
person <- list(name = "payal", x = 2, y = 9, year = 1990)
person
# Accessing things inside a list
person$name
person$x

# To make a matrix, pasting together the columns "year", "x" and "y"
# The verb cbind() stands for "column bind"
cbind(year, x, y)

# To make a "data frame", which is a list of vectors of the same length
D <- data.frame(names, year, x, y)
nrow(D)
# Accessing one of these vectors
D$names
# Accessing the last element of this vector
D$names[nrow(D)]
# Or equally,
D$names[length(D$names)]
The R Language
R has an ACTIVE Community
• package infrastructure
• mailing lists
• blogs
• books
• commercial partners
• conference
How does a Successful Ecosystem like R Evolve?
Package Characteristics | Package Evolution | Package Dependencies | Package Community
Package Data Used
CRAN: 23/04/1997 to 25/02/2011, 80 official R versions
Package categories: base, recommended, popular, contributed
Package counts: 2,733, 15, 13 and 179; 19,593 package versions
How to Define Popular Packages?
Data: a contest that provided the list of installed packages of 52 users.
[Figure: number of different packages installed per user (log scale, 1 to 1000), for all packages and for packages installed by at least 20% of the users]
popular packages = packages installed by at least 20% of the users
Mailing List Data Used
• R-help
• R-devel
Tools: MailMiner [Bettenburg et al.], PostgreSQL
Documentation Files Dominate!
[Figure: proportion of files per file extension (e.g. rd, c, cpp, java, gif, rnw, description, namespace) for base, recommended, popular and contributed packages; documentation files take the largest share, followed by source code]
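A per-extension breakdown like the one in this figure can be computed with a sketch such as the following. It assumes a character vector of file paths for one package and groups files without an extension (such as DESCRIPTION or NAMESPACE) under their lower-cased file name; the helper name `extension_proportions()` and the sample paths are hypothetical, while `tools::file_ext()` is part of R's tools package.

```r
# Hypothetical helper: proportion of a package's files per file extension.
# Files without an extension are grouped under their lower-cased file name.
extension_proportions <- function(paths) {
  ext <- tolower(tools::file_ext(paths))
  no_ext <- ext == ""
  ext[no_ext] <- tolower(basename(paths[no_ext]))
  p <- prop.table(table(ext))
  # Return a plain named numeric vector, largest share first.
  sort(setNames(as.numeric(p), names(p)), decreasing = TRUE)
}

# Toy file list for one hypothetical package.
files <- c("DESCRIPTION", "NAMESPACE", "R/fit.R",
           "man/fit.Rd", "man/plot.Rd", "man/summary.Rd", "src/fit.c")
round(extension_proportions(files), 2)
# "rd" comes first: 3 of the 7 files, about 0.43
```

Averaging these proportions over all packages of a category would give one point per extension and category, as in the figure.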
Size of Documentation per Package
Documentation Files (.rd)
Line
s
0
100
1k
10k
100k
Extensive Package Documenta)on
16
5.3k 3.6k1.7k
0.6k
Contributed Packages Contain Less Code
[Figure: size of source code per package in SLOC (log scale; all source code, R files and C files) next to the size of documentation per package in lines of .rd files, for base, recommended, popular and contributed packages; annotated values 7.3k, 3.5k, 1.8k and 0.7k]
Package Characteristics | Package Evolution | Package Dependencies | Package Community
• extensive documentation
• small contributed packages
Fast Growth of Contributed Packages
[Figure: number of packages over time, 1998 to 2010 (log scale), in total and for base, recommended, popular and contributed packages; annotations: super-linear growth of contributed packages, conservative base/recommended evolution]
Contributed Packages have Stable Size
[Figure: evolution of the size of source code per package (log scale, 0 to 1M SLOC), 1998 to 2011, for base, recommended, popular and contributed packages]
The Less Core, the Less Releases
[Figure: number of releases per package (log scale, 1 to 160) against the proportion of packages, for recommended, popular and contributed packages; annotations: 50% had ≤17 releases, 50% had ≤3 releases]
... but Contributed Packages are Actively Maintained!
[Figure: date of latest release per package (2003 to 2011) against the proportion of packages, for recommended, popular and contributed packages]
>90% of packages had a release in the last 2 years
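Both release statistics (releases per package and date of the latest release) can be derived from one table of package versions. The sketch below is a hypothetical illustration: the column names `package`, `category` and `date`, the helper `release_stats()` and the toy releases are assumptions, and "the last 2 years" is approximated as 730 days before the snapshot date.

```r
# Hypothetical helper over a table of releases (one row per package version).
release_stats <- function(releases, snapshot, window_days = 730) {
  day <- as.numeric(releases$date)          # days since 1970-01-01
  pkg <- as.character(releases$package)
  n_releases <- tapply(day, pkg, length)    # number of releases per package
  latest     <- tapply(day, pkg, max)       # day of the latest release
  # One category per package.
  pkg_cat <- unique(data.frame(package = pkg,
                               category = as.character(releases$category),
                               stringsAsFactors = FALSE))
  n <- as.vector(n_releases[pkg_cat$package])
  list(
    # Median number of releases per category.
    median_releases = tapply(n, pkg_cat$category, median),
    # Share of packages whose latest release falls within `window_days`.
    share_recent = mean(latest >= as.numeric(snapshot) - window_days)
  )
}

# Toy data (not CRAN): three hypothetical packages.
releases <- data.frame(
  package  = c("a", "a", "a", "b", "c", "c"),
  category = c("popular", "popular", "popular",
               "contributed", "contributed", "contributed"),
  date     = as.Date(c("2008-01-10", "2009-06-01", "2010-11-20",
                       "2006-03-15", "2010-02-01", "2011-01-05")),
  stringsAsFactors = FALSE
)
stats <- release_stats(releases, snapshot = as.Date("2011-02-25"))
stats$median_releases   # contributed 1.5, popular 3
stats$share_recent      # 2 of the 3 packages released in the last 730 days
```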
Package Characteristics | Package Evolution | Package Dependencies | Package Community
• extensive documentation
• small contributed packages
• fast growth of contributed packages
• stable package size
• active maintenance
Packages have Few Dependencies
[Figure: number of dependencies per package (0 to 25) against the proportion of packages, for recommended, popular and contributed packages]
1/3 have NONE; 1/4 have 1 dependency
Contributed Packages are Higher-Level
[Figure: number of dependents per package (log scale, 0 to 260) against the proportion of packages, for recommended, popular and contributed packages]
Annotations: NO dependents; 50% of popular packages have ≤6 dependents
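Dependencies and dependents are two directions of the same graph: the dependents of a package are the packages whose dependency lists mention it. The sketch below shows this on a hypothetical toy map; the slides do not say which DESCRIPTION fields (Depends, Imports, ...) were counted, so the input format is an assumption.

```r
# Hypothetical dependency map (toy data, not CRAN): each element lists the
# packages that the named package depends on.
deps <- list(
  A = character(0),
  B = "A",
  C = c("A", "B"),
  D = character(0)
)

# Dependencies per package: the length of each list element.
n_dependencies <- vapply(deps, length, integer(1))

# Dependents per package: how often a package occurs in the other packages'
# dependency lists (the reverse edges); packages nobody uses get 0.
n_dependents <- setNames(
  tabulate(match(unlist(deps), names(deps)), nbins = length(deps)),
  names(deps)
)

n_dependencies            # A 0, B 1, C 2, D 0
n_dependents              # A 2, B 1, C 0, D 0
mean(n_dependencies == 0) # share of packages without dependencies: 0.5
```

On a real CRAN snapshot, `tools::package_dependencies()` (with `reverse = TRUE` for dependents) can build such lists from an `available.packages()` database instead of a hand-written map.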
Package Characteristics | Package Evolution | Package Dependencies | Package Community
• extensive documentation
• small contributed packages
• fast growth of contributed packages
• stable package size
• active maintenance
• few dependencies
• contributed packages are higher level
Contributed Packages Generate More User Traffic
[Figure: number of messages over time (1998 to 2010, up to 20,000) for base, recommended, popular and contributed packages]
Contributed Packages take over Developer Traffic
[Figure: number of messages over time (1998 to 2010, up to 2,500) for base, recommended, popular and contributed packages]
The Less Core, the Less Traffic
[Figure: total number of messages per package (log scale) for base, recommended, popular and contributed packages; annotation: strong competition]
Starting up a Community takes 1 Year
[Figure: time until a package's 1st, 10th, 100th and 1000th message (log scale, from an instant through a day, week, month, year and 5 years to 10 years), for base, recommended, popular and contributed packages]
Annotations: 3 months; 1 year; 5 months slower; 44.9% gets here; only 6.5% gets this far
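The start-up time of a package's community can be measured as the time from its first message to its 10th, 100th and 1000th message. The sketch below assumes that the messages have already been attributed to packages (the slides do not say how) and works on one package's message timestamps; the helper `time_to_kth_message()` and the toy data are hypothetical.

```r
# Hypothetical helper: days from a package's first mailing-list message until
# its k-th message; NA when the package never receives k messages.
time_to_kth_message <- function(timestamps, ks = c(10, 100, 1000)) {
  t <- sort(as.numeric(timestamps))   # POSIXct -> seconds since the epoch
  days <- vapply(ks, function(k) {
    if (length(t) < k) NA_real_ else (t[k] - t[1]) / 86400
  }, numeric(1))
  setNames(days, paste0("msg", ks))
}

# Toy data: 150 messages, one per day (not R-help/R-devel data).
msgs <- as.POSIXct("2010-01-01", tz = "UTC") + (0:149) * 86400
time_to_kth_message(msgs)
# msg10 = 9, msg100 = 99, msg1000 = NA
```

Applied to every package, the share of non-NA values per milestone indicates how many packages ever reach it.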
Package Characteristics | Package Evolution | Package Dependencies | Package Community
• extensive documentation
• small contributed packages
• fast growth of contributed packages
• stable package size
• active maintenance
• few dependencies
• contributed packages are higher level
• strong competition for attention
• building a community takes a year
So What?
• How do contributors deal with the fight for attention?
  – What is their motivation?
  – How much effort do they spend on their package?
• How does a package become popular/recommended?
  – Do bloggers/books have an impact?
  – Or is it the other way around?
• How do R-Forge and the core team ensure high-quality releases without broken packages?
• ...
Case Study on R
CRAN: 23/04/1997 to 25/02/2011, 80 official R versions
Package categories: base, recommended, popular, contributed
Package counts: 2,733, 15, 13 and 179; 19,593 package versions
RELENG 2013: 1st International Workshop on Release Engineering
May 20, 2013, San Francisco, USA
http://releng.polymtl.ca