Invited Talk MESOCA 2014: Evolving software systems: emerging trends and challenges

Post on 26-May-2015


description

Software evolution research is a thriving area of software engineering research. Recent years have seen growing interest in a variety of evolution topics, as witnessed by the growing number of publications dedicated to the subject. Without attempting to be complete, in this talk we provide an overview of emerging trends in software evolution research, such as the extension of the traditional boundaries of software, growing attention to social and socio-technical aspects of software development processes, and interdisciplinary research that applies techniques from other research areas to the study of software evolution, and software evolution techniques to other research areas. As a large body of software evolution research is empirical in nature, we are confronted by important challenges pertaining to the reproducibility of the research and its generalizability.

transcript

Software Evolution anno 2014: directions and challenges

Alexander Serebrenik

@aserebrenik

a.serebrenik@tue.nl

2008

Time for a new book!

2014

2008 vs. 2014

From systems to ecosystems

Business-oriented view

“a set of actors functioning as a unit and interacting with a shared market for software and services, together with the relationships among them.”

with thanks to International Data Corporation (IDC)

Development-centric view

a collection of software projects that are developed and evolve together in the same environment

with thanks to Bram Adams

Socio-technical view

a community of persons (end-users, developers, debuggers, …) contributing to a collection of projects

Technical

Scientific

Practical

Legal and ethical

Technical challenges

• eliminate non-names
• eliminate specific quirks
• group “similar” names
  – first/last name
  – textual similarity
  – latent semantic analysis
• (correct groups manually)
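The steps listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the talk: the `NON_NAMES` set, the 0.85 threshold and all function names are hypothetical, textual similarity is approximated with the standard library's `difflib.SequenceMatcher`, and latent semantic analysis is not shown.

```python
import re
from difflib import SequenceMatcher

# Hypothetical placeholder strings that are not real names (assumption).
NON_NAMES = {"root", "admin", "unknown", "nobody", "no author"}


def normalize(alias):
    """Lower-case an alias and replace punctuation and digits by spaces."""
    cleaned = re.sub(r"[^a-z ]+", " ", alias.lower())
    return " ".join(cleaned.split())


def similar(a, b, threshold=0.85):
    """Match on swapped first/last name, or else on textual similarity."""
    if sorted(a.split()) == sorted(b.split()):
        return True
    return SequenceMatcher(None, a, b).ratio() >= threshold


def group_aliases(aliases, threshold=0.85):
    """Greedily add each alias to the first group holding a similar name."""
    groups = []
    for alias in aliases:
        name = normalize(alias)
        if not name or name in NON_NAMES:
            continue  # eliminate non-names
        for group in groups:
            if any(similar(name, normalize(m), threshold) for m in group):
                group.append(alias)
                break
        else:
            groups.append([alias])
    return groups


print(group_aliases(["John Smith", "Smith, John", "root", "Jon Smith", "Alice Jones"]))
```

A greedy pass like this is order-dependent and will produce both false merges and missed merges, which is why the last step (correcting groups manually) remains necessary.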


Technical challenges

Structured data (2008)

Unstructured data (2014)


Scientific challenges


Raw data   Processed data set   Tools & scripts   #MSR papers 2004-2009
Y          Y                    Y                 2
Y          Y                    N                 2
Y          P                    Y                 1
Y          P                    P                 2
Y          P                    N                 2
Y          N                    Y                 16
Y          N                    P                 19
Y          N                    N                 64
P          N                    Y                 1
P          N                    N                 2
N          Y                    N                 2
N          P                    N                 1
N          N                    Y                 7
N          N                    P                 2
N          N                    N                 31
N/A        N/A                  N/A               17

We share raw data but rarely share tools – reinventing the wheel anybody?

Practical challenges

• How can we share our big data with other researchers?
• Different formats, different tools, storage problems, …
• How can we make our research results useful to practitioners and development communities?
• How can we build tools and dashboards that integrate our findings?

Legal and ethical challenges

(especially for survey data)

http://www.intracto.com/blog/online-privacy-belangrijk

k-anonymity

l-diversity

t-closeness
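As a reminder of what the first of these notions requires: a data set is k-anonymous if every combination of quasi-identifier values is shared by at least k records. A minimal check, assuming records are plain dictionaries, might look as follows; the field names and the sample responses are invented for illustration, and l-diversity and t-closeness are not covered.

```python
from collections import Counter


def smallest_group(records, quasi_identifiers):
    """Size of the smallest group of records sharing all quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values(), default=0)


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    return smallest_group(records, quasi_identifiers) >= k


# Hypothetical survey responses (invented for illustration only).
responses = [
    {"country": "NL", "age": "20-29", "answer": "A"},
    {"country": "NL", "age": "20-29", "answer": "B"},
    {"country": "BE", "age": "30-39", "answer": "A"},
    {"country": "BE", "age": "30-39", "answer": "A"},
    {"country": "BE", "age": "30-39", "answer": "C"},
]

print(is_k_anonymous(responses, ["country", "age"], k=2))
print(is_k_anonymous(responses, ["country", "age"], k=3))
```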

2008 vs. 2014

From “traditional” to “non-traditional” artifacts:

What is software?

http://ctms.engin.umich.edu/CTMS/index.php?example=Introduction&section=SimulinkModeling

Maintainability??? Evolution???

BumbleBee: a refactoring tool for spreadsheets

with thanks to Felienne Hermans

http://help.eclipse.org/juno/index.jsp?topic=%2Forg.eclipse.m2m.atl.doc%2Fguide%2Fconcepts%2FModel-Transformation.html


• describe evolutionary steps
• relate to changes of other artifacts
• describe prevalence in practice
• support automation

New kind of verification

artifacts

2008

2009

2012

2013

2008 vs. 2014

From technical to socio-technical perspective:

Who are these people?

What do they do?

> 90% in WordPress & Drupal
> 95% in FLOSS surveys
> 87% in GNOME
> 70% in software-related jobs (NSF)

MEN

FLOSS 2013

Europe, US, CA, AU

Brazil/Argentina

How can we reliably and efficiently identify gender, age, location?

Technical challenges

?

Name + Location = Gender

Lonzo Alonzo ⇒

w35l3y wesley ⇒
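The second example above pairs the alias w35l3y with the name wesley, which suggests undoing leetspeak substitutions before a name is looked up. A minimal sketch of that step, assuming a small digit-to-letter table (the `LEET` table and the `decode_leet` name are hypothetical, and real aliases may need more substitutions):

```python
# Hypothetical digit-to-letter table (assumption, not taken from the talk).
LEET = str.maketrans({"0": "o", "1": "l", "3": "e", "4": "a", "5": "s", "7": "t"})


def decode_leet(alias):
    """Replace common leetspeak digits by the letters they stand for."""
    return alias.lower().translate(LEET)


print(decode_leet("w35l3y"))
```

Mapping a short form such as Lonzo to Alonzo cannot be done this way; it would need a list of name variants, which is not sketched here.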


<title>Ben Kamens</title>…<h1>We&#8217;re willing to be embarrassed about what we <em>haven&#8217;t</em> done&#8230;</h1>

Heuristics: title + first h1

Ben Kamens We’re willing to be embarrassed about what we haven’t done…

<PERSON>Ben Kamens</PERSON> We’re willing to be embarrassed about what we haven’t done…

Stanford Named Entity Tagger
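The "title + first h1" heuristic can be sketched with Python's standard `html.parser` module. This is an illustration only: the class and function names are hypothetical, and the subsequent tagging step with the Stanford Named Entity Tagger is not reproduced.

```python
from html.parser import HTMLParser


class TitleAndFirstH1(HTMLParser):
    """Collect the text of <title> and of the first <h1> on a page."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes &#8217; and similar
        self.parts = {"title": [], "h1": []}
        self.current = None  # tag currently being collected, if any
        self.done = set()    # tags that have already been collected once

    def handle_starttag(self, tag, attrs):
        if tag in self.parts and tag not in self.done and self.current is None:
            self.current = tag

    def handle_endtag(self, tag):
        if tag == self.current:
            self.done.add(tag)
            self.current = None

    def handle_data(self, data):
        if self.current is not None:
            self.parts[self.current].append(data)


def title_and_first_h1(html):
    """Return the page title and the first h1 text, joined by a space."""
    parser = TitleAndFirstH1()
    parser.feed(html)
    parser.close()
    pieces = ("".join(parser.parts[tag]).strip() for tag in ("title", "h1"))
    return " ".join(piece for piece in pieces if piece)


page = ("<title>Ben Kamens</title>…<h1>We&#8217;re willing to be embarrassed "
        "about what we <em>haven&#8217;t</em> done&#8230;</h1>")
print(title_and_first_h1(page))
```

Nested markup such as `<em>` is dropped while its text is kept, so the tagger receives plain text.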

Quality of gender resolution: Survey

Self-identification   As inferred        Total
                      M     F     ?
M                     60    3     43     106
F                     2     5     4      11

Self-identification   As inferred        Total
                      M     F     ?
M                     90    3     13     106
F                     2     9     0      11

+ avatars, other social media sites (manually)
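The counts in the two tables above can be condensed into two shares each: the fraction of respondents whose inferred gender matches their self-identification, and the fraction left unresolved ("?"). The sketch below only redoes that arithmetic; the variable and function names are hypothetical.

```python
# Counts copied from the two tables above: outer keys are self-identified
# gender, inner keys are the inferred gender ("?" means not resolved).
first_table = {"M": {"M": 60, "F": 3, "?": 43}, "F": {"M": 2, "F": 5, "?": 4}}
second_table = {"M": {"M": 90, "F": 3, "?": 13}, "F": {"M": 2, "F": 9, "?": 0}}


def summarize(table):
    """Return (share inferred correctly, share unresolved) over all respondents."""
    total = sum(sum(row.values()) for row in table.values())
    correct = sum(table[gender][gender] for gender in table)
    unresolved = sum(row["?"] for row in table.values())
    return correct / total, unresolved / total


for name, table in (("first", first_table), ("second", second_table)):
    correct, unresolved = summarize(table)
    print(f"{name}: {correct:.1%} correct, {unresolved:.1%} unresolved")
```

Over the 117 respondents, the correctly inferred share rises from 65 to 99 between the two tables, while the unresolved share falls from 47 to 13.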


.cpp .po

.jpg

/test/

/library/ .doc

makefile .sql .conf
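One way to read the patterns above is as rules that map a changed file to a type of activity. The sketch below is a guess at such a rule set: only the patterns come from the slide, while the activity labels, the rule order (directories first, then file names, then extensions) and the function name are assumptions.

```python
import os

# Patterns from the slide; the activity labels are hypothetical.
PATH_RULES = {"/test/": "test", "/library/": "library"}
NAME_RULES = {"makefile": "build"}
EXTENSION_RULES = {
    ".cpp": "code",
    ".po": "localization",
    ".jpg": "image",
    ".doc": "documentation",
    ".sql": "database",
    ".conf": "configuration",
}


def activity_type(path):
    """Classify a file path by directory, then file name, then extension."""
    normalized = "/" + path.lower().lstrip("/")
    for fragment, label in PATH_RULES.items():
        if fragment in normalized:
            return label
    name = os.path.basename(normalized)
    if name in NAME_RULES:
        return NAME_RULES[name]
    return EXTENSION_RULES.get(os.path.splitext(name)[1], "unknown")


for changed in ("src/main.cpp", "po/nl.po", "test/foo.cpp", "Makefile", "README"):
    print(changed, "->", activity_type(changed))
```

Counting these labels per contributor gives one activity profile per person, which is the kind of data needed to compare occasional and frequent contributors.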

Occasional contributors

Frequent contributors

How can we reliably and efficiently identify human activities?

Technical challenges
