Advancing Foundation and Practice of Software Analytics

Post on 05-Dec-2014


description

Vision Statement Presentation on "Advancing Foundation & Practice of Software Analytics" at the 2nd International NSF sponsored Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE 2013) http://promisedata.org/raise/2013/

transcript

Advancing Foundation & Practice of Software Analytics

Tao Xie

North Carolina State University

with Dongmei Zhang (Microsoft Research Asia), Xusheng Xiao (North Carolina State University), Chunhua Weng (Columbia University)

RAISE 2013

Software Analytics

Software analytics is to enable software practitioners to perform data exploration and analysis in order to obtain insightful and actionable information for data-driven tasks around software and services.

Dongmei Zhang, Yingnong Dang, Jian-Guang Lou, Shi Han, Haidong Zhang, and Tao Xie. Software Analytics as a Learning Case in Practice: Approaches and Experiences. In Proc. MALETS 2011.

MSRA Software Analytics group founded in May 2009

Term coined/defined, expanding the scope of previous work [Buse and Zimmermann, FoSER 10][Hassan and Xie, FoSER 10]

http://research.microsoft.com/en-us/groups/sa/

http://research.microsoft.com/en-us/news/features/softwareanalytics-052013.aspx

ICSE 2013

Five Dimensions

Research Topics

Technology Pillars

Target Audience

Connection to Practice

Output

Research Topics – the Trinity View

Software Users

Software Development Process

Software System

• Covering different areas of software domain

• Throughout entire development cycle

• Enabling practitioners to obtain insights

Data Sources

Runtime traces

Program logs

System events

Perf counters

Usage log

User surveys

Online forum posts

Blog & Twitter

Source code

Bug history

Check-in history

Test cases

Target Audience – Software Practitioners

Developer

Tester

Program Manager

Usability engineer

Designer

Support engineer

Management personnel

Operation engineer


Output – Insightful Information

Conveys meaningful and useful understanding or knowledge towards completing the target task

Not easily attainable via directly investigating raw data without aid of analytics technologies

Going from correlation to causality

Examples

It is easy to count the number of re-opened bugs, but how do we find the primary reasons these bugs were re-opened?

When the availability of an online service drops below a threshold, how do we localize the problem?


Output – Actionable Information

Enables software practitioners to come up with concrete solutions towards completing the target task

Examples

Why were bugs re-opened?

▪ A list of bug groups, each with the same reason for re-opening

Why did the availability of online services drop?

▪ A list of problematic areas with associated confidence values

Which part of my code should be refactored?

▪ A list of cloned code snippets easily explored from different perspectives
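The difference between counting re-opened bugs and grouping them by reason can be made concrete with a minimal Python sketch. The bug records, field names, and re-opening reasons below are hypothetical, not data from the talk. In practice the reason is rarely recorded and would have to be inferred by analytics techniques, which is the hard part this sketch leaves out.

```python
from collections import defaultdict

# Hypothetical bug records; field names and reasons are illustrative only.
BUGS = [
    {"id": 101, "reopened": True, "reason": "incomplete fix"},
    {"id": 102, "reopened": False, "reason": None},
    {"id": 103, "reopened": True, "reason": "could not reproduce"},
    {"id": 104, "reopened": True, "reason": "incomplete fix"},
]


def count_reopened(bugs):
    """The easy part: a raw count of re-opened bugs."""
    return sum(1 for bug in bugs if bug["reopened"])


def group_by_reason(bugs):
    """The actionable part: bug groups, each sharing one re-opening reason.

    Returns (reason, [bug ids]) pairs, largest group first.
    """
    groups = defaultdict(list)
    for bug in bugs:
        if bug["reopened"]:
            groups[bug["reason"]].append(bug["id"])
    return sorted(groups.items(), key=lambda item: (-len(item[1]), item[0]))


if __name__ == "__main__":
    print(count_reopened(BUGS))
    for reason, ids in group_by_reason(BUGS):
        print(reason, ids)
```

The count alone tells a practitioner that something is wrong; the grouped list points at where to start fixing it.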

Research Topics & Technology Pillars

Vertical

Horizontal

Information Visualization

Data Analysis Algorithms

Large-scale Computing

Software Users

Software Development Process

Software System


Connection to Practice

Software Analytics is naturally tied with software development practice

Getting real

Real Data

Real Problems

Real Users

Real Tools

Human/Tool Cooperation: Performance Debugging in the Large


[Figure: StackMine [Han et al. ICSE 12] workflow. Labeled components: trace collection (over the Internet), trace storage, trace analysis, pattern matching, problematic pattern repository, bug filing, bug update, bug database.]

Shi Han, Yingnong Dang, Song Ge, Dongmei Zhang, and Tao Xie. Performance Debugging in the Large via Mining Millions of Stack Traces. In Proc. ICSE 2012

How many issues are still unknown?

Which trace file should I investigate first?

Key to issue discovery

Bottleneck of scalability
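The question of which trace file to investigate first can be illustrated with a small Python sketch. It is not the StackMine algorithm from the cited paper: it swaps in a much simpler technique, counting contiguous call-stack fragments shared across traces and ranking each trace by how many frequent fragments are not yet in a known-pattern repository. The trace names, frame names, and parameters (`n`, `min_support`) are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical call stacks, one per trace file (frame names are made up).
TRACES = {
    "t1.etl": ["ui!Click", "fs!Open", "disk!Read", "krn!Wait"],
    "t2.etl": ["ui!Paint", "fs!Open", "disk!Read", "krn!Wait"],
    "t3.etl": ["ui!Click", "net!Send", "krn!Wait"],
}

# Hypothetical repository of patterns that already have a filed bug.
KNOWN_PATTERNS = {("disk!Read", "krn!Wait")}


def fragments(frames, n):
    """All contiguous runs of n frames in one call stack, as a set."""
    return {tuple(frames[i:i + n]) for i in range(len(frames) - n + 1)}


def frequent_patterns(traces, n=2, min_support=2):
    """Fragments that occur in at least min_support different traces."""
    counts = Counter()
    for frames in traces.values():
        counts.update(fragments(frames, n))  # a set, so each trace counts once
    return {pattern: c for pattern, c in counts.items() if c >= min_support}


def rank_traces(traces, known_patterns, n=2, min_support=2):
    """Order traces by how many frequent, still-unknown patterns they contain."""
    unknown = set(frequent_patterns(traces, n, min_support)) - set(known_patterns)
    scores = {
        name: len(fragments(frames, n) & unknown)
        for name, frames in traces.items()
    }
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for name, score in rank_traces(TRACES, KNOWN_PATTERNS):
        print(name, score)
```

Even this toy version shows the division of labor the slide points at: the tool narrows millions of traces down to a ranked shortlist, and the human analyst decides which patterns are real issues.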

StackMine: Industry Impact

“We believe that the MSRA tool is highly valuable and much more efficient for mass trace (100+ traces) analysis. For 1000 traces, we believe the tool saves us 4-6 weeks of time to create new signatures, which is quite a significant productivity boost.”

- from Development Manager in Windows

Highly effective new issue discovery on Windows mini-hang

Continuous impact on future Windows versions


Dual Ends of the Road

Foundation: Science of Software Analytics?

• From correlation to causality

Practice: Software Analytics

• From pieces to a whole

• Bring human in the loop

• Make real impact in practice

Caricature: Standard Security Research

Choose random system component

Find vulnerability

Suggest defense

Analyze security or test performance

Are we making progress?

Positive aspect: most security research addresses real problems

@J. Mitchell

Meaning of “Science”

Systematization of Knowledge: An organized body of knowledge gained through research

• Ad hoc point solutions vs. general understanding

• Repeating failures of the past with each new platform, type of vulnerability

Scientific Method: System of acquiring knowledge based on the scientific method

• Process of hypothesis testing and experiments

• Building abstractions and models, theorems

Universal Laws: Laws or theories that are predictive

• Widely applicable

• Make strong, quantitative predictions

@D. Evans, J. Mitchell

Percentage of bug-introducing changes for Eclipse

Don’t program on Fridays ;-)

[Zimmermann et al. 05]
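A per-weekday percentage of this kind can be computed as in the following Python sketch. The change records are invented for illustration and are not the Eclipse data from the cited study; labeling a change as bug-introducing is assumed to have been done beforehand.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical change history: (commit date, was it later found bug-introducing?)
CHANGES = [
    (date(2005, 5, 2), False),   # Monday
    (date(2005, 5, 2), True),    # Monday
    (date(2005, 5, 9), False),   # Monday
    (date(2005, 5, 3), False),   # Tuesday
    (date(2005, 5, 6), True),    # Friday
    (date(2005, 5, 6), False),   # Friday
]


def bug_introducing_rate_by_weekday(changes):
    """Percentage of changes per weekday that introduced a bug."""
    total = Counter()
    buggy = Counter()
    for day, introduced_bug in changes:
        name = WEEKDAYS[day.weekday()]  # date.weekday(): Monday is 0
        total[name] += 1
        if introduced_bug:
            buggy[name] += 1
    return {name: 100.0 * buggy[name] / total[name] for name in total}


if __name__ == "__main__":
    for name, rate in bug_introducing_rate_by_weekday(CHANGES).items():
        print(f"{name}: {rate:.1f}%")
```

A table like this is only a correlation. It does not say why one weekday looks worse, which is the gap between observation and causation that the following slides address.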

Failure is a 4-letter Word

[PROMISE’11 Zeller et al.]

From Correlation to Causality

Analytic techniques are often used for applications that emphasize results over causation of the findings

Users may choose to act on the behavior without focus on understanding it (or its causation) provided that the pattern has a high empirical probability of correctly identifying an issue

• E.g., smuggling, traveling with false documents, or predicting winning stock

@L. Williams, M. Rappa

From Correlation to Causality cont.

Analytic techniques are often not used to support the identification and advancement of fundamental scientific principles based upon an analysis of causation

Emphasize the use of analytics to advance science (e.g., producing insights) besides the use of analytics in providing just observations

@L. Williams, M. Rappa

Open Questions

How much science of a field (e.g., software analytics)?

• A field may be a means/solution in contrast to a problem domain like “security”, “design”

How can analytics/AI be used to help build science of “X”?

How to move a field to a foundational level?

How to balance foundation and practice?

Dual Ends of the Road

Foundation: Science of Software Analytics?

• From correlation to causality

Practice: Software Analytics

• From pieces to a whole

• Bring human in the loop

• Make real impact in practice

Fitnex Path-Exploration Strategy for Pex

Released in Pex since 2008

Pex download counts, initial 20 months of release

• Academic: 17,366

• Industrial: 13,022

• Total: 30,388

Analytics/AI is the Means to the End

Interesting results vs. Actionable results

Problem hunting vs. Problem driven

Open Questions


Who should bring software analytics research results to the hands of practitioners?

How to do so?

Dual Ends of the Road

Foundation: Science of Software Analytics?

• From correlation to causality

Practice: Software Analytics

• From pieces to a whole

• Bring human in the loop

• Make real impact in practice

Thank you!

Questions?

https://sites.google.com/site/asergrp/

taoxie@gmail.com

NSF grants CCF-0845272, CCF-0915400, CNS-0958235, ARO grant W911NF-08-1-0443, an NSA Science of Security Lablet grant, a NIST grant, a 2011 Microsoft Research SEIF Award