Idea Engineering

Date post: 06-May-2015
Upload: cs-ncstate
Page 1: Idea Engineering

Idea Engineering

[email protected]
PROMISE’13, Oct’13

0. algorithm mining (yesterday)
1. landscape mining (today)
2. decision mining (tomorrow)
3. discussion mining (future)

Page 4: Idea Engineering

The Premises of PROMISE (2005)

– Wanted: predictions
  • Nope. Users want decisions, or engagement

– Data mining will reveal “the truth” about SE
  • [Dejaeger: TSE’11], [Hall: TSE’12], [Shepperd: COW’13]
  • Better learners ≠ better conclusions

– Sooner or later: enough data for general conclusions
  • Found more differences than generalities
  • Special issues: [IST’13], [ESEj’13]
  • Best papers, ASE’11, MSR’12
  • Menzies, Zimmermann et al. [TSE’13]
  • Lots of local models

Page 6: Idea Engineering


Landscape mining: look before you leap

• Report what is true about the data
  – Not trivia on how algorithms walk that data

• Map the landscape
  – Reason on each part of the map

• E.g. landscape mining
  – Unsupervised iterative dichotomization
  – Cluster, prune
  – Then generate rules

• Different from “leap before you look”
  – i.e. skew learning by the class variable
  – then study the results

• E.g. C4.5, CART, Fayyad-Irani, etc.
  – Supervised iterative dichotomization

• E.g. 61% of 300+ effort estimation papers
  – Algorithm tinkering, without end
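Unsupervised iterative dichotomization can be sketched as recursive bisection: pick two distant rows, place every row on the line between them, split at the median, and recurse until the pieces are small. This is a minimal sketch, not the author's implementation; it assumes numeric rows and Euclidean distance, `bisect_clusters` and `min_size` are hypothetical names, and the prune and rule-generation steps are left out. Note that no class variable is read anywhere, which is the point of "look before you leap".

```python
import math
from typing import List

Row = List[float]


def dist(a: Row, b: Row) -> float:
    """Euclidean distance between two numeric rows (an assumption)."""
    return math.dist(a, b)


def farthest(rows: List[Row], anchor: Row) -> Row:
    """Return the row farthest from `anchor` in one linear pass."""
    return max(rows, key=lambda r: dist(r, anchor))


def bisect_clusters(rows: List[Row], min_size: int = 4) -> List[List[Row]]:
    """Hypothetical helper: recursively split `rows` in two, never reading
    a class label, and return the leaf clusters."""
    if len(rows) < 2 * min_size:
        return [rows]
    east = farthest(rows, rows[0])  # a row far from an arbitrary start
    west = farthest(rows, east)     # a row far from `east`
    c = dist(east, west)
    if c == 0:                      # every row is identical: stop
        return [rows]

    def x(r: Row) -> float:
        # Position along the east-west line, via the cosine rule.
        a, b = dist(r, east), dist(r, west)
        return (a * a + c * c - b * b) / (2 * c)

    ordered = sorted(rows, key=x)
    mid = len(ordered) // 2         # median split keeps the recursion balanced
    return (bisect_clusters(ordered[:mid], min_size)
            + bisect_clusters(ordered[mid:], min_size))


if __name__ == "__main__":
    data = [[float(i), float(i % 3)] for i in range(16)]
    print([len(leaf) for leaf in bisect_clusters(data, min_size=4)])  # prints [4, 4, 4, 4]
```

Because every split is at the median, the recursion depth is logarithmic in the number of rows; a supervised learner such as C4.5 would instead choose splits by how well they separate the class variable.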

Page 7: Idea Engineering


Find landscape = cluster the data, assign “heights”

Find decisions = report the deltas between highs and lows

Monitor discussions = watch and help communities explore the deltas

IDEA Engineering = <landscape, decisions, discussion>

Page 8: Idea Engineering

Spectral Landscape Mining

• Spectrum = a condition that is not limited to a specific set of values but varies in a continuum

• Groups together a broad range of conditions or behaviors under a single title

• In mathematics, the spectrum of a (finite-dimensional) matrix is the set of its eigenvalues.

• Nyström algorithms: approximations to eigenvalues
  – FASTMAP: linear time
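The linear-time claim for FASTMAP comes from its core step: choose two distant "pivot" rows, then place every row on the line between them using only distances (the cosine rule); further dimensions repeat this on residual distances. The sketch below is a minimal illustration under stated assumptions (Euclidean base distance, pivots found by two "farthest point" passes from row 0); `fastmap` is a hypothetical name, not a library API.

```python
import math
from typing import List, Sequence


def fastmap(rows: List[Sequence[float]], k: int = 2) -> List[List[float]]:
    """Sketch of a FASTMAP-style projection of `rows` into k coordinates.

    Assumptions: Euclidean base distance; pivots chosen by two
    "farthest point" passes starting from row 0. Each dimension costs
    O(n) distance calls, so k dimensions cost O(k * n).
    """
    n = len(rows)
    coords: List[List[float]] = [[] for _ in rows]

    def d2(i: int, j: int) -> float:
        # Squared distance with earlier coordinates' contribution removed.
        base = math.dist(rows[i], rows[j]) ** 2
        for ci, cj in zip(coords[i], coords[j]):
            base -= (ci - cj) ** 2
        return max(base, 0.0)

    for _ in range(k):
        east = max(range(n), key=lambda j: d2(0, j))
        west = max(range(n), key=lambda j: d2(east, j))
        c2 = d2(east, west)
        if c2 < 1e-12:  # nothing left to separate along a new axis
            for coord in coords:
                coord.append(0.0)
            continue
        c = math.sqrt(c2)
        # Cosine rule: distance of row i from `east` along the pivot line.
        xs = [(d2(i, east) + c2 - d2(i, west)) / (2 * c) for i in range(n)]
        for coord, x in zip(coords, xs):
            coord.append(x)
    return coords
```

For rows that already lie on a line, the first coordinate reproduces their pairwise distances exactly, and the second axis collapses to zeros because no residual distance remains.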

Page 9: Idea Engineering

Project data onto the first 2 principal components; grid that data (e.g. Nasa93dem)

1) Project 23 dimensions into 2
2a) Cluster
2b) Replace clusters with centroids

MOEA: score = effort + defects + months
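Steps 2a and 2b, plus the MOEA score, can be sketched by treating each non-empty cell of a coarse grid over the 2-D projection as a cluster and replacing it with its centroid. Assumptions, all hypothetical: the 2-D coordinates are already stored as `x` and `y`, each row carries numeric `effort`, `defects` and `months`, lower scores are better, and the objectives are on comparable scales (real data would likely need normalizing first). `grid_centroids` and `cells` are illustrative names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, float]  # hypothetical row: "x", "y" plus objective columns
OBJECTIVES = ("effort", "defects", "months")


def score(row: Row) -> float:
    """score = effort + defects + months (assumed: lower is better)."""
    return sum(row[o] for o in OBJECTIVES)


def grid_centroids(rows: List[Row], cells: int = 4) -> Dict[Tuple[int, int], Row]:
    """Bucket rows on a cells x cells grid over (x, y) and return one
    centroid (column-wise mean) per non-empty grid cell."""
    xs = [r["x"] for r in rows]
    ys = [r["y"] for r in rows]
    lo_x, hi_x, lo_y, hi_y = min(xs), max(xs), min(ys), max(ys)

    def index(v: float, lo: float, hi: float) -> int:
        if hi == lo:
            return 0
        return min(int((v - lo) / (hi - lo) * cells), cells - 1)

    buckets: Dict[Tuple[int, int], List[Row]] = defaultdict(list)
    for r in rows:
        buckets[(index(r["x"], lo_x, hi_x), index(r["y"], lo_y, hi_y))].append(r)

    # Replace each cluster (grid cell) by the mean of every column.
    return {
        cell: {k: sum(m[k] for m in members) / len(members) for k in members[0]}
        for cell, members in buckets.items()
    }
```

The best region of the landscape is then the cell whose centroid has the lowest score, e.g. `min(cents, key=lambda c: score(cents[c]))`.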

Page 10: Idea Engineering

Sanity check: what information is lost?

• E.g. POI-3
  – 400+ examples
  – 20 centroids

• Prediction via:
  – Extrapolation between the two nearest centroids

• Works as well as
  – Random forest, Naïve Bayes
    • For defect prediction (10 data sets)
  – Linear regression, M5’
    • For effort estimation (10 data sets)
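Prediction from the two nearest centroids can be sketched as a weighted blend of their target values. The slide does not give the formula, so this sketch assumes inverse-distance weighting (which interpolates between the two centroids rather than projecting beyond them) and Euclidean distance on the independent columns; `predict` and its argument layout are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Centroid = Tuple[Sequence[float], float]  # (independent values, target value)


def predict(row: Sequence[float], centroids: List[Centroid]) -> float:
    """Estimate the target for `row` from its two nearest centroids.

    Assumption: inverse-distance weighting; if `row` sits exactly on a
    centroid, that centroid's target is returned unchanged.
    """
    if len(centroids) < 2:
        raise ValueError("need at least two centroids")
    (x1, y1), (x2, y2) = sorted(centroids, key=lambda c: math.dist(row, c[0]))[:2]
    d1, d2 = math.dist(row, x1), math.dist(row, x2)
    if d1 == 0:
        return y1
    w1, w2 = 1 / d1, 1 / d2  # d2 >= d1 > 0 here, so both weights are finite
    return (w1 * y1 + w2 * y2) / (w1 + w2)
```

With only 20 centroids standing in for 400+ examples, each prediction touches 20 points instead of the full data set, which is what makes the comparison against Random forest or M5’ a useful check on information loss.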

Page 11: Idea Engineering


Planning = inter-cluster contrast sets

• Find the deltas between neighbors that go from worse to better
• Very small rules, found in log-linear time
• Menzies et al. [TSE’13]
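An inter-cluster contrast set can be sketched as follows: from a cluster's centroid, find the nearest centroid that scores better and report the few attributes that change most between them. This is a minimal sketch under stated assumptions, not the method of Menzies et al. [TSE’13]: centroids are numeric dicts, a lower score is better, "neighbor" means Euclidean-nearest over the chosen attributes, and the search here is a simple linear scan rather than the log-linear procedure the slide mentions. `plan` and `top` are hypothetical names.

```python
import math
from typing import Callable, Dict, List, Tuple

Centroid = Dict[str, float]


def plan(
    here: Centroid,
    centroids: List[Centroid],
    score: Callable[[Centroid], float],
    attrs: List[str],
    top: int = 2,
) -> List[Tuple[str, float]]:
    """Return up to `top` (attribute, change) pairs that move `here` toward
    the nearest centroid with a better (assumed: lower) score; [] if none."""
    better = [c for c in centroids if score(c) < score(here)]
    if not better:
        return []

    def gap(c: Centroid) -> float:
        # Euclidean distance over the chosen attributes only.
        return math.dist([here[a] for a in attrs], [c[a] for a in attrs])

    target = min(better, key=gap)
    deltas = [(a, target[a] - here[a]) for a in attrs if target[a] != here[a]]
    # Keep only the largest changes so the resulting rule stays small.
    deltas.sort(key=lambda p: abs(p[1]), reverse=True)
    return deltas[:top]
```

Preferring the nearest better neighbor, rather than the globally best cluster, keeps the recommended change small, which matches the slide's emphasis on very small rules.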

Page 12: Idea Engineering

Applications

• Prediction
• Planning
• Monitoring
• Multi-objective optimization
  – Cluster first on N objectives
• Anomaly detection
• Incremental theory revision
• Compression
• Privacy
• etc.

Page 13: Idea Engineering

Idea Engineering


Beyond Data Mining, T. Menzies, IEEE Software, 2013, to appear


Q: why call it mining?

• A1: because all the primitives for the above are in the data mining literature
  • So we know how to get from here to there

• A2: because data mining scales

