Experimentation in Computer Science (Part 2)

Experimentation in Software Engineering: Outline
- Empirical Strategies
- Measurement
- Experiment Process
Experiment Process: Phases

Experiment Idea → Experiment Process (Experiment Definition → Experiment Planning → Experiment Operation → Analysis & Interpretation → Presentation & Package) → Conclusions
Experiment Process: Phases Defined

- Experiment Idea: ask the right question (insight)
- Experiment Definition: ask the question right
- Experiment Planning: design an experiment to answer the question
- Experiment Operation: collect metrics
- Analysis and Interpretation: statistically evaluate the results and determine the practical consequences
- Presentation: disseminate the results
Experiment Definition: Overview

- Formulate the experiment idea: ask the right question
- Define goals: why conduct the experiment
- State research questions:
  - Descriptive: what percentage of developers use OO?
  - Relational: what percentage of experienced vs. novice developers use OO?
  - Causal: what is the average productivity of developers using OO versus developers using non-OO?
Experiment Definition: Overview – Example

How do test suite size and test case composition affect the costs and benefits of web testing methodologies?
Experiment Planning: Overview

Experiment Definition → Experiment Planning → Experiment Operation

Planning steps: Context Selection → Hypothesis Formulation → Variables Selection → Selection of Subjects → Experiment Design → Instrumentation → Validity Evaluation
Experiment Planning: Context Selection

Context comprises the environment and the personnel. Its dimensions include:
- off-line vs. on-line
- student vs. professional personnel
- toy vs. real problems
- specific vs. general software engineering domain

Selection driver: validity vs. cost
Experiment Planning: Hypothesis Formulation

- Hypothesis: a formal statement related to a research question
- Forms the basis for statistical analysis of the results through hypothesis testing
- Data collected in the experiment are used, if possible, to reject the hypothesis
Experiment Planning: Hypothesis Formulation

There are two hypotheses for each question of interest:
- Null hypothesis, H0: describes the state in which the prediction does not hold.
- Alternative hypothesis (Ha, H1, etc.): describes the prediction we believe will be supported by the evidence.

The goal of the experiment is to reject H0 with as high significance as possible; this rejection then implies acceptance of the alternative hypothesis.
Experiment Planning: Hypothesis Formulation

Hypothesis testing involves risks:
- Type I error: the probability of rejecting a true null hypothesis. In this case we infer a pattern or relationship that does not exist.
- Type II error: the probability of not rejecting a false null hypothesis. In this case we fail to identify a pattern or relationship that does exist.
- Power of a statistical test: the probability that the test will reveal a true pattern if the null hypothesis is false: 1 - P(Type II error).
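To make the power relationship concrete, here is a minimal stdlib-Python sketch for a one-sided z-test (assuming a known standard deviation and the usual 1.645 critical value for alpha = 0.05; the effect size and sample sizes are illustrative, not from the slides):

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2)))

def power_one_sided_z(delta, sigma, n, z_alpha=1.645):
    """Power of a one-sided z-test detecting a true mean shift `delta`,
    with known std dev `sigma` and sample size `n`.  z_alpha is the
    critical value (1.645 for alpha = 0.05).  Power = 1 - P(Type II error)."""
    return norm_cdf(delta * math.sqrt(n) / sigma - z_alpha)

# A shift of half a standard deviation, 30 subjects:
p = power_one_sided_z(delta=0.5, sigma=1.0, n=30)
```

Doubling the sample size raises the power, which is exactly the cost/validity trade-off discussed under sample-size selection below.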
Experiment Planning: Variable Selection

Types of variables to select:
- Independent: manipulated by the investigator or by nature
- Dependent: affected by changes in the independent variables

Also select:
- Measures and measurement scales
- Ranges for the variables
- Specific levels of the independent variables to be used
Experiment Planning: Selection of Subjects/Objects

The selection process strongly affects the ability to generalize results. Process for selecting subjects/objects:
- Identify the population U
- Draw a sample from U using a sampling technique
Experiment Planning: Selection of Subjects/Objects

Probability sampling:
- Simple random: randomly select from U
- Systematic random: select the first subject from U at random, then select every nth after that
- Stratified random: divide U into strata following a known distribution, then apply random sampling within each stratum

Non-probability sampling:
- Convenience: select the nearest, most convenient subjects
- Quota: used to get subjects from various elements of a population; convenience sampling is used for each element
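The probability-sampling techniques above can be sketched with Python's stdlib `random` module (the population and the stratum names and sizes are hypothetical):

```python
import random

population = list(range(100))  # hypothetical population U of 100 subjects

# Simple random sampling: every subject equally likely to be chosen.
simple = random.sample(population, 10)

# Systematic random sampling: random start, then every nth subject.
n = len(population) // 10
start = random.randrange(n)
systematic = population[start::n]

# Stratified random sampling: U divided into strata (two hypothetical
# experience levels), random sampling within each stratum in
# proportion to its size.
strata = {"novice": population[:70], "expert": population[70:]}
stratified = [s for members in strata.values()
              for s in random.sample(members, len(members) // 10)]
```

Note how the stratified sample preserves the 70/30 split of the strata, which simple random sampling only approximates on average.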
Experiment Planning: Selection of Subjects/Objects

- Larger sample sizes give lower error.
- If the population has large variability, a larger sample size is needed.
- The data analysis methods may influence the choice of sample size.
- However, a larger sample size implies higher cost.
- Hence, we want a sample as small as possible, but large enough that we can generalize.
Experiment Planning: Experiment Design - Principles

- Randomization. Statistical methods require that observations be made from independent random variables; this applies to subjects, objects, and treatments.
- Blocking. Given a factor that may affect results but that we are not interested in, we block subjects, objects, or techniques with respect to that factor and analyze the blocks independently (e.g., program in the TSE paper).
- Balancing. Assign treatments such that each has an equal number of subjects; not essential, but it simplifies and strengthens the statistical analysis.
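A minimal Python sketch of how the three principles combine when assigning subjects to treatments (the subject IDs, block names, and treatment labels are all hypothetical):

```python
import random

def assign_balanced(subjects, treatments):
    """Randomly assign subjects to treatments so that each treatment
    receives an equal number of subjects (randomization + balancing)."""
    order = subjects[:]
    random.shuffle(order)                     # randomization
    per_treatment = len(order) // len(treatments)
    return {t: order[i * per_treatment:(i + 1) * per_treatment]
            for i, t in enumerate(treatments)}

# Blocking: split subjects by a nuisance factor (hypothetical
# experience level), then randomize/balance within each block.
blocks = {"novice": [1, 2, 3, 4], "expert": [5, 6, 7, 8]}
design = {b: assign_balanced(subs, ["old", "new"])
          for b, subs in blocks.items()}
```

Each block can then be analyzed independently, as the blocking principle requires.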
Experiment Planning: Experiment Design - Design Types

We will consider several design types, suitable for experiments with:
- one factor with two treatments
- one factor with more than two treatments
- two factors, each with two treatments
- more than two factors, each with two treatments

Notation: μi is the mean of the dependent variable under treatment i.
Experiment Planning: Experiment Design – 1 Factor, 2 Treatments

- Design type: completely randomized
- Description: simple means comparison
- Example hypotheses: H0: μ1 = μ2; H1: μ1 ≠ μ2, μ1 > μ2, or μ1 < μ2
- Example analyses: t-test, Mann-Whitney

Assignment: each of the six subjects is randomly assigned to exactly one of the two treatments (one X per row of the assignment table).
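For this design, the t-test compares the two sample means relative to their pooled variance. A stdlib-Python sketch (the fault counts are made up for illustration):

```python
import statistics as st

def pooled_t(sample1, sample2):
    """Two-sample t statistic with pooled variance, for testing
    H0: mu1 = mu2 in a completely randomized two-treatment design."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = st.mean(sample1), st.mean(sample2)
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * st.variance(sample1) +
           (n2 - 1) * st.variance(sample2)) / (n1 + n2 - 2)
    return (m1 - m2) / (sp2 * (1 / n1 + 1 / n2)) ** 0.5

# Hypothetical fault counts under two testing methods:
t = pooled_t([7, 9, 8], [4, 5, 3])
# Compare |t| to the critical value for n1 + n2 - 2 degrees of
# freedom (e.g. 2.776 for 4 df at alpha = 0.05, two-sided).
```

In practice one would use a statistics library for the p-value; the sketch only shows where the statistic comes from.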
Experiment Planning: Experiment Design – 1 Factor, 2 Treatments (Example)

EXAMPLE: Investigate whether humans using a new testing method detect faults better than humans using a previous method. The factor is the method, the treatments are the old and new methods, and the dependent variable could be the number of faults found.
Experiment Planning: Experiment Design – 1 Factor, 2 Treatments

- Design type: paired comparison
- Description: compares differences between techniques more precisely; beware of learning effects
- Example hypotheses: H0: μd = 0 (μd = mean of the differences); H1: μd ≠ 0, μd > 0, or μd < 0
- Example analyses: paired t-test, sign test, Wilcoxon

Each subject applies both treatments; the cell values give the order of application:

Subject   Trtmt 1   Trtmt 2
1         2         1
2         1         2
3         2         1
4         2         1
5         1         2
6         1         2
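One of the listed analyses, the sign test, is simple enough to sketch directly: under H0 each nonzero per-subject difference is equally likely to be positive or negative, so the p-value is a binomial tail probability. The differences below are hypothetical:

```python
from math import comb

def sign_test_p(diffs):
    """Two-sided sign test p-value for H0: median difference = 0.
    Ties (zero differences) are discarded."""
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    plus = sum(1 for d in nonzero if d > 0)
    k = max(plus, n - plus)
    # P(at least k successes in n fair coin flips), doubled for two sides:
    p = 2 * sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(p, 1.0)

# Hypothetical per-subject differences (new method minus old):
p = sign_test_p([2, 1, 3, 1, 2, -1, 1, 2])
```

The sign test discards the magnitudes of the differences; the Wilcoxon and paired t-test listed above use them and therefore have more power.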
Experiment Planning: Experiment Design – 1 Factor, 2 Treatments (Example)

EXAMPLE: Investigate whether a new testing criterion facilitates fault detection better than a previous criterion. The factor is the criterion, the treatments are use of the old and new criteria, and the dependent variable could be the number of faults found.
Experiment Planning: Experiment Design – 1 Factor, 3+ Treatments

- Design type: completely randomized
- Description: means comparison
- Example hypotheses: H0: μ1 = μ2 = μ3 = … = μa; H1: μi ≠ μj for some pair (i, j)
- Example analyses: ANOVA, Kruskal-Wallis

Assignment: each of the six subjects is randomly assigned to exactly one of the three treatments (one X per row of the assignment table).
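The ANOVA F statistic for this design compares between-treatment variability to within-treatment variability. A stdlib-Python sketch (the fault counts are illustrative):

```python
import statistics as st

def anova_f(*groups):
    """One-way ANOVA F statistic for H0: mu1 = mu2 = ... = mua."""
    a = len(groups)
    n = sum(len(g) for g in groups)
    grand = st.mean([x for g in groups for x in g])
    # Between-treatment and within-treatment sums of squares:
    ss_between = sum(len(g) * (st.mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - st.mean(g)) ** 2 for g in groups for x in g)
    # F is the ratio of the corresponding mean squares.
    return (ss_between / (a - 1)) / (ss_within / (n - a))

# Hypothetical fault counts under three methods:
f = anova_f([7, 9, 8], [4, 5, 3], [6, 5, 7])
```

A large F (relative to the F distribution with a-1 and n-a degrees of freedom) leads to rejecting H0 that all treatment means are equal.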
Experiment Planning: Experiment Design – 1 Factor, 3+ Treatments (Example)

EXAMPLE: Investigate whether humans using a new testing method detect faults better than humans using two previous methods. The factor is the method, the treatments are the new and the two old methods, and the dependent variable could be the number of faults found.
Experiment Planning: Experiment Design – 1 Factor, 3+ Treatments

- Design type: randomized complete block
- Description: compares differences; especially useful if there is large variability between subjects
- Example hypotheses: H0: μ1 = μ2 = μ3 = … = μa; H1: μi ≠ μj for some pair (i, j)
- Example analyses: ANOVA, Kruskal-Wallis

Each subject applies all three treatments; the cell values give the order of application:

Subject   Trtmt 1   Trtmt 2   Trtmt 3
1         1         3         2
2         3         1         2
3         2         3         1
4         2         1         3
5         3         2         1
6         1         2         3
Experiment Planning: Experiment Design – 1 Factor, 3+ Treatments (Example)

EXAMPLE: Investigate whether a new testing criterion facilitates fault detection better than two previous criteria. The factor is the criterion, the treatments are use of the new and the old criteria, and the dependent variable could be the number of faults found.
Experiment Planning: Experiment Design – 2 Factors, 2 Treatments

- Design type: 2*2 factorial; each factor has two treatments
- Three hypotheses: the effect of treatment Ai, the effect of treatment Bi, and the effect of the interaction between Ai and Bi
- Example hypotheses (instantiated for each treatment and for the interaction): H0: τ1 = τ2 = 0; H1: at least one τi ≠ 0
- Example analyses: ANOVA

Subject assignment:

           Trtmt A1        Trtmt A2
Trtmt B1   Subjects 4, 6   Subjects 1, 7
Trtmt B2   Subjects 2, 3   Subjects 5, 8
Experiment Planning: Experiment Design – 2 Factors, 2 Treatments (Example)

EXAMPLE: Investigate the regression testability of code using retest-all and regression test selection, in the case where tests are coarse-grained and the case where they are fine-grained. Factor A is the technique, Factor B is the granularity. The design is 2*2 factorial because both factors have two treatments and every combination of treatments occurs.
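The three hypotheses of the 2*2 factorial correspond to two main effects and an interaction, which can be estimated directly from the cell means. A stdlib-Python sketch with made-up observations (a full analysis would use ANOVA, as listed above):

```python
import statistics as st

# Hypothetical observations for each (Factor A, Factor B) combination:
cells = {
    ("A1", "B1"): [10, 12],
    ("A2", "B1"): [14, 16],
    ("A1", "B2"): [11, 13],
    ("A2", "B2"): [21, 23],
}
mean = {k: st.mean(v) for k, v in cells.items()}

# Main effect of A: average change when moving from A1 to A2.
effect_A = ((mean[("A2", "B1")] - mean[("A1", "B1")]) +
            (mean[("A2", "B2")] - mean[("A1", "B2")])) / 2
# Main effect of B, analogously.
effect_B = ((mean[("A1", "B2")] - mean[("A1", "B1")]) +
            (mean[("A2", "B2")] - mean[("A2", "B1")])) / 2
# Interaction: does the effect of A depend on the level of B?
interaction = ((mean[("A2", "B2")] - mean[("A1", "B2")]) -
               (mean[("A2", "B1")] - mean[("A1", "B1")])) / 2
```

A nonzero interaction means the two factors cannot be interpreted independently, which is why the factorial design tests all combinations.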
Experiment Planning: Experiment Design – k Factors, 2 Treatments

Given k factors, results can depend on each factor or on interactions among them. A 2^k design has k factors, each with two treatments, and tests all combinations. The hypotheses and analyses are the same as for the 2*2 factorial.

Subject assignment (2^3 example):

Fctr A   Fctr B   Fctr C   Subjects
A1       B1       C1       2, 3
A2       B1       C1       1, 13
A1       B2       C1       5, 6
A2       B2       C1       10, 16
A1       B1       C2       7, 15
A2       B1       C2       8, 11
A1       B2       C2       4, 9
A2       B2       C2       12, 14
Experiment Planning: Experiment Design – k Factors, 2 Treatments

As the number of factors grows, the expense grows. If high-order interactions can be assumed to be negligible, it is possible to run only a fraction of the complete factorial. This approach may be used, in particular, in exploratory studies to identify the factors having large effects. Results can be strengthened by running other fractions in sequence.

One-half fractional factorial design of the 2^3 factorial design:

Fctr A   Fctr B   Fctr C   Subjects
A2       B1       C1       2, 3
A1       B2       C1       1, 8
A1       B1       C2       5, 6
A2       B2       C2       4, 7

Combinations are selected such that if any one factor is removed, the remaining design is a full 2^(k-1) factorial.
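The half fraction in the table above corresponds to the defining relation C = A*B (in -1/+1 coding of the two treatment levels: A1 = -1, A2 = +1, and so on). A short Python sketch that generates such a fraction and checks the stated property:

```python
from itertools import product

# Full 2^3 factorial: every combination of two levels per factor.
full = list(product([-1, 1], repeat=3))          # (A, B, C) runs

# One-half fraction with defining relation C = A*B: keep only the
# runs where the C level equals the product of the A and B levels.
half = [run for run in full if run[2] == run[0] * run[1]]
```

Projecting the four retained runs onto any two of the three factors recovers a full 2^2 design, which is exactly the selection criterion stated above.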