Causal Modeling of Observational Cost Data: A Ground-Breaking use of Directed Acyclic Graphs
COCOMO 2015
11/17/15
© 2015 Carnegie Mellon University
Distribution Statement A: Approved for Public Release; Distribution is Unlimited
Bob Stoddard, SEMA
Mike Konrad, SEMA
Copyright 2015 Carnegie Mellon University
This material is based upon work funded and supported by the Department of Defense under Contract No. FA8721-05-C-0003 with Carnegie Mellon University for the operation of the Software Engineering Institute, a federally funded research and development center.
Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the United States Department of Defense.
References herein to any specific commercial product, process, or service by trade name, trade mark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by Carnegie Mellon University or its Software Engineering Institute.
NO WARRANTY. THIS CARNEGIE MELLON UNIVERSITY AND SOFTWARE ENGINEERING INSTITUTE MATERIAL IS FURNISHED ON AN “AS-IS” BASIS. CARNEGIE MELLON UNIVERSITY MAKES NO WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED, AS TO ANY MATTER INCLUDING, BUT NOT LIMITED TO, WARRANTY OF FITNESS FOR PURPOSE OR MERCHANTABILITY, EXCLUSIVITY, OR RESULTS OBTAINED FROM USE OF THE MATERIAL. CARNEGIE MELLON UNIVERSITY DOES NOT MAKE ANY WARRANTY OF ANY KIND WITH RESPECT TO FREEDOM FROM PATENT, TRADEMARK, OR COPYRIGHT INFRINGEMENT.
[Distribution Statement A] This material has been approved for public release and unlimited distribution. Please see Copyright notice for non-US Government use and distribution.
This material may be reproduced in its entirety, without modification, and freely distributed in written or electronic form without requesting formal permission. Permission is required for any other use. Requests for permission should be directed to the Software Engineering Institute at [email protected].
Carnegie Mellon® is registered in the U.S. Patent and Trademark Office by Carnegie Mellon University.
DM-0003059
Agenda
Problem of Developing CERs (Cost Estimating Relationships)
Why Causation instead of Correlation
Causal Modeling using DAGs (Directed Acyclic Graphs)
Examples
Call for Action and Collaboration
Problem of Developing CERs
Many CERs are built using traditional correlation and statistical regression modeling
However, serious concerns exist in using these methods for the development of CERs, namely:
• What if other factors not represented in the model are responsible for the cost effects?
• What if there are convoluted factors impacting cost?
• What if cost analysts decide to interpret the regression coefficients as the degree of influence on cost?
• How do cost analysts confidently know that the CER parameters influence cost as compared to other factors that are correlated with these parameters?
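The first concern above can be illustrated with a small simulation. The sketch below is not from the presentation: the variable names (complexity, size, cost) and every coefficient are hypothetical, and it uses only the Python standard library. It shows a regression of cost on size alone overstating the effect of size when an unrecorded factor drives both, and the estimate recovering once that factor is controlled.

```python
import random

def slope(x, y):
    """Ordinary least-squares slope of y regressed on x alone."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(x, y):
    """The part of y that is not linearly explained by x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = slope(x, y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]

rng = random.Random(0)
n = 20000
TRUE_EFFECT = 2.0  # assumed causal effect of size on cost

# Hypothetical data-generating process: complexity raises both size and cost.
complexity = [rng.gauss(0, 1) for _ in range(n)]
size = [0.8 * c + rng.gauss(0, 1) for c in complexity]
cost = [TRUE_EFFECT * s + 3.0 * c + rng.gauss(0, 1)
        for s, c in zip(size, complexity)]

# Regression that leaves complexity out of the model.
naive = slope(size, cost)

# Controlling for complexity: regress the residual of cost on the residual
# of size (equivalent to the size coefficient in a two-variable regression).
adjusted = slope(residuals(complexity, size), residuals(complexity, cost))

print(f"naive coefficient:    {naive:.2f}")
print(f"adjusted coefficient: {adjusted:.2f} (assumed true effect {TRUE_EFFECT})")
```

Under these assumed coefficients the naive coefficient converges to about 3.46 rather than 2.0, so reading it as the degree of influence of size on cost would be misleading.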
Why Traditional Correlation Falls Short
Los Angeles Times
May 12, 2014
http://www.latimes.com/business/hiltzik/la-fi-mh-see-correlation-is-not-causation-20140512-column.html
Why Causal Modeling is a Game Changer
Causal Modeling – Dr. Judea Pearl
Quotes by Judea Pearl
“… I see no greater impediment to scientific progress than the prevailing practice of focusing all of our mathematical resources on probabilistic and statistical inferences while leaving causal considerations to the mercy of intuition and good judgment.”
Pearl, J. (2009). Causality. Cambridge University Press. (Preface to 1st Edition)
“The development of Bayesian Networks, so people tell me, marked a turning point in the way uncertainty is handled in computer systems. For me, this development was a stepping stone towards a more profound transition, from reasoning about beliefs to reasoning about causal and counterfactual relationships.”
Judea Pearl: From Bayesian Networks to Causal and Counterfactual Reasoning. Keynote Lecture at the 2014 BayesiaLab User Conference. Recorded on September 24, 2014, in Los Angeles.
Causal Modeling – Dr. Stephen Morgan
CMU Causal Modeling Researchers-01
CMU Causal Modeling Researchers-02
Causal Inference with Directed Graphs Training
2-Day Seminar offered by Dr. Felix Elwert, University of Wisconsin
Available through two channels:
Statistical Horizons: www.statisticalhorizons.com
BayesiaLab: http://www.bayesia.us/causal-inference-course-fairfax
Landscape of Causal Modeling
Raw Observational Data
Statistical Discovery of Causal Relationships to create the DAG (CMU Faculty)
Quantifying Causal Relations using DAG graph surgery and Instrumental Variables (Pearl & Elwert)
Identity of true causal parameters of cost
Use of Directed Acyclic Graphs
1. Derive testable implications of a causal model to evaluate whether the model is correct
2. Understand causal identification requirements to confirm whether causality may be extracted from the data
• Separating causal from spurious associations in the data
3. Inform use of traditional statistical techniques such as regression
• Deciding which control variables to include in the analysis, and which to leave out, to achieve identification of causality
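Item 1 can be made concrete with a minimal sketch. It assumes a hypothetical three-variable model X -> M -> Y with linear effects and invented coefficients, none of which come from the slides. That model implies that X and Y are independent once M is held fixed, and the partial correlation below checks that implication on simulated data. A second data set with an extra direct X -> Y arrow shows the check failing.

```python
import random

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

def partial_corr(a, b, given):
    """Correlation of a and b after linearly removing `given` from both."""
    r_ab, r_ag, r_bg = corr(a, b), corr(a, given), corr(b, given)
    return (r_ab - r_ag * r_bg) / ((1 - r_ag ** 2) * (1 - r_bg ** 2)) ** 0.5

rng = random.Random(0)
n = 20000

# Hypothetical chain X -> M -> Y (coefficients are assumptions).
x = [rng.gauss(0, 1) for _ in range(n)]
m = [0.9 * xi + rng.gauss(0, 1) for xi in x]
y = [0.7 * mi + rng.gauss(0, 1) for mi in m]

# Same chain plus a direct X -> Y arrow that the hypothesized DAG omits.
y_direct = [0.7 * mi + 0.5 * xi + rng.gauss(0, 1) for mi, xi in zip(m, x)]

marginal = corr(x, y)                    # X and Y are associated overall
implied = partial_corr(x, y, m)          # the chain implies this is near zero
violated = partial_corr(x, y_direct, m)  # clearly nonzero: the chain is rejected

print(f"corr(X, Y)            = {marginal:.3f}")
print(f"corr(X, Y | M)        = {implied:.3f}")
print(f"corr(X, Y_direct | M) = {violated:.3f}")
```

Partial correlation only detects linear dependence, so this check is a simplification of the nonparametric independence tests a full analysis would use.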
Basic Concepts of DAGs
1. DAGs consist of:
a) nodes (variables),
b) directed arrows (possible causal relationships ordered by time), and
c) missing arrows (confident assumptions about absence of causal effects)
2. DAGs are nonparametric
a) No distributional assumptions
b) Linear and/or nonlinear
3. DAGs have both causal paths and non-causal (spurious) paths
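These concepts can be sketched in a few lines of Python. The graph below is a hypothetical cost DAG invented for illustration (the node names are assumptions, not from the slides). The code stores the arrows as a dictionary, checks that the graph has no cycles, and lists every path between two nodes, labeling a path causal only when all of its arrows point from the first node toward the second.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical DAG: each key lists the nodes it points to (its direct effects).
dag = {
    "complexity": ["size", "cost"],
    "size": ["effort"],
    "effort": ["cost"],
    "cost": [],
}

def is_acyclic(graph):
    """True if the directed graph contains no cycle."""
    # TopologicalSorter reads the mapping as node -> predecessors; reversing
    # every arrow does not create or remove cycles, so child lists work too.
    try:
        tuple(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False

def paths(graph, start, end):
    """All simple paths from start to end, ignoring arrow direction.

    Each path is a list of (node, next_node, forward) steps, where forward
    is True when the arrow runs node -> next_node.
    """
    neighbors = {node: [] for node in graph}
    for parent, children in graph.items():
        for child in children:
            neighbors[parent].append((child, True))
            neighbors[child].append((parent, False))
    found = []

    def walk(node, visited, steps):
        if node == end:
            found.append(steps)
            return
        for nxt, forward in neighbors[node]:
            if nxt not in visited:
                walk(nxt, visited | {nxt}, steps + [(node, nxt, forward)])

    walk(start, {start}, [])
    return found

def is_causal(path):
    """A path is causal when every arrow on it points forward."""
    return all(forward for _, _, forward in path)

found = paths(dag, "size", "cost")
for path in found:
    text = path[0][0] + "".join(
        (" -> " if forward else " <- ") + nxt for _, nxt, forward in path)
    print("causal    " if is_causal(path) else "non-causal", text)
```

In this made-up graph, size reaches cost through one causal path (via effort) and one non-causal path (back through complexity), which is the situation item 3 describes.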
Three Structures Studied in a DAG
1. Indirect Connection
2. Common Cause
3. Common Effect (Collider)
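A small simulation may help show why the three structures behave differently. The sketch below uses binary variables A, B, C with invented probabilities (all of them assumptions for illustration). For each structure it measures the association between A and C twice: over all the data, and within the stratum B = 1.

```python
import random

rng = random.Random(1)
N = 40000

def flip(p):
    """Return 1 with probability p, else 0."""
    return int(rng.random() < p)

def chain():
    """Indirect connection: A -> B -> C."""
    a = flip(0.5)
    b = flip(0.8 if a else 0.2)
    c = flip(0.8 if b else 0.2)
    return a, b, c

def fork():
    """Common cause: A <- B -> C."""
    b = flip(0.5)
    a = flip(0.8 if b else 0.2)
    c = flip(0.8 if b else 0.2)
    return a, b, c

def collider():
    """Common effect: A -> B <- C."""
    a = flip(0.5)
    c = flip(0.5)
    b = flip(0.9 if (a or c) else 0.1)
    return a, b, c

def assoc(rows, given_b=None):
    """P(C=1 | A=1) - P(C=1 | A=0), optionally within the stratum B == given_b."""
    if given_b is not None:
        rows = [r for r in rows if r[1] == given_b]
    c_a1 = [c for a, _, c in rows if a == 1]
    c_a0 = [c for a, _, c in rows if a == 0]
    return sum(c_a1) / len(c_a1) - sum(c_a0) / len(c_a0)

results = {}
for name, draw in (("chain", chain), ("fork", fork), ("collider", collider)):
    rows = [draw() for _ in range(N)]
    results[name] = (assoc(rows), assoc(rows, given_b=1))
    print(f"{name:9s} marginal={results[name][0]:+.2f}  given B=1={results[name][1]:+.2f}")
```

In the chain and the fork, A and C are associated overall and the association vanishes once B is fixed. In the collider the pattern reverses: A and C are unrelated overall, and fixing B creates an association between them.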
Deriving Testable Implications of a DAG
1. Uses a technique called d-Separation
a) Algorithm to help determine which paths are causal versus non-causal
b) Uses concept of blocking a path to stop transmission of non-causal association
2. Additional techniques employed include
a) Graphical identification
b) Adjustment Criterion
c) Backdoor Criterion
d) Frontdoor Criterion
e) Pearl’s do-Calculus
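One standard way to decide d-separation is the ancestral moral graph procedure, sketched below in plain Python. This is an illustration rather than the implementation used by any tool named in this presentation, and the example graph and its node names are hypothetical.

```python
from itertools import combinations

def d_separated(parents, x, y, given=()):
    """True if x and y are d-separated by `given` in a DAG.

    `parents` maps each node to the set of its direct causes.
    """
    given = set(given)
    # 1. Keep only x, y, the conditioning set, and all of their ancestors.
    keep, stack = set(), [x, y, *given]
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents.get(node, ()))
    # 2. Moralize: drop arrow directions and link parents that share a child.
    edges = {node: set() for node in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for p in ps:
            edges[child].add(p)
            edges[p].add(child)
        for p, q in combinations(ps, 2):
            edges[p].add(q)
            edges[q].add(p)
    # 3. Treat conditioning nodes as removed, then search for a route x .. y.
    seen, stack = set(), [x]
    while stack:
        node = stack.pop()
        if node == y:
            return False  # still connected, so not d-separated
        if node in seen or node in given:
            continue
        seen.add(node)
        stack.extend(edges[node])
    return True

# Hypothetical DAG: complexity -> size -> effort -> cost, complexity -> cost.
parents = {
    "size": {"complexity"},
    "effort": {"size"},
    "cost": {"effort", "complexity"},
}

print(d_separated(parents, "size", "cost"))                              # False
print(d_separated(parents, "size", "cost", {"effort", "complexity"}))    # True
print(d_separated(parents, "complexity", "effort", {"size"}))            # True
print(d_separated(parents, "complexity", "effort", {"size", "cost"}))    # False
```

The last query shows the collider rule at work: complexity and effort are separated by size alone, but adding their common effect (cost) to the conditioning set reconnects them. Each `True` result is a testable implication, a conditional independence the data should exhibit if the DAG is right.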
Blocking or Adjusting Paths
1. Controlling a variable
2. Stratifying a variable
3. Setting evidence on a variable
4. Observing a variable
5. Matching a variable (e.g., making distributions of sub-populations as similar as possible for comparison)
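Stratifying, the second option in the list, can be sketched as follows. The setup is hypothetical: a binary confounder z, a binary practice x, and a cost outcome y with an assumed true effect of +5, none of which come from the slides. The naive comparison mixes the effect of x with the effect of z. Comparing within each level of z and then averaging by the share of each level blocks the non-causal path through z.

```python
import random

rng = random.Random(2)
rows = []
for _ in range(40000):
    z = int(rng.random() < 0.5)                     # hypothetical confounder
    x = int(rng.random() < (0.8 if z else 0.2))     # x is more common when z = 1
    y = 10.0 + 5.0 * x + 8.0 * z + rng.gauss(0, 1)  # assumed true effect of x: +5
    rows.append((z, x, y))

def mean(values):
    return sum(values) / len(values)

def contrast(data):
    """Mean outcome with x = 1 minus mean outcome with x = 0."""
    treated = [y for _, x, y in data if x == 1]
    untreated = [y for _, x, y in data if x == 0]
    return mean(treated) - mean(untreated)

# Unadjusted comparison: the path x <- z -> y is left open.
naive = contrast(rows)

# Stratify on z, then weight each within-stratum contrast by P(z).
adjusted = 0.0
for level in (0, 1):
    stratum = [r for r in rows if r[0] == level]
    adjusted += len(stratum) / len(rows) * contrast(stratum)

print(f"naive contrast:      {naive:.2f}")
print(f"stratified contrast: {adjusted:.2f} (assumed true effect 5.0)")
```

Stratification works here because z is observed and takes only two values; with many or continuous adjustment variables, matching or regression adjustment would be the more practical choice.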
Excerpts taken from:
Example: Causality Modeling with BayesiaLab
Cost Estimation Example
Use the CMU tool, Tetrad, to discover causal parameters in a data set containing a wide variety of factors deemed relevant to cost, or
Hypothesize a set of factors related to cost and their interrelationships, then model them causally using Pearl graph surgery or instrumental variable analysis in Stata
Factors may include existing cost parameters as well as new or emergent cost influences, such as Agile and DevOps
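The idea behind instrumental variable analysis can be sketched without Stata. The plain-Python stand-in below uses simulated data with invented coefficients, so it is an illustration under stated assumptions rather than the workflow the presenters used. It assumes an unobserved confounder u that affects both a cost driver x and cost y, and an instrument w that shifts x but has no other route to y.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / n

rng = random.Random(3)
n = 40000
TRUE_EFFECT = 2.0  # assumed causal effect of x on y

u = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder
w = [rng.gauss(0, 1) for _ in range(n)]  # instrument: affects x only
x = [wi + ui + rng.gauss(0, 1) for wi, ui in zip(w, u)]
y = [TRUE_EFFECT * xi + 3.0 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

# Ordinary regression of y on x is biased because u cannot be controlled.
ols = cov(x, y) / cov(x, x)

# With a single instrument, the IV estimate is the ratio of the
# instrument's covariance with y to its covariance with x.
iv = cov(w, y) / cov(w, x)

print(f"OLS estimate: {ols:.2f}")
print(f"IV estimate:  {iv:.2f} (assumed true effect {TRUE_EFFECT})")
```

The estimate is only as good as the instrument: if w had its own arrow into y, or were itself driven by u, the ratio would no longer recover the effect. Checking those missing arrows against the hypothesized DAG is where the graph earns its keep.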
Call for Action and Collaboration
Causal modeling with observational data is practical
Causal modeling informs which variables to include in experimental research
You should consider building causal methodology into your CER development
Practical methods and tooling now exist to discover (Tetrad) and model (Tetrad, Stata) causal relationships in data
We (SEI) seek to partner with you in developing CERs by applying causal methods to your data
Contact Information
Points of Contact
SEMA Cost Estimation Research Group
Robert [email protected]
Mike [email protected]
U.S. Mail
Software Engineering Institute
Customer Relations
4500 Fifth Avenue
Pittsburgh, PA 15213-2612, USA
Web
www.sei.cmu.edu
www.sei.cmu.edu/contact.cfm
Customer Relations
Email: [email protected]
Telephone: +1 412-268-5800
SEI Phone: +1 412-268-5800
SEI Fax: +1 412-268-6257