1/17/2019 Lecture 1: Introduction
Foundations of Data Analysis
Lecture 1: Introduction
Pierre-Simon Laplace (1749–1827)
Births in Paris, 1745–1770
251,527 Boys 241,945 Girls
Are males born at a higher rate than females?
251,527 Boys 241,945 Girls
Some possible statistics
251,527 Boys 241,945 Girls
Difference: +9,582 Boys
Ratio: 104 Boys to 100 Girls
Proportion: 50.97% Boys
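Each of these statistics follows directly from the two counts. A minimal sketch in Python (the variable names are illustrative and not from the lecture):

```python
# Birth counts for Paris, 1745-1770, as given in the lecture.
boys = 251_527
girls = 241_945

# Difference: how many more boys than girls were born.
difference = boys - girls

# Ratio: boys born per 100 girls.
boys_per_100_girls = 100 * boys / girls

# Proportion: fraction of all births that were boys.
proportion_boys = boys / (boys + girls)

print(f"Difference: {difference:+,} boys")
print(f"Ratio: {boys_per_100_girls:.0f} boys to 100 girls")
print(f"Proportion: {proportion_boys:.2%} boys")
```

All three numbers summarize the same data; they differ only in how the imbalance is expressed.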
Some possible visualizations
251,527 Boys 241,945 Girls
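One simple possibility is a bar chart of the two counts. A minimal sketch in plain Python follows; the helper `text_bar_chart` is hypothetical, written only for this illustration, and is not necessarily one of the plots shown in lecture:

```python
def text_bar_chart(counts, width=50):
    """Return one text bar per label, scaled so the largest count fills `width`."""
    largest = max(counts.values())
    lines = []
    for label, count in counts.items():
        # Bar length is proportional to the count, rounded to whole characters.
        bar = "#" * round(width * count / largest)
        lines.append(f"{label:<5} {bar} {count:,}")
    return lines


births = {"Boys": 251_527, "Girls": 241_945}
chart = text_bar_chart(births)
print("\n".join(chart))
```

The two bars come out nearly the same length, which shows how small the difference looks relative to the totals.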
How did Laplace solve this?
Conditional probability that the rate of boys, $\theta$, is greater than that of girls, given the observed data.
Answer?
$$P(\theta > 0.5 \mid \text{data}) = 1 - \epsilon,$$
where $\epsilon \approx 1 \times 10^{-42}$.
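As a rough check on the size of ϵ, the sketch below assumes a uniform prior on θ, so that the posterior is Beta(boys + 1, girls + 1), and approximates that posterior by a normal distribution with the same mean and variance. Both choices are assumptions made for this illustration; the slides do not specify how Laplace carried out the calculation.

```python
import math

# Birth counts for Paris, 1745-1770, as given in the lecture.
boys, girls = 251_527, 241_945

# Assumption: uniform prior on theta, giving a Beta(a, b) posterior.
a, b = boys + 1, girls + 1

# Mean and standard deviation of the Beta(a, b) distribution.
mean = a / (a + b)
std = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

# Assumption: normal approximation to the posterior.
# epsilon = P(theta <= 0.5 | data) is then a lower-tail normal probability.
z = (mean - 0.5) / std
epsilon = 0.5 * math.erfc(z / math.sqrt(2))

print(f"z = {z:.2f}, epsilon = {epsilon:.1e}")
```

Under these assumptions ϵ comes out on the order of 10^-42, consistent with the value above. The normal approximation is crude this far into the tail, so only the order of magnitude should be trusted.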
What is probability?
Definition: Probability is the study of the mathematical rules that govern random events.
But what is randomness?
Informally, a random event is an event where we do not know the outcome without observing it.
Probability tells us what we can say about such events, given our assumptions about the possible outcomes.
What is statistics?
Definition: Statistics is the application of probability to the collection, analysis, and description of random data.
Statistics is used to:
Design experiments
Summarize data
Draw conclusions about the world
Explore complex data
What is machine learning?
Definition: Machine Learning builds statistical models of data in order to recognize complex patterns and to make decisions based on these observations.
Machine Learning is Everywhere?
Chess player
Recommendation system
Assisted driving
Cancer diagnosis
Levels of data analysis expertise
0: What is data?
1: I know how to run data analysis software
2: I understand the math behind the analysis
3: I'm able to invent new data analysis methods
Why should you know the mathematical foundations?
When machine learning goes wrong
Panda (57.7% confidence)
Gibbon (99.3% confidence)
from Goodfellow et al. ICLR 2015