Date post: 03-Aug-2020
Page 1: Science in the era of Gaia data - Centre for Astrophysics ......- Pedagogy of data analysis, when you have lots of data Examples of how pedagogy drives decisions in big and small data

Science in the era of Gaia data

Andy Casey Astrophysics; Statistics

“big”

andycasey astrowizicist astrowizici.st

Page 5:

- The Gaia mission

All about Gaia.

What makes data big?

- Pedagogy of data analysis, when you have lots of data

Examples of how pedagogy drives decisions in big and small data analysis (data-driven methods, non-parametric models)

- Tools & resources for data analysis: pick the right tool for the job

- Unsolicited advice to be ahead of the data wave

Page 9:

Having data is no longer currency in astronomy.

Having good ideas and the ability to effortlessly use data is currency.

This talk is about making you rich.

Page 10:

The Gaia satellite

The Billion Star Surveyor(tm): one billion stars for one billion euros

An astrometric mission designed to measure the position, parallax, brightness, and proper motions for more than one billion stars.

Page 12:

The Gaia satellite

The Billion Star Surveyor(tm): one billion stars for one billion euros

For up to 1.7 billion sources:

- Positions
- Proper motions
- Radial velocities (and scatter)
- Parallax
- Photometry (G, BP, RP)
- Colours (G-BP, G-RP, BP-RP)
- Dust along line of sight
- Stellar effective temperatures
- Stellar radii
- Stellar masses
- Stellar luminosities
- Astrometric excess noise (more than a single-star solution)
- Orbital solutions for solar system objects
- Variable stars (including light curves of new kinds of objects)

Credit: Erik Tollerud

Page 13:

Source count completeness

Gaia observes everything. Stars, galaxies, quasars, asteroids, et cetera.

Page 14:

Photometric performance

Kepler-precision photometry, but for one billion stars

Page 15:

Astrometric performance

(It is very good)

Page 16:

Astrometric performance

Note: “Hipparchus data release 1”

Page 17:

Proper motion performance

G ~ 18 star at 30 kpc w/ 0.4 mas/yr is approx. 2 km/s precision at 100,000 light years away

Page 18:

You are here.

Page 19:

Credit: S. BRUNIER/ESO/ESA

Page 22:

Gaia Data Release 2

This was the first “real” data release, and just averaged values.

Are we at “big data” yet?

Page 25:

Gaia Data Release 5

The flood is coming. This is what we need to deal with (easily).

Position measurements: 128 trillion

Brightness measurements: 380 trillion

Medium-resolution spectra: 1 billion

Low-resolution spectra: 100 billion

Size of reduced data products for science: 1 petabyte

Are we at “big data” yet?

Rule of thumb: if you can load it into RAM, then you are not at big data.

Page 26:

Five pedagogical questions to ask yourself

to keep you out of scientific and data analysis cul-de-sacs

1. Do you have small data, or do you have big data?

2. What is the simplest, dumbest model you can think of?

3. What assumptions are you making?

4. What is the utility of the model?

5. What can you afford?

Page 27:

1. Do you have small data, or do you have big data?

If you can’t load it into RAM, you have options (in order of difficulty):

• Do you need to load all the data at once?

• Memory-mapped arrays: store data on external hard drives and treat it (really carefully) as memory.

• Can you subsample the data and get a comparable result?

• Can you use statistics of the data to get a comparable result?

• Can you simplify the data you use and get a comparable result (e.g., ignore covariances)?

• Can you recast your problem as a map-reduce problem?
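The memory-mapped option above can be sketched with numpy.memmap. This is only an illustration: the file name, the array size, and the chunk size are assumptions, and the file is written here purely so the example is self-contained (in practice it would already exist and be much larger than RAM).

```python
import os
import tempfile

import numpy as np

# Hypothetical on-disk catalogue: a single float64 column.
n_rows = 1_000_000
path = os.path.join(tempfile.mkdtemp(), "column.dat")

rng = np.random.default_rng(0)
on_disk = np.memmap(path, dtype=np.float64, mode="w+", shape=(n_rows,))
on_disk[:] = rng.normal(loc=1.0, scale=0.5, size=n_rows)
on_disk.flush()
del on_disk  # drop the writable map

# Re-open read-only. Data are only read from disk when a slice is touched.
data = np.memmap(path, dtype=np.float64, mode="r", shape=(n_rows,))


def chunked_mean_and_variance(array, chunk_size=100_000):
    """Accumulate count, sum, and sum of squares one chunk at a time."""
    count = 0
    total = 0.0
    total_sq = 0.0
    for start in range(0, array.shape[0], chunk_size):
        chunk = np.asarray(array[start:start + chunk_size])
        count += chunk.size
        total += chunk.sum()
        total_sq += np.square(chunk).sum()
    mean = total / count
    variance = total_sq / count - mean**2
    return mean, variance


mean, variance = chunked_mean_and_variance(data)
print(f"mean = {mean:.3f}, variance = {variance:.3f}")
```

Only one chunk is held in memory at a time, which also covers the "use statistics of the data" option. The sum-of-squares variance is the simplest choice and can lose precision for data with a large mean; a streaming update such as Welford's would be a more careful alternative.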

Page 28:

2. What is the simplest, dumbest model you can think of?

Always start with the simplest model you can think of, even if you “know” it is dumb and will not give you great results. For example:

1. Linear regression (for fitting data): a design matrix can have non-linear entries, but you are still doing linear regression!

2. k-means (for clustering): use k-means++ for initialisation, “always”

3. Logistic regression (for classification)

Don’t change this model until you have answered all five questions!

When complicated models aren’t working correctly, always ask what the simplest, dumbest thing is that you could test to check your intuition.
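The point about non-linear entries in a design matrix can be illustrated with a short sketch. The synthetic data, the choice of basis functions, and the noise level below are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data (illustrative): a quadratic plus a sinusoidal term.
x = np.linspace(0.0, 5.0, 200)
true_theta = np.array([1.0, -2.0, 0.5, 3.0])
sigma = 0.05
y = (
    true_theta[0]
    + true_theta[1] * x
    + true_theta[2] * x**2
    + true_theta[3] * np.sin(x)
    + rng.normal(0.0, sigma, x.size)
)

# Design matrix: the columns are non-linear functions of x, but the model
# is linear in the parameters theta, so this is still linear regression.
A = np.column_stack([np.ones_like(x), x, x**2, np.sin(x)])

# Least-squares solution of A @ theta = y.
theta, residual_sum, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)
print(theta)
```

Because the problem stays linear in theta, the fit is a single linear-algebra solve with no optimiser, starting guess, or convergence check.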

Page 29:

3. What assumptions are you making?

You have made an infinite number of assumptions.

What are the most important assumptions?

(Seriously, write them down)

• Do you assume that your data are drawn from a straight line?

• Do you assume the data points are independent?

• Do you assume the noise in the data is normally distributed?

• Do you assume that you have the correct objective function?

• Do you assume that you have optimised to the global minimum?

• Do you assume that you have used an appropriate optimisation algorithm?

• Do you assume that the noise estimates you have are correct?

• Do you assume that we do not live in a simulation? (Would it matter?)

Page 30:

4. What is the utility of the model?

All models are wrong, some are useful.

Even a dumb model can tell you a lot about what you should do next. If you have a dumb model but you parameterise your model errors, then the model errors (or residuals from the data) will inform you where your model is failing.

• Do the underlying physical models make good predictions?

• Under what conditions will this model fail? (models should fail loudly!)

• Do you need a point estimate of your model parameters, or do you need a posterior probability distribution over data?

• Does this model give a point estimate that you can use for other purposes?
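One way to read "parameterise your model errors" is to add a free extra-scatter term to a simple model and let the data set it. The sketch below does this for a straight line with a grid search over the extra scatter. The synthetic data, the grid, and the Gaussian likelihood are assumptions of the example, not something taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (illustrative): a straight line with intrinsic scatter
# that the reported uncertainties do not account for.
n = 500
x = rng.uniform(0.0, 10.0, n)
sigma_y = np.full(n, 0.2)   # reported measurement uncertainties
true_scatter = 0.5          # unmodelled scatter
y = 2.0 * x + 1.0 + rng.normal(0.0, np.hypot(sigma_y, true_scatter))

A = np.column_stack([np.ones(n), x])


def fit_line(extra_scatter):
    """Weighted least squares for a fixed extra-scatter term.

    Returns (intercept, slope) and the Gaussian log-likelihood.
    """
    variance = sigma_y**2 + extra_scatter**2
    weights = 1.0 / variance
    ATA = A.T @ (A * weights[:, None])
    ATy = A.T @ (weights * y)
    theta = np.linalg.solve(ATA, ATy)
    residuals = y - A @ theta
    log_like = -0.5 * np.sum(
        residuals**2 / variance + np.log(2.0 * np.pi * variance)
    )
    return theta, log_like


# Profile the likelihood over a grid of extra-scatter values.
grid = np.linspace(0.0, 2.0, 401)
log_likes = np.array([fit_line(s)[1] for s in grid])
best_scatter = grid[np.argmax(log_likes)]
theta, _ = fit_line(best_scatter)

# Normalised residuals: close to unit variance if the error model is adequate.
z = (y - A @ theta) / np.hypot(sigma_y, best_scatter)
print(f"intercept, slope = {theta}, extra scatter = {best_scatter:.2f}")
print(f"std of normalised residuals = {z.std():.2f}")
```

A large fitted scatter, or normalised residuals that show structure against x, is the loud failure the slide asks for: it says the straight line plus quoted uncertainties does not describe the data.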

Page 31:

5. What can you afford?

Sometimes a point estimate of the parameters of a very simple model is good enough to answer the question you have.

Sometimes you will need to sample a posterior probability distribution of a complicated model. Or worse: calculate the fully marginalised likelihood (FML; a.k.a. the “evidence”).

What can you afford? (etc.)

Answers to these questions will (in a very practical sense) help drive your model complexity.

Page 32:

Example: data-driven models

For when the data are better than the models.

Page 34:

Hierarchical data-driven models of stellar properties

Hierarchical, complex model

Analytic integrals to marginalise parameters

Tractable!-ish

Use joint information between stars to de-noise properties of the sample

arXivs: 1703.08112, 1706.05055 (Leistedt et al. and Anderson et al.)

Page 35:

Hierarchical data-driven models of stellar properties

1. Do you have small data, or do you have big data? Small.

2. What is the simplest, dumbest model you can think of? Gaussian mixture model.

3. What assumptions are you making? Independence among stars. Many others.

4. What is the utility of the model? Most parallaxes are noisy. This model improves them.

5. What can you afford? Posterior distributions over data, but only through analytic marginalisation.

Page 36:

Example: non-parametric models

Terribly named, because they really have infinite numbers of parameters.

Page 37:

Non-parametric model for binary star inference

1. Do you have small data, or do you have big data? Big. We ded.

2. What is the simplest, dumbest model you can think of? Mixture of two components.

3. What assumptions are you making? Some stars with similar colours and luminosity will be single stars.

4. What is the utility of the model? Point estimates of binary probability for two billion stars.

5. What can you afford? Posterior distributions over data, but only if we get clever.

Page 38:

Non-parametric model for binary star inference

Page 39:

Non-parametric model for binary star inference

- radial velocity variance
- template systematics
- astrometric noise
- bluer/redder than expected
- photometric variability

Fit a mixture model (normal and log-normal) to all observables of stars in our “ball”

Calculate p(single|data) for the star of interest

Move on to the next…
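The fit-then-classify step above can be sketched for a single observable. Everything specific here is an assumption made for illustration: the synthetic data, the use of expectation-maximisation to fit the mixture (the slides do not say how the fit is done), the starting values, and the value chosen for the star of interest.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic observable for the stars in one "ball" (illustrative values):
# single stars scatter normally, binaries follow a log-normal tail.
n_single, n_binary = 800, 200
x = np.concatenate([
    rng.normal(1.0, 0.2, n_single),
    rng.lognormal(mean=np.log(4.0), sigma=0.5, size=n_binary),
])
x = x[x > 0]  # the log-normal component is only defined for x > 0


def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))


def lognormal_pdf(x, m, s):
    return np.exp(-0.5 * ((np.log(x) - m) / s) ** 2) / (x * s * np.sqrt(2 * np.pi))


def p_single(x, w, mu, sigma, m, s):
    """Posterior probability that each x comes from the normal component."""
    single = w * normal_pdf(x, mu, sigma)
    binary = (1.0 - w) * lognormal_pdf(x, m, s)
    return single / (single + binary)


# Robust starting values: median and scaled MAD for the normal component,
# the 90th percentile for the log-normal component.
w = 0.5
mu = np.median(x)
sigma = 1.4826 * np.median(np.abs(x - mu))
m, s = np.log(np.percentile(x, 90)), 1.0

# Expectation-maximisation with closed-form updates for both components.
log_x = np.log(x)
for _ in range(500):
    r = p_single(x, w, mu, sigma, m, s)   # E-step: responsibilities
    w = r.mean()                          # M-step: weighted moments
    mu = np.sum(r * x) / r.sum()
    sigma = np.sqrt(np.sum(r * (x - mu) ** 2) / r.sum())
    m = np.sum((1 - r) * log_x) / (1 - r).sum()
    s = np.sqrt(np.sum((1 - r) * (log_x - m) ** 2) / (1 - r).sum())

star_of_interest = 3.0
print(p_single(np.array([star_of_interest]), w, mu, sigma, m, s)[0])
```

Each star gets its own small fit like this one, which is why the procedure can be repeated independently, star by star.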

Page 40:

Non-parametric model for binary star inference

[Figure: three panels showing radial velocity variance (km s^-1) against apparent g, bp, and rp flux.]

Page 41:

Non-parametric model for binary star inference

In practice we might want to sample the mixture parameters for every star

Can we afford it?

Hell no!

We can barely optimise it!

But we may be able to analytically marginalise out parameters that we don’t care about

Page 42:

Non-parametric model for binary star inference

~210 million parameter model for brighter stars, about 1B parameter model for all stars.

Converted a “big data” problem to a “small data” problem that is embarrassingly parallel, and one where we might be able to analytically marginalise out many hyper-nuisance-parameters.

Page 43:

Non-parametric model for binary star inference

[Figure: vrad excess/P sqrt(1 - e^2) against K/P sqrt(1 - e^2), both on logarithmic axes from 10^-4 to 10^3, coloured by binary probability (0 to 1).]

Page 44:

Non-parametric model for binary star inference

[Figure: two panels showing absolute G magnitude against bp-rp for N = 6368651 stars, coloured by binary fraction (0 to 1).]

Now we can do a population study of binary stars that is 10^5 times larger than anything we could do before.

Page 45:

Why not just turn on the Machine Learning(tm)?

As physicists we are often interested in the mechanisms that produced the data. That is, we want a generative model for the data.

Neural networks are universal function approximators (we’ve known that literally for decades), but they will not give you a generative model for the data that is interpretable. This applies to most ML methods.

Sometimes that’s OK. Sometimes you don’t care about interpretability, or how the data were generated. But often we do care, and we can afford an interpretable model, but we (incorrectly) opt to use Machine Learning.

Page 46:

Why not just turn on the Machine Learning(tm)?

Consider a problem where:

• There are lots of high-quality data.

• It’s hard to model those data, and/or the existing models do not make good predictions (“the data are better than the models”).

• We just want answers. We don’t care why.

Page 47:

Why not just turn on the Machine Learning(tm)?

Turn on the ML!

• Create some training set of well-known objects.

• Train a Convolutional Neural Network (CNN) to estimate the intrinsic (or latent) properties of some objects, given an image (or spectrum) of the object.

• You responsibly run cross-validation (or drop-out) to convince yourself things work.

• You run the test step.

• Your CNN has identified an object with properties that defy everything we thought we knew about astrophysics! (But in many other ways, it is “similar enough” to objects in the training set, so we have some reason to trust it)

Page 48:

(Get it? Convolutional Neural Network.) Models that lack interpretability can really suck.

Page 49:

When should I turn on the Machine Learning(tm)?

Can you write a generative model for the data (that evaluates in less than a Hubble time)?

Don’t use machine learning. Forward model the data.

Do you care about model interpretability, or interpreting the results that you get?

Don’t use machine learning. Forward model the data.

Do you want a posterior probability distribution over data?

Don’t use machine learning. Forward model the data.

Do you need to retain some semblance of probability over data?

Don’t use machine learning. Forward model the data.

Do you want to classify or estimate things, or make decisions, and you don’t care about the physics?

Hell yeah! Turn the Machine Learning up to 11!

Page 50:

Even when you turn on the Machine Learning(tm), the rules still apply!

From Google on ‘Scalable and accurate deep learning with electronic health records’ (Nature):

Regularised logistic regression performed essentially just as well as Deep Neural Networks (mortality C.I. 0.81 to 0.89 vs 0.94 to 0.96).

Huge cost, complexity, and interpretability difference in those models.

What is the simplest, dumbest model you can think of?

Start with that.
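As an illustration of how small such a baseline can be, here is a sketch of L2-regularised logistic regression written with numpy only. The synthetic data, the regularisation strength, the learning rate, and the number of steps are assumptions for the example, and it has no connection to the health-records study quoted above.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic two-class data (illustrative): two features, linear boundary.
n = 2000
X = rng.normal(size=(n, 2))
true_w, true_b = np.array([2.0, -1.0]), 0.5
p = 1.0 / (1.0 + np.exp(-(X @ true_w + true_b)))
y = (rng.uniform(size=n) < p).astype(float)


def fit_logistic(X, y, l2=1e-2, learning_rate=0.5, n_steps=2000):
    """L2-regularised logistic regression by batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_steps):
        prob = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad_w = X.T @ (prob - y) / len(y) + l2 * w
        grad_b = np.mean(prob - y)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = fit_logistic(X, y)
predicted = (X @ w + b) > 0
accuracy = np.mean(predicted == (y == 1))
print(f"weights = {w}, intercept = {b:.2f}, accuracy = {accuracy:.3f}")
```

The fitted weights can be read directly as the effect of each feature on the log-odds, which is the interpretability that a deep network gives up.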

Page 51:

Standard tools for data analysis

Linear algebra. Go back to basics. Keep your linear (matrix) algebra sharp.

Python (3): astropy, numpy, scipy, scikit-learn, TensorFlow (not just for ML)
Positives: Good glue. Human-readable, machine-executable. Transferable skill.
Negatives: Only a little bit slow.

Stan: probabilistic programming language
When to use: If you have a model that doesn’t have bespoke parts (e.g., no models at grid points, or functions that are not differentiable).
When not to use: When your model contains bespoke parts. Or if statements (kinda).

Fortran/C: Betterise your code by speeding up the slowest parts. You can call Fortran or C functions directly from Python.

PostgreSQL: Learn it. Write scripts to ingest data. You will thank yourself later.

Hadoop: If you have a map-reduce job, use Hadoop. Transferable skill.

Page 52:

Resources

Statistics: Information theory, inference and learning algorithms, Sokal’s notes, Probabilistic Programming and Bayesian Methods for Hackers, Bayesian Data Analysis, Hamiltonian Monte Carlo

Version control: oh shit git

Machine Learning: Talking Machines, Which ML algorithm is for me?, Matrix calculus you need for deep learning, You should understand backpropagation, Machine Learning 101 (Google Engineers)

Code: astropy, tensorflow, stan, scikit-learn, fortran from python

Probabilistic graphical models: an introduction

Linear algebra: immersive linear algebra

Page 53:

Unsolicited advice to be ahead of the data wave

1. Create a GitHub or BitBucket account and use it. Push daily. Push good code. Push bad code. Push grant proposals. Push paper drafts. Push. Push. Push.

2. Read arXiv:1008.4686 and do all the exercises.

3. Be familiar with tools (machine learning, optimisation algorithms, linear algebra) and know how to choose the right tool. It’s hard.

4. Think about whether you can map-reduce your data analysis problem. If you can, learn Hadoop as part of that project.

5. Start with the simplest model for data analysis. But for fun, think about how to fit a line to one petabyte of data.
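One possible answer to the petabyte line-fit puzzle is that least squares only needs the sums A^T A and A^T y, and those add across chunks, so the fit is a map-reduce job. The sketch below runs that idea in a single process. The chunk generator stands in for reading data from disk and is hypothetical, as are the chunk sizes and the true line.

```python
from functools import reduce

import numpy as np


def generate_chunk(seed, n=100_000):
    """Stand-in for reading one chunk of (x, y) data from disk (hypothetical)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 10.0, n)
    y = 3.0 * x - 2.0 + rng.normal(0.0, 1.0, n)
    return x, y


def map_chunk(chunk):
    """Map: reduce one chunk to its sufficient statistics A^T A and A^T y."""
    x, y = chunk
    A = np.column_stack([np.ones_like(x), x])
    return A.T @ A, A.T @ y


def combine(left, right):
    """Reduce: sufficient statistics from different chunks simply add."""
    return left[0] + right[0], left[1] + right[1]


chunks = (generate_chunk(seed) for seed in range(20))
ATA, ATy = reduce(combine, map(map_chunk, chunks))

# Solve the normal equations once, using only a 2x2 matrix and a 2-vector.
intercept, slope = np.linalg.solve(ATA, ATy)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

The map step is independent per chunk, so it could be distributed across machines, and the reduce step only ever handles six numbers per chunk regardless of how large the data are.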

Page 54:

Gaia Sprints

Not traditional scientific meetings.

Aim is to bring together people who want to exploit Gaia data on short timescales.

We do everything in the open. Open data. Open science.

No invited participants; everyone applies to attend (incl. the SOC, the Gaia principal investigator, etc).

“Best scientific experience of my life”, “Most important week of my year”.

Next Sprint: 2019 Santa Barbara

gaia.lol

Page 56:

Conclusions

The data are only going to get bigger. Those who can’t swim will drown.

Those who can swim will drown in .

Remember to ask yourself:

1. Do you have small data, or do you have big data?

2. What is the simplest, dumbest model you can think of?

3. What assumptions are you making?

4. What is the utility of the model?

5. What can you afford?

