What is data science

Post on 16-Dec-2014

description

An introduction to data science for global health professionals. Part one of a series

transcript

WHAT IS DATA SCIENCE

AND HOW CAN IT HELP GLOBAL HEALTH?

PART 1 IN A SERIES DEVELOPED BY JOHN SPENCER September 2014 http://datarevolution.us

DATA SCIENCE LETS USERS IDENTIFY AND UNDERSTAND PATTERNS IN DATA.

IT BALANCES TRADITIONAL HYPOTHESIS TESTING AND PATTERN ANALYSIS.

DATA SCIENCE ALSO EMPHASIZES MAKING THE RESULTS OF THE ANALYSIS EASILY UNDERSTOOD.

Flickr image by leecullivan https://flic.kr/p/4W5Xym

IT CAN BE A GREAT TOOL FOR MANAGING AND UNDERSTANDING COMPLEX DATA.

DATA SCIENCE BRINGS TOGETHER SKILLS AND METHODS FROM DIFFERENT TECHNICAL AND SUBSTANTIVE AREAS.

Venn diagram of three overlapping areas: HACKING SKILLS, MATH AND STATISTICS KNOWLEDGE, and SUBSTANTIVE KNOWLEDGE. The overlaps are labeled MACHINE LEARNING, DANGER ZONE, and TRADITIONAL RESEARCH, with DATA SCIENCE at the center.

FROM DREW CONWAY: bit.ly/1lyG9UA

IN A WORLD THAT LOOKS LIKE THIS,

DATA SCIENCE CAN BRING SOME ORDER.

Flickr image David Singleton https://flic.kr/p/4jhrzM

SOME EXAMPLES…

NETFLIX PRIZE

Netflix uses a recommendation engine to suggest movies based on your likes and dislikes.

The machine learning algorithms that make this possible rely on data science principles.
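To make the idea of a recommendation engine concrete, here is a minimal sketch of one common approach, user-based collaborative filtering. It is not Netflix's actual system. The users, film titles, ratings, and function names are all hypothetical and chosen only for illustration.

```python
from math import sqrt

# Hypothetical ratings (1-5 stars). Users and titles are made up for illustration.
ratings = {
    "ana":   {"Film A": 5, "Film B": 4, "Film C": 1},
    "ben":   {"Film A": 4, "Film B": 5, "Film D": 5},
    "chloe": {"Film A": 1, "Film C": 5, "Film D": 2},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the titles both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = sqrt(sum(a[t] ** 2 for t in shared))
    norm_b = sqrt(sum(b[t] ** 2 for t in shared))
    return dot / (norm_a * norm_b)


def predict_rating(user, title, ratings):
    """Similarity-weighted average of other users' ratings for one title."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, their_ratings in ratings.items():
        if other == user or title not in their_ratings:
            continue
        weight = cosine_similarity(ratings[user], their_ratings)
        weighted_sum += weight * their_ratings[title]
        weight_total += weight
    if weight_total == 0:
        return None  # nobody comparable has rated this title
    return weighted_sum / weight_total


def recommend(user, ratings):
    """Titles the user has not rated, ranked by predicted rating (highest first)."""
    seen = set(ratings[user])
    all_titles = {t for r in ratings.values() for t in r}
    scored = []
    for title in all_titles - seen:
        predicted = predict_rating(user, title, ratings)
        if predicted is not None:
            scored.append((title, predicted))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


suggestions = recommend("ana", ratings)
```

In this toy data, "ana" rates films much as "ben" does and quite differently from "chloe", so ben's opinion of the unseen film counts for more in the prediction.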

MALARIA ATLAS PROJECT

The researchers at the Malaria Atlas Project create models of malaria risk using Gaussian processes.

Malaria data, as well as data on rainfall, temperature, or land cover, are inputs to the model. The model can help fill gaps in areas where reliable data isn’t available.

www.map.ox.ac.uk
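The gap-filling idea can be illustrated with a toy one-dimensional Gaussian process regression. This is only a sketch under stated assumptions, not the Malaria Atlas Project's model: the survey locations, prevalence values, kernel choice, and parameter values below are all hypothetical, and a real model would work in two dimensions with covariates such as rainfall and temperature.

```python
from math import exp


def rbf_kernel(x1, x2, length_scale=1.0):
    """Squared-exponential kernel: nearby locations get similar values."""
    return exp(-((x1 - x2) ** 2) / (2 * length_scale ** 2))


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - known) / aug[row][row]
    return x


def gp_predict(train_x, train_y, query_x, length_scale=1.0, noise=1e-6):
    """Return the GP posterior mean and variance at query_x.

    The variance is in kernel units (prior variance fixed at 1), so it is a
    relative measure of uncertainty rather than a calibrated one.
    """
    n = len(train_x)
    mean_y = sum(train_y) / n
    centered = [y - mean_y for y in train_y]  # GP prior assumes zero mean
    K = [
        [rbf_kernel(train_x[i], train_x[j], length_scale)
         + (noise if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    k_star = [rbf_kernel(query_x, x, length_scale) for x in train_x]
    alpha = solve(K, centered)
    mean = mean_y + sum(k * a for k, a in zip(k_star, alpha))
    v = solve(K, k_star)
    variance = 1.0 - sum(k * w for k, w in zip(k_star, v))
    return mean, max(variance, 0.0)


# Hypothetical survey sites along a transect (km) and measured prevalence.
# There is no survey at km 3, which is the "gap" the model fills in.
sites_km = [0.0, 1.0, 2.0, 4.0, 5.0]
prevalence = [0.10, 0.15, 0.22, 0.30, 0.28]

gap_mean, gap_variance = gp_predict(sites_km, prevalence, 3.0)
site_mean, site_variance = gp_predict(sites_km, prevalence, 1.0)
```

The model returns an estimate for the unsurveyed location together with a variance. The variance is near zero at surveyed sites and larger in the gap, which is how such a model signals where its estimates are less certain.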

WHAT MAKES IT DATA SCIENCE? ISN’T THAT JUST DATA ANALYSIS?

Flickr image: Demi-Brooke https://flic.kr/p/4Tnd2s

TRADITIONAL DATA ANALYSIS

Diagram labels: QUESTION (?), HYPOTHESIS, UNIVERSE OF DATA, ANSWER (!)

With traditional data analysis, a hypothesis guides the work. A few data sets are analyzed to support or refute the hypothesis.

DATA SCIENCE

Diagram labels: QUESTION (?), HYPOTHESIS, UNIVERSE OF DATA, ANSWER (!)

With data science, the data itself can guide analysis. Data scientists employ a mix of hypothesis testing and pattern recognition with as many data sets as are relevant.

Data science relies on a mix of deductive and inductive reasoning to create actionable knowledge.

Traditional analysis provides understanding of phenomena only where data exists. Data science can provide understanding where data doesn’t exist.

Flickr image by faungg https://flic.kr/p/5n2eFr

WHY IS THIS IMPORTANT IN GLOBAL HEALTH?

Around the world, more data associated with health programs is being collected than ever before.

Paradoxically, even with more data than ever, data gaps remain.

It is not possible to collect data about every aspect of health; there will always be data that can’t be collected. Data science can help mitigate the effect of data gaps.

In fact, there is more data about the world in general. This data can provide valuable information about the context in which the programs exist.

Flickr image: Possible https://flic.kr/p/eyGbM9

In short, there’s more data about the world than ever before. That includes health-related data.

There are still data gaps: things that we don’t know.

Using the data we do have, data science can identify previously unrecognized patterns and can further our understanding about things for which data doesn’t exist.

Part 1 of a series Produced by John Spencer @Jspencerunc

All presentations available via http://datarevolution.us

Produced under a Creative Commons License