Date post: 11-Jun-2020
Making an impact with data science Jordan Engbers, PhD Chief Scientist, Desid Labs Inc. CTO, Systolik Inc.
Transcript
Page 1: Making an impact with data science Chief Scientist, Desid ...files.meetup.com/11057822/Making an impact with... · Agile Development Process for creation in the face of uncertainty

Making an impact with data science

Jordan Engbers, PhD

Chief Scientist, Desid Labs Inc.

CTO, Systolik Inc.


Outline

Who am I?

What is data science?

Making data products

Where do you go from X?

Are you doing good?


The Goal

To have a discussion around how to create meaningful impact with data science


Who am I?


How did I get here?


2004

Bioinformatics

Multidisciplinary Program
- Computer Science
- Biomedical Science


Bioinformatics

2004 2008

Neuroscience

Just starting … no bitterness yet


Bioinformatics

2004 2008

Neuroscience

2013

Clinical Data Science

Big Data


Bioinformatics

2004 2008

Neuroscience

2013

Clinical Data Science

Data Science
Data Analytics
Predictive Analytics


Bioinformatics

2004 2008

Neuroscience

2013

Clinical Data Science

- Data management for clinical researchers
- International clinical trials
- Software development
- Data science with clinical registries and administrative health data (THIN)


Bioinformatics

2004 2008

Neuroscience

2013

Clinical Data Science

2015

Desid Labs Inc.

Data Science consulting company offering end-to-end data science services

Science-as-a-Service

desidlabs.com


Bioinformatics

2004 2008

Neuroscience

2013

Clinical Data Science

2015/16

Desid Labs Inc.

Systolik Inc.

Taking Apps to Heart

Cardiovascular Information Systems

Focus on Analytics within Cardiovascular Care

systolik.com


my random walk

music

ministry

bioinformatics

neuroscience

clinical data science

entrepreneur

web programming

humanities

development

biology

informatics

business

healthcare

computation

machine learning

big data


Take Away

There is no set path to becoming a data scientist

Focus on:

Developing a scientific mindset

Strengthening your “metaskills”

Exploring many disciplines


Should you listen to me?

I am not speaking as an authority

I am here to share what I have learned and to help move people forward in data science

So:

- Don’t take what I say at face value
- Test for yourself
- Challenge what you hear
- Come up with new and better ideas


What is Data Science?

http://higheredublog.com/data-science-as-a-masters-a-brief-overview/


Computer Science

Math + Statistics

Domain Expertise

software research

machine learning

data scientist (unicorn)

science


http://www.kdnuggets.com/2015/02/history-data-science-infographic.html


What is Data Science?

Wikipedia says that:

“...interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured, which is a continuation of some of the data analysis fields such as statistics, data mining, and predictive analytics…”

“...Data science employs techniques and theories drawn from many fields within the broad areas of mathematics, statistics, information science, and computer science, including signal processing, probability models, machine learning, statistical learning, data mining, database, data engineering, pattern recognition and learning, visualization, predictive analytics, uncertainty modeling, data warehousing, data compression, computer programming, artificial intelligence, and high performance computing.”


What is a Data Scientist?

“Data scientists use their data and analytical ability to

find and interpret rich data sources;

manage large amounts of data despite hardware, software, and bandwidth constraints;

merge data sources;

ensure consistency of datasets;

create visualizations to aid in understanding data;

build mathematical models using the data; and

present and communicate the data insights/findings.”


Is data science just a set of methodologies?


The purpose of a scientific discipline

Do the following descriptions make sense?

- Astronomy is the field of science that uses telescopes
- Chemistry is about mixing chemicals and torturing undergrads
- Statistics uses maths

Nope.

- Astronomy is the study of celestial objects and processes that allows us to understand the universe

- Chemistry examines the composition, structure, properties and change of matter to help us understand the physical world

- Statistics allows us to use data more effectively by studying the collection, analysis, interpretation, and organization of data….

Methods are invented to serve the field, not as a purpose in themselves.


Is data science just statistics “rebranded”?

"Data scientist is just a sexed up word for statistician." - Nate Silver

“Statistical modelling - two cultures” - Leo Breiman

“50 Years of Data Science” - David Donoho

Summary: data science is just an expanded form of statistics

But see:

“What ‘50 years of data science’ leaves out” - Sean Owen, Cloudera

What is the purpose of data science?


Data Science is about decisions

We democratize data access to empower all employees to make data-informed decisions, give everybody the ability to use experiments to correctly measure the impact of their decisions, and turn insights on user preferences into data products that improve the experience of using Airbnb

- Scaling Knowledge at Airbnb

That is more than statistics:

- Need to understand business processes
- Requires data engineering approaches to provide the environment
- Requires software engineering to create platforms to measure the impact and develop the data products


Data science is the scientific discipline focused on determining how data can drive better decisions across a wide set of domains

Scientific discipline - not just data analysis, but science

“...determining how data...” - methodologies, statistics, computer science

“...can drive better decisions…” - domain knowledge, science, engineering, social sciences...


How does a focus on decisions change our approach?

1) Takes the focus away from specific methodologies (we do deep learning too!) to using the appropriate methodologies to achieve a larger overarching goal - better decisions
   a) Side effect is we get to use a larger array of disciplines
      i) Systems theory
      ii) Psychology

2) Focus on making good data products that change decisions
   a) Focusing on data products takes us away from “scripts” and towards an engineered approach to data product manufacturing


Data science is not rebranded statistics.

Data science is a multidisciplinary discipline that seeks to understand how data can be used to improve decision making.

Statistics is just a part of the approach.


Making Data Products


What is a data product?

[Diagram labels: World, Experience, Decision, Desired Outcome; learning; data, information, knowledge, wisdom; data product; Other Outcome (six times)]


Data products are the mechanism by which data science creates impact


Scientific Method

Framework for finding value in data

Data is a raw resource. Converting data to a data product requires experimentation, exploration and learning. This is the domain of science.

Agile Development

Process for creation in the face of uncertainty

Agile processes allow software teams to meet changing requirements, but stay on track and create effective products.

Engineered Products

Practices for ensuring high quality products

It is one thing to make an R script to analyse a dataset. It is another to have a resilient, auditable, scalable data product.

Desid Labs Approach

“Data science - more than just R scripts” - unofficial Desid Labs motto


Other dimensions

Complexity of UI

Complexity, size, and speed of data, information, and knowledge (3V’s)

This branches into the field of AI and decision making

Start with Herbert Simon


More than just R scripts

“It’s one thing to create an excellent fraud detection model in R, and quite another to build:

● Fault-tolerant ingest of live data at scale that could represent fraudulent actions
● Real-time computation of features based on the data stream
● Serialization, versioning and management of a fraud detection model
● Real-time prediction of fraud based on computed features at scale
● Learning over all historical data
● Incremental update of the production model in near-real-time
● Monitoring, testing, productionization of all of the above”

- Sean Owen, Cloudera

These are the sorts of things to think about when it comes to implementing your data product
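
Two of the items in the quote above, incremental update and serialization with versioning, can be illustrated with a very small sketch. This is not a fraud detection model from the talk or from Cloudera; it assumes a toy "model" that only tracks the running mean and variance of transaction amounts, and the names `RunningStats`, `serialize`, and `deserialize` are hypothetical.

```python
import hashlib
import json
import math
from dataclasses import asdict, dataclass


@dataclass
class RunningStats:
    """Toy online model: running mean/variance of one feature (Welford's algorithm)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        """Incrementally fold one new observation into the model."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def zscore(self, x: float) -> float:
        """How unusual is x relative to what the model has seen so far?"""
        if self.n < 2:
            return 0.0
        std = math.sqrt(self.m2 / (self.n - 1))
        return 0.0 if std == 0 else (x - self.mean) / std


def serialize(stats: RunningStats) -> dict:
    """Serialize the model and derive a version id from a hash of its content."""
    payload = json.dumps(asdict(stats), sort_keys=True)
    version = hashlib.sha256(payload.encode()).hexdigest()[:12]
    return {"version": version, "payload": payload}


def deserialize(record: dict) -> RunningStats:
    """Rebuild a model from a stored record."""
    return RunningStats(**json.loads(record["payload"]))


# Hypothetical transaction amounts, for illustration only.
model = RunningStats()
for amount in [20.0, 25.0, 22.0, 19.0, 24.0]:
    model.update(amount)

v1 = serialize(model)       # snapshot before the next update
model.update(500.0)         # incremental update with a new observation
v2 = serialize(model)       # a different state gets a different version id
restored = deserialize(v1)  # roll back to the earlier snapshot
```

Hashing the serialized state is one simple way to make each stored model auditable; a real deployment would also need storage, access control, and monitoring around it.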


Where do you go from X?


Activities

Intelligence Gathering
- What is the question?
- Where is the data?
- What is the data?

Data Preparation
- Get the data
- Store the data
- Transform the data
- Load the data

Modeling
- Feature engineering
- Preprocessing
- Machine learning algorithms
- Validation (Phase I - Cross Validation)

Design / Production
- Visualization
- Reducing feature set
- Creating a plan for integrating
- Movement to production stack
- Versioning and management
- Monitoring, testing, deployment

Coursera

Kaggle

Hackathon

Research & Open Data

Data Science Job


Activities

Intelligence Gathering: What is the question? Where is the data? What is the data?

Data Preparation: Get the data; Store the data; Transform the data; Load the data

Modeling: Feature engineering; Preprocessing; Machine learning algorithms; Validation (Phase I - Cross Validation)

Design / Production: Visualization; Reducing feature set; Creating a plan for integrating; Movement to production stack; Versioning and management; Monitoring, testing, deployment

Skills & Knowledge

Domain Knowledge

- Data munging
- Distributed computing
- Storage
- Sampling
- Digital signal processing
- Handling missing data
- Filtering
- Databases

- Machine Learning
- Algorithmic Complexity
- GPU optimization
- Programming
- Statistics
- Probabilities

- Web development
- Psychology
- UI/UX
- Software engineering

- Devops
- Testing
- Debugging
- Enterprise languages
- Cloud computing


Learn by doing

1) Figure out where you are in the spectrum
2) Determine what experience you need to expand in either direction
3) Find projects that will give you that experience
   a) Online competitions
   b) Hackathons
   c) Freelance work
   d) Your own projects
   e) Data journalism
   f) Data for Good (!)


Post production

Treat your data product as a hypothesis about the world

● Collect prospective data on its use
● Perform cohort analyses on people who make decisions based on the data
● Consider A/B testing
● Consider canary testing
● Set a point where you will analyze the data (X people, X amount of time)
● Answer the question - did it make a difference?
● Did it make the right difference?
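
One way to approach "did it make a difference?" after an A/B test is a two-proportion z-test. The sketch below is only an illustration under stated assumptions: the function name `two_proportion_ztest` and the counts are hypothetical, and the test assumes independent users and samples large enough for the normal approximation.

```python
import math


def two_proportion_ztest(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    Returns (z, p_value), where z > 0 means group B has the higher rate.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled rate under the null hypothesis that both groups share one rate.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # no variation at all, so no evidence of a difference
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: good decisions without (A) and with (B) the data product.
z, p = two_proportion_ztest(success_a=120, n_a=1000, success_b=150, n_b=1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Fixing the sample size in advance, as the slide suggests, matters here: checking the p-value repeatedly and stopping at the first small value inflates the false-positive rate.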


Are you doing good?


“...science and technology have been unable to keep pace with the second-order effects caused by their first-order victories.”

- Gerald Weinberg


How do we know that our data products are having the desired effect?

Data is cleaned, features determined, model created (AUC: 0.88!), implementation tested, UI designed, UX tested, integrated into production system, monitored.

Everything is done

Pat on the back - walk away

Next month’s headline:


What happened?

- An algorithm is only as good as its data
- An algorithm learns from the data - data is a representation of the real world including its flaws
- The real world is complex and there can be non-linear effects


Obviously Data for Evil (Commission)

Predatory advertising

Surveillance of dissidents, activists

Identity theft

Social Engineering


Gray areas

Web lining

Databases in elections to determine wedge issues

Surveillance for security reasons

Targeted advertising


Data for Good … right? (Omission)

Model to determine who will respond best to social assistance
What if the data is from an area with strong historical racism?
(Don’t use variables/features that could impart racial bias)

Automatic tagging of photos
What are the consequences of the algorithm being wrong?
(Need to balance sensitivity and specificity)

Apps to help first-responder (geolocation)
Will providing a service to some people limit access based on arbitrary technology choices?
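
The trade-off between sensitivity and specificity mentioned above can be made concrete with a small sketch. The scores and labels are made up for illustration, and `sensitivity_specificity` is a hypothetical helper, not something from the talk.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Compute (sensitivity, specificity) when predicting positive for score >= threshold."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 1:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")  # true negative rate
    return sensitivity, specificity


# Hypothetical model scores and true labels (1 = tag applies, 0 = it does not).
scores = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9]
labels = [0, 0, 1, 1, 0, 1]

for threshold in (0.3, 0.5, 0.7):
    sens, spec = sensitivity_specificity(scores, labels, threshold)
    print(f"threshold={threshold}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Lowering the threshold catches more true positives at the cost of more false alarms; which side to favour depends on the consequences of each kind of error, which is the question the slide raises.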


Algorithms aren’t biased - but data is

Historical data encompasses our societal biases

Algorithms learn from that data and inherit these biases

https://www.fordfoundation.org/ideas/equals-change-blog/posts/can-computers-be-racist-big-data-inequality-and-discrimination/

http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2477899

https://www.propublica.org/article/when-big-data-becomes-bad-data

https://theconversation.com/big-data-algorithms-can-discriminate-and-its-not-clear-what-to-do-about-it-45849
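
A first, rough check for inherited bias is to compare how often a model selects people from different groups. The sketch below assumes hypothetical records of (group, selected) pairs; `selection_rates` and `disparate_impact_ratio` are illustrative names. A ratio well below 1 is a prompt for investigation, not proof of discrimination, and a ratio near 1 does not prove fairness.

```python
from collections import defaultdict


def selection_rates(records):
    """Return the fraction of selected individuals per group.

    records: iterable of (group, selected) pairs, where selected is True/False.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [selected, total]
    for group, selected in records:
        counts[group][0] += int(selected)
        counts[group][1] += 1
    return {group: sel / total for group, (sel, total) in counts.items()}


def disparate_impact_ratio(rates):
    """Ratio of the lowest to the highest group selection rate (1.0 means equal rates)."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else float("nan")


# Hypothetical model outputs: group A is selected 8 times out of 10, group B 4 out of 10.
records = (
    [("A", True)] * 8 + [("A", False)] * 2
    + [("B", True)] * 4 + [("B", False)] * 6
)

rates = selection_rates(records)
ratio = disparate_impact_ratio(rates)
print(rates, ratio)
```

A commonly cited rule of thumb flags ratios below 0.8 for closer review, but the appropriate fairness measure depends on the domain and the decision being made.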


So what do we do?

Possibilities:

● Strengthen User Control of Personal Data
● Enforce Structural Changes in Market to Increase Competition
● Directly Regulate Big Data Platforms to Prohibit Harmful Practices
● Investing in the technical capacity of public interest lawyers, and developing a greater cohort of public interest technologists
● Pressing for “algorithmic transparency.”
● Exploring effective regulation of personal data
● Ethical code of conduct for data science

These are strategic suggestions - they suggest the what, but not the how


We need a solution that keeps pace with the tech

1) Systematic scientific process should be applied
   Equivalent of peer review

2) Agile development and testing
   Ensure models are implemented correctly

3) Systems modeling
   Understand the second-order effects of the system

4) Monitoring
   Validation of our model in the world
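
The monitoring point above can be sketched as a rolling check of predictions against outcomes observed later. This is a minimal illustration, assuming true outcomes eventually become available; the class name `RollingAccuracyMonitor`, the window size, and the threshold are all hypothetical choices.

```python
from collections import deque


class RollingAccuracyMonitor:
    """Track accuracy over the most recent predictions and flag degradation."""

    def __init__(self, window, min_accuracy):
        self.outcomes = deque(maxlen=window)  # True where prediction matched reality
        self.min_accuracy = min_accuracy

    def record(self, predicted, actual):
        """Store whether one prediction matched the outcome observed in the world."""
        self.outcomes.append(predicted == actual)

    @property
    def accuracy(self):
        """Accuracy over the current window, or None before any outcome is recorded."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def degraded(self):
        """Alert only once the window is full, to avoid noisy early alarms."""
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.accuracy < self.min_accuracy


monitor = RollingAccuracyMonitor(window=4, min_accuracy=0.75)
for predicted, actual in [(1, 1), (0, 0), (1, 1), (1, 1)]:
    monitor.record(predicted, actual)
healthy = monitor.degraded()  # window is full and every prediction was right

for predicted, actual in [(1, 0), (0, 1)]:
    monitor.record(predicted, actual)
alert = monitor.degraded()    # two misses push the window below the threshold
```

Accuracy is only one signal; the same structure could track any metric that reflects whether the data product is having the intended effect.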


Conclusions

Data science is about decisions.

The creation of data products involves many disciplines

Determine where you are at, then expand your skills

Approach data science with care and thought - it is easier to hurt than help


Questions?

@jengbers @desidlabs

[email protected]

