
Introduction to Data Science

Date post: 01-Jul-2015
Upload: sean-byrnes
Description:
What is Data Science? Why are Data Scientists so sought after at modern technology companies? In this talk, I answer those questions by reviewing the basics of data science and 3 examples of typical data science projects.
Transcript
Page 1: Introduction to Data Science

Introduction to Data Science

Sean Byrnes

http://seanbyrnes.com

@sbyrnes

Page 2: Introduction to Data Science

Who Am I?

ATTENDED

FOUNDED

CURRENTLY

from Yahoo!

Page 3: Introduction to Data Science

Introduction to Data Science

• What is Data Science?

• Example 1: Basic Math

• Example 2: Regression Modeling

• Example 3: Recommender Systems

• Getting started in data science

Page 4: Introduction to Data Science

What is Data Science?

Software Engineering

+

Statistical Analysis

Page 5: Introduction to Data Science

What is Data Science?

1. Question

2. Data Gathering

3. Exploration

4. Modeling

5. Answer

6. Production

Page 6: Introduction to Data Science

Example 1: Basic Math

What is my customer churn rate?

def. Churn rate: The percentage of subscribers to a service that discontinue their subscription to that service in a given time period (aka attrition rate).

Page 7: Introduction to Data Science

Example 1: Basic Math

Churn(month) = (# customers lost) / (# customers at start)
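The churn formula can be sketched in a few lines of Python. The function name and the customer counts below are hypothetical, chosen only to show the arithmetic; they are not the data behind the talk's numbers.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Return churn for one period as a percentage of the starting customers."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return 100.0 * customers_lost / customers_at_start


# Hypothetical counts, used only to illustrate the calculation.
print(f"{churn_rate(customers_at_start=2000, customers_lost=75):.2f}%")  # prints 3.75%
```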

Page 8: Introduction to Data Science

Example 1: Basic Math

Month     Churn
Dec '13   3.75%
Nov '13   1.87%
Oct '13   3.82%
Sep '13   2.76%
Aug '13   2.43%
Jul '13   2.04%
Jun '13   1.60%

Page 9: Introduction to Data Science

Example 1: Basic Math

For all customers acquired in a given month (cohort C_month):

Retention(C_month) = Active(C_month) / Total(C_month)
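A minimal sketch of cohort retention, assuming each customer is recorded as a (signup month, last active month) pair with months as plain integers. That record layout, the function name, and the sample records are assumptions made for illustration, not something the talk specifies.

```python
from collections import defaultdict


def retention_by_cohort(customers, as_of_month):
    """Return {signup_month: percent of that cohort still active in as_of_month}.

    customers: iterable of (signup_month, last_active_month) integer pairs.
    """
    total = defaultdict(int)   # Total(C_month)
    active = defaultdict(int)  # Active(C_month)
    for signup_month, last_active_month in customers:
        if signup_month > as_of_month:
            continue  # this cohort did not exist yet
        total[signup_month] += 1
        if last_active_month >= as_of_month:
            active[signup_month] += 1
    return {month: 100.0 * active[month] / total[month] for month in total}


# Hypothetical records: two cohorts (months 0 and 1) of four customers each.
customers = [(0, 0), (0, 3), (0, 1), (0, 5), (1, 1), (1, 4), (1, 2), (1, 3)]
print(retention_by_cohort(customers, as_of_month=2))
```

Grouping by signup month before dividing is what separates this from the overall churn rate: each cohort gets its own denominator.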

Page 10: Introduction to Data Science

Example 1: Basic Math

Cohort    0       1       2       3       4       5       6
Dec '13   100%    12.82%  8.04%   6.34%   4.91%   3.95%   3.14%
Nov '13   100%    15.66%  9.97%   6.96%   5.46%   3.88%   2.77%
Oct '13   100%    16%     10.86%  8.62%   6.22%   5.06%   3.98%
Sep '13   100%    13.28%  9.52%   7.28%   5.28%   4.48%   4%
Aug '13   100%    12.96%  9.18%   6.55%   4.73%   3.86%   3.13%
Jul '13   100%    15.84%  10.85%  8.27%   6.67%   5.60%   4.63%
Jun '13   100%    16.08%  11.36%  8.36%   7.07%   6%      5.25%

Page 12: Introduction to Data Science

Example 2: Regression Modeling

How many users will we have next month?

Page 13: Introduction to Data Science

Example 2: Regression Modeling

[Chart: monthly values from 1/1/13 to 12/1/13; y-axis from 0 to 160,000]

Page 14: Introduction to Data Science

Example 2: Regression Modeling

For data set X(n), find f(n) such that

f(n_i) ≈ X(n_i)

Page 15: Introduction to Data Science

Example 2: Regression Modeling

Assume X(n_i) = [x_1, x_2, …, x_k]

f(n) = c_1 x_1 + c_2 x_2 + c_3 x_3 + … + c_k x_k
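As a rough sketch of the simplest case, here is an ordinary least-squares fit with a single feature (the month number) plus a constant term, which the slide's formula omits. The function name and the user counts are made up for illustration; they are not the data behind the talk's chart.

```python
def fit_line(xs, ys):
    """Ordinary least squares for f(x) = c0 + c1 * x; returns (c0, c1)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    c1 = sxy / sxx              # slope
    c0 = mean_y - c1 * mean_x   # intercept
    return c0, c1


# Made-up monthly user counts (assumption, for illustration only).
months = [1, 2, 3, 4, 5, 6]
users = [10_000, 14_000, 19_000, 22_000, 27_000, 31_000]
c0, c1 = fit_line(months, users)
next_month = c0 + c1 * 7  # extrapolate one month ahead
print(round(next_month))
```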

Page 16: Introduction to Data Science

Example 2: Regression Modeling

[Chart: Linear Model; x-axis monthly from 1/1/13 to 12/1/13, y-axis from 0 to 160,000]

Page 17: Introduction to Data Science

Example 2: Regression Modeling

Assume X(n_i) = [x_1, x_2, …, x_k]

f(n) = c_1 x_1 + c_2 x_2 + c_3 x_3 + … + c_k x_k

Or, maybe

f(n) = c_1 x_1 + c_2 x_1^2 + c_3 x_2 + c_4 x_2^2 + … + c_m x_k^2
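One way to fit the polynomial form is to expand a single feature x into powers of x and solve the least-squares normal equations. The sketch below does that in plain Python with a constant term included; the helper names and the sample data are assumptions for illustration, and a numerical library would normally handle the linear solve.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; try a lower degree or more data")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ..., c_degree] for sum(c_j * x**j)."""
    rows = [[x ** j for j in range(degree + 1)] for x in xs]  # expanded features
    size = degree + 1
    # Normal equations: (A^T A) c = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(size)] for i in range(size)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(size)]
    return solve(ata, aty)


def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** j for j, c in enumerate(coeffs))


# Made-up data that curves upward (assumption, for illustration only).
xs = [0, 1, 2, 3, 4, 5]
ys = [2.0, 5.5, 10.0, 15.5, 22.0, 29.5]
coeffs = fit_polynomial(xs, ys, degree=2)
print([round(c, 3) for c in coeffs], round(predict(coeffs, 6), 2))
```

Raising the degree lets the curve follow the observed points more closely, but a higher-degree polynomial can swing sharply outside the observed range, so a closer fit does not guarantee a better forecast.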

Page 18: Introduction to Data Science

Example 2: Regression Modeling

[Chart: 2nd Degree Polynomial Model; x-axis monthly from 1/1/13 to 12/1/13, y-axis from 0 to 160,000]

Page 19: Introduction to Data Science

Example 2: Regression Modeling

[Chart: 4th Degree Polynomial Model; x-axis monthly from 1/1/13 to 12/1/13, y-axis from 0 to 160,000]

Page 20: Introduction to Data Science

Example 2: Regression Modeling

https://github.com/sbyrnes/Lyric

Page 21: Introduction to Data Science

Example 3: Recommender Systems

What other products might this

customer buy?

Page 22: Introduction to Data Science

Example 3: Recommender Systems

             Product 1   Product 2   Product 3   …   Product N
Customer 1   3.5         4.0         -               3.0
Customer 2   2.0         -           3.5             -
Customer 3   -           3.0         2.5             -
Customer N   4.5         -           -               4.5

Page 23: Introduction to Data Science

Example 3: Recommender Systems

Given customer preference matrix M, find P and Q such that

P × Q ≈ M
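A minimal sketch of one common way to find such factors: stochastic gradient descent over the known ratings only, with a small regularization term. This is a generic technique and not necessarily the method the talk or its library uses. The rank, learning rate, step count, and seed are arbitrary assumptions, and the cell positions of the ratings are inferred by comparing the sparse and filled-in tables in the slides.

```python
import random


def factorize(ratings, n_customers, n_products, rank=2, steps=5000,
              learning_rate=0.01, regularization=0.02, seed=0):
    """Factor a sparse rating matrix with stochastic gradient descent.

    ratings: {(customer, product): rating} for the known cells only.
    Returns (P, Q): P[c] and Q[p] are length-`rank` vectors, and their dot
    product approximates the rating. Q is stored one row per product, i.e.
    transposed relative to the P x Q notation.
    """
    rng = random.Random(seed)
    P = [[rng.random() for _ in range(rank)] for _ in range(n_customers)]
    Q = [[rng.random() for _ in range(rank)] for _ in range(n_products)]
    for _ in range(steps):
        for (c, p), rating in ratings.items():
            error = rating - sum(P[c][k] * Q[p][k] for k in range(rank))
            for k in range(rank):
                pc, qp = P[c][k], Q[p][k]
                P[c][k] += learning_rate * (error * qp - regularization * pc)
                Q[p][k] += learning_rate * (error * pc - regularization * qp)
    return P, Q


def predict(P, Q, c, p):
    """Predicted rating of product p by customer c."""
    return sum(a * b for a, b in zip(P[c], Q[p]))


# Known ratings from the slides, indexed from 0 (positions are an inference).
ratings = {
    (0, 0): 3.5, (0, 1): 4.0, (0, 3): 3.0,
    (1, 0): 2.0, (1, 2): 3.5,
    (2, 1): 3.0, (2, 2): 2.5,
    (3, 0): 4.5, (3, 3): 4.5,
}
P, Q = factorize(ratings, n_customers=4, n_products=4)
# Estimate a cell that has no rating: Customer 1, Product 3.
print(round(predict(P, Q, 0, 2), 2))
```

The product of the two factors has a value in every cell, including the ones with no rating, which is what makes it usable for recommendations.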

Page 24: Introduction to Data Science

Example 3: Recommender Systems

             Product 1   Product 2   Product 3   …   Product N
Customer 1   3.5         4.0         2.5             3.0
Customer 2   2.0         1.5         3.5             3.0
Customer 3   1.5         3.0         2.5             4.0
Customer N   4.5         3.5         4.0             4.5

Page 25: Introduction to Data Science

Example 3: Recommender Systems

Given customer preferences c[p_1, p_2, …, p_n] and overall rating average r_overall

c_bias = mean(c[p_1], c[p_2], …, c[p_n]) - r_overall
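The bias calculation can be sketched directly. The ratings are the known values from the sparse preference table in the slides; the function and variable names are hypothetical, and taking the overall average across all known ratings is an assumption about what the overall rating average means.

```python
def customer_bias(customer_ratings, overall_average):
    """Mean of one customer's ratings minus the overall rating average."""
    if not customer_ratings:
        raise ValueError("customer has no ratings")
    return sum(customer_ratings) / len(customer_ratings) - overall_average


# Known ratings per customer, taken from the sparse table in the slides.
ratings_by_customer = {
    "Customer 1": [3.5, 4.0, 3.0],
    "Customer 2": [2.0, 3.5],
    "Customer 3": [3.0, 2.5],
    "Customer N": [4.5, 4.5],
}
all_ratings = [r for rs in ratings_by_customer.values() for r in rs]
overall_average = sum(all_ratings) / len(all_ratings)

biases = {name: customer_bias(rs, overall_average)
          for name, rs in ratings_by_customer.items()}
for name, bias in biases.items():
    print(f"{name}: {bias:+.2f}")
```

A positive bias marks a customer who rates higher than average, a negative one a harsher rater; subtracting it puts customers on a comparable scale.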

Page 26: Introduction to Data Science

Example 3: Recommender Systems

https://github.com/sbyrnes/likely.js

Page 27: Introduction to Data Science

Getting Started in Data Science

• Programming

• Statistics

• Machine learning

• Toolkit

– R

– Hadoop

– D3

Page 28: Introduction to Data Science

seanbyrnes.com

@sbyrnes

github.com/sbyrnes
