Date posted: 01-Jul-2015
Category: Software
Uploaded by: sean-byrnes
Introduction to Data Science
Sean Byrnes
http://seanbyrnes.com
@sbyrnes
Who Am I?
ATTENDED
FOUNDED
CURRENTLY
from Yahoo!
Introduction to Data Science
• What is Data Science?
• Example 1: Basic Math
• Example 2: Regression Modeling
• Example 3: Recommender Systems
• Getting started in data science
What is Data Science?
Software Engineering + Statistical Analysis
What is Data Science?
1. Question
2. Data Gathering
3. Exploration
4. Modeling
5. Answer
6. Production
Example 1: Basic Math
What is my customer churn rate?
def. Churn rate: The percentage of subscribers to a
service that discontinue their subscription to that service
in a given time period. (aka attrition rate)
Example 1: Basic Math
$$\mathrm{Churn}(\text{month}) = \frac{\#\ \text{customers lost}}{\#\ \text{customers at start}}$$
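The churn definition can be sketched in a few lines of Python. The function name `churn_rate` and the customer counts are hypothetical, chosen only to illustrate the arithmetic; they are not taken from the talk's data.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Return churn for one period as a fraction of the starting customer base."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return customers_lost / customers_at_start


# Hypothetical counts for a single month (not from the slides).
rate = churn_rate(customers_at_start=2000, customers_lost=75)
print(f"Monthly churn: {rate:.2%}")
```

Dividing by the count at the start of the period, rather than the count at the end, keeps the rate comparable from month to month even when many new customers are added.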
Example 1: Basic Math
Month Churn
Dec '13 3.75%
Nov '13 1.87%
Oct '13 3.82%
Sep '13 2.76%
Aug '13 2.43%
Jul '13 2.04%
Jun '13 1.60%
Example 1: Basic Math
For all customers acquired in a given month
$$\mathrm{Retention}(C_{\text{month}}) = \frac{\mathrm{Active}(C_{\text{month}})}{\mathrm{Total}(C_{\text{month}})}$$
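As a rough sketch of the cohort calculation, the retention ratio can be evaluated once per period after acquisition. The name `retention_curve` and the cohort counts below are assumptions made for illustration only.

```python
from typing import List


def retention_curve(total: int, active_by_period: List[int]) -> List[float]:
    """Fraction of one acquisition cohort still active in each later period."""
    if total <= 0:
        raise ValueError("total must be positive")
    return [active / total for active in active_by_period]


# Hypothetical cohort: 1,000 customers acquired in one month,
# followed by the number still active in each subsequent period.
curve = retention_curve(total=1000, active_by_period=[1000, 160, 110, 85])
for period, share in enumerate(curve):
    print(f"period {period}: {share:.2%}")
```

Tracking each cohort separately shows how retention decays with customer age, which a single blended churn number hides.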
Example 1: Basic Math
Cohort    0      1        2        3       4       5       6
Dec '13   100%   12.82%   8.04%    6.34%   4.91%   3.95%   3.14%
Nov '13   100%   15.66%   9.97%    6.96%   5.46%   3.88%   2.77%
Oct '13   100%   16%      10.86%   8.62%   6.22%   5.06%   3.98%
Sep '13   100%   13.28%   9.52%    7.28%   5.28%   4.48%   4%
Aug '13   100%   12.96%   9.18%    6.55%   4.73%   3.86%   3.13%
Jul '13   100%   15.84%   10.85%   8.27%   6.67%   5.60%   4.63%
Jun '13   100%   16.08%   11.36%   8.36%   7.07%   6%      5.25%
Example 2: Regression Modeling
How many users will we have next month?
Example 2: Regression Modeling
[Chart: users over time, monthly from 1/1/13 to 12/1/13; y-axis from 0 to 160,000]
Example 2: Regression Modeling
For data set $X(n)$, find $f(n)$ such that
$$f(n_i) \approx X(n_i)$$
Example 2: Regression Modeling
Assume $X(n_i) = [x_1, x_2, \ldots, x_k]$
$$f(n) = c_1 x_1 + c_2 x_2 + c_3 x_3 + \ldots + c_k x_k$$
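A minimal sketch of the linear case, assuming only two features: a constant term ($x_1 = 1$) and the month index ($x_2 = n$). The coefficients are found with ordinary least squares in closed form. The function name `fit_line` and the user counts are hypothetical, not the data plotted in the talk.

```python
from typing import List, Tuple


def fit_line(ns: List[float], xs: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for f(n) = c1 * 1 + c2 * n; returns (c1, c2)."""
    count = len(ns)
    mean_n = sum(ns) / count
    mean_x = sum(xs) / count
    var_n = sum((n - mean_n) ** 2 for n in ns)
    if var_n == 0:
        raise ValueError("need at least two distinct n values")
    cov = sum((n - mean_n) * (x - mean_x) for n, x in zip(ns, xs))
    slope = cov / var_n
    intercept = mean_x - slope * mean_n
    return intercept, slope


# Hypothetical monthly user counts (month index, users).
months = [1, 2, 3, 4, 5, 6]
users = [10000, 12500, 14800, 17600, 20100, 22400]
c1, c2 = fit_line(months, users)
print(f"forecast for month 7: {c1 + c2 * 7:.0f} users")
```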
Example 2: Regression Modeling
[Chart: the same monthly data, 1/1/13 to 12/1/13, with a fitted Linear Model]
Example 2: Regression Modeling
Assume $X(n_i) = [x_1, x_2, \ldots, x_k]$
$$f(n) = c_1 x_1 + c_2 x_2 + c_3 x_3 + \ldots + c_k x_k$$
Or, maybe
$$f(n) = c_1 x_1 + c_2 x_1^2 + c_3 x_2 + c_4 x_2^2 + \ldots + c_m x_k^2$$
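The polynomial variant is still linear in the coefficients, so the same least-squares idea applies. The sketch below assumes a single input, the month index $n$, and fits $c_0 + c_1 n + \ldots + c_d n^d$ by solving the normal equations with Gaussian elimination. The helper names (`solve`, `fit_polynomial`, `evaluate`) and the sample data are hypothetical; this is not the implementation used in the talk.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_polynomial(ns: List[float], xs: List[float], degree: int) -> List[float]:
    """Least-squares coefficients [c0, ..., c_degree] via the normal equations."""
    size = degree + 1
    ata = [[sum(n ** (j + k) for n in ns) for k in range(size)] for j in range(size)]
    atx = [sum((n ** j) * x for n, x in zip(ns, xs)) for j in range(size)]
    return solve(ata, atx)


def evaluate(coeffs: List[float], n: float) -> float:
    """Evaluate the fitted polynomial at n."""
    return sum(c * n ** j for j, c in enumerate(coeffs))


# Hypothetical monthly user counts with accelerating growth.
months = [1, 2, 3, 4, 5, 6, 7, 8]
users = [10000, 11200, 13100, 15900, 19400, 23800, 29100, 35200]
coeffs = fit_polynomial(months, users, degree=2)
print(f"forecast for month 9: {evaluate(coeffs, 9):.0f} users")
```

A higher degree follows the observed points more closely but tends to extrapolate less reliably, so the degree is worth choosing by checking forecasts against held-out months.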
Example 2: Regression Modeling
[Chart: the same monthly data, 1/1/13 to 12/1/13, with a fitted 2nd Degree Polynomial Model]
Example 2: Regression Modeling
[Chart: the same monthly data, 1/1/13 to 12/1/13, with a fitted 4th Degree Polynomial Model]
Example 2: Regression Modeling
https://github.com/sbyrnes/Lyric
Example 3: Recommender Systems
What other products might this
customer buy?
Example 3: Recommender Systems
             Product 1   Product 2   Product 3   …   Product N
Customer 1   3.5         4.0         -               3.0
Customer 2   2.0         -           3.5             -
Customer 3   -           3.0         2.5             -
…
Customer N   4.5         -           -               4.5
(- = no rating yet)
Example 3: Recommender Systems
Given customer preference matrix $M$, find
$$P \times Q \approx M$$
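The slides do not say how $P$ and $Q$ are found. One common choice is stochastic gradient descent over the known ratings only, sketched below under that assumption. The function names, the rank, the learning rate and the regularization strength are all illustrative guesses; the ratings are the known cells of the example matrix, with the row labelled Customer N stored at index 3.

```python
import random
from typing import Dict, List, Tuple

Ratings = Dict[Tuple[int, int], float]


def predict(p: List[List[float]], q: List[List[float]], u: int, i: int) -> float:
    """Predicted rating: dot product of customer row u and product row i."""
    return sum(pf * qf for pf, qf in zip(p[u], q[i]))


def factorize(ratings: Ratings, n_customers: int, n_products: int,
              rank: int = 2, epochs: int = 5000, lr: float = 0.01,
              reg: float = 0.02, seed: int = 0):
    """Fit P (customers x rank) and Q (products x rank) on known ratings only."""
    rng = random.Random(seed)
    p = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_customers)]
    q = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_products)]
    for _ in range(epochs):
        for (u, i), r in ratings.items():
            err = r - predict(p, q, u, i)
            for f in range(rank):
                pu, qi = p[u][f], q[i][f]
                # Gradient step on squared error with L2 regularization.
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return p, q


# Known ratings as (customer index, product index) -> rating.
known: Ratings = {
    (0, 0): 3.5, (0, 1): 4.0, (0, 3): 3.0,
    (1, 0): 2.0, (1, 2): 3.5,
    (2, 1): 3.0, (2, 2): 2.5,
    (3, 0): 4.5, (3, 3): 4.5,
}
p, q = factorize(known, n_customers=4, n_products=4)
for u in range(4):
    print([round(predict(p, q, u, i), 1) for i in range(4)])
```

The printed matrix has a value in every cell; the cells that were empty in the input are the recommendations, and their exact values depend on the rank, seed and other settings assumed here.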
Example 3: Recommender Systems
             Product 1   Product 2   Product 3   …   Product N
Customer 1   3.5         4.0         2.5             3.0
Customer 2   2.0         1.5         3.5             3.0
Customer 3   1.5         3.0         2.5             4.0
…
Customer N   4.5         3.5         4.0             4.5
Example 3: Recommender Systems
Given customer preferences $c[p_1, p_2, \ldots, p_n]$
and overall rating average $r_{\text{overall}}$
$$c_{\text{bias}} = \operatorname{mean}(c[p_1], c[p_2], \ldots, c[p_n]) - r_{\text{overall}}$$
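The bias formula translates directly into code. In this sketch the function name `customer_bias` is hypothetical, and the ratings are the known cells of the example preference matrix; how the bias is then used inside a recommender is not shown here.

```python
from statistics import mean
from typing import List


def customer_bias(customer_ratings: List[float], overall_average: float) -> float:
    """How far a customer's average rating sits above or below the global average."""
    return mean(customer_ratings) - overall_average


# Known ratings from the example matrix, one list per customer.
ratings_by_customer = {
    "Customer 1": [3.5, 4.0, 3.0],
    "Customer 2": [2.0, 3.5],
    "Customer 3": [3.0, 2.5],
    "Customer N": [4.5, 4.5],
}
all_ratings = [r for rs in ratings_by_customer.values() for r in rs]
overall = mean(all_ratings)
for name, rs in ratings_by_customer.items():
    print(f"{name}: bias {customer_bias(rs, overall):+.2f}")
```

A positive bias marks a customer who rates generously, a negative one a harsh rater; subtracting it puts different customers' ratings on a comparable scale.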
Example 3: Recommender Systems
https://github.com/sbyrnes/likely.js
Getting Started in Data Science
• Programming
• Statistics
• Machine learning
• Toolkit
– R
– Hadoop
– D3
Sean Byrnes
seanbyrnes.com
@sbyrnes
github.com/sbyrnes