
Demystify Big Data, Data Science & Signal Extraction Deep Dive

Date posted: 07-Aug-2015
Uploaded by: hyderabad-scalability-meetup

Signals Introduction & Deep Dive

What we will cover in the 60 mins

1. Introduction to business signals

2. Use cases for signals

3. Signal extraction: deep dive

4. R introduction & common commands

5. Q & A

What exactly is a signal?

• A signal is a pattern

• It is indicative of an impending business outcome

• For example:

  • In telecom, billing resolution errors are a signal of customer churn

  • In retail, search frequency is a signal of purchase intent

  • In healthcare, a decrease in the time between hospital visits is a signal of a medical condition

• Early warning sign: detecting the signal gives the business time to intervene and influence the outcome

4 Business Signals in the Banking Industry

1. A large balance is a signal for a mutual fund product

2. Frequent bottoming out of the balance is a signal for a loan product

3. Repetitive transmission before the 5th of every month is a signal for loan refinancing

4. An increase in the frequency of delayed loan payments is a signal of a rising probability of default (PD)

3 Business Signals in the Retail Industry

1. Downloading a digital/mobile coupon is a demand signal for the product

2. Searching for a store location is a proxy for expected demand at that store

3. A change of season is a demand signal for certain types of products

4 Business Signals in the Telecom Industry

1. The identity management system has no login event, but the application logs record one: a signal of a security event

2. 5 dropped calls in the last 3 weeks is a signal of churn

3. Exceeding the data pack 40% of the time in the last 6 months is a signal for an upgrade

4. The frequency of billing-related tweets is a signal of churn
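The telecom rules above are threshold checks on recent activity, so they translate directly into code. Below is a minimal R sketch of rules 2 and 3. It assumes a hypothetical per-customer summary table. The column names, the sample values and the use of `>=` at each threshold are illustrative assumptions and do not come from the slides.

```r
# Hypothetical per-customer activity summary (illustrative values only)
customers <- data.frame(
  customer_id          = c("C1", "C2", "C3"),
  dropped_calls_3wk    = c(6, 1, 5),          # dropped calls in the last 3 weeks
  pct_data_exceeded_6m = c(0.50, 0.45, 0.10)  # share of the last 6 months the data pack was exceeded
)

# Rule 2 (assumed threshold): 5 or more dropped calls in 3 weeks -> churn signal
customers$churn_signal <- customers$dropped_calls_3wk >= 5

# Rule 3 (assumed threshold): data pack exceeded at least 40% of the time -> upgrade signal
customers$upgrade_signal <- customers$pct_data_exceeded_6m >= 0.40

print(customers)
```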

Signal analysis is not unique to human beings!

Is there a method to the madness? Methodology for signal extraction

1. Biz problem to solve

2. Data model

3. Analytical model

4. Univariate

5. Correlations

6. Cross tab

7. Model building

8. Business narrative

9. Action and ROI

Step-1 : Business problem to solve

• I am an IT services company

• I have 300,000 employees globally

• My business model depends on people

• How do I reduce attrition in my company?

• Powerful unanswered questions:

  • Can I give an attrition score to every employee?

  • Which factors are the top 3 drivers of attrition in my company?


Step-2 : Data Model

1. Employee id

2. Employee name

3. Tenure

4. Niche skill flag

5. Appraisal rating

6. Change in appraisal

7. Salary change

8. Relative Peer group benchmark index

9. Relative Market salary benchmark index

10. Manager

11. Project type (support / dev / maint)

12. Technology ( mainframe

13. SMAC flag – yes/no (Social / Mobile / Analytics / Cloud)

14. Has been abroad
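Here is the data model as a hypothetical R data frame. It holds a subset of these fields for four invented employees. The column names, the values and the `attrited` outcome column are assumptions for illustration. The slides do not list an outcome field, but modelling attrition needs one.

```r
# A few hypothetical employee records following the data model above (invented values)
employees <- data.frame(
  employee_id        = c(101, 102, 103, 104),
  tenure_years       = c(1.5, 6.0, 3.2, 0.8),
  niche_skill        = c(TRUE, FALSE, FALSE, TRUE),
  appraisal_rating   = c(3, 4, 2, 3),
  salary_change_pct  = c(5, 12, 2, 0),
  peer_benchmark_idx = c(0.92, 1.05, 0.88, 0.95),  # assumed: < 1 means below the peer benchmark
  project_type       = factor(c("support", "dev", "maint", "dev"),
                              levels = c("support", "dev", "maint")),
  smac_flag          = c(FALSE, TRUE, FALSE, TRUE),
  has_been_abroad    = c(FALSE, TRUE, FALSE, FALSE),
  attrited           = c(1, 0, 1, 0)               # assumed outcome: 1 = left the company
)

# Check that each field has the intended type (numeric, logical, factor)
str(employees)
```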


Step-3 : Analytical Model

1. Scoring Model

2. Logistic regression

3. Decision tree

4. Support Vector Machine


Step-4 : Univariate Analysis

1. Trends

2. Seasonality

3. Distributions

4. Min/Max/Median/Average

5. Outlier detection
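Most of these univariate checks are one-liners in R. The sketch below covers summary statistics, quartiles and outlier detection on an invented series of monthly dropped-call counts. It uses the common 1.5 * IQR (Tukey) rule for outliers. The slides do not name a specific outlier rule, so this choice is an assumption.

```r
# Hypothetical monthly counts of dropped calls for one region (invented data)
x <- c(12, 15, 14, 13, 16, 15, 14, 48, 13, 15, 14, 16)

# Min / Max / Median / Average
stats <- c(min = min(x), max = max(x), median = median(x), mean = mean(x))
print(stats)

# Distribution: lower and upper quartiles, and the interquartile range
q   <- quantile(x, c(0.25, 0.75))
iqr <- q[[2]] - q[[1]]

# Outlier detection: points more than 1.5 * IQR outside the quartiles
outliers <- x[x < q[[1]] - 1.5 * iqr | x > q[[2]] + 1.5 * iqr]
print(outliers)
```

The single outlier pulls the mean (about 17.1) well above the median (14.5). This is why both are worth examining.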


Step-5 : Correlation Analysis

1. Compute correlations between the numeric outcome and the numeric predictors

2. Examine the correlation coefficients

3. If the absolute value of a correlation coefficient is > 0.6, consider that predictor a potential input to the model
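Here is a sketch of these three steps in R with invented data. It compares the absolute value of each Pearson correlation against the 0.6 cut-off, so a strongly negative predictor is kept as well. The variable names are hypothetical.

```r
# Hypothetical numeric outcome and predictors (invented data)
df <- data.frame(
  outcome = c(10, 12, 15, 18, 20, 24),
  pred_a  = c(1, 2, 3, 4, 5, 6),   # rises with the outcome
  pred_b  = c(9, 8, 6, 5, 3, 1),   # falls as the outcome rises
  pred_c  = c(5, 1, 4, 6, 2, 3)    # little relationship with the outcome
)

# Pearson correlation of each predictor with the outcome
r <- sapply(df[, c("pred_a", "pred_b", "pred_c")], function(p) cor(p, df$outcome))
print(round(r, 2))

# Keep predictors whose absolute correlation exceeds 0.6
candidates <- names(r)[abs(r) > 0.6]
print(candidates)
```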


Step-6 : Crosstab Analysis

1. Find anomalies in the distribution

2. For example, is the number of churners in the 30-35 age group in Bangalore higher than normally expected?
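One way to decide whether a cell is "higher than expected" is to compare its observed count with the count expected if the two variables were independent. The sketch below does this with invented data for one city. It uses base R's `table` and `chisq.test`. The slides do not prescribe a specific test, so the chi-square test is an assumption.

```r
# Hypothetical customers in one city (invented): 40 per age band
customers <- data.frame(
  age_band = rep(c("25-30", "30-35", "35-40"), each = 40),
  churned  = c(rep(c("yes", "no"), c(6, 34)),
               rep(c("yes", "no"), c(18, 22)),
               rep(c("yes", "no"), c(6, 34)))
)

# Cross tab of age band against churn
observed <- table(customers$age_band, customers$churned)
print(observed)

# Expected counts if churn were independent of age band, plus a chi-square test
test     <- chisq.test(observed)
expected <- test$expected
print(round(expected, 1))
print(test$p.value)

# Positive values flag bands with more churners than expected
excess <- observed[, "yes"] - expected[, "yes"]
print(round(excess, 1))
```

Here the 30-35 band has 18 churners against an expected 10. The test reports a small p-value, so the excess is unlikely to be due to chance alone.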


Step-7 : Model Building

1. Identify a technique, such as logistic regression

2. Build the model

3. Examine the statistical significance / quality of the model

4. Examine the predictive power of the input variables

5. Iterate! Iterate! Iterate!
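Below is a minimal logistic-regression sketch in R using `glm(..., family = binomial)`. The data are simulated under an assumed relationship with invented predictors `below_peer_pay` and `tenure_years`. The fitted coefficients therefore only illustrate the workflow. The predicted probabilities can serve as the per-employee attrition score asked for in Step-1.

```r
set.seed(42)
n <- 500

# Simulated (hypothetical) employee data
below_peer_pay <- rbinom(n, 1, 0.4)   # 1 = paid below the peer-group benchmark
tenure_years   <- runif(n, 0, 10)

# Assumed "true" relationship, used only to generate a toy outcome
p_attrit <- plogis(-1 + 1.5 * below_peer_pay - 0.2 * tenure_years)
attrited <- rbinom(n, 1, p_attrit)
emp <- data.frame(attrited, below_peer_pay, tenure_years)

# Identify a technique and build the model: logistic regression
fit <- glm(attrited ~ below_peer_pay + tenure_years, data = emp, family = binomial)

# Significance and predictive power: estimates, standard errors, z values, p-values
print(summary(fit)$coefficients)

# An attrition score (predicted probability) for every employee
emp$attrition_score <- predict(fit, type = "response")
head(emp)
```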


Step-8 : Biz Narrative

1. Convert statistical model into a business narrative
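One common way to turn a logistic model into a business narrative is to report odds ratios, `exp(coefficient)`. Below is a sketch on invented counts: 30 of 100 below-peer-pay employees left, versus 10 of 100 others. With a single binary predictor, the model's odds ratio equals the table's (30/70)/(10/90), about 3.9.

```r
# Hypothetical data: 100 employees paid below peers, 100 not (invented counts)
emp <- data.frame(
  below_peer_pay = rep(c(1, 0), each = 100),
  attrited       = c(rep(c(1, 0), c(30, 70)),   # 30 of 100 below-peer employees left
                     rep(c(1, 0), c(10, 90)))   # 10 of 100 other employees left
)
fit <- glm(attrited ~ below_peer_pay, data = emp, family = binomial)

# Odds ratio = exp(coefficient): multiplicative change in the odds of attrition
odds_ratio <- exp(coef(fit)[["below_peer_pay"]])

# Convert the statistic into a plain-language sentence
narrative <- sprintf(
  "Employees paid below their peer-group benchmark have %.1f times the odds of leaving.",
  odds_ratio
)
print(narrative)
```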


Step-9 : Actions and ROI

1. Identify specific outbound actions to trigger in response to a detected signal

2. Examine the ROI of the actions

3. Recalibrate
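ROI for an intervention is typically computed as (benefit - cost) / cost. Below is a worked sketch in R. Every number is a hypothetical assumption, including the idea of measuring prevented attrition against a control group. Currency is unspecified.

```r
# ROI as a fraction of the amount spent
roi <- function(benefit, cost) (benefit - cost) / cost

# Hypothetical campaign: retention offers to the 200 highest-scoring employees
employees_targeted   <- 200
cost_per_offer       <- 2000    # assumed cost of one retention action
attritions_prevented <- 15      # assumed, e.g. measured against a control group
replacement_cost     <- 40000   # assumed cost of replacing one employee

cost    <- employees_targeted * cost_per_offer      # 400,000
benefit <- attritions_prevented * replacement_cost  # 600,000
print(roi(benefit, cost))                           # 0.5, i.e. a 50% return
```

If the measured return falls short, the recalibrate step revisits the score threshold or the action itself.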


Using R for Signal extraction

Intro to R

• R is a scripting language for statistical and graphical analysis

• R was initially written by Ross Ihaka and Robert Gentleman at the Department of Statistics, University of Auckland, New Zealand, in the 1990s.

Why R?

1. Inexpensive

2. Expressive

3. Simple

4. Fast

Getting your feet wet…

Your very first common data science commands

Step-1 : Download R

• http://cran.r-project.org/bin/windows/base/

• Please create a directory called /dataproject

Step-2 : How do we find out where we are? “getwd”

• getwd returns the current working directory

• Create a separate folder to hold all training data on your machine

Step-3 : How do we set the working directory? “setwd”

• Change the current working directory using the setwd command
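Here are the two commands above in R. The sketch creates a folder under `tempdir()` instead of `/dataproject`, so it runs on any machine. Substitute your own path.

```r
# Where are we? getwd() returns the current working directory as a string
old_wd <- getwd()
print(old_wd)

# Create a separate folder for training data (a temporary location here)
project_dir <- file.path(tempdir(), "dataproject")
dir.create(project_dir, showWarnings = FALSE)

# Change the working directory; setwd() invisibly returns the previous one
setwd(project_dir)
print(getwd())
```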

Step-4 : How do we load some data into R? “read” (e.g. read.csv, read.table)

• You can also give the full path along with the file name if required

• The read command has three parts:

  1. The name of the R file handler: the R object that will hold the loaded data

  2. The read function

  3. The name of the data file to load
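Below is a self-contained sketch of the read step. It first writes a small hypothetical CSV file with invented columns and values. It then loads the file with `read.csv`, which returns a data frame.

```r
# Write a small CSV so the example is self-contained (hypothetical file and columns)
csv_path <- file.path(tempdir(), "employees.csv")
writeLines(c("employee_id,city,age,tenure_years,attrited",
             "101,Bangalore,32,1.5,1",
             "102,Hyderabad,41,6.0,0",
             "103,Bangalore,34,3.2,1",
             "104,Chennai,29,0.8,0"),
           csv_path)

# file handler <- read function(name of the data file)
emp <- read.csv(csv_path)
```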

Step-5 : How do we see the data we just loaded into R?

• Type the name of the file pointer to print the loaded data
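Printing a large dataset floods the console, so `head` and `tail` are often more practical. Below is a sketch on an invented data frame.

```r
# Hypothetical data frame with 10 rows (invented values)
emp <- data.frame(employee_id  = 101:110,
                  tenure_years = c(1.5, 6, 3.2, 0.8, 4, 2, 7.5, 1, 3, 5))

emp            # typing the name prints the whole data frame
head(emp)      # first 6 rows, handy for large datasets
tail(emp, 3)   # last 3 rows
```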

2. Internalising the meta model of the imported dataset

Step-2.1 : Analysing the meta model. How many rows and columns are present in the imported data?

“dim” shows the number of rows and columns in the imported dataset

Step-2.2 : Analysing the meta model (cont.). What are the columns in the imported dataset?

“names” shows the column names

Step-2.3 : Analysing the meta model (cont.). What are the columns, data types and sample values in the imported dataset?

“str” shows the depth of observations (rows) and the breadth of variables (columns), along with each variable's type and sample values

Step-2.4 : Analysing the meta model (cont.). What are the columns, their classes and the number of observation rows in the imported dataset?

“str” also reports the class of the object and of each column, plus the number of observations
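Here are the three meta-model commands side by side, on a small hypothetical data frame.

```r
# Hypothetical data frame (invented values)
emp <- data.frame(
  employee_id  = c(101L, 102L, 103L),
  city         = c("Bangalore", "Hyderabad", "Bangalore"),
  tenure_years = c(1.5, 6.0, 3.2)
)

dim(emp)     # rows and columns: 3 3
names(emp)   # column names
str(emp)     # class, obs. and variable counts, each column's type and sample values
```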

Step-3 : How do we extract a subset of data based on a condition? “subset”

• 3 key elements:

  1. subset (the command)

  2. file pointer (the data to filter)

  3. condition (the clause each kept row must satisfy)
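The three elements map onto `subset(<file pointer>, <condition>)`. Below is a sketch on invented employee data.

```r
# Hypothetical employee data (invented values)
emp <- data.frame(
  employee_id = 101:106,
  city        = c("Bangalore", "Hyderabad", "Bangalore", "Chennai", "Hyderabad", "Bangalore"),
  age         = c(32, 41, 34, 29, 33, 45)
)

# subset(file pointer, condition): keep rows where the condition is TRUE
bangalore <- subset(emp, city == "Bangalore")
print(bangalore)
```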

Step-4 : How do we extract a subset of data based on MULTIPLE conditions?

• Combine the conditions with logical operators such as & (and) and | (or)

• 5 key elements:

  1. subset (the command)

  2. file pointer

  3. condition-1

  4. separator (the logical operator)

  5. condition-2
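Below, multiple conditions are joined with `&` (all must hold) or `|` (either may hold), again on invented data.

```r
# Hypothetical employee data (invented values)
emp <- data.frame(
  employee_id = 101:106,
  city        = c("Bangalore", "Hyderabad", "Bangalore", "Chennai", "Hyderabad", "Bangalore"),
  age         = c(32, 41, 34, 29, 33, 45)
)

# AND: employees in Bangalore aged 30-35
b_30_35 <- subset(emp, city == "Bangalore" & age >= 30 & age <= 35)

# OR: employees in Hyderabad, or younger than 30
hyd_or_young <- subset(emp, city == "Hyderabad" | age < 30)

print(b_30_35)
print(hyd_or_young)
```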

Step-5 : How do we sort data? “sort”

• sort returns the values in ascending order by default
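`sort` works on vectors. To sort the rows of a data frame, index it with `order`. Below is a sketch with invented ages.

```r
ages <- c(32, 41, 34, 29, 33)
sort(ages)                      # ascending (default): 29 32 33 34 41
sort(ages, decreasing = TRUE)   # descending: 41 34 33 32 29

# sort() returns a sorted vector; order() returns row positions for a data frame
emp <- data.frame(employee_id = 101:105, age = ages)
emp_by_age <- emp[order(emp$age), ]
print(emp_by_age)
```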

Step-6 : How do we count the observations that match a condition? “table”

• The condition acts like a SQL WHERE clause; table returns the number of observations that did and did not match it
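Below is a sketch of counting matches with `table` on invented data. As an alternative, `sum(condition)` returns only the TRUE count.

```r
# Hypothetical employee data (invented values)
emp <- data.frame(city = c("Bangalore", "Hyderabad", "Bangalore", "Chennai", "Bangalore"),
                  age  = c(32, 41, 34, 29, 45))

# The condition plays the role of the WHERE clause; table() counts FALSE and TRUE
counts <- table(emp$age > 30)
print(counts)            # FALSE 1, TRUE 4

sum(emp$age > 30)        # 4: the TRUE count only

# Counting observations per category works the same way
table(emp$city)
```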

Step-7 : How do we find the median of a range of observations? “median”

• median() takes the column on which to find the median, referenced through the file pointer
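Below is a sketch with invented tenure values. Note that `median` returns `NA` when the column has missing values, unless `na.rm = TRUE` is set.

```r
# Hypothetical employee data (invented values)
emp <- data.frame(employee_id = 101:105, tenure_years = c(1.5, 6.0, 3.2, 0.8, 4.0))

median(emp$tenure_years)                         # 3.2

# With missing values, ask median() to drop them
median(c(emp$tenure_years, NA), na.rm = TRUE)    # still 3.2
```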

What are the CORE points to remember?

Parting thought:

Industrial revolution: oil powered machines

Services revolution: analytics powers processes

Big Data Resources

• datasciencecentral.com

• bigdatauniversity.com

• coursera.org

• Big Data Architecture

• Spotting Signals in Big Data

• Signal Extraction Methodology

• Advanced Visualization in Big Data

• Exploratory Data Analysis (EDA) : Quick Deep Dive

• Best practices in designing dashboards and scorecards

• Exploring Big Data Using Bivariate Analysis

• Where to start looking in Big Data using Univariate Analysis

• Big Data Platform & Applications

• Statistics Role in Data Science

• Applied Mathematics Role in Data Science

• Data-Scientist-playbook

• 5-disruption-data-products By Data Science

Good luck hunting for patterns using data science

