Data Analytics Intro Session 1, 2013

Date post: 24-Jan-2015
Upload: florent-renucci
Description:
Introduction to Data Analytics
Introduction to Data Analytics, Session 1. Florent Renucci, Data Scientist, Amadeus North America
Transcript
Page 1: Data Analytics Intro Session 1, 2013

Introduction to Data Analytics, Session 1

Florent Renucci, Data Scientist, Amadeus North America

Page 2: Data Analytics Intro Session 1, 2013

Introduction to Data Analytics

• I – What is it?

• II – What’s in it for us?

• III – How to succeed?

• IV – How to fail?

• V – Does it really work?

• VI – Does it really create value?

Page 3: Data Analytics Intro Session 1, 2013

I – You said Data Science?

“Field of study that gives the computer the ability to learn without being explicitly programmed.”

- Arthur Samuel, 1959

Page 4: Data Analytics Intro Session 1, 2013

I – Building a model

“Numbers have an important story to tell. They rely on us to give them a voice.”

– Stephen Few

Computer Sciences: input X, function f → output Y = f(X)

Data Sciences: input X, output Y → model f such that Y = f(X) + ε
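The contrast between the two views can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rule `true_f`, the noise level and the choice of a straight-line model are all hypothetical and are not taken from the slides.

```python
import random

def true_f(x):
    """Hypothetical ground-truth rule, used here only to simulate data."""
    return 3.0 * x + 1.0

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

rng = random.Random(0)
xs = [i / 10 for i in range(100)]

# Computer-science view: f is known, so Y is simply computed as f(X).
exact = [true_f(x) for x in xs]

# Data-science view: only X and a noisy Y = f(X) + eps are observed ...
ys = [y + rng.gauss(0.0, 0.5) for y in exact]

# ... and the model f is estimated from the observations.
a_hat, b_hat = fit_line(xs, ys)
print(f"estimated model: Y = {a_hat:.2f} * X + {b_hat:.2f}")
```

The estimated coefficients should land close to the simulated rule, but never exactly on it: the gap is the ε term of the slide.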

Page 5: Data Analytics Intro Session 1, 2013

I – Using a model

Page 6: Data Analytics Intro Session 1, 2013

I – What is it?

In a nutshell

• Feed a metaheuristic with the features and the explained phenomenon. The output is the link (the model) between them.

• Use that link to emulate decisions on new data.

Page 17: Data Analytics Intro Session 1, 2013

II – What’s in it for us?

“Those who ignore Statistics are condemned to reinvent it.”

- Brad Efron

Goal: Make predictions
Concept: Classification, Regression
Examples:
• Spam detection
• Biology/medicine
• Fraud detection
• Scoring (Google, Meetic)
• Weather prediction
• Stock prediction

Goal: Understand systems
Concept: Clustering
Examples:
• Speech recognition
• E-marketing
• Sentiment mining
• Recommendation system: "you would also like" in Amazon
• Rare event detection, Obama’s campaign

Goal: Optimize functions
Concept: Discrete optimization, Reinforcement Learning
Examples:
• Automatic investment on financial markets
• Game playing
• Yield management
• Pavlov’s dog
• Those funny things: https://www.youtube.com/watch?v=Lt-KLtkDlh8

Page 26: Data Analytics Intro Session 1, 2013

Introduction to Data Analytics, Session 2

Florent Renucci, Data Scientist, Amadeus North America

Page 27: Data Analytics Intro Session 1, 2013

III – From Data to Knowledge: Typical workflow

“You can have data without information, but you cannot have information without data.”

– Daniel Keys Moran

Map (filter, clean, project) → Reduce (count) → Machine Learning (learn) → Business insight
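A toy version of this workflow, written in plain Python rather than in an actual MapReduce framework, might look as follows. The booking records, the field layout and the "mean price per route" model are hypothetical and serve only to show the three stages.

```python
from collections import Counter

# Hypothetical raw records: (origin, destination, price); None marks a bad row.
raw = [
    ("NCE", "JFK", 540.0),
    ("NCE", "JFK", 610.0),
    ("CDG", "BOS", None),
    ("CDG", "BOS", 480.0),
    ("NCE", "JFK", 575.0),
]

# Map: filter and clean (drop rows without a price), project onto a route key.
mapped = [(f"{o}-{d}", price) for o, d, price in raw if price is not None]

# Reduce: count the observations and sum the prices per route.
counts = Counter(route for route, _ in mapped)
totals = {}
for route, price in mapped:
    totals[route] = totals.get(route, 0.0) + price

# Learn: the simplest possible "model" is the mean price per route.
model = {route: totals[route] / counts[route] for route in counts}
print(model)
```

In a real pipeline the learning stage would be a proper algorithm, but the shape stays the same: the map and reduce stages shrink the data to what the model needs.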

Page 28: Data Analytics Intro Session 1, 2013

III – How to succeed?

In a nutshell

• Keep only the variables that are the most “linked” with the pattern, based on knowledge about the observed phenomenon. This step is usually run in a MapReduce framework.

• Clean the data by deleting non-meaningful observations.

• Preprocess the features to “make them talk”.

• Feed a metaheuristic with this data. The output is the model linking the features to the phenomenon.

• Use the model to emulate new decisions.
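The first three steps can be sketched as below. Everything specific here is an assumption made for illustration: the rows, the feature names `lead_days` and `seat`, the use of Pearson correlation as the measure of "link", and the 0.8 cut-off.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def standardize(xs):
    """Rescale a feature to zero mean and unit variance."""
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    return [(x - m) / s for x in xs]

# Hypothetical observations: two candidate features and a target.
rows = [
    {"lead_days": 1, "seat": 12, "price": 900},
    {"lead_days": 7, "seat": 3, "price": 700},
    {"lead_days": 14, "seat": 25, "price": 520},
    {"lead_days": 30, "seat": 8, "price": None},   # non-meaningful row
    {"lead_days": 60, "seat": 17, "price": 310},
]

# Clean: delete observations that cannot be used.
clean = [r for r in rows if r["price"] is not None]
target = [r["price"] for r in clean]

# Select: keep the features most linked with the target, then preprocess them.
threshold = 0.8  # assumed cut-off, to be tuned in practice
selected = {}
for name in ("lead_days", "seat"):
    values = [r[name] for r in clean]
    if abs(pearson(values, target)) >= threshold:
        selected[name] = standardize(values)

print(sorted(selected))
```

With these made-up numbers only `lead_days` survives the selection; the standardized column is what would then be fed to the learning algorithm.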

Page 30: Data Analytics Intro Session 1, 2013

IV – What is a good predictive model?

“Essentially, all models are wrong, but some are useful.”

- G. Box

[Figure: predicted variable Y plotted against feature X]

Page 35: Data Analytics Intro Session 1, 2013

IV – What is a good predictive model?

[Figure: three fits of the predicted variable Y against feature X, labelled underfitting, good level of complexity, and overfitting]

Page 37: Data Analytics Intro Session 1, 2013

IV – What is a good predictive model?

Error rate: how to measure it?

Page 38: Data Analytics Intro Session 1, 2013

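IV – Choose the right metric!

“What gets measured, gets managed.”

- Peter Drucker

• $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$

• $\text{adjusted RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-k}}$

• $\mathrm{AIC} = 2\,(k - \log L)$

• $\mathrm{BIC} = k \ln n - 2 \ln L$

Here n is the number of observations, k the number of model parameters, and L the maximized likelihood.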
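As a rough sketch, the four metrics translate directly into Python. Note that the last formula on the slide, k·ln(n) − 2·ln(L), is the form usually called BIC, and it is named that way below. The sample values of `y` and `y_hat` are hypothetical, and the Gaussian log-likelihood helper assumes independent, normally distributed errors, which the slides do not state.

```python
import math

def rmse(y, y_hat):
    """Root mean squared error."""
    n = len(y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / n)

def adjusted_rmse(y, y_hat, k):
    """RMSE penalised for the k parameters of the model."""
    n = len(y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / (n - k))

def aic(k, log_likelihood):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood

def bic(k, n, log_likelihood):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return k * math.log(n) - 2 * log_likelihood

def gaussian_log_likelihood(y, y_hat):
    """Maximised log-likelihood, assuming i.i.d. normal errors."""
    n = len(y)
    sigma2 = sum((a - b) ** 2 for a, b in zip(y, y_hat)) / n
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)

y = [3.0, 5.0, 7.0, 9.0]        # hypothetical observations
y_hat = [2.0, 5.0, 8.0, 9.0]    # hypothetical predictions
k = 2                           # e.g. slope and intercept

log_l = gaussian_log_likelihood(y, y_hat)
print(rmse(y, y_hat), adjusted_rmse(y, y_hat, k))
print(aic(k, log_l), bic(k, len(y), log_l))
```

All four scores are "lower is better"; they differ in how strongly they penalise the number of parameters k.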

Page 39: Data Analytics Intro Session 1, 2013

IV – Choose the right metric!

No Silver Bullet: no single metric fits every problem.

Page 40: Data Analytics Intro Session 1, 2013

IV – What is a good predictive model?

“If you do not know how to ask the right question, you discover nothing.”

- Deming

[Figure: error on the learning set and on the test set, plotted against model complexity and against time]

Accuracy and robustness
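The gap between learning-set error and test-set error can be reproduced with a small experiment. The sketch below uses k-nearest-neighbour regression, chosen only because its complexity is controlled by a single number; the sine-plus-noise data and the set sizes are hypothetical.

```python
import math
import random

def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mean_squared_error(train, data, k):
    """Error of a k-NN model fitted on `train`, measured on `data`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)

def sample(rng, n):
    """Hypothetical data: a sine curve observed with Gaussian noise."""
    points = []
    for _ in range(n):
        x = rng.uniform(0.0, 6.0)
        points.append((x, math.sin(x) + rng.gauss(0.0, 0.3)))
    return points

rng = random.Random(1)
learning_set = sample(rng, 60)
test_set = sample(rng, 60)

# Small k = very flexible model (high complexity); k = 60 = a constant model.
errors = {}
for k in (1, 5, 15, 60):
    errors[k] = (
        mean_squared_error(learning_set, learning_set, k),
        mean_squared_error(learning_set, test_set, k),
    )
    print(f"k={k:2d}  learning error={errors[k][0]:.3f}  test error={errors[k][1]:.3f}")
```

With k = 1 the model reproduces the learning set perfectly, so its learning error is zero, yet its test error is not: a low learning error measures accuracy on known data, while the test error measures robustness on new data.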

Page 42: Data Analytics Intro Session 1, 2013

Introduction to Data Analytics, Session 3

Florent Renucci, Data Scientist, Amadeus North America

Page 43: Data Analytics Intro Session 1, 2013

II – What’s in it for us?

“Without big data analytics, companies are blind and deaf, wandering out onto the web like deer on a freeway.”

- Geoffrey Moore

Goal: Make predictions
Concept: Classification, Regression
Examples:
• Spam detection
• Biology/medicine
• Fraud detection
• Scoring (Google, Meetic)
• Weather prediction
• Stock prediction

Goal: Understand systems
Concept: Clustering
Examples:
• Speech recognition
• E-marketing
• Sentiment mining
• Recommendation system: "you would also like" in Amazon
• Rare event detection, Obama’s campaign

Goal: Optimize functions
Concept: Discrete optimization, Reinforcement Learning
Examples:
• Automatic investment on financial markets
• Game playing
• Yield management
• Pavlov’s dog
• Those funny things: https://www.youtube.com/watch?v=Lt-KLtkDlh8

Page 46: Data Analytics Intro Session 1, 2013

II – What’s in it for us?

“Without big data analytics, companies are blind and deaf, wandering out onto the web like deer on a freeway.”

- Geoffrey Moore

Goal: Make predictions
Concept: Classification, Regression
Algos:
• Random Forest
• Naive Bayes classifier
• Support Vector Machine
• Artificial Neural Network
• Bagging
• Time Series

Goal: Understand systems
Concept: Clustering
Algos:
• K-means
• Fuzzy clustering
• Artificial Neural Network
• DBSCAN
• Expectation-Maximisation

Goal: Optimize functions
Concept: Discrete optimization, Reinforcement Learning
Algos:
• Gradient Descent
• Max-flow min-cut
• Belief propagation
• Temporal difference learning
• Q-learning

No Silver Bullet

Page 47: Data Analytics Intro Session 1, 2013

II – What’s in it for us?

No Free Lunch Theorem: no single algorithm is the best choice for every problem.

Page 50: Data Analytics Intro Session 1, 2013

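V – Example 1 (see R simulation): Classification and Regression Tree

[Figure: coloured points in (x, y, z) space]

Decision tree:

• Color? Split on up / down: one branch answers “Green!”, the other asks again.

• Color? Split on left / right: “Purple!” or “Orange!”.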
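A small Python sketch can show how such a two-question tree is learned. It is not the R simulation mentioned on the slide. The points are hypothetical, and it is an assumption of this sketch that "up/down" is a split on z, that "left/right" is a split on x, and that the "up" branch is the green one. Splits are chosen by minimising the Gini impurity, which is the usual CART criterion for classification.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0 means a pure node)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(points, axis):
    """Threshold on one coordinate that minimises the weighted Gini impurity."""
    values = sorted({coords[axis] for coords, _ in points})
    best = None
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [label for coords, label in points if coords[axis] <= threshold]
        right = [label for coords, label in points if coords[axis] > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(points)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical coloured points in (x, y, z) space.
points = [
    ((-1, 0, 2), "green"), ((1, 0, 3), "green"),
    ((-2, 0, -1), "purple"), ((-1, 1, -2), "purple"),
    ((2, 0, -1), "orange"), ((1, 1, -3), "orange"),
]

# First question: up or down? (assumed here to be a split on z)
z_cut, _ = best_split(points, axis=2)
down = [p for p in points if p[0][2] <= z_cut]

# Second question, asked only for the "down" points: left or right? (split on x)
x_cut, _ = best_split(down, axis=0)

def predict_color(x, y, z):
    """Walk the learned two-level tree."""
    if z > z_cut:
        return "green"
    return "purple" if x <= x_cut else "orange"

print(z_cut, x_cut, predict_color(0.5, 0.0, -1.0))
```

A full CART implementation would search every axis at every node and stop on a purity or depth criterion; here the axes are fixed to keep the example short.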

Page 51: Data Analytics Intro Session 1, 2013

V – Example 1 (see R simulation): Random Forest

• Learning step: building n trees from n random subsets.

• Testing step: can I really trust this tree? Each tree gets a weight: α1, α2, α3, …, αn.

• Boosting step: smartly combining the trees to avoid overfitting: $Y = \sum_{i=1}^{n} \alpha_i y_i$

• Prediction step: what is the color/value of the output?

• Evaluation step: how good are my predictions? error rate = ?
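The five steps can be imitated with very small "trees". The sketch below is a weighted ensemble of one-split regression stumps, not a full random forest implementation: the data, the number of trees, the subset size and the rule "weight = inverse of the held-out error" are all assumptions made for illustration.

```python
import random

def fit_stump(sample):
    """One-split regression 'tree': cut at the median x, predict the mean y on each side."""
    xs = sorted(x for x, _ in sample)
    threshold = xs[len(xs) // 2]
    left = [y for x, y in sample if x < threshold]
    right = [y for x, y in sample if x >= threshold]
    left_mean = sum(left) / len(left)
    right_mean = sum(right) / len(right)
    return lambda x: left_mean if x < threshold else right_mean

rng = random.Random(42)
# Hypothetical data: a step from 0 to 10 at x = 5, observed with noise.
data = [(i / 10, (0.0 if i < 50 else 10.0) + rng.gauss(0.0, 1.0)) for i in range(100)]

n_trees, subset_size = 5, 30
trees, alphas = [], []
for _ in range(n_trees):
    subset = rng.sample(data, subset_size)             # learning step
    tree = fit_stump(subset)
    held_out = [p for p in data if p not in subset]    # testing step
    err = sum((tree(x) - y) ** 2 for x, y in held_out) / len(held_out)
    trees.append(tree)
    alphas.append(1.0 / (err + 1e-9))                  # trust a tree more if it errs less

total = sum(alphas)
alphas = [a / total for a in alphas]                   # weights alpha_i sum to 1

def forest_predict(x):
    """Combination step: Y = sum_i alpha_i * y_i."""
    return sum(a * t(x) for a, t in zip(alphas, trees))

# Evaluation step: mean squared error of the combined prediction.
error_rate = sum((forest_predict(x) - y) ** 2 for x, y in data) / len(data)
print(f"prediction at x=2.0: {forest_predict(2.0):.2f}, error = {error_rate:.2f}")
```

Because the weights are positive and sum to one, the combined prediction always lies between the most extreme individual trees, which is what dampens the overfitting of any single tree.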

Page 52: Data Analytics Intro Session 1, 2013

V – Example 2 (see R simulation): Univariate Time-Series AutoRegression

The idea

• The process is entirely explained by its past values.

• The goal is to find the link between k successive values and the next one. “k” (the lag order) and the link have to be estimated.

• Each process (travel) is considered independently.

Definition

• $Y_t = f(Y_{t-k}, \dots, Y_{t-1})$

• “k”: how many points are related to each other?

• “f”: what was the underlying pattern? What happened from (t − k) to (t − 1)?

[Figure: predicted variable Y_t plotted against time t]

Page 53: Data Analytics Intro Session 1, 2013

V – Example 2 (see R simulation): Multivariate Time-Series AutoRegression

The idea

• The process is entirely explained by its past values AND by the past values of its “neighbors”.

• The goal is to find the link between k successive values and the next ones. “k” (the lag order), the p neighbors, and the link have to be estimated.

• Processes (travels) are considered as correlated.

Definition

• $Y_t = f(Y_{t-k}, \dots, Y_{t-1}, Z_{t-k'}, \dots, Z_{t-1}, \dots)$

• Again “k”, “f”.

• “p”, “k’”: what neighbors should I consider? How “late” are they?

[Figure: predicted variable Y_t plotted against time t, together with Z_t, a "good" neighbor]

Page 58: Data Analytics Intro Session 1, 2013

IV – Data Analytics

“Data really powers everything that we do.”

– Jeff Weiner

Goal: Make predictions
Concept: Classification, Regression
Algos:
• Random Forest
• Naive Bayes classifier
• Support Vector Machine
• Artificial Neural Network
• Bagging
• Time Series

Goal: Understand systems
Concept: Clustering
Algos:
• K-means
• Fuzzy clustering
• Artificial Neural Network
• DBSCAN
• Expectation-Maximisation

Goal: Optimize functions
Concept: Discrete optimization, Reinforcement Learning
Algos:
• Gradient Descent
• Max-flow min-cut
• Belief propagation
• Temporal difference learning
• Q-learning

Business value?

Mathematical concept?

Implementation?

Page 59: Data Analytics Intro Session 1, 2013

IV – Data Analytics

“Listening to the data is important… but so is experience and intuition. After all, what is intuition at its best but large amounts of data of all kinds filtered through a human brain rather than a math model?”

- Steve Lohr

Page 67: Data Analytics Intro Session 1, 2013

IV – Data Analytics

“Information is the oil of the 21st century, and Data Analytics is the combustion engine.”

- P. Sondergaard

[Diagram: overlapping domains Business, IT, Data Sciences and High tech; the overlaps are labelled Business Insights, Data processing and Breakthrough Innovation]

Requirements

• Business expertise.

• Technical expertise.

• Mathematical expertise and links with academic research.

• Big Data / Meaningful Data, and relevant questions.

• Investments in R&D.

• Consistency with core activities.

• Maturity of the industry.

• Infrastructure.

• Culture of innovation.

Page 68: Data Analytics Intro Session 1, 2013

Thanks. Any questions?

“It’s easy to lie with statistics. It’s hard to tell the truth without it.”

- Andrejs Dunkels

Page 69: Data Analytics Intro Session 1, 2013

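Annex A – Univariate Time-Series AutoRegression

The idea

• The process is entirely explained by its past values.

• The goal is to find the link between k successive values and the next one. “k” (the lag order) and the link have to be estimated.

• Each process (travel) is considered independently.

Definition

$\exists\, k \in \mathbb{N},\ \exists\, (\alpha_1, \dots, \alpha_k) \in \mathbb{R}^k$ such that:

$Y_t = \alpha_1 Y_{t-1} + \alpha_2 Y_{t-2} + \dots + \alpha_k Y_{t-k} + \varepsilon_t$

i.e. $Y_t = \sum_{i=1}^{k} \alpha_i Y_{t-i} + \varepsilon_t$

So $\hat{Y}_t = \sum_{i=1}^{k} \alpha_i Y_{t-i} = (Y_{t-1}, \dots, Y_{t-k})^T \cdot (\alpha_1, \dots, \alpha_k) \overset{\mathrm{def}}{=} XA$

Estimating A (OLS method)

$\hat{Y}_t \overset{\mathrm{def}}{=} XA$, then $Y_t - \hat{Y}_t = \varepsilon_t$

$A_{optim} = \arg\min_{\hat{Y}_t} \lVert Y_t - \hat{Y}_t \rVert_2^2 = \arg\min_{A} \lVert XA - Y_t \rVert_2^2$

First-order condition: $2 X^T X A_{optim} - 2 X^T Y = 0$, where Y stacks the observed values $Y_t$.

So $A_{optim} = (X^T X)^{-1} X^T Y$

Estimating k

Penalization for non-parsimonious models:

• $(k, A_{optim}) = \arg\min_{k, \hat{Y}_t} \lVert Y_t - \hat{Y}_t \rVert_2^2 + 2k$ (Mallows, 73)

• Akaike Criterion (same with log) (Akaike, 73)

• Student test and forward-backward algorithm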
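The OLS estimate of an autoregression can be sketched in plain Python by building the lagged matrix X and solving the normal equations (XᵀX)·A = XᵀY. The lag order k is taken as given here (its selection is not shown), and the test series is a hypothetical noise-free AR(2) process, chosen so that the true coefficients are known.

```python
def solve(a, b):
    """Solve the linear system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ar(series, k):
    """OLS estimate of (alpha_1..alpha_k) in Y_t = sum_i alpha_i * Y_{t-i} + eps_t."""
    # Each row of X holds the k previous values (Y_{t-1}, ..., Y_{t-k}).
    rows = [[series[t - i] for i in range(1, k + 1)] for t in range(k, len(series))]
    targets = series[k:]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical series generated by Y_t = 0.5 * Y_{t-1} + 0.3 * Y_{t-2}.
series = [1.0, 2.0]
for _ in range(18):
    series.append(0.5 * series[-1] + 0.3 * series[-2])

alphas = fit_ar(series, k=2)
forecast = sum(a * series[-i] for i, a in enumerate(alphas, start=1))
print(alphas, forecast)
```

On real data the residual ε is not zero, so the estimated coefficients only approximate the underlying process, and k would be chosen with one of the penalised criteria listed above.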

Page 70: Data Analytics Intro Session 1, 2013

Annex B – Multivariate Time-Series AutoRegression

The idea

• The process is entirely explained by its past values AND by the past values of its “neighbors”.

• The goal is to find the link between k successive values and the next ones. “k” (the lag order), the p neighbors, and the link have to be estimated.

• Processes (travels) are considered as correlated.

Definition

$\exists\, k \in \mathbb{N},\ \exists\, (\alpha_1, \dots, \alpha_k) \in \mathbb{R}^{k \times p}$ (vectors!) such that:

$Y_t = \alpha_1 \cdot Y_{t-1} + \alpha_2 \cdot Y_{t-2} + \dots + \alpha_k \cdot Y_{t-k} + \varepsilon_t$

i.e. $Y_t = \sum_{i=1}^{k} \alpha_i \cdot Y_{t-i} + \varepsilon_t$

So $\hat{Y}_t \overset{\mathrm{def}}{=} XA$ (matrices!)

Estimating $k_{max}$

Penalization for non-parsimonious models:

• $(k, A_{optim}) = \arg\min_{k, \hat{Y}_t} \lVert Y_t - \hat{Y}_t \rVert_2^2 + 2k$ (Mallows, 73)

• Akaike Criterion (same with log) (Akaike, 73)

• Student test and forward-backward algorithm

Estimating k for each neighbor

$\text{cross-correlation}(Y^{t_0}, Y^{t_0+i})_t(\tau) = \operatorname{cov}(Y^{t_0}_t, Y^{t_0+i}_{t-\tau})$

$p = \arg\max_{i < k_{max}} \big(\text{cross-correlation}(Y^{t_0}, Y^{t_0+i})_{(0)}(\tau) > \text{threshold}\big)$

$k_i = \arg\max_{\tau \in \mathbb{N}} \big(\text{cross-correlation}(Y^{t_0}, Y^{t_0+i})_t(\tau) > \text{threshold}\big)$

Threshold: chosen empirically from a given grid.
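The lag selection for one neighbor can be sketched as below. Two choices here are assumptions rather than the slide's exact method: the covariance is normalised into a correlation so that the threshold does not depend on the scale of the series, and the selection rule is read as "keep the lags whose cross-correlation exceeds the threshold". The two series and the 0.5 threshold are hypothetical.

```python
import math
import random

def cross_corr(y, z, tau):
    """Sample correlation between y[t] and z[t - tau], for tau >= 0."""
    pairs = [(y[t], z[t - tau]) for t in range(tau, len(y))]
    n = len(pairs)
    mean_y = sum(a for a, _ in pairs) / n
    mean_z = sum(b for _, b in pairs) / n
    cov = sum((a - mean_y) * (b - mean_z) for a, b in pairs)
    var_y = sum((a - mean_y) ** 2 for a, _ in pairs)
    var_z = sum((b - mean_z) ** 2 for _, b in pairs)
    return cov / math.sqrt(var_y * var_z)

rng = random.Random(7)
z = [rng.gauss(0.0, 1.0) for _ in range(200)]   # hypothetical neighbor series
y = [0.0, 0.0] + z[:-2]                          # y follows z with a delay of 2

threshold = 0.5   # assumed value; the annex picks it empirically from a grid
max_lag = 5
lags_above = [tau for tau in range(max_lag + 1) if cross_corr(y, z, tau) > threshold]
print(lags_above)
```

Only the true delay passes the threshold in this toy case; a neighbor for which no lag passes would simply be left out of the multivariate model.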

Page 71: Data Analytics Intro Session 1, 2013

Annex C – Gradient descent algorithm in a 3-dimensional space
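Since only the title of this annex survives, here is a minimal sketch of what gradient descent does on a surface z = f(x, y), i.e. in a 3-dimensional space. The quadratic function, the starting point and the step size are hypothetical choices made so that the minimum, at (1, −2), is known in advance.

```python
def f(x, y):
    """Hypothetical bowl-shaped surface with its minimum at (1, -2)."""
    return (x - 1.0) ** 2 + 2.0 * (y + 2.0) ** 2

def grad(x, y):
    """Analytical gradient of f."""
    return 2.0 * (x - 1.0), 4.0 * (y + 2.0)

x, y = 5.0, 5.0        # arbitrary starting point
learning_rate = 0.1    # step size; too large a value would make the iterates diverge

for step in range(200):
    gx, gy = grad(x, y)
    # Move a small step against the gradient, i.e. downhill on the surface.
    x -= learning_rate * gx
    y -= learning_rate * gy

print(f"minimum found near x={x:.4f}, y={y:.4f}, f={f(x, y):.6f}")
```

On a convex surface like this one the iterates converge to the single minimum; on a surface with several valleys the result would depend on the starting point.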

