Data science in action

Posted on 24-Jan-2018



Copyright © 2012, SAS Institute Inc. All rights reserved.

GOEDE TIJDEN SLECHTE TIJDEN, RESTAURANT REVIEWS,

BRAD PITT AND THE IKEA BILLY INDEX

Longhow Lam – Freelance Data Scientist

https://www.linkedin.com/in/longhowlam

https://longhowlam.wordpress.com

@longhowlam

Data Science in Action

AGENDA

TEXT MINING AND MACHINE LEARNING

SOME CRAZY EXAMPLES

Goede tijden Slechte tijden

IENS Restaurant Reviews

Who looks like Brad Pitt?

The IKEA Billy Index

Text mining and Machine Learning

Text mining: simple example

Doc 1 “I walked accross the street in Amsterdam, 1057DK, with my bike”

Doc 2 “She didn’t walk but cycled with her blue biike, //bitly.com/sdrtw”

Doc 3 “My bicycle is broken, what a piece of junk, @#$%$@!”

Terms                   Doc 1   Doc 2   Doc 3
+Bicycle (noun)           1       1       1
Cycling (verb)            0       1       0
Blue (adjective)          0       1       0
Amsterdam (location)      1       0       0
+Walk (verb)              1       1       0
Street (noun)             1       0       0
Broken (adjective)        0       0       1
Piece of junk (noun)      0       0       1
1057DK (postal code)      1       0       0
//bitly.com/sdrtw         0       1       0

TERM DOCUMENT MATRIX: A

• Every text document becomes a (very) long vector of term counts (with many zeros!)

• Data mining techniques are applied to this matrix A
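A term document matrix like A can be sketched in a few lines of Python. This is a minimal illustration, not how SAS Text Miner builds it: the SYNONYMS map and STOPWORDS list are hypothetical hand-made stand-ins for real stemming, spelling correction and synonym handling, and there is no part-of-speech or entity tagging, so "piece of junk" ends up as separate words and the URL is split into pieces. Rows are terms and columns are documents, as in the table.

```python
import re
from collections import Counter

# Hypothetical normalisation map (an assumption) standing in for stemming,
# spelling correction and synonym lists: "biike" and "bike" become "bicycle".
SYNONYMS = {"bike": "bicycle", "biike": "bicycle", "walked": "walk", "cycled": "cycling"}

# Hypothetical, deliberately tiny stop-word list (an assumption).
STOPWORDS = {"i", "the", "in", "with", "my", "she", "didn", "t", "but",
             "her", "is", "what", "a", "of"}


def tokenize(text):
    """Lower-case, split into alphanumeric tokens, drop stop words, map synonyms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [SYNONYMS.get(tok, tok) for tok in tokens if tok not in STOPWORDS]


def term_document_matrix(docs):
    """Return (terms, A) where A[i][j] is the count of terms[i] in docs[j]."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    terms = sorted(set().union(*counts))
    A = [[c[term] for c in counts] for term in terms]
    return terms, A


docs = [
    "I walked accross the street in Amsterdam, 1057DK, with my bike",
    "She didn't walk but cycled with her blue biike, //bitly.com/sdrtw",
    "My bicycle is broken, what a piece of junk, @#$%$@!",
]
terms, A = term_document_matrix(docs)
rows = dict(zip(terms, A))
print(rows["bicycle"])  # [1, 1, 1]
print(rows["walk"])     # [1, 1, 0]
```

With a realistic vocabulary of thousands of terms, each document column is long and almost entirely zeros, which is why sparse storage is normally used.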


TEXT MINING: PREDICT OR CLUSTER

Combine texts and “normal data” to predict behaviour (churn / fraud)

Use machine learning to train a learner f to predict the TARGET

Automatically create topics / clusters in huge piles of documents

Apply cluster techniques to divide documents into topics

[Figure: documents grouped into Topic 1, Topic 2 and Topic 3]
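The cluster idea can be illustrated with plain k-means (Lloyd's algorithm) on document vectors, i.e. one row per document holding its term counts (the columns of a matrix like A). This is a sketch under assumptions, not the SAS implementation: the four toy vectors are invented, and the farthest-point seeding is chosen only to make the result deterministic. Real pipelines usually weight the counts (for example TF-IDF) and reduce dimensions before clustering.

```python
import math


def euclidean(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def farthest_point_seeds(vectors, k):
    """Deterministic seeding (an assumption): start with the first vector, then
    repeatedly add the vector farthest from all seeds chosen so far."""
    seeds = [list(vectors[0])]
    while len(seeds) < k:
        far = max(vectors, key=lambda v: min(euclidean(v, s) for s in seeds))
        seeds.append(list(far))
    return seeds


def kmeans(vectors, k, max_iter=100):
    """Lloyd's k-means: returns (labels, centroids)."""
    centroids = farthest_point_seeds(vectors, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each document goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: euclidean(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its documents.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Invented document vectors: one row per document, columns = hypothetical
# terms "bicycle", "wheel", "pizza", "pasta".
doc_vectors = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 3, 2],
    [0, 1, 2, 3],
]
labels, _ = kmeans(doc_vectors, k=2)
print(labels)  # [0, 0, 1, 1]
```

Each label is a topic cluster; the terms with the largest centroid values are what one would inspect to name the topic.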


MACHINE LEARNING: SOME ALGORITHMS

Predict: Linear regression, Trees, Random Forests, Neural networks

Cluster: K-means, Hierarchical clustering, DBSCAN

Linear regression: y = f(x) = a0 + a1x1 + a2x2 + … + anxn

Neural networks: y = f(g(h(x)))
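The linear regression formula y = f(x) = a0 + a1x1 + … + anxn is usually fitted by ordinary least squares. A minimal sketch, assuming small, well-conditioned data: it forms the normal equations (XᵀX)a = Xᵀy and solves them by Gauss-Jordan elimination in plain Python. The helper names fit_linear_regression and predict are hypothetical; in practice a statistics package does this more robustly.

```python
def fit_linear_regression(X, y):
    """Least-squares coefficients [a0, a1, ..., an] for y = a0 + a1*x1 + ... + an*xn.

    Builds the normal equations (X'X) a = X'y and solves them by Gauss-Jordan
    elimination with partial pivoting. Toy-sized, well-conditioned data only.
    """
    rows = [[1.0] + [float(v) for v in x] for x in X]  # leading 1 = intercept a0
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    M = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    for col in range(p):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate this column from every other row.
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


def predict(coefs, x):
    """Evaluate y = a0 + a1*x1 + ... + an*xn for one observation x."""
    return coefs[0] + sum(a * xi for a, xi in zip(coefs[1:], x))


# Toy data generated from y = 1 + 2*x1 + 3*x2 (no noise), so the fit should
# recover the coefficients [1, 2, 3].
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 4, 6, 8]
coefs = fit_linear_regression(X, y)
print([round(c, 6) for c in coefs])  # [1.0, 2.0, 3.0]
```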


GTST ANALYSIS: TEXT ANALYTICS

Business pain

Looking at GTST (Dutch soap): what the heck is this all about?

Are there trends in the series, or is it all the same?

Approach

Take the 5,000 episode summaries and apply text mining in SAS


GTST ANALYSIS: RESULTS

Main topics in 5000 episodes


GTST ANALYSIS: DISTANCES BETWEEN TOPICS


GTST ANALYSIS: ZOOMING IN ON A TOPIC


GTST ANALYSIS: ZOOMING IN ON A TOPIC

Sub-topics of main topic: topic 16 (Ludo, Isabelle, Martine, Janine)

Harmsen feeling lonely.

Plan by Jack, dangerous

Writing a farewell letter

Panic, fear

Questions about giving kid assignment

Getting money back, paying

IMPORTANT: Business validation!

I asked my wife; she used to be a loyal GTST watcher.


GTST ANALYSIS: TREND RESULTS

Trends over time with the SAS text profile feature


GTST ANALYSIS: TRENDS OVER TIME


GTST ANALYSIS: SIMILARITY OF EPISODES THROUGH THE YEARS
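One way to measure how similar episodes are is the cosine similarity between their term count vectors, computed for every pair of episodes to give a similarity matrix. The slide does not say which measure SAS used, so treat this as an assumed, illustrative choice; the three "summaries" are invented, not GTST data, and the tokenizer is a bare whitespace split.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Term counts for a lower-cased, whitespace-split text (no stemming)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors; 1 = same direction."""
    dot = sum(count * b.get(term, 0) for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Invented one-line "episode summaries", for illustration only.
episodes = [
    "anna plans a surprise party for tom",
    "tom discovers the surprise party",
    "the bakery goes bankrupt",
]
vectors = [bag_of_words(e) for e in episodes]
matrix = [[cosine_similarity(u, v) for v in vectors] for u in vectors]
for row in matrix:
    print([round(s, 2) for s in row])
```

To compare years, one could average the similarities of all episode pairs from year i and year j; that aggregation is also an assumption, not taken from the slide.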


Can you shake hands with your neighbor?

A LITTLE STATISTICAL EXPERIMENT

Two statistics that I'd like to share:


50.1% of people don’t wash their hands after visiting the toilet

84.6% of all statistics are just made up on the spot!!


IENS RESTAURANT PATH ANALYTICS

Business pain

I have eaten Chinese; where should I go next?

Approach

Look at what others do: the IENS restaurant reviewers!
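A simple version of "look at what others do" is to count, over all reviewers, which cuisine they reviewed directly after another one, and suggest the most frequent follower. This first-order transition count is a hypothetical stand-in, not the path analysis used to generate the paths on the slides; the reviewer paths below are invented, and transition_counts and suggest_next are illustrative names.

```python
from collections import Counter, defaultdict


def transition_counts(paths):
    """Count how often cuisine B directly follows cuisine A in each reviewer's
    chronologically ordered list of reviewed restaurants."""
    transitions = defaultdict(Counter)
    for path in paths:
        for current, nxt in zip(path, path[1:]):
            transitions[current][nxt] += 1
    return transitions


def suggest_next(transitions, current):
    """Most frequent next cuisine after `current`, or None if never observed."""
    followers = transitions.get(current)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Invented reviewer paths (cuisine of each consecutive review), not IENS data.
paths = [
    ["Chinese", "Italian", "French"],
    ["Chinese", "Italian"],
    ["Chinese", "Japanese", "Italian"],
    ["Italian", "French"],
]
transitions = transition_counts(paths)
print(suggest_next(transitions, "Chinese"))  # Italian
print(dict(transitions["Chinese"]))          # {'Italian': 2, 'Japanese': 1}
```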


A FEW FACTS… IENS DATA (TRADITIONAL BI)

Most frequently occurring restaurant name (39 times)

Among “Dutch” restaurants (6 times)

% sustainable kitchens: Organic (67%), French (58%), Fish (44%), Vegetarian (39%), Chinese (3%)

700 reviews on a “normal” Saturday

Valentine's Day 2015: 1,200 reviews (1.7 times as many)

23 times

12 times


IENS RESTAURANT PATH ANALYSIS: GENERATED PATHS


IENS REVIEWS: CAN SENTIMENT BE PREDICTED?

Translate the reviews into a term document matrix

Apply machine learning to predict scores

Why would you do this?


IENS REVIEWS: CAN I PREDICT THE SENTIMENT?


IENS REVIEWS: PREDICT THE ‘EAT’ SCORE

Neural network (2 × 20): R² of 0.65

Linear regression model: R² of 0.56
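R² (the coefficient of determination) compares a model's squared errors with the spread of the given scores around their mean: R² = 1 − SS_res / SS_tot. An R² of 0.65 can therefore be read as the model explaining about 65% of the variance in the given ‘eat’ scores. A minimal sketch with invented scores:

```python
def r_squared(actual, predicted):
    """Coefficient of determination R² = 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_tot = sum((y - mean) ** 2 for y in actual)                  # spread around the mean
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))  # model errors
    return 1 - ss_res / ss_tot


# Invented review scores and model predictions, for illustration only.
given = [6, 7, 8, 9]
predicted = [6.5, 7, 7.5, 9]
print(round(r_squared(given, predicted), 3))  # 0.9
```

A model that always predicts the mean score gets R² = 0, and a perfect model gets R² = 1.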


IENS REVIEWS: PREDICTING THE ‘EAT’ SCORE

[Figure: predicted review score vs. given review score]


IENS REVIEWS: SENTIMENT ANALYSIS / PREDICTIVE MODELING


OUTLIERS IN FACES: DATA MINING & MACHINE LEARNING

Business pain

Tell me: Who has a strange face at SAS Netherlands?

Approach

Take the SAS Netherlands photos, translate them into data, and apply machine learning


OUTLIERS IN FACES: DATA MINING & MACHINE LEARNING


STRANGE FACE DETECTION: COMBO OF OPEN API & SAS

Use Face++ to do facial landmarking (no deep learning!!)

Import all landmarks into SAS as an ABT (analytical base table)

Now you can solve some funny business issues with machine learning:

• Which persons are look-alikes? (hierarchical clustering)

• Are there any account managers? (predictive modeling / machine learning)

• Who is the Brad Pitt at SAS? (nearest neighbor)

• Funny faces (anomaly / outlier detection)


STRANGE FACE DETECTION: HIERARCHICAL CLUSTERING
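Hierarchical clustering of faces can be sketched as bottom-up merging: start with every person as their own cluster and repeatedly merge the two closest clusters; the merge history is what a dendrogram draws. The sketch assumes each face has already been turned into a numeric feature vector (for example flattened, normalised landmark coordinates) and uses single linkage; the slide does not say which linkage was used. The 2-D vectors are invented, and math.dist needs Python 3.8 or later.

```python
import math
from itertools import combinations


def single_linkage(features):
    """Agglomerative (bottom-up) clustering with single linkage.

    features: dict person -> feature vector (assumed: flattened, normalised landmarks).
    Returns the merge history as (cluster_a, cluster_b, distance), in merge order.
    """
    clusters = [frozenset([name]) for name in features]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Single linkage: distance between the two closest members.
            d = min(math.dist(features[x], features[y]) for x in a for y in b)
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        clusters = [c for c in clusters if c != a and c != b] + [a | b]
        merges.append((sorted(a), sorted(b), d))
    return merges


# Invented 2-D feature vectors for four anonymous people.
faces = {"p1": (0.0, 0.0), "p2": (0.0, 1.0), "p3": (5.0, 5.0), "p4": (5.0, 6.5)}
for a, b, d in single_linkage(faces):
    print(a, "+", b, "at distance", round(d, 2))
```

People merged early and at small distances are the look-alike candidates.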


STRANGE FACE DETECTION: BRAD PITT LOOK-ALIKES…
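Finding "the Brad Pitt at SAS" is a nearest-neighbor search: compute the distance from Brad Pitt's feature vector to every colleague's and take the closest. A minimal sketch, assuming all faces are described by the same landmarks in the same order and have been normalised for position and scale; the vectors and names are invented placeholders.

```python
import math


def k_nearest(target, people, k=1):
    """Names of the k people whose feature vectors are closest (Euclidean) to target."""
    return sorted(people, key=lambda name: math.dist(target, people[name]))[:k]


# Invented, already-normalised landmark features (three numbers per face).
brad_pitt = (1.0, 2.0, 3.0)
colleagues = {
    "colleague_a": (1.1, 2.0, 2.9),
    "colleague_b": (3.0, 0.5, 1.0),
    "colleague_c": (0.0, 2.5, 3.5),
}
print(k_nearest(brad_pitt, colleagues))       # ['colleague_a']
print(k_nearest(brad_pitt, colleagues, k=2))  # ['colleague_a', 'colleague_c']
```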


STRANGE FACE DETECTION: OUTLIER DETECTION
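Strange faces can be flagged with a distance-based outlier score: for each person, average the distance to their k nearest other people, so faces far from everyone else score highest. This k-nearest-neighbor score is one assumed choice among many outlier methods, not necessarily what was used for the slides; the data are invented, and k must be smaller than the number of people.

```python
import math


def knn_outlier_scores(people, k=2):
    """Outlier score per person: mean distance to their k nearest other people.
    Assumes len(people) > k. Faces far from everyone else get the highest scores."""
    scores = {}
    for name, vec in people.items():
        dists = sorted(math.dist(vec, other)
                       for other_name, other in people.items() if other_name != name)
        scores[name] = sum(dists[:k]) / k
    return scores


# Invented 2-D features: four similar faces and one unusual one.
faces = {"a": (0, 0), "b": (0, 1), "c": (1, 0), "d": (1, 1), "e": (6, 6)}
scores = knn_outlier_scores(faces)
print(max(scores, key=scores.get))  # e
```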


IKEA WEBSITE: KEEP TRACK OF BILLY STOCK

Define the IKEA Billy Index as the change in stock over time
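With stock readings taken at regular intervals, the Billy Index as defined above is the difference between consecutive readings. A minimal sketch, assuming you already have a list of readings for one store (collecting them from the website is not shown); estimated_units_sold is a hypothetical proxy that treats every decrease as sales and every increase as restocking.

```python
def billy_index(stock):
    """Change in stock between consecutive readings (negative = stock went down)."""
    return [later - earlier for earlier, later in zip(stock, stock[1:])]


def estimated_units_sold(stock):
    """Hypothetical proxy for sales: sum of all stock decreases.
    Increases are treated as restocking and ignored (an assumption)."""
    return sum(-change for change in billy_index(stock) if change < 0)


# Invented stock readings for one store.
readings = [120, 110, 95, 200, 180]
print(billy_index(readings))           # [-10, -15, 105, -20]
print(estimated_units_sold(readings))  # 45
```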


IKEA WEBSITE: THE IKEA BILLY INDEX


THE BILLY INDEX: SOME STATISTICS


Every extra unit of wind speed results in 19 fewer Billys sold
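A statement like "19 fewer Billys per extra unit of wind speed" is the slope of a regression of units sold on wind speed. The sketch fits a one-variable ordinary least squares line; the data are invented and only show how the slope is read (here 9.8 fewer per unit); they do not reproduce the slide's figure. The function name simple_linear_regression is illustrative.

```python
def simple_linear_regression(x, y):
    """OLS fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx              # change in y per one-unit increase in x
    b0 = mean_y - b1 * mean_x   # fitted y when x = 0
    return b0, b1


# Invented data: wind speed and Billys sold.
wind = [1, 2, 3, 4]
sold = [100, 90, 82, 70]
b0, b1 = simple_linear_regression(wind, sold)
print(round(b0, 2), round(b1, 2))  # 110.0 -9.8
```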


Thanks for your attention, QUESTIONS?

Freelance Data Scientist, I'm open to meeting for a cup of coffee sometime

https://www.linkedin.com/in/longhowlam

https://longhowlam.wordpress.com/

@longhowlam