(MED302) Leveraging Cloud-Based Predictive Analytics to Strengthen Audience Engagement

Posted on 29-Jun-2015


Description

To improve audience engagement, media companies must deal with vast amounts of raw data from web, social media, devices, catalogs, and back-channel sources. This session dives into predictive analytics solutions on AWS: we present architecture patterns for optimizing media delivery and tuning the overall user experience based on representative data sources (video player clickstream, web logs, CDN, user profiles, social media sentiment, etc.). We dive into concrete implementations of cloud-based machine learning services and show how they can be leveraged for profiling audience demand, cueing content recommendations, and prioritizing delivery of related media. Services covered include Amazon EC2, Amazon S3, Amazon CloudFront, and Amazon EMR.

Transcript

Michael Limcaco, Amazon Web Services

Content discovery … and the conversation around it … matter!

[1] http://www.slideshare.net/AmazonWebServices/maximizing-audience-engagement-in-media-delivery-med303-aws-reinvent-2013-28622676

[2] http://www.nielsen.com/content/corporate/us/en/press-room/2013/new-nielsen-research-indicates-two-way-causal-influence-between-.html

[3] http://www.google.com.au/think/research-studies/quantifying-movie-magic.html

Audience engagement signals span the full user journey:

• Search, Watch, Listen, Play
• Download, Purchase, Subscribe
• Contact sales, Contact support, Cancel, Upgrade It
• Rate It, Review It
• Sharing, Tagging, Bookmarking
• Social Sentiment

• Descriptive

– Retrospective

– What happened or is happening

– Simple aggregations and counters

• Predictive

– Statistical forecast

– Predict a value in a dataset

– Machine learning

• Prescriptive (emergent)

– What should I do about it?

[Diagram: Descriptive → Predictive → Prescriptive; machine learning turns incoming signals into predictions (recommendations, clustering, classification)]

Tooling landscape, from single node to big data:

• Visualization & Analysis: R, Octave, Matlab, Excel, KNIME, WEKA, Python kits (single node); DAS, Graphlab, Mahout, Spark MLlib, H2O (big data)
• Storage: SAN/NAS, RDBMS (single node); HBase, HDFS (big data)

Use Case 1

[Stack diagram: recommendation, clustering, and classification on a math library, running on Spark or H2O over Hadoop Map-Reduce]

Estimate similar users and items

http://www.slideshare.net/tdunning/recommendation-techn

Raw interaction logs (User1→Thing1, User2→Thing2, User3→Thing3, User2→Thing4, User5→Thing1, User1→Thing2, User1→Thing3) are rolled up into a user-by-item history matrix for users Mike, Jon, Mary, Phil, and Kris.

[Diagram: the history matrix is transformed into an item-item co-occurrence matrix; LLR (log-likelihood ratio) then filters the co-occurrence counts down to indicators ("items similar to this")]

Items similar to this:

Superman → Highlander, Dune
Star Wars → Raiders, Minority Report
Highlander → Superman
Mulan → Home Alone, Mermaid
Star Trek → …

The indicators are keyed by item ID and then loaded into a search index:

4587 → 223, 5234
748 → 5345, 235
12 → 8234
245 → 9543, 7673
3456 → 4587
… → …

Index with indicator fields:

748 (Star Wars) → 45, 235
12 (Highlander) → 8234
245 (Mulan) → 9543, 7673
4587 (Superman) → 12, 5234
3456 (Star Trek) → 2458 …

Querying the index with an item from the user's history, e.g. "12", matches items whose indicator fields contain it (e.g. 5345, 3456).
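That lookup can be sketched with a plain inverted structure. The item IDs and titles come from the slide; the dictionary layout and scoring are a hypothetical stand-in for a real search engine such as Solr or Elasticsearch:

```python
# Hypothetical in-memory version of the indicator index from the slide
index = {
    "748": {"name": "Star Wars", "indicators": ["45", "235"]},
    "12": {"name": "Highlander", "indicators": ["8234"]},
    "245": {"name": "Mulan", "indicators": ["9543", "7673"]},
    "4587": {"name": "Superman", "indicators": ["12", "5234"]},
}

def recommend(history):
    """Return titles whose indicator field overlaps the user's history,
    ranked by how many history items they match."""
    hits = []
    for item_id, doc in index.items():
        score = len(set(history) & set(doc["indicators"]))
        if score:
            hits.append((score, doc["name"]))
    return [name for score, name in sorted(hits, reverse=True)]

recommend(["12"])  # "12" (Highlander) appears in Superman's indicators
```

A production system issues the same query against a search cluster, which also handles tokenization, ranking, and scale.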

[Diagram: users interact with media platforms and mobile apps (Search, Play, Buy, Rate); those signals feed the recommender, which returns Recommendations]

https://github.com/apache/mahout

Sample indicator output ("items similar to this"):

movie-b  movie-c:2.772588722239781  movie-a:2.772588722239781
movie-d  …

% mahout spark-itemsimilarity \
    -i input-folder/data.txt \
    -o output-folder/ \
    --filter1 buy -fc 1 -ic 2 \
    --filter2 view
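Under the hood, spark-itemsimilarity scores co-occurrences with Dunning's log-likelihood ratio (LLR) test to keep only statistically interesting indicators. A minimal stdlib sketch of that test (function names are mine, not Mahout's; k11..k22 are the cells of a 2x2 co-occurrence contingency table):

```python
import math

def _entropy(*counts):
    """Unnormalized entropy term: sum*log(sum) - sum of x*log(x)."""
    xlogx = lambda x: x * math.log(x) if x > 0 else 0.0
    return xlogx(sum(counts)) - sum(xlogx(c) for c in counts)

def llr(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio for a 2x2 co-occurrence table.

    k11: users who interacted with both items, k12/k21: one but not
    the other, k22: neither. Higher scores mean stronger association.
    """
    row = _entropy(k11 + k12, k21 + k22)
    col = _entropy(k11 + k21, k12 + k22)
    mat = _entropy(k11, k12, k21, k22)
    return 2.0 * (row + col - mat)

# Items that always co-occur score high; independent items score ~0
strong = llr(10, 0, 0, 10)
independent = llr(5, 5, 5, 5)
```

Pairs whose LLR clears a threshold survive as indicators; everything else is pruned from the item-item matrix.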

Use Case 2

Classify (estimate) as Positive | Negative


“I thought Star Wars Episode 28 was not without merit”

https://github.com/cyhex/streamcrab

[Diagram: user activity on media platforms and mobile (Search, Play, Buy, Rate, Recommend) plus social media activity flows through an Extract Features → Classify pipeline; a trained model labels each message Positive or Negative]

Model training pairs text with labels, e.g. “I adored this movie” teaches that “adore” = POSITIVE.


http://www.nltk.org/book/ch06.html

TextBlob + Natural Language Toolkit (NLTK)


from textblob.classifiers import NaiveBayesClassifier

training_data = [('I love this movie', 'Positive'),
                 ('This makes me mad', 'Negative'), …]
my_classifier = NaiveBayesClassifier(training_data)

Classifying “I thought Star Wars Episode 29 was not without merit” yields “Positive”.
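TextBlob's classifier wraps NLTK, but the core idea fits in a few lines of plain Python. A toy bag-of-words Naive Bayes with add-one smoothing (the training sentences are invented for illustration):

```python
import math
from collections import Counter, defaultdict

class TinyNaiveBayes:
    """Bag-of-words Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self, training_data):
        self.word_counts = defaultdict(Counter)   # label -> word frequencies
        self.label_totals = Counter()             # label -> total tokens
        vocab = set()
        for text, label in training_data:
            for word in text.lower().split():
                self.word_counts[label][word] += 1
                self.label_totals[label] += 1
                vocab.add(word)
        self.vocab_size = len(vocab)

    def classify(self, text):
        scores = {}
        for label in self.word_counts:
            score = 0.0  # classes are balanced here, so the prior drops out
            for word in text.lower().split():
                count = self.word_counts[label][word]
                score += math.log((count + 1) /
                                  (self.label_totals[label] + self.vocab_size))
            scores[label] = score
        return max(scores, key=scores.get)

training_data = [('I love this movie', 'Positive'),
                 ('This makes me mad', 'Negative'),
                 ('What a wonderful film', 'Positive'),
                 ('Terrible and boring', 'Negative')]
clf = TinyNaiveBayes(training_data)
```

Real pipelines add better tokenization, negation handling ("not without merit"), and far more training data, which is exactly what NLTK's feature extractors help with.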

from amazon_kclpy import kcl
import json, base64

class RecordProcessor(kcl.RecordProcessorBase):

    def process_records(self, records, checkpointer):
        for record in records:
            # Kinesis record payloads arrive base64-encoded
            inbound_tweet = base64.b64decode(record.get('data'))
            sentiment = my_classifier.classify(inbound_tweet)


[Diagram: per-user behavior vectors over the same titles (one column per title, e.g. Mulan): 12 2 7 85 1 997 / 1 5 99 85 50 4 / 1 2 3 4 5 6 / 3 1 4 6 7 9]

Use Case 3

This is a form of unsupervised learning

Segaran, Toby. Programming Collective Intelligence. Sebastopol: O’Reilly, 2009. Print.

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6374152&isnumber=6374097
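The slides do not name an algorithm, but k-means is the canonical example of this kind of unsupervised grouping. A minimal stdlib sketch over 2-D user vectors (the data points are invented to show two obvious behavior clusters):

```python
import math

def kmeans(points, k, iters=20):
    """Plain k-means: seed centroids from the first k points, then
    alternate assignment and centroid-update steps."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment: each point joins its nearest centroid
        for i, p in enumerate(points):
            labels[i] = min(range(k),
                            key=lambda c: math.dist(p, centroids[c]))
        # Update: move each centroid to the mean of its members
        for c in range(k):
            members = [p for i, p in enumerate(points) if labels[i] == c]
            if members:
                centroids[c] = [sum(d) / len(members) for d in zip(*members)]
    return labels

# Hypothetical (minutes streamed, titles watched) vectors:
# light viewers vs. heavy streamers
points = [(1, 2), (2, 1), (1, 1), (90, 85), (95, 88), (88, 92)]
labels = kmeans(points, 2)
```

The big-data frameworks in the next section (Spark MLlib, H2O) run the same alternation distributed across a cluster, with smarter seeding such as k-means++.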

[Diagram: R + H2O: a data science desktop running R drives a machine learning cluster running H2O]

% java -jar h2o.jar

Use Case 4

Customer | Geo | Account Type | Account Age | Support Tickets | Minutes Streamed | Churn?
Mike     | CA  | Premium      | 120         | 10              | 240              | TBD
John     | CA  | Basic        | 240         | 1               | 140              | TBD
Ingrid   | WA  | Premium      | 60          | 5               | 1800             | TBD
Mark     | WA  | Basic        | 30          | 0               | 0                | TBD
Usman    | WA  | Basic        | 720         | 0               | 360              | TBD
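The Churn? column is still TBD, so any supervised model needs historical labels first. As a sketch, here is a single-feature decision stump trained on made-up labeled rows; it is the simplest possible churn classifier, and BigML and the other Marketplace tools grow full decision trees from the same kind of table:

```python
def fit_stump(rows, labels):
    """Find the (feature, threshold) split that best predicts the labels.
    rows: numeric feature tuples; labels: 0/1 churn flags."""
    best = None  # (accuracy, feature index, threshold)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            # Predict churn when the feature is at or above the threshold
            preds = [1 if r[f] >= threshold else 0 for r in rows]
            acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
            if best is None or acc > best[0]:
                best = (acc, f, threshold)
    return best

# Hypothetical labeled history: (support tickets, minutes streamed) per customer
rows = [(10, 240), (1, 140), (5, 1800), (0, 0), (0, 360)]
labels = [1, 0, 1, 0, 0]   # invented churn outcomes for illustration
acc, feature, threshold = fit_stump(rows, labels)
```

On this toy data the stump learns that support tickets, not minutes streamed, separate churners, which is the kind of feature ranking a real churn model surfaces.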

http://www.bigml.com

AWS Marketplace

Software

• BigML

• Revolution R Enterprise

• PredictionIO

• Yhat

• Mortar

• Zementis

http://bit.ly/awsevals