Date post: | 17-Aug-2015 |
Upload: | geoffrey-yuen |
Close Encounter with Data Science
April 2015
Geoff Yuen, Ph.D. VP Emerging Technology [email protected]
How did Google beat previous search engines ?
Aside from searched content, Google also used URL data patterns (links).*
* Eric Schmidt, “How Google Works”; also see http://www.economist.com/node/3171440
An additional data type can make a huge difference !
Data by itself is not useful; we need insights !
Don’t start by collecting everything
Jiawei Han, Abel Bliss Professor, Department of Computer Science, UIUC; “Pattern Discovery in Data Mining”, Coursera online course with 75,000 students, 2/2015
The data is the second most important thing
Jeff Leek, Assistant Professor of Biostatistics, Data Science Program, Johns Hopkins University :
Putting massive data into a cluster may have unpredictable consequences
Big or Small – you need the right data
Telcos may lack certain data needed to solve specific commercial problems, e.g. personal product recommendations
Telco data sets with business potential
Data categories : Customer Profile, *Carriage Usage, *Digital Usage, Customer Support, Customer Loyalty & Retention, Marketing
• Voice usage
• Call Circles
• SMS / VAS
• Time of day
• Location (SGN)
• Gender
• Home Address
• Plan & Service
• Devices used
• Searches
• Web browsing history
• Mobile Apps used
• Downloaded Content
• Calls
• Email / Text
• Care History
• Resolved Queries
• Billing
• CRM
• Loyalty
• Spending
• Churn
• Product Lines
• CLV
• Contracts
• Cross-sell/Upsell
• Contact history
• Campaigns
• Product offers
• Acquisition cost
Telco versus other OTT companies – data land grab
Data type | Telco | Facebook | Google | Apple | Edge
Store / Offline Transaction | Yes | No | No | No | √
True Id / Demographic | Yes | Partial | No | No | √
Carriage Usage including location history | Yes | No | No | No | √
All Apps Used + Usage | Yes | Facebook Apps Only | Google Apps Only | iPhone only | √
Web browsing (unencrypted) | Yes | No | No | No | √
Multi channel web browsing (mobile + PC) | Yes | No | No | No | √
Web search (encrypted) | No | No | Yes | No | X
User Uploaded Textual and Multimedia data | No | Yes | Yes | Yes | X
Digital purchase (encrypted) | No | No | Play Store | Apple Store | X
Proximity Marketing | No | No | No | iBeacon | X
Offline, O2O data !
Telco Data Assets Value
1. Representative : sample size in the millions instead of thousands
2. True & Clean : authentic measured usage instead of user reported
3. Whole history : reflects when and where users do what with phones, not just snapshots (like apps)
4. Comes with Location : unlike mobile apps, GPS, WiFi
5. Multichannel : can be supplemented with data from other services e.g. broadband, nowTV
6. Mobile mediated store purchase events (e.g. Apple Pay) can be logged (but not content)
7. Proximity data capture still at large
Mobile Carriage and Digital Usage as unique data type
Location Insights
Digital marketing involves targeting customers through their mobiles. Location Insights provides behavioral and demographic profiles of crowds to make decisions more precise and contextual.
NETWORK DATA
[Figure: 2G Network, 3G Network and 2013 4G Network; 900 MHz, 1800 MHz and 2100 MHz bands]
The O2 mobile network has hundreds of cells to measure the trends in
footfall across the country
200 x 200 GRID
• Easier to use
• Further protecting anonymity
• Extrapolated to represent local population
Footfall is rendered into 200 x 200 metre grid squares across the country
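The gridding step can be illustrated with a minimal sketch. It assumes device positions are already projected to metres. The suppression threshold `MIN_COUNT` and the extrapolation factor `SCALE` are hypothetical values chosen for illustration, not figures from the O2 / Telefonica service.

```python
from collections import Counter

CELL_METRES = 200   # grid size stated on the slide
MIN_COUNT = 3       # hypothetical: hide cells with too few devices (anonymity)
SCALE = 4.0         # hypothetical: extrapolate subscribers to local population


def cell_of(easting_m, northing_m, size=CELL_METRES):
    """Map a projected coordinate in metres to its grid-cell index."""
    return (int(easting_m // size), int(northing_m // size))


def footfall(points, min_count=MIN_COUNT, scale=SCALE):
    """Count points per cell, drop sparse cells, and scale the rest."""
    counts = Counter(cell_of(x, y) for x, y in points)
    return {cell: n * scale for cell, n in counts.items() if n >= min_count}


# Toy positions (easting, northing) in metres, invented for this example.
points = [(10, 10), (150, 199), (199, 0), (250, 10),
          (405, 805), (410, 810), (420, 820)]
grid = footfall(points)
print(grid)
```

Aggregating to cells and suppressing sparse ones is one common way to get the "easier to use" and "further protecting anonymity" properties listed above; the real service may work differently.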
Telefonica Smart Steps & Morrison
Morrison Supermarkets UK used a related service to increase sales and “level the playing field”:
• Analysis of customer journey patterns to help target high-traffic neighborhoods with coupons while avoiding households in the direct vicinity of a competitor
• Improved catchment and optimized advertising
• Store visits increased 150% without introducing a customer loyalty program
Location Insights Products Launched by Telcos
Product | User Base | Data used | Privacy | Reference cases
Verizon Precision Insight | 94 mil | Phone location, browsing history, apps usage | Precision ID | Sports Stadium Event Promotion
Telefonica Dynamic Insights | 200 mil | Phone location, calls, text, travels between cells, mobile signal loss & connect | ID Protection | Football Retail Management, Supermarket Campaign
AT&T AdWorks with Mobile | 70 mil + 15 mil U-Verse | CDRs, locations, TV usage, browsing, mobile apps usage, “other information” | Anonymous / aggregate | Only multichannel product in US market; Levi’s mobile promotion campaign
SK Telecom Bigdata Hub | 27 mil | Location, social networking, voice calls, sensors, SMS and apps | Exclude all personal information | Business districts info; facilitate highway planning
Singtel DataSpark | 500 mil | CDRs, locations, TV usage, “other information” | Anonymous / aggregate | Transportation, Site planning, Sports Events, Rich Segment, tourism
What do data look like these days ?
• Data = values of qualitative or quantitative variables, belonging to a set of items (usually population)
• Data = often unstructured (without pre-established data model), usually raw file, different formats
[Figure: examples of unstructured data: chat, genome (DNA base pairs), picture]
How much unstructured data is in a telco ?
Telecommunications = 55%
Think again !
• share photos
• mobile chat
• mobile video
• network traffic
http://sites.tcs.com/big-data-study/industries-unstructured-data/
Golden Era of Analytics 1995-
• Statistical Machine Learning has contributed many algorithms far more powerful than simple regression (list modified from Seni Giovanni, A9):
• 1983 CART (Tree)
• 1996 Lasso
• 1996 Bagging
• 1997 AdaBoost
• 2001 Random Forest
• 2003 Learning Ensembles
• 2004 Regularization & Boosted Lasso
• 2005-2013 Deep Belief / Deep Learning
Many ways to predict and classify from structured and unstructured data now exist !
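As an illustration of one entry in the list above, bagging (1996) can be sketched in a few lines: train many simple models on bootstrap resamples and let them vote. The data below (monthly minutes and a churn flag) is invented for the example, and the decision stump stands in for a full CART tree.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Fit a one-threshold classifier (a depth-1 tree) on a single feature."""
    best = None
    for t in sorted(set(xs)):
        for sign in (1, -1):
            errors = sum((1 if sign * (x - t) >= 0 else 0) != y
                         for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, sign)
    _, t, sign = best
    return lambda x: 1 if sign * (x - t) >= 0 else 0


def bagging(xs, ys, n_models=25, seed=0):
    """Train stumps on bootstrap resamples and predict by majority vote."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))

    def predict(x):
        return Counter(m(x) for m in models).most_common(1)[0][0]

    return predict


# Hypothetical toy data: low monthly minutes -> churned (1), high -> stayed (0).
monthly_minutes = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
churned = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

predict = bagging(monthly_minutes, churned)
print(predict(5), predict(500))
```

Random forests (2001) extend the same idea by also randomising which features each tree may split on.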
NASA JPL: better flyby surface feature recognition by random forests
Data Science can only make better predictions and classifications; follow up action depends on telco knowhow
By 2017, 10 % of computers will be learning rather than processing (Gartner 2013)
Structured Data | Unstructured Data
Regression | Learning structure in data
Linear or Logistic | non-Linear (polynomial)
Problem specific | Knowledge specific
Big Data finally found its analytic partner : deep learning
Deep Learning for unstructured data
• The previous paradigm for feature detection and prediction from data was based on modelling and optimization. “Deep learning” has now surpassed that performance in diverse problems tackled by researchers around the world (“Tech 2015: Deep Learning And Machine Intelligence Will Eat The World”, Forbes 12/2014).
• Deep learning scales well with big data to learn a “layering of knowledge” in hidden layers, without requiring the handcrafted feature detectors of past machine learning methods.
• Demonstrated impressive improvements in diverse areas around the world : speech recognition, object recognition in images, targeted advertising, fraud detection, personalization
• Speech recognition : Microsoft, Google & Apple competing mobile “digital assistants” (Google Now vs Siri vs Cortana 9/2014). Digital assistants will drive mCommerce and 50% of US digital purchases in 2017 (Gartner)
• Object recognition : Facebook mining user images for intentions (NYT)
• Real-time translation : Skype
• World Cup / NBA Predicting 2014 (MS)
• Others : Baidu, IBM, Yahoo, Tencent, Netflix, Adobe, NEC, Toyota
• Telco centric vendors : Wise-athena, Dataspark, Zettics
Telco data useful for targeting
“If I have first-party CRM information and browser history information, I already know so much about the consumer that [demographic] information doesn’t add any [more] value.”
Claudia Perlich, Chief Data Scientist, Dstillery
So we don’t know everything about our data yet …
How Big Data Improves Ad Targeting
1. Visitor browses a website or runs a mobile app
2. Advertisers see the visitor and conversion probabilities (increased by telco big data / machine learning, “Star”)
3. Advertisers bid
4. RTB exchange selects the best bid and sends the ad to the visitor
5. Visitor sees the ad (~100-300 milliseconds)
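The bidding steps above can be sketched as a toy auction. This is a simplification under stated assumptions: each advertiser bids its expected value (value per conversion times estimated conversion probability) and the highest bid wins. Real RTB exchanges use their own protocols and pricing rules, and the advertiser names and numbers here are invented.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    advertiser: str
    value_per_conversion: float    # what one conversion is worth to the advertiser
    conversion_probability: float  # estimate for this visitor (step 2)

    @property
    def amount(self):
        """Expected value of showing the ad, used here as the bid (step 3)."""
        return self.value_per_conversion * self.conversion_probability


def run_auction(bids):
    """Step 4: the exchange picks the highest bid."""
    return max(bids, key=lambda b: b.amount)


# Hypothetical bids for one visitor; better data raises B's probability estimate.
bids = [
    Bid("A", value_per_conversion=10.0, conversion_probability=0.02),
    Bid("B", value_per_conversion=4.0, conversion_probability=0.08),
]
winner = run_auction(bids)
print(winner.advertiser)
```

The sketch shows why a sharper conversion estimate matters: advertiser B wins despite a lower value per conversion because its estimated probability for this visitor is higher.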
Contextual Mobile Targeting
Contextual and unstructured data, processed with machine learning, also improve advertising accuracy by 219% (AdTheorent)
Mobile Targeted Advertising : Telco Examples
Telco | Name | Combined Base | Basis
*Telefonica | Axonix (acquired) | 200 mil | Digital Fingerprinting of mobile usage
*Singtel, Globe, Optus, Telkomsel | Amobee (acquired) | 500 mil (combined) | Digital Fingerprinting of mobile usage
AT&T | Amobee | 119 mil | Insert tracking id in URL; digital activity
Orange | Orange Ad / Amobee | 226 mil | Partnership with OpenX
Weve UK (Telefónica, O2 EE & Vodafone) | Weve Mobile Display | 22 mil (combined) | Digital Fingerprinting of mobile usage
Deutsche Telekom | AudienceScience | 36 mil | Digital Fingerprinting
Verizon | PrecisionID | 123 mil | Insert tracking id (UIDH)
Sprint | Amobee | 55 mil | Digital Fingerprinting; Ad Exchange
2014Q4 Survey of Deep Learning Achievements
Task | Previous Accuracy | Data used to train model | Latest Accuracy | Company
Speech Recognition | 75% | 680 speakers, 10 sentences each | 94% (2013) | Google, IBM, Skype, MS
Object recognition | 70% | 1.2 mil images | 95% (2015) | Baidu, Google, Facebook
Target Advertising | <1 % (Banner Ads) | 220K users | 22% | Adtheorent, AlchemyAP, Correlor
Personalization | na | 220K users | 27% | Correlor, Optimove
Churn Prediction (Telco) | 69% (SAS) | 300 mil CDRs, 1.8 mil users | 82% | Sparked, WiseAthena, Correlor
Dealer Fraud Detection (Telco) | <40% (reactive) | 700 mil CDRs, 1.2 mil users | 80% (predictive) | WiseAthena
• Other big companies in related efforts : Baidu, IBM, Yahoo, Tibco, Tencent, Netflix, Adobe, NEC, Toyota
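One way to read the survey figures is as a relative reduction in error rate. The sketch below is an interpretation, not part of the original survey. It assumes the previous and latest accuracies were measured on comparable tasks, and it uses only the three rows where both figures are plain percentages.

```python
def error_reduction(previous_pct, latest_pct):
    """Share of the previous error rate removed by the newer model."""
    return (latest_pct - previous_pct) / (100 - previous_pct)


# (previous accuracy %, latest accuracy %) taken from the survey rows.
rows = {
    "Speech Recognition": (75, 94),
    "Object recognition": (70, 95),
    "Churn Prediction (Telco)": (69, 82),
}
reductions = {task: round(error_reduction(prev, latest), 2)
              for task, (prev, latest) in rows.items()}
print(reductions)
```

For example, moving from 75% to 94% accuracy removes 19 of the 25 percentage points of error, a 76% reduction.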
Facebook “Likes” for Predicting Personality
Facebook can predict personality from annotated data better than humans can, except for a spouse
http://www.pnas.org/content/112/4/1036.full.pdf
Telco mobile usage data should do even better than this
Concluding Thoughts
• Network usage data is known to improve business predictions, e.g. churn, loyalty
• Combining internal data types improves user targeting
• Unstructured mobile usage data should do well in personality prediction
• Telcos should work together due to similar business problems and common interests
With the help of advanced prediction algorithms, telco data has the potential to create significant new business
Questions ? Email [email protected]