Post on 21-Jan-2017
Machine Learning & Data Lake for IoT scenarios on AWS
John Chang
Technology Evangelist, October 2016
Three types of data-driven development
Retrospective analysis and reporting: Amazon Redshift, Amazon RDS, Amazon S3, Amazon EMR
Here-and-now real-time processing and dashboards: Amazon Kinesis, Amazon EC2, AWS Lambda
Predictions to enable smart applications
Machine learning and smart applications
Machine learning is the technology that automatically finds patterns in your data and uses them to make predictions for new data points as they become available.
Your data + machine learning = smart applications
Smart applications by example
Based on what you know about the user: Will they use your product?
Based on what you know about an order: Is this order fraudulent?
Based on what you know about a news article: What other articles are interesting?
And a few more examples…
Fraud detection: Detecting fraudulent transactions, filtering spam emails, flagging suspicious reviews, …
Personalization: Recommending content, predictive content loading, improving user experience, …
Targeted marketing: Matching customers and offers, choosing marketing campaigns, cross-selling and up-selling, …
Content classification: Categorizing documents, matching hiring managers and resumes, …
Churn prediction: Finding customers who are likely to stop using the service, upgrade targeting, …
Customer support: Predictive routing of customer emails, social media listening, …
Smart applications by counterexample
Dear Alex,
This awesome quadcopter is on sale for just $49.99!
Smart applications by counterexample
SELECT c.ID
FROM customers c
LEFT JOIN orders o
  ON c.ID = o.customer
GROUP BY c.ID
HAVING o.date > GETDATE() - 30
We can start by sending the offer to all customers who placed an order in the last 30 days
Smart applications by counterexample
SELECT c.ID
FROM customers c
LEFT JOIN orders o
  ON c.ID = o.customer
GROUP BY c.ID
HAVING o.category = 'toys'
  AND o.date > GETDATE() - 30
…let’s narrow it down to just customers who bought toys
Smart applications by counterexample
SELECT c.ID
FROM customers c
LEFT JOIN orders o
  ON c.ID = o.customer
LEFT JOIN products p
  ON p.ID = o.product
GROUP BY c.ID
HAVING o.category = 'toys'
  AND ((p.description LIKE '%helicopter%'
        AND o.date > GETDATE() - 60)
    OR (COUNT(*) > 2
        AND SUM(o.price) > 200
        AND o.date > GETDATE() - 30)
  )
…and expand the query to customers who purchased other toy helicopters recently, or made several expensive toy purchases
Smart applications by counterexample
SELECT c.ID
FROM customers c
LEFT JOIN orders o
  ON c.ID = o.customer
LEFT JOIN products p
  ON p.ID = o.product
GROUP BY c.ID
HAVING o.category = 'toys'
  AND ((p.description LIKE '%copter%'
        AND o.date > GETDATE() - 60)
    OR (COUNT(*) > 2
        AND SUM(o.price) > 200
        AND o.date > GETDATE() - 30)
  )
…but what about quadcopters?
Smart applications by counterexample
SELECT c.ID
FROM customers c
LEFT JOIN orders o
  ON c.ID = o.customer
LEFT JOIN products p
  ON p.ID = o.product
GROUP BY c.ID
HAVING o.category = 'toys'
  AND ((p.description LIKE '%copter%'
        AND o.date > GETDATE() - 120)
    OR (COUNT(*) > 2
        AND SUM(o.price) > 200
        AND o.date > GETDATE() - 30)
  )
…maybe we should go back further in time
Smart applications by counterexample
SELECT c.ID
FROM customers c
LEFT JOIN orders o
  ON c.ID = o.customer
LEFT JOIN products p
  ON p.ID = o.product
GROUP BY c.ID
HAVING o.category = 'toys'
  AND ((p.description LIKE '%copter%'
        AND o.date > GETDATE() - 120)
    OR (COUNT(*) > 2
        AND SUM(o.price) > 200
        AND o.date > GETDATE() - 40)
  )
…tweak the query more
Smart applications by counterexample
SELECT c.ID
FROM customers c
LEFT JOIN orders o
  ON c.ID = o.customer
LEFT JOIN products p
  ON p.ID = o.product
GROUP BY c.ID
HAVING o.category = 'toys'
  AND ((p.description LIKE '%copter%'
        AND o.date > GETDATE() - 120)
    OR (COUNT(*) > 2
        AND SUM(o.price) > 150
        AND o.date > GETDATE() - 40)
  )
…again
Smart applications by counterexample
SELECT c.ID
FROM customers c
LEFT JOIN orders o
  ON c.ID = o.customer
LEFT JOIN products p
  ON p.ID = o.product
GROUP BY c.ID
HAVING o.category = 'toys'
  AND ((p.description LIKE '%copter%'
        AND o.date > GETDATE() - 90)
    OR (COUNT(*) > 2
        AND SUM(o.price) > 150
        AND o.date > GETDATE() - 40)
  )
…and again
Smart applications by counterexample
Use machine learning technology to learn your business rules from data!
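As a toy illustration of learning a rule from data instead of hand-tuning it, the sketch below picks a spend threshold from labeled examples. It is a one-feature decision stump written for this transcript, not the algorithm Amazon ML uses; the function name and the example data are hypothetical.

```python
def learn_spend_threshold(examples):
    """Return the threshold t for which the rule `spend >= t` makes the
    fewest mistakes on the labeled examples.

    examples: list of (total_spend, responded) pairs, responded is a bool.
    """
    candidates = sorted({spend for spend, _ in examples})
    best_threshold, best_errors = None, len(examples) + 1
    for t in candidates:
        # Count examples where the rule's answer disagrees with the label.
        errors = sum((spend >= t) != responded for spend, responded in examples)
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold


# Hypothetical history: (total toy spend, did the customer respond to the offer?)
history = [(50, False), (120, False), (160, True),
           (210, True), (90, False), (300, True)]
print(learn_spend_threshold(history))
```

The threshold comes from the data rather than from repeated manual edits to a query, which is the point the slide is making.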
Why aren’t there more smart applications?
1. Machine learning expertise is rare.
2. Building and scaling machine learning technology is hard.
3. Closing the gap between models and applications is time-consuming and expensive.
Building smart applications today
Expertise, Technology, Operationalization
Limited supply of data scientists
Many choices, few mainstays
Complex and error-prone data workflows
Expensive to hire or outsource
Difficult to use and scale
Custom platforms and APIs
Many moving pieces lead to custom solutions every time
Reinventing the model lifecycle management wheel
What if there were a better way?
Introducing Amazon Machine Learning
Easy-to-use, managed machine learning service built for developers
Robust, powerful machine learning technology based on Amazon’s internal systems
Create models using your data already stored in the AWS cloud
Deploy models to production in seconds
Easy-to-use and developer-friendly
Use the intuitive, powerful service console to build and explore your initial models
• Data retrieval
• Model training, quality evaluation, fine-tuning
• Deployment and management
Automate model lifecycle with fully featured APIs and SDKs
• Java, Python, .NET, JavaScript, Ruby, PHP
Easily create smart iOS and Android applications with AWS Mobile SDK
Powerful machine learning technology
Based on Amazon’s battle-hardened internal systems
Not just the algorithms:
• Smart data transformations
• Input data and model quality alerts
• Built-in industry best practices
Grows with your needs
• Train on up to 100 GB of data
• Generate billions of predictions
• Obtain predictions in batches or real-time
Integrated with the AWS data ecosystem
Access data that is stored in Amazon S3, Amazon Redshift, or MySQL databases in Amazon RDS
Output predictions to Amazon S3 for easy integration with your data flows
Use AWS Identity and Access Management (IAM) for fine-grained data access permission policies
Fully-managed model and prediction services
End-to-end service, with no servers to provision and manage
One-click production model deployment
Programmatically query model metadata to enable automatic retraining workflows
Monitor prediction usage patterns with Amazon CloudWatch metrics
Pay-as-you-go and inexpensive
Data analysis, model training, and evaluation: $0.42/instance hour
Batch predictions: $0.10/1000
Real-time predictions: $0.10/1000 + hourly capacity reservation charge
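To make the pricing concrete, here is a rough cost calculation using only the rates quoted on the slide. The function name is hypothetical, and the hourly capacity reservation rate is not given in the deck, so it is left as a caller-supplied amount.

```python
# Rates as quoted on the slide (a 2016 snapshot, not current pricing).
COMPUTE_PER_HOUR = 0.42       # data analysis, training, evaluation ($/instance hour)
PREDICTION_PER_1000 = 0.10    # batch and real-time predictions ($ per 1000)


def estimate_cost(compute_hours, batch_predictions=0, realtime_predictions=0,
                  realtime_reservation_charge=0.0):
    """Rough cost estimate in dollars.

    realtime_reservation_charge is the total capacity reservation charge for
    the period; the slide does not state its rate, so it is an assumption
    supplied by the caller.
    """
    compute = compute_hours * COMPUTE_PER_HOUR
    predictions = (batch_predictions + realtime_predictions) / 1000 * PREDICTION_PER_1000
    return round(compute + predictions + realtime_reservation_charge, 2)


# 10 instance hours of training plus one million batch predictions.
print(estimate_cost(compute_hours=10, batch_predictions=1_000_000))
```

With these inputs, compute is $4.20 and predictions are $100.00.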
Three supported types of predictions
Binary classification: Predict the answer to a Yes/No question
Multiclass classification: Predict the correct category from a list
Regression: Predict the value of a numeric variable
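The three prediction types correspond to the model type strings BINARY, MULTICLASS, and REGRESSION (REGRESSION appears in the deck's own training example). The helper below is a hypothetical heuristic for picking one from sample target values; it is only a sketch and will misjudge edge cases such as a numeric target that happens to have two distinct values.

```python
def choose_model_type(target_values):
    """Guess a model type from example target values (illustrative heuristic).

    - exactly two distinct values      -> BINARY
    - otherwise, all values numeric    -> REGRESSION
    - otherwise                        -> MULTICLASS
    """
    values = list(target_values)
    if not values:
        raise ValueError("need at least one target value")
    if len(set(values)) == 2:
        return "BINARY"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "REGRESSION"
    return "MULTICLASS"


print(choose_model_type(["yes", "no", "yes"]))
print(choose_model_type([13.2, 7.5, 9.1]))
print(choose_model_type(["toys", "books", "games"]))
```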
Building smart applications with Amazon ML
1. Train model
2. Evaluate and optimize
3. Retrieve predictions

Step 1: Train model
- Create a datasource object pointing to your data
- Explore and understand your data
- Transform data and train your model
Create a datasource object
>>> import boto
>>> ml = boto.connect_machinelearning()
>>> # compute_statistics is a keyword argument, not a key of data_spec
>>> ds = ml.create_data_source_from_s3(
...     data_source_id='my_datasource',
...     data_spec={
...         'DataLocationS3': 's3://bucket/input/data.csv',
...         'DataSchemaLocationS3': 's3://bucket/input/data.schema'},
...     compute_statistics=True)
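The datasource call above points at a schema file (data.schema). As a hedged sketch of what such a file might contain, the code below builds a schema document in the JSON layout Amazon ML uses for CSV data, to the best of my understanding. The attribute names and the build_schema helper are hypothetical.

```python
import json

VALID_TYPES = {"BINARY", "CATEGORICAL", "NUMERIC", "TEXT"}


def build_schema(target, attributes, has_header=True):
    """Build a schema dict for a CSV datasource.

    attributes: list of (name, type) pairs in column order.
    """
    for name, kind in attributes:
        if kind not in VALID_TYPES:
            raise ValueError("unknown attribute type for %s: %s" % (name, kind))
    return {
        "version": "1.0",
        "targetAttributeName": target,
        "dataFormat": "CSV",
        "dataFileContainsHeader": has_header,
        "attributes": [
            {"attributeName": name, "attributeType": kind}
            for name, kind in attributes
        ],
    }


# Hypothetical columns for a regression problem.
schema = build_schema(
    target="price",
    attributes=[("bedrooms", "NUMERIC"), ("neighborhood", "CATEGORICAL"),
                ("description", "TEXT"), ("price", "NUMERIC")],
)
print(json.dumps(schema, indent=2))
```

The printed JSON is what would be uploaded to the location named by DataSchemaLocationS3.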
Explore and understand your data
Train your model
>>> import boto
>>> ml = boto.connect_machinelearning()
>>> model = ml.create_ml_model(
...     ml_model_id='my_model',
...     ml_model_type='REGRESSION',
...     training_data_source_id='my_datasource')
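Model creation is asynchronous, so a caller usually has to wait before evaluating or predicting. The sketch below polls a status callable until it reports a terminal state. The helper name, the polling interval, and the way the status is fetched are assumptions; with boto one plausible status source is the Status field returned by ml.get_ml_model('my_model').

```python
import time

TERMINAL_STATUSES = {"COMPLETED", "FAILED", "DELETED"}


def wait_for_status(get_status, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Call get_status() until it returns a terminal status, then return it.

    get_status: zero-argument callable returning a status string.
    sleep: injectable for testing; defaults to time.sleep.
    """
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError("no terminal status after %d polls" % max_polls)


# Simulated status sequence instead of a live AWS call.
_fake = iter(["PENDING", "INPROGRESS", "COMPLETED"])
print(wait_for_status(lambda: next(_fake), sleep=lambda seconds: None))

# Assumed real usage (not executed here):
# wait_for_status(lambda: ml.get_ml_model('my_model')['Status'])
```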
Building smart applications with Amazon ML
Step 2: Evaluate and optimize
- Measure and understand model quality
- Adjust model interpretation
Explore model quality
Fine-tune model interpretation
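For a binary model, fine-tuning interpretation typically means moving the score threshold that separates "yes" from "no". The sketch below shows, on made-up scores and labels, how precision and recall trade off as the threshold moves. The function name and data are hypothetical; 0.5 is used as the starting threshold.

```python
def precision_recall(scores, labels, threshold=0.5):
    """Precision and recall when scores >= threshold are predicted positive."""
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if predicted and label:
            tp += 1          # correctly flagged
        elif predicted and not label:
            fp += 1          # flagged but actually negative
        elif not predicted and label:
            fn += 1          # missed positive
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up model scores and true labels (1 = positive).
scores = [0.95, 0.80, 0.60, 0.40, 0.20]
labels = [1, 1, 0, 1, 0]

for t in (0.3, 0.5, 0.7):
    print(t, precision_recall(scores, labels, threshold=t))
```

Raising the threshold here removes the false positive at the cost of not recovering the positive scored 0.40; lowering it catches every positive but admits one false alarm.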
Building smart applications with Amazon ML
Step 3: Retrieve predictions
- Batch predictions
- Real-time predictions
Batch predictions
Asynchronous, large-volume prediction generation
Request through service console or API
Best for applications that deal with batches of data records
>>> import boto
>>> ml = boto.connect_machinelearning()
>>> batch_prediction = ml.create_batch_prediction(
...     batch_prediction_id='my_batch_prediction',
...     batch_prediction_data_source_id='my_datasource',
...     ml_model_id='my_model',
...     output_uri='s3://examplebucket/output/')
Real-time predictions
Synchronous, low-latency, high-throughput prediction generation
Request through service API, server, or mobile SDKs
Best for interactive applications that deal with individual data records
>>> import boto
>>> ml = boto.connect_machinelearning()
>>> ml.predict(
...     ml_model_id='my_model',
...     predict_endpoint='example_endpoint',
...     record={'key1': 'value1', 'key2': 'value2'})
{
    'Prediction': {
        'predictedValue': 13.284348,
        'details': {
            'Algorithm': 'SGD',
            'PredictiveModelType': 'REGRESSION'
        }
    }
}
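An application usually needs just the predicted value out of a response like the one above. The helper below is a small sketch of that step. The regression branch follows the response shown on the slide; the classification branch assumes a predictedLabel field, which the slide does not show.

```python
def extract_prediction(response):
    """Return the predicted value (regression) or label (classification)."""
    prediction = response["Prediction"]
    model_type = prediction["details"]["PredictiveModelType"]
    if model_type == "REGRESSION":
        return prediction["predictedValue"]
    # Assumption: classification responses carry the answer in 'predictedLabel'.
    return prediction["predictedLabel"]


# The response shown on the slide.
response = {
    "Prediction": {
        "predictedValue": 13.284348,
        "details": {"Algorithm": "SGD", "PredictiveModelType": "REGRESSION"},
    }
}
print(extract_prediction(response))
```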
Data Lake
Retailers need to deliver continuous differentiation
Personalization, Merchandising, Real-time engagement
A full-service residential real estate brokerage
Redfin manages data on hundreds of millions of properties and millions of customers
The Hot Homes algorithm automatically calculates the likelihood that a home will sell quickly by analyzing more than 500 attributes of each home
Fully AWS-native since day one
https://aws.amazon.com/solutions/case-studies/redfin/
Hot Homes
There's an 80% chance this home will sell in the next 11 days – go tour it soon.
[Diagram: Redfin data pipeline, Ingest/Collect → Store → Process/Analyze → Consume/Visualize]
Data sources: properties, agents, users
Services: Amazon Kinesis, Amazon S3 data lake, Amazon EMR, Amazon Redshift, Amazon DynamoDB
Answers and insights: Hot Homes, BI/reporting
Workloads: User Profile, Recommendation, Hot Homes, Similar Homes, Agent Follow-up, Agent Scorecard, Marketing, A/B Testing, Real Time Data, …
Redfin Manages Data on Hundreds of Millions of Properties Using AWS
“Once we solved the infrastructure problem, we could dream a little bigger. Now we can deliver results without worrying about how to scale.”
Yong Huang, Director, Big Data and Analytics
• Zero on-premises infrastructure
• Using spot pricing for EC2, Redfin saved 90% compared to running on-demand
• Using AWS, Redfin maintains a small technical team, which simplifies server management and eased the transition to DevOps
• Redfin is able to launch products like Hot Homes that greatly improve the buyer experience by leveraging the agility and scale of AWS
American upscale fashion retailer
Nordstrom has 323 stores operating in 38 U.S. states and in Canada, the largest store count and geographic footprint among its retail competitors
Fashion retailer that sells clothing, shoes, cosmetics, and accessories
Nordstrom is going all in on AWS
https://aws.amazon.com/solutions/case-studies/nordstrom/
[Diagram: Nordstrom data pipeline, Ingest/Collect → Store → Process/Analyze → Consume/Visualize]
Data sources: mobile users, desktop users
Services: Amazon Kinesis, AWS Lambda, Amazon DynamoDB, Amazon S3 data storage, Amazon Redshift
Consumers: analytics tools, online stylist
Outcomes and insights: personalized recommendations within seconds (from 15-20 min); scale the expertise of stylists to all shoppers; reduce costs by 2X order of magnitude; …
Nordstrom gives personalized style recommendations in seconds
“Alert me when the internet is down ...”
Keith Homewood, Cloud Product Owner, Nordstrom
• Nordstrom Recommendation is the online version of a stylist. It can analyze and deliver personalized recommendations in seconds
• Going all-in on AWS has resulted in reducing costs by 2X
• Continuous delivery allows Nordstrom to deliver multiple production launches a day in a single application
• A personalized recommendation that used to take 15-20 minutes of processing can now be created in seconds
• Nordstrom's Cloud Product Owner finds AWS reliable and available enough that as long as the internet is working, Nordstrom Recommendation is working
Technology that helps brick-and-mortar retailers optimize performance
Trusted by over 500 global brands in 45 countries worldwide and counting
Euclid analyzes customer movement data to correlate traffic with marketing campaigns and to help retailers optimize hours for peak traffic
Fully AWS-native since day one
https://aws.amazon.com/solutions/case-studies/euclid/
[Diagram: Euclid data pipeline, Ingest/Collect → Store → Process/Analyze → Consume/Visualize]
Data sources: WiFi foot traffic, transactions, campaigns
Services: Euclid EventIQ, Amazon S3 data lake, Amazon RDS for MySQL, Amazon EMR, Amazon Redshift, Amazon EC2, AWS Elastic Beanstalk, Elastic Load Balancing
Answers and insights (Euclid Analytics): walk-bys, new and return visitors, visit duration, engagement rate, bounce rate, storefront potential and conversion; customer segmentation and loyalty assessment; regional and categorical roll-up reporting; zoning for large-format locations
Use an optimal combination of highly interoperable services
Amazon Redshift: Data warehouse
Amazon Elastic MapReduce: Semi-structured
Amazon Simple Storage Service: Data storage
Amazon Glacier: Archive
Amazon DynamoDB: NoSQL
Amazon Machine Learning: Predictive models
Amazon Kinesis: Streaming
Other apps
AWS IoT
“Securely connect one or one billion devices to AWS, so they can interact with applications and other devices”
AWS IoT
Device SDK: Set of client libraries to connect, authenticate and exchange messages
Device Gateway: Communicate with devices via MQTT and HTTP
Authentication and Authorization: Secure with mutual authentication and encryption
Rules Engine: Transform messages based on rules and route to AWS services (AWS services, 3P services)
Device Shadow: Persistent thing state during intermittent connections
Applications: AWS IoT API
Device Registry: Identity and management of your things
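The device shadow idea can be sketched in a few lines: the shadow keeps a desired state (set by applications) and a reported state (set by the device), and the difference is what the device still has to apply when it reconnects. The code below is a simplified, flat-key illustration with a hypothetical helper name and made-up attributes, not the service's implementation.

```python
def shadow_delta(shadow):
    """Return the desired settings that the device has not yet reported.

    shadow: dict shaped like {"state": {"desired": {...}, "reported": {...}}}.
    Only top-level keys are compared in this simplified sketch.
    """
    state = shadow.get("state", {})
    desired = state.get("desired", {})
    reported = state.get("reported", {})
    return {
        key: value
        for key, value in desired.items()
        if key not in reported or reported[key] != value
    }


# Hypothetical thing: the app wants the LED on, the device last reported it off.
shadow = {
    "state": {
        "desired": {"led": "on", "interval": 60},
        "reported": {"led": "off", "interval": 60},
    }
}
print(shadow_delta(shadow))
```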
AWS IoT Rules Engine actions
Rules Engine: Transform messages based on rules and route to AWS services
The Rules Engine connects AWS IoT to external endpoints and AWS services.
1. AWS services (direct integration): AWS Lambda, Amazon SNS, Amazon SQS, Amazon S3, Amazon Kinesis, Amazon DynamoDB
2. Rest of AWS (via Amazon Kinesis, AWS Lambda, Amazon S3, and more): Amazon RDS, Amazon Redshift, Amazon Glacier, Amazon EC2
3. External endpoints (via Lambda and SNS)
AWS IoT Rules Engine actions
The Rules Engine evaluates inbound messages published into AWS IoT, then transforms and delivers them to the appropriate endpoint based on business rules. External endpoints can be reached via Lambda and Simple Notification Service (SNS).
Actions:
• Invoke a Lambda function
• Put object in an S3 bucket
• Insert, update, read from a DynamoDB table
• Publish to an SNS topic or endpoint
• Publish to an Amazon Kinesis stream
• Amazon Kinesis Firehose
• Republish to AWS IoT
• Amazon Machine Learning
• Amazon Elasticsearch
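A rule pairs a SQL-like filter over a topic with one or more actions. The sketch below only builds the rule definition as a dictionary, in the shape the AWS SDK expects for a topic rule payload as far as I know; it makes no AWS call. The topic, condition, and function ARN are hypothetical.

```python
def build_topic_rule(topic_filter, condition, lambda_arn, description=""):
    """Build a rule definition that invokes a Lambda function for matching messages."""
    return {
        # Select every field of messages on the topic that satisfy the condition.
        "sql": "SELECT * FROM '{}' WHERE {}".format(topic_filter, condition),
        "description": description,
        "ruleDisabled": False,
        "actions": [{"lambda": {"functionArn": lambda_arn}}],
    }


# Hypothetical rule: forward hot temperature readings to a Lambda function.
rule = build_topic_rule(
    topic_filter="sensors/+/temperature",
    condition="temperature > 50",
    lambda_arn="arn:aws:lambda:us-east-1:123456789012:function:example-handler",
    description="Forward hot readings",
)
print(rule["sql"])
```

Other actions from the list above (S3, DynamoDB, SNS, Kinesis) would be further entries in the actions list.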
AWS IoT Rules Engine & Amazon SNS
Push notifications: Apple APNS endpoint, Google GCM endpoint, Amazon ADM endpoint, Windows WNS
Amazon SNS -> HTTP endpoint (or SMS or email): Call HTTP-based 3rd party endpoints through SNS with subscription and retry support
AWS IoT Rules Engine for Machine Learning
Anomaly detection: Amazon Machine Learning can feed predictive evaluation criteria to the Rules Engine
Continuous improvement around prediction: Continuously look for outliers and re-calibrate the Amazon Machine Learning models
[Diagram: Send to S3, Amazon Machine Learning, Re-train, S3]
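"Looking for outliers" can be illustrated with a plain z-score check over a batch of sensor readings. This is a stand-in technique chosen for brevity, not the method Amazon Machine Learning uses; the function name, threshold, and readings are assumptions.

```python
from statistics import mean, stdev


def find_outliers(values, z_threshold=2.0):
    """Return the values lying more than z_threshold sample standard
    deviations away from the mean of the batch."""
    if len(values) < 2:
        return []
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all readings identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > z_threshold]


# Made-up temperature readings with one suspicious spike.
readings = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 35.0]
print(find_outliers(readings))
```

Readings flagged this way could be the ones written to S3 and used when the model is re-trained.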
AWS IoT Rules Engine & Stream Data
N:1 inbound streams of sensor data (signal-to-noise reduction): The Rules Engine filters and transforms sensor data, then sends the aggregate to Amazon Kinesis
Amazon Kinesis Streams to enterprise applications: Simultaneously stream processed data to databases, applications, and other AWS services
[Diagram: Ordered stream, Amazon Kinesis]
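The N:1 reduction can be pictured as collapsing many device readings into one aggregate per time window before forwarding it downstream. The sketch below does that in memory; the tuple layout, window size, and function name are assumptions made for illustration.

```python
from collections import defaultdict


def aggregate_readings(readings, window_seconds=60):
    """Average all device readings that fall into the same time window.

    readings: iterable of (device_id, timestamp_seconds, value).
    Returns {window_start_seconds: average_value}, ordered by window.
    """
    buckets = defaultdict(list)
    for _device_id, timestamp, value in readings:
        window_start = int(timestamp // window_seconds) * window_seconds
        buckets[window_start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


# Three readings from two hypothetical devices across two one-minute windows.
stream = [("a", 0, 10.0), ("b", 30, 20.0), ("a", 61, 30.0)]
print(aggregate_readings(stream))
```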
Thank you!
Batch predictions with EMR
Raw data in S3 → Process data with EMR → Aggregated data in S3 → Get predictions with Amazon ML batch API → Predictions in S3 → Your application
Batch predictions with Amazon Redshift
Structured data in Amazon Redshift → Get predictions with Amazon ML batch API → Predictions in S3 → Load predictions into Amazon Redshift, or read prediction results directly from S3 → Your application
Real-time predictions for interactive applications
Your application → Get predictions with Amazon ML real-time API → Amazon ML service
Adding predictions to an existing data flow
Your application + Amazon DynamoDB + Trigger events with Lambda + Get predictions with Amazon ML real-time API
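One way to read this flow: a Lambda function is triggered by DynamoDB stream events, turns each new item into a flat record, and asks the real-time API for a prediction. The sketch below shows the event handling only, with the prediction call passed in as a function so it runs without AWS. The handler names are hypothetical, and it assumes the stream is configured to include new item images.

```python
def image_to_record(new_image):
    """Flatten a DynamoDB stream image such as {'spend': {'N': '49.99'}}
    into a dict of plain strings, keeping only string and number attributes."""
    record = {}
    for name, typed_value in new_image.items():
        for type_code in ("S", "N"):
            if type_code in typed_value:
                record[name] = str(typed_value[type_code])
    return record


def handle_stream_event(event, predict):
    """Call predict(record) for every inserted or modified item in the event."""
    results = []
    for item in event.get("Records", []):
        if item.get("eventName") not in ("INSERT", "MODIFY"):
            continue  # deletions carry no new image to score
        record = image_to_record(item["dynamodb"]["NewImage"])
        results.append(predict(record))
    return results


# Fake stream event and a stand-in predict function that echoes its input.
event = {"Records": [
    {"eventName": "INSERT",
     "dynamodb": {"NewImage": {"user": {"S": "alex"}, "spend": {"N": "49.99"}}}},
    {"eventName": "REMOVE", "dynamodb": {}},
]}
print(handle_stream_event(event, predict=lambda record: record))
```

In a real function, predict would wrap a call such as the ml.predict example shown earlier in the deck.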
Recommendation engine
[Diagram components: raw data, Amazon S3, data cleansing, Amazon Redshift, build models, train model, Amazon ML, predictions, S3 static website]