Post on 05-Dec-2014
Google Prediction API
{ "label": "awesome", "score": 0.98 },
{ "label": "lame", "score": 0.08 }
Gabe Hamilton
What kind of Prediction?
Predict an output value based on some input values.
Things like:
Sentiment Analysis, Spam Detection, Today's temperature, GDP Growth
How does Google predict things?
Through an intensive breeding program, Google has managed to distribute Punxsutawney Phils throughout its datacenters across the world. Each Phil is kept in a climate-controlled enclosure that mimics the conditions of a perfectly average February 2nd. A full-scale digital sundial maps your problem domain onto the shadow matrix of the enclosure, allowing each Phil to fully interact with your model. The early spring / long winter emergence probability of each Phil is then sorted and reduced to determine the final result returned by the prediction API.
Well, it's Google
No Really, How do they do it?
Short Answer:
I have no idea
Long answer:
It's a service, so they can do whatever works: swap implementations, run multiple algorithms
Possible Implementations
Regression Analysis, Neural Networks, Principal Component Analysis, Support Vector Machine, Monte Carlo Simulation, Decision Trees, Evolutionary Algorithms, etc.
But basically it is
STATISTICS
Types of Prediction you can do
Regression
How do inputs cause an output to vary?
Output is a numeric value: Shopping Cart Size Stock Price GDP
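To make "how do inputs cause an output to vary" concrete, here is a minimal sketch of a one-input least-squares fit in plain Python. It is only an illustration of regression in general, not how the Prediction API works internally (that is unknown), and the shopping-cart numbers are made up.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of the input, and how input and output move together
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data: items viewed -> shopping cart size
items_viewed = [1, 2, 3, 4]
cart_size = [3, 5, 7, 9]

slope, intercept = fit_line(items_viewed, cart_size)

# Predict a numeric output for an input the model has not seen
predicted_cart_size = slope * 5 + intercept
print(predicted_cart_size)
```

The output is a number, which is what separates regression from classification.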
Classification
Deciding which bucket some input belongs in
Buckets are text values: French, Spanish, English
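A classifier typically reports a score per bucket, in the same label/score shape shown on the title slide, and the prediction is the bucket with the highest score. A minimal sketch of that last step, with made-up scores and a hypothetical helper name `pick_bucket`:

```python
def pick_bucket(scores):
    """Return the label of the highest-scoring bucket."""
    best = max(scores, key=lambda entry: entry["score"])
    return best["label"]


# Made-up scores for one input sentence
scores = [
    {"label": "French", "score": 0.83},
    {"label": "Spanish", "score": 0.12},
    {"label": "English", "score": 0.05},
]

print(pick_bucket(scores))
```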
What is Classification good for?
Classification
● Sentiment analysis
● Spam detection
● Language categorization
● Tagging
● Assign priority to bugs
● Predict movie ratings
● Message routing decisions
● <Your brilliant idea here>
Hello World page is great
https://developers.google.com/prediction/docs/hello_world
Getting Started
So you have a big pile of data
Time for some cleanup
90% of the development time is data cleanup
Great RubyConf talk on this:
http://www.slideshare.net/ryanweald/building-data-driven-products-with-ruby-rubyconf-2012
CSV Input file aka Training Set
The first column is the expected value.
Columns 2 through N are the input values.
"French", "Je pense donc j'essuie", "Paris"
Output, an input, more input
No header row
250MB max file size
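A training file in the layout above can be produced with Python's standard `csv` module. This is a minimal sketch: the helper name `to_training_csv` is hypothetical, and every row after the first is made up for illustration.

```python
import csv
import io

MAX_BYTES = 250 * 1024 * 1024  # the 250MB limit from the slide


def to_training_csv(rows):
    """Serialize (label, input, input, ...) rows with no header line."""
    buffer = io.StringIO()
    # QUOTE_NONNUMERIC wraps every text field in double quotes
    writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


rows = [
    ("French", "Je pense donc j'essuie", "Paris"),
    ("Spanish", "Pienso, luego existo", "Madrid"),   # made-up row
    ("English", "I think therefore I am", "London"),  # made-up row
]

training_csv = to_training_csv(rows)
assert len(training_csv.encode("utf-8")) <= MAX_BYTES
print(training_csv)
```

Quoting the text fields keeps commas inside a sentence from being read as column separators.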
4 Steps to Prediction
1. Create a CSV file of your training data
2. Create a new Project in the Prediction API
  a. requires entering billing info
3. Upload your CSV file to Google Storage
4. In Prediction API Browser:
  a. insert a new training set
  b. view your trained set
  c. use trainedmodels.predict to make predictions
See the hello world for details of the method calls
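For the insert and predict calls, the API browser asks for a JSON request body. The sketch below only builds those bodies and sends nothing over the network. The field names (`id`, `storageDataLocation`, `input.csvInstance`) are written from memory of the v1.5 hello world and should be treated as assumptions to check against that page; the model id, bucket and file name are hypothetical.

```python
import json


def training_request(model_id, bucket, filename):
    """Body for inserting a training set (field names are assumptions)."""
    return {
        "id": model_id,
        "storageDataLocation": f"{bucket}/{filename}",
    }


def predict_request(inputs):
    """Body for a predict call: input columns only, no label column."""
    return {"input": {"csvInstance": list(inputs)}}


# Hypothetical model id, bucket and file name
insert_body = training_request("languages", "my-bucket", "language_id.csv")
predict_body = predict_request(["Je pense donc j'essuie", "Paris"])

print(json.dumps(insert_body, indent=2))
print(json.dumps(predict_body, indent=2))
```

Note that the predict body carries the same columns as a training row minus the first one, since the first column is what the model is being asked to produce.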
Let's make some predictions...
Storage for datasets
https://storage.cloud.google.com
API Explorer
https://developers.google.com/apis-explorer/#s/prediction/v1.5/