Post on 05-Dec-2014
Regression Analysis & Prediction
Devon Jones, Lead Systems Engineer, Knewton
Gabe Hamilton, Software Engineering Mgr, Revionics
For those who work downtown, check out our DOSUG inspired group.
meetup.com/TechConfluence
3rd Wednesday of the month
At lunch: 12:30 - 1:30pm
Tech Confluence
The Plan
1. Regression Analysis - Devon
2. Google Prediction API - Gabe
3. Applying Regression - Devon
Google Prediction API
{ "label": "awesome", "score": 0.98 },
{ "label": "lame", "score": 0.08 }
Gabe Hamilton
What kind of Prediction?
Predict an output value based on some input values.
Things like:
Sentiment Analysis, Spam Detection, Today's temperature, GDP Growth
How does Google predict things?
Through an intensive breeding program Google has managed to distribute Punxsutawney Phils throughout its datacenters across the world. Each Phil is kept in a climate controlled enclosure that mimics the conditions of a perfectly average February 2nd. A full scale digital sundial maps your problem domain onto the shadow matrix of the enclosure allowing each Phil to fully interact with your model. The early spring / long winter emergence probability of each Phil is then sorted and reduced to determine the final result returned by the prediction API.
Well, it's Google
No Really, How do they do it?
Short answer: I have no idea
Long answer: It's a service, so they can do whatever works: swap implementations, run multiple algorithms
Possible Implementations
● Regression Analysis
● Neural Networks
● Support Vector Machine
● Monte Carlo Sim
● Decision Trees
● Evolutionary Algorithms
Basically it is
STATISTICS
Types of Prediction you can do
Regression
How do inputs cause an output to vary?
Output is a numeric value: Shopping Cart Size, Stock Price
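A minimal sketch of regression in the sense above: fit a straight line to one input and use it to predict a numeric output. This is plain ordinary least squares, not whatever Google runs behind the service, and the "items viewed" data is made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of the input, and how input and output vary together
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the fitted line to a new input value."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data: items viewed -> shopping cart size
items_viewed = [1, 2, 3, 4, 5]
cart_size = [2.0, 4.0, 6.0, 8.0, 10.0]

model = fit_line(items_viewed, cart_size)
print(predict(model, 6))
```

The slope answers "how do inputs cause an output to vary?": it is the change in cart size per extra item viewed.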
Classification
Deciding which bucket some input belongs in
Buckets are text values: French, Spanish, English
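A toy sketch of classification, assuming a deliberately naive approach: remember which words were seen under each bucket, then score new text by the share of its words each bucket recognizes. The training sentences are hypothetical, and this is not the algorithm the Prediction API uses.

```python
from collections import Counter, defaultdict


def train(examples):
    """Count how often each word appears under each label."""
    counts = defaultdict(Counter)
    for label, text in examples:
        counts[label].update(text.lower().split())
    return counts


def classify(counts, text):
    """Return (label, score) pairs, best first.

    The score is the fraction of words in `text` seen under that label.
    """
    words = text.lower().split()
    scores = []
    for label, word_counts in counts.items():
        matched = sum(1 for w in words if w in word_counts)
        scores.append((label, matched / len(words) if words else 0.0))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical training examples: (bucket, text)
examples = [
    ("French", "je pense donc je suis"),
    ("Spanish", "pienso luego existo"),
    ("English", "i think therefore i am"),
]

model = train(examples)
ranked = classify(model, "je pense")
print(ranked[0])
```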
What is Classification good for?
Classification
● Sentiment analysis
● Spam detection
● Language categorization
● Tagging
● Assign priority to bugs
● Predict movie ratings
● Message routing decisions
● <Your brilliant idea here>
Hello World page is great
https://developers.google.com/prediction/docs/hello_world
Getting Started
So you have a big pile of data
Time for some cleanup
90% of the development time is data cleanup
Good talk on data driven projects:
http://www.slideshare.net/ryanweald/building-data-driven-products-with-ruby-rubyconf-2012
CSV Input file aka Training Set
First column is expected values.
2nd through N columns are input values
"French", "Je pense donc j'essuie", "Paris"
Columns in the example row: output, an input, more input
No header row. 250MB max file size.
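A small sketch of producing a training file in the layout described above, using Python's standard csv module: expected value first, inputs after, no header row. The second row is a hypothetical addition for illustration. The example writes to an in-memory buffer, but a real file handle works the same way.

```python
import csv
import io


def write_training_csv(rows, out):
    """Write label-first rows as CSV, quoting text fields, with no header."""
    writer = csv.writer(out, quoting=csv.QUOTE_NONNUMERIC)
    for label, *inputs in rows:
        writer.writerow([label, *inputs])


rows = [
    ("French", "Je pense donc j'essuie", "Paris"),
    ("English", "I think therefore I am", "London"),  # hypothetical row
]

buffer = io.StringIO()
write_training_csv(rows, buffer)
training_csv = buffer.getvalue()
print(training_csv)
```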
4 Steps to Prediction
1. Create a CSV file of your training data
2. Create a new Project in the Prediction API
   a. requires entering billing info
3. Upload your csv file to Google Storage
4. In Prediction API Browser:
   a. insert a new training set (the csv file)
   b. view your trained set
   c. use trainedmodel.predict to make predictions
See the hello world for details of the method calls
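A rough sketch of the data that moves through step 4, with no network calls. The field names (`id`, `storageDataLocation`, `input.csvInstance`) are recalled from the v1.6 API and should be treated as assumptions to check against the hello world page. The model id and bucket path are hypothetical. The score list is the one from the title slide.

```python
import json


def insert_body(model_id, storage_location):
    """Request body for inserting (training) a model from a stored CSV.

    Field names are assumed from the v1.6 API; verify before use.
    """
    return {"id": model_id, "storageDataLocation": storage_location}


def predict_body(inputs):
    """Request body for a prediction: one row of input columns, no label."""
    return {"input": {"csvInstance": list(inputs)}}


def best_label(scores):
    """Pick the label with the highest score from a list of label/score pairs."""
    return max(scores, key=lambda entry: entry["score"])["label"]


# Hypothetical model id and Google Storage path
print(json.dumps(insert_body("language-id", "my-bucket/training.csv")))
print(json.dumps(predict_body(["Je pense donc j'essuie", "Paris"])))

# Score list in the shape shown on the title slide
scores = json.loads(
    '[{"label": "awesome", "score": 0.98}, {"label": "lame", "score": 0.08}]'
)
print(best_label(scores))
```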
Let's make some predictions...
Live demo screenshots: List Models
Live demo screens: Analyze Model
Live demo: Predict Model Category
Live demo: Predict Model Numeric
Storage for datasets
https://storage.cloud.google.com
API Explorer
https://developers.google.com/apis-explorer/#s/prediction/v1.6/