Date post: | 26-Dec-2014 |
Category: |
Technology |
Upload: | radu-sebastian-amarie |
Predicting the future with Google Prediction API
Talks #32
• RESTful API
• Asynchronous cloud-based training, automatic model selection and tuning, and the ability to add training data on the fly.
• Flexible input: numeric or text input that can output hundreds of discrete categories or continuous values.
Great, so what do we do now?
The same thing we do every night Pinky, TRY TO TAKE OVER THE WORLD!
Does that take any money?
• Well… it's free. Within reason :D
• 1.0 requests/second/user
• 100 requests/day
• 5 MB of training data/day
• 100 streaming updates/day
• Lifetime cap (20k predictions), so after 20k predictions you have to start paying...
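A minimal sketch of staying inside the free limits listed above, assuming you do the bookkeeping on the client side. The class `FreeQuota` and its method names are hypothetical and not part of any Google library; only the numbers come from the slide.

```python
class FreeQuota:
    """Client-side bookkeeping for the free limits quoted above (hypothetical helper)."""

    MIN_INTERVAL = 1.0      # 1.0 requests/second/user
    DAILY_LIMIT = 100       # 100 requests/day
    LIFETIME_CAP = 20_000   # lifetime cap on free predictions

    def __init__(self):
        self.today = 0
        self.lifetime = 0
        self.last_request = None  # timestamp of the previous request, in seconds

    def wait_needed(self, now):
        """Seconds to wait at time `now` before the next request is allowed."""
        if self.last_request is None:
            return 0.0
        return max(0.0, self.MIN_INTERVAL - (now - self.last_request))

    def can_request(self):
        """True while both the daily and the lifetime limits have room left."""
        return self.today < self.DAILY_LIMIT and self.lifetime < self.LIFETIME_CAP

    def record(self, now):
        """Count one request made at time `now`."""
        if not self.can_request():
            raise RuntimeError("free quota exhausted")
        self.today += 1
        self.lifetime += 1
        self.last_request = now

    def new_day(self):
        """Reset the daily counter; the lifetime counter keeps growing."""
        self.today = 0
```

In a real script you would pass `time.time()` as `now` and sleep for `wait_needed(...)` seconds before each call.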
Great, so what do I get for my MONEY? X(
• $10, with 10k free predictions per month
• 10k free streaming updates
• Max training upload (via Google Cloud Storage): 2.5 GB
How do I get started?
• Glad you asked!
• You need to create a new project in the Google API Console and enable:
• Google Prediction API
• Google Cloud Storage API (requires billing ON, i.e. it wants your card)
Great, any documentation to read?
• Yes!
• But it totally sucks. (Everything under Tools and Resources has broken links…)
• But the Hello World example works. Yippee!
Great, I got things done, now what?
• Now we train on the CSV, if we have one.
• If not, we build it.
Great, what should my CSV look like?
"like", "I won the lottery and I want to give everyone free hosting forever", "bucuresti", "loto"
"dislike", "Two stray dogs bit 3 cloned cats and died.", "bucuresti", "venim"
[output], [feature1], [feature2], [feature3]
Output = output. Hahaha.
Feature = input. It can be numeric / text / whatever.
And NO HEADERS in the CSV.
And at most 2.5 GB.
Well, if you are on the free tier of Google Prediction, 4 MB to be exact.
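A minimal sketch of producing a file in that shape with Python's standard `csv` module: output first, then the features, every field quoted, no header row. The helper name `to_training_csv` and the sample rows are made up for illustration.

```python
import csv
import io


def to_training_csv(rows):
    """Serialize (output, feature1, feature2, ...) tuples as headerless, fully quoted CSV."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in rows:
        writer.writerow(row)  # data rows only, a header row is never written
    return buf.getvalue()


# Illustrative rows in the same shape as the slide: [output], [feature1], [feature2], [feature3]
rows = [
    ("like", "I won the lottery", "bucuresti", "loto"),
    ("dislike", 'He said "no", twice', "bucuresti", "venim"),
]
csv_text = to_training_csv(rows)
size_in_bytes = len(csv_text.encode("utf-8"))  # compare against the upload limit before sending
```

Quoting every field means commas and quotation marks inside a tweet cannot break the row apart; the `csv` module doubles embedded quotation marks for you.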
Great, can you show us one?
That’s one ugly Excel, not a CSV
NEVER USE EXCEL! It does not write fields as content","content (quotation mark, comma, quotation mark between values).
And do not upload to Google Drive and export from their Spreadsheet either.
So, go for OpenOffice!
So? Now what?
• Upload the CSV to Google Cloud Storage.
500 training examples = 18 sec
476 instances? Shouldn't it be 500?
Let’s see some fresh meat. I mean tweet. Lol
So, how well does Google Prediction API predict?
• Some guy wanted to run a few tests / examples:
• http://blog.notdot.net/2010/06/Trying-out-the-new-Prediction-API
• Training on movie/book reviews to try and predict the score given based on the text
• Training on product descriptions to try and predict their rating
• Training on Reddit submissions to try and predict the subreddit a new submission belongs in
Guessing subreddits with the Prediction API
• He had: 75 MB of JSON-encoded data, comprising 72,986 submissions
• He picked the 20 subreddits with the most submissions in that time period
• This subset made up 42,753 submissions, or about 58% of the original.
• Submissions were randomly split into either the training set (98%) or the validation set (2%):
Reddit Submissions
reddit.com 14578
pics 4157
AskReddit 3375
reportthespammers 3258
politics 3162
funny 2176
WTF 1773
gaming 1367
worldnews 938
videos 849
atheism 834
Music 833
technology 732
trees 703
comics 639
nsfw 611
circlejerk 600
news 567
environment 537
DoesAnybodyElse 537
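A minimal sketch of a random 98% / 2% split like the one described above. It assumes each submission is assigned independently at random, which the slides do not confirm; the function name and the fixed seed are illustrative.

```python
import random


def split_train_validation(items, validation_fraction=0.02, seed=0):
    """Assign each item independently to the validation set with the given probability."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, validation = [], []
    for item in items:
        if rng.random() < validation_fraction:
            validation.append(item)
        else:
            train.append(item)
    return train, validation


# Stand-in IDs for the 42,753 submissions in the subset
train, validation = split_train_validation(range(42_753))
```

With a per-item draw the validation set only lands near 2% of the data rather than on an exact count.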
After training, Google estimated a success rate of 61%.
So? How did it do?
484 of 857 predicted correctly: 56%, not far off the system's own estimate.
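The arithmetic behind that comparison, as a quick check; the variable names are illustrative.

```python
correct, total = 484, 857       # validation results quoted above
accuracy = correct / total      # measured success rate
estimate = 0.61                 # the success rate estimated after training
gap = estimate - accuracy       # how far the measurement falls short of the estimate

print(f"measured {accuracy:.1%}, estimated {estimate:.0%}, gap {gap * 100:.1f} points")
```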
Where’s the problem?
• People are the problem, not Google Prediction API
• Users filed submissions under the wrong categories. NEVER TRUST THE USER!
Anyway, back to our own business
• Data harvesting (from Twitter)
• Phirehose - https://github.com/fennb/phirehose - a PHP interface to the Twitter Streaming API
• What have I gathered?
1.3 GB of Twitter #bigdata harvesting. Hehe
I took 500 examples (but the more, the better)
I put them in Excel and split them into 3 buckets (0, 1, 2)
0 = Dislike = I don't like it
1 = Fav = I only fav it
2 = Reshare = I both fav and retweet it
Save to CSV, upload, TRAIN.
So, how does this help us?
The interesting part is that although we have 3 values (0, 1 or 2), it returns a float between 0 and 2, so a result like 1.563212 is entirely possible!
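One way to turn such a float back into one of the three buckets is to clamp it and round to the nearest integer. This is a sketch under that assumption; the slides do not say which rule the bot actually applies, and the names `nearest_bucket` and `ACTIONS` are hypothetical.

```python
# Bucket meanings from the training data: 0 = dislike, 1 = fav, 2 = fav and retweet
ACTIONS = {0: "ignore", 1: "fav", 2: "fav + retweet"}


def nearest_bucket(score):
    """Clamp a predicted score to [0, 2] and round it to the nearest bucket."""
    clamped = min(max(score, 0.0), 2.0)
    return int(clamped + 0.5)  # round half up; safe because clamped is never negative


bucket = nearest_bucket(1.563212)
action = ACTIONS[bucket]
```

A stricter bot could instead demand a minimum score before acting at all, so that borderline predictions are left alone.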
What did I use for the Twitter Bot cool Follower gathering Application blabla?
• The PHP library that goes with Google Prediction, namely serviceAccount.php
• It's broken!
$result = $service->trainedmodels->predict($id, $input);
It should be:
$result = $service->trainedmodels->predict($project, $id, $input);
What else?
• Twitter API Exchanger - https://github.com/J7mbo/twitter-api-php
So what exactly do we do?
• Database: take the latest tweet
• See what score it gets
• If it gets a good score, fav / retweet it.
• That's it.
Huge recap?
• New Google Project
• Enable Google Prediction & Google Cloud Storage
• Upload your training CSV
• Make predictions via API Explorer
• Download the PHP library for Google Prediction & the Twitter library
• Fix the Google library
• Put it all together in one PHP file
• Run it, put a sleep, make it run forever lol.
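The last steps of the recap can be sketched as a small loop. This is written in Python rather than the PHP used in the talk, and everything the loop calls is a hypothetical stand-in passed in by the caller: `fetch_latest_tweet` for the database read, `predict_score` for the Prediction API call, `act` for the fav / retweet call. The threshold and pause values are assumptions too.

```python
import time


def run_bot(fetch_latest_tweet, predict_score, act,
            threshold=1.5, pause_seconds=60, max_rounds=None, sleep=time.sleep):
    """Poll for the latest tweet, score it, act on good scores, sleep, repeat."""
    rounds = 0
    while max_rounds is None or rounds < max_rounds:  # max_rounds=None means run forever
        tweet = fetch_latest_tweet()
        if tweet is not None:
            score = predict_score(tweet)
            if score >= threshold:  # "a good score", the cut-off is an assumption
                act(tweet, score)
        sleep(pause_seconds)  # the pause also keeps the request rate down
        rounds += 1
```

Passing the three callables in keeps the loop testable without touching Twitter or Google; in production they would wrap the real clients.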