● Big Data
● BigQuery
● CloudML
● Cloud API
Karthik Padmanabhan, Developer Relations
@karthik_padman
Big data and machine learning at Google
BigQuery: anything you can ask in SQL
Cloud Dataflow: parallel processing, batch and stream
Cloud ML: machine learning, neural networks
Open source: Apache Beam (Cloud Dataflow), TensorFlow (Cloud ML)
Pre-trained models: Vision API, Speech API, Translate API
Cloud Dataflow demo
Vision API demo
TensorFlow demo
Photo credits: Matt Chan; isaiah115 on Flickr
Google Research Publications
Open Source Implementations
Bigtable
Flume
Dremel
Managed Cloud Versions
Bigtable → Bigtable
Flume → Dataflow
Dremel → BigQuery
BigQuery demo
Google BigQuery
02 Count some stuff
SELECT count(word)
FROM publicdata:samples.shakespeare
Words in Shakespeare
SELECT sum(requests) as total
FROM [fh-bigquery:wikipedia.pagecounts_20150212_01]
Wikipedia hits over 1 hour
SELECT sum(requests) as total
FROM [fh-bigquery:wikipedia.pagecounts_201505]
Wikipedia hits over 1 month
Several years of Wikipedia data
SELECT sum(requests) as total
FROM [fh-bigquery:wikipedia.pagecounts_201105],
     [fh-bigquery:wikipedia.pagecounts_201106],
     [fh-bigquery:wikipedia.pagecounts_201107],
     ...
SELECT SUM(requests) AS total
FROM TABLE_QUERY(
  [fh-bigquery:wikipedia],
  'REGEXP_MATCH(table_id, r"pagecounts_2015[0-9]{2}$")')
How about a RegExp
SELECT SUM(requests) AS total
FROM TABLE_QUERY(
  [fh-bigquery:wikipedia],
  'REGEXP_MATCH(table_id, r"pagecounts_2015[0-9]{2}$")')
WHERE (REGEXP_MATCH(title, '.*[dD]inosaur.*'))
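The TABLE_QUERY filter above can be illustrated outside BigQuery. The following is a minimal sketch in Python, assuming REGEXP_MATCH behaves as a partial (search-style) match; the list of table ids is hypothetical and only modeled on the pagecounts names shown in the slides.

```python
import re

# Hypothetical table ids, modeled on the pagecounts_* names in the slides.
TABLE_IDS = [
    "pagecounts_201105",
    "pagecounts_201412",
    "pagecounts_201501",
    "pagecounts_201505",
    "pagecounts_20150212_01",
]

# Same pattern as the TABLE_QUERY example.
PATTERN = re.compile(r"pagecounts_2015[0-9]{2}$")


def matching_tables(table_ids):
    """Return the table ids the pattern selects (partial match, anchored at the end)."""
    return [t for t in table_ids if PATTERN.search(t)]


selected = matching_tables(TABLE_IDS)
print(selected)
```

The trailing `$` is what keeps the hourly table (`pagecounts_20150212_01`) out of the monthly set.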
03 How did it do that? o_O
Qualities of a good RDBMS
● Inserts & locking
● Indexing
● Cache
● Query planning
Storing data
[Diagram: a Table is split into Columns, and the Columns are spread across Disks]
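The "Storing data" slide (Table, Columns, Disks) suggests column-oriented storage. Here is a minimal sketch of the idea in Python, assuming a simple in-memory layout; the sample rows and the helper name to_columns are hypothetical and say nothing about BigQuery's actual on-disk format.

```python
# Row-oriented view: one dict per row (hypothetical sample values).
rows = [
    {"title": "Jenga", "language": "en", "requests": 12},
    {"title": "Dinosaur", "language": "en", "requests": 40},
    {"title": "Jenny", "language": "de", "requests": 7},
]


def to_columns(rows):
    """Pivot rows into one list per column, the way a column store keeps them."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# A query like SELECT sum(requests) only has to read a single column.
total_requests = sum(columns["requests"])
print(total_requests)
```

The point of the layout is that a query touching one column never has to read the others.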
Reading data: Life of a BigQuery
SELECT sum(requests) as sum
FROM (
  SELECT requests, title
  FROM [fh-bigquery:wikipedia.pagecounts_201501]
  WHERE (REGEXP_MATCH(title, '[Jj]en.+'))
)
Life of a BigQuery
[Diagram: serving tree, top to bottom: Root Mixer, Mixers (M), Leaves (L), Storage]
Storage → Leaf: SELECT requests, title (5.4 Bil)
Leaf → Mixer: WHERE (REGEXP_MATCH(title, '[Jj]en.+')) (5.8 Mil)
Mixer → Root Mixer: SELECT sum(requests)
Root Mixer: SELECT sum(requests)
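The serving tree in the "Life of a BigQuery" slides (storage, leaves, mixers, root mixer) can be imitated in a few lines. This is a toy sketch in Python, assuming each leaf scans one storage shard and returns a partial sum; the shard contents, the fan_in parameter, and the function names are all hypothetical.

```python
import re

# Same filter as the query in the slides.
PATTERN = re.compile(r"[Jj]en.+")


def leaf(shard):
    """Scan one shard of (title, requests) pairs, filter, and return a partial sum."""
    return sum(requests for title, requests in shard if PATTERN.search(title))


def mixer(partial_sums):
    """Combine the partial sums coming from the level below."""
    return sum(partial_sums)


def run_query(shards, fan_in=2):
    """Leaves run per shard, mixers combine groups of leaves, the root mixer finishes."""
    leaf_sums = [leaf(shard) for shard in shards]
    mixer_sums = [
        mixer(leaf_sums[i:i + fan_in]) for i in range(0, len(leaf_sums), fan_in)
    ]
    return mixer(mixer_sums)  # root mixer


# Hypothetical shards standing in for storage.
shards = [
    [("Jenga", 10), ("Dinosaur", 99)],
    [("Jennifer", 5)],
    [("Ben", 1), ("jenkins", 3)],
    [("Jen", 7)],
]
total = run_query(shards)
print(total)
```

Because a sum can be computed from partial sums, every level only forwards one number upward, which is why the aggregation appears at both mixer levels in the slides.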
04 Something Useful
Use Wikipedia data to pick a movie
1. Wikipedia edits
2. ???
3. Movie recommendation
Follow the edits
Same editor
select title, id, count(id) as edits
from [publicdata:samples.wikipedia]
where title contains 'Hackers'
  and title contains '(film)'
  and wp_namespace = 0
group by title, id
order by edits
limit 10
Pick a great movie
select title, id, count(id) as edits
from [publicdata:samples.wikipedia]
where contributor_id in (
    select contributor_id
    from [publicdata:samples.wikipedia]
    where id=264176
      and contributor_id is not null
      and is_bot is null
      and wp_namespace = 0
      and title CONTAINS '(film)'
    group by contributor_id)
  and wp_namespace = 0
  and id != 264176
  and title CONTAINS '(film)'
group each by title, id
order by edits desc
limit 100
Find edits in common
Discover the most broadly popular films
select id from (
  select id, count(id) as edits
  from [publicdata:samples.wikipedia]
  where wp_namespace = 0
    and title CONTAINS '(film)'
  group each by id
  order by edits desc
  limit 20)
Edits in common, minus broadly popular
select title, id, count(id) as edits
from [publicdata:samples.wikipedia]
where contributor_id in (
    select contributor_id
    from [publicdata:samples.wikipedia]
    where id=264176
      and contributor_id is not null
      and is_bot is null
      and wp_namespace = 0
      and title CONTAINS '(film)'
    group by contributor_id)
  and wp_namespace = 0
  and id != 264176
  and title CONTAINS '(film)'
  and id not in (
    select id from (
      select id, count(id) as edits
      from [publicdata:samples.wikipedia]
      where wp_namespace = 0
        and title CONTAINS '(film)'
      group each by id
      order by edits desc
      limit 20
    )
  )
group each by title, id
order by edits desc
limit 100
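The logic of the "edits in common, minus broadly popular" query can be followed on a small in-memory data set. This is a rough sketch in Python, not the query itself: the edit records, ids, and the recommend helper are hypothetical, and the bot, namespace, and '(film)' filters are left out for brevity.

```python
from collections import Counter

# Hypothetical (contributor_id, page_id) edit records. Page 1 is the seed film.
edits = [
    ("a", 1), ("b", 1),
    ("a", 2), ("b", 2),
    ("a", 3),
    ("a", 9), ("b", 9), ("c", 9), ("d", 9), ("e", 9),
    ("c", 4),
]


def recommend(edits, seed_id, popular_cutoff):
    """Rank pages edited by the seed page's editors, skipping the most-edited pages."""
    # Inner subquery: contributors who edited the seed page.
    fans = {contributor for contributor, page in edits if page == seed_id}
    # "Broadly popular" subquery: the pages with the most edits overall.
    overall = Counter(page for _, page in edits)
    popular = {page for page, _ in overall.most_common(popular_cutoff)}
    # Outer query: count edits in common, minus the seed and the popular pages.
    common = Counter(
        page
        for contributor, page in edits
        if contributor in fans and page != seed_id and page not in popular
    )
    return common.most_common()


ranking = recommend(edits, seed_id=1, popular_cutoff=1)
print(ranking)
```

Page 9 is edited by everyone, so it says little about the seed film's editors in particular; dropping it mirrors the `id not in (...)` clause.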
Interesting challenges await
The plan
01
02
03
04
05
A (very) brief overview of machine learning
Vision API
Speech API
Natural Language API
Tears (of joy)
Confidential & Proprietary, Google Cloud Platform
Machine Learning is
using many examples to answer questions
Why the sudden explosion in machine learning?
Google Cloud is
The Datacenter as a Computer
So what's special?
● Sound → Text
● Pixels → Meaning
Understanding the real world is hard
How can we make it easier?
Cloud Speech API Cloud Vision API
Speech API
● Speech to text transcription in over 80 languages
● Supports streaming and non-streaming recognition
● Filters inappropriate content
● Demo!
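To make the Speech API bullets above concrete, here is a minimal sketch in Python that only builds a request body and sends nothing. It assumes the field names of the v1 REST method `speech:recognize` (`config`, `audio`, `languageCode`, `profanityFilter`); the API version shown in the talk may differ, and the helper name and sample bytes are hypothetical.

```python
import base64
import json


def build_recognize_request(audio_bytes, language_code="en-US", filter_profanity=True):
    """Build a JSON body for a non-streaming recognition call (assumed v1 shape)."""
    return {
        "config": {
            "encoding": "LINEAR16",          # raw 16-bit PCM, assumed for this sketch
            "sampleRateHertz": 16000,
            "languageCode": language_code,   # one of the supported languages
            "profanityFilter": filter_profanity,  # "filters inappropriate content"
        },
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


# Hypothetical audio payload; a real call would read a recording from disk.
body = build_recognize_request(b"\x00\x01" * 4, language_code="fr-CA")
payload = json.dumps(body)
print(payload)
```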
Vision API
● Label
● Landmark
● Logo
● Face
● Text
● Safe search
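The six detection types listed above map onto feature types in a Vision API request. The sketch below, in Python, builds a request body for the REST method `images:annotate` without sending it; the helper name, the default of five results, and the sample bytes are hypothetical.

```python
import base64

# Slide bullet -> feature type name used in an images:annotate request.
FEATURE_TYPES = {
    "Label": "LABEL_DETECTION",
    "Landmark": "LANDMARK_DETECTION",
    "Logo": "LOGO_DETECTION",
    "Face": "FACE_DETECTION",
    "Text": "TEXT_DETECTION",
    "Safe search": "SAFE_SEARCH_DETECTION",
}


def build_annotate_request(image_bytes, features, max_results=5):
    """Build a request body asking for the given detection types on one image."""
    return {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [
                    {"type": FEATURE_TYPES[name], "maxResults": max_results}
                    for name in features
                ],
            }
        ]
    }


# Hypothetical image payload; a real call would read an image file.
request_body = build_annotate_request(b"fake image bytes", ["Label", "Landmark"])
print(request_body["requests"][0]["features"])
```

Several feature types can be requested for the same image in one call, which is why `features` is a list.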
Photo attributions: Eiffel Tower (Creative Commons via Sathish J), Lens (Creative Commons via Mark Hunter)
{
  "labelAnnotations": [
    { "mid": "/m/0c9ph5", "description": "Flower", "score": 98 },
    { "mid": "/m/05s2s", "description": "Plant", "score": 93 },
    { "mid": "/m/03bmqb", "description": "Flora", "score": 83 },
    { "mid": "/m/0k3b9", "description": "Hydrangea", "score": 81 }
  ]
}
Label Detection
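A response shaped like the label detection example can be filtered by score on the client side. This is a small sketch in Python, assuming scores are given as percentages as on the slide; the threshold of 90 and the helper name are hypothetical.

```python
import json

# Label annotations as shown on the slide (scores as percentages).
RESPONSE = """
{
  "labelAnnotations": [
    { "mid": "/m/0c9ph5", "description": "Flower", "score": 98 },
    { "mid": "/m/05s2s", "description": "Plant", "score": 93 },
    { "mid": "/m/03bmqb", "description": "Flora", "score": 83 },
    { "mid": "/m/0k3b9", "description": "Hydrangea", "score": 81 }
  ]
}
"""


def confident_labels(response, min_score):
    """Return the descriptions of labels scoring at least min_score."""
    return [
        annotation["description"]
        for annotation in response["labelAnnotations"]
        if annotation["score"] >= min_score
    ]


labels = confident_labels(json.loads(RESPONSE), min_score=90)
print(labels)
```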
{
"landmarkAnnotations" : [
{
"boundingPoly" : {
"vertices" : [
{
"x" : 52,
"y" : 25
},
...
]
},
"mid" : "\/m\/0b__kbm",
"score" : 0.4231607,
"description" : "The Wizarding World of Harry Potter",
"locations" : [
{
"latLng" : {
"longitude" : -81.471261,
"latitude" : 28.473
}
}
]
}
]
}
Landmark Detection
{
  ...
  "itemListElement": [
    {
      "@type": "EntitySearchResult",
      "result": {
        "@id": "kg:/m/0b__kbm",
        "name": "The Wizarding World of Harry Potter",
        ...
        "detailedDescription": {
          "articleBody": "The Wizarding World of Harry Potter is a themed area spanning two theme parks – Islands of Adventure and Universal Studios Florida – at the Universal Orlando Resort in Orlando, Florida, USA.\n",
          ...
Knowledge Graph sidebar
GET https://kgsearch.googleapis.com/v1/entities:search?ids=%2Fm%2F0b__kbm&key={API_KEY}
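The `mid` returned by landmark detection is what links the Vision API result to the Knowledge Graph lookup. The Python sketch below shows how the GET URL above is formed from a `mid`; the helper name is hypothetical and "API_KEY" is a placeholder, not a real key.

```python
from urllib.parse import urlencode

BASE_URL = "https://kgsearch.googleapis.com/v1/entities:search"


def entity_search_url(mid, api_key):
    """Build the Knowledge Graph search URL for one machine id (mid)."""
    # urlencode percent-encodes the slashes in the mid ("/" becomes "%2F").
    return BASE_URL + "?" + urlencode({"ids": mid, "key": api_key})


# The mid comes from the landmark detection example; the key is a placeholder.
url = entity_search_url("/m/0b__kbm", "API_KEY")
print(url)
```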
"faceAnnotations" : [
{
"headwearLikelihood" : "VERY_UNLIKELY",
"surpriseLikelihood" : "VERY_UNLIKELY",
"rollAngle" : 8.5484314,
"angerLikelihood" : "VERY_UNLIKELY",
"detectionConfidence" : 0.9996134,
"joyLikelihood" : "VERY_LIKELY",
"panAngle" : 18.178885,
"sorrowLikelihood" : "VERY_UNLIKELY",
"tiltAngle" : -12.244568,
"underExposedLikelihood" : "VERY_UNLIKELY",
"blurredLikelihood" : "VERY_UNLIKELY",
"landmarks" : [
{
"type" : "LEFT_EYE",
"position" : {
"x" : 268.25815,
"y" : 491.55255,
"z" : -0.0022390306
}
},
...
Face Detection
{
"type" : "RIGHT_EYE",
"position" : {
"x" : 418.42868,
"y" : 508.22632,
"z" : 49.302765
}
},
{
"type" : "MIDPOINT_BETWEEN_EYES",
"position" : {
"x" : 359.86551,
"y" : 500.2868,
"z" : -7.9241152
}
},
{
"type" : "NOSE_TIP",
"position" : {
"x" : 358.51404,
"y" : 611.80286,
"z" : -31.350466
}
},
...
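A face annotation like the one above can be read in two ways: the likelihood fields give a coarse emotion reading, and the landmark positions give geometry. Here is a minimal sketch in Python using a trimmed copy of the values from the slides; the helper names and the choice to treat LIKELY and VERY_LIKELY as positive are assumptions.

```python
import math

# Trimmed copy of the face annotation values shown in the slides.
face = {
    "joyLikelihood": "VERY_LIKELY",
    "sorrowLikelihood": "VERY_UNLIKELY",
    "angerLikelihood": "VERY_UNLIKELY",
    "surpriseLikelihood": "VERY_UNLIKELY",
    "landmarks": [
        {"type": "LEFT_EYE",
         "position": {"x": 268.25815, "y": 491.55255, "z": -0.0022390306}},
        {"type": "RIGHT_EYE",
         "position": {"x": 418.42868, "y": 508.22632, "z": 49.302765}},
    ],
}

EMOTION_FIELDS = ("joyLikelihood", "sorrowLikelihood", "angerLikelihood", "surpriseLikelihood")
POSITIVE = {"LIKELY", "VERY_LIKELY"}  # assumed threshold for this sketch


def likely_emotions(face):
    """Return the emotions whose likelihood is LIKELY or VERY_LIKELY."""
    suffix = "Likelihood"
    return [f[:-len(suffix)] for f in EMOTION_FIELDS if face.get(f) in POSITIVE]


def eye_distance(face):
    """Distance in pixels between the eyes in the image plane (x and y only)."""
    positions = {lm["type"]: lm["position"] for lm in face["landmarks"]}
    left, right = positions["LEFT_EYE"], positions["RIGHT_EYE"]
    return math.hypot(right["x"] - left["x"], right["y"] - left["y"])


print(likely_emotions(face), eye_distance(face))
```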
How about some meaning in those words?
Natural Language API
Three methods:
1. Analyze entities - Montreal is a city in Canada
2. Analyze sentiment - I love Montreal
3. Analyze syntax - Michelle Obama is married to Barack Obama
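The three methods above take the same kind of document and differ mainly in the endpoint. The sketch below, in Python, builds the URL and body for each without sending anything. It assumes the v1 REST paths (`documents:analyzeEntities`, `documents:analyzeSentiment`, `documents:analyzeSyntax`); the version used in the talk may differ, and the helper name is hypothetical.

```python
# Assumed v1 REST base path for the Natural Language API.
BASE_URL = "https://language.googleapis.com/v1/documents:"

# Slide item -> REST method name.
METHODS = {
    "entities": "analyzeEntities",
    "sentiment": "analyzeSentiment",
    "syntax": "analyzeSyntax",
}


def build_call(method, text):
    """Return (url, body) for one of the three analysis methods."""
    url = BASE_URL + METHODS[method]
    body = {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF8",
    }
    return url, body


# The example sentences from the slide.
calls = [
    build_call("entities", "Montreal is a city in Canada"),
    build_call("sentiment", "I love Montreal"),
    build_call("syntax", "Michelle Obama is married to Barack Obama"),
]
for url, body in calls:
    print(url, body["document"]["content"])
```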
https://cloud.google.com/nl
Free tears!
Free tiers!
● Vision API - 1,000 requests / month
● Speech API - 60 minutes / month
● Natural Language API - 5,000 units / month (1 unit = 1,000 Unicode characters)
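The Natural Language API quota is counted in units of 1,000 Unicode characters. The Python sketch below estimates how many units a set of documents would use against the 5,000 free units per month. It assumes each document is rounded up to a whole unit with a minimum of one; the slides do not state the rounding rule, so treat that as a guess.

```python
import math

FREE_UNITS_PER_MONTH = 5000   # from the free tier slide
CHARS_PER_UNIT = 1000         # 1 unit = 1,000 Unicode characters


def units_for(text):
    """Units for one document, assuming rounding up and a minimum of one unit."""
    return max(1, math.ceil(len(text) / CHARS_PER_UNIT))


def remaining_free_units(documents):
    """Free units left this month after analyzing the given documents."""
    used = sum(units_for(doc) for doc in documents)
    return FREE_UNITS_PER_MONTH - used


# Hypothetical documents of 12, 1,000, and 1,001 characters.
docs = ["I love Montreal"[:12], "a" * 1000, "b" * 1001]
print(remaining_free_units(docs))
```

Python's `len` counts Unicode code points, which matches the "unicode characters" wording on the slide more closely than a byte count would.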
Thank you!
@karthik_padman
Resources:
Speech API: cloud.google.com/speech
Vision API: cloud.google.com/vision
Natural Language API: cloud.google.com/nl
Thank you!
Karthik Padmanabhan, Developer Relations, Google Cloud Platform
@karthik_padman
Try BigQuery: bigquery.google.com
Cloud API
CloudML
Slides:
About you
● Game developers?
● Data people?
● Students?
● Not techies at all?