IoT at Google Scale

Date post: 15-Jan-2017
Upload: james-chittenden
IoT @ Google Scale
James Chittenden, Google Cloud Platform Solutions Engineer

+James Chittenden (Big Data Cloud Engineer)


Big Data at Google
aka Data at Google

Manage the Entire Lifecycle of Big Data

Capture: Cloud Logs, Google App Engine, Google Analytics Premium, Cloud Pub/Sub

Store: BigQuery Storage (tables), Cloud Bigtable (NoSQL), Cloud Storage (files), Cloud Datastore

Process: Cloud Dataflow (batch), Cloud Dataflow (stream)

Analyze: BigQuery Analytics (SQL), real-time analytics and alerts, Cloud Monitoring

End to End View of the GCP IoT Architecture

Device to Device Protocols

● Device discovery
● Device-to-device authentication
● Device configuration
● Protocol routing

Machine Learning: Pattern Detection and Prediction

● Subscribers scan real-time streams and feed data into the machine learning recognition algorithm

● Dataflow orchestrates streaming algorithms that compare data streams against the Experience Database

● Correlators detect known patterns and publish alerts using Cloud Pub/Sub

Cloud Storage Archival and Retrieval

● Data is periodically unloaded from Bigtable and stored in Cloud Storage for archival

● Data in Cloud Storage can be quickly reloaded into Bigtable should it need to be reprocessed

Cloud Pub/Sub
Real-time and reliable messaging with Pub/Sub

Messaging is a shock-absorber

Images by Connie Zhou

Availability
• Buffer new requests during outages
• Prevent overloads that cause outages
• Redirect requests to recover from outages

Throughput
• Smooth out spikes in new request rate
• Balance load across multiple workers
• Balance arrival rate with service rate

Latency
• Accept requests closer to the network edge
• Optimize message flow across regions
• Leverage shared efforts to improve protocols
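The "shock-absorber" idea can be illustrated with a toy simulation. The sketch below is not Pub/Sub code; it assumes a single worker with a fixed service rate and a made-up arrival pattern, and only shows how a queue lets a burst pile up as backlog instead of overloading the worker. The class and method names are hypothetical.

```java
/** Toy model (not Pub/Sub itself): a queue absorbs bursts so a fixed-rate worker is never overloaded. */
final class ShockAbsorberSketch {

    /**
     * Returns the largest backlog observed while draining all arrivals.
     * arrivalsPerTick[i] is the number of requests arriving in tick i;
     * serviceRate is how many requests the worker handles per tick.
     */
    static int peakBacklog(int[] arrivalsPerTick, int serviceRate) {
        int backlog = 0;
        int peak = 0;
        int tick = 0;
        // Keep ticking until every arrival has been enqueued and the queue is empty.
        while (tick < arrivalsPerTick.length || backlog > 0) {
            if (tick < arrivalsPerTick.length) {
                backlog += arrivalsPerTick[tick]; // the burst lands in the queue
            }
            peak = Math.max(peak, backlog);
            backlog -= Math.min(backlog, serviceRate); // worker never exceeds its rate
            tick++;
        }
        return peak;
    }

    public static void main(String[] args) {
        int[] arrivals = {2, 9, 1, 0}; // illustrative numbers only
        System.out.println("peak backlog = " + peakBacklog(arrivals, 4));
    }
}
```

The worker's load stays at or below its service rate in every tick; the spike shows up only as temporary queue depth, which drains once the arrival rate falls.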

Pub/Sub is a change-absorber

Sources
• New data sources can plug into old data flows
• New data sources can use new schemas
• Common security policies for all sources

Sinks
• Data can be sent to new destinations
• Push and Pull delivery are both available
• Spans organizational boundaries

Transforms
• Select subsets of messages that matter
• Helps manage schema and version changes
• Can merge streams into new topics

Pub/Sub at Google

Chat & Mobile: Every time your Gmail inbox pops up a new message, it’s because of a push notification to your browser or mobile device.

Ads & Budgets: One of the most important real-time information streams in the company is advertising revenue. We use Pub/Sub to broadcast budgets to our entire fleet of search engines.

Push Notifications: Google Cloud Messaging for Android delivers billions of messages a day, reliably and securely, for Google’s own mobile apps and the entire developer community.

Instant Search: Updating search results as you type is a feat of real-time indexing that depends on Pub/Sub to update caches with breaking news.

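[Diagram: a Publisher sends messages to a Topic in the Pub/Sub System, and each Subscription on the topic delivers to its own subscriber. Labels: HTTP Push Delivery (Webhook Delivery) to an HTTP Server Subscriber such as Google App Engine; Google RPC Delivery to a Pull Subscriber such as Cloud Dataflow. Subscribers can run on-prem or in the cloud, in any environment.]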

[Diagram: Push Subscription vs. Pull Subscription. In both, a Msg passes from the Pub/Sub System to the Subscriber, and the Subscriber returns an Ack. Labels: RPC Send, RPC Return.]
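To make the topic, subscription, and ack relationship concrete, here is a minimal in-memory sketch. It is a toy model, not the Cloud Pub/Sub client API: the `Topic` and `Subscription` classes are hypothetical, and redelivery is simplified so that every pull returns all messages that have not been acked yet (the real service redelivers after an acknowledgement deadline).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** In-memory toy model of a topic with pull subscriptions; not the Cloud Pub/Sub client API. */
final class PullAckSketch {

    /** Holds each delivered message until the subscriber acknowledges it. */
    static final class Subscription {
        private final Map<Integer, String> unacked = new LinkedHashMap<>();
        private int nextId = 0;

        void deliver(String msg) {
            unacked.put(nextId++, msg);
        }

        /** Returns a copy of all unacked messages, so anything not acked is seen again. */
        Map<Integer, String> pull() {
            return new LinkedHashMap<>(unacked);
        }

        void ack(int id) {
            unacked.remove(id);
        }
    }

    static final class Topic {
        private final List<Subscription> subscriptions = new ArrayList<>();

        Subscription subscribe() {
            Subscription s = new Subscription();
            subscriptions.add(s);
            return s;
        }

        /** Every subscription receives its own copy of each published message. */
        void publish(String msg) {
            for (Subscription s : subscriptions) {
                s.deliver(msg);
            }
        }
    }

    public static void main(String[] args) {
        Topic topic = new Topic();
        Subscription dataflow = topic.subscribe();
        Subscription archive = topic.subscribe();
        topic.publish("sensor-reading-1");
        topic.publish("sensor-reading-2");

        dataflow.ack(0); // only this subscription forgets message 0
        System.out.println("dataflow still owes acks for: " + dataflow.pull().values());
        System.out.println("archive still owes acks for: " + archive.pull().values());
    }
}
```

Because acks are tracked per subscription, one slow or failed subscriber does not affect what the others receive.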

“We don’t really run MapReduce at Google anymore” - Urs Hoelzle

Google Dataflow

Google Technologies

[Timeline, 2002 to 2014+: GFS, MapReduce, Bigtable, Dremel, Colossus, FlumeJava, Spanner, MillWheel, and more]

Dataflow Goodies

● Autoscaling mid-job
● Fully managed - No-Ops
● Intuitive Data Processing Framework
● Batch and Stream Processing in one
● Liquid sharding mid-job


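// Batch pipeline as shown on the slide (early Dataflow Java SDK syntax).
Pipeline p = Pipeline.create();
p.begin()
    .apply(TextIO.Read.from("gs://…"))        // read input lines from Cloud Storage
    .apply(ParDo.of(new ExtractTags()))       // per-element transform
    .apply(Count.create())                    // count occurrences
    .apply(ParDo.of(new ExpandPrefixes()))    // per-element transform
    .apply(Top.largestPerKey(3))              // keep the three largest values per key
    .apply(TextIO.Write.to("gs://…"));        // write results to Cloud Storage
p.run();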
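For readers without the SDK at hand, the shape of this pipeline can be sketched with plain Java collections. The slide does not show what `ExtractTags` or `ExpandPrefixes` do, so the sketch assumes the usual autocomplete reading: tags are `#`-prefixed tokens, and each counted tag is emitted under every one of its prefixes. All names below are hypothetical stand-ins, not Dataflow classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java stand-in for the slide's pipeline; the step semantics are assumptions. */
final class TopTagsSketch {

    /** Assumed ExtractTags: every whitespace-separated token that starts with '#'. */
    static List<String> extractTags(String line) {
        List<String> tags = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (token.length() > 1 && token.startsWith("#")) {
                tags.add(token.substring(1));
            }
        }
        return tags;
    }

    /** Returns, for each prefix, up to n tags ordered by descending count, then by name. */
    static Map<String, List<String>> topTagsPerPrefix(List<String> lines, int n) {
        // Count step: tag -> number of occurrences.
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            for (String tag : extractTags(line)) {
                counts.merge(tag, 1L, Long::sum);
            }
        }

        // Assumed ExpandPrefixes: emit each (tag, count) under every prefix of the tag.
        Map<String, List<Map.Entry<String, Long>>> byPrefix = new TreeMap<>();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            String tag = e.getKey();
            for (int i = 1; i <= tag.length(); i++) {
                byPrefix.computeIfAbsent(tag.substring(0, i), k -> new ArrayList<>()).add(e);
            }
        }

        // Top-n-per-key step.
        Comparator<Map.Entry<String, Long>> byCountDescThenTag = (a, b) -> {
            int c = Long.compare(b.getValue(), a.getValue());
            return c != 0 ? c : a.getKey().compareTo(b.getKey());
        };
        Map<String, List<String>> result = new TreeMap<>();
        for (Map.Entry<String, List<Map.Entry<String, Long>>> e : byPrefix.entrySet()) {
            List<Map.Entry<String, Long>> candidates = new ArrayList<>(e.getValue());
            candidates.sort(byCountDescThenTag);
            List<String> top = new ArrayList<>();
            for (int i = 0; i < Math.min(n, candidates.size()); i++) {
                top.add(candidates.get(i).getKey());
            }
            result.put(e.getKey(), top);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("#iot rocks #iot");
        lines.add("#io and #iot");
        lines.add("#ink");
        System.out.println(topTagsPerPrefix(lines, 3));
    }
}
```

Each loop here corresponds to one `.apply(...)` stage; the point of the managed service is that the same stages run in parallel across workers without the author writing that distribution logic.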


Fully managed - No-Ops: Deploy, Schedule & Monitor


[Figure: autoscaling mid-job as the request rate changes: 800 RPS, 1200 RPS, 5000 RPS, 50 RPS]


// Streaming variant as shown on the slide: the Pub/Sub source and sink replace
// the TextIO read and write, and a fixed five-minute window is added.
Pipeline p = Pipeline.create();
p.begin()
    .apply(PubsubIO.Read.from("input_topic"))
    .apply(Window.<Integer>by(FixedWindows.of(5, MINUTES)))
    .apply(ParDo.of(new ExtractTags()))
    .apply(Count.create())
    .apply(ParDo.of(new ExpandPrefixes()))
    .apply(Top.largestPerKey(3))
    .apply(PubsubIO.Write.to("output_topic"));
p.run();
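The fixed five-minute window is what lets an aggregation such as a count produce results on an unbounded stream. A minimal sketch of the underlying arithmetic, assuming non-overlapping windows aligned to the epoch and timestamps in milliseconds (the class and method names are hypothetical, not SDK calls):

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch of fixed (tumbling) window assignment; names are illustrative, not Dataflow SDK calls. */
final class FixedWindowSketch {

    static final long WINDOW_MILLIS = 5 * 60 * 1000L; // five minutes

    /** Start of the fixed window that contains the given event timestamp. */
    static long windowStart(long timestampMillis) {
        return timestampMillis - Math.floorMod(timestampMillis, WINDOW_MILLIS);
    }

    /** Counts events per window, keyed by window start time. */
    static Map<Long, Integer> countPerWindow(long[] timestampsMillis) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long ts : timestampsMillis) {
            counts.merge(windowStart(ts), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] events = {1_000L, 200_000L, 300_000L, 650_000L};
        System.out.println(countPerWindow(events));
    }
}
```

Every event lands in exactly one window, so a per-window count is final once that window closes, whereas a count over the whole unbounded stream would never complete.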


Unified Model


Pub/Sub + Dataflow + BigQuery Demo

Life of a Pipeline

[Diagram labels: Your Data; PubSub, Cloud Storage, BigTable; Dataflow; Fast ETL (Regex, JSON, UDFs); BigQuery; Spreadsheets, BI Tools, Coworkers, Applications + Reports]

Enterprise Big Data Architecture on Google

Why Dataflow?

● All the benefits of Hadoop-on-Google
● Plus a Fully-Managed Service
● Plus New, Intuitive Framework
● Plus Autoscaling and per-minute billing
● Plus True Stream Processing

Questions?

