Date posted: 20-Jan-2017 | Category: Technology | Uploaded by: edwin-poot
Analyzing petabytes of smart meter data using Cloud Bigtable, Cloud Dataflow, and BigQuery
Edwin Poot & Erik van Wijk, Energyworx
Max Luebbe, Google
● rise of renewable energy sources
● regulation & market demands
● competition & increased costs
● intelligent devices in the home or along the utilities infrastructure (“Internet of Things”)
● two-way flow of information instead of one-way
● increasing consumption
Top 5 industry challenges
1. increasing density brings increasing data quality problems
2. strict regulations for safeguarding user privacy
3. redistribution of economic power and energy demand
4. rising competition between distributed and central generation
5. innovation outpaces regulation
www.energyworx.com
Smart meter rollouts by region:
● China: 435 M
● USA: 132 M
● Japan: 58.7 M
● France: 35 M
● UK: 53 M
● Netherlands: 8 M
● Italy: 32 M
● Germany: 50 M
● Ontario: 4.7 M
● British Columbia: 1.2 M
● Quebec: 3.8 M
Conventional utility systems cannot cope with this diversity and the endless stream of data of all types, shapes and sizes:
● smart meters
● smart grid equipment
● sensors
● home automation
● multichannel customer interactions
● consumers’ usage behavior
● weather
● social
● spatial
Creating a single, centralized view of data, accessible to many and for many use cases, is the key to success.
“We enable the energy evolution by uncovering and monetizing the hidden value of your data!”
ingest, process, analyze & learn
Enabling data-driven business models for the Energy & Utility industry since 2012:
● Offices in the Netherlands and the United States
● Delivering a revolutionary data management & intelligence cloud service disrupting the global Energy & Utilities market
● Pushing out established vendors using pure-play SaaS
● Creating actionable information, sparking new business concepts and models
● Crunching data without being limited by scale, speed and obsolete pricing models
energyworx and the energy value chain (generation → transmission → trading → distribution → supply):
● Meter Data Management
● Renewable Energy Management
● Social Energy / Consumer Engagement
● imbalances & settlements
● Energy insights for wholesale connections
MARKETS & SOLUTIONS (serving energy prosumers & retailers and energy system operators)
● Energy intelligence: Demand Response (price), Demand Response (load), Energy Insights, Grid Insights, Renewables Engagement, Gamification, Benchmarking, Balancing, Congestion, Optimization, Anomalies
● Energy data management: Meter Data Management, Energy Data Hub
OUR ADVANTAGES
● Always supporting the latest IoT products and/or equipment
● Protocol-agnostic data ingestion and limitless computation capacity
● Cloud machine learning to support new business concepts and models
● Pay-as-you-grow SaaS model, so no large upfront investments
PLATFORM EVOLUTION HIGHLIGHTS
2012 → 2016:
● batched data, temporal aggregations, VEE, utility connectivity, API
● multi-tenancy, permissions, custom querying, grouping, tag properties
● datalabs (EDA), machine learning, CloudML, (A)DR
● streaming data, pseudonymisation, tagging, analytics, dynamic profiling, pay-per-use model
● IoT devices, many new adapters, performance, web console, Sheets add-on
Data ingestion & management → Insights & analysis → Intelligence & IoT control
DELIVER A DATA MANAGEMENT & ANALYTICS SERVICE FOR ENERGY & UTILITY COMPANIES
PUBLIC & PRIVATE CLOUD
Big Data technologies invented at Google
2002 → 2013: GFS · MapReduce · Bigtable · Colossus · Dremel · Flume · Millwheel
… build a 100TB+ filesystem?
Need: Google was building enormous data sets, and needed an abstracted way to store and access at scale.
Solution: GFS (replaced by higher-scale Colossus in 2010)
Google Cloud Storage
… build a petabyte database?
Need: Massive data index files took weeks to rebuild. We needed random read/write access.
Solution: Bigtable (internal service launched 2006)
Google Cloud Bigtable
… query a trillion rows in seconds?
Need: Ad hoc queries over massive quantities of data, in just seconds.
Solution: Dremel
Google BigQuery
… build data-processing at Google scale?
Need: Process petabytes of static and streaming data, quickly.
Solution: MapReduce, Flume, and Millwheel
Google Cloud Dataflow
Google Cloud Platform is built on the same infrastructure:
Cloud Storage · BigQuery · Cloud Dataflow · Cloud Bigtable
What is Cloud Bigtable?
● A NoSQL database for large datasets and high throughput
● Supports sequential scans
● Auto-adjusts to access patterns
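The deck doesn't show a schema, but a common Bigtable pattern for meter time series is to encode a shard prefix, the meter ID, and a time bucket in the row key, so that per-meter scans stay sequential while writes spread across tablets. A minimal sketch; the key layout, separator, and hash below are illustrative assumptions, not Energyworx's actual schema:

```python
from datetime import datetime, timezone

def meter_row_key(meter_id: str, ts: datetime, shards: int = 16) -> bytes:
    """Build a Bigtable row key: shard prefix + meter id + hourly bucket.

    The shard prefix (a stable hash of the meter id) spreads writes across
    tablets to avoid hotspotting; keys for one meter still sort by time,
    so a prefix scan returns its readings in order.
    """
    shard = sum(meter_id.encode()) % shards  # stable, illustrative hash
    bucket = ts.astimezone(timezone.utc).strftime("%Y%m%d%H")
    return f"{shard:02d}#{meter_id}#{bucket}".encode()

key = meter_row_key("meter-0042", datetime(2016, 11, 3, 14, 30, tzinfo=timezone.utc))
```

The resulting keys would be passed to a Bigtable client's mutation API; the schema shape, not the client calls, is the point of the sketch.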
How does Cloud Bigtable work?
Clients talk to a pool of Bigtable nodes (the processing layer), and the nodes persist data on the Colossus filesystem (the storage layer).
Cloud Bigtable learns access patterns and rebalances data accordingly: hot tablets (A–E in the diagrams) are split and reassigned across nodes, and since the data itself lives in the shared filesystem, rebalancing moves pointers rather than data.
Throughput can be controlled by node count: the charts show near-linear scaling at roughly 10,000 QPS per node (about 80,000 QPS with 8 nodes, 400,000 with 40, and 4,000,000 with 400).
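Given that roughly linear scaling, cluster sizing becomes simple arithmetic. A back-of-the-envelope helper; the 70% headroom default is my own rule of thumb, not a figure from the slides:

```python
import math

QPS_PER_NODE = 10_000  # approximate linear-scaling figure from the charts

def nodes_for(target_qps: int, headroom: float = 0.7) -> int:
    """Bigtable nodes needed for target_qps, keeping utilization below `headroom`."""
    return math.ceil(target_qps / (QPS_PER_NODE * headroom))
```

At full utilization this reproduces the chart figures: `nodes_for(400_000, headroom=1.0)` gives 40.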
Years of engineering went into Cloud Bigtable to:
● teach Bigtable to configure itself
● isolate performance from “noisy neighbors”
● react automatically to new patterns, splitting and balancing
Google has had an internal cloud for over a decade. The same engineering that has made our internal services better makes our Cloud better: simpler control planes, multi-tenancy, and adaptation to large, new patterns.
Why did we choose Google Cloud Platform?
● Fastest with consistent performance
● Competitive and transparent pricing
● Autoscale to millions of users (and back)
● Unlimited flexible storage and caching
● Big Data & Machine Learning capabilities
● Development SDK & tools
● 24/7 access to expert support resources
5 things we’ve learned along the way:
1. Skills, knowledge & training required: understand all PaaS possibilities and components, to prevent reinventing what already exists and to speed up implementation & migration.
2. Implementation time: shorter release cycles require smaller feature sets per release; adapt your software development & release management method.
3. Code abstraction using APIs: to be cloud agnostic you need a code abstraction layer per PaaS service you use.
4. PaaS sandbox: design and modify your software architecture to fit the PaaS sandbox.
5. Impact on business model: adapt your business model to the PaaS cost model.
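The abstraction-layer point can be sketched as a thin port/adapter pattern: application code depends only on a small interface, and each PaaS service gets its own adapter behind it. The interface and in-memory adapter below are illustrative, not Energyworx code; a Cloud Pub/Sub adapter would implement the same interface using the google-cloud-pubsub client.

```python
from abc import ABC, abstractmethod
from collections import defaultdict

class MessageBus(ABC):
    """Abstraction layer: application code depends only on this interface."""
    @abstractmethod
    def publish(self, topic: str, payload: bytes) -> None: ...
    @abstractmethod
    def pull(self, topic: str, max_messages: int = 10) -> list[bytes]: ...

class InMemoryBus(MessageBus):
    """Local/test adapter; a Pub/Sub adapter would satisfy the same contract."""
    def __init__(self) -> None:
        self._queues: dict[str, list[bytes]] = defaultdict(list)

    def publish(self, topic: str, payload: bytes) -> None:
        self._queues[topic].append(payload)

    def pull(self, topic: str, max_messages: int = 10) -> list[bytes]:
        msgs, self._queues[topic] = (
            self._queues[topic][:max_messages],
            self._queues[topic][max_messages:],
        )
        return msgs
```

Swapping PaaS providers then means writing one new adapter per service, not rewriting the application.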
INGEST → PROCESS → ANALYZE, with STORE underpinning each stage:
● Ingest: Cloud Pub/Sub, App Engine (API, events, devices)
● Store: Cloud Storage, Datastore, Bigtable, BigQuery, Cloud SQL (timeseries, metadata, tags)
● Process: Dataflow, Dataproc, App Engine (validate, aggregate, calculate)
● Analyze: BigQuery, CloudML, Datalab (insights, predict, decide)
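The validate and aggregate steps of a pipeline like this run as Dataflow (Apache Beam) transforms; their core logic can be sketched as plain functions. The reading layout and the 0–100 kWh plausibility range are illustrative assumptions, not Energyworx parameters:

```python
from statistics import mean

# A reading is (meter_id, hour, kwh) -- an assumed layout for illustration.
def validate(readings):
    """VEE-style range check: drop physically implausible readings."""
    return [r for r in readings if 0.0 <= r[2] < 100.0]

def aggregate(readings):
    """Mean kWh per meter over the validated readings."""
    by_meter = {}
    for meter_id, _hour, kwh in readings:
        by_meter.setdefault(meter_id, []).append(kwh)
    return {m: mean(vals) for m, vals in by_meter.items()}
```

In a real pipeline each function would become a transform (e.g. a filter and a grouped aggregation) so the same logic runs over both batched and streaming data.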
“Creating actionable insights - sparking new business
concepts and models. Crunching data without being
limited by scale, speed and obsolete pricing models.”
Exploratory Data Analysis with Energyworx: uncover hidden value from your data!
Techniques: classification, clustering, regression, anomaly detection, prediction/forecasting, motif discovery, association rules.
Features:
● part of Energyworx SaaS
● autoscaling with demand
● notebook development environment
● private & public models
● Energyworx shared models
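Of the techniques listed, anomaly detection has the simplest classical baseline: flag readings far from the mean in standard-deviation units. A minimal z-score sketch; the 3-sigma threshold is a conventional default, not an Energyworx parameter:

```python
from statistics import mean, stdev

def zscore_anomalies(series, threshold=3.0):
    """Indices of readings more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(series), stdev(series)
    if sigma == 0:
        return []  # constant series: nothing can be anomalous
    return [i for i, x in enumerate(series) if abs(x - mu) / sigma > threshold]
```

A consumption spike in an otherwise flat series is flagged by its index; production systems would use seasonal baselines rather than a global mean, but the idea is the same.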