BIG DATA IN THE GOOGLE CLOUD: BIGQUERY, APACHE BEAM, DATAFLOW
2017.06.12.
Kassai Csaba - Lead Data Architect
Farkas Péter - Data Engineer
Agenda
● Google Cloud Storage
● Google BigQuery
● Apache Beam
● Google Cloud Pub/Sub
● Google Cloud Dataflow
● Case studies
GCP
Cloud Storage
Google Cloud Storage
Consistency
When a write succeeds, the latest copy of the object is guaranteed to be returned to any GET, globally. This applies to PUTs of new or overwritten objects and to DELETEs.
Object Lifecycle Management
Actions
● delete live/archived objects
● “downgrade” storage class
Conditions
● age
● create time
● live/archive
● # newer versions
● storage class
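As a rough illustration of how an action and its conditions fit together, the sketch below models lifecycle rules in plain Java. It does not call the Cloud Storage API: `LifecycleSketch`, `Rule`, `StoredObject`, the day thresholds and the storage class strings are hypothetical, and it assumes that a rule applies only when every condition set on it holds.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of lifecycle rules; this is not the Cloud Storage API.
class LifecycleSketch {
    enum Action { DELETE, DOWNGRADE_STORAGE_CLASS }

    // The object attributes that the conditions below look at.
    static final class StoredObject {
        final int ageDays;
        final boolean live;
        final String storageClass;

        StoredObject(int ageDays, boolean live, String storageClass) {
            this.ageDays = ageDays;
            this.live = live;
            this.storageClass = storageClass;
        }
    }

    // One action plus its conditions; a null condition means "not set".
    static final class Rule {
        final Action action;
        final Integer minAgeDays;
        final Boolean isLive;
        final String storageClass;

        Rule(Action action, Integer minAgeDays, Boolean isLive, String storageClass) {
            this.action = action;
            this.minAgeDays = minAgeDays;
            this.isLive = isLive;
            this.storageClass = storageClass;
        }

        // Assumption: every condition that is set must hold for the rule to match.
        boolean matches(StoredObject o) {
            return (minAgeDays == null || o.ageDays >= minAgeDays)
                && (isLive == null || o.live == isLive)
                && (storageClass == null || storageClass.equals(o.storageClass));
        }
    }

    // Collects the actions of all rules that match the object.
    static List<Action> matchingActions(List<Rule> rules, StoredObject o) {
        List<Action> actions = new ArrayList<>();
        for (Rule rule : rules) {
            if (rule.matches(o)) {
                actions.add(rule.action);
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        List<Rule> rules = new ArrayList<>();
        // Delete archived (non-live) objects older than a year.
        rules.add(new Rule(Action.DELETE, 365, Boolean.FALSE, null));
        // Downgrade live STANDARD objects older than 30 days.
        rules.add(new Rule(Action.DOWNGRADE_STORAGE_CLASS, 30, Boolean.TRUE, "STANDARD"));

        System.out.println(matchingActions(rules, new StoredObject(45, true, "STANDARD")));
        System.out.println(matchingActions(rules, new StoredObject(400, false, "NEARLINE")));
    }
}
```

The sketch only reports which rules match; how the real service orders or prioritizes several matching rules is left out on purpose.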
Pricing
● Storage
● Data retrieval (Nearline, Coldline)
● Network
● Operations
Quickstart
https://cloud.google.com/storage/docs/quickstart-console
https://cloud.google.com/storage/docs/quickstart-gsutil
Google BigQuery
A fast, economical and fully-managed enterprise data warehouse for large-scale data analytics
enterprise data warehouse
fast & large-scale
fully-managed
economical
Dremel
Structure
[Diagram: BigQuery with separate Storage and Compute layers linked by a petabit network. Data arrives through Streaming Ingest and Fast Batch Load and is read through SQL queries.]
Columnar storage
[Figure: a table stored as separate columns c1 to c5 and split by day (20160101 to 20160105). Size labels in the figure: 60 GB, 125 GB, 80 GB, 45 GB, 99 GB.]
(Almost) append-only
● Data Manipulation Language (DML) is available, but with a lot of constraints
○ No required fields
○ Streaming buffer must be empty
○ Partitioned tables are not supported
○ No multi-statement transactions
○ Limited concurrency
● Use it as an append-only database when possible
A BRIEF INTRODUCTION TO BIG QUERY
Structure / Dataset
PROJECT > DATASETS
Datasets:
● Contain a collection of tables and views
● Access control is applied to all tables/views in the dataset
● ACLs for Readers, Writers and Owners
● Access can be granted to datasets for users who are not members of the project
Structure / Table
PROJECT > DATASETS > TABLES
Tables:
● Data stored in managed storage
● Collection of columns and rows
● Have a schema, which describes strongly-typed columns of values
Views:
● Views are supported
● Virtual tables defined by SQL query
Structure / Job
[Diagram: PROJECT, DATASETS, JOBS, TABLES]
Jobs:
● Used to start all potentially long-running actions
● Examples: queries, importing / exporting data, copying data
● Can be cancelled
Schema - Types
● INT64, FLOAT64, STRING, BOOL, BYTES
● DATE, DATETIME, TIME, TIMESTAMP
● ARRAY: an ordered list of zero or more elements of non-ARRAY values
● STRUCT: a container of ordered fields, each with a type (required) and a field name (optional)
Query results
TEMPORARY TABLES
● Used by caching
● Free storage
● Limited lifetime
USER-DEFINED TABLES
● Permanent
● Billed
Pricing
STORAGE
● Priced per GB per month
● Prorated per MB, per second
● Discount after 90 days
● 10 GB per month is free
QUERIES
● Priced by the amount of data processed by the query
● First 1 TB per month is free
● Cached results are free
● Queries that end in an error are free
STREAMING INSERT
● Insert row by row via the REST API
● Priced per GB
Interfaces
● Web UI
● CLI
● RESTful API
BQ basic exercises
https://cloud.google.com/bigquery/quickstart-web-ui
goo.gl/jxU7a5
BQ as ETL tool
● Source: daily snapshots of the source table as CSV
● Target: dimension table with Type-2 history in BQ
BQ as ETL tool - Source
Schema
● STORENO: unique id of the store
● STORENAME: name of the store
● CHAIN: name of the chain the store belongs to. Can be null.
● STORETYPE: type of the store, INTERNAL or EXTERNAL. Only INTERNAL stores should be imported into BQ.
● BATCHDATE: the date when the snapshot was created
Location: https://console.cloud.google.com/storage/browser/bdf-bigquery-demo/storedata/
Separator: ‘;’
BQ as ETL tool - Target
BQ - Schema
● code: unique id of the store
● name: name of the store
● chain: name of the chain the store belongs to. Can be null.
● valid_from: Type-2 history
● valid_to: Type-2 history
BQ as ETL tool - Solution
Data import:
    bq load --autodetect --field_delimiter=';' --replace {project_id}:bdf_demo.store_raw gs://bdf-bigquery-demo/storedata/*
Query for data transformation:
    https://bigquery.cloud.google.com/savedquery/862243936433:2283d8e8c4e942e1ae5ed8f7ed3d1cbd
View for proper Type-2 history:
    SELECT * EXCEPT(deleted)
    FROM (
      SELECT *,
             LEAD(valid_from) OVER (PARTITION BY code ORDER BY valid_from) AS valid_to
      FROM `{project_id}.bdf_demo.store`
    )
    WHERE deleted = FALSE
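To make the view's logic easier to follow, here is a minimal plain-Java sketch of what `LEAD(valid_from) OVER (PARTITION BY code ORDER BY valid_from)` followed by the `deleted = FALSE` filter does. It is not BigQuery code: the class `Type2HistorySketch`, its `Version` type and the sample rows are hypothetical, and dates are kept as ISO strings so that string order equals date order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory imitation of the Type-2 history view.
class Type2HistorySketch {
    static final class Version {
        final String code;
        final String name;
        final String validFrom; // ISO date, so string order equals date order
        final boolean deleted;
        final String validTo;   // null means "still valid"

        Version(String code, String name, String validFrom, boolean deleted, String validTo) {
            this.code = code;
            this.name = name;
            this.validFrom = validFrom;
            this.deleted = deleted;
            this.validTo = validTo;
        }
    }

    static List<Version> type2View(List<Version> rows) {
        // PARTITION BY code
        Map<String, List<Version>> byCode = new TreeMap<>();
        for (Version v : rows) {
            byCode.computeIfAbsent(v.code, k -> new ArrayList<>()).add(v);
        }
        List<Version> result = new ArrayList<>();
        for (List<Version> versions : byCode.values()) {
            // ORDER BY valid_from
            versions.sort(Comparator.comparing((Version v) -> v.validFrom));
            for (int i = 0; i < versions.size(); i++) {
                Version v = versions.get(i);
                // LEAD(valid_from): the next version's start closes this version.
                String validTo = i + 1 < versions.size() ? versions.get(i + 1).validFrom : null;
                // WHERE deleted = FALSE is applied after LEAD, so a deletion marker
                // still closes the version before it.
                if (!v.deleted) {
                    result.add(new Version(v.code, v.name, v.validFrom, false, validTo));
                }
            }
        }
        return result;
    }

    // Made-up rows: one store renamed once and then deleted.
    static List<Version> sampleRows() {
        List<Version> rows = new ArrayList<>();
        rows.add(new Version("S1", "Alpha", "2017-01-01", false, null));
        rows.add(new Version("S1", "Alpha Plus", "2017-02-01", false, null));
        rows.add(new Version("S1", "Alpha Plus", "2017-03-01", true, null));
        return rows;
    }

    public static void main(String[] args) {
        for (Version v : type2View(sampleRows())) {
            System.out.println(v.code + " " + v.name + " " + v.validFrom + " -> " + v.validTo);
        }
    }
}
```

The point of the ordering is that the filter runs on the outer query: the deleted row never appears in the result, but its `valid_from` still becomes the `valid_to` of the last live version.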
PROGRAMMING MODEL RUNNERS
APACHE BEAM MODEL
Processing- vs event-time
Source: The world beyond batch: Streaming 102 (Tyler Akidau)
Watermark
A watermark with a value of time X makes the statement: “all input data with event times less than X have been observed.” As such, watermarks act as a metric of progress when observing an unbounded data source with no known end.
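The statement above can be read as two simple comparisons, sketched below in plain Java. This is not the Beam API: `WatermarkSketch` and its method names are hypothetical, and the date in the timestamps is made up for the example.

```java
import java.time.Instant;

// Hypothetical helper that spells out what a watermark value implies.
class WatermarkSketch {
    // An element whose event time is earlier than the current watermark arrives
    // after the watermark claimed that such data had all been observed.
    static boolean isBehindWatermark(Instant eventTime, Instant watermark) {
        return eventTime.isBefore(watermark);
    }

    // A window [start, end) can be treated as complete once the watermark
    // has reached the end of the window.
    static boolean windowComplete(Instant windowEnd, Instant watermark) {
        return !watermark.isBefore(windowEnd);
    }

    public static void main(String[] args) {
        Instant watermark = Instant.parse("2017-06-12T00:00:00Z");
        Instant windowEnd = Instant.parse("2017-06-12T00:00:00Z");
        Instant straggler = Instant.parse("2017-06-11T23:59:30Z");

        System.out.println(windowComplete(windowEnd, watermark));
        System.out.println(isBehindWatermark(straggler, watermark));
    }
}
```

In practice a watermark is often an estimate, which is why elements behind it can still show up and need a policy of their own.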
Watermark
Source: The world beyond batch: Streaming 102 (Tyler Akidau)
Pipeline structure
[Diagram: Source → PCollection → PTransform → PCollection → Sink]
Concepts
● What results are being computed?
● Where in event time are they being computed?
● When in processing time are they materialized?
● How do earlier results relate to later refinements?
What are you computing?
PTransform types: element-wise, aggregation, composite
    PCollection<Integer> salesRecords = ...;
    // Sum all sales records into a single total.
    PCollection<Integer> totalSales = salesRecords
        .apply(Sum.integersGlobally());
What are you computing?
Source: The world beyond batch: Streaming 102 (Tyler Akidau)
Where in event time?
Windowing strategies:
● Fixed
● Sliding
● Per-Session
[Figure: for each strategy, the numbered windows assigned to Key 1, Key 2 and Key 3]
    PCollection<Integer> salesRecords = ...;
    // Sum sales per fixed two-minute event-time window.
    PCollection<Integer> totalSales = salesRecords
        .apply(Window.<Integer>into(FixedWindows.of(Duration.standardMinutes(2))))
        .apply(Sum.integersGlobally());
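The arithmetic behind fixed and sliding windows can be sketched without the Beam SDK. The plain-Java example below is an illustration only: `WindowingSketch` and its methods are hypothetical, timestamps are milliseconds, and windows are assumed to be aligned to the epoch with no offset.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing how a timestamp maps to fixed and sliding windows.
class WindowingSketch {
    // Start of the fixed window [start, start + size) that contains the timestamp.
    static long fixedWindowStart(long timestampMillis, long sizeMillis) {
        return timestampMillis - Math.floorMod(timestampMillis, sizeMillis);
    }

    // Starts of all sliding windows of the given size, opened every periodMillis,
    // that contain the timestamp (latest window first).
    static List<Long> slidingWindowStarts(long timestampMillis, long sizeMillis, long periodMillis) {
        List<Long> starts = new ArrayList<>();
        long lastStart = timestampMillis - Math.floorMod(timestampMillis, periodMillis);
        for (long start = lastStart; start > timestampMillis - sizeMillis; start -= periodMillis) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        long timestamp = 150_000L;  // 2 min 30 s
        long twoMinutes = 120_000L;
        long oneMinute = 60_000L;

        // One fixed two-minute window contains the element.
        System.out.println(fixedWindowStart(timestamp, twoMinutes));
        // Two sliding windows (size 2 min, period 1 min) contain the same element.
        System.out.println(slidingWindowStarts(timestamp, twoMinutes, oneMinute));
    }
}
```

With fixed windows each element lands in exactly one window, while with sliding windows it lands in size / period windows. Session windows depend on the gaps between elements of the same key and are not covered by this sketch.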
Where in event time?
Source: The world beyond batch: Streaming 102 (Tyler Akidau)
When in processing time? Triggers
Trigger types: time-based, data-driven, composite
    PCollection<Integer> salesRecords = ...;
    PCollection<Integer> totalSales = salesRecords
        .apply(Window.<Integer>into(FixedWindows.of(Duration.standardMinutes(1)))
            // Fire at the watermark, early once per minute, and again for each late element.
            .triggering(AfterWatermark.pastEndOfWindow()
                .withEarlyFirings(AfterProcessingTime.pastFirstElementInPane()
                    .plusDelayOf(Duration.standardMinutes(1)))
                .withLateFirings(AfterPane.elementCountAtLeast(1))))
        .apply(Sum.integersGlobally());
When in processing time?
Source: The world beyond batch: Streaming 102 (Tyler Akidau)
How refinements relate?
Firing          Elements  Discarding  Accumulating  Accumulating & Retracting
Early           3, 4      7           7             7
Watermark       2, 6      8           15            15, -7
Late            3         3           18            18, -15
Total observed  18        18          40            18
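The three accumulation modes can be reproduced with a few lines of plain Java. The sketch below is not Beam code; `AccumulationSketch` and its methods are hypothetical names, and the panes are the three firings from the table (3, 4 early; 2, 6 at the watermark; 3 late).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of how successive pane results relate in each mode.
class AccumulationSketch {
    static int sum(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Discarding: each firing reports only the elements seen since the last firing.
    static List<Integer> discarding(int[][] panes) {
        List<Integer> out = new ArrayList<>();
        for (int[] pane : panes) {
            out.add(sum(pane));
        }
        return out;
    }

    // Accumulating: each firing reports the running total for the window.
    static List<Integer> accumulating(int[][] panes) {
        List<Integer> out = new ArrayList<>();
        int running = 0;
        for (int[] pane : panes) {
            running += sum(pane);
            out.add(running);
        }
        return out;
    }

    // Accumulating & retracting: the running total plus a retraction of the previous one.
    static List<Integer> accumulatingAndRetracting(int[][] panes) {
        List<Integer> out = new ArrayList<>();
        int running = 0;
        boolean first = true;
        for (int[] pane : panes) {
            int previous = running;
            running += sum(pane);
            out.add(running);
            if (!first) {
                out.add(-previous);
            }
            first = false;
        }
        return out;
    }

    // What a consumer ends up with if it naively adds up everything it receives.
    static int totalObserved(List<Integer> outputs) {
        int total = 0;
        for (int v : outputs) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        int[][] panes = { {3, 4}, {2, 6}, {3} };
        System.out.println(discarding(panes) + " total " + totalObserved(discarding(panes)));
        System.out.println(accumulating(panes) + " total " + totalObserved(accumulating(panes)));
        System.out.println(accumulatingAndRetracting(panes)
            + " total " + totalObserved(accumulatingAndRetracting(panes)));
    }
}
```

Summing everything received gives 18 in discarding mode, 40 in accumulating mode and 18 again with retractions, which matches the "Total observed" row and shows why a downstream sum over accumulating panes overcounts.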
What Where When How
    PCollection<Integer> salesRecords = ...;
    PCollection<Integer> totalSales = salesRecords
        .apply(Window.<Integer>into(FixedWindows.of(Duration.standardMinutes(1)))
            // Fire one minute (processing time) after the first element of each pane.
            .triggering(AfterProcessingTime
                .pastFirstElementInPane()
                .plusDelayOf(Duration.standardMinutes(1)))
            // Each firing reports only the elements that arrived since the previous one.
            .discardingFiredPanes())
        .apply(Sum.integersGlobally());
Live demo - Events
[Figure: ten events with values 5, 7, 3, 2, 6, 4, 7, 1, 9, 8, shown on two time axes labeled 00:01 to 00:03 and 23:59 to 00:03; sums shown in the figure: 10, 27, 15]
Live demo - Pipeline 1
    pipeline.apply(HumanIO.read()).setCoder(StickyNotesCoder.of())
        .apply(Window.<Integer>into(FixedWindows.of(Duration.standardMinutes(1)))
            // Emit one result per window, when the watermark passes its end.
            .triggering(AfterWatermark.pastEndOfWindow())
            .accumulatingFiredPanes())
        .apply(Sum.integersGlobally())
        .apply(FlipChartIO.write());
Live demo - Pipeline 1: processing time and watermark at each step
time      watermark
00:00:02  23:59:00
00:00:17  23:59:00
00:00:21  23:59:00
00:00:27  00:00:00
00:00:48  00:00:00
00:00:50  00:00:00
00:00:58  00:00:00
00:01:02  00:00:00
00:01:12  00:00:00
00:01:16  00:01:00
00:01:48  00:01:00
00:02:01  00:01:00
00:02:22  00:02:00
Live demo - Pipeline 2
    pipeline.apply(HumanIO.read()).setCoder(StickyNotesCoder.of())
        .apply(Window.<Integer>into(FixedWindows.of(Duration.standardMinutes(1)))
            // Early results 30 seconds after the first element, then at the watermark,
            // then once more for every late element.
            .triggering(AfterWatermark.pastEndOfWindow()
                .withEarlyFirings(AfterProcessingTime.pastFirstElementInPane()
                    .plusDelayOf(Duration.standardSeconds(30)))
                .withLateFirings(AfterPane.elementCountAtLeast(1)))
            .accumulatingFiredPanes())
        .apply(Sum.integersGlobally())
        .apply(FlipChartIO.write());
Live demo - Pipeline 2: processing time and watermark at each step
time      watermark
00:00:02  23:59:00
00:00:17  23:59:00
00:00:21  23:59:00
00:00:27  00:00:00
00:00:47  00:00:00
00:00:48  00:00:00
00:00:50  00:00:00
00:00:58  00:00:00
00:01:02  00:00:00
00:01:12  00:00:00
00:01:16  00:01:00
00:01:32  00:01:00
00:01:48  00:01:00
00:02:01  00:01:00
00:02:18  00:01:00
00:02:22  00:02:00
Live demo - Pipeline 3
    pipeline.apply(HumanIO.read()).setCoder(StickyNotesCoder.of())
        .apply(Window.<Integer>into(FixedWindows.of(Duration.standardMinutes(1)))
            .triggering(AfterWatermark.pastEndOfWindow()
                .withEarlyFirings(AfterProcessingTime.pastFirstElementInPane()
                    .plusDelayOf(Duration.standardSeconds(30)))
                .withLateFirings(AfterPane.elementCountAtLeast(1)))
            // Same triggers as Pipeline 2, but each pane reports only its new elements.
            .discardingFiredPanes())
        .apply(Sum.integersGlobally())
        .apply(FlipChartIO.write());
Events
Google Cloud Pub/Sub
● Messaging - many-to-many topology
● Topic - subscription model
● No-ops
● At-least-once delivery
● REST API
● Scalable: 10,000 messages/sec by default
Intro
https://cloud.google.com/pubsub/docs/quickstart-console
https://cloud.google.com/pubsub/docs/quickstart-cli
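Because delivery is at-least-once, a subscriber may see the same message more than once and should handle it idempotently. The plain-Java sketch below shows one common way to do that, by remembering message IDs. It does not use the Pub/Sub client library: `DedupSketch`, `handle` and the message IDs are hypothetical, and the in-memory set is an assumption that would need an expiry policy or a durable store in a real subscriber.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical subscriber-side handler that ignores redelivered messages.
class DedupSketch {
    private final Set<String> seenIds = new HashSet<>();
    private int total = 0;

    // Returns true if the message was processed, false if it was a duplicate.
    boolean handle(String messageId, int amount) {
        if (!seenIds.add(messageId)) {
            return false; // already processed, skip the redelivery
        }
        total += amount;
        return true;
    }

    int total() {
        return total;
    }

    public static void main(String[] args) {
        DedupSketch handler = new DedupSketch();
        handler.handle("m1", 5);
        handler.handle("m2", 7);
        handler.handle("m1", 5); // redelivery of m1
        System.out.println(handler.total());
    }
}
```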
CLOUD DATAFLOW
The Cloud Dataflow runner
● Fully managed, no-ops execution environment
● Seamless integration with other GCP services
● Autoscaling
Fully managed
● Dynamic work rebalancing
● Graph optimization
● Worker lifecycle management
Monitoring interface
Logging
More to read - Beam/Dataflow
https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43864.pdf
https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102
https://www.youtube.com/watch?v=3UfZN59Nsk8
https://cloud.google.com/dataflow/
http://beam.incubator.apache.org/
https://cloud.google.com/dataflow/blog/dataflow-beam-and-spark-comparison
USE CASE
Retail BI system
Processing and storing sales transactions in real time, in order to support:
● Performance metrics
● Demand prediction
● Logistics optimization
● Collecting and selling insights
Architecture