Streaming Patterns, Revolutionary Architectures with the Kafka API

Post on 15-Apr-2017


© 2016 MapR Technologies

Streaming Patterns, Revolutionary Architectures

Carol McDonald


Agenda

Streams Core Components
•  Topics, Partitions
•  Fault Tolerance
•  High Availability

Patterns
•  Event Sourcing
•  Duality of Streams and Databases
•  Command Query Responsibility Separation
•  Polyglot Persistence, Multiple Materialized Views
•  Turning the Database Upside Down

Real World Examples
•  Fraud Detection
•  Healthcare Exchange


Which products are we discussing?


Streams Core Components


What’s a Stream?

Producers Consumers Events_Stream

A stream is an unbounded sequence of events carried from a set of producers to a set of consumers.

Events


What is Streaming Data? Got Some Examples?

Data Collection Devices

Smart Machinery Phones and Tablets Home Automation

RFID Systems Digital Signage Security Systems Medical Devices


Why Streams?

Trigger Events:
•  Stock Prices
•  User Activity
•  Sensor Data

Topic

Many Big Data sources are Event Oriented

Stream Stream Stream

Event Data

Topic Topic

Real-Time Analytics


Analyze Data

What if you need to analyze data as it arrives?


It was hot at 6:05 yesterday!

Batch Processing with HDFS

Analyze

6:01 P.M.: 72°
6:02 P.M.: 75°
6:03 P.M.: 77°
6:04 P.M.: 85°
6:05 P.M.: 90°
6:06 P.M.: 85°
6:07 P.M.: 77°
6:08 P.M.: 75°

90°


Event Processing with Streams

6:05 P.M.: 90° Topic

Stream

Temperature

Turn on the air conditioning!


Organize Data

What if you need to organize data as it arrives?


Integrating Many Data Sources and Applications

Sources (Producers)

Applications (Consumers)

Unorganized, Complicated, and Tightly Coupled.


Organize Data into Topics with MapR Streams

Topics organize events into categories and decouple producers from consumers

Consumers

MapR Cluster

Topic: Pressure

Topic: Temperature

Topic: Warnings

Consumers

Consumers

Kafka API Kafka API


Process High Volume of Data

What if you need to process a high volume of data as it arrives?


What if BP had detected problems before the oil hit the water?

•  1M samples/sec
•  High performance at scale is necessary!

Legacy Messaging

Millions of Sources

Hundreds of Destinations insert

Legacy Message Queue:

Message rate <100K/s

Publish Acks

delete

Consume Acks


Mechanisms for Decoupling

Traditional message queues?
•  Huge performance hit for persistence:
   –  Message acknowledgement per message per consumer
   –  Lots of non-sequential disk I/O when messages are added or removed

Scalable Messaging with MapR Streams

Server 1

Partition1: Topic - Pressure Partition1: Topic - Temperature Partition1: Topic - Warning

Server 2

Partition2: Topic - Pressure Partition2: Topic - Temperature Partition2: Topic - Warning

Server 3

Partition3: Topic - Pressure Partition3: Topic - Temperature Partition3: Topic - Warning

Topics are partitioned for throughput and scalability


Scalable Messaging with MapR Streams

Partition1: Topic - Pressure Partition1: Topic - Temperature Partition1: Topic - Warning

Partition2: Topic - Pressure Partition2: Topic - Temperature Partition2: Topic - Warning

Partition3: Topic - Pressure Partition3: Topic - Temperature Partition3: Topic - Warning

Producers are load balanced between partitions

Kafka API


Scalable Messaging with MapR Streams

Partition1: Topic - Pressure Partition1: Topic - Temperature Partition1: Topic - Warning

Partition2: Topic - Pressure Partition2: Topic - Temperature Partition2: Topic - Warning

Partition3: Topic - Pressure Partition3: Topic - Temperature Partition3: Topic - Warning

Consumers

Consumers

Consumers

Consumer groups can read in parallel

Kafka API


Core Components: Partitions

Consumers

MapR Cluster

Topic: Admission / Server 1

Topic: Admission / Server 2

Topic: Admission / Server 3

Consumers

Consumers

Partition

1

Partitions:
–  Messages are appended in order

Offset:
–  Sequential id of a message in a partition

Partition

2

Partition

3

6 5 4 3 2 1

3 2 1

5 4 3 2 1

Producers

Producers

Producers

New Message

6 5 4 3 2 1 Old

Message


Read Cursors

•  Read cursor: offset ID of the most recently read message
•  Producers append new messages to the tail
•  Consumers read from the head

MapR Cluster

6 5 4 3 2 1 Consumer

group Producers

Read cursors

Consumer group
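The offset and read-cursor ideas above can be sketched with a small in-memory model. This is only an illustration, not the MapR Streams or Kafka implementation: the class `PartitionLog` and its methods are hypothetical names, offsets here start at 0, and the cursor is stored as the next offset to read rather than the last one read.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for one partition: an append-only list plus one read cursor per consumer group. */
class PartitionLog {
    private final List<String> messages = new ArrayList<>();
    // Consumer group name -> next offset that group will read (assumption: stored as "next", not "last read").
    private final Map<String, Integer> cursors = new HashMap<>();

    /** Producers append to the tail; the returned offset is the message's sequential id in this partition. */
    int append(String message) {
        messages.add(message);
        return messages.size() - 1;
    }

    /** Returns every message the group has not read yet, in append order, and advances that group's cursor. */
    List<String> poll(String group) {
        int from = cursors.getOrDefault(group, 0);
        List<String> unread = new ArrayList<>(messages.subList(from, messages.size()));
        cursors.put(group, messages.size());
        return unread;
    }

    public static void main(String[] args) {
        PartitionLog log = new PartitionLog();
        log.append("m1");
        log.append("m2");
        System.out.println(log.poll("dashboard")); // both messages, oldest first
        log.append("m3");
        System.out.println(log.poll("dashboard")); // only the newly appended message
        System.out.println(log.poll("archiver"));  // a second group has its own cursor and sees all three
    }
}
```

Because reading only moves a cursor and never deletes anything, a second consumer group can read the same messages independently.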


Consumers

MapR Cluster

Topic: Admission / Server 1

Topic: Admission / Server 2

Topic: Admission / Server 3

Consumers

Consumers

Partition

1

Partition

2

Partition

3

6 5 4 3 2 1

3 2 1

5 4 3 2 1

Producers

Producers

Producers

Events are delivered in the order they are received, like a queue.

Partitioned, Sequential Access = High Performance

New Message

6 5 4 3 2 1 Old

Message


Unlike a queue, events are persisted even after they’re delivered

Messages remain on the partition, available to other consumers

Minimizes non-sequential disk reads and writes

MapR Cluster (1 Server)

Topic: Warning

Partition 1

3 2 1 Unread Events

Get Unread

3 2 1

Client Library Consumer Poll


Considering a Messaging Platform

Kafka-esque Logs?

•  Sequential disk writes and reads:
   –  Messages are persisted sequentially as produced, and read sequentially when consumed
•  Performance plus persistence
•  Performance of up to a billion messages per second at millisecond-level delivery times

Kafka model is BLAZING fast

•  Kafka 0.9 API with message sizes at 200 bytes
•  MapR Streams on a 5 node cluster sustained 18 million events / sec
•  Throughput of 3.5GB/s and over 1.5 trillion events / day
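As a quick sanity check, the benchmark figures above can be tied together with plain arithmetic. The sketch below assumes every event is exactly 200 bytes and that the rate is sustained for a full 24 hours; the class and method names are made up for illustration.

```java
/** Back-of-the-envelope check of the quoted benchmark numbers (hypothetical helper). */
class ThroughputCheck {
    /** Bytes moved per second at a given event rate and fixed event size. */
    static long bytesPerSecond(long eventsPerSecond, long bytesPerEvent) {
        return eventsPerSecond * bytesPerEvent;
    }

    /** Events per day if the per-second rate is sustained for 24 hours. */
    static long eventsPerDay(long eventsPerSecond) {
        return eventsPerSecond * 60L * 60L * 24L;
    }

    public static void main(String[] args) {
        long rate = 18_000_000L; // 18 million events / sec, from the slide
        System.out.println(bytesPerSecond(rate, 200L) + " bytes/sec");
        System.out.println(eventsPerDay(rate) + " events/day");
    }
}
```

Under those assumptions the products are 3,600,000,000 bytes per second and 1,555,200,000,000 events per day, which is in line with the quoted throughput and the "over 1.5 trillion events / day" figure.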


When Are Messages Deleted?

•  Messages can be persisted forever

Or

•  Older messages can be deleted automatically based on time to live

MapR Cluster (1 Server)

6 5 4 3 2 1 Partition 1

Older message
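Time-to-live retention can be sketched as trimming the oldest end of the log. This is an illustrative model only: `TtlPartition` is a hypothetical class, timestamps are passed in explicitly to keep the example deterministic, and using `Long.MAX_VALUE` as the TTL stands in for "persist forever".

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical partition that drops its oldest messages once they exceed a time to live. */
class TtlPartition {
    private static final class Entry {
        final long timestampMs;
        final String message;
        Entry(long timestampMs, String message) {
            this.timestampMs = timestampMs;
            this.message = message;
        }
    }

    private final Deque<Entry> entries = new ArrayDeque<>();
    private final long ttlMs; // Long.MAX_VALUE means nothing ever expires

    TtlPartition(long ttlMs) {
        this.ttlMs = ttlMs;
    }

    /** Appends a message stamped with the caller-supplied time. */
    void append(long nowMs, String message) {
        entries.addLast(new Entry(nowMs, message));
    }

    /** Removes messages older than the TTL, oldest first, and returns how many were removed. */
    int expire(long nowMs) {
        int removed = 0;
        while (!entries.isEmpty() && nowMs - entries.peekFirst().timestampMs > ttlMs) {
            entries.removeFirst();
            removed++;
        }
        return removed;
    }

    int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        TtlPartition partition = new TtlPartition(1_000L); // 1 second TTL, chosen for the demo
        partition.append(0L, "m1");
        partition.append(500L, "m2");
        System.out.println(partition.expire(1_200L) + " expired, " + partition.size() + " kept");
    }
}
```

Expiry only ever removes from the old end of the partition, so it does not disturb the sequential append and read pattern.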


Parallelism When Reading

To read messages from the same topic in parallel:
•  Create consumer groups
•  Consumers with the same group.id
•  Partitions assigned dynamically, round-robin

Consumer group: Oil Wells

Consumer A

Consumer B

Consumer C

MapR Cluster

Partition 4: Warning

Partition 3: Warning

Partition 2: Warning

Partition 1: Warning

Partition 5: Warning


Fault Tolerance Consumption: Partitions Re-Assigned Dynamically

If a consumer goes offline, its partitions are re-assigned

Consumer group.id: Oil Wells

Consumer A

Consumer C

MapR Cluster

Partition4: Warning

Partition3: Warning

Partition2: Warning

Partition1: Warning

Partition5: Warning
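The round-robin assignment and the re-assignment after a consumer goes offline can be sketched as a pure function of the live consumers and the partition count. This is a simplified illustration, not the actual assignor used by MapR Streams or Kafka; `RoundRobinAssignor` is a hypothetical name and consumer names are assumed to be unique.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical round-robin assignment of partitions 1..n to the consumers in one group. */
class RoundRobinAssignor {
    static Map<String, List<Integer>> assign(List<String> consumers, int partitionCount) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        if (consumers.isEmpty()) {
            return assignment; // nobody to assign to
        }
        for (int partition = 1; partition <= partitionCount; partition++) {
            // Deal partitions out like cards: 1 -> first consumer, 2 -> second, and so on, wrapping around.
            String owner = consumers.get((partition - 1) % consumers.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Three consumers share the five Warning partitions.
        System.out.println(assign(Arrays.asList("A", "B", "C"), 5)); // {A=[1, 4], B=[2, 5], C=[3]}
        // Consumer B goes offline: rerun the assignment with the remaining members.
        System.out.println(assign(Arrays.asList("A", "C"), 5));      // {A=[1, 3, 5], C=[2, 4]}
    }
}
```

Recomputing the whole assignment from the current membership is the simplest way to model a rebalance; a real system would also have to hand over each partition's read cursor.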


Processing Same Message for Different Views

Consumers

Consumers

Consumers

Producers

Producers

Producers

MapR-FS

Kafka API Kafka API

Pub Sub: Multiple Consumers, Multiple Destinations


Partition Fault Tolerance


Message Recovery

What if you need to recover messages in case of server failure?


Partitions are Replicated for Fault Tolerance

Producer

Producer

Server 2 Partition2: Topic - Warning

Producer

Server 1 Partition1: Topic - Warning

Server 3 Partition3: Topic - Warning

Server 2

Server 3

Server 1

Server 3

Server 1

Server 2


Partition1: Warning Partition2: Warning Replica Partition3: Warning Replica

Partition1: Warning Replica

Partition3: Warning Replica

Partition1: Warning Replica Partition2: Warning Replica Partition3: Warning

Producer

Producer

Producer

Server 1

Server 2

Server 3

Security Investigation & Event Management

Operational Intelligence

Real-time Analytics

Partition2: Warning

Partitions are Replicated for Fault Tolerance


Streams and High Availability


Core Components: Streams

•  Stream:
   –  collection of topics managed together
•  Manage stream:
   –  replication
   –  security
   –  time-to-live
   –  number of partitions

Stream

Pressure Temperature Warning

Stream

Pressure Temperature Warning

Consumers

Consumers

Consumers

Consumers

Producers

Producers

Replication


Real-time Access

What if you need real-time access to live data distributed across multiple clusters and multiple data centers?


Lack of Global Replication

Topic: C


Streams and Replication

Streams:
•  are a collection of topics
•  can be replicated worldwide

Topic: A

Topic: B

Topic: C

Topic: A

Topic: B

Topic: C

Replicating to another cluster


Streams and Replication

Topic: A

Topic: B

Topic: C

Fail Over

Streams:
•  high availability
•  disaster recovery


Replicating Streams: Master-Slave Replication

Venezuela_HA Cluster

Metrics Stream

Metrics Producers

Venezuela Cluster

Metrics Stream

Metrics

Consumers

High Availability Backup for Venezuela

Master Slave


Replicating Streams: Many-to-One Replication

Houston

Metrics Stream

Metrics

Producers Venezuela

Metrics Stream

Metrics Consumers

Consumers

Producers Mexico

Metrics Stream

Metrics Consumers Analyze all data from Houston

Many

One


Replicating Streams: Multi-Master Replication

Producers Seoul

Metrics Stream

Metrics Consumers

Producers San Francisco

Metrics Stream

Metrics Consumers

Both send and receive updates


Stream Replication

WAN

Stream

Pressure Temperature Warning

Stream

Pressure Temperature Warning

Stream

Pressure Temperature Warning


Ship picks up containers…

Singapore


Arrives at destination…

Tokyo


While en route to next destination…

Washington


Where does the data live…

Singapore Washington

Tokyo


What is important about this?

Data is generated on the ship
•  Must have an easy (i.e., foolproof) way to move the data off the ship

Each port stores the data from the ship
•  Moving data between locations
•  Analytics could happen at any location

This is a multi-data center time series data use case
•  Events from sensors = metrics
•  Same concepts as data center monitoring

Patterns


Event Sourcing

Updates

Imagine each event as a change to an entry in a database.

Account Id   Balance
WillO        80.00
BradA        20.00

1: WillO : Deposit : 100.00
2: BradA : Deposit : 50.00
3: BradA : Withdraw : 30.00
4: WillO : Withdraw : 20.00

https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying

Change log

4 3 2 1

credit, debit events

current account balances


Replication

Change Log

https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying

3 2 1 3 2 1 3 2 1

Duality of Streams and Tables:
•  Database: captures data at rest
•  Stream: captures data change

Master: Append writes

Slave: Apply writes in order


Which Makes a Better System of Record?

Which of these can be used to reconstruct the other?

1: WillO : Deposit : 100.00
2: BradA : Deposit : 50.00
3: BradA : Withdraw : 30.00
4: WillO : Withdraw : 20.00

Account Id   Balance
WillO        80.00
BradA        20.00

Change Log 3 2 1
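One way to see the answer is to replay the change log: the balance table falls out of the events, while the events cannot be recovered from the balances. The sketch below is a minimal illustration using the four events from the slide; `AccountReplay`, `Event`, and `Type` are hypothetical names.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Rebuilds the current account balances by replaying a change log of credit and debit events. */
class AccountReplay {
    enum Type { DEPOSIT, WITHDRAW }

    static final class Event {
        final String account;
        final Type type;
        final BigDecimal amount;
        Event(String account, Type type, String amount) {
            this.account = account;
            this.type = type;
            this.amount = new BigDecimal(amount);
        }
    }

    /** Applies events in log order; the result is the "table" view of the stream. */
    static Map<String, BigDecimal> replay(List<Event> log) {
        Map<String, BigDecimal> balances = new LinkedHashMap<>();
        for (Event event : log) {
            BigDecimal delta = event.type == Type.DEPOSIT ? event.amount : event.amount.negate();
            balances.merge(event.account, delta, BigDecimal::add);
        }
        return balances;
    }

    public static void main(String[] args) {
        List<Event> log = Arrays.asList(
                new Event("WillO", Type.DEPOSIT, "100.00"),
                new Event("BradA", Type.DEPOSIT, "50.00"),
                new Event("BradA", Type.WITHDRAW, "30.00"),
                new Event("WillO", Type.WITHDRAW, "20.00"));
        System.out.println(replay(log)); // {WillO=80.00, BradA=20.00}
    }
}
```

Replaying only a prefix of the log gives the balances as of that point in time, which the table alone cannot provide.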


Rewind: Reprocessing Events

MapR Cluster

6 5 4 3 2 1 Producers

Reprocess from oldest message

Consumer

Create new view, Index, cache


Rewind: Reprocessing Events

MapR Cluster

6 5 4 3 2 1 Producers

To Newest message

Consumer

new view

Read from new view


Event Sourcing, Command Query Responsibility Separation: Turning the Database Upside Down

Key-Val Document Graph

Wide Column

Time Series Relational

??? Events Updates


What Else Do I Use My Stream For?

Lineage - “how did BradA’s balance get so low?”
Auditing - “who deposited/withdrew from BradA’s account?”
History - to see the status of the accounts last year
Integrity - “can I trust this data hasn’t been tampered with?”
•  Yup - Streams are immutable

0: WillO : Deposit : 100.00
1: BradA : Deposit : 50.00
2: BradA : Withdraw : 30.00
3: WillO : Withdraw : 20.00

What Do I Need For This to Work?

•  Infinitely persisted events
•  A way to query your persisted stream data
•  An integrated security model across the stream and databases

Fraud Detection

Point of Sale -> Data Center: is the transaction fraud?
•  Lots of requests
•  Need answer within ~50 to 100 milliseconds

Data Center

Point of Sale

Location, time, card#

Fraud yes/no?

Traditional Solution

POS 1..n

Fraud detector

Last card use

1.  Look up last card use
2.  Compute the card velocity:
    •  Subtract last location, time from current location, time
3.  Update last card use
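The three steps above can be sketched as follows. The slide only says to subtract the last location and time from the current ones; this sketch assumes locations are latitude/longitude pairs, uses the haversine great-circle distance, and flags a card when the implied speed exceeds a caller-chosen limit. All names and the demo threshold are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical card-velocity check: look up last use, compute implied speed, update last use. */
class CardVelocityCheck {
    static final double EARTH_RADIUS_KM = 6371.0;

    static final class CardUse {
        final double latDeg;
        final double lonDeg;
        final long epochSeconds;
        CardUse(double latDeg, double lonDeg, long epochSeconds) {
            this.latDeg = latDeg;
            this.lonDeg = lonDeg;
            this.epochSeconds = epochSeconds;
        }
    }

    private final Map<String, CardUse> lastUseByCard = new HashMap<>();
    private final double maxKmPerHour;

    CardVelocityCheck(double maxKmPerHour) {
        this.maxKmPerHour = maxKmPerHour;
    }

    /** Great-circle distance between two uses (haversine formula). */
    static double distanceKm(CardUse a, CardUse b) {
        double lat1 = Math.toRadians(a.latDeg);
        double lat2 = Math.toRadians(b.latDeg);
        double dLat = lat2 - lat1;
        double dLon = Math.toRadians(b.lonDeg - a.lonDeg);
        double h = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(lat1) * Math.cos(lat2) * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(h)));
    }

    /** Speed the card would have needed to travel between the two uses. */
    static double velocityKmPerHour(CardUse last, CardUse current) {
        double km = distanceKm(last, current);
        double hours = (current.epochSeconds - last.epochSeconds) / 3600.0;
        if (hours <= 0) {
            return km == 0 ? 0 : Double.POSITIVE_INFINITY; // same instant, different place
        }
        return km / hours;
    }

    boolean isSuspicious(String cardId, CardUse current) {
        CardUse last = lastUseByCard.get(cardId);                    // 1. look up last card use
        boolean suspicious = last != null
                && velocityKmPerHour(last, current) > maxKmPerHour;  // 2. compute the card velocity
        lastUseByCard.put(cardId, current);                          // 3. update last card use
        return suspicious;
    }

    public static void main(String[] args) {
        CardVelocityCheck detector = new CardVelocityCheck(1000.0); // limit chosen for the demo
        System.out.println(detector.isSuspicious("card-1", new CardUse(0.0, 0.0, 0L)));     // first use
        System.out.println(detector.isSuspicious("card-1", new CardUse(0.0, 1.0, 3600L)));  // about 111 km in 1 h
        System.out.println(detector.isSuspicious("card-1", new CardUse(0.0, 30.0, 7200L))); // about 3,200 km in 1 h
    }
}
```

Note that the lookup and the update hit the same "last card use" store on every request, which is the coupling the following slides identify as the bottleneck.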


What Happens Next?

POS 1..n

Fraud detector

Last card use

POS 1..n

Fraud detector

POS 1..n

Fraud detector

1.  Look up last card use
2.  Compute the card velocity
3.  Update last card use

Bottleneck!

Service Isolation: Separate Read from Write

POS 1..n

Fraud detector

Last card use

Updater

card activity

Read

Read last card use


Separate Read Model from the Write Model: Command Query Responsibility Separation

POS 1..n

Fraud detector

Last card use

Updater

card activity

Read

Event last card use

Write last card use


Event Sourcing: New Uses of Data

Processing Same Message for Multiple Views

POS 1..n

Fraud detector

Last card use

Updater

Card location history

Other

card activity


Scaling Through Isolation allows Multiple Consumers

POS 1..n

Last card use

Updater

POS 1..n

Last card use

Updater

card activity

Fraud detector

Fraud detector

Multiple fraud detectors can use the same message queue

•  De-coupling and isolation are key

•  Propagate events, not table updates


Decoupled Architecture

Producer

Activity Handler

Producer

Producer Historical

Interesting Data Real-time

Analysis

Results Dashboard

Anomaly Detection

More than one component can make use of the same stream of messages for a variety of uses

Lessons
•  De-coupling and isolation are key
•  Propagate events, not table updates

Building Enterprise Software vs Internet Companies

Enterprise Software:
•  Complexity of domain => Business logic, Business rules
•  Banking, Healthcare, Telecom
•  Compliance => Security

Internet Companies:
•  Volume of data => Complex data infrastructure
•  Large Scale Availability, Recovery

Reference Martin Kleppmann


Building Enterprise Software vs Internet Companies

Enterprise Software: Event Sourcing

Internet Companies: Stream Processing

Reference Martin Kleppmann


Real World Solution


Credit Card Fraud Model Building


Serve NoSQL Storage Data Ingest

Fraud Stream Processing Architecture

Stream Processing Source

MapR-FS

MapR-DB

Topic: A

Topic: B

Topic: C

Topic: A

Topic: B

Topic: C


Streams Messaging

Fraud Processing

Stream Processing

Derive features

Model

raw

enriched

alerts

process

Batch Processing

MapR-FS

MapR-DB

MapR-DB

raw

enriched

alerts

Model

build model update model


Streams Messaging

Fraud Event Processing

Stream Processing

NoSQL Storage

MapR-FS

MapR-DB

Raw

Enriched

Fraud

1.  Parse raw event
2.  Read card holder profile from MapR-DB
3.  Derive features
4.  Get prediction from model with features
5.  Publish not fraud to enriched topic
6.  Publish fraud to fraud topic
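The six steps above can be sketched end to end with in-memory stand-ins. Everything concrete here is an assumption made for illustration: the raw event format (`cardId,amount`), the single derived feature (amount relative to the card holder's average spend), the map standing in for the MapR-DB profile table, the lists standing in for topics, and the threshold rule standing in for the trained model.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.DoublePredicate;

/** Hypothetical sketch of the fraud event processing steps, with maps and lists in place of MapR-DB and topics. */
class FraudEventProcessor {
    final Map<String, Double> averageSpendByCard = new HashMap<>(); // stands in for the card holder profile table
    final List<String> enrichedTopic = new ArrayList<>();           // stands in for the enriched topic
    final List<String> fraudTopic = new ArrayList<>();              // stands in for the fraud topic
    private final DoublePredicate model;                            // stands in for the trained model

    FraudEventProcessor(DoublePredicate model) {
        this.model = model;
    }

    void process(String rawEvent) {
        // 1. Parse raw event (assumed format: "cardId,amount")
        String[] fields = rawEvent.split(",");
        String cardId = fields[0].trim();
        double amount = Double.parseDouble(fields[1].trim());

        // 2. Read card holder profile; an unknown card falls back to the current amount
        double averageSpend = averageSpendByCard.getOrDefault(cardId, amount);

        // 3. Derive features (a single made-up feature: amount relative to average spend)
        double spendRatio = averageSpend > 0 ? amount / averageSpend : 0.0;

        // 4. Get prediction from model with features
        boolean fraud = model.test(spendRatio);

        // 5. and 6. Publish the enriched event to the enriched topic or the fraud topic
        String enriched = cardId + "," + amount + ",spendRatio=" + spendRatio;
        (fraud ? fraudTopic : enrichedTopic).add(enriched);
    }

    public static void main(String[] args) {
        FraudEventProcessor processor = new FraudEventProcessor(ratio -> ratio > 5.0); // made-up rule
        processor.averageSpendByCard.put("card-1", 20.0);
        processor.process("card-1,30");
        processor.process("card-1,500");
        System.out.println("enriched: " + processor.enrichedTopic);
        System.out.println("fraud:    " + processor.fraudTopic);
    }
}
```

Passing the model in as a function keeps the stream processor unchanged when the batch side rebuilds or updates the model.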


Fraud Processing Same Message for Different Views

Partition1: Topic – Raw Trans Partition1: Topic – Enriched Partition1: Topic – Fraud Alert

Partition2: Topic – Raw Trans Partition2: Topic - Enriched Partition2: Topic – Fraud Alert

Partition3: Topic – Raw Trans Partition3: Topic - Enriched Partition3: Topic – Fraud Alert

Consumers MapR-FS

MapR-DB

Consumers

Consumers

Consumers MapR-FS

MapR-DB

Consumers

Consumers

Consumers MapR-FS

MapR-DB

Consumers

Consumers


Real World Solution


JSON DB (MapR-DB)

Graph DB (Titan on MapR-DB)

Search Engine (Elasticsearch)

Transforming the Health Care Ecosystem

Electronic Medical Records

“The Stream is the System of Record” –Brad Anderson VP Big Data Informatics


Liaison ALLOY™ Platform


Data Integration

ingest, syndicate, transform

Data Management

master, deduplicate, harmonize

relate, merge

tokenize

store / persist

analyze, summarize

report, distill

recommend

explore, query

sandbox, batch transform

learn, traverse


Use Case: Streaming System of Record for Healthcare

Objective:
•  Build a flexible, secure healthcare exchange

Records Analysis Applications

Challenges:
•  Many different data models
•  Security and privacy issues
•  HIPAA compliance

Records


ALLOY Health: Exchange State HIE

Clinical Data Viewer

Analytics queries like: What are the outcomes in the entire state on diabetes? Are there doctors that are doing this better than others?

Clinical Data

Financial Data Provider

Organizations

2000+ Practices, 200+ Labs, 30,000+ Clinicians

OrdersAnywhere PORTAL (no EHR)

EHR with HL7 ONLY

EHR with WORKFLOW INTEGRATION

RADIOLOGY

LAB


This is a PAIN!

COMPLIANCE

SECURITY CONTROLS

COMPLIANCE FEATURES

PRIVACY

PCI DSS 3.0

21 CFR Part 11

SSAE16 / SOC2

HIPAA/HITECH  


WHY NOW?

http://bit.ly/29aBatK

WHY NOW?

2014 FQ4 profit: $ -440 M

Total Cost Estimate: $ -12 B

Why Now?

The relational database is not the only tool

1234
Attribute    Value
patient_id   1234
Name         Jon Smith
Age          50

999
Attribute    Value
patient_id   999
Name         Jonathan Smith
DOB          Jun 1965

86
Attribute    Value
provider_id  86
Name         Dr. Nora Paige
Specialty    Diabetes

9876
Attribute    Value
rx_id        9876
Name         Sitagliptin
Dosage       325mg

Visited

Prescribed

WasPrescribed

Patient

Patient

Prescription

Provider

Context and Relationships


WHY NOW? Mind the Gap


Streaming System of Record for Healthcare

Stream

Topic

Records

Applications

6 5 4 3 2 1

Search

Graph DB

JSON

HBase

Micro Service

Micro Service

Micro Service

Micro Service

Micro Service

Micro Service

API

Streaming System of Record Materialized Views

© 2016 MapR Technologies L1-89 ®

89  Immutable Log

Raw Data

workflow

Key/Value (MapR-DB)

materialized

view

workflow

Search Engine

materialized

view

CEP

k v v v v v

k v v v

k v v

k v v v v

k v v v

k v v v v v

Document Log (MapR-FS)

log

API

App

pre-processor

workflow

Graph (ArangoDB)

materialized

view

workflow

Time Series

(OpenTSDB)

materialized view

micro service

micro service

micro service

micro service

micro service

micro service

micro service

micro service

App App App

...

The Promised Land Compliance Auditor

© 2016 MapR Technologies L1-90 ®

The Promised Land

Auditor smiley faces
•  Data Lineage
•  Audit Logging
•  Wire-level encryption
•  At Rest encryption

Replication
•  Disaster Recovery
•  EU – data can’t leave

Non-Stream / Non-“Big Data”
•  Software Development Lifecycle
•  System Hardening
•  Separation of Concerns
   –  Dev vs Ops
•  Patch Management

Compliance Auditor

Solution

Design/architecture solved some
•  Streams
•  Data Lineage/System of Record
•  Kappa Architecture (Kreps/Kleppmann)

MapR solved others
•  Unified Security
•  Replication DC to DC
•  Converge Kafka/HBase/Hadoop to one cluster
•  Multi-tenancy (lots of topics, for lots of tenants)

API


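Sample Producer: All Together

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SampleProducer {
    // MapR Streams topic path, in the form /stream-path:topic-name
    public static String topic = "/streams/pump:warning";
    public static KafkaProducer<String, String> producer;

    public static void main(String[] args) {
        producer = setUpProducer();
        for (int i = 0; i < 3; i++) {
            String txt = "msg " + i;
            ProducerRecord<String, String> rec = new ProducerRecord<String, String>(topic, txt);
            producer.send(rec); // asynchronous send
            System.out.println("Sent msg number " + i);
        }
        producer.close(); // waits for pending sends to complete
    }

    // Not shown on the slide; assumed minimal config (plain Apache Kafka also needs bootstrap.servers).
    private static KafkaProducer<String, String> setUpProducer() {
        Properties props = new Properties();
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return new KafkaProducer<String, String>(props);
    }
}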

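Sample Consumer: All Together

import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class MyConsumer {
    public static String topic = "/stream/pump:warning";
    public static KafkaConsumer<String, String> consumer;
    public static long pollTimeOut = 1000; // milliseconds; value assumed, not on the slide

    public static void main(String[] args) {
        configureConsumer(args);
        consumer.subscribe(Arrays.asList(topic)); // subscribe takes a collection of topics
        try {
            while (true) {
                // Kafka 0.9 API: poll(long); newer clients take a java.time.Duration
                ConsumerRecords<String, String> msg = consumer.poll(pollTimeOut);
                for (ConsumerRecord<String, String> record : msg) {
                    System.out.println("read " + record.toString());
                }
            }
        } finally {
            consumer.close(); // reached only if the loop exits with an exception
        }
    }

    // Not shown on the slide; assumed to set String deserializers and a group.id.
    private static void configureConsumer(String[] args) {
        Properties props = new Properties();
        props.put("group.id", "sample-group"); // hypothetical group name
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumer = new KafkaConsumer<String, String>(props);
    }
}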

Summary


Can we get “Extreme”?

1+ Trillion Events
•  per day

Millions of Producers
•  Billions of events per second

Multiple Consumers
•  Potentially for every event

Multiple Data Centers
•  Plan for success
•  Plan for drastic failure

Think that is crazy? Consider having 100 servers and performing: Monitoring and Application logs…
•  100 metrics per server
•  60 samples per minute
•  50 metrics per request
•  1,000 log entries per request (abnormally small, depends on level)
•  1 million requests per day

~ 2 billion events per day, for one small(ish) use case
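The "~ 2 billion" estimate can be reproduced from the listed inputs. The sketch below assumes each of the 100 metrics on each server is sampled 60 times per minute around the clock, and that each request emits its 50 metrics and 1,000 log entries as separate events; the class and method names are hypothetical.

```java
/** Reproduces the events-per-day estimate from the slide's inputs (hypothetical helper). */
class EventVolume {
    /** Monitoring events: every metric on every server, sampled continuously for 24 hours. */
    static long metricEventsPerDay(long servers, long metricsPerServer, long samplesPerMinute) {
        return servers * metricsPerServer * samplesPerMinute * 60L * 24L;
    }

    /** Application events: metrics plus log entries emitted by each request. */
    static long requestEventsPerDay(long requestsPerDay, long metricsPerRequest, long logEntriesPerRequest) {
        return requestsPerDay * (metricsPerRequest + logEntriesPerRequest);
    }

    public static void main(String[] args) {
        long monitoring = metricEventsPerDay(100L, 100L, 60L);             // 864,000,000
        long application = requestEventsPerDay(1_000_000L, 50L, 1_000L);   // 1,050,000,000
        System.out.println(monitoring + application);                      // 1914000000
    }
}
```

Monitoring contributes 864 million events and request handling 1.05 billion, for a total of about 1.9 billion events per day.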

Extreme Average Reality


Stream Processing

Building a Complete Data Architecture

MapR File System (MapR-FS)

MapR Converged Data Platform

MapR Database (MapR-DB) MapR Streams

Sources/Apps Bulk Processing


Find my slides & other related materials to this talk here: bit.ly/jjug-aug2016

or search:


MapR Blog

• https://www.mapr.com/blog/


…helping you put data technology to work

●  Find answers

●  Ask technical questions

●  Join on-demand training course discussions

●  Follow release announcements

●  Share and vote on product ideas

●  Find Meetup and event listings

Connect with fellow Apache Hadoop and Spark professionals

community.mapr.com