
Streaming Patterns Revolutionary Architectures with the Kafka API

Date post: 15-Apr-2017
Upload: carol-mcdonald
Transcript
Page 1: Streaming Patterns Revolutionary Architectures with the Kafka API

© 2016 MapR Technologies

Streaming Patterns, Revolutionary Architectures Carol McDonald

Page 2: Streaming Patterns Revolutionary Architectures with the Kafka API


Agenda Streams Core Components

•  Topics, Partitions •  Fault Tolerance •  High Availability

Patterns •  Event Sourcing •  Duality of Streams and Databases •  Command Query Responsibility Separation •  Polyglot Persistence, Multiple Materialized Views •  Turning the Database Upside Down

Real World Examples •  Fraud Detection •  Healthcare Exchange

Page 3: Streaming Patterns Revolutionary Architectures with the Kafka API


Which products are we discussing?

Page 4: Streaming Patterns Revolutionary Architectures with the Kafka API


Streams Core Components

Page 5: Streaming Patterns Revolutionary Architectures with the Kafka API


What’s a Stream?

Producers Consumers Events_Stream

A stream is an unbounded sequence of events carried from a set of producers to a set of consumers.

Events

Page 6: Streaming Patterns Revolutionary Architectures with the Kafka API


What is Streaming Data? Got Some Examples?

Data Collection Devices

Smart Machinery Phones and Tablets Home Automation

RFID Systems Digital Signage Security Systems Medical Devices

Page 7: Streaming Patterns Revolutionary Architectures with the Kafka API


Why Streams?

Trigger Events: •  Stock Prices •  User Activity •  Sensor Data

Topic

Many Big Data sources are Event Oriented

Stream Stream Stream

Event Data

Topic Topic

Real-Time Analytics

Page 8: Streaming Patterns Revolutionary Architectures with the Kafka API


Analyze Data What if you need to analyze data as it arrives?

Page 9: Streaming Patterns Revolutionary Architectures with the Kafka API


It was hot at 6:05

yesterday!

Batch Processing with HDFS

Analyze

6:01 P.M.: 72° 6:02 P.M.: 75° 6:03 P.M.: 77° 6:04 P.M.: 85° 6:05 P.M.: 90° 6:06 P.M.: 85° 6:07 P.M.: 77° 6:08 P.M.: 75°

90°

Page 10: Streaming Patterns Revolutionary Architectures with the Kafka API


Event Processing with Streams

6:05 P.M.: 90° Topic

Stream

Temperature

Turn on the air conditioning!

Page 11: Streaming Patterns Revolutionary Architectures with the Kafka API


Organize Data What if you need to organize data as it arrives?

Page 12: Streaming Patterns Revolutionary Architectures with the Kafka API


Integrating Many Data Sources and Applications

Sources (Producers)

Applications (Consumers)

Unorganized, Complicated, and Tightly Coupled.

Page 13: Streaming Patterns Revolutionary Architectures with the Kafka API


Organize Data into Topics with MapR Streams Topics Organize Events into Categories and Decouple Producers from Consumers

Consumers

MapR Cluster

Topic: Pressure

Topic: Temperature

Topic: Warnings

Consumers

Consumers

Kafka API Kafka API

Page 14: Streaming Patterns Revolutionary Architectures with the Kafka API


Process High Volume of Data What if you need to process a high volume of data as it arrives?

Page 15: Streaming Patterns Revolutionary Architectures with the Kafka API


What if BP had detected problems before the oil hit the water?

•  1M samples/sec •  High performance at scale is necessary!

Page 16: Streaming Patterns Revolutionary Architectures with the Kafka API


Legacy Messaging

Millions of Sources

Hundreds of Destinations insert

Legacy Message Queue:

Message rate <100K/s

Publish Acks

delete

Consume Acks

Page 17: Streaming Patterns Revolutionary Architectures with the Kafka API


Mechanisms for Decoupling Traditional message queues? •  Huge performance hit for persistence:

•  Message acknowledgement per message per consumer •  Lots of non-sequential disk I/O when messages are added or removed

Page 18: Streaming Patterns Revolutionary Architectures with the Kafka API


Scalable Messaging with MapR Streams

Server 1

Partition1: Topic - Pressure Partition1: Topic - Temperature Partition1: Topic - Warning

Server 2

Partition2: Topic - Pressure Partition2: Topic - Temperature Partition2: Topic - Warning

Server 3

Partition3: Topic - Pressure Partition3: Topic - Temperature Partition3: Topic - Warning

Topics are partitioned for throughput and scalability

Page 19: Streaming Patterns Revolutionary Architectures with the Kafka API


Scalable Messaging with MapR Streams

Partition1: Topic - Pressure Partition1: Topic - Temperature Partition1: Topic - Warning

Partition2: Topic - Pressure Partition2: Topic - Temperature Partition2: Topic - Warning

Partition3: Topic - Pressure Partition3: Topic - Temperature Partition3: Topic - Warning

Producers are load balanced between partitions

Kafka API
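
The load balancing on this slide can be pictured with a small sketch. This is not the Kafka or MapR client's partitioner; the class name `PartitionChooser` and the rule it applies are assumptions for illustration. Messages with a key go to a partition derived from the key's hash, so one key always lands in the same partition, and messages without a key are spread round-robin.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of spreading produced messages over partitions.
class PartitionChooser {
    private final int numPartitions;
    private final AtomicInteger next = new AtomicInteger(0);

    PartitionChooser(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    /** Returns a partition index in [0, numPartitions). */
    int choose(String key) {
        if (key == null) {
            // No key: rotate through the partitions.
            return Math.floorMod(next.getAndIncrement(), numPartitions);
        }
        // Keyed: the same key always maps to the same partition.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        PartitionChooser chooser = new PartitionChooser(3);
        for (int i = 0; i < 4; i++) {
            System.out.println("unkeyed -> partition " + chooser.choose(null));
        }
        System.out.println("pump-7 -> partition " + chooser.choose("pump-7"));
    }
}
```

Keeping a key on one partition matters because ordering is only guaranteed within a partition.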

Page 20: Streaming Patterns Revolutionary Architectures with the Kafka API


Scalable Messaging with MapR Streams

Partition1: Topic - Pressure Partition1: Topic - Temperature Partition1: Topic - Warning

Partition2: Topic - Pressure Partition2: Topic - Temperature Partition2: Topic - Warning

Partition3: Topic - Pressure Partition3: Topic - Temperature Partition3: Topic - Warning

Consumers

Consumers

Consumers

Consumer groups can read in parallel

Kafka API

Page 21: Streaming Patterns Revolutionary Architectures with the Kafka API


Core Components: Partitions

Consumers

MapR Cluster

Topic: Admission / Server 1

Topic: Admission / Server 2

Topic: Admission / Server 3

Consumers

Consumers

Partition 1

Partition 2

Partition 3

Partitions: –  Messages are appended in order

Offset: –  Sequential ID of a message in a partition

6 5 4 3 2 1

3 2 1

5 4 3 2 1

Producers

Producers

Producers

New Message 6 5 4 3 2 1 Old Message

Page 22: Streaming Patterns Revolutionary Architectures with the Kafka API


Read Cursors •  Read cursor: offset ID of the most recently read message •  Producers append new messages to the tail •  Consumers read from the head

MapR Cluster

6 5 4 3 2 1 Consumer

group Producers

Read cursors

Consumer group
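
A minimal in-memory sketch of one partition with per-group read cursors may help here. It is an illustration only, not MapR Streams or Kafka code: `PartitionLog` and its methods are hypothetical names, offsets start at 0 rather than 1 as drawn on the slides, and the cursor stores the next offset to read rather than the last one read.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of one partition: an append-only list plus one cursor per consumer group.
class PartitionLog {
    private final List<String> messages = new ArrayList<>();
    private final Map<String, Integer> cursors = new HashMap<>();

    /** Appends a message to the tail and returns its offset. */
    int append(String message) {
        messages.add(message);
        return messages.size() - 1;
    }

    /** Returns the messages this group has not read yet and advances its cursor. */
    List<String> poll(String groupId) {
        int from = cursors.getOrDefault(groupId, 0);
        List<String> unread = new ArrayList<>(messages.subList(from, messages.size()));
        cursors.put(groupId, messages.size());
        return unread;
    }

    /** Moves a group's cursor, for example back to 0 to reprocess from the oldest message. */
    void seek(String groupId, int offset) {
        cursors.put(groupId, offset);
    }
}
```

Reading never removes anything from the list, so a second consumer group polling the same log still sees every message.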

Page 23: Streaming Patterns Revolutionary Architectures with the Kafka API


Consumers

MapR Cluster

Topic: Admission / Server 1

Topic: Admission / Server 2

Topic: Admission / Server 3

Consumers

Consumers

Partition

1

Partition

2

Partition

3

6 5 4 3 2 1

3 2 1

5 4 3 2 1

Producers

Producers

Producers

Events are delivered in the order they are received, like a queue. Partitioned, Sequential Access = High Performance

New Message 6 5 4 3 2 1 Old Message

Page 24: Streaming Patterns Revolutionary Architectures with the Kafka API


Unlike a queue, events are persisted even after they’re delivered

Messages remain on the partition, available to other consumers Minimizes Non-Sequential disk read-writes

MapR Cluster (1 Server)

Topic: Warning

Partition 1

3 2 1 Unread Events

Get Unread

3 2 1

Client Library Consumer Poll

Page 25: Streaming Patterns Revolutionary Architectures with the Kafka API


Considering a Messaging Platform Kafka-esque Logs?

•  Sequential writing/reading disk: •  Messages are persisted sequentially as produced, and read sequentially when consumed •  Performance plus persistence •  performance of up to a billion messages per second at millisecond-level delivery times.

Kafka model is BLAZING fast

•  Kafka 0.9 API with message sizes at 200 bytes •  MapR Streams on a 5 node cluster sustained 18 million events / sec •  Throughput of 3.5GB/s and over 1.5 trillion events / day
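
The three figures on this slide can be cross-checked with simple arithmetic. The sketch below uses only the slide's numbers (18 million events per second, 200 bytes per message); the class and method names are made up for illustration. It gives 3.6 GB/s of payload and about 1.56 trillion events per day, in line with the quoted 3.5 GB/s and "over 1.5 trillion".

```java
// Back-of-the-envelope check of the quoted throughput figures.
class ThroughputCheck {
    static long bytesPerSecond(long eventsPerSecond, long bytesPerEvent) {
        return eventsPerSecond * bytesPerEvent;
    }

    static long eventsPerDay(long eventsPerSecond) {
        return eventsPerSecond * 24L * 60 * 60;
    }

    public static void main(String[] args) {
        long eventsPerSecond = 18_000_000L; // from the slide
        long bytesPerEvent = 200L;          // from the slide
        System.out.println(bytesPerSecond(eventsPerSecond, bytesPerEvent) / 1e9 + " GB/s");
        System.out.println(eventsPerDay(eventsPerSecond) / 1e12 + " trillion events/day");
    }
}
```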

Page 26: Streaming Patterns Revolutionary Architectures with the Kafka API


When Are Messages Deleted? •  Messages can be persisted forever, or •  Older messages can be deleted automatically based on time-to-live

MapR Cluster (1 Server)

6 5 4 3 2 1 Partition 1

Older message
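
Time-to-live deletion can be sketched as trimming the oldest end of the partition. This is a toy model with hypothetical names (`TtlPartition`, `purge`), not how MapR Streams implements retention; a negative TTL stands in for "persist forever".

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of time-to-live based cleanup on one partition.
class TtlPartition {
    private static final class Entry {
        final long timestampMs;
        final String message;

        Entry(long timestampMs, String message) {
            this.timestampMs = timestampMs;
            this.message = message;
        }
    }

    private final Deque<Entry> entries = new ArrayDeque<>();
    private final long ttlMs; // negative means "persist forever"

    TtlPartition(long ttlMs) {
        this.ttlMs = ttlMs;
    }

    void append(long nowMs, String message) {
        entries.addLast(new Entry(nowMs, message));
    }

    /** Drops messages older than the TTL and returns how many were removed. */
    int purge(long nowMs) {
        if (ttlMs < 0) {
            return 0;
        }
        int removed = 0;
        // Messages are in arrival order, so expired ones are always at the head.
        while (!entries.isEmpty() && nowMs - entries.peekFirst().timestampMs > ttlMs) {
            entries.removeFirst();
            removed++;
        }
        return removed;
    }

    int size() {
        return entries.size();
    }
}
```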

Page 27: Streaming Patterns Revolutionary Architectures with the Kafka API


Parallelism When Reading To read messages from the same Topic in parallel: •  create consumer groups •  consumers with same group.id •  partitions assigned dynamically round-robin

Consumer group: Oil Wells

Consumer A

Consumer B

Consumer C

MapR Cluster

Partition 4: Warning

Partition 3: Warning

Partition 2: Warning

Partition 1: Warning

Partition 5: Warning

Page 28: Streaming Patterns Revolutionary Architectures with the Kafka API


Fault Tolerance Consumption: Partitions Re-Assigned Dynamically If consumer goes offline, partitions re-assigned

Consumer group.id: Oil Wells

Consumer A

Consumer C

MapR Cluster

Partition4: Warning

Partition3: Warning

Partition2: Warning

Partition1: Warning

Partition5: Warning
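
The round-robin assignment and the re-assignment after a consumer goes offline can be sketched as below. The real assignment is done by the client library and the cluster; `RoundRobinAssignor` is a hypothetical stand-in that only shows the dealing pattern for the five Warning partitions and consumers A, B and C.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of dealing partitions to the members of a consumer group.
class RoundRobinAssignor {
    /** Deals partitions 1..numPartitions to the consumers in turn. */
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        if (consumers.isEmpty()) {
            throw new IllegalArgumentException("at least one consumer is required");
        }
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            String owner = consumers.get(p % consumers.size());
            assignment.get(owner).add(p + 1);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // All three consumers online.
        System.out.println(assign(Arrays.asList("A", "B", "C"), 5)); // {A=[1, 4], B=[2, 5], C=[3]}
        // Consumer B goes offline: its partitions are dealt to A and C.
        System.out.println(assign(Arrays.asList("A", "C"), 5));      // {A=[1, 3, 5], C=[2, 4]}
    }
}
```

Each partition has exactly one owner within the group at any time, which is what lets the group read in parallel without duplicating work.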

Page 29: Streaming Patterns Revolutionary Architectures with the Kafka API


Processing Same Message for Different Views

Consumers

Consumers

Consumers

Producers

Producers

Producers

MapR-FS

Kafka API Kafka API

Pub Sub: Multiple Consumers, Multiple Destinations

Page 30: Streaming Patterns Revolutionary Architectures with the Kafka API


Partition Fault Tolerance

Page 31: Streaming Patterns Revolutionary Architectures with the Kafka API


Message Recovery What if you need to recover messages in case of server failure?

Page 32: Streaming Patterns Revolutionary Architectures with the Kafka API


Partitions are Replicated for Fault Tolerance

Producer

Producer

Server 2 Partition2: Topic - Warning

Producer

Server 1 Partition1: Topic - Warning

Server 3 Partition3: Topic - Warning

Server 2

Server 3

Server 1

Server 3

Server 1

Server 2

Page 33: Streaming Patterns Revolutionary Architectures with the Kafka API


Partition1: Warning Partition2: Warning Replica Partition3: Warning Replica

Partition1: Warning Replica

Partition3: Warning Replica

Partition1: Warning Replica Partition2: Warning Replica Partition3: Warning

Producer

Producer

Producer

Server 1

Server 2

Server 3

Security Investigation & Event Management

Operational Intelligence

Real-time Analytics

Partition2: Warning

Partitions are Replicated for Fault Tolerance

Page 34: Streaming Patterns Revolutionary Architectures with the Kafka API


Partitions are Replicated for Fault Tolerance

Producer

Producer

Producer

Security Investigation & Event Management

Operational Intelligence

Real-time Analytics

Partition1: Warning Partition2: Warning Replica Partition3: Warning Replica

Partition1: Warning Replica

Partition3: Warning Replica

Partition1: Warning Replica Partition2: Warning Replica Partition3: Warning

Server 1

Server 2

Server 3

Partition2: Warning

Page 35: Streaming Patterns Revolutionary Architectures with the Kafka API


Partitions are Replicated for Fault tolerance

Producer

Producer

Producer

Security Investigation & Event Management

Operational Intelligence

Real-time Analytics

Partition1: Warning Partition2: Warning Replica Partition3: Warning Replica

Partition1: Warning Replica

Partition3: Warning Replica

Partition1: Warning Replica Partition2: Warning Replica Partition3: Warning

Server 1

Server 2

Server 3

Partition2: Warning

Page 36: Streaming Patterns Revolutionary Architectures with the Kafka API


Streams and High Availability

Page 37: Streaming Patterns Revolutionary Architectures with the Kafka API


•  Stream: –  collection of topics managed together

•  Manage stream: –  replication –  security –  time-to-live –  number of partitions

Core Components: Streams

Stream

Pressure Temperature Warning

Stream

Pressure Temperature Warning

Consumers

Consumers

Consumers

Consumers

Producers

Producers

Replication

Page 38: Streaming Patterns Revolutionary Architectures with the Kafka API


Real-time Access What if you need real-time access to live data distributed across multiple clusters and multiple data centers?

Page 39: Streaming Patterns Revolutionary Architectures with the Kafka API


Lack of Global Replication

Topic: C

Page 40: Streaming Patterns Revolutionary Architectures with the Kafka API


Streams and Replication Streams:

•  are a collection of topics •  can be replicated worldwide

Topic: A

Topic: B

Topic: C

Topic: A

Topic: B

Topic: C

Replicating to another cluster

Page 41: Streaming Patterns Revolutionary Architectures with the Kafka API


Streams and Replication

Topic: A

Topic: B

Topic: C

Fail Over

Streams: •  high availability •  disaster recovery

Page 42: Streaming Patterns Revolutionary Architectures with the Kafka API


Replicating Streams: Master-Slave Replication

Venezuela_HA Cluster

Metrics Stream

Metrics Producers

Venezuela Cluster

Metrics Stream

Metrics

Consumers

High Availability Backup for Venezuela

Master Slave

Page 43: Streaming Patterns Revolutionary Architectures with the Kafka API


Replicating Streams: Many-to-One Replication

Houston

Metrics Stream

Metrics

Producers Venezuela

Metrics Stream

Metrics Consumers

Consumers

Producers Mexico

Metrics Stream

Metrics Consumers Analyze all data from Houston

Many

One

Page 44: Streaming Patterns Revolutionary Architectures with the Kafka API


Replicating Streams: Multi-Master Replication

Producers Seoul

Metrics Stream

Metrics Consumers

Producers San Francisco

Metrics Stream

Metrics Consumers

Both send and receive updates

Page 45: Streaming Patterns Revolutionary Architectures with the Kafka API


Stream Replication

WAN

Stream

Pressure Temperature Warning

Stream

Pressure Temperature Warning

Stream

Pressure Temperature Warning

Page 46: Streaming Patterns Revolutionary Architectures with the Kafka API


Ship picks up containers…

Singapore

Page 47: Streaming Patterns Revolutionary Architectures with the Kafka API


Arrives at destination…

Tokyo

Page 48: Streaming Patterns Revolutionary Architectures with the Kafka API


While en route to next destination…

Washington

Page 49: Streaming Patterns Revolutionary Architectures with the Kafka API


Where does the data live…

Singapore Washington

Tokyo

Page 50: Streaming Patterns Revolutionary Architectures with the Kafka API


What is important about this? Data is generated on the ship

•  Must have an easy way (i.e. foolproof) to move the data off the ship

Each port stores the data from the ship

•  Moving data between locations •  Analytics could happen at any location

This is a multi-data center time series data use case

•  Events from sensors = metrics •  Same concepts as data center monitoring

Page 51: Streaming Patterns Revolutionary Architectures with the Kafka API


Patterns

Page 52: Streaming Patterns Revolutionary Architectures with the Kafka API


Event Sourcing

Updates

Imagine each event as a change to an entry in a database.

Account Id   Balance
WillO        80.00
BradA        20.00

1: WillO : Deposit : 100.00
2: BradA : Deposit : 50.00
3: BradA : Withdraw : 30.00
4: WillO : Withdraw : 20.00

https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying

Change log

4 3 2 1

credit, debit events

current account balances
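
The relationship between the change log and the balance table can be sketched as a fold over the events. The event values are the four from this slide; the class `AccountReplay`, the `Event` type and the use of cents instead of decimal amounts are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: rebuild current balances by replaying the change log in order.
class AccountReplay {
    static final class Event {
        final String account;
        final String type; // "Deposit" or "Withdraw"
        final long cents;

        Event(String account, String type, long cents) {
            this.account = account;
            this.type = type;
            this.cents = cents;
        }
    }

    /** Folds the log into a table of account -> balance in cents. */
    static Map<String, Long> replay(List<Event> log) {
        Map<String, Long> balances = new LinkedHashMap<>();
        for (Event e : log) {
            long delta = e.type.equals("Deposit") ? e.cents : -e.cents;
            balances.merge(e.account, delta, Long::sum);
        }
        return balances;
    }

    public static void main(String[] args) {
        List<Event> log = Arrays.asList(
                new Event("WillO", "Deposit", 10000),
                new Event("BradA", "Deposit", 5000),
                new Event("BradA", "Withdraw", 3000),
                new Event("WillO", "Withdraw", 2000));
        System.out.println(replay(log)); // {WillO=8000, BradA=2000}
    }
}
```

The table can always be rebuilt from the log, but the log cannot be rebuilt from the table, since the table keeps only the latest value per account.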

Page 53: Streaming Patterns Revolutionary Architectures with the Kafka API


Replication

Change Log

https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying

3 2 1 3 2 1 3 2 1

Duality of Streams and Tables: Database: captures data at rest Stream: captures data change

Master: Append writes

Slave: Apply writes in order

Page 54: Streaming Patterns Revolutionary Architectures with the Kafka API


Which Makes a Better System of Record?

Which of these can be used to reconstruct the other?

1: WillO : Deposit : 100.00
2: BradA : Deposit : 50.00
3: BradA : Withdraw : 30.00
4: WillO : Withdraw : 20.00

Account Id   Balance
WillO        80.00
BradA        20.00

Change Log 3 2 1

Page 55: Streaming Patterns Revolutionary Architectures with the Kafka API


Rewind: Reprocessing Events

MapR Cluster

6 5 4 3 2 1 Producers

Reprocess from oldest message

Consumer

Create new view, Index, cache

Page 56: Streaming Patterns Revolutionary Architectures with the Kafka API


Rewind Reprocessing Events

MapR Cluster

6 5 4 3 2 1 Producers

To Newest message

Consumer

new view

Read from new view

Page 57: Streaming Patterns Revolutionary Architectures with the Kafka API


Event Sourcing, Command Query Responsibility Separation: Turning the Database Upside Down

Key-Val Document Graph

Wide Column

Time Series Relational

??? Events Updates

Page 58: Streaming Patterns Revolutionary Architectures with the Kafka API


What Else Do I Use My Stream For?

Lineage - “how did BradA’s balance get so low?” Auditing - “who deposited/withdrew from BradA’s account?” History – to see the status of the accounts last year Integrity - “can I trust this data hasn’t been tampered with?”

•  Yup - Streams are immutable

0: WillO : Deposit : 100.00 1: BradA : Deposit : 50.00 2: BradA : Withdraw : 30.00 3: WillO : Withdraw: 20.00

Page 59: Streaming Patterns Revolutionary Architectures with the Kafka API


What Do I Need For This to Work?

Infinitely persisted events A way to query your persisted stream data An integrated security model across the stream and databases

Page 60: Streaming Patterns Revolutionary Architectures with the Kafka API


Fraud Detection

Point of Sale -> Data Center: is the transaction fraud? •  Lots of requests •  Need answer within ~50 to 100 milliseconds

Data Center

Point of Sale

Location, time, card#

Fraud yes/no ?

Page 61: Streaming Patterns Revolutionary Architectures with the Kafka API


Traditional Solution

POS 1..n

Fraud detector

Last card use

1.  Look up last card use
2.  Compute the card velocity:
    •  Subtract last location and time from current location and time
3.  Update last card use
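
The card velocity step can be sketched as distance divided by elapsed time between two uses of the same card. The slide does not say how distance is measured; the haversine great-circle formula, the class name `CardVelocity` and the idea of comparing against a caller-chosen speed limit are all assumptions for illustration.

```java
// Hypothetical illustration of the "card velocity" check.
class CardVelocity {
    static final double EARTH_RADIUS_KM = 6371.0;

    /** Great-circle distance in km between two latitude/longitude points (haversine formula). */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** Speed in km/h implied by the last use (1) and the current use (2) of a card. */
    static double velocityKmh(double lat1, double lon1, long t1Ms,
                              double lat2, double lon2, long t2Ms) {
        double km = distanceKm(lat1, lon1, lat2, lon2);
        double hours = (t2Ms - t1Ms) / 3_600_000.0;
        if (hours <= 0) {
            // Same instant: any movement at all is physically impossible.
            return km > 0 ? Double.POSITIVE_INFINITY : 0.0;
        }
        return km / hours;
    }

    /** Flags the transaction when the implied speed exceeds the chosen limit. */
    static boolean suspicious(double velocityKmh, double maxKmh) {
        return velocityKmh > maxKmh;
    }
}
```

A card used in two cities an hour apart that would require faster-than-airliner travel is the kind of case this check is meant to catch.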

Page 62: Streaming Patterns Revolutionary Architectures with the Kafka API


What Happens Next?

POS 1..n

Fraud detector

Last card use

POS 1..n

Fraud detector

POS 1..n

Fraud detector

1.  Look up last card use 2.  Compute the card velocity 3.  Update last card use

Bottleneck !

Page 63: Streaming Patterns Revolutionary Architectures with the Kafka API


Service Isolation: Separate Read from Write

POS 1..n

Fraud detector

Last card use

Updater

card activity

Read

Read last card use

Page 64: Streaming Patterns Revolutionary Architectures with the Kafka API


Separate Read Model from the Write Model: Command Query Responsibility Separation

POS 1..n

Fraud detector

Last card use

Updater

card activity

Read

Event last card use

Write last card use
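
The separation drawn on this slide can be sketched in a few lines. In this toy version the card activity stream is an in-memory list and the last-card-use table is a map; the class `LastCardUseCqrs` and its method names are hypothetical. The point of sale only publishes events, the updater is the only writer of the read model, and the fraud detector only reads it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of separating the write model from the read model.
class LastCardUseCqrs {
    static final class CardEvent {
        final String card;
        final String location;
        final long timeMs;

        CardEvent(String card, String location, long timeMs) {
            this.card = card;
            this.location = location;
            this.timeMs = timeMs;
        }
    }

    // Write model: append-only stream of card activity.
    private final List<CardEvent> cardActivity = new ArrayList<>();
    // Read model: last use per card, maintained only by the updater.
    private final Map<String, CardEvent> lastCardUse = new HashMap<>();
    private int updaterOffset = 0;

    /** Command side: the point of sale publishes an event and does nothing else. */
    void publish(CardEvent event) {
        cardActivity.add(event);
    }

    /** Updater: consumes events it has not seen yet and refreshes the read model. */
    void runUpdater() {
        while (updaterOffset < cardActivity.size()) {
            CardEvent event = cardActivity.get(updaterOffset++);
            lastCardUse.put(event.card, event);
        }
    }

    /** Query side: the fraud detector reads and never writes. */
    CardEvent lastUse(String card) {
        return lastCardUse.get(card);
    }
}
```

The read model lags the stream until the updater runs, which is the usual trade-off of this pattern.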

Page 65: Streaming Patterns Revolutionary Architectures with the Kafka API


Event Sourcing: New Uses of Data Processing Same Message for Multiple Views

POS 1..n

Fraud detector

Last card use

Updater

Card location history

Other

card activity

Page 66: Streaming Patterns Revolutionary Architectures with the Kafka API


Scaling Through Isolation allows Multiple Consumers

POS 1..n

Last card use

Updater

POS 1..n

Last card use

Updater

card activity

Fraud detector

Fraud detector

Multiple fraud detectors can use the same message queue

•  De-coupling and isolation are key

•  Propagate events, not table updates

Page 67: Streaming Patterns Revolutionary Architectures with the Kafka API


Decoupled Architecture

Producer

Activity Handler

Producer

Producer Historical

Interesting Data Real-time

Analysis

Results Dashboard

Anomaly Detection

more than one component can make use of the same stream of messages for a variety of uses

Page 68: Streaming Patterns Revolutionary Architectures with the Kafka API


Lessons De-coupling and isolation are key Propagate events, not table updates

Page 69: Streaming Patterns Revolutionary Architectures with the Kafka API


Building Enterprise Software vs Internet Companies

Enterprise Software: Complexity of domain => Business logic, Business rules Banking, Healthcare, Telecom Compliance=> Security

Internet Companies: Volume of data => Complex data infrastructure Large Scale Availability, Recovery

Reference Martin Kleppmann

Page 70: Streaming Patterns Revolutionary Architectures with the Kafka API


Building Enterprise Software vs Internet Companies

Enterprise Software: Event Sourcing

Internet Companies: Stream Processing

Reference Martin Kleppmann

Page 71: Streaming Patterns Revolutionary Architectures with the Kafka API


Real World Solution

Page 72: Streaming Patterns Revolutionary Architectures with the Kafka API


Credit Card Fraud Model Building

Page 73: Streaming Patterns Revolutionary Architectures with the Kafka API


Serve NoSQL Storage Data Ingest

Fraud Stream Processing Architecture

Stream Processing Source

MapR-FS

MapR-DB

Topic: A

Topic: B

Topic: C

Topic: A

Topic: B

Topic: C

Page 74: Streaming Patterns Revolutionary Architectures with the Kafka API


Streams Messaging

Fraud Processing

Stream Processing

Derive features

Model

raw

enriched

alerts

process

Batch Processing

MapR-FS

MapR-DB

MapR-DB

raw

enriched

alerts

Model

build model update model

Page 75: Streaming Patterns Revolutionary Architectures with the Kafka API


Streams Messaging

Fraud Event Processing

Stream Processing

NoSQL Storage

MapR-FS

MapR-DB

Raw

Enriched

Fraud

1.  Parse raw event
2.  Read card holder profile from MapR-DB
3.  Derive features
4.  Get prediction from model with features
5.  Publish not fraud to enriched topic
6.  Publish fraud to fraud topic
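
The six steps can be sketched end to end with in-memory stand-ins. Everything concrete here is an assumption for illustration: the raw event format `cardId,amount`, a profile that holds only the card holder's average spend, a single derived feature (amount relative to average spend), a model reduced to a `DoublePredicate`, and lists in place of the enriched and fraud topics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.DoublePredicate;

// Hypothetical illustration of the fraud event processing steps.
class FraudEventProcessor {
    private final Map<String, Double> profiles; // stand-in for card holder profiles in MapR-DB
    private final DoublePredicate model;        // stand-in for the trained model
    final List<String> enriched = new ArrayList<>(); // stand-in for the enriched topic
    final List<String> fraud = new ArrayList<>();    // stand-in for the fraud topic

    FraudEventProcessor(Map<String, Double> profiles, DoublePredicate model) {
        this.profiles = profiles;
        this.model = model;
    }

    void process(String rawEvent) {
        // 1. Parse the raw event, assumed to be "cardId,amount".
        String[] fields = rawEvent.split(",");
        String cardId = fields[0].trim();
        double amount = Double.parseDouble(fields[1].trim());

        // 2. Read the card holder profile (here: the average spend).
        double averageSpend = profiles.getOrDefault(cardId, amount);

        // 3. Derive a feature: how large this amount is relative to the usual spend.
        double ratio = amount / Math.max(averageSpend, 0.01);

        // 4. Get a prediction from the model.
        boolean isFraud = model.test(ratio);

        // 5. and 6. Publish to the enriched topic or to the fraud topic.
        String out = rawEvent + ",ratio=" + ratio;
        if (isFraud) {
            fraud.add(out);
        } else {
            enriched.add(out);
        }
    }
}
```

For example, with a profile of 50.0 for card `c1` and a model `ratio -> ratio > 10.0`, the event `c1,40` lands in the enriched list and `c1,5000` in the fraud list.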

Page 76: Streaming Patterns Revolutionary Architectures with the Kafka API


Fraud Processing Same Message for Different Views

Partition1: Topic – Raw Trans Partition1: Topic – Enriched Partition1: Topic – Fraud Alert

Partition2: Topic – Raw Trans Partition2: Topic - Enriched Partition2: Topic – Fraud Alert

Partition3: Topic – Raw Trans Partition3: Topic - Enriched Partition3: Topic – Fraud Alert

Consumers MapR-FS

MapR-DB

Consumers

Consumers

Consumers MapR-FS

MapR-DB

Consumers

Consumers

Consumers MapR-FS

MapR-DB

Consumers

Consumers

Page 77: Streaming Patterns Revolutionary Architectures with the Kafka API


Real World Solution

Page 78: Streaming Patterns Revolutionary Architectures with the Kafka API


JSON DB (MapR-DB)

Graph DB (Titan on

MapR-DB)

Search Engine (Elastic-Search)

Transforming the Health Care Ecosystem

Electronic Medical Records

“The Stream is the System of Record” –Brad Anderson VP Big Data Informatics

Page 79: Streaming Patterns Revolutionary Architectures with the Kafka API


Liaison ALLOY™ Platform


Data Integration

ingest, syndicate, transform

Data Management

master, deduplicate, harmonize

relate, merge

tokenize

store / persist

analyze, summarize

report, distill

recommend

explore, query

sandbox, batch transform

learn, traverse

Page 80: Streaming Patterns Revolutionary Architectures with the Kafka API


Use Case: Streaming System of Record for Healthcare

Objective: •  Build a flexible, secure healthcare exchange

Records Analysis Applications

Challenges: •  Many different data models •  Security and privacy issues •  HIPAA compliance

Records

Page 81: Streaming Patterns Revolutionary Architectures with the Kafka API


ALLOY Health: Exchange State HIE

Clinical Data Viewer

Analytics queries like: What are the outcomes in the entire state on diabetes? Are there doctors that are doing this better than others?

Clinical Data

Financial Data Provider

Organizations

Page 82: Streaming Patterns Revolutionary Architectures with the Kafka API

2000+ Practices, 200+ Labs, 30,000+ Clinicians

OrdersAnywhere PORTAL (no EHR)

EHR with HL7 ONLY

EHR with WORKFLOW INTEGRATION

RADIOLOGY

LAB

Page 83: Streaming Patterns Revolutionary Architectures with the Kafka API


This is a PAIN !

COMPLIANCE

SECURITY CONTROLS

COMPLIANCE FEATURES

PRIVACY

PCI DSS 3.0

21 CFR Part 11

SSAE16 / SOC2

HIPAA/HITECH  

Page 84: Streaming Patterns Revolutionary Architectures with the Kafka API


WHY NOW?

http://bit.ly/29aBatK

Page 85: Streaming Patterns Revolutionary Architectures with the Kafka API


WHY NOW?

2014 FQ4 profit

$ -440 M Total Cost Estimate

$ -12 B

Page 86: Streaming Patterns Revolutionary Architectures with the Kafka API


Why Now? The Relational database is not the only tool

1234

Attribute Value

patient_id 1234

Name Jon Smith

Age 50

999

Attribute Value

patient_id 999

Name Jonathan Smith

DOB Jun 1965

86

9876

Attribute Value

provider_id 86

Name Dr. Nora Paige

Specialty Diabetes

Attribute Value

rx_id 9876

Name Sitagliptin

Dosage 325mg

Visited

Prescribed

WasPrescribed

Patient

Patient

Prescription

Provider

Context and Relationships

Page 87: Streaming Patterns Revolutionary Architectures with the Kafka API


WHY NOW? Mind the Gap


Page 88: Streaming Patterns Revolutionary Architectures with the Kafka API


Streaming System of Record for Healthcare

Stream

Topic

Records

Applications

6 5 4 3 2 1

Search

Graph DB

JSON

HBase

Micro Service

Micro Service

Micro Service

Micro Service

Micro Service

Micro Service

A P I

Streaming System of Record Materialized Views

Page 89: Streaming Patterns Revolutionary Architectures with the Kafka API


Immutable Log

Raw Data

workflow

Key/Value (MapR-DB)

materialized

view

workflow

Search Engine

materialized

view

CEP

k v v v v v

k v v v

k v v

k v v v v

k v v v

k v v v v v

Document Log (MapR-FS)

log

API

App

pre-processor

workflow

Graph (ArangoDB)

materialized

view

workflow

Time Series

(OpenTSDB)

materialized view

micro service

micro service

micro service

micro service

micro service

micro service

micro service

micro service

App App App

...

The Promised Land Compliance Auditor

Page 90: Streaming Patterns Revolutionary Architectures with the Kafka API


The Promised Land

Auditor smiley faces •  Data Lineage •  Audit Logging •  Wire-level encryption •  At Rest encryption

Replication

•  Disaster Recovery •  EU – data can’t leave

Non-Stream / Non-”Big Data” •  Software Development Lifecycle •  System Hardening •  Separation of Concerns

-  Dev vs Ops •  Patch Management


Compliance Auditor

Page 91: Streaming Patterns Revolutionary Architectures with the Kafka API


Solution Design/architecture solved some

•  Streams •  Data Lineage/System of Record •  Kappa Architecture (Kreps/Kleppmann)

MapR solved others •  Unified Security •  Replication DC to DC •  Converge Kafka/HBase/Hadoop to one cluster •  Multi-tenancy (lots of topics, for lots of tenants)


Page 92: Streaming Patterns Revolutionary Architectures with the Kafka API


API

Page 93: Streaming Patterns Revolutionary Architectures with the Kafka API


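Sample Producer: All Together

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SampleProducer {
    // MapR Streams topic path: <stream path>:<topic name>
    public static String topic = "/streams/pump:warning";
    public static KafkaProducer<String, String> producer;

    public static void main(String[] args) {
        producer = setUpProducer();
        for (int i = 0; i < 3; i++) {
            String txt = "msg " + i;
            ProducerRecord<String, String> rec = new ProducerRecord<String, String>(topic, txt);
            producer.send(rec); // asynchronous send
            System.out.println("Sent msg number " + i);
        }
        producer.close(); // flushes pending sends
    }

    // Assumed setup, not shown on the slide: String serializers only.
    // Add bootstrap.servers when targeting an Apache Kafka cluster.
    private static KafkaProducer<String, String> setUpProducer() {
        Properties props = new Properties();
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return new KafkaProducer<String, String>(props);
    }
}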

Page 94: Streaming Patterns Revolutionary Architectures with the Kafka API


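Sample Consumer: All Together

import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class MyConsumer {
    public static String topic = "/streams/pump:warning";
    public static KafkaConsumer<String, String> consumer;
    static long pollTimeOut = 1000; // milliseconds; assumed value, not on the slide

    public static void main(String[] args) {
        configureConsumer(args);
        consumer.subscribe(Arrays.asList(topic));
        try {
            while (true) {
                // Fetch whatever has arrived since the last poll
                ConsumerRecords<String, String> msg = consumer.poll(pollTimeOut);
                for (ConsumerRecord<String, String> record : msg) {
                    System.out.println("read " + record.toString());
                }
            }
        } finally {
            consumer.close(); // runs if the loop exits with an exception
        }
    }

    // Assumed setup, not shown on the slide; the group.id value is hypothetical.
    private static void configureConsumer(String[] args) {
        Properties props = new Properties();
        props.put("group.id", "pump-monitors");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumer = new KafkaConsumer<String, String>(props);
    }
}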

Page 95: Streaming Patterns Revolutionary Architectures with the Kafka API


Summary

Page 96: Streaming Patterns Revolutionary Architectures with the Kafka API


Can we get “Extreme” ?

1+ Trillion Events •  per day

Millions of Producers •  Billions of events per second

Multiple Consumers •  Potentially for every event

Multiple Data Centers •  Plan for success •  Plan for drastic failure

Think that is crazy? Consider having 100 servers and performing: Monitoring and Application logs…

•  100 metrics per server •  60 samples per minute •  50 metrics per request •  1,000 log entries per request (abnormally small, depends on level) •  1 million requests per day

~ 2 billion events per day, for one small (ish) use case
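
The "~2 billion" figure follows from the numbers listed on this slide. The sketch below only multiplies them out; the class and method names are made up for illustration. Server metrics contribute 864 million events per day and request-level metrics and log entries contribute 1.05 billion, for a total of about 1.91 billion.

```java
// Back-of-the-envelope check of the daily event volume.
class EventVolume {
    /** Events per day from periodic server metrics. */
    static long metricEventsPerDay(long servers, long metricsPerServer, long samplesPerMinute) {
        return servers * metricsPerServer * samplesPerMinute * 60 * 24;
    }

    /** Events per day from per-request metrics and log entries. */
    static long requestEventsPerDay(long requestsPerDay, long metricsPerRequest, long logEntriesPerRequest) {
        return requestsPerDay * (metricsPerRequest + logEntriesPerRequest);
    }

    public static void main(String[] args) {
        long metrics = metricEventsPerDay(100, 100, 60);
        long requests = requestEventsPerDay(1_000_000, 50, 1_000);
        System.out.println("server metrics: " + metrics);
        System.out.println("request events: " + requests);
        System.out.println("total per day:  " + (metrics + requests));
    }
}
```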

Extreme Average Reality

Page 97: Streaming Patterns Revolutionary Architectures with the Kafka API


Stream Processing

Building a Complete Data Architecture

MapR File System (MapR-FS)

MapR Converged Data Platform

MapR Database (MapR-DB) MapR Streams

Sources/Apps Bulk Processing

Page 98: Streaming Patterns Revolutionary Architectures with the Kafka API


Page 99: Streaming Patterns Revolutionary Architectures with the Kafka API


Page 100: Streaming Patterns Revolutionary Architectures with the Kafka API


Find my slides and other materials related to this talk here: bit.ly/jjug-aug2016

or search:

Page 101: Streaming Patterns Revolutionary Architectures with the Kafka API


MapR Blog

• https://www.mapr.com/blog/

Page 102: Streaming Patterns Revolutionary Architectures with the Kafka API


…helping you put data technology to work

●  Find answers

●  Ask technical questions

●  Join on-demand training course discussions

●  Follow release announcements

●  Share and vote on product ideas

●  Find Meetup and event listings

Connect with fellow Apache Hadoop and Spark professionals

community.mapr.com

