Date post: | 22-Jan-2018 |
Category: |
Data & Analytics |
Upload: | guido-schmutz |
View: | 187 times |
Download: | 0 times |
BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENF HAMBURG KOPENHAGEN LAUSANNE MÜNCHEN STUTTGART WIEN ZÜRICH
Apache Kafka – Scalable Stream Processing and more!Guido Schmutz – 5.12.2017
@gschmutz guidoschmutz.wordpress.com
Guido Schmutz
Working at Trivadis for more than 20 yearsOracle ACE Director for Fusion Middleware and SOAConsultant, Trainer Software Architect for Java, Oracle, SOA andBig Data / Fast DataHead of Trivadis Architecture BoardTechnology Manager @ Trivadis
More than 30 years of software development experience
Contact: [email protected]: http://guidoschmutz.wordpress.comSlideshare: http://www.slideshare.net/gschmutzTwitter: gschmutz
Apache Kafka – Scalable Stream Processing and more!
Agenda
1. What is Apache Kafka?2. Kafka Connect3. Kafka Integration with other components4. Kafka Streams5. KSQL
Apache Kafka – Scalable Stream Processing and more!
What is Apache Kafka?
Apache Kafka – Scalable Stream Processing and more!
Apache Kafka History
2012 2013 2014 2015 2016 2017
Clustermirroringdatacompression
Intra-clusterreplication0.7
0.8
0.9
DataProcessing(StreamsAPI)
0.10
DataIntegration(ConnectAPI)
0.11
2018
ExactlyOnceSemanticsPerformanceImprovements
KSQLDeveloperPreview
Apache Kafka – Scalable Stream Processing and more!
1.0 JBODSupportSupportJava9
Apache Kafka – A Streaming Platform
Apache Kafka – Scalable Stream Processing and more!
High-Level Architecture
Distributed Log at the Core
Scale-Out Architecture
Logs do not (necessarily) forget
Strong Ordering Guarantees
most business systems need strong ordering guarantees
messages that require relative ordering need to be sent to the same partition
supply same key for all messages that require a relative order
To maintain global ordering use a single partition topic
Producer 1
Consumer 1
Broker 1
Broker 2
Broker 3
Consumer 2
Consumer 3
Key-1
Key-2
Key-3Key-4
Key-5
Key-6
Key-3
Key-1
Apache Kafka – Scalable Stream Processing and more!
Durable and Highly Available Messaging
Producer 1
Broker 1
Broker 2
Broker 3
Producer 1
Broker 1
Broker 2
Broker 3
Consumer 1 Consumer 1
Consumer 2Consumer 2
Apache Kafka – Scalable Stream Processing and more!
Durable and Highly Available Messaging (II)
Producer 1
Broker 1
Broker 2
Broker 3
Producer 1
Broker 1
Broker 2
Broker 3
Consumer 1 Consumer 1
Consumer 2
Consumer 2
Apache Kafka – Scalable Stream Processing and more!
How to get a Kafka environent
Apache Kafka – Scalable Stream Processing and more!
On Premises• Bare Metal Installation
• Docker
• Mesos / Kubernetes
• Hadoop Distributions
Cloud• Oracle Event Hub Cloud Service
• Azure HDInsight Kafka
• Confluent Cloud
• …
Demo - Kafka
Truck-2 truckposition
Truck-1
Truck-3
consoleconsumer
2016-06-0214:39:56.605|98|27|803014426|Wichita toLittle RockRoute2|Normal|38.65|90.21|5187297736652502631
Testdata-GeneratorbyHortonworks
Apache Kafka – Scalable Stream Processing and more!
Demo – Create Kafka Topic
$ kafka-topics --zookeeper zookeeper:2181 --create \--topic truck_position --partitions 8 --replication-factor 1
$ kafka-topics --zookeeper zookeeper:2181 –list__consumer_offsets_confluent-metrics_schemasdocker-connect-configsdocker-connect-offsetsdocker-connect-statustruck_position
Apache Kafka – Scalable Stream Processing and more!
Demo – Run Producer and Kafka-Console-Consumer
Apache Kafka – Scalable Stream Processing and more!
Demo – Java Producer to "truck_position"
Constructing a Kafka Producer
private Properties kafkaProps = new Properties();kafkaProps.put("bootstrap.servers","broker-1:9092);kafkaProps.put("key.serializer", "...StringSerializer");kafkaProps.put("value.serializer", "...StringSerializer");
producer = new KafkaProducer<String, String>(kafkaProps);
ProducerRecord<String, String> record =new ProducerRecord<>("truck_position", driverId, eventData);
try {metadata = producer.send(record).get();
} catch (Exception e) {}
Apache Kafka – Scalable Stream Processing and more!
Demo - MQTT instead of Kafka
Truck-2 truck/nn/position
Truck-1
Truck-3
2016-06-0214:39:56.605|98|27|803014426|Wichita toLittle RockRoute2|Normal|38.65|90.21|5187297736652502631
Apache Kafka – Scalable Stream Processing and more!
Demo –MQTT instead of Kafka
Apache Kafka – Scalable Stream Processing and more!
Demo MQTT instead of Kafka – how to get the data into Kafka?
Truck-2 truck/nn/position
Truck-1
Truck-3
truckposition raw
?
2016-06-0214:39:56.605|98|27|803014426|Wichita toLittle RockRoute2|Normal|38.65|90.21|5187297736652502631
Apache Kafka – Scalable Stream Processing and more!
Apache Kafka – wait there is more!
Apache Kafka – Scalable Stream Processing and more!
Source Connector
trucking_driver
Kafka BrokerSink
Connector
Stream Processing
Kafka Connect
Apache Kafka – Scalable Stream Processing and more!
Kafka Connect - Overview
SourceConnector
SinkConnector
Apache Kafka – Scalable Stream Processing and more!
Kafka Connect – Single Message Transforms (SMT)
Simple Transformations for a single message
Defined as part of Kafka Connect• some useful transforms provided out-of-the-box• Easily implement your own
Optionally deploy 1+ transforms with each connector• Modify messages produced by source
connector• Modify messages sent to sink connectors
Makes it much easier to mix and match connectors
Some of currently available transforms:• InsertField• ReplaceField• MaskField• ValueToKey• ExtractField• TimestampRouter• RegexRouter• SetSchemaMetaData• Flatten• TimestampConverter
Apache Kafka – Scalable Stream Processing and more!
Kafka Connect – Many Connectors
60+ since first release (0.9+)
20+ from Confluent and Partners
Source:http://www.confluent.io/product/connectors
ConfluentsupportedConnectors
CertifiedConnectors CommunityConnectors
Apache Kafka – Scalable Stream Processing and more!
Demo – Kafka Connect
Truck-2 truck/nn/position
Truck-1
Truck-3
mqtt tokafka
truck_position
consoleconsumer
Apache Kafka – Scalable Stream Processing and more!
2016-06-0214:39:56.605|98|27|803014426|Wichita toLittle RockRoute2|Normal|38.65|90.21|5187297736652502631
Demo – Create MQTT Connect through REST API #!/bin/bashcurl -X "POST" "http://192.168.69.138:8083/connectors" \
-H "Content-Type: application/json" \-d $'{
"name": "mqtt-source","config": {"connector.class":
"com.datamountaineer.streamreactor.connect.mqtt.source.MqttSourceConnector","connect.mqtt.connection.timeout": "1000","tasks.max": "1","connect.mqtt.kcql":
"INSERT INTO truck_position SELECT * FROM truck/+/position","name": "MqttSourceConnector","connect.mqtt.service.quality": "0", "connect.mqtt.client.id": "tm-mqtt-connect-01","connect.mqtt.converter.throw.on.error": "true","connect.mqtt.hosts": "tcp://mosquitto:1883"}
}'
Apache Kafka – Scalable Stream Processing and more!
Demo – Call REST API and Kafka Console Consumer
Apache Kafka – Scalable Stream Processing and more!
Kafka Integration with othercomponents
Apache Kafka – Scalable Stream Processing and more!
Kafka and the Big Data / Fast Data ecosystem
Kafka integrates with many popular products / frameworks
• Apache Spark Streaming
• Apache Flink
• Apache Storm
• Apache Apex
• Apache NiFi
• StreamSets
• Oracle Stream Analytics
• Oracle Service Bus
• Oracle GoldenGate
• Oracle Event Hub Cloud Service
• Debezium CDC
• …
AdditionalInfo:https://cwiki.apache.org/confluence/display/KAFKA/EcosystemApache Kafka – Scalable Stream Processing and more!
StreamSets Data Collector
• Founded by ex-Cloudera, Informaticaemployees
• Continuous open source, intent-driven, big data ingest
• Visible, record-oriented approach fixes combinatorial explosion
• Batch or stream processing• Standalone, Spark cluster, MapReduce cluster
• IDE for pipeline development by ‘civilians’• Relatively new - first public release September
2015• So far, vast majority of commits are from
StreamSets staff
Apache Kafka – Scalable Stream Processing and more!
Demo StreamSets Data Collector
Truck-3
truckposition raw
truck/nn/positionTruck-4
Truck-5
KafkatoCassandra
{"truckid":"57","driverid":"15","routeid":"1927624662","eventtype":"Normal","latitude":"38.65","longitude":"-90.21","correlationId":"4412891759760421296"}
MQTT-2to Kafka
Edge
Port: 1883
trucking
Apache Kafka – Scalable Stream Processing and more!
Demo StreamSets Data Collector
Apache Kafka – Scalable Stream Processing and more!
Demo StreamSets Data Collector
Apache Kafka – Scalable Stream Processing and more!
Demo StreamSets Data Collector
Apache Kafka – Scalable Stream Processing and more!
Demo StreamSets Data Collector
Apache Kafka – Scalable Stream Processing and more!
Demo StreamSets Data Collector
Truck-3
truckposition raw
truck/nn/positionTruck-4
Truck-5
KafkatoCassandra
{"truckid":"57","driverid":"15","routeid":"1927624662","eventtype":"Normal","latitude":"38.65","longitude":"-90.21","correlationId":"4412891759760421296"}
MQTT-2to Kafka
Edge
Port: 1883
trucking
whataboutsomeanalytics?
Apache Kafka – Scalable Stream Processing and more!
Kafka Streams
Apache Kafka – Scalable Stream Processing and more!
Kafka Streams - Overview
• Designed as a simple and lightweight library in Apache Kafka
• no external dependencies on systems other than Apache Kafka
• Part of open source Apache Kafka, introduced in 0.10+• Leverages Kafka as its internal messaging layer• Supports fault-tolerant local state• Event-at-a-time processing (not microbatch) with millisecond
latency• Windowing with out-of-order data using a Google DataFlow-like
model
Apache Kafka – Scalable Stream Processing and more!
Kafka Stream DSL and Processor Topology
KStream<Integer, String> stream1 =builder.stream("in-1");
KStream<Integer, String> stream2=builder.stream("in-2");
KStream<Integer, String> joined =stream1.leftJoin(stream2, …);
KTable<> aggregated = joined.groupBy(…).count("store");
aggregated.to("out-1");
1 2
lj
a
t
State
Apache Kafka – Scalable Stream Processing and more!
Kafka Stream DSL and Processor Topology
KStream<Integer, String> stream1 =builder.stream("in-1");
KStream<Integer, String> stream2=builder.stream("in-2");
KStream<Integer, String> joined =stream1.leftJoin(stream2, …);
KTable<> aggregated = joined.groupBy(…).count("store");
aggregated.to("out-1");
1 2
lj
a
t
State
Apache Kafka – Scalable Stream Processing and more!
Kafka Streams Cluster
Processor Topology
Kafka Cluster
input-1
input-2
store(changelog)
output
1 2
lj
a
tState
Apache Kafka – Scalable Stream Processing and more!
Kafka Cluster
Processor Topology
input-1Partition0
Partition1
Partition2
Partition3
input-2Partition0
Partition1
Partition2
Partition3
Kafka Streams 1 Kafka Streams 2
Kafka Streams 3 Kafka Streams 4
Apache Kafka – Scalable Stream Processing and more!
Demo – Kafka Streams
Truck-2 truck/nn/position
Truck-1
Truck-3
mqtt tokafka
truck_position_s
detect_dangerous_driving
dangerous_driving
consoleconsumer
2016-06-0214:39:56.605|98|27|803014426|Wichita toLittle RockRoute2|Normal|38.65|90.21|5187297736652502631
Apache Kafka – Scalable Stream Processing and more!
KafkatoCassandra trucking
Demo (IV) - Create Stream
final KStreamBuilder builder = new KStreamBuilder();
KStream<String, String> source = builder.stream(stringSerde, stringSerde, "truck_position");
KStream<String, TruckPosition> positions = source.map((key,value) ->
new KeyValue<>(key, TruckPosition.create(value)));
KStream<String, TruckPosition> filtered = positions.filter(TruckPosition::filterNonNORMAL);
filtered.map((key,value) -> new KeyValue<>(key,value._originalRecord))
.to("dangerous_driving");
Apache Kafka – Scalable Stream Processing and more!
KSQL
Apache Kafka – Scalable Stream Processing and more!
KSQL: a Streaming SQL Engine for Apache Kafka
• Enables stream processing with zero coding required• The simples way to process streams of data in real-time• Powered by Kafka and Kafka Streams: scalable, distributed, mature• All you need is Kafka – no complex deployments• available as Developer preview!
• STREAM and TABLE as first-class citizens• STREAM = data in motion• TABLE = collected state of a stream• join STREAM and TABLE
Apache Kafka – Scalable Stream Processing and more!
Demo – KSQL
Truck-2 truck/nn/position
Truck-1
Truck-3
mqtt tokafka
truck_position
detect_dangerous_driving
dangerous_driving
consoleconsumer
2016-06-0214:39:56.605|98|27|803014426|Wichita toLittle RockRoute2|Normal|38.65|90.21|5187297736652502631
Apache Kafka – Scalable Stream Processing and more!
KafkatoCassandra trucking
Demo (V) - Start Kafka KSQL$ docker-compose exec ksql-cli ksql-cli local --bootstrap-server broker-1:9092
======================================= _ __ _____ ____ _ == | |/ // ____|/ __ \| | == | ' /| (___ | | | | | == | < \___ \| | | | | == | . \ ____) | |__| | |____ == |_|\_\_____/ \___\_\______| == == Streaming SQL Engine for Kafka =
Copyright 2017 Confluent Inc.
CLI v0.1, Server v0.1 located at http://localhost:9098
Having trouble? Type 'help' (case-insensitive) for a rundown of how things work!
ksql>
Apache Kafka – Scalable Stream Processing and more!
Demo (IV) - Create Streamksql> CREATE STREAM truck_position_s \(ts VARCHAR, \truckid VARCHAR, \driverid BIGINT, \routeid BIGINT, \routename VARCHAR, \eventtype VARCHAR, \latitude DOUBLE, \longitude DOUBLE, \correlationid VARCHAR) \WITH (kafka_topic='truck_position', \
value_format='DELIMITED');
Message----------------Stream created
Apache Kafka – Scalable Stream Processing and more!
Demo (IV) - Create Stream
ksql> describe truck_position_s;
Field | Type---------------------------------ROWTIME | BIGINTROWKEY | VARCHAR(STRING)TS | VARCHAR(STRING)TRUCKID | VARCHAR(STRING)DRIVERID | BIGINTROUTEID | BIGINTROUTENAME | VARCHAR(STRING)EVENTTYPE | VARCHAR(STRING)LATITUDE | DOUBLELONGITUDE | DOUBLECORRELATIONID | VARCHAR(STRING)
Apache Kafka – Scalable Stream Processing and more!
Demo (IV) - Create Stream
ksql> SELECT * FROM truck_position_s;
1506922133306 | "truck/13/position0 | �2017-10-02T07:28:53 | 31 | 13 | 371182829 | Memphis to Little Rock | Normal | 41.76 | -89.6 | -20842639519146641061506922133396 | "truck/16/position0 | �2017-10-02T07:28:53 | 19 | 16 | 160405074 | Joplin to Kansas City Route 2 | Normal | 41.48 | -88.07 | -20842639519146641061506922133457 | "truck/30/position0 | �2017-10-02T07:28:53 | 26 | 30 | 160779139 | Des Moines to Chicago Route 2 | Normal | 41.85 | -89.29 | -20842639519146641061506922133485 | "truck/23/position0 | �2017-10-02T07:28:53 | 32 | 23 | 1090292248 | Peoria to Ceder Rapids Route 2 | Normal | 41.48 | -88.07 | -20842639519146641061506922133497 | "truck/12/position0 | �2017-10-02T07:28:53 | 80 | 12 | 1961634315 | Saint Louis to Memphis | Normal | 41.74 | -91.47 | -20842639519146641061506922133547 | "truck/14/position0 | �2017-10-02T07:28:53 | 73 | 14 | 1927624662 | Springfield to KC Via Columbia | Normal | 35.12 | -90.68 | -2084263951914664106
Apache Kafka – Scalable Stream Processing and more!
Demo (IV) - Create Stream
ksql> SELECT * FROM truck_position_s WHERE eventtype != 'Normal';
1506922264016 | "truck/11/position0 | �2017-10-02T07:31:04 | 27 | 11 | 1325712174 | Saint Louis to Tulsa Route2 | Lane Departure | 38.5 | -90.69 | -20842639519146641061506922281156 | "truck/11/position0 | �2017-10-02T07:31:21 | 27 | 11 | 1325712174 | Saint Louis to Tulsa Route2 | Unsafe tail distance | 37.81 | -92.31 | -20842639519146641061506922284436 | "truck/10/position0 | �2017-10-02T07:31:24 | 93 | 10 | 1384345811 | Joplin to Kansas City | Unsafe following distance | 37.02 | -94.54 | -20842639519146641061506922297887 | "truck/11/position0 | �2017-10-02T07:31:37 | 27 | 11 | 1325712174 | Saint Louis to Tulsa Route2 | Unsafe following distance | 37.09 | -94.23 | -2084263951914664106
Apache Kafka – Scalable Stream Processing and more!
Demo (IV) - Create Stream
ksql> CREATE STREAM dangerous_driving_s \WITH (kafka_topic= dangerous_driving', \
value_format='DELIMITED') \AS SELECT * FROM truck_position_s \WHERE eventtype != 'Normal';
Message----------------------------Stream created and running
ksql> select * from dangerous_driving_s;1506922849375 | "truck/11/position0 | �2017-10-02T07:40:49 | 90 | 11 | 160779139 | Des Moines to Chicago Route 2 | Overspeed | 41.48 | -88.07 | 35691830713478983661506922866488 | "truck/11/position0 | �2017-10-02T07:41:06 | 90 | 11 | 160779139 | Des Moines to Chicago Route 2 | Overspeed | 40.38 | -89.17 | 3569183071347898366
Apache Kafka – Scalable Stream Processing and more!
Demo (IV) - Create Stream
ksql> describe dangerous_driving_s;
Field | Type---------------------------------ROWTIME | BIGINTROWKEY | VARCHAR(STRING)TS | VARCHAR(STRING)TRUCKID | VARCHAR(STRING)DRIVERID | BIGINTROUTEID | BIGINTROUTENAME | VARCHAR(STRING)EVENTTYPE | VARCHAR(STRING)LATITUDE | DOUBLELONGITUDE | DOUBLECORRELATIONID | VARCHAR(STRING)
Apache Kafka – Scalable Stream Processing and more!
Demo - All
Truck-2 truck/nn/position
Truck-1
Truck-3
mqtt-source
truck_position
detect_dangerous_driving
dangerous_driving
TruckDriver
jdbc-source trucking_driver
join_dangerous_driving_driver
dangerous_driving_driver
27,Walter,Ward,Y,24-JUL-85,2017-10-0215:19:00
consoleconsumer
2016-06-0214:39:56.605|98|27|803014426|Wichita toLittle RockRoute2|Normal|38.65|90.21|5187297736652502631
{"id":27,"firstName":"Walter","lastName":"Ward","available":"Y","birthdate":"24-JUL-85","last_update":1506923052012}
Apache Kafka – Scalable Stream Processing and more!
KafkatoCassandra trucking
Demo (V) – Create JDBC Connect through REST API #!/bin/bashcurl -X "POST" "http://192.168.69.138:8083/connectors" \
-H "Content-Type: application/json" \-d $'{
"name": "jdbc-driver-source","config": {
"connector.class": "JdbcSourceConnector","connection.url":"jdbc:postgresql://db/sample?user=sample&password=sample","mode": "timestamp","timestamp.column.name":"last_update","table.whitelist":"driver","validate.non.null":"false","topic.prefix":"trucking_","key.converter":"org.apache.kafka.connect.json.JsonConverter","key.converter.schemas.enable": "false","value.converter":"org.apache.kafka.connect.json.JsonConverter","value.converter.schemas.enable": "false","name": "jdbc-driver-source","transforms":"createKey,extractInt", "transforms.createKey.type":"org.apache.kafka.connect.transforms.ValueToKey", "transforms.createKey.fields":"id", "transforms.extractInt.type":"org.apache.kafka.connect.transforms.ExtractField$Key", "transforms.extractInt.field":"id"
}}'
Apache Kafka – Scalable Stream Processing and more!
Demo (V) – Create JDBC Connect through REST API
Apache Kafka – Scalable Stream Processing and more!
Demo (V) - Create Table with Driver Stateksql> CREATE TABLE driver_t \
(id BIGINT, \first_name VARCHAR, \last_name VARCHAR, \available VARCHAR) \WITH (kafka_topic='trucking_driver', \
value_format='JSON');Message----------------Table created
Apache Kafka – Scalable Stream Processing and more!
Demo (V) - Create Table with Driver Stateksql> CREATE STREAM dangerous_driving_and_driver_s \WITH (kafka_topic='dangerous_driving_and_driver_s', \
value_format='JSON') \AS SELECT driverid, first_name, last_name, truckid, routeid,routename, eventtype \FROM truck_position_s \LEFT JOIN driver_t \ON dangerous_driving_and_driver_s.driverid = driver_t.id;
Message----------------------------Stream created and running
ksql> select * from dangerous_driving_and_driver_s;1511173352906 | 21 | 21 | Lila | Page | 58 | 1594289134 | Memphis to Little Rock Route 2 | Unsafe tail distance1511173353669 | 12 | 12 | Laurence | Lindsey | 93 | 1384345811 | Joplin to KansasCity | Lane Departure1511173435385 | 11 | 11 | Micky | Isaacson | 22 | 1198242881 | Saint Louis to Chicago Route2 | Unsafe tail distance
Apache Kafka – Scalable Stream Processing and more!
Apache Kafka – Scalable Stream Processing and more!
Technology on its own won't help you.You need to know how to use it properly.