© Copyright 2010-2016 Cloudera. All rights reserved. Not to be reproduced or shared without prior written consent from Cloudera.

Apache Kafka
In this chapter you will learn
§ What Kafka is and what advantages it offers
§ About the high-level architecture of Kafka
§ What several use cases for Kafka are
§ How to create topics, publish messages, and read messages from the command line and in Java code

Chapter Topics
Apache Kafka
§ Overview
§ Use Cases
§ Messages, Topics, and Partitions
§ Producers and Consumers
§ Message Ordering Guarantees
§ Using the Java API
§ Essential Points
§ Hands-On Exercise: Using Kafka from the Command Line

What is Apache Kafka?
§ Apache Kafka is a distributed commit log service
  – Widely used for data ingest
  – Offers scalability, performance, reliability, and flexibility
  – Conceptually similar to a publish-subscribe messaging system
§ Originally created at LinkedIn, but now an open source Apache project
  – Donated to the Apache Software Foundation in 2011
  – Graduated from the Apache Incubator in 2012
  – Included as part of Cloudera Labs in 2014
  – Supported by Cloudera for production use with CDH in 2015

Characteristics of Kafka
§ Scalable
  – Kafka is a distributed system that supports multiple nodes
§ Fault-tolerant
  – Data is persisted to disk and replicated throughout the cluster
§ High throughput
  – Each broker can process hundreds of thousands of messages per second*
§ Low latency
  – Data is delivered in a fraction of a second
§ Flexible
  – Decouples the production of data from its consumption
* Using modest hardware, with messages of a typical size

High-Level Architecture: Terminology
§ Messages represent arbitrary user-defined content
  – For example, application events or sensor readings
§ A node running the Kafka service is called a broker
  – A production cluster typically has many Kafka brokers
  – Kafka also depends on the ZooKeeper service for coordination
§ Producers push messages to a broker
  – The producer assigns a topic, or category, to each message
§ Consumers pull messages from a Kafka broker
  – They read only messages in relevant topics

High-Level Architecture: Example
[Figure: Producers #1, #2, and #3 send login_failure and call_placed messages to a Kafka cluster of six brokers; Consumers #1 and #2 read them.]

Why Kafka?
§ Kafka is used for a variety of use cases, such as
  – Log aggregation
  – Messaging
  – Web site activity tracking
  – Operational metrics
  – Stream processing
  – Event sourcing
§ A subset of these could also be done with Flume
  – For example, aggregating Web server log data into HDFS
§ Kafka often becomes a better choice as use case complexity grows

Common Kafka Use Cases (1)
§ Distributed message bus / central data pipeline
  – Enables highly scalable EAI, SOA, CEP, and microservice architectures
  – Decouples services with a standardized message abstraction
  – Supports multiple message client languages with high throughput
§ Log aggregation
  – Kafka can collect logs from multiple services
  – Logs can be made available to multiple consumers, such as Hadoop and Apache Solr
EAI: Enterprise Application Integration; SOA: Service-Oriented Architecture; CEP: Complex Event Processing

Common Kafka Use Cases (2)
§ Web site activity tracking
  – Web application sends events such as page views and searches to Kafka
  – Events become available for real-time processing, dashboards, and offline analytics in Hadoop
§ Alerting and reporting on operational metrics
  – Kafka producers and consumers occasionally publish their message counts to a special Kafka topic
  – A service compares counts and sends an alert upon detecting data loss
§ Stream processing
  – A framework such as Spark Streaming reads data from a topic, processes it, and writes processed data to a new topic where it becomes available for users and applications
  – Kafka's strong durability helps to facilitate this use case

Messages
§ Messages in Kafka are variable-size byte arrays
  – Allows for serialization of data in any format your application requires
  – Common formats include strings, JSON, and Avro
§ There is no explicit limit on message size
  – Optimal performance usually occurs with messages of a few KB in size
  – We recommend that you do not exceed 1 MB per message
§ Kafka retains all messages for a defined time period
  – This period can be set on a global or per-topic basis
  – Messages will be retained regardless of whether they were read
  – They are discarded automatically after the retention period is exceeded
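Because a message is just a byte array, the application decides how to encode and decode it. The following is a minimal sketch, using only the Java standard library, of turning a text payload into bytes and back and checking it against the 1 MB guideline. The class and method names are hypothetical and are not part of any Kafka API.

```java
import java.nio.charset.StandardCharsets;

class MessageBytesSketch {
    // Encode a text payload as the byte array a producer would publish.
    static byte[] serialize(String payload) {
        return payload.getBytes(StandardCharsets.UTF_8);
    }

    // Decode the bytes back into text on the consuming side.
    static String deserialize(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Check a message against the recommended 1 MB upper bound.
    static boolean withinRecommendedSize(byte[] bytes) {
        return bytes.length <= 1024 * 1024;
    }

    public static void main(String[] args) {
        byte[] message = serialize("CART_ADD,alice,0872584");
        System.out.println(message.length + " bytes: " + deserialize(message));
    }
}
```

The same pattern applies to JSON or Avro: only the encoding step changes, while Kafka itself still sees opaque bytes.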

Topics
§ There is no explicit limit on the number of topics
  – Kafka works better with a few large topics than many small ones
§ A topic can be created explicitly or simply by publishing to the topic
  – Controlled by the auto.create.topics.enable property
  – We recommend that topics be created explicitly

Topic Partitioning
§ Each topic is divided into some number of partitions*
  – Partitioning improves scalability and throughput
§ A topic partition is an ordered and immutable sequence of messages
  – New messages are appended to the partition as they are received
  – Each message is assigned a unique sequential ID known as an offset
* Note that this is unrelated to partitioning in HDFS, MapReduce, or Spark
[Figure: Partitions 0, 1, and 2 of a topic, each a sequence of messages numbered by offset from older to newer over time; Producers A and B append new messages to the partitions.]
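The append-only behavior described above can be sketched in a few lines of plain Java. This is an in-memory illustration only, assuming a single partition held in a list; the class name and methods are hypothetical and do not reflect how a broker stores data on disk.

```java
import java.util.ArrayList;
import java.util.List;

class PartitionLogSketch {
    // Messages in arrival order; an entry's index doubles as its offset.
    private final List<String> messages = new ArrayList<>();

    // Append a message to the end of the log and return the offset assigned to it.
    long append(String message) {
        messages.add(message);
        return messages.size() - 1;
    }

    // Read the message at an offset; existing entries are never modified.
    String read(long offset) {
        return messages.get((int) offset);
    }

    // The offset that the next appended message will receive.
    long nextOffset() {
        return messages.size();
    }

    public static void main(String[] args) {
        PartitionLogSketch partition0 = new PartitionLogSketch();
        partition0.append("login_failure,alice");
        long offset = partition0.append("call_placed,bob");
        System.out.println("offset " + offset + " holds " + partition0.read(offset));
    }
}
```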

Replication
§ Each partition can be replicated across a configurable number of brokers*
  – Doing so is recommended, as it provides fault tolerance
§ Each broker acts as a leader for some partitions and a follower for others
  – Followers passively replicate the leader
  – If the leader fails, a follower will automatically become the new leader
* Note that this is unrelated to HDFS replication
[Figure: Brokers A, B, and C each hold a replica of Partitions 0, 1, and 2; a legend distinguishes leader replicas from follower replicas.]
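The leader and follower roles can be illustrated with a small simulation. This is a simplified sketch, assuming that the next surviving replica in a list is promoted when the leader fails; Kafka's actual leader election is coordinated by the cluster and only considers replicas that are in sync. All names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ReplicaSetSketch {
    // Brokers holding a live replica of one partition; index 0 is treated as the leader.
    private final List<String> liveReplicas;
    private final int replicationFactor;

    ReplicaSetSketch(List<String> brokers) {
        this.liveReplicas = new ArrayList<>(brokers);
        this.replicationFactor = brokers.size();
    }

    // The current leader, or null if every replica has been lost.
    String leader() {
        return liveReplicas.isEmpty() ? null : liveReplicas.get(0);
    }

    // Remove a failed broker; if it was the leader, the next replica takes over.
    void brokerFailed(String broker) {
        liveReplicas.remove(broker);
    }

    // With N replicas, up to N - 1 brokers can fail while one copy still survives.
    int toleratedFailures() {
        return replicationFactor - 1;
    }

    public static void main(String[] args) {
        ReplicaSetSketch partition0 =
            new ReplicaSetSketch(Arrays.asList("Broker A", "Broker B", "Broker C"));
        partition0.brokerFailed("Broker A");
        System.out.println("New leader: " + partition0.leader());
    }
}
```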

Starting the Kafka Broker
§ In production, you will likely start Kafka via Cloudera Manager
  – In this class, we must start it manually on the VM
§ Since Kafka depends on ZooKeeper, we must start that service first
    $ sudo service zookeeper-server start
§ We can then start the Kafka service
    $ sudo service kafka-server start

Creating Topics from the Command Line
§ Kafka includes a convenient set of command line tools
  – These are helpful for exploring and experimentation
§ The kafka-topics command offers a simple way to create Kafka topics
  – Provide the topic name of your choice, such as device_status
  – You must also specify the ZooKeeper connection string for your cluster
    $ kafka-topics --create \
        --zookeeper localhost:2181 \
        --replication-factor 1 \
        --partitions 1 \
        --topic device_status

Displaying Topics from the Command Line
§ Use the --list parameter to list all topics
    $ kafka-topics --list \
        --zookeeper localhost:2181

Producer Recap
§ Producers publish messages to Kafka topics
  – They communicate with a broker, not a consumer
[Figure: Producers #1, #2, and #3 send login_failure and call_placed messages to a Kafka cluster of six brokers.]

Selecting the Partition
§ A producer is responsible for selecting partitions for messages it publishes
  – This is primarily done to balance the load across all partitions
  – The producer writes messages to a partition in order
  – A pluggable Partitioner class selects the partition for each message
[Figure: Partitions 0, 1, and 2 of a topic, each a sequence of messages numbered by offset from older to newer over time; Producers A and B append new messages to the partitions.]
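One common way to pick a partition is to hash the message key, so that messages with the same key always land in the same partition. The sketch below illustrates the idea with String.hashCode as a stand-in hash function; it is not Kafka's Partitioner interface or its default implementation, and the names are hypothetical.

```java
class KeyPartitionerSketch {
    // Map a key to a partition number in the range [0, numPartitions).
    // Math.floorMod keeps the result non-negative even for negative hash codes.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        String[] keys = {"alice", "bob", "alice"};
        for (String key : keys) {
            System.out.println(key + " goes to partition " + partitionFor(key, 3));
        }
    }
}
```

Hashing by key spreads load across partitions while keeping all messages for one key in a single ordered sequence.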

Aside: Message Batches Increase Throughput and Latency
§ Producers can collect multiple messages to write to a partition
  – This reduces the number of requests made to brokers
  – Such requests sent to brokers contain one batch per partition
§ Batching is controlled through properties set for the producer
  – The default is to send messages immediately
  – Batch size is configurable, as is the max time to wait before sending
[Figure: Producers A and B each append a batch of several messages to the end of Partitions 0, 1, and 2 in a single step.]
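The effect of batching on request count can be sketched as follows. This is a simplified model that considers only a size threshold in bytes and ignores the maximum wait time; the class is hypothetical and is not the producer's internal implementation.

```java
import java.util.ArrayList;
import java.util.List;

class BatchAccumulatorSketch {
    private final int batchSizeBytes;
    private final List<byte[]> pending = new ArrayList<>();
    private int pendingBytes = 0;
    private int requestsSent = 0;

    BatchAccumulatorSketch(int batchSizeBytes) {
        this.batchSizeBytes = batchSizeBytes;
    }

    // Buffer a message; send the whole batch once the size threshold is reached.
    void add(byte[] message) {
        pending.add(message);
        pendingBytes += message.length;
        if (pendingBytes >= batchSizeBytes) {
            flush();
        }
    }

    // Send everything buffered so far as one request (simulated by a counter).
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        requestsSent++;
        pending.clear();
        pendingBytes = 0;
    }

    int requestsSent() {
        return requestsSent;
    }

    public static void main(String[] args) {
        BatchAccumulatorSketch batcher = new BatchAccumulatorSketch(16384);
        for (int i = 0; i < 100; i++) {
            batcher.add(new byte[1024]);
        }
        batcher.flush();
        System.out.println("100 messages sent in " + batcher.requestsSent() + " requests");
    }
}
```

Fewer requests mean higher throughput, but a message may wait in the buffer before it is sent, which is why batching also raises latency.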

Messages are Replicated
§ The producer is configured with a list of one or more brokers
  – It asks the first available broker for the leader of the desired partition
§ The producer then sends the message to the leader
  – The leader writes the message to its local log
  – Each follower then writes the message to its own log
  – After acknowledgements from followers, the message is committed
[Figure: Brokers A, B, and C each hold Partitions 0, 1, and 2; numbered steps 1 to 3 trace a producer's message for Partition 0 through its replicas.]

Creating a Producer from the Command Line (1)
§ You can create a producer using the kafka-console-producer tool
§ Specify one or more brokers in the --broker-list option
  – Each broker consists of a hostname, a colon, and a port number
  – If specifying multiple brokers, separate them with commas
  – In our case there is one broker: localhost:9092
§ You must also provide the name of the topic
  – We will publish messages to the topic named device_status
    $ kafka-console-producer \
        --broker-list localhost:9092 \
        --topic device_status
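The broker list format (comma-separated host:port entries) is simple enough to validate before passing it to a tool or client. The following is an illustrative parser using only the Java standard library; the class name and the second broker in the example are hypothetical.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

class BrokerListSketch {
    // Split a comma-separated broker list into unresolved host/port pairs.
    static List<InetSocketAddress> parse(String brokerList) {
        List<InetSocketAddress> brokers = new ArrayList<>();
        for (String entry : brokerList.split(",")) {
            String trimmed = entry.trim();
            int colon = trimmed.lastIndexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("Expected host:port but got: " + trimmed);
            }
            String host = trimmed.substring(0, colon);
            int port = Integer.parseInt(trimmed.substring(colon + 1));
            // createUnresolved avoids a DNS lookup; this sketch only checks the format.
            brokers.add(InetSocketAddress.createUnresolved(host, port));
        }
        return brokers;
    }

    public static void main(String[] args) {
        for (InetSocketAddress broker : parse("localhost:9092,broker2.example.com:9092")) {
            System.out.println(broker.getHostString() + " on port " + broker.getPort());
        }
    }
}
```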

Creating a Producer from the Command Line (2)
§ You may see a few log messages in the terminal after the producer starts
§ It will then accept input in the terminal window
  – Each line you type will be a message sent to the topic
§ Until you have configured a consumer for this topic, you'll see no other output from Kafka

Consumer Recap
§ Consumers read messages that were published to Kafka topics
  – They communicate with a broker, not a producer
§ Consumer actions do not affect other consumers
  – For example, using the Kafka command line tool to "tail" the contents of a topic does not change what is consumed by other consumers
§ They can come and go without impact on the cluster or other consumers
[Figure: Consumers #1 and #2 read login_failure and call_placed messages from a Kafka cluster of six brokers.]

Creating a Consumer from the Command Line
§ You can create a consumer with the kafka-console-consumer tool
§ This requires the ZooKeeper connection string for your cluster
  – Unlike creating a producer, which instead required a list of brokers
§ The command also requires a topic name
  – In our case, we will use device_status
§ You can use --from-beginning to read all available messages
  – Otherwise, it would read only new messages
    $ kafka-console-consumer \
        --zookeeper localhost:2181 \
        --topic device_status \
        --from-beginning

Writing File Contents to Topics via the Command Line
§ Using UNIX pipes or redirection, you can read input from files
  – The data can then be sent to a topic using the command line producer
§ This example shows how to read input from a file named alerts.txt
  – Each line in this file becomes a separate message in our topic
§ This technique can be an easy way to integrate with existing programs
    $ cat alerts.txt | kafka-console-producer \
        --broker-list localhost:9092 \
        --topic device_status

How does Kafka differ from traditional message models?
§ Messaging has two traditional models
  – Queuing
  – Publish-subscribe
§ With queuing, a pool of consumers may read from a server and each message goes to one of them
§ In publish-subscribe, the message is broadcast to all consumers
§ A Kafka consumer group is a consumer abstraction that generalizes both of these models

Kafka Consumer Group Operation
§ Each message published to a topic is delivered to one consumer instance within each subscribing consumer group
§ Consumer instances can be in separate processes or on separate machines
§ The diagram below depicts a Kafka cluster with two brokers (servers)
  – The brokers are hosting four partitions, P0-P3
  – Consumer group A has two consumer instances and group B has four
[Figure: A Kafka cluster with Broker 1 and Broker 2 hosting partitions P0 to P3; Consumer Group A contains C1 and C2, and Consumer Group B contains C3, C4, C5, and C6.]

Kafka Consumer Group Configurations
§ Kafka functions like a traditional queue when
  – All consumer instances belong to the same consumer group
  – In this case, a given message is received by one consumer
§ Kafka functions like traditional publish-subscribe when
  – Each consumer instance belongs to a different consumer group
  – In this case, all messages are broadcast to all consumers
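The two configurations above follow from a single rule: a message goes to exactly one instance in each subscribing group. The sketch below models that rule in plain Java. Choosing the instance by partition number modulo group size is an assumption made for illustration, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ConsumerGroupSketch {
    // Return the consumer instances that receive a message from the given partition:
    // exactly one instance from every group.
    static List<String> receivers(Map<String, List<String>> groups, int partition) {
        List<String> result = new ArrayList<>();
        for (List<String> members : groups.values()) {
            result.add(members.get(partition % members.size()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Queue-like: every instance is in the same group, so one instance gets the message.
        Map<String, List<String>> queue = new LinkedHashMap<>();
        queue.put("A", Arrays.asList("C1", "C2"));
        System.out.println("Queue: " + receivers(queue, 3));

        // Publish-subscribe: every instance has its own group, so all of them get it.
        Map<String, List<String>> pubSub = new LinkedHashMap<>();
        pubSub.put("G1", Arrays.asList("C1"));
        pubSub.put("G2", Arrays.asList("C2"));
        System.out.println("Publish-subscribe: " + receivers(pubSub, 3));
    }
}
```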

Using "Logical Subscribers"
§ In between the two extremes of queuing or publish-subscribe lies a balanced solution
  – A topic can have one consumer group for each "logical subscriber"
§ In this approach, each consumer group is composed of many consumer instances
  – This provides scalability and fault tolerance
  – Amounts to publish-subscribe semantics where the subscriber is a cluster of consumers instead of a single process

Traditional Message Ordering
§ A traditional queue retains messages in order on the server
  – The server hands out messages to consumers in the order they are stored
§ In some message systems, messages delivered to consumers asynchronously may arrive out of order at different consumers
  – Message order is effectively lost in the presence of parallel consumption
§ The workaround is to allow only one process to consume from a queue
  – This is the "exclusive consumer" approach
  – There is no parallelism

Kafka Ordering
§ Partitions within Kafka topics make it possible to provide a consumer group with
  – Message ordering guarantees
  – Load balancing
§ Partitions are assigned to consumers in a consumer group
  – Each partition is consumed by exactly one consumer in the group
  – The consumer of a partition is the only reader of that partition and consumes the data in order
§ The number of active consumers in a group cannot exceed the number of partitions
  – Any additional consumer instances in the group receive no partition and sit idle
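The relationship between partitions and the consumers in one group can be sketched as a simple assignment function. Round-robin assignment is used here as an assumption for illustration; it is not a description of Kafka's actual rebalancing algorithm, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PartitionAssignmentSketch {
    // Give each partition to exactly one consumer in the group, round-robin.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<Integer>());
        }
        if (consumers.isEmpty()) {
            return assignment;
        }
        for (int partition = 0; partition < numPartitions; partition++) {
            String owner = consumers.get(partition % consumers.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Three consumers but only two partitions: the third consumer gets nothing.
        System.out.println(assign(Arrays.asList("C1", "C2", "C3"), 2));
    }
}
```

Because every partition has a single owner within the group, each consumer sees its partitions' messages in order, while the group as a whole processes partitions in parallel.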

Kafka Ordering Tip
§ Kafka only provides a total order over messages within a partition, not between different partitions in a topic
§ Per-partition ordering combined with the ability to partition data by key is sufficient for most applications
§ Some applications require total ordering for a given topic
  – Accomplish this by creating just one partition for the topic
  – Note that this means only one consumer process per consumer group can read the topic

Kafka Guarantees
§ Messages sent by a producer to a particular topic partition will be appended in the order they are sent
  – For example, if message M1 is sent by the same producer as message M2, and M1 is sent first, then
  – M1 will have a lower offset than M2
  – M1 will appear earlier in the log than M2
§ A consumer sees messages in the order in which they are stored in the log
§ For a topic with replication factor N, up to N-1 server failures can occur without losing any messages committed to the log

Kafka Java API: Producer
§ Kafka's Java API allows you to easily create producers and consumers
  – Your code can send messages to a topic using a producer
  – Your code can also read messages sent to a topic using a consumer
§ The next three slides show sample code for a simple producer that sends a message to a topic

Simple Producer (1): Import Statements and Class Declaration
    package com.loudacre.example;

    import java.util.Properties;
    import java.util.concurrent.Future;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.clients.producer.RecordMetadata;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class ProducerExample {
      public static void main(String[] args) {
Note: file continues on next slide

Simple Producer (2): Producer Properties Configuration
        Properties props = new Properties();

        // This is a comma-delimited list of brokers to contact
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        // This specifies that the write will only be committed
        // after all in-sync replicas have acknowledged it
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        // # of bytes to collect in message batch before sending
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);

        // Specifies classes used for message serialization
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
            StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
            StringSerializer.class.getName());
Note: file continues on next slide

Simple Producer (3): Message Creation and Publication
        // Create a Producer using our configuration properties
        Producer<String, String> producer =
            new KafkaProducer<String, String>(props);

        // Specify the topic and value for the message
        String topic = "app_events";
        String value = "CART_ADD,alice,0872584";

        // Create and send the message
        ProducerRecord<String, String> message =
            new ProducerRecord<String, String>(topic, value);
        producer.send(message);

        // Close the producer once we no longer need it
        producer.close();
      }
    }

Kafka Java API: Consumer
§ The next few slides provide sample code for a simple consumer
  – This consumer reads messages posted to the selected topic

High-Level Consumer (1): Imports and Class Declaration
    package com.loudacre.example;

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.ConsumerIterator;
    import kafka.consumer.KafkaStream;
    import kafka.javaapi.consumer.ConsumerConnector;
    import kafka.serializer.Decoder;
    import kafka.serializer.StringDecoder;

    public class ConsumerExample {
      public static void main(String[] args) {
Note: file continues on next slide

High-Level Consumer (2): Property Configuration
        // Define required properties and configure the consumer
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181");
        props.put("group.id", "example");
        ConsumerConfig cfg = new ConsumerConfig(props);
        ConsumerConnector consumer =
            Consumer.createJavaConsumerConnector(cfg);

        // Prepare to subscribe to app_events with one thread
        String topic = "app_events";
        Map<String, Integer> tpx = new HashMap<String, Integer>();
        tpx.put(topic, Integer.valueOf(1));

        // Set up the message decoder and subscribe to the topic
        Decoder<String> dec = new StringDecoder(null);
        Map<String, List<KafkaStream<String, String>>> sm =
            consumer.createMessageStreams(tpx, dec, dec);
Note: file continues on next slide

High-Level Consumer (3): Message Processing
        // Get our topic's stream and iterate over its messages
        for (KafkaStream<String, String> str : sm.get(topic)) {
          ConsumerIterator<String, String> i = str.iterator();

          // Process each incoming message
          while (i.hasNext()) {
            String message = i.next().message();
            System.out.println("Message was: " + message);
          }
        }
      }
    }

Essential Points
§ Nodes running the Kafka service are called brokers
§ Producers publish messages to categories called topics
§ Messages in a topic are read by consumers
  – Multiple consumer instances can belong to a consumer group
  – Kafka retains messages for a defined (but configurable) amount of time
  – Consumers maintain an offset to track which messages they have read
§ Topics are divided into partitions for performance and scalability
  – These partitions are replicated for fault tolerance

Bibliography
The following offer more information on topics discussed in this chapter
§ The Apache Kafka Web site
  – http://kafka.apache.org/
§ Real-Time Fraud Detection Architecture
  – http://tiny.cloudera.com/kmc01a
§ Kafka Reference Architecture
  – http://tiny.cloudera.com/kmc01b
§ The Log: What Every Software Engineer Should Know…
  – http://tiny.cloudera.com/kmc01c

Hands-On Exercise: Using Kafka from the Command Line
§ In this exercise, you will use Kafka's command line utilities to create a new topic, publish messages to the topic with a producer, and read messages from the topic with a consumer
  – Please refer to the Hands-On Exercise Manual for instructions

Integrating Flume and Kafka
In this chapter you will learn
§ What to consider when choosing between Flume and Kafka for a use case
§ How Flume and Kafka can work together
§ How to configure a Flume source that reads from a Kafka topic
§ How to configure a Flume sink that publishes to a Kafka topic

Chapter Topics
Integrating Flume and Kafka
§ Overview
§ Use Cases
§ Configuration
§ Tips for Deployment
§ Essential Points
§ Hands-On Exercise: Using Kafka as a Flume Sink
§ Hands-On Exercise: Using Kafka as a Flume Source

Should I Use Kafka or Flume?
§ Both Flume and Kafka are widely used for data ingest
  – Although these tools differ, their functionality has some overlap
  – Some use cases could be implemented with either Flume or Kafka
§ How do you determine which is a better choice for your use case?

Characteristics of Flume
§ Flume is efficient at moving data from a single source into Hadoop
  – It offers sinks that write to HDFS, an HBase table, or a Solr index
  – Easily configured to support common scenarios, without writing code
  – Can also process and transform data during the ingest process

Characteristics of Kafka
§ Kafka is a publish-subscribe messaging system
  – It offers more flexibility for connecting multiple systems
  – Provides better durability and fault tolerance than Flume
  – Typically requires writing code for producers and/or consumers
  – No direct support for processing messages or loading into Hadoop

Flafka = Flume + Kafka
§ Both systems have strengths and limitations
§ You don't necessarily have to choose between them
  – It is possible to use both when implementing your use case
§ Flafka is the informal name for Flume-Kafka integration
  – It uses a Flume agent to read messages from or write messages to Kafka
§ It is implemented as a Kafka source and sink for Flume
  – These components ship with Flume, starting with CDH 5.2.0
  – A Kafka channel also now ships with Flume, starting with CDH 5.3.0

Using Flume as a Kafka Producer
§ By using the Kafka sink, Flume can publish messages to a topic
§ In this example, an application uses Flume to publish application events
  – The application sends data to the Flume source when events occur
  – The event data is buffered in the channel until it is taken by the sink
  – Since we use a Kafka sink, the events are published to a specified topic
  – Any Kafka consumer can then read messages for application events
[Figure: Application → Flume Agent (netcat Source → Memory Channel → Kafka Sink) → Kafka Cluster of three brokers → Consumer]

Using Flume as a Kafka Consumer
§ By using the Kafka source, Flume can read messages from a topic
  – It can then write them to your destination of choice using a Flume sink
§ In this example, the Producer sends messages to Kafka brokers
  – The Flume agent uses a Kafka source, which acts as a consumer
  – The Kafka source reads messages in a specified topic
  – The message data is buffered in the channel until it is taken by the sink
  – The sink then writes the data into HDFS
[Figure: Producer → Kafka Cluster of three brokers → Flume Agent (Kafka Source → Memory Channel → HDFS Sink) → Hadoop Cluster]

Configuration: Using Flume as a Kafka Producer (1)
§ The table below describes several properties of the Kafka sink
Name        Description
type        Must be set to org.apache.flume.sink.kafka.KafkaSink
brokerList  Comma-separated list of brokers (format host:port) to contact
topic       The topic in Kafka to which the messages will be published
batchSize   How many messages to process in one batch
[Figure: Application → Flume Agent (netcat Source → Memory Channel → Kafka Sink) → Kafka Cluster of three brokers → Consumer]

Configuration: Using Flume as a Kafka Producer (2)
§ This is the Flume configuration for the example on the previous slide
    # Define names for the source, channel, and sink
    agent1.sources = source1
    agent1.channels = channel1
    agent1.sinks = sink1

    # Define the properties of our source, which receives event data
    agent1.sources.source1.type = netcat
    agent1.sources.source1.bind = localhost
    agent1.sources.source1.port = 44444
    agent1.sources.source1.channels = channel1

    # Define the properties of our channel
    agent1.channels.channel1.type = memory
    agent1.channels.channel1.capacity = 10000
    agent1.channels.channel1.transactionCapacity = 1000
Note: file continues on next slide

Configuration: Using Flume as a Kafka Producer (3)
§ The remaining portion of the configuration file sets up the Kafka sink
    # Define our Kafka sink, which publishes to the app_events topic
    agent1.sinks.sink1.type = org.apache.flume.sink.kafka.KafkaSink
    agent1.sinks.sink1.topic = app_events
    agent1.sinks.sink1.brokerList = localhost:9092
    agent1.sinks.sink1.batchSize = 20
    agent1.sinks.sink1.channel = channel1

Configuration: Using Flume as a Kafka Consumer (1)
§ The table below describes several properties of the Kafka source
Name              Description
type              org.apache.flume.source.kafka.KafkaSource
zookeeperConnect  ZooKeeper connection string (e.g., localhost:2181)
groupId           Unique ID to use for the consumer group (default: flume)
topic             Name of Kafka topic from which messages will be read
[Figure: Producer → Kafka Cluster of three brokers → Flume Agent (Kafka Source → Memory Channel → HDFS Sink) → Hadoop Cluster]

Configuration: Using Flume as a Kafka Consumer (2)
§ This is the Flume configuration for the example on the previous slide
  – It defines a source for reading messages from a Kafka topic
    # Define names for the source, channel, and sink
    agent1.sources = source1
    agent1.channels = channel1
    agent1.sinks = sink1

    # Define a Kafka source that reads from the calls_placed topic
    agent1.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
    agent1.sources.source1.zookeeperConnect = localhost:2181
    agent1.sources.source1.topic = calls_placed
    agent1.sources.source1.groupId = flume
    agent1.sources.source1.channels = channel1
Note: file continues on next slide
©Copyright2010-2016Cloudera.Allrightsreserved.NottobereproducedorsharedwithoutpriorwriEenconsentfromCloudera. 02-17
Configuration: Using Flume as a Kafka Consumer (3)
§ This is the remainder of the Flume configuration for the example
  – It defines the memory channel and the sink that writes to HDFS
# Define the properties of our channel
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 10000
agent1.channels.channel1.transactionCapacity = 1000
# Define the sink that writes call data to HDFS
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /user/training/calls_placed
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.sinks.sink1.hdfs.fileSuffix = .csv
agent1.sinks.sink1.channel = channel1
Chapter Topics
Integrating Flume and Kafka
§ Overview
§ Use Cases
§ Configuration
§ Tips for Deployment
§ Essential Points
§ Hands-On Exercise: Using Kafka as a Flume Sink
§ Hands-On Exercise: Using Kafka as a Flume Source
§ Kafka has a significantly smaller producer and consumer ecosystem than Flume
  – Use Kafka if you're prepared to implement producers and consumers
§ Use Flume if its sources and sinks match your requirements
  – Flume has many built-in sources and sinks from which to choose
  – Using them requires only configuration, not writing code
Use Kafka for Custom Producers and Consumers
§ Flume can process data in-flight using interceptors
  – These can be very useful for filtering or transforming data
§ Kafka requires an external stream processing system
  – Spark Streaming is a popular choice
Use Flume for Filtering and Transforming Data
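As a rough illustration of in-flight processing, the fragment below attaches two of Flume's built-in interceptors to a source. It is a minimal sketch, not a complete agent: the names agent1 and source1 follow the earlier configuration slides, while the interceptor names i1 and i2 and the DEBUG pattern are assumptions chosen for the example.

```properties
# Attach two interceptors to source1; they run in the order listed
# (i1 and i2 are hypothetical names chosen for this sketch)
agent1.sources.source1.interceptors = i1 i2

# Filtering: drop any event whose body matches the regex
# (the DEBUG pattern is an assumed example)
agent1.sources.source1.interceptors.i1.type = regex_filter
agent1.sources.source1.interceptors.i1.regex = DEBUG
agent1.sources.source1.interceptors.i1.excludeEvents = true

# Transforming: add a timestamp header to each remaining event
agent1.sources.source1.interceptors.i2.type = timestamp
```

Events that survive the filter continue to the channel with the extra header, so no custom code is needed for this kind of simple filtering.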
§ Both Kafka and Flume are reliable systems that can guarantee no data loss
§ However, Flume does not replicate events
  – As a result, if a node with the Flume agent crashes, you will lose access to the events in the channel until you recover the disks
  – This is true even when using the file channel
§ Use Kafka if you need an ingest pipeline with very high availability
Use Kafka for High Availability
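For reference, a file channel might be configured along the lines below. This is only a sketch: the two directory paths are hypothetical. It shows why the point above holds: the channel persists events to the local disks of a single node, so they survive an agent restart but are not copied to any other machine.

```properties
# A durable channel that stores events on local disk
agent1.channels.channel1.type = file
# Hypothetical local paths; both live on the agent's own node
agent1.channels.channel1.checkpointDir = /var/lib/flume/checkpoint
agent1.channels.channel1.dataDirs = /var/lib/flume/data
```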
§ You can configure a Flume agent to use multiple channels
  – Each channel sends data to an associated sink
§ This can be used to write data to HDFS and Kafka simultaneously
A Flume Agent Can Write to Multiple Sinks
[Diagram: an application sends data to a Flume agent whose source feeds two channels; one channel feeds an HDFS sink (write data to HDFS) and the other feeds a Kafka sink (publish to Kafka topic on a cluster of three brokers)]
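The fan-out arrangement described above could be sketched in a Flume configuration as follows. This is an illustration under stated assumptions rather than the course's own file: the netcat source, its port, the channel2 and sink2 names, and the HDFS path are all hypothetical, while the Kafka sink properties mirror those shown on the earlier producer slides.

```properties
# One source, two channels, two sinks
agent1.sources = source1
agent1.channels = channel1 channel2
agent1.sinks = sink1 sink2

# Hypothetical source for this sketch; any Flume source would do
agent1.sources.source1.type = netcat
agent1.sources.source1.bind = localhost
agent1.sources.source1.port = 44444
# The replicating selector (Flume's default) copies each event to every channel
agent1.sources.source1.selector.type = replicating
agent1.sources.source1.channels = channel1 channel2

agent1.channels.channel1.type = memory
agent1.channels.channel2.type = memory

# The first channel feeds a Kafka sink
agent1.sinks.sink1.type = org.apache.flume.sink.kafka.KafkaSink
agent1.sinks.sink1.topic = app_events
agent1.sinks.sink1.brokerList = localhost:9092
agent1.sinks.sink1.channel = channel1

# The second channel feeds an HDFS sink (hypothetical path)
agent1.sinks.sink2.type = hdfs
agent1.sinks.sink2.hdfs.path = /user/training/app_events
agent1.sinks.sink2.hdfs.fileType = DataStream
agent1.sinks.sink2.channel = channel2
```

Note that a source lists its channels with the plural `channels` property, whereas each sink reads from exactly one channel through the singular `channel` property.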
§ Flume and Kafka are distinct systems with different designs
  – You must weigh the advantages and disadvantages of each when selecting the best tool for your use case
§ Flume and Kafka can be combined with Flafka
  – This is the informal name for Flume components for Kafka integration
  – You can read messages from a topic using a Kafka source
  – You can publish messages to a topic using a Kafka sink
Essential Points
The following offer more information on topics discussed in this chapter
§ Flafka: Apache Flume Meets Apache Kafka for Event Processing
  – http://tiny.cloudera.com/kmc02a
§ Designing Fraud-Detection Architecture That Works Like Your Brain Does
  – http://tiny.cloudera.com/kmc02b
Bibliography
§ In this exercise, you will use Flume's Kafka sink to write data that was received by a Flume agent to a Kafka topic
  – Please refer to the Hands-On Exercise Manual for instructions
Hands-On Exercise: Using Kafka as a Flume Sink (Flafka)
§ In this exercise, you will use Flume's Kafka source to read data published to a Kafka topic and write it to a directory in HDFS
  – Please refer to the Hands-On Exercise Manual for instructions
Hands-On Exercise: Using Kafka as a Flume Source (Flafka)