Post on 22-Jan-2018
transcript
1 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Real-Time Ingesting and Transforming Sensor Data and Social Data with NiFi and TensorFlow
Timothy Spann
Hortonworks
@PaaSDev
Agenda
• What do we want to do?
• Why?
• How?
• Apache NiFi
• TensorFlow
• Natural Language Processing
• Demo
• Questions
What do we want to do?
• MiNiFi ingests camera images and sensor data
• Run TensorFlow Inception v3 to recognize objects in images
• NiFi stores images, metadata and enriched data in Hadoop
• NiFi ingests social data and feeds
• NiFi analyzes sentiment of textual data
Why Gather and Analyze Social Media Streams?
- Automate processes to maximize the social media team’s time
- Improve response time to requests, complaints and emergencies in social media
- Use predictive analytics to know when and where problems will happen
- Learn where unhappy customers are and address them instantly
Collect: Bring Together
Aggregate all data from sensors, geo-location devices, machines and social feeds.
Conduct: Mediate the Data Flow
Mediate point-to-point and bi-directional data flows, delivering data reliably to HBase, Hive, Slack and Email.
Curate: Gain Insights
Parse, filter, join, transform, fork, query, sort, dissect; enrich with weather, location, NLP and TensorFlow.
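The filter, enrich and sort verbs from the Curate step can be sketched in plain Python. This is only an illustration of the idea, not NiFi code; the field names (device, temp_c, city) and the lookup table are hypothetical.

```python
def curate(records, min_temp_c, enrichments):
    """Filter, enrich and sort records, loosely mirroring the Curate step."""
    # Filter: keep only readings at or above the threshold.
    kept = [r for r in records if r["temp_c"] >= min_temp_c]
    # Enrich: merge in extra attributes looked up by device id (hypothetical).
    enriched = [dict(r, **enrichments.get(r["device"], {})) for r in kept]
    # Sort: hottest reading first.
    return sorted(enriched, key=lambda r: r["temp_c"], reverse=True)


readings = [
    {"device": "rpi-1", "temp_c": 21.5},
    {"device": "rpi-2", "temp_c": 35.0},
    {"device": "rpi-3", "temp_c": 28.0},
]
locations = {"rpi-2": {"city": "Princeton"}}

for row in curate(readings, 25.0, locations):
    print(row)
```

In a real flow each of these steps would be a separate processor with its own queue, so they can be tuned and monitored independently.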
Why Apache NiFi?
• Guaranteed delivery
• Data buffering
  - Backpressure
  - Pressure release
• Prioritized queuing
• Flow specific QoS
  - Latency vs. throughput
  - Loss tolerance
• Data provenance
• Supports push and pull models
• Hundreds of processors
• Visual command and control
• Over fifty sources
• Flow templates
• Pluggable/multi-role security
• Designed for extension
• Clustering
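To make the backpressure bullet concrete, here is a toy model of a bounded queue between two steps. It is a sketch of the concept only and does not use any NiFi API; the class and method names are made up for illustration.

```python
import queue


class BoundedConnection:
    """Toy stand-in for a queue between two processing steps.

    This is not NiFi code; it only mimics the idea that a full queue tells
    the upstream step to stop producing until the downstream step catches up.
    """

    def __init__(self, max_items):
        self._items = queue.Queue(maxsize=max_items)

    def offer(self, item):
        """Try to enqueue; return False when backpressure is applied."""
        try:
            self._items.put_nowait(item)
            return True
        except queue.Full:
            return False

    def poll(self):
        """Dequeue the oldest item, or return None when empty."""
        try:
            return self._items.get_nowait()
        except queue.Empty:
            return None


conn = BoundedConnection(max_items=2)
accepted = [conn.offer(n) for n in range(3)]  # the third offer is refused
print(accepted)
conn.poll()            # the downstream step consumes one item
print(conn.offer(99))  # there is room again, so this offer succeeds
```

The point of the design is that the producer learns about congestion immediately instead of filling memory or silently dropping data.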
DATA ENRICHMENT
DATA DISCOVERY
Inception v3
PREDICTIVE ANALYTICS
Sentiment Analysis
Why TensorFlow? Also Apache MXNet, PyTorch and DL4J.
• Google
• Multiple platform support
• Hadoop integration
• Spark integration
• Keras
• Large Community
• Python and Java APIs
• GPU Support
• Mobile Support
• Inception v3
• Clustering
• Fully functional demos
• Open Source
• Apache Licensed
• Large Model Library
• Buzz
• Extensive Documentation
• Raspberry Pi Support
Apache NiFi Integration with TensorFlow Options
• TensorFlow (C++, Python, Java) via ExecuteStreamCommand
• TensorFlow NiFi Java Custom Processor
• TensorFlow Running on Edge Nodes (MiNiFi)
Apache NiFi Integration with TensorFlow Options
• TensorFlow Mobile (iOS, Android, RPi)
• TensorFlow on Spark (Yahoo) via Livy, S2S, Kafka
• TensorFlow Running in Containers in YARN 3.0 on Hadoop
• gRPC Call to TensorFlow Serving
ExecuteStreamCommand To TensorFlow
https://community.hortonworks.com/articles/58265/analyzing-images-in-hdf-20-using-tensorflow.html
TensorFlow via Python
python classify_image.py --image_file /dir/solarroofpanel.jpg
solar dish, solar collector, solar furnace (score = 0.98316)
window screen (score = 0.00196)
manhole cover (score = 0.00070)
radiator (score = 0.00041)
doormat, welcome mat (score = 0.00041)
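When the script is called from a flow, the text output is easier to route and store once it is structured. The following is a minimal sketch, assuming the "label (score = n)" line format shown on this slide; the function name and the JSON layout are illustrative choices, not part of classify_image.py.

```python
import json
import re

# Matches lines such as "window screen (score = 0.00196)"; whitespace around
# "=" is optional so that compacted output still parses.
SCORE_LINE = re.compile(r"^(?P<label>.+?)\s*\(score\s*=\s*(?P<score>[0-9.]+)\)\s*$")


def parse_predictions(output):
    """Turn classify_image.py text output into a list of label/score dicts."""
    predictions = []
    for line in output.splitlines():
        match = SCORE_LINE.match(line.strip())
        if match:  # skip any lines that are not predictions
            predictions.append(
                {"label": match.group("label"), "score": float(match.group("score"))}
            )
    return predictions


sample = """solar dish, solar collector, solar furnace (score = 0.98316)
window screen (score = 0.00196)
manhole cover (score = 0.00070)"""

# One JSON document per image is easy to route and store downstream.
print(json.dumps(parse_predictions(sample)))
```

Lines that do not match the pattern are ignored, so warnings printed by the script would not break the parse.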
TensorFlow Java Processor in NiFi
https://community.hortonworks.com/content/kbentry/116803/building-a-custom-processor-in-apache-nifi-12-for.html
https://github.com/tspannhw/nifi-tensorflow-processor
Installing TextBlob for Python
pip install -U textblob
python -m textblob.download_corpora
Installing spaCy for Python
pip install -U spacy
python -m spacy.en.download all
https://community.hortonworks.com/articles/76935/using-sentiment-analysis-and-nlp-tools-with-hdp-25.html
Installing NLTK for Python 2.7
http://www.nltk.org/install.html
pip install -U nltk
pip install -U numpy
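After running the pip commands, a quick check that the packages can be found saves debugging inside a flow later. This is a small sketch using only the standard library; it checks that the packages are installed, not that their corpora or models have been downloaded.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level package names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Package names match the pip installs on this slide.
required = ["textblob", "spacy", "nltk", "numpy"]
missing = missing_packages(required)
if missing:
    print("Missing packages: " + ", ".join(missing))
else:
    print("All NLP packages are importable")
```

Run it with the same Python interpreter that the flow will call, since each interpreter has its own set of installed packages.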
Local Sentiment Analysis via Python
run.sh
python sentiment.py "$@"
sentiment.py
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import sys
# Score the text passed as the first command-line argument
sid = SentimentIntensityAnalyzer()
ss = sid.polarity_scores(sys.argv[1])
print('Compound {0} Negative {1} Neutral {2} Positive {3}'.format(
    ss['compound'], ss['neg'], ss['neu'], ss['pos']))
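Downstream routing is often simpler with a single label than with four numbers. The sketch below maps a VADER score dict to a coarse label and emits JSON. The cut-off of 0.05 on the compound score is a commonly used convention and an assumption here, not something from the slides, and the example scores are made up.

```python
import json


def label_sentiment(scores, threshold=0.05):
    """Map a VADER polarity_scores() dict to a coarse label.

    The +/-0.05 cut-off on the compound score is a commonly used convention,
    not something taken from the slides; tune it for your data.
    """
    compound = scores["compound"]
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def to_json(text, scores):
    """Bundle the text, raw scores and label into one JSON string."""
    record = {"text": text, "sentiment": label_sentiment(scores)}
    record.update(scores)
    return json.dumps(record, sort_keys=True)


# Example dict shaped like the output of sid.polarity_scores(...); the numbers
# are made up for illustration.
example = {"compound": 0.6, "neg": 0.0, "neu": 0.4, "pos": 0.6}
print(to_json("NiFi makes this easy", example))
```

Keeping the raw scores alongside the label lets a later step apply a different threshold without rescoring the text.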
Installing Apache OpenNLP NiFi Processor
Apache OpenNLP for Entity Resolution Processor
https://github.com/tspannhw/nifi-nlp-processor
Requires installation of the NAR and Apache OpenNLP BINs.
This is a non-supported processor that I wrote and put into the community.
https://community.hortonworks.com/articles/80418/open-nlp-example-apache-nifi-processor.html
Installing Stanford CoreNLP Processor
Stanford CoreNLP Processor
https://github.com/tspannhw/nifi-corenlp-processor
Requires install of the NAR and the Stanford English Models
http://nlp.stanford.edu/software/stanford-english-corenlp-2017-06-09-models.jar
This is a non-supported processor that I wrote and put into the community.
https://community.hortonworks.com/articles/81270/adding-stanford-corenlp-to-big-data-pipelines-apac-1.html
Contact:
Timothy Spann
@PaaSDev
http://www.meetup.com/futureofdata-princeton
https://dzone.com/users/297029/bunkertor.html
https://github.com/tspannhw/dws2017sydney/blob/master/README.md
http://community.hortonworks.com/users/9304/tspann.html
• https://community.hortonworks.com/content/kbentry/116803/building-a-custom-processor-in-apache-nifi-12-for.html
• https://community.hortonworks.com/articles/118132/minifi-capturing-converting-tensorflow-inception-t.html
• https://community.hortonworks.com/articles/73833/an-example-websocket-application-in-apache-nifi-11.html
• https://community.hortonworks.com/articles/81694/extracttext-nifi-custom-processor-powered-by-apach.html
• https://community.hortonworks.com/articles/79842/ingesting-osquery-into-apache-phoenix-using-apache.html
• https://community.hortonworks.com/articles/67980/using-command-line-security-tools-from-apache-nifi.html
• https://community.hortonworks.com/articles/52415/processing-social-media-feeds-in-stream-with-apach.html
• https://community.hortonworks.com/articles/121916/controlling-big-data-flows-with-gestures-minifi-ni.html
• https://community.hortonworks.com/articles/86570/hosting-and-ingesting-data-from-web-pages-desktop.html
• https://community.hortonworks.com/articles/63228/monitoring-your-containers-with-sysdig-from-hdf-20.html
• https://community.hortonworks.com/articles/101679/iot-ingesting-gps-data-from-raspberry-pi-zero-wire.html
• https://community.hortonworks.com/articles/101904/part-2-iot-augmenting-gps-data-with-weather.html
• https://community.hortonworks.com/articles/110475/ingesting-sensor-data-from-raspberry-pis-running-r.html
• https://community.hortonworks.com/articles/76240/using-opennlp-for-identifying-names-from-text.html
• https://community.hortonworks.com/articles/76935/using-sentiment-analysis-and-nlp-tools-with-hdp-25.html
• https://community.hortonworks.com/articles/76924/data-processing-pipeline-parsing-pdfs-and-identify.html
• https://community.hortonworks.com/articles/80339/iot-capturing-photos-and-analyzing-the-image-with.html
• https://community.hortonworks.com/articles/122077/ingesting-csv-data-and-pushing-it-as-avro-to-kafka.html
• https://github.com/tspannhw/nifi-tensorflow-processor
• https://community.hortonworks.com/articles/118148/creating-wordclouds-from-dataflows-with-apache-nif.html
• https://community.hortonworks.com/articles/110469/simple-backup-and-restore-of-hdfs-data-via-hdf-30.html
• https://github.com/tspannhw/rpi-rainbowhat
• https://community.hortonworks.com/articles/108718/ingesting-rdbms-data-as-new-tables-arrive-automagi.html
• https://community.hortonworks.com/articles/108947/minifi-for-ble-bluetooth-low-energy-beacon-data-in.html
• https://community.hortonworks.com/articles/108966/minifi-for-sensor-data-ingest-from-devices.html
• https://github.com/tspannhw/rpi-sensehat-minifi-python
• https://community.hortonworks.com/articles/107379/minifi-for-image-capture-and-ingestion-from-raspbe.html
• https://community.hortonworks.com/articles/104255/ingesting-and-testing-jms-data-with-nifi.html
• https://community.hortonworks.com/articles/103863/using-an-asus-tinkerboard-with-tensorflow-and-pyth.html
• https://community.hortonworks.com/articles/104226/simple-backups-of-hadoop-with-apache-nifi-12.html
• https://community.hortonworks.com/articles/99861/ingesting-ibeacon-data-via-ble-to-mqtt-wifi-gatewa.html
• https://community.hortonworks.com/articles/92345/store-a-flow-to-disk-and-then-reserialize-it-to-co.html
• https://community.hortonworks.com/articles/92495/monitor-apache-nifi-with-apache-nifi.html
• https://community.hortonworks.com/articles/92496/qadcdc-our-how-to-ingest-some-database-tables-to-h.html
• https://community.hortonworks.com/articles/89455/ingesting-gps-data-from-onion-omega2-devices-with.html
• https://community.hortonworks.com/articles/87397/steganography-with-apache-nifi-1.html
• https://community.hortonworks.com/articles/87632/ingesting-sql-server-tables-into-hive-via-apache-n.html
• https://community.hortonworks.com/articles/88404/adding-and-using-hplsql-and-hivemall-with-hive-mac.html
Hortonworks Community Connection
Read access for everyone, join to participate and be recognized
• Full Q&A Platform (like StackOverflow)
• Knowledge Base Articles
• Code Samples and Repositories