BIO
Data Engineer at Lookout, started 2013
Previously at Demandbase, Project Perf. Corp.
6 years of Data Engineering
From Mumbai, India
etl.svbtle.com
DATA ANALYTICS TEAM
7 Analysts/Scientists
Questions they answer
How many users located their phone yesterday?
How many users were billed for AT&T?
REPORTING
Tableau
Dashboards - Retentions, Activations, etc.
Email reports
Custom email reports (Ruby)
ADHOC QUERYING
Hive CLI - Command-line interface to Hive
Hue - Toad style GUI for ad hoc queries on Hive
R Studio - Statistical analysis
Shiny - Reporting/Querying tool based on R
Sparkle Pony (Homegrown Ruby app) - MySQL Querying for stakeholders
Hadoop File System Browser
STORM
Apache Storm is a distributed realtime computation system. It
can be used with any programming language.
NIMBUS AND SUPERVISOR
A Storm cluster has
One Nimbus node which is the master
A set of Supervisor nodes which are the workers
LANDING DATA IN HADOOP
Topologies write data to a landing directory in Hadoop using HDFS Bolt
Directories are rotated depending on latency requirements of downstream reports
Directories are moved to location of the table in Hive
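The rotate-then-move step can be pictured with a small shell sketch. It is only an illustration under stated assumptions: naming each rotated directory after the start of its time window is hypothetical, the helper names `window_start` and `move_command` are invented here, and the script only prints the `hdfs dfs -mv` command rather than running it. The slides do not say how the topology performs the move.

```shell
#!/bin/sh
# Sketch only: the window-based directory naming is an assumption for illustration.

# Print the start (epoch seconds) of the rotation window that contains a timestamp.
# $1 = timestamp in epoch seconds, $2 = rotation interval in seconds
window_start() {
  echo $(( $1 - $1 % $2 ))
}

# Print the HDFS command that would move one rotated directory from the
# landing area to the Hive table location. Nothing is executed here.
# $1 = landing directory, $2 = destination directory, $3 = window start
move_command() {
  echo "hdfs dfs -mv $1/$3 $2/$3"
}

# Dry run using the staging and destination paths from the DEPLOYMENT slide.
w=$(window_start 7300 3600)
move_command /user/hive/warehouse/staging.db/locate_events \
  /user/hive/warehouse/realdb.db/locate_events "$w"
```

A shorter rotation interval makes data visible to Hive sooner at the cost of more, smaller directories, which is the latency trade-off the slide refers to.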
DEPLOYMENT
Storm topologies are jars that can be submitted to Storm Nimbus
storm jar path/to/allmycode.jar org.MyTopologyClass arg1 arg2
DEPLOYMENT
Configuration is stored in shell scripts that launch topologies
storm jar /topolgoies/data-storm-0.0.3-SNAPSHOT.jar com.lookout.data.topology.KafkaToHdfsTopology \
-topologyname kafka-hdfs \
-nimbushost dw-storm2 \
-topologymaxtaskparallelism 1 \
-D hdfs.sync.tuple.count=500 \
-D hdfs.file.rotation.seconds=3600 \
-D hdfs.landing.directory=/user/hive/warehouse/staging.db/locate_events \
-D hdfs.destination.directory=/user/hive/warehouse/realdb.db/locate_events \
-D hdfs.filesystem.url=hdfs://hadoop-cluster-01:8020/ \
-D kafka.zookeeper.hosts=zk1:2181,zk2:2181,zk3:2181 \
-D kafka.topic=locate_event \
-D statsd.host=statsdhost
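One way such a launcher script could be organized is sketched below. The variable names, the `run` helper, and the `DRY_RUN` switch are assumptions made for illustration and are not taken from the actual scripts. Only a subset of the options from the slide is shown, and the jar path, class name, and option values are copied from it unchanged.

```shell
#!/bin/sh
# Sketch of a launcher script: configuration lives in shell variables.
# Variable names and the DRY_RUN switch are hypothetical.

JAR="/topolgoies/data-storm-0.0.3-SNAPSHOT.jar"
MAIN_CLASS="com.lookout.data.topology.KafkaToHdfsTopology"
TOPOLOGY_NAME="kafka-hdfs"
NIMBUS_HOST="dw-storm2"
ROTATION_SECONDS=3600
KAFKA_TOPIC="locate_event"

# Print the command when DRY_RUN=1 (the default here); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Assemble the storm jar invocation from the variables above.
launch() {
  run storm jar "$JAR" "$MAIN_CLASS" \
    -topologyname "$TOPOLOGY_NAME" \
    -nimbushost "$NIMBUS_HOST" \
    -D "hdfs.file.rotation.seconds=$ROTATION_SECONDS" \
    -D "kafka.topic=$KAFKA_TOPIC"
}

launch
```

Running the script as is prints the assembled command. Setting `DRY_RUN=0` would submit the topology, which requires the `storm` client on the PATH.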
METRICS MONITORING
Use Storm's Metrics API (counters)
Success/Failure metrics are sent to StatsD for aggregation
Visualized using Graphite
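For reference, a StatsD counter travels as a single text line of the form `name:value|c`, usually over UDP (port 8125 by default). The sketch below builds such a line and, optionally, sends it with `nc`. The metric name and helper names are hypothetical, and `nc` flag behavior varies between implementations. This shows the wire format only. It is not how the topologies emit metrics, since they go through Storm's Metrics API.

```shell
#!/bin/sh
# Sketch only: metric and helper names are hypothetical.

STATSD_HOST="${STATSD_HOST:-statsdhost}"
STATSD_PORT="${STATSD_PORT:-8125}"

# Format a StatsD counter line: <name>:<value>|c
# $1 = metric name, $2 = increment
counter_line() {
  printf '%s:%s|c' "$1" "$2"
}

# Send one counter increment over UDP (assumes a netcat that supports -u and -w).
send_counter() {
  counter_line "$1" "$2" | nc -u -w1 "$STATSD_HOST" "$STATSD_PORT"
}

# Example: show the line that would be sent for one successful write.
counter_line locate_events.success 1
echo
```

StatsD sums counter increments over its flush interval and forwards the aggregate to Graphite, which is where the success and failure counts can then be charted.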
SLIDES
JavaScript Slides - reveal.js
http://lab.hakim.se/reveal-js/#/