Why is my Hadoop* job slow?
Bikas Saha (@bikassaha)
*Apache Hadoop, Falcon, Atlas, Tez, Sqoop, Flume, Kafka, Pig, Hive, HBase, Accumulo, Storm, Solr, Spark, Ranger, Knox, Ambari, ZooKeeper, Oozie, Zeppelin and the Hadoop elephant logo are trademarks of the Apache Software Foundation.
Hitesh Shah
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Agenda
Metrics and Monitoring
Logging and Correlation
Tracing and Analysis
Metrics and Monitoring
Metrics as high level pointers
Ambari Metrics System
Ambari Grafana Integration
HBase, HDFS, YARN Dashboards
Metrics based alerting
Metrics as high level pointers
Machine level metrics like CPU load
Application level metrics like HDFS counters
Metrics at a point in time
Metric anomalies along a time series
Correlated anomalies
The problem is that you need to know what to look for
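As an illustration of spotting anomalies along a time series and then checking whether two metrics misbehave together, here is a minimal sketch. It is not how Ambari Metrics detects anything; the function names, the trailing-window z-score rule, the window size, and the threshold are all assumptions chosen for the example.

```python
from statistics import mean, stdev


def find_anomalies(samples, window=5, threshold=3.0):
    """Return indices whose value deviates from the trailing-window mean
    by more than `threshold` standard deviations (hypothetical rule)."""
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: treat any change at all as a deviation.
            if samples[i] != mu:
                anomalies.append(i)
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


def correlated(anomalies_a, anomalies_b, tolerance=1):
    """Pair up anomaly indices from two series that fall within
    `tolerance` samples of each other."""
    return [(a, b) for a in anomalies_a for b in anomalies_b
            if abs(a - b) <= tolerance]


# Made-up samples: a machine-level metric and an application-level metric.
cpu_load = [1.0, 1.1, 0.9, 1.0, 1.1, 1.0, 8.0, 1.0, 1.1]
rpc_queue_ms = [10, 11, 10, 9, 10, 11, 10, 60, 10]

cpu_spikes = find_anomalies(cpu_load)
rpc_spikes = find_anomalies(rpc_queue_ms)
pairs = correlated(cpu_spikes, rpc_spikes)
```

A spike in one metric alone is only a pointer; a pair that lines up in time narrows down where to look next.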
Ambari Metrics Service - Motivation
Limited Ganglia capabilities
OpenTSDB – GPL license and needs a Hadoop cluster
Need service-level aggregation as well as time-based aggregation
Alerts based on metrics system
Ability to scale past 1,000 nodes
Ability to perform analytics based on a use case
Allow fine-grained control over aspects like retention, collection intervals, and aggregation
Pluggable and Extensible
First version released with Ambari 2.0.0
Ambari Grafana Integration
Open source dashboard builder integrated with AMS.
Available from Ambari-2.2.2
Pre-defined host-level and service-level (HDFS, HBase, YARN, etc.) dashboards.
Added to Ambari through API after upgrade
Metrics based Alerting
Top N support to quickly identify potential offenders
Alerting based on time series
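The two ideas above can be sketched in a few lines. This is a hypothetical illustration, not the Ambari alerting implementation: `top_n` ranks hosts by the latest value of one metric, and `sustained_breach` fires only when a series stays above a threshold for several consecutive samples, rather than on a single point. All names, thresholds, and sample values are made up.

```python
import heapq


def top_n(metric_by_host, n=3):
    """Return the n (host, value) pairs with the largest metric value."""
    return heapq.nlargest(n, metric_by_host.items(), key=lambda kv: kv[1])


def sustained_breach(samples, threshold, min_consecutive):
    """True if the series exceeds `threshold` for at least
    `min_consecutive` samples in a row."""
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False


# Made-up latest CPU utilisation per host.
latest_cpu = {"h1": 0.2, "h2": 0.9, "h3": 0.5, "h4": 0.7}
offenders = top_n(latest_cpu, n=2)

# Made-up time series for one host; alert only on a sustained breach.
alert = sustained_breach([1, 5, 6, 7, 2], threshold=4, min_consecutive=3)
```

Requiring a run of breaches trades a little detection latency for fewer false alarms from one-off spikes.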
Logging and Correlation
HDFS, YARN Audit logs
Caller Context
YARN Application Timeline Service
Lineage tracking of operations across workloads
Ambari Log Search
HDFS Audit Logs and Caller Context
FSNamesystem.audit: allowed=true ugi=userA (auth:SIMPLE) ip=/172.22.68.32 cmd=create src=/tmp/in/_temporary/1/_temporary/attempt_14644848874070_0009_m_009995_0/part-m-09995 dst=null perm=root:hdfs:rw-r--r-- proto=rpc callerContext=tez_ta:attempt_1464484887407_0009_1_00_009995_0
FSNamesystem.audit: allowed=true ugi=userA (auth:SIMPLE) ip=/172.22.68.33 cmd=create src=/tmp/in2/_temporary/1/_temporary/attempt_1464484887407_0011_m_000097_0/part-m-00097 dst=null perm=root:hdfs:rw-r--r-- proto=rpc callerContext=mr_attempt_1464484887407_0011_m_000097_0
FSNamesystem.audit: allowed=true ugi=userB (auth:SIMPLE) ip=/172.22.68.34 cmd=create src=/tmp/in2/_temporary/1/_temporary/attempt_1464484887407_0011_m_000095_0/part-m-00095 dst=null perm=root:hdfs:rw-r--r-- proto=rpc callerContext=mr_attempt_1464484887407_0011_m_000095_0
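To show how audit records like the ones above can be turned into something queryable, here is a small parsing sketch. It assumes each record is a run of whitespace-separated `key=value` fields after the logger prefix, which matches the samples on this slide but may not cover every audit log layout. The function names are hypothetical.

```python
import re
from collections import Counter

# A field is key=value; the value runs until the next " key=" or end of line,
# so values with spaces such as "userA (auth:SIMPLE)" stay in one piece.
FIELD_RE = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|$)")


def parse_audit_line(line):
    """Split one audit record into a dict of its key=value fields."""
    return dict(FIELD_RE.findall(line.strip()))


def count_by(lines, key):
    """Count audit records grouped by the value of one field."""
    return Counter(parse_audit_line(l).get(key, "<none>") for l in lines)


SAMPLE = [
    "FSNamesystem.audit: allowed=true ugi=userA (auth:SIMPLE) "
    "ip=/172.22.68.33 cmd=create "
    "src=/tmp/in2/_temporary/1/_temporary/"
    "attempt_1464484887407_0011_m_000097_0/part-m-00097 "
    "dst=null perm=root:hdfs:rw-r--r-- proto=rpc "
    "callerContext=mr_attempt_1464484887407_0011_m_000097_0",
    "FSNamesystem.audit: allowed=true ugi=userB (auth:SIMPLE) "
    "ip=/172.22.68.34 cmd=create "
    "src=/tmp/in2/_temporary/1/_temporary/"
    "attempt_1464484887407_0011_m_000095_0/part-m-00095 "
    "dst=null perm=root:hdfs:rw-r--r-- proto=rpc "
    "callerContext=mr_attempt_1464484887407_0011_m_000095_0",
]

first = parse_audit_line(SAMPLE[0])
per_user = count_by(SAMPLE, "ugi")
```

Grouping by `callerContext` instead of `ugi` answers "which task generated this NameNode load" rather than "which user".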
ResourceManager Audit Logs and Caller Context
resourcemanager.RMAuditLogger: USER=userA IP=172.22.68.32 OPERATION=Submit Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1464484887407_0001 CALLERCONTEXT=PIG-pigSmoke.sh-8a052588-0013-4e39-83b1-ebad699d8e2e
resourcemanager.RMAuditLogger: USER=userA IP=172.22.68.30 OPERATION=Submit Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1464484887407_0009 CALLERCONTEXT=CLI
resourcemanager.RMAuditLogger: USER=userB IP=172.22.68.34 OPERATION=Submit Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1464484887407_0008 CALLERCONTEXT=mr_attempt_1464484887407_0007_m_000000_0
resourcemanager.RMAuditLogger: USER=userB IP=172.22.68.30 OPERATION=Submit Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1464484887407_0012 CALLERCONTEXT=HIVE_SSN_ID:f3aadf99-9e36-494b-84a1-99b685ac344b
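The caller context is what lets the HDFS and ResourceManager audit logs be joined. The sketch below is one hypothetical way to do that join: it derives the YARN application ID from a task attempt ID found in a caller context, then follows ResourceManager submission records back to whoever launched the application. It assumes each record and its CALLERCONTEXT are available on one line as `KEY=value` fields; the function names are illustrative only.

```python
import re

FIELD_RE = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|$)")
# attempt_<clusterTimestamp>_<appSequence>_... maps to
# application_<clusterTimestamp>_<appSequence>.
ATTEMPT_RE = re.compile(r"attempt_(\d+)_(\d+)_")


def parse_fields(line):
    """Split one audit record into a dict of its KEY=value fields."""
    return dict(FIELD_RE.findall(line.strip()))


def application_id(caller_context):
    """Derive the application ID from an attempt ID, or None if absent."""
    m = ATTEMPT_RE.search(caller_context)
    return f"application_{m.group(1)}_{m.group(2)}" if m else None


def index_submissions(rm_lines):
    """Map APPID to its parsed 'Submit Application Request' record."""
    subs = {}
    for line in rm_lines:
        fields = parse_fields(line)
        if fields.get("OPERATION") == "Submit Application Request":
            subs[fields["APPID"]] = fields
    return subs


def submission_chain(app_id, submissions):
    """Follow caller contexts from an application back to its launcher."""
    chain, seen = [], set()
    while app_id in submissions and app_id not in seen:
        seen.add(app_id)
        rec = submissions[app_id]
        chain.append((app_id, rec["USER"], rec.get("CALLERCONTEXT")))
        app_id = application_id(rec.get("CALLERCONTEXT", ""))
    return chain


PREFIX = "resourcemanager.RMAuditLogger: "
COMMON = ("OPERATION=Submit Application Request "
          "TARGET=ClientRMService RESULT=SUCCESS ")
RM_LINES = [
    PREFIX + "USER=userA IP=172.22.68.30 " + COMMON
    + "APPID=application_1464484887407_0009 CALLERCONTEXT=CLI",
    PREFIX + "USER=userB IP=172.22.68.34 " + COMMON
    + "APPID=application_1464484887407_0008 "
    + "CALLERCONTEXT=mr_attempt_1464484887407_0007_m_000000_0",
]

subs = index_submissions(RM_LINES)
# An HDFS audit caller context leads back to the app and how it was submitted.
hdfs_ctx = "tez_ta:attempt_1464484887407_0009_1_00_009995_0"
owner = subs[application_id(hdfs_ctx)]
chain = submission_chain("application_1464484887407_0008", subs)
```

In this sample, the HDFS write traces back to an application submitted from the CLI, and application 0008 turns out to have been launched by a MapReduce task of application 0007.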
YARN Application Timeline Service
YARN service for fine grained application level tracing
Enables complex metadata to be recorded as the YARN app makes progress
Allows retrieval of this timeline data based on filters
Can be used to drive limited online analytics and extensive post-hoc analysis
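Filtered retrieval can be illustrated by building a query URL for the Timeline REST endpoint. This sketch only constructs the URL and sends nothing. The `primaryFilter`, `limit`, `windowStart`, and `windowEnd` query parameters are those of the Timeline Server v1 REST API as far as I know; the host name and the `user` filter key are assumptions that depend on what the publishing application actually indexes.

```python
from urllib.parse import urlencode


def timeline_query_url(base, entity_type, primary_filter=None,
                       limit=None, window_start=None, window_end=None):
    """Build a Timeline entity-list URL with optional filters."""
    params = {}
    if primary_filter is not None:
        name, value = primary_filter
        params["primaryFilter"] = f"{name}:{value}"
    if limit is not None:
        params["limit"] = limit
    if window_start is not None:
        params["windowStart"] = window_start
    if window_end is not None:
        params["windowEnd"] = window_end
    url = f"{base}/ws/v1/timeline/{entity_type}"
    # Keep ':' readable in the filter instead of percent-encoding it.
    query = urlencode(params, safe=":")
    return f"{url}?{query}" if query else url


# Hypothetical host and filter key; TEZ_DAG_ID is the entity type used
# in the lineage example in this deck.
url = timeline_query_url("http://timeline:8188", "TEZ_DAG_ID",
                         primary_filter=("user", "userA"), limit=10)
```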
Lineage Tracking using YARN Timeline
Timeline:8188/ws/v1/timeline/TEZ_DAG_ID/dag_1464484887407_0013_1
dagContext: { callerId: "root_20160529021115_006f8007-5840-4c64-9970-c1b506f68db2",
callerType: "HIVE_QUERY_ID",
context: "HIVE",
description: "select user, count(visit_id) as visits from users group by user order by visits" }
Timeline:8188/ws/v1/timeline/HIVE_QUERY_ID/root_20160529021115_006f8007-5840-4c64-9970-c1b506f68db2
hiveContext: { callerId: "workflow_abcd",
callerType: "OOZIE_ID",
context: "OOZIE",
description: "Daily ETL Summary Job" }
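The two lookups above form a chain: each entity names its caller by `callerType` and `callerId`, which is in turn the key for the next lookup. A minimal sketch of walking that chain follows. The `fetch` callable is a stand-in for an HTTP GET of `/ws/v1/timeline/{type}/{id}` plus extraction of the context object; where that object sits in the real JSON response is not shown on the slide, so the stub below simply mirrors the fields as they appear here.

```python
def walk_lineage(entity_type, entity_id, fetch, max_hops=10):
    """Follow callerType/callerId links until an entity has no caller."""
    chain = [(entity_type, entity_id)]
    for _ in range(max_hops):
        context = fetch(entity_type, entity_id)
        if not context or "callerId" not in context:
            break
        entity_type, entity_id = context["callerType"], context["callerId"]
        chain.append((entity_type, entity_id))
    return chain


# In-memory stand-in for the Timeline service, using the slide's values.
HIVE_ID = "root_20160529021115_006f8007-5840-4c64-9970-c1b506f68db2"
FAKE_TIMELINE = {
    ("TEZ_DAG_ID", "dag_1464484887407_0013_1"): {
        "callerId": HIVE_ID,
        "callerType": "HIVE_QUERY_ID",
        "context": "HIVE",
    },
    ("HIVE_QUERY_ID", HIVE_ID): {
        "callerId": "workflow_abcd",
        "callerType": "OOZIE_ID",
        "context": "OOZIE",
    },
}


def fake_fetch(entity_type, entity_id):
    """Return the stored context for an entity, or None if unknown."""
    return FAKE_TIMELINE.get((entity_type, entity_id))


lineage = walk_lineage("TEZ_DAG_ID", "dag_1464484887407_0013_1", fake_fetch)
```

Read from the end, the result says which Oozie workflow ran which Hive query, which in turn ran which Tez DAG. The `max_hops` bound is only a guard against cyclic or malformed data.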
Tracing and Analysis
Use Big Data methods to solve Big Data problems
Apache Zeppelin as analytical tool
Hive/Tez/YARN notebook for analysis