Pivot Tracing: Dynamic Causal Monitoring for Distributed Systems
Jonathan Mace, Ryan Roelke, Rodrigo Fonseca
Brown University
Dynamically instrument live distributed systems
Correlate and group events across components
Hadoop Stack
[Figure: the Hadoop stack, with HBase and MapReduce over HDFS over Disk, and client workloads MRSORT10G and MRSORT100G (MapReduce), HGET and HSCAN (HBase), FSREAD4M and FSREAD64M (HDFS)]
How is disk bandwidth being used?
DataNodeMetrics
[Figure: HDFS throughput (MB/s) over time (min) from DataNodeMetrics, per host (Host A through Host H)]
[Figure: the same throughput grouped by client workload (MRSORT100G, MRSORT10G, HSCAN, HGET, FSREAD4M, FSREAD64M)]
Missed! Some disk activity is not captured by DataNodeMetrics at all.
IOStream
[Figure: disk throughput (MB/s) over time (min) at the IOStream level, per process (MapTask, ShuffleHandler, ReduceTask, DataNode)]
Instrumentation is decided at development time
• Probably doesn't have enough information for your problem
• Probably has too much irrelevant information for your problem
• Should every user bear the cost of a feature?
Dynamic dependencies
You often need to correlate information from different points in the system
• Systems are designed to compose
• Systems don't embed monitoring that relates to other services
[Figure: Netflix “Death Star” microservices dependency graph, via @bruce_m_wong]
Pivot Tracing
You don't know the questions in advance → dynamic instrumentation
  Fay (SOSP '11), DTrace (ATC '04), …
You often need to correlate information from different points in the system → causal tracing
  X-Trace (NSDI '07), Dapper (Google), Pip (NSDI '06), …
Pivot Tracing
• Model system events as tuples in a streaming, distributed dataset
• Dynamically evaluate relational queries over this dataset
• Join based on Lamport's happened-before relation: the happened-before join (→)
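For reference, a minimal statement of Lamport's happened-before relation that the join builds on (the standard definition, reproduced here for completeness):

```latex
% Lamport's happened-before relation (standard definition):
a \rightarrow b \iff
\begin{cases}
  a \text{ and } b \text{ occur in the same process and } a \text{ precedes } b, \\
  a \text{ is the send of a message and } b \text{ is its receipt, or} \\
  \exists\, c:\; a \rightarrow c \text{ and } c \rightarrow b & \text{(transitivity).}
\end{cases}
```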
Pivot Tracing: Overview
[Diagram: HBase and MapReduce over HDFS; DataNodeMetrics instruments the HDFS DataNode]

DataNodeMetrics.java:
 50  public class DataNodeMetrics {
       ...
266    public void incrBytesRead(int delta) {
267      ...
268    }
       ...
407  }

Each call emits a tuple: (“DataNodeMetrics”, delta=10, host=“hop01”, …)

From incr In DataNodeMetrics.incrBytesRead
GroupBy incr.host
Select incr.host, SUM(incr.delta)

Tracepoint
  Class: DataNodeMetrics
  Method: incrBytesRead
  Exports: “delta”=delta

[Figure: the query's output, per-host HDFS throughput (MB/s) over time (min), Host A through Host H]
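Conceptually, installing that query has the same effect as rewriting incrBytesRead to report a tuple on every call. A hedged Java sketch of that effect (the real system injects equivalent bytecode dynamically; the emit helper here is purely illustrative, not the paper's API):

```java
import java.net.InetAddress;

// Illustrative only: Pivot Tracing weaves equivalent bytecode into
// incrBytesRead at runtime; nothing like this is hard-coded.
class DataNodeMetricsSketch {
    private long bytesRead;

    public void incrBytesRead(int delta) {
        bytesRead += delta;              // the original metric update
        emit("DataNodeMetrics", delta);  // effect of the woven advice
    }

    private static void emit(String tracepoint, int delta) {
        String host;
        try {
            host = InetAddress.getLocalHost().getHostName();
        } catch (Exception e) {
            host = "unknown";
        }
        // In the real system this tuple flows to the local PT agent.
        System.out.printf("(\"%s\", delta=%d, host=\"%s\")%n",
                          tracepoint, delta, host);
    }
}
```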
Adding a second tracepoint, ClientProtocols, higher in the stack:
[Diagram: HBase and MapReduce over HDFS; ClientProtocols at the client entry point, DataNodeMetrics at the DataNode]
Happened-before Join (→): relate each ClientProtocols event to the DataNodeMetrics events that causally follow it.
(“ClientProtocols”, procName=“HGET”, …)
(“DataNodeMetrics”, delta=10, host=“hop01”, …)
Joining the two event streams on the happened-before relation produces combined tuples:
(“ClientProtocols”, procName=“HGET”, …, “DataNodeMetrics”, delta=10, host=“hop01”, …)
From incr In DataNodeMetrics.incrBytesRead
Join client In First(ClientProtocols) On client -> incr
GroupBy client.procName
Select client.procName, SUM(incr.delta)

Result tuples: (procName=“HGET”, delta=10, …)
[Figure: HDFS throughput over time, now grouped by the originating client operation (MRSORT100G, MRSORT10G, HSCAN, HGET, FSREAD4M, FSREAD64M)]
Design & Implementation: Pivot Tracing Prerequisites
• Dynamic instrumentation → PT Agent
• Causal tracing → Baggage
Causal tracing: Baggage
Baggage is a key:value container propagated alongside a request
• A generalization of the metadata in end-to-end tracing
• One instance per request
[Diagram: a request travels from process A to process B; A executes PACK(clientName, “FSRead”), the baggage clientName=“FSRead” follows the request, and B executes UNPACK(clientName)]
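A minimal sketch of what such a container could look like inside one process, with pack/unpack plus snapshot/restore for crossing thread boundaries (all names and signatures here are illustrative assumptions, not the paper's actual Baggage API; the real library also serializes baggage across RPCs):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative per-request key:value baggage container. The real
// Baggage library additionally serializes this across RPC boundaries
// (Protocol Buffers in the paper's implementation).
public final class Baggage {
    private static final ThreadLocal<Map<String, List<String>>> CURRENT =
            ThreadLocal.withInitial(HashMap::new);

    // PACK: record a value in the current request's baggage.
    public static void pack(String key, String value) {
        CURRENT.get().computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // UNPACK: read back values packed earlier in this request.
    public static List<String> unpack(String key) {
        return CURRENT.get().getOrDefault(key, List.of());
    }

    // Capture/reinstall the baggage when execution changes threads.
    public static Map<String, List<String>> snapshot() {
        return new HashMap<>(CURRENT.get());
    }

    public static void restore(Map<String, List<String>> baggage) {
        CURRENT.set(new HashMap<>(baggage));
    }
}
```

Upstream code would call Baggage.pack("clientName", "FSRead"); downstream code recovers it with Baggage.unpack("clientName").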
Pivot Tracing enabled = instrumented system + Baggage + PT Agent
Design & Implementation: Queries
Instrumented system (+ Baggage, PT Agent)
Tracepoints: places where PT can add instrumentation
  Tracepoint A (Class: A, Method: A1())
  Tracepoint B (Class: B, Method: B1(), Exports: “delta”=delta)
• Tracepoints export identifiers accessible to queries; defaults: host, timestamp, pid, proc name
• Tracepoints are only references: nothing is materialized until a query is installed
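As a data structure, a tracepoint is little more than a named reference plus its exports; a minimal sketch (the field names are assumptions for illustration):

```java
import java.util.Map;

// Sketch of a tracepoint descriptor: a reference to an instrumentable
// location plus the identifiers it exports. Nothing is materialized
// until a query that uses the tracepoint is installed.
record Tracepoint(String className, String methodName,
                  Map<String, String> exports) {
    // Tracepoint B from the slide: exports "delta" bound to local delta.
    static final Tracepoint B =
            new Tracepoint("B", "B1", Map.of("delta", "delta"));
}
```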
Query Language
A relational query language, similar to SQL and LINQ, that refers to tracepoint-exported identifiers
• Selection • Projection • Filter • GroupBy • Aggregation • Happened-Before Join

From a In A
Join b In B On a -> b
GroupBy a.procName
Select a.procName, SUM(b.delta)

Output: a stream of tuples, e.g., (procName, delta)
A query is compiled to advice, an intermediate representation for instrumentation; the advice will be installed at tracepoints.
Limited instruction set: OBSERVE, PACK, FILTER, UNPACK, EMIT

Advice A1 (at tracepoint A):
  OBSERVE procName
  PACK procName
Advice B1 (at tracepoint B):
  OBSERVE delta
  UNPACK procName
  EMIT procName, SUM(delta)
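To make those instructions concrete, a hedged Java sketch of roughly what A1 and B1 do when woven in (reusing the illustrative Baggage class above; the agent hook is an assumption):

```java
// Illustrative only: approximately what the woven advice executes.
final class AdviceSketch {
    // Advice A1, woven into A.A1(): OBSERVE procName; PACK procName.
    static void adviceA1(String procName) {
        Baggage.pack("procName", procName);
    }

    // Advice B1, woven into B.B1(): OBSERVE delta; UNPACK procName;
    // EMIT (procName, delta). The local PT agent maintains the
    // running SUM(delta) per procName group.
    static void adviceB1(long delta) {
        for (String procName : Baggage.unpack("procName")) {
            emitToAgent(procName, delta);   // hypothetical agent hook
        }
    }

    private static void emitToAgent(String procName, long delta) {
        System.out.printf("EMIT (%s, %d)%n", procName, delta);
    }
}
```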
Weaving
The PT agent dynamically enables the advice at its tracepoints: A1 at tracepoint A, B1 at tracepoint B.
Evaluating
As a request passes tracepoint A, advice A1 runs: OBSERVE procName, then PACK procName into the baggage.
The baggage explicitly follows the execution. When the request reaches tracepoint B, advice B1 runs: OBSERVE delta, UNPACK procName, and EMIT (procName, SUM(delta)).
Queries are evaluated inline during a request; no global aggregation is needed.
Query Results
• Tuples are accumulated locally in the PT agent
• Periodically reported back to the user, e.g., every second
Pivot Tracing: Evaluation
Java-Based Implementation
• PT agent: a thread that runs inside each process
  • Javassist for dynamic instrumentation
  • Pub/sub to receive commands and send tuples
• Baggage library for use by the instrumented system
  • Data format specified using Protocol Buffers
• Front-end client library
  • Define tracepoints and write text queries
  • Compile queries to advice
  • Submit advice to PT agents
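As a rough illustration of the dynamic-instrumentation piece, a Javassist-style rewrite injecting a call at the start of a method might look like this (the class and method names are placeholders; the real agent retransforms already-loaded classes through the JVM instrumentation API rather than loading them fresh):

```java
import javassist.ClassPool;
import javassist.CtClass;
import javassist.CtMethod;

// Sketch: injecting advice into a method with Javassist.
public class WeaverSketch {
    public static void main(String[] args) throws Exception {
        ClassPool pool = ClassPool.getDefault();
        CtClass cc = pool.get("org.example.B");   // placeholder class
        CtMethod m = cc.getDeclaredMethod("B1");  // placeholder method
        // "$1" denotes the method's first argument in Javassist source.
        m.insertBefore("{ AdviceSketch.adviceB1($1); }");
        cc.toClass();  // works only for not-yet-loaded classes; a sketch
    }
}
```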
Pivot Tracing enabled (+ Baggage, PT Agent): HDFS, HBase, MapReduce, YARN, ZooKeeper
• Adding Baggage: ~50-200 lines of code per system
• Primarily modifying execution boundaries: Thread, Runnable, Callable, Queue, and RPC invocations (see the sketch below)
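For example, carrying baggage across a thread boundary amounts to capturing it when a Runnable is created and reinstalling it when it runs. A hedged sketch of that pattern, reusing the illustrative Baggage class from earlier (the real libraries' APIs may differ):

```java
import java.util.List;
import java.util.Map;

// Sketch: propagating baggage across an execution boundary by wrapping
// a Runnable: capture at submission, reinstall in the worker thread.
final class BaggageRunnable implements Runnable {
    private final Runnable delegate;
    private final Map<String, List<String>> captured;

    BaggageRunnable(Runnable delegate) {
        this.delegate = delegate;
        this.captured = Baggage.snapshot();  // capture caller's baggage
    }

    @Override
    public void run() {
        Baggage.restore(captured);           // reinstall before running
        delegate.run();
    }
}
```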
Pivot Tracing Overheads
• Pivot Tracing enabled (+ Baggage, PT Agent): 0.3% baseline overhead on application-level benchmarks
• No overhead for queries or tracepoints until they are installed
• With the queries from the paper installed: at most 14.3% overhead on application-level benchmarks (CPU-only lookups)
• Largest baggage size: ~137 bytes
Experiments
1. Monitoring queries with various groupings
2. Decomposing request latencies
3. Debugging recurring problems
[Figure: sample query outputs: per-host HDFS throughput, per-workload HDFS throughput, and per-process disk throughput (MapTask, ShuffleHandler, ReduceTask, DataNode)]
HDFS: replicated block storage
• The HDFS NameNode records which DataNodes hold each file's replicas (e.g., File → DataNodes 2, 3, 5)
• A client first calls GetBlockLocations on the NameNode, then reads the file from one of the replica DataNodes via DataTransferProtocol
Client workload generator: randomly read from a large dataset
8 worker hosts (Host A through Host H), each running an HDFS DataNode and a client workload generator, plus an HDFS NameNode
Same machines, same processes, same workloads
• I expected uniform throughput from the workload generators
• I expected uniform throughput on the DataNodes
The observed DataNode throughput, however, was skewed.
It’s probably a bug in the workload generator I wrote.
My hypothesis: the workload generator is not randomly looking up files.

From blockLocations In NameNode.GetBlockLocations
GroupBy blockLocations.fileName
Select blockLocations.fileName, COUNT

[Figure: frequency vs. number of times each file was accessed]

Refining the query to also group by the requesting client host:

From blockLocations In NameNode.GetBlockLocations
Join cl In Client.DoRandomRead On cl -> blockLocations
GroupBy cl.host, blockLocations.fileName
Select cl.host, blockLocations.fileName, COUNT
Maybe the skewed DataNode throughput is because some DataNodes store more files than others.
How often was each DataNode a replica host?

From blockLocations In NameNode.GetBlockLocations
GroupBy blockLocations.replicas
Select blockLocations.replicas, COUNT

[Figure: count per replica location]

Again grouping by the requesting client host:

From blockLocations In NameNode.GetBlockLocations
Join cl In Client.DoRandomRead On cl -> blockLocations
GroupBy cl.host, blockLocations.replicas
Select cl.host, blockLocations.replicas, COUNT

[Figure: replica location vs. client]
Conclusions so far:
• Clients are selecting files uniformly at random
• Files are distributed across DataNodes uniformly at random

Hypothesis: the choice of replica isn't random?
When a file is read from a DataNode, where else could it have been read from?

From readBlock In DataNode.DataTransferProtocol
Join blockLocations In NameNode.GetBlockLocations
  On blockLocations -> readBlock
GroupBy blockLocations.replicas, readBlock.host
Select blockLocations.replicas, readBlock.host, COUNT

Adding the client and excluding local reads:

From readBlock In DataNode.DataTransferProtocol
Join blockLocations In NameNode.GetBlockLocations
  On blockLocations -> readBlock
Join cl In Client.DoRandomRead
  On cl -> blockLocations
Where cl.host != readBlock.host
GroupBy blockLocations.replicas, readBlock.host
Select blockLocations.replicas, readBlock.host, COUNT

When two DataNodes both host a file's replicas, a random policy should pick each about 50% of the time. Instead, for some replica pairs, clients chose one DataNode 100% of the time and the other 0%.
• The lack of randomization skewed the workload toward certain DataNodes
• Independently discovered; fixed in HDFS 2.5
• Pivot Tracing seamlessly adds correlations between multiple components, enabling very specific, one-off metrics
• This experiment: 1.5% application-level overhead
Pivot Tracing: Dynamic Causal Monitoring for Distributed Systems
Dynamic Instrumentation + Causal Tracing → Happened-Before Join
• Acceptable overheads for production (we think)
• Standing basic queries, with the potential to dig deeper
Jonathan Mace, Ryan Roelke, Rodrigo Fonseca

Tracepoint A (Class: A, Method: A1())
Tracepoint B (Class: B, Method: B1(), Exports: “delta”=delta)

From a In A
Join b In B On a -> b
GroupBy a.procName
Select a.procName, SUM(b.delta)

Advice A1:
  OBSERVE procName
  PACK procName
Advice B1:
  OBSERVE delta
  UNPACK procName
  EMIT procName, SUM(delta)