StackMine – Performance Debugging in the Large via Mining Millions of Stack Traces Shi Han 1 , Yingnong Dang 1 , Song Ge 1 , Dongmei Zhang 1 , and Tao Xie 2 Software Analytics Group, Microsoft Research Asia 1 North Carolina State University 2 June 6 th , 2012
Page 1: StackMine Performance Debugging in the Large via Mining …sei.pku.edu.cn/conference/ishcs/2013/StackMine-han.pdf · 2016. 12. 14. · StackMine –Performance Debugging in the Large

StackMine – Performance Debugging in the Large via Mining Millions of Stack Traces

Shi Han¹, Yingnong Dang¹, Song Ge¹, Dongmei Zhang¹, and Tao Xie²

¹ Software Analytics Group, Microsoft Research Asia

² North Carolina State University

June 6th, 2012

Page 2

Performance Issues in the Real World

• One of the top user complaints

• Impacting a large number of users every day

• High impact on usability and productivity

Figure labels: High disk I/O; High CPU consumption

Given limited time and resources before software release, development-site testing and debugging become insufficient to ensure satisfactory software performance.

ICSE 2012

Page 3

Performance Debugging in the Large

[Figure components: Network, Trace collection, Trace storage, Trace analysis, Pattern matching, Problematic pattern repository, Bug filing, Bug database, Bug update. Callouts: "Key to issue discovery", "Bottleneck of scalability".]

• How many issues are still unknown?

• Which trace file should I investigate first?

Page 4

Problem Definition

Input: Runtime traces collected from millions of users

Output: Program execution patterns causing the most impactful performance problems, inspected by performance analysts

Page 5

Goal

Conduct systematic discovery & analysis of program execution patterns

• Efficient handling of large-scale trace sets

• Automatic discovery of new patterns

• Effective prioritization of investigation

Page 6

Challenges

Combination of expertise

• Generic machine learning tools without domain knowledge guidance do not work well

Highly complex analysis

• Numerous program runtime combinations triggering performance problems

• Multi-layer runtime components from application to kernel being intertwined

Large-scale trace data

• TBs of trace files and increasing

• Millions of events in a single trace stream

Page 7

Intuition

What happens behind a typical UI delay? An example of delayed browser tab creation.

Callstack types: ReadyThread callstacks, Wait callstacks, CPU sampled callstacks

[Figure: timelines of the UI thread and a worker thread over time, each alternating among CPU, Wait, and Ready states. Callouts: "Underlying Disk I/O", "Unexpected long execution".]

Wait callstack
ntdll!UserThreadStart
Browser!Main
…
ntdll!LdrLoadDll
…
nt!AccessFault
nt!PageFault
…

Wait callstack
ntdll!UserThreadStart
Browser!Main
…
Browser!OnBrowserCreatedAsyncCallback
…
BrowserUtil!ProxyMaster::GetOrCreateSlave
BrowserUtil!ProxyMaster::ConnectToObject
…
rpc!ProxySendReceive
…
wow64!RunCpuSimulation
wow64cpu!WaitForMultipleObjects32
wow64cpu!CpupSyscallStub
…

ReadyThread callstack
nt!KiRetireDpcList
nt!ExecuteAllDpcs
…
nt!IopfCompleteRequest
…
nt!SetEvent
…

CPU sampled callstack (shown three times in the figure)
ntdll!UserThreadStart
…
ntdll!WorkerThread
…
ole!CoCreateInstance
…
ole!OutSerializer::UnmarshalAtIndex
ole!CoUnmarshalInterface
…

ReadyThread callstack
ntdll!UserThreadStart
…
rpc!LrpcIoComplete
…
user32!PostMessage
…
win32k!SetWakeBit
nt!SetEvent
…

Page 8

Approach

Formulated as a callstack mining and clustering problem

• Performance issues are caused by problematic program execution patterns

• Problematic program execution patterns are mainly represented by callstack patterns

• Callstack patterns are discovered by mining & clustering costly patterns

Page 9

Approach – Workflow

Pipeline: Trace streams → AOI Extraction → Callstacks → Sequence Pattern Mining → Callstack patterns → Pattern Clustering → Pattern clusters → Ranking → Ranked clusters

Callstacks (example frames): stobject!WakeUp_DeviceMonitor, KernelBase!LoadLibrary, ntdll!OpenFile

Callstack patterns (examples; values shown in the figure: 709, 412, 259):

stobject!WakeUp_DeviceMonitor, KernelBase!LoadLibrary, nt!AccessFault

stobject!WakeUp_DeviceMonitor, KernelBase!LoadLibrary, nt!PageRead

Pattern clusters (example frames: stobject!WakeUp_DeviceMonitor, KernelBase!LoadLibrary, ntdll!OpenFile, nt!PageRead, nt!AccessFault; values shown in the figure: 1392, 709, 259, 412)

Ranked clusters:

Cluster   Hits     Total wait time (ms)
1         94,279   927,824
2         51,107   561,416
3         35,536   3,051,307
…         …

Example callstacks from the trace streams:

- ntdll!UserThreadStart

|- ……

| |- ……

| | |- user32!InternalCallWinProc

| | | |- stobject!WakeUp_DeviceMonitor

| | | | |- KernelBase!LoadLibrary

| | | | | |- ……

| | | | | | |- ntdll!OpenFile

| | | | | | | |- ……

- ntdll!UserThreadStart

|- ……

| |- ……

| | |- user32!InternalCallWinProc

| | | |- stobject!WakeUp_DeviceMonitor

| | | | |- kernel32!LoadLibrary

| | | | | |- ……

| | | | | | |- nt!PageRead

| | | | | | | |- ……

- ntdll!UserThreadStart

|- ……

| |- ……

| | |- user32!InternalCallWinProc

| | | |- stobject!WakeUp_DeviceMonitor

| | | | |- kernel32!LoadLibrary

| | | | | |- KernelBase!LoadLibrary

| | | | | | |- ……

| | | | | | | |- nt!Trap

| | | | | | | | |- nt!AccessFault

| | | | | | | | | |- ……

Page 10

Approach – AOI Extraction

Page 11

Approach – AOI Extraction: Motivation

Runtime traces capture both

• Executions relevant to the performance issue

– E.g., executions relevant to browser-tab creation

• Executions irrelevant to the performance issue

– E.g., executions of a concurrently running instant messenger (IM)

– Noisy data for mining

– Huge investigation scope induced

Page 12

Approach – AOI Extraction: Critical Path

[Figure: timelines between scenario start and scenario finish for the UI thread, worker thread 1, and worker thread 2, each alternating among CPU, Wait, and Ready states, plus a disk I/O interval.]

Critical path: CPU (UI) → CPU (WT1) → CPU (WT2) → CPU (WT1) → CPU (UI) → Disk I/O → Ready → CPU

Page 13

Approach – AOI Extraction: Wait Graph

[Figure: timelines between scenario start and scenario finish for the UI thread, worker thread 1, and worker thread 2, each alternating among CPU, Wait, and Ready states, plus a disk I/O interval. Labels: "Critical path", "Wait Graph". Callout on worker thread 2: "Execution irrelevant to Worker thread 1's waits".]
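The critical-path idea on these slides can be illustrated with a small sketch: whenever a thread is waiting, follow whoever made it ready. This is only a minimal illustration under stated assumptions, not StackMine's implementation. The `Interval` record, its `readied_by` field, the `critical_path` helper, and the example timeline are all hypothetical, and the sketch assumes no circular waits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interval:
    """One state interval of a thread (hypothetical record layout)."""
    state: str                        # "CPU", "Wait", or "Ready"
    start: int
    end: int
    readied_by: Optional[str] = None  # for "Wait": who ended the wait


def critical_path(timeline, thread, start, end):
    """Return (who, state, start, end) segments covering [start, end) in time order.

    CPU and Ready intervals stay on the current thread. A Wait interval is
    replaced by the activity of the thread that readied it, or kept as a single
    segment attributed to an external source (for example a disk) otherwise.
    """
    path = []
    for iv in sorted(timeline.get(thread, []), key=lambda i: i.start):
        s, e = max(iv.start, start), min(iv.end, end)
        if s >= e:
            continue  # interval lies outside the requested window
        if iv.state == "Wait" and iv.readied_by in timeline:
            path.extend(critical_path(timeline, iv.readied_by, s, e))
        elif iv.state == "Wait":
            path.append((iv.readied_by or "unknown", "Wait", s, e))
        else:
            path.append((thread, iv.state, s, e))
    return path


# Hypothetical scenario: the UI thread waits on worker thread 1 (WT1),
# which in turn waits on disk I/O.
timeline = {
    "UI": [
        Interval("CPU", 0, 2),
        Interval("Wait", 2, 8, readied_by="WT1"),
        Interval("Ready", 8, 9),
        Interval("CPU", 9, 10),
    ],
    "WT1": [
        Interval("Ready", 2, 3),
        Interval("CPU", 3, 4),
        Interval("Wait", 4, 7, readied_by="Disk I/O"),
        Interval("CPU", 7, 8),
    ],
}

path = critical_path(timeline, "UI", 0, 10)
for who, state, s, e in path:
    print(f"{s:>2}-{e:<2} {state}({who})")
```

Activity on threads that never appear in this walk (such as worker thread 2's execution that is irrelevant to worker thread 1's waits) is left out, which is the noise-reduction effect the slides attribute to AOI extraction.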

Page 14

Approach – Callstack Pattern Mining

Page 15

Approach – Callstack Pattern Mining: Frequent Sequence Pattern

• Non-consecutive frequent sub-sequence as callstack pattern

• Frequent maximal patterns are compact

• Example: frequent sequence pattern miner with frequency threshold T = 5 (or 50% of total)

Sequence     Occurrence
A B          4
A B C D F    1
A B C E F    3
A B G F      2

All patterns      Support
A, B, AB          10
F, AF, BF, ABF    6

Closed patterns   Support
AB                10
ABF               6

Maximal patterns  Support
ABF               6

Approach – Callstack Pattern Mining: Costly Sequence Pattern

• Non-consecutive costly sub-sequence as callstack pattern

• Costly maximal patterns are compact

• Example: costly sequence pattern miner with cost threshold T = 5 ms (or 50% of total)

Sequence     Wait time
A B          4 ms
A B C D F    1 ms
A B C E F    3 ms
A B G F      2 ms
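The example on this slide can be checked with a brute-force sketch. It enumerates every order-preserving, possibly non-consecutive sub-sequence, sums the weight of the sequences containing it, and then filters for closed and maximal patterns. The helper names (`subsequences`, `mine`, `closed`, `maximal`) are hypothetical, and the exhaustive enumeration is an assumption made for clarity; it is exponential in sequence length and is not the miner used by StackMine.

```python
from itertools import combinations


def subsequences(seq):
    """All non-empty, order-preserving (possibly non-consecutive) sub-sequences."""
    return {
        tuple(seq[i] for i in idx)
        for r in range(1, len(seq) + 1)
        for idx in combinations(range(len(seq)), r)
    }


def mine(weighted_sequences, threshold):
    """Sum each pattern's weight over the sequences containing it; keep those >= threshold."""
    support = {}
    for seq, weight in weighted_sequences:
        for pattern in subsequences(seq):
            support[pattern] = support.get(pattern, 0) + weight
    return {p: s for p, s in support.items() if s >= threshold}


def is_subsequence(p, q):
    """True if p occurs in q in order, not necessarily consecutively."""
    it = iter(q)
    return all(x in it for x in p)


def closed(patterns):
    """Patterns with no proper super-pattern of equal support."""
    return {
        p: s for p, s in patterns.items()
        if not any(p != q and s == t and is_subsequence(p, q)
                   for q, t in patterns.items())
    }


def maximal(patterns):
    """Patterns with no proper super-pattern above the threshold."""
    return {
        p: s for p, s in patterns.items()
        if not any(p != q and is_subsequence(p, q) for q in patterns)
    }


# Sequences from the slide. The weight is an occurrence count for the frequent
# miner; the same numbers read as wait time in ms give the costly miner.
data = [("AB", 4), ("ABCDF", 1), ("ABCEF", 3), ("ABGF", 2)]
found = mine(data, threshold=5)

show = lambda d: {"".join(p): s for p, s in d.items()}
print(show(found))
print(show(closed(found)))
print(show(maximal(found)))
```

Swapping the occurrence counts for wait times leaves the code unchanged, which mirrors how the slide moves from frequent patterns to costly patterns by changing only what is accumulated.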

Page 16

Approach – Callstack Pattern Clustering

Page 17

Approach – Callstack Pattern Clustering

• Motivation

– The same issue is often reflected by variant patterns

– The defect is often hidden in the invariant parts of variant patterns

• Goal

– Precise measurement of issue impact for better prioritization

– Comprehensive issue representation with pattern variations for quick and precise fixing

Page 18

Approach – Callstack Pattern Clustering: Similarity Model

Callstack pattern 1
App_main
InitComponents
GetHashCode
GetShortPathName
……
……
MmAccessFault
SwapKernelStack
MiIssueHardFault
IoPageRead
……

Callstack pattern 2
App_main
InitComponents
GetHashCode
GetShortPathName
……
……
MmAccessFault
ExpandKernelStack
MiIssueHardFault
IoPageRead
……

Additional frames shown in the figure: wow64Service, wow64System, wow64QueryAttr, ……

Alignment labels: Match, Insertion/Deletion, Substitution, Match

0. Alignment based on edit distance model

1. Common-purpose function: weight↓ (Uni() is small)

2. Special-purpose function: weight↑ (Uni() is large)

3. Variant part representing non-essential factors

4. Similar names implying relevant functionalities (SwapKernelStack, ExpandKernelStack)

5. Constant call path: weight↓ (FBi() + BBi() is small)
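The alignment idea can be sketched as a weighted edit distance over frames. This is a minimal sketch under assumptions: the per-frame `weight` function stands in for the slide's Uni(), FBi(), and BBi() terms, which are not defined in this transcript, and the use of `difflib.SequenceMatcher` to score name similarity is a substitute chosen for illustration, not the measure used by StackMine. The frame `ZwClose` is a made-up unrelated name.

```python
from difflib import SequenceMatcher


def substitution_cost(a, b, weight):
    """Zero for identical frames; cheaper when the two names are similar."""
    if a == b:
        return 0.0
    name_similarity = SequenceMatcher(None, a, b).ratio()
    return max(weight(a), weight(b)) * (1.0 - name_similarity)


def weighted_edit_distance(s, t, weight):
    """Edit distance where inserting or deleting a frame costs that frame's weight."""
    n, m = len(s), len(t)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + weight(s[i - 1])
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + weight(t[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j] + weight(s[i - 1]),      # delete s[i-1]
                dp[i][j - 1] + weight(t[j - 1]),      # insert t[j-1]
                dp[i - 1][j - 1]
                + substitution_cost(s[i - 1], t[j - 1], weight),
            )
    return dp[n][m]


# Hypothetical weights: a common-purpose entry point counts for little,
# every other frame gets a default weight of 1.0.
WEIGHTS = {"App_main": 0.1}
weight = lambda frame: WEIGHTS.get(frame, 1.0)

p1 = ["App_main", "InitComponents", "MmAccessFault",
      "SwapKernelStack", "MiIssueHardFault", "IoPageRead"]
p2 = ["App_main", "InitComponents", "MmAccessFault",
      "ExpandKernelStack", "MiIssueHardFault", "IoPageRead"]
p3 = ["App_main", "InitComponents", "MmAccessFault",
      "ZwClose", "MiIssueHardFault", "IoPageRead"]

d_same = weighted_edit_distance(p1, p1, weight)
d_similar = weighted_edit_distance(p1, p2, weight)    # Swap vs. Expand
d_unrelated = weighted_edit_distance(p1, p3, weight)  # Swap vs. unrelated name
print(d_same, d_similar, d_unrelated)
```

With these assumptions, two patterns that differ only in similarly named frames end up closer than patterns that differ in an unrelated frame, which is the behaviour points 1 to 5 on the slide aim for.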

Page 19

Technical Highlights

• Machine learning for system domain

– Formulate the discovery of problematic execution patterns as callstack mining & clustering

– Systematic mechanism to incorporate domain knowledge

• Interactive performance analysis system

– Parallel mining infrastructure based on HPC + MPI

– Visualization aided interactive exploration

Page 20

Evaluation – Windows 7 Study

• Task: since Dec 2010, a continued effort to improve Windows performance

– Analysts from one performance analysis team for Microsoft Windows

– Hunt for hidden performance bugs that caused common impact on Windows Explorer UI response

– Based on over 6,000 trace streams

• Data

– 921 qualified out of 1,000 randomly sampled trace streams

– 181 million callstacks in total: 140 million wait callstacks and 41 million CPU sampled callstacks

Page 21

Evaluation – Windows 7 Study: Research Questions

• RQ1. How much does StackMine improve practices of performance debugging in the large?

• RQ2. How well do the derived performance signatures capture performance bottlenecks?

• RQ3. How much does StackMine outperform alternative techniques?

Page 22

Evaluation – Windows 7 Study: RQ1. Overall Improvement of Practices

• Traditional approach – would take 20~60 days

• Using StackMine – 18 hours

140 million callstacks
→ AOI Extraction → 689 thousand callstacks
→ Callstack Pattern Mining → 2,239 costly patterns
→ Pattern Clustering → 1,251 pattern clusters
→ Pattern Ranking → top 400 pattern clusters
→ Human analyst confirmation → 93 performance signatures
→ Feature team confirmation → 12 highly impactful bugs

Time: 10 hours of automatic computation, 8 hours of one human analyst’s review

Page 23

Evaluation – Windows 7 Study: RQ2. Performance Bottleneck Coverage

• Performance Bottleneck Coverage (PBC) of a set of performance signatures:

\[ \mathrm{PBC} = \frac{\text{Total time of performance signatures}}{\text{Total time of collected trace streams}} \]

• The higher the PBC achieved, the lower the possibility that high-impact performance bugs remain uncaptured

• 58.26% PBC achieved by the 93 signatures
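As a rough illustration of the PBC ratio, the sketch below attributes a callstack's time to the signatures when at least one signature occurs in it as an ordered, possibly non-consecutive sub-sequence. The matching rule, the helper names, and the sample data are assumptions for illustration only; the slides do not spell out how signature time is attributed.

```python
def matches(signature, callstack):
    """True if the signature's frames occur in the callstack in order."""
    it = iter(callstack)
    return all(frame in it for frame in signature)


def pbc(callstacks, signatures):
    """Fraction of total time spent in callstacks matched by any signature."""
    total = sum(cost for _, cost in callstacks)
    covered = sum(
        cost for frames, cost in callstacks
        if any(matches(sig, frames) for sig in signatures)
    )
    return covered / total if total else 0.0


# Hypothetical data: (frames, time in ms).
callstacks = [
    (("main", "LoadLibrary", "PageRead"), 60),
    (("main", "Render"), 40),
]
signatures = [("LoadLibrary", "PageRead")]

coverage = pbc(callstacks, signatures)
print(f"PBC = {coverage:.2%}")
```

Counting each callstack at most once, even if several signatures match it, keeps the ratio at or below 100%.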

Page 24

Evaluation – Windows 7 Study: RQ3. Comparison with Alternative Techniques

StackMine requires only 7.2%, 5.8%, and 6.3% of the trace streams required by the other three techniques

[Figure: number of required trace streams to investigate (y-axis, 0 to 250) versus performance bottleneck coverage (x-axis, 20% to 60%) for Baseline-Random, Greedy-Total, Greedy-Max, and StackMine. Data labels: 193, 238, 220, 14.]
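The three percentages can be reproduced from the figure's data labels if they are read as 193, 238, and 220 trace streams for the alternative techniques and 14 for StackMine. That reading is an interpretation of the extracted figure text, and which of the three alternatives each count belongs to is not recoverable here. A small arithmetic check:

```python
import math

# Data labels as read from the figure (an interpretation of the extracted text).
other_techniques = [193, 238, 220]
stackmine = 14

ratios = [100 * stackmine / n for n in other_techniques]

# The slide's figures correspond to truncation to one decimal place.
truncated = [math.floor(r * 10) / 10 for r in ratios]

for n, r in zip(other_techniques, ratios):
    print(f"{stackmine} / {n} = {r:.2f}%")
```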

Page 25

Impact

“We believe that the MSRA tool is highly valuable and much more efficient for mass trace (100+ traces) analysis. For 1000 traces, we believe the tool saves us 4-6 weeks of time to create new signatures, which is quite a significant productivity boost.”

- from Development Manager in Windows

Highly effective new issue discovery on Windows mini-hang

Continuous impact on future Windows versions

Page 26

Conclusion

• The first formulation and real-world deployment of performance debugging in the large

– as a data mining problem on callstacks

• A mining-clustering mechanism for reducing costly-pattern mining results

– based on domain-specific characteristics of callstacks

• Industrial impact of using StackMine in performance debugging in the large for Microsoft Windows

Page 27

Acknowledgment

• Our partners in Microsoft product teams

• The researchers from Microsoft Research

Page 28

Q&A

Thank you!