Accurate and Efficient Filtering for the Intel Thread Checker Race Detector
By Paul Sack, Brian E. Bliss, Zhiqiang Ma, Paul Petersen, Josep Torrellas
04/20/23
OS Lab, Ok-Kyoon Ha
2006 ACM
SBMP06 2
Motivation
debugging data races is a difficult task
race detectors commonly use one of two types of algorithm: lockset-based or vector clock-based
Some data race-detection tools
- have reasonable overheads (2x slowdowns)
- but do not provide much useful information or have limited usage models
Intel Thread Checker
- provides an abundance of useful information and has few usage constraints
- has high performance costs (233x slowdowns)
Overheads of Intel’s Thread Checker
- instrumentation alone: slowdown of 22x
- full algorithm: slowdown of 233x
- memory overhead: 20x
Approach
Objective
- reduce the amount of work done by the algorithm
Filtering useless references
Three Filters (1/3)
Stack Filter
- filters references to a thread's own stack
- cannot cause data races to be lost unless one thread accesses another’s stack, and is very efficient
Implementation Issues of Stack Filter
- the simplest filter, with the lowest overhead
- compares the memory reference address with the stack base and limit addresses
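The address comparison described for the stack filter can be sketched as follows. This is a minimal illustration, not Thread Checker's implementation: the `ThreadStack` type, the `stack_filter` function, the downward-growing stack layout, and the sample addresses are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreadStack:
    """Hypothetical per-thread stack bounds (assumes a downward-growing stack)."""
    base: int   # one past the highest stack address
    limit: int  # lowest address the stack may reach


def stack_filter(addr: int, stack: ThreadStack) -> bool:
    """Return True if the reference falls inside the thread's own stack
    and can therefore be dropped before the race-detection algorithm."""
    return stack.limit <= addr < stack.base


# Made-up addresses: two inside the stack range, two outside it.
stack = ThreadStack(base=0x7FFF0000, limit=0x7FEF0000)
refs = [0x7FFE1234, 0x00601040, 0x7FEF0000, 0x7FFF0000]
kept = [addr for addr in refs if not stack_filter(addr, stack)]
```

Only two comparisons are needed per reference, which is consistent with the slide's claim that this is the cheapest of the three filters.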
Three Filters (2/3)
Duplicate Filter
- maintains the first load and the first store reference to a variable in each segment
- filters duplicate references within a segment
- can only cause Thread Checker to lose duplicate data races
Implementation Issues of Duplicate Filter
- slower than the stack filter
- maintains filter tables organized into four fields
[Figure: filter tables for threads T1 and T2, each with the fields add, size, type, ID]
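A per-thread duplicate filter along these lines might look like the sketch below. It assumes that the table is keyed on address, size, and access type, that the ID field is an opaque identifier of the first reference, and that the table is cleared whenever a new segment starts; the slides do not spell out any of these details, and the class and method names are hypothetical.

```python
from typing import Dict, Tuple


class DuplicateFilter:
    """Hypothetical per-thread filter table for the current segment."""

    def __init__(self) -> None:
        # (address, size, access type) -> ID of the first such reference
        self.table: Dict[Tuple[int, int, str], int] = {}

    def new_segment(self) -> None:
        """Forget everything when the thread enters a new segment."""
        self.table.clear()

    def should_filter(self, addr: int, size: int, kind: str, ref_id: int) -> bool:
        """Return True for a repeat of a load or store already seen in this segment."""
        key = (addr, size, kind)
        if key in self.table:
            return True
        self.table[key] = ref_id  # remember the first load or store
        return False


dup = DuplicateFilter()
trace = [
    (0x1000, 4, "load", 1),
    (0x1000, 4, "load", 2),   # duplicate load: filtered
    (0x1000, 4, "store", 3),  # first store: passes
    (0x1000, 4, "store", 4),  # duplicate store: filtered
]
passed = [ref[3] for ref in trace if not dup.should_filter(*ref)]
dup.new_segment()
filtered_after_reset = dup.should_filter(0x1000, 4, "load", 5)
```

Because the first load and the first store in each segment still reach the detector, a dropped reference could only have reported a race that the retained one already reports.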
Three Filters (3/3)
FSM Filter
- based on the Eraser state machine
- filters references in the Private state and in the Shared Read-Only state
- filters the initial references (Uninit → Private, Private → SHR RO)
[Figure: Eraser state machine with states UNINIT, PRIVATE, SHR RO, and SHR RW; transition labels: R, WR; R1, W1; W; W’R’]
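The FSM filter can be approximated with the sketch below. The transitions follow the commonly described Eraser state machine rather than the exact edge labels of the figure, which did not survive extraction, so the transition rules, the per-address granularity, and all names here are assumptions.

```python
from enum import Enum, auto
from typing import Dict


class State(Enum):
    UNINIT = auto()
    PRIVATE = auto()
    SHR_RO = auto()
    SHR_RW = auto()


class FsmFilter:
    """Hypothetical per-address state tracker in the style of Eraser."""

    def __init__(self) -> None:
        self.state: Dict[int, State] = {}
        self.owner: Dict[int, int] = {}  # first thread to touch each address

    def should_filter(self, addr: int, tid: int, is_write: bool) -> bool:
        s = self.state.get(addr, State.UNINIT)
        if s is State.UNINIT:
            # First access of any kind makes the address private to tid.
            self.owner[addr] = tid
            s = State.PRIVATE
        elif s is State.PRIVATE and tid != self.owner[addr]:
            # A second thread shares the address: read-only or read-write.
            s = State.SHR_RW if is_write else State.SHR_RO
        elif s is State.SHR_RO and is_write:
            s = State.SHR_RW
        self.state[addr] = s
        # Only references that reach, or arrive in, SHR_RW go to the detector.
        return s is not State.SHR_RW


fsm = FsmFilter()
x = 0x2000
accesses = [(1, True), (1, False), (2, False), (1, False), (2, True), (1, False)]
results = [fsm.should_filter(x, tid, w) for tid, w in accesses]
```

In this sketch the first four accesses are filtered (Private, then Shared Read-Only), and only the write by the second thread and the accesses after it are passed on.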
Experimental Setup
Environments
- 4-way 2.5 GHz Pentium 4 workstation
- uses the SPLASH-2 applications
- each application runs with 4 threads on 4 processors
Measurements
- filtering statistics are collected by running each application three times
- performance results are collected by running each application nine times
- each application is run in Thread Checker with and without the three filters
- the number of data-race bugs reported with and without the filters is compared
Filtering Effectiveness
[Figures: filtering effectiveness of different filter combinations; incremental filtering effectiveness]
Performance
[Figure: speedups obtained with filtering]
Data-race Detection
Characterizing the impact of the three filters combined
Conclusions and Future Work
Conclusion
- Intel Thread Checker incurs a slowdown of 233x on average
- the approach filters out the vast majority of memory references
- three filters are developed that filter 98% of all memory references
- filtering yields speedups of 3.3x on average
Future Work
- improve the FSM filter
- reduce the other sources of overhead in Thread Checker