Post on 23-Feb-2016
Improving Bloom Filter Configuration for Lazy Transactional Memory
Mark Jeffrey and J. Gregory Steffan
ECE, University of Toronto
November 10, 2011
Mark Jeffrey, Improving Bloom Filter Configuration for Lazy TM 2
Parallel Programming is Hard
T1: Rd(a), Rd(b), Wr(a)
T2: Rd(a), Wr(c), Rd(a)
T3: Rd(x), Rd(a)
Tools offload some of the burden of managing data accesses:
– Memory Race Replay
– Atomicity Violation Survival
– Transactional Memory
– Speculative Optimizations
Many of these tools use Bloom filters
Bloom Filter
• Bit-vector-based data structure [1970]
– offers fast set operations
– in exchange for some imprecision
• Recently used to compare memory accesses
• With unconventional practices: intersection (&)
We show these new practices are inefficient (in theory and empirically)!
Bloom Filters in Concurrency Tools
System     Year  Application
Bulk       2006  Hardware TM
BulkSC     2007  Memory Consistency
HARD       2007  Race Detection
DeLorean   2008  Deterministic Race Replay
SoftSig    2008  Code Analysis/Optimization/Debug
RingSTM    2008  Software TM
SigRace    2009  Race Detection
ColorSafe  2010  Atomicity Violation
InvalSTM   2010  Software TM
AdapSig    2010  Software TM
SvS        2011  Auto-protection of shared state
Our proposals will improve parallelism!
Tracking Address-Set Conflicts
Address-Sets
Read Set:
• memory locations read
• R_T1 = {a, b}
Write Set:
• memory locations written
• W_T1 = {a}
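A minimal sketch of how read and write sets fall out of the trace above, assuming each transaction is recorded as a list of hypothetical `(operation, address)` pairs:

```python
# Hypothetical encoding of the slide's trace: (operation, address) pairs.
trace = {
    "T1": [("Rd", "a"), ("Rd", "b"), ("Wr", "a")],
    "T2": [("Rd", "a"), ("Wr", "c"), ("Rd", "a")],
    "T3": [("Rd", "x"), ("Rd", "a")],
}


def address_sets(ops):
    """Return (read_set, write_set) for one transaction's operations."""
    reads = {addr for op, addr in ops if op == "Rd"}
    writes = {addr for op, addr in ops if op == "Wr"}
    return reads, writes


R1, W1 = address_sets(trace["T1"])
print(sorted(R1), sorted(W1))  # ['a', 'b'] ['a']
```

Repeated accesses collapse naturally because sets ignore duplicates, so T2's two reads of `a` contribute one element.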
Burden: Address-Set Conflicts
Conflicts
– address accesses are dependent
– independence -> parallelism!
– address conflicts -> no parallelism
Conflict detection requires comparing read and write sets
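The comparison can be sketched with exact sets, assuming the usual rule that two transactions conflict when at least one of them writes an address the other reads or writes. The function names are illustrative only.

```python
def address_sets(ops):
    """Split a transaction's (operation, address) pairs into read and write sets."""
    reads = {addr for op, addr in ops if op == "Rd"}
    writes = {addr for op, addr in ops if op == "Wr"}
    return reads, writes


def conflicts(ops_i, ops_j):
    """Conflict if either transaction writes an address the other reads or
    writes; read/read sharing is independent and allows parallelism."""
    r_i, w_i = address_sets(ops_i)
    r_j, w_j = address_sets(ops_j)
    return bool(w_i & (r_j | w_j)) or bool(w_j & r_i)


T1 = [("Rd", "a"), ("Rd", "b"), ("Wr", "a")]
T2 = [("Rd", "a"), ("Wr", "c"), ("Rd", "a")]
T3 = [("Rd", "x"), ("Rd", "a")]

print(conflicts(T1, T2))  # True: T1 writes a, which T2 reads
print(conflicts(T1, T3))  # True: T1 writes a, which T3 reads
print(conflicts(T2, T3))  # False: T2 only writes c, which T3 never touches
```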
Lazy Conflict Detection
• Detect conflicts at the end of a transaction
• Test address-sets for null intersections
T1: Wr(b), Rd(a), Rd(c) -> R1 = {a, c}, W1 = {b}
T2: Rd(a), Rd(b)
$R_2 \cap W_1 = \emptyset$?
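A small sketch of the end-of-transaction check on this example, assuming T1 finishes first so T2 must test whether anything it read was written by T1. With exact sets the test is a plain null-intersection:

```python
def null_intersection(s1, s2):
    """Lazy conflict detection reduces to testing S1 ∩ S2 = ∅."""
    return not (s1 & s2)


R1, W1 = {"a", "c"}, {"b"}   # T1: Wr(b), Rd(a), Rd(c)
R2, W2 = {"a", "b"}, set()   # T2: Rd(a), Rd(b)

# Assumption: T1 reaches its end first, so T2 is validated against W1.
if null_intersection(R2, W1):
    print("no conflict: T2 may commit")
else:
    print("conflict on", sorted(R2 & W1))  # conflict on ['b']
```

Bloom filters enter when these sets are too expensive to store and intersect exactly.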
Bloom Filters (BF)
Bloom Filter Background
• Bloom filter is a compact set representation
– bit vector, much smaller than the address space
Insert an address, x: hash h() selects the bit h(x) to set in BF(S)
Query for an address, y: $y \in BF(S)$? (is bit h(y) set?) -> {Yes, No}
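A minimal single-hash Bloom filter matching the insert and query steps above. The SHA-256-based `h()` is an assumption chosen only for determinism; the slides do not specify a hash.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: one hash function h() over an m-bit vector."""

    def __init__(self, m):
        self.m = m
        self.bits = [0] * m

    def _h(self, addr):
        # Assumed hash: SHA-256 of the address, reduced modulo m.
        digest = hashlib.sha256(str(addr).encode()).digest()
        return int.from_bytes(digest[:8], "little") % self.m

    def insert(self, addr):
        """Insert x: set the bit h(x)."""
        self.bits[self._h(addr)] = 1

    def query(self, addr):
        """Query y: is the bit h(y) set? (True may be wrong; False never is.)"""
        return self.bits[self._h(addr)] == 1


bf = BloomFilter(64)
bf.insert(0x1000)
print(bf.query(0x1000))  # True: inserted addresses are always found
```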
Bloom Filter False Positives (FPs)
• Encode a large address space into a bit vector
– the response to a query is actually No or Maybe
• False Positives: when "maybe" is wrong
– e.g., x and y map to the same bit, so after inserting only x, "is y in BF(S)?" answers "maybe"
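To make a collision easy to see, this toy sketch uses the hypothetical hash h(x) = x mod m, which is far too weak for real use but shows exactly how "maybe" can be wrong while "no" is always right:

```python
class ModBloomFilter:
    """Toy Bloom filter with h(x) = x mod m, chosen only so collisions are obvious."""

    def __init__(self, m):
        self.m = m
        self.bits = [0] * m

    def insert(self, addr):
        self.bits[addr % self.m] = 1

    def query(self, addr):
        """Return 'maybe' if bit h(addr) is set, otherwise a definite 'no'."""
        return "maybe" if self.bits[addr % self.m] else "no"


bf = ModBloomFilter(8)
x, y = 3, 11          # 3 % 8 == 11 % 8 == 3, so x and y share a bit
bf.insert(x)
print(bf.query(x))    # maybe (correct: x is in the set)
print(bf.query(y))    # maybe (a false positive: y was never inserted)
print(bf.query(4))    # no (always correct)
```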
Partitioned Bloom Filter
Insert an address, x:
– k hash functions h1(), h2(), …, hk() encode k bit indices to set, one per partition of BF(S)
Query for an address, y:
– h1(y), …, hk(y) select one bit in each partition
– $y \in BF(S)$? -> {Maybe, No}
Probability of False Positives is well understood
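A sketch of the partitioned layout: m bits split into k partitions of m/k bits, with hash h_i indexing only partition i. The hash family is an assumption, and `false_positive_probability` uses the standard estimate (1 - (1 - k/m)^n)^k for n inserted addresses under idealized independent, uniform hashing.

```python
import hashlib


class PartitionedBloomFilter:
    """m bits split into k partitions of m // k bits; hash h_i indexes partition i."""

    def __init__(self, m, k):
        assert m % k == 0, "assume m is a multiple of k"
        self.k, self.part = k, m // k
        self.bits = [[0] * self.part for _ in range(k)]

    def _h(self, i, addr):
        # Assumed hash family: SHA-256 of "i:addr", reduced to one partition.
        digest = hashlib.sha256(f"{i}:{addr}".encode()).digest()
        return int.from_bytes(digest[:8], "little") % self.part

    def insert(self, addr):
        for i in range(self.k):
            self.bits[i][self._h(i, addr)] = 1

    def query(self, addr):
        """True means 'maybe'; False is a definite 'no'."""
        return all(self.bits[i][self._h(i, addr)] for i in range(self.k))


def false_positive_probability(m, k, n):
    """Each bit is set with probability 1 - (1 - k/m)^n; a false positive
    needs the queried bit set in all k partitions."""
    return (1 - (1 - k / m) ** n) ** k


bf = PartitionedBloomFilter(m=64, k=4)
for addr in range(8):
    bf.insert(addr)
print(all(bf.query(a) for a in range(8)))  # True: no false negatives
print(false_positive_probability(64, 4, 8))
```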
Unconventional Bloom Filter Null-Intersection Tests
Bloom Filter Null-Intersection Tests
Two existing approaches:
1. build a Queue of Queries (QoQ): query each address a1, a2, a3, a4, a5 against the other set's Bloom filter
2. combine the queries into a distinct Bloom filter
– replace many queries with 1 intersection!
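Approach 1 can be sketched as follows, assuming one set is kept exactly as a queue of addresses and the other is summarized in a partitioned Bloom filter (hash family assumed, as before). Any single "maybe" answer ends the test with a possible overlap.

```python
import hashlib


class PartitionedBloomFilter:
    """k partitions of m // k bits; hash h_i picks one bit in partition i."""

    def __init__(self, m, k):
        self.k, self.part = k, m // k
        self.bits = [[0] * self.part for _ in range(k)]

    def _h(self, i, addr):
        # Assumed hash family: SHA-256 of "i:addr", reduced to one partition.
        digest = hashlib.sha256(f"{i}:{addr}".encode()).digest()
        return int.from_bytes(digest[:8], "little") % self.part

    def insert(self, addr):
        for i in range(self.k):
            self.bits[i][self._h(i, addr)] = 1

    def query(self, addr):
        return all(self.bits[i][self._h(i, addr)] for i in range(self.k))


def qoq_disjoint(queue, bf):
    """Queue of Queries: query every queued address against the other set's
    filter; any 'maybe' means the sets may overlap."""
    return not any(bf.query(addr) for addr in queue)


writes = PartitionedBloomFilter(m=256, k=4)   # filter for one address-set
for addr in (0x10, 0x20, 0x30):
    writes.insert(addr)

reads = [0x40, 0x50, 0x20]                    # exact queue for the other set
print(qoq_disjoint(reads, writes))  # False: 0x20 is in both sets
```

The price is one query per queued address, which is the overhead the empirical results later weigh against fewer aborts.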
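Partitioned BF Intersection
Do two sets share any elements: $S_1 \cap S_2 = \emptyset$?
– AND (&) the two filters partition by partition
– an overlap is possible only if every partition of the result has an asserted bit
-> {Disjoint, Maybe Overlap}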
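A sketch of the partitioned intersection test, assuming both filters share m, k, and the same (assumed) hash family. An address in both sets sets the same bit in every partition of both filters, so a partition whose AND is all zeros proves the sets disjoint.

```python
import hashlib


class PartitionedBloomFilter:
    """k partitions of m // k bits; hash h_i sets one bit in partition i."""

    def __init__(self, m, k):
        self.k, self.part = k, m // k
        self.bits = [[0] * self.part for _ in range(k)]

    def insert(self, addr):
        for i in range(self.k):
            # Assumed hash family: SHA-256 of "i:addr", reduced to one partition.
            digest = hashlib.sha256(f"{i}:{addr}".encode()).digest()
            self.bits[i][int.from_bytes(digest[:8], "little") % self.part] = 1


def partitioned_disjoint(f1, f2):
    """Disjoint if some partition of f1 & f2 has no asserted bit;
    otherwise report Maybe Overlap."""
    for p1, p2 in zip(f1.bits, f2.bits):
        if not any(b1 & b2 for b1, b2 in zip(p1, p2)):
            return True   # Disjoint
    return False          # Maybe Overlap


s1, s2 = PartitionedBloomFilter(256, 4), PartitionedBloomFilter(256, 4)
for addr in (1, 2, 3):
    s1.insert(addr)
for addr in (3, 4):
    s2.insert(addr)
print(partitioned_disjoint(s1, s2))  # False: address 3 is in both sets
```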
Unpartitioned BF Intersection
$S_1 \cap S_2 = \emptyset$? AND (&) the two filters
Any asserted bits indicate set overlap
-> {Disjoint, Maybe Overlap}
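For contrast, a sketch of the unpartitioned test, where all k (assumed) hash functions index one shared m-bit vector. Any single asserted bit in the AND is read as a possible overlap, so any of the many bit pairs from the two sets can cause a false set-overlap.

```python
import hashlib


class BloomFilter:
    """Unpartitioned filter: all k hash functions index one shared m-bit vector."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = [0] * m

    def _h(self, i, addr):
        # Assumed hash family: SHA-256 of "i:addr", reduced modulo m.
        digest = hashlib.sha256(f"{i}:{addr}".encode()).digest()
        return int.from_bytes(digest[:8], "little") % self.m

    def insert(self, addr):
        for i in range(self.k):
            self.bits[self._h(i, addr)] = 1


def unpartitioned_disjoint(f1, f2):
    """Any asserted bit in f1 & f2 is reported as a (possible) set overlap."""
    return not any(b1 & b2 for b1, b2 in zip(f1.bits, f2.bits))


s1, s2 = BloomFilter(256, 4), BloomFilter(256, 4)
for addr in (1, 2, 3):
    s1.insert(addr)
for addr in (3, 4):
    s2.insert(addr)
print(unpartitioned_disjoint(s1, s2))  # False: address 3 is in both sets
```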
Imprecision in BF Intersection
• Bloom filters were intended for fast querying
• Recent systems use the filter for intersection
– imprecision can produce False Set-Overlaps (FSOs)
– we are the first to study Bloom filter FSOs
– our goal is to understand and improve Bloom filter intersection
Important Questions
When using BFs for testing null-intersection:
1. How do BF intersection and QoQ compare?
– theoretical study [SPAA '11]
2. Can we compromise?
– new Bloom filter design
3. Does theory work in practice?
– empirical study
Bloom Filters for Null-Intersection Tests
1. How do BF intersection and QoQ compare?
Definitions
• A, B: disjoint address-access sets
• m: bits
• k: partitions, one per hash function h1(), h2(), …, hk()
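Probability of FSO [SPAA '11]
• Unpartitioned BF intersection: $p_{Unpart} = 1 - \left(1 - \frac{1}{m}\right)^{k^2 |A| |B|}$
• Partitioned BF intersection: $p_{Part} = \left[1 - \left(1 - \frac{k}{m}\right)^{|A| |B|}\right]^{k}$
• Queue of BF queries (each $b \in B$ queried against BF(A)): $p_{QoQ} = 1 - \left[1 - \left(1 - \left(1 - \frac{k}{m}\right)^{|A|}\right)^{k}\right]^{|B|}$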
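The three FSO probabilities above transcribe directly into code. This is a sketch for evaluating the formulas numerically; the function names and example parameters are illustrative, not from the talk.

```python
def p_unpart(m, k, a, b):
    """Unpartitioned intersection: any of the k|A| x k|B| bit pairs collides."""
    return 1 - (1 - 1 / m) ** (k * k * a * b)


def p_part(m, k, a, b):
    """Partitioned intersection: all k partitions must show a collision."""
    return (1 - (1 - k / m) ** (a * b)) ** k


def p_qoq(m, k, a, b):
    """Queue of queries: at least one of |B| queries against BF(A) is a false positive."""
    return 1 - (1 - (1 - (1 - k / m) ** a) ** k) ** b


for name, p in (("Unpart", p_unpart), ("Part", p_part), ("QoQ", p_qoq)):
    print(name, p(m=1024, k=4, a=16, b=16))
```

With |A| = |B| = 1, the partitioned and QoQ formulas both reduce to (k/m)^k, the false-positive rate of a single partitioned query.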
Comparing FSOs [SPAA '11]
For any length m and k > 1 hash functions,
$p_{QoQ} \le p_{Part} \le p_{Unpart}$
• Queue of Queries gives the fewest false conflicts
• Partitioned intersection improves on unpartitioned
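A quick numerical sanity check of the ordering over a small, arbitrarily chosen grid of configurations (a check, not a proof). The grid keeps m well above k, and the small tolerance absorbs floating-point rounding when probabilities approach 1.

```python
from itertools import product


def p_unpart(m, k, a, b):
    return 1 - (1 - 1 / m) ** (k * k * a * b)


def p_part(m, k, a, b):
    return (1 - (1 - k / m) ** (a * b)) ** k


def p_qoq(m, k, a, b):
    return 1 - (1 - (1 - (1 - k / m) ** a) ** k) ** b


TOL = 1e-12  # absorbs rounding when probabilities are close to 1
for m, k, a, b in product((256, 1024, 4096), (2, 4, 8), (1, 4, 16, 64), (1, 4, 16, 64)):
    q, p, u = p_qoq(m, k, a, b), p_part(m, k, a, b), p_unpart(m, k, a, b)
    assert q <= p + TOL and p <= u + TOL, (m, k, a, b)
print("p_QoQ <= p_Part <= p_Unpart held for every tested configuration")
```

Intuitively, QoQ reports an overlap only if one queried address hits in all k partitions, while partitioned intersection needs a hit in each partition from possibly different addresses, so every QoQ false conflict is also a partitioned one.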
Bloom Filters for Null-Intersection Tests
2. Can we compromise? A new Bloom filter design
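Batch-of-Bloom-filters (BoB)
• Insert an address, x: a pre-hash hpre(x) selects one of b Bloom filters, whose hash functions h1(), …, hk() then encode x: $x \to BoB(S)$
• $S = S_1 \cup S_2 \cup \dots \cup S_b$
• $BoB(S) = \{BF(S_1), BF(S_2), \dots, BF(S_b)\}$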
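A minimal BoB sketch under stated assumptions: each of the b sub-filters is a partitioned Bloom filter of m bits (the slide does not fix their size or layout), and `h_pre` plus the per-partition hashes come from a hypothetical SHA-256-based hash family.

```python
import hashlib


def _hash(tag, addr, size):
    # Assumed hash family: SHA-256 of "tag:addr", reduced modulo size.
    digest = hashlib.sha256(f"{tag}:{addr}".encode()).digest()
    return int.from_bytes(digest[:8], "little") % size


class PartitionedBloomFilter:
    """k partitions of m // k bits; hash h_i sets one bit in partition i."""

    def __init__(self, m, k):
        self.k, self.part = k, m // k
        self.bits = [[0] * self.part for _ in range(k)]

    def insert(self, addr):
        for i in range(self.k):
            self.bits[i][_hash(i, addr, self.part)] = 1

    def query(self, addr):
        return all(self.bits[i][_hash(i, addr, self.part)] for i in range(self.k))


class BatchOfBloomFilters:
    """BoB(S) = {BF(S_1), ..., BF(S_b)}: h_pre splits S into b disjoint subsets."""

    def __init__(self, b, m, k):
        self.b = b
        self.filters = [PartitionedBloomFilter(m, k) for _ in range(b)]

    def h_pre(self, addr):
        return _hash("pre", addr, self.b)

    def insert(self, addr):
        self.filters[self.h_pre(addr)].insert(addr)

    def query(self, addr):
        return self.filters[self.h_pre(addr)].query(addr)


bob = BatchOfBloomFilters(b=4, m=64, k=2)
for addr in range(12):
    bob.insert(addr)
print(all(bob.query(a) for a in range(12)))  # True: no false negatives
```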
BoB Intersection
$S_1 \cap S_2 = \emptyset$? AND (&) corresponding Bloom filters of the two batches
-> {Disjoint, Maybe Overlap}
BoB: a compromise between QoQ and intersection
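A sketch of BoB intersection, assuming both batches use the same b, m, k and the same hypothetical hash family. Because h_pre sends an address to the same index in both batches, only corresponding sub-filters need intersecting, and each pair uses the partitioned test.

```python
import hashlib


def _hash(tag, addr, size):
    # Assumed hash family: SHA-256 of "tag:addr", reduced modulo size.
    digest = hashlib.sha256(f"{tag}:{addr}".encode()).digest()
    return int.from_bytes(digest[:8], "little") % size


class PartitionedBloomFilter:
    """k partitions of m // k bits; hash h_i sets one bit in partition i."""

    def __init__(self, m, k):
        self.k, self.part = k, m // k
        self.bits = [[0] * self.part for _ in range(k)]

    def insert(self, addr):
        for i in range(self.k):
            self.bits[i][_hash(i, addr, self.part)] = 1


class BatchOfBloomFilters:
    """b sub-filters; the pre-hash h_pre picks the one that holds each address."""

    def __init__(self, b, m, k):
        self.filters = [PartitionedBloomFilter(m, k) for _ in range(b)]

    def insert(self, addr):
        self.filters[_hash("pre", addr, len(self.filters))].insert(addr)


def partitioned_disjoint(f1, f2):
    """Disjoint if some partition of f1 & f2 has no asserted bit."""
    return any(not any(x & y for x, y in zip(p1, p2))
               for p1, p2 in zip(f1.bits, f2.bits))


def bob_disjoint(s1, s2):
    """Report Disjoint only if every pair of corresponding sub-filters is disjoint."""
    return all(partitioned_disjoint(f1, f2)
               for f1, f2 in zip(s1.filters, s2.filters))


reads, writes = BatchOfBloomFilters(4, 64, 2), BatchOfBloomFilters(4, 64, 2)
for addr in (1, 2, 3):
    reads.insert(addr)
for addr in (3, 40):
    writes.insert(addr)
print(bob_disjoint(reads, writes))  # False: address 3 is in both sets
```

Each sub-filter sees only the addresses routed to it, which is one way to read "compromise": intersection stays a bulk bit operation, while each comparison covers a smaller subset, closer to the per-address precision of QoQ.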
Bloom Filters for Null-Intersection Tests
3. Does theory work in practice?
Methodology
• Augment RingSTM [Spear et al. SPAA '08] with alternate BF configurations
– RingSTM's baseline: unpartitioned Bloom filter intersection
• Stress BF configurations using the STAMP benchmark suite
• 8-core Intel Xeon with SSE2 ISA
– 32-bit Linux 2.6.32-5-686
Performance Results: Labyrinth
[Figure: execution time and aborts per BF configuration]
• QoQ, BoB, and partitioned intersection outperform the baseline
• 21% speedup
Performance Results: Kmeans-low
[Figure: execution time and aborts per BF configuration]
• >25% slowdown
• Querying overhead counteracts the reduced aborts
Conclusion
Conflict detection often applies Bloom filters
– for fast set operations: $y \in S$ and $S_1 \cap S_2$
– unconventionally using BFs for null-intersection tests
Our recommendations (from theory and practice):
1. strongly consider querying before intersection
2. in hardware, consider intersecting BoBs
3. build adaptive systems for application behaviors
Thank you! markj@eecg.toronto.edu