Date posted: 21-Dec-2015
Beyond Set Disjointness: The Communication Complexity of Finding the Intersection
Grigory Yaroslavtsev, http://grigory.us
Joint work with Brody, Chakrabarti, Kondapally and Woodruff
Communication Complexity [Yao'79]
Alice: $x$    Bob: $y$
$f(x, y) = ?$
Shared randomness
…
• $R(f)$ = min. communication (bounded error)
• $R^r(f)$ = min. $r$-round communication (bounded error)
Set Intersection
$x = S$, $y = T$, $f(x, y) = S \cap T$
$S \subseteq [n]$, $|S| \le k$; $T \subseteq [n]$, $|T| \le k$
$R(k\text{-Intersection}) = ?$
$R^r(k\text{-Intersection}) = ?$
$k$ is big, $n$ is huge, where huge $\gg$ big
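Our results
Let $\operatorname{ilog}^r k = \underbrace{\log \log \ldots \log}_{r \text{ times}} k$
• $R^{O(r)}(k\text{-Intersection}) = O(k \operatorname{ilog}^r k)$ [Brody, Chakrabarti, Kondapally, Woodruff, Y.; PODC'14]
• $R^{r}(k\text{-Intersection}) = \Omega(k \operatorname{ilog}^r k)$ [Saglam-Tardos FOCS'13; Brody, Chakrabarti, Kondapally, Woodruff, Y.; RANDOM'14]
$R^{r}(k\text{-Intersection}) = \Theta(k)$ for $r = O(\log^* k)$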
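The bounds on this slide are parametrized by a logarithm applied $r$ times. The following is a minimal sketch of that quantity and of $\log^* k$, assuming base-2 logarithms and a clamp at 1 (both are illustrative choices, not taken from the slides); the names `ilog` and `log_star` are hypothetical.

```python
import math


def ilog(r, k):
    """Apply log2 to k, r times (assumed base 2), never dropping below 1."""
    value = float(k)
    for _ in range(r):
        if value <= 1.0:
            return 1.0
        value = math.log2(value)
    return max(value, 1.0)


def log_star(k):
    """Number of times log2 must be applied to k before the value is <= 1."""
    count = 0
    value = float(k)
    while value > 1.0:
        value = math.log2(value)
        count += 1
    return count


# 65536 -> 16 -> 4, so two applications of log2 give 4.0
print(ilog(2, 65536))
# 65536 -> 16 -> 4 -> 2 -> 1 takes four applications
print(log_star(65536))
```

With these helpers, a bound of the form $k \operatorname{ilog}^r k$ can be evaluated as `k * ilog(r, k)`; once `r` reaches `log_star(k)` the factor is a constant.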
Applications
• Exact Jaccard index (for $\varepsilon$-approximate use MinHash [Broder'98; Li-Konig'11; Pagh-Stöckel-Woodruff'14])
• Rarity, distinct elements, joins, …
• Multi-party set intersection (later)
• Contrast:
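To illustrate the first application: once the exact intersection is known, the exact Jaccard index follows from the two set sizes alone. This is a small sketch; the function name `exact_jaccard` and the convention of returning 1.0 for two empty sets are assumptions made for the example.

```python
def exact_jaccard(size_s, size_t, size_intersection):
    """Jaccard index |S ∩ T| / |S ∪ T| computed from sizes only."""
    size_union = size_s + size_t - size_intersection
    if size_union == 0:
        # Assumed convention: two empty sets are treated as identical.
        return 1.0
    return size_intersection / size_union


S = {1, 2, 3}
T = {2, 3, 4}
# |S ∩ T| = 2 and |S ∪ T| = 4, so the index is 0.5
print(exact_jaccard(len(S), len(T), len(S & T)))
```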
1-round $O(k \log k)$-protocol
$h : [n] \to [k^3]$
$S \subseteq [n] \mapsto h(S) \subseteq [k^3]$
$T \subseteq [n] \mapsto h(T) \subseteq [k^3]$
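The one-round idea (hash both sets into a range of size $k^3$ and compare hashes) can be sketched as follows. This is an illustration under stated assumptions: the shared randomness is modelled by a seeded generator, the hash family `(a*x + b) mod p mod k^3` is a stand-in chosen for the example, and the names `shared_hash` and `one_round_intersection` are hypothetical.

```python
import math
import random

P = (1 << 61) - 1  # a Mersenne prime, used only by the illustrative hash family


def shared_hash(seed, range_size):
    """Hash function both players can derive from the same shared seed."""
    rng = random.Random(seed)
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda x: ((a * x + b) % P) % range_size


def one_round_intersection(S, T, k, seed=0):
    """Alice sends h(S); Bob keeps the elements of T whose hash was sent."""
    h = shared_hash(seed, k ** 3)
    message = {h(x) for x in S}  # the single message, Alice -> Bob
    bits_per_hash = max(1, math.ceil(3 * math.log2(max(k, 2))))
    bits_sent = len(message) * bits_per_hash
    # Bob's output always contains S ∩ T; a hash collision can add extra items.
    candidates = {y for y in T if h(y) in message}
    return candidates, bits_sent


S = {1, 5, 9, 12}
T = {5, 9, 40, 77}
result, bits = one_round_intersection(S, T, k=4)
print(sorted(result), bits)
```

Each of the at most $k$ hashes costs about $3 \log k$ bits, which is where the $O(k \log k)$ total comes from.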
Hashing
$h : [n] \to [k / \log k]$
$k / \log k$ = # of buckets
Expected # of elements per bucket = $\log k$
Secondary Hashing
$k / \log k$ = # of hash functions
$h_i : [n] \to [\log^3 k]$, where $i \in [k / \log k]$
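2-Round $O(k \log \log k)$-protocol
Each bucket is hashed into $[\log^3 k]$
$|h_i(S_i)|, |h_i(T_i)| = O(\log k \log \log k)$
Total communication = $\frac{k}{\log k} \cdot O(\log k \log \log k) = O(k \log \log k)$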
Collisions
$k / \log k$ buckets, each hashed into $[\log^3 k]$
$\Pr[\text{collision}] = O\left(\frac{1}{\log k}\right)$
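The collision bound can be checked with a union bound over pairs. The derivation below is a sketch under two assumptions that are not spelled out on the slide: a bucket holds $m = O(\log k)$ elements, and two distinct elements collide under the secondary hash with probability at most $1 / \log^3 k$.

```latex
\Pr[\text{collision in a bucket}]
  \;\le\; \binom{m}{2} \cdot \frac{1}{\log^3 k}
  \;\le\; \frac{m^2}{2 \log^3 k}
  \;=\; O\!\left(\frac{\log^2 k}{\log^3 k}\right)
  \;=\; O\!\left(\frac{1}{\log k}\right)
```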
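Collisions
Each bucket is hashed into $[\log^3 k]$
Key fact: if Alice's and Bob's candidate intersections for a bucket are equal, then they also equal the true intersection $S_i \cap T_i$ of that bucket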
Collisions
• Second round:
  – For each bucket send a short equality check
  – Correct intersection computed in buckets where the equality check passes
  – Expected # items in incorrect buckets = $O(k / \log k)$
  – Use 1-round protocol for incorrect buckets
  – Total communication $O(k \log \log k)$
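The bucketed scheme from the last few slides can be put together in a short simulation. This is a sketch, not the protocol from the paper: the equality check is idealised as a direct comparison of the two candidate sets, the fallback for incorrect buckets is modelled as an exact exchange rather than the hashed 1-round protocol, and the hash family and all names (`make_hash`, `two_round_intersection`) are assumptions made for the example.

```python
import math
import random

P = (1 << 61) - 1  # prime modulus for the illustrative hash family


def make_hash(rng, range_size):
    """Draw one hash function x -> [range_size] from shared randomness."""
    a, b = rng.randrange(1, P), rng.randrange(P)
    return lambda x: ((a * x + b) % P) % range_size


def two_round_intersection(S, T, k, seed=0):
    rng = random.Random(seed)  # stands in for the shared random string
    log_k = max(1, math.ceil(math.log2(max(k, 2))))
    num_buckets = max(1, k // log_k)
    h = make_hash(rng, num_buckets)  # primary hash into k / log k buckets
    secondary = [make_hash(rng, log_k ** 3) for _ in range(num_buckets)]

    result, retried = set(), 0
    for i in range(num_buckets):
        S_i = {x for x in S if h(x) == i}
        T_i = {y for y in T if h(y) == i}
        # Round 1: the players exchange the hashed buckets.
        hashed_s = {secondary[i](x) for x in S_i}
        hashed_t = {secondary[i](y) for y in T_i}
        cand_alice = {x for x in S_i if secondary[i](x) in hashed_t}
        cand_bob = {y for y in T_i if secondary[i](y) in hashed_s}
        # Round 2: equality check (idealised here as exact comparison).
        if cand_alice == cand_bob:
            result |= cand_alice  # equal candidates must be S_i ∩ T_i
        else:
            retried += 1
            result |= S_i & T_i  # stand-in for rerunning the 1-round protocol
    return result, retried


S = set(range(0, 64, 2))
T = set(range(0, 64, 3))
found, retried = two_round_intersection(S, T, k=32)
print(sorted(found), retried)
```

The comparison step relies on the fact that each candidate set is sandwiched between the true bucket intersection and the player's own bucket, so equal candidates can only be the true intersection.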
Main protocol
$h : [n] \to [k]$
$k$ = # of buckets
Expected # of elements per bucket = $O(1)$
Verification tree
Node degree: … $\operatorname{ilog}^{r-1} k$
buckets = leaves of the verification tree
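Verification bottom-up
Children: $(S_i^*, T_i^*)$ and $(S_j^*, T_j^*)$
Parent: $(S_i^* \cup S_j^*,\ T_i^* \cup T_j^*)$
Children compute $S_i^* \cap T_i^*$ and $S_j^* \cap T_j^*$; the parent verifies $(S_i^* \cup S_j^*) \cap (T_i^* \cup T_j^*)$
EQUALITY CHECK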
Verification bottom-up
Children: $S_i^* \cap T_i^*$ and $S_j^* \cap T_j^*$; parent: $(S_i^* \cup S_j^*) \cap (T_i^* \cup T_j^*)$
First tree: children Correct, Incorrect; parent Incorrect
Second tree: children Correct, Incorrect; parent Correct
Verification bottom-up
Children: $S_i^* \cap T_i^*$ and $S_j^* \cap T_j^*$; parent: $(S_i^* \cup S_j^*) \cap (T_i^* \cup T_j^*)$
First tree: children Correct, Incorrect
EQUALITY CHECK FAILS => RESTART THE SUBTREE
Second tree: children Correct, Incorrect; parent Correct
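The restart rule can be illustrated with a toy simulation of bottom-up verification. Everything probabilistic here is an assumption made for the example: leaf errors are injected artificially with probability `leaf_err`, each level's equality check wrongly accepts unequal sets with a per-level probability `false_accept[level]`, and the root check is taken to be exact. The names `build_tree`, `basic_intersection` and `verify` are hypothetical.

```python
import random


def build_tree(pairs, degree=2):
    """Leaves hold bucket pairs (S_i, T_i); internal nodes group `degree` children."""
    nodes = [{"S": S, "T": T, "children": [], "level": 0} for S, T in pairs]
    level = 0
    while len(nodes) > 1:
        level += 1
        nodes = [{"children": nodes[i:i + degree], "level": level}
                 for i in range(0, len(nodes), degree)]
    return nodes[0]


def basic_intersection(S, T, rng, leaf_err):
    """Simulated leaf protocol: Alice's candidate may contain one spurious item."""
    common = S & T
    alice, bob = set(common), set(common)
    extra = sorted(S - T)
    if extra and rng.random() < leaf_err:
        alice.add(rng.choice(extra))
    return alice, bob


def verify(node, rng, leaf_err, false_accept, stats):
    """Compute the node's candidate sets, restarting its subtree on a failed check."""
    while True:
        if node["children"]:
            parts = [verify(c, rng, leaf_err, false_accept, stats)
                     for c in node["children"]]
            alice = set().union(*(a for a, _ in parts))
            bob = set().union(*(b for _, b in parts))
        else:
            alice, bob = basic_intersection(node["S"], node["T"], rng, leaf_err)
        # Equality check: passes if the sets agree, or by a simulated false accept.
        if alice == bob or rng.random() < false_accept[node["level"]]:
            return alice, bob
        stats["restarts"] += 1  # check failed: rerun everything below this node


pairs = [({1, 2, 3}, {2, 3, 9}), ({10, 11}, {11, 12}),
         ({20}, {21}), ({30, 31}, {30, 31})]
stats = {"restarts": 0}
alice, bob = verify(build_tree(pairs), random.Random(1), 0.3, [0.2, 0.05, 0.0], stats)
print(sorted(alice), stats["restarts"])
```

Because the simulated root check never accepts unequal sets, the loop only returns once Alice's and Bob's unions agree, which forces both to equal the true intersection; lower levels use cheaper, less reliable checks, and their mistakes are caught higher up at the cost of a restart.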
Verification bottom-up
[Figure: verification tree with set pairs $(S, T)$ at the nodes of consecutive levels]
Analysis of Stage $i$
• […] = $\Pr$[node at stage $i$ computed correctly]
• Set […] = […]
  – Run equality checks and basic intersection protocols with success probability […]
  – Key lemma: $\mathbb{E}$[# of restarts per leaf] […] => Cost of Intersection in leaves = […]
  – Cost of Equality = […]
• $\Pr$[protocol succeeds] = […]
Multi-party extensions
$m$ players: $S_1, \ldots, S_m$, where […]
• Reduce error probability of the 2-player protocol to […]
• Average communication per player (using a coordinator): […] in […] rounds
• Worst-case communication per player (using a tournament): […] in […] rounds
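One way to picture the tournament variant is a knockout bracket in which surviving players repeatedly run the 2-player protocol in pairs. The sketch below is only an illustration of that pairing pattern, not the scheme from the paper: the 2-player protocol is replaced by an exact stand-in (`two_party`), and both function names are hypothetical.

```python
def two_party(a, b):
    """Stand-in for a 2-player intersection protocol (assumed error-free here)."""
    return a & b


def tournament_intersection(sets):
    """Pair up the surviving players each round until one set remains."""
    if not sets:
        raise ValueError("need at least one player")
    current = [set(s) for s in sets]
    rounds = 0
    while len(current) > 1:
        nxt = [two_party(current[i], current[i + 1])
               for i in range(0, len(current) - 1, 2)]
        if len(current) % 2:
            nxt.append(current[-1])  # an unpaired player advances unchanged
        current = nxt
        rounds += 1
    return current[0], rounds


players = [{1, 2, 3, 4}, {2, 3, 4, 5}, {2, 3, 9}, {0, 2, 3}, {2, 3, 7}]
# 5 -> 3 -> 2 -> 1 players, i.e. three tournament rounds
print(tournament_intersection(players))
```

In this pattern each player takes part in at most one 2-player run per tournament round, which is why a tournament is a natural fit for bounding the worst-case load per player.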
Open Problems
• $R^r(k\text{-Intersection}) = ?$
• Better protocols for the multi-party setting?
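$k$-Disjointness
• $S, T \subseteq [n]$, $|S|, |T| \le k$; $f(S, T) = 1$ iff $S \cap T = \emptyset$
• $R(k\text{-Disjointness}) = \Theta(k)$ [Razborov'92; Hastad-Wigderson'96]
• $R^1(k\text{-Disjointness}) = \Theta(k \log k)$ [Folklore + Dasgupta, Kumar, Sivakumar; Buhrman'12, Garcia-Soriano, Matsliah, De Wolf'12]
• $R^r(k\text{-Disjointness}) = \Theta(k \operatorname{ilog}^r k)$ [Saglam, Tardos'13]
• […] [Braverman, Garg, Pankratov, Weinstein'13]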