Database Tuples Play Cooperative Games
Ester Livshits
Joint work with:
Leopoldo Bertossi, Benny Kimelfeld, Alon Reshef, Moshe Sebag
Ester Livshits Oxford Data and Knowledge Seminar 2
AUTHOR
Name    Affiliation
Alice   UCLA
Bob     NYU
Cathy   MIT
David   UCSD
Ellen   NYU

INSTITUTE
Name    State
UCLA    CA
UCSD    CA
NYU     NY
MIT     MA

PUBLICATION
Author  Paper
Alice   A
Alice   B
Bob     C
Cathy   C
Cathy   D
David   C

CITATIONS
Paper   Cits
A       18
B       2
C       8
D       12

Query:
q(z, w) :- AUTHOR(x, y), INSTITUTE(y, 'CA'), PUBLICATION(x, z), CITATIONS(z, w)

Query result:
Paper   Cits
A       18
B       2
C       8
Why did we obtain a particular answer?
Why did we not obtain some other answer?
Which tuples in the database
explain the query result?
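Measuring Contribution
• Causal responsibility [Meliou et al. 2010]
– $t$ is a counterfactual cause for $q$ if $D \models q$ and $D \setminus \{t\} \not\models q$
– $t$ is an actual cause for $q$ if $D \setminus \Gamma \models q$ and $D \setminus (\Gamma \cup \{t\}) \not\models q$ for some $\Gamma \subseteq D \setminus \{t\}$
– The responsibility of $t$ is $\frac{1}{1 + |\Gamma_{\min}|}$, where $\Gamma_{\min}$ is a minimum-size contingency set $\Gamma$
• Not extendable to aggregate queries
• May be counterintuitive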
[Figure: graph example for the query "Is there a path from a to b?", illustrating a contingency set]
Measuring Contribution
• Causal effect [Salimi et al. 2016]
– See the database as a probabilistic database
– $\mathrm{CE}(t) = \mathbb{E}(q \mid t \in D) - \mathbb{E}(q \mid t \notin D)$
What makes the choice of a contribution score a good one?
Shapley Value
• A widely known profit-sharing formula in cooperative game theory
• Introduced by Lloyd Shapley in 1953
• Applied in various areas beyond cooperative game theory:
– Pollution responsibility in environmental management
– Influence measurement in social network analysis
– Identifying candidate autism genes
– Bargaining foundations in economics
– Takeover corporate rights in law
– Explanations in machine learning
Shapley Value
Set $A$ of players; wealth function $v : \mathcal{P}(A) \to \mathbb{R}$
[Figure: players and example wealth values 3, 7, 42]
How to distribute the total wealth among the players?

Application         Players    Wealth
Machine learning    Features   Prediction
Query answering     Tuples     Answer
Inconsistency       Tuples     Measure

References: [Lundberg, Lee 2017], [L, Kimelfeld 2021], [L et al. 2020]
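Shapley Value
$\mathrm{Shapley}(A, v, a) = \sum_{B \subseteq A \setminus \{a\}} \frac{|B|! \, (|A| - |B| - 1)!}{|A|!} \bigl( v(B \cup \{a\}) - v(B) \bigr)$
[Figure: players arriving in a random permutation, with wealth values 72, 21, and 25 and a marginal delta of +4]
The Shapley value is the expected delta due to the addition in a random permutation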
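As a minimal sketch of the subset formula and of its random-permutation reading, the following Python computes both for a small game and lets one compare them. The three-player game and the function names `shapley_subsets` and `shapley_permutations` are illustrative assumptions, not part of the talk.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import factorial


def shapley_subsets(players, v, a):
    """Shapley value of player a via the weighted sum over subsets B of A \\ {a}."""
    others = [p for p in players if p != a]
    n = len(players)
    total = Fraction(0)
    for size in range(len(others) + 1):
        weight = Fraction(factorial(size) * factorial(n - size - 1), factorial(n))
        for B in combinations(others, size):
            B = frozenset(B)
            total += weight * (v(B | {a}) - v(B))
    return total


def shapley_permutations(players, v, a):
    """Average marginal gain of a over all orderings of the players."""
    total = Fraction(0)
    count = 0
    for order in permutations(players):
        before = frozenset(order[:order.index(a)])
        total += v(before | {a}) - v(before)
        count += 1
    return total / count


# Hypothetical game: the wealth is 1 once p1 has at least one partner.
players = ["p1", "p2", "p3"]
v = lambda coalition: 1 if "p1" in coalition and len(coalition) >= 2 else 0

by_subsets = {p: shapley_subsets(players, v, p) for p in players}
by_permutations = {p: shapley_permutations(players, v, p) for p in players}
print(by_subsets)
```

Both functions enumerate exponentially many coalitions or orderings, so this is only meant for toy inputs.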
Shapley Value for Database Queries
• Which tuples in the database explain the query result?
Aggregate query over the AUTHOR, PUBLICATION, and CITATIONS relations shown earlier:
q(z, w) :- AUTHOR(x, y), PUBLICATION(x, z), CITATIONS(z, w)
SUM w ⟨q(z, w)⟩
Players: the AUTHOR tuples
Wealth function: the result of the aggregate query
Shapley(Alice) = 20, Shapley(Cathy) = 14.67, Shapley(Bob) = 2.67, Shapley(David) = 2.67, Shapley(Ellen) = 0
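The per-author scores can be reproduced with a brute-force sketch. It assumes that only the AUTHOR tuples are players and that PUBLICATION and CITATIONS are always present; the helper names `wealth` and `shapley` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial

# Relations of the running example. Assumption of this sketch: only AUTHOR
# tuples are players; the other two relations are always present.
AUTHOR = [("Alice", "UCLA"), ("Bob", "NYU"), ("Cathy", "MIT"),
          ("David", "UCSD"), ("Ellen", "NYU")]
PUBLICATION = [("Alice", "A"), ("Alice", "B"), ("Bob", "C"),
               ("Cathy", "C"), ("Cathy", "D"), ("David", "C")]
CITATIONS = {"A": 18, "B": 2, "C": 8, "D": 12}


def wealth(author_tuples):
    """SUM of w over the distinct answers (z, w) of q on the chosen AUTHOR tuples."""
    names = {name for name, _ in author_tuples}
    papers = {paper for author, paper in PUBLICATION if author in names}
    return sum(CITATIONS[p] for p in papers if p in CITATIONS)


def shapley(players, v, a):
    """Exact Shapley value of player a by enumerating all coalitions."""
    others = [p for p in players if p != a]
    n = len(players)
    total = Fraction(0)
    for size in range(n):
        weight = Fraction(factorial(size) * factorial(n - size - 1), factorial(n))
        for B in combinations(others, size):
            total += weight * (v(set(B) | {a}) - v(set(B)))
    return total


scores = {t[0]: shapley(AUTHOR, wealth, t) for t in AUTHOR}
for name, score in scores.items():
    print(f"{name}: {float(score):.2f}")
```

Paper C has three authors, so its 8 citations are split evenly among Bob, Cathy, and David, which is where the 2.67 comes from.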
Query and result (same database as before):
q(z, w) :- AUTHOR(x, y), INSTITUTE(y, 'CA'), PUBLICATION(x, z), CITATIONS(z, w)
Paper   Cits
A       18
B       2
C       8
Boolean query for the answer (A, 18):
q() :- AUTHOR(x, y), INSTITUTE(y, 'CA'), PUBLICATION(x, 'A'), CITATIONS('A', 18)
Outline
• Explaining Query Answers
• Computational Complexity
• Responsibility to Inconsistency
Computational Complexity
• A CQ $q$ is hierarchical if for every two existential variables $x$ and $y$:
– $\mathrm{Atoms}(x) \subseteq \mathrm{Atoms}(y)$, or
– $\mathrm{Atoms}(y) \subseteq \mathrm{Atoms}(x)$, or
– $\mathrm{Atoms}(x) \cap \mathrm{Atoms}(y) = \emptyset$
Hierarchical example: q1() :- R(x, y), S(x, z)

Query                   Hierarchical   Non-hierarchical
SJFCQ                   PTIME          FP^{#P}-complete
SJFCQ with negations    PTIME          FP^{#P}-complete
sum \ count             PTIME          FP^{#P}-complete

Sources: [L et al. ICDT 2020], [Reshef et al. PODS 2020]
Example query: q() :- AUTHOR(x, y), INSTITUTE(y, 'CA'), PUBLICATION(x, z)
[Figure: diagram over the variables x, y, and z of the example query]
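The hierarchical condition can be checked mechanically. The sketch below assumes a Boolean CQ given as a list of (relation, variables) pairs with constants omitted, so that every variable is existential; the function name `is_hierarchical` and this encoding are illustrative choices.

```python
from itertools import combinations


def is_hierarchical(atoms):
    """atoms: list of (relation, variables) pairs; constants are left out.

    Checks the containment-or-disjointness condition for every pair of
    variables, assuming all variables are existential (Boolean CQ).
    """
    occurs = {}
    for index, (_, variables) in enumerate(atoms):
        for var in variables:
            occurs.setdefault(var, set()).add(index)
    for x, y in combinations(occurs, 2):
        ax, ay = occurs[x], occurs[y]
        if not (ax <= ay or ay <= ax or ax.isdisjoint(ay)):
            return False
    return True


q1 = [("R", ["x", "y"]), ("S", ["x", "z"])]
q2 = [("R", ["x"]), ("S", ["x", "y"]), ("T", ["y"])]
q_author = [("AUTHOR", ["x", "y"]), ("INSTITUTE", ["y"]),
            ("PUBLICATION", ["x", "z"])]

print(is_hierarchical(q1), is_hierarchical(q2), is_hierarchical(q_author))
```

In the AUTHOR example, x occurs in AUTHOR and PUBLICATION while y occurs in AUTHOR and INSTITUTE, so the two atom sets overlap without containment.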
Computational Complexity
Non-hierarchical example: q2() :- R(x), S(x, y), T(y)
Conjunctive Queries
• To prove hardness, we consider the simplest non-hierarchical query
q_RST() :- R(x), S(x, y), T(y)
• Reduction from counting independent sets in a bipartite graph
[Figure: the relations R, S, and T]
Conjunctive Queries
• Each instance provides us with an equation over $|\mathrm{IS}(g, k)|$
• $|\mathrm{IS}(g, k)|$: the number of independent sets of size $k$ in $g$
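To make the quantity |IS(g, k)| concrete, here is a brute-force sketch that counts independent sets by size. It is not the reduction itself, only an illustration of what is being counted; the small bipartite graph and the name `independent_sets_by_size` are assumptions.

```python
from itertools import combinations


def independent_sets_by_size(vertices, edges):
    """Return {k: number of independent sets of size k} by brute force."""
    edge_set = {frozenset(e) for e in edges}
    counts = {}
    for k in range(len(vertices) + 1):
        counts[k] = sum(
            1
            for subset in combinations(vertices, k)
            if not any(frozenset(pair) in edge_set
                       for pair in combinations(subset, 2))
        )
    return counts


# Hypothetical bipartite graph: left side {a1, a2}, right side {b1, b2}.
vertices = ["a1", "a2", "b1", "b2"]
edges = [("a1", "b1"), ("a1", "b2"), ("a2", "b1")]
counts = independent_sets_by_size(vertices, edges)
print(counts)
```

The enumeration is exponential in the number of vertices, which is consistent with the counting problem being the source of hardness.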
Computational Complexity
Example with negation: q() :- AUTHOR(x, y), ¬INSTITUTE(y, 'CA'), PUBLICATION(x, z)
Computational Complexity
Example with aggregation: q(z, w) :- AUTHOR(x, y), PUBLICATION(x, z), CITATIONS(z, w), with SUM w ⟨q(z, w)⟩
Computational Complexity
Other aggregates over q(z, w) :- AUTHOR(x, y), PUBLICATION(x, z), CITATIONS(z, w):
MAX w ⟨q(z, w)⟩, MIN w ⟨q(z, w)⟩, AVERAGE w ⟨q(z, w)⟩
Hardness can be extended to general numerical queries
Approximation Complexity
• Computing the Shapley value is often hard
• The picture is more positive when allowing approximation
• Generalizes to unions of CQs
Multiplicative (FPRAS) guarantee: $\Pr\left[ \frac{f(x)}{1 + \epsilon} \le A(x, \epsilon, \delta) \le (1 + \epsilon) f(x) \right] \ge 1 - \delta$

Query          Hierarchical   Non-hierarchical
SJFCQ          PTIME          FPRAS
sum \ count    PTIME          FPRAS

• Additive approximation via Monte Carlo sampling
• Also a multiplicative approximation due to the "gap property"
• Does not hold when allowing negation
• Negation fundamentally changes the complexity picture!
Approximation Complexity
Additive approximation: $\Pr\left[ f(x) - \epsilon \le A(x, \epsilon, \delta) \le f(x) + \epsilon \right] \ge 1 - \delta$
Gap property: for every tuple $t$ in the database $D$, $\mathrm{Shapley}(t) = 0$ or $\mathrm{Shapley}(t) \ge \frac{1}{p(|D|)}$ for a polynomial $p$
• With negation, the contribution can be negative
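A minimal sketch of the additive approximation by Monte Carlo sampling: draw random permutations and average the marginal gain of the tuple of interest. The toy Boolean game, the sample count, and the name `shapley_monte_carlo` are assumptions chosen for illustration.

```python
import random


def shapley_monte_carlo(players, v, a, samples, seed=0):
    """Estimate the Shapley value of a by averaging its marginal gain
    over uniformly random permutations of the players."""
    rng = random.Random(seed)
    order = list(players)
    total = 0.0
    for _ in range(samples):
        rng.shuffle(order)
        before = frozenset(order[:order.index(a)])
        total += v(before | {a}) - v(before)
    return total / samples


# Hypothetical Boolean "query": true once t1 and at least one of t2, t3 are present.
players = ["t1", "t2", "t3"]
v = lambda s: 1 if "t1" in s and ("t2" in s or "t3" in s) else 0

estimate = shapley_monte_carlo(players, v, "t1", samples=20000)
print(estimate)  # close to the exact value 2/3
```

When every marginal gain lies in a bounded range, a Hoeffding-style bound gives a number of samples polynomial in 1/ε and log(1/δ) for the additive guarantee.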
Approximation Complexity

Register
Student   Course
Alice     OS
Alice     AI
Bob       OS
Cathy     DB
Cathy     IC

Student
Name
Alice
Bob
Cathy
David

TA
Name
Alice
Bob
David

q() :- Student(x), ¬TA(x), Register(x, y)
In some cases, deciding whether Shapley(t) ≠ 0 is hard
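The sign behaviour under negation can be observed on this small database by brute force. The sketch assumes that every tuple of all three relations is a player, which is a choice made here and not stated on the slide; the helper names `q` and `shapley` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial

REGISTER = [("Alice", "OS"), ("Alice", "AI"), ("Bob", "OS"),
            ("Cathy", "DB"), ("Cathy", "IC")]
STUDENT = ["Alice", "Bob", "Cathy", "David"]
TA = ["Alice", "Bob", "David"]

# Assumption of this sketch: every tuple of every relation is a player.
players = ([("Register", s, c) for s, c in REGISTER]
           + [("Student", s) for s in STUDENT]
           + [("TA", s) for s in TA])


def q(db):
    """q() :- Student(x), not TA(x), Register(x, y), evaluated on a set of tuples."""
    students = {t[1] for t in db if t[0] == "Student"}
    tas = {t[1] for t in db if t[0] == "TA"}
    registered = {t[1] for t in db if t[0] == "Register"}
    return 1 if (students - tas) & registered else 0


def shapley(players, v, a):
    """Exact Shapley value of player a by enumerating all coalitions."""
    others = [p for p in players if p != a]
    n = len(players)
    total = Fraction(0)
    for size in range(n):
        weight = Fraction(factorial(size) * factorial(n - size - 1), factorial(n))
        for B in combinations(others, size):
            B = frozenset(B)
            total += weight * (v(B | {a}) - v(B))
    return total


scores = {p: shapley(players, q, p) for p in players}
for player, score in scores.items():
    print(player, float(score))
```

Adding TA(Alice) can only turn the query from true to false, so its score is negative, while TA(David) never matters because David has no registration.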
Banzhaf Power Index
• Causal effect [Salimi et al. 2016]
– See the database as a probabilistic database
– $\mathrm{CE}(t) = \mathbb{E}(q \mid t \in D) - \mathbb{E}(q \mid t \notin D)$
• Coincides with the Banzhaf Power Index [Banzhaf 1965]
• Our complexity results extend to this measure
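The coincidence can be illustrated on a toy game. The sketch assumes that, in the probabilistic database, every other tuple is kept independently with probability 1/2; the game and the function names `banzhaf` and `causal_effect` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def banzhaf(players, v, a):
    """Average marginal gain of a over all subsets of the other players."""
    others = [p for p in players if p != a]
    total = Fraction(0)
    for size in range(len(others) + 1):
        for B in combinations(others, size):
            B = frozenset(B)
            total += v(B | {a}) - v(B)
    return total / 2 ** len(others)


def causal_effect(players, v, a):
    """E[v | a present] - E[v | a absent], with every other player kept
    independently with probability 1/2 (an assumption of this sketch)."""
    others = [p for p in players if p != a]
    worlds = [frozenset(B) for size in range(len(others) + 1)
              for B in combinations(others, size)]
    with_a = sum(Fraction(v(B | {a})) for B in worlds) / len(worlds)
    without_a = sum(Fraction(v(B)) for B in worlds) / len(worlds)
    return with_a - without_a


# Hypothetical Boolean "query": true once t1 and at least one of t2, t3 are present.
players = ["t1", "t2", "t3"]
v = lambda s: 1 if "t1" in s and ("t2" in s or "t3" in s) else 0

print(banzhaf(players, v, "t1"), causal_effect(players, v, "t1"))
```

Unlike the Shapley value, the Banzhaf index weights every coalition of the other players equally, regardless of its size.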
Outline
• Explaining Query Answers
• Computational Complexity
• Responsibility to Inconsistency
Inconsistent Databases
• A database is inconsistent if it violates integrity constraints
Cullen Douglas
dbo:birthPlace
▪ dbr:California
▪ dbr:Florida
Marion Jones
dbo:height
▪ 1.524
▪ 1.778
Irene Tedrow
dbo:deathPlace
▪ dbr:California
▪ dbr:Hollywood,_Los_Angeles
▪ dbr:New_York_City
Inconsistent Databases
• Imprecise data sources
– Crowd, Web pages, social encyclopedias, sensors, …
• Imprecise data generation
– Natural-language processing, sensor/signal processing, image recognition, …
• Conflicts in data integration
– Crowd + enterprise data + KB + Web + ...
• Data staleness
– Entities change address, status, ...
• And so on…
Idea: Quantify the extent to which integrity constraints are violated
Reliability estimation: How reliable is a new data source?
Progress indication: Progress bar for data repairing
Action prioritization: Which tuples are most responsible for inconsistency?
How can we quantify the responsibility of individual tuples to inconsistency?
Inconsistency measure
Responsibility sharing mechanism
How to Measure Inconsistency?
• Several measures proposed by the KR and DB communities
– The drastic measure: 1 if inconsistent, 0 otherwise [Thimm 2017]
– #minimal inconsistent subsets [Hunter and Konieczny 2008]
– #problematic tuples [Grant and Hunter 2011]
– Minimal #tuples to remove to satisfy the constraints [Grant and Hunter 2013], [Bertossi 2018]
– #maximal consistent subsets [Grant and Hunter 2011]
• What makes a measure a good one? [L et al. SIGMOD 2021]
How can we quantify the responsibility of individual tuples to inconsistency?
Inconsistency measure
Responsibility sharing mechanism: Shapley Value
Computational Complexity

Measure                lhs chain   No lhs chain, tractable c-repair   Other
drastic                PTIME       FP^{#P}-complete                   FP^{#P}-complete
#min-inconsistent      PTIME       PTIME                              PTIME
#problematic tuples    PTIME       PTIME                              PTIME
cardinality repair     PTIME       Open                               NP-hard
#repairs               PTIME       FP^{#P}-complete                   FP^{#P}-complete

FD: birthCity → birthState
Tractable Measures
• $I_{\mathrm{MI}}$: number of minimal inconsistent subsets

      Train   Departs   Arrives   Time   Duration
f1    16      NYP       BBY       1030   315
f2    16      NYP       PVD       1030   250
f3    16      PHL       WIL       1030   20
f4    16      PHL       BAL       1030   70
f5    16      PHL       WAS       1030   120
f6    16      BBY       PHL       1030   260
f7    16      BBY       NYP       1030   260
f8    16      BBY       WAS       1030   420
f9    16      WAS       PVD       1030   390

FDs: Train Time → Departs; Train Time Duration → Arrives
[Figure: the tuples in the order f7, f1, f3, f9, f2, f5, f8, f6, with f4 highlighted]
Tractable Measures
• $I_{\mathrm{MI}}$: number of minimal inconsistent subsets
[Figure: f4 is inserted into the order f7, f1, f3, f9, f2, f5, f8, f6 and contributes +2]
$f$ increases the value of $I_{\mathrm{MI}}$ by $k$ if $k$ of the previous tuples conflict with it
Tractable Measures
• $I_{\mathrm{P}}$: number of problematic tuples
[Figure: f4 is inserted into the order f7, f1, f3, f9, f2, f5, f8, f6 and contributes +1]
$f$ increases the value of $I_{\mathrm{P}}$ by $k$ if $(k - 1)$ of the previous tuples:
(1) conflict with $f$,
(2) do not conflict with other tuples that occur before $f$.
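The two marginal-gain rules can be tried out on the train table. The sketch below assumes that f4 arrives right after f7, f1, and f3 (one placement consistent with the +2 and +1 shown), and it computes exact Shapley values by brute force rather than with the polynomial-time algorithms the talk refers to; all function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial

# Train, Departs, Arrives, Time, Duration (tuples f1..f9 of the example).
ROWS = {
    "f1": (16, "NYP", "BBY", 1030, 315), "f2": (16, "NYP", "PVD", 1030, 250),
    "f3": (16, "PHL", "WIL", 1030, 20), "f4": (16, "PHL", "BAL", 1030, 70),
    "f5": (16, "PHL", "WAS", 1030, 120), "f6": (16, "BBY", "PHL", 1030, 260),
    "f7": (16, "BBY", "NYP", 1030, 260), "f8": (16, "BBY", "WAS", 1030, 420),
    "f9": (16, "WAS", "PVD", 1030, 390),
}


def conflict(a, b):
    """True if two tuples jointly violate one of the two FDs."""
    (tr1, dep1, arr1, tm1, dur1), (tr2, dep2, arr2, tm2, dur2) = ROWS[a], ROWS[b]
    same_key = tr1 == tr2 and tm1 == tm2
    fd1 = same_key and dep1 != dep2                    # Train Time -> Departs
    fd2 = same_key and dur1 == dur2 and arr1 != arr2   # Train Time Duration -> Arrives
    return fd1 or fd2


def i_mi(db):
    """Number of minimal inconsistent subsets (for FDs: conflicting pairs)."""
    return sum(1 for a, b in combinations(sorted(db), 2) if conflict(a, b))


def i_p(db):
    """Number of tuples that take part in at least one conflict."""
    return sum(1 for a in db if any(conflict(a, b) for b in db if b != a))


def shapley(players, v, a):
    """Exact Shapley value of player a by enumerating all coalitions."""
    others = [p for p in players if p != a]
    n = len(players)
    total = Fraction(0)
    for size in range(n):
        weight = Fraction(factorial(size) * factorial(n - size - 1), factorial(n))
        for B in combinations(others, size):
            B = frozenset(B)
            total += weight * (v(B | {a}) - v(B))
    return total


# Assumed prefix: f4 arrives after f7, f1, f3.
prefix = {"f7", "f1", "f3"}
gain_mi = i_mi(prefix | {"f4"}) - i_mi(prefix)
gain_p = i_p(prefix | {"f4"}) - i_p(prefix)
shapley_mi_f4 = shapley(list(ROWS), i_mi, "f4")
print(gain_mi, gain_p, shapley_mi_f4)
```

Since each conflicting neighbour precedes f4 in half of the permutations, the Shapley value of f4 for the pair-counting measure is half its number of conflicts.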
Computational Complexity
No lhs chain: {B → A, BC → D, BCG → E, BCF → H}, since $\{B, C, G\} \not\subseteq \{B, C, F\}$ and $\{B, C, F\} \not\subseteq \{B, C, G\}$
lhs chain: {B → A, BC → D, BCF → E}, since $\{B\} \subseteq \{B, C\} \subseteq \{B, C, F\}$
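The chain condition on left-hand sides can be checked with a few lines of Python. This sketch assumes FDs given as (lhs, rhs) pairs over single-letter attributes; the name `has_lhs_chain` is hypothetical.

```python
from itertools import combinations


def has_lhs_chain(fds):
    """fds: list of (lhs, rhs) pairs, each a string of single-letter attributes.

    True if the left-hand sides are pairwise comparable by set inclusion.
    """
    lhs_sets = [frozenset(lhs) for lhs, _ in fds]
    return all(x <= y or y <= x for x, y in combinations(lhs_sets, 2))


no_chain = [("B", "A"), ("BC", "D"), ("BCG", "E"), ("BCF", "H")]
chain = [("B", "A"), ("BC", "D"), ("BCF", "E")]
print(has_lhs_chain(no_chain), has_lhs_chain(chain))  # False True
```

Pairwise comparability is enough because a family of sets that is pairwise comparable under inclusion can always be ordered into a single chain.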
Left-Hand Side Chain
FDs: Train Time → Departs; Train Time Duration → Arrives
[Figure: tree over the train tuples. Levels: Train, Time (16, 1030); Departs (NYP, PHL, BBY, WAS); Duration (315, 250, 20, 70, 120, 260, 420, 390); Arrives (BBY, PVD, WIL, BAL, WAS, PHL, NYP, WAS, PVD). Branches are marked as independent or conflicting.]
Computational Complexity
Efficiency: $\sum_{a \in A} \mathrm{Shapley}(A, v, a) = v(A)$
Approximation Complexity

Measure                lhs chain   No lhs chain, tractable c-repair   Other
drastic                PTIME       FPRAS                              FPRAS
#min-inconsistent      PTIME       PTIME                              PTIME
#problematic tuples    PTIME       PTIME                              PTIME
cardinality repair     PTIME       FPRAS                              No FPRAS
#repairs               PTIME       Open                               Open

Open: would imply an FPRAS for #MIS in a bipartite graph, a long-standing open problem
Concluding Remarks
• Two situations where we wish to quantify the responsibility of tuples:
– Query answering
– Database inconsistency
• We treat the contribution from the viewpoint of game theory
• We investigated the computational complexity
Thank you for listening!
Questions?