LIVE
A lineage-supported, versioned DBMS
Anish Das Sarma, Martin Theobald, Jennifer Widom
Agenda
ULDB Data Model and the Trio System: Uncertainty & Lineage
LIVE Data Model (LDM): Uncertainty, Lineage & Versioning
Data Modifications: Insert/Delete Tuples, Update Values, Update Confidences
Query Evaluation: Valid-At vs. Snapshot Queries, Interval Computations, Confidence Computations, Complexity
Experiments/Conclusions
21.04.23 LIVE - A lineage-supported, versioned DBMS
ULDB Data Model
Different types of uncertainty:
1. Tuple Alternatives
2. ‘?’ (Maybe) Annotations
3. Confidences
Implementation of the ULDB data model: Trio System
TriQL query language
TrioExplorer browser frontend, trioplus client, API
Enhanced PostgreSQL backend (SPI)
Search for “Stanford Trio”
ULDBs – Alternatives
1. Alternatives: uncertainty about attribute values
2. ‘?’ (Maybe) Annotations
3. Confidences
Saw (witness, color, car)
Amy red, Honda ∥ red, Toyota ∥ orange, Mazda
Three possible worlds
ULDBs – Maybe Annotations
1. Alternatives
2. ‘?’ (Maybe): uncertainty about tuple presence
3. Confidences
Saw (witness, color, car)
Amy red, Honda ∥ red, Toyota ∥ orange, Mazda
Betty blue, Acura ?
Six possible worlds
ULDBs – Confidences
1. Alternatives
2. ‘?’ (Maybe) Annotations
3. Confidences: weighted uncertainty
Saw (witness, color, car)
Amy red, Honda 0.5 ∥ red, Toyota 0.3 ∥ orange, Mazda 0.2
Betty blue, Acura 0.6 ?
Six possible worlds, each with a probability
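The six weighted worlds can be enumerated directly. The following is a minimal sketch, assuming that x-tuples are independent and that any confidence mass missing from a tuple's alternatives is the probability that the tuple is absent; the names `SAW`, `choices` and `possible_worlds` are illustrative and not part of Trio.

```python
from itertools import product

# Saw relation from the slide: witness -> list of (alternative, confidence).
SAW = {
    "Amy": [(("red", "Honda"), 0.5), (("red", "Toyota"), 0.3), (("orange", "Mazda"), 0.2)],
    "Betty": [(("blue", "Acura"), 0.6)],
}

def choices(alternatives):
    """Alternatives of one x-tuple, plus 'absent' if confidences sum to < 1."""
    options = list(alternatives)
    missing = 1.0 - sum(p for _, p in alternatives)
    if missing > 1e-9:
        options.append((None, missing))  # None stands for "tuple not present"
    return options

def possible_worlds(relation):
    """Yield (world, probability) pairs; x-tuples are assumed independent."""
    witnesses = list(relation)
    for combo in product(*(choices(relation[w]) for w in witnesses)):
        world = {w: value for w, (value, _) in zip(witnesses, combo) if value is not None}
        prob = 1.0
        for _, p in combo:
            prob *= p
        yield world, prob

worlds = list(possible_worlds(SAW))
for world, prob in worlds:
    print(f"{prob:.2f}  {world}")
```

Amy contributes three choices and Betty two (present or absent), which gives the six worlds named on the slide; their probabilities sum to 1.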
ULDBs – Closure
Saw (witness, car)
Cathy Mazda ∥ Honda
Drives (person, car)
Jimmy, Toyota ∥ Jimmy, Mazda
Billy, Honda ∥ Frank, Honda
Hank, Honda
Suspects = πperson(Saw ⋈ Drives)
Suspects
Jimmy ?
Billy ∥ Frank ?
Hank ?
Does not correctly capture possible worlds in the result!
CANNOT
ULDBs – Lineage
ID Saw (witness, car)
11 Cathy Honda ∥ Mazda
ID Drives (person, car)
21 Jimmy, Toyota ∥ Jimmy, Mazda
22 Billy, Honda ∥ Frank, Honda
23 Hank, Honda
Suspects = πperson(Saw ⋈ Drives)
ID Suspects
31 Jimmy ?
32 Billy ∥ Frank ?
33 Hank ?
λ(31) = (11,2) ∧ (21,2)
λ(32,1) = (11,1) ∧ (22,1); λ(32,2) = (11,1) ∧ (22,2)
λ(33) = (11,1) ∧ 23
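To see what lineage adds, the result worlds of Suspects can be derived from the base worlds. This is a rough sketch that reads each lineage entry as a conjunction of (tuple ID, alternative) pairs and assumes every base tuple is present with exactly one alternative; the dictionaries and the function name are illustrative only.

```python
from itertools import product

# Base x-tuples from the slide: id -> alternatives (numbered from 1).
BASE = {
    11: [("Cathy", "Honda"), ("Cathy", "Mazda")],
    21: [("Jimmy", "Toyota"), ("Jimmy", "Mazda")],
    22: [("Billy", "Honda"), ("Frank", "Honda")],
    23: [("Hank", "Honda")],
}

# Lineage from the slide: (result id, alternative) -> required base choices.
LINEAGE = {
    (31, 1): {(11, 2), (21, 2)},
    (32, 1): {(11, 1), (22, 1)},
    (32, 2): {(11, 1), (22, 2)},
    (33, 1): {(11, 1), (23, 1)},
}
SUSPECTS = {(31, 1): "Jimmy", (32, 1): "Billy", (32, 2): "Frank", (33, 1): "Hank"}

def result_worlds():
    """Distinct Suspects instances over all choices of base alternatives."""
    ids = sorted(BASE)
    worlds = set()
    for picks in product(*(range(1, len(BASE[i]) + 1) for i in ids)):
        chosen = set(zip(ids, picks))
        # A result alternative exists exactly when all its lineage holds.
        present = frozenset(
            name for key, name in SUSPECTS.items() if LINEAGE[key] <= chosen
        )
        worlds.add(present)
    return worlds

for world in sorted(result_worlds(), key=sorted):
    print(sorted(world))
```

Only four result worlds arise: the empty set, {Jimmy}, {Billy, Hank} and {Frank, Hank}. A combination such as {Jimmy, Hank} never occurs, which is what three independent ‘?’ annotations alone could not express.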
ULDBs – Summary
1. Alternatives
2. ‘?’ (Maybe) Annotations
3. Confidences
4. Lineage
Uncertainty-Lineage Databases (ULDBs)
ULDBs are closed and complete
Lineage alone suffices to compute the confidence of a result tuple.
#P-complete for general Boolean formulas
Approximation algorithms: Luby-Karp, etc.
Lineage & Confidences
ID Saw(witness, car)
11 (Mary, Honda) : 0.8
12 (Susan, Honda) : 0.9
13 (Betty, Honda) : 0.5
Select distinct car from Saw;
ID SuspectCars(car)
21 Honda : 0.99
λ(21) = (11 ∨ 12 ∨ 13)
P(21) = 1 - (1-0.8) × (1-0.9) × (1-0.5) = 0.99
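The confidence of tuple 21 can be checked in two ways. The sketch below assumes independent base tuples and reads λ(21) as the disjunction of tuples 11, 12 and 13; `confidence` is a hypothetical brute-force helper that enumerates every truth assignment, so it is exponential in the number of base tuples and only meant for tiny examples.

```python
from itertools import product

CONF = {11: 0.8, 12: 0.9, 13: 0.5}  # Saw tuples from the slide

def confidence(lineage, conf):
    """Probability that `lineage` holds, assuming independent base tuples.

    `lineage` maps a {tuple_id: bool} assignment to a bool.
    """
    ids = sorted(conf)
    total = 0.0
    for bits in product([True, False], repeat=len(ids)):
        world = dict(zip(ids, bits))
        if lineage(world):
            weight = 1.0
            for i in ids:
                weight *= conf[i] if world[i] else 1.0 - conf[i]
            total += weight
    return total

def lam_21(world):
    """Lineage of SuspectCars tuple 21: any of the three sightings."""
    return world[11] or world[12] or world[13]

p_brute = confidence(lam_21, CONF)
p_closed = 1 - (1 - 0.8) * (1 - 0.9) * (1 - 0.5)
print(round(p_brute, 4), round(p_closed, 4))
```

Both routes agree on 0.99. The closed form only works because the formula is a plain disjunction of independent tuples; the enumeration works for any Boolean formula but does not scale.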
ID Photo(Number,Name), version 2
11 (1, Amy) [0,1] : 1.0
12 (1, Bob) [0,∞] : 0.6
13 (2, Carl) [0,1] : 0.3
14 (3, Dale) [1,1] : 0.1
Versioning (LDM Data Model)
Version intervals for tuples
Contiguous version numbers 0, 1, …
Database has current version vD
Tuples have a validity interval [s, e]
Valid-At Queries: Select * from Photo valid-at 2;
ID Photo(Number,Name), version 2
12 (1, Bob) [0,∞] : 0.6
Snapshot Queries: View Photo at 2;
ID Photo@2(Number,Name)
12 (1, Bob) : 0.6
Possible Worlds: LDM databases encode lists of sets of possible worlds.
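The difference between the two query forms can be sketched in a few lines. This is an illustration only, using the Photo rows from the slide: open-ended intervals are modelled with `math.inf` (an assumption), and `Row`, `valid_at` and `snapshot` are hypothetical names, not Trio or LIVE functions.

```python
import math
from typing import NamedTuple

class Row(NamedTuple):
    id: int
    value: tuple
    start: int
    end: float   # math.inf marks an open-ended interval
    conf: float

PHOTO = [
    Row(11, (1, "Amy"), 0, 1, 1.0),
    Row(12, (1, "Bob"), 0, math.inf, 0.6),
    Row(13, (2, "Carl"), 0, 1, 0.3),
    Row(14, (3, "Dale"), 1, 1, 0.1),
]

def valid_at(rows, v):
    """Rows whose version interval contains v; intervals are kept."""
    return [r for r in rows if r.start <= v <= r.end]

def snapshot(rows, v):
    """State of the relation at version v; intervals are dropped."""
    return [(r.id, r.value, r.conf) for r in valid_at(rows, v)]

print(valid_at(PHOTO, 2))
print(snapshot(PHOTO, 2))
```

At version 2 only tuple 12 survives in both cases; the valid-at result still carries its interval, while the snapshot is an ordinary uncertain relation without versions.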
Insert Tuple: Insert t with version [vD+1,∞]
commit; Increase vD
Data Modifications – Insert
ID People(Name, State, Job), version 0 → 1 → 2
21 (Bob, NY, Analyst) [0,∞] : 1.0
22 (Carl, IL, Teacher) [0,∞] : 1.0
23 (David, PA, Manager) [0,∞] : 0.6
24 (Frank, CA, Eng.) [1,∞] : 0.3
25 (David, PA, CEO) [2,∞] : 0.3
Insert Tuple: Insert t with version [vD+1,∞]
Delete Tuple: Set end(t) to vD
commit; Increase vD
Data Modifications – Delete
ID People(Name, State, Job), version 2 → 3
21 (Bob, NY, Analyst) [0,∞] : 1.0
22 (Carl, IL, Teacher) [0,∞] : 1.0 → [0,2] : 1.0
23 (David, PA, Manager) [0,∞] : 0.6
24 (Frank, CA, Eng.) [1,∞] : 0.3
25 (David, PA, CEO) [2,∞] : 0.3
Insert Tuple: Insert t with version [vD+1,∞]
Delete Tuple: Set end(t) to vD
Update Value: Set end(t) to vD; insert t’ with version [vD+1,∞]
commit; Increase vD
Data Modifications – Update
ID People(Name, State, Job), version 3 → 4
21 (Bob, NY, Analyst) [0,∞] : 1.0 → [0,3] : 1.0
21 (Bob, CA, Student) [4,∞] : 0.3
22 (Carl, IL, Teacher) [0,2] : 1.0
23 (David, PA, Manager) [0,∞] : 0.6
24 (Frank, CA, Eng.) [1,∞] : 0.3
25 (David, PA, CEO) [2,∞] : 0.3
Insert Tuple: Insert t with version [vD+1,∞]
Delete Tuple: Set end(t) to vD
Update Value: Set end(t) to vD; insert t’ with version [vD+1,∞]
Update Probability: Set end(t) to vD; insert t’=t with probability p’ and version [vD+1,∞]
commit; Increase vD
Data Modifications – Update
ID People(Name, State, Job), version 4 → 5
21 (Bob, NY, Analyst) [0,3] : 1.0
21 (Bob, CA, Student) [4,∞] : 0.3 → [4,4] : 0.3
21 (Bob, CA, Student) [5,∞] : 0.7
22 (Carl, IL, Teacher) [0,2] : 1.0
23 (David, PA, Manager) [0,∞] : 0.6
24 (Frank, CA, Eng.) [1,∞] : 0.3
25 (David, PA, CEO) [2,∞] : 0.3
Insert Tuple: Insert t with version [vD+1,∞]
Delete Tuple: Set end(t) to vD
Update Value: Set end(t) to vD; insert t’ with version [vD+1,∞]
Update Probability: Set end(t) to vD; insert t’=t with probability p’ and version [vD+1,∞]
Possible worlds: Updates may create duplicate worlds, which are merged (at any version v).
Data Modifications – Summary
ID People(Name, State, Job), version 4 → 5
21 (Bob, NY, Analyst) [0,3] : 1.0
21 (Bob, CA, Student) [4,∞] : 0.3 → [4,4] : 0.3
21 (Bob, CA, Student) [5,∞] : 0.7
22 (Carl, IL, Teacher) [0,2] : 1.0
23 (David, PA, Manager) [0,∞] : 0.6
24 (Frank, CA, Eng.) [1,∞] : 0.3
25 (David, PA, CEO) [2,∞] : 0.3
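The four modification rules (insert, delete, update value, update probability) can be replayed on the People example with a small in-memory model. This is a sketch under stated assumptions, not the LIVE implementation: every call commits on its own and so advances vD by one, open-ended intervals use `math.inf`, and the class and method names are invented for illustration.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Row:
    id: int
    value: tuple
    start: int
    end: float   # math.inf marks an open-ended interval
    conf: float

@dataclass
class VersionedTable:
    rows: list = field(default_factory=list)
    v_d: int = 0  # current database version vD

    def _current(self, row_id):
        """The open-ended (still valid) row with this id."""
        return next(r for r in self.rows if r.id == row_id and r.end == math.inf)

    def insert(self, row_id, value, conf):
        self.rows.append(Row(row_id, value, self.v_d + 1, math.inf, conf))
        self.v_d += 1  # commit

    def delete(self, row_id):
        self._current(row_id).end = self.v_d
        self.v_d += 1  # commit

    def update_value(self, row_id, value, conf=None):
        old = self._current(row_id)
        old.end = self.v_d
        new_conf = old.conf if conf is None else conf
        self.rows.append(Row(row_id, value, self.v_d + 1, math.inf, new_conf))
        self.v_d += 1  # commit

    def update_confidence(self, row_id, conf):
        self.update_value(row_id, self._current(row_id).value, conf)

people = VersionedTable(rows=[
    Row(21, ("Bob", "NY", "Analyst"), 0, math.inf, 1.0),
    Row(22, ("Carl", "IL", "Teacher"), 0, math.inf, 1.0),
    Row(23, ("David", "PA", "Manager"), 0, math.inf, 0.6),
])
people.insert(24, ("Frank", "CA", "Eng."), 0.3)            # version 1
people.insert(25, ("David", "PA", "CEO"), 0.3)             # version 2
people.delete(22)                                          # version 3
people.update_value(21, ("Bob", "CA", "Student"), 0.3)     # version 4
people.update_confidence(21, 0.7)                          # version 5

for r in sorted(people.rows, key=lambda r: (r.id, r.start)):
    print(r.id, r.value, [r.start, r.end], r.conf)
```

Old rows are never removed, only closed by setting their end version, so the table keeps the full history and any earlier version can still be queried.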
1) Data Computation (regular SQL, including lineage)
2) Interval Computation (stored procedure)
Query Evaluation
Diagram: the LDM database D is an encoding of possible worlds D1, D2, …, Dn, with one such set of worlds per version @(0), @(1), …, @(vD). Evaluating Q on each world gives Q(D1), Q(D2), …, Q(Dn). The implementation of Q (operational semantics) takes D to D + Result, which is an encoding of those possible worlds.
Lineage alone suffices to compute the confidence of any result tuple.
Lineage alone suffices to compute the version interval of any result tuple.
Lineage, Confidences & Versions
Positive Lineage (disjunctions & conjunctions)
In the lineage formula λ(t):
Replace every tuple t’ by its version interval
Replace every ∧ with ∩ and every ∨ with ∪
Version Interval Computation
ID Saw(witness, car), version 3
11 (Mary, Honda) [1,∞] : 0.8
12 (Susan, Honda) [2,∞] : 0.9
13 (Betty, Honda) [3,∞] : 0.5
Select distinct car from Saw;
ID SuspectCars(car), version 3
21 (Honda) [1,∞] : 0.99
λ(21) = (11 ∨ 12 ∨ 13)
Interval(21) = [1,∞] ∪ [2,∞] ∪ [3,∞] = [1,∞]
P(21) = 1 - (1-0.8) × (1-0.9) × (1-0.5) = 0.99
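The interval rule for positive lineage can be sketched as a recursive evaluation. This assumes the two operator replacements are conjunction to intersection and disjunction to union, which matches the slide's examples; formulas are written as nested tuples such as `("or", 11, 12, 13)`, and that encoding as well as all function names are illustrative.

```python
import math

def normalize(intervals):
    """Sort and merge overlapping or adjacent integer intervals."""
    merged = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged

def intersect(a, b):
    """Intersection of two interval lists."""
    out = []
    for s1, e1 in a:
        for s2, e2 in b:
            s, e = max(s1, s2), min(e1, e2)
            if s <= e:
                out.append((s, e))
    return normalize(out)

def union(a, b):
    """Union of two interval lists."""
    return normalize(a + b)

def interval_of(formula, base):
    """Version intervals of a positive lineage formula.

    A formula is a base tuple id or ("and" | "or", sub1, sub2, ...).
    """
    if not isinstance(formula, tuple):
        return [base[formula]]
    op, *args = formula
    combine = intersect if op == "and" else union
    result = interval_of(args[0], base)
    for arg in args[1:]:
        result = combine(result, interval_of(arg, base))
    return result

SAW = {11: (1, math.inf), 12: (2, math.inf), 13: (3, math.inf)}
print(interval_of(("or", 11, 12, 13), SAW))
print(interval_of(("and", "r", "s"), {"r": (0, 10), "s": (5, 15)}))
```

A union may produce several disjoint intervals, which is why results are kept as lists. Each operator is visited once, in line with the linear-time claim for positive lineage when the interval lists stay short.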
Positive Lineage (disjunctions & conjunctions)
In the lineage formula λ(t):
Replace every tuple t’ by its version interval
Replace every ∧ with ∩ and every ∨ with ∪
Version & Confidence Computation
ID Saw(witness, car), version 3
11 (Mary, Honda) [1,∞] : 0.8
12 (Susan, Honda) [2,∞] : 0.9
13 (Betty, Honda) [3,∞] : 0.5
Select distinct car from Saw;
ID SuspectCars(car), version 3
21 (Honda) [1,∞] : 0.99
Select distinct car from Saw valid-at 2;
ID SuspectCars(car), version 2
21 (Honda) [1,∞] : 0.98
λ(21) = (11 ∨ 12)
P(21) = 1 - (1-0.8) × (1-0.9) = 0.98
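The effect of valid-at on lineage and confidence can be reproduced with a short sketch. It assumes independent base tuples and a disjunctive lineage per distinct car; the dictionary layout and the function name are made up for this example.

```python
import math

SAW = {
    11: {"car": "Honda", "start": 1, "end": math.inf, "conf": 0.8},
    12: {"car": "Honda", "start": 2, "end": math.inf, "conf": 0.9},
    13: {"car": "Honda", "start": 3, "end": math.inf, "conf": 0.5},
}

def distinct_cars_valid_at(saw, v):
    """'Select distinct car from Saw valid-at v' with lineage and confidence.

    Returns {car: (lineage ids, confidence)}; only tuples valid at v
    contribute to the lineage of a result tuple.
    """
    result = {}
    for tid, t in sorted(saw.items()):
        if t["start"] <= v <= t["end"]:
            entry = result.setdefault(t["car"], {"lineage": [], "miss": 1.0})
            entry["lineage"].append(tid)
            entry["miss"] *= 1.0 - t["conf"]  # probability that no sighting holds
    return {car: (e["lineage"], 1.0 - e["miss"]) for car, e in result.items()}

for v in (1, 2, 3):
    print(v, distinct_cars_valid_at(SAW, v))
```

At version 2 the lineage shrinks to tuples 11 and 12 and the confidence drops to 0.98, while version 3 uses all three tuples and gives 0.99.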
Can decouple interval computation from data computation
Or: push interval computation into query plans only when there is no negation.
Interval Computations & Query Plans
Select R.A from R EXCEPT ( Select R.A from R EXCEPT Select S.A from S );
Inner EXCEPT (–): r=(a)[0,10], s=(a)[5,15] → u=(a)[0,10]
Outer EXCEPT (–): r=(a)[0,10], u=(a)[0,10] → t=(a)[0,10]
Select R.A from R, S Where R.A=S.A;
r=(a)[0,10], s=(a)[5,15] → t=(a)[5,10]
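The decoupled route can be illustrated by evaluating a lineage formula version by version. This is only a sketch: it assumes both base tuples are certain (confidence 1.0), which the slide does not state, and the two lineage functions are one reading of the queries above, with the nested EXCEPT written as r ∧ ¬(r ∧ ¬s).

```python
def valid_versions(lineage, intervals, versions):
    """Versions at which `lineage` holds when every base tuple is certain."""
    out = []
    for v in versions:
        # A base tuple is present at v exactly when v lies in its interval.
        present = {name: s <= v <= e for name, (s, e) in intervals.items()}
        if lineage(present):
            out.append(v)
    return out

INTERVALS = {"r": (0, 10), "s": (5, 15)}

def double_except(p):
    """R EXCEPT (R EXCEPT S): r and not (r and not s)."""
    return p["r"] and not (p["r"] and not p["s"])

def join(p):
    """R join S on A: r and s."""
    return p["r"] and p["s"]

print(valid_versions(double_except, INTERVALS, range(0, 16)))
print(valid_versions(join, INTERVALS, range(0, 16)))
```

Under these assumptions both formulas hold exactly at versions 5 to 10, whereas the intervals annotated along the EXCEPT plan end at [0,10]. That gap is consistent with pushing interval computation into plans only when no negation is involved.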
Positive Lineage (disjunctions & conjunctions)
Version interval computation: PTIME (linear)
Confidence computation: #P-complete
Arbitrary Lineage (including negation)
Version interval computation: PTIME (linear) if all confidences are known; NP-hard if confidences are not known (need to check for idempotence of negated tuples)
Confidence computation: #P-complete
Complexity Results
Probabilistic & versioned TPC-H setting
Queries over Lineitem, Orders tables with varying join selectivity from 0.1% to 1% (6,000-60,000 and 1,500-15,000 tuples for Lineitem & Orders)
Update 0.1% to 1% of the input data
Assign probabilities within [0,1] uniform-randomly to tuples
Additional indexes for versioning
Two B+-trees on (start, end) and end points of intervals
Rewrite valid-at & snapshot queries using WHERE (start ≤ v ≤ end) predicates
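The query rewrite can be tried out with any relational engine. The sketch below uses Python's built-in `sqlite3` module as a stand-in for the PostgreSQL backend, so its indexes are SQLite's own B-tree indexes rather than the ones used in the experiments; the table layout, the column names `v_start` and `v_end`, and the `OPEN_END` sentinel are assumptions made for illustration.

```python
import sqlite3

OPEN_END = 2**31 - 1  # assumed encoding of an open-ended interval

def make_db():
    """In-memory Photo table with explicit interval columns and two indexes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE photo (id INTEGER, number INTEGER, name TEXT, "
        "v_start INTEGER, v_end INTEGER, conf REAL)"
    )
    conn.executemany(
        "INSERT INTO photo VALUES (?, ?, ?, ?, ?, ?)",
        [
            (11, 1, "Amy", 0, 1, 1.0),
            (12, 1, "Bob", 0, OPEN_END, 0.6),
            (13, 2, "Carl", 0, 1, 0.3),
            (14, 3, "Dale", 1, 1, 0.1),
        ],
    )
    conn.execute("CREATE INDEX photo_start_end ON photo (v_start, v_end)")
    conn.execute("CREATE INDEX photo_end ON photo (v_end)")
    return conn

def valid_at(conn, v):
    """Rewrite of 'Select * from Photo valid-at v': intervals are returned."""
    sql = ("SELECT id, number, name, v_start, v_end, conf FROM photo "
           "WHERE v_start <= ? AND ? <= v_end ORDER BY id")
    return conn.execute(sql, (v, v)).fetchall()

def snapshot(conn, v):
    """Rewrite of 'View Photo at v': same predicate, intervals projected away."""
    sql = ("SELECT id, number, name, conf FROM photo "
           "WHERE v_start <= ? AND ? <= v_end ORDER BY id")
    return conn.execute(sql, (v, v)).fetchall()

conn = make_db()
print(valid_at(conn, 2))
print(snapshot(conn, 2))
```

Both query forms reduce to the same range predicate on the interval columns, which is why indexes on the interval end points are enough to support them.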
Experiments – Setup
Experiments – Results (I)
Join query: overhead of versioned system vs. non-versioned system (versions not computed)
Join query: overhead of computing versions (versioned system) (%)
Experiments – Results (II)
Join query: progressive data updates (overwrite multiple times)
Join query: valid-at queries vs. full version computation
Experiments – Results (III)
Overhead of version computation, different query types (1% data modified)
LDMs are closed and complete
Generalizes to full ULDB data model (including value alternatives & maybe (?) annotations)
Can employ lineage also for update propagations
Supports all of INSERT/DELETE/UPDATE with INTERSECT/UNION/EXCEPT set operations
Conclusions
Lineage
Uncertainty Versioning
DBMS