Temporal and Spatial Data Management, Fall 2017
Sequenced Semantics (SL04)
- Elements of the sequenced semantics
- Integration of the sequenced semantics into the DBMS kernel
- SQL implementation
TSDM17, SL04 1/40 M. Böhlen, ifi@uzh
Table of Contents
- Elements of the sequenced semantics
  - Snapshot reducibility
  - Change preservation
  - Extended snapshot reducibility
  - Scaling
- Integration of the sequenced semantics into the DBMS kernel
  - Algebraic operators
  - Reduction rules
- SQL implementation
Goal of Reduction to Snapshots
- There is a close relationship between a temporal and a nontemporal database:
  - the snapshot of a temporal relation at a time t is a nontemporal relation.
- All nontemporal statements can be evaluated at each snapshot of a temporal database ("at each time point").
- There should be a close relationship between a temporal and a nontemporal statement:
  - a temporal aggregation should resemble a nontemporal aggregation.
- With SQL this is not the case (remember temporal join versus join).
Setup and Notations
- Relation schema: $R(A_1, \ldots, A_n, T_S, T_E)$
  - $A_1, \ldots, A_n$ are the explicit attributes
  - $T_S, T_E$ are the temporal attributes
    - $T_S$: valid time start
    - $T_E$: valid time end
- $z^{(n+2)}$ denotes a tuple of arity $n + 2$
- We assume half-open intervals: $[T_S, T_E)$
  - We write $T$ to refer to the period $[T_S, T_E)$
  - $t \in T \equiv T_S \le t < T_E$
Timeslice and Snapshot Equivalence
- The timeslice operator maps a temporal to a nontemporal relation.
- Definition of the timeslice operator:
  $\tau_t(r) = \{ z^{(n)} \mid \exists x \in r\,(z.A = x.A \wedge x.T_S \le t < x.T_E) \}$
- Two temporal relations, r and s, are snapshot equivalent, $r \overset{s}{\equiv} s$, iff for all times t their snapshots are identical.
- Definition of snapshot equivalence:
  $r \overset{s}{\equiv} s$ iff $\forall t\,(\tau_t(r) = \tau_t(s))$
Example of Snapshot Equivalence
- Example: two-day and four-day checkouts in the video example

  checkout1
  CustID  TapeNum  T
  C102    T1245    [19,20]
  C102    T1245    [21,22]

  checkout2
  CustID  TapeNum  T
  C102    T1245    [19,22]

- Apply the timeslice operator at each time point:

  $\tau_{19}(checkout1) = \{(C102, T1245)\}$    $\tau_{19}(checkout2) = \{(C102, T1245)\}$
  $\tau_{20}(checkout1) = \{(C102, T1245)\}$    $\tau_{20}(checkout2) = \{(C102, T1245)\}$
  $\tau_{21}(checkout1) = \{(C102, T1245)\}$    $\tau_{21}(checkout2) = \{(C102, T1245)\}$
  $\tau_{22}(checkout1) = \{(C102, T1245)\}$    $\tau_{22}(checkout2) = \{(C102, T1245)\}$

- checkout1 and checkout2 are snapshot equivalent.
- checkout1 and checkout2 are syntactically different.
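The checkout example can be replayed with a minimal Python sketch of the timeslice operator and of snapshot equivalence. The names `Checkout`, `timeslice`, and `snapshot_equivalent` are illustrative and not part of the slides. The slide writes the checkout periods as closed day intervals; the sketch assumes the equivalent half-open form (for example [19,20] becomes [19,21)) to stay in line with the half-open convention of the setup slide, and it only checks a finite range of time points.

```python
from typing import NamedTuple


class Checkout(NamedTuple):
    cust_id: str
    tape_num: str
    ts: int  # valid time start (inclusive)
    te: int  # valid time end (exclusive)


def timeslice(relation, t):
    """Snapshot at time t: explicit attributes of all tuples whose period contains t."""
    return {(x.cust_id, x.tape_num) for x in relation if x.ts <= t < x.te}


def snapshot_equivalent(r, s, time_points):
    """True if r and s have identical snapshots at every inspected time point."""
    return all(timeslice(r, t) == timeslice(s, t) for t in time_points)


# Closed intervals from the slide, rewritten as half-open intervals (assumption).
checkout1 = [Checkout("C102", "T1245", 19, 21), Checkout("C102", "T1245", 21, 23)]
checkout2 = [Checkout("C102", "T1245", 19, 23)]

equivalent = snapshot_equivalent(checkout1, checkout2, range(15, 27))
same_syntax = set(checkout1) == set(checkout2)
print(equivalent, same_syntax)  # True False
```

The two relations agree at every time point but differ as sets of tuples, which is the distinction the slide draws between snapshot equivalence and syntactic equality.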
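Snapshot Reducibility
Snapshot reducibility reduces the semantics of temporal operators to the semantics of the corresponding nontemporal operators.
$\psi^T$ is snapshot reducible to $\psi$ iff for all t:
$\tau_t(\psi^T(R_1, \ldots, R_n)) \equiv \psi(\tau_t(R_1), \ldots, \tau_t(R_n))$
Illustration of snapshot reducibility:
- $\forall t : \tau_t(\psi^T(D^T)) = \psi(\tau_t(D^T))$
[Figure: commuting diagram. $\psi^T$ maps $D^T$ to $R^T$, $\psi$ maps $D_t$ to $R_t$, and $\tau_t$ maps $D^T$ to $D_t$ and $R^T$ to $R_t$.]
- $D^T$ = temporal DB
- $\psi^T \in \{\sigma^T, \pi^T, \vartheta^T, \times^T, \cup^T, -^T\}$
- $R^T$ = temporal result relation
- $\tau_t$ = snapshot at time point t
- $D_t$ = snapshot of $D^T$ at time t
- $\psi \in \{\sigma, \pi, \vartheta, \times, \cup, -\}$
- $R_t$ = result relation at time t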
Reducibility of Temporal Operators Task 4.1
- For each nontemporal operator there is a snapshot reducible operator:
  - count at each point in time
  - join at each point in time
  - primary key at each point in time
  - ranking at each point in time
- We write $\psi^T$ for the snapshot reducible counterpart of the relational algebra operator $\psi$.
  - Ex: A temporal count $\vartheta^T$ corresponds to a nontemporal count $\vartheta$ at each point in time.
- Snapshot reducibility ensures a systematic generalization of all nontemporal operators to temporal operators.
  - ${}_D\vartheta_{SUM} \ldots \Rightarrow {}_D\vartheta^T_{SUM} \ldots$
  - $\bowtie_\theta \ldots \Rightarrow \bowtie^T_\theta \ldots$
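Data Lineage/1
The lineage set, $L[\psi^T(R_1, \ldots, R_n)](z, t)$, of result tuple z at time t is the set of witness lists of argument tuples, $\{\langle r_1, \ldots, r_n \rangle\}$, $r_i \in R_i$, from which z is derived:
$L[\sigma^T_\theta(R)](z, t) = \{\langle r \rangle \mid r \in R \wedge z.A = r.A \wedge t \in r.T\}$
$L[\pi^T_B(R)](z, t) = \{\langle r \rangle \mid r \in R \wedge z.B = r.B \wedge t \in r.T\}$
$L[{}_B\vartheta^T_F(R)](z, t) = \{\langle r \rangle \mid r \in R \wedge z.B = r.B \wedge t \in r.T\}$
$L[R -^T S](z, t) = \{\langle r, \bot \rangle \mid r \in R \wedge z.A = r.A \wedge t \in r.T\}$
$L[R \cup^T S](z, t) = \{\langle r, \bot \rangle \mid r \in R \wedge z.A = r.A \wedge t \in r.T\} \cup \{\langle \bot, s \rangle \mid s \in S \wedge z.A = s.A \wedge t \in s.T\}$
$L[R \times^T S](z, t) = \{\langle r, s \rangle \mid r \in R \wedge z.A = r.A \wedge t \in r.T \wedge s \in S \wedge z.C = s.C \wedge t \in s.T\}$
$L[R \triangleright^T_\theta S](z, t) = \{\langle r, \bot \rangle \mid r \in R \wedge z.A = r.A \wedge t \in r.T\}$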
Data Lineage/2 Task 4.11
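Consider query $Q = {}_D\vartheta^T_{CNT(*)}(R)$:

R
    N   D   B   T
r1  P1  CS  5K  [1, 6)
r2  P2  CS  6K  [4, 7)
r3  P3  MA  2K  [1, 3)

Result of Q
    D   CNT  T
z1  CS  1    [1, 4)
z2  CS  2    [4, 6)
z3  CS  1    [6, 7)
z4  MA  1    [1, 3)

For z1, z2, and z3 and time points 3, 4, 5, and 6, we get the following lineage sets:
- $L[{}_D\vartheta^T_{CNT(*)}(R)](z_1, 3) = \{\langle r_1 \rangle\}$
- $L[{}_D\vartheta^T_{CNT(*)}(R)](z_2, 4) = \{\langle r_1 \rangle, \langle r_2 \rangle\}$
- $L[{}_D\vartheta^T_{CNT(*)}(R)](z_2, 5) = \{\langle r_1 \rangle, \langle r_2 \rangle\}$
- $L[{}_D\vartheta^T_{CNT(*)}(R)](z_3, 6) = \{\langle r_2 \rangle\}$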
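The lineage sets above follow directly from the aggregation rule of Data Lineage/1 (same grouping value, period contains t). A small Python sketch, with the illustrative names `Project` and `lineage_aggregation` that do not come from the slides, recomputes them for the group D = CS:

```python
from typing import NamedTuple


class Project(NamedTuple):
    rid: str  # tuple identifier (r1, r2, r3)
    n: str
    d: str
    b: str
    ts: int   # inclusive start
    te: int   # exclusive end


R = [
    Project("r1", "P1", "CS", "5K", 1, 6),
    Project("r2", "P2", "CS", "6K", 4, 7),
    Project("r3", "P3", "MA", "2K", 1, 3),
]


def lineage_aggregation(relation, group_value, t):
    """Ids of the argument tuples that share the grouping value D and are valid at t."""
    return [x.rid for x in relation if x.d == group_value and x.ts <= t < x.te]


lineage = {t: lineage_aggregation(R, "CS", t) for t in (3, 4, 5, 6)}
print(lineage)  # {3: ['r1'], 4: ['r1', 'r2'], 5: ['r1', 'r2'], 6: ['r2']}
```

The lineage changes exactly at time points 4 and 6, which is where the result tuples z1, z2, and z3 start and end.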
Change Preservation Task 4.2
We use lineage to preserve changes (start and end of intervals).
- $\{(DB, 5K, [Feb, Jul))\} \ne \{(DB, 5K, [Feb, Apr)), (DB, 5K, [Apr, Jul))\}$
- scaling of values (based on old and new timestamps) becomes possible
$\psi^T$ is change preserving iff $\forall z, z' \in \psi^T(R_1, \ldots, R_n)$:
$\forall t, t' \in z.T\,(L(z, t) = L(z, t')) \wedge (z.T_S - 1 \in z'.T \Rightarrow L(z', z.T_S - 1) \ne L(z, z.T_S)) \wedge (z.T_E \in z'.T \Rightarrow L(z', z.T_E) \ne L(z, z.T_S))$
Change preservation permits facts that hold for an entire interval but not for a subinterval. Intervals are not coalesced automatically.
- $\pi^T_D$
Extended Snapshot Reducibility Task 4.3
We want to access time intervals in snapshot reducible operators.
- at each time point, join long and short projects
$\psi^T$ is extended snapshot reducible iff for all t:
$\tau_t(\psi^T(R_1, \ldots, R_n)) \equiv \pi_E(\psi(\tau_t(\varepsilon_{U_1}(R_1)), \ldots, \tau_t(\varepsilon_{U_n}(R_n))))$
For $\psi \in \{\vartheta, \sigma, \pi, \times, \bowtie, ⟕, ⟖, ⟗, \triangleright\}$, E = schema of $\psi^T(R_1, \ldots, R_n)$.
Extended snapshot reducibility supports snapshot reducibility and access to timestamps (by propagating timestamps with the extend operator $\varepsilon$).
- at each time point, the average duration of externally funded projects: ${}_D\vartheta^T_{AVG(DUR(T))}(R)$
Scaling/1
- Consider a project relation p that tracks project fundings B:

  p
  D   N  B     TS        TE
  DB  1  181K  2013/2/1  2013/8/1
  DB  2  196K  2013/5/1  2014/1/1
  AI  1  153K  2013/4/1  2013/9/1
  AI  2  120K  2013/4/1  2013/9/1

- When aggregating such data according to snapshot reducibility, we might want to scale the funding to the new time periods:

  r
  D   B       TS        TE
  DB  89K     2013/2/1  2013/5/1
  DB  165.6K  2013/5/1  2013/8/1
  DB  122.4K  2013/8/1  2014/1/1
  AI  273K    2013/4/1  2013/9/1

- ${}_D\vartheta^{T:B@scale}_{SUM(B)}(p)$
Scaling/2
Let x be an attribute value to be scaled, and let $T_N$ and $T_O$ be interval timestamps such that $T_N \subseteq T_O$.
A scaling function scale defines a weight $0 < w(T_N, T_O) \le 1$ and scales x accordingly:
$scale(x, T_N, T_O) = x \cdot w(T_N, T_O)$, where $0 < w(T_N, T_O) \le 1$
Lemma: Scaling must be a parameter of temporal operators, since for some temporal operators (e.g., aggregation, difference) it cannot be performed in a pre- or post-processing step.
The reason is that the attributes required for scaling are not all available before the temporal operator, nor are they all available after it.
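As a worked check of the DB rows in the Scaling/1 table, the following Python sketch uses a uniform weight, that is, the length of the new period divided by the length of the old period in days. The uniform weight and the helper name `scaled_sum` are assumptions made for illustration; the new periods are taken as given from table r rather than computed by a temporal operator.

```python
from datetime import date


def scale(x, ts_new, te_new, ts_old, te_old):
    """Uniformly scale x from the old period to a new period contained in it."""
    weight = (te_new - ts_new).days / (te_old - ts_old).days
    return x * weight


# DB projects from relation p: (funding in K, TS, TE)
db1 = (181.0, date(2013, 2, 1), date(2013, 8, 1))
db2 = (196.0, date(2013, 5, 1), date(2014, 1, 1))

# New periods of the DB rows in table r (taken as given here).
new_periods = [
    (date(2013, 2, 1), date(2013, 5, 1)),
    (date(2013, 5, 1), date(2013, 8, 1)),
    (date(2013, 8, 1), date(2014, 1, 1)),
]


def scaled_sum(projects, ts_new, te_new):
    """Sum the scaled funding of all projects whose old period contains the new one."""
    total = 0.0
    for b, ts_old, te_old in projects:
        if ts_old <= ts_new and te_new <= te_old:  # T_N is a subset of T_O
            total += scale(b, ts_new, te_new, ts_old, te_old)
    return total


totals = [round(scaled_sum([db1, db2], ts, te), 1) for ts, te in new_periods]
print(totals)  # [89.0, 165.6, 122.4]
```

Each call to `scale` needs both the original period and the new one, which illustrates why the lemma treats scaling as a parameter of the temporal operator: the new periods only exist after adjustment, and the original periods are gone after aggregation unless they are propagated.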
Table of Contents
- Elements of the sequenced semantics
  - Snapshot reducibility
  - Change preservation
  - Extended snapshot reducibility
  - Scaling
- Integration of temporal support in DBMS kernel
  - Algebraic adjustment operators
  - Reduction rules
- SQL implementation
Integration of the Sequenced Semantics into the DBMS Kernel
- Goal:
  - Determine and implement in the DBMS kernel the functionality that is required to offer support for the sequenced semantics (snapshot reducibility + change preservation + extended snapshot reducibility + scaling).
  - The sequenced semantics supports all features that have been identified as important for processing time periods.
- Solution:
  - New algebraic adjustment operators that split periods into pieces, such that snapshot reducible queries only need equality for comparison.
  - Possibility to propagate and access the original timestamps.
Algebraic Basis for Sequenced Semantics
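- Solution is at the level of the algebra.
  ⇒ Any language with sequenced semantics can be supported.
[Figure: the temporal query languages IXSQL, SQL/Temporal, ATSQL, TSQL2, and SQL/TP are layered on top of RA + $\mathcal{N}$ + $\phi$, which is layered on top of the DBMS.]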
Temporal Query Processing
- To implement the sequenced semantics, two new algebra operators for the adjustment of periods are needed:
  - Temporal normalization $\mathcal{N}$
  - Temporal alignment $\phi$
  - Adjustment respects the lineage.
- Reduction rules from temporal RA to nontemporal RA.
  - "How to use the adjustment operators"
- Period propagation $\varepsilon$ makes it possible to access the original periods.
  - Used for extended snapshot reducibility and scaling
Temporal Adjustment
- The purpose of the temporal adjustment operators is to break periods into pieces.
- Two temporal adjustment operators are required since there are two classes of operators in relational DBMSs:
  - One input tuple contributes to at most one result tuple per time point.
    ⇒ temporal normalization
    Example: Aggregation
  - One input tuple contributes to more than one result tuple per time point.
    ⇒ temporal alignment
    Example: Joins
Temporal Normalization/1
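- Normalize splits each tuple in R with respect to the tuples in S that match on the grouping attributes B.
[Figure: a tuple r of R is split at the period boundaries of the matching tuples g1 and g2 of S, giving the pieces T1, T2, T3, T4 of $\mathcal{N}_B(R, S)$.]
- Algebra: $\mathcal{N}_B(R, S)$
- R is split with respect to all relevant tuples in S.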
Temporal Normalization/2
- Number of contracts per department: ${}_D\vartheta^T_{CNT(*)}(R)$

  R
      N    D   T
  r1  Joe  DB  [Feb, Jul)
  r2  Ann  DB  [Feb, Sep)
  r3  Sam  AI  [May, Oct)

  Adjustment (disjoint), $\mathcal{N}_D(R, R)$:
  N    D   T
  Joe  DB  [Feb, Jul)
  Ann  DB  [Feb, Jul)
  Ann  DB  [Jul, Sep)
  Sam  AI  [May, Oct)

  Nontemporal aggregation:
  CNT  D   T
  2    DB  [Feb, Jul)
  1    DB  [Jul, Sep)
  1    AI  [May, Oct)

- One input tuple contributes to at most one result tuple per month.
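The contract-count example can be sketched in Python as normalization followed by an ordinary grouped count. This is an illustrative in-memory version, not the kernel implementation: months are encoded as integers (Feb = 2, ..., Oct = 10), and the function names `normalize`, `temporal_count`, and `snapshot_reducible` are made up for the sketch. The last function checks snapshot reducibility at a finite set of time points.

```python
from collections import Counter

# (N, D, Ts, Te), months as integers, half-open periods
R = [("Joe", "DB", 2, 7), ("Ann", "DB", 2, 9), ("Sam", "AI", 5, 10)]


def normalize(r, s, key):
    """Split each tuple of r at the start/end points of the tuples in s with the same key."""
    result = []
    for x in r:
        *attrs, ts, te = x
        points = {ts, te}
        for y in s:
            if key(y) == key(x):
                points.update(p for p in (y[-2], y[-1]) if ts < p < te)
        cuts = sorted(points)
        result.extend((*attrs, a, b) for a, b in zip(cuts, cuts[1:]))
    return result


def temporal_count(r):
    """Temporal count per department: normalize on D, then group by (D, T)."""
    pieces = normalize(r, r, key=lambda x: x[1])
    counts = Counter((d, ts, te) for _, d, ts, te in pieces)
    return sorted((cnt, d, ts, te) for (d, ts, te), cnt in counts.items())


def snapshot_reducible(r, result, time_points):
    """Compare the snapshot of the temporal result with a nontemporal count of the snapshot."""
    for t in time_points:
        left = {(d, cnt) for cnt, d, ts, te in result if ts <= t < te}
        right = set(Counter(d for _, d, ts, te in r if ts <= t < te).items())
        if left != right:
            return False
    return True


result = temporal_count(R)
print(result)  # [(1, 'AI', 5, 10), (1, 'DB', 7, 9), (2, 'DB', 2, 7)]
print(snapshot_reducible(R, result, range(1, 13)))  # True
```

After normalization, tuples of the same department have either identical or disjoint periods, so grouping on (D, Ts, Te) with plain equality is enough.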
Temporal Alignment/1
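- Align splits each tuple in R with respect to each tuple in the group of tuples in S that satisfy θ.
[Figure: a tuple r of R is split with respect to each of the matching tuples g1 and g2 of S, giving the pieces T1, T2, T3 of $\phi_\theta(R, S)$.]
- Algebra: $\phi_\theta(R, S)$
- R is split with respect to each relevant tuple in S.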
Temporal Alignment/2
- Employees managed by manager: $M ⟕^T_{M.D=R.D} R$

  M
      M    D   T
  m1  Tom  DB  [Feb, Dec)

  R
      N    D   T
  r1  Joe  DB  [Feb, Jul)
  r2  Ann  DB  [Feb, Sep)
  r3  Sam  AI  [May, Oct)

  Adjustment (overlapping): fragments of M and the R tuples they join with
  M    D   T             N    D   T
  Tom  DB  [Feb, Jul)    Joe  DB  [Feb, Jul)
  Tom  DB  [Feb, Sep)    Ann  DB  [Feb, Sep)
  Tom  DB  [Sep, Dec)    (no matching tuple)

  Nontemporal left outer join:
  M    D   N    T
  Tom  DB  Ann  [Feb, Sep)
  Tom  DB  Joe  [Feb, Jul)
  Tom  DB  ω    [Sep, Dec)

- One input tuple contributes to more than one result tuple per month. E.g., m1 contributes twice to month Feb.
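The manager example can be replayed with a small Python sketch of alignment followed by a nontemporal left outer join on equal department and equal period. The treatment of uncovered parts of a period (such as [Sep, Dec) for Tom) is read off the example on this slide; the function names and the integer month encoding (Feb = 2, ..., Dec = 12) are assumptions of the sketch, and `None` stands in for ω.

```python
M = [("Tom", "DB", 2, 12)]  # (M, D, Ts, Te)
R = [("Joe", "DB", 2, 7), ("Ann", "DB", 2, 9), ("Sam", "AI", 5, 10)]  # (N, D, Ts, Te)


def align(r, s, theta):
    """For each tuple x of r: one piece per matching, overlapping tuple of s
    (the intersection of both periods), plus the parts of x's period that are
    not covered by any matching tuple of s."""
    result = set()
    for x in r:
        *attrs, ts, te = x
        overlaps = sorted(
            (max(ts, y[-2]), min(te, y[-1]))
            for y in s
            if theta(x, y) and max(ts, y[-2]) < min(te, y[-1])
        )
        result.update((*attrs, a, b) for a, b in overlaps)
        cursor = ts
        for a, b in overlaps:  # sweep over the overlaps to find uncovered gaps
            if cursor < a:
                result.add((*attrs, cursor, a))
            cursor = max(cursor, b)
        if cursor < te:
            result.add((*attrs, cursor, te))
    return sorted(result)


def left_outer_join(left, right):
    """Nontemporal left outer join on equal D and equal period."""
    result = []
    for m, d, ts, te in left:
        matches = [n for n, d2, ts2, te2 in right if (d, ts, te) == (d2, ts2, te2)]
        result.extend((m, d, n, ts, te) for n in (matches or [None]))
    return result


m_aligned = align(M, R, lambda m, r: m[1] == r[1])
r_aligned = align(R, M, lambda r, m: r[1] == m[1])
joined = left_outer_join(m_aligned, r_aligned)
print(m_aligned)  # [('Tom', 'DB', 2, 7), ('Tom', 'DB', 2, 9), ('Tom', 'DB', 9, 12)]
print(joined)
```

The join condition only compares periods for equality, because alignment has already produced the matching fragments on both sides. The absorb step is left out here; in this example no result tuple is value-equivalent to another, so it would not remove anything.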
Absorb
- The alignment produces all possibly required fragments (for joins, left outer joins, full outer joins, etc.). This can lead to temporal duplicates that must be eliminated.
- Absorb eliminates from R temporal duplicates, i.e., tuples with a period that is contained in the period of a value-equivalent tuple.

  R                  α(R)
  A  B  T            A  B  T
  a  c  [1, 9)       a  c  [1, 9)
  a  c  [3, 7)       a  d  [3, 7)
  a  d  [3, 7)       b  c  [3, 7)
  b  c  [3, 7)       b  d  [3, 7)
  b  d  [3, 7)

- Algebra: $\alpha(R)$
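A minimal Python sketch of absorb on the example relation follows. It assumes that exact duplicates are removed first and that "contained" means properly contained, so that two identical tuples do not eliminate each other; both choices are assumptions of the sketch, not statements from the slide.

```python
# (A, B, Ts, Te) with half-open periods
R = [("a", "c", 1, 9), ("a", "c", 3, 7), ("a", "d", 3, 7), ("b", "c", 3, 7), ("b", "d", 3, 7)]


def absorb(r):
    """Drop tuples whose period is properly contained in that of a value-equivalent tuple."""
    def absorbed(x):
        return any(
            y[:-2] == x[:-2]                        # value-equivalent
            and y[-2] <= x[-2] and x[-1] <= y[-1]   # period of x inside period of y
            and y[-2:] != x[-2:]                    # but not the same period
            for y in r
        )
    # dict.fromkeys removes exact duplicates while keeping the input order
    return [x for x in dict.fromkeys(r) if not absorbed(x)]


print(absorb(R))
# [('a', 'c', 1, 9), ('a', 'd', 3, 7), ('b', 'c', 3, 7), ('b', 'd', 3, 7)]
```

Only (a, c, [3, 7)) is removed, since it is the only tuple with a value-equivalent tuple covering its period.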
Reduction Rules
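Operator        Reduction
Selection       $\sigma^T_\theta(R) = \sigma_\theta(R)$
Projection      $\pi^T_B(R) = \pi_{B,T}(\mathcal{N}_B(R, R))$
Aggregation     ${}_B\vartheta^T_F(R) = {}_{B,T}\vartheta_F(\mathcal{N}_B(R, R))$
Difference      $R -^T S = \mathcal{N}_A(R, S) - \mathcal{N}_A(S, R)$
Union           $R \cup^T S = \mathcal{N}_A(R, S) \cup \mathcal{N}_A(S, R)$
Intersection    $R \cap^T S = \mathcal{N}_A(R, S) \cap \mathcal{N}_A(S, R)$
Cart. Prod.     $R \times^T S = \alpha(\phi_\top(R, S) \bowtie_{R.T=S.T} \phi_\top(S, R))$
Inner Join      $R \bowtie^T_\theta S = \alpha(\phi_\theta(R, S) \bowtie_{\theta \wedge R.T=S.T} \phi_\theta(S, R))$
Left O. Join    $R ⟕^T_\theta S = \alpha(\phi_\theta(R, S) ⟕_{\theta \wedge R.T=S.T} \phi_\theta(S, R))$
Right O. Join   $R ⟖^T_\theta S = \alpha(\phi_\theta(R, S) ⟖_{\theta \wedge R.T=S.T} \phi_\theta(S, R))$
Full O. Join    $R ⟗^T_\theta S = \alpha(\phi_\theta(R, S) ⟗_{\theta \wedge R.T=S.T} \phi_\theta(S, R))$
Anti Join       $R \triangleright^T_\theta S = \phi_\theta(R, S) \triangleright_{\theta \wedge R.T=S.T} \phi_\theta(S, R)$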
Answering Sequenced Queries/1
Approach for formulating sequenced queries:
1. Formulate the query without thinking about time/periods; make all operators temporal (e.g., $\vartheta^T$).
2. Add copies of periods that are needed later (in conditions, for scaling, etc.).
3. Replace all references to periods with references to these copies.
4. Apply the reduction rules to get a nontemporal relational algebra expression.
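Answering Sequenced Queries/2
Query: ${}_D\vartheta^T_{AVG(DUR(T))}(R)$
1. Timestamp propagation: ${}_D\vartheta^T_{AVG(DUR(T))}(\varepsilon_U(R))$
2. Timestamp substitution: ${}_D\vartheta^T_{AVG(DUR(U))}(\varepsilon_U(R))$
3. Temporal adjustment: $\mathcal{N}_D(\varepsilon_U(R), \varepsilon_U(R))$
4. Nontemporal aggregation: ${}_{D,T}\vartheta_{AVG(DUR(U))}(\mathcal{N}_D(\varepsilon_U(R), \varepsilon_U(R)))$
[Figure: operator trees. The query ${}_D\vartheta^T_{AVG(DUR(T))}$ over R is rewritten to ${}_{D,T}\vartheta_{AVG(DUR(U))}$ over $\mathcal{N}_D$, whose two inputs are both $\varepsilon_U$ over R.]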
Table of Contents
- Elements of the sequenced semantics
  - Snapshot reducibility
  - Change preservation
  - Extended snapshot reducibility
  - Scaling
- Integration of temporal support in DBMS kernel
  - Algebraic adjustment operators
  - Reduction rules
- SQL implementation
PostgreSQL Implementation/1
- DBMS kernel integration of temporal adjustment.
[Figure: DBMS architecture. SQL queries pass through the Parser (60 kloc, 150), Analyzer/Rewriter (20 kloc, 450), Optimizer (50 kloc, 150), and Executor (40 kloc, 400), which sit on top of Files and Access Methods, Buffer Manager, Disk Manager, Recovery Manager, Lock Manager, and the Data and Index Files.]
PostgreSQL Implementation/2 Task 4.4
- SQL extension that provides direct access to the adjustment operators:
    $\varepsilon_U(R)$ : SELECT Ts AS Us, Te AS Ue, * FROM R
    $\mathcal{N}_B(R, S)$ : FROM (R NORMALIZE S USING(B)) AS R
    $\phi_\theta(R, S)$ : FROM (R ALIGN S ON θ) AS R
    $\alpha(\phi(R, S) \bowtie \phi(S, R))$ : WHERE ...
- Additional (technical) details will be added on the following slides
- Source Code: http://www.ifi.uzh.ch/dbtg/research/align.html
Timestamp Propagation in SQL
- The extend operator adds to the schema of a relation R an attribute U that is a duplicate of the period of R.
- Algebra: $\varepsilon_U(R)$
- SQL:
    WITH R AS (SELECT *, Ts AS Us, Te AS Ue FROM R)
    ...
Temporal Normalization in SQL Task 4.5
- The normalize operator splits each tuple in R with respect to the tuples in S that match on the grouping attributes B.
- Algebra: $\mathcal{N}_B(R, S)$ or $\mathcal{N}_\theta(R, S)$
- SQL:
    ... FROM (R NORMALIZE S
              USING(B)
              WITH (Rs,Re,Ss,Se)) AS Rnorm ...

    ... FROM (R NORMALIZE S
              ON θ
              WITH (Rs,Re,Ss,Se)) AS Rnorm ...
- In SQL the timestamp attributes must be specified explicitly
Temporal Alignment in SQL Task 4.6
- The align operator splits each tuple in R with respect to each tuple in S that satisfies θ.
- Algebra: $\phi_\theta(R, S)$
- SQL:
    FROM ( R ALIGN S
           ON θ
           WITH (Rs,Re,Ss,Se) ) AS Radj
- In SQL the timestamp attributes must be specified explicitly
Absorb in SQL
- The absorb operator eliminates from R tuples with a period that is contained in the period of a value-equivalent tuple.
- Algebra: $\alpha(\phi(R, S) \bowtie_\theta \phi(S, R))$
- SQL:
    SELECT ...
    FROM (...) AS Radj, (...) AS Sadj
    WHERE θ
      AND (((Radj.Ts = Rs OR Radj.Ts = Ss)
        AND (Radj.Te = Re OR Radj.Te = Se))
        OR Rs IS NULL OR Ss IS NULL);
- Rs and Re are the propagated timestamps of R
- Ss and Se are the propagated timestamps of S
- $((T_S = Rs \vee T_S = Ss) \wedge (T_E = Re \vee T_E = Se)) \vee Rs = \omega \vee Ss = \omega$
Example/1
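R:  r1 = (Ann), r2 = (Joe), r3 = (Ann)
$\varepsilon_U(R)$:  r1 = (Ann, [1, 8)), r2 = (Joe, [2, 6)), r3 = (Ann, [8, 12))
$\mathcal{N}(\varepsilon_U(R), \varepsilon_U(R))$ and ${}_T\vartheta_{AVG(DUR(U))}(\mathcal{N}(\varepsilon_U(R), \varepsilon_U(R)))$:

T         Tuples (N, U)                   AVG(DUR(U))
[1, 2)    (Ann, [1, 8))                   7
[2, 6)    (Ann, [1, 8)), (Joe, [2, 6))    5.5
[6, 8)    (Ann, [1, 8))                   7
[8, 12)   (Ann, [8, 12))                  4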
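The example can be recomputed end to end with a short Python sketch of the three steps: extend, normalize with an empty grouping list, and a nontemporal average grouped by the adjusted period. The tuple layout (N, Us, Ue, Ts, Te) and the function names are choices made for this sketch only.

```python
from collections import defaultdict

R = [("Ann", 1, 8), ("Joe", 2, 6), ("Ann", 8, 12)]  # (N, Ts, Te)


def extend(r):
    """Extend operator: copy the period (Ts, Te) into the extra attributes (Us, Ue)."""
    return [(n, ts, te, ts, te) for n, ts, te in r]  # (N, Us, Ue, Ts, Te)


def normalize_all(r):
    """Normalize with no grouping attributes: split every tuple at all start/end points in r."""
    points = sorted({p for x in r for p in x[-2:]})
    result = []
    for *attrs, ts, te in r:
        cuts = [p for p in points if ts <= p <= te]
        result.extend((*attrs, a, b) for a, b in zip(cuts, cuts[1:]))
    return result


def avg_duration(r):
    """Nontemporal aggregation: AVG(Ue - Us) grouped by the adjusted period (Ts, Te)."""
    groups = defaultdict(list)
    for n, us, ue, ts, te in r:
        groups[(ts, te)].append(ue - us)
    return {period: sum(d) / len(d) for period, d in sorted(groups.items())}


result = avg_duration(normalize_all(extend(R)))
print(result)  # {(1, 2): 7.0, (2, 6): 5.5, (6, 8): 7.0, (8, 12): 4.0}
```

The durations are taken from the propagated copy U, not from the adjusted period T, which is why the fragment [1, 2) still reports a duration of 7.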
Example/2 Task 4.7, Task 4.8, Task 4.9
Translate the resulting algebra expression to SQL:
- ${}_T\vartheta_{AVG(DUR(U))}(\mathcal{N}(\varepsilon_U(R), \varepsilon_U(R)))$
- Schema: R(N, Ts, Te)
- SQL:
    WITH x AS ( SELECT Ts Us, Te Ue, * FROM R )
    SELECT AVG(Ue - Us), Ts, Te
    FROM (x AS x1 NORMALIZE x AS x2
          USING()
          WITH (Ts,Te,Ts,Te)) AS Rnorm
    GROUP BY Ts, Te;
Scaling in SQL/1
- Scaled values are functions of the original value, the old period, and the new period.
- The following is a simple user-defined function that uniformly scales values.
    CREATE OR REPLACE FUNCTION
    scale(x FLOAT, ts_new DATE, te_new DATE,
          ts_old DATE, te_old DATE)
    RETURNS FLOAT AS
    $$
    BEGIN
      RETURN x * (te_new - ts_new) / (te_old - ts_old);
    END;
    $$ LANGUAGE PLPGSQL;
Scaling in SQL/2
Procedure for the integration of scaling into the query processing workflow with adjustment operators:
1. propagate periods, $\varepsilon$;
2. normalize, $\mathcal{N}_\theta$, or align, $\phi_\theta$ (possibly use scaled values in θ);
3. possibly scale values and remove propagated attributes;
4. apply the corresponding nontemporal operator, $\psi$.
Scaling in SQL/3 Task 4.10
Query: Determine the amount of external funding per department.
${}_D\vartheta^{T:B@scale}_{SUM(B)}(p)$
$\Rightarrow {}_D\vartheta^{T:B@scale}_{SUM(B)}(\varepsilon_U(p))$
$\Rightarrow {}_{D,T}\vartheta_{SUM(scale(B,T,U))}(\mathcal{N}_D(\varepsilon_U(p), \varepsilon_U(p)))$
    WITH
    P1 AS (SELECT Ts Us, Te Ue, * FROM P),
    P2 AS (SELECT * FROM (P1 x NORMALIZE P1 y ON x.D=y.D
                          WITH (Ts,Te,Ts,Te)) AS P),
    P3 AS (SELECT D, N, scale(B, Ts, Te, Us, Ue) B, Ts, Te FROM P2)
    SELECT D, SUM(B), Ts, Te
    FROM P3
    GROUP BY D, Ts, Te;
Summary
- Sequenced semantics
  - Snapshot reducibility
  - Change preservation
  - Extended snapshot reducibility
  - Scaling
- New algebraic adjustment operators in the DBMS kernel that provide support for the sequenced semantics.
  - normalize
  - align
  - extend
  - absorb
- Timestamp propagation
- Direct mapping of the adjustment operators to SQL for illustration purposes. Other SQL extensions are possible.