Dependency Problems for
Future Database Systems: Ingres Project D
Adrian Hudnott
Warwick Postgraduate Colloquium in Computer
Science 2008
Research Problems
• The Third Manifesto is the subtitle of a book by Chris Date and Hugh Darwen that defines what a relational database system should and should not do
• The Third Manifesto and related publications do not include any implementation techniques
• There is currently no scalable implementation of any part of The Manifesto
• There are concerns that some of the demands of The Manifesto cannot be met by extending the techniques used in SQL DBMSs
• When these demands were catalogued, all of the difficult ones had a common theme: constraints
• All the problems involve the DBMS efficiently deducing a fact from known properties of the data source
• Little or no access to the actual data
Research Problems
• Constraint Enforcement
• Semantic Query Optimization
• Multiple Simultaneous Assignment
• Updates to Views
• Interface with SQL Databases
• Determination of Type (compile- and run-time)
Database Representation
• Use Datalog (like Prolog)
• E.g.
supplier(s1, smith, 20, london).
supplier(s2, jones, 10, paris).
supplier(s3, blake, 30, paris).
supplier(s4, clarke, 20, london).
supplier(s5, adams, 30, athens).
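The same facts can be held in an ordinary program as a set of tuples per predicate. The sketch below is only an illustration in Python, not part of Project D: the name `edb` and the helper `supplier_city` are assumptions, and the helper encodes the rule supplier_city(C) :- supplier(SN, N, ST, C) used later in these slides.

```python
# Extensional database (EDB): one set of ground tuples per predicate.
# The dictionary layout is an assumption made for this sketch.
edb = {
    "supplier": {
        ("s1", "smith", 20, "london"),
        ("s2", "jones", 10, "paris"),
        ("s3", "blake", 30, "paris"),
        ("s4", "clarke", 20, "london"),
        ("s5", "adams", 30, "athens"),
    }
}

def supplier_city(db):
    """Intensional rule: supplier_city(C) :- supplier(SN, N, ST, C)."""
    return {city for (_sn, _name, _status, city) in db["supplier"]}

cities = supplier_city(edb)
```

Because relations are sets, duplicate cities collapse automatically, which matches the Datalog reading of the rule.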
Notation
• Extensional Database: EDB
• Intensional Database: IDB
• Statistics: Stat
• Constraints (invariant): Inv
• Assignments: Ri := Ui
• Queries: Qi(t)
• Boolean expressions: Bi(t)
• Cost function: c(exp)
Constraint Enforcement
$Inv \cup IDB[U/R] \cup Stat \cup EDB \vdash Inv[U/R]$ (Prove Safety)
Or, alternatively:
$Inv \cup IDB[U/R] \cup Stat \cup EDB \vdash \neg Inv[U/R]$ (Prove Failure)
N.B. Minimize access to the database (EDB)
Updates in the Presence of Views
• Example insertion: supplier(s6, carter, 30, oxford).
• R is “supplier”
• U is <previous definition> OR <tuple is the one above>
• supplier_city(C) :- supplier(SN, N, ST, C); (SN=s6, N=carter, ST=30, C=oxford).
$IDB[U/R]$
Materialized Views
• Materialized View: contents of a view cached to avoid re-calculation.
• Can be modelled using constraints
CREATE TABLE supplier (City PRIMARY KEY
REFERENCES supplier_city,...
CREATE TABLE supplier_city (City PRIMARY KEY REFERENCES supplier);
• And vice versa: A constraint is a view that is always empty.
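The remark that a constraint is a view that is always empty can be made concrete with a small sketch. It assumes relations are Python sets; the function name `violations` is hypothetical. The "view" returns every city that appears in the base relation or in the cached copy but not in both, so the constraint holds exactly when that view is empty.

```python
# Base relation and a cached (materialized) copy of its city projection.
supplier = {
    ("s1", "smith", 20, "london"),
    ("s2", "jones", 10, "paris"),
}
supplier_city = {"london", "paris"}

def violations(supplier_rows, cached_cities):
    """A view that must stay empty: cities found on one side only."""
    derived = {row[3] for row in supplier_rows}
    return derived ^ cached_cities  # symmetric difference

ok_before = violations(supplier, supplier_city) == set()

# Update the base relation without refreshing the cache.
supplier.add(("s6", "carter", 30, "oxford"))
stale = violations(supplier, supplier_city)
```

After the insertion the violation view contains the city that the cache is missing, which is what a constraint checker would have to detect or rule out.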
Multiple Assignment: Naive Implementation
• Example: X := Y, Y := X;
• Internal representation:
atomic {
Z := X;
X := Y;
Y := Z;
}
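A minimal Python sketch of the naive approach, under the assumption that the database state is a dictionary of named relations: every right-hand side is evaluated against the old state before any target is overwritten. The function name `assign_simultaneously` is hypothetical, and atomicity is not modelled here.

```python
def assign_simultaneously(state, assignments):
    """Evaluate every right-hand side on the old state, then commit all results.

    state:       dict mapping relation name -> value
    assignments: dict mapping target name -> function(old_state) -> new value
    """
    # Step 1: compute all new values while the old state is still intact.
    new_values = {target: rhs(state) for target, rhs in assignments.items()}
    # Step 2: build the new state; the input state is left untouched.
    updated = dict(state)
    updated.update(new_values)
    return updated

state = {"X": {1, 2}, "Y": {3}}
result = assign_simultaneously(
    state,
    {"X": lambda s: s["Y"], "Y": lambda s: s["X"]},
)
```

This keeps a temporary result for every assignment, much as Z does for X above, which is what makes the approach naive.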
Multiple Assignment
$Stat \cup EDB \cup IDB \vdash U(t)$
$Stat \cup EDB \cup IDB[U'/R'] \vdash U[U'/R'](t)$
(both judgments are also stated with $Inv$ added to the premises)
IFF U’ is the first assignment:
.... R’:= U’, R1 := U1, …, Rn := Un;
Multiple Assignment Idea
1) Semantically optimize RHS expressions
2) Place RHS expressions into canonical form and find
subexpressions equal to other RHS expressions
3) Build a dependency graph
4) Depth first search to find cycles
5) Break cycles by creating simulated copies
6) Schedule and execute assignments
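Steps 3 to 6 can be sketched as follows. This is an illustrative Python outline under simplifying assumptions, not the Project D design: each assignment is reduced to the set of relation names its right-hand side reads, steps 1 and 2 are skipped, and the names `build_graph`, `find_cycle`, `schedule` and the `_copy` suffix are all hypothetical. An edge a -> b means that assignment a reads the target of b, so a must run before b overwrites it.

```python
def build_graph(reads):
    """Edge a -> b when a's right-hand side reads b's target (a runs first)."""
    return {a: sorted(b for b in reads if b != a and b in reads[a]) for a in reads}

def find_cycle(graph):
    """Depth-first search; return one cycle as a list of nodes, or None."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {n: WHITE for n in graph}
    stack = []

    def visit(n):
        colour[n] = GREY
        stack.append(n)
        for m in graph[n]:
            if colour[m] == GREY:          # back edge closes a cycle
                return stack[stack.index(m):]
            if colour[m] == WHITE:
                found = visit(m)
                if found:
                    return found
        stack.pop()
        colour[n] = BLACK
        return None

    for n in sorted(graph):
        if colour[n] == WHITE:
            found = visit(n)
            if found:
                return found
    return None

def schedule(assignments):
    """assignments: dict target -> names read by its RHS. Returns ordered steps."""
    reads = {t: set(r) for t, r in assignments.items()}
    steps = []
    # Break each cycle by copying one target and redirecting its readers.
    while True:
        cycle = find_cycle(build_graph(reads))
        if cycle is None:
            break
        victim = cycle[0]
        copy = victim + "_copy"
        steps.append(("copy", copy, [victim]))
        for t in reads:
            if t != victim and victim in reads[t]:
                reads[t] = (reads[t] - {victim}) | {copy}
    # Topological order: reverse of depth-first post-order.
    graph = build_graph(reads)
    order, seen = [], set()

    def emit(n):
        seen.add(n)
        for m in graph[n]:
            if m not in seen:
                emit(m)
        order.append(n)

    for n in sorted(graph):
        if n not in seen:
            emit(n)
    order.reverse()
    steps += [("assign", t, sorted(reads[t])) for t in order]
    return steps

swap_plan = schedule({"X": {"Y"}, "Y": {"X"}})
chain_plan = schedule({"A": {"B"}, "B": {"C"}})
```

For the swap X := Y, Y := X the plan copies X once and then performs the two assignments, which mirrors the Z := X; X := Y; Y := Z form of the naive implementation. An assignment that reads only its own target creates no edge and needs no copy.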
Transaction Repair
• Transaction repair is modifying an
update request so that it conforms with
the constraints
• Compensating actions
– E.g. ON DELETE CASCADE
• Research on more advanced repairs
– Generates more than one repair
View Update = Transaction Repair + Disambiguation
DELETE FROM supplier_city WHERE City = 'London';
Transaction repair needed:
DELETE FROM supplier WHERE City = 'London';
For more complex cases, disambiguation is needed
– Default values, rules, etc.
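The simple case can be sketched in Python, assuming relations are sets of tuples and that supplier_city is the projection of supplier on its city column. The function name `delete_city_via_view` is hypothetical, and this covers only the unambiguous repair, not disambiguation.

```python
supplier = {
    ("s1", "smith", 20, "london"),
    ("s2", "jones", 10, "paris"),
    ("s3", "blake", 30, "paris"),
    ("s4", "clarke", 20, "london"),
    ("s5", "adams", 30, "athens"),
}

def supplier_city(rows):
    """View: projection of supplier on the city column."""
    return {row[3] for row in rows}

def delete_city_via_view(rows, city):
    """Repair: a delete on supplier_city becomes a delete on supplier."""
    return {row for row in rows if row[3] != city}

repaired = delete_city_via_view(supplier, "london")
```

Deleting from a projection has one obvious repair. An insertion into supplier_city would not, because the remaining supplier columns are unknown, and that is where default values or rules come in.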
Semantic Optimization
$Inv \cup IDB \cup Stat \vdash Q_1(t) \equiv Q_2(t)$ (Equivalence)
No access to the data
and:
$c(Q_1) < c(Q_2)$ (Lower Cost)
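A toy Python illustration of rewriting a query from constraints alone, with no access to the data. It assumes a deliberately restricted setting in which both the query predicates and the invariants are inclusive numeric ranges on a column. The function name `optimize` and the example invariant 0 <= status <= 100 are assumptions made for this sketch.

```python
import math

def optimize(query, inv):
    """Simplify range predicates using range invariants only.

    query, inv: dict mapping column -> (low, high) inclusive range.
    Returns the predicates still worth evaluating, or None when the
    invariants prove that the result is empty.
    """
    kept = {}
    for col, (lo, hi) in query.items():
        inv_lo, inv_hi = inv.get(col, (-math.inf, math.inf))
        if hi < inv_lo or lo > inv_hi:
            return None            # predicate contradicts the invariant
        if lo <= inv_lo and inv_hi <= hi:
            continue               # predicate is implied by the invariant
        kept[col] = (lo, hi)
    return kept

inv = {"status": (0, 100)}                             # hypothetical invariant
redundant = optimize({"status": (0, math.inf)}, inv)   # status >= 0
impossible = optimize({"status": (-10, -1)}, inv)      # status < 0
unchanged = optimize({"status": (20, 30)}, inv)
```

Dropping an implied predicate or answering an impossible query without a scan are two rewrites that preserve the result while lowering the cost.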
Missing Information
• NULLs are flawed for many reasons
– Simple example: B OR NOT B is NULL if B is NULL
• Can consider missing data as free variables
• Problem is to efficiently determine if a formula is true for all possible values of the free variable
$Inv \cup IDB \cup Stat \cup EDB \vdash Q(t) \equiv B_1(t) \wedge B_2(t)$
and
$\forall x \in dom(a) : B_2(t) \equiv B_2(t)[x/a]$
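The B OR NOT B example can be reproduced in Python, using None to stand in for NULL. The helpers `not3`, `or3` and `valid_for_all` are names chosen for this sketch. The second half treats the missing value as a free variable and checks the formula for every value in its domain by brute force, which is the part that would need to be made efficient.

```python
def not3(b):
    """Three-valued NOT: unknown stays unknown."""
    return None if b is None else (not b)

def or3(a, b):
    """Three-valued OR as in SQL: true wins, otherwise unknown propagates."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def valid_for_all(formula, domain):
    """Free-variable reading: the formula must hold for every possible value."""
    return all(formula(value) for value in domain)

null_result = or3(None, not3(None))
free_variable_result = valid_for_all(lambda b: b or (not b), [True, False])
```

Under NULL semantics the tautology evaluates to unknown, while the free-variable reading recognises it as true whatever the missing value is.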
Current Progress
• Catalogued all known issues requiring investigation
• Outlined solutions or literature for the lesser problems
– E.g. comparison of large values
• Devised formal semantics for type checking Tutorial D programs
• Elaborated on the problems requiring original research in the context of dependency
• Resolved perceived ambiguities in The Manifesto
• Detailed ideas for future development into a solution to the multiple assignment problem
• Completed a literature review on constraint checking techniques
Project D Roadmap
• Further literature review & code familiarization: Summer 2008
• Address multiple assignment problem: Oct 2008 – Jan 2009
• Development Phase 1: Feb – Aug 2009
• Development Phase 2: Aug 2009 – Mar 2010
• Thesis write-up: Apr – Sep 2010
• First complete build: Mid – Late 2011
• Product released: Early 2012
Summing up Project D
• Essentially about relieving the industry of the burdens that have been imposed by SQL implementations in the past 30 years
• We think that the relational model can be properly implemented without SQL’s restrictions and ad hoc additions
– … but requires innovative research
• Project D is still in very early development
• However, some small successes already