1
S. Abiteboul – INRIA Saclay
Trees, semistructured data, and other strange ways to go beyond tables
Serge Abiteboul, INRIA & ENS Cachan
PODS 30th Anniversary, 2011
Luc Véro
Another one of these No-SQL talks?
IMS, hierarchical model, V-relations, Jacobs’s calculus, Hardgrave’s broom, nested relations, format model, complex objects, logical data model, object databases, lambda calculus, regular trees, F-logic, N1NF, NF2, COL, IFO, LDL, IQL, SGML, HTML, ASN.1, XML, YAML, JSON…
2
Trees are useless
A tree is a tree. How many more do you have to look at?
Ronald Reagan, governor of California, opposing the expansion of Redwood National Park (1966)
We don’t need anything beyond relations. These things are useless. Reject!
Anonymous referee (circa 1990)
Knowledge lives in trees
But of the tree of the knowledge of good and evil, thou shalt not eat of it: for in the day that thou eatest thereof thou shalt surely die.
Genesis 2:17
Introduction
Theorem: Information lives in trees and not in relations
Proof: the Bible does not say « But of the two dimensional table of knowledge of good and evil … »
3
Organization
Introduction
Hierarchical data model 60s
Nested relations 80s
Complex objects early 90s
Semistructured data & unranked labeled trees late 90s
Unranked labeled ordered trees, aka XML early 00s
Evolving trees, aka Active XML mid 00s
Cycles 90s to now
Conclusion
More or less chronological
4
For lack of time, we will ignore IMS and the hierarchical model
• The language was purely navigational anyway
We will also ignore early works such as Makinouchi, Jacobs or Hardgrave
We will start with N1NF
• François Bancilhon in France
• Hans Schek in Germany
• PhD thesis of Nicole Bidoit
5
Non-First-Normal-Form N1NF
Data live in 1NF relations (DB101)

Name   Child  Car
Alice  Toto   Jaguar
Alice  Lulu   2CV
Bob    Mimi   Mustang
Bob    Zaza   Prius

A quarter on tables. Now what?
Trees!

Data would prefer to live in infamous nested relations, aka V-relations, aka N1NF relations, aka NF2 relations

Name   Child         Car
Alice  {Toto, Lulu}  {Jaguar, 2CV}
Bob    {Mimi, Zaza}  {Mustang, Prius}
6
The devil is in the details
V-relations: A is a key. No new power.

A  B       C
1  {1, 2}  {1}
2  {2, 3}  {}
3  {1, 3}  {3, 4}

Equivalent flat relations:

A  B      A  C      A
1  1      1  1      1
1  2      3  3      2
2  2      3  4      3
2  3
3  1
3  3

N1NF-relations: A is not a key. The size is now possibly exponential in the size of the domain.

A  B
1  {}
1  {1}
1  {2}
1  {3}
1  {1, 2}
1  {1, 3}
1  {2, 3}
1  {1, 2, 3}
7
Complex object model: tuple and set constructors used freely
[Figure: a complex-object tree rooted at Families, alternating set nodes (*) and tuple nodes. Each family tuple has Name (Peter), Cars (tuples with Name and Year: BMW 2010, 2CV 1976) and Children (tuples with Name and Sex: Toto M, Mimi F, Zaza F).]
8
A logic and algebra for complex objects
Logic: main novelty is set variables – non first-order
Example: Abou Banat query
{ T.Father | Families(T) ∧ ∀X ∈ T.Children ( X.Sex = F ) }
Algebra: powerset operation, unnest/nest
Name   Child         Car
Alice  Toto
Bob    {Mimi, Zaza}  Mustang
Bob    Lulu          Prius

Name   Child  Car
Bob    Mimi   Mustang
Bob    Zaza   Mustang
Bob    Lulu   Prius

Name   Child               Car
Bob    {Mimi, Zaza, Lulu}  {Mustang, Prius}
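The nest and unnest operations can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: rows are dictionaries, a nested attribute holds a set of atomic values, and the function names `unnest` and `nest` are chosen here for the example rather than taken from any particular system.

```python
from collections import defaultdict

def unnest(rows, attr):
    """Replace the set-valued attribute `attr` by one row per element of the set."""
    out = []
    for row in rows:
        for value in sorted(row[attr]):
            flat = dict(row)
            flat[attr] = value
            out.append(flat)
    return out

def nest(rows, attr):
    """Group rows that agree on every other attribute and collect `attr` into a set."""
    groups = defaultdict(set)
    for row in rows:
        key = tuple(sorted((k, v) for k, v in row.items() if k != attr))
        groups[key].add(row[attr])
    return [dict(key, **{attr: frozenset(values)}) for key, values in groups.items()]

# Hypothetical input in the spirit of the tables above.
nested = [
    {"Name": "Bob", "Child": {"Mimi", "Zaza"}, "Car": "Mustang"},
    {"Name": "Bob", "Child": {"Lulu"}, "Car": "Prius"},
]
flat = unnest(nested, "Child")
renested = nest(flat, "Child")
```

Note that a row whose set is empty simply disappears under `unnest`, so nest and unnest are not inverse operations in general.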
9
Results
Equivalence theorem: algebra and logic have same expressive power
Remark: one can compute transitive closure (TC) using the algebra/logic (Wow! Cool!)
Also studied: fixpoint, datalog, while…
Complexity: each new level of nesting introduces one more exponential
Need to control the use of powerset
$2^n$, $2^{2^n}$, …
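One standard way to see why powerset gives transitive closure, and why it is expensive, is to define TC as the intersection of all transitive relations containing the input. The sketch below is an illustration only, not necessarily the construction used in the original results; the function names are chosen for this example. For a domain of n elements it enumerates all 2^(n²) candidate relations.

```python
from itertools import chain, combinations

def powerset(items):
    """All subsets of `items`, as tuples."""
    items = list(items)
    return chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))

def is_transitive(rel):
    """True if (a, b) and (b, d) in rel always imply (a, d) in rel."""
    return all((a, d) in rel for (a, b) in rel for (c, d) in rel if b == c)

def transitive_closure_via_powerset(edges, domain):
    """Intersect every transitive relation over `domain` that contains `edges`."""
    edges = set(edges)
    all_pairs = [(x, y) for x in domain for y in domain]
    closure = set(all_pairs)  # the full relation is transitive and contains edges
    for candidate in powerset(all_pairs):
        s = set(candidate)
        if edges <= s and is_transitive(s):
            closure &= s
    return closure

edges = {(1, 2), (2, 3)}
tc = transitive_closure_via_powerset(edges, [1, 2, 3])  # 2**9 = 512 candidates
```

Even on three elements the loop visits 512 candidate relations, which is the kind of blow-up that motivates controlling the use of powerset.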
10
From complex objects to semistructured data
[Figure: the Families complex-object tree, alternating set nodes (*) and tuple nodes. Each family tuple has Name (Peter), Cars (tuples with Name and Year: BMW 2010, 2CV 1976) and Children (tuples with Name and Sex: Toto M, Mimi F, Zaza F).]
11
Revolution 1: more flexibility
[Figure: the Families tree (Name: Peter; Cars: BMW 2010, 2CV 1976; Children: Toto M, Mimi F, Zaza F), still with set nodes (*), now with two extra subtrees that do not fit the original type: Annotations and Trash.]
12
Revolution 2: Remove some nodes; name all
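[Figure: the Families tree with some nodes removed and every remaining node named: Families has two Family children; each Car and Child node is labeled (Car, Car, Child, Child); leaves are Name: Peter; Cars: BMW 2010, 2CV 1976; Children: Toto M, Zaza F; plus Ann. and Trash. The set nodes (*) are still shown.]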
13
Unranked labeled trees
[Figure: the same tree without set nodes: Families has two Family children, each with Name (Peter), Cars (Car nodes with Name and Year: BMW 2010, 2CV 1976) and Children (Child nodes with Name and Sex: Toto M, Zaza F), plus Ann. and Trash.]
14
This is better adapted to a Web context
Self describing data: No separation between schema and data
Flexibility
Not such a big deal
Maybe the main contribution is the format?
<Families><Family><Name>Peter</Name><Cars><Car><Name>BMW</Name><Year>2010</Year></Car></Cars><Children><Child> …
Plus ça change, plus c’est la même chose
The more things change, the more they stay the same
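"Self-describing" can be made concrete: the label paths that play the role of a schema can be read off the document itself. The sketch below uses Python's standard `xml.etree.ElementTree`. The document string is a hypothetical, completed variant of the fragment above (the Child content is an assumption made so that the example parses), and `paths` is a helper name chosen for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical complete document; the original fragment is elided after <Child>.
doc = (
    "<Families><Family><Name>Peter</Name>"
    "<Cars><Car><Name>BMW</Name><Year>2010</Year></Car></Cars>"
    "<Children><Child><Name>Toto</Name><Sex>M</Sex></Child></Children>"
    "</Family></Families>"
)

root = ET.fromstring(doc)

def paths(node, prefix=""):
    """Yield the label path of `node` and of every descendant."""
    here = f"{prefix}/{node.tag}"
    yield here
    for child in node:
        yield from paths(child, here)

# The set of label paths is a schema-like summary extracted from the data alone.
schema = sorted(set(paths(root)))
car_name = root.find("./Family/Cars/Car/Name").text
```

No separate schema was declared anywhere: adding an Annotations element to one Family would simply add one more path to `schema`.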
15
What else? The trees are unbounded
Like nested relations, trees are unbounded in width
Unlike nested relations, they are unbounded in depth
One can simulate 2-counter machines with 2 branches
• Do applications simulate 2-counter machines with XML documents?
• I am still looking for one
• XML documents are rarely deep
But even for bounded trees there are fun questions: e.g., is the equivalence of monadic datalog decidable for bounded data trees?
[Figure: a tree rooted at r with two long branches whose nodes carry the labels a, b and $.]
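As a rough illustration of the two-branch idea, the sketch below encodes each counter as the depth of a unary branch and runs a tiny 2-counter program on it. The encoding (nested pairs labeled "a") and all function names are assumptions made for this example; it is not a reconstruction of the figure.

```python
def inc(branch):
    """Increment: grow the branch by one node."""
    return ("a", branch)

def dec(branch):
    """Decrement: drop the top node (the branch must be non-empty)."""
    return branch[1]

def is_zero(branch):
    """Zero test: the branch is empty."""
    return branch is None

def depth(branch):
    """Decode a branch back into the counter value it represents."""
    n = 0
    while branch is not None:
        n += 1
        branch = branch[1]
    return n

def double(tree):
    """2-counter program: while c1 > 0, do c1 -= 1 and c2 += 2."""
    c1, c2 = tree
    while not is_zero(c1):
        c1 = dec(c1)
        c2 = inc(inc(c2))
    return (c1, c2)

start = (inc(inc(inc(None))), None)  # counters (3, 0)
end = double(start)
result = (depth(end[0]), depth(end[1]))
```

The counters are unbounded only because the branches may be arbitrarily deep, which is exactly what bounded-depth trees rule out.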
16
What else? The trees are ordered
Unranked labeled ordered trees = XML
Ignore order
Classical optimization
Respect order
Totally new ball game
Bring in tree automata
Reconcile
Order is often painful for optimization
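The difference between ignoring and respecting order already shows up in something as basic as tree equality. In this small sketch a tree is a pair (label, list of children); the representation and the function names are assumptions made for the example.

```python
def canonical(tree):
    """Order-insensitive canonical form: recursively sort the children."""
    label, children = tree
    return (label, tuple(sorted(canonical(c) for c in children)))

def equal_ordered(t1, t2):
    """Equality that respects sibling order."""
    return t1 == t2

def equal_unordered(t1, t2):
    """Equality that ignores sibling order."""
    return canonical(t1) == canonical(t2)

t1 = ("Family", [("Name", []), ("Cars", [])])
t2 = ("Family", [("Cars", []), ("Name", [])])

same_ordered = equal_ordered(t1, t2)
same_unordered = equal_unordered(t1, t2)
```

An optimizer that ignores order may freely rewrite `t1` into `t2`; one that respects order may not, which is one reason order is painful for optimization.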
17
Selling argument is the Web…
The move from relations to trees is interesting
But so is the move from centralized to distributed, and it is much less investigated
Where the fun is:
• Scale is beyond what we thought was thinkable
• Machines are totally autonomous
• Schema replaced by numerous ontologies
• True/false logic replaced by inconsistency, probabilities, trust, belief…
18
And the trees are evolving (aka Active XML)
An old idea from object databases: mix data and computation
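[Figure: an Active XML tree. Resorts contains a Resort with Name (Aspen), State (Colorado), snowcond, which embeds the call !Unisys.com/snow(“Aspen”) and a snow result (Unit: Meter, Depth: 1), and hotels, which embeds the call !Yahoo.com/GetHotels(<city name=“Aspen”/>).]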
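The idea of mixing data and computation can be sketched as a tree in which some nodes are calls that get replaced by their results when evaluated. This is a simplified illustration, not the Active XML system: the dictionary encoding, the `materialize` function and the local stub standing in for the remote snow service are all assumptions made for the example, and no network access takes place.

```python
def materialize(node, services):
    """Return a copy of `node` in which every embedded call is replaced by its result."""
    if isinstance(node, dict) and "call" in node:
        result = services[node["call"]](*node.get("args", []))
        return materialize(result, services)  # results may themselves contain calls
    if isinstance(node, dict):
        return {k: materialize(v, services) for k, v in node.items()}
    if isinstance(node, list):
        return [materialize(v, services) for v in node]
    return node

# Local stub standing in for a remote service (hypothetical behavior).
services = {
    "Unisys.com/snow": lambda city: {"snow": {"Unit": "Meter", "Depth": 1}},
}

resort = {
    "Name": "Aspen",
    "State": "Colorado",
    "snowcond": {"call": "Unisys.com/snow", "args": ["Aspen"]},
}

evaluated = materialize(resort, services)
```

Here the call is replaced by its result; a fuller model would keep the call next to its result so that it can be invoked again as the data evolve.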
19
And there are cycles
For lack of time, I will not mention the network model [Codasyl 1969]
• The language was purely navigational anyway
If I added references to XML, I would get cycles
Lots of models for graph data, e.g., IQL
Some fun results: e.g., a copy elimination problem that arises when trying to obtain Chandra-Harel completeness for IQL
• Similar issue for unordered trees [recent result with Vianu]
[Figure: two Person objects, one with Name Adam and one with Name Eve, each with a Spouse reference to the other, forming a cycle.]
Paris C. Kanellakis
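A tree with references is really a graph, and the Adam and Eve example is the smallest interesting case. In the sketch below objects are stored under identifiers and a reference is just another object's identifier; the identifiers `p1` and `p2`, the encoding and the function name `has_cycle` are assumptions made for the example.

```python
people = {
    "p1": {"Name": "Adam", "Spouse": "p2"},
    "p2": {"Name": "Eve", "Spouse": "p1"},
}

def has_cycle(objects, ref_attrs=("Spouse",)):
    """Depth-first search along reference attributes; True if some path loops back."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {oid: WHITE for oid in objects}

    def visit(oid):
        color[oid] = GREY
        for attr in ref_attrs:
            target = objects[oid].get(attr)
            if target is None:
                continue
            if color[target] == GREY:
                return True
            if color[target] == WHITE and visit(target):
                return True
        color[oid] = BLACK
        return False

    return any(color[oid] == WHITE and visit(oid) for oid in objects)

cyclic = has_cycle(people)
```

Without the Spouse references each Person is an ordinary tree; with them, any traversal has to remember where it has already been.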
20
Conclusion
Is this a good time to do research on trees in databases?
The best time to plant a tree was 20 years ago.
The next best time is now.
Chinese Proverb
Advertisement
Book on Web data management to appear at Cambridge University Press http://webdam.inria.fr/Jorge