XML Design for Relational Storage

Solmaz Kolahi, University of Toronto

Leonid Libkin, University of Edinburgh

16th International World Wide Web Conference (WWW 2007)

Kolahi, Libkin @WWW 2007 – p.1/17

Motivation

Like relational data, XML documents can contain redundant information due to functional dependencies.

[Figure: XML tree with four elements, each carrying the attributes @AreaCode = 416 and @City = Toronto]

Functional Dependency: @AreaCode → @City

Motivation

This redundancy is reflected in the relational storage of XML documents.

[Figure: the XML document (four elements with @AreaCode = 416 and @City = Toronto) next to its relational storage, a table with columns AreaCode and City in which the row (416, Toronto) appears four times]

XML functional dependency: @AreaCode → @City
Regular functional dependency: AreaCode → City

[Figure: the same XML document next to a relational storage split into two relations, one with column AreaCode (416 four times) and one with column City (Toronto four times)]

XML functional dependency: @AreaCode → @City
Equality-generating dependency:
∀ R1(x, a) ∧ R2(y, c) ∧ R1(x′, a) ∧ R2(y′, c′) → c = c′

The more redundant the data, the more prone it is to update anomalies.

Motivation

Solution: normalizing data to eliminate redundancies.

Normal forms for relational data:
• BCNF eliminates all redundancies w.r.t. functional dependencies but may lose dependencies.
• 3NF eliminates some redundancies but preserves dependencies.

Normal forms for XML documents:
• XNF eliminates redundancies w.r.t. XML functional dependencies.

If a nontrivial FD p1, ..., pn → q.@l holds, then p1, ..., pn → q should also hold.

Motivation

Many XML documents are stored and queried using relational database management systems.

Questions:
• How do XML constraints translate to the relational storage?
• What is the best XML design to have a redundancy-free relational storage?
• What is a good XML design to have a low-redundancy relational storage?

We use an information-theoretic technique to measure the redundancy of data.

Outline

• Overview of the information-theoretic measure.
• Storing XML in relations and constraint translation.
• Designing XML to achieve low redundancy in relational storage.
• Concluding remarks.

Measure of Information Content

• Proposed by Arenas & Libkin in PODS’03.
• Used to measure the redundancy of a data value in a database instance with respect to a set of constraints.
• Intuitively, RIC_I(p | Σ) measures the relative information content of position p in instance I w.r.t. constraints Σ.
• Independent of data models and query languages.

Example: RIC_I(p | Σ) as rows are added to the instance I, one at a time.

A B C D
1 2 3 4
1 2 3 5
1 2 3 6
1 2 3 7
1 2 3 8

Rows in I   Σ = {A → C}   Σ = {A → C, B → C}
2           0.875         0.781
3           0.781         0.629
4           0.711         0.522
5           0.658         0.446

Measure of Information Content

Example: R(A, B, C) with Σ = {A → B} and the instance I:

A B C
1 2 3
1 2 4

Pick k such that adom(I) ⊆ {1, ..., k} (here k = 7).
For every X ⊆ Pos(I) − {p}, compute the probability distribution P(a | X) for every a ∈ {1, ..., k}.

Take p to be the position holding 2 in the first row, and let X keep the 3 in the first row and the 1 and 2 in the second row. The two remaining positions are open; (1, 2, 3), (1, 2, 1) and (4, 2, 3), (1, 2, 7) are two ways to fill them. Then:

P(2 | X) = 48/(48 + 6 × 42) = 0.16
P(a | X) = 42/(48 + 6 × 42) = 0.14 for every a ≠ 2

Conditional entropy: 2.8057
Average over all possible X: RIC_I^k(p | Σ) = 2.4558

RIC_I(p | Σ) = lim_{k→∞} RIC_I^k(p | Σ) / log k = 0.875
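
The numbers in this example can be reproduced by brute force. The Python sketch below is an illustration, not the authors' code. It assumes that p is the B position of the first row, that a completion counts only if it satisfies A → B and keeps the two rows distinct (an assumption chosen because it yields the 48 and 42 above), and that entropy is taken in base 2. The function names are made up for this sketch.

```python
import itertools
import math

# Instance I of R(A, B, C) from the example; Sigma = {A -> B}.
ROWS = [(1, 2, 3), (1, 2, 4)]
K = 7            # values range over {1, ..., K}
P = (0, 1)       # assumed position p: first row, column B (holds the value 2)

POSITIONS = [(r, c) for r in range(2) for c in range(3)]
OTHERS = [pos for pos in POSITIONS if pos != P]


def admissible(rows):
    """A -> B holds and the two rows remain distinct tuples (assumption)."""
    (a1, b1, _), (a2, b2, _) = rows
    if a1 == a2 and b1 != b2:
        return False
    return rows[0] != rows[1]


def count_completions(a, kept):
    """Count fillings of the positions outside `kept` when p holds value a."""
    free = [pos for pos in OTHERS if pos not in kept]
    total = 0
    for values in itertools.product(range(1, K + 1), repeat=len(free)):
        grid = [list(row) for row in ROWS]
        grid[P[0]][P[1]] = a
        for (r, c), v in zip(free, values):
            grid[r][c] = v
        if admissible([tuple(row) for row in grid]):
            total += 1
    return total


def conditional_entropy(kept):
    """Entropy (base 2) of the distribution P(a | X) for X = kept."""
    counts = [count_completions(a, kept) for a in range(1, K + 1)]
    total = sum(counts)
    return -sum(n / total * math.log2(n / total) for n in counts if n)


def ric_k():
    """Average the conditional entropy over all subsets X of the other positions."""
    subsets = [
        frozenset(s)
        for size in range(len(OTHERS) + 1)
        for s in itertools.combinations(OTHERS, size)
    ]
    return sum(conditional_entropy(x) for x in subsets) / len(subsets)


# X from the example: keep C of row 1 and A, B of row 2.
X = frozenset({(0, 2), (1, 0), (1, 1)})
print(count_completions(2, X), count_completions(5, X))
print(round(conditional_entropy(X), 4))
print(round(ric_k(), 4), round(ric_k() / math.log2(K), 3))
```

For k = 7 the ratio printed last is already close to the limit value stated above; the limit itself is not computed by this sketch.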

Measure and Normal Forms

Ideally, we want to achieve a well-designed database by having the maximum information content for the entire database.

If this is not achievable, we want to maximize the information content for all positions to the extent possible by enforcing some design conditions.

Given a condition C, the guaranteed information content GIC(C) is the largest number g ∈ [0, 1] such that for all positions in all instances of all schemas satisfying C, the information content is not smaller than g.

Schema (S, Σ) satisfying condition C ⟹ RIC_I(p | Σ) ≥ g for all instances I of (S, Σ)
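
Since GIC(C) is the largest lower bound on the information content, the definition can also be read as an infimum. The following restatement is a paraphrase of the sentence above, using Pos(I) for the set of positions of I as in the measure's definition:

```latex
\mathrm{GIC}(C) \;=\; \inf \bigl\{\, \mathrm{RIC}_I(p \mid \Sigma) \;:\;
  (S, \Sigma) \text{ satisfies } C,\;
  I \text{ is an instance of } (S, \Sigma),\;
  p \in \mathrm{Pos}(I) \,\bigr\}
```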

Measure and Normal Forms

Known results (Arenas & Libkin, PODS’03; Kolahi & Libkin, PODS’06):

• For relational schemas with functional dependencies:
  ◦ BCNF is the only normal form that guarantees well-designed databases: GIC(BCNF) = 1.
  ◦ A good 3NF normalization guarantees a minimum of 1/2 information content: GIC(3NF+) = 1/2.
• For XML designs with functional dependencies:
  ◦ XNF is the only normal form that guarantees well-designed XML documents: GIC(XNF) = 1.

How do we design XML to achieve high information content for the relational storage?

Storing XML in Relations

Inlining technique: Given a DTD, separate relations are created for the root and elements occurring under a Kleene star.

[Figure: DTD tree with root db; student* under db, with @name and a contact child; address* and phone* under contact; @postalCode, @streetNo, @city under address; @number under phone]

Relations:
student(stID, name, conID)
address(addID, conID, postalCode, streetNo, city)
phone(phID, conID, number)

Foreign keys:
address[conID] ⊆FK student[conID]
phone[conID] ⊆FK student[conID]

XML functional dependency: student, @postalCode → address
Translated equality-generating dependency:
∀ student(s, n, c) ∧ address(a, c, pc, st, apt, ct) ∧ student(s, n, c) ∧ address(a′, c, pc, st′, apt′, ct′) → a = a′
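
The inlined schema and its constraints can be tried out in an in-memory SQLite database. The sketch below is an illustration only. The column types, the UNIQUE constraint on student.conID (needed so that a foreign key can reference it), the sample rows, and the helper names `build_db` and `egd_violations` are assumptions, not part of the talk. The query looks for pairs of address rows that violate the translated dependency: two different addresses with the same postal code under the same student.

```python
import sqlite3

# Relations follow the inlined schema; types and UNIQUE are assumptions.
SCHEMA = """
CREATE TABLE student (stID INTEGER PRIMARY KEY, name TEXT, conID INTEGER UNIQUE);
CREATE TABLE address (addID INTEGER PRIMARY KEY,
                      conID INTEGER REFERENCES student(conID),
                      postalCode TEXT, streetNo TEXT, city TEXT);
CREATE TABLE phone (phID INTEGER PRIMARY KEY,
                    conID INTEGER REFERENCES student(conID),
                    number TEXT);
"""

# Two distinct addresses of one student sharing a postal code break the EGD.
VIOLATIONS = """
SELECT a1.addID, a2.addID
FROM student s
JOIN address a1 ON a1.conID = s.conID
JOIN address a2 ON a2.conID = s.conID
WHERE a1.postalCode = a2.postalCode AND a1.addID < a2.addID
"""


def build_db():
    """Create an in-memory database with foreign-key checking switched on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def egd_violations(conn):
    """Return pairs of addIDs that violate student, @postalCode -> address."""
    return conn.execute(VIOLATIONS).fetchall()


conn = build_db()
conn.execute("INSERT INTO student VALUES (1, 'Ann', 10)")
conn.executemany(
    "INSERT INTO address VALUES (?, ?, ?, ?, ?)",
    [(1, 10, "M5S", "40", "Toronto"), (2, 10, "M5S", "42", "Toronto")],
)
print(egd_violations(conn))  # pairs of addIDs that break the dependency
```

Checking the dependency needs a join between student and address here, which is the cost that the later comparison with the edge representation refers to.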

Redundancy-Free Design

XML Design
• DTD
• XML functional dependencies

Relational Storage
• Relational schema
• Keys, foreign keys
• EGDs

RIC_I(p | Σ) = ?

Theorem: The XML design is in XNF ⇔ all positions in the relational storage have maximum information content.

Non-XNF Designs

If there are non-XNF functional dependencies,
• we can do XNF normalization, but
• normalizing is not always efficient.

With no restriction on FDs, we can have high redundancy in the relational storage.
• For any ε > 0, the information content of a position can be less than ε.

Can we restrict XML functional dependencies to guarantee a reasonable information content?
• like the restriction that 3NF enforces to guarantee 1/2 information content.

Relative vs Absolute Functional Dependencies

[Figure: DTD tree with root db; student* under db, with @name and a contact child; address* and phone* under contact; @postalCode, @streetNo, @city under address; @number under phone]

Relative functional dependency: holds within each student element.
student, @postalCode → address

Absolute functional dependency: holds globally.
@postalCode → @city

We are interested in FDs relative to an element that occurs under a Kleene star in the DTD.
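
On the inlined address relation, the difference between the two kinds of dependency comes down to whether the ID of the scoping element is part of the left-hand side. A minimal Python sketch, assuming address rows are dicts keyed by column name; the helper names `fd_holds` and `relative_fd_holds` and the sample rows are made up for illustration:

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = tuple(row[col] for col in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


def relative_fd_holds(rows, scope, lhs, rhs):
    """Check lhs -> rhs separately inside each group of rows sharing `scope`."""
    return fd_holds(rows, [scope] + list(lhs), rhs)


# Hypothetical address rows: conID identifies the enclosing student.
addresses = [
    {"addID": 1, "conID": 10, "postalCode": "M5S", "city": "Toronto"},
    {"addID": 2, "conID": 11, "postalCode": "M5S", "city": "Toronto"},
    {"addID": 3, "conID": 11, "postalCode": "EH8", "city": "Edinburgh"},
]

# Relative FD (student, @postalCode -> address): checked per conID.
print(relative_fd_holds(addresses, "conID", ["postalCode"], ["addID"]))
# Absolute FD (@postalCode -> @city): checked over all rows at once.
print(fd_holds(addresses, ["postalCode"], ["city"]))
# Without the student scope, postalCode alone does not determine the address.
print(fd_holds(addresses, ["postalCode"], ["addID"]))
```

In this sample the first two checks succeed and the third fails, because two students share a postal code but have different address elements.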

Another Good Design

Theorem: If all XML functional dependencies are either XNF or relative, then the information content for all data values in the relational storage is not less than 1/2.

Only XNF functional dependencies ⇔ RIC_I(p | Σ) = 1
XNF or relative functional dependencies ⇒ RIC_I(p | Σ) ≥ 1/2

Another Relational Storage for XML

We can treat XML documents as edge-labeled graphs and shred them into relations:

Edge(source, target, label)
Value(vid, val)

or

B_label(source, target)
Value(vid, val)

Our results extend:

Only XNF functional dependencies ⇔ RIC_I(p | Σ) = 1
XNF or relative functional dependencies ⇒ RIC_I(p | Σ) ≥ 1/2

But enforcing XML functional dependencies requires arbitrarily many more joins.
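
As an illustration of the Edge/Value representation, the Python sketch below shreds a small document into the two relations. Several choices are assumptions made for this sketch and are not taken from the talk: node IDs come from a preorder counter starting at 1, each attribute becomes a child node reached by an edge labeled "@name" with its value stored in Value, and element text is ignored. The function name `shred` is hypothetical.

```python
import itertools
import xml.etree.ElementTree as ET


def shred(xml_text):
    """Return (edges, values) as lists of Edge and Value tuples."""
    root = ET.fromstring(xml_text)
    ids = itertools.count(1)
    edges, values = [], []

    def visit(elem, node_id):
        # Attributes become leaf nodes whose values go into Value(vid, val).
        for name, val in elem.attrib.items():
            attr_id = next(ids)
            edges.append((node_id, attr_id, "@" + name))
            values.append((attr_id, val))
        # Child elements become Edge(source, target, label) tuples.
        for child in elem:
            child_id = next(ids)
            edges.append((node_id, child_id, child.tag))
            visit(child, child_id)

    visit(root, next(ids))
    return edges, values


edges, values = shred('<db><student name="Ann"><contact/></student></db>')
print(edges)
print(values)
```

Every step along a path in the document is one row of Edge, so following a path of length n takes n self-joins, which matches the remark above that enforcing XML functional dependencies needs many more joins in this representation.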

Conclusions

Design tips to have a good relational storage:
• try to have an XNF design to ensure a redundancy-free relational storage.
• organize XML elements so that there is no absolute FD to ensure a low-redundancy relational storage.

Comparing inlining and edge relational representations:
• they are equivalent in terms of redundancy.
• edge can be significantly worse when enforcing constraints.

Future work:
• is it always possible to avoid absolute FDs?

