Models and languages for semistructured data

Date post: 11-Jan-2016
Page 1: Models and languages for semistructured data

Models and languages for semistructured data

Bridging documents and databases

Page 2: Models and languages for semistructured data

Lectures

1. Introduction to data models
2. Query languages for relational databases
3. Models and query languages for object databases
4. Embedded query languages
5. Models and query languages for semistructured data, XML
6. Semantic Web, introduction
7. Semantic Web, continued

Page 3: Models and languages for semistructured data

Why do we like types?

Types facilitate understanding

Types enable compact representations

Types enable query optimisation

Types facilitate consistency enforcement

Page 4: Models and languages for semistructured data

Background assumptions for typed data

Data stable over time

Organisational body to control data

Exercise: Give an example of a context where these assumptions do not hold

Page 5: Models and languages for semistructured data

Semistructured data

Semistructured data is schemaless and self-describing

The data and the description of the data are integrated

Page 6: Models and languages for semistructured data

An example

{name: {first: “John”, last: “Smith”}, tel: 112233, email: “[email protected]”}

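[Figure: the example drawn as a tree; edges name, tel and email leave the root, name has children first (“John”) and last (“Smith”), tel leads to 112233 and email to the email address]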

Page 7: Models and languages for semistructured data

Another example

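[Figure: the example drawn as a graph; two person edges lead from the root to objects &o1 and &o2; &o1 has name “Eva”, age 40 and a child edge to &o2; &o2 has name “Abel” and age 20]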

{person: &o1{name: “Eva”, age: 40, child: &o2}, person: &o2{name: “Abel”, age: 20}}

An object identifier such as &o1, written before a structure, binds the identifier to the identity of that structure. The identifier can then be used elsewhere to refer to the structure, as child: &o2 does above.
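A small Python sketch of what the object identifier expresses. It assumes a hypothetical encoding in which a structure is a list of (label, value) pairs, and binding &o1 or &o2 corresponds to creating one Python object and reusing it; the names o1, o2 and db are illustrative only.

```python
# Each structure is a list of (label, value) pairs. Binding &o2 to a structure
# is modelled by creating one Python object and reusing it wherever &o2 appears.
o2 = [("name", "Abel"), ("age", 20)]
o1 = [("name", "Eva"), ("age", 40), ("child", o2)]
db = [("person", o1), ("person", o2)]

# Follow Eva's child edge and compare it with the second person in the database.
eva_child = next(value for label, value in o1 if label == "child")
print(eva_child is db[1][1])  # True: one structure, reachable along two paths
```

Because the same object is reachable along two different paths, such data forms a graph rather than a tree.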

Page 8: Models and languages for semistructured data

Terminology

The following is an ssd-expression:

&o1{name: “Eva”, age: 40, child: &o2}

Here &o1 is the object identifier, name, age and child are labels, and “Eva”, 40 and &o2 are the corresponding values.

Page 9: Models and languages for semistructured data

A database

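[Figure: the database drawn as a graph rooted at db; a biblio edge leads to a node with a paper child (author Crick, author Wallace, title DNAspiral, date 1956), a book child (author Darwin, title Origin, date 1848), a second book child (author Marx, title Kapital, date 1860) and further entries (…); the publication nodes carry identifiers such as n1, n2 and n3]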

Page 10: Models and languages for semistructured data

Path expressions

A path expression is a sequence of labels: l1.l2…ln

Evaluating a path expression from the root yields a set of nodes: those reached by following consecutive edges labelled l1, l2, …, ln

Path properties are specified by regular expressions on two levels: on the alphabet of labels and on the alphabet of characters that comprise labels
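The evaluation of a plain path l1.l2…ln can be sketched in Python as follows. This is a minimal illustration, not a Lorel implementation: the list-of-(label, value)-pairs encoding, the names DB and eval_path, and the choice to represent atomic leaves by their values are all assumptions. The data is the part of the bibliography database shown in the figure.

```python
from typing import List, Union

# Hypothetical encoding: a complex value is a list of (label, value) pairs,
# an atomic value is a str or an int.
Value = Union[str, int, list]

DB: Value = [("biblio", [
    ("paper", [("author", "Crick"), ("author", "Wallace"),
               ("title", "The spiral DNA"), ("date", 1956)]),
    ("book", [("author", "Darwin"), ("title", "Origin"), ("date", 1848)]),
    ("book", [("author", "Marx"), ("title", "Kapital"), ("date", 1860)]),
])]


def eval_path(root: Value, labels: List[str]) -> List[Value]:
    """Return the values reached from root by following edges labelled l1, ..., ln."""
    current = [root]
    for label in labels:
        step = []
        for node in current:
            if isinstance(node, list):  # atomic values have no outgoing edges
                step.extend(v for (l, v) in node if l == label)
        current = step
    return current


print(eval_path(DB, "biblio.book.author".split(".")))  # ['Darwin', 'Marx']
```

A list stands in for the result set here; a faithful set semantics would also remove duplicate nodes.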

Page 11: Models and languages for semistructured data

A path expression

[Figure: the bibliography database, illustrating the path expression below]

biblio.book.author

Page 12: Models and languages for semistructured data

A path expression

[Figure: the bibliography database, illustrating the path expression below]

biblio.(book | paper).author

Page 13: Models and languages for semistructured data

Examples of path expressions

biblio.book.author - authors of books
biblio.paper.author - authors of papers
biblio.(book | paper).author - authors of books or papers
biblio._.author - authors of anything directly under biblio
biblio._*.author - nodes at the ends of paths that start with biblio, end with author, and have an arbitrary sequence of labels in between
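One way to sketch the wildcards and alternation is to enumerate every label path in the database and match it with a Python regular expression. The translation below handles only the forms on this slide (plain labels, _, _* and alternations such as (book | paper)), assumes labels contain no "." or "/" characters, and uses hypothetical helper names.

```python
import re

DB = [("biblio", [
    ("paper", [("author", "Crick"), ("author", "Wallace"),
               ("title", "The spiral DNA"), ("date", 1956)]),
    ("book", [("author", "Darwin"), ("title", "Origin"), ("date", 1848)]),
    ("book", [("author", "Marx"), ("title", "Kapital"), ("date", 1860)]),
])]


def all_paths(node, prefix=""):
    """Yield (path, value) for every node reachable from node; the path is
    the sequence of labels, each followed by '/'."""
    if isinstance(node, list):
        for label, value in node:
            path = prefix + label + "/"
            yield path, value
            yield from all_paths(value, path)


def to_regex(path_expr):
    """Translate a path expression such as biblio._*.author into a regex."""
    parts = []
    for part in path_expr.split("."):
        if part == "_":
            parts.append("[^/]+/")  # exactly one arbitrary label
        elif part == "_*":
            parts.append("(?:[^/]+/)*")  # any sequence of labels, possibly empty
        else:
            parts.append("(?:" + part.replace(" ", "") + ")/")
    return re.compile("".join(parts))


def eval_path_expr(root, path_expr):
    pattern = to_regex(path_expr)
    return [value for path, value in all_paths(root) if pattern.fullmatch(path)]


print(eval_path_expr(DB, "biblio.(book | paper).author"))
# ['Crick', 'Wallace', 'Darwin', 'Marx']
```

Enumerating paths only terminates on acyclic data; with object identifiers the data may contain cycles, and an evaluator would then have to work on the graph directly.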

Page 14: Models and languages for semistructured data

Example of a label pattern

((b | B)ook | (a | A)uthor)(s)? - book, Book, author, Author, books, Books, authors, Authors
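Python's re module can check this label pattern directly. The sketch uses fullmatch on the assumption that a pattern must match the whole label rather than a substring; the candidate labels are made up for illustration.

```python
import re

# The label pattern from the slide, written in Python regex syntax.
LABEL_PATTERN = re.compile(r"((b|B)ook|(a|A)uthor)(s)?")

candidates = ["book", "Book", "author", "Author", "books", "Books",
              "authors", "Authors", "BOOK", "title", "bookss"]
matching = [c for c in candidates if LABEL_PATTERN.fullmatch(c)]
print(matching)
# ['book', 'Book', 'author', 'Author', 'books', 'Books', 'authors', 'Authors']
```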

Page 15: Models and languages for semistructured data

An exercise

biblio._*.author.(“[s|S]ection”)

Which ones of the following paths match the path expression above?

1. Biblio.author.Section
2. Biblio.cat.rat.hat.author.section
3. Biblio.author
4. Biblio.cat.author.section.Section

Page 16: Models and languages for semistructured data

A simple query

select author: X
from biblio.book.author X

Result: {author: “Darwin”, author: “Marx”}
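Over a hypothetical list-of-(label, value)-pairs encoding, a select-from query with a single path amounts to nested iteration. The sketch below hand-translates this query into a Python comprehension; the helper children and the encoding are assumptions, not part of any query engine.

```python
DB = [("biblio", [
    ("paper", [("author", "Crick"), ("author", "Wallace"),
               ("title", "The spiral DNA"), ("date", 1956)]),
    ("book", [("author", "Darwin"), ("title", "Origin"), ("date", 1848)]),
    ("book", [("author", "Marx"), ("title", "Kapital"), ("date", 1860)]),
])]


def children(node, label):
    """Values reached from node along a single edge with the given label."""
    if not isinstance(node, list):
        return []
    return [value for l, value in node if l == label]


# select author: X from biblio.book.author X
result = [("author", x)
          for b in children(DB, "biblio")
          for book in children(b, "book")
          for x in children(book, "author")]
print(result)  # [('author', 'Darwin'), ('author', 'Marx')]
```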

Page 17: Models and languages for semistructured data

A query with a condition

select row: X
from biblio._ X
where “Crick” in X.author

Result: {row: {author: “Crick”, author: “Wallace”, date: 1956, title: “The spiral DNA”}, …}
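The where clause becomes a filter in the same kind of hand translation. This sketch again assumes the hypothetical list-of-pairs encoding; any_children stands in for the _ wildcard and is an illustrative helper, not a library function.

```python
DB = [("biblio", [
    ("paper", [("author", "Crick"), ("author", "Wallace"),
               ("title", "The spiral DNA"), ("date", 1956)]),
    ("book", [("author", "Darwin"), ("title", "Origin"), ("date", 1848)]),
    ("book", [("author", "Marx"), ("title", "Kapital"), ("date", 1860)]),
])]


def children(node, label):
    """Values reached from node along one edge with the given label."""
    if not isinstance(node, list):
        return []
    return [value for l, value in node if l == label]


def any_children(node):
    """Values reached from node along one edge with any label (the _ wildcard)."""
    return [value for _, value in node] if isinstance(node, list) else []


# select row: X from biblio._ X where "Crick" in X.author
result = [("row", x)
          for b in children(DB, "biblio")
          for x in any_children(b)
          if "Crick" in children(x, "author")]
print(result)  # one row: the paper by Crick and Wallace
```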

Page 18: Models and languages for semistructured data

Two exercises

select row: {title: Y, date: Z}
from biblio.paper X, X.title Y, X.date Z

select row: {author: Y, date: Z}
from biblio.book X, X.author Y, X.date Z

Page 19: Models and languages for semistructured data

A database

[Figure: the bibliography database]

select row: {title: Y, date: Z}
from biblio.paper X, X.title Y, X.date Z

Page 20: Models and languages for semistructured data

A database

[Figure: the bibliography database]

Page 21: Models and languages for semistructured data

Nested queries

select row: (select author: Y from X.author Y)

from biblio.book X
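The inner select builds one complex value per book, which a nested Python comprehension mirrors directly. As before, this is a sketch over a hypothetical list-of-pairs encoding with an illustrative children helper.

```python
DB = [("biblio", [
    ("paper", [("author", "Crick"), ("author", "Wallace"),
               ("title", "The spiral DNA"), ("date", 1956)]),
    ("book", [("author", "Darwin"), ("title", "Origin"), ("date", 1848)]),
    ("book", [("author", "Marx"), ("title", "Kapital"), ("date", 1860)]),
])]


def children(node, label):
    """Values reached from node along one edge with the given label."""
    if not isinstance(node, list):
        return []
    return [value for l, value in node if l == label]


# select row: (select author: Y from X.author Y) from biblio.book X
result = [("row", [("author", y) for y in children(x, "author")])
          for b in children(DB, "biblio")
          for x in children(b, "book")]
print(result)
# [('row', [('author', 'Darwin')]), ('row', [('author', 'Marx')])]
```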

Page 22: Models and languages for semistructured data

Three exercises

Which authors have written a book or a paper in 1992?

Which authors have written a book together with Jones?

Which authors have written both a book and a paper?

Page 23: Models and languages for semistructured data

Expressing relations

Relation r1:

a b c
1 2 3
3 2 2
4 3 1

Relation r2:

b d e
1 1 3
3 4 2
2 3 1

{ r1: { row: {a: 1, b: 2, c: 3}, row: {a: 3, b: 2, c: 2}, row: {a: 4, b: 3, c: 1} }, r2: { row: {b: 1, d: 1, e: 3}, row: {b: 3, d: 4, e: 2}, row: {b: 2, d: 3, e: 1} } }

Page 24: Models and languages for semistructured data

Expressing relational joins

select a: A, d: D
from r1.row X, r2.row Y, X.a A, X.b B, Y.b B', Y.d D
where B = B'
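The encoding of relations and the join can be sketched together in Python. The helper relation_to_ssd is hypothetical: it turns each tuple into a row of (column, value) pairs, following the ssd-expression for r1 and r2. The join is then a hand translation of the query into nested iteration with the condition B = B'.

```python
def relation_to_ssd(columns, tuples):
    """Encode each tuple as a row: {col1: v1, col2: v2, ...}."""
    return [("row", list(zip(columns, t))) for t in tuples]


DB = [
    ("r1", relation_to_ssd(["a", "b", "c"], [(1, 2, 3), (3, 2, 2), (4, 3, 1)])),
    ("r2", relation_to_ssd(["b", "d", "e"], [(1, 1, 3), (3, 4, 2), (2, 3, 1)])),
]


def children(node, label):
    """Values reached from node along one edge with the given label."""
    if not isinstance(node, list):
        return []
    return [value for l, value in node if l == label]


# select a: A, d: D
# from r1.row X, r2.row Y, X.a A, X.b B, Y.b B', Y.d D
# where B = B'
result = [[("a", A), ("d", D)]
          for r1 in children(DB, "r1") for X in children(r1, "row")
          for r2 in children(DB, "r2") for Y in children(r2, "row")
          for A in children(X, "a") for B in children(X, "b")
          for B2 in children(Y, "b") for D in children(Y, "d")
          if B == B2]
print(result)
# [[('a', 1), ('d', 3)], [('a', 3), ('d', 3)], [('a', 4), ('d', 4)]]
```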

Page 25: Models and languages for semistructured data

Label variables

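select L: X
from biblio._*.L X
where matches(“.*Shakespeare.*”, X)

Here L is a label variable.

[Figure: a database rooted at db whose biblio node has two book children, one with author Shakespeare, title Macbeth, date 1622, and one with author Smith, title Best of Shakespeare, date 1992, followed by further entries (…)]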

Page 26: Models and languages for semistructured data

Label variables

select L: X
from biblio._*.L X
where matches(“.*Shakespeare.*”, X)

Result: {author: “Shakespeare”, title: “Best of Shakespeare”}
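A label variable ranges over edge labels rather than nodes. The Python sketch below binds L to the last label on every path below biblio and keeps the string values that match the pattern. It assumes the hypothetical list-of-pairs encoding, treats matches() as a full match with re.fullmatch, and assumes matches() is false for non-string values such as dates.

```python
import re

DB = [("biblio", [
    ("book", [("author", "Shakespeare"), ("title", "Macbeth"), ("date", 1622)]),
    ("book", [("author", "Smith"), ("title", "Best of Shakespeare"), ("date", 1992)]),
])]


def descendants(node):
    """Yield (last label, value) for every node reachable by one or more edges."""
    if isinstance(node, list):
        for label, value in node:
            yield label, value
            yield from descendants(value)


# select L: X from biblio._*.L X where matches(".*Shakespeare.*", X)
result = [(L, X)
          for b in [v for l, v in DB if l == "biblio"]
          for L, X in descendants(b)
          if isinstance(X, str) and re.fullmatch(r".*Shakespeare.*", X)]
print(result)  # [('author', 'Shakespeare'), ('title', 'Best of Shakespeare')]
```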

Page 27: Models and languages for semistructured data

Turning labels into data

select publ: {type: L, author: A}

from biblio.L X, X.author A

[Figure: the bibliography database, showing the paper (author Crick, author Wallace, title DNAspiral, date 1956) and the book (author Darwin, title Origin, date 1848)]

Result: {publ: {type: “paper”, author: “Crick”}, publ: {type: “paper”, author: “Wallace”}, publ: {type: “book”, author: “Darwin”}}
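Turning labels into data means the label bound to L is emitted as an ordinary value. In a hand translation to Python this is simply reading the label component of each (label, value) pair; the encoding and the children helper are assumptions of the sketch, and the data matches the figure.

```python
DB = [("biblio", [
    ("paper", [("author", "Crick"), ("author", "Wallace"),
               ("title", "The spiral DNA"), ("date", 1956)]),
    ("book", [("author", "Darwin"), ("title", "Origin"), ("date", 1848)]),
])]


def children(node, label):
    """Values reached from node along one edge with the given label."""
    if not isinstance(node, list):
        return []
    return [value for l, value in node if l == label]


# select publ: {type: L, author: A} from biblio.L X, X.author A
result = [("publ", [("type", L), ("author", A)])
          for b in children(DB, "biblio")
          for L, X in b  # L binds to the edge label itself
          for A in children(X, "author")]
print(result)
```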

Page 28: Models and languages for semistructured data

An exercise

List all publications in 1992, their types, and titles.

Page 29: Models and languages for semistructured data

Basic XML syntax

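XML is a textual representation of data
An element is a text bounded by tags

<name> John </name>

Here <name> is the start-tag, </name> the end-tag and John the content; together they form an element.

An element with empty content, <name></name>, can be abbreviated as <name/>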

Page 30: Models and languages for semistructured data

Basic XML syntax

Elements may contain subelements

<person>
  <name> John </name>
  <tel> 112233 </tel>
  <email> [email protected] </email>
</person>

Page 31: Models and languages for semistructured data

XML attributes

An attribute is a name-value pair inside a start-tag

<price currency = "dollar"> 500 </price>

<length unit = "cm"> 25 </length>

Page 32: Models and languages for semistructured data

XML attributes and elements

<product>
  <name> widget </name>
  <price> 10 </price>
</product>

<product price = "10">
  <name> widget </name>
</product>

<product name = "widget" price = "10"/>
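The three encodings carry the same information. As a rough illustration, the hypothetical helper field below reads a property from either an attribute or a subelement using Python's xml.etree.ElementTree, so all three products yield the same name and price.

```python
import xml.etree.ElementTree as ET

DOCS = [
    "<product><name> widget </name><price> 10 </price></product>",
    '<product price="10"><name> widget </name></product>',
    '<product name="widget" price="10"/>',
]


def field(elem, name):
    """Read name from an attribute if present, else from a subelement."""
    if name in elem.attrib:
        return elem.attrib[name].strip()
    child = elem.find(name)
    return child.text.strip() if child is not None and child.text else None


for doc in DOCS:
    product = ET.fromstring(doc)
    print(field(product, "name"), field(product, "price"))  # widget 10
```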

Page 33: Models and languages for semistructured data

XML and ssd-expressions

<person>
  <name> John </name>
  <tel> 112233 </tel>
  <email> [email protected] </email>
</person>

{person: {name: “John”, tel: 112233, email: “[email protected]”}}
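The correspondence can be sketched as a small recursive translation with xml.etree.ElementTree. The helper to_ssd is hypothetical, ignores attributes and mixed content, and the email element is left out of the sample. Note that tel comes out as the string "112233": XML text is untyped, whereas the ssd-expression above writes it as a number.

```python
import xml.etree.ElementTree as ET


def to_ssd(elem):
    """Render an element's content as an ssd-expression string.
    Attributes and mixed content are ignored in this sketch."""
    kids = list(elem)
    if not kids:
        # Leaf element: its trimmed text becomes an atomic string value.
        return '"' + (elem.text or "").strip() + '"'
    return "{" + ", ".join(f"{k.tag}: {to_ssd(k)}" for k in kids) + "}"


root = ET.fromstring("<person><name> John </name><tel> 112233 </tel></person>")
ssd = "{" + f"{root.tag}: {to_ssd(root)}" + "}"
print(ssd)  # {person: {name: "John", tel: "112233"}}
```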

Page 34: Models and languages for semistructured data

XML references

<person id = "p1">
  <name> John </name>
  <tel> 112233 </tel>
</person>

<person id = "p2">
  <name> Peter </name>
  <tel> 998877 </tel>
  <boss idref = "p1"/>
</person>

Here id = "p1" is an element identifier, and idref = "p1" is a reference attribute pointing to the element with that identifier.
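ElementTree has no notion of ID or IDREF types without a DTD, so the sketch below resolves the reference by hand: it indexes elements by their id attribute and follows idref. The <people> wrapper is a hypothetical addition, since a well-formed document needs a single root element.

```python
import xml.etree.ElementTree as ET

DOC = """<people>
  <person id="p1"><name> John </name><tel> 112233 </tel></person>
  <person id="p2"><name> Peter </name><tel> 998877 </tel><boss idref="p1"/></person>
</people>"""

root = ET.fromstring(DOC)
# Index every element that carries an id attribute.
by_id = {e.get("id"): e for e in root.iter() if e.get("id") is not None}

boss_of = []
for person in root.findall("person"):
    boss = person.find("boss")
    if boss is not None:
        target = by_id[boss.get("idref")]  # follow the reference
        boss_of.append((person.findtext("name").strip(),
                        target.findtext("name").strip()))
print(boss_of)  # [('Peter', 'John')]
```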

Page 35: Models and languages for semistructured data

Document Type Definitions

<!DOCTYPE db [
  <!ELEMENT db (person*)>
  <!ELEMENT person (name, age, email)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT age (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
]>
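A DTD content model is a regular expression over the sequence of child elements. Python's standard library does not validate against DTDs, so the sketch below checks the models by hand: each one is rewritten manually (an assumption of the sketch) as a regex over child tag names, each followed by a space. The sample documents and the function valid are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Content models from the DTD, rewritten by hand as regular expressions.
CONTENT = {
    "db": r"(person )*",
    "person": r"name age email ",
}
PCDATA = {"name", "age", "email"}


def valid(elem):
    tags = "".join(child.tag + " " for child in elem)
    if elem.tag in PCDATA:
        return tags == ""  # #PCDATA: text only, no subelements
    model = CONTENT.get(elem.tag)
    if model is None or not re.fullmatch(model, tags):
        return False
    return all(valid(child) for child in elem)


good = "<db><person><name>Eva</name><age>40</age><email>e@example.org</email></person></db>"
bad = "<db><person><age>40</age><name>Eva</name><email>e@example.org</email></person></db>"
print(valid(ET.fromstring(good)), valid(ET.fromstring(bad)))  # True False
```

The second document is rejected only because age comes before name, which shows that the content model fixes the order of subelements.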

Page 36: Models and languages for semistructured data

An exercise on DTDs as schemas

<db>
  <r1> <a> a1 </a> <b> b1 </b> </r1>
  <r1> <a> a2 </a> <b> b2 </b> </r1>
  <r2> <c> a1 </c> <d> b1 </d> </r2>
  <r2> <c> c2 </c> <d> d2 </d> </r2>
  <r3> <a> a1 </a> <c> b1 </c> </r3>
</db>

Write down a DTD for the data above!

Page 37: Models and languages for semistructured data

Attributes in DTDs

<product>
  <name language = "Swedish" department = "music"> trumpet </name>
  <price currency = "dollar"> 500 </price>
  <length unit = "cm"> 25 </length>
</product>

<!ATTLIST name language CDATA #REQUIRED department CDATA #IMPLIED>
<!ATTLIST price currency CDATA #REQUIRED>
<!ATTLIST length unit CDATA #REQUIRED>

Page 38: Models and languages for semistructured data

Reference attributes in DTDs

<!DOCTYPE people [
  <!ELEMENT people (person*)>
  <!ELEMENT person (name)>
  <!ELEMENT name (#PCDATA)>
  <!ATTLIST person id      ID     #REQUIRED
                   boss    IDREF  #REQUIRED
                   friends IDREFS #IMPLIED>
]>

Page 39: Models and languages for semistructured data

An exercise

<people>
  <person id = "sven" boss = "olle">
    <name> Sven Svensson </name>
  </person>
  <person id = "olle" friends = "nils eva">
    <name> Olle Olsson </name>
  </person>
  <person id = "pelle" boss = "nils eva">
    <name> Per Persson </name>
  </person>
</people>

Does this XML element conform to the previous DTD?

Page 40: Models and languages for semistructured data

Limitations of DTDs as schemas

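DTDs impose an order on subelements

No base types: all character data is #PCDATA, so numbers and dates cannot be declared as such

The types of IDREFs cannot be constrained: an IDREF may point to any element that has an ID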

Page 41: Models and languages for semistructured data

XSL - Extensible Stylesheet Language

<bib>
  <book> <title> t1 </title> <author> a1 </author> <author> a2 </author> </book>
  <paper> <title> t2 </title> <author> a3 </author> <author> a4 </author> </paper>
  <book> <title> t3 </title> <author> a5 </author> <author> a6 </author> </book>
</bib>

Page 42: Models and languages for semistructured data

Template rules and XSL patterns

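<xsl:template match = "/ | *">
  <xsl:apply-templates select = "*"/>
</xsl:template>

<xsl:template match = "bib/*/title">
  <result><xsl:value-of select = "."/></result>
</xsl:template>

The second template is a template rule; its match attribute, bib/*/title, is an XSL pattern. Applied to the document above, the templates produce:

<result> t1 </result><result> t2 </result><result> t3 </result>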
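The effect of the pattern bib/*/title can be approximated with ElementTree's limited XPath support: relative to the bib root, the path */title selects every title that is a child of any child of bib. This is only an emulation of the pattern match, not an XSLT processor.

```python
import xml.etree.ElementTree as ET

BIB = """<bib> <book> <title> t1 </title> <author> a1 </author> <author> a2 </author> </book>
<paper> <title> t2 </title> <author> a3 </author> <author> a4 </author> </paper>
<book> <title> t3 </title> <author> a5 </author> <author> a6 </author> </book></bib>"""

root = ET.fromstring(BIB)  # root is the bib element
# Emulate the template rule: wrap the text of each matching title in <result>.
output = "".join(f"<result>{t.text}</result>" for t in root.findall("*/title"))
print(output)
# <result> t1 </result><result> t2 </result><result> t3 </result>
```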

Page 43: Models and languages for semistructured data

Two exercises

select row: {title: Y, date: Z}
from biblio.paper X, X.title Y, X.date Z

Result: {row: {title: “The spiral DNA”, date: 1956}, …}

select row: {author: Y, date: Z}
from biblio.book X, X.author Y, X.date Z
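Both exercise queries can be checked with the same hypothetical list-of-pairs encoding and children helper, restricted to the three publications visible in the database figure (the entries hidden behind … are not included).

```python
DB = [("biblio", [
    ("paper", [("author", "Crick"), ("author", "Wallace"),
               ("title", "The spiral DNA"), ("date", 1956)]),
    ("book", [("author", "Darwin"), ("title", "Origin"), ("date", 1848)]),
    ("book", [("author", "Marx"), ("title", "Kapital"), ("date", 1860)]),
])]


def children(node, label):
    """Values reached from node along one edge with the given label."""
    if not isinstance(node, list):
        return []
    return [value for l, value in node if l == label]


# select row: {title: Y, date: Z} from biblio.paper X, X.title Y, X.date Z
papers = [("row", [("title", y), ("date", z)])
          for b in children(DB, "biblio")
          for x in children(b, "paper")
          for y in children(x, "title")
          for z in children(x, "date")]

# select row: {author: Y, date: Z} from biblio.book X, X.author Y, X.date Z
books = [("row", [("author", y), ("date", z)])
         for b in children(DB, "biblio")
         for x in children(b, "book")
         for y in children(x, "author")
         for z in children(x, "date")]
print(papers)
print(books)
```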

Page 44: Models and languages for semistructured data

Which authors have written a book or a paper in 1992?

select author: X
from biblio.(book | paper) Y, Y.author X
where Y.date = 1992

Page 45: Models and languages for semistructured data

Which authors have written a book together with Jones?

select author: X
from biblio.book Y, Y.author X
where “Jones” in Y.author

Page 46: Models and languages for semistructured data

Which authors have written both a book and a paper?

select author: A
from biblio.book B, biblio.paper P, B.author A
where B.author = P.author

select author: A1
from biblio.book B, biblio.paper P, B.author A1, P.author A2
where A1 = A2

Page 47: Models and languages for semistructured data

List all publications in 1992, their types, and titles.

select publ: {type: L, title: T}
from biblio.L X, X.title T
where X.date = 1992

Page 48: Models and languages for semistructured data

<!DOCTYPE db [
  <!ELEMENT db (r1*, r2*, r3*)>
  <!ELEMENT r1 (a, b)>
  <!ELEMENT r2 (c, d)>
  <!ELEMENT r3 (a, c)>
  <!ELEMENT a (#PCDATA)>
  <!ELEMENT b (#PCDATA)>
  <!ELEMENT c (#PCDATA)>
  <!ELEMENT d (#PCDATA)>
]>

<db>
  <r1> <a> a1 </a> <b> b1 </b> </r1>
  <r1> <a> a2 </a> <b> b2 </b> </r1>
  <r2> <c> a1 </c> <d> b1 </d> </r2>
  <r2> <c> c2 </c> <d> d2 </d> </r2>
  <r3> <a> a1 </a> <c> b1 </c> </r3>
</db>

