Representing and Querying XML with Incomplete Information

Page 1: Representing and Querying XML with Incomplete Information

Representing and Querying XML with Incomplete Information

Serge Abiteboul, INRIA

Luc Segoufin, INRIA

Victor Vianu, UCSD

Page 2: Representing and Querying XML with Incomplete Information

pods 2001 Abiteboul-Segoufin-Vianu 2

Organization

• Motivations
• Simplifying assumptions
• Model of incompleteness
• Answering queries
• Results
• Discussion
• Conclusion

Page 3: Representing and Querying XML with Incomplete Information

Motivations

Page 4: Representing and Querying XML with Incomplete Information

The Web is a world of incompleteness

• Information you get from the Web is seldom complete:
  • Queries return some, not all, of the data
  • Limited storage capability
  • Documents change on the Web: expiration
  • Sites are unavailable…

• Context: A warehouse of XML documents from the Web, Xyleme

Page 5: Representing and Querying XML with Incomplete Information

This work

• This work: simple, practically appealing approach to managing incomplete information

• Sequence of queries to the Web
  • (q1,A1)+(q2,A2)+…
  • Answers are cached

• Process a new query without access to the Web
  • Give an incomplete answer
  • Explain incompleteness to the user
  • Seek additional information, i.e., find a minimal set of queries to fully answer

Page 6: Representing and Querying XML with Incomplete Information

Related works

• Semantic caching
• Answering queries using views
  • keep (Qi,Ai)
  • try to rewrite query Q into Q’(A1,...,An)
  • reject if you cannot
• Incomplete database
  • (Qi,Ai) is some incomplete knowledge of DB
• Related to querying incomplete information, e.g. Lipski-Imielinski

Page 7: Representing and Querying XML with Incomplete Information

Challenge: balance expressiveness and tractability

• Choice of data model
• Choice of the query language
• Choice of a representation of incompleteness

• Results
  • Simple, practical solution
  • Extra features lead to serious problems

Page 8: Representing and Querying XML with Incomplete Information

Simplifying Assumptions

Page 9: Representing and Querying XML with Incomplete Information

Data is XML: trees

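[Figure: the document below drawn as a tree. The root dealer has children UsedCars and NewCars; UsedCars has an ad with model (Honda) and year (96); NewCars has an ad with model (Acura).]

<dealer>
  <UsedCars>
    <ad>
      <model>Honda</model>
      <year>96</year>
    </ad>
  </UsedCars>
  <NewCars>
    <ad>
      <model>Acura</model>
    </ad>
  </NewCars>
</dealer>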

Page 10: Representing and Querying XML with Incomplete Information

Simplified XML

[Figure: a catalog tree with two product nodes. Labels: name, price, cat, picture under one product; name, price, category under the other; a subcategory node under each. Values: =can, =444, =electronique; =nik, =234, =electronic; =camera (twice); =c.jpg. Annotations: unordered trees, labelling function, value function.]
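The simplified data model on this slide (unordered trees, a labelling function, a value function) can be sketched in a few lines of Python. This is only an illustration: the class name Node and the helpers canonical and same_tree are assumptions made for the sketch, not names from the talk.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One node of a simplified XML tree (illustrative names only)."""
    label: str                       # labelling function: node -> element name
    value: Optional[object] = None   # value function: node -> data value
    children: list = field(default_factory=list)


def canonical(node):
    """Order-independent form of a tree, so sibling order never matters."""
    kids = sorted((canonical(c) for c in node.children), key=repr)
    return (node.label, node.value, tuple(kids))


def same_tree(a, b):
    """Two unordered trees are equal when their canonical forms agree."""
    return canonical(a) == canonical(b)


p1 = Node("product", children=[Node("name", "can"), Node("price", 444)])
p2 = Node("product", children=[Node("price", 444), Node("name", "can")])
print(same_tree(p1, p2))  # True: the children differ only in order
```

Sorting the children by their printed form is just one convenient way to ignore order; any canonical ordering would do.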

Page 11: Representing and Querying XML with Incomplete Information

Simple XML types

[Figure: a tree type. catalog has product children; product has name, price, cat and picture children, with subcategory below; two edges carry a * multiplicity.]

Multiplicities:
• 1 : 1 child (default)
• * : 0 or more
• + : 1 or more
• ? : 0 or 1
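A minimal sketch of how such a simple tree type could be checked, using the four multiplicities listed above. The dictionary TREE_TYPE is an assumed reading of the figure (which edges carry * is not recoverable from the transcript), and the function name conforms is hypothetical.

```python
from collections import Counter

# Multiplicity symbol -> (minimum, maximum) number of children; None = no bound.
BOUNDS = {"1": (1, 1), "?": (0, 1), "+": (1, None), "*": (0, None)}

# Assumed reading of the slide's type; the exact multiplicities are a guess.
TREE_TYPE = {
    "catalog": {"product": "*"},
    "product": {"name": "1", "price": "1", "cat": "1", "picture": "*"},
    "cat": {"subcategory": "1"},
}


def conforms(tree, tree_type=TREE_TYPE):
    """tree is (label, [children]); labels without a rule must be leaves."""
    label, children = tree
    rules = tree_type.get(label, {})
    counts = Counter(child[0] for child in children)
    if any(lbl not in rules for lbl in counts):
        return False  # a child label the type does not allow here
    for lbl, mult in rules.items():
        lo, hi = BOUNDS[mult]
        n = counts[lbl]
        if n < lo or (hi is not None and n > hi):
            return False
    return all(conforms(c, tree_type) for c in children)


ok = ("catalog", [("product", [("name", []), ("price", []),
                               ("cat", [("subcategory", [])]),
                               ("picture", []), ("picture", [])])])
bad = ("catalog", [("product", [("name", [])])])  # price and cat are missing
print(conforms(ok), conforms(bad))  # True False
```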

Page 12: Representing and Querying XML with Incomplete Information

Prefix Selection Queries (ps-queries)

[Figure: two ps-queries drawn as trees.
Query1: catalog / product with children name, price <200 and cat =elec, with subcategory below.
Query2: catalog / product with children name and picture.]
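One way to read a ps-query is as a tree pattern with conditions on values, whose answer is the prefix of the data tree that it selects. The sketch below follows that reading under stated assumptions: trees are (label, value, children) triples, conditions are Python predicates, and the names matches and answer are made up for the illustration. It is not the authors' algorithm.

```python
ANY = lambda v: True  # a query node with no condition on the value


def matches(node, query):
    """A data node matches a query node if label and condition agree and
    every query child is matched by at least one data child."""
    label, value, children = node
    qlabel, cond, qchildren = query
    if label != qlabel or not cond(value):
        return False
    return all(any(matches(c, qc) for c in children) for qc in qchildren)


def answer(node, query):
    """Return the selected prefix of the data tree, or None if nothing matches."""
    if not matches(node, query):
        return None
    label, value, children = node
    kept = []
    for qc in query[2]:
        for c in children:
            sub = answer(c, qc)
            if sub is not None:
                kept.append(sub)
    return (label, value, kept)


def product(name, price, cat, sub):
    return ("product", None, [("name", name, []), ("price", price, []),
                              ("cat", cat, [("subcategory", sub, [])])])


data = ("catalog", None, [product("canon", 120, "elec", "camera"),
                          product("akai", 250, "elec", "camera")])

# Query1 from the slide: electronic products with price below 200.
q1 = ("catalog", ANY, [("product", ANY, [
    ("name", ANY, []),
    ("price", lambda v: v < 200, []),
    ("cat", lambda v: v == "elec", [("subcategory", ANY, [])]),
])])

a1 = answer(data, q1)
names = [c[1] for p in a1[2] for c in p[2] if c[0] == "name"]
print(names)  # ['canon']
```

Because a query never repeats a child label (one of the simplifications listed on the next slide), each data node is kept at most once.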

Page 13: Representing and Querying XML with Incomplete Information

Simplifications

Data

• No order
• No distinction attribute/element
• No recursion
• No links

Query

• No complex path expressions
• No join
• No repeated child

[Figure: a query on product with children name, cat=elec and cat=toy, marked NO (repeated child).]

Page 14: Representing and Querying XML with Incomplete Information

Crucial assumption: XID

[Figure: two answers that share the node identifier &245 are merged. A prod node &245 with canon, 120, elec and camera, plus (+) a prod node &245 with c.jpg, equals (=) a single prod node &245 with canon, 120, elec, camera and c.jpg.]

• URLs
• ID/IDrefs
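The role of persistent node identifiers can be illustrated with a small merge of two cached answers. In this sketch an answer is a dictionary from node id to a record; only &245 comes from the slide, while the other ids (&1, &2, &3) and the function name merge are hypothetical.

```python
def merge(a, b):
    """Union of two answers keyed by node id; shared ids are glued together."""
    merged = {}
    for nid in a.keys() | b.keys():
        na, nb = a.get(nid), b.get(nid)
        if na is not None and nb is not None:
            if (na["label"], na["value"]) != (nb["label"], nb["value"]):
                raise ValueError(f"conflicting information for node {nid}")
        base = na if na is not None else nb
        kids = set()
        for rec in (na, nb):
            if rec is not None:
                kids |= rec["children"]
        merged[nid] = {"label": base["label"], "value": base["value"],
                       "children": kids}
    return merged


def leaf(label, value):
    return {"label": label, "value": value, "children": set()}


# First answer: product &245 with its name and price.
a1 = {"&245": {"label": "prod", "value": None, "children": {"&1", "&2"}},
      "&1": leaf("name", "canon"), "&2": leaf("price", 120)}
# Second answer: the same product &245, this time with its picture.
a2 = {"&245": {"label": "prod", "value": None, "children": {"&3"}},
      "&3": leaf("picture", "c.jpg")}

both = merge(a1, a2)
print(sorted(both["&245"]["children"]))  # ['&1', '&2', '&3']
```

Without shared ids there would be no way to tell that the two prod nodes are the same product, which is the point of the assumption.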

Page 15: Representing and Querying XML with Incomplete Information

Representation of incomplete information:

Incomplete trees

Page 16: Representing and Querying XML with Incomplete Information

Document Type Definitions (DTDs) are used to represent incompleteness

• Set of rules: e → r
  • e: element name
  • r: regular expression
• Set of trees satisfying a DTD d: tree(d)
• Shortcoming of DTDs
  • An element has a single definition independently of the context
  • Type of ad depends on the context

[Figure: dealer has children usedcar and newcar, each with an ad child; one ad has model and year, the other only model.]

Page 17: Representing and Querying XML with Incomplete Information

Solution: specialization (decoupled tags)

• ad_used and ad_new, with h(ad_used) = h(ad_new) = ad

[Figure: a specialized tree, dealer with children usedcar and newcar whose ad children are tagged ad_used and ad_new, is mapped by h to the plain tree in which both are labelled ad.]
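The idea of specialization can be sketched as follows: rules are written over specialized names, and the mapping h erases the specialization to recover the visible element name. The rule table below is an assumed reading of the dealer example, the function names are hypothetical, and the check supports only the multiplicities 1 and * and assumes that the children allowed by one rule have distinct visible labels.

```python
from collections import Counter

H = {"ad_used": "ad", "ad_new": "ad"}  # h erases the specialization


def h(name):
    return H.get(name, name)


# Rules over specialized names: child name -> multiplicity ("1" or "*").
RULES = {
    "dealer": {"usedcar": "1", "newcar": "1"},
    "usedcar": {"ad_used": "*"},
    "newcar": {"ad_new": "*"},
    "ad_used": {"model": "1", "year": "1"},
    "ad_new": {"model": "1"},
}


def conforms(tree, spec):
    """Does tree = (label, [children]) have the specialized type spec?"""
    label, children = tree
    if h(spec) != label:
        return False
    rules = RULES.get(spec, {})
    by_label = {h(s): s for s in rules}  # visible label -> specialized name
    counts = Counter()
    for child in children:
        child_spec = by_label.get(child[0])
        if child_spec is None or not conforms(child, child_spec):
            return False
        counts[child_spec] += 1
    return all(counts[s] == 1 for s, mult in rules.items() if mult == "1")


used_ad = ("ad", [("model", []), ("year", [])])
new_ad = ("ad", [("model", [])])
doc = ("dealer", [("usedcar", [used_ad]), ("newcar", [new_ad])])
wrong = ("dealer", [("usedcar", [used_ad]), ("newcar", [used_ad])])
print(conforms(doc, "dealer"), conforms(wrong, "dealer"))  # True False
```

The same label ad is accepted with a year under usedcar and rejected with a year under newcar, which a plain DTD cannot express.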

Page 18: Representing and Querying XML with Incomplete Information

DTDs + Specialization

The sets of trees that can be specified: the regular unranked tree languages [Brüggemann-Klein, Murata, Wood]

• Same closure properties: intersection, union, complement

• Same complexity

Page 19: Representing and Querying XML with Incomplete Information

Example

Q1: name, subcat, price of electronic products with price less than $200

Q2: name, pictures of cameras pictured at least once

--------------------------------------------------------

Q3: name, price, pictures of cameras costing less than $100 and pictured at least once

can be completely answered using A1, A2

Q4: list all cameras

can be partially answered using A1, A2

Page 20: Representing and Querying XML with Incomplete Information

Q1: name, subcat, price of electronic products with price less than 200

[Figure: the incomplete tree after Q1. Known data under catalog: product (canon, 120, elec, camera), product (nikon, 199, elec, camera), product (sony, 175, elec, cdplayer). Missing data: product1 * and product2 *.]

Page 21: Representing and Querying XML with Incomplete Information

Missing data after Q1

[Figure: types of the missing products after Q1. Both product1 and product2 have children name, price, cat and picture, with subcategory below and one edge marked *. Conditions: !=elec on product1; =elec and >200 on product2.]

Page 22: Representing and Querying XML with Incomplete Information

Q2: name, pictures of cameras pictured at least once

[Figure: the incomplete tree after Q2. Known data under catalog: product (canon, 120, elec, camera, c.jpg), product (nikon, 199, elec, camera), product (sony, 175, elec, cdplayer), and a product with akai, a.jpg, elec, camera. Missing data: product1, product2a, product2b, product2c, some marked with *; the exact multiplicities are not recoverable from the transcript.]

Page 23: Representing and Querying XML with Incomplete Information

Incomplete information

• Known information
  • Prefix of the real data tree

• Missing information
  • Extended tree type
  • Conditions on data values
  • Specializations, disjunctions

Page 24: Representing and Querying XML with Incomplete Information

[Figure: the types after Q1 and Q2, split into known data and missing data, under a root allowing product +. Types shown: product1 (name, price, cat, picture, subcategory, one edge marked *), product2a (name, price, cat, picture, subcategory), product2b (name, price, cat, picture, one edge marked *), product2c (name, price, cat, subcategory), product3 (name, price, cat, with cat elec). Conditions shown: !=elec on product1; =elec and >200 on product2a, product2b and product2c; subcategory !=camera (twice); no picture (twice).]

Page 25: Representing and Querying XML with Incomplete Information

Answering Queries

Page 26: Representing and Querying XML with Incomplete Information

Complete answer to Q3

• Q3: name, price, pictures of cameras costing less than $150 and having at least one picture

• Can be fully answered using available information

• Need to check whether answer is complete

[Figure: the answer, catalog / prod with canon, 120, c.jpg.]

Page 27: Representing and Querying XML with Incomplete Information

Incomplete answer to Q4

• Provide known cameras
• Explain incompleteness

[Figure: the names canon, nikon, sony, akai, followed by "more products" with a name child, under the conditions price>200 and no picture.]

Page 28: Representing and Querying XML with Incomplete Information

Completing answer to Q4

• It suffices to ask:

[Figure: a query on product with children name, price >200 and cat =elec, with sub=camera below, and picture marked 0.]

Page 29: Representing and Querying XML with Incomplete Information

Revisit the types

• DTD
• Conditions
• Specialization: same element name may have several types

• Not sufficient

• Need to extend again the types: disjunctions

[Figure: the type product2b with children name, price, cat and picture, one edge marked *; conditions =elec, >200 and subcategory !=camera.]

Page 30: Representing and Querying XML with Incomplete Information

Disjunction

[Figure: a vehicle type with children data (description below), engine and sail, two edges marked ?. Two queries on vehicle, Query1’ and Query2’, use the labels data, description, engine and sail. For node &322, a vehicle with data=“….” and description=“….”, an answer is marked Empty!.]

Page 31: Representing and Querying XML with Incomplete Information

Disjunction continued

• Type of &322: vehicle1 + vehicle2

[Figure: vehicle1 has children data (description below) and engine; vehicle2 has children data (description below) and sail.]

The type of &322 cannot be described independently of that of data below

Page 32: Representing and Querying XML with Incomplete Information

Results

Page 33: Representing and Querying XML with Incomplete Information

Representation System: Lipski’s + Imielinski’s

[Figure: a commuting diagram. T (representation of information) is mapped by rep to rep(T) (set of possible worlds). Applying q to T gives q(T) (representation of result), which rep maps to the set of possible answers. Applying q to rep(T) gives the same set: q(rep(T)) = rep(q(T)).]

Page 34: Representing and Querying XML with Incomplete Information

Representation System for PS-queries

• Incomplete tree T to represent q1⁻¹(A1) … qk⁻¹(Ak)

• PS-query q

• q(T) can be computed in ptime (representation of the answer can be computed in ptime)

Page 35: Representing and Querying XML with Incomplete Information

Querying Incomplete Trees

• Given T and a query q, one can
  • Give in ptime the sure answers up to our current knowledge
  • Check in ptime whether query q can be fully answered
  • Generate in ptime queries to complete the answer

Page 36: Representing and Querying XML with Incomplete Information

Comparison with IL

Relational model

• Relational calculus/algebra

• Conditional table

• Closed or open world

• Representation system

XML tree model

• Weaker language (no join)

• Weaker system (no variable)

+ Closed and open World

• Representation system

Page 37: Representing and Querying XML with Incomplete Information

Drawback: exponential blowup

• Incomplete information may become exponential w.r.t. the sequence of query/answer q1/A1; q2/A2…

[Figure: Type: database with children a and b. Queries qi: database with a=i and b=i. Answers are empty. Two edges are marked 1.]

Page 38: Representing and Querying XML with Incomplete Information

Dealing with exponential blowup

• Make the representation more complex using disjunctions of types
  • Size of representation stays polynomial
  • Manipulations much more complex

• Restrict tree types and PS-queries
  • Already very/too? simple

• Accept to lose some information
• Ask extra queries to simplify representation

Page 39: Representing and Querying XML with Incomplete Information

Discussion

Page 40: Representing and Querying XML with Incomplete Information

Discussion: extend language

• Some results in paper
• Extensions often lead to intractability

• E.g.: k-pebble transducers [Milo, Suciu, Vianu] that somehow subsume XML-QL and XSL
  • No (known) representation system
  • Testing whether rep(T) is empty is non-elementary

Page 41: Representing and Querying XML with Incomplete Information

Discussion: node Ids

Without node Ids
• much less information to integrate results
• more complex
• tedious case analysis

Page 42: Representing and Querying XML with Incomplete Information

Discussion: ordering

• Ordering in XML, DTD, queries
  • Problem is totally different and very complex

• Example:
  • Q1/A1: list of males; Q2/A2: list of females; Q3: list all

• Depending on the type of input
  • (Male)*(Female)*: A3 = A1 || A2
  • (Male Female)*: A3 = shuffle(A1,A2)
  • (Male + Female)*: we cannot answer A3

• Regular expression processing
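The three cases of the male/female example can be made concrete with a short sketch. It assumes that shuffle on the slide means strict alternation, which is what the type (Male Female)* forces; the function name combine and the string encoding of the types are hypothetical.

```python
def combine(input_type, males, females):
    """Rebuild the ordered answer A3 from A1 (males) and A2 (females),
    or return None when the input type does not determine the order."""
    if input_type == "(Male)*(Female)*":
        return males + females            # A3 = A1 || A2 (concatenation)
    if input_type == "(Male Female)*":
        if len(males) != len(females):
            raise ValueError("answers do not fit the type (Male Female)*")
        # strict alternation: m1, f1, m2, f2, ...
        return [x for pair in zip(males, females) for x in pair]
    if input_type == "(Male + Female)*":
        return None                       # any interleaving is possible
    raise ValueError(f"unknown input type: {input_type}")


a1, a2 = ["m1", "m2"], ["f1", "f2"]
print(combine("(Male)*(Female)*", a1, a2))  # ['m1', 'm2', 'f1', 'f2']
print(combine("(Male Female)*", a1, a2))    # ['m1', 'f1', 'm2', 'f2']
print(combine("(Male + Female)*", a1, a2))  # None
```

In the last case the cached answers give the set of elements but not their relative order, so Q3 cannot be answered without going back to the source.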

Page 43: Representing and Querying XML with Incomplete Information

Conclusion

• Framework for acquiring, maintaining, querying incomplete XML data

• Limitations:
  • simple queries
  • no order and Id assumption
  • small extensions lead to problems

• Possible to represent the incompleteness
• Possible to answer with incompleteness
• Possible to obtain queries to provide full answer

