
Querying Infinite Databases

Date post: 10-Feb-2016
Description:
Querying Infinite Databases. Safety of Datalog Queries over infinite Databases (Sagiv and Vardi ’90); Queries and Computation on the Web (Abiteboul and Vianu ’97). Itay Maman, 049011 Student Symposium, 5 July 2006.
Page 1: Querying Infinite Databases

Querying Infinite Databases

Safety of Datalog Queries over infinite Databases (Sagiv and Vardi ’90)
Queries and Computation on the Web (Abiteboul and Vianu ’97)

Itay Maman
049011 Student Symposium, 5 July 2006

Page 2: Querying Infinite Databases

Simple Technion Queries…

(Domain: The Technion’s students database)

• Q1: Which courses did Gidi attend?
    SELECT course FROM students WHERE name='Gidi'

• Q2: Which students took 234218?
    SELECT name FROM students WHERE course='234218'

Table: courses
    course   name
    234218   Gidi
    236703   Gidi
    234218   Dina
    …        …
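As a quick check of Q1 and Q2, here is a minimal sketch that loads the three sample rows into an in-memory SQLite database. The table name `students` is taken from the queries; the helper `run` and the sorting of results are assumptions made only for this illustration.

```python
import sqlite3

# In-memory table holding the three sample rows shown on the slide.
# Table name follows the queries; the helper `run` is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (course TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [("234218", "Gidi"), ("236703", "Gidi"), ("234218", "Dina")],
)

def run(sql, args):
    """Run a single-column query and return its values, sorted."""
    return sorted(row[0] for row in conn.execute(sql, args))

q1 = run("SELECT course FROM students WHERE name = ?", ("Gidi",))
q2 = run("SELECT name FROM students WHERE course = ?", ("234218",))
print(q1)  # courses Gidi attended
print(q2)  # students who took 234218
```

Both queries scan a finite table, so each is guaranteed to return a complete answer.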

Page 3: Querying Infinite Databases

Simple Web Queries…

• Q3: Which pages does my home page link to?
    SELECT target FROM links WHERE source='www.geocities.com/mysite'

• Q4: Which pages link to my home page?
    SELECT source FROM links WHERE target='www.geocities.com/mysite'

• Q4 is challenging:
  • No matter how long my web-crawler works, I can never find all incoming links of a page!
  • This is an infinite query
    • The more you crawl, the more answers you get
  • (In Q3 the size of the result set is bounded)

Table: links
    source                     target
    www.google.com             www.google.co.il
    www.geocities.com/mysite   www.ynet.co.il
    www.cnn.com                www.geocities.com/mysite
    …                          …
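The asymmetry between Q3 and Q4 can be sketched with a toy stand-in for the web. The dictionary `WEB` and the functions `outgoing` and `incoming_so_far` are hypothetical names for this illustration; a real crawler would fetch and parse pages instead of reading a dictionary.

```python
# Hypothetical stand-in for the web: page -> pages it links to.
WEB = {
    "www.google.com": ["www.google.co.il"],
    "www.geocities.com/mysite": ["www.ynet.co.il"],
    "www.cnn.com": ["www.geocities.com/mysite"],
}
ME = "www.geocities.com/mysite"

def outgoing(page):
    """Q3: reading the page itself is enough to list every target."""
    return set(WEB.get(page, []))

def incoming_so_far(page, crawled):
    """Q4: only a lower bound, computed from the pages crawled so far."""
    return {src for src in crawled if page in WEB.get(src, [])}

q3 = outgoing(ME)
partial = incoming_so_far(ME, ["www.google.com"])
more = incoming_so_far(ME, ["www.google.com", "www.cnn.com"])
print(q3, partial, more)
```

The Q3 answer is final after one lookup, whereas the Q4 answer can only grow as the crawl covers more pages.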

Page 4: Querying Infinite Databases

Leading questions

• What does an infinite DB look like?
• Can we evaluate a query over an infinite DB?
• Can we determine the finiteness of a query?

• But first, some Datalog…

Page 5: Querying Infinite Databases

Datalog

• Why Datalog?
  • Supports recursion/transitive closure (unlike SQL without recursive extensions)
    • Recursion is essential in large data-sets
  • Terminates if the DB is finite
  • Very simple

• program = a collection of rules
• rule = a head term, followed by ":-" and a body (a sequence of terms)

• In our program:
  • Three rules
  • Two queries (AKA: IDB): g(X), small(X,Y)
  • One table (AKA: EDB): before(X,Y)
  • A goal predicate from which execution starts
    • We choose g(X) as the goal

    g(X) :- small(X,2).
    small(X,Y) :- before(X,Y).
    small(X,Y) :- small(X,Z), before(Z,Y).

Page 6: Querying Infinite Databases

Finiteness

• A DB is finite if every table is a finite set
    before(X,Y) = { (0,1), (1,2), (2,3) }

• Possible evaluation schemes:
  • Brute force
  • Bottom-up

• Optimizations

• The requirement: finiteness of tables

• The guarantee: termination of the Datalog program

Page 7: Querying Infinite Databases

Infinity

• Here is another definition for our table
    before(X,Y) = { (X,X+1) | X ≥ 0 }

• We now have an infinite DB
  • The problem: we cannot iterate over all the tuples in the set
  • The solution: a top-down algorithm

• Such tables are quite common
  • The internet links relation
      links(X,Y) = { (X,Y) | page X links to page Y }
  • Java’s subclassing relation
      extends(X,Y) = { (X,Y) | class X extends Y }

Leading question: What does an infinite DB look like?
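One way to picture such a table is as a rule that produces tuples on demand rather than as stored rows. The sketch below is an illustration only: the function names `before_all`, `before_with_x` and `before_with_y` are hypothetical, and the domain is assumed to be the non-negative integers.

```python
from itertools import count, islice

def before_all():
    """Enumerate before = {(X, X+1) | X >= 0}; this generator never ends."""
    for x in count(0):
        yield (x, x + 1)

def before_with_y(y):
    """Tuples of before whose second column equals y (at most one)."""
    return {(y - 1, y)} if y >= 1 else set()

def before_with_x(x):
    """Tuples of before whose first column equals x (at most one)."""
    return {(x, x + 1)} if x >= 0 else set()

# A full scan can only ever be sampled, never completed.
first_three = list(islice(before_all(), 3))
print(first_three)
print(before_with_y(2), before_with_y(0), before_with_x(2))
```

Although a full scan is impossible, a lookup with one column fixed is finite, and that is the property a top-down evaluation can exploit.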

Page 8: Querying Infinite Databases

Example: Top-down evaluation

    g(W) :- small(W,2).
    small(A,B) :- before(A,B).
    small(X,Y) :- small(X,Z), before(Z,Y).
    before(X,Y) = { (X,X+1) | X ≥ 0 }

• b: before
• s: small
• ⋈: join (binds tighter than ∪)

    s(X,Y) = b(X,Y) ∪ s(X,Z) ⋈ b(Z,Y)

    g(W) = s(W,2)
         = b(W,2) ∪ s(W,Z) ⋈ b(Z,2)
         = {(1,2)} ∪ s(W,1) ⋈ {(1,2)}
         = {(1,2)} ∪ [b(W,1) ∪ s(W,Z) ⋈ b(Z,1)] ⋈ {(1,2)}
         = {(1,2)} ∪ [{(0,1)} ∪ s(W,0) ⋈ {(0,1)}] ⋈ {(1,2)}
         = {(1,2)} ∪ [{(0,1)} ∪ [b(W,0) ∪ s(W,Z) ⋈ b(Z,0)] ⋈ {(0,1)}] ⋈ {(1,2)}
         = {(1,2)} ∪ [{(0,1)} ∪ [∅ ∪ s(W,Z) ⋈ ∅] ⋈ {(0,1)}] ⋈ {(1,2)}
         = {(1,2)} ∪ [{(0,1)} ∪ ∅ ⋈ {(0,1)}] ⋈ {(1,2)}
         = {(1,2)} ∪ {(0,1)} ⋈ {(1,2)}
         = {(1,2)} ∪ {(0,2)}
         = {(1,2), (0,2)}

Page 9: Querying Infinite Databases

Top-down evaluation

• The top-down algorithm
  • Init: assign r ← body of the goal
  • Loop:
    • (Intelligently) pick a term, t, from r
    • If t is a query term: replace it with the union of the rules indicated by t
    • If t is a table term: replace it with the set generated by the table
    • Replace s ⋈ ∅ expressions (in r) with ∅
    • Replace s ∪ ∅ expressions (in r) with s
    • Evaluate relational algebra expressions (if both sides are known)
  • Stop if no further replacements can be made

Leading question: Can we evaluate a query over an infinite DB?

Yes
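For the small(W,2) goal, the top-down idea can be sketched as a recursive function that only ever looks up before with its second column fixed. This is a specialization written for this one program, not the general rewriting algorithm; the names `before_with_y`, `small_with_y` and `g` are hypothetical.

```python
def before_with_y(y):
    """Table term: tuples of before = {(X, X+1) | X >= 0} ending in y."""
    return {(y - 1, y)} if y >= 1 else set()

def small_with_y(y):
    """All (W, y) in small: before(W, y) united with small(W, Z) joined to before(Z, y)."""
    base = before_with_y(y)              # expand the table term
    result = set(base)
    for (z, _) in base:                  # every Z with before(Z, y)
        for (w, _) in small_with_y(z):   # expand the query term small(W, Z)
            result.add((w, y))           # join on Z, keep (W, y)
    return result

def g():
    """Goal: g(W) :- small(W, 2)."""
    return {w for (w, _) in small_with_y(2)}

print(small_with_y(2))
print(g())
```

The recursion stops because each call moves to a strictly smaller second column, and before has no tuple ending in 0, which corresponds to the empty-set replacements in the algorithm.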

Page 10: Querying Infinite Databases

Infinite Queries

• Can the top-down algorithm run forever? Yes

• Case 1: A table that returns an infinite result
    evenProduct(X,Y) = { (X,Y) | X*Y mod 2 = 0 }
    divides(X,Y) = { (X,Y) | X mod Y = 0 }
    links(X,Y) = { (X,Y) | page X links to page Y }

• Weak-safety: all intermediate results are finite

• Result #1 (Sagiv and Vardi ’90): Weak-safety is decidable given the F/C (finiteness constraints) of the tables
  • F/C of evenProduct: None
  • F/C of divides: X => Y (each value of X is paired with finitely many values of Y)
  • F/C of links: X => Y
  • Algorithm: Tracking the flow of values from assigned variables
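The idea of tracking the flow of values can be sketched for a single, non-recursive rule body. This is a simplified illustration under stated assumptions, not the decision procedure of Sagiv and Vardi: the encoding `FC` and the functions `finitely_bound` and `looks_weakly_safe` are hypothetical, and recursive rules are not handled.

```python
# Finiteness constraints per table as (source positions, target positions).
# ((0,), (1,)) reads "finitely many Y values for each X value".
FC = {
    "evenProduct": [],
    "divides": [((0,), (1,))],
    "links": [((0,), (1,))],
}

def finitely_bound(body, bound):
    """Propagate finiteness from the variables in `bound` through `body`.

    body is a list of (table, argument variables); the result is the set of
    variables known to range over finitely many values.
    """
    finite = set(bound)
    changed = True
    while changed:
        changed = False
        for table, args in body:
            for src, dst in FC[table]:
                if all(args[i] in finite for i in src):
                    new = {args[i] for i in dst} - finite
                    if new:
                        finite |= new
                        changed = True
    return finite

def looks_weakly_safe(body, bound):
    """True if every variable of the body ends up finitely bound."""
    variables = {v for _, args in body for v in args}
    return variables <= finitely_bound(body, bound)

two_hops = [("links", ("A", "B")), ("links", ("B", "C"))]
print(looks_weakly_safe(two_hops, {"A"}))                    # forward direction
print(looks_weakly_safe([("links", ("A", "B"))], {"B"}))     # backward direction
```

Following links forward from a fixed page keeps every intermediate result finite, while asking for the pages that link to a fixed page does not.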

Page 11: Querying Infinite Databases

Infinite Queries (cont.)

• Can the top-down algorithm run forever? Yes

• Case 2: The algorithm’s recursion never stops
  • A query/table is used in its “unbounded” direction

    g(W) :- small(2,W).
    small(A,B) :- before(A,B).
    small(X,Y) :- small(X,Z), before(Z,Y).
    before(X,Y) = { (X,X+1) | X ≥ 0 }

    s(X,Y) = b(X,Y) ∪ s(X,Z) ⋈ b(Z,Y)

    g(W) = s(2,W)
         = b(2,W) ∪ s(2,Z) ⋈ b(Z,W)
         = {(2,3)} ∪ s(2,Z) ⋈ b(Z,W)
         = {(2,3)} ∪ [b(2,Z) ∪ s(2,Z’) ⋈ b(Z’,Z)] ⋈ b(Z,W)
         …

• Results #2-3 (Sagiv and Vardi ’90):
  • Termination is undecidable in the general case
  • Termination is decidable if all queries are unary
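The unbounded direction can be made visible with a lazy enumeration of small(2,W). This is a sketch that assumes the same successor table as above; `before_with_x` and `small_from` are hypothetical names. The generator is cut off with `islice` precisely because it would otherwise never finish.

```python
from itertools import islice

def before_with_x(x):
    """Tuples of before = {(X, X+1) | X >= 0} starting at x (at most one)."""
    return {(x, x + 1)} if x >= 0 else set()

def small_from(x):
    """Lazily enumerate small(x, W); for this table the stream is endless."""
    frontier = x
    while True:
        step = before_with_x(frontier)
        if not step:
            return
        _, nxt = next(iter(step))   # the single successor tuple
        yield (x, nxt)
        frontier = nxt

# Only a prefix can ever be materialized.
first_four = list(islice(small_from(2), 4))
print(first_four)
```

Every step finds a new successor, so there is no point at which the expansion of small(2,W) can stop.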

Page 12: Querying Infinite Databases

Infinite Queries (summ.)

• We can automatically determine weak-safety
• We cannot (automatically) determine termination

• But one can analytically prove that a given query over a given DB is finite
  • E.g., our small(W,2) program

Leading question: Can we determine the finiteness of a query?

No (not automatically, in the general case)

Page 13: Querying Infinite Databases

The Web as a DB

• The web data model (WDM):
  • A scheme of a DB that can represent the web graph
  • Just three tables:
      urls = { u | u is a url of a web-page }
      links = { (u1,u2) | u1 links to u2; u1, u2 ∈ urls }
      Words = { (u,w) | w appears in page u; u ∈ urls }

• Result #4 (Abiteboul and Vianu ’97):
  • If a Datalog program with no literals halts over an infinite DB, its result is ∅
  • => A non-trivial query (over an infinite DB) must have a literal

Page 14: Querying Infinite Databases

Web - Machines

• Browsing Machine
  • A weakly safe Datalog program (over WDM)
  • At least one URL literal

• Searching/Browsing Machine
  • An unsafe Datalog program (over WDM)
    • Evaluates queries in parallel
  • Allowed literal types: URLs, Words

• Claims #1-2 (Abiteboul and Vianu ’97):
  • Browsing machine:
    • Represents a user following static links from a page
  • Searching/Browsing machine:
    • Also allows the user to access a search engine
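The browsing behaviour can be sketched as following links outward from one starting URL. Everything here is an assumption made for illustration: the URLs are made up, `LINKS` and `WORDS` are tiny in-memory stand-ins for the WDM tables, and `browse` and `pages_with` are hypothetical names. The scan over `LINKS` stands in for fetching each page's outgoing links.

```python
# Hypothetical fragment of the web in WDM form; all URLs are made up.
LINKS = {("a.example", "b.example"), ("b.example", "c.example"),
         ("z.example", "a.example")}
WORDS = {("b.example", "datalog"), ("c.example", "datalog"),
         ("z.example", "datalog")}

def browse(start, clicks):
    """URLs reachable from `start` by following at most `clicks` links."""
    reached, frontier = {start}, {start}
    for _ in range(clicks):
        # Follow outgoing links only, as a user clicking through pages would.
        frontier = {t for (s, t) in LINKS if s in frontier} - reached
        reached |= frontier
    return reached

def pages_with(word, start, clicks):
    """Reached pages on which `word` appears."""
    return {u for u in browse(start, clicks) if (u, word) in WORDS}

print(browse("a.example", 1))
print(pages_with("datalog", "a.example", 2))
```

The page z.example contains the word but is never found, because it can only be reached by following a link backwards; finding it would take a search step rather than browsing.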

Page 15: Querying Infinite Databases

Discussion: Finite approximation

• Relational database servers are very popular
  • Such DBs are finite

• Also, computing a table on demand may be slow
  • Batch processing performs better

• The challenge: Build a finite replacement for an infinite DB

• Formally:
  • Given a finite query, q, over an infinite DB, D
    • (Finiteness of q proved analytically)
  • Build a finite database, D', such that q over D' yields the same result as q over D

Page 16: Querying Infinite Databases

Discussion: Finite approximation

• Example: Our small(W,2) program
  • A finite, sound table: before(X,Y) = { (0,1), (1,2) }
  • A finite, unsound table: before(X,Y) = { (0,1) }

• The process:
  • Compute the transitive closure of the before relation
  • Start from the literal ‘2’ at the right-hand side position

• Condition: the table graph must end with a sink
  • In before the sink is the vertex ‘0’

• => We can build a finite DB
  • Sadly, in the web-graph no such sink exists
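The process above can be sketched as a backward walk from the literal 2 that stores every before tuple it meets. The function names `before_with_y` and `finite_before` are hypothetical, and the walk assumes the successor table before = {(X,X+1) | X ≥ 0}.

```python
def before_with_y(y):
    """Tuples of the infinite before table whose second column equals y."""
    return {(y - 1, y)} if y >= 1 else set()

def finite_before(start):
    """Collect every before tuple reachable backwards from `start`."""
    table, todo, seen = set(), [start], set()
    while todo:
        y = todo.pop()
        if y in seen:
            continue
        seen.add(y)
        for (x, _) in before_with_y(y):
            table.add((x, y))
            todo.append(x)      # keep walking until the sink is reached
    return table

print(finite_before(2))
```

The walk ends at 0 because nothing precedes it, which gives exactly the sound table from the example. On a graph without such a sink the loop would have no natural stopping point.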

Page 17: Querying Infinite Databases

Discussion: Temporality

• Crawling takes time
• The subject may change while crawling
  • The DB is a snapshot which never happened

• (Open question): Can we decide whether a result was really “true” at some point?

Page 18: Querying Infinite Databases

More issues

• Relational algebra over large relations
  • BDD

• Negation
  • Stratified Datalog

Page 19: Querying Infinite Databases


- Questions ? -


Page 21: Querying Infinite Databases

Datalog

• Semantics: ???
• Straightforward mapping to Relational Algebra??

    g(X) :- small(X,2).
    small(X,Y) :- before(X,Y).
    small(X,Y) :- small(X,Z), before(Z,Y).

Page 22: Querying Infinite Databases

Example: Bottom-up evaluation

Initialization: Translate the EDBs into relations

before
    X   Y
    0   1
    1   2
    2   3

Page 23: Querying Infinite Databases

Example: Bottom-up evaluation

apply small(X,Y) :- before(X,Y).

before
    X   Y
    0   1
    1   2
    2   3

small
    X   Y
    0   1
    1   2
    2   3

Page 24: Querying Infinite Databases

Example: Bottom-up evaluation

apply small(X,Y) :- small(X,Z), before(Z,Y).

First pass: join small(X,Z) with before(Z,Y) on Z

small
    X   Z
    0   1
    1   2
    2   3

before
    Z   Y
    0   1
    1   2
    2   3

small (after the first pass)
    X   Y
    0   1
    1   2
    2   3
    0   2
    1   3

Second pass: join the enlarged small(X,Z) with before(Z,Y) again

small (after the second pass)
    X   Y
    0   1
    1   2
    2   3
    0   2
    1   3
    0   3

Page 25: Querying Infinite Databases

Example: Bottom-up evaluation

apply g(X) :- small(X,2).

small
    X   Y
    0   1
    1   2
    2   3
    0   2
    1   3
    0   3

g
    X
    1
    0

Page 26: Querying Infinite Databases

Finiteness

before(X,Y) = { (0,1), (1,2), (2,3) }

• The bottom-up algorithm:
  • Init:
    • For each EDB, p, assign r(p) ← relation of all tuples satisfying p
    • For each IDB, p, assign r(p) ← ∅
  • Loop:
    • Choose a rule p(…) :- t1(…), t2(…), … tn(…)
    • t ← join of all r(ti), where 1 ≤ i ≤ n, projected onto the variables of p(…)
    • r(p) ← r(p) ∪ t
  • Continue until a fix-point is reached

• Requires: Finiteness of EDBs
• Ensures: Termination
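The loop above can be sketched for the three-rule program as a naive fix-point computation over Python sets. The function name `bottom_up` is hypothetical and the rules are hard-coded rather than parsed, so this is an illustration of the scheme, not a general Datalog engine.

```python
def bottom_up(before):
    """Naive fix-point evaluation of the three-rule program over a finite EDB."""
    small = set()                          # r(small) starts empty
    while True:
        new = set(before)                  # small(X,Y) :- before(X,Y).
        new |= {(x, y)                     # small(X,Y) :- small(X,Z), before(Z,Y).
                for (x, z) in small
                for (z2, y) in before
                if z == z2}                # join on Z, project onto (X, Y)
        if new <= small:                   # fix-point: nothing new was derived
            break
        small |= new
    g = {x for (x, y) in small if y == 2}  # g(X) :- small(X,2).
    return small, g

small, g = bottom_up({(0, 1), (1, 2), (2, 3)})
print(sorted(small))
print(sorted(g))
```

With the three-tuple before table the loop stabilizes after a few passes. It relies on iterating over every tuple of before, which is why it requires the EDB to be finite.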

