XPath and Beyond: Formal Foundations

Jean-Yves Vion-Dury, Xerox Research Centre Europe / INRIA

Pierre Genevès, INRIA

05/2004, XPath and Beyond: Formal Foundations

Roadmap: Part 1

XPath: a cornerstone of the XML architecture
Theory and Engineering
Some key problems
The trends around XPath theoretical studies
A logic-based approach
Mathematical characterization
Why use the Coq Proof Assistant?

XPath: a cornerstone of the XML architecture

Expresses node selection, structural properties, or both

Currently used in XSLT, XQuery, XML Schema, XLink, XPointer, …

XPath is elegant, compact, effective and powerful

Claim: XPath will be increasingly used and studied in the future
  Indexing large document bases
  Checking integrity constraints / global structural properties
  Linking increasing document volumes

Theory and Engineering in Computer Science

Some decades ago, theoretical studies prepared the ground for engineering
  Relational algebra enabled a huge market around data storage and access
  Information theory prepared digital processing (networks, image and sound processing, compression algorithms, …)
  Linguistics, logic and formal mathematics prepared programming languages

A strange situation today around documents…
  W3C standardization activities produce specifications, and many problems remain open
  Some theoreticians try to capture problems and to understand underlying issues, long after the publication of the specifications!

This induces new difficulties and requires different approaches
  In order to deal with low-level issues, close to implementations
  In order to face the complexity of systems

Some Key Problems around XPath

Formal semantics definition
  Formal model of documents (trees, streams, graphs, strings, …?)
  Precise, useful and simple denotational/operational semantics

Type checking
  Constraints on document structure (tree grammars, graph grammars, pattern matching)
  Valid/invalid path expressions with respect to a particular schema

Rewriting path expressions
  In order to customize compilation/interpretation
  Normalization

Optimization
  Reduction of the complexity of suitable models
  Simplifying expressions while preserving semantics

Equivalence p1 ≈ p2
  Gives a fundamental understanding of the language

Containment p1 ≤ p2
  Gives an even more fundamental view
  Key inference: if p is a key for a schema S, then all p' such that p' ≤ p are keys too

Linking Key Problems around XPath

Invalid expression and containment
  p ≤ ⊥ (p is contained in the void path)

Rewriting and equivalence
  (p1 | p2)/p -> p1/p | p2/p and (p1 | p2)/p ≈ p1/p | p2/p

Optimization and containment
  If p1 ≤ p2 then (p1 | p2)/p -> p2/p

Equivalence and containment
  p1 ≈ p2 iff p1 ≤ p2 and p2 ≤ p1

Containment and type checking
  Structural constraints can be captured in XPath expressions
  Structural constraint satisfaction can thus be checked
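The rewriting rule (p1 | p2)/p -> p1/p | p2/p can be illustrated with a small term rewriter. This is only a sketch: the tuple encoding of paths and the function names `distribute` and `show` are assumptions made for the example and do not come from the slides.

```python
# Hypothetical tuple encoding of paths (not from the slides):
#   ("step", name)      a single step such as "a"
#   ("union", p1, p2)   p1 | p2
#   ("slash", p1, p2)   p1/p2


def distribute(path):
    """Apply (p1 | p2)/p -> p1/p | p2/p bottom-up over a path term."""
    kind = path[0]
    if kind == "step":
        return path
    left, right = distribute(path[1]), distribute(path[2])
    if kind == "slash" and left[0] == "union":
        # Rewrite each branch again, since p1 or p2 may themselves be unions.
        return ("union",
                distribute(("slash", left[1], right)),
                distribute(("slash", left[2], right)))
    return (kind, left, right)


def show(path):
    """Render a term back to XPath-like text."""
    kind = path[0]
    if kind == "step":
        return path[1]
    if kind == "union":
        return f"{show(path[1])} | {show(path[2])}"
    # slash: parenthesize union operands to keep the reading unambiguous
    parts = [f"({show(p)})" if p[0] == "union" else show(p) for p in path[1:]]
    return "/".join(parts)


term = ("slash", ("union", ("step", "p1"), ("step", "p2")), ("step", "p"))
print(show(term))              # (p1 | p2)/p
print(show(distribute(term)))  # p1/p | p2/p
```

The rewriter only changes syntax; that the two sides select the same nodes is exactly the equivalence (p1 | p2)/p ≈ p1/p | p2/p stated on the slide.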

The problem of containment (expression)

$p_1 \le p_2 \iff \forall t,\ \forall x \in t,\ [\![p_1]\!]_x(t) \subseteq [\![p_2]\!]_x(t)$

table/tr/td ≤ table/*/td
table/tr/td ≤ table//*

a/b/c ≤ */*[.//c]/*

table/tr[td] ≤ table/*/*[self::td]/parent::*

The problem of typed containment (expression)

¬(table/*/td ≤ table/tr/td)

$\forall S,\ p_1 \le_S p_2 \iff \forall t \in S,\ \forall x \in t,\ [\![p_1]\!]_x(t) \subseteq [\![p_2]\!]_x(t)$

table/*/td ≤_html table/tr/td

The Trends around XPath Theoretical Studies

Topics (table columns): Formal semantics | Rewriting | Optimization | Containment & Equivalence | Typed Containment/Optimization

Child and descendant axes: [Geneves,vion04][Flesca03] | [Miklau,Suciu][Neven,Schwentick][Deutsch,Tannen][Geneves,vion04] | [Deutsch,Tannen][Kwong,Gertz]

All axes: [Olteanu01][Olteanu01][vion,layaida03] | [Geneves,vion][vion,layaida03] | ?

All axes + position and count: [Wadler99][vion,layaida03][Gottlob,koch03] | [Geneves,Rose04] | [Geneves,vion] [Geneves,vion]

A Logic Based Approach

A set of axioms to reason on term comparison ≤
  As opposed to model-based approaches

A partial equivalence relation to minimize the axiom set
  Fully congruent (e.g. p1 ≤ p2 and p1 ≡ p3 implies p3 ≤ p2)

Theorems for simplifying the containment proofs
  E.g. reflexivity, transitivity

Drawback: the syntactic level is more combinatorial than model-based approaches

Advantage: the syntactic level is more extensible, provided the previous point is addressed
  Gives more indication of the underlying issues due to language peculiarities

XPath: abstract syntax ([Wadler99],[Olteanu01])

Denotational semantics ([Wadler99][Olteanu01])

Basic axioms

[Inference rules for the basic axioms; the formulas are not recoverable from this transcript.]

Union & Intersection

[Inference rules for union and intersection; the formulas are not recoverable from this transcript.]

Qualifiers

$\dfrac{p_1 \le p_2 \qquad q_1 \Rightarrow q_2}{p_1[q_1] \le p_2[q_2]}\ (d3)$

The equivalence relation ( [Olteanu01])

Using equivalence in proofs

$\dfrac{p_1 \equiv p_2 \qquad p_2 \le p}{p_1 \le p}\ (a)$

$\dfrac{p_1 \equiv p_2 \qquad p \le p_2}{p \le p_1}\ (b)$

Mathematical Characterization

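Soundness of the equivalence

$p_1 \equiv p_2 \Rightarrow [\ [\![p_1]\!]_x(t) \subseteq [\![p_2]\!]_x(t) \ \wedge\ [\![p_2]\!]_x(t) \subseteq [\![p_1]\!]_x(t)\ ]$

Soundness of rules (e.g.)

$p_1 \le p_2 \mid p_3 \Rightarrow [\ [\![p_1]\!]_x(t) \subseteq [\![p_2 \mid p_3]\!]_x(t)\ ]$

Completeness of rule system (e.g.)

$[\ [\![p_1]\!]_x(t) \subseteq [\![p_2 \mid p_3]\!]_x(t)\ ] \Rightarrow p_1 \le p_2 \mid p_3$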

Why Use the Coq Proof Assistant?

Coq (http://coq.inria.fr) is a proof assistant based on the Calculus of Inductive Constructions
  Higher-order logic
  Constructive logic
  Typed

To address the complexity problem related to proofs
  To benefit from the help of the proof assistant in case analysis
  To maintain all the mathematical architecture along exploratory work

To work in a rigorous frame
  To produce rock-solid and readable results

The challenge:
  Requires powerful data structure modelling capabilities
  Learning Coq is an additional difficulty!
  Developing a proof in Coq is more demanding

But… Coq is quite mature now (v8.0, 25 years of research!) and very expressive

Roadmap: Part 2

Modelling XPath using inductive constructions
Formal semantics and interpretations
  Interpreter based on the denotational semantics
  A relational semantics for XPath
Modelling the containment relation
Using the proof system: containment checking
Current work on characterization
Methodology and expected outcomes

Modelling XPath using inductive constructions

Paths are defined inductively
  “void” (⊥), “top” (⊤) are atoms
  | / … are binary constructors
  [] involves qualifiers

Qualifiers
  _true, _false are atoms
  “and”, “or”, “not”: constructors
  “leq” (≤): a cross-inductive definition

Functional notation, example:
  a/b[c] is written slash a (qualif b c)
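The inductive definition above can be mimicked outside Coq with plain data classes. The following is a minimal sketch, not the authors' Coq development: every class name (`Void`, `Top`, `Name`, `Slash`, `Union`, `Qualif`, `QTrue`, `QFalse`, `QAnd`, `QOr`, `QNot`, `QLeq`) and the `render` helper are illustrative assumptions that loosely follow the constructors listed on the slide.

```python
from dataclasses import dataclass, fields


# Paths (illustrative names, not taken from the Coq sources).
@dataclass(frozen=True)
class Void:            # the "void" atom
    pass

@dataclass(frozen=True)
class Top:             # the "top" atom
    pass

@dataclass(frozen=True)
class Name:            # a named step such as a, b, c (assumption)
    name: str

@dataclass(frozen=True)
class Slash:           # p1/p2
    left: object
    right: object

@dataclass(frozen=True)
class Union:           # p1 | p2
    left: object
    right: object

@dataclass(frozen=True)
class Qualif:          # p[q]
    path: object
    qual: object


# Qualifiers; QLeq refers back to paths, hence the cross-inductive flavour.
@dataclass(frozen=True)
class QTrue:
    pass

@dataclass(frozen=True)
class QFalse:
    pass

@dataclass(frozen=True)
class QAnd:
    left: object
    right: object

@dataclass(frozen=True)
class QOr:
    left: object
    right: object

@dataclass(frozen=True)
class QNot:
    arg: object

@dataclass(frozen=True)
class QLeq:            # p1 <= p2 used as a qualifier
    left: object
    right: object


def render(term, nested=False):
    """Print a term in functional notation, e.g. 'slash a (qualif b c)'."""
    if isinstance(term, Name):
        return term.name
    if isinstance(term, (Void, Top, QTrue, QFalse)):
        return type(term).__name__.lower()
    args = [render(getattr(term, f.name), nested=True) for f in fields(term)]
    text = " ".join([type(term).__name__.lower()] + args)
    return f"({text})" if nested else text


example = Slash(Name("a"), Qualif(Name("b"), Name("c")))  # a/b[c]
print(render(example))  # slash a (qualif b c)
```

Frozen data classes give structural equality for free, which is convenient when terms are compared or stored in sets during rewriting.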

Interpreter based on the denotational semantics

Evaluates a path p from the context node x of the tree t

The evaluation of a path returns a set of nodes

Cross-Recursive and terminating functions

The evaluation of a qualifier returns a boolean
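The shape of such an interpreter can be sketched as two mutually recursive functions over a small XPath fragment. Everything here is an assumption made for illustration: the tuple encoding of terms, the `Node` class, and the names `eval_path` and `eval_qualifier`; the tree t is left implicit in the node links. Both functions recurse only on strict sub-terms, which is why they terminate.

```python
class Node:
    """A tree node; the tree t is implicit in the links between nodes."""

    def __init__(self, name, *children):
        self.name = name
        self.children = list(children)

    def descendants(self):
        for child in self.children:
            yield child
            yield from child.descendants()


def eval_path(p, x):
    """Return the set of nodes selected by path p from context node x."""
    kind = p[0]
    if kind == "child":          # child::name, with "*" matching any name
        return {c for c in x.children if p[1] in ("*", c.name)}
    if kind == "desc":           # descendant::name
        return {d for d in x.descendants() if p[1] in ("*", d.name)}
    if kind == "slash":          # p1/p2
        return {z for y in eval_path(p[1], x) for z in eval_path(p[2], y)}
    if kind == "union":          # p1 | p2
        return eval_path(p[1], x) | eval_path(p[2], x)
    if kind == "qualif":         # p[q]
        return {y for y in eval_path(p[1], x) if eval_qualifier(p[2], y)}
    raise ValueError(f"unknown path constructor: {kind}")


def eval_qualifier(q, x):
    """Return True if qualifier q holds at context node x."""
    kind = q[0]
    if kind == "true":
        return True
    if kind == "false":
        return False
    if kind == "and":
        return eval_qualifier(q[1], x) and eval_qualifier(q[2], x)
    if kind == "or":
        return eval_qualifier(q[1], x) or eval_qualifier(q[2], x)
    if kind == "not":
        return not eval_qualifier(q[1], x)
    if kind == "path":           # a path used as a qualifier: non-empty result
        return bool(eval_path(q[1], x))
    raise ValueError(f"unknown qualifier constructor: {kind}")


b1 = Node("b", Node("c"))
b2 = Node("b")
root = Node("r", Node("a", b1, b2))
# a/b[c] evaluated from the root: only the b that has a c child is selected.
query = ("slash", ("child", "a"),
         ("qualif", ("child", "b"), ("path", ("child", "c"))))
result = eval_path(query, root)
print([n.name for n in result], result == {b1})  # ['b'] True
```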

Need for a logic-based semantics

The classical semantics describes an interpreter that computes nodesets

This computational vision leads to useless complexity in proofs

Is there another way to capture XPath Semantics?

A Relational Semantics for XPath

An Interpretation of paths in First-Order Logic

A path is translated into a dyadic formula

Rp holds for all pairs (x,y) of nodes such that y is accessed from x through the path p.

Advantages:
  Interpretations of paths and qualifiers are unified
  Direct translation in Coq

[Mathematical semantics from the paper]
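On a finite tree, the relation Rp can be computed explicitly as a set of pairs, which makes the unification of paths and qualifiers visible: a qualifier [q] is simply "there exists z such that Rq(y, z)". The clauses below are a sketch under assumptions, namely the usual relational reading of composition, union and qualification; they are not copied from the paper, and the function `rel` and the dictionary encoding of the tree are hypothetical.

```python
def rel(p, children, label):
    """Return R_p as a set of pairs (x, y): y is reached from x through p."""
    kind = p[0]
    if kind == "child":
        return {(x, y) for x in label for y in children.get(x, ())
                if p[1] in ("*", label[y])}
    if kind == "slash":
        # R_{p1/p2}(x, y) iff there is z with R_{p1}(x, z) and R_{p2}(z, y)
        r1 = rel(p[1], children, label)
        r2 = rel(p[2], children, label)
        return {(x, y) for (x, z) in r1 for (z2, y) in r2 if z == z2}
    if kind == "union":
        return rel(p[1], children, label) | rel(p[2], children, label)
    if kind == "qualif":
        # R_{p[q]}(x, y) iff R_p(x, y) and there is z with R_q(y, z)
        r = rel(p[1], children, label)
        sources = {y for (y, _) in rel(p[2], children, label)}
        return {(x, y) for (x, y) in r if y in sources}
    raise ValueError(f"unknown constructor: {kind}")


# Tree r(a(b(c), b)) with nodes numbered 0..4.
label = {0: "r", 1: "a", 2: "b", 3: "b", 4: "c"}
children = {0: [1], 1: [2, 3], 2: [4]}

a_b_c = ("slash", ("child", "a"), ("qualif", ("child", "b"), ("child", "c")))
pairs = rel(a_b_c, children, label)
print(pairs)  # {(0, 2)}
```

Note that the qualifier `c` is evaluated by the same function as a path; no separate boolean interpreter is needed.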

Modelling the containment relation (1)

A binary logical relation “Ple”
  Gathers all containment rules in a single inductive construction
  Suited for using Coq’s built-in tactics (constructor, inversion)

Modelling the containment relation (2)

The containment relation ≤ for paths
  Is inductive
  Is defined using its dual relation for qualifiers (“Qipl”)

Using the proof system: Containment Checking

We have modelled:
  XPath terms
  Their interpretation
  The containment relation (that gathers our containment axioms)

We can now check containment facts with the proof engine

Demo of a tactical which proves the fact:

./*/b ≤ ./descendant::b

Underlying goal: extend the tactical in order to automate the checking of all containment facts
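Independently of the Coq proof, the fact ./*/b ≤ ./descendant::b can be sanity-checked on sample trees by comparing node sets at every context node. This is a test, not a proof: a failing tree would refute a containment, but passing trees establish nothing in general. The `Node` class and the helper names below are assumptions made for this sketch.

```python
class Node:
    def __init__(self, name, *children):
        self.name = name
        self.children = list(children)


def descendants(x):
    for c in x.children:
        yield c
        yield from descendants(c)


def star_slash_b(x):
    """Nodes selected by ./*/b from context x."""
    return {g for c in x.children for g in c.children if g.name == "b"}


def descendant_b(x):
    """Nodes selected by ./descendant::b from context x."""
    return {d for d in descendants(x) if d.name == "b"}


def contained_on(tree, p1, p2):
    """True if p1(x) is a subset of p2(x) for every context node x of the tree."""
    contexts = [tree, *descendants(tree)]
    return all(p1(x) <= p2(x) for x in contexts)


tree = Node("a", Node("b"), Node("c", Node("b"), Node("d", Node("b"))))
print(contained_on(tree, star_slash_b, descendant_b))  # True
print(contained_on(tree, descendant_b, star_slash_b))  # False
```

The second call shows that the converse containment fails on this tree, so the two paths are not equivalent.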

Proving Properties: Characterization

Proving the equivalence of semantics (done)

Current work: proving the validity of our axiomatization:
  Soundness
  Completeness

Finding relevant induction schemes
  Mutual induction (duality between paths and qualifiers)
  Induction on a measure of the term complexity

Finding generic and modular Coq tactics (to reduce combinatorial issues)

Methodology and Possible outcomes

[Decision diagram starting from the inductive relation Ple. Node labels: Sound / Not Sound; Fix wrong rules; Complete / Incomplete; Add missing rules; Intrinsically Incomplete; Algorithm / Incomplete Algorithm; Extend the fragment; Decidable / Undecidable (why?)]

Conclusion

We proposed a logic-based framework for static analysis of XPath

Modelling with inductive constructions (XPath terms and interpretations, containment relation)

Preliminary result: a simpler semantics

Ongoing work on characterization

Backup slides

Applications

Some Applications (1)

Optimization of XPath queries
  Detecting contradictions (p ≤ void)
  Eliminating redundancies

Example: //a[*/b/c and descendant::b] can be rewritten as /descendant::a[*/b/c], because */b/c => descendant::b

An optimization not currently achieved at runtime by XPath engines: Xalan C++
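The redundancy elimination in the example amounts to dropping any conjunct of a qualifier that is implied by another conjunct. A minimal sketch follows; the function `drop_redundant` is hypothetical, and the `implies` oracle is a hard-coded table standing in for a containment prover, holding only the one implication stated on the slide.

```python
def drop_redundant(conjuncts, implies):
    """Remove every conjunct that is implied by another remaining conjunct."""
    kept = list(conjuncts)
    for q in list(kept):
        # Compare against the conjuncts still kept, so that two mutually
        # implied conjuncts do not eliminate each other.
        if any(other != q and implies(other, q) for other in kept):
            kept.remove(q)
    return kept


# Hypothetical oracle: a lookup table in place of a real containment checker.
KNOWN_IMPLICATIONS = {("*/b/c", "descendant::b")}


def implies(q1, q2):
    return (q1, q2) in KNOWN_IMPLICATIONS


qualifier = ["*/b/c", "descendant::b"]          # a[*/b/c and descendant::b]
print(drop_redundant(qualifier, implies))       # ['*/b/c']
```

In a real optimizer the oracle would be the containment check itself, which is what makes the quality of the proof system matter for this application.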

Some Applications (2)

Static analysis of XPath host languages
  Example: XSLT
    Checking XSLT stylesheets
    Optimization of XSLT stylesheets

Extending XPath expressive power with an inclusion constraint: p[p1 ≤ p2]
  Integrity constraint checking: id(//book/@authors) ≤ //persons/@name
  Transformation languages strongly based on XPath

