1
Extreme Java
G22.3033-007
Session 3 - Sub-Topic 6
XML Information Retrieval
Dr. Jean-Claude Franchitti
New York University
Computer Science Department
Courant Institute of Mathematical Sciences
2
Agenda
Applications of XML to Database Technology
XML Query Languages
XPath
XML Queries
XQuery: A Query Language for XML
XML Query Engines
XML Object Persistence
Advanced XQuery Concepts
Presentation Oriented Publishing (POP) Frameworks
3
XML-Based Retrieval Development
XML Software Development Methodology
  Language + Stepwise Process + Tools
  Rational Unified Process (RUP) vs. “XML Unified Process”
XML Application Development Infrastructure
  Metadata Management (e.g., XMI)
  XML Query Engine (3rd party software)
  XML Tools (e.g., XML Editors)
XML Applications Involved in the Rendering Phase:
  Application(s) of XML
  XML-based applications/services (markup language mediators)
    MOM, POP, Persistence service
  Application Infrastructure Frameworks
4
XML Data Retrieval Patterns
XML Data Retrieval Operations
  Access
  Query
  Manipulate
Multiple XML Data Sources Integration
XML Message Filtering
DBMS Data Views
Database System Interfacing
5
Part I
Applications of XML to Database Technology
6
Towards XML Application Services
Processing
  DOM Extensions
  Binding Extensions
  Component Frameworks (reusable component models)
  Model-Based Automation (MDA)
Rendering
  DOM 2.1.0, SAX 2.0, JAXP 1.1 & TrAX, XSL-FO 1.0
  Component Frameworks
Querying
  XQuery 1.0, XSLT 1.1/2.0, XPath 1.0/2.0
Security (signatures, encryption/decryption, etc.)
Etc.
7
Retrieval Software Development
Languages (XML-QL, YaTL, XQL, etc.)
  Data Model + Operations + Syntax
Process (“XUP”)
Frameworks
  Custom Engine
    e.g., XQEngine
  Translation to SQL
    e.g., DB2XML, Oracle’s XML I/F
  Translation to OQL
XML Query Infrastructure
  XPath Processors: Saxon 6.1, Xalan-J 2.1.0
  XQuery Processors
  SQLX
8
XML Query History
SGML Query Facilities
Ad-hoc Approach to Query Languages
02/98: XQL Proposal
08/98: XML-QL Submission
12/98: W3C QL’98 Workshop
  Candidate Requirements for XML Query
  Database Desiderata for an XML Query Language
11/99: XPath Recommendation
9
W3C XML Query WG
07/99: WG Proposal
09/99: WG Official Inception
Today:
  30 W3C Member Companies
  11 Meetings, 60+ Telecons
  Heartbeat every Three Months
  Proposed Recommendation(s)
Goals:
  XML Query Data Model for XML Documents
  Query Operators for XML Query Data Model
  Query Language Based on XML Query Operators
10
W3C’s Related Standards
XML Query Specifications:
  XML Query Requirements (02/16/01 - orig. 01/00)
  XML Query Use Cases (06/08/01 - orig. 08/00)
  XQuery 1.0 and XPath 2.0 Data Model (06/07/01 - orig. 05/00)
  XQuery 1.0 Formal Semantics (06/07/01 - orig. 12/00)
  XQuery 1.0: An XML Query Language (06/07/01)
  XML Syntax for XQuery 1.0 (XQueryX) (06/07/01)
XPath 2.0 Specifications
  XPath Requirements Version 2.0 (02/15/01)
  XQuery 1.0 and XPath 2.0 Data Model (06/07/01)
11
Related XML Technologies
XPath
XSL
XPointer
XML Schema
XML Infoset
WAI
Internationalization
IETF DASL
  Distributed Authoring Searching and Locating
  http://www.ics.uci.edu/pub/ietf/dasl/
12
Properties of RDBMS Queries
Pattern + Filter + Construction clause
Construction clause may have ordering subclauses
Queries may perform joins across multiple input sets
Queries may generate intermediate variables or path expressions
13
Mapping XML to a RDBMS
SQL-like queries that return XML documents
  e.g., Microsoft IIS + SQL Server
  e.g., Oracle Database Server
Broad spectrum of possible mappings
Hierarchical vs. limited RDBMS tree structure
14
JDBC Refresher
See section 6.2 of XML and Java textbook
Importing JDBC Package
Loading a JDBC Driver
Connecting to a Database
Submitting a Query
15
XML Embedded in SQL (SQLX)
SQL Embedded in XML
See Section 6.3 of XML and Java textbook
Front-end to RDBMS that provides XML-based Input/Output
Translates XML query into sequence of JDBC calls, and converts the result to a DOM structure which is returned
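The last step described above, turning a tabular query result into a DOM structure, might be sketched as follows. This is only an illustration under stated assumptions: the class `RowsToDom`, its method, and the element names are hypothetical, and a list of maps stands in for the JDBC `ResultSet` so the sketch runs without a database.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper (not from the textbook): builds a DOM tree from query rows.
class RowsToDom {

    // Each map stands in for one ResultSet row (column name -> column value).
    static Document toDocument(String rootName, String rowName,
                               List<Map<String, String>> rows) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
        Element root = doc.createElement(rootName);
        doc.appendChild(root);
        for (Map<String, String> row : rows) {
            Element rowElem = doc.createElement(rowName);
            root.appendChild(rowElem);
            for (Map.Entry<String, String> col : row.entrySet()) {
                // One child element per column, with the column value as its text.
                Element colElem = doc.createElement(col.getKey());
                colElem.setTextContent(col.getValue());
                rowElem.appendChild(colElem);
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("title", "Tree Frogs");
        row.put("price", "10.50");
        Document doc = toDocument("books", "book", List.of(row));
        System.out.println(doc.getElementsByTagName("title").item(0).getTextContent());
    }
}
```

In a real front-end the rows would come from iterating a `java.sql.ResultSet`, and column names would need to be checked for validity as XML element names.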
16
Part II
XML Query
17
XML Query Requirements (Part I)
General:
  Declarative Language
  Readable XML Syntax
  Protocol Independence
  Standard Error Conditions
  Support for Future Updates
Data Model
  Based on XML Infosets
  Namespace Aware
  Support for XML Schema Data Types
  Support Inter/Intra Document References
18
XML Query Requirements (Part II)
Query Functionality:
  Operators on All Data Types
  Text Operators Across Element Boundaries
  Hierarchies and Sequences
  Combination of Data from Various Locations
  Aggregation and Sorting
  Combination of Operators (Queries as Operands)
  Support NULL values
  Preservation of Structure/Identity
  Operations on Names/Schemas
  Extensibility & Closure
19
XML Query Use Cases
Approach
  Description, DTD/Schema, Input, Queries, Results
Existing Use Cases
  XMP (examples)
  TREE (queries that preserve hierarchy)
  SEQ (queries based on sequence)
  R (relational data access)
  TEXT (text search)
  NS (namespace-based queries)
  PARTS (recursive parts queries)
  REF (queries based on references)
20
XML Query Data Model
Information Presented to a Query Processor
Augmented Infoset:
  XML Schema Data Types (PSVI)
  Document Collections
  References
Node-Labeled Tree Constructor Model with Node Identity
Infoset Mapping to Query Data Model is Defined as Part of the Specification
21
XML Query Data Model (continued)
Nodes
  Node = DocNode | ElemNode | AttrNode | ValueNode | NSNode | PINode | CommentNode | InfoItemNode
XML Schema Primitive Types
  string, boolean, ID, IDREF, decimal, etc.
Collections
  list [T], set {T}, bag {|T|}, disjoint/union (T1 | T2), tuple (T1, …, Tn)
References
  ref(T)
22
XML Query Algebra
Defines Static and Dynamic Semantics
Static Semantics are Type Inference Rules
  Relate Algebra Expressions to Types
Dynamic Semantics are Value Inference Rules
  Relate Algebra Expressions to Values
Issues:
  Algebra Type System Alignment with XML Schema
  Operators on Schema Simple Types not Defined
  Lexical Representation of Schema Simple Types not Defined
23
Constructors
Construct Values in XML Query Data Model
  attrNode : (Ref(QNameValue), Ref(ValueNode)) -> AttrNode
  ValueNode = QNameValue | StringValue | DecimalValue | ...
  qnameValue : (uriReference | null, string, Ref(Def_QName)) -> QNameValue
  decimalValue : (decimal, Ref(Def_decimal)) -> DecimalValue
<part price="10.50"/>
<xsd:attribute name="price" type="xsd:decimal"/>
attrNode(ref(qnameValue(null, "price", Ref(Def_QName))),
         ref(decimalValue(10.50, Ref(Def_decimal))))
24
Accessors
Deconstruct Values in XML Query Data Model
  name : AttrNode -> Ref(QNameValue)
  value : AttrNode -> Ref(ValueNode)
  type : AttrNode -> Ref(ElemNode)
<xsd:attribute name="price" type="xsd:decimal"/>
<part price="10.50"/>
name(A1) = ref(qnameValue(null, "price"))
value(A1) = ref(decimalValue(10.50, Ref(Def_decimal)))
type(A1) = <!-- data model representation of simple type decimal -->
25
Part III
XML Query Languages
26
XQuery
Functional Language
Query Represented as an Expression
Expressions can be Nested without Restriction
Input/Output of an XQuery are Instances of the XML Query Data Model
Based on OQL, SQL, XML-QL, XPath
Readable XML Syntax
27
XQuery Expressions
Path Expressions
Element Constructors
FLWR Expressions
Expressions with Operators/Functions
Conditional Expressions
Quantified Expressions
List Constructors
Expressions to Test/Modify Datatypes
28
XQuery Path Expressions
Abbreviated XPath 1.0 Syntax
  Find figure(s) with caption “Tree Frogs” in second chapter of “zoo.xml”
  document("zoo.xml")/chapter[2]//figure[caption = "Tree Frogs"]
Extensions
  Dereference Operator
  Range Predicate
  Document Formats
  Find captions of figures referenced by <figref> elements in “Frogs” chapter of “zoo.xml”
  document("zoo.xml")/chapter[title = "Frogs"]//figref/@refid->fig/caption
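The first path expression on this slide uses plain abbreviated XPath 1.0 syntax, so it can be tried with the standard `javax.xml.xpath` API. The sketch below is an illustration under assumptions: the inline document is a made-up stand-in for zoo.xml, a `zoo` root element is assumed, and the class name `ZooPathQuery` is hypothetical. The dereference operator (`->`) is an XQuery draft extension and is not covered here.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ZooPathQuery {

    // Made-up sample data; the structure of the real zoo.xml is not shown in the slides.
    static final String ZOO_XML =
          "<zoo>"
        + "<chapter><title>Toads</title>"
        + "<figure><caption>Tree Frogs</caption></figure></chapter>"
        + "<chapter><title>Frogs</title><section>"
        + "<figure><caption>Tree Frogs</caption></figure>"
        + "<figure><caption>Bullfrogs</caption></figure>"
        + "</section></chapter>"
        + "</zoo>";

    // Evaluates an XPath 1.0 expression against an XML string and returns the node set.
    static NodeList select(String xml, String path) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return (NodeList) xpath.evaluate(
                    path, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate: " + path, e);
        }
    }

    public static void main(String[] args) {
        // Same shape as the slide's query, with the assumed root element made explicit.
        NodeList figures = select(ZOO_XML, "/zoo/chapter[2]//figure[caption = 'Tree Frogs']");
        System.out.println("Matching figures: " + figures.getLength());
    }
}
```

Only the figure inside the second chapter matches; the identically captioned figure in the first chapter is excluded by the `chapter[2]` step.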
29
XQuery Element Constructor
Start/End Tag + Enclosed List of Expressions
Generate an element with a computed name that contains nested elements:
  <$tagname>
    <description> $d </description>
    <price> $p </price>
  </$tagname>
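For comparison, roughly the same construction can be done imperatively with the DOM API. This is a sketch, not an XQuery implementation: the class `ComputedElement` and its methods are hypothetical, and the parameters `tagName`, `d`, and `p` simply mirror the variables `$tagname`, `$d`, and `$p` on the slide.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class ComputedElement {

    // Creates an empty DOM document that will own the new elements.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
    }

    // Builds <tagName><description>d</description><price>p</price></tagName>.
    static Element build(Document doc, String tagName, String d, String p) {
        Element outer = doc.createElement(tagName); // element name chosen at run time
        Element description = doc.createElement("description");
        description.setTextContent(d);
        Element price = doc.createElement("price");
        price.setTextContent(p);
        outer.appendChild(description);
        outer.appendChild(price);
        return outer;
    }

    public static void main(String[] args) {
        Element e = build(newDocument(), "part", "Left-handed widget", "10.50");
        System.out.println(e.getNodeName() + " has "
                + e.getChildNodes().getLength() + " child elements");
    }
}
```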
30
XQuery For Let Where Return (FLWR)
FOR and LET Clause
  Generate a List of Tuples that Preserves Doc Order
WHERE Clause
  Applies a Predicate to Eliminate Some Tuples
RETURN Clause
  Executed on Resulting Tuples -> Ordered Output List
Syntax:
  FOR var IN expr
  LET var := expr
  WHERE expr
  RETURN expr
31
FLWR Sample Expressions
List titles of books published by MK in 98
  FOR $b IN document("bib.xml")//book
  WHERE $b/publisher = "Morgan Kaufmann"
    AND $b/year = "1998"
  RETURN $b/title
List each publisher and the average price of its books
  FOR $p IN distinct(document("bib.xml")//publisher)
  LET $a := avg(document("bib.xml")/book[publisher = $p]/price)
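To make the FOR / WHERE / RETURN steps of the first query concrete, here is one way they could be emulated in Java with DOM and XPath 1.0. It is a sketch under assumptions: the inline bibliography and its titles are made up, the class name `FlwrSketch` is hypothetical, and the method returns title text rather than title elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FlwrSketch {

    // Made-up stand-in for bib.xml.
    static final String BIB_XML =
          "<bib>"
        + "<book><title>Book A</title><publisher>Morgan Kaufmann</publisher><year>1998</year></book>"
        + "<book><title>Book B</title><publisher>Morgan Kaufmann</publisher><year>1999</year></book>"
        + "<book><title>Book C</title><publisher>Addison-Wesley</publisher><year>1998</year></book>"
        + "<book><title>Book D</title><publisher>Morgan Kaufmann</publisher><year>1998</year></book>"
        + "</bib>";

    static List<String> titles(String bibXml, String publisher, String year) {
        List<String> result = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(bibXml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // FOR $b IN document("bib.xml")//book  (nodes arrive in document order)
            NodeList books = (NodeList) xpath.evaluate("//book", doc, XPathConstants.NODESET);
            for (int i = 0; i < books.getLength(); i++) {
                Node b = books.item(i);
                // WHERE $b/publisher = ... AND $b/year = ...
                boolean keep = publisher.equals(xpath.evaluate("publisher", b))
                        && year.equals(xpath.evaluate("year", b));
                if (keep) {
                    // RETURN $b/title (here: the title's string value)
                    result.add(xpath.evaluate("title", b));
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException("Query failed", e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(titles(BIB_XML, "Morgan Kaufmann", "1998"));
    }
}
```

The loop makes the tuple-at-a-time reading of FLWR explicit: each book binds `$b`, the predicate drops some bindings, and the surviving ones produce output in document order.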
32
XQuery Operators and Functions
Infix/Prefix Operators
  e.g., Infix Operators BEFORE and AFTER
Parenthesized Expressions
Arithmetic/Logical Operators
Collection Operators
  e.g., UNION, INTERSECT, EXCEPT
Functions Can Be Defined in XQuery
33
Sample Operators and Functions
Find max depth of “partlist.xml”
  NAMESPACE xsd = "http://www.w3.org/2001/03/XMLSchema-datatypes"
  FUNCTION depth(ELEMENT $e) RETURNS xsd:integer
  {
    IF empty($e/*) THEN 1
    ELSE max(depth($e/*)) + 1
  }
  depth(document("partlist.xml"))
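The recursive definition above translates almost directly into Java over a DOM tree. The following is a sketch, assuming the computation starts at the root element (the slide starts at the document) and using a made-up inline document in place of partlist.xml; the class name `PartDepth` is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class PartDepth {

    // An element with no child elements has depth 1;
    // otherwise depth is the maximum child depth plus 1.
    static int depth(Element e) {
        int maxChildDepth = 0;
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element) { // ignore text, comments, etc.
                maxChildDepth = Math.max(maxChildDepth, depth((Element) c));
            }
        }
        return maxChildDepth + 1;
    }

    // Parses an XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse XML", ex);
        }
    }

    public static void main(String[] args) {
        // Made-up parts list: the deepest chain is partlist/part/part/part.
        String xml = "<partlist><part><part><part/></part></part><part/></partlist>";
        System.out.println("Max depth: " + depth(parseRoot(xml)));
    }
}
```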
34
XQuery Conditional Expressions
FOR $h IN //holding
RETURN
  <holding>
    $h/title
    IF $h/@type = "Journal" THEN $h/editor
    ELSE $h/author
  </holding> SORTBY (title)
35
XQuery Quantified Expressions
Example 1:
  FOR $b IN //book
  WHERE SOME $p IN $b//para SATISFIES
    contains($p, "sailing") AND contains($p, "windsurfing")
  RETURN $b/title
Example 2:
  FOR $b IN //book
  WHERE EVERY $p IN $b//para SATISFIES
    contains($p, "sailing")
  RETURN $b/title
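The SOME and EVERY quantifiers correspond closely to `anyMatch` and `allMatch` on Java streams. The sketch below illustrates only the quantifier logic: the class `QuantifiedPredicates` is hypothetical, and a list of strings stands in for the paragraphs (`$b//para`) of one book.

```java
import java.util.List;

class QuantifiedPredicates {

    // SOME $p IN paras SATISFIES contains($p, a) AND contains($p, b)
    static boolean someMentionsBoth(List<String> paras, String a, String b) {
        return paras.stream().anyMatch(p -> p.contains(a) && p.contains(b));
    }

    // EVERY $p IN paras SATISFIES contains($p, word)
    static boolean everyMentions(List<String> paras, String word) {
        return paras.stream().allMatch(p -> p.contains(word));
    }

    public static void main(String[] args) {
        List<String> paras = List.of(
                "We went sailing and windsurfing.",
                "The next day was for sailing only.");
        System.out.println(someMentionsBoth(paras, "sailing", "windsurfing"));
        System.out.println(everyMentions(paras, "sailing"));
    }
}
```

Note that `allMatch` is true for an empty list, which matches the usual reading of a universal quantifier over an empty sequence: a book with no paragraphs would satisfy Example 2.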
36
XQuery List Constructors
List encloses zero or more expressions in square brackets, separated by commas
List of member variables: [$x, $y, $z]
Empty list: [ ]
37
XQuery Operators on Data Types
INSTANCEOF (instance, type)
CAST
  Convert value from one datatype to another
TREAT
  Causes the query processor to treat an expression as if its datatype were a subtype of its static type
38
XQuery Outstanding Issues
Integration with XPath 2.0
Alignment of XQuery and XML Query Algebra Syntax
Internationalization
  e.g., Collation Sequences for Sorting, String ops
XML Query Syntax
Operators and Functions TBD
39
Part IV
XML Query Engines and Advanced Concepts
40
Various Approaches
XQEngine
  Full-text search engine for XML
  Java APIs available
  W3C XQuery Specification Support
DB2XML
  Standalone tool (with GUI or command line)
  Servlet to dynamically generate XML documents
  DB2XML API
Oracle XML Developer Kit (XDK)
Microsoft SQL Server support for XML
41
XML Object Persistence
Started as SODL and XMOP
  Simple Object Definition Language
  XML Metadata Object Persistence
XML and JavaBeans integration (e.g., BML, Coins, etc.)
XML and EJB integration
  See XML Development with Java 2 (chapter 8)
XML serialization for Java (e.g., Koala, etc.)
SOAP - XML-RPC protocol
42
Advanced XQuery Concepts
Mainstream XQuery Engines
  Software AG’s QuiP
  H. Katz XQEngine
Experiment with Complex Queries and QuiP
43
POP Frameworks
Client-Side POP
  IE5
Server-Side POP
  Cocoon & XSP
  Rocket
  CPAN’s Perl Framework
44
Part V
Conclusions
45
Summary
Applying XML to Database Technology allows the viewing of database data as an XML document.
XML Query is based on a well-defined Data Model and Algebra
Various syntaxes are possible for an XML Query Language
XML Query Engines are infrastructure components that support XML Query
46
Summary (continued)
Binding approaches are currently implemented between XML and JavaBeans/EJBs
Software AG’s QuiP implements complex query processing as per XQuery 1.0
Server-side POP is the approach of choice for XML processing