Cost Framework for a Heterogeneous Distributed Semi-structured Environment
Tianxiao Liu (1)(2) Tuyet-Tram Dang-Ngoc (1) Dominique Laurent (1)
DBMAN 2007
(1) ETIS Laboratory, University of Cergy-Pontoise
Cergy-Pontoise, France
(2) Xcalia S.A., Paris, France
June 18th, 2007
Outline
Motivation
Cost models for heterogeneous data sources
Contributions
  Generic language for cost communication
  Dynamic cost estimation framework
Conclusion
Motivation
Cost-based query optimization
  Various execution plans for the same query
  Different costs for each plan (execution time, price, communication, etc.)
  A cost model estimates the cost of each candidate plan, using:
    cost formulas (source-oriented or operation-oriented)
    statistics on the data sources
Problems in a mediation context
  Data source autonomy: cost models are not available
  Integration of various cost models at the mediator level
  Cost communication between the components of the system
Cost models for heterogeneous data sources
[Figure: classification of cost models for relational, object-oriented and semi-structured data sources; the axes distinguish cost models based on operation implementation, generic cost models and specific methods, and known sources versus heterogeneous autonomous sources; arrows labeled Adapted, Refined, Extended and Applied link related approaches.]
Relational data sources: operation-based [GP89] [ML86] [SA82]; sampling [ZL98]; calibration [DKS92]; adaptive [Zhu95]
Object-oriented data sources: operation-based [CD92] [BMG93] [DOA+94]; calibration [GST96]; access path [GGT96]
Heterogeneous autonomous sources: Flora [Flo96] [Gru96]; hybrid cost model [NGT98]; cost model by history [ACP96]; wrapper [HKWY97] [ROH99]
Semi-structured data sources: operation-based [AAN01] [MW99]; XQuery self-learning [ZHJGML05]
Background: XLive mediation system and its XQuery evaluation process
[Figure: XLive architecture. Wrappers over relational data sources, XML data sources and Web services expose wrapper operators to the mediator; the mediator information repository and the wrapper information repository supply cost information; cost-based optimization relies on equivalence rules and a search strategy.]
Evaluation process: XQuery query → Canonization → canonized XQuery → Modeling → Tree Graph View (TGV) → Annotation → annotated TGV → Transformation (cost-based optimization) → XAlgebra → Evaluation (mediator and wrapper operators) → Response: query result (XML)
Background: Tree Graph View (TGV)
[Figure: example of the TGV representation of an XQuery query]
Generic cost model in a mediation context
Design a generic cost model…
  for various source types: relational, semi-structured, Web services…
  with specific methods (calibration, history…) as APIs implemented by the system
  principle: as accurate as possible
…using
  cost formulas
  equation systems
  statistics, also expressed as equations
  constant values
Existing generic cost model (Disco)
  object-oriented environment
  predefined variables in the language
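The idea of a cost model built only from formulas, equation systems, statistics written as equations and constants can be illustrated with a small evaluator. In this Python sketch, under our own assumptions, every variable is defined either by a constant or by an arithmetic expression over other variables, and a value is obtained by recursively resolving its dependencies. The variable names and numbers are invented for illustration, and `ast` is used so that only `+`, `-`, `*` and `/` over names and numbers are accepted.

```python
import ast
import operator
from typing import Dict, Optional, Set, Union

# A model maps each variable to a constant or to a formula over other variables.
Model = Dict[str, Union[float, str]]

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def evaluate(model: Model, var: str, _active: Optional[Set[str]] = None) -> float:
    """Resolve `var` by recursively evaluating the equations it depends on."""
    active = set() if _active is None else _active
    if var in active:
        raise ValueError(f"cyclic definition of {var!r}")
    definition = model[var]
    if isinstance(definition, (int, float)):
        return float(definition)  # constant value
    active.add(var)

    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return float(node.value)
        if isinstance(node, ast.Name):
            return evaluate(model, node.id, active)  # resolve another equation
        raise ValueError(f"unsupported syntax in {definition!r}")

    value = walk(ast.parse(definition, mode="eval").body)
    active.discard(var)
    return value

model: Model = {
    "card_R": 10000,                        # statistic exported by a wrapper
    "selectivity": 0.25,                    # constant value
    "card_out": "card_R * selectivity",     # statistic expressed as an equation
    "c_init": 500,                          # start-up cost in ms
    "c_tuple": 2,                           # per-tuple cost in ms
    "cost": "c_init + c_tuple * card_out",  # cost formula
}
print(evaluate(model, "cost"))  # 5500.0
```

Because no variable name is built into the evaluator, a source can export any statistics it likes, which is the contrast the slide draws with predefined variables.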
Our proposal: Generic Language for Cost Communication (GLCC)
A language based on XML
  cost formulas and equation systems expressed in MathML
A generic language
  no predefined variables
  expresses different costs for various optimization objectives (time, price…)
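To make the idea concrete, the sketch below uses Python's `xml.etree.ElementTree` to serialize two cost formulas for different optimization objectives as MathML content markup (`apply`, `eq`, `plus`, `times`, `ci`, `cn`). The surrounding `costInformation` and `costFormula` elements, the `objective` and `source` attributes, and the variable names are hypothetical: the slides do not give the GLCC schema.

```python
import xml.etree.ElementTree as ET
from typing import Optional

MATHML = "http://www.w3.org/1998/Math/MathML"
ET.register_namespace("m", MATHML)

def m(tag: str, text: Optional[str] = None) -> ET.Element:
    """Create a MathML element such as <m:ci>card_out</m:ci>."""
    el = ET.Element(f"{{{MATHML}}}{tag}")
    if text is not None:
        el.text = text
    return el

def apply_op(op: str, *args: ET.Element) -> ET.Element:
    """Content-markup application, e.g. <apply><plus/> a b </apply>."""
    node = m("apply")
    node.append(m(op))
    node.extend(args)
    return node

def cost_formula(objective: str, lhs: str, rhs: ET.Element) -> ET.Element:
    """Wrap the equation lhs = rhs in a hypothetical <costFormula> element."""
    formula = ET.Element("costFormula", objective=objective)
    math = ET.SubElement(formula, f"{{{MATHML}}}math")
    math.append(apply_op("eq", m("ci", lhs), rhs))
    return formula

root = ET.Element("costInformation", source="xml_wrapper")  # hypothetical
# cost_time = c_init + c_tuple * card_out
root.append(cost_formula("time", "cost_time",
    apply_op("plus", m("ci", "c_init"),
             apply_op("times", m("ci", "c_tuple"), m("ci", "card_out")))))
# cost_price = 0.01 * card_out (per-tuple price invented for illustration)
root.append(cost_formula("price", "cost_price",
    apply_op("times", m("cn", "0.01"), m("ci", "card_out"))))
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The same variable (`card_out`) feeds both objectives, so a mediator could choose which formula to minimize at optimization time.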
Dynamic cost estimation framework
Cooperation and communication between different components of XLive
Use execution results (response time) to improve the accuracy of cost models
Cost communication performed in GLCC
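One simple way to use observed response times, shown here only as an assumption and not necessarily the mechanism of the framework, is to keep a per-source correction factor that blends each observed/estimated ratio into a running value by exponential smoothing. The `CostFeedback` class, its `alpha` parameter and the source names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class CostFeedback:
    """Hypothetical history-based correction of cost estimates, per data source."""
    alpha: float = 0.5  # weight given to the newest observation
    factors: Dict[str, float] = field(default_factory=dict)

    def corrected(self, source: str, estimate: float) -> float:
        """Scale a raw cost-model estimate by the source's learned factor."""
        return estimate * self.factors.get(source, 1.0)

    def record(self, source: str, estimate: float, observed: float) -> None:
        """Blend the observed/estimated ratio into the source's factor."""
        if estimate <= 0:
            raise ValueError("estimate must be positive")
        ratio = observed / estimate
        previous = self.factors.get(source, 1.0)
        self.factors[source] = (1 - self.alpha) * previous + self.alpha * ratio

feedback = CostFeedback()
feedback.record("xml_source", estimate=2.0, observed=4.0)  # query ran twice as slow
print(feedback.factors["xml_source"])                       # 1.5
print(feedback.corrected("xml_source", 2.0))                # 3.0
```

A smaller `alpha` reacts more slowly to outliers; a larger one tracks the recent behavior of an autonomous source more closely.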
Overall cost estimation on the mediator: TGV cost annotation
Each operation, or group of operations, in a TGV is annotated with cost information
[Figure: annotated TGV]
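The annotation step can be pictured as attaching each cost annotation to a set of TGV operations. In this hypothetical Python sketch, an annotation covering a group of operations (for instance a selection that a wrapper executes together with its scan) takes precedence over single-operation annotations, and an operation left without cost information is reported as an error. The class, the field names, the operation ids and the precedence rule are our assumptions.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List

@dataclass(frozen=True)
class CostAnnotation:
    """Cost information for one TGV operation or a group of them (hypothetical)."""
    operations: FrozenSet[str]  # ids of the annotated operations
    formula: str                # cost formula, e.g. a serialized GLCC document

def annotate(operations: List[str],
             annotations: List[CostAnnotation]) -> Dict[str, CostAnnotation]:
    """Map every TGV operation to the annotation that covers it.

    Assumption: larger groups are examined first, so a group annotation
    wins over an annotation of a single operation it contains.
    """
    covering: Dict[str, CostAnnotation] = {}
    for ann in sorted(annotations, key=lambda a: len(a.operations), reverse=True):
        for op in ann.operations:
            covering.setdefault(op, ann)
    missing = [op for op in operations if op not in covering]
    if missing:
        raise ValueError(f"operations without cost information: {missing}")
    return {op: covering[op] for op in operations}

scan = CostAnnotation(frozenset({"scan_R"}), "c_scan * card_R")
pushed = CostAnnotation(frozenset({"scan_R", "select"}), "c_sel * card_R")
join = CostAnnotation(frozenset({"join"}), "c_join * card_R * card_S")
result = annotate(["scan_R", "select", "join"], [scan, pushed, join])
```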
Overall cost estimation on the mediator: Cost Annotation Tree (CAT)
The CAT is traversed breadth-first to associate an execution cost with each node
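A breadth-first traversal of the CAT can be sketched with a queue. The `CATNode` structure, the use of a callable to stand for each node's cost formula and the requirement that node names be unique are hypothetical; the final sum assumes costs are additive, which the slides do not state.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class CATNode:
    """Node of a cost annotation tree (hypothetical structure)."""
    name: str                        # assumed unique within the tree
    local_cost: Callable[[], float]  # evaluates the node's cost formula
    children: List["CATNode"] = field(default_factory=list)

def associate_costs(root: CATNode) -> Dict[str, float]:
    """Visit the CAT breadth-first and record the execution cost of each node."""
    costs: Dict[str, float] = {}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        costs[node.name] = node.local_cost()
        queue.extend(node.children)  # children are visited level by level
    return costs

# Usage: a small tree A(B(D), C)
d = CATNode("D", lambda: 1.0)
b = CATNode("B", lambda: 2.0, [d])
c = CATNode("C", lambda: 4.0)
a = CATNode("A", lambda: 8.0, [b, c])
costs = associate_costs(a)
print(list(costs))          # ['A', 'B', 'C', 'D']
print(sum(costs.values()))  # 15.0 (assuming additive costs)
```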
Conclusion and future work
Contributions
  First cost-based query optimization framework for an XML-based mediation system
  Generic language
  Suitable for various search strategies
Future work
  Cost model validation: accuracy and performance
  Calibrating the cost of native XML data sources
  Search strategy
Thanks for your attention!
Questions?