Post on 27-Dec-2015
Multi-Query Optimization
Prasan Roy
Indian Institute of Technology - Bombay
Overview
Multi-Query Optimization: What?
– Problem statement
Multi-Query Optimization: Why?
– Application scenarios
Multi-Query Optimization: How?
– A cost-based practical approach
– Prototyping Multi-Query Optimization
• On MS SQL-Server at Microsoft
• Research prototype at IIT-Bombay
Multi-Query Optimization: What?
Exploit common subexpressions (CSEs) in query optimization
Consider DAG execution plans in addition to tree execution plans
Example
[Figure: two plan trees. First tree, over A, B, C: best plan for A JOIN B JOIN C. Second tree, over B, C, D: best plan for B JOIN C JOIN D.]
Example (contd)
Alternative:
[Figure: plan DAG over A, B, C, D with a node labeled "Common Subexpression"]
Multi-Query Optimization: Why?
Queries on views, nested queries, …
Overlapping query batches generated by applications
Update expressions for materialized views
Query invocations with different parameters
. . .
Practical solutions needed!
Multi-Query Optimization: How?
Set up the search space
– Identify the common subexpressions
Explore the search space efficiently
– Find the best way to exploit the common subexpressions
Problems
Materializing and sharing a CSE is not necessarily cheaper
Mutually exclusive alternatives
(A JOIN B JOIN C)
(B JOIN C JOIN D)
(C JOIN D JOIN E)
What to share: (B JOIN C) or (C JOIN D)?
Huge search space!
Earlier Work: Practical Solutions
As early as 1976
Preprocess query before optimization [Hall, IBM-JRD76]
As late as 1998
Postprocess optimized plans [Subramanian and Venkataraman, SIGMOD98]
Query optimizer is not aware!
Earlier Work: Theoretical Studies
[Sellis, TODS88], [Cosar et al., CIKM93], [Shim et al., DKE94], ...
Set of queries {Q1, Q2, …, Qn}
For each query Qi, set of execution plans {Pi1, Pi2, …, Pim}
Pij is a set of tasks from a common pool
Pick a plan for each query such that the cost of tasks in the union is minimized
Not integrated with existing optimizers, no practical study
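The abstract formulation above can be made concrete with a small brute-force sketch. It is only an illustration: the task names, costs, and the function `best_plan_combination` are invented for this example and do not come from the cited papers. Each plan is a set of tasks, and a task shared by two chosen plans is paid for once.

```python
from itertools import product

def best_plan_combination(plans, task_cost):
    """Pick one plan per query so that the cost of the union of tasks is minimal.

    plans: {query: [set of task names, ...]}, task_cost: {task name: cost}.
    Exhaustive search, so only suitable for tiny illustrative inputs.
    """
    queries = list(plans)
    best_choice, best_cost = None, float("inf")
    # Enumerate every way of choosing one plan index per query.
    for choice in product(*(range(len(plans[q])) for q in queries)):
        tasks = set().union(*(plans[q][i] for q, i in zip(queries, choice)))
        cost = sum(task_cost[t] for t in tasks)  # shared tasks counted once
        if cost < best_cost:
            best_choice, best_cost = dict(zip(queries, choice)), cost
    return best_choice, best_cost

# Hypothetical costs: Q1 = A JOIN B JOIN C, Q2 = B JOIN C JOIN D.
task_cost = {"AB": 10, "AB_C": 10, "BC": 15, "A_BC": 10,
             "CD": 10, "B_CD": 10, "BC_D": 10}
plans = {
    "Q1": [{"AB", "AB_C"}, {"BC", "A_BC"}],
    "Q2": [{"CD", "B_CD"}, {"BC", "BC_D"}],
}
choice, cost = best_plan_combination(plans, task_cost)
```

With these made-up numbers the locally best plans (index 0 for both queries) cost 40 together, while the pair that shares the task `BC` costs 35, so the search returns plan index 1 for both queries.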
Microsoft Experience
with Paul Larson, Microsoft Research
Prototyping MQO on SQL-Server
Add multi-query optimization capability to SQL-Server
Well integrated with the existing optimization framework
– another optimization level
– minimal changes, minimal extra lines of code
First cut: exhaustive
– How slow can it be?
A working prototype by the summer-end
What (almost) already exists in the SQL-Server Optimizer
AND/OR Query-DAG representation of plan space
[Figure: query DAG over A, B, C, D; Group = OR node, Op = AND node]
What actually exists in the SQL-Server Optimizer
Relations cloned for each use
[Figure: query DAG with leaf nodes A, B1, C1, D, B2, C2]
Preprocessing Step: Query-DAG Unification
Performed in a bottom-up traversal
[Figure: query DAG with leaf nodes A, B1, C1, D, B2, C2]
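A bottom-up unification pass can be sketched roughly as follows. This is a simplified illustration on plain expression trees, not the SQL-Server data structures: the tuple encoding, the `base_of` clone map, and the function `unify` are all assumptions made for the example. Children are unified first, and two nodes with the same operator and the same unified children receive the same id.

```python
def unify(expr, base_of, table):
    """Return a unified node id for expr, unifying its children first.

    expr: a relation name (str) or a tuple (op, child, child, ...).
    base_of: maps a cloned relation name (e.g. "B1") to its base relation.
    table: maps a canonical key to a node id; shared across all queries.
    """
    if isinstance(expr, str):
        key = ("REL", base_of.get(expr, expr))   # clones collapse to the base
    else:
        op, *children = expr
        child_ids = [unify(c, base_of, table) for c in children]
        if op == "JOIN":
            child_ids.sort()                     # treat join as commutative
        key = (op, *child_ids)
    if key not in table:
        table[key] = len(table)                  # first time seen: new node
    return table[key]

base_of = {"B1": "B", "C1": "C", "B2": "B", "C2": "C"}
table = {}
root1 = unify(("JOIN", "A", ("JOIN", "B1", "C1")), base_of, table)
root2 = unify(("JOIN", ("JOIN", "B2", "C2"), "D"), base_of, table)
bc1 = unify(("JOIN", "B1", "C1"), base_of, table)
bc2 = unify(("JOIN", "C2", "B2"), base_of, table)
```

Here `bc1` and `bc2` end up with the same id even though they were built from different clones, while the two query roots stay distinct.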
Common Subexpression Identification
Unified nodes are CSEs
[Figure: query DAG over A, B, C, D with a node labeled "Common Subexpression"]
Exploring the Search Space: A Naïve Algorithm
For each set S of common subexpressions
– materialize each node in S
– MatCost(S) = sum of materialization costs of the nodes in S
– invoke optimizer to find the best plan for the root and for each node in S
– CompCost(S) = sum of costs of above plans
– Cost(S) = MatCost(S) + CompCost(S)
Pick S with the minimum Cost
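The enumeration above can be sketched as follows. The optimizer is stood in for by a hypothetical callback `best_plan_cost(node, S)`, and all numbers in the toy cost model are invented for illustration; only the structure of the loop follows the slide.

```python
from itertools import combinations

def naive_mqo(cses, mat_cost, best_plan_cost, root="root"):
    """Try every subset S of the CSEs and return the cheapest (S, Cost(S))."""
    best_set, best_cost = None, float("inf")
    for k in range(len(cses) + 1):
        for subset in combinations(cses, k):
            S = frozenset(subset)
            mat = sum(mat_cost[n] for n in S)                  # MatCost(S)
            comp = best_plan_cost(root, S) + sum(
                best_plan_cost(n, S) for n in S)               # CompCost(S)
            if mat + comp < best_cost:
                best_set, best_cost = S, mat + comp
    return best_set, best_cost

# Toy stand-in for the optimizer (all numbers are made up).
COMPUTE = {"BC": 30, "CD": 25}   # cost of computing a CSE from scratch
REUSE = {"BC": 5, "CD": 5}       # cost of reading a materialized CSE
MAT = {"BC": 10, "CD": 8}        # cost of writing a CSE out

def toy_best_plan_cost(node, S):
    if node != "root":
        return COMPUTE[node]
    use = lambda n: REUSE[n] if n in S else COMPUTE[n]
    # Batch: A JOIN B JOIN C, B JOIN C JOIN D, C JOIN D JOIN E.
    # The middle query may use either (B JOIN C) or (C JOIN D).
    return (use("BC") + 20) + (min(use("BC"), use("CD")) + 20) + (use("CD") + 20)

best_set, best_cost = naive_mqo(["BC", "CD"], MAT, toy_best_plan_cost)
```

In this toy model, materializing nothing costs 140, materializing only CD costs 133, and materializing both costs 148, which illustrates that sharing every CSE is not automatically the cheapest choice. The loop visits all 2^n subsets, which is why the first cut is exhaustive.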
Doing Better: Incremental Reoptimization
Goal: best plan for Si → best plan for Sj
Observation
– Best plans change for only the ancestors of nodes in Si XOR Sj
Algorithm:
– Propagate changed costs in bottom-up topological order from nodes in Si XOR Sj
– Update min-cost plan at each node visited
– Do not propagate further up if min-cost plan remains unchanged at a node
Work done at IIT-Bombay
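The propagation step can be sketched on a toy AND/OR DAG as below. This is a minimal illustration under stated assumptions, not the prototype's code: the class `Dag`, its cost model (a parent pays a fixed reuse cost for a materialized child, otherwise the child's min-cost), and all numbers are hypothetical. Groups must be listed children before parents so that insertion order is a topological order.

```python
import heapq

class Dag:
    def __init__(self, ops, reuse):
        # ops: {group: [(local_cost, [child groups]), ...]}, children listed first
        self.ops, self.reuse = ops, reuse
        self.topo = {g: i for i, g in enumerate(ops)}
        self.parents = {g: set() for g in ops}
        for g, alts in ops.items():
            for _, children in alts:
                for c in children:
                    self.parents[c].add(g)
        self.mat, self.cost = set(), {}
        for g in ops:                       # one full bottom-up pass
            self.cost[g] = self._min_cost(g)

    def _use(self, g):
        return self.reuse[g] if g in self.mat else self.cost[g]

    def _min_cost(self, g):
        return min(c + sum(self._use(ch) for ch in chs) for c, chs in self.ops[g])

    def reoptimize(self, new_mat):
        """Switch the materialized set; return the groups that were revisited."""
        new_mat = set(new_mat)
        changed = self.mat ^ new_mat        # Si XOR Sj
        self.mat = new_mat
        heap = list({(self.topo[p], p) for g in changed for p in self.parents[g]})
        heapq.heapify(heap)
        queued, visited = {p for _, p in heap}, []
        while heap:
            _, g = heapq.heappop(heap)      # lowest topological number first
            visited.append(g)
            new = self._min_cost(g)
            if new == self.cost[g]:
                continue                    # unchanged: do not propagate further up
            self.cost[g] = new
            for p in self.parents[g] - queued:
                queued.add(p)
                heapq.heappush(heap, (self.topo[p], p))
        return visited

ops = {
    "A": [(10, [])], "B": [(10, [])], "C": [(10, [])], "D": [(10, [])],
    "AB": [(5, ["A", "B"])], "BC": [(20, ["B", "C"])], "CD": [(5, ["C", "D"])],
    "ABC": [(5, ["AB", "C"]), (5, ["A", "BC"])],
    "BCD": [(5, ["B", "CD"]), (5, ["BC", "D"])],
    "root": [(0, ["ABC", "BCD"])],
}
dag = Dag(ops, reuse={g: 2 for g in ops})
initial = dag.cost["root"]
visited1 = dag.reoptimize({"BC"})
after1 = dag.cost["root"]
visited2 = dag.reoptimize({"BC", "AB"})
```

In this toy run, materializing BC revisits only ABC, BCD and the root. Adding AB afterwards revisits ABC alone, because its min-cost does not change and propagation stops there.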
Incremental Optimization: Example
Si =
[Figure: query DAG over A, B, C, D; labels: min-cost, Previous min-cost, New min-cost]
Incremental Optimization: Example
Si = Sj = {(B JOIN C)}
[Figure: query DAG over A, B, C, D; label: Now materialized]
Current Status
A first-cut implementation working
– Lines of C++ code added: 1500 approx.
Future Work
Performance tuning and smarter data structures needed
Ways to restrict enumeration taking DAG structure into account
Research at IIT-Bombay: Heuristics for MQO
with S. Sudarshan, S. Seshadri
A Greedy Heuristic
Pick nodes for materialization one at a time, in “benefit” order
Benefit(n) = reduction in cost on materialization of n
Benefit computation is expensive
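A bare-bones version of the greedy loop might look like the sketch below. The callback `total_cost(S)`, standing for the cost of the best plan with the set S materialized, and the cost table are hypothetical. Note that every round recomputes the benefit of every remaining candidate, which is the expensive part.

```python
def greedy(candidates, total_cost):
    """Materialize nodes one at a time, always taking the largest benefit."""
    chosen = frozenset()
    current = total_cost(chosen)
    remaining = sorted(candidates)          # sorted only for determinism
    while remaining:
        # Benefit(n) = reduction in cost if n is materialized on top of `chosen`.
        benefits = {n: current - total_cost(chosen | {n}) for n in remaining}
        best = max(remaining, key=benefits.get)
        if benefits[best] <= 0:
            break                           # nothing left that helps
        chosen |= {best}
        current -= benefits[best]
        remaining.remove(best)
    return chosen, current

# Made-up costs for each materialization choice over two candidate CSEs.
COSTS = {
    frozenset(): 140,
    frozenset({"BC"}): 135,
    frozenset({"CD"}): 133,
    frozenset({"BC", "CD"}): 148,
}
chosen, final_cost = greedy(["BC", "CD"], COSTS.__getitem__)
```

With these numbers the loop picks CD first (benefit 7) and then stops, since adding BC would raise the cost.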
Monotonicity Assumption
Benefit of a node does not increase due to materialization of other nodes
Exploited to avoid some benefit computations
Optimization costs decrease by 90%
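One common way to exploit such an assumption is lazy evaluation with a priority queue: a benefit computed in an earlier round is still a valid upper bound, so only the current top candidate needs recomputing. The sketch below assumes this scheme. The function `lazy_greedy`, the callback `total_cost`, and the savings table are hypothetical, and the slides do not spell out the exact mechanism used.

```python
import heapq

def lazy_greedy(candidates, total_cost):
    """Greedy selection that recomputes a benefit only when it reaches the top."""
    chosen = frozenset()
    current = total_cost(chosen)
    # Heap entries: (-benefit upper bound, node, round in which it was computed).
    heap = [(-(current - total_cost(chosen | {n})), n, 0) for n in sorted(candidates)]
    heapq.heapify(heap)
    evaluations, rnd = len(heap), 0
    while heap:
        neg_b, n, stamp = heapq.heappop(heap)
        if -neg_b <= 0:
            break                    # best upper bound is non-positive: stop
        if stamp == rnd:
            chosen |= {n}            # bound is exact and on top: take it
            current += neg_b
            rnd += 1
        else:
            b = current - total_cost(chosen | {n})   # refresh the stale bound
            evaluations += 1
            heapq.heappush(heap, (-b, n, rnd))
    return chosen, current, evaluations

SAVING = {"X": 30, "Y": 20, "Z": -5}   # made-up, independent savings

def toy_total_cost(S):
    return 100 - sum(SAVING[n] for n in S)

result = lazy_greedy(["X", "Y", "Z"], toy_total_cost)
```

On this toy input the lazy version performs 4 benefit computations, where recomputing every remaining candidate in every round would take 6. If the assumption does not hold, the stale bounds may be too low and the result can differ from plain greedy.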
A Postpass Heuristic: Volcano-SH
No change in Volcano best plan computation
Cost-based materialization of nodes in best Volcano plan
Implementation easy
Low overhead
Optimizer is not aware
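The slide does not state the materialization criterion, so the following is only one plausible break-even test, with hypothetical parameter names: materialize a node if computing it once, writing it out, and rereading it for the remaining uses is cheaper than recomputing it for every use.

```python
def should_materialize(cost, mat_cost, reuse_cost, num_uses):
    """Assumed break-even test, not taken from the slides.

    cost: cost of computing the node once
    mat_cost: cost of writing the result out
    reuse_cost: cost of reading the materialized result back
    num_uses: number of times the node is used in the plan
    """
    shared = cost + mat_cost + reuse_cost * (num_uses - 1)
    return shared < cost * num_uses

decisions = [
    should_materialize(100, 20, 10, 2),   # expensive node, used twice
    should_materialize(100, 20, 10, 1),   # used once: nothing to share
    should_materialize(10, 20, 10, 3),    # cheap node: recomputing wins
]
```

Under this test a node used only once is never materialized, and a cheap node may be recomputed even when it is used several times.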
A Volcano Variant: Volcano-RU
Volcano best plan search aware of best plans for earlier queries
– Cost-based materialization of best plan nodes that are used by later queries
Implementation easy
Low overhead
Local decisions, plan quality sensitive to query sequence
Experimental Conclusion
Greedy
– Expensive, but practical
– Overheads typically offset by plan quality
• especially for expensive “canned” queries
– Almost linear scaleup with query batch size
• typically, only the width of the Query DAG affected
Volcano-RU
– Mostly better than Volcano-SH, same overhead
– Negligible overhead over Volcano
• recommended for cheap but complex queries
Conclusion
Multi-query optimization is needed
Multi-query optimization is practical!
Multi-query optimization is an easy next step for DAG-based optimizers