Hopkins Storage Systems Lab, Department of Computer Science
Automated Physical Design in Database Caches
T. Malik, X. Wang, R. Burns (Johns Hopkins University)
D. Dash, A. Ailamaki (Carnegie Mellon University)
Outline
Target Application: Proxy caches for SkyQuery
Physical Design in Proxy Caches
– Need for vertical partitioning
– Workload evolution
Online Vertical Partitioning
– Simple scenario: Two configurations
– General scenario: N configurations
Experiments
SkyQuery
Publicly accessible federation of sky surveys (a virtual telescope with terabyte data sets)
Autonomous, heterogeneous, and geographically distributed sites
Data-intensive, read-only workload
Scaling through proxy caching
– Minimize network traffic
– Offload query processing
Bypass Caching (Malik et al., ICDE’05)
Proxy database cache for SkyQuery
– Brings columns closer to users
– Economic model for bypassing queries
The Need for Vertical Partitioning
Poor I/O performance in the cache
– Mirrors the backend DB design
– Largest relation groups 446 columns
– Index-free environment: auxiliary data structures (indices/views) pollute the cache
Poor I/O offsets the response time benefits from network savings
– 6x benefit with partitioning alone
Performance without redundant data
Is Partitioning Feasible?
Why Not Existing Solutions? (e.g., DB tuning advisor, AutoPart)
Require representative workloads
– Not readily available
– Astronomy workloads exhibit evolution
Offline in nature
– Invoked periodically
– Costly to run
– Ignore the cost of partitioning
Static design
– Output a single configuration for the input workload
– Ignore incremental changes within the workload
Workload Evolution
Online Vertical Partitioning Problem
Two Configuration Scenario
Algorithm: Given current config C and an alternative C’, transition if remaining in C incurs substantial overhead
Capturing overhead
– Penalty
– Cumulative penalty
– Max cumulative penalty
Transition if the max cumulative penalty exceeds a threshold
2Conf is 3-competitive
– After k transitions, 2Conf incurs (3k/2)(d(C,C’)+d(C’,C))
– OPT incurs at least (k/2)(d(C,C’)+d(C’,C))
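The two-configuration rule can be sketched in a few lines of Python. The slide text does not preserve the penalty formulas, so this sketch makes its own assumptions: the penalty of a query is taken as its cost in the current configuration minus its cost in the alternative, the max cumulative penalty is the running sum of penalties floored at zero, and the threshold is d(C,C’)+d(C’,C), chosen to match the terms in the competitive bound. The names `two_conf`, `cost`, and `d` are hypothetical.

```python
from typing import Callable, Hashable, List, Sequence, Tuple

Config = Hashable


def two_conf(
    queries: Sequence[Hashable],
    cost: Callable[[Hashable, Config], float],
    d: Callable[[Config, Config], float],
    start: Config,
    other: Config,
) -> Tuple[List[Config], float, int]:
    """Sketch of an online two-configuration policy (assumed definitions).

    cost(q, c): estimated cost of running query q in configuration c.
    d(a, b):    cost of transitioning from configuration a to b.
    Returns the configuration in effect after each query, the total cost
    (query costs plus transition costs), and the number of transitions.
    """
    current, alternative = start, other
    max_cum_penalty = 0.0
    total_cost = 0.0
    transitions = 0
    trace: List[Config] = []
    for q in queries:
        total_cost += cost(q, current)
        # Assumed penalty: extra cost paid for staying in `current`.
        penalty = cost(q, current) - cost(q, alternative)
        # Assumed max cumulative penalty: running sum, never below zero.
        max_cum_penalty = max(0.0, max_cum_penalty + penalty)
        # Assumed threshold: the round-trip transition cost.
        threshold = d(current, alternative) + d(alternative, current)
        if max_cum_penalty > threshold:
            total_cost += d(current, alternative)
            current, alternative = alternative, current
            max_cum_penalty = 0.0
            transitions += 1
        trace.append(current)
    return trace, total_cost, transitions
```

With a toy cost model in which a query costs 1 in its matching configuration and 3 otherwise, and every transition costs 4, the policy stays put until the accumulated penalty exceeds 8 and then switches once.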
NConf: Extending to N-Configurations
Let Cy be the current config., Cx be the previous config., transition to the first Cz (Cz≠Cy) satisfying:
Number of configurations is exponential (51 trillion ways to partition 20 attributes)
Pruning heuristics
– Neighboring configurations
– Attribute groups
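The 51 trillion figure matches the Bell number B(20), which counts the ways to partition a set of 20 elements into non-empty groups. A short check using the Bell triangle (the function name `bell_number` is ours, not from the slides):

```python
def bell_number(n: int) -> int:
    """Return the n-th Bell number, computed with the Bell triangle."""
    row = [1]  # first row of the triangle; row[0] is B(0)
    for _ in range(n):
        # Each new row starts with the last entry of the previous row,
        # and every following entry adds the entry above-left of it.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]


if __name__ == "__main__":
    print(bell_number(20))  # 51724158235372, about 51.7 trillion
```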
Neighboring Configurations
[Figure: the current configuration Cy over attributes A1 to A4, shown with the neighbors of Cy]
Small, incremental partitions
Lower threshold to overcome
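One way to enumerate neighbors is sketched below. The slides do not define "neighbor" in the surviving text, so the definition here is an assumption: a neighbor is any configuration reachable by merging two fragments or by splitting a single attribute out of a fragment. The function name `neighbors` and the frozenset representation are illustrative only.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set

Fragment = FrozenSet[str]
Config = FrozenSet[Fragment]


def neighbors(config: Iterable[Iterable[str]]) -> Set[Config]:
    """Configurations one merge or one single-attribute split away (assumed)."""
    cfg: Config = frozenset(frozenset(fragment) for fragment in config)
    result: Set[Config] = set()
    # Merge every pair of fragments.
    for a, b in combinations(cfg, 2):
        result.add((cfg - {a, b}) | {a | b})
    # Split one attribute out of every fragment that holds more than one.
    for fragment in cfg:
        if len(fragment) > 1:
            for attr in fragment:
                result.add((cfg - {fragment}) | {frozenset({attr}), fragment - {attr}})
    return result
```

For a current configuration with fragments {A1, A2}, {A3}, {A4}, this definition yields four neighbors: three merges and one split. That is far fewer than the full space of partitions, which is the point of the heuristic.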
Attribute Groups
[Figure: the current configuration Cy over attributes A1 to A4]
Query qk accesses {A1,A3,A4}
Attr. Groups: {A1}, {A3}, {A4}, {A1,A3}, {A1,A4}, {A3,A4}, {A1,A3,A4}
[Figure: four candidate configurations derived from the attribute groups; each one receives weight += qk(Cy)]
Candidate config if n.weight > d(Cx, Cy) + d(Cy, n)
Candidates with high weight benefit from repartitioning
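The attribute-group heuristic can be sketched as follows. This is one reading of the slide, not a confirmed implementation: each attribute group of a query maps to the configuration obtained by merging the fragments of Cy that the group touches, each resulting configuration other than Cy gains the query's cost in Cy as weight, and a configuration becomes a candidate once its weight exceeds d(Cx, Cy) + d(Cy, n). All function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Iterable, List

Fragment = FrozenSet[str]
Config = FrozenSet[Fragment]


def attribute_groups(attrs: Iterable[str]) -> List[Fragment]:
    """All non-empty subsets of the attributes a query accesses."""
    ordered = sorted(attrs)
    return [frozenset(c) for r in range(1, len(ordered) + 1)
            for c in combinations(ordered, r)]


def merge_for_group(config: Config, group: Fragment) -> Config:
    """Merge the fragments of `config` touched by `group` (assumed rule)."""
    touched = [f for f in config if f & group]
    untouched = [f for f in config if not (f & group)]
    if not touched:
        return config
    return frozenset(untouched + [frozenset().union(*touched)])


def update_weights(weights: Dict[Config, float], config: Config,
                   query_attrs: Iterable[str], cost_in_config: float) -> None:
    """Add the query's cost in `config` to each configuration it suggests."""
    suggested = {merge_for_group(config, g) for g in attribute_groups(query_attrs)}
    suggested.discard(config)  # groups inside one fragment change nothing
    for n in suggested:
        weights[n] += cost_in_config


def candidates(weights: Dict[Config, float], cy: Config, d_cx_cy: float,
               d: Callable[[Config, Config], float]) -> List[Config]:
    """Configurations n with weight > d(Cx, Cy) + d(Cy, n)."""
    return [n for n, w in weights.items() if w > d_cx_cy + d(cy, n)]


if __name__ == "__main__":
    cy = frozenset({frozenset({"A1", "A2"}), frozenset({"A3"}), frozenset({"A4"})})
    weights: Dict[Config, float] = defaultdict(float)
    update_weights(weights, cy, {"A1", "A3", "A4"}, cost_in_config=5.0)
    print(len(weights))  # number of configurations that received weight
```

Under this reading, the query on {A1,A3,A4} produces seven attribute groups but only four distinct configurations, because the three single-attribute groups leave Cy unchanged.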
Experiments
TPC benchmark in SQL Server 2000
Partition the orders relation using select queries
Two 10k workloads
– WkldSky: Evolving access pattern that approximates SkyQuery
– WkldConst: Access pattern remains unchanged
AutoPart: an offline partitioning tool (Papadomanolakis et al.)
Query Performance
Estimated I/O and Transitions
Future Work
Impact of cache replacement policy
– Database state changes periodically
– Reuse work to find new partitions
Scaling to SkyQuery with thousands of attributes
Fast techniques for cost estimation
Integration of index selection in caches
Conclusion
Proxy caches present a dynamic environment
Vertical partitioning improves performance without adding redundant data
Online vertical partitioning
– Balances query execution performance with cost of transitioning
Experiments show 17% improvement by partitioning a single table
Questions
???