Hopkins Storage Systems Lab, Department of Computer Science

Automated Physical Design in Database Caches

T. Malik, X. Wang, R. Burns, Johns Hopkins University

D. Dash, A. Ailamaki, Carnegie Mellon University

Outline

Target Application: Proxy caches for SkyQuery
Physical Design in Proxy Caches
– Need for vertical partitioning
– Workload evolution
Online Vertical Partitioning
– Simple scenario: Two configurations
– General scenario: N configurations
Experiments

SkyQuery

Publicly accessible federation of sky surveys (a virtual telescope with terabyte data sets)

Autonomous, heterogeneous, and geographically distributed sites

Data-intensive, read-only workload
Scaling through proxy caching
– Minimize network traffic
– Offload query processing

Bypass Caching (Malik et al., ICDE’05)

Proxy database cache for SkyQuery
– Brings columns closer to users
– Economic model for bypassing queries

The Need for Vertical Partitioning

Poor I/O performance in the cache
– Mirrors the backend DB design
Largest relation groups 446 columns
– Index-free environment
Auxiliary data structures (indices/views) pollute the cache
Poor I/O offsets the response-time benefits of network savings
– 6x benefit with partitioning alone
Performance without redundant data
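To make the I/O argument concrete, here is a minimal sketch under a simplified, assumed cost model: a scan reads every fragment that holds at least one queried column, in full, and skips the rest. The column names (c0 to c445), the uniform 8-byte width, the row count, and the function name scan_cost are hypothetical; only the 446-column figure comes from the slide.

```python
from typing import Dict, FrozenSet, List

Fragment = FrozenSet[str]


def scan_cost(partition: List[Fragment], query: Fragment,
              width: Dict[str, int], rows: int) -> int:
    """Bytes read to answer `query` by scanning (assumed cost model).

    Every fragment that holds at least one queried column is read in full;
    fragments the query does not touch are skipped entirely.
    """
    return sum(rows * sum(width[col] for col in frag)
               for frag in partition if frag & query)


# Hypothetical relation: 446 columns, 8 bytes each, 1,000 rows.
cols = [f"c{i}" for i in range(446)]
width = {c: 8 for c in cols}
rows = 1000
query = frozenset({"c0", "c1", "c2"})

unpartitioned = [frozenset(cols)]  # one fragment, mirroring the backend design
partitioned = [frozenset(cols[:3]), frozenset(cols[3:])]

print(scan_cost(unpartitioned, query, width, rows))  # 3568000
print(scan_cost(partitioned, query, width, rows))    # 24000
```

This toy model ignores the cost of reading several fragments and joining them back together, so it overstates the gain for queries that span many fragments.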

Is Partitioning Feasible?

Why Not Existing Solutions? (e.g., DB Tuning Advisor, AutoPart)

Require representative workloads
– Not readily available
– Astronomy workloads exhibit evolution

Offline in nature
– Invoked periodically
– Costly to run
– Ignore the cost of partitioning

Static design
– Output a single configuration for the input workload
– Ignore incremental changes within the workload

Workload Evolution

Online Vertical Partitioning Problem

Two Configuration Scenario

Algorithm: Given current config C and an alternative C’, transition if remaining in C incurs substantial overhead

Capturing overhead
– Penalty:
– Cumulative penalty:
– Max cumulative penalty:

Transition if:

2Conf is 3-competitive
– After k transitions, 2Conf incurs (3k/2)(d(C,C’) + d(C’,C))
– OPT incurs at least (k/2)(d(C,C’) + d(C’,C))
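The penalty formulas on this slide did not survive extraction, so the sketch below fills them with assumptions, labeled as such: a query's penalty is its cost in the current configuration minus its cost in the alternative; the max cumulative penalty is the largest penalty sum over any window of recent queries ending now (a running maximum-suffix sum, floored at zero); and the policy switches once that value exceeds the round-trip cost d(C,C’) + d(C’,C). The names two_conf, cost, and d and the toy costs are hypothetical. The competitive ratio itself follows from the two bounds above: (3k/2) / (k/2) = 3.

```python
from typing import Callable, List, Sequence, Tuple

Cost = Callable[[str, str], float]  # cost(query, config) or d(src_cfg, dst_cfg)


def two_conf(queries: Sequence[str], cost: Cost, d: Cost,
             current: str, alt: str) -> Tuple[float, List[int]]:
    """Hypothetical 2Conf-style policy (assumed penalty and threshold).

    Returns the total cost (query execution plus transitions) and the
    indices of the queries after which a transition happened.
    """
    total = 0.0
    max_cum = 0.0  # max cumulative penalty over windows ending at the current query
    transitions: List[int] = []
    for i, q in enumerate(queries):
        total += cost(q, current)
        penalty = cost(q, current) - cost(q, alt)
        # Maximum-suffix-sum recurrence: extend the best window or restart empty.
        max_cum = max(0.0, max_cum + penalty)
        if max_cum > d(current, alt) + d(alt, current):
            total += d(current, alt)
            current, alt = alt, current
            max_cum = 0.0
            transitions.append(i)
    return total, transitions


def cost(q: str, cfg: str) -> float:
    """Toy costs: "a" queries favor config A, "b" queries favor config B."""
    return 1.0 if q == cfg.lower() else 5.0


def d(src: str, dst: str) -> float:
    """Hypothetical symmetric transition cost."""
    return 4.0


print(two_conf(["b", "b", "b", "b"], cost, d, "A", "B"))  # (20.0, [2])
```

With a round-trip threshold of 8, the third "b" query pushes the accumulated penalty to 12, so the policy pays the transition cost of 4 and serves the remaining query cheaply in B.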

NConf: Extending to N-Configurations

Let Cy be the current configuration and Cx the previous configuration; transition to the first Cz (Cz ≠ Cy) satisfying:

Number of configurations is exponential (51 trillion ways to partition 20 attributes)

Pruning heuristics
– Neighboring configurations
– Attribute groups
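The count quoted on this slide is the Bell number B(20): the number of ways to split 20 attributes into non-empty, unordered groups. A short sketch computing it with the Bell triangle (the function name bell is chosen here for illustration):

```python
def bell(n: int) -> int:
    """Number of partitions of an n-element set, via the Bell triangle.

    Each row starts with the last entry of the previous row; each further
    entry is its left neighbor plus the entry above that neighbor.
    The first entry of row n is B(n).
    """
    row = [1]
    for _ in range(n):
        nxt = [row[-1]]
        for x in row:
            nxt.append(nxt[-1] + x)
        row = nxt
    return row[0]


print(bell(4))   # 15
print(bell(20))  # 51724158235372, about 51.7 trillion
```

B(n) grows faster than any fixed exponential, which is why exhaustive search is out of reach and the pruning heuristics listed above are needed.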

Neighboring Configurations

[Figure: current configuration Cy and the neighbors of Cy, over attributes A1, A2, A3, A4]

Small, incremental partitions
Lower threshold to overcome
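The slide does not define "neighbor" precisely. One plausible reading, assumed here for illustration only, is that a neighbor is one small step away: either two fragments are merged, or one attribute is split off a multi-attribute fragment. The functions neighbors and config and the example Cy = {A1,A2}, {A3}, {A4} are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Set

Fragment = FrozenSet[str]
Config = FrozenSet[Fragment]


def neighbors(cfg: Config) -> Set[Config]:
    """Configurations one assumed step away: a single merge or a single split."""
    frags = list(cfg)
    out: Set[Config] = set()
    # Merge step: combine any two fragments into one.
    for a, b in combinations(frags, 2):
        rest = [f for f in frags if f != a and f != b]
        out.add(frozenset(rest + [a | b]))
    # Split step: move one attribute out of a fragment into its own fragment.
    for f in frags:
        if len(f) > 1:
            rest = [g for g in frags if g != f]
            for attr in f:
                out.add(frozenset(rest + [f - {attr}, frozenset({attr})]))
    return out


def config(*groups: str) -> Config:
    """Helper: config("A1 A2", "A3") builds {{A1, A2}, {A3}}."""
    return frozenset(frozenset(g.split()) for g in groups)


cy = config("A1 A2", "A3", "A4")
print(len(neighbors(cy)))  # 4: three merges and one split
```

Restricting moves to one merge or split keeps each candidate's transition cost small, which matches the slide's point that neighbors have a lower threshold to overcome.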

Attribute Groups

Current configuration: Cy; query qk accesses {A1, A3, A4}

Attribute groups: {A1}, {A3}, {A4}, {A1,A3}, {A1,A4}, {A3,A4}, {A1,A3,A4}

[Figure: candidate configurations over A1 to A4, each updated with weight += qk(Cy)]

Candidate configuration n if n.weight > d(Cx, Cy) + d(Cy, n)
Candidates with high weight benefit from repartitioning
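A minimal sketch of the attribute-group bookkeeping, under these labeled assumptions: attribute groups are the non-empty subsets of the attributes a query accesses (which reproduces the seven groups listed for qk); a candidate is built by pulling one group out into its own fragment; qk(Cy) is read as the cost of qk in Cy and added once to each distinct candidate; and the transition cost d is supplied by the caller. The function names, the previous configuration Cx, and the constant d are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List

Fragment = FrozenSet[str]
Config = FrozenSet[Fragment]


def attribute_groups(attrs: FrozenSet[str]) -> List[Fragment]:
    """All non-empty subsets of the attributes a query accesses."""
    ordered = sorted(attrs)
    return [frozenset(c) for r in range(1, len(ordered) + 1)
            for c in combinations(ordered, r)]


def candidate(cy: Config, group: Fragment) -> Config:
    """Pull `group` into its own fragment; drop fragments left empty."""
    rest = [f - group for f in cy if f - group]
    return frozenset(rest + [group])


def add_query(weights: Dict[Config, float], cy: Config,
              attrs: FrozenSet[str], cost_in_cy: float) -> None:
    """Credit every distinct candidate (other than Cy) with the query's cost in Cy."""
    for n in {candidate(cy, g) for g in attribute_groups(attrs)} - {cy}:
        weights[n] = weights.get(n, 0.0) + cost_in_cy


def ready(weights: Dict[Config, float], cx: Config, cy: Config,
          d: Callable[[Config, Config], float]) -> List[Config]:
    """Candidates whose weight exceeds d(Cx, Cy) + d(Cy, n)."""
    return [n for n, w in weights.items() if w > d(cx, cy) + d(cy, n)]


def d(a: Config, b: Config) -> float:
    return 1.0  # hypothetical constant transition cost


cy = frozenset({frozenset({"A1", "A2"}), frozenset({"A3"}), frozenset({"A4"})})
cx = frozenset({frozenset({"A1", "A2", "A3", "A4"})})  # hypothetical previous config
qk = frozenset({"A1", "A3", "A4"})

weights: Dict[Config, float] = {}
print(len(attribute_groups(qk)))                     # 7
add_query(weights, cy, qk, 1.5)
print(len(weights), len(ready(weights, cx, cy, d)))  # 5 0
add_query(weights, cy, qk, 1.5)
print(len(ready(weights, cx, cy, d)))                # 5
```

Two of the seven groups ({A3} and {A4}) already sit in their own fragments in Cy, so they yield Cy itself and are skipped, leaving five candidates.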

Experiments

TPC benchmark in SQL Server 2000
Partition the Orders relation using select queries
Two 10k-query workloads

– WkldSky: Evolving access pattern that approximates SkyQuery

– WkldConst: Access pattern remains unchanged

AutoPart: an offline partitioning tool (Papadomanolakis et al.)

Query Performance

Estimated I/O and Transitions

Future Work

Impact of cache replacement policy
– Database state changes periodically
– Reuse work to find new partitions

Scaling to SkyQuery with thousands of attributes
Fast techniques for cost estimation
Integration of index selection in caches

Conclusion

Proxy caches present a dynamic environment
Vertical partitioning improves performance without adding redundant data
Online vertical partitioning
– Balances query execution performance with the cost of transitioning
Experiments show a 17% improvement by partitioning a single table

Questions

???

