Date post: 30-Mar-2015

ICICLES: SELF-TUNING SAMPLES FOR APPROXIMATE QUERY ANSWERING

BY VENKATESH GANTI, MONG LI LEE, AND RAGHU RAMAKRISHNAN

CSE6339 – DATA EXPLORATION

Raghavendra Madala

In this presentation…

Introduction

Icicles

Icicle Maintenance

Icicle-Based Estimators

Quality Guarantee

Performance Evaluation

Conclusion

Introduction

Analysis of data in data warehouses is useful for decision support

• OLAP systems provide interactive response times to aggregate queries

• AQUA: approximate query answering systems provide very fast alternatives to OLAP systems

Approaches

• Sampling-based

• Histogram-based

• Probabilistic-based

• Wavelet-based

• Clustering-based

Join synopsis

A join synopsis is a uniform random sample

• All tuples are assumed to be equally important

• OLAP queries follow a predictable, repetitive pattern

• Uniform sampling therefore wastes precious main memory

• The join of random samples of base relations may not be a random sample of the join of the base relations. This is the basis for Join Synopsis by Gibbons

Why Icicles?

• To capture the data locality of aggregate queries on foreign key joins

• An icicle is expected to contain more tuples from regions that are accessed more frequently

• Sample relation space is better utilized if more tuples from the actual result sets are present

• Dynamic algorithm that changes the sample to suit the queries being executed in the workload

Icicles

An icicle is a uniform random sample of a multiset of tuples L (an extension of R), which is the union of a relation R and all sets of tuples that were required to answer queries in the workload.
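The definition above can be illustrated with a toy sketch. The relation, the workload, and the sample size below are made-up values for illustration only, and representing each query by the set of tuple ids it needed is an assumption, not something the slides specify.

```python
import random
from collections import Counter

# Hypothetical toy relation R: ten tuples identified by ids 0..9.
R = list(range(10))

# Hypothetical workload: each query is represented by the set of
# tuple ids that were required to answer it.
workload = [{2, 3}, {2, 3, 4}, {3}]

# L is a multiset: one copy of every tuple in R, plus one extra copy
# for every query that needed the tuple.
L = Counter(R)
for result in workload:
    L.update(result)

# An icicle is a uniform random sample over the individual copies in L,
# so tuples with more copies are more likely to be picked.
rng = random.Random(0)
icicle = rng.sample(list(L.elements()), k=4)
```

Here tuple 3 has four copies in L while tuple 0 has one, so tuple 3 is four times as likely to occupy any given slot of the sample.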

Icicle Maintenance

The intuition is to incrementally maintain a sample, called an icicle.

We maintain an icicle such that the probability of a tuple being selected is proportional to the frequency with which it is required to answer queries (exactly).
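This proportionality can be written out as a short worked equation. The symbols f(t) (number of copies of tuple t in L) and n (icicle size) are introduced here for illustration and are not taken from the slides; the statement is the standard expectation for a uniform sample of n copies drawn from L.

```latex
\[
  \mathbb{E}\bigl[\text{copies of } t \text{ in the icicle}\bigr]
    = n \cdot \frac{f(t)}{|L|},
  \qquad
  |L| = \sum_{t \in R} f(t).
\]
```

A tuple that was needed by twice as many queries therefore has roughly twice the expected presence in the icicle.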

Icicle Maintenance Algorithm

Efficient incremental maintenance is possible for the following reasons:

• A uniform random sample of L (the extension of relation R) ensures that a tuple's selection in the icicle is proportional to its frequency

• Incremental maintenance of the icicle requires only the segment of R that satisfies the new query each time

• The reservoir sampling algorithm is used to stream each tuple being appended to L
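The points above can be sketched with textbook reservoir sampling. This is a minimal illustration under stated assumptions, not the algorithm from the paper's figure (which did not survive in this transcript): the function name `update_icicle`, its parameters, and the toy data are hypothetical, and the icicle is assumed to start as a uniform sample of R.

```python
import random

def update_icicle(icicle, seen, new_tuples, rng):
    """Stream tuples appended to L through a fixed-size reservoir.

    icicle     -- list holding the current sample (modified in place)
    seen       -- number of copies of L streamed so far
    new_tuples -- tuples of R that satisfy the new query
    Returns the updated number of streamed copies.
    """
    capacity = len(icicle)
    for t in new_tuples:
        seen += 1
        # The incoming copy enters the sample with probability capacity / seen
        # and, if it does, evicts a uniformly chosen current member.
        j = rng.randrange(seen)
        if j < capacity:
            icicle[j] = t
    return seen

# Hypothetical usage: start from a uniform sample of R, then process one query.
rng = random.Random(1)
R = list(range(100))
icicle = rng.sample(R, 10)
seen = len(R)
seen = update_icicle(icicle, seen, [5, 6, 7], rng)
```

Only the tuples selected by the new query are touched, which is why maintenance needs just the segment of R that satisfies that query.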

Icicle Maintenance Algorithm

Icicle Maintenance Example

Icicle-Based Estimators

• An icicle is a non-uniform sample of the original data

• Frequency must be maintained over all tuples

• Different estimation mechanisms are used for Average, Count and Sum

Estimators for Aggregate queries

• Average is the average of the distinct tuples in the sample satisfying the query

• Count is the sum of the expected contributions of all tuples in the icicle that satisfy the query

• Sum is the product of average and count
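A minimal sketch of the three estimators as described above. The slides do not give the formula for a tuple's expected contribution; the weight |L| / (n · f(t)) used here is an assumption, chosen because a tuple with f(t) copies in L is expected to appear n · f(t) / |L| times in a sample of n copies. The function name and parameters are hypothetical.

```python
def estimate(icicle, freq, size_of_L, predicate, value):
    """Return (average, count, sum) estimates for tuples satisfying predicate.

    icicle    -- sampled tuples (the same tuple may appear more than once)
    freq      -- dict mapping each tuple to its frequency in L
    size_of_L -- total number of copies in L
    predicate -- function deciding whether a tuple satisfies the query
    value     -- function extracting the aggregated attribute
    """
    n = len(icicle)
    hits = [t for t in icicle if predicate(t)]
    distinct = set(hits)
    if not distinct:
        return None, 0.0, 0.0
    # Average over distinct qualifying tuples, so repeated copies do not skew it.
    avg = sum(value(t) for t in distinct) / len(distinct)
    # Assumed weight: each sampled copy stands for size_of_L / (n * freq[t])
    # tuples of the original relation.
    count = sum(size_of_L / (n * freq[t]) for t in hits)
    # Sum is the product of average and count.
    return avg, count, avg * count
```

As a sanity check on the assumed weight: if the icicle contained every copy in L, each tuple's copies would contribute exactly 1 in total, so the count estimate would equal the true count.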

Maintaining Frequency Relation

• Add a frequency attribute to the relation R

• The frequency of each tuple is initially set to 1

• The frequency is incremented each time a tuple is used to answer a query

• Frequencies of relevant tuples are updated only when the icicle is updated with a new query
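The bookkeeping above can be sketched as follows. The dictionary layout, the `frequency` field name, the helper names, and the toy rows are all assumptions made for illustration; the slides only state that a frequency attribute is added to R.

```python
def add_frequency_attribute(rows):
    """Return a copy of R in which every row carries a frequency of 1."""
    return {key: {**row, "frequency": 1} for key, row in rows.items()}

def record_query(relation, result_keys):
    """Increment the frequency of each tuple used to answer a query."""
    for key in result_keys:
        relation[key]["frequency"] += 1

# Hypothetical toy relation keyed by a made-up tuple id.
R = {1: {"price": 10.0}, 2: {"price": 20.0}, 3: {"price": 30.0}}
freq_R = add_frequency_attribute(R)

# Two hypothetical queries: the first needs tuples 2 and 3, the second tuple 3.
record_query(freq_R, [2, 3])
record_query(freq_R, [3])

# The size of the multiset L is the sum of all frequencies.
size_of_L = sum(row["frequency"] for row in freq_R.values())
```

Starting every frequency at 1 matches the definition of L as containing one copy of each tuple of R before any query is seen.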

Quality Guarantees

• When queries in the workload exhibit data locality, the icicle consists of more tuples from frequently accessed subsets of the relation

• The accuracy of an approximate answer improves as the number of sample tuples used to compute it increases
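The second point follows the usual textbook rationale, shown here under a simplifying assumption that is not taken from the slides: m independent, uniformly drawn values with variance σ². An icicle is not a uniform sample of R, so this is only an illustration of the trend, not the paper's guarantee.

```latex
\[
  \operatorname{SE}(\bar{x}_m) = \frac{\sigma}{\sqrt{m}}
\]
```

Under that assumption, quadrupling the number of qualifying sample tuples halves the standard error of an average.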

Performance Evaluation

Plots definition:

• Static sample: uniform random sample on the relation

• Icicle: icicle evolves with the workload

• Icicle-complete: the tuned icicle run again on the same workload

Performance Evaluation

Qworkload: template for generating workloads

SELECT COUNT(*), AVG(LI_Extendedprice), SUM(LI_Extendedprice)
FROM LI, C, O, S, N, R
WHERE C_Custkey = O_Custkey AND O_Orderkey = LI_Orderkey AND LI_Suppkey = S_Suppkey
  AND C_Nationkey = N_Nationkey AND N_Regionkey = R_Regionkey
  AND R_Name = [region] AND O_Orderdate >= Date[startdate] AND O_Orderdate <= 12-31-1998

Template for obtaining approximate answers

SELECT COUNT(*), AVG(LI_Extendedprice), SUM(LI_Extendedprice)
FROM LICOS-icicle, N, R
WHERE C_Nationkey = N_Nationkey AND N_Regionkey = R_Regionkey
  AND R_Name = [region] AND O_Orderdate >= Date[startdate] AND O_Orderdate <= 12-31-1998

Performance Evaluation

Performance Evaluation

Conclusion

• Icicles are a class of samples that are sensitive to workload characteristics

• They adapt quickly to a changing workload

• Icicles are useful when the workload focuses on relatively small subsets of the relation

• An icicle is a trade-off between accuracy and cost

References

• V. Ganti, M. Lee, and R. Ramakrishnan. ICICLES: Self-tuning Samples for Approximate Query Answering. VLDB Conference 2000.

Thank you!

