Bhargav Vadher (208), April 9th, 2008
Submitted to: Dr. T Y Lin
Computer Science Department, San Jose State University
Introduction
› Multipass sort-based algorithm
› Performance of multipass sort-based algorithm
› Multipass hash-based algorithm
› Performance of multipass hash-based algorithm
So far, most of the algorithms we have seen require two passes.
But what if relation R is so large that more passes are required? There are two approaches:
› Multipass sort-based algorithms
› Multipass hash-based algorithms
Assume that:
› The number of memory buffers is M
› We have relations R and S
› B(R) and B(S) denote the number of blocks occupied by R and S
BASIS: if B(R) ≤ M, then
› Read R into main memory
› Sort R with any main-memory sorting algorithm
› Write the sorted R back to disk
INDUCTION: if B(R) > M, then
› Partition the blocks of R into M groups (R1, R2, ..., RM)
› Sort each Ri recursively, for i = 1, 2, ..., M
› Merge the M sorted sublists into one
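The basis and induction steps above can be sketched in Python as follows. This is only an in-memory simulation under stated assumptions: a relation is modeled as a list of blocks, each block is a list of comparable tuples, and the names `multipass_sort` and `blocks` are hypothetical. Real disk reads and writes are not modeled.

```python
import heapq


def multipass_sort(blocks, M):
    """Sort a relation given as a list of blocks, using at most M "buffers".

    Disk and memory are only simulated; each block is a list of tuples.
    """
    if M < 2:
        raise ValueError("need at least 2 buffers to make progress")
    # BASIS: the relation fits in memory, so sort it in one pass.
    if len(blocks) <= M:
        return sorted(t for block in blocks for t in block)
    # INDUCTION: split the blocks into at most M groups, sort each recursively.
    group_size = -(-len(blocks) // M)  # ceiling division
    sublists = [
        multipass_sort(blocks[i:i + group_size], M)
        for i in range(0, len(blocks), group_size)
    ]
    # Merge the sorted sublists into one sorted list.
    return list(heapq.merge(*sublists))
```

For example, `multipass_sort([[5, 3], [9, 1], [4, 8], [7, 2], [6, 0]], M=2)` needs more than one level of recursion, because five blocks do not fit into two buffers.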
If we are not just sorting but also want to perform a unary operation, modify the previous algorithm to compute δ (duplicate elimination) or γ (grouping):
› For δ, output one copy of each distinct tuple and discard the rest.
› For γ, sort only on the grouping attributes and combine the tuples of each group.
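Once the tuples arrive in sorted order, both operations reduce to a single scan. The sketch below illustrates this, assuming the input is already sorted. The function names are hypothetical, and SUM is chosen only as an example aggregate; the slides do not name one.

```python
from itertools import groupby


def delta(sorted_tuples):
    """Duplicate elimination: keep one copy of each distinct tuple."""
    return [key for key, _ in groupby(sorted_tuples)]


def gamma_sum(sorted_tuples, group_index, value_index):
    """Grouping with SUM (an assumed aggregate).

    The input must be sorted on the grouping attribute at group_index.
    """
    return [
        (key, sum(t[value_index] for t in group))
        for key, group in groupby(sorted_tuples, key=lambda t: t[group_index])
    ]
```

Equal tuples, or tuples with equal grouping attributes, are adjacent after sorting, which is why one scan is enough.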
Finally, for a binary operation on R and S, divide the M buffers between R and S in proportion to the number of blocks each relation occupies:
› R gets M * B(R) / (B(R) + B(S)) buffers
› S gets the rest of the available buffers
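A small sketch of this proportional split follows. The rounding rule and the guarantee of at least one buffer per relation are assumptions of the sketch, not something the slides specify, and `split_buffers` is a hypothetical name.

```python
def split_buffers(M, blocks_r, blocks_s):
    """Divide M buffers between R and S in proportion to their sizes in blocks."""
    if M < 2:
        raise ValueError("need at least one buffer per relation")
    buffers_r = round(M * blocks_r / (blocks_r + blocks_s))
    # Assumption of this sketch: each relation keeps at least one buffer.
    buffers_r = min(max(buffers_r, 1), M - 1)
    return buffers_r, M - buffers_r
```

With M = 100, B(R) = 3000 and B(S) = 1000, R receives 75 buffers and S the remaining 25.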
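Let s(M, k) be the maximum size, in blocks, of a relation that can be sorted with M buffers and k passes.
BASIS: if k = 1, only one pass is allowed, so B(R) ≤ M and
› s(M, 1) = M
INDUCTION: if k > 1, multiple passes are allowed
› Partition R into M pieces, each of which must be sortable in k-1 passes
› s(M, k) = M * s(M, k-1)
where k-1 is the number of passes available for each piece of R. Expanding the recurrence gives s(M, k) = M^k.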
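The recurrence above translates directly into code. The sketch below uses hypothetical function names, evaluates the recurrence, and finds the smallest number of passes that suffices for a relation of a given size.

```python
def max_sortable_blocks(M, k):
    """Largest relation (in blocks) sortable with M buffers in k passes."""
    if k == 1:
        return M  # BASIS: the relation must fit in memory
    return M * max_sortable_blocks(M, k - 1)  # INDUCTION


def passes_needed(blocks, M):
    """Smallest k for which a relation of the given size can be sorted."""
    if M < 2:
        raise ValueError("need at least 2 buffers")
    k = 1
    while max_sortable_blocks(M, k) < blocks:
        k += 1
    return k
```

For instance, with M = 100 buffers a relation of 5,000 blocks needs two passes, while 10,001 blocks need three.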
Each pass of the algorithm:
› Reads the data from disk
› Sorts it
› Writes it back to disk
So a k-pass sorting algorithm requires
› 2k B(R) disk I/O operations
A multipass sort-based binary operation on R and S requires
› 2 (k-1) (B(R) + B(S)) disk I/O operations to sort the sublists
› plus B(R) + B(S) disk I/O operations to merge the sorted sublists in the final phase
for a total of (2k-1) (B(R) + B(S)) disk I/O operations.
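These cost formulas are easy to check numerically. The following sketch uses hypothetical function names and takes the relation sizes in blocks as plain integers.

```python
def sort_io_cost(blocks_r, k):
    """Disk I/Os for a k-pass sort of R: every pass reads and writes each block."""
    return 2 * k * blocks_r


def binary_sort_io_cost(blocks_r, blocks_s, k):
    """Disk I/Os for a k-pass sort-based binary operation on R and S."""
    total = blocks_r + blocks_s
    sorting = 2 * (k - 1) * total  # k-1 passes that read and write the sublists
    merging = total                # the final merge only reads the sublists
    return sorting + merging
```

With B(R) = 1000, B(S) = 500 and k = 2, the binary operation costs 3000 + 1500 = 4500 disk I/Os.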
Basics of the multipass hash-based algorithm:
› An alternative approach to multipass algorithms
› Hash the relations into M-1 buckets, where M is the number of memory buffers
› For a unary operation, apply the operation to each bucket individually
› For a binary operation, apply the operation to each pair of corresponding buckets
The approach can be described as follows.
BASIS:
For a unary operation, if the relation fits into the M memory blocks:
› Read it into memory from disk
› Perform the operation on it
For a binary operation, if one of the relations fits into M-1 memory blocks:
› Read that relation into M-1 blocks of main memory
› Read the second relation one block at a time into the Mth block
› Perform the operation
INDUCTION: if no relation fits into the main-memory buffers:
› Hash each relation into M-1 buckets, using M-1 buffers for the buckets
› Use the Mth buffer to read the relation being hashed, one block at a time
› Recursively perform the operation on each bucket or pair of corresponding buckets
› Accumulate the output from each bucket or pair
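As an illustration of the unary case, here is a minimal in-memory sketch of duplicate elimination by recursive hash partitioning. Several things are assumptions of the sketch: the name `hash_distinct`, the `tuples_per_block` parameter used to model block capacity, the use of Python's built-in `hash` salted with the recursion depth, and the fallback when a bucket fails to shrink (for example, when one value is repeated very often).

```python
def hash_distinct(tuples, M, tuples_per_block=2, depth=0):
    """Duplicate elimination by recursive hash partitioning (simulated)."""
    if M < 2:
        raise ValueError("need at least 2 buffers")
    capacity = M * tuples_per_block  # tuples that fit in M memory blocks
    # BASIS: the relation fits in memory, so deduplicate it directly.
    if len(tuples) <= capacity:
        return list(dict.fromkeys(tuples))
    # INDUCTION: hash into M-1 buckets; the depth salts the hash per level.
    buckets = [[] for _ in range(M - 1)]
    for t in tuples:
        buckets[hash((depth, t)) % (M - 1)].append(t)
    result = []
    for bucket in buckets:
        if len(bucket) == len(tuples):
            # No progress was made; stop recursing (assumption of this sketch).
            result.extend(dict.fromkeys(bucket))
        else:
            result.extend(hash_distinct(bucket, M, tuples_per_block, depth + 1))
    return result
```

Equal tuples always land in the same bucket, so concatenating the per-bucket results cannot reintroduce duplicates.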
For unary operations, assume:
› The operations are like δ and γ
› The relation is R
› The number of buffers is M
› u(M, k) = number of blocks in the largest relation that a k-pass hash algorithm can handle
BASIS: u(M, 1) = M, since R must fit in M buffers, so B(R) ≤ M.
INDUCTION: Assume that the first step divides R into M-1 equal buckets. The buckets for the next pass must be small enough to be handled in k-1 passes, so the buckets are of size u(M, k-1). Since R is divided into M-1 buckets, we have
› u(M, k) = (M-1) u(M, k-1)
Expanding the recurrence gives u(M, k) = M (M-1)^(k-1), which is approximately M^k for large M. So we can perform a unary operation on relation R in k passes with M buffers
› provided that M ≥ (B(R))^(1/k)
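The recurrence for the unary case can be evaluated directly. The sketch below uses a hypothetical function name and can be compared against the closed form M (M-1)^(k-1) obtained by expanding the recurrence.

```python
def max_unary_blocks(M, k):
    """Largest relation (in blocks) a k-pass hash-based unary operation handles."""
    if k == 1:
        return M  # BASIS: the relation must fit in M buffers
    # INDUCTION: M-1 buckets, each handled in k-1 passes.
    return (M - 1) * max_unary_blocks(M, k - 1)
```

For M = 101 and k = 3 this gives 101 * 100 * 100 = 1,010,000 blocks, close to the approximation 101^3 when M is large.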
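For binary operations, let j(M, k) be the largest size, in blocks, that the smaller relation may have for a k-pass join with M buffers.
BASIS: if we use the one-pass algorithm to join, then
› Either R or S must fit into M-1 blocks
› j(M, 1) = M-1
INDUCTION:
› On the first of k passes, divide each relation into M-1 buckets, so each bucket is 1 / (M-1) of its entire relation. Each pair of corresponding buckets must then be joined in k-1 passes, so j(M, k) = (M-1) j(M, k-1)
› Expanding the recurrence gives j(M, k) = (M-1)^k, which is approximately M^k. So we can join R(X, Y) ⋈ S(Y, Z) using k passes and M buffers, provided M^k ≥ min(B(R), B(S))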
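The join recurrence can be sketched the same way. The function names are hypothetical, and the pass count uses the exact recurrence value (M-1)^k rather than the approximation M^k.

```python
def max_join_blocks(M, k):
    """Largest size (in blocks) of the smaller relation for a k-pass hash join."""
    if k == 1:
        return M - 1  # BASIS: one relation must fit into M-1 blocks
    # INDUCTION: M-1 bucket pairs, each joined in k-1 passes.
    return (M - 1) * max_join_blocks(M, k - 1)


def join_passes_needed(blocks_r, blocks_s, M):
    """Smallest k for which the smaller of R and S is within the bound."""
    if M < 3:
        raise ValueError("need at least 3 buffers")
    smaller = min(blocks_r, blocks_s)
    k = 1
    while max_join_blocks(M, k) < smaller:
        k += 1
    return k
```

With M = 101, a join where the smaller relation has 9,000 blocks needs two passes, since 100 < 9,000 ≤ 100^2.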