CS561 - XJoin 1
XJoin: A Reactively-Scheduled Pipelined Join Operator
IEEE Data Engineering Bulletin, 2000, by Tolga Urhan and Michael J. Franklin
Goal of XJoin
Efficiently evaluate equi-join in online query processing over distributed data sources
Optimization objectives:
- Small memory footprint
- Fast initial result delivery
- Hiding intermittent delays in data arrival
Outline
Hash Join History
Motivation of XJoin
Challenges in Developing XJoin
Three Stages of XJoin
Preventing Duplicates
Experimental Results
Conclusion
Classic Hash Join
[Diagram: 1. Build — R tuples are hashed into an in-memory table by key (key1 … key5); 2. Probe — S tuples 1-5 probe the table.]
Two phases: build, then probe. Only one table (the build input) is hashed in memory.
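The two phases can be sketched in Python (an illustrative sketch, not the paper's code; `classic_hash_join` and its arguments are hypothetical names):

```python
from collections import defaultdict

def classic_hash_join(build_rows, probe_rows, key):
    # 1. Build: hash the (smaller) build table entirely in memory.
    table = defaultdict(list)
    for r in build_rows:
        table[key(r)].append(r)
    # 2. Probe: stream the other table past the in-memory hash table.
    for s in probe_rows:
        for r in table.get(key(s), ()):
            yield (r, s)

# Join R and S on their first attribute.
R = [(1, "a"), (2, "b")]
S = [(1, "x"), (3, "y")]
matches = list(classic_hash_join(R, S, key=lambda t: t[0]))  # [((1, 'a'), (1, 'x'))]
```

Note that no output can appear until the build table has been read in full, which is exactly the blocking behavior the later pipelined variants remove.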
Hybrid Hash Join
One table is hashed into partitions, kept partly in memory and partly on disk.
G. Graefe, "Query Evaluation Techniques for Large Databases", ACM Computing Surveys, 1993.
[Diagram: R tuples are hashed into buckets; buckets i … j reside on disk while buckets n … m stay in memory. S tuples probe the memory-resident buckets.]
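A minimal sketch of the hybrid idea, with disk I/O simulated by ordinary dicts; the names and the one-resident-partition policy are my assumptions, not Graefe's formulation:

```python
from collections import defaultdict

NUM_PARTS = 4      # number of hash partitions (illustrative choice)
IN_MEMORY = {0}    # partitions kept memory-resident; the rest spill to "disk"

def hybrid_hash_join(r_rows, s_rows, key):
    mem = defaultdict(list)      # in-memory build buckets, keyed by join key
    disk_r = defaultdict(list)   # spilled R partitions (disk simulated by dicts)
    disk_s = defaultdict(list)   # spilled S partitions
    for r in r_rows:             # build: hash R, spilling non-resident partitions
        p = key(r) % NUM_PARTS
        if p in IN_MEMORY:
            mem[key(r)].append(r)
        else:
            disk_r[p].append(r)
    out = []
    for s in s_rows:             # probe: memory-resident partitions join now
        p = key(s) % NUM_PARTS
        if p in IN_MEMORY:
            out += [(r, s) for r in mem.get(key(s), ())]
        else:                    # spilled partitions are joined in a second pass
            disk_s[p].append(s)
    for p, rs in disk_r.items(): # second pass: classic hash join per partition
        table = defaultdict(list)
        for r in rs:
            table[key(r)].append(r)
        for s in disk_s.get(p, ()):
            out += [(r, s) for r in table.get(key(s), ())]
    return out
```

Matches from the memory-resident partition are emitted during the first pass; only the spilled partitions pay the cost of a second pass.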
Symmetric Hash Join (Pipelined)
Both tables are hashed (both kept in main memory only).
A. Wilschut and P. M. G. Apers, "Dataflow Query Execution in a Parallel Main-Memory Environment", DPD 1991.
[Diagram: each arriving R tuple is inserted (BUILD) into R's hash table and probes (PROBE) S's hash table; S tuples are handled symmetrically; every match goes straight to OUTPUT.]
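The symmetric insert-then-probe pattern can be sketched as follows (illustrative only; the `(side, tuple)` event encoding is my own):

```python
from collections import defaultdict

def symmetric_hash_join(events, key):
    """events: iterable of (side, tuple) with side in {'R', 'S'}, in arrival order."""
    tables = {"R": defaultdict(list), "S": defaultdict(list)}
    other = {"R": "S", "S": "R"}
    for side, t in events:
        k = key(t)
        tables[side][k].append(t)            # BUILD into this side's hash table
        for m in tables[other[side]][k]:     # PROBE the opposite hash table
            # Emit matches immediately: results pipeline out as tuples arrive.
            yield (t, m) if side == "R" else (m, t)
```

Because every arrival both builds and probes, the first result can be produced as soon as the first matching pair has arrived, regardless of input order.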
Problems of SHJ:
Rather memory intensive
Won’t work for large input streams.
Won’t allow for many joins to be processed in a pipeline (or even in parallel).
New Problems in Online Query Processing over Distributed Data Sources
Unpredictable data access due to link congestion, load imbalances, etc.
Three classes of delays:
Initial Delay: first tuple arrives from remote source more slowly than usual
Slow Delivery: data arrives at a constant, but slower than expected rate
Bursty Arrival: data arrives in a fluctuating manner
Question: Why are delays undesirable?
- Delays prolong the time to first output
- Processing slows if we wait for data to arrive before acting
- If data arrives too fast, we must avoid losing any of it
- Time is wasted sitting idle while no data is coming
- Delays are unpredictable, so no single strategy works
Motivation of XJoin
- Produce results incrementally when available: tuples are returned as soon as they are produced
- Exploit available main memory as long as possible: favor the main-memory join when possible
- Allow progress when one or more sources experience delays: background processing on previously received tuples produces results even when both inputs are stalled
XJoin Design
Tuples are stored in partitions (Hash Join):
A memory-resident (m-r) portion
A disk-resident (d-r) portion
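A partition with its two portions might be modeled as below; this is a sketch, the class and field names are mine rather than the paper's, and the "disk" is just a list:

```python
from dataclasses import dataclass, field

@dataclass
class Partition:
    """One XJoin partition: a memory-resident portion plus a disk-resident
    portion (the 'disk' is simulated by a plain list in this sketch)."""
    memory: list = field(default_factory=list)  # m-r portion
    disk: list = field(default_factory=list)    # d-r portion

    def insert(self, tup):
        self.memory.append(tup)

    def flush(self):
        # Append the whole m-r portion to the end of the d-r portion.
        self.disk.extend(self.memory)
        self.memory.clear()
```

Keeping the flushed tuples appended in arrival order is what later lets the timestamps detect which pairs were already joined.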
[Diagram: Tuple A with hash(Tuple A) = 1 goes to memory-resident partition 1 of source A; Tuple B with hash(Tuple B) = n goes to memory-resident partition n of source B; when memory fills, partitions are flushed to their disk-resident counterparts on disk.]
Challenges in Developing XJoin
- Manage the flow of tuples between memory and secondary storage (when and how to do it)
- Control background processing when inputs are delayed (the reactive scheduling idea)
- Provide both a quick initial result and good overall throughput
- Ensure the full answer is produced
- Ensure duplicate tuples are not produced
XJoin Stages
XJoin proceeds in 3 stages (separate threads)
M:M (memory-to-memory)
M:D (memory-to-disk)
D:D (disk-to-disk)
1st Stage: Memory-to-Memory Join
[Diagram: Tuple A with hash(record A) = i is inserted into partition i of source A and probes partition i of source B; Tuple B with hash(record B) = j is handled symmetrically; matches go to Output.]
1st Stage: Memory-to-Memory Join
Join processing continues as long as:
- Memory permits, and
- One of the inputs is producing tuples
If memory is full, one partition is picked, flushed to disk, and appended to the end of its disk-resident portion.
If there is no new input, stage 1 blocks and stage 2 starts.
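Stage 1 can be sketched as below. This is a simplification, not the paper's code: the victim-selection policy (flush the largest memory-resident partition) and the tiny memory limit are placeholders of my own.

```python
from collections import defaultdict

MEMORY_LIMIT = 4  # max tuples held in memory across both sources (tiny, for illustration)

def stage1(events, n_partitions=2):
    """events: iterable of (source, tuple) pairs, source in {'A', 'B'};
    tuples are keyed (and hashed) by their first field."""
    mem = {"A": defaultdict(list), "B": defaultdict(list)}   # m-r partitions
    disk = {"A": defaultdict(list), "B": defaultdict(list)}  # d-r partitions ("disk")
    other = {"A": "B", "B": "A"}
    in_memory = 0
    for src, t in events:
        p = t[0] % n_partitions
        if in_memory >= MEMORY_LIMIT:
            # Memory full: flush one m-r partition (here: the largest),
            # appending it to the end of its d-r portion.
            v_src, v_p = max(
                ((s, q) for s in mem for q in list(mem[s])),
                key=lambda sq: len(mem[sq[0]][sq[1]]),
            )
            disk[v_src][v_p].extend(mem[v_src][v_p])
            in_memory -= len(mem[v_src][v_p])
            mem[v_src][v_p].clear()
        mem[src][p].append(t)                 # insert into own m-r partition
        in_memory += 1
        for m in mem[other[src]][p]:          # probe the other source's partition
            if m[0] == t[0]:
                yield (t, m) if src == "A" else (m, t)
```

Note that flushed tuples stop participating in stage 1: joining them is deferred to stages 2 and 3.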
Why Stage 1?
In-memory operations are much faster and cheaper than on-disk operations, so this ensures results are produced as soon as possible.
Question:
What does the 2nd Stage do? When does the 2nd Stage start?
Hint: what happens when the data input (tuples) is too large for memory?
Answer:
The 2nd Stage joins memory-to-disk. It starts when both inputs are blocked.
Stage 2
[Diagram: the disk-resident portion of partition i of source A (DP_i^A) probes the memory-resident portion of partition i of source B (MP_i^B); matches go to Output.]
2nd Stage: Memory-to-Disk Join
Activated when the 1st Stage is blocked. Performs 3 steps:
1. Choose a partition from one source according to its throughput and size
2. Use tuples from its d-r portion to probe the m-r portion of the other source, outputting matches, until the d-r portion is completely processed
3. Check whether either input has resumed producing tuples. If yes, resume the 1st Stage; if no, choose another d-r portion and continue the 2nd Stage.
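Step 2 above, one stage-2 round, can be sketched as follows (a sketch only; duplicate filtering via timestamps, covered later, is omitted here):

```python
def stage2_round(disk_part, mem_part_other):
    """One 2nd-stage round: probe the other source's memory-resident
    partition with every tuple of a chosen disk-resident portion.
    Tuples are equi-joined on their first field."""
    out = []
    for d in disk_part:              # d-r tuples read back from "disk"
        for m in mem_part_other:     # m-r tuples of the other source
            if d[0] == m[0]:
                out.append((d, m))
    return out
```

Because the round runs to completion over the whole d-r portion, new input arriving mid-round is only noticed afterwards, which is the overhead discussed on the next slide.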
Controlling the 2nd Stage
The cost of the 2nd Stage is hidden when both inputs experience delays.
Tradeoffs? What are the benefits of using the second stage?
- Produces results when the input sources are stalled
- Accommodates varying input rates
What is the disadvantage?
- The second stage must completely process a d-r portion before checking for new input (overhead)
To address the tradeoff, use an activation threshold: pick a partition likely to produce many tuples right now.
3rd Stage: Disk-to-Disk Join
Clean-up stage:
- Assumes all data for both inputs has arrived
- Assumes the 1st and 2nd stages have completed
Why is this step necessary?
- Completeness of the answer: make sure all result tuples are produced
- Reason: some tuples in the disk-resident portions may never have had a chance to join with each other
Preventing Duplicates
When could duplicates be produced?
- In both the 2nd and 3rd stages, which may perform overlapping work
How is it addressed?
- XJoin prevents duplicates with timestamps
When is it addressed?
- During processing, when trying to join two tuples
Time Stamping: Part 1
Two fields are added to each tuple:
- Arrival TimeStamp (ATS): the time when the tuple first arrived in memory
- Departure TimeStamp (DTS): the time when the tuple was flushed to disk
[ATS, DTS] indicates when the tuple was in memory.
When were two tuples joined in the 1st stage?
- If tuple A's DTS is within tuple B's [ATS, DTS]
Tuples that meet this overlap condition are not considered for joining in the 2nd or 3rd stage.
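The overlap condition is a two-line check; this sketch tests it in both directions, since either tuple may have been flushed first:

```python
def joined_in_stage1(a, b):
    """True if tuples a = (ATS, DTS) and b = (ATS, DTS) were co-resident in
    memory, i.e. already joined by the 1st stage: either tuple's DTS falls
    inside the other's [ATS, DTS] window."""
    a_ats, a_dts = a
    b_ats, b_dts = b
    return b_ats <= a_dts <= b_dts or a_ats <= b_dts <= a_dts

# The slide's example: A = [102, 234] overlaps B1 = [178, 198]
# but not B2 = [348, 601].
```

Pairs for which this returns True are skipped by stages 2 and 3.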
Detecting Tuples Joined in the 1st Stage
- Overlapping: Tuple A [ATS 102, DTS 234] and Tuple B1 [ATS 178, DTS 198] were joined in the first stage: B1 arrived after A and before A was flushed to disk.
- Non-overlapping: Tuple A [ATS 102, DTS 234] and Tuple B2 [ATS 348, DTS 601] were not joined in the first stage: B2 arrived after A was flushed to disk.
Time Stamping: Part 2
For each partition, keep track of:
- ProbeTS: the time when a 2nd-stage probe was done
- DTSlast: the DTS of the last tuple of the disk-resident portion
Several such probes may occur, so keep an ordered history of probe descriptors.
Meaning: all tuples up to and including DTSlast were joined in stage 2 with all tuples that were in main memory at time ProbeTS.
Detecting Tuples Joined in the 2nd Stage
[Diagram: Partition 2 keeps an ordered history list of (DTSlast, ProbeTS) probe descriptors. All A tuples in Partition 2 up to DTSlast = 350 were joined with the m-r tuples that arrived before that probe's ProbeTS; Tuple A [ATS 100, DTS 200] and Tuple B [ATS 500, DTS 600] overlap such a descriptor, so the pair was already joined in stage 2.]
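The history-based check can be sketched as below (my own encoding of the probe descriptors; the numeric example is illustrative, not the slide's):

```python
def joined_in_stage2(history, a_dts, b):
    """True if a d-r tuple with departure timestamp a_dts was already joined
    with m-r tuple b = (ATS, DTS) by some earlier 2nd-stage probe.
    history is the partition's ordered list of (dts_last, probe_ts) descriptors."""
    b_ats, b_dts = b
    for dts_last, probe_ts in history:
        # That probe covered every d-r tuple up to dts_last, joined against
        # every tuple resident in memory at time probe_ts.
        if a_dts <= dts_last and b_ats <= probe_ts <= b_dts:
            return True
    return False
```

Stage 3 runs this check (together with the stage-1 overlap check) before emitting any pair, which is how completeness is achieved without duplicates.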
Experiments
HHJ (Hybrid Hash Join)
XJoin (with 2nd stage and with caching)
XJoin (without 2nd stage)
XJoin (with aggressive usage of 2nd stage)
Case 1: Slow Network, Both Sources Are Slow
Case 1: Slow Network, Both Sources Are Slow (Bursty)
- XJoin improves the delivery time of initial answers -> interactive performance
- Reactive background processing is an effective way to exploit intermittent delays and maintain a continued output rate
- Shows that the 2nd stage is very useful if there is time for it
Case 2: Fast Network, Both Sources Are Fast
Case 2: Fast Network, Both Sources Are Fast
- All XJoin variants deliver initial results earlier
- XJoin can also deliver the overall result in time equal to HHJ's
- HHJ delivers the 2nd half of the result faster than XJoin
- The 2nd stage cannot be used too aggressively if new data arrives continuously
Conclusion
- Can be conservative on space (small footprint)
- Can produce an initial result as early as possible
- Can hide intermittent data delays
- Can be used in conjunction with online query processing to manage data streams (limited)
How to Further Optimize XJoin?
- Resume Stage 1 as soon as data arrives
- Remove no-longer-joining tuples in a timely manner
- Other ideas? …
References
Urhan, Tolga and Franklin, Michael J. "XJoin: Getting Fast Answers From Slow and Bursty Networks."
Urhan, Tolga and Franklin, Michael J. "XJoin: A Reactively-Scheduled Pipelined Join Operator." IEEE Data Engineering Bulletin, 2000.
Hellerstein, Franklin, Chandrasekaran, Deshpande, Hildrum, Madden, Raman, and Shah. "Adaptive Query Processing: Technology in Evolution." IEEE Data Engineering Bulletin, 2000.
Avnur, Ron and Hellerstein, Joseph M. "Eddies: Continuously Adaptive Query Processing."
Babu, Shivnath and Widom, Jennifer. "Continuous Queries over Data Streams."
Stream: New Query Context
Challenges faced by XJoin:
- Potentially unbounded, growing join state
- Indefinite delay of some join results
Solutions:
- Exploit semantic constraints to remove no-longer-joining data in a timely manner
- Constraints: sliding windows, punctuations
Punctuation
A punctuation is a predicate on stream elements that evaluates to false for every element following the punctuation.

ID      | Name   | Age
9961234 | Edward | 17
9961235 | Justin | 19
9961238 | Janet  | 18
<*, *, (0, 18]>   -- no more tuples for students whose age is less than or equal to 18!
9961256 | Anna   | 20
…
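A punctuation pattern like `<*, *, (0, 18]>` can be matched against tuples as sketched below; the encoding (`"*"` wildcard, `(lo, hi)` for a half-open range `(lo, hi]`) is my own illustration, not a fixed format from the literature:

```python
def matches_punctuation(tup, punct):
    """True if the tuple matches the punctuation pattern: '*' is a wildcard,
    a (lo, hi) pair denotes the half-open range (lo, hi], and anything else
    matches by equality. A matching punctuation promises no later tuple
    will match the same pattern."""
    for value, pat in zip(tup, punct):
        if pat == "*":
            continue
        if isinstance(pat, tuple):          # (lo, hi] range constraint
            lo, hi = pat
            if not (lo < value <= hi):
                return False
        elif value != pat:
            return False
    return True

# The punctuation <*, *, (0, 18]>: no more tuples with 0 < age <= 18.
punct = ("*", "*", (0, 18))
```

Tuples that match an arrived punctuation can never join with future input and so become candidates for purging from the join state.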
An Example
Query: For each item that has at least one bid, return its bid-increase value.

Select O.item_id, Sum(B.bid_price - O.open_price)
From Open O, Bid B
Where O.item_id = B.item_id
Group by O.item_id

Open Stream (item_id | seller_id | open_price | timestamp):
1080 | jsmith  | 130.00 | Nov-10-03 9:03:00
<1080, *, *, *>
1082 | melissa | 20.00  | Nov-10-03 9:10:00
<1082, *, *, *>
…

Bid Stream (item_id | bidder_id | bid_price | timestamp):
1080 | pclover  | 175.00 | Nov-14-03 8:27:00
1082 | smartguy | 30.00  | Nov-14-03 8:30:00
1080 | richman  | 177.00 | Nov-14-03 8:52:00
<1080, *, *, *>   -- no more bids for item 1080!
…

[Plan: Join on item_id over the Open and Bid streams (Out1: item_id), then Group-by item_id with sum (Out2: item_id, sum).]
PJoin Execution Logic
[Diagram 1: tuple ta from Stream A arrives and is hashed (Hash(ta) = 1) into the memory-resident hash table of Stream A's join state (Sa); each stream's state has a memory-resident portion, a disk-resident portion, a purge candidate pool, and a punctuation set (PSa / PSb).]
[Diagram 2: punctuation pa from Stream A arrives, is hashed (Hash(pa) = 1), and is recorded in punctuation set PSa; tuples that can no longer join are moved to the purge candidate pool.]
PJoin vs. XJoin: Memory Overhead
[Chart: number of tuples in the join states over time (milliseconds); XJoin's join state grows far larger than PJoin's. Tuple inter-arrival: 2 milliseconds; punctuation inter-arrival: 40 tuples/punctuation.]
PJoin vs. XJoin: Tuple Output Rate
[Chart: number of output tuples over time (milliseconds) for PJoin and XJoin. Tuple inter-arrival: 2 milliseconds; punctuation inter-arrival: 30 tuples/punctuation.]
Conclusion
- The memory requirement for PJoin's state is almost insignificant compared to XJoin's
- XJoin's growing join state increases probe cost, which hurts the tuple output rate
- Eager purge is the best strategy for minimizing join state
- Lazy purge with an appropriate purge threshold significantly increases the tuple output rate