+ All Categories
Home > Documents > APEX: An Adaptive Path Index for XML data Chin-Wan Chung, Jun-Ki Min, Kyuseok Shim SIGMOD 2002...

APEX: An Adaptive Path Index for XML data Chin-Wan Chung, Jun-Ki Min, Kyuseok Shim SIGMOD 2002...

Date post: 17-Jan-2018
Category:
Upload: muriel-townsend
View: 219 times
Download: 0 times
Share this document with a friend
Description:
Data Warehousing Lab. 3 Introducton (1/2) Traditional indexes generally record all label paths from the root element in XML data. APEX does not keep all paths starting from the root and utilizes frequently used paths to improve the query performance.
17
APEX: An Adaptive Path Index for XML data Chin-Wan Chung, Jun-Ki Min, Kyuseok Shim SIGMOD 2002 Presentation: M.S.3 HyunSuk Jung Data Warehousing Lab. In Ewha Womans University In Lab Seminar 2003.10.29
Transcript
Page 1: APEX: An Adaptive Path Index for XML data Chin-Wan Chung, Jun-Ki Min, Kyuseok Shim SIGMOD 2002 Presentation: M.S.3 HyunSuk Jung Data Warehousing Lab. In.

APEX: An Adaptive Path Index for XML data

Chin-Wan Chung, Jun-Ki Min, Kyuseok ShimSIGMOD 2002

Presentation: M.S.3 HyunSuk JungData Warehousing Lab.

In Ewha Womans University

In Lab Seminar2003.10.29

Page 2: APEX: An Adaptive Path Index for XML data Chin-Wan Chung, Jun-Ki Min, Kyuseok Shim SIGMOD 2002 Presentation: M.S.3 HyunSuk Jung Data Warehousing Lab. In.

Data Warehousing Lab.

2

Outline Introduction Related work Overview of APEX Construction and management of APEX Experiment result Conclusion

Page 3: APEX: An Adaptive Path Index for XML data Chin-Wan Chung, Jun-Ki Min, Kyuseok Shim SIGMOD 2002 Presentation: M.S.3 HyunSuk Jung Data Warehousing Lab. In.

Data Warehousing Lab.

3

Introducton (1/2)

Traditional indexes generally record all label paths from the root element in XML data.

APEX does not keep all paths starting from the root and utilizes frequently used paths to improve the query performance.

Page 4: APEX: An Adaptive Path Index for XML data Chin-Wan Chung, Jun-Ki Min, Kyuseok Shim SIGMOD 2002 Presentation: M.S.3 HyunSuk Jung Data Warehousing Lab. In.

Data Warehousing Lab.

4

Introduction (2/2)

contribution Efficient Processing of Partial Matching

Queries Workload-Aware Path Indexes Incremental Update

Page 5: APEX: An Adaptive Path Index for XML data Chin-Wan Chung, Jun-Ki Min, Kyuseok Shim SIGMOD 2002 Presentation: M.S.3 HyunSuk Jung Data Warehousing Lab. In.

Data Warehousing Lab.

5

Related work

Strong Dataguide 1-index Index Fabric

Such path indexes may result in performance degradation due to large sizes and exhaustive navigations for partial matching path queries start with the self-or-descendent axis(“//”)

Page 6: APEX: An Adaptive Path Index for XML data Chin-Wan Chung, Jun-Ki Min, Kyuseok Shim SIGMOD 2002 Presentation: M.S.3 HyunSuk Jung Data Warehousing Lab. In.

Data Warehousing Lab.

6

Preliminary (1/2) Figure1: A sample XML data

Page 7: APEX: An Adaptive Path Index for XML data Chin-Wan Chung, Jun-Ki Min, Kyuseok Shim SIGMOD 2002 Presentation: M.S.3 HyunSuk Jung Data Warehousing Lab. In.

Data Warehousing Lab.

7

Preliminary (2/2) Definition 2

Label path Ex) movie.title, name: label path of node 7

Definition 3 Data path Ex) movie.8.title and name.11: data paths of

node 7 Definition 4

Data path d is an instance of a label path Ex) movie.8.title.10 is an instance of

movie.title and name.11 is an instance of name.

Page 8: APEX: An Adaptive Path Index for XML data Chin-Wan Chung, Jun-Ki Min, Kyuseok Shim SIGMOD 2002 Presentation: M.S.3 HyunSuk Jung Data Warehousing Lab. In.

Data Warehousing Lab.

8

Overview of APEX

APEX (HAPEX and GAPEX) Example query: //actor/name

Searches for all actor names Just looks up the hash tree with

actor.name in reverse order

Page 9: APEX: An Adaptive Path Index for XML data Chin-Wan Chung, Jun-Ki Min, Kyuseok Shim SIGMOD 2002 Presentation: M.S.3 HyunSuk Jung Data Warehousing Lab. In.

Data Warehousing Lab.

9

Overview of APEX The support of a label path p=li…lj : sub(p)

The ratio of the number of queries having p as a subpath to the total number queries.

Definition 6. p=li…lj in GXML is a frequently used path if

sup(p)≥minSup. Let p be a required path if it is either a frequently used path or the length of p is one.

Definition 7. For a label path p of the form li. li+1…lj in GXML, a edge

set, T(p), is {< oj-1, oj >| lioi. lj-1.oj-1…lj.oj is a data path in GXML}. That is, a edge set T(p) is a set of pairs of nids for the incoming edges to the last nodes that are reachable by traversing a given label path p.

Ex) edge set T(title)={<8,10>,<14,17>}

Page 10: APEX: An Adaptive Path Index for XML data Chin-Wan Chung, Jun-Ki Min, Kyuseok Shim SIGMOD 2002 Presentation: M.S.3 HyunSuk Jung Data Warehousing Lab. In.

Data Warehousing Lab.

10Construction and management of APEX

Architecture of APEX management tool The system consists of 3 main components

Page 11: APEX: An Adaptive Path Index for XML data Chin-Wan Chung, Jun-Ki Min, Kyuseok Shim SIGMOD 2002 Presentation: M.S.3 HyunSuk Jung Data Warehousing Lab. In.

Data Warehousing Lab.

11

APEX0 : initial index structure

APEX0 is the initial structure to build APEX. This step is executed only once at the beginning. have not only the structural summary in GAPEX but also the extents in

the nodes of GAPEX.

Page 12: APEX: An Adaptive Path Index for XML data Chin-Wan Chung, Jun-Ki Min, Kyuseok Shim SIGMOD 2002 Presentation: M.S.3 HyunSuk Jung Data Warehousing Lab. In.

Data Warehousing Lab.

12Frequently used path extraction

Sequential pattern mining Used naïve algorithm

simply counts all sequential subsequences that appear in the query workload by one scan.

Suppose required path set:{A,B,C,D,B.D}

Entry of hash table Label: key value for the entry Count: frequency of label path New: check a newly create entry Xnode: points to a node in GAPEX wh

ose incoming label path is represented by the entry

Next: points another node in HAPEX

Page 13: APEX: An Adaptive Path Index for XML data Chin-Wan Chung, Jun-Ki Min, Kyuseok Shim SIGMOD 2002 Presentation: M.S.3 HyunSuk Jung Data Warehousing Lab. In.

Data Warehousing Lab.

13

The update with frequently used paths

The basic idea of update traverse the nodes in GAPEX update not only the structure of GAPEX with frequently

used paths but also the xnode field of entries in HAPEX.

Page 14: APEX: An Adaptive Path Index for XML data Chin-Wan Chung, Jun-Ki Min, Kyuseok Shim SIGMOD 2002 Presentation: M.S.3 HyunSuk Jung Data Warehousing Lab. In.

Data Warehousing Lab.

14

The update with frequently used paths

Page 15: APEX: An Adaptive Path Index for XML data Chin-Wan Chung, Jun-Ki Min, Kyuseok Shim SIGMOD 2002 Presentation: M.S.3 HyunSuk Jung Data Warehousing Lab. In.

Data Warehousing Lab.

15

Experiment results (1/2) Data Sets

Play: tree structured data FlixML: graph structured data GedML: graph structured data

Query Workload //l1/l2/…/lm or //l1/…/lili+1/…/lm where li is a ta

g or an attribute (with the prefix of ‘@’) : QTYPE1 //li // li : QTYPE2

Page 16: APEX: An Adaptive Path Index for XML data Chin-Wan Chung, Jun-Ki Min, Kyuseok Shim SIGMOD 2002 Presentation: M.S.3 HyunSuk Jung Data Warehousing Lab. In.

Data Warehousing Lab.

16

Experiment results (2/2) the strong DataGuide is generally inefficient for

complex XML data. The performance of APEX depends on the value

of minSup. As minSup decreases, the number of frequently u

sed paths increases and more entries is stored in the HAPEX.

Thus, more queries can be directly obtained by look-up of HAPEX.

best bestbest

Page 17: APEX: An Adaptive Path Index for XML data Chin-Wan Chung, Jun-Ki Min, Kyuseok Shim SIGMOD 2002 Presentation: M.S.3 HyunSuk Jung Data Warehousing Lab. In.

Data Warehousing Lab.

17

Conclusion APEX review summary

does not keep all paths starting from the root and utilizes frequently used paths to improve the query performance.

Utilizing the data mining algorithm to summarize paths that appear frequently in the query workload.

can be incrementally updated in order to minimize the overhead of construction whenever the query workload changes.

guarantees all paths of length two so that any label path expression can be evaluated by joins of extents in APEX without scanning original data.

consists of two structures the graph structure GAPEX : represents the structural

summary of XML data with extents the hash tree HAPEX : keeps the information for frequently

used paths and their corresponding nodes in GAPEX.


Recommended