Date post: 20-Jan-2016
Supporting High-Level Abstractions through XML Technologies

Xiaogang Li

Gagan Agrawal

The Ohio State University

Motivation

• The need
  - Analysis of datasets is becoming crucial for scientific advances
  - Emergence of X-Informatics
  - Complex data formats complicate processing
  - Need for applications that are easily portable and compatible with web/grid services
• The opportunity
  - The emergence of XML and related technologies developed by W3C
  - XML is already extensively used as part of Grid/Distributed Computing
• Can XML help in scientific data processing?

The Big Picture

[Figure: diagram with boxes labeled TEXT, NetCDF, RMDB, HDF5, XML, XQuery, and ???]

Programming/Query Language

• High-level declarative languages ease application development
  - Popularity of Matlab for scientific computations
  - New challenges in compiling them for efficient execution
• XQuery is a high-level language for processing XML datasets
  - Derived from database, declarative, and functional languages
  - XPath (a subset of XQuery) embedded in an imperative language is another option

Approach / Contributions

• Use of XML Schemas to provide high-level abstractions on complex datasets
• Using XQuery with these Schemas to specify processing
• Issues in translation
  - High-level to low-level code
  - Data-centric transformations for locality in low-level codes
  - Issues specific to XQuery
    - Recognizing recursive reductions
    - Type inferencing and translation

System Architecture

[Figure: architecture diagram with components labeled External Schema, XQuery Sources, Compiler, and XML Mapping Service, and with the labels logical XML schema, physical XML schema, and C++/C]

Outline

• Example application and use of XML Schemas
• XQuery features and example
• Translation issues
  - High-level to low-level translation
  - Data-centric transformations
  - Analyzing recursive reductions
• Experimental results
• Conclusions

Satellite Data Processing

[Figure: data chunks laid out along a time axis, Time[t]]

• Data collected by satellites is a collection of chunks, each of which captures an irregular section of the earth at time t
• The entire dataset comprises multiple pixels for each point on earth at different times, but not for all times
• Typical processing is a reduction along the time dimension, which is hard to write against the raw data format

Using a High-level Schema

• High-level view of the dataset: a simple collection of pixels
• Latitude, longitude, and time explicitly stored with each pixel
• Easy to specify processing
  - Don't care about locality / unnecessary scans
• At least one order of magnitude overhead in storage
  - Suitable as a logical format only

XQuery Overview

• XQuery
  - A language for querying and processing XML documents
  - Functional language
  - Single assignment
  - Strongly typed
• XQuery expressions
  - for let where return (FLWR)
  - unordered
  - path expression

    unordered(
      for $d in document("depts.xml")//deptno
      let $e := document("emps.xml")//emp[deptno = $d]
      where count($e) >= 10
      return
        <big-dept>
          { $d,
            <count>{ count($e) }</count>,
            <avg>{ avg($e/salary) }</avg> }
        </big-dept>
    )
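
For readers less familiar with FLWR expressions, the query above can be read clause by clause as an ordinary loop. The following is a minimal Python sketch, not part of the original slides: the in-memory lists `depts` and `emps` and the function name `big_depts` are made-up stand-ins for depts.xml and emps.xml, and the result is a list of dictionaries rather than XML elements.

```python
from statistics import mean

# Hypothetical in-memory stand-ins for depts.xml and emps.xml.
depts = ["D1", "D2"]
emps = [{"deptno": "D1", "salary": 100.0 + k} for k in range(10)] + [
    {"deptno": "D2", "salary": 50.0},
]


def big_depts(depts, emps, threshold=10):
    """Mirror the for / let / where / return clauses of the FLWR query."""
    result = []
    for d in depts:                                  # for $d in ...//deptno
        e = [x for x in emps if x["deptno"] == d]    # let $e := ...//emp[deptno = $d]
        if len(e) >= threshold:                      # where count($e) >= 10
            result.append({                          # return <big-dept> ... </big-dept>
                "deptno": d,
                "count": len(e),
                "avg": mean(x["salary"] for x in e),
            })
    return result
```

Because the query is wrapped in `unordered`, the order of the entries in the returned list carries no meaning; the sketch simply keeps the order in which departments are visited.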

Satellite: XQuery Code

    unordered(
      for $i in ($minx to $maxx)
      for $j in ($miny to $maxy)
      let $p := document("sate.xml")/data/pixel
      where $p/lat = $i and $p/long = $j
      return
        <pixel>
          <latitude>{ $i }</latitude>
          <longitude>{ $j }</longitude>
          <sum>{ accumulate($p) }</sum>
        </pixel>
    )

    define function accumulate($p) as double {
      let $inp := item-at($p, 1)
      let $NVDI := (($inp/band1 - $inp/band0) div ($inp/band1 + $inp/band0) + 1) * 512
      return
        if (empty($p)) then 0
        else max($NVDI, accumulate(subsequence($p, 2)))
    }
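
To make the intended semantics concrete, here is a rough Python rendering of the query and of `accumulate`. It is a sketch under assumptions: pixels are represented as plain dictionaries with the keys `lat`, `long`, `band0`, and `band1`, the function names `nvdi` and `query` are illustrative, and the pixel selection is read as "all pixels at latitude i and longitude j".

```python
def nvdi(pixel):
    """Per-pixel value from the slide: ((band1 - band0) / (band1 + band0) + 1) * 512."""
    b0, b1 = pixel["band0"], pixel["band1"]
    return ((b1 - b0) / (b1 + b0) + 1) * 512


def accumulate(p):
    """Recursive reduction over a list of pixels, with 0 as the base case."""
    if not p:
        return 0
    return max(nvdi(p[0]), accumulate(p[1:]))


def query(pixels, minx, maxx, miny, maxy):
    """One output record per (i, j) point in the requested rectangle."""
    out = []
    for i in range(minx, maxx + 1):
        for j in range(miny, maxy + 1):
            # Every iteration rescans the whole dataset to find its pixels.
            p = [px for px in pixels if px["lat"] == i and px["long"] == j]
            out.append({"latitude": i, "longitude": j, "sum": accumulate(p)})
    return out
```

Written this way, the cost problem is visible: each (i, j) iteration scans all pixels, and the reduction itself is expressed recursively rather than as a loop.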

Challenges

• Need to translate to low-level schema
  - Focus on correctness and avoiding unnecessary reads
• Enhancing locality
  - Data-centric execution on XQuery constructs
  - Use information on low-level data layout
• Issues specific to XQuery
  - Reductions expressed as recursive functions
  - Generating code in an imperative language
    - For either direct compilation or use as part of a runtime system
    - Requires type conversion

Mapping to Low-level Schema

• A number of getData functions to access element(s) of required types
• getData functions written in XQuery
  - allow analysis and transformations
• Want to insert getData functions automatically
  - preserve correctness and avoid unnecessary scans

    getData(lat x, long y)
    getData(lat x)
    getData(long y)
    getData(lat x, long y, time t)
    ….

Low-level Code

    unordered(
      for $i in ($minx to $maxx)
      for $j in ($miny to $maxy)
      let $p := getData($i, $j)
      return
        <pixel>
          <latitude>{ $i }</latitude>
          <longitude>{ $j }</longitude>
          <sum>{ accumulate($p) }</sum>
        </pixel>
    )

• Generate correct low-level code
  - Insert the getData function that reads the smallest superset of the values read in the high-level code
  - Use relational algebra
    - Reduce to canonical form
    - Compare canonical forms
• Resulting code
  - Can require unnecessary scans
  - May have very poor locality
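
The "smallest superset" rule can be illustrated with a much simpler stand-in than the relational-algebra comparison the slide refers to. The Python sketch below is an assumption-laden simplification, not the compiler's method: each getData variant is modeled only by the set of attributes it fixes, the catalogue `GETDATA_SIGNATURES` and the function names are hypothetical, and it assumes that fixing more attributes always reads fewer values.

```python
# Hypothetical catalogue: each getData variant, described by the attributes it fixes.
GETDATA_SIGNATURES = [
    frozenset({"lat", "long"}),
    frozenset({"lat"}),
    frozenset({"long"}),
    frozenset({"lat", "long", "time"}),
]


def choose_getdata(constrained, signatures=GETDATA_SIGNATURES):
    """Pick the variant that reads the smallest superset of the needed values.

    A variant is usable only if every attribute it fixes is also fixed by the
    query; among usable variants, the one fixing the most attributes wins.
    """
    constrained = frozenset(constrained)
    usable = [s for s in signatures if s <= constrained]
    if not usable:
        return None  # no variant applies; the query would need a full scan
    return max(usable, key=len)


def residual_filters(constrained, chosen):
    """Attributes the query must still filter on after calling the chosen variant."""
    return frozenset(constrained) - chosen
```

For the satellite query, which fixes latitude and longitude, this selects the variant corresponding to getData(lat x, long y). A query that fixed latitude and time would fall back to getData(lat x) and filter on time afterwards, which is still correct but reads more than strictly needed.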

Data Centric Transformation

• Objective
  - Reconstruct unordered loops of the query so that only one scan of the entire dataset is sufficient
• Algorithm
  1. Perform loop fusion that is necessary
  2. Generate abstract iteration space
  3. Extract necessary and sufficient conditions that map an element in the dataset to the iteration space
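
The effect of the transformation on the satellite query can be sketched as follows. This is an illustrative Python version, assuming pixels are dictionaries with `lat`, `long`, `band0`, and `band1` keys and that the reduction is the max-based one from the satellite example; the function names are made up, and the real compiler generates C/C++ rather than Python.

```python
def pixel_value(px):
    """Per-pixel value used by the satellite example."""
    b0, b1 = px["band0"], px["band1"]
    return ((b1 - b0) / (b1 + b0) + 1) * 512


def data_centric_query(pixels, minx, maxx, miny, maxy):
    """Compute the per-point maximum with a single scan over the dataset."""
    # Abstract iteration space: one slot per (i, j), initialised to the base case 0.
    best = {
        (i, j): 0
        for i in range(minx, maxx + 1)
        for j in range(miny, maxy + 1)
    }
    for px in pixels:  # the only scan of the dataset
        key = (px["lat"], px["long"])
        # Mapping condition: the pixel belongs to iteration (i, j)
        # exactly when lat == i and long == j and (i, j) is in range.
        if key in best:
            best[key] = max(best[key], pixel_value(px))
    return best
```

Instead of asking, for each output point, which pixels belong to it, the loop asks, for each pixel, which output point it updates. That inversion is only valid because the loops are unordered and the reduction can be applied in any order.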

Naïve Strategy

[Figure: Dataset and Output]

Requires 3 scans

Data Centric Strategy

[Figure: Datasets and Output]

Requires just one scan

Recursion Analysis

• Assumption: expressed in canonical form
  1) Linear recursive function
  2) Operation is associative and commutative
• Objective
  - Extract associative and commutative operations
  - Extract initialization conditions
  - Transform into iterative operations
  - Generate a global reduction function

Canonical Form

    define function F($t) {
      if (p1) then F1($t)
      else F2(F3($t), F4(F(F5($t))))
    }
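
A small Python sketch may help show why the canonical form can be turned into a loop. It specializes the form to list processing under stated assumptions: p1 tests for an empty list, F1 returns an initial value `init`, F3 extracts a value from the first item, F4 is the identity, F5 drops the first item, and F2 is the binary operation `op`. All names are illustrative.

```python
def recursive_reduce(items, init, op, value):
    """Canonical linear recursion: op(value(first), F(rest)), with init for the empty list."""
    if not items:                      # p1
        return init                    # F1
    return op(value(items[0]),         # F2(F3($t), ...)
              recursive_reduce(items[1:], init, op, value))  # F(F5($t))


def iterative_reduce(items, init, op, value):
    """Equivalent loop, valid when op is associative and commutative."""
    acc = init
    for x in items:
        acc = op(value(x), acc)
    return acc
```

The recursive version combines values from the last item backwards, while the loop combines them from the first item forwards. The two agree for operations such as `max` or addition, but not for a non-commutative operation such as string concatenation, which is why the analysis has to establish associativity and commutativity first.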

Recursion Analysis: Algorithm

Algorithm
1. Add leaf nodes that represent or are defined by the recursive function to set S.
2. Keep in S only the nodes that may be returned as the final value.
3. Recursively find the least common ancestor A of all nodes in S.
4. Return the subtree with A as root.
5. Examine whether the subtree represents an associative and commutative operation.

Example

    define function accumulate($p) as double {
      if (empty($p)) then 0
      else
        let $val := accumulate(subsequence($p, 2))
        let $q := item-at($p, 1)
        return
          if ($q >= $val) then $val
          else $q
    }
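
Once the operation has been extracted from the example, it can be applied locally to each chunk of data and the partial results combined by a global reduction. The Python sketch below illustrates that idea under assumptions: `op` transcribes the if/else body of the example, the chunking is hypothetical, and the property check is a brute-force test over sample values, which gives evidence only and is not the tree-based analysis described in the algorithm.

```python
from itertools import product


def op(q, val):
    """Operation extracted from the example: if q >= val then val else q."""
    return val if q >= val else q


def looks_associative_commutative(f, samples):
    """Brute-force check over sample values; evidence only, not a proof."""
    commutative = all(f(a, b) == f(b, a) for a, b in product(samples, repeat=2))
    associative = all(
        f(a, f(b, c)) == f(f(a, b), c) for a, b, c in product(samples, repeat=3)
    )
    return commutative and associative


def local_reduce(chunk, init=0):
    """Iterative form of the recursion, applied to one chunk of values."""
    acc = init
    for q in chunk:
        acc = op(q, acc)
    return acc


def global_reduce(partials, init=0):
    """Combine per-chunk results using the same operation."""
    acc = init
    for part in partials:
        acc = op(part, acc)
    return acc
```

A possible use is to call `local_reduce` on each chunk independently, for example one chunk per processor, and then pass the list of partial results to `global_reduce`; the answer matches a single pass over all the values because the operation is associative and commutative.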

Evaluating Data-Centric Transformation

[Figure: two bar charts, titled Virtual Microscope and Satellite, each comparing the Opt and Naïve versions on 150M and 600M datasets]

Comparison with Manual - VMScope

[Figure: bar chart comparing the XQuery and C versions at x-axis values 1, 2, 4, and 8; y-axis from 0 to 4500]

Comparison with Manual - Satellite

[Figure: bar chart comparing the XQuery and C versions at x-axis values 1, 2, 4, and 8; y-axis from 0 to 500]

Conclusions

• A case for the use of XML technologies in scientific data analysis
• XQuery: a data parallel language?
• Identified and addressed compilation challenges
• A compilation system has been built
• Very large performance gains from data-centric transformations
• Preliminary evidence that high-level abstractions and query language do not degrade performance substantially

