Date posted: 29-Nov-2014
Category: Data & Analytics
Uploaded by: inria-oak
PAXQuery
Owner: Juan A. M. Naranjo
Presenter: Katerina Tzompanaki
Efficient Parallel Processing of Complex XQuery
PAXQuery
• Execution engine for the XML query language (XQuery) that runs on the Apache Flink (previously known as Stratosphere) platform and that:
• Translates XML queries to algebraic trees
• Maps algebraic trees to PACT plans
• Parallelizes XML queries
• Development period: January 2013 to December 2014
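The three steps above (translate, map, parallelize) can be illustrated with a toy sketch. It is written in Python for brevity and is not PAXQuery code: the operator classes `Scan` and `Select`, the functions `to_plan` and `run_parallel`, the sample records, and the use of a thread pool in place of Flink are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Scan:
    """Leaf of the toy algebraic tree: stands for reading the input records."""


@dataclass
class Select:
    """Keeps the records of `child` that satisfy `predicate`."""
    predicate: Callable[[dict], bool]
    child: object


def to_plan(node) -> List[Callable[[List[dict]], List[dict]]]:
    """Map an algebraic tree to a list of per-partition functions (the toy 'plan')."""
    if isinstance(node, Scan):
        return []  # scanning adds no per-partition work in this sketch
    if isinstance(node, Select):
        step = lambda part, p=node.predicate: [r for r in part if p(r)]
        return to_plan(node.child) + [step]
    raise TypeError(f"unknown operator: {node!r}")


def run_parallel(plan, partitions):
    """Apply every plan step to each partition independently, then concatenate."""
    def run_one(part):
        for step in plan:
            part = step(part)
        return part

    with ThreadPoolExecutor() as pool:
        results = list(pool.map(run_one, partitions))  # map() keeps partition order
    return [record for part in results for record in part]


# Hypothetical tree for a query such as:
#   for $b in //book where $b/year > 2000 return $b
tree = Select(lambda r: r["year"] > 2000, Scan())
partitions = [
    [{"title": "A", "year": 1999}, {"title": "B", "year": 2005}],
    [{"title": "C", "year": 2012}],
]
selected = run_parallel(to_plan(tree), partitions)
```

Because each step only looks at one partition at a time, the partitions can be processed independently, which is the property a parallel engine exploits.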
Code location
• Under Stratosphere’s project GitHub page (not accessible until 12/2014)
URL: https://github.com/stratosphere/paxquery
• In Gforge (not frequently updated, needs permission)
URL: https://gforge.inria.fr/scm/viewvc.php/xmlstratosphere/paxquery/?root=xmlinthecloud
• Parts are based on code from the ViP2P project
URL: https://scm.gforge.inria.fr/svn/vip2p/trunk/vip2p
Code volume
Number of lines of code
• Paxquery-algebra: 5454
• Paxquery-client: 36
• Paxquery-common: 7152
• Paxquery-pact: 4359
• Paxquery-translation: 778
• Paxquery-xparser: 7538
Total: 25317
Number of types
• Paxquery-algebra: 36
• Paxquery-client: 1
• Paxquery-common: 82
• Paxquery-pact: 60
• Paxquery-translation: 1
• Paxquery-xparser: 123
Total: 303
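The totals can be checked directly from the per-module figures. The snippet below is only a sanity check on the numbers listed on this slide; the dictionary layout and variable names are illustrative.

```python
# Per-module figures as listed on the slide.
lines_of_code = {
    "paxquery-algebra": 5454,
    "paxquery-client": 36,
    "paxquery-common": 7152,
    "paxquery-pact": 4359,
    "paxquery-translation": 778,
    "paxquery-xparser": 7538,
}
types = {
    "paxquery-algebra": 36,
    "paxquery-client": 1,
    "paxquery-common": 82,
    "paxquery-pact": 60,
    "paxquery-translation": 1,
    "paxquery-xparser": 123,
}

total_lines = sum(lines_of_code.values())
total_types = sum(types.values())

# Module with the most lines of code.
largest_module = max(lines_of_code, key=lines_of_code.get)

print(total_lines, total_types, largest_module)
```

Both sums agree with the stated totals, and the parser module is the largest by line count.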
Code Contributors
• Past:
  • Jesús Camacho-Rodríguez
  • The ViP2P contributors
• Current:
  • Juan A. M. Naranjo
Architecture
Code structure
It is a Maven project that has:
• Input: XQuery in a text file
• Output: XML result, in a text file
• Modules:
  • Paxquery-algebra: algebraic plan and algebraic operators
  • Paxquery-client: old client (not to be used in the release)
  • Paxquery-common: global-scope functionality (e.g. XML navigation)
  • Paxquery-pact: custom PACT operators for Apache Flink
  • Paxquery-translation: algebraic tree to PACT tree
  • Paxquery-xparser: XQuery to algebraic tree (under construction)
PAXQuery workflow
[Diagram: an arrow from source to end means that the source instantiates or invokes the end.]
Components:
• Flink: org.apache.flink.client.CliFrontend
• paxquery-xparser: fr.inria.oak.paxquery.xparser.client.Xclient, fr.inria.oak.paxquery.XQueryVisitorImplementation
• paxquery-algebra
• paxquery-common
• paxquery-translation
• paxquery-pact
Steps:
1. Invocation of getPlan()
2. Instantiation; invocation of getLogicalPlan()
3. Instantiation of:
   - LogicalPlan object and BaseLogicalOperator objects (paxquery-algebra)
   - BasePredicate, NavigationTreePattern, and ConstructionTreePattern objects (paxquery-common)
4. Invocation of planTranslate() (paxquery-translation)
5. Instantiation of Operator objects (paxquery-pact)
6. planTranslate() returns a Plan object (a PACT plan)
7. Returns a LogicalPlan object (an algebraic plan)
8. getPlan() returns a Plan object (a PACT plan)
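The chain of calls in the workflow can be sketched as three small functions. This is a Python stand-in, not the project's Java code: only the names `getPlan()`, `getLogicalPlan()`, `planTranslate()`, `LogicalPlan`, and `Plan` come from the diagram. The function bodies, the operator names (`Navigation`, `Selection`, `Construction`), and the `Pact(...)` wrapper are hypothetical. The sketch follows the data flow rather than the exact order in which the diagram numbers the returns.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LogicalPlan:
    """Stand-in for the algebraic plan built by the parser."""
    operators: List[str]


@dataclass
class Plan:
    """Stand-in for the PACT plan handed back to Flink."""
    operators: List[str]


def get_logical_plan(query_text: str) -> LogicalPlan:
    """Toy parser: the operator names it emits are hypothetical."""
    operators = ["Navigation"]
    if " where " in query_text:
        operators.append("Selection")
    operators.append("Construction")
    return LogicalPlan(operators)


def plan_translate(logical_plan: LogicalPlan) -> Plan:
    """Toy translation: wraps each algebraic operator in a made-up PACT name."""
    return Plan([f"Pact({name})" for name in logical_plan.operators])


def get_plan(query_text: str) -> Plan:
    """Entry point, analogous to the getPlan() call made by the Flink client."""
    logical_plan = get_logical_plan(query_text)
    return plan_translate(logical_plan)


plan = get_plan("for $b in //book where $b/year > 2000 return $b")
print(plan.operators)
```

Keeping parsing and translation behind separate functions mirrors the module split between paxquery-xparser and paxquery-translation.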
Dependencies between modules
[Diagram: dependency graph over Flink (org.apache.flink.client.CliFrontend), paxquery-xparser (fr.inria.oak.paxquery.xparser.client.Xclient, fr.inria.oak.paxquery.XQueryVisitorImplementation), paxquery-algebra, paxquery-common, paxquery-translation, and paxquery-pact. An arrow from source to end means that the source depends on the end (we can see it as “import” statements).]
External Code Dependencies
• Apache Flink
• Log4j
• Apache Commons Configuration
• Apache Commons Lang
• Google Guava
• JUnit
• ANTLRv4 (the grammar parser used in paxquery-xparser)
• XMLUnit
• JSON Simple
• DOT
• To be used: Dagre-d3 (for the new web interface design)
TODO
• Finish the development of paxquery-xparser
• Web client with the following features:
  • Input query
  • XML output
  • Diagrams of algebraic and PACT plans, as well as navigation tree patterns
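Since DOT is already among the external dependencies, one plausible way to produce the plan diagrams is to emit DOT text and hand it to a renderer. The following is a sketch only, not how PAXQuery does it: the `to_dot` helper and the operator names in the example are hypothetical.

```python
from typing import List, Tuple


def to_dot(graph_name: str, edges: List[Tuple[str, str]]) -> str:
    """Render parent -> child edges of a plan tree as DOT text."""
    lines = [f"digraph {graph_name} {{"]
    for parent, child in edges:
        # Quote node names so that labels with spaces or punctuation stay valid.
        lines.append(f'  "{parent}" -> "{child}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical algebraic plan: construction on top of selection on top of navigation.
edges = [("Construction", "Selection"), ("Selection", "Navigation")]
dot_text = to_dot("algebraic_plan", edges)
print(dot_text)
```

The same helper could serve both the algebraic and the PACT plan, since each is a graph of named operators.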
Thank you