
ObjectGlobe: Open, Secure, and QoS-enhanced Distributed Query Processing

Donald Kossmann

Technical University of Munich

http://www3.in.tum.de

Joint work with Alfons Kemper (Passau) and others

Outline

• Background

• The ObjectGlobe Lookup Service

• (Security Aspects)

• QoS Management

• Summary

Query Processing on the Internet

• Web servers, relational databases on the Web: centralized or limited query capabilities

• Middleware Systems: a great deal of data shipping

• Goals of ObjectGlobe:
  – integrate any kind of data
  – integrate any kind of query processing capabilities
  – bring query processing capabilities to the data

Middleware for Query Processing

[Figure: relations R, S, and T at Data-Provider A and Data-Provider B; user-defined operators wrap_S and thumbnail in the middleware. Annotation: heavy data shipping.]

Open Query Processing (Steps 1 and 2)

[Figures for both steps: Fct-Provider, Data-Provider A, and Data-Provider B with the relations R, S, and T; the operators wrap_S and thumbnail. Annotation: load functions.]

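Traveling from M to UCB

[Figure: example query plan. Two data providers supply flights and rental cars; a selection is applied to each input; a route planner (Route) and a Top N operator come from a function provider; a cycle provider takes part in the execution.]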

Open QP with ObjectGlobe

• Create an open marketplace for
  – data providers
  – cycle providers
  – function providers

• Requirements
  – wrappers exist for all data of data providers
  – JVM runs on all cycle providers
  – fixed interface for operators of function providers

Scenarios

• Free Internet: everything is free and available for everybody

• Restricted Internet: charge according to usage, quality, and timeliness; restrictions (e.g., age)

• Intranet: everything is free and available for "insiders"

• Outsourcing: charge for certain services (e.g., backup, business analyses)

Challenges

• Lookup Service
  – Find the relevant services

• Security
  – Protect data and cycle providers from bad code

• Quality of Service
  – What you pay is what you get

ObjectGlobe Lookup-Service

[Figure: the Lookup-Service in the center. Applications and users browse and search it; providers register with it; the parser, optimizer, and execution engine draw on it for authorisation data, statistics, cost information, etc.]

Description of Services

• Providers register RDF or XML documents

• There is a pre-defined schema to describe services

• Data Providers:
  – Theme (e.g., Hotel)
  – Attributes (e.g., rate, location, category)
  – Access paths and wrappers
  – Characteristics of the server (e.g., availability)
  – Information for authorization
  – Statistics
  – ...

• Function Provider:
  – Signature (e.g., foo(int, int) -> int)
  – Information for authorization
  – Hardware requirements (e.g., 30 MB main memory)
  – Size of Java byte code
  – ...

• Cycle Provider:
  – Hardware (e.g., 1 GB main memory)
  – Location and network connections / bandwidth
  – Information for authorization
  – ...

XML Description of a Data Provider

<DataProvider>
  <id> 4711 </id>
  <theme>
    <name> Hotel </name>
    <desc> All hotels you ever want </desc>
  </theme>
  <Attribute>
    <topic> city </topic>
    <type> string </type>
  </Attribute>
  ...

Lookup Query

• Data Providers for Hotels that return the City and Rate of each hotel

search DataProvider d
select d.uniqueId, d.attr.*
where d.theme.name = "hotel" and d.attr.?.topic = "city" and d.attr.?.topic = "rate"
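The intent of this lookup query can be illustrated with a small in-memory sketch. It is not the ObjectGlobe implementation: the classes `DataProvider` and `Attribute`, the function `lookup`, and the sample providers are hypothetical, and the sketch assumes that `d.attr.?.topic = X` means "some attribute of d has topic X" and that theme names are compared case-insensitively.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    topic: str
    type: str


@dataclass
class DataProvider:
    unique_id: int
    theme_name: str
    attrs: list = field(default_factory=list)


def lookup(providers, theme, required_topics):
    """Return (unique_id, attrs) for providers of `theme` offering every required topic."""
    hits = []
    for d in providers:
        topics = {a.topic for a in d.attrs}
        if d.theme_name.lower() == theme.lower() and set(required_topics) <= topics:
            hits.append((d.unique_id, d.attrs))
    return hits


# Made-up registrations; only provider 4711 offers both city and rate for hotels.
providers = [
    DataProvider(4711, "Hotel", [Attribute("city", "string"), Attribute("rate", "int")]),
    DataProvider(4712, "Hotel", [Attribute("city", "string")]),
    DataProvider(4713, "Flight", [Attribute("city", "string"), Attribute("rate", "int")]),
]
result = lookup(providers, "hotel", ["city", "rate"])
matching_ids = [uid for uid, _ in result]
print(matching_ids)
```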

Three-tier Architecture

• Local Lookup-Servers
  – Keep copies of meta-data of services that are relevant for a particular organization or subsidiary
  – Evaluate Lookup requests for that organization
  – Relevance is determined by subscription rules (queries)

• Public Lookup-Servers (Backbone)
  – Store all (public) meta-data
  – Store subscription rules of local Lookup-Servers
  – Notify local Lookup-Servers of changes
  – Users can browse in the public info of the backbone

Three-tier Architecture

[Figure: three tiers. Public Lookup-Servers (backbone) at the top, local Lookup-Servers (Local LS) in the middle, clients at the bottom. Arrow labels: Queries, Answers, New Rules.]

Updates, Inserts

• Processing Lookup Requests
  – Local Lookup-Servers store meta-data in RDBMS
  – Translate Lookup request into SQL

• Registering new services
  – Public Lookup-Servers store meta-data in RDBMS
  – Public Lookup-Servers store rules in RDBMS
  – Apply filter algorithm using RDBMS in order to find relevant local Lookup-Servers

• Deletes and updates of services
  – Apply filter algorithm to find affected local Lookup-Servers (more complicated, however)

• Principle: Map everything to RDBMS

<person, id = 4711>
  <name> Lilly Potter </name>
  <child>
    <person, id = 314>
      <name> Harry Potter </name>
  </child>
</person>

<person, id = 666>
  <name> James Potter </name>
  <child> 314 </child>
</person>

[Figure: the corresponding document graph. Root node 0 has person edges to nodes 4711 and 666; these have name edges to "Lilly Potter" and "James Potter" and child edges to node 314, the person named "Harry Potter".]

Storing XML Data in an RDBMS

Edge Approach

Edge Table

Source  Label   Target
0       person  4711
0       person  666
4711    name    v1
4711    child   i314
666     name    v2
666     child   i314

Value Table (String)

Id  Value
v1  Lilly Potter
v2  James Potter
v3  Harry Potter

Value Table (Integer)

Id  Value
v4  12

XML Queries

• Find the name of all persons that like to play Quidditch and are younger than 18 years

select $n
where <person>
        <name> $n </name>
        <age> $a </age>
        <hobby> Quidditch </hobby>
      </person>, $a < 18

• Carry out pattern matching with "document graph"

Translation to SQL

SELECT nv.value
FROM Edge p, Edge n, Edge h, Value nv, Value hv
WHERE p.label = 'person' AND
      p.target = n.source AND n.label = 'name' AND n.target = nv.id AND
      p.target = h.source AND h.label = 'hobby' AND h.target = hv.id AND
      hv.value = 'Quidditch';

(The predicate $a < 18 is not shown here; it needs one more Edge join for the age element and a join with the integer value table.)

Works essentially in the same way for the query language of our Lookup service.
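To make the edge approach concrete, the following sketch loads a tiny document graph into SQLite and runs the translated query, this time including the age predicate. It is an illustration under assumptions: the table names `StrValue` and `IntValue` (for the string and integer value tables), the node and value ids, and the ages and hobbies are made up for this example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Edge     (source TEXT, label TEXT, target TEXT);
CREATE TABLE StrValue (id TEXT PRIMARY KEY, value TEXT);
CREATE TABLE IntValue (id TEXT PRIMARY KEY, value INTEGER);
""")

# Illustrative data (made up for this sketch): three persons below root node 0.
con.executemany("INSERT INTO Edge VALUES (?, ?, ?)", [
    ("0", "person", "314"), ("0", "person", "666"), ("0", "person", "4711"),
    ("314", "name", "v1"), ("314", "age", "i1"), ("314", "hobby", "v2"),
    ("666", "name", "v3"), ("666", "age", "i2"), ("666", "hobby", "v4"),
    ("4711", "name", "v5"), ("4711", "age", "i3"), ("4711", "hobby", "v6"),
])
con.executemany("INSERT INTO StrValue VALUES (?, ?)", [
    ("v1", "Harry Potter"), ("v2", "Quidditch"),
    ("v3", "James Potter"), ("v4", "Quidditch"),
    ("v5", "Lilly Potter"), ("v6", "Chess"),
])
con.executemany("INSERT INTO IntValue VALUES (?, ?)",
                [("i1", 12), ("i2", 40), ("i3", 38)])

# One Edge alias per element of the pattern, one value-table alias per leaf.
names = [row[0] for row in con.execute("""
    SELECT nv.value
    FROM Edge p, Edge n, Edge a, Edge h,
         StrValue nv, IntValue av, StrValue hv
    WHERE p.label = 'person'
      AND p.target = n.source AND n.label = 'name'  AND n.target = nv.id
      AND p.target = a.source AND a.label = 'age'   AND a.target = av.id
      AND av.value < 18
      AND p.target = h.source AND h.label = 'hobby' AND h.target = hv.id
      AND hv.value = 'Quidditch'
""")]
print(names)
```

Each additional element in the query pattern costs one more self-join on the Edge table, which is the source of the "many joins" trade-off.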

Publish & Subscribe Algorithm

• Decompose subscription rules and store them in RDBMS of Public Lookup-Servers

• SQL Join-Queries in order to match sub-rules with meta-data objects (Recall: meta-data is decomposed, too)

• SQL Join-Queries in order to re-construct matching subscription rules from sub-rules

Decomposition of Subscription Rules

• Data Providers for Stock Market Information that cost less than 500 Dollars:
  search DataProvider d
  where d.theme.name = "Stock Market" and d.cost < 500

• Decomposition into three atomic rules:
  R1: search Theme t where t.name = "Stock Market"
  R2: search DataProvider d where d.cost < 500
  R3: search R1 a, R2 b where b.theme = a

• Store these rules in RDBMS

Rule  Class      Operator  Attribute  Value
R1    Theme      =         name       Stock Mkt
R2    DataProv.  <         cost       500

Matching

Rule  Class      Operator  Attribute  Value
R1    Theme      =         name       Stock Mkt
R2    DataProv.  <         cost       500

Object  Type       Attribute    Value
O1      Theme      name         Stock Mkt
O1      Theme      description  SE InfoSys
O2      DataProv.  theme        O1
O2      DataProv.  attr         O3 (kurs, i.e. price)
O2      DataProv.  attr         O4 (wkn, i.e. German securities ID)
O2      DataProv.  cost         70

Result of Join: (R1, O1); (R2, O2)
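The join between atomic rules and decomposed meta-data can be sketched with SQLite as follows. The table and column names (`AtomicRule`, `MetaData`, `rule_id`, `cls`, `op`, `attr`, `val`, `obj_id`, `typ`) are assumptions for this illustration, and all values are stored as text and cast for the `<` comparison, which is one possible design rather than the one used in ObjectGlobe.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE AtomicRule (rule_id TEXT, cls TEXT, op TEXT, attr TEXT, val TEXT);
CREATE TABLE MetaData   (obj_id TEXT, typ TEXT, attr TEXT, val TEXT);
""")

# The two atomic rules and the decomposed meta-data objects from the tables above.
con.executemany("INSERT INTO AtomicRule VALUES (?, ?, ?, ?, ?)", [
    ("R1", "Theme", "=", "name", "Stock Mkt"),
    ("R2", "DataProv.", "<", "cost", "500"),
])
con.executemany("INSERT INTO MetaData VALUES (?, ?, ?, ?)", [
    ("O1", "Theme", "name", "Stock Mkt"),
    ("O1", "Theme", "description", "SE InfoSys"),
    ("O2", "DataProv.", "theme", "O1"),
    ("O2", "DataProv.", "attr", "O3"),
    ("O2", "DataProv.", "attr", "O4"),
    ("O2", "DataProv.", "cost", "70"),
])

# A rule matches an object if class and attribute agree and the operator holds.
matches = con.execute("""
    SELECT DISTINCT r.rule_id, o.obj_id
    FROM AtomicRule r
    JOIN MetaData o ON r.cls = o.typ AND r.attr = o.attr
    WHERE (r.op = '=' AND o.val = r.val)
       OR (r.op = '<' AND CAST(o.val AS REAL) < CAST(r.val AS REAL))
    ORDER BY r.rule_id
""").fetchall()
print(matches)
```

A single query matches all stored rules against all newly registered objects, which is why batching registrations pays off.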

Re-constructing Subscription Rules from matching atomic sub-rules

• Store decomposition graph in RDBMS
  – higher-level and atomic rules are vertices
  – top-level rules are so-called triggering rules; if they are affected, notify the local Lookup-Server (LLS)

• Walk "bottom up" through decomposition graph
  – SQL-Join Query: for each pair of matching rules, find out whether they have a common parent
  – N.B. the decomposition graph is a binary, directed, acyclic graph
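The bottom-up walk could look roughly like the sketch below, again using SQLite. Everything here is an assumption made for illustration: the tables `RuleGraph`, `Triggering`, `Matched`, and `Reference`, the server name `LLS-1`, and the convention that a join rule such as R3 reports the object bound by its right child. The example reuses R1, R2, R3 and the objects O1 and O2 from the decomposition example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE RuleGraph  (parent TEXT, left_child TEXT, right_child TEXT, join_attr TEXT);
CREATE TABLE Triggering (rule_id TEXT, local_server TEXT);
CREATE TABLE Matched    (rule_id TEXT, obj_id TEXT, UNIQUE (rule_id, obj_id));
CREATE TABLE Reference  (obj_id TEXT, attr TEXT, target TEXT);
""")
# R3: search R1 a, R2 b where b.theme = a
con.execute("INSERT INTO RuleGraph VALUES ('R3', 'R1', 'R2', 'theme')")
con.execute("INSERT INTO Triggering VALUES ('R3', 'LLS-1')")
# Matches of the atomic rules, and the reference O2.theme = O1.
con.executemany("INSERT INTO Matched VALUES (?, ?)", [("R1", "O1"), ("R2", "O2")])
con.execute("INSERT INTO Reference VALUES ('O2', 'theme', 'O1')")

# Walk bottom-up: repeat the join until no new parent matches appear.
while True:
    before = con.total_changes
    con.execute("""
        INSERT OR IGNORE INTO Matched
        SELECT g.parent, b.obj_id
        FROM RuleGraph g
        JOIN Matched a ON a.rule_id = g.left_child
        JOIN Matched b ON b.rule_id = g.right_child
        JOIN Reference r ON r.obj_id = b.obj_id
                        AND r.attr = g.join_attr
                        AND r.target = a.obj_id
    """)
    if con.total_changes == before:
        break

# Triggering rules that matched determine which local Lookup-Servers to notify.
notifications = con.execute("""
    SELECT t.local_server, m.rule_id, m.obj_id
    FROM Triggering t JOIN Matched m ON m.rule_id = t.rule_id
""").fetchall()
print(notifications)
```

Because the graph is acyclic, the loop ends after at most as many rounds as the graph has levels.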

Preliminary Experiments

• Synthetic benchmark database with 100,000 (different) subscription rules

• Oracle 8i used in the Public Lookup Server

# new providers   Proc. Time (PLS)
1                 250 msecs
100 (batch)       5000 msecs

Batch updates are crucial: 5000 msecs for a batch of 100 is 50 msecs per provider, versus 250 msecs for a single registration.

Summary

• Basic Principle: decompose rules and data

• Advantages:
  – Generic, independent of schema
  – Very easy to implement, no administration needed
  – Exploit query capabilities of RDBMS
  – Need not worry about document boundaries
  – Finding common sub-rules is trivial

• Disadvantage:
  – Sub-optimal query performance (many Joins), but probably sufficient if updates are batched

Related Work

• Lookup Services: Jini, UDDI, Plug & Play

• Publish & Subscribe:
  – IR world
  – SIFT (Stanford)
  – XFilter (Berkeley)
  – LeSelect (INRIA)
  – Continuous Queries (Niagara, ...)

• Storing and Indexing XML Data: ...

Outline

• Background

• The ObjectGlobe Lookup Service

• (Security Aspects)

• QoS Management

• Summary

Security Requirements in ObjectGlobe

• Protection of Data and Cycle Providers

• Secure Communication
  – use SSL connections (authenticated and encrypted)

• Authentication of Clients
  – passwords / certificates
  – digitally signed requests (query subplans)

• Authorization control
  – data/cycle providers are autonomous
  – but register user privileges in lookup service

Security of Data/Cycle Providers

[Figure: the ObjectGlobe runtime system receives Query 1, Query 2, and Query 3 from the Internet. Each query has its own class loader, and external operators run in a secure sandbox; internal operators are loaded by the internal class loader; privileged built-in operators provide disk or network access (e.g., tmpfile).]

QoS Management

• State of the Art: best-effort

• Goal: users should be able to constrain
  – Cost of execution
  – Running time
  – Quality of the results

• Initial approach (to get a feeling)
  – Extended query optimization
  – Admission control
  – Monitoring and plan adaptions at execution time

• Real solution: ???

Quality Parameters

• Cost of execution
  – $

• Running time
  – First tuple, last tuple, Nth tuple

• Quality of the results
  – Number of results
  – Coverage: Number (or %) of data sources queried
  – Staleness of data

• Cost as a function of coverage (-> Mariposa)

• Cost as a function of #wheels (Mercedes)

Quality of Service-Parameters

[Figure: three axes, Completeness (with a min bound), Cost in € (with a max bound), and Response time (with a max bound); the bounds enclose the desired space for query plans.]

Extended Query Optimization

Bottom-up dynamic programming query optimizer, standard costing etc., and the following extensions:

1. Generate alternatives for each operator
   • Consider classes of equivalent providers

2. Extended Pruning, Heuristics for choosing a Winner

3. Enumerate "incomplete" UNIONs

4. Initialize QoS-Accounts
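The slides do not spell out the extended pruning rule. One plausible reading, shown here purely as an assumption, is that a plan must first satisfy the user's QoS bounds and is then pruned only if another legal plan is at least as good in cost, response time, and completeness. The `Plan` class, the plan names, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    cost: float          # lower is better
    time: float          # response time, lower is better
    completeness: float  # fraction of data sources covered, higher is better


def dominates(p, q):
    """True if p is no worse than q in every dimension and better in at least one."""
    no_worse = (p.cost <= q.cost and p.time <= q.time
                and p.completeness >= q.completeness)
    better = (p.cost < q.cost or p.time < q.time
              or p.completeness > q.completeness)
    return no_worse and better


def prune(plans, max_cost, max_time, min_completeness):
    """Drop plans that violate the QoS bounds, then drop dominated plans."""
    legal = [p for p in plans
             if p.cost <= max_cost and p.time <= max_time
             and p.completeness >= min_completeness]
    return [p for p in legal if not any(dominates(q, p) for q in legal)]


plans = [
    Plan("p1", cost=2.0, time=10.0, completeness=0.3),  # below the completeness bound
    Plan("p2", cost=4.0, time=12.0, completeness=0.6),
    Plan("p3", cost=5.0, time=12.0, completeness=0.6),  # dominated by p2
    Plan("p4", cost=6.0, time=8.0, completeness=0.9),
]
survivors = prune(plans, max_cost=10.0, max_time=15.0, min_completeness=0.4)
survivor_names = [p.name for p in survivors]
print(survivor_names)
```

With several quality dimensions, more than one plan can survive pruning, which is why a heuristic for choosing a winner is still needed.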

Query Optimization: Quality of Service-Considerations

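[Figure: query evaluation plans P, Q, and R plotted by Completeness and Cost, with a 40% mark on the Completeness axis and a region labeled "illegal QEP".]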

QoS-Annotated Query Plan

[Figure: a query plan with two scan operators, wrap_S, thumbnail, and display, each annotated with its host (A.com, B.com, or client). QoS Accounts holding cost, timeO, and timeN are attached to the plan.]

Optimization: Open Questions

• Revisit heuristics to choose winning plan
  – Dynamic heuristics depending on workload and/or feedback

• Reverse engineering a plan
  – How much data should a plan read if the cost should be $5.00?

• Does query optimization matter?

Admission Control & Monitoring

• Admission Control:
  – Check assumptions of optimizer
  – Carried out at plan instantiation time for each plan fragment (set of operators at one site)

• Monitoring:
  – Predict quality of results at the end of execution
  – Carried out by special Monitoring operators

• Take actions if violations are detected
  – ECA rules specify actions

Monitoring Operators

• at the end of pipelines

• are non-blocking / low cost

• above "receive" ops

• keep statistics for predictions

• differentiate between "open" and "next" phase

• communicate with each other for liveliness

[Figure: plan fragments at sites A and B, each with a monitor operator and a send operator; at the receiving site, a monitor sits above receive and feeds the Join.]
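A monitoring operator of this kind might be sketched as a pass-through iterator that keeps statistics and extrapolates the total running time. The class `Monitor`, its method names, and the linear extrapolation are assumptions for illustration; the slides do not state how predictions are computed. The "open" phase is taken here to be the time between `open()` and the first `next()` call.

```python
import time


class Monitor:
    """Pass-through operator that counts tuples and extrapolates the finish time."""

    def __init__(self, child, expected_tuples, clock=time.monotonic):
        self.child = child                      # any iterable of tuples
        self.expected_tuples = expected_tuples  # e.g. the optimizer's estimate
        self.clock = clock
        self.seen = 0

    def open(self):
        self.opened_at = self.clock()
        self.iterator = iter(self.child)
        self.first_next_at = None

    def next(self):
        if self.first_next_at is None:
            self.first_next_at = self.clock()
        tup = next(self.iterator)  # raises StopIteration at end of input
        self.seen += 1             # only bookkeeping, so the operator never blocks
        return tup

    def predicted_total_time(self):
        """Measured 'open' phase plus a linear extrapolation of the 'next' phase."""
        if self.seen == 0:
            return None
        open_time = self.first_next_at - self.opened_at
        per_tuple = (self.clock() - self.first_next_at) / self.seen
        return open_time + per_tuple * self.expected_tuples


# Usage with a fake clock that advances by one second per reading.
ticks = iter(range(100))
monitor = Monitor(["t1", "t2", "t3"], expected_tuples=10,
                  clock=lambda: float(next(ticks)))
monitor.open()
monitor.next()
monitor.next()
prediction = monitor.predicted_total_time()
print(prediction)
```

Comparing such a prediction with the user's response-time bound is what would let the system react before the violation actually occurs.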

Plan Adaptions

• General: Abort, Restart / Reoptimize

• Response Time Violation:
  – compressConnection
  – movePlan (w/wo state)
  – increasePriority
  – removeTempResults, ...

• Coverage / Result Quality Violation:
  – addSubPlan

• Cost Violation:
  – movePlan, decreasePriority, ...

ECA Rules for Adaptions

if cost is high and coverage is low then abort

if cost is high and coverage is high then delResults

if rt is high and cost is low and network is critical then compress
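These three rules can be encoded as condition/action pairs over a small state record, as in the sketch below. The thresholds that turn measurements into "high" or "low", the state keys, and the helper names `level` and `choose_actions` are assumptions for illustration; picking the right thresholds is one of the open questions of this approach.

```python
# Each rule pairs a condition on the monitored state with an action name.
RULES = [
    (lambda s: s["cost"] == "high" and s["coverage"] == "low", "abort"),
    (lambda s: s["cost"] == "high" and s["coverage"] == "high", "delResults"),
    (lambda s: s["rt"] == "high" and s["cost"] == "low"
               and s["network"] == "critical", "compress"),
]


def level(value, threshold):
    """Map a measurement to a qualitative level (the threshold is hypothetical)."""
    return "high" if value > threshold else "low"


def choose_actions(state):
    """Return the actions of all rules whose condition holds for this state."""
    return [action for condition, action in RULES if condition(state)]


# Expensive query that has covered few sources so far.
expensive = {
    "cost": level(8.0, 5.0),
    "coverage": level(0.2, 0.5),
    "rt": level(3.0, 10.0),
    "network": "ok",
}
# Cheap but slow query on a congested network.
slow = {
    "cost": level(1.0, 5.0),
    "coverage": level(0.8, 0.5),
    "rt": level(30.0, 10.0),
    "network": "critical",
}
expensive_actions = choose_actions(expensive)
slow_actions = choose_actions(slow)
print(expensive_actions, slow_actions)
```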

Plan Adaptions: Open Questions

• What is the right mix of actions?

• What are the right thresholds for the rules?

• How to avoid the "pork cycle" (Schweinezyklus: oscillation between over- and under-correction)?

• How to draw the right conclusions from the statistics produced by Monitoring?

• What is the right granularity of actions? Plan vs. Operator vs. Tuple

Project Status

• First demo presented at SIGMOD 99
  – Travel information
  – Four Web data sources (hotels, sights, train conns)
  – One function provider (travel routes, top N)
  – Three cycle providers (two in Europe, one in US)

• Online-Demo: http://db.fmi.uni-passau.de/projects/OG

• Current work: more experiments
  – Problem: getting data from Web sources is sloooow

