
Mariposa: a wide-area distributed database system

Date post: 12-Jan-2016
Page 1: Mariposa: a wide-area distributed database system

Mariposa: a wide-area distributed database system

Kumar Ramdurgkar.

CIS 661

Page 2: Mariposa: a wide-area distributed database system

Mariposa Distributed Database Management System

Principal Investigator: Prof. Michael Stonebraker

Page 3: Mariposa: a wide-area distributed database system

SECTION 1: Introduction to Mariposa

Page 4: Mariposa: a wide-area distributed database system

LAN vs. WAN databases

LAN database management is common; it is most often used in industries where the data is local to the installation.

A LAN has a single RDBMS source.

A LAN is maintained by a well-defined set of rules, data types, and services.

The difference?

Page 5: Mariposa: a wide-area distributed database system

WAN Databases

Many databases interconnected over a WAN.

In a WAN there are many sites participating in the DBMS.

Different site administrators.

Different data types, extensions and service handling times.

How do we interconnect? What are the issues?

Page 6: Mariposa: a wide-area distributed database system

Issues and problems

Network connections and traffic.

Different ‘load’ handling capabilities and service times.

Different data types and extensions.

A single program acting as a query optimizer will NOT work.

continued…

Page 7: Mariposa: a wide-area distributed database system

Issues and problems

Cost-based optimization does not respond well to site-specific type extensions and access constraints, charging algorithms and time-of-day constraints.

LAN algorithms do not scale properly to suit a WAN DBMS.

The solution…

Page 8: Mariposa: a wide-area distributed database system

An excellent idea! MARIPOSA UBID!! Have you been there??

Mariposa is a distributed DBMS built on the economic paradigm of bidding.

Mariposa was proposed by: Michael Stonebraker, Paul M. Aoki, Witold Litwin, Avi Pfeffer, Adam Sah, Jeff Sidell, Carl Staelin, Andrew Yu

Proposed: Nov 1994

Accepted: Sept 1995

Page 9: Mariposa: a wide-area distributed database system

Mariposa… vision

Standard approach for distributed data.

A set of standard guidelines for WAN databases.

Application of query storage and optimization using a different perspective.

Scalability and data explosion handling.

A query optimizer for the WWW??

Need to formalize

Page 10: Mariposa: a wide-area distributed database system

WAN Guidelines for Mariposa

Scalability to a large number of cooperating sites.

Data mobility.

No global synchronization of data.

Total local autonomy and complete control.

Easily configurable policies for changing the behavior of Mariposa.

Page 11: Mariposa: a wide-area distributed database system

Mariposa System architecture

Microeconomic mechanisms.

All Mariposa clients and servers have an account with a network bank.

A user allocates a budget in the currency of this bank to each query.

The goal of the query processing system is to solve the query within the allotted budget by contracting with various Mariposa sites.

Page 12: Mariposa: a wide-area distributed database system

Mariposa Broker mechanism

Obtains bids for pieces of a query from sites.

Uses a distributed advertising system rather than the usual metadata mechanisms used in LANs.

The server that has advertised the best time for the given query wins.

Page 13: Mariposa: a wide-area distributed database system

Scalability

A site can join Mariposa by buying ‘objects’ and advertising services.

A site can leave Mariposa by selling its objects and by ceasing to bid.

Hence a highly scalable system.

In fact, the success of Mariposa depends on a large number of sites participating in the system.

Page 14: Mariposa: a wide-area distributed database system

Storage decisions

Objects have no notion of home.

All secondary indices are moved with the objects.

Avoidance of global sync is simplified because of the economic paradigm.

Mariposa fosters data mobility and free trade of objects.

‘Object’ here means ‘data’.

Page 15: Mariposa: a wide-area distributed database system

Total local control

Since each Mariposa site is free to bid on any business of interest, it has total local autonomy.

Each site is expected to maximize its individual profit per unit of operating time and to bid on those queries that it feels will accomplish this goal.

Page 16: Mariposa: a wide-area distributed database system

Sounds good… any drawbacks??

Some queries may not be solvable, either because nobody will bid on them or because the minimum bid exceeds what the client is willing to pay.

A site can refuse to give up objects.

A site may not find buyers for objects that it wants to sell.

Page 17: Mariposa: a wide-area distributed database system

SECTION 2: Mariposa architecture

Page 18: Mariposa: a wide-area distributed database system

Mariposa Architectural details

Hardware

Flow chart

Processes (bidding, bid protocols, acceptance, finding bidders, sub-query bidding, network bidding, splitting and combining)

Code languages (RUSH)

Mariposa experiments and results

Conclusions

Page 19: Mariposa: a wide-area distributed database system

Architecture overview

Client query in SQL3.

Middleware consists of several modules, including a query separator and a query broker.

Broker and Bidder are coded in Rush.

Local execution at the site that wins the bid.

Details…

Page 20: Mariposa: a wide-area distributed database system

Architecture details

Page 21: Mariposa: a wide-area distributed database system

Processes : Bidding

Each query Q has a budget B(t) that can be used to solve the query.

The budget is a value the user gives to solve this query.

The broker receives a query plan for Q and tries to get each fragment bid on and solved, using either the expensive bid protocol or a cheaper purchase order protocol.
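
One way to read B(t) is as the amount the user is willing to pay if the answer arrives after a delay of t. The sketch below assumes a hypothetical curve shape: full price up to a deadline, then a linear decline to zero. The shape and the names `max_price`, `deadline` and `cutoff` are illustrative assumptions, not taken from the slides.

```python
def make_budget(max_price: float, deadline: float, cutoff: float):
    """Return B(t): what the user will pay for an answer delivered after delay t.

    Assumed shape (illustrative only): full price up to `deadline`,
    then a linear decline that reaches zero at `cutoff`.
    """
    if not 0 <= deadline < cutoff:
        raise ValueError("need 0 <= deadline < cutoff")

    def budget(t: float) -> float:
        if t <= deadline:
            return max_price
        if t >= cutoff:
            return 0.0
        # Linear interpolation between (deadline, max_price) and (cutoff, 0).
        return max_price * (cutoff - t) / (cutoff - deadline)

    return budget


def affordable(budget, cost: float, delay: float) -> bool:
    """A plan is acceptable only if its cost fits under the budget at its delay."""
    return cost <= budget(delay)
```

With such a curve, a broker can reject any combination of bids whose total cost lies above B(t) at the combined delay.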

Page 22: Mariposa: a wide-area distributed database system

Processes : Bidding

Brokers split each query into subqueries and solicit bids for each subquery.

There is a set sequence of subquery execution.

Finding the right winners is implemented as a greedy algorithm at the broker.
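
A greedy winner selection of this kind can be sketched as follows. This is a simplified illustration, not necessarily the exact procedure Mariposa uses. It assumes subqueries run one after another, so delays add up, and it assumes the broker wants to maximize the surplus B(total delay) minus total cost. `Bid` and `choose_winners` are hypothetical names.

```python
from typing import NamedTuple


class Bid(NamedTuple):
    site: str
    cost: float
    delay: float


def choose_winners(bids, budget):
    """Greedily pick one bid per subquery.

    bids:   list with one non-empty list of Bid per subquery
    budget: function B(t) giving the price the user accepts at delay t
    Returns (winners, total_cost, total_delay), or None if nothing is affordable.
    """
    # Start from the fastest bid for every subquery.
    chosen = [min(options, key=lambda b: b.delay) for options in bids]

    def surplus(selection):
        cost = sum(b.cost for b in selection)
        delay = sum(b.delay for b in selection)  # assumed sequential execution
        return budget(delay) - cost

    best = surplus(chosen)
    while True:
        best_move = None
        # Try every single-bid substitution and keep the most profitable one.
        for i, options in enumerate(bids):
            for bid in options:
                if bid == chosen[i]:
                    continue
                trial = chosen[:i] + [bid] + chosen[i + 1:]
                s = surplus(trial)
                if s > best:
                    best, best_move = s, (i, bid)
        if best_move is None:
            break  # no substitution improves the surplus
        i, bid = best_move
        chosen[i] = bid

    if best < 0:
        return None  # even the best plan found costs more than the budget allows
    return chosen, sum(b.cost for b in chosen), sum(b.delay for b in chosen)
```

Because each step only swaps one bid, the search can stop at a local optimum; that is the usual trade-off of a greedy approach.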

Page 23: Mariposa: a wide-area distributed database system

Processes : Bid Protocols

The expensive bid protocol has 2 phases:

The broker sends requests, and each bidder sends back a triple (Ci, Di, Ei) for subquery Qi, giving the cost Ci, the delay Di and the expiration time Ei of the bid.

The broker notifies winners (and losers).

The purchase order protocol is faster and involves the broker sending the query to the site where it is most likely to be processed. There is a risk that the query might not be processed in the given time.
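
The two phases of the expensive bid protocol can be sketched in-process as below. This is a toy model under stated assumptions: no real networking, bidders are plain objects, and the broker picks the cheapest unexpired bid (ties broken by delay). The class and function names (`BidTriple`, `Bidder`, `run_bid_protocol`) and the selection rule are hypothetical, not Mariposa's actual interfaces.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class BidTriple:
    cost: float        # C_i
    delay: float       # D_i
    expires_at: float  # E_i


@dataclass
class Bidder:
    name: str
    price: Optional[float]      # None means this site declines to bid
    delay: float = 1.0
    bid_lifetime: float = 60.0
    log: list = field(default_factory=list)

    def request_bid(self, subquery, now):
        if self.price is None:
            return None
        return BidTriple(self.price, self.delay, now + self.bid_lifetime)

    def notify(self, subquery, won):
        self.log.append((subquery, won))


def run_bid_protocol(subquery, bidders, now, decide_at):
    """Return (winner_name, winning_bid), or (None, None) if nobody bids."""
    # Phase 1: request bids and keep those still valid at decision time.
    offers = {}
    for bidder in bidders:
        bid = bidder.request_bid(subquery, now)
        if bid is not None and bid.expires_at >= decide_at:
            offers[bidder.name] = bid

    # Assumed selection rule: cheapest bid, ties broken by shorter delay.
    winner = min(offers, key=lambda n: (offers[n].cost, offers[n].delay),
                 default=None)

    # Phase 2: tell every site with a valid bid whether it won or lost.
    for bidder in bidders:
        if bidder.name in offers:
            bidder.notify(subquery, bidder.name == winner)
    return winner, offers.get(winner)
```

The purchase order protocol would skip phase 1 entirely and send the subquery straight to one site, which is why it is cheaper but riskier.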

Page 24: Mariposa: a wide-area distributed database system

Finding Bidders

Brokers examine ‘Ad Tables’ to find the servers that are willing to perform the task at hand.

Using records in an Ad Table, the server posts its bids.

Ad tables typically hold the bidding information for the sample query structures run on that server.
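
A broker's lookup in an ad table might look like the sketch below. The record fields (`server_id`, `query_template`, `price`, `delay`, `expiration_time`) are assumptions chosen for illustration; the slide's sample table did not survive in this transcript, so they should not be read as the real schema.

```python
from typing import NamedTuple


class Ad(NamedTuple):
    server_id: str
    query_template: str      # e.g. "SELECT * FROM R1"
    price: float
    delay: float
    expiration_time: float


def find_bidders(ad_table, query_template, now):
    """Return ids of servers with an unexpired ad for the template, cheapest first."""
    live = [ad for ad in ad_table
            if ad.query_template == query_template and ad.expiration_time > now]
    live.sort(key=lambda ad: (ad.price, ad.delay))

    # Keep each server once, at the position of its best ad.
    seen, servers = set(), []
    for ad in live:
        if ad.server_id not in seen:
            seen.add(ad.server_id)
            servers.append(ad.server_id)
    return servers
```

The broker would then send bid requests (or a purchase order) only to the servers this lookup returns.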

Page 25: Mariposa: a wide-area distributed database system

Sample Ad Table design

Not all fields might be used.

Page 26: Mariposa: a wide-area distributed database system

Bidding strategies

Bulk purchase contracts allowing lower-than-normal bids (wholesale)

Coupons

Sale

Broker intelligence (remember the last successful bid history and try that site/query combination again)

Page 27: Mariposa: a wide-area distributed database system

Processes: Network Bidding

Account for network bandwidth.

Data size comes into consideration.

Minimum available bandwidth is calculated from node to node.

This bandwidth must be reserved to achieve desired performance.

Mariposa uses the Tenet protocols RTIP and RCAP for network bidding.
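
The node-to-node bandwidth calculation can be illustrated with a short sketch: the bandwidth a transfer can count on is limited by the slowest hop on its path, and the data size then gives an estimated transfer time. The function names and the link table are hypothetical, and the sketch does not model RTIP or RCAP themselves.

```python
def bottleneck_bandwidth(path, available):
    """Minimum available bandwidth (bits/s) over consecutive hops of `path`.

    path:      list of node names, e.g. ["berkeley", "sb", "sd"]
    available: dict mapping (from_node, to_node) to available bits/s
    """
    if len(path) < 2:
        raise ValueError("a path needs at least two nodes")
    return min(available[(u, v)] for u, v in zip(path, path[1:]))


def transfer_seconds(data_bytes, bandwidth_bps):
    """Estimated time to ship `data_bytes` over a link of `bandwidth_bps` bits/s."""
    return data_bytes * 8 / bandwidth_bps
```

A bidder could fold the estimated transfer time into the delay it quotes, which is where data size enters the bid.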

Page 28: Mariposa: a wide-area distributed database system

Coding (RUSH language)

Mariposa provides a low-level, very efficient embedded scripting language and rule system called Rush.

Using Rush, it is straightforward to change policy decisions; one simply modifies the rules by which these modules are implemented.

The Mariposa architecture is primarily coded in Rush.

Page 29: Mariposa: a wide-area distributed database system

SECTION 3: Mariposa experiments and results

Page 30: Mariposa: a wide-area distributed database system

Operational system

Mariposa is operational on Digital Equipment Corp. Alpha AXP workstations at UC Berkeley.

The basic server engine is that of POSTGRES.

Implementation of the Rush language itself has required careful design and performance engineering.

Requires a multithreaded network communication package.

Page 31: Mariposa: a wide-area distributed database system

Experiment setup

Workstations connected by 10 Mb/s Ethernet.

WAN experiments conducted at night.

The benchmark database consists of three tables, R1, R2 and R3.

The workload query is an equijoin of all three tables:

SELECT * FROM R1, R2, R3

WHERE R1.u1 = R2.u1

AND R2.u1 = R3.u1
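
To see what the workload query returns, it can be run on a single machine against toy data. The sketch below uses Python's built-in `sqlite3` module as a stand-in; it is not the POSTGRES-based Mariposa setup. Only the join column `u1` comes from the slides; the `payload` column, the sample rows and the added `ORDER BY` (for a deterministic result) are assumptions.

```python
import sqlite3


def run_equijoin(r1, r2, r3):
    """Load three toy tables of (u1, payload) rows and run the workload equijoin."""
    conn = sqlite3.connect(":memory:")
    try:
        for name, rows in (("R1", r1), ("R2", r2), ("R3", r3)):
            conn.execute(f"CREATE TABLE {name} (u1 INTEGER, payload TEXT)")
            conn.executemany(f"INSERT INTO {name} VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT * FROM R1, R2, R3 "
            "WHERE R1.u1 = R2.u1 AND R2.u1 = R3.u1 "
            "ORDER BY R1.u1"
        )
        return cursor.fetchall()
    finally:
        conn.close()
```

Only rows whose `u1` value appears in all three tables survive the join; in Mariposa, each of the two join steps is a candidate subquery for bidding.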

Page 32: Mariposa: a wide-area distributed database system

In the wide-area case, the query originates at Berkeley and performs the join over the WAN connecting UC Berkeley, UC Santa Barbara and UC San Diego.

Page 33: Mariposa: a wide-area distributed database system

Timing Results

Page 34: Mariposa: a wide-area distributed database system

Conclusions

Mariposa is a prototype data management system that unifies the best features of distributed operating system and distributed database management system research.

Distributed query optimization has been identified as an area that will receive a strong emphasis and we will also examine how to build a system that has a rule system at its core.

Page 35: Mariposa: a wide-area distributed database system

Conclusions

Future work remains in the areas of system robustness, distributed failure recovery, and performance assessment.

Page 36: Mariposa: a wide-area distributed database system

References

Mariposa home: http://s2k-ftp.cs.berkeley.edu:8000/mariposa/index.html

Page 37: Mariposa: a wide-area distributed database system

Thank you.

