+ All Categories
Home > Documents > MonetDB/SQL Meets SkyServer: the Challenges of a Scientific Database

MonetDB/SQL Meets SkyServer: the Challenges of a Scientific Database

Date post: 31-Dec-2015
Category:
Upload: amanda-hatfield
View: 41 times
Download: 0 times
Share this document with a friend
Description:
MonetDB/SQL Meets SkyServer: the Challenges of a Scientific Database. Milena Ivanova, Niels Nes, Romulo Goncalves, Martin Kersten CWI, Amsterdam Presented at SSDBM, July 2007, Banff, Canada. 230 million object images 1 million spectra 4TB catalog data 9TB images. - PowerPoint PPT Presentation
17
MonetDB/SQL Meets SkyServer: the Challenges of a Scientific Database Milena Ivanova, Niels Nes, Romulo Goncalves, Martin Kersten CWI, Amsterdam Presented at SSDBM, July 2007,
Transcript

MonetDB/SQL Meets SkyServer:the Challenges of a Scientific Database

Milena Ivanova, Niels Nes,

Romulo Goncalves, Martin Kersten

CWI, Amsterdam

Presented at SSDBM, July 2007, Banff, Canada

SkyServer provides public access to SDSS

for astronomers, students, and wide public

A project to make a map of a large part of the

Universe

230 million object images1 million spectra4TB catalog data9TB images

M. Ivanova et al., CWI DBDBD’07, Eindhoven

SkyServer Schema

446 columns>370 million rows

Vertical fragment of 100+ popular columns

Materialized join of Photo and Spectra

M. Ivanova et al., CWI DBDBD’07, Eindhoven

Outline

• MonetDB/SQL

• SkyServer porting lessons

• Query log lessons

• Evaluation

• Outlook

M. Ivanova et al., CWI DBDBD’07, Eindhoven

MonetDB Background

H0@0

TRa

0@0

1@0

2@0

0.0645

0.1433

0.2811

H0@0

TDec

0@0

1@0

2@0

1.2079

1.0662

1.2495

H0@0

TU

0@0

1@0

2@0

14.70872

11.71277

12.02889

Ra Dec U ...

0.0645

0.1433

0.2811

1.2079

1.0662

1.2495

14.70872

11.71277

12.02889

PhotoObjAll

Ra BAT Dec BAT U BAT

M. Ivanova et al., CWI DBDBD’07, Eindhoven

MonetDB Architecture

SQL

MonetDB Server

Tactical Optimizer

MonetDB Kernel

XQuery

MAL

MAL

function user.s3_1():void; X1:bat[:oid,:lng] := sql.bind("sys","photoobjall","objid",0); X6:bat[:oid,:lng] := sql.bind("sys","photoobjall","objid",1); X9:bat[:oid,:lng] := sql.bind("sys","photoobjall","objid",2); X13:bat[:oid,:oid] := sql.bind_dbat("sys","photoobjall",1); X8 := algebra.kunion(X1,X6); X11 := algebra.kdifference(X8,X9); X12 := algebra.kunion(X11,X9); X14 := bat.reverse(X13); X15 := algebra.kdifference(X12,X14); X16 := calc.oid(0@0); X18 := algebra.markT(X15,X16); X19 := bat.reverse(X18); X20 := aggr.count(X19); sql.exportValue(1,"sys.","count_","int",32,0,6,X20,"");end s3_1;

select count(*) from photoobjall;

M. Ivanova et al., CWI DBDBD’07, Eindhoven

SkyServer with MonetDB

Goal: To provide SkyServer mirror with similar functionality using MonetDB

Three phases: 1%, 10%, entire SDSS data set

Can we • Do better in terms of performance and

functionality?• Improve query processing by novel parallelism and

query cracking techniques?• Extend functionality to support, e.g. LOFAR?

M. Ivanova et al., CWI DBDBD’07, Eindhoven

Portability Lessons

• Need for rich SQL environment (PSM)

• Cast to SQL:2003 standard– Replacement of data types and operations– Specific extensions ignored or replaced

• Avoid data redundancy– Auxiliary tables replaced by views:10%

size reduction

M. Ivanova et al., CWI DBDBD’07, Eindhoven

Spatial Search Lesson

• HTM (Hierarchical Triangular Mesh)– Implemented in C++, C#– Good for point-near-point and point-in-

region queries

• Zones– Implemented in SQL– Good for point-near-point (x3)– Efficient for batch-oriented spatial join(x32)– Enables SQL optimizer usage

M. Ivanova et al., CWI DBDBD’07, Eindhoven

Query Log Lessons

• Query logs important for both application and science

• Analysed 1.2M queries, August 2006

• Spatial access prevails (83%)

• Small core of photo and spectro tables accessed– 64% photo, 44% spectro, 27% both

M. Ivanova et al., CWI DBDBD’07, Eindhoven

Common Patterns

• Limited number of query patterns – Correlation to web site interface

• Most popular query (25%)SELECT top 10 p.objID, p.run, p.rerun, p.camcol, p.field, p.obj, p.type, p.ra, p.dec, p.u, p.g, p.r, p.i, p.z, p.Err_u, p.Err_g, p.Err_r, p.Err_i, p.Err_z

FROM fGetNearbyObjEq(195,2.5,3) n, PhotoPrimary pWHERE n.objID = p.objID;

M. Ivanova et al., CWI DBDBD’07, Eindhoven

Query Coverage

M. Ivanova et al., CWI DBDBD’07, Eindhoven

Spatial Overlap

• 24% queries overlap• Mean sequence length of 9.4,

max of 6200• Overlap and equality patterns for script-

based interaction• Zoom in/zoom out patterns for manual

interaction

M. Ivanova et al., CWI DBDBD’07, Eindhoven

Evaluation on 1GB

M. Ivanova et al., CWI DBDBD’07, Eindhoven

Evaluation on 100GB

• ‘Color-cut’ for low-z quasarsSELECT g, run, rerun, camcol, field, objID, FROM GalaxyWHERE ( ( g <= 22) and

(u - g >= -0.27) and (u - g < 0.71) and (g - r >= -0.24) and (g - r < 0.35) and(r - i >= -0.27) and (r - i < 0.57) and(i - z >= -0.35) and (i - z < 0.7) );

• Moving asteroidsSELECT objID,

sqrt(power(rowv,2) + power(colv,2)) as velocity

FROM PhotoObjWHERE power(rowv,2) + power(colv,2) > 50

and rowv >= 0 and colv >= 0;

1135,532

405

Quasars Asteroids

Tim

e in

se

c.

MonetDB MS SQL

M. Ivanova et al., CWI DBDBD’07, Eindhoven

Status

• Staircase to the sky– 1GB: done– 100GB: towards

completion– Entire 4TB DR6: in

progress

• Web site

M. Ivanova et al., CWI DBDBD’07, Eindhoven

Inspirations

• Self-organization vs. hard-coded zoning– Adaptive segmentation (ICDE’08)– Adaptive replication (EDBT’08)

• Results caching and reuse

• Workload-driven optimization


Recommended