MonetDB/SQL Meets SkyServer: the Challenges of a Scientific Database
Milena Ivanova, Niels Nes,
Romulo Goncalves, Martin Kersten
CWI, Amsterdam
Presented at SSDBM, July 2007, Banff, Canada
SkyServer provides public access to SDSS (the Sloan Digital Sky Survey)
for astronomers, students, and the general public
SDSS: a project to map a large part of the Universe
230 million object images; 1 million spectra; 4 TB catalog data; 9 TB images
M. Ivanova et al., CWI DBDBD’07, Eindhoven
SkyServer Schema
446 columns, >370 million rows
Vertical fragment of 100+ popular columns
Materialized join of Photo and Spectra
Outline
• MonetDB/SQL
• SkyServer porting lessons
• Query log lessons
• Evaluation
• Outlook
MonetDB Background
[Figure: the PhotoObjAll table (columns Ra, Dec, U, ...) stored column-wise as one BAT per column (Ra BAT, Dec BAT, U BAT). Each BAT pairs a head of object identifiers (oids starting at 0@0) with a tail holding the column values.]
oid    Ra       Dec      U
0@0    0.0645   1.2079   14.70872
1@0    0.1433   1.0662   11.71277
2@0    0.2811   1.2495   12.02889
…      …        …        …
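A toy Python model of the column-wise layout above may help. It assumes, as the 0@0, 1@0, 2@0 labels suggest, that each BAT has a dense (void) head, so only the starting oid needs storing and the oid of the k-th tail value is seqbase + k. The class and helper names are hypothetical; this is an illustration, not MonetDB's kernel code.

```python
from dataclasses import dataclass, field


@dataclass
class BAT:
    """Toy Binary Association Table: dense oid head + tail values.

    Only the first oid (seqbase) is stored; tail[k] belongs to oid seqbase + k.
    """
    seqbase: int
    tail: list = field(default_factory=list)

    def find(self, oid):
        # Positional lookup: no search needed with a dense head.
        return self.tail[oid - self.seqbase]

    def select(self, lo, hi):
        """Return the oids whose tail value lies in [lo, hi]."""
        return [self.seqbase + k for k, v in enumerate(self.tail) if lo <= v <= hi]


# PhotoObjAll decomposed column-wise, using the values from the figure.
photo = {
    "ra": BAT(0, [0.0645, 0.1433, 0.2811]),
    "dec": BAT(0, [1.2079, 1.0662, 1.2495]),
    "u": BAT(0, [14.70872, 11.71277, 12.02889]),
}


def fetch_row(table, oid, columns):
    # Tuple reconstruction: look up the same oid in each requested BAT.
    return tuple(table[c].find(oid) for c in columns)


oids = photo["u"].select(12.0, 15.0)  # [0, 2]
rows = [fetch_row(photo, o, ["ra", "dec"]) for o in oids]
print(rows)
```

A selection touches only the U column; the Ra and Dec values are fetched afterwards for the qualifying oids alone, which is the access pattern a column store is built for.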
MonetDB Architecture
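[Figure: MonetDB Server architecture. The SQL and XQuery front-ends compile queries into MAL (MonetDB Assembly Language) programs; the Tactical Optimizer rewrites them into optimized MAL, which the MonetDB Kernel executes.]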
Example: the SQL query

    select count(*) from photoobjall;

is compiled into the following MAL program:

    function user.s3_1():void;
        # objid column: stored part (0), inserted rows (1), updated rows (2)
        X1:bat[:oid,:lng] := sql.bind("sys","photoobjall","objid",0);
        X6:bat[:oid,:lng] := sql.bind("sys","photoobjall","objid",1);
        X9:bat[:oid,:lng] := sql.bind("sys","photoobjall","objid",2);
        # oids of deleted rows
        X13:bat[:oid,:oid] := sql.bind_dbat("sys","photoobjall",1);
        X8 := algebra.kunion(X1,X6);
        X11 := algebra.kdifference(X8,X9);
        X12 := algebra.kunion(X11,X9);
        X14 := bat.reverse(X13);
        X15 := algebra.kdifference(X12,X14);
        X16 := calc.oid(0@0);
        X18 := algebra.markT(X15,X16);
        X19 := bat.reverse(X18);
        X20 := aggr.count(X19);
        sql.exportValue(1,"sys.","count_","int",32,0,6,X20,"");
    end s3_1;
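The MAL plan for select count(*) first merges the stored objid column with its pending changes and only then counts. A minimal Python sketch of that merge, assuming that the three sql.bind calls (access codes 0, 1, 2) return the stored, inserted, and updated parts and that sql.bind_dbat returns the deleted oids; plain dicts keyed by oid stand in for BATs, and the function names are hypothetical:

```python
def apply_deltas(base, inserts, updates, deleted):
    """Merge a column's stored part with its pending changes, keyed by oid.

    base, inserts, updates: dict oid -> objid value; deleted: set of oids.
    """
    # kunion(X1, X6): stored rows plus newly inserted ones
    merged = {**base, **inserts}
    # kdifference(X8, X9), then kunion(X11, X9): swap in the updated rows
    merged = {oid: v for oid, v in merged.items() if oid not in updates}
    merged.update(updates)
    # kdifference(X12, reverse(X13)): drop deleted rows
    return {oid: v for oid, v in merged.items() if oid not in deleted}


def count_star(base, inserts, updates, deleted):
    # markT/reverse only renumber the result; aggr.count then counts it
    return len(apply_deltas(base, inserts, updates, deleted))


print(count_star({0: 10, 1: 11, 2: 12}, {3: 13}, {1: 111}, {2}))
```

Keeping inserts, updates, and deletions in separate BATs leaves the large stored column untouched until the changes are merged at query time.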
SkyServer with MonetDB
Goal: to provide a SkyServer mirror with similar functionality using MonetDB
Three phases: 1%, 10%, and the entire SDSS data set
Can we:
• do better in terms of performance and functionality?
• improve query processing with novel parallelism and query cracking techniques?
• extend functionality to support, e.g., LOFAR?
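Query (database) cracking reorganizes a column incrementally, as a side effect of the range queries that touch it, so later queries scan smaller pieces. A minimal, generic sketch of the idea in Python; the class and its methods are hypothetical and this is not MonetDB's implementation:

```python
import bisect


class CrackedColumn:
    """Toy cracked column: a copy of the values plus a sorted pivot index."""

    def __init__(self, values):
        self.vals = list(values)
        self.pivots = []     # sorted pivot values already cracked on
        self.positions = []  # positions[k]: first index holding a value >= pivots[k]

    def _crack(self, pivot):
        """Split the piece that contains pivot; return the split position."""
        k = bisect.bisect_left(self.pivots, pivot)
        if k < len(self.pivots) and self.pivots[k] == pivot:
            return self.positions[k]  # already cracked on this value
        lo = self.positions[k - 1] if k > 0 else 0
        hi = self.positions[k] if k < len(self.positions) else len(self.vals)
        # Two-pointer partition of vals[lo:hi]: values < pivot move left.
        i, j = lo, hi - 1
        while i <= j:
            if self.vals[i] < pivot:
                i += 1
            else:
                self.vals[i], self.vals[j] = self.vals[j], self.vals[i]
                j -= 1
        self.pivots.insert(k, pivot)
        self.positions.insert(k, i)
        return i

    def range_query(self, low, high):
        """Return the values v with low <= v < high (requires low <= high)."""
        start = self._crack(low)
        end = self._crack(high)
        return self.vals[start:end]


col = CrackedColumn([7, 2, 9, 4, 1, 8, 3])
print(sorted(col.range_query(3, 8)))  # [3, 4, 7]
```

Each query partitions only the piece its bounds fall into, so the column drifts towards sorted order exactly where the workload looks.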
Portability Lessons
• Need for a rich SQL environment (SQL/PSM, Persistent Stored Modules)
• Cast to the SQL:2003 standard
  – Replacement of data types and operations
  – Vendor-specific extensions ignored or replaced
• Avoid data redundancy
  – Auxiliary tables replaced by views: 10% size reduction
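The difference between an auxiliary table and a view can be shown with SQLite from Python. The table names PrimaryCopy and PrimaryView and the mode = 1 filter are hypothetical; the point is only that the view stores no duplicate rows, whereas the copy does and must be maintained separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PhotoObjAll (objID INTEGER PRIMARY KEY, mode INTEGER, ra REAL, dec REAL)"
)
conn.executemany(
    "INSERT INTO PhotoObjAll VALUES (?, ?, ?, ?)",
    [(1, 1, 195.0, 2.5), (2, 2, 195.1, 2.5), (3, 1, 195.2, 2.6)],
)

# Redundant variant: an auxiliary table holding a physical copy of a subset.
conn.execute("CREATE TABLE PrimaryCopy AS SELECT * FROM PhotoObjAll WHERE mode = 1")
# View variant: the same subset, defined once, with no extra storage.
conn.execute("CREATE VIEW PrimaryView AS SELECT * FROM PhotoObjAll WHERE mode = 1")

# A later load into the base table reaches the view but not the copy.
conn.execute("INSERT INTO PhotoObjAll VALUES (4, 1, 195.3, 2.7)")
copy_ids = [r[0] for r in conn.execute("SELECT objID FROM PrimaryCopy ORDER BY objID")]
view_ids = [r[0] for r in conn.execute("SELECT objID FROM PrimaryView ORDER BY objID")]
print(copy_ids, view_ids)
```

Besides saving space, the view cannot drift out of sync with its base table.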
Spatial Search Lesson
• HTM (Hierarchical Triangular Mesh)
  – Implemented in C++, C#
  – Good for point-near-point and point-in-region queries
• Zones
  – Implemented in SQL
  – Good for point-near-point (x3)
  – Efficient for batch-oriented spatial joins (x32)
  – Lets the SQL optimizer do the planning
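The Zones idea can be sketched in plain Python for brevity; in the port it is expressed in SQL, which is what exposes it to the optimizer. Assumptions: declination stripes of a fixed, hypothetical height ZONE_HEIGHT, and a search circle that crosses neither RA = 0/360 nor a pole. Each zone is kept sorted by RA, so a cone search becomes a few RA range scans followed by an exact distance test.

```python
import bisect
import math
from collections import defaultdict

ZONE_HEIGHT = 0.5  # degrees; hypothetical stripe height


def zone_id(dec, h=ZONE_HEIGHT):
    """Index of the declination stripe that contains dec (degrees)."""
    return int(math.floor((dec + 90.0) / h))


def ang_dist_deg(ra1, dec1, ra2, dec2):
    """Great-circle distance in degrees (haversine formula)."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h))))


def build_zones(objects, h=ZONE_HEIGHT):
    """Group (obj_id, ra, dec) triples by zone, each zone sorted by ra."""
    zones = defaultdict(list)
    for obj_id, ra, dec in objects:
        zones[zone_id(dec, h)].append((ra, obj_id, dec))
    for members in zones.values():
        members.sort()
    return zones


def nearby(zones, ra, dec, r, h=ZONE_HEIGHT):
    """Sorted ids of the objects within r degrees of (ra, dec)."""
    # Conservative RA half-width: widen by 1/cos at the circle's
    # declination farthest from the equator.
    alpha = r / math.cos(math.radians(min(abs(dec) + r, 89.999)))
    hits = []
    for z in range(zone_id(dec - r, h), zone_id(dec + r, h) + 1):
        members = zones.get(z, [])
        # RA range scan inside the zone, then the exact distance test.
        start = bisect.bisect_left(members, (ra - alpha,))
        for m_ra, m_id, m_dec in members[start:]:
            if m_ra > ra + alpha:
                break
            if ang_dist_deg(ra, dec, m_ra, m_dec) <= r:
                hits.append(m_id)
    return sorted(hits)


objects = [(1, 10.0, 0.0), (2, 10.05, 0.02), (3, 11.0, 0.0), (4, 10.0, 0.4)]
print(nearby(build_zones(objects), 10.0, 0.0, 0.1))  # [1, 2]
```

The same zone and RA predicates, used as a join condition between two tables, give the batch-oriented spatial join mentioned above.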
Query Log Lessons
• Query logs are important for both the application and the science
• Analysed 1.2M queries, August 2006
• Spatial access prevails (83%)
• A small core of photo and spectro tables is accessed
  – 64% photo, 44% spectro, 27% both
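The three table-access figures overlap, so the share of queries that touch at least one of the two core table groups follows from inclusion-exclusion, assuming all three percentages refer to the same 1.2M-query log (rounded values from the slide):

```latex
P(\text{photo} \cup \text{spectro})
  = P(\text{photo}) + P(\text{spectro}) - P(\text{photo} \cap \text{spectro})
  = 0.64 + 0.44 - 0.27 = 0.81
```

Roughly four out of five logged queries therefore hit the photo/spectro core.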
Common Patterns
• Limited number of query patterns
  – Correlated with the web site interface
• Most popular query (25%):
    SELECT top 10 p.objID, p.run, p.rerun, p.camcol, p.field, p.obj,
           p.type, p.ra, p.dec, p.u, p.g, p.r, p.i, p.z,
           p.Err_u, p.Err_g, p.Err_r, p.Err_i, p.Err_z
    FROM fGetNearbyObjEq(195,2.5,3) n, PhotoPrimary p
    WHERE n.objID = p.objID;
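To make the shape of the most popular query concrete, here is a brute-force stand-in using SQLite from Python. Assumptions: the third argument of fGetNearbyObjEq is a search radius in arcminutes (as in the SkyServer documentation); the hypothetical helper ang_dist_arcmin and a CTE named n replace the HTM-based table function; LIMIT 10 plays the role of T-SQL's TOP 10; and an ORDER BY on distance is added so the result is deterministic. The table rows are made up.

```python
import math
import sqlite3


def ang_dist_arcmin(ra1, dec1, ra2, dec2):
    """Great-circle distance in arcminutes (haversine formula)."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h)))) * 60.0


conn = sqlite3.connect(":memory:")
conn.create_function("ang_dist_arcmin", 4, ang_dist_arcmin)
conn.execute("CREATE TABLE PhotoPrimary (objID INTEGER PRIMARY KEY, ra REAL, dec REAL)")
conn.executemany(
    "INSERT INTO PhotoPrimary VALUES (?, ?, ?)",
    [(1, 195.0, 2.5), (2, 195.02, 2.51), (3, 195.5, 2.5), (4, 196.0, 2.5)],
)

QUERY = """
WITH n AS (  -- brute-force stand-in for fGetNearbyObjEq(ra, dec, r)
    SELECT objID, ang_dist_arcmin(ra, dec, ?, ?) AS distance
    FROM PhotoPrimary
    WHERE ang_dist_arcmin(ra, dec, ?, ?) < ?
)
SELECT p.objID, p.ra, p.dec
FROM n, PhotoPrimary p
WHERE n.objID = p.objID
ORDER BY n.distance
LIMIT 10
"""
ra, dec, radius_arcmin = 195, 2.5, 3
rows = conn.execute(QUERY, (ra, dec, ra, dec, radius_arcmin)).fetchall()
print([r[0] for r in rows])  # [1, 2]
```

Because a quarter of the workload has this one shape, a fast cone search joined to PhotoPrimary pays off disproportionately.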
Spatial Overlap
• 24% of queries overlap spatially
• Mean sequence length of 9.4, max of 6200
• Overlap and equality patterns for script-based interaction
• Zoom-in/zoom-out patterns for manual interaction
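One hypothetical way to label consecutive cone searches with the patterns listed above (equality, overlap, zoom-in, zoom-out) and to measure overlapping sequences is sketched below. Cones are assumed to be (ra, dec, radius) triples in degrees; this is an illustration, not the authors' analysis procedure.

```python
import math


def ang_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h))))


def classify(prev, cur, eps=1e-9):
    """Relate two cone queries given as (ra, dec, radius) in degrees."""
    d = ang_sep_deg(prev[0], prev[1], cur[0], cur[1])
    r_prev, r_cur = prev[2], cur[2]
    if d <= eps and abs(r_prev - r_cur) <= eps:
        return "equal"
    if d + r_cur <= r_prev + eps:   # new cone lies inside the old one
        return "zoom-in"
    if d + r_prev <= r_cur + eps:   # old cone lies inside the new one
        return "zoom-out"
    if d < r_prev + r_cur:          # partial overlap
        return "overlap"
    return "disjoint"


def overlap_runs(cones):
    """Lengths of maximal runs of queries that each touch their predecessor."""
    runs, length = [], 1
    for prev, cur in zip(cones, cones[1:]):
        if classify(prev, cur) == "disjoint":
            runs.append(length)
            length = 1
        else:
            length += 1
    if cones:
        runs.append(length)
    return runs


session = [(10, 0, 1), (10, 0, 1), (10, 0, 0.5), (10.5, 0, 0.5), (50, 0, 0.1), (50, 0, 0.2)]
print([classify(a, b) for a, b in zip(session, session[1:])])
print(overlap_runs(session))  # [4, 2]
```

Averaging the run lengths over a log gives a mean sequence length of the kind reported on the slide.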
Evaluation on 100GB
• ‘Color-cut’ for low-z quasars:
    SELECT g, run, rerun, camcol, field, objID
    FROM Galaxy
    WHERE ( (g <= 22) and
            (u - g >= -0.27) and (u - g < 0.71) and
            (g - r >= -0.24) and (g - r < 0.35) and
            (r - i >= -0.27) and (r - i < 0.57) and
            (i - z >= -0.35) and (i - z < 0.7) );
• Moving asteroids:
    SELECT objID,
           sqrt(power(rowv,2) + power(colv,2)) as velocity
    FROM PhotoObj
    WHERE power(rowv,2) + power(colv,2) > 50
      and rowv >= 0 and colv >= 0;
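Both benchmark queries can be tried on toy data. In the sketch below SQLite stands in for MonetDB and MS SQL; the Galaxy and PhotoObj tables are hypothetical stand-ins for the SkyServer views with made-up rows; and sqrt/power are registered from Python because not every SQLite build ships those math functions.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Not every SQLite build has sqrt/power, so provide them from Python.
conn.create_function("sqrt", 1, math.sqrt)
conn.create_function("power", 2, math.pow)

# Hypothetical toy stand-ins for the SkyServer Galaxy and PhotoObj views.
conn.execute("CREATE TABLE Galaxy (objID INTEGER, run INTEGER, rerun INTEGER, camcol INTEGER,"
             " field INTEGER, u REAL, g REAL, r REAL, i REAL, z REAL)")
conn.executemany("INSERT INTO Galaxy VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", [
    (1, 1, 1, 1, 1, 20.3, 20.0, 19.9, 19.8, 19.7),  # passes every cut
    (2, 1, 1, 1, 2, 23.3, 23.0, 22.9, 22.8, 22.7),  # too faint: g > 22
    (3, 1, 1, 1, 3, 21.0, 20.0, 19.9, 19.8, 19.7),  # u - g too large
])
conn.execute("CREATE TABLE PhotoObj (objID INTEGER, rowv REAL, colv REAL)")
conn.executemany("INSERT INTO PhotoObj VALUES (?, ?, ?)", [
    (10, 6.0, 5.0),   # 61 > 50, both components >= 0
    (11, 3.0, 4.0),   # 25: too slow
    (12, -8.0, 1.0),  # fast enough, but rowv < 0
])

COLOR_CUT = """
SELECT g, run, rerun, camcol, field, objID
FROM Galaxy
WHERE ((g <= 22) and
       (u - g >= -0.27) and (u - g < 0.71) and
       (g - r >= -0.24) and (g - r < 0.35) and
       (r - i >= -0.27) and (r - i < 0.57) and
       (i - z >= -0.35) and (i - z < 0.7))
"""
ASTEROIDS = """
SELECT objID, sqrt(power(rowv,2) + power(colv,2)) as velocity
FROM PhotoObj
WHERE power(rowv,2) + power(colv,2) > 50
  and rowv >= 0 and colv >= 0
"""
quasar_ids = [row[-1] for row in conn.execute(COLOR_CUT)]
asteroids = conn.execute(ASTEROIDS).fetchall()
print(quasar_ids)  # [1]
print(asteroids)
```

Both queries are pure scans with arithmetic predicates over a handful of columns, the workload where a column store reads only what it needs.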
[Chart: execution time (sec.) of the Quasars and Asteroids queries, MonetDB vs. MS SQL; bar labels 1135, 532 and 405]
Status
• Staircase to the sky
  – 1 GB: done
  – 100 GB: nearing completion
  – Entire 4 TB DR6 (SDSS Data Release 6): in progress
• Web site