
Towards Benchmarking Large Arrays in Databases

H. Stamerjohanns, P. Baumann

Computer Science, Jacobs University Bremen

WBDB12

H. Stamerjohanns, P. Baumann (Jacobs University Bremen)Benchmarking Large Arrays in Databases WBDB12 1 / 23


An Array DBMS: Rasdaman

Goal of rasdaman database:
- handle raster data
- massive n-dimensional
- Sensor-, Image-, Model & Statistics DB [1]

Tile-based architecture
- n-D array → set of n-D tiles
- adapting storage to access pattern (preserve locality of reference) [1]

[Figure: timeline, 1984 to 2012, labelled with the systems Grid DataBlade, Rasdaman, TerraLib, PostGIS Raster, Oracle GeoRaster, ESRI ArcSDE, SciQL, SciDB, Paradise, PICDMS, SpatiaLite, Grid & Gridfield, AQuery, RAM, AML, AQL, EXTRA/EXCESS, OpenTSD, ExtaScid]

[1] Baumann 1992; Baumann, VLDBJ 1994

An Array DBMS: Rasdaman

declarative, minimal, safe Array Algebra:
- intensive user studies: statistics, image, signal processing

minimally invasive DBMS integration
- new attribute type: array<celltype,extent>
- maps d-dimensional Euclidean hypercube X onto value set V

Array is function a : X → V
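
As a minimal sketch of this model (not rasdaman's implementation), an array can be written as a function from a rectangular domain X to a value set V. The class name `FunctionArray` and its methods are hypothetical and chosen only for illustration.

```python
from itertools import product


class FunctionArray:
    """An array modelled as a function a : X -> V over a d-dimensional box X."""

    def __init__(self, extent, cell_fn):
        # extent: one inclusive (lo, hi) integer pair per dimension; this is X
        self.extent = tuple(extent)
        self.cell_fn = cell_fn  # maps a coordinate tuple to a value in V

    def __call__(self, *coord):
        # Reject coordinates outside X: the function is only defined on X
        if len(coord) != len(self.extent):
            raise IndexError("wrong number of dimensions")
        for c, (lo, hi) in zip(coord, self.extent):
            if not lo <= c <= hi:
                raise IndexError(f"{coord} is outside the domain")
        return self.cell_fn(coord)

    def domain(self):
        # Enumerate every point of X, last dimension varying fastest
        return product(*(range(lo, hi + 1) for lo, hi in self.extent))


# A 2-D array over [0:2, 0:3] whose cell value is x * 10 + y
a = FunctionArray([(0, 2), (0, 3)], lambda p: p[0] * 10 + p[1])
cells = {p: a(*p) for p in a.domain()}
```

Here `cells` materializes the function as a lookup table, which is roughly what a stored array is: the graph of a over X.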


An Array DBMS: Rasdaman

implements SQL-embedded DML with array operators
- select / insert / update / delete + partial update

select img.scene.green[x0:x1,y0:y1] > 130
from LandsatArchive as img
where some_cells(img.scene.nir > 127)
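
To make the meaning of this query concrete, the following plain-Python sketch evaluates it over nested lists. It is an illustration only: the helper functions are stand-ins written for this example rather than the rasdaman API, inclusive interval bounds are assumed, and the two tiny scenes are made up.

```python
def trim(band, x0, x1, y0, y1):
    # Keep cells with x0 <= x <= x1 and y0 <= y <= y1 (inclusive bounds assumed)
    return [row[y0:y1 + 1] for row in band[x0:x1 + 1]]


def greater_than(band, threshold):
    # Cell-wise comparison: the result is a boolean array of the same shape
    return [[v > threshold for v in row] for row in band]


def some_cells(mask):
    # True if at least one cell of the boolean array is true
    return any(any(row) for row in mask)


def run_query(archive, x0, x1, y0, y1):
    results = []
    for img in archive:                                # from LandsatArchive as img
        if some_cells(greater_than(img["nir"], 127)):  # where some_cells(...)
            green = trim(img["green"], x0, x1, y0, y1)
            results.append(greater_than(green, 130))   # select ... > 130
    return results


# Made-up archive: only the first scene has an NIR cell above 127
archive = [
    {"green": [[120, 140], [135, 100]], "nir": [[90, 200], [10, 20]]},
    {"green": [[200, 200], [200, 200]], "nir": [[1, 2], [3, 4]]},
]
result = run_query(archive, 0, 1, 0, 0)
```

The result contains one boolean array per qualifying scene, so the query returns arrays rather than scalar rows.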

Web mapping, image & signal processing, statistics, linear algebra, pattern mining, scientific analytics


What is Big Data?

- somehow connected to volume
- but volume is a moving target
- not only petabytes are Big Data


What is Big Data?

- unless you are reeeaally big, storage volume is not the biggest problem
- doing proper analysis is then the difficulty
- suboptimal access patterns show up → inability of existing DBs to scale

- cardinality of data is typically small compared to volume
- repeated observations of time or space
- many datasets have inherent temporal or spatial dimensions
- but they are not ordered accordingly to preserve locality
- analysis then results in random-access patterns → sloow.
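
The locality argument can be sketched with a small count of storage blocks touched. The example assumes a hypothetical x/y/t cube of 64 x 64 x 64 cells and a block size of 4096 cells; it compares image-by-image (row-major) storage with cubic tiles when reading the time series of a single pixel. All sizes and function names are assumptions made for illustration.

```python
def blocks_row_major(shape, block_cells, points):
    # Linearize (t, y, x) with x varying fastest, then map each offset to a block
    nt, ny, nx = shape
    return {((t * ny + y) * nx + x) // block_cells for t, y, x in points}


def blocks_tiled(tile, points):
    # Each tile is stored contiguously, so a tile index identifies one block
    tt, ty, tx = tile
    return {(t // tt, y // ty, x // tx) for t, y, x in points}


shape = (64, 64, 64)                       # 64 time steps of 64 x 64 images
series = [(t, 10, 20) for t in range(64)]  # time series of one pixel

row_major = blocks_row_major(shape, 4096, series)  # one block holds one image
tiled = blocks_tiled((16, 16, 16), series)         # 16^3 = 4096 cells per tile
```

With these numbers the row-major layout touches one block per time step, while the tiled layout touches only the tiles the series passes through, although both layouts use blocks of the same size.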


What is Big Data?

- ETL may not be the right solution...
- big volumes need to be transferred for further processing

Meta-definition:
"Any point in time when data volume forces us to look beyond the tried-and-true methods that are prevalent at that time" [2]

[2] A. Jacobs 2009


Array database domain

Diverse world
- different approaches to implement arrays on databases exist
  - MonetDB [3]
  - SciDB [4]
- no unified query language available
- different usage scenarios
  - (web-)service providing access to many users
  - but also personal research tool to analyse data

[3] van Ballegooij et al., 2005, www.monetdb.org
[4] P. Cudre-Mauroux et al., 2009, www.scidb.org


Benchmarking Array DBMS

Benchmarks should be... [Gray 1993]
- relevant
  → map real-world needs
  → rather practice driven
  - systematically cover features and data properties
    → apply to different application domains
- simple
  - obviously some trade-off with the previous point is needed
- portable
  - as no unified query language is available
    → high-level description of tasks to fulfill
- scalable


Benchmarking Array DBMS

Need to test (further details follow...)
- array features
  - dimensionality, cell types
- data properties
  - volume, sparsity
- array query operations
- domain specific features
  - special operations, transformations


What needs to be tested... relevance

number of dimensions
- low-dimensional (1-D to 5-D)
  - 1-D: environmental sensor time series
  - 2-D: satellite images, seafloor maps
  - 3-D: x/y/t image time series and x/y/z geophysics data
  - 4-D: x/y/z/t climate and ocean data
- medium-dimensional (6-D to 12-D)
  - OLAP
- high-dimensional (up to thousands)
  - Data-Mining, collection of features


[Figure: precipitation, x/y/z/t]


What needs to be tested... relevance

Space time cube
- satellite creates several scenes
- satellite scene referenced by latitude/longitude + time
- at least twice per year each point should be mapped
- set of scenes that have temporal and spatial overlap

Example query:
give me the near-infrared (NIR) values between 2007 and 2009 in Vienna
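
The example query amounts to a combined spatial and temporal overlap test on the scenes. The sketch below is one possible reading, under loud assumptions: the `Scene` record, the approximate bounding box for Vienna, and the single NIR value per scene are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float
    year: int
    nir: float  # stand-in for the scene's NIR band; one value keeps this short


def overlaps(scene, lat_min, lat_max, lon_min, lon_max):
    # Two boxes overlap unless one lies entirely to one side of the other
    return not (scene.lat_max < lat_min or scene.lat_min > lat_max or
                scene.lon_max < lon_min or scene.lon_min > lon_max)


def nir_between(scenes, bbox, first_year, last_year):
    # Keep scenes that match both the time range and the region
    return [s.nir for s in scenes
            if first_year <= s.year <= last_year and overlaps(s, *bbox)]


# Rough bounding box around Vienna (approximate, for illustration only)
VIENNA = (48.1, 48.3, 16.2, 16.6)

scenes = [
    Scene(47.5, 48.5, 16.0, 17.0, 2006, 0.31),  # right place, too early
    Scene(47.5, 48.5, 16.0, 17.0, 2008, 0.42),  # matches place and time
    Scene(52.0, 53.0, 13.0, 14.0, 2008, 0.55),  # right time, wrong place
]
hits = nir_between(scenes, VIENNA, 2007, 2009)
```

A real system would additionally trim each matching scene to the requested region instead of returning a whole-scene value.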


What needs to be tested...

Dimensions and cell type constitute array model features

cell types
- single
- records (e.g. colored pixel)
- domain specific data structures


What needs to be tested... scalability

Data properties
- Volume of data
  - range MB to PB
- Sparsity of data
  - sparse arrays like statistical data cubes
  - dense arrays like satellite imagery
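
A benchmark that varies sparsity needs test data with a controllable fill ratio. The following sketch of such a generator is purely hypothetical: the function names, the dictionary representation, and the chosen densities are assumptions, and it is not the generator of any existing benchmark.

```python
import random


def generate_cells(shape, density, seed=0):
    """Return a 2-D array as {(x, y): value}, filled to roughly `density`."""
    rng = random.Random(seed)  # fixed seed keeps benchmark runs repeatable
    nx, ny = shape
    return {(x, y): rng.random()
            for x in range(nx) for y in range(ny)
            if rng.random() < density}


def density(cells, shape):
    # Fraction of cells that actually carry a value
    return len(cells) / (shape[0] * shape[1])


shape = (200, 200)
sparse = generate_cells(shape, 0.01)  # e.g. a statistical data cube
dense = generate_cells(shape, 1.0)    # e.g. satellite imagery
```

Scaling `shape` covers the volume axis, and scaling `density` covers the sparsity axis, so the two data properties can be varied independently.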


Relevance in array database domain

Array is function a : X → V

Query operations
- on X: trimming, slicing
- on V: pixel-wise addition of images
- on the function itself: histogram
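
The three classes of operations can be illustrated on small 2-D arrays held as nested lists. This is a sketch with hypothetical helper names, not the operator set of any particular array DBMS.

```python
from collections import Counter


def trim(img, x0, x1, y0, y1):
    # Operation on X: restrict the domain, dimensionality is unchanged
    return [row[y0:y1 + 1] for row in img[x0:x1 + 1]]


def slice_x(img, x):
    # Operation on X: fix one coordinate, dimensionality drops by one
    return list(img[x])


def add(img_a, img_b):
    # Operation on V: combine the values of cells at the same position
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(img_a, img_b)]


def histogram(img):
    # Operation on the function itself: count how often each value occurs
    return Counter(v for row in img for v in row)


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b = [[1, 1, 1], [0, 0, 0], [1, 1, 1]]

trimmed = trim(a, 0, 1, 1, 2)
row = slice_x(a, 1)
total = add(a, b)
counts = histogram(b)
```

Trimming and slicing never look at cell values, pixel-wise addition never changes the domain, and the histogram discards positions entirely, which is why a benchmark should exercise each class separately.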


Relevance in array database domain

Array is function a : X → V

Query operations
- de-arraying functions: aggregations
- querying irregular time axis (most rain in June in recent years)
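
The "most rain in June" example combines an aggregation with a selection along a calendar-based time axis. A minimal sketch follows, assuming a 1-D daily rainfall series whose index 0 corresponds to a known start date; the data and the function names are made up.

```python
from datetime import date, timedelta


def june_totals(daily_rain, start):
    """Sum a 1-D daily series (index 0 = `start`) over June, per year."""
    totals = {}
    for offset, mm in enumerate(daily_rain):
        day = start + timedelta(days=offset)
        if day.month == 6:
            totals[day.year] = totals.get(day.year, 0.0) + mm
    return totals


def made_up_rain(day):
    # Invented data: 1 mm every day, except 3 mm per day in June 2011
    return 3.0 if (day.year, day.month) == (2011, 6) else 1.0


start = date(2010, 1, 1)
n_days = (date(2013, 1, 1) - start).days
rain = [made_up_rain(start + timedelta(days=i)) for i in range(n_days)]

totals = june_totals(rain, start)
wettest_june = max(totals, key=totals.get)
```

The aggregation reduces the array to one scalar per year, which is the sense in which such functions "de-array" their input.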


Relevance in array database domain

Array is function a : X → V

Irregular time axis
- calendar is highly irregular, month lengths differ, leap years
- but need to analyse by month, season
  → create additional dimensions
- has effect on tiling strategies
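
One way to read "create additional dimensions" is to replace the linear day axis by (year, month, day) coordinates. The sketch below uses Python's standard `calendar` and `datetime` modules; the function names are hypothetical, and the mapping is only one of several possible designs.

```python
import calendar
from datetime import date, timedelta


def to_calendar_coords(start, offset):
    # Map a position on the linear day axis to (year, month, day) coordinates
    d = start + timedelta(days=offset)
    return (d.year, d.month, d.day)


def month_extent(year, month):
    # Number of valid day coordinates in this month; varies from 28 to 31
    return calendar.monthrange(year, month)[1]


start = date(2012, 1, 1)
coords = to_calendar_coords(start, 59)  # 60th day of a leap year
extents = [month_extent(2012, m) for m in range(1, 13)]
```

Because the extent of the day dimension differs from month to month, the resulting array is ragged along that axis, which is the reason the choice affects tiling strategies.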


Ease of use in array database domain

Array is function a : X → V

Query operation support
- natively supported?
- via User Defined Functions (UDF)?
  - expertise needed
  - additional costs involved

... how to implement in benchmark?


Suitability cube

- Combination of assessments can be called a suitability cube
- addresses challenges from all relevant sides
- developers want to address all possibilities
- users want one single number...

Does modern technology help?

(modified image from qrarts.com)


Existing array DB benchmarks

Early attempts: Sequoia 2000 [5], Paradise [6]

Standard Science DBMS Benchmark (SS-DB) [7]
- applies space-science use case
- relevant, performs nine queries on astronomical data
  - load data
  - queries raw data
  - creates derived data (cooking)
  - queries derived data
- portable, source code available (but difficult to find...)
  → repeatable
- scalable, covers small to big data volumes, data generator

[5] Stonebraker 1993
[6] Patel et al. 1997
[7] Cudre-Mauroux et al. 2010


Existing array DB benchmarks, SS-DB

However...
- only single-user queries
- selection of queries seems rather limited
- does not address higher dimensions, such as 4-D, 5-D
  → does not fully cover other application domains, such as geophysics, climate and ocean data
- only regular time axis

Trade-off between simplicity and functional coverage
- ease of use, no analysis of array queries used
  - natively supported?
  - user defined functions
- result is not a single number...


Conclusion

- arrays inherent in Big Data
- benchmarks for big data should consider array operations as well
- suitability cube tries to address many metrics
- SS-DB good basis for discussion

benchmarks will make us work harder...
