Projection Indexes in HDF5

Date post: 01-Jan-2016
Page 1: Projection Indexes in HDF5

Projection Indexes in HDF5

Rishi Rakesh Sinha

The HDF Group

Page 2: Projection Indexes in HDF5

Science Produces Large Datasets

Observation/experiment driven

Simulation driven

Information driven

144 MB/hr

200 GB/run

> 7GB/expt

Page 3: Projection Indexes in HDF5

Why Not Commercial DBMSs?

Proprietary format
Lack of portability
Low scalability
Lack of desirable access modes
Presence of expensive concurrency control and logging mechanisms
Expensive parallel versions

Page 4: Projection Indexes in HDF5

State of the Art Not Enough

Scientific file formats and associated I/O APIs
Concentrating on HDF5
Data recovery is navigational
Subsetting only on a small set of attributes

Page 5: Projection Indexes in HDF5

Why Indexes?

Easy

Not So Easy

Page 6: Projection Indexes in HDF5

Previous Indexing Efforts

Implicit indexing in HDF5
JPL use of HDF Vdatas
HDF-EOS point data
PyTables
HDF5 internal B-Tree structures

Page 7: Projection Indexes in HDF5

Why a Standard Indexing API?

Avoid duplication of effort
PyTables
Standardize indexing in HDF5
A standard API can be implemented in different ways
Make indexes portable
Store indexes in HDF5 files

Page 8: Projection Indexes in HDF5

H5IN API

Create_index
  Parameters: location of index, location of data, binning information, memory limits
  Returns: location of the index
Query
  Parameters: dataset to query, query string
  Returns: selection representing the subset of the data corresponding to the query
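
The two calls above can be illustrated with a minimal Python sketch. This is not the H5IN prototype: the function names, the dictionary standing in for an HDF5 file, the paths, and the "low < value < high" query string are all assumptions made for illustration. Binning information and memory limits are left out, and the query takes the index location directly rather than the dataset.

```python
import bisect

def create_index(store, index_path, data_path):
    """Build an index for store[data_path], save it at store[index_path],
    and return the location of the index (hypothetical helper)."""
    values = store[data_path]
    # Sort (value, position) pairs so range lookups can use binary search.
    store[index_path] = sorted((v, i) for i, v in enumerate(values))
    return index_path

def query(store, index_path, query_string):
    """Answer a query of the assumed form 'low < value < high' and return
    the matching positions, i.e. a selection."""
    low_text, _, high_text = query_string.replace("<", " ").split()
    low, high = float(low_text), float(high_text)
    index = store[index_path]
    keys = [v for v, _ in index]
    start = bisect.bisect_right(keys, low)   # first key strictly above low
    stop = bisect.bisect_left(keys, high)    # first key at or above high
    return sorted(pos for _, pos in index[start:stop])

# A dictionary stands in for the file; paths are illustrative only.
store = {"/DAY3/Temperature": [52, 42, 57, 22, 67]}
index_location = create_index(store, "/DAY3/T_IN", "/DAY3/Temperature")
selection = query(store, index_location, "40 < value < 60")
```

The index is stored next to the data under its own name, and the query result is a list of positions rather than a copy of the values.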

Page 9: Projection Indexes in HDF5

Design Decisions

Limited scope of the prototype
Index stored in a separate dataset
Returns a selection
Projection index
Support for simple boolean queries

Page 10: Projection Indexes in HDF5

Limited Scope

1st indexing prototype in HDF5
Presence of implicit indexing
Index on single datasets
Query over single datasets
Conditions should be over a single dataset
Result could be mapped to a separate dataset

Page 11: Projection Indexes in HDF5

Index Storage

[Figure: file hierarchy diagram. Labels: Root Group: /; DAY1, DAY2, DAY3, DAY4; Location Data, Pressure, Temperature; F1, F2, F3]

Page 12: Projection Indexes in HDF5

Index Storage

[Figure: file hierarchy diagram. Labels: Root Group: /; DAY3; Location Data; F1, F2, F3; LD_INDEX; F1, F2]

Page 13: Projection Indexes in HDF5

Index Storage

[Figure: file hierarchy diagram. Labels: Root Group: /; DAY3; Temperature, Pressure; T_IN, P_IN]

Page 14: Projection Indexes in HDF5

Returns a Selection

Temperature Pressure

Concise storage
Efficient Boolean operations

FIND PRESSURE WHERE TEMP IN [100, 200]
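
The query above can be sketched in Python to show why returning a selection is useful: the selection is computed on Temperature and then applied to Pressure. The helper names and the sample values are made up for illustration, and plain lists stand in for HDF5 datasets.

```python
def select_in_range(values, low, high):
    """Return the set of positions whose value lies in the closed range [low, high]."""
    return {i for i, v in enumerate(values) if low <= v <= high}

def read_selection(values, selection):
    """Read only the selected positions, in position order."""
    return [values[i] for i in sorted(selection)]

# Made-up sample data; both datasets share the same positions.
temperature = [90, 150, 210, 120, 200]
pressure = [30, 31, 29, 33, 28]

# FIND PRESSURE WHERE TEMP IN [100, 200]
selection = select_in_range(temperature, 100, 200)
result = read_selection(pressure, selection)
```

Because the selection holds positions only, it stays small and can be combined with other selections using ordinary set operations before any data is read.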

Page 15: Projection Indexes in HDF5

Projection Index

Temp  Category  Pressure
52    A         32
42    D         34
57    F         21
22    A         22
67    D         27

Category: A, D, F, A, F, D
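
A projection index, in the usual sense of the term, is a copy of one column's values kept in row order, so a query on that column scans the copy instead of the full records. The sketch below builds one on Category from the five table rows above. The function names are hypothetical, and a list of tuples stands in for the stored table.

```python
# The five rows of the table: (Temp, Category, Pressure).
rows = [
    (52, "A", 32),
    (42, "D", 34),
    (57, "F", 21),
    (22, "A", 22),
    (67, "D", 27),
]

def build_projection_index(rows, column):
    """Copy one column out of the table, preserving row order."""
    return [row[column] for row in rows]

def positions_equal(projection, wanted):
    """Scan the projected column and return the matching row positions."""
    return [i for i, value in enumerate(projection) if value == wanted]

category_index = build_projection_index(rows, 1)
matches = positions_equal(category_index, "D")
```

Position i in the index corresponds to row i in the table, so the matching positions can be used directly to fetch the other columns.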

Page 16: Projection Indexes in HDF5

Binning

Values: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Bins: 1-3, 4-6, 7-9, 10-12, 13-15
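
The binning shown above, fixed-width bins of three values starting at 1, can be sketched as follows. The function names and parameters are assumptions for illustration. The slides do not say how the prototype chooses its bin boundaries.

```python
def bin_label(value, first=1, width=3):
    """Return the 'low-high' label of the fixed-width bin containing value."""
    low = first + ((value - first) // width) * width
    return f"{low}-{low + width - 1}"

def build_bins(values, first=1, width=3):
    """Group the positions of the values by bin label."""
    bins = {}
    for position, value in enumerate(values):
        bins.setdefault(bin_label(value, first, width), []).append(position)
    return bins

bins = build_bins(range(1, 16))
```

A range query then only needs to inspect the bins that overlap the requested range instead of every value.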

Page 17: Projection Indexes in HDF5

Projection Index

[Figure: Temp (40, 50, 60) and Pressure (29, 30, 31)]

Page 18: Projection Indexes in HDF5

Why Projection Index?

Data is read only
Mostly, a dataset once written is not changed
Index does not need to be updated
Projection indexes well suited
Number of disk accesses is the same as in the case of a B-Tree
Are not considering multidimensional queries

Page 19: Projection Indexes in HDF5

Only Simple Boolean Queries

Query Format
SELECT SELECTION
WHERE c11 < Attribute1 < c12
AND c21 < Attribute2 < c22
…
Results being selections, boolean operations can be done inside the library
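
A query in this format can be evaluated by computing one selection per condition and intersecting them. The sketch below assumes the conditions have already been parsed into (low, name, high) tuples. The function name and the dictionary of attribute values are hypothetical, and the sample values are the Temp and Pressure columns from the projection index table.

```python
def evaluate(attributes, conditions):
    """AND together conditions of the form (low, name, high),
    each meaning low < attributes[name] < high, and return the
    sorted positions that satisfy all of them."""
    selection = None
    for low, name, high in conditions:
        matches = {i for i, v in enumerate(attributes[name]) if low < v < high}
        # The first condition seeds the selection; later ones narrow it.
        selection = matches if selection is None else selection & matches
    return sorted(selection) if selection is not None else []

attributes = {
    "Temp": [52, 42, 57, 22, 67],
    "Pressure": [32, 34, 21, 22, 27],
}

# WHERE 40 < Temp < 60 AND 30 < Pressure < 40
result = evaluate(attributes, [(40, "Temp", 60), (30, "Pressure", 40)])
```

Each condition yields a set of positions, so AND reduces to a set intersection and no data values need to be copied while the conditions are combined.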

Page 20: Projection Indexes in HDF5

Conclusion

Developing a standard indexing API in HDF5
Creating a proof of concept prototype using projection indexes
Take first step towards developing a query language for HDF5

Page 21: Projection Indexes in HDF5

Future Work

Multi-dimensionality
Multiple datasets in same file
Multiple datasets across files
Indexes on attributes
Allow user to index subset of datasets

