WFCAM Science Archive

Critical Design Review, April 2003

The SuperCOSMOS Science Archive (SSA)

• WFCAM Science Archive prototype

• Existing ad hoc flat file archive (inflexible, restricted access) re-implemented in an RDBMS

• Catalogue data only (no image pixel data)

• 1.26 Tbytes of catalogue data

• Implement a working service for users & developers to exercise prior to arrival of Tbytes of WFCAM data

WFCAM Science Archive

Critical Design Review, April 2003

SSA has several similarities to WSA:

• spatial indexing is required over the celestial sphere (see the sketch after this list)

• many source attributes in common, eg. position, brightness, colour, shape, …

• multi-colour, multi-epoch merged source information results from multiple measurements of the same source
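
The slides do not spell out the indexing scheme at this point, but the SkyServer reference later suggests something like a Hierarchical Triangular Mesh (HTM) ID per source. The sketch below is illustrative only; the table and column names are hypothetical, not the actual SSA/WSA schema.

    -- Hypothetical sketch: spatial indexing via an HTM ID column,
    -- the scheme popularised by the SDSS SkyServer.
    CREATE TABLE Source (
        sourceID BIGINT NOT NULL PRIMARY KEY,
        ra       FLOAT  NOT NULL,   -- right ascension (deg)
        decl     FLOAT  NOT NULL,   -- declination (deg)
        htmID    BIGINT NOT NULL    -- HTM cell containing (ra, decl)
    );
    CREATE INDEX ix_Source_htmID ON Source (htmID);

    -- A cone search becomes a handful of contiguous htmID range scans;
    -- the candidate ranges come from an HTM library, and an exact angular
    -- cut is applied afterwards to the survivors.
    SELECT sourceID, ra, decl
    FROM   Source
    WHERE  htmID BETWEEN 123456000 AND 123456999;   -- one candidate range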

WFCAM Science Archive

Critical Design Review, April 2003

Entity-Relationship Models (ERMs):

• generalised, DBMS-independent

• simple, pictorial summary of relational design

• ERMs map directly to table design
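
As a purely illustrative example of that mapping (the entity and attribute names below are hypothetical, not the real WSA schema): each entity becomes a table with its attributes as columns, and each one-to-many relationship becomes a foreign key.

    CREATE TABLE Programme (
        programmeID INT         NOT NULL PRIMARY KEY,   -- entity key
        title       VARCHAR(64) NOT NULL                -- entity attribute
    );

    CREATE TABLE Field (
        fieldID     INT   NOT NULL PRIMARY KEY,
        programmeID INT   NOT NULL REFERENCES Programme(programmeID),  -- relationship
        raCentre    FLOAT NOT NULL,
        decCentre   FLOAT NOT NULL
    );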

WFCAM Science Archive

Critical Design Review, April 2003

SSA relational model:

• very simple relational model

• total of 5 entities

• Catalogues have ~256-byte records with mainly 4-byte attributes, ie. 50 to 60 attributes per record

• so 2 tables dominate the DB:
- SurveyCat: 0.82 Tbyte
- MergedCat: 0.44 Tbyte
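
For orientation, the quoted record and table sizes imply approximate row counts (derived here from those figures, not stated in the slides), consistent with the 1.26 Tbyte total above:

    N_{\rm SurveyCat} \approx \frac{0.82\times10^{12}\,{\rm B}}{256\,{\rm B\,row^{-1}}} \approx 3.2\times10^{9}\ {\rm rows},
    \qquad
    N_{\rm MergedCat} \approx \frac{0.44\times10^{12}\,{\rm B}}{256\,{\rm B\,row^{-1}}} \approx 1.7\times10^{9}\ {\rm rows}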

WFCAM Science Archive

Critical Design Review, April 2003

SSA has been implemented and 1% of data ingested:

• as prototype for V1.0 WSA

• Windows/SQL Server => “SkyServer”

• real-world queries used to exercise SSA

• 100% ingested and online by end Q2 2003

• test-bed for user access tools and archive scientist curation

WFCAM Science Archive

Critical Design Review, April 2003

Development method: “20 queries approach”

• a set of real-world astronomical queries, expressed in SQL

• includes joint queries between the SSA and SDSS

• so far exercised in the SDSS EDR region:
- SSA: 13 million records; ~3 Gbyte
- SDSS: 14 million records; ~22 Gbyte
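
The actual 20 queries are not reproduced in these slides, but a joint SSA/SDSS query has roughly the following flavour. Everything here is hypothetical (table and column names, the cross-match table, the magnitude cut); it is a sketch of the kind of selection involved, not one of the real queries.

    -- Hypothetical joint query: pairs matched via a pre-computed
    -- SSA-SDSS cross-neighbour table, filtered on a large change
    -- between photographic R and SDSS r (candidate variables).
    SELECT s.sourceID, s.classMagR, p.r
    FROM   ssaSource      AS s
    JOIN   ssaSourceXsdss AS x ON x.ssaID = s.sourceID
    JOIN   sdssPhotoObj   AS p ON p.objID = x.sdssID
    WHERE  x.distArcsec < 3.0                -- pairing radius
      AND  ABS(s.classMagR - p.r) > 1.5;     -- large magnitude difference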

WFCAM Science Archive

Critical Design Review, April 2003

WSA has significant differences, however:

• catalogue and pixel data;

• calibration and other extensive metadata;

• science-driven, nested survey programmes (as opposed to the SSA's "atlas" maps of the whole sky) result in a complex data structure;

• curation & update within DBMS (whereas SSA is a finished data product ingested once into the DBMS).

WFCAM Science Archive

Critical Design Review, April 2003

WSA key requirements:

• flexibility:
- ingested data are rich in structure
- ingest occurs daily; curation daily/weekly/monthly …
- many varied usage modes
- proprietary rights must be protected for much of the data

• scalability:
- ~2 Tbyte of new catalogue & ancillary data per year

• rapid response:
- need to maintain rapid query response despite increasing data volumes

WFCAM Science Archive

Critical Design Review, April 2003

Schematic picture of the WSA:

• Pixels:
- one flat-file image store; an access layer restricts public access
- filenames and all metadata are tracked in DBMS tables with unrestricted access (see the sketch after this list)

• Catalogues:
- WFAU incremental (no public access)
- public, released DBs
- external survey datasets also held
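
A minimal sketch of how flat-file pixel data could be tracked from the DBMS (table and column names are illustrative, not the real WSA schema): the files themselves stay outside the database; only their locations and metadata are queryable, and the access layer checks the proprietary flag before serving pixels.

    CREATE TABLE FileIndex (
        multiframeID BIGINT       NOT NULL PRIMARY KEY,
        fileName     VARCHAR(256) NOT NULL,   -- location in the flat-file pixel store
        obsDate      DATETIME     NOT NULL,
        filterName   VARCHAR(16)  NOT NULL,
        proprietary  BIT          NOT NULL    -- checked by the pixel access layer
    );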

WFCAM Science Archive

Critical Design Review, April 2003

WFCAM pixel data

• pixel data consist of multiframes and combiframes in WSA parlance;

• stored as flat files (not BLOBs in the DBMS)

• metadata are stored in the DBMS

• library calibration frames are held

• default image products are held

WFCAM Science Archive

Critical Design Review, April 2003

WFCAM multiframe - any pipeline product that:

• retains instrumental “paw print” as distinct images (WSA calls these “detector frames”)

• is not made up from other ingested frames (eg. microstep interleave is a multiframe)

WSA includes difference images as multiframes

WFCAM Science Archive

Critical Design Review, April 2003

WFCAM combiframe – any pipeline or archive product that:

• is the result of a combination process applied to stored multiframes:
- eg. pipeline dither/stack/mosaic product
- eg. archive default stack/mosaic product

(NB: a combiframe may still reflect the "paw print", so can have multiframe characteristics)

WFCAM Science Archive

Critical Design Review, April 2003

Multiframe ERM:

• links to the Programme & Field entities are vital

• library calibration multiframes stored & related

• primary/extension HDU keys logically stored & related (see the sketch after this list)

• this will work for VISTA
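
A minimal sketch of how the primary/extension HDU split might map to tables (hypothetical names, not the actual WSA schema): one row per multiframe for the primary HDU, one row per detector extension, with library calibration frames referenced by key.

    CREATE TABLE Multiframe (
        multiframeID BIGINT NOT NULL PRIMARY KEY,   -- primary HDU (one per multiframe)
        programmeID  INT    NOT NULL,
        fieldID      INT    NOT NULL,
        darkID       BIGINT NULL,                   -- library calibration multiframes
        flatID       BIGINT NULL
    );

    CREATE TABLE MultiframeDetector (
        multiframeID BIGINT NOT NULL REFERENCES Multiframe(multiframeID),
        extNum       INT    NOT NULL,               -- FITS extension / detector number
        PRIMARY KEY (multiframeID, extNum)
    );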

WFCAM Science Archive

Critical Design Review, April 2003

Combiframe ERM:

• every combiframe has provenance linking it to multiframes (see the sketch after this list)

• individual calibration frames are not required, but individual confidence frames are

• combiframe may consist of multiframe-like detector combiframes
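
Provenance of this kind is naturally a many-to-many link table; the sketch below is illustrative only (hypothetical names).

    CREATE TABLE Provenance (
        combiframeID BIGINT NOT NULL,
        multiframeID BIGINT NOT NULL,
        PRIMARY KEY (combiframeID, multiframeID)
    );

    -- e.g. list the input frames that went into one combiframe:
    SELECT multiframeID FROM Provenance WHERE combiframeID = 42;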

WFCAM Science Archive

Critical Design Review, April 2003

Astrometric and photometric calibration data:

• calibration information must be stored (SRAD)

• recalibration is required, especially photometric (SRAD)

• old calibration coefficients must be stored (SRAD)

• time-dependence (versioning) complicates the relational model

Calibration data are related to images; source detections are related to images, and hence to their relevant calibration data

Multiframe calibration data:

• "set-ups" define nightly detector & filter combinations:
- extinctions have nightly values
- zero points (zps) have per-detector, nightly values

• coefficients are split into current & previous entities (see the sketch below)

• versioning & timing recorded

• highly non-linear systematics are allowed for via 2D maps
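
A minimal sketch of the current/previous split for one coefficient type (zero points), with hypothetical table and column names; extinction and other coefficients would be handled analogously.

    CREATE TABLE CurrentZeropoint (
        nightDate  DATETIME NOT NULL,   -- observing night
        detectorID INT      NOT NULL,
        filterID   INT      NOT NULL,
        zeropoint  REAL     NOT NULL,
        versionNum INT      NOT NULL,
        PRIMARY KEY (nightDate, detectorID, filterID)
    );

    -- superseded solutions are moved here on recalibration, so old
    -- coefficients remain recoverable (versioning & timing)
    CREATE TABLE PreviousZeropoint (
        nightDate  DATETIME NOT NULL,
        detectorID INT      NOT NULL,
        filterID   INT      NOT NULL,
        zeropoint  REAL     NOT NULL,
        versionNum INT      NOT NULL,
        endTime    DATETIME NOT NULL,   -- when this solution was superseded
        PRIMARY KEY (nightDate, detectorID, filterID, versionNum)
    );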

WFCAM Science Archive

Critical Design Review, April 2003

Combiframe calibration data:

• no "set-ups": each image is separately calibrated

• detector combiframes are catered for

• "luptitude" parameters are stored for each image separately
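
"Luptitude" presumably refers to the inverse-hyperbolic-sine (asinh) magnitude of Lupton, Gunn & Szalay (1999), in which case the per-image parameters stored would include the softening parameter b and the flux scale f_0:

    m = -\frac{2.5}{\ln 10}\left[{\rm asinh}\!\left(\frac{f/f_0}{2b}\right) + \ln b\right]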

WFCAM Science Archive

Critical Design Review, April 2003

Catalogue data: general model

• related back through progenitor image to calibration data

• detection list for each programme (or set of sub-surveys)

• merged source entity is maintained

• merge events recorded

• list re-measurements derived

WFCAM Science Archive

Critical Design Review, April 2003

Example: UKIDSS LAS, GPS & GCS

• LAS, GPS & GCS share one detection & one list re-measurement entity

• individual merged source and source re-measurement entities

• note curation information:
- merge log (one per prog.)
- current/repeat detections
- primary/secondary (eg. overlaps)
- new/old merge image flag (to trigger list re-measurement)

Critical Design Review, April 2003

Non-WFCAM data: general model

• each non-WFCAM survey has a stored catalogue

• cross-neighbour table (see the sketch after this list):
- records nearby sources between any two surveys
- yields the associated ("nearest") source

• non-WFCAM list measurements where image data are available (NB: V2.0 requirement)
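
A minimal sketch of a cross-neighbour table between a WFCAM programme and one external survey (hypothetical names and units): every pair within some pairing radius is stored, and the nearest pair gives the associated source.

    CREATE TABLE SourceXSurveyB (
        masterID BIGINT NOT NULL,   -- source in the WFCAM programme
        slaveID  BIGINT NOT NULL,   -- source in the external survey
        distance REAL   NOT NULL,   -- angular separation (arcsec, say)
        PRIMARY KEY (masterID, slaveID)
    );

    -- the associated ("nearest") counterpart of one source:
    SELECT TOP 1 slaveID, distance
    FROM   SourceXSurveyB
    WHERE  masterID = 12345
    ORDER BY distance;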

Critical Design Review, April 2003

Example: UKIDSS LAS & relationship to SDSS

• UKIDSS LAS overlaps with SDSS

• list measurements:
- at positions defined by the IR source, but in optical image data
- we do not currently envisage implementing this the other way round (ie. optical source positions placed in IR image data)

WFCAM Science Archive

Curation – a set of entities to track in-DBMS processing:

• archived programmes have (see the sketch below):
- required filter set
- required join(s)
- required list-driven measurement product(s)
- release date(s)
- final curation task
- one or more curation timestamps

• a set of curation procedures is defined for the archive
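
A minimal sketch of per-programme curation bookkeeping (illustrative names only): one table for what each archived programme requires, and one logging when each curation task last ran.

    CREATE TABLE RequiredFilters (
        programmeID INT NOT NULL,
        filterID    INT NOT NULL,
        PRIMARY KEY (programmeID, filterID)
    );

    CREATE TABLE CurationLog (
        programmeID INT         NOT NULL,
        taskName    VARCHAR(64) NOT NULL,   -- e.g. 'merge sources', 'list re-measure'
        lastRun     DATETIME    NOT NULL,   -- curation timestamp
        PRIMARY KEY (programmeID, taskName)
    );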

The V2.0 R&D programme for the WSA: scalability issues

• speed: wish to maintain query response performance as catalogue data accumulate to many Tbytes

- goal is ~100 sec for Tbyte trawls (ie. non-indexed scans); see the arithmetic below

• data volume: wish to cope ultimately with single tables of size 10s of Tbytes or more …
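
That goal translates into a demanding sustained I/O rate (derived here from the stated numbers, not in the slides):

    \frac{10^{12}\,{\rm B}}{100\,{\rm s}} = 10\,{\rm GB\,s^{-1}}

i.e. roughly 10 Gbyte/s of aggregate sequential read bandwidth, which is what makes the hardware question that follows non-trivial.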

But: we’re poor academics!

• limited financial resources
- can't afford a Rolls-Royce SAN-type solution, for example

• staff resources limited
- need low-maintenance systems

