H. Thiemann (M&D) / 10.04.23 / 1
CERA (Climate and Environmental Retrieval and Archive)
Hannes Thiemann
(M&D/MPIMET, Hamburg)
Kiel, 17.3.2004
Data Group maintaining the WDCC
Michael Kurtz
Hans Luthardt
Michael Lautenschlager
Heinke Höck
Hannes Thiemann
Hermann Winter
Jörg Wegner
Frank Toussaint
Peter Lenzen
Content:
• General remarks
• DKRZ archive development
• CERA1) concept
• CERA data model and structure
• Automatic fill process (not presented)
• CERA user interface
1) Climate and Environmental data Retrieval and Archiving
Semantic data management
• Data consist of numbers and metadata.
• Metadata construct the semantic data context.
• Metadata form a data catalogue which makes data searchable.
• Data are produced, archived and extracted within their semantic context.
Data without explanation are only numbers.
Problems:
• Metadata are of different complexity for different data types.
• Consistency between numbers and metadata has to be ensured.
DKRZ Architecture
Proc.: 24 nodes, 192 CPUs
Memory: 1.5 TeraByte
Perform.: 1.5 TeraFLOPS (peak), 500 GigaFLOPS (sust.)
Tape Archive: > 3.4 PetaByte
Disk Cache: 60 TeraByte
Bandwidth Comp.S. – Data S.: 450 MByte/sec
155 Mbs
DKRZ Archive Development
Basic observations and assumptions:
1. Unix-File archive content end of 2002: 600 TB including backups
2. Observed archive rate (Jan. - May 2003): 40 TB/month
3. System changes: 50% compute power increase in August 2003
4. CERA DB size end of 2002: 12 TB
5. Observed Increase (Jan. - May 2003): 1 TB/month
6. The automatic fill process into the CERA DB is going to become operational with 4 TB/month this year and should increase from 10% of the archiving rate to approx. 30% by the end of 2004
DKRZ Archive Development
DKRZ's Archive Increase (Estim. 09.03)
Data Amount [TB]
Year                 2002   2003   2004   2005   2006   2007
Unix-File Archive     600   1200   1920   2640   3360   4080
CERA DB                12     40    184    424    664    904
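The Unix-file archive estimate can be reproduced with simple arithmetic. The sketch below is only an illustration: it starts from the 1200 TB chart value for 2003 and assumes (this is not stated on the slides) that the observed 40 TB/month archive rate grows in proportion to the 50% compute power increase, giving 60 TB/month. The function name `project_file_archive` is hypothetical.

```python
# Values taken from the slides.
FILE_ARCHIVE_2003_TB = 1200       # Unix-file archive, end of 2003 (chart value)
OBSERVED_RATE_TB_PER_MONTH = 40   # observed Jan. - May 2003
COMPUTE_INCREASE = 0.5            # 50% compute power increase in August 2003

# Assumption (not stated on the slides): the archive rate scales with compute power.
ASSUMED_RATE_TB_PER_MONTH = OBSERVED_RATE_TB_PER_MONTH * (1 + COMPUTE_INCREASE)


def project_file_archive(start_tb, rate_tb_per_month, first_year, last_year):
    """Return {year: archive size in TB at year end} for a constant monthly rate."""
    sizes = {}
    size = start_tb
    for year in range(first_year, last_year + 1):
        size += 12 * rate_tb_per_month
        sizes[year] = size
    return sizes


projection = project_file_archive(
    FILE_ARCHIVE_2003_TB, ASSUMED_RATE_TB_PER_MONTH, 2004, 2007
)
for year, size_tb in projection.items():
    print(f"{year}: {size_tb:.0f} TB")
```

Under this assumption the projection matches the Unix-file archive values shown for 2004 to 2007.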
Problems in file archive access:
• Missing Data Catalogue
• Data are not stored application-oriented
• Lack of experience with climate model data
• Lack of computing facilities at client site
Year                          2003     2004     2005     2006     2007
Estimated File Archive Size   1.2 PB   1.9 PB   2.6 PB   3.4 PB   4.1 PB
Limits of model resolution
ECHAM4 (T42): Grid resolution: 2.8°, Time step: 40 min
ECHAM4 (T106): Grid resolution: 1.1°, Time step: 20 min
Noreiks (MPIM), 2001
CERA Concept: Semantic Data Management
• (I) Data catalogue and Unix files (pointer or BLOB-table entry)
  – Enable search and identification of data
  – Allow for data access as they are
• (II) Application-oriented data storage
  – Time series of individual variables are stored as BLOB entries in DB tables
    • Allow for fast and selective data access
  – Storage in standard file format (GRIB, NetCDF)
    • Allow for application of standard data processing routines (PINGOs)
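The application-oriented storage in (II) can be illustrated with a small sketch: one database table per variable, one BLOB per time step, and selective retrieval by time range. This is not the CERA schema. The table name `temp2_blobs`, its columns, the toy 2x2 fields and the use of SQLite are all assumptions made for illustration, and raw packed floats stand in for the GRIB or NetCDF records a real archive would store.

```python
import sqlite3
import struct

# Hypothetical schema: one table per variable, one BLOB per time step.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE temp2_blobs (time_step TEXT PRIMARY KEY, field BLOB NOT NULL)"
)


def pack_field(values):
    """Encode a flattened 2D field as little-endian 32-bit floats."""
    return struct.pack(f"<{len(values)}f", *values)


def unpack_field(blob):
    """Decode bytes written by pack_field back into a list of floats."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


# Store a short 6-hourly time series of a toy 2x2 field (made-up values).
steps = ["1979-01-01T00", "1979-01-01T06", "1979-01-01T12", "1979-01-01T18"]
for i, step in enumerate(steps):
    toy_field = [250.0 + i, 251.0 + i, 252.0 + i, 253.0 + i]
    conn.execute(
        "INSERT INTO temp2_blobs VALUES (?, ?)", (step, pack_field(toy_field))
    )


def read_range(connection, first, last):
    """Selective access: fetch only the time steps in [first, last]."""
    rows = connection.execute(
        "SELECT time_step, field FROM temp2_blobs "
        "WHERE time_step BETWEEN ? AND ? ORDER BY time_step",
        (first, last),
    )
    return [(step, unpack_field(blob)) for step, blob in rows]


selected = read_range(conn, "1979-01-01T06", "1979-01-01T12")
for step, values in selected:
    print(step, values)
```

Because each time step is a separate row, a client can pull a short window of one variable without reading the whole model output file.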
Parts of CERA
Web-Based User Interface
• Catalogue Inspection
• Climate Data Retrieval
CERA Database, 30 TB (12/2003)
• Data Catalogue
• Processed Climate Data
• Pointer to Raw Data
Mass Storage Archive, 1 PB (12/2003)
CERA Data: Jan. Temp.
CERA Data: Jan. Wind
(2 x 250 MB)
CERA-2 Data Model
• Complete with respect to IEEE’s Reference Model for Metadata (Bretherton, 1994)
  – Browse, Search and Retrieval
  – Ingest, Quality Assurance, Reprocessing
  – Application to Application Transfer
  – Storage and Archive
• Reference
  – “The CERA-2 Data Model” (DKRZ-Report No. 15, 1998)
  – URL: http://www.pik-potsdam.de/dept/dc/e/sdm/cera/
Interoperability
• Supports interoperability due to inclusion of international standards
  – Directory Interchange Format (NASA, 1998)
  – FGDC Metadata Content Standard (FGDC, 1996)
  – ISO Metadata Standard for Geographic Information (ISO 19115)
CERA-2 Data Model Blocks
Metadata Entry: This is the central CERA block, providing information on
• the entry's title
• type and relation to other entries
• the project the data belong to
• a summary of the entry
• a list of general keywords related to data
• creation and review dates of the metadata
Additionally: Modules and Local Extensions
• Module DATA_ORGANIZATION (grid structure)
• Module DATA_ACCESS (physical storage)
• Local extension for specific information on (e.g.)
  – data usage
  – data access and data administration
Coverage: Information on the volume of space-time covered by the data
Reference: Any publication related to the data together with the publication form
Status: Status information like data quality, processing steps, etc.
Distribution: Distribution information including access restrictions, data format and fees if necessary
Contact: Data related to contact persons and institutes like distributor, investigator, and owner of copyright
Parameter: Block describes data topic, variable and unit
Spatial Reference: Information on the coordinate system used
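To make the block structure more concrete, the sketch below models a few of the blocks as Python dataclasses and shows how keywords in the central entry make a catalogue searchable. It is a simplified illustration, not the CERA-2 relational schema: all class names, field names and the `search` helper are hypothetical, and the two catalogue entries are invented placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Parameter:
    """Parameter block: data topic, variable and unit."""
    topic: str
    variable: str
    unit: str


@dataclass
class Contact:
    """Contact block: a person or institute and its role."""
    name: str
    role: str  # e.g. "distributor", "investigator", "owner of copyright"


@dataclass
class MetadataEntry:
    """Central block; the other blocks are attached to it."""
    title: str
    entry_type: str
    project: str
    summary: str
    keywords: List[str] = field(default_factory=list)
    parameters: List[Parameter] = field(default_factory=list)
    contacts: List[Contact] = field(default_factory=list)
    status: Optional[str] = None
    distribution: Optional[str] = None


def search(catalogue, keyword):
    """Return titles of all entries tagged with the keyword (case-insensitive)."""
    wanted = keyword.lower()
    return [e.title for e in catalogue if wanted in (k.lower() for k in e.keywords)]


# Invented placeholder entries, for illustration only.
catalogue = [
    MetadataEntry(
        title="Example experiment A",
        entry_type="experiment",
        project="Example project",
        summary="Placeholder summary.",
        keywords=["AMIP2"],
        parameters=[Parameter("atmosphere", "2m temperature", "Kelvin")],
    ),
    MetadataEntry(
        title="Example dataset B",
        entry_type="dataset",
        project="Example project",
        summary="Placeholder summary.",
        keywords=["ocean"],
    ),
]

print(search(catalogue, "amip2"))
```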
CERA Structure
Level 1 Interface: Metadata entries (XML, ASCII) + Data Files
Level 2 Interface: Separate files containing BLOB table data in application-adapted structure (time series of single variables)
Experiment Description
• Unix-Files Table / Pointer
• Dataset 1 Description: BLOB Data Table
• Dataset n Description: BLOB Data Table
Climate Model Raw Data
Application-oriented Data Storage (Interface level 2)
Primary Data Processing
Start: Approved in January 2003
Maintenance: Model and Data (M&D/MPIMET) and German Climate Computing Centre (DKRZ)
Mission: Data for climate research are collected, stored and disseminated
ICSU Policy: long-term archiving and unrestricted data access for scientists
Restriction: Only climate data products in CERA DB, no raw data storage.
Content: Emphasis is placed on climate modelling and related data products.
Co-operation: with thematically corresponding data centres like WDC-MARE (Bremen) and WDC-RSAT (Oberpfaffenhofen)
URL: http://www.mad.zmaw.de/wdcc/
WDC Verbund Erdsystemforschung (WDC Cluster for Earth System Research)
Founded on 25.04.03 in Oberpfaffenhofen by the 3 German ICSU WDCs.
• WDC for Climate: M&D / DKRZ, Hamburg
  http://www.mad.zmaw.de/wdcc/
• WDC MARE (Marine Environmental Sciences): Marum, Bremen and Bremerhaven
  http://www.wdc-mare.org/
• WDC RSAT (Remote Sensing for the Atmosphere): DFD/DLR, Oberpfaffenhofen
  http://wdc.dlr.de/
Commitment: Long-term data archiving and free, unrestricted data access for all scientists (ICSU Rules for WDCs and rules of good scientific practice)
WDC Verbund Erdsystemforschung (WDC Cluster for Earth System Research)
Declaration of principles
• Data publication
  - The data themselves should be uniquely identifiable, citable and universally accessible, independently of the archiving system (e.g. by assigning DOIs or URNs).
  - DFG project "Publikation und Zitierfähigkeit wissenschaftlicher Primärdaten" (publication and citability of scientific primary data; 12 months, starting 01.10.03)
• Service of the data centres
  - Qualified thematic data centres take on the role of archiving and publishing scientific data.
  - The centres guarantee long-term and free availability of archived data within the guidelines of the ICSU World Data Centres.
  - The data centres make their expertise available in an advisory capacity to funding agencies, reviewers and the scientific community.
WDC-CLIMATE Data Content
• Climate Model Data (Continuous stream of new data)
• IPCC DDC (Data Distribution Centre)
  – Will be continued for the Fourth Assessment Report
• CEOP (Coordinated Enhanced Observing Period) Model output retention and handling Centre
  – Part of WCRP that was motivated by GEWEX with focus on water and energy cycles within the climate system (01.10.2002 – 31.12.2004)
• Observational Data
  – Model related observations: ERA15/40 (ECMWF), NCEP 40 Y. Reanal.
  – Instrumental data: WOCE (World Ocean Circulation Experiment)
  – Earth observations: Access to SSTs from NOAA AVHRR in cooperation with WDC RSAT (distributed archive)
• Project Support (encourage Good Scientific Practice)
  – HOAPS (Hamburg Ocean Atmosphere Parameters and Fluxes from Satellite Data)
  – CARIBIC (Civil Aircraft for Regular Investigation of the Atmosphere Based on an Instrumentation Container), MPI Mainz
  – Different model applications
WDCC Example
Experiment
Exp.-Acronym: EH5_T63L19_AMIP_6H
Exp.-Name: ECHAM5_T63L19_AMIP Control Run 6H values
Exp.-Description:
Simulation of current climate using ECHAM5.2 forced with observed monthly sea surface temperatures and sea-ice concentrations (AMIP-2).
The simulation was run on a NEC-SX6 (hurrikan). Atmospheric data are stored every 6 hours. Monthly means are available, too.
Related experiments:
- ECHAM5_TTTLLL_AMIP, where TTTLLL is: T21L19, T31L19, T42L19, T85L19, T106L19, T42L31, T63L31, T85L31 and T106L31
The output from the model run: schauer.dkrz.de:/pf/m/m214002/NEWEXP/EXP300/run365
Project: Climate Model Simulations at MPI
Keyword: AMIP2
WDCC Example
Experiment
Exp.-Acronym: EH5_T63L19_AMIP_6H
Dataset (BLOB-Table)
DS-Acronym: EH5_T63L19_R365_TEMP2
Variable: 2m temperature
Dataset (BLOB-Table)
DS-Acronym: EH5_T63L19_R365_WIND10M
Variable: 10m wind speed
Number of datasets: 350 time series of 2D global fields
Total amount of GRIB data: 350 * 1.6 GB = 560 GB
schauer.dkrz.de:/pf/m/m214002/NEWEXP/EXP300/run365
WDCC Example
Dataset
DS-Acronym: EH5_T63L19_R365_TEMP2
DS-Name: EH5_T63L19_R365_TEMP2
DS-Summary: See summary of corresponding experiment. This dataset contains 6H values.
Creation Date: 25-MAI-2003
Format: GRIB
Size (Bytes): 1659519420
Storage: Model and Data: DB Internal Storage; Nearline
Download Permission: No
Topic / Parameter / Variable / Unit: atmosphere / atmospheric temperature / 2m temperature / Kelvin
Code Type / Code # / Code Acronym: Echam5 / 167 / TEMP2
Temporal Structure: length of time series and storage intervals
Spatial Structure: precise definition of 3D grid points
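The slash-separated fields of such a dataset record are easy to handle programmatically. The following sketch is only an illustration of one way a client might do so: the helper `split_record` is hypothetical and not part of CERA, while the strings and the byte count are copied from the record above.

```python
def split_record(labels, values):
    """Pair each '/'-separated label with the value in the same position."""
    keys = [part.strip() for part in labels.split("/")]
    vals = [part.strip() for part in values.split("/")]
    if len(keys) != len(vals):
        raise ValueError("labels and values have different lengths")
    return dict(zip(keys, vals))


parameter = split_record(
    "Topic / Parameter / Variable / Unit",
    "atmosphere / atmospheric temperature / 2m temperature / Kelvin",
)
code = split_record("Code Type / Code # / Code Acronym", "Echam5 / 167 / TEMP2")

# Dataset size as given in the record, converted to decimal gigabytes.
size_bytes = 1659519420
size_gb = size_bytes / 1e9

print(parameter["Variable"], "in", parameter["Unit"])
print(code["Code Acronym"], f"{size_gb:.2f} GB")
```

The converted size of about 1.66 GB is in line with the rounded figure of 1.6 GB per dataset quoted for this experiment.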
Inclusion of other Data Sources
Client applet receives foreign data URI from CERA-2 DB
Foreign server provides DB data by http: German Aerospace Centre
Download Statistics
Month            Number of downloads   Volume (GB)
MARCH 2004                       950            81
FEBRUARY 2004                   4018           911
JANUARY 2004                    1583          1154
DECEMBER 2003                   1077           366
NOVEMBER 2003                   1959           923
OCTOBER 2003                    2844            86
SEPTEMBER 2003                  3168           241
AUGUST 2003                     1576           208
JULY 2003                       3347           213
JUNE 2003                       3426            78
MAY 2003                        5803           117
APRIL 2003                      5343            66
Countries using the CERA DB
Contact
• Email: [email protected]
• Web: http://cera-www.dkrz.de/CERA