
Data Services @ CISL/NCAR NSF, 1 November ‘07 Steven Worley.


Data Services @ CISL/NCAR
NSF, 1 November ‘07
Steven Worley


Foundation

Research Data Archive (RDA), (dss.ucar.edu)
– 40+ year history
– Observations, analyses, reanalyses (met. & ocn.)
– 550+ datasets, 160 TB, 250K files
– 7 SEs, Manager, Admin.

Three essential data activities
– Curation
    70+ datasets are actively extended (daily, monthly, annually)
    20 or so new datasets added annually
– Stewardship
    Ensure data integrity, systematic organization, documentation
– User Access
    Provision methods vary


Access

Methods
– MSS - all datasets available to NCAR computing
– Online - most-demanded datasets (newest)
    Complete systematic discovery for all datasets
– Personal Data Requests - all datasets

Principle: Successful data management is judged by its usefulness to the current and future users
– User community and RDA are large and diverse
– RDA development driven by NCAR and US University needs
– Curation and stewardship are always crucial, some access is required, and advanced access capability is at a “best” possible level within resource limits
– User benefits and efficient management always evaluated


RDA Users and Data Delivered

Users, 5000+, four categories

Data, 140 TB combined

Web versus MSS work paradigm: users 4700/400 (Web/MSS) -> data delivered 55/83 TB (Web/MSS)

Top Datasets by Volume
• NCEP FNL
• NARR
• NNR
• IDD/LDM
• JRA-25


RDA Data Volumes and Growth (MSS & Web)

MSS 159 TB, TIGGE 66 TB, Other new 19 TB (JRA-25, ECMWF Nature Run/OSSE, Hi-Res. IDD/LDM, NCEP Analy., Reanal., Obs.)

WEB 19 TB, TIGGE 4 TB, Other new 3 TB (NCEP FNL Reanal, Hi-Res. IDD/LDM, GODAS, etc.)


Major RDA efforts for FY08

TIGGE*
– Add 5 NWP Centers, total = 10, 300 GB/day, 2M GRIB2/day
– Improve access through portal on CDP
    Multi-center temporal-spatial-parameter ensemble subsets on selected uniform horizontal grid - resources permitting
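The TIGGE figures above imply some useful back-of-envelope numbers. The following is only a rough sketch: it assumes decimal units and that "2M GRIB2/day" counts individual GRIB2 items, neither of which the slide states, and the variable names are illustrative.

```python
# Figures quoted on the slide for the planned 10-center TIGGE archive.
DAILY_VOLUME_GB = 300          # "300 GB/day"
DAILY_ITEM_COUNT = 2_000_000   # "2M GRIB2/day"

# Assumption: decimal units (1 GB = 1,000,000 KB; 1 TB = 1,000 GB).
KB_PER_GB = 1_000_000
GB_PER_TB = 1_000

# Average size of one GRIB2 item, and the yearly growth at a constant daily rate.
avg_item_kb = DAILY_VOLUME_GB * KB_PER_GB / DAILY_ITEM_COUNT
annual_volume_tb = DAILY_VOLUME_GB * 365 / GB_PER_TB

print(f"average GRIB2 item size: {avg_item_kb:.0f} KB")
print(f"annual growth: {annual_volume_tb:.1f} TB/year")
```

Under these assumptions the archive consists of very many small items (about 150 KB each) and grows by roughly 110 TB per year, which helps explain why subsetting is tied to available resources.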

More frequent updates (daily) of NCEP operational model and observations
– Annual distribution 33+ TB, very popular for WRF model users
– Operations to Research, inverse of a common theme

Fully deploy JRA-25
– Fix Gaussian Grid metadata
– Organize products
– User registration, per agreement with JMA (NCAR’s unique position!)
– Open web interfaces - MSS access is already underway


Major RDA efforts for FY08

ISD Collaboration, NCAR and NCDC
– Have inventoried all NCAR and NCDC holdings separately
– Adding unique sources from NCAR into ISD at NCDC
– “Best” global land surface dataset, to be available at NCAR*

OSSE
– ECMWF Nature Run validation dataset, 13 months T511 at 11 levels (from T799L91), 3-hourly
– Replace some files
– Organize and open access to public

20th Century Reanalysis
– Compo and Whitaker et al., NOAA/ESRL and CU
– Computed at NERSC (DOE INCITE)
– Transfer (tested) to NCAR via ESG (all data)
– Serve from MSS and most-demanded products via Web

ICOADS
– 20+ year collaboration with NOAA, expect a new release ‘08
– World-wide best long-term marine surface dataset (1750-2007)


Advanced Stewardship Examples*

ERA-40
– Recomputed vector wind components to correct errors
– Computed T85 resolution products from high resolution spectral model output. Match up with CCSM and transform to regular Gaussian.

UA
– Evaluation of RDA against NOAA’s IGRA
– DB under development
– Includes feedback records from reanalyses
– Significant work still required before ready for public*

Metadata
– Big effort to standardize, not 100% yet, but in excellent shape
– NASA GCMD draws RDA metadata via OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting)
– Complete THREDDS catalogs are generated for CDP
– Poised to offer intuitive search and browse with accurate data discovery results
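For readers unfamiliar with OAI-PMH harvesting, a harvester issues plain HTTP requests whose query string names a protocol verb. The sketch below only builds such a request URL. The base URL is a placeholder (the slides do not give the RDA endpoint), the helper name is hypothetical, and `oai_dc` is used only because it is the baseline format every OAI-PMH repository must support, not because the slides say GCMD harvests it.

```python
from urllib.parse import urlencode

# Placeholder endpoint: the RDA's actual OAI-PMH base URL is not given in the slides.
BASE_URL = "https://example.org/oai"


def build_list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    # "verb" and "metadataPrefix" are required arguments of a ListRecords request.
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # "from" is optional and limits the harvest to records changed since that date.
    if from_date is not None:
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


url = build_list_records_url(BASE_URL, from_date="2007-11-01")
print(url)
```

A harvester would fetch this URL and parse the XML response; passing a `from` date lets it pick up only records changed since its last visit.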


Data Access Tools

Observation
Scientists want easy access, and what that means varies greatly in our diverse community.

Examples
– Simple program codes (C or Fortran)
    Easy to modify, design customized research focused computations
– File formats that easily go into computational applications
    MATLAB, R, IDL, etc.
– Analysis and display packages
    GrADS, NCL, etc.
– Real-time interactive interfaces
    LAS, GUIs built to use OPeNDAP (IDV), TDS, GDS, etc.

Implementation
We don’t exclusively promote any one in particular, try to offer several and meet the community needs
– We can influence the development path of NCL, e.g. for TIGGE


Tools, example: TIGGE NCL



Additional Collaborations

2006 NCEP-NCAR Annual Analyses DVD
– Continuation of 1950-2005 series
– NCEP GFS, RI, RII and operational data from ECMWF, CMC, FNOC, UKMET

NOMADS
– New NOMADS requirements analysis
    RDA is receiving or has received about 12 data streams from NCEP, beginning over 30 years ago
    More things we’d like to have - NOMADS is looking into it, e.g. NARR forecasts
    Data resources overlap, RDA service paradigm is different, and community needs more bandwidth
– NOMADS will prepare some NCEP TIGGE fields
    To be merged with work currently done at CISL

Reanalysis Observations
– Continuous improvement to obs. sources for NCEP, ECMWF, JMA
– Brokered a deal to get unique obs. from JMA from JRA-25

Chinese Academy of Sciences to mirror parts of RDA
– International open access principle in action
– Lead to data exchange, e.g. better precip. and snow data from China


Community Awareness

User Surveys, 3-4 year cycle
– Results are excellent
– Read between the lines (general comments) to gain insights for future

Meeting participation and presentations

Noteworthy activities
– NAS/NRC Committee, Environmental Data Management at NOAA: Archiving, Stewardship, and Access - TBP soon.
– Report to the NSB for Long-Lived Data Collections: Enabling Digital Research and Education in the 21st Century
– IOOS DMAC, two plus year effort with several other authors
– Working Group on Observational Data Sets for Reanalysis, under GCOS WCRP Observation and Assimilation Panel.
– Member of Users Working Group to advise JPL/NASA PO.DAAC
– Etc.

Does not represent community awareness activities in areas of portal development and technologies.


Portals and Evolution to ESKE

Overview/brief - CISL/NCAR achievements, status, and plans would be best conveyed in a longer discussion with different representation.

ESKE
An online environment for advancing data and knowledge management and access.
– Building toward an ESKE with portals for 3 years, 1-2 FTE

Some Features:
– Integrated secure environment for models, data, analyses, frameworks, tools, and visualizations
– Must enable efficient comprehensive workflows - decrease time to results (increase productivity)
– Must integrate with NCAR Supercomputer facilities


Portals and Evolution to ESKE

Example Active Portals
– Community Data Portal (CDP)
    General and cross-cutting all NCAR Laboratories and some UOP
– Earth System Grid (ESG), Climate and IPCC, DOE and NSF
– NCAR’s Science Gateway to the TeraGrid
– THORPEX Interactive Grand Global Ensemble (TIGGE), improved weather forecast research, NSF
– ESMF and Earth System Curator, models and data software, NSF, NASA, NOAA, many others
– Collaborative Arctic Data Information System (CADIS), CISL/EOL/NSIDC to support AON, NSF
– North American Regional Climate Change Assessment Program (NARCCAP), NSF, NOAA, DOE, more
– Virtual Solar-terrestrial Observatory (VSTO), NSF


Portals and Evolution to ESKE

CDP - Some Players (cdp.ucar.edu)
– Projects and models
    IPCC, VEMAP, Daymet (CGD)
    WACCM, CME, IHOPE (ESSL, EOL, ASP, etc.)
    MOZART, ROSE, TUV, MILAGRO, TOPSE, MEGAN (ACD, EOL)
    COLA (UMD)
    WMO, ESMF, ERA-40, TIGGE (CISL)
    WRF (MMM)
– Data catalogs
    EOL
    DSS (RDA)
    CGD
    BADC (British Atmospheric Data Centre)

Take away points
– Many participants - actually more than CDP staff can handle
– Gained much experience with technology evolution and integration
– Steps toward an ESKE


Data Service Enhancements TBD*

Time Series Specific Portal of Reanalyses Data
– Reanalysis output structure is not conducive to a major segment of climate research
– To include high resolution global and grid point extraction

Archiving and improved metadata
– Web based metadata (including complete documentation) is complex and interlinked, and needs an archiving strategy; OAIS and PREMIS give some ideas
– Capture and share user feedback on datasets
    Technology exists, needs methods to monitor, organize, summarize, and search

Dataset life cycle management
– Standard procedures for version control
– Automatic user determined notification for updates, new datasets, and corrections - proactive


Data Service Enhancements TBD

Formalize archiving service for NSF projects
– Need data acceptance policy and procedure (probably a data review board)
– Complete archiving package definition and service agreement (multi-level)
– Defined roles for PI participation
– Exclusively network driven
– Appropriate recognition and support within NCAR

Develop “efficient” ways to handle TB datasets
– Currently any one or several can be handled - usually with significant effort
    Tool and software servers (TDS, GDS, etc.) are addressing issues, but easy implementation and scalability are continuous challenges


END

http://www.cisl.ucar.edu/
http://dss.ucar.edu
http://cdp.ucar.edu

