GRIB NetCDF Setting the scene - ECMWF · 2015. 11. 17. · PT and PV levels SCDA Forecast SCDA...

Date post: 25-Jan-2021
Transcript
  • Slide 1 © ECMWF

    GRIB NetCDF Setting the scene

  • Slide 2 © ECMWF

    GRIB at ECMWF

  • Slide 3 © ECMWF

    ECMWF model output is encoded in GRIB

    ● One parameter

    ● One date

    ● One time

    ● One step

    ● One level

    ● One forecasting system

    ● …

  • Slide 4 © ECMWF

    Size of the archive vs. Sustained HPC performance

  • Slide 5 © ECMWF

    GRIBs are stored in MARS

    ● 28 years in existence

    ● A managed archive

    ● MARS is not a file system

    – Users are not aware of the location of the data

    – Retrievals are expressed in meteorological terms

    ● An archive, not a database

    – Metadata online

    – Data offline (automated tape library)

  • Slide 6 © ECMWF

    MARS in numbers

    ● 53 Petabytes of primary data in ~ 11 million files, for more than 170 billion (1.7 · 10^11) meteorological fields

    ● ~ 800 Gigabytes of metadata

    ● 200 million fields added daily (peaks at 100 Terabytes)

    ● 650 active users/day executing 1.5 million requests/day

    ● ~ 100 Terabytes retrieved daily

  • Slide 7 © ECMWF

    Addition of data types in MARS

    [Figure: timeline from 1985 to 2010 of data types added to MARS (among them waves, EPS, 4D-Var, SCDA, DCDA, EFIs, Vareps/Monthly and ensemble data assimilation), shown against model resolutions from T106L16 to T1279L91 and computers from X-MP/4 to IBM-P6; the vertical axis runs from 10M to 10T.]

  • Slide 8 © ECMWF

    What data?

    ● Operational runs

    – Medium-range (15 days, twice a day, including ensemble)

    – Extended-range (a month), Long-range (a year)

    – Re-forecasts, ocean waves

    ● Projects

    – Reanalyses (15 years, 45 years, 100 years)

    – WMO: TIGGE, TIGGE-LAM, S2S

    – EU projects: DEMETER, ENSEMBLES, EURO4M, MACC, PROVOST, ECSN…

    ● Research experiments

    – ECMWF, Member States

    ● Member States’ Projects

    – COSMO-LEPS, ALADIN-LAEF

  • Slide 9 © ECMWF

    MARS language – Retrieve request

    retrieve,                          action
        class    = od,                 identification
        stream   = oper,
        expver   = 1,
        date     = -3,                 date & time related
        time     = 12,
        type     = analysis,           data related
        levtype  = model levels,
        levelist = 1/to/137,
        param    = temperature,
        grid     = 2.5/2.5,            post-processing
        target   = "analysis"          storage
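
    The request on this slide is a verb followed by keyword = value pairs. As a minimal sketch, such a request could be assembled from a Python dictionary. The helper name format_mars_request is hypothetical and not part of any ECMWF tool; it only builds text and does not contact MARS.

```python
def format_mars_request(verb, keywords):
    """Render a verb and keyword/value pairs as MARS-style request text.

    Hypothetical helper for illustration only: no validation is done.
    """
    lines = [f"{verb},"]
    items = list(keywords.items())
    for i, (key, value) in enumerate(items):
        # Every keyword line except the last one ends with a comma.
        sep = "," if i < len(items) - 1 else ""
        lines.append(f"    {key} = {value}{sep}")
    return "\n".join(lines)


# Keyword values copied from the slide.
request = format_mars_request("retrieve", {
    "class": "od",
    "stream": "oper",
    "expver": 1,
    "date": -3,
    "time": 12,
    "type": "analysis",
    "levtype": "model levels",
    "levelist": "1/to/137",
    "param": "temperature",
    "grid": "2.5/2.5",
    "target": '"analysis"',
})
print(request)
```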

  • Slide 10 © ECMWF

    Metview and GRIB

  • Slide 11 © ECMWF

    Metview and NetCDF

  • Slide 12 © ECMWF

    But we want more…

  • Slide 13 © ECMWF

    We want to archive NetCDF in MARS

    …and provide a service on par with what we do for GRIBs.

  • Slide 14 © ECMWF

    GRIB NetCDF Setting the scene (part 2)

  • Slide 15 © ECMWF

    Disclaimer

    GRIB vs. NetCDF: I have no preference

    I know GRIB better, that’s all.

  • Slide 16 © ECMWF

    GRIB

    ● Designed for telecoms

    ● As small as possible, table driven (you must read the doc :-)

    ● Record/message format, in memory

    ● No separation between format and data model

    ● Used in operational NWP, exchanged on the GTS

    ● Designed by committee

    ● No such thing as a “GRIB file”, just a file with GRIB messages

    % cat file1.grib file2.grib > file3.grib

    ● file3.grib is a valid “GRIB file”
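
    Why concatenation works can be sketched in a few lines: each message carries its own start marker, length and end marker, so a reader can walk a file message by message. The sketch below assumes the GRIB edition 2 layout of the indicator section ("GRIB", two reserved bytes, discipline, edition, 8-byte total length) and the "7777" end marker. The messages it builds are dummies with an arbitrary payload, not decodable fields, and the function names are hypothetical.

```python
import struct


def make_dummy_message(payload):
    """Build a message with a GRIB-edition-2-like envelope around a payload.

    Assumption: only the 16-byte indicator section and the '7777' end
    marker are modelled; the payload stands in for the real sections.
    """
    total = 16 + len(payload) + 4
    header = b"GRIB" + b"\x00\x00" + bytes([0, 2]) + struct.pack(">Q", total)
    return header + payload + b"7777"


def count_messages(stream):
    """Walk a byte string message by message using each header's length."""
    count, pos = 0, 0
    while pos < len(stream):
        assert stream[pos:pos + 4] == b"GRIB", "not at a message start"
        (total,) = struct.unpack(">Q", stream[pos + 8:pos + 16])
        assert stream[pos + total - 4:pos + total] == b"7777", "bad end marker"
        count += 1
        pos += total
    return count


file1 = make_dummy_message(b"field one") + make_dummy_message(b"field two")
file2 = make_dummy_message(b"field three")
# Equivalent of: cat file1.grib file2.grib > file3.grib
file3 = file1 + file2

print(count_messages(file1), count_messages(file2), count_messages(file3))
```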

  • Slide 17 © ECMWF

    NetCDF

    ● Primarily an API/library

    ● Self describing

    ● Needs a convention (CF)

    ● Clean separation between format and data model (CF)

    ● Loads of tools

    ● Used in academia, oceanography and climate modeling

    ● CF: community driven

    ● NetCDF is a file format:

    % cat file1.nc file2.nc > file3.nc

    ● file3.nc is NOT a valid NetCDF file.

  • Slide 18 © ECMWF

    NetCDF vs. GRIB/BUFR

    ● GRIB/BUFR: data and envelope are mixed

    ● NetCDF is an envelope

    ● CF: data stored using a convention

  • Slide 19 © ECMWF

    Converting from GRIB to NetCDF

    ● Metadata

    – Date and time

    – Parameters

    ● Data

    – Units

    – Grids

    ● Compression

    – Internal (Packing)

    – External (zlib)

    ● File structures

  • Slide 20 © ECMWF

    Metadata

  • Slide 21 © ECMWF

    Why do we need metadata?

    ● Type 1: we cannot use the data otherwise

    – E.g. description of the grid (latitudes, longitudes)

    – E.g. units

    ● Type 2: identification (used for indexing, used for querying)

    – E.g. date/time

    ● Type 3: Nice to have

    – E.g. contact details of principal investigator

    ● GRIB => NetCDF

    – How to map this metadata?

    – How to map the data?

  • Slide 22 © ECMWF

    Convention? What convention?

    ● Parameter names are well covered by CF.

    ● What about other attributes:

    – lat/lon?

    – lat/long?

    – latitude/longitude?

    – x/y?

    – y/x?

    ● And:

    – lev?

    – level?

    – height?

    – z?

    ● I have seen them all. Are users supposed to inspect new files before using them?
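
    One way a tool can avoid depending on variable names, assuming the files follow CF, is to identify coordinates from their attributes: CF recognises latitude and longitude through the standard_name attribute or through units such as degrees_north and degrees_east. The sketch below represents each file as a plain dictionary of variable attributes; the two example files and the function name find_coordinate are hypothetical.

```python
# Unit strings that CF accepts for latitude and longitude coordinates.
LAT_UNITS = {"degrees_north", "degree_north", "degree_N",
             "degrees_N", "degreeN", "degreesN"}
LON_UNITS = {"degrees_east", "degree_east", "degree_E",
             "degrees_E", "degreeE", "degreesE"}


def find_coordinate(variables, standard_name, accepted_units):
    """Return the name of the first variable whose attributes identify it."""
    for name, attrs in variables.items():
        if attrs.get("standard_name") == standard_name:
            return name
        if attrs.get("units") in accepted_units:
            return name
    return None


# Two hypothetical files that use different variable names.
file_a = {
    "lat": {"units": "degrees_north"},
    "lon": {"units": "degrees_east"},
    "t2m": {"units": "K"},
}
file_b = {
    "y": {"standard_name": "latitude", "units": "degrees_north"},
    "x": {"standard_name": "longitude", "units": "degrees_east"},
    "tas": {"units": "K"},
}

for variables in (file_a, file_b):
    print(find_coordinate(variables, "latitude", LAT_UNITS),
          find_coordinate(variables, "longitude", LON_UNITS))
```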

  • Slide 23 © ECMWF

    Units

  • Slide 24 © ECMWF

    Should conversion modify the data?

    ● Example: Total precipitation

    – NetCDF: kg m-2 (i.e. mm, assuming 1 l of water = 1 kg)?

    – GRIB: m m-2 ?

    – Mapping implies multiplication by 1000

    ● Is that acceptable?
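
    The factor of 1000 can be written out as a short sketch. It assumes that the GRIB value is a depth of water in metres and that water has a constant density of 1000 kg m-3, so that 1 m of water corresponds to 1000 kg m-2. The sample values and the function name are hypothetical.

```python
WATER_DENSITY = 1000.0  # kg m-3, assumed constant


def metres_of_water_to_kg_per_m2(depth_m):
    """Convert a depth of water in metres to a mass per unit area in kg m-2."""
    return depth_m * WATER_DENSITY


# Hypothetical precipitation values, as depths in metres.
tp_m = [0.0, 0.0005, 0.0123]
tp_kg_m2 = [metres_of_water_to_kg_per_m2(v) for v in tp_m]
print(tp_kg_m2)
```

    The multiplication is done in floating point, so the converted values are no longer bit-for-bit the ones that were archived, which is one reason to ask whether the conversion is acceptable.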

  • Slide 25 © ECMWF

    Parameter names

  • Slide 26 © ECMWF

    “Semantic Spectrum”

    More general <-> More specific

    BUFR, GRIB, NetCDF/CF

    Computer “friendly” <-> Human friendly

    surface_air_temperature

    discipline = meteorology

    parameter = temperature

    level = 2

    level_unit = meter

  • Slide 27 © ECMWF

    Two interesting examples:

    tendency_of_atmosphere_mass_content_of_particulate_organic_matter_dry_aerosol_expressed_as_carbon_due_to_emission_from_residential_and_commercial_combustion

    surface_upward_mass_flux_of_carbon_dioxide_expressed_as_carbon_due_to_emission_from_fires_excluding_anthropogenic_land_use_change

  • Slide 28 © ECMWF

    File structure

  • Slide 29 © ECMWF

    File structures

    ● How to structure NetCDF files?

    ● ECMWF golden rule:

    – File must be self describing

    – File name MUST NOT carry any semantics

    ● Consider:

    % mv ECMWF-ERA20C-geopotential-20010101.nc foo.nc

    ● What is in foo.nc ?

    – ncdump (or grib_dump) should tell us

  • Slide 30 © ECMWF

    So how do we structure a NetCDF file?

    ● One 2D field per file?

    ● One 3D field per file?

    ● Many 3D fields per file?

    ● Many 4D fields per file?

    => See effect on packing
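
    The options above amount to choosing which keys are allowed to vary inside one file. As a minimal sketch, the fields of a GRIB file can be described by their identifying keys and grouped in different ways. The key names, the values and the group function are hypothetical.

```python
from collections import defaultdict

# Hypothetical set of 2D fields: 2 parameters x 2 times x 3 levels = 12 fields.
fields = [
    {"param": p, "time": t, "level": lev}
    for p in ("t", "q")
    for t in (0, 6)
    for lev in (1000, 850, 500)
]


def group(fields, keys):
    """Group fields so that each group shares the same values for keys."""
    groups = defaultdict(list)
    for f in fields:
        groups[tuple(f[k] for k in keys)].append(f)
    return groups


layouts = [
    ("one 2D field per file", ("param", "time", "level")),
    ("one 3D field per file (all levels)", ("param", "time")),
    ("one 4D field per file (all levels and times)", ("param",)),
    ("everything in one file", ()),
]
for label, keys in layouts:
    print(label, "->", len(group(fields, keys)), "file(s)")
```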

  • Slide 31 © ECMWF

    File structure: the problem: how to map to NetCDF files?

    GRIB file

  • Slide 32 © ECMWF

    File structure: one file per field

    [Figure: each field of the GRIB file is written to its own NetCDF file, file1.nc to file12.nc]

  • Slide 33 © ECMWF

    File structure: group by time? By level? Both?

    GRIB file

    file1.nc

    file2.nc

    file3.nc

    file4.nc

  • Slide 34 © ECMWF

    File structure: group by time? By level? Both?

    [Figure: the fields of the GRIB file mapped into one NetCDF file, labelled A, B, C and D]

  • Slide 35 © ECMWF

    Date and time

  • Slide 36 © ECMWF

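    Date & time

    [Figure: dates D, D-1, D-2, D-3 plotted against times T, T+1, T+2, T+3, T+4]

  • Slide 37 © ECMWF

    Date & time

    [Figure: dates D, D-1, D-2, D-3 plotted against times T, T+1, T+2, T+3, T+4]

  • Slide 38 © ECMWF

    Date & time

    [Figure: dates D, D-1, D-2, D-3 plotted against times T, T+1, T+2, T+3, T+4]

  • Slide 39 © ECMWF

    Date & time

    [Figure: dates D, D-1, D-2, D-3 plotted against times T, T+1, T+2, T+3, T+4]

  • Slide 40 © ECMWF

    Date & time

    [Figure: dates D, D-1, D-2, D-3 plotted against times T, T+1, T+2, T+3, T+4]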

  • Slide 41 © ECMWF

    Date & time (Hindcasts)

    [Figure: for each of the dates D, D – 1Y, D – 2Y, D – 3Y and D – 4Y, a row of times T, T+1, T+2, T+3, T+4]

  • Slide 42 © ECMWF

    Date & time (long window 4D-var)

    [Figure: time axis T-2, T-1, T, T+1, T+2, T+3, T+4, divided into data assimilation and forecast]

  • Slide 43 © ECMWF

    Averaging: ensemble means

    [Figure: for date D, rows M1 to M7 plotted against times T, T+1, T+2, T+3, T+4]

  • Slide 44 © ECMWF

    Averaging: monthly means

    [Figure: for date D, rows M1 to M7 plotted against times T, T+1, T+2, T+3, T+4]

  • Slide 45 © ECMWF

    Averaging: monthly means of ensemble means

    [Figure: for date D, rows M1 to M7 plotted against times T, T+1, T+2, T+3, T+4]

  • Slide 46 © ECMWF

    Compression

  • Slide 47 © ECMWF

    GRIB simple packing

    ● Maps floating point range [Field_min, Field_max] to integer range [0, 2^n - 1]

    ● It’s equivalent to sampling the field into 2^n buckets

    – Packing is lossy

    ● n can be anything between 1 and 32 (the standard does not prevent n from being 255!)

    ● Most of the fields are packed with n = 16.

    – GRIB fields are half the size of the equivalent single-precision floats (or a quarter of double precision)

    ● Blind conversion from GRIB to NetCDF will create files twice as large (NC_FLOAT) or four times as large (NC_DOUBLE)
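
    The mapping described on this slide can be sketched as follows. This is a simplification: real GRIB simple packing expresses the scale through a reference value and binary and decimal scale factors, whereas the sketch uses a plain floating point scale. The sample values and the function names are hypothetical.

```python
def pack(values, nbits):
    """Map floats in [min, max] to integers in [0, 2**nbits - 1]."""
    lo, hi = min(values), max(values)
    levels = (1 << nbits) - 1
    # A constant field has no range; any non-zero scale works in that case.
    scale = (hi - lo) / levels if hi > lo else 1.0
    packed = [min(levels, round((v - lo) / scale)) for v in values]
    return packed, lo, scale


def unpack(packed, lo, scale):
    """Recover approximate floats from the packed integers."""
    return [lo + p * scale for p in packed]


field = [250.0, 263.15, 271.3, 288.72, 301.4]  # hypothetical values

for nbits in (8, 16):
    packed, lo, scale = pack(field, nbits)
    restored = unpack(packed, lo, scale)
    worst = max(abs(a - b) for a, b in zip(field, restored))
    # The error is bounded by half a bucket, i.e. scale / 2.
    print(nbits, "bits: bucket size", scale, "largest error", worst)
```

    With n = 16 each value takes two bytes, against four for a single-precision float and eight for a double, which is where the factors of two and four come from.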

  • Slide 48 © ECMWF

    NetCDF supports “simple packing”

    ● Using scale_factor and add_offset, and packing to NC_BYTE, NC_SHORT, NC_INT

    – Please note that these are signed (unsigned version comes with NetCDF4)

    ● Only multiples of 8 bits are supported

    – NC_BYTE = 8, NC_SHORT = 16, NC_INT = 32

    ● Missing values:

    – GRIB uses a “bitmap” to mark the missing values

    – NetCDF uses _FillValue to mark missing values

    – Consequence:

    ● When packing NetCDF to NC_BYTE or NC_SHORT, we have one fewer value than GRIB

    ● We cannot encode the same range
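
    A minimal sketch of packing to a signed 16-bit integer (NC_SHORT) with scale_factor and add_offset, where unpacked = packed * scale_factor + add_offset. It assumes that -32768 is chosen as _FillValue, which leaves 65535 usable values instead of the 65536 available to a 16-bit GRIB field with a bitmap. Missing values are represented by None, and the data and function names are hypothetical.

```python
FILL_VALUE = -32768                     # assumed choice for _FillValue
PACKED_MIN, PACKED_MAX = -32767, 32767  # remaining signed 16-bit values


def compute_packing(vmin, vmax):
    """Return (scale_factor, add_offset) mapping [vmin, vmax] onto the shorts."""
    if vmax == vmin:
        return 1.0, vmin
    scale_factor = (vmax - vmin) / (PACKED_MAX - PACKED_MIN)
    add_offset = vmin - PACKED_MIN * scale_factor
    return scale_factor, add_offset


def pack(values, scale_factor, add_offset):
    """Pack floats to shorts; None (missing) becomes the fill value."""
    return [FILL_VALUE if v is None else round((v - add_offset) / scale_factor)
            for v in values]


def unpack(packed, scale_factor, add_offset):
    """Apply unpacked = packed * scale_factor + add_offset, keeping missing."""
    return [None if p == FILL_VALUE else p * scale_factor + add_offset
            for p in packed]


data = [250.0, None, 301.4, 275.5]  # hypothetical field with one missing value
present = [v for v in data if v is not None]
scale_factor, add_offset = compute_packing(min(present), max(present))
packed = pack(data, scale_factor, add_offset)
restored = unpack(packed, scale_factor, add_offset)
print(packed)
print(restored)
```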

  • Slide 49 © ECMWF

    Conversion and “simple packing”: major issue

    ● GRIB applies simple packing per 2D field

    ● NetCDF may apply packing per 3D (space and level, space and time) and even 4D fields (space, level and time)

    ● Consequence: mapping floating point range [Field_min, Field_max] to integer range [0, 2^n - 1] is done on more values in the case of NetCDF

    – Higher loss of accuracy

    ● Conversion leads to loss of information !!!!!

    – That’s not good™
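
    The effect can be illustrated with a small numerical sketch. Two hypothetical levels with very different value ranges are packed to 16 bits, first with the range of the upper level alone (as GRIB would do per 2D field), then with the range of both levels together (as a packed 3D NetCDF variable would). The values and the function name are made up for illustration.

```python
def max_error(values, vmin, vmax, nbits=16):
    """Largest absolute error after packing values into nbits over [vmin, vmax]."""
    scale = (vmax - vmin) / ((1 << nbits) - 1)
    worst = 0.0
    for v in values:
        restored = vmin + round((v - vmin) / scale) * scale
        worst = max(worst, abs(v - restored))
    return worst


# Hypothetical values on two levels with very different ranges.
low_level = [0.0101, 0.0123, 0.0157, 0.0189]
high_level = [2.1e-06, 3.4e-06, 4.7e-06, 5.2e-06]

# Per 2D field: the range is taken from the upper level only.
per_field = max_error(high_level, min(high_level), max(high_level))

# Per 3D variable: the range is taken over both levels.
both = low_level + high_level
per_volume = max_error(high_level, min(both), max(both))

print("error on upper level, packed per field: ", per_field)
print("error on upper level, packed per volume:", per_volume)
```

    In this example the shared range is dominated by the lower level, so the upper level is left with only a handful of distinct packed values.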

  • Slide 50 © ECMWF

    Difference is small, but non-zero (showing precipitation)

  • Slide 51 © ECMWF

    Grids

  • Slide 52 © ECMWF

    Reduced Gaussian Grid

  • Slide 53 © ECMWF

    What do I want from this workshop?

    ● A general agreement on how to map GRIB to NetCDF (parameters, units, metadata, file structures,…)

    – So no one complains that we “are not doing it right”

    ● A general agreement on how we deal with future requirements (new grids, new parameters, …)

    – Maybe a tighter collaboration between WMO and the CF community, like we did for OGC?

