ESA UNCLASSIFIED - For Official Use Marco Peters | MILAN | 15/05/2019 | Slide 1
Enabling operational service provision with SNAP on EO data processing platforms
Marco Peters (1), Martin Böttcher (1), Thomas Storm (1), Norman Fomferra (1), Carsten Brockmann (1), Marcus Engdahl (2)
(1) Brockmann Consult GmbH, Germany; (2) ESA ESRIN, Italy
SNAP - Its Origin
• BEAM (est. 2002) - ESA toolbox for the optical sensors on Envisat
• NEST (est. 2008) - ESA SAR toolbox, built on top of BEAM
• SNAP (est. 2014) - ESA started the new toolbox development for the upcoming Sentinel platforms
• Development on a common base
• SNAP leverages the heritage of BEAM and NEST
• SNAP is built on 17 years of experience in EO software development and EO data processing & analysis
SNAP Processing
Batch processing within the GUI is possible
SNAP Processing
Powerful data processing via the CLI
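A minimal sketch of how batch runs through the CLI might be scripted. It only assembles gpt command lines and does not execute them. The graph file, the `${source}` placeholder inside it, the parameter names and the output folder are assumptions made for illustration; the options used (-S, -P, -t, -f) are standard gpt options.

```python
from pathlib import Path


def build_gpt_command(graph_xml, source, target, params=None, gpt="gpt"):
    """Assemble the gpt command line for one source product.

    Assumes the graph XML refers to its input as ${source} and to its
    parameters as ${name}; both are assumptions of this sketch.
    """
    cmd = [gpt, str(graph_xml), f"-Ssource={source}"]
    # -P<name>=<value> fills the ${name} placeholders of the graph.
    for name, value in sorted((params or {}).items()):
        cmd.append(f"-P{name}={value}")
    # -t names the target file, -f the output format.
    cmd += ["-t", str(target), "-f", "BEAM-DIMAP"]
    return cmd


def build_batch(graph_xml, sources, out_dir, params=None):
    """Build one command per input product, e.g. for a job scheduler."""
    out_dir = Path(out_dir)
    return [
        build_gpt_command(graph_xml, src, out_dir / (Path(src).stem + ".dim"), params)
        for src in sources
    ]
```

Each returned list can be handed to `subprocess.run(cmd, check=True)` on a machine where gpt is on the PATH.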
SNAP Processing
Use Java or Python to script the processing of your data
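A hedged sketch of what such a Python script could look like, assuming SNAP's Python bridge is installed and importable as `snappy`. The choice of the Resample operator, its `targetResolution` parameter, the helper names and the file paths are assumptions for illustration, not taken from the slides.

```python
def to_java_map(params, map_factory):
    """Copy a Python dict into a map object that offers put(key, value)."""
    java_map = map_factory()
    for key, value in params.items():
        java_map.put(key, value)
    return java_map


def resample_product(src_path, dst_path, resolution=60):
    """Read a product, resample it and write the result.

    Requires a configured SNAP Python bridge; the import is done lazily so
    that this module can still be loaded on machines without SNAP.
    """
    from snappy import GPF, HashMap, ProductIO

    product = ProductIO.readProduct(src_path)
    params = to_java_map({"targetResolution": resolution}, HashMap)
    # The operator is invoked through the Graph Processing Framework.
    resampled = GPF.createProduct("Resample", params, product)
    ProductIO.writeProduct(resampled, dst_path, "BEAM-DIMAP")
```

A call such as `resample_product("input.zip", "output.dim")` would then process one product; both file names are placeholders.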
Graph Processing Framework (GPF)
• Directed, acyclic processing graphs
• Follows the “pull-processing” paradigm: the processing request is propagated backwards through the graph
• Generated user interfaces: command line (gpt) and SNAP GUI
[Figure: processing graph with the nodes Read, AtmCorr, NoiseRed, CloudMask and Write, leading from the source product to the target product. Computed tiles such as tile (0,1) are reused. Source product: 3 bands, 3 x 4 tiles.]
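The pull-processing idea and the tile reuse named on this slide can be illustrated with a toy model. This is a sketch in plain Python, not the GPF API: the `Node` class, the graph topology and the arithmetic standing in for real operators are all assumptions.

```python
class Node:
    """One operator in a directed acyclic graph; tiles are computed on demand."""

    def __init__(self, name, func, sources=()):
        self.name = name
        self.func = func              # builds a target tile from the source tiles
        self.sources = list(sources)
        self.cache = {}               # tile index -> already computed tile
        self.computations = 0

    def get_tile(self, index):
        """Pull a tile: the request travels backwards to the source nodes."""
        if index not in self.cache:
            inputs = [src.get_tile(index) for src in self.sources]
            self.cache[index] = self.func(index, *inputs)
            self.computations += 1
        return self.cache[index]


# Hypothetical graph: Read feeds both AtmCorr and CloudMask, Merge joins them.
read = Node("Read", lambda idx: idx[0] * 10 + idx[1])
atm_corr = Node("AtmCorr", lambda idx, tile: tile + 1, [read])
cloud_mask = Node("CloudMask", lambda idx, tile: tile % 2 == 0, [read])
merge = Node("Merge", lambda idx, value, cloudy: None if cloudy else value,
             [atm_corr, cloud_mask])

# Nothing is computed until a target tile is requested.
result = merge.get_tile((0, 1))
```

Requesting tile (0,1) from the last node triggers exactly one read of that tile, although two operators consume it; the second consumer is served from the cache.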
Processing Platforms using SNAP
European Space Agency & national space agencies:
• Thematic Exploitation Platforms (TEPs), OSEA
• Mission Exploitation Platforms (e.g. Proba-V MEP)
• Computing Environments (e.g. RUS)
European Commission:
• Copernicus Data and Information Access Services (DIAS)
Copernicus Collaborative Ground Segments:
• CODE-DE, CEMS
Calvalus:
• In-house development by Brockmann Consult
Calvalus – Its Origin
• Calvalus started in 2009 as an ESA LET-SME project
• Processing system based on the MapReduce programming model combined with a Distributed File System (DFS)
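The MapReduce model mentioned above can be shown with a small single-process sketch. It is neither Calvalus code nor the Hadoop API; the product records, the region key and the function names are invented for illustration.

```python
from collections import defaultdict
from functools import reduce


def map_product(product):
    """Map step: emit (key, value) pairs for one input product."""
    yield product["region"], (product["value_sum"], product["pixel_count"])


def combine(a, b):
    """Reduce step: merge two partial aggregates of the same key."""
    return (a[0] + b[0], a[1] + b[1])


def run_mapreduce(products):
    grouped = defaultdict(list)
    for product in products:              # map phase; parallel on a real cluster
        for key, value in map_product(product):
            grouped[key].append(value)    # shuffle: group the values by key
    # reduce phase: one aggregate per key
    return {key: reduce(combine, values) for key, values in grouped.items()}


products = [
    {"region": "north", "value_sum": 10.0, "pixel_count": 4},
    {"region": "north", "value_sum": 6.0, "pixel_count": 4},
    {"region": "south", "value_sum": 3.0, "pixel_count": 1},
]
totals = run_mapreduce(products)
means = {region: s / n for region, (s, n) in totals.items()}
```

Because each product is mapped independently and the reduce step is associative, the work can be split across many nodes that read their inputs from the distributed file system.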
Calvalus – Processing Power
CalEsa, CalFin, EstHub, CalLand, CalMar, CalHzg
Processed in 2018: 24 million data products, 7 PB input data, 4 PB output data
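To put the 2018 totals into perspective, they can be converted into average rates. This is a back-of-the-envelope sketch that assumes decimal petabytes (10^15 bytes) and a load spread evenly over the whole year, neither of which is stated on the slide.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600   # 2018 was not a leap year
PB = 10**15                          # assumption: decimal petabytes
MB = 10**6

products = 24_000_000
input_bytes = 7 * PB
output_bytes = 4 * PB

products_per_second = products / SECONDS_PER_YEAR
input_rate_mb_s = input_bytes / SECONDS_PER_YEAR / MB
output_rate_mb_s = output_bytes / SECONDS_PER_YEAR / MB
avg_input_product_mb = input_bytes / products / MB
```

Under these assumptions the figures correspond to roughly 0.76 products per second, about 220 MB/s of sustained input and an average input product size of just under 300 MB.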
Benefits of SNAP/GPF
• The modular concept of SNAP allows operators to be added or removed without recompiling
• Software is relocatable: a SNAP module bundle can be copied within the cluster infrastructure, e.g. to a newly started VM
• SNAP's Graph Processing Framework allows multiple operations to be chained and data to be processed in memory
• Intermediate results do not need to be written
• Processing concepts are open for extension
• E.g. special streaming readers/writers to exploit the MapReduce model on Calvalus when aggregating data
Please Mind Your Steps
• Disable access to the web and the download of resources
• Often access to external resources is not allowed
• Possibly thousands of tasks download simultaneously
• Collect auxiliary data in advance, especially in a Docker container
• Better: define the source for auxiliary data (DEM, water mask, etc.) within the cluster
• Improves processing performance (less downloading)
• Relieves the server of the burden of serving many downloads
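One way to enforce these points in a processing task is to resolve auxiliary files only from a location staged inside the cluster and to fail fast otherwise. The sketch below is an illustration in plain Python, not a SNAP configuration mechanism; the function, the exception and the directory layout are hypothetical.

```python
from pathlib import Path


class AuxDataUnavailable(RuntimeError):
    """Raised when an auxiliary file has not been staged in advance."""


def resolve_aux_file(name, aux_root, allow_download=False):
    """Return the path of an auxiliary file (DEM, water mask, ...) staged in the cluster."""
    candidate = Path(aux_root) / name
    if candidate.is_file():
        return candidate
    if not allow_download:
        # Fail fast instead of letting thousands of tasks hit an external server.
        raise AuxDataUnavailable(
            f"{name} is not staged under {aux_root} and downloads are disabled"
        )
    raise NotImplementedError("download fallback is intentionally left out of this sketch")
```

A task would call, for example, `resolve_aux_file("dem.tif", "/shared/auxdata")` before it starts processing; both names are placeholders.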
Outlook
• New standard IO format for SNAP (work in progress)
• Cloud-readiness (stream from/to object storage)
• Full Zarr format compatibility
• Multi-resolution pyramids included
• Python interoperability (dask, dask-distributed, xarray, zarr)
• A new Virtual File System (VFS)
• Allows access to files on remote structures
• Supported protocols: Amazon Web Services S3, OpenStack Swift S3, HTTP
Thanks
Thank you
Hvala
Aitah
Kiitos
Danke
Merci
Köszönöm
Blagodaram
ευχαριστώ
Grazie
Takk
Obrigado
Gracias
Dziękuję
Dank u wel
Tak
Enabling operational service provision with SNAP on EO data processing platforms
Session: C4.01 - Big EO Data Analytics: Platforms and applications
Authors: Marco Peters, Martin Böttcher, Thomas Storm, Norman Fomferra, Carsten Brockmann
The European Space Agency is supporting the paradigm shift of moving the software to the data with its SentiNel Application Platform, SNAP. SNAP is known as a toolbox for visualisation, analysis and processing of raster EO data, specifically for Sentinel and Envisat data. SNAP comes with three user interfaces: SNAP Desktop for interactive work with EO data; the Command Line Interface (CLI), which allows batch processing; and the Application Programming Interface (API), which allows SNAP to be called from a user's own programs. The Graph Processing Framework (GPF) of SNAP is a key technology running in the background of all three interfaces. It supports chaining operators into sequences, lazy execution, in-memory processing sequences and automated parallelisation. In particular, the GPF makes SNAP ideal for efficient operational processing of large datasets, or NRT processing, in a cluster or cloud system.
SNAP is deployed on many EO cloud systems: it is available on all ESA Thematic Exploitation Platforms and all Copernicus DIASes. It is also available on the German National Collaborative Ground Segment (CODE-DE) and potentially other EO cloud systems not known to the authors.
Interactive data users can launch SNAP Desktop in a virtual machine on the cloud and have immediate access to all data provided by the respective system. However, the real strength of SNAP in a cloud or cluster environment is revealed when automated processing graphs are set up and integrated into a processing management system (which is provided individually by the respective cloud system). Together with access to the data archives in the cloud or on the cluster, large processing jobs can be executed, or a systematic, data-driven production can be implemented and executed. In a typical development cycle, the processing graph is tested interactively in SNAP Desktop and the (generalised) graph is then executed in CLI mode on large datasets or in NRT mode.
The Apache Hadoop-based Calvalus system for massively parallel processing of EO data is designed to optimally support SNAP's Graph Processing Framework and is thus well suited as a processing system on a cloud or cluster infrastructure. Calvalus with SNAP is available to users on several public and private clouds and clusters, such as three national Copernicus collaborative ground segments (Germany, Finland, Estonia), the ESA Urban TEP and institutional clusters in Germany, Ireland, the UK and Italy. Brockmann Consult operates Calvalus on a private cloud with 500 processing nodes and 2 PB of online storage. SNAP has been used there for very large processing jobs generating ESA Climate Change data for Land Cover, Fire and Water Vapour, and it runs operationally in NRT to provide services to Brockmann Consult's customers. Special Calval functions with SNAP on Calvalus allow fast match-up extraction, time series analysis, as well as prototyping and testing of algorithm changes.
In this presentation we present the capabilities of SNAP on EO cloud platforms, using the Urban TEP and the German CODE-DE platform as examples. Using the Calvalus private cloud at Brockmann Consult as an example, we discuss advantages as well as shortcomings of SNAP on clusters in several operation modes, and conclude with recommendations for future work to better support EO data users with instrument toolboxes in a cloud and/or cluster environment.