The MashMyData project Combining and comparing environmental science data on the web Alastair...

Post on 20-Jan-2016

The MashMyData project: Combining and comparing environmental science data on the web

Alastair Gemmell¹, Jon Blower¹, Keith Haines¹, Stephen Pascoe², Phil Kershaw², Bryan Lawrence², Simon Woodman³, Hugo Hiden³

1. Reading e-Science Centre (ReSC) @ University of Reading

2. Centre for Environmental Data Archival (CEDA) @ British Atmospheric Data Centre

3. School of Computing Science @ University of Newcastle

Outline

• Background to MashMyData

• Motivation

• Challenges

• Interoperability and project architecture

• Current state of the project

• The future of the project

MashMyData Background

• NERC-funded project under the ‘Technology proof of concept’ programme

• Commenced 1st February 2010. Runs until 30th June 2011

• Aiming to present some of our later outputs at EGU 2011

• Here we introduce the project and show its current status and plans for the future.

• Funded partners are Reading e-Science Centre (ReSC) and the Centre for Environmental Data Archival (CEDA)

Motivation

• Environmental scientists use many diverse data sources including:

• in-situ measurements (e.g. ocean buoys, radiosondes)

• remotely-sensed data (e.g. satellite, radar)

• numerical simulations

• However, this results in considerable heterogeneity in data formats and data access methods, and therefore in the software suited to each

• We want to allow scientists from different disciplines to bridge between a variety of datasets regardless of the underlying data formats etc.

Technical Challenges

• The MashMyData project must overcome a number of technical challenges to succeed

• These overlap substantially with important challenges in the wider e-Science community

• The solutions are therefore potentially widely applicable in the future

• Challenges:

• Dealing with data diversity

• Performing calculations remotely in a way that scales

• Accessing secure data, and the delegation problem

• Enabling traceability and reproducibility

Integrating web services and technologies

• Recent discussion on the gains of re-using existing e-Science

• We have identified a number of existing web services and technologies and integrated them in the MashMyData project:

• Reading e-Science Centre’s ncWMS/Godiva2 Web Map Service (displaying gridded environmental data)

• Centre for Environmental Data Archival’s Web Processing Service (number crunching for compute-intensive workflows)

• Newcastle University’s e-Science Central software (upload, workflows, versioning)

• University of Liege’s DIVA-on-web service (interpolating geospatial point data)
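
As an illustration of how a client might talk to a Web Map Service such as ncWMS, the sketch below assembles a standard WMS 1.3.0 GetMap request. The endpoint URL and the layer name are placeholders invented for this example, not real MashMyData addresses; only the parameter names come from the WMS specification.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_getmap_url(endpoint, layer, bbox, width=512, height=256, time=None):
    """Build a WMS 1.3.0 GetMap URL for a single layer.

    bbox is (min_lon, min_lat, max_lon, max_lat) in CRS:84,
    which uses longitude/latitude axis order.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",            # empty value asks for the server's default style
        "CRS": "CRS:84",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
        "TRANSPARENT": "true",   # lets layers be overlaid on one another
    }
    if time is not None:
        params["TIME"] = time    # optional time dimension
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint and layer name, for illustration only.
url = build_getmap_url(
    "http://example.org/ncWMS/wms",
    "OCEAN/sst",
    (-180, -90, 180, 90),
    time="2010-02-01T00:00:00Z",
)
query = parse_qs(urlparse(url).query, keep_blank_values=True)
print(query["REQUEST"], query["BBOX"])
```

Requesting each dataset as a separate transparent PNG is one way a viewer can overlay several layers on the same map.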

Architecture

Current project status

• First important step was to add multi-dataset capability to the Godiva2 viewing portal.

• As part of this we have added the ability to view in-situ point data as well as gridded data

• This paves the way for mashing up datasets (e.g. produce average or difference of existing datasets)

• Security is being engineered currently, as is the Web Processing Service (required for mash-up workflows)
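
The kind of mash-up mentioned above (an average or difference of two datasets) can be sketched in a few lines. This is a minimal illustration, assuming both datasets have already been brought onto the same grid and that missing cells are marked with None; the function names and sample values are invented for the example and are not taken from the project code.

```python
def combine_grids(a, b, op):
    """Apply op cell by cell to two grids that share the same shape.

    Grids are lists of rows. A cell that is missing (None) in either
    input stays missing in the result.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("grids must share the same shape")
    return [
        [None if x is None or y is None else op(x, y) for x, y in zip(ra, rb)]
        for ra, rb in zip(a, b)
    ]


def difference(a, b):
    """Cell-wise a minus b."""
    return combine_grids(a, b, lambda x, y: x - y)


def average(a, b):
    """Cell-wise mean of a and b."""
    return combine_grids(a, b, lambda x, y: (x + y) / 2)


# Made-up sample values standing in for a model field and an observed field.
model = [[10.0, 12.0], [14.0, None]]
obs = [[9.0, 13.0], [14.0, 15.0]]

print(difference(model, obs))
print(average(model, obs))
```

In practice the regridding step that precedes this is usually the harder part, which is one reason the calculation is delegated to a processing service.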

Web Interface

• Click and drag a layer metadata box into a new position

• This alters the layer stacking on the map to enable layers to be moved towards the front or back

• The opacity of the layers can also be modified to reveal those underneath
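
The stacking and opacity behaviour described above can be modelled as an ordered list of layers blended from back to front. The sketch below is an assumption about how such a viewer could work, not the Godiva2 implementation; the Layer class, the layer names and the single grey-level pixel are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    opacity: float = 1.0  # 0.0 = fully transparent, 1.0 = fully opaque


def move_layer(stack, name, new_index):
    """Return a new stack with the named layer moved to new_index.

    Index 0 is the back of the map; the last index is the front.
    """
    layers = list(stack)
    pos = next(i for i, layer in enumerate(layers) if layer.name == name)
    layers.insert(new_index, layers.pop(pos))
    return layers


def composite(stack, values):
    """Blend one grey-level pixel from back to front ('over' blending)."""
    out = 0.0  # background value
    for layer in stack:
        out = layer.opacity * values[layer.name] + (1.0 - layer.opacity) * out
    return out


stack = [Layer("sst"), Layer("buoys"), Layer("ice", opacity=0.5)]
pixel = {"sst": 0.2, "buoys": 0.8, "ice": 1.0}

print(composite(stack, pixel))                        # half-transparent ice over buoys
print([l.name for l in move_layer(stack, "sst", 2)])  # sst dragged to the front
```

Lowering the opacity of a front layer lets the layers beneath contribute to the final pixel, which is the effect the opacity control exposes.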

Interface with e-Science Central

• User can upload data via the e-Science Central API.

• Thereafter they can view available data sources and workflows

• User can run a given workflow on data of choice and this will execute the workflow in e-Science Central

• This interface with e-Science Central will be invisible to the user – they just know they can upload data, view it and run workflows

• File and workflow versions and their metadata are recorded by e-Science Central
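
To make the idea of recorded versions and metadata concrete, here is a small in-memory stand-in. It is purely illustrative and does not use or reproduce the e-Science Central API; the VersionedStore class, its methods and the choice to skip a new version when the content is unchanged are all assumptions made for this sketch.

```python
import hashlib


class VersionedStore:
    """Hypothetical in-memory repository that records a version per upload."""

    def __init__(self):
        self._versions = {}  # file name -> list of (sha256 hex digest, metadata)

    def upload(self, name, data, metadata=None):
        """Store data (bytes) under name and return its version number."""
        digest = hashlib.sha256(data).hexdigest()
        history = self._versions.setdefault(name, [])
        if history and history[-1][0] == digest:
            return len(history)  # identical content: keep the current version
        history.append((digest, dict(metadata or {})))
        return len(history)

    def history(self, name):
        """Return the recorded (digest, metadata) pairs, oldest first."""
        return list(self._versions.get(name, []))


store = VersionedStore()
v1 = store.upload("sst.nc", b"first upload", {"source": "buoy"})
v2 = store.upload("sst.nc", b"corrected upload", {"source": "buoy"})
print(v1, v2, len(store.history("sst.nc")))
```

Recording a content digest alongside each version is one simple way to support the traceability and reproducibility goals listed among the challenges.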

Examples of work in progress

Further Work

• Finish integration with CEDA’s WPS (currently works with a simple test process)

• This in turn will pave the way for adding mash-up functionality to the web interface.

• Finish engineering the security solution. This will allow access to secure datasets (e.g. Met Office) for certain authorised users.

• Continue meetings with test case users to ensure that the system meets their needs (so far so good but relatively early days!)
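
For the WPS integration mentioned above, a client request could look roughly like the following. The sketch builds a WPS 1.0.0 Execute request in key-value-pair form; the endpoint, the process identifier and the input names are hypothetical and do not describe CEDA's actual service.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_wps_execute_url(endpoint, identifier, inputs):
    """Build a WPS 1.0.0 Execute URL using key-value-pair encoding.

    inputs maps input names to literal values; they are joined as
    name=value pairs separated by semicolons in the DataInputs parameter.
    """
    data_inputs = ";".join(f"{key}={value}" for key, value in inputs.items())
    params = {
        "service": "WPS",
        "version": "1.0.0",
        "request": "Execute",
        "identifier": identifier,
        "DataInputs": data_inputs,  # urlencode percent-encodes '=' and ';'
    }
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint, process name and inputs, for illustration only.
wps_url = build_wps_execute_url(
    "http://example.org/wps",
    "difference",
    {"first": "sst_model", "second": "sst_obs"},
)
wps_query = parse_qs(urlparse(wps_url).query)
print(wps_query["identifier"], wps_query["DataInputs"])
```

Long-running processes would normally be requested asynchronously, with the client polling a status document; that part is omitted here.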

Thanks!

a.l.gemmell@reading.ac.uk

www.mashmydata.org (Not live yet, but currently links to our project page including svn on Google Code)