Date posted: 15-Apr-2017
Category: Data & Analytics
Uploaded by: blue-bridge
BlueBRIDGE receives funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 675680. www.bluebridge-vres.eu
The BlueBRIDGE approach to collaborative research
Gianpaolo Coro, CNR, [email protected]
Context
Progress in Information Technology has changed the paradigms of Science
The rapid growth in the volume and complexity of data requires new approaches to collecting, curating and analysing data
This in turn requires new tools that guarantee the exchange and longevity of the data and the re-application of experiments
Big Data
• Large volume
• High generation velocity
• Large variety
• Untrustworthy (veracity)
• High complexity (variability)
Big Data: a dataset with large volume, variety, generation velocity, containing complex and untrustworthy information that requires nonconventional methods to extract, manage and process information within a reasonable time.
New Science Paradigms
Open Science: make scientific research, data and dissemination accessible to all levels of an inquiring society, amateur or professional.
Keywords: Open Access, Open research, Open Notebook Science
E-Science: computationally intensive science carried out in highly distributed network environments that use large data sets and require distributed computing and collaborative tools.
Keywords: Provenance of the scientific process, Scientific workflows
Science 2.0: process and publish large data sets using a collaborative approach. Share everything from raw data to experimental results and processes. Support collaborative experiments and the Reproducibility-Repeatability-Reusability (R-R-R) of Science.
Keywords: collaborative and repeatable Science
Requirements for IT systems
• Support collaborative research and experimentation
• Implement the Reproducibility-Repeatability-Reusability of Science
• Allow sharing of data, processes and findings
• Grant free access to the produced scientific knowledge
• Tackle Big Data challenges
• Sustainability: low operational costs, low maintenance costs
• Manage heterogeneous access policies for data and processes
• Meet the requirements of industrial processes
e-Infrastructures
e-Infrastructures enable researchers at different locations across the world to collaborate in the context of their home institutions or in national or multinational scientific initiatives.
• People can work together with shared access to unique or distributed scientific facilities (including data, instruments, computing and communications).
Examples:
• Belief, http://www.beliefproject.org/
• OpenAire, http://www.openaire.eu/
• i-Marine, http://www.i-marine.eu/
• EU-Brazil OpenBio, http://www.eubrazilopenbio.eu/
Virtual Research Environments
• Define sub-communities
• Allow temporary dedicated assignment of computational, storage, and data resources
• Manage policies
• Support data and information sharing
[Figure: the e-Infrastructure integrates external e-Infrastructures into a Unified Resource Space, which enables multiple VREs; a WPS interface is shown alongside]
Virtual Research Environments
Innovative, web-based, community-oriented, comprehensive, flexible, and secure working environments.
• Communities are provided with applications to interact with the VRE services
• Client services are provided with both APIs (Java, R) and simple HTTP REST interfaces
VREs Example: The D4Science e-Infrastructure
D4Science supports scientists in several domains:
1. More than 25,000 taxonomic studies per month (www.i-marine.eu)
2. More than 60,000 species distribution maps produced and hosted (www.d4science.eu)
3. Used to build a pan-European geothermal energy map (www.egip.d4science.org)
4. Processing and management of heterogeneous environmental and Earth system data (www.envriplus.eu)
5. Enhances communication and exchange in Linguistic Studies, Humanities, Cultural Heritage, History and Archaeology (www.parthenos-project.eu)
BlueBRIDGE VREs
Stock Assessment: assess the health status of fisheries stocks.
http://www.bluebridge-vres.eu/services/stock-assessment
CMSY model
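CMSY, like the earlier Catch-MSY method it extends, works from a catch time series plus prior ranges for the intrinsic growth rate r and the carrying capacity k: it keeps the r-k pairs under which a Schaefer surplus-production model, B(t+1) = B(t) + r*B(t)*(1 - B(t)/k) - C(t), never collapses and ends within a prior range of relative biomass B/k. The sketch below illustrates only that filtering idea with uniform random draws. It is not the CMSY implementation hosted in the VRE, which adds refinements (e.g. sampling in log space and reduced productivity at very low stock size) not reproduced here; the catches, ranges and function names are illustrative assumptions.

```python
import random


def schaefer_trajectory(r, k, catches, start_depletion):
    """Project biomass with B[t+1] = B[t] + r*B[t]*(1 - B[t]/k) - C[t].

    start_depletion is the assumed initial B/k. Returns None if the
    stock collapses (biomass <= 0) at any step.
    """
    biomass = [start_depletion * k]
    for catch in catches:
        b = biomass[-1]
        nxt = b + r * b * (1 - b / k) - catch
        if nxt <= 0:
            return None
        biomass.append(nxt)
    return biomass


def viable_rk_pairs(catches, r_range, k_range, start_depletion,
                    end_depletion_range, n_draws=5000, seed=1):
    """Catch-MSY-style filter: keep (r, k) pairs whose trajectory survives
    and whose final B/k falls inside the assumed prior range."""
    rng = random.Random(seed)
    lo, hi = end_depletion_range
    kept = []
    for _ in range(n_draws):
        r = rng.uniform(*r_range)
        k = rng.uniform(*k_range)
        traj = schaefer_trajectory(r, k, catches, start_depletion)
        if traj is not None and lo <= traj[-1] / k <= hi:
            kept.append((r, k))
    return kept


def msy(r, k):
    # Under the Schaefer model, MSY = r*k/4, reached when B = k/2.
    return r * k / 4


# Illustrative (made-up) catch series and priors.
catches = [100, 120, 150, 130, 110]
pairs = viable_rk_pairs(catches, (0.2, 0.8), (500, 5000), 0.9, (0.3, 0.7))
msy_values = [msy(r, k) for r, k in pairs]
```

The spread of `msy_values` over the surviving pairs gives a rough idea of how uncertain the MSY estimate is under these assumed priors.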
Marine Protected Areas: reduce the adverse impact of human activities (e.g. fishing, aquaculture, tourism) on ecosystems, and ensure these activities are properly embedded in policy frameworks.
http://www.bluebridge-vres.eu/services/protected-area-impact-maps
Education VREs
Lecture-style: the emphasis on course topics varies with the audience
Interactive: after each explained topic, students do experiments
Experimental: students reproduce the experiment shown by the teacher and possibly repeat it on their own data
Social: students communicate via messaging or VRE discussion panel
• 1 course/year in Pisa
• 1 course/year in Paris
• 12 courses in Copenhagen (International Council for the Exploration of the Sea)
• 38 courses all over the world, 1000+ attendees
BlueBRIDGE VREs: Social Networking
Social networking is key to sharing information in an e-Infrastructure. BlueBRIDGE offers a continuously updated list of events and news produced by users and applications.
[Figure: user-shared news, application-shared news, and the "Share News" function]
BlueBRIDGE VREs: The Workspace, an online file storage system
A free-to-use, folder-based file system for managing and sharing information objects.
Information objects (files, datasets, workflows, experiments, etc.) can be:
• organized into folders
• shared
• disseminated via public URLs
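The Workspace description maps onto a small tree data structure. The following sketch models folders, typed information objects, per-folder sharing and public URLs in memory. All class names, the sharing rule and `PUBLIC_BASE_URL` are hypothetical and say nothing about how the BlueBRIDGE Workspace is actually implemented.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

PUBLIC_BASE_URL = "https://workspace.example.org/public/"  # hypothetical, not the real service


@dataclass
class InfoObject:
    name: str
    kind: str                           # e.g. "file", "dataset", "workflow", "experiment"
    public_token: Optional[str] = None  # set once the object is published


@dataclass
class Folder:
    name: str
    owner: str
    items: Dict[str, InfoObject] = field(default_factory=dict)
    subfolders: Dict[str, "Folder"] = field(default_factory=dict)
    shared_with: Set[str] = field(default_factory=set)

    def add(self, obj: InfoObject) -> InfoObject:
        self.items[obj.name] = obj
        return obj

    def mkdir(self, name: str) -> "Folder":
        # Subfolders inherit the owner; sharing is tracked per folder in this sketch.
        return self.subfolders.setdefault(name, Folder(name, self.owner))

    def share(self, user: str) -> None:
        self.shared_with.add(user)

    def can_read(self, user: str) -> bool:
        return user == self.owner or user in self.shared_with


def publish(obj: InfoObject) -> str:
    """Give an object a stable, hard-to-guess public URL (created on first call)."""
    if obj.public_token is None:
        obj.public_token = uuid.uuid4().hex
    return PUBLIC_BASE_URL + obj.public_token
```

Using a random token rather than the folder path keeps public links valid if the object is later moved or renamed.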
BlueBRIDGE Facilities: Overview
• Storage: databases, cloud storage, geospatial data
• Data management: metadata generation and management, harmonisation, sharing
• Processing: cloud computing, elastic resource assignment, multi-platform (R, Java, Fortran)
Data Processing
• Experiments on Big Data
• Sharing of inputs and results
• Saving the provenance of experiments
• Support for the R-R-R of experiments
[Figure: clients reach the Cloud Computing Platform through WPS and REST interfaces, exchanging inputs/outputs, parameters and provenance]
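WPS here refers to the OGC Web Processing Service standard. In version 1.0.0, a process can be run with a single HTTP GET request using key-value-pair (KVP) encoding, where the `DataInputs` parameter carries `name=value` pairs separated by `;`. The sketch below only builds such an Execute URL; the endpoint, process identifier and input names are hypothetical placeholders rather than BlueBRIDGE's actual services, and a real deployment may also require authentication.

```python
from urllib.parse import quote, urlencode


def wps_execute_url(endpoint: str, identifier: str, inputs: dict, version: str = "1.0.0") -> str:
    """Build an OGC WPS 1.0.0 Execute request in key-value-pair (KVP) encoding.

    Each input value is percent-encoded on its own, so reserved characters
    inside a value (';', '=', '/', ...) cannot clash with the DataInputs delimiters.
    """
    data_inputs = ";".join(f"{name}={quote(str(value), safe='')}" for name, value in inputs.items())
    params = {
        "service": "WPS",
        "version": version,
        "request": "Execute",
        "identifier": identifier,
        "DataInputs": data_inputs,
    }
    # Keep '=', ';' and the per-value '%' escapes literal instead of encoding them twice.
    return endpoint + "?" + urlencode(params, safe="=;%", quote_via=quote)


url = wps_execute_url(
    "https://wps.example.org/wps",      # hypothetical endpoint
    "org.example.STOCK_ASSESSMENT",      # hypothetical process identifier
    {"catch_file": "https://example.org/catches.csv", "years": 10},
)
print(url)
```

Because the request is a plain URL, any HTTP-capable client (a browser, R, Java, or GIS software such as QGIS) can invoke the same process.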
BlueBRIDGE computational capabilities
Project resources:
• 6 Virtual Machines (VMs) with 16 virtual CPU cores, 16 GB of RAM and 100 GB of storage
• 100 VMs with 2 virtual CPU cores, 8 GB of RAM and 20 GB of storage
Processes:
• ~200 algorithms hosted across all the VREs
• ~20 contributing institutes
• ~30,000 requests per month
• ~2,000 scientists/students in 44 countries using the VREs
• Programming languages: R, Java, Python, Fortran, Linux-compiled
External providers (European Grid Infrastructure):
• 6 VMs: 8 virtual CPU cores, 16 GB of RAM and 100 GB of storage
• 2 VMs: 16 virtual CPU cores, 32 GB of RAM and 100 GB of storage
• 24 VMs: 2 virtual CPU cores, 8 GB of RAM and 50 GB of storage
• 5 VMs: 4 virtual CPU cores, 8 GB of RAM and 80 GB of disk
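Adding up the VM groups above gives the overall capacity. The short calculation below assumes that each listed figure is per VM and that the groups do not overlap; the per-day request rate further assumes a 30-day month.

```python
# (number of VMs, vCPU cores per VM, RAM GB per VM, storage GB per VM), from the figures above
PROJECT = [(6, 16, 16, 100), (100, 2, 8, 20)]
EGI = [(6, 8, 16, 100), (2, 16, 32, 100), (24, 2, 8, 50), (5, 4, 8, 80)]


def totals(groups):
    """Sum VMs, cores, RAM and storage over groups of identical VMs."""
    return {
        "vms": sum(n for n, _, _, _ in groups),
        "cores": sum(n * c for n, c, _, _ in groups),
        "ram_gb": sum(n * r for n, _, r, _ in groups),
        "storage_gb": sum(n * s for n, _, _, s in groups),
    }


project, egi = totals(PROJECT), totals(EGI)
combined = {key: project[key] + egi[key] for key in project}
print(project)      # {'vms': 106, 'cores': 296, 'ram_gb': 896, 'storage_gb': 2600}
print(egi)          # {'vms': 37, 'cores': 148, 'ram_gb': 392, 'storage_gb': 2400}
print(combined)     # {'vms': 143, 'cores': 444, 'ram_gb': 1288, 'storage_gb': 5000}
print(30_000 / 30)  # 1000.0 requests per day on average
```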
Integrating new processes
Integration: putting a script that works offline into the Cloud computing platform.
Tools:
• https://wiki.gcube-system.org/gcube/How-to_Implement_Algorithms_for_the_Statistical_Manager
• https://wiki.gcube-system.org/gcube/Statistical_Algorithms_Importer
[Figure: an R script is imported automatically through SAI (the Statistical Algorithms Importer) into the computing platform, which exposes it via a Web interface and a Web service]
Advantages
• The process is available as-a-Service
• Invocation via communication standards
• Higher computational capabilities
• Automatic creation of a Web interface
• Provenance management
• Storage of results on a high-availability system
• Collaboration and sharing
• Re-usability from other software (e.g. QGIS)
Collaborative experiments
[Figure: inputs, outputs and results are exchanged through shared online folders, and a Web service (WS) connects them to the computational system, either within the e-Infrastructure or through third-party software]
Ensemble Model
Implementation of an ensemble model approach to support advice and management in fisheries.
Thorpe et al. (2015). Evaluation and management implications of uncertainty in a multispecies size-structured model of population and community responses to fishing. Methods in Ecology and Evolution, 6(1), 49-58.
• Input: diet information, life history diet information, historical fishing scenarios, MSY fishing scenarios, initial abundance values, life history prior information
• Process: Python script
• Output: total biomass, stock spawning biomass, life history traits
EM Integration
The EM interface runs each experiment on an infrastructure machine:
1. Download the scientist-provided Python script and the user's data
2. Execute the script
3. Collect the output
4. Destroy the local copies of the I/O and the script
5. Save the output on the user's private Workspace, with provenance information
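The EM execution cycle can be sketched with the Python standard library. This is a hypothetical illustration, not the BlueBRIDGE service: "downloading" is simulated by copying local files into a scratch directory, the script is assumed to take `<input file> <output dir>` as arguments, and, for brevity, outputs are copied into the workspace just before the scratch area (with all local copies) is deleted.

```python
import hashlib
import json
import shutil
import subprocess
import sys
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def run_em_cycle(script: Path, user_data: Path, workspace: Path) -> Path:
    """Run one script-on-data cycle and return the workspace folder holding the results."""
    workspace.mkdir(parents=True, exist_ok=True)
    started = datetime.now(timezone.utc).isoformat()
    # The scratch directory and every local copy inside it are deleted when the block exits.
    with tempfile.TemporaryDirectory() as tmp:
        scratch = Path(tmp)
        # Fetch the provider's script and the user's data into the scratch area.
        local_script = Path(shutil.copy(script, scratch / script.name))
        local_data = Path(shutil.copy(user_data, scratch / user_data.name))
        out_dir = scratch / "output"
        out_dir.mkdir()
        # Execute the script with the same interpreter running this code.
        subprocess.run([sys.executable, str(local_script), str(local_data), str(out_dir)], check=True)
        # Collect the outputs into a fresh, uniquely named folder of the user's workspace.
        run_dir = Path(tempfile.mkdtemp(prefix="em_run_", dir=workspace))
        outputs = sorted(p.name for p in out_dir.iterdir() if p.is_file())
        for name in outputs:
            shutil.copy(out_dir / name, run_dir / name)
        # Provenance record stored next to the results.
        provenance = {
            "script": script.name,
            "script_sha256": hashlib.sha256(script.read_bytes()).hexdigest(),
            "input": user_data.name,
            "input_sha256": hashlib.sha256(user_data.read_bytes()).hexdigest(),
            "outputs": outputs,
            "started": started,
            "finished": datetime.now(timezone.utc).isoformat(),
        }
        (run_dir / "provenance.json").write_text(json.dumps(provenance, indent=2))
    # Leaving the 'with' block removed the local copies of script, data and outputs.
    return run_dir
```

Hashing the script and the input makes it possible to tell later whether a stored result came from the current version of the provider's script.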
Scientific Workflow
1. The script provider updates the script on his/her private Workspace
2. The service downloads the script on-the-fly
3. A user executes an experiment on his/her data
4. The output, the input and the parameters can be shared with another user
5. This second user can execute the experiment again and share the computation with the first user
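Sharing "the output, the input and the parameters" is what lets a second user repeat an experiment. The sketch below records an experiment as a small JSON-serialisable object holding the algorithm name, the parameters and SHA-256 digests of input and output, then re-runs it and checks that the output matches. The record layout and the `moving_average` example algorithm are hypothetical, chosen only to illustrate a repeatability check.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


def digest(obj) -> str:
    # Canonical JSON (sorted keys) so equal content always gives the same SHA-256.
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class ExperimentRecord:
    algorithm: str
    parameters: dict
    input_digest: str
    output_digest: str


def run_and_record(algorithm, func, data, parameters):
    """Run an experiment and keep what another user needs to repeat it."""
    output = func(data, **parameters)
    return ExperimentRecord(algorithm, dict(parameters), digest(data), digest(output)), output


def repeat(record, func, data) -> bool:
    """Re-run a shared experiment and check that it reproduces the recorded output."""
    if digest(data) != record.input_digest:
        raise ValueError("input data differ from those recorded in the shared experiment")
    return digest(func(data, **record.parameters)) == record.output_digest


def moving_average(series, window):
    # Hypothetical example algorithm.
    return [sum(series[i:i + window]) / window for i in range(len(series) - window + 1)]


record, out = run_and_record("moving_average", moving_average, [1, 2, 3, 4], {"window": 2})
shared = json.dumps(asdict(record))                 # what the first user shares
received = ExperimentRecord(**json.loads(shared))   # what the second user loads
print(repeat(received, moving_average, [1, 2, 3, 4]))  # True
```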
Limitations and requirements
[Figure: required structure (Input → Script → Output) vs provided structure (a single script with embedded I/O)]
Issues:
• Code is often designed for one precise data set
• Prototype scripts often contain code that cannot be separated from the I/O
In the context of e-Infrastructures and Science 2.0:
• Modularity is necessary for integration
• Scripts should be reorganised so that they can be reused on other data without changing the code
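The issues above (code tied to one data set, I/O mixed with computation) are usually addressed by moving the computation into a function that receives data, and confining file paths to a thin entry point that a hosting platform can call with any input and output. A minimal sketch, assuming a hypothetical CSV with `species` and `catch` columns:

```python
import argparse
import csv
from pathlib import Path
from statistics import mean

# A prototype would typically hard-code its I/O, e.g. open("/home/me/catch_2015.csv")
# followed by print(), so it only works on one data set. Here the computation is
# separated from the I/O so the same code can run on any input file.


def mean_catch_by_species(rows):
    """Pure computation: average catch per species (no file access)."""
    by_species = {}
    for row in rows:
        by_species.setdefault(row["species"], []).append(float(row["catch"]))
    return {species: mean(values) for species, values in by_species.items()}


def main(argv=None):
    # All I/O is parameterised; a hosting platform (or a command line) supplies the paths.
    parser = argparse.ArgumentParser(description="Mean catch per species")
    parser.add_argument("input_csv", type=Path)
    parser.add_argument("output_csv", type=Path)
    args = parser.parse_args(argv)
    with args.input_csv.open(newline="") as f:
        result = mean_catch_by_species(csv.DictReader(f))
    with args.output_csv.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["species", "mean_catch"])
        for species, value in sorted(result.items()):
            writer.writerow([species, value])
```

For example, `main(["catches.csv", "means.csv"])` runs the same logic on any file, and `mean_catch_by_species` can be tested or reused without touching the file system.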
Web services (WS) turn scripts into self-consistent computational products, supporting:
• Repeatability: provenance (PROV-O)
• Reusability: use of standards
• Reproducibility
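PROV-O is the W3C ontology for provenance: an activity (a computation) uses entities (scripts, inputs), is associated with an agent (the user), and generates new entities (outputs). The sketch below writes one computation as PROV-O triples in Turtle syntax using plain string formatting. It does not claim to reproduce how BlueBRIDGE stores provenance; the `ex:` namespace and all identifiers are hypothetical, and identifiers must be valid Turtle local names (letters, digits, `_`).

```python
def prov_turtle(run_id, agent, script, inputs, outputs, started, ended,
                base="http://example.org/run/"):
    """Serialise one computation as W3C PROV-O triples in Turtle syntax."""
    used = ", ".join(f"ex:{name}" for name in [script, *inputs])
    lines = [
        "@prefix prov: <http://www.w3.org/ns/prov#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        f"@prefix ex: <{base}> .",
        "",
        f"ex:{run_id} a prov:Activity ;",
        f'    prov:startedAtTime "{started}"^^xsd:dateTime ;',
        f'    prov:endedAtTime "{ended}"^^xsd:dateTime ;',
        f"    prov:wasAssociatedWith ex:{agent} ;",
        f"    prov:used {used} .",
        f"ex:{agent} a prov:Agent .",
    ]
    # The script and the inputs are entities consumed by the activity.
    for name in [script, *inputs]:
        lines.append(f"ex:{name} a prov:Entity .")
    # Each output is an entity generated by the activity.
    for name in outputs:
        lines.append(f"ex:{name} a prov:Entity ; prov:wasGeneratedBy ex:{run_id} .")
    return "\n".join(lines)


ttl = prov_turtle("run42", "alice", "ensemble_model_py", ["diet_information"],
                  ["total_biomass"], "2017-04-15T10:00:00Z", "2017-04-15T10:05:00Z")
print(ttl)
```

Because the output follows a W3C standard, any RDF tool can load and query it, which is the reusability point made above.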
Conclusions
• E-Infrastructures endow processes with several Science 2.0 features
• BlueBRIDGE offers an e-Infrastructure and resources to host processes and to collaborate
• Algorithm providers need to invest effort to comply with service and generalisation requirements