ELIXIR: The Federated data infrastructure for Europe’s life-science research
www.elixir-europe.org @ELIXIREurope
A network of data Nodes
• ELIXIR Nodes are funded nationally
• ELIXIR Nodes build on national strengths and priorities
• ELIXIR Nodes provide a national framework for long-term resource management
de.NBI – The German Network for Bioinformatics Infrastructure
de.NBI consortium
• 39 project partners
• 30 institutions
• 8 service centers
• designated national German node in ELIXIR
www.denbi.de
ELIXIR Common Services – our federated infrastructure platforms
Data deposition: ENA, EGA, PDBe, EuropePMC, …
Data management: Genome annotation, Data management plans
Added value data resources: UniProt, Ensembl, OrphaNet, …
Data Interoperability: Standards, Identifiers, Ontologies
Bioinformatics tools: Bio.tools, Containers, Galaxy
Compute: Secure data transfer, cloud computing, AAI
Training: TeSS, Data Carpentry, eLearning
ELIXIR Structure
Five Platforms for Compute, Data, Tools and Interoperability
Complemented by Use Cases for Marine metagenomics, Rare diseases, Human data, Plant sciences, Proteomics, Metabolomics and Galaxy
Use cases under review:
• Microbial biotechnology
• Food and nutrition
• Human Cell Atlas
• Human copy number variation
OPINION ARTICLE
The future of metabolomics in ELIXIR [version 2; referees: 2 approved, 1 approved with reservations]
Merlijn van Rijswijk, Charlie Beirnaert, Christophe Caron, Marta Cascante, Victoria Dominguez, Warwick B. Dunn, Timothy M. D. Ebbels, Franck Giacomoni, Alejandra Gonzalez-Beltran, Thomas Hankemeier, Kenneth Haug, Jose L. Izquierdo-Garcia, Rafael C. Jimenez, Fabien Jourdan, Namrata Kale, Maria I. Klapa, Oliver Kohlbacher, Kairi Koort, Kim Kultima, Gildas Le Corguillé, Pablo Moreno, Nicholas K. Moschonas, Steffen Neumann, Claire O’Donovan, Martin Reczko, Philippe Rocca-Serra, Antonio Rosato, Reza M. Salek, Susanna-Assunta Sansone, Venkata Satagopam, Daniel Schober, Ruth Shimmo, Rachel A. Spicer, Ola Spjuth, Etienne A. Thévenot, Mark R. Viant, Ralf J. M. Weber, Egon L. Willighagen, Gianluigi Zanetti, Christoph Steinbeck
ELIXIR-NL, Dutch Techcentre for Life Sciences, Utrecht, 3503 RM, Netherlands
Netherlands Metabolomics Center, Leiden, 2333 CC, Netherlands
ADReM, Department of Mathematics and Computer Science, University of Antwerp, Antwerp, 2020, Belgium
ELIXIR-FR, French Institute of Bioinformatics, Gif-sur-Yvette, F-91198, France
Department of Biochemistry and Molecular Biomedicine, Faculty of Biology, Universitat de Barcelona, Barcelona, 08028, Spain
School of Biosciences, Phenome Centre Birmingham and Birmingham Metabolomics Training Centre, University of Birmingham, Birmingham, B15 2TT, UK
Computational and Systems Medicine, Department of Surgery and Cancer, Imperial College London, London, SW7 2AZ, UK
INRA, UNH, Human Nutrition Unit, PFEM, Metabolism Exploration Platform, MetaboHUB-Clermont, Clermont Auvergne University, Clermont-Ferrand, F-63000, France
Oxford e-Research Centre, Engineering Science Department, University of Oxford, Oxford, OX1 3QG, UK
Leiden Academic Centre for Drug Research, Leiden University, Leiden, 2300 RA, Netherlands
European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, CB10 1SD, UK
Centro Nacional Investigaciones Cardiovasculares, Madrid, 28029, Spain
CIBER de Enfermedades Respiratorias, Madrid, 28029, Spain
ELIXIR Hub, Cambridge, CB10 1SD, UK
Toxalim, UMR 1331, Université de Toulouse, Toulouse, F-31300, France
Metabolic Engineering and Systems Biology Laboratory, Institute of Chemical Engineering Sciences, Foundation for Research & Technology – Hellas (FORTH/ICE-HT), Patras, GR-26504, Greece
Biomolecular Interactions, Max Planck Institute for Developmental Biology, Tübingen, 72076, Germany
Department of Computer Science, University of Tübingen, Tübingen, 72076, Germany
Center for Bioinformatics, University of Tübingen, Tübingen, 72076, Germany
The Centre of Excellence in Neural and Behavioural Sciences, Tallinn, Tallinn, 10120, Estonia
School of Natural Sciences and Health, Tallinn University, 10120, 10120, Estonia
Department of Medical Sciences, Uppsala University, Uppsala, 752 36, Sweden
UPMC, CNRS, FR2424, ABiMS, Station Biologique, Roscoff, F-29680, France
F1000Research 2017, 6(ELIXIR):1649 Last updated: 08 NOV 2017
OPINION ARTICLE
A community proposal to integrate proteomics activities in ELIXIR [version 1; referees: 2 approved]
Juan Antonio Vizcaíno, Mathias Walzer, Rafael C. Jiménez, Wout Bittremieux, David Bouyssié, Christine Carapito, Fernando Corrales, Myriam Ferro, Albert J.R. Heck, Peter Horvatovich, Martin Hubalek, Lydie Lane, Kris Laukens, Fredrik Levander, Frederique Lisacek, Petr Novak, Magnus Palmblad, Damiano Piovesan, Alfred Pühler, Veit Schwämmle, Dirk Valkenborg, Merlijn van Rijswijk, Jiri Vondrasek, Martin Eisenacher, Lennart Martens, Oliver Kohlbacher
European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, CB10 1SD, UK
ELIXIR Hub, Cambridge, CB10 1SD, UK
Department of Mathematics and Computer Science, University of Antwerp, Antwerp, 2020, Belgium
French Proteomics Infrastructure ProFI, Grenoble (EDyP U1038, CEA/Inserm/Grenoble Alpes University), Toulouse (IPBS, Université de Toulouse, CNRS, UPS), Strasbourg (LSMBO, IPHC UMR7178, CNRS-Université de Strasbourg), France
ProteoRed, Proteomics Unit, Centro Nacional de Biotecnología (CSIC), Madrid, 28049, Spain
Biomolecular Mass Spectrometry and Proteomics, Bijvoet Centre for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Utrecht, 3548 CH, Netherlands
Netherlands Proteomics Center, Utrecht, 3584 CH, Netherlands
Analytical Biochemistry, Department of Pharmacy, University of Groningen, Groningen, 9713 AV, Netherlands
Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Prague 1, 117 20, Czech Republic
CALIPHO Group, SIB Swiss Institute of Bioinformatics, Geneva, 1015, Switzerland
Department of Human Protein Science, Faculty of Medicine, University of Geneva, Geneva, 1205, Switzerland
National Bioinformatics Infrastructure Sweden (NBIS), SciLifeLab, Department of Immunotechnology, Lund University, Lund, 223 62, Sweden
Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, Geneva, 1015, Switzerland
Computer Science Department, University of Geneva, Geneva, 1205, Switzerland
Institute of Microbiology, Czech Academy of Sciences, Prague 1, 117 20, Czech Republic
Center for Proteomics and Metabolomics, Leiden University Medical Center, Leiden, 2333 ZA, Netherlands
Department of Biomedical Sciences, University of Padova, Padova, I-35121, Italy
Center for Biotechnology, Bielefeld University, Bielefeld, 33615, Germany
Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense M, 5230, Denmark
Interuniversity Institute for Biostatistics and Statistical Bioinformatics, Hasselt University, Hasselt, 3500, Belgium
Center for Proteomics, University of Antwerp, Antwerpen, 2000, Belgium
Applied Bio & Molecular Systems, VITO, Mol, BE-2400, Belgium
Netherlands Metabolomics Centre, Utrecht, 3511 GC, Netherlands
Dutch Techcentre for Life Sciences / ELIXIR-NL, Utrecht, 3511 GC, Netherlands
Medical Bioinformatics, Medizinisches Proteom-Center, Ruhr-University Bochum, Bochum, 44801, Germany
VIB-UGent Center for Medical Biotechnology, Ghent, 9052, Belgium
Department of Biochemistry, Ghent University, Ghent, 9000, Belgium
Applied Bioinformatics, Department of Computer Science, University of Tübingen, Tübingen, 72074, Germany
Center for Bioinformatics Tübingen, University of Tübingen, Tübingen, 72074, Germany
Quantitative Biology Center, University of Tübingen, Tübingen, 72074, Germany
Biomolecular Interactions, Max Planck Institute for Developmental Biology, Tübingen, 72076, Germany
F1000Research 2017, 6:875 Last updated: 08 NOV 2017
Open data requires infrastructure
Open access life science data is intensively reused
Biosimulation market worth $1bn/yr (2015)
http://www.marketsandmarkets.com/Market-Reports/biosimulation-market-838.html
What are ELIXIR Core Data Resources?
• A set of data resources that are of fundamental importance to the broad life science community and the long-term preservation of biological data
• They provide complete collections of generic value to life science, and show high levels of usage, scientific quality and service
ELIXIR Core Data Resources – fundamentally important to life-science research
• 16 Core Data Resources nominated
• ELIXIR is committed to Open Access as a core principle for publicly funded research.
• Discussions on-going with Nodes, Resources and funders on high-quality, non-Open Access resources
• ELIXIR Core Data Resources should reflect this commitment and have terms of use or a license that enables the reuse and remixing of data.
• See “Identifying ELIXIR Core Data Resources”
• Agreed collectively by 21 Node directors
https://www.elixir-europe.org/platforms/data/core-data-resources
Large impact on science
• ELIXIR Core Data Resources – over 16 000 citations of key papers in 2015
• Plus direct citations of data records and identifiers in scientific literature
• > 20 000 articles with data citations (2014)
• > 88 000 direct citations of accessions in full-text open access articles (2014)
• ELIXIR Data Platform “metrics” group is working on a standard methodology
[Chart: citations of key papers (EuropePMC, 2015) for the ELIXIR Core Data Resources]
An infrastructure for bioeconomy innovation
2010-2015: 30 771 patents* referred to bioinformatics repositories
*Patterns of database citations in articles and patents indicate long-term scientific and industry value: https://f1000research.com/articles/5-160/v1
Towards a Global coalition to sustain Core Data Resources
• Call for Action published in Nature in March 2017
• Full text of article available as pre-print in bioRxiv
• June workshop in London with international funders
• Great interest in Core Data Resources (outcome and method)
• Outcomes taken into HIRO meeting following day
• Working Group established to take forward next steps
Changing landscape with many actors
• Highly distributed data-generating & monitoring
• Distributed analysis requires reference datasets (organized centrally, locally or in distributed networks)
• Manage Legal requirements in transnational settings
International Resources
National data centres
Institutional data centres
ELIXIR Position Paper on FAIR data management in the life sciences
1. Open sharing of research data is a core principle
2. Data Management is crucial to science
3. Data should be submitted to deposition databases
4. All data submitted to Open Data archives should align with community-defined standards
5. ELIXIR Nodes implement FAIR for their respective nations
6. Professional skills, adequate resources and appropriate funding are needed for Data Management and infrastructure
Blomberg N and ELIXIR Consortium. ELIXIR position paper on FAIR data management in the life sciences. F1000Research 2017, 6(ELIXIR):1857 (document) (doi: 10.7490/f1000research.1114985.1)
“Whenever possible, biological research data should be submitted to the recommended community deposition databases”
• The ELIXIR Deposition Databases meet the technical quality and governance criteria expected of ELIXIR Core Data Resources
• See “Identifying ELIXIR Core Data Resources”
• Agreed collectively by 21 Node directors
• International collaborative effort
https://elixir-europe.org/platforms/data/elixir-deposition-databases
“All data submitted to Open Data archives must be annotated in accordance with community-defined standards”
https://elixir-europe.org/platforms/interoperability
“FAIR data management requires professional skills and adequate resources”
Bring your own data workshops
• Problem-centered workshops
• Integration experts – Data resources – Users
• With national nodes or pan-European projects
“ELIXIR Nodes are the national implementation of a harmonised FAIR Data Management programme for the life sciences”
Findability
How do you find a needle in a federated haystack?
Bioschemas
“schema.org markup for life sciences – minimum properties needed for finding data”
http://bioschemas.org
Carole Goble, Alisdair Gray – ELIXIR-UK
Rafael Jimenez – ELIXIR Hub
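To make the idea of "schema.org markup" concrete, here is a minimal sketch of how a resource might describe one dataset as JSON-LD. It uses only the generic schema.org `Dataset` type and a handful of its standard properties; the actual minimum properties required by a given Bioschemas profile are not listed on these slides, so the property selection, the helper names `dataset_markup` and `as_script_tag`, and all example values are illustrative assumptions.

```python
import json


def dataset_markup(name, description, url, identifier, keywords):
    """Return a schema.org Dataset description as a JSON-LD string.

    The property set is an illustrative assumption, not a Bioschemas profile.
    """
    record = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "identifier": identifier,
        "keywords": keywords,
    }
    return json.dumps(record, indent=2)


def as_script_tag(jsonld):
    """Wrap JSON-LD in the script element used to embed it in a web page."""
    return f'<script type="application/ld+json">\n{jsonld}\n</script>'


if __name__ == "__main__":
    # Hypothetical dataset; every value below is a placeholder.
    markup = dataset_markup(
        name="Example proteomics dataset",
        description="Placeholder description of a deposited dataset.",
        url="https://repository.example.org/datasets/EX000001",
        identifier="EX000001",
        keywords=["proteomics", "mass spectrometry"],
    )
    print(as_script_tag(markup))
```

Embedding the resulting script element in the dataset's landing page is what lets search engines and registries pick up the description without a dedicated API.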
Bioschemas.org
Search engines, Registries, Data Aggregators
• Standardised metadata
• Metadata publish and harvest without APIs or special feeds
• Feed bio registries and aggregators
A community initiative built on top of schema.org to improve Findability and Accessibility in Life Sciences
• Rapid markup
• Exposed to harvesting
• Find
[Diagram: major data resources and smaller datasets, each labelled Bioschemas]
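The "harvest without APIs or special feeds" point can be illustrated with a small sketch: a harvester only needs to fetch a page and read any embedded JSON-LD blocks. The example below uses Python's standard `html.parser` and works on an HTML string, so it makes no network calls; the class name `JsonLdHarvester`, the function `harvest`, and the sample page are hypothetical, and a real aggregator would add fetching, validation and error reporting.

```python
import json
from html.parser import HTMLParser


class JsonLdHarvester(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so accumulate it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.records.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Skip malformed markup instead of failing the harvest.


def harvest(html):
    """Return all JSON-LD records embedded in an HTML document."""
    parser = JsonLdHarvester()
    parser.feed(html)
    parser.close()
    return parser.records


if __name__ == "__main__":
    # Hypothetical landing page with one embedded dataset description.
    page = (
        "<html><head>"
        '<script type="application/ld+json">'
        '{"@context": "https://schema.org", "@type": "Dataset", "name": "Example"}'
        "</script></head><body><p>Landing page</p></body></html>"
    )
    for record in harvest(page):
        print(record["@type"], record["name"])
```

Because the metadata travels with the page itself, the same markup can feed search engines, registries and aggregators alike.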
Bioschemas progress
Stages: Use case, Gap analysis, Spec, Test, Adoption, Applications
Data repositories ✓ ✓ ✓ ✓ ✓ ✓
Datasets ✓ ✓ ✓ ✓ ✓
Beacons ✓ ✓ ✓ ✓ ✓
Samples ✓ ✓ ✓ ✓ ✓
Protein annotations ✓ ✓ ✓ ✓ ✓
Biological Entity ✓ ✓
Event ✓ ✓ ✓ ✓ ✓ ✓
Training material ✓ ✓ ✓ ✓ ✓ ✓
Tools ✓ ✓ ✓
omicsDI
Early adopters
Google research blog: Facilitating the discovery of public datasets
“Research schemas” as emerging federation architecture in EOSC
[Diagram: research domains (Earth, Life, ...) each provide a dataset index, scientific files and PIDs behind common access; shared layers for Data and for Services (Compute, Storage, Transfer, …); EOSC Catalogue]
ELIXIR Compute Platform
https://www.elixir-europe.org/platforms/compute
Targeting a seamless workflow: a researcher may use their electronic identity to securely create a scientific software analysis environment, and use the environment to access large biological data resources stored on a cloud.
Reliable electronic identification of users (ELIXIR ID) is needed to access the key services and capacities of ELIXIR.
• You can link existing user accounts to create your ELIXIR ID today at www.elixir-europe.org
• ELIXIR AAI allows users to continue using their federated academic, corporate or social media identity by linking it to a personal ELIXIR ID.
• The ELIXIR service providers connected to ELIXIR AAI benefit from centralised user identity and access management services.
• Protocols: SAML2, OpenID Connect.
• https://www.elixir-europe.org/services/compute/aai
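As a rough illustration of what connecting a service over OpenID Connect involves, the sketch below builds the authentication request a service provider would redirect a user to at the start of the authorization code flow. The query parameters are the standard ones from OpenID Connect; the endpoint URL, client ID and redirect URI are placeholders, and nothing here is taken from the actual ELIXIR AAI configuration, which a real service would obtain when registering with the infrastructure.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse


def build_authorization_url(authorize_endpoint, client_id, redirect_uri, scope="openid"):
    """Build an OpenID Connect authentication request (authorization code flow).

    Returns the URL plus the state and nonce values, which the service must
    store and compare against the values returned after login.
    """
    state = secrets.token_urlsafe(16)  # ties the callback to this request
    nonce = secrets.token_urlsafe(16)  # ties the ID token to this request
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
        "nonce": nonce,
    }
    return f"{authorize_endpoint}?{urlencode(params)}", state, nonce


if __name__ == "__main__":
    # All three values are hypothetical placeholders.
    url, state, nonce = build_authorization_url(
        authorize_endpoint="https://aai.example.org/authorize",
        client_id="demo-service",
        redirect_uri="https://service.example.org/callback",
    )
    print(url)
    print(parse_qs(urlparse(url).query)["scope"])
```

After the user logs in with their linked home identity, the provider redirects back to the service with a code, which the service exchanges for tokens; that exchange is omitted here.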
ELIXIR Authentication and Authorization Infrastructure (AAI)
o 359 Home Organisation IdPs enabled for login (via eduGAIN)
o 987 ELIXIR users
o 155 groups created in ELIXIR AAI
o 61 registered Resource Providers
ELIXIR Cloud & Compute
ELIXIR Cloud capacities surveyed here: DK, DE, EBI, FI, FR, SUI confirmed capacity
> 60 000 compute cores
> 24 000 TB of storage
> 3 000 compute users
ELIXIR Cloud WG: towards interoperable clouds
Data storage and transfer, coupled to security
ELIXIR Industry Strategy
ELIXIR Innovation and SME Forum
Previous Events
Node-hosted events that present to companies the free tools and services made available through ELIXIR
• 8 events since 2014
• 350 companies have attended
• 50% of forum attendees, on average, are from the industry sector
• 95% attendee satisfaction rate
• Lots of networking opportunities
Upcoming Events
• Cambridge – UK, 24-25 January 2018: Enabling Discoverability in Bio-Data Innovation
• Munich – Germany (Dates TBA): Biotechnology
Themes
• Human Data: FI, ES, CH
• Rare Disease: FR
• Marine: NO
• Plant Sciences: NL
• Multi-domain: BE, DK
SPEAKERS:
• Wim Haentjens (European Commission, DG Research & Innovation –Agri-food unit)
• Peer Bork (European Molecular Biology Laboratory)
• Silvia Miret Catalan (Director Nutrition & Health Discover at Unilever)
DATA RESOURCE SHOWCASE | TRAINING | FLASH-TALK
ELIXIR Innovation and SME Forums – attendees – Quantitative Indicator
[Chart: total, private and academic attendees per forum: Copenhagen 2014, Wageningen 2015, Basel 2015, Oslo 2016, Helsinki 2017, Barcelona 2017, Brussels 2017, Paris 2017]
Outcome from innovation events: Qualitative Indicator
Node - collaboration
Service - exchange
Node - collaboration
ELIXIR in numbers
• 21 Members and 1 Observer
• ~ 180 institutes involved
• 600+ staff
• 16 Core Data Resources
• 23 Implementation Studies ongoing or soon to start
• 17 papers in ELIXIR F1000 channel
• 264 live events in TeSS
• 350 companies attended Innovation and SME programme