Overview of EMBL-European Bioinformatics Institute
and Interactions with CDISC
Dominic Clark
Industry Programme Manager
www.ebi.ac.uk/industry
Key topics
• EMBL-EBI Background, Services and Standards activities
• EMBL-EBI working with Industry
• The Genomic Standards Consortium
• Challenges ahead
Our mission
To contribute to the advancement of biology through basic investigator-driven research in bioinformatics
What is EMBL-EBI?
• Part of the European Molecular Biology Laboratory
• International, non-profit scientific institute
• Europe’s hub for biological data services
Where is EMBL-EBI?
© John Freebury
• We share a campus with the Wellcome Trust Sanger Institute
• Near Cambridge, UK
EMBL-EBI
Hinxton data centre (most services run from data centres in London)
14/11/2013 9
Our funders
EMBL member states: Austria, Belgium, Croatia, Denmark,
Finland, France, Germany, Greece, Iceland, Ireland, Israel,
Italy, Luxembourg, the Netherlands, Norway, Portugal, Spain,
Sweden, Switzerland, United Kingdom.
Associate member state: Australia
Other major funders: the European Commission,
UK Research Councils, the US National Institutes of Health
and the Wellcome Trust
EMBL-EBI users: a snapshot
The new EBI building & ELIXIR technical hub
Who we are
~500 members of staff
~53 nationalities
~400 in services & support
~100 focus on basic research
EMBL-EBI works collaboratively
(Diagram: collaboration at every scale: Hinxton, Cambridge, UK, Europe, global)
EMBL-EBI research collaborations
We share funding and author publications with partner institutes throughout the world:
• 327 publications in 2011 (90% in collaboration with other institutes)
• 843 grants shared with other institutes in 2011
Data and tools for molecular life science
Services
www.ebi.ac.uk/services
Atlas
what happens where
From molecules to medicine
Biology is changing:
• Data explosion
• New types of data
• Emphasis on systems
• Growth of applied biology:
  • molecular medicine
  • agriculture
  • food
  • environmental sciences
Big and bigger data
Key principles about our services
• Freely available
• A comprehensive collection of molecular databases
• Globally coordinated data collection and dissemination
• Produced in collaboration with other world leaders, e.g.:
  • NCBI (United States)
  • Wellcome Trust Sanger Institute (United Kingdom)
  • National Institute of Genetics (Japan)
  • SIB Swiss Institute of Bioinformatics (Switzerland)
Data resources at EMBL-EBI
• Genes, genomes & variation
• RNA, protein & metabolite expression
• Protein sequences, families & motifs
• Molecular & cellular structures
• Reactions, interactions & pathways
• Chemical biology
• Ontologies & biological samples
• Scientific literature
Data resources at EMBL-EBI
Genomes & variation
• Ensembl
• Ensembl Genomes
• Genome-phenome Archive
• Metagenomics
Nucleotide sequences
• European Nucleotide Archive (ENA)
Expression
• ArrayExpress
• Expression Atlas
• PRIDE
• R-Workbench
Proteins
• The Universal Protein Resource (UniProt)
• InterPro
Chemical biology
• ChEMBL
• ChEBI
Literature & ontology
• Europe PubMed Central
• Gene Ontology
Molecular structures
• Protein Data Bank in Europe
• PDBsum
• ProFunc
Pathways
• IntAct
• Reactome
• MetaboLights
Systems
• BioModels
• Enzyme Portal
• BioSamples
Patent sequences
• Non-redundant patent sequence databases
• Patent compounds
Standards development – international collaborations
Genomes
www.geneontology.org
gensc.org
Functional Genomics
www.fged.org
Protein sequence
www.uniprot.org
Proteomics
www.psidev.info/
Protein structure
www.wwpdb.org
Cheminformatics
www.ebi.ac.uk/chebi
Pathways
www.reactome.org
www.biopax.org
Systems modeling
www.sbml.org
www.sbgn.org
Metabolomics
www.metabolomicssociety.org
Literature and text mining
www.pistoiaalliance.org/
Nucleotide sequence
www.insdc.org
www.barcodeoflife.org/
Database collaborations: we collaborate on standards and take part in global data-sharing agreements for all our major databases.
2005: The Genomic Standards Consortium
• A vast and rich body of information has grown up as a result of the world’s enthusiasm for ’omics technologies. Finding ways to describe this information and make it available so that its usefulness is maximised has become a major effort across the ’omics world. At the heart of this effort is the Genomic Standards Consortium (GSC), an open-membership organization that drives community-based standardization activities.
• The GSC calls on the scientific community to join forces to improve the quality and quantity of contextual information about our public collections of genomes, metagenomes and marker gene sequences.
The GSC’s Mission
• the implementation of new genomic standards
• methods of capturing and exchanging metadata
• harmonization of metadata collection and analysis efforts across the wider genomics community
Community-driven solutions
The path:
• Identify the problem
• Define a community to address it
• Define scope of the solution
• Implement solution
• Gain adoption of solution
Data standardization at ENA
Petra ten Hoopen
European Nucleotide Archive
European Nucleotide Archive
http://www.ebi.ac.uk/ena/home
Permanent and comprehensive repository for public
domain nucleotide sequences and associated information
• Archiving
• Helpdesk
• Training
• Standards development
• Technology development
• Community building
ENA data model
Data = raw reads and nucleotide sequence assemblies
Metadata = information associated with sequences, including the provenance of the biological sample (sample), the sequencing experiment (experiment) and its scope (study), the analysis and annotation of sequences (analysis), and the files of raw data (run)
(Diagram: the metadata objects Study, Experiment, Analysis, Sample and Run, linked to the Data)
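To make the ENA data model concrete, here is a minimal Python sketch of the five metadata objects and how they point at each other. It is a simplified illustration, not the ENA schema: the class and field names, the `provenance` helper and the placeholder accessions such as `STUDY-1` are hypothetical, and the links assumed here (an experiment belongs to one study and one sample, a run belongs to one experiment, an analysis belongs to one study) are a simplification.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, simplified model of the ENA metadata objects.
# Names and fields are illustrative only; they are not the ENA XML schema.

@dataclass
class Study:
    accession: str
    title: str            # the scope of the sequencing effort

@dataclass
class Sample:
    accession: str
    organism: str         # provenance of the biological material

@dataclass
class Experiment:
    accession: str
    study: Study          # assumed: each experiment belongs to one study
    sample: Sample        # assumed: ...and describes how one sample was sequenced

@dataclass
class Run:
    accession: str
    experiment: Experiment
    files: List[str] = field(default_factory=list)   # raw read files (the data)

@dataclass
class Analysis:
    accession: str
    study: Study
    files: List[str] = field(default_factory=list)   # e.g. assemblies or annotation

def provenance(run: Run) -> dict:
    """Walk from a run's data files back to its experiment, sample and study."""
    exp = run.experiment
    return {
        "files": run.files,
        "experiment": exp.accession,
        "sample": exp.sample.accession,
        "organism": exp.sample.organism,
        "study": exp.study.accession,
    }

# Placeholder records (hypothetical accessions).
study = Study("STUDY-1", "Soil metagenome survey")
sample = Sample("SAMPLE-1", "soil metagenome")
exp = Experiment("EXP-1", study, sample)
run = Run("RUN-1", exp, ["reads_1.fastq.gz", "reads_2.fastq.gz"])
analysis = Analysis("ANALYSIS-1", study, ["assembly.fasta"])

print(provenance(run))
```

Because every object holds a reference to the one above it, the provenance of any data file can be recovered by following those references, which is the point of keeping metadata as separate, linked objects.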
ENA data standardization
Standardized reporting requirements for all metadata and data objects (study, experiment, analysis, sample, run and data):
• agreed by the INSDC and by consortia of scientific domain-specific experts (e.g. GSC, MicroB3, RNAcentral)
• implemented with community-agreed checklists, controlled vocabularies and data-type-specific file formats
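A checklist combined with controlled vocabularies amounts to a set of mandatory fields plus lists of allowed values for some of them. The Python sketch below shows only that idea; the checklist contents (`organism`, `collection_date`, `country`, `env_medium` and its allowed values) and the `validate` helper are hypothetical and are not taken from any actual ENA or GSC checklist.

```python
from typing import Dict, List

# Hypothetical checklist: mandatory fields plus a controlled vocabulary
# for one of them. Field names and allowed values are illustrative only.
CHECKLIST = {
    "mandatory": ["organism", "collection_date", "country", "env_medium"],
    "vocabularies": {
        "env_medium": {"soil", "sediment", "seawater", "freshwater"},
    },
}

def validate(record: Dict[str, str], checklist: dict) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    # 1. Every mandatory field must be present and non-blank.
    for name in checklist["mandatory"]:
        if not record.get(name, "").strip():
            problems.append(f"missing mandatory field: {name}")
    # 2. Fields with a controlled vocabulary must use an allowed term.
    for name, allowed in checklist["vocabularies"].items():
        value = record.get(name)
        if value and value not in allowed:
            problems.append(f"{name}={value!r} not in controlled vocabulary")
    return problems

good = {"organism": "soil metagenome", "collection_date": "2012-06-01",
        "country": "United Kingdom", "env_medium": "soil"}
bad = {"organism": "soil metagenome", "env_medium": "dirt"}

print(validate(good, CHECKLIST))  # []
print(validate(bad, CHECKLIST))
```

Returning all problems at once, rather than stopping at the first, mirrors the goal of user-friendly submission: a submitter can fix every issue in one pass.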
ENA checklists
30 checklists for assembled and annotated sequences in the Webin submission system
Large scale
• WGS unannotated
• WGS annotated
• EST
• GSS
• STS
• TSA unannotated
• TSA annotated
Community Standards
• Barcode COI
• MIMARKS 16S
• MIMARKS soil sample 16S
RNA
• Single CDS mRNA
• Single viral CDS genomic RNA
• ssRNA viral polyprotein
• ssRNA viral cRNA
DNA
• Single CDS genomic DNA
• MHC gene 1-exon
• MHC gene 2-exons
• Gene intron
• ITS region
• ETS region
• IGS
• Phylogenetic marker
• COI gene
• D-loop
• trnK-matK locus
• Satellite DNA
• Betasatellite
• rRNA gene
• 16-23S ISR
• Gene promoter
Power of ENA checklists
ENA-implemented checklists support and improve:
• consistent reporting
• user-friendly data submission
• data validation
• data retrieval
• data discovery
• data interoperability
• data usability
ENA-implemented checklists also:
1. help to achieve the objectives of data standardization efforts
2. assist both data submitters and data users
ENA-implemented checklists
The EMBL-EBI Industry Programme
We support larger companies through the “Industry
Programme”
• For the past 17 years the Industry Programme has been an integral part of EMBL-EBI, providing ongoing and regular contact with key stakeholder groups.
• Founded in 1996, the programme is now a well-established, subscription-funded service for larger companies.
• Through the Industry Programme, EMBL-EBI provides specialist workshops, standards-based activities and pre-competitive research and development opportunities of particular relevance to Industry Programme members.
Industry Programme members
The EMBL-EBI Industry Programme
• Relationship between industry members, EMBL-EBI and our collaborators
• Enabling industrial uptake of innovations in bioinformatics
• Knowledge Exchange workshops with world leaders and key opinion leaders (KOLs)
• Neutral ground for members to explore strategic
developments and concepts
• Input into services development
• Pre-competitive collaboration
• Standards development
• Technical development
Early success: development of the MIAME standard
• MIAME describes the Minimum Information About a Microarray Experiment that is needed to interpret the results of the experiment unambiguously and, potentially, to reproduce the experiment. [Brazma et al., Nature Genetics]
• The public repositories ArrayExpress at the EBI (UK), GEO at NCBI (US) and CIBEX at DDBJ (Japan) are designed to accept, hold and distribute MIAME-compliant microarray data.
The six most critical elements contributing
towards MIAME are:
• The raw data for each hybridisation (e.g., CEL or GPR files)
• The final processed (normalised) data for the set of hybridisations in the
experiment (study) (e.g., the gene expression data matrix used to draw the
conclusions from the study)
• The essential sample annotation including experimental factors and their
values (e.g., compound and dose in a dose response experiment)
• The experimental design including sample data relationships (e.g., which raw data file relates to which sample, which hybridisations are technical replicates and which are biological replicates)
• Sufficient annotation of the array (e.g., gene identifiers, genomic coordinates,
probe oligonucleotide sequences or reference commercial array catalog
number)
• The essential laboratory and data processing protocols (e.g., what
normalisation method has been used to obtain the final processed data)
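The experimental-design element (which raw data file relates to which sample, and with which factor values) can be checked mechanically. The Python sketch below assumes a small tab-separated sample/data table whose column names are loosely modelled on the MAGE-TAB SDRF format used by ArrayExpress; the table contents, the `check_design` helper and the required factor columns are hypothetical.

```python
import csv
import io
from typing import Dict, List

# Hypothetical sample/data relationship table (tab-separated). Column names
# are loosely modelled on MAGE-TAB SDRF conventions; this is an assumption.
SDRF_TEXT = """Source Name\tFactor Value[compound]\tFactor Value[dose]\tArray Data File
sample1\tcontrol\t0\tsample1.CEL
sample2\tdrugA\t10\tsample2.CEL
sample3\tdrugA\t50\tsample3.CEL
"""

# Experimental factors every sample must be annotated with (hypothetical).
REQUIRED_FACTORS = ["Factor Value[compound]", "Factor Value[dose]"]

def check_design(text: str, raw_files: List[str]) -> List[str]:
    """Check factor annotation and that each raw file maps to exactly one sample."""
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    problems = []
    file_to_sample: Dict[str, str] = {}
    for row in rows:
        sample = row["Source Name"]
        # Each sample needs a value for every experimental factor.
        for factor in REQUIRED_FACTORS:
            if not row.get(factor):
                problems.append(f"{sample}: missing {factor}")
        # A raw data file should not be claimed by two different samples.
        data_file = row["Array Data File"]
        if data_file in file_to_sample:
            problems.append(
                f"{data_file} assigned to both {file_to_sample[data_file]} and {sample}")
        file_to_sample[data_file] = sample
    # Every submitted raw file must be linked to some sample.
    for f in raw_files:
        if f not in file_to_sample:
            problems.append(f"raw file {f} not linked to any sample")
    return problems

print(check_design(SDRF_TEXT, ["sample1.CEL", "sample2.CEL", "sample3.CEL"]))  # []
print(check_design(SDRF_TEXT, ["sample1.CEL", "sample4.CEL"]))
```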
MIBBI - Minimum Information for Biological
and Biomedical Investigations
• The MIBBI project promotes existing efforts to develop minimum information guidelines for reporting biological and biomedical science to the wider community. MIBBI information is progressively being moved to BioSharing, which is also set to provide additional search and link functionality connecting guidelines with the terminologies and exchange formats used by the community.
• There are 38 MIBBI records in BioSharing: http://www.biosharing.org/standards/mibbi
Knowledge Exchange Workshops
• The Industry Programme organises high-quality workshops and symposia, providing expert-level presentations and strategic discussion opportunities for members and other key opinion leaders.
• Workshops:
  • Prioritised by IP members based on proposals
  • Organised through a planning team
  • Include key opinion leaders as speakers
  • Include appropriate stakeholders
  • By individual or collective invitation only
  • Facilitated
  • Take a significant amount of planning
Member-driven workshops
(Diagram of workshop themes, including computational systems biology and data integration)
Workshops in 2012
Workshop title | Date
Using electronic health records (EHRs) for translational bioinformatics | Feb 2012
Chemogenomics | Mar 2012
1000 Genomes Project | Apr 2012
R & Bioconductor training workshop | May 2012
Metabolomics | May 2012
Antibody Informatics | Jun 2012
Systems Biology for Toxicology Pathways | Sep 2012
Secure Hosted Services | Oct 2012
1000 Genomes and NGS Data Analysis (Novartis site, Cambridge, MA) | Nov 2012
Pre-clinical Safety Data (EMBL, Heidelberg, DE) | Nov 2012
Industry Programme Workshops for 2013
Workshop title | Date
Oncogenomics | 13-14 Mar 2013
Overview of Biomedical Ontologies | 17-18 Apr 2013
Biomarkers | 23-24 Apr 2013
ENCODE and Epigenomics | 19-20 Jun 2013
Data Integration and its application | 18-19 Sep 2013
Translational informatics | 23-24 Oct 2013
Oncogenomics (Pfizer, Pearl River, NY) | 14-15 Nov 2013
Computational tools for chemical biology, phenotypic screening & target de-convolution | 21-22 Nov 2013
RNA-seq data analysis | 11-12 Dec 2013
Dates for 2014
Workshop title | Date
Rare Diseases and drug repositioning | 24-25 Mar 2014
ENCODE Workshop (Cambridge, MA, USA) | 15-16 Apr 2014
EBI/EuroDISH/NuGO workshop on Nutrition Information, Ontologies and Nutrigenomics | 29-30 Apr 2014
Systems Pharmacology | 7-8 May 2014
Biologics | 21-22 May 2014
Shared Data, Shared Cost | 18-19 Jun 2014
What happens after workshops?
• Presentations are made available on the Industry Programme members' website
• A short report is produced
• Where appropriate, EMBL-EBI will act as a coordinator or broker in establishing pre-competitive collaborations and initiatives between Industry Programme members (and third parties such as academic groups, funding organisations and other commercial companies)
• Publications, for example from 2011:
  • MIABE paper in Nature Reviews Drug Discovery (NRDD)
  • Tox ontology roadmap papers
Published Sept. 2011, PMID: 21878981
Major challenges remain
Variation: EGA and GWAS
• Explore datasets from Genome-Wide Association Studies (GWAS)
• All types of sequence and genotype experiments:
  • Case-control
  • Population
  • Family studies
  • SNP and CNV genotypes from array-based methods
  • Genotyping done with re-sequencing methods
European Genome-phenome Archive: www.ebi.ac.uk/ega
A Global Alliance for sharing genomic and clinical data
• EMBL and EMBL-EBI have joined the Global Alliance, a large-scale, international effort to enable the secure sharing of genomic and clinical data. The Global Alliance invites commercial and not-for-profit organisations to join forces with other leading data, health care, research and disease advocacy organisations to establish an evidence base for genomic research and medicine that adheres to the highest standards of ethics and privacy.
• A White Paper circulated in early 2013 has the support of nearly 70 organisations in Asia, Australia, Africa, Europe, North America and South America that are committed to creating a common framework that supports data analysis and protects the autonomy and privacy of participating individuals. Signatories of an accompanying Letter of Intent to create a not-for-profit, inclusive, public–private, international, non-governmental organisation include healthcare providers, research institutions, disease advocacy groups, and life science and information technology companies. Many more are expected to join.
Summary
• EMBL-EBI is one of the global leaders in the storage,
annotation, interrogation and dissemination of large datasets
of relevance to the bio-industries.
• Standards are an important part of international data
exchange and effective utilisation of information.
• We work closely with industry in developing new standards.
• Major challenges remain.
Acknowledgements
• Peter Sterk (University of Oxford), secretary of the GSC
• Petra ten Hoopen (EMBL-EBI)