2011 Field talk at iEVOBIO 2011

Posted on 27-Jan-2015


description

A keynote talk at the iEVOBIO 2011 meeting (http://ievobio.org/). It has been a great meeting.

transcript

iEVOBIO 2011

The role of grass-roots data-sharing communities, standards and megasequencing projects in the genomics revolution

Dawn Field, NERC Centre for Ecology and Hydrology

Opportunities and Challenges

The era of genomics is just beginning...

...how will we cope with the data?

...how will we gain the most knowledge from this investment in data?


PARADIGM SHIFT

1960-1990: 16S rRNA
1990-2010: Genomes
2010-2020: Pangenomes

Nikos Kyrpides


GREAT CHALLENGES

            1995-2009   2010-2015
Finished         1000        3000
Draft            1000       10000

P. Chain et al. Science, 2009

[Figure: Genome Sequencing Projects on GOLD, September 2009, 5643 projects (incomplete vs. complete)]

Nikos Kyrpides


Culturable

Unculturable

Nikos Kyrpides

The trend is now increasingly geared towards ever more ambitious megasequencing projects...


And democratization of access to sequencing power...

Just one example...


Metagenomics: putting data-generating capacity into perspective with an example from Bergen

(1) 1 metagenome: Sargasso Sea, Sanger sequencing (Venter et al., 2005)

(~80) 41 metagenomes: “Global Ocean Survey”, Sanger sequencing (Rusch et al., 2007)

(~120) 4 metagenomes & 4 metatranscriptomes: Bergen mesocosm experiment, pyrosequencing (Gilbert et al., 2008)

Gilbert JA, Field D, Huang Y, Edwards R, Li W, Gilna P, Joint I. (2008) Detection of large numbers of novel sequences in the metatranscriptomes of complex marine microbial communities. PLoS ONE. Aug 22;3(8):e3042.

The Bergen ocean acidification study produced 19% as many reads as the GOS study and 5% as many total base pairs of sequence.

Further evidence for the “Unknown Genome” and the Dark Matter of the Tree of Life


The Data: Flood? Tsunami? Deluge?


the data bonanza


To exploit fully the promise of these data, we need both scientific innovation and community agreement on how to provide appropriate stewardship of these resources for the benefit of all.

This requires the evolution of our scientific, technological and sociological thinking...


SuperMarket

The Genome Catalogue


DataMarket (Norman Morrison)


Packaging data


Labels for data

<phenotype>

<environmental context>


Standards

Principles:
• Not everything should be ‘standardized’
• Aggregation of data, information, and knowledge requires standard ways of doing things
• Standards provide foundations; standards should drive innovation (think of electrical plugs or the internet)
• Pick the right concepts to standardize, at the right time, with the right people
• Requires good ‘group think’, or ‘systems thinking’


Community-driven solutions:

The Common Path:

• Identify the problem
• Define a community to address it
• Define scope of the solution
• Implement solution
• Gain adoption of solution


The Genomic Standards Consortium: Innovation through Collaboration

GSC 10, Argonne, 2010
GSC 11, Hinxton, 2010
GSC 12, Bremen, 2011
GSC 13, BGI, 2012


The GSC’s Mission

• the implementation of new genomic standards

• methods of capturing and exchanging metadata

• harmonization of metadata collection and analysis efforts across the wider genomics community


The GSC fulfills its mission by:

• Organizing meetings
• Forming working groups
• Creating Consensus Products


Pelin Yilmaz et al 2011


Use of MIGS/MIMS/MIENS

Please provide this minimum information when you publish:
• a genome
• a metagenome
• a gene marker study (e.g. ribosomal genes)

GenBank, EMBL and DDBJ now accept this information and encourage its submission to their public DNA databases.
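As a rough illustration of what “providing the minimum information” can mean at submission time, the sketch below checks a sample record against a short list of required fields and reports what is missing. The field names are assumed to follow MIxS-style keys (`lat_lon`, `collection_date`, `env_biome`, ...), but the list is a hypothetical subset, not the official MIGS/MIMS/MIENS checklist, and `missing_fields` is an illustrative helper rather than any GSC or INSDC tool.

```python
# Hypothetical subset of checklist keys (assumption: NOT the full official
# MIGS/MIMS/MIENS checklist; consult the GSC specification for the real one).
REQUIRED_FIELDS = (
    "investigation_type",
    "project_name",
    "lat_lon",
    "geo_loc_name",
    "collection_date",
    "env_biome",
    "env_feature",
    "env_material",
)


def missing_fields(record):
    """Return required keys that are absent, None, or blank in `record`."""
    missing = []
    for key in REQUIRED_FIELDS:
        value = record.get(key)
        if value is None or not str(value).strip():
            missing.append(key)
    return missing


if __name__ == "__main__":
    # Hypothetical record: placeholder values, one blank and one absent field.
    sample = {
        "investigation_type": "metagenome",
        "project_name": "example project",
        "lat_lon": "placeholder",
        "geo_loc_name": "placeholder",
        "collection_date": "",
        "env_biome": "marine biome",
        "env_feature": "placeholder",
    }
    print(missing_fields(sample))  # ['collection_date', 'env_material']
```

Reporting the gaps, rather than rejecting the record outright, mirrors the encouraging tone of the slide: submitters can see exactly which minimum-information fields still need filling.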


Labels for data

<MIGS>

<MIMS>


The Microbial Earth Project

Goal: International effort to sequence a reference genome for every cultured Archaeal and Bacterial organism (~9,000 microbes)

Phase I: Sequence one representative from every characterized microbial type species

GEBA, HMP


Source: Jack A. Gilbert, Argonne National Labs

http://earthmicrobiome.org


Field et al unpublished work on a Metadata Coverage Index (MCI)

MCI > 50
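The talk does not define the Metadata Coverage Index here, and the work is unpublished. Purely as a hypothetical reading, a coverage index could be the percentage of checklist fields that carry a value, so that “MCI > 50” would mean more than half of the fields are filled in. The sketch below assumes exactly that definition; the function name, field list and record are illustrative and are not the authors' metric.

```python
from typing import Iterable, Mapping


def metadata_coverage_index(record: Mapping[str, object],
                            fields: Iterable[str]) -> float:
    """Hypothetical MCI: percentage (0-100) of `fields` with a non-blank value.

    Assumption: this definition is illustrative only; the unpublished MCI
    mentioned in the talk may be computed differently.
    """
    fields = list(fields)
    if not fields:
        raise ValueError("need at least one field to score coverage")
    filled = sum(
        1 for f in fields
        if record.get(f) is not None and str(record.get(f)).strip()
    )
    return 100.0 * filled / len(fields)


if __name__ == "__main__":
    checklist = ["lat_lon", "collection_date", "env_biome", "env_material"]
    # Hypothetical genome report: 2 of 4 fields filled.
    report = {"lat_lon": "placeholder", "env_biome": "marine biome",
              "collection_date": ""}
    mci = metadata_coverage_index(report, checklist)
    print(mci, mci > 50)  # 50.0 False
```

A percentage keeps the threshold independent of checklist length, which would matter when comparing reports scored against different checklists.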


GSC 5 at the EBI, 2008


Top ten journals publishing genome reports

[Figure: total genome publications (1995-2011) in the top ten journals: J Bacteriology, PNAS, Nature, Science, SIGS, PLoS ONE, Genome Research, PLoS Genetics, Nat Biotech, BMC Genomics]

Total: 1160 genome publications in 60 peer-reviewed publications

Source: GenomesOnline Database, May 28, 2011


Incentives for compliance


MIGS-compliant marine phage genomes


GSC 9 at the JCVI – April 2010


Darwin Core vs. GSC MIxS standard (Peter Dawyndt)

Categories compared: Taxon, Identification, Occurrence, IPR-related info, Event, Location, GeologicalContext, SamplingProtocol, EnvironmentalConditions

Preliminary (first) conclusions:
• The Darwin Core (DC) and GSC checklists are more complementary than overlapping

How can we make these standards completely orthogonal?
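One way to make the “complementary vs. overlapping” question concrete is to write down a crosswalk between the two vocabularies and measure what it covers. The sketch below uses a handful of Darwin Core term names (`scientificName`, `eventDate`, `decimalLatitude`, `decimalLongitude`, `country`, `samplingProtocol`, `habitat`) and MIxS-style keys, but both sets are small subsets and the crosswalk itself is a hypothetical illustration, not the mapping from the comparison cited above.

```python
# Small illustrative term sets (assumption: subsets only, not the full standards).
DARWIN_CORE = {"scientificName", "eventDate", "decimalLatitude",
               "decimalLongitude", "country", "samplingProtocol", "habitat"}
MIXS = {"lat_lon", "collection_date", "geo_loc_name", "env_biome",
        "seq_meth", "investigation_type"}

# Hypothetical crosswalk: MIxS key -> Darwin Core terms carrying the same information.
CROSSWALK = {
    "lat_lon": {"decimalLatitude", "decimalLongitude"},
    "collection_date": {"eventDate"},
    "geo_loc_name": {"country"},
}


def overlap_report(dwc, mixs, crosswalk):
    """Split both vocabularies into shared (crosswalked) and standard-specific terms."""
    # MIxS keys whose crosswalk targets actually exist in the Darwin Core set.
    mapped_mixs = {k for k in mixs if crosswalk.get(k, set()) & dwc}
    # Darwin Core terms reached by those mappings.
    mapped_dwc = set().union(*(crosswalk[k] for k in mapped_mixs)) & dwc
    return {
        "shared_concepts": sorted(mapped_mixs),
        "mixs_only": sorted(mixs - mapped_mixs),
        "dwc_only": sorted(dwc - mapped_dwc),
    }


if __name__ == "__main__":
    report = overlap_report(DARWIN_CORE, MIXS, CROSSWALK)
    print(report["shared_concepts"])  # ['collection_date', 'geo_loc_name', 'lat_lon']
```

In this framing, two fully orthogonal standards would yield an empty `shared_concepts` list; every shared concept is a candidate for agreeing on which standard should own it.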


http://gensc.org

More Information about the GSC...


Feast of the Mind


Labels for data

<soil>

<water>


Environment Ontology (EnvO): http://environmentontology.org

Member of the OBO Foundry: http://obofoundry.org


Ontogrator: building on ontologies (http://ontogrator.org; Morrison et al., 2011, SIGS)

Users:
1) Pick terms
2) View hits
3) Browse
4) Follow links to primary data
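The four user steps can be sketched as term-based filtering over records annotated with ontology labels such as <soil> and <water>. This is not Ontogrator's implementation; it is a minimal illustration, assuming a toy is_a hierarchy (term names only, no real EnvO identifiers), hypothetical records and placeholder URLs, of why an ontology helps: picking a broad term also retrieves records labelled with its more specific children.

```python
from collections import defaultdict

# Toy is_a hierarchy: child term -> parent term (assumption: illustrative
# names, not actual EnvO identifiers or structure).
IS_A = {
    "marine water": "water",
    "fresh water": "water",
    "agricultural soil": "soil",
    "forest soil": "soil",
}

# Hypothetical records: (record id, annotated term, link to primary data).
RECORDS = [
    ("rec1", "marine water", "https://example.org/data/rec1"),
    ("rec2", "forest soil", "https://example.org/data/rec2"),
    ("rec3", "fresh water", "https://example.org/data/rec3"),
    ("rec4", "soil", "https://example.org/data/rec4"),
]


def descendants(term, is_a=IS_A):
    """Return `term` plus every term that is (transitively) a child of it."""
    children = defaultdict(set)
    for child, parent in is_a.items():
        children[parent].add(child)
    found, stack = {term}, [term]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def search(picked_terms, records=RECORDS):
    """Steps 1-2: pick terms, then view hits annotated with a matching term."""
    wanted = set()
    for term in picked_terms:
        wanted |= descendants(term)
    return [r for r in records if r[1] in wanted]


if __name__ == "__main__":
    # Steps 3-4: browse the hits and follow each link to the primary data.
    for rec_id, term, url in search(["water"]):
        print(rec_id, term, url)
```

The quality point made next in the talk shows up directly here: the hits are only as good as the hierarchy in `IS_A` and the annotations in `RECORDS`.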


The Ontogrator approach depends on the quality of the data resources and knowledge organization systems (KOS) used.

Can we use this approach to improve both? Can we complete the virtuous cycle?


Field, et al 2009. Science. 326:234-236. 

http://biosharing.org


Conclusions

• The era of genomics is just beginning…
• Self-organization by the scientific community can pay dividends (e.g. consensus building, large-scale co-ordination)
  – Standards are keys to unlocking data
  – Group thinking overcomes the tragedy of the commons
• Emerging key players from the molecular domain: “one-stop shops”
  – Genomic Standards Consortium
  – BioSharing: driving cross-community collaborations


Feast of the Mind


Future

• Analysis: proof that sharing is beneficial
• Making the field of data sharing more quantitative
  – Objective measures of consensus
  – Useful metrics, e.g. the Metadata Coverage Index (MCI)
  – Modelling, e.g. how best to incentivize data sharing?
• Further shared concepts
  – Minimum Information about a Sampling Site (MISS)
  – Minimum Data Policy
  – PubData?

Acknowledgements

Bergen and L4 metagenomics
Jack Gilbert, Sue Huse, Ian Joint, Paul Swift, Paul Somerfield, Rob Knight

NEBC
Bela Tiwari, Tim Booth, Mesude Bicak

CEH
Norman Morrison, Dave Hancock

University of Manchester
Henning Hermjakob, Chris Taylor

European Bioinformatics Institute
Susanna Sansone, Philippe Rocca-Serra, Eamonn Maguire

Oxford University

Genomic Standards Consortium
Peter Sterk


Acknowledgements

Coordination, workshops, working groups, infrastructure and exchange visits

Additional workshop funds

Local Hosts of GSC workshops

Sponsors of GSC 9 and GSC 10

GSC Funding: RCN4GSC