Global Catalogue of Microorganisms (GCM) 2.0:
Sequencing for Type Strains
Juncai MA, Linhuan WU
World Data Center for Microorganisms (WDCM)
The Microbial Resource and Big Data Center, IMCAS
CONTENTS
Background
Scientific Targets
Roadmap
Progress
Cooperation
Identification of strains and quality control;
Improving identification of protein families and ortholog groups across species, and hence annotation of other microbial genomes;
Providing phylogenetic anchoring of metagenomic data;
Improving gene discovery by selecting phylogenetically novel organisms;
Understanding the processes underlying the evolution of microbes and the correlations between phenotype and genotype in microbes.
Sequencing type strains
1,003 reference genomes of bacterial and archaeal isolates expand coverage of the tree of life.
974 bacterial and 29 archaeal genomes (from 579 genera in 21 phyla and 43 classes) were sequenced as part of the GEBA Initiative (GEBA-I), using a phylogeny-based scoring system for strain selection.
[Figure: blue denotes the genetic diversity covered by 828 genomes of type strains before GEBA-I, red denotes the diversity covered by the GEBA-I genomes, and gray denotes the remaining type strains lacking a genome sequence.]
BACTERIA & ARCHAEA
4,000 / 12,239 type strains sequenced (Whitman et al., 2015)
7,048 / 14,895 type strains sequenced (WDCM statistics in 2018)
Goal: provide a phylogenetically balanced genomic representation
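The two sequencing counts above are easier to compare as percentages. The following is a minimal sketch that only restates the slide's figures; the function name `percent_sequenced` is a hypothetical helper introduced for illustration.

```python
# Counts as quoted on the slide: (type strains sequenced, total type strains).
coverage = {
    "Whitman et al., 2015": (4_000, 12_239),
    "WDCM statistics, 2018": (7_048, 14_895),
}


def percent_sequenced(sequenced: int, total: int) -> float:
    """Return the share of type strains with a genome sequence, in percent."""
    return 100.0 * sequenced / total


for source, (sequenced, total) in coverage.items():
    share = percent_sequenced(sequenced, total)
    print(f"{source}: {sequenced}/{total} = {share:.1f}% sequenced")
```

On these figures, roughly one third of type strains had a genome in 2015 and a little under half in 2018.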
DISTRIBUTION OF GLOBAL TYPE STRAINS
52,151 type strains are distributed across 134 culture collections.
93.8% come from the top 30 collections.
Validated species and type strains
A total of 1678 type strains of 819 novel (sub)species validated in 2014

Culture collection   No. of type strains deposited
DSMZ                 310
JCM                  239
KCTC                 214
LMG                  131
NBRC                 112
CGMCC                109
CECT                 102
KACC                 92
CCTCC                70
CCUG                 56
….                   ….

Asian culture collections received deposits of 940 type strains in total (56.2%).
A total of 1874 type strains of 866 novel (sub)species validated in 2017

Culture collection   No. of type strains deposited
KCTC                 328
DSMZ                 255
JCM                  222
LMG                  165
NBRC                 112
KACC                 104
CGMCC                96
MCCC                 90
CCTCC                61
CECT                 49
BCRC                 33
….                   ….

Asian culture collections received deposits of 940 type strains in total (61.7%).
Strong activity in describing novel microbial species in Asia
75% of IJSEM papers are published by Asian researchers.
[Chart: authors' country in 2014. Categories: Japan, Korea, China, Other Asia, Others. Labelled shares: 29.7%, 21.0%, 11.7%.]
Of the type strains described in IJSEM papers in 2017, 60% were isolated from China, Korea and Japan.
[Chart: country of origin in 2017. Categories: China, Korea, Japan, Spain, Germany, Antarctica, Others.]
Global Catalogue of Microorganisms
GCM IS COMING
GCM 2.0: Sequencing for Type Strains
Theme: Sequencing for existing type strains
Outputs in 5 years:
• 10,000 bacterial, archaeal and fungal type strains
WDCM covers the costs of sequencing services, the database system and data analysis.
Raw data and analysis results are published online for free access.
Call for strains and samples from culture collections and from scientists with targets for research on specific subjects.
Time table
               Pilot       Y2         Y3         Y4         Y5
Organization
SOP
Database       Meta data   Functional database
Sequencing     300/100     2000/300   2000/500   2000/600   2000/500
Subproject                 3-5        3-5        3-5        3-5
Training       20 participants each year
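The sequencing figures in the time table (300/100 in the pilot, then 2000/300, 2000/500, 2000/600 and 2000/500 for Y2 to Y5) can be summed and compared with the stated five-year output of 10,000 type strains. The sketch below is illustrative only. It assumes that each pair is read as two separate strain counts, which the slide does not state, so the two totals are named neutrally as `first_total` and `second_total`.

```python
# Sequencing plan as quoted in the time table: stage -> (first figure, second figure).
# Assumption: each "a/b" entry is a pair of strain counts; the slide does not
# say what the two figures stand for.
plan = {
    "Pilot": (300, 100),
    "Y2": (2000, 300),
    "Y3": (2000, 500),
    "Y4": (2000, 600),
    "Y5": (2000, 500),
}

TARGET = 10_000  # "Outputs in 5 years" as stated on the slide

first_total = sum(first for first, _ in plan.values())
second_total = sum(second for _, second in plan.values())
planned_total = first_total + second_total

print(f"First figures sum to {first_total}, second figures to {second_total}")
print(f"Planned total {planned_total} against a target of {TARGET}")
```

Under that reading, the planned total is slightly above the 10,000-strain target.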
Progress
Organization
Participants
CULTURE COLLECTIONS CONFIRMED
• ATCC, USA
• BCCM/LMG, Belgium
• BCRC, Chinese Taipei
• CAIM, Mexico
• CBS, Netherlands
• CCM, Czech Republic
• CCUG, Sweden
• CECT, Spain
• CICC, China
• CIP, France
• CGMCC, China
• FGSC, USA
• ICMP, New Zealand
• JCM, Japan
• KCTC, Korea
• KMM, Russia
• MUM, Portugal
• NCTC, UK
• NBRC, Japan
• NCAIM, Hungary
• PCU, Thailand
• TBRC, Thailand
• TISTR, Thailand
• UCD-FST, USA
• VKM, Russia
25 collections from 18 countries and regions
GCM 2.0 PROJECT PROGRESS
Scientific committee: Open for recommendations
Working Groups:
1. Bacteria Selection
2. Fungi Selection
3. SOPs
4. Database
5. Intellectual Property Rights and Legal Issues
STRAIN SELECTION STRATEGIES
Phylogenetic Diversity Priority
Scientific targets relevance
Availability of the resources
Nagoya Protocol Safety
Quality consideration: two type strains in different collections
Sample preparation
Sequencing
QC & assembly
Annotation
Database
Strain selections
Subproject selection
Data analysis
Joint publication
Open to the public
Scientific Committee & Working Group
Culture Collections
Cultures or DNA
List of type species and type strains
MOU & MTA template
Nagoya protocol safety
WDCM shall use the cultures, DNA samples and associated data only for the following purposes: sequencing, data exploration and integration of the data into the data platform for microbial resources.
WDCM shall ensure that the cultures and DNA samples are destroyed after the sequencing project.
STRAIN LIST FOR PILOT STAGE
Collection   Bacterial   Fungi
KCTC         312
ATCC         20
CGMCC        290
NBRC         92
JCM          51
NCTC         25
TBRC         56
TISTR        28
BCCM         10          75
CICC         5
VKM                      59
CAIM         28
NCAIM        43
CBS                      90
Total        927         224
SOP for Sample preparation and submission
STRAIN LIST FOR PILOT STAGE
Culture collection   No. of strains   Sample type
CAIM                 37               Inoculum
CGMCC                17               Live culture
ICMP                 39               Freeze dried
JCM                  15               DNA
KCTC                 54               Cell mass
NBRC                 92               DNA
NCAIM                43               Freeze dried
CICC                 5                DNA
TBRC                 15               DNA
Total                317
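The pilot submissions listed above can be tallied by sample type, which also confirms the stated total of 317. This is a minimal sketch that restates the slide's figures; the variable names are hypothetical.

```python
from collections import Counter

# (culture collection, number of strains, sample type) as listed on the slide.
submissions = [
    ("CAIM", 37, "inoculum"),
    ("CGMCC", 17, "live culture"),
    ("ICMP", 39, "freeze dried"),
    ("JCM", 15, "DNA"),
    ("KCTC", 54, "cell mass"),
    ("NBRC", 92, "DNA"),
    ("NCAIM", 43, "freeze dried"),
    ("CICC", 5, "DNA"),
    ("TBRC", 15, "DNA"),
]

# Sum the strain counts per sample type.
strains_by_type = Counter()
for _collection, count, sample_type in submissions:
    strains_by_type[sample_type] += count

total_strains = sum(strains_by_type.values())

for sample_type, count in strains_by_type.most_common():
    print(f"{sample_type}: {count}")
print(f"Total: {total_strains}")
```

DNA is the most common submission form in this list, which matters because the quoted turnaround differs between DNA samples and cultures.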
Culture collections submit DNA samples, live cultures or freeze-dried cultures
Sequencing (7-10 days)
Data annotation at BGI (3 days)
Data annotation at IMCAS (7 days)
Data results returned to culture collections
Throughput: 40-50 strains/month; sample report within 3 days
Turnaround: 30 working days for DNA samples; 2-3 months for cultures
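The stage durations quoted above can be added up to see how much of the 30-working-day turnaround for DNA samples is taken by sequencing and annotation. The sketch assumes the three stages run strictly one after another and that all durations are counted in the same kind of day; neither assumption is stated on the slide.

```python
# (stage, minimum days, maximum days) as quoted on the slide.
stages = [
    ("Sequencing", 7, 10),
    ("Data annotation at BGI", 3, 3),
    ("Data annotation at IMCAS", 7, 7),
]

DNA_TURNAROUND_DAYS = 30  # "30 working days for DNA samples"

# Assumption: stages are sequential, so their durations simply add.
min_days = sum(low for _, low, _ in stages)
max_days = sum(high for _, _, high in stages)

print(f"Sequencing plus annotation: {min_days}-{max_days} days")
print(f"Remaining within the quoted turnaround: "
      f"{DNA_TURNAROUND_DAYS - max_days}-{DNA_TURNAROUND_DAYS - min_days} days")
```

Under these assumptions, the remainder would have to cover shipping, sample quality checks and reporting.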
Quality Control Steps
Cultures
DNA samples
Raw data
Assembled genomes
Further analysis
317 strains: contamination 3, incorrect 15
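The quality control counts for the 317 pilot strains can be expressed as rates. The sketch below assumes that the contaminated and incorrect strains do not overlap, which the slide does not state; `share` is a hypothetical helper.

```python
# Figures quoted on the slide.
total_strains = 317
contaminated = 3
incorrect = 15

# Assumption: no strain is counted in both categories.
flagged = contaminated + incorrect
passed = total_strains - flagged


def share(count: int, total: int) -> float:
    """Return count as a percentage of total."""
    return 100.0 * count / total


print(f"Contamination: {share(contaminated, total_strains):.1f}%")
print(f"Incorrect: {share(incorrect, total_strains):.1f}%")
print(f"Flagged overall: {flagged} strains ({share(flagged, total_strains):.1f}%)")
print(f"Passed: {passed} strains")
```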
Sequencing Capacity
Sequencers (295+):
BGISEQ-500, Illumina HiSeq, Illumina MiSeq, AB 3730xl, Roche 454,
PacBio RS II, PacBio Sequel, Bionano Irys System, Life Tech Ion Torrent
Sequencing capacity: >30 Tb/day
BGI has the largest sequencing capacity in the world.
IN-HOUSE DATA MANAGEMENT SYSTEM
Standard data analysis pipeline
Cooperation
Cooperation mechanism
Coordinate with other projects
Sequencing
Standards & SOP
EXPECTED OUTPUTS
Complete the prokaryotic tree of life
Reference database
Genes, proteins, pathways
Improvement of annotation
Data reports
Sampling
Functions and evolution
Recruitment of metagenomic reads
Identification of new species
Cooperation Mechanisms
Sequencing
Data sharing
Network
Culture collections
Scientists
WDCM
Thank you for your attention!
Let us do our best for cooperation!