IMG clusters – the hidden features

Post on 04-Jan-2016

Sequencing the World of Possibilities for Energy & Environment

IMG clusters – the hidden features

Sean Hooper

Genome Biology Program

JGI

Background

• Clusters work behind the scenes in IMG

• Used for

– Data compression

– Annotation assistance

– Grouping of similar functions

– Necessary for large datasets, e.g. metagenomics

Example

• Search for a gene annotated as putative or hypothetical

• Study the often overlooked clusters of genes in IMG

Putative ribulose carboxylase

COG

Pfam

IMG

Tatusov et al. 1997

1997: 720 COGs

2003: 4873 COGs

COG

Pfam

IMG

COG

Pfam

IMG

MCL clustering on sequence
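
The slide names MCL (Markov clustering) but gives no parameters or input details. The following is a minimal sketch of the general MCL idea (alternating expansion and inflation of a column-stochastic matrix), not the IMG pipeline itself. The input matrix, the inflation value of 2.0, and the function names are assumptions for illustration; in practice the weights would typically come from pairwise sequence similarity scores.

```python
def normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def mcl(similarity, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster items from a symmetric, non-negative similarity matrix.

    Hypothetical helper: the inflation value and the stopping rule are
    illustrative defaults, not values taken from the slides.
    """
    n = len(similarity)
    # Add self-loops so that every column has a positive sum.
    m = [[float(similarity[i][j]) + (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        # Expansion: one matrix squaring (two-step random walks).
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n))
                     for j in range(n)] for i in range(n)]
        # Inflation: element-wise power, then renormalize the columns.
        inflated = normalize_columns(
            [[v ** inflation for v in row] for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # Each non-empty row lists the members of one cluster.
    clusters = {tuple(j for j in range(n) if row[j] > 1e-6) for row in m}
    clusters.discard(())
    return sorted(clusters)


# Toy input: genes 0-2 are mutually similar, as are genes 3-4.
toy = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
print(mcl(toy))  # → [(0, 1, 2), (3, 4)]
```

Higher inflation values generally give finer clusters, so the granularity of the result depends on a parameter the slides do not state.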

Nodes = IMG genes

Edges = in same cluster
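
The graph definition on this slide (genes as nodes, an edge whenever two genes share a cluster) can be sketched as follows. The gene identifiers, cluster names, and the `co_cluster_edges` helper are hypothetical; the slides do not show how IMG stores cluster membership.

```python
from collections import defaultdict
from itertools import combinations


def co_cluster_edges(memberships):
    """Return the set of gene pairs that share at least one cluster.

    `memberships` maps a gene identifier to the set of clusters
    (for example COG, Pfam, or IMG clusters) that contain it.
    """
    genes_by_cluster = defaultdict(set)
    for gene, clusters in memberships.items():
        for cluster in clusters:
            genes_by_cluster[cluster].add(gene)
    edges = set()
    for genes in genes_by_cluster.values():
        # Every pair of genes in the same cluster becomes one edge.
        for a, b in combinations(sorted(genes), 2):
            edges.add((a, b))
    return edges


# Made-up identifiers, for illustration only.
example = {
    "geneA": {"COG_x"},
    "geneB": {"COG_x", "pfam_y"},
    "geneC": {"pfam_y"},
    "geneD": set(),
}
print(sorted(co_cluster_edges(example)))
# → [('geneA', 'geneB'), ('geneB', 'geneC')]
```

A gene with no cluster membership, such as `geneD` here, stays an isolated node.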

Alignment detail

Phylogeny

• How do these clusters relate to phylogeny?

Conclusions

• Provide fast access to related proteins

• Ease analysis and annotation (but cannot replace experimental work)

• Reveal substructures in function and phylogeny

Acknowledgements

Genome Biology

K Mavrommatis

IJ Anderson

NC Kyrpides

A Pati

IMG crew

K Palaniappan

E Szeto

VM Markowitz

Chalmers, Sweden

D Dalevi

COAL demo

• Cluster overview of Archaea

• Spectral bipartitioning

• Integrate metadata (phenotype, phylogeny)
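
The demo outline mentions spectral bipartitioning without further detail. As a rough sketch of the general technique, and not of what COAL actually does, a graph can be split in two by the sign of the Fiedler vector (the eigenvector of the graph Laplacian with the second-smallest eigenvalue). The version below approximates that vector with shifted power iteration so that it needs no external libraries. The function name, the iteration count, and the toy graph are assumptions, and the approach presumes a connected, undirected graph with at least two nodes.

```python
def spectral_bipartition(adjacency, iterations=500):
    """Split a connected undirected graph by the sign of its Fiedler vector.

    Power iteration on (shift*I - L), with the constant vector projected
    out, converges to the Laplacian eigenvector with the second-smallest
    eigenvalue, provided the start vector is not orthogonal to it.
    """
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    shift = 2.0 * max(degrees)  # upper bound on the Laplacian eigenvalues
    # Deterministic start vector with zero mean.
    x = [i - (n - 1) / 2.0 for i in range(n)]
    for _ in range(iterations):
        # y = (shift*I - L) x, where L = D - A.
        y = [(shift - degrees[i]) * x[i]
             + sum(adjacency[i][j] * x[j] for j in range(n))
             for i in range(n)]
        mean = sum(y) / n
        y = [v - mean for v in y]  # stay orthogonal to the constant vector
        norm = sum(v * v for v in y) ** 0.5
        if norm == 0:
            break
        x = [v / norm for v in y]
    first = [i for i in range(n) if x[i] >= 0]
    second = [i for i in range(n) if x[i] < 0]
    return first, second


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
graph = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
halves = spectral_bipartition(graph)
print(sorted(sorted(h) for h in halves))  # → [[0, 1, 2], [3, 4, 5]]
```

Which half is returned first depends on the sign of the converged vector, so only the grouping itself is meaningful.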