Orthologs Detection and Applications

Marcus Lechner

Bioinformatics Leipzig

2009-10-23

Marcus Lechner (Bioinformatics Leipzig) Orthologs Detection and Applications 2009-10-23 1 / 25

Table of contents

1. Background on homology
2. Proteinortho
3. Domain-wide commons
4. Annotation pipeline
5. References

Definitions

Homologous genes

derived from a common ancestor

Orthology

homologous genes in different species that evolved by speciation

thought to have a similar function

Paralogy

homologous genes that evolved by gene duplication (e.g. within the same species)

thought to have a related function (neo-/subfunctionalization)

out-paralogs arose from a duplication preceding a speciation

in-paralogs evolved by duplication subsequent to a speciation
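The definitions above reduce to a simple rule once a gene tree has been reconciled with the species tree: two genes are orthologs if their lowest common ancestor is a speciation node and paralogs if it is a duplication node. The following is a minimal Python sketch under that assumption; the `Node` class, the event labels and the gene names are hypothetical and not part of Proteinortho.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # eq=False: nodes compare by identity, avoiding recursive field comparison
class Node:
    """Node of a reconciled gene tree (hypothetical minimal structure)."""
    name: str
    event: Optional[str] = None  # "speciation" or "duplication"; None for leaves (genes)
    parent: Optional[Node] = field(default=None, repr=False)
    children: list = field(default_factory=list)

    def add(self, child: Node) -> Node:
        child.parent = self
        self.children.append(child)
        return child


def ancestors(node: Optional[Node]) -> list:
    """Return the path from a node up to the root, including the node itself."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def relation(a: Node, b: Node) -> str:
    """Classify two distinct genes by the event at their lowest common ancestor."""
    if a is b:
        raise ValueError("need two distinct genes")
    path_a = ancestors(a)
    for anc in ancestors(b):  # first shared ancestor walking up from b is the LCA
        if anc in path_a:
            return "orthologs" if anc.event == "speciation" else "paralogs"
    raise ValueError("genes are not in the same tree")


# One duplication (D), then a human/mouse speciation in each gene copy.
root = Node("D", "duplication")
copy_x = root.add(Node("Sx", "speciation"))
copy_y = root.add(Node("Sy", "speciation"))
human_x, mouse_x = copy_x.add(Node("human_X")), copy_x.add(Node("mouse_X"))
human_y, mouse_y = copy_y.add(Node("human_Y")), copy_y.add(Node("mouse_Y"))

print(relation(human_x, mouse_x))  # orthologs
print(relation(human_x, human_y))  # paralogs (out-paralogs: duplication precedes speciation)
print(relation(human_x, mouse_y))  # paralogs (out-paralogs, although in different species)
```

The last case shows why "within the same species" is only the typical situation: out-paralogs can sit in different genomes.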

Example

Figure: Illustration of relationships: three species with orthologs, xenologs, and in- and out-paralogs. Adapted from [1].

Problems

Interpretation

original definition of homology (Owen, 1843): 'the same organ under every variety of form and function' [2]

similarity is still a very good quantitative indication of homology

but neither necessary nor sufficient

Homology of two proteins does not imply a common function, sequence or structure!

Problems

Relative definition

in-/out-paralogs are only defined relative to a particular species (i.e. the speciation event used as reference)

the classification depends strongly on the available data

there is no absolute view
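The relative nature of the in-/out-paralog split can be made concrete with divergence times. In this sketch the ages in million years ago (Mya) are hypothetical inputs: a paralog pair is in-paralogous with respect to a speciation if the duplication happened after it, and out-paralogous if the duplication preceded it.

```python
def paralog_type(duplication_mya: float, speciation_mya: float) -> str:
    """Classify a paralog pair relative to ONE reference speciation event.

    Both arguments are ages in million years ago (larger = older); the values
    used below are illustrative, not measured divergence times.
    """
    if duplication_mya < speciation_mya:
        return "in-paralogs"   # duplication is younger than the speciation
    return "out-paralogs"      # duplication is older than (preceded) the speciation


duplication = 300.0  # hypothetical age of one gene duplication
for label, speciation in [("recent speciation", 90.0), ("ancient speciation", 500.0)]:
    print(label, paralog_type(duplication, speciation))
# recent speciation out-paralogs
# ancient speciation in-paralogs
```

The same pair of genes changes label depending on which speciation is chosen as reference, which is why the classification has no absolute meaning.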

Problems

Figure: Illustration of relationships: Complete view needed.

Problems

Information benefit

duplications are known to be a major source of innovation in evolution

by definition, proteins are homologs if they share a common ancestor

irrespective of their actual similarity or function

most proteins are anciently related but have diverged considerably

Problems

Figure: Multiple gene duplications: all are homologs by definition, but smaller groups may be more useful.

Problems

Information benefit

duplications are known to be a major source of innovation in evolution

by definition, proteins are homologs if they share a common ancestor

irrespective of their actual similarity or function

most proteins are anciently related but have diverged considerably

Up to which point is homology information useful?
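One way to see why smaller groups can be more informative: all proteins linked by any chain of similarity form one homolog group, while a stricter similarity cut-off splits it into tighter, more likely isofunctional subgroups. A minimal sketch with hypothetical similarity scores and a hypothetical `similarity_groups` helper (connected components via union-find):

```python
def similarity_groups(proteins, similarity, threshold):
    """Group proteins into connected components of the graph whose edges are
    the pairs with similarity >= threshold (union-find with path halving)."""
    parent = {p: p for p in proteins}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for (a, b), score in similarity.items():
        if score >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for p in proteins:
        groups.setdefault(find(p), []).append(p)
    return sorted(sorted(g) for g in groups.values())


proteins = ["A1", "A2", "B1", "B2"]
# Hypothetical pairwise similarities (e.g. fraction of identical residues).
similarity = {("A1", "A2"): 0.90, ("B1", "B2"): 0.85, ("A2", "B1"): 0.30}

print(similarity_groups(proteins, similarity, 0.2))  # [['A1', 'A2', 'B1', 'B2']]
print(similarity_groups(proteins, similarity, 0.5))  # [['A1', 'A2'], ['B1', 'B2']]
```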

Conclusion

Proteinortho approach

common ancestry + similar function ⇒ similar sequence

should return a useful subset of homologs (aiming at isofunctional groups)

reciprocal best BLAST hit(s)
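The reciprocal best hit idea can be sketched in a few lines of Python, assuming BLAST results have already been parsed into dictionaries of bit scores. The data layout, the function names and the `tolerance` relaxation are assumptions for illustration, not Proteinortho's code; `tolerance` loosely reflects the "(s)" by also accepting near-best hits.

```python
def best_hits(hits, tolerance=1.0):
    """Map each query to its best-scoring subject(s).

    hits: {query: {subject: bit_score}}. With tolerance < 1.0, subjects scoring
    at least tolerance * best are kept as well (an assumed relaxation).
    """
    best = {}
    for query, subjects in hits.items():
        if subjects:
            top = max(subjects.values())
            best[query] = {s for s, score in subjects.items() if score >= tolerance * top}
    return best


def reciprocal_best_hits(a_vs_b, b_vs_a, tolerance=1.0):
    """Return (a, b) pairs where each protein is among the best hits of the other."""
    forward = best_hits(a_vs_b, tolerance)
    reverse = best_hits(b_vs_a, tolerance)
    return {
        (a, b)
        for a, subjects in forward.items()
        for b in subjects
        if a in reverse.get(b, set())
    }


# Hypothetical bit scores from BLAST searches of genome A against B and back.
a_vs_b = {"a1": {"b1": 250.0, "b2": 240.0}, "a2": {"b2": 180.0}}
b_vs_a = {"b1": {"a1": 245.0}, "b2": {"a1": 238.0, "a2": 182.0}}

print(reciprocal_best_hits(a_vs_b, b_vs_a))                # {('a1', 'b1')}
print(sorted(reciprocal_best_hits(a_vs_b, b_vs_a, 0.95)))  # [('a1', 'b1'), ('a1', 'b2')]
```

With the strict criterion a2 has no partner, because b2 prefers a1; relaxing it additionally pairs a1 with b2, so a1 may end up in a group with more than one protein from genome B.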

Reciprocal best BLAST hit(s) for homolog detection

Figure: Homology detection using BLAST

Proteinortho

Features

assignment of orthologs and paralogs for proteins/protein-coding genes

designed for large-scale application

modest memory consumption

capable of distributed computing

Workflow

Figure: Proteinortho workflow: 1) reciprocal BLASTs, 2) transformation into a graph representation, 3) coloring and decomposition, 4) reconversion and mapping to species with encoded proteins
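Steps 2 and 4 of the workflow can be sketched as follows, assuming the reciprocal hits are already available as protein-ID pairs. The decomposition in step 3 is deliberately omitted, and the ID scheme and function names are hypothetical. A component listing more than one protein for a species, like `ecoli` in the second group, contains paralogs.

```python
from collections import defaultdict


def build_graph(pairs):
    """Step 2: turn reciprocal hit pairs into an undirected adjacency map."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def connected_components(graph):
    """Iterative depth-first search; returns a list of node sets."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


def map_to_species(component, species_of):
    """Step 4: list the proteins of one component per species."""
    table = defaultdict(list)
    for protein in sorted(component):
        table[species_of(protein)].append(protein)
    return dict(table)


def species_of(protein_id):
    # Hypothetical ID scheme "<species>_<number>".
    return protein_id.split("_")[0]


pairs = [("ecoli_1", "bsub_1"), ("bsub_1", "mtub_1"), ("ecoli_2", "bsub_2"), ("ecoli_3", "bsub_2")]
for component in connected_components(build_graph(pairs)):
    print(map_to_species(component, species_of))
# {'bsub': ['bsub_1'], 'ecoli': ['ecoli_1'], 'mtub': ['mtub_1']}
# {'bsub': ['bsub_2'], 'ecoli': ['ecoli_2', 'ecoli_3']}
```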

Distributed computing

Figure: a) Multiple PCs running Proteinortho, cooperating dynamically using an N-way technique; b) workflow of synchronization
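The figure does not spell out the N-way technique, so the following is only a generic sketch of the underlying idea: the all-against-all comparison decomposes into independent species-pair jobs that can be spread over machines. Here the jobs are assigned statically round-robin; the function names are hypothetical and this is not the dynamic cooperation scheme shown in the figure.

```python
from itertools import combinations


def pairwise_jobs(species):
    """All unordered pairs of distinct species, i.e. n*(n-1)/2 comparison jobs."""
    return list(combinations(species, 2))


def assign_round_robin(jobs, n_workers):
    """Distribute jobs over workers in turn (static, no load balancing)."""
    buckets = [[] for _ in range(n_workers)]
    for index, job in enumerate(jobs):
        buckets[index % n_workers].append(job)
    return buckets


jobs = pairwise_jobs(["A", "B", "C", "D", "E"])
buckets = assign_round_robin(jobs, 3)
print(len(jobs), [len(b) for b in buckets])  # 10 [4, 3, 3]
```

A dynamic scheme, in which idle machines pick up the next unfinished pair, copes better with jobs of uneven size than this fixed assignment.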

Challenge

Application to all bacteria available at NCBI

710 species, 1.5 million proteins

took about two weeks on 50 CPU cores (Intel Xeon, 2.33 GHz)

peak memory use of only 2.5 GB RAM, but 300 GB of hard-disk space
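A back-of-the-envelope calculation puts these numbers in perspective, assuming every unordered pair of distinct species is compared once and taking "about two weeks" as exactly 14 days (both assumptions, so the derived figures are rough):

```python
n_species = 710
n_proteins = 1_500_000
cores = 50
days = 14  # "about two weeks", so all derived figures are rough

species_pairs = n_species * (n_species - 1) // 2      # 251,695 pairwise comparisons
proteins_per_species = n_proteins / n_species         # about 2,113 proteins on average
core_hours = days * 24 * cores                        # 16,800 core-hours in total
seconds_per_pair = core_hours * 3600 / species_pairs  # about 240 core-seconds per pair

print(species_pairs, round(proteins_per_species), core_hours, round(seconds_per_pair))
# 251695 2113 16800 240
```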

Results

Plot: Coverage overview (cumulative). x-axis: # of species covered (400 to 700); y-axis: # of connected components (0 to 300); series: original, blasted, blasted filtered.

Figure: Number of common proteins. Sets with over 5% paralogs were filtered out.
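The caption does not define how the paralog share is measured; one hypothetical measure is the fraction of proteins in a set that are surplus copies, i.e. beyond one per species. Under that assumption (and with hypothetical function names) the filter can be sketched as:

```python
def paralog_fraction(protein_set):
    """protein_set: {species: [protein IDs]}; share of proteins beyond one per species."""
    total = sum(len(ids) for ids in protein_set.values())
    surplus = sum(len(ids) - 1 for ids in protein_set.values())
    return surplus / total if total else 0.0


def filter_sets(protein_sets, max_fraction=0.05):
    """Keep sets whose paralog fraction does not exceed max_fraction (5% by default)."""
    return [s for s in protein_sets if paralog_fraction(s) <= max_fraction]


single_copy = {f"sp{i}": [f"p{i}"] for i in range(20)}  # one protein per species
with_paralog = {f"sp{i}": [f"q{i}"] for i in range(10)}
with_paralog["sp0"].append("q0b")                       # 1 surplus copy of 11 proteins, about 9%

kept = filter_sets([single_copy, with_paralog])
print(len(kept), kept[0] is single_copy)  # 1 True
```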

Results

Common proteins

30S ribosomal proteins S2-5, S7, S8, S10-13, S17, S19

50S ribosomal proteins L1-3, L5, L6, L11, L14, L22, L23

seryl-, arginyl- and phenylalanyl-tRNA synthetases (phenylalanyl: alpha chain)

preprotein translocase, SecY subunit

peptidase M22, O-sialoglycoprotein endopeptidase

transcription elongation/termination factor NusA

Annotation pipeline

Application for annotation

in: newly sequenced bacterial genome

out: annotation of protein-coding genes; candidates for non-coding genes

no previous knowledge required

runs in 10 to 90 minutes

Relatives discovery

Figure: Relatives detection using reference proteins and tree.

Relatives discovery with colors

Figure: Advanced relatives detection using colors.

Seeding

Figure: Pipeline seeding with proteins.

Pipeline overview

Figure: Pipeline seeding with proteins.

The end

Thank you for listening!

References

[1] W. M. Fitch. Homology: a personal view on some of the problems. Trends Genet, 16(5):227–31, May 2000.

[2] Richard Owen and William White Cooper. Lectures on the comparative anatomy and physiology of the invertebrate animals. London: Longman, Brown, Green, and Longmans, 1843. http://www.biodiversitylibrary.org/bibliography/6788.
