Unifying measures of gene function and evolution

Date post: 22-Feb-2016
Description:
Unifying measures of gene function and evolution. Eugene V. Koonin, National Center for Biotechnology Information, NIH, Bethesda. Nothing in (systems) biology makes sense except in the light of evolution (after Theodosius Dobzhansky, 1970). Wolf, Carmel, Koonin, Proc. Roy Soc. B, in press.
Transcript
Page 1: Unifying measures of gene function and evolution

Unifying measures of gene function and evolution

Wolf, Carmel, Koonin, Proc. Roy Soc. B, in press

Nothing in (systems) biology makes sense except in the light of evolution
after Theodosius Dobzhansky (1970)

Eugene V. Koonin, National Center for Biotechnology Information, NIH, Bethesda

Page 2: Unifying measures of gene function and evolution

With the advent of OMICS data…

The game of correlations began…

Systems Biology and Evolution

Page 3: Unifying measures of gene function and evolution

Evolutionary systems biology:

• In principle, we address the classical problem: the relationship between the (largely neutral?) evolution of the genome and the (largely adaptive) evolution of the phenotype

• In practice, the progress of genomics + other OMICS allows us to measure, on whole-genome scale, the effects of all kinds of molecular phenotypic characteristics (expression level, protein-protein interactions, etc.) on evolutionary rates – this typically yields weak, even if significant, correlations

• Can we synthesize these measurements to produce a coherent picture of the links between phenomic and genomic evolution?

Page 4: Unifying measures of gene function and evolution

The Cautionary Tale

"It was six men of Indostan / To learning much inclined, Who went to see the Elephant / (Though all of them were blind), That each by observation / Might satisfy his mind "

(J.G. Saxe)

Page 5: Unifying measures of gene function and evolution

The Cautionary Tail

"…each was partly in the right / And all were in the wrong"

(J.G. Saxe)

Page 6: Unifying measures of gene function and evolution

Different Faces of the Hypercube?

Synthesis

Pairwise correlations

Page 7: Unifying measures of gene function and evolution

Analysis of Multidimensional Data

Page 8: Unifying measures of gene function and evolution

Analysis of Multidimensional Data

"fair world" model "unfair world" model

Page 9: Unifying measures of gene function and evolution

Analysis of Multidimensional Data

Principal Components Analysis (PCA) introduces a new orthogonal coordinate system where axes are ranked by the fraction of original variance accounted for.

[Figure: principal axes PC1, PC2 and PC3]

Page 10: Unifying measures of gene function and evolution

PCA

• PCA takes a set of variables and defines new variables that are linear combinations of the initial variables.

• PCA expects the variables you enter to be correlated (as is the case in the correlation game of Systems Biology).

• PCA returns new, uncorrelated variables, the principal components or axes, that summarize the information contained in the original full set of variables.

• PCA does not test any hypotheses or predict values for dependent variables; it is more of an exploratory technique.

• The data entered represent a cloud of points in n-space.

• The cloud is, typically, longer in one direction than another, and that longest dimension is where the points are the most different; that's where PCA draws a line called the first principal component.

• The first principal component is guaranteed to be the line that places your sample points the farthest apart from each other; in that way, PCA "extracts the most variance" from your data. This process is repeated to get multiple components, or axes.
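The points above can be illustrated with a minimal sketch, assuming NumPy and purely synthetic data (the function name `pca` and the toy variables are illustrative and do not come from the talk). It standardizes the variables, finds the axes by singular value decomposition, and reports the fraction of variance each axis accounts for.

```python
import numpy as np

def pca(data):
    """Return (axes, explained_fraction, scores) for an n_samples x n_vars array."""
    # Standardize each variable (zero mean, unit variance) so that the
    # analysis works on correlations rather than on raw scales.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s**2 / np.sum(s**2)   # fraction of variance per axis
    scores = z @ vt.T                 # sample coordinates in the new axes
    return vt, explained, scores

# Synthetic, illustrative data: three variables, two of them correlated.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
data = np.column_stack([x, x + 0.5 * rng.normal(size=500), rng.normal(size=500)])

axes, explained, scores = pca(data)
print(np.round(explained, 3))
```

The columns of `scores` are the new variables: they are mutually uncorrelated, and the first one carries the largest share of the variance.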

Page 11: Unifying measures of gene function and evolution

The Data Set: KOGs

Ideally, we would like to obtain and synthesize the data on individual genes in precise space-time coordinates (e.g., instant evolutionary rates).

However:

• some of the variables are not easily measurable (if defined at all) for genes in extant species [e.g., rate of evolution];

• other variables are measurable in principle but, in practice, are available only for a few species [e.g., expression level];

• much of the data are inherently noisy, either due to technical problems or true biological variation [e.g., fitness effect of gene disruption].

Thus, we analyze orthologous protein sets, using the proteins from different species to derive complementary data and smooth out variations in others.

Practically, this means using the KOG dataset (with additions): 10058 KOGs from 15 species (Koonin et al. 2004, Genome Biol).

Page 12: Unifying measures of gene function and evolution

The Data Set: KOGs

[Figure: the 15 species, with a 100 Myr scale bar: Arath, Orysa, Dicdi, Enccu, Maggr, Neucr, Schpo, Sacce, Canal, Caeel, Caebr, Drome, Cioin, Homsa, Musmu]

Original KOGs for some species, "index orthologs" for others.

10058 KOGs altogether

Page 13: Unifying measures of gene function and evolution

Variables: Gene Loss

Propensity for Gene Loss (PGL), introduced by Krylov et al. (Genome Res. 13, 2229-2235, 2003).

[Figure: species At, Ce, Dm, Hs, Sc, Sp, Ec, annotated "Gene loss"]

Computed from KOG phyletic pattern.

Originally an empirical measure (Dollo parsimony reconstruction of events; ratio of branch lengths).

In this work – employs an Expectation Maximization algorithm.

Page 14: Unifying measures of gene function and evolution

Variables: Gene Duplication

Number of Paralogs, average number observed for a given KOG.

Example: KOG0417 (Ubiquitin-protein ligase) and KOG0424 (Ubiquitin-protein ligase).

At1g16890 At1g36340 At1g64230 At1g78870 At2g16740 At2g32790 At3g08690 At3g08700 At3g13550 At4g27960 At5g25760 At5g41700 At5g53300 At5g56150

CE03482 CE09712 CE10824 CE28997

7292764 7292948 7295708_2 7296089 7297757 7298165 7299919

Hs17476541 Hs22043797 Hs22054779 Hs22064361 Hs4507773 Hs4507775 Hs4507777 Hs4507779 Hs4507793 Hs5454146 Hs7661808 Hs8393719

YBR082c YDR059c YDR092w YGR133w

SPAC11E3.04c SPAC1250.03 SPBC119.02 SPBC1198.09

ECU10g0940 ECU11g1990

At3g57870 CE01332 CE09784

7296195 Hs4507785 YDL064w SPAC30D11.13 ECU01g0940
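As a worked illustration of the paralog count, the sketch below tallies the members listed above per species and averages them. Two things are assumptions rather than statements from the slide: that the first block of identifiers belongs to KOG0417 and the second to KOG0424, and that the average is taken over the species in which the KOG is represented. The species keys are inferred from the identifier prefixes.

```python
from statistics import mean

# Identifiers copied from the slide, grouped by species (grouping inferred).
kog0417 = {
    "At": ("At1g16890 At1g36340 At1g64230 At1g78870 At2g16740 At2g32790 "
           "At3g08690 At3g08700 At3g13550 At4g27960 At5g25760 At5g41700 "
           "At5g53300 At5g56150").split(),
    "Ce": "CE03482 CE09712 CE10824 CE28997".split(),
    "Dm": "7292764 7292948 7295708_2 7296089 7297757 7298165 7299919".split(),
    "Hs": ("Hs17476541 Hs22043797 Hs22054779 Hs22064361 Hs4507773 Hs4507775 "
           "Hs4507777 Hs4507779 Hs4507793 Hs5454146 Hs7661808 Hs8393719").split(),
    "Sc": "YBR082c YDR059c YDR092w YGR133w".split(),
    "Sp": "SPAC11E3.04c SPAC1250.03 SPBC119.02 SPBC1198.09".split(),
    "Ec": "ECU10g0940 ECU11g1990".split(),
}
kog0424 = {
    "At": ["At3g57870"], "Ce": ["CE01332", "CE09784"], "Dm": ["7296195"],
    "Hs": ["Hs4507785"], "Sc": ["YDL064w"], "Sp": ["SPAC30D11.13"],
    "Ec": ["ECU01g0940"],
}

def mean_paralogs(members):
    """Average number of members per species with at least one member."""
    return mean(len(ids) for ids in members.values() if ids)

print(round(mean_paralogs(kog0417), 2), round(mean_paralogs(kog0424), 2))
```

Under these assumptions the first KOG averages 47/7 members per species and the second 8/7, which shows how two families with the same annotation can differ sharply in NP.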

Page 15: Unifying measures of gene function and evolution

Variables: Evolution Rate

Ascomycota: Sordariomycetes vs. Yeasts

• Select a taxon;

• Build an alignment (MUSCLE);

• Compute distance matrix (PAML);

• Select minimum distance between members of the two subtrees of the group.
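The last step can be sketched as follows, assuming the pairwise distances are already available as a symmetric NumPy matrix. The matrix values, the index groups and the function name `min_cross_distance` are hypothetical; the alignment and distance computation with MUSCLE and PAML are not reproduced here.

```python
import numpy as np

def min_cross_distance(dist, group_a, group_b):
    """Smallest distance between any member of group_a and any member of group_b."""
    # np.ix_ selects the rectangular block of cross-group distances.
    return float(dist[np.ix_(group_a, group_b)].min())

# Hypothetical distances for four proteins of one KOG:
# indices 0-1 belong to one subtree, indices 2-3 to the other.
dist = np.array([
    [0.00, 0.10, 0.62, 0.70],
    [0.10, 0.00, 0.55, 0.66],
    [0.62, 0.55, 0.00, 0.12],
    [0.70, 0.66, 0.12, 0.00],
])

rate = min_cross_distance(dist, [0, 1], [2, 3])
print(rate)  # 0.55
```

Taking the minimum rather than the mean keeps within-subtree paralogs that diverged quickly from inflating the estimate.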

Page 16: Unifying measures of gene function and evolution

Variables: Expression Level

Expression Level data for S. cerevisiae, D. melanogaster and H. sapiens were downloaded from UCSC Table Browser (hgFixed).

Organism  Table              No. exp.  No. prob.  No. KOGs
Sacce     yeastChoCellCycle  17        6602       3030
Drome     arbFlyLifeAll      162       4921       2617
Homsa     gnfHumanAtlas2All  158       10197      3872

Standardized (μ=0; σ=1) log values; median expression level among paralogs was used to represent a KOG.
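A minimal sketch of this reduction, assuming one raw expression value per gene and standardization across genes (the slide does not say how the individual experiments were combined, so that step is left out). Gene names, KOG labels and values are hypothetical.

```python
import numpy as np

def kog_expression(raw, gene_to_kog):
    """Standardized log expression per KOG: median over the KOG's paralogs."""
    genes = sorted(raw)
    logs = np.log([raw[g] for g in genes])
    z = (logs - logs.mean()) / logs.std()   # mean 0, standard deviation 1
    by_kog = {}
    for gene, value in zip(genes, z):
        by_kog.setdefault(gene_to_kog[gene], []).append(value)
    return {kog: float(np.median(values)) for kog, values in by_kog.items()}

# Hypothetical raw expression values and KOG membership.
raw = {"g1": 100.0, "g2": 400.0, "g3": 50.0, "g4": 1600.0}
gene_to_kog = {"g1": "KOG_A", "g2": "KOG_A", "g3": "KOG_A", "g4": "KOG_B"}

result = kog_expression(raw, gene_to_kog)
print(result)
```

The median makes the KOG-level value robust to a single paralog with unusually high or low expression.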

Page 17: Unifying measures of gene function and evolution

Variables: Interactions

Protein-protein and genetic interaction (PPI and GI) data for S. cerevisiae, C. elegans and D. melanogaster were downloaded from the GRID Web site.

Median number of interaction partners among paralogs was used to represent a KOG.

Page 18: Unifying measures of gene function and evolution

Variables: Lethality

Lethality of Gene Knockout data for S. cerevisiae were downloaded from MIPS FTP site (0/1 values).

Embryonic lethality data from RNA interference (RNAi) for C. elegans were taken from Kamath et al., 2003 (0/1 values).

Page 19: Unifying measures of gene function and evolution

Missing Data

Total: 38 variables in 10058 KOGs – lots of missing data.

Complete data (all 38 variables available): 23 KOGs – too few.

Combined data: 7 variables, 1482 KOGs with complete data; 4124 with at most one missing point; 3912 KOGs after removal of outliers.

Example: evolution rate.

          At.Os  Sc.Ca  Mg.Nc  Hs.Mm  Pl.MF
KOG0009   -      0.168  0.300  -      0.405
KOG0010   0.671  1.252  0.606  0.087  1.492
KOG0011   0.905  1.698  0.428  0.073  1.547
KOG0012   -      2.238  0.665  0.244  -
KOG0013   0.355  -      -      0.014  1.343
KOG0014   1.913  4.041  -      0.126  2.840
KOG0015   -      2.286  0.400  0.027  -
KOG0016   -      -      0.506  0.380  -

          0.667  1.864  0.521  0.075  1.910

Normalized values (each value divided by the bottom row of its column above) and their per-KOG average:

          At.Os  Sc.Ca  Mg.Nc  Hs.Mm  Pl.MF  Average
KOG0009   -      0.090  0.575  -      0.212  0.293
KOG0010   1.006  0.672  1.162  1.166  0.781  0.957
KOG0011   1.358  0.911  0.821  0.984  0.810  0.977
KOG0012   -      1.201  1.275  3.275  -      1.917
KOG0013   0.532  -      -      0.181  0.703  0.472
KOG0014   2.869  2.168  -      1.692  1.487  2.054
KOG0015   -      1.227  0.767  0.365  -      0.786
KOG0016   -      -      0.970  5.087  -      3.028
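The arithmetic of the evolution-rate example can be retraced with a short sketch, assuming NumPy. Each rate is divided by a per-column scale factor (the unlabeled row 0.667, 1.864, 0.521, 0.075, 1.910 on the slide; what statistic it represents is not stated) and the available values are averaged, skipping missing entries. The names `column_scale` and `combined_rate` are illustrative.

```python
import numpy as np

nan = np.nan
# Columns: At.Os, Sc.Ca, Mg.Nc, Hs.Mm, Pl.MF (three rows copied from the slide).
rates = {
    "KOG0009": [nan, 0.168, 0.300, nan, 0.405],
    "KOG0010": [0.671, 1.252, 0.606, 0.087, 1.492],
    "KOG0013": [0.355, nan, nan, 0.014, 1.343],
}
# Per-column scale factors: the unlabeled row from the slide.
column_scale = np.array([0.667, 1.864, 0.521, 0.075, 1.910])

def combined_rate(values):
    """Average of the column-normalized rates, ignoring missing entries."""
    normalized = np.asarray(values, dtype=float) / column_scale
    return float(np.nanmean(normalized))

for kog, values in rates.items():
    print(kog, round(combined_rate(values), 3))
```

The results agree with the slide's Average column (0.293, 0.957, 0.472) to within the rounding of the printed scale factors, so a KOG with data for only some taxon pairs still receives a comparable combined rate.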

Page 20: Unifying measures of gene function and evolution

Variables

Phenotypic

• EL – expression level

• PPI – protein-protein interactions

• GI – genetic interactions

• KE – knockout effect

• NP – number of paralogs

Evolutionary

• ER – (sequence) evolution rate

• PGL – propensity for gene loss

Page 21: Unifying measures of gene function and evolution

The correlations

      NP      PPI     GI      PGL     ER      EL      KE
NP    -
PPI   0.057   -
GI    0.060   0.034   -
PGL   0.000   -0.125  -0.019  -
ER    -0.070  -0.200  0.034   0.141   -
EL    0.129   0.199   -0.050  -0.099  -0.277  -
KE    0.027   0.234   -0.048  -0.181  -0.155  0.188   -

Page 22: Unifying measures of gene function and evolution

Two Tiers of Variables

[Diagram: two tiers, "evolutionary" variables and "phenotypic" variables; captions: "bigger is better"; "slow is good, fast is bad"]

Observation on the pattern of pairwise relationships in the data: "phenotypic" and "evolutionary" variables behave differently.

Page 23: Unifying measures of gene function and evolution

Two Tiers of Variables

[Diagram: two tiers, "evolutionary" variables and "phenotypic" variables; links labeled "positive", "positive" and "negative"]

Observation on the pattern of pairwise relationships in the data: "phenotypic" and "evolutionary" variables behave differently.

Page 24: Unifying measures of gene function and evolution

The correlations

Annotations: non-essential (almost by definition); low-expressed; relatively fast-evolving

      NP      PPI     GI      PGL     ER      EL      KE
NP    -
PPI   0.057   -
GI    0.060   0.034   -
PGL   0.000   -0.125  -0.019  -
ER    -0.070  -0.200  0.034   0.141   -
EL    0.129   0.199   -0.050  -0.099  -0.277  -
KE    0.027   0.234   -0.048  -0.181  -0.155  0.188   -

Page 25: Unifying measures of gene function and evolution

PCA of the Data Space

        PC.1   PC.2   PC.3
NP      0.17   0.69   0.44
PPI     0.46   0      -0.17
GI      0      0.67   -0.54
PGL     -0.33  0      0.51
ER      -0.47  0      -0.20
EL      0.48   0      0.36
KE      0.45   -0.27  -0.21
---------------------------
% var.  25.0   15.3   14.5

[Figure: fraction of variance per component: PC1 25.0%, PC2 15.3%, PC3 14.5%, PC4 12.4%, PC5 12.2%, PC6 10.6%, PC7 10.0%]

Sphericity
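As a rough cross-check, a PCA can be run directly on the pairwise correlation matrix shown earlier in the talk by eigendecomposition. This is a sketch assuming NumPy, and it is not necessarily the authors' procedure: how missing values were handled in their PCA is not stated, so the numbers need not match the slide exactly.

```python
import numpy as np

names = ["NP", "PPI", "GI", "PGL", "ER", "EL", "KE"]
# Lower triangle of the correlation table, row by row (diagonal omitted).
lower = [
    [],
    [0.057],
    [0.060, 0.034],
    [0.000, -0.125, -0.019],
    [-0.070, -0.200, 0.034, 0.141],
    [0.129, 0.199, -0.050, -0.099, -0.277],
    [0.027, 0.234, -0.048, -0.181, -0.155, 0.188],
]
corr = np.eye(len(names))
for i, row in enumerate(lower):
    for j, r in enumerate(row):
        corr[i, j] = corr[j, i] = r

# eigh returns eigenvalues in ascending order; reverse to rank the axes.
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
explained = eigvals / eigvals.sum()

print(np.round(100 * explained, 1))
for name, loading in zip(names, eigvecs[:, 0]):
    print(name, round(float(loading), 2))
```

The leading component should come out close to a quarter of the total variance, comparable to the 25.0% on the slide. The overall sign of each eigenvector is arbitrary, so loadings may appear with all signs flipped relative to the table.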

Page 26: Unifying measures of gene function and evolution

PCA of the Data Space

[Figure: variable loadings on PC1 (horizontal axis) vs. PC2 (vertical axis) for NP, GI, PGL, ER, EL, PPI, KE]

Page 27: Unifying measures of gene function and evolution

PCA of the Data Space

[Figure: variable loadings on PC2 (horizontal axis) vs. PC3 (vertical axis) for NP, GI, PGL, EL, PPI, KE, ER]

Page 28: Unifying measures of gene function and evolution

PC1 – Gene's "status"

[Figure: variable loadings on PC1 (horizontal axis) vs. PC2 (vertical axis) for NP, GI, PGL, ER, EL, PPI, KE; the PC1 axis is annotated "important" and "accessory"]

Page 29: Unifying measures of gene function and evolution

PC2 – "Adaptability"

[Figure: variable loadings on PC1 (horizontal axis) vs. PC2 (vertical axis) for NP, GI, PGL, ER, EL, PPI, KE; the PC2 axis is annotated "flexible" and "rigid"]

Page 30: Unifying measures of gene function and evolution

PC2 and Expression Profile Skew

[Figure panels: Skew ~0; Skew >0]

                  Status - LO                  Status - HI
                  PC2 LO  PC2 HI  p-value      PC2 LO  PC2 HI  p-value
S. cerevisiae     0.29    0.29    1x10^0       0.32    0.44    3x10^-3
D. melanogaster   1.82    1.84    4x10^-1      1.82    1.90    7x10^-2
H. sapiens        1.75    1.94    7x10^-4      1.87    2.12    <1x10^-20
Omnibus test                      1x10^-2                      <1x10^-20

Page 31: Unifying measures of gene function and evolution

PC3 – "Reactivity"

[Figure: variable loadings on PC2 (horizontal axis) vs. PC3 (vertical axis) for NP, GI, PGL, EL, PPI, KE, ER]

Page 32: Unifying measures of gene function and evolution

PC3 and Expression Profile Skew

[Figure panels: Skew ~0; Skew >0]

                  Status - LO                  Status - HI
                  PC3 LO  PC3 HI  p-value      PC3 LO  PC3 HI  p-value
S. cerevisiae     0.26    0.31    3x10^-1      0.22    0.50    <1x10^-20
D. melanogaster   1.77    1.88    6x10^-2      1.86    1.85    9x10^-1
H. sapiens        1.80    1.94    3x10^-4      1.86    2.13    <1x10^-20
Omnibus test                      4x10^-4                      <1x10^-20

Page 33: Unifying measures of gene function and evolution

Relationships Between Variables

[Diagram labels: "evolutionary" variables; "phenotypic" variables; "STATUS"; "ADAPTABILITY"; "REACTIVITY"]

Page 34: Unifying measures of gene function and evolution

Status and Adaptability of Genes

Classification of KOGs into 4 major categories

[Figure: axes Status and Adaptability]

Page 35: Unifying measures of gene function and evolution

Status and Adaptability of Genes

Classification of KOGs into 4 major categories

[Figure: axes Status, Adaptability and Reactivity; categories INF, CELL, MET, UNKN]

Page 36: Unifying measures of gene function and evolution

Status and Adaptability of Genes

Cytoplasmic and Mitochondrial ribosomal proteins

[Figure: plot of Status (horizontal axis, -5 to 5) vs. Adaptability (vertical axis, -4 to 6)]

Page 37: Unifying measures of gene function and evolution

Status and Adaptability of Genes

Vacuolar ATPase and Vacuolar Sorting proteins

[Figure: plot of Status (horizontal axis, -5 to 5) vs. Adaptability (vertical axis, -4 to 6)]

Page 38: Unifying measures of gene function and evolution

Status and Adaptability of Genes

Replication Licensing Complex and Histones

[Figure: plot of Status (horizontal axis, -5 to 5) vs. Adaptability (vertical axis, -4 to 6)]

Page 39: Unifying measures of gene function and evolution

Status and Adaptability of Genes

RNA processing and modification

[Figure: plot of Status (horizontal axis, -5 to 5) vs. Adaptability (vertical axis, -4 to 6); annotation: Core Cluster (spliceosome and mRNA cleavage-polyadenylation complex)]

Page 40: Unifying measures of gene function and evolution

Adaptability and Reactivity of Genes

[Figure: plot of Adaptability (horizontal axis, -1 to 1) vs. Reactivity (vertical axis, -1 to 1) with four numbered groups (1-4); labels: translation and ribosome; replication, RNA processing and modification; signal transduction; carbohydrate transport and metabolism]

Page 41: Unifying measures of gene function and evolution

Status, adaptability, and reactivity of selected multisubunit complexes and functional classes of proteins

Major functional categories          No. of KOGs  Average status  Average adaptability  Average reactivity
Information storage and processing   951          0.553*          -0.164*               -0.146*
Cellular processes and signaling     1216         0.179*          0.201*                -0.080*
Metabolism                           692          -0.057          0.075                 0.494*
Poorly characterized                 1053         -0.669*         -0.134*               -0.100*

Complexes                            No. of KOGs  Average status  Average adaptability  Average reactivity
Cytoplasmic ribosome                 76           2.679*          0.203                 1.226*
Mitochondrial ribosome               40           -0.004          -0.527*               -0.089
Chaperonin complex TCP-1             8            2.237*          -0.291                -0.299
Spliceosome                          50           1.234*          -0.511*               -0.393*
mRNA cleavage and polyadenylation    10           0.968*          -0.609                -0.705
Proteasome                           33           2.158*          -0.547*               -0.329*
Exosome                              12           0.967*          -0.660                -0.419
Nucleosome                           6            1.933           1.875                 1.727
Vesicle coat complex                 19           1.360*          -0.496*               -0.049
Vacuolar H+-ATPase                   13           1.696*          -0.449                0.345
Mitochondrial F0F1-ATP synthase      13           1.110*          -0.427                0.083
Replication licensing complex        6            1.475*          -1.154                -0.046
Aminoacyl-tRNA synthetases           33           0.425           -0.478*               -0.131

* Significantly different from zero (P < 0.05), using t-test with Bonferroni correction.
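The Bonferroni step mentioned in the table footnote can be sketched in a few lines of plain Python. The raw p-values below are hypothetical, and the number of tests the authors corrected for is not stated in the slides; the sketch only shows the mechanics of the correction.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag p-values that remain below alpha after multiplying by the number of tests."""
    m = len(p_values)
    return [p * m < alpha for p in p_values]

# Hypothetical raw p-values from t-tests of three group means against zero.
p_values = [0.001, 0.02, 0.30]
flags = bonferroni_significant(p_values)
print(flags)  # [True, False, False]
```

With many groups tested at once, a mean can be far from zero and still lose its asterisk if the group is small, which may be why the nucleosome row (6 KOGs) carries no significance marks despite large averages.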

Page 42: Unifying measures of gene function and evolution

Conclusions

• Three composite, independent variables – "status", "adaptability" and "reactivity" – dominate the multidimensional data space of quantitative genomics.

• The notion of status provides biologically relevant null hypotheses regarding the connections between various measures.

• Breaks in the pattern possibly indicate something nontrivial (targets for further investigation).

• Functional groups of genes show distinctive patterns of status, adaptability, and reactivity.

Page 43: Unifying measures of gene function and evolution

Co-Authors

Liran Carmel, Eugene Koonin, Yuri Wolf

