population genetics of gene function (talk)

Post on 21-Jan-2015

Talk given at Popgroup 46, Glasgow, December 2012

transcript

Population genetics of gene function

Ignacio Gallo

Glasgow, December 2012

Motivation

"Molecular signatures of natural selection", Nielsen 2005:

"inferences regarding the patterns and distribution of selection in genes and genomes may provide important functional information"

Wikipedia entry on "statistical thermodynamics":

"The goal of statistical thermodynamics is to understand and to interpret the measurable macroscopic properties of materials in terms of the properties of their constituent particles and the interactions between them"

Motivation

The functional importance of a genetic sequence can be inferred from its population distribution (for example from its degree of conservation).

Can anything more be said about the gene's specific function (survival, reproduction, etc)?

distribution of genes → gene function

macroscopic properties of materials → properties of their constituent particles (size, speed, etc)

Moran model: birth, death

Model variation: change in frequency for a given phenotype

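[Figure: transition diagram with death probabilities p−, q−, birth probabilities p+, q+, and probability 1 − p− − q− that nothing happens]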

We have two phenotypes, P1 and P2.

p−, p+: death and birth probabilities for P1
q−, q+: death and birth probabilities for P2

If p− + q− < 1, in some intervals nothing happens.

If a death happens, a birth happens instantaneously ("musical chairs" process).

[Figure: the transition diagram for p−, p+, q−, q+ and 1 − p− − q−, with its "loop" and "no loop" transitions marked]

We differentiate the phenotypes' reproductive fitness and lifetimes independently, and consider reproduction and survival as two different functions.

Reproduction:
W1: offspring for type 1
W2: offspring for type 2

Survival:
T1: average lifespan for type 1
T2: average lifespan for type 2

We are interested in the equilibrium distribution of a process with symmetric reversible mutation for haploid individuals, where

u: mutation probability

Variable x is the frequency of phenotype P1 (so the frequency of P2 is 1 − x).

Death probabilities:

$$p^- = \frac{x}{T_1}, \qquad q^- = \frac{1-x}{T_2}.$$

Birth probabilities:

$$p^+ = \frac{\frac{W_1}{T_1}(1-u)\,x + \frac{W_2}{T_2}\,u\,(1-x)}{\frac{W_1}{T_1}\,x + \frac{W_2}{T_2}\,(1-x)}, \qquad q^+ = \frac{\frac{W_1}{T_1}\,u\,x + \frac{W_2}{T_2}\,(1-u)(1-x)}{\frac{W_1}{T_1}\,x + \frac{W_2}{T_2}\,(1-x)}.$$

$$\Rightarrow \; p^- + q^- < 1$$
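A minimal Python sketch of one time interval of this process may help make the "musical chairs" rule concrete. The function names, the use of a count k of P1 individuals among N, and the choice to evaluate the birth probabilities at the frequency before the death are assumptions made for illustration; the slides do not specify an implementation. The sketch also assumes T1, T2 ≥ 1, so that p− + q− ≤ 1.

```python
import random

def transition_probabilities(x, W1, W2, T1, T2, u):
    """Death (p_minus, q_minus) and birth (p_plus, q_plus) probabilities
    at frequency x of phenotype P1."""
    b1 = W1 / T1 * x            # reproductive weight of P1
    b2 = W2 / T2 * (1.0 - x)    # reproductive weight of P2
    total = b1 + b2
    p_plus = ((1.0 - u) * b1 + u * b2) / total
    q_plus = (u * b1 + (1.0 - u) * b2) / total
    p_minus = x / T1
    q_minus = (1.0 - x) / T2
    return p_minus, p_plus, q_minus, q_plus

def step(k, N, W1, W2, T1, T2, u, rng=random):
    """One time interval; k is the number of P1 individuals among N."""
    p_minus, p_plus, q_minus, _ = transition_probabilities(k / N, W1, W2, T1, T2, u)
    r = rng.random()
    if r < p_minus:                  # a P1 individual dies
        k -= 1
    elif r < p_minus + q_minus:      # a P2 individual dies
        pass
    else:                            # nothing happens in this interval
        return k
    # "musical chairs": the vacancy is filled at once by a newborn,
    # which is of type P1 with probability p_plus
    if rng.random() < p_plus:
        k += 1
    return k
```

Iterating `step` from any starting count gives one realization of the process; the parameter values used to try it out are arbitrary.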

Asymptotic parameters

To get a non-trivial distribution the following asymptotic constraints are imposed on the parameters:

$$N u \xrightarrow{\,N \to \infty\,} \theta, \qquad N\left(\frac{W_1}{W_2} - 1\right) \xrightarrow{\,u \to 0\,} s.$$

For notational convenience we also define λ = T1/T2.

The model therefore depends on: θ (mutation), s (reproduction), λ (survival).

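$$M = E[X_{t+1} - X_t] = \frac{1}{N T_1} \cdot \frac{\theta \lambda^2 (1-x)^2 + \lambda s\, x(1-x) - \theta x^2}{x + \lambda(1-x)},$$

$$V = E\left[(X_{t+1} - X_t)^2\right] = \frac{2\lambda}{N T_1} \cdot \frac{x(1-x)}{x + \lambda(1-x)}.$$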

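Equilibrium distribution

The Wright equilibrium distribution for large N is

$$\varphi(x) = C \cdot \frac{1}{V} \cdot \exp\left\{ 2 \int \frac{M}{V}\, dx \right\},$$

which explicitly gives

$$\varphi(x) = C\, e^{\alpha x}\, x^{\lambda\theta - 1}\, (1-x)^{\frac{\theta}{\lambda} - 1}\, \{ x + \lambda(1-x) \},$$

where

$$\alpha = s + \theta\left(\frac{1}{\lambda} - \lambda\right).$$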
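The explicit form can be checked from the expressions for M and V given earlier. The intermediate steps below are a sketch filled in for the reader and are not taken from the slides; the common factor 1/(N T1) cancels in the ratio M/V, and the constant prefactor of 1/V is absorbed into C.

```latex
\frac{2M}{V}
  = \frac{\theta\lambda^{2}(1-x)^{2} + \lambda s\,x(1-x) - \theta x^{2}}{\lambda\,x(1-x)}
  = \theta\lambda\,\frac{1-x}{x} + s - \frac{\theta}{\lambda}\,\frac{x}{1-x},

\int \frac{2M}{V}\,dx
  = \lambda\theta \ln x + \frac{\theta}{\lambda}\ln(1-x)
    + \left[\, s + \theta\left(\frac{1}{\lambda} - \lambda\right) \right] x ,

\frac{1}{V} \;\propto\; \frac{x + \lambda(1-x)}{x(1-x)}
\quad\Longrightarrow\quad
\varphi(x) = C\, e^{\alpha x}\, x^{\lambda\theta-1}\,(1-x)^{\frac{\theta}{\lambda}-1}\,\{x + \lambda(1-x)\}.
```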

Typical shapes for equilibrium distributions

[Figure: three panels showing probability density against relative frequency of "blue" phenotypes, for low mutation, for high mutation, and for mutation probability close to 1/N]
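These shapes can be explored numerically by evaluating the equilibrium density φ(x) on a grid. The sketch below is only illustrative: the function names and the crude midpoint-rule normalisation are assumptions, and for low mutation the endpoint singularities make the normalising constant inaccurate, although the qualitative shape is unaffected.

```python
import math

def equilibrium_density(x, theta, s, lam):
    """Unnormalised phi(x) = e^{alpha x} x^{lam*theta-1} (1-x)^{theta/lam-1} (x + lam(1-x))."""
    alpha = s + theta * (1.0 / lam - lam)
    return (math.exp(alpha * x)
            * x ** (lam * theta - 1.0)
            * (1.0 - x) ** (theta / lam - 1.0)
            * (x + lam * (1.0 - x)))

def normalised_density(theta, s, lam, n=2000):
    """Evaluate phi at the midpoints of n equal cells of (0, 1) and
    rescale so that the midpoint-rule integral equals 1.
    Midpoints avoid the endpoints x = 0 and x = 1."""
    xs = [(i + 0.5) / n for i in range(n)]
    ys = [equilibrium_density(x, theta, s, lam) for x in xs]
    total = sum(ys) / n
    return xs, [y / total for y in ys]
```

With s = 0 and λ = 1, a small θ (for example 0.2) puts most of the mass near the boundaries, while a large θ (for example 5) gives a single interior peak.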

Stationary points

[Figure: stationary points plotted against "u", the probability of mutation per site, in two panels: λ = T1/T2 = 1 (lifetimes are equal for the two phenotypes) and λ = T1/T2 = 3/2. The value u = 1/N is marked on the horizontal axis, and the regions are labelled "random drift" and "mutation/selection balance".]

Inferring λ = T1/T2 from population statistics

The model includes one more parameter than the standard setting, so it is desirable to expand the number of independent statistics.

This can be done by considering the amount of synonymous variation included in each of our two phenotypes, and treating it as neutral (as done by Nielsen and colleagues for various types of models).

The amount of synonymous variation can be quantified by using the concept of the inbreeding coefficient.

Statistical quantities

[Figure: distribution of a gene throughout a population (Kreitman 1983), with the genotypes grouped into phenotypes P1 and P2]

x = relative frequency of P1
F1 = inbreeding coefficient for P1
F2 = inbreeding coefficient for P2
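A rough Python sketch of how x, F1 and F2 might be computed from a sample is given here. It assumes the identity-by-state reading of the inbreeding coefficient (the probability that two individuals drawn with replacement from the same phenotype carry the same genotype, i.e. the sum of squared genotype frequencies); the slides do not spell out the estimator, and the data layout and function names are hypothetical.

```python
from collections import Counter

def inbreeding_coefficient(genotypes):
    """Sum of squared genotype frequencies within one phenotype class.
    Returns NaN for an empty class."""
    n = len(genotypes)
    if n == 0:
        return float("nan")
    counts = Counter(genotypes)
    return sum((c / n) ** 2 for c in counts.values())

def population_statistics(sample):
    """sample: list of (phenotype, genotype) pairs, phenotype in {"P1", "P2"}.
    Returns (x, F1, F2)."""
    g1 = [g for ph, g in sample if ph == "P1"]
    g2 = [g for ph, g in sample if ph == "P2"]
    x = len(g1) / len(sample)
    return x, inbreeding_coefficient(g1), inbreeding_coefficient(g2)
```

For instance, a sample with three P1 individuals (two sharing a genotype) and one P2 individual gives x = 3/4, F1 = 5/9 and F2 = 1.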

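A result by Kimura and Crow gives that for only one phenotype

$$F \approx \frac{1}{1 + 2\theta},$$

where θ is the rescaled mutation rate.

This can be extended to the case of two phenotypes to give two equations:

$$\left\langle \frac{F_1}{x} \right\rangle + \theta \left[ \lambda \left( \left\langle \frac{F_1}{x} \right\rangle - \langle F_1 \rangle \right) + (L-1) \langle F_1 \rangle \right] - \left\langle \frac{1}{x} \right\rangle \approx 0,$$

$$\left\langle \frac{F_2}{1-x} \right\rangle + \theta \left[ \frac{1}{\lambda} \left( \left\langle \frac{F_2}{1-x} \right\rangle - \langle F_2 \rangle \right) + (L-1) \langle F_2 \rangle \right] - \left\langle \frac{1}{1-x} \right\rangle \approx 0.$$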

This (hideous) formula can be used to derive the value of λ from combined moments of the quantities x, F1, F2 estimated from a set of simulated realizations of the process.

[Figure: estimated versus real values of λ, with both axes running from 0.5 (T1 = 1/2 T2) to 2 (T1 = 2 T2)]

Simulation parameters: s = −5, θ = 7, N = 1000, L (genotype length) = 40
Number of realizations per point = 10000
Running time = 5000 "generations"

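…following auxiliary quantities

$$R = \frac{\langle F_2 \rangle}{\langle F_1 \rangle} \cdot \frac{\langle 1/x \rangle - \langle F_1/x \rangle}{\langle 1/(1-x) \rangle - \langle F_2/(1-x) \rangle},$$

$$Q_1 = \frac{\langle F_1/x \rangle}{\langle F_1 \rangle} - 1, \qquad Q_2 = \frac{\langle F_2/(1-x) \rangle}{\langle F_2 \rangle} - 1.$$

In terms of these quantities, the equation for λ takes the following form

$$R = \lambda\, \frac{\lambda Q_1 + L - 1}{Q_2 + \lambda (L-1)},$$

and this relation leads to a quadratic equation that only admits one non-negative solution:

$$\lambda = \frac{1}{2 Q_1} \left\{ (R-1)(L-1) + \sqrt{(R-1)^2 (L-1)^2 + 4 R Q_1 Q_2} \right\}. \qquad (4.5)$$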
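Formula (4.5) is straightforward to evaluate once the moments are available. The sketch below assumes that R, Q1 and Q2 have already been estimated from data, and the function names are illustrative only. It also includes the forward relation for R, so that the inversion can be checked on synthetic values.

```python
import math

def estimate_lambda(R, Q1, Q2, L):
    """Non-negative root of Q1*lam**2 - (R-1)*(L-1)*lam - R*Q2 = 0,
    i.e. formula (4.5). Assumes Q1 > 0 and R*Q2 >= 0."""
    b = (R - 1.0) * (L - 1.0)
    return (b + math.sqrt(b * b + 4.0 * R * Q1 * Q2)) / (2.0 * Q1)

def R_from_lambda(lam, Q1, Q2, L):
    """Forward relation R = lam * (lam*Q1 + L - 1) / (Q2 + lam*(L - 1))."""
    return lam * (lam * Q1 + L - 1.0) / (Q2 + lam * (L - 1.0))
```

Feeding `R_from_lambda(lam, Q1, Q2, L)` back into `estimate_lambda` should return the chosen `lam` up to rounding, which is a quick consistency check on the algebra rather than a test of the statistical method.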

Figure 5 shows the result of using formula (4.5) to estimate λ, for a series of simulations where the real value of λ ranges from 0.5 (i.e. T1 = 1/2 T2) to 2 (i.e. T1 = 2 T2): we see that the average values of such estimations are well aligned with the actual values.

The magnitude of the standard deviation for our estimations, on the other hand, is considerable, especially in view of the fact that 10000 realisations of the process were used to estimate each value of λ: it is clear that a substantial increase of efficiency will be needed to make the theory relevant to actual empirical phenomena.

This practical consideration ought not to be allowed, however, to obfuscate the fact that equation (4.5) provides a direct mathematical relation between combined moments of the population quantities x, F1 and F2, and parameter λ = T1/T2, which arguably contains information about the function of a genetic sequence.

5 Outlook

We have shown that the effect of differentiating the lifetimes of two phenotypes independently from their fertility includes a qualitative change in the equilibrium state of a population: since survival and reproduction are quite distinct macro-functions performed by any living organism, this may contribute to extend the population-genetical characterisation of biological function.

Summary

• Playing with details of process is fun

• In principle the "functional" parameter λ = T1/T2 can be estimated from the "population observables" x, F1, F2

Thank you!

"Disclaimer"

…there is no such thing as the "function of a gene", like there is no such thing as the "meaning of a word".

For example, the word "Gene" can mean: [image] or [image]

…but dictionaries exist (and they are obsolete (but they're a start (or maybe not))).