
October 8, 2012sdifazio/popgen_12/lectures/oct8_population_structure2.pdf · Midterm Course...

Lecture 13: Population Structure October 8, 2012

Lecture 13: Population Structure

October 8, 2012


Last Time

 Effective population size calculations

 Historical importance of drift: shifting balance or noise?

 Population structure


Today

 Course feedback

 The F-Statistics

 Sample calculations of FST

 Defining populations on genetic criteria


Midterm Course Evaluations

 Based on five responses: It’s not too late to have an impact!

 Lectures are generally OK

 Labs are valuable, but better organization and more feedback are needed

 Difficulty level is OK

 Book is awful


F-Coefficients

 Quantification of the structure of genetic variation in populations: population structure

  Partition variation to the Total Population (T), Subpopulations (S), and Individuals (I)

[Figure: diagram labeled T (total population) and S (subpopulations)]


F-Coefficients

Combine different sources of reduction in expected heterozygosity into one equation:

$1 - F_{IT} = (1 - F_{ST})(1 - F_{IS})$

 $F_{ST}$: deviation due to subpopulation differentiation

 $F_{IT}$: overall deviation from H-W expectations

 $F_{IS}$: deviation due to inbreeding within populations


F-Coefficients and IBD

 View F-statistics as probability of Identity by Descent for different samples

$1 - F_{IT} = (1 - F_{ST})(1 - F_{IS})$

 $F_{IT}$: overall probability of IBD

 $F_{ST}$: probability of IBD for 2 individuals in a subpopulation

 $F_{IS}$: probability of IBD within an individual


F-Statistics Can Measure Departures from Expected Heterozygosity Due to Wahlund Effect

$F_{IS} = \frac{H_S - H_I}{H_S}$

$F_{IT} = \frac{H_T - H_I}{H_T}$

$F_{ST} = \frac{H_T - H_S}{H_T}$

where

$H_T$ is the average expected heterozygosity in the total population

$H_I$ is observed heterozygosity within a subpopulation

$H_S$ is the average expected heterozygosity in subpopulations
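The three ratios can be checked numerically. The sketch below is only an illustration: the heterozygosity values are hypothetical (not from the lecture), and the function name `f_statistics` is made up for this example. It also confirms that the results satisfy 1 - F_IT = (1 - F_ST)(1 - F_IS).

```python
def f_statistics(h_i, h_s, h_t):
    """Return (F_IS, F_IT, F_ST) from observed and expected heterozygosities."""
    f_is = (h_s - h_i) / h_s   # deficit of heterozygotes within subpopulations
    f_it = (h_t - h_i) / h_t   # overall deficit relative to the total population
    f_st = (h_t - h_s) / h_t   # deficit due to subpopulation differentiation
    return f_is, f_it, f_st


# Hypothetical values chosen only to make the arithmetic easy to follow.
h_i, h_s, h_t = 0.30, 0.40, 0.50
f_is, f_it, f_st = f_statistics(h_i, h_s, h_t)

print(f"F_IS={f_is:.3f}  F_IT={f_it:.3f}  F_ST={f_st:.3f}")
print("identity holds:", abs((1 - f_it) - (1 - f_st) * (1 - f_is)) < 1e-12)
```

The identity holds for any inputs because (H_I/H_T) = (H_S/H_T)(H_I/H_S).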


Calculating FST

Recessive allele for flower color: B2B2 = white; B1B1 and B1B2 = dark pink

Subpopulation 1 (White: 10, Dark: 10): $F(\text{white}) = 10/20 = 0.5$; $F(B_2)_1 = q_1 = \sqrt{0.5} = 0.707$; $p_1 = 1 - 0.707 = 0.293$

Subpopulation 2 (White: 2, Dark: 18): $F(\text{white}) = 2/20 = 0.1$; $F(B_2)_2 = q_2 = \sqrt{0.1} = 0.32$; $p_2 = 1 - 0.32 = 0.68$


Calculating FST

Calculate Average HE of Subpopulations (HS)

For 2 subpopulations: $H_S = \sum_i 2 p_i q_i / 2 = (2(0.707)(0.293) + 2(0.32)(0.68))/2 = 0.425$

Calculate Average HE for Merged Subpopulations (HT)

$F(\text{white}) = 12/40 = 0.3$; $q = \sqrt{0.3} = 0.55$; $p = 0.45$

$H_T = 2pq = 2(0.55)(0.45) = 0.495$


Bottom Line:

$F_{ST} = (H_T - H_S)/H_T = (0.495 - 0.425)/0.495 = 0.14$

 14% of the total variation in flower color alleles is due to variation among populations

AND

 Average expected heterozygosity within subpopulations is 14% lower than expected heterozygosity of the merged population (Wahlund Effect)
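The flower-color calculation can be reproduced in a few lines. This is a sketch under the same assumptions as the slides: white is recessive, and each group is treated as being in Hardy-Weinberg proportions so that q is the square root of the white-phenotype frequency. The helper name `expected_het_from_recessive` is invented for this example. Without rounding the intermediate values, F_ST comes out near 0.146 rather than the 0.14 obtained from the rounded figures.

```python
import math


def expected_het_from_recessive(n_recessive, n_total):
    """Expected heterozygosity 2pq, with q estimated as sqrt(recessive phenotype frequency)."""
    q = math.sqrt(n_recessive / n_total)
    p = 1.0 - q
    return 2.0 * p * q


# Counts from the slides: (white, total) per subpopulation.
h1 = expected_het_from_recessive(10, 20)
h2 = expected_het_from_recessive(2, 20)

h_s = (h1 + h2) / 2                          # average over the two subpopulations
h_t = expected_het_from_recessive(12, 40)    # merged sample, as on the slide
f_st = (h_t - h_s) / h_t

print(f"H_S={h_s:.3f}  H_T={h_t:.3f}  F_ST={f_st:.3f}")
```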


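Nei's Gene Diversity: GST

Nei's generalization of FST to multiple, multiallelic loci

$G_{ST} = \frac{D_{ST}}{H_T}$, with $D_{ST} = H_T - H_S$

$H_S = \frac{1}{m} \sum_{i=1}^{m} \left(1 - \sum_{j=1}^{n} p_{ij}^2\right)$

where $H_S$ is the mean $H_E$ of $m$ subpopulations, calculated for $n$ alleles with frequency $p_{ij}$ (allele $j$ in subpopulation $i$)

$H_T = 1 - \sum_{j=1}^{n} \bar{p}_j^2$

where $\bar{p}_j$ is the mean frequency of allele $j$ over all subpopulations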
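A minimal sketch of the G_ST calculation for a single locus follows. The function names and the example frequencies are hypothetical, subpopulations are weighted equally, and averaging over multiple loci is left out.

```python
def gene_diversity(freqs):
    """Expected heterozygosity 1 - sum(p_j^2) for one set of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def nei_gst(subpop_freqs):
    """Return (H_S, H_T, D_ST, G_ST) for one locus.

    subpop_freqs: list of m lists, each holding n allele frequencies that sum to 1.
    """
    m = len(subpop_freqs)
    n = len(subpop_freqs[0])
    h_s = sum(gene_diversity(f) for f in subpop_freqs) / m
    # Unweighted mean frequency of each allele across subpopulations.
    mean_freqs = [sum(f[j] for f in subpop_freqs) / m for j in range(n)]
    h_t = gene_diversity(mean_freqs)
    d_st = h_t - h_s
    return h_s, h_t, d_st, d_st / h_t


# Hypothetical two-allele example with two subpopulations.
example = [[0.5, 0.5], [0.9, 0.1]]
h_s, h_t, d_st, g_st = nei_gst(example)
print(f"H_S={h_s:.3f}  H_T={h_t:.3f}  D_ST={d_st:.3f}  G_ST={g_st:.3f}")
```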


Unbiased Estimate of FST

 Weir and Cockerham's (1984) Theta

 Compensates for sampling error, which can cause large biases in FST or GST (e.g., if samples represent different proportions of populations)

 Calculated in terms of correlation coefficients

Calculated by FSTAT software: http://www2.unil.ch/popgen/softwares/fstat.htm

Often simply referred to as FST in the literature

Goudet, J. (1995). "FSTAT (Version 1.2): A computer program to calculate F-statistics." Journal of Heredity 86(6): 485-486.

Weir, B.S. and C.C. Cockerham. 1984. Estimating F-statistics for the analysis of population structure. Evolution 38:1358-1370.


Linanthus parryae population structure

 Annual plant in Mojave desert is classic example of migration vs drift

 Allele for blue flower color is recessive

 Use F-statistics to partition variation among regions, subpopulations, and individuals

 FST can be calculated for any hierarchy:

 FRT: Variation due to differentiation of regions

 FSR: Variation due to differentiation among subpopulations within regions

Schemske and Bierzychudek 2007 Evolution


Linanthus parryae population structure


Hartl and Clark 2007

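$H_T = 1 - \sum_{m} p_m^2$

$H_R = \frac{1}{\sum_{r} N_r} \sum_{r=1}^{3} N_r \left(1 - \sum_{m} p_{rm}^2\right)$

$H_S = \frac{1}{30} \sum_{i=1}^{30} \left(1 - \sum_{m} p_{im}^2\right)$

$F_{SR} = \frac{H_R - H_S}{H_R} = \frac{0.1589 - 0.1424}{0.1589} = 0.1036$

$F_{RT} = \frac{H_T - H_R}{H_T} = \frac{0.2371 - 0.1589}{0.2371} = 0.3299$

$F_{ST} = \frac{H_T - H_S}{H_T} = \frac{0.2371 - 0.1424}{0.2371} = 0.3993$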

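The hierarchical partition can be checked from the three heterozygosities quoted for Linanthus. The sketch below uses only those rounded four-decimal values, so the results may differ from the slide in the last decimal place (for example F_SR near 0.1038 instead of 0.1036), presumably because the slide used unrounded inputs. The variable names are chosen for this example.

```python
# Heterozygosities quoted on the slide (rounded to four decimals).
h_s = 0.1424   # average over subpopulations
h_r = 0.1589   # average over regions
h_t = 0.2371   # total population

f_sr = (h_r - h_s) / h_r   # subpopulations within regions
f_rt = (h_t - h_r) / h_t   # regions within the total
f_st = (h_t - h_s) / h_t   # subpopulations within the total

print(f"F_SR={f_sr:.4f}  F_RT={f_rt:.4f}  F_ST={f_st:.4f}")

# The levels combine multiplicatively: (1 - F_ST) = (1 - F_SR)(1 - F_RT).
print("hierarchy consistent:", abs((1 - f_st) - (1 - f_sr) * (1 - f_rt)) < 1e-12)
```

The multiplicative relation is exact because H_S/H_T = (H_S/H_R)(H_R/H_T).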

FST as Variance Partitioning

 Think of FST as proportion of genetic variation partitioned among populations

$F_{ST} = \frac{V(q)}{\bar{p}\,\bar{q}}$

where $V(q)$ is variance of $q$ across subpopulations

 Denominator is maximum amount of variance that could occur among subpopulations
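The variance form and the heterozygosity form of F_ST agree when H_T is computed from the mean allele frequency across subpopulations. The sketch below illustrates this with the two flower-color subpopulations, under two assumptions that are not stated on the slides: subpopulations are weighted equally, and V(q) is the population variance (divided by the number of subpopulations, not by that number minus one). Because H_T here uses the mean of q rather than the pooled phenotype frequency, the value differs slightly from the worked flower example.

```python
import math
from statistics import mean, pvariance

# Recessive-allele frequencies for the two flower-color subpopulations.
q = [math.sqrt(0.5), math.sqrt(0.1)]

q_bar = mean(q)
p_bar = 1.0 - q_bar

# Variance form: V(q) / (p_bar * q_bar).
fst_variance = pvariance(q) / (p_bar * q_bar)

# Heterozygosity form, with H_T based on the mean allele frequency.
h_s = mean([2.0 * (1.0 - qi) * qi for qi in q])
h_t = 2.0 * p_bar * q_bar
fst_het = (h_t - h_s) / h_t

print(f"variance form: {fst_variance:.4f}  heterozygosity form: {fst_het:.4f}")
```

The two agree because H_T - H_S = 2V(q) for a two-allele locus.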


Analysis of Molecular Variance (AMOVA)

 Analogous to Analysis of Variance (ANOVA)

 Use pairwise genetic distances as ‘response’

 Test significance using permutations

 Partition genetic diversity into different hierarchical levels, including regions, subpopulations, individuals

 Many types of marker data can be used

 Method of choice for dominant markers, sequence data, and SNPs


Phi Statistics from AMOVA

http://www.bioss.ac.uk/smart/unix/mamova/slides/frames.htm

$\phi_{CT} = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_b^2 + \sigma_c^2}$

Correlation of random pairs of haplotypes drawn from a region relative to pairs drawn from the whole population (FRT)

$\phi_{SC} = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_c^2}$

Correlation of random pairs of haplotypes drawn from an individual subpopulation relative to pairs drawn from a region (FSR)

$\phi_{ST} = \frac{\sigma_a^2 + \sigma_b^2}{\sigma_a^2 + \sigma_b^2 + \sigma_c^2}$

Correlation of random pairs of haplotypes drawn from an individual subpopulation relative to pairs drawn from the whole population (FST)
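Given the three variance components, the phi statistics are simple ratios. The sketch below uses hypothetical variance components (not values from any dataset) and an illustrative function name; estimating the components themselves from a distance matrix is the job of AMOVA software and is not shown.

```python
def phi_statistics(sigma_a2, sigma_b2, sigma_c2):
    """Return (phi_CT, phi_SC, phi_ST) from AMOVA variance components.

    sigma_a2: among regions
    sigma_b2: among subpopulations within regions
    sigma_c2: within subpopulations
    """
    total = sigma_a2 + sigma_b2 + sigma_c2
    phi_ct = sigma_a2 / total
    phi_sc = sigma_b2 / (sigma_b2 + sigma_c2)
    phi_st = (sigma_a2 + sigma_b2) / total
    return phi_ct, phi_sc, phi_st


# Hypothetical components chosen so the ratios are easy to read.
phi_ct, phi_sc, phi_st = phi_statistics(0.5, 0.3, 1.2)
print(f"phi_CT={phi_ct:.3f}  phi_SC={phi_sc:.3f}  phi_ST={phi_st:.3f}")

# Same multiplicative structure as the hierarchical F-statistics.
print("consistent:", abs((1 - phi_st) - (1 - phi_sc) * (1 - phi_ct)) < 1e-12)
```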


What if you don’t know how your samples are organized into populations (i.e., you don’t know how many source populations you have)?

What if reference samples aren’t from a single population? What if they are offspring from parents coming from different source populations (admixture)?


What’s a population anyway?


Defining populations on genetic criteria

  Assume subpopulations are at Hardy-Weinberg Equilibrium and linkage equilibrium

  Probabilistically ‘assign’ individuals to populations to minimize departures from equilibrium

  Can allow for admixture (individuals with different proportions of each population) and geographic information

  Bayesian approach using Markov chain Monte Carlo (MCMC) methods to explore parameter space

  Implemented in STRUCTURE program:

http://pritch.bsd.uchicago.edu/structure.html

Londo and Schaal 2007 Mol Ecol 16:4523
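The assignment idea can be illustrated with a much simpler calculation than the one STRUCTURE performs. The sketch below is not the STRUCTURE algorithm: it assumes the allele frequencies of each candidate population are already known, gives the populations equal prior weight, and ignores admixture. It only shows how Hardy-Weinberg proportions within loci and independence across loci (linkage equilibrium) yield a probability of membership for one individual. All names and frequencies are hypothetical, and every allele is assumed to have nonzero frequency in every population.

```python
import math


def genotype_log_likelihood(genotype, locus_freqs):
    """Log-probability of a multilocus genotype under HWE and linkage equilibrium.

    genotype: list of (allele1, allele2) tuples, one per locus
    locus_freqs: list of dicts mapping allele -> frequency, one per locus
    """
    log_lik = 0.0
    for (a1, a2), freqs in zip(genotype, locus_freqs):
        p1, p2 = freqs[a1], freqs[a2]
        prob = p1 * p1 if a1 == a2 else 2.0 * p1 * p2   # HWE genotype frequency
        log_lik += math.log(prob)                        # loci multiply under LE
    return log_lik


def assignment_probabilities(genotype, populations):
    """Posterior membership probabilities with equal priors on populations."""
    log_liks = {name: genotype_log_likelihood(genotype, freqs)
                for name, freqs in populations.items()}
    max_ll = max(log_liks.values())
    weights = {name: math.exp(ll - max_ll) for name, ll in log_liks.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Hypothetical allele frequencies at two loci for two populations.
populations = {
    "A": [{"a": 0.9, "b": 0.1}, {"c": 0.8, "d": 0.2}],
    "B": [{"a": 0.2, "b": 0.8}, {"c": 0.3, "d": 0.7}],
}
individual = [("a", "a"), ("c", "d")]
probs = assignment_probabilities(individual, populations)
print({name: round(p, 3) for name, p in probs.items()})
```

In the full method the allele frequencies and memberships are unknown and are sampled jointly, which is why an MCMC scheme is needed.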


Example: Taita Thrush data*

 Three main sampling locations in Kenya

 Low migration rates (radio-tagging study)

 155 individuals, genotyped at 7 microsatellite loci

Slide courtesy of Jonathan Pritchard


Estimating K

Structure is run separately at different values of K. The program computes a statistic that measures the fit of each value of K (sort of a penalized likelihood); this can be used to help select K.

 Taita thrush data

Assumed value of K          1    2    3      4      5
Posterior probability of K  ~0   ~0   0.993  0.007  0.00005
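One way such posterior probabilities can be obtained is by applying Bayes' rule to the estimated log probability of the data at each K, assuming a uniform prior over the K values tried. The sketch below shows that arithmetic only. The log values are hypothetical placeholders, not the Taita thrush results, and the function name is invented for this example.

```python
import math


def posterior_over_k(log_probs):
    """Normalize estimated ln P(X | K) values into posterior probabilities.

    Assumes a uniform prior over the K values supplied. The maximum is
    subtracted before exponentiating to avoid numerical underflow.
    """
    max_lp = max(log_probs.values())
    weights = {k: math.exp(lp - max_lp) for k, lp in log_probs.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Hypothetical estimates of ln P(X | K), for illustration only.
hypothetical_ln_prob = {1: -3050.0, 2: -2930.0, 3: -2870.0, 4: -2875.0, 5: -2880.0}
posterior = posterior_over_k(hypothetical_ln_prob)
best_k = max(posterior, key=posterior.get)

for k in sorted(posterior):
    print(k, f"{posterior[k]:.5f}")
print("best K:", best_k)
```

Because the probabilities depend on differences in log values, a gap of only a few log units between adjacent K values is enough to concentrate nearly all of the posterior on one K.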

Page 30: October 8, 2012sdifazio/popgen_12/lectures/oct8_population_structure2.pdf · Midterm Course Evaluations Based on five responses: It’s not too late to have an impact! Lectures are

Another method for inference of K

 The ΔK method of Evanno et al. (2005, Mol. Ecol. 14: 2611-2620):

Eckert, Population Structure, 5-Aug-2008 46
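As a rough illustration of the ΔK idea, the sketch below computes the absolute second-order change in the mean log probability of the data between successive K values, divided by the standard deviation of the log probability across replicate runs at that K. This is an assumed, simplified reading of the method; the replicate values are hypothetical and the function name is made up for this example. ΔK is undefined for the smallest and largest K tried, and at least two replicate runs with nonzero spread are needed at each K.

```python
from statistics import mean, stdev


def delta_k(log_liks):
    """Compute Delta K for each interior K.

    log_liks: dict mapping K -> list of ln P(X | K) values from replicate runs.
    Returns a dict mapping K -> Delta K for every K that has both neighbors.
    """
    means = {k: mean(runs) for k, runs in log_liks.items()}
    result = {}
    for k in sorted(log_liks):
        if k - 1 not in means or k + 1 not in means:
            continue   # second-order difference needs both neighbors
        second_diff = abs(means[k + 1] - 2.0 * means[k] + means[k - 1])
        result[k] = second_diff / stdev(log_liks[k])
    return result


# Hypothetical replicate runs, three per value of K.
runs = {
    1: [-3050.0, -3052.0, -3048.0],
    2: [-2930.0, -2935.0, -2925.0],
    3: [-2870.0, -2871.0, -2869.0],
    4: [-2875.0, -2880.0, -2870.0],
    5: [-2880.0, -2890.0, -2870.0],
}
dk = delta_k(runs)
best = max(dk, key=dk.get)
print({k: round(v, 2) for k, v in dk.items()}, "peak at K =", best)
```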


Inferred population structure

Each individual is a thin vertical line that is partitioned into K colored segments according to its membership coefficients in K clusters.

[Figure: inferred population structure, with groups labeled Africans, Europeans, MidEast, Cent/S Asia, Asia, Oceania, America]

Rosenberg et al. 2002 Science 298: 2381-2385


Inferred population structure – regions

Rosenberg et al. 2002 Science 298: 2381-2385

