Population Genetics: Chapter 3 Epidemiology 217 January 16, 2011.



Allele Frequency Estimation

Hardy-Weinberg equilibrium (HWE)

HWE Game

Population Substructure

Allele Frequency

Diploid, autosomal locus with 2 alleles: A and a

Allele frequency is the fraction:

(No. of copies of a particular allele) / (No. of all alleles in the population)

[Figure: bar chart of allele frequencies for rs1036819 (longevity SNP); frequency axis from 0.00 to 1.00. Allele-frequency pairs shown: 0.86/0.14, 0.64/0.36, 0.53/0.47, 0.93/0.08.]


Allele (Gamete) Frequency

Let p = Freq(A), the frequency of the dominant allele

Let q = Freq(a), the frequency of the recessive allele

Then p + q = 1

Genotype Frequency

$p^2$ = frequency of the homozygous dominant genotype (AA)

$2pq$ = frequency of the heterozygous genotype (Aa)

$q^2$ = frequency of the homozygous recessive genotype (aa)

Then $p^2 + 2pq + q^2 = 1$

Estimating Allele Frequencies from Genotype Frequencies

Genotype:   AA      Aa      aa
Frequency:  $p^2$   $2pq$   $q^2$

Frequency of A allele = $p^2 + \tfrac{1}{2}(2pq)$

Frequency of a allele = $q^2 + \tfrac{1}{2}(2pq)$
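
As a quick check on the two allele-frequency expressions, assuming the HWE genotype frequencies and p + q = 1, each one reduces back to the allele frequency it started from:

```latex
\begin{align*}
\mathrm{Freq}(A) &= p^2 + \tfrac{1}{2}(2pq) = p^2 + pq = p(p + q) = p \\
\mathrm{Freq}(a) &= q^2 + \tfrac{1}{2}(2pq) = q^2 + pq = q(q + p) = q
\end{align*}
```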

Ex. Calculation: Allele Frequencies

Assume N = 200 in each of two populations:

Pop 1: 90 AA, 40 Aa, 70 aa (N = 200)
Pop 2: 45 AA, 130 Aa, 25 aa (N = 200)

In Pop 1:
p = 90/200 + ½ (40/200) = 0.45 + 0.10 = 0.55
q = 70/200 + ½ (40/200) = 0.35 + 0.10 = 0.45

In Pop 2:
p = 45/200 + ½ (130/200) = 0.225 + 0.325 = 0.55
q = 25/200 + ½ (130/200) = 0.125 + 0.325 = 0.45
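
The same calculation can be sketched in a few lines of Python. This is only an illustration: the function name `allele_frequencies` and its argument names are made up here, and it counts alleles directly (two per homozygote, one per heterozygote), which is equivalent to the genotype-frequency formula used in the worked example.

```python
def allele_frequencies(n_AA, n_Aa, n_aa):
    """Return (p, q) for a biallelic locus from genotype counts."""
    n_individuals = n_AA + n_Aa + n_aa
    n_alleles = 2 * n_individuals          # diploid: two alleles per person
    p = (2 * n_AA + n_Aa) / n_alleles      # copies of A
    q = (2 * n_aa + n_Aa) / n_alleles      # copies of a
    return p, q


# Genotype counts from the worked example (N = 200 in each population).
pop1 = allele_frequencies(90, 40, 70)
pop2 = allele_frequencies(45, 130, 25)
print("Pop 1: p = %.2f, q = %.2f" % pop1)
print("Pop 2: p = %.2f, q = %.2f" % pop2)
```

Both populations come out at p = 0.55 and q = 0.45 even though their genotype counts differ markedly.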

Take home points

p + q = 1 (sum of the allele frequencies = 1)

$p^2 + 2pq + q^2 = 1$ (sum of the genotype frequencies = 1)

Two populations with markedly different genotype frequencies can have the same allele frequencies


The Hardy-Weinberg principle states that both allele and genotype frequencies in a population remain constant (that is, they are in equilibrium) from generation to generation unless specific disturbing influences are introduced.

$p^2 + 2pq + q^2 = 1$

Hardy-Weinberg Assumptions

Allele frequencies do not vary IF:

Large population

Random mating

No migration in or out

No isolated groups within the population

No mutation

No selection (no allele is advantageous)

Test of Hardy-Weinberg Equilibrium

1. Calculate observed allele & genotype frequencies

Observed genotype counts: 100 GG, 30 AG, 20 AA (N = 150)

Genotype frequencies

GG = 100/150 = 0.67

AG = 30/150 = 0.20

AA = 20/150 = 0.13

Allele frequencies

G alleles = 100*2 + 30 = 230

A alleles = 20*2 + 30 = 70

Total alleles = 300

G afq (p) = 230/300 = 0.77

A afq (q) = 1 - p = 0.23

Test of Hardy-Weinberg Equilibrium

2. Calculate expected genotype frequencies based on HW: $p^2 + 2pq + q^2 = 1$

$p^2$ (GG) = 0.77 * 0.77 = 0.59

$2pq$ (AG) = 2 * 0.77 * 0.23 = 0.35

$q^2$ (AA) = 0.23 * 0.23 = 0.05

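Test of Hardy-Weinberg Equilibrium

3. Compare expected genotype frequencies to observed frequencies

      expected   observed
GG    0.59       0.67
AG    0.35       0.20
AA    0.05       0.13

Chi-square test: $\chi^2 = \sum (\text{observed} - \text{expected})^2 / \text{expected}$, computed on genotype counts (expected counts: 88.2 GG, 53.7 AG, 8.2 AA)

= 29.17 with 1 degree of freedom

P-value = 6.6 x 10^-8, so the sample is out of H-W equilibrium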
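
The three steps of the test can be sketched in Python as follows. This is a minimal illustration, not a replacement for a statistics package: the function name `hwe_chi_square` is invented here, and the P-value relies on the identity that the upper tail of a chi-square distribution with 1 degree of freedom equals erfc(sqrt(x/2)), which is why only the standard library is needed.

```python
import math


def hwe_chi_square(n_hom1, n_het, n_hom2):
    """Chi-square test of HWE for a biallelic locus (1 degree of freedom).

    Returns (chi2, p_value). Counts are the two homozygotes and the
    heterozygote, e.g. GG, AG, AA.
    """
    n = n_hom1 + n_het + n_hom2
    # Step 1: observed allele frequencies from genotype counts.
    p = (2 * n_hom1 + n_het) / (2 * n)
    q = 1.0 - p
    # Step 2: expected genotype counts under HWE.
    expected = (n * p * p, n * 2 * p * q, n * q * q)
    observed = (n_hom1, n_het, n_hom2)
    # Step 3: compare observed and expected counts.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


chi2, p_value = hwe_chi_square(100, 30, 20)   # 100 GG, 30 AG, 20 AA
print(f"chi-square = {chi2:.2f}, P-value = {p_value:.1e}")
```

Run on the slide's counts, this gives a chi-square of about 29.17 and a P-value of about 6.6e-08. The statistic is computed on counts rather than on the rounded frequencies, which is why it is so much larger than the frequency table alone would suggest.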

HWE can be easily expanded to account for any number of alleles at a locus

3 allele case (p1, p2, p3)

Allele frequencies: $p_1 + p_2 + p_3 = 1$

Genotype frequencies: $p_1^2 + p_2^2 + p_3^2 + 2p_1p_2 + 2p_1p_3 + 2p_2p_3 = 1$

4 allele case (p1, p2, p3, p4)

Allele frequencies: $p_1 + p_2 + p_3 + p_4 = 1$

Genotype frequencies: $p_1^2 + p_2^2 + p_3^2 + p_4^2 + 2p_1p_2 + 2p_1p_3 + 2p_1p_4 + 2p_2p_3 + 2p_2p_4 + 2p_3p_4 = 1$
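
The multi-allele expansion follows one rule: each homozygote has frequency p_i squared and each heterozygote has frequency 2 p_i p_j. A short Python sketch makes the pattern explicit for any number of alleles. The function name and the example allele frequencies below are hypothetical, chosen only for illustration.

```python
from itertools import combinations_with_replacement


def hwe_genotype_frequencies(allele_freqs):
    """Expected HWE genotype frequencies for any number of alleles.

    allele_freqs maps an allele label to its frequency; the values must sum to 1.
    Returns a dict keyed by (allele_i, allele_j) pairs.
    """
    if abs(sum(allele_freqs.values()) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    genotypes = {}
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        if a == b:
            genotypes[(a, b)] = allele_freqs[a] ** 2                   # homozygote: p_i^2
        else:
            genotypes[(a, b)] = 2 * allele_freqs[a] * allele_freqs[b]  # heterozygote: 2 p_i p_j
    return genotypes


# Hypothetical 3-allele example.
geno3 = hwe_genotype_frequencies({"A1": 0.5, "A2": 0.3, "A3": 0.2})
for genotype, freq in geno3.items():
    print(genotype, round(freq, 4))
print("sum =", round(sum(geno3.values()), 4))
```

With k alleles there are k(k+1)/2 genotypes, so 6 terms for three alleles and 10 for four, and the frequencies always sum to 1.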


Application of Hardy-Weinberg Equilibrium

For genetic association studies:

Used as QC measure to assess the accuracy of the genotyping method

Expect SNPs to be in HWE among control populations (ethnic-specific)

Violations of HWE could indicate genotyping errors or bias in data

HWE Game

1. Everyone receives ~5 pairs of cards

2. Two allele model: Red (R allele) & Black (B allele)

3. Random Mating: Exchange one card from each pair with another person (keep cards face down)

4. Determine genotype frequency: RR, RB, BB

5. Determine allele frequency: R, B
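
The game can also be played in software. The sketch below is a simplified simulation under stated assumptions, not a transcript of the classroom exercise: the number of players, the 50/50 deal of red and black cards, and the function name `play_hwe_game` are all invented here, and the person-to-person exchange is replaced by shuffling all surrendered cards into one face-down pool (so a pair can occasionally get its own card back).

```python
import random
from collections import Counter


def play_hwe_game(n_players=30, pairs_per_player=5, p_red=0.5, seed=0):
    """Simulate one round of the card game; return (genotype_freqs, allele_freqs)."""
    rng = random.Random(seed)
    n_pairs = n_players * pairs_per_player

    # Deal the cards: each pair holds two cards, red ("R") with probability p_red.
    pairs = [
        ["R" if rng.random() < p_red else "B" for _ in range(2)]
        for _ in range(n_pairs)
    ]

    # Random mating: every pair gives up its second card; the surrendered cards
    # are shuffled face down and dealt back, one per pair.
    pool = [pair[1] for pair in pairs]
    rng.shuffle(pool)
    new_pairs = [(pair[0], card) for pair, card in zip(pairs, pool)]

    # Genotype frequencies: label each pair RR, RB, or BB regardless of card order.
    counts = Counter("".join(sorted(pair, reverse=True)) for pair in new_pairs)
    genotype_freqs = {g: counts[g] / n_pairs for g in ("RR", "RB", "BB")}

    # Allele frequencies: each RR pair carries two R cards, each RB pair one.
    freq_r = genotype_freqs["RR"] + 0.5 * genotype_freqs["RB"]
    allele_freqs = {"R": freq_r, "B": 1.0 - freq_r}
    return genotype_freqs, allele_freqs


genotype_freqs, allele_freqs = play_hwe_game()
print("genotype frequencies:", genotype_freqs)
print("allele frequencies:", allele_freqs)
```

Because the exchange only moves cards around, the allele frequencies are unchanged by mating, while the genotype frequencies should land close to the HWE expectations for the dealt allele frequencies, with sampling noise that shrinks as the number of pairs grows.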

Population Stratification

Population stratification is a form of confounding in genetic studies where a gene under study shows marked variation in allele frequency across subgroups of a population and these subgroups differ in their baseline risk of disease

Population Stratification: Confounding

[Diagram: confounding triangle with nodes Exposure of Interest, True Risk Factor, and Disease, and its genetic analogue with nodes Genotype of Interest and True Risk Factor.]

Wacholder, JNCI, 2000

Population Stratification: Gm3;5,13,14 in admixed sample of Native Americans of the Pima and Papago tribes

Study Population: 4,290 Pima and Papago Indians

Genetic Variant: Gm3;5,13,14 haplotype (Gm system of human immunoglobulin G)

Outcome: Type 2 diabetes

Question: Is the Gm3;5,13,14 haplotype associated with Type 2 diabetes?

Knowler, AJHG, 1988

Population Stratification: Gm3;5,13,14 in admixed sample of Native Americans of the Pima and Papago tribes

Unadjusted for ethnic background: OR = 0.27 (95% CI 0.18-0.40)

Population                                  Gm3;5,13,14 +   Gm3;5,13,14 -   NIDDM prevalence
Full heritage American Indian population    ~1%             ~99%            ~40%
Caucasian population                        ~66%            ~34%            ~15%

Gm3;5,13,14 haplotype   Cases     Controls
+                       7.80%     29.00%
-                       92.20%    71.00%

Population Stratification: Gm3;5,13,14 in admixed sample of Native Americans of the Pima and Papago tribes


Adjusted for ethnic background: OR = 0.83 (95% CI 0.58-1.18)

Index of Indian heritage   Gm3;5,13,14 haplotype   % Diabetes
0                          65.8%                   18.5%
4                          42.1%                   28.5%
8                          1.6%                    39.2%
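
The way stratification manufactures an association can be reproduced with a toy example. Everything in the sketch below is hypothetical: the counts are invented so that the haplotype has no effect within either ancestry stratum, and the Mantel-Haenszel pooled odds ratio is used as one common way to adjust for a stratifying variable, not necessarily the method used in the Pima study.

```python
def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: a = carrier cases, b = carrier controls,
    c = non-carrier cases, d = non-carrier controls."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled OR across strata of (a, b, c, d) tables."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical strata (1,000 people each), built so the within-stratum OR is 1:
#   stratum 1: haplotype rare (2%),    disease common (40%)
#   stratum 2: haplotype common (60%), disease less common (20%)
strata = [
    (8, 12, 392, 588),
    (120, 480, 80, 320),
]

# Crude analysis: collapse the strata into a single table.
crude = odds_ratio(*[sum(column) for column in zip(*strata)])
adjusted = mantel_haenszel_or(strata)
print(f"crude OR = {crude:.2f}, ancestry-adjusted OR = {adjusted:.2f}")
```

In this toy data the crude odds ratio is about 0.50 even though the haplotype is unrelated to disease within each stratum, and the adjusted odds ratio returns to 1.00. The pattern mirrors the slide: the apparent protective effect shrinks toward the null once ancestry is taken into account.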

Ancestry Informative Markers

Polymorphisms with known allele frequency differences across ancestral groups

Useful in estimating ancestry in admixed individuals

Example: Duffy locus (codes for a blood group antigen)

The Duffy-null allele is at ~100% frequency in sub-Saharan Africans and rare in other groups

It protects against P. vivax (malaria)
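
One simple way an ancestry informative marker can be used is sketched below. It is a population-level simplification under stated assumptions: the admixed group is treated as a mixture of exactly two source populations, so its allele frequency is m * p_A + (1 - m) * p_B and can be solved for the admixture proportion m. The function name and all frequencies in the example are hypothetical, and real ancestry estimation for individuals combines many markers.

```python
def admixture_proportion(p_admixed, p_source_a, p_source_b):
    """Estimate the share of ancestry from source A in a two-way admixed group.

    Solves p_admixed = m * p_source_a + (1 - m) * p_source_b for m and
    clips the result to the range [0, 1].
    """
    if p_source_a == p_source_b:
        raise ValueError("marker is not informative: source frequencies are equal")
    m = (p_admixed - p_source_b) / (p_source_a - p_source_b)
    return min(1.0, max(0.0, m))


# Hypothetical allele frequencies: 0.9 in source A, 0.1 in source B,
# and 0.5 in the admixed group.
m = admixture_proportion(p_admixed=0.5, p_source_a=0.9, p_source_b=0.1)
print(f"estimated ancestry from source A: {m:.2f}")
```

The larger the frequency difference between the source populations, the less a small error in the admixed frequency shifts the estimate, which is why markers such as the Duffy locus are attractive as AIMs.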

Example AIM: Duffy locus

[Figure: bar chart of allele frequencies at rs1814778 (Duffy locus) in U.S. Hispanics, U.S. Whites, U.S. Blacks, and Central Africa; frequency axis from 0.00 to 1.00. Allele-frequency pairs shown: 0.15/0.85, 0.99/0.01, 0.77/0.24, 0.53/0.47.]




Population Inbreeding

Population inbreeding occurs when close relatives preferentially mate or when a population is geographically isolated. It causes deviations from HWE by producing a deficit of heterozygotes.

How to quantify the amount of inbreeding in a population?

Inbreeding coefficient, F

The probability that a random individual in the population inherits two copies of the same allele from a common ancestor

F ranges from 0 to 1:

F is low in random mating populations

F is close to 1 in self-fertilizing populations (plants)
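
The link between F and the heterozygote deficit can be sketched in Python. The standard relationships are that inbreeding shifts the genotype frequencies to p^2 + Fpq, 2pq(1 - F), and q^2 + Fpq, so F can be estimated as 1 minus the ratio of observed to expected heterozygosity. The function names are invented for this illustration, and the estimate assumes the whole deficit is due to inbreeding rather than to genotyping error or population substructure.

```python
def genotype_frequencies_with_inbreeding(p, F):
    """Genotype frequencies at a biallelic locus with inbreeding coefficient F."""
    q = 1.0 - p
    return {
        "AA": p * p + F * p * q,
        "Aa": 2 * p * q * (1.0 - F),   # heterozygotes are reduced by a factor (1 - F)
        "aa": q * q + F * p * q,
    }


def estimate_inbreeding_coefficient(n_AA, n_Aa, n_aa):
    """Estimate F as 1 - (observed heterozygosity / HWE-expected heterozygosity)."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)
    expected_het = 2 * p * (1.0 - p)
    if expected_het == 0:
        raise ValueError("locus is monomorphic; F is undefined")
    observed_het = n_Aa / n
    return 1.0 - observed_het / expected_het


print(genotype_frequencies_with_inbreeding(0.5, 0.0))   # random mating
print(genotype_frequencies_with_inbreeding(0.5, 1.0))   # complete inbreeding
# Genotype counts reused from the HWE test example (100, 30, 20).
F_hat = estimate_inbreeding_coefficient(100, 30, 20)
print(f"estimated F = {F_hat:.2f}")
```

With F = 0 the frequencies are the usual HWE values, with F = 1 heterozygotes disappear entirely, and the counts from the HWE test example give an estimate of about 0.44.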

Kinship & Reproduction: Icelandic couples (Helgason, Science, 2008)

[Figure panels: # of children; # of children that reproduce; # of grandchildren; mean lifespan of children]