
Genetic Admixture, Human Population History and Local Adaptation

Shuhua Xu

CAS-MPG Partner Institute for Computational Biology (PICB)

Otto Warburg International Summer School and Research Symposium 2013

Outline

1. A brief introduction to genetic admixture

2. Inference of population history with recombination information and using admixture analysis

3. Local adaptation in admixed populations

1. A brief introduction to genetic admixture

• Genetic admixture refers to the result of interbreeding between two or more previously isolated populations within a species.

[Figure: unpublished data; Xu et al., AJHG 2009]

Genetic Admixture and Admixed Populations

Parental populations                  Admixed populations
European x African                    African American
European x Amerindian x African       Latino, Mexican, Hispanic, etc.
European x East Asian                 Uyghur, Kazakh, etc.

Why are we interested in admixed populations?

Evolutionary studies

• Increased population genetic diversity: evolutionary impact

• Sheds light on human migration history

• Local adaptation

Medical studies

• Increased individual genome heterozygosity: medical impact

• Mapping genes: admixture mapping

Disease mapping strategies

[Figure: magnitude of effect plotted against frequency in population; regions labeled "family-based linkage studies", "population-based association studies", "unlikely exist", and "no good strategy"]

Principle of Association Study

[Figure: genotypes at a multi-allelic marker for controls and cases; allele 6 is "associated" with disease]

Population structure makes trouble in association studies

• Population stratification in epidemiology.

• Analysis of mixed samples having different allele frequencies is a primary concern in human genetics, as it can produce false evidence for allelic association.

Association due to population structure

[Figures (two slides): genotype (AA, Aa, aa) distributions of cases and controls sampled from Population 1 and Population 2]

Odds Ratio (OR)

                  Disease
Exposure      yes        no         total
yes           a          b          a + b
no            c          d          c + d
total         a + c      b + d      a + b + c + d

Odds for case: a/c

Odds for control: b/d

$$\mathrm{OR} = \frac{a/c}{b/d} = \frac{ad}{bc}$$

Explanation of OR

OR > 1: exposure factors increase the risk of disease; positive association

OR < 1: exposure factors decrease the risk of disease; negative association

OR = 1: no association

Example

Allele    Case (+)    Control (-)
A         50          20
a         50          80

Odds for case: 50:50 = 1

Odds for control: 20:80 = 0.25

Odds ratio = (50:50)/(20:80) = 1/0.25 = 4
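The odds-ratio arithmetic above can be checked with a short Python sketch. It assumes the 2x2 layout used on the slides (a, b in the first row; c, d in the second; cases in the first column) and applies no continuity correction; the function names are illustrative only.

```python
def odds_ratio(a: float, b: float, c: float, d: float) -> float:
    """Odds ratio for a 2x2 table laid out as on the slides.

    Rows: exposure (or allele A) yes/no; columns: case/control.
        a = exposed cases,   b = exposed controls
        c = unexposed cases, d = unexposed controls
    OR = (a/c) / (b/d) = (a*d) / (b*c)
    """
    if b == 0 or c == 0:
        # Undefined without a continuity correction (not applied in this sketch).
        raise ValueError("OR is undefined when b or c is zero")
    return (a * d) / (b * c)


def describe(or_value: float) -> str:
    """Map an OR to the qualitative reading given on the slide."""
    if or_value > 1:
        return "positive association"
    if or_value < 1:
        return "negative association"
    return "no association"


# Allele example from the slide: A = 50 cases / 20 controls, a = 50 cases / 80 controls.
example = odds_ratio(50, 20, 50, 80)
print(example, describe(example))  # 4.0 positive association
```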

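Heterogeneity/Stratification

Total population
Allele    Case    Control    Total
A         51      59         110
a         549     1,341      1,890
Total     600     1,400      2,000

Odds for case: 51/549 ≈ 0.093; odds for control: 59/1,341 ≈ 0.044; OR = 2.11!

Subpopulation 1
Allele    Case    Control    Total
A         50      50         100
a         450     450        900
Total     500     500        1,000

OR = 1

Subpopulation 2
Allele    Case    Control    Total
A         1       9          10
a         99      891        990
Total     100     900        1,000

OR = 1

Neither subpopulation shows any association, yet pooling them gives OR = 2.11, because subpopulation 1 has both a higher frequency of allele A and a higher proportion of cases.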
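To see the stratification effect numerically, the sketch below recomputes the per-subpopulation and pooled odds ratios from the slide's counts. It also adds a Mantel-Haenszel pooled OR, a standard stratified estimator that the slides do not mention, included only to show that combining strata without ignoring them recovers OR = 1. The tuple layout and function names are assumptions of this sketch.

```python
from typing import List, Tuple

# (a, b, c, d) = (A in cases, A in controls, a in cases, a in controls)
Table = Tuple[int, int, int, int]


def odds_ratio(t: Table) -> float:
    a, b, c, d = t
    return (a * d) / (b * c)


def pool(tables: List[Table]) -> Table:
    """Collapse strata into one table by summing cells (this ignores structure)."""
    return tuple(sum(cells) for cells in zip(*tables))


def mantel_haenszel_or(tables: List[Table]) -> float:
    """Mantel-Haenszel common OR: sum(a*d/n) / sum(b*c/n) over strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


strata = [(50, 50, 450, 450), (1, 9, 99, 891)]  # subpopulations 1 and 2
for i, t in enumerate(strata, start=1):
    print(f"subpopulation {i}: OR = {odds_ratio(t):.2f}")  # 1.00 for both
print(f"pooled {pool(strata)}: OR = {odds_ratio(pool(strata)):.2f}")  # 2.11
print(f"Mantel-Haenszel OR = {mantel_haenszel_or(strata):.2f}")  # 1.00
```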

[Figures (Xu et al. AJHG 2009): geographic distribution and sample size; PC1 distribution; heterozygosity; PCA of world-wide samples; PCA of East Asian samples; PCA after removing populations sampled from Beijing, Shanghai, Guangzhou, Anhui, and Jiangsu]

Genomic control

• Devlin and Roeder (1999) used theoretical arguments to propose that, with population structure, the genome-wide distribution of Cochran-Armitage trend test statistics is inflated by a constant multiplicative factor λ.

• We can estimate the inflation factor as $\lambda = \mathrm{median}(X_i^2)/0.456$, where 0.456 approximates the median of the 1-df chi-square distribution (≈ 0.455).

• An inflation factor λ > 1 indicates population structure and/or genotyping error.

• We can carry out an adjusted test of association, which takes account of any mismatching of cases and controls at any SNP, using the statistic $X_i^2/\lambda$.

Inflation factor λ = 1.11 (Xu et al. AJHG 2009)
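The genomic-control steps can be sketched in Python using only the standard library. Two assumptions: the statistics are 1-df chi-square values (such as Cochran-Armitage trend tests), and the null median is computed exactly (≈ 0.4549) rather than taken as a rounded constant. The statistics in the usage line are hypothetical.

```python
import math
from statistics import NormalDist, median
from typing import List, Tuple

# Median of a 1-df chi-square: the square of the standard normal's 75th percentile
# (about 0.4549; often rounded to 0.456 in the genomic-control literature).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def inflation_factor(chi2_stats: List[float]) -> float:
    """Genomic-control lambda = median of observed 1-df statistics / null median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def chi2_1df_pvalue(x: float) -> float:
    """Upper-tail p-value of a 1-df chi-square: P(Z^2 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


def genomic_control(chi2_stats: List[float]) -> Tuple[float, List[float], List[float]]:
    """Return lambda, the adjusted statistics X_i^2 / lambda, and their p-values.

    A common convention is to use max(lambda, 1) so that a deflated scan is not
    "corrected" upward; this sketch applies lambda exactly as estimated.
    """
    lam = inflation_factor(chi2_stats)
    adjusted = [x / lam for x in chi2_stats]
    return lam, adjusted, [chi2_1df_pvalue(x) for x in adjusted]


# Hypothetical trend-test statistics for a handful of SNPs.
lam, adj, pvals = genomic_control([0.2, 0.6, 1.5, 3.9, 12.0])
print(f"lambda = {lam:.2f}")
```

In a real scan λ would be estimated from all genome-wide SNPs, not a handful; the median is used because it is barely affected by the few truly associated loci.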

Summary

• Human populations are generally not homogeneous; taken as a whole, a structured population is not in Hardy-Weinberg equilibrium (HWE).

• Population substructure can cause false-positive or false-negative results in association studies.

• Population substructure can be controlled and corrected with ancestry-informative markers (AIMs), but only globally.

• Controlling local structure due to population admixture is challenging, but the local ancestry information itself is useful for both evolutionary and medical studies.
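The first summary point can be illustrated with the Wahlund effect, a standard population-genetics result rather than something shown on the slides: pooling subpopulations that are each in HWE, but differ in allele frequency, yields fewer heterozygotes than HWE predicts for the pooled frequency. The sketch below is minimal and the allele frequencies are hypothetical.

```python
from typing import Sequence, Tuple


def pooled_heterozygosity(freqs: Sequence[float], weights: Sequence[float]) -> Tuple[float, float]:
    """Observed vs HWE-expected heterozygosity after pooling subpopulations.

    Each subpopulation i is assumed to be in HWE with allele frequency freqs[i]
    and to contribute a share weights[i] (normalized here) of the pooled sample.
    """
    total = sum(weights)
    w = [x / total for x in weights]
    # Heterozygote frequency actually present in the pooled sample.
    observed = sum(wi * 2 * p * (1 - p) for wi, p in zip(w, freqs))
    # Heterozygote frequency HWE would predict from the pooled allele frequency.
    p_bar = sum(wi * p for wi, p in zip(w, freqs))
    expected = 2 * p_bar * (1 - p_bar)
    return observed, expected


# Two equally sized, hypothetical subpopulations with very different frequencies.
obs, exp = pooled_heterozygosity([0.1, 0.9], [1, 1])
print(f"observed = {obs:.2f}, expected under HWE = {exp:.2f}")  # observed = 0.18, expected under HWE = 0.50
```

The heterozygote deficit (0.32 here) equals twice the variance of allele frequency across subpopulations, so it disappears only when all subpopulations share the same frequency.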

Population structure vs population admixture

[Figure: chromosomes of a present-day admixed individual traced back 1 to 4 generations; legend: two African chromosomes, two European chromosomes, one African and one European chromosome; a disease locus is marked]

Admixture Mapping

[Figure: ancestry proportion (0% to 100%) plotted against position on chromosome, 0 to 140 centimorgans (cM)]

• Controls are not necessary!

• The perfect control is the rest of each case's own genome.

• About 2,000 SNPs are enough for genome-wide mapping.

• Fewer markers reduce the multiple-testing and computational burden.
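The case-only logic of admixture mapping can be sketched as a simple z-test: at a locus, compare the fraction of case chromosomes carrying African local ancestry with the same cases' genome-wide average, which plays the role of the control. This is a simplified illustration that assumes local ancestry is known without error and that chromosomes are independent; published admixture-mapping methods use more elaborate likelihood models. The counts in the example are hypothetical.

```python
import math
from statistics import NormalDist


def case_only_admixture_z(african_copies: int, n_cases: int, genome_mean: float) -> float:
    """z-score for excess African local ancestry at one locus among cases.

    african_copies: case chromosomes (out of 2 * n_cases) with African ancestry
    at the locus; genome_mean: the same cases' genome-wide average African
    ancestry, used as the expected value under no association.
    """
    n_chrom = 2 * n_cases
    observed = african_copies / n_chrom
    se = math.sqrt(genome_mean * (1 - genome_mean) / n_chrom)  # binomial SE
    return (observed - genome_mean) / se


def two_sided_p(z: float) -> float:
    """Two-sided normal p-value for a z-score."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical: 1,000 cases, genome-wide African ancestry 80%, 85% at the locus.
z = case_only_admixture_z(1700, 1000, 0.80)
print(f"z = {z:.2f}, p = {two_sided_p(z):.1e}")
```

Because only loci with a local ancestry excess are tested against the genome-wide baseline, the number of independent tests is set by the number of ancestry blocks rather than the number of SNPs, which is why a few thousand markers can suffice.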

Human population admixture in Asia is common

Population Structure and Genetic History of Uyghurs

Xu et al. Am.J.Hum.Genet. 2008a

Xu & Jin Am.J.Hum.Genet. 2008b

Xu et al. Mol.Biol.Evol. 2009

Genetic relationship of Uyghurs and HapMap populations (Xu et al. AJHG 2008)

Cluster relationship of populations (Xu & Jin, AJHG 2008)

Population Genetic Structure [Figure: Southern Uyghurs and Northern Uyghurs relative to European and East Asian components] (Xu & Jin, AJHG 2008)

2. Inference of population history with recombination information and using admixture analysis

Genetic information used to reconstruct human phylogeny

① Mutation

mtDNA, Y chromosome (Kivisild et al. 2002; YCC 2002)

② Drift (allele frequency)

③ Recombination

• The recombination events accumulated in the genome are expected to provide additional information for studies of human genetic relationships.

• The vast amount of recombination information in the human genome is generally ignored or deliberately avoided in studies of human population genetic relationships.

Population recombination rate ($4N_e r$)

► $4N_e r$: the population recombination parameter.

► Alternatively denoted by ρ, $4N_e c$, or C, where

– r (or c) is the recombination rate across the region of interest;

– $N_e$ is the effective population size.
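As a worked example of $\rho = 4N_e r$, the sketch below converts a short genetic distance into a per-generation recombination probability and scales it by the effective population size. The value N_e = 10,000 is a hypothetical round number chosen for illustration, and the cM-to-probability conversion holds only for short distances.

```python
def population_recombination_rate(ne: float, r: float) -> float:
    """rho = 4 * Ne * r for a diploid population; r = per-generation recombination rate."""
    return 4.0 * ne * r


def r_from_cm(distance_cm: float) -> float:
    """1 cM corresponds to ~1% recombination per generation (short distances only)."""
    return distance_cm / 100.0


# Hypothetical: Ne = 10,000 and a 0.1 cM region (r = 0.001 per generation).
rho = population_recombination_rate(10_000, r_from_cm(0.1))
print(f"rho = {rho:.1f}")  # rho = 40.0
```

Because genotype data inform only the product ρ, not N_e and r separately, any estimate of ρ inherits the demographic assumptions noted in the challenges below.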

Challenges in Studies on Recombination

• Estimation of the population recombination parameter $4N_e r$ from genotyping data is computationally challenging.

• The theory of optimal estimation is not fully worked out.

• Estimators rely on assumptions about demography and selective neutrality.

Information from NGS

• Full sequence data

• Polymorphisms

• Rare mutations

• CNVs

• Small indels

• Recombination

Reconstruct human phylogeny using recombination information

Recombination info in admixed genomes

[Figure: timeline from the admixture point to now; inter-ancestral and intra-ancestral segments, pre- and post-admixture. Modified from Xu et al. (AJHG 2008a)]

Dating Austronesian Expansion

[Figure: geographical distribution and relationship of genetic components (Xu et al., PNAS 2012)]

Principles and Methods

• Because genetic recombination breaks parental genomes into segments of different sizes, the genome of a descendant of an admixture event is composed of different combinations of these ancestral segments.

• Admixture time can be estimated from the distribution of ancestral segments and the recombination breakpoints in an admixed genome.

• Admixture time can be taken as an estimate of the expansion time of the population of the second wave of migration.

Xu et al., PNAS 2012
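The principle that segment lengths carry timing information can be illustrated with a deliberately simple estimator. Assume a single admixture pulse T generations ago (a hybrid-isolation model) in which a fraction m of ancestry came from the source population whose segments are measured. Recombination breakpoints then accumulate at roughly T per Morgan, and each breakpoint ends a source segment with probability 1 - m, so segment lengths are approximately exponential with mean 1 / (T(1 - m)) Morgans. This is not presented as the method of Xu et al. (PNAS 2012); the segment lengths below are hypothetical, and chromosome-end truncation and ancestry-inference errors are ignored.

```python
from statistics import mean
from typing import Sequence


def admixture_time_from_tracts(tract_lengths_cm: Sequence[float], m: float) -> float:
    """Rough single-pulse estimate of generations since admixture.

    tract_lengths_cm: lengths (cM) of segments from one source population.
    m: that source's overall ancestry proportion in the admixed population.
    Under the assumed model, lengths ~ Exponential(rate = T * (1 - m)) per Morgan,
    so T = 1 / ((1 - m) * mean length in Morgans).
    """
    if not 0 < m < 1:
        raise ValueError("m must be strictly between 0 and 1")
    mean_morgans = mean(tract_lengths_cm) / 100.0  # cM -> Morgans
    return 1.0 / ((1.0 - m) * mean_morgans)


# Hypothetical segments with mean 10 cM, source contributing half the ancestry.
t_est = admixture_time_from_tracts([5.0, 10.0, 15.0], m=0.5)
print(f"T ~ {t_est:.0f} generations")  # T ~ 20 generations
```

Multiplying T by an assumed generation time (for example 25 or 30 years) converts the estimate to years; older admixture leaves shorter segments and therefore a larger T.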

[Figures (Xu et al., PNAS 2012): dating population admixture using recombination information; estimating admixture time using different methods; estimation of recombination parameter and admixture time; a "cline" of admixture time decreasing from west to east]

Summary

• We provided the first genetic dating of the Austronesian expansion using recombination and admixture analysis.

• Our analysis indicates a cline of decreasing admixture time across Eastern Indonesia, with the oldest admixture in the west and the youngest in the east.

• The estimated start of the Austronesian expansion was about 4,000 years ago, in excellent agreement with inferences based on linguistic and archeological information.

Xu et al., PNAS 2012

3. Local adaptation in admixed populations

Population Genomics / Population Genetics

• Local adaptation (positive selection)

• Functional restriction (negative selection)

• Disease (negative selection)

High Altitude Adaptation in Tibetans

The Tibetan Plateau, known as "the roof of the world" and with an average elevation of over 4,500 meters, is the highest plateau in the world.

Identification of HAA (high-altitude adaptation) genes

• HIF2A (EPAS1) encodes HIF-2α, a transcription factor involved in the induction of genes regulated by oxygen.

• HIFPH2 (EGLN1) encodes HIF-prolyl hydroxylase 2, which catalyzes the post-translational formation of 4-hydroxyproline in HIF-α.

Detecting selection in admixed populations based on biased ancestry contribution

[Figure: proportion of ancestry (0% to 100%) plotted against position on chromosome, 0 to 140 cM]
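Detecting selection from a biased ancestry contribution can be sketched as a genome scan: average the local ancestry of all admixed individuals at each locus, then flag loci whose average deviates strongly from the genome-wide mean. The z-score threshold of 3 and the input values are hypothetical choices for illustration, not the criteria used in the studies cited here.

```python
from statistics import mean, stdev
from typing import List, Sequence


def ancestry_deviation_scan(local_ancestry: Sequence[float], threshold: float = 3.0) -> List[int]:
    """Return indices of loci whose mean local ancestry is an outlier.

    local_ancestry[i] is the proportion of one ancestry (e.g. African) among all
    admixed chromosomes at locus i. z-scores use the mean and sample SD across
    loci; extreme loci inflate the SD, so a robust scale (e.g. median absolute
    deviation) may be preferable on real data.
    """
    mu = mean(local_ancestry)
    sd = stdev(local_ancestry)
    if sd == 0:
        return []  # identical ancestry everywhere: nothing stands out
    return [i for i, x in enumerate(local_ancestry) if abs(x - mu) / sd >= threshold]


# Hypothetical scan: 100 loci near 80% African ancestry and one locus at 95%.
scan = [0.79, 0.81] * 50 + [0.95]
print(ancestry_deviation_scan(scan))  # [100]
```

An excess or deficit of one ancestry at a locus is consistent with selection after admixture, but genetic drift and local-ancestry inference errors can produce similar deviations.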

Schematic of possible forms of natural selection in African Americans

[Figure labels: African; European; selection before admixture; Middle Passage; tough living conditions; population admixture; N generations; selection after admixture; local environmental challenges; current African American]

Schematic of local ancestry inference and genome partition

[Figure: genomes of African Americans (AfA) are partitioned by local ancestry inference into an African component and a European component]

Partitioning admixed genomes and detecting natural selection (Genome Research, 2009)
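Genome partitioning can be sketched at a single SNP: keep only the haplotypes whose inferred local ancestry is African, compute their allele frequency, and compare it with an African reference panel. The FST used here is a simple two-population (H_T - H_S)/H_T form chosen for clarity; the slides do not specify which estimator produced the published FST values, and the haplotypes, ancestry labels, and reference frequency are hypothetical.

```python
from typing import Sequence


def component_allele_freq(alleles: Sequence[int], ancestry: Sequence[str], label: str) -> float:
    """Allele frequency from only those haplotypes whose local ancestry at this
    SNP equals `label` (e.g. "AFR"): one ancestry component of an admixed sample."""
    picked = [a for a, anc in zip(alleles, ancestry) if anc == label]
    if not picked:
        raise ValueError(f"no haplotypes with ancestry {label!r} at this SNP")
    return sum(picked) / len(picked)


def fst_two_pops(p1: float, p2: float) -> float:
    """Two-population FST = (H_T - H_S) / H_T with equal population weights."""
    p_bar = (p1 + p2) / 2
    ht = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled populations
    hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    return 0.0 if ht == 0 else (ht - hs) / ht


# Hypothetical SNP: 8 admixed haplotypes (1 = derived allele) with local ancestry calls.
haps = [1, 0, 1, 1, 0, 0, 1, 0]
anc = ["AFR", "EUR", "AFR", "AFR", "EUR", "AFR", "EUR", "EUR"]
p_afr_component = component_allele_freq(haps, anc, "AFR")
fst = fst_two_pops(p_afr_component, 0.60)  # 0.60 = hypothetical African reference frequency
print(f"African-component freq = {p_afr_component:.2f}, FST = {fst:.4f}")  # 0.75, 0.0256
```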

Developing methods for detecting local adaptation signatures in admixed genomes

[Figure: highlighted loci include HBB, HLA-C, and CD36]

Regions with highly differentiated allele frequency between AAF and African

Region or SNP | Position | Size (bp) | SNPs | Highest FST | Genes | Pathways | Related disease
1p21 | chr1:100125058..100183875 | 58817 | 2 | 0.0562 | AGL* | Metabolism of carbohydrate | Glycogen storage disease
1q22 | chr1:153401959..153464086 | 62127 | 4 | 0.0692 | THBS3*, MUC1*, MTX1, TRIM46, KRTCAP2 | Signaling by PDGF | Stomach cancer, breast cancer, osteosarcoma
rs12094201 | chr1:236509336 | 1 | 1 | 0.0561 | (ZP4* 389 kb) | NA | Hypertension, non-alcoholic fatty liver
rs7642575 | chr3:31400165 | 1 | 1 | 0.0453 | (STT3B, OSBPL10* 149 kb) | NA | Peripheral arterial disease
6p21-p22 | chr6:26554684..33961049 | 7406365 | 11 | 0.0711 | HLA-B*, HLA-C, EHMT2*, HLA-DPA1*, HLA-DRB5, EHM, BTN3A3, et al. | Signaling by GPCR, signaling in immune system, HIV infection, diabetes pathway | HIV, Crohn's disease, rheumatoid arthritis, juvenile idiopathic arthritis, colorectal cancer, systemic sclerosis
6q25 | chr6:151555551..151569258 | 13707 | 2 | 0.0545 | (AKAP12* 40 kb) | Cell growth | Hypertension, hemorrhagic stroke
rs10499542 | chr7:22235870 | 1 | 1 | 0.04606 | RAPGEF5* | GTP/GDP regulation | Thyroid-stimulating hormone
7q21 | chr7:79768487..80482597 | 714110 | 10 | 0.0946 | CD36*, SEMA3C | Metabolism of lipids and lipoproteins | Metabolic syndrome, malaria
8q24 | chr8:143754039..143758933 | 4894 | 2 | 0.04679 | PSCA* | NA | Prostate cancer, bladder cancer, gastric cancer
11p15 | chr11:5034229..5421456 | 387227 | 3 | 0.0617 | HBB*, HBD*, HBE1*, HBG2, OR51I1, et al. | Signaling by GPCR | Sickle cell disease, beta-thalassemia, malaria
rs6015945 | chr20:59319574 | 1 | 1 | 0.0627 | CDH4* | Cell junction organization | Alzheimer's disease

Ingenuity pathway analysis (IPA)

• Diseases and disorders

– Metabolic diseases (P = 1.51 × 10^-16)

– Endocrine system disorders (P = 2.23 × 10^-16)

– Immunological diseases (P = 9.30 × 10^-12)

– Genetic disorders (P = 5.67 × 10^-11)

• Pathways (all related to the immune system)

– Antigen presentation pathway (P = 1.95 × 10^-4)

– Allograft rejection signaling (P = 4.69 × 10^-3)

– Graft-versus-host disease signaling (P = 4.69 × 10^-3)

– Autoimmune thyroid diseases signaling (P = 5.35 × 10^-3)

Evolutionary analysis of human disease-related genes

• Human disease genes have been subjected to both purifying and positive selection.

Acknowledgements

Andreas Dress; Wenfei Jin; Haiyi Lou; Administration staff; IT staff …

Li Jin; Shilin Li; Yajun Yang…

Mark Stoneking; Irina Pugach

Children's Hospital Boston, Harvard Medical School: Bailin Wu; Yiping Shen

Edison T. Liu

Mark Seielstad

Guoping Zhao; Wei Huang; Ying Wang; Haifeng Wang

Max-Planck EVA

Chinese National Human Genome Center at Shanghai

Fudan University

CAS-MPG PICB

Manfred Kayser

Erasmus MC University

Maude Phipps; Zilfalil Bin Alwi; Boon Peng Hoh

The HUGO Pan-Asian SNP Consortium (93 scientists, >10 countries)

Anhui Medical University: Xuejun Zhang; Xianyong Yin

Thank you!