
FULL PAPER

Haplotype tagging efficiency in worldwide populations in CTLA4 gene

A Ramírez-Soriano1, O Lao1,2, M Soldevila1, F Calafell1, J Bertranpetit1 and D Comas1

1Unitat de Biologia Evolutiva, Departament de Ciències Experimentals i de la Salut, Facultat de Ciències de la Salut i de la Vida, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain; 2Erasmus University Medical Centre, Rotterdam, The Netherlands

The cytotoxic T lymphocyte antigen 4 (CTLA4) acts as a potent negative regulator of T-cell response, and has been suggested as a pivotal candidate gene for autoimmune disorders such as Graves’ disease, type 1 diabetes and autoimmune hypothyroidism, among others. Several single-nucleotide polymorphisms (SNPs) have been proposed as the susceptibility variants, or to be in strong linkage disequilibrium (LD) with the variant. Nevertheless, contradictory results have been found, which may be due to lack of knowledge of the genetic structure of CTLA4 and its geographic variation. We have typed 17 SNPs throughout the CTLA4 gene region in order to analyze the haplotype diversity and LD structure in a worldwide population set (1262 individuals from 44 populations) to understand the variation pattern of the region. Allele and haplotype frequency differentiation between populations is consistent with genomewide averages and points to a lack of strong population-specific selection pressures. LD is high and its pattern is not significantly different within or between continents. However, haplotype composition is significantly different between geographical groups. A continent-specific set of haplotype tagging SNPs has been designed to be used for future association studies. These are portable among populations, although their efficiency might vary depending on the population haplotype spectrum.

Genes and Immunity (2005) 6, 646–657. doi:10.1038/sj.gene.6364251; published online 21 July 2005

Keywords: CTLA4 gene; population diversity; haplotype; linkage disequilibrium; tag-SNP

Introduction

CTLA4 (cytotoxic T lymphocyte antigen 4) is a member of the immunoglobulin supergene family that functions as a potent negative regulator of T-cell response,1,2 its lack causing massive lymphoproliferative disorders, fatal multiorgan destruction and early death in knockout and CTLA4-deficient mice.3 The CTLA4 gene is located at position 2q33 in a region also containing the ICOS and CD28 genes, and shares with the latter a high nucleotide identity, strongly suggesting that they are the result of a gene duplication.4 CTLA4 is composed of four exons encoding different functional domains: a leader sequence, and extracellular, transmembrane and cytoplasmic domains. Two amino-acid replacement SNPs have been described for this gene: +49 A/G (T17A) in exon 1 and 2814 G/A in exon 2 (M90I), the latter described in African Americans.5 Several other single-nucleotide polymorphisms (SNPs) have also been described in the promoter region and at the 3′-end of the gene,6,7 as well as an (AT)n repeat in the 3′-untranslated region.

Owing to its crucial role in T-cell regulation, CTLA4 has been shown to act as a primary determinant of susceptibility to autoimmune disorders; indeed, CTLA4 has been shown to be robustly associated with Graves’ disease,8 while some of the SNPs mentioned above have been proposed as the susceptibility variant of disease or to be in strong linkage disequilibrium (LD) with it in different human populations. +49 G has been related to multiple diseases, including type 1 diabetes (T1D),9 rheumatoid arthritis,10 multiple sclerosis,11 Graves’ disease,12 asthma13 and systemic sclerosis.14 SNP −1722 has been related to lupus erythematosus in two contradictory studies,15,16 while the −1661 G variant has been suggested to be associated with T1D.17 Also, SNPs 6230 G, 10 717 G, 10 242 G and 12 310 T have been related to Graves’ disease (GD), autoimmune hypothyroidism and T1D.7 Nonetheless, the associations have not always been replicated in different populations.

Haplotype and LD analyses have proved to be powerful tools to map disease genes18 as well as to disclose the history of human populations (Bertranpetit et al19 and references therein). In that sense, studies in different loci have shown higher haplotype diversity and lower LD values in African populations than in non-African ones (Tishkoff et al20 and Sawyer et al21 and references therein), which is explained by the ‘out-of-Africa’ hypothesis of human dispersal, the subsequent founder effect in non-African populations and the larger effective population size of African populations. However, LD patterns are more complex than this simple difference between Africans and non-Africans.21,22

Received 12 May 2005; revised 16 June 2005; accepted 20 June 2005; published online 21 July 2005

Correspondence: Dr D Comas, Unitat de Biologia Evolutiva, Departament de Ciències Experimentals i de la Salut, Facultat de Ciències de la Salut i de la Vida, Universitat Pompeu Fabra, Barcelona, Catalonia 08003, Spain. E-mail: [email protected]

Genes and Immunity (2005) 6, 646–657. © 2005 Nature Publishing Group. All rights reserved 1466-4879/05 $30.00

www.nature.com/gene

With the availability of large SNP data sets, a structure of haplotype blocks with a very low recombination rate separated by hot-spots of recombination has been suggested,23,24 the genome being organized in blocks shorter in Africans than in Europeans or Asians.25 The presence of these high LD regions allows definition of haplotype tagging SNPs (tag-SNPs), that is, the minimum set of SNPs that would capture most of the haplotype diversity within a block.6 CTLA4 is one of the nine genes where the term tag-SNP was first defined. However, both the presence of blocks and the use of tag-SNPs as markers for disease genes are controversial.26,27 Moreover, tag-SNPs identified in one population may not be applicable to other populations28 even if they seem to be quite general for Europeans.29 Furthermore, Crawford et al,30 after analyzing the sequence diversity in 100 human genes, concluded that the amount of LD and the haplotype structure should be empirically analyzed in order to assess which and how many SNPs must be typed in a specific gene to detect an association with a disease-causing SNP in a case–control study.
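The idea of a tag-SNP set as the smallest group of SNPs that still tells the haplotypes of a block apart can be illustrated with a small exhaustive search. This is only a sketch under stated assumptions: the function name `minimal_tag_snps` and the four toy haplotypes are hypothetical, and the search is not the selection procedure used in this study or in the studies it cites.

```python
from itertools import combinations


def minimal_tag_snps(haplotypes):
    """Return the smallest tuple of SNP indices that distinguishes every haplotype.

    `haplotypes` is a list of equal-length strings, one allele per SNP.
    Subsets are tried in order of increasing size, so the first hit is minimal.
    """
    distinct = set(haplotypes)
    n_snps = len(haplotypes[0])
    for size in range(1, n_snps + 1):
        for subset in combinations(range(n_snps), size):
            # Project each haplotype onto the candidate SNP subset.
            projected = {tuple(h[i] for i in subset) for h in distinct}
            if len(projected) == len(distinct):
                return subset
    return tuple(range(n_snps))


# Hypothetical common haplotypes over four SNPs (not data from this study).
common = ["ACGT", "ACGA", "GCGT", "GTAA"]
print(minimal_tag_snps(common))  # (0, 3)
```

In this toy case two SNPs are enough to separate four haplotypes, which mirrors why regions with strong LD and few common haplotypes need few tag-SNPs.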

The aim of the present study is to analyze the haplotype diversity within the CTLA4 gene region in a worldwide population set in order to establish its structure, define LD patterns and haplotypes, provide a standard set of markers with known variation for further studies in populations of different ethnic or geographic origin and shed light on the discrepancies found in association studies.

Results

Polymorphism and haplotype description

Figure 1a shows, for the 17 SNPs analyzed, their position, the allelic variants and the ancestral states deduced from the primate samples. All SNPs and populations were in Hardy–Weinberg equilibrium after Bonferroni correction for multiple testing. Table 1 summarizes several parameters for each population and geographical region, including the number of fixed (monomorphic) alleles. As shown in Figure 2, all the SNPs analyzed are highly polymorphic worldwide except for SNP −658, which shows little polymorphism in Europeans, North Africans and Middle Easterners, and reaches fixation in most other populations; and SNP −319, which is also fixed in a large number of samples.

Ancestral alleles (Table 2) were generally more frequent than the derived alleles, which argues against a recent selective sweep at CTLA4 in the human lineage. Divergence between the human and the chimpanzee CTLA4 sequences is 1.085%, close to the average human–chimpanzee divergence31 and consistent with a lack of accelerated nucleotide divergence for this gene.

In order to test for SNP frequency differences, both between SNPs and between populations, FST values were calculated by SNP and by geographic regions (Table 2). The average FST for all 44 populations is ~0.10, in agreement with previous studies of neutral markers.32–34 When instead of 44 independent populations the eight continental groups are considered, there is only a slight decrease in the FST values. As expected, the highest continental FST value is found in sub-Saharan Africa, with very low values and extreme homogeneity within Europe, North-Africa and Central/South Asia. For these continents, AMOVA (analysis of molecular variance) shows no statistical significance, in contrast to the highly significant values for sub-Saharan Africa, Middle East and East Asia. When a two-tier AMOVA (continental groups and populations) is applied, most of the variance lies within populations (89.2%), with a substantial proportion among regional groups (8.5%) and a mere 2.3% among populations within regional groups.
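As a reminder of what an FST value measures, the following sketch computes Wright's FST for a single biallelic SNP as (HT − HS)/HT, where HT is the expected heterozygosity of the pooled sample and HS the mean within-population expected heterozygosity. It is an illustration under assumptions: the function name `fst_biallelic` and the example frequencies are hypothetical, and the values in Table 2 may well have been obtained with a different (for example, AMOVA-based) estimator.

```python
def fst_biallelic(freqs, sizes=None):
    """Wright's F_ST = (H_T - H_S) / H_T for one biallelic SNP.

    freqs: allele frequency of the same allele in each population.
    sizes: optional sample sizes used as weights (equal weights if omitted).
    """
    if sizes is None:
        sizes = [1.0] * len(freqs)
    total = float(sum(sizes))
    weights = [s / total for s in sizes]

    # Pooled allele frequency and total expected heterozygosity.
    p_bar = sum(w * p for w, p in zip(weights, freqs))
    h_t = 2.0 * p_bar * (1.0 - p_bar)

    # Weighted mean of within-population expected heterozygosities.
    h_s = sum(w * 2.0 * p * (1.0 - p) for w, p in zip(weights, freqs))

    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


# Hypothetical frequencies in two populations (not data from this study).
print(round(fst_biallelic([0.2, 0.8]), 3))  # 0.36
```

Identical frequencies in every population give 0, and fixation of alternative alleles gives 1, so a value of about 0.10 indicates that roughly a tenth of the heterozygosity is attributable to differences between populations.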

When comparing FST among SNPs, a maximum differentiation is found for SNP 6230 (0.137 between continental groups), which has been associated with disease.7 However, FST values are not significantly different between disease-related and nondisease-related

Figure 1 Map of the CTLA4 gene region and the SNPs typed. Genes (CD28, CTLA4 and ICOS) are represented by gray boxes, CTLA4 exons by black boxes and SNPs are represented by vertical tick marks. (a) Allelic variants found in the worldwide human population set and ancestral states deduced from primate genotyping. (b) Suggested tag-SNPs for different geographical populations.

Haplotype structure of CTLA4. A Ramírez-Soriano et al


SNPs (Mann–Whitney test, P = 0.494). In order to evaluate the significance of the FST values, the FST for each SNP was compared to the FST distributions provided by genomewide studies: Akey et al35 for a large set of SNPs in three populations and Kidd et al36 for gene-centred SNPs in a set of populations similar to

Table 1 Population descriptive parameters(a) grouped by continental region

| Continental region | Population | 2N(b) | S fixed | Kh | Dh | Kmax | FNF | Kh private (%) |
|---|---|---|---|---|---|---|---|---|
| Sub-Saharan Africa | Bantu (BAN) | 40 | 2 | 15 | 0.8974 ± 0.0284 | 40 | 0.65 | 4 (26.67) |
| | Biaka Pygmies (BPY) | 64 | 1 | 21 | 0.9132 ± 0.0172 | 66 | 0.70 | 5 (23.81) |
| | Mandenka (MAN) | 44 | 2 | 18 | 0.9197 ± 0.0208 | 41 | 0.59 | 3 (16.67) |
| | Mbuti Pygmies (MPY) | 30 | 2 | 15 | 0.9310 ± 0.0241 | 30 | 0.53 | 4 (26.67) |
| | San (SAN)(c) | 14 | 5 | | | | | |
| | Tanzan (TAN) | 64 | 1 | 16 | 0.8150 ± 0.0326 | 58 | 0.75 | 4 (25.00) |
| | Yoruba (YOR) | 38 | 2 | 13 | 0.9033 ± 0.0248 | 37 | 0.69 | 4 (30.77) |
| | Mean | 294 | | 16 | 0.8966 | 45 | 0.65 | 4.00 (24.93) |
| North-Africa | Moroccan (MRA) | 70 | 0 | 21 | 0.8538 ± 0.0253 | 69 | 0.72 | 3 (14.29) |
| | Mozabite (MOZ) | 60 | 2 | 20 | 0.8288 ± 0.0424 | 58 | 0.68 | 4 (20.00) |
| | Saharaui (SAH) | 106 | 0 | 28 | 0.8659 ± 0.0225 | 104 | 0.75 | 7 (25.00) |
| | Mean | 236 | | 23 | 0.8495 | 77 | 0.71 | 4.67 (19.76) |
| Middle East | Bedouin (BED) | 88 | 0 | 21 | 0.8655 ± 0.0258 | 87 | 0.77 | 3 (13.64) |
| | Druze (DRU) | 88 | 0 | 26 | 0.9295 ± 0.0122 | 84 | 0.71 | 9 (34.62) |
| | Palestinian (PAL) | 102 | 0 | 20 | 0.8676 ± 0.0199 | 95 | 0.81 | 5 (25.00) |
| | Mean | 278 | | 22 | 0.8875 | 89 | 0.76 | 5.67 (24.42) |
| Europe | Adygei (ADY) | 34 | 0 | 9 | 0.8235 ± 0.0517 | 34 | 0.78 | 2 (22.22) |
| | Basque (BAS) | 82 | 0 | 19 | 0.8645 ± 0.0233 | 79 | 0.78 | 6 (31.58) |
| | Catalan (CAT) | 40 | 0 | 14 | 0.8474 ± 0.0380 | 40 | 0.68 | 1 (7.14) |
| | Continental Italian (CIT) | 42 | 0 | 15 | 0.8374 ± 0.0497 | 44 | 0.69 | 1 (6.67) |
| | French (FRE) | 46 | 0 | 15 | 0.7923 ± 0.0434 | 46 | 0.70 | 5 (33.33) |
| | French Basque (FRB) | 48 | 0 | 11 | 0.7926 ± 0.0419 | 47 | 0.80 | 1 (9.09) |
| | Orcadian (ORC) | 32 | 1 | 6 | 0.6915 ± 0.0666 | 32 | 0.87 | 0 (0.00) |
| | Russian (RUS) | 50 | 0 | 14 | 0.7608 ± 0.0430 | 49 | 0.74 | 0 (0.00) |
| | Sardinian (SAR) | 54 | 0 | 14 | 0.8435 ± 0.0335 | 38 | 0.67 | 2 (14.29) |
| | Spanish (SPA) | 142 | 0 | 25 | 0.8544 ± 0.0197 | 136 | 0.83 | 8 (32.00) |
| | Mean | 570 | | 14 | 0.8108 | 54 | 0.75 | 2.50 (14.52) |
| Central/South Asia | Balochi (BAL) | 48 | 0 | 14 | 0.7730 ± 0.0584 | 47 | 0.73 | 2 (14.29) |
| | Brahui (BRA) | 50 | 0 | 13 | 0.7869 ± 0.0490 | 49 | 0.77 | 1 (7.69) |
| | Burusho (BUR) | 50 | 0 | 11 | 0.6400 ± 0.0704 | 49 | 0.81 | 0 (0.00) |
| | Hazara (HAZ) | 46 | 0 | 9 | 0.7662 ± 0.0394 | 47 | 0.85 | 0 (0.00) |
| | Kalash (KAL) | 50 | 1 | 10 | 0.6278 ± 0.0752 | 49 | 0.83 | 1 (10.00) |
| | Makrani (MAK) | 44 | 1 | 11 | 0.5920 ± 0.0863 | 43 | 0.78 | 2 (18.18) |
| | Pathan (PAT) | 48 | 1 | 9 | 0.6108 ± 0.0737 | 48 | 0.85 | 1 (11.11) |
| | Sindhi (SIN) | 50 | 1 | 11 | 0.7053 ± 0.0636 | 49 | 0.81 | 1 (9.09) |
| | Mean | 386 | | 11 | 0.6878 | 48 | 0.80 | 1.00 (8.80) |
| East Asia | Cambodian (CAM) | 22 | 1 | 7 | 0.7965 ± 0.0669 | 22 | 0.75 | 1 (14.29) |
| | Han (HAN) | 88 | 1 | 11 | 0.7806 ± 0.0223 | 89 | 0.90 | 3 (27.27) |
| | Japanese (JAP) | 54 | 1 | 13 | 0.8050 ± 0.0297 | 56 | 0.79 | 3 (23.08) |
| | North China (NCH) | 134 | 1 | 17 | 0.7800 ± 0.0208 | 124 | 0.88 | 1 (5.88) |
| | South China (SCH) | 140 | 1 | 20 | 0.8599 ± 0.0146 | 132 | 0.86 | 6 (30.00) |
| | Yakut (YAK) | 50 | 1 | 13 | 0.7437 ± 0.0538 | 47 | 0.76 | 1 (7.69) |
| | Mean | 486 | | 14 | 0.7943 | 78 | 0.82 | 2.50 (18.04) |
| Oceania | Nan Melanesian (NAN) | 36 | 1 | 8 | 0.6556 ± 0.0750 | 42 | 0.82 | 4 (50.00) |
| | Papuan (PAP) | 30 | 2 | 8 | 0.8161 ± 0.0427 | 30 | 0.78 | 2 (25.00) |
| | Mean | 66 | | 8 | 0.7359 | 36 | 0.80 | 2.50 (29.17) |
| America | Colombian (COL) | 26 | 1 | 7 | 0.7662 ± 0.0522 | 26 | 0.79 | 2 (28.57) |
| | Karitiana (KAR) | 46 | 2 | 4 | 0.5295 ± 0.0444 | 45 | 0.95 | 0 (0.00) |
| | Maya (MAY) | 46 | 0 | 10 | 0.7420 ± 0.0424 | 45 | 0.81 | 0 (0.00) |
| | Pima (PIM) | 46 | 5 | 8 | 0.7159 ± 0.0393 | 45 | 0.86 | 1 (12.50) |
| | Surui (SUR) | 42 | 7 | 3 | 0.4866 ± 0.0530 | 40 | 0.97 | 0 (0.00) |
| | Mean | 206 | | 6 | 0.6480 | 40 | 0.88 | 0.60 (8.21) |

(a) 2N (number of chromosomes); S fixed (number of nonpolymorphic SNPs); Kh (total number of haplotypes); Dh (haplotype diversity); Kmax (number of haplotypes expected under equilibrium); FNF (fraction of haplotypes not found for each population); Kh private (%) (number of nonshared haplotypes and percentage).
(b) 2N is not averaged. The value given corresponds to the total number of chromosomes in the geographical group.
(c) San haplotypes have not been estimated.


the one used here. No extreme FST values were found within the CTLA4 region compared to the genome distribution. This is a conservative approach, since Akey et al35 considered only three main population groups which, by design, would yield lower FST values than our worldwide sample. In relation to Kidd et al36 the highest FST value of our results (0.137) and the average FST (0.096, Table 2) fall close to the mean for 369 markers.36 This suggests that the geographic stratification shown by CTLA4 is well below a genomewide average and points to the absence of strong geographically specific selective pressures.

A total of 181 haplotypes were estimated, of which 68 account for 95% of the global variation. The cumulative frequency of the nine most frequent haplotypes is 79%, the 10th adding only 1.27%. As shown in Figure 3, which represents haplotype composition for world regions, neither the number of haplotypes nor their distribution is homogeneous, even for the most frequent ones. Out of the 180 haplotypes, 72 (40%) were found in more than one population and 63 (35%) were found in more than one continental region. The ancestral haplotype estimated from primate data is not present in our sample; the haplotypes most similar to the ancestral are h152 and h165, with only one change each. Both haplotypes are rare (maximum two chromosomes), h152 being present in two chromosomes from two different African populations and a single h165 chromosome being private to Bantus.

Table 1 shows haplotype descriptive parameters for each population and averaged by geographical groups. Haplotype diversity (Dh, that is, expected haplotype

Figure 2 SNP frequencies for each population. SSAFR stands for sub-Saharan Africa (Bantu, Biaka Pygmy, Mandenka, Mbuti Pygmy, San, Tanzan, Yoruba), NA for North-Africa (Mozabite, Moroccan, Saharawi), ME for Middle-East (Bedouin, Druze, Palestinian, Adygei), EUROPE for Europeans (Basque, Catalan, French Basque, French, Continental Italian, Orcadian, Russian, Sardinian, Spanish), CSASIA for Central/South Asia (Balochi, Brahui, Burusho, Hazara, Kalash, Makrani, Pathan, Sindhi), EASIA for East Asia (Cambodian, South China, North China, Han, Japanese, Yakut), O for Oceania (Nan, Papuan) and AME for America (Colombian, Karitiana, Maya, Pima and Surui). SNPs are displayed according to their position in the gene.


Table 2 Fst values for each SNP in the eight geographical regions

| SNP | rs i.d. | Sub-Saharan Africa | North-Africa | Middle East | Europe | Central/South Asia | East Asia | Oceania | America | Global(a) | Global(b) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| No. of populations | | 7 | 3 | 3 | 10 | 8 | 6 | 2 | 5 | 44 | |
| −1765 T/C | rs11571315 | 0.007 (0.61) | −0.006 (0.30) | 0.109 (0.24) | 0.013 (0.34) | 0.008 (0.30) | 0.007 (0.63) | 0.025 (0.26) | 0.021 (0.54) | 0.105 | 0.102 |
| −1722 T/C | rs733618 | 0.025 (0.90) | −0.010 (0.94) | 0.005 (0.87) | 0.017 (0.96) | 0.001 (0.88) | 0.037 (0.64) | −0.008 (0.99) | 0.103 (0.86) | 0.108 | 0.095 |
| −1661 A/G | rs4553808 | 0.036 (0.85) | 0.003 (0.78) | −0.001 (0.83) | 0.003 (0.84) | −0.016 (0.90) | 0.088 (0.90) | −0.027 (0.84) | 0.010 (0.98) | 0.034 | 0.015 |
| −1577 G/A | rs11571316 | 0.022 (0.90) | 0.011 (0.57) | 0.063 (0.46) | 0.004 (0.49) | 0.003 (0.40) | 0.021 (0.71) | 0.079 (0.45) | 0.038 (0.54) | 0.121 | 0.122 |
| −658 C/T | rs11571317 | 0.000 (1.00) | 0.033 (0.94) | 0.051 (0.94) | 0.018 (0.90) | −0.002 (0.98) | 0.000 (1.00) | 0.013 (0.98) | 0.003 (1.00) | 0.062 | 0.048 |
| −319 C/T | rs5742909 | −0.012 (1.00) | 0.012 (0.97) | 0.011 (0.92) | −0.014 (0.91) | −0.007 (0.95) | 0.039 (0.92) | 0.012 (0.99) | −0.002 (0.98) | 0.028 | 0.022 |
| +49 A/G | rs231775 | 0.081 (0.61) | −0.005 (0.72) | 0.085 (0.77) | 0.020 (0.72) | 0.015 (0.70) | 0.004 (0.36) | 0.028 (0.71) | 0.019 (0.50) | 0.102 | 0.088 |
| +6230 G/A | rs3087243 | 0.033 (0.90) | 0.004 (0.57) | 0.031 (0.42) | 0.003 (0.46) | 0.010 (0.37) | 0.028 (0.71) | 0.034 (0.45) | 0.031 (0.58) | 0.134 | 0.137 |
| +6249 G/A | rs11571319 | 0.067 (0.76) | −0.003 (0.78) | −0.001 (0.88) | −0.005 (0.86) | −0.012 (0.91) | 0.045 (0.92) | −0.027 (0.85) | 0.010 (0.93) | 0.034 | 0.019 |
| +7092 A/G | rs231723 | 0.065 (0.66) | 0.038 (0.71) | 0.130 (0.55) | 0.021 (0.73) | 0.024 (0.68) | 0.005 (0.36) | 0.019 (0.64) | 0.023 (0.48) | 0.101 | 0.065 |
| +7482 A/C | rs10197010 | 0.099 (0.76) | 0.008 (0.72) | 0.012 (0.79) | −0.004 (0.86) | −0.013 (0.87) | 0.042 (0.90) | −0.027 (0.85) | 0.026 (0.98) | 0.037 | 0.017 |
| +7982 G/A | rs231725 | 0.114 (0.64) | 0.005 (0.74) | 0.083 (0.88) | 0.018 (0.76) | 0.039 (0.77) | 0.021 (0.46) | 0.004 (0.75) | 0.018 (0.52) | 0.125 | 0.105 |
| +8173 C/T | rs231726 | 0.160 (0.73) | 0.001 (0.80) | 0.076 (0.88) | 0.024 (0.77) | 0.032 (0.77) | 0.017 (0.44) | −0.009 (0.74) | 0.014 (0.52) | 0.122 | 0.097 |
| +10 242 T/G | rs11571302 | 0.049 (0.35) | 0.001 (0.47) | 0.004 (0.59) | −0.002 (0.60) | 0.009 (0.63) | 0.049 (0.32) | 0.008 (0.60) | 0.031 (0.42) | 0.102 | 0.098 |
| +10 717 G/A | rs7565213 | 0.041 (0.80) | 0.005 (0.55) | 0.010 (0.46) | 0.004 (0.40) | 0.007 (0.37) | 0.058 (0.70) | 0.008 (0.40) | 0.031 (0.58) | 0.104 | 0.098 |
| +12 131 C/G | rs10932025 | 0.079 (0.76) | −0.005 (0.78) | 0.025 (0.77) | −0.006 (0.83) | −0.014 (0.87) | 0.042 (0.90) | −0.027 (0.85) | 0.044 (0.98) | 0.033 | 0.019 |
| +12 310 T/C | rs11571297 | 0.052 (0.78) | 0.004 (0.53) | 0.004 (0.40) | 0.003 (0.39) | 0.010 (0.35) | 0.049 (0.67) | 0.008 (0.40) | 0.031 (0.58) | 0.101 | 0.095 |
| Average(c) | | 0.068 | 0.005 | 0.046 | 0.008 | 0.009 | 0.031 | 0.013 | 0.028 | 0.096(e) | 0.085(f) |
| AMOVA P-values(d) | | <0.001 | NS | <0.001 | NS | NS | <0.001 | NS | 0.047 | <0.001 | |

In parentheses, median frequency of the ancestral allele (marked in bold in the first column).
(a) Global comparison of all populations for each SNP.
(b) Global comparison of the eight geographical groups for each SNP.
(c) Average Fst over all loci for each continental region.
(d) Analysis of the molecular variance (AMOVA). NS = not significant.
(e) Global comparison of all populations for the average Fst over all loci.
(f) Global comparison of the eight geographical groups for the average Fst over all loci.


heterozygosity) values, as well as the number of haplotypes (Kh) and the number of private haplotypes (Kh private), are significantly different between geographical groups (Kruskal–Wallis test, P = 0.0001, 0.0021 and 0.0046, respectively). In a recent extensive gene-centered analysis,30 the mean number of common haplotypes (frequency >5%) varied greatly from gene to gene, with a mean of 5.0 and 4.5 in populations of African and European descent, respectively. In the CTLA4 region similar values are found, African samples having the maximum mean number of common haplotypes (5.8), quite high in the Middle East (5.3) and similar in the other groups (around 4), except for the low value in the Americas (2.6).
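Haplotype diversity understood as expected haplotype heterozygosity is commonly computed as Dh = n/(n − 1) × (1 − Σ p_i²), where n is the number of chromosomes and p_i the frequency of haplotype i. The sketch below assumes this standard unbiased form, which the text does not spell out; the function name `haplotype_diversity` and the toy sample are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Expected haplotype heterozygosity, D_h = n/(n-1) * (1 - sum(p_i^2)).

    `haplotypes` is a list with one haplotype label per sampled chromosome.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two chromosomes are needed")
    counts = Counter(haplotypes)
    # Sum of squared haplotype frequencies (probability of drawing two alike).
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical sample of ten chromosomes (not data from this study).
sample = ["h1"] * 6 + ["h2"] * 3 + ["h3"]
print(round(haplotype_diversity(sample), 4))  # 0.6
```

Values near 1 correspond to many haplotypes at even frequencies, while values near 0 correspond to a sample dominated by a single haplotype.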

LD analysis and haplotype tag-SNPs

An overall measure of haplotype diversity in the CTLA4 region can be obtained with the FNF statistic (see Materials and methods), shown in Table 1. In general, FNF values are high, corresponding to a dearth of haplotypes and, thus, to high LD in the region. Furthermore, these values are significantly different between geographical groups (Kruskal–Wallis test, P = 0.0020), being lowest in sub-Saharan Africans, in accordance with their higher haplotype diversity, and very high in the Americas, where diversity and the number of haplotypes are low.

The LD structure of CTLA4 (measured with D′ or r², with similar results) shows that the whole region of around 14 kb presents substantial LD, as described previously,7 and now confirmed for all the worldwide populations studied, and depending on block definition, it may be contained within a single LD block. A first approach to test whether the amount of LD was similar among populations consisted of taking r² values for adjacent SNPs and comparing them among populations by means of the Friedman test, and comparing the average r² values of a continental group with the average of all other groups, using Wilcoxon’s test. No statistical differences in r² values were found among populations. Therefore, populations within continental groups are homogeneous regarding their amounts of LD, as are all worldwide regional groups. Next, we tested whether the structure of LD was similar by computing the correlation between the same r² values between pairs of populations. Comparisons within continental regions showed high correlations between population pairs, all of them being statistically significant (P < 0.05). When pairs of populations from different continental regions were compared, the correlations were also significant, with the exception of the two Pygmy populations, which presented low correlation and nonsignificant values with most of the analyzed populations (data not shown). If instead of just taking the diagonal, the whole LD matrix is considered, correlations with Mantel’s tests give significant correlation values both between population pairs within regions (with the exception of Mbuti Pygmies) and for pairs of continental regions (data not shown). Thus, both the amount and pattern of LD in the CTLA4 gene region were similar across human populations.
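For reference, the two pairwise LD measures mentioned above can be computed from the frequency of one two-locus haplotype and the two allele frequencies. This is a minimal sketch using the standard definitions D = p_AB − p_A p_B, D′ = |D|/D_max and r² = D²/(p_A(1 − p_A)p_B(1 − p_B)); the function name `ld_stats` and the example frequencies are hypothetical and are not taken from this study.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2.
    p_a, p_b: frequencies of allele A and allele B.
    """
    d = p_ab - p_a * p_b

    # D_max depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = 0.0 if d_max == 0 else abs(d) / d_max

    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = 0.0 if denom == 0 else d * d / denom
    return d, d_prime, r2


# Hypothetical frequencies: allele A always occurs with B, but B is more common.
d, d_prime, r2 = ld_stats(p_ab=0.3, p_a=0.3, p_b=0.5)
print(round(d_prime, 3), round(r2, 3))
```

The example shows why the two measures can disagree: D′ reaches its maximum whenever one of the four possible haplotypes is absent, whereas r² only does so when the two SNPs carry the same information.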

Considering the high LD in the region, as well as its similarity among populations, we would expect tag-SNPs to be extremely useful for the analysis of the CTLA4 gene and portable across populations. For this purpose, we have defined sets of tag-SNPs for the whole CTLA4 region in our worldwide sample for geographical regions, grouping the population haplotypes in each continent. As shown in Figure 1b, continental regions can be described using only two to four tag-SNPs, with geographical regions with higher LD needing fewer tag-SNPs. It is interesting to note that none of the seven SNPs at the 3′-end has been selected as tag-SNP in any continental region, and, in most cases, only those at the 5′-end are defined as tag-SNPs. The number of common haplotypes (those with frequencies over 5%) and the fraction of the total haplotypes detectable with the continental tag-SNP sets in each population are shown in Table 3. The tag efficiency of the tag-SNP set for each continental region is defined as the frequency of total haplotypes in the population detected by the set, divided by the number of tag-SNPs in the set. For instance, the European tag-SNP set, consisting of three SNPs (−1765, −1661 and −658; Figure 1b), scores ~88% of the total common haplotypes (which represent ~80% of the total European haplotypes). Thus, the detected haplotypes represent ~71% of the total European haplotypes, and therefore, the tag efficiency of each of the three SNPs in the European tag-SNP set is ~24%. As expected, tag efficiency is higher in those regions with stronger LD (ie Oceania and America) and lower in sub-Saharan Africans.
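The European example can be retraced step by step. The sketch below is only an arithmetic check: the function name `tag_efficiency` is hypothetical, the two percentages are the European mean values given in Table 3 (80.20% of haplotypes are common, 88.32% of those are detected) and the set size of three comes from the text.

```python
def tag_efficiency(common_fraction, detected_fraction, n_tag_snps):
    """Fraction of all haplotypes detected per tag-SNP in the set.

    common_fraction: share of all haplotypes that are common (frequency > 5%).
    detected_fraction: share of the common haplotypes scored by the tag-SNP set.
    n_tag_snps: number of tag-SNPs in the set.
    """
    # Share of all haplotypes in the population that the set detects.
    covered = common_fraction * detected_fraction
    return covered / n_tag_snps


covered = 0.8020 * 0.8832
print(f"{covered:.1%}")                           # 70.8%
print(f"{tag_efficiency(0.8020, 0.8832, 3):.2%}")  # 23.61%
```

The result agrees with the rounded figures in the text (about 71% of haplotypes detected and about 24% per SNP) and with the European tag efficiency listed in Table 3.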

Figure 3 Frequencies of the nine most frequent haplotypes represented by geographical groups. Pies are proportional to the number of individuals typed and contain the data of all populations within the geographical region. Haplotype diversity for each geographical group is shown beside the continental pies.


Table 3 Tag-SNP haplotype scoring and tag efficiency in the CTLA4 gene in worldwide populations

Column groups: Present tag-SNPs(a) (columns 3–5), Johnson’s tag-SNPs(b) (columns 6–8), Present tag-SNPs(b) (columns 9–10).

| Continental region | Population | Kh common (%)(c) | K detected (%)(d) | Tag efficiency | Kh common (%)(c) | K detected (%)(d) | Tag efficiency | K detected (%)(d) | Tag efficiency |
|---|---|---|---|---|---|---|---|---|---|
| Sub-Saharan Africa | Bantu (BAN) | 7 (80.00) | 5 (84.38) | | 7 (95.00) | 4 (73.68) | | 5 (84.21) | |
| | Biaka Pygmies (BPY) | 6 (68.57) | 4 (70.83) | | 6 (88.57) | 4 (87.13) | | 4 (80.69) | |
| | Mandenka (MAN) | 5 (63.64) | 4 (89.31) | | 6 (93.18) | 4 (60.94) | | 5 (82.94) | |
| | Mbuti Pygmies (MPY) | 6 (70.00) | 3 (61.86) | | 5 (93.33) | 3 (75.04) | | 3 (78.57) | |
| | Tanzan (TAN) | 3 (71.88) | 3 (100.00) | | 5 (92.19) | 4 (89.80) | | 4 (89.80) | |
| | Yoruba (YOR) | 9 (89.47) | 5 (67.48) | | 7 (100.00) | 4 (73.70) | | 5 (76.30) | |
| | Mean | 6.00 (73.93) | 4.00 (78.98) | 14.60 | 6.00 (93.71) | 3.83 (76.71) | 14.38 | 4.33 (82.09) | 19.23 |
| North-Africa | Moroccan (MRA) | 4 (71.43) | 4 (100.00) | | 4 (82.86) | 4 (100.00) | | 4 (100.00) | |
| | Mozabite (MOZ) | 4 (65.00) | 3 (92.31) | | 4 (86.67) | 3 (92.27) | | 3 (92.27) | |
| | Saharaui (SAH) | 4 (66.04) | 4 (100.00) | | 5 (85.85) | 5 (100.00) | | 4 (93.36) | |
| | Mean | 4.00 (67.49) | 3.67 (97.44) | 21.92 | 4.33 (85.12) | 4.00 (97.42) | 16.58 | 3.67 (95.21) | 27.01 |
| Middle East | Bedouin (BED) | 5 (66.67) | 3 (81.55) | | 5 (81.11) | 5 (100.00) | | 2 (71.15) | |
| | Druze (DRU) | 6 (62.22) | 3 (55.32) | | 7 (92.22) | 6 (90.35) | | 2 (43.29) | |
| | Palestinian (PAL) | 5 (69.61) | 3 (83.05) | | 6 (89.22) | 6 (100.00) | | 2 (61.44) | |
| | Mean | 5.33 (66.17) | 3.00 (73.31) | 24.25 | 6.00 (87.52) | 5.67 (96.78) | 16.94 | 2.00 (58.63) | 51.31(e) |
| Europe | Adygei (ADY) | 8 (97.06) | 4 (72.49) | | 7 (97.06) | 6 (87.84) | | 5 (81.76) | |
| | Basque (BAS) | 6 (78.05) | 4 (84.37) | | 6 (90.24) | 5 (91.91) | | 4 (82.49) | |
| | Catalan (CAT) | 6 (80.00) | 4 (87.50) | | 5 (92.50) | 5 (100.00) | | 4 (94.59) | |
| | Continental Italian (CIT) | 6 (77.27) | 4 (79.42) | | 6 (90.91) | 5 (92.52) | | 4 (82.51) | |
| | French (FRE) | 3 (71.74) | 3 (100.00) | | 5 (93.48) | 5 (100.00) | | 3 (81.39) | |
| | French Basque (FRB) | 4 (79.17) | 3 (89.52) | | 4 (87.50) | 4 (100.00) | | 3 (90.51) | |
| | Orcadian (ORC) | 5 (96.88) | 4 (93.50) | | 5 (100.00) | 4 (93.70) | | 4 (93.70) | |
| | Russian (RUS) | 3 (76.00) | 3 (100.00) | | 3 (84.00) | 3 (100.00) | | 3 (100.00) | |
| | Sardinian (SAR) | 6 (79.63) | 4 (85.93) | | 6 (96.30) | 6 (100.00) | | 5 (82.66) | |
| | Spanish (SPA) | 4 (66.20) | 3 (90.48) | | 5 (78.17) | 4 (90.15) | | 4 (90.15) | |
| | Mean | 5.10 (80.20) | 3.60 (88.32) | 23.61 | 5.20 (91.02) | 4.70 (95.61) | 17.40 | 3.90 (87.98) | 26.69 |
| Central/South Asia | Balochi (BAL) | 4 (72.92) | 3 (88.62) | | 3 (81.25) | 3 (100.00) | | 3 (100.00) | |
| | Brahui (BRA) | 4 (76.00) | 2 (78.95) | | 4 (90.00) | 4 (100.00) | | 3 (91.11) | |
| | Burusho (BUR) | 3 (82.00) | 3 (100.00) | | 4 (92.00) | 4 (100.00) | | 3 (93.48) | |
| | Hazara (HAZ) | 4 (87.50) | 3 (88.11) | | 3 (91.67) | 3 (100.00) | | 2 (88.65) | |
| | Kalash (KAL) | 5 (90.00) | 3 (82.22) | | 4 (96.00) | 4 (100.00) | | 3 (91.67) | |
| | Makrani (MAK) | 3 (77.27) | 2 (91.20) | | 4 (93.18) | 4 (100.00) | | 2 (85.40) | |
| | Pathan (PAT) | 3 (83.33) | 3 (100.00) | | 3 (93.75) | 3 (100.00) | | 3 (100.00) | |
| | Sindhi (SIN) | 4 (82.00) | 3 (92.68) | | 5 (92.00) | 4 (93.48) | | 3 (86.96) | |
| | Mean | 3.75 (81.38) | 2.75 (90.22) | 36.71 | 3.75 (91.23) | 3.63 (99.18) | 18.10 | 2.75 (92.16) | 42.04 |
| East Asia | Cambodian (CAM) | 5 (90.91) | 4 (85.04) | | 4 (95.45) | 4 (100.00) | | 4 (100.00) | |
| | Han (HAN) | 4 (86.67) | 4 (100.00) | | 4 (98.89) | 4 (100.00) | | 4 (100.00) | |
| | Japanese (JAP) | 3 (78.57) | 3 (100.00) | | 3 (91.07) | 3 (100.00) | | 3 (100.00) | |
| | North China (NCH) | 4 (86.03) | 4 (100.00) | | 4 (97.79) | 4 (100.00) | | 4 (100.00) | |
| | South China (SCH) | 5 (77.86) | 4 (92.68) | | 4 (87.86) | 4 (100.00) | | 4 (100.00) | |
| | Yakut (YAK) | 3 (74.00) | 3 (100.00) | | 3 (94.00) | 3 (100.00) | | 3 (100.00) | |
| | Mean | 4.00 (82.34) | 3.67 (96.29) | 26.43 | 3.67 (94.18) | 3.67 (100.00) | 18.84 | 3.67 (100.00) | 31.39 |
| Oceania | Nan Melanesian (NAN) | 3 (83.33) | 3 (100.00) | | 3 (90.48) | 3 (100.00) | | 2 (86.85) | |
| | Papuan (PAP) | 5 (90.00) | 4 (92.56) | | 3 (93.33) | 3 (100.00) | | 2 (82.11) | |
| | Mean | 4.00 (86.67) | 3.50 (96.28) | 41.72 | 3.00 (91.90) | 3.00 (100.00) | 18.38 | 2.00 (84.48) | 77.64(e) |
| America | Colombian (COL) | 3 (84.62) | 3 (100.00) | | 3 (88.46) | 3 (100.00) | | 3 (100.00) | |
| | Karitiana (KAR) | 2 (95.65) | 2 (100.00) | | 2 (95.65) | 2 (100.00) | | 2 (100.00) | |


In a previous study of the CTLA4 region,⁶ five tag-SNPs were described for the region between SNPs −1765 and +6249 in UK control individuals. We have tested the original five tag-SNPs defined by Johnson et al⁶ (−1722, −1661, −658, −319 and +49) to assess their tag efficiency between populations. To this end, new haplotypes were estimated considering only the gene region reported in that previous study (ie from positions −1765 to 6249, encompassing nine SNPs). Table 3 shows the fraction of haplotypes detected using the five previously defined tag-SNPs. These tag-SNPs, defined in Europeans, detect ~87% of haplotypes among Europeans (population values between 70 and 96%). These values, as expected, are low for Africans and high for Amerindians but, surprisingly, are very high (~94%) for East Asian populations, with population values reaching 99%. These tag-SNPs can be considered portable among populations with the described limitations, that is, from populations needing more tag-SNPs to populations needing fewer tag-SNPs, but not vice versa.

When the present continental tag-SNPs are applied to the haplotypes estimated from positions −1765 to 6249, the number of detected haplotypes is similar to that provided by the previously defined tag-SNPs, but the efficiency of each tag-SNP (defined as the proportion of the whole variation each one explains) increases dramatically, and these values are similar to those found when the whole region is considered. Since, in Oceania and the Middle East, one of the present tag-SNPs lies outside the haplotypes estimated from −1765 to 6249, new tag-SNPs have been recalculated for this region only. In North-Africa, East Asia, Oceania and America, the tag-SNPs have not changed, while in sub-Saharan Africa, Europe and Central/South Asia one extra SNP is needed, and in the Middle East three SNPs have been added. This increase in the number of SNPs seen in some continental groups can be explained by the effect of adding some low-frequency haplotypes that shared the first nine SNPs.
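The portability check described above (applying a tag-SNP set to haplotypes from another population and asking how many common haplotypes remain distinguishable) might be sketched as follows. This is only an illustration under stated assumptions: the counting rule used by htSNP2 is not given in the text, so the sketch assumes that each distinct tag pattern identifies one haplotype (the most frequent one carrying it) and that the percentage is the share of common-haplotype frequency so identified. The function name, the toy haplotypes and their frequencies are hypothetical.

```python
from collections import defaultdict


def detected_haplotypes(hap_freqs, tag_positions, min_freq=0.05):
    """Return (number of detected haplotypes, % of common-haplotype frequency detected).

    hap_freqs: dict mapping a haplotype string (one character per SNP) to its frequency.
    tag_positions: indices of the tag-SNPs within each haplotype string.
    min_freq: haplotypes must exceed this frequency to count as common.
    """
    common = {hap: f for hap, f in hap_freqs.items() if f > min_freq}
    by_pattern = defaultdict(list)
    for hap, freq in common.items():
        pattern = tuple(hap[i] for i in tag_positions)
        by_pattern[pattern].append(freq)
    # Assumption: one haplotype is detected per distinct tag pattern,
    # namely the most frequent haplotype that carries the pattern.
    detected_freq = sum(max(freqs) for freqs in by_pattern.values())
    return len(by_pattern), 100.0 * detected_freq / sum(common.values())


# Hypothetical population: four SNPs, four common haplotypes and one rare one.
toy = {"0000": 0.40, "0011": 0.30, "1100": 0.15, "1111": 0.10, "0101": 0.05}
one_tag = detected_haplotypes(toy, [0])      # a single tag-SNP
two_tags = detected_haplotypes(toy, [0, 2])  # two tag-SNPs
```

With a single tag-SNP only two patterns exist, so at most two haplotypes can be told apart; adding a second, non-redundant tag-SNP resolves all four common haplotypes in this toy case.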

Discussion

Polymorphism patterns
The genetic diversity pattern found in the CTLA4 region in SNP and haplotype frequencies, FST and genetic structure, and LD patterns, agrees with the population history of the samples analyzed and the general pattern of diversity that has been shown in other global studies in different gene regions,¹⁹,²¹ and therefore can be explained by the ‘out-of-Africa’ origin of modern humans, with more genetic diversity, less LD and higher heterogeneity among populations in Africa than elsewhere. However, these patterns could have been affected by other processes such as ascertainment bias of the markers or selective pressures.

On the one hand, as the analyzed SNPs were described in European populations,⁶,⁷ a certain bias in their description could exist. Nevertheless, the present worldwide results do not show this bias as, with minor exceptions especially in Native Americans, all SNPs are universally polymorphic. In fact, SNP ascertainment bias would not be relevant for ascertaining common haplotypes, especially in genetic regions with high LD, like CTLA4.

On the other hand, in contrast to population processes that affect the whole genome, gene factors such as differential selection can shape the haplotype structure and LD in specific gene regions and might result in population differences. Since CTLA4 has a key role in the immune system and has been related to autoimmune diseases, exposure to geographically differential selective pressures that could have affected the CTLA4 gene structure, such as the presence of pathogens, could be envisaged. This has been the case for the selection for resistance to malaria detected in several genes such as G6PD,³⁷ Duffy³⁸ and TNFSF5.³⁹ Although differences among populations in allele frequencies have been found in the CTLA4 gene, none of the SNPs analyzed presented significantly higher FST values compared to a genomewide SNP distribution of FST.³⁵,³⁶ This fact stresses that the differences among populations are not unexpectedly large and points to a lack of local selective pressures across human populations on the CTLA4 gene, at least in recent times.

The analysis of the CTLA4 region has shown that there is a clear population structure in haplotype frequencies, but there are minor differences in the LD among populations. Thus, genomic processes might have affected all global populations in the same manner, giving similar LD patterns in the CTLA4 region, whereas the differences found, basically in haplotype and SNP frequencies, might be explained by demographic processes, such as expansions, founder effects and migrations.

Table 3 Continued

Columns, in order: Continental region; Population;
Present tag-SNPsᵃ: Kh common (%)ᶜ, Kdetected (%)ᵈ, Tag efficiency;
Johnson’s tag-SNPsᵇ: Kh common (%)ᶜ, Kdetected (%)ᵈ, Tag efficiency;
Present tag-SNPsᵇ: Kdetected (%)ᵈ, Tag efficiency

Maya (MAY)  3 (82.61)  3 (100.00)  3 (86.96)  3 (100.00)  3 (100.00)
Pima (PIM)  3 (89.13)  3 (100.00)  3 (91.30)  3 (100.00)  3 (100.00)
Surui (SUR)  2 (97.61)  2 (100.00)  2 (97.62)  2 (100.00)  2 (100.00)
Mean  2.60 (89.93)  2.60 (100.00)  44.97  2.60 (92.00)  2.60 (100.00)  18.40  2.60 (100.00)  46.00

ᵃ Haplotypes estimated using all 17 SNPs.
ᵇ Haplotypes estimated using SNPs from −1765 to +6249 (nine SNPs).
ᶜ In brackets, percentage of common haplotypes (frequency >5%).
ᵈ In brackets, percentage of the total common haplotypes (frequency >5%) detectable by using tag-SNPs.
ᵉ Only one tag-SNP has been used, since the second tag-SNP lies out of the analyzed region.

LD patterns
The present study has been performed on a region spanning 14 kb, which has been described as having a low recombination rate (0.3 cM/Mb).⁴⁰ This points to high LD over the region, which has already been reported by Ueda et al.⁷ Our results not only show this high LD but also that it persists across populations worldwide.

Previous studies in other genes, such as PAH⁴¹ or the PKLR-GBA gene region,⁴² have shown a general pattern of moderate geographical structure of LD, with higher values out of Africa and low values in sub-Saharan Africa. However, a number of deviations from this overall pattern have been described, ranging from extreme differences between African and non-African populations, such as those found in the CD4 locus,²⁰ DRD2⁴³ or DM,⁴⁴ among others, to similar LD values in African and non-African populations, such as in the CFTR gene.⁴⁵ The different geographical patterns observed in different loci can be interpreted as the distribution of a stochastic variable, in which CD4 and CFTR would be the opposite extremes. When comparing our results with these previous studies, LD patterns in CTLA4 show a similar pattern to that found in the CFTR region, that is, a lack of strong differences among human groups.

Until the present CTLA4 analysis, the only gene related to immunity studied in worldwide samples was CD4. However, their LD patterns do not coincide, as the CD4 locus showed a much higher LD geographical structure than CTLA4. This difference could be related to their function, with possible diversifying selection in CD4 but not in CTLA4, and it could also be due to stochastic variation. More immunity genes should be globally studied to assess whether an LD trend in immunity genes exists.

Tag-SNPs
It would be expected that tag-SNPs would be particularly efficient in the CTLA4 region, given its high LD. A previous study (Gonzalez-Neira et al⁴⁶) has shown that tag-SNPs are portable among populations even across regional groups, which means that most of the haplotype diversity in different populations can be scored by a unique set of tag-SNPs. However, those results were obtained in a gene-free, low-LD region and may not be extended to other regions with different properties.

The present analysis provides some tag-SNP sets for different continental regions that score a large and similar amount of the total haplotype variation. Moreover, these tag-SNP sets appear to be portable between populations within continental regions, since no significant differences are found in the amount of haplotype variation detected. In contrast, the portability of these tag-SNP sets between continental regions depends on the number of tag-SNPs that form a continental set. For instance, the sub-Saharan tag-SNP set, formed by four SNPs, would score most of the variation in the rest of the continents, but the American set applied to other continents would yield poorer results. Therefore, the larger the number of tag-SNPs in a set, the more portable it is to other geographical regions. Nonetheless, the tag efficiency (defined as the amount of variation detected by each tag-SNP) would be dramatically reduced in some continental areas.
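As a worked illustration of the tag efficiency defined above, the following sketch computes the percentage of all haplotypes scored per tag-SNP. The formula (percentage of common haplotypes, times the fraction of those detected, divided by the number of tag-SNPs) is an assumption inferred here because it reproduces the Mean rows of Table 3; it is not stated explicitly in the text, and the function name is hypothetical. The numbers of tag-SNPs used in the examples come from the text: four for the sub-Saharan set, five for Johnson's set, and one for the Middle East in the nine-SNP region (Table 3, footnote e).

```python
def tag_efficiency(pct_common, pct_detected, n_tag_snps):
    """Percentage of total haplotype variation scored per tag-SNP.

    pct_common: percentage of haplotypes that are common (Kh common, in brackets).
    pct_detected: percentage of those common haplotypes detected (Kdetected, in brackets).
    n_tag_snps: number of tag-SNPs in the set.
    """
    return pct_common * (pct_detected / 100.0) / n_tag_snps


# Sub-Saharan Africa, present tag-SNPs on all 17 SNPs (four tag-SNPs).
africa_present = tag_efficiency(73.93, 78.98, 4)

# Sub-Saharan Africa, Johnson's five tag-SNPs on the nine-SNP region.
africa_johnson = tag_efficiency(93.71, 76.71, 5)

# Middle East, present tag-SNPs on the nine-SNP region (one tag-SNP).
middle_east_present = tag_efficiency(87.52, 58.63, 1)

for label, value in [("Africa, present", africa_present),
                     ("Africa, Johnson", africa_johnson),
                     ("Middle East, present", middle_east_present)]:
    print(f"{label}: {value:.2f}")
```

Under this reading, a larger set can detect the same share of haplotypes yet score a lower efficiency, which is the trade-off between portability and efficiency discussed above.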

Owing to the high-throughput technologies available, a reduction from 17 to two-to-four SNPs to be typed may seem unnecessary. However, at a whole-genome scale, or with a candidate-gene approach, if such levels of reduction were achieved in each of the hundreds or thousands of genes to be typed, the saving in costs and time would be more than justified, and would certainly allow the number of genes typed to be increased.

Impact on association studies
Up to now, association studies in CTLA4 have led to contradictory results in different populations. Our results show a global FST of ~10%, that is, populations worldwide are not more different from each other for the CTLA4 gene than the average calculated in previous studies of neutral markers.³²⁻³⁴ However, significant differences in SNP frequencies and haplotype composition exist not only between geographical groups but also, in some cases, within groups. This could explain the contradictions found in the literature, especially considering that association studies work with very large sample sizes, which would render significant small differences that are not reflected here. These results, then, are a clear indication that the design of new case–control studies should take into account the heterogeneity at both the inter- and intragroup levels, and they point to the need for a very well matched control population to compare with patients: different ethnic or geographic extraction could easily jeopardize the differences found between both groups.

The knowledge of the haplotype structure and LD patterns in specific regions, such as the CTLA4 gene, in worldwide populations will shed light not only on the population history of the populations analyzed but also on the genomic processes that could be pivotal for biomedical interests.¹⁸

Materials and methods

Samples
A total of 1262 individuals from 44 human populations were analyzed for the CTLA4 gene region. In all, 38 populations are those from the HGDP-CEPH Human Genome Diversity Cell Line Panel,⁴⁷ which contains lymphoblastoid cell lines from 1051 individuals in 51 populations located in all major geographic regions of the world. The rest of the populations were chosen to improve the coverage in some geographic areas (Saharawi, Tanzanian, Moroccan, Catalan, Basque and Spanish). All samples were obtained with appropriate informed consent. Sample sizes vary from seven (San) to 71 (Spanish) individuals, with most populations having around 25 individuals (Table 1). Within the HGDP-CEPH panel, some populations represented by a reduced number of individuals were grouped and analyzed together: Continental Italy (North Italian, Tuscan), North China (Daur, Hezhen, Mongola, Oroqen, Tu, Uygur, Xibo) and South China (Dai, Lahu, Miaozu, Naxi, She, Tujia, Yizu). To elucidate the ancestral state of the SNPs analyzed, DNAs from one chimpanzee (Pan troglodytes), one gorilla (Gorilla gorilla) and one orangutan (Pongo pygmaeus) were used.

SNP typing
A total of 17 SNPs were typed, seven of them at the 5′-end and first exon of the gene⁶ and 10 at the 3′-end.⁷ Setting the first nucleotide at the Met initiator codon as 1, the nucleotide positions of the SNPs typed were: −1765, −1722, −1661, −1577, −658, −319, 49, 6230, 6249, 7092, 7482, 7982, 8173, 10 242, 10 717, 12 131 and 12 310. This numbering refers to the cds as in GenBank entry AF411058. All of these SNPs had been submitted to dbSNP and their rs IDs are given in Table 2. SNPs mostly encompass the 5′ and 3′ regions because these variants are those relevant for disease-association studies and the whole coding region lies within a clear and very strong LD block,⁶,⁷ a fact confirmed in the present study.

PCR amplification
The 5′-end region containing SNPs from −1765 to +49 was amplified using conditions described previously.¹⁷ The 3′ region containing the rest of the SNPs was amplified in two fragments in a multiplex reaction with the following cycling conditions: 94 °C for 5 min; 35 cycles of 94 °C for 30 s, 63 °C for 30 s and 72 °C for 45 s; and a final elongation step of 72 °C for 5 s. PCR products were purified using EXO-SAP (5 units of EXO and 25 units of SAP per reaction), with an incubation of 60 min at 37 °C followed by 15 min at 72 °C to inactivate the enzyme.

SNaPshot reaction
All SNPs were typed using the SNaPshot™ Multiplex technique (Applied Biosystems), a single-base primer extension method that uses labelled ddNTPs to interrogate SNPs. The single-base primer extension was performed following the supplier’s recommendations using different primer lengths. Two different SNaPshot reactions were performed, one for the 5′-end and first exon SNPs and the other for the 3′-end SNPs. Unincorporated labelled ddNTPs were removed by adding one unit of CIP to the primer extension products for 60 min at 37 °C followed by 15 min at 72 °C to inactivate the enzyme. Products were analyzed on an ABI PRISM 3100 with ABI PRISM GeneScan Analysis Software v3.7 (Applied Biosystems). LIZ-120 (Applied Biosystems) was used as size marker.

In order to check the accuracy of the SNaPshot technique, SNP 6230 was also genotyped for the worldwide diversity panel with TaqMan’s Assays-by-Design℠ service (Applied Biosystems), which consisted of a mix of unlabelled PCR primers and TaqMan® MGB probes (FAM™ and VIC® dye-labelled). We performed the assay using TaqMan Universal PCR Master Mix following the supplier’s recommendations. Results were analyzed using the SDS software package version 2.1 (Applied Biosystems). There were minor discrepancies between TaqMan and SNaPshot genotypes (three samples of the panel were scored as heterozygotes by the SNaPshot technique and as homozygotes by TaqMan), which implies that the SNaPshot genotype assignment is 99.7% concordant with the TaqMan assays.

Ancestral-type inference
All SNPs were typed in three primate individuals, one chimpanzee (P. troglodytes), one gorilla (G. gorilla) and one orangutan (P. pygmaeus), as described above. Since some of the SNPs were difficult to type in the primates using the SNaPshot technology, we sequenced the fragment containing the seven SNPs located at the 5′-end of the gene and the SNP 10 717 in the three primates. Sequencing was carried out using the following cycling conditions: 3 min at 94 °C and 25 cycles of 96 °C for 10 s, 50 °C for 5 s and 60 °C for 4 min; products were precipitated using the BigDye® 3.0 protocol (Applied Biosystems). Chimpanzee results have also been checked using the chimpanzee genome in Ensembl (http://www.ensembl.org/) and the ancestral positions have been determined using the approach in Iyengar et al.⁴⁸

Data analysis
Haplotype frequencies were estimated using a Bayesian algorithm as implemented in the PHASE package version 2.0.⁴⁹ Populations with sample sizes of less than 20 chromosomes (namely, the San) were dropped from the haplotype analyses given the high uncertainty associated with the haplotype estimation.

FST gives a measure of the proportion of the genetic variance explained by differences among populations. Incorporating the molecular distance between alleles into FST, ΦST is obtained. Hardy–Weinberg equilibrium, FST and ΦST values, and AMOVA were calculated using Arlequin software.⁵⁰ LD measures D′ and r² were calculated using Haploview.
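For readers unfamiliar with the two LD measures, the standard definitions of D′ and r² for a pair of biallelic SNPs can be sketched as below. This is a textbook calculation from known haplotype and allele frequencies, not Haploview's implementation (which must first estimate two-locus haplotype frequencies from genotype data); the function and argument names are hypothetical.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D', r^2) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2.
    p_a, p_b: frequencies of allele A and allele B.
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d_prime, r2


# Two SNPs whose alleles always travel together: D' = 1 and r^2 = 1.
perfect = ld_measures(p_ab=0.5, p_a=0.5, p_b=0.5)

# Only three of the four possible haplotypes exist but allele frequencies
# differ: D' is still 1, while r^2 is much lower.
complete_not_perfect = ld_measures(p_ab=0.2, p_a=0.6, p_b=0.2)
```

The second example shows why the two measures can disagree: D′ reaches 1 whenever one of the four haplotypes is absent, whereas r² reaches 1 only when the two SNPs carry the same information, which is the property that matters for tagging.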

Friedman and Wilcoxon nonparametric tests were performed using Statistica. Mantel tests to compare r² matrices were computed using Passage version 1.0. Comparisons were made between populations within each continental group. Subsequently, r² values were recalculated for populations pooled into geographical groups and Mantel tests were applied.
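A Mantel test of the kind mentioned above can be sketched with a simple permutation scheme: correlate the off-diagonal entries of two SNP-by-SNP r² matrices, then permute the SNP labels of one matrix to build a null distribution. This is a generic illustration, not the procedure implemented in Passage; the number of permutations, the seed, the one-sided p-value and the toy matrices are all assumptions.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mantel(m1, m2, n_perm=999, seed=0):
    """Return (observed correlation, one-sided permutation p-value)."""
    n = len(m1)
    pairs = list(combinations(range(n), 2))  # upper triangle only
    x = [m1[i][j] for i, j in pairs]

    def corr(order):
        # Relabel rows and columns of m2 jointly, as the Mantel test requires.
        return pearson(x, [m2[order[i]][order[j]] for i, j in pairs])

    order = list(range(n))
    observed = corr(order)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        if corr(order) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical r^2 matrices for four SNPs in two populations.
pop1 = [[0.0, 0.9, 0.5, 0.1],
        [0.9, 0.0, 0.6, 0.2],
        [0.5, 0.6, 0.0, 0.8],
        [0.1, 0.2, 0.8, 0.0]]
pop2 = [[0.0, 0.8, 0.4, 0.2],
        [0.8, 0.0, 0.5, 0.1],
        [0.4, 0.5, 0.0, 0.7],
        [0.2, 0.1, 0.7, 0.0]]
r_same, p_same = mantel(pop1, pop1)
r_obs, p_obs = mantel(pop1, pop2)
```

Rows and columns are permuted together because the entries of a pairwise matrix are not independent observations; shuffling individual cells would overstate significance.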

We computed the FNF statistic,⁴⁵ which can be interpreted as the fraction of haplotypes not found in a population, where the number of haplotypes expected under linkage equilibrium, given the sample size and the allele frequencies, is compared with the number of observed haplotypes in each population. It can be calculated as FNF = 1 − Kh/Kmax, where Kh is the number of haplotypes found in the sample and Kmax is the maximum possible number of different haplotypes expected under total linkage equilibrium given the size and allele frequencies of the population. FNF values are independent of the number of loci and are expected to increase with LD.
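The FNF calculation can be illustrated with a small sketch. How Kmax was obtained is not detailed in the text, so the version below makes an explicit assumption: Kmax is approximated by shuffling the alleles at each SNP independently across chromosomes (which keeps sample size and allele frequencies but removes LD) and averaging the number of distinct haplotypes over replicates. The function name, replicate count and seed are hypothetical.

```python
import random


def fnf(haplotypes, n_rep=200, seed=0):
    """Return FNF = 1 - Kh / Kmax for a list of haplotype strings.

    Assumption: Kmax is estimated by permutation, shuffling each SNP column
    independently to mimic total linkage equilibrium.
    """
    rng = random.Random(seed)
    k_h = len(set(haplotypes))                      # observed distinct haplotypes
    columns = [list(col) for col in zip(*haplotypes)]
    total = 0
    for _ in range(n_rep):
        shuffled = []
        for col in columns:
            permuted = col[:]
            rng.shuffle(permuted)                   # break association between SNPs
            shuffled.append(permuted)
        total += len(set(zip(*shuffled)))           # distinct haplotypes without LD
    k_max = total / n_rep
    return 1.0 - k_h / k_max


# Complete LD: only two haplotypes among 20 chromosomes typed at four SNPs.
high_ld = fnf(["0000"] * 10 + ["1111"] * 10)

# No variation at all: one haplotype observed and one expected, so FNF is 0.
monomorphic = fnf(["00"] * 10)
```

In the first example most of the haplotypes that random association would produce are missing from the sample, so FNF is close to 1, in line with the statement that FNF increases with LD.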

New tag-SNPs were described using both htSNP2⁶ (http://wwwgene.cimr.cam.ac.uk/clayton/software/stata/) and BEST.⁵¹ These two programs identify the minimum set of SNPs accounting for a given minimum fraction of the genomic variation. One method to define htSNPs, implemented in htSNP2, seeks to find the minimum set of SNPs so that the proportion of the total variance explained (as measured by the multiple regression R² statistic) is above a certain threshold (0.8 in our case). The second algorithm used, implemented in BEST, takes as input a set S of haplotypes. The algorithm returns a minimal set of SNPs from which all of the other SNPs in the haplotype set can be derived. The tag-SNPs obtained using htSNP2 are concordant with the ones obtained with BEST (or, in the cases where BEST provides more than one set, coincident with one of them). Haplotype tag-SNP portability was tested using htSNP2.
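The idea of a minimal set of SNPs from which all other SNPs can be derived can be shown with a brute-force sketch: a subset of SNPs qualifies when every distinct haplotype shows a distinct pattern at those SNPs, since the pattern then identifies the haplotype and hence every remaining allele. This is not the BEST algorithm itself, only an exhaustive search that is feasible for a handful of SNPs; the function name and the toy haplotypes are hypothetical.

```python
from itertools import combinations


def minimal_tag_sets(haplotypes):
    """Return all smallest SNP index subsets that distinguish every haplotype."""
    distinct = set(haplotypes)
    n_snps = len(haplotypes[0])
    for size in range(0, n_snps + 1):
        found = []
        for subset in combinations(range(n_snps), size):
            patterns = {tuple(hap[i] for i in subset) for hap in distinct}
            # One pattern per haplotype means the untyped SNPs are fully determined.
            if len(patterns) == len(distinct):
                found.append(subset)
        if found:
            return found
    return []


# Toy block: SNPs 0 and 1 are redundant with each other, as are SNPs 2 and 3.
tag_sets = minimal_tag_sets(["0000", "0011", "1100", "1111"])
print(tag_sets)  # [(0, 2), (0, 3), (1, 2), (1, 3)]
```

Several equally small sets are returned here, which mirrors the situation noted above where more than one minimal set exists and the htSNP2 result coincides with one of them.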

Acknowledgements

We thank Elena Bosch, Arcadi Navarro, Michelle Gardner, Lourdes Sampietro and Monica Valles (Universitat Pompeu Fabra) for helpful advice and technical support. Mark Shriver (Pennsylvania State University) and Kenneth K Kidd (Yale University) kindly provided the raw data for the FST comparison. We thank Howard Cann (CEPH, Paris) for providing the HGDP-CEPH panel. We also thank Anna Perez-Lezaun, Roger Anglada and Stephanie Plaza (Servei de Genòmica, Universitat Pompeu Fabra) for technical support. This research was supported by the Ministerio de Educación y Ciencia of the Spanish Government (BFU2004-04208/BMC) and the Departament d’Universitats, Recerca i Societat de la Informació (DURSI) of the Generalitat de Catalunya.

References

1 Walunas TL, Lenschow DJ, Bakker CY et al. CTLA-4 can function as a negative regulator of T cell activation. Immunity 1994; 1: 405–413.

2 Walunas TL, Bakker CY, Bluestone JA. CTLA-4 ligation blocks CD28-dependent T cell activation. J Exp Med 1996; 183: 2541–2550.

3 Khattri R, Auger JA, Griffin MD, Sharpe AH, Bluestone JA. Lymphoproliferative disorder in CTLA-4 knockout mice is characterized by CD28-regulated activation of Th2 responses. J Immunol 1999; 162: 5784–5791.

4 Harper K, Balzano C, Rouvier E, Mattei MG, Luciani MF, Golstein P. CTLA-4 and CD28 activated lymphocyte molecules are closely related in both mouse and human as to sequence, message expression, gene structure, and chromosomal location. J Immunol 1991; 147: 1037–1044.

5 Martin AM, Athanasiadis G, Greshock JD et al. Population frequencies of single nucleotide polymorphisms (SNPs) in immuno-modulatory genes. Hum Hered 2003; 55: 171–178.

6 Johnson GC, Esposito L, Barratt BJ et al. Haplotype tagging for the identification of common disease genes. Nat Genet 2001; 29: 233–237.

7 Ueda H, Howson JM, Esposito L et al. Association of the T-cell regulatory gene CTLA4 with susceptibility to autoimmune disease. Nature 2003; 423: 506–511.

8 Hirschhorn JN, Lohmueller K, Byrne E, Hirschhorn K. A comprehensive review of genetic association studies. Genet Med 2002; 4: 45–61.

9 Ide A, Kawasaki E, Abiru N et al. Association between IL-18 gene promoter polymorphisms and CTLA-4 gene 49A/G polymorphism in Japanese patients with type 1 diabetes. J Autoimmun 2004; 22 (1): 73–78.

10 Lee CS, Lee YJ, Liu HF et al. Association of CTLA4 gene A–G polymorphism with rheumatoid arthritis in Chinese. Clin Rheumatol 2003; 22: 221–224.

11 Teutsch SM, Booth DR, Bennetts BH, Heard RN, Stewart GJ. Association of common T cell activation gene polymorphisms with multiple sclerosis in Australian patients. J Neuroimmunol 2004; 148: 218–230.

12 Vaidya B, Oakes EJ, Imrie H et al. CTLA4 gene and Graves’ disease: association of Graves’ disease with the CTLA4 exon 1 and intron 1 polymorphisms, but not with the promoter polymorphism. Clin Endocrinol (Oxford) 2003; 58: 732–735.

13 van Oosterhout AJ, Deurloo DT, Groot PC. Cytotoxic T lymphocyte antigen 4 polymorphisms and allergic asthma. Clin Exp Allergy 2004; 34: 4–8.

14 Hudson LL, Silver RM, Pandey JP. Ethnic differences in cytotoxic T lymphocyte associated antigen 4 genotype associations with systemic sclerosis. J Rheumatol 2004; 31: 85–87.

15 Hudson LL, Rocca K, Song YW, Pandey JP. CTLA-4 gene polymorphisms in systemic lupus erythematosus: a highly significant association with a determinant in the promoter region. Hum Genet 2002; 111: 452–455.

16 Fernandez-Blanco L, Perez-Pampin E, Gomez-Reino JJ, Gonzalez A. A CTLA-4 polymorphism associated with susceptibility to systemic lupus erythematosus. Arthritis Rheum 2004; 50: 328–329.

17 Bouqbis L, Izaabel H, Akhayat O et al. Association of the CTLA4 promoter region (−1661G allele) with type 1 diabetes in the South Moroccan population. Genes Immun 2003; 4: 132–137.

18 Cardon LR, Bell JI. Association study designs for complex diseases. Nat Rev Genet 2001; 2: 91–99.

19 Bertranpetit J, Calafell F, Comas D, Gonzalez-Neira A, Navarro A. Structure of linkage disequilibrium in humans: genome factors and population stratification. Cold Spring Harb Symp Quant Biol 2003; 68: 79–88.

20 Tishkoff SA, Dietzsch E, Speed W et al. Global patterns of linkage disequilibrium at the CD4 locus and modern human origins. Science 1996; 271: 1380–1387.

21 Sawyer SL, Mukherjee N, Pakstis AJ et al. Linkage disequilibrium patterns vary substantially among populations. Eur J Hum Genet 2005; 13: 677–686.

22 Gonzalez-Neira A, Calafell F, Navarro A et al. Geographic stratification of linkage disequilibrium: a worldwide population study in a region of chromosome 22. Hum Genomics 2004; 1: 399–409.

23 Daly MJ, Rioux JD, Schaffner SF, Hudson TJ, Lander ES. High-resolution haplotype structure in the human genome. Nat Genet 2001; 29: 229–232.

24 Wall JD, Pritchard JK. Haplotype blocks and linkage disequilibrium in the human genome. Nat Rev Genet 2003; 4: 587–597.

25 Gabriel SB, Schaffner SF, Nguyen H et al. The structure of haplotype blocks in the human genome. Science 2002; 296: 2225–2229.

26 Clark AG, Weiss KM, Nickerson DA et al. Haplotype structure and population genetic inferences from nucleotide-sequence variation in human lipoprotein lipase. Am J Hum Genet 1998; 63: 595–612.

27 Wang N, Akey JM, Zhang K, Chakraborty R, Jin L. Distribution of recombination crossovers and the origin of haplotype blocks: the interplay of population history, recombination, and mutation. Am J Hum Genet 2002; 71: 1227–1234.

28 Weale ME, Depondt C, Macdonald SJ et al. Selection and evaluation of tagging SNPs in the neuronal-sodium-channel gene SCN1A: implications for linkage-disequilibrium gene mapping. Am J Hum Genet 2003; 73: 551–565.

29 Mueller JC, Lohmussaar E, Magi R et al. Linkage disequilibrium patterns and tagSNP transferability among European populations. Am J Hum Genet 2005; 76: 387–398.

30 Crawford DC, Carlson CS, Rieder MJ et al. Haplotype diversity across 100 candidate genes for inflammation, lipid metabolism, and blood pressure regulation in two populations. Am J Hum Genet 2004; 74: 610–622.

31 Chen FC, Li WH. Genomic divergences between humans and other hominoids and the effective population size of the common ancestor of humans and chimpanzees. Am J Hum Genet 2001; 68: 444–456.

32 Barbujani G, Magagni A, Minch E, Cavalli-Sforza LL. An apportionment of human DNA diversity. Proc Natl Acad Sci USA 1997; 94: 4516–4519.


33 Rosenberg NA, Pritchard JK, Weber JL et al. Genetic structure of human populations. Science 2002; 298: 2381–2385.

34 Excoffier L, Hamilton G. Comment on ‘Genetic structure of human populations’. Science 2003; 300: 1877; author reply 1877.

35 Akey JM, Zhang G, Zhang K, Jin L, Shriver MD. Interrogating a high-density SNP map for signatures of natural selection. Genome Res 2002; 12: 1805–1814.

36 Kidd KK, Pakstis AJ, Speed WC, Kidd JR. Understanding human DNA sequence variation. J Hered 2004; 95: 406–420.

37 Tishkoff SA, Varkonyi R, Cahinhinan N et al. Haplotype diversity and linkage disequilibrium at human G6PD: recent origin of alleles that confer malarial resistance. Science 2001; 293: 455–462.

38 Hamblin MT, Thompson EE, Di Rienzo A. Complex signatures of natural selection at the Duffy blood group locus. Am J Hum Genet 2002; 70: 369–383.

39 Sabeti PC, Reich DE, Higgins JM et al. Detecting recent positive selection in the human genome from haplotype structure. Nature 2002; 419: 832–837.

40 Kong A, Gudbjartsson DF, Sainz J et al. A high-resolution recombination map of the human genome. Nat Genet 2002; 31: 241–247.

41 Kidd JR, Pakstis AJ, Zhao H et al. Haplotypes and linkage disequilibrium at the phenylalanine hydroxylase locus, PAH, in a global representation of populations. Am J Hum Genet 2000; 66: 1882–1899.

42 Mateu E, Perez-Lezaun A, Martinez-Arias R et al. PKLR-GBA region shows almost complete linkage disequilibrium over 70 kb in a set of worldwide populations. Hum Genet 2002; 110: 532–544.

43 Kidd KK, Morar B, Castiglione CM et al. A global survey of haplotype frequencies and linkage disequilibrium at the DRD2 locus. Hum Genet 1998; 103: 211–227.

44 Tishkoff SA, Goldman A, Calafell F et al. A global haplotype analysis of the myotonic dystrophy locus: implications for the evolution of modern humans and for the origin of myotonic dystrophy mutations. Am J Hum Genet 1998; 62: 1389–1402.

45 Mateu E, Calafell F, Lao O et al. Worldwide genetic analysis of the CFTR region. Am J Hum Genet 2001; 68: 103–117.

46 Gonzalez-Neira A, Ke X, Lao O et al. The portability of tagSNPs across populations. A worldwide survey (submitted).

47 Cann HM, de Toma C, Cazes L et al. A human genome diversity cell line panel. Science 2002; 296: 261–262.

48 Iyengar S, Seaman M, Deinard AS et al. Analyses of cross species polymerase chain reaction products to infer the ancestral state of human polymorphisms. DNA Seq 1998; 8: 317–327.

49 Stephens M, Donnelly P. A comparison of Bayesian methods for haplotype reconstruction from population genotype data. Am J Hum Genet 2003; 73: 1162–1169.

50 Schneider S, Roessli D, Excoffier L. Arlequin Ver. 2.0: A software for Population Genetic Data Analysis, 2nd edn., University of Geneva, Geneva, 2000.

51 Sebastiani P, Lazarus R, Weiss ST, Kunkel LM, Kohane IS, Ramoni MF. Minimal haplotype tagging. Proc Natl Acad Sci USA 2003; 100: 9900–9905.


