Post on 03-Feb-2022
Technology Transition Workshop
Genetics of SNP Markers
Technology Transition Workshop | Kenneth K. Kidd
Why SNPs?
• Plentiful – millions exist in the human genome
• Genetically simple – di‐allelic and co‐dominant
• Very low mutation rates – genetically stable
• Robust to DNA damage – small amplicons
• Multiple typing methods – easy to type
• Typing automatable – fast results
• Genotype calling automatable – interpretation easy, qualitative calls
Forensic SNP Analysis Genetics of SNP Markers 2
Requirements for DNA Markers in Forensics
• The genetic nature of the polymorphism must be well understood [OK for SNPs]
• The molecular methodology for testing the marker must be reliable [OK for SNPs]
• The markers should be usable in mixtures [NOT ± OK for SNPs]
• The statistical methods for evaluating the data must be sound [OK for SNPs]
• The data for use in the statistics must be sufficient [NOT OK for SNPs yet]
Problems with SNPs in Forensics
1. Few SNPs have extensive population database support
2. No SNPs have forensic databases accumulated (i.e., offender and crime scene SNP data)
3. SNPs are problematic with sample mixtures
4. Few forensic labs have experience with SNPs
5. No agreed upon common set of SNPs to consider
6. SNPs vary widely in their population genetic characteristics
7. Different SNPs are needed for different purposes
Progress on Problems with SNPs in Forensics
1. Few SNPs have extensive population database support
• Increasing numbers of SNPs are being tested on multiple populations
• Though sample sizes per population are often less than usually considered adequate, there are multiple populations from each geographic area in the accumulating body of knowledge
• Our studies include work to collect more SNP data on many population samples
• We are also attempting to accumulate in one place the allele frequency data being collected and published by many research groups
• We are using the NSF‐supported database ALFRED: http://alfred.med.yale.edu
Progress on Problems with SNPs in Forensics
2. No SNPs have forensic databases accumulated (i.e., offender and crime scene SNP data)
• If SNP panels can be agreed upon, parallel processing can be done relatively cheaply
• In many cases, SNPs would be sufficient for a local crime with a clear suspect
3. SNPs are problematic with sample mixtures
• Procedures and statistics are being developed to use SNPs to detect mixtures, but I am not sure what the number of SNPs needed will be or how technically robust the methods will be
Progress on Problems with SNPs in Forensics
4. Few forensic labs have experience with SNPs
• I may be wrong in this statement, but the existence of this workshop supports the belief
5. No agreed upon common set of SNPs to consider
• The European SNPforID Consortium has two SNP panels now recognized and used by forensic labs in several countries
• My research is attempting to find better panels, to be discussed later in this talk
Progress on Problems with SNPs in Forensics
6. SNPs vary widely in their population genetic characteristics
7. Different SNPs are needed for different purposes
The following slides illustrate how allele frequencies among populations can vary greatly among SNPs.
That variation, or lack thereof, can be used for different forensic purposes.
Allele Frequencies at Some SNPs Vary Greatly Among Populations
[Figure: allele frequency (0.0 to 1.0) by population, with populations grouped by region: AFRICA, SWASIA, EUROPE, SBR, PAC, EASIA, AMERICA]
Types of SNP Panels
• Individual Identification SNPs (IISNPs):
− SNPs that collectively give very low probabilities of two individuals having the same multisite genotype
• Ancestry Informative SNPs (AISNPs):
− SNPs that collectively give a high probability of an individual’s ancestry being from one part of the world or being derived from two or more areas of the world
• Lineage Informative SNPs (LISNPs):
− Sets of tightly linked SNPs that function as multiallelic markers that can serve to identify relatives with higher probabilities than simple di‐allelic SNPs
• Phenotype Informative SNPs (PISNPs):
− SNPs that provide high probability that the individual has particular phenotypes, such as a particular skin color, hair color, eye color, etc.
Reproduced in Butler et al., 2008
Requirements for IISNPs
• Individual Identification SNPs (IISNPs):
− SNPs that collectively give very low probabilities of two individuals having the same multisite genotype
• We have added the criterion that ethnicity should not be an issue in determining the match probability
• Therefore, we are requiring all such SNPs to be close to maximally informative all around the world
• That translates in practice to SNPs that have a global average heterozygosity > 0.4 and a global Fst < 0.06
• The work we have done is covered in detail in the following slides
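The heterozygosity and Fst thresholds above can be checked per SNP from its allele frequencies in each population. A minimal sketch in Python, assuming expected heterozygosity 2p(1 - p) under Hardy-Weinberg proportions and Wright's unweighted Fst, Var(p) / (mean(p) * (1 - mean(p))). The slides do not say which Fst estimator was used, and the function names and example frequencies here are illustrative, not taken from the study.

```python
from statistics import mean, pvariance


def average_heterozygosity(freqs):
    """Mean expected heterozygosity 2p(1-p) over populations (assumes HWE)."""
    return mean(2 * p * (1 - p) for p in freqs)


def wright_fst(freqs):
    """Unweighted Wright's Fst for a di-allelic SNP: Var(p) / (p_bar * (1 - p_bar)).

    Assumption: the source does not state its estimator; this is one common form.
    """
    p_bar = mean(freqs)
    if p_bar in (0.0, 1.0):
        return 0.0  # monomorphic everywhere: no variation to partition
    return pvariance(freqs) / (p_bar * (1 - p_bar))


def meets_iisnp_criteria(freqs, min_het=0.4, max_fst=0.06):
    """Apply the two thresholds quoted on the slide."""
    return average_heterozygosity(freqs) > min_het and wright_fst(freqs) < max_fst


# Hypothetical allele frequencies in four populations
flat_snp = [0.45, 0.50, 0.55, 0.50]      # similar everywhere
variable_snp = [0.05, 0.95, 0.10, 0.90]  # differs sharply among populations

print(meets_iisnp_criteria(flat_snp))      # True
print(meets_iisnp_criteria(variable_snp))  # False
```

A SNP like `variable_snp` fails as an IISNP but is the kind of marker that would be examined as a candidate AISNP.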
Procedures for Identifying IISNPs
• Identify likely candidate polymorphisms
• Screen on a few populations
• Retain the “best”
• Test on many populations
• Retain the “best”
− Reliability of typing
− Hardy‐Weinberg criterion
• Test for LD and linkage
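The Hardy-Weinberg criterion in the list above is commonly checked with a chi-square goodness-of-fit test on genotype counts in each population. The slides do not say which test was applied, so the following is a sketch under that assumption; the function name and counts are hypothetical.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic comparing observed genotype counts with
    Hardy-Weinberg expectations for a di-allelic SNP (1 degree of freedom)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# 3.841 is the 5% critical value of chi-square with 1 degree of freedom
CRITICAL_5_PERCENT = 3.841

for counts in [(25, 50, 25), (40, 20, 40)]:  # hypothetical genotype counts
    stat = hwe_chi_square(*counts)
    verdict = "deviates from HWE" if stat > CRITICAL_5_PERCENT else "consistent with HWE"
    print(counts, round(stat, 3), verdict)
```

With many SNPs tested in many populations, some nominally significant results are expected by chance, so a single deviation would not by itself justify dropping a marker.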
Population Basis of Final IISNP Panel
Region               Number of Individuals Sampled   Number of Populations Sampled
Africa               503                             10
South West Asia      273                             4
Europe               567                             9
Siberia              148                             3
South Central Asia   30                              1
East Asia            481                             9
Pacific Islands      60                              2
Americas             296                             6
Data on 2358 individuals from 44 populations
Examples of candidate IISNPs: high heterozygosity and little allele frequency variation among populations
[Figure: Low Fst SNPs. Allele frequency (0.0 to 1.0) by population for two SNPs, Fst(44) = .0217 and Fst(44) = .0596; populations grouped by region: AFRICA, SWASIA, EUROPE, SBR, PAC, EASIA, AMERICA]
If you think those show high variation, consider examples of candidate AISNPs
[Figure: High Fst SNPs. Allele frequency (0.0 to 1.0) by population for ADH1B (Fst = .47), DARC (Fst = .90), and SLC45A2 (Fst = .74); populations grouped by region: AFRICA, SWASIA, EUROPE, SBR, PAC, EASIA, AMERICA]
Summary of Screening for IISNPs
• We screened several data sets with allele frequencies for multiple SNPs on 4 to 50 diverse populations
• We typed > 500 of these on our 44 populations
• 92 SNPs (~20% of those screened) met criteria of average heterozygosity > 0.4 and Fst < 0.06
• 86 of those 92 showed no significant pairwise LD in the (86 x 85) / 2 = 3,655 tests
• 45 of those 86 have no or very loose linkage: our final IISNP panel
Fst reference distribution based on 40 populations
[Figure: histogram of number of SNPs (0 to 180) by upper bound of Fst interval (0.03 increments, from 0.03 to 0.60). N markers: 813; mean: .139; median: .125; std dev: .070; minimum: .020; maximum: .534]
Fst and Average Heterozygosity for the 45‐SNP Panel
[Figure: the 45 "unlinked" of 92 IISNPs, ranked left to right by Fst(44p). One axis shows Fst (44 pops), 0.00 to 0.15; the other shows average heterozygosity (44 pops), 0.25 to 0.49. Series: Fst(44p) and Avg het(44p)]
Distribution of All Pairwise LD Values (r2)
[Figure: histogram of pairwise LD values (r2): percentage of values (0% to 45%) by r2 bin (0.01 to 0.29)]
Total pairwise LD values = 43,560 (45 SNPs in 44 populations)
95.14% of the LD values are < 0.11
99.90% of the LD values are < 0.30
The nominally significant LD values are distributed randomly among loci and among the smallest, most isolated populations.
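The counts quoted for the LD screening follow from simple combinatorics: 86 SNPs give 86 x 85 / 2 = 3,655 pairs, and 45 SNPs give 990 pairs, which over 44 populations is 43,560 values. The sketch below reproduces those counts and shows the usual definition of r2 for two di-allelic SNPs. It assumes haplotype frequencies are already available; the slides do not describe how they were estimated from genotype data, and the function name and example values are illustrative.

```python
from math import comb


def r_squared(p_ab, p_a, p_b):
    """Pairwise LD r^2 for two di-allelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


print(comb(86, 2))       # 3655 pairwise tests among 86 SNPs
print(comb(45, 2) * 44)  # 43560 values for 45 SNPs in 44 populations

print(r_squared(0.25, 0.5, 0.5))  # 0.0: alleles associate at random
print(r_squared(0.50, 0.5, 0.5))  # 1.0: complete LD
```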
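[Figure: match probability for each population sample, log scale from 1.E-19 to 1.E-14; populations grouped by region: AFRICA, SWASIA, EUROPE, SBR, PAC, EASIA, AMERICA; labeled populations: Samaritans, Nasioi, Atayal, Mbuti, R. Surui]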
Match probabilities: 45 IISNPs, 44 population samples
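For orientation, a match probability of this kind can be computed from allele frequencies as the probability that two unrelated individuals share a genotype at every locus. The sketch below assumes Hardy-Weinberg proportions within a population and independence among loci (the property the LD and linkage screening is meant to support). It is a standard textbook calculation, not necessarily the exact procedure behind the figure, and the function names are illustrative.

```python
from math import prod


def locus_match_probability(p):
    """Probability that two random individuals share a genotype at one
    di-allelic SNP: the sum of squared genotype frequencies under HWE."""
    q = 1 - p
    return (p * p) ** 2 + (2 * p * q) ** 2 + (q * q) ** 2


def panel_match_probability(freqs):
    """Product over loci, assuming the loci are statistically independent."""
    return prod(locus_match_probability(p) for p in freqs)


print(locus_match_probability(0.5))      # 0.375, the minimum for one SNP
print(panel_match_probability([0.5] * 45))  # best case for 45 SNPs, roughly 7e-20
```

The best case of roughly 7e-20 for 45 SNPs sits just below the 1.E-19 end of the plotted scale, which is consistent with a panel whose SNPs are close to 50% allele frequency in most populations.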
Comparing match probabilities
[Figure: match probability for each population sample, log scale from 1.E-19 to 1.E-07, for three SNP sets: 45 unlinked IISNPs, Random 45 SNPs Set #1, Random 45 SNPs Set #2; populations grouped by region: AFRICA, SWASIA, EUROPE, SBR, PAC, EASIA, AMERICA]
Points To Note On Previous Slide
• The “random” SNPs are from our in‐lab database of SNPs we study because they show high heterozygosity in one region of the world
• Truly random SNPs would have much lower average heterozygosity and be much less informative
• There is a strong European bias because until recently most SNPs were discovered because they had a high heterozygosity in Europe
An Empiric Evaluation of Matches
• We have compared genotypes between all possible pairs of individuals who had data on all 45 IISNPs
• The numbers of genotypes out of 45 that match have been tabulated separately for pairs within a population and pairs between populations since some of the populations, especially the small tribal ones, contain closely related individuals
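The tabulation described above can be sketched as follows: compare every pair of fully typed individuals, count the loci at which their genotypes are identical, and tally the counts. This is an illustrative sketch, not the lab's actual code; the genotype coding (0, 1, 2 copies of one allele) and the toy profiles are assumptions.

```python
from collections import Counter
from itertools import combinations
from math import comb


def count_matching_loci(profile_a, profile_b):
    """Number of loci at which two individuals have the same genotype."""
    return sum(a == b for a, b in zip(profile_a, profile_b))


def match_distribution(profiles):
    """Tally, over all pairs of individuals, how many loci match."""
    return Counter(
        count_matching_loci(a, b) for a, b in combinations(profiles, 2)
    )


# Toy data: three individuals typed at three SNPs
profiles = [(0, 1, 2), (0, 1, 1), (2, 1, 2)]
print(match_distribution(profiles))  # Counter({2: 2, 1: 1})

# The number of comparisons grows quadratically with sample size:
print(comb(1856, 2))  # 1721440 pairs for 1856 individuals
```

Splitting the tally into within-population and between-population pairs only requires carrying a population label with each profile and keying the counter on whether the two labels are equal.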
Match        Within     Between     Combined
0            0          0           0
1 or 2       0          0           0
3 or 4       0          18          18
5 or 6       7          348         355
7 or 8       82         4,150       4,232
9 or 10      514        25,346      25,860
11 or 12     1,974      94,245      96,219
13 or 14     5,040      225,443     230,483
15 or 16     8,933      362,366     371,299
17 or 18     10,873     398,947     409,820
19 or 20     9,307      308,707     318,014
21 or 22     5,770      168,386     174,156
23 or 24     2,731      64,779      67,510
25 or 26     929        18,030      18,959
27 or 28     342        3,484       3,826
29 or 30     121        477         598
31 or 32     39         31          70
33 or 34     12         4           16
35 or 36     3          1           4
37 or 38     1          0           1
39 or 40     0          0           0
41 or 42     0          0           0
43 or 44     0          0           0
45           0          0           0
Totals       46,678     1,674,762   1,721,440
Numbers of genotypes matching in all possible pairwise comparisons of 1856 individuals (in all 44 populations) that were fully typed for all 45 IISNPs
The highest number of loci matching: 37 or 38, in one within‐population pair
Principal Components Analysis: 92 IISNPs, 44 population samples
[Figure: PC #1 (23% variance) versus PC #2 (18% variance); populations keyed by region: Africa, SW Asia, Europe, NW Asia, SC Asia, E Asia, NE Asia, Pacific, N America, S America; labeled populations: MBU, NAS, SUR, ATL, SAM]
Principal Components Analysis: 200 random SNPs, 44 populations
[Figure: PC #1 (52% variance) versus PC #2 (20% variance); populations keyed by region: Africa, SW Asia, Europe, NW Asia, SC Asia, East Asia, NE Asia, Pacific, N America, S America; labeled populations: SAM, NAS, SUR, ATL, MBU]
Some Obvious Observations
• Fst values can change significantly as the number of populations considered in the calculations increases, but Fst stabilizes with global coverage
• A few populations are ‘outliers’ and often have significantly different allele frequencies
− Isolated populations?
− Bottleneck?
− Founder effect?
− Small sample size?
• Any other measure of allele frequency variation should be highly correlated with Fst, so the set of IISNPs identified should be quite generally valid, though the rank‐order might change
The Next Steps
• Test on other typing platforms – who will do this?
• Develop multiplex assays – already being done by AB
• Test in more populations – who will do this?
• Test in forensic practice – being initiated
• Adopt a standard panel
Requirements for AISNPs
• Ancestry Informative SNPs (AISNPs):
− SNPs that collectively give a high probability of an individual’s ancestry being from one part of the world or being derived from two or more areas of the world
• We are currently accumulating data on SNPs that show very high allele frequency variation among populations
• It is very easy to find SNPs that will differentiate ancestry entirely from indigenous peoples of West Africa, Western Europe, Far East Asia, or the Americas
• It is far more difficult to differentiate ancestry from geographically “intermediate” regions
• The components of admixed ancestry are also very difficult to determine
• In my opinion the companies that sell such services are not sufficiently accurate for forensic purposes
Preliminary Results
• As part of our general research we have studied many markers on our populations and have been evaluating haplotypes for their utility in distinguishing among populations
• We have begun identifying individual SNPs that are highly informative in variation among populations
[Figure: High Fst SNPs. Allele frequency (0.0 to 1.0) by population for ADH1B (Fst = .47), DARC (Fst = .90), and SLC45A2 (Fst = .74); populations grouped by region: AFRICA, SWASIA, EUROPE, SBR, PAC, EASIA, AMERICA]
Results for 506 haplotypes based on 2556 SNPs in 45 populations, a total of 6.22 million genotypes, followed by results for 128 high‐Fst SNPs in 71 populations
STRUCTURE Analyses of 506 Haplotypes
STRUCTURE Analysis of 128 High‐Fst SNPs in 71 Populations
Requirements for LISNPs
• Lineage Informative SNPs (LISNPs):
− Sets of tightly linked SNPs that function as multiallelic markers that can serve to identify relatives with higher probabilities than simple di‐allelic SNPs
• Many of the haplotypes in the previous figure will be useful for this purpose
• Each will need to be evaluated for its heterozygosity, lack of frequent recombination, etc.
Requirements for PISNPs
• Phenotype Informative SNPs (PISNPs):
− SNPs that provide high probability that the individual has particular phenotypes, such as a particular skin color, hair color, eye color, etc.
• This is a very problematic area because phenotype is complex and most existing data are correlational, not biologically definitive
• The four loci believed most responsible for light skin color in Europeans have very different allelic distributions in the regions flanking Western Europe (ALFRED and unpublished results from Kidd Lab)
Skin Pigmentation SNP Frequencies in 85 Populations
[Figure: allele frequency (0.0 to 1.0) by population for SLC24A5 rs1426654 and SLC45A2 rs16891982; populations grouped by region: Africa, SWA, Europe, NWA, SA, Pac, SEA, EA, NEA, America]
Source data: ALFRED
Requirements for PISNPs
• A biological understanding of the relationship between the three genotypes at a SNP and the phenotype variation
• A biological understanding of the relationships among the genotypes at the several loci and the phenotype variation
• A population genetic understanding of how the genotype frequencies vary among populations
• PISNPs are not ready for “prime time”; a simple correlation at the population level is generally not sufficient
Data Availability
• As we are accumulating data on additional markers and additional populations, the data are being made public through two sources:
− (1) the Kidd Lab website with relevant forensic annotation and
• Kidd Lab: http://info.med.yale.edu/genetics/kkidd
− (2) ALFRED with the links to genetic, population, and molecular descriptions
• ALFRED: http://alfred.med.yale.edu
Acknowledgements
• This work is currently funded by grant 2007‐DN‐BX‐K197 from the NIJ
• NIH Grants AA009379 and GM057672 fund the ongoing general work on population genetics of DNA markers providing resources on which the forensic studies rely
• NSF Grant BCS‐0938633 funds the maintenance of ALFRED and helps us make our forensic data publicly available
Acknowledgements
• The data presented here are the result of work by many individuals:
− Andrew J. Pakstis, Ph.D.
− Judith R. Kidd, Ph.D.
− William C. Speed
− Eva Straka
• We also thank the many hundreds of anonymous individuals for their participation in these studies
• These studies would not be possible without their voluntary consent to give blood samples for studies of genetic variation
Contact Information
Kenneth K. Kidd
Department of Genetics
Yale University School of Medicine
P.O. Box 208005
New Haven, CT 06520‐8005
203‐785‐2654
kenneth.kidd@yale.edu
Note: All images are courtesy of Dr. Kenneth K. Kidd.
Appended Information for Reference or to Address Questions
Publications of Kidd Laboratory Research to Date
• 449. Kidd, K. K., A. J. Pakstis, W. C. Speed, E. L. Grigorenko, S. L. B. Kajuna, N. J. Karoma, S. Kungulilo, J.‐J. Kim, R.‐B. Lu, A. Odunsi, F. Okonofua, J. Parnas, L. O. Schulz, O. V. Zhukova, and J. R. Kidd, 2006. Developing a SNP panel for forensic identification of individuals. Forensic Science International 164: 20‐32
• 461. Pakstis, A. J., W. C. Speed, J. R. Kidd, and K. K. Kidd, 2007. Candidate SNPs for a Universal Individual Identification Panel. Human Genetics 121: 305‐317
• 467. Pakstis, A. J., W. C. Speed, J. R. Kidd, and K. K. Kidd, 2008. SNPs for Individual Identification. Progress in Forensic Genetics: Genetics Supplement Series 1: 479–481
• 468. Butler, J. M., B. Budowle, P. Gill, K. K. Kidd, C. Phillips, P. M. Schneider, P. M. Vallone, and N. Morling, 2008. Report on ISFG SNP Panel Discussion. Progress in Forensic Genetics: Genetics Supplement Series 1: 471–472
Additional Information on Our Forensic Research to Date
• See the Microsoft® PowerPoint® versions of relevant posters and talks in the “Library” section of the Kidd Lab website:
− http://info.med.yale.edu/genetics/kkidd/
• Other publications from our laboratory can be found under “Publications” on that website
Populations Studied (Sample Sizes)
Africa: Biaka (70); Mbuti (39); Yoruba (78); Ibo (48); Hausa (39); Masai (22); Chagga (45); Sandawe (40); Ethiopians (32); African Americans (90)
S.W. Asia: Yemenites (43); Druze (106); Samaritans (41); Ashkenazi (83)
Europe: Adygei (54); Chuvash (42); Russians, Archangelsk (34); Russians, Vologda (48); Hungarians (92); Finns (36); Danes (51); Irish (118); EuroAmericans (92)
N. Asia: Komi Zyrian (47); Khanty (50); Yakut (51)
S.C. Asia: Keralites (30)
Populations Studied (Sample Sizes)
Pacific Islands: Nasioi (23); Micronesians (37)
East Asia: Chinese, SF (60); Chinese, TW (49); Hakka (41); Koreans (54); Japanese (51); Ami (40); Atayal (42); Cambodians (25); Laotians (119)
Americas: Pima, Mexico (53); Maya (52); Quechua (22); Ticuna (65); Rondonian Surui (47); Karitiana (57)
Chromosomal locations: 45 "unlinked" IISNPs, 13 CODIS STRs
[Figure: physical position (MB), 0 to 250, from pter to qter, for each chromosome. Chromosomes are drawn as megabase‐scaled black lines; cross‐bars show telomeres. Filled black circles locate the 45 "unlinked" IISNPs; hollow, yellow‐filled circles locate the 13 CODIS STRs.]