Molecular Markers
• Locus – a physical location within the DNA sequence
• Allele – a single variant of a locus
http://gabrielarogers.blogspot.com/2011/05/allele-one-of-two-or-more-molecular.html
Molecular Markers
• Molecular markers are designed to interrogate heritable differences in DNA sequence called “polymorphisms”
• Each unique polymorphism for a given DNA position is called an “allele”
• Individuals may inherit unique combinations of polymorphisms/alleles across several loci providing a distinctive DNA ID or fingerprint
Molecular Markers
• Molecular markers are tools that allow us to collect information about an individual, a population, or a species
• Molecular markers are used to answer a variety of questions
– mating behaviors/effective population size
– population census counts
– parentage
– genetic basis of traits/gene mapping
Allozyme Markers
• Allozymes are variants (alleles) of the same protein
• Proteins are involved in carrying out various biological processes
• Changes in the genetic code (DNA) can have an effect on the physical properties of the protein (DNA > RNA > Protein)
• Used as early as 1955
Allozyme Markers
Each allozyme allele must have unique physical properties that allow us to distinguish it from the other alleles using a process called “electrophoresis”
[Figure: protein variants drawn with positive (+) and negative (-) charges; labels: -3 and -1]
Allozyme Markers
• Example: Sickle Cell Disease β-globin gene mutation
• Glu > Val in beta-globin (HbA > HbS)
http://www.mun.ca/biology/scarr/Hemoglobin_Electrophoresis.html
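The Glu > Val substitution can be followed from codon to charge. The sketch below is illustrative and not part of the original slides: it assumes the commonly cited GAG > GTG codon change, uses the first eight residues of mature β-globin (VHLTPEEK), and estimates net charge with a deliberately crude rule (Asp/Glu = -1, Lys/Arg = +1, histidine ignored). The function names are hypothetical.

```python
# Only the two codons needed for this example (an illustrative subset).
CODON_TABLE = {"GAG": "E", "GTG": "V"}  # Glu, Val

# Crude side-chain charges near neutral pH; histidine is ignored (assumption).
CHARGE = {"D": -1, "E": -1, "K": +1, "R": +1}


def net_charge(peptide: str) -> int:
    """Sum the approximate side-chain charges of a peptide."""
    return sum(CHARGE.get(residue, 0) for residue in peptide)


def substitute(peptide: str, position: int, new_residue: str) -> str:
    """Return a copy of the peptide with the 1-based position replaced."""
    index = position - 1
    return peptide[:index] + new_residue + peptide[index + 1:]


# First eight residues of mature beta-globin (HbA): Val-His-Leu-Thr-Pro-Glu-Glu-Lys
hba = "VHLTPEEK"
# Codon GAG (Glu) mutated to GTG (Val) at residue 6 gives HbS
hbs = substitute(hba, 6, CODON_TABLE["GTG"])

print(hba, net_charge(hba))  # VHLTPEEK -1
print(hbs, net_charge(hbs))  # VHLTPVEK 0
```

Under this simplified model the substitution removes one negative charge, which is the kind of physical difference that lets HbA and HbS separate during electrophoresis.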
Allozyme Markers
• The pioneering marker of molecular genetics
• Pros
– Multi-allelic/co-dominant marker
– Easy to replicate across labs
– Requires no DNA sequence information
• Cons
– Requires variation in amino acid sequence to be detected
– Protein isolation can be time consuming and expensive
– Large amount of tissue required (lethal sampling)
– Susceptible to environmental variation (tissue specific protein expression levels)
DNA Sequence and Genetic Markers
• 1977: Sanger and colleagues describe laboratory methods of DNA sequencing
• Allowed for the direct interrogation of variation at the DNA sequence level, which led to the discovery of informative genetic markers
http://www.nature.com/nmeth/journal/v5/n1/full/nmeth1154.html
DNA Sequence Variation
microsatellite
• di-nucleotide “CT” repeat
• di- and tetra-nucleotide markers common
insertion/deletion (indel)
• Ex. 8 bp deletion
• 1 or more nucleotides
SNP
• Any variation of the 4 nucleotides
Mitochondrial DNA
• Maternally inherited
– mtDNA in sperm destroyed at fertilization
– Maternal lineages can be traced back in time many generations with accuracy (no recombination with paternal mtDNA)
• Susceptible to oxidative damage and increased mutation rates (>3x compared to nDNA)
– Allows for assessment of genetic relationships among individuals, species, and across taxa
• Restriction fragments and sequence variants as mtDNA markers
Mitochondrial DNA
• Pros
– More copies per cell than nuclear DNA
  • Paleontology
– No recombination/Maternally inherited/High mutation rates
  • Good resolution of taxonomic relationships
• Cons
– Only maternally inherited, so can’t be used to identify paternity
Microsatellite Markers
Microsatellite markers are repeating sequences of 2-6 base pairs of DNA and can be hyper-variable compared to other markers
[Figure: gel with lanes 1-5; the two alleles of each individual are listed below]
1  TGCCGTGCATATATCGAGCTATT (3X)=23bp
   TGCCGTGCATATATCGAGCTATT (3X)=23bp
2  TGCCGTGCATATATATATATCGAGCTATT (6X)=29bp
   TGCCGTGCATATATATATATCGAGCTATT (6X)=29bp
3  TGCCGTGCATATATATATCGAGCTATT (5X)=27bp
   TGCCGTGCATATATATATCGAGCTATT (5X)=27bp
4  TGCCGTGCATATATATATATATATCGAGCTATT (8X)=33bp
   TGCCGTGCATATATCGAGCTATT (3X)=23bp
5  TGCCGTGCATATATATATATATATATCGAGCTATT (9X)=35bp
   TGCCGTGCATATATATATATATATCGAGCTATT (8X)=33bp
PCR amplification of microsatellites allows them to be scored using electrophoresis just like allozyme markers.
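The relationship between repeat count and fragment size in the figure is simple arithmetic: fragment length = flanking sequence + (repeat count × motif length). A minimal sketch, assuming the flanks and the “AT” motif exactly as drawn in the figure; the function names and the genotype table layout are hypothetical.

```python
LEFT_FLANK = "TGCCGTGC"    # 8 bp upstream flank, as drawn in the figure
RIGHT_FLANK = "CGAGCTATT"  # 9 bp downstream flank, as drawn in the figure
MOTIF = "AT"               # di-nucleotide repeat unit


def build_allele(repeats: int) -> str:
    """Assemble the amplified fragment for a given repeat count."""
    return LEFT_FLANK + MOTIF * repeats + RIGHT_FLANK


def repeats_from_length(fragment_bp: int) -> int:
    """Infer the repeat count from a fragment size read off a gel."""
    flank_bp = len(LEFT_FLANK) + len(RIGHT_FLANK)
    repeat_bp = fragment_bp - flank_bp
    if repeat_bp < 0 or repeat_bp % len(MOTIF) != 0:
        raise ValueError(f"{fragment_bp} bp is not a whole number of repeats")
    return repeat_bp // len(MOTIF)


# Fragment sizes (bp) of the two alleles carried by each of the five individuals
genotypes = {1: (23, 23), 2: (29, 29), 3: (27, 27), 4: (33, 23), 5: (35, 33)}

for individual, sizes in genotypes.items():
    counts = tuple(repeats_from_length(bp) for bp in sizes)
    status = "homozygous" if sizes[0] == sizes[1] else "heterozygous"
    print(individual, counts, status)
```

Because both alleles of a heterozygote appear as separate bands, the marker is codominant: individuals 4 and 5 are distinguishable from every homozygote.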
Microsatellite Markers
• Example: Paternity testing
[Figure: paternity test gels, Case 1 and Case 2; caption: “Not my kid!”]
http://www.paternity.be/information_EN.html
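The logic behind a paternity exclusion can be sketched in a few lines: at every locus a child must carry one allele from its father, so an alleged father who shares no allele with the child at some locus is excluded. This is a simplified illustration, not the method used in the figure: the locus names and fragment sizes are invented, and the sketch ignores the mother's genotype, mutation, and scoring error.

```python
from typing import Dict, List, Tuple

Genotype = Dict[str, Tuple[int, int]]  # locus name -> two allele sizes in bp


def shares_allele(child: Tuple[int, int], parent: Tuple[int, int]) -> bool:
    """True if the two genotypes have at least one allele in common."""
    return bool(set(child) & set(parent))


def excluded_loci(child: Genotype, alleged_father: Genotype) -> List[str]:
    """List the loci at which the alleged father shares no allele with the child."""
    return [
        locus
        for locus in child
        if not shares_allele(child[locus], alleged_father[locus])
    ]


# Hypothetical microsatellite genotypes (allele sizes in bp)
child = {"locusA": (23, 29), "locusB": (27, 33)}
father_case1 = {"locusA": (29, 35), "locusB": (27, 27)}
father_case2 = {"locusA": (33, 35), "locusB": (27, 29)}

print(excluded_loci(child, father_case1))  # [] -> not excluded
print(excluded_loci(child, father_case2))  # ['locusA'] -> excluded
```

An empty list only means the man cannot be excluded; highly variable loci make a chance match at every locus increasingly unlikely.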
Microsatellite Markers
• Pros
– Extremely variable
  • > 20 alleles for one locus
  • Codominant
– Moderately abundant in genome
– Predominantly neutral loci
– Moderately easy to genotype using PCR
  • Less time and effort to genotype than allozyme markers
– Little tissue required thanks to PCR (non-lethal sampling)
• Cons
– Difficult to discover
  • Generally requires DNA sequence information
– Difficult to standardize and exchange data across labs
  • PCR variation
  • Scoring variation
SNP Markers
• A single-nucleotide polymorphism (SNP) is a DNA variation in which a single nucleotide (A, T, C, or G) at a given position differs between homologous chromosomes or between individuals at homologous loci
ATG GCT TCG ATC GAT CTA
ATG GCC TCG ATC GAT CTA
ATG GCT ACG ATC GAC CTA
ATG GCT ACG ATC GAC CTA
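SNPs in aligned sequences such as the four shown above can be located by comparing the sequences column by column. A minimal sketch, assuming the sequences are already aligned and of equal length; the function name `find_snps` is hypothetical.

```python
from typing import List, Tuple


def find_snps(sequences: List[str]) -> List[Tuple[int, List[str]]]:
    """Return (1-based position, sorted alleles) for every variable column."""
    if len({len(seq) for seq in sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    for index, column in enumerate(zip(*sequences)):
        alleles = sorted(set(column))
        if len(alleles) > 1:
            snps.append((index + 1, alleles))
    return snps


# The four sequences from the slide, written with codon spacing
haplotypes = [
    "ATG GCT TCG ATC GAT CTA",
    "ATG GCC TCG ATC GAT CTA",
    "ATG GCT ACG ATC GAC CTA",
    "ATG GCT ACG ATC GAC CTA",
]
cleaned = [h.replace(" ", "") for h in haplotypes]

print(find_snps(cleaned))
# [(6, ['C', 'T']), (7, ['A', 'T']), (15, ['C', 'T'])]
```

Each variable site here has exactly two alleles, which is typical for SNPs.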
SNP Markers
• Can be interrogated in many ways including PCR/electrophoresis, real-time PCR (RT-PCR), and sequencing
• Can be rapidly/massively genotyped using high-throughput methods
– TaqMan assay (RT-PCR)
  • Fluidigm
– SNPchips and Microarrays
– Genotyping by Sequencing (GBS)
  • RAD-seq
RAD sequencing
• Restriction-site Associated DNA (RAD)
• Method of discovering and interrogating 1,000s of SNPs quickly
• Targeted sequencing of loci (restriction sites) normally distributed throughout the genome
• Reduces the genome to a quantity of loci that are realistic to survey with current sequencing methods
• Genotypes come directly from sequence alignments
[Figure: RAD library preparation workflow: DNA > SbfI digestion > P1 ligation > sonication > size selection > P2 ligation > PCR. RAD tags flank each SbfI recognition site (CCTGCAGG).]
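The way a restriction enzyme reduces the genome to a manageable set of loci can be sketched by scanning a sequence for the SbfI recognition site (CCTGCAGG) and keeping a fixed-length stretch at each site. This is a simplified illustration: the toy genome is invented, only one strand is scanned, and the tag is taken from the start of the recognition site instead of the exact cut position. The function names are hypothetical.

```python
import re
from typing import List

SBFI_SITE = "CCTGCAGG"  # SbfI recognition sequence


def find_sites(genome: str, site: str = SBFI_SITE) -> List[int]:
    """Return 0-based start positions of every occurrence of the site."""
    # A lookahead lets the search report overlapping matches as well.
    return [match.start() for match in re.finditer(f"(?={site})", genome)]


def rad_tags(genome: str, read_length: int = 20, site: str = SBFI_SITE) -> List[str]:
    """Collect the read_length bases starting at each site (simplified RAD tag)."""
    tags = []
    for start in find_sites(genome, site):
        tag = genome[start:start + read_length]
        if len(tag) == read_length:  # skip sites too close to the sequence end
            tags.append(tag)
    return tags


# Invented toy genome containing two SbfI sites
toy_genome = (
    "ACGTACGTAC" + SBFI_SITE + "CTGAGCCATGCTAGACGATGGC"
    + "TTTTGGGGAAAA" + SBFI_SITE + "AATCGTCGTAGCTGATCGATCG"
)

print(find_sites(toy_genome))  # [10, 52]
for tag in rad_tags(toy_genome):
    print(tag)
```

Only the sequence next to each site is read, so every individual is sampled at the same subset of loci.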
Genotyping by sequencing
AATTCCTGCAGGCTGAGCCATGCTAGACGATGGC
AATTCCTGCAGGCTGAGCCATGCTAGACGATGGC
AATTCCTGCAGGCTGAGCCATGCTAGACGATGGC
AATTCCTGCAGGCTGAGCCATGCTAGACGATGGC
AATTCCTGCAGGCTGAGCCATGCTAGACGATGGC
AATTCCTGCAGGCTGAGCCATGCTAGACGATGGC
AATTCCTGCAGGCTGAGCCATGCTAGACGATGGC
GGCCAATGCAGGCTGAGCCATGCTAGTCGATGGC
GGCCAATGCAGGCTGAGCCATGCTAGTCGATGGC
GGCCAATGCAGGCTGAGCCATGCTAGTCGATGGC
GGCCAATGCAGGCTGAGCCATGCTAGTCGATGGC
GGCCAATGCAGGCTGAGCCATGCTAGTCGATGGC
GGCCAATGCAGGCTGAGCCATGCTAGTCGATGGC
GGCCAATGCAGGCTGAGCCATGCTAGTCGATGGC
GGCCAATGCAGGCTGAGCCATGCTAGTCGATGGC
AATTCCTGCAGGAATCGTCGTAGCTGATCGATCG
AATTCCTGCAGGAATCGTCGTAGCTGATCGATCG
AATTCCTGCAGGAATCGTCGTAGCTGATCGATCG
AATTCCTGCAGGAATCGTGGTAGCTGATCGATCG
AATTCCTGCAGGAATCGTGGTAGCTGATCGATCG
AATTCCTGCAGGAATCGTGGTAGCTGATCGATCG
AATTCCTGCAGGAATCGTCGTAGCTGATCGATCG
GGCCAATGCAGGAATCGTCGTAGCTGATCGATCG
GGCCAATGCAGGAATCGTGGTAGCTGATCGATCG
GGCCAATGCAGGAATCGTCGTAGCTGATCGATCG
GGCCAATGCAGGAATCGTCGTAGCTGATCGATCG
GGCCAATGCAGGAATCGTGGTAGCTGATCGATCG
GGCCAATGCAGGAATCGTGGTAGCTGATCGATCG
GGCCAATGCAGGAATCGTCGTAGCTGATCGATCG
GGCCAATGCAGGAATCGTGGTAGCTGATCGATCG
Loci: L1 L2 L3 L4 L5 L6 L7
ID001: L1 A/A, L6 C/G
ID002: L1 T/T, L6 C/G
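Calling genotypes from reads like those above amounts to sorting reads by barcode and counting the bases observed at the SNP position. The sketch below makes several assumptions that the slides do not state: the first six bases are treated as the sample barcode (AATTCC for ID001, GGCCAA for ID002), reads are already grouped by locus, and an allele is accepted if it makes up at least 20% of a sample's reads. All function names and the threshold are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List

BARCODES = {"AATTCC": "ID001", "GGCCAA": "ID002"}  # assumed barcode-to-sample map
BARCODE_LEN = 6


def demultiplex(reads: List[str]) -> Dict[str, List[str]]:
    """Group reads by sample and strip the barcode."""
    by_sample = defaultdict(list)
    for read in reads:
        sample = BARCODES.get(read[:BARCODE_LEN])
        if sample is not None:  # reads with unknown barcodes are dropped
            by_sample[sample].append(read[BARCODE_LEN:])
    return dict(by_sample)


def call_genotype(reads: List[str], snp_index: int, min_fraction: float = 0.2) -> str:
    """Call a diploid genotype from base counts at one position."""
    counts = Counter(read[snp_index] for read in reads)
    total = sum(counts.values())
    alleles = sorted(base for base, n in counts.items() if n / total >= min_fraction)
    if len(alleles) == 1:
        return f"{alleles[0]}/{alleles[0]}"
    if len(alleles) == 2:
        return "/".join(alleles)
    raise ValueError(f"cannot call a diploid genotype from {dict(counts)}")


# Read counts copied from the figure
locus1_reads = (
    ["AATTCCTGCAGGCTGAGCCATGCTAGACGATGGC"] * 7
    + ["GGCCAATGCAGGCTGAGCCATGCTAGTCGATGGC"] * 8
)
locus6_reads = (
    ["AATTCCTGCAGGAATCGTCGTAGCTGATCGATCG"] * 4
    + ["AATTCCTGCAGGAATCGTGGTAGCTGATCGATCG"] * 3
    + ["GGCCAATGCAGGAATCGTCGTAGCTGATCGATCG"] * 4
    + ["GGCCAATGCAGGAATCGTGGTAGCTGATCGATCG"] * 4
)

# SNP offsets are counted within the barcode-stripped read
for name, reads, snp_index in [("L1", locus1_reads, 20), ("L6", locus6_reads, 12)]:
    for sample, sample_reads in demultiplex(reads).items():
        print(sample, name, call_genotype(sample_reads, snp_index))
```

With the read counts from the figure this gives A/A and T/T at L1 and C/G for both samples at L6. Real pipelines use statistical genotype models instead of a fixed threshold.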
SNP Markers
• Pros
– Codominant markers
– Most abundant markers in the genome
– Easy to interrogate with current high-throughput technology
– Requires little tissue
– Highly reproducible between labs, easy to standardize, easy exchange of data
– Can be adaptive and neutral loci
• Cons
– Not as variable as microsatellites
  • Generally only two alleles per locus
– High up front discovery/operating cost
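The trade-off between few alleles per locus and many loci can be made concrete with a standard count: a locus with k alleles has k(k+1)/2 possible diploid genotypes, and independent loci multiply. A minimal sketch using 2 alleles for a SNP and 20 for a microsatellite (the slides say more than 20, so 20 is a conservative stand-in); the function names are hypothetical.

```python
from typing import Iterable


def diploid_genotypes(n_alleles: int) -> int:
    """Number of distinct unordered diploid genotypes at one locus."""
    return n_alleles * (n_alleles + 1) // 2


def multilocus_genotypes(alleles_per_locus: Iterable[int]) -> int:
    """Number of distinct genotype combinations across independent loci."""
    total = 1
    for n_alleles in alleles_per_locus:
        total *= diploid_genotypes(n_alleles)
    return total


print(diploid_genotypes(2))             # 3 genotypes at one SNP
print(diploid_genotypes(20))            # 210 genotypes at one microsatellite
print(multilocus_genotypes([2] * 10))   # 59049 across ten SNPs
print(multilocus_genotypes([20] * 2))   # 44100 across two microsatellites
```

Under these assumptions five biallelic SNPs (3^5 = 243 combinations) already exceed a single 20-allele microsatellite, which is why large SNP panels can offset the low variability of each individual locus.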