Comparison of Middle Eastern Bedouin
Genotypes with Previously Studied Populations
Using Polymorphic Alu Insertions
Alison Patricia Pitt (BA, GDipForSci)
Centre for Forensic Science
University of Western Australia
This thesis is presented in partial fulfilment of the requirements for the
Master of Forensic Science
2008
I declare that the research presented in this 36 point thesis, as part of the 96 point
Master degree in Forensic Science, at the University of Western Australia, is my own
work. The results of the work have not been submitted for assessment, in full or part,
within any other tertiary institute, except where due acknowledgement has been made
in the text.
…………………………………………………
Your name here
Acknowledgments
I am grateful to my supervisors, Dr. Guan Tay and Associate Professor Ian Dadour, for
the support and guidance they provided during my research, and for giving me the
opportunity to pursue the work described in this thesis.
I am also very grateful to Habiba Al-Safar for her collaboration and help during my
visit to Dubai, United Arab Emirates (UAE), and to Dr. Kamal A. Khazanehdari for
allowing me the use of his wonderful laboratory and equipment at the Central
Veterinary Research Laboratory (CVRL), Molecular Biology and Genetics department
in Dubai, UAE.
A most appreciated thank you to all the staff at the CVRL for their patience with me
in the lab while guiding me through this learning process, and for all their intellectual
support. They have provided me with the knowledge to help me get further in this
field of research.
My sincere gratitude to all my colleagues at the Centre for Forensic Science, Stephen
Iaschi, Catherine Rinaldi, Rebecca Ford, Yvette Hitchen and Ha Nguyen, for
providing me with beginner’s tips and lending their knowledge so that my journey into
forensic DNA could be an easy and enjoyable one.
Abbreviations and Symbols
μl – Microlitre
°C – Degrees Celsius
4AOH – Fourth Asia Oceanic Histocompatibility
7SL RNA – An abundant cytoplasmic RNA that functions in protein secretion
A – Adenine
Alu – Arthrobacter luteus restriction enzyme
bp – Base pair
C – Cytosine
CDSN – Corneodesmosin
DNA – Deoxyribonucleic acid
dNTPs – Deoxyribonucleotide triphosphates
G – Guanine
HGP – Human Genome Project
HLA – Human leucocyte antigen
HWE – Hardy-Weinberg Equilibrium
Indel – Insertion/deletion
Kb – Kilobase
LD – Linkage disequilibrium
LINE – Long interspersed element
LTR – Long terminal repeat
MgCl2 – Magnesium chloride
MHC – Major histocompatibility complex
MIC – MHC Class I chain related
mM – Millimolar
MW – Molecular weight
Mya – Million years ago
ng – Nanogram
nmol – Nanomole
PCR – Polymerase chain reaction
POALIN – Polymorphic Alu insertion
RNA – Ribonucleic acid
SINE – Short interspersed element
SLP – Single locus probe
SNP – Single nucleotide polymorphism
T – Thymine
TFIIH – Transcription factor II H
Tris-HCl – Tris-Hydrochloride
tRNA – Transfer ribonucleic acid
Definitions
Allele:
One of two or more alternate forms of a gene that occupy the same locus in the genome.
Allele Frequency:
The frequency with which one form of a gene (an allele) occurs within a given population.
Alu:
A short transposable element that makes up more than 10% of the human genome. Alu
elements are class I retroelements that do not encode protein and as such are
nonautonomous elements.
Annealing:
Often used to describe the binding of a DNA probe, or the binding of a primer to a DNA
strand during a polymerase chain reaction (PCR).
Autonomous Transposable Elements:
A transposable element that encodes the proteins necessary for its own transposition and
for the transposition of nonautonomous elements of the same family.
Blastocyst:
The structure formed in early embryogenesis.
Denaturation:
The separation of the two strands of a DNA double helix, or the severe disruption of the
structure of any complex molecule without breakage of the major bonds of its chains.
DNA:
Deoxyribonucleic acid. A double chain of linked nucleotides; the fundamental substance of
which genes are composed.
Electrophoresis:
The process of separating charged molecules through a gel matrix by the application of
an electric field. The gel matrix allows molecules to be separated on the basis of size.
Embryogenesis:
The process by which the embryo is formed and develops.
Genome:
The entire collection of genetic information in an organism.
Genotype:
The inherited genetic information stored within an organism, which contributes to
the physical characteristics that determine its chances of survival and reproduction.
Genotype Frequency:
The proportion or frequency of any particular genotype among the individuals of a
population.
Haplotypes:
A genetic class described by a sequence of DNA or of genes that are together on the same
physical chromosome.
Hardy-Weinberg Equilibrium:
The stable frequency distribution of the genotypes A/A, A/a and a/a in a population, in
the proportions p², 2pq and q² (where p and q are the frequencies of alleles A and a),
that is a consequence of random mating in the absence of mutation, migration,
natural selection or random drift.
Heterozygous:
A gene pair having different alleles in the two chromosome sets of a diploid individual.
Homozygous:
Refers to the state of carrying a pair of identical alleles at the one locus.
Hybridisation:
The process in which two complementary nucleotides, a purine and a pyrimidine, bind
through the hydrogen bonds that form between them.
Indel:
A mutation in which one or more nucleotide pairs are added or deleted.
Linkage Disequilibrium:
The non-random association of alleles at different loci. It occurs when genotypes at the
two loci are not independent of one another.
Long Interspersed Repeated DNA (LINE):
A type of class I transposable element that encodes a reverse transcriptase. LINEs are
also called non-LTR retrotransposons.
Locus:
The position of a gene or chromosome segment on a chromosome. Alleles are located at
identical loci on homologous chromosomes.
Messenger RNA:
Carries information from DNA to structures called ribosomes. These ribosomes are
made from proteins and ribosomal RNAs, which together can read messenger RNA and
translate the information they carry into proteins.
Multi-Locus Probe (MLP):
A probe created by Alec Jeffreys that hybridises to a number of different sites in the
genome of an organism to compare selected sequences of single stranded DNA.
Nonautonomous Elements:
A transposable element that relies on the protein products of autonomous elements for its
mobility.
Oocyte:
A female germ cell involved in reproduction.
Phenotype:
Any observable characteristic of an organism, such as its morphology, development,
biochemical or physiological properties, or behaviour. Phenotypes result from the
expression of an organism’s genes as well as the influence of environmental factors.
Polymerase Chain Reaction (PCR):
An in vitro method of amplifying a specific DNA segment that uses two primers that
hybridise to opposite ends of the segment in opposite polarity and, over successive cycles,
results in the replication of that segment only.
Polymorphism:
The simultaneous occurrence of two or more allelic forms within the population.
Retro Elements:
The general name for the class I transposable elements that move through an RNA
intermediate.
Retrotransposition:
A mechanism of transposition characterised by the reverse flow of information from
RNA to DNA.
Retrotransposons:
A transposable element that uses reverse transcriptase to transpose through an RNA
intermediate.
Retrovirus:
An RNA virus that replicates by first being converted into double-stranded DNA.
Reverse Transcriptase:
An enzyme that catalyses the synthesis of a DNA strand from an RNA template.
Ribonucleic Acid (RNA):
RNA is very similar to DNA, but differs in a few important structural details. In the cell
RNA is usually single stranded and contains only ribose. RNA is transcribed from DNA
by enzymes called RNA polymerase and is generally further processed by other enzymes.
RNA is central to the synthesis of proteins.
Short Interspersed Nuclear Elements (SINE):
A type of class I transposable element that does not encode reverse transcriptase but is
thought to use the reverse transcriptase encoded by LINEs.
Short Tandem Repeats (STRs):
A class of polymorphisms that occurs when a pattern of two or more nucleotides is
repeated and the repeated sequences are directly adjacent to each other.
Single-Locus Probe (SLP):
A DNA or RNA sequence that is able to hybridise with DNA from a specific restriction
fragment on a Southern blot, depending on complementary base pairs and probe sequence.
SLPs are usually tagged with radioactive labels for easy detection, and are chosen to
detect one polymorphic genetic locus on a single chromosome.
Single Nucleotide Polymorphisms (SNPs):
A nucleotide-pair difference at a given location in the genomes of two or more naturally
occurring individuals.
Translocation:
The relocation of a chromosomal segment to a different position in the genome.
Variable Number Tandem Repeats (VNTRs):
A location in a genome where a short nucleotide sequence is organised as a tandem
repeat. These can be found on many chromosomes, and often show variations in length
between individuals.
Table of Contents
Declaration
Acknowledgments
Abbreviations and Symbols
Definitions
Table of Contents
List of Tables
List of Figures
Abstract
CHAPTER 1 INTRODUCTION
1.1 Introduction
1.2 History of Genetics
1.3 Fundamentals of Genetic Structure
1.3.1 DNA Principles
1.3.2 DNA Structure
1.3.3 Base Pairing
1.4 Human Genome Project
1.5 DNA Profiling
1.6 Population Variation
1.7 Principles of Population Genetics
1.8 Gene Clusters
1.9 Major Histocompatibility Complex (MHC)
1.10 MHC Ancestral Haplotypes
1.11 SNPs vs Indels
1.12 Retroelements and Repeat Sequences
1.13 SINEs – Short Interspersed Nuclear Elements
1.14 Polymorphic Alu Insertions (POALINs)
1.15 Profiling Ethnicity in Forensic DNA
1.16 Previous Population Studies
1.17 Hardy-Weinberg Equilibrium
1.17 Bedouin Culture
1.18 Aims for this Thesis
CHAPTER 2 MATERIALS AND METHODS
2.1 Genomic DNA
2.2 POALIN PCR Assay
2.2.1 PCR Reaction
2.2.2 Cycling Conditions
2.2.3 Electrophoresis
2.3 Genotype and Phenotype Analysis
CHAPTER 3 RESULTS
3.1 Location of POALINs within the MHC Class I Region
3.2 Distribution of POALIN Allele Frequencies in Bedouin Population
3.3 Hardy-Weinberg Equilibrium
CHAPTER 4 DISCUSSION
4.1 MHC POALIN Associations
4.2 Population Comparison
4.3 Hardy-Weinberg Conditions
4.4 Conclusion
REFERENCES
APPENDIX A
List of Tables
Table 1.1: A synopsis on the history of genetics
Table 2.1: Primer sequence and product size of the four POALIN loci.
Table 3.1: Observed genotypes, allele frequencies, Hardy–Weinberg significance and
heterozygosity for AluyMICB, AluyTF, AluyHJ and AluyHF in an Arab Bedouin population
Table 3.2a: Allele frequency comparison for AluyMICB.
Table 3.2b: Allele frequency comparison for AluyTF.
Table 3.2c: Allele frequency comparison for AluyHJ.
Table 3.2d: Allele frequency comparison for AluyHF.
List of Figures
Figure 1.1: Mendelian inherited traits as dominant and recessive phenotypes.
Figure 1.2: Illustration of hydrogen bonds during hybridisation between adenine and
thymine, and guanine and cytosine.
Figure 1.3: Illustration showing the difference between a sequence polymorphism and a
length polymorphism.
Figure 1.4: Illustration of the Major Histocompatibility Complex (MHC): Chromosome 6.
Figure 1.5: Illustration of a SNP.
Figure 1.6: Illustration of an insertion and a deletion at the chromosome level.
Figure 1.7: Structural comparison of a retrovirus to types of transposable elements in
the human genome
Figure 2.1: Gel photographic presentation of the variation between the four MHC Class I
POALINs and product size.
Figure 3.1: Location of the four POALINs within the short arm of chromosome 6 within
the MHC.
Figure 3.2: Gel photographic presentation of AluyMICB.
Figure 3.3: Gel photographic presentation of AluyTF.
Figure 3.4: Gel photographic presentation of AluyHJ.
Figure 3.5: Gel photographic presentation of AluyHF.
Figure 3.6: Gel photographic presentation of all four MHC POALINs, verifying their
ability to produce both product sizes and heterozygosity.
Figure 4.1: Population comparison for AluyMICB allele genotype frequencies among the
populations.
Figure 4.2: Population comparison for AluyTF allele genotype frequencies among the
populations.
Figure 4.3: Population comparison for AluyHJ allele genotype frequencies among the
populations.
Figure 4.4: Population comparison for AluyHF allele genotype frequencies among the
populations.
Appendix
Figure 3.7: Gel photographic presentation of the second electrophoresis run of AluyMICB.
Abstract
Polymorphic Alu insertions (POALINs) are known to contribute to the variation and
genetic diversity of the human genome. In this report specific POALINs of the Major
Histocompatibility Complex (MHC) were studied. Previous population studies on the
MHC POALINs have focused on individuals of African, European and Asian descent.
In this study, we expand the research by studying a new and previously
uncharacterised population, focusing on the Bedouin from the Middle East.
Specifically we report on the individual insertion frequencies of four POALINs
within the MHC class I region of this population.
POALINs are members of a young Alu subfamily that have only recently been
inserted into the human genome. POALINs are either present or absent at particular
sites. Individuals that share the inserted (or deleted) polymorphism inherited the
insertion (or deletion) from a common ancestor, making Alu alleles identical by
descent. In population genetics, populations can therefore be compared by sizing the
PCR products amplified from a series of unrelated individuals, which reveals
polymorphisms in the presence or absence of the Alu repeats.
As a direct result of their abundance and sequence identity, Alu elements promote genetic
recombination events that are responsible for large-scale deletions, duplications and
translocations. The deletions occur mostly in A-T rich regions and have been found
unlikely to have arisen independently of the insertion of the Alu elements
(Callinan et al, 2005). Because they are easy to genotype, POALINs have proven to be very
valuable lineage markers for the study of human population genetics, pedigrees and
forensics, as well as genomic diversity and evolution. POALINs have been used in a
range of applications, primarily focusing on anthropological analysis of human
populations. Given their ease of use and their utility as markers in human
evolution studies, combining POALINs with other markers used in
forensics could lead to improved identity testing in forensic science. More
specifically, in combination with more traditional markers, race specific genotypes
and haplotypes could be used for profiling crime scene samples.
This thesis reports on the frequencies of four POALINs: AluyMICB, AluyTF, AluyHJ and
AluyHF. The simplicity of POALINs comes from the way the markers show the
presence or absence of the insertion. Allele*1 is represented by the smaller band
when run through electrophoresis and indicates the absence of the
POALIN insertion. For all four POALINs, Allele*1 had the higher frequency in all
populations studied. Allele*2 is represented by the larger band and indicates the
presence of the POALIN insertion. In this study, the majority of the
Bedouin population showed no sign of the presence of the four POALINs used.
Of the four POALINs, the highest frequency of the insertion allele
(Allele*2) in the Middle Eastern Bedouin population was 0.100 for AluyMICB and the
lowest was 0.056 for AluyTF. AluyHJ and AluyHF both showed no variation,
possibly due to an inadequate sample size. Comparisons were
made with other populations from previous studies. The allele frequency of
AluyMICB (0.100) showed a strong similarity to the North Eastern Thai AluyMICB
(0.117) and was comparable to the Australian Caucasian AluyMICB (0.157)
and the South African Sekele San AluyMICB (0.050). The AluyTF frequency (0.056)
showed a strong similarity to the Malaysian Chinese AluyTF (0.040) and
the North Eastern Thai AluyTF (0.086), and was also comparable
with the South African Sekele San AluyTF (0.034).
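As an illustration only, and not a calculation taken from the results of this thesis, the sketch below shows how an insertion allele frequency translates into expected genotype proportions under Hardy-Weinberg equilibrium. It assumes a two-allele locus and random mating; the function name hwe_expected and the genotype labels are hypothetical, and only the two frequencies are taken from the text above.

```python
def hwe_expected(insertion_freq):
    """Expected genotype proportions at a two-allele locus under
    Hardy-Weinberg equilibrium.

    insertion_freq is q, the frequency of the insertion allele (Allele*2);
    p = 1 - q is the frequency of the allele without the insertion (Allele*1).
    """
    if not 0.0 <= insertion_freq <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    q = insertion_freq
    p = 1.0 - q
    # p^2, 2pq and q^2 for the three possible genotypes
    return {"*1/*1": p * p, "*1/*2": 2 * p * q, "*2/*2": q * q}


# Insertion frequencies reported above for the two variable loci
reported = {"AluyMICB": 0.100, "AluyTF": 0.056}

for locus, q in reported.items():
    expected = hwe_expected(q)
    summary = ", ".join(f"{g} = {v:.3f}" for g, v in expected.items())
    print(f"{locus}: {summary}")
```

For an insertion frequency of 0.100 this gives expected proportions of 0.81, 0.18 and 0.01, so under these assumptions individuals homozygous for the insertion would be expected to be rare in a small sample.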
Anthropologically, this may suggest that the Middle Eastern Bedouin population
diverged from the Asian population after the split from Africa. This is supported by
previously reported molecular data using various types of genetic markers. In a study
using six separate Alu insertions, Antunez-de-Mayolo et al were able to generate a
phylogenetic tree in which the biogeographical groups followed a pattern. The
tree started with African populations, which were found to relate
closely to the hypothetical ancestral African population. The African populations
were then followed in order by Southwest Asian populations and European populations,
which include Middle Eastern groups (Antunez-de-Mayolo et al, 2002).
This study shows the similarities and differences between the allele frequencies of the
Middle Eastern Bedouin and those of the compared populations. Though no clear
conclusions could be drawn, the information from the POALINs, together with information
from other genetic markers, can lead to further research on the Bedouin
population and to improvement of the forensic population database, so that the
ethnic background of samples can be assessed more accurately.
Chapter 1 Background
1.1 Introduction
DNA profiling has created controversy in the court room since it was
first introduced in the mid 1980s. Though it was used in many cases, the courts remained
sceptical about the credibility of the techniques used and about how
scientists reached their results. Over the years the methods were slowly replaced with
new and more complex techniques, which used chromosome markers and
subpopulation examination, but these could still be challenged in the court room. Today’s
techniques have taken another step forward and allow more accurate comparison
among DNA samples. Subpopulation testing has also made its way back into DNA
profiling methods and has proved to be fundamental in DNA comparison (Lynch,
2003).
In this thesis, we studied Polymorphic Alu Insertions (POALINs) within the human
Major Histocompatibility Complex (MHC) to establish similarities and differences
between the Middle Eastern Bedouin subpopulation and other populations previously
studied using the same MHC POALINs. The aim was to apply DNA profiling to a
subpopulation using Alu markers, for more accurate identification and analysis of
DNA samples.
1.2 History of Genetics
Some of the first diversity studies were done over 150 years ago and were published
by Charles Darwin in “On the Origin of Species” (Darwin, 1859). This publication
introduced the Theory of Evolution and the concept of Natural Selection. But one of
the chief difficulties for Darwin and other naturalists of his time was that there was no
agreed-upon model of heredity. The idea of heredity had not been completely
separated conceptually from the idea of the development of the organism. Darwin
himself saw variation and heredity as two essentially opposed forces, with heredity
working to preserve the fixity of a type rather than acting as the agent of species
variability.
As scientists debated the concept of variation and heredity, Darwin himself produced
the theory of Pangenesis, which essentially was a model of ‘blended’ heredity. The
effects of used and disused body parts of the parent were transmitted to the child,
and the contributions from each parent were roughly equal (Darwin, 1859). But under
this model, could a species still evolve over time?
It was in 1866 that genetics took a step forward, and it is generally held that it started
with the work of Gregor Mendel and his pea plants. His theories on heredity in
plant hybrids were soon published and became known as
Mendelian Inheritance (Figure 1.1). This basic principle was later applied to a wide
variety of organisms and was developed by geneticists into the Mendelian-Chromosome
Theory of Heredity (Griffiths et al, 2005). The theory was widely
accepted by 1925, which brought the statistical framework of population
genetics and genetic explanation into the study of evolution. With the basic pattern of
inheritance now understood, biologists were able to focus on the physical nature of
the gene. It was in 1953 that the first models of viruses and bacteria, and the double
helical structure of DNA, were described (Watson and Crick, 1953; Perutz et al.,
1969; Olby, 2003). In 1986 Kary Mullis brought genetics another step forward with
the development of the polymerase chain reaction (PCR). PCR allowed
researchers to amplify DNA more easily, which enabled DNA fingerprinting and gene
therapy to develop in the 1990s (McGill, 2000), and provided the
tools needed for the Human Genome Project to be set in motion, leading to the
era of molecular genetics.
Advances in technology, and the introduction of automated sequencing
in 1995, helped to further research into and understanding of the genome. In
1996 at the Roslin Institute in Scotland, Ian Wilmut and his colleagues began using a
technique known as Somatic Cell Nuclear Transfer, in which the cell
nucleus from an adult cell is transferred into an unfertilised oocyte that has had its
nucleus removed. The hybrid cell was then stimulated to divide by an electric shock,
and once it had developed into a blastocyst it was implanted in a surrogate mother.
It was this technique that allowed Wilmut and his colleagues to clone the
first mammal from an adult cell, known to the world as Dolly. The creation of Dolly has led to
continuing controversy over the use of stem cell technology and the possibility of
further animal cloning and of human cloning.
1.3 Fundamentals of Genetic Structure
1.3.1 DNA Principles
To understand how our genetic and physical traits are passed on from our family, one
must understand how the genetic structure of human DNA takes form. An average
human is composed of approximately 100 trillion cells, all of which originated from a
single cell. Each cell carries its own copy of the genetic code within its nucleus,
the control centre of the cell. Within the nucleus there is a chemical substance
known as deoxyribonucleic acid (DNA), which is considered the genetic blueprint as
it stores the information necessary for passing down genetic attributes to future
generations (Watson and Crick, 1953; Benham and Mielke, 2005).
The encoded information within the DNA structure is passed from generation to
generation, with one half of a person’s DNA information coming from their mother
and the other half from their father. DNA has two primary purposes: one is to make
copies of itself so cells can divide and carry on the same information; the other is to
carry instructions on how to make proteins so cells can build the machinery of life
(Griffiths et al, 2005; Watson and Crick, 1953; Benham and Mielke, 2005). DNA
accomplishes this by carrying the information for making all of the cell’s proteins.
These proteins implement all of the functions of a living organism and determine the
organism’s characteristics. To pass on the information, the cell reproduces by making
copies of itself, and in doing so passes all of the information it carries on to the
daughter cells. Before the cell can reproduce, it must first replicate its DNA.
Figure 1.1: Mendelian inherited traits as dominant and recessive phenotypes. The parental
generation passes down both the dominant and recessive genes to the first
generation. The dominant (green) and the recessive (blue) phenotype all look
the same in the first generation and show a 3:1 ratio in the second generation.
Table 1.1: A synopsis on the history of genetics
Year  Discovery
1859  Charles Darwin published “On the Origin of Species”.
1866  Gregor Mendel’s paper is published, introducing the concept of inheritance in pairs, with dominant and recessive traits.
1869  Nuclein, now known as DNA, identified by Friedrich Miescher.
1900  Mendel’s theories were independently rediscovered and verified, marking the beginning of modern genetics.
1902  Walter Sutton pointed out the interrelationships between cytology and Mendelism, closing the gap between cell morphology and heredity, and proposed the chromosome theory of heredity.
1905  Nettie Stevens and Edmund Wilson independently described the behaviour of sex chromosomes: XX determines female; XY determines male.
1908  Archibald Garrod proposed that some human diseases are due to "inborn errors of metabolism" that result from the lack of a specific enzyme.
1931  Harriet Creighton and Barbara McClintock determined that genetic recombination is caused by a physical exchange of chromosomal pieces.
1950  Erwin Chargaff discovered a one-to-one ratio of adenine to thymine and guanine to cytosine in DNA samples from a variety of organisms.
1953  James Watson and Francis Crick discovered that DNA is in the shape of a double helix.
1959  François Jacob and Jacques Monod discovered that messenger RNA is the intermediate between DNA and protein.
1966  Genetic code is cracked by a number of researchers.
1977  Fred Sanger developed chain-termination DNA sequencing technology.
1985  Kary Mullis published a paper describing the polymerase chain reaction (PCR), the most sensitive assay for DNA yet devised.
1988  The Human Genome Project began with the goal of determining the entire sequence of DNA composing human chromosomes.
1989  Alec Jeffreys coined the term DNA fingerprinting and was the first to use DNA polymorphisms in paternity, immigration, and murder cases.
1990  Gene therapy and genetically modified foods were introduced.
1995  Automated sequencing was introduced.
1996  First cloning of a mammal performed by Ian Wilmut and colleagues.
2001  Sequence of the human genome released.
2007  Controversies continue over human and animal cloning using stem cell research.
1.3.2 DNA Structure
The structure of DNA lends itself easily to DNA replication. Watson and Crick
discovered that DNA has two strands, and that these strands are twisted into the
shape of a double helix. The two strands of the double helix run in opposite directions,
allowing the structure to simply unzip when replication is to take place. The DNA
structure itself is composed of nucleotide units, each made up of a nucleotide base,
a sugar, and a phosphate. The four nucleotide bases are represented by four letters:
A (adenine), T (thymine), C (cytosine) and G
(guanine) (Figure 1.2). The sequence of these bases provides the variation that yields the
diverse biological differences among human beings and all living creatures, while the
strong backbone of the DNA molecule is formed by the sugar-phosphate
portions of adjacent nucleotides bonded together. The phosphate of one
nucleotide is covalently bonded to the sugar of the next nucleotide (Griffiths et al,
2005; Watson and Crick, 1953; Benham and Mielke, 2005).
1.3.3 Base Pairing
While the backbone of each DNA strand is made of the sugar and phosphate portions of the
nucleotides, the middle part of the double helix is made up of the nitrogenous bases.
Each nitrogenous base pairs with a base on the opposite strand. A base pair is
formed from two complementary nucleotides, a purine and a pyrimidine, bound together
by a process known as hybridisation. The individual nucleotides pair up with their
complementary base through the hydrogen bonds that form between the bases. Two
hydrogen bonds form between adenine (a purine) and thymine (a pyrimidine), so adenine
hybridises only with thymine, and three bonds form between guanine (a purine) and
cytosine (a pyrimidine), so cytosine hybridises only with guanine. This makes G-C base
pairs a little stronger than A-T base pairs (Butler, 2005; Benham and Mielke,
2005).
Though hybridisation is a fundamental property of DNA, the hydrogen bonds may
be broken by elevating the temperature or through chemical treatment, a process
known as denaturation, resulting in single-stranded DNA. This process has allowed
biologists to extract and further examine DNA, to understand DNA’s stability and to
replicate DNA using PCR technology.
Figure 1.2: Top, a GC base pair with three hydrogen bonds. Bottom, an AT base pair with two hydrogen
bonds. Hydrogen bonds are shown as dashed lines. (Adapted from Alberts et al, 2002)
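The base pairing rules described in this section can be sketched in a few lines of code. The example below is an illustration only: the function names complementary_strand and hydrogen_bonds and the six-base example sequence are hypothetical and are not part of the methods used in this thesis. It assumes an input made up only of the letters A, T, G and C.

```python
# Watson-Crick pairing rules: A pairs with T, G pairs with C
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hydrogen bonds formed by each base with its partner:
# two in an A-T pair, three in a G-C pair
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def complementary_strand(strand):
    """Return the strand that hybridises with `strand`.

    The two strands run in opposite directions, so the complement is
    built from the reversed input.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


def hydrogen_bonds(strand):
    """Count the hydrogen bonds holding `strand` to its complement."""
    return sum(H_BONDS[base] for base in strand.upper())


sequence = "ATGCGC"  # hypothetical example sequence
print(complementary_strand(sequence))  # GCGCAT
print(hydrogen_bonds(sequence))        # 2 + 2 + 3 + 3 + 3 + 3 = 16
```

A sequence richer in G and C forms more hydrogen bonds per base pair, which is consistent with G-C pairs being harder to separate during denaturation than A-T pairs.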
1.4 Human Genome Project
With genetics taking larger steps forward in the research world, an international
research project began to take form in the early 1990s. The Human Genome Project
(HGP) was initially headed by James D. Watson, and its primary goal was to determine
the sequence of the bases that make up the human genetic code, DNA (Watson &
Cook-Deegan, 1991). The introduction of new sequencing methods over
the years allowed the HGP to progress further and faster. A working draft of the
human genome was released in 2000 and the sequence was completed
in 2003, with further analysis still being published.
The HGP has stimulated the development of advanced technology for characterising
DNA and studying genes. The HGP has also had a profound impact on our
understanding of health and disease, by enabling researchers to locate and study more
than 10,000 genes that contain instructions for building a human being (Robbins,
1992). The genetic information can also be used to predict an individual’s chance
of inheriting a genetic disease. This deeper understanding of disease processes at
the molecular level helps to determine new therapeutic procedures, improve
population studies and provide more statistical power for drug trials among
populations, establishing the importance of DNA in molecular biology (van Ommen,
2002).
The HGP has considerably improved the study of human genetic disease and animal
model systems by allowing public access to all of its research and by shedding light on
the relatedness of different populations. This caused a transformation in medicine,
as researchers turned to DNA-based determination of individual risk of future illness
or adverse drug response, facilitating individualised preventive medicine (Collins,
2006). Further analysis of similarities between DNA sequences from different
organisms has highlighted the existence of several novel genetic mechanisms, the
impact of which could never have been conceived otherwise, such as genetic
imprinting, and trinucleotide repeat expansion and anticipation (Patrinos & Drell,
1997; Joseph, 1995). In turn, the study of these processes has greatly deepened our
fundamental insights into genetics and is also uncovering new answers in the study of
the theory of evolution.
1.5 DNA Profiling
With research and interest in DNA developing, work began to increase
productivity and knowledge of DNA. The earliest technique, developed by Sir Alec
Jeffreys and his colleagues, was the multi-locus probe (MLP) technique, which used
restriction enzymes to cut DNA into fragments. Because a DNA
‘fingerprint’ was not a direct trace of a person’s DNA, Jeffreys created the MLP
technique to visualise selected sequences of single-stranded DNA and compare
their sizes (Rand et al, 1991; Lynch, 2003). The MLP technique used markers that
bind to an indefinite number of chromosomal sites, resulting in a complex pattern of
bands (Lynch, 2003; Aronson, 2005).
Jeffreys and other supporters of the MLP technique believed that when two samples
were compared, it was virtually impossible, except with identical twins, that an entire
pattern of bands would match, although precise estimates could not be given for the
likelihood that any given band, or number of bands, would match (Lynch, 2003;
Aronson, 2007; McLay, 1996). But the main problem with the interpretation of MLP
band patterns was that the variation in the band intensity was generally independent
of fragment size but dependent on the DNA concentration (Rand et al, 1991).
MLP ‘fingerprints’ were quickly replaced in the late 1980s by the single-locus probe
(SLP) technique. This involved the isolation and marking of a limited number of
non-coding DNA regions known as variable number tandem repeat (VNTR)
sequences. These can be found on many chromosomes, and often show variations in
length between individuals. Selected VNTR sequences were shown to be
hypervariable in the human population, with each variant acting as an inherited allele. These
were marked by means of radioactive probes, allowing them to be used for personal
or parental identification (Jeffreys et al, 1985; Buffery et al, 1991; Schneider et al,
1991; Aronson, 2007). Though it showed less genetic information than the MLP
technique, the SLP technique became more commonly used in forensic cases as it
was found to be more useful than the MLP with degraded and mixed (victim and
perpetrator) DNA samples. Population studies also generated probability
measures for the frequency of SLP patterns in human populations and selected
‘racial’ subpopulations, such as Caucasian, Asian and African (Lynch, 2003; Buffery
et al, 1991).
Although the SLP technique enjoyed the advantages of greater control and more
precise quantification, it also became subject to heated disputes in the courts and
scientific literature. SLP results were presented in probabilistic form, but the
resulting estimates often seemed to predict near absolute identity (Aronson, 2007).
Estimates of the chance that two randomly chosen, unrelated individuals would
share the same combination of alleles in a DNA profile were sometimes lower
than one in hundreds of millions (Aronson, 2007; Lynch, 2003).
Most human identity testing nowadays is performed using the Combined DNA Index
System (CODIS). CODIS was designed to be a system of pointers to help public
US crime laboratories compare and exchange DNA profiles. It consists of two
indexes: the Convicted Offender Index and the Forensic Index. The Convicted
Offender Index contains profiles of individuals convicted of crimes eligible for
CODIS, and the Forensic Index contains profiles developed from biological material
found at a crime scene. Using multiple core short tandem repeats (STRs), usually
10-14 (Westring et al, 2007; Opel et al, 2007), on the autosomal chromosomes, with sex
determination done using markers on the sex chromosomes, CODIS compares the
10-14 STR markers of the DNA found at the crime scene to those already within the
database.
Genes make up only approximately 5% of human genomic DNA (Butler, 2005;
Nusbaum et al, 2005), and so markers used for human identity testing are found in the
non-coding regions either between or within genes and thus do not code for genetic
variation (Schneider, 1997). The STR markers used in CODIS are compared as
alleles at the same loci on pairs of chromosomes within the genetic sample.
A copy of each gene resides at the same locus on each chromosome of a homologous
pair. If the two alleles at a genetic locus on homologous chromosomes are the same
size, they are termed homozygous. The alternative possibility is for the two
alleles at a genetic locus on homologous chromosomes to be different; these
alleles are termed heterozygous (Griffiths et al, 2005; Starr, 2005). A genotype is a
characterisation of the alleles present at a genetic locus. If there are two alleles at a
locus, 1 and 2, then there are three possible genotypes: 1,1; 1,2; and 2,2, with 1,1 and 2,2
being homozygous and 1,2 being heterozygous.
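The enumeration of genotypes described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical and the allele labels 1 and 2 are the ones used in the text.

```python
from itertools import combinations_with_replacement


def possible_genotypes(alleles):
    """Return every unordered pair of alleles that can occur at one locus."""
    return list(combinations_with_replacement(alleles, 2))


def is_homozygous(genotype):
    """A genotype is homozygous when both alleles are the same."""
    return genotype[0] == genotype[1]


# Two alleles at a locus, labelled 1 and 2 as in the text.
genotypes = possible_genotypes([1, 2])
for genotype in genotypes:
    label = "homozygous" if is_homozygous(genotype) else "heterozygous"
    print(genotype, label)
```

With n alleles at a locus the same function returns n(n+1)/2 genotypes, which is why highly polymorphic loci are more informative for identity testing.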
DNA profiling uses this process of determining the genotype present at specific
locations along the DNA molecule. Polymorphic markers that differ among
individuals can be found throughout the non-coding region of the human genome.
Multiple loci from these areas are typically examined in human identity testing to
reduce the possibility of a random match between unrelated individuals (Ania et al,
2002; Aronson, 2007; Buffery et al, 1991).
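Why examining multiple loci reduces the chance of a random match can be illustrated with the product rule. The sketch below is not the method used in this thesis: it assumes that the loci are independent and that genotype frequencies follow Hardy-Weinberg proportions, and every allele frequency in it is a made-up value chosen for illustration.

```python
from math import prod


def genotype_frequency(p, q=None):
    """Expected genotype frequency under Hardy-Weinberg proportions.

    Homozygote (q omitted): p * p. Heterozygote: 2 * p * q.
    """
    return p * p if q is None else 2 * p * q


def random_match_probability(locus_frequencies):
    """Multiply per-locus genotype frequencies (assumes independent loci)."""
    return prod(locus_frequencies)


# Hypothetical three-locus profile; the allele frequencies are invented.
profile = [
    genotype_frequency(0.1, 0.2),   # heterozygote at locus 1
    genotype_frequency(0.3),        # homozygote at locus 2
    genotype_frequency(0.05, 0.5),  # heterozygote at locus 3
]
rmp = random_match_probability(profile)
print(f"Random match probability: {rmp:.6f}")
```

Each additional locus multiplies the result by a value below one, so the match probability can only fall as more loci are typed.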
1.6 Population Variation
Genetic variation is one facet of the more general concept of phenotypic variation.
Phenotypic variation describes differences in the characteristics of individuals of a
population and is of interest to biologists because it is what natural selection acts upon;
different phenotypes may have different fitnesses, and selection results in fitter
phenotypes leaving more descendants. Phenotypic variation arises from either of two
sources: genetic variation and environmental variation. However, only differences
that arise from genetic variation can be passed on to future generations.
Despite the physical variation observed among humans worldwide, there is
surprisingly little difference in DNA content between humans. Individuals of
different ethnicities have over 99.7% of their DNA in common. Only a small fraction
of our DNA (0.3%) differs between populations, and an even smaller amount among
subpopulations (Butler, 2005; Romualdi, 2001; Mooser et al, 1994). This variation is evident
from the fact that, with the exception of identical twins, we all appear different from
each other. Hair colour, eye colour, height and shape all reflect alleles in our
genetic make-up. These variable regions of DNA provide the capability of using
DNA information for human identity purposes. DNA variation can be exhibited in
two different ways: sequence polymorphism or length polymorphism (Butler,
2005) (Figure 1.3). Polymorphisms are natural variations in a gene, DNA
sequence, or chromosome and usually occur with fairly high frequency within the
general population (Dawkins, 1999). Variation in DNA sequence among the
individuals of a population is considered a useful polymorphism for
genetic linkage analysis, giving researchers more DNA to examine and a higher
chance that two unrelated individuals will differ in a greater number of
genotypes (Schneider, 1997).
1.7 Principles of Population Genetics
As biologists identified more and more polymorphic markers, the question of how
populations are related made its way into genetic research. Population genetics studies
inherited variation and its modification over time. It was, and still is, an attempt to
quantify the variation observed within a population group or between different population
groups in terms of allele and genotype frequencies (Hammer et al, 1997; Nei, 1972).
The simplest description of variation is the frequency distribution of genotypes. A
measure of this variation is the number of heterozygous individuals present in a
population. Variability within a locus has to be stable enough for alleles to be passed
accurately to the next generation, yet not so stable that only a few alleles would
exist over time and the locus would become less informative, losing variability
such as heterozygosity (Perna et al, 1992; Shen, Batzer & Deininger, 1991).
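Heterozygosity as a measure of variation can be made concrete with a short sketch. The observed value is simply the proportion of heterozygous individuals; the expected value uses the standard formula of one minus the sum of squared allele frequencies, which is not stated in the text and is given here as background. The sample genotypes are hypothetical.

```python
from collections import Counter


def observed_heterozygosity(genotypes):
    """Proportion of individuals whose two alleles differ."""
    heterozygotes = sum(1 for a, b in genotypes if a != b)
    return heterozygotes / len(genotypes)


def expected_heterozygosity(genotypes):
    """1 minus the sum of squared allele frequencies estimated from the sample."""
    allele_counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(allele_counts.values())
    return 1.0 - sum((count / total) ** 2 for count in allele_counts.values())


# Hypothetical sample of four individuals typed at one bi-allelic locus.
sample = [(1, 1), (1, 2), (1, 2), (2, 2)]
print(observed_heterozygosity(sample))
print(expected_heterozygosity(sample))
```

A locus with a single fixed allele scores zero on both measures, which is the loss of informativeness the paragraph above describes.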
Figure 1.3: Illustration of a sequence polymorphism, which is a mutation resulting in a difference of a single base
pair, and a length polymorphism, which is a mutation that changes the number of repeated fragments within a
sequence (adapted from Butler, 2005).
Sequence Polymorphism
ATCGCGTAGACGATTCGG
ATCGCGGAGAAGATTCGG
Length Polymorphism
ATCGCG(GGCT)(GGCT)-----------ATTCGG
ATCGCG(GGCT)(GGCT)(GGCT)ATTCGG
Population genetic forces including mutation, gene flow, natural selection, and
random genetic drift all affect the frequency of alleles present in a population. This
can be seen over time in isolated populations. Once populations have diverged from one
another, population size decreases, so that members all descend from a small number of
individuals and therefore have limited genetic variation, losing genetic distinction
between each other (Arcos-Burgos, 2002). The gene pool is smaller in
isolated groups and therefore not as much shuffling of genes occurs (Middleton,
2000).
1.8 Gene Clusters
Gene clusters have also been found to be very useful in the study of population variation. A
gene cluster is a set of two or more genes that encode the same or similar
products. Gene clusters exist all over the genome in every organism, each playing
important roles in body development, body functionality and immunity (Singer and
Berg, 1997). A common and important gene cluster is the Hox cluster. The function of
the Hox genes is to determine where limbs and other body segments will grow in a
developing foetus or larva (Griffiths, 2005; Singer and Berg, 1997). Mutations in any
of the Hox genes can lead to growth of extra, typically non-functioning body parts in
invertebrates, while in humans they usually cause deformation of the hands and feet or
may result in miscarriage (Goodman and Scambler, 2001).
The human alpha-globin gene cluster, found on the short arm of chromosome 16, and the
beta-globin gene cluster, found on the short arm of chromosome 11, are other important
clusters in the genome. These genes have a role in
the formation of haemoglobin and allow haemoglobin to adjust its oxygen-binding
capacity according to the oxygen concentration of its environment (Efstratiadis et al,
1980). It is beta-globin that is altered in human sickle-cell anaemia, while without
sufficient normal alpha-globin proteins, individuals can develop alpha-thalassaemia, a
potentially life-threatening form of anaemia. In areas where malaria is
widespread, however, mutation of the alpha- and beta-globin clusters is an advantage and
protects the individual from severe infection.
The above mutations have been known to spread in some populations, but it is
unknown why one mutation may be more frequent in one population and not in
another. However, they are useful for tracing back recent evolutionary history, as
common ancestors tend to possess the same varieties of gene clusters and genetic
mutations (Schneider, 1997; Singer and Berg, 1997).
1.9 Major Histocompatibility Complex
A ‘gold standard’ for population genetics is found on the short arm of chromosome 6.
The major histocompatibility complex (MHC) gene cluster is the largest region or gene
family found in most vertebrates; in humans it is a complex collection of genes
clustered closely on chromosome 6 (Kulski et al, 2002) (Figure 1.4). One of the most
striking features of the MHC, particularly in humans, is the high gene density. “This
clustering is considered to be biologically and evolutionarily significant and has been
attributed to selection pressure, possibly supporting the co-ordinated expression
and/or matching of allelic forms in cis and the suppression of recombination” (Kulski
et al, 2002). The MHC is the most polymorphic region in the genome and
contains genes that are highly duplicated. This duplication is responsible for
much of the genetic diversity (Mungall, 2003; Dawkins et al, 1999; Anzai, 2003;
Takasu, 2007; Hedrick, 2002).
Population surveys of the classical loci routinely find tens to a hundred alleles,
which is still highly diverse. Perhaps even more remarkable is that many of these alleles are
quite ancient. It is often the case that an allele from a particular MHC gene is more
closely related to an allele found in chimpanzees than it is to another human allele
from the same gene (Gagneux & Varki, 2001; Muchmore, 2001).
The MHC is divided into 3 major sub-regions, Class II, Class III (central MHC) and
Class I, centromeric to telomeric. Strong linkage disequilibrium exists across the
MHC particularly among alleles of specific multilocus haplotypes and between
particular genes (Begovich et al, 1992; Rajsbaum, 2002; Dunn et al, 2005; Dunn et al,
2006; Mungall, 2003). An increased list of polymorphisms identified in both the
intragenic and intergenic regions of the MHC genomic region will permit rapid
identification of changes that can be localised to small segments (Dunn, 2005).
Figure 1.4: The human major histocompatibility complex (MHC). This group of genes resides on chromosome 6,
and encodes cell-surface antigen-presenting proteins and many other genes. [Diagram labels: HLA-A, HLA-C,
HLA-B, HLA-DR, HLA-DQ, HLA-DP; bands 21.32, 21.31 and 21.2 on the p arm; centromere.]
Other markers have also been used in studying MHC haplotype variation, such as the
polymorphic MHC-related genes, microsatellites, SNPs, and polymorphic Alu
insertions (POALINs) (Walsh, 2003; Dunn, 2005); all are informative genetic
markers in lineage analysis, hitchhiking effects, population genetics and evolutionary
relationships, especially in studying the MHC genomic region (Begovich et al, 1992;
Dunn et al, 2005; Skaug, 2001; Wakeley, 2001; Leelayuwat et al, 1994).
1.10 MHC Ancestral Haplotypes
Studies within the MHC have usually focused on groups of haplotypes referred to as
ancestral or extended haplotypes. The initial definition of a haplotype arose from the
recognition that serologic patterns segregated within families. Allelic products of
closely linked genes were assumed to be inherited en bloc, as a unit, unless there was
specific evidence of recombination between the genes (Degli-Esposti et al, 1992;
Yunis et al, 2003). It soon became obvious that haplotypes described in one
family study were similar or identical to those found in other families, suggesting the
possibility that there could be some remote ancestral relationship between different
families and that haplotypes had been maintained en bloc over many generations
(Degli-Esposti et al, 1992; Martins et al, 2007). Ancestral haplotypes are relatively
population specific and are believed to represent the original MHC haplotypes of our ancestors,
which are still segregating unchanged (Gaudieri et al, 1997).
The existence of ancestral haplotypes implies conservation of large chromosomal
segments (Degli-Esposti et al, 1992). Irrespective of the mechanisms involved in
preservation of ancestral haplotypes, it is clear that these haplotypes carry several
MHC genes, other than Human Leukocyte Antigens (HLA) which may be relevant to
antigen presentation, autoimmune responses, and transplantation rejection
(Degli-Esposti et al, 1992; Marounger, 1999). The ability to recognise intact and
recombinant ancestral haplotypes enables an approach to mapping the MHC genes
(Marounger, 1999).
1.11 SNPs vs Indels
Single nucleotide polymorphisms (SNPs) are the most common type of genetic
variation among people. Each SNP represents a difference in a single DNA building
block, called a nucleotide (Kamahori, 2002; Aoki et al, 2003). For example, a SNP
may replace the nucleotide cytosine (C) with the nucleotide thymine (T) in a certain
stretch of DNA (Figure 1.5).
SNPs occur normally throughout an individual’s DNA. They occur once in every 300
nucleotides on average, which means there are roughly 10 million SNPs in the human
genome (Griffiths et al, 2005; Kamahori, 2002; Ting et al, 2006). Most commonly,
these variations are found in the DNA between genes. They can act as biological
markers, helping scientists locate genes that are associated with disease (Aoki et al,
2003; Kirk et al, 2002). When SNPs occur within a gene or in a regulatory region
near a gene, they may play a more direct role in disease by affecting the gene’s
function.
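The figure of roughly 10 million SNPs follows from simple arithmetic, sketched below. The genome size of about 3 billion base pairs is an assumption added here for the calculation; the spacing of one SNP per 300 nucleotides is the value quoted in the text.

```python
# Assumption: approximate size of the haploid human genome in base pairs.
GENOME_SIZE_BP = 3_000_000_000
# From the text: on average one SNP in every 300 nucleotides.
SNP_SPACING_BP = 300

estimated_snps = GENOME_SIZE_BP // SNP_SPACING_BP
print(f"Estimated number of SNPs: {estimated_snps:,}")
```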
Another form of bi-allelic polymorphism is the insertion-deletion
(indel). An indel can be the insertion or deletion of a segment of DNA ranging from
one nucleotide to hundreds of nucleotides (Figure 1.6). The two alleles for bi-allelic
indels can simply be classified as ‘short’ and ‘long’ (Weber et al, 2002). James
Weber and colleagues at the Marshfield Medical Research Foundation recently
characterised over 2000 bi-allelic indels in the human genome (Weber et al, 2002). A
total of 71% of these indels possessed 2, 3, or 4 nucleotide length differences, with
only 4% having greater than a 16 nucleotide length difference (Butler, 2005).
Short Interspersed Nuclear Elements are just one of the types of indels, and make up the
majority of the “short” indels found within the genome. Although SNPs can
provide much detail about evolutionary history, indels are, because of their size, easier
to type; they have already been found useful in genetic studies and have found a
use in forensic identity testing (Weber et al, 2002; Ye et al, 2002).
Figure 1.5: Illustration of a SNP. DNA strand 1 differs from DNA strand 2 at a single base-pair
location. SNPs occur in members of the same group, showing variation in their DNA
sequence. [Diagram labels: C, G, T, A; SNP; strands 1 and 2.]
Figure 1.6: Illustration of an insertion and a deletion at the chromosome level. [Diagram panels: before
insertion, after insertion, before deletion, after deletion.]
1.12 Retroelements and Repeat Sequences
Indels can also arise through RNA-mediated movement of genetic information
from one locus to another, a process known as retrotransposition; the transposed
genetic information is termed a retroelement.
Retroelements comprise a substantial portion of the human genome, and can be
classified into two groups. Members of the first group are called retroposons or
retrosequences and include SINE elements and processed pseudogenes (Griffiths et al,
2005). The second group, retroelements that may themselves have the capacity to
transpose, includes nonviral elements such as LINEs, as well as endogenous retroviruses
and retroviral elements with structural analogies to infectious retroviruses
(Leib-Mösch, 1996; Smit, 1996).
Repeat sequences are assumed to influence the genomic stability and to generate hot
spots for recombination. Since closely related retroelements are dispersed in high
copy numbers throughout the human genome, it is conceivable that these sequences
could be involved in unequal crossovers between two related elements at different
chromosomal locations (Leib-Mösch, 1996; Smit, 1996), leading to DNA
rearrangements, such as deletions, inversions, duplications, and translocations.
1.13 SINEs – Short Interspersed Nuclear Elements
Almost half of the human genome is derived from transposable elements. The vast
majority of these transposable elements are SINEs or long interspersed nuclear
elements (LINEs) (Griffiths et al, 2005). LINEs move by retrotransposition with the
use of an element encoded reverse transcriptase, but lack some structural features of
retrovirus-like elements (Okada et al, 1991). SINEs can be best described as
nonautonomous LINEs, because they have the structural features of LINEs but do
not encode their own reverse transcriptase.
Presumably, they are mobilised by reverse transcriptase enzymes that are encoded by
LINEs that reside in the genome.
SINEs in the human genome are sequences approximately 300 bp in length derived
from the 7SL RNA gene. SINEs are inserted into DNA at different locations by
retrotransposition, a mechanism in which an RNA transcript expressed from one of a
possible 100 Alu master copy sequences is reverse transcribed into complementary DNA,
which is then inserted into a new position in the genome (Jurka,
2002; Batzer, 1994).
1.14 Polymorphic Alu Insertions (POALINs)
Alu sequences are the largest family of SINEs in humans and other primates, with
more than a million copies per haploid genome. Alu sequences were ancestrally
derived from the 7SL RNA gene and are thought to mobilise in a process termed
retroposition (Batzer et al, 1994; Leib-Mösch & Seifarth, 1996;
Kulski & Dunn, 2005). Once inserted at specific chromosomal locations, most Alu
elements do not appear to be subject to loss or rearrangement, with less than 0.5% of
the Alu elements reported to be polymorphic (Arcot et al, 1996; Batzer & Deininger,
2002; Comas et al, 2004; Dunn et al, 2007). Also, the generation of new Alu insertions
by retrotransposition is a rare event, making them stable genetic markers, and their
allele frequency distribution varies between geographically distinct human populations
(Antunez-de-Mayolo et al, 2002).
POALINs have several desirable properties for studying genetic variation in human
populations. The non-radioactive, PCR-based detection method for these
polymorphisms makes it feasible to rapidly screen large numbers of DNA samples (Batzer,
1994). The Alu insertions also appear to have a relatively stable integration into the
genome and are rarely deleted. Even when deletion of an Alu element occurs, the
deletion is not a precise excision of the Alu element, but rather it leaves behind a
signature of the original insertion event (Batzer, 1994; Dunn, 2005).
The rate of insertion and fixation of new Alu elements is about 100-200 per million
years, so the independent insertion of two different Alu elements at the same location
in the genome has essentially no chance of occurring (Arcot et al, 1996; Comas, 2001;
Kulski et al, 2002). Therefore, individuals who share POALINs inherited them from
a common ancestor, making POALINs identical by descent (Paabo et al, 2001; Batzer,
1994; Kass, 2006). Because independent Alu insertions at different loci within
the same haplotype are extremely unlikely, haplotypes with multiple POALIN
sites (two or more) have most probably arisen by recombination of haplotypes with
single but different polymorphic Alu elements (Figure 1.7) (Dunn, 2005; Perna, 1992).
POALINs clearly represent an ongoing evolutionary process in the human genome,
and the Alu family of repeats represent a unique source of genetic variation for
human population genetics and forensic identity testing (Batzer et al, 1994).
Though POALINs provide a strong source of genetic variation, their
reliance on large datasets can be a limitation for researchers: in order to
get an accurate interpretation of a large population, a large sample size must be tested.
It is also hard to detect one new insertion among one million pre-existing elements in
the genome (Cordaux et al, 2007). The major disadvantage is that a variety of
particular POALINs are absent from all non-African populations, which reduces
the genetic diversity that can be observed worldwide and excludes those POALINs from
population comparisons (Cordaux et al, 2007).
1.15 Profiling Ethnicity in Forensic DNA
The genetic differences between subpopulations are very important. Shared ancestry
can cause a defendant’s DNA profile to be more common among individuals from the
same subpopulation, and this subpopulation will often include some, perhaps even
most, of the alternative possible culprits (Ayres et al, 2002; Triggs and Buckleton,
2002). If possible, population databases for use in forensic DNA testing should
contain unrelated individuals of known ethnicity. However, this may not be
completely possible in a practical sense, as many laboratories are required to use
samples that have been made anonymous prior to study (Butler, 2005).
Figure 1.7: Structural comparison of a retrovirus to types of transposable elements in the human genome.
[Diagram columns: Element, Transposition, Structure, Type. Elements: DNA transposons, retrovirus, LINEs,
SINEs (Alu). Transposition: autonomous or nonautonomous. Structure labels: transposase; LTR, gag, pol, env;
ORF1, ORF2 (pol); AAA; insertion; adenine-rich segments; full length. Alu types: AluJ, AluSx, AluSq, AluSp,
AluSc, AluY, AluYa5, AluYa8 and AluYb8.]
In addition, categories of ethnicity are often subjective and may be based on
perceived phenotype or cultural classification. Broad ethnic categories are usually
adequate for most forensic databases, unless an isolated population is of interest
(Ayres et al, 2002; Triggs and Buckleton, 2002). Sampled individuals may also have
more than one easily definable ethnic background and may prefer to be grouped
differently from a cultural standpoint than they might otherwise be biologically.
Finally, people who have been adopted or conceived through in vitro fertilisation may
not know their genetic heritage (Butler, 2005).
All of these can lead to potential bias against the defendant, but this can be overcome by
assuming that all alternative possible culprits have the same ethnic background and
are from the same subpopulation as the defendant (Ayres et al, 2002; Triggs and
Buckleton, 2002). Examination of allele frequencies observed in different sample
sets from around the world has shown small differences between individuals of the
same population, and a distinguishable difference between different populations,
providing as much accuracy to the forensic testing as possible. There is still a
potential difficulty with the ‘same subpopulation’ approach, however, in that allele frequency
estimates for the defendant’s subpopulation may not be available (Ayres et al, 2002;
Triggs and Buckleton, 2002; Butler, 2005).
1.16 Previous Population Studies
A variety of studies have indicated that using statistical clustering
techniques to examine genetic information may allow for geographically based
grouping of individuals that tenuously maps onto some conceptions of ethnicity
(Zyphur, 2006; Paabo, 2001). However, these studies have also indicated that the amount of
genetic variation within these groupings is significantly larger than the variation that
exists between them.
Many population studies divide the world into three primary ancestral groups: African,
Asian and Caucasian, roughly representing the populations around the world (Comas
et al; Antunez-de-Mayolo, 2002). These categories not only can be hard to
distinguish from ‘race’, but they also usually ignore the overlap between groups and
the continuous nature of the way people and genes spread today (Comas et al, 2004).
There is scope for further research into a wide variety of uncharted populations as well
as the subpopulations that lie within them.
1.17 Hardy-Weinberg Equilibrium
When it comes to population studies, researchers cannot visually examine the extent
to which alleles tend to be inherited together. The simplest way to test the
independence of alleles within a locus is to use the Hardy-Weinberg principle.
G.H. Hardy and W. Weinberg independently suggested a scheme whereby evolution
could be viewed as changes in the frequency of alleles in a population of
organisms. They argued that when certain conditions were met (a large breeding
population, random mating, no mutation, no migration and no natural selection), the
population’s allele and genotype frequencies would remain constant from generation
to generation (Price, 1971; Aronson, 2007).
Checking for the Hardy-Weinberg Equilibrium (HWE) is performed by taking the
observed allele frequencies and calculating the expected genotype frequencies based
on those allele frequencies (Price, 1971). If the observed genotype frequencies are
close to the expected genotype frequencies calculated from the observed allele
frequencies, then the population is in Hardy-Weinberg Equilibrium and allele
combinations are assumed to be independent of one another (Aronson, 2007).
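The check described above can be sketched for a bi-allelic locus such as an Alu insertion. This is a minimal illustration rather than the software or procedure used in this thesis: the genotype counts are hypothetical, the function names are invented, and the chi-square comparison with a critical value of 3.841 (one degree of freedom, 5% level) is one common way of judging whether observed and expected values are "close".

```python
def allele_frequencies(n_aa, n_ab, n_bb):
    """Estimate allele frequencies p (allele A) and q (allele B) from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    return p, 1.0 - p


def expected_counts(n_aa, n_ab, n_bb):
    """Expected genotype counts under Hardy-Weinberg: p^2, 2pq and q^2 times n."""
    n = n_aa + n_ab + n_bb
    p, q = allele_frequencies(n_aa, n_ab, n_bb)
    return p * p * n, 2 * p * q * n, q * q * n


def chi_square(observed, expected):
    """Pearson chi-square statistic comparing observed with expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for 54 individuals: insertion present on both
# chromosomes, on one chromosome, and on neither.
observed = (30, 20, 4)
expected = expected_counts(*observed)
statistic = chi_square(observed, expected)

CRITICAL_VALUE = 3.841  # chi-square, 1 degree of freedom, 5% significance level
print("expected counts:", [round(e, 2) for e in expected])
print("chi-square:", round(statistic, 4))
print("consistent with HWE:", statistic < CRITICAL_VALUE)
```

For small samples an exact test is often preferred over the chi-square approximation, because expected counts for the rarer homozygote can be very low.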
1.18 Bedouin Culture
Bedouin derives from the word badawi, meaning an inhabitant of the Arabian and
Syrian Desert. This isolated population has divided itself among numerous
tribes spread out across the deserts of the Arabian Peninsula (Losleben, 2002). These
nomads of the desert travel from oasis to oasis, using the resources as they grow
naturally. Their origin is unknown, but many have thought that the Bedouin are
descended from nomads who herded cattle at a time when the climate was milder
(Losleben, 2002; Abu-Rabia, 2002). The Bedouin themselves, however, believe they are
the descendants of Shem, son of Noah. The Bedouin people are very tribe oriented
and even with strong westernisation they still maintain their cultural customs.
However, due to their deeply rooted custom of consanguineous marriage, the
Bedouin suffer from genetic diseases at a higher rate than an average population.
They do not carry more genetic mutations than the general population, but because
more than 50% of the population marry consanguineously, with almost two thirds of the
consanguineous matings being between first cousins, they have a significantly higher
chance of marrying someone who carries the same mutations (Sheffield, 1998; Hsien, 2006).
This persists because many in the Bedouin culture still travel in “goum”,
which generally consist of members of a single descent group.
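The effect of first-cousin marriage on recessive disease risk can be illustrated with a standard population genetics formula. It is not taken from this thesis and is given here only as background: for a recessive allele of frequency q, the probability that a child is homozygous is q² + F·q·(1 − q), where F is the inbreeding coefficient (1/16 for the child of first cousins, 0 for unrelated parents). The allele frequency used is hypothetical.

```python
FIRST_COUSIN_F = 1 / 16  # inbreeding coefficient for offspring of first cousins


def homozygous_recessive_risk(q, inbreeding_coefficient=0.0):
    """Probability that a child carries two copies of a recessive allele of frequency q."""
    return q * q + inbreeding_coefficient * q * (1.0 - q)


# Hypothetical recessive disease allele with a frequency of 1%.
q = 0.01
risk_unrelated = homozygous_recessive_risk(q)
risk_first_cousins = homozygous_recessive_risk(q, FIRST_COUSIN_F)

print(f"unrelated parents:    {risk_unrelated:.6f}")
print(f"first-cousin parents: {risk_first_cousins:.6f}")
print(f"relative risk:        {risk_first_cousins / risk_unrelated:.2f}")
```

The rarer the allele, the larger the relative increase, which is consistent with the point above that the population need not carry more mutations for disease rates to rise.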
1.19 Aims for this Thesis
Previous population studies have generally focused on three main population groups:
Caucasian, Asian and African. Subpopulations are then categorised into these three
main ethnic groups. To begin an understanding of subpopulations and their
connections I focused my research on the Bedouin population found in the deserts of
the United Arab Emirates. Arab populations are considered a subpopulation of
Caucasian ethnicity, but are they genetically similar to Caucasians?
A variety of markers have been used from STRs to microsatellites in order to
compare populations and subpopulations. In this experiment my work focused on the
study of the MHC Class I POALINs. POALINs have been emerging over the last few
years and have been very useful in previous population studies and have also been
found to be important polymorphic markers in ongoing population and disease studies.
The four POALINs used in this experiment were previously researched by Dr. David
Dunn for his PhD completed in 2005. My aim was to further characterise the
reliability of the four POALINs, to determine their frequency distribution within
the Bedouin population, and to compare these frequencies with those of other known
populations in order to understand the evolutionary history of the Bedouin population
and to determine the population category in which they fit.
Further research into subpopulations can also help to improve the technology currently in
place to understand and identify separate populations. In the area of forensic
science, databases are used in order to compare DNA found at a crime scene to profiles
previously tested. In order to get the most accurate results, 13 STR markers are used;
in addition, the DNA is compared to a particular population database. It is at this point
that the accuracy of the test falls, as placing an individual into a population
category is usually based on phenotype rather than genetics, and may result in an
inaccurate comparison if the individual belongs to a population other than the one
they are compared to. Researching further into the POALINs and the connection
between subpopulations may assist in enabling forensic testing to accurately
determine the population of the individual to be identified.
My aim is to assist in continuing improvement of the forensic science DNA database,
by proving the reliability of the POALINs in population research, and to also begin
research into the genetics of the Bedouin population, to understand their genetic flow,
as well as lead to research to understand the particular diseases that affect their
population.
Chapter 2 Materials and Method
2.1. Genomic DNA
Whole blood was drawn from 54 unrelated healthy Bedouin individuals following
standard procedures and after ethics approval from Dubai HE approval information.
The DNA was then extracted using the High Pure Viral Nucleic Acid Kit (Roche
Applied Science, Indianapolis, IN, USA). 300μl of whole blood from each sample
was mixed with 200μl of binding buffer to lyse the cells and release the DNA,
and 40μl of Proteinase K to target and digest the protein.
100μl of isopropanol was added to remove residual amounts of protein. 500μl of
Inhibitor Removal Buffer (5M guanidine-HCl, 20mM Tris-HCl, pH 6.6) was then
added to remove lipids in the mixture. The DNA was then washed with wash
buffer (20mM NaCl, 2mM Tris-HCl, pH 7.5) and centrifuged twice. The DNA was
then washed using cold 70% ethanol, centrifuged and the supernatant was discarded,
leaving purified template DNA that was diluted in TE Buffer (1mM EDTA, 10mM
Tris-HCl, pH 7.5) to a concentration of approximately 20ng/μl. 4μl was used for each
polymerase chain reaction (PCR) assay.
2.2. POALIN PCR Assay
2.2.1 PCR Reaction
The presence or absence of the Alu insertion at each of the four loci was distinguished
by the different sizes of the PCR product for each primer pair (Table 2.1). For primers
AluyHJ, AluyHF and AluyMICB the PCR solution (20μl) contained 80ng of DNA template,
10pmol of each primer, 25nmol of each deoxyribonucleotide triphosphate (dNTP), 0.4 units
of FastStart Taq Polymerase (Roche Applied Science, Indianapolis, IN, USA), 3mM of MgCl2
and 2μl of 10xPCR Buffer (600 mM Tris-HCl pH 8.3, 250 mM KCl, 1% Triton X100, 100 mM
β-mercaptoethanol). The AluyTF reaction included 40ng of DNA template, 5pmol of each
primer, 0.4 μl of each dNTP, 0.5 units of FastStart Taq Polymerase, 1μl of MgCl2 and 1μl
of 10xPCR Buffer (600 mM Tris-HCl pH 8.3, 250 mM KCl, 1% Triton X100, 100 mM
β-mercaptoethanol).
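The template masses quoted above follow from the stock concentration of approximately 20ng/μl. The short Python sketch below is only an illustration of that arithmetic; the function name and its default value are assumptions introduced here, not part of the laboratory protocol.

```python
def template_volume_ul(mass_ng: float, stock_ng_per_ul: float = 20.0) -> float:
    """Volume of stock DNA (in microlitres) that delivers mass_ng of template.

    stock_ng_per_ul defaults to the approximate 20 ng/ul dilution described
    in Section 2.1; the real concentration of each sample may differ.
    """
    if stock_ng_per_ul <= 0:
        raise ValueError("stock concentration must be positive")
    return mass_ng / stock_ng_per_ul


# 80 ng template (AluyHJ, AluyHF, AluyMICB reactions) and 40 ng (AluyTF reaction)
volume_80ng = template_volume_ul(80.0)
volume_40ng = template_volume_ul(40.0)
print(volume_80ng, volume_40ng)
```

Under this assumption the 80ng reactions correspond to the 4μl of template stated in Section 2.1, and the 40ng AluyTF reaction to half that volume.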
2.2.2 Cycling Conditions
Each reaction was performed using a DNA Engine Tetrad Thermal Cycler (Bio-Rad
Laboratories, Hercules, CA, USA) with a hot start at 95°C for 10 mins for 1 cycle to
activate the FastStart Taq, followed by 35 cycles of denaturation at 95°C for 30 secs,
annealing at 59°C for AluyMICB and AluyHF, 55.1°C for AluyHJ and 56°C for
AluyTF, and extension at 72°C for 45 secs. A final extension step at 72°C for
10 mins completed the program.
2.2.3 Electrophoresis
The reaction products were analysed by horizontal gel electrophoresis (Sub-Cell
Model 192, Bio-Rad Laboratories, Hercules, CA, USA) in 1.5% agarose, using running
buffer containing ethidium bromide. Fragments of different sizes were produced for
the presence or absence of the POALINs (Figure 2.1): a single fragment, of a different
size for each of the two homozygous genotypes, and two fragments for the heterozygous
genotype. Two Caucasian DNA samples from the Busselton Research Foundation were used
as positive controls, one homozygous for the absence and the other homozygous for the
presence of the Alu insertion.
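The band-reading rule described above (one small band, one large band, or both) can be sketched in Python as follows. This is a hypothetical illustration rather than software used in the study: genotypes were read by eye from the gel photographs. The fragment sizes are those listed in Table 2.1, while the function name and the 25bp matching tolerance are assumptions made for the example.

```python
# Expected fragment sizes in bp from Table 2.1:
# (allele*1 = Alu absent, allele*2 = Alu present)
FRAGMENT_SIZES = {
    "AluyMICB": (502, 664),
    "AluyTF": (422, 710),
    "AluyHJ": (163, 501),
    "AluyHF": (458, 605),
}


def call_genotype(locus, band_sizes_bp, tolerance_bp=25):
    """Return '1,1', '1,2' or '2,2' from estimated band sizes, or None if no call.

    tolerance_bp is an assumed allowance for sizing error on an agarose gel.
    """
    absent_bp, present_bp = FRAGMENT_SIZES[locus]
    has_absent = any(abs(b - absent_bp) <= tolerance_bp for b in band_sizes_bp)
    has_present = any(abs(b - present_bp) <= tolerance_bp for b in band_sizes_bp)
    if has_absent and has_present:
        return "1,2"  # heterozygote: both bands visible
    if has_absent:
        return "1,1"  # homozygous for the absence of the insertion
    if has_present:
        return "2,2"  # homozygous for the presence of the insertion
    return None  # no amplification, or bands outside the expected sizes


print(call_genotype("AluyMICB", [500, 660]))
print(call_genotype("AluyHJ", []))
```

Returning None for lanes without a recognisable band keeps failed amplifications out of the genotype counts, which mirrors the differing sample numbers (n) reported for each locus.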
2.3 Genotype and Phenotype Analysis
The observed allele frequencies were obtained using the gene counting method
(Ceppellini, Siniscalco et al. 1955). In this method the copies of each of the two
alleles, A (frequency p) and a (frequency q), are counted across all individuals: every
AA individual carries 2 copies of A and every Aa individual carries 1 copy of A. The
count is then expressed relative to all the genes in the population by dividing by the
total number of alleles present in the sample population (2 x number of individuals).
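The gene counting calculation can be sketched in Python as below. This is an illustrative sketch, not the procedure used in the study; the function and argument names are assumptions. The genotype counts in the example (29, 10 and 1) are the AluyMICB counts reported in Table 3.1.

```python
def allele_frequencies(n_11, n_12, n_22):
    """Gene counting: allele frequencies (p, q) from genotype counts.

    n_11: individuals homozygous for allele*1
    n_12: heterozygous individuals
    n_22: individuals homozygous for allele*2
    """
    total_alleles = 2 * (n_11 + n_12 + n_22)  # two alleles per individual
    if total_alleles == 0:
        raise ValueError("at least one individual is required")
    p = (2 * n_11 + n_12) / total_alleles  # copies of allele*1
    q = (2 * n_22 + n_12) / total_alleles  # copies of allele*2
    return p, q


# AluyMICB genotype counts from Table 3.1
p, q = allele_frequencies(29, 10, 1)
print(round(p, 3), round(q, 3))
```

For these counts the sketch returns the allele frequencies of 0.850 and 0.150 listed for AluyMICB in Table 3.1.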
To determine whether a population is in Hardy-Weinberg equilibrium, an expected
frequency must be determined and compared to the observed frequency of the
population. The Hardy-Weinberg equilibrium equation (p² + 2pq + q² = 1) was
applied to each of the POALINs used. If the population were in Hardy-Weinberg
equilibrium, it would be expected that the calculated frequencies would be similar to
the observed frequencies.
Table 2.1: The primer sequences and product sizes for the PCR amplification of the 4 POALIN loci.

Aluy   Primer       Primer Sequence (5' – 3')                 Size   Accession    Position (b)        Fragment size (bp)
Loci   Name                                                   (bp)   Number (a)                       allele*1   allele*2
MICB   AluyMICB.F   GCC TTC CAA TGC CAT TCA CAG               21     AC006046     38,921-38,941       502        664
       AluyMICB.R   CTC AGC CCT GCT TTC CCA TCT               21     AC006046     38,277-38,297
TF     AluyTF.F     GTG CCT GGT AAA AAT TTA AGA GCT GTA       27     AC005530     7,150-7,177         422        710
       AluyTF.R     TGC ACC CGG CCT AAA ACC ACT GGT T         25     AC005530     7,836-7,859
HJ     AluyHLAJ.F   AAG AAA CCC ATA ACT CAC TTG               21     AP000519     11,430-11,450       163        501
       AluyHLAJ.R   TGT GTC CAG GTT AAA CTT CAG               21     AP000519     11,909-11,929
HF     AluyHF.F     GCC TCA TGG CCT GAA TCT GCC AGT GTC CTT   30     AP000521     124,367-124,396     458        605
       AluyHF.R     GTA ACT GAC GTG CCC TCT ATA GTA TAG TCT   30     AP000521     124,794-124,825
(a) The accession number and (b) position can be found on the National Centre for Biotechnology Information (NCBI) database.
Using the observed allele frequencies, the expected frequency of each homozygous
genotype is calculated by squaring the frequency of the corresponding allele (p² or
q²). From the frequency calculated, the expected number of individuals with that
genotype can be determined (p² or q² x number of individuals in the population). The
expected frequency of heterozygous individuals is then calculated using the equation
2pq and multiplied by the number of individuals in the population to determine the
expected number of individuals with that genotype.
To ensure that the genotype frequencies have been calculated correctly, the sum
p² + 2pq + q² should equal 1. Where the observed genotype frequencies also agree with
these expected values, it suggests that the allele and genotype frequencies are in
Hardy-Weinberg equilibrium. In other words, it can be expected that the allele
frequencies will remain constant over time. However, it does not imply that a
population meeting Hardy-Weinberg equilibrium is not evolving; it merely indicates
that the particular locus being studied is not changing.
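The expected genotype counts described above can be sketched in Python as follows. The sketch is illustrative only; the function name is an assumption, and the example values (p = 0.850, n = 40) are the AluyMICB allele frequency and sample size reported in Table 3.1.

```python
def hwe_expected_counts(p, n):
    """Expected genotype counts under Hardy-Weinberg equilibrium.

    p: frequency of allele*1 (q is taken as 1 - p)
    n: number of individuals typed at the locus
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    q = 1.0 - p
    return {
        "1,1": n * p * p,      # homozygous allele*1: p^2 x n
        "1,2": n * 2 * p * q,  # heterozygous: 2pq x n
        "2,2": n * q * q,      # homozygous allele*2: q^2 x n
    }


expected = hwe_expected_counts(0.850, 40)
print({genotype: round(count, 1) for genotype, count in expected.items()})
```

The three expected counts always sum to n, which is the arithmetic check that p² + 2pq + q² = 1; agreement with equilibrium is judged by setting these counts beside the observed genotype counts.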
For statistical purposes a chi-square (χ²) test on a 2x2 contingency table was
performed using an online chi-square calculator (GraphPad). From the chi-square result,
the probability value was then calculated using an online p-value statistical calculator
from danielsoper.com, in order to assess the genetic relationship between the Middle
Eastern Bedouin and each of the individual populations that had previously been studied
using the same POALIN markers. The p-value measures how much evidence there is against
the null hypothesis, a hypothesis that presumes no change or no effect; in this case,
that the populations are identical. The general rule is that a small p-value is evidence
against the null hypothesis, while a large p-value means little or no evidence against
the null hypothesis. However, a large p-value should not automatically be construed as
evidence in support of the null hypothesis; the failure to reject the null hypothesis
can be caused by an inadequate sample size.
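For readers who wish to reproduce this type of comparison without the online calculators, the following Python sketch computes a Pearson chi-square statistic for a 2x2 table of allele counts and the corresponding p-value for one degree of freedom. It is a minimal sketch under stated assumptions: no continuity correction is applied (the online calculators may use different options), the function name is hypothetical, and the comparison population counts in the example are invented for illustration. The Bedouin counts (68 and 12) are the AluyMICB allele counts implied by Table 3.1 (2 x 29 + 10 and 2 x 1 + 10).

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p_value); the p-value uses the chi-square distribution
    with 1 degree of freedom, for which P(X > x) = erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column must contain at least one count")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Row 1: Bedouin AluyMICB allele counts (allele*1, allele*2) from Table 3.1.
# Row 2: HYPOTHETICAL comparison population, for illustration only.
chi2, p_value = chi_square_2x2(68, 12, 83, 17)
print(round(chi2, 4), round(p_value, 4))
```

Working from allele counts rather than frequencies matters here, because the chi-square statistic depends on the sample sizes of both populations as well as on the difference between their frequencies.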
[Figure 2.1 gel images, four panels, each with a molecular weight marker (MW) lane and the 1000bp and 500bp
marker bands labelled. A: AluyMICB (664bp and 502bp products); B: AluyTF (710bp and 422bp products);
C: AluyHJ (501bp and 163bp products); D: AluyHF (605bp and 458bp products).]
Figure 2.1: Gel photographic presentation of the MHC Class I POALINs. The PCR products for the presence and/or absence of the respective
POALINs are visually distinguishable. A marker (MW) control with known sizes (sizes shown on the left for A through D) was used
for each gel (A-D) and the columns represent individual PCR products. The larger PCR product size for each POALIN represents the
presence and the smaller size represents the absence of the POALIN. (A) Products in lanes 1 and 7 represent individuals homozygous
for the presence of AluyMICB, products in lanes 2 to 5 and 8 represent individuals homozygous for the absence of AluyMICB, and the
product in lane 6 represents a heterozygous individual. (B) The product in lane 1 represents an individual homozygous for the presence
of AluyTF. Products in lanes 2, 3 and 5 to 8 represent individuals homozygous for the absence of AluyTF, and the product in lane 4
represents a heterozygous individual carrying one band for the presence and a second for the absence of the Alu insertion. (C) Products
1 to 8 all represent individuals homozygous for the absence of AluyHJ. (D) Products 1 to 8 were also all representative of individuals
homozygous for the absence of AluyHF; the presence allele was not detected in the limited number of samples tested. The alternative
alleles for both markers are shown in Figure 3.6.
Chapter 3 Results
3.1 Location of POALINs within the MHC Class I Region
A map of the locations of the four POALINs within the MHC class I region is shown
in Figure 3.1. AluyMICB is located within the first intron of the MICB
gene in the beta block. AluyTF is located in the region between the beta and kappa
blocks, close to the TFIIH and CDSN genes. The remaining two, AluyHJ and AluyHF,
are located at the beginning and the end of the alpha block, close to the HLA-J and
HLA-F genes respectively.
3.2 Distribution of POALIN Allele Frequencies in Bedouin Population
From Figure 3.2 it is evident that the AluyMICB primer shows the presence of the Alu
insertion with a large band at 664bp and the absence with a small band at 502bp.
AluyMICB was the most heterozygous of the four POALINs used, with 10 individuals
carrying a heterozygous pair of alleles. In order to determine the frequency of
heterozygous individuals, the gel photographs of AluyMICB (Figure 3.2) and AluyTF
(Figure 3.3) were consulted. In Figure 3.2 the separation of the bands is very
difficult to distinguish and an accurate determination of alleles could not be made.
The products from the experiments completed in Dubai, UAE, were therefore run a
second time by electrophoresis at the University of Western Australia lab (see
Appendix). Owing to unforeseen problems during transport little of the product
remained, but the repeat run was able to provide more information about heterozygous
bands that were indistinguishable in the first test.
From Figure 3.3 it is apparent that the AluyTF primer shows the presence of the Alu
insertion with a large band at 710bp and the absence with a small band at 422bp. Of
the Bedouin individuals analysed, 6 were heterozygous, and the rest all showed only a
small band for the absence of the insertion.
Figure 3.1: The human MHC is a 4 Mb region located on the short arm of chromosome 6 (6p21). It is composed of three subregions,
class I, class II, and class III. The class I region is located within a 2000 kilobase (kb) region constituting the telomeric
half of the human MHC. Above is the map of the location and distribution of the four polymorphic Alu insertions
(AluyMICB, AluyTF, AluyHJ and AluyHF), HLA class I loci and related genes within and between the beta, kappa and
alpha blocks of the MHC Class I region (adapted from Dunn, 2005).
[Figure 3.1 map labels: the POALINs AluyMICB, AluyTF, AluyHJ and AluyHF are marked against the β, κ and α blocks
along a centromeric to telomeric axis. Genes labelled on the map: BAT1, HLA-C, MICA, HLA-B, MICB, CDSN, DDR1,
FLOT1, GNL1, HLA-E, MICC, HLA-30, HLA-92, TRIM26, TRIM31, HLA-J, MICD, HLA-A, MICF, HLA-G, MICG, MICE and HLA-F.]
The first well contained the Caucasian control sample for the large band, which shows
that the primer is capable of amplifying that allele. The bands also get thicker
further down the gel, which could be because the Bedouin DNA samples were extracted
using a kit and diluted to an estimated 20ng/μl, or because too much sample (10μl)
was loaded onto the gel.
The AluyHJ primer indicates the presence of the Alu insertion with a large band at
501bp and the absence with a small band at 163bp (Figure 3.4). The Bedouin individuals
tested showed only the absence of the Alu insertion. The Caucasian control sample used
for the large band, as well as two of the Bedouin samples, showed no result. To make
sure that the AluyHJ primer was working, all four primers were tested at the University
of Western Australia lab using Busselton Caucasian samples, to see whether they could
amplify all three genotypes: large homozygous, small homozygous and heterozygous.
Figure 3.6 shows that the AluyHJ primer was able to amplify all three.
The AluyHF primer indicates the presence of the Alu insertion with a band at 605bp and
the absence with a band at 458bp. The Caucasian control in well 1 was thought to
represent a large band sample, but indicated a small band. During tests in Perth with
the Caucasian samples, the AluyHF primer had been giving results, but was amplifying
at a higher base pair size than expected. In Dubai it was determined that the sequence
for the primer had been incorrect, and the sample used as a control for the large band
no longer amplified as having the Alu insertion. The sequence of the AluyHF primer was
corrected and new primers were ordered. Similar to the AluyHJ primer, all Bedouin
individuals tested showed only a small band for the absence of the Alu insertion. Six
individuals showed no result, so the primer was retested in Perth for accuracy.
Figure 3.6 shows that the primer was able to amplify large, small and heterozygous
individuals, confirming that the primer was working during the tests in Dubai.
[Figure 3.2 gel image: lanes 1 to 60 in three rows of 20, each row with a molecular weight marker (MW) and the
1000bp and 500bp marker bands labelled; the 664bp and 502bp products are labelled.]
Figure 3.2: The gel photographic presentation of AluyMICB. AluyMICB is a biallelic marker that shows the presence
and/or absence of the Alu insertion by a large or small band size. Products 2 and 3 were positive controls of
Caucasian DNA. Product 8 represents the presence of the POALIN with a homozygous large (664bp) size. Samples 7,
11, 18, 31, 35 and 43 are heterozygous, having both the presence and absence of the POALIN, leaving the rest of
the products homozygous for the absence of the POALIN with a small (502bp) size.
[Figure 3.3 gel image: lanes 1 to 80 in four rows of 20, each row with a molecular weight marker (MW) and the
1000bp and 500bp marker bands labelled; the 710bp and 422bp products are labelled.]
Figure 3.3: The gel photographic presentation of AluyTF. AluyTF is a biallelic marker that shows the presence
and/or absence of the Alu insertion by a large or small band size. Products 2 and 3 were positive controls of
Caucasian DNA. Products 5, 33, 43, 53, 54 and 63 are heterozygous, having both the presence and absence of the
POALIN, leaving the rest of the products homozygous for the absence of the POALIN with a small (422bp) size.
[Figure 3.4 gel image: lanes 1 to 80 in four rows of 20, each row with a molecular weight marker (MW) and the
1000bp and 500bp marker bands labelled; the positions of the 501bp and 163bp products are labelled.]
Figure 3.4: The gel photographic presentation of AluyHJ. AluyHJ is a biallelic marker that shows the presence
and/or absence of the Alu insertion by a large or small band size. Products 2 and 3 were positive controls of
Caucasian DNA; product 2 did not amplify. All products that amplified were homozygous for the absence of the
POALIN, with a small (163bp) size.
[Figure 3.5 gel image: lanes 1 to 60 in three rows of 20, each row with a molecular weight marker (MW) and the
1000bp and 500bp marker bands labelled; the positions of the 605bp and 458bp products are labelled.]
Figure 3.5: The gel photographic presentation of AluyHF. AluyHF is a biallelic marker that shows the presence
and/or absence of the Alu insertion by a large or small band size. Products 2 and 3 were positive controls of
Caucasian DNA. Product one did not amplify a large (605bp) size. All products that amplified were homozygous for
the absence of the POALIN, with a small (458bp) size.
[Figure 3.6 gel image: one gel with molecular weight marker (MW) lanes (1000bp and 500bp bands labelled) separating
four groups of lanes, AluyMICB, AluyTF, AluyHJ and AluyHF. Labelled products: AluyMICB 502bp, 664bp and 664 + 502bp;
AluyTF 710bp, 422bp and 710 + 422bp; AluyHJ 501bp, 163bp and 501 + 163bp; AluyHF 605bp, 458bp and 605 + 458bp.]
Figure 3.6: Gel photographic presentation of the MHC Class I POALINs. The PCR products for the presence and/or
absence of the respective POALINs are visually distinguishable. A marker (MW) control with known sizes
(sizes shown on the right) was used for the gel and the columns represent individual PCR products. The
larger PCR product size for each POALIN represents the presence and the smaller size represents the
absence of the POALIN. Using Caucasian samples, large homozygous, small homozygous and
heterozygous genotypes are represented for each POALIN, verifying the ability of each primer to produce
the expected products.
3.3 Hardy-Weinberg Equilibrium
The Hardy-Weinberg equilibrium predicts that under stable conditions, after a
generation of random mating, genotype frequencies at a specific gene locus become
fixed at a specific equilibrium value throughout a population. These values can be
defined as a function of the allele frequencies. The entire principle is based on
Mendelian genetics.
For a single locus with two alleles (A and a) that have allele frequencies of p and q,
the frequency of genotype AA will be p², the frequency of genotype Aa will be 2pq, and
the frequency of genotype aa will be q². The Hardy-Weinberg model consists of two
equations: one that calculates the allele frequencies and one that calculates genotype
frequencies. These are the foundation of population genetics: p + q = 1 and
p² + 2pq + q² = 1. Each genotype has a genotypic frequency and the sum of all genotypic
frequencies in the population must add up to 1. Here 1 represents all the individuals
in the specific population, and through this equation a population can be examined as
being at genetic equilibrium or not.
Using the raw data collected, the allele frequency for a gene locus is determined by
observing the population. Each individual with genotype AA has two copies of the A
allele, heterozygous individuals have one copy of each allele (A and a), and
individuals with genotype aa have two copies of the a allele. The allelic frequency was
calculated by dividing the number of A or a alleles by the total number of alleles in
the population and ensuring that p + q was equal to 1. These are the observed
frequencies.
If the population were in Hardy-Weinberg equilibrium, it would be expected that the
genotype frequencies for AA, Aa and aa would be p², 2pq and q², and that the genotype
frequencies add up to 1.
In calculating the Hardy-Weinberg equilibrium only the AluyMICB and AluyTF
Bedouin frequencies could be taken into consideration, as both the AluyHJ and
AluyHF frequencies deviated from the Hardy-Weinberg equilibrium. Even though
the POALINs are found to be in linkage disequilibrium, some of the POALINs can still
deviate from the Hardy-Weinberg equilibrium. In the case of the Bedouin population,
AluyHJ and AluyHF showed no variation in the observed population. The deviation from
Hardy-Weinberg in this case can be attributed to the small sample size collected for
the study, but may also be due to the consanguineous marriages which are still part
of the Bedouin culture.
For AluyMICB the observed frequencies were 0.850 for allele*1 and 0.150 for allele*2,
with a heterozygosity of 0.255. Under the Hardy-Weinberg equilibrium the expected
frequencies were 0.850 for allele*1, 0.150 for allele*2 and 0.255 for the
heterozygotes.
For AluyTF the observed frequencies were 0.944 for allele*1 and 0.056 for allele*2,
with a heterozygosity of 0.106. Under the Hardy-Weinberg equilibrium the expected
frequencies were 0.945 for allele*1, 0.054 for allele*2 and 0.101 for the
heterozygotes.
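To make the comparison between observed and expected values concrete, the sketch below sets the observed genotype frequencies (genotype count divided by n) beside the frequencies expected under Hardy-Weinberg equilibrium (p², 2pq and q²). It is an illustration only, with an assumed function name; the counts used (29, 10 and 1) are the AluyMICB genotype counts from Table 3.1.

```python
def observed_vs_expected(n_11, n_12, n_22):
    """Observed and Hardy-Weinberg expected genotype frequencies.

    Returns two tuples ordered as (1,1), (1,2), (2,2).
    """
    n = n_11 + n_12 + n_22
    if n == 0:
        raise ValueError("at least one individual is required")
    p = (2 * n_11 + n_12) / (2 * n)  # frequency of allele*1 by gene counting
    q = 1.0 - p
    observed = (n_11 / n, n_12 / n, n_22 / n)
    expected = (p * p, 2 * p * q, q * q)
    return observed, expected


observed, expected = observed_vs_expected(29, 10, 1)
for label, obs, exp in zip(("1,1", "1,2", "2,2"), observed, expected):
    print(label, round(obs, 4), round(exp, 4))
```

For AluyMICB the expected heterozygote frequency 2pq reproduces the heterozygosity of 0.255 given in Table 3.1, and it can be set directly beside the observed heterozygote frequency of 10 out of 40 individuals.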
AluyHJ and AluyHF frequencies were not taken into consideration for the Hardy-
Weinberg equilibrium as they show no variation in genotype.
The observed genotypes and allele frequencies of the four POALINs are listed in
Table 3.1. The most frequent POALIN insertion was AluyMICB*2 (0.150), followed by
AluyTF*2 (0.056). AluyHJ and AluyHF both showed only a small band, representing the
absence of the Alu insertion, and showed no other variation.
Table 3.1: Observed genotypes, allele frequencies, and heterozygosity for AluyMICB, AluyTF, AluyHJ
and AluyHF in a Middle Eastern Bedouin Population

Aluy Loci   n    Genotypes (a)       Allele Frequencies      Heterozygosity (H)
                 1,1   1,2   2,2     allele*1   allele*2
AluyMICB    40   29    10    1       0.850      0.150        0.255
AluyTF      54   48    6     0       0.944      0.056        0.106
AluyHJ      48   48    0     0       1.000      N/A          N/A
AluyHF      42   42    0     0       1.000      N/A          N/A
(a) Genotypes: 1,1 homozygote absent; 1,2 heterozygote and 2,2 homozygote present
Table 3.2 a-d shows the insertion frequencies of the four MHC POALINs in the Middle
Eastern Bedouin compared with previously studied populations typed for the same 4
POALINs. The AluyMICB insertion frequency in the Middle Eastern Bedouin (0.150) is
similar to the Australian Caucasian frequency (0.157); it is above the frequency for
the North Eastern Thai (0.117) and below that for the Malaysian Chinese population
(0.170). With this higher frequency of 0.150, there is a marked separation between the
Middle Eastern Bedouin AluyMICB frequency and those of the African populations that
have been previously studied (0.030 to 0.050). Similarly, the Middle Eastern Bedouin
AluyTF frequency (0.056) lies between the Malaysian Chinese (0.040) and the North
Eastern Thai (0.086) frequencies and is comparable to the South African Sekele San
frequency (0.034) and the Australian Caucasian frequency (0.107).
AluyHJ and AluyHF showed no variation in the sample studied, with the DNA bands of
every individual representing the absence of the two POALINs. The lack of variation
can be due to many factors. One is the size of the sample that was tested: a smaller
sample provides less opportunity to capture the variation present in a population. In
the case of the Middle Eastern Bedouin, it may also be because consanguineous
marriages are still a customary part of the culture, and the lack of random mating has
lowered the chance of variation in the particular POALINs studied. No comparison
could be made.
For AluyMICB the probability value (p-value) calculated for the comparison between
the Australian Caucasian and the Middle Eastern Bedouin populations was 0.1697.
Although the allele frequencies of the two populations are similar, this is the lowest
p-value of the AluyMICB comparisons and offers the most evidence against the null
hypothesis that the two populations are identical at AluyMICB, although at 0.1697 it
remains above the conventional 0.05 threshold for rejecting it. The p-values for the
Malaysian Chinese (0.9691) and North Eastern Thai (0.9436) comparisons show a close
relationship to the Middle Eastern Bedouin. Of the African populations, the South
African !Kung San (0.9099) and South African Sekele San (0.8136) also show a close
relationship to the Middle Eastern Bedouin population.
Table 3.2a: Allele frequency comparison for AluyMICB
Marker: AluyMICB. Description: polymorphic insertion consisting of 2 alleles (502bp = AluyMICB*1 and 664bp = AluyMICB*2).

Race                                 Frequency                      HW (b)   Reference            χ²       P-Value
                                     allele*1   allele*2   H (a)
Malaysian Chinese                    0.830      0.170      0.282    Yes      Dunn et al, (2007)   0.0015   0.9691
North Eastern Thai                   0.883      0.117      0.207    Yes      Dunn et al, (2005)   0.0050   0.9436
Mongolian Khalkh                     0.622      0.378      0.470    No       Dunn, (2005)         0.1338   0.7145
South African South Eastern Bantu    0.970      0.030      0.058    Yes      Dunn, (2005)         0.0879   0.7669
South African Sekele San             0.950      0.050      0.095    Yes      Dunn, (2005)         0.0556   0.8136
South African !Kung San              0.964      0.036      0.069    Yes      Dunn, (2005)         0.0128   0.9099
Australian Caucasian                 0.843      0.157      0.265    Yes      Dunn et al, (2002)   1.8850   0.1697
Middle Eastern Bedouin               0.850      0.150      0.255    Yes      This Study           N/A      N/A
(a) Heterozygosity
(b) Hardy Weinberg Formula
Table 3.2b: Allele frequency comparison for AluyTF
Marker: AluyTF. Description: polymorphic insertion consisting of 2 alleles (422bp = AluyTF*1 and 710bp = AluyTF*2).

Race                                 Frequency                      HW (b)   Reference            χ²       P-Value
                                     allele*1   allele*2   H (a)
Malaysian Chinese                    0.960      0.040      0.077    Yes      Dunn et al, (2007)   0.0028   0.9578
North Eastern Thai                   0.914      0.086      0.152    Yes      Dunn et al, (2005)   0.0068   0.9343
Mongolian Khalkh                     0.780      0.220      0.343    Yes      Dunn, (2005)         0.1130   0.7368
South African South Eastern Bantu    0.900      0.100      0.180    Yes      Dunn, (2005)         0.0135   0.9075
South African Sekele San             0.966      0.034      0.066    Yes      Dunn, (2005)         0.0056   0.9403
South African !Kung San              0.762      0.238      0.363    Yes      Dunn, (2005)         0.1705   0.6797
Australian Caucasian                 0.893      0.107      0.198    Yes      Dunn et al, (2002)   0.0174   0.8951
Middle Eastern Bedouin               0.944      0.056      0.106    Yes      This Study           N/A      N/A
(a) Heterozygosity
(b) Hardy Weinberg Formula
Table 3.2c: Allele frequency comparison for AluyHJ
Marker: AluyHJ. Description: polymorphic insertion consisting of 2 alleles (163bp = AluyHJ*1 and 501bp = AluyHJ*2).

Race                                 Frequency                      HW (b)   Reference            χ²       P-Value
                                     allele*1   allele*2   H (a)
Malaysian Chinese                    0.700      0.300      0.420    Yes      Dunn et al, (2007)   0.3530   0.5524
North Eastern Thai                   0.708      0.292      0.413    Yes      Dunn et al, (2005)   0.3420   0.5887
Mongolian Khalkh                     0.707      0.293      0.414    Yes      Dunn, (2005)         0.3430   0.5581
South African South Eastern Bantu    0.930      0.070      0.130    Yes      Dunn, (2005)         0.0730   0.7870
South African Sekele San             0.950      0.050      0.095    No       Dunn, (2005)         0.0510   0.8213
South African !Kung San              0.893      0.107      0.191    Yes      Dunn, (2005)         0.1130   0.7368
Australian Caucasian                 0.927      0.073      0.358    Yes      Dunn et al, (2002)   0.0760   0.7828
Middle Eastern Bedouin               1.000      N/A        N/A      No       This Study           N/A      N/A
(a) Heterozygosity
(b) Hardy Weinberg Formula
Table 3.2d: Allele frequency comparison for AluyHF
Marker: AluyHF. Description: polymorphic insertion consisting of 2 alleles (458bp = AluyHF*1 and 605bp = AluyHF*2).

Race                                 Frequency                      HW (b)   Reference            χ²       P-Value
                                     allele*1   allele*2   H (a)
Malaysian Chinese                    0.970      0.030      0.058    Yes      Dunn et al, (2007)   0.0305   0.8614
North Eastern Thai                   0.982      0.018      0.035    Yes      Dunn et al, (2005)   0.0182   0.8927
Mongolian Khalkh                     0.902      0.098      0.177    Yes      Dunn, (2005)         0.1030   0.7483
South African South Eastern Bantu    0.910      0.090      0.164    Yes      Dunn, (2005)         0.0942   0.7589
South African Sekele San             0.917      0.083      0.152    Yes      Dunn, (2005)         0.0866   0.7685
South African !Kung San              0.940      0.060      0.113    Yes      Dunn, (2005)         0.0618   0.8037
Australian Caucasian                 0.962      0.038      0.03     Yes      Dunn et al, (2002)   0.0387   0.8440
Middle Eastern Bedouin               1.000      N/A        N/A      No       This Study           N/A      N/A
(a) Heterozygosity
(b) Hardy Weinberg Formula
With AluyTF the p-values for the Malaysian Chinese (0.9578) and the South African
Sekele San (0.9403) again show a close relationship to the Middle Eastern Bedouin.
There is also still a strong relationship to the North Eastern Thai (0.9343), as seen
with AluyMICB. However, the comparison differs from AluyMICB in that the South
African South Eastern Bantu (0.9075) are more comparable than the Australian
Caucasian population (0.8951).
Both AluyHJ and AluyHF, owing to the lack of variation in the population, were not
taken into consideration for the calculation of p-values or the comparison between
the populations.
References
1. Abu-Rabia, A. (2002). Bedouin Century: Education and Development among the
Negev Tribes in the Twentieth Century. Berghahn Books: Israel
2. Alberts B, Johnson A, Lewis J, Raff M, Roberts K, and Walter P. (2002)
Molecular Biology of the Cell, 4th Ed. Garland Science: London.
3. Ania L, Manson E, C.A. Jones (2002) Cell Biology and Genetics. Elsevier Health
Sciences: London.
4. Antunez-de-Mayolo G, Antunez-de-Mayola A, Antunez-de-Mayolo P, Papiha SS,
Hammer M, Yunis JJ, Yunis EJ, Damodaran C, Martinez de Pancrobo M,
Caeiro JL, Puzyrev VP, Herrera RJ (2002). Phylogenetics of Worldwide
Human Populations as Determined by Polymorphic Alu Insertions.
Electrophoresis, 23: 3346-3356.
5. Anzai T, Shiina T, Kimura N, Yanagiya K, Kohara S, Shigenari A, Yamagata T,
Kulski JK, Naruse TK, Fujimori Y, Fukuzumi Y, Yamazaki M, Tashiro H,
Iwamoto C, Umehara Y, Imanishi T, Meyer A, Ikeo K, Gojobori T, Bahram S,
Inoko H. (2003). Comparative Sequencing of Human and Chimpanzee MHC
class I Regions Unveils Insertions/ Deletions as the Major Path to Genomic
Divergence. Proceedings of the National Academy of Science, 100: 7708-
7713.
6. Aoki, T., Satoh, K., Imamura, T., Watabe, H. (2004). A New Method for
Detection Single Nucleotide Polymorphism Using GFP-Display. Journal of
Biochemistry and Biophysical Methods, 60: 61-67.
7. Arcos-Burgos, M., and Muenke, M. (2002). Genetics of Population Isolates.
Clinical Genetics, 61: 233-247.
8. Arcot SS, Adamson AW, Lamerdin JE, Kanagy B, Deininger PL, Carrano AV,
Batzer MA. (1996). Alu Fossil Relics – Distribution and Insertion
Polymorphism. Genome Research, 6: 1084-1092.
9. Aronson J.D. (2007) Genetic Witness: Science, Law and Controversy in the
Making of DNA Profiling. Rutgers University Press: London.
10. Bamshad MJ, Wooding S, Watkins WS, Ostler CT, Batzer MA, Jorde LB (2003).
Human Population Genetic Structure and Inference of Group Membership.
American Journal of Human Genetics, 72: 578-589.
11. Batzer MA, Deininger PL (2002). Alu Repeats and Human Genomic Diversity.
Nature Reviews Genetics, 3: 370-379.
12. Batzer MA, Stoneking, M, Alegria-Hartman M, Bazan H, Kass DH, Shaikh TH,
Novick GE, Ioannou PA, Scheer WD, Herrera RJ, Deininger PL. (1994).
African Origin of Human-Specific Polymorphic Alu Insertions. Proceedings
of the National Academy of Science, 91:12288-12292.
13. Batzer MA, Arcot SS, et al. (1996). Genetic Variation of Recent Alu Insertions
in Human Populations. Journal of Molecular Evolution, 42(1): 22-29.
14. Begovich, AB., McClure, GR., Suraj, VC., Helmuth, RC., Fildes, N., Bugawan,
TL., Erlich, HA., Klitz, W. (1992). Polymorphism, Recombination, and
Linkage Disequilibrium within the HLA Class II Region. Journal of
Immunology, 148(1): 249-258.
15. Benham, C. J., and Mielke, S. P. (2005). DNA Mechanics. Annual Review of
Biomedical Engineering, 7: 21-53.
16. Buffery, C., Burridge, F., Greenhalgh, M., Jones, S., and Willott, G. (1991).
Allele Frequency Distributions of Four Variable Number Tandem Repeat
(VNTR) Loci in the London Area. Forensic Science International, 52: 53-64.
17. Butler, J. M. (2005). Forensic DNA Typing. Elsevier Academic Press: London.
18. Collins, F. S. (2006). No Longer Just Looking under the Lamppost. American
Journal of Human Genetics, 79(3): 421-426.
19. Comas D, Plaza S, Calafell F, Sajantila A, Bertranpetit J (2001). Recent Insertion
of an Alu Element within a Polymorphic Human-Specific Alu Insertion.
Molecular Biology and Evolution, 18(1): 85-88.
20. Comas D, Schmid H, Braeuer S, Flaiz C, Busquets A, Calafell F, Bertranpetit J,
Scheil HG, Huckenbeck W, Efremovska L, Schmidt H. (2004). Alu Insertion
Polymorphisms in the Balkans and the Origins of the Aromuns. Annals of
Human Genetics, 68:120-127.
21. Cordaux, R., Srikanta, D., Lee, J., Stoneking M., and Batzer, MA. (2007). In
Search of Polymorphic Alu Insertions with Restricted Geographic
Distribution. Genomics, 90(1): 154-158.
22. Dawkins R, Leelayuwat C, Gaudieri S, Tay G, Hui J, Cattley S, Martinez P,
Kulski JK (1999). Genomics of the Major Histocompatibility Complex:
Haplotypes, Duplication, Retroviruses and Disease. Immunological Reviews,
167: 275-304.
23. Degli-Esposti MA, Leelayuwat C, Daly CN, Carcassi C, Contu L, Versluis LF,
Tilanus MG, Dawkins RL. (1992). Ancestral Haplotypes: Conserved
Population MHC Haplotypes. Human Immunology, 34: 242-252.
24. Donaldson CS, Crapanzano JP, Watson JC, Levine EA, Batzer MA. (2002)
PROGINS Alu Insertions and Human Genomic Diversity. Mutation Research,
501: 137-141.
25. Dunn DS, Naruse T, Inoko H, Kulski JK. (2002). The Association Between HLA-
A Alleles and Young Alu Dimorphisms Near the HLA-J, -H, and -F Genes in
Workshop Cell Lines and Japanese and Australian Populations. Journal of
Molecular Evolution, 55(6): 718-726.
26. Dunn DS, Romphruk AV, Leelayuwat C, Bellgard M, Kulski JK (2005).
Polymorphic Alu Insertions and Their Associations with MHC Class I Alleles
and Haplotypes in the Northeastern Thais. Annals of Human Genetics, 69:
364-372.
27. Dunn DS (2005). Studies on Polymorphic Alu Insertions and Genomic Diversity
within the Major Histocompatibility Complex. PhD Thesis: University of
Western Australia.
28. Dunn DS, Inoko H, Kulski JK (2006). The Association Between Non-Melanoma
Skin Cancer and a Young Dimorphic Alu Element within the Major
Histocompatibility Complex Class I Genomic Region. Tissue Antigens,
68:127-134.
29. Dunn DS, Choy MK, Phipps ME, Kulski JK. (2007). The Distribution of Major
Histocompatibility Complex Class I Polymorphic Alu Insertions and their
Associations with HLA Alleles in a Chinese Population from Malaysia.
Tissue Antigens, 70: 136-143.
30. Efstratiadis AA, Posakony JJ, et al. (1980). The Structure and Evolution of the
Human Beta-Globin Gene Family. Cell, 21(3): 653-668.
31. Gagneux P, Varki A (2001). Genetic Differences Between Humans and Great
Apes. Molecular Phylogenetics and Evolution, 18(1): 2-13
32. Gaudieri S, Leelayuwat C, Tay GK, Townend DC, Mullberg J, Cosman D,
Dawkins RL (1997). Allelic and Interlocus Comparison of the PERB11
Multigene Family in the MHC. Immunogenetics, 45: 209-216.
33. Goodman F.R., Scambler P.J. (2001) Human Hox Gene Mutations. Clinical
Genetics, 59: 1-11.
34. Griffiths, et al (2005). Introduction to Genetic Analysis. W. H. Freeman and
Company: New York.
35. Hsien, L. (2006). A Heartbreaking Story about Genetics. The New York
Times: Science News.
36. Hammer HF, Spurdle AB, Karafet T, Bonner MR, Wood ET, Novelletto A,
Malaspina D, Michell RJ, Hosai S, Jenkins T, Zegura SL (1997). The
Geographic Distribution of Human Y Chromosome Variation. Genetics, 145:
787-805
37. Joseph, D. M. D. M. (1995). The Human Genome Project and Biology Education.
Bioscience, 45(11): 786-791.
38. Jurka, J, Krnjajic M, Kapitonov VV, Stenger JE, Kokhanyy O (2002). Active Alu
Elements Are Passed Primarily through Paternal Germlines. Theoretical
Population Biology, 61: 519-530.
39. Kamahori M, Harada K and Kambara H. (2002) A New Single Nucleotide
Polymorphisms Typing Method and Device by Bioluminometric Assay
Coupled with a Photodiode Array. Measurement Science and Technology, 13:
1779-1785.
40. Kass, D., Jamison, N., Mayberry, MM., and Tecle, E. (2006). Identification of a
Unique Alu-based Polymorphism and its Use in Human Population Studies.
Gene, 390(1-2): 146-152.
41. Kirk, BW., Feinsod, M., Favis, R., Kliman, RM., and Barany, F. (2002). Single
Nucleotide Polymorphism Seeking Long Term Association with Complex
Disease. Nucleic Acids Research, 30(15): 3295-3311.
42. Kulski JK, Shiina T, Anzai T, Kohara S, Inoko H. (2002). Comparative Genomic
Analysis of the MHC: The Evolution of Class I Duplication Blocks, Diversity
and Complexity from Shark to Man. Immunological Reviews, 190: 95-122.
43. Kulski JK, Dunn DS (2005) Polymorphic Alu Insertions within the Major
Histocompatibility Complex Class I Region: A Brief Review. Cytogenetic and
Genome Research, 110: 193-202.
44. Leelayuwat, C. et al (1994). A New Polymorphic and Multicopy MHC Gene
Family Related to Nonmammalian Class I. Immunogenetics, 40: 339-351
45. Leib-Mösch C and Seifarth, W (1996) Evolution and Biological Significance of
Human Retroelements. Virus Genes, 11: 133-145.
46. Losleben, B. (2003). The Bedouin of the Middle East. Lerner Publication:
London.
47. Martins, Sandra; Calafell, Francesc; Gaspar, Claudia; Wong, Virginia C. N.;
Silveira, Isabel; Nicholson, Garth A.; Brunt, Ewout R.; Tranebjaerg, Lisbeth;
Stevanin, Giovanni; Hsieh, Mingli; Soong, Bing-wen; Loureiro, Leal; Dürr,
Alexandra; Tsuji, Shoji; Watanabe, Mitsunori; Jardim, Laura B.; Giunti,
Paola; Riess, Olaf; Ranum, Laura P. W.; Brice, Alexis; Rouleau, Guy A.;
Coutinho, Paula; Amorim, António; Sequeiros, Jorge. (2007). Asian Origin
for the Worldwide-Spread Mutational Event in Machado-Joseph Disease.
American Medical Association 64(10): 1502-1508.
48. Marin MLC, Savioli CR, Yamamoto JH, Jorge K, Goldberg AC. (2004). MICA
Polymorphism in a Sample of the Sao Paulo Population, Brazil. European
Journal of Immunogenetics, 31(2): 63-71.
49. Mavoungou, E., Sall, A., Poaty-Mavoungou, V., Toure, FS., Yaba, P., Delicat, A.,
and Lansoud-Soukate, J. (1999). Alloreactivity and Association of Human
Natural Killer Cells with the Major Histocompatibility Complex. Clinical and
Diagnostic Laboratory Immunology, 6(2): 254-259.
50. Middleton DD, Williams FF, Meenagh A, Daar AS, Gorodezky C, Hammond M,
Nascimento E, Briceno I, Perez MP. (2000). Analysis of the Distribution of
HLA-A Alleles in Populations from Five Continents. Human Immunology,
61(10): 1048-1052.
51. Mooser V, Mancini F.P., Bopp S, Pethö-Schramm A, Guerra P, Boerwinkle E,
Müller H.J, and Hobbs H.H. (1994). Sequence Polymorphisms in the Apo(a)
Gene Associated with Specific Levels of Lp(a) in Plasmas. Human Molecular
Genetics, 4(2): 173-181.
52. Muchmore, E.A. (2001). Chimpanzee Models for Human Disease and
Immunobiology. Immunological Reviews, 183: 86-93.
53. Mungall AJ, Palmer SA, Sims SK, Edwards CA, Ashurst JL, Wilming L, Jones
MC, Horton R, Hunt SE, Scott CE, Gilbert JG, Clamp ME, Bethel G, Milne S,
Ainscough R, Almeida JP, Ambrose TD, Ashwell RI, Babbage AK,
Bagguley CI, et al (2003). The DNA Sequence and Analysis of Human
Chromosome 6. Nature, 425: 805-811.
54. Nasidze I, Risch GM, Robichaux M, Sherry ST, Batzer MA, Stoneking M (2001).
Alu Insertion Polymorphism and the Genetic Structure of Human Population
from the Caucasus. European Journal of Human Genetics, 9: 267-292
55. Nusbaum, C., Micheal, C. Z., et al. (2005). DNA Sequence and Analysis of
Human Chromosome 18. Nature, 437(7058): 551.
56. Nei M (1972). Genetic Distance Between Populations. American Naturalist, 106:
283-292.
57. Okada N (1991) SINEs. Current Opinion in Genetics and Development, 1: 498-
504.
58. Okada N, Hamada M, Ogiwara I, Ohshima K (1997). SINEs and LINEs Share
Common 3’ Sequence: A review. Gene, 205: 229-243
59. Olby, R. (2003). Quiet Debut for the Double Helix. Nature, 421: 402-405.
60. Paabo, S. (2001). Genomics and Society. The Human Genome and Our View of
Ourselves. Science, 291(5507): 1219-1220.
61. Patrinos A, Drell DW (1997) Introducing the Human Genome Project. Its
Relevance, Triumphs, and Challenges. The Judges' Journal, 36(3).
62. Perna NT, Batzer MA, Deininger PL, Stoneking M (1992). Alu Insertion
Polymorphism: A New Type of Marker for Human Population Studies.
Human Biology, 164: 641-648.
63. Perutz, MF., Randall, JT., Thomson, L., Wilkins, MH., and Watson, JD. (1969).
DNA Helix. Science, 164: 1537-1539.
64. Price GR (1971) Extension of the Hardy-Weinberg Law to Assortative Mating.
Annals of Human Genetics, 34: 455-458.
65. Rajsbaum R, Fici D, Boggs DA, Fraser PA, Flores-Villanueva PO, Awdeh ZL
(2002). Linkage Disequilibrium Between HLA-DPB1 Alleles and Retinoid X
Receptor β Haplotypes. Human Immunology, 63(9): 771-778.
66. Robbins, R. R. J. (1992). Challenges in the human genome project. IEEE
Engineering in Medicine and Biology Magazine, 11(1): 25-34.
67. Romualdi, C. et al. (2002). Patterns of human diversity, within and among
continents, inferred from biallelic DNA polymorphisms. Genome Research.
12: 602−612.
68. Schneider, PM. (1997). Basic Issues in Forensic DNA typing. Forensic Science
International, 88(1): 17-22.
69. Sheffield VC, Stone EM, Carmi R. (1998). Use of Isolated Inbred Human
Populations for Identification of Disease Genes. Trends in Genetics, 14: 391-
396.
70. Shen MR, Batzer MA, Deininger PL (1991). Evolution of the Master Alu Genes.
Journal of Molecular Evolution, 33: 311-320.
71. Singer M, Berg P. (1997). Exploring Genetic Mechanisms. University Science
Books: New York.
72. Skaug HJ. (2001). Allele-Sharing Methods for Estimation of Population Size.
Biometrics, 57: 750-756.
73. Smit AF (1996). The Origin of Interspersed Repeats in the Human Genome.
Current Opinion in Genetics and Development, 6: 743-748.
74. Starr, C (2005). Biology: Concepts and Applications. Thomson Books: New
York.
75. Stumpf MP (2002). Haplotype Diversity and the Block Structure of Linkage
Disequilibrium. Trends in Genetics, 18: 226-228.
76. Takasu M, Hayashi R, Maruya E, Ota M, Imura K, Kougo K, Kobayashi C, Saji
H, Ishikawa Y, Asai T, Tokunaga K (2007). Deletion of Entire HLA-A Gene
Accompanied by an Insertion of a Retrotransposon. Tissue Antigens, 70: 144-
150.
77. Ting, JC., Ye, Y., Thomas, GH., Ruczinski, I., and Pevsner, J. (2006). Analysis
and Visualisation of Chromosomal Abnormalities in SNP Data with SNPscan.
BMC Bioinformatics, 7: 25.
78. Triggs, CM., and Buckleton, JS. (2002). Logical Implications of Applying the
Principles of Population Genetics to the Interpretation of DNA Profiling
Evidence. Forensic Science International, 128(3): 108-114.
79. Van Ommen, G-J.B. (2002). The Human Genome Project and the Future of
Diagnostics, Treatment and Prevention. Journal of Inherited and Metabolic
Diseases, 25: 183-188.
80. Wakeley J. (2001). The Discovery of Single-Nucleotide Polymorphisms: And
Inferences about Human Demographic History. American Journal of Human
Genetics, 69(6): 1332-1347.
81. Walsh, EC, Mather KA, Schaffner SF, Farwell L, Daly MJ, Patterson N, Cullen
M, Carrington M, Bugawan TL, Erlich H, Campbell J, Barrett J, Miller K,
Thomson G, Lander ES, Rioux JD (2003). American Journal of Human
Genetics, 73: 580-590.
82. Watson JD, Cook-Deegan RM (1991). Origins of the Human Genome Project.
FASEB Journal, 5: 8-11
83. Weber, JL., David, D., Heil, J., Fan, Y., Zhao, C., and Marth, G. (2002). Human
Diallelic Insertion/Deletion Polymorphisms. American Journal of Human
Genetics, 71: 854-862.
84. Xiao FX, Yang JF, Cassiman JJ, Decorte R (2002). Diversity at Eight
Polymorphic Alu Insertion Loci in Chinese Population Shows Evidence for
European Admixture in an Ethnic Minority Population from Northwest China.
Human Biology, 74: 555-568.
85. Ye, J., Parra, J.E., Sosnoski, D.M., Hiester, K., Underhill, P.A. and Shriver, M.D.
Melting curve SNP (McSNP) genotyping: a useful approach for diallelic
genotyping in forensic science. Journal of Forensic Science, 47: 593-600.
86. Yunis EJ, Larsen CE, Fernandez-Vina M, Awadeh Al, Romero T, Hansen JA,
Alper CA. (2003). Inheritable Variation Sizes of DNA Stretches in the
Human MHC: Conserved Extended Haplotype and their Fragments or Blocks.
Tissue Antigens, 62: 1-20.
87. Zyphur, MJ. (2006). On the Complexity of Race. American Psychologist: 179-
180.
Appendix A
[Gel image: lanes 1 to 60 with molecular weight (MW) marker lanes; ladder bands labelled 500bp and 1000bp; product bands labelled 502bp and 664bp.]
Figure 3.7: The gel photographic representation of the second electrophoresis run of the AluyMICB product. Run in order to check for the accurate number of heterozygous alleles.