Page 1: Given the possible number of genetic variations, the ...dav/undergrad_research/04mcm-250.pdf · Asma Al-Rawi, Steve Gilberston, Jonathan Whitmer Kansas State University Mathematical

Summary

Given the possible number of genetic variations, the probability of having a naturally occurring Doppelganger is low. This is why DNA evidence acquired at crime scenes is such conclusive evidence when presented in criminal trials. Though the process of DNA fingerprinting is fallible, the probability that two unrelated people with the same DNA exist is microscopic. Barring, then, that you have an identical evil twin, the probability that you will be mistaken for a criminal based on such evidence is low. Fingerprints, however, being only a portion of this genetic identity, seem far less restricting. It is then conceivably possible that one could be mistaken as the perpetrator of a crime based on fingerprint evidence. It is our goal to determine exactly how probable this is.

One of the progenitors of the study of fingerprint identity was Sir Francis Galton, who identified characteristic ridge patterns in the skin that vary widely among a population, but which are constant over time to an individual. In addition to these minutiae, fingerprints also have an overall pattern that in nearly all cases falls into one of three groups: loops, arches, and whorls. Using both the overall fingerprint patterns and a set of the most commonly occurring Galton Characteristics (GCs), we created a model to test the individuality of fingerprints, based on a probabilistic interpretation: highly probable fingerprints are less individual, and less probable fingerprints are more individual.

In this model, we first divided an ideal rectangular thumbprint into squares of equal area, denoted as cells. Knowing that any comparison between two fingerprints first matches the general pattern of a fingerprint and then a certain number of GCs, we calculated the fingerprint patterns that have the maximum probability of occurrence. This was done by using figures which determined the relative frequency of occurrence of each of the patterns and GCs.

To start, we assumed that from an ideal thumbprint containing N total cells, we chose to confirm the form and placement of n GCs in those cells. Our model proceeds in stages, first choosing the overall pattern of the print, and then proceeding to choose n locations of GCs from the N total placements possible. Once the pattern and placement have been determined, it remains only to factor in the relative occurrence probabilities of each GC in order to determine a measure of the individuality of the fingerprint.

The model is constructed based on a number of assumptions. To begin with, we first assume that the patterns and GCs occur independently; neither has an influence on the other’s probability. In later stages of our analysis, then, we account for the fact that dependencies may exist, and alter the selection of GCs accordingly. Another assumption that our model makes is that the GCs occur independently; that is, in the n spaces in which we wish to confirm the presence of GCs, placement has no effect on which characteristic is selected. Since there has been no conclusive evidence that a particular fingerprint pattern has any influence on the minutiae present in the fingerprint, this seems to be a valid assumption, and hence no unnecessary restrictions were placed on the form of the fingerprint. The construction of the model allowed us to calculate the ability to confirm a fingerprint based on partial fingerprint evidence. In addition, we used population figures of many countries and the entire world to find what the minimum number of GCs in common between fingerprints should be before a match can be said to occur.

In testing this model, we did not calculate the probability of occurrence for every individual pattern and placement of GCs. Rather, we calculated only the probability of the most likely occurrence. Also, the orientation of GCs was not taken into consideration. This may at first seem to be a weakness, but is in fact a strength, as requiring a fingerprint to occur with GCs oriented in a particular direction is stricter than not requiring any particular direction for their placement. Thus, any fingerprint occurring in nature is hypothetically less likely to occur than our calculated maximum. For a template fingerprint with 12 identified minutiae, a reasonable required number given new advancements in laser recognition of fingerprints, the probability of finding a match was calculated to be on the order of 10^-13. This figure shows that even the most likely fingerprint is highly individual, and fingerprint identification is as reliable on ideal grounds as DNA identification, which has reliability on the order of 10^-10.


Team 250 Page 2

AN INQUIRY INTO INDIVIDUALITY OF THUMBPRINTS

Asma Al-Rawi, Steve Gilberston, Jonathan Whitmer

Kansas State University

Mathematical Contest in Modeling 2004

I. Introduction

“How can you disbelieve in me when I have created each one of you down to the prints on your fingers?”

--God (The Holy Qur’an 75:3-4) [4]

The above reference, depending on one’s religiousness or secularism, either confirms that fingerprints are distinct to individuals, or at the very least, that knowledge of variation of fingerprints between persons, and its inherent properties in identification, has existed since the 7th century. In modern Western culture, the idea of using fingerprints as a means of identification first appeared in an article written by Henry Faulds in 1880 in the journal Nature [3]. His interest was aroused by his discovery of ridged pattern imprints in handmade pottery. After performing a series of experiments to determine the difference in fingerprints among individuals as well as their resilience, he recommended that these ridged imprints be used as evidence of criminal identity at the scene of the crime. At the root of this assertion is the assumption of uniqueness in each human’s fingerprint patterns. There are several commonalities in the patterns of ridged skin, however, which allow fingerprints to be systematically classified.

For example, the ridged lines on fingers appear in a number of major pattern types: loops, which comprise the largest portion of all fingerprints and occur in two chiralities; whorls, which are characterized by the spiraling pattern of the ridges; and the arches, which comprise the smallest major group [1]. Other possible manifestations exist; however, their occurrence is very rare. In addition to these major groups, the ridges of different fingerprints show certain defining characteristics. This idea was prevalent in one of the first attempted quantifications of fingerprint individuality, which was performed by Sir Francis Galton in 1892 [1]. The patterns of finger ridge divergences and combinations, termed minutiae, are also identified as Galton Characteristics in his honor. Later developments have incorporated his ideas along with other print-determining factors to establish more exactly each print’s uniqueness [1,2,6].

Whether or not each fingerprint pattern is truly unique, fingerprints have found much use as a form of identification in forensic science. Recently, however, the validity of fingerprint evidence has been called into question, as evidenced by the case United States v. Mitchell, which presented the US with its first challenge as to the admissibility of latent fingerprint evidence as a means of identification [7]. This necessitates a reevaluation of the validity of fingerprint uniqueness in measurement. Thus, we are faced with the problem of determining the probability that two people in the world might share the same fingerprints to measurable accuracy. This is quite a complex problem if one allows it to be, as there seem at first to be almost infinitely many variations within ridge patterns whose appearance and interplay must be accounted for, and yet it has a simple and elegant solution which we will show in this paper. In our study, we focus not on each of the ten fingers, but on only the thumb, which effectively serves as an upper bound for the multiple occurrence probability of all friction ridged skin. Our calculations have found on the basis of a discrete probability model that it is extremely unlikely that two people with the same thumbprints have ever existed, within the limitations of current measurement practices.

II. Model

The first step in devising a model for thumbprint individuality is simply to understand what types of fingerprints exist. As mentioned previously, fingerprints occur in what seems to be an infinite number of variations, determined by both their overall pattern and the distribution of Galton Characteristics (GCs). The patterns fall into three main categories: loops, arches, and whorls. These can be further divided into over a thousand subcategories [1]. Figure 1 shows the major types of prints.

FIGURE 1. The four most common fingerprint patterns: left and right loops, whorls, and arches. From www.sfis.ca.gov/pattern_types.htm.

Prints which fall into these categories can, to the untrained eye, and oftentimes even the trained eye, appear very similar. When the contribution of GCs is factored in, a particular fingerprint’s unique character starts to become apparent. The major types of GCs are illustrated in Figure 2. Whether the pattern on the finger is a loop, arch, or whorl, GCs occur randomly throughout the entire print. These occurrences give distinct attributes to the print that can be systematically classified.


FIGURE 2. A chart showing the 10 most common forms of Galton Characteristics. (Osterburg ??)

The central problem, given a known classification of a fingerprint by its pattern and GCs, becomes to calculate the probability that an identical finger exists. Our model focuses specifically on thumbprints, for a variety of reasons. For instance, a thumb is easy to idealize. In practice, when fingerprints are taken, the finger is rolled over nearly its entire surface above the first knuckle. This is similar to the unrolling of an uncapped cylinder. The shape of this print on paper is approximately rectangular. The thumbprint has the largest area, and also the largest number of defining qualities, due to the random distribution of GCs.

For an ideal rectangular thumbprint, we partition the area into N equally sized squares, with a minimum size on the order of one square millimeter, due to the minimum extent to which a GC can be identified as occurring in one of the N squares. Since only a finite number of visible GCs can occur on a single patterned finger, a discrete probability method is useful for determining the possibility of Doppelganger thumbs. It is then perfectly admissible to use a counting argument to find approximately the number of possible arrangements of friction ridges on the thumb, and their relative occurrences based on the features they contain.

It should be noted that ideal fingerprints as described above do not usually occur in actual fieldwork. Usually only portions of fingerprints are left by oils or other substances on the fingers of the criminal; these are called latent prints. After these latent prints are developed and brought into visible form, they are described as partial prints. These partial prints contain only a fraction of the total surface of the friction ridged skin on the thumb. Using similar ideas to the ones above, we can model partial prints simply by decreasing N; that is, limiting the number of cells on which the prints have to match up. Since a partial print cannot possibly match the rest of the cells contained in an ideal print, the characteristics of those cells are irrelevant. Decreasing N then gives an accurate model, as we can say that the area we are sampling from is smaller. Accordingly, the probability of matching the print among people of a given population grows, as we show below.

III. Probability Algorithms

Our first step was to measure the dimensions of an idealized thumb. Averaging over the three members in our group, we found the dimensions of a nearly rectangular print, when measured as described above, to be approximately 3 cm by 4 cm. Thus there are approximately 1200 square millimeters (30 mm by 40 mm) on one thumbprint. We took each square millimeter to be a cell, so that in our ideal thumb model, a full print has a possibility of 1200 identification points.

In practice, a suspect’s thumbprint and the thumbprint found at the scene of the crime are compared to each other on both the overall pattern and a certain number of distinguishing characteristics. The distinguishing factors can correspond to either scars on the suspect’s thumbprint or GCs. Since scars are the result of completely random events, and thus are nearly impossible to quantify without exact personal histories, our model considers only the cases in which GCs occupy these identifying points. In previous models [1,2], the relation between GCs and the overall pattern was not considered; only the occurrence of GCs was taken into account. In our model, various degrees of pattern and GC independence were considered. This accounts for the possibility that a certain percentage of the GCs are inherent in the overall pattern. In the case where pattern and GC occurrences are completely independent, one can separate the probability of a fingerprint’s occurrence into two factors:

$$P_{fp} = P_{p} \, P_{GC} \qquad (1)$$

In the above equation, P_fp is the probability a particular fingerprint will occur, P_p is the probability a particular pattern will occur, some approximate figures for which are given in Table 1, and P_GC is the probability of a particular combination of GCs.

Class of Print    Probability
Right Loop        0.325
Left Loop         0.325
Whorl             0.3
Arch              0.05
Total             1

TABLE 1: A list of approximate occurrence probabilities of the four most common thumbprints from Osterburg, et al. The loop category is determined therein to have a 65% occurrence probability, which here is divided into the two chiralities, which are easily distinguishable and occur at nearly the same rate overall.

Our model treats non-measured GCs and cells in which there are no GCs as equivalent empty cells. Thus, in the case where GCs are dependent on which pattern a fingerprint has, we can still use this independence model, by noting that since a particular percentage of the GCs are determined by the pattern, we can treat those as empty space in which no defining characteristic occurs.

Suppose, then, that we wish to find the probability that a particular distribution of measured GCs occurs. To do this, we note that of the N total cells in the fingerprint, only n of these cells have any significance in terms of GC measurement. The number of ways these can be distributed is easy to compute. Placing all measured cells on the same level, we begin placing GCs and empty cells on the surface of the thumbprint. At first there are n GCs to place within the total area of the print, and N total cells to place them in. If the first cell is empty space, we are left with N-1 cells in which to place characteristics, and n characteristics. If the first cell contains a characteristic, we have N-1 empty cells in which to place characteristics, and n-1 GCs. Iterating this choice process over all N cells, we find that the number of ways we can place the GCs is

$$\binom{N}{n} = \frac{N!}{n!\,(N-n)!} \qquad (2)$$
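The first-cell argument above amounts to the recurrence C(N, n) = C(N-1, n) + C(N-1, n-1). The following minimal sketch checks that recurrence against the closed form on a small example; the function name `placements` is a hypothetical helper introduced only for illustration.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def placements(cells: int, gcs: int) -> int:
    """Count ways to put `gcs` characteristics into `cells` cells (at most one per cell)."""
    if gcs == 0:
        return 1  # nothing left to place: every remaining cell is empty
    if cells < gcs:
        return 0  # more characteristics than cells: impossible
    # First cell empty  -> still `gcs` to place in the remaining cells.
    # First cell filled -> `gcs - 1` to place in the remaining cells.
    return placements(cells - 1, gcs) + placements(cells - 1, gcs - 1)


# The recurrence and the closed form N! / (n! (N-n)!) agree.
print(placements(12, 3), comb(12, 3))
```

For the full model (N = 1200) the closed form or a log-factorial is the practical route; the recursion is shown only to mirror the counting argument.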

This leaves us to calculate the probability that each GC cell contains a particular GC. Osterburg, et al. contains relative frequencies of occurrence for each characteristic averaged over 39 fingers. Table 2 gives these figures. In our model, since we disregard empty spaces, we considered only the relative frequency of the eleven most common elements. Double occurrences, or the event that two GCs occur in the same space, while certainly possible, were ignored in this model calculation, due to their small frequency. The number in the table is misleading, as it accounts for all double occurrences, not double occurrences of particular types.

Parameter   Cell configuration     Frequency   Probability of Parameter
0           Empty                  6,584       0.766
1           Island                 152         0.018
2           Bridge                 105         0.012
3           Spur                   64          0.007
4           Dot                    130         0.015
5           Ending ridge           715         0.083
6           Fork                   328         0.038
7           Lake                   55          0.006
8           Trifurcation           5           0.001
9           Double bifurcation     12          0.001
10          Delta                  17          0.002
11          Broken ridge           119         0.014
12          Multiple occurrences   305         0.036
Total                              8,591       1.000

TABLE 2. Experimentally determined Galton Characteristic probability numbers. From Osterburg, et al. Our model disregards multiple occurrences, hence for our purposes, the characteristics numbered 0 and 12 are empty cells. Only the characteristics numbered 1-11 are relevant.

The relative probability is a necessary factor for determining which characteristic is most likely to occur in the n GC cells. The probability of the ith occurrence is given by:

$$r_i = \frac{P(i)}{\sum_{j=1}^{11} P(j)} \qquad (3)$$

where the elements P(i) are determined from Table 2. The i in this case ranges from 1 to 11, as our model considers only single GC occurrences, and treats the low probability and multiple occurrence GCs as empty space. It should be noted that their inclusion would decrease the relative probability of the ith term as defined above; hence, it would decrease the upper bound which our calculation aims to set. Clearly, the sum of these relative probability quantities is 1, hence they are validly defined as probabilities.
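As a minimal sketch of equation (3), the relative probabilities can be computed directly from the Table 2 frequencies for parameters 1-11. Using raw frequencies instead of the rounded probabilities is a choice made here (the common factor 1/8,591 cancels in the ratio); the names `GC_FREQ` and `relative_probabilities` are hypothetical.

```python
# Frequencies of single-GC cells (parameters 1-11), copied from Table 2.
GC_FREQ = {
    "island": 152, "bridge": 105, "spur": 64, "dot": 130,
    "ending ridge": 715, "fork": 328, "lake": 55, "trifurcation": 5,
    "double bifurcation": 12, "delta": 17, "broken ridge": 119,
}


def relative_probabilities(freq):
    """Return r_i = P(i) / sum_j P(j) over the characteristics in `freq`."""
    total = sum(freq.values())
    return {name: count / total for name, count in freq.items()}


r = relative_probabilities(GC_FREQ)
most_likely = max(r, key=r.get)  # the characteristic with the largest r_i
print(most_likely, r[most_likely])
print(sum(r.values()))  # the r_i sum to 1 up to floating-point rounding
```

The largest r_i belongs to the ending ridge, which is the characteristic the text later uses for r_max.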

For n GCs, the probability of each arrangement is given by the relative probability of each GC raised to the power of the number of times that GC is selected, divided by the number of ways to divide those n elements into groups categorized by the eleven GCs considered. Though the idea is complex, the notation is mathematically rather simple: it is the product of the selection probabilities divided by the multinomial coefficient corresponding to n choosing n1 of GC number 1, n2 of GC number 2, etc. If we divide this quantity by the number of ways the n GCs can be placed in the N cells, we obtain the probability of each arrangement of n GCs, shown in equation (4a).

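$$P_{GC} = \frac{\prod_{i=1}^{11} r_i^{n_i}}{\binom{n}{n_1,\ldots,n_{11}}\binom{N}{n}} = \frac{\prod_{i=1}^{11} r_i^{n_i}}{\dfrac{n!}{\prod_{i=1}^{11} n_i!}\cdot\dfrac{N!}{n!\,(N-n)!}} = \frac{(N-n)!}{N!}\prod_{i=1}^{11} r_i^{n_i}\, n_i! \qquad (4a)$$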

One should note that in the above,

$$\sum_{i=1}^{11} n_i = n \qquad (4b)$$

hence there are only as many stages considered in the determination of GCs as there are GCs that are measured and available to compare to.
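A minimal sketch of equation (4a), under the reading that P_GC is the product of r_i^(n_i) divided by both the multinomial coefficient for (n_1, ..., n_11) and the binomial coefficient C(N, n), which simplifies to ((N-n)!/N!) times the product of r_i^(n_i) n_i!. Working with log-factorials via `math.lgamma` is a choice made here to avoid overflow for N = 1200; the function name `log_p_gc` is hypothetical.

```python
from math import exp, lgamma, log


def log_p_gc(counts, rel_prob, total_cells):
    """Natural log of P_GC = ((N-n)!/N!) * prod_i r_i**n_i * n_i!.

    counts[i] is n_i, rel_prob[i] is r_i, total_cells is N.
    """
    n = sum(counts)  # equation (4b): the n_i add up to n
    log_p = lgamma(total_cells - n + 1) - lgamma(total_cells + 1)  # log((N-n)!/N!)
    for n_i, r_i in zip(counts, rel_prob):
        if n_i:  # characteristics that are not selected contribute a factor of 1
            log_p += n_i * log(r_i) + lgamma(n_i + 1)  # log(r_i**n_i * n_i!)
    return log_p


# Toy check with two equally likely characteristics on a 10-cell print:
# (0.5 * 0.5) / (C(2; 1, 1) * C(10, 2)) = 0.25 / (2 * 45) = 0.25 / 90
print(exp(log_p_gc([1, 1], [0.5, 0.5], 10)), 0.25 / 90)
```

The toy numbers (two characteristics, ten cells) are illustrative only and are not taken from the tables.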

To reiterate, our algorithm for calculating Doppelganger thumb probabilities considers separately the probabilities of both the general pattern and GC occurrence. The probability of GC occurrence is determined by the number of places in which GCs are observed, the relative probability of a GC occurring there, and the number of ways these GCs can then be ordered. The quantification of this is then given by equation (4a).

Now, given equations (1) and (4a), we can calculate the probability of any particular fingerprint matching on both the pattern and any n GCs by using the information in Tables 1 and 2. Since we wish, then, to put a limit on the number of people in the world who can match fingerprints, given these characteristics, we calculated Pmax, the probability of any thumbprint matching a template with only the most likely characteristics in each of the GC places. This simplifies equation (4a), by restricting choice to only the GC with maximum probability. Thus we have

$$P_{GC} = \frac{\prod_{i=1}^{11} r_i^{n_i}}{\binom{n}{n_1,\ldots,n_{11}}\binom{N}{n}} = \frac{r_{max}^{\,n}}{\binom{n}{n,0,\ldots,0}\binom{N}{n}} = \frac{r_{max}^{\,n}}{\binom{N}{n}} = P_{max} \qquad (5)$$

Some plots of this are given in Appendix A. These plots use the value of r_max obtained by computing the relative probability of ending ridges, and consider only the right and left loop patterns (occurring in equal supply) to constitute the maximum pattern probability.
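A minimal sketch of the maximum-probability calculation, assuming equation (5) reads P_max = r_max^n / C(N, n), with r_max = 715/1702 (ending ridges over all single-GC cells in Table 2) and an optional pattern factor such as 0.65 for loops. The function name and the `pattern_prob` parameter are hypothetical, and the printed values follow from this reading of the formula, not from the paper's spreadsheet.

```python
from math import lgamma, log

R_MAX = 715 / 1702  # assumption: ending-ridge frequency over parameters 1-11 of Table 2


def log10_p_max(n, total_cells=1200, r_max=R_MAX, pattern_prob=1.0):
    """log10 of pattern_prob * r_max**n / C(total_cells, n)."""
    log_binom = (lgamma(total_cells + 1) - lgamma(n + 1)
                 - lgamma(total_cells - n + 1))
    return (log(pattern_prob) + n * log(r_max) - log_binom) / log(10)


# Order of magnitude of the bound for a few template sizes, loops only.
for n in (4, 8, 12):
    print(n, round(log10_p_max(n, pattern_prob=0.65), 1))
```

Returning a base-10 logarithm keeps the result readable as an order of magnitude and avoids underflow for large n.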

To calculate the quantities determined in equation (5), it becomes necessary to calculate factorials of very large numbers to determine values of N choose n. This can be done approximately by using Stirling’s approximation, whose formula is given by

$$\log(m!) \approx m\log(m) - m + \tfrac{1}{2}\log(2\pi m) \qquad (6)$$

This, in turn, leads us to the approximation

$$\log\binom{N}{n} = \log(N!) - \log((N-n)!) - \log(n!) \qquad (7)$$

which can be utilized to approximate $\binom{N}{n}$.
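As a rough check on how well Stirling's approximation serves here, the sketch below compares log C(1200, 12) built from the Stirling formula for log(m!) with the same quantity built from `math.lgamma`. The function names are hypothetical; natural logarithms are used throughout.

```python
from math import lgamma, log, pi


def stirling_log_factorial(m):
    """Stirling's approximation: log(m!) ~ m log m - m + 0.5 log(2 pi m), for m >= 1."""
    return m * log(m) - m + 0.5 * log(2 * pi * m)


def stirling_log_binom(total_cells, n):
    """log C(N, n) = log N! - log (N-n)! - log n!, each term via Stirling (needs 1 <= n < N)."""
    return (stirling_log_factorial(total_cells)
            - stirling_log_factorial(total_cells - n)
            - stirling_log_factorial(n))


exact = lgamma(1201) - lgamma(1189) - lgamma(13)  # lgamma(m + 1) = log(m!)
approx = stirling_log_binom(1200, 12)
print(exact, approx, approx - exact)
```

The discrepancy is dominated by the smallest factorial, n!, where Stirling's formula is least accurate (its error is roughly 1/(12m)).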

If we suppose that a percentage of GCs are dependent on the overlying pattern, then our model changes very little. Assuming that l of the n total GCs are dependent on a particular pattern, we can essentially disregard all pattern-dependent GCs as empty cells, as they would be exactly what is expected in the print at that point in the pattern. Hence, with a slight modification from n to n – l, where l denotes the number of GCs dependent on the pattern, equations (4a) and (4b) can still be utilized. In the event that the GCs are wholly determined by the overlying pattern, we can disregard the influence of the pattern in our calculation of Pfp, as we have more precise information about GC form and occurrence than we do about pattern and sub-pattern form and occurrence. Also, our estimates for the likelihood of a GC occurring at a given point in the N-square array give a more limiting maximum for the probability than do our figures on general pattern characteristics. The omission of the pattern influence on the fingerprint probability is completely valid, since total GC dependence on pattern is equivalent to total pattern dependence on GC; they simply become two different types of taxonomy.
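One way to sketch the dependence adjustment, assuming the maximum GC probability has the form r_max^n / C(N, n): for partial dependence, l of the n GCs are treated as empty cells and the pattern factor is kept; for total dependence, all n GCs are kept and the pattern factor is dropped. Rounding l down to a whole number, the default pattern probability of 0.65, and the function name are all assumptions made for illustration.

```python
from math import lgamma, log


def log10_p_fp_max(n, dependent_fraction, total_cells=1200,
                   r_max=715 / 1702, pattern_prob=0.65):
    """log10 of the maximum fingerprint probability at a given pattern dependence."""
    if dependent_fraction >= 1.0:
        # Total dependence: pattern and GCs carry the same information,
        # so only the GC factor with all n characteristics is used.
        effective_n, log_pattern = n, 0.0
    else:
        l = int(dependent_fraction * n)  # assumption: round l down
        effective_n, log_pattern = n - l, log(pattern_prob)
    log_binom = (lgamma(total_cells + 1) - lgamma(effective_n + 1)
                 - lgamma(total_cells - effective_n + 1))
    return (log_pattern + effective_n * log(r_max) - log_binom) / log(10)


for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(fraction, round(log10_p_fp_max(12, fraction), 1))
```

Treating more GCs as pattern-determined leaves fewer independent identifying cells, so the bound rises as the dependent fraction grows toward (but does not reach) 100%.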

IV. Data

Returning to the problem now, we are specifically asked to determine the probability that a person can be misidentified by fingerprint evidence; that is, we are to determine the probability that two people share the same fingerprint characteristics. For a template with n GCs, we are to calculate the probability that two distinct people match the template. This is limited by the square of Pmax for a given n, which, as graphed in Figure 3 below, is seen to be very low for all n ≥ 10. For the value of n = 12, taken in Osterburg, et al. to be a median value for what is required for verification by various international law enforcement agencies, we can see that the probability of fingerprint multiplicity is 4.64 × 10^-15. These calculations were performed using a Microsoft Excel spreadsheet and the formulas in Section III.

[Plot: Maximum Probabilities at Various Pattern Dependencies. Horizontal axis: Number of GCs (0 to 35). Vertical axis: P_max, logarithmic scale from 1.00E-31 to 1.00E+04. Series: No Dependence, 25% Dependence, 50% Dependence, 75% Dependence, 100% Dependence.]

FIGURE 3: Plot of maximum probability as a function of the number n of GCs used in the verification process. Here n is allowed to range from 1 to 30.

Another directly applicable and highly interesting problem is the following: what is the minimum number of GCs that a particular country’s law enforcement agencies must match in order to be confident that an identification is unique? Using the population figures in Table 3, we can determine this. To do so, we multiply the population of a country by Pmax to find the number of people in that country who are expected to match a given n-GC template. The results are plotted in Appendix A.

The plots in Appendix A all point to near certain identification for n ≥ 12. This is true regardless of the country in which the identification is being made. In fact, using the world population figure, a match on a thumb with 1200 cells is all but certain to be unique, and indeed, only one person is likely to have ever existed with such a print.

Country          Number of people
US               2.925E+08
World            6.347E+09
China            1.295E+09
Liechtenstein    3.284E+04
# People Ever    1.269E+10

TABLE 3: Population figures for the world and some representative countries. The number of people ever was a figure computed on the assumption that roughly twice as many people have existed in the history of humanity as exist at this particular point in time.
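The population step can be sketched as follows: multiply each population in Table 3 by the maximum probability and find the smallest n at which fewer than one person is expected to match. The bound is assumed here to have the form r_max^n / C(N, n) with r_max = 715/1702 from Table 2 and no pattern factor; the function names are hypothetical, and the thresholds printed follow from that assumed form, not from the paper's plots.

```python
from math import lgamma, log, log10

# Population figures copied from Table 3.
POPULATION = {
    "US": 2.925e8, "World": 6.347e9, "China": 1.295e9,
    "Liechtenstein": 3.284e4, "People ever": 1.269e10,
}


def log10_p_max(n, total_cells=1200, r_max=715 / 1702):
    """log10 of r_max**n / C(total_cells, n)."""
    log_binom = (lgamma(total_cells + 1) - lgamma(n + 1)
                 - lgamma(total_cells - n + 1))
    return (n * log(r_max) - log_binom) / log(10)


def min_gcs_for_unique_match(population, total_cells=1200):
    """Smallest n for which population * P_max falls below one expected match."""
    for n in range(1, total_cells + 1):
        if log10(population) + log10_p_max(n, total_cells) < 0:
            return n
    return None  # no template size brings the expectation below one


for name, people in POPULATION.items():
    print(name, min_gcs_for_unique_match(people))
```

Comparing in log space keeps the test `population * P_max < 1` numerically safe even when P_max is far below the smallest representable float.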

As was noted before, however, it might be the case that a thumb with 1200 cells is overly large, or that only partial prints can be obtained for identification purposes. In this case, we restrict the number N to a number less than 1200. For the plots in Appendix B, we changed the number 1200 in our calculation to values of N = 600 and N = 300. Though this increases the probability of finding multiple matches, due to the restriction in the number of sites in which to place n GCs, the fingerprint’s unique identity is still all but assured if as few as 12 GCs are matched.
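The effect of a partial print can be sketched by evaluating the same bound at N = 1200, 600, and 300 for a 12-GC template. As before, the bound is assumed to have the form r_max^n / C(N, n) with r_max = 715/1702 from Table 2; the function name is hypothetical and the values are illustrative of the trend only.

```python
from math import lgamma, log


def log10_p_max(n, total_cells, r_max=715 / 1702):
    """log10 of r_max**n / C(total_cells, n)."""
    log_binom = (lgamma(total_cells + 1) - lgamma(n + 1)
                 - lgamma(total_cells - n + 1))
    return (n * log(r_max) - log_binom) / log(10)


# Fewer cells means fewer possible placements, so the bound grows as N shrinks.
partial = {cells: log10_p_max(12, cells) for cells in (1200, 600, 300)}
for cells, value in partial.items():
    print(cells, round(value, 1))
```

Halving N removes roughly a factor of 2^12 from C(N, 12), so each halving raises the bound by a little under four orders of magnitude at n = 12.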

V. Error Analysis

A previous investigation by Pankanti, et al. included the orientation of each minutia in the model for fingerprint individuality. We neglect to include the factor of orientation of the characteristic for many reasons. Firstly, including the factor of GC orientation could only decrease our estimate of the maximum possible thumb Doppelganger probability. Since we are attempting only to find a maximum bound for this probability, removing a factor which can only decrease the probability of a particular print, and which would at the same time unnecessarily complicate our solution, does no damage to our model. Pankanti, who accounts for orientation in his model, arrived at a lower figure for fingerprint individuality than we did. In accounting for this orientation, however, Pankanti completely disregards the differences in minutiae, concentrating only on location and orientation of defining features in the fingerprint ridges. Some figures from various model calculations that are included in Pankanti’s paper are listed in Table 4, in Appendix C.

A second reason our model disregards orientation is that our model relies on the assumption that minutiae occur either independently or semi-independently. In accounting for orientation, we would have to take into account restrictions placed on the orientation of the GC by the overall pattern. This is simple to see: persons with loop patterns have a higher probability of upward- and downward-pointing GCs than do persons with arches. Accounting for orientation would make the pattern and minutiae probabilities inseparable, and again harm the simplicity of our model while offering little improvement to our limiting maximum.

Another unavoidable problem with our model is the roughness of pattern and GC frequencies. Unfortunately, there are no good assessments published on the percentages of the population whose patterns fall into the arch, loop, and whorl categories. The frequency of occurrence of GCs faces a similar problem. In fact, the only figures we could find were rough estimations based on a small sample of people. Osterburg, whose figures we used in this model, arrived at his probability parameters of GCs by sampling from 39 fingerprints. He did break them into a total of 8,591 cells, but as we do not know whether or not a single person is more likely to have a certain type of GC, these probabilities cannot be taken at face value [1]. Surely more recent figures for these parameters exist; again, this roughness does not harm our model itself, only the figures which it calculates.

As mentioned before, there is a possibility that there exists dependence between GCs and the overall pattern of a fingerprint. In our model, we attempted to account for this by decreasing the identifying traits of a particular minutia by 25%, 50%, and 100%. For the 100% case, we simply calculated the probability of a particular GC occurrence and disregarded the pattern, as either can be seen to be the determining factor of the other. This is not an exact model, simply because it assumes semi-independence where complete dependence may occur. Without proper relations that give the dependence of minutiae on the overall pattern, however, we are unable to properly account for this. Inasmuch as we were able to adjust for these parameters, our model still predicts that identifying 12 or more minutiae on a print, which is well within current technology, all but assures a positive match.

One who pays astute attention to our graph in Figure 3 notes that the graphs of 100% and 0% dependence are actually the closest in predicted probability. This is because removal of the pattern parameter in the calculation of Pmax only increases the overall maximum probability by an approximate factor of 10. The other figures suffer from inexactness in relating the dependence between occurrence of pattern and minutiae. In the figures for our model, we have more precise knowledge of GC occurrence than of pattern occurrence. Hence, the plots in which we require a percent dependence on pattern suffer unnecessarily from inexact data.

As we are creating a somewhat idealistic model of fingerprints, scars were not taken into consideration. As can be seen in Figure 4, scars do have an effect on the appearance of fingerprints. This may create inaccuracies; however, there is no good way to model the formation of scars, as this is completely due to personal experiences.

FIGURE 4: The effect of scars on fingerprint analysis. From Cowger, p. 4.

Our model also differs on one account from most other models of fingerprints. Previous articles [3] published on fingerprint analysis define fingerprints only as the portion in the general vicinity of the central pattern. Our model actually takes the print on the entire area above the upper joint of the thumb, which would be the type of fingerprint on file. Accordingly, our probabilities are significantly lower than those calculated by others. However, our model can, as mentioned before, be made to approximate these in the limit where the number of cells N is at a value around 300 and n is around 12. The values we calculated by this method match up with other models accordingly, as seen in Table 4.

The major problem from which our model suffers is its inability to account for human error in determining thumbprint probability. Epstein [7] notes that the major problem with latent fingerprint evidence is the inability of the humans who examine the prints to discern exact characteristics. We now have the ability to use optical scans to determine fingerprints of an individual exactly, as opposed to putting ink on file. If thumbprint matches were tested by a computer, it would be highly unlikely, given our model, that anyone would ever be misidentified.


Comparing the output of our model with the probabilities of error in DNA analysis, we find that fingerprints are a much more accurate method of identification. Though everyone except identical twins and clones has a unique sequence of DNA, for criminology, the exact sequence is not actually used as evidence. Instead, DNA is cut up with an enzyme into restriction fragment length polymorphisms (RFLPs). These pieces of DNA are then run out on a gel, which separates them by the size of the segment [8]. Accordingly, if two or more people simply have restriction sites in approximately the same area, or even have the same amounts of DNA between restriction sites, they can be mistaken for one another. The probability of this is much higher than if the exact sequence were taken into account. Thus, though misidentification is rare, the probability of misidentification in DNA analysis is on the order of one in ten billion, while according to our data that of fingerprint analysis is much lower [5].


VI. Conclusion

Initially, this problem aroused in us many concerns. What if one of us really had a thumb Doppelganger? We could be convicted for crimes we had never committed! This situation would be most unfortunate. However, after running our model under a case of maximum probability, we discovered that there is a better chance of misidentification through DNA profiling than through fingerprint analysis, provided the fingerprint analysis is conducted with minimal human error. This is plainly evident in the fact that the probability of misidentification of DNA evidence, regarded in legal and public opinion as nearly infallible, is on the order of 10^-10, while the probability of fingerprint misidentification is four orders of magnitude lower, according to our model. Needless to say, it seems unreasonable to deny fingerprint profiling as evidence in a criminal trial.
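The comparison above can be checked with a few lines of arithmetic. The sketch below uses the DNA figure quoted in the text, the bound of 4 x 10^-15 per print stated in Appendix C, and the "# People Ever" figure from Table 3. It assumes that the expected number of like thumbprints is simply population times probability, which treats individuals as independent; the function and constant names are hypothetical.

```python
import math

P_DNA = 1e-10           # order of magnitude quoted for DNA misidentification
P_PRINT_MAX = 4e-15     # upper bound for a single print, as stated in Appendix C
PEOPLE_EVER = 1.269e10  # "# People Ever" figure from Table 3

def orders_of_magnitude(larger: float, smaller: float) -> float:
    """Base-10 logarithm of the ratio between two probabilities."""
    return math.log10(larger / smaller)

def expected_matches(population: float, probability: float) -> float:
    """Expected count of like prints, assuming independent individuals."""
    return population * probability

gap = orders_of_magnitude(P_DNA, P_PRINT_MAX)
matches = expected_matches(PEOPLE_EVER, P_PRINT_MAX)
print(f"gap between DNA and fingerprint figures: {gap:.1f} orders of magnitude")
print(f"expected like thumbprints among all people ever: {matches:.2e}")
```

With these inputs the gap comes out at roughly 4.4 orders of magnitude, and the expected number of like thumbprints among everyone who has ever lived is far below one.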


Appendix A: Shared Characteristics of a Population

The following plots were used to determine the optimum figure for identification of criminals based on fingerprint evidence that is given in section IV.

[Plot: Number of like thumbprints, 0% dependence, N=1200. Horizontal axis: Number of GCs (0 to 35). Vertical axis: Number of people with thumbprint (logarithmic, 1.E-25 to 1.E+10). Series: US Most, World Most, China Most, Liechtenstein Most, Ever Most.]

Figure 5: Plot of the number of probable like thumbprints in a given country using the model of zero percent pattern dependence. This shows that if only 10 minutiae are required to match, then it is likely that no one in the history of the world has had an exactly matching whole thumbprint.

[Plot: Number of like thumbprints, 25% dependence, N=1200. Horizontal axis: Number of GCs (0 to 35). Vertical axis: Number of people with thumbprint (logarithmic, 1.00E-24 to 1.00E+11). Series: US Most, World Most, China Most, Liechtenstein Most, Ever Most.]

Figure 6: Same as above, for the 25% dependence model. Here, only 10 minutiae are required for positive identification as well.


[Plot: Number of like thumbprints, 50% dependence, N=1200. Horizontal axis: Number of GCs (0 to 35). Vertical axis: Number of people with thumbprint (logarithmic, 1.00E-21 to 1.00E+09). Series: US Most, World Most, China Most, Liechtenstein Most, Ever Most.]

Figure 7: Same as above, for the 50% pattern dependence model. Here, around 12 characteristics are required for a highly probable identification. The difference here is likely caused by error in our knowledge of pattern frequencies.

[Plot: Number of like thumbprints, 100% dependence, N=1200. Horizontal axis: Number of GCs (0 to 35). Vertical axis: Number of people with thumbprint (logarithmic, 1.00E-26 to 1.00E+09). Series: US Most, World Most, China Most, Liechtenstein Most, Ever Most.]

Figure 8: Same as above, for the complete dependence model. Again, only about 10 characteristics are required for a positive identification.


Appendix B: Shared Partial Print Characteristics of a Population

The following plots were used to determine the optimum number of GCs to match up within a given population if only partial prints are available for comparison.

[Plot: Number of like thumbprints, 0% dependence, N=600. Horizontal axis: Number of GCs (0 to 35). Vertical axis: Number of people with thumbprint (logarithmic, 1.00E-22 to 1.00E+13). Series: US Most, World Most, China Most, Liechtenstein Most, Ever Most.]

Figure 9: A plot of the number of possible like half-thumbprints, given zero dependence on fingerprint pattern.

[Plot: Number of like thumbprints, 100% dependence, N=600. Horizontal axis: Number of GCs (0 to 35). Vertical axis: Number of people with thumbprint (logarithmic, 1.00E-22 to 1.00E+13). Series: US Most, World Most, China Most, Liechtenstein Most, Ever Most.]

Figure 10: A plot of the number of possible like half-thumbprints, given one hundred percent dependence on fingerprint pattern.


[Plot: Number of like thumbprints, 0% dependence, N=300. Horizontal axis: Number of GCs (1 to 30). Vertical axis: Number of people with thumbprint (logarithmic, 1.00E-18 to 1.00E+12). Series: US Most, World Most, China Most, Liechtenstein Most, Ever Most.]

Figure 11: A plot of the number of possible like quarter-thumbprints, given zero dependence on fingerprint pattern.

[Plot: Number of like thumbprints, 100% dependence, N=300. Horizontal axis: Number of GCs (1 to 30). Vertical axis: Number of people with thumbprint (logarithmic, 1.00E-18 to 1.00E+12). Series: US Most, World Most, China Most, Liechtenstein Most, Ever Most.]

Figure 12: A plot of the number of possible like quarter-thumbprints, given one hundred percent dependence on fingerprint pattern.


Appendix C: Table of Calculated Probabilities

These probabilities were calculated using various past models by Pankanti [6]. As noted earlier, our model, which predicts a value less than 4 x 10^-15 for the probability of each individual fingerprint, is in good agreement with these calculations.

Author                      Pfp                                                     N=36, R=24, M=72    N=12, R=8, M=72
Galton (1892)               $(1/16)(1/256)(1/2)^{R}$                                1.45 x 10^-11       9.54 x 10^-7
Pearson (1930)              $(1/16)(1/256)(1/36)^{R}$                               1.09 x 10^-41       8.65 x 10^-17
Henry (1900)                $(1/4)^{N+2}$                                           1.32 x 10^-23       3.72 x 10^-9
Balthazard (1911)           $(1/4)^{N}$                                             2.12 x 10^-22       5.96 x 10^-8
Bose (1917)                 $(1/4)^{N}$                                             2.12 x 10^-22       5.96 x 10^-8
Wentworth & Wilder (1918)   $(1/50)^{N}$                                            6.87 x 10^-62       4.10 x 10^-21
Cummins & Midlo (1943)      $(1/31)(1/50)^{N}$                                      2.22 x 10^-63       1.32 x 10^-22
Gupta (1968)                $(1/10)(1/10)(1/10)^{N}$                                1.00 x 10^-38       1.00 x 10^-14
Roxburgh (1933)             $(1/1000)(1.5/(10 \times 2.412))^{N}$                   3.75 x 10^-47       3.35 x 10^-18
Trauring (1963)             $(0.1944)^{N}$                                          2.47 x 10^-26       2.91 x 10^-9
Osterburg et al. (1980)     $(0.766)^{M-N}(0.234)^{N}$                              1.33 x 10^-27       3.05 x 10^-15
Stoney (1985)               $(N/5) \times 0.6 \times (0.5 \times 10^{-3})^{N-1}$    1.2 x 10^-80        3.5 x 10^-26

TABLE 4: Calculated probabilities for various models. Obtained from Pankanti et al. [6]. Here, R is the number of regions of a fingerprint considered as defined by Galton, and M is the number of regions as defined by Osterburg.
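As a rough cross-check of Table 4, the sketch below evaluates a subset of the listed models at both parameter settings. The closed forms are our reading of the table entries, taken as an assumption rather than as the original authors' exact formulations; N is the number of minutiae, and the dictionary and function names are hypothetical.

```python
# Closed forms as read from Table 4 (an assumption, not the authors' exact text).
# N = number of minutiae, R = Galton regions, M = Osterburg regions.
MODELS = {
    "Galton (1892)": lambda N, R, M: (1 / 16) * (1 / 256) * (1 / 2) ** R,
    "Pearson (1930)": lambda N, R, M: (1 / 16) * (1 / 256) * (1 / 36) ** R,
    "Henry (1900)": lambda N, R, M: (1 / 4) ** (N + 2),
    "Balthazard (1911)": lambda N, R, M: (1 / 4) ** N,
    "Wentworth & Wilder (1918)": lambda N, R, M: (1 / 50) ** N,
    "Cummins & Midlo (1943)": lambda N, R, M: (1 / 31) * (1 / 50) ** N,
    "Gupta (1968)": lambda N, R, M: (1 / 10) * (1 / 10) * (1 / 10) ** N,
    "Roxburgh (1933)": lambda N, R, M: (1 / 1000) * (1.5 / (10 * 2.412)) ** N,
    "Trauring (1963)": lambda N, R, M: 0.1944 ** N,
    "Osterburg et al. (1980)": lambda N, R, M: 0.766 ** (M - N) * 0.234 ** N,
}

def evaluate(N: int, R: int, M: int) -> dict:
    """Evaluate every model at one (N, R, M) setting."""
    return {name: formula(N, R, M) for name, formula in MODELS.items()}

full = evaluate(N=36, R=24, M=72)     # first numeric column of Table 4
partial = evaluate(N=12, R=8, M=72)   # second numeric column of Table 4

for name in MODELS:
    print(f"{name:28s} {full[name]:.2e}  {partial[name]:.2e}")
```

Evaluated this way, each of these rows agrees with the tabulated values to within rounding, which supports reading the N=12 column as the appropriate comparison for our partial-print results.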


References

[1] J. Osterburg, et al., “Development of a Mathematical Formula for the Calculation of Fingerprint Probabilities Based on Individual Characteristics”, Journal of the American Statistical Association, Vol. 72, No. 360, pp. 772-778, 1977.

[2] S. L. Sclove, “The Occurrence of Fingerprint Characteristics as a Two Dimensional Process”, Journal of the American Statistical Association, Vol. 74, No. 367, pp. 588-595, 1979.

[3] James F. Cowger, Friction Ridge Skin: Comparison and Identification of Fingerprints, Elsevier Science Publishing Co. Inc., New York, New York, 1983.

[4] The Noble Qur’an: In the English Language, Dr. Muhammad Taqi-un-Din Al-Hilali. Riyadh, Houston, Lahore: Darussalam Publishers and Distributors, 1998.

[5] “DNA Fingerprinting.” The Columbia Encyclopedia, Sixth Edition. New York: Columbia University Press, 2003.

[6] Sharath Pankanti, et al., “On the Individuality of Fingerprints”, http://biometrics.cse.msu.edu/2cvpr230.pdf

[7] Robert Epstein, “Fingerprints Meet Daubert: The Myth of Fingerprint “Science” is Revealed”, Southern California Law Review, Vol. 75, pp. 605-658, 2002.

[8] Anthony J. F. Griffiths, Modern Genetic Analysis, W. H. Freeman and Company, New York, New York, 2002.

