Estimates of Striation Pattern Identification Error Rates by Algorithmic Methods
In: AFTE J, 45(3):235-244, 2013
Nicholas D. K. Petraco1, Loretta Kuo1, Helen Chan1, Elizabeth Phelps2, Carol Gambino3, Patrick McLaughlin1,4, Frani Kammerman1, Peter Diaczuk1, Peter Shenkin1, Nicholas Petraco1,5 and James Hamby6
1John Jay College of Criminal Justice, City University of New York, 524 West 59th Street, New York, NY, 10019
2Boston Police Department, Crime Laboratory, One Schroeder Plaza, Boston, MA 02120
3Borough of Manhattan Community College, City University of NY, 199 Chambers Street, New York, NY, 10007
4New York City Police Department, Evidence Collection Unit.
1,5Petraco Forensic Consulting, 240 Abbey Street, Massapequa Park, NY 11762.
6International Forensic Science Laboratory & Training Centre, 2265 Executive Drive, Indianapolis, IN, 46241.
Abstract: This study presents a computationally based methodology to estimate identification
error rates of striation patterns in as modern and objective a way as possible. A database was
assembled consisting of 3D striation patterns generated by standard tip screwdrivers and 9-mm
Glock firing pin apertures. These toolmark surfaces were digitally recorded by white light
confocal microscopy commonly used for surface metrology applications. Multivariate
algorithmic methods were used which encompass few assumptions and have a long and
successful application history in many scientific fields. Specifically, principal component
analysis and support vector machine methodology were exploited to objectively associate
striation patterns with the tools that created them. Estimated toolmark identification error rates
were far less than 1% so long as enough toolmark data is used to train the algorithm. Realizing
that our approach to this problem is not the only one possible and to stimulate interest in
constructing an open reference database of toolmarks and computer programs, all of the data and
software generated for this study are available at http://toolmarkstatistics.jjay.cuny.edu/ to
registered users for free.
Key words: forensic science, toolmarks, cartridge cases, screwdrivers, striation pattern,
database, confocal microscopy, surface metrology, multivariate statistics, machine
learning, error rates
Introduction
Forensic science has come under increased scrutiny in recent years. In February 2009, the
National Academy of Sciences (NAS) released their report, “Strengthening Forensic Science in
the United States: A Path Forward” [1]. The NAS report (2009) states that “much forensic
evidence— including, for example, bite marks and firearm and toolmark identifications—is
introduced in criminal trials without any meaningful scientific validation, determination of error
rates, or reliability testing to explain the limits of the discipline” (p. 3-18). It suggests
“additional studies should be performed to make the process of individualization more precise
and repeatable” (p. 5-21). This study outlines one such set of objective and testable methods to
associate toolmark impression evidence with the tools and firearms that generated them.
The basic elements of toolmark examination and comparison include the production of an
exemplar toolmark made from a questioned tool, and the comparison of the impressions of the
exemplar with those of the toolmark found at the crime scene. We can quantify this method by
representing impressions made by tools and firearms as mathematical patterns composed of
features. A particular approach for recognizing variations in patterns is with multivariate
statistical/algorithmic methods. In a computational pattern recognition context, these methods
are often referred to as machine learning. The mathematical details of machine learning can give
what Moran calls “…the quantitative difference between an identification and non-
identification” [2]. In our study, we use a surface metrological-algorithmic scheme to
statistically estimate the identification error rate parameter for striation pattern comparisons. In
the interests of transparency and reproducibility we necessarily focus on the details of the
approach.
Literature Review
Applications of some form of statistical/probabilistic method to toolmark analysis are
extensive in the literature. In this review we necessarily limit ourselves to those methods that
relied explicitly on 3D imaging and on the use of computers. Geradts, Keijer, and
Keereweer created a database for toolmarks (TRAX) with video-images and data about
toolmarks (width of toolmark, type of tool, microscope magnification, etc.) [3]. A video camera
on a comparison microscope is connected to a computer, which is used to scan the striation
patterns and digitize the image. They developed an algorithm for the automatic comparison of
digitized striation patterns. A comparison screen in TRAX makes it possible to compare images
of toolmarks. The system was tested with ten screwdrivers of the same brand and all striation
marks were identified with the correct screwdriver.
De Kinder and Bonfanti developed a system capable of performing automated comparisons
between striation marks on bullets, using laser profilometry, a non-contact laser scanning
technique that records the topography of a bullet [4]. The system was able to obtain a one-
dimensional array of characteristics out of the recorded data (a feature vector) and compare it to
similar quantities from other bullets using a correlation technique.
Bachrach discussed the development of SciClops, an automated microscope comparison
system using a 3D characterization of a bullet’s surface [5]. Preliminary tests were conducted to
evaluate the ability of the system to identify and distinguish bullets. It was determined that it was
possible to acquire reliable characterizations of a bullet’s surface, to accurately identify
similarities between bullets fired by the same gun, and to accurately discriminate between bullets
fired by different guns.
Banno, Masuda, and Ikeuchi presented an algorithm for shape
comparison of impressions on bullets using 3D shape data [6]. A confocal microscope was used
to obtain 3D data of striated surfaces and to visualize virtual impressions. They then aligned the
3D data and compared the shapes of the striations by computing a distance between the two
surfaces.
Senin et al. introduced a 3D virtual comparison microscope to compare two specimens
through their virtual 3D reconstructions [7]. The authors determined that systems based on 3D
surface topography can aid in the visual comparison process, as well as in making quantitative
measurements over shape data. Furthermore, algorithms were also used to generate artificially
enhanced images. They concluded that visual enhancement tools and quantitative measurement
of shape properties could help a firearm examiner in comparing toolmarks.
A system known as BulletTRAX-3D™ aids forensic firearms examiners in the comparison
process. This system uses three-dimensional sensory technology, allowing operators to capture
2D digital images and to create 3D topographic models of the bullet's surface area. Roberge and
Beauchamp decided to apply the Tontarski and Thompson test to BulletTRAX-3D and determine
if the system was able to correctly match each numbered pair to a unique lettered pair [8,9]. The
test involves the comparison of twenty-one pairs of 9mm Luger Hi-Point bullets fired from ten
consecutively manufactured Hi-Point barrels. In the Roberge and Beauchamp paper, all pairs of
bullets in the test were imaged with BulletTRAX-3D, which computed a score that quantifies the
similarity of standard and test bullets. BulletTRAX-3D was able to accurately match each of the
numbered and lettered pairs, showing that the system could reproduce what firearms examiners
would do manually [8].
Brinck attempted to determine whether newer 3D imaging technology was better than 2D
technology by evaluating the abilities of IBIS and BulletTRAX-3D. In his experiment, bullets
from ten consecutively manufactured barrels were fired into a water recovery tank [10]. One pair
of copper-jacketed bullets and one pair of lead bullets were selected from those generated and
uploaded into IBIS and BulletTRAX-3D by the same operator. Brinck concluded that, although
IBIS is an effective tool for the identification of copper-jacketed bullets, BulletTRAX-3D was
better at identifying all bullet types tested (copper-jacketed, lead, and inter-composition bullets)
[10].
Faden et al. developed a computer program to compare toolmarks made from forty-four
consecutively manufactured screwdrivers on soft lead plates [11]. A surface profilometer was
used to make height, depth, and width measurements as a function of location on the sample
surfaces. Four marks were produced using both sides of each tool at three different angles (30°,
60°, and 85°). Pearson correlation was used to compare toolmarks involving true matches, true
non-matches, and marks made from different sides of the same tool. All produced high
correlation values, suggesting that the Pearson correlation alone is not effective at determining
when there is an actual match. There was, however, a significant separation in correlation values
between true match and true non-match toolmarks produced at the same angle, as well as
toolmarks made from different sides of the same screwdriver tip, supporting the hypothesis that
different sides of a screwdriver act as different tools when producing toolmarks.
Chumbley et al. extended the Faden et al. study by comparing the effectiveness of an
algorithm to that of human examiners [11,12]. The algorithm first performed an optimization step, in
which it identified a region of best agreement between the toolmark datasets being compared.
Next, in a validation step, certain corresponding areas in the region
of best fit (on both toolmarks) were compared and a correlation value was calculated. If a match
exists at one point along the scan length (Optimization), there should be large correlations
between corresponding areas along their entire length (Validation). The authors then conducted a
double-blind study in which fifty experienced toolmark examiners gave their opinions on the
sample set. In the end, the authors determined that examiner performance was much better than
that of the algorithm, but that the deficiencies of the algorithm could now be addressed and improved upon [12].
Chu et al. estimated the width of lands for 48 bullets using confocal microscopy. In their
study, each barrel had six lands; as a result, 288 land engraved area (LEA) widths were
calculated from the topography images [13]. The 48 bullets were classified into different groups
based on the width class characteristic for each LEA. Once the average profile was determined for
each LEA image, cross-correlation values were computed between the LEAs of two bullets and a
list of the best candidates was generated. For all 48 lists, the average number of correct matching
bullets was about 9.3% higher than that obtained using current optical reflection systems.
Furthermore, the error rate was about 24% smaller with confocal microscopy [13].
Bachrach, Jain, Jung, and Koons compared striated toolmarks from screwdrivers and
tongue and groove pliers using confocal microscopy [14]. They considered the effect of changing
the substrate onto which the toolmarks were created, as well as the angle of incidence for
creating the toolmark. Bachrach et al. sought to validate the basic premise of toolmark
examination, namely that toolmarks exhibit a high degree of individuality [14]. Algorithms were
developed to generate toolmark signatures, while metrics were used to assess the degree of
similarity between known matching and non-matching pairs. From these similarity values, the
authors determined that it was possible to evaluate “the degree to which toolmarks created by the
same tool are repeatable and distinguishable from toolmarks created by other tools” [14]. They
concluded that: (1) the striated toolmarks produced on the same medium and under the same
conditions were both repeatable and specific enough to allow for reliable identification of the
producing tool; (2) striated toolmarks created on different media but under the same conditions
could still be identified with high reliability; (3) screwdriver striated marks depend more on the
angle at which the toolmark is created than the media; (4) the probability of a pair of different
tools having similar features is extremely low; and (5) the probability of error from a faulty
image, not because of the tool itself, would not create repeatable and individual toolmarks [14].
Database and web interface
The database of all 3D toolmarks recorded for this study is available to the
firearms/toolmark research community and the forensic firearms/toolmark practitioner
community at http://toolmarkstatistics.jjay.cuny.edu/. Users can sign up to request an account
and, after approval, will have full access to the database. Several pieces of software and
statistical analysis scripts were generated in the process of carrying out this project and are also
available on the above website. Note that the database is meant to explore what 3D microscopy
and computational pattern recognition are capable of: research, algorithmic development/testing,
and the generation of 3D toolmark images for case/court presentation purposes. It is not (yet)
meant specifically for casework.
A web interface was developed so that the data collected, as well as the
statistical/visualization software, can be searched or downloaded by interested users. Figure 1
shows a screenshot of the homepage for the database. Queries are returned as text descriptors
that can be clicked on to download data files to the user’s computer. The 3D surface data is
stored using the Mountains® metrology software system format. The Mountains® binary data
format was chosen because it is generally well known in the scientific community (specifically
metrology/mechanical engineering) and is published. Users will not need to have the Mountains®
software to open the files downloaded from the database. A Java language “plug-in” has been
written for the open-source digital imaging/analysis software suite ImageJ
(http://rsbweb.nih.gov/ij/ developed at the NIH). The plug-in, which is available on the website,
allows ImageJ functionality to be used to perform measurement tasks as well as interactive 3D
viewing of the tool mark surfaces. Figures 2, 3 and 4 show several screen shots of possible uses.
Users of the database can also develop their own analysis software to operate on our data files.
Surface-file read-in routines written in the programming languages Java, C++, Python,
MATLAB and R are available on the site in order to facilitate this task.
Materials and Methods
1. Toolmark data acquisition
A Zeiss Axio CSM-700 confocal microscope was used to analyze the toolmarks produced
for this study. Confocal microscopy is an imaging technique that allows quantitative observation
of surface microstructure details, and the reconstruction of three-dimensional surface
topographies. See the excellent review article by Artigas [15].
For the "Glock" data set, 162 9-mm cartridge cases fired from twenty-four Glock pistols
were collected. The primer shear marks were scanned using 50x magnification (0.95 NA).
Because the shear marks were not always normal to the breech face surface, the cases were
mounted on a goniometer during the scanning process to reduce tilt, keeping the scanned volume
(required confocal stack) to a minimum. A quick pre-scan with the 10x objective allowed
evaluation and accommodation for this natural tilt. Resulting scanned tiles were stitched together
and noisy end portions were cropped out. Zeiss "Z-interpolation" was used to threshold and
smooth dropouts and outliers [16].
Fifteen Craftsman® brand screwdrivers, 10 Iron Bridge® brand screwdrivers and 4
Workforce® brand screwdrivers were used to construct the "screwdriver" database (29 exemplar
screwdrivers total, all new and unused). Striation patterns from both sides of the 29 screwdrivers
were recorded, with five replicates each, creating a total of 290 striation patterns. Lead was used
as the recording medium. It is soft enough that it will not damage the tool’s working surface, and
it lacks the pitted texture that we observed on wax under the high magnification needed for the
confocal microscopy. The pitting on the wax surface significantly added to dropouts observed on
the digitally recorded surface. The screwdriver striation pattern exemplars were made using a jig
constructed to give the examiner good control over the tool’s lateral and rotational angles with
respect to the impression medium (cf. Figure 5). The jig was set to a consistent angle of 15˚ for
comparison purposes, and in each case, the screwdriver was pulled toward the operator. Note
that the same angle of attack was used in the screwdriver study of Bachrach et al. [14]. The
exemplars were scanned using the 50x long-working-distance objective (0.6 NA) due to the high
ridges encountered at the edges of the striation patterns. During scanning, the left edge of each
toolmark was denoted to provide a point of reference for the section of striations. Once the left
edge was marked, the starting point of the scan was moved 1,000 µm to the right. The purpose of this was to decrease scan
time. The confocal microscope collects slices of information in the z-direction. Because the left
edge of the striation patterns is generally so much higher than the rest of the mark, scanning
from the left edge would have increased the scanning time dramatically, with relatively little gain
of information. From this point (1,000 µm from the left edge), seven sections were selected so
that there was some overlap for the confocal microscope software to stitch together. Zeiss "Z-
interpolation" was used to threshold and smooth dropouts and outliers [16].
2. Toolmark surface preprocessing routine
2a. Form removal and Filtering
Due to long range surface warping during the toolmark formation process, form was
removed for all recorded striation patterns. Third order polynomial surface fits were used for all
recorded striated surfaces. This degree polynomial was chosen because it was observed to have a
minimal set of degrees of freedom to remove a majority of gross surface warp. The resulting
form removed striation patterns were filtered into roughness and waviness components using the
Gaussian filter and λxc = λyc = 0.025 mm cutoff values [13,17]. Note that the definitions of form,
waviness and roughness are not unique. Our definitions correspond to the parameters stated
above.
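The roughness/waviness split described above can be illustrated with a small Python sketch that filters a single profile rather than a full surface. It is not the authors' routine. The weighting function is the usual surface-metrology Gaussian with 50% transmission at the cutoff; the 0.001 mm sample spacing, the truncation of the kernel at one cutoff length, and the renormalization near the ends of the profile are assumptions made only for this example.

```python
import math

ALPHA = math.sqrt(math.log(2.0) / math.pi)  # gives 50% transmission at the cutoff


def gaussian_kernel(cutoff_mm, spacing_mm):
    """Gaussian weights sampled every spacing_mm, truncated at +/- one cutoff."""
    half = int(math.ceil(cutoff_mm / spacing_mm))
    return [
        math.exp(-math.pi * (k * spacing_mm / (ALPHA * cutoff_mm)) ** 2)
        for k in range(-half, half + 1)
    ]


def split_profile(profile, cutoff_mm=0.025, spacing_mm=0.001):
    """Return (waviness, roughness) for a form-removed profile."""
    kernel = gaussian_kernel(cutoff_mm, spacing_mm)
    half = len(kernel) // 2
    n = len(profile)
    waviness = []
    for i in range(n):
        num = den = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < n:  # renormalize near the ends (assumption)
                num += w * profile[j]
                den += w
        waviness.append(num / den)
    # Roughness is whatever the low-pass (waviness) component leaves behind.
    roughness = [z - w for z, w in zip(profile, waviness)]
    return waviness, roughness


# A sinusoid whose wavelength equals the cutoff (25 samples here) keeps
# about half of its amplitude in the waviness component.
profile = [math.sin(2.0 * math.pi * i / 25.0) for i in range(200)]
waviness, roughness = split_profile(profile)
```

Features much longer than the cutoff pass almost unchanged into the waviness component, which is the component retained for analysis in this study.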
Means laterally down the striation patterns were taken to turn each surface into an
average "profile". Mean profiles were used due to the high redundancy of information found on
the surface, top to bottom. Also, following the literature, it is current “standard practice” to use a
profile (usually the mean profile) as input into the statistical discrimination algorithms instead of
the entire surface [5,11-14].
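A minimal Python sketch of the averaging step follows. It assumes the surface is stored as a list of scan lines, each running across the striations, so that averaging column by column runs along the striae; the function name and the toy heights are hypothetical.

```python
def mean_profile(surface):
    """Average a height map down its rows to get one mean profile.

    surface is a list of rows; each row is one scan line running across
    the striations, so each column follows a single position along the striae.
    """
    n_rows = len(surface)
    n_cols = len(surface[0])
    if any(len(row) != n_cols for row in surface):
        raise ValueError("all scan lines must have the same length")
    return [sum(row[c] for row in surface) / n_rows for c in range(n_cols)]


# Toy surface: three scan lines over four lateral positions (heights in µm).
surface = [
    [0.0, 1.0, 0.5, 2.0],
    [0.2, 1.2, 0.3, 2.2],
    [0.1, 0.8, 0.4, 1.8],
]
profile = mean_profile(surface)
```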
From the profile plots it was clear that almost all of the “line structure” in the striation
pattern, apparent in a comparison microscope (Leica FSM), was contained in the waviness
surfaces across all of our samples. Therefore, only extracted waviness was utilized. The waviness
component of each mean profile was loaded into the R statistical program for further processing
and analysis [18].
2b. Registration and Alignment
Because each profile did not begin and end at the same points, the profiles required
alignment (i.e. registration) in order to be processed as multivariate feature vectors. In order to
register profiles from the same experimental unit (e.g. a Glock or a screwdriver), the cross-
correlation function (CCF) between two profiles from each group was computed to find the shift
that yielded maximum correlation, a linear, univariate measure of similarity [13,17]. Within each
group of profiles, the longest profile was chosen as a reference or “anchor profile”. The
remaining profiles were then maximally aligned with respect to the anchor profile.
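The within-unit registration step can be sketched in Python as a search over integer lags for the shift that maximizes the correlation with the anchor profile. This is an illustration under assumptions, not the R code from the study: the correlation is computed only over the overlapping points at each lag, and the maximum lag is a hypothetical parameter.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    if saa == 0.0 or sbb == 0.0:
        return 0.0
    return sab / math.sqrt(saa * sbb)


def best_shift(anchor, profile, max_lag):
    """Lag s that maximizes the correlation between anchor[i] and profile[i + s]."""
    best_s, best_r = 0, -2.0
    for s in range(-max_lag, max_lag + 1):
        lo = max(0, -s)
        hi = min(len(anchor), len(profile) - s)
        if hi - lo < 2:  # too little overlap to correlate
            continue
        r = pearson(anchor[lo:hi], profile[lo + s:hi + s])
        if r > best_r:
            best_s, best_r = s, r
    return best_s, best_r


# Synthetic check: the same pattern, starting three points later.
rng = random.Random(0)
anchor = [rng.gauss(0.0, 1.0) for _ in range(200)]
delayed = [0.0] * 3 + anchor[:-3]
shift, r = best_shift(anchor, delayed, max_lag=10)
```

In practice the anchor would be the longest profile of the experimental unit, and each remaining profile would be shifted by its own best lag.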
After profiles within each experimental unit were registered, profiles between
experimental units were aligned. This was done by computing a group-mean-profile (GMP) for
each of the within-group aligned profile sets [19]. The GMP for each experimental unit served as
a representation for that unit. The GMPs were then registered with respect to each other within a
user defined “uncertainty window”. The reason an uncertainty window was required for between
group registration was that, in general, there is no ubiquitous landmark available for any given
profile. The GMPs were aligned within a +/-100µm uncertainty window. The shift parameters
produced by the registration of group-means were used to shift all the mean profiles of the
groups in blocks. That is, each group of mean profiles was shifted by the amount required to
register the GMPs. All profiles used in an analysis were rescaled such that the lowest profile
point was designated 0 and the highest 1. This was done in order to minimize discrimination
between experimental units due only to valley depth and peak height variation. Valley depth and
peak height variation can be due to pressure variations in toolmark formation. Generally, this
should not be information that is used in toolmark discrimination. Length differences in the
profiles were padded with zeros. Several padding schemes were tried: zeros, standard Gaussian
random variates, uniform random variates and chopping. Zero padding was found to have the
least effect on decreasing the identification error rates. All programs for preprocessing were
written in the R programming language and are available on the website.
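The rescaling and zero-padding steps can be sketched in Python as follows. This is a simplified illustration rather than the R preprocessing code: the profiles are assumed to be registered already, and padding on the right-hand side is an assumption, since the text does not say where the zeros were placed.

```python
def rescale01(profile):
    """Rescale so the lowest point is 0 and the highest is 1."""
    lo, hi = min(profile), max(profile)
    if hi == lo:  # flat profile: nothing to scale
        return [0.0 for _ in profile]
    return [(z - lo) / (hi - lo) for z in profile]


def zero_pad(profiles):
    """Right-pad every profile with zeros to the length of the longest one."""
    width = max(len(p) for p in profiles)
    return [list(p) + [0.0] * (width - len(p)) for p in profiles]


# Two registered profiles of different length and height range.
registered = [[2.0, 4.0, 3.0], [10.0, 30.0, 20.0, 15.0, 25.0]]
X = zero_pad([rescale01(p) for p in registered])
```

After this step every row has the same length, so the rows can be stacked into a data matrix.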
2c. Toolmark profile simulation
A portion of the database consists of 9-mm cartridges fired for the Hamby-Thorpe study
[20]. For the original Hamby-Thorpe study, two to three cartridges/Glock were available. The
statistical analysis techniques used in this project, however, are numerically more reliable with
five or more “replicates” per experimental unit. In order to exploit the Hamby-Thorpe
benchmark data set, a wavelet decomposition based simulator was written in the R programming
language [18,21]. The waveslim and wmtsa R packages were used for the actual wavelet
decompositions of the toolmark profiles [22,23]. The wavelet expansion was used because it
offers a principled multi-scale description of surface morphology and allows for statistical
analysis to be carried out efficiently [24,25]. Following Fu, the fourth-order (24-parameter)
Coiflet wavelet basis set was used in all decompositions/syntheses [26]. It was decided to
balance the whole data set and simulate enough profiles so that each gun or each screwdriver was
represented by 30 mean profiles. For the Glock set, the real data consisted of 162 collected
profiles taken from a subset of 24 different Glocks in the database. After simulations were
carried out the data set size was 720 profiles (30 total for each Glock). For the screwdriver set,
the real data consisted of 290 collected profiles taken from 58 different screwdriver tip surfaces
(29 screwdrivers). After simulations were carried out the data set size was 1740 profiles (30 total
for each screwdriver tip surface).
The criterion for keeping a simulated profile was a correlation of at least 0.5. This low
bound to “similarity” was chosen to generate a challenging set of profiles to discriminate.
Profiles were simulated in blocks of ten. The growing sets of group profiles (both real and
simulated) were fed back into the simulator as input until the set reached 30 acceptable profiles
(again, the criterion for an acceptable simulated profile was a correlation “similarity score” greater
than or equal to 0.5 with the real profiles) [27]. The augmented data sets were renormalized and
registered between groups of toolmarks. The profiles were stacked together and zero padded as
before, forming a data matrix.
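The acceptance loop described above can be sketched in Python. The wavelet-based simulator itself is not reproduced here: the candidate generator below (a member of the growing set plus Gaussian noise) is only a placeholder. Requiring a correlation of at least 0.5 with every real profile of the group is one reading of the criterion, and all function and parameter names are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb) if saa > 0.0 and sbb > 0.0 else 0.0


def accept(candidate, real_profiles, threshold=0.5):
    """Keep a candidate only if it correlates with every real profile (assumption)."""
    return all(pearson(candidate, r) >= threshold for r in real_profiles)


def grow_group(real_profiles, target=30, block=10, threshold=0.5,
               noise_sd=0.2, seed=0, max_rounds=100):
    """Grow one group to `target` profiles, simulating candidates in blocks."""
    rng = random.Random(seed)
    group = [list(p) for p in real_profiles]
    for _ in range(max_rounds):  # guard against a generator that never succeeds
        if len(group) >= target:
            break
        candidates = []
        for _ in range(block):
            parent = rng.choice(group)  # real and simulated profiles are fed back in
            # Placeholder generator, NOT the wavelet simulator of the study.
            candidates.append([z + rng.gauss(0.0, noise_sd) for z in parent])
        kept = [c for c in candidates if accept(c, real_profiles, threshold)]
        group.extend(kept[: target - len(group)])
    return group


# Five synthetic "real" profiles sharing one underlying pattern.
base = [math.sin(i / 5.0) + 0.5 * math.sin(i / 2.0) for i in range(200)]
rng = random.Random(1)
real = [[z + rng.gauss(0.0, 0.05) for z in base] for _ in range(5)]
group = grow_group(real)
```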
3. Details of Statistical Methodology
3a. The Data Matrix and Principal Component Analysis
Profiles for a given study were arranged into an n×p data matrix (X) where n is the
number of profiles and p is the number of points in each profile. Each value in the data matrix
represents a scaled z-height in a striation pattern profile. At this point in the analysis,
neighboring points in the profiles contain a great deal of redundant information. That is,
proximal points in a profile are correlated. An effective way to capture much of the essential
information within profiles while representing them with a smaller number of points is through
principal component analysis (PCA) [28]. The number of points, or PCs, used to represent the
profiles was determined by hold-one-out cross-validation (HOO-CV).
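A dependency-free Python sketch of the projection step is given below. It is not the routine used in the study: it obtains PC scores by power iteration with deflation on the n×n Gram matrix of the column-centered data, a choice made here only because n (profiles) is typically much smaller than p (points per profile). The selection of the number of PCs by HOO-CV is not shown, and the iteration count and seed are arbitrary.

```python
import math
import random


def pca_scores(X, k, iters=500, seed=0):
    """Project the rows of X onto the first k principal components."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    # Gram matrix of the centered rows; its eigenvectors give the PC scores.
    G = [[sum(a * b for a, b in zip(Xc[i], Xc[j])) for j in range(n)]
         for i in range(n)]

    rng = random.Random(seed)
    scores = [[0.0] * k for _ in range(n)]
    for c in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        lam = 0.0
        for _ in range(iters):
            w = [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left to explain
                lam = 0.0
                break
            v = [x / norm for x in w]
            lam = norm  # converges to the leading eigenvalue of G
        for i in range(n):
            scores[i][c] = v[i] * math.sqrt(lam)
        # Deflate so the next pass finds the next component.
        for i in range(n):
            for j in range(n):
                G[i][j] -= lam * v[i] * v[j]
    return scores


# Four points on a line: a single PC captures all of the variation.
X = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
scores = pca_scores(X, k=1)
```

The sign of each score column is arbitrary, as is usual for PCA; only the relative positions of the profiles in PC space matter for classification.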
3b. Support Vector Machines
PCA itself does not identify which tool made a particular toolmark. In order to do that
PCA must be combined with a method to “learn” classification rules. Statistical learning theory
and its practical application, the support vector machine (SVM), is just such a method and was
developed in response to the need for reliable statistical discriminations within small to medium
sample size studies [29]. SVMs seek to determine efficient classification rules for objects
assuming nothing about the form of the underlying probability distribution generating the data.
This is a great advantage for application in forensic science. The fewer the decision algorithm’s
underlying assumptions, the less vulnerable its conclusions are to attack in court.
The one-vs.-one multi-category approach to SVM classification in the e1071 R package was
used in this study [30].
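The one-vs.-one scheme can be illustrated independently of the SVM itself: one binary rule is trained for every pair of tools, and an unknown pattern is assigned to the tool that wins the most pairwise votes. In the Python sketch below the binary rule is a nearest-centroid stand-in, not an SVM, and the tool labels and feature vectors are hypothetical.

```python
from collections import Counter
from itertools import combinations


def centroid(points):
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def train_pair(points_a, points_b):
    """Stand-in binary rule: assign to the nearer class centroid (not an SVM)."""
    ca, cb = centroid(points_a), centroid(points_b)

    def decide(x):
        da = sum((xi - ci) ** 2 for xi, ci in zip(x, ca))
        db = sum((xi - ci) ** 2 for xi, ci in zip(x, cb))
        return "a" if da <= db else "b"

    return decide


def train_one_vs_one(data):
    """data maps a tool label to its list of feature vectors."""
    rules = {}
    for la, lb in combinations(sorted(data), 2):
        rules[(la, lb)] = train_pair(data[la], data[lb])
    return rules


def predict(rules, x):
    """Each pairwise rule casts one vote; the tool with the most votes wins."""
    votes = Counter()
    for (la, lb), decide in rules.items():
        votes[la if decide(x) == "a" else lb] += 1
    return votes.most_common(1)[0][0]


data = {
    "tool_1": [[0.0, 0.1], [0.2, 0.0]],
    "tool_2": [[5.0, 5.1], [5.2, 4.9]],
    "tool_3": [[0.1, 5.0], [0.0, 5.2]],
}
rules = train_one_vs_one(data)
label = predict(rules, [4.8, 5.0])
```

With K tools the scheme trains K(K-1)/2 pairwise rules, for example 276 rules for 24 tools.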
3c. Toolmark Identification Error Rate Estimation
An error is defined as a misclassification of a toolmark by the comparison algorithm.
This occurs when the algorithm does not identify the unknown toolmark as having been made by
the suspect tool when it indeed had, or the algorithm identifies the unknown toolmark as having
been made by the suspect tool when indeed it had not. The error rate estimate that was used in
this study was based on the bootstrap [31]. First, a set of B bootstrap data sets is generated by
randomly selecting (with replacement) n toolmark pattern feature vectors from the original data
set X. Note that each bootstrap data set contains the same number of elements (toolmark pattern
feature vectors) as the original data set, thus some patterns may be repeated. The decision rules
are recomputed for each bootstrap sample. For each bootstrap sample, the error rate of its decision
rules is found on the original data as well as on the bootstrap sample itself. The difference between these two
error rates is called the bootstrap estimated optimism. Averaging these optimisms over the B samples gives
the expected bootstrap estimated optimism. The averaged optimism is then added to the observed
error rate on the original training data. The sum gives what is called the refined bootstrap
estimate of identification error rate [31]. Custom bootstrapping routines written in the R
programming language are available on the website.
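The refined bootstrap procedure can be sketched in Python as follows. This is not the custom R routine from the study: a nearest-centroid rule stands in for the PCA-SVM decision rules, and the synthetic one-dimensional data, the value of B, and the seeds are hypothetical.

```python
import random


def train(X, y):
    """Stand-in decision rule (nearest class centroid); the study used PCA-SVM."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
        counts[label] = counts.get(label, 0) + 1
    cents = {lab: [v / counts[lab] for v in s] for lab, s in sums.items()}

    def rule(x):
        return min(cents, key=lambda lab: sum((a - b) ** 2
                                              for a, b in zip(x, cents[lab])))

    return rule


def error_rate(rule, X, y):
    return sum(rule(x) != label for x, label in zip(X, y)) / len(y)


def refined_bootstrap(X, y, B=200, seed=0):
    """Apparent error rate plus the average bootstrap optimism."""
    rng = random.Random(seed)
    n = len(y)
    apparent = error_rate(train(X, y), X, y)
    optimism = 0.0
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        rule_b = train(Xb, yb)  # decision rules recomputed on the bootstrap sample
        optimism += error_rate(rule_b, X, y) - error_rate(rule_b, Xb, yb)
    return apparent + optimism / B


# Two well-separated synthetic "tools" in a one-dimensional feature space.
rng = random.Random(1)
X = [[rng.gauss(0.0, 1.0)] for _ in range(20)] + \
    [[rng.gauss(10.0, 1.0)] for _ in range(20)]
y = ["tool_A"] * 20 + ["tool_B"] * 20
estimate = refined_bootstrap(X, y)
```

Because a rule usually performs better on the sample it was trained on than on the original data, the optimism term tends to be positive and corrects the overly favorable apparent error rate upward.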
Results
Cartridge case primer shears
One-vs.-one multiclass support vector machines (SVMs) (linear kernel, penalty
parameter C = 1) were applied to the 162 real striation pattern profiles generated by the 24 9-mm
Glocks used in this study. Hold-one-out cross-validation indicated that 22 PCs were needed to
represent these toolmark profiles in order to obtain reasonably low identification error rates with
SVM. Using 2,000 bootstrap resampling iterations, 22D PCA-SVM produced a refined bootstrap
error rate estimate of 2.5%. The approximate 95% confidence interval around the error rate
estimate was [1.3%,3.2%].
Visual examination showed that the mean profiles incorrectly identified in the initial HOO-CV
process were relatively straightforward to associate with the Glocks that created them. We have
observed this behavior in past machine learning projects when too few replicate toolmarks per
tool are used in training the identification algorithm. We thus decided to examine if the error
rates would decrease and the confidence interval narrow if more replicate profiles per Glock
were used in the training process.
A simulation run was performed on the real primer shear profiles (162) using the same
operating parameters as above. A total of thirty real and simulated profiles were used in the
training/testing process for each Glock bringing the data set size up to 720 patterns.
Unfortunately, performing an HOO-CV computation on the entire 720 profile data set in order to
estimate an optimal number of PCs to use in the bootstrap error rate estimation proved to be too
computationally intense. Thus again, 22 PCs were used to represent the profiles as in the
previous calculation.
The refined bootstrap error rate estimate for the 22D PCA-SVM discrimination model was
0.03%. The corresponding approximate 95% confidence interval around the error rate estimate,
[0.0%,0.1%], was indeed found to be narrower using the augmented data set with thirty
replications per Glock. This tells us something that is already well known in the artificial
intelligence community. Computers are good at identifying patterns, but it takes a lot of data to
do this.
Screwdrivers
For the PCA-SVM computations, HOO-CV was again used to find a lower dimensional
representation of the 290 real screwdriver striation pattern profiles that would still be adequate
for analysis. Using the data set projected into 26D PCA space, the refined bootstrapped error rate
estimate was found to be 6.5% with 2,000 resampling iterations. The 95% confidence interval for
the error rate, as determined by the bootstrap optimism histogram, was [3.5%,10%].
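The text does not spell out how the interval was read off the bootstrap optimism histogram. One plausible reading, sketched below in Python purely as an assumption, is a percentile interval: the apparent error rate is added to each bootstrap optimism value and the 2.5th and 97.5th percentiles of the result are taken. The optimism values in the example are hypothetical and are not data from the study.

```python
def percentile(sorted_vals, q):
    """Linear-interpolation percentile, q in [0, 1], of a sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def percentile_interval(apparent_error, optimisms, level=0.95):
    """Interval from the spread of apparent_error + optimism over bootstrap samples."""
    vals = sorted(apparent_error + w for w in optimisms)
    tail = (1.0 - level) / 2.0
    return percentile(vals, tail), percentile(vals, 1.0 - tail)


# Hypothetical optimism values, for illustration only (not data from the study).
optimisms = [i / 1000.0 for i in range(101)]  # 0.000, 0.001, ..., 0.100
low, high = percentile_interval(0.02, optimisms)
```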
An error rate below 10% is generally considered good in the computational pattern
classification industry [32]. We however received a good deal of feedback from the practitioner
community that it is not considered high performance for forensic applications. Also this 95%
confidence interval around the error rate for the real data set was wide and indicative that a larger
training set is needed to narrow uncertainty. Again, as was the case for cartridge cases, visual
examination of a mean profile that was incorrectly identified in HOO-CV computations was
relatively straightforward to pair with the screwdriver that created it. A simulation run was
performed on the real screwdriver profiles (290) using the same operating parameters as were
used for the cartridge case primer shear profiles. Twenty-five profiles were simulated for each
screwdriver tip surface, bringing the data set size up to 1740 patterns (30 profiles per tip surface). The
refined bootstrap error rate estimate for the 26D PCA-SVM discrimination model was 0.01%. The
refined bootstrap 95% confidence interval around the error rate estimate, [0.0%,0.06%], was
indeed narrower using the augmented data set with thirty replications per screwdriver tip surface.
Conclusions
Impression evidence left at crime scenes is indispensable and cannot be allowed to
become inadmissible in court. Computational pattern recognition is already widely used in
industry, including chemical engineering, audio/visual engineering, mail and product sorting,
computer security, marketing, etc. It is absolutely critical that the forensic toolmark examination
community take advantage of the enormous potential of pattern recognition and the computing
power available today. Adopting these statistical techniques for impression pattern comparison
will yield standardized and efficient protocols as well as reproducible, independently verifiable,
fair and accurate conclusions.
In this paper we have shown that mean profiles, derived from striation patterns, can serve
as multivariate feature vectors. Information within such representations of toolmarks can be
suitably condensed with PCA and effectively discriminated with the "industrial-strength"
computational pattern recognition method of SVM. Toolmark identification error rates were low.
This is commensurate with the experience of practitioners. Even so, the computational
algorithm made identification errors (on smaller data sets) at a rate that we felt was too high for a
production level system in the forensic sciences. Simply looking at the misidentified patterns,
however, quickly led us to the conclusion that more training data was needed so that the routines
could account for the wider range of variation that can occur within a set of toolmarks made by the
same tool. For this reason the profile simulator was developed. When the SVM algorithm was
presented with a much larger data set consisting of both real and simulated profiles,
identification error rates dropped to trivial levels. This dependence on large data sets is also a
drawback of the method. Computers are "dumb" and need a lot of data to achieve exceptional
performance in pattern recognition tasks. Thus future directions of this research are to search for
more efficient sets of features that can be extracted from toolmark profiles such that exceptional identification
performance can be obtained from much smaller data sets. In this regard, and to open up the
problem to a wider audience, all of the data and programs developed in the course of this study
are available at http://toolmarkstatistics.jjay.cuny.edu/.
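For readers who want to experiment with the general approach summarized above, the following is a minimal sketch of a PCA-SVM pipeline evaluated by hold-one-out cross-validation. It is written in Python with scikit-learn rather than the R packages used in this study, and it runs on synthetic stand-in profiles, not on the database. The number of tools, the noise level, the number of principal components, and the linear kernel are all illustrative assumptions. Refitting the PCA inside each cross-validation fold is also a choice made here, not a confirmed detail of the study's procedure.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the study's data): each hypothetical tool has
# its own base profile, and each replicate adds small measurement noise.
n_tools, n_reps, n_points = 6, 5, 200
base = rng.normal(size=(n_tools, n_points))
noise = 0.1 * rng.normal(size=(n_tools * n_reps, n_points))
X = np.repeat(base, n_reps, axis=0) + noise   # one mean profile per row
y = np.repeat(np.arange(n_tools), n_reps)     # tool label for each row

# PCA condenses each profile to a few scores; a linear-kernel SVM then
# assigns the scores to a tool. Both settings are assumptions of this sketch.
model = make_pipeline(PCA(n_components=5), SVC(kernel="linear"))

# Hold-one-out cross-validation: each profile is classified by a model
# trained on all of the other profiles.
scores = cross_val_score(model, X, y, cv=LeaveOneOut())
hoo_error_rate = 1.0 - scores.mean()
print(f"HOO-CV error rate: {hoo_error_rate:.2%}")
```

Because the synthetic tools are well separated, this toy example should be easy for the classifier; real striation profiles vary much more within a tool, which is why the amount of training data matters so much.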
Acknowledgements
The authors would like to thank Lauren Claytor and Chris Luckie of the Commonwealth
of Virginia, Department of Forensic Sciences for providing us with cartridge case samples and
advice on improving the performance of our system. We thank Roger Xu of Intelligent
Automation Inc. for providing valuable advice on improving the performance of our profile
simulator. Finally we thank Pauline Leary at John Jay for kindly reading and commenting on the
content of our manuscript.
FIGURE 1. Screen shot of the database homepage.
FIGURE 2. ImageJ toolbar, 2D and 3D surface topography of a screwdriver striation pattern. ImageJ functionality makes measurement and manipulation of calibrated toolmark images from the database simple and flexible.
FIGURE 3. 2D topographies of three screwdriver striation patterns (two screwdrivers), shown in grey levels. Shown are two known matches and one known non-match.
FIGURE 4. Interactive 3D ImageJ images of screwdriver striation patterns. The exemplars are the same as those shown in Figure 3.
FIGURE 5. Screwdriver holding jig for generating striation patterns on any media.
References
1. National Academy of Sciences, Strengthening forensic science in the United States: A path
forward, The National Academies Press, Washington, D.C., 2009.
2. Moran, B., "A Report on the AFTE Theory of Identification and Range of Conclusions for
Tool Mark Identification and Resulting Approaches To Casework," AFTE Journal, Vol. 34, No.
2, 2002, pp. 227-235.
3. Geradts, Z., Keijer, J., and Keereweer, I., "A new approach to automatic comparison of
striation marks," Journal of Forensic Sciences, Vol. 39, No. 4, 1994, pp. 974-980.
4. De Kinder, J., and Bonfanti, M., "Automated comparisons of bullet striations based on 3D
topography," Forensic Science International, Vol. 101, No. 2, 1999, pp. 85-93.
5. Bachrach, B., "Development of a 3D-based automated firearms evidence comparison system,"
Journal of Forensic Sciences, Vol. 47, No. 6, 2002, pp. 1-12.
6. Banno, A., Masuda, T., and Ikeuchi, K., "Three dimensional visualization and comparison of
impressions on fired bullets," Forensic Science International, Vol. 140, No. 3, 2004, pp. 233-240.
7. Senin, N., Groppetti, R., Garofano, L., Fratini, P., and Pierni, M., "Three-dimensional surface
topography acquisition and analysis for firearm identification," Journal of Forensic Sciences,
Vol. 51, No. 2, 2006, pp. 282-295.
8. Roberge, D., and Beauchamp, A., "The use of BulletTrax-3D in a study of
consecutively manufactured barrels," AFTE Journal, Vol. 38, No. 2, 2006, pp. 166-172.
9. Tontarski, R.E., and Thompson, R.M., "Automated firearms evidence comparison: A
forensic tool for firearms identification–An update," Journal of Forensic Sciences, Vol. 43, No.
3, 1998, pp. 641-647.
10. Brinck, T.B., "Comparing the performance of IBIS and BulletTRAX-3D technology using
bullets fired through 10 consecutively rifled barrels," Journal of Forensic Sciences, Vol. 53, No.
3, 2008, pp. 677-682.
11. Faden, D., Kidd, J., Craft, J., Chumbley, L. S., Morris, M., Genalo, L., Kreiser, J., and Davis,
S., "Statistical confirmation of empirical observations concerning toolmark striae," AFTE
Journal, Vol. 39, No. 3, 2007, pp. 205-214.
12. Chumbley, L. S., Morris, M. D., Kreiser, M. J., Fisher, C., Craft, J., Genalo, L. J., Davis, S.,
Faden, D., and Kidd, J., "Validation of tool mark comparisons obtained using a quantitative,
comparative, statistical algorithm," Journal of Forensic Sciences, Vol. 55, No. 4, 2010, pp. 953-961.
13. Chu, W., Song, J., Vorburger, T., Yen, J., Ballou, S., and Bachrach, B., "Pilot study of
automated bullet signature identification based on topography measurements and correlations,"
Journal of Forensic Sciences, Vol. 55, No. 2, 2010, pp. 341-347.
14. Bachrach, B., Jain, A., Jung, S., and Koons, R.D., "A statistical validation of the individuality
and repeatability of striated toolmarks: Screwdrivers and tongue and groove pliers," Journal of
Forensic Sciences, Vol. 55, No. 2, 2010, pp. 348-357.
15. Artigas, R., "Imaging Confocal Microscopy," in: Optical Measurements of Surface
Topography, Ed: Leach, R., Springer, New York, 2011.
16. Zeiss Axio CSM 700 Confocal Microscope Software Manual.
17. Muralikrishnan, B., and Raja, J., Computational Surface and Roundness Metrology, Springer, New
York, 2009.
18. R Development Core Team, R: A language and environment for statistical computing
[computer program], Version 2.9.1, R Foundation for Statistical Computing, Vienna, Austria, 2009.
19. Gambino, C., McLaughlin, P., Kuo, L., Kammerman, F., Shenkin, P., Diaczuk, P., Petraco,
N., Hamby, J., and Petraco, N.D.K., "Forensic Surface Metrology: Toolmark Evidence,"
Scanning, Vol. 33, 2011, pp. 1-7.
20. Hamby, J., and Thorpe, J., "The Examination, Evaluation and Identification of 9mm Cartridge
Cases Fired from 617 Different GLOCK Model 17 & 19 Semiautomatic Pistols," AFTE Journal,
Vol. 41, No. 4, 2009, pp. 310-324.
21. Percival, D.B., and Walden, A.T., Wavelet methods for time series analysis, Cambridge
University Press, New York, 2006.
22. Whitcher, B., waveslim: Basic wavelet routines for one-, two- and three-dimensional
signal processing [computer program], R package version 1.7.1, 2012.
23. Constantine, W., and Percival, D., wmtsa: Wavelet Methods for Time Series Analysis
[computer program], R package version 1.1-1, 2012.
24. Maksumov, A., Vidu, R., Palazoglu, A., and Stroeve, P., "Enhanced Feature Analysis Using
Wavelets for Scanning Probe Microscopy Images of Surfaces," Journal of Colloid and Interfacial
Science, Vol. 272, 2004, pp. 365-377.
25. Reizer, R., "Simulation of 3D Gaussian surface topography," Wear, Vol. 271, 2011, pp. 539-543.
26. Fu, S., Muralikrishnan, B., and Raja, J., "Engineering Surface analysis with different wavelet
bases," Journal of Manufacturing Science and Engineering, Vol. 125, No. 4, 2003, pp. 844-852.
27. Xu, R., Personal Communication, August 9, 2012.
28. Jolliffe, I.T., Principal component analysis, 2nd ed., Springer, New York, 2004.
29. Vapnik, V.N., Statistical learning theory, Wiley, New York, 1998.
30. e1071: Misc Functions of the Department of Statistics [computer program], R package
version 1.6-1, Technische Universität Wien, Austria, 2012.
31. Efron, B., and Tibshirani, R.J., An introduction to the bootstrap, Chapman & Hall, London,
1993.
32. Koren, Y., "The BellKor solution to the Netflix Grand Prize,"
http://www.netflixprize.com/assets/GrandPrize2009_BPC_BelKor.pdf, 2009.