Estimates of Striation Pattern Identification Error Rates by Algorithmic Methods

In: AFTE J, 45(3):235-244, 2013

Nicholas D. K. Petraco1, Loretta Kuo1, Helen Chan1, Elizabeth Phelps2, Carol Gambino3, Patrick McLaughlin1,4, Frani Kammerman1, Peter Diaczuk1, Peter Shenkin1, Nicholas Petraco1,5 and James Hamby6

1John Jay College of Criminal Justice, City University of New York, 524 West 59th Street, New York, NY, 10019

2Boston Police Department, Crime Laboratory, One Schroeder Plaza, Boston, MA 02120

3Borough of Manhattan Community College, City University of NY, 199 Chambers Street, New York, NY, 10007

4New York City Police Department, Evidence Collection Unit.

1,5Petraco Forensic Consulting, 240 Abbey Street, Massapequa Park, NY 11762.

6International Forensic Science Laboratory & Training Centre, 2265 Executive Drive, Indianapolis, IN, 46241.

Abstract: This study presents a computationally based methodology to estimate identification

error rates of striation patterns in as modern and objective a way as possible. A database was

assembled consisting of 3D striation patterns generated by standard tip screwdrivers and 9-mm

Glock firing pin apertures. These toolmark surfaces were digitally recorded by white light

confocal microscopy commonly used for surface metrology applications. Multivariate

algorithmic methods were used which encompass few assumptions and have a long and

successful application history in many scientific fields. Specifically, principal component

analysis and support vector machine methodology were exploited to objectively associate

striation patterns with the tools that created them. Estimated toolmark identification error rates

were far less than 1% so long as enough toolmark data is used to train the algorithm. Realizing

that our approach to this problem is not the only one possible and to stimulate interest in

constructing an open reference database of toolmarks and computer programs, all of the data and

software generated for this study are available at http://toolmarkstatistics.jjay.cuny.edu/ to

registered users for free.

Key words: forensic science, toolmarks, cartridge cases, screwdrivers, striation pattern,

database, confocal microscopy, surface metrology, multivariate statistics, machine

learning, error rates

Introduction

Forensic science has come under increased scrutiny in recent years. In February 2009, the

National Academy of Sciences (NAS) released its report, “Strengthening Forensic Science in

the United States: A Path Forward” [1]. The NAS report (2009) states that “much forensic

evidence— including, for example, bite marks and firearm and toolmark identifications—is

introduced in criminal trials without any meaningful scientific validation, determination of error

rates, or reliability testing to explain the limits of the discipline” (p. 3-18). It suggests

“additional studies should be performed to make the process of individualization more precise

and repeatable” (p. 5-21). This study outlines one such set of objective and testable methods to

associate toolmark impression evidence with the tools and firearms that generated them.

The basic elements of toolmark examination and comparison include the production of an

exemplar toolmark made from a questioned tool, and the comparison of the impressions of the

exemplar with those of the toolmark found at the crime scene. We can quantify this method by

representing impressions made by tools and firearms as mathematical patterns composed of

features. A particular approach for recognizing variations in patterns is with multivariate

statistical/algorithmic methods. In a computational pattern recognition context, these methods

are often referred to as machine learning. The mathematical details of machine learning can give

what Moran calls “…the quantitative difference between an identification and non-

identification” [2]. In our study, we use a surface metrological-algorithmic scheme to

statistically estimate the identification error rate parameter for striation pattern comparisons. In

the interests of transparency and reproducibility we necessarily focus on the details of the

approach.

Literature Review

Applications of some form of statistical/probabilistic method to toolmark analysis are

extensive in the literature. In this review we necessarily limit ourselves to discussing those

methods that relied explicitly on 3D imaging and on the use of computers. Geradts, Keijer, and

Keereweer created a database for toolmarks (TRAX) with video-images and data about

toolmarks (width of toolmark, type of tool, microscope magnification, etc.) [3]. A video camera

on a comparison microscope is connected to a computer, which is used to scan the striation

patterns and digitize the image. They developed an algorithm for the automatic comparison of

digitized striation patterns. A comparison screen in TRAX makes it possible to compare images

of toolmarks. The system was tested with ten screwdrivers of the same brand and all striation

marks were identified with the correct screwdriver.

De Kinder and Bonfanti developed a system capable of performing automated comparisons

between striation marks on bullets, using laser profilometry, a non-contact laser scanning

technique that records the topography of a bullet [4]. The system was able to obtain a one-

dimensional array of characteristics out of the recorded data (a feature vector) and compare it to

similar quantities from other bullets using a correlation technique.

Bachrach discussed the development of SciClops, an automated microscope comparison

system using a 3D characterization of a bullet’s surface [5]. Preliminary tests were conducted to

evaluate the ability of the system to identify and distinguish bullets. It was determined that it was

possible to acquire reliable characterizations of a bullet’s surface, to accurately identify

similarities between bullets fired by the same gun, and to accurately discriminate between bullets

fired by different guns.

Banno, Masuda, and Ikeuchi presented an algorithm for shape

comparison of impressions on bullets using 3D shape data [6]. A confocal microscope was used

to obtain 3D data of striated surfaces and to visualize virtual impressions. Then they aligned the

3D data to compare the shapes of the striations by computing a distance between two surfaces for

alignment.

Senin et al. introduced a 3D virtual comparison microscope to compare two specimens

through their virtual 3D reconstructions [7]. The authors determined that systems based on 3D

surface topography can aid in the visual comparison process, as well as in making quantitative

measurement over shape data. Furthermore, algorithms were also used to generate artificially

enhanced images. They concluded that visual enhancement tools and quantitative measurement

of shape properties could help a firearm examiner in comparing toolmarks.

A system known as BulletTRAX-3D™ aids forensic firearms examiners in the comparison

process. This system uses three-dimensional sensory technology, allowing operators to capture

2D digital images and to create 3D topographic models of the bullet's surface area. Roberge and

Beauchamp decided to apply the Tontarski and Thompson test to BulletTRAX-3D and determine

if the system was able to correctly match each numbered pair to a unique lettered pair [8,9]. The

test involves the comparison of twenty-one pairs of 9mm Luger Hi-Point bullets fired from ten

consecutively manufactured Hi-Point barrels. In the Roberge and Beauchamp paper, all pairs of

bullets in the test were imaged with BulletTRAX-3D, which computed a score that quantifies the

similarity of standard and test bullets. BulletTRAX-3D was able to accurately match each of the

numbered and lettered pairs, showing that the system could reproduce what firearms examiners

would do manually [8].

Brinck attempted to determine whether newer 3D imaging technology was better than 2D

technology by evaluating the abilities of IBIS and BulletTRAX-3D. In his experiment, bullets

from ten consecutively manufactured barrels were fired into a water recovery tank [10]. One pair

of copper-jacketed bullets and one pair of lead bullets were selected from those generated and

uploaded into IBIS and BulletTRAX-3D by the same operator. Brinck concluded that, although

IBIS is an effective tool for the identification of copper-jacketed bullets, BulletTRAX-3D was

better at identifying all bullet types tested (copper-jacketed, lead, and inter-composition bullets)

[10].

Faden et al. developed a computer program to compare toolmarks made from forty-four

consecutively manufactured screwdrivers on soft lead plates [11]. A surface profilometer was

used to make height, depth, and width measurements as a function of location on the sample

surfaces. Four marks were produced using both sides of each tool at three different angles (30°,

60°, and 85°). Pearson correlation was used to compare toolmarks involving true matches, true

non-matches, and marks made from different sides of the same tool. All produced high

correlation values, suggesting that the Pearson correlation alone is not effective at determining

when there is an actual match. There was, however, a significant separation in correlation values

between true match and true non-match toolmarks produced at the same angle, as well as

toolmarks made from different sides of the same screwdriver tip, supporting the hypothesis that

different sides of a screwdriver act as different tools when producing toolmarks.

Chumbley et al. extended the Faden et al. study by comparing the effectiveness of an

algorithm to human examiners [11,12]. Their algorithm first performs an optimization step, in

which it identifies the region of best agreement between the toolmark datasets being compared.

Next, it performs a validation step, in which corresponding areas in the region

of best fit (on both toolmarks) are compared and a correlation value is calculated. If a match

exists at one point along the scan length (Optimization), there should be large correlations

between corresponding areas along their entire length (Validation). The authors then conducted a

double-blind study in which fifty experienced toolmark examiners gave their opinions on the

sample set. In the end, the authors determined that examiner performance was much better than

the algorithm, but the deficiencies could now be addressed and improved upon [12].

Chu et al. estimated the width of lands for 48 bullets using confocal microscopy. In their

study, each barrel had six lands; as a result, 288 land engraved area (LEA) widths were

calculated from the topography images [13]. The 48 bullets were classified into different groups

based on the width class characteristic for each LEA. Once the average profile was determined for

each LEA image, cross-correlation values were computed between the LEAs of two bullets and a

list of the best candidates was generated. For all 48 lists, the average number of correct matching

bullets was about 9.3% higher than that obtained using current optical reflection systems.

Furthermore, the error rate was about 24% smaller with confocal microscopy [13].

Bachrach, Jain, Jung, and Koons compared striated toolmarks from screwdrivers and

tongue and groove pliers using confocal microscopy [14]. They considered the effect of changing

the substrate onto which the toolmarks were created, as well as the angle of incidence for

creating the toolmark. Bachrach et al. sought to validate the basic premise of toolmark

examination, namely that toolmarks exhibit a high degree of individuality [14]. Algorithms were

developed to generate toolmark signatures, while metrics were used to assess the degree of

similarity between known matching and non-matching pairs. From these similarity values, the

authors determined that it was possible to evaluate “the degree to which toolmarks created by the

same tool are repeatable and distinguishable from toolmarks created by other tools” [14]. They

concluded that: (1) the striated toolmarks produced on the same medium and under the same

conditions were both repeatable and specific enough to allow for reliable identification of the

producing tool; (2) striated toolmarks created on different media but under the same conditions

could still be identified with high reliability; (3) screwdriver striated marks depend more on the

angle at which the toolmark is created than the media; (4) the probability of a pair of different

tools having similar features is extremely low; and (5) the probability of error from a faulty

image, not because of the tool itself, would not create repeatable and individual toolmarks [14].

Database and web interface

The database of all 3D toolmarks recorded for this study is available to the

firearms/toolmark research community and the forensic firearms/toolmark practitioner

community at http://toolmarkstatistics.jjay.cuny.edu/. Users can sign up to request an account,

and, after approval, will have full access to the database. Several pieces of software and

statistical analysis scripts were generated in the process of carrying out this project and are also

available on the above website. Note that the database is meant to explore what 3D microscopy

and computational pattern recognition are capable of: research, algorithmic development/testing,

and the generation of 3D toolmark images for case/court presentation purposes. It is not (yet)

meant specifically for casework.

A web interface was developed so that the data collected, as well as the

statistical/visualization software, can be searched or downloaded by interested users. Figure 1

shows a screenshot of the homepage for the database. Queries are returned as text descriptors

that can be clicked on to download data files to the user’s computer. The 3D surface data is

stored using the Mountains® metrology software system format. The Mountains® binary data

format was chosen because it is generally well known in the scientific community (specifically

metrology/mechanical engineering) and is published. Users will not need to have the Mountains®

software to open the files downloaded from the database. A Java language “plug-in” has been

written for the open-source digital imaging/analysis software suite ImageJ

(http://rsbweb.nih.gov/ij/ developed at the NIH). The plug-in, which is available on the website,

allows ImageJ functionality to be used to perform measurement tasks as well as interactive 3D

viewing of the toolmark surfaces. Figures 2, 3 and 4 show several screenshots of possible uses.

Users of the database can also develop their own analysis software to operate on our data files.

Surface file read-in routines written in the programming languages Java, C++, Python,

MATLAB and R are available on the site in order to facilitate this task.

Materials and Methods

1. Toolmark data acquisition

A Zeiss Axio CSM-700 confocal microscope was used to analyze the toolmarks produced

for this study. Confocal microscopy is an imaging technique that allows quantitative observation

of surface microstructure details, and the reconstruction of three-dimensional surface

topographies. See the excellent review article by Artigas [15].

For the "Glock" data set, 162 9-mm cartridge cases fired from twenty-four Glock pistols

were collected. The primer shear marks were scanned using 50x magnification (0.95 NA).

Because the shear marks were not always normal to the breech face surface, the cases were

mounted on a goniometer during the scanning process to reduce tilt, keeping the scanned volume

(required confocal stack) to a minimum. A quick pre-scan with the 10x objective allowed

evaluation and accommodation for this natural tilt. Resulting scanned tiles were stitched together

and noisy end portions were cropped out. Zeiss "Z-interpolation" was used to threshold and

smooth dropouts and outliers [16].

Fifteen Craftsman® brand screwdrivers, 10 Iron Bridge® brand screwdrivers and 4

Workforce® brand screwdrivers were used to construct the "screwdriver" database (29 exemplar

screwdrivers total, all new and unused). Striation patterns from both sides of the 29 screwdrivers

were recorded, with five replicates each, creating a total of 290 striation patterns. Lead was used

as the recording medium. It is soft enough that it will not damage the tool’s working surface, and

it lacks the pitted texture that we observed on wax under the high magnification needed for the

confocal microscopy. The pitting on the wax surface significantly added to dropouts observed on

the digitally recorded surface. The screwdriver striation pattern exemplars were made using a jig

constructed to give the examiner good control over the tool’s lateral and rotational angles with

respect to the impression medium (cf. Figure 5). The jig was set to a consistent angle of 15˚ for

comparison purposes, and in each case, the screwdriver was pulled toward the operator. Note

that the same angle of attack was used in the screwdriver study of Bachrach et al. [14]. The

exemplars were scanned using the 50x-long working distance objective (0.6NA) due to the high

ridges encountered at the edges of the striation patterns. During scanning, the left edge of each

toolmark was denoted to provide a point of reference for the section of striations. Once the left

edge was marked, it was moved 1,000 µm to the right. The purpose of this was to decrease scan

time. The confocal microscope collects slices of information in the z-direction. Because the left

edge of the striation patterns is generally so much higher than the rest of the mark, scanning

from the left edge would have increased the scanning time dramatically, with relatively little gain

of information. From this point (1,000 µm from the left edge), seven sections were selected so

that there was some overlap for the confocal microscope software to stitch together. Zeiss "Z-

interpolation" was used to threshold and smooth dropouts and outliers [16].

2. Toolmark surface preprocessing routine

2a. Form removal and Filtering

Due to long range surface warping during the toolmark formation process, form was

removed for all recorded striation patterns. Third order polynomial surface fits were used for all

recorded striated surfaces. This degree polynomial was chosen because it was observed to have a

minimal set of degrees of freedom to remove a majority of gross surface warp. The resulting

form removed striation patterns were filtered into roughness and waviness components using the

Gaussian filter and λxc = λyc = 0.025 mm cutoff values [13,17]. Note that the definitions of form,

waviness and roughness are not unique. Our definitions correspond to the parameters stated

above.
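
As a rough illustration of the waviness/roughness split described above, the following Python sketch applies a one-dimensional Gaussian low-pass filter to a single profile. It is a simplification and not the authors' code: the paper filters the full surface with equal cutoffs in x and y, whereas this sketch filters one profile. The function names, the sample spacing argument, the truncation of the kernel at one cutoff length, and the renormalization of the weights near the ends of the profile are all assumptions made here for illustration. The weighting function is the standard Gaussian profile filter with the constant sqrt(ln 2 / pi).

```python
import math

# Constant of the standard Gaussian profile filter (about 0.4697); it gives
# 50% amplitude transmission at the cutoff wavelength.
ALPHA = math.sqrt(math.log(2.0) / math.pi)


def gaussian_weights(cutoff, spacing):
    """Sample s(x) = exp(-pi * (x / (ALPHA*cutoff))**2) / (ALPHA*cutoff).

    Assumption: the kernel is truncated at +/- one cutoff length.
    """
    half_width = int(math.ceil(cutoff / spacing))
    scale = ALPHA * cutoff
    return [
        math.exp(-math.pi * ((k * spacing) / scale) ** 2) / scale
        for k in range(-half_width, half_width + 1)
    ]


def split_waviness_roughness(profile, cutoff, spacing):
    """Return (waviness, roughness) for a 1D profile of heights.

    waviness  = Gaussian low-pass of the profile
    roughness = profile - waviness
    """
    weights = gaussian_weights(cutoff, spacing)
    half_width = len(weights) // 2
    waviness = []
    for i in range(len(profile)):
        num = 0.0
        den = 0.0
        for k, w in enumerate(weights):
            j = i + k - half_width
            if 0 <= j < len(profile):
                num += w * profile[j]
                den += w
        # Renormalize by the weights actually used so the ends of the
        # profile are not pulled toward zero (an edge-handling assumption).
        waviness.append(num / den)
    roughness = [z - w for z, w in zip(profile, waviness)]
    return waviness, roughness
```

With the cutoff stated above (0.025 mm) and a hypothetical sample spacing of 0.001 mm, `split_waviness_roughness(profile, 0.025, 0.001)` would return the two components of a form-removed profile; the actual spacing depends on the microscope settings, which are not given here.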

Means laterally down the striation patterns were taken to turn each surface into an

average "profile". Mean profiles were used due to the high redundancy of information found on

the surface, top to bottom. Also, following the literature, it is current “standard practice” to use a

profile (usually the mean profile) as input into the statistical discrimination algorithms instead of

the entire surface [5,11-14].

From the profile plots it was clear that almost all of the “line structure” in the striation

pattern, apparent in a comparison microscope (Leica FSM), was contained in the waviness

surfaces across all of our samples. Therefore, only extracted waviness was utilized. The waviness

component of each mean profile was loaded into the R statistical program for further processing

and analysis [18].

2b. Registration and Alignment

Because each profile did not begin and end at the same points, the profiles required

alignment (i.e. registration) in order to be processed as multivariate feature vectors. In order to

register profiles from the same experimental unit (e.g. a Glock or a screwdriver), the cross-

correlation function (CCF) between two profiles from each group was computed to find the shift

that yielded maximum correlation – a linear, univariate measure of similarity [13,17]. Within a

group of experimental units, the longest profile is chosen as a reference or “anchor profile”. The

remaining profiles are then maximally aligned with respect to the anchor profile.
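
The within-group registration step can be sketched in Python as a search for the integer shift that maximizes the Pearson correlation between a profile and the anchor profile. This is only a minimal sketch under stated assumptions: the function names, the `max_lag` search range, and the `min_overlap` guard are hypothetical and are not taken from the R programs on the website.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def best_shift(anchor, profile, max_lag, min_overlap=10):
    """Return (lag, r) maximizing the correlation over the overlapping region.

    Convention used here: profile[i] is compared with anchor[i + lag].
    """
    best = (0, -2.0)  # any real correlation is >= -1, so this is always replaced
    for lag in range(-max_lag, max_lag + 1):
        start = max(0, -lag)                          # first usable index of profile
        stop = min(len(profile), len(anchor) - lag)   # one past the last usable index
        if stop - start < min_overlap:
            continue
        r = pearson(anchor[start + lag:stop + lag], profile[start:stop])
        if r > best[1]:
            best = (lag, r)
    return best
```

The same routine could be applied to the group-mean profiles described next, with `max_lag` set from the chosen uncertainty window divided by the sample spacing.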

After profiles within each experimental unit were registered, profiles between

experimental units were aligned. This was done by computing a group-mean-profile (GMP) for

each of the within-group aligned profile sets [19]. The GMP for each experimental unit served as

a representation for that unit. The GMPs were then registered with respect to each other within a

user defined “uncertainty window”. The reason an uncertainty window was required for between

group registration was that, in general, there is no ubiquitous landmark available for any given

profile. The GMPs were aligned within a +/-100µm uncertainty window. The shift parameters

produced by the registration of group-means were used to shift all the mean profiles of the

groups in blocks. That is, each group of mean profiles was shifted by the amount required to

register the GMPs. All profiles used in an analysis were rescaled such that the lowest profile

point was designated 0 and the highest 1. This was done in order to minimize discrimination

between experimental units due only to valley depth and peak height variation. Valley depth and

peak height variation can be due to pressure variations in toolmark formation. Generally, this

should not be information that is used in toolmark discrimination. Length differences in the

profiles were padded with zeros. Several padding schemes were tried: zeros, standard Gaussian

random variates, uniform random variates and chopping. Zero padding was found to have the

least effect on decreasing the identification error rates. All programs for preprocessing were

written in the R programming language and are available on the website.
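
The rescaling and zero-padding steps are simple enough to sketch directly. The Python fragment below is illustrative only: the function names are hypothetical, and padding at the trailing end of each profile is an assumption, since the text does not say where the zeros were placed.

```python
def rescale_unit(profile):
    """Map a profile linearly so its lowest point is 0 and its highest is 1."""
    lo, hi = min(profile), max(profile)
    if hi == lo:
        # A perfectly flat profile has no range to rescale.
        return [0.0 for _ in profile]
    return [(z - lo) / (hi - lo) for z in profile]


def zero_pad(profiles):
    """Pad every profile with trailing zeros to the length of the longest one."""
    width = max(len(p) for p in profiles)
    return [list(p) + [0.0] * (width - len(p)) for p in profiles]
```

Applying `zero_pad` to a list of rescaled profiles yields equal-length rows that can be stacked into a data matrix.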

2c. Toolmark profile simulation

A portion of the database consists of 9-mm cartridges fired for the Hamby-Thorpe study

[20]. For the original Hamby-Thorpe study, two to three cartridges/Glock were available. The

statistical analysis techniques used in this project, however, are numerically more reliable with

five or more “replicates” per experimental unit. In order to exploit the Hamby-Thorpe

benchmark data set, a wavelet decomposition based simulator was written in the R programming

language [18,21]. The waveslim and wmtsa R packages were used for the actual wavelet

decompositions of the toolmark profiles [22,23]. The wavelet expansion was used because it

offers a principled multi-scale description of surface morphology and allows for statistical

analysis to be carried out efficiently [24,25]. Following Fu, the fourth-order (24-parameter)

Coiflet wavelet basis set was used in all decompositions/syntheses [26]. It was decided to

balance the whole data set and simulate enough profiles so that each gun or each screwdriver was

represented by 30 mean profiles. For the Glock set, the real data consisted of 162 collected

profiles taken from a subset of 24 different Glocks in the database. After simulations were

carried out the data set size was 720 profiles (30 total for each Glock). For the screwdriver set,

the real data consisted of 290 collected profiles taken from 58 different screwdriver tip surfaces

(29 screwdrivers). After simulations were carried out the data set size was 1740 profiles (30 total

for each screwdriver tip surface).

The criterion for keeping a simulated profile was a correlation of greater than 0.5. This low

bound to “similarity” was chosen to generate a challenging set of profiles to discriminate.

Profiles were simulated in blocks of ten. The growing sets of group profiles (both real and

simulated) were fed back into the simulator as input until the set reached 30 acceptable profiles

(again, the criterion for an acceptable simulated profile was a correlation “similarity score” greater

than or equal to 0.5 with the real profiles) [27]. The augmented data sets were renormalized and

registered between groups of toolmarks. The profiles were stacked together and zero padded as

before, forming a data matrix.

3. Details of Statistical Methodology

3a. The Data Matrix and Principal Component Analysis

Profiles for a given study were arranged into an n×p data matrix (X) where n is the

number of profiles and p is the number of points in each profile. Each value in the data matrix

represents a scaled z-height in a striation pattern profile. At this point in the analysis,

neighboring points in the profiles contain a great deal of redundant information. That is,

proximal points in a profile are correlated. An effective way to capture much of the essential

information within profiles while representing them with a smaller number of points is through

principal component analysis (PCA) [28]. The number of points, or PCs, used to represent the

profiles was chosen by hold-one-out cross-validation (HOO-CV).
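
For readers who prefer a formula, the reduction can be written in the usual textbook form via the singular value decomposition of the column-centered data matrix. The symbols other than X, n and p are notation introduced here for illustration; whether the columns were also scaled before the decomposition is not stated in the text, so only centering is assumed.

```latex
\begin{aligned}
\bar{\mathbf{x}} &= \tfrac{1}{n}\,\mathbf{X}^{\mathsf T}\mathbf{1}_n
  && \text{(mean profile, length } p\text{)} \\
\mathbf{X}_c &= \mathbf{X} - \mathbf{1}_n \bar{\mathbf{x}}^{\mathsf T}
  && \text{(column-centered data matrix)} \\
\mathbf{X}_c &= \mathbf{U}\,\boldsymbol{\Sigma}\,\mathbf{V}^{\mathsf T}
  && \text{(singular value decomposition)} \\
\mathbf{Z}_k &= \mathbf{X}_c \mathbf{V}_k \in \mathbb{R}^{n \times k}
  && \text{(scores on the first } k \text{ PCs)}
\end{aligned}
```

Here V_k holds the first k columns of V, so each profile of p points is replaced by a row of k scores, with k much smaller than p.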

3b. Support Vector Machines

PCA itself does not identify which tool made a particular toolmark. In order to do that

PCA must be combined with a method to “learn” classification rules. Statistical learning theory

and its practical application, the support vector machine (SVM), is just such a method and was

developed in response to the need for reliable statistical discriminations within small to medium

sample size studies [29]. SVMs seek to determine efficient classification rules for objects

assuming nothing about the form of the underlying probability distribution generating the data.

This is a great advantage for application in forensic science. The fewer the decision algorithm’s

underlying assumptions, the less vulnerable its conclusions are to attack in court.

The one-vs.-one multi-category approach to SVM classification in the e1071 R package was

used in this study [30].
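
As a sketch of what the one-vs.-one scheme involves, each pair of tools gets its own binary soft-margin SVM, and an unknown profile is usually assigned to the tool that wins the most pairwise votes. The formulation below is the standard linear soft-margin problem rather than anything specific to this study; the symbols w, b, ξ, z_i, y_i, m and K are notation introduced here, and treating each of the 58 screwdriver tip surfaces as its own class is an assumption.

```latex
\begin{aligned}
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}} \quad
  & \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{m} \xi_i \\
\text{subject to} \quad
  & y_i\left(\mathbf{w}^{\mathsf T}\mathbf{z}_i + b\right) \ge 1 - \xi_i,
    \qquad \xi_i \ge 0, \qquad i = 1,\dots,m \\[4pt]
\text{pairwise classifiers:} \quad
  & \binom{K}{2} = \frac{K(K-1)}{2},
    \qquad K = 24 \Rightarrow 276,
    \qquad K = 58 \Rightarrow 1653
\end{aligned}
```

Here z_i is the PCA score vector of profile i, y_i is +1 or -1 according to which tool of the pair made it, and C is the penalty parameter that trades margin width against training errors.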

3c. Toolmark Identification Error Rate Estimation

An error is defined as a misclassification of a toolmark by the comparison algorithm.

This occurs when the algorithm does not identify the unknown toolmark as having been made by

the suspect tool when it indeed had, or the algorithm identifies the unknown toolmark as having

been made by the suspect tool when indeed it had not. The error rate estimate that was used in

this study was based on the bootstrap [31]. First, a set of B bootstrap data sets is generated by

randomly selecting (with replacement) n toolmark pattern feature vectors from the original data

set X. Note that each bootstrap data set contains the same number of elements (toolmark pattern

feature vectors) as the original data set, thus some patterns may be repeated. The decision rules

are recomputed for each bootstrap sample. An average error rate is found using these decision

rules on the original data as well as the bootstrapped data. The difference between these two

averages is called the bootstrap estimated optimism. Averaging together these optimisms gives

the expected bootstrap estimated optimism. The averaged optimism is then added to the observed

error rate on the original training data. The sum gives what is called the refined bootstrap

estimate of identification error rate [31]. Custom bootstrapping routines written in the R

programming language are available on the website.
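
The refined bootstrap procedure described above can be sketched in a few lines of Python. This is not the authors' R code. To stay self-contained it swaps in a nearest-centroid rule for the PCA-SVM decision rule, and the function names, the default of 200 resamples, and the seed are assumptions made for illustration.

```python
import random


def fit_nearest_centroid(X, y):
    """Stand-in decision rule (not an SVM): one mean vector per class label."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids, row):
    """Assign the label of the closest centroid (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(row, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def error_rate(centroids, X, y):
    wrong = sum(1 for row, label in zip(X, y) if predict(centroids, row) != label)
    return wrong / len(y)


def refined_bootstrap_error(X, y, n_boot=200, seed=0):
    """Return (refined estimate, list of per-resample optimisms)."""
    rng = random.Random(seed)
    n = len(y)
    # Apparent error: rule trained and tested on the original data.
    apparent = error_rate(fit_nearest_centroid(X, y), X, y)
    optimisms = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        rule = fit_nearest_centroid(Xb, yb)
        # Optimism: error on the original data minus error on the resample.
        optimisms.append(error_rate(rule, X, y) - error_rate(rule, Xb, yb))
    mean_optimism = sum(optimisms) / n_boot
    return apparent + mean_optimism, optimisms
```

The list of optimisms returned alongside the estimate is what a histogram-based confidence interval would be built from; any classifier with a fit and a predict step could be substituted for the stand-in rule.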

Results

Cartridge case primer shears

One-vs.-one multiclass support vector machines (SVMs) (linear kernel, penalty

parameter C = 1) were applied to the 162 real striation pattern profiles generated by the 24 9-mm

Glocks used in this study. Hold-one-out cross-validation indicated that 22 PCs were needed to

represent these toolmark profiles in order to obtain reasonably low identification error rates with

SVM. Using 2,000 bootstrap resampling iterations, 22D PCA-SVM produced a refined bootstrap

error rate estimate of 2.5%. The approximate 95% confidence interval around the error rate

estimate was [1.3%,3.2%].

On visual examination, the mean profiles that were incorrectly identified in the initial HOO-CV

process were relatively straightforward to associate with the Glocks that created them. We have

observed this behavior in past machine learning projects when too few replicate toolmarks per

tool are used in training the identification algorithm. We thus decided to examine if the error

rates would decrease and the confidence interval narrow if more replicate profiles per Glock

were used in the training process.

A simulation run was performed on the real primer shear profiles (162) using the same

operating parameters as above. A total of thirty real and simulated profiles were used in the

training/testing process for each Glock bringing the data set size up to 720 patterns.

Unfortunately, performing an HOO-CV computation on the entire 720 profile data set in order to

estimate an optimal number of PCs to use in the bootstrap error rate estimation proved to be too

computationally intense. Thus again, 22 PCs were used to represent the profiles as in the

previous calculation.

The refined bootstrap error rate estimate for the 22D PCA-SVM discrimination model was

0.03%. The corresponding approximate 95% confidence interval around the error rate estimate,

[0.0%,0.1%], was indeed found to be narrower using the augmented data set with thirty

replications per Glock. This tells us something that is already well known in the artificial

intelligence community. Computers are good at identifying patterns, but it takes a lot of data to

do this.

Screwdrivers

For the PCA-SVM computations, HOO-CV was again used to find a lower dimensional

representation of the 290 real screwdriver striation pattern profiles that would still be adequate

for analysis. Using the data set projected into 26D PCA space, the refined bootstrapped error rate

estimate was found to be 6.5% with 2,000 resampling iterations. The 95% confidence interval for

the error rate, as determined by the bootstrap optimism histogram, was 3.5% to 10%.

An error rate below 10% is generally considered good in the computational pattern

classification industry [32]. We however received a good deal of feedback from the practitioner

community that it is not considered high performance for forensic applications. Also this 95%

confidence interval around the error rate for the real data set was wide and indicative that a larger

training set is needed to narrow uncertainty. Again, as was the case for cartridge cases, visual

examination of a mean profile that was incorrectly identified in HOO-CV computations was

relatively straightforward to pair with the screwdriver that created it. A simulation run was

performed on the real screwdriver profiles (290) using the same operating parameters as were

used for the cartridge case primer shear profiles. Twenty-five profiles were simulated for each

screwdriver tip surface, bringing the data set size up to 1740 patterns (30 profiles per tip surface). The

refined bootstrap error rate estimate for the 26D PCA-SVM discrimination model was 0.01%. The

refined bootstrap 95% confidence interval around the error rate estimate, [0.0%,0.06%] was

indeed narrower using the augmented data set with thirty replications per screwdriver tip surface.

Conclusions

Impression evidence left at crime scenes is indispensable and cannot be allowed to

become inadmissible in court. Computational pattern recognition is already widely used in

industry, including chemical engineering, audio/visual engineering, mail and product sorting,

computer security, marketing, etc. It is absolutely critical that the forensic toolmark examination

community take advantage of the enormous potential of pattern recognition and the computing

power available today. Adopting these statistical techniques for impression pattern comparison

will yield standardized and efficient protocols as well as reproducible, independently verifiable,

fair and accurate conclusions.

In this paper we have shown that mean profiles, derived from striation patterns, can serve

as multivariate feature vectors. Information within such representations of toolmarks can be

suitably condensed with PCA and effectively discriminated with the "industrial-strength"

computational pattern recognition method of SVM. Toolmark identification error rates were low.

This is commensurate with the experience of practitioners. Even so, the computational

algorithm made identification errors (on smaller data sets) at a rate that we felt was too high for a

production-level system in the forensic sciences. Simply looking at the misidentified patterns,

however, quickly led us to the conclusion that more training data was needed so that the routines

could account for a wider range of variation that can occur within a set of toolmarks made by the

same tool. For this reason the profile simulator was developed. When the SVM algorithm was

presented with a much larger data set consisting of both real and simulated profiles,

identification error rates dropped to trivial levels. This need for data is also a drawback of the method.

Computers are "dumb" and need a lot of data to get exceptional performance in pattern

recognition tasks. Thus future directions of this research are to search for more efficient sets of

features that can be extracted from toolmark profiles such that exceptional identification

performance can be obtained from much smaller data sets. In this regard, and to open up the

problem to a wider audience, all of the data and programs developed in the course of this study

are available at http://toolmarkstatistics.jjay.cuny.edu/.

Acknowledgements

The authors would like to thank Lauren Claytor and Chris Luckie of the Commonwealth

of Virginia, Department of Forensic Sciences for providing us with cartridge case samples and

advice on improving the performance of our system. We thank Roger Xu of Intelligent

Automation Inc. for providing valuable advice on improving the performance of our profile

simulator. Finally we thank Pauline Leary at John Jay for kindly reading and commenting on the

content of our manuscript.

FIGURE 1. Screen shot of the database homepage.

FIGURE 2. ImageJ toolbar, 2D and 3D surface topography of a screwdriver striation pattern. ImageJ functionality makes measurement and manipulation of calibrated toolmark images from the database simple and flexible.

FIGURE 3. 2D topographies of three screwdriver striation patterns (two screwdrivers), shown in grey levels. Shown are two known matches and one known non-match.

FIGURE 4. Interactive 3D ImageJ images of screwdriver striation patterns. The exemplars are the same as those shown in Figure 3.

FIGURE 5. Screwdriver holding jig for generating striation patterns on any media.

References

1. National Academy of Sciences, Strengthening forensic science in the United States: A path

forward, The National Academies Press, Washington, D.C., 2009.

2. Moran B., "A Report on the AFTE Theory of Identification and Range of Conclusions for

Tool Mark Identification and Resulting Approaches To Casework," AFTE Journal, Vol. 34, No.

2, 2002, pp. 227-35.

3. Geradts, Z., Keijer, J., and Keereweer, I., "A new approach to automatic comparison of

striation marks," Journal of Forensic Sciences, Vol. 39, No. 4, 1994, pp. 974 – 980.

4. De Kinder, J., and Bonfanti, M., "Automated comparisons of bullet striations based on 3D

topography," Forensic Science International, Vol. 101, No. 2, 1999, pp. 85 – 93.

5. Bachrach, B., "Development of a 3D-based automated firearms evidence comparison system,"

Journal of Forensic Sciences, Vol. 47, No. 6, 2002, pp. 1 – 12.

6. Banno, A., Masuda, T., and Ikeuchi, K., "Three dimensional visualization and comparison of

impressions on fired bullets," Forensic Science International, Vol. 140, No. 3, 2004, pp. 233 –

240.

7. Senin, N., Groppetti, R., Garofano, L., Fratini, P., and Pierni, M., "Three-dimensional surface

topography acquisition and analysis for firearm identification," Journal of Forensic Sciences,

Vol. 51, No. 2, 2006, pp. 282 – 295.

8. Roberge, D., and Beauchamp, A., "The use of BulletTrax-3D in a study of

consecutively manufactured barrels," AFTE Journal, Vol. 38, No. 2, 2006, pp. 166 – 172.

9. Tontarski, R.E., and Thompson, R.M., "Automated firearms evidence comparison: A

forensic tool for firearms identification–An update," Journal of Forensic Sciences, Vol. 43, No.

3, 1998, pp. 641 – 647.

10. Brinck, T.B., "Comparing the performance of IBIS and BulletTRAX-3D technology using

bullets fired through 10 consecutively rifled barrels," Journal of Forensic Sciences, Vol. 53, No.

3, 2008, pp. 677 – 682.

11. Faden, D., Kidd, J., Craft, J., Chumbley, L. S., Morris, M., Genalo, L., Kreiser, J., and Davis,

S., "Statistical confirmation of empirical observations concerning toolmark striae," AFTE

Journal, Vol. 39, No. 3, 2007, pp. 205 – 214.

12. Chumbley, L. S., Morris, M. D., Kreiser, M. J., Fisher, C., Craft, J., Genalo, L. J., Davis, S.,

Faden, D., and Kidd, J., "Validation of tool mark comparisons obtained using a quantitative,

comparative, statistical algorithm," Journal of Forensic Sciences, Vol. 55, No. 4, 2010, pp. 953 –

961.

13. Chu, W., Song, J., Vorburger, T., Yen, J., Ballou, S., and Bachrach, B., "Pilot study of

automated bullet signature identification based on topography measurements and correlations,"

Journal of Forensic Sciences, Vol. 55, No. 2, 2010, pp. 341 – 347.

14. Bachrach, B., Jain, A., Jung, S., and Koons, R.D., "A statistical validation of the individuality

and repeatability of striated toolmarks: Screwdrivers and tongue and groove pliers," Journal of

Forensic Sciences, Vol. 55, No. 2, 2010, pp. 348 – 357.

15. Artigas, R., "Imaging Confocal Microscopy". In: Optical Measurements of Surface

Topography, Ed: Leach, R., Springer, New York, 2011.

16. Zeiss Axio CSM 700 Confocal Microscope Software Manual.

17. Muralikrishnan B., Raja J., Computational Surface and Roundness Metrology, Springer, New

York, 2009.

18. R Core Development Team. (2009). R: A language and environment for statistical computing

[computer program]. Version 2.9.1. Vienna, Austria: R Foundation for Statistical Computing.

19. Gambino, C., McLaughlin, P., Kuo, L., Kammerman, F., Shenkin, P., Diaczuk, P., Petraco,

N., Hamby, J., and Petraco, N.D.K., "Forensic Surface Metrology: Toolmark Evidence,"

Scanning, Vol. 33, 2011, pp. 1-7.

20. Hamby J., and Thorpe J., "The Examination, Evaluation and Identification of 9mm Cartridge

Cases Fired from 617 Different GLOCK Model 17 & 19 Semiautomatic Pistols," AFTE Journal,

Vol. 41, No. 4, 2009, pp. 310-324.

21. Percival D.B., and Walden A.T. Wavelet methods for time series analysis, Cambridge

University Press, New York, 2006.

22. waveslim R package (2012). waveslim: Basic wavelet routines for one-, two- and

three-dimensional signal processing [computer program]. Version 1.7.1. Brandon Whitcher.

23. wmtsa R package (2012). wmtsa: Wavelet Methods for Time Series Analysis [computer

program]. Version 1.1-1. William Constantine and Donald Percival.

24. Maksumov, A., Vidu, R., Palazoglu, A., and Stroeve, P., "Enhanced Feature Analysis Using

Wavelets for Scanning Probe Microscopy Images of Surfaces," Journal of Colloid and Interfacial

Science, Vol. 272, 2004, pp. 365-377.

25. Reizer, R., "Simulation of 3D Gaussian surface topography," Wear, Vol. 271, 2011, pp. 539-

543.

26. Fu, S., Muralikrishnan, B., and Raja, J., "Engineering Surface analysis with different wavelet

bases," Journal of Manufacturing Science and Engineering, Vol. 125, No. 4, 2003, pp. 844-852.

27. Xu, R., Personal Communication, August 9, 2012.

28. Jolliffe I.T., Principal component analysis, 2nd ed. Springer, New York, 2004.

29. Vapnik, V.N., Statistical learning theory, Wiley, New York, 1998.

30. e1071 R package (2012). e1071: Misc Functions of the Department of Statistics [computer

program]. Version 1.6-1. Technische Universität Wien, Austria.

31. Efron, B., and Tibshirani, R.J., An introduction to the bootstrap, Chapman & Hall, London,

1993.

32. Koren, Y., "The BellKor solution to the Netflix Grand Prize,"

http://www.netflixprize.com/assets/GrandPrize2009_BPC_BelKor.pdf, 2009.