Statistical Considerations of the Histomorphometric Test Protocol for Determination of Human Origin of
Skeletal Remains
John E. Byrd, Ph.D., D-ABFA
Maria-Teresa Tersigni-Tarrant, Ph.D.
Central Identification Laboratory, JPAC
Human vs. non-human
• Fundamental question in forensic anthropology
• Relevant to medicolegal significance
• Results needed quickly
• Typically determined by macroscopic observations
• Small fragments present special problems
Histomorphology as a solution
• Some patterns are decisively nonhuman (e.g. plexiform bone)
• Presence of primary and secondary osteons is a hallmark of human bone, but is not unique to it
• This allows us to REJECT human as the origin, but not to ACCEPT it!
[Note the asymmetry]
Histomorphometrics
• Metrics present a more powerful approach to recognizing human remains due to the ability to deal with bone showing only circular osteons
• Osteon area is the preferred measurement
• Use of metrics requires a statistical approach
• This presentation will describe a test protocol for using osteon area to segregate human from non-human bone
Test Protocol
1. Section case specimen, embed in resin, mount on slide, capture digital image with scale
2. Import image into ImageJ
3. Select a random sample of 30 osteons
4. Measure the osteon area of the 30 osteons and calculate the sample mean
5. Test the null hypothesis that the remains have the same mean as the human reference sample
6. Assess the strength of evidence in the results
How good is the test?
• We compared the human osteon area data to osteon areas from 17 chimpanzees, our closest living relatives. Chimp data were obtained from slides in the possession of the AFIP Museum.
• We ran the test on 37 specimens from 35 known humans (independent of reference sample) to check performance
Comparison to chimpanzees
[Figure: histograms of the bootstrap sample means for chimpanzee and human; vertical axis: Percent of Total]
Chimp: Osteon Area 25377 micron²
Standard Dev of mean 2692
Human: Osteon Area 37365 micron²
Standard Dev of mean 2728
These statistics were derived from a bootstrap procedure involving 1000 samples of N=30. Histograms depict the 1000 means. Note that the standard deviation is analogous to the standard error in parametric statistics.
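A bootstrap of this kind might look like the following minimal sketch. The reference measurements here are synthetic placeholders (uniform random numbers), not the actual human or chimpanzee data. The function name `bootstrap_means` is hypothetical. Reading a 5% lower-tail cutoff from the sorted means is an assumption about how such a cutoff could be obtained, not a statement of how the authors derived theirs.

```python
import random
from statistics import mean, stdev


def bootstrap_means(areas_um2, n_resamples=1000, sample_size=30, seed=0):
    """Draw n_resamples samples (with replacement) of sample_size and return their means."""
    rng = random.Random(seed)
    return [mean(rng.choices(areas_um2, k=sample_size)) for _ in range(n_resamples)]


# Synthetic stand-in for a reference collection of osteon areas (square microns).
data_rng = random.Random(1)
reference = [data_rng.uniform(10000.0, 65000.0) for _ in range(500)]

means = bootstrap_means(reference)
boot_mean = mean(means)   # centre of the 1000 bootstrap means
boot_sd = stdev(means)    # plays the role of a standard error

# Assumption: one way to read off a 5% lower-tail cutoff is the 5th percentile.
cutoff_5pct = sorted(means)[int(0.05 * len(means))]

print(f"mean of means: {boot_mean:.1f}")
print(f"std of means:  {boot_sd:.1f}")
print(f"5% cutoff:     {cutoff_5pct:.1f}")
```

The standard deviation of the bootstrap means is much smaller than the standard deviation of the individual measurements, which is why it is read as a standard error.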
Application to test sample
• 37 specimens from 35 different known individuals were tested using this protocol
• Tests included no erroneous results
Cutoff (p < 0.05): sample mean < 32,839 micron²
Validation?
• Chimpanzees show similarly large osteon areas, yet are shown to be statistically separable. Power of test versus chimp = 0.90.
• Application of the test to the known human test sample revealed no errors. Presumably, future applications will produce false rejections at the rate set by the chosen significance level (e.g. 0.1, 0.05, 0.01, etc.)
• Small overlap in distribution of chimp means versus human means is very encouraging
Example
• Known sample (#02H) from dog
• Protocol is followed to obtain a sample mean value of 17980 micron²
• Since this is below the cutoff (p<0.05) of 32,839 micron², we reject the null hypothesis
Post-hoc Evaluation of a Result
• Concept proposed by Karl Popper
• A hypothesis (interpretation) should be accepted just to the extent that it has survived a severe test
• Mayo (1996) has operationalized the concept into a statistic that can be applied to test results post-hoc
• Not the same thing as power of the test
Severity!
Severity
• Operationalizes the idea of the severe test
• In the case where we do NOT reject the null hypothesis, we are interested in the probability that if an opposing hypothesis (alternative “A”) were true, our test result would have indicated so (by a more significant departure)
• If this probability is high, it means P(test result more sig than observed; A) = high
• Thus, we can take the observed statistic as indicating “not-A” with severity. We would usually relate this to the parameter, as in there is evidence µ > µ’ (for some µ’ alternative to null)
Severity cont’d
• For the case where we DO reject the null hypothesis:
• We are interested in the probability that if the null were true, we would have seen a less impressive departure from the null
• If this probability is high, it means that P(test result less sig; null) = high
• This is (1-p). With small p-value, we can infer with severity (1-p) evidence for some discrepancy from the null
Severity example: Dog sample
• Human reference standard from bootstrap runs: Mean 37365.3, std of mean 2727.7
• Test: H0: sample is from population with µ ≥ 37365.3; reject iff T < T* (where T is test result and T* is cutoff)
• Dog sample: Mean 17980
• T = (17980 − 37365.3)/2727.7 ≈ −7.11; P = 0.000000040. Reject!
• Severity = 1 − p > 99%
• Read this number as, “There is a more than 99% chance that if this sample were human, we would have obtained a different (larger) result.”
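The arithmetic for the rejection case can be checked with a short sketch. It assumes a standard normal reference distribution for the standardized distance. The slides do not state which distribution produced the quoted p-value, so the p-value computed here need not reproduce it digit for digit. The function name `lower_tail_test` is hypothetical.

```python
from statistics import NormalDist


def lower_tail_test(sample_mean, ref_mean, ref_se):
    """Return (t, p, severity) for a one-sided lower-tail test.

    t is the standardized distance of the sample mean from the reference mean,
    p is the lower-tail probability under a standard normal (an assumption),
    and severity for the rejection case is taken as 1 - p.
    """
    t = (sample_mean - ref_mean) / ref_se
    p = NormalDist().cdf(t)
    return t, p, 1.0 - p


# Dog sample against the human bootstrap reference values quoted above.
t, p, severity = lower_tail_test(17980.0, 37365.3, 2727.7)
print(f"T = {t:.2f}")
print(f"p = {p:.3g}")
print(f"severity > 0.99: {severity > 0.99}")
```

Whatever the exact tail probability, the sample lies about seven standard errors below the human reference mean, so the severity of the rejection exceeds 99%.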
Severity example: Human sample
• Human reference standard from bootstrap runs: Mean 37365.3, std of mean 2727.7
• Test sample Individual #33 humerus sample Mean 44205.9
• Test: H0: sample is from population with µ ≥ 37365.3; reject iff T < T* (where T is test result and T* is cutoff)
• P = 0.99. Accept! But how sure am I that this specimen is human and not some other animal?
• Bear in mind that all of the other species we have measured have smaller osteons…
• We can calculate the severity in multiple ways that address varying concerns (varying alternatives):
• If my concern is that I will mistake a chimpanzee sample for human with this test: Use chimp sample mean and std as the basis for the severity estimate.
• Chimp Mean 25377.6, std 2692.1
• Distance stat given the test sample result of 44205.9 is T = (44205.9-25377.6)/2692.1 ≈ 6.99
• Using normal distribution, P(T < observed T; chimp) = 0.99999
• Severity = 0.99999
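The chimpanzee-based severity estimate can be reproduced with Python's standard library, using the normal distribution as stated above. The function name `severity_against` is hypothetical; the numbers are the ones quoted in this example.

```python
from statistics import NormalDist


def severity_against(observed_mean, alt_mean, alt_se):
    """Return (t, severity) for the claim that the sample is NOT from the alternative.

    Severity is P(T < observed T) if the alternative were true, taken from
    the standard normal distribution.
    """
    t = (observed_mean - alt_mean) / alt_se
    return t, NormalDist().cdf(t)


# Individual #33 humerus sample evaluated against the chimpanzee bootstrap values.
t_chimp, sev_chimp = severity_against(44205.9, 25377.6, 2692.1)
print(f"T = {t_chimp:.2f}")
print(f"severity >= 0.99999: {sev_chimp >= 0.99999}")
```

Swapping in the mean and standard deviation of a different species addresses a different alternative, which is what is meant by calculating severity in multiple ways.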
Severity curve for test sample result
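A severity curve of this kind can be tabulated by repeating the severity calculation over a range of alternative means µ’. The sketch below makes a simplifying assumption: every alternative shares the human bootstrap standard deviation of the mean (2727.7), whereas a species-specific value could be used instead. The function name `severity_curve` and the chosen grid of alternatives are hypothetical.

```python
from statistics import NormalDist


def severity_curve(observed_mean, se, alt_means):
    """Severity for the claim 'mu > mu_alt' at each alternative mean mu_alt.

    For each alternative, severity is the normal probability of a sample mean
    smaller than the one observed if mu_alt were the true mean.
    """
    std_normal = NormalDist()
    return [(mu_alt, std_normal.cdf((observed_mean - mu_alt) / se)) for mu_alt in alt_means]


# Individual #33 humerus sample; alternatives from 25,000 to 50,000 square microns.
for mu_alt, sev in severity_curve(44205.9, 2727.7, range(25000, 50001, 2500)):
    print(f"{mu_alt:>6}  {sev:.4f}")
```

Severity is close to 1 for alternatives far below the observed mean, falls to 0.5 when the alternative equals the observed mean, and keeps dropping beyond it.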