Copyright © 2010 by Edmund H. Conrow
Evaluation of Subjective Probability Statements
Dr. Edmund H. Conrow, CMC, CPCM, CRM, PMP*
Management and Technology Associates, Redondo Beach, California 90278
Subjective probability statements have been used for decades in intelligence and risk
analyses. A broadly distributed subjective probability example was obtained and evaluated.
This example, published in 1977, contains 16 subjective probability
statements and corresponding probability values from 23 respondents, and includes results
from two prior studies—one from 1964, while the other cannot be specifically identified.
The graphic summarizing the results was evaluated and several errors and deficiencies were
identified, including mis-worded subjective probability statements, incorrect minimum and
maximum probability bounds, and a distortion in the graphic grid that may introduce non-trivial
error into values with a probability less than 0.20. Subsequent re-publication of this
graphic by a number of authors introduced a variety of additional errors (e.g., a response
count other than 23) and made estimation of descriptive statistics unreliable. Because of
these difficulties, the original 16 subjective probability statements were included in a new
survey in randomized order. This survey was completed by 141 engineers and project
managers, and a statistical analysis was performed on the results for each subjective
probability statement. The resulting response range was ≥ 0.90 in all 16 cases, ≥ 0.95 in 11
cases and equal to 1.00 in five cases! An approach was developed to filter potential outliers
based upon 18 (of the 141) responses from systems engineers at one company. The resulting
filtered data eliminated 1 to 18 responses per subjective probability statement, and the
resulting range per statement was ≥ 0.90 for only one of the 16 statements, but ≥ 0.50 in 11
of the 16 statements. While the filtered data improvement is noteworthy in terms of
identifying and eliminating potential outliers, the results still strongly suggest that
probability scales or tables based upon subjective probability statements should not be used
in risk analyses. The reason for this concern is that a subjective probability statement (e.g.,
probable) and the associated probability value for a statement (e.g., 0.68) may be interpreted
as mis-matched, hence incorrect in the mind of the analyst, and lead to mis-scoring the
resulting probability level (e.g., via cognitive dissonance).
Introduction
Subjective probability statements have been used in the intelligence, risk management, and other communities
for decades. However, both probability values associated with subjective statements and the manner in which the
results are depicted are often incorrect. Probability values and a graphical portrayal generated over 30 years ago are
evaluated for one such set of subjective statements to examine their accuracy and determine how useful they may be
for risk management purposes.
In this paper I closely examine an estimative probability table (presented in graphical figure form) composed of
16 subjective probability statements that was developed in the 1970s and widely distributed in the 1980s and 1990s
through the present time. Specifically, I examine the data contained in the figure for the 16 statements, evaluate the
graphical boundaries that sometimes appear with these statements, and examine results from a statistical analysis of
a moderate-sized data collection (141 surveys) associated with the same 16 subjective probability statements. More
than 10 years elapsed between the time I first examined a version of this figure and the time I completed my analysis. The results of
this analysis are somewhat surprising and point to the limitations of using estimative probability tables and scales in
risk analyses. Such tables and scales should be viewed as more of a “last resort” than first choice and not used in a
risk analysis unless they are the only available means to evaluate probability [1]. [This is a different situation than
that faced by the intelligence community (e.g., as discussed in the 1950s and 1960s by Sherman Kent), where subjective
statements might imply a probability [2]. With project risk management, subjective statements are sometimes used
to estimate the probability of occurrence associated with a risk analysis.]
* Principal, P. O. Box 1125, Redondo Beach, CA 90278, www.risk-services.com, Associate Fellow and Life
Member.
AIAA SPACE 2010 Conference & Exposition 30 August - 2 September 2010, Anaheim, California
AIAA 2010-8739
Copyright © 2010 by Edmund H Conrow. Published by the American Institute of Aeronautics and Astronautics, Inc., with permission.
Graphical Data Representation
Several different graphical representations of the data associated with the 16 subjective probability statements
have been located [3], [4], [5], [6], [7], [8]. Most appear to be derived directly or indirectly from a 1977 Decisions
and Designs, Inc. report authored by Barclay et al. [9], and none, including the 1977 report, provide the data in
tabular (numerical) form. The graphical representation given in Figure 1 is from the 1977 Barclay et al. report [9],
which includes a brief discussion of the plot and how the data was generated.
Fig. 1 What Uncertainty Statements Mean to Different Readers [9]
The data and its graphical portrayal given in Figure 1 were examined between 1998 and 2009. Several results
are summarized here. First, without supplemental information, it is unclear what the dots represent in Figure 1. In
fact, they are results from 23 survey participants, but in numerous published cases from the 1980s to 2009, between
three and 25 responses were present rather than 23 (and in none of these publications was the specific number of
responses listed as 23). (The underlying numerical data is lost. This is despite recent attempts by two co-authors of
the original 1970s study to locate the raw data per my request.) Second, without supplemental information, it is
unclear what the bars represent in Figure 1. For example, the bars aren’t the data range of the 23 survey
participants. Third, the probability ranges associated with the bars were obtained from a 1960s study authored by
Sherman Kent [2] and almost all of the subjective probability statements with bars had one or both bar (minimum or
maximum) bounds that were erroneous when compared to the original source used to estimate the bar bounds.
Fourth, without supplemental information, it is not possible to correctly determine that the bars (probability ranges
from the Sherman Kent study [2]) have no relationship to the dots (survey responses from a likely 1970s analysis)
for any of the subjective statements since the bars and dots were derived from two different and completely
unrelated studies performed perhaps 10 or more years apart by two different organizations. [The source of the study
performed with 23 survey participants is unknown, but unlikely to be from Decisions and Designs, Inc., else the
citation within Barclay et al. [9] would have been written differently. Thus, Figure 1 was likely created from two
independent studies (Kent [2] and an unknown source), then overlaid by Barclay et al. [9].] I will now explore the
above items in greater detail.
Supplemental information from Barclay et al. [9] reveals the following about the source of the data and results.
Twenty-three officers (NATO intelligence analysts), with ranks from Lieutenant General to Squadron Leader,
evaluated the 16 subjective probability phrases [9]. Each “dot” in Figure 1 represents one respondent’s probability
assessment. Some phrases, in particular “better than even,” were interpreted similarly by most of the respondents
[9]. However, for most subjective phrases there was “tremendous” variation in the responses (as evident in Figure
1) [9].
“In 1964, Sherman Kent attempted to mitigate this problem within the intelligence community by proposing a
scale range of probabilities for various” subjective phrases [9]. These ranges are indicated by the shaded areas in
Figure 1 (termed here a “Kent bar”). Unfortunately, as discussed below, the ranges attributed to tables developed by
Sherman Kent [2] that are included in Figure 1 [9] and used by other researchers, while correctly given in Kent [2],
are incorrectly used in most other cases without any notation to the reader of potential errors.
Statistical Analysis of Data and Data Ranges
Figure 1 was available as a scanned image contained within a PDF file of Barclay et al. [9]. On-screen (72 dots
per inch) digitizing software coupled with a 600 dot per inch mouse was used to convert the graphical representation
into probability values (0 to 100 in percent) for the 23 responses for each of the 16 subjective probability statements.
Without knowing that there were 23 responses for each statement [9] it would have been very easy to miss a number
of the data points because of both close clustering of values, and the dot pattern that was used to fill the shaded areas
associated with the Kent ranges in Figure 1 [9]. The latter issue was dominant given the potential overlap between
dots representing responses and the shading pattern for each of the 13 statements that included a Kent range. A
cursory (and likely not perfect) examination of the number of responses per subjective probability statement yielded
the following results for each corresponding reference: three to nine responses [3], eight to 15 responses [4], 18 to
24 responses [5], 20 to 22 responses [6], 21 to 23 responses [7], and 23 responses (although identifying some
responses was extremely difficult) [9].
While descriptive statistics were developed for the 23 responses for each of the 16 subjective probability
statements, the results are not reported here due to potential scanning and digitizing errors. To quantify potential
errors, measurements were made using the figure contained in the PDF file at 10 percent intervals from zero to 100
percent for the “very good chance” and “almost no chance” subjective probability statements from Figure 1. [These
two portions of the figure were selected because they contained no shaded regions (“boxes”) associated with the
Kent ranges.]
I was also able to obtain a high quality copy of Figure 1 contained in a hardcopy of Barclay et al. [9] courtesy of
Dr. Cameron R. Peterson, one of the document’s co-authors. I scanned the hardcopy of the figure at 600 dots per
inch (dpi) in an attempt to have a “cleaner” separation between the 23 data points (“dots”) and the shading pattern
used in the “Kent bars.” This approach proved successful in making it easier to identify the 23 data points.
Unfortunately, a non-linear spatial distortion was identified both in the PDF file version of this figure I originally
worked with and in the scanned version I created from Dr. Peterson’s hardcopy.
With the PDF (file) version of this figure, the resulting error measured for two statements (almost no chance,
very good chance) was between 0.3% and 0.5%, respectively for probability values denoted by the vertical grid
between 30 and 100 percent. However, at 20 percent the resulting error was 2.5% to 2.6%, and at 10 percent this
error increased to 6.1% to 8.0%. With the 600 dpi scanned hardcopy file, the resulting error measured for two
statements (almost no chance, very good chance) was between 0.5% and 0.8%, respectively for probability values
denoted by the vertical grid between 30 and 100 percent. However, at 20 percent the resulting error was 2.7% to
3.1%, and at 10 percent this error increased to 6% to 7%.
The above results show that the error is likely caused by the spatial alignment of the grid at the 10% and 20%
values, rather than from scanning either or both of the hardcopies (the original source into the PDF file and the
digitization I performed of Dr. Peterson’s hardcopy). However, it’s not possible to know if the data placement
between 0% and 20% (hence any corresponding data points) is in error as a result of the shifted grid. (The two
possibilities include a shifted grid but correct data point placement, and shifted grid and incorrect data point
placement.)
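One way to quantify a grid distortion of this kind is to compare the value that a purely linear 0-to-100 mapping assigns to each gridline with that gridline's nominal label. The sketch below assumes the 0% and 100% gridlines serve as anchors and expresses the error in percentage points; both choices are assumptions, and the pixel positions are hypothetical rather than the measurements reported above.

```python
def linear_value(pixel, pixel_at_0, pixel_at_100):
    """Map a position along the probability axis to percent, assuming a linear axis."""
    return 100.0 * (pixel - pixel_at_0) / (pixel_at_100 - pixel_at_0)


def grid_errors(gridline_pixels):
    """Return {nominal percent: linear reading minus nominal} for each gridline.

    gridline_pixels maps nominal labels (0, 10, ..., 100) to measured pixel positions.
    """
    p0, p100 = gridline_pixels[0], gridline_pixels[100]
    return {
        nominal: linear_value(pixel, p0, p100) - nominal
        for nominal, pixel in sorted(gridline_pixels.items())
    }


# Hypothetical measurements: a 1000-pixel axis whose 10% and 20% gridlines are shifted.
example_pixels = {0: 0, 10: 107, 20: 205, 30: 300, 40: 400, 50: 500,
                  60: 600, 70: 700, 80: 800, 90: 900, 100: 1000}
example_errors = grid_errors(example_pixels)
for nominal, err in example_errors.items():
    print(f"{nominal:3d}%  error = {err:+.1f} percentage points")
```

A check like this locates a shifted gridline, but it cannot show whether the data points near that gridline were plotted against the shifted grid or against the true scale.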
The nature of the resulting errors is even more complicated when descriptive statistics are calculated from
redrawn figures rather than Figure 1 [9] as illustrated in results presented by Hillson [8]. The same type of problem
also arises when ranges are estimated from redrawn figures rather than Figure 1 [9], let alone failing to
recognize that these ranges were actually developed by Kent [2] as mentioned in Barclay et al. [9]. More than one
layer of errors can exist in such cases. Again, this is illustrated in results presented by Hillson [8].
Finally, as mentioned above, the Kent ranges are independent of the 23 respondents’ assigned probability values
in each case because the Kent ranges were estimated in a different manner and likely years earlier than the survey
responses. Hence errors associated with the Kent ranges are a matter of how the Kent data was interpreted and how
the resulting figure(s) were drawn. The subsequent errors have nothing to do with the survey data from the 23
respondents.
The Kent ranges contained in Barclay et al. [9] were a mix of statements contained in Table 1, where “give or
take” is assumed to mean “plus or minus” [2], plus a second table contained in Kent [2], and given in Table 2 below.
Table 2 represents, per Kent, synonyms for several (six) subjective probability statements. The degree of accuracy
associated with the Kent ranges given in Figure 1 [9] can be determined given the data contained in Tables 1 and 2.
Each Kent range contains an upper and lower bound. Of the 16 subjective probability statements, only three (“very
good chance,” “little chance,” and “almost no chance”) do not have Kent ranges in Figure 1 [9]. Of the 13
subjective probability statements that do have Kent ranges, one is erroneous—“highly unlikely,” as this statement
does not appear in Tables 1 and 2 from Kent [2].
Value   Modifier                  Range        Subjective Phrase
100%    N/A                       100%         Certainty
93%     Give or take about 6%     87% to 99%   Almost certain
75%     Give or take about 12%    63% to 87%   Probable
50%     Give or take about 10%    40% to 60%   Chances about even
30%     Give or take about 10%    20% to 40%   Probably not
7%      Give or take about 5%     2% to 12%    Almost certainly not
0%      N/A                       0%           Impossibility
Table 1. Subjective Probability Assessment Table (Developed by Sherman Kent [2])
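The ranges in Table 1 follow directly from each central value and its “give or take” modifier when that modifier is read as plus or minus. The minimal sketch below reproduces the ranges and shows how a bar bound read from a figure could be compared against them. The function names are hypothetical, as are the drawn bounds in the usage line.

```python
# (central value in percent, "give or take" in percent) from Table 1; None marks N/A rows.
KENT_TABLE = {
    "Certainty": (100, None),
    "Almost certain": (93, 6),
    "Probable": (75, 12),
    "Chances about even": (50, 10),
    "Probably not": (30, 10),
    "Almost certainly not": (7, 5),
    "Impossibility": (0, None),
}


def kent_range(value, give_or_take):
    """Return (lower, upper) in percent, reading "give or take" as plus or minus."""
    if give_or_take is None:
        return (value, value)
    return (value - give_or_take, value + give_or_take)


def bound_errors(drawn_lower, drawn_upper, phrase):
    """Absolute error, in percentage points, of a drawn bar against the Table 1 range."""
    lower, upper = kent_range(*KENT_TABLE[phrase])
    return (abs(drawn_lower - lower), abs(drawn_upper - upper))


for phrase, (value, modifier) in KENT_TABLE.items():
    print(phrase, kent_range(value, modifier))

# Hypothetical bar drawn from 60% to 90% for "Probable".
print(bound_errors(60, 90, "Probable"))
```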
Subjective Phrase       Synonyms
Possible                Conceivable; Could; May; Might; Perhaps
Almost certain          Virtually certain; All but certain; Highly probable; Highly likely; Odds (or chances) overwhelming
Probable                Likely; We believe; We estimate
50-50                   Chances about even; Chances a little better (or less) than even
Probably not            Improbable; Unlikely; We believe that…not; We estimate that…not; We doubt, doubtful
Almost certainly not    Virtually impossible; Almost impossible; Some slight chance; Highly doubtful
Table 2. Table of Synonyms for Subjective Phrases (Developed by Sherman Kent [2])
Of the 12 subjective probability statements that have valid Kent ranges, two of the 12 are statements from Table
1 (“probable” and “probably not”), while the other 10 cases are synonyms contained in Table 2. Ten of the 12
subjective probability statements had one or both bounds that were erroneous when compared to the information in
Tables 1 and 2. (In five cases both bounds were erroneous and in the other five cases one bound was erroneous.)
The absolute magnitude of these probability errors ranged from three to 25 percent, and was eight percent or larger
for seven of the ten subjective probability statements. I leave it as an exercise to the reader to estimate the accuracy
of the Kent ranges represented in references [3], [4], [5], [6], [7], and [9].
While errors associated with the “Kent bars” were identified and corrected, it was not possible to accurately
determine descriptive statistics for the “dots” because of the nonlinear geometric distortions in the grid spacing
present in Figure 1 (previously discussed). (This error exists in the original publication and cannot be removed.)
The nonlinear grid spacing error, inability to locate the numerical values in tabular form, together with the fact that
only 23 survey responses existed, led to creating and distributing a survey of the 16 subjective probability statements
to obtain a larger and more reliable statistical sample for analysis.
Finally, it should be noted that the data contained in the representation of Figure 1 in references [3], [4], [5], [6],
[7], and [9] contain errors—in some cases not simply limited to the Kent ranges. (See the discussion in the appendix
to this paper for additional information.) As a result, Figure 1 and related illustrations should be considered nothing
more than historical examples. These figures may not accurately portray the underlying data and results, and the
figures should not be used in performing intelligence assessments, risk analyses, etc.
Survey Developed for 16 Subjective Probability Statements
The 16 subjective probability statements given in Figure 1 were randomized and included in a survey that was
both conducted in person and through representatives of different organizations. The randomization was performed
to eliminate the possibility of adjustment and anchoring biases adversely affecting the respondents’ scoring of the
subjective probability statements [10]. The randomization also included a manual check and re-adjustment where
needed to ensure that no consecutive subjective probability statements as ordered in Figure 1 existed in the survey
form.
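The randomization and adjacency check described above were performed manually. An automated equivalent can be sketched as a shuffle that is repeated until no two statements that are neighbors in Figure 1 appear next to each other. The sketch below works on positions 0 to 15 rather than the statement text, treats a neighboring pair in either order as a violation (an assumption), and uses hypothetical function names.

```python
import random


def has_consecutive_neighbors(order):
    """True if any two adjacent entries were also adjacent in the original ordering."""
    return any(abs(a - b) == 1 for a, b in zip(order, order[1:]))


def randomize_statements(n_statements=16, seed=None):
    """Shuffle original positions 0..n-1 until no original neighbors sit side by side."""
    if n_statements in (2, 3):
        raise ValueError("no valid ordering exists for 2 or 3 statements")
    rng = random.Random(seed)
    order = list(range(n_statements))
    while True:
        rng.shuffle(order)
        if not has_consecutive_neighbors(order):
            return order


survey_order = randomize_statements(seed=2010)
print(survey_order)
```

Repeated shuffling is used here only for simplicity. With 16 statements a valid ordering is typically found within a handful of attempts.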
Conrow developed instructions for completing the survey which follow:
“Sixteen statements that have been used for subjective probability assessments are given in the
table below. However, different people will assign a different probability value to each statement.
Please assess the probability level of each statement, then write a single value in each cell to the
right of the statement. Write the numbers as an integer between 0 and 100 (e.g., a probability of
0.5 = 50). There is no “correct” answer for any statement.”
“Please perform the assessment by yourself—don’t do it with another person. Don't attempt to
modify the table—enter the values exactly as the table is structured. Your contact information will
be held strictly confidential, not provided to anyone else, not used for spam, etc. You will not be
contacted unless there is an error in completing the survey.”
Altogether 141 respondents completed the survey. The respondents were from a wide variety of industry,
government, professional societies, and other organizations. The respondents were primarily located in the United
States, but a small number were European aerospace engineers (who participated via a professional society
headquartered in the U. S.). The primary occupations of the participants were engineers and project managers, and
the primary industries were aerospace and information technology (IT).
Survey Results
When the 141 responses were examined it was clear that responses from a number of individuals appeared to be
unreasonable, even when examined against the broad range for each of the 16 subjective probability statements in
Figure 1. Statistical results for the 141 raw (unfiltered) responses for the 16 statements are given in Table 3. The
number of respondents was 141, but in 36 cases a single subjective probability statement (“improbable”) was
repeated in random order in the questionnaire to determine if the results were statistically significantly different.
(Consequently, for the subjective probability statement “unlikely” there were 36 fewer responses, reflecting the dual
inclusion of “improbable,” or 105 total.)
Table 3. Statistical Analysis of Raw Subjective Probability Statement Survey Responses
The use of two cases of “improbable” was performed with one group of 36 project managers, primarily (but not
exclusively) with IT backgrounds. Hence the test attempted to address the question of whether or not the sample
respondents would assess the probability level similarly or differently for the same statement in their randomized
survey. A Mann-Whitney W test was used to compare the medians of the two samples of 36 responses for
“improbable.” (This test is constructed by combining the two samples, sorting the data from smallest to largest, and
comparing the average ranks of the two samples in the combined data.) The resulting test score had a probability of
0.24. Since the probability value is greater than or equal to 0.05, there was not a statistically significant difference
between the medians of the two samples at the 95.0% confidence level. This indicates that the 36 respondents did
not inconsistently score the two different “improbable” subjective probability statements (based upon comparing the
sample medians). (Extrapolations beyond this one statement should not be attempted.)
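The rank-based comparison described above can be sketched in a few lines. The version below is a minimal pure-Python sketch that uses the normal approximation with a tie correction and no continuity correction, so its p-values may differ slightly from those produced by the statistical package used for this analysis. The function names and the sample values are hypothetical.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from smallest to largest, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mann_whitney(sample_a, sample_b):
    """Two-sided Mann-Whitney test; returns (U for sample_a, approximate p-value)."""
    n1, n2 = len(sample_a), len(sample_b)
    combined = list(sample_a) + list(sample_b)
    ranks = average_ranks(combined)
    u_a = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in Counter(combined).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:  # every value identical, so the samples cannot differ
        return u_a, 1.0
    z = (u_a - n1 * n2 / 2) / math.sqrt(variance)
    return u_a, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical responses (0 to 100) to the two copies of one statement.
first_copy = [10, 5, 20, 10, 30, 15]
second_copy = [10, 10, 25, 5, 20, 20]
u_stat, p_value = mann_whitney(first_copy, second_copy)
print(u_stat, p_value, "no significant difference" if p_value >= 0.05 else "significant difference")
```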
From Table 3 it is evident that a wide range of responses existed for each subjective probability statement. The
range was ≥ 0.90 in all 16 cases, ≥ 0.95 in 11 cases and equal to 1.00 in five cases! These results show that there is a
distinct possibility that at least some individuals provided responses to statements that were statistical outliers.
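The counts quoted above can be checked directly against the Range column of Table 3. The short sketch below transcribes that column and tallies the statements meeting each threshold; the variable and function names are hypothetical.

```python
# Range column of Table 3 (raw responses), in the table's row order.
RAW_RANGES = {
    "Almost certainly": 0.95, "Highly likely": 0.90, "Very good chance": 1.00,
    "Probable": 0.95, "Likely": 0.95, "Probably": 0.95, "We believe": 0.95,
    "Better than even": 1.00, "We doubt": 0.90, "Improbable": 1.00,
    "Unlikely": 1.00, "Probably not": 0.90, "Little chance": 0.90,
    "Almost no chance": 0.95, "Highly unlikely": 1.00, "Chances are slight": 0.90,
}


def count_at_least(ranges, threshold):
    """Count statements whose range (maximum minus minimum) meets the threshold."""
    return sum(1 for value in ranges.values() if value >= threshold)


for threshold in (0.90, 0.95, 1.00):
    print(threshold, count_at_least(RAW_RANGES, threshold))
```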
Of the 141 total survey respondents, 18 were systems engineers at “Company X” (which requested that the
company name be withheld for competitive reasons). These survey responses were performed individually but
nevertheless appeared both reasonable and had a relatively small standard deviation for each of the 16 subjective
probability statements. An approach was developed for filtering the raw responses from the 141 participants based
upon the 18 responses from “Company X.” First, the deviation of each respondent’s assessed probability level for each
subjective probability statement was calculated by:
(Probability Estimate – “Company X” Sample Mean) / “Company X” Sample Standard Deviation (1)
Equation (1) represents the number of (“Company X”) standard deviations that a given assessed probability
value is relative to the “Company X” mean for a given subjective probability statement. The maximum and
minimum values for the 16 (statements) x 18 (“Company X” responses) = 288 scores were +2.8 standard deviations
from the mean and – 3.2 standard deviations from the mean, respectively. (There were only seven scores out of 288
with a value > 2.00 standard deviations from the mean and three scores out of 288 scores with a value < -2.00
standard deviations from the mean.)
Equation (1) was used to filter the 16 (statements) x 141 (responses) = 2,256 total responses. If a respondent’s
estimated probability was more than 5.0 “Company X” sample standard deviations from the “Company X” mean
(i.e., the absolute value of Equation (1) exceeded 5.0), that value was eliminated (and counted as an outlier). All 16 values were evaluated
independently such that only identified outliers meeting the above criteria were eliminated, while the values that did
not meet the outlier criteria from a given respondent were retained for further statistical analysis.
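The filtering step can be illustrated for a single statement as follows. This is a minimal sketch of Equation (1) and the 5.0 standard deviation cutoff, assuming the sample (n - 1) form of the standard deviation. The control-group and response values are hypothetical and are not survey data, and the function names are illustrative only.

```python
from statistics import mean, stdev


def filter_statement(responses, control_values, threshold=5.0):
    """Split one statement's responses into (retained, outliers) using Equation (1)."""
    center = mean(control_values)   # control-group ("Company X") sample mean
    spread = stdev(control_values)  # control-group sample standard deviation (n - 1)
    retained, outliers = [], []
    for value in responses:
        deviation = (value - center) / spread  # Equation (1)
        if abs(deviation) > threshold:
            outliers.append(value)
        else:
            retained.append(value)
    return retained, outliers


# Hypothetical control-group and full-sample responses for one statement.
control = [0.60, 0.70, 0.70, 0.80]
responses = [0.05, 0.20, 0.65, 0.90, 1.00]
kept, dropped = filter_statement(responses, control)
print("retained:", kept)
print("outliers:", dropped)
```

Because each statement is filtered separately, a respondent can lose one value while the rest of that respondent's values are retained.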
                                                     Percentiles
Statement           Average  Std Dev  Min   10    25    50    75    90    Max   Range  Number
Almost certainly    0.87     0.19     0.05  0.70  0.90  0.95  0.97  0.99  1.00  0.95   141
Highly likely       0.82     0.17     0.10  0.70  0.77  0.90  0.90  0.95  1.00  0.90   141
Very good chance    0.80     0.16     0.00  0.70  0.75  0.80  0.90  0.95  1.00  1.00   141
Probable            0.68     0.16     0.05  0.50  0.60  0.70  0.75  0.85  1.00  0.95   141
Likely              0.70     0.16     0.05  0.50  0.60  0.75  0.80  0.90  1.00  0.95   141
Probably            0.69     0.18     0.05  0.50  0.60  0.75  0.80  0.90  1.00  0.95   141
We believe          0.68     0.21     0.05  0.40  0.60  0.70  0.80  0.90  1.00  0.95   141
Better than even    0.56     0.13     0.00  0.50  0.51  0.55  0.60  0.66  1.00  1.00   141
We doubt            0.26     0.18     0.00  0.05  0.10  0.25  0.40  0.50  0.90  0.90   141
Improbable          0.20     0.24     0.00  0.00  0.02  0.10  0.30  0.54  1.00  1.00   177
Unlikely            0.23     0.15     0.00  0.07  0.10  0.20  0.30  0.38  1.00  1.00   105
Probably not        0.23     0.19     0.00  0.05  0.10  0.20  0.30  0.49  0.90  0.90   141
Little chance       0.15     0.15     0.00  0.04  0.05  0.10  0.20  0.30  0.90  0.90   141
Almost no chance    0.09     0.17     0.00  0.01  0.01  0.05  0.10  0.15  0.95  0.95   141
Highly unlikely     0.16     0.23     0.00  0.01  0.05  0.05  0.10  0.50  1.00  1.00   141
Chances are slight  0.18     0.17     0.00  0.05  0.10  0.15  0.25  0.30  0.90  0.90   141
The test of the subjective probability phrase “improbable” was re-evaluated using the data filtered for outliers (as
mentioned above). There were 30 filtered responses in each of the two “improbable” cases, hence six values were
removed as outliers in each of the two cases. As described for raw data values, a Mann-Whitney W test was used to
compare the medians of the two samples of 30 filtered responses for “improbable.” The resulting test score was a
probability value of 0.11. Since the probability value is greater than or equal to 0.05, there was not a statistically
significant difference between the medians of the two samples at the 95.0% confidence level. This indicates that the
30 respondents did not inconsistently score the two different “improbable” subjective probability statements (based
upon comparing the sample medians). (Extrapolations beyond this one statement should not be attempted.)
The data for all 16 survey subjective probability statement responses was non-normal based upon formal
distribution fitting using Anderson-Darling and Kolmogorov-Smirnov test statistics at the 0.05 probability level. In
addition, the skewness and kurtosis of the resulting data also indicated that the responses for the 16 subjective
probability statements were non-normal. Thus it is not possible to equate five standard deviations to a probability
value using a simple normal distribution.
Twenty or more distribution types were evaluated against the 105 to 177 “raw” responses for each of the 16
subjective probability statements, as represented by results given in Table 3. There were only two subjective
probability statements that yielded one or more distribution fits that could not be rejected at the 0.05 probability
level for the 20 or more distribution types tested. These two cases were “we believe” and “likely.” Each statement
was not rejected by the Anderson-Darling and Kolmogorov-Smirnov tests at the 0.05 probability level for the
Johnson SU distribution [11], [12] but was rejected for all other distributions tested. There is insufficient evidence
to accept the Johnson SU distribution as an adequate representation of the data in these two cases even though the
distribution type could not be rejected at the 0.05 level. Data from the remaining 14 subjective probability
statements were rejected for all of the 20 or more distribution types evaluated with both the Anderson-Darling and
Kolmogorov-Smirnov tests at the 0.05 level. However, based upon examining the data and results it is believed that selecting
5.0 standard deviations as the outlier criterion is both reasonable and not overly conservative (which tends to preserve
variability in the responses).
The resulting descriptive statistics for the filtered data are given in Table 4. While there were modest to moderate
differences in the average values between the raw and filtered data cases on a percentage basis, as evident from
comparing Tables 3 and 4, there was virtually no difference in the medians (50th percentile) of the two cases. Larger
differences were noted in the standard deviation, minimum, maximum, and range values of the two cases than for
the average or median. (This is because the outliers tend to manifest near the bottom or top percentile values for a
given statement, and hence their influence on the average and median values is typically less than the standard
deviation, minimum, maximum, and range values.) For example, with the filtered data there is only one subjective
probability statement that had a range ≥ 0.90 (“better than even”) versus all 16 statements that had a range ≥ 0.90
for the unfiltered data! As shown in Table 5, the difference in the results of the unfiltered vs. filtered cases shows
that some individuals (one to 18) provided responses to statements that were outliers (as defined above),
corresponding to 0.7 to 13.7 percent of the sample that completed the survey. Three of the 16 subjective probability
statements had more than 10% of the respondents categorized as outliers, while nine of the 16 cases had more than
5% of the respondents categorized as outliers (given the criteria used here). These results clearly show that
substantial deviations from nominal values occur on more than rare occasions.
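The entries of Table 5 can be recomputed from the Range and Number columns of Tables 3 and 4. The sketch below does this for three statements. The percent column of Table 5 appears to be reproduced when the number of removed responses is divided by the filtered count; dividing by the raw count gives slightly smaller figures. That reading is an inference from the tabulated numbers, not a stated definition, and the function and key names are hypothetical.

```python
def table5_row(raw_range, filtered_range, raw_count, filtered_count):
    """Recompute one Table 5 row from the matching Table 3 and Table 4 entries."""
    removed = raw_count - filtered_count
    return {
        "range_difference": round(raw_range - filtered_range, 2),
        "removed": removed,
        # Appears to match the tabulated percent column (assumption, see lead-in).
        "percent_of_filtered": round(100 * removed / filtered_count, 1),
        # Alternative denominator, shown for comparison only.
        "percent_of_raw": round(100 * removed / raw_count, 1),
    }


rows = {
    "Almost certainly": table5_row(0.95, 0.25, 141, 125),
    "Probable": table5_row(0.95, 0.80, 141, 140),
    "Highly unlikely": table5_row(1.00, 0.30, 141, 124),
}
for statement, row in rows.items():
    print(statement, row)
```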
Table 4. Statistical Analysis of Filtered Subjective Probability Statement Survey Responses
                                                     Percentiles
Statement           Average  Std Dev  Min   10    25    50    75    90    Max   Range  Number
Almost certainly    0.93     0.06     0.75  0.85  0.90  0.95  0.98  0.99  1.00  0.25   125
Highly likely       0.86     0.08     0.60  0.75  0.80  0.90  0.90  0.95  1.00  0.40   131
Very good chance    0.83     0.10     0.48  0.70  0.80  0.83  0.90  0.95  1.00  0.52   134
Probable            0.68     0.15     0.20  0.50  0.60  0.70  0.76  0.85  1.00  0.80   140
Likely              0.71     0.14     0.30  0.51  0.60  0.75  0.80  0.90  1.00  0.70   138
Probably            0.71     0.15     0.25  0.50  0.60  0.75  0.80  0.90  1.00  0.75   136
We believe          0.71     0.17     0.30  0.50  0.60  0.70  0.80  0.94  1.00  0.70   133
Better than even    0.56     0.12     0.10  0.50  0.51  0.55  0.60  0.66  1.00  0.90   139
We doubt            0.25     0.16     0.00  0.05  0.10  0.25  0.39  0.46  0.70  0.70   138
Improbable          0.13     0.14     0.00  0.00  0.01  0.10  0.20  0.40  0.50  0.50   159
Unlikely            0.21     0.11     0.00  0.06  0.10  0.20  0.25  0.35  0.50  0.50   102
Probably not        0.20     0.14     0.00  0.05  0.10  0.20  0.25  0.40  0.60  0.60   135
Little chance       0.12     0.08     0.00  0.03  0.05  0.10  0.15  0.24  0.40  0.40   133
Almost no chance    0.05     0.04     0.00  0.01  0.01  0.05  0.05  0.10  0.20  0.20   130
Highly unlikely     0.08     0.07     0.00  0.01  0.05  0.05  0.10  0.20  0.30  0.30   124
Chances are slight  0.16     0.10     0.00  0.05  0.10  0.15  0.20  0.30  0.51  0.51   134
Table 5. Statistical Analysis of Filtered Subjective Probability Statement Survey Responses
Table 6 provides the number of individuals that had responses categorized as outliers on a per subjective
probability statement basis. Fifty-five individuals, corresponding to 39.0 percent of the total respondents, had one or
more assigned estimated probabilities that were categorized as outlier responses (as defined above), while 30
individuals (21.3%) had two or more outlier responses. Similarly, 16.3% had three or more outliers, and 13.5% had
four or more outliers (which corresponds to 25+% of the total subjective probability statements). This is a fairly
large proportion and one that should give pause to those considering using the 16 subjective probability statements
evaluated here to either represent particular concerns or as part of an estimative probability table or scale for use in a
risk analysis. The reason for this concern is that a subjective probability statement (e.g., probable) and the
associated probability value for a statement (e.g., 0.68) may be interpreted as mis-matched, hence incorrect in the
mind of the analyst, and lead to mis-scoring the resulting probability level (via cognitive dissonance). For example,
while the filtered mean respondent score for “probable” was 0.68 (Table 4), the range for this subjective probability
statement was 0.80 (minimum = 0.20 and maximum = 1.00). If “probable” coupled with a value of 0.68 was used to
represent an ordinal scale level (e.g., 3), and the analyst believes, for example, that the correct value for “probable”
is 0.90, then the analyst might score the item in question as ordinal level 4 instead (where level 4 represents a higher
probability level than level 3). While mis-scoring of items in a risk analysis was not directly evaluated in this study,
the results presented illustrate the potential for mis-scoring when a probability scale based upon subjective
probability statements is used.
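The cumulative figures quoted above (one or more, two or more, three or more, and four or more outlier responses) follow from the per-count entries of Table 6 and the 141 total respondents. A minimal sketch of that arithmetic, with hypothetical names:

```python
TOTAL_RESPONDENTS = 141

# Table 6: number of statements flagged -> respondents with exactly that many outliers.
OUTLIER_COUNTS = {1: 25, 2: 7, 3: 4, 4: 5, 5: 5, 6: 5, 7: 3, 8: 1}


def at_least(k):
    """Return (count, percent of all respondents) with k or more outlier responses."""
    count = sum(n for statements, n in OUTLIER_COUNTS.items() if statements >= k)
    return count, round(100 * count / TOTAL_RESPONDENTS, 1)


for k in (1, 2, 3, 4):
    print(f"{k} or more outlier responses:", at_least(k))
```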
Table 6. Statistical Analysis of Subjective Probability Statement Survey Responses
As is evident from Table 5, filtering respondent data based upon results from a control group can remove potential
outlier values. This will typically reduce the estimated range and standard deviation for each subjective probability
statement. However, as previously mentioned, estimative probability tables and scales should be viewed as more of
a “last resort” than a first choice, and should not be used in a risk analysis unless they are the only available means
to evaluate probability [1].
Statement           Range Difference  Unfiltered Number  Filtered Number  Respondent Difference  Percent Respondent Difference
Almost certainly    0.70              141                125              16                     12.8
Highly likely       0.50              141                131              10                     7.6
Very good chance    0.48              141                134              7                      5.2
Probable            0.15              141                140              1                      0.7
Likely              0.25              141                138              3                      2.2
Probably            0.20              141                136              5                      3.7
We believe          0.25              141                133              8                      6.0
Better than even    0.10              141                139              2                      1.4
We doubt            0.20              141                138              3                      2.2
Improbable          0.50              177                159              18                     11.3
Unlikely            0.50              105                102              3                      2.9
Probably not        0.30              141                135              6                      4.4
Little chance       0.50              141                133              8                      6.0
Almost no chance    0.75              141                130              11                     8.5
Highly unlikely     0.70              141                124              17                     13.7
Chances are slight  0.39              141                134              7                      5.2
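The internal arithmetic of the Table 5 rows can be checked with a short sketch. The counts and percentages below are transcribed from the table as printed. The rule that the percent column equals the respondent difference divided by the filtered number is an assumption inferred from the printed values, not a definition stated in the text. The function and variable names are hypothetical.

```python
# Rows transcribed from Table 5 as printed:
# (statement, unfiltered number, filtered number, respondent difference, percent)
TABLE_5 = [
    ("Almost certainly", 141, 125, 16, 12.8),
    ("Highly likely", 141, 131, 10, 7.6),
    ("Very good chance", 141, 134, 7, 5.2),
    ("Probable", 141, 140, 1, 0.7),
    ("Likely", 141, 138, 3, 2.2),
    ("Probably", 141, 136, 5, 3.7),
    ("We believe", 141, 133, 8, 6.0),
    ("Better than even", 141, 139, 2, 1.4),
    ("We doubt", 141, 138, 3, 2.2),
    ("Improbable", 177, 159, 18, 11.3),
    ("Unlikely", 105, 102, 3, 2.9),
    ("Probably not", 141, 135, 6, 4.4),
    ("Little chance", 141, 133, 8, 6.0),
    ("Almost no chance", 141, 130, 11, 8.5),
    ("Highly unlikely", 141, 124, 17, 13.7),
    ("Chances are slight", 141, 134, 7, 5.2),
]


def row_is_consistent(unfiltered, filtered, difference, percent):
    """Check a row against the assumed rules for the two derived columns."""
    count_ok = unfiltered - filtered == difference
    # Assumed rule: percent = 100 * difference / filtered, rounded to 0.1.
    percent_ok = abs(round(100.0 * difference / filtered, 1) - percent) < 1e-9
    return count_ok and percent_ok


inconsistent = [row[0] for row in TABLE_5 if not row_is_consistent(*row[1:])]
# Rows whose printed unfiltered number differs from the 141 survey respondents.
not_141 = [row[0] for row in TABLE_5 if row[1] != 141]
print(inconsistent, not_141)
```

The sketch uses the printed counts as-is, including the two rows whose unfiltered number is not 141, and makes no attempt to correct them.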
Number of Statements  Number of Outlier Respondents  Percent Respondents Outliers
1                     25                             17.7
2                     7                              5.0
3                     4                              2.8
4                     5                              3.5
5                     5                              3.5
6                     5                              3.5
7                     3                              2.1
8                     1                              0.7
Total                 55                             39.0
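The cumulative figures quoted in the discussion of Table 6 (one or more, two or more, three or more, and four or more outlier responses) follow directly from the table rows. A minimal sketch of that arithmetic is given below. The counts and the total of 141 respondents come from the paper, while the function and variable names are hypothetical.

```python
TOTAL_RESPONDENTS = 141

# Table 6: number of statements with an outlier response -> number of respondents
OUTLIER_COUNTS = {1: 25, 2: 7, 3: 4, 4: 5, 5: 5, 6: 5, 7: 3, 8: 1}


def at_least(k):
    """Count and percent of respondents with outliers on k or more statements."""
    count = sum(n for statements, n in OUTLIER_COUNTS.items() if statements >= k)
    return count, round(100.0 * count / TOTAL_RESPONDENTS, 1)


for k in range(1, 5):
    count, percent = at_least(k)
    print(f"{k} or more outliers: {count} respondents ({percent}%)")
```

Four statements out of 16 is 25 percent of the statements, which is the threshold referred to in the text for the "four or more" group.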
There are at least five specific reasons why this is the case [1]:
• First, the definitions for probability statements are interpreted differently by different analysts.
• Second, results obtained from estimative probability data will typically have a high degree of uncertainty, or
at least variability given that the range (maximum – minimum) is typically substantial for many statements.
From Table 4, filtered values for 11 of the 16 subjective probability statements had a range ≥ 0.50!
• Third, candidate risks often evaluated with estimative probability data (e.g., an ordinal scale or a probability
table) may actually be related to maturity (e.g., potential development status of an item) or some other
criterion different from probability. This forces the analyst to choose a probability level that may not be known
or may not even apply.
• Fourth, probability data of this type almost never represent probabilities associated with actual measured
values (e.g., real world data or survey results), but typically only subjective estimates made by the author of
the estimative probability ordinal scale or probability table and later the analyst attempting to use it.
• Fifth, in cases where the probability representation of a risk may actually be valid, the analyst often has little
or no knowledge of how to score the given risk.
Given the above limitations, maturity-based, resource-based, and other classes of ordinal scales should typically
be used in risk analyses instead of estimative probability scales and tables whenever possible [13].
Conclusions
Subjective probability values have been used for decades in intelligence and risk analyses. The example
obtained and used in this paper, published in 1977, includes results associated with 16 subjective probability
statements. The 1977 authors included a figure that was a combination of results from a 1964 study overlaid on
another study whose pedigree is unclear (beyond the fact that 16 subjective probability statements were evaluated by
23 respondents and that the work was performed in 1977 or earlier). A variety of errors were identified in the
original 1977 figure, which included the results of the two different studies. These errors were subsequently
propagated in part by at least six different authors, several of whom also introduced other errors into the
re-published figure. While the errors in each figure, both the original and each re-published one, were identified,
one error associated with a non-linear grid between probability values of zero and 0.20 could not be corrected. This
provides a “lesson learned” for future subjective probability analysis researchers: unless the original numerical data
are available (or a means to precisely generate them), subsequent analyses may yield flawed results because of
underlying errors present in graphical representations of the data.
Because of these issues, a survey was developed and conducted using a randomized ordering of the same 16
subjective probability statements as published in the 1977 study. Results obtained from the 141 respondents
included a large range (≥ 0.90) for every one of the 16 subjective probability statements—clearly indicating that
outliers existed in each sample. An approach was developed to filter potential outliers based upon a sub-sample of
the data (corresponding to responses from 18 systems engineers from a single organization), and applied to the data
sample for each statement. The resulting filtered data revealed between one and 18 outliers (of 141 respondents) for
each of the 16 subjective probability statements, along with a substantial reduction in the range for each statement.
The resulting range per statement was ≥ 0.90 for only one of the 16 statements, but ≥ 0.50 in 11 of the 16
statements. This represents a substantial reduction in the range per statement with only a relatively small number of
data points eliminated.
While the improvement resulting from the filtering process is noteworthy in terms of identifying and eliminating
potential outliers, the results still strongly suggest that probability scales based upon subjective probability
statements should not be used in risk analyses because they leave open the chance of mis-scoring. Precisely worded
probability-related scales based upon maturity, resources, etc. should be used instead of estimative probability scales
whenever possible.
Appendix
The Kerzner [3] risk management section has been updated by Edmund Conrow since the Seventh Edition
(2000). Figure 17-4, pg. 764 [3] was removed in 2009 (Tenth Edition) upon discovery that it contained errors in
labeling one subjective probability statement (“almost likely” instead of “almost certainly”) and in graphically
representing the data, and was replaced by a figure derived from survey data given in Appendix J of Conrow [14]. This
substitution of information occurred in the Kerzner Tenth Edition, Second Printing, 2009. While Conrow had
separately seen references [4], [5], and [7] prior to 1999, and obtained the Kent reference [2] in April
1999, he did not closely examine the figures in these references together with Figure 17-4 in the Tenth Edition of
Kerzner (and corresponding figures in earlier editions), until 2009. Nevertheless, Figure 17-4 was removed at the
first opportunity after the problems with this figure were identified and verified.
Pariseau and Oswalt [4] cited an earlier (1973) document apparently from the same company as Barclay et al.
[9]. Efforts to locate this earlier document through multiple Department of Defense sources, and the founder and
former President of Decisions and Designs, Inc. were futile. Dr. Igar Oswalt was extremely helpful, and supplied a
copy of Barclay et al. [9] to Conrow in 2009 (which led to Conrow updating the Kerzner material discussed above).
The graphical portrayal included by Pariseau and Oswalt in their paper (pg. 157) includes Kent bars for “very good
chance,” “little chance,” and “almost no chance,” whereas these subjective probability phrases were not included in
Kent [2]. However, the Kent bars are not explained in the Pariseau and Oswalt paper.
Boehm [5] cited reference [7] as the source of this information (pg. 494), but reference [7] in turn cited Barclay
et al. [9] as the source. Boehm [5] wisely and correctly said that “…although there is reasonable consensus on the overall
location of these adjectives (statements) on the (assigned) probability scale, the location is not precise and is likely
to vary significantly from one person to another” (pg. 132).
Heuer [6] cited Barclay et al. [9] as the source of this information. Heuer did not discuss the Kent ranges, only
saying that the “shaded areas in the table show the ranges proposed by Kent” (pg. 154). Heuer does correctly state
that “probability ranges attributed to Kent in this table (pg. 155) are slightly different from those in Sherman Kent,
‘Words of Estimated Probability…’” (reference [2]). Heuer’s graphical portrayal (pg. 155) includes an additional
subjective probability statement, “about even,” that contains no data points, but does show a Kent range from 40 to
60 percent probability. This subjective probability statement is not included in Barclay et al. [9], which is used by
Heuer (pp. 153-154), hence was not part of the survey evaluated by the 23 respondents. However, the Kent range is
included in Kent’s Table 1 (from 40 to 60 percent probability) as “chances about even” [2]. On the surface this may
appear acceptable, but even slight changes in wording for subjective probability statements (in this case “about
even” vs. “chances about even”) may lead to differences in the resulting assigned probability level. Including this
statement also intermixes information from two different studies (references [2] and [9]) without providing an
explanation to the reader. Heuer also includes a Kent range for “highly unlikely,” as did Barclay et al. [9]. This is
erroneous, as Kent [2] did not include a range for this subjective probability statement.
The Defense Systems Management College Risk Assessment Techniques [7] cited Barclay et al. [9]. However,
it does not explicitly discuss the Kent ranges, only saying that the “dark bars represent a researcher’s
recommendations for standardization” (pg. D-3).
Hillson [8] used Boehm [5] as the source, and reported the mode and range results for a limited number
of subjective probability statements. Of the eight Hillson subjective probability statements that are common with
Boehm [5] (and Barclay et al. [9]), two of the mode estimates reported by Hillson were incorrect. When Hillson’s
mode estimates were compared to the Barclay et al. values [9], five of the eight mode estimates were
incorrect. (In any event, it is inadvisable to use the mode as a primary statistical measure, since slight measurement
errors can lead to non-trivial differences in results when the data distribution is not strictly unimodal and the
sample size is small (23 respondents), as is the case for many of the distributions associated with the 16 subjective
probability statements.)
Upon evaluating the Hillson range values derived from Boehm [5], it is apparent that Hillson simply used the
Boehm representation of the Kent ranges without recognizing that the Kent ranges were unrelated to the 23
respondents’ data, that the Kent ranges were not specifically identified in Boehm, and that Boehm made no claim as
to what the rectangular bars represented [5]. Of the eight ranges provided by Hillson that overlapped Barclay et al.
[9], six matched the Boehm illustration, while two did not [5]. (One of the two errors, “good chance” for “very good
chance,” was present in the Boehm illustration [5], but Boehm correctly had no Kent range associated with it,
because there was none given by Barclay et al. [9], let alone Kent [2].) When the eight Hillson cases were compared
to the actual Kent ranges [2], the ranges given by Hillson were correct in only one case (“almost certain” for “almost
certainly”), while incorrect in the other seven cases.
References
[1] Conrow, E., Effective Risk Management: Some Keys to Success, Second Edition, American Institute of
Aeronautics and Astronautics, 2003, pp. 204, 491-493.
[2] Kent, S., “Words of Estimative Probability,” Studies in Intelligence, Center for the Study of Intelligence, Central
Intelligence Agency, Fall 1964, (declassified and released for public distribution and use Sept. 22, 1993), pp. 49–65.
[3] Kerzner, H., Project Management: A Systems Approach to Planning, Scheduling, and Controlling, Tenth
Edition, Wiley, 2009. pp. 763-764.
[4] Pariseau, R., and Oswalt, I., “Using Data Types and Scales for Analysis and Decision Making,” Acquisition
Review Quarterly, Vol. 1, No. 2, Spring 1994, pp. 156-157.
[5] Boehm, B. W., Software Risk Management. Piscataway, New Jersey, USA: IEEE Computer Society Press,
1989, pp. 132-133.
[6] Heuer, R., Psychology of Intelligence Analysis, Center for the Study of Intelligence, Central Intelligence
Agency, 1999, pp. 154-155.
[7] _____, Risk Assessment Techniques, Defense Systems Management College, First Edition, 1983, pp. D-1 to D-3.
[8] Hillson, D., “Describing Probability: the Limitations of Natural Language,” Exhibit 5, pg. 5. This paper was
originally published as part of the Project Management Institute Global Congress 2005, Edinburgh, United
Kingdom.
[9] Barclay, S., et al., “Handbook for Decision Analysis,” Decisions and Designs, Inc., Report Number TR-77-6-30,
Contract Number N00014-76-0074, September 1977, pp. 67-68.
[10] Tversky, A., and Kahneman, D., “Judgment Under Uncertainty: Heuristics and Biases,” Science, Vol. 185, 27
Sept. 1974, pp. 1124–1131.
[11] Johnson, N. L., Kotz, S., and Balakrishnan, N., Continuous Univariate Distributions, Vol. 1, 2nd ed., John
Wiley & Sons, Inc., New York, 1994, pp. 34-38.
[12] Lee, T., and Thomas, D., “Cost Growth Models for NASA’s Programs,” Journal of Probability and Statistical
Science, Vol. 1, No. 2, August 2003, pp. 265-279.
[13] Conrow, E., Effective Risk Management: Some Keys to Success, Second Edition, American Institute of
Aeronautics and Astronautics, 2003, pp. 461-483.
[14] Conrow, E., Effective Risk Management: Some Keys to Success, Second Edition, American Institute of
Aeronautics and Astronautics, 2003, pp. 491-513.