[American Institute of Aeronautics and Astronautics AIAA SPACE 2010 Conference & Exposition -...

Copyright © 2010 by Edmund H. Conrow


Evaluation of Subjective Probability Statements

Dr. Edmund H. Conrow, CMC, CPCM, CRM, PMP*

Management and Technology Associates, Redondo Beach, California 90278

Subjective probability statements have been used for decades in intelligence and risk analyses. A widely distributed example of such statements was obtained and evaluated. This example, published in 1977, contains 16 subjective probability statements and corresponding probability values from 23 respondents, and includes results from two prior studies, one from 1964 and another that cannot be specifically identified. The graphic summarizing the results was evaluated, and several errors and deficiencies were identified, including mis-worded subjective probability statements, incorrect minimum and maximum probability bounds, and a distortion in the graphic grid that may introduce non-trivial error into values with a probability less than 0.20. Subsequent re-publication of this graphic by a number of authors introduced a variety of additional errors (e.g., a response count other than 23), making estimation of descriptive statistics unreliable. Because of these difficulties, the original 16 subjective probability statements were included in a new survey in randomized order. This survey was completed by 141 engineers and project managers, and a statistical analysis was performed on the results for each subjective probability statement. The resulting response range was ≥ 0.90 in all 16 cases, ≥ 0.95 in 11 cases, and equal to 1.00 in five cases! An approach was developed to filter potential outliers based upon 18 (of the 141) responses from systems engineers at one company. Filtering eliminated 1 to 18 responses per subjective probability statement, and the resulting range per statement was ≥ 0.90 for only one of the 16 statements, but still ≥ 0.50 for 11 of the 16 statements. While the improvement from filtering is noteworthy in terms of identifying and eliminating potential outliers, the results still strongly suggest that probability scales or tables based upon subjective probability statements should not be used in risk analyses. The reason for this concern is that a subjective probability statement (e.g., probable) and the associated probability value for that statement (e.g., 0.68) may be interpreted as mis-matched, hence incorrect, in the mind of the analyst, and may lead to mis-scoring the resulting probability level (e.g., via cognitive dissonance).

Introduction

Subjective probability statements have been used in the intelligence, risk management, and other communities for decades. However, both the probability values associated with subjective statements and the manner in which the results are depicted are often incorrect. Probability values and a graphical portrayal generated over 30 years ago are evaluated for one such set of subjective statements to examine their accuracy and determine how useful they may be for risk management purposes.

In this paper I closely examine an estimative probability table (presented in graphical figure form) composed of 16 subjective probability statements that was developed in the 1970s and widely distributed from the 1980s through the present time. Specifically, I examine the data contained in the figure for the 16 statements, evaluate the graphical boundaries that sometimes appear with these statements, and examine results from a statistical analysis of a moderate-sized data collection (141 surveys) associated with the same 16 subjective probability statements. More than 10 years elapsed between the time I first examined a version of this figure and the completion of my analysis. The results of this analysis are somewhat surprising and point to the limitations of using estimative probability tables and scales in risk analyses. Such tables and scales should be viewed as more of a “last resort” than a first choice, and should not be used in a risk analysis unless they are the only available means to evaluate probability [1]. [This differs from the situation faced by the intelligence community (e.g., as discussed in the 1950s and 1960s by Sherman Kent), where subjective statements might imply a probability [2]. In project risk management, subjective statements are sometimes used to estimate the probability of occurrence in a risk analysis.]

* Principal, P. O. Box 1125, Redondo Beach, CA 90278, www.risk-services.com, Associate Fellow and Life Member.

AIAA SPACE 2010 Conference & Exposition 30 August - 2 September 2010, Anaheim, California

AIAA 2010-8739

Copyright © 2010 by Edmund H Conrow. Published by the American Institute of Aeronautics and Astronautics, Inc., with permission.



Graphical Data Representation

Several different graphical representations of the data associated with the 16 subjective probability statements have been located [3], [4], [5], [6], [7], [8]. Most appear to be derived directly or indirectly from a 1977 Decisions and Designs, Inc. report authored by Barclay et al. [9], and none, including the 1977 report, provides the data in tabular (numerical) form. The graphical representation given in Figure 1 is from the 1977 Barclay et al. report [9], which includes a brief discussion of the plot and how the data were generated.

Fig. 1 What Uncertainty Statements Mean to Different Readers [9]

The data and its graphical portrayal given in Figure 1 were examined between 1998 and 2009. Several results are summarized here. First, without supplemental information, it is unclear what the dots represent in Figure 1. In fact, they are results from 23 survey participants, but in numerous published cases from the 1980s to 2009, between three and 25 responses were present rather than 23 (and in none of these publications was the specific number of responses listed as 23). (The underlying numerical data have been lost, despite recent attempts by two co-authors of the original 1970s study to locate the raw data at my request.) Second, without supplemental information, it is unclear what the bars represent in Figure 1. For example, the bars are not the data range of the 23 survey participants. Third, the probability ranges associated with the bars were obtained from a 1960s study authored by Sherman Kent [2], and almost all of the subjective probability statements with bars had one or both bar bounds (minimum or maximum) that were erroneous when compared to the original source used to estimate the bar bounds. Fourth, without supplemental information, a reader cannot determine that the bars (probability ranges from the Sherman Kent study [2]) have no relationship to the dots (survey responses from a likely 1970s analysis) for any of the subjective statements. The bars and dots were derived from two different and completely unrelated studies, performed perhaps 10 or more years apart by two different organizations. [The source of the study performed with 23 survey participants is unknown, but is unlikely to be Decisions and Designs, Inc.; otherwise the citation within Barclay et al. [9] would have been written differently. Thus, Figure 1 was likely created from two independent studies (Kent [2] and an unknown source) that were then overlaid by Barclay et al. [9].] I will now explore the above items in greater detail.

Supplemental information from Barclay et al. [9] reveals the following about the source of the data and results. Twenty-three officers (NATO intelligence analysts), with ranks from Lieutenant General to Squadron Leader, evaluated the 16 subjective probability phrases [9]. Each “dot” in Figure 1 represents one respondent’s probability assessment. Some phrases, in particular “better than even,” were interpreted similarly by most of the respondents [9]. However, for most subjective phrases there was “tremendous” variation in the responses (as evident in Figure 1) [9].

“In 1964, Sherman Kent attempted to mitigate this problem within the intelligence community by proposing a scale range of probabilities for various” subjective phrases [9]. These ranges are indicated by the shaded areas in Figure 1 (each termed here a “Kent bar”). Unfortunately, as discussed below, although the ranges are correctly stated in Kent’s own tables [2], the ranges attributed to those tables in Figure 1 [9], and as used by other researchers, are incorrect in most cases, and the reader is given no notice of the potential errors.

Statistical Analysis of Data and Data Ranges

Figure 1 was available as a scanned image contained within a PDF file of Barclay et al. [9]. On-screen (72 dots per inch) digitizing software coupled with a 600 dot per inch mouse was used to convert the graphical representation into probability values (0 to 100 percent) for the 23 responses for each of the 16 subjective probability statements. Without knowing that there were 23 responses for each statement [9], it would have been very easy to miss a number of the data points because of both the close clustering of values and the dot pattern used to fill the shaded areas associated with the Kent ranges in Figure 1 [9]. The latter issue was dominant given the potential overlap between dots representing responses and the shading pattern for each of the 13 statements that included a Kent range. A cursory (and likely not perfect) examination of the number of responses per subjective probability statement yielded the following results for each corresponding reference: three to nine responses [3], eight to 15 responses [4], 18 to 24 responses [5], 20 to 22 responses [6], 21 to 23 responses [7], and 23 responses (although identifying some responses was extremely difficult) [9].

While descriptive statistics were developed for the 23 responses for each of the 16 subjective probability statements, the results are not reported here due to potential scanning and digitizing errors. To quantify potential errors, measurements were made using the figure contained in the PDF file at 10 percent intervals from zero to 100 percent for the “very good chance” and “almost no chance” subjective probability statements from Figure 1. [These two portions of the figure were selected because they contained no shaded regions (“boxes”) associated with the Kent ranges.]

I was also able to obtain a high quality copy of Figure 1 contained in a hardcopy of Barclay et al. [9] courtesy of Dr. Cameron R. Peterson, one of the document’s co-authors. I scanned the hardcopy of the figure at 600 dots per inch (dpi) in an attempt to obtain a “cleaner” separation between the 23 data points (“dots”) and the shading pattern used in the “Kent bars.” This approach made it easier to identify the 23 data points. Unfortunately, a non-linear spatial distortion was identified both in the PDF version of this figure I originally worked with and in the scanned version I created from Dr. Peterson’s hardcopy.

With the PDF version of this figure, the measured error for the two statements (almost no chance, very good chance) was between 0.3% and 0.5% for probability values denoted by the vertical grid lines between 30 and 100 percent. However, at 20 percent the error was 2.5% to 2.6%, and at 10 percent it increased to 6.1% to 8.0%. With the 600 dpi scan of the hardcopy, the measured error for the same two statements was between 0.5% and 0.8% for the vertical grid lines between 30 and 100 percent. However, at 20 percent the error was 2.7% to 3.1%, and at 10 percent it increased to 6% to 7%.

The above results indicate that the error is likely caused by the spatial alignment of the grid at the 10% and 20% values, rather than by scanning either or both of the hardcopies (the original source into the PDF file, and my digitization of Dr. Peterson’s hardcopy), since both independently scanned versions exhibit similar errors. However, it is not possible to know whether the data placement between 0% and 20% (hence any corresponding data points) is in error as a result of the shifted grid. (The two possibilities are a shifted grid with correct data point placement, and a shifted grid with incorrect data point placement.)
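The grid check described above amounts to calibrating a pixel-to-probability mapping and reading the nominal grid lines back through it. The Python sketch below is a minimal illustration of that idea, not the digitizing software actually used: it assumes a linear horizontal axis anchored at the 0% and 100% grid lines, and every pixel position in it is hypothetical, since the raw measurements were not reported. Because the text does not say whether its percent errors are absolute or relative, the sketch reports both.

```python
"""Check digitized grid lines against a linear pixel-to-percent calibration.

All pixel positions used here are HYPOTHETICAL; they only illustrate the technique.
"""


def make_calibration(px_at_0, px_at_100):
    """Map an x pixel coordinate to percent, assuming a linear axis
    anchored at the 0% and 100% grid lines."""
    scale = 100.0 / (px_at_100 - px_at_0)

    def to_percent(px):
        return (px - px_at_0) * scale

    return to_percent


def grid_errors(to_percent, measured_px):
    """For each nominal grid value (percent, must be nonzero) mapped to its
    measured pixel position, return
    (absolute error in percentage points, error relative to nominal in percent)."""
    errors = {}
    for nominal, px in sorted(measured_px.items()):
        abs_err = to_percent(px) - nominal
        errors[nominal] = (abs_err, 100.0 * abs_err / nominal)
    return errors


if __name__ == "__main__":
    to_percent = make_calibration(px_at_0=100.0, px_at_100=600.0)
    # Hypothetical: the 10% and 20% lines sit a few pixels left of where a
    # linear axis would place them (150 and 200); the others are on target.
    measured = {10: 146.0, 20: 197.0, 30: 250.0, 50: 350.0, 100: 600.0}
    for nominal, (abs_err, rel_err) in grid_errors(to_percent, measured).items():
        print(f"{nominal:3d}%: {abs_err:+.2f} points ({rel_err:+.1f}% of nominal)")
```

With these made-up positions, a 4-pixel shift of the 10% line yields an error of about 0.8 percentage points, or 8% of the nominal value, which shows how a small grid misalignment near the low end of the scale can become a large relative error.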

The nature of the resulting errors is even more complicated when descriptive statistics are calculated from redrawn figures rather than from Figure 1 [9], as illustrated in results presented by Hillson [8]. The same problems arise when ranges are estimated from redrawn figures rather than from Figure 1 [9], compounded by failing to recognize that these ranges were actually developed by Kent [2], as mentioned in Barclay et al. [9]. More than one layer of errors can exist in such cases; again, this is illustrated in results presented by Hillson [8].

Finally, as mentioned above, the Kent ranges are independent of the 23 respondents’ assigned probability values in each case, because the Kent ranges were estimated in a different manner and likely years earlier than the survey responses. Hence errors associated with the Kent ranges are a matter of how the Kent data were interpreted and how the resulting figure(s) were drawn. These errors have nothing to do with the survey data from the 23 respondents.

The Kent ranges contained in Barclay et al. [9] were drawn from a mix of the statements in Table 1, where “give or take” is assumed to mean “plus or minus” [2], and the statements in a second table from Kent [2], given here as Table 2. Table 2 lists, per Kent, synonyms for six of the subjective probability statements. The accuracy of the Kent ranges given in Figure 1 [9] can be determined from the data contained in Tables 1 and 2. Each Kent range has an upper and a lower bound. Of the 16 subjective probability statements, only three (“very good chance,” “little chance,” and “almost no chance”) do not have Kent ranges in Figure 1 [9]. Of the 13 subjective probability statements that do have Kent ranges, one is erroneous: “highly unlikely,” because this statement does not appear in Table 1 or Table 2 from Kent [2].

Value  Modifier                Range        Subjective Phrase
100%   N/A                     100%         Certainty
93%    Give or take about 6%   87% to 99%   Almost certain
75%    Give or take about 12%  63% to 87%   Probable
50%    Give or take about 10%  40% to 60%   Chances about even
30%    Give or take about 10%  20% to 40%   Probably not
7%     Give or take about 5%   2% to 12%    Almost certainly not
0%     N/A                     0%           Impossibility

Table 1. Subjective Probability Assessment Table (Developed by Sherman Kent [2])

Subjective Phrase     Synonyms
Possible              Conceivable; Could; May; Might; Perhaps
Almost certain        Virtually certain; All but certain; Highly probable; Highly likely; Odds (or chances) overwhelming
Probable              Likely; We believe; We estimate
50-50                 Chances about even; Chances a little better (or less) than even; Improbable; Unlikely
Probably not          We believe that…not; We estimate that…not; We doubt, doubtful
Almost certainly not  Virtually impossible; Almost impossible; Some slight chance; Highly doubtful

Table 2. Table of Synonyms for Subjective Phrases (Developed by Sherman Kent [2])



Of the 12 subjective probability statements that have valid Kent ranges, two are statements from Table 1 (“probable” and “probably not”), while the other 10 are synonyms contained in Table 2. Ten of the 12 subjective probability statements had one or both bounds that were erroneous when compared to the information in Tables 1 and 2. (In five cases both bounds were erroneous, and in the other five cases one bound was erroneous.) The absolute magnitude of these bound errors ranged from three to 25 percentage points, and was eight percentage points or larger for seven of the ten subjective probability statements. I leave it as an exercise to the reader to estimate the accuracy of the Kent ranges represented in references [3], [4], [5], [6], [7], and [9].
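Checking a published bar against Kent’s tables is mechanical once Tables 1 and 2 are encoded. The Python sketch below shows one way to do it, under stated assumptions: the “50-50” row of Table 2 is mapped to the “chances about even” entry of Table 1, the Table 2 entry “We doubt, doubtful” is split into two synonyms, phrases are matched exactly (so loosely worded Figure 1 phrases such as “almost certainly” or “better than even” would need a manual mapping), and the published bar bounds in the example are hypothetical.

```python
from typing import Optional, Tuple

# Table 1: phrase -> (central value in percent, "give or take" modifier in percent)
KENT_TABLE_1 = {
    "almost certain": (93, 6),
    "probable": (75, 12),
    "chances about even": (50, 10),
    "probably not": (30, 10),
    "almost certainly not": (7, 5),
}

# Table 2 synonyms, keyed by the Table 1 phrase they belong to.
# ASSUMPTION: the "50-50" row is attached to "chances about even".
# The "possible" row is omitted because Table 1 gives it no numeric range.
KENT_SYNONYMS = {
    "almost certain": ["virtually certain", "all but certain", "highly probable",
                       "highly likely", "odds (or chances) overwhelming"],
    "probable": ["likely", "we believe", "we estimate"],
    "chances about even": ["50-50", "chances a little better (or less) than even",
                           "improbable", "unlikely"],
    "probably not": ["we believe that...not", "we estimate that...not",
                     "we doubt", "doubtful"],
    "almost certainly not": ["virtually impossible", "almost impossible",
                             "some slight chance", "highly doubtful"],
}


def kent_range(phrase: str) -> Optional[Tuple[int, int]]:
    """Kent's (lower, upper) bound in percent, or None if the phrase (matched
    exactly, ignoring case) appears in neither Table 1 nor Table 2."""
    key = phrase.strip().lower()
    for head, synonyms in KENT_SYNONYMS.items():
        if key == head or key in synonyms:
            value, modifier = KENT_TABLE_1[head]
            return value - modifier, value + modifier
    return None


def bound_errors(phrase: str, published_low: int,
                 published_high: int) -> Optional[Tuple[int, int]]:
    """Absolute error, in percentage points, of each published bar bound."""
    kent = kent_range(phrase)
    if kent is None:
        return None
    return abs(published_low - kent[0]), abs(published_high - kent[1])


if __name__ == "__main__":
    print(kent_range("Likely"))             # range via a Table 2 synonym
    print(kent_range("Highly unlikely"))    # not in Kent's tables
    print(bound_errors("likely", 60, 85))   # HYPOTHETICAL published bar
```

A phrase that returns None, such as “highly unlikely,” has no Kent range to check against, which is the situation described above for that statement.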

While errors associated with the “Kent bars” were identified and corrected, it was not possible to accurately determine descriptive statistics for the “dots” because of the nonlinear geometric distortions in the grid spacing present in Figure 1 (previously discussed). (This error exists in the original publication and cannot be removed.) The nonlinear grid spacing error and the inability to locate the numerical values in tabular form, together with the fact that only 23 survey responses existed, led to creating and distributing a survey of the 16 subjective probability statements to obtain a larger and more reliable statistical sample for analysis.

Finally, it should be noted that the representations of Figure 1 in references [3], [4], [5], [6], [7], and [9] contain errors, in some cases not limited to the Kent ranges. (See the discussion in the appendix to this paper for additional information.) As a result, Figure 1 and related illustrations should be considered nothing more than historical examples. These figures may not accurately portray the underlying data and results, and should not be used in performing intelligence assessments, risk analyses, etc.

Survey Developed for 16 Subjective Probability Statements

The 16 subjective probability statements given in Figure 1 were randomized and included in a survey that was conducted both in person and through representatives of different organizations. The randomization was performed to minimize the possibility of adjustment and anchoring biases adversely affecting the respondents’ scoring of the subjective probability statements [10]. The randomization also included a manual check, with re-adjustment where needed, to ensure that no two statements that are adjacent in Figure 1 appeared consecutively in the survey form.
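One way to automate the randomization and adjacency check described above is rejection sampling: shuffle the statements, and reshuffle whenever two statements that are neighbors in Figure 1 land next to each other. The Python sketch below does this. It is a stand-in for the manual check and re-adjustment actually used, it treats a neighboring pair in either order as a violation, and it assumes that the Figure 1 order matches the row order of Table 3.

```python
import random

# ASSUMPTION: Figure 1 order is taken to be the row order of Table 3.
FIGURE_1_ORDER = [
    "Almost certainly", "Highly likely", "Very good chance", "Probable",
    "Likely", "Probably", "We believe", "Better than even", "We doubt",
    "Improbable", "Unlikely", "Probably not", "Little chance",
    "Almost no chance", "Highly unlikely", "Chances are slight",
]


def has_adjacent_pair(order, reference):
    """True if any two neighbors in `order` are also neighbors in `reference`."""
    position = {s: i for i, s in enumerate(reference)}
    return any(abs(position[a] - position[b]) == 1 for a, b in zip(order, order[1:]))


def randomize_survey(reference, rng, max_tries=10_000):
    """Shuffle until no two statements adjacent in `reference` are consecutive."""
    for _ in range(max_tries):
        order = list(reference)
        rng.shuffle(order)
        if not has_adjacent_pair(order, reference):
            return order
    raise RuntimeError("no valid ordering found; increase max_tries")


if __name__ == "__main__":
    survey = randomize_survey(FIGURE_1_ORDER, random.Random(2010))
    for number, statement in enumerate(survey, 1):
        print(number, statement)
```

Rejection sampling is used here only because it is short and unbiased over the valid orderings; for 16 statements a random shuffle satisfies the constraint often enough that only a handful of retries are typically needed.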

Conrow developed the following instructions for completing the survey:

“Sixteen statements that have been used for subjective probability assessments are given in the table below. However, different people will assign a different probability value to each statement. Please assess the probability level of each statement, then write a single value in each cell to the right of the statement. Write the numbers as an integer between 0 and 100 (e.g., a probability of 0.5 = 50). There is no “correct” answer for any statement.”

“Please perform the assessment by yourself; don't do it with another person. Don't attempt to modify the table: enter the values exactly as the table is structured. Your contact information will be held strictly confidential, not provided to anyone else, not used for spam, etc. You will not be contacted unless there is an error in completing the survey.”

Altogether 141 respondents completed the survey. The respondents were from a wide variety of industry, government, professional society, and other organizations. The respondents were primarily located in the United States, but a small number were European aerospace engineers (who participated via a professional society headquartered in the U.S.). The participants were primarily engineers and project managers, and the primary industries were aerospace and information technology (IT).

Survey Results

When the 141 responses were examined, it was clear that responses from a number of individuals appeared to be unreasonable, even when compared against the broad range for each of the 16 subjective probability statements in Figure 1. Statistical results for the 141 raw (unfiltered) responses for the 16 statements are given in Table 3. For 36 of the 141 respondents, the statement “improbable” appeared twice, in random positions, in the questionnaire to determine whether the two sets of responses were statistically significantly different. Because the second “improbable” took the place of “unlikely” on these 36 forms, “improbable” has 141 + 36 = 177 responses and “unlikely” has 141 − 36 = 105 responses.
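Each row of Table 3 (and of Table 4) summarizes one statement’s responses. A minimal Python sketch of such a summary row follows. It assumes that the 0 to 100 integer survey entries are divided by 100, that “Std Dev” is the sample (n − 1) standard deviation, and that percentiles follow Python’s “inclusive” quantile method; the paper does not specify the last two choices, so results may differ slightly from the published tables.

```python
import statistics


def describe(values):
    """One summary row in the style of Tables 3 and 4 for a list of probabilities.

    ASSUMPTIONS: sample (n - 1) standard deviation and the 'inclusive'
    percentile method; at least two values are required."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")  # 99 cut points
    lo, hi = min(values), max(values)
    return {
        "average": statistics.mean(values),
        "std_dev": statistics.stdev(values),
        "min": lo,
        "p10": cuts[9],
        "p25": cuts[24],
        "p50": cuts[49],
        "p75": cuts[74],
        "p90": cuts[89],
        "max": hi,
        "range": hi - lo,
        "number": len(values),
    }


if __name__ == "__main__":
    raw = [90, 95, 100, 5, 80, 85, 99]  # HYPOTHETICAL 0-100 survey entries
    row = describe([r / 100 for r in raw])
    print({name: round(value, 2) for name, value in row.items()})
```

Note how a single low entry (5 in the hypothetical list) drives the minimum and range while barely moving the median, which is the pattern discussed for the raw and filtered tables.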



The duplicate “improbable” statement was used with one group of 36 project managers, primarily (but not exclusively) with IT backgrounds. Hence the test addressed whether the sample respondents would assess the probability level similarly or differently for the same statement in their randomized survey. A Mann-Whitney W test was used to compare the medians of the two samples of 36 responses for “improbable.” (This test is constructed by combining the two samples, sorting the data from smallest to largest, and comparing the average ranks of the two samples in the combined data.) The resulting test statistic had a p-value of 0.24. Since this p-value is greater than 0.05, there was no statistically significant difference between the medians of the two samples at the 95.0% confidence level. Thus there is no evidence that the 36 respondents scored the two “improbable” statements inconsistently (based upon comparing the sample medians). (Extrapolations beyond this one statement should not be attempted.)

Table 3. Statistical Analysis of Raw Subjective Probability Statement Survey Responses

                                                   Percentiles
Statement           Average  Std Dev  Min   10th  25th  50th  75th  90th  Max   Range  Number
Almost certainly    0.87     0.19     0.05  0.70  0.90  0.95  0.97  0.99  1.00  0.95   141
Highly likely       0.82     0.17     0.10  0.70  0.77  0.90  0.90  0.95  1.00  0.90   141
Very good chance    0.80     0.16     0.00  0.70  0.75  0.80  0.90  0.95  1.00  1.00   141
Probable            0.68     0.16     0.05  0.50  0.60  0.70  0.75  0.85  1.00  0.95   141
Likely              0.70     0.16     0.05  0.50  0.60  0.75  0.80  0.90  1.00  0.95   141
Probably            0.69     0.18     0.05  0.50  0.60  0.75  0.80  0.90  1.00  0.95   141
We believe          0.68     0.21     0.05  0.40  0.60  0.70  0.80  0.90  1.00  0.95   141
Better than even    0.56     0.13     0.00  0.50  0.51  0.55  0.60  0.66  1.00  1.00   141
We doubt            0.26     0.18     0.00  0.05  0.10  0.25  0.40  0.50  0.90  0.90   141
Improbable          0.20     0.24     0.00  0.00  0.02  0.10  0.30  0.54  1.00  1.00   177
Unlikely            0.23     0.15     0.00  0.07  0.10  0.20  0.30  0.38  1.00  1.00   105
Probably not        0.23     0.19     0.00  0.05  0.10  0.20  0.30  0.49  0.90  0.90   141
Little chance       0.15     0.15     0.00  0.04  0.05  0.10  0.20  0.30  0.90  0.90   141
Almost no chance    0.09     0.17     0.00  0.01  0.01  0.05  0.10  0.15  0.95  0.95   141
Highly unlikely     0.16     0.23     0.00  0.01  0.05  0.05  0.10  0.50  1.00  1.00   141
Chances are slight  0.18     0.17     0.00  0.05  0.10  0.15  0.25  0.30  0.90  0.90   141

From Table 3 it is evident that a wide range of responses existed for each subjective probability statement. The range was ≥ 0.90 in all 16 cases, ≥ 0.95 in 11 cases, and equal to 1.00 in five cases! These results show that there is a distinct possibility that at least some individuals provided responses to statements that were statistical outliers.

Of the 141 total survey respondents, 18 were systems engineers at “Company X” (which requested that the company name be withheld for competitive reasons). These survey responses were performed individually, but nevertheless appeared reasonable and had a relatively small standard deviation for each of the 16 subjective probability statements. An approach was developed for filtering the raw responses from the 141 participants based upon the 18 responses from “Company X.” First, the deviation of each respondent’s assessed probability level for each subjective probability statement was calculated by:

$$ z = \frac{p - \bar{x}_X}{s_X} \qquad (1) $$

where $p$ is the respondent’s probability estimate for a given statement, and $\bar{x}_X$ and $s_X$ are the “Company X” sample mean and sample standard deviation for that statement.

Equation (1) thus gives the number of “Company X” standard deviations by which a given assessed probability value differs from the “Company X” mean for a given subjective probability statement. The maximum and minimum values for the 16 (statements) × 18 (“Company X” responses) = 288 scores were +2.8 and −3.2 standard deviations from the mean, respectively. (Only seven of the 288 scores were more than +2.00 standard deviations from the mean, and only three were below −2.00 standard deviations.)

Equation (1) was then used to filter the 16 (statements) × 141 (responses) = 2,256 total responses. If the absolute value of Equation (1) for a respondent’s estimated probability exceeded 5.0, that is, the estimate was more than 5.0 “Company X” sample standard deviations from the “Company X” mean, that value was eliminated (and counted as an outlier). Each of a respondent’s 16 values was evaluated independently, so that only values meeting the above criterion were eliminated, while that respondent’s remaining values were retained for further statistical analysis.

The test of the subjective probability phrase “improbable” was re-evaluated using the data filtered for outliers (as described above). There were 30 filtered responses in each of the two “improbable” cases, hence six values were removed as outliers in each case. As for the raw data, a Mann-Whitney W test was used to compare the medians of the two samples of 30 filtered responses for “improbable.” The resulting p-value was 0.11. Since this p-value is greater than 0.05, there was no statistically significant difference between the medians of the two samples at the 95.0% confidence level. Thus there is no evidence that the 30 respondents scored the two “improbable” statements inconsistently (based upon comparing the sample medians). (Extrapolations beyond this one statement should not be attempted.)
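The rank-based comparison described above can be sketched in Python as follows. This is the standard Mann-Whitney (Wilcoxon rank-sum) U statistic, with tied values given average ranks and a two-sided p-value from the normal approximation (with a tie correction and no continuity correction). The statistical package behind the “W test” reported above is not identified, so its details may differ; since the raw survey data are not reproduced here, the checks use toy data only.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney(x, y):
    """Two-sided Mann-Whitney test on two non-empty samples.

    Returns (U for sample x, approximate p-value) using the normal
    approximation with a tie correction and no continuity correction."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    tie_term = sum(t ** 3 - t for t in Counter(combined).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:  # every value identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    return u1, math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    print(mann_whitney([1, 2, 3], [4, 5, 6]))
    print(mann_whitney([1, 2, 3], [1, 2, 3]))
```

For example, two completely separated samples of three values give U = 0 with a p-value just under 0.05, while identical samples give a p-value of 1.0. A p-value at or above 0.05, as in both “improbable” comparisons, means the difference between the samples is not statistically significant at that level.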

The data for all 16 subjective probability statements were non-normal based upon formal distribution fitting using Anderson-Darling and Kolmogorov-Smirnov test statistics at the 0.05 probability level. In addition, the skewness and kurtosis of the data also indicated that the responses for the 16 subjective probability statements were non-normal. Thus it is not possible to equate five standard deviations to a probability value using a simple normal distribution.

Twenty or more distribution types were evaluated against the 105 to 177 “raw” responses for each of the 16 subjective probability statements, as represented by the results given in Table 3. Only two subjective probability statements, “we believe” and “likely,” yielded one or more distribution fits that could not be rejected at the 0.05 probability level among the 20 or more distribution types tested. For each of these two statements, the Johnson SU distribution [11], [12] was not rejected by the Anderson-Darling and Kolmogorov-Smirnov tests at the 0.05 probability level, while all other distributions tested were rejected. Even so, there is insufficient evidence to accept the Johnson SU distribution as an adequate representation of the data in these two cases, since failing to reject a distribution at the 0.05 level does not establish that it fits. For the remaining 14 subjective probability statements, all 20 or more distribution types evaluated were rejected by both the Anderson-Darling and Kolmogorov-Smirnov tests at the 0.05 level. However, based upon examining the data and results, it is believed that selecting 5.0 standard deviations as the outlier criterion is both reasonable and not overly conservative (which tends to preserve variability in the responses).
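The outlier filter built on Equation (1) can be sketched for a single statement as below. Following the text, it uses the “Company X” sample mean and sample standard deviation and a cutoff of |z| > 5.0, applied to each statement independently. The function names are illustrative, and the data in the example are hypothetical.

```python
import statistics


def company_x_scores(estimates, reference):
    """Equation (1) for each estimate: (estimate - reference mean) divided by
    the reference sample standard deviation, where `reference` holds the
    "Company X" responses for the same statement."""
    mean = statistics.mean(reference)
    sd = statistics.stdev(reference)  # sample (n - 1) standard deviation
    return [(p - mean) / sd for p in estimates]


def filter_statement(estimates, reference, limit=5.0):
    """Split one statement's estimates into (kept, outliers); a value is an
    outlier when the absolute value of its Equation (1) score exceeds `limit`."""
    kept, outliers = [], []
    for p, z in zip(estimates, company_x_scores(estimates, reference)):
        if abs(z) > limit:
            outliers.append(p)
        else:
            kept.append(p)
    return kept, outliers


if __name__ == "__main__":
    company_x = [0.90, 0.95, 0.95, 1.00, 0.90]  # HYPOTHETICAL reference group
    responses = [0.05, 0.80, 0.95, 1.00]        # HYPOTHETICAL survey responses
    kept, outliers = filter_statement(responses, company_x)
    print("kept:", kept, "outliers:", outliers)
```

Because each statement is filtered on its own, a respondent flagged on one statement keeps their values for the others, which matches the per-value treatment described for the 2,256 responses.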

The resulting descriptive statistics for the filtered data are given in Table 4. While there were modest to moderate differences in the average values between the raw and filtered data cases on a percentage basis, as evident from comparing Tables 3 and 4, there was virtually no difference in the medians (50th percentile) of the two cases. Larger differences were noted in the standard deviation, minimum, maximum, and range values of the two cases than in the average or median. (This is because the outliers tend to lie at the extremes of the distribution for a given statement, and hence typically influence the average and median less than the standard deviation, minimum, maximum, and range.) For example, with the filtered data only one subjective probability statement (“better than even”) had a range ≥ 0.90, versus all 16 statements for the unfiltered data! As shown in Table 5, the differences between the unfiltered and filtered cases show that between one and 18 responses per statement were outliers (as defined above), corresponding to 0.7 to 13.7 percent (the Percent Respondent Difference column of Table 5). Three of the 16 subjective probability statements had more than 10% of the respondents categorized as outliers, while nine of the 16 had more than 5% of the respondents categorized as outliers (given the criteria used here). These results clearly show that substantial deviations from nominal values occur on more than rare occasions.
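The columns of Table 5 follow directly from Tables 3 and 4, and the Python sketch below recomputes them from the published ranges and response counts. Recomputing the last column shows that it is reproduced when the number of removed responses is divided by the filtered number of responses (e.g., 16 / 125 = 12.8% for “almost certainly”) rather than by the unfiltered number, so the sketch follows that convention.

```python
# (raw range, raw number, filtered range, filtered number), from Tables 3 and 4
TABLES_3_AND_4 = {
    "Almost certainly":   (0.95, 141, 0.25, 125),
    "Highly likely":      (0.90, 141, 0.40, 131),
    "Very good chance":   (1.00, 141, 0.52, 134),
    "Probable":           (0.95, 141, 0.80, 140),
    "Likely":             (0.95, 141, 0.70, 138),
    "Probably":           (0.95, 141, 0.75, 136),
    "We believe":         (0.95, 141, 0.70, 133),
    "Better than even":   (1.00, 141, 0.90, 139),
    "We doubt":           (0.90, 141, 0.70, 138),
    "Improbable":         (1.00, 177, 0.50, 159),
    "Unlikely":           (1.00, 105, 0.50, 102),
    "Probably not":       (0.90, 141, 0.60, 135),
    "Little chance":      (0.90, 141, 0.40, 133),
    "Almost no chance":   (0.95, 141, 0.20, 130),
    "Highly unlikely":    (1.00, 141, 0.30, 124),
    "Chances are slight": (0.90, 141, 0.51, 134),
}


def table_5_row(raw_range, raw_n, filtered_range, filtered_n):
    """Return (range difference, respondent difference, percent difference).

    The percent is taken relative to the FILTERED count, the convention
    that reproduces the published Table 5 values."""
    removed = raw_n - filtered_n
    return round(raw_range - filtered_range, 2), removed, 100.0 * removed / filtered_n


if __name__ == "__main__":
    rows = {name: table_5_row(*vals) for name, vals in TABLES_3_AND_4.items()}
    for name, (d_range, removed, pct) in rows.items():
        print(f"{name:<20}{d_range:5.2f}{removed:5d}{pct:6.1f}")
    print("statements with > 10% removed:", sum(r[2] > 10 for r in rows.values()))
    print("statements with > 5% removed:", sum(r[2] > 5 for r in rows.values()))
```

The two counts printed at the end correspond to the three statements above 10% and the nine statements above 5% cited in the preceding paragraph.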

Table 4. Statistical Analysis of Filtered Subjective Probability Statement Survey Responses

                                                   Percentiles
Statement           Average  Std Dev  Min   10th  25th  50th  75th  90th  Max   Range  Number
Almost certainly    0.93     0.06     0.75  0.85  0.90  0.95  0.98  0.99  1.00  0.25   125
Highly likely       0.86     0.08     0.60  0.75  0.80  0.90  0.90  0.95  1.00  0.40   131
Very good chance    0.83     0.10     0.48  0.70  0.80  0.83  0.90  0.95  1.00  0.52   134
Probable            0.68     0.15     0.20  0.50  0.60  0.70  0.76  0.85  1.00  0.80   140
Likely              0.71     0.14     0.30  0.51  0.60  0.75  0.80  0.90  1.00  0.70   138
Probably            0.71     0.15     0.25  0.50  0.60  0.75  0.80  0.90  1.00  0.75   136
We believe          0.71     0.17     0.30  0.50  0.60  0.70  0.80  0.94  1.00  0.70   133
Better than even    0.56     0.12     0.10  0.50  0.51  0.55  0.60  0.66  1.00  0.90   139
We doubt            0.25     0.16     0.00  0.05  0.10  0.25  0.39  0.46  0.70  0.70   138
Improbable          0.13     0.14     0.00  0.00  0.01  0.10  0.20  0.40  0.50  0.50   159
Unlikely            0.21     0.11     0.00  0.06  0.10  0.20  0.25  0.35  0.50  0.50   102
Probably not        0.20     0.14     0.00  0.05  0.10  0.20  0.25  0.40  0.60  0.60   135
Little chance       0.12     0.08     0.00  0.03  0.05  0.10  0.15  0.24  0.40  0.40   133
Almost no chance    0.05     0.04     0.00  0.01  0.01  0.05  0.05  0.10  0.20  0.20   130
Highly unlikely     0.08     0.07     0.00  0.01  0.05  0.05  0.10  0.20  0.30  0.30   124
Chances are slight  0.16     0.10     0.00  0.05  0.10  0.15  0.20  0.30  0.51  0.51   134



Table 5. Statistical Analysis of Filtered Subjective Probability Statement Survey Responses

Table 6 provides the number of individuals that had responses categorized as outliers on a per subjective probability statement basis. Fifty-five individuals, corresponding to 39.0 percent of the total respondents, had one or more assigned estimated probabilities that were categorized as outlier responses (as defined above), while 30 individuals (21.3%) had two or more outlier responses. Similarly, 16.3% had three or more outliers, and 13.5% had four or more outliers (which corresponds to 25+% of the total subjective probability statements). This is a fairly large proportion, and one that should give pause to those considering using the 16 subjective probability statements evaluated here either to represent particular concerns or as part of an estimative probability table or scale for use in a risk analysis. The reason for this concern is that the analyst may perceive a subjective probability statement (e.g., probable) and the probability value associated with it (e.g., 0.68) as mis-matched, and hence incorrect, which can lead to mis-scoring the resulting probability level (via cognitive dissonance). For example, while the filtered mean respondent score for “probable” was 0.68 (Table 4), the range for this subjective probability statement was 0.80 (minimum = 0.20 and maximum = 1.00). If “probable” coupled with a value of 0.68 was used to represent an ordinal scale level (e.g., 3), and the analyst believes, for example, that the correct value for “probable” is 0.90, then the analyst might score the item in question as ordinal level 4 instead (where level 4 represents a higher probability level than level 3). While mis-scoring of items in a risk analysis was not directly evaluated in this study, the results presented illustrate the potential for mis-scoring when a probability scale based upon subjective probability statements is used.
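The mis-scoring mechanism described above can be made concrete with a small Python sketch. The five-level ordinal scale and its boundaries (`UPPER_BOUNDS`) are hypothetical, chosen only so that 0.68 falls in level 3 as in the example. The values for “probable” (filtered mean 0.68, minimum 0.20, maximum 1.00) and the analyst's belief of 0.90 are taken from the text.

```python
from bisect import bisect_left

# Hypothetical five-level ordinal scale (an assumption, not from the paper):
# level k covers probabilities up to and including UPPER_BOUNDS[k - 1];
# anything above the last bound is level 5.
UPPER_BOUNDS = [0.10, 0.40, 0.70, 0.90]

def ordinal_level(p: float) -> int:
    """Map a probability in [0, 1] to an ordinal level from 1 to 5."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return bisect_left(UPPER_BOUNDS, p) + 1

# Values for "probable" quoted in the text
for label, p in [("filtered mean", 0.68), ("analyst's belief", 0.90),
                 ("minimum", 0.20), ("maximum", 1.00)]:
    print(f"{label:>17}: p = {p:.2f} -> level {ordinal_level(p)}")
```

Under these assumed boundaries the four values map to levels 3, 4, 2 and 5, respectively, so a single statement whose responses span 0.20 to 1.00 could plausibly be scored at four of the five levels.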

Table 6. Statistical Analysis of Subjective Probability Statement Survey Responses

As evident from Table 5, filtering respondent data based upon results from a control group can remove potential outlier values. This will typically reduce the estimated range and standard deviation for each subjective probability statement. However, as previously mentioned, estimative probability tables and scales should be viewed as more of a “last resort” than a first choice, and should not be used in a risk analysis unless they are the only available means to evaluate probability [1].
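One simple way a control-group filter could work is to discard any response lying outside the range spanned by the control group's responses. This rule, the function names and the numbers below are hypothetical Python illustrations, not the filtering approach or data used in this study. The sketch only shows why removing extreme responses tends to shrink the range and standard deviation.

```python
import statistics

def filter_by_control(responses, control):
    """Keep responses within the control group's [min, max].

    This bounding rule is a hypothetical illustration, not the paper's filter.
    """
    lo, hi = min(control), max(control)
    return [r for r in responses if lo <= r <= hi]

def spread(values):
    """Return (range, sample standard deviation) of the values."""
    return max(values) - min(values), statistics.stdev(values)

# Hypothetical responses for one statement (illustrative values only)
control = [0.60, 0.65, 0.70, 0.75, 0.80]
survey = control + [0.05, 0.55, 0.70, 0.85, 1.00]

kept = filter_by_control(survey, control)
print("removed:", len(survey) - len(kept))
print("unfiltered range, std:", spread(survey))
print("filtered range, std:", spread(kept))
```

Because the filtered values are a subset of the original values, the range can never increase under this rule. The standard deviation usually decreases as well, although removing an extreme value does not guarantee that it will.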

Statement            Range        Unfiltered  Filtered  Respondent  Percent Respondent
                     Difference   Number      Number    Difference  Difference
Almost certainly     0.70         141         125       16          12.8
Highly likely        0.50         141         131       10          7.6
Very good chance     0.48         141         134       7           5.2
Probable             0.15         141         140       1           0.7
Likely               0.25         141         138       3           2.2
Probably             0.20         141         136       5           3.7
We believe           0.25         141         133       8           6.0
Better than even     0.10         141         139       2           1.4
We doubt             0.20         141         138       3           2.2
Improbable           0.50         177         159       18          11.3
Unlikely             0.50         105         102       3           2.9
Probably not         0.30         141         135       6           4.4
Little chance        0.50         141         133       8           6.0
Almost no chance     0.75         141         130       11          8.5
Highly unlikely      0.70         141         124       17          13.7
Chances are slight   0.39         141         134       7           5.2
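The derived columns of the table of range and respondent differences by statement can be checked with a short Python sketch. The respondent difference is the unfiltered number minus the filtered number. The percent respondent difference appears to use the filtered number as its denominator. This is an inference: that choice reproduces every tabulated percentage, whereas dividing by 141 would, for example, give 11.3 rather than 12.8 for “almost certainly”. The function names are illustrative.

```python
# Rows transcribed from the table above:
# (statement, range difference, unfiltered number, filtered number, tabulated percent)
ROWS = [
    ("Almost certainly", 0.70, 141, 125, 12.8),
    ("Highly likely", 0.50, 141, 131, 7.6),
    ("Very good chance", 0.48, 141, 134, 5.2),
    ("Probable", 0.15, 141, 140, 0.7),
    ("Likely", 0.25, 141, 138, 2.2),
    ("Probably", 0.20, 141, 136, 3.7),
    ("We believe", 0.25, 141, 133, 6.0),
    ("Better than even", 0.10, 141, 139, 1.4),
    ("We doubt", 0.20, 141, 138, 2.2),
    ("Improbable", 0.50, 177, 159, 11.3),
    ("Unlikely", 0.50, 105, 102, 2.9),
    ("Probably not", 0.30, 141, 135, 4.4),
    ("Little chance", 0.50, 141, 133, 6.0),
    ("Almost no chance", 0.75, 141, 130, 8.5),
    ("Highly unlikely", 0.70, 141, 124, 13.7),
    ("Chances are slight", 0.39, 141, 134, 5.2),
]

def respondent_difference(unfiltered: int, filtered: int) -> int:
    """Number of responses removed by the filter."""
    return unfiltered - filtered

def percent_difference(unfiltered: int, filtered: int) -> float:
    """Removed responses as a percent of the filtered number (inferred denominator)."""
    return 100.0 * respondent_difference(unfiltered, filtered) / filtered

for name, _, n_all, n_kept, tabulated in ROWS:
    computed = round(percent_difference(n_all, n_kept), 1)
    flag = "matches" if computed == tabulated else "differs"
    print(f"{name:<20} removed={respondent_difference(n_all, n_kept):>2} "
          f"percent={computed:4.1f} ({flag})")
```

Every row reports `matches`. The respondent differences run from 1 (“probable”) to 18 (“improbable”), consistent with the one to 18 eliminated responses per statement reported for the filtered data.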

Number of Statements   Number of Outlier   Percent of
with Outlier Responses Respondents         Respondents
1                      25                  17.7
2                      7                   5.0
3                      4                   2.8
4                      5                   3.5
5                      5                   3.5
6                      5                   3.5
7                      3                   2.1
8                      1                   0.7
Total                  55                  39.0
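The cumulative percentages quoted earlier in this section (39.0%, 21.3%, 16.3% and 13.5% of respondents with outliers on one, two, three, or four or more statements) follow directly from the outlier count table. A minimal Python check, with illustrative names, is:

```python
# Outlier count table: number of statements on which a respondent had an
# outlier response -> number of such respondents
OUTLIER_COUNTS = {1: 25, 2: 7, 3: 4, 4: 5, 5: 5, 6: 5, 7: 3, 8: 1}
TOTAL_RESPONDENTS = 141

def share_with_at_least(k: int) -> float:
    """Percent of all respondents with outlier responses on k or more statements."""
    n = sum(count for m, count in OUTLIER_COUNTS.items() if m >= k)
    return 100.0 * n / TOTAL_RESPONDENTS

for k in range(1, 5):
    print(f"{k} or more statements: {share_with_at_least(k):.1f}%")
```

For k = 1 through 4 this prints 39.0%, 21.3%, 16.3% and 13.5%, matching the text.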



There are at least five specific reasons why this is the case [1]:

• First, the definitions for probability statements are interpreted differently by different analysts.

• Second, results obtained from estimative probability data will typically have a high degree of uncertainty, or at least variability, given that the range (maximum – minimum) is typically substantial for many statements. From Table 4, filtered values for 11 of the 16 subjective probability statements had a range ≥ 0.50!

• Third, candidate risks often evaluated with estimative probability data (e.g., an ordinal scale or a probability table) may actually be related to maturity (e.g., potential development status of an item) or some criterion other than probability. This forces the analyst to choose a probability level that may be unknown or may not even apply.

• Fourth, probability data of this type almost never represent probabilities associated with actual measured values (e.g., real-world data or survey results), but rather subjective estimates made by the author of the estimative probability ordinal scale or probability table, and later by the analyst attempting to use it.

• Fifth, in cases where the probability representation of a risk may actually be valid, the analyst often has little or no knowledge of how to score the given risk.

Given the above limitations, maturity-based, resource-based, and other classes of ordinal scales should typically be used in risk analyses instead of estimative probability scales and tables whenever possible [13].

Conclusions

Subjective probability values have been used for decades in intelligence and risk analyses. The example

obtained and used in this paper, published in 1977, includes results associated with 16 subjective probability

statements. The 1977 authors included a figure that combined results from a 1964 study overlaid on another study whose pedigree is unclear (beyond the fact that 16 subjective probability statements were evaluated by 23 respondents and that the work was performed in 1977 or earlier). A variety of errors were identified in the original 1977 figure, which included the two different study results. These errors were subsequently propagated, in part, by at least six different authors, several of whom also introduced other errors into the re-published figure.

Although the errors in each figure (both the original and each re-published version) were identified, one error, associated with a non-linear grid between probability values of zero and 0.20, could not be corrected. This provides a “lesson learned” for future subjective probability analysis researchers: unless the original numerical data are available (or a means to precisely generate them), subsequent analyses may yield flawed results because of underlying errors present in graphical representations of the data.

Because of these issues, a survey was developed and conducted using a randomized ordering of the same 16

subjective probability statements as published in the 1977 study. Results obtained from the 141 respondents

included a large range (≥ 0.90) for every one of the 16 subjective probability statements—clearly indicating that

outliers existed in each sample. An approach was developed to filter potential outliers based upon a sub-sample of

the data (corresponding to responses from 18 systems engineers from a single organization), and applied to the data

sample for each statement. The resulting filtered data revealed between one and 18 outliers (of 141 respondents) for each of the 16 subjective probability statements. After filtering, the range per statement was ≥ 0.90 for only one of the 16 statements, but ≥ 0.50 in 11 of the 16 statements. This represents a substantial reduction in the range per statement with only a relatively small number of data points eliminated.

While the improvement resulting from the filtering process is noteworthy in terms of identifying and eliminating

potential outliers, the results still strongly suggest that probability scales based upon subjective probability

statements should not be used in risk analyses because they leave open the chance of mis-scoring. Precisely worded

probability-related scales based upon maturity, resources, etc. should be used instead of estimative probability scales

whenever possible.

Appendix

The Kerzner [3] risk management section has been updated by Edmund Conrow since the Seventh Edition (2000). Figure 17-4, pg. 764 [3], was removed in 2009 (Tenth Edition) upon discovery that it contained errors in labeling one subjective probability statement (“almost likely” instead of “almost certainly”) and in graphically representing the data; it was replaced by a figure derived from survey data given in Appendix J of Conrow [14]. This substitution of information occurred in the Kerzner Tenth Edition, Second Printing, 2009. Although Conrow had separately seen references [4], [5], and [7] prior to 1999, and obtained the Kent reference [2] in April 1999, he did not closely examine the figures in these references together with Figure 17-4 in the Tenth Edition of



Kerzner (and corresponding figures in earlier editions), until 2009. Nevertheless, Figure 17-4 was removed at the

first opportunity after the problems with this figure were identified and verified.

Pariseau and Oswalt [4] cited an earlier (1973) document apparently from the same company as Barclay et al. [9]. Efforts to locate this earlier document through multiple Department of Defense sources, and through the founder and former President of Decisions and Designs, Inc., were futile. Dr. Igar Oswalt was extremely helpful and supplied a copy of Barclay et al. [9] to Conrow in 2009 (which led to Conrow updating the Kerzner material discussed above).

The graphical portrayal in the Pariseau and Oswalt paper (pg. 157) includes Kent bars for “very good chance,” “little chance,” and “almost no chance,” whereas these subjective probability phrases were not included in Kent [2]. Moreover, the Kent bars are not explained in the Pariseau and Oswalt paper.

Boehm [5] cited reference [7] as the source of this information (pg. 494), but reference [7] cited Barclay et al. [9] as the source. Boehm [5] wisely and correctly said that “…although there is reasonable consensus on the overall location of these adjectives (statements) on the (assigned) probability scale, the location is not precise and is likely to vary significantly from one person to another” (pg. 132).

Heuer [6] cited Barclay et al. [9] as the source of this information. Heuer did not discuss the Kent ranges, only saying that the “shaded areas in the table show the ranges proposed by Kent” (pg. 154). Heuer does correctly state that the “probability ranges attributed to Kent in this table (pg. 155) are slightly different from those in Sherman Kent, ‘Words of Estimated Probability…’” (reference [2]). Heuer’s graphical portrayal (pg. 155) includes an additional subjective probability statement, “about even,” that contains no data points but does show a Kent range from 40 to 60 percent probability. This subjective probability statement is not included in Barclay et al. [9], which Heuer used (pp. 153-154), and hence was not part of the survey evaluated by the 23 respondents. However, the Kent range is included in Kent’s Table 1 (from 40 to 60 percent probability) as “chances about even” [2]. On the surface this may appear acceptable, but even slight changes in wording for subjective probability statements (in this case “about even” vs. “chances about even”) may lead to differences in the resulting assigned probability level. Including this statement also intermixes information from two different studies (references [2] and [9]) without providing an explanation to the reader. Heuer also includes a Kent range for “highly unlikely,” as did Barclay et al. [9]. This is erroneous, as Kent [2] did not include a range for this subjective probability statement.

The Defense Systems Management College Risk Assessment Techniques [7] cited Barclay et al. [9]. However, it does not explicitly discuss the Kent ranges, only saying that the “dark bars represent a researcher’s recommendations for standardization” (pg. D-3).

Hillson [8] used Boehm [5] as the source and reported the mode and range results for a limited number of subjective probability statements. Of the eight Hillson subjective probability statements that are common with Boehm [5] (and Barclay et al. [9]), two of the mode estimates reported by Hillson were incorrect. When Hillson’s mode estimates were compared to the Barclay et al. values [9], five of the eight mode estimates were incorrect. (In any event, it is inadvisable to use the mode as a primary statistical measure, since slight measurement errors can lead to non-trivial differences in results when the data distribution is not strictly unimodal and the sample size is small (23 respondents), as is the case with many of the distributions associated with the 16 subjective probability statements.)
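The sensitivity of the mode to slight measurement errors in a small sample that is not strictly unimodal can be illustrated in Python. The 23 values below are hypothetical, constructed with two nearly equal peaks; they are not the Barclay et al. [9] data.

```python
from statistics import median, multimode

# Hypothetical set of 23 responses (illustrative values, not survey data)
# with two nearly equal peaks: 0.60 (five responses) and 0.80 (four).
original = ([0.50, 0.55] + [0.60] * 5 + [0.65, 0.70, 0.70, 0.75, 0.75]
            + [0.80] * 4 + [0.85, 0.85, 0.90, 0.90, 0.95, 0.95, 1.00])

# Simulate a slight measurement error: two of the 0.60 readings are
# recorded as 0.59 and 0.61 (an error of only 0.01 each).
perturbed = list(original)
perturbed[2], perturbed[3] = 0.59, 0.61

for name, data in [("original", original), ("perturbed", perturbed)]:
    print(f"{name:>9}: n={len(data)} mode={multimode(data)} "
          f"median={median(data):.2f}")
```

Shifting two readings by only 0.01 moves the mode from 0.60 to 0.80, while the median is unchanged at 0.75.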

Upon evaluating the Hillson range values derived from Boehm [5], it is apparent that Hillson simply used the Boehm representation of the Kent ranges without recognizing that the Kent ranges were unrelated to the 23 respondents’ data, that the Kent ranges were not specifically identified in Boehm, and that Boehm made no claim as to what the rectangular bars represented [5]. Of the eight ranges provided by Hillson that overlapped Barclay et al. [9], six matched the Boehm illustration, while two did not [5]. (One of the two errors, “good chance” for “very good chance,” was present in the Boehm illustration [5], but Boehm correctly had no Kent range associated with it, because none was given by Barclay et al. [9], let alone Kent [2].) When the eight Hillson cases were compared to the actual Kent ranges [2], the ranges given by Hillson were correct in only one case (“almost certain” for “almost certainly”) and incorrect in the other seven cases.

References

[1] Conrow, E., Effective Risk Management: Some Keys to Success, Second Edition, American Institute of

Aeronautics and Astronautics, 2003, pp. 204, 491-493.

[2] Kent, S., “Words of Estimative Probability,” Studies in Intelligence, Center for the Study of Intelligence, Central Intelligence Agency, Fall 1964 (declassified and released for public distribution and use Sept. 22, 1993), pp. 49–65.

[3] Kerzner, H., Project Management: A Systems Approach to Planning, Scheduling, and Controlling, Tenth Edition, Wiley, 2009, pp. 763-764.



[4] Pariseau, R., and Oswalt, I., “Using Data Types and Scales for Analysis and Decision Making,” Acquisition

Review Quarterly, Vol. 1, No. 2, Spring 1994, pp. 156-157.

[5] Boehm, B. W., Software Risk Management, IEEE Computer Society Press, Piscataway, New Jersey, 1989, pp. 132-133.

[6] Heuer, R., Psychology of Intelligence Analysis, Center for the Study of Intelligence, Central Intelligence

Agency, 1999, pp. 154-155.

[7] _____, Risk Assessment Techniques, Defense Systems Management College, First Edition, 1983, pp. D-1 to D-3.

[8] Hillson, D., “Describing Probability: the Limitations of Natural Language,” Exhibit 5, pg. 5. This paper was

originally published as part of the Project Management Institute Global Congress 2005, Edinburgh, United

Kingdom.

[9] Barclay, S., et al., “Handbook for Decision Analysis,” Decisions and Designs, Inc., Report Number TR-77-6-30, Contract Number N00014-76-0074, September 1977, pp. 67-68.

[10] Tversky, A., and Kahneman, D., “Judgment Under Uncertainty: Heuristics and Biases,” Science, Vol. 185, 27

Sept. 1974, pp. 1124–1131.

[11] Johnson, N. L., Kotz, S., and Balakrishnan, N., Continuous Univariate Distributions, Vol. 1, 2nd ed., John Wiley & Sons, Inc., New York, 1994, pp. 34-38.

[12] Lee, T., and Thomas, D., “Cost Growth Models for NASA’s Programs,” Journal of Probability and Statistical Science, Vol. 1, No. 2, August 2003, pp. 265-279.


[13] … Aeronautics and Astronautics, 2003, pp. 461-483.

[14] … Aeronautics and Astronautics, 2003, pp. 491-513.
