
ATTRIBUTE MEASUREMENT ERROR ANALYSIS

Attribute data consist of classifications rather than measurements. Attribute inspection involves determining the classification of an item, e.g., is it "good" or "bad"? The principles of good measurement for attribute inspection are the same as for measurement inspection (Table 10.7). Thus, it is possible to evaluate attribute measurement systems in much the same way as we evaluate variable measurement systems.


Much less work has been done on evaluating attribute measurement systems. The proposals provided in this book are those I've found to be useful for my employers and clients. The ideas are not part of any standard and you are encouraged to think about them critically before adopting them. I also include an example of Minitab's attribute gage R&R analysis.


Table 10.7. Attribute measurement concepts.

Accuracy
  Interpretation for attribute data: Items are correctly categorized.
  Suggested metrics and comments: (number of times correctly classified by all) / (total number of evaluations by all). Requires knowledge of the "true" value.

Bias
  Interpretation for attribute data: The proportion of items in a given category is correct.
  Suggested metrics and comments: overall average proportion in a given category (for all inspectors) minus the correct proportion in that category, averaged over all categories. Requires knowledge of the "true" value.

Repeatability
  Interpretation for attribute data: When an inspector evaluates the same item multiple times in a short time interval, she assigns it to the same category every time.
  Suggested metrics and comments: for a given inspector, (total number of times repeat classifications agree) / (total number of repeat classifications). Overall: the average of the repeatabilities.

Reproducibility
  Interpretation for attribute data: When all inspectors evaluate the same item, they all assign it to the same category.
  Suggested metrics and comments: (total number of times the classifications for all inspectors concur) / (total number of classifications).

Stability
  Interpretation for attribute data: The variability between attribute R&R studies at different times.
  Suggested metrics and comments: the stability measure for each metric is
    Repeatability: standard deviation of repeatabilities
    Reproducibility: standard deviation of reproducibilities
    Accuracy: standard deviation of accuracies
    Bias: average bias

"Linearity"
  Interpretation for attribute data: When an inspector evaluates items covering the full set of categories, her classifications are consistent across the categories.
  Suggested metrics and comments: range of inaccuracy and bias across all categories. Requires knowledge of the "true" value. Note: because there is no natural ordering for nominal data, the concept of linearity doesn't really have a precise analog for attribute data on this scale. However, the suggested metrics will highlight interactions between inspectors and specific categories.

Operational definitions
An operational definition is defined as a requirement that includes a means of measurement. "High quality solder" is a requirement that must be operationalized by a clear definition of what "high quality solder" means. This might include verbal descriptions, magnification power, photographs, physical comparison specimens, and many more criteria.

EXAMPLES OF OPERATIONAL DEFINITIONS
1. Operational definition of the Ozone Transport Assessment Group's (OTAG) goal

Goal: To identify and recommend reductions in transported ozone and its precursors which, in combination with other measures, will enable attainment and maintenance of the ozone standard in the OTAG region.


Suggested operational definition of the goal:
1. A general modeled reduction in ozone and ozone precursors aloft throughout the OTAG region; and
2. A reduction of ozone and ozone precursors both aloft and at ground level at the boundaries of non-attainment area modeling domains in the OTAG region; and
3. A minimization of increases in peak ground level ozone concentrations in the OTAG region. (This component of the operational definition is in review.)

2. Wellesley College Child Care Policy Research Partnership operational definition of unmet need
   1. Standard of comparison to judge the adequacy of neighborhood services: the median availability of services in the larger region (Hampden County).
   2. Thus, our definition of unmet need: the difference between the care available in the neighborhood and the median level of care in the surrounding region (stated in terms of child care slots indexed to the age-appropriate child population, "slots-per-tots").

3. Operational definitions of acids and bases
   1. An acid is any substance that increases the concentration of the H+ ion when it dissolves in water.
   2. A base is any substance that increases the concentration of the OH- ion when it dissolves in water.

4. Operational definition of "intelligence"
   1. Administer the Stanford-Binet IQ test to a person and score the result. The person's intelligence is the score on the test.

5. Operational definition of "dark blue carpet"
   A carpet will be deemed to be dark blue if
   1. Judged by an inspector medically certified as having passed the U.S. Air Force test for color-blindness
      1.1. It matches the PANTONE color card 7462 C when both carpet and card are illuminated by GE "cool white" fluorescent tubes;
      1.2. Card and carpet are viewed at a distance between 16 inches and 24 inches.

HOW TO CONDUCT ATTRIBUTE INSPECTION STUDIES

Some commonly used approaches to attribute inspection analysis are shown in Table 10.8.


Table 10.8. Methods of evaluating attribute inspection.

True value: Known

Expert Judgment: An expert looks at the classifications after the operator makes normal classifications and decides which are correct and which are incorrect.
Comments:
- Metric: percent correct.
- Quantifies the accuracy of the classifications.
- Simple to evaluate.
- Who says the expert is correct?
- Care must be taken to include all types of attributes.
- Difficult to compare operators since different units are classified by different people.
- An acceptable level of performance must be decided upon. Consider cost, impact on customers, etc.

Round Robin Study: A set of carefully identified objects is chosen to represent the full range of attributes.
1. Each item is evaluated by an expert and its condition recorded.
2. Each item is evaluated by every inspector at least twice.
Comments:
- Metrics: percent correct by inspector, inspector repeatability, inspector reproducibility, stability, inspector "linearity".
- Full range of attributes included.
- All aspects of measurement error quantified.
- People know they're being watched, which may affect performance.
- Not routine conditions.
- Special care must be taken to ensure rigor.
- An acceptable level of performance must be decided upon for each type of error. Consider cost, impact on customers, etc.

True value: Unknown

Inspector Concurrence Study: A set of carefully identified objects is chosen to represent the full range of attributes, to the extent possible.
1. Each item is evaluated by every inspector at least twice.
Comments:
- Metrics: inspector repeatability, inspector reproducibility, stability, inspector "linearity".
- Like a round robin, except the true value isn't known.
- No measures of accuracy or bias are possible; only agreement between equally qualified people can be measured.
- Full range of attributes included.
- People know they're being watched, which may affect performance.
- Not routine conditions.
- Special care must be taken to ensure rigor.
- An acceptable level of performance must be decided upon for each type of error. Consider cost, impact on customers, etc.

Example of attribute inspection error analysis
Two sheets with identical lithographed patterns are to be inspected under carefully controlled conditions by each of three inspectors. Each sheet has been carefully examined multiple times by journeymen lithographers, who have determined that one of the sheets should be classified as acceptable, the other as unacceptable. Each inspector sits on a stool at a large table where the sheet is mounted for inspection. The inspector can adjust the height of the stool and the angle of the table. A lighted magnifying glass is mounted to the table with an adjustable arm that lets the inspector move it to any part of the sheet (see Figure 10.15).


Each inspector checks each sheet once in the morning and again in the afternoon. After each inspection, the inspector classifies the sheet as either acceptable or unacceptable. The entire study is repeated the following week. The results are shown in Table 10.9.


Figure 10.15. Lithography inspection station table, stool and magnifying glass.

Table 10.9. Results of lithography attribute inspection study.

     A      B         C      D      E      F          G          H             I
1    Part   Standard  InspA  InspB  InspC  Date       Time       Reproducible  Accurate
2    1      1         1      1      1      Today      Morning    1             1
3    1      1         0      1      1      Today      Afternoon  0             0
4    2      0         0      0      0      Today      Morning    1             0
5    2      0         0      0      1      Today      Afternoon  0             0
6    1      1         1      1      1      Last Week  Morning    1             1
7    1      1         1      1      0      Last Week  Afternoon  0             0
8    2      0         0      0      1      Last Week  Morning    0             0
9    2      0         0      0      0      Last Week  Afternoon  1             0


In Table 10.9 the Part column identifies which sheet is being inspected, and the Standard column is the classification for the sheet based on the journeymen's evaluations. A 1 indicates that the sheet is acceptable, a 0 that it is unacceptable. The columns labeled InspA, InspB, and InspC show the classifications assigned by the three inspectors respectively. The Reproducible column is a 1 if all three inspectors agree on the classification, whether their classification agrees with the standard or not. The Accurate column is a 1 if all three inspectors classify the sheet correctly as shown in the Standard column.
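The Reproducible and Accurate columns are simple to derive mechanically. The sketch below (plain Python; the variable names are illustrative, and the original example was worked in a spreadsheet) rebuilds the eight observations of Table 10.9 and computes both columns.

```python
# Sketch: rebuild Table 10.9 and derive the Reproducible and Accurate columns.
# Each row: (part, standard, InspA, InspB, InspC, day, time); 1 = acceptable, 0 = unacceptable.
rows = [
    (1, 1, 1, 1, 1, "Today",     "Morning"),
    (1, 1, 0, 1, 1, "Today",     "Afternoon"),
    (2, 0, 0, 0, 0, "Today",     "Morning"),
    (2, 0, 0, 0, 1, "Today",     "Afternoon"),
    (1, 1, 1, 1, 1, "Last Week", "Morning"),
    (1, 1, 1, 1, 0, "Last Week", "Afternoon"),
    (2, 0, 0, 0, 1, "Last Week", "Morning"),
    (2, 0, 0, 0, 0, "Last Week", "Afternoon"),
]

for part, std, a, b, c, day, time in rows:
    reproducible = int(a == b == c)            # all three inspectors agree
    accurate = int(a == b == c == std)         # ...and their call matches the standard
    print(part, std, a, b, c, day, time, reproducible, accurate)
```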

INDIVIDUAL INSPECTOR ACCURACY
Individual inspector accuracy is determined by comparing each inspector's classification with the Standard. For example, in cell C2 of Table 10.9 Inspector A classified the unit as acceptable, and the Standard column in the same row indicates that the classification is correct. However, in cell C3 the unit is classified as unacceptable when it actually is acceptable. Continuing this evaluation shows that Inspector A made the correct assessment 7 out of 8 times, for an accuracy of 0.875 or 87.5%. The results for all inspectors are given in Table 10.10.
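Continuing the sketch above (it reuses the `rows` list already defined), each inspector's accuracy in Table 10.10 is simply the share of her eight calls that match the Standard column:

```python
# Accuracy per inspector (Table 10.10): share of calls matching the standard.
inspectors = {"InspA": 2, "InspB": 3, "InspC": 4}      # column index of each inspector in a row
for name, col in inspectors.items():
    correct = sum(1 for r in rows if r[col] == r[1])   # r[1] is the Standard column
    print(name, f"{correct / len(rows):.1%}")          # 87.5%, 100.0%, 62.5%
```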

Repeatability and pairwise reproducibility
Repeatability is defined in Table 10.7 as the same inspector getting the same result when evaluating the same item more than once within a short time interval. Looking at InspA we see that when she evaluated Part 1 in the morning of "Today" she classified it as acceptable (1), but in the afternoon she said it was unacceptable (0). The other three morning/afternoon classifications matched each other. Thus, her repeatability is 3/4 or 75%.

Pairwise reproducibility is the comparison of each inspector with every other inspector when checking the same part at the same time on the same day. For example, on Part 1/Morning/Today, InspA's classification matched that of InspB. However, for Part 1/Afternoon/Today InspA's classification was different than that of InspB. There are eight such comparisons for each pair of inspectors. Looking at InspA versus InspB we see that they agreed 7 of the 8 times, for a pairwise reproducibility of 7/8 = 0.875.

In Table 10.11 the diagonal values are the repeatability scores and the off-diagonal elements are the pairwise reproducibility scores. The results are shown for "Today," "Last Week," and both combined.

Table 10.10. Inspector accuracies.

Inspector    A        B        C
Accuracy     87.5%    100.0%   62.5%
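Both sets of numbers in Table 10.11 can be checked against the `rows` list from the earlier sketch: repeatability pairs each morning call with the same inspector's afternoon call for the same part and day, and pairwise reproducibility compares two inspectors row by row.

```python
from itertools import combinations

cols = {"InspA": 2, "InspB": 3, "InspC": 4}

# Repeatability: morning vs. afternoon call on the same part and day.
# The rows list is ordered so each morning row is immediately followed by
# the matching afternoon row, so they can be paired off two at a time.
pairs = [(rows[i], rows[i + 1]) for i in range(0, len(rows), 2)]
for name, c in cols.items():
    agree = sum(1 for am, pm in pairs if am[c] == pm[c])
    print(name, "repeatability:", agree / len(pairs))        # 0.75, 1.00, 0.25

# Pairwise reproducibility: row-by-row agreement between two inspectors.
for (n1, c1), (n2, c2) in combinations(cols.items(), 2):
    agree = sum(1 for r in rows if r[c1] == r[c2])
    print(n1, "vs", n2, ":", agree / len(rows))
```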

OVERALL REPEATABILITY, REPRODUCIBILITY, ACCURACY AND BIAS

Information is always lost when summary statistics are used, but the data reduction often makes the tradeoff worthwhile. The calculations for the overall statistics are operationally defined as follows:

- Repeatability is the average of the repeatability scores for the two days combined; i.e., (0.75 + 1.00 + 0.25)/3 = 0.67.
- Reproducibility is the average of the reproducibility scores for the two days combined (see Table 10.9); i.e., ((1 + 0 + 1 + 0)/4 + (1 + 0 + 0 + 1)/4)/2 = 0.50.
- Accuracy is the average of the accuracy scores for the two days combined (see Table 10.9); i.e., ((1 + 0 + 0 + 0)/4 + (1 + 0 + 0 + 0)/4)/2 = 0.25.


Table 10.11. Repeatability (diagonal) and pairwise reproducibility (off-diagonal): both days combined, Today, and Last Week.

Overall:
      A      B      C
A     0.75   0.88   0.50
B            1.00   0.63
C                   0.25

Today:
      A      B      C
A     0.50   0.75   0.50
B            1.00   0.75
C                   0.50

Last Week:
      A      B      C
A     1.00   1.00   0.50
B            1.00   0.50
C                   0.00


- Bias is the estimated proportion in a category minus the true proportion in the category. In this example the true percent defective is 50% (1 part in 2). Of the twenty-four evaluations, twelve classified the item as defective. Thus, the bias is 0.5 - 0.5 = 0.
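A short sketch reproduces all four overall figures, reusing the `rows` list from earlier; the per-inspector repeatabilities are read off the diagonal of Table 10.11.

```python
from statistics import mean

print("Repeatability:", round(mean([0.75, 1.00, 0.25]), 2))          # 0.67

reproducible = [int(r[2] == r[3] == r[4]) for r in rows]              # column H of Table 10.9
accurate = [int(r[2] == r[3] == r[4] == r[1]) for r in rows]          # column I of Table 10.9
print("Reproducibility:", mean(reproducible))                         # 0.5
print("Accuracy:", mean(accurate))                                    # 0.25

defective_calls = sum(1 for r in rows for c in (2, 3, 4) if r[c] == 0)
print("Bias:", defective_calls / 24 - 0.5)                            # 0.0 (true defect rate is 0.5)
```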

OVERALL STABILITY
Stability is calculated for each of the above metrics separately, as shown in Table 10.12.
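The stability figures in Table 10.12 (below) mirror the spreadsheet formulas; a sketch, again reusing `rows`, with the six repeatabilities taken from the Today and Last Week diagonals of Table 10.11:

```python
from statistics import mean, stdev

print(round(stdev([0.5, 1.0, 0.5, 1.0, 1.0, 0.0]), 2))                 # 0.41

# Reproducibility and accuracy: standard deviation of the two weekly averages
# (equivalent to STDEV over the weekly AVERAGEs of columns H and I).
today, last_week = rows[:4], rows[4:]
weekly_repro = [mean(int(r[2] == r[3] == r[4]) for r in wk) for wk in (today, last_week)]
weekly_acc = [mean(int(r[2] == r[3] == r[4] == r[1]) for r in wk) for wk in (today, last_week)]
print(stdev(weekly_repro), stdev(weekly_acc))                          # 0.0 0.0
```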

INTERPRETATION OF RESULTS
1. The system overall appears to be unbiased and accurate. However, the evaluation of individual inspectors indicates that there is room for improvement.
2. The results of the individual accuracy analysis indicate that Inspector C has a problem with accuracy; see Table 10.10.
3. The results of the R&R (pairwise) indicate that Inspector C has a problem with both repeatability and reproducibility; see Table 10.11.
4. The repeatability numbers are not very stable (Table 10.12). Comparing the diagonal elements for Today with those of Last Week in Table 10.11, we see that Inspectors A and C tended to get different results for the different weeks. Otherwise the system appears to be relatively stable.
5. Reproducibility of Inspectors A and B is not perfect. Some benefit might be obtained from looking at reasons for the difference.


Table 10.12. Stability analysis.

Stability of repeatability: standard deviation of the six repeatabilities (0.5, 1, 0.5, 1, 1, 0) = 0.41.
Stability of reproducibility: standard deviation of the average reproducibilities; for the data in Table 10.9, =STDEV(AVERAGE(H2:H5), AVERAGE(H6:H9)) = 0.00.
Stability of accuracy: standard deviation of the average accuracies; for the data in Table 10.9, =STDEV(AVERAGE(I2:I5), AVERAGE(I6:I9)) = 0.00.
Stability of bias: average of the bias over the two weeks = 0.0.


6. Since Inspector B's results are more accurate and repeatable, studying her might lead to the discovery of best practices.

Minitab attribute gage R&R example
Minitab includes a built-in capability to analyze attribute measurement systems, known as "attribute gage R&R." We will repeat the above analysis using Minitab.

Minitab can't work with the data as shown in Table 10.9; it must be rearranged. Once the data are in a format acceptable to Minitab, we enter the Attribute Gage R&R Study dialog box by choosing Stat > Quality Tools > Attribute Gage R&R Study (see Figure 10.16). Note the checkbox "Categories of the attribute data are ordered." Check this box if the data are ordinal and have more than two levels. Ordinal data means that a 1 is in some sense "bigger" or "better" than a 0, as when we ask raters in a taste test to "Rate the flavor as 0 (awful), 1 (OK), or 2 (delicious)." Our data are ordinal (acceptable is better than unacceptable), but there are only two levels, so we will not check this box.
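The rearrangement amounts to stacking Table 10.9 into one row per part, appraiser, and trial, with a column for the known standard. A sketch of that reshaping, reusing the `rows` list from the earlier sketch (the column names here are illustrative, not Minitab requirements):

```python
# Stack Table 10.9: one row per part / appraiser / trial, plus the known standard.
stacked = []
for part, std, a, b, c, day, time in rows:
    for appraiser, rating in (("InspA", a), ("InspB", b), ("InspC", c)):
        stacked.append({"Part": part, "Appraiser": appraiser,
                        "Trial": f"{day} {time}", "Rating": rating, "Standard": std})
print(len(stacked))   # 24 rows: 2 parts x 3 appraisers x 4 trials each
```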


Figure 10.16. Attribute gage R&R dialog box and data layout.


Within appraiser analysis
Minitab evaluates the repeatability of appraisers by examining how often the appraiser "agrees with him/herself across trials." It does this by looking at all of the classifications for each part and counting the number of parts where all classifications agreed. For our example each appraiser looked at two parts four times each. Minitab's output, shown in Figure 10.17, indicates that InspA rated 50% of the parts consistently, InspB 100%, and InspC 0%. The 95% confidence interval on the percentage agreement is also shown. The results are displayed graphically in Figure 10.18.
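These within-appraiser percentages can be verified from the earlier `rows` list: for each appraiser, count the parts on which all four of her calls agree.

```python
# Within-appraiser agreement: parts where all four of an appraiser's calls match.
parts = sorted({r[0] for r in rows})
for name, c in {"InspA": 2, "InspB": 3, "InspC": 4}.items():
    consistent = sum(1 for p in parts if len({r[c] for r in rows if r[0] == p}) == 1)
    print(name, f"{consistent / len(parts):.0%}")   # 50%, 100%, 0%
```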


Figure 10.17. Minitab within appraiser assessment agreement.

Figure 10.18. Plot of within appraiser assessment agreement.


Accuracy Analysis
Minitab evaluates accuracy by looking at how often all of an appraiser's classifications for a given part agree with the standard. Figure 10.19 shows the results for our example. As before, Minitab combines the results for both days. The plot of these results is shown in Figure 10.20.
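The same kind of check, continuing the sketch, reproduces the appraiser-versus-standard figures: an appraiser gets credit for a part only if all four of her calls equal the standard.

```python
# Appraiser vs. standard: parts where every one of the appraiser's calls equals the standard.
for name, c in {"InspA": 2, "InspB": 3, "InspC": 4}.items():
    ok = sum(1 for p in parts if all(r[c] == r[1] for r in rows if r[0] == p))
    print(name, f"{ok / len(parts):.0%}")   # 50%, 100%, 0%
```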


Figure 10.19. Minitab appraiser vs standard agreement.

Figure 10.20. Plot of appraiser vs standard assessment agreement.


Minitab also looks at whether or not there is a distinct pattern in the disagreements with the standard. It does this by counting the number of times the appraiser classified an item as a 1 when the standard said it was a 0 (the # 1/0 Percent column), how often the appraiser classified an item as a 0 when it was a 1 (the # 0/1 Percent column), and how often the appraiser's classifications were mixed, i.e., not repeatable (the # Mixed Percent column). The results are shown in Figure 10.21. The results indicate that there is no consistent bias, defined as consistently putting a unit into the same wrong category. The problem, as was shown in the previous analysis, is that appraisers A and C are not repeatable.
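The disagreement pattern can be tallied the same way, still reusing `rows` and `parts`: for each appraiser and part, the calls are either consistently 1 against a standard of 0, consistently 0 against a standard of 1, or mixed.

```python
# Classify each appraiser/part combination: consistently 1 vs 0, consistently 0 vs 1, or mixed.
for name, c in {"InspA": 2, "InspB": 3, "InspC": 4}.items():
    one_vs_zero = zero_vs_one = mixed = 0
    for p in parts:
        calls = {r[c] for r in rows if r[0] == p}
        std = next(r[1] for r in rows if r[0] == p)
        if len(calls) > 1:
            mixed += 1
        elif calls == {1} and std == 0:
            one_vs_zero += 1
        elif calls == {0} and std == 1:
            zero_vs_one += 1
    print(name, "# 1/0:", one_vs_zero, "# 0/1:", zero_vs_one, "# Mixed:", mixed)
```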

BETWEEN APPRAISER ASSESSMENTS
Next, Minitab looks at all of the appraiser assessments for each part and counts how often every appraiser agrees on the classification of the part. The results, shown in Figure 10.22, indicate that this never happened during our experiment. The 95% confidence interval is also shown.
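Continuing the sketch, between-appraiser agreement asks whether all three inspectors gave the same call on every trial of a part; for this data set the answer is no for both parts.

```python
# Between appraisers: parts where all three inspectors agree on every trial.
all_agree = sum(1 for p in parts
                if all(r[2] == r[3] == r[4] for r in rows if r[0] == p))
print(f"{all_agree / len(parts):.0%}")   # 0%
```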


Figure 10.21. Minitab appraiser assessment disagreement analysis.

Figure 10.22. Minitab between appraisers assessment agreement.


ALL APPRAISERS VS STANDARD
Finally, Minitab looks at all of the appraiser assessments for each part and counts how often every appraiser agrees on the classification of the part and their classification agrees with the standard. This can't be any better than the between appraiser assessment agreement shown in Figure 10.22. Unsurprisingly, the results, shown in Figure 10.23, indicate that this never happened during our experiment. The 95% confidence interval is also shown.
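Adding the standard to the previous check gives the all-appraisers-versus-standard figure, which, as noted, can be no better:

```python
# All appraisers vs. standard: every call on every trial of the part equals the standard.
all_correct = sum(1 for p in parts
                  if all(r[2] == r[3] == r[4] == r[1] for r in rows if r[0] == p))
print(f"{all_correct / len(parts):.0%}")   # 0%
```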


Figure 10.23. Minitab assessment vs standard agreement across all appraisers.

