08-Pyzdek CH08 001-074pyzdek.mrooms.net/file.php/1/reading/bb-reading/establish_baseline_06... ·...

278 Chapter Eight

procedure is known as the hypergeometric probability distribution, and it is shown in Eq. (8.60).

$$P(x) = \frac{C_{n-x}^{N-m}\, C_{x}^{m}}{C_{n}^{N}} \qquad (8.60)$$

In Eq. (8.60), N is the lot size, m is the number of defectives in the lot, n is the sample size, x is the number of defectives in the sample, and P(x) is the probability of getting exactly x defectives in the sample. Note that the numerator term $C_{n-x}^{N-m}$ gives the number of combinations of non-defectives while $C_{x}^{m}$ is the number of combinations of defectives. Thus the numerator gives the total number of arrangements of samples from lots of size N with m defectives where the sample n contains exactly x defectives. The term $C_{n}^{N}$ in the denominator is the total number of combinations of samples of size n from lots of size N, regardless of the number of defectives. Thus, the probability is the ratio of the number of samples that give the stated result to the total number of possible samples.

For our example, we must solve the above equation for x = 0 as well as x = 1, since we would also accept the lot if we had no defectives. The solution is shown as follows.

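$$P(0) = \frac{C_{4-0}^{12-3}\, C_{0}^{3}}{C_{4}^{12}} = \frac{126 \times 1}{495} = 0.255$$

$$P(1) = \frac{C_{4-1}^{12-3}\, C_{1}^{3}}{C_{4}^{12}} = \frac{84 \times 3}{495} = \frac{252}{495} = 0.509$$

$$P(1 \text{ or less}) = P(0) + P(1)$$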

Adding the two probabilities tells us the probability that our sampling plan will accept lots of 12 with 3 nonconforming units. The plan of inspecting 4 parts and accepting the lot if we have 0 or 1 nonconforming has a probability of 0.255 + 0.509 = 0.764, or 76.4%, of accepting this "bad" quality lot. This is the "consumer's risk" for this sampling plan. Such a high sampling risk would be unacceptable to most people.
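The acceptance probability above can also be checked with a few lines of code. The following is a minimal sketch in Python using only the standard library; the function name `hypergeom_pmf` is an illustrative choice, not something defined in this chapter.

```python
from math import comb


def hypergeom_pmf(x: int, N: int, m: int, n: int) -> float:
    """P(x) as in Eq. (8.60): exactly x defectives in a sample of n
    drawn from a lot of N that contains m defectives."""
    return comb(N - m, n - x) * comb(m, x) / comb(N, n)


# Lot of 12 with 3 nonconforming units, sample of 4, accept on 0 or 1.
p0 = hypergeom_pmf(0, N=12, m=3, n=4)
p1 = hypergeom_pmf(1, N=12, m=3, n=4)
p_accept = p0 + p1

print(f"P(0) = {p0:.3f}")
print(f"P(1) = {p1:.3f}")
print(f"P(1 or less) = {p_accept:.3f}")
```

The three printed values correspond to the 0.255, 0.509, and 0.764 obtained by hand.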

Example of Hypergeometric Probability Calculations Using Microsoft Excel Microsoft Excel has a built-in capability to analyze hypergeometric probabilities. To solve the above problem using Excel, enter the population and sample values as shown in Fig. 8.37. Note the formula result near the bottom of the screen (0.509) gives the probability for x = 1. To find the cumulative probability you need to sum the probabilities for x = 0 and x = 1 etc.

Normal Distribution The most common continuous distribution encountered in Six Sigma work is, by far, the normal distribution. Sometimes the process itself produces an approximately normal distribution; other times a normal distribution can be obtained by performing a mathematical transformation on the data or by using averages. The probability density function for the normal distribution is given by Eq. (8.61).

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \qquad (8.61)$$

Process Behavior Charts 279

[Figure 8.37 worksheet: Population N = 12, Population X = 3, Sample n = 4, Sample x = 1; formula result = 0.509090909.]

FIGURE 8.37 Example of finding hypergeometric probability using Microsoft Excel.

If f(x) is plotted versus x, the well-known "bell curve" results. The normal distribution is also known as the Gaussian distribution. An example is shown in Fig. 8.38.

In Eq. (8.61), $\mu$ is the population average or mean and $\sigma$ is the population standard deviation. These parameters have been discussed earlier in this chapter.

FIGURE 8.38 The normal/Gaussian curve.


Example of Calculating $\mu$, $\sigma^2$, and $\sigma$

Find $\mu$, $\sigma^2$, and $\sigma$ for the following data:

i     $x_i$
1     17
2     23
3     5

Table 8.20 gives the equation for the population mean as:

$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i \qquad (8.62)$$

To find the mean for our data we compute

$$\mu = \frac{1}{3}(17 + 23 + 5) = 15$$

The variance and standard deviation are both measures of dispersion or spread. The equations for the population variance $\sigma^2$ and standard deviation $\sigma$ are given in Table 8.21.

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2 \qquad (8.63)$$

$$\sigma = \sqrt{\sigma^2}$$

Referring to the data above with a mean $\mu$ of 15, we compute $\sigma^2$ and $\sigma$ as follows:

i     $x_i$     $x_i - \mu$     $(x_i - \mu)^2$
1     17        2               4
2     23        8               64
3     5         -10             100
Sum                             168

$$\sigma^2 = 168/3 = 56$$

$$\sigma = \sqrt{\sigma^2} = \sqrt{56} \approx 7.483$$
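As a cross-check on the hand calculation, the same population statistics can be computed with Python's standard `statistics` module. This is only a sketch; note that `pvariance` and `pstdev` divide by N (population formulas), whereas `variance` and `stdev` would divide by N - 1 (sample formulas).

```python
from statistics import fmean, pstdev, pvariance

data = [17, 23, 5]

mu = fmean(data)          # population mean
sigma2 = pvariance(data)  # population variance (divides by N)
sigma = pstdev(data)      # population standard deviation

print(f"mean = {mu}, variance = {sigma2}, std dev = {sigma:.3f}")
```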

Usually we have only a sample and not the entire population. A population is the entire set of observations from which the sample, a subset, is drawn. Calculations for the sample mean, variance, and standard deviation were shown earlier in this chapter.

The areas under the normal curve can be found by integrating Eq. (8.61) using numerical methods, but, more commonly, tables are used. Appendix 2 gives areas under the normal curve. The table is indexed by using the Z transformation, which is

$$Z = \frac{x_i - \mu}{\sigma} \qquad (8.64)$$


for population data, or

$$Z = \frac{x_i - \bar{x}}{s} \qquad (8.65)$$

for sample data.

By using the Z transformation, we can convert any normal distribution into a normal distribution with a mean of 0 and a standard deviation of 1. Thus, we can use a single normal table to find probabilities.

Example The normal distribution is very useful in predicting long-term process yields. Assume we have checked the breaking strength of a gold wire bonding process used in microcircuit production and we have found that the process average strength is 9# and the standard deviation is 4#. The process distribution is normal. If the engineering specification is 3# minimum, what percentage of the process will be below the low specification?

Since our data are a sample, we must compute Z using Eq. (8.65).

$$Z = \frac{3 - 9}{4} = \frac{-6}{4} = -1.5$$

Figure 8.39 illustrates this situation. Entering in Appendix 2 for Z = -1.5, we find that 6.68% of the area is below this Z value. Thus 6.68% of our breaking strengths will be below our low specification limit of 3. In quality control applications, we usually try to have the average at least three standard deviations away from the specification. To accomplish this, we would have to improve the process by either raising the average breaking strength or reducing the process standard deviation, or both.
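The table lookup can be reproduced in code. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative. It computes the area two ways, through the Z transformation and directly from the process mean and standard deviation, to show that both routes give the same answer.

```python
from statistics import NormalDist

mean, sd, lower_spec = 9.0, 4.0, 3.0

# Route 1: standardize, then use the standard normal (mean 0, sd 1).
z = (lower_spec - mean) / sd
p_below = NormalDist().cdf(z)

# Route 2: evaluate the process distribution directly.
p_direct = NormalDist(mu=mean, sigma=sd).cdf(lower_spec)

print(f"Z = {z}")
print(f"fraction below spec = {p_below:.4f}")
```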

FIGURE 8.39 Illustration of using Z tables for normal areas.


[Figure 8.40 worksheet: Average = 9, Sigma = 4, x = 3; NORMDIST with Cumulative = TRUE; formula result = 0.066807229.]

FIGURE 8.40 Example of finding normal probability using Microsoft Excel.

Example of Normal Probability Calculations Using Microsoft Excel Microsoft Excel has a built-in capability to analyze normal probabilities. To solve the above problem using Excel, enter the average, sigma and x values as shown in Fig. 8.40. The formula result near the bottom of the screen gives the desired probability.

Exponential Distribution Another distribution encountered often in quality control work is the exponential distribution. The exponential distribution is especially useful in analyzing reliability. The equation for the probability density function of the exponential distribution is

$$f(x) = \frac{1}{\mu}\, e^{-x/\mu}, \quad x \ge 0 \qquad (8.66)$$

Unlike the normal distribution, the shape of the exponential distribution is highly skewed and there is a much greater area below the mean than above it. In fact, over 63% of the exponential distribution falls below the mean. Figure 8.41 shows an exponential pdf.

Unlike the normal distribution, the exponential distribution has a closed form cumulative distribution function (cdf), that is, there is an equation which gives the cumulative probabilities directly. Since the probabilities can be determined directly from the equation, no tables are necessary. See Eq. (8.67).

$$P(X \le x) = 1 - e^{-x/\mu} \qquad (8.67)$$

Example of Using the Exponential cdf A city water company averages 500 system leaks per year. What is the probability that the weekend crew, which works from 6 p.m. Friday to 6 a.m. Monday, will get no calls?


FIGURE 8.41 Exponential pdf curve.

We have 500 leaks per year, which we must convert to a mean time between leaks in hours. There are 365 days of 24 hours each in a year, or 8760 hours. Thus, the mean time between failures (MTBF) is $\mu$ = 8760/500 = 17.52 hours. There are 60 hours between 6 p.m. Friday and 6 a.m. Monday. Thus x = 60. Using Eq. (8.67) gives

$$P(X \le 60) = 1 - e^{-60/17.52} = 0.967 = 96.7\%$$

Thus, the crew will get to loaf away 3.3% of the weekends.
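Because the exponential cdf has a closed form, the calculation takes only a few lines of code. The sketch below, in standard-library Python, repeats the water company example; the variable names are illustrative.

```python
import math

leaks_per_year = 500
hours_per_year = 365 * 24              # 8760 hours
mtbf = hours_per_year / leaks_per_year  # mean time between leaks, hours

x = 60  # hours from 6 p.m. Friday to 6 a.m. Monday

# Eq. (8.67): probability of at least one leak within x hours.
p_call = 1 - math.exp(-x / mtbf)
p_no_call = 1 - p_call

print(f"MTBF = {mtbf} hours")
print(f"P(at least one call) = {p_call:.3f}")
print(f"P(no calls) = {p_no_call:.3f}")
```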

Example of Exponential Probability Calculations Using Microsoft Excel Microsoft Excel has a built-in capability to analyze exponential probabilities. To solve the above problem using Excel, enter the average and x values as shown in Fig. 8.42. Note that Excel uses "lambda" rather than the average in its calculations; lambda is the reciprocal of the average. The formula result near the bottom of the screen gives the desired probability.

Example of Non-Normal Capability Analysis Using Minitab Minitab has a built-in capability to perform process capability analysis for non-normal data which will be demonstrated with an example. The process involved is technical support by telephone. A call center has recorded the total time it takes to "handle" 500 technical support calls. Handle time is a total cycle time metric which includes gathering preliminary information, addressing the customer's issues, and performing postcall tasks. It is a CTQ metric that also impacts the shareholder. It has been determined that the upper limit on handle time is 45 minutes. Once the data has been collected, it can be analyzed as follows:

Phase 1-Check for Special Causes: To begin we must determine if special causes of variation were present during our study. A special cause is operationally defined as

332 Chapter Ten

characteristics as the target population. If this is accomplished then inferences from the sample are said to have internal validity. A limitation on design-based inferences for experimental studies is that formal conclusions are restricted to the finite population of subjects that actually received treatment, that is, they lack external validity. However, if sites and subjects are selected at random from larger eligible sets, then models with random effects provide one possible way of addressing both internal and external validity considerations. One important consideration for external validity is that the sample coverage includes all relevant subpopulations; another is that treatment differences be homogeneous across subpopulations. A common application of design-based inference is the survey.

Alternatively, if assumptions external to the study design are required to extend inferences to the target population, then statistical analyses based on postulated probability distributional forms (e.g., binomial, normal, etc.) or other stochastic processes yield model-based inferences. A focus of distinction between design-based and model-based studies is the population to which the results are generalized rather than the nature of the statistical methods applied. When using a model-based approach, external validity requires substantive justification for the model's assumptions, as well as statistical evaluation of the assumptions.

Statistical inference is used to provide probabilistic statements regarding a scientific inference. Science attempts to provide answers to basic questions, such as: Can this machine meet our requirements? Is the quality of this lot within the terms of our contract? Does the new method of processing produce better results than the old? These questions are answered by conducting an experiment, which produces data. If the data vary, then statistical inference is necessary to interpret the answers to the questions posed. A statistical model is developed to describe the probabilistic structure relating the observed data to the quantity of interest (the parameters), that is, a scientific hypothesis is formulated. Rules are applied to the data and the scientific hypothesis is either rejected or not. In formal tests of a hypothesis, there are usually two mutually exclusive and exhaustive hypotheses formulated: a null hypothesis and an alternate hypothesis.

Chi-Square, Student's T, and F Distributions In addition to the distributions presented earlier in the Measure phase, these three distributions are used in Six Sigma to test hypotheses, construct confidence intervals, and compute control limits.

Chi-Square Many characteristics encountered in Six Sigma have normal or approximately normal distributions. It can be shown that in these instances the distribution of sample variances has the form (except for a constant) of a chi-square distribution, symbolized $\chi^2$. Tables have been constructed giving abscissa values for selected ordinates of the cumulative $\chi^2$ distribution. One such table is given in Appendix 4.

The $\chi^2$ distribution varies with the quantity $\nu$, which for our purposes is equal to the sample size minus 1. For each value of $\nu$ there is a different $\chi^2$ distribution. Equation (10.3) gives the pdf for the $\chi^2$.

$$f(\chi^2) = \frac{e^{-\chi^2/2}\,(\chi^2)^{(\nu-2)/2}}{2^{\nu/2}\,\Gamma(\nu/2)} \qquad (10.3)$$


Analyze Phase 333

FIGURE 10.8 $\chi^2$ pdf for $\nu$ = 4.

Figure 10.8 shows the pdf for $\nu$ = 4.

Example The use of $\chi^2$ is illustrated in this example to find the probability that the variance of a sample of n items from a specified normal universe will equal or exceed a given value $s^2$; we compute $\chi^2 = (n-1)s^2/\sigma^2$. Now, let's suppose that we sample n = 10 items from a process with $\sigma^2$ = 25 and wish to determine the probability that the sample variance will exceed 50. Then

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2} = \frac{9(50)}{25} = 18$$

We enter Appendix 4 ($\chi^2$) at the line for $\nu$ = 10 - 1 = 9 and note that 18 falls between the columns for the percentage points of 0.025 and 0.05. Thus, the probability of getting a sample variance in excess of 50 is about 3%.

It is also possible to determine the sample variance that would be exceeded only a stated percentage of the time. For example, we might want to be alerted when the sample variance exceeded a value that should occur only once in 100 times. Then we set up the $\chi^2$ equation, find the critical value from Appendix 4, and solve for the sample variance. Using the same values as above, the value of $s^2$ that would be exceeded only once in 100 times is found as follows:

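$$\frac{(n-1)s^2}{\sigma^2} = \frac{9 s^2}{25} = 21.7 \quad \Rightarrow \quad s^2 = \frac{21.7 \times 25}{9} = 60.278$$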

In other words, the variance of samples of size 10, taken from this process, should be less than 60.278, 99% of the time.
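Both chi-square results can be reproduced without tables. The sketch below is written in standard-library Python, which has no built-in chi-square function, so it relies on a closed-form upper-tail expression that holds only for integer degrees of freedom, plus a simple bisection search for the critical value. The helper names `chi2_sf` and `chi2_isf` are illustrative assumptions, and this is not a description of how Excel or the appendix tables were computed.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(chi-square > x) for integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite sum of Poisson-type terms.
        terms = (half ** j / math.factorial(j) for j in range(df // 2))
        return math.exp(-half) * sum(terms)
    # Odd df: complementary error function plus a finite sum.
    terms = (half ** (j - 0.5) / math.gamma(j + 0.5)
             for j in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * sum(terms)


def chi2_isf(p: float, df: int) -> float:
    """Value exceeded with probability p, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > p:  # expand until the tail drops below p
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


n, sigma2 = 10, 25.0
df = n - 1

chi2_value = df * 50.0 / sigma2      # (n - 1) s^2 / sigma^2 = 18
p_exceed = chi2_sf(chi2_value, df)   # P(sample variance > 50)

critical = chi2_isf(0.01, df)        # exceeded once in 100 times
s2_limit = critical * sigma2 / df    # solve for s^2

print(f"P(s^2 > 50) = {p_exceed:.4f}")
print(f"critical chi-square = {critical:.3f}, s^2 limit = {s2_limit:.2f}")
```

The tail probability comes out near 3.5%, consistent with the "about 3%" read from the table. The computed critical value is close to 21.666, which gives a variance limit of roughly 60.18; the 60.278 quoted in the text corresponds to a critical value of about 21.7, which appears to be a rounded table entry.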

Example of Chi-Squared Probability Calculations Using Microsoft Excel Microsoft Excel has a built-in capability to calculate chi-squared probabilities. To solve the above problem using Excel, enter the n and x values as shown in Fig. 10.9. Note that


[Figure 10.9 worksheet: n = 10, x = 18; CHIDIST(B2, B1-1) with Deg_freedom = 9; formula result = 0.03517354.]

FIGURE 10.9 Example of finding chi-squared probability using Microsoft Excel.

Excel uses degrees of freedom rather than the sample size in its calculations; degrees of freedom is the sample size minus one, as shown in the Deg_freedom box in Fig. 10.9. The formula result near the bottom of the screen gives the desired probability.

Example of Inverse Chi-Squared Probability Calculations Using Microsoft Excel Microsoft Excel has a built-in capability to calculate chi-squared probabilities, making it unnecessary to look up the probabilities in tables. To find the critical chi-squared value for the above problem using Excel, use the CHIINV function and enter the desired probability and degrees of freedom as shown in Fig. 10.10. The formula result near the bottom of the screen gives the desired critical value.

[Figure 10.10 dialog: CHIINV with Probability = 0.01 and Deg_freedom = 9; formula result approximately 21.666.]

FIGURE 10.10 Example of finding inverse chi-squared probability using Microsoft Excel.


Student's T Distribution The t statistic is commonly used to test hypotheses regarding means, regression coefficients and a wide variety of other statistics used in quality engineering. "Student" was the pseudonym of W. S. Gosset, whose need to quantify the results of small scale experiments motivated him to develop and tabulate the probability integral of the ratio which is now known as the t statistic and is shown in Eq. (10.4).

$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}} \qquad (10.4)$$

In Eq. (10.4), the denominator is the standard deviation of the sample mean. Percentage points of the corresponding distribution function of t may be found in Appendix 3. There is a t distribution for each sample size of n > 1. As the sample size increases, the t distribution approaches the shape of the normal distribution, as shown in Fig. 10.11.

One of the simplest (and most common) applications of the student's t test involves using a sample from a normal population with mean $\mu$ and variance $\sigma^2$. This is demonstrated in the hypothesis testing section later in this chapter.
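To make the ratio in Eq. (10.4) concrete, here is a minimal sketch in standard-library Python. The function name `t_statistic`, the sample values, and the hypothesized mean of 10.0 are all hypothetical and chosen only for illustration; they do not come from this chapter.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(sample, mu0):
    """t from Eq. (10.4): (sample mean - mu0) / (s / sqrt(n)),
    where s is the sample standard deviation (n - 1 divisor)."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(n))


# Hypothetical measurements and hypothesized population mean.
sample = [9.8, 10.2, 10.4, 9.9, 10.1]
t = t_statistic(sample, mu0=10.0)

print(f"t = {t:.3f} with {len(sample) - 1} degrees of freedom")
```

The resulting t would then be compared with the percentage points in Appendix 3 for n - 1 degrees of freedom.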

F Distribution Suppose we have two random samples drawn from a normal population. Let $s_1^2$ be the variance of the first sample and $s_2^2$ be the variance of the second sample. The two samples need not have the same sample size. The statistic F given by

$$F = \frac{s_1^2}{s_2^2} \qquad (10.5)$$

has a sampling distribution called the F distribution. There are two sample variances involved and two sets of degrees of freedom, $n_1 - 1$ in the numerator and $n_2 - 1$ in the denominator. Appendix 5 and 6 provide values for the 1 and 5% percentage points for the F distribution. The percentages refer to the areas to the right of the values given in the tables. Figure 10.12 illustrates two F distributions.

FIGURE 10.11 Student's t distributions. (Curves shown: t distribution for n = 2; t distribution for n = 10.)

FIGURE 10.12 F distributions. (One panel is labeled F(2,2).)
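The F statistic of Eq. (10.5) and its two sets of degrees of freedom can be computed as in the sketch below, written in standard-library Python. The function name `f_ratio` and both samples are hypothetical, introduced only for illustration.

```python
from statistics import variance


def f_ratio(sample1, sample2):
    """Return (F, numerator df, denominator df) per Eq. (10.5),
    using sample variances with the n - 1 divisor."""
    f = variance(sample1) / variance(sample2)
    return f, len(sample1) - 1, len(sample2) - 1


# Hypothetical samples of unequal size.
first = [12.1, 11.8, 12.6, 12.3, 11.9, 12.4]
second = [12.0, 12.2, 11.9, 12.1]

f, df_num, df_den = f_ratio(first, second)
print(f"F = {f:.3f} with ({df_num}, {df_den}) degrees of freedom")
```

The computed F would be compared with the tabulated percentage point for the matching numerator and denominator degrees of freedom.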

Point and Interval Estimation So far, we have introduced a number of important statistics including the sample mean, the sample standard deviation, and the sample variance. These sample statistics are called point estimators because they are single values used to represent population parameters. It is also possible to construct an interval about the statistics that has a predetermined probability of including the true population parameter. This interval is called a confidence interval. Interval estimation is an alternative to point estimation that gives us a better idea of the magnitude of the sampling error. Confidence intervals can be either one-sided or two-sided. A one-sided confidence interval places an upper or lower bound on the value of a parameter with a specified level of confidence. A two-sided confidence interval places both upper and lower bounds.

In almost all practical applications of enumerative statistics, including Six Sigma applications, we make inferences about populations based on data from samples. In this chapter, we have talked about sample averages and standard deviations; we have even used these numbers to make statements about future performance, such as long term
