
Comparing Distributions I: DMAIC and Fisher's Exact

By Peter Woolf ([email protected]), University of Michigan

Michigan Chemical Process Dynamics and Controls Open Textbook

version 1.0

Creative commons

Scenario: You run a small plastic factory described in an earlier lecture

You have already developed the P&ID and control architecture, and parameterized your controllers. The system runs well most of the time, with a typical yield of 30%. If the yield is above 32% or below 28%, the batch can't be sold.

• How do you tell if the system is out of control?
• What do you do if it is out of control?
• What strategies can you adopt to maintain tighter control?

DMAIC: Define, measure, analyze, improve, and control

• Define: the goal is consistent yield
• Measure: measure yield
• Analyze: control charts, detective work
• Improve and control: change system and/or policies

How do you tell if the system is out of control?

1) Make some measurements
2) Construct a control chart

In this example the process is statistically out of control because run 9 exceeds the upper control limit (UCL).
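The two steps above can be sketched in Python. This is only an illustration under stated assumptions: the yield values are hypothetical (the measurements behind the slide's chart are not given), and the control limits are assumed to be the mean plus or minus 3 standard deviations of an in-control baseline, which is a common convention but may not be the one used for the original chart.

```python
import statistics

def control_limits(baseline, n_sigma=3.0):
    """Return (LCL, center line, UCL) from in-control baseline measurements."""
    center = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)  # sample standard deviation
    return center - n_sigma * sigma, center, center + n_sigma * sigma

def out_of_control_runs(values, lcl, ucl):
    """Return the 1-based run numbers that fall outside the control limits."""
    return [run for run, y in enumerate(values, start=1) if y < lcl or y > ucl]

# Hypothetical percent yields for 12 runs (not the data from the slides).
yields = [30.1, 29.8, 30.3, 29.9, 30.0, 30.2, 29.7, 30.0, 32.5, 30.1, 29.9, 30.2]

# Assumption: the first 8 runs are treated as the in-control baseline.
lcl, center, ucl = control_limits(yields[:8])
flagged = out_of_control_runs(yields, lcl, ucl)

print(f"LCL={lcl:.2f}  center={center:.2f}  UCL={ucl:.2f}")
print("Out-of-control runs:", flagged)
```

With these made-up numbers only run 9 is flagged, mirroring the situation on the slide. Note that the control limits are statistical and are separate from the 28% to 32% sales specification.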

What if you are out of control? Now what?

Passive solution
• Log it and do nothing. Wait for it to happen again before taking action.
  – Note the lost opportunity to improve the process, and the possible safety risk.

Semi-passive solutions
• Resample to make sure it is not an error.
  – Odd that this is not done when things are okay.
• Adjust the calculated mean up or down to adjust to the new situation.
  – Treats the symptom, not the cause.
  – Lost opportunity to learn about the process.

Active solution
• Look for a special cause and remove or enhance it.
  – Not all changes are bad; some may actually improve the process.

Look for a special cause

Possible sources of information:
1) Patterns in the data
2) Association with unmeasured events
3) Known physical effects
4) Operators

Field observation: “The feed for run 9 seemed unusually runny--maybe that is the reason?”

Hypothesis: Runny feed causes the product to go out of our desirable range.

1) Gather data
2) Evaluate hypothesis
3) Make a model of the relationship

(1) Is this significant?
(2) What causes the feed to be runny?
(3) Can we develop strategies to cope with this?

Data from 25 runs

                Runny feed   Normal feed   Totals
Bad product          5            1           6
Good product         1           18          19
Totals               6           19          25

Marginal results: the totals along the side, each of which sums over one of the states.

Is this significant? --> What are the odds?

Two answers, depending on the question:
(1) What are the odds of choosing 25 random samples with this particular configuration?
(2) What are the odds of choosing 25 samples with these marginals in this configuration or more extreme?




What are the odds?

Break down the problem: for the 6 bad products, what are the odds that 5 had runny feed and 1 had normal feed?

Restate as an urn problem: an urn holds 25 balls, of which 6 are white and 19 are black (white = runny feed, black = normal feed). If you remove 6 balls (the bad products), what are the odds of drawing 5 white and 1 black?


Number of ways of choosing 5 out of 6 of the white balls:

$$\frac{6!}{5!\,(6-5)!} = 6$$

Number of ways of choosing 1 out of 19 of the black balls:

$$\frac{19!}{1!\,(19-1)!} = 19$$

Number of ways of choosing 6 out of 25 balls:

$$\frac{25!}{6!\,(25-6)!} = 177{,}100$$

Odds of this draw:

$$\frac{(6)(19)}{177{,}100} = 0.000644$$


Hypergeometric distribution: the probability of sampling exactly k special items in a sample of n from an urn containing N items, of which m are special.

$$p_{hyper} = \frac{\binom{m}{k}\binom{N-m}{n-k}}{\binom{N}{n}}$$

where

$$\binom{a}{b} = \frac{a!}{b!\,(a-b)!}$$

reads "a choose b".

For the urn problem (N = 25, m = 6, n = 6, k = 5):

$$p_{hyper} = \frac{\binom{6}{5}\binom{25-6}{6-5}}{\binom{25}{6}} = \frac{\left(\frac{6!}{5!\,1!}\right)\left(\frac{19!}{1!\,18!}\right)}{\left(\frac{25!}{19!\,6!}\right)} = 0.000644$$
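The hypergeometric formula can be checked numerically. The sketch below is a minimal Python version using only the standard library; the function name `hypergeom_pmf` is an illustrative choice, not something from the slides. For the urn problem it evaluates to (6)(19)/177,100, roughly 0.00064, which is the same value the Fisher's exact formula gives for this table.

```python
from math import comb

def hypergeom_pmf(k, n, m, N):
    """Probability of exactly k special items in a sample of n,
    drawn without replacement from N items of which m are special."""
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)

# Urn problem: 25 balls, 6 white (special), draw 6, want exactly 5 white.
p_hyper = hypergeom_pmf(k=5, n=6, m=6, N=25)

print("ways to pick the white balls:", comb(6, 5))    # 6
print("ways to pick the black ball: ", comb(19, 1))   # 19
print("ways to pick any 6 of 25:    ", comb(25, 6))   # 177100
print(f"p_hyper = {p_hyper:.6f}")
```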





What are the odds?

Analogous arguments can be made for:
• 1 in 19 of the good products having runny feed
• 1 in 6 of the runny feed products being good products
• 1 in 19 of the normal feeds being bad product
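As a quick check of the analogous arguments, the sketch below evaluates the hypergeometric probability for each of the four ways of reading the same 25 runs. The function name and the dictionary labels are illustrative. With the marginals fixed at 6 and 19, all four views give the same number, because they all describe one and the same 2 by 2 table.

```python
from math import comb

def hypergeom_pmf(k, n, m, N):
    """Probability of exactly k special items in a sample of n,
    drawn without replacement from N items of which m are special."""
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)

# Each entry: sample = one row or column of the table, special = one state.
views = {
    "5 of 6 bad products had runny feed":        hypergeom_pmf(k=5, n=6,  m=6,  N=25),
    "1 of 19 good products had runny feed":      hypergeom_pmf(k=1, n=19, m=6,  N=25),
    "1 of 6 runny feed runs gave good product":  hypergeom_pmf(k=1, n=6,  m=19, N=25),
    "1 of 19 normal feed runs gave bad product": hypergeom_pmf(k=1, n=19, m=6,  N=25),
}

for label, p in views.items():
    print(f"{label}: {p:.6f}")
```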

Composite probability can be calculated using Fisher’s exact test

Fisher's exact test gives the probability of sampling a particular configuration of a 2 by 2 table with constrained marginals.

                Runny feed   Normal feed   Totals
Bad product          a            b          a+b
Good product         c            d          c+d
Totals              a+c          b+d       a+b+c+d

$$p_{fisher} = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{(a+b+c+d)!\;a!\,b!\,c!\,d!}$$

where
• the numerator, (a+b)! (c+d)! (a+c)! (b+d)!, is the # of ways the marginals can be arranged
• (a+b+c+d)! is the # of ways the total can be arranged
• a! b! c! d! is the # of ways each observation can be arranged




What are the odds of choosing 25 samples with these marginals in this configuration?

$$p_{fisher} = \frac{6!\,19!\,6!\,19!}{25!\;5!\,1!\,1!\,18!} = 0.00064$$

In Mathematica, evaluating this expression numerically returns about 0.000644.

But this is for this configuration alone! Is this one of many bad configurations?
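The single-configuration calculation can also be reproduced in Python. This is a minimal sketch using only the standard library; the function name `fisher_table_probability` is illustrative, and the cell labels a, b, c, d follow the general 2 by 2 table used for the formula.

```python
from math import factorial

def fisher_table_probability(a, b, c, d):
    """Probability of one specific 2x2 table [[a, b], [c, d]]
    given that its row and column totals (marginals) are fixed."""
    numerator = (factorial(a + b) * factorial(c + d)
                 * factorial(a + c) * factorial(b + d))
    denominator = (factorial(a + b + c + d)
                   * factorial(a) * factorial(b) * factorial(c) * factorial(d))
    return numerator / denominator

# Observed table: 5 bad/runny, 1 bad/normal, 1 good/runny, 18 good/normal.
p_observed = fisher_table_probability(5, 1, 1, 18)
print(f"p_fisher = {p_observed:.5f}")
```

This is the probability of one table only; it does not yet account for other tables that would be at least as surprising.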



Probability estimate at a particular value: the odds of this one configuration.

Estimate at a value or further: the odds of this configuration or more extreme values. This is a one-tailed test.



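Observed case (Pfisher = 0.00064):

                Runny feed   Normal feed   Totals
Bad product          5            1           6
Good product         1           18          19
Totals               6           19          25

A more extreme case with the same marginals (Pfisher = 0.0000056):

                Runny feed   Normal feed   Totals
Bad product          6            0           6
Good product         0           19          19
Totals               6           19          25

P-value = 0.00064 + 0.0000056 ≈ 0.00065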
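The one-tailed sum can be sketched in Python by enumerating every table that keeps the same marginals and has at least as many bad/runny runs as observed. The function names are illustrative, and "more extreme" is assumed here to mean a larger count in the top-left cell, which matches the single more extreme table shown in the slides.

```python
from math import factorial

def fisher_table_probability(a, b, c, d):
    """Probability of one specific 2x2 table [[a, b], [c, d]] with fixed marginals."""
    numerator = (factorial(a + b) * factorial(c + d)
                 * factorial(a + c) * factorial(b + d))
    denominator = (factorial(a + b + c + d)
                   * factorial(a) * factorial(b) * factorial(c) * factorial(d))
    return numerator / denominator

def one_tailed_p_value(a, b, c, d):
    """Sum the probabilities of the observed table and all tables with a
    larger top-left cell, holding the row and column totals fixed."""
    row1, col1, total = a + b, a + c, a + b + c + d
    p = 0.0
    for x in range(a, min(row1, col1) + 1):
        # Remaining cells are forced by the fixed marginals.
        p += fisher_table_probability(x, row1 - x, col1 - x,
                                      total - row1 - col1 + x)
    return p

p_value = one_tailed_p_value(5, 1, 1, 18)
print(f"one-tailed p-value = {p_value:.5f}")
```

If SciPy is available, `scipy.stats.fisher_exact([[5, 1], [1, 18]], alternative="greater")` should return the same one-sided p-value and can serve as a cross-check.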

P-values

A p-value is the probability of observing a result at least as extreme as the one seen, assuming the null hypothesis is true. It is not the probability that the null hypothesis itself is true.

Null hypothesis: most commonly, that the result arose from a completely random process, sometimes with constraints (for example, fixed marginals).

Examples of null hypotheses:

• Runny feed has no impact on product quality
• Points on a control chart are all drawn from the same distribution
• Two shipments of feed are statistically the same

Often p-values are considered significant if they are less than 0.05 or 0.001, but this limit is not guaranteed to be appropriate in all cases.

Look for a special cause

1) Data: the 2 by 2 table of feed type against product quality for the 25 runs
2) Analysis: p-value = 0.00065 < 0.05
3) Conclusion: the association is statistically significant, so runny feed is likely to impact product quality

Note: Runny feed is not the whole story, as we sometimes get good product from runny feed and bad product from normal feed.

What next?
• Look for root causes: What causes runny feed? Supplier? Temperature? Storage conditions? Lot number? Storage time?
  – Very process dependent
• Develop a method to detect runny feed before it goes into the process

Take Home Messages

• After you identify that a system is out of control, take appropriate action

• Associations between variables can be identified using Fisher's exact test and its associated p-value

• Once the cause of a disturbance is found, find a way to eliminate it

Date post:	20-Dec-2015