Date post: 20-Dec-2015
Page 1: Comparing Distributions I: DIMAC and Fishers Exact By Peter Woolf (pwoolf@umich.edu) University of Michigan Michigan Chemical Process Dynamics and Controls.

Comparing Distributions I: DMAIC and Fisher's Exact

By Peter Woolf (pwoolf@umich.edu), University of Michigan

Michigan Chemical Process Dynamics and Controls Open Textbook

version 1.0

Creative commons


Scenario: You run a small plastic factory described in an earlier lecture

You have already developed the P&ID and control architecture and parameterized your controllers. The system runs well most of the time, but not always. Generally you get a 30% yield, but if the yield is above 32% or below 28%, the batch can’t be sold.

How do you tell if the system is out of control?
What do you do if it is out of control?
What strategies can you adopt to maintain tighter control?


DMAIC: Define, measure, analyze, improve, and control

Goal: Consistent yield

Measure yield

Control charts, detective work

Change system and/or policies


How do you tell if the system is out of control?

1) Make some measurements


How do you tell if the system is out of control?

2) Construct a control chart

Statistically out of control because run 9 exceeds the upper control limit (UCL)
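As a rough illustration of this check, the sketch below flags runs that fall outside the control limits. It assumes the common Shewhart convention of a center line plus or minus three standard deviations. The yield values, the 30% center line, and the 0.5-point standard deviation are hypothetical numbers chosen so that run 9 exceeds the UCL; they are not the lecture's data.

```python
def control_limits(center: float, sigma: float, width: float = 3.0):
    """Return (LCL, UCL) as center -/+ width * sigma (assumed convention)."""
    return center - width * sigma, center + width * sigma


def out_of_control_runs(yields, center, sigma):
    """Return 1-based run numbers whose yield lies outside the limits."""
    lcl, ucl = control_limits(center, sigma)
    return [i for i, y in enumerate(yields, start=1) if y < lcl or y > ucl]


# Hypothetical yields (%), one per run; NOT data from the lecture.
yields = [30.1, 29.8, 30.4, 29.6, 30.2, 29.9, 30.3, 29.7, 32.4, 30.0]

print(control_limits(30.0, 0.5))               # (28.5, 31.5)
print(out_of_control_runs(yields, 30.0, 0.5))  # [9]
```

In practice the center line and standard deviation would be estimated from runs known to be in control rather than typed in by hand.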

Now what??


What if you are out of control?

Passive solution

• Log it and do nothing. Wait for it to happen again before taking action.
  – Note the lost opportunity to improve the process, and the possible safety risk.


What if you are out of control?

Semi-passive solutions

• Resample to make sure it is not an error.
  – Odd that this is not done when things are okay.
• Adjust the calculated mean up or down to fit the new situation.
  – Treats the symptom, not the cause.
  – Lost opportunity to learn about the process.


What if you are out of control?

Active solution

• Look for a special cause and remove or enhance it.
  – Not all changes are bad; some may actually improve the process.


Look for a special cause

Possible sources of information:

1) Patterns in the data
2) Association with unmeasured events
3) Known physical effects
4) Operators

Field observation: “The feed for run 9 seemed unusually runny--maybe that is the reason?”


Hypothesis: Runny feed causes the product to go out of our desirable range.

1) Gather data
2) Evaluate hypothesis
3) Make a model of the relationship

(1) Is this significant?
(2) What causes the feed to be runny?
(3) Can we develop strategies to cope with this?

Data from 25 runs

               Runny feed   Normal feed
Bad product         5            1
Good product        1           18


Is this significant? --> What are the odds?

               Runny feed   Normal feed   Totals
Bad product         5            1           6
Good product        1           18          19
Totals              6           19          25

Marginal results (sums on the side that count over one of the states)

2 answers depending on the question:

(1) What are the odds of choosing 25 random samples with this particular configuration?
(2) What are the odds of choosing 25 samples with these marginals in this configuration or more extreme?


What are the odds?

What are the odds of choosing 25 samples with these marginals in this configuration or more extreme?

Break down the problem: for the 6 bad products, what are the odds that 5 had runny feed and 1 had normal feed?

Restate as an urn problem: an urn holds 25 balls, of which 6 are white and 19 are black. What are the odds of drawing 6 balls of which 5 are white and 1 is black?

[Figure: urn from which 6 balls are removed]


Number of ways of choosing 5 out of 6 of the white balls:

$$\frac{6!}{5!\,(6-5)!} = 6$$

Number of ways of choosing 1 out of 19 of the black balls:

$$\frac{19!}{1!\,(19-1)!} = 19$$

Number of ways of choosing 6 out of 25 balls:

$$\frac{25!}{6!\,(25-6)!} = 177{,}100$$

Odds of this draw:

$$\frac{(6)(19)}{177{,}100} = 0.000644$$


Hypergeometric distribution: probability of sampling exactly k special items in a sample of n from an urn containing N items, of which m are special.

$$p_{\text{hyper}} = \frac{\binom{m}{k}\binom{N-m}{n-k}}{\binom{N}{n}}$$

where

$$\binom{a}{b} = \frac{a!}{b!\,(a-b)!}$$

reads “a choose b”.

$$p_{\text{hyper}} = \frac{\binom{6}{5}\binom{25-6}{6-5}}{\binom{25}{6}} = \frac{\left(\frac{6!}{5!\,1!}\right)\left(\frac{19!}{1!\,18!}\right)}{\left(\frac{25!}{19!\,6!}\right)} = 0.000644$$
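To make the hypergeometric formula concrete, here is a minimal Python sketch using only the standard library. The function name `hypergeom_pmf` and its argument order are illustrative choices, not something taken from the lecture. Evaluated for the urn example (N = 25, m = 6, n = 6, k = 5), it gives about 0.00064.

```python
from math import comb


def hypergeom_pmf(k: int, n: int, m: int, N: int) -> float:
    """Probability of exactly k special items in a sample of n drawn
    without replacement from N items, of which m are special."""
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)


# Urn example: 25 balls, 6 white (special), draw 6, want exactly 5 white.
p = hypergeom_pmf(k=5, n=6, m=6, N=25)
print(comb(6, 5), comb(19, 1), comb(25, 6))  # 6 19 177100
print(f"{p:.6f}")                            # 0.000644
```

Using `math.comb` keeps the counts as exact integers until the final division, which avoids overflow or rounding in the factorials.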


What are the odds of choosing 25 samples with these marginals in this configuration or more extreme?

What are the odds?

Analogous arguments can be made for:

• 1 in 19 of the good products having runny feed
• 1 in 6 of the runny feed products being good products
• 1 in 19 of the normal feeds being bad product

Composite probability can be calculated using Fisher’s exact test
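A quick numerical check, offered as a sketch rather than part of the lecture, suggests that each of these framings leads to the same hypergeometric probability once the marginals are fixed. The helper `hypergeom_pmf` and the descriptive labels are illustrative names. Each call picks a different margin as the "sample" and a different margin as the "special" items.

```python
from math import comb


def hypergeom_pmf(k: int, n: int, m: int, N: int) -> float:
    """P(exactly k special items in a sample of n from N items, m special)."""
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)


# 25 runs total: 6 bad, 19 good, 6 runny feed, 19 normal feed.
views = {
    "5 of the 6 bad products had runny feed": hypergeom_pmf(5, 6, 6, 25),
    "1 of the 19 good products had runny feed": hypergeom_pmf(1, 19, 6, 25),
    "1 of the 6 runny-feed runs gave good product": hypergeom_pmf(1, 6, 19, 25),
    "1 of the 19 normal-feed runs gave bad product": hypergeom_pmf(1, 19, 6, 25),
}

for label, prob in views.items():
    print(f"{label}: {prob:.6f}")
```

All four lines print the same value, so the choice of which margin to treat as the sample does not change the probability of the table.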


Fisher’s exact is the probability of sampling a particular configuration of a 2 by 2 table with constrained marginals

               Runny feed   Normal feed   Totals
Bad product         a            b          a+b
Good product        c            d          c+d
Totals             a+c          b+d       a+b+c+d

$$p_{\text{fisher}} = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{(a+b+c+d)!\;a!\,b!\,c!\,d!}$$

• Numerator: # of ways the marginals can be arranged
• Denominator, (a+b+c+d)!: # of ways the total can be arranged
• Denominator, a!b!c!d!: # of ways each observation can be arranged
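One way to see where this formula comes from, offered as a sketch rather than the lecture's own derivation: identify the table entries with the urn symbols of the hypergeometric distribution, taking k = a, n = a+b, m = a+c, and N = a+b+c+d. This mapping is an assumption about which margin plays the role of the sample. Expanding the binomial coefficients then reproduces the Fisher expression term by term.

```latex
\begin{aligned}
p_{\text{hyper}}
  &= \frac{\binom{a+c}{a}\binom{b+d}{b}}{\binom{a+b+c+d}{a+b}} \\
  &= \frac{(a+c)!}{a!\,c!}\cdot\frac{(b+d)!}{b!\,d!}\cdot
     \frac{(a+b)!\,(c+d)!}{(a+b+c+d)!} \\
  &= \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{(a+b+c+d)!\;a!\,b!\,c!\,d!}
   = p_{\text{fisher}}
\end{aligned}
```

Under that mapping, the single-table Fisher probability and the hypergeometric probability are the same quantity written two ways.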


What are the odds of choosing 25 samples with these marginals in this configuration?

               Runny feed   Normal feed   Totals
Bad product         5            1           6
Good product        1           18          19
Totals              6           19          25

$$p_{\text{fisher}} = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{(a+b+c+d)!\;a!\,b!\,c!\,d!}$$

$$p_{\text{fisher}} = \frac{6!\,19!\,6!\,19!}{25!\;5!\,1!\,1!\,18!} = 0.00064$$

In Mathematica: [Figure: Mathematica evaluation of this expression]

But this is for this configuration alone! Is this one of many bad configurations?
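The Mathematica input from the slide is not reproduced here, so as a stand-in the same expression can be evaluated with a few lines of Python. This is a sketch, and the function name `fisher_table_probability` is hypothetical.

```python
from math import factorial


def fisher_table_probability(a: int, b: int, c: int, d: int) -> float:
    """Probability of one specific 2x2 table given its marginals."""
    numerator = (factorial(a + b) * factorial(c + d)
                 * factorial(a + c) * factorial(b + d))
    denominator = (factorial(a + b + c + d)
                   * factorial(a) * factorial(b)
                   * factorial(c) * factorial(d))
    # Both terms are exact integers; only the final division is floating point.
    return numerator / denominator


p = fisher_table_probability(5, 1, 1, 18)
print(f"{p:.5f}")  # 0.00064
```

The same function can be reused for any other table that shares these marginals, which is what the "or more extreme" question requires.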


What are the odds of choosing 25 samples with these marginals in this configuration?

[Figure: probability estimate at a particular value versus the estimate at that value or further, i.e. at more extreme values (one-tail test)]


What are the odds of choosing 25 samples with these marginals in this configuration or more extreme?

Observed configuration (Pfisher = 0.00064):

               Runny feed   Normal feed   Totals
Bad product         5            1           6
Good product        1           18          19
Totals              6           19          25

A more extreme case with the same marginals (Pfisher = 0.0000056):

               Runny feed   Normal feed   Totals
Bad product         6            0           6
Good product        0           19          19
Totals              6           19          25

P-value = 0.00064 + 0.0000056 = 0.0006456 ≈ 0.00065
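The one-tail sum can be sketched in Python as follows. This assumes "more extreme" means a larger count in the top-left cell (more bad product from runny feed) with all marginals held fixed. The function name `fisher_one_tail` is illustrative.

```python
from math import comb


def fisher_one_tail(a: int, b: int, c: int, d: int) -> float:
    """Sum the probability of the observed 2x2 table and of every table
    with the same marginals and a larger top-left count."""
    row1, col1, total = a + b, a + c, a + b + c + d
    a_max = min(row1, col1)  # largest top-left count the marginals allow
    return sum(
        comb(col1, x) * comb(total - col1, row1 - x) / comb(total, row1)
        for x in range(a, a_max + 1)
    )


p_value = fisher_one_tail(5, 1, 1, 18)
print(f"{p_value:.6f}")  # 0.000649
```

The unrounded sum is 115/177,100, slightly above 0.0006456 because that figure adds terms that were already rounded. If SciPy is available, `scipy.stats.fisher_exact([[5, 1], [1, 18]], alternative="greater")` should report the same one-sided p-value.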


P-values

A p-value is the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true. It is not the probability that the null hypothesis itself is true.

Null hypothesis: Most common interpretation is a completely random event, sometimes with constraints

Examples of null hypotheses:

• Runny feed has no impact on product quality
• Points on a control chart are all drawn from the same distribution
• Two shipments of feed are statistically the same

Often p-values are considered significant if they are less than 0.05 or 0.001, but this limit is not guaranteed to be appropriate in all cases.


Look for a special cause

1) Data

               Runny feed   Normal feed   Totals
Bad product         5            1           6
Good product        1           18          19
Totals              6           19          25

2) Analysis: p-value ≈ 0.00065 < 0.05

3) Conclusion: runny feed significantly impacts product quality

Note: Runny feed is not the only cause, as we sometimes get good product from runny feed (and bad product from normal feed).


Look for a special cause

3) Conclusion: runny feed is likely to impact product quality

What next?

• Look for root causes: What causes runny feed? Supplier? Temperature? Storage conditions? Lot number? Storage time?
  – Very process dependent
• Develop a method to detect runny feed before it goes into the process


Take Home Messages

• After you identify that a system is out of control, take appropriate action

• Associations between variables can be identified using Fisher’s exact test and its associated p-value

• Once the cause of a disturbance is found, find a way to eliminate it

