Date post: | 20-Dec-2015 |
Comparing Distributions I: DMAIC and Fisher’s Exact
By Peter Woolf ([email protected]), University of Michigan
Michigan Chemical Process Dynamics and Controls Open Textbook
version 1.0
Creative commons
Scenario: You run a small plastic factory described in an earlier lecture
You have already developed the P&ID and control architecture and parameterized your controllers. The system runs well most of the time, but not always: the yield is generally 30%, but it varies from batch to batch. If the yield is above 32% or below 28%, the batch can’t be sold.
How do you tell if the system is out of control?
What do you do if it is out of control?
What strategies can you adopt to maintain tighter control?
DMAIC: Define, measure, analyze, improve, and control
Define: goal is consistent yield
Measure: yield
Analyze: control charts, detective work
Improve and control: change system and/or policies
How do you tell if the system is out of control?
1) Make some measurements
2) Construct a control chart
Statistically out of control because run 9 exceeds the upper control limit (UCL)
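A minimal sketch of how such control limits might be computed, assuming a Shewhart-style chart with limits at three standard deviations around the mean of an in-control baseline. The yield values are invented for illustration only (the slides do not list the raw measurements), and `control_limits` and `out_of_control_runs` are hypothetical helper names.

```python
from statistics import mean, stdev

def control_limits(baseline, n_sigma=3.0):
    """Return (LCL, center line, UCL) estimated from in-control baseline data."""
    center = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation
    return center - n_sigma * spread, center, center + n_sigma * spread

def out_of_control_runs(values, lcl, ucl):
    """Return 1-based run numbers whose value falls outside [LCL, UCL]."""
    return [i for i, v in enumerate(values, start=1) if v < lcl or v > ucl]

# Hypothetical yields (%), invented for illustration; not data from the lecture.
baseline_yields = [30.1, 29.8, 30.3, 29.9, 30.0, 30.2, 29.7, 30.0]
new_yields = [30.0, 29.9, 30.2, 30.1, 29.8, 30.0, 30.3, 29.9, 32.5, 30.1]

lcl, center, ucl = control_limits(baseline_yields)
flagged = out_of_control_runs(new_yields, lcl, ucl)
print(f"LCL={lcl:.2f}  center={center:.2f}  UCL={ucl:.2f}  flagged runs={flagged}")
```

With these made-up numbers the limits come out near 29.4% and 30.6%, so only the ninth run is flagged. Estimating the limits from a separate baseline keeps a single outlier from inflating its own control limits.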
Now what??
• Log it and do nothing. Wait for it to happen again before taking action.
  - Note the lost opportunity to improve the process, and the possible safety risk.
Passive solution
What if you are out of control?
• Resample to make sure it is not an error.
  - Odd that this is not done when things are okay.
• Adjust the calculated mean up or down to fit the new situation.
  - Treats the symptom, not the cause.
  - Lost opportunity to learn about the process.
Semi-passive solutions
• Look for a special cause and remove or enhance it.
  - Not all changes are bad; some may actually improve the process.
Active solution
Look for a special cause
Possible sources of information:
1) Patterns in the data
2) Association with unmeasured events
3) Known physical effects
4) Operators
Field observation: “The feed for run 9 seemed unusually runny--maybe that is the reason?”
Hypothesis: Runny feed causes the product to go out of our desirable range.
1) Gather data
2) Evaluate hypothesis
3) Make a model of the relationship
(1) Is this significant?
(2) What causes the feed to be runny?
(3) Can we develop strategies to cope with this?
Data from 25 runs
                Runny feed   Normal feed
Bad product          5             1
Good product         1            18

Marginal results (sums on the side that count over one of the states)
Is this significant? --> What are the odds?
2 answers depending on the question:
(1) What are the odds of choosing 25 random samples with this particular configuration?
(2) What are the odds of choosing 25 samples with these marginals in this configuration or more extreme?
                Runny feed   Normal feed   Totals
Bad product          5             1          6
Good product         1            18         19
Totals               6            19         25
What are the odds?
[Figure: an urn from which 6 balls are removed]
Restate as an urn problem: with 25 balls, 6 are white and 19 black, what are the odds of drawing 6 balls of which 5 are white and 1 is black?
Break down the problem: For the 6 bad products, odds of 5 with runny feed, 1 normal?
Number of ways of choosing 5 out of 6 of the white balls

$$\frac{6!}{5!\,(6-5)!} = 6$$

Number of ways of choosing 6 out of 25 balls

$$\frac{25!}{6!\,(25-6)!} = 177{,}100$$

Number of ways of choosing 1 out of 19 of the black balls

$$\frac{19!}{1!\,(19-1)!} = 19$$

Odds of this draw

$$\frac{(6)(19)}{177{,}100} = 0.000644$$
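The counting argument can be checked by brute force. The sketch below is only an illustration (not part of the original lecture): it enumerates every possible way of removing 6 of the 25 balls and counts the draws that contain exactly 5 white balls.

```python
from itertools import combinations

# Label the 25 balls: 6 white (runny feed) and 19 black (normal feed).
balls = ["white"] * 6 + ["black"] * 19

total_draws = 0      # every possible way to remove 6 balls
matching_draws = 0   # draws with exactly 5 white balls and 1 black ball

# Iterate over ball positions so that same-coloured balls stay distinguishable.
for draw in combinations(range(len(balls)), 6):
    total_draws += 1
    whites = sum(1 for i in draw if balls[i] == "white")
    if whites == 5:
        matching_draws += 1

probability = matching_draws / total_draws
print(total_draws, matching_draws, round(probability, 6))
```

The enumeration finds 177,100 possible draws, of which 6 × 19 = 114 have the required composition.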
Hypergeometric distribution: the probability of sampling exactly k special items in a sample of n from an urn containing N items, of which m are special

$$p_{\text{hyper}} = \frac{\binom{m}{k}\binom{N-m}{n-k}}{\binom{N}{n}}$$

where

$$\binom{a}{b} = \frac{a!}{b!\,(a-b)!}$$

reads “a choose b”
$$p_{\text{hyper}} = \frac{\binom{6}{5}\binom{25-6}{6-5}}{\binom{25}{6}} = \frac{\left(\frac{6!}{5!\,1!}\right)\left(\frac{19!}{1!\,18!}\right)}{\left(\frac{25!}{19!\,6!}\right)} = 0.000644$$
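As a minimal sketch, the hypergeometric formula can be evaluated with Python's standard `math.comb`. The function name `hypergeom_pmf` is a hypothetical helper chosen here; the argument names follow the symbols k, m, n and N used in the definition.

```python
from math import comb

def hypergeom_pmf(k, m, n, N):
    """Probability of exactly k special items in a sample of n drawn
    without replacement from N items, m of which are special."""
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)

# Urn example: 25 balls, 6 white; draw 6 and ask for exactly 5 white.
p = hypergeom_pmf(k=5, m=6, n=6, N=25)
print(f"{p:.6f}")

# Sanity check: the probabilities over all possible k add up to 1.
total = sum(hypergeom_pmf(k, 6, 6, 25) for k in range(0, 7))
```

The result equals 114/177,100, the same ratio obtained from the counting argument.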
What are the odds?
Analogous arguments can be made for:
• 1 in 19 of the good products having runny feed
• 1 in 6 of the runny feed products being good products
• 1 in 19 of the normal feeds being bad product
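The following sketch illustrates why these arguments are analogous: each one is a hypergeometric draw on the same 2 by 2 table, viewed from a different row or column, and all of them give the same probability. The helper `hypergeom_pmf` and the dictionary labels are illustrative names, not part of the original lecture.

```python
from math import comb

def hypergeom_pmf(k, m, n, N):
    """P(exactly k special items in a sample of n from N items, m special)."""
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)

# Arguments are (k, m, n, N); the comments name the sample and the special items.
views = {
    # sample = 6 bad products, special = 6 runny-feed runs
    "5 of the 6 bad products had runny feed": hypergeom_pmf(5, 6, 6, 25),
    # sample = 19 good products, special = 6 runny-feed runs
    "1 of the 19 good products had runny feed": hypergeom_pmf(1, 6, 19, 25),
    # sample = 6 runny-feed runs, special = 19 good products
    "1 of the 6 runny-feed runs gave good product": hypergeom_pmf(1, 19, 6, 25),
    # sample = 19 normal-feed runs, special = 6 bad products
    "1 of the 19 normal-feed runs gave bad product": hypergeom_pmf(1, 6, 19, 25),
}

for description, probability in views.items():
    print(f"{description}: {probability:.6f}")
```

Because the four cell counts are fixed once one cell and the marginals are known, every view describes the same table and therefore the same probability.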
Composite probability can be calculated using Fisher’s exact test
Fisher’s exact is the probability of sampling a particular configuration of a 2 by 2 table with constrained marginals
                Runny feed   Normal feed   Totals
Bad product          a             b         a+b
Good product         c             d         c+d
Totals              a+c           b+d      a+b+c+d
$$p_{\text{fisher}} = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{(a+b+c+d)!\;a!\,b!\,c!\,d!}$$

Numerator: # of ways the marginals can be arranged
Denominator: # of ways the total can be arranged, (a+b+c+d)!, times # of ways each observation can be arranged, a! b! c! d!
What are the odds of choosing 25 samples with these marginals in this configuration?
$$p_{\text{fisher}} = \frac{(6)!\,(19)!\,(6)!\,(19)!}{(25)!\;5!\,1!\,1!\,18!} = 0.00064$$
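A short Python sketch of the same factorial formula, offered as an illustration rather than the calculation used in the lecture. The function name `fisher_table_probability` is hypothetical; a, b, c and d are the four cell counts of the 2 by 2 table.

```python
from math import factorial

def fisher_table_probability(a, b, c, d):
    """Probability of one particular 2 by 2 table, given fixed marginals."""
    numerator = (factorial(a + b) * factorial(c + d)
                 * factorial(a + c) * factorial(b + d))
    denominator = (factorial(a + b + c + d)
                   * factorial(a) * factorial(b) * factorial(c) * factorial(d))
    # Python integers are exact, so only the final division is rounded.
    return numerator / denominator

p_observed = fisher_table_probability(5, 1, 1, 18)
print(f"{p_observed:.5f}")
```

The value agrees with the hypergeometric result, since both formulas count the same arrangements.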
[Figure: the same calculation in Mathematica]
But this is for this configuration alone! Is this one of many bad configurations?
[Figure: probability distribution contrasting the estimate at a particular value with the estimate at that value or further (more extreme values), i.e. a one-tailed test]
What are the odds of choosing 25 samples with these marginals in this configuration or more extreme?
A more extreme case with the same marginals

                Runny feed   Normal feed   Totals
Bad product          6             0          6
Good product         0            19         19
Totals               6            19         25

Observed configuration: Pfisher = 0.000644
More extreme configuration: Pfisher = 0.0000056
P-value = 0.000644 + 0.0000056 = 0.00065
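The summation over more extreme tables can be sketched as a small loop. This is an illustrative one-tailed version that only moves toward larger counts in the top-left cell (more bad product from runny feed); `table_probability` and `one_tailed_p_value` are hypothetical helper names.

```python
from math import comb

def table_probability(a, b, c, d):
    """Probability of one 2 by 2 table given its marginals (hypergeometric form)."""
    return comb(a + b, a) * comb(c + d, c) / comb(a + b + c + d, a + c)

def one_tailed_p_value(a, b, c, d):
    """Sum the probabilities of the observed table and of every table with a
    larger top-left count, holding all marginals fixed."""
    p_value = 0.0
    # Moving one unit into cell a forces b and c down by one and d up by one.
    while b >= 0 and c >= 0:
        p_value += table_probability(a, b, c, d)
        a, b, c, d = a + 1, b - 1, c - 1, d + 1
    return p_value

p_value = one_tailed_p_value(5, 1, 1, 18)
print(f"{p_value:.5f}")
```

For this table the loop visits only two configurations, (5, 1, 1, 18) and (6, 0, 0, 19), giving 115/177,100. If SciPy is available, `scipy.stats.fisher_exact([[5, 1], [1, 18]], alternative="greater")` should report the same one-sided p-value.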
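P-values
A p-value is the probability of obtaining a result at least as extreme as the one observed if the null hypothesis is true. A small p-value means the data would be unlikely under the null hypothesis.
Null hypothesis: most commonly, that the outcome is a completely random event, sometimes with constraints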
Examples of null hypotheses:
• Runny feed has no impact on product quality
• Points on a control chart are all drawn from the same distribution
• Two shipments of feed are statistically the same
Often p-values are considered significant if they are less than 0.05 or 0.001, but this limit is not guaranteed to be appropriate in all cases.
Look for a special cause
1) Data

                Runny feed   Normal feed   Totals
Bad product          5             1          6
Good product         1            18         19
Totals               6            19         25

2) Analysis: p-value = 0.00065 < 0.05
3) Conclusion: runny feed significantly impacts product quality
Note: Runny feed is neither the only cause nor a guaranteed one: one bad batch came from normal feed, and one runny-feed batch still gave good product.
Look for a special cause
3) Conclusion: runny feed is likely to impact product quality
What next?
• Look for root causes: What causes runny feed? Supplier? Temperature? Storage conditions? Lot number? Storage time?
  - very process dependent
• Develop a method to detect runny feed before it goes into the process
Take Home Messages
• After you identify that a system is out of control, take appropriate action
• Associations between variables can be identified using Fisher’s exact test and its associated p-value
• Once the cause of a disturbance is found, find a way to eliminate it