
INTRODUCCION ESTADISTICO

Date post: 14-Apr-2018
Upload: leibnytznewton
  • 7/30/2019 INTRODUCCION ESTADISTICO

    1/37

    Introductory Statistics

    By Peter Woolf

    University of Michigan

    Michigan Chemical Process Dynamics and Controls Open Textbook

    version 1.0

    Creative Commons


    A foolish consistency is the hobgoblin of little minds.

    --R. W. Emerson

    But is this always true?


    Consistency

    Why might we want consistency? For integration of products within a larger system.

    Examples: we want parts to fit together, and we want consistent chemical feeds, material properties, energy content, and flavor.


    Consistency: what can be the downsides of consistency?

    You can make something that is consistently bad.

    Sometimes people sacrifice quality for the sake of consistency; this is not the goal.

    Example: fast food vs. homemade food (which depends on the cook).


    Measures of Quality (or lack thereof):

    Six Sigma: number of defects per million opportunities.

    Genichi Taguchi: uniformity around a target value, or the loss a product imposes on society after it is shipped.

    Process control is a central tool for reducing variability by adjusting and correcting for variations.

    Key questions: How can we know if our control system is working well enough? How can we measure variability?


    Process-Specific Questions

    1) Do recent data indicate that the process is broken or has changed?

    2) Is the process out of control?

    3) What are the odds that two samples come from the same distribution?

    4) What factors influence this outcome?


    Detecting if a process has changed

    Scenario: You are a small Acai juice vendor trying to expand to a world market with a consistent product.


    Acai juice production

    [Figure: Acai berries from the market go into a berry crusher, which produces juice]


    Acai juice production

    A key selling point of your acai juice is that it contains a high concentration of antioxidants.

    With your berry crusher you get a good-quality product most of the time, but not always. You don't want to waste berries if your crusher is hurting your product, but how can you know if it is not working right? How can you test this?


    Acai juice production: how can you test this?

    1) Gather many samples from your current process and measure the antioxidant concentration.

    N sample values: 40.1, 41.3, 44.3, 39.3, 38.6, ...

    How do we summarize this?


    Average:

    $$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$

    Deviation from the average (standard deviation):

    $$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}$$
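    A minimal Python sketch of the two summary formulas above. It uses only the five readings shown on the slide (the remaining N values are elided), so the numbers are illustrative rather than the lecture's results.

```python
import math

# Only the five readings listed on the slide; the remaining N values are elided.
samples = [40.1, 41.3, 44.3, 39.3, 38.6]

def average(xs):
    """mu = (1/N) * sum of x_i"""
    return sum(xs) / len(xs)

def population_std(xs):
    """sigma = sqrt((1/N) * sum of (x_i - mu)^2)"""
    mu = average(xs)
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))

mu = average(samples)
sigma = population_std(samples)
print(f"mean = {mu:.2f}, standard deviation = {sigma:.3f}")
```

    For these five values it prints a mean of 40.72 and a standard deviation of about 2.002.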


    Deviation from the average (standard deviation):

    $$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}$$

    Interpretation: the typical (root-mean-square) distance of the samples from the mean, or the width of the spread around the mean.

    Problem: What if I have only one sample (i.e., N = 1)? Then $\sigma = 0$!

    Does this mean that the underlying process has no variation, or that I have not sampled it sufficiently?

    Result: when N is small, this standard deviation will underestimate the true variation.

    Solution: use the sample standard deviation, where $\bar{x}$ is the average of the samples:

    $$s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2}$$
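    The Python sketch below contrasts the two formulas. For a single reading the 1/N version returns 0, while the 1/(N − 1) version is undefined (returned here as NaN, a choice made for this sketch), which is the honest answer when there is not enough data. The simulation draws small samples from an assumed Gaussian process with mean 40 and standard deviation 2 and averages each estimate over many trials.

```python
import math
import random

def std_estimates(xs):
    """Return (population formula, sample formula) for one set of readings."""
    n = len(xs)
    m = sum(xs) / n
    ss = sum((x - m) ** 2 for x in xs)                          # sum of squared deviations
    pop = math.sqrt(ss / n)                                     # divide by N
    samp = math.sqrt(ss / (n - 1)) if n > 1 else float("nan")   # divide by N - 1
    return pop, samp

print(std_estimates([40.1]))   # one reading: (0.0, nan)

def average_estimates(n, trials=20_000, mu=40.0, sigma=2.0, seed=1):
    """Average both estimates over many samples of size n from an assumed Gaussian."""
    rng = random.Random(seed)
    pop_sum = samp_sum = 0.0
    for _ in range(trials):
        pop, samp = std_estimates([rng.gauss(mu, sigma) for _ in range(n)])
        pop_sum += pop
        samp_sum += samp
    return pop_sum / trials, samp_sum / trials

for n in (2, 5, 30):
    pop, samp = average_estimates(n)
    print(f"N = {n:2d}: 1/N gives {pop:.3f}, 1/(N-1) gives {samp:.3f} (true value 2.0)")
```

    For small N both averages fall below 2.0, but the 1/(N − 1) version is much closer. Strictly, dividing by N − 1 makes the variance unbiased; the standard deviation itself stays slightly low for very small N.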


    Population standard deviation (real deviation):

    $$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}$$

    Sample standard deviation (observed deviation):

    $$s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2}$$

    With a measure of the mean and standard deviation, you have enough information to define a Gaussian distribution: a bell-curve shape based on a model of a large number of random, uncorrelated changes.


    Gaussian or Normal Distribution:

    From the previous lecture on noise, a Gaussian distribution can be approximated in Excel by: =RAND()+RAND()+RAND()-RAND()-RAND()-RAND()

    The approximation gets better and better as more add/subtract pairs are used.
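    The same trick can be sketched in Python. Each RAND() is uniform on [0, 1) with variance 1/12, so three added and three subtracted copies give a distribution centred on 0 with variance 6/12 = 0.5. The `pairs` parameter is an illustrative addition for trying more add/subtract pairs.

```python
import random

def excel_style_gaussian(rng, pairs=3):
    """Mimic =RAND()+RAND()+RAND()-RAND()-RAND()-RAND() using `pairs` add/subtract pairs."""
    return sum(rng.random() - rng.random() for _ in range(pairs))

rng = random.Random(0)
draws = [excel_style_gaussian(rng) for _ in range(100_000)]

mean = sum(draws) / len(draws)
variance = sum((d - mean) ** 2 for d in draws) / len(draws)
print(f"mean = {mean:.4f} (expected 0), variance = {variance:.4f} (expected 0.5)")
```

    With more pairs the variance grows as pairs/6 (each pair contributes 2/12), while the shape moves closer to the bell curve.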

    The Gaussian distribution is the basis of much of statistical quality control, Six Sigma, and quality engineering in general.

    [Figure: bell curve marked at ±2σ and ±3σ, with the 6σ window]

    How do we mathematically define a normal distribution?


    The mean and standard deviation are sufficient statistics, meaning that they are sufficient to describe a normal distribution.

    Mathematically, we can describe a normal distribution by the following probability density function:

    $$\mathrm{PDF}(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]$$
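    A direct Python transcription of the density above, checked against `statistics.NormalDist.pdf` from the Python standard library (available since Python 3.8). The values μ = 40 and σ = 2 are borrowed from the acai example that follows.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """PDF(x | mu, sigma) = 1 / (sigma * sqrt(2*pi)) * exp(-1/2 * ((x - mu) / sigma)**2)"""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

reference = NormalDist(mu=40.0, sigma=2.0)
for x in (36.0, 40.0, 43.5):
    print(f"x = {x}: formula {normal_pdf(x, 40.0, 2.0):.6f}, NormalDist {reference.pdf(x):.6f}")
```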


    If we want the probability of a value at or below some point z, we integrate the density:

    $$\int_{-\infty}^{z} \mathrm{PDF}(x \mid \mu, \sigma)\,dx = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{z-\mu}{\sigma\sqrt{2}}\right)\right]$$

    (Note: this just turns one hard problem into another, in that now we have to calculate the error function.)

    The error function is defined as:

    $$\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}}\int_{0}^{x} e^{-t^2}\,dt$$

    How can we calculate this?

    Excel: the error function is ERF(), so the solution above can be expressed as

    =1/2*(1+erf((z-m)/(s*sqrt(2))))

    Mathematica:

    NIntegrate[f[x], {x, start, end}]   (general numerical integration)

    or

    N[1/2*(1+Erf[(z-m)/(s*Sqrt[2])])]   (using the analytical solution with the error function)
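    The two Mathematica routes (numerical integration and the closed form with the error function) can be mirrored in Python. `math.erf` is in the standard library; the numerical version is a plain Simpson's rule written for this sketch, with −∞ replaced by μ − 10σ, an assumed cut-off whose neglected tail area is negligibly small.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def cdf_erf(z, mu, sigma):
    """Closed form: 1/2 * (1 + erf((z - mu) / (sigma * sqrt(2))))"""
    return 0.5 * (1.0 + math.erf((z - mu) / (sigma * math.sqrt(2.0))))

def cdf_numeric(z, mu, sigma, steps=20_000):
    """Simpson's rule on the PDF from mu - 10*sigma (standing in for -infinity) up to z."""
    a = mu - 10.0 * sigma
    if z <= a:
        return 0.0
    steps += steps % 2                      # Simpson's rule needs an even number of intervals
    h = (z - a) / steps
    total = normal_pdf(a, mu, sigma) + normal_pdf(z, mu, sigma)
    for i in range(1, steps):
        weight = 4.0 if i % 2 else 2.0      # 4 on odd nodes, 2 on even interior nodes
        total += weight * normal_pdf(a + i * h, mu, sigma)
    return total * h / 3.0

print(cdf_erf(37.5, 40.0, 2.0), cdf_numeric(37.5, 40.0, 2.0))
```

    The two routes agree to many decimal places; the closed form is simply much cheaper.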


    Acai juice problem revisited

    From 100 samples of the current process, we calculate the following:

    Mean = 40 units

    Standard deviation = 2 units

    From these data, what are the odds that the next batch will have an antioxidant value of 37.5 or less?

    $$\int_{-\infty}^{37.5} \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right] dx = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{37.5-\mu}{\sigma\sqrt{2}}\right)\right], \quad \mu = 40,\ \sigma = 2$$


    In Mathematica, this can be evaluated with shorthand notation.

    Answer: we expect this situation about 10% of the time (≈ 0.106).
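    The acai calculation in Python, using `math.erf` with the slide's mean of 40 units and standard deviation of 2 units. `prob_at_or_below` is a name chosen for this sketch.

```python
import math

def prob_at_or_below(z, mu, sigma):
    """P(X <= z) for a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((z - mu) / (sigma * math.sqrt(2.0))))

# Mean 40 units and standard deviation 2 units, estimated from 100 samples.
p = prob_at_or_below(37.5, mu=40.0, sigma=2.0)
print(f"P(antioxidant <= 37.5) = {p:.3f}")
```

    This prints 0.106, i.e. roughly one batch in ten.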


    Example 1: Say that we have a reactor with a mean temperature of 100 degrees and a standard deviation of 5 degrees. Calculate the probability of measuring a temperature of 92 or less.

    $$\int_{-\infty}^{92} \mathrm{PDF}(x \mid 100, 5)\,dx = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{92-100}{5\sqrt{2}}\right)\right] = \frac{1}{2}\left[1 + \mathrm{erf}(-1.13)\right] \approx 0.055$$

    What about 100 or less? 0.5, since 100 is the mean.

    Example 2: Given this same system, what is the probability that the reactor is within 2σ of the mean (i.e., ±10 degrees, a window 4σ wide)?

    $$\int_{-\infty}^{110} \mathrm{PDF}(x \mid 100, 5)\,dx - \int_{-\infty}^{90} \mathrm{PDF}(x \mid 100, 5)\,dx = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{110-100}{5\sqrt{2}}\right)\right] - \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{90-100}{5\sqrt{2}}\right)\right] = 0.9545$$
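    Both reactor examples can be checked with `statistics.NormalDist` from the Python standard library, whose `cdf` method evaluates the same integral from −∞ to the given point.

```python
from statistics import NormalDist

reactor = NormalDist(mu=100.0, sigma=5.0)        # reactor temperature model from the examples

p_92 = reactor.cdf(92.0)                         # Example 1: P(T <= 92)
p_100 = reactor.cdf(100.0)                       # P(T <= 100), exactly half
p_band = reactor.cdf(110.0) - reactor.cdf(90.0)  # Example 2: P(90 <= T <= 110)

print(f"{p_92:.4f} {p_100:.4f} {p_band:.4f}")
```

    This prints 0.0548 0.5000 0.9545.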



    Acai juice production as a function of time

    [Figure: antioxidant value of successive batches plotted against time]

    Is this process out of control?

    Yes: it is unusual to see so many batches with such a high value; this is strange and suggests something has changed.

    No: this is just normal variation; nothing is fundamentally different.

    Key question: how do we define "unusual"?


    One definition: variation outside the six-sigma window (beyond ±3σ from the mean) is unusual.


    [Figure: bell curve marked at ±2σ and ±3σ, with the 6σ window]

    What are the odds of finding something that falls outside this bound by chance? Find out by integration: for both tails the probability is ~0.0027, or 1 in 370.

    Common confusion: the Six Sigma process defines "unusual" as 3.4 defects out of 1 million, not as falling outside 6 standard deviations (it is more like 10.2 deviations).
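    The two-tail figure follows from the standard normal CDF. A short Python check with `statistics.NormalDist`:

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1
# Both tails beyond +/-3 sigma: twice the upper-tail area.
p_outside = 2.0 * (1.0 - std_normal.cdf(3.0))
print(f"P(outside +/-3 sigma) = {p_outside:.4f}, about 1 in {1.0 / p_outside:.0f}")
```

    This prints 0.0027, about 1 in 370.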


    Acai juice production as a function of time

    [Figure: antioxidant value vs. time]

    Is this process out of control?

    Translation: if we assume that variation outside the six-sigma window is unusual, is this pattern expected to happen in fewer than 1 in 370 of our samples? Solution: control charts!


    [Figure: control chart; image from Wikipedia, western_electric_rules]

    Control charts determine whether a process is behaving in an unusual way.


    What are the odds? (UCL = upper control limit, X-bar = average, LCL = lower control limit.)

    Rule 1: if each dot is a single measurement and the UCL and LCL sit at ±3σ, then for both tails the probability is ~0.0027, or 1 in 370.


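    Rule 2: this can be done using probability theory. Assuming each sample is independent, the total probability is:

    $$2\left[P_1^{+}P_2^{+}P_3^{+} + P_1^{+}P_2^{+}P_3^{\mathrm{in}} + P_1^{+}P_2^{\mathrm{in}}P_3^{+} + P_1^{\mathrm{in}}P_2^{+}P_3^{+}\right] \approx 0.00305$$

    where every $P_i^{+} = P(\text{out}+)$, the probability that a sample falls beyond +2σ, and every $P_i^{\mathrm{in}} = P(\text{in}) = 1 - P(\text{out}+)$. The factor of 2 covers the same pattern below −2σ. This is about 1 in 326.

    Odds so far: Rule 1, 1 in 370; Rule 2, 1 in 326.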
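    The same sum in Python. As an assumption consistent with the 0.00305 figure, P(out+) is taken to be the probability that one point falls beyond +2σ, and every sample is treated as independent, as on the slide. Evaluated without rounding, the result is about 0.00306, roughly 1 in 327, which the slide rounds to 1 in 326.

```python
from statistics import NormalDist

p_out = 1.0 - NormalDist().cdf(2.0)   # P(out+): one sample beyond +2 sigma
p_in = 1.0 - p_out                    # P(in)

# At least two of three samples beyond +2 sigma:
# all three out, or exactly one of the three in (three arrangements).
one_side = (p_out * p_out * p_out
            + p_out * p_out * p_in
            + p_out * p_in * p_out
            + p_in * p_out * p_out)
p_rule2 = 2.0 * one_side              # same pattern below -2 sigma
print(f"P(rule 2) = {p_rule2:.5f}, about 1 in {1.0 / p_rule2:.0f}")
```

    The four terms simplify to 3p² − 2p³ with p = P(out+), which is a handy check.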


    What are the odds? Alternative solution by sampling

    Approach: generate thousands of samples and test how many satisfy the rule.

    Rule 1: the result is similar to 1 in 370.
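    A sampling version of Rule 1 in Python, a sketch rather than the lecture's Mathematica code: draw many independent standard-normal points and count how many land beyond ±3σ. The sample size and seed are arbitrary choices.

```python
import random

def simulate_rule1(n_points=200_000, seed=42):
    """Estimate P(single point beyond +/-3 sigma) by sampling standard normals."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_points) if abs(rng.gauss(0.0, 1.0)) > 3.0)
    return hits / n_points

p_hat = simulate_rule1()
print(f"estimated P = {p_hat:.5f}, about 1 in {1.0 / p_hat:.0f}")
```

    More points shrink the sampling error; with 200,000 points the estimate typically lands within a few percent of 1 in 370.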


    What are the odds? Alternative solution by sampling

    Rule 2: the result is similar to 1 in 326.

    Message: many complex decision processes can be evaluated numerically with good accuracy (see the Mathematica code on the website under Lecture 21.nb).
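    Rule 2 by sampling, again as a Python sketch rather than the Mathematica notebook Lecture 21.nb: generate independent groups of three standard-normal points and count the groups in which at least two fall beyond 2σ on the same side. Using non-overlapping groups is a simplification chosen to match the independence assumption of the analytical calculation.

```python
import random

def simulate_rule2(n_groups=200_000, seed=7):
    """Fraction of independent 3-point groups with >= 2 points beyond 2 sigma on one side."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_groups):
        group = [rng.gauss(0.0, 1.0) for _ in range(3)]
        above = sum(1 for z in group if z > 2.0)
        below = sum(1 for z in group if z < -2.0)
        if above >= 2 or below >= 2:
            hits += 1
    return hits / n_groups

p_hat = simulate_rule2()
print(f"estimated P = {p_hat:.5f}, about 1 in {1.0 / p_hat:.0f}")
```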


    [Figure panels with odds of 1 in 370, 1 in 326, 1 in 256, and 1 in 180]

    In all cases these are somewhat rare events in a statistical sense, but they are not all equally rare.

    Nor are these the only patterns worth checking. For example, what are the odds of finding 15 consecutive samples in zone C? About 1 in 306.

    Thus, is this system out of control? Yes, but in a good way (the variation is unusually small).
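    The zone C figure can be reproduced in Python, assuming (as in the usual Western Electric zone layout) that zone C is the band within ±1σ of the centre line and that successive samples are independent.

```python
from statistics import NormalDist

std_normal = NormalDist()
p_zone_c = std_normal.cdf(1.0) - std_normal.cdf(-1.0)   # one point within +/-1 sigma
p_run = p_zone_c ** 15                                  # 15 independent points in a row
print(f"P(zone C) = {p_zone_c:.4f}; P(15 in a row) = {p_run:.5f}, about 1 in {1.0 / p_run:.0f}")
```

    The unrounded result is about 1 in 306.7; the slide truncates this to 1 in 306.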


    Acai juice problem revisited

    What if you know that each batch of berries has some variation, but you are unsure whether the machine is behaving strangely? Can you still use your control charts?

    Solution: take samples from each batch, average them, and plot these average values and statistics on a control chart.

    Problem: averaging different samples will change your odds, because averaging reduces the variation (the average of n independent samples has standard deviation σ/√n).

    Day 1: 40.36, 39.36, 38.43, 39.67

    Day 2: 39.96, 40.32, 39.88, 39.75
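    A Python sketch of why averaging changes the odds. It first summarises the two days listed above (four readings each), then simulates an assumed Gaussian process with σ = 2 to show that averages of n = 4 readings spread by about σ/√4 = 1.

```python
import math
import random
import statistics

# Four readings per day, as listed on the slide.
days = {
    "Day 1": [40.36, 39.36, 38.43, 39.67],
    "Day 2": [39.96, 40.32, 39.88, 39.75],
}
for name, readings in days.items():
    avg = statistics.fmean(readings)
    s = statistics.stdev(readings)          # sample standard deviation (divides by N - 1)
    print(f"{name}: average = {avg:.4f}, s = {s:.3f}")

# Assumed process: single readings ~ Gaussian(mean 40, sigma 2).
rng = random.Random(3)
n, sigma, trials = 4, 2.0, 20_000
averages = [statistics.fmean([rng.gauss(40.0, sigma) for _ in range(n)]) for _ in range(trials)]
spread = statistics.pstdev(averages)
print(f"spread of averages = {spread:.3f}, sigma/sqrt(n) = {sigma / math.sqrt(n):.3f}")
```

    Because the averages are tighter than single readings, the control limits for an X-bar chart must be computed from the averages, not from the raw readings.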


    Acai process control using X-bar charts

    Raw data: plotting the raw data, it is hard to say whether anything is going on.


    To get a chart like this, we need the UCL and LCL.

    Data: Excel example online, Lecture.21.xls


    $$\text{UCL} = \text{grand avg} + A_3 \times (\text{avg stdev}) = 39.86 + 1.628 \times 0.55 = 40.76$$


    Note: if you use A2 instead, you use the average range R. The result is 40.77, nearly the same.

    $$\text{LCL} = \text{grand avg} - A_3 \times (\text{avg stdev}) = 38.96$$

    The UCL is 3 standard deviations away from the mean, so the line between zones A and B is 2 standard deviations away:

    $$\text{A/B line} = \text{grand avg} + A_3 \times (\text{avg stdev}) \times \tfrac{2}{3} = 40.46$$
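    The limit arithmetic in Python, using the slide's grand average (39.86), average standard deviation (0.55), and A3 = 1.628, the tabulated constant for subgroups of four readings. The B/C line at 1σ is an extra, computed the same way for completeness.

```python
# X-bar chart limits from the slide's numbers.
grand_avg = 39.86    # average of the daily averages
avg_stdev = 0.55     # average of the daily sample standard deviations
A3 = 1.628           # A3 constant for subgroups of n = 4 readings

half_width = A3 * avg_stdev                 # centre line to UCL/LCL: 3 sigma of the averages
ucl = grand_avg + half_width
lcl = grand_avg - half_width
ab_upper = grand_avg + half_width * 2 / 3   # zone A/B boundary (2 sigma)
bc_upper = grand_avg + half_width * 1 / 3   # zone B/C boundary (1 sigma)

print(f"UCL = {ucl:.2f}, LCL = {lcl:.2f}, A/B line = {ab_upper:.2f}, B/C line = {bc_upper:.2f}")
```

    This prints UCL = 40.76, LCL = 38.96, A/B line = 40.46, B/C line = 40.16.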


    X-bar chart: is it in control?

    Rule 1: okay, no points beyond zone A.

    Rule 2: fail, points 9 and 10 are in zone A.

    Rules 3 and 4: okay.

    Conclusion: not in statistical control.
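    A minimal Python sketch of checking Rules 1 and 2 automatically. The lecture's X-bar values are not listed in the text, so the ten daily averages below are hypothetical, chosen so that points 9 and 10 fall in upper zone A. The centre line and the σ of the plotted averages come from the limits above (σ = A3 × avg stdev / 3); the function names are illustrative.

```python
def zone_scores(points, center, sigma):
    """Convert each plotted value to a z-score relative to the chart's centre line."""
    return [(x - center) / sigma for x in points]

def rule1_violations(z):
    """Rule 1: indices of single points beyond +/-3 sigma."""
    return [i for i, zi in enumerate(z) if abs(zi) > 3.0]

def rule2_violations(z):
    """Rule 2: indices ending a run of 3 points with at least 2 beyond 2 sigma on the same side."""
    hits = []
    for i in range(2, len(z)):
        window = z[i - 2:i + 1]
        if sum(zi > 2.0 for zi in window) >= 2 or sum(zi < -2.0 for zi in window) >= 2:
            hits.append(i)
    return hits

center = 39.86                  # grand average (centre line)
sigma_xbar = 1.628 * 0.55 / 3   # UCL sits 3 sigma above the centre line

# Hypothetical daily averages (not the lecture's data).
xbar = [39.9, 39.7, 40.0, 39.8, 40.1, 39.6, 39.9, 40.2, 40.5, 40.6]
z = zone_scores(xbar, center, sigma_xbar)
print("Rule 1 violations at points:", [i + 1 for i in rule1_violations(z)])
print("Rule 2 violations ending at points:", [i + 1 for i in rule2_violations(z)])
```

    For these hypothetical values it reports no Rule 1 violations and a Rule 2 violation ending at point 10.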


    Take Home Messages

    Statistical process control is a method for systematically identifying inconsistencies.

    Probabilities are often based on a Gaussian (normal) distribution.

    Control charts provide a systematic method for evaluating whether a process is under control.


    A foolish consistency is the hobgoblin of little minds.

    --R. W. Emerson

    An intelligent consistency is a virtue in an integrated global economy.

    --Anonymous

