
Post on 11-Jan-2016


department of mathematics and computer science

2DS01

Statistics 2 for Chemical Engineering

http://www.win.tue.nl/~sandro/2DS01


Lecturers

• Marko Boon (m.a.a.boon@tue.nl)

• Dr. A. Di Bucchianico (a.d.bucchianico@tue.nl)

• Ir. G.D. Mooiweer (g.d.mooiweer@tue.nl)

• Drs. C.M.J. Rusch – Groot (c.m.j.rusch@tue.nl)


Important to remember

• Web site for this course: http://www.win.tue.nl/~sandro/2DS01/

• No textbook, but handouts + Powerpoint sheets through web site

• Bring a laptop to the fourth lecture (12th of April) and to self-study

• Software:

– Statgraphics (version 5.1). If not installed, install through http://w3.tue.nl/nl/diensten/dienst_ict/organisatie/groepen/wins/campus_software/

– Java (at least version 1.4). Install through http://java.com. Java is needed to run Statlab (http://www.win.tue.nl/statlab).

Important: In order to run Statlab during the exams, security settings have to be adjusted!


Goals of this course

• teach students the need for a statistical basis of experimentation

• teach students statistical tools for experimentation

– design of experiments (factorial designs, optimal designs)

– analysis of experiments (ANOVA)

– use of statistical software

• give students a short introduction to recent developments


Week schedule

Week 1: Introduction to Analysis of Variance (ANOVA)

Week 2: Factorial designs: screening

Week 3: Factorial designs: optimisation

Week 4: Optimal experimental design and mixture designs (by A. Di Bucchianico; bring your laptop!)


Detailed contents of week 1

• statistics and experimentation

• short recapitulation of regression analysis

• one-way ANOVA

• one-way ANOVA with blocks

• multiple comparisons


Statistics and experimentation

Chemical experiments often depend on several factors (pressure, catalyst, temperature, reaction time, ...)

Two important questions:

• which factors are really important?

• what are optimal settings for important factors?


Use of statistical experimentation in chemical engineering

• Chemical synthesis (synthetic steps; work-up and separation; reagents, solvents, catalysts; structure, reactivity and properties, ...)

• Biotech industry (drug design, analytical biochemistry, process optimization: fermentation, purification, ...)

• Process industry (process optimization and control: yield, purity, throughput time, pollution, energy consumption; product quality and performance: material strength, warp, color, taste, odour; ...)

• ...


Short history of statistics and experimentation

• 1920’s - ... introduction of statistical methods in agriculture by Fisher and co-workers

• 1950’s - ... introduction in chemical engineering (Box, ...)

• 1980’s - ... introduction in Western industry of the Japanese approach (Taguchi, robust design)

• 1990’s - ... combinatorial chemistry, high-throughput processing


Link to Statistics 1 for Chemical Engineering

• introduction to measurements

– data analysis

– error propagation

• regression analysis

• use of statistical software (Statgraphics)


Types of regression analysis

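Linear means linear in the coefficients, not that the fitted function is linear!

• Simple linear regression: $Y = \beta_0 + \beta_1 x + \varepsilon$

• Multiple linear regression: $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \varepsilon$

• Non-linear regression, e.g. $Y = \beta_1 C^{\beta_2} + \varepsilon$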


Linear regression

Model: $Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \varepsilon_i$

Assumptions:

• the model is linear (+ enough terms)

• the $\varepsilon_i$'s are normally distributed with mean $\mu = 0$ and variance $\sigma^2$

• the $\varepsilon_i$'s are independent.
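A minimal sketch of least-squares estimation for a model that is linear in its coefficients, here a quadratic in x. The data are synthetic and noise-free, and the helper names `solve` and `fit_polynomial` are illustrative assumptions; this is not how Statgraphics is implemented.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares estimates b0..b_degree via the normal equations (X'X) b = X'y."""
    rows = [[x ** p for p in range(degree + 1)] for x in xs]  # design matrix X
    k = degree + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic data generated from y = 1 + 2x + 3x^2 (not a real data set).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]
coefficients = fit_polynomial(xs, ys, degree=2)
print([round(c, 6) for c in coefficients])
```

The fitted curve is a parabola, yet the estimation problem is linear because each coefficient enters the model only as a multiplier of a known column of the design matrix.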


Specific heat

• specific heat of vapour at constant pressure as a function of temperature

• data set from Perry’s Chemical Engineers’ Handbook

• thermodynamic theories say that a quadratic relation between temperature and specific heat usually suffices:

$C_p = \beta_0 + \beta_1 T + \beta_2 T^2$


Scatter plot of specific heat data

[Figure: plot of Cp vs T; T axis from 250 to 400, Cp axis from 1800 to 2200]


Regression output specific heat data

Polynomial Regression Analysis
-----------------------------------------------------------------------------
Dependent variable: Cp
-----------------------------------------------------------------------------
                                Standard           T
Parameter        Estimate         Error        Statistic       P-Value
-----------------------------------------------------------------------------
CONSTANT          3590.36        76.3041         47.0533        0.0000
T                -12.1386       0.454369        -26.7153        0.0000
T^2             0.0213415    0.000670762         31.8169        0.0000
-----------------------------------------------------------------------------

Analysis of Variance
-----------------------------------------------------------------------------
Source          Sum of Squares     Df    Mean Square    F-Ratio      P-Value
-----------------------------------------------------------------------------
Model                 169252.0      2        84626.2    6227.13       0.0000
Residual               285.388     21        13.5899
-----------------------------------------------------------------------------
Total (Corr.)         169538.0     23

R-squared = 99.8317 percent
R-squared (adjusted for d.f.) = 99.8156 percent
Standard Error of Est. = 3.68645
Mean absolute error = 2.94042
Durbin-Watson statistic = 0.310971 (P=0.0000)
Lag 1 residual autocorrelation = 0.640511
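As a check on how the summary figures hang together, the sketch below recomputes several of them from the sums of squares and degrees of freedom in the output above. The variable names are illustrative; small differences from the printed values are expected because the printed sums of squares are rounded.

```python
import math

# Values copied from the Statgraphics output above.
ss_model, df_model = 169252.0, 2
ss_resid, df_resid = 285.388, 21
ss_total, df_total = 169538.0, 23

ms_model = ss_model / df_model            # mean square = sum of squares / degrees of freedom
ms_resid = ss_resid / df_resid
f_ratio = ms_model / ms_resid             # F-Ratio of the model
r_squared = 1 - ss_resid / ss_total
r_squared_adj = 1 - (ss_resid / df_resid) / (ss_total / df_total)
std_error_est = math.sqrt(ms_resid)       # Standard Error of Est.

# The T statistic of a parameter is its estimate divided by its standard error.
t_quadratic = 0.0213415 / 0.000670762

print(f"F-ratio               {f_ratio:.2f}")
print(f"R-squared             {100 * r_squared:.4f} percent")
print(f"R-squared (adjusted)  {100 * r_squared_adj:.4f} percent")
print(f"Standard error of est {std_error_est:.5f}")
print(f"T statistic for T^2   {t_quadratic:.4f}")
```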


Issues in regression output

• significance of model

• significance of individual regression parameters

• residual plots:

– normality (density trace, normal probability plot)

– constant variance (against predicted values + each independent variable)

– model adequacy (against predicted values)

– outliers

– independence

• influential points
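The independence check can be made concrete with the Durbin-Watson statistic that appears in the regression output. The sketch below computes it for two made-up residual sequences (illustration only, not the specific heat residuals); the function name is an assumption. Values near 2 are consistent with uncorrelated residuals, while values well below 2, such as the 0.31 reported for the quadratic model, point to positive autocorrelation.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Made-up residuals, listed in measurement order.
drifting = [1, 2, 3, 2, 1, -1, -2, -3, -2, -1]      # neighbours resemble each other
alternating = [1, -1, 1, -1, 1, -1, 1, -1, 1, -1]   # neighbours have opposite signs

print(durbin_watson(drifting))     # well below 2: positive autocorrelation
print(durbin_watson(alternating))  # well above 2: negative autocorrelation
```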


Residual plot specific heat data

This behaviour is visible in the plot of the fitted line only after rescaling!

[Figure: residual plot; Studentized residual (axis from -3.8 to 4.2) against predicted Cp (axis from 1800 to 2200)]


Plot of fitted quadratic model for specific heat data

[Figure: plot of fitted model; Cp (axis from 1800 to 2200) against T (axis from 250 to 450)]


Conclusion regression models for specific heat data

• we need a third-order model (polynomial of degree 3)

• careful with extrapolation

• original data set contains influential points

• original data set contains potential outliers


Analysis of variance

• name refers to the mathematical technique, not to the goal

• comparison of means (!!) using variances (extension of the t-test to more than 2 samples)

• samples usually are groups of measurements with constant factor settings


Example: ANOVA

production of yarns: influence of fibre composition on breaking tension

simplification:

one factor: % cotton

fixed factor levels: 15%, 20%, 25%, 30%, 35%

experimental design: produce on the same machine 5 threads of each type of fibre composition in random order


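Statistical setting

Basic model: $Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$

• $\mu$: overall mean

• $\alpha_i$: influence of factor level $i$, $i = 1, 2, \dots, k$

• $\varepsilon_{ij}$: error term, normally distributed with mean 0 and variance $\sigma^2$, independent

• $j = 1, 2, \dots, n$: replications

Basic hypotheses:

$H_0$: $\alpha_i = 0$ for all $i$

$H_1$: $\alpha_i \neq 0$ for at least one $i$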


Expectation under H0 (= no effect of factor level)

• spread of observations with respect to group means: chance

• spread of group means with respect to overall mean: chance


Expectation under H1

• spread of observations with respect to group means: chance

• spread of group means with respect to overall mean: systematic (in addition to chance)


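Illustration of group means

[Figure: observations in three groups, with group means $\bar{y}_1$, $\bar{y}_2$, $\bar{y}_3$ and overall mean $\bar{y}$]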


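Group means versus overall mean

[Figure: group means $\bar{y}_1$, $\bar{y}_2$, $\bar{y}_3$ and overall mean $\bar{y}$, with three distances marked for an observation $y_{3j}$ in group 3: $y_{3j} - \bar{y}_3$, $\bar{y}_3 - \bar{y}$ and $y_{3j} - \bar{y}$]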


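Conclusion

Comparison of both spreads yields an indication for H0 vs H1.

Spreads are converted into sums of squares:

$$\underbrace{\sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{..})^2}_{\text{total}} = \underbrace{n \sum_{i=1}^{k} (\bar{y}_{i.} - \bar{y}_{..})^2}_{\text{treatment: between groups}} + \underbrace{\sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.})^2}_{\text{rest: within groups}}$$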
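The decomposition of the total sum of squares into a between-groups part and a within-groups part can be checked numerically. The sketch below uses made-up numbers for k = 3 groups of n = 3 observations; the function name `one_way_sums_of_squares` is illustrative.

```python
def one_way_sums_of_squares(groups):
    """Return (total, between, within) sums of squares for a list of groups."""
    all_obs = [y for g in groups for y in g]
    grand_mean = sum(all_obs) / len(all_obs)
    group_means = [sum(g) / len(g) for g in groups]

    ss_total = sum((y - grand_mean) ** 2 for y in all_obs)
    # Each group mean deviation is weighted by the group size n.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    return ss_total, ss_between, ss_within


# Made-up observations (illustration only).
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [6.0, 7.0, 8.0]]
ss_total, ss_between, ss_within = one_way_sums_of_squares(groups)
print(ss_total, ss_between, ss_within)  # total equals between + within
```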


Mean Sums of Squares

sums of squares differ with respect to the number of contributions.

for a fair comparison: divide by the degrees of freedom.

• we expect under H0: $MS_{between} \approx MS_{within}$

• we expect under H1: $MS_{between} \gg MS_{within}$

summary in ANOVA table
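A sketch of how the ANOVA table entries could be computed for a one-factor design, with k - 1 degrees of freedom between groups and N - k within groups (N is the total number of observations). The data are made up and the function name `one_way_anova` is an assumption. The P-value is omitted because it requires the F distribution, which statistical software or tables provide.

```python
def one_way_anova(groups):
    """Sums of squares, degrees of freedom, mean squares and F for a one-factor design."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(y for g in groups for y in g) / n_total
    means = [sum(g) / len(g) for g in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)

    df_between = k - 1        # k group means around one overall mean
    df_within = n_total - k   # N observations around k group means
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "between": (ss_between, df_between, ms_between),
        "within": (ss_within, df_within, ms_within),
        "F": ms_between / ms_within,
    }


# Made-up observations (illustration only).
table = one_way_anova([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [6.0, 7.0, 8.0]])
for source in ("between", "within"):
    ss, df, ms = table[source]
    print(f"{source:8s} SS={ss:6.2f}  Df={df}  MS={ms:6.2f}")
print(f"F-ratio  {table['F']:.2f}")
```

A large F-ratio means the spread of the group means is much larger than chance alone would explain, which speaks against H0.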


Completely Randomized One-factor Design

Experiment, in which one factor varies on k levels.

At each level n measurements are taken.

The order of all measurements is random.
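A run order for such a design could be generated as sketched below, using the five cotton percentages and five threads per level from the yarn example. The function name and the `seed` parameter are illustrative assumptions; the seed only makes the shuffle reproducible.

```python
import random


def completely_randomized_order(levels, n, seed=None):
    """Return all k*n runs (n per level) in a single random order."""
    runs = [level for level in levels for _ in range(n)]
    rng = random.Random(seed)
    rng.shuffle(runs)  # one shuffle over all runs, not per level
    return runs


order = completely_randomized_order(["15%", "20%", "25%", "30%", "35%"], n=5, seed=1)
for run_number, level in enumerate(order, start=1):
    print(run_number, level)
```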


Multiple comparisons

• ANOVA only indicates whether there are significantly different group means

• ANOVA does not indicate which groups have different means (although we may construct confidence intervals for differences)

• various methods exist for correctly performing pairwise comparisons:

– LSD (Least Significant Difference) method

– HSD (Honestly Significant Difference) method

– Duncan

– Newman-Keuls

– ...
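As an illustration of the first method in the list, the sketch below applies the LSD rule for a balanced design: two group means are declared different when their absolute difference exceeds t * sqrt(2 * MS_within / n). The group means, MS_within and n are made-up numbers; the critical value 2.447 is the two-sided 5% point of the t distribution with 6 degrees of freedom and is passed in by hand, since the Python standard library has no t quantile function. The function name is an assumption.

```python
import math
from itertools import combinations


def lsd_comparisons(group_means, ms_within, n, t_critical):
    """Return the LSD and, per pair of groups, (i, j, |difference|, significant?)."""
    lsd = t_critical * math.sqrt(2 * ms_within / n)
    results = []
    for (i, mean_i), (j, mean_j) in combinations(enumerate(group_means), 2):
        diff = abs(mean_i - mean_j)
        results.append((i, j, diff, diff > lsd))
    return lsd, results


# Made-up group means for 3 groups of n = 3 observations, MS_within = 1 with 6 d.f.
lsd, results = lsd_comparisons([2.0, 3.0, 7.0], ms_within=1.0, n=3, t_critical=2.447)
print(f"LSD = {lsd:.3f}")
for i, j, diff, significant in results:
    print(f"group {i + 1} vs group {j + 1}: difference {diff:.1f}, significant: {significant}")
```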


Randomized one-factor block design

Experiment with one factor and observations in blocks.

Blocks are levels of a noise factor.

In each block all treatments occur equally often; randomization within blocks.


Example

testing method for material hardness: a force presses a pressure pin (tip) into a strip of testing material

practical problem: there are 4 types of pressure pins; do these yield the same results?


Experimental design 1

pin      testing strips
pin 1    1, 2, 3, 4
pin 2    5, 6, 7, 8
pin 3    9, 10, 11, 12
pin 4    13, 14, 15, 16

Problem: if the measurements of strips 5 through 8 differ, is this caused by the strips or by pin 2?


Experimental design 2

Take 4 strips on which you measure (in random order) each pressure pin once:

strip      pressure pins
strip 1    1, 3, 2, 4
strip 2    1, 4, 3, 2
strip 3    4, 3, 2, 1
strip 4    2, 3, 1, 4
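A layout of this kind could be generated by shuffling the treatments separately within each block, as in the sketch below. The function name, the labels and the `seed` parameter are illustrative assumptions.

```python
import random


def randomized_block_design(treatments, blocks, seed=None):
    """For each block, return all treatments once in an independent random order."""
    rng = random.Random(seed)
    design = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)  # randomization within the block
        design[block] = order
    return design


pins = [1, 2, 3, 4]
strips = ["strip 1", "strip 2", "strip 3", "strip 4"]
design = randomized_block_design(pins, strips, seed=1)
for strip, pin_order in design.items():
    print(strip, pin_order)
```

Every pin is measured on every strip, so a difference between strips affects all pins equally.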


Blocking

Advantage of blocked experimental design 2: differences between strips are filtered out

Model: $Y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$

• $\alpha_i$: factor (pressure pin)

• $\beta_j$: block effect (strip)

• $\varepsilon_{ij}$: error term

Primary goal: reduction of the error term
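The effect of blocking on the error term can be sketched numerically. With one observation per pin and strip, the total sum of squares splits into a treatment part, a block part and an error part; without blocking, the block part would remain in the error term. The hardness values below are made up for illustration, and the function name is an assumption.

```python
def block_sums_of_squares(table):
    """table[i][j] is the observation for treatment i in block j (one per cell)."""
    k, b = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (k * b)
    treat_means = [sum(row) / b for row in table]
    block_means = [sum(table[i][j] for i in range(k)) / k for j in range(b)]

    ss_total = sum((y - grand) ** 2 for row in table for y in row)
    ss_treat = b * sum((m - grand) ** 2 for m in treat_means)
    ss_block = k * sum((m - grand) ** 2 for m in block_means)
    ss_error = sum(
        (table[i][j] - treat_means[i] - block_means[j] + grand) ** 2
        for i in range(k)
        for j in range(b)
    )
    ms_treat = ss_treat / (k - 1)
    ms_error = ss_error / ((k - 1) * (b - 1))
    return {
        "total": ss_total,
        "treatment": ss_treat,
        "block": ss_block,
        "error": ss_error,
        "F": ms_treat / ms_error,
    }


# Made-up hardness readings: rows are pins 1..4, columns are strips 1..4.
readings = [
    [10.0, 12.0, 14.0, 11.0],
    [11.0, 13.0, 16.0, 12.0],
    [9.0, 12.0, 13.0, 10.0],
    [12.0, 15.0, 17.0, 14.0],
]
result = block_sums_of_squares(readings)
print(result)
# Error sum of squares if the strips were ignored (no blocking):
print(result["block"] + result["error"])
```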


Summary

• completely randomized design

• randomized block design

• multiple comparisons

Reading material:

• Statgraphics lecture notes section 4.1 through 4.3

http://www.acc.umu.se/~tnkjtg/chemometrics/editorial/aug2002.html