2DS00 Statistics 1 for Chemical Engineering

Lecturers

• Dr. A. Di Bucchianico

– Department of Mathematics,

– Statistics group

– HG 9.24

– phone (040) 247 2902

– [email protected]

• Ir. G.D. Mooiweer,

– Department of Mathematics

– ICTOO

– HG 9.12

– phone (040) 247 4277 (Thursdays)


• Dr. R.W. van der Hofstad

– Department of Mathematics,

– Statistics group

– HG 9.04

– phone (040) 247 2910


Goals of this course

• to prepare students for (first-year) laboratory assignments

• to teach students how to perform basic statistical analyses of experiments

• to teach students how to use software for data analysis

• to teach students how to avoid pitfalls in analysing measurements

Important to remember

• Web site for this course: www.win.tue.nl/~sandro/2DS00/

• No textbook; handouts (Word) and PowerPoint slides are available through the web site

• Bring notebook to both lectures and self-study

• (Optional) buy lecture notes 2256 “Statgraphics voor regulier onderwijs”

• (Optional) buy lecture notes 2218 “Statistisch Compendium”

How to study

• read lecture notes briefly before lecture

• ask questions during lecture

• study lecture notes carefully after lecture

• do the exercises during guided self-study

• reread lecture notes after guided self-study

• try out previous examinations shortly before the examination

N.B. The lecture notes (pdf documents) are not the same as the PowerPoint files

Week schedule

Week 1: Measurement and statistics

Week 2: Error propagation

Week 3: Simple linear regression analysis

Week 4: Multiple linear regression analysis

Week 5: Nonlinear regression analysis

Detailed contents of week 1

• measurement errors

• graphical displays of data

• summary statistics

• normal distribution

• confidence intervals

• hypothesis testing

Measurements and statistics

• perfect measurements do not exist

• possible sources of measurement errors:

– reading

– environment

• temperature

• humidity

• ...

– impurities

– ...

Necessity of good measurement system

Three experiments

[Figure: measurement results of Experiment 1, Experiment 2 and Experiment 3, each shown on its own axis]

Types of measurement errors

• Random errors

– always present

– reduce influence by averaging repeated measurements

• Systematic errors

– require adjustment/repair of measuring devices

• Outliers

– recording errors

– mistakes in applying procedures

Illustration of measurement concepts

Accuracy

difference between average of measured values and true value

Accuracy

• relates to systematic errors

• absolute error: $e_i = x_i - x_t$

• relative error: $e_{\mathrm{rel},i} = \dfrac{x_i - x_t}{x_t}$
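The error definitions above can be illustrated with a minimal Python sketch. It assumes that the subscript t denotes the true value; the function names and the readings are hypothetical and chosen only for illustration.

```python
from statistics import mean


def absolute_error(x_i, x_true):
    """Absolute error of a single measurement: x_i - x_t."""
    return x_i - x_true


def relative_error(x_i, x_true):
    """Relative error of a single measurement: (x_i - x_t) / x_t."""
    return (x_i - x_true) / x_true


def bias(measurements, x_true):
    """Accuracy as the difference between the average measurement and the true value."""
    return mean(measurements) - x_true


# Hypothetical repeated readings of a quantity whose true value is assumed to be 5.0
readings = [5.1, 4.9, 5.2, 5.0]
print(absolute_error(readings[0], 5.0))
print(relative_error(readings[0], 5.0))
print(bias(readings, 5.0))
```

A systematic error shows up as a bias that does not shrink when more readings are averaged.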

Location statistics

• mean

• median

• trimmed means

Precision

the degree to which consistent results are obtained

Accurate and precise

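Statistics for precision: standard deviation & co

• standard deviation: $s_x = \sqrt{\dfrac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}} = \sqrt{\dfrac{\sum_{i=1}^{n}x_i^2 - n\bar{x}^2}{n-1}}$

• standard error: $s_{\bar{x}} = s_x/\sqrt{n}$

• variation coefficient: $CV = s_x/\bar{x}$

• variance: $v_x = s_x^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2 = \dfrac{1}{n-1}\left(\sum_{i=1}^{n}x_i^2 - n\bar{x}^2\right)$

• range: $R = x_{\max} - x_{\min}$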
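As a rough illustration of the precision statistics listed on this slide, the following Python sketch computes them for a small sample. The function name and the data are hypothetical; the variance uses the divisor n - 1.

```python
from math import sqrt
from statistics import mean


def precision_statistics(xs):
    """Return the precision statistics of a sample as a dictionary."""
    n = len(xs)
    x_bar = mean(xs)
    # Sample variance with divisor n - 1
    variance = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    s = sqrt(variance)
    return {
        "variance": variance,
        "standard deviation": s,
        "standard error": s / sqrt(n),       # s divided by sqrt(n)
        "variation coefficient": s / x_bar,  # s divided by the mean
        "range": max(xs) - min(xs),
    }


# Hypothetical sample, for illustration only
sample = [2, 4, 4, 4, 5, 5, 7, 9]
for name, value in precision_statistics(sample).items():
    print(name, value)
```

The shortcut form of the variance (sum of squares minus n times the squared mean) gives the same value, but it can lose numerical precision when the mean is large compared with the spread.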

Robust statistics for precision

• robust statistics

– less sensitive to outliers

– difficult mathematical theory

– requires use of statistical software

• interquartile range

– IQR = 75% quantile – 25% quantile = 3rd quartile – 1st quartile

• mean absolute deviation: $MAD = \dfrac{1}{n-1}\sum_{i=1}^{n}\lvert x_i-\bar{x}\rvert$
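A minimal Python sketch of the two statistics above. Two assumptions are made here: the quartiles follow the "inclusive" convention of Python's `statistics.quantiles` (statistical packages such as Statgraphics may use a different quartile rule), and the mean absolute deviation uses the divisor n - 1 (some texts divide by n). The function names and data are hypothetical.

```python
from statistics import mean, quantiles


def interquartile_range(xs):
    """IQR = 3rd quartile - 1st quartile (assumed 'inclusive' quartile convention)."""
    q1, _median, q3 = quantiles(xs, n=4, method="inclusive")
    return q3 - q1


def mean_absolute_deviation(xs):
    """Mean absolute deviation around the mean, with assumed divisor n - 1."""
    x_bar = mean(xs)
    return sum(abs(x - x_bar) for x in xs) / (len(xs) - 1)


clean = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 100]  # hypothetical recording error in the last value

print(interquartile_range(clean), interquartile_range(with_outlier))
print(max(clean) - min(clean), max(with_outlier) - min(with_outlier))
```

In this example the interquartile range is the same for both data sets, while the range changes drastically, which is the sense in which the IQR is less sensitive to outliers.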

Graphical displays

• always make graphical displays for first impression

• “one picture says more than 1000 words”

Plot of calcium vs time

[Figure: scatter plot of calcium (vertical axis) against time (horizontal axis)]

Basic graphical displays

• scatter plot

– watch out for scale (automatic resizing)

• time sequence plot

– for detecting time effects like warming up

• Box-and-Whisker plot

– outliers

– quartiles

– skewness

Time sequence plot

[Figure: time sequence plot of the measurement (meting) against observation number]

Box-and-Whisker plot

[Figure: Box-and-Whisker plot, horizontal axis from 4,7 to 5,7]

Probability theory

• (cumulative) distribution function: $F(t) = P(X \le t)$

• density: $f(t) = \dfrac{d}{dt}F(t)$

• density to distribution function: $F(t) = \int_{-\infty}^{t} f(x)\,dx$

The concept of probability density

[Figure: density function; the area under the curve between a and b denotes the probability that an observation falls between a and b]

Normal distribution

• bell-shaped curve

• important because of the Central Limit Theorem

Normal distribution

• symmetric around µ (location of centre)

• spread parametrised by σ²

– http://www.win.tue.nl/~marko/statApplets/functionPlots.html

– http://www-stat.stanford.edu/~naras/jsm/NormalDensity/NormalDensity.html

• µ = 0 and σ² = 1: standard normal distribution Z

$f(t) = \dfrac{1}{\sigma\sqrt{2\pi}}\exp\left(-\dfrac{(t-\mu)^2}{2\sigma^2}\right)$

More on normal distribution

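Area under the standard normal density between −z and +z:

z = 0,67: area 0,500

z = 1,00: area 0,683

z = 1,645: area 0,900

z = 1,96: area 0,950

z = 2,00: area 0,954

z = 2,33: area 0,980

z = 2,58: area 0,990

z = 3,00: area 0,997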
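The areas in this list can be checked numerically. The sketch below uses the identity P(−z < Z < z) = erf(z/√2) for a standard normal Z; the function name `central_area` is hypothetical.

```python
from math import erf, sqrt


def central_area(z):
    """Area under the standard normal density between -z and +z."""
    return erf(z / sqrt(2))


# z values taken from the list on this slide
for z in (0.67, 1.00, 1.645, 1.96, 2.00, 2.33, 2.58, 3.00):
    print(z, round(central_area(z), 3))
```

Note that z = 1,645 encloses a central area of about 0,90; the value 0,95 belongs to the one-sided probability P(Z < 1,645).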

Standardisation

If X is normally distributed with parameters µ and σ², then (X − µ)/σ is standard normal.

Suppose µ = 3 and σ² = 4. Then

$P(X \le 6{,}2) = P\left(\dfrac{X-3}{2} \le \dfrac{6{,}2-3}{2}\right) = P(Z \le 1{,}6) = 0{,}9452.$
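The standardisation example (µ = 3, σ² = 4, threshold 6,2) can be reproduced with Python's standard library. This is only a sketch; the variable names are chosen for illustration.

```python
from statistics import NormalDist

mu, sigma = 3.0, 2.0          # sigma is the square root of the variance 4
threshold = 6.2

# Standardise: z = (x - mu) / sigma, then use the standard normal distribution
z = (threshold - mu) / sigma
p_via_z = NormalDist().cdf(z)

# Same probability computed directly from the N(mu, sigma^2) distribution
p_direct = NormalDist(mu=mu, sigma=sigma).cdf(threshold)

print(z, round(p_via_z, 4), round(p_direct, 4))
```

Both routes give the same probability, which is the point of standardisation: one table of the standard normal distribution suffices for every µ and σ.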

Testing normality

• many statistical procedures implicitly assume normality

• if data are not normally distributed, then outcome of procedure may be completely wrong

• user is always responsible for checking assumptions of statistical procedures

• Graphical checks:

– normal probability plot

– density trace

• Formal check

– Shapiro-Wilks test
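To make the idea of a normal probability plot concrete, the sketch below pairs each sorted observation with the standard normal quantile of its plotting position. The plotting position (i − 0.5)/n is an assumed convention (software packages may use a slightly different one), and the function name and data are hypothetical.

```python
from statistics import NormalDist


def normal_plot_points(xs):
    """Return (sorted observation, standard normal quantile) pairs."""
    n = len(xs)
    std_normal = NormalDist()
    points = []
    for i, x in enumerate(sorted(xs), start=1):
        p = (i - 0.5) / n  # assumed plotting position for the i-th smallest value
        points.append((x, std_normal.inv_cdf(p)))
    return points


# Hypothetical measurements, for illustration only
for x, q in normal_plot_points([5.1, 4.8, 5.0, 4.9, 5.3]):
    print(x, round(q, 3))
```

If the data are approximately normally distributed, these points lie close to a straight line when plotted.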

Estimation of density function: histogram

[Figure: histogram for width (frequency against width); the curve is the normal distribution with the sample mean and variance as parameters]

Drawbacks of the histogram

• misused for investigating normality

• time ordering of data is lost

• shape depends heavily on bin width + bin location:

[Figure: two histograms for strength (frequency against strength) of the same data set, drawn with different bins]

• shape is stable for data sets of size 75 or larger

• optimal number of bins: approximately $\sqrt{n}$

Alternative to histogram: Density Trace

Density Trace (also called naive density estimator):

• use moving bins instead of fixed bins

• choose bin width (automatically in Statgraphics)

• count number of observations in bin at each point

• divide by length of bin

Density Trace

Example dataset: 3.45 1.98 2.92 4.67 2.41 1.07 5.34 3.24 3.93

[Figure: density trace of the example dataset; horizontal axis from 1 to 6, vertical axis from 1/9 to 4/9]
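The density trace steps can be sketched in Python using the example dataset above. The bin width of 1 and the extra division by the number of observations n (so that the curve integrates to 1) are assumptions; they appear consistent with the vertical axis ticks 1/9 to 4/9 for this dataset of 9 values. The function name is hypothetical.

```python
def density_trace(data, x, width):
    """Naive density estimate at x: fraction of observations in a bin centred at x, per unit length."""
    half = width / 2
    # Count the observations inside the moving bin [x - width/2, x + width/2]
    count = sum(1 for value in data if abs(value - x) <= half)
    # Divide by the bin length and (assumption) by the number of observations
    return count / (len(data) * width)


data = [3.45, 1.98, 2.92, 4.67, 2.41, 1.07, 5.34, 3.24, 3.93]

# Evaluate the trace on a grid from 1 to 6 in steps of 0.5
for k in range(11):
    x = 1 + 0.5 * k
    print(x, density_trace(data, x, width=1.0))
```

Rerunning the loop with a much smaller or much larger `width` shows how the curve becomes jagged or oversmoothed.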

Choice of bin widths in density trace

• a bin width that is too small yields a curve that fluctuates too much

• a bin width that is too large yields a curve that is too smooth

Patterns in distribution – normal curve

• Depicted by a bell-shaped curve

• Indicates that measurement process is running normally

Patterns in distribution – bi-modal curve

• Distribution appears to have two peaks

• May indicate that data from more than one process are mixed together

Patterns in distribution – saw-toothed

• Also commonly referred to as a comb distribution; appears as an alternating jagged pattern

• Often indicates a measuring problem

– improper gauge readings

– gauge not sensitive enough for readings

Testing normality

[Figure: normal distribution with mean 0 and standard deviation 1, plotted for x from -5 to 5]

Normal Probability Plot

[Figure: normal probability plot for width (percentage against width)]

Normally distributed?

[Figure: density plot of a data set, horizontal axis from -8 to 8]

Normal Probability Plot of not normally distributed data

[Figure: normal probability plot (percentage scale) of the not normally distributed data]

Test for Normality: Shapiro-Wilks

• statistical test for Normality: Shapiro-Wilks

• idea: sophisticated regression analysis in the spirit of normal probability plot

• makes Normal Probability Plot objective

• check outliers (measurement error?; normality sometimes disturbed by single observation)

• analyse if not normally distributed

Statgraphics: Shapiro Wilks

Tests for Normality for width

Computed Chi-Square goodness-of-fit statistic = 254.667
P-Value = 0.0

Shapiro-Wilks W statistic = 0.921395
P-Value = 0.000722338

Interpretation:

• the value of the statistic itself need not be interpreted

• the P-value indicates how plausible a normal distribution is (a small P-value speaks against normality)

• use α = 0.01 as critical value in order to avoid rejecting normality too readily

Dixon’s test

• Box-and-Whisker plot: graphical test for outliers

• if data are normally distributed, then formal test may be used:

Dixon's Test (assumes normality)
------------------------------------------------------------------
                            Statistic    5% Test        1% Test
1 outlier on right          0,612903     Significant    Significant
1 outlier on left           0,314286     Not sig.       Not sig.
2 outliers on right         0,66129      Significant    Not sig.
2 outliers on left          0,342857     Not sig.       Not sig.
1 outlier on either side    0,520548     Significant    Not sig.

Disadvantages of point estimators

Confidence intervals

• 95% confidence interval for µ: probability 0.95 that interval contains true value µ

• more observations give a narrower interval (effect in particular for n < 20)

• higher confidence gives a wider interval

• example: α = 0,05, so $z_{\alpha/2} = 1{,}96$

$\bar{x} \pm z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$

Confidence intervals: example

Confidence Intervals for meting
-------------------------------
95,0% confidence interval for mean: 4,994 +/- 0,0875612 [4,90644;5,08156]
95,0% confidence interval for standard deviation: [0,0841923;0,223458]

Confidence Intervals for meting
-------------------------------
99,0% confidence interval for mean: 4,994 +/- 0,125791 [4,86821;5,11979]
99,0% confidence interval for standard deviation: [0,0756051;0,278784]

Summary Statistics for meting

Count = 10
Average = 4,994
Median = 5,01
Variance = 0,0149822
Standard deviation = 0,122402
Standard error = 0,0387069
Minimum = 4,78
Maximum = 5,15
Range = 0,37
Interquartile range = 0,2
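As a rough check of the confidence interval formula with z, the sketch below plugs in the summary statistics above (count 10, average 4,994, standard deviation 0,122402). It assumes that the sample standard deviation may stand in for σ, which is a simplification; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, alpha=0.05):
    """Two-sided (1 - alpha) confidence interval for the mean, based on the normal quantile."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    half_width = z * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Assumption: the sample standard deviation is used in place of sigma
low, high = z_confidence_interval(4.994, 0.122402, 10)
print(low, high)
```

The interval reported by Statgraphics (half-width 0,0875612) is wider than this one. Its half-width equals about 2,262 times the standard error, which is consistent with a t quantile for 9 degrees of freedom being used instead of 1,96 because σ is estimated from only 10 observations.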

Hypothesis testing

• example: test whether there is a systematic error

Hypothesis Tests for meting
Sample mean = 4.994
Sample median = 5.01

t-test
------
Null hypothesis: mean = 5.0
Alternative: not equal
Computed t statistic = -0.155011
P-Value = 0.880233
Do not reject the null hypothesis for alpha = 0.05.
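The t statistic in this output can be reproduced from the summary statistics (count 10, average 4,994, standard deviation 0,122402). The sketch below is only illustrative: the function name is hypothetical, and the critical value 2.262 is the tabulated two-sided 5% point of the t distribution with 9 degrees of freedom, hard-coded here because the Python standard library has no t distribution.

```python
from math import sqrt


def t_statistic(x_bar, mu_0, s, n):
    """One-sample t statistic: (sample mean - hypothesised mean) / standard error."""
    return (x_bar - mu_0) / (s / sqrt(n))


t = t_statistic(4.994, 5.0, 0.122402, 10)

# Tabulated two-sided 5% critical value for 9 degrees of freedom (hard-coded assumption)
T_CRIT_9DF_5PCT = 2.262
reject = abs(t) > T_CRIT_9DF_5PCT

print(round(t, 6), reject)
```

Because |t| is far below the critical value, the null hypothesis of no systematic error is not rejected, in line with the Statgraphics conclusion.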

Date post:	20-Dec-2015