Date post: | 20-Dec-2015 |
2DS00
Statistics 1 for Chemical
Engineering
Lecturers
• Dr. A. Di Bucchianico
– Department of Mathematics,
– Statistics group
– HG 9.24
– phone (040) 247 2902
• Ir. G.D. Mooiweer,
– Department of Mathematics
– ICTOO
– HG 9.12
– phone 040 247 4277
(Thursdays)
• Dr. R.W. van der Hofstad
– Department of Mathematics,
– Statistics group
– HG 9.04
– phone (040) 247 2910
Goals of this course
• to prepare students for (first-year) laboratory assignments
• to teach students how to perform basic statistical analyses of
experiments
• to teach students how to use software for data analysis
• to teach students how to avoid pitfalls in analysing measurements
Important to remember
• Web site for this course: www.win.tue.nl/~sandro/2DS00/
• No textbook, but handouts (Word) + Powerpoint sheets through
web site
• Bring notebook to both lectures and self-study
• (Optional) buy lecture notes 2256 “Statgraphics voor regulier
onderwijs”
• (Optional) buy lecture notes 2218 “Statistisch Compendium”
How to study
• read lecture notes briefly before lecture
• ask questions during lecture
• study lecture notes carefully after lecture
• do exercises during guided self-study
• reread lecture notes after guided self-study
• try out previous examinations shortly before the examination
N.B. Lecture notes (pdf documents) PowerPoint files
Week schedule
Week 1: Measurement and statistics
Week 2: Error propagation
Week 3: Simple linear regression analysis
Week 4: Multiple linear regression analysis
Week 5: Nonlinear regression analysis
Detailed contents of week 1
• measurement errors
• graphical displays of data
• summary statistics
• normal distribution
• confidence intervals
• hypothesis testing
Measurements and statistics
• perfect measurements do not exist
• possible sources of measurement errors:
– reading
– environment
• temperature
• humidity
• ...
– impurities
– ...
Necessity of good measurement system
Three experiments
[Figure: dot plots of the measured values in Experiment 1, Experiment 2 and Experiment 3]
Types of measurement errors
• Random errors
– always present
– reduce influence by averaging repeated measurements
• Systematic errors
– require adjustment/repair of measuring devices
• Outliers
– recording errors
– mistakes in applying procedures
Illustration of measurement concepts
Accuracy
difference between average of measured values and true value
Accuracy
• relates to systematic errors
• absolute error: $e_i = x_i - x_t$ (measured value $x_i$, true value $x_t$)
• relative error: $e_{\mathrm{rel},i} = \dfrac{x_i - x_t}{x_t}$
Location statistics
• mean
• median
• trimmed means
Precision
the degree in which consistent results are obtained
Accurate and precise
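Statistics for precision: standard deviation & co
• standard deviation: $s_x = \sqrt{\dfrac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2} = \sqrt{\dfrac{1}{n-1}\left(\sum_{i=1}^{n}x_i^2 - n\bar{x}^2\right)}$
• standard error: $s_{\bar{x}} = s_x/\sqrt{n}$
• variation coefficient: $CV = s_x/\bar{x}$
• variance: $v_x = s_x^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2 = \dfrac{1}{n-1}\left(\sum_{i=1}^{n}x_i^2 - n\bar{x}^2\right)$
• range: $R = x_{\max} - x_{\min}$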
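The precision statistics listed here can be computed with a few lines of Python. This is a minimal sketch using only the standard library; the data values are hypothetical and chosen for illustration, they are not course data.

```python
import math
import statistics

# Hypothetical repeated measurements (illustrative values, not course data).
x = [4.9, 5.1, 5.0, 4.8, 5.2]
n = len(x)

mean = statistics.fmean(x)
variance = statistics.variance(x)   # sample variance, divides by n - 1
std_dev = statistics.stdev(x)       # square root of the sample variance
std_error = std_dev / math.sqrt(n)  # standard deviation of the mean
cv = std_dev / mean                 # variation coefficient
data_range = max(x) - min(x)

# Shortcut form: (sum of squares - n * mean^2) / (n - 1).
variance_shortcut = (sum(v * v for v in x) - n * mean**2) / (n - 1)

print(f"mean={mean:.3f} s={std_dev:.4f} se={std_error:.4f}")
print(f"CV={cv:.4f} variance={variance:.4f} range={data_range:.2f}")
```

The shortcut form is convenient for hand calculation, but in floating-point arithmetic it can lose digits when the mean is large compared with the spread, so the deviation form is usually the safer choice in software.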
Robust statistics for precision
• robust statistics
– less sensitive to outliers
– difficult mathematical theory
– requires use of statistical software
• interquartile range
– IQR = 75% quantile – 25% quantile = 3rd quartile – 1st quartile
• mean absolute deviation: $MAD = \dfrac{1}{n}\sum_{i=1}^{n}\left|x_i - \bar{x}\right|$
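As a rough illustration of the two robust statistics, the sketch below computes the interquartile range and the mean absolute deviation for a small hypothetical data set containing one suspicious value. The data are invented for illustration. Quartile conventions differ between software packages, so the IQR printed here need not match other programs exactly.

```python
import statistics

# Hypothetical measurements with one suspicious value (illustrative only).
x = [4.8, 4.9, 5.0, 5.1, 5.2, 7.0]
n = len(x)

# Quartiles; statistics.quantiles uses the "exclusive" method by default,
# and other software may use a slightly different quartile convention.
q1, q2, q3 = statistics.quantiles(x, n=4)
iqr = q3 - q1

# Mean absolute deviation about the mean.
mean = statistics.fmean(x)
mad = sum(abs(v - mean) for v in x) / n

# Ordinary standard deviation, for comparison with the robust statistics.
std_dev = statistics.stdev(x)

print(f"IQR={iqr:.3f} MAD={mad:.3f} s={std_dev:.3f}")
```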
Graphical displays
• always make graphical displays for first impression
• “one picture says more than 1000 words”
Plot of calcium vs time
[Figure: scatter plot of calcium (vertical axis) against time (horizontal axis)]
Basic graphical displays
• scatter plot
– watch out for scale (automatic resizing)
• time sequence plot
– for detecting time effects like warming up
• Box-and-Whisker plot
– outliers
– quartiles
– skewness
Time sequence plot
[Figure: time sequence plot of meting (measurement) against observation number]
Box-and-Whisker plot
[Figure: Box-and-Whisker plot of the measurements]
Probability theory
• (cumulative) distribution function: $F(t) = P(X \le t)$
• density: $f(t) = \dfrac{d}{dt}F(t)$
• density to distribution function: $F(t) = \int_{-\infty}^{t} f(x)\,dx$
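The two relations between density and distribution function can be checked numerically. The sketch below uses the standard normal distribution purely as a convenient example (an assumption for illustration) and compares a numerical integral of the density with the distribution function, and a numerical derivative of the distribution function with the density. The helper names are hypothetical.

```python
from statistics import NormalDist

dist = NormalDist()  # standard normal, used only as an example distribution


def integrate_density(t, lower=-10.0, steps=20000):
    """Trapezoidal approximation of the integral of the density up to t.

    The lower limit -10 stands in for minus infinity; the density is
    negligible below it.
    """
    h = (t - lower) / steps
    total = 0.5 * (dist.pdf(lower) + dist.pdf(t))
    for k in range(1, steps):
        total += dist.pdf(lower + k * h)
    return total * h


def differentiate_cdf(t, h=1e-5):
    """Central-difference approximation of the derivative of F at t."""
    return (dist.cdf(t + h) - dist.cdf(t - h)) / (2.0 * h)


print(integrate_density(1.0), dist.cdf(1.0))   # both approximate F(1)
print(differentiate_cdf(1.0), dist.pdf(1.0))   # both approximate f(1)
```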
The concept of probability density
[Figure: density function; the area under the curve between a and b denotes the probability that an observation falls between a and b]
Normal distribution
• bell-shaped curve
• important because of the Central Limit Theorem
Normal distribution
• symmetric around µ (location of centre)
• spread parametrised by σ²
– http://www.win.tue.nl/~marko/statApplets/functionPlots.html
– http://www-stat.stanford.edu/~naras/jsm/NormalDensity/NormalDensity.html
• µ=0 and σ²=1: standard normal distribution Z
$f(t) = \dfrac{1}{\sigma\sqrt{2\pi}}\exp\left(-\dfrac{(t-\mu)^2}{2\sigma^2}\right)$
More on normal distribution
Area under the standard normal density between −z and +z:
± 0,67 is 0,500
± 1,00 is 0,683
± 1,645 is 0,900
± 1,96 is 0,950
± 2,00 is 0,954
± 2,33 is 0,980
± 2,58 is 0,990
± 3,00 is 0,997
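The central areas of the standard normal distribution can be computed from the error function, since P(−z ≤ Z ≤ z) = erf(z/√2). The following is a minimal sketch; the function name `central_area` is a hypothetical helper introduced for illustration.

```python
import math


def central_area(z):
    """Area under the standard normal density between -z and +z."""
    return math.erf(z / math.sqrt(2.0))


for z in (0.67, 1.00, 1.645, 1.96, 2.00, 2.33, 2.58, 3.00):
    print(f"+/- {z:5.3f}: {central_area(z):.3f}")
```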
Standardisation
If X is normally distributed with parameters µ and σ², then (X − µ)/σ is standard
normal.
Suppose µ = 3 and σ² = 4 (so σ = 2). Then
$P(X \le 6{,}2) = P\left(\dfrac{X-3}{2} \le \dfrac{6{,}2-3}{2}\right) = P(Z \le 1{,}6) = 0{,}9452.$
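The standardisation example can be reproduced with `statistics.NormalDist` from the Python standard library. The sketch computes the probability once directly and once through the standardised value, using µ = 3 and σ = 2 from the example.

```python
from statistics import NormalDist

mu, sigma = 3.0, 2.0          # sigma is the square root of the variance 4
x = 6.2

# Direct computation with the N(3, 4) distribution.
p_direct = NormalDist(mu, sigma).cdf(x)

# Same probability through the standardised value z = (x - mu) / sigma.
z = (x - mu) / sigma
p_standardised = NormalDist().cdf(z)

print(f"z = {z:.2f}")
print(f"P(X <= 6.2) = {p_direct:.4f} = {p_standardised:.4f}")
```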
Testing normality
• many statistical procedures implicitly assume normality
• if data are not normally distributed, then outcome of procedure may be
completely wrong
• user is always responsible for checking assumptions of statistical procedures
• Graphical checks:
– normal probability plot
– density trace
• Formal check
– Shapiro-Wilks test
Estimation of density function: histogram
[Figure: histogram for width (frequency against width); the curve is the normal distribution with the sample mean and variance as parameters]
Drawbacks of the histogram
• misused for investigating normality
• time ordering of data is lost
• shape depends heavily on bin width + bin location:
[Figure: two histograms for strength (frequency against strength), both drawn from the same data set but with different bins]
• shape is stable for data sets of size 75 or larger
• optimal number of bins ≈ √n
Alternative to histogram: Density Trace
Density Trace (also called naive density estimator):
• use moving bins instead of fixed bins
• choose bin width (automatically in Statgraphics)
• count number of observations in bin at each point
• divide by length of bin
Density Trace
Example dataset: 3.45 1.98 2.92 4.67 2.41 1.07 5.34 3.24 3.93
[Figure: density trace of the example dataset; horizontal axis from 1 to 6, vertical axis from 1/9 to 4/9]
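The moving-bin idea behind the density trace can be sketched as follows for the example dataset. This is an illustration, not the Statgraphics implementation: the bin width of 1 is an assumed value, the function name `density_trace` is hypothetical, and the count is divided by n as well as by the bin width so that the result is a density.

```python
# Example dataset from the slide.
data = [3.45, 1.98, 2.92, 4.67, 2.41, 1.07, 5.34, 3.24, 3.93]


def density_trace(data, x, width):
    """Naive density estimate at x.

    Counts the observations in the bin (x - width/2, x + width/2] centred
    at x, then divides by n * width so the estimate integrates to one.
    """
    half = width / 2.0
    count = sum(1 for v in data if x - half < v <= x + half)
    return count / (len(data) * width)


# Evaluate on a grid; the bin width 1.0 is an assumption for illustration.
for k in range(2, 13):
    x = 0.5 * k
    print(f"x = {x:3.1f}  density = {density_trace(data, x, 1.0):.3f}")
```

Rerunning the loop with a much smaller or much larger width shows the effect of the bin width on the smoothness of the curve.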
Choice of bin widths in density trace
• too small bin width yields too fluctuating curve
• too large bin width yields too smooth curve
Patterns in distribution – normal curve
• Depicted by a bell-shaped curve
• Indicates that measurement process is running normally
Patterns in distribution – bi-modal curve
• Distribution appears to have two peaks
• May indicate that data from more than one process are mixed
together
Patterns in distribution – saw-toothed
• Also commonly referred to as a comb distribution; appears as an
alternating jagged pattern
• Often indicates a measuring problem
– improper gauge readings
– gauge not sensitive enough for readings
Testing normality
[Figure: normal distribution with mean 0 and standard deviation 1]
Normal Probability Plot
[Figure: normal probability plot for width (percentage against width)]
Normally distributed?
[Figure: density of a data set, horizontal axis from -8 to 8]
Normal Probability Plot of not normally distributed data
[Figure: normal probability plot (percentage against observed value) for the not normally distributed data]
• statistical test for Normality: Shapiro-Wilks
• idea: sophisticated regression analysis in the spirit of normal
probability plot
• makes Normal Probability Plot objective
• check outliers (measurement error?; normality sometimes disturbed
by single observation)
• analyse if not normally distributed
Test for Normality: Shapiro-Wilks
Tests for Normality for width
Computed Chi-Square goodness-of-fit statistic = 254.667
P-Value = 0.0
Shapiro-Wilks W statistic = 0.921395
P-Value = 0.000722338
Statgraphics: Shapiro Wilks
Interpretation:
• the value of the statistic itself need not be interpreted
• P-value indicates how compatible the data are with a normal distribution (a small P-value is evidence against normality)
• use α = 0.01 as critical value in order to avoid rejecting normality too
readily
Dixon’s test
• Box-and-Whisker plot: graphical test for outliers
• if data are normally distributed, then formal test may be used:
Dixon's Test (assumes normality)
------------------------------------------------------------------
                            Statistic   5% Test       1% Test
1 outlier on right          0,612903    Significant   Significant
1 outlier on left           0,314286    Not sig.      Not sig.
2 outliers on right         0,66129     Significant   Not sig.
2 outliers on left          0,342857    Not sig.      Not sig.
1 outlier on either side    0,520548    Significant   Not sig.
Disadvantages of point estimators
• 95% confidence interval for µ: probability 0.95 that interval contains
true value µ
• more observations give a narrower interval (effect in particular for n <
20)
• higher confidence gives a wider interval
• example: α = 0,05
Confidence intervals
$\bar{x} \pm z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$
$z_{\alpha/2} = 1{,}96$ for α = 0,05
Confidence intervals: example
Confidence Intervals for meting
-------------------------------
95,0% confidence interval for mean: 4,994 +/- 0,0875612 [4,90644;5,08156]
95,0% confidence interval for standard deviation: [0,0841923;0,223458]

Confidence Intervals for meting
-------------------------------
99,0% confidence interval for mean: 4,994 +/- 0,125791 [4,86821;5,11979]
99,0% confidence interval for standard deviation: [0,0756051;0,278784]

Summary Statistics for meting
Count = 10
Average = 4,994
Median = 5,01
Variance = 0,0149822
Standard deviation = 0,122402
Standard error = 0,0387069
Minimum = 4,78
Maximum = 5,15
Range = 0,37
Interquartile range = 0,2
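The reported intervals for the mean can be reproduced from the summary statistics. The half-widths in the output are consistent with a Student t quantile with n − 1 = 9 degrees of freedom rather than the normal quantile 1,96, which is the usual choice when σ is estimated from the data. The sketch below hard-codes the two tabulated t quantiles; they are entered by hand from standard t tables, not computed.

```python
import math

# Summary statistics taken from the Statgraphics output for "meting".
n = 10
mean = 4.994
std_dev = 0.122402

std_error = std_dev / math.sqrt(n)

# Tabulated Student t quantiles for 9 degrees of freedom (entered by hand).
T_975_DF9 = 2.262157   # two-sided 95% interval
T_995_DF9 = 3.249836   # two-sided 99% interval

half_95 = T_975_DF9 * std_error
half_99 = T_995_DF9 * std_error

print(f"standard error = {std_error:.7f}")
print(f"95%: {mean} +/- {half_95:.6f}  [{mean - half_95:.5f}; {mean + half_95:.5f}]")
print(f"99%: {mean} +/- {half_99:.6f}  [{mean - half_99:.5f}; {mean + half_99:.5f}]")
```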
Hypothesis testing
• example: test whether there is a systematic error
Hypothesis Tests for meting
Sample mean = 4.994
Sample median = 5.01

t-test
------
Null hypothesis: mean = 5.0
Alternative: not equal
Computed t statistic = -0.155011
P-Value = 0.880233
Do not reject the null hypothesis for alpha = 0.05.
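The t statistic in this output follows directly from the sample mean, the hypothesised mean and the standard error. The sketch below recomputes it and compares its absolute value with the tabulated two-sided 5% critical value for 9 degrees of freedom (entered by hand from a t table). Computing the P-value itself would need the t distribution function, which is not in the Python standard library, so it is left out here.

```python
# Values taken from the Statgraphics output for "meting".
sample_mean = 4.994
hypothesised_mean = 5.0
std_error = 0.0387069      # standard deviation / sqrt(n), with n = 10

t_statistic = (sample_mean - hypothesised_mean) / std_error

# Tabulated two-sided 5% critical value for 9 degrees of freedom.
T_CRITICAL_DF9 = 2.262157

reject_null = abs(t_statistic) > T_CRITICAL_DF9

print(f"t = {t_statistic:.6f}")
print("reject H0" if reject_null else "do not reject H0 at alpha = 0.05")
```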