Statistics for Analytical Chemistry
Lecture Notes
Dr. Ta Thi Thao
Syllabus (2 credits)
Introduction to the analytical process
Chapter 1: Errors in analytical chemistry
Chapter 2: Descriptive statistics
Chapter 3: Basic distributions
Chapter 4: Significance tests
Chapter 5: ANOVA
Chapter 6: Correlation and regression
Chapter 7: QA/QC
Software: EXCEL, ORIGIN, MINITAB, MATLAB, STATGRAPHICS, SPSS…
What is analytical chemistry?
• Almost all chemists routinely make qualitative or quantitative measurements.
• Analytical chemistry is not a separate branch of chemistry, but simply the application of chemical knowledge.
The craft of analytical chemistry is not in performing a routine analysis on a routine sample but in improving established methods, extending existing methods to new types of samples, and developing new methods for measuring chemical phenomena.
The analytical process
1. Define the problem
2. Choose a method
3. Sampling
4. Sample preparation
5. Chemical separation
6. Analysis
7. Data processing and reporting of results
The analytical process (cont.)
1. Define the problem
What needs to be found? Qualitative or quantitative? What will the information be used for? Who will use it? When will it be needed? How accurate and precise does it have to be? What funding is available? The analyst should consult with the client to plan a useful and efficient analysis, including how to obtain a useful sample.
The analytical process (cont.)
2. Choose a method
- Sample type
- Size of sample
- Sample preparation needed
- Concentration and range (sensitivity needed)
- Selectivity needed (interferences)
- Accuracy and precision needed
- Tools/instruments available
- Expertise/experience
- Cost
- Speed
- Does it need to be automated?
- Are methods available in the chemical literature?
- Are standard methods available?
The analytical process (cont.)
3. Sampling
- Sample type
- Representative/random sample
- Sample size
- Minimum sample number
- Sampling statistics/error
The analytical process (cont.)
4. Sample preparation
- Is the sample solid, liquid, or gas?
- Dissolve? Ash or digest?
- Chemical separation or masking of interferences needed?
- Need to concentrate the analyte?
- Need to convert the analyte to another form for detection?
- Need to adjust solution conditions (pH, added reagents…)?
The analytical process (cont.)
5. Chemical Separation if necessary
Distillation
Precipitation
Solvent Extraction
Solid phase extraction
Chromatography
Electrophoresis
May be done as part of the measurement step
The analytical process (cont.)
6. Analysis
- Calibration
- Validation/controls/blanks
- Replicates
7. Data processing and reporting of results
- Statistical analysis
- Report the results
Signal and the corresponding instrumental methods:
- Emission of radiation: emission spectroscopy (X-ray, UV, visible, electron, Auger); fluorescence, phosphorescence, and luminescence (X-ray, UV, and visible)
- Absorption of radiation: spectrophotometry and photometry (X-ray, UV, visible, IR); photoacoustic spectroscopy; nuclear magnetic resonance and electron spin resonance spectroscopy
- Scattering of radiation: turbidimetry; nephelometry; Raman spectroscopy
- Refraction of radiation: refractometry; interferometry
- Diffraction of radiation: X-ray and electron diffraction methods
- Rotation of radiation: polarimetry; optical rotatory dispersion; circular dichroism
- Electrical potential: potentiometry; chronopotentiometry
- Electric charge: coulometry
- Electric current: polarography; amperometry
- Electrical resistance: conductometry
- Mass-to-charge ratio: mass spectrometry
- Rate of reaction: kinetic methods
- Thermal properties: thermal conductivity and enthalpy methods
- Radioactivity: activation and isotope dilution methods
Types of Instrumental Methods
Comparison of different analytical methods

| Method | Approx. range (mol/L) | Approx. precision (%) | Selectivity | Speed | Cost | Principal uses |
|---|---|---|---|---|---|---|
| Gravimetry | 10^-1 – 10^-2 | 0.1 | Poor–mod. | Slow | Low | Inorg. |
| Titrimetry | 10^-1 – 10^-4 | 0.1–1 | Poor–mod. | Mod. | Low | Inorg., org. |
| Potentiometry | 10^-1 – 10^-6 | 2 | Good | Fast | Low | Inorg. |
| Electrogravimetry, coulometry | 10^-1 – 10^-4 | 0.01–2 | Moderate | Slow–mod. | Mod. | Inorg., org. |
| Voltammetry | 10^-3 – 10^-10 | 2–5 | Good | Moderate | Mod. | Inorg., org. |
| Spectrophotometry | 10^-3 – 10^-6 | 2 | Good–mod. | Fast–mod. | Low–mod. | Inorg., org. |
| Fluorometry | 10^-6 – 10^-9 | 2–5 | Moderate | Moderate | Mod. | Org. |
| Atomic spectrometry | 10^-3 – 10^-9 | 2–10 | Good | Fast | Mod.–high | Inorg., multielement |
| Chromatography | 10^-3 – 10^-9 | 2–5 | Good | Fast–mod. | Mod.–high | Org., multicomponent |
| Kinetic methods | 10^-2 – 10^-10 | 2–10 | Good–mod. | Fast–mod. | Mod. | Inorg., org., enzymes |
Validation of a method
- Precision must be checked by analyzing replicate samples.
- Accuracy must be checked by:
+ using proper calibration
+ analyzing spiked samples
+ comparing the sample's results with those obtained by another accepted method
+ analyzing standard reference materials of known composition
+ running a control sample at least daily
- To assure method validation, apply the guidelines of good laboratory practice (GLP).
The Laboratory Notebook
Used to record your work as an analytical chemist; it documents everything you do. Some good rules:
+ Use a hardcover notebook.
+ Number pages consecutively.
+ Record only in ink.
+ Never tear out pages.
+ Date each page, sign it, and have it signed by someone else.
+ Record the name of the project, why it is being done, and any literature references.
+ Record all data on the day you obtain it.
The Laboratory Notebook
An example laboratory notebook entry:
+Date of experiment
+ Name of experiment
+ Principle
+ Reaction for determination:
+ Standardization and preparation of chemicals, reagents…
+ The calculation method; raw experimental data; the average and standard deviation.
+ The final result.
Chapter 1:
Error in Anal. Chem.
1. Error
2. Absolute and Relative error
3. Systematic and random error
4. Outliers and accumulated error
5. Repeatability, reproducibility
6. Precision and accuracy

* Every measurement that is made is subject to a number of errors. "If you cannot measure it, you cannot know it."
A. Einstein
Absolute and Relative Error

Absolute error: E_A = x − x_t = measured value − true value
Relative error: E_r = E_A / x_t = (x − x_t) / x_t
Percent relative error: E_r × 100 (%)
Random Error (indeterminate error)
– Cannot be determined (no control over it)
– A result of fluctuations (+ and −) in random variables
– Multiple trials help to minimize it

Random errors can be reduced by:
- better experiments (equipment, methodology, training of the analyst)
- a large number of replicate samples

• Random errors show a Gaussian distribution for a large number of replicates
• Can be described using statistical parameters
Systematic Error (determinate error)
• Known cause:
- the operator
- calibration of glassware, sensor, or instrument
– A result of a bias in one direction (+ or −)
– When determined, it can be corrected
– May be of a constant or proportional nature

To detect a systematic error:
• Use standard reference materials
• Run a blank sample
• Use different analytical methods
• Participate in "round robin" experiments (different labs and people running the same analysis)
Types of Error
Proportional error influences the slope.
Constant error influences the intercept.
If the nature of the error is not known (random or systematic?) then the following rules will apply:
Accumulated Error

Addition and subtraction
When adding or subtracting measurements, the absolute errors are added.

Example 1:
mass of beaker plus sample: 21.1184 ± 0.0003 g
mass of empty beaker: 15.8465 ± 0.0003 g
mass of sample: 5.2719 ± 0.0006 g (errors added!)

(21.1184 ± 0.0003) g − (15.8465 ± 0.0003) g = (5.2719 ± 0.0006) g
Multiplication and division
When multiplying or dividing measurements, the relative errors are added. Consequently, the absolute errors of the measurements must first be converted to relative errors.

Example 2:
A = (1.56 ± 0.04) cm;  ΔA = 0.04 cm;  ΔA/A = 0.04 cm / 1.56 cm = 0.0256
B = (15.8 ± 0.2) cm²;  ΔB = 0.2 cm²;  ΔB/B = 0.2 cm² / 15.8 cm² = 0.0127

Product of A and B: AB = (1.56 cm)(15.8 cm²) = 24.648 cm³ = 24.6 cm³ to 3 SF
Adding relative errors: Δ(AB)/AB = ΔA/A + ΔB/B = 0.0256 + 0.0127 = 0.0383 ≈ 0.04
The percent relative error in the product AB is therefore ≈ 4 %.
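The two worst-case rules above can be checked with a short Python sketch (the numbers are the worked examples from the text):

```python
def add_sub_error(*abs_errors):
    """Worst-case rule for addition/subtraction: absolute errors add."""
    return sum(abs_errors)

def mul_div_rel_error(*pairs):
    """Worst-case rule for multiplication/division: relative errors add.
    Each pair is (value, absolute_error)."""
    return sum(err / abs(val) for val, err in pairs)

# Example 1: mass by difference
mass = 21.1184 - 15.8465                      # 5.2719 g
mass_err = add_sub_error(0.0003, 0.0003)      # 0.0006 g

# Example 2: product A*B with A = 1.56 ± 0.04 cm, B = 15.8 ± 0.2 cm^2
rel = mul_div_rel_error((1.56, 0.04), (15.8, 0.2))  # ~0.0256 + 0.0127
```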
Sampling → Preparation → Analysis

- Sampling: representative sample; homogeneous vs. heterogeneous
- Preparation: loss; contamination (unwanted addition)
- Analysis: measurement of the analyte; calibration of the instrument or standard solutions

How about sampling a chocolate chip cookie?

1. Static error
2. Dynamic error
3. Insertion and loading errors
4. Instrument error
5. Human error
6. Theoretical error
7. Miscellaneous error
Repeatability, reproducibility
The closeness of agreement between independent results obtained with the same
method on identical test material,
• under the same conditions (same operator, same apparatus, same laboratory and after short intervals of time) (repeatability).
• under different conditions (different operators, different apparatus, different laboratories and/or after different intervals of time) (reproducibility).
Accuracy and Precision
True value – standard or reference of known value or a theoretical value
Accuracy – closeness to the true value
Precision – reproducibility or agreement with each other for multiple trials.
Accuracy vs. Precision

Accuracy:
• Only obtained if measured values agree with true values
• Must reduce systematic and random error to improve accuracy
• Always requires comparison to a known standard

Precision:
• Describes the range of spread of the individual measurements from the average value for the series
• Describes the reproducibility of the measurement
• Improves with reduction in random error
Exercise 1
Fig. 1:
Exercise 2
Exercise 3
Exercise 4
Exercise 5
What kind of error?
Chapter 2:
Descriptive statistics
• How do you assess the total error?
- One way to assess total error is to treat a reference standard as a sample.
- The reference standard would be carried through the entire process to see how close the results are to the reference value.
Accuracy and Precision
Nature of accuracy and precision (the center of the target is the true value):

| | Both accurate and precise | Precise only | Neither accurate nor precise |
|---|---|---|---|
| Mathematical comments | Small standard deviation or %CV; small %error | Small standard deviation or %CV; large %error | Large standard deviation or %CV; large %error |
| Scientific comments | Very small error in measurement; all results cluster the true value (remember, a standard or true value is needed) | Clustered multiple measurements but consistently off from the true value; calibration of the probe or other measuring device is off, or an unknown systematic error | The shot-gun effect; get a new measurement system or operator |
Expressing accuracy and precision
• Mean (average)
• Percent error (accuracy)
• Range
• Deviation
• Standard deviation
• Percent coefficient of variation (precision)
(See also Chapter 3)
Population vs. sample
• Population = the entire collection of items, e.g. all 100 mg vitamin C tablets produced
• Sample = a portion of the population, e.g. a bottle of vitamin C pills
Generally only data for samples are available, since it is generally impossible to obtain data for the whole population.
Standard Deviation of the…
• Population: the actual variation in the population

σ = √[ Σᵢ (xᵢ − µ)² / N ]

• Sample (part of the population): estimates the variation in the population; may not be a representative sample

s = √[ Σᵢ (xᵢ − x̄)² / (N − 1) ]
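The two formulas can be compared directly. This Python sketch (with made-up replicate data) shows that the sample formula, dividing by N − 1, always gives a slightly larger estimate than the population formula:

```python
import math

data = [10.1, 10.3, 9.9, 10.2, 10.0]  # hypothetical replicates

mean = sum(data) / len(data)
ss = sum((x - mean) ** 2 for x in data)   # sum of squared deviations

sigma = math.sqrt(ss / len(data))         # population formula (divide by N)
s = math.sqrt(ss / (len(data) - 1))       # sample formula (divide by N - 1)
```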
Why divide by N-1 when calculating “s”?
• N-1 = degrees of freedom (Df) of sample– number of independent values on which a
result is based, or the number of values in the final calculation of a statistic that are free to vary
– for a population Df = N– for a sample Df = N-1
• one Df lost when calculating the Average of a sample
More on Dfs
To calculate the std. dev. of a random sample, we must first calculate
the mean of that sample and then compute the sum of the several
squared deviations from that mean.
While there will be n such squared deviations, only (n - 1) of them are, in fact, free to assume any value whatsoever.
This is because the final squared deviation from the mean must include the one value of X such that the sum of all the Xs divided by n will equal the obtained mean of the sample.
All of the other (n - 1) squared deviations from the mean can, theoretically, have any values whatsoever.
For these reasons, std. dev. of a sample is said to have only (n - 1) degrees of freedom.
Population Data
For an infinite set of data, as N → ∞, x̄ → µ and s → σ
(µ = population mean, σ = population std. dev.)

The experiment that produces a small standard deviation is more precise. Remember, greater precision does not imply greater accuracy.

Experimental results are commonly expressed in the form: x̄ ± s (mean ± standard deviation)
Standard deviation of the mean (standard error)
• When the standard deviation of several mean values is taken, the amount of deviation between the mean values will be reduced by a factor proportional to the square root of the number of data points (N) used to calculate each mean value.
• s = standard deviation between individual values
• s_m = standard deviation between mean values

s_m = s / √N
Other ways of expressing the precision of the data:
• Variance = s²
• Relative standard deviation: RSD = s / x̄
• Percent RSD / coefficient of variation: %RSD = (s / x̄) × 100
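All of these precision measures can be computed in one pass; a Python sketch with hypothetical replicate data:

```python
import math

data = [50.2, 50.6, 49.8, 50.4, 50.0]  # hypothetical replicates
n = len(data)
mean = sum(data) / n

s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))  # sample std. dev.
s_m = s / math.sqrt(n)        # standard deviation of the mean (standard error)
variance = s ** 2
rsd = s / mean                # relative standard deviation
percent_rsd = 100 * rsd       # coefficient of variation
```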
Box and whisker plot (Minitab 14): shows the median and range; a wide box indicates large variation and a narrow box small variation; points beyond the whiskers are outliers.
The same rules apply to calculations involving standard deviations (assuming the standard deviation is due only to random errors). If the errors are ALL KNOWN TO BE RANDOM ERRORS, then the following set of rules will apply.
Significant Figures
• The number of digits reported in a measurement reflects the accuracy of the measurement and the precision of the measuring device.
• Results are reported to the fewest significant figures (for multiplication and division) or fewest decimal places (for addition and subtraction).
• Which digits are significant:
1. Digits 1-9
2. Zeros between significant digits
3. Terminal zeros to the right of the decimal
4. Terminal zeros to the left of the decimal (two schools of thought)
5. Place-holding zeros
Special case: logarithms, e.g. log x = 0.025 (for a logarithm, only the digits of the mantissa count as significant).
Rounding off
• When the answer to a calculation contains too many significant figures, it must be rounded off. This approach to rounding off is summarized as follows:
- If the digit to be dropped is smaller than 5, drop it and leave the remaining number unchanged. Thus, 1.684 becomes 1.68.
- If the digit is larger than 5, drop it and add 1 to the preceding digit. Thus, 1.247 becomes 1.25.
- If the digit to be dropped is exactly 5, round to the nearest even digit.
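Python's `decimal` module implements exactly this round-half-to-even ("banker's") rule; a sketch using the examples above plus two tie cases:

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_even(value: str, places: str) -> Decimal:
    """Round half to even using exact decimal arithmetic,
    avoiding binary floating-point surprises."""
    return Decimal(value).quantize(Decimal(places), rounding=ROUND_HALF_EVEN)

r1 = round_even("1.684", "0.01")   # dropped digit < 5  -> 1.68
r2 = round_even("1.247", "0.01")   # dropped digit > 5  -> 1.25
r3 = round_even("1.675", "0.01")   # tie -> nearest even: 1.68
r4 = round_even("1.665", "0.01")   # tie -> nearest even: 1.66
```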
Methods of Expressing Uncertainty in Results
A. Three methods:
1. Record the absolute uncertainty
2. Record the relative uncertainty in %
3. Use significant digits: record all accurately known digits plus one digit that is uncertain
B. This method assumes that the last digit recorded is uncertain by ±1 unless stated differently.
Examples of presenting data:
Weight measured: 9.82 ± 0.02385 g = 9.82 ± 0.02 g
6051.78 ± 30 m/s = 6050 ± 30 m/s
For stating uncertainty:
- Round the uncertainty to one significant figure… unless δx has 1 as a leading digit: if δx = 0.14, then keep δx = 0.14, not 0.1.
- For intermediate calculations, though, you should retain one more significant figure than is justified.
Chapter 3: Basic Distributions

What is a distribution? The pattern of variation of a variable is called its distribution, which can be described both mathematically and graphically. In essence, the distribution records all possible numerical values of a variable and how often each value occurs (its frequency). Distributions can be either discrete or continuous.

Which statistical test is appropriate will depend upon the distribution of your data.
From: http://stat.tamu.edu/stat30x/notes/node16.html
Types of Distributions
- Binomial distribution
- Normal distribution
- Poisson distribution
- Exponential distribution
- Logistic distribution
- t-distribution
- Chi-squared distribution
- F-distribution
- Gamma distribution
- Hypergeometric distribution
- Laplace distribution

Note that distributions can be either discrete or continuous.
Binomial Distribution Graphic
From http://mathworld.wolfram.com/BinomialDistribution.html
For a large number of experiment replicates the results approach an ideal smooth curve called the GAUSSIAN or NORMAL DISTRIBUTION CURVE.

Characterised by:
- The mean value x̄, which gives the center of the distribution
- The standard deviation s, which measures the width of the distribution

The Gaussian curve equation:

y = [1 / (σ√(2π))] · e^(−(x − µ)² / 2σ²)

The factor 1/(σ√(2π)) is a normalization factor: it guarantees that the area under the curve is unity. The Gaussian curve whose area is unity is called a normal error curve; it has µ = 0 and σ = 1.

The probability of measuring a value in a certain range equals the area under the curve over that range.
Gaussian Distribution of Random Errors (Population)

Another way to represent a Gaussian distribution is to relate it to a new variable, z, on the x-axis:

z = (x − µ) / σ  (estimated as (x − x̄) / s)

where z is the deviation of a data point from the mean, stated in units of standard deviation.
Gaussian Distribution of Random Errors
Range Percentage of measurements
µ ± 1σ 68.3
µ ± 2σ 95.5
µ ± 3σ 99.7
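The percentages in this table follow from the Gaussian integral and can be reproduced with the error function (`erf` is in Python's standard `math` module):

```python
import math

def norm_coverage(k: float) -> float:
    """Fraction of a Gaussian population lying within mu ± k*sigma."""
    return math.erf(k / math.sqrt(2))

p1 = 100 * norm_coverage(1)   # ~68.3 %
p2 = 100 * norm_coverage(2)   # ~95.5 %
p3 = 100 * norm_coverage(3)   # ~99.7 %
```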
The more times you measure, the more confident you are that your average value is approaching the "true" value. The uncertainty decreases in proportion to 1/n^(1/2).

The standard deviation measures the width of the Gaussian curve (the larger the value of σ, the broader the curve).
Normal distribution
Data Transformation
What do you do if your data is not normally distributed?
- Use a non-parametric test
- Transform your data:
  Logarithmic transformation: x → log(x + 1)
  Power transformation: e.g. x → √x
  Angular transformation: e.g. x → arcsine(√x)
Poisson Distribution
Typically used to model the number of random occurrences of some phenomenon in a specified unit of space or time, e.g. the number of birds seen in a 10 min period. Can usually be approximated by a normal distribution.
Exponential Distribution
Describes a sample where y = x^a. Messy to work with, but can (sometimes) be transformed, or you can use a non-parametric test.

Logistic Distribution
Typically describes a sample that fits y = log(x). Again, (sometimes) messy to work with, but can be transformed, or you can use a non-parametric test.
t-distributions
- As N (DF) increases, the t-distribution is less spread out.
- At large N, the t-distribution approaches the shape of the Gaussian distribution.

t-distribution (1-sided)
F-Distribution
A distribution that typically arises when testing whether two variables have the same variance. It is the ratio of two independent chi-squared statistics. ANOVAs are based on F distributions.
Chi-squared Distribution
This is also based upon degrees of freedom. It can be used to approximate many different distributions; for example, it may be used to approximate the sampling distribution of the likelihood ratio statistic (may cover this later).
Chi-square Distribution examples
Estimating Random Error
The random error (Δx) in a set of data can be estimated by multiplying s_m by a statistical function called the Student t-distribution function:

Δx = ± t_{p,ν} · s_m = ± t_{p,ν} · s / √N
Confidence intervals
x̄ ± Δx at a given confidence level (say 95%) implies that the true value will be found within ±Δx of the calculated mean:

µ = x̄ ± t_{p,ν} · s / √N
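A Python sketch of this confidence-interval calculation (the replicate data are hypothetical; the t value is an assumed two-tailed table entry, not computed):

```python
import math

data = [12.6, 11.9, 13.0, 12.7, 12.5]   # hypothetical replicates
n = len(data)
mean = sum(data) / n
s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

# Two-tailed Student t for 95 % confidence, DF = n - 1 = 4 (table value)
t_95 = 2.776

half_width = t_95 * s / math.sqrt(n)
ci = (mean - half_width, mean + half_width)   # mu = mean ± half_width
```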
Chapter 4: Significance Tests
• Hypothesis:
• F-test compares levels of PRECISION
• T-test compares levels of ACCURACY
Rearranging Student's t equation:

x̄ − µ = ± t·s/√n  ⇒  n = t²s² / e²

where
µ = true population mean
x̄ = measured mean
n = required number of replicate analyses
s² = variance of the sampling operation
e = sought-for uncertainty

Since the degrees of freedom are not known at this stage, the value t = 1.96 (for n → ∞) is used to estimate n. The process is then repeated a few times until a constant value for n is found.

How many samples/replicates to analyze?
Comparing a mean value to the true value (1-sample t)
• Calculate a t value as shown below
• Compare to a value of t in a t-table at the appropriate confidence level and DF
• If t_calc > t_table, the two results are significantly different

t_calc = |x̄ − µ| √N / s,  equivalently  µ = x̄ ± t·s/√N
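A Python sketch of the 1-sample t test (hypothetical replicates against an assumed certified value; the critical t is an assumed table entry):

```python
import math

mu = 100.0                                  # certified reference value
data = [99.2, 100.3, 99.5, 99.8, 99.0]      # hypothetical replicates
n = len(data)
mean = sum(data) / n
s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

t_calc = abs(mean - mu) * math.sqrt(n) / s

# Tabulated two-tailed t for 95 % confidence, DF = 4 (table value)
t_table = 2.776
significant = t_calc > t_table   # is there a significant bias?
```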
Comparing two sets of data
• Comparison of means (t-test)
– unpaired data: samples from the same population, e.g. comparing the results for the analysis of water samples performed by two different labs (water samples from the same population)
– paired data: samples from different populations, e.g. comparing cholesterol levels in different individuals using two analytical methods
• Comparison of variances (F-test): unpaired data only
Comparison of variances (F-test)
- Calculate F:

F_calc = S₁² / S₂²  (with S₁ > S₂, so F ≥ 1)

- Compare F_calc with F_table
- If F_calc > F_table (two-tailed test), S₁ and S₂ are significantly different (P value < α-level = 0.05); otherwise they are statistically comparable.
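A Python sketch of the F-test (the variances are hypothetical; the critical value is an assumed entry from an F-table):

```python
def f_test(s1_sq: float, s2_sq: float) -> float:
    """F ratio with the larger variance in the numerator, so F >= 1."""
    return max(s1_sq, s2_sq) / min(s1_sq, s2_sq)

# Hypothetical variances from two methods, 5 replicates each (DF = 4 and 4)
f_calc = f_test(0.0428, 0.0115)

# Critical value F(0.975; 4, 4) for a two-tailed test at 95 % confidence
# (assumed table value)
f_crit = 9.60
precisions_differ = f_calc > f_crit
```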
Which type of t-test should be used?
Comparing two means (unpaired data)
• Textbook method:
1. Comparison of variances (F-test)
2. Comparison of means
• If S₁ and S₂ are not significantly different, use the pooled standard deviation:

s_pooled = √[ ((n₁ − 1)s₁² + (n₂ − 1)s₂²) / (n₁ + n₂ − 2) ]

t_calc = (|x̄₁ − x̄₂| / s_pooled) · √[ n₁n₂ / (n₁ + n₂) ]

• Once t_calc is determined, compare to t_table (determined for f = n₁ + n₂ − 2). If t_calc > t_table, the difference is significant (P value < α-level = 0.05).
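A Python sketch of the pooled t-test (the two data sets are hypothetical):

```python
import math

a = [14.7, 14.6, 15.0, 14.8]   # hypothetical results, lab 1
b = [14.6, 14.5, 14.8, 14.4]   # hypothetical results, lab 2

def mean_var(x):
    m = sum(x) / len(x)
    v = sum((xi - m) ** 2 for xi in x) / (len(x) - 1)
    return m, v

m1, v1 = mean_var(a)
m2, v2 = mean_var(b)
n1, n2 = len(a), len(b)

s_pooled = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
t_calc = abs(m1 - m2) / s_pooled * math.sqrt(n1 * n2 / (n1 + n2))
```

For f = n₁ + n₂ − 2 = 6 degrees of freedom, the tabulated two-tailed t at 95 % confidence is about 2.447, so a t_calc below that would indicate no significant difference.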
Comparing two means (unpaired data) (cont.)
• Textbook method: if s₁ and s₂ are NOT statistically comparable (the F-test fails), i.e. S₁ and S₂ are significantly different, then t_calc and the DF for t_table need to be determined as follows:

t_calc = |x̄₁ − x̄₂| / √( s₁²/n₁ + s₂²/n₂ )

DF = ( s₁²/n₁ + s₂²/n₂ )² / [ (s₁²/n₁)²/(n₁ − 1) + (s₂²/n₂)²/(n₂ − 1) ]

If t_calc > t_table, the difference is significant (P value < α-level = 0.05).
Comparing two means (unpaired data) (cont.)
• Mosi method:
– Calculate the confidence interval for each mean
– Compare the confidence intervals
– The results are statistically comparable if the intervals overlap such that each interval overlaps with the mean value of the other interval, as shown in the diagram below.
Comparing two means (paired data)
• Calculate differences between PAIRS of data: dᵢ = x_Ai − x_Bi
– values can be either + or −; do not take absolute values of the differences!
• Calculate the average (d̄) and standard deviation (s_d) of the differences
• Calculate a t-value as shown below
• If t_calc > t_table, the two results are significantly different (f = N − 1)

t_calc = |d̄| √N / s_d,  where  s_d = √[ Σ(dᵢ − d̄)² / (n − 1) ]
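A Python sketch of the paired t-test (both result sets are hypothetical; the critical t is an assumed table value for f = N − 1 = 4):

```python
import math

method_a = [10.2, 10.8, 11.6, 9.9, 10.5]   # hypothetical results, method A
method_b = [10.0, 10.4, 11.2, 9.7, 10.3]   # hypothetical results, method B

d = [xa - xb for xa, xb in zip(method_a, method_b)]   # keep the signs!
n = len(d)
d_bar = sum(d) / n
s_d = math.sqrt(sum((di - d_bar) ** 2 for di in d) / (n - 1))

t_calc = abs(d_bar) * math.sqrt(n) / s_d
t_table = 2.776   # two-tailed, 95 % confidence, DF = 4 (assumed table value)
```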
Chapter 5: ANOVA (analysis of variance)
• t distribution: used to test the hypothesis of no difference between two population/sample means.
• If we wish to know about the relative effect of three or more different "treatments", can the t distribution be used?
• The t-test is inadequate in several ways:
– Any statistic that is based on only part of the evidence (as is the case when any two groups are compared) is less stable than one based on all of the evidence.
– There are so many comparisons that some will be significant by chance.
– It is tedious to compare all possible combinations of groups.
The logic of ANOVA
• Hypothesis testing in ANOVA is about whether the means of the samples differ more than you would expect if the null hypothesis were true.
• This question about means is answered by analyzing variances.
– Among other reasons, you focus on variances because when you want to know how several means differ, you are asking about the variances among those means.
• ANOVA is also used for evaluation of main/ interaction effects.
Some ANOVA notes
• Hypothesis: H0: µ1 = µ2 = µ3 … = µk
  Ha: at least 2 of the means differ (does NOT mean that all population means differ)
• The term variance refers to the statistical method being used, not the hypothesis being tested (ANOVA does NOT test whether the variances of the groups are different)
• The P value in an ANOVA has many tails• Reported as:
(one-way ANOVA, Fdf between groups, df within groups = , p = )
Assumptions of ANOVA
• Samples are randomly selected from larger populations.
• Sample groups are independent.
• Observations within each sample were obtained independently.
• The data from each population is normally distributed.
Two Sources of Variability
• In ANOVA, an estimate of variability between groups is compared with variability within groups.
– Between-group variation is the variation among the means of the different treatment conditions, due to chance (random sampling error) and treatment effects, if any exist.
– Within-group variation is the variation due to chance (random sampling error) among individuals given the same treatment.
ANOVA
Total variation among scores =
- within-groups variation: variation due to chance
- between-groups variation: variation due to chance and treatment effect (if any exists)
Variability Between Groups
• There is a lot of variability from one mean to the next.
• Large differences between means probably are not due to chance.
• It is difficult to imagine that all six groups are random samples taken from the same population.
• The null hypothesis is rejected, indicating a treatment effect in at least one of the groups.
One-way ANOVA formula
• The one-way ANOVA fits data to this model:

Y_ij = grand mean + group effect + ε_ij

- Y_ij = the value for the i-th subject in the j-th group
- Group effect = the difference between the mean of population j and the grand mean
- Each ε_ij is a random value from a normally distributed population with a mean of 0
The F Ratio

F = Between-group variability / Within-group variability = MS_between / MS_within

where the between-groups variation reflects chance plus any treatment effect, and the within-groups variation reflects chance alone.
The F Ratio (cont.)

F = MS_between / MS_within
MS_between = SS_between / df_between
MS_within = SS_within / df_within
SS_total = SS_between + SS_within
The F Ratio: SS Between

SS_between = Σ(T² / n) − G² / N

where T is each group total (find each group total, square it, and divide by the number of subjects in the group), G is the grand total (add all of the scores together, then square the total), and N is the total number of subjects. Equivalently:

SS_between = Σ n(x̄_group − x̄_grand)²
The F Ratio: SS Within

SS_within = ΣX² − Σ(T² / n)

where ΣX² squares each individual score and adds up all of the squared scores, T² is each squared group total, and n is the number of subjects in each group. Equivalently:

SS_within = Σ(X − x̄_group)²
The F Ratio: SS Total

SS_total = ΣX² − G² / N

where ΣX² squares each score and adds all of the squared scores together, G is the grand total (add all of the scores together, then square the total), and N is the total number of subjects. Equivalently:

SS_total = Σ(X − x̄_grand)²,  with  (X − x̄_grand) = (x̄_group − x̄_grand) + (X − x̄_group)

Degrees of Freedom:
Between: df_between = number of groups − 1
Within: df_within = total number of subjects − total number of groups = (n₁ − 1) + (n₂ − 1) + (n₃ − 1) + …
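The SS formulas above can be assembled into a complete one-way ANOVA by hand; a Python sketch with three hypothetical treatment groups of three replicates each:

```python
groups = [
    [48, 50, 52],   # hypothetical treatment 1
    [55, 57, 59],   # hypothetical treatment 2
    [47, 49, 51],   # hypothetical treatment 3
]

all_x = [x for g in groups for x in g]
N = len(all_x)                 # total number of subjects
G = sum(all_x)                 # grand total
k = len(groups)                # number of groups

ss_total = sum(x ** 2 for x in all_x) - G ** 2 / N
ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - G ** 2 / N
ss_within = ss_total - ss_between

df_between = k - 1
df_within = N - k
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within
```

For df_between = 2 and df_within = 6, the tabulated F at the 95 % level is about 5.14, so an F ratio of 14.25 would indicate a significant treatment effect.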
Two-Way ANOVA
• Two-way ANOVA uses the same error term as one-way ANOVA: the average of the within-cell variances (SS_WC / df_WC)
• The difference is that the between-cell SS is partitioned into each main effect (rows, columns) and the interaction: SS_R, SS_C, SS_R×C
Latin square
• Latin squares have counterbalancing built in: the number of rows equals the number of columns
• The letter representing a treatment appears in each column and row only once
• Effects of treatment, order, and sequence are isolated: systematic counterbalancing

Order:  1 2 3
seq 1:  A B C
seq 2:  B C A
seq 3:  C A B
Chapter 6: Correlation and Linear Regression Analysis

6.1. Bivariate correlation:
- is used to measure the strength of the linear relationship between variables
- measures how variables or rank orders are related
- computes Pearson's correlation coefficient and Spearman's rank correlation
Assumptions
- Subjects are representative of a larger population
- Paired samples (must have 2 variables) are independent observations
- X and Y values must be measured independently
- X values are measured but not controlled
- Normal distribution (if not, use Spearman's rank correlation)
- All covariation must be linear
- Note that outliers have a large influence on correlation
Scatter Diagram
Designate one variable X and the other Y. Although it does not matter which is which, in cases
where one variable is used to predict the other, X is the “predictor” variable (the variable you’re predicting from).
Draw axes of equal length for your graph. Determine the range of values for each variable. Place
the high values of X to the right on the horizontal axis and the high values of Y toward the top of the vertical axis. Label convenient points along each axis.
For each pair of scores, find the point of intersection for the X and Y values and indicate it with a dot.
Pearson correlation
- Compute the correlation coefficient (r), which indicates the strength with which the variables are linearly related in a sample.
- The significance test for r reveals whether there is a linear relationship between the variables in the population.
- Pearson's r assumes an underlying linear relationship (a relationship that can best be represented by a straight line). Not all relationships are linear.
Correlation Analysis
With a simple two variable correlation, you need to know the strength and direction of the correlation
Scatterplots help illustrate the relationships between variables
Pearson's r

Definitional formula:

r = COV_XY / (s_x · s_y),  where  COV_XY = Σ(X − X̄)(Y − Ȳ) / (n − 1)

Computational formula:

r = [ nΣXY − (ΣX)(ΣY) ] / √[ (nΣX² − (ΣX)²)(nΣY² − (ΣY)²) ]

In words: r = (degree to which X and Y vary together) / (degree to which X and Y vary separately)
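A Python sketch (hypothetical x, y data) showing that the definitional and computational formulas give the same r:

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]      # hypothetical data
y = [2.1, 4.3, 5.9, 8.2, 9.9]

n = len(x)
Sx, Sy = sum(x), sum(y)
Sxy = sum(xi * yi for xi, yi in zip(x, y))
Sxx = sum(xi ** 2 for xi in x)
Syy = sum(yi ** 2 for yi in y)

# Computational formula
r = (n * Sxy - Sx * Sy) / math.sqrt((n * Sxx - Sx ** 2) * (n * Syy - Sy ** 2))

# Definitional formula: covariance over the product of standard deviations
mx, my = Sx / n, Sy / n
cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
sx = math.sqrt(sum((xi - mx) ** 2 for xi in x) / (n - 1))
sy = math.sqrt(sum((yi - my) ** 2 for yi in y) / (n - 1))
r_def = cov / (sx * sy)
```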
Strength of Relationship
How can we describe the strength of the relationship in a scatter diagram?
Pearson’s r. A number between -1 and +1 that indicates the
relationship between two variables.
The sign (- or +) indicates the direction of the relationship. The number indicates the strength of the relationship.
-1 = perfect (negative) relationship; 0 = no relationship; +1 = perfect (positive) relationship
Spearman Rank Correlation

The correlation coefficient r_s is the best-known and easiest technique. It is given by:

r_s = 1 − 6Σd² / (N(N² − 1))

where d is the difference between rankings in two ranking methods.

When N ≥ 10, r_s can be used to calculate a t-score with the equation

t = r_s √[ (N − 2) / (1 − r_s²) ]

and the resulting t-score is used in a two-tailed test of significance.
Kendall Rank Correlation Coefficient (τ)
- More complicated than the Spearman rank correlation
- Should be used when three or more sets of rankings are compared
- Calculated as the proportion of concordant pairs minus the proportion of discordant pairs
- For two bivariate observations (xᵢ, yᵢ) and (xⱼ, yⱼ): the pair is concordant when (xᵢ − xⱼ)(yᵢ − yⱼ) is positive and discordant when (xᵢ − xⱼ)(yᵢ − yⱼ) is negative
- Scores range from −1 to 1
Goodman and Kruskal's Lambda (λ)
- λ is used when nominal scales are used; Spearman rank correlation won't work because the ordering element is missing with nominal scales
- λ can be calculated by statistical packages
Partial Correlations (r_P)
- Indicate the degree to which two variables are linearly related in a sample, partialling out the effects of one or more control variables.
- To interpret a partial correlation between two variables, we must first know the bivariate correlation between them.
- To conduct a partial correlation, there must be at least three variables.

Partial correlation can be used in the following ways:
- between two variables
- among multiple variables within a set
- between sets of variables
Method of Least Squares: Assumptions
- The uncertainties in the y-values are greater than those in the x-values.
- The line representing the data should be drawn so that the deviations of the y-values are minimized.
- Thus the best-fit line (the least-squares line) is the straight line that minimizes the vertical deviations (residuals) between the points and the line. Deviations can be positive or negative, so we minimize the sum of the squares of the deviations.
- The linear relationship between analyte content and measured signal: Y = mX + b, or signal = m(Conc.) + S_blank
- That is, we draw the straight line that has the least value for the sum of the squares of the deviations.
Least squares method
signal = m (Conc.) + Sblank
Linear regression: y = b + mx

m = [ nΣxiyi − (Σxi)(Σyi) ] / [ nΣxi² − (Σxi)² ] = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²

b = [ (Σxi²)(Σyi) − (Σxi)(Σxiyi) ] / [ nΣxi² − (Σxi)² ] = ȳ − m x̄

(all sums run over i = 1 … n)
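The slope and intercept formulas above translate directly into code; a minimal Python sketch with hypothetical calibration data:

```python
def linfit(x, y):
    """Least-squares slope m and intercept b for y = b + m*x."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    m = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    b = (sy - m * sx) / n      # equivalently b = y-bar - m*x-bar
    return m, b

# Hypothetical calibration: concentration (ppm) vs. instrument signal
conc = [0.0, 1.0, 2.0, 3.0, 4.0]
signal = [0.02, 0.21, 0.39, 0.61, 0.80]
m, b = linfit(conc, signal)
```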
Linear regression: y = (b ± Sb) + (m ± Sm)x

Finding Sy, Sm, Sb (di = yi − (b + m·xi) is the residual of point i):

Sy = √[ Σ di² / (n − 2) ]

Sm = Sy √[ n / (nΣxi² − (Σxi)²) ]

Sb = Sy √[ Σxi² / (nΣxi² − (Σxi)²) ]
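A Python sketch that computes Sy, Sm, and Sb alongside the fit (the di are the residuals; any data fed to it below are illustrative only):

```python
from math import sqrt

def linfit_errors(x, y):
    """Fit y = b + m*x and return m, b with uncertainties Sy, Sm, Sb."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    den = n * sxx - sx ** 2
    m = (n * sxy - sx * sy) / den
    b = (sy - m * sx) / n
    # Residual standard deviation about the regression line
    s_y = sqrt(sum((yi - (b + m * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    s_m = s_y * sqrt(n / den)      # uncertainty of the slope
    s_b = s_y * sqrt(sxx / den)    # uncertainty of the intercept
    return m, b, s_y, s_m, s_b
```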
Important Parameters in Instrumental Analysis
1) Sensitivity
2) Detection Limit
3) Dynamic Range
4) Selectivity
5) Signal-to-noise Ratio
Detection Limit (LOD)
• LOD: the minimum analyte concentration that can be determined with statistical confidence.
• The analytical signal must be statistically greater than the random noise of the blank (i.e. the analytical signal exceeds the mean blank signal by 2 or 3 times the S.D. of the blank measurements, approximately equal to the peak-to-peak noise).

Calculation of LOD
• The minimum detectable analytical signal (Sm) is given by: Sm = S̄bl + k·SDblank (k is usually 3)
• To determine it experimentally:
  – Perform 20-30 blank measurements over an extended period of time.
  – Calculate S̄bl (mean blank signal) and SDblank.
  – The detection limit (Cm) is: Cm = (Sm − S̄bl)/m = k·SDblank/m
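A sketch of the k·SD calculation in Python; the blank readings and the calibration slope are made-up numbers:

```python
from statistics import mean, stdev

# Hypothetical: 20 blank readings and a calibration slope (signal per ppm)
blanks = [0.101, 0.098, 0.103, 0.097, 0.100, 0.102, 0.099, 0.101,
          0.098, 0.100, 0.103, 0.097, 0.099, 0.102, 0.100, 0.101,
          0.098, 0.100, 0.099, 0.102]
m = 0.196    # calibration slope, hypothetical
k = 3        # k = 3 for the usual 3-sigma detection limit

S_m = mean(blanks) + k * stdev(blanks)   # minimum detectable signal
C_m = (S_m - mean(blanks)) / m           # detection limit = k*SD_blank/m
```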
LOQ, LOL, Dynamic Range
• LOQ (limit of quantitation): the lowest concentration at which quantitative measurements can reliably be made.
  Signal at the LOQ = S̄bl + 10 × SDblank
• LOL (limit of linearity): the point where the signal is no longer proportional to concentration.
• Dynamic range: from the LOQ to the LOL.
Cm: detection limit
Sensitivity
• Indicates the response of the instrument to changes in analyte concentration; a measure of a method’s ability to distinguish between small differences in concentration in different samples.
• In other words, the change in analytical signal per unit change in [analyte].
• Determined by the slope of the calibration curve and by the precision:
  – For two methods with equal precision, the one with the steeper calibration curve is more sensitive (calibration sensitivity).
  – If two methods have calibration curves with equal slopes, the one with the higher precision is more sensitive (analytical sensitivity).
Calibration Sensitivity
– The slope of the calibration curve: S = mc + Sbl (m = slope; c = concentration; Sbl = blank signal)
– Advantage: sensitivity is independent of [analyte].
– Disadvantage: does not account for the precision of individual measurements.
Analytical Sensitivity (defined by Mandel and Stiehler)
• Includes precision in the sensitivity definition: γ = m/Ss (m = slope; Ss = standard deviation of the measurement)
- Advantage: insensitive to amplification factors, i.e. increasing the gain increases m, but Ss increases by the same factor, so γ stays constant.
- Disadvantage: concentration dependent, as Ss usually varies with [analyte].
Selectivity
• The degree to which a measurement is free from interferences by other species contained in the matrix.
• The detected analytical signal is the sum of the analyte signal plus the interference signals:
  S = maCa + mbCb + mcCc + Sblank
• Selectivity is a measure of how easy it is to distinguish between the analyte signal and the interference signals.
• The selectivity of an analytical method can be described using a figure of merit called the selectivity coefficient:
  kb,a = mb/ma ; kc,a = mc/ma
  S = ma(Ca + kb,aCb + kc,aCc) + Sblank
• Selectivity coefficients range from 0 to values much greater than 1; they can be negative if an interference reduces the observed signal.
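The selectivity-coefficient bookkeeping above is easy to mirror in code; all slopes and concentrations below are hypothetical:

```python
# Hypothetical calibration slopes for analyte a and interferents b, c
m_a, m_b, m_c = 1.00, 0.05, 0.20   # signal per ppm
S_blank = 0.02

k_ba = m_b / m_a   # selectivity coefficient of b relative to a
k_ca = m_c / m_a

C_a, C_b, C_c = 2.0, 1.0, 0.5      # ppm, hypothetical
S = m_a * (C_a + k_ba * C_b + k_ca * C_c) + S_blank
```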
Standard Addition Calibration
• Most useful when analyzing complex samples in which significant matrix effects are possible.
• The most common form is adding one or more standard aliquots to sample aliquots (spiking the sample).
• If the sample is limited, the standard can be added successively to a single sample aliquot.
• The measured signal for a spiked aliquot is

  S = k (VsCs + VxCx) / Vt

  where k is a proportionality constant relating signal to concentration, Vs is the volume of standard added at a concentration Cs, Vx is the volume of unknown (aliquot) added at a concentration Cx, and Vt is the total (final) volume.
The Standard Addition Method (Spiking)
Technique to be used when:
– samples have substantial matrix effects;
– the assay requires instrumental conditions that are difficult to control.
Procedure:
• A measurement is made on a portion of the sample.
• Varying but known amounts (called spikes) of the assayed substance are added to several equal portions of the sample → standard addition.
• Each solution is diluted to the same volume and measured.
• The assay measurement is then plotted as a function of the concentration of the spike.
• The resulting plot is extrapolated to the concentration axis (i.e. the x-axis).
Internal Standard Method
An internal standard is a substance that is added in a constant amount to all samples, blanks and calibration standards in an analysis.
Procedure:
• A carefully measured quantity of the internal standard is introduced into each standard and sample.
• The solutions are diluted to the same volume and the analytical signal is measured.
• Calibration curve: plot the ratio of the analyte signal to the internal-standard signal vs. the analyte concentration of the standards.
• The ratio for the samples is then used to obtain their analyte concentrations from the calibration plot.
Internal Standard (IS)
• Internal standards are essential if the instrumental response varies with time.
• Internal standards are very useful if you have matrix effects.
Chapter 7: Quality Assurance / Quality Control
• QA: The planned measures that ensure a service or product meets minimum professional standards.
• QC: The day-to-day activities that monitor the quality of laboratory reagents, supplies and equipment.
• QA/QC: Proficiency Testing
Laboratory Accreditation
Validation
ISO 9000
• An international set of standards for quality management.
• Applicable to a range of organisations from manufacturing to service industries.
• ISO 9001 applicable to organisations which design, develop and maintain products.
• ISO 9001 is a generic model of the quality process that must be instantiated for each organisation using the standard.
ISO 9001
ISO 9000 certification
• Quality standards and procedures should be documented in an organisational quality manual.
• An external body may certify that an organisation’s quality manual conforms to ISO 9000 standards.
• Some customers require suppliers to be ISO 9000 certified although the need for flexibility here is increasingly recognised.
Documentation standards
• Particularly important - documents are the tangible manifestation of the software.
• Documentation process standards– Concerned with how documents should be
developed, validated and maintained.
• Document standards– Concerned with document contents, structure, and
appearance.
• Document interchange standards– Concerned with the compatibility of electronic
documents.
Document standards
• Document identification standards
– How documents are uniquely identified.
• Document structure standards– Standard structure for project documents.
• Document presentation standards– Define fonts and styles, use of logos, etc.
• Document update standards– Define how changes from previous versions
are reflected in a document.
Quality in Environmental Analysis
• Value of Quality Control.
• General QC principles.
• Sources of error.
• Terminology and Definitions.
• Quality Control vs. Quality Assurance.
QC Terminology and Definitions
Principal Data Quality Indicators (DQIs):
– Precision
– Bias
– Accuracy
– Representativeness
– Comparability
– Completeness
Precision:
- The agreement between the numerical values of two or more measurements that have been made in an identical fashion.
- Calculated as a range or standard deviation.
- Intralaboratory & interlaboratory precision.
QC Terminology and Definitions
Bias:
- The systematic or persistent distortion of a measurement process that causes errors in one direction.
Accuracy:
- The measure of how close an individual or average measurement is to the true value.
- A combination of precision and bias.
- A reference material must be used in determining accuracy.
QC Terminology and Definitions
Representativeness:
- A measure of the degree to which data accurately and precisely represent a sampling point or process condition.
- A measure of how closely a sample represents a larger process.
Comparability:- A qualitative term that expresses the confidence
that two data sets can contribute to a common analysis.
QC Terminology and Definitions
Completeness:- A measure of the amount of valid data
obtained from a measurement system, expressed as a percentage of the valid measurements that should have been collected (i.e., measurements that were planned to be collected).
Quality Control vs. Quality Assurance
- QC is a component of QA.
- QC measures and estimates errors in a system.
- QA is the ability to prove that the data is as reported.
Sources of Error
- Sample errors
- Reagent errors
- Reference material errors
- Method errors
- Calibration errors
- Equipment errors
- Signal registration and recording errors
- Calculation errors
- Errors in reporting results
Sources of Error
Sample Errors
- Sample container contaminated.
- Incorrect sample location.
- Non-representative sample.
- Incorrect sample container.
- Sample mix-up.
Reagent Errors
- Impure reagents or solvents.
- Improper storage of reagents.
- Neglect of reagent expiration date.
- Evaporated reagents.
- Different purities or grades not taken into account.
Sources of Error
Reference Material Errors
- Impure reference materials.
- Errors from interfering substances.
- Changes due to improper storage.
- Errors in preparing the reference material.
- Using expired reference material.
General Method Errors
- Deviating from the analysis procedure.
- Disregarding the limit of detection.
- Disregarding a blank correction.
- Calculation errors (dilutions, mixtures, additions).
- Not using the correct analytical procedure.
Sources of Error
Calibration Errors
- Volumetric measuring errors.
- Weighing errors.
- Inaccurate equipment adjustments.
Equipment Errors
- Equipment not cleaned.
- Maintenance neglected.
- Temperature, electrical, and magnetic effects.
- Errors in using auto-pipettes (not calibrated, pipette tip not correctly attached, contamination).
- Errors in using glass pipettes (damaged, bad technique, contamination).
Sources of Error
Equipment Errors (continued)
• Cuvette errors (defects not considered, unsuitable cuvette glass, not filled to minimum, wet on the outside, air bubbles, contamination).
• Photometer errors (wrong wavelength, insufficient lamp intensity, dirty optics, drift effect ignored, incorrectly set zero, light entering the sample chamber).
Sources of Error
Signal Registration and Recording Errors
- Incorrect range setting.
- Reading errors.
- Recording errors.
- Switching of data.
Calculation Errors
- Arithmetic errors, decimal-point errors, incorrect units.
- Rounding errors.
- Not taking the reagent blank values into account.
- Errors in the dilution factor.
Errors in Reporting Results
- Omitting a sample error.
- No quality assurance implemented.
Validation demonstrates that a procedure is robust, reliable and reproducible
• A robust method is one which produces successful results a high percentage of the time.
• A reliable method is one that produces accurate results.
• A reproducible method produces similar results each time a sample is tested.
QA: does the method still work?
• Control charts; documenting and archiving.
• Proficiency testing: participating in collaborative interlaboratory studies.
• Calculate the z-score:

  z = (X̄i − X̂) / S

  where X̄i is the mean of the replicate measurements by laboratory i, X̂ is the accepted concentration, and S is the standard deviation of the accepted concentration.
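A z-score sketch in Python; the replicate results, assigned value, and target standard deviation are invented, and the scoring bands quoted in the comment are the common proficiency-testing convention:

```python
from statistics import mean

replicates = [10.2, 10.5, 10.1]   # one lab's replicate results, mg/L (hypothetical)
X_hat = 10.0                      # accepted (assigned) concentration
S = 0.25                          # SD of the accepted concentration

z = (mean(replicates) - X_hat) / S
# Common interpretation: |z| <= 2 satisfactory, 2 < |z| < 3 questionable,
# |z| >= 3 unsatisfactory.
```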
Defining the Problem
1. What accuracy is required?
2. How much sample is available?
3. What is the concentration range of the analyte?
4. What components of the sample will cause interference?
5. What are the physical and chemical properties of the sample matrix?
6. How many samples are to be analyzed?
Selecting an Analytical Method
Numerical Criteria for Selecting Analytical Methods
Parameters for method validation
• Accuracy
• Precision
• LOD, LOQ, Sensitivity
• Selectivity
• Linearity
• Range
• Ruggedness or Robustness
Accuracy (determination)
• Compare results of the method with results of an established reference method
• Positive controls (dilution must be done separately from calibration point with fresh reagents, different supplier of standard or other batch than calibration)
• Measurements of CRMs.
• Spiking the sample matrix with a known concentration of RM.
Standard Operating Procedure (SOP)
It should include:
• Validity (e.g. application in wastewater)
• Short description of the main principle
• Possible errors and problems
• Preparation of reagents, standards, instruments
• Sample preparation (sampling, enrichment, chromatography, detection)
• Quantification of the compounds
• QA/QC