Experimental Design

Nilesh B. Patel, PhD
Dept Medical Physiology
University of Nairobi, Kenya

What is science?

The systematic study of the structure and behavior of the physical world, involving experimentation and measurement and the development of theories to describe the results of these activities.

Cambridge International Dictionary of English

What is an experiment?

A test done in order to learn something or to discover whether something works or is true (plausible).

Experiments are done to prove the hypothesis is false.

Only falsification is possible with certainty; proving something can never be done with certainty

What can be studied with the methods of science?

1. The capital of Kenya is Nairobi.
2. Holland is twice the size of Kenya.
3. All humans are mortal.
4. All rabbits are grey.
5. All ziwats are blue.
6. Mentally ill people are possessed by an evil spirit that
7. Mentally ill people are possessed by an evil spirit that cannot be detected by any known means.

Adapted from: Circadian Physiology. Roberto Refinetti. CRC Press 2000

Galileo Galilei (1564-1642)

Father of modern physics
Father of modern observational astronomy
Father of modern science
Pioneered the use of quantitative experiments whose results could be analyzed with mathematical precision

How many of you accept that the sun goes round the earth?
How many of you accept that the earth goes round the sun?

Types of Experiments

Experiments are done to test a hypothesis; a prediction.

Observational
Experimental
Quasi-experimental

Experimental Design

Whether a study's findings are useful depends crucially on its design.

No matter how ingenious or important an idea for an experiment might be, if the study is badly designed, it is worthless.

An excellent procedure for determining cause and effect.

Does night cause day or does day cause night?

Three Aims of Research

Validity: the results actually show what you intend them to show
Reliability: potentially replicable by yourself or anyone else
Importance: subjective

Research can possess all the above qualities but still be essentially trivial.

Research cannot be important if its findings are unreliable or invalid.

Score or Measurement

Consists of a number made up of different components:
1) A true measure of the thing we want to measure
2) A measure of other things
3) Systematic (non-random) bias: measuring other things inadvertently.
4) Random (non-systematic) error, which should cancel out over a large number of observations.

We want the score or measurement to consist of as much true score as possible and as little of the other factors as possible (validity and reliability).
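The difference between the two kinds of error can be illustrated with a small simulation. This is only a sketch: the true value, the size of the bias and the noise level are made-up numbers, and the errors are assumed to be additive and normally distributed.

```python
import random
import statistics

def simulate_scores(true_value, bias, noise_sd, n, seed=0):
    """Return n simulated scores: true value + systematic bias + random error."""
    rng = random.Random(seed)
    return [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]

# Hypothetical values chosen only for illustration.
true_value = 50.0   # the thing we want to measure
bias = 2.0          # systematic (non-random) error
noise_sd = 5.0      # spread of the random error

few = simulate_scores(true_value, bias, noise_sd, n=5)
many = simulate_scores(true_value, bias, noise_sd, n=10_000)

print("mean of 5 scores:     ", round(statistics.mean(few), 2))
print("mean of 10000 scores: ", round(statistics.mean(many), 2))
# The random error averages out with many observations, the bias does not.
print("error left in the mean:", round(statistics.mean(many) - true_value, 2))
```

With many observations the mean settles near the true value plus the bias, so collecting more data improves reliability but does not remove a systematic error.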

Types of Measurement

Nominal
Ordinal
Interval
Ratio

Is what you measured a real measure of what you are interested in?

It is a measure of what you are interested in.
Random (non-systematic) errors.
Non-random (systematic) errors.

Evidence for the intellectual inferiority of women

Paul Broca (19th century) made careful measurements of brain weight and reported that Caucasian men had larger brains than Caucasian women, who in turn had larger brains than Black people or any other non-Caucasians.

Modern brains were supposedly heavier than medieval brains, and French brains were heavier than German brains.

The differences in brain weight were considered to reflect differences in intelligence between these groups.

Morris water maze (MWM) and disruption of the biological clock, where the measure is the time to reach the platform.

Anesthesia and learning.

Experimental Design

This is important so that the results can be interpreted and other causes can be excluded from the most likely conclusion.

The results have to be valid, reliable and generalizable (important)

Simplest Experiment

Baseline measurement, then treatment, then measurement

There is a difference so my treatment has an effect

Alternative explanations: time, animal handler effect, habituation

The need of control group

Lazzaro Spallanzani, 1793

Found that bats and owls differ in their ability to avoid objects in the dark.
With hoods the bats could not avoid the objects. So the bats have better night vision than owls.
Covered with transparent hoods, the bats have a problem (key control experiment).
Blind the bats and they are fine.
Wax in the ears.
Pierce invented the ultrasound microphone and Griffin walked in with a bat.
Association is not causation but can be.

Experiment with control group

[Diagram: the experimental group and the control group each receive their treatment and are then measured.]

Great!!! There is a difference so my treatment has an effect.

[Figure: Population distribution. Frequency plotted against height (2 cm to 100 cm), with the treatment and control samples marked.]

Randomization

Random allocation

[Diagram: subjects are randomly allocated to the experimental or the control treatment, and each group is then measured.]

Post-test/control group

Independent variable and dependent variable

[Diagram: each group is measured before and after the experimental or control treatment.]
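Random allocation can be done with a shuffled list. The sketch below is one simple way to do it, not a prescribed procedure; the function name `randomly_allocate` and the animal labels are hypothetical.

```python
import random

def randomly_allocate(subjects, groups=("experimental", "control"), seed=None):
    """Shuffle the subjects, then deal them out to the groups in turn."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    allocation = {group: [] for group in groups}
    for i, subject in enumerate(shuffled):
        allocation[groups[i % len(groups)]].append(subject)
    return allocation

# Twelve hypothetical animals, labelled rat01 to rat12.
animals = [f"rat{i:02d}" for i in range(1, 13)]
allocation = randomly_allocate(animals, seed=42)

for group, members in allocation.items():
    print(group, sorted(members))
```

Dealing out the shuffled list keeps the group sizes equal (or within one of each other), while chance alone decides which subject ends up in which group.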

Between-group Design

Advantages
Simplicity
Less chance of practice or fatigue effects
Useful when it is impossible for an individual to participate in all experimental conditions

Disadvantages
Expense in terms of time, effort and number of subjects

[Diagram: Repeated-Measure Design. Measurements are taken under both the Treatment and the No Treatment conditions.]

Repeated-Measures Design

Advantages
Economy
Sensitivity

Disadvantages
Carry-over effect
The need for the condition to be reversible


Multiple Levels of Independent Variable

4 levels: Control, Level A, Level B, Level C

[Diagram: random allocation to the four treatment levels, with measurements for each group.]

Latin Square Design

To deal with the problem of order effects in within-subject designs.

The order of the various conditions is counterbalanced so that each condition occurs just once in each position of the order.

Example

Three conditions A, B, and C.

Avoid systematic order effects

6 possible combinations of order

Order of conditions or trials

Group 1: A B C
Group 2: B C A
Group 3: C A B
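The three orders in this example can be generated by cyclically shifting the list of conditions. The sketch below builds a square this way; the function name `latin_square` is just an illustrative choice, and a cyclic square is only one of several possible Latin squares.

```python
def latin_square(conditions):
    """Build a cyclic Latin square: row k is the condition list shifted by k."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]

square = latin_square(["A", "B", "C"])

for group, order in enumerate(square, start=1):
    print(f"Group {group}:", " ".join(order))
# Group 1: A B C
# Group 2: B C A
# Group 3: C A B
```

Each condition appears once in every row and once in every column, so no condition is always tested first or always tested last, using 3 of the 6 possible orders.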

James Brown, EDGAR programme: www.jic.bbsrc.ac.uk/services/statistics/edgar.htm

Multi-Factorial Design

Two or three independent variables in the same study

Using more than 2 is not recommended, as the analysis gets quite complicated and the conclusions have to be qualified.

Examples: gender and treatment; day and performance

Correlations

How do changes in one variable relate to changes in another variable?

Can have more than one variable and determine the contribution or weight or % that each variable has on the measured value.

Multi-factorial, multi-dimensions, e.g. physics

Correlations are not cause and effect
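As an illustration, the strength of a linear relationship between two variables can be summarised with Pearson's correlation coefficient r. The sketch below computes it from first principles; the dose and response values are invented for the example, and r squared is shown as the proportion of variance the two variables share.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariation of x and y scaled by their spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data for illustration only.
dose = [1, 2, 3, 4, 5]
response = [2.1, 3.9, 6.2, 8.1, 9.8]

r = pearson_r(dose, response)
print("r   =", round(r, 3))
print("r^2 =", round(r ** 2, 3))  # proportion of shared variance
```

A value of r close to +1 or -1 shows that the two variables move together, but on its own it does not show which one, if either, causes the other.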

Statistical Analysis

STATISTICS

Descriptive: graphs/tables, numerical summary
Inferential: calculation of probability, tests of significance

Descriptive statistics summarizes the data obtained

Mode, Median, Mean
Range, Quartiles, Squared Sum, Variance, Standard deviation

Inferential statistics is used to reach certain conclusions that can be applied to public health planning or patient care.

Modern knowledge methods and objective decision making process


Descriptive Statistics

Summarize the data

Measures of central tendency: mean, mode, median
Spread of data: standard deviation (sd), standard error of the mean (sem), range
Graphs: line graphs, histograms, pie charts
Tables

Measuring the accuracy of the mean

The mean is a statistic that predicts the likely score of a person or measurement.

Most of the time it will not be a number that occurs in the data or measurements; it is a hypothetical value.

Mean number of children is 2.5 per family in the UK.

The mean will only be a perfect representation of the data if all of the scores collected are the same.

[Figure: data points plotted against a line marking the mean.]

The closer the mean is to all the data points, the more representative the mean is of the data.

Distribution of data

Means are the same
Distribution of the scores is different
Range

Distribution measures

Range
Quartile
Mean deviation
Sum of squared deviations
Mean of the sum of squared deviations (variance)
Standard deviation
Standard error of the mean (sem)
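The measures in this list build on one another, which a short worked example makes visible. The data values below are invented, and the variance is computed as a sample variance (dividing by n - 1), which is an assumption about the convention intended here.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores

n = len(data)
mean = sum(data) / n                                   # 5.0
data_range = max(data) - min(data)                     # 7
quartiles = statistics.quantiles(data, n=4)            # three cut points
mean_deviation = sum(abs(x - mean) for x in data) / n  # 1.5
sum_of_squares = sum((x - mean) ** 2 for x in data)    # 32.0
variance = sum_of_squares / (n - 1)                    # sample variance
sd = math.sqrt(variance)                               # standard deviation
sem = sd / math.sqrt(n)                                # standard error of the mean

print("range:", data_range)
print("quartiles:", quartiles)
print("mean deviation:", mean_deviation)
print("sum of squared deviations:", sum_of_squares)
print("variance:", round(variance, 3))
print("sd:", round(sd, 3))
print("sem:", round(sem, 3))
```

Squaring the deviations stops positive and negative deviations from cancelling out, and taking the square root at the end returns the measure to the original units.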

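Standard deviation and percentage of data (normally distributed data)

Mean ± 1 sd: 68% of the data
Mean ± 1.96 sd: 95% of the data
Mean ± 2 sd: about 95% of the data (95.4%)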
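These percentages can be checked against the standard normal distribution. The sketch below assumes the data are normally distributed and uses `NormalDist` from the Python standard library.

```python
from statistics import NormalDist

def coverage(k):
    """Proportion of a normal distribution lying within k sd of the mean."""
    z = NormalDist()  # standard normal: mean 0, sd 1
    return z.cdf(k) - z.cdf(-k)

for k in (1.0, 1.96, 2.0):
    print(f"within {k} sd of the mean: {coverage(k):.1%}")
# within 1.0 sd of the mean: 68.3%
# within 1.96 sd of the mean: 95.0%
# within 2.0 sd of the mean: 95.4%
```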

Population and sample parameters

Population parameters: μ (mu, mean), σ (sigma, standard deviation)
Sample parameters: X̄ (mean), sd (standard deviation)
The sample parameters are estimates of the population parameters.

Standard error of the mean (sem)

How well does the sample mean represent the population mean?

This is after all the purpose of research.

If we take different samples from the same population, the means of some samples will be similar and those of others will be different.

But usually we do not take many samples; we usually take just one.

So how good an estimate of the population mean is it?

[Figure: Population and sample parameters. Means shown for the population and for samples drawn from it: X = 10, 10, 12, 9, 11, 11, 10.]
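The idea behind the standard error of the mean can be sketched with a simulation: draw many samples from one population, look at how much the sample means vary, and compare that with the estimate obtained from a single sample. The population values (mean 10, sd 2) and the sample size of 25 are assumptions made for the example.

```python
import math
import random
import statistics

rng = random.Random(1)
mu, sigma = 10.0, 2.0   # hypothetical population mean and sd
n = 25                  # size of each sample

# Take many samples and record the mean of each one.
sample_means = []
for _ in range(2000):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    sample_means.append(statistics.mean(sample))

# How much do the sample means vary from sample to sample?
spread_of_means = statistics.stdev(sample_means)

# In practice we have one sample, so we estimate that spread as sd / sqrt(n).
one_sample = [rng.gauss(mu, sigma) for _ in range(n)]
sem = statistics.stdev(one_sample) / math.sqrt(n)

print("sd of the sample means:   ", round(spread_of_means, 3))
print("theory, sigma / sqrt(n):  ", round(sigma / math.sqrt(n), 3))
print("sem from a single sample: ", round(sem, 3))
```

A small sem means that sample means cluster tightly around the population mean, so a single sample is likely to give a good estimate of it.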

Why Inferential Statistics?

Descriptive statistics does not help to answer research questions.

Most of modern research is hypothesis driven.

There are two common hypotheses (predictions) inherent:
1. Experimental hypothesis (HA): the experimental manipulation will have an effect.
2. Null hypothesis (H0): the experimental manipulation will have no effect.

So what if there is a difference between the means?

Two samples from the same population will have different means.

Samples taken from the extremes of the distribution will have large difference in the means.

So large differences suggest that the means are from two different populations, but how can we be certain that the difference is not due to sampling from the extremes of the distribution?

Application of inferential statistics.

The question

What is the probability that the difference between the means is due to chance and not due to my manipulation?

Carry out tests of significance.
The tests of significance give the probability of obtaining the difference between the means by chance.
Choice of test depends on the type of measurement and the experimental design.
Whichever test you use, you will end up with a number: the test statistic, e.g. t, z, F.
What is the probability of obtaining that value of the test statistic for the means and the spread of data that I have?

On what basis do I or anyone else accept that my manipulation did indeed have an effect?

Test Statistic

A test statistic (t, F, z) is a statistic with a known distribution, e.g. age and death.

Test statistic = systematic variation / unsystematic variation.

The exact form of this equation changes from test to test, but essentially the comparison is the amount of variance created by an experimental effect against the amount of variance due to random factors.

If the experiment has had an effect then it will create more variance than random factors alone.


Ronald Aylmer Fisher (1890-1962)

"To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of."

Fisher tea test

Is the milk added after or before the tea is added?
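The logic of the tea test can be shown with a short calculation. The slide does not give the design, so the sketch assumes the classic version of the experiment: 8 cups, 4 with milk added first and 4 with tea added first, and a taster who must pick out the 4 milk-first cups.

```python
from math import comb

cups = 8          # assumed total number of cups
milk_first = 4    # assumed number of cups with milk added first

# Number of different ways to choose which 4 of the 8 cups are "milk first".
ways = comb(cups, milk_first)

# A taster who is purely guessing picks the one correct selection with
# probability 1 / ways.
p_all_correct = 1 / ways

print(ways, round(p_all_correct, 4))  # 70 0.0143
```

Under these assumptions a perfect score would happen by guessing only about 1.4% of the time, which is below the conventional 5% level.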

The level of significance: p < 0.05.

There is a 5% or less probability of obtaining a difference this large between the means by chance alone.

I am therefore willing to accept that the difference I obtained was due to my manipulation and not due to chance.

My results are significant but are they important?

Which test to use?

Sample size
Data distribution
Homogeneity of variance
Independent measurements or repeated measurements
Comparison between 2 groups
Comparison between 3 or more groups
Type of data

Parametric or Non-parametric

Parametric Tests

1. Arithmetic means: interval or ratio data
2. Variance in the one condition = variance in the other condition
   Homogeneity of variance (Levene's test)
   Sphericity assumption (Mauchly's test)
3. Normal distribution
   Tests: plot histogram, Kolmogorov-Smirnov test, Shapiro-Wilk test

Parametric tests

Student's t-test
   Independent t-test
   Dependent t-test

Analysis of Variance (ANOVA)
   One-Way ANOVA
   Two-Way ANOVA
   Repeated measures ANOVA
   Mixed ANOVA

Student's t-test

Only two groups of data
Sample size < 30
Ratio: t = difference between the means / estimated s.e. of the difference between those two sample means.

Independent t-test (unpaired)

Two groups
Different participants, i.e. each participant contributes only one data point
Normal distribution
Levene's test not significant (p = 0.492)
Presenting the data:
m = 24.2, se = 1.49; m = 20.00, se = 1.30.
t(18) = 2.12, p < 0.05, r = 0.45.
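The reported t and r can be reproduced from the two means and standard errors. This is a sketch under two assumptions that the slide does not state: the groups are independent and of equal size, so the standard error of the difference is the square root of the sum of the squared standard errors, and the effect size is computed as r = sqrt(t² / (t² + df)).

```python
import math

mean_1, se_1 = 24.2, 1.49   # group 1: mean and standard error
mean_2, se_2 = 20.0, 1.30   # group 2: mean and standard error
df = 18                     # degrees of freedom from the reported t(18)

# Standard error of the difference between two independent means
# (assumes equal group sizes).
se_diff = math.sqrt(se_1 ** 2 + se_2 ** 2)

# t is the difference between the means divided by that standard error.
t = (mean_1 - mean_2) / se_diff

# Effect size r derived from t and its degrees of freedom.
r = math.sqrt(t ** 2 / (t ** 2 + df))

print(round(t, 2), round(r, 2))  # 2.12 0.45
```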

Analysis of Variance (ANOVA)

For 3 or more experimental groups
Generates an F ratio
Tells whether the means of the samples are significantly different, but not which means.
Additional tests need to be done to determine which means are statistically different (planned comparison or post-hoc tests).

Types of ANOVA
One-Way ANOVA
Two-Way ANOVA
Repeated measures ANOVA
Mixed ANOVA

One-Way ANOVA

3 or more groups
Different participants in each group
One-way for 1 independent variable
Groups are independent, i.e. data in one group does not influence the data in another group
Homogeneity of variance
If Levene's test is significant, use non-parametric tests or transform the data

Output from SPSS

                 Sum of squares   df    Mean square   F         Sig
Between groups   450.664          5     90.133        269.733   0.000
Within groups    38.094           114   0.334
Total            488.758

Presentation of results
F(5, 114) = 269.73, p < 0.05
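The mean squares and the F ratio in this output follow directly from the sums of squares and the degrees of freedom, as this short check shows. Only the numbers in the table are used; the variable names are illustrative.

```python
ss_between, df_between = 450.664, 5    # between-groups sum of squares and df
ss_within, df_within = 38.094, 114     # within-groups sum of squares and df

# A mean square is a sum of squares divided by its degrees of freedom.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# F compares systematic (between-groups) variance with
# unsystematic (within-groups) variance.
f_ratio = ms_between / ms_within

print(round(ms_between, 3), round(ms_within, 3), round(f_ratio, 2))
# 90.133 0.334 269.73
```

The small difference from the 269.733 printed by SPSS comes from the table showing sums of squares rounded to three decimals.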

Post-hoc tests

REGWQ or HSD: equal sample size and homogeneity of variance
Bonferroni: conservative, affected by group variance
Gabriel's Procedure: slight differences in sample size
Hochberg's GT2: big differences in sample size
Games-Howell Procedure: if the variances are not equal

Assumption of sphericity

Equality of variance of the differences between treatment levels
Mauchly's test
Violation of sphericity leads to a loss of power, i.e. increased probability of Type II error
Correct the df by:
   Greenhouse-Geisser estimate of sphericity
   Huynh-Feldt estimate of sphericity
   Lowest possible estimate of sphericity (lower-bound)
If the estimate is > 0.75 use Huynh-Feldt
If the estimate is < 0.75 use Greenhouse-Geisser

Some sort of result

Mauchly's test indicated that the assumption of sphericity had been violated, χ²(5) = 13.12, p < 0.05, therefore degrees of freedom were corrected using Huynh-Feldt estimates of sphericity (ε = 0.85). The result showed that the number of women eyed-up was significantly affected by the amount of alcohol drunk, F(2.55, 48.40) = 4.73, p < 0.05, r = 0.40.

Non-Parametric Test

Assumption-free tests
Less strict about the distribution of the data being analyzed.
Raw scores are ranked: lowest score rank 1, next lowest rank 2, etc.
Analysis is carried out on the ranks
Less power than parametric tests, hence higher chance of Type II error.

Basic tests

Mann-Whitney test = independent t-test

Wilcoxon Signed-Rank test = dependent t-test

Kruskal-Wallis test = one-way ANOVA

Friedman's ANOVA = one-way repeated measures ANOVA

Mann-Whitney Test = independent t-test

Two conditions and different participants in each condition

Men (Mdn = 27) did not seem to differ from dogs (Mdn = 24) in the amount of dog-like behaviour they displayed (U = 194.5, ns).

Graphs are usually box and whisker plots.
Use of the median

Non-parametric tests are testing the difference in ranks; not the difference in means.

Means are biased by outliers; whereas ranks are not.
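The point about outliers can be seen by computing the Mann-Whitney U statistic directly. The sketch below counts, for every pair of scores, which group has the higher one (equivalent to comparing ranks); the data are invented, and reporting the smaller of the two U values is a common convention assumed here.

```python
def mann_whitney_u(group_a, group_b):
    """U statistic: pairwise wins of one group over the other, ties count half."""
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(group_a) * len(group_b) - u_a
    return min(u_a, u_b)

# Hypothetical scores; the first version of group A contains an extreme outlier.
with_outlier = [3, 5, 8, 100]
without_outlier = [3, 5, 8, 9]
group_b = [1, 2, 4, 6]

print(mann_whitney_u(with_outlier, group_b))     # 3.0
print(mann_whitney_u(without_outlier, group_b))  # 3.0
```

Replacing 9 with 100 changes the mean of group A a great deal, but it does not change the ordering of the scores, so U stays the same.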

Kruskal-Wallis Test = one-way ANOVA

More than 2 conditions and different participants have been used in each condition.

The test statistic is H and has a chi-squared distribution.

Children's fear beliefs about clowns were significantly affected by the format of information given to them (H(3) = 17.06, p < 0.01). Mann-Whitney tests were used to follow up this finding. A Bonferroni correction was applied and so all effects are reported at a 0.0167 level of significance. It appears that fear beliefs were significantly higher after the adverts compared to the control, U = 37.50, r = -.60.

Claude Bernard (1813-1878)

Those whose minds are bound and cramped. They make poor observations, because they choose among the results of their experiments, observation, and reading only what suits their object, neglecting whatever is unrelated to it and carefully setting aside everything which might tend toward the idea they wish to combat.

Ignaz Semmelweis (1818-1865)

Savior of mothers
10-35% mortality
Death of colleague Jakob Kolletschka (1847)
Introduced washing of hands with chlorinated lime: mortality fell from 12.24% to 2.38%
The Etiology, Concept and Prophylaxis of Childbed Fever (1861).

Semmelweis Reflex

Dismissing or rejecting out of hand information, automatically, without thought, inspection, or experiment.


