
BME STATS WORKSHOP

Date post: 13-Jan-2016
Upload: yates
Page 1: BME STATS WORKSHOP

BME STATS WORKSHOP

Introduction to Statistics

Page 2: BME STATS WORKSHOP

Part 1 of workshop

Page 3: BME STATS WORKSHOP

The way to think about inferential statistics

• They are tools that allow us to make black and white statements even though the data does not clearly provide answers.

– This is to say that we will use probabilities which speak of shades of grey but will make statements with respect to rejecting or failing to reject some null hypothesis.

Page 4: BME STATS WORKSHOP

Inferencing from data analysis

• As scientists we have the unique privilege of using ingenious tools and methods that help us make informed decisions.

• One of those tools is statistical analysis. It allows us to judge more accurately what our data really show.

• This workshop should help you draw better conclusions from your data by using simple but effective statistical tools to cut through the shades of grey often encountered in research.

Page 5: BME STATS WORKSHOP

The Essence of Inferential Statistics

1. We compare a statistic obtained from acquired data to a theoretical distribution of that statistic. Thus, relativity is important in statistics.

• You will surely have conducted t-tests in the past to compare measures from a control with an experimental group.

• That t value is evaluated against a distribution of ts.

• In statistics, size does matter. Large t values increase the likelihood that the investigator will report significant results.

Page 6: BME STATS WORKSHOP

Essence cont’d

2. Signal to noise ratio.

• Most statistics used in this workshop such as the t statistic are made up of differences due to treatment and differences due to individuals (also called error). Error is simply random variation.

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\text{standard error of the difference}}$$

Page 7: BME STATS WORKSHOP

Essence cont’d

3. Rare events

• This is related directly to point one.

• In order for treatment to be successful, the obtained statistic has to be sufficiently rare.

• We will find out that large statistical values are considered rare.

• For a better understanding of these points we will describe a Monte Carlo experiment.

Page 8: BME STATS WORKSHOP

The Plan!

1. Constructing a distribution.

2. How to apply a statistic obtained from an experiment.

3. Interpretation of a result.

4. What does a significant result mean?

Page 9: BME STATS WORKSHOP

Constructing a Distribution: Some Definitions

• Sample distribution:
– A distribution of values from some measurement.
• This measurement can be of anything, such as height, weight or age to name a few.

• Sampling distribution:
– A distribution of a statistic obtained from a sample distribution.
• This statistic can be a mean, mode, median, variance or anything else that is a calculation from individual measures.
• As we will see, the t statistic can be used to construct a sampling distribution.

Page 10: BME STATS WORKSHOP

Distributions

• Sample distributions are often bell shaped or normal, but this is not guaranteed. On occasion exponential, rectangular or odd shaped distributions are observed.

• Sampling distributions, on the other hand, are almost always approximately normal in shape. This is true even if the measurements used to calculate a statistic are from non-normal distributions.

Page 11: BME STATS WORKSHOP

How to construct a sampling distribution of the t statistic. An example under the null hypothesis of equal means

• We first have to have a sample distribution of some measure from some population with specific parameters such as 25 year old women. The measurement of interest could be height.

• We then randomly sample from this distribution to make up two groups of individuals of a specified sample size.
– Ex. Two groups of ten individuals.

• From these two groups a t value is calculated. This t value is then plotted. After this calculation, the individuals are returned to the sample distribution.

• The process of “sampling” with replacement is repeated as many times as possible. Using computers you might opt for 1000 or more samplings. Thus, you would have a sampling distribution of 1000 ts.
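The sampling procedure described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population (10,000 simulated heights with a mean of 165 cm and an SD of 7 cm) and the function names `t_statistic` and `simulate_null_ts` are hypothetical and not part of the workshop material.

```python
import random
import statistics


def t_statistic(group1, group2):
    """Two-sample t: mean difference divided by the standard error of the difference."""
    se = (statistics.variance(group1) / len(group1)
          + statistics.variance(group2) / len(group2)) ** 0.5
    return (statistics.mean(group1) - statistics.mean(group2)) / se


def simulate_null_ts(population, n_per_group=10, n_draws=1000, seed=0):
    """Repeatedly draw two groups from ONE population and record each t value."""
    rng = random.Random(seed)
    ts = []
    for _ in range(n_draws):
        # Draw 20 distinct individuals and split them into two groups of ten.
        sample = rng.sample(population, 2 * n_per_group)
        ts.append(t_statistic(sample[:n_per_group], sample[n_per_group:]))
        # The population list is untouched, so everyone is "returned" for the next draw.
    return ts


# Hypothetical population: heights in cm (assumed values, for illustration only).
pop_rng = random.Random(42)
population = [pop_rng.gauss(165, 7) for _ in range(10_000)]

ts = simulate_null_ts(population)
print(len(ts), "t values; mean =", round(statistics.mean(ts), 3))
```

Because both groups always come from the same population, the resulting t values cluster around zero, which is what a sampling distribution built under the null hypothesis should look like.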

Page 12: BME STATS WORKSHOP

How to use a sampling distribution of ts

• In any sampling distribution there are a number of values that are extreme. This is normal and we will use this concept to make decisions about our experiments.

• Traditionally, we determine the t value at which point all values greater make up 5% of all values in that distribution. If we are concerned about both tails of that distribution we will find the value at which point all values greater make up 2.5% of all values on the positive tail and 2.5% on the negative tail.
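As a rough sketch of how such cutoffs could be read off a simulated distribution, the code below sorts a set of null t values and picks the values that leave 2.5% in each tail (two-tailed) or 5% in the upper tail (one-tailed). The standard normal population, the 10,000 repetitions and the helper name `null_t` are assumptions made for the example.

```python
import random
import statistics


def null_t(rng, n=10):
    """One t value from two groups of n drawn from the same normal population."""
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [rng.gauss(0, 1) for _ in range(n)]
    se = (statistics.variance(a) / n + statistics.variance(b) / n) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se


rng = random.Random(1)
ts = sorted(null_t(rng) for _ in range(10_000))

# Two-tailed 5%: the values that cut off the lowest 2.5% and the highest 2.5%.
lower = ts[int(0.025 * len(ts))]
upper = ts[int(0.975 * len(ts))]

# One-tailed 5%: the value that cuts off the highest 5% only.
one_tailed = ts[int(0.95 * len(ts))]

print("two-tailed cutoffs:", round(lower, 2), round(upper, 2))
print("one-tailed cutoff:", round(one_tailed, 2))
```

With ten individuals per group (18 degrees of freedom) the simulated two-tailed cutoffs should land close to the tabled value of about ±2.10, although the exact numbers vary with the random seed.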

Page 13: BME STATS WORKSHOP

How to use cont’d.

• We then conduct an experiment in which we have a control and an experimental group.

• We calculate a t statistic from this experiment.

• This t value is evaluated against the sampling distribution of ts we have constructed.

• If our obtained value is greater than the value from the distribution that marks the 5% cutoff we end up stating that the experiment produced a significant result. In other words the control was significantly different from the experimental group.

[Figure: sampling distribution of ts with regions labelled “Sig.”, “Not Sig.” and “Sig.”]

Page 14: BME STATS WORKSHOP

Some specifics about using a t distribution. What does stating significance really mean?

• First of all, when we find a t value that is outside of the critical values in a distribution we should really start by saying, “the obtained value is rare when calculated from two groups obtained from the same population.”

• We would then follow up that statement with, “Since that value is rare and is obtained from an experiment, it is reasonable to conclude that the groups do not come from the same population.”
– This is indeed saying that the treatment was effective. Thus, we have a significant result.

Page 15: BME STATS WORKSHOP

Monte Carlo

How will building a distribution help us understand statistics?

Page 16: BME STATS WORKSHOP

Monte Carlo

Building a t distribution

Page 17: BME STATS WORKSHOP

Distributions: ts

How do you build distributions of a statistic? In this case t.

1) You start with a population of interest.
2) Calculate means from two samples with a specific number of individuals.
3) Calculate the t statistic using those two samples.
4) Do this again and again. Possibly 1000 times or more.

Remember that these distributions are built under the null hypothesis.

[Figure: two samples (n1 = xx, n2 = xx) are drawn from the population and their means are used to calculate t. Repeat the process as often as you can.]

Page 18: BME STATS WORKSHOP

Family of ts

The larger the sample size used the less variability in the results. As we can see here, the greater the degrees of freedom (df) the less extreme are the obtained values, resulting in a tighter distribution.

Note: Degrees of freedom when using the t-test are calculated as n1+n2-2. Thus, for a sample size of 10 per group the df is 18.

Page 19: BME STATS WORKSHOP

Theoretical Distribution of ts. We use this table to determine the critical values. The computer uses the density functions.

Page 20: BME STATS WORKSHOP

Variables

• Independent variable:
– That variable you manipulate.
• Subjects are allocated to groups.

• Dependent variable:
– That variable which depends on the manipulation.
• Measures such as weight or height or some other variable that varies depending on treatment.

Page 21: BME STATS WORKSHOP

Cause and effect

• Cause can only be inferred when subjects are randomly allocated to groups.
– Random allocation ensures that, on average, all characteristics are evenly distributed across all groups.

• This way, differences between groups cannot be due to biases in the subject selection, a very important element of experimental design.
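Random allocation is easy to do by computer. The sketch below shuffles a list of subjects and deals them out into groups. The subject IDs and the function name `randomly_allocate` are hypothetical, introduced only for this illustration.

```python
import random


def randomly_allocate(subjects, n_groups=2, seed=None):
    """Shuffle the subjects, then deal them out into groups of (near) equal size."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy, so the caller's list is left unchanged
    rng.shuffle(shuffled)
    return [shuffled[i::n_groups] for i in range(n_groups)]


# Twelve hypothetical subject IDs: S01 ... S12.
subjects = [f"S{i:02d}" for i in range(1, 13)]

control, treatment = randomly_allocate(subjects, seed=7)
print("Control:  ", control)
print("Treatment:", treatment)
```

Fixing the seed makes the allocation reproducible for record keeping. Leaving `seed=None` gives a fresh allocation on every run.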

Page 22: BME STATS WORKSHOP

An example of data analysis

Page 23: BME STATS WORKSHOP

Comparing Reaction time Following Alcohol Consumption.

• University males were recruited to participate in an experiment in which they consumed a specific amount of alcohol.

• The males were randomly separated into two groups. One group consumed the alcohol and the other some non-alcoholic drink.

• Ten minutes after the second drink was consumed the subjects were asked to push a button on a box the moment they heard a buzzer.

• When the button was pushed the buzzer stopped. The investigator recorded the amount of time the buzzer sounded in milliseconds.

Page 24: BME STATS WORKSHOP

Hypotheses

We state hypotheses in terms of populations. This is to say that we are making statements on what we think exists in the real world. From our sample we will reject or fail to reject the null hypothesis. Here we have a situation in which we are predicting differences only. This is a non-directional hypothesis.

H0: μc = μa

H1: μc ≠ μa

Page 25: BME STATS WORKSHOP

The data (Time in ms)

Control   Alcohol group
150       200
110       250
200       220
135       225
90        250
111       234

Page 26: BME STATS WORKSHOP

Results from an output provided by SPSS

Probability of a Type1 error is provided inside the red box added by myself (not SPSS). Commonly, investigators call this the significance level. It should be noted that statisticians would not label that value as such.

Page 27: BME STATS WORKSHOP

Critical Values

• A critical value is that value using a theoretical distribution that marks the point beyond which less than a specific percent of values can be found.
– We typically use 5%.

• In our example we have 12 scores from 12 individuals, thus 10 degrees of freedom.
– From the distribution of all ts we can determine how large a calculated t from our experiment must be for us to reject the null hypothesis of equal means.
– That value (see table previously shown) is 2.228.
– Our obtained t (-5.465) is larger in magnitude than the critical value. We reject the null hypothesis in favour of the alternate.
– You will notice that the t value is negative for our experiment. What is important is the magnitude, not the direction. If we were to reverse the groups in our calculations the value would have been positive.
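For readers who want to check the numbers by hand, here is a minimal sketch that recomputes t from the reaction-time data given earlier and compares it with the critical value of 2.228 quoted on this slide. It uses the mean difference divided by the standard error of the difference. The variable names are chosen for the example only.

```python
import statistics

control = [150, 110, 200, 135, 90, 111]   # time in ms
alcohol = [200, 250, 220, 225, 250, 234]  # time in ms

mean_diff = statistics.mean(control) - statistics.mean(alcohol)

# Standard error of the difference between the two sample means.
se_diff = (statistics.variance(control) / len(control)
           + statistics.variance(alcohol) / len(alcohol)) ** 0.5

t = mean_diff / se_diff
df = len(control) + len(alcohol) - 2

critical = 2.228  # two-tailed 5% critical value for df = 10, as quoted on the slide
reject_null = abs(t) > critical  # only the magnitude matters, not the sign

print(f"t = {t:.3f}, df = {df}, reject H0: {reject_null}")
```

Swapping the order of the two groups in `mean_diff` flips the sign of t but leaves its magnitude, and therefore the decision, unchanged.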

Page 28: BME STATS WORKSHOP

Interpretation of the results

• Alcohol increases the amount of time needed to turn off the buzzer suggesting that the subjects are impaired in their reactions.

• We are able to make this statement because the t value obtained here would be rare if the samples came from the same population. Due to this situation, we give ourselves permission to reject the null hypothesis of equal means in the population.

Page 29: BME STATS WORKSHOP

Some Important Concepts

Page 30: BME STATS WORKSHOP

The standard deviation

• The concept of variance and standard deviation (SD) is everything in statistics.

• It is used to determine if individuals or samples are inside or outside of normal.

• Anyone that is more than 1.96 SD away from the population mean of some measure is said to not belong to that population. However, this is only true when we have population parameters (more on this later).

Page 31: BME STATS WORKSHOP

Variance: $S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$

Standard Deviation (SD): $SD = \sqrt{S^2}$

Standard error of the mean (SEM): $SEM = \frac{SD}{\sqrt{n}}$

A few formulas to help us along.
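The three quantities named on this slide can be written out directly as a small sketch. The scores below are made up for the illustration, and the function names are assumptions rather than anything from the workshop.

```python
import math


def variance(xs):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def sd(xs):
    """Standard deviation: square root of the variance."""
    return math.sqrt(variance(xs))


def sem(xs):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd(xs) / math.sqrt(len(xs))


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements

print("Variance:", round(variance(scores), 3))
print("SD:      ", round(sd(scores), 3))
print("SEM:     ", round(sem(scores), 3))
```

Note that the SEM shrinks as n grows even when the SD stays the same, which is why larger samples give more precise estimates of the mean.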

Page 32: BME STATS WORKSHOP

Variability is Important

•The greater the variability the greater the noise. Note here that with greater variability in the data, more overlap of the sample distributions is observed.

•This will result in smaller signal to noise ratios. Thus, when we have more variability we will need larger sample sizes to detect mean differences (more on this later).

Keep this in mind when reviewing the upcoming slides.

Page 33: BME STATS WORKSHOP

T-Test

• Two Sample t-test

• Comparing two sample means.

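$$t = \frac{\bar{X}_1 - \bar{X}_2}{S_{\bar{X}_1 - \bar{X}_2}}, \qquad S_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$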

It is evident from the formula that the smaller the variability, the larger the t value.

Page 34: BME STATS WORKSHOP

Hypothesis Testing revisited.

• We always determine whether or not a statistic is rare given the null hypothesis, never given the alternate hypothesis. You might remember this from the Monte Carlo studies.

• Thus we have to deal with the concepts of the Type 1 and the Type 2 error.

Page 35: BME STATS WORKSHOP

Type 1 error

• The probability of being wrong when stating that samples are from different populations, that is, of rejecting the null hypothesis when it is actually true.

• This is the p<.05 that we use to reject the null hypothesis of equal means in the population.
– If we have a p of .02, it means that a result at least as extreme as ours would occur only 2% of the time if the two samples came from the same population.
– The .05 is a cutoff that is said to be acceptable.
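A small simulation can make the .05 cutoff concrete. In the sketch below every "experiment" draws both groups from the same population, so any significant result is a Type 1 error by construction. The population values (mean 100, SD 15), the group size of ten and the number of repetitions are assumptions for the example. The critical value 2.101 is the standard two-tailed 5% table value for 18 degrees of freedom.

```python
import random
import statistics


def t_statistic(a, b):
    """Two-sample t: mean difference over the standard error of the difference."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se


rng = random.Random(3)
critical = 2.101        # two-tailed 5% critical t for df = 18 (standard t table)
n_experiments = 4000
false_alarms = 0

for _ in range(n_experiments):
    # Both groups come from the SAME population, so the null hypothesis is true.
    a = [rng.gauss(100, 15) for _ in range(10)]
    b = [rng.gauss(100, 15) for _ in range(10)]
    if abs(t_statistic(a, b)) > critical:
        false_alarms += 1  # we would wrongly declare a difference

rate = false_alarms / n_experiments
print(f"Type 1 error rate: {rate:.3f}")
```

The observed rate should hover around .05: roughly one experiment in twenty produces a "significant" t even though no treatment effect exists.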

Page 36: BME STATS WORKSHOP

Type 2 error.

• The probability of failing to reject the null hypothesis when the null is not true.

• In truth, the samples are most likely from different populations. Often, we simply don’t have enough power or the tools are not sensitive enough to detect these differences.
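The link between sample size and the Type 2 error can also be illustrated by simulation. In this sketch the two populations really do differ (by one standard deviation, an assumed effect size), and we count how often the t-test fails to detect it. The function name `type2_rate`, the population values and the group sizes are hypothetical. The critical values 2.101 (df = 18) and 2.002 (df = 58) are standard two-tailed 5% table values.

```python
import random
import statistics


def t_statistic(a, b):
    """Two-sample t: mean difference over the standard error of the difference."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se


def type2_rate(n_per_group, critical, true_diff=15, sd=15,
               n_experiments=2000, seed=5):
    """Fraction of experiments that fail to reject H0 although the means differ."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(n_experiments):
        control = [rng.gauss(100, sd) for _ in range(n_per_group)]
        treated = [rng.gauss(100 + true_diff, sd) for _ in range(n_per_group)]
        if abs(t_statistic(control, treated)) <= critical:
            misses += 1  # a real difference exists, but we did not detect it
    return misses / n_experiments


small = type2_rate(10, critical=2.101)  # 10 per group, df = 18
large = type2_rate(30, critical=2.002)  # 30 per group, df = 58

print(f"Type 2 error rate, n = 10 per group: {small:.3f}")
print(f"Type 2 error rate, n = 30 per group: {large:.3f}")
```

With the same true difference, the larger samples miss the effect far less often. Power is simply one minus the Type 2 error rate.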

Page 37: BME STATS WORKSHOP

Assumptions of a Distribution

What are they and why are they important?

Page 38: BME STATS WORKSHOP

Assumptions are rules

• They are the rules by which distributions are constructed.

• These rules must be followed in order for a statistic obtained from an experiment to be compared to the theoretical distribution.

• If your experiment breaks these rules, it is possible that you will be either too conservative or too liberal when making a statement about the reality of the population.

Page 39: BME STATS WORKSHOP

Assumptions

1. Samples come from a normally distributed population

2. Both samples have equal variances (homogeneity of variance)

3. Samples are made up of randomly selected individuals

4. Both samples are of equal size.

Page 40: BME STATS WORKSHOP

What to do when we violate assumptions

• 1. We can transform the data so that the sample can have the characteristics desired.

• 2. We can use distribution free statistics.
– These statistics are insensitive to violations of assumptions.
• However, they do have limitations (more in later sessions).

Page 41: BME STATS WORKSHOP

Part 2 of workshop

Page 42: BME STATS WORKSHOP

Starting out with PASW (formerly SPSS but now SPSS again)

An introduction

Page 43: BME STATS WORKSHOP

What is SPSS

• It stands for “Statistical Package for the Social Sciences”.

• It started life as a text driven program (SPSSx), migrated to the PC as line code and finally made it to the Windows environment. This is the version we enjoy today.

Page 44: BME STATS WORKSHOP

Do you need the latest version?

• No.

• With each new version there are graphical changes and on occasion additional statistical tools.
– However, the basics do not change. An analysis of variance conducted with version 10 will produce the same results as those with version 19 (the latest at the time of this workshop).

Page 45: BME STATS WORKSHOP

Latest version cont’d

• One problem is with the output of different versions.
– Older versions of SPSS cannot read the output of newer versions. Thus, the outputs are not backward compatible.
– One way to get around this issue is to use the export function in the newer versions to save the outputs as PDF, DOC, or PPT so that the results can be read.

Page 46: BME STATS WORKSHOP

Getting started

• If you’ve used Excel in the past, then you have a base from which to work.

• SPSS uses a worksheet that is similar but not identical to Excel.
– However, the similarities end there.

Page 47: BME STATS WORKSHOP

Learning Curve

• If you use SPSS on a regular basis, you should be somewhat proficient in a week or two.
– Developing an expertise will take you somewhat longer depending on your interest and statistics knowledge.
– Let’s get started!

Page 48: BME STATS WORKSHOP

This is what you see when you start the program. In front of you is the worksheet in the “data view”.

You enter all your data in the worksheet.

Page 49: BME STATS WORKSHOP

You also have the option of “variable view” by clicking on the tab below or clicking on the column heading “var”.

Page 50: BME STATS WORKSHOP

The variable view is where you write down the name of your variable (variable name). Also in this view you have the option of providing variable labels and other descriptors that can help you recognize your data.

Name your variable.

Page 51: BME STATS WORKSHOP

Let’s start with a short review on variables.

• Independent variable (IV): That variable which is manipulated.

• Dependent variable (DV): That variable whose measures depend on some manipulation.

• Any experiment can have more than one IV or DV.

• These variables have to be set up correctly in a worksheet in order to properly analyze data.

Page 52: BME STATS WORKSHOP

Let’s say that the study is designed to determine if a certain drug facilitates weight loss

• We will need an independent variable, say Drug Type.
– We could have two groups based on drug treatment.
• Drug 1
• Drug 2

• We will also need a dependent variable, say weight.
– In the worksheet we will indicate the weight for each individual after being on the drug for a period of time.

Page 53: BME STATS WORKSHOP

Entering data. We simply click on an empty box and begin typing as appropriate. Shown here are the designations for group membership for the IV in our fictitious experiment with two groups.

Page 54: BME STATS WORKSHOP

Back to the variable view where we change the variable name and add a label which will help us remember what that variable means for future reference. Also, the variable label is the text that will be printed on the output following an analysis.

Page 55: BME STATS WORKSHOP

Clicking on the empty square under values allows for the user to specify group names.

The number value is assigned a label by the user.

Page 56: BME STATS WORKSHOP

On returning to the worksheet, the group labels and the variable name specified by you replace the default labels.

Page 57: BME STATS WORKSHOP

We will now add the dependent variable with data

IV

DV

Page 58: BME STATS WORKSHOP

Some Descriptive Statistics

• PASW easily allows us to produce descriptive statistics.
– Mean
– Standard deviation
– Standard error
– Median
– Etc.

Page 59: BME STATS WORKSHOP

You conduct all analyses from the Analyze option. Here we are asking for PASW to show descriptive statistics using the Means sub-option.

Page 60: BME STATS WORKSHOP

Many options for descriptive statistics are available

Page 61: BME STATS WORKSHOP

Relevant output table is shown here. Note that the statistics requested in the earlier slide are displayed in this table.

Report: This is the dependent variable for weight

This is the independent variable   Mean      N    Std. Deviation   Std. Error of Mean
Drug 1                             48.3000   10   7.95892          2.51683
Drug 2                             80.8000   10   9.17484          2.90134
Total                              64.5500   20   18.65046         4.17037

Page 62: BME STATS WORKSHOP

Graphs: Can be constructed from a number of options

You may wish to use the chart builder option but users who are familiar with older versions of this program sometimes find it difficult to change. I like the legacy option which retains the old method.

In the next slide we will see a graph using the error bar option.

Page 63: BME STATS WORKSHOP

Here we have the 95% intervals but typically you would want the error bars to represent one standard error.

Page 64: BME STATS WORKSHOP

Finally an analysis

• We will conduct a two sample independent t-test.

Page 65: BME STATS WORKSHOP

Here we specify the tests of means in the compare means option

Page 66: BME STATS WORKSHOP

You must indicate which groups will be compared. You must use the number assigned to the groups.

Page 67: BME STATS WORKSHOP

Levene’s test determines if the variance in one group is different from the other. This is an important assumption. Here it is not significant (Sig. = .300), so equal variances can be assumed.

The t-test results are significant.

Sig. (2-tailed) is the Type 1 error.

Independent Samples Test: This is the dependent variable for weight

Levene's Test for Equality of Variances (equal variances assumed): F = 1.138, Sig. = .300

t-test for Equality of Means
                                                                                        95% Confidence Interval of the Difference
                              t        df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   Lower       Upper
Equal variances assumed       -8.462   18       .000              -32.50000         3.84086                 -40.56935   -24.43065
Equal variances not assumed   -8.462   17.648   .000              -32.50000         3.84086                 -40.58090   -24.41910
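The SPSS figures above can be reproduced from the descriptive statistics reported earlier (means, standard deviations and group sizes for Drug 1 and Drug 2). The sketch below is only a hand check, not a description of how SPSS computes its output. The critical value 2.101 is the standard two-tailed 5% table value for 18 degrees of freedom, so the confidence limits agree with SPSS only to about three decimals.

```python
import math

# Summary statistics taken from the descriptive statistics report.
n1, mean1, sd1 = 10, 48.3, 7.95892   # Drug 1
n2, mean2, sd2 = 10, 80.8, 9.17484   # Drug 2

mean_diff = mean1 - mean2

# Standard error of the difference between the two means.
se_diff = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

t = mean_diff / se_diff
df = n1 + n2 - 2

critical = 2.101  # two-tailed 5% critical t for df = 18 (standard t table)
ci_lower = mean_diff - critical * se_diff
ci_upper = mean_diff + critical * se_diff

print(f"Mean difference = {mean_diff:.2f}, SE = {se_diff:.5f}")
print(f"t = {t:.3f}, df = {df}")
print(f"95% CI: {ci_lower:.3f} to {ci_upper:.3f}")
```

Because the confidence interval does not include zero, it leads to the same conclusion as the significance value: the two drug groups differ.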

Page 68: BME STATS WORKSHOP

Let’s add a third group

• The same method as building the database in the first place applies to adding a group.

• With the addition of a third group we will need to perform an analysis of variance (ANOVA) with post hoc tests.

Page 69: BME STATS WORKSHOP

Significant results.

ANOVA: This is the dependent variable for weight

                 Sum of Squares   df   Mean Square   F        Sig.
Between Groups   14581.400        2    7290.700      78.973   .000
Within Groups    2492.600         27   92.319
Total            17074.000        29

Multiple Comparisons (Tukey HSD): This is the dependent variable for weight

                                                                95% Confidence Interval
(I) Group   (J) Group   Mean Difference (I-J)   Std. Error   Sig.   Lower Bound   Upper Bound
Drug 1      Drug 2      -32.50000*              4.29694      .000   -43.1539      -21.8461
Drug 1      Drug 3      -53.60000*              4.29694      .000   -64.2539      -42.9461
Drug 2      Drug 1      32.50000*               4.29694      .000   21.8461       43.1539
Drug 2      Drug 3      -21.10000*              4.29694      .000   -31.7539      -10.4461
Drug 3      Drug 1      53.60000*               4.29694      .000   42.9461       64.2539
Drug 3      Drug 2      21.10000*               4.29694      .000   10.4461       31.7539

*. The mean difference is significant at the 0.05 level.

Page 70: BME STATS WORKSHOP

Interpretation

• The ANOVA indicates that there are differences between the groups.

• This result allowed for conducting a post hoc Tukey test.
– All groups are considered different from one another.
– This is shown by the observation that all comparisons are significant.
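The F ratio and the standard error in the ANOVA output can be recovered from the sums of squares with a few lines of arithmetic. The sketch below assumes ten subjects in each of the three groups, which is inferred from the 30 cases reported rather than stated on the slide.

```python
import math

# Values taken from the ANOVA table.
ss_between, df_between = 14581.400, 2
ss_within, df_within = 2492.600, 27
n_per_group = 10  # assumption: 30 cases split evenly across 3 groups

# Mean square = sum of squares divided by its degrees of freedom.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# F is the ratio of between-group to within-group variability (signal to noise).
f_ratio = ms_between / ms_within

# Standard error of the difference between any two group means.
se_pair = math.sqrt(ms_within * (1 / n_per_group + 1 / n_per_group))

print(f"MS between = {ms_between:.3f}, MS within = {ms_within:.3f}")
print(f"F = {f_ratio:.3f}")
print(f"SE of a pairwise difference = {se_pair:.5f}")
```

The same signal-to-noise logic used for the t statistic applies here: a large F means the differences between groups are large relative to the random variation within groups.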

Page 71: BME STATS WORKSHOP

A graph of the results obtained from the Univariate sub-option is shown here.

Page 72: BME STATS WORKSHOP

Adding a second IV will allow us to conduct an interaction analysis using the Univariate sub-option.

We observe a significant main effect for IV1 but not IV2. Also, there is no significant interaction between IV1 and IV2 on the dependent variable. See graph.

Tests of Between-Subjects Effects

Dependent Variable: This is the dependent variable for weight

Source            Type III Sum of Squares   df   Mean Square   F          Sig.
Corrected Model   14581.400a                2    7290.700      78.973     .000
Intercept         177870.000                1    177870.000    1926.699   .000
IndpendentVar     14581.400                 2    7290.700      78.973     .000
Error             2492.600                  27   92.319
Total             194944.000                30
Corrected Total   17074.000                 29

a. R Squared = .854 (Adjusted R Squared = .843)

Page 73: BME STATS WORKSHOP
Page 74: BME STATS WORKSHOP

After all this you might want to explore the interaction

• You would run simple main effect analysis which can be done through a syntax window.

• You write a program

• This was the norm when PASW was SPSSx. SPSS was text driven.

Page 75: BME STATS WORKSHOP

Syntax

This program allows us to determine whether one IV produces differences on the dependent variable at each level (group) of another IV.

Page 76: BME STATS WORKSHOP

Results of the Simple Main Effects Analysis

• Next slide.

Page 77: BME STATS WORKSHOP

The default error term in MANOVA has been changed from WITHIN CELLS to WITHIN+RESIDUAL. Note that these are the same for all full factorial designs.

* * * * * * Analysis of Variance * * * * * *

30 cases accepted.
0 cases rejected because of out-of-range factor values.
0 cases rejected because of missing data.
6 non-empty cells.
1 design will be processed.

* * * * * * Analysis of Variance -- Design 1 * * * * * *

Tests of Significance for DependentVar using UNIQUE sums of squares

Source of Variation                          SS         DF   MS        F       Sig of F
WITHIN+RESIDUAL                              2469.20    24   102.88
INDEPENDENTVAR2 WITHIN INDEPENDENTVAR(1)     16.90      1    16.90     .16     .689
INDEPENDENTVAR2 WITHIN INDEPENDENTVAR(2)     6.40       1    6.40      .06     .805
INDEPENDENTVAR2 WITHIN INDEPENDENTVAR(3)     .10        1    .10       .00     .975
INDEPENDENTVAR                               14581.40   2    7290.70   70.86   .000
(Model)                                      14604.80   5    2920.96   28.39   .000
(Total)                                      17074.00   29   588.76

R-Squared = .855
Adjusted R-Squared = .825

Page 78: BME STATS WORKSHOP

Here is how you would set up a database for a repeated measures design

1. Arrange groups in columns so that group one has data in column 1, group 2 in column 2 and so on.

2. Specify the IV in PASW.

3. Define the groups by specifying which column belongs to which group.

4. Click on OK.

Page 79: BME STATS WORKSHOP

Group data are in columns

Use repeated measures option

Page 80: BME STATS WORKSHOP

Give the variable a name and indicate the number of groups (3 in this case)

Click on add to get this popup.

Page 81: BME STATS WORKSHOP

Results are significant.

We can say that there are mean differences between the groups but we cannot say which pairs of groups differ.

Tests of Within-Subjects Effects

Measure: MEASURE_1

Source                             Type III Sum of Squares   df      Mean Square   F         Sig.
Drug          Sphericity Assumed   5806.333                  2       2903.167      102.344   .000
              Greenhouse-Geisser   5806.333                  1.276   4549.675      102.344   .000
              Huynh-Feldt          5806.333                  1.519   3821.923      102.344   .000
              Lower-bound          5806.333                  1.000   5806.333      102.344   .000
Error(Drug)   Sphericity Assumed   283.667                   10      28.367
              Greenhouse-Geisser   283.667                   6.381   44.455
              Huynh-Feldt          283.667                   7.596   37.344
              Lower-bound          283.667                   5.000   56.733

Source                             Partial Eta Squared   Noncent. Parameter   Observed Power(a)
Drug          Sphericity Assumed   .953                  204.689              1.000
              Greenhouse-Geisser   .953                  130.613              1.000
              Huynh-Feldt          .953                  155.483              1.000
              Lower-bound          .953                  102.344              1.000

a. Computed using alpha = .05

Always interpret using the Greenhouse-Geisser correction.
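To see what the Greenhouse-Geisser rows change, the sketch below recomputes the degrees of freedom and F from the table values. The correction multiplies both the effect and the error degrees of freedom by the same factor (epsilon), so the F ratio itself is unchanged and only the degrees of freedom used to judge significance become smaller. Epsilon is recovered here from the rounded df in the table, so the results match the output only approximately.

```python
# Values taken from the "Sphericity Assumed" rows of the table.
ss_effect, df_effect = 5806.333, 2
ss_error, df_error = 283.667, 10

# Epsilon recovered from the table: corrected effect df / uncorrected effect df.
epsilon = 1.276 / 2

# The correction scales both degrees of freedom by the same factor.
df_effect_gg = epsilon * df_effect
df_error_gg = epsilon * df_error

f_uncorrected = (ss_effect / df_effect) / (ss_error / df_error)
f_corrected = (ss_effect / df_effect_gg) / (ss_error / df_error_gg)

print(f"epsilon = {epsilon:.3f}")
print(f"corrected df = {df_effect_gg:.3f}, {df_error_gg:.3f}")
print(f"F uncorrected = {f_uncorrected:.3f}, F corrected = {f_corrected:.3f}")
```

With fewer degrees of freedom the critical F is larger, which makes the corrected test more conservative when the sphericity assumption is in doubt.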
