Biostatistics for biomedical profession BIMM34 Karin Källen & Linda Hartman November-December 2015 2015-11-02 1
Page 1: Computers(and softwares) are - Lunds universitet · Measuresof dispersion • Symmetric distribution –measure based on mean • Assymetric distribution or ordinal data –measure

Biostatistics for biomedical profession, BIMM34

Karin Källen & Linda Hartman

November-December 2015

2015-11-02

Page 2

Who needs a course in biostatistics?

• Anyone who uses quantitative methods to interpret biological processes.

But really… is it necessary with today's advanced computers and statistical packages?

Page 3

Now more than ever!

Computers (and software) are as dumb as stones!

They just do what we tell them to.

Page 4

• An imaginary 2 x 2 table:

• Correct hypothesis, correct basic design

• Poorly specified hypothesis, poor design

• Adequate methods

• Sub-optimal methods

Page 5

• An imaginary 2 x 2 table:

• Correct hypothesis, correct basic design

• Poorly specified hypothesis, poor design

• Adequate methods

• Sub-optimal methods

• The drawback of studies with these characteristics will be detected.

Page 6

• An imaginary 2 x 2 table:

• Correct hypothesis, correct basic design

• Poorly specified hypothesis, poor design

• Adequate methods

• Sub-optimal methods

• In a good design, the use of non-optimal statistical methods could bias the results, but the effects are seldom strong.

• The drawback of studies with these characteristics will be detected.

Page 7

• An imaginary 2 x 2 table:

• Correct hypothesis, correct basic design

• Poorly specified hypothesis, poor design

• Adequate methods

• Over-belief in sophisticated statistical methods is depressingly common. It can be difficult to detect.

• Sub-optimal methods

• In a good design, the use of non-optimal statistical methods could bias the results, but the effects are seldom strong.

• The drawback of studies with these characteristics will be detected.

Page 8

The objectives of the current course in biostatistics

• An imaginary 2 x 2 table:

• Correct hypothesis, correct basic design

• Poorly specified hypothesis, poor design

• Adequate methods

• Utilization of brain capacity when designing, knowledge of basic statistical methods, correct interpretation of the results.

• Over-belief in sophisticated statistical methods is depressingly common. It can be difficult to detect.

• Sub-optimal methods

• In a good design, the use of non-optimal statistical methods could bias the results, but the effects are seldom strong.

• The drawback of studies with these characteristics will be detected.

Page 9

Statistics

[Diagram relating Population and Sample, with the labels: Probability, Inferential statistics, Descriptive statistics]

Page 10

Statistics

• Descriptive statistics

Methods to summarize (the variables in) a sample

• Summary measures

• Graphical methods

• Inferential statistics

Methods to learn about the population that the sample is drawn from

• Effect measures (with confidence intervals)

• Tests (t-test, chi-squared test, Mann-Whitney, …)

• Regression modeling

Today: Basic

• numerical summaries of data

• graphical summaries of data

Page 11

Types of data

Categorical:

• Binary/dichotomous: 2 categories

• Nominal: >=2 categories

• Ordinal: order matters

Quantitative:

• Discrete: only whole numbers as values

• Continuous: data that can take any value

Page 12

Types of data - exercise

• Categorize the following measurements as binary/nominal/ordinal/discrete/continuous

1. Blood serum bilirubin (μg/ml)

2. Hair colour (Blonde, Brunette, Redhead and Grey)

3. Vital status (Dead/alive)

4. BMI (kg/m²)

5. Number of bacteria in a sample

6. Smoking status (Non-smoker / 0-10 cigarettes per day / >10 cigarettes per day)

Page 13

Types of data

Categorical: Binary, Nominal, Ordinal

Quantitative: Discrete, Continuous

Discrete variables with only a few possible values are often analysed with the same methods as for ordinal variables.

Discrete variables with many possible values are often analysed with the same methods as for continuous variables.

Page 14

Summary measures & Graphical presentation

Chapter 2 & 3 in Norman & Streiner

Chapter 3 & 4 in Kirkwood and Sterne

Page 15

Graphical presentation 1: HISTOGRAM

Split the data into intervals and count the number (proportion) of observations in each interval:

• The width of the bar tells you the interval

• The height of the bar tells you the number (proportion) of observations in each interval
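The binning step described above can be sketched in a few lines of Python. This is only an illustration: the function name `histogram_counts`, the starting point, the interval width and the data values are all assumptions made for the example, not something taken from the lecture.

```python
from collections import Counter

def histogram_counts(values, start, width):
    """Count observations per interval [start + k*width, start + (k+1)*width)."""
    # Index of the interval each value falls into (0 = first interval).
    bins = Counter(int((v - start) // width) for v in values)
    return {
        (start + k * width, start + (k + 1) * width): count
        for k, count in sorted(bins.items())
    }

# Hypothetical measurements, invented for illustration only.
values = [1.2, 1.9, 2.1, 2.4, 2.6, 3.3, 3.8, 4.5]
counts = histogram_counts(values, start=1, width=1)

# Text "histogram": one '#' per observation, plus the proportion per interval.
for (low, high), count in counts.items():
    proportion = count / len(values)
    print(f"[{low}, {high}): {'#' * count}  n={count}, proportion={proportion:.3f}")
```

The interval limits give the width of each bar, and the count (or proportion) gives its height.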

Page 16

Summary measures:

• Central tendency measures

Describe a “center” around which the measurements in the data are distributed

• Dispersion (or variability) measures

Describe “data spread”, or how far away the measurements are from the center.

Page 17

Central tendency measures

• Median

- The middle observation when the data are sorted

• Mean

- The sum of the observations divided by the number of observations

$$\bar{X} = \frac{X_1 + X_2 + \dots + X_N}{N} = \frac{1}{N}\sum_{i=1}^{N} X_i$$

• Mode

- The most frequently occurring value

Page 18

Central tendency - exercise

Exercise 3.2 a-c

• Calculate the mean, median and mode of a dataset with the following values: 4, 8, 6, 3, 4
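One way to check a hand calculation for this exercise is with Python's standard `statistics` module. The sketch below is not part of the course material; it only uses the five values given in the exercise.

```python
import statistics

values = [4, 8, 6, 3, 4]  # the dataset from the exercise

mean = statistics.mean(values)      # sum of the observations / number of observations
median = statistics.median(values)  # middle value of the sorted data: 3 4 4 6 8
mode = statistics.mode(values)      # most frequently occurring value

print("mean:", mean)
print("median:", median)
print("mode:", mode)
```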

Page 19

Central tendency cont.

Maternal Vitamin D:

• Mean = 2.3

• Median = 2.2

Child Vitamin D:

• Mean = 1.4

• Median = 1.2

Page 20

Central tendency

Mean or median? The choice depends on the distribution of the data:

• Symmetric data

• Asymmetric data

• Ordinal data

[Figures: symmetric distribution; asymmetric distribution (positive skew)]

Page 21

Central tendency

Symmetric continuous data

Maternal height: Mean = 166 cm, Median = 166.5 cm

Symmetric data:

• Mean ≈ median

• Use the mean

Page 22

Central tendency

Asymmetric continuous data

Vitamin D in child: Mean = 1.4, Median = 1.2

Asymmetric data:

• Mean ≠ median

• Use the median

Page 23

Central tendency

Asymmetric continuous data

CD16 in % of granulocytes: Mean = 7.3, Median = 4.8

Page 24

Central tendency

Ordinal data

(Kasner 2006)

(Hacke et al. 2008)

Use the median!

Exercise:

• What is the median in the Alteplase group?

• What is the median in the Placebo group?

Page 25

Central tendency

Nominal data

Measures of central tendency are not meaningful.

Use the number of observations and proportions.

Bar chart
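For nominal data, the summary is a table of counts and proportions per category, which is also what a bar chart displays. A minimal sketch, assuming a small invented list of hair colours (the values are hypothetical and only serve the illustration):

```python
from collections import Counter

# Hypothetical nominal observations, invented for illustration only.
hair_colour = ["Blonde", "Brunette", "Brunette", "Redhead",
               "Grey", "Brunette", "Blonde", "Brunette"]

counts = Counter(hair_colour)
n = len(hair_colour)
proportions = {category: count / n for category, count in counts.items()}

# One line per category: the bar length corresponds to the count.
for category, count in counts.most_common():
    print(f"{category:<10} {'#' * count}  n={count}  ({proportions[category]:.1%})")
```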

Page 26

Central tendency measures: Summary

Type of data       Central tendency measure
Symmetric data     Mean
Asymmetric data    Median
Ordinal            Median
Nominal            -

Page 27

Measures of dispersion

• Symmetric distribution – measure based on the mean

• Asymmetric distribution or ordinal data – measure NOT based on the mean

A measure of dispersion refers to how closely the data cluster around the measure of central tendency.

Page 28

Spread/distribution

[Figures: small spread; big spread]

Page 29

Describing the spread of the data

• If we look at the average deviation from the mean:

$$\frac{\sum_{i=1}^{n} (x_i - \bar{x})}{n}$$

• The average deviation from the mean equals 0.

x_i     x_i - x̄
150     -9.375
152     -7.375
161      1.625
177     17.625
155     -4.375
160      0.625
162      2.625
158     -1.375
Sum      0

x̄ = 159.375
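The claim that the deviations from the mean always cancel out can be checked numerically. The sketch below uses the eight values from the table on this slide; the variable names are chosen for the example.

```python
# The eight observations from the slide (lengths in cm).
x = [150, 152, 161, 177, 155, 160, 162, 158]

mean = sum(x) / len(x)
deviations = [xi - mean for xi in x]

print("mean:", mean)
print("deviations:", deviations)
# Positive and negative deviations cancel, so their sum (and average) is 0.
print("sum of deviations:", sum(deviations))
```

This is why the plain average deviation is useless as a measure of spread, and why the deviations are squared in the next step.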

Page 30

Describing the spread of the data

If we square every term we solve the problem with 0, then divide by n to get the mean squared deviation:

$$\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$$

To get a better estimate we use n-1 in the denominator:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$

This is called the VARIANCE!

The variance is expressed in cm², which is impractical since the mean length is expressed in cm.

x_i     x_i - x̄     (x_i - x̄)²
150     -9.375       87.89
152     -7.375       54.39
161      1.625        2.64
177     17.625      310.64
155     -4.375       19.14
160      0.625        0.39
162      2.625        6.89
158     -1.375        1.89
Sum      0          483.87

Dividing by n: 483.87 / 8 = 60.48

Dividing by n-1: 483.87 / 7 = 69.12

Page 31

Describing the spread of the data

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$$

By taking the square root of the variance, you get the standard deviation (SD), which has the same units as what you measured.

Ex: Dividing the sum of squares 483.87 by n = 8 gives 60.5 cm², and sqrt(60.48) = 7.8 cm.

With n-1 = 7 in the denominator, as in the formula for s, the variance is s² = 69.1 cm² and s = sqrt(69.12) = 8.3 cm.
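The arithmetic behind this example can be reproduced step by step. The sketch below computes the sum of squared deviations for the eight lengths used on the previous slides and then shows both denominators, n and n-1, so that the two results can be compared; the variable names are chosen for the example.

```python
import math

# The eight observations used in the example (lengths in cm).
x = [150, 152, 161, 177, 155, 160, 162, 158]
n = len(x)

mean = sum(x) / n
sum_of_squares = sum((xi - mean) ** 2 for xi in x)

mean_squared_deviation = sum_of_squares / n   # denominator n
variance = sum_of_squares / (n - 1)           # denominator n-1 (sample variance)
sd = math.sqrt(variance)                      # back in cm, the unit of the data

print(f"sum of squares:        {sum_of_squares:.2f}")
print(f"divided by n   = {n}:   {mean_squared_deviation:.2f} cm^2")
print(f"divided by n-1 = {n - 1}:   {variance:.2f} cm^2")
print(f"standard deviation:    {sd:.2f} cm")
```

With only eight observations the choice of denominator matters noticeably; for large samples the two versions are nearly identical.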

Page 32

Percentile

Describes the percentage of the observations that lie below a given value, e.g.:

• 10% found below the 10th percentile

• 20% found below the 20th percentile, etc.

Quartile

• Divide the data into four equal groups:

• Lower quartile – 25th percentile

• Median – 50th percentile

• Upper quartile – 75th percentile

Position among the ordered observations: Q1 = (n+1)/4, Q2 = 2(n+1)/4 (median), Q3 = 3(n+1)/4

• Interquartile range (IQR) = the difference between the upper and the lower quartiles
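The position rule k(n+1)/4 can be sketched as a small function. Note one assumption: the slide does not say what to do when the position is not a whole number, so the sketch interpolates linearly between the two neighbouring observations, which is a common convention but not the only one. The function name `quartile` is hypothetical, and the data are the eight lengths used earlier in the lecture.

```python
def quartile(sorted_values, k):
    """k-th quartile (k = 1, 2, 3) at position k(n+1)/4 among the ordered values."""
    n = len(sorted_values)
    position = k * (n + 1) / 4        # 1-based position in the ordered data
    lower = int(position)             # observation just below the position
    fraction = position - lower       # how far towards the next observation
    if lower < 1:
        return sorted_values[0]
    if lower >= n:
        return sorted_values[-1]
    below = sorted_values[lower - 1]
    above = sorted_values[lower]
    # Assumption: linear interpolation between neighbouring observations.
    return below + fraction * (above - below)

data = sorted([150, 152, 161, 177, 155, 160, 162, 158])
q1, q2, q3 = (quartile(data, k) for k in (1, 2, 3))
iqr = q3 - q1

print("Q1:", q1, "median:", q2, "Q3:", q3, "IQR:", iqr)
```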

Page 33

Measures of dispersion

• Standard deviation – roughly the typical deviation from the mean value (the square root of the variance)

• Percentiles & quartiles – split the data in fixed proportions

• Range – the difference between min and max

Page 34

Measures of dispersion - exercise

Exercise 3.2 d-e

• Calculate the standard deviation and range for a dataset with the following values: 4, 8, 6, 3, 4

Page 35

Robustness

Highly skewed data (Fig 3-10)

Measure    WITHOUT largest observation    WITH largest observation
Mean       3.9                            6.1
Median     4                              4
Range      5                              42
SD         1.4                            9.3
QL; QU     3; 5                           3; 5

Sensitive to extreme observations: mean, range, SD

Robust to extreme observations: median, quartiles

Use median & quartiles for skewed data + graphical presentation!
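The same pattern can be reproduced with any small dataset by adding one extreme value. The data behind Fig 3-10 are not given in the slides, so the values below are invented for illustration and will not reproduce the numbers in the table.

```python
import statistics

# Hypothetical data, invented for illustration (not the data behind Fig 3-10).
base = [2, 3, 3, 4, 4, 4, 5, 5, 6]
with_extreme = base + [44]   # same data plus one extreme observation

for label, data in (("without", base), ("with", with_extreme)):
    print(
        f"{label:>7}: mean={statistics.mean(data):.1f}  "
        f"median={statistics.median(data)}  "
        f"range={max(data) - min(data)}  "
        f"SD={statistics.stdev(data):.1f}"
    )
```

The mean, range and SD change markedly when the extreme value is added, while the median stays where it was.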

Page 36

Summary: Summary measures

Type of data       Central tendency measure    Dispersion measure
Symmetric data     Mean                        Standard deviation
Asymmetric data    Median                      Percentiles (e.g. QL and QU)
Ordinal            Median                      Percentiles
Nominal            -                           -

Page 37

Graphical presentation 2: BOX-PLOT

[Figure: box-plots of CB-153 (ng/g lipid weight), y-axis 0 to 3000, by fish consumption group (low, medium, high); group sizes printed along the axis as N = 38, 33, 39]

Reading the box-plot:

• Outlier (O): observations more than 1.5 IQR outside the box

• Extreme values (*): observations more than 3 IQR outside the box

• Lowest ”normal” value

• Lower quartile QL

• Median

• Upper quartile QU

• Highest ”normal” value (inner fence)

• IQR = QU – QL = box length
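The rules for outliers and extreme values can be sketched in code. The sketch assumes the quartiles are computed with `statistics.quantiles` (Python 3.8+), whose default `'exclusive'` method uses positions k(n+1)/4; the function name `boxplot_summary` and the data values are invented for the example.

```python
from statistics import quantiles

def boxplot_summary(values):
    """Quartiles, whisker ends, outliers and extreme values for a box-plot."""
    data = sorted(values)
    q_low, median, q_up = quantiles(data, n=4)   # default method: 'exclusive'
    iqr = q_up - q_low                           # box length
    inner = (q_low - 1.5 * iqr, q_up + 1.5 * iqr)
    outer = (q_low - 3 * iqr, q_up + 3 * iqr)
    normal = [x for x in data if inner[0] <= x <= inner[1]]
    return {
        "QL": q_low, "median": median, "QU": q_up, "IQR": iqr,
        # Whiskers end at the lowest and highest "normal" values.
        "whiskers": (normal[0], normal[-1]),
        # More than 1.5 IQR but at most 3 IQR outside the box.
        "outliers": [x for x in data
                     if outer[0] <= x < inner[0] or inner[1] < x <= outer[1]],
        # More than 3 IQR outside the box.
        "extremes": [x for x in data if x < outer[0] or x > outer[1]],
    }

# Hypothetical data, invented for illustration only.
summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 25, 40])
for name, value in summary.items():
    print(f"{name}: {value}")
```

Plotting software may use slightly different quartile conventions, so the fences can differ a little between programs.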

Page 38

Box-plot - exercise

• How can you use the boxplot to judge if a distribution is symmetric or asymmetric?

Use the examples in your discussion.

Page 39

Box-plot: Exercise 2

Blood pressure was measured in 39 women:

BP = 138 140 141 142 142 142 142 142 143 143 144 144 144 144 145 147 147 147 147 149 149 150 150 151 152 154 154 157 157 157 158 159 161 162 162 166 167 167 170 mmHg

(Results are sorted)

• Create a boxplot of blood pressure

Page 40

2013-11-04

Page 41

Box-plot vs histogram, example continued

Blood pressure was measured in 39 women:

BP = 138 140 141 142 142 142 142 142 143 143 144 144 144 144 145 147 147 147 147 149 149 150 150 151 152 154 154 157 157 157 158 159 161 162 162 166 167 167 170 mmHg

[Figure: histogram of bp_before; x-axis 130 to 170, y-axis Frequency 0 to 15]

median: 149

Q1: 143

Q3: 157

min: 138

max: 170

IQR = Q3 - Q1 = 14
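The summary values for the blood pressure data can be verified directly from the 39 listed observations. The sketch assumes `statistics.quantiles` (Python 3.8+) with its default method, which places the quartiles at positions k(n+1)/4, here the 10th, 20th and 30th ordered values.

```python
from statistics import quantiles

# The 39 sorted blood pressure values from the exercise (mmHg).
bp = [138, 140, 141, 142, 142, 142, 142, 142, 143, 143, 144, 144, 144,
      144, 145, 147, 147, 147, 147, 149, 149, 150, 150, 151, 152, 154,
      154, 157, 157, 157, 158, 159, 161, 162, 162, 166, 167, 167, 170]

q1, median, q3 = quantiles(bp, n=4)
iqr = q3 - q1

print("n:", len(bp))
print("min:", min(bp), "max:", max(bp))
print("Q1:", q1, "median:", median, "Q3:", q3)
print("IQR:", iqr)
```

Since the upper inner fence is 157 + 1.5 × 14 = 178 and the lower one is 143 - 1.5 × 14 = 122, no observation counts as an outlier, and the whiskers run from the minimum to the maximum.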

Page 42

Box-plot cont.

What's wrong with Figure 3-7?

Page 43

Summary:

- Types of variables (binary/nominal/ordinal/discrete/continuous)
- Descriptive statistics
  - Central tendency measures (mean, median)
  - Dispersion measures (standard deviation, percentiles)
  - Graphical presentation
    - Barplot
    - Histogram
    - Boxplot

Wednesday lecture:

Subject                                    Norman & Streiner    Kirkwood and Sterne
Normal distribution                        4                    5
Population, samples, generalisability      6                    7
Reference interval, confidence interval    6                    4.5, 6, 7

