Date post: 08-Sep-2020

Biostatistics

Burkhardt Seifert & Alois Tschopp

Department of Biostatistics

Epidemiology, Biostatistics and Prevention Institute (EBPI)

University of Zurich

Master of Science in Medical Biology 1 / 31

Overview

1 Introduction

2 Univariate descriptive statistics

3 Probability theory

4 Hypothesis testing and confidence intervals

5 Correlation and linear regression

6 Logistic regression

7 Survival analysis

8 Analysis of variance

Introduction

Why does a medical biologist need statistics?

- for their own field of research
- for studying the literature
- for advising and supporting their working group in quantitative methods

Population and sample

- Data are based on one sample
- Data from different samples vary
- Conclusions are valid for a population

[Figure: a sample with mean x̄ is drawn from a population with mean µ; the sample mean is used to draw conclusions about the population mean.]

Population and sample (II)

Population

The population is the totality of all individuals about which conclusions should be drawn.

Sample

A sample of a population is the set of individuals that are actually observed.

Example:

Population = all human beings (all Swiss citizens)

Sample = students of Medical Biology attending this lecture

Recommended literature

Held, L., Rufibach, K. and Seifert, B. (2013). Medizinische Statistik. Konzepte, Methoden, Anwendungen. Pearson Studium.
- covers simple to most recent advanced statistics, 448 pages.

Kirkwood, B. R. and Sterne, J. A. C. (2006). Essential Medical Statistics. Blackwell, 4th edition.
- extensive textbook, 502 pages.

Hüsler, J. and Zimmermann, H. (2006). Statistische Prinzipien für medizinische Projekte. Hans Huber, Bern.
- clearly presented textbook, 355 pages.

Armitage, P., Berry, G., and Matthews, J. N. S. (2002). Statistical methods in medical research. Blackwell, 4th edition.
- comprehensive textbook, 817 pages.

Johnson, R. A. and Bhattacharyya, G. K. (2001). Statistics. Principles and methods. Wiley, 4th edition.
- light reading textbook, 236 pages.

Bland, M. (1995). An introduction to medical statistics. Oxford Medical Publications.
- very good introduction with many examples and exercises, 396 pages.

Univariate descriptive statistics

Approach “descriptive”, without “significance”

Main types of data (scale types)

Description of data

- via tables
- via graphics
- via location and dispersion statistics

Data in a table

In 2006, 245 students (16 groups) in the 2nd semester of medicine reported their body height and measured their hand length.

sex  height  hand  group  tutor  gender
1    168.0   17.5  1      1      f
0    183.5   21.0  1      1      m
1    170.0   20.0  1      1      f
1    159.0   17.0  1      1      f
1    165.0   18.0  1      1      f
0    180.0   20.0  1      1      m
1    181.0   19.5  1      1      f
0    193.0   21.5  1      1      m
0    183.0   19.5  1      1      m
0    183.0   20.5  1      1      m
...  ...     ...   ...    ...    ...

Main types of data

1) nominal, categorical data

Assignment to categories
→ Counts and % meaningful
Examples: Gender, blood type

Levels   Frequency  %      Cum. %
sex  m   106        43.3   43.3
     f   139        56.7   100.0
Total    245        100.0

1-2) ordinal data (ordered categorical)

have a ranking
Example: Severity of a disease

Describing data in tables and graphics

Discrete data

$\text{relative frequency} = \dfrac{\text{number of times an event occurred}}{\text{total number of events}}$

Example: Proportion of blood types in a healthy population

Table

Blood type  Frequency  Rel. frequency
0           2313       38 %
A           2678       44 %
B           731        12 %
AB          365        6 %
Total       6087       100 %
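The relative frequencies in the table can be recomputed from the counts. The following is a minimal Python sketch (the slides themselves use R; the variable names are illustrative assumptions):

```python
# Counts taken from the blood type table above.
counts = {"0": 2313, "A": 2678, "B": 731, "AB": 365}

total = sum(counts.values())

# Relative frequency = count in a category / total count.
rel_freq = {blood_type: n / total for blood_type, n in counts.items()}

for blood_type, share in rel_freq.items():
    print(f"{blood_type:>2}  {counts[blood_type]:5d}  {share:.0%}")
print(f"Total {total}")
```

Rounded to whole percent, the values agree with the last column of the table.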

Graphics are:

- easy to comprehend
- easy to create nowadays

Graphics

[Figure: pie chart of blood type: 0 38%, A 44%, B 12%, AB 6%.]

[Figure: Pareto or bar chart of blood type (0, A, B, AB); y-axis: counts, 0 to 2500. Annotation: "Origin!"]

Bar chart

[Figure: bar chart for f and m by tutor (1, 2, 3); y-axis: percent, 0 to 20.]

Bar chart

[Figure: bar chart for f and m by tutor (1, 2, 3); y-axis: percent, 0 to 20.]

Don’t trust a graphic that is taller than it is wide.

Bar chart

[Figure: bar chart for f and m by tutor (1, 2, 3); y-axis: percent, 10 to 20.]

Pay attention to the origin.

Main types of data

2) continuous (numeric) data

Differences and means meaningful
Example: Temperature in °C

If an absolute zero point exists
→ Ratios meaningful
Examples: Temperature in K, body height, length of hand

Not meaningful: “There were times when the temperature was 60% higher than nowadays” (BBC 2006)

Scale  Now    Then
°C     14 °C  22 °C
°F     57 °F  91 °F = 33 °C
K      287 K  459 K = 186 °C
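The table can be reproduced by applying "60% higher" on each scale and converting the result back to °C. This is a small Python sketch; the helper names are assumptions made for illustration:

```python
def c_to_f(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


def f_to_c(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9


def c_to_k(c):
    """Convert degrees Celsius to kelvin."""
    return c + 273.15


def k_to_c(k):
    """Convert kelvin to degrees Celsius."""
    return k - 273.15


now_c = 14.0   # "now" temperature from the table, in °C
factor = 1.6   # "60% higher"

# Apply the factor on each scale, then express the result in °C.
then_via_c = now_c * factor
then_via_f = f_to_c(c_to_f(now_c) * factor)
then_via_k = k_to_c(c_to_k(now_c) * factor)

print(f"via °C: {then_via_c:.0f} °C")
print(f"via °F: {then_via_f:.0f} °C")
print(f"via K:  {then_via_k:.0f} °C")
```

The three answers differ widely, which is why a ratio statement is only meaningful on a scale with an absolute zero point (here: kelvin).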

Histogram

Graphical visualisation of the data distribution, “data density”

- Continuous and ordinal data
- Group data into similar, non-overlapping classes (intervals)
- Determine the number of observations in each interval

$\text{relative frequency in interval} = \dfrac{\text{number of observations in interval}}{\text{total number of observations}}$

- Show relative (or absolute) frequencies of intervals in a bar chart

[Figure: histogram of body height (in cm); x-axis 150 to 185, y-axis: density, 0.00 to 0.06.]

Female body height, ordered

Interval  Height (n)                                        # Observations  Relative frequency
150-154   150 (1), 153 (1), 154 (1)                         3               2%
155-159   156 (3), 156.5 (1), 157 (2), 158 (4), 159 (2)     12              9%
160-164   160 (8), 161 (6), 162 (5), 163 (5), 164 (7)       31              22%
165-169   165 (16), 167 (8), 168 (12), 169 (6)              42              30%
170-174   170 (14), 171 (2), 172 (4), 173 (9), 174 (4)      33              24%
175-179   175 (2), 176 (4), 177 (2), 178 (3), 179 (1)       12              9%
180-184   180 (1), 181 (2), 182 (2), 183 (1)                6               4%
Total                                                       139             100%
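The grouping into 5 cm intervals can be retraced from the individual heights in the table. The sketch below is in Python rather than R and is only one possible way to do the binning; it assumes the density shown in a histogram is the relative frequency divided by the interval width.

```python
from collections import Counter

# Female body heights (cm) and how often each value occurs, from the table above.
height_counts = {
    150: 1, 153: 1, 154: 1,
    156: 3, 156.5: 1, 157: 2, 158: 4, 159: 2,
    160: 8, 161: 6, 162: 5, 163: 5, 164: 7,
    165: 16, 167: 8, 168: 12, 169: 6,
    170: 14, 171: 2, 172: 4, 173: 9, 174: 4,
    175: 2, 176: 4, 177: 2, 178: 3, 179: 1,
    180: 1, 181: 2, 182: 2, 183: 1,
}

BIN_WIDTH = 5  # interval length in cm
n_total = sum(height_counts.values())

# Assign each height to the interval [lower, lower + 5) and add up the counts.
bins = Counter()
for height, n in height_counts.items():
    lower = int(height // BIN_WIDTH) * BIN_WIDTH
    bins[lower] += n

for lower in sorted(bins):
    rel = bins[lower] / n_total   # relative frequency in the interval
    density = rel / BIN_WIDTH     # height of the histogram bar
    print(f"{lower}-{lower + BIN_WIDTH - 1}: n={bins[lower]:3d}  "
          f"rel={rel:.0%}  density={density:.3f}")
```

Changing `BIN_WIDTH` shows how strongly the picture depends on the chosen interval length.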

Histogram

[Figure: histograms of body height (in cm) for m and f; x-axis 150 to 200, y-axis: density, 0.00 to 0.08.]

- Shows the distribution in the sample
- Meaningful interval length: 5 cm
- A fitted “Gaussian normal distribution” approximates the distribution in the population

Histogram

[Figure: histograms of body height (in cm) for m and f; x-axis 150 to 200, y-axis: density, 0.00 to 0.12.]

- Interval length: 1 cm (very variable)
- The impression depends mainly on the bin width and slightly on the bin centre
- Histograms are simple and popular, but there are better density estimators

Cumulative histogram

A cumulative histogram estimates the distribution function

[Figure: cumulative histogram of body height; x-axis 150 to 185, y-axis: frequency, 0 to 140.]

[Figure: empirical distribution function of body height; x-axis 150 to 180, y-axis: distribution function, 0.0 to 1.0; n: 139, m: 0.]
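As a rough illustration, the cumulative counts can be obtained by summing the interval counts of the female body height table. This Python sketch evaluates the estimated distribution function only at the upper end of each 5 cm interval, whereas the empirical distribution function proper jumps at every observed value.

```python
from itertools import accumulate

# Interval lower bounds (cm) and counts from the female body height table.
lower_bounds = [150, 155, 160, 165, 170, 175, 180]
counts = [3, 12, 31, 42, 33, 12, 6]

# Running total of observations up to and including each interval.
cumulative = list(accumulate(counts))
n_total = cumulative[-1]

# Dividing by n turns cumulative counts into an estimate of the distribution function.
cdf_estimate = [c / n_total for c in cumulative]

for lower, cum, share in zip(lower_bounds, cumulative, cdf_estimate):
    print(f"height < {lower + 5} cm: {cum:3d} observations, F = {share:.2f}")
```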

Characterization of the centre of the data

What is a typical, mean value?

Mean $\bar{x}$: measure of the “middle” (mean, average) value

$\bar{x} = (x_1 + x_2 + \dots + x_n)/n$

The mean is the value which balances the data on a set of scales.

[Figure: data points on a scale from 0 to 2500, balanced at the mean.]

For normally distributed data, the sample mean is the best estimate of the population mean.

The mean is sensitive to outliers.
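A short Python sketch of both properties, using the five female heights from the first rows of the lecture's data table. The mistyped value 1810 is a hypothetical data-entry error, introduced here only to show the effect of an outlier.

```python
# Female heights (cm) from the first ten rows of the data table.
heights = [168.0, 170.0, 159.0, 165.0, 181.0]

n = len(heights)
x_bar = sum(heights) / n  # mean = (x1 + ... + xn) / n

# Balance property: the deviations from the mean add up to zero.
deviations = [x - x_bar for x in heights]
print(f"mean = {x_bar:.1f} cm, sum of deviations = {sum(deviations):.1f}")

# Hypothetical typo: 181 cm entered as 1810 cm.
with_outlier = [168.0, 170.0, 159.0, 165.0, 1810.0]
mean_with_outlier = sum(with_outlier) / len(with_outlier)
print(f"mean with outlier = {mean_with_outlier:.1f} cm")
```

A single wrong value moves the mean far away from every plausible body height.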

Dispersion or variation of a sample

How dispersed are the data around the mean?

Variance $s^2$:

- Compute the deviations $(x_1 - \bar{x}), \dots, (x_n - \bar{x})$
- Take their mean? No, it would always be 0!
- ⇒ $s^2 = \{(x_1 - \bar{x})^2 + \dots + (x_n - \bar{x})^2\}/(n - 1)$

Note: $s^2$ is in squared units (e.g. cm²)

Standard deviation (SD): $s = \sqrt{\text{variance}}$ (in cm)

For normally distributed data, about 68% of the data lie in the interval mean ± SD and about 95% in the interval mean ± 2 SD.

- sensitive to outliers
- no interpretation for non-normally distributed data
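The formulas can be followed step by step in a few lines of Python. The five heights are the female values from the first rows of the data table; the 68% and 95% figures are checked against the standard normal distribution from Python's `statistics` module.

```python
from math import sqrt
from statistics import NormalDist, variance

# Female heights (cm) from the first ten rows of the data table.
heights = [168.0, 170.0, 159.0, 165.0, 181.0]

n = len(heights)
x_bar = sum(heights) / n

# Variance: sum of squared deviations divided by (n - 1), in cm^2.
s2 = sum((x - x_bar) ** 2 for x in heights) / (n - 1)

# Standard deviation: square root of the variance, back in cm.
s = sqrt(s2)

print(f"variance = {s2:.1f} cm^2, SD = {s:.2f} cm")
print(f"library variance = {variance(heights):.1f} cm^2")

# Share of a normal distribution within mean ± 1 SD and mean ± 2 SD.
std_normal = NormalDist()
within_1sd = std_normal.cdf(1) - std_normal.cdf(-1)
within_2sd = std_normal.cdf(2) - std_normal.cdf(-2)
print(f"within 1 SD: {within_1sd:.1%}, within 2 SD: {within_2sd:.1%}")
```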

Descriptive statistics

Data are often represented by the mean plus-minus the standard deviation (mean ± SD).

R output of summary():

   Min.   1st Qu.  Median  Mean   3rd Qu.  Max.
f  150.0  163.0    167.0   167.2  171.5    183.0
m  165.0  176.0    180.0   180.2  184.0    197.0

R output of tableContinuous() (“reporttools”, v.1.0.4):

Gender  N    Min  Q1   Median  Mean   Q3     Max  SD   IQR  #NA
f       139  150  163  167     167.2  171.5  183  6.6  8.5  0
m       106  165  176  180     180.2  184.0  197  6.2  8.0  0

Mean ± SD or Mean ± SEM?

The standard error of the mean (SEM) is the standard deviation of the mean: $\mathrm{SEM} = \mathrm{SD}/\sqrt{n}$.

In descriptive statistics the SEM should not be used!
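To see why the SEM does not describe the data, one can compute it from the SD and N values in the table above. A minimal Python sketch (the function name `sem` is an illustrative choice):

```python
from math import sqrt


def sem(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / sqrt(n)


# SD and N as reported in the tableContinuous() output above.
sem_f = sem(6.6, 139)
sem_m = sem(6.2, 106)
print(f"f: SD = 6.6 cm, SEM = {sem_f:.2f} cm")
print(f"m: SD = 6.2 cm, SEM = {sem_m:.2f} cm")

# With four times as many students and the same SD, the SEM halves.
print(f"f with n = 556: SEM = {sem(6.6, 4 * 139):.2f} cm")
```

The SEM shrinks as the sample grows, while the spread of the individual heights stays the same. It therefore describes the precision of the mean, not the dispersion of the data.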

Bar chart

[Figure: bar chart of mean height by gender (m, f); y-axis: height, 0 to 200. Bars show means; error bars show mean ± 1 SD.]

Bars stand on the floor, therefore pay attention to the origin

Be careful with 3-dimensional graphics

Dot charts

[Figure: dot chart of mean height by gender (m, f); y-axis: height, 160 to 185. Dots show means; error bars show mean ± 1 SD.]

The origin has no meaning here

Percentiles (quantiles)

αth percentile (α% quantile):
α% of the data are smaller than or equal to the αth percentile and (100 − α)% are larger than or equal to it.

Examples: Median = 50th percentile

Quartiles = 25th and 75th percentiles

[Figure: empirical distribution function of body height; x-axis 150 to 200, y-axis: distribution function, 0.0 to 1.0. The median (0.5), the 1st quartile (0.25), the 3rd quartile (0.75) and the IQR are marked.]

Not unique! R offers nine different quantile algorithms.
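The non-uniqueness can be seen with the female body heights from the histogram table. The sketch below uses Python's `statistics.quantiles`, which offers two of the possible definitions. To our understanding its `"inclusive"` method corresponds to R's default algorithm, but this mapping is an assumption and not taken from the slides.

```python
from statistics import quantiles

# Female body heights (cm) and how often each value occurs, from the histogram table.
height_counts = {
    150: 1, 153: 1, 154: 1,
    156: 3, 156.5: 1, 157: 2, 158: 4, 159: 2,
    160: 8, 161: 6, 162: 5, 163: 5, 164: 7,
    165: 16, 167: 8, 168: 12, 169: 6,
    170: 14, 171: 2, 172: 4, 173: 9, 174: 4,
    175: 2, 176: 4, 177: 2, 178: 3, 179: 1,
    180: 1, 181: 2, 182: 2, 183: 1,
}

# Expand the frequency table into the sorted list of 139 observations.
heights = sorted(h for h, n in height_counts.items() for _ in range(n))

# Quartiles (25th, 50th, 75th percentiles) under two different definitions.
q_inclusive = quantiles(heights, n=4, method="inclusive")
q_exclusive = quantiles(heights, n=4, method="exclusive")

print("inclusive:", q_inclusive)
print("exclusive:", q_exclusive)
print("IQR (inclusive):", q_inclusive[2] - q_inclusive[0])
```

The median and the first quartile agree, but the third quartile differs between the two methods. The inclusive values match the summary() output shown for the female students.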

Boxplot

[Figure: boxplots of height by gender (m, f); y-axis: height, 150 to 190. Labelled from bottom to top: minimum (without outliers), lower quartile, median, upper quartile, maximum (without outliers).]

Characterization of the centre of the data

Median: centre of the data, 50th percentile

i.e. half of the sample is above the median, the other half below

The median is robust to outliers.

Mode: (rarely used)

- discrete data: most frequent value
- continuous data: maximum of the density (population only)
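A small Python sketch of both measures. The heights are the five female values from the first rows of the data table, 1810 is a hypothetical data-entry error, and the blood type counts are those from the relative frequency example.

```python
from statistics import mean, median

# Female heights (cm) from the data table, and the same data with a hypothetical typo.
heights = [168.0, 170.0, 159.0, 165.0, 181.0]
with_outlier = [168.0, 170.0, 159.0, 165.0, 1810.0]

# The median stays put, the mean does not.
print(f"median: {median(heights)} -> {median(with_outlier)}")
print(f"mean:   {mean(heights):.1f} -> {mean(with_outlier):.1f}")

# Mode of discrete data: the most frequent category.
blood_type_counts = {"0": 2313, "A": 2678, "B": 731, "AB": 365}
mode_blood_type = max(blood_type_counts, key=blood_type_counts.get)
print(f"mode of blood type: {mode_blood_type}")
```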

Dispersion of a sample

Range = maximum − minimum

- states the range of all values in the sample
- strongly influenced by outliers
- but: minimum and maximum are easy to understand

Interquartile range (IQR)
= 75th percentile − 25th percentile
= length of the box in the boxplot, contains the central 50% of the data

- like the standard deviation, a measure of the magnitude of the central range of the data

With normally distributed data, half the IQR equals 0.67 SD.

- “Median (IQR)” tells nothing about skewness
⇒ Data are often reported as

“Median [lower quartile, upper quartile]”.
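The factor 0.67 is the 75th percentile of the standard normal distribution, which can be checked in Python. The sample values below are the ones reported for the female students in the descriptive statistics tables; the comparison is only a rough plausibility check, not a test of normality.

```python
from statistics import NormalDist

# 75th percentile of the standard normal distribution: half the IQR in SD units.
z75 = NormalDist().inv_cdf(0.75)
print(f"half IQR / SD for normal data = {z75:.2f}")

# Values reported for the female students (cm).
minimum, q1, q3, maximum, sd = 150.0, 163.0, 171.5, 183.0, 6.6

value_range = maximum - minimum   # range = maximum - minimum
iqr = q3 - q1                     # interquartile range
print(f"range = {value_range} cm, IQR = {iqr} cm")

# Compare half the observed IQR with 0.67 SD.
print(f"half IQR = {iqr / 2:.2f} cm, 0.67 SD = {z75 * sd:.2f} cm")
```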
