Page 1: Statistics in Biology and Medicine

Statistics in Biology and Medicine

Richard Tseng

Pharmamatrix Workshop 2010

July 14, 2010

Page 2: Statistics in Biology and Medicine

The goal of statistics is to analyze, interpret and present data collected to study systems of interest!!

Page 3: Statistics in Biology and Medicine

Outline
• Descriptive statistics
• Inferential statistics
– Probability theory
– Hypothesis test
– Regression
• Some other tools
– Tools for component analysis
– Bayesian statistics
• Summary

Page 4: Statistics in Biology and Medicine

Descriptive statistics

• Definitions
– Set: A well-defined collection of objects; each object is called an element
– Operations on sets: union and intersection

For example, A = {1, 2, 3, 4} and B = {3, 4, 5, 6}

$A \cap B = \{3, 4\}$

$A \cup B = \{1, 2, 3, 4, 5, 6\}$
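The set example on this slide can be checked directly with Python's built-in set type. This is a minimal sketch; the variable names are illustrative only.

```python
# Sets A and B from the slide
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

intersection = A & B  # elements in both A and B
union = A | B         # elements in A or B (or both)

print(sorted(intersection))
print(sorted(union))
```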

Page 5: Statistics in Biology and Medicine

• Data type
– Interval scale
• For example, body weight (g): 1, 1.5, 2, 3 …
– Ordinal scale
• For example, scores for patient responses to treatment

Response | Much worse | Bit worse | About same | Bit better | Much better
Score | -2 | -1 | 0 | 1 | 2

– Nominal scale
• Categorical data. For example, factors that influence treatments

Page 6: Statistics in Biology and Medicine

• How large are the numbers?
– Mean
– Median

[1]
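A short sketch of the two measures, using Python's standard `statistics` module. The body weights are hypothetical values chosen to include one outlier, which shows why the mean and the median can disagree.

```python
import statistics

# Hypothetical body weights (g); the last value is an outlier
weights = [1, 1.5, 2, 3, 20]

mean_weight = statistics.mean(weights)      # sum of values / number of values
median_weight = statistics.median(weights)  # middle value after sorting

print(mean_weight, median_weight)
```

The outlier pulls the mean upward while the median stays at the middle observation.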

Page 7: Statistics in Biology and Medicine

[1]

• How variable are the numbers?
– Standard deviation (SD)
– Coefficient of variation (CV = SD/mean)
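As an illustration, the SD and CV of the body weights quoted earlier (1, 1.5, 2, 3 g) can be computed as follows. The sketch assumes the sample SD (n - 1 in the denominator); the slide does not say which form is intended.

```python
import statistics

weights = [1, 1.5, 2, 3]  # body weights (g) from the data-type slide

mean_weight = statistics.mean(weights)
sd = statistics.stdev(weights)  # sample standard deviation (n - 1 denominator)
cv = sd / mean_weight           # coefficient of variation, unitless

print(round(sd, 4), round(cv, 4))
```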

Page 8: Statistics in Biology and Medicine

Inferential Statistics: Probability theory

• Law of large numbers
– The mean of the elements in a set converges to the expected value as the number of elements approaches infinity
• Law of small numbers
– There are not enough small numbers to satisfy all the demands placed on them
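The law of large numbers can be seen in a small simulation. The sketch below rolls a fair six-sided die (expected value 3.5); the die, the seed and the sample sizes are illustrative assumptions.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n rolls of a fair six-sided die."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

# The sample mean should move toward 3.5 as n grows
means = {n: sample_mean(n) for n in (10, 1000, 100000)}
for n, m in means.items():
    print(n, round(m, 3))
```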

Page 9: Statistics in Biology and Medicine

• Central limit theorem
– States the conditions under which the mean of a sufficiently large number of independent random variables, each with finite mean and variance, will be approximately normally distributed

http://www.stat.sc.edu/~west/javahtml/CLT.html
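A rough numerical illustration of the theorem: means of 30 draws from a skewed exponential distribution (mean 1, variance 1) should cluster around 1 with SD close to 1/sqrt(30). The distribution, sample size and repetition count are assumptions made for this sketch.

```python
import math
import random
import statistics

random.seed(1)

n = 30        # draws averaged into each sample mean
reps = 5000   # number of sample means

sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

grand_mean = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_sd = 1 / math.sqrt(n)

# For a normal distribution about 68% of values lie within one SD of the mean
within_one_sd = sum(abs(m - 1.0) <= theoretical_sd for m in sample_means) / reps

print(round(grand_mean, 3), round(sd_of_means, 3), round(within_one_sd, 3))
```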

Page 10: Statistics in Biology and Medicine

• Probability
– Meaning
• Frequency interpretation: A number associated with the rate of occurrence of an event in a well-defined random physical system
• Bayesian interpretation: A number assigned to any statement whatsoever, even when no random process is involved, as a way to represent the degree to which the statement is supported by the available evidence

Page 11: Statistics in Biology and Medicine

• Probability
– Basic rules
• Subtraction: $P(A') = 1 - P(A)$
• Addition: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
• Multiplication: $P(A \cap B) = P(A)\,P(B \mid A)$
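The subtraction, addition and multiplication rules can be checked by counting outcomes of one roll of a fair die. The events A (even number) and B (greater than 3) are hypothetical examples, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}  # one roll of a fair die
A = {2, 4, 6}                  # event: even number
B = {4, 5, 6}                  # event: greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

p_not_a = prob(outcomes - A)     # subtraction rule: 1 - P(A)
p_a_or_b = prob(A | B)           # addition rule target
p_a_and_b = prob(A & B)          # multiplication rule target
p_b_given_a = prob(A & B) / prob(A)

print(p_not_a, p_a_or_b, p_a_and_b, p_b_given_a)
```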

Page 12: Statistics in Biology and Medicine

• Probability
– Bayes' rule

$P(A \mid B) = \dfrac{P(B \mid A)\,P(A)}{P(B)}$

$\text{Posterior} \propto \text{Likelihood} \times \text{Prior}$
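A worked example of the rule, using a diagnostic test. All numbers (prevalence, sensitivity, false-positive rate) are hypothetical and chosen only to show the arithmetic.

```python
# A = patient has the disease, B = test is positive (hypothetical numbers)
prior = 0.01                # P(A): prevalence
sensitivity = 0.95          # P(B | A): likelihood of a positive test if diseased
false_positive_rate = 0.05  # P(B | not A)

# P(B) by the law of total probability
evidence = sensitivity * prior + false_positive_rate * (1 - prior)

# Bayes' rule: P(A | B) = P(B | A) P(A) / P(B)
posterior = sensitivity * prior / evidence

print(round(posterior, 3))
```

With a rare condition, the posterior stays well below the test's sensitivity because most positive results come from the much larger healthy group.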

Page 13: Statistics in Biology and Medicine

• Probability
– Maximum entropy principle: The most honest probability distribution to assign to a system is the one that maximizes the entropy of the system, subject to the information at hand.
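A small numerical illustration, assuming the Shannon entropy $H = -\sum_i p_i \ln p_i$ and no information beyond normalization. In that case the uniform distribution has the largest entropy. The skewed distribution below is a hypothetical comparison.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in nats; terms with p = 0 contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

uniform = [1 / 6] * 6                    # no outcome favored
skewed = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]  # hypothetical biased assignment

h_uniform = shannon_entropy(uniform)
h_skewed = shannon_entropy(skewed)

print(round(h_uniform, 4), round(h_skewed, 4))
```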

Page 14: Statistics in Biology and Medicine

Inferential Statistics: Regression

• Goal: To correlate the study outcomes of systems of interest with possible factors.
• Model:
– Linear model: $R = a + bx$
– Logistic model: $R = \dfrac{\exp(a + bx)}{1 + \exp(a + bx)}$
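The two models can be written as plain functions of the factor x and the coefficients a and b. This is a minimal sketch; the function names are illustrative.

```python
import math

def linear_model(x, a, b):
    """Linear model: R = a + b*x."""
    return a + b * x

def logistic_model(x, a, b):
    """Logistic model: R = exp(a + b*x) / (1 + exp(a + b*x)), always between 0 and 1."""
    z = a + b * x
    return math.exp(z) / (1 + math.exp(z))

print(linear_model(2, 1, 3))
print(logistic_model(0, 0, 1))
```

The logistic form suits outcomes that are probabilities or proportions, since its output cannot leave the interval (0, 1).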

Page 15: Statistics in Biology and Medicine

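• Optimization
Suppose there are n outcomes $d_i$ of a study.
– Least-squares method: $(\hat{a}, \hat{b}) = \arg\min_{a,b} \sum_{i=1}^{n} (R_i - d_i)^2$
– Maximum likelihood estimate: Suppose a likelihood function is given by $L(a, b \mid d)$; then $(\hat{a}, \hat{b}) = \arg\max_{a,b} L(a, b \mid d)$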
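For the linear model R = a + bx, the least-squares problem has a closed-form solution, sketched below on hypothetical data. If the errors are assumed to be independent Gaussian with constant variance, the same estimates also maximize the likelihood.

```python
# Hypothetical factor values x and observed outcomes d
x = [1, 2, 3, 4, 5]
d = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
d_bar = sum(d) / n

# Closed-form minimizer of sum_i (a + b*x_i - d_i)^2
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xd = sum((xi - x_bar) * (di - d_bar) for xi, di in zip(x, d))
b_hat = s_xd / s_xx
a_hat = d_bar - b_hat * x_bar

print(round(a_hat, 3), round(b_hat, 3))
```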

Page 16: Statistics in Biology and Medicine

• Regression tests
– Residual analysis: $\text{residual}_i = R_i - d_i$
– Standard errors of regression coefficients [equation not legible in transcript]
– Coefficient of determination [equation not legible in transcript]
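The three checks can be sketched for a simple linear fit. The slide's own formulas for the standard errors and the coefficient of determination are not legible in this transcript, so the code uses the standard textbook definitions, which may differ from what was presented. The data are hypothetical.

```python
import math

x = [1, 2, 3, 4, 5]             # hypothetical factor values
d = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical observed outcomes
n = len(x)

# Least-squares fit of R = a + b*x
x_bar, d_bar = sum(x) / n, sum(d) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b_hat = sum((xi - x_bar) * (di - d_bar) for xi, di in zip(x, d)) / s_xx
a_hat = d_bar - b_hat * x_bar

# Residuals: model prediction minus observation, R_i - d_i
residuals = [a_hat + b_hat * xi - di for xi, di in zip(x, d)]
sse = sum(r ** 2 for r in residuals)

# Standard errors of the coefficients (n - 2 degrees of freedom)
s = math.sqrt(sse / (n - 2))
se_b = s / math.sqrt(s_xx)
se_a = s * math.sqrt(1 / n + x_bar ** 2 / s_xx)

# Coefficient of determination: share of the variance in d explained by the fit
sst = sum((di - d_bar) ** 2 for di in d)
r_squared = 1 - sse / sst

print(round(se_a, 4), round(se_b, 4), round(r_squared, 4))
```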

Page 17: Statistics in Biology and Medicine

• Example 1: Linear regression

Page 18: Statistics in Biology and Medicine

• Example 2: MLE solution of Emax and EC50 in the Michaelis-Menten equation

Likelihood function

MLE solution
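The likelihood function and solution shown on this slide did not survive in the transcript. As a hedged sketch, assume the response follows E = Emax·x/(EC50 + x) with independent Gaussian errors of constant variance. Maximizing the likelihood is then the same as minimizing the sum of squared errors. The data, the grid and the function names below are hypothetical.

```python
def emax_model(conc, emax, ec50):
    """Michaelis-Menten (Emax) response at concentration conc."""
    return emax * conc / (ec50 + conc)

def fit_emax(concs, responses, ec50_grid):
    """Grid search over EC50; for each EC50 the best Emax has a closed form."""
    best = None
    for ec50 in ec50_grid:
        f = [c / (ec50 + c) for c in concs]
        # Emax minimizing the squared error for this EC50
        emax = sum(d * fi for d, fi in zip(responses, f)) / sum(fi * fi for fi in f)
        sse = sum((emax * fi - d) ** 2 for d, fi in zip(responses, f))
        if best is None or sse < best[0]:
            best = (sse, emax, ec50)
    return best[1], best[2]

concs = [0.5, 1, 2, 5, 10, 20, 50]                      # hypothetical concentrations
responses = [10.2, 15.8, 29.5, 48.7, 68.1, 79.0, 91.6]  # hypothetical responses
grid = [0.01 * k for k in range(1, 5001)]               # EC50 candidates, 0.01 to 50

emax_hat, ec50_hat = fit_emax(concs, responses, grid)
print(round(emax_hat, 2), round(ec50_hat, 2))
```

A grid search is used only to keep the sketch dependency-free; a numerical optimizer would normally replace it.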

Page 19: Statistics in Biology and Medicine

Inferential Statistics: Hypothesis test

Page 20: Statistics in Biology and Medicine

• Goal: Test of significance
• Rationale
– Null hypothesis: H0, outcomes of a study result purely from chance
– Alternative hypothesis: H1, outcomes of a study are influenced by non-random sources
– Appropriate model: Normal distribution, t-distribution…

Page 21: Statistics in Biology and Medicine
Page 22: Statistics in Biology and Medicine

• Rationale
– Appropriate analysis method
• P-value: The probability of observing a sample statistic as extreme as the test statistic, assuming the null hypothesis is true.
• Parametric method: t-test, F-test, Chi-square test
• Non-parametric method: Kolmogorov-Smirnov test, Mann-Whitney test

Page 23: Statistics in Biology and Medicine

P-value for significance test:
1. What is the probability of obtaining the test value from a random population? One- or two-tailed?
2. If the p-value is less than the significance level, the null hypothesis is rejected

t-distribution

http://socr.ucla.edu/htmls/dist/StudentT_Distribution.html
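One- and two-tailed p-values can be computed from the t-distribution by numerical integration of its density. The sketch below uses the t-value and degrees of freedom reported in the two-sample example of this transcript (t = -3.8992, DF = 28). The Simpson integration and the 0.05 significance level are assumptions made for illustration.

```python
import math

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t), via Simpson's rule on [0, |t|] and the symmetry of the density."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

t_value, df = -3.8992, 28
p_less = t_cdf(t_value, df)            # one-tailed: population 1 < population 2
p_greater = 1 - p_less                 # one-tailed: population 1 > population 2
p_two = 2 * min(p_less, p_greater)     # two-tailed: population 1 != population 2

alpha = 0.05                           # assumed significance level
reject_null = p_two < alpha
print(round(p_less, 4), round(p_greater, 4), round(p_two, 4), reject_null)
```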

Page 24: Statistics in Biology and Medicine

• Parametric tests

Test method | Test statistic | Null hypothesis
One-sample t-test | $t = \dfrac{R - \bar{d}}{SD/\sqrt{n}}$ | The mean of a normally distributed population equals the reference value R
Two-sample F-test | $F = SD_1^2 / SD_2^2$ | Two normally distributed populations have the same variance
Pearson chi-square test | $\chi^2 = \sum_{i=1}^{n} \dfrac{(R_i - d_i)^2}{R_i}$ | The real population d follows the theoretical population R
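The three parametric test statistics can be computed in a few lines. The sketch follows the usual textbook forms: a one-sample t against a reference value, a variance-ratio F, and Pearson's chi-square with the theoretical counts in the denominator. All data are hypothetical.

```python
import math
import statistics

def one_sample_t(sample, reference):
    """t = (reference - sample mean) / (SD / sqrt(n)), using the sample SD."""
    n = len(sample)
    return (reference - statistics.mean(sample)) / (statistics.stdev(sample) / math.sqrt(n))

def f_statistic(sample1, sample2):
    """Ratio of the two sample variances."""
    return statistics.variance(sample1) / statistics.variance(sample2)

def pearson_chi_square(observed, expected):
    """Sum over categories of (expected - observed)^2 / expected."""
    return sum((e - o) ** 2 / e for o, e in zip(observed, expected))

sample1 = [5.1, 4.9, 5.3, 5.0, 5.2]  # hypothetical measurements
sample2 = [4.0, 6.0, 5.0, 5.5, 4.5]  # hypothetical measurements

t_stat = one_sample_t(sample1, reference=5.0)
f_stat = f_statistic(sample1, sample2)
chi2_stat = pearson_chi_square(observed=[18, 22, 20], expected=[20, 20, 20])

print(round(t_stat, 4), round(f_stat, 4), round(chi2_stat, 4))
```

Each statistic is then compared with its reference distribution (t, F or chi-square) to obtain a p-value.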

Page 25: Statistics in Biology and Medicine

Two-sample t-test (online calculator: http://www.usablestats.com/calcs/2samplet)

Sample | N | Mean | StDev | SE Mean
Sample 1 | 15 | 0.633 | 0.2162 | 0.056
Sample 2 | 15 | 0.931 | 0.2021 | 0.052

Observed difference (Sample 1 - Sample 2): -0.298
Standard deviation of difference: 0.0764

Unequal variances
DF: 27
95% confidence interval for the difference: (-0.4548, -0.1412)
T-value: -3.9005
Population 1 ≠ Population 2: P-value = 0.0006
Population 1 > Population 2: P-value = 0.9997
Population 1 < Population 2: P-value = 0.0003

Equal variances
Pooled standard deviation: 0.2093
Pooled DF: 28
95% confidence interval for the difference: (-0.4545, -0.1415)
T-value: -3.8992
Population 1 ≠ Population 2: P-value = 0.0006
Population 1 > Population 2: P-value = 0.9997
Population 1 < Population 2: P-value = 0.0003
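Most of the calculator output can be reproduced from the summary statistics alone. The sketch below uses the standard Welch (unequal variances) and pooled (equal variances) formulas. Because the inputs are the rounded values shown in the table, the t-values agree with the calculator only to about two decimal places.

```python
import math

# Summary statistics from the table above
n1, mean1, sd1 = 15, 0.633, 0.2162
n2, mean2, sd2 = 15, 0.931, 0.2021

diff = mean1 - mean2
v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2  # squared standard errors of each mean

# Unequal variances (Welch)
se_diff = math.sqrt(v1 + v2)
t_welch = diff / se_diff
df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# Equal variances (pooled)
pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))
t_pooled = diff / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))
df_pooled = n1 + n2 - 2

print(round(se_diff, 4), round(t_welch, 3), math.floor(df_welch))
print(round(pooled_sd, 4), round(t_pooled, 3), df_pooled)
```

The Welch degrees of freedom come out as a non-integer just under 28, which the calculator appears to report rounded down to 27.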

Page 26: Statistics in Biology and Medicine

Some Statistics Worth Knowing

• Tools for component analysis:
– Principal Component Analysis (PCA): A way to identify patterns in data and express the data so as to highlight their similarities and differences
– Independent Component Analysis (ICA): A way to separate independent components in data
– Variable and model selection: Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC)
• Bayesian statistics
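The two model-selection criteria have simple standard definitions: AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of fitted parameters, n the number of observations and ln L the maximized log-likelihood. A minimal sketch with hypothetical log-likelihood values:

```python
import math

def aic(k, log_likelihood):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * k - 2 * log_likelihood

def bic(k, log_likelihood, n):
    """Bayesian Information Criterion: k ln n - 2 ln L (lower is better)."""
    return k * math.log(n) - 2 * log_likelihood

# Hypothetical comparison: a 2-parameter and a 3-parameter model on n = 100 points
n = 100
simple = {"k": 2, "log_likelihood": -10.0}
complex_ = {"k": 3, "log_likelihood": -9.5}

for name, m in (("simple", simple), ("complex", complex_)):
    print(name, aic(m["k"], m["log_likelihood"]),
          round(bic(m["k"], m["log_likelihood"], n), 3))
```

Both criteria penalize extra parameters; BIC penalizes them more heavily once n exceeds about 7, because ln n is then greater than 2.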

Page 27: Statistics in Biology and Medicine

Summary

• What is the “right” null hypothesis?
• What is the appropriate distribution function?
• What is the appropriate test statistic?

“Know” your data before you analyze it!!

Page 28: Statistics in Biology and Medicine

Information theory based statistics: Bayesian statistics

• Goal: Use Bayesian methods to design studies and analyze data
• Bayesian inference
– Appropriate distribution functions
– Appropriate sampling techniques
• Maximum entropy method based inference
– Appropriate form of entropy
– Appropriate constraints

Page 29: Statistics in Biology and Medicine

Information theory based statistics: Method of maximum entropy

Page 30: Statistics in Biology and Medicine

Reference

[1] P. Rowe, Essential Statistics for Pharmaceutical Sciences, Wiley 2007.

