
Statistics in Biology and Medicine

Pharmamatrix Workshop 2010. Richard Tseng. July 14, 2010.
Page 1: Statistics in Biology and Medicine

Statistics in Biology and Medicine

Richard Tseng

Pharmamatrix Workshop 2010

July 14, 2010

Page 2: Statistics in Biology and Medicine

The goal of statistics is to analyze, interpret and present data collected to study systems of interest!!

Page 3: Statistics in Biology and Medicine

Outline

• Descriptive statistics
• Inferential statistics
  – Probability theory
  – Hypothesis test
  – Regression
• Some other tools
  – Tools for component analysis
  – Bayesian statistics
• Summary

Page 4: Statistics in Biology and Medicine

Descriptive statistics

• Definitions
  – Set: a well-defined collection of objects; each object is called an element
  – Operations on sets: union and intersection

For example, A = {1, 2, 3, 4} and B = {3, 4, 5, 6}:

$$A \cap B = \{3, 4\}$$

$$A \cup B = \{1, 2, 3, 4, 5, 6\}$$
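The same example can be checked with Python's built-in set type. This is only an illustrative sketch; the variable names are not from the slides.

```python
# Sets from the example above.
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

intersection = A & B   # elements that belong to both A and B
union = A | B          # elements that belong to A, to B, or to both

print(sorted(intersection))  # [3, 4]
print(sorted(union))         # [1, 2, 3, 4, 5, 6]
```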

Page 5: Statistics in Biology and Medicine

• Data type
  – Interval scale
    • For example, body weight (g): 1, 1.5, 2, 3, …
  – Ordinal scale
    • For example, scores for patient responses to treatment:

      Response | Much worse | Bit worse | About same | Bit better | Much better
      Score    | -2         | -1        | 0          | 1          | 2

  – Nominal scale
    • Categorical data. For example, factors that influence treatments

Page 6: Statistics in Biology and Medicine

• How large are the numbers?
  – Mean
  – Median

[1]

Page 7: Statistics in Biology and Medicine

[1]

• How variable are the numbers?
  – Standard deviation (SD)
  – Coefficient of variation (CV = SD/mean)
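As a rough illustration, the four summaries (mean, median, SD, CV) can be computed with Python's standard library. The data are the body weights from the interval-scale example. Using the sample SD (n - 1 in the denominator) is an assumption, since the slides do not say which form is meant.

```python
import statistics

# Body weights (g) taken from the interval-scale example.
weights_g = [1.0, 1.5, 2.0, 3.0]

mean = statistics.mean(weights_g)
median = statistics.median(weights_g)
sd = statistics.stdev(weights_g)   # sample SD, n - 1 in the denominator (assumed)
cv = sd / mean                     # coefficient of variation, CV = SD/mean

print(f"mean={mean:.3f} median={median:.3f} SD={sd:.3f} CV={cv:.3f}")
```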

Page 8: Statistics in Biology and Medicine

Inferential Statistics: Probability theory

• Law of large numbers
  – The mean of the elements in a set converges to the expected value as the number of elements approaches infinity
• Law of small numbers
  – There are not enough small numbers to satisfy all the demands placed on them
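The law of large numbers can be seen in a small simulation. The sketch below assumes a fair six-sided die (expected value 3.5), which is an illustrative choice and not an example from the slides.

```python
import random

def mean_of_die_rolls(n_rolls, seed=0):
    """Average of n_rolls simulated rolls of a fair six-sided die."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    total = 0
    for _ in range(n_rolls):
        total += rng.randint(1, 6)     # each face 1..6 is equally likely
    return total / n_rolls

# The sample mean drifts toward the expected value 3.5 as n grows.
for n in (10, 1_000, 100_000):
    print(n, mean_of_die_rolls(n))
```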

Page 9: Statistics in Biology and Medicine

• Central limit theorem
  – States conditions under which the mean of a sufficiently large number of independent random variables, each with finite mean and variance, will be approximately normally distributed

http://www.stat.sc.edu/~west/javahtml/CLT.html
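A minimal simulation of the central limit theorem, assuming uniform(0, 1) variables as the underlying distribution (an arbitrary choice for illustration). Each uniform variable has mean 1/2 and variance 1/12, so the mean of n of them should have a spread close to sqrt(1/(12 n)). About 95% of the sample means should fall within 1.96 SD of the centre.

```python
import random
import statistics

def sample_means(n_per_sample, n_samples, seed=0):
    """Means of n_samples independent samples of uniform(0, 1) variables."""
    rng = random.Random(seed)
    return [
        sum(rng.random() for _ in range(n_per_sample)) / n_per_sample
        for _ in range(n_samples)
    ]

n = 30
means = sample_means(n_per_sample=n, n_samples=5000)

center = statistics.fmean(means)
spread = statistics.stdev(means)
predicted_spread = (1 / 12) ** 0.5 / n ** 0.5   # SD of one variable / sqrt(n)

# Fraction of sample means within 1.96 SD of the centre (normal theory: ~0.95).
within = sum(abs(m - center) <= 1.96 * spread for m in means) / len(means)

print(center, spread, predicted_spread, within)
```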

Page 10: Statistics in Biology and Medicine

• Probability
  – Meaning
    • Frequency interpretation: a number associated with the rate of occurrence of an event in a well-defined random physical system
    • Bayesian interpretation: a number assigned to any statement whatsoever, even when no random process is involved, as a way to represent the degree to which the statement is supported by the available evidence

Page 11: Statistics in Biology and Medicine

• Probability
  – Basic rules
    • Subtraction: $P(A') = 1 - P(A)$
    • Addition: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
    • Multiplication: $P(A \cap B) = P(A)\,P(B \mid A)$
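The three rules can be verified by counting outcomes. The sketch below assumes one roll of a fair die, with A = "even" and B = "greater than 3". These events are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}   # one roll of a fair die
A = {2, 4, 6}                   # event "even"
B = {4, 5, 6}                   # event "greater than 3"

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & outcomes), len(outcomes))

def cond_prob(event, given):
    """Conditional probability P(event | given)."""
    return prob(event & given) / prob(given)

# Subtraction, addition and multiplication rules, checked exactly.
assert prob(outcomes - A) == 1 - prob(A)
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)
assert prob(A & B) == prob(A) * cond_prob(B, A)

print(prob(A | B), prob(A & B))  # 2/3 1/3
```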

Page 12: Statistics in Biology and Medicine

• Probability
  – Bayesian rule

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

$$\text{Posterior} \propto \text{Likelihood} \times \text{Prior}$$
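A small worked example of the rule, using a diagnostic test. The numbers (1% prevalence, 95% sensitivity, 10% false-positive rate) are hypothetical and the function name is made up for illustration.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """P(A | B) from P(A), P(B | A) and P(B | not A), via Bayes' rule."""
    # P(B) by the law of total probability.
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: A = "has the disease", B = "test is positive".
p = posterior(prior=0.01, likelihood=0.95, likelihood_given_not=0.10)
print(f"P(disease | positive test) = {p:.3f}")  # about 0.088
```

Even with a fairly accurate test, the low prior keeps the posterior below 10%.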

Page 13: Statistics in Biology and Medicine

• Probability
  – Maximum entropy principle: the most honest probability distribution to assign to a system is the one that maximizes the entropy of the system subject to all the information available.
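As a hedged illustration, the sketch below assumes the Shannon entropy H = -Σ p_i ln p_i and a system with six outcomes about which nothing is known beyond normalization. Under those assumptions the uniform distribution has the largest entropy, ln 6, and any more "opinionated" assignment has less.

```python
import math

def entropy(p):
    """Shannon entropy -sum(p_i * ln p_i); terms with p_i = 0 contribute 0."""
    return -sum(p_i * math.log(p_i) for p_i in p if p_i > 0)

uniform = [1 / 6] * 6                     # no outcome favoured
skewed = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]   # hypothetical alternative assignment

print(entropy(uniform), entropy(skewed), math.log(6))
```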

Page 14: Statistics in Biology and Medicine

Inferential Statistics: Regression

• Goal: To relate the outcomes of a study of the systems of interest to possible explanatory factors.

• Model:
  – Linear model: $R = a + bx$
  – Logistic model: $R = \dfrac{\exp(a + bx)}{1 + \exp(a + bx)}$
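The two model forms written as plain Python functions. This is a sketch only: the function names are illustrative, and the logistic form is coded literally, so it can overflow for very large a + bx.

```python
import math

def linear_model(x, a, b):
    """Linear model: R = a + b*x."""
    return a + b * x

def logistic_model(x, a, b):
    """Logistic model: R = exp(a + b*x) / (1 + exp(a + b*x)), always in (0, 1)."""
    z = a + b * x
    return math.exp(z) / (1 + math.exp(z))

print(linear_model(2, a=1, b=3))      # 7
print(logistic_model(0, a=0, b=1))    # 0.5
```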

Page 15: Statistics in Biology and Medicine

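• Optimization

Suppose there are n outcomes d_i of a study, and R_i is the model prediction for outcome i.

  – Least-squares method:

$$(\hat{a}, \hat{b}) = \arg\min_{a,\,b} \sum_{i=1}^{n} (R_i - d_i)^2$$

  – Maximum likelihood estimate: suppose a likelihood function is given by L(a, b | d):

$$(\hat{a}, \hat{b}) = \arg\max_{a,\,b} L(a, b \mid d)$$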
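For the linear model R = a + bx the least-squares problem has a closed-form solution. The sketch below uses the textbook formulas, and `fit_line` is an illustrative name. If the errors are assumed to be independent and Gaussian with constant variance, the same estimates also maximize the likelihood.

```python
def fit_line(x, d):
    """Least-squares estimates (a_hat, b_hat) for the model R = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    d_bar = sum(d) / n
    s_xy = sum((xi - x_bar) * (di - d_bar) for xi, di in zip(x, d))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b_hat = s_xy / s_xx              # slope
    a_hat = d_bar - b_hat * x_bar    # intercept
    return a_hat, b_hat

# Noise-free data generated from d = 1 + 2x, so the fit recovers a = 1, b = 2.
x = [0, 1, 2, 3]
d = [1, 3, 5, 7]
print(fit_line(x, d))  # (1.0, 2.0)
```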

Page 16: Statistics in Biology and Medicine

• Regression tests
  – Residual analysis: $\text{residual}_i = R_i - d_i$
  – Standard errors of regression coefficients
  – Coefficient of determination
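A sketch of the three checks for a straight-line fit. It assumes the common textbook definitions, which may differ in notation from the formulas on the slide: residual_i = R_i - d_i, SE of the slope = sqrt(SS_res / (n - 2) / S_xx), and R² = 1 - SS_res / SS_tot. The data are hypothetical.

```python
import math

def regression_diagnostics(x, d):
    """Residuals, standard error of the slope, and R^2 for R = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    d_bar = sum(d) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b_hat = sum((xi - x_bar) * (di - d_bar) for xi, di in zip(x, d)) / s_xx
    a_hat = d_bar - b_hat * x_bar

    residuals = [a_hat + b_hat * xi - di for xi, di in zip(x, d)]  # R_i - d_i
    ss_res = sum(r ** 2 for r in residuals)         # unexplained variation
    ss_tot = sum((di - d_bar) ** 2 for di in d)     # total variation

    se_b = math.sqrt(ss_res / (n - 2) / s_xx)       # standard error of the slope
    r_squared = 1 - ss_res / ss_tot                 # coefficient of determination
    return residuals, se_b, r_squared

# Hypothetical data.
residuals, se_b, r_squared = regression_diagnostics([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(residuals, se_b, r_squared)
```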

Page 17: Statistics in Biology and Medicine

• Example 1: Linear regression

Page 18: Statistics in Biology and Medicine

• Example 2: MLE solution of Emax and EC50 in the Michaelis-Menten equation

Likelihood function

MLE solution
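The likelihood function and solution shown on the slide did not survive in this transcript, so the following is only a sketch under stated assumptions. It takes the model as E = Emax · C / (EC50 + C) with independent Gaussian errors of constant variance, in which case maximizing the likelihood is equivalent to minimizing the sum of squared errors. For a fixed EC50 the best Emax has a closed form, so a one-dimensional grid search over EC50 is enough. `fit_emax`, the grid, and the data are all hypothetical.

```python
def fit_emax(conc, effect, ec50_grid):
    """Grid-search MLE of (Emax, EC50) for E = Emax*C/(EC50 + C), Gaussian errors."""
    best = None
    for ec50 in ec50_grid:
        f = [c / (ec50 + c) for c in conc]   # model shape for this EC50
        # For fixed EC50 the model is linear in Emax: closed-form least squares.
        emax = sum(e * fi for e, fi in zip(effect, f)) / sum(fi * fi for fi in f)
        sse = sum((e - emax * fi) ** 2 for e, fi in zip(effect, f))
        if best is None or sse < best[0]:
            best = (sse, emax, ec50)
    return best[1], best[2]

# Hypothetical noise-free data generated with Emax = 100 and EC50 = 10.
conc = [1, 3, 10, 30, 100]
effect = [100 * c / (10 + c) for c in conc]
grid = [0.1 * k for k in range(1, 1001)]     # candidate EC50 values 0.1 .. 100

emax_hat, ec50_hat = fit_emax(conc, effect, grid)
print(emax_hat, ec50_hat)
```

A grid search is crude but makes no assumptions about starting values. With real data a numerical optimizer would normally replace it.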

Page 19: Statistics in Biology and Medicine

Inferential Statistics: Hypothesis test

Page 20: Statistics in Biology and Medicine

• Goal: Test of significance
• Rationale
  – Null hypothesis: H0, the outcomes of a study result purely from chance
  – Alternative hypothesis: H1, the outcomes of a study are influenced by non-random sources
  – Appropriate model: normal distribution, t-distribution, …

Page 21: Statistics in Biology and Medicine
Page 22: Statistics in Biology and Medicine

• Rationale
  – Appropriate analysis method
    • P-value: the probability of observing a sample statistic at least as extreme as the test statistic, assuming the null hypothesis is true
    • Parametric methods: t-test, F-test, chi-square test
    • Non-parametric methods: Kolmogorov-Smirnov test, Mann-Whitney test

Page 23: Statistics in Biology and Medicine

P-value for a significance test:

1. What is the probability of obtaining the test value from a random population? One- or two-tailed?

2. If the p-value is less than the significance level, the null hypothesis is rejected

t-distribution

http://socr.ucla.edu/htmls/dist/StudentT_Distribution.html
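One- and two-tailed p-values can be sketched with the standard library. Note the assumption: Python's standard library has no t-distribution, so this uses the standard normal as a large-sample approximation to the t-distribution. The significance level 0.05 is a conventional choice, not taken from the slides.

```python
from statistics import NormalDist

def p_values(z):
    """Lower-tail, upper-tail and two-tailed p-values for a test statistic z."""
    lower = NormalDist().cdf(z)          # P(Z <= z) under the null hypothesis
    upper = 1 - lower                    # P(Z >= z)
    two_tailed = 2 * min(lower, upper)   # extreme in either direction
    return lower, upper, two_tailed

alpha = 0.05                             # significance level (assumed)
lower, upper, two_tailed = p_values(-1.96)
reject_null = two_tailed < alpha
print(lower, upper, two_tailed, reject_null)
```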

Page 24: Statistics in Biology and Medicine

• Parametric test

Test method | Test statistic | Null hypothesis
One-sample t-test | $t = \dfrac{\bar{d} - R}{SD/\sqrt{n}}$ | The mean of the normally distributed population from which the data d were drawn equals the theoretical value R
Two-sample F-test | $F = \dfrac{SD_1^2}{SD_2^2}$ | The variances of two normally distributed populations are equal
Pearson chi-square test | $\chi^2 = \sum_{i=1}^{n} \dfrac{(R_i - d_i)^2}{R_i}$ | The theoretical population R and the real population d do not differ

Page 25: Statistics in Biology and Medicine

Two-sample t-test (online calculator: http://www.usablestats.com/calcs/2samplet)

           N    Mean    StDev    SE Mean
Sample 1   15   0.633   0.2162   0.056
Sample 2   15   0.931   0.2021   0.052

Observed difference (Sample 1 - Sample 2): -0.298
Standard Deviation of Difference: 0.0764

Unequal Variances
DF: 27
95% Confidence Interval for the Difference: (-0.4548, -0.1412)
T-Value: -3.9005
Population 1 ≠ Population 2: P-Value = 0.0006
Population 1 > Population 2: P-Value = 0.9997
Population 1 < Population 2: P-Value = 0.0003

Equal Variances
Pooled Standard Deviation: 0.2093
Pooled DF: 28
95% Confidence Interval for the Difference: (-0.4545, -0.1415)
T-Value: -3.8992
Population 1 ≠ Population 2: P-Value = 0.0006
Population 1 > Population 2: P-Value = 0.9997
Population 1 < Population 2: P-Value = 0.0003
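The calculator output above can be approximately reproduced from the summary statistics alone. The sketch below assumes the standard Welch (unequal variances) and pooled (equal variances) formulas. Because the means and SDs in the table are rounded, the T-values agree with the calculator's only to about three significant figures. P-values are left out because they need the t-distribution CDF, which Python's standard library does not provide.

```python
import math

def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch and pooled two-sample t statistics from summary statistics."""
    diff = mean1 - mean2
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2   # squared standard errors of the means

    # Unequal variances (Welch): SE of the difference and Welch-Satterthwaite DF.
    se_welch = math.sqrt(v1 + v2)
    df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

    # Equal variances: pooled SD and n1 + n2 - 2 degrees of freedom.
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))
    se_pooled = pooled_sd * math.sqrt(1 / n1 + 1 / n2)

    return {
        "diff": diff,
        "se_welch": se_welch,
        "t_welch": diff / se_welch,
        "df_welch": df_welch,
        "pooled_sd": pooled_sd,
        "t_pooled": diff / se_pooled,
        "df_pooled": n1 + n2 - 2,
    }

result = two_sample_t(0.633, 0.2162, 15, 0.931, 0.2021, 15)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

With equal sample sizes the Welch and pooled T-values coincide. Only the degrees of freedom differ.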

Page 26: Statistics in Biology and Medicine

Some Statistics Worth Knowing

• Tools for component analysis:
  – Principal Component Analysis (PCA): a way to identify patterns in data and to express the data so as to highlight their similarities and differences
  – Independent Component Analysis (ICA): a way to separate independent components in data
  – Variable and model selection: Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC)
• Bayesian statistics
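For the model-selection criteria named above, a minimal sketch using the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L. Here k is the number of fitted parameters, n the number of observations, and ln L the maximized log-likelihood. The two candidate models and their log-likelihoods are hypothetical.

```python
import math

def aic(log_likelihood, k):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian Information Criterion: k ln n - 2 ln L (lower is better)."""
    return k * math.log(n) - 2 * log_likelihood

# Hypothetical comparison: a 2-parameter model vs. a 5-parameter model on n = 100 points.
n = 100
for name, log_l, k in [("simple", -120.0, 2), ("complex", -118.5, 5)]:
    print(name, aic(log_l, k), bic(log_l, k, n))
```

Both criteria penalize extra parameters. BIC penalizes them more heavily once n exceeds about 7, since ln n > 2.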

Page 27: Statistics in Biology and Medicine

Summary

• What is the “right” null hypothesis?
• What is the appropriate distribution function?
• What is the appropriate test statistic?

“Know” your data before you analyze it!!

Page 28: Statistics in Biology and Medicine

Information theory based statistics: Bayesian statistics

• Goal: Using Bayesian methods to design studies and analyze data
• Bayesian inference
  – Appropriate distribution functions
  – Appropriate sampling techniques
• Maximum entropy method based inference
  – Appropriate form of entropy
  – Appropriate constraints

Page 29: Statistics in Biology and Medicine

Information theory based statistics: Method of maximum entropy

Page 30: Statistics in Biology and Medicine

Reference

[1] P. Rowe, Essential Statistics for Pharmaceutical Sciences, Wiley 2007.

