Page 1

REGRESSION MODELS AND POLYTOMOUS VARIABLES

Joel Mefford
meffordj@humgen.ucsf.edu
03/02/2012
Page 2

POLYTOMOUS EXPOSURES AND OUTCOMES

Rothman, Greenland, and Lash: Ch. 17.

Page 3

Polytomous Exposures and Outcomes

Categorization of continuous variables into polytomous categorical variables carries the risks we have already discussed in the context of creating dichotomous variables from continuous variables, including misclassification bias:

- Choose biologically meaningful categories
- Residual confounding, especially if wide ranges of continuous measurements are lumped together into single categories
- Sparse or empty categories can make analysis difficult
- Modeling, describing, or adjusting for misclassification becomes a complex problem

Page 4

Polytomous Exposures and Outcomes

Tabular analyses using polytomous variables:
- Conduct a series of pair-wise analyses using methods for dichotomous variables
- Global tests for independence or trends
- Graphical analyses
- Move on to regression

Page 5

Polytomous Exposures and Outcomes

Page 6

Polytomous Exposures and Outcomes

Fruit and vegetable servings per day

            x<=2   2<x<=4   4<x<=6   x>6    total
cases         49      125      136   178      488
controls      28      111      140   209      488
total         77      236      276   387      976

Page 7

Polytomous Exposures and Outcomes

Fruit and vegetable servings per day

Observed counts:

            x<=2   2<x<=4   4<x<=6   x>6    total
cases         49      125      136   178      488
controls      28      111      140   209      488
total         77      236      276   387      976

Expected counts (margins fixed, cells to be filled in):

            x<=2   2<x<=4   4<x<=6   x>6    total
cases                                         488
controls                                      488
total         77      236      276   387      976

Page 8

Polytomous Exposures and Outcomes

Fruit and vegetable servings per day

Observed counts:

            x<=2   2<x<=4   4<x<=6   x>6    total
cases         49      125      136   178      488
controls      28      111      140   209      488
total         77      236      276   387      976

Expected counts under independence, (row total)(column total)/(grand total):

            x<=2   2<x<=4   4<x<=6   x>6     total
cases       38.5      118      138   193.5     488
controls    38.5      118      138   193.5     488
total         77      236      276   387       976

Page 9

Polytomous Exposures and Outcomes

Chi-square contributions, (observed - expected)^2 / expected:

            x<=2     2<x<=4   4<x<=6   x>6
cases       2.8636   0.4153   0.0290   1.2416
controls    2.8636   0.4153   0.0290   1.2416

df = (2-1)(4-1) = 3

P-value 0.003
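The expected counts and chi-square contributions on these slides can be reproduced with a short pure-Python sketch (the counts are those in the table above; only the final statistic and degrees of freedom are computed, not the p-value):

```python
# Chi-square test of independence for the fruit/vegetable case-control table.
# Observed counts from the slides; expected counts are row*column/grand totals.

observed = {
    "cases":    [49, 125, 136, 178],
    "controls": [28, 111, 140, 209],
}

col_totals = [sum(col) for col in zip(*observed.values())]   # 77, 236, 276, 387
row_totals = {k: sum(v) for k, v in observed.items()}        # 488 each
n = sum(col_totals)                                          # 976

# Expected count in each cell: (row total * column total) / grand total
expected = {k: [row_totals[k] * c / n for c in col_totals] for k in observed}

# Per-cell contributions (O - E)^2 / E, matching the slide's 2.86, 0.42, ...
contrib = {k: [(o - e) ** 2 / e for o, e in zip(observed[k], expected[k])]
           for k in observed}

chi2 = sum(sum(v) for v in contrib.values())
df = (2 - 1) * (4 - 1)  # (rows - 1)(cols - 1)
print(round(chi2, 2), df)
```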

Page 10

Polytomous Exposures and Outcomes

GWAS looking at relapse or relapse-free survival after chemotherapy with (busulfan + etoposide) and autologous bone-marrow transplantation (BMT) for AML:

- 314 AML patients who had chemotherapy and autologous BMT
- 78 patients relapsed within 12 months
- 199 patients did not relapse within 12 months
- 37 lost to follow-up (missing data)

Page 11

Polytomous Exposures and Outcomes

Analysis 1: Cox proportional hazards model

- Time = months of relapse-free survival after transplantation
- Event = relapse
- Parameter of interest: hazard ratio associated with the addition of a minor allele at a particular SNP
- Dataset = all subjects
- Adjustment covariates: 10 PCs to adjust for ancestry/relatedness; 2 clinical prognostic scores

Page 12

Polytomous Exposures and Outcomes

Analysis 2:

Trend test to look for an association between the number of minor alleles and the fraction of subjects who had a relapse of their leukemia within 12 months of transplantation

Page 13

Polytomous Exposures and Outcomes

The "top hits" (the SNPs with the lowest p-values, i.e. the most suggestively significant results) from the two analyses are highly overlapping sets, although the rank orderings of the "top hits" differed.

The results from the survival analyses with the adjustment covariates are the most interesting going forward, but the simple trend tests did capture some of the same information.

Page 14

Polytomous Exposures and Outcomes

status\genotype AA AB BB

case N11 N12 N13

control N21 N22 N23

Page 15

Polytomous Exposures and Outcomes

status\genotype AA AB BB

case N11 N12 N13

control N21 N22 N23

status\genotype AA AB BB total

case N11 N12 N13 R1

control N21 N22 N23 R2

total C1 C2 C3 N

T = sum_{columns i} [ wi * (N1i*R2 - N2i*R1) ]

under the null (no association): E[T] = 0

There is a formula for Var[T]

T / sqrt(Var[T]) -> N(0,1)
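This trend statistic can be sketched directly. The genotype counts below are hypothetical (not from the slides), the weights are taken as allele doses 0, 1, 2, and the null variance uses the standard Cochran-Armitage form:

```python
import math

# Cochran-Armitage-style trend test for a 2 x 3 case-control genotype table.
# Hypothetical counts; weights w are allele doses for genotypes AA, AB, BB.
cases    = [60, 90, 30]   # N11, N12, N13
controls = [80, 80, 20]   # N21, N22, N23
w = [0, 1, 2]

R1, R2 = sum(cases), sum(controls)               # row totals
cols = [a + b for a, b in zip(cases, controls)]  # column totals C_i
N = R1 + R2

# T = sum_i w_i * (N1i*R2 - N2i*R1); E[T] = 0 under no association
T = sum(wi * (n1 * R2 - n2 * R1) for wi, n1, n2 in zip(w, cases, controls))

# Null variance: Var[T] = (R1*R2/N) * ( N*sum(w_i^2*C_i) - (sum(w_i*C_i))^2 )
var_T = (R1 * R2 / N) * (N * sum(wi**2 * c for wi, c in zip(w, cols))
                         - sum(wi * c for wi, c in zip(w, cols))**2)

z = T / math.sqrt(var_T)  # approximately N(0,1) under the null
print(round(z, 3))
```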

Page 16

REGRESSION TOPICS

Rothman, Greenland, and Lash: Ch. 20.

Page 17

Regression

Why use regression models? How about stratified analyses of tabular data?

Stratified analyses:
- Control for confounding
- Assess effect modification
- Summarize the disease association of several predictor variables, e.g. the Mantel-Haenszel odds ratio OR_MH
- Model-free
- Assumption: homogeneity within each stratum

Page 18

Regression

Limitations of stratification:
- Adjustment only for categorical covariates
- Categorization of continuous variables: loss of information; residual confounding
- Sparse data
- Inefficiency

Page 19

Regression

What are regression functions? E[Y|X] or g(E[Y|X])

- Y is the outcome variable
- X is the predictor or a vector of predictors
- g() is a transformation or "link function"

Need to define Y, X, and the population over which the expectation or average is taken:
- target population
- source population
- sample

Page 20

Regression

Generally we assume that:
- the function E[Y|X] has a particular form
- the errors, the differences between actual observations and their expected values, have particular properties:
  - independence
  - mean = 0
  - a specified distribution

We may make other assumptions.

These assumptions form the "model".
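As a minimal sketch of these assumptions in action: simulate Y = a + bX + error with independent, mean-zero, Normal errors, then recover a and b by ordinary least squares (the true values a = 1, b = 2 are made up for illustration):

```python
import random

random.seed(0)

a_true, b_true = 1.0, 2.0
x = [i / 10 for i in range(100)]
# Errors: independent, mean 0, with a specified (here Normal) distribution
y = [a_true + b_true * xi + random.gauss(0, 0.5) for xi in x]

# Ordinary least squares fit of the model E[Y|X] = a + b*X
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
        sum((xi - xbar) ** 2 for xi in x)
a_hat = ybar - b_hat * xbar
print(round(a_hat, 2), round(b_hat, 2))
```

The estimates land close to the true (a, b) because the simulated errors satisfy the model's assumptions.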

Page 21

Regression

There are regression models designed for use with many types of outcome variables and explanatory variables:
- Continuous variables
- Indicator variables
- Unordered polytomous variables
- Ordinal variables
- …

Page 22

Regression

Page 23

Regression There seems to be a relationship between two variables

Regression: E[Y | X ] ?

Page 24

Regression E[Y | X= xi] for xi strata of X

Page 25

Regression

Page 26

Regression

Assume a linear relationship between X and Y (model)

Page 27

Regression

Page 28

Regression

Page 29

Regression

A regression model may summarize some aspect of the relationship between variables without completely describing the relationship.

Page 30

Regression Continuous explanatory variable with categorical outcome variable:

Page 31

Regression

We could use a linear model for a dichotomous outcome, the linear risk model:

E[1{outcome=1}] = Pr(outcome = 1) = a + bX

Page 32

Regression

We could use a linear model for a dichotomous exposure and continuous outcome:

Page 33

Intervention Effects and Regression

Intervention effects:

E[Y | set(X=x1), Z=z] - E[Y | set(X=x0), Z=z]
E[Y | set(X=x1), Z=z] / E[Y | set(X=x0), Z=z]

where the expectation is over the target population.

Page 34

Intervention Effects and Regression

Intervention effects:

E[Y | set(X=x1), Z=z] - E[Y | set(X=x0), Z=z]
E[Y | set(X=x1), Z=z] / E[Y | set(X=x0), Z=z]

where the expectation is over the target population.

In practice, what we can calculate with standard regression analysis is:

Ave(Y | X=x1, Z=z) - Ave(Y | X=x0, Z=z')
Ave(Y | X=x1, Z=z) / Ave(Y | X=x0, Z=z')

or equivalently:

E[Y | X=x1, Z=z] - E[Y | X=x0, Z=z']
E[Y | X=x1, Z=z] / E[Y | X=x0, Z=z']

where the expectation is over the sample.

Page 35

Intervention Effects and Regression

If we want to use the regression association measures as estimates of the potential intervention effects, we need to assume:

E[Y | X=x, Z=z] = E[Y | set(X=x), Z=z]

No Confounding Assumption: "no residual confounding of X and Y given Z"

Page 36

Intervention Effects and Regression

Regression standardization

E[Y | X=x, Z=z]: different values of Z correspond to different strata in which you may consider the Y~X association.

You can define an overall measure of the Y~X association by taking a weighted average over the different strata or levels of Z, resulting in a marginal or population-averaged effect:

EW[Y | X=x] = Σ{z in Z} ( w(z) * E[Y | X=x, Z=z] )

Different choices for the weights w(z):
- w(z) = proportion of Z=z in the source population ...
- or in a different target population
- or in a standard population
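A sketch of this weighted average with hypothetical stratum-specific risks and weights (none of these numbers are from the slides): the marginal effect is just w(z)-weighted averaging of E[Y | X=x, Z=z] over strata.

```python
# Hypothetical stratum-specific expected outcomes E[Y | X=x, Z=z]
# for two exposure levels (x = 0, 1) across three strata of Z.
risk = {
    "z0": {0: 0.10, 1: 0.20},
    "z1": {0: 0.15, 1: 0.30},
    "z2": {0: 0.25, 1: 0.45},
}

# w(z): proportion of each Z stratum in the chosen standard population
w = {"z0": 0.5, "z1": 0.3, "z2": 0.2}

def standardized_mean(x):
    # E_W[Y | X=x] = sum over z of w(z) * E[Y | X=x, Z=z]
    return sum(w[z] * risk[z][x] for z in risk)

# Population-averaged (marginal) risk difference and risk ratio
rd = standardized_mean(1) - standardized_mean(0)
rr = standardized_mean(1) / standardized_mean(0)
print(round(rd, 3), round(rr, 3))
```

Changing the weights w(z) to match a different target or standard population changes the marginal effect without touching the stratum-specific associations.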

Page 37

Model Specification and Model Fitting

Specification: What is the functional FORM of the relationship between Y and X?

E[Y | X0, X1] = a + b0*X0 + b1*X1

Fitting: Using data to estimate the various constants in the generic functional form of a model.

Page 38

Building Blocks

vacuous models: E[Y] = a

constant models: E[Y | X] = a

linear models: E[Y | X0, X1] = a + b0*X0 + b1*X1

Is E[Y | X0, X1] = a + b0*X0 + b1*(X0^2) a linear model? (Yes: it is linear in the coefficients a, b0, and b1, even though it is not linear in X0.)

exponential models: E[Y|X] = exp(a + bX) = exp(a)*exp(bX), so log(E[Y|X]) = a + bX

more generally: g(E[Y|X]) = a + bX

Page 39

Variable Transformations

Transformations:

covariates:
- reduce leverage of outlying covariate values
- change units of effect estimates

outcome variables:
- change the scale of the model (e.g. loglinear models)
- make the outcome distribution more "Normal" (t-tests)

Page 40

Variable Transformations

Millns et al. (1995). Is it necessary to transform nutrient variables prior to statistical analysis? Am J Epidemiol 141(3):251-262.

Page 41

Outcome transformations vs. generalized linear models

Outcome variables may be transformed, e.g. the accelerated failure time model (eq. 20-18, Rothman et al., page 396):

E[ln(Y)] = α + β1X1

Instead of transforming an outcome variable to account for features of its distribution and then using linear regression, we may use alternatives to linear regression that can accommodate special aspects of the distribution of Y. Namely, the variance of Y may be constrained by the expected value of Y.

Linear regression: Y continuous
Var[Y|X] independent of E[Y|X]
E[Y] = α + β1X1 + β2X2 + … + βkXk

Logistic regression: Y dichotomous
Var[Y|X] = E[Y|X] (1 - E[Y|X])
E[log(odds)] = α + β1X1 + β2X2 + … + βkXk

Poisson regression: Y a count
Var[count|X] = E[count|X]
E[log(rate)] = α + β1X1 + β2X2 + … + βkXk

Page 42

Generalized Linear Models

A broad class of models (including linear, logistic, and Poisson regression):

The distribution of the outcome Y has a special form: the "exponential dispersion family".

There is a linear model for a transformed version of the expected value of Y (a "mean function"):

g(E[Y|X]) = Xβ

where g() is a "link function".

The variance of Y can be expressed as a function of the expected value of Y:

Var(Y|X) = V(g^-1(Xβ))

There are general methods to solve many forms of these models and extensions of these models.

Page 43

Generalized Linear Models in Stata

For example: logistic regression is family = binomial, link = logit. Choosing the family here specifies the probability model for Y|X, and thus the mean and variance functions.

Page 44

Logistic Regression

If we use a logistic model, we do not have the problem of suggesting risks greater than 1 or less than 0 for some values of X:

E[1{outcome = 1}] = exp(a + bX) / [1 + exp(a + bX)]

Page 45

Logistic Regression

The logistic model is a linear model, on a different scale than the linear risk model:

log( Pr(outcome=1) / [1 - Pr(outcome=1)] ) = a + bX
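The bounding property can be checked numerically. This is a sketch with made-up coefficients (a = -2, b = 1), comparing the linear risk model against the logistic (inverse-logit) model:

```python
import math

a, b = -2.0, 1.0  # hypothetical intercept and slope

def linear_risk(x):
    # Linear risk model: can fall outside [0, 1] for some x
    return a + b * x

def logistic_risk(x):
    # Logistic model: exp(a+bX) / (1 + exp(a+bX)) always lies in (0, 1)
    return math.exp(a + b * x) / (1 + math.exp(a + b * x))

xs = [-5, 0, 2, 10]
print([round(linear_risk(x), 2) for x in xs])
print([round(logistic_risk(x), 3) for x in xs])
```

On the log-odds scale the logistic model is exactly linear: log(p/(1-p)) = a + bX.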

Page 46

Extensions to Logistic Regression

More than 2 outcome categories: unordered categories

- polytomous logistic model = multinomial logistic model
- one category is designated the reference category, y0
- for each alternative category, yi, there is a linear model for the log-odds of outcome yi vs. y0:

  log-odds(Y=yi | X=x) = ai + bi*x

  Odds(Y=yi | X=x) / Odds(Y=yi | X=x*) = exp( (x - x*)*bi )
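A sketch of the multinomial logistic model with hypothetical coefficients: category probabilities come from normalizing exp(ai + bi*x), with the reference category y0 fixed at a0 = b0 = 0, and the odds-ratio identity exp((x - x*)*bi) can be verified numerically.

```python
import math

# Hypothetical coefficients (ai, bi) for the non-reference categories;
# the reference category y0 has a0 = b0 = 0 by construction.
coef = {"y0": (0.0, 0.0), "y1": (-1.0, 0.5), "y2": (0.5, -0.25)}

def probs(x):
    # Normalized multinomial probabilities Pr(Y = y | X = x)
    scores = {y: math.exp(a + b * x) for y, (a, b) in coef.items()}
    total = sum(scores.values())
    return {y: s / total for y, s in scores.items()}

def odds_vs_ref(x, y):
    # Odds of category y versus the reference category y0, at X = x
    p = probs(x)
    return p[y] / p["y0"]

# Odds ratio for y1 comparing x=2 to x*=0 should equal exp((2-0)*b1) = exp(1)
or_y1 = odds_vs_ref(2, "y1") / odds_vs_ref(0, "y1")
print(round(or_y1, 4))
```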

Page 47

Extensions to Logistic Regression

Clarke et al. (2008). Mobility disability and the urban built environment. Am J Epidemiol 168(5).

Page 48

Extensions to Logistic Regression

Page 49

Extensions to Logistic Regression

More than 2 outcome categories: ordered categories

- y0 < y1 < y2
- various models are possible
- the cumulative odds (proportional odds) model is available in Stata:

Pr(Y > yi | X=x) / Pr(Y <= yi | X=x) = exp(ai + bx) = exp(ai) * exp(bx)

so a unit increase in x will increase

Pr(Y > y0 | X=x) / Pr(Y <= y0 | X=x)

and

Pr(Y > y1 | X=x) / Pr(Y <= y1 | X=x)

by the same factor, exp(b) — thus the name "proportional odds".
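The "proportional odds" property can be checked directly with hypothetical cutpoint intercepts a0, a1 and a common slope b:

```python
import math

a = [0.5, -1.0]  # hypothetical cutpoint intercepts a_i (for Y > y0, Y > y1)
b = 0.8          # hypothetical common slope

def cum_odds(i, x):
    # Pr(Y > y_i | X=x) / Pr(Y <= y_i | X=x) = exp(a_i + b*x)
    return math.exp(a[i] + b * x)

# A unit increase in x multiplies every cumulative odds by the same exp(b)
ratio0 = cum_odds(0, 1.0) / cum_odds(0, 0.0)
ratio1 = cum_odds(1, 1.0) / cum_odds(1, 0.0)
print(round(ratio0, 4), round(ratio1, 4))
```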

Page 50

Extensions to Logistic Regression Ordinal outcomes:

Page 51

Extensions to Logistic Regression

Page 52

Extensions to Logistic Regression

Page 53

Extensions to Logistic Regression

Page 54

Poisson Regression

We can have counts as an outcome.

A Poisson distribution can model the number of independent events occurring per unit of observation when the expected number of events per unit of observation is λ:

Prob(count Y = k) = λ^k * e^(-λ) / k!
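The pmf, and the Poisson mean-equals-variance property mentioned later, can be checked with a short pure-Python sketch (λ = 3 is an arbitrary choice):

```python
import math

def poisson_pmf(k, lam):
    # Prob(count Y = k) = lam^k * e^(-lam) / k!
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 3.0
pmf = [poisson_pmf(k, lam) for k in range(100)]  # tail beyond 100 is negligible
mean = sum(k * p for k, p in enumerate(pmf))
var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
print(round(mean, 6), round(var, 6))  # for a Poisson, both equal lam
```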

Page 55

Poisson Regression

Think of a linear model for the log of the number of counts per unit of observation:

log( E[Y | X=x] / units of observation ) = a + bx
log( E[Y | X=x] ) - log( units of observation ) = a + bx
=> log( E[Y | X=x] ) = log( units of observation ) + a + bx

Note that the (log) units of observation enters the model like a covariate, but with its coefficient fixed at 1. To distinguish factors like this from other covariates, such a term is called an offset.

E[Y | X=1] / E[Y | X=0] = exp(b)
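A sketch of the offset idea with hypothetical grouped data: given event counts and person-time for an unexposed (x=0) and an exposed (x=1) group, the log person-time enters with coefficient 1, and for a single binary covariate the rate ratio exp(b) has a closed form.

```python
import math

# Hypothetical grouped data: event counts and person-time by exposure group.
events = {0: 30, 1: 60}
person_time = {0: 1000.0, 1: 800.0}

# Poisson model: log(E[Y | X=x]) = log(person_time) + a + b*x,
# where log(person_time) is the offset (coefficient fixed at 1).
# With a single binary covariate, the maximum-likelihood estimates reduce to:
rate = {x: events[x] / person_time[x] for x in (0, 1)}
a_hat = math.log(rate[0])            # log baseline rate
b_hat = math.log(rate[1] / rate[0])  # log rate ratio

rate_ratio = math.exp(b_hat)         # events per unit person-time, x=1 vs x=0
print(round(rate_ratio, 3))
```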

Page 56

Poisson Regression

A Poisson distribution can model the number of independent events occurring per unit of observation when the expected number of events per unit of observation is λ:

Prob(count Y = k) = λ^k * exp(-λ) / k!

With a Poisson distribution, the expected number of counts is equal to the variance of the number of counts.

Typically the model is applied to aggregated units of observation (groups or strata) for which total counts, total units of observation, and group-level covariates are recorded.

Page 57

Poisson Regression

Johnson et al. (2005). Geographic Prediction of Human Onset of West Nile Virus Using Dead Crow Clusters: An Evaluation of Year 2002 Data in New York State. Am J Epidemiol 163(2):171-181.

Page 58

Poisson Regression

Page 59

Poisson Regression

Typically the model is applied to aggregated units of observation (groups or strata) for which total counts, total units of observation, and group-level covariates are recorded.

Collapsing covariates into group-level covariates can introduce bias and lose information.

Ungrouped Poisson regression methods have been developed that:
- use individual, time-varying covariate information
- estimate effects of covariates on rates

These are parametric proportional hazards models with an exponential event distribution.

Page 60

Poisson Regression

Loomis et al. (2005). Poisson regression analysis of ungrouped data. Occup Environ Med 62:325-329.

Page 61

Poisson Regression

Page 62

Further Extensions for Regression

Typically it is assumed that observations are independent.

Generalized Estimating Equations (GEEs):
- Do not necessarily assume that observations are independent; if not, a particular correlation structure for the observations is generally assumed:
  - Independent: a special case
  - Block-independent: observations are partitioned into blocks; observations within a block are correlated, say with a fixed correlation r
  - Autoregressive: observations within a block correspond to sequential observations; consecutive observations are more closely correlated than non-consecutive observations
- GEEs are useful in cases such as longitudinal studies, where each subject has a set of observations, or studies where several families are studied together and blocks represent families.

Non-independent observations may also be studied using hierarchical models, where:
- Observations are drawn from blocks with some distribution that depends on some covariates
- Blocks are drawn from some higher-level model that may depend on other covariates

