Cost as the Dependent Variable (II) Paul G. Barnett, PhD VA Health Economics Resource Center.


Cost as the Dependent Variable (II)

Paul G. Barnett, PhD

VA Health Economics Resource Center

Review of Ordinary Least Squares (OLS)

Classic linear model
Assume dependent variable can be expressed as a linear function of the chosen independent variables, e.g.:

$Y_i = \alpha + \beta X_i + \varepsilon_i$

Review of OLS assumptions

Expected value of error is zero: $E(\varepsilon_i) = 0$
Errors are independent: $E(\varepsilon_i \varepsilon_j) = 0$ for $i \neq j$
Errors have identical variance: $E(\varepsilon_i^2) = \sigma^2$
Errors are normally distributed
Errors are not correlated with independent variables: $E(X_i \varepsilon_i) = 0$

Cost is a difficult variable

Skewed by rare but extremely high cost events
Zero cost incurred by enrollees who don’t use care
No negative values

Review from last session

Applying Ordinary Least Squares (OLS) to data that aren’t normal can result in biased parameters
– OLS can predict negative costs
Log transformation can make cost more normally distributed
Predicted cost is affected by re-transformation bias
– Corrected using smearing estimator
– Assumes constant error (homoscedasticity)
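The retransformation step reviewed above can be sketched in a few lines. This is a minimal illustration, not the course's own code: the data are simulated, the single regressor `x` and the helper `ols_simple` are hypothetical, and the errors are generated to be homoscedastic, which is the case the smearing estimator assumes.

```python
import math
import random

def ols_simple(x, y):
    """Ordinary least squares with one regressor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Simulated (not real) data: ln(cost) = 7 + 0.5*x + e, with e ~ N(0, 1).
random.seed(0)
x = [random.uniform(0, 4) for _ in range(5000)]
cost = [math.exp(7 + 0.5 * xi + random.gauss(0, 1)) for xi in x]

# Step 1: OLS on log cost, saving the log-scale residuals.
log_cost = [math.log(c) for c in cost]
a, b = ols_simple(x, log_cost)
resid = [lc - (a + b * xi) for lc, xi in zip(log_cost, x)]

# Step 2: smearing factor = mean of the exponentiated residuals.
smear = sum(math.exp(r) for r in resid) / len(resid)

# Step 3: predicted cost without and with the smearing correction.
naive = [math.exp(a + b * xi) for xi in x]
corrected = [smear * p for p in naive]

mean_cost = sum(cost) / len(cost)
mean_naive = sum(naive) / len(naive)
mean_corrected = sum(corrected) / len(corrected)
print(f"smearing factor:         {smear:.3f}")
print(f"mean observed cost:      {mean_cost:.0f}")
print(f"mean naive prediction:   {mean_naive:.0f}")
print(f"mean smeared prediction: {mean_corrected:.0f}")
```

In this simulation the naive prediction falls well short of mean observed cost, while the smeared prediction is close to it. If the error variance depended on x, a single smearing factor would no longer be appropriate.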

Topics for today’s course

What to do when there is heteroscedasticity?
What to do when there are many zero values?
How to test differences in groups with no assumptions about distribution?
How to determine which method is best?

What to do when there is heteroscedasticity?

Properties of variance of the errors

Homoscedasticity
– Identical variance $E(\varepsilon_i^2) = \sigma^2$
Heteroscedasticity
– Variance depends on x (or on predicted y)

Homoscedasticity

– Errors have identical variance $E(\varepsilon_i^2) = \sigma^2$

[Figure: plot illustrating homoscedastic errors; axis ticks from 0 to 20,000]

Heteroscedasticity

– Errors depend on x (or on predicted y)

[Figure: plot illustrating heteroscedastic errors; axis ticks from 0 to 20,000]

Why worry about heteroscedasticity?

OLS with homoscedastic retransformation
– “If error term ε is heteroscedastic, estimates can be appreciably biased”
– Reminding Manning and Mullahy of Longfellow’s nursery rhyme: “When she was good, she was very, very good, but when she was bad, she was horrid”

JHE 20:461, 2001

Generalized Linear Models (GLM)

Analyst specifies a link function g( )
Analyst specifies a variance function
– Key reading: “Estimating log models: to transform or not to transform,” Manning and Mullahy, JHE 20:461, 2001

Link function g( ) in GLM

$g(E(y \mid x)) = \alpha + \beta x$
Link function can be natural log, square root, or other function
– E.g. $\ln(E(y \mid x)) = \alpha + \beta x$
– When link function is natural log, then β represents the approximate proportional change in y for a one-unit change in x (about 100·β percent when β is small)

GLM vs. OLS

OLS of log estimate: $E(\ln(y) \mid x)$
GLM estimate: $\ln(E(y \mid x))$
– Log of expectation of y is not the same as expectation of log y!
With GLM to find predicted Y
– No retransformation bias with GLM
– Smearing estimator not used
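A small numeric check makes the distinction between the two quantities concrete. The five cost values below are hypothetical and stand for patients who share the same x. The sketch only compares the log of the mean with the mean of the logs; it does not fit either model.

```python
import math

# Hypothetical costs for five patients with the same x (illustrative only).
costs = [100.0, 200.0, 400.0, 800.0, 10000.0]

mean_cost = sum(costs) / len(costs)                      # E(y | x)
mean_log = sum(math.log(c) for c in costs) / len(costs)  # E(ln(y) | x)

log_of_mean = math.log(mean_cost)   # the quantity a log-link GLM models
exp_mean_log = math.exp(mean_log)   # naive retransformation of the log OLS quantity

print(f"E(y)         = {mean_cost:.1f}")
print(f"ln(E(y))     = {log_of_mean:.3f}")
print(f"E(ln(y))     = {mean_log:.3f}")
print(f"exp(E(ln y)) = {exp_mean_log:.1f}")
```

Exponentiating the mean of the logs gives the geometric mean, which for skewed data sits well below the arithmetic mean. That gap is what the smearing estimator tries to repair and what the GLM avoids by modeling ln(E(y | x)) directly.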

Variance function

GLM does not assume constant variance
GLM assumes there is a function that explains the relationship between the variance and mean
– v(y | x)

Variance assumptions for GLM cost models

Gamma Distribution (most common)
– Variance is proportional to the square of the mean
Poisson Distribution
– Variance is proportional to the mean

Estimation methods

How to specify log link and gamma distribution with dependent variable Y and independent variables X1, X2, X3

Stata
    glm Y X1 X2 X3, family(gamma) link(log)

SAS
    PROC GENMOD;
    MODEL Y = X1 X2 X3 / DIST=GAMMA LINK=LOG;
    RUN;

Choice between GLM and OLS of log transform

GLM advantages:
– GLM can correct for heteroscedasticity
– GLM does not lead to retransformation error
OLS of log transform advantages:
– OLS is more efficient (standard errors are smaller than with GLM)

Use GLM or OLS with Log Transformation?

Estimate GLM model
If log scale residuals kurtosis > 3, use OLS Log model
If < 3, use GLM and find best variance structure

Which variance structure with GLM?

– Modified Park test
– Regression with squared residuals as dependent variable, predicted y as independent variable (both on the log scale)

$\ln\big((Y_i - \hat{Y}_i)^2\big) = \gamma_0 + \gamma_1 \ln(\hat{Y}_i) + \nu_i$

$\gamma_1 = 1$: use Poisson
$\gamma_1 = 2$: use Gamma
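The modified Park regression can be sketched as follows, assuming the log-log form from Manning and Mullahy (2001). Everything here is illustrative: the data are simulated with a gamma-type variance, the helper names are hypothetical, and the true conditional means stand in for the fitted values that a real analysis would take from the estimated GLM.

```python
import math
import random

def ols_simple(x, y):
    """Ordinary least squares with one regressor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def modified_park_slope(y, y_hat):
    """Regress ln((y - y_hat)^2) on ln(y_hat) and return the slope gamma_1."""
    log_sq_resid = [math.log((yi - pi) ** 2) for yi, pi in zip(y, y_hat)]
    log_pred = [math.log(pi) for pi in y_hat]
    _, gamma_1 = ols_simple(log_pred, log_sq_resid)
    return gamma_1

random.seed(1)
n = 5000
# Assumption: true conditional means used in place of GLM fitted values.
mu = [math.exp(random.uniform(5, 10)) for _ in range(n)]
# Gamma-distributed cost with mean mu and variance mu^2 / shape.
shape = 2.0
y = [random.gammavariate(shape, m / shape) for m in mu]

gamma_1 = modified_park_slope(y, mu)
print(f"estimated gamma_1: {gamma_1:.2f}")
```

Because the simulated variance is proportional to the square of the mean, the estimated slope should land near 2, the value that points to the gamma family. A slope near 1 would instead point to a Poisson-type variance.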

Use GLM or OLS with Log Transformation?

Start with log OLS
Check for heteroscedasticity
– Estimate OLS of log cost model and save residuals
– Plot residuals against Xs or predicted Y
– Formal tests are available
If heteroscedastic and kurtosis > 3
– Choose between GLM or heteroscedasticity-corrected OLS on log transformed cost
– Base choice on best fit

Other models for skewed data

Generalized gamma models
Box-Cox models
– See Manning, Basu, and Mullahy, 2005

Questions?

Who is taking this course?

Questions for students
– What is your principal role in the health care system?
– Are you currently involved in an economic study?
– What is your training?
– How many semesters of statistics course work have you taken?

What to do when there is heteroscedasticity? (GLM models)

What to do when there are many zero values?

Example of participants enrolled in a health plan who have no utilization

Annual per person VHA costs FY05 among those who used VHA in FY06

[Figure: cost distribution with two series, Medical Only and Medical+Rx]

The two-part model

Part 1: Dependent variable is an indicator of whether any cost is incurred
– 1 if cost is incurred (Y > 0)
– 0 if no cost is incurred (Y = 0)
Part 2: Regression of how much cost, among those who incurred any cost

The two-part model

Expected value of Y conditional on X

$E(Y \mid X) = P(Y > 0 \mid X) \, E(Y \mid Y > 0, X)$

Is the product of:
Part 1. The probability that Y is greater than zero, conditional on X
Part 2. Expected value of Y, conditional on Y being greater than zero, conditional on X

Predicted cost in two-part model

Predicted value of Y

$E(Y \mid X) = P(Y > 0 \mid X) \, E(Y \mid Y > 0, X)$

Is the product of:
Part 1. Probability of any cost being incurred
Part 2. Predicted cost conditional on incurring any cost
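The product rule can be checked by hand in the simplest possible case. The sketch below assumes a single two-level X, so that Part 1 reduces to a group proportion and Part 2 to a group mean among users. The cost values and the function name are hypothetical. In an applied analysis Part 1 would be a logit or probit and Part 2 a GLM or log OLS, as described in these slides.

```python
def two_part_prediction(costs):
    """Return (P(Y>0), E(Y | Y>0), and their product) for one group of enrollees."""
    users = [c for c in costs if c > 0]
    p_any = len(users) / len(costs)        # Part 1: probability of any cost
    mean_if_any = sum(users) / len(users)  # Part 2: mean cost among users
    return p_any, mean_if_any, p_any * mean_if_any

# Hypothetical annual costs for two groups (X = 0 or 1); zeros are non-users.
costs_by_group = {
    0: [0, 0, 0, 0, 0, 0, 500, 1500, 2000, 4000],
    1: [0, 0, 800, 1200, 2500, 3000, 5000, 7500, 9000, 11000],
}

predicted = {}
for group, costs in costs_by_group.items():
    p_any, mean_if_any, expected = two_part_prediction(costs)
    predicted[group] = expected
    print(f"X={group}: P(Y>0)={p_any:.2f}, "
          f"E(Y|Y>0)={mean_if_any:.0f}, E(Y|X)={expected:.0f}")
```

In this saturated case the product equals the plain group mean including zeros, which confirms the identity. The value of the two-part form is that each part can be given its own regression and its own interpretation.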

Question for class

Part one estimates probability Y > 0: $P(Y > 0 \mid X)$
– Y > 0 is dichotomous indicator
– 1 if cost is incurred (Y > 0)
– 0 if no cost is incurred (Y = 0)

What type of regression should be used when the dependent variable is dichotomous (takes a value of either zero or one)?

First part of model
Regression with dichotomous variable

Logistic regression or probit
Logistic regression uses maximum likelihood to estimate the log odds ratio
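For reference, the standard logit form can be written in the notation of the earlier slides. This is the textbook specification with a single regressor X and parameters α and β, not an equation recovered from the slide itself.

```latex
\ln\!\left( \frac{P(Y > 0 \mid X)}{1 - P(Y > 0 \mid X)} \right) = \alpha + \beta X
\qquad\Longleftrightarrow\qquad
P(Y > 0 \mid X) = \frac{1}{1 + e^{-(\alpha + \beta X)}}
```

The left-hand form shows that β is the change in the log odds of incurring any cost per unit change in X. The right-hand form gives the predicted probability that feeds Part 1 of the two-part model.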

Logistic regression syntax in SAS

    Proc Logistic;
    Model Y(Descending) = X1 X2 X3;
    Output out={dataset} prob={variable name};
    Run;

Output statement saves the predicted probability that the dependent variable equals one (cost was incurred)

Descending option in model statement is required, otherwise SAS estimates the probability that the dependent variable equals zero

Logistic regression syntax in Stata

    logit Y X1 X2 X3

    predict {variable name}, pr

Predict statement generates the predicted probability that the dependent variable equals one (cost was incurred)

Second part of model
Conditional quantity

Regression involves only observations with non-zero cost (conditional cost regression)
Use GLM or OLS with log cost

Two-part models

Separate parameters for participation and conditional quantity
– How independent variables predict
    participation in care
    quantity of cost conditional on participation
– Each parameter may have its own policy relevance
Disadvantage: hard to estimate a confidence interval around predicted Y given X

Alternatives to two-part model

OLS with untransformed cost
OLS with log cost, using small positive values in place of zero
Certain GLM models

What to do when there is heteroscedasticity? (GLM models)

What to do when there are many zero values? (Two-part models)

How to test differences in groups with no assumptions about distribution?

Non-parametric statistical tests

Make no assumptions about distribution, variance
Wilcoxon rank-sum test
Assigns rank to every observation
Compares ranks of members in two groups
Calculates the probability that the rank order occurred by chance alone

Extension to more than two groups

Group variable with more than two mutually exclusive values
Kruskal-Wallis test
– Is there any difference between any pairs of the mutually exclusive groups?
If KW is significant, then a series of Wilcoxon tests allows comparison of pairs of groups

Limits of non-parametric test

It is too conservative
– Compares ranks, not means
– Ignores influence of outliers
– E.g. all other ranks being equal, Wilcoxon will give same result regardless of whether
    top ranked observation is $1 million more costly than second observation, or
    top ranked observation is just $1 more costly
Doesn’t allow for additional explanatory variables
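The outlier point can be demonstrated with a small hand-rolled rank-sum test. This is a sketch under stated assumptions: the function name and cost values are hypothetical, the p-value uses the normal approximation without a tie correction, and with six observations per group an exact test would be preferable in real work.

```python
import math

def rank_sum_test(group_a, group_b):
    """Wilcoxon rank-sum test using the normal approximation (no tie correction).

    Returns (rank sum of group_a, z statistic, two-sided p-value).
    """
    pooled = sorted([(v, 0) for v in group_a] + [(v, 1) for v in group_b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average rank shared by tied values
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    n_a, n_b = len(group_a), len(group_b)
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    mean_w = n_a * (n_a + n_b + 1) / 2
    sd_w = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return w, z, p

# Hypothetical costs. Only the top-ranked treated observation differs.
control = [120, 340, 560, 780, 900, 1500]
treated_small_gap = [450, 990, 1800, 2600, 4100, 4101]       # top is $1 above second
treated_large_gap = [450, 990, 1800, 2600, 4100, 1_004_100]  # top is $1 million above second

result_small = rank_sum_test(control, treated_small_gap)
result_large = rank_sum_test(control, treated_large_gap)
print("small gap:", result_small)
print("large gap:", result_large)
print("treated means:", sum(treated_small_gap) / 6, sum(treated_large_gap) / 6)
```

Both calls return the same rank sum, z statistic, and p-value, even though the treated group's mean cost differs enormously between the two scenarios. That is the sense in which the test compares ranks rather than means.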

What to do when there is heteroscedasticity? (GLM models)

What to do when there are many zero values? (Two-part models)

How to test differences in groups with no assumptions about distribution? (Non-parametric statistical tests)

Which method is best?

Find predictive accuracy of models
Estimate regressions with half the data, test their predictive accuracy on the other half of the data
Find
– Mean Absolute Error (MAE)
– Root Mean Square Error (RMSE)

Mean Absolute Error

For each observation
– find difference between observed and predicted cost
– take absolute value
– find the mean
Model with smallest value is best

$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| Y_i - \hat{Y}_i \right|$

Root Mean Square Error

Square the differences between predicted and observed, find their mean, find its square root
Best model has smallest value

$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2}$
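Both accuracy measures are short enough to compute directly. In the sketch below the observed costs and the two sets of predictions are hypothetical and stand for a holdout half of the data. They are chosen so that the two measures disagree about which model is better.

```python
import math

def mean_absolute_error(observed, predicted):
    """Mean of |observed - predicted|."""
    return sum(abs(y - p) for y, p in zip(observed, predicted)) / len(observed)

def root_mean_square_error(observed, predicted):
    """Square root of the mean squared difference."""
    squared = [(y - p) ** 2 for y, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical holdout-sample costs and predictions from two candidate models.
observed = [500.0, 1000.0, 2000.0, 4000.0, 20000.0]
model_a = [1500.0, 2000.0, 3000.0, 5000.0, 19000.0]  # off by 1,000 everywhere
model_b = [600.0, 900.0, 2100.0, 3900.0, 16400.0]    # close, except the costliest case

mae_a = mean_absolute_error(observed, model_a)
mae_b = mean_absolute_error(observed, model_b)
rmse_a = root_mean_square_error(observed, model_a)
rmse_b = root_mean_square_error(observed, model_b)
print(f"model A: MAE={mae_a:.0f}, RMSE={rmse_a:.0f}")
print(f"model B: MAE={mae_b:.0f}, RMSE={rmse_b:.0f}")
```

Model B has the smaller MAE but the larger RMSE, because squaring penalizes its single large miss on the costliest patient. Reporting both measures shows whether a model's errors are concentrated in the high-cost tail.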

Evaluations of residuals

Mean residual (predicted less observed)
or
Mean predicted ratio (ratio of predicted to observed)
– calculate separately for each decile of observed Y
– A good model should have equal residuals (or equal mean ratio) for all deciles
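The decile check can be sketched as follows. The data and function name are hypothetical, and the "model" is deliberately poor (it predicts the overall mean for everyone) so that the pattern across deciles is easy to see.

```python
def mean_residual_by_decile(observed, predicted):
    """Mean residual (predicted less observed) within each decile of observed Y."""
    pairs = sorted(zip(observed, predicted))  # order observations by observed cost
    n = len(pairs)
    means = []
    for d in range(10):
        chunk = pairs[d * n // 10:(d + 1) * n // 10]
        means.append(sum(p - y for y, p in chunk) / len(chunk))
    return means

# Hypothetical observed costs: 100, 200, ..., 5000.
observed = [float(100 * (i + 1)) for i in range(50)]
# Deliberately poor model: predicts the overall mean for every observation.
overall_mean = sum(observed) / len(observed)
predicted = [overall_mean] * len(observed)

decile_means = mean_residual_by_decile(observed, predicted)
for decile, m in enumerate(decile_means, start=1):
    print(f"decile {decile}: mean residual = {m:.0f}")
```

The mean residual is large and positive in the lowest decile and large and negative in the highest, so this model over-predicts for low-cost cases and under-predicts for high-cost cases. A well-specified model would show residuals of similar size across all ten deciles.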

Formal tests of residuals

Variant of Hosmer-Lemeshow Test
– F test of whether residuals in raw scale in each decile are significantly different
Pregibon’s Link Test
– Tests if linearity assumption was violated
See Manning, Basu, & Mullahy, 2005

Questions?

Review of presentation

Cost is a difficult dependent variable
– Skewed to the right by high outliers
– May have many observations with zero values
– Cost is non-negative

When cost is skewed

OLS of raw cost is prone to bias
– Especially in small samples with influential outliers
– “A single case can have tremendous influence”

When cost is skewed (cont.)

Log transformed cost
– Log cost is more normally distributed than raw cost
– Log cost can be estimated with OLS

When cost is skewed (cont.)

To find predicted cost, must correct for retransformation bias
– Smearing estimator assumes errors are homoscedastic
– Biased if errors are heteroscedastic
“When she was good, she was very, very good, but when she was bad, she was horrid”

When cost is skewed and errors are heteroscedastic

GLM with log link and gamma variance
– Considers heteroscedastic errors
– Not subject to retransformation bias
– May not be very efficient
– Alternative specification
    Poisson instead of gamma variance function
    Square root instead of log link function

When cost has many zero values

Two-part model
– Logit or probit is the first part
– Conditional cost regression is the second part

Comparison without distributional assumptions

Non-parametric tests can be useful
May be too conservative
Don’t allow covariates

Evaluating models

Mean Absolute Error
Root Mean Square Error
Other evaluations and tests of residuals

Next lecture

Non-linear dependent variables

Ciaran Phibbs

April 30th

Key sources on GLM

MANNING, W. G. (1998) The logged dependent variable, heteroscedasticity, and the retransformation problem, J Health Econ, 17, 283-95.

* MANNING, W. G. & MULLAHY, J. (2001) Estimating log models: to transform or not to transform?, J Health Econ, 20, 461-94.

* MANNING, W. G., BASU, A. & MULLAHY, J. (2005) Generalized modeling approaches to risk adjustment of skewed outcomes data, J Health Econ, 24, 465-88.

Key sources on two-part models

* MULLAHY, J. (1998) Much ado about two: reconsidering retransformation and the two-part model in health econometrics, J Health Econ, 17, 247-81.

JONES, A. (2000) Health econometrics, in: Culyer, A. & Newhouse, J. (Eds.) Handbook of Health Economics, pp. 265-344 (Amsterdam, Elsevier).

References to worked examples

FLEISHMAN, J. A., COHEN, J. W., MANNING, W. G. & KOSINSKI, M. (2006) Using the SF-12 health status measure to improve predictions of medical expenditures, Med Care, 44, I54-63.

MONTEZ-RATH, M., CHRISTIANSEN, C. L., ETTNER, S. L., LOVELAND, S. & ROSEN, A. K. (2006) Performance of statistical models to predict mental health and substance abuse cost, BMC Med Res Methodol, 6, 53.

References to worked examples (cont.)

MORAN, J. L., SOLOMON, P. J., PEISACH, A. R. & MARTIN, J. (2007) New models for old questions: generalized linear models for cost prediction, J Eval Clin Pract, 13, 381-9.

DIEHR, P., YANEZ, D., ASH, A., HORNBROOK, M. & LIN, D. Y. (1999) Methods for analyzing health care utilization and costs, Ann Rev Public Health, 20, 125-44. (Also gives an accessible overview of methods, but lacks information from more recent developments.)

Link to HERC Cyberseminar HSR&D study of worked example

Performance of Statistical Models to Predict Mental Health and Substance Abuse Cost

Maria Montez-Rath, M.S. 11/8/2006

The audio: http://vaww.hsrd.research.va.gov/for_researchers/cyber_seminars/HERC110806.asx

The Power point slides: http://vaww.hsrd.research.va.gov/for_researchers/cyber_seminars/HERC110806.pdf

Book chapters

MANNING, W. G. (2006) Dealing with skewed data on costs and expenditures, in: Jones, A. (Ed.) The Elgar Companion to Health Economics, pp. 439-446 (Cheltenham, UK, Edward Elgar).
