
22 Zhang

Transcript
  • 25th Annual CTS Transportation Conference

    May 22, 2014 / St. Paul, MN

    Yiwen Zhang Jon Roesler, MS / Anna Gaichas, MS / Mark Kinde, MPH

    Minnesota Department of Health

    What Do You Do With Missing Data?

    E Pluribus Unum (one out of many)

    A Comparison of Single Imputation Methods

    1

  • Outline: Background, Methods, Results, Discussion, Conclusions

    2

  • Crash Outcome Data Evaluation System National Transportation Safety Board June 2013

    3

  • CODES links crash and hospital records

    Crash (n=183,689), ~15,000 taken to hospital

    Hospital (n=131,959), ~35,000 MV traffic

    Probabilistic linkage (Strategicmatching.com)

    CODES (23,569)

    4

  • Dataset Creation

    CODES software (LinkSolv: StrategicMatching.com)

    2009 MN CODES linked dataset (Anna Gaichas)

    Ways to deal with the enigma of missing data*

    3 Primary Strategies:

    Complete Case Analysis

    Multiple Imputation

    **Single Imputation**

    Making up the numbers

    *It is a riddle, wrapped in a mystery, inside an enigma. Winston Churchill

    5

  • 6 single imputation methods examined:

    Markov Chain Monte Carlo

    Propensity Score

    Regular Regression

    Maximum Likelihood

    Predictive Mean Method

    Stochastic Regression

    6

  • Multiple Imputation (IVEware 0.2)

    Multivariate sequential regression: works by fitting a sequence of regression models. The regression model is chosen by variable type, for example:

    continuous variable: regular linear regression

    binary variable: logistic regression

    count variable: Poisson regression

    categorical variable (i.e., with >2 levels): polytomous regression

    7
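
The model-selection rule on the slide above can be sketched as a simple lookup. This is only an illustration of the rule, not IVEware's implementation; the function name `plan_sequence` and the variable types assigned in the example are assumptions (the variable names are taken from Table 1).

```python
# Regression family per variable type, as listed on the slide.
MODEL_FOR_TYPE = {
    "continuous": "linear regression",
    "binary": "logistic regression",
    "count": "Poisson regression",
    "categorical": "polytomous regression",  # more than 2 levels
}


def plan_sequence(variable_types):
    """Return (variable, model) pairs in the order the variables are given.

    Hypothetical helper: it only picks a model family for each variable,
    it does not fit anything.
    """
    return [(name, MODEL_FOR_TYPE[vtype]) for name, vtype in variable_types.items()]


# Variable names come from Table 1; the types assigned here are assumptions.
example = {"speed": "continuous", "eject": "binary", "weather": "categorical"}
plan = plan_sequence(example)
for name, model in plan:
    print(f"{name}: {model}")
```

In a sequential scheme each variable with missing values would then be imputed in turn from its chosen model, using the other variables as predictors.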

  • Table 1: Standard errors of the 6 single imputation methods plus multiple imputation (larger is better!)

    MCMC = Markov Chain Monte Carlo, PS = Propensity Score, RR = Regular Regression, ML = Maximum Likelihood, PMM = Predictive Mean Method, SR = Stochastic Regression, MI = Multiple Imputation

    Variable     MCMC    PS    RR    ML   PMM    SR    MI
    speed(log)   27.4  27.4  26.5  27.6  27.7  35.4  47.5
    weather       0.8   0.9   1.0   0.8   1.3   2.1   2.4
    light         1.0   1.0   0.6   0.9   1.9   3.5   3.8
    diagram       0.3   0.3   0.3   0.3   0.3   0.3   0.4
    event1        0.5   0.5   0.4   0.5   0.5   0.5   1.2
    event2        0.3   0.3   0.3   0.3   0.3   0.6   1.1
    eject         0.3   0.3   0.3   0.3   0.4   0.6   0.8
    injsev        6.8   6.8   6.8   6.8   6.8   7.7   9.8
    age          12.9  12.9  12.9  12.9  12.9  24.0  28.3

    Note: multiplying the values by 10^-3 gives the standard errors.

    8

  • [Figure: speed, by imputation method: MCMC, Propensity Score (PS), Regular Regression (RR), Maximum Likelihood (ML), Predictive Mean Method (PMM), Stochastic Regression (SR), Multiple Imputation (MI); axis from 0 to 0.05]

    9

  • Single Imputation Stochastic Regression yields almost as much variance as the "GOLD STANDARD"

    [Figure: light condition, weather condition, log speed, eject, and injury severity, by imputation method: MCMC, Propensity Score (PS), Regular Regression (RR), Maximum Likelihood (ML), Predictive Mean Method (PMM), Stochastic Regression (SR), Multiple Imputation (MI); axis from 0.0% to 100.0%]

    10
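
Why stochastic regression keeps more variance than regular regression can be seen in a small simulation. The sketch below is an illustration on synthetic data, not the authors' analysis: regular regression fills each missing value with the fitted line, while stochastic regression adds a random residual to that prediction. The helper names `fit_line` and `impute`, the simulated data, and the 40% missingness rate are all assumptions.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; also returns the residual SD."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sigma = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5
    return a, b, sigma


def impute(xs, ys, rng=None):
    """Fill None entries of ys from a regression on xs.

    rng=None   -> regular regression (prediction only)
    rng given  -> stochastic regression (prediction plus random residual)
    """
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    a, b, sigma = fit_line([x for x, _ in observed], [y for _, y in observed])
    filled = []
    for x, y in zip(xs, ys):
        if y is None:
            y = a + b * x
            if rng is not None:
                y += rng.gauss(0.0, sigma)
        filled.append(y)
    return filled


# Synthetic data (assumption): y depends linearly on x, about 40% of y is missing.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(500)]
ys_full = [2.0 + 0.5 * x + rng.gauss(0, 2.0) for x in xs]
ys = [None if rng.random() < 0.4 else y for y in ys_full]

regular = impute(xs, ys)
stochastic = impute(xs, ys, rng=random.Random(1))
var_regular = statistics.variance(regular)
var_stochastic = statistics.variance(stochastic)
print(f"variance, regular regression:    {var_regular:.2f}")
print(f"variance, stochastic regression: {var_stochastic:.2f}")
```

Because the regular-regression imputations all sit exactly on the fitted line, the completed variable is less spread out than the real data, which is one reason its standard errors come out too small.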

  • Percent difference between multiple imputation and stochastic regression

    Variable   Multiple Imputation vs. Stochastic
    speed      25%
    weather    14%
    light      8%
    diagram    10%
    event1     56%
    event2     46%
    eject      26%
    injsev     22%
    age        15%
    Average    25%

    11
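
The percentages above can be approximately reproduced from Table 1. The sketch below assumes the gap is computed as (MI - SR) / MI; the slides do not state the formula, so this is an inference. Because Table 1 is rounded to one decimal, some variables (diagram, for example) do not match the slide's figures, which were presumably computed from unrounded standard errors.

```python
# (stochastic regression, multiple imputation) standard errors from Table 1,
# in units of 10^-3.
TABLE_1 = {
    "speed": (35.4, 47.5),
    "weather": (2.1, 2.4),
    "light": (3.5, 3.8),
    "diagram": (0.3, 0.4),
    "event1": (0.5, 1.2),
    "event2": (0.6, 1.1),
    "eject": (0.6, 0.8),
    "injsev": (7.7, 9.8),
    "age": (24.0, 28.3),
}

# Percentages as reported on the slide.
REPORTED = {
    "speed": 25, "weather": 14, "light": 8, "diagram": 10, "event1": 56,
    "event2": 46, "eject": 26, "injsev": 22, "age": 15,
}


def percent_gap(sr, mi):
    """Gap between MI and SR as a percent of MI (assumed formula)."""
    return 100.0 * (mi - sr) / mi


gaps = {name: percent_gap(sr, mi) for name, (sr, mi) in TABLE_1.items()}
reported_average = sum(REPORTED.values()) / len(REPORTED)

for name, gap in gaps.items():
    print(f"{name:8s} computed {gap:5.1f}%   reported {REPORTED[name]}%")
print(f"average of reported values: {reported_average:.1f}%")
```

The average of the nine reported percentages rounds to the 25% shown on the slide.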

  • Stochastic regression single imputation is good for:

    hypothesis generation

    applications such as online query systems

    less sophisticated users

    introducing users to CODES

    12

  • There are limitations:

    Generalization (only for CODES data?)

    The comparison was with multiple imputation using 5 imputed datasets.

    However, the results are compelling

    13

  • Stochastic regression is the best for single imputation

    Single imputation can be good enough

    14

  • Next steps:

    More research on multiple imputation by changing to 10 imputed datasets

    Proc MI (SAS) vs IVEware

    Paper to be published

    Online query: MIDAS - MN Injury Data Access System

    Single imputed public use datasets (2006-2013)

    15

  • Public use datasets for 2006-2013 will be available October 2014. Please contact Jon Roesler anytime by:

    [email protected]

    16

  • Yiwen Zhang [email protected] 612-242-4290

    17

    Slide titles: Outline; Background; CODES Links Crash & Hospital; Background; Methods; 6 Single Imputation Methods Examined; Methods; The Gold Standard; Results; Results (larger is better!); Results; Results; but there is still a gap between Stochastic & Multiple Imputation; Every Time; Discussion; What is good enough?; Discussion; Conclusions; Next Steps; Public Use Datasets; Contact Information

