Comparing Margins of Multivariate Binary Data
Bernhard Klingenberg
Assoc. Prof. of Statistics
Williams College, MA
www.williams.edu/~bklingen
Outline
Challenges:
• Associations of various degrees among binary variables
• Simultaneous inference
• Sparse and/or unbalanced data; test statistics with discrete support
• Asymptotic theory questionable
Setup:
• Two independent groups
• Response: vector of k correlated binary variables (multivariate binary)
Goal:
• Inference about k margins: marginal risk differences, marginal risk ratios
Motivating Examples
From drug safety or animal toxicity/carcinogenicity studies
Source: http://us.gsk.com/products/assets/us_advair.pdf
Source: http://www.pfizer.com/files/products/uspi_lipitor.pdf
Example: AEs from a vaccine trial (flu shot):
> head(Y1) # ACTIVE Treatment n1=1971
ID HEADACHE PAIN MYALGIA ARTHRALGIA MALAISE FATIGUE CHILLS
2         1    1       1          1       1       1      1
4         0    1       1          0       0       1      0
5         1    0       0          0       0       0      0
6         1    1       1          1       1       1      1
7         0    0       0          0       0       1      0
9         1    0       1          1       1       1      1
> head(Y2) # PLACEBO Treatment n2=1554
ID HEADACHE PAIN MYALGIA ARTHRALGIA MALAISE FATIGUE CHILLS
1         0    0       0          0       0       0      0
3         0    0       0          0       0       0      0
8         0    0       0          0       1       0      0
10        0    0       0          0       0       0      0
11        0    0       0          0       0       0      0
15        0    0       1          0       0       1      0
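Notation and Setup
k-dimensional response vectors:
Group 1: $\mathbf{Y}_1 = (Y_{11}, \ldots, Y_{1k})$    Group 2: $\mathbf{Y}_2 = (Y_{21}, \ldots, Y_{2k})$
Random sample in each group:
Group 1: $\mathbf{Y}_{11}, \ldots, \mathbf{Y}_{1n_1}$    Group 2: $\mathbf{Y}_{21}, \ldots, \mathbf{Y}_{2n_2}$
Joint distrib. in each group depends on $2^k - 1$ parameters:
Group 1: $\Pr(Y_{11} = a_1, \ldots, Y_{1k} = a_k)$    Group 2: $\Pr(Y_{21} = a_1, \ldots, Y_{2k} = a_k)$,    $a_j \in \{0, 1\}$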
Comparing Margins
Usually only interested in k margins:
Group 1: $\Pr(Y_{1j} = 1)$    Group 2: $\Pr(Y_{2j} = 1)$,    for all $j = 1, \ldots, k$
With just two (k=2) adverse events:
[Two 2×2 tables, one per group, cross-classifying Headache (No/Yes) by Pain (No/Yes)]
Comparing Margins
Group1 Group2 Diff
HEADACHE 0.2603 0.2407 0.0196
INJECTION SITE PAIN 0.6088 0.1384 0.4705
MYALGIA 0.2588 0.1088 0.1500
ARTHRALGIA 0.0893 0.0579 0.0314
MALAISE 0.2085 0.1332 0.0753
FATIGUE 0.2476 0.2098 0.0378
CHILLS 0.0928 0.0463 0.0465
Differences in marginal incidence rates between Group 1 (Treatment) and Group 2 (Control)
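A minimal sketch of how such marginal rates and differences can be computed, using only the six rows per group printed by head(Y1) and head(Y2) earlier. This is toy data, so the numbers will not match the table above, which is based on all n1 = 1971 and n2 = 1554 subjects. The object names (ae, margins) are illustrative.

```r
# Toy data: only the rows printed by head(Y1) and head(Y2),
# NOT the full trial data.
ae <- c("HEADACHE", "PAIN", "MYALGIA", "ARTHRALGIA",
        "MALAISE", "FATIGUE", "CHILLS")
Y1 <- matrix(c(1, 1, 1, 1, 1, 1, 1,
               0, 1, 1, 0, 0, 1, 0,
               1, 0, 0, 0, 0, 0, 0,
               1, 1, 1, 1, 1, 1, 1,
               0, 0, 0, 0, 0, 1, 0,
               1, 0, 1, 1, 1, 1, 1),
             ncol = 7, byrow = TRUE, dimnames = list(NULL, ae))
Y2 <- matrix(c(0, 0, 0, 0, 0, 0, 0,
               0, 0, 0, 0, 0, 0, 0,
               0, 0, 0, 0, 1, 0, 0,
               0, 0, 0, 0, 0, 0, 0,
               0, 0, 0, 0, 0, 0, 0,
               0, 0, 1, 0, 0, 1, 0),
             ncol = 7, byrow = TRUE, dimnames = list(NULL, ae))

# Each margin is the column mean of a 0/1 indicator
p1 <- colMeans(Y1)
p2 <- colMeans(Y2)
margins <- cbind(Group1 = p1, Group2 = p2, Diff = p1 - p2)
print(round(margins, 4))
```

Only the k column means are needed here; the within-subject association between adverse events does not enter the point estimates, only their joint distribution.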
Family of Tests
j-th Null Hypothesis:
Unrestricted and restricted MLEs:
Comparing Margins
Estimates of marginal incidence rates and test statistics comparing Group 1 (Treatment) and Group 2 (Control)
            p-hat1 p-hat2 p-check p-tilde   Wald  Local Global
HEADACHE     0.260  0.241   0.252   0.260   1.34   1.33   1.32
PAIN         0.609  0.138   0.401   0.405  33.47  28.29  28.26
MYALGIA      0.259  0.109   0.193   0.210  11.87  11.21  10.85
ARTHRALGIA   0.089  0.058   0.076   0.082   3.59   3.50   3.37
MALAISE      0.209  0.133   0.175   0.196   5.99   5.84   5.60
FATIGUE      0.248  0.210   0.231   0.244   2.66   2.64   2.59
CHILLS       0.093  0.046   0.072   0.085   5.51   5.29   4.93
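The Wald and Local columns can be approximately reproduced from the reported four-decimal incidence rates. The sketch below assumes that Wald uses the unrestricted estimates in the variance and that Local uses the pooled estimate for each margin separately (which appears to be the p-check column); both readings are assumptions checked only against the printed numbers. The Global column depends on p-tilde, whose computation is not shown here, so it is not attempted.

```r
n1 <- 1971; n2 <- 1554
ae <- c("HEADACHE", "PAIN", "MYALGIA", "ARTHRALGIA",
        "MALAISE", "FATIGUE", "CHILLS")
# Marginal incidence rates as reported (rounded to 4 decimals)
p1 <- c(0.2603, 0.6088, 0.2588, 0.0893, 0.2085, 0.2476, 0.0928)
p2 <- c(0.2407, 0.1384, 0.1088, 0.0579, 0.1332, 0.2098, 0.0463)
d  <- p1 - p2

# Wald: variance evaluated at the unrestricted estimates
se_wald <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
wald <- d / se_wald

# "Local" (assumed): variance evaluated at the pooled estimate of margin j
p_pool <- (n1 * p1 + n2 * p2) / (n1 + n2)
se_pool <- sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
local <- d / se_pool

out <- cbind(p.pooled = p_pool, Wald = wald, Local = local)
rownames(out) <- ae
print(round(out, 3))
```

Because the inputs are rounded, small discrepancies in the last printed digit are to be expected.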
Asymptotic Test
Note: Asymptotically, multivariate
normal with covariance matrix determined by
Asymptotic Test
Correlation Matrix:
> round(cov2cor(Sigma),2)
     d1   d2   d3   d4   d5   d6   d7
d1 1.00 0.04 0.29 0.26 0.38 0.41 0.27
d2      1.00 0.18 0.09 0.08 0.10 0.01
d3           1.00 0.46 0.35 0.36 0.30
d4                1.00 0.33 0.33 0.32
d5                     1.00 0.51 0.44
d6                          1.00 0.37
d7                               1.00
> qmvnorm(0.95, tail="both.tails", corr=cov2cor(Sigma))$quantile
[1] 2.656222
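As a cross-check of the equicoordinate quantile, the sketch below uses plain Monte Carlo in base R instead of the numerical integration in qmvnorm: it simulates the maximum absolute coordinate of a multivariate normal vector with the correlation matrix printed above and takes its 95% quantile. The matrix is re-entered from the two-decimal printout, so it is only an approximation of Sigma, and the simulation size B is an arbitrary choice.

```r
set.seed(1)

# Upper triangle as printed (rounded to 2 decimals), then symmetrised
U <- matrix(c(
  1.00, 0.04, 0.29, 0.26, 0.38, 0.41, 0.27,
  0,    1.00, 0.18, 0.09, 0.08, 0.10, 0.01,
  0,    0,    1.00, 0.46, 0.35, 0.36, 0.30,
  0,    0,    0,    1.00, 0.33, 0.33, 0.32,
  0,    0,    0,    0,    1.00, 0.51, 0.44,
  0,    0,    0,    0,    0,    1.00, 0.37,
  0,    0,    0,    0,    0,    0,    1.00),
  nrow = 7, byrow = TRUE)
R <- U + t(U) - diag(7)

# Draw B correlated normal vectors: rows of E %*% chol(R) have correlation R
B <- 200000
Z <- matrix(rnorm(B * 7), ncol = 7) %*% chol(R)

# Two-sided critical value = 95% quantile of max_j |Z_j|
max_abs <- apply(abs(Z), 1, max)
crit <- quantile(max_abs, 0.95)
print(crit)
```

Up to simulation error and the rounding of the correlations, the result should land near the 2.656 reported by qmvnorm.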
Asymptotic Test
Correlation Matrix:
> round(cov2cor(Sigma),2)
     d1   d2   d3   d4   d5   d6   d7
d1 1.00 0.06 0.33 0.28 0.41 0.41 0.29
d2      1.00 0.28 0.11 0.15 0.12 0.09
d3           1.00 0.46 0.41 0.36 0.35
d4                1.00 0.32 0.34 0.28
d5                     1.00 0.50 0.47
d6                          1.00 0.37
d7                               1.00
> qmvnorm(0.95, tail="both.tails", corr=cov2cor(Sigma))$quantile
[1] 2.653783
Permutation Approach
When testing the null hypotheses, one can use a permutation approach.
This assumes the distributions are exchangeable (i.e. identical), a much stronger assumption than the null itself.
Need two extra conditions:
i. Sequences of all 0's are as or more likely to occur under group 2 (Control)
ii. Sequences of all 1's are as or more likely to occur under group 1 (Treatment)
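The permutation idea can be sketched as follows on simulated data (not the trial data). Whole response vectors are reassigned to groups, which keeps the within-subject association among the binary variables intact, and the maximum absolute statistic is recomputed for each reassignment. The data-generating mechanism, the pooled-variance statistic, and the names sim_group and max_stat are illustrative assumptions, not the author's implementation.

```r
set.seed(42)
n1 <- 60; n2 <- 50; k <- 4

# Simulated correlated binary responses: a shared subject-level
# latent effect u induces association among the k variables.
sim_group <- function(n, k, shift) {
  u <- rnorm(n)
  (matrix(rnorm(n * k), n, k) + u + shift > 1) * 1
}
Y <- rbind(sim_group(n1, k, 0.5), sim_group(n2, k, 0))
g <- rep(c(1, 2), c(n1, n2))

# Maximum over margins of |pooled-variance z statistic|
max_stat <- function(Y, g) {
  m1 <- sum(g == 1); m2 <- sum(g == 2)
  p1 <- colMeans(Y[g == 1, , drop = FALSE])
  p2 <- colMeans(Y[g == 2, , drop = FALSE])
  pp <- colMeans(Y)
  se <- sqrt(pp * (1 - pp) * (1 / m1 + 1 / m2))
  z <- ifelse(se > 0, (p1 - p2) / se, 0)
  max(abs(z))
}

obs <- max_stat(Y, g)

# Permute group labels of entire subjects (rows), B times
B <- 2000
perm <- replicate(B, max_stat(Y, sample(g)))

c_perm <- quantile(perm, 0.95)                 # permutation critical value
p_global <- (1 + sum(perm >= obs)) / (B + 1)   # global permutation p-value
print(c(observed = obs, critical = unname(c_perm), p.value = p_global))
```

With a random subset of permutations, as here, the result is a Monte Carlo approximation to the full permutation distribution.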
Permutation vs. Asymptotic
Critical values (α = 0.05):
c_perm = 2.655
c_asympt = 2.654
c_Bonf = 2.690
[Figure: permutation vs. asymptotic distribution; legend: Permut. Distr., Asympt. Distr.]
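The Bonferroni critical value quoted above can be checked directly, assuming it is the two-sided normal quantile with α split evenly over the k = 7 margins:

```r
alpha <- 0.05
k <- 7

# Two-sided normal critical values: unadjusted and Bonferroni-adjusted
c_unadj <- qnorm(1 - alpha / 2)
c_bonf  <- qnorm(1 - alpha / (2 * k))
print(round(c(unadjusted = c_unadj, Bonferroni = c_bonf), 3))
```

The permutation and asymptotic critical values are smaller than the Bonferroni one because they account for the positive correlation among the statistics.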
Family of Tests
Results: Raw and Adjusted P-values
                            asymptotic        exact
             Diff Global   raw.P  adj.P   raw.P  adj.P
HEADACHE    0.020   1.32  0.1876 0.7061  0.1830 0.7013
PAIN        0.471  28.25  0.0000 0.0000  0.0000 0.0000
MYALGIA     0.150  10.85  0.0000 0.0000  0.0000 0.0000
ARTHRALGIA  0.031   3.37  0.0007 0.0051  0.0005 0.0032
MALAISE     0.075   5.60  0.0000 0.0000  0.0000 0.0000
FATIGUE     0.038   2.59  0.0094 0.0589  0.0082 0.0516
CHILLS      0.047   4.93  0.0000 0.0000  0.0000 0.0000
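One way to obtain asymptotic adjusted p-values of this kind is a single-step max-type adjustment: the adjusted p-value for margin j is the probability that the maximum absolute coordinate of a multivariate normal vector exceeds the observed |statistic|. The sketch below assumes this is how adj.P was obtained, uses the second correlation matrix shown earlier (re-entered from its two-decimal printout), and approximates the probability by Monte Carlo in base R.

```r
set.seed(2)

# Second correlation matrix as printed (rounded), then symmetrised
U <- matrix(c(
  1.00, 0.06, 0.33, 0.28, 0.41, 0.41, 0.29,
  0,    1.00, 0.28, 0.11, 0.15, 0.12, 0.09,
  0,    0,    1.00, 0.46, 0.41, 0.36, 0.35,
  0,    0,    0,    1.00, 0.32, 0.34, 0.28,
  0,    0,    0,    0,    1.00, 0.50, 0.47,
  0,    0,    0,    0,    0,    1.00, 0.37,
  0,    0,    0,    0,    0,    0,    1.00),
  nrow = 7, byrow = TRUE)
R <- U + t(U) - diag(7)

# "Global" statistics as reported (rounded to 2 decimals)
stat <- c(HEADACHE = 1.32, PAIN = 28.25, MYALGIA = 10.85,
          ARTHRALGIA = 3.37, MALAISE = 5.60, FATIGUE = 2.59, CHILLS = 4.93)

# Simulate the null distribution of max_j |Z_j|
B <- 200000
Z <- matrix(rnorm(B * 7), ncol = 7) %*% chol(R)
max_abs <- apply(abs(Z), 1, max)

raw_p <- 2 * pnorm(-abs(stat))                          # unadjusted, two-sided
adj_p <- sapply(stat, function(s) mean(max_abs >= abs(s)))  # single-step max-type
print(round(cbind(raw.P = raw_p, adj.P = adj_p), 4))
```

Since the statistics and correlations are rounded and the probabilities are simulated, agreement with the table can only be approximate.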
Simultaneous Confidence Intervals
Invert family of tests:
Confidence Region:
Simplifies to simultaneous confidence intervals if
Simultaneous Confidence Intervals
Results: Inverting Score test
diff LB UB
HEADACHE 0.0196 -0.0196 0.0583
PAIN 0.4705 0.4323 0.5069
MYALGIA 0.1500 0.1162 0.1835
ARTHRALGIA 0.0314 0.0078 0.0547
MALAISE 0.0753 0.0416 0.1086
FATIGUE 0.0378 -0.0002 0.0752
CHILLS 0.0465 0.0239 0.0692
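A hedged sketch of inverting a score test for one margin: for each candidate difference delta, the variance is evaluated at the MLEs restricted to p1 - p2 = delta, and the interval collects all delta whose statistic stays below the multiplicity-adjusted critical value. The counts 513/1971 and 374/1554 for HEADACHE are back-calculated from the reported rates and are an assumption, as is the use of the asymptotic critical value 2.654; the function score_ci and its numerical maximisation are illustrative, not the author's code.

```r
score_ci <- function(x1, n1, x2, n2, crit) {
  est <- x1 / n1 - x2 / n2

  # Restricted MLE of p2 under p1 = p2 + delta (numerical maximisation)
  restricted_p2 <- function(delta) {
    lo <- max(0, -delta) + 1e-8
    hi <- min(1, 1 - delta) - 1e-8
    loglik <- function(q) {
      dbinom(x1, n1, q + delta, log = TRUE) + dbinom(x2, n2, q, log = TRUE)
    }
    optimize(loglik, c(lo, hi), maximum = TRUE, tol = 1e-10)$maximum
  }

  # Score-type statistic for H0: p1 - p2 = delta
  z <- function(delta) {
    q2 <- restricted_p2(delta)
    q1 <- q2 + delta
    (est - delta) / sqrt(q1 * (1 - q1) / n1 + q2 * (1 - q2) / n2)
  }

  # Limits solve z(delta) = +crit (lower) and z(delta) = -crit (upper)
  lower <- uniroot(function(d) z(d) - crit, c(-0.999, est), tol = 1e-8)$root
  upper <- uniroot(function(d) z(d) + crit, c(est, 0.999), tol = 1e-8)$root
  c(diff = est, LB = lower, UB = upper)
}

# HEADACHE, with counts assumed from the reported rates
ci <- score_ci(x1 = 513, n1 = 1971, x2 = 374, n2 = 1554, crit = 2.654)
print(round(ci, 4))
```

Under these assumptions the limits should come out close to the HEADACHE row above; exact agreement is not guaranteed because the counts and critical value are reconstructed.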
Simultaneous Confidence Intervals
• We used (and recommend) the score statistic
• Could use the Wald statistic instead
• This is equivalent to fitting a marginal model via GEE
• Estimates are asympt. multiv. normal, with (sandwich) covariance matrix (same as before)
• Use the distribution of the maximum statistic for multiplicity adjustment
Simultaneous Confidence Intervals
Results: GEE approach (= inverting Wald test)
diff LB UB
HEADACHE 0.0196 -0.0194 0.0586
PAIN 0.4705 0.4331 0.5078
MYALGIA 0.1500 0.1164 0.1836
ARTHRALGIA 0.0314 0.0082 0.0546
MALAISE 0.0753 0.0419 0.1087
FATIGUE 0.0378 0.0001 0.0755
CHILLS 0.0465 0.0241 0.0689
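The Wald-type limits above can be approximately reproduced without fitting a GEE, by taking each difference plus or minus a common critical value times its unrestricted standard error. The sketch assumes the critical value is the 2.656 reported by the first qmvnorm call and works from the rounded incidence rates, so the last digit may differ.

```r
n1 <- 1971; n2 <- 1554
ae <- c("HEADACHE", "PAIN", "MYALGIA", "ARTHRALGIA",
        "MALAISE", "FATIGUE", "CHILLS")
p1 <- c(0.2603, 0.6088, 0.2588, 0.0893, 0.2085, 0.2476, 0.0928)
p2 <- c(0.2407, 0.1384, 0.1088, 0.0579, 0.1332, 0.2098, 0.0463)

crit <- 2.656   # assumed: equicoordinate quantile from the first correlation matrix

d  <- p1 - p2
se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # unrestricted (Wald) SE

wald_ci <- cbind(diff = d, LB = d - crit * se, UB = d + crit * se)
rownames(wald_ci) <- ae
print(round(wald_ci, 4))
```

The same critical value is used for every margin, which is what makes the intervals simultaneous rather than marginal.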