Page 1: Advances in Visualizing Categorical Data Using the vcd ...carme2011.agrocampus-ouest.fr/slides/Friendly... · Advances in Visualizing Categorical Data Using the vcd, gnm and vcdExtra

intro vcd gnm 3D odds References

Advances in Visualizing Categorical Data Using the vcd, gnm and vcdExtra Packages in R

Michael Friendly¹, Heather Turner², David Firth², Achim Zeileis³

¹ Psychology Department, York University
² University of Warwick, UK
³ Department of Statistics, Universität Innsbruck

CARME 2011, Rennes, February 9–11, 2011

Slides: http://datavis.ca/papers/adv-vcd-4up.pdf


Co-conspirators

Heather Turner, University of Warwick
David Firth, University of Warwick
Achim Zeileis, Universität Innsbruck


Outline

1 Introduction

2 Generalized Mosaic Displays: vcd Package

3 Generalized Nonlinear Models: gnm & vcdExtra Packages

4 3D Mosaics: vcdExtra Package

5 Models and Visualization for Log Odds Ratios


Outline

1 Introduction
    Brief History of VCD
    Visual overview

2 Generalized Mosaic Displays: vcd Package
    Extending mosaic-like displays
    The strucplot framework

3 Generalized Nonlinear Models: gnm & vcdExtra Packages
    Loglinear models and generalized linear models
    Generalized nonlinear models: gnm package
    Models for ordered categories

4 3D Mosaics: vcdExtra Package

5 Models and Visualization for Log Odds Ratios
    Log odds ratios
    Examples


Brief History of VCD

Hartigan and Kleiner (1981, 1984): representing an n-way contingency table by a “mosaic display,” showing a (recursive) decomposition of frequencies by “tiles”, area ∼ cell frequency.

e.g., a 4-way table of TV program viewing: Freq ~ Day + Week + Time + Network
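The "area ∼ cell frequency" property follows directly from the recursive splitting. The sketch below is a minimal base-R illustration that uses the two-way Gender × Admit margin of the UCBAdmissions data (shipped with R) as an assumed stand-in for the TV-viewing table: the first split gives tile widths proportional to the marginal proportions of Gender, the second gives heights proportional to the conditional proportions of Admit within each Gender, so each tile's area equals its cell's share of the total. It computes the tile geometry only and draws nothing.

```r
# Two-way mosaic geometry for Gender x Admit, collapsed over Dept.
tab <- margin.table(UCBAdmissions, c(2, 1))   # rows: Gender, columns: Admit
n <- sum(tab)

# First split: tile widths proportional to the marginal proportions of Gender.
widths <- rowSums(tab) / n

# Second split: within each Gender column, heights proportional to P(Admit | Gender).
heights <- prop.table(tab, 1)

# Tile area = width x height = P(Gender) * P(Admit | Gender) = n_ij / n.
areas <- sweep(heights, 1, widths, "*")
print(round(areas, 3))
```

Deeper splits repeat the same step inside each tile, which is why the property holds for any number of variables.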


Brief History of VCD

Friendly (1994): developed the connection between mosaic displays and loglinear models

Showed how mosaic displays could be used to visualize both observed frequencies (area) and residuals (shading) from a model. First presented at CARME 1995 (thanks: Michael & Jörg!)


Brief History of VCD

Visualizing Categorical Data (Friendly, 2000)

But: mosaic-like displays have a long history (Friendly, 2002)!

[Figures: von Mayr (1877); Birch (1964)]

2002: vcd project at TU & WU, Vienna (Kurt Hornik, David Meyer, Achim Zeileis) ↦ vcd package



Data, pictures, models & stories

[Figure: “Two paths to enlightenment” (exploratory and model-based) linking data, visualization, model, summary, inference and story]


Visual overview: Models for frequency tables

Related models: logistic regression, polytomous regression, log odds models, ...

Goals: connect all with visualization methods


Visual overview: R packages


Extending mosaic-like displays

Initial ideas for mosaic displays were extended in a variety of ways:

pairs plots and trellis-like layouts for marginal, conditional and partial views (Friendly, 1999).

varying the shape attributes of bar plots and mosaic displays

double-decker plots (Hofmann, 2001), spine plots and spinograms (Hofmann & Theus, 2005)

residual-based shadings to emphasize the pattern of association in log-linear models or to visualize significance (Zeileis et al., 2007).

dynamic interactive versions (ViSta, MANET, Mondrian):

    linking of several graphs and models
    selection and highlighting across graphs and models
    interactive modification of the visualized models


Generalized mosaic displays: vcd package and the strucplot framework

Various displays for n-way frequency tables

flat (two-way) tables of frequencies
fourfold displays
mosaic displays
sieve diagrams
association plots
doubledecker plots
spine plots and spinograms

Commonalities

All have to deal with representing n-way tables in 2D
All graphical methods use area to represent frequency
Some are model-based: designed as a visual representation of an underlying statistical model
Graphical methods use visual attributes (color, shading, etc.) to highlight relevant statistical aspects


Familiar example: UCB Admissions

Data on admission to graduate programs at UC Berkeley, by Dept, Gender and Admission:

> structable(Dept ~ Gender + Admit, UCBAdmissions)
                Dept   A   B   C   D   E   F
Gender Admit
Male   Admitted      512 353 120 138  53  22
       Rejected      313 207 205 279 138 351
Female Admitted       89  17 202 131  94  24
       Rejected       19   8 391 244 299 317

or, as a two-way table (collapsed over Dept),

> structable(~Gender + Admit, UCBAdmissions)
       Admit Admitted Rejected
Gender
Male             1198     1493
Female            557     1278
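Roughly the same two layouts can be produced with base R alone. This sketch assumes that ftable() is an acceptable stand-in for structable() (it gives a similar flat layout, not an identical object); it also computes the overall admission rate by gender in the collapsed table.

```r
# Flat three-way layout: Gender x Admit in the rows, Dept in the columns
# (base R ftable(), used here as a stand-in for vcd's structable()).
flat <- ftable(UCBAdmissions, row.vars = c("Gender", "Admit"), col.vars = "Dept")
print(flat)

# Collapse over Dept to the two-way Gender x Admit table.
tab2 <- margin.table(UCBAdmissions, c(2, 1))
print(tab2)

# Overall admission rate by gender in the collapsed table.
admit_rate <- prop.table(tab2, 1)[, "Admitted"]
print(round(admit_rate, 3))
```

In the collapsed table men are admitted at a noticeably higher rate than women, which is the starting point for the displays that follow.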


Fourfold displays for 2 × 2 tables

General ideas:

Model-based graphs can show both data and model tests (or other statistical features)
Visual attributes tuned to support perception of relevant statistical comparisons

[Figure: fourfold display of Gender (Male, Female) × Admit (Admitted, Rejected); cell frequencies Male/Admitted 1198, Female/Admitted 557, Male/Rejected 1493, Female/Rejected 1278]

Quarter circles: radius ∼ $\sqrt{n_{ij}}$ ⇒ area ∼ frequency
Independence: adjoining quadrants ≈ align
Odds ratio: ratio of the products of the areas of diagonally opposite cells
Confidence rings: visual test of $H_0: \theta = 1$ ↔ adjoining rings overlap
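The quantities behind the fourfold display can be checked by hand. A minimal base-R sketch for the collapsed Gender × Admit table: it computes the sample odds ratio, a Wald 95% interval on the log scale (an assumption here; the confidence rings in the display may be constructed differently), and verifies that quarter circles with radius √n have areas proportional to the cell frequencies.

```r
# 2 x 2 table of Gender (rows) x Admit (columns), collapsed over Dept.
tab2 <- margin.table(UCBAdmissions, c(2, 1))

# Sample odds ratio: product of diagonal cells over product of off-diagonal cells.
odds_ratio <- (tab2[1, 1] * tab2[2, 2]) / (tab2[1, 2] * tab2[2, 1])

# Wald 95% interval computed on the log scale, then back-transformed
# (an assumed choice, not necessarily how the display draws its rings).
se_log <- sqrt(sum(1 / tab2))
ci <- exp(log(odds_ratio) + c(-1, 1) * qnorm(0.975) * se_log)

# Quarter circles with radius sqrt(n) have area pi * n / 4, so area is proportional to n.
radius <- sqrt(tab2)
area <- pi * radius^2 / 4

print(round(c(odds_ratio = odds_ratio, lower = ci[1], upper = ci[2]), 3))
```

The interval lies above 1, so in the aggregated table men have higher odds of admission; the stratified analysis by department revisits this.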


Fourfold displays for 2 × 2 × k tables

Stratified analysis: one fourfold display for each department
Each 2 × 2 table standardized to equate marginal frequencies
Shading: highlight departments for which $H_a: \theta_i \neq 1$

[Figure: fourfold displays of Gender × Admit, one per Dept; cell frequencies (Male/Admitted, Female/Admitted, Male/Rejected, Female/Rejected): A 512, 89, 313, 19; B 353, 17, 207, 8; C 120, 202, 205, 391; D 138, 131, 279, 244; E 53, 94, 138, 299; F 22, 24, 351, 317]
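The per-department comparison can be sketched in base R: compute the odds ratio of admission (men vs women) in each Dept and flag those whose 95% interval for log θ_i excludes 0. The Wald test used here is an assumption for illustration; the display's own shading rule and its marginal standardization are not reproduced.

```r
# UCBAdmissions is Admit x Gender x Dept; each slice t is a 2 x 2 Admit x Gender table.
# Odds ratio = odds of admission for men / odds of admission for women.
dept_or <- apply(UCBAdmissions, 3, function(t) (t[1, 1] * t[2, 2]) / (t[1, 2] * t[2, 1]))

# Wald standard error of each log odds ratio.
dept_se <- apply(UCBAdmissions, 3, function(t) sqrt(sum(1 / t)))

# Flag departments whose 95% Wald interval for log(theta_i) excludes 0,
# i.e. where H_a: theta_i != 1 is supported at the 5% level.
z <- log(dept_or) / dept_se
signif_dept <- names(which(abs(z) > qnorm(0.975)))

print(round(dept_or, 2))
print(signif_dept)
```

With these data only Dept A is flagged, and its odds ratio is below 1: within that department the direction is opposite to the aggregated table.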


Mosaic displays

Tiles: area ∼ observed frequencies, $n_{ijk}$
Friendly shading (highlights the association pattern):

Residuals: $r_{ijk} = (n_{ijk} - m_{ijk}) / \sqrt{m_{ijk}}$
Color: blue for $r > 0$, red for $r < 0$
Saturation: $|r| < 2$ (none), $|r| > 4$ (maximum), otherwise (intermediate)

(Other shadings highlight significance.)
(Other color schemes: HSV, HCL, ...)
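The shading rule can be applied by hand. The sketch below fits the mutual independence model with stats::loglin() (iterative proportional fitting) as a stand-in for loglm(), computes Pearson residuals, and assigns each cell a hue and a saturation class using the thresholds listed above. It only computes the classes; vcd's own shading functions choose the actual colors.

```r
# Mutual independence [Admit][Gender][Dept] fitted by iterative proportional
# fitting with stats::loglin (a stand-in for MASS::loglm).
fit <- loglin(UCBAdmissions, margin = list(1, 2, 3), fit = TRUE, print = FALSE)
m <- fit$fit

# Pearson residuals r_ijk = (n_ijk - m_ijk) / sqrt(m_ijk).
r <- (UCBAdmissions - m) / sqrt(m)

# Shading classes: the sign of r gives the hue, |r| gives the saturation level.
hue <- ifelse(r > 0, "blue", "red")
saturation <- ifelse(abs(r) < 2, "none", ifelse(abs(r) > 4, "max", "middle"))

print(round(range(r), 1))
print(table(hue = as.vector(hue), saturation = as.vector(saturation)))
```

The largest residual belongs to admitted men in Dept A, far more of them than mutual independence predicts.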

[Figure: three mosaic displays of Gender × Admit × Dept (A–F) for the models ~Dept+Gender+Admit, ~(Dept*Gender) + Admit and ~(Admit + Gender) * Dept]


Mosaic displays: Fitting & visualizing models

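Mutual independence model: Dept ⊥ Gender ⊥ Admit
> library(MASS)          # loglm()
> library(vcd)           # mosaic(), shading_Friendly
> UCB <- UCBAdmissions   # assumed: UCB is the UCBAdmissions table
> berk.mod0 <- loglm(~Dept + Gender + Admit, data = UCB)
> mosaic(berk.mod0, gp = shading_Friendly, ...)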

[Figure: mosaic display of Gender × Admit × Dept for the model ~Dept+Gender+Admit, Friendly shading; Pearson residuals from −14.0 to 20.2]


Mosaic displays: Fitting & visualizing models

Joint independence model: Admit ⊥ (Gender, Dept)
> berk.mod1 <- loglm(~Admit + (Gender * Dept), data = UCB)
> mosaic(berk.mod1, gp = shading_Friendly, ...)

[Figure: mosaic display of Gender × Admit × Dept for the model ~Admit + (Gender*Dept), Friendly shading; Pearson residuals from −10.2 to 10.7]


Mosaic displays: Fitting & visualizing models

Conditional independence model: Admit ⊥ Gender | Dept
> berk.mod2 <- loglm(~(Admit + Gender) * Dept, data = UCB)
> mosaic(berk.mod2, gp = shading_Friendly, ...)

[Figure: mosaic display of Gender × Admit × Dept for the model ~(Admit + Gender) * Dept, Friendly shading; Pearson residuals from −3.13 to 2.33]
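The three models can be compared numerically as well as visually. This base-R sketch uses stats::loglin(), which fits the same hierarchical loglinear models by IPF and returns the likelihood-ratio statistic G² with its degrees of freedom; margins are given as dimension indices of UCBAdmissions (1 = Admit, 2 = Gender, 3 = Dept).

```r
# Margins are dimension indices of UCBAdmissions: 1 = Admit, 2 = Gender, 3 = Dept.
models <- list(
  mutual      = list(1, 2, 3),           # [A][G][D]
  joint       = list(1, c(2, 3)),        # [A][GD]
  conditional = list(c(1, 3), c(2, 3))   # [AD][GD]
)

# Likelihood-ratio (G^2) statistic and residual df for each model.
fits <- lapply(models, function(mg) loglin(UCBAdmissions, mg, print = FALSE))
summary_tab <- data.frame(
  G2 = sapply(fits, `[[`, "lrt"),
  df = sapply(fits, `[[`, "df")
)
summary_tab$p <- pchisq(summary_tab$G2, summary_tab$df, lower.tail = FALSE)
print(round(summary_tab, 3))
```

G² drops sharply from mutual to joint to conditional independence, matching the shrinking residual ranges in the three mosaics.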


Double decker plots

Visualize dependence of one categorical (typically binary) variable on predictors
Formally: mosaic plots with vertical splits for all predictor dimensions, highlighting the response

[Figure: double-decker plot of Admit (Admitted, Rejected) by Dept (A–F) and Gender (Male, Female) within each Dept]
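The proportions a double-decker plot highlights are the conditional admission rates within each Gender × Dept column. A minimal base-R sketch computes those rates and lists the departments where the women's rate exceeds the men's.

```r
# Admission rate within each Gender x Dept cell: the "Admitted" share of
# each column in the double-decker plot.
admit_rate <- prop.table(UCBAdmissions, margin = c(2, 3))["Admitted", , ]
print(round(admit_rate, 2))

# Departments in which women have the higher admission rate.
women_higher <- colnames(admit_rate)[admit_rate["Female", ] > admit_rate["Male", ]]
print(women_higher)
```

For these data the women's rate is higher in departments A, B, D and F, even though it is lower in the table collapsed over Dept.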


The strucplot framework

A general, flexible system for visualizing n-way frequency tables:

integrates tabular displays, mosaic displays, association plots, sieve plots, etc. in a common framework.

n-way tables: variables partitioned into row and column variables in a “flat” 2D display using model formulae

arguments allow for fitting any loglinear model via loglm() in the MASS package.

high-level functions for all-pairwise views (pairs()), conditional views (cotabplot()).

low-level functions control all aspects of labeling, shading, spacing, etc.


The strucplot framework

Components of the strucplot framework:


Pairwise bivariate plots

Visualize all 2-way views of different independence models in n-way tables: type =

"pairwise": Burt matrix: bivariate, marginal views
"total": pairwise plots for mutual independence
"conditional": conditional independence, given all others
"joint": joint independence of all pairs from other variables

Panel functions for upper, lower, diagonal panels

upper, lower: mosaic, assoc, sieve, ...
diagonal: barplot, text, mosaic, ...

[Figure: pairs plot of Admit (Admitted, Rejected), Gender (Male, Female) and Dept (A–F); diagonal panels show bar plots of the marginal frequencies]
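The "pairwise" type corresponds to the bivariate marginal tables. As a rough numerical companion, this base-R sketch forms every two-way margin of UCBAdmissions and applies a Pearson chi-square test of independence to each (without continuity correction, an assumed choice); it does not reproduce the other type options.

```r
# All bivariate marginal tables of UCBAdmissions (the "pairwise" views),
# each with a Pearson chi-square test of marginal independence.
vars <- names(dimnames(UCBAdmissions))
pairs_idx <- combn(seq_along(vars), 2, simplify = FALSE)

results <- t(sapply(pairs_idx, function(p) {
  tab <- margin.table(UCBAdmissions, p)
  test <- chisq.test(tab, correct = FALSE)
  c(X2 = unname(test$statistic), df = unname(test$parameter))
}))
rownames(results) <- sapply(pairs_idx, function(p) paste(vars[p], collapse = ":"))
print(round(results, 2))
```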


Pairwise bivariate plots

> pairs(UCBAdmissions, shade=TRUE, space=0.2,
+       diag_panel = pairs_diagonal_mosaic(offset_varnames=-3, ...))

[Figure: pairs plot of Admit, Gender and Dept, with mosaic displays in the diagonal panels]


Loglinear models and generalized linear models

Loglinear models
Model fitting in the vcd package is based on loglinear models:

$\log(m_{ij}) = \mu + \lambda^A_i + \lambda^B_j$ ≡ [A][B] ≡ ~A + B

$\log(m_{ij}) = \mu + \lambda^A_i + \lambda^B_j + \lambda^{AB}_{ij}$ ≡ [AB] ≡ ~A * B

Fit using iterative proportional fitting (loglm()) ↦ no standard errors; limited syntax for expressing models

Generalized linear models
Link function:

$g(\mu) = g(E(y \mid x)) = \eta(x) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$

Variance function: $\mathrm{Var}(y \mid x) = f(\mu)$
Loglinear models are special cases with the log link and the Poisson distribution ↦ $\mathrm{Var}(y \mid x) = \mu$
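The equivalence can be checked directly. In this sketch the mutual independence model for UCBAdmissions is fit twice: by IPF with stats::loglin(), and as a Poisson GLM with log link via glm(). Assuming both converge, the G² from IPF equals the GLM deviance, while only the GLM also reports standard errors.

```r
# 1. Iterative proportional fitting: fitted values and G^2, but no standard errors.
ipf <- loglin(UCBAdmissions, margin = list(1, 2, 3), print = FALSE)

# 2. Poisson GLM with log link: the same model, plus coefficient standard errors.
ucb_df <- as.data.frame(UCBAdmissions)            # columns Admit, Gender, Dept, Freq
pois <- glm(Freq ~ Admit + Gender + Dept, family = poisson(link = "log"), data = ucb_df)

print(c(ipf_G2 = ipf$lrt, glm_deviance = deviance(pois)))
print(round(summary(pois)$coefficients[, 1:2], 3))  # estimates and standard errors

# The Poisson variance function: Var(y | x) = mu.
print(poisson()$variance(c(1, 5, 10)))
```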


Generalized nonlinear models: gnm package

A generalized non-linear model (GNM) is the same as a GLM, except that we allow

$g(\mu) = \eta(x; \beta)$

where $\eta(x; \beta)$ is nonlinear in the parameters $\beta$.

GNMs are very general, combining:

classical nonlinear models
standard link and variance functions for GLM families

In the context of models for categorical data, GNMs provide:

parsimonious models for structured association
models for multiplicative association (e.g., Goodman's RC(1) model)
multiple instances of multiplicative terms (RC(m) models)
user-defined functions for custom models


Generalized nonlinear models: gnm package

Some models for structured associations in square tables

quasi-independence (ignore diagonals)
> gnm(Freq ~ row + col + Diag(row, col), family = poisson)

symmetry ($\lambda^{RC}_{ij} = \lambda^{RC}_{ji}$)
> gnm(Freq ~ Symm(row, col), family = poisson)

quasi-symmetry = quasi + symmetry
> gnm(Freq ~ row + col + Symm(row, col), family = poisson)

fully-specified “topological” association patterns
> gnm(Freq ~ row + col + Topo(row, col, spec = RCmatrix), ...)

All of these are actually GLMs, but the gnm package provides convenience functions Diag, Symm and Topo to facilitate model specification.
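Because these models are GLMs, they can also be fit with glm() once suitable factors are built by hand. The sketch below uses the 8 × 8 occupationalStatus table from base R as an assumed example square table; the factors diag_f and symm_f are hypothetical hand-built analogues of Diag() and Symm(), not the gnm implementations.

```r
# Square table: father's (origin) by son's (destination) occupational status, 8 x 8.
mob <- as.data.frame(occupationalStatus)   # columns origin, destination, Freq
i <- as.integer(mob$origin)
j <- as.integer(mob$destination)

# Hypothetical hand-built analogues of the gnm helpers:
# diag_f: one level per diagonal cell, a single "off" level elsewhere (like Diag)
# symm_f: one level per unordered pair {i, j} (like Symm)
mob$diag_f <- relevel(factor(ifelse(i == j, paste0("d", i), "off")), ref = "off")
mob$symm_f <- factor(paste(pmin(i, j), pmax(i, j), sep = ":"))

quasi_indep <- glm(Freq ~ origin + destination + diag_f, family = poisson, data = mob)
symmetry    <- glm(Freq ~ symm_f, family = poisson, data = mob)
quasi_symm  <- glm(Freq ~ origin + destination + symm_f, family = poisson, data = mob)

fits <- list(quasi_indep = quasi_indep, symmetry = symmetry, quasi_symm = quasi_symm)
print(t(sapply(fits, function(f) c(deviance = deviance(f), df = df.residual(f)))))
```

The residual df (41, 28 and 21 for an 8 × 8 table) match the usual counts for quasi-independence, symmetry and quasi-symmetry.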


Nonlinear models

Nonlinear terms are specified in model formulae by functions of class "nonlin"

Basic nonlinear functions: Exp(), Inv(), Mult()

Nonlinear terms can be nested, e.g. for a UNIDIFF model:

$\log \mu_{ijk} = \alpha_{ik} + \beta_{jk} + \exp(\gamma_k)\,\delta_{ij}$

the exponentiated multiplier is specified as Mult(Exp(C), A:B)

Multiple instances, e.g., Goodman's RC(2) model:

$\log \mu_{rc} = \alpha_r + \beta_c + \gamma_{r1}\delta_{c1} + \gamma_{r2}\delta_{c2}$

specified using: instances(Mult(A, B), 2)

User-defined functions of class "nonlin" allow further extensions

All of these are fully general, providing residuals, fitted values, etc.
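A short derivation shows why the UNIDIFF model is described as a uniform difference across layers. Taking the local log odds ratio within layer k (adjacent rows i, i+1 and columns j, j+1), the $\alpha_{ik}$ and $\beta_{jk}$ terms cancel and only the multiplicative term survives:

```latex
\begin{aligned}
\log\theta_{ij(k)}
  &= \log\mu_{ijk} - \log\mu_{i+1,j,k} - \log\mu_{i,j+1,k} + \log\mu_{i+1,j+1,k} \\
  &= \exp(\gamma_k)\,\bigl(\delta_{ij} - \delta_{i+1,j} - \delta_{i,j+1} + \delta_{i+1,j+1}\bigr),
\qquad
\frac{\log\theta_{ij(k)}}{\log\theta_{ij(k')}} = \exp(\gamma_k - \gamma_{k'})
\end{aligned}
```

The ratio holds wherever the denominator is non-zero: the pattern of association $\delta_{ij}$ is shared by all layers, and layer k rescales every local log odds ratio by the same factor $\exp(\gamma_k)$.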


Generalized nonlinear models: vcdExtra package

Provides glue, extending the vcd package visualization methods for glm and gnm models

mosaic.glm() ↦ mosaic methods for class "glm" and class "gnm" objects

sieve.glm(), assoc.glm() ↦ sieve diagrams and association plots

Generalized residual types:

Pearson
deviance
standardized (adjusted): unit asymptotic variance

Model lists:

glmlist(): methods for collecting, summarizing and visualizing a list of related models
Kway(): generate & fit models of the form ~(A+B+...)^k
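Both ideas can be sketched without vcdExtra. The code below is a hypothetical base-R stand-in for a Kway()-style list: it fits ~(Admit + Gender + Dept)^k for k = 1, 2, 3 as Poisson GLMs, then shows the three residual types (Pearson, deviance, standardized Pearson) for the mutual independence model. It is not the vcdExtra implementation.

```r
ucb_df <- as.data.frame(UCBAdmissions)   # columns Admit, Gender, Dept, Freq

# Hypothetical stand-in for a Kway()-style list: ~(A + G + D)^k for k = 1, 2, 3,
# fit here with plain glm() rather than vcdExtra.
kway_fits <- lapply(1:3, function(k) {
  f <- as.formula(paste0("Freq ~ (Admit + Gender + Dept)^", k))
  glm(f, family = poisson, data = ucb_df)
})
names(kway_fits) <- paste0("order", 1:3)
print(t(sapply(kway_fits, function(m) c(deviance = deviance(m), df = df.residual(m)))))

# Three residual types for the mutual independence (order 1) model.
m1 <- kway_fits$order1
res <- cbind(
  pearson      = residuals(m1, type = "pearson"),
  deviance     = residuals(m1, type = "deviance"),
  standardized = rstandard(m1, type = "pearson")   # Pearson / sqrt(1 - h_ii)
)
print(head(round(res, 2)))
```

The order-3 model is saturated (deviance 0 on 0 df); standardized residuals are larger in magnitude than Pearson residuals because they divide by $\sqrt{1 - h_{ii}}$.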


Models for ordered categories

Consider an R × C table having ordered categories

In many cases, the RC association may be described more simply by assigning numeric scores to the row & column categories.
For simplicity, we consider only integer scores, 1, 2, ... here.
These models are easily extended to stratified tables.

R:C model            μ^RC_ij        df               Formula
Uniform association  i × j × γ      1                i:j
Row effects          α_i × j        I − 1            R:j
Col effects          i × β_j        J − 1            i:C
Row+Col effects      jα_i + iβ_j    I + J − 3        R:j + i:C
RC(1)                φ_i ψ_j × γ    I + J − 3        Mult(R, C)
Unstructured (R:C)   μ^RC_ij        (I − 1)(J − 1)   R:C
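Several rows of this table can be sketched with base glm() and integer scores i and j. The 8 × 8 occupationalStatus table from base R is an assumed stand-in for an ordered R × C table; the RC(1) row is omitted because it needs gnm's Mult().

```r
mob <- as.data.frame(occupationalStatus)   # 8 x 8 square table as a data frame
mob$i <- as.integer(mob$origin)            # integer row scores 1..8
mob$j <- as.integer(mob$destination)       # integer column scores 1..8

indep   <- glm(Freq ~ origin + destination, family = poisson, data = mob)
uniform <- glm(Freq ~ origin + destination + i:j, family = poisson, data = mob)
row_eff <- glm(Freq ~ origin + destination + origin:j, family = poisson, data = mob)
col_eff <- glm(Freq ~ origin + destination + i:destination, family = poisson, data = mob)

fits <- list(independence = indep, uniform = uniform,
             row_effects = row_eff, col_effects = col_eff)
print(t(sapply(fits, function(f) c(deviance = deviance(f), df = df.residual(f)))))

# Under uniform association every local (adjacent 2 x 2) log odds ratio equals gamma.
g_hat <- coef(uniform)[["i:j"]]
print(c(gamma = g_hat, local_odds_ratio = exp(g_hat)))
```

With unit-spaced scores, $\log\theta_{ij} = \gamma[ij - (i+1)j - i(j+1) + (i+1)(j+1)] = \gamma$, so a single parameter summarizes the whole association, which is why the uniform model costs only 1 df.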


Example: Social mobility in US, UK & Japan

Data from Yamaguchi (1987): cross-national comparison of occupational mobility in the U.S., U.K. and Japan. Re-analysis by Xie (1992).

> Yama.tab <- xtabs(Freq ~ Father + Son + Country, data = Yamaguchi87)
> structable(Country + Son ~ Father, Yama.tab[, , 1:2])

       Country  US                            UK
       Son      UpNM LoNM  UpM  LoM Farm      UpNM LoNM  UpM  LoM Farm
Father
UpNM            1275  364  274  272   17       474  129   87  124   11
LoNM            1055  597  394  443   31       300  218  171  220    8
UpM             1043  587 1045  951   47       438  254  669  703   16
LoM             1159  791 1323 2046   52       601  388  932 1789   37
Farm             666  496 1031 1632  646        76   56  125  295  191

See: demo("yamaguchi-xie", package="vcdExtra")


First thought: try MCA
> library(ca)
> library(vcdExtra)   # expand.dft() and the Yamaguchi87 data
> Yama.dft <- expand.dft(Yamaguchi87)
> yama.mjca <- mjca(Yama.dft)
> plot(yama.mjca, what = c("none", "all"))

[Figure: “Yamaguchi data: Mobility in US, UK and Japan, MCA”; Dim 1: Farm vs. Other (52.6%); Dim 2: Occ. Status (28.0%); points for Son and Father categories (UpNM, LoNM, UpM, LoM, Farm) and Country (Japan, UK, US)]

Dimensions seem to have reasonable interpretations

2nd glance: do they?

How do they relate to theories of social mobility?

How to understand Country effects?


Models for stratified mobility tables

Baseline models:

Perfect mobility: Freq ~(R+C)*L

Quasi-perfect mobility: Freq ~(R+C)*L + Diag(R, C)

Layer models:

Homogeneous: no layer effects
Heterogeneous: e.g., $\mu^{RCL}_{ijk} = \delta^{RC}_{ij} \exp(\gamma^L_k)$

Extended models: Baseline ⊕ Layer model(R:C model)

               Layer model
R:C model      Homogeneous        Log multiplicative
Row effects    ~. + R:j           ~. + Mult(R:j, Exp(L))
Col effects    ~. + i:C           ~. + Mult(i:C, Exp(L))
Row+Col eff    ~. + R:j + i:C     ~. + Mult(R:j + i:C, Exp(L))
RC(1)          ~. + Mult(R, C)    ~. + Mult(R, C, Exp(L))
Full R:C       ~. + R:C           ~. + Mult(R:C, Exp(L))


Yamaguchi data: Baseline models

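Minimal, null model asserts Father ⊥ Son | Country
> library(gnm)        # gnm()
> library(vcdExtra)   # mosaic() methods for glm/gnm objects; Yamaguchi87 data
> yamaNull <- gnm(Freq ~ (Father + Son) * Country, data = Yamaguchi87,
+                 family = poisson)
> mosaic(yamaNull, ~Country + Son + Father, condvars = "Country", ...)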

[Figure: mosaic display "[FC][SC] Null [FS] association (perfect mobility)": Country (Japan, UK, US) × Son's status × Father's status (Farm, LoM, UpM, LoNM, UpNM), shaded by Pearson residuals (range −17.0 to 34.5); p-value < 2e−16.]
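The null model [FC][SC] has closed-form fitted values, $\hat m_{ijk} = n_{i+k}\,n_{+jk}/n_{++k}$. The mosaic is shaded by the Pearson residuals $(n_{ijk} - \hat m_{ijk})/\sqrt{\hat m_{ijk}}$. The base-R sketch below checks both facts. It makes two substitutions: the built-in HairEyeColor table stands in for Yamaguchi87 (Hair ⊥ Eye | Sex plays the role of Father ⊥ Son | Country), and an ordinary Poisson glm() replaces gnm().

```r
# Stand-in data: HairEyeColor (Hair x Eye x Sex) from base R's datasets package
hec <- as.data.frame(HairEyeColor)   # columns Hair, Eye, Sex, Freq; Hair varies fastest

# Conditional independence [HS][ES]: the analogue of Freq ~ (Father + Son) * Country
fit <- glm(Freq ~ (Hair + Eye) * Sex, family = poisson, data = hec)

# Closed-form fitted values: m_ijk = n_i+k * n_+jk / n_++k
n_ik <- margin.table(HairEyeColor, c(1, 3))
n_jk <- margin.table(HairEyeColor, c(2, 3))
n_k  <- margin.table(HairEyeColor, 3)
m <- array(NA_real_, dim = dim(HairEyeColor), dimnames = dimnames(HairEyeColor))
for (k in seq_len(dim(m)[3])) {
  m[, , k] <- outer(n_ik[, k], n_jk[, k]) / n_k[k]
}

# Pearson residuals (n - m) / sqrt(m): the quantities that shade the mosaic tiles
pearson <- (HairEyeColor - m) / sqrt(m)
round(pearson[, , "Female"], 2)
```

A Poisson glm() can fit this model because it is loglinear. gnm() is needed only once multiplicative terms such as Mult() enter the model.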


Yamaguchi data: Baseline models

But mobility theory says to ignore the diagonal cells (quasi-perfect mobility):
> yamaDiag <- update(yamaNull, ~ . + Diag(Father, Son):Country)
> mosaic(yamaDiag, ~Country + Son + Father, condvars = "Country", ...)

[Figure: mosaic display "[FC][SC] Quasi perfect mobility, +Diag(F,S)": same Country × Son's status × Father's status layout, shaded by Pearson residuals (range −11.9 to 17.1).]
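The Diag(Father, Son) term gives each diagonal cell its own parameter. Those cells are therefore fitted exactly, and the remaining association is judged on the off-diagonal cells only. The base-R sketch below applies this quasi-independence idea to a stand-in table: the built-in 8 × 8 occupationalStatus father-son table. That table has a single layer, so there is no Country term. A hand-made diagonal factor (hypothetical helper diag_factor()) replaces gnm's Diag().

```r
# Built-in 8 x 8 British father-son mobility table (origin x destination)
occ <- as.data.frame(occupationalStatus)   # columns origin, destination, Freq

# Hypothetical helper standing in for gnm's Diag(): one level per diagonal cell,
# plus a common reference level "off" shared by all off-diagonal cells
diag_factor <- function(row, col) {
  rr <- as.integer(row)
  cc <- as.integer(col)
  relevel(factor(ifelse(rr == cc, paste0("d", rr), "off")), ref = "off")
}
occ$Diag <- diag_factor(occ$origin, occ$destination)

indep <- glm(Freq ~ origin + destination, family = poisson, data = occ)  # perfect mobility
quasi <- update(indep, . ~ . + Diag)                                     # quasi-perfect mobility

# Diagonal cells are now reproduced exactly; only off-diagonal cells carry lack of fit
on_diag <- occ$Diag != "off"
cbind(observed = occ$Freq[on_diag], fitted = round(fitted(quasi)[on_diag], 3))
c(G2_indep = deviance(indep), G2_quasi = deviance(quasi),
  df_indep = df.residual(indep), df_quasi = df.residual(quasi))
```

The diagonal term costs one residual degree of freedom per diagonal cell: 49 falls to 41 for an 8 × 8 table.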


Yamaguchi data: Fit models for homogeneous association

The gnm package makes it easy to fit collections of models with simple update() methods:

> Rscore <- as.numeric(Yamaguchi87$Father)
> Cscore <- as.numeric(Yamaguchi87$Son)
> yamaRo <- update(yamaDiag, ~ . + Father:Cscore)
> yamaCo <- update(yamaDiag, ~ . + Rscore:Son)
> yamaRpCo <- update(yamaDiag, ~ . + Father:Cscore + Rscore:Son)
> yamaRCo <- update(yamaDiag, ~ . + Mult(Father, Son))
> yamaFIo <- update(yamaDiag, ~ . + Father:Son)

[Figure: three mosaic displays in the same Country × Son's status × Father's status layout: "Model Ro: homogeneous row effects, +Father:j", "Model Co: homogeneous col effects, +i:Son" and "Model RCo: homogeneous RC(1)".]
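Once integer scores are supplied, the row-effects (Ro), column-effects (Co) and row+column (RpCo) models are ordinary loglinear models, so glm() can fit them. Only RC(1), with its Mult(Father, Son) term, needs gnm. The sketch below uses the built-in occupationalStatus table as a single-layer stand-in for Yamaguchi87. A hand-made diagonal factor stands in for Diag(). The model names mirror the slide's; everything else is illustrative.

```r
# Built-in 8 x 8 father-son table as a single-layer stand-in for Yamaguchi87
occ <- as.data.frame(occupationalStatus)
occ$Rscore <- as.integer(occ$origin)        # integer scores 1..8 for fathers (rows)
occ$Cscore <- as.integer(occ$destination)   # integer scores 1..8 for sons (columns)

# Hand-made diagonal factor standing in for gnm's Diag(Father, Son)
occ$Diag <- relevel(factor(ifelse(occ$Rscore == occ$Cscore,
                                  paste0("d", occ$Rscore), "off")), ref = "off")

qpm  <- glm(Freq ~ origin + destination + Diag, family = poisson, data = occ)
Ro   <- update(qpm, . ~ . + origin:Cscore)                       # row effects, R:j
Co   <- update(qpm, . ~ . + Rscore:destination)                  # column effects, i:C
RpCo <- update(qpm, . ~ . + origin:Cscore + Rscore:destination)  # row + column effects

# Deviance (G^2) and residual df; glm() reports NA for aliased score columns
sapply(list(qpm = qpm, Ro = Ro, Co = Co, RpCo = RpCo),
       function(fit) c(G2 = deviance(fit), df = df.residual(fit)))
```

The models are nested: RpCo contains both Ro and Co, so its deviance can be no larger than either. RC(1) and the log-multiplicative layer models are not loglinear, which is why the slides fit them with gnm.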


Yamaguchi data: Models for heterogeneous association

Log-multiplicative (UNIDIFF) models:

> yamaRx <- update(yamaDiag, ~ . + Mult(Father:Cscore, Exp(Country)))
> yamaCx <- update(yamaDiag, ~ . + Mult(Rscore:Son, Exp(Country)))
> yamaRpCx <- update(yamaDiag, ~ . + Mult(Father:Cscore +
+                                         Rscore:Son, Exp(Country)))
> yamaRCx <- update(yamaDiag, ~ . + Mult(Father, Son, Exp(Country)))
> yamaFIx <- update(yamaDiag, ~ . + Mult(Father:Son, Exp(Country)))

GNM model methods:

Summary methods: print(model), summary(model), ...
Extractor methods: coef(model), residuals(model), ...

Visualization:

Diagnostics: plot(model)

Mosaics, etc: mosaic(model)


Yamaguchi data: Comparing models

glmlist() and related methods facilitate model comparison

> models <- glmlist(yamaNull, yamaDiag,
+                   yamaRo, yamaRx, yamaCo, yamaCx, yamaRpCo,
+                   yamaRpCx, yamaRCo, yamaRCx, yamaFIo, yamaFIx)
> summarise(models)

Model Summary:
         LR Chisq Df Pr(>Chisq)  AIC  BIC
yamaNull     5592 48     0.0000 5496 5099
yamaDiag     1336 33     0.0000 1270  997
yamaRo        156 29     0.0000   98 -142
yamaRx        148 27     0.0000   94 -130
yamaCo         68 29     0.0001   10 -230
yamaCx         59 27     0.0004    5 -219
yamaRpCo       39 26     0.0509  -13 -228
yamaRpCx       33 24     0.1034  -15 -213
yamaRCo        38 26     0.0642  -14 -229
yamaRCx        32 24     0.1240  -16 -214
yamaFIo        36 22     0.0288   -8 -190
yamaFIx        31 20     0.0560   -9 -174
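The printed AIC column is consistent with a deviance-based definition, AIC = G² − 2·df; for example, 5592 − 2 × 48 = 5496 for yamaNull. The BIC column is consistent with BIC = G² − df·log N for a single total N. Both readings are inferred from the printed numbers, not taken from the summarise() documentation. A quick base-R check on the tabulated values:

```r
# Values transcribed from the model summary table above
tab <- data.frame(
  model = c("yamaNull", "yamaDiag", "yamaRo", "yamaRx", "yamaCo", "yamaCx",
            "yamaRpCo", "yamaRpCx", "yamaRCo", "yamaRCx", "yamaFIo", "yamaFIx"),
  G2  = c(5592, 1336, 156, 148, 68, 59, 39, 33, 38, 32, 36, 31),
  df  = c(48, 33, 29, 27, 29, 27, 26, 24, 26, 24, 22, 20),
  AIC = c(5496, 1270, 98, 94, 10, 5, -13, -15, -14, -16, -8, -9),
  BIC = c(5099, 997, -142, -130, -230, -219, -228, -213, -229, -214, -190, -174)
)

# AIC = G2 - 2 df reproduces the AIC column exactly
all(tab$G2 - 2 * tab$df == tab$AIC)

# Under BIC = G2 - df * log(N), (G2 - BIC) / df estimates log(N) for every model
logN <- (tab$G2 - tab$BIC) / tab$df
range(logN)        # nearly constant, up to rounding of the printed values
exp(mean(logN))    # rough implied total sample size
```

Under these definitions the two criteria differ only in the penalty per residual degree of freedom: 2 for AIC versus log N for BIC. With a table this large, BIC's per-df penalty is about five times AIC's.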


Yamaguchi data: Comparing models

glmlist() and related methods facilitate model comparison:
> BIC <- matrix(summarise(models)$BIC[-(1:2)], 5, 2, byrow = TRUE)

[Figure: "Yamaguchi−Xie models: R:C model by Layer model Summary": BIC (y-axis, about −220 to −140) for each Father−Son model (row eff, col eff, row+col, RC(1), R:C), one line per Country model (homogeneous, log multiplicative).]

Homogeneous models are all preferred by BIC

(Xie preferred heterogeneous models)

Little difference among the Col, Row+Col and RC(1) models

→ R:C association ∼ row scores (Father's status)


Yamaguchi data: Comparing models

glmlist() and related methods facilitate model comparison:
> AIC <- matrix(summarise(models)$AIC[-(1:2)], 5, 2, byrow = TRUE)

[Figure: "Yamaguchi−Xie models: R:C model by Layer model Summary": AIC (y-axis, −20 to 100) for each Father−Son model (row eff, col eff, row+col, RC(1), R:C), one line per Country model (homogeneous, log multiplicative).]

AIC prefers heterogeneous models

Row+Col and RC(1) models fit best

→ R:C association ∼ Father's status, not just scores

Model summary plots provide sensitive comparisons!
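The matrix(..., 5, 2, byrow = TRUE) calls drop yamaNull and yamaDiag. The remaining ten models were listed in homogeneous/log-multiplicative pairs, so the calls fold them into rows = Father-Son (R:C) model and columns = Country (layer) model. The base-R sketch below rebuilds both matrices from the model summary table and checks the two conclusions row by row. The slides do not show the plotting call, so the matplot() display is an assumption.

```r
fs_model <- c("row eff", "col eff", "row+col", "RC(1)", "R:C")
layer    <- c("homogeneous", "log multiplicative")

# Ro, Rx, Co, Cx, RpCo, RpCx, RCo, RCx, FIo, FIx, as in the model summary table
BIC <- matrix(c(-142, -130, -230, -219, -228, -213, -229, -214, -190, -174),
              5, 2, byrow = TRUE, dimnames = list(fs_model, layer))
AIC <- matrix(c(98, 94, 10, 5, -13, -15, -14, -16, -8, -9),
              5, 2, byrow = TRUE, dimnames = list(fs_model, layer))

# Within each Father-Son model, which Country model has the smaller criterion?
layer[apply(BIC, 1, which.min)]   # homogeneous in every row
layer[apply(AIC, 1, which.min)]   # log multiplicative in every row

# One line per Country model, Father-Son models along the x-axis
matplot(BIC, type = "b", pch = 16, lty = 1:2, col = 1, xaxt = "n",
        xlab = "Father-Son model", ylab = "BIC")
axis(1, at = 1:5, labels = fs_model)
legend("topright", legend = layer, lty = 1:2, pch = 16, title = "Country model")
```

Plotting both columns against a shared x-axis makes small differences visible, such as the 1 to 2 point gaps among col eff, row+col and RC(1). Those gaps are hard to read from the table alone.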


3D mosaic displays

Loglinear models rely on $\log(n_{ijk}) \sim$ linear model → $n_{ijk} \sim$ multiplicative model

Mosaic displays rely on (nested) use of Area = Height × Width to represent frequencies in n-way tables. How to take this to 3D?

[Figure, two panels: 2D mosaic displays of Hair color × Eye color (Brown, Hazel, Green, Blue) × Sex, titled "Mutual independence: ~Hair+Eye+Sex" (observed frequencies) and "Mutual independence: Expected frequencies", shaded by Pearson residuals (range −4.19 to 8.02); p-value < 2e−16.]


3D mosaic displays

mosaic3d() in the vcdExtra package

partitions the unit cube → nested set of 3D tiles, Volume ∼ frequency

uses the rgl package: interactive 3D graphs

> mosaic3d(HEC)
> mosaic3d(HEC, type = "expected")
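The nested partition is easy to sketch in base R. First split the unit cube by Hair, with widths ∝ n_i++. Then split each slab by Eye within Hair, and each column by Sex within Hair and Eye. The tile volumes multiply back to the cell proportions n_ijk/n. The sketch assumes HEC is HairEyeColor with eye colours reordered to Brown, Hazel, Green, Blue, the order shown in the mosaics. The variable names are illustrative, and this is not the mosaic3d() source.

```r
# Assumption: HEC = HairEyeColor with eye colours reordered (Brown, Hazel, Green, Blue)
HEC <- HairEyeColor[, c(1, 3, 4, 2), ]
n <- sum(HEC)

# Observed: side lengths of the nested tiles along the three axes
p_hair    <- margin.table(HEC, 1) / n                    # first split
p_eye_gh  <- prop.table(margin.table(HEC, c(1, 2)), 1)   # Eye | Hair
p_sex_ghe <- prop.table(HEC, c(1, 2))                    # Sex | Hair, Eye

vol <- array(NA_real_, dim = dim(HEC), dimnames = dimnames(HEC))
for (i in 1:4) for (j in 1:4) for (k in 1:2)
  vol[i, j, k] <- p_hair[i] * p_eye_gh[i, j] * p_sex_ghe[i, j, k]

# Expected under mutual independence: every split uses the one-way margins,
# so tile volumes are p_i * p_j * p_k (the analogue of type = "expected")
expected <- outer(outer(margin.table(HEC, 1), margin.table(HEC, 2)),
                  margin.table(HEC, 3)) / n^2
```

In the observed display the Eye and Sex splits change from slab to slab. In the expected display they are identical everywhere, so any departure of the observed tiles from a regular grid is visible evidence of association.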


Outline

1 Introduction
   Brief History of VCD
   Visual overview

2 Generalized Mosaic Displays: vcd Package
   Extending mosaic-like displays
   The strucplot framework

3 Generalized Nonlinear Models: gnm & vcdExtra Packages
   Loglinear models and generalized linear models
   Generalized nonlinear models: gnm package
   Models for ordered categories

4 3D Mosaics: vcdExtra Package

5 Models and Visualization for Log Odds Ratios
   Log odds ratios
   Examples


Log odds ratios

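In any two-way R × C table, all associations can be represented by a set of (R − 1) × (C − 1) odds ratios,

$$\theta_{ij} = \frac{n_{ij}/n_{i+1,j}}{n_{i,j+1}/n_{i+1,j+1}} = \frac{n_{ij} \times n_{i+1,j+1}}{n_{i+1,j} \times n_{i,j+1}}$$

$$\ln(\theta_{ij}) = \begin{pmatrix} 1 & -1 & -1 & 1 \end{pmatrix} \ln \begin{pmatrix} n_{ij} & n_{i+1,j} & n_{i,j+1} & n_{i+1,j+1} \end{pmatrix}^{T}$$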
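For an R × C matrix of counts, all (R − 1) × (C − 1) local log odds ratios can be computed in one step by shifting the log-count matrix. This applies the (1, −1, −1, 1) contrast above to every 2 × 2 block of adjacent cells. The base-R sketch below uses the Hair × Eye margin of the built-in HairEyeColor data. The helper local_lor() is hypothetical, not the loddsratio() implementation.

```r
# Hypothetical helper: local log odds ratios ln(theta_ij) for all adjacent 2 x 2 blocks
local_lor <- function(n) {
  L <- log(n)
  R <- nrow(n); C <- ncol(n)
  #   ln n_ij                 - ln n_{i+1,j}            - ln n_{i,j+1}            + ln n_{i+1,j+1}
  L[-R, -C, drop = FALSE] - L[-1, -C, drop = FALSE] - L[-R, -1, drop = FALSE] + L[-1, -1, drop = FALSE]
}

haireye <- margin.table(HairEyeColor, c(1, 2))   # 4 x 4 table, Hair x Eye
lor <- local_lor(haireye)
dim(lor)   # (R - 1) x (C - 1) = 3 x 3
round(lor, 2)

# For a single 2 x 2 table the result is one number, ln(n11 * n22 / (n21 * n12))
local_lor(matrix(c(23, 105, 9, 1654), 2))
```

Each entry is labelled by the top-left cell of its 2 × 2 block. Local odds ratios depend on the order of the categories, so reordering rows or columns changes which blocks are formed.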


Log odds ratios

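$\ln \hat\theta_{ij}$ is asymptotically normal, $N(\ln\theta_{ij}, \sigma^2)$, and so $N(0, \sigma^2)$ under independence, with estimated asymptotic standard error:

$$\hat\sigma(\ln \theta_{ij}) = \left(n_{ij}^{-1} + n_{i+1,j}^{-1} + n_{i,j+1}^{-1} + n_{i+1,j+1}^{-1}\right)^{1/2}$$

This extends naturally to $\theta_{ij|k}$ in higher-way tables, stratified by one or more "control" variables.

Many models have a simpler form expressed in terms of $\ln(\theta_{ij})$. For example, the uniform association model with unit-spaced scores $a_i$, $b_j$:

$$\ln(m_{ij}) = \mu + \lambda^{A}_{i} + \lambda^{B}_{j} + \gamma a_i b_j \quad\equiv\quad \ln(\theta_{ij}) = \gamma$$

Direct visualization of log odds ratios permits more sensitive comparisons than area-based displays.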
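Both formulas above are easy to check numerically. The base-R sketch below first computes the asymptotic standard error for one 2 × 2 block: the age 25−29 panel of the coal miners example later in this section. It then builds expected frequencies from the uniform association model with unit-spaced scores $a_i = i$, $b_j = j$. The parameter values in that second part are illustrative only and do not come from any data set.

```r
# Asymptotic standard error of ln(theta) for a single 2 x 2 block of counts
ase_lor <- function(n11, n21, n12, n22) sqrt(1 / n11 + 1 / n21 + 1 / n12 + 1 / n22)

# Coal miners, age 25-29 panel (23 and 1654 lie on one diagonal)
lor_2529 <- log(23 * 1654 / (105 * 9))
se_2529  <- ase_lor(23, 105, 9, 1654)
c(lor = lor_2529, se = se_2529, z = lor_2529 / se_2529)

# Uniform association: ln m_ij = mu + lambda^A_i + lambda^B_j + gamma * a_i * b_j
mu <- 2; gamma_val <- 0.4                  # illustrative values only
lamA <- c(0, 0.5, -0.3, 0.2)               # illustrative row effects
lamB <- c(0, -0.4, 0.1)                    # illustrative column effects
a <- seq_along(lamA); b <- seq_along(lamB) # unit-spaced integer scores
logm <- mu + outer(lamA, lamB, "+") + gamma_val * outer(a, b)

# Local log odds ratios of the model table: every one equals gamma
nR <- nrow(logm); nC <- ncol(logm)
lor_u <- logm[-nR, -nC] - logm[-1, -nC] - logm[-nR, -1] + logm[-1, -1]
lor_u
```

With unequally spaced scores the same computation gives $\gamma (a_{i+1} - a_i)(b_{j+1} - b_j)$ instead. The "$\ln(\theta_{ij}) = \gamma$" form relies on unit spacing.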


Models for log odds ratios: Computation

Consider an R × C × K₁ × K₂ × ... frequency table $n_{ij\cdots}$, with factors K₁, K₂, ... considered as strata. Let $n = \mathrm{vec}(n_{ij\cdots})$ be the N × 1 vectorization of the table. Then all log odds ratios and their asymptotic covariance matrix can be calculated as:

$$\ln(\theta) = C \ln(n), \qquad S = \mathrm{Var}[\ln(\theta)] = C \,\mathrm{diag}(n)^{-1} C^{T}$$

where C is an N-column matrix containing all zeros, except for two +1 elements and two −1 elements in each row. For example, for a 2 × 2 table, $C = \begin{bmatrix} 1 & -1 & -1 & 1 \end{bmatrix}$.

With strata, C can be calculated as $C = C_{RC} \otimes I_{K_1} \otimes I_{K_2} \otimes \cdots$

loddsratio() in the vcdExtra package provides generic methods (coef(), vcov(), confint(), ...)
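The matrix form can be sketched directly in base R under two assumptions. First, the table is vectorized with the stratum index varying fastest, which is what $C = C_{RC} \otimes I_{K}$ requires; the aperm() call below does this. Second, $C_{RC}$ is built as a Kronecker product of first-difference matrices, $C_{RC} = D_R \otimes D_C$, which places +1, −1, −1, +1 on the four cells of each adjacent 2 × 2 block. The data are the coal miners counts shown later in this section, arranged as a 2 × 2 × 8 array (two symptoms by eight age groups) with 23 and 1654 on one diagonal of the first panel. This is not the vcdExtra code.

```r
# Coal miners: four counts per age group, in the order printed in the fourfold panels;
# array() fills each 2 x 2 panel column-wise, so the 1st and 4th counts share a diagonal
counts <- c(23, 105,  9, 1654,   54, 177,  19, 1863,  121, 257,  48, 2357,
           169, 273, 54, 1778,  269, 324,  88, 1712,  404, 245, 117, 1324,
           406, 225, 152, 967,  372, 132, 106,  526)
tab <- array(counts, dim = c(2, 2, 8))
nR <- dim(tab)[1]; nC <- dim(tab)[2]; nK <- dim(tab)[3]

# First-difference matrix: (m - 1) x m, rows e_{i+1} - e_i
D <- function(m) diff(diag(m))

# C = (D_R %x% D_C) %x% I_K, matching a vector with k fastest, then j, then i
Cmat <- kronecker(kronecker(D(nR), D(nC)), diag(nK))

# vec(n) with the stratum index fastest: permute to (K, C, R), then unroll column-major
n <- as.vector(aperm(tab, c(3, 2, 1)))

log_theta <- as.vector(Cmat %*% log(n))   # ln(theta) = C ln(n)
S <- Cmat %*% diag(1 / n) %*% t(Cmat)      # Var[ln(theta)] = C diag(n)^-1 C^T
ase <- sqrt(diag(S))

round(cbind(LOR = log_theta, ASE = ase), 3)
```

Each stratum's odds ratio uses its own four cells, so S comes out diagonal here. With more than two rows or columns, neighbouring local odds ratios share cells and S acquires off-diagonal terms.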


Models for log odds ratios: Estimation

A log odds ratio linear model for $\ln(\theta)$ is

$$\ln(\theta) = X\beta$$

where X is the design matrix of covariates.

The (asymptotic) ML estimates $\hat\beta$ are obtained by GLS via

$$\hat\beta = \left(X^{T} S^{-1} X\right)^{-1} X^{T} S^{-1} \ln\theta$$

where $S = \mathrm{Var}[\ln(\theta)]$ is the estimated covariance matrix.

→ Standard diagnostic and graphical methods can be adapted to this case:

diagnostics: influence plots, added-variable plots, ...
visualization: effect plots, ...
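The GLS estimator takes only a few lines of matrix algebra. The base-R sketch below (not the vcdExtra implementation) fits a linear trend in age to the eight coal miners log odds ratios. It assumes X has an intercept column and a column holding the lower bound of each age group. Each stratum uses its own four cells, so S is diagonal, and the GLS estimate should coincide with weighted least squares using weights 1/ASE². That is the lm() fit used in the coal miners example.

```r
# Coal miners: the four counts of each age group's 2 x 2 panel, with the 1st and 4th
# counts on one diagonal
cells <- matrix(c(23, 105,  9, 1654,   54, 177,  19, 1863,  121, 257,  48, 2357,
                 169, 273, 54, 1778,  269, 324,  88, 1712,  404, 245, 117, 1324,
                 406, 225, 152, 967,  372, 132, 106,  526), ncol = 4, byrow = TRUE)
log_theta <- log(cells[, 1] * cells[, 4] / (cells[, 2] * cells[, 3]))
S <- diag(rowSums(1 / cells))          # Var[ln(theta)]: diagonal, one entry per stratum

age <- seq(25, 60, by = 5)             # lower bound of each age group
X <- cbind(intercept = 1, age = age)   # design matrix of covariates

# beta = (X' S^-1 X)^-1 X' S^-1 ln(theta)
Sinv <- solve(S)
beta <- solve(t(X) %*% Sinv %*% X, t(X) %*% Sinv %*% log_theta)
vcov_beta <- solve(t(X) %*% Sinv %*% X)   # asymptotic covariance, treating S as known

# Same point estimates from weighted least squares with weights 1 / ASE^2
wls <- lm(log_theta ~ age, weights = 1 / diag(S))
cbind(gls = drop(beta), wls = coef(wls))
```

The point estimates agree, but the standard errors differ. summary(wls) rescales by the estimated residual variance, whereas vcov_beta treats S as known, as the asymptotic theory does.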


Example: Breathlessness & Wheeze in Coal Miners

> fourfold(CoalMiners, mfcol = c(2, 4), fontsize = 18)

[Figure: fourfold displays of Breathlessness (B, NoB) × Wheeze (W, NoW), one panel per age group; the four cell counts printed in each panel are:
Age 25−29:   23  105    9  1654
Age 30−34:   54  177   19  1863
Age 35−39:  121  257   48  2357
Age 40−44:  169  273   54  1778
Age 45−49:  269  324   88  1712
Age 50−54:  404  245  117  1324
Age 55−59:  406  225  152   967
Age 60−64:  372  132  106   526]

There is a strong positive association at all ages.
But can you see the trend?


Example: Breathlessness & Wheeze in Coal Miners

> (lor.CM <- loddsratio(CoalMiners))

log odds ratios for Wheeze and Breathlessness by Age

25-29 30-34 35-39 40-44 45-49 50-54 55-59 60-64
3.695 3.398 3.141 3.015 2.782 2.926 2.441 2.638

Fit linear and quadratic models in Age using WLS:

> lor.CM.df <- as.data.frame(lor.CM)
> age <- seq(25, 60, by = 5)
> CM.mod1 <- lm(LOR ~ age, weights = 1/ASE^2, data = lor.CM.df)
> CM.mod2 <- lm(LOR ~ poly(age, 2), weights = 1/ASE^2, data = lor.CM.df)
> anova(CM.mod1, CM.mod2)

Analysis of Variance Table

Model 1: LOR ~ age
Model 2: LOR ~ poly(age, 2)
  Res.Df  RSS Df Sum of Sq    F Pr(>F)
1      6 6.34
2      5 5.60  1     0.742 0.66   0.45


Example: Breathlessness & Wheeze in Coal Miners

Plot log odds ratios and fitted regressions: The trend is now clear!

[Figure: "CoalMiners data: Log odds ratio plot": log odds ratio (Wheeze × Breathlessness, y-axis, 2.5 to 4.0) by Age (25−29 to 60−64), with the fitted regressions.]


Attitudes toward corporal punishment

A four-way table, classifying 1,456 persons in Denmark (Punishment data in the vcd package).

Attitude: approves moderate punishment of children (moderate), or refuses any punishment (no)

Memory: Person recalls having been punished as a child?

Education: highest level (elementary, secondary, high)

Age group: (15-24, 25-39, 40+)

Counts by Education and Attitude (rows) and by Age group and Memory (columns):

                        Age 15–24   Age 25–39   Age 40+
Education   Attitude    Yes   No    Yes   No    Yes   No
Elementary  No            1   26      3   46     20  109
            Moderate     21   93     41  119    143  324
Secondary   No            2   23      8   52      4   44
            Moderate      5   45     20   84     20   56
High        No            2   26      6   24      1   13
            Moderate      1   19      4   26      8   17


Attitudes toward corporal punishment

Fourfold plots: Association of Attitude with Memory

> cotabplot(punish, panel = cotab_fourfold)

[Figure: fourfold displays of memory (yes, no) × attitude (no, moderate), one panel per combination of age (15−24, 25−39, 40+) and education (elementary, secondary, high); the cell counts in each panel match the table above.]


Log odds ratio plot
> (lor.pun <- loddsratio(punish))

log odds ratios for memory and attitude by age, education

       education
age     elementary secondary    high
  15-24    -1.7700   -0.2451  0.3795
  25-39    -1.6645   -0.4367  0.4855
  40+      -0.8777   -1.3683 -1.8112

[Figure: "Attitudes toward corporal punishment": log odds ratio (Attitude × Memory, y-axis, −3 to 2) by Education (elementary, secondary, high), one line per Age group (15−24, 25−39, 40+).]

Structure now completely clear

Little difference between the two younger groups

Opposite pattern for the 40+ group

Need to fit an LOR model to confirm these appearances (SEs are large)

(These methods are under development)
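The printed log odds ratios can be reproduced in base R from the four-way table above. The same computation gives their asymptotic standard errors, which makes the "SEs are large" remark concrete. The array layout follows the printed table, and the object names are illustrative.

```r
# Counts from the Punishment table, row by row: for each education x attitude row,
# memory (yes, no) within age group (15-24, 25-39, 40+)
v <- c( 1, 26,  3,  46,  20, 109,   # elementary, attitude = no
       21, 93, 41, 119, 143, 324,   # elementary, attitude = moderate
        2, 23,  8,  52,   4,  44,   # secondary, no
        5, 45, 20,  84,  20,  56,   # secondary, moderate
        2, 26,  6,  24,   1,  13,   # high, no
        1, 19,  4,  26,   8,  17)   # high, moderate
pun <- array(v, dim = c(2, 3, 2, 3),
             dimnames = list(memory    = c("yes", "no"),
                             age       = c("15-24", "25-39", "40+"),
                             attitude  = c("no", "moderate"),
                             education = c("elementary", "secondary", "high")))

# Memory x attitude log odds ratio in each age x education stratum (3 x 3 matrices)
n_yes_no  <- pun["yes", , "no", ];       n_no_no  <- pun["no", , "no", ]
n_yes_mod <- pun["yes", , "moderate", ]; n_no_mod <- pun["no", , "moderate", ]
lor <- log(n_yes_no * n_no_mod / (n_no_no * n_yes_mod))
ase <- sqrt(1 / n_yes_no + 1 / n_no_no + 1 / n_yes_mod + 1 / n_no_mod)

round(lor, 4)        # age x education, laid out like the loddsratio() output
round(ase, 2)        # asymptotic standard errors
round(lor / ase, 2)  # Wald z statistics
```

Several strata contain cells with only one or two observations, so their 1/n terms dominate the ASE. That is why an LOR model pooling across strata is needed before reading much into individual points.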


Summary

Effective data analysis for categorical data depends on:

Flexible models, with syntax that makes it easy to specify possibly complex models

Flexible visualization tools that make it easy to understand data, models, lack of fit, etc.

The vcd package provides very general visualization methodsvia the strucplot framework

The gnm package extends the class of applicable models forcontingency tables considerably

Parsimonious models for structured associations
Multiplicative and other nonlinear terms

The vcdExtra package provides glue and a testbed for new visualization methods


Further information

vcd: Zeileis A, Meyer D & Hornik K (2006). The Strucplot Framework: Visualizing Multi-Way Contingency Tables with vcd. Journal of Statistical Software, 17(3), 1–48. http://www.jstatsoft.org/v17/i03/
vignette("strucplot", package = "vcd")

gnm: Turner H & Firth D (2010). Generalized nonlinear models in R: An overview of the gnm package. http://CRAN.R-project.org/package=gnm
vignette("gnmOverview", package = "gnm")

vcdExtra: Friendly M & others (2010). vcdExtra: vcd additions. http://CRAN.R-project.org/package=vcdExtra
vignette("vcd-tutorial")


References I

Friendly, M. (1994). Mosaic displays for multi-way contingency tables.Journal of the American Statistical Association, 89, 190–200. URLhttp://www.jstor.org/stable/2291215.

Friendly, M. (2000). Visualizing Categorical Data. Cary, NC: SASInstitute.

Friendly, M. (2002). A brief history of the mosaic display. Journal ofComputational and Graphical Statistics, 11(1), 89–107.

Hartigan, J. A. and Kleiner, B. (1981). Mosaics for contingency tables.In W. F. Eddy (Ed.), Computer Science and Statistics: Proceedings ofthe 13th Symposium on the Interface, (pp. 268–273). New York, NY:Springer-Verlag.

Hartigan, J. A. and Kleiner, B. (1984). A mosaic of television ratings.The American Statistician, 38, 32–35.

Zeileis, A., Meyer, D., and Hornik, K. (2007). Residual-based shadingsfor visualizing (conditional) independence. Journal of Computationaland Graphical Statistics, 16(3), 507–525.
