A Visualization Tool for Mining Large Correlation Tables:
The Association Navigator
Andreas Buja, Abba Krieger, Ed George
Statistics Department, The Wharton School, University of Pennsylvania∗
August 22, 2016
1 Overview
The Association Navigator is an interactive visualization tool for viewing large tables of correlations. The basic operation is zooming and panning of a table that is presented in graphical form, here called a “blockplot”.
The tool is really a toolbox that includes, among other things: (1) display of p-values and missing-value patterns in addition to correlations, (2) mark-up facilities to highlight variables and sub-tables as landmarks when navigating the larger table, (3) histograms/barcharts, scatterplots and scatterplot matrices as “lenses” into the distributions of variables and variable pairs, (4) thresholding of correlations and p-values to show only strong correlations and highly significant p-values, (5) trimming of extreme values of the variables for robustness, (6) “reference variables” that stay in sight at all times, and (7) wholesale adjustment of groups of variables for other variables.
The tool has been applied to data with nearly 2,000 variables and associated tables approaching a size of 2,000×2,000. The usefulness of the tool is less in beholding gigantic tables in their entirety and more in searching for interesting association patterns by navigating manageable but numerous and interconnected sub-tables.
2 Introduction
This document describes the Association Navigator (AN for short) in three sections: (1) In this introductory Section 2 we give some background about the data analytic and
∗This work was partially supported by a grant from the Simons Foundation (SFARI awards #121221 and #296012 to A.K.). We appreciate obtaining access to the phenotypic data on SFARI Base (https://base.sfari.org). Partial support was also provided by NSF Grant DMS-1007689 to A. Buja.
[Figure 1 (image omitted): blockplot with the annotation “Correlations (Compl.Pairs)”. Both axes list the same 38 variables: age_at_ados_p1.CDV, family_type_p1.CDV, sex_p1.CDV, ethnicity_p1.CDV, cpea_dx_p1.CDV, adi_r_cpea_dx_p1.CDV, adi_r_soc_a_total_p1.CDV, adi_r_comm_b_non_verbal_total_p1.CDV, adi_r_b_comm_verbal_total_p1.CDV, adi_r_rrb_c_total_p1.CDV, adi_r_evidence_onset_p1.CDV, ados_module_p1.CDV, diagnosis_ados_p1.CDV, ados_css_p1.CDV, ados_social_affect_p1.CDV, ados_restricted_repetitive_p1.CDV, ados_communication_social_p1.CDV, ssc_diagnosis_verbal_iq_p1.CDV, ssc_diagnosis_verbal_iq_type_p1.CDV, ssc_diagnosis_nonverbal_iq_p1.CDV, ssc_diagnosis_nonverbal_iq_type_p1.CDV, ssc_diagnosis_full_scale_iq_p1.CDV, ssc_diagnosis_full_scale_iq_type_p1.CDV, ssc_diagnosis_vma_p1.CDV, ssc_diagnosis_nvma_p1.CDV, vineland_ii_composite_standard_score_p1.CDV, srs_parent_t_score_p1.CDV, srs_parent_raw_total_p1.CDV, srs_teacher_t_score_p1.CDV, srs_teacher_raw_total_p1.CDV, rbs_r_overall_score_p1.CDV, cbcl_2_5_internalizing_t_score_p1.CDV, cbcl_2_5_externalizing_t_score_p1.CDV, cbcl_6_18_internalizing_t_score_p1.CDV, cbcl_6_18_externalizing_t_score_p1.CDV, abc_total_score_p1.CDV, non_febrile_seizures_p1.CDV, febrile_seizures_p1.CDV.]
Figure 1: A first example of a “blockplot”: labels in the bottom and left margins show variable names, and blue and red blocks in the plotting area show positive and negative correlations.
statistical problem addressed by this tool; (2) in Section 3 we describe the graphical displays used by the tool; (3) in Section 4 we describe the actual operation of the tool. We start with some background:
An important focus of contemporary statistical research is on methods for large multivariate data. The term “large” can have two meanings, not mutually exclusive: (1) a large number of cases (records, rows), also called the “large-n problem”, or (2) a large number of variables (attributes, columns), also called the “large-p problem.” The two types of largeness call for different data analytic approaches and determine the kinds of questions that can be answered by the data. Most fundamentally it should be observed that increasing n, the number of cases, and increasing p, the number of variables, have very different and in some ways opposite effects on statistical analysis. Since the general multivariate analysis problem is to make statistical inference about the association among variables, increasing n has the effect of improving the certainty of inference due to improved precision of estimates, whereas increasing p has the contrary effect of reducing the certainty of inference due to the multiplicity problem or, more colorfully, the “data dredging fallacy.” Therefore the level of detail that can be inferred about association among variables improves with increasing n but plummets with increasing p.
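The opposite effects of n and p can be made concrete with a little arithmetic. The following Python sketch is our illustration only and not part of the AN software (which is written in R); the function names are hypothetical. It uses the standard large-sample approximation 1/sqrt(n − 3) for the standard error of a Fisher z-transformed correlation, and it counts how many of the p(p − 1)/2 variable pairs would be expected to pass a level-alpha test if no true association existed at all.

```python
import math


def n_pairs(p: int) -> int:
    """Number of distinct variable pairs in a p x p correlation table."""
    return p * (p - 1) // 2


def fisher_z_se(n: int) -> float:
    """Approximate standard error of a Fisher z-transformed correlation.

    Standard large-sample approximation; requires n > 3.
    """
    if n <= 3:
        raise ValueError("n must exceed 3")
    return 1.0 / math.sqrt(n - 3)


def expected_false_positives(p: int, alpha: float = 0.05) -> float:
    """Expected number of pairs flagged at level alpha when every
    true correlation is zero (follows from linearity of expectation)."""
    return alpha * n_pairs(p)


# Precision improves with n ...
for n in (100, 1000, 10000):
    print(f"n = {n:>6}: SE of Fisher z is about {fisher_z_se(n):.4f}")

# ... while the multiplicity burden grows roughly like p squared.
for p in (38, 757, 2000):
    print(f"p = {p:>5}: {n_pairs(p):>9} pairs, "
          f"about {expected_false_positives(p):.0f} expected false positives at 5%")
```

The two loops show the contrast described above: the standard error shrinks slowly with n, whereas the number of pairs, and with it the number of chance findings, grows quadratically with p.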
The problem we address here is primarily the large-p problem. From the above discussion it follows that, for large p, associations among variables can generally be inferred only to a low level of detail and certainty. Hence it is sufficient to measure association by simple means such as plain correlations. Correlations indicate the basic directionality in pairwise association, and as such they answer the simplest but also most fundamental question: are higher values in X associated with higher or lower values in Y, at least in tendency?
Reliance on correlations may be subject to objections because they seem limited in their range of applicability for several reasons: (1) they are considered to be measures of linear association only, (2) they describe bivariate association only, and (3) they apply to quantitative variables only. In Appendix A we refute or temper each of these objections by showing (1) that correlations are usually useful measures of directionality even when the associations are non-linear, (2) that higher-order associations play a reduced role especially in large-p problems, and (3) that with the help of a few tricks of the trade (“scoring” and “dummy coding”) correlations are useful even for categorical variables, both ordinal and nominal. In view of these arguments we proceed from the assumption that correlation tables, when used creatively, form quite general and powerful summaries of association among many variables.
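To make the two “tricks of the trade” tangible, here is a minimal Python sketch under our own assumptions; it is not taken from the AN software or from Appendix A, and the helper names are hypothetical. “Scoring” is interpreted here as replacing ordinal levels by their ranks 1, 2, 3, ..., and “dummy coding” as creating one 0/1 indicator per level of a nominal variable. Once coded this way, categorical variables can enter a plain Pearson correlation.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation of two equal-length numeric sequences.

    Assumes both sequences have non-zero variance.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def score_ordinal(values, ordered_levels):
    """'Scoring': replace each ordinal level by its rank 1, 2, 3, ..."""
    rank = {level: i + 1 for i, level in enumerate(ordered_levels)}
    return [rank[v] for v in values]


def dummy_code(values):
    """'Dummy coding': one 0/1 indicator list per level of a nominal variable."""
    return {level: [1 if v == level else 0 for v in values]
            for level in sorted(set(values))}


# Made-up illustration data (not from the SSC).
severity = ["low", "mid", "high", "mid", "low", "high"]
site = ["A", "B", "A", "C", "B", "A"]
outcome = [1.0, 2.0, 3.5, 2.5, 0.5, 3.0]

scores = score_ordinal(severity, ["low", "mid", "high"])
print("ordinal scores:", scores)
print("corr(score, outcome):", round(pearson(scores, outcome), 3))
for level, indicator in dummy_code(site).items():
    print(f"corr(site == {level}, outcome):", round(pearson(indicator, outcome), 3))
```

A nominal variable with k levels thus contributes k indicator columns to the correlation table, each of which answers a directional question of its own.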
In the following sections we describe first how we graphically present large correlation tables and second how we navigate and search them interactively. The software written to this end, the Association Navigator or AN, implements the essential displays and interactive functionality to support the “mining” of large correlation tables. The AN software is written entirely in the R language¹.
All data examples in this document are drawn from the phenotypic data in the “Simons Simplex Collection” (SSC) created by the Simons Foundation Autism Research Initiative (SFARI). Approved researchers can obtain the SSC dataset used in this document by applying at https://base.sfari.org.
¹ http://www.cran.r-project.org
3 Graphical Displays
3.1 Graphical Display of Correlation Tables: Blockplots
Figure 1 shows a first example of what we call a “blockplot”² of a dataset with p = 38 variables. This plot is intended as a direct and fairly obvious translation of a numeric correlation table into visual form. The elements of the plot are as follows:
• The labels in the bottom and left margins show line-ups of the same 38 variables: age_at_ados_p1.CDV, family_type_p1.CDV, sex_p1.CDV, ... . In contrast to tables, where the vertical axis lists variables top down, we follow the convention of scatterplots where the vertical axis is ascending and hence the variables are listed bottom up.
• The blue and red squares or “blocks” represent the pairwise correlations between variables at the intersections of the (imagined) horizontal and vertical lines drawn from the respective margin labels. The magnitude of a correlation is reflected in the size of the block and its sign in the color: positive correlations are shown in blue and negative correlations in red.³ Along the ascending 45-degree diagonal are the correlations +1 of the variables with themselves, hence these blocks are of maximal size. The closeness of other correlations to +1 or −1 can be gauged by a size comparison with the diagonal blocks.
• Finally, the plot shows a small comment in the bottom left, “Correlations (Compl.Pairs)”, indicating that what is represented by the blocks is correlation of complete (that is, non-missing) pairs of values of the two variables in question. This comment refers to the missing values problem and to the fact that correlation can only be calculated from the cases where the values of both variables are non-missing. The comment also alludes to the possibility that very different types of information could be represented by the blocks, and this is indeed made use of by the AN software (see Sections 3.3 and 3.4).
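The elements listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions and not the AN implementation (which is in R): missing values are represented by None, the function names are hypothetical, and we assume that the side length of a block (rather than its area) is proportional to the magnitude of the correlation, a detail the text does not specify.

```python
import math


def complete_pairs(x, y):
    """Keep only the cases where both values are present (None = missing)."""
    return [(a, b) for a, b in zip(x, y) if a is not None and b is not None]


def pairwise_correlation(x, y):
    """Pearson correlation over complete pairs; None if it cannot be computed."""
    pairs = complete_pairs(x, y)
    n = len(pairs)
    if n < 2:
        return None
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return None  # a constant variable has no defined correlation
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    return sxy / math.sqrt(sxx * syy)


def block(r, cell=1.0):
    """Map a correlation to (side length, color) of its block.

    Assumption: side length = cell * |r|, so the diagonal (r = +1)
    fills its cell; blue marks positive, red negative correlations.
    """
    return cell * abs(r), ("blue" if r >= 0 else "red")


# Made-up data with missing values.
x = [1.0, 2.0, None, 4.0, 5.0]
y = [2.1, 3.9, 6.0, None, 9.8]
r = pairwise_correlation(x, y)
print("complete pairs used:", len(complete_pairs(x, y)))
print("correlation:", round(r, 3), "-> block:", block(r))
```

Because each pair of variables may have a different set of complete cases, the correlations in one table can rest on very different sample sizes, which is one reason the tool also displays missing-value patterns.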
As a “reading exercise” consider Figure 2: This is the same blockplot as in Figure 1, but for ease of pointing we marked up two variables on the horizontal axis⁴:
age_at_ados_p1.CDV, ados_restricted_repetitive_p1.CDV,
² This type of plot is also called “fluctuation diagram” (Hofmann, 2000). The term “blockplot” is ours, and we introduce it because it is more descriptive of the plot’s visual appearance. We may even dare propose that “blockplot” be contracted to “blot”, which would be in the tradition of contracting “scatterplot matrix” to “splom” and “graphics object” to “grob”.
³ We follow the convention from finance where “being in the red” implies negative numbers; the opposite convention is from physics where red symbolizes higher temperatures. Users can easily change the defaults for blockplots; see the programming hints in Appendix B.
⁴ This dataset represents a version of the table proband_cdv.csv in Version 9 of the phenotypic SSC. The acronym cdv means “core descriptive variables”.
[Figure 2 (image omitted): the blockplot of Figure 1 with the same 38 variable labels on both axes and the annotation “Correlations (Compl.Pairs)”, with highlighted strips as described in the caption.]
Figure 2: A “reading exercise” illustrated with the same example as in Figure 1. The salmon-colored strips highlight the variables age_at_ados_p1.CDV and ados_restricted_repetitive_p1.CDV on the horizontal axis, and the variables ssc_diagnosis_vma_p1.CDV and ssc_diagnosis_nvma_p1.CDV on the vertical axis. At the intersections of the strips are the blocks that reflect the respective correlations.
meaning “age at the time of the administration of the ADOS or Autism Diagnostic Observation Schedule”, and “problems due to restricted and repetitive behaviors”, respectively. Two other variables are marked up on the vertical axis:
ssc_diagnosis_vma_p1.CDV, ssc_diagnosis_nvma_p1.CDV,
meaning “verbal mental age”, and “non-verbal mental age”, respectively, which are related to notions of IQ. For readability we will shorten the labels in what follows.
As for the actual reading exercise, in the intersection of the left vertical strip with the horizontal strip, we find two blue blocks of which the lower is recognizably larger than the upper (the reader may have to zoom in if viewing the figure in a PDF reader). This implies that the correlation of age_at_ados.. with both ..vma.. and ..nvma.. is positive, but more strongly with the former than the latter, which may be news to the non-specialist: verbal skills are more strongly age-related than non-verbal skills. (Strictly speaking we can claim this only for the present sample of autistic probands.) Similarly, following the right vertical strip to the intersection with the horizontal strip, we find two red blocks of which again the lower block is slightly larger than the upper, but both are smaller than the blue blocks in the left strip. This implies that ..restricted_repetitive.. is negatively correlated with both ..vma.. and ..nvma.., but more strongly with the former, and both are more weakly correlated with ..restricted_repetitive.. than with age_at_ados... All of this makes sense in light of the apparent meanings of the variables: Any notion of “mental age” is probably quite strongly and positively associated with chronological age; with hindsight we may also accept that problems with specific behaviors tend to diminish with age, but the association is probably less strong than that between different notions of age.
Some other patterns are quickly parsed and understood: The two 2×2 blocks on the upper right diagonal stem from two versions of the same underlying measurements: “raw total” and “t score”. Next, the alternating patterns of red and blue in the center indicate that the three IQ measures (“verbal”, “nonverbal”, “full scale”) are in an inverse association with the corresponding IQ types. This makes sense because IQ types are dummy variables that indicate whether an IQ test suitable for cognitively highly impaired probands was applied. It becomes apparent that one can spend a fair amount of time wrapping one’s mind around the visible blockplot patterns and their meanings.
3.2 Graphical Overview of Large Correlation Tables
Figure 1 shows a manageably small set of 38 variables which is yet large enough that the presentation of a numeric table would be painful for anybody. This table, however, is a small subset of a larger dataset of 757 variables which is shown in Figure 3. In spite of the very different appearance, this, too, is a blockplot drawn by the same tool and by the same principles, with some allowance for the fact that 757² = 573,049 blocks cannot be sensibly displayed on screens with an image resolution comparable to 757². When blocks are represented by single pixels, blocksize variation is no longer possible. In this case, the tool displays only a selection of correlations that are largest in magnitude. The default (which can be changed) is to show 10,000 of the most extreme correlations, and it is these that give the blockplot in Figure 3 the characteristic pattern of streaks and rectangular concentrations.
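The selection of the most extreme correlations can be sketched as follows. This Python fragment is a hedged illustration and not the AN code: the function name is hypothetical, and we assume that only the pairs above the diagonal are ranked, since the table is symmetric and the diagonal is trivially +1. Whether the tool counts its 10,000 correlations in this way is not stated in the text.

```python
import heapq


def strongest_pairs(corr, k=10_000):
    """Return up to k triples (i, j, r) with i < j, ordered by decreasing |r|.

    corr is a square, symmetric table given as a list of lists.
    Assumption: only the upper triangle is ranked.
    """
    p = len(corr)
    candidates = ((i, j, corr[i][j]) for i in range(p) for j in range(i + 1, p))
    return heapq.nlargest(k, candidates, key=lambda t: abs(t[2]))


# Small made-up table standing in for the 757 x 757 case.
corr = [
    [1.0, 0.2, -0.9],
    [0.2, 1.0, 0.5],
    [-0.9, 0.5, 1.0],
]
for i, j, r in strongest_pairs(corr, k=2):
    print(f"variables {i} and {j}: r = {r:+.1f}")

# Size of the full table discussed in the text.
print("blocks in a 757 x 757 table:", 757 ** 2)
```

Using a bounded heap avoids sorting all candidate pairs, which matters little at p = 757 but helps for tables approaching 2,000×2,000.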
[Figure 3 (image omitted): blockplot of the larger 757-variable dataset. The axis labels shown include family.ID, pb_sites.FAM, vn_sites.FAM, xx_sites.FAM, sb_sites.FAM, fl_sites.FAM, kr_sites.FAM, sz.sorted_sites.FAM, evalage_s1_bg.SRS, evalage_mo_bg.SRS, evalmonth_s1_bg.SRS, evalmonth_mo_bg.SRS, num.sibs_IND, ...; label suffixes include .FAM, .SRS, _IND, _racePARENT, _cuPARENT, .CDV, .OCUV, .AUTO.IMM, .BTH.DEF, .CHRC.ILL, .DT.MED.SLP, .GEN.DIS, _LBR.DLY.BTH.FEED and .LANG.DIS.]
sex.f1m0.p1_IND birth.p1_IND birth.fa_IND
death.p1_IND death.mo_IND
genetic.ab.s1_IND genetic.ab.mo_IND
overall_genetic_FAM father_genetic_FAM
sib_genetic_FAM asian.fa_racePARENT
native−hawaiian.fa_racePARENT not−specified.fa_racePARENT
ethnicity.hispanic.fa_racePARENT asian.mo_racePARENT
native−hawaiian.mo_racePARENT not−specified.mo_racePARENT
ethnicity.hispanic.mo_racePARENT bapq_rigid_average.fa_cuPARENT bapq_aloof_average.fa_cuPARENT
srs_adult_nbr_missing.fa_cuPARENT bapq_nbr_missing.mo_cuPARENT
bapq_pragmatic_average.mo_cuPARENT bapq_overall_average.mo_cuPARENT
srs_adult_total.mo_cuPARENT family_type_p1.CDV
ethnicity_p1.CDV adi_r_cpea_dx_p1.CDV
adi_r_comm_b_non_verbal_total_p1.CDV adi_r_rrb_c_total_p1.CDV
ados_module_p1.CDV ados_css_p1.CDV
ados_restricted_repetitive_p1.CDV ssc_diagnosis_verbal_iq_p1.CDV
ssc_diagnosis_nonverbal_iq_p1.CDV ssc_diagnosis_full_scale_iq_p1.CDV
ssc_diagnosis_vma_p1.CDV vineland_ii_composite_standard_score_p1.CDV
srs_parent_raw_total_p1.CDV srs_teacher_raw_total_p1.CDV
cbcl_2_5_internalizing_t_score_p1.CDV cbcl_6_18_internalizing_t_score_p1.CDV
abc_total_score_p1.CDV febrile_seizures_p1.CDV
bckgd_hx_highest_edu_father_p1.OCUV bckgd_hx_parent_relation_status_p1.OCUV
ssc_dx_overallcertainty_p1.OCUV nbr_stillbirth_miscarriage_p1.OCUV
family_structure_p1.OCUV word_delay_p1.OCUV
phrase_delay_p1.OCUV adi_r_q86_abnormality_evident_p1.OCUV
ados1_algorithm_p1.OCUV a1_non_echoed_p1.OCUV
ados_reciprocal_social_p1.OCUV vabs_ii_dls_standard_p1.OCUV vabs_ii_motor_skills_p1.OCUV
srs_parent_awareness_p1.OCUV srs_parent_communication_p1.OCUV
srs_parent_motivation_p1.OCUV srs_teacher_awareness_p1.OCUV
srs_teacher_communication_p1.OCUV srs_teacher_motivation_p1.OCUV
rbs_r_i_stereotyped_behavior_p1.OCUV rbs_r_iii_compulsive_behavior_p1.OCUV
rbs_r_v_sameness_behavior_p1.OCUV abc_nbr_missing_p1.OCUV
abc_iv_hyperactivity_p1.OCUV abc_iii_stereotypy_p1.OCUV
scq_life_nbr_missing_p1.OCUV scq_life_total_p1.OCUV
cbcl_2_5_somatic_complaints_p1.OCUV cbcl_2_5_sleep_problems_p1.OCUV
cbcl_2_5_aggressive_behavior_p1.OCUV cbcl_2_5_affective_problems_p1.OCUV
cbcl_2_5_pervasive_developmental_p1.OCUV cbcl_2_5_oppositional_defiant_p1.OCUV
cbcl_6_18_social_p1.OCUV cbcl_6_18_total_competence_p1.OCUV
cbcl_6_18_withdrawn_p1.OCUV cbcl_6_18_social_problems_p1.OCUV
cbcl_6_18_attention_problems_p1.OCUV cbcl_6_18_aggressive_behavior_p1.OCUV
cbcl_6_18_affective_problems_p1.OCUV cbcl_6_18_somatic_prob_p1.OCUV
cbcl_6_18_oppositional_defiant_p1.OCUV cbcl_2_5_emotionally_reactive_p1.OCUV
mat_rheum.arthritis.juvenile.AUTO.IMM sibling_rheum.arthritis.juvenile.AUTO.IMM
pat.cousin_rheum.arthritis.juvenile.AUTO.IMM pat.auntuncle_rheum.arthritis.juvenile.AUTO.IMM
y.n_rheum.arthritis.adult.AUTO.IMM mat_rheum.arthritis.adult.AUTO.IMM
mat.auntuncle_rheum.arthritis.adult.AUTO.IMM mat.grandparent_rheum.arthritis.adult.AUTO.IMM
y.n_systemic.lupus.eryth.AUTO.IMM mat_systemic.lupus.eryth.AUTO.IMM
mat.cousin_systemic.lupus.eryth.AUTO.IMM mat.auntuncle_systemic.lupus.eryth.AUTO.IMM
mat.grandparent_systemic.lupus.eryth.AUTO.IMM y.n_asthma.AUTO.IMM
mat_asthma.AUTO.IMM sibling_asthma.AUTO.IMM
pat.halfsibling_asthma.AUTO.IMM pat.cousin_asthma.AUTO.IMM
pat.auntuncle_asthma.AUTO.IMM pat.grandparent_asthma.AUTO.IMM
proband_hyperthyroidism.AUTO.IMM pat_hyperthyroidism.AUTO.IMM
mat.auntuncle_hyperthyroidism.AUTO.IMM mat.grandparent_hyperthyroidism.AUTO.IMM
y.n_hypothyroidism.AUTO.IMM mat_hypothyroidism.AUTO.IMM
sibling_hypothyroidism.AUTO.IMM pat.cousin_hypothyroidism.AUTO.IMM
pat.auntuncle_hypothyroidism.AUTO.IMM pat.grandparent_hypothyroidism.AUTO.IMM
proband_hashimotos.tyroiditis.AUTO.IMM pat_hashimotos.tyroiditis.AUTO.IMM
mat.cousin_hashimotos.tyroiditis.AUTO.IMM pat.auntuncle_hashimotos.tyroiditis.AUTO.IMM
pat.grandparent_hashimotos.tyroiditis.AUTO.IMM mat_diabetes.mellitus.type1.AUTO.IMM
sibling_diabetes.mellitus.type1.AUTO.IMM pat.cousin_diabetes.mellitus.type1.AUTO.IMM
pat.auntuncle_diabetes.mellitus.type1.AUTO.IMM pat.grandparent_diabetes.mellitus.type1.AUTO.IMM
proband_diabetes.mellitus.type2.AUTO.IMM pat_diabetes.mellitus.type2.AUTO.IMM
pat.cousin_diabetes.mellitus.type2.AUTO.IMM pat.auntuncle_diabetes.mellitus.type2.AUTO.IMM
pat.grandparent_diabetes.mellitus.type2.AUTO.IMM proband_adrenal.insufficiency.AUTO.IMM
mat.auntuncle_adrenal.insufficiency.AUTO.IMM pat.grandparent_adrenal.insufficiency.AUTO.IMM
proband_psoriasis.AUTO.IMM pat_psoriasis.AUTO.IMM
pat.halfsibling_psoriasis.AUTO.IMM pat.cousin_psoriasis.AUTO.IMM
pat.auntuncle_psoriasis.AUTO.IMM pat.grandparent_psoriasis.AUTO.IMM proband_bowel.disorders.AUTO.IMM
pat_bowel.disorders.AUTO.IMM mat.halfsibling_bowel.disorders.AUTO.IMM
pat.cousin_bowel.disorders.AUTO.IMM pat.auntuncle_bowel.disorders.AUTO.IMM
pat.grandparent_bowel.disorders.AUTO.IMM proband_cellac.disease.AUTO.IMM
pat_cellac.disease.AUTO.IMM mat.cousin_cellac.disease.AUTO.IMM
mat.auntuncle_cellac.disease.AUTO.IMM mat.grandparent_cellac.disease.AUTO.IMM
y.n_multiple.sclerosis.AUTO.IMM mat_multiple.sclerosis.AUTO.IMM
mat.cousin_multiple.sclerosis.AUTO.IMM mat.auntuncle_multiple.sclerosis.AUTO.IMM
mat.grandparent_multiple.sclerosis.AUTO.IMM y.n_other.autoimmune.disorder.AUTO.IMM
proband_other.autoimmune.disorder.AUTO.IMM pat_other.autoimmune.disorder.AUTO.IMM
pat.halfsibling_other.autoimmune.disorder.AUTO.IMM pat.cousin_other.autoimmune.disorder.AUTO.IMM
pat.auntuncle_other.autoimmune.disorder.AUTO.IMM pat.grandparent_other.autoimmune.disorder.AUTO.IMM
proband_cleft.lip.palate.BTH.DEF sibling_cleft.lip.palate.BTH.DEF
mat.cousin_cleft.lip.palate.BTH.DEF mat.auntuncle_cleft.lip.palate.BTH.DEF
mat.grandparent_cleft.lip.palate.BTH.DEF proband_open.spine.BTH.DEF
mat.cousin_open.spine.BTH.DEF pat.auntuncle_open.spine.BTH.DEF
y.n_congenital.heart.defect.BTH.DEF mat_congenital.heart.defect.BTH.DEF
sibling_congenital.heart.defect.BTH.DEF mat.cousin_congenital.heart.defect.BTH.DEF
mat.auntuncle_congenital.heart.defect.BTH.DEF mat.grandparent_congenital.heart.defect.BTH.DEF
y.n_kidney.defect.BTH.DEF mat_kidney.defect.BTH.DEF
sibling_kidney.defect.BTH.DEF pat.cousin_kidney.defect.BTH.DEF
pat.auntuncle_kidney.defect.BTH.DEF pat.grandparent_kidney.defect.BTH.DEF
proband_abnormal.shape.polydactyly.BTH.DEF pat_abnormal.shape.polydactyly.BTH.DEF
mat.cousin_abnormal.shape.polydactyly.BTH.DEF mat.auntuncle_abnormal.shape.polydactyly.BTH.DEF
mat.grandparent_abnormal.shape.polydactyly.BTH.DEF y.n_other.birth.defect.BTH.DEF
proband_other.birth.defect.BTH.DEF pat_other.birth.defect.BTH.DEF
pat.halfsibling_other.birth.defect.BTH.DEF pat.cousin_other.birth.defect.BTH.DEF
pat.auntuncle_other.birth.defect.BTH.DEF pat.grandparent_other.birth.defect.BTH.DEF
proband_heart.disease.CHRC.ILL pat_heart.disease.CHRC.ILL
mat.cousin_heart.disease.CHRC.ILL mat.auntuncle_heart.disease.CHRC.ILL
mat.grandparent_heart.disease.CHRC.ILL y.n_stroke.CHRC.ILL pat_stroke.CHRC.ILL
mat.auntuncle_stroke.CHRC.ILL mat.grandparent_stroke.CHRC.ILL
y.n_cancer.CHRC.ILL pat_cancer.CHRC.ILL
mat.halfsibling_cancer.CHRC.ILL pat.cousin_cancer.CHRC.ILL
pat.auntuncle_cancer.CHRC.ILL pat.grandparent_cancer.CHRC.ILL
proband_death.under.50.CHRC.ILL mat.halfsibling_death.under.50.CHRC.ILL
mat.cousin_death.under.50.CHRC.ILL mat.auntuncle_death.under.50.CHRC.ILL
mat.grandparent_death.under.50.CHRC.ILL y.n_other.disorder.illness1.CHRC.ILL
proband_other.disorder.illness1.CHRC.ILL pat_other.disorder.illness1.CHRC.ILL
mat.halfsibling_other.disorder.illness1.CHRC.ILL mat.cousin_other.disorder.illness1.CHRC.ILL
mat.auntuncle_other.disorder.illness1.CHRC.ILL mat.grandparent_other.disorder.illness1.CHRC.ILL
y.n_other.disorder.illness2.CHRC.ILL proband_other.disorder.illness2.CHRC.ILL
pat_other.disorder.illness2.CHRC.ILL mat.halfsibling_other.disorder.illness2.CHRC.ILL
pat.cousin_other.disorder.illness2.CHRC.ILL pat.auntuncle_other.disorder.illness2.CHRC.ILL
pat.grandparent_other.disorder.illness2.CHRC.ILL diet_other_1_past_p1.DT.MED.SLP diet_other_2_past_p1.DT.MED.SLP
other_meds_2_desc_p1.DT.MED.SLP over_counters_2_desc_p1.DT.MED.SLP other_meds_1_reason_s1.DT.MED.SLP other_meds_2_reason_s1.DT.MED.SLP
over_counters_1_reason_s1.DT.MED.SLP over_counters_2_reason_s1.DT.MED.SLP
other_meds_2_desc_fa.DT.MED.SLP over_counters_2_desc_fa.DT.MED.SLP
dpt_reaction_p1.DT.MED.SLP dtap_reaction_p1.DT.MED.SLP
hib_reaction_p1.DT.MED.SLP hepatitis_b_reaction_p1.DT.MED.SLP polio_oral_reaction_p1.DT.MED.SLP
polio_injectedl_reaction_p1.DT.MED.SLP mmr_reaction_p1.DT.MED.SLP
flu_shot_reaction_p1.DT.MED.SLP chicken_pox_varicella_if_recd_p1.DT.MED.SLP
vaccination_other_1_p1.DT.MED.SLP vaccination_other_1_reaction_p1.DT.MED.SLP vaccination_other_2_reaction_p1.DT.MED.SLP
y.n_angelman.GEN.DIS y.n_down.syndrome.GEN.DIS
mat.cousin_down.syndrome.GEN.DIS mat.auntuncle_down.syndrome.GEN.DIS
y.n_phenylketonuria.GEN.DIS pat.cousin_phenylketonuria.GEN.DIS
mat.auntuncle_cri.du.chat.GEN.DIS specify_other.genetic.GEN.DIS
mat_other.genetic.GEN.DIS sibling_other.genetic.GEN.DIS
pat.cousin_other.genetic.GEN.DIS pat.auntuncle_other.genetic.GEN.DIS
pat.grandparent_other.genetic.GEN.DIS gestational_age_days_LBR.DLY.BTH.FEED
labor_duration_LBR.DLY.BTH.FEED labor_duration_na_LBR.DLY.BTH.FEED anesthesia_spinal_LBR.DLY.BTH.FEED
anesthesia_general_LBR.DLY.BTH.FEED labor_induced_LBR.DLY.BTH.FEED
induction_prolonged_rupture_membrane_LBR.DLY.BTH.FEED induction_post_dates_LBR.DLY.BTH.FEED
induction_reason_other_specify_LBR.DLY.BTH.FEED induction_method_pitocin_LBR.DLY.BTH.FEED
induction_method_prostaglandins_LBR.DLY.BTH.FEED induction_method_other_specify_LBR.DLY.BTH.FEED
augmentation_premature_rupture_membrane_LBR.DLY.BTH.FEED augmentation_failure_to_progress_LBR.DLY.BTH.FEED
augmentation_reason_other_LBR.DLY.BTH.FEED augmentation_method_amniotomy_LBR.DLY.BTH.FEED
augmentation_method_stripping_LBR.DLY.BTH.FEED augmentation_method_other_LBR.DLY.BTH.FEED
presentation_cephalic_LBR.DLY.BTH.FEED presentation_transverse_LBR.DLY.BTH.FEED
delivery_type_LBR.DLY.BTH.FEED c_section_emergent_LBR.DLY.BTH.FEED
c_section_planned_LBR.DLY.BTH.FEED c_section_other_specify_LBR.DLY.BTH.FEED
nuchal_cord_LBR.DLY.BTH.FEED placenta_previa_LBR.DLY.BTH.FEED
child_severely_traumatized_LBR.DLY.BTH.FEED birth_weight_lbs_LBR.DLY.BTH.FEED
birth_weight_not_sure_LBR.DLY.BTH.FEED birth_head_circumference_percentile_LBR.DLY.BTH.FEED
birth_head_circumference_not_sure_list_LBR.DLY.BTH.FEED birth_length_not_sure_LBR.DLY.BTH.FEED
apgar_score_at_5_minutes_LBR.DLY.BTH.FEED hospital_after_birth_LBR.DLY.BTH.FEED
hyperbilirubinemia_rx_no_treatment_LBR.DLY.BTH.FEED hyperbilirubinemia_rx_exchange_trans_LBR.DLY.BTH.FEED
meconium_aspiration_LBR.DLY.BTH.FEED o2_supplement_LBR.DLY.BTH.FEED
o2_by_mask_and_ventilation_LBR.DLY.BTH.FEED resuscitation_required_LBR.DLY.BTH.FEED
nicu_duration_admission_LBR.DLY.BTH.FEED anemia_LBR.DLY.BTH.FEED
hypokalemia_LBR.DLY.BTH.FEED hypoglycemia_LBR.DLY.BTH.FEED
diff_regulate_temp_LBR.DLY.BTH.FEED physical_anomalies_polydactyly_LBR.DLY.BTH.FEED
physical_anomalies_kidney_LBR.DLY.BTH.FEED breast_bottle_feed_LBR.DLY.BTH.FEED
breast_bottle_months_LBR.DLY.BTH.FEED breast_total_months_LBR.DLY.BTH.FEED bottle_total_months_LBR.DLY.BTH.FEED
type_of_formula_cow_milk_LBR.DLY.BTH.FEED type_of_formula_other_specify_LBR.DLY.BTH.FEED
poor_suck_LBR.DLY.BTH.FEED stiff_infant_LBR.DLY.BTH.FEED
lethargic_overly_sleepy_LBR.DLY.BTH.FEED proband_speech.delay.LANG.DIS
pat_speech.delay.LANG.DIS mat.halfsibling_speech.delay.LANG.DIS
mat.cousin_speech.delay.LANG.DIS mat.auntuncle_speech.delay.LANG.DIS
mat.grandparent_speech.delay.LANG.DIS y.n_expressive.lang.disorder.LANG.DIS
mat_expressive.lang.disorder.LANG.DIS sibling_expressive.lang.disorder.LANG.DIS
pat.cousin_expressive.lang.disorder.LANG.DIS pat.auntuncle_expressive.lang.disorder.LANG.DIS
y.n_receptive.lang.disorder.LANG.DIS mat_receptive.lang.disorder.LANG.DIS
sibling_receptive.lang.disorder.LANG.DIS proband_mixed.expressive.disorder.LANG.DIS
pat_mixed.expressive.disorder.LANG.DIS mat.cousin_mixed.expressive.disorder.LANG.DIS
mat.auntuncle_mixed.expressive.disorder.LANG.DIS mat.grandparent_mixed.expressive.disorder.LANG.DIS
proband_communication.disorder.LANG.DIS mat.cousin_communication.disorder.LANG.DIS
y.n_pragmatics.lang.disorder.LANG.DIS sibling_pragmatics.lang.disorder.LANG.DIS
pat.cousin_pragmatics.lang.disorder.LANG.DIS pat.auntuncle_pragmatics.lang.disorder.LANG.DIS
y.n_stuttering.LANG.DIS mat_stuttering.LANG.DIS
sibling_stuttering.LANG.DIS pat.halfsibling_stuttering.LANG.DIS
pat.cousin_stuttering.LANG.DIS pat.auntuncle_stuttering.LANG.DIS
pat.grandparent_stuttering.LANG.DIS proband_reduced.articulation.LANG.DIS
pat_reduced.articulation.LANG.DIS mat.halfsibling_reduced.articulation.LANG.DIS
mat.cousin_reduced.articulation.LANG.DIS mat.auntuncle_reduced.articulation.LANG.DIS
mat.grandparent_reduced.articulation.LANG.DIS y.n_other.lang.disorder.LANG.DIS
proband_other.lang.disorder.LANG.DIS pat_other.lang.disorder.LANG.DIS
mat.halfsibling_other.lang.disorder.LANG.DIS mat.cousin_other.lang.disorder.LANG.DIS
mat.auntuncle_other.lang.disorder.LANG.DIS mat.grandparent_other.lang.disorder.LANG.DIS
Correlations(Compl.Pairs)
Figure 3: An overview blockplot of 757 variables. Groups of variables are marked by background highlight squares along the ascending diagonal. The blockplot of Figure 1 is contained in this larger plot and can be found in the small highlight square in the lower left marked by a faint crosshair. Readers who are viewing this document in a PDF reader may zoom in to verify that this highlight square contains an approximation to Figure 1.
The function of this blockplot is not so much to facilitate discovery as to provide an overview organized in such a way that meaningful subsets are recognizable to the expert who is knowledgeable about the dataset. The tool helps in this regard by providing a way to group the variables and showing the variable groups as diagonal highlight squares. In Figure 3 two large highlight squares are recognizable near the lower left and the upper right. Closer scrutiny should allow the reader to recognize many more much smaller highlight squares up and down the diagonal, each marking a small variable group. In particular, the reader should be able to locate the third-largest highlight square in the lower left, shown in turquoise as opposed to gray and pointed at by a faint crosshair: this square marks the group of 38 variables shown in Figures 1 and 2.
The mechanism by which variable grouping is conveyed to the AN is a naming convention for variable names: to define a variable group, the variables to be included must be given names that end in the same suffix separated by an underscore “_” (default, can be changed), and the variables must be contiguous in the order of the dataset. As an example, Figure 4 shows the 38 variable group of Figure 1 in the context of its neighbor groups: This group is characterized by the suffix “p1.CDV”, whereas the neighbor group on the upper right (only a small part is visible) has the suffix “p1.OCUV” and the two groups on the lower left have suffixes “cuPARENT” and “racePARENT”.5 The background highlight squares cover the intra-group correlations for the respective variable groups. As in Figure 3, the highlight square for the 38 variable group is shown in turquoise whereas the neighboring highlight squares are in gray.
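The naming convention can be illustrated with a minimal Python sketch. It is not the AN's own code: the helper names `group_suffix` and `contiguous_groups` are hypothetical, and the sketch assumes that the group suffix is whatever follows the last separator in a variable name.

```python
from itertools import groupby


def group_suffix(name, sep="_"):
    """Return the part of a variable name after the last separator.

    A name without any separator is returned unchanged (an assumption
    of this sketch; the text does not say how such names are treated).
    """
    return name.rsplit(sep, 1)[-1]


def contiguous_groups(names, sep="_"):
    """Collect runs of adjacent names that share a suffix, in dataset order."""
    return [
        (suffix, list(run))
        for suffix, run in groupby(names, key=lambda n: group_suffix(n, sep))
    ]


# A few variable names taken from the axis labels of Figure 4.
names = [
    "srs_adult_total.mo_cuPARENT",
    "age_at_ados_p1.CDV",
    "family_type_p1.CDV",
    "sex_p1.CDV",
    "bckgd_hx_highest_edu_mother_p1.OCUV",
]

groups = contiguous_groups(names)
for suffix, members in groups:
    print(suffix, len(members))
```

Because `groupby` only merges adjacent items, two runs with the same suffix that are separated by other variables would come out as two groups, which mirrors the requirement that group members be contiguous in the dataset.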
Figures 1-4 are a prelude for the zooming and panning functionality to be described in Section 4.
3.3 Other Uses of Blockplots (1): P-Values
Associated with correlations are other quantities of interest that can also be displayed with blockplots, foremost among them the p-values of the correlations. A p-value in this case is a measure of evidence in favor of the assumption that the observed correlation is spurious, that is, its deviation from zero is due to chance alone while the population correlation is zero.6
P-values are hypothetical probabilities, hence they fall in the interval [0, 1]. As p-values represent evidence in favor of the assumption that no linear association exists, it is small p-values that are of interest, because they indicate that the chance of a spurious detection of linear association is small. By convention one is looking for p-values at least below 0.05, for a “Type I error” probability of one in twenty or less. When considering p-values of many correlations on the same dataset, as is the case here, one needs to protect against “multiplicity”, that is, the fact that 5% of p-values will be below 0.05 even if in truth all
5 These suffixes abbreviate the following full-length meanings: “proband 1, core descriptive variables”, “proband 1, other commonly used variables”, “commonly used for parents” and “race of parents”. These variable groups are from the following SSC tables: proband cdv.csv, proband ocuv.csv, and parent.csv.
6 Technically, the (two-sided) p-value of a correlation is the hypothetical probability of observing a future sample correlation greater in magnitude than the sample correlation observed in the actual data, assuming that in truth the population correlation is zero.
[Figure 4 image: blockplot; axis labels are the variable names of the “racePARENT”, “cuPARENT”, “p1.CDV” and “p1.OCUV” groups; panel title: Correlations (Compl. Pairs)]
Figure 4: The 38 variable group of Figure 1 in the context of the neighboring variable groups.
population correlations vanish. Such protection is provided by choosing a threshold much smaller than 0.05, by the conservative Bonferroni rule as small as 0.05/#correlations. In the data example with 757 variables, the number of correlations is 286,146, hence one might want to choose the threshold on the p-values as low as 0.05/286,146 or about 1.75 in 10 million. The point is that in large-p problems one is interested in very small p-values.7
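The arithmetic behind these figures can be checked with a few lines of Python. This is only a sketch of the calculation described above; the function names are chosen here for illustration.

```python
def n_pairs(n_vars):
    """Number of distinct variable pairs, i.e. of correlations in the table."""
    return n_vars * (n_vars - 1) // 2


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold under the Bonferroni rule."""
    return alpha / n_tests


n_corr = n_pairs(757)
threshold = bonferroni_threshold(0.05, n_corr)

print(n_corr)               # 286146
print(f"{threshold:.3g}")   # 1.75e-07, i.e. about 1.75 in 10 million
```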
P-values lend themselves easily to graphical display with blockplots, but the direct mapping of p-value to blocksize has some drawbacks. These drawbacks, on the other hand, can be easily fixed:
• P-values are blind to the sign of the correlation: correlation values of +0.95 and -0.95, for example, result in the same two-sided p-value. We correct for this drawback by showing p-values of negative correlations in red color.
• Of interest are small p-values that correspond to correlations of large magnitude, hence a direct mapping would represent the interesting p-values by small blocks, which is visually incorrect because the eye is drawn to large objects, not to large holes. We therefore invert the mapping and associate blocksize with the complement 1−(p-value).
• Drawing on the preceding discussion, our interest is really in very small p-values, and one may hence want to ignore p-values greater than 0.05 altogether in the display. We therefore map the interval [0, 0.05] inversely to blocksize, meaning that p-values below but near 0.05 are shown as small blocks and p-values very near 0.00 as large blocks.
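The three fixes listed above can be combined into one small mapping, sketched here in Python. Several details are assumptions of this sketch rather than facts from the text: the mapping is taken to be linear, the size is a relative value in [0, 1] (whether it scales block side or block area is left open), and "blue" for non-negative correlations is a placeholder color; only red for negative correlations is stated in the text.

```python
def pvalue_block(p_value, correlation, cutoff=0.05):
    """Map a p-value to a (relative block size, color) pair.

    Size is 1 for p = 0, shrinks linearly to 0 at p = cutoff,
    and stays 0 (block not drawn) for p above the cutoff.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    size = max(0.0, 1.0 - p_value / cutoff)
    # Red marks negative correlations; "blue" is a hypothetical default.
    color = "red" if correlation < 0 else "blue"
    return size, color


print(pvalue_block(0.0, 0.5))     # largest block
print(pvalue_block(0.04, -0.3))   # small red block
print(pvalue_block(0.20, -0.3))   # size 0.0, nothing drawn
```

Lowering `cutoff` to 0.0000001 gives the more stringent truncation level of the kind used for the second panel of Figure 5.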
The resulting p-value blockplots are illustrated in Figure 5. The two plots show the same 38 variables group as in Figure 1 with p-values truncated at 0.05 and at 0.000,000,1, respectively, as shown near the bottom left corners of the plots. The p-values are calculated using the usual normal approximation to the null distribution of the correlations. In view of the large sample size, n ≥ 1,800, the normal approximation can be assumed to be quite good, even though one is going out on a limb when relying on normal tail probabilities as small as 10^−7. Then again, p-values this small are strong evidence against the assumption that the correlations are spurious.
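For illustration, a two-sided p-value under a normal approximation can be sketched in Python as follows. The exact formula used by the AN is not given in the text; this sketch assumes that, under the null hypothesis of zero population correlation, sqrt(n) times the sample correlation is approximately standard normal. The function name is hypothetical.

```python
import math


def correlation_p_value(r, n):
    """Two-sided p-value of a sample correlation r from n complete pairs.

    Assumes sqrt(n) * r is approximately N(0, 1) under the null hypothesis.
    """
    z = abs(r) * math.sqrt(n)
    # For a standard normal Z, P(|Z| > z) equals erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))


for r in (0.05, 0.13, 0.30):
    print(r, correlation_p_value(r, 1800))
```

With n = 1,800, a correlation of 0.05 already falls below the 0.05 level, and a correlation of 0.13 falls below 0.000,000,1, which shows how modest correlations become highly significant at this sample size.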
3.4 Other Uses of Blockplots (2): Fraction of Missing and Complete Pairs of Values
Missing values are so common that they require special attention and special tools for understanding their patterns. Missing values are sometimes approached with imputation methods, but in view of the large number of variables we wish to explore we use simple deletion methods that rely on the largest number of available values. For correlations this means that we use for a given pair of variables the full set of complete pairs of values. Another common and more stringent deletion method is to use only cases that are complete on all variables,
7 The letters ‘p’ in “large-p” and “p-value” bear no relation. In the former, p is derived from “parameter”, in the latter from “probability”.
[Figure 5 images: two blockplots of the 38 “p1.CDV” variables, age_at_ados_p1.CDV through febrile_seizures_p1.CDV; panel titles: P-values<.05 (Normal), left, and P-values<.0000001 (Normal), right]
Figure 5: Blockplots of the p-values for the 38 variable group of Figure 1. Smaller and hence statistically more significant p-values are shown as larger blocks. The colors are inherited from the correlations to reflect their signs. Truncation levels of p-values: Left ≥ 0.05; right ≥ 0.0000001. Many modest correlations are extremely statistically significant due to n ≥ 1,800.
[Figure 6 panels: "#MissingPairs/N" (left) and "#CompletePairs/N" (right); both axes of each blockplot list the 38 variables of Figure 1.]
Figure 6: Blockplots of the fractions of missing (left) and complete (right) pairs of values.
but in the large-p problem this is not a viable approach because complete cases may well not exist when the number of variables reaches into the hundreds or thousands.
An issue with calculating correlations from maximal sets of complete pairs of values is that this set may vary from correlation to correlation because it is formed from the overlap of non-missing values in both variables. Thus, associated with each correlation r(x, y) are
• the number n(x, y) (≤ n) of complete pairs from which r(x, y) is calculated, and
• the number m(x, y) = n − n(x, y) of incomplete pairs where at least one of the two, x or y, is missing.
Just like the correlations r(x, y), the values n(x, y) and m(x, y) form p × p tables (p being the number of variables), hence can be easily visualized with blockplots in their fractional forms n(x, y)/n and m(x, y)/n. An example of each is shown in Figure 6, again for the same 38 variable group of Figure 1. Apparently four variables have a major missing value problem.
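To make the three quantities concrete, the following is a minimal sketch of how r(x, y), n(x, y) and m(x, y) might be computed for one pair of variables. It is not the AN's implementation: the function name `pairwise_complete_corr`, the encoding of missing values as `None`, and the toy data are all assumptions made for illustration.

```python
import math

def pairwise_complete_corr(x, y):
    """Pearson correlation computed from the complete pairs of x and y.

    Returns (r, n_xy, m_xy): n_xy counts the pairs with both values present,
    m_xy = n - n_xy counts the pairs with at least one value missing.
    Missing values are encoded as None (an assumption of this sketch).
    """
    n = len(x)
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n_xy = len(pairs)
    m_xy = n - n_xy
    if n_xy < 2:
        return float("nan"), n_xy, m_xy
    mean_a = sum(a for a, _ in pairs) / n_xy
    mean_b = sum(b for _, b in pairs) / n_xy
    s_ab = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    s_aa = sum((a - mean_a) ** 2 for a, _ in pairs)
    s_bb = sum((b - mean_b) ** 2 for _, b in pairs)
    if s_aa == 0 or s_bb == 0:
        # A constant variable has no defined correlation.
        return float("nan"), n_xy, m_xy
    return s_ab / math.sqrt(s_aa * s_bb), n_xy, m_xy

# Made-up toy data: 5 cases, one missing value in each variable.
x = [1.0, 2.0, None, 4.0, 5.0]
y = [2.0, None, 6.0, 8.0, 10.0]
r, n_xy, m_xy = pairwise_complete_corr(x, y)
```

On this toy data only three of the five cases contribute to the correlation, which illustrates why the set of cases behind each entry of a correlation table can differ from entry to entry.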
Depending on whether the number of complete or incomplete pairs dominates, one or the other plot is more sensible in that it uses less ink. Finally, we note that in this case of a blockplot the diagonal is not occupied by a constant but contains instead the fraction of non-missing (n(x, x)/n) and missing (m(x, x)/n) values, respectively, for each individual variable x. The two tables have inverse relationships between the diagonal and off-diagonal elements: n(x, x) ≥ n(x, y) and m(x, x) ≤ m(x, y). That is, in the n(x, y)-table the diagonal dominates its row and column, whereas in the m(x, y)-table the diagonal is dominated by its row and column.
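The diagonal relations can be checked on a small example. The sketch below builds both count tables for a handful of variables and verifies the two inequalities; the helper name `pair_count_tables`, the use of `None` for missing values, and the toy data are assumptions of this illustration, not part of the AN.

```python
from itertools import product

def pair_count_tables(columns):
    """Build the n(x, y) and m(x, y) tables for a dict of equal-length columns.

    Missing values are encoded as None (an assumption of this sketch).
    """
    names = list(columns)
    n = len(columns[names[0]])
    n_tab, m_tab = {}, {}
    for x, y in product(names, repeat=2):
        n_xy = sum(a is not None and b is not None
                   for a, b in zip(columns[x], columns[y]))
        n_tab[(x, y)] = n_xy        # complete pairs
        m_tab[(x, y)] = n - n_xy    # incomplete pairs
    return n_tab, m_tab, n

# Made-up toy data: 6 cases, 3 variables.
data = {
    "a": [1.0, 2.0, None, 4.0, 5.0, 6.0],
    "b": [None, 1.0, 2.0, 3.0, None, 5.0],
    "c": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
}
n_tab, m_tab, n = pair_count_tables(data)

# Fractional forms, i.e., the values a blockplot of each table would display.
frac_complete = {k: v / n for k, v in n_tab.items()}
frac_missing = {k: v / n for k, v in m_tab.items()}

# Diagonal relations: n(x, x) >= n(x, y) and m(x, x) <= m(x, y).
n_diag_dominates = all(n_tab[(x, x)] >= n_tab[(x, y)] for x, y in n_tab)
m_diag_dominated = all(m_tab[(x, x)] <= m_tab[(x, y)] for x, y in m_tab)
```

Both checks hold for any data, because a pair (x, y) can only be complete in cases where x itself is non-missing.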
3.5 Marginal and Bivariate Plots: Histograms/Barcharts, Scatterplots, and Scatterplot Matrices
The correlation of a pair of variables is a simple summary measure of association between two variables, hence one often wonders about the detailed nature of the association. The full details can be learned from a scatterplot of the two variables. Often the association is constrained by the marginal distribution, hence we also show histograms and barcharts. Figure 7 shows three examples of triples consisting of a pairwise scatterplot and two marginal histograms (for quantitative variables) and barcharts (for categorical variables). From Figure 7 we can draw a few conclusions and recommendations:
• A most basic use of the plots is to note the type of the variables: In Figure 7, both variables on the left (..nonverbal_iq.. and ..verbal_iq..) and the y-variable in the center (..vma..) are quantitative, the x-variable in the center (ados_module..) is apparently ordinal with four levels, and both variables on the right (nonverbal_iq_type and verbal_iq_type) are binary. Quantitative variables can have strong marginal features: It might be of interest to observe that the x-variable on the left is slightly bimodal, with a major mode around x = 90 and a minor mode around x = 30.⁸ The y-variable in the center
⁸ The bimodality of the IQ distribution is a measurement artifact: For cognitively highly impaired probands a different and more appropriate IQ test is administered. In theory this alternative test should be
[Figure 7 panels. Frame #1: ssc_diagnosis_nonverbal_iq_p1.CDV (y) vs. ssc_diagnosis_verbal_iq_p1.CDV (x), n = 1887, Corr = 0.824 (pval = 0). Frame #2: ssc_diagnosis_vma_p1.CDV (y) vs. ados_module_p1.CDV (x, levels 1 to 4), n = 1880, Corr = 0.707 (pval = 0). Frame #3: ssc_diagnosis_nonverbal_iq_type_p1.CDV (y) vs. ssc_diagnosis_verbal_iq_type_p1.CDV (x), both with levels 1 and 2, n = 1887, Corr = 0.76 (pval = 0). Each frame also shows frequency histograms or barcharts of its two variables.]
Figure 7: Scatterplots and histograms/barplots for three variable pairs.
scatterplot is partially censored on the upper side at about y = 210, as can be seen both in the scatterplot and in the (lower) histogram.
• Categorical variables, when scored numerically, can be gainfully displayed in scatterplots. It is useful to jitter them to avoid being misled by overplotting. In Figure 7, jittering is applied to the x-variable in the center scatterplot and to both binary variables in the right-hand scatterplot.
• To enhance the perception of the association, the scatterplots can be decorated with smooths for continuous variables and with traces of group means when the x-variable is categorical with fewer than, say, 8 groups (default, can be changed). In the left and center scatterplots of Figure 7, the associations of the y-variables with the x-variables are seen to be somewhat non-linear, but compared to the linear component of the association, the non-linearities are relatively modest.⁹
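The two recommendations on jittering and group-mean traces can be sketched in a few lines. This is an illustration under stated assumptions, not the AN's code: the function names `jitter` and `group_means`, the uniform jitter of half-width 0.2, and the toy data are hypothetical; only the default threshold of 8 groups is taken from the text.

```python
import random
from collections import defaultdict

def jitter(values, width=0.2, seed=0):
    """Add uniform noise in [-width, width] to numerically scored categories."""
    rng = random.Random(seed)
    return [v + rng.uniform(-width, width) for v in values]

def group_means(x, y, max_groups=8):
    """Mean of y within each level of x, or None if x has too many levels.

    A trace is produced only when x has fewer than max_groups distinct levels.
    """
    groups = defaultdict(list)
    for a, b in zip(x, y):
        groups[a].append(b)
    if len(groups) >= max_groups:
        return None
    return {level: sum(vals) / len(vals)
            for level, vals in sorted(groups.items())}

# Made-up toy data: an ordinal x with four levels and a quantitative y.
module = [1, 1, 2, 2, 3, 3, 4, 4]
vma = [20.0, 30.0, 40.0, 60.0, 90.0, 110.0, 150.0, 170.0]

x_jittered = jitter(module)        # plotting positions that avoid overplotting
trace = group_means(module, vma)   # values to connect as a trace of group means
```

Jittering changes only the plotting positions; the group means are computed from the original, unjittered levels.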
The AN shows scatterplots and histograms/barcharts in a window separate from the blockplot window, one triple of plots at a time. To overcome the one-at-a-time limitation, the AN also offers scatterplot matrices (sometimes called “sploms”) of arbitrary numbers of variables. An example, involving four variables (different from those in Figure 7), is shown in Figure 8. For readers not familiar with scatterplot matrices, note that each variable pair is shown twice, in plots located symmetrically off the diagonal, and with reverse roles as x- and y-variables. Each diagonal cell shows a variable label that indicates (1) the common x-axis in the column of the cell and (2) the common y-axis in the row of the cell. For the reader familiar with scatterplot matrices, note that we show the vertical order of the variables ascending from bottom to top, the reason being consistency with the convention we use in the blockplots.
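The layout convention just described can be written down as a small sketch. The function `splom_layout` and the variable names are hypothetical, and the left-to-right ascending order of the columns is an assumption; the bottom-to-top ascending order of the rows follows the text.

```python
def splom_layout(variables):
    """Return the grid of a scatterplot matrix, rows listed from top to bottom.

    Diagonal cells hold the variable label; every other cell holds the pair
    (x-variable, y-variable) shown in that panel.
    """
    p = len(variables)
    grid = []
    for display_row in range(p):
        i = p - 1 - display_row  # variables ascend from bottom to top
        row = []
        for j in range(p):
            if i == j:
                row.append(variables[i])
            else:
                # Column j fixes the x-axis, row i fixes the y-axis.
                row.append((variables[j], variables[i]))
        grid.append(row)
    return grid

grid = splom_layout(["v1", "v2", "v3"])
for row in grid:
    print(row)
```

With this ordering the diagonal runs from the bottom-left cell to the top-right cell, and each pair of variables appears in two panels with the roles of x and y exchanged.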
As for particulars of the scatterplot matrix shown in Figure 8, the visually most striking features concern marginal distributions, not associations: The first variable is capped at the maximal value +90, and the fourth variable is binary. Otherwise the associations look simply monotone and seem well-summarized by correlations.
3.6 Variations on Blockplots
Blockplots are not the most common visualizations of correlation tables. As a Google search of “correlation plot” reveals, the most frequent visual rendering of correlation tables is in terms of “heatmaps” where square cells are always filled and numeric values are coded on a gray or color scale. An example is shown in the left frame of Figure 9; for comparison, the right frame shows the corresponding blockplot. Here are a few observations about the two types of plots:
scaled to cohere with the test administered to the majority, but in practice it creates a minor mode in the low end of the IQ distribution, more so for verbal IQ than nonverbal IQ.
⁹ The non-linearity on the left could be due to the marginal distributions. The non-linearity in the center is expected by the expert: verbal mental age (vma) on the y-axis should be considerably higher on average in ADOS modules 3 and especially 4 because these modules or levels are formed from a simple test of language competence.
[Figure 8, scatterplot matrix panels; diagonal labels recovered from the plot: ados2algorithmp1.OCUV, srs_teacher_t_score_p1.CDV, srs_parent_t_score_p1.CDV.]
●
●
●
●
●
●
●
●
●
●
●● ●
●
●
●●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
● ●
●
●
●
● ●
●
●●
●●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
● ●
● ●
●●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●●
●
● ●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●●
●
●
●
● ●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
● ●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
● ●
●
●
●
●
● ●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
● ●●
●
●
●
●
●
●●
●
●
vinelandii
compositestandard
scorep1.CDV
40 50 60 70 80 90
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●●
●●●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●●
●
● ●
●
●
●●●
●●
●●
●
●
●●
●
●
●
●
●
●
● ●●
●
●
●
●
●
●
●
●●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
● ●
●
● ●
● ●
●
●
● ●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●● ●
●●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
● ●
●
●
●
●
●●
●● ●
●●
●
●
●
● ●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●●
●
●●
●
●
●
●
●●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●● ●
●
●
●
●●
● ●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●●●
●
●
●
●
●
●●
●
●● ●
●
●
●
●
●
●
●●
●
●
●
●●●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●●
●●●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●●●
●
●
●
●
●●
●
●
● ●
●●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●●
●
●●●
●
●
●
●
●●
●●
●
●●
●
●
●
● ●
●●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●● ●
●●●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●●●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
● ●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
● ●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
● ●●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
● ●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
● ●●
●
●
●
●
●
●
●●
●
●
●
●●●
●●
● ●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
● ●
●
●●
●
●●
●●
● ●
●
●
●
●
●
●
● ●
●
●●●
●●
●
●●
●
●
●●
●
●
●
●
●
●
● ●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
● ●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●● ●
●
●
●
●
●
●
●●
●●●
●
●
●
●
●●●
●
●●
●
●
●
●● ●
●
●● ●
●
●
●
●
●
●
●
●
●●
●
●
●
● ●
● ●
●●●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
● ●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
● ●
●●●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
● ●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●●
●
●
●
●
●●
●●
●
● ●
●
●
●
● ●●
●
●
●●
●
●
●
●●
●●
●
●
●
●
●●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
● ●
● ●
●
●
●
●
● ●
●
●
●
●
●
●
●●●
●●
●
●
●
●
●
●
●
● ●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●● ●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●●
●
●●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●●
●
●
●
●●
●
●
●● ●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●● ●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
● ●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●●
●
● ●
●
● ●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●●
●●
●
●
●
●
●
● ●
●
●
●●
●
●●
●
●
● ●
●
●
●
●
●
●
●
●●
●
●
●
●
●
● ●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
● ●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ●●●
●●
●●
●
●
●
●●
●
●●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●●
●
●●
●
●
●
●
●●
●●
●
● ●
●●●
●
●●
●
●
●
● ●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●●
●
●
●
●
● ●
●
●
●
●
● ●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●●
●
● ●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●●
●
●
● ●●
●
●●
●
●
●
●●
●●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
● ●
●
●
●●
●
●
●●
●●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
● ●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●● ●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
0 2 4 6
2040
6080
100
120
●● ●
●
●
●
●
●
● ●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●● ●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●●
●
●
●
●●
●
●
●● ●
●
●
●
●
●
●
●
●●
●
●
● ●●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
● ●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
● ●●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
● ●
●●●
●
●
●
●●●
●
●
●
● ●●●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
● ●●
●
●
●
● ●●
●●
●●
●●
● ●
● ●●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●●●
●
●
●
●
●
●
●
●
● ●
●
●●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
Figure 8: Scatterplot matrix of four variables. (Note the convention for the vertical order of the variables: bottom to top, for consistency with the blockplots.)
• Color and gray scale are generally weaker visual cues than size. This argument favors blockplots as long as the blocks are not too small, that is, as long as the view is not zoomed out too much. The superiority of blockplots over heatmaps is also noted by Wickham, Hofmann and Cook (2006, Figure 2).
• In heatmaps color fuses adjacent cells when they are close in value. This may or may not be a problem for the trained eye, but there is a loss of identity of the rows and columns in heatmaps.
• Heatmaps do not permit mark-up with background color because they fill the square or rectangular cells completely. This problem can be overcome by shrinking the heatmap cells somewhat to allow some surrounding space to be freed up that can be filled with background color for mark-up, as shown in the center frame of Figure 9. This method of rendering, however, seems to further decrease the crispness of heatmaps.
• Heatmaps perform nicely when the view is heavily zoomed out, in which case the individual blocks are so small that size is no longer visually functional as a cue. In this case color coding works well and gives an accurate impression of global structure. We solve this problem for blockplots by showing only 10,000 or so of the largest correlations when heavily zoomed out. Thinning the table in this manner works well even when the visible table is so large that each cell is strictly speaking below the pixel resolution of a raster screen.
Because neither of the two types of plots (blockplots or heatmaps) is uniformly superior at all scales, the AN provides both, and with one keystroke one can toggle between the two rendering methods. Varying block size allows for the mixed variant shown in the center of Figure 9.
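The thinning of a heavily zoomed-out table mentioned above can be sketched in a few lines of base R. This is an illustration only, not the AN's implementation: the function name thin_correlations and its argument k are hypothetical, and the rule "keep the k largest magnitudes" is an assumption based on the description.

```r
# Hypothetical helper (not part of the AN): keep only the k largest
# off-diagonal correlations by magnitude and blank all other cells with NA.
thin_correlations <- function(cmat, k = 10000) {
  out <- cmat
  diag(out) <- NA                        # the diagonal carries no information
  mags <- abs(out)
  vals <- sort(mags[!is.na(mags)], decreasing = TRUE)
  if (length(vals) > k) {
    out[!is.na(mags) & mags < vals[k]] <- NA   # drop everything below the cutoff
  }
  out
}

set.seed(1)                              # arbitrary seed, for reproducibility
x <- matrix(rnorm(50 * 30), ncol = 30)   # 30 noise variables, 50 cases
thinned <- thin_correlations(cor(x), k = 100)
# Cells that would still be drawn; each variable pair occupies two
# symmetric cells of the table.
n_drawn <- sum(!is.na(thinned))
```

Because the table is symmetric, ties at the cutoff can leave slightly more than k cells; a production version would have to decide how to break them.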
Visualization of correlation tables has a small literature in statistics. An early reference that addresses large correlation tables is Hills (1969), who applies half-normal plots to tell statistically significant from insignificant correlations and clusters variables visually in two-dimensional projections. Closer to the present work are articles by Murdoch and Chow (1996) and Friendly (2002). Both propose relatively complex renderings of correlations with ellipses or augmented circles that may not scale up to the sizes of tables we have in mind but may be useful for conveying richer information for tables that are smaller, yet too large for numeric table display. Blockplot coding, which uses squares, has the advantage that these shapes can completely fill their cells to represent extremal correlations as these are geometrically similar to the shapes of the containing cells (at least if the default aspect ratio of the blockplot is maintained), whereas all other shapes leave residual space even when maximally expanded.
What we prefer to call descriptively “blockplots”, possibly contracted to “blots”, has previously been named “fluctuation diagrams” (Hofmann 2000). Under this term one can find a static implementation in the R package extracat on the CRAN site, authored by Pilhoefer and Unwin (2013). Static software for heatmaps is readily available, for example, in the R function heatmap(). Heatmaps are often applied to raw data tables, but they can be equally applied to correlation tables. Many variations of glyph coding can be found in the classic book by Bertin (1983).
An interesting aspect of blockplots is that there exists science regarding the perception of area size. A general theory holds that most continuous stimuli (“continua” such as length, area, volume, weight, brightness, loudness, ...) result in perceptions according to “Stevens’ power law” (Stevens (1957), Stevens and Galanter (1957) and Stevens (1975)). That is, a quantitative stimulus $x$ translates to a quantitative perception $p(x)$ through a law of the form $p(x) = c\,x^{\beta}$. As discussed by Cleveland (1985, p. 243) with reference to Stevens (1975), for area perception the power is about $\beta = 0.7$, meaning that an actual area ratio of 2:1 is on average perceived as a ratio of $(2{:}1)^{0.7} \approx 1.62$. This law can be leveraged to determine the transformation that should be used to map correlations to squares in a blockplot. In R the symbol size is parametrized in terms of a linear expansion factor called cex (‘character expansion’). Our goal is to use block sizes such that their perceived ratios faithfully reflect the ratios of respective correlations. This results in the condition $\mathrm{cor} \sim p(\mathrm{cex}^2) = (\mathrm{cex}^2)^{0.7} = \mathrm{cex}^{1.4}$, hence $\mathrm{cex} \sim \mathrm{cor}^{1/1.4} \approx \mathrm{cor}^{0.7}$. This is indeed the default power transformation in the AN, although users can change it (see Appendix B). If most correlations are very small, a power closer to zero will expand the range of small values, resulting in enhanced discrimination at the low end at the cost of attenuated discrimination at the high end.
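The derivation above can be checked numerically. The following is a minimal sketch, not the AN's own code: the function name cor_to_cex and the argument max_cex are hypothetical, and only the exponent 0.7 for area perception is taken from the text.

```r
# Hypothetical helper (not part of the AN): map correlation magnitudes to a
# linear symbol expansion so that perceived block areas are proportional to |cor|.
cor_to_cex <- function(cor, beta = 0.7, max_cex = 1) {
  # perceived size ~ area^beta = (cex^2)^beta, hence cex ~ |cor|^(1 / (2 * beta))
  max_cex * abs(cor)^(1 / (2 * beta))
}

r <- c(0.1, 0.2, 0.4, 0.8)
cex <- cor_to_cex(r)
perceived <- (cex^2)^0.7         # Stevens' law with c = 1
ratio <- perceived / r           # constant if perception tracks |cor|
print(round(ratio, 6))
print(round(2^0.7, 2))           # perceived ratio for an actual 2:1 area ratio
```

The exact exponent used here is 1/1.4 (about 0.714); the text rounds this to 0.7, which is close enough for display purposes.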
[Figure 9 panels omitted: heatmap, shrunk heatmap, and blockplot of the same 38 variables, from diagnosis_ados_p1.CDV through word_delay_p1.OCUV (suffix groups p1.CDV and p1.OCUV).]
Figure 9: A heatmap (left) compared with a corresponding blockplot (right), as well as a “shrunk heatmap” in the center.
4 Operation of the Association Navigator
The purpose of the AN is to generate the displays described above in rapid order and even with realtime motion. Numerous realtime operations are under mouse and keyboard control, while a few text-based operations are under dialog and menu control. Further parameters can be controlled from the R language (see Appendix B), but this will not be necessary for most users. This section describes the operations of the AN, the purposes they serve, as well as a minimal set of R-related instructions that concern one-time setup, regular starting up, and saving of state. The software will be available as an R package, but the instructions below do not reflect this and get the reader going by sourcing the software from the first author’s site.
4.1 Starting Up the AN
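In order to simply see an AN running, the reader may paste the following code into an R interpreter:
source("http://stat.wharton.upenn.edu/~buja/association-navigator.R")  # download and source the AN software
p <- 200   # number of variables (columns)
n <- 100   # sample size (rows)
mymatrix <- matrix(rnorm(n * p), ncol = p)
# Name the variables V1_A, ..., V100_A, V101_B, ..., V200_B to form two groups.
colnames(mymatrix) <- paste("V", 1:p, "_", c(rep("A", p/2), rep("B", p/2)), sep = "")
a.n <- a.nav.create(mymatrix)   # create an AN instance from the data matrix
a.nav.run(a.n)                  # start the AN; type 'Q' in its window to quit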
This code will download and source the software, generate an artificial data matrix of normal random numbers, generate an instance of an AN from it, and start up by creating a window showing a blockplot of correlations as they arise from pure random association among 200 variables given a sample size of 100, divided into two blocks of 100 variables each, suffixed “A” and “B”, respectively. The reader may left-drag the mouse in the plot to see a first realtime response.
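To calibrate the eye for what this startup example displays, one can inspect the same kind of correlations directly. The sketch below uses only base R and does not require the AN; the seed is arbitrary. For independent normal variables and sample size n, the sample correlations scatter around zero with a standard deviation of 1/sqrt(n - 1), so with n = 100 blocks of magnitude around 0.1 are to be expected from noise alone.

```r
set.seed(2016)                       # arbitrary seed, for reproducibility only
p <- 200                             # number of variables
n <- 100                             # sample size
mymatrix <- matrix(rnorm(n * p), ncol = p)

cors <- cor(mymatrix)
off_diag <- cors[upper.tri(cors)]    # the p * (p - 1) / 2 distinct variable pairs

n_pairs <- length(off_diag)
spread <- sd(off_diag)               # empirical spread of the spurious correlations
theory <- 1 / sqrt(n - 1)            # null standard deviation of a correlation
largest <- max(abs(off_diag))        # largest purely spurious correlation

print(c(pairs = n_pairs, sd = spread, theory = theory, max_abs = largest))
```

The largest spurious magnitude is several times the typical one, which is one reason the p-value thresholding described later is useful when scanning many pairs.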
To prevent confusion in the operation of an AN, users should note the following fundamental points:
• Important: While the AN is running, the R interpreter (R Gui) is blocked by the execution of the AN’s event loop! All interactions must be directed at the master window of the AN, which usually shows a blockplot.
• Quitting the AN and returning to the R interpreter is done by typing the capital letter ‘Q’ into the AN master window. The master window will remain as a passive R plot window. It will no longer respond to user input, but the R interpreter (R Gui) will be responsive again. (A live AN can also be stopped violently by typing the interrupt character ctrl-C into the R interpreter or by killing the AN master window, but an educated R user wouldn’t be this crude.)
• Help: On typing the letter ‘h’ into a live AN, a help window will appear with terse documentation of all AN interactions. The window is meant to give reminders to previously initiated AN users, not introductions to beginners. The help window is actually a menu such that selecting a line documenting a keystroke will emulate the effects of the keystroke. Because the help window is a menu, it must be closed in order to regain the AN’s attention. (This behavior will be changed in a future version.)
• Notion of ‘state’: An AN instance has internal state. As a consequence, whenever a user stops a live AN and restarts it, it will resume in the exact state in which it was stopped.
• Saving ‘state’: It follows from the previous point that the state of an AN is saved across R sessions if the workspace image has been saved (save.image()) before quitting the R session.
4.2 Moving Around: Crosshair Placement, Panning and Zooming
When an AN is run for the first time, it shows an overview of the complete correlation table, which may comprise hundreds of variables. Most likely the variables will be organized in variable groups that are characterized by shared suffixes of variable names and visually form a series of highlight squares along the ascending diagonal. The first order of business is to zoom in and pan up and down the ascending diagonal to gain an overview of these sub-tables. Here are the steps:
• Crosshair: Place it by left-clicking anywhere in the plotting area. All subsequent zooming is done with regard to the location of the crosshair; it is also the reference point for some panning operations. Repeat left-clicking a few times for practice. The last location of the crosshair will be the target for zooming, described next.
• Zooming: Hit the following for a single step of zooming, or keep depressed for “continuous” zooming.
– ‘i’ for zooming in (alternate: ‘=’).
– ‘I’ for accelerated zooming in (alternate: ‘+’).
– ‘o’ for zooming out (alternate: ‘-’).
– ‘O’ for accelerated zooming out (alternate: ‘_’).
Accelerated zooming changes the visible range by a factor of 2, whereas regular zooming is adjusted such that 12 steps change the visible range by a factor of 2. Thus the accelerated zooms are usually done discretely with single keystrokes, and the regular zooms in “continuous” mode with depressed keys. For practice, zoom in and out a few times with your choice of key alternates.
• Panning (shifting, translating) is most frequently done by dragging the mouse, but keystrokes are sometimes useful for vertical, horizontal, and diagonal searching.
– Left-depress the mouse and drag; the plot will follow. When heavily zoomed out from a large table, the response may be slow. The more zoomed in the view is, the swifter the response to mouse dragging will be.
– ‘←’, ‘→’, ‘↑’, ‘↓’ for translation in the obvious directions by one block/variable per keystroke.
– ‘d’/‘D’ for diagonal moves down/up the ascending 45 degree diagonal.
– ‘ ’, the space bar, for accelerated panning by doing the last single-step keyboard move in jumps of five blocks/variables instead of one.
– ‘.’ to pan so the crosshair location becomes the center of the view.
– ‘[’, ‘]’, ‘{’, ‘}’ to pan so the crosshair location becomes, respectively, the bottom left, the bottom right, the top left, or the top right of the view.
Yet another method of panning will be described below under “Text Search for Variable Names.” Combined pan/zoom based on focus rectangles (highlight rectangles) is described in Section 4.6.
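The zoom arithmetic described above is easy to make concrete. In this sketch the per-step factors follow from the text (a factor of 2 per accelerated step, and 12 regular steps per factor of 2); the starting range of 2,000 variables and the target of 50 are hypothetical numbers chosen for illustration.

```r
accel_factor <- 2                  # accelerated zoom: factor 2 per keystroke
regular_factor <- 2^(1 / 12)       # regular zoom: 12 steps give a factor of 2

visible <- 2000                    # hypothetical number of variables in view
steps <- 0
while (visible > 50) {             # zoom in with regular steps until at most 50 remain
  visible <- visible / regular_factor
  steps <- steps + 1
}
print(steps)                       # regular steps needed
print(ceiling(log2(2000 / 50)))    # accelerated steps needed for the same target
```

The small regular factor (about 1.06 per step) is what makes holding a key down feel like continuous motion, while the accelerated keys cover the same ground in a handful of keystrokes.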
4.3 Graphical Parameters
Graphical parameters that determine the aesthetics of a plot are rarely gotten right by automatic algorithms. The problem of aesthetics is particularly difficult when zooming in and out over several orders of magnitude is the order of the day. The AN therefore does not even attempt to guess pleasing, much less optimal, values for such graphical parameters as font size of variable labels and margin size in blockplots. Instead, the user gets to choose them by trial and error as follows:
• Block size in the blockplot: hit or depress
– ‘b’ to decrease,
– ‘B’ to increase.
After starting up a new AN, adjusting the block size is usually the second operation after zooming in.
• Crosshair size: hit or depress
– ‘c’ to decrease,
– ‘C’ to increase.
Exploding the crosshair by depressing ‘C’ is an effective method for reading the variable names of a given block in the margins.
• Font size of the variable labels: hit or depress
– ‘f’ to decrease,
– ‘F’ to increase.
Important: When the font size is large in relation to the zoom, the variable labels get “thinned out” to avoid gross overplotting (only every second, third ... label might be shown). This allows viewers to at least identify the variable group from the suffix.
• Margin size for the variable labels: hit or depress
– ‘m’ to decrease,
– ‘M’ to increase.
Margin size needs adjusting according to the prevalent label length and font size. A dilemma occurs when, for example, the x-variable labels are much shorter than the y-variable labels. For this situation we want the following:
• Differential margin size for the variable labels: hit or depress
– ‘n’ to decrease the left/y margin and increase the bottom/x margin,
– ‘N’ to increase the left/y margin and decrease the bottom/x margin.
4.4 Correlations, P-values, Missing and Complete Pairs
By default the blockplot of an AN represents correlations, but the user can choose to have it represent p-values, the fraction of missing (incomplete) pairs, or the fraction of complete pairs as follows: Hit
• ‘ctrl-O’ for observed correlations,
• ‘ctrl-P’ for p-values of the correlations (Section 3.3),
• ‘ctrl-M’ for fraction of missing/incomplete pairs (Section 3.4),
• ‘ctrl-N’ for fraction of complete pairs (Section 3.4).
As discussed in Section 3.3, p-values can be thresholded to obtain Bonferroni-style protection against multiplicity. The thresholds are confined to a ladder of “round” values. Stepping up and down the ladder is achieved by repeatedly hitting
• ‘>’ to lower the threshold and obtain greater protection,
• ‘<’ to raise the threshold and lose protection.
Recall Figure 5 for two examples of p-value blockplots that differ in the threshold only. Thresholding also applies to correlation blockplots, in which case ‘>’ raises the threshold on the magnitude of the correlations that are shown, and ‘<’ lowers it.
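The effect of stepping down a ladder of p-value thresholds can be sketched in base R. This is an illustration under stated assumptions, not the AN's code: the p-values come from the standard t-test for a Pearson correlation, the function name cor_pvalues is hypothetical, and the ladder of “round” values shown is invented for the example because the actual ladder is not listed here.

```r
# Hypothetical helper: two-sided p-value of the t-test for zero correlation,
# given a correlation r computed from n complete pairs.
cor_pvalues <- function(r, n) {
  tstat <- r * sqrt((n - 2) / (1 - r^2))
  2 * pt(-abs(tstat), df = n - 2)
}

set.seed(1)                               # arbitrary seed, for reproducibility
n <- 100
p <- 20
x <- matrix(rnorm(n * p), ncol = p)       # pure noise: no real associations
cm <- cor(x)
r <- cm[upper.tri(cm)]                    # the 190 distinct variable pairs
pv <- cor_pvalues(r, n)

ladder <- c(0.05, 0.01, 0.001, 0.0001)    # hypothetical ladder of round thresholds
bonferroni <- 0.05 / length(pv)           # Bonferroni threshold for 190 tests
# Number of blocks that would remain visible at each step down the ladder.
shown <- sapply(ladder, function(a) sum(pv <= a))
print(rbind(threshold = ladder, blocks_shown = shown))
```

With pure noise, roughly 5% of the pairs pass the 0.05 threshold, and fewer survive each stricter step, which is the protection against multiplicity that lowering the threshold buys.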
Sometimes it is useful to compare magnitudes of the blocks without the distraction of color, hence it may be convenient to hit
• ‘ctrl-A’ to toggle between showing all blocks in blue (ignoring signs) and showing the negative correlations (and their p-values) in red.
4.5 Highlighting (1): Strips
Highlight strips are horizontal or vertical bands that run across the whole width or height of the blockplot. They help users search the associations of a given variable with all other variables. Cross-wise highlight strips are also often placed to maintain the connection between a given block and the labels of the associated variable pair. By default the color of highlight strips is “lightgoldenrod1” in R. Their appearance is shown in Figure 2. Highlight strips can co-exist in any number and combination, horizontally and vertically. The mechanisms for creating and removing them are as follows:
• Right-click the mouse on
– a block in the blockplot to place a horizontal and a vertical highlight strip through the block;
– an x-variable label on the horizontal axis to place a vertical highlight strip through this variable;
– a y-variable label on the vertical axis to place a horizontal highlight strip through this variable.
• Hit ‘ctrl-C’ to clear the strips and start from scratch.
Instead of clicking one can right-depress and drag the mouse across the blockplot with the effect that horizontal and vertical strips are placed across all blocks touched by the drag motion.
Vertical highlight strips lend themselves to convenient searching of associations between a fixed variable on the horizontal axis and all variables on the vertical axis. To this end it is useful to pan vertically with ‘↑’, ‘↓’, and the space bar as accelerator (Section 4.2).
4.6 Highlighting (2): Rectangles
A highlight rectangle is a rectangular area in the blockplot selected by the user for high-lighting. Highlight rectangles are meant to help the user focus on the associations betweencontiguous groups of variables on the horizontal and the vertical axis. By default the colorof highlight rectangles is ”lightcyan1” in R. Their appearance is that of the center square inFigure 4. In the case of this figure, the highlight rectangle coincides with the highlight squarefor the variable group defined by the suffix “p1.CDV”. Unlike highlight squares, which markpredefined variable groups, highlight rectangles can be placed (and removed from) anywhereby the user. The mechanisms to this end are as follows:
• Define a highlight rectangle in arbitrary position by placing two opposite corners:
– Place the crosshair in the location of the desired first corner; then hit ‘1’ to place the first corner of a new rectangle.
– Place the crosshair in the location of the desired second corner; then hit ‘2’ to place the second corner.
Action ‘1’ creates a new highlight rectangle consisting of just one block. Action ‘2’ never creates a new rectangle but only sets/resets the second corner of the most recent rectangle.
• Define a highlight rectangle in terms of two variable groups:
– Place the crosshair such that the x-coordinate is in the desired horizontal variablegroup and the y-coordinate in the desired vertical variable group; then
– hit ‘3’ to create the highlight rectangle.
As a special case, this allows a highlight square to become a highlight rectangle by letting the x- and y-variable groups be the same, as in Figure 4.
• Pan and zoom to snap the view and the highlight rectangle to each other:
– Place the crosshair in the highlight rectangle to be snapped; then
– either hit ‘4’ to snap, preserving the aspect ratio,
– or hit ‘5’ to snap, distorting the aspect ratio, unless the rectangle is a square.
If the crosshair is not placed in a highlight rectangle, the most recent one will be used. Note that the squares in a blockplot always remain squares, even if the aspect ratio of the plot has been distorted. Changing the aspect ratio has the consequence that the squares can no longer fill their cells because the cells have become rectangles.
• Any number of highlight rectangles can co-exist. Remove them selectively as follows:
– Place the crosshair anywhere in a highlight rectangle to be removed; then
– hit ‘0’ to remove it.
4.7 Reference variables
A recurrent issue when using the AN is that some variables are of persistent interest. In autism phenotype data, for example, a recurrent theme is to check up on age, gender and site association (potential confounders) while examining associations within and between various “autism instruments” such as ADOS, ADI, RBS, ... To spare users the distraction of hopping back and forth across the multi-hundred square table, the AN implements a notion of “reference variables”, that is, variables that never disappear from view. The AN keeps them tucked in bands at the left and the bottom of the blockplot. The manner in which reference variables present themselves is shown in Figure 10. Reference variables are selected by first marking them with highlight strips (Section 4.5) and then hitting
• ‘R’ to turn the strip variables into reference variables,
• ‘r’ to toggle on and off the display of the selected reference variables.
The disentangling of the two actions allows users to keep marking up strips without changing the earlier selected reference variables.
Figure 10: Reference variables shown in the left and bottom bands. Whenever the user zooms and pans the blockplot, these variables stay in place and show their associations with the variables from the rest of the blockplot.
In Figure 10, the y-reference variables are “sz.sorted_sites.FAM” and “family.ID”, and their associations with the x-variables are shown in the horizontal band at the bottom. Similarly, the x-reference variables are “age_at_ados_p1.CDV”, “sex_p1.CDV”, “ethnicity_p1.CDV” and “ados_module_p1.CDV”, and their associations with the y-variables are shown in the vertical band on the left. The bottom left corner, where the reference bands intersect, shows the associations between the x- and y-reference variables.
4.8 Searching Variables
Another recurrent issue with analyzing large numbers of variables is simply finding variables. For example, one may want to
• find a variable whose name one remembers partly, but not exactly; or
• find a set of variables whose names share a meaningful syllable.
In the context of autism, for example, it might be of interest to find all variables related to anxiety across all instruments; it would then be sensible to search for all variables that contain the string “anx” in their name. This type of problem can be solved in the AN with a blend of text search and menu selection. We address here the problem of locating one variable and panning to it. To this end hit
• ‘H’ to locate a variable on the x-axis;
• ‘V’ to locate a variable on the y-axis;
• ‘@’ to locate a variable on both the x- and the y-axis.
In each case a dialog box pops up where a search string or regular expression can be entered. On hitting ‘<Return>’ or ‘OK’, a menu appears with the list of variables that contain the search string or match the regular expression (according to R’s grep() function). The user is then asked to select one of the offered variables, upon which the AN pans to the variable on the x-axis, the y-axis or both (depending on ‘H’, ‘V’ or ‘@’), marks it with a vertical or horizontal highlight strip or both, and places the crosshair on it. See Figure 11.
Search can be bypassed by not entering a search string at all. The menu then shows the complete list of all variables with scrolling.
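The text-search step can be imitated outside the AN with plain grep(). The following is a minimal sketch, not the AN's own code: the vector `vars` is a hypothetical stand-in for the AN's variable list (the names are taken from the figure labels), and the menu selection is replaced by simply taking the first match.

```r
# Hypothetical stand-in for the list of variable names held by an AN instance.
vars <- c("age_at_ados_p1.CDV",
          "sex_p1.CDV",
          "cbcl_2_5_anxious_depressed_p1.OCUV",
          "cbcl_2_5_anxiety_problems_p1.OCUV",
          "cbcl_6_18_anxious_depressed_p1.OCUV")

# Plain substring search: all variables whose name contains "anx".
hits <- grep("anx", vars, value = TRUE)
print(hits)

# Regular expression with alternation, as used for the adjustors in Section 4.11.
adjustors <- grep("age_at|sex_p1", vars, value = TRUE)
print(adjustors)

# Position of the first match; a program would pan to this index,
# whereas the AN lets the user pick from a menu instead.
first.idx <- grep("anx", vars)[1]
```

An empty search string matches every name, which is consistent with the complete list being offered when no string is entered.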
Figure 11: Text search with ‘H’ for horizontal variables containing “anx”, followed by selection of “cbcl_2_5_anxious_depressed_p1.OCUV”. The view pans horizontally to the selected variable, marks it with a vertical highlight strip, and places the crosshair on it.
4.9 Lenses: Scatterplots and Barplots/Histograms
We think of barplots, histograms and scatterplots as lenses into the blocks, each of which represents a pair (x, y) of variables. Taking the pair “under the lens” means looking at the association (and the marginal distributions) in greater detail; see Section 3.5 above. The mechanics are as follows: Hit
• ‘x’ to see in a separate window (Figure 7) a scatterplot and barplots/histograms of the two variables marked by the crosshair cursor.
• ‘y’ to switch the x-y roles of the variables.
• ‘l’ to toggle showing a “line”, that is, a smooth if x is quantitative, and a trace of y-means of the x-groups if x is categorical.
Important: The lens window is passive and does not accept interactive input. One must expose the blockplot master window to continue with AN interactions.
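The two kinds of “line” toggled by ‘l’ can be sketched in base R. This is only an illustration on simulated data: the text does not say which smoother the AN uses, so lowess() is an assumed stand-in, and all variable names are hypothetical.

```r
set.seed(4)
n <- 60

# Quantitative x: a smooth of y against x (lowess() is an assumed stand-in).
age    <- runif(n, 4, 18)
score  <- 100 - 1.5 * age + rnorm(n, sd = 5)
smooth <- lowess(age, score)      # list with sorted $x and fitted $y

# Categorical x: the trace of y-means across the x-groups.
site        <- factor(rep(c("A", "B", "C"), length.out = n))
group.means <- tapply(score, site, mean)

# To draw the two lenses (uncomment in an interactive session):
# plot(age, score); lines(smooth)
# plot(as.integer(site), score)
# lines(seq_along(group.means), group.means, type = "b")
```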
These lenses have a simple history mechanism in that the consecutive x-y variable names are collected in a list that can be traversed and edited: Hit
• ‘PgUp’ to take one step back in the history,
• ‘PgDn’ to take one step forward in the history,
• ‘Home’ to jump to the beginning of the history,
• ‘End’ to jump to the end of the history (the present),
• ‘Delete’ to delete the current lens from the history.
Finally, there is a separate lens mechanism with its own window that shows all pairwise scatterplots of the variables currently in highlight strips. An example is shown in Figure 8. As to the mechanics, hit
• ‘z’ to create the scatterplot matrix with independently scaled axes;
• ‘Z’ to create the scatterplot matrix with identically scaled axes.
The latter option is sometimes useful when all variables live on the same scale but have somewhat different ranges.
4.10 Color-Brushing in Scatterplots
Often one would like to focus on groups of cases in the scatterplots of the lens window. This can be achieved with color brushing as follows:
• Hit ‘s’ to see the current lens scatterplot in the main window, replacing the blockplot.
• Hit ‘r’ to fix one corner of a brush at the current mouse location.
• Left-depress and drag the mouse: the rectangular brushing area should open up and change shape. Whenever the brush moves over a scatterplot point, the point will change color.
• Right-depress and drag the mouse: the rectangular brushing area will translate along with the mouse. Again, moving over scatterplot points will change their color.
• The brushing color can be changed by cycling through a series of colors, hitting ‘S’. The color gray does not paint; it is useful for counting the points under the brush as their number is shown in the bottom left corner.
• Hit ‘s’ to return to the blockplot in the main window.
Thus, hitting ‘s’ toggles between blockplot and scatterplot in the main window. After each brushing operation, the lens scatterplot will follow suit and color its points to match those in the main window.
Figure 12: Screenshot of the adjustment menu. As shown, it enables adjustment of the “srs” variables for “age_at_ados_p1.CDV” and “sex_p1.CDV”.
4.11 Linear Adjustment
Another recurrent task in large tables is what we may call “adjustment”. The phrase “adjusting for x” has many synonyms: “accounting for x”, “controlling for x”, “correcting for x”, “allowing for x”, “holding x fixed” and “conditioning on x”. Technically most correct is the last expression: We are often interested in the conditional association between variables y and z given (holding fixed) a variable x, as measured for example by the conditional correlation r(y, z|x). In the context of the autism phenotype, one may be interested in adjusting for age and/or gender. In practice, particularly in large-p problems, there is rarely sufficient data to truly estimate conditional distributions,10 hence one makes the simplifying assumption that all associations are linear with constant conditional variances (homoscedasticity).11
In that case, adjustment of y for x amounts to a linear regression and forming residuals; that is, “residualizing” or “partialling out” is done by subtracting the equation fitted with linear regression: $y_{\bullet x} = y - (b_0 + b_1 x)$. As a consequence, $r(y_{\bullet x}, x) = 0$, that is, by forming $y_{\bullet x}$ one removes from y the linear association with x. This type of linear adjustment generalizes to multiple x variables by residualizing with regard to a multiple linear regression.
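Residualizing is easy to reproduce with lm() and resid(). The sketch below uses simulated data with hypothetical names (`age`, `y`, `z`); it illustrates the principle only and is not the AN's internal code. It checks that the residuals are uncorrelated with the adjustor and compares the raw correlation of y and z with the correlation after both have been adjusted for age.

```r
set.seed(1)
n   <- 200
age <- runif(n, 4, 18)                 # adjustor x
y   <- 10 + 0.8 * age + rnorm(n)       # adjustee y, depends on age
z   <- 5 + 0.5 * age + rnorm(n)        # second variable, also depends on age

# Residualize: y.adj = y - (b0 + b1 * age), with b0, b1 fitted by least squares.
y.adj <- resid(lm(y ~ age))
z.adj <- resid(lm(z ~ age))

r.adj.x   <- cor(y.adj, age)     # zero up to rounding error
r.raw     <- cor(y, z)           # inflated by the shared age trend
r.partial <- cor(y.adj, z.adj)   # correlation after adjusting both for age

print(c(r.adj.x = r.adj.x, r.raw = r.raw, r.partial = r.partial))
```

With several adjustors the formula simply becomes, for example, `lm(y ~ age + sex)`.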
In the AN implementation of linear adjustment, one has to select a set of “independent” x-variables, called the “adjustors”, and a set of “dependent” y-variables, called the “adjustees”. Often the set of adjustors is small, possibly just one variable such as age, whereas the set of adjustees can be large, for example, all items and summary scales of an autism phenotype instrument such as the SRS (“Social Responsiveness Scale”). The selection mechanisms are the same for both adjustors and adjustees: text search or regular expression matching, followed by menu selection, similar to Section 4.8, but here the menu selection allows multiple choices. The mechanics are as follows: Hit
• ‘A’ to call up a large menu that forms the interface for all adjustment operations.
An example is shown in Figure 12. Initially, the lists of adjustors and adjustees will be empty, so both need to be populated with text searches that require a dialog initiated by selecting the lines “Find ADJUSTORS...” and “Find ADJUSTEES...” in sequence. Figure 12 shows the state after having matched the regular expression “age_at|sex_p1.CDV” for adjustors and searched the string “CDV” for adjustees.
Finally, after selection of adjustors and adjustees is completed, the user may select the top line of the menu to actually “Do Adjustment”. Each raw adjustee will then be replaced by its residuals obtained from the regression onto the adjustors. (To undo adjustment, select the second line from the Adjustment dialog, “Undo Adjustment”.)
10 Natural exceptions do exist: If we analyze females and males separately, for example, we study gender-conditional associations.
11 Both assumptions may be wrong, but some form of adjustment, even if flawed, is often more informative than remaining with raw variables.
To assist the visual examination of adjustment results, one may want to select the third line from the top of the menu (“Mark with Highlight Strips...”) in order to highlight the adjustors among the x-variables and the adjustees among the y-variables. Turning them further into reference variables (Section 4.7) by hitting ‘R’, we obtain Figure 13. As it should be, the correlations between the two adjustors on the x-axis and the many adjustees on the y-axis vanish. The correlations of the adjustees with other variables may now be of renewed interest because they are free of age and gender “effects”, which would invite a search of the correlations in the horizontal band of the adjustees.
A word of caution: Adjustment of a y-variable is done using only cases for which there are no missing values among the adjustors and, obviously, the adjustee is not missing either. Thus the underlying set of cases may have been inadvertently decreased. It is therefore good advice to check the missing-pairs patterns with either ‘ctrl-M’ or ‘ctrl-N’ (Section 4.4) or by looking at scatterplots (Section 4.9).
Having done adjustment of variables, one often wonders how much of it was done and to which variable. To answer this question, select the fourth line from the adjustment dialog (“Sort Adjustees...”): The result is a list of the adjustees sorted according to the $R^2$ values from the regression of the adjustees/y-variables onto the adjustors/x-variables. See Figure 14 for an example.
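Both the sorted $R^2$ list and the caution about missing values can be mimicked in a few lines of base R. The sketch below is an assumption-laden illustration on simulated data: the adjustee names `srs_a`, `srs_b`, `srs_c` are hypothetical, and the AN's own bookkeeping may differ.

```r
set.seed(5)
n   <- 100
age <- runif(n, 4, 18)          # adjustor 1
sex <- rbinom(n, 1, 0.5)        # adjustor 2 (dummy coded)

# Three hypothetical adjustees with strong, moderate and no dependence.
adjustees <- data.frame(
  srs_a = 2.0 * age + rnorm(n),
  srs_b = 0.2 * age + 3 * sex + rnorm(n, sd = 3),
  srs_c = rnorm(n)
)
adjustees$srs_b[1:10] <- NA     # some missing values in one adjustee

# R^2 of each adjustee regressed onto the adjustors (lm drops incomplete cases).
r2 <- sapply(adjustees, function(y) summary(lm(y ~ age + sex))$r.squared)

# Number of cases actually used in each regression.
n.used <- sapply(adjustees, function(y) sum(complete.cases(y, age, sex)))

r2.sorted <- sort(r2, decreasing = TRUE)
print(round(r2.sorted, 3))
print(n.used)
```

Printing `n.used` next to the sorted list makes a silently reduced case count visible at a glance.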
4.12 The Future of the AN
The functionality described here reflects the 2015 implementation of the AN. Changes to the AN are planned, the major one being a redesign to give the lens windows interactive responsiveness as well. Currently all interaction is funneled through the blockplot window, even if the actions affect the lens window.
Other obvious functionality is still missing, above all sorting of variables, manual and algorithmic; a limited set of sorting operations may be added in a future version of the AN. If readers of this document and users of the AN have further suggestions, the authors would appreciate hearing them.
Figure 13: Results of adjustment of the “CDV” variables for “age_at_ados_p1.CDV” and “sex_p1.CDV”: the former are reference variables on the y-axis, the latter on the x-axis. As it should be, the correlations between adjustors and adjustees vanish.
Figure 14: List of adjustees/y-variables sorted according to the $R^2$ values from the regressions onto the adjustors/x-variables.
A Appendix: The Versatility of Correlation Analysis
We return to the apparent limitations of correlations as measures of association, which were left as a loose end in the Introduction. We address the objections that (1) correlations are measures of linear association only, (2) correlations reflect bivariate association only, and (3) correlations apply to quantitative variables only. Towards this end we make the following observations and recommendations:
(1) While it is true that correlation is strictly speaking a measure of linear association among quantitative variables, it is also a fact that correlation is useful as a measure of monotone association in general, even when it is non-linear. As long as the association is roughly monotone, correlation will be positive when the association is increasing and negative when it is decreasing. Admittedly, correlation is not an optimal measure of non-linear monotone association, but it is still a useful one, in particular in the large-p problem. Lastly, if gross non-linearity is discovered, it is always possible to replace a variable X with a non-linear transform f(X) (often log(X)) so its association with other variables becomes more linear.12
(2) The objection that correlations only reflect bivariate association is factually correct but practically not very relevant. In practical data analysis it is too contrived to entertain the possibility that, for example, there exists association among three variables but there exists no monotone association within any pair of variables.13 In general one follows the principle that lower-order association is more likely than higher-order association, hence pairwise association is more likely than true interaction among three variables. Therefore data analysts look first for groups of variables that are linked by pairwise association, and thereafter they may examine whether these variables also exhibit higher-order association. Note, however, that even multivariate methods such as principal components analysis (PCA) do not detect true higher-order interaction because they, too, rely on correlations only. Finally, we are not asserting that simple correlation analysis should be the end of data analysis, but it should certainly be near the beginning in the large-p problems envisioned here, namely, in the analysis of relatively noisy data as they arise in many social science and medical contexts.14
12 Linearity of association is not a simple concept. For one thing, it is asymmetric: if Y is linearly associated with X, it does not follow that X is linearly associated with Y. The reason is that the definition of linear association, E[Y|X] = β0 + β1X, is not symmetric in X and Y. Linearity of association in both directions holds only for certain “nice” distributions such as bivariate Gaussians. A counter-example is as follows: Let X be uniformly distributed on an interval and Y = β0 + β1X + ε with independent Gaussian ε; then Y is linearly associated with X by construction, yet X is not linearly associated with Y.
13 An example would be three variables jointly uniformly distributed on the surface of a 2-sphere in 3-space.
14 In other large-p problems the variables may be so highly structured that they become intrinsically low-dimensional, as for example in the analysis of libraries of registered images where each variable corresponds to a pixel location and its values consist of intensities at that location across the images. The problem here is not to locate groups of variables with association but to describe the manifold formed by the images in very high-dimensional pixel space. A sensible approach in this case would be non-linear dimension reduction.
(3) The final objection we consider is that correlations do not apply to categorical variables. This objection can be refuted with very practical advice on how to make categorical data quantitative and how to interpret the meaning of the resulting correlations. We discuss several cases in turn:
– If a categorical variable X is ordinal (its categories have a natural order), it is common practice to simply number the categories in order and use the resulting integer variable as a quantitative variable. The resulting correlations will be able to reflect monotone association with other variables that may be expressed by saying “the higher categories of X tend to be associated with higher/lower values/categories of other variables.” An obvious objection is that the equi-spaced integers may not be a good quantification of the categories. If this is a serious concern worth some effort, one may want to look into optimal scoring procedures (see, for example, De Leeuw and van Rijckevorsel (1980) and Gifi (1990)). The idea behind these methods is to estimate new scores for the categorical variables by making them as linearly associated as possible through optimization of the fit of a joint PCA.
– If a categorical variable X is binary, it is common practice to numerically code its two categories with the values 0 and 1, thereby creating a so-called “dummy variable.” This practice is pervasive in the Analysis of Variance (ANOVA), but its usefulness is less well known in multivariate analysis, which is our concern. The interpretation of correlations with dummy variables is highly interesting as it solves two seemingly different association problems:
∗ First-order association between a binary variable X and a quantitative variable Y means that there exists a difference between the two means of Y in the two groups denoted by X. As it turns out, the correlation of a dummy variable X with a quantitative variable Y is mathematically equivalent to forming the t-statistic for a two-sample comparison of the two means of Y in the two categories of X ($t \propto r/(1 - r^2)^{1/2}$). Even more, the statistical test for a zero correlation is virtually identical to the t-test for equality of the two means. Thus two-sample mean-comparisons can be subsumed under correlation analysis.
∗ Association between two binary variables means that their 2×2 table shows dependence. This situation is usually addressed with Fisher’s exact test of independence. It turns out, however, that Fisher’s exact test is equivalent to testing the correlation between the dummy variables, the only discrepancy being that the normal approximation used to calculate the p-value of a correlation is just that, an approximation, although an adequate one in most cases.
– If a categorical variable X is truly nominal with more than two values, that is, neither binary nor ordinal, we may again follow the lead of ANOVA and replace X with a collection of dummy variables, one per category. For example, if in a medical context data are collected in multiple sites, it will be of interest to see whether substantive variables in some sites are systematically different from other sites. It is then useful to introduce dummy variables for the sites and examine their correlations with the substantive variables. A significant correlation indicates a significant mean difference at that site compared to the other sites.
This discussion shows that categorical variables can be fruitfully included in correlation analysis, either with numerical coding of ordinal variables, or with dummy coding of binary and nominal variables.
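The dummy-variable relations for binary variables discussed above can be checked numerically. The sketch below uses simulated data and base R only. For the binary-quantitative case it uses the exact form of the proportionality, t = r·sqrt(n − 2)/sqrt(1 − r²), which corresponds to the pooled-variance t-test. For two binary variables it illustrates a closely related identity that is not stated in the text: the Pearson chi-square statistic of the 2×2 table without continuity correction equals n·r². Fisher’s exact test is printed alongside for comparison only; its p-value is close to, but not identical with, the chi-square p-value.

```r
set.seed(3)
n <- 60
g <- rep(c(0, 1), each = n / 2)     # dummy variable for two groups
y <- 0.8 * g + rnorm(n)             # quantitative variable

# Correlation with a dummy versus the pooled two-sample t-statistic.
r        <- cor(g, y)
t.from.r <- r * sqrt(n - 2) / sqrt(1 - r^2)
tt       <- t.test(y[g == 1], y[g == 0], var.equal = TRUE)
t.pooled <- unname(tt$statistic)
p.cor    <- cor.test(g, y)$p.value  # test of zero correlation
p.t      <- tt$p.value              # test of equal means

# Two binary variables: chi-square of the 2x2 table versus n * r^2.
m    <- 200
a    <- rbinom(m, 1, 0.5)
b    <- ifelse(runif(m) < 0.7, a, 1 - a)   # b agrees with a about 70% of the time
chi2 <- unname(chisq.test(table(a, b), correct = FALSE)$statistic)
n.r2 <- m * cor(a, b)^2

print(c(t.from.r = t.from.r, t.pooled = t.pooled))
print(c(chi2 = chi2, n.r2 = n.r2))
print(fisher.test(table(a, b))$p.value)    # for comparison only
```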
This concludes our discussion of the versatility of correlation analysis.
B Appendix: Creating and Programming AN Instances
To create a new instance of an Association Navigator for a given dataset, use the following R statement:
a.n <- a.nav.create(datamatrix)
where ‘datamatrix’ is a numeric matrix, not a dataframe. The new AN instance ‘a.n’ can be run with the following R statement:
a.nav.run(a.n)
These steps are completely general and may be useful for arbitrary numeric data matrices with up to about 2,000 variables.
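Since a.nav.create() expects a numeric matrix rather than a dataframe, data with categorical columns have to be coded numerically first, for example along the lines of Appendix A. The following sketch shows one possible way in base R; the dataframe `df` and its column names are hypothetical, and the two AN calls are left commented out because they require the AN code to be loaded.

```r
# Hypothetical raw data with one quantitative, one binary and one nominal column.
df <- data.frame(
  age  = c(5.2, 7.1, 6.4, 9.0),
  sex  = factor(c("male", "female", "male", "female")),
  site = factor(c("s1", "s2", "s3", "s1"))
)

# Binary variable: a single 0/1 dummy.
sex.male <- as.numeric(df$sex == "male")

# Nominal variable: one dummy per category (no intercept column).
site.dummies <- model.matrix(~ site - 1, data = df)

# Assemble a purely numeric matrix with named columns.
datamatrix <- cbind(age = df$age, sex_male = sex.male, site.dummies)
print(datamatrix)

# As in the text (requires the AN functions):
# a.n <- a.nav.create(datamatrix)
# a.nav.run(a.n)
```

For ordinal factors, `as.integer()` applied to a factor whose levels are in their natural order yields the equi-spaced integer coding.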
Table 1 shows a template for forming potentially useful instances of ANs that display large numbers of SSC phenotype variables. As written, the statement would produce an AN on the order of 3,000 variables.
ANs are implemented not as lists but as “environments,” a data structure that is relatively little known among R users. Environments have some interesting properties. One can look inside an AN with the R idiom
with(a.n, objects())
in order to list the AN-internal variables inside the AN instance ‘a.n’. Assignments and any other kind of programming of the internal state variables can be achieved the same way. For example, if one desires a change of color of highlight strips to “mistyrose”, one can achieve this with the following:
with(a.n, { strips.col <- "mistyrose"; a.nav.blockplot() })
The call to a.nav.blockplot() redisplays the blockplot with the new parameter setting. Changing the blockplot glyph from square to diamond is achieved with
with(a.n, { blot.pch <- 18; a.nav.blockplot() })
and reversing the color convention from “blue = positive” to “red = positive” in the style of heatmaps is done with
with(a.n, { blot.col.pos <- 2; blot.col.neg <- 4; a.nav.blockplot() })
Note, however, that this affects only blockplots, not heatmaps, the latter requiring computation of a color scale, not just a binary color decision. Still, there is plenty of opportunity for playfulness by experimenting with display parameters. A more sophisticated example concerns changing the power transformation that maps correlations to glyph sizes:
with(a.n, { blot.pow <- .7; a.nav.cors.trans(); a.nav.blockplot() })
In addition to redisplay with a.nav.blockplot(), this also requires recomputation of the display table with a.nav.cors.trans().
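To get a feel for the effect of the power parameter, the following sketch assumes that glyph size is proportional to the absolute correlation raised to the power. The function ‘glyph.size’ is a hypothetical stand-in, not the AN-internal code; the actual a.nav.cors.trans() may differ in detail.

```r
# Hypothetical stand-in for the AN-internal transformation:
# glyph size as a power of the absolute correlation (an assumption).
glyph.size <- function(cors, pow = 1) abs(cors)^pow

cors <- c(-0.9, -0.3, 0.05, 0.3, 0.9)

# Compare linear sizes (pow = 1) with a power below 1.
round(glyph.size(cors, pow = 1),   3)
round(glyph.size(cors, pow = 0.7), 3)
```

Under this assumption, a power below 1 enlarges the glyphs of weak correlations relative to strong ones, while a correlation of magnitude 1 keeps full size.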
R environments represent one of the two data types (the other being “connections”) that disobey the functional programming paradigm that is otherwise fundamental to R. As a consequence, assignment of an AN does not allocate a new copy but passes a reference instead. In particular, the R statement
b.n <- a.n
creates a variable ‘b.n’ that will be a reference to the same environment as the variable ‘a.n’. Hence the two statements
a.nav.run(a.n)
a.nav.run(b.n)
will run off the same AN instance. They have identical effects in the sense that interactive operations affect the same instance.
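The reference behavior of environments can be tried out without an AN. The following sketch uses a toy environment; ‘a.env’, ‘b.env’ and the list variables are hypothetical names, and only ‘strips.col’ mirrors a parameter name used above.

```r
# Toy environment standing in for an AN instance.
a.env <- new.env()
assign("strips.col", "grey", envir = a.env)

with(a.env, objects())                      # lists "strips.col"
with(a.env, { strips.col <- "mistyrose" })  # assignment persists in a.env

b.env <- a.env              # no copy: both names refer to one environment
identical(a.env, b.env)     # TRUE
with(b.env, strips.col)     # "mistyrose", as set through a.env

# Contrast: a list is copied on assignment.
a.list <- list(strips.col = "grey")
b.list <- a.list
b.list$strips.col <- "mistyrose"
a.list$strips.col           # still "grey"
```

This is why with() can be used to reprogram the state of an AN in place, something that would not work if ANs were implemented as lists.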
References
[1] J. Bertin. Semiology of Graphics. Madison, WI: The University of Wisconsin Press, 1983.
[2] W. S. Cleveland. The Elements of Graphing Data. Pacific Grove, CA: Wadsworth & Brooks/Cole, 1985.
[3] J. De Leeuw and J. van Rijckevorsel. HOMALS and PRINCALS - some generalizations of principal components analysis. In E. Diday et al., editor, Data Analysis and Informatics II, pages 231–242. Amsterdam: Elsevier Science Publisher, North Holland, 1980.
[4] M. Friendly. Corrgrams: Exploratory displays of correlation matrices. The American Statistician, 56(4):316–324, 2002.
[5] A. Gifi. Nonlinear multivariate analysis. New York: John Wiley & Sons, 1990.
[6] M. Hills. On looking at large correlation matrices. Biometrika, 56(2):249–253, 1969.
[7] H. Hofmann. Exploring categorical data: Interactive mosaic plots. Metrika, 51(1):11–26, 2000.
[8] D. J. Murdoch and E. D. Chow. A graphical display of large correlation matrices. The American Statistician, 50(2):178–180, 1996.
[9] A. Pilhoefer and A. Unwin. New approaches in visualization of categorical data: R package extracat. Journal of Statistical Software, 53(7):1–25, 2013.
[10] S. S. Stevens. On the psychophysical law. The Psychological Review, 64(3):153–181, 1957.
[11] S. S. Stevens. Psychophysics. New York: John Wiley & Sons, 1975.
[12] S. S. Stevens and E. H. Galanter. Ratio scales and category scales for a dozen perceptual continua. Journal of Experimental Psychology, 54(6):377–411, 1957.
[13] H. Wickham, H. Hofmann, and D. Cook. Exploring cluster analysis. http://www.had.co.nz/model-vis/clusters.pdf, 2006.
a.n <- a.nav.create(cbind(
"family.ID"=as.numeric(v.families),
v.sites, v.srs.bg, v.individual,
v.family, v.parent.race, v.parent.common,
v.proband.cdv, v.proband.ocuv, v.sibling.s1, v.sibling.s2,
v.ados.common,
v.ados.1, v.ados.1.raw, v.ados.2, v.ados.2.raw,
v.ados.3, v.ados.3.raw, v.ados.4, v.ados.4.raw,
v.adi.r.diagnostic, v.adi.r.pca, v.adi.r,
v.adi.r.dum, v.adi.r.loss,
v.ssc.diagnosis,
v.vineland.ii.p1, v.vineland.ii.s1,
v.cbcl.2.5.p1, v.cbcl.2.5.s1,
v.cbcl.6.18.p1, v.cbcl.6.18.s1,
v.abc, v.abc.raw, v.rbs.r, v.rbs.r.raw,
v.srs.parent.p1, v.srs.parent.recode.p1,
v.srs.teacher.p1, v.srs.teacher.recode.p1,
v.srs.parent.s1, v.srs.parent.recode.s1,
v.srs.teacher.s1, v.srs.teacher.recode.s1,
v.srs.adult.fa, v.srs.adult.recode.fa,
v.srs.adult.mo, v.srs.adult.recode.mo,
v.bapq.fa, v.bapq.recode.fa, v.bapq.mo, v.bapq.recode.mo,
v.fhi.interviewer.fa, v.fhi.interviewer.mo,
v.scq.current.p1, v.scq.life.p1,
v.scq.current.s1, v.scq.life.s1,
v.ctopp.nr, v.purdue.pegboard, v.dcdq, v.ppvt,
v.das.ii.early.years, v.das.ii.school.age,
v.ctrf.2.5, v.trf.6.18,
v.ssc.med.hx.v2.autoimmune.disorders, v.ssc.med.hx.v2.birth.defects,
v.ssc.med.hx.v2.chronic.illnesses, v.ssc.med.hx.v2.diet.medication.sleep,
v.ssc.med.hx.v2.genetic.disorders, v.ssc.med.hx.v2.labor.delivery.birth.feeding,
v.ssc.med.hx.v2.language.disorders,
v.ssc.med.hx.v2.medical.history.child.1, v.ssc.med.hx.v2.medical.history.child.2,
v.ssc.med.hx.v2.medical.history.child.3,
v.ssc.med.hx.v2.medications.drugs.mother,
v.ssc.med.hx.v2.neurological.conditions,
v.ssc.med.hx.v2.other.developmental.disorders, v.ssc.med.hx.v2.pdd,
v.ssc.med.hx.v2.pregnancy.history, v.ssc.med.hx.v2.pregnancy.illness.vaccinations,
v.ssc.psuh.fa, v.ssc.psuh.mo,
v.temperature.form.raw
), remove=TRUE )
Table 1: Template for joining large numbers of SSC tables and creating an AN for them. Readers should make a selection from this template as the full collection creates a data matrix with about 3,000 variables.
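Making a selection from the template amounts to passing fewer tables to cbind(). The following sketch uses toy stand-ins; the objects ‘v.toy.a’, ‘v.toy.b’ and ‘family.id’ and their sizes are invented for illustration and do not correspond to actual SSC tables. It checks the width of the joined matrix before an AN would be created.

```r
# Toy stand-ins for SSC tables; names and sizes are invented.
set.seed(1)
n <- 20
v.toy.a <- matrix(rnorm(n * 3), n, 3, dimnames = list(NULL, paste0("a", 1:3)))
v.toy.b <- matrix(rnorm(n * 2), n, 2, dimnames = list(NULL, paste0("b", 1:2)))
family.id <- factor(sample(letters[1:5], n, replace = TRUE))

# Join the selected tables column-wise, as in the template.
selection <- cbind("family.ID" = as.numeric(family.id), v.toy.a, v.toy.b)

# Check the size first; about 2,000 variables is the range the text
# reports as practical.
stopifnot(is.numeric(selection), ncol(selection) <= 2000)
colnames(selection)

# With the AN code loaded:
# a.n <- a.nav.create(selection, remove = TRUE)
```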