
Homogeneity Analysis in R:

The Package homals

Jan de Leeuw, University of California, Los Angeles

Patrick Mair, Wirtschaftsuniversität Wien

Abstract

Homogeneity analysis combines the idea of maximizing the correlations between variables of a multivariate data set with that of optimal scaling. In this article we present methodological and practical issues of the R package homals, which performs homogeneity analysis and various extensions. By setting rank constraints, nonlinear principal component analysis can be performed. The variables can be partitioned into sets such that homogeneity analysis is extended to nonlinear canonical correlation analysis or to predictive models which emulate discriminant analysis and regression models. For each model the scale level of the variables can be taken into account by setting level constraints. All algorithms allow for missing values.

Keywords: homogeneity analysis, correspondence analysis, nonlinear principal component analysis, nonlinear canonical correlation analysis, homals, R.

1. Introduction

In recent years correspondence analysis (CA) has become a popular descriptive statistical method for analyzing categorical data (Benzécri 1973; Greenacre 1984; Gifi 1990; Greenacre and Blasius 2006). Because the visualization capabilities of statistical software have increased during this time, researchers in many areas apply CA and map objects and variables (respectively their categories) onto a common metric plane.

Currently, R (R Development Core Team 2007) offers a variety of routines to compute CA and related models. An overview of corresponding functions and packages is given in Mair and Hatzinger (2007). The package ca (Nenadic and Greenacre 2006) is a comprehensive tool to perform simple and multiple CA. Various two- and three-dimensional plot options are provided.

In this paper we present the R package homals, starting from simple homogeneity analysis, which corresponds to multiple CA, and providing several extensions. Gifi (1990) points out that homogeneity analysis can be used in a strict and a broad sense. In a strict sense homogeneity analysis is used for the analysis of strictly categorical data, with a particular

2 Homals in R

loss function and a particular algorithm for finding the optimal solution. In a broad sense homogeneity analysis refers to a class of criteria for analyzing multivariate data in general, sharing the characteristic aim of optimizing the homogeneity of variables under various forms of manipulation and simplification (Gifi 1990, p. 81). This view of homogeneity analysis will be used in this article, since homals allows for such general computations. Furthermore, the two-dimensional as well as three-dimensional plotting devices offered by R are used to develop a variety of customizable visualization techniques. More detailed methodological descriptions can be found in Gifi (1990), and some of them are revisited in Michailidis and de Leeuw (1998).

2. Homogeneity Analysis

In this section we focus on the methodological foundations of homals. Starting with the formulation of the loss function, the classical alternating least squares algorithm is presented in brief and the relation to CA is shown. Starting from basic homogeneity analysis we elaborate various extensions such as nonlinear canonical correlation analysis and nonlinear principal component analysis.

2.1. Establishing the loss function

Homogeneity analysis is based on the criterion of minimizing the departure from homogeneity, which is measured by a loss function. To write the corresponding basic equations the following definitions are needed. For i = 1, ..., n objects, data on m (categorical) variables are collected, where each of the j = 1, ..., m variables takes on k_j different values (its levels or categories). We code them using n × k_j binary indicator matrices G_j, i.e., a dummy matrix for each variable. The whole set of indicator matrices can be collected in a block matrix

\[
G \stackrel{\Delta}{=} \left[\, G_1 \mid G_2 \mid \cdots \mid G_m \,\right]. \tag{1}
\]

Missing observations are coded as zero rows: if object i is missing on variable j, then row i of G_j is zero; otherwise the row sum is 1, since the category entries are disjoint. This corresponds to the first missing-data option presented in Gifi (1990, p. 74). Other possibilities would be to add an additional column to the indicator matrix for each variable with missing data, or to add as many additional columns as there are missing data for the j-th variable. The row sums of G_j are collected in the diagonal matrix M_j. Let $M_\ast$ denote the sum of the M_j and $M_\bullet$ their average. Furthermore, we define

\[
D_j \stackrel{\Delta}{=} G_j' M_j G_j = G_j' G_j, \tag{2}
\]

where D_j is the diagonal k_j × k_j matrix with the frequencies of variable j on its main diagonal.
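To make the coding concrete, here is a minimal sketch of how a single indicator matrix G_j, the diagonal M_j, and D_j = G_j'G_j can be built. It is an illustration in Python/NumPy rather than R, and the encoding of a missing value as -1 is our own convention, not part of homals:

```python
import numpy as np

def indicator(col, n_cats):
    """Binary indicator matrix G_j for one categorical variable;
    a missing value (coded -1 here, our own convention) gives a zero row."""
    G = np.zeros((len(col), n_cats))
    for i, c in enumerate(col):
        if c >= 0:
            G[i, c] = 1.0
    return G

# n = 5 objects, one variable with k_j = 3 categories, object 4 missing
col = [0, 2, 1, -1, 2]
G_j = indicator(col, 3)
M_j = np.diag(G_j.sum(axis=1))   # row sums: 1 if observed, 0 if missing
D_j = G_j.T @ G_j                # diagonal matrix of category frequencies
```

Note that the missing row contributes zero to both M_j and D_j, which is exactly how missings drop out of the loss function below.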

Now let X be the unknown n × p matrix containing the coordinates (object scores) of the object projections into R^p. Furthermore, let Y_j be the unknown k_j × p matrix containing the coordinates of the category projections into the same p-dimensional space (category quantifications). The problem of finding these solutions can be formulated by means of the following

Journal of Statistical Software 3

loss function to be minimized:

\[
\sigma(X; Y_1, \ldots, Y_m) \stackrel{\Delta}{=} \frac{1}{m} \sum_{j=1}^{m} \mathrm{tr}\,(X - G_j Y_j)'\, M_j\, (X - G_j Y_j). \tag{3}
\]

We use the normalizations $u'M_\bullet X = 0$ and $X'M_\bullet X = I$ in order to avoid the trivial solution X = 0 and Y_j = 0. The first restriction centers the graph plot (see Section 4) around the origin, whereas the second makes the columns of the object score matrix orthogonal.

2.2. Geometry of the loss function

In the homals package we motivate homogeneity analysis as a graphical method to explore multivariate data sets. The joint plot, in which the object scores and the category quantifications are mapped into a joint space, can be considered the classical or standard homals plot. The category points are the centers of gravity of the object points that share the same category. The larger the spread between category points, the better a variable discriminates, and thus the more the variable contributes to the relative loss. The distance between two object scores is related to the "similarity" between their response patterns. A "perfect" solution, i.e., one without any loss at all, would imply that all object points coincide with their category points.

Moreover, we can think of G as the adjacency matrix of a bipartite graph in which the n objects and the categories k_j are the vertices. In the corresponding graph plot an object and a category are connected by an edge if the object is in the corresponding category. The loss in (3) pertains to the sum of squares of the line lengths in the graph plot. Producing a star plot, i.e., connecting the object scores with their category centroid, the loss corresponds to the sum over variables of the sum of squared line lengths. More detailed plot descriptions are given in Section 4.

2.3. Minimizing the loss function

Typically, the minimization problem is solved by the iterative alternating least squares algorithm (ALS; sometimes called the reciprocal averaging algorithm). At iteration t = 0 we start with arbitrary object scores $X^{(0)}$. Each iteration t consists of three steps:

1. Update category quantifications: $Y_j^{(t)} = D_j^{-1} G_j' X^{(t)}$

2. Update object scores: $X^{(t)} = M_\ast^{-1} \sum_{j=1}^{m} G_j Y_j^{(t)}$

3. Normalization: $X^{(t+1)} = \mathrm{orth}(X^{(t)})$

Note that matrix multiplications involving indicator matrices can be implemented efficiently by accumulating row sums over X and Y.

Here orth is some technique that computes an orthonormal basis for the column space of a matrix. We can use the QR decomposition, modified Gram-Schmidt, or the singular value decomposition (SVD). In homals the left singular vectors of $X^{(k)}$, denoted here as lsvec, are used.
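The three ALS steps above can be sketched as follows. This is an illustrative NumPy translation on toy data of our own, not the homals implementation; it assumes complete data (so $M_\ast = mI$) and uses the SVD (lsvec) for the normalization step:

```python
import numpy as np

def homals_als(G_list, p=2, n_iter=100, seed=1):
    """Basic homogeneity analysis by alternating least squares.
    Illustrative sketch: complete data (M_* = m I), orth via SVD (lsvec)."""
    n = G_list[0].shape[0]
    m = len(G_list)
    X = np.random.default_rng(seed).standard_normal((n, p))
    for _ in range(n_iter):
        # step 1: category quantifications Y_j = D_j^{-1} G_j' X
        Y = [np.linalg.solve(G.T @ G, G.T @ X) for G in G_list]
        # step 2: object scores X = M_*^{-1} sum_j G_j Y_j
        X = sum(G @ Yj for G, Yj in zip(G_list, Y)) / m
        # step 3: normalize (center, then take left singular vectors)
        X -= X.mean(axis=0)
        X = np.linalg.svd(X, full_matrices=False)[0]
    return X, Y

# two categorical variables observed on n = 6 objects
G1 = np.eye(3)[[0, 0, 1, 1, 2, 2]]
G2 = np.eye(2)[[0, 1, 0, 1, 0, 1]]
X, Y = homals_als([G1, G2])
```

After convergence X satisfies both normalization conditions: its columns are orthonormal and sum to zero.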

4 Homals in R

To simplify, let P_j denote the orthogonal projector onto the subspace spanned by the columns of G_j, i.e., $P_j = G_j D_j^{-1} G_j'$. Correspondingly, the sum over the m projectors is

\[
P_\ast = \sum_{j=1}^{m} P_j = \sum_{j=1}^{m} G_j D_j^{-1} G_j'. \tag{4}
\]

Again, $P_\bullet$ denotes the average. By means of the lsvec notation and including $P_\bullet$ we can describe a complete iteration step as

\[
X^{(k+1)} = \mathrm{lsvec}(X^{(k)}) = \mathrm{lsvec}(M_\bullet^{-1} P_\bullet X^{(k)}). \tag{5}
\]

In each iteration we compute the value of the loss function to monitor convergence. Note that Formula (5) is not suitable for computation, because it replaces computations with sparse indicator matrices by computations with a dense average projector.

Computing the homals solution in this way is the same as performing a CA on G. Usually, multiple CA solves the generalized eigenproblem for the Burt matrix C = G'G and its diagonal D (Greenacre 1984; Greenacre and Blasius 2006). Thus, we can put the problem in Equation 3 into an SVD context (de Leeuw, Michailidis, and Wang 1999). Using the block matrix notation, we have to solve the generalized singular value problem of the form

\[
GY = M_\ast X \Lambda, \tag{6}
\]
\[
G'X = D Y \Lambda, \tag{7}
\]

or equivalently one of the two generalized eigenvalue problems

\[
G D^{-1} G' X = M_\ast X \Lambda^2, \tag{8}
\]
\[
G' M_\ast^{-1} G Y = D Y \Lambda^2. \tag{9}
\]

Here the eigenvalues $\Lambda^2$ are, along each dimension, the ratios of the average between-category variance to the average total variance. Also, $X'P_jX$ is the between-category dispersion for variable j. Further elaborations can be found in Michailidis and de Leeuw (1998).

Compared to the classical SVD approach, the ALS algorithm computes only the first p dimensions of the solution, which increases computational efficiency. Moreover, by capitalizing on the sparseness of G, homals is able to handle large data sets.
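The equivalence with CA on the Burt matrix can be checked numerically. The following sketch (NumPy, toy data of our own) symmetrizes the generalized eigenproblem (9) for the complete-data case $M_\ast = mI$; the largest eigenvalue is the trivial solution 1, and all eigenvalues lie in [0, 1]:

```python
import numpy as np

# block indicator matrix G = [G1 | G2] for m = 2 toy variables
G = np.hstack([np.eye(3)[[0, 0, 1, 1, 2, 2]],
               np.eye(2)[[0, 1, 0, 1, 0, 1]]])
m = 2

C = G.T @ G                          # Burt matrix C = G'G
d = np.diag(C)                       # its diagonal D (category frequencies)
Dih = np.diag(1.0 / np.sqrt(d))      # D^{-1/2}

# symmetrized form of eigenproblem (9) with M_* = m I (complete data):
# D^{-1/2} C D^{-1/2} / m has the same eigenvalues as D^{-1} C / m
S = Dih @ (C / m) @ Dih
evals = np.linalg.eigvalsh(S)        # ascending order
```

The eigenvector belonging to the eigenvalue 1 is $D^{1/2}u$, corresponding to the trivial solution that the normalization conditions exclude.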

3. Extensions of homogeneity analysis

Gifi (1990) provides various extensions of homogeneity analysis and elaborates connections to other multivariate methods. The package homals allows for imposing restrictions on the variable ranks and levels as well as for defining sets of variables. These options offer a wide spectrum of additional possibilities for multivariate data analysis beyond classical homogeneity analysis (cf. the broad-sense view in the Introduction).


3.1. Nonlinear principal component analysis

Given an n × m data matrix with metric variables, principal component analysis (PCA) is a common technique to reduce the dimensionality of the data set, i.e., to project the variables into a subspace R^p where p ≪ m. The Eckart-Young theorem states that this classical form of linear PCA can be formulated by means of a loss function. Its minimization leads to an n × p matrix of component scores and an m × p matrix of component loadings.

For nonmetric variables, however, nonlinear PCA (NLPCA) can be used. The term "nonlinear" pertains to nonlinear transformations of the observed variables (de Leeuw 2006). In Gifi terminology, NLPCA can be defined as homogeneity analysis with restrictions on the quantification matrix Y_j. Let $r_j \leq p$ denote the rank restriction imposed on variable j. If no restrictions are imposed, as e.g. for a simple homals solution, $r_j = k_j - 1$ if $k_j \leq p$, and $r_j = p$ otherwise.

We start our explanations with the simple case of $r_j = 1$ for all j. In this case we say that all variables are single, and the rank restrictions are imposed by

\[
Y_j = z_j a_j', \tag{10}
\]

where $z_j$ is a vector of length $k_j$ with category quantifications and $a_j$ a vector of length p with weights. Thus, each quantification matrix is restricted to rank 1, which allows for the existence of object scores with a single category quantification.

Straightforwardly, Equation 10 can be extended to the general case

\[
Y_j = Z_j A_j', \tag{11}
\]

where again $1 \leq r_j \leq \min(k_j - 1, p)$, $Z_j$ is $k_j \times r_j$, and $A_j$ is $p \times r_j$. We require, without loss of generality, that $Z_j' D_j Z_j = I$. Thus, we have the situation of multiple quantifications, which implies imposing an additional constraint each time PCA is carried out.

To establish the loss function for the rank-constrained version, we write $r_\ast$ for the sum of the $r_j$ and $r_\bullet$ for their average. The block matrix G of dummies now becomes

\[
Q \stackrel{\Delta}{=} \left[\, G_1 Z_1 \mid G_2 Z_2 \mid \cdots \mid G_m Z_m \,\right]. \tag{12}
\]

Gathering the $A_j$'s in a block matrix as well, we obtain the $p \times r_\ast$ matrix

\[
A \stackrel{\Delta}{=} \left[\, A_1 \mid A_2 \mid \cdots \mid A_m \,\right]. \tag{13}
\]

results. Then, Equation 3 becomes

\[
\sigma(X; Z; A) = \sum_{j=1}^{m} \mathrm{tr}\,(X - G_j Z_j A_j')'\, M_j\, (X - G_j Z_j A_j') = \mathrm{tr}\,(Q - XA')'(Q - XA') + m(p - r_\bullet). \tag{14}
\]

This shows that $\sigma(X; Y_1, \ldots, Y_m) \geq m(p - r_\bullet)$, and the loss attains this lower bound if we can choose the $Z_j$ such that Q is of rank p. In fact, by minimizing (14) over X and A we see

6 Homals in R

that

\[
\sigma(Z) \stackrel{\Delta}{=} \min_{X, A}\, \sigma(X; Z; A) = \sum_{s=p+1}^{r_\ast} \lambda_s^2(Z) + m(p - r_\bullet), \tag{15}
\]

where the $\lambda_s$ are the ordered singular values. A corresponding example in terms of a loss plot is given in Section 4.

Now we take into account the scale level of the variables in terms of restrictions within $Z_j$. To do this, the starting point is to split up Equation 14 into two separate terms (Gifi 1990; Michailidis and de Leeuw 1998). Using $\hat{Y}_j = D_j^{-1} G_j' X$ this leads to

\[
\begin{aligned}
\sum_{j=1}^{m} \mathrm{tr}\,&(X - G_j Y_j)'\, M_j\, (X - G_j Y_j) \\
&= \sum_{j=1}^{m} \mathrm{tr}\,\bigl(X - G_j(\hat{Y}_j + (Y_j - \hat{Y}_j))\bigr)'\, M_j\, \bigl(X - G_j(\hat{Y}_j + (Y_j - \hat{Y}_j))\bigr) \\
&= \sum_{j=1}^{m} \mathrm{tr}\,(X - G_j \hat{Y}_j)'\, M_j\, (X - G_j \hat{Y}_j) + \sum_{j=1}^{m} \mathrm{tr}\,(Y_j - \hat{Y}_j)'\, D_j\, (Y_j - \hat{Y}_j).
\end{aligned} \tag{16}
\]

Obviously, the rank restrictions $Y_j = Z_j A_j'$ affect only the second term; hence we proceed by considering this term only, i.e.,

\[
\sigma(Z; A) = \sum_{j=1}^{m} \mathrm{tr}\,(Z_j A_j' - \hat{Y}_j)'\, D_j\, (Z_j A_j' - \hat{Y}_j). \tag{17}
\]

Level constraints for nominal, ordinal, and numerical variables can now be imposed on $Z_j$ in the following manner. For nominal variables, all columns of $Z_j$ are unrestricted. Equation 17 is minimized under the conditions $u'D_jZ_j = 0$, $Z_j'D_jZ_j = I$, and $u'D_j\hat{Y}_j = 0$. The stationary equations are

\[
A_j = \hat{Y}_j' D_j Z_j, \tag{18a}
\]
\[
\hat{Y}_j A_j = Z_j W + u h', \tag{18b}
\]

with W a symmetric matrix of Lagrange multipliers. Solving, we find

\[
h = \frac{1}{u' D_j u}\, A_j' \hat{Y}_j' D_j u = 0, \tag{19}
\]

and thus, letting $\bar{Z}_j \stackrel{\Delta}{=} D_j^{1/2} Z_j$ and $\bar{Y}_j \stackrel{\Delta}{=} D_j^{1/2} \hat{Y}_j$, it follows that

\[
\bar{Y}_j \bar{Y}_j' \bar{Z}_j = \bar{Z}_j W. \tag{20}
\]

If $\bar{Y}_j = K \Lambda L'$ is the SVD of $\bar{Y}_j$, then we see that $\bar{Z}_j = K_r O$, with O an arbitrary rotation matrix. Thus $Z_j = D_j^{-1/2} K_r O$, and $A_j = \bar{Y}_j' \bar{Z}_j = L_r \Lambda_r O$. Moreover, $Z_j A_j' = D_j^{-1/2} K_r \Lambda_r L_r'$.
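This closed-form nominal solution can be sketched directly from the SVD (NumPy, toy values of our own, rotation O = I). With r = p the product $Z_jA_j'$ reproduces $\hat{Y}_j$ exactly:

```python
import numpy as np

def nominal_solution(Y_hat, d, r):
    """Nominal-level solution: Z_j = D^{-1/2} K_r, A_j = L_r Lambda_r
    from the SVD of Ybar = D^{1/2} Y_hat (rotation O = I)."""
    Ybar = np.sqrt(d)[:, None] * Y_hat
    K, lam, Lt = np.linalg.svd(Ybar, full_matrices=False)
    Z = K[:, :r] / np.sqrt(d)[:, None]
    A = Lt[:r].T * lam[:r]
    return Z, A

rng = np.random.default_rng(0)
d = np.array([3.0, 5.0, 2.0, 4.0])            # diagonal of D_j
Y_hat = rng.standard_normal((4, 2))
Y_hat -= d @ Y_hat / d.sum()                  # enforce u' D_j Y_hat = 0
Z, A = nominal_solution(Y_hat, d, r=2)
```

The returned Z satisfies both normalization conditions, $Z_j'D_jZ_j = I$ and $u'D_jZ_j = 0$, by construction.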

For ordinal variables, the first column of $Z_j$ is constrained to be either increasing or decreasing; the rest is free. Again (17) has to be minimized under the condition $Z_j'D_jZ_j = I$ (and optionally additional conditions on $Z_j$). If we minimize over $A_j$ first, we can equivalently solve the problem of maximizing $\mathrm{tr}\,(Z_j'D_j\hat{Y}_j\hat{Y}_j'D_jZ_j)$ subject to $Z_j'D_jZ_j = I$.


In the case of numerical variables, the first column of $Z_j$, denoted by $z_{j0}$, is fixed; the rest is free. Hence, the loss function in (17) changes to

\[
\sigma(Z, A) = \sum_{j=1}^{m} \mathrm{tr}\,(Z_j A_j' + z_{j0} a_{j0}' - \hat{Y}_j)'\, D_j\, (Z_j A_j' + z_{j0} a_{j0}' - \hat{Y}_j). \tag{21}
\]

Since column $z_{j0}$ is fixed, $Z_j$ here is a $k_j \times (r_j - 1)$ matrix, and $A_j$, with $a_{j0}$ as its first column, is $p \times (r_j - 1)$. As minimization condition, $z_{j0}' D_j Z_j = 0$ is required.

Note that level constraints can be imposed in addition to rank constraints. If the data set has variables with different scale levels, homals allows for setting level constraints for each variable j separately.

3.2. Nonlinear canonical correlation analysis

In Gifi terminology, nonlinear canonical correlation analysis (NLCCA) is called "OVERALS" (van der Burg, de Leeuw, and Verdegaal 1988; van der Burg, de Leeuw, and Dijksterhuis 1994). This is due to the fact that it has most of the other Gifi models as special cases. In this section the relation to homogeneity analysis is shown. The homals package allows for the definition of sets of variables and thus for the computation of NLCCA between v = 1, ..., K sets of variables.

Recall that the aim of homogeneity analysis is to find p orthogonal vectors in m indicator matrices $G_j$. This approach can be extended to computing p orthogonal vectors in K general matrices $G_v$, each of dimension $n \times m_v$, where $m_v$ is the number of variables (j = 1, ..., $m_v$) in set v. Thus,

\[
G_v \stackrel{\Delta}{=} \left[\, G_{v1} \mid G_{v2} \mid \cdots \mid G_{vm_v} \,\right]. \tag{22}
\]

The loss function can be stated as

\[
\sigma(X; Y_1, \ldots, Y_K) \stackrel{\Delta}{=} \frac{1}{K} \sum_{v=1}^{K} \mathrm{tr}\,\Bigl(X - \sum_{j=1}^{m_v} G_{vj} Y_{vj}\Bigr)'\, M_v\, \Bigl(X - \sum_{j=1}^{m_v} G_{vj} Y_{vj}\Bigr). \tag{23}
\]

X is the $n \times p$ matrix of object scores, $G_{vj}$ is $n \times k_j$, and $Y_{vj}$ is the $k_j \times p$ matrix containing the coordinates. Missing values are taken into account in $M_v$, which is the element-wise minimum of the $M_j$ in set v. The normalization conditions are $X'M_\bullet X = I$ and $u'M_\bullet X = 0$, where $M_\bullet$ is the average of the $M_v$.

Since NLPCA can be considered a special case of NLCCA, i.e., for K = m, all the additional restrictions for different scaling levels can straightforwardly be applied to NLCCA. Unlike classical canonical correlation analysis, NLCCA is not restricted to two sets of variables but allows for the definition of an arbitrary number of sets. Furthermore, if the sets are treated in an asymmetric manner, predictive models such as regression analysis and discriminant analysis can be emulated. For v = 1, 2 sets this implies that $G_1$ is $n \times 1$ and $G_2$ is $n \times (m - 1)$. Corresponding examples are given in Section 4.2.


3.3. Cone restricted SVD

In this final methodological section we show how the loss functions of these models can be solved in terms of a cone-restricted SVD. All the transformations discussed above are projections onto some convex cone $\mathcal{K}_j$. For the sake of simplicity we drop the j and v indices and look only at the second term of the partitioned loss function (see Equation 17), i.e.,

\[
\sigma(Z, A) = \mathrm{tr}\,(ZA' - Y)'\, D\, (ZA' - Y), \tag{24}
\]

over Z and A, where Y is $k \times p$, Z is $k \times r$, and A is $p \times r$. Moreover, the first column $z_0$ of Z is restricted by $z_0 \in \mathcal{K}$, with $\mathcal{K}$ a convex cone. Z should also satisfy the common normalization conditions $u'DZ = 0$ and $Z'DZ = I$.

The basic idea of the algorithm is to apply alternating least squares with rescaling. Thus we alternate minimizing over Z for fixed A and over A for fixed Z. The "non-standard" part of the algorithm is that we do not impose the normalization conditions when we minimize over Z. We show below that we can still produce a sequence of normalized solutions with a non-increasing sequence of loss function values.

Suppose $(\tilde{Z}, \tilde{A})$ is our current best solution. To improve it we first minimize over the non-normalized Z, satisfying the cone constraint and keeping A fixed at $\tilde{A}$. This gives $\hat{Z}$ and a corresponding loss function value $\sigma(\hat{Z}, \tilde{A})$. Clearly,

\[
\sigma(\hat{Z}, \tilde{A}) \leq \sigma(\tilde{Z}, \tilde{A}), \tag{25}
\]

but $\hat{Z}$ is not normalized. Now update $\hat{Z}$ to $Z^{+}$ using the weighted Gram-Schmidt solution $\hat{Z} = Z^{+}S$, where S is the Gram-Schmidt triangular matrix. The first column $z_0$ of $\hat{Z}$ satisfies the cone constraint, and because of the nature of Gram-Schmidt, so does the first column of $Z^{+}$. Observe that it is quite possible that

\[
\sigma(Z^{+}, \tilde{A}) > \sigma(\hat{Z}, \tilde{A}). \tag{26}
\]

This seems to invalidate the usual convergence proof, which is based on a non-increasing sequence of loss function values. But now also adjust $\tilde{A}$ to $\hat{A} = \tilde{A}S'$. Then $\hat{Z}\tilde{A}' = Z^{+}\hat{A}'$, and thus

\[
\sigma(\hat{Z}, \tilde{A}) = \sigma(Z^{+}, \hat{A}). \tag{27}
\]

Finally compute $A^{+}$ by minimizing $\sigma(Z^{+}, A)$ over A. Since $\sigma(Z^{+}, A^{+}) \leq \sigma(Z^{+}, \hat{A})$ we have the chain

\[
\sigma(Z^{+}, A^{+}) \leq \sigma(Z^{+}, \hat{A}) = \sigma(\hat{Z}, \tilde{A}) \leq \sigma(\tilde{Z}, \tilde{A}). \tag{28}
\]

In any iteration the loss function does not increase. In actual computation, it is not necessary to compute $\hat{A}$, and thus it is also not necessary to compute the Gram-Schmidt triangular matrix S.
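The rescaling trick can be sketched as follows (NumPy, toy values of our own; a QR decomposition of $D^{1/2}\hat{Z}$ plays the role of weighted Gram-Schmidt). The compensating adjustment of A leaves the loss unchanged while restoring $Z'DZ = I$:

```python
import numpy as np

def loss(Z, A, Y, D):
    """Cone-free part of the loss: tr (ZA' - Y)' D (ZA' - Y)."""
    R = Z @ A.T - Y
    return np.trace(R.T @ D @ R)

rng = np.random.default_rng(3)
k, p, r = 5, 3, 2
Y = rng.standard_normal((k, p))
d = np.array([2.0, 4.0, 1.0, 3.0, 2.0])
D = np.diag(d)
A_cur = rng.standard_normal((p, r))

# Z-step without normalization (A fixed): the normal equations give
# Z A'A = Y A, hence Z = Y A (A'A)^{-1}
Z_hat = Y @ A_cur @ np.linalg.inv(A_cur.T @ A_cur)

# weighted Gram-Schmidt via QR of D^{1/2} Z_hat: Z_hat = Z_plus S
Q, S = np.linalg.qr(np.sqrt(d)[:, None] * Z_hat)
Z_plus = Q / np.sqrt(d)[:, None]
A_adj = A_cur @ S.T          # compensating adjustment: Z_plus A_adj' = Z_hat A_cur'
```

The assertions below check exactly the two facts used in the convergence argument: $Z^{+}$ is normalized in the D metric, and the adjusted pair has the same loss as the unnormalized one.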


4. The R package homals

At this point we show how the models described in the sections above can be computed using the package homals in R (R Development Core Team 2007), available on CRAN.

The core routine of the package is homals. The extended models can be fitted through appropriate settings of the arguments sets, rank, and level. An object of class "homals" is created, and the following methods are provided: print, summary, plot, plot3d, plot3dstatic, and predict.

The package offers a wide variety of plots, some of which are discussed in Michailidis and de Leeuw (1998) and Michailidis and de Leeuw (2001). In the plot method the user can specify the type of plot through the argument plot.type. For some plot types three-dimensional versions are provided in plot3d (dynamic) and plot3dstatic:

• Object plot ("objplot"): Plots the scores of the objects (rows in the data set) in two or three dimensions.

• Category plot ("catplot"): Plots the rank-restricted category quantifications for each variable separately. A three-dimensional version is available.

• Voronoi plot ("vorplot"): Produces a category plot with Voronoi regions.

• Joint plot ("jointplot"): The object scores and category quantifications are mapped in the same (two- or three-dimensional) device.

• Graph plot ("graphplot"): Like the joint plot, with additional connections between the scores/quantifications.

• Hull plot ("hullplot"): For each single variable the object scores are mapped onto two dimensions and the convex hull for each response category is drawn.

• Label plot ("labplot"): Similar to the object plot; the object scores are plotted, but for each variable separately with the corresponding category labels. A three-dimensional version is provided.

• Span plot ("spanplot"): Like the label plot, it maps the object scores for each variable and connects them by the shortest path within each response category.

• Star plot ("starplot"): Again, the object scores are mapped in two or three dimensions. In addition, these points are connected with the category centroid.

• Loss plot ("lossplot"): Plots the rank-restricted category quantifications against the unrestricted ones for each variable separately.

• Projection plot ("prjplot"): For variables of rank 1 the category scores (two-dimensional) are projected onto a straight line determined by the rank-restricted category quantifications.

• Vector plot ("vecplot"): For variables of rank 1 the object scores (two-dimensional) are projected onto a straight line determined by the rank-restricted category quantifications.

• Transformation plot ("trfplot"): Plots, variable-wise, the original (categorical) scale against the transformed (metric) scale Z_j for each solution.


• Loadings plot ("loadplot"): Plots the loadings a_j and connects them with the origin. Note that if r_j > 1, only the first solution is taken.

4.1. Simple Homogeneity Analysis

The first example is a simple (i.e., no level or rank restrictions, no sets defined) three-dimensional homogeneity analysis for the senate data set (ADA 2002). The data consist of 2001 Senate votes on 20 issues selected by Americans for Democratic Action. The votes selected cover a full spectrum of domestic, foreign, economic, military, environmental, and social issues. We tried to select votes which display sharp liberal/conservative contrasts. As a consequence, Democratic candidates have many more "yes" responses than Republican candidates. A full description of the items can be found in the corresponding package help file. The first column of the data set (i.e., 50 Republicans vs. 49 Democrats and 1 Independent) is inactive and will be used for validation.

> data(senate)

> res <- homals(senate, active = c(FALSE, rep(TRUE, 20)), ndim = 3)

> plot3d(res, plot.type = "objplot", sphere = FALSE, bgpng = NULL)

> plot(res, plot.type = "spanplot", plot.dim = c(1, 2), var.subset = 1)

> plot(res, plot.type = "spanplot", plot.dim = c(1, 3), var.subset = 1)

> plot(res, plot.type = "spanplot", plot.dim = c(2, 3), var.subset = 1)

> plot3dstatic(res, plot.type = "loadplot")

Figure 1 shows four "wings" of senators, which we denote by north, south, west, and east. The west and north wings are composed of Republicans, the east and south wings of Democrats. Note that the 3D plot is rotated such that Dimension 3 is horizontally aligned, Dimension 2 is vertically aligned, and Dimension 1 points backward. The two-dimensional slices show that Dimension 1 vs. 2 does not distinguish between Democrats and Republicans. If Dimension 3 is involved, as in the two bottom plots in Figure 1, the separation between Democrats and Republicans is obvious. To distinguish within the north-west and south-east wings, respectively, Item 19 has to be taken into account:

V19: S 1438. Military Base Closures. Warner (R-VA) motion to authorize an additional round of U.S. military base realignment and closures in 2003. Motion agreed to 53-47. September 25, 2001. A "yes" vote is a +.

Republicans belonging to the north wing as well as Democrats belonging to the east wing gave a "yes" vote; south-wing Democrats and west-wing Republicans voted "no". It is well known that the response on this item mainly depends on whether there is a military base in the senator's district or not: senators who have a military base in their district do not want to close it, since such a base provides jobs and is an important source of income for the district. Hence, these are the determining factors, not party affiliation. This result is underpinned by Figure 2, where Item 19 is clearly separated from the remaining items.

Given a (multiple) homals solution, we can reconstruct the indicator matrix by assigning each object to the closest category point of each variable.

Journal of Statistical Software 11

Figure 1: 3D Object Plot and Span Plots for Senate Dataset (span plots for Party, categories (D), (I), (R): Dimension 1 vs. 2, Dimension 1 vs. 3, and Dimension 2 vs. 3)

> p.res <- predict(res)

> p.res$cl.table$Party

     pre
obs   (D)  (I)  (R)
(D)    49    1    0
(I)     0    1    0
(R)     0    8   41

From the classification table we see that 91% of the party affiliations are correctly classified. Note that in the case of such a simple homals solution it can happen that a lower-dimensional solution results in a better classification rate than a higher-dimensional one. The reason is that in simple homals the classification rate is not the criterion being optimized.
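The 91% figure follows directly from the classification table above (a trivial check; the values are copied from the predict output):

```python
import numpy as np

# confusion table for Party (rows: observed, columns: predicted)
tab = np.array([[49, 1,  0],   # (D)
                [ 0, 1,  0],   # (I)
                [ 0, 8, 41]])  # (R)
rate = np.trace(tab) / tab.sum()   # correctly classified / total
```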


Figure 2: Loadings Plot for Senate Dataset (Dimensions 1-3; Party and V19 labeled)

To show additional plotting features of the homals package we run a three-dimensional homogeneity analysis on the mammals dentition data set (Hartigan 1975). In this data set dental characteristics are used in the classification of mammals. The teeth are divided into four groups: incisors, canines, premolars, and molars. Within each group top and bottom teeth are classified.

> data(mammals)

> res <- homals(mammals, ndim = 3)

> plot(res, plot.type = "graphplot")

> plot3dstatic(res, plot.type = "starplot", var.subset = 3, box = FALSE)

On the left-hand side of Figure 3 we have a graph plot where the object scores are drawn as green stars and the category quantifications as red circles. The objects are connected with the respective category responses in the data set.

For the variable-wise star plot we pick out the top canines, with zero canines coded as 1 and one canine coded as 2 (the full coding description can be found in the package help file). In the star plot the object scores are connected with the corresponding category centroid. Of the animals with more than one canine, the elk and the reindeer, which have the same object scores, are quite distant from the centroid. All other animals lie close around the category centroid.

4.2. Predictive Models and Canonical Correlation

The sets argument allows for partitioning the variables into sets in order to emulate canonical correlation analysis and predictive models. As outlined above, if the variables are partitioned into asymmetric sets of one variable vs. the others, we can put this type of homals model into


Figure 3: Graph plot and 3D star plot for Mammals data (star plot for top canines, categories 1 and 2; elk labeled)

a predictive modeling context. If not, the interpretation in terms of canonical correlation is more appropriate.

To demonstrate this, we use the galo data set (Peschar 1975), in which data on 1290 school children in the sixth grade of elementary school in the city of Groningen (Netherlands) were collected. The variables are Gender, IQ (categorized into 9 ordered categories), Advice (the teacher categorized the children into 7 possible forms of secondary education, i.e., Agr = agricultural; Ext = extended primary education; Gen = general; Grls = secondary school for girls; Man = manual, including housekeeping; None = no further education; Uni = pre-University), and SES (parents' profession in 6 categories). In this example it could be of interest to predict Advice from Gender, IQ, and SES.

> data(galo)

> res <- homals(galo, active = c(rep(TRUE, 4), FALSE), sets = list(c(1,

+ 2, 4), 3, 5))

> plot(res, plot.type = "vorplot", var.subset = 3)

> plot(res, plot.type = "labplot", var.subset = 2)

> predict(res)

Classification rate:
  Variable Cl. Rate %Cl. Rate
1   gender   0.5690     56.90
2       IQ   0.6333     63.33
3   advice   0.6318     63.18
4      SES   0.2907     29.07
5   School   0.0302      3.02

This corresponds to a rate of 0.6318 correctly classified teacher advice categories. The Voronoi plot in Figure 4


shows the Voronoi regions for the same variable. A labeled plot is given for the IQs, which shows that on the upper half of the horseshoe there are mainly children with IQ categories 7–9. Distinctions between these levels of intelligence are mainly reflected by Dimension 1. For the lower horseshoe half it can be stated that both dimensions reflect differences in lower IQ categories.


Figure 4: Voronoi Plot and Label Plot for Galo Data

Using the classical iris dataset, the aim is to predict Species from Petal/Sepal Length/Width. The polynomial level constraint is imposed on the predictors and the response is treated as nominal. A hull plot for the response, a label plot for Petal Length, and loss plots for all predictors are produced.

> data(iris)

> res <- homals(iris, sets = list(1:4, 5), level = c(rep("polynomial",

+ 4), "nominal"), rank = 2, itermax = 2000)

> plot(res, plot.type = "hullplot", var.subset = 5, cex = 0.7)

> plot(res, plot.type = "labplot", var.subset = 3, cex = 0.7)

> plot(res, plot.type = "lossplot", var.subset = 1:4, cex = 0.7)

For this two-dimensional homals solution, 100% of the iris species are correctly classified. The hullplot in Figure 5 shows that the species are clearly separated on the two-dimensional plane. In the label plot the object scores are labeled with the response on Petal Length, and it becomes obvious that small lengths form the setosa "cluster", whereas iris virginica is composed of observations with large petal lengths. Iris versicolor has medium lengths.

The loss plots in Figure 6 show the fitted rank-2 solution (red lines) against the unrestricted solution. The implication of the polynomial level restriction for the fitted model is obvious.
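As an aside, the effect of a polynomial level restriction can be sketched in a few lines of base R without homals: the category quantifications are replaced by their least-squares projection onto a low-degree polynomial of the category ranks. The quantification values below are invented purely for illustration.

```r
## Hypothetical "free" quantifications for 8 ordered categories.
q <- c(-1.9, -1.2, -0.4, 0.1, 0.3, 0.9, 1.1, 2.1)
r <- seq_along(q)                 # category ranks 1..8

## Restrict the quantifications to a quadratic polynomial in the ranks
## via an ordinary least-squares projection.
fit    <- lm(q ~ poly(r, 2))
q.poly <- fitted(fit)             # restricted quantifications

round(q.poly, 2)                  # smooth in the ranks, close to q
```

The restricted values follow the overall trend of the free quantifications but are forced onto a smooth curve, which is what the red lines in the loss plots depict.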

To show another homals application of predictive (in this case regression) modeling we use the Neumann dataset (Wilson 1926): Willard Gibbs discovered a theoretical formula connecting



Figure 5: Hullplot and Label Plot for Iris Data

the density, the pressure, and the absolute temperature of a mixture of gases with convertible components. He applied this formula and the estimated constants to 65 experiments carried out by Neumann, and he discusses the systematic and accidental divergences (residuals). In homals such a linear regression problem can be emulated by setting numerical levels. Constraining the levels to be ordinal, we get a monotone regression (Gifi 1990).
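The distinction between the two level constraints can be illustrated without homals: base R's isoreg() computes the kind of monotone (isotonic) least-squares fit that an ordinal level imposes, whereas a numerical level corresponds to an ordinary linear fit. The toy data below are invented for illustration.

```r
## Toy response: roughly increasing in x, but with local violations.
x <- 1:10
y <- c(1, 3, 2, 4, 6, 5, 7, 9, 8, 10)

## Linear fit (analogous to a numerical level) vs. monotone fit
## (analogous to an ordinal level).
lin <- fitted(lm(y ~ x))
mon <- isoreg(x, y)$yf

all(diff(mon) >= 0)   # the monotone fit is non-decreasing by construction
```

The monotone fit is a non-decreasing step function: it preserves the order of the categories but, unlike the linear fit, does not force equal spacing.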

> data(neumann)

> res.lin <- homals(neumann, sets = list(3, 1:2), level = "numerical",

+ rank = 1)

> res.mon <- homals(neumann, sets = list(3, 1:2), level = "ordinal",

+ rank = 1)

> plot(res.lin, plot.type = "loadplot", main = "Loadings Plot Linear Regression")

> plot(res.mon, plot.type = "loadplot", main = "Loadings Plot Monotone Regression")

The points in the loadings plot in Figure 7 correspond to regression coefficients.
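The correspondence with ordinary regression can be checked directly in base R. The sketch below uses simulated data and lm() rather than the actual neumann variables, so the variable names and coefficient values are illustrative assumptions only.

```r
## Simulated analogue of a density ~ temperature + pressure regression.
set.seed(1)
d <- data.frame(temperature = rnorm(65), pressure = rnorm(65))
d$density <- 0.5 * d$temperature - 0.3 * d$pressure + rnorm(65, sd = 0.1)

## With purely numerical levels and rank 1, an asymmetric homals
## solution such as sets = list(3, 1:2) emulates this least-squares
## fit, with the loadings in the role of regression coefficients.
fit <- lm(density ~ temperature + pressure, data = d)
round(coef(fit), 2)   # slopes near 0.5 and -0.3
```
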

4.3. NLPCA on Roskam data

Roskam (1968) collected preference data in which 39 psychologists ranked all nine areas (see Table 1) of the Psychology Department at the University of Nijmegen. Using this data set we perform two-dimensional NLPCA by restricting the rank to 1. Note that the objects are the areas and the variables are the psychologists; thus, the input data structure is a 9 × 39 data frame. The scale level is set to "ordinal".

> data(roskam)

> res <- homals(roskam, rank = 1, level = "ordinal")

> plot(res, plot.type = "objplot")

> plot(res, plot.type = "labplot", var.subset = 2, main = "Labelplot Rater 2")



Figure 6: Loss plots for Iris Predictors

The object plot in Figure 8 shows interesting rating "twins" of departmental areas: mathematical and experimental psychology, industrial psychology and test construction (which are close to the former two areas), educational and social psychology, and clinical and cultural psychology. Physiological and animal psychology are somewhat separated from the other areas. The label plot allows a closer look at a particular rater; we pick out rater #2. Obviously this rater is attracted to areas like social, cultural, and clinical psychology rather than to methodological fields. Further analyses of this dataset within a PCA context can be found in de Leeuw (2006).



Figure 7: Loading Plots for Neumann Regression

SOC  Social Psychology
EDU  Educational and Developmental Psychology
CLI  Clinical Psychology
MAT  Mathematical Psychology and Psychological Statistics
EXP  Experimental Psychology
CUL  Cultural Psychology and Psychology of Religion
IND  Industrial Psychology
TST  Test Construction and Validation
PHY  Physiological and Animal Psychology

Table 1: Psychology Areas in Roskam Data.

5. Discussion

In this paper the theoretical foundations of the methodology used in homals are elaborated, and package application and visualization issues are presented. Basically, homals covers the models described in Gifi (1990): homogeneity analysis, NLCCA, predictive models, and NLPCA. It can handle missing data, and the scale level of the variables can be taken into account. The package offers a broad variety of real-life datasets and furthermore provides numerous methods of visualization, either in a two-dimensional or in a three-dimensional way. To conclude, homals provides flexible, easy-to-use routines which allow researchers from different areas to compute, interpret, and visualize models belonging to the Gifi family.



Figure 8: Plots for Roskam data

References

ADA (2002). "Voting Record: Shattered Promise of Liberal Progress." ADA Today, 57(1), 1–17.

Benzécri JP (1973). Analyse des Données. Dunod, Paris, France.

de Leeuw J (2006). "Nonlinear Principal Component Analysis and Related Techniques." In M Greenacre, J Blasius (eds.), Multiple Correspondence Analysis and Related Methods, pp. 107–134. Chapman & Hall/CRC, Boca Raton, FL.

de Leeuw J, Michailidis G, Wang D (1999). "Correspondence Analysis Techniques." In S Ghosh (ed.), Multivariate Analysis, Design of Experiments, and Survey Sampling, pp. 523–546. Dekker, New York.

Gifi A (1990). Nonlinear Multivariate Analysis. Wiley, Chichester, England.

Greenacre M (1984). Theory and Applications of Correspondence Analysis. Academic Press, London, England.

Greenacre M, Blasius J (2006). Multiple Correspondence Analysis and Related Methods. Chapman & Hall/CRC, Boca Raton, FL.

Hartigan JA (1975). Clustering Algorithms. Wiley, New York.

Mair P, Hatzinger R (2007). “Psychometrics Task View.” R-News, 7/2, in press.

Michailidis G, de Leeuw J (1998). "The Gifi System of Descriptive Multivariate Analysis." Statistical Science, 13, 307–336.


Michailidis G, de Leeuw J (2001). "Data Visualization through Graph Drawing." Computational Statistics, 16, 435–450.

Nenadic O, Greenacre M (2006). "Correspondence Analysis in R, with Two- and Three-dimensional Graphics: The ca Package." Journal of Statistical Software, 20(3), 1–13.

Peschar JL (1975). School, Milieu, Beroep. Tjeenk Willink, Groningen, The Netherlands.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.

Roskam E (1968). Metric Analysis of Ordinal Data in Psychology. PhD thesis, University of Leiden, Leiden, The Netherlands.

van der Burg E, de Leeuw J, Dijksterhuis G (1994). "OVERALS: Nonlinear Canonical Correlation with k Sets of Variables." Computational Statistics & Data Analysis, 18, 141–163.

van der Burg E, de Leeuw J, Verdegaal R (1988). "Homogeneity Analysis with k Sets of Variables: An Alternating Least Squares Method with Optimal Scaling Factors." Psychometrika, 53, 177–197.

Wilson EB (1926). “Empiricism and Rationalism.” Science, 64, 47–57.

