Page 1: A Local Indicator of Multivariate Spatial Association: Extending Geary…priede.bf.lu.lv/ftp/pub/TIS/datu_analiize/GeoDa/LA... · 2017. 11. 3. · A Local Indicator of Multivariate

A Local Indicator of Multivariate Spatial Association: Extending Geary’s c.∗

Luc Anselin

Center for Spatial Data Science, University of Chicago

March 17, 2017

Abstract

This paper extends the application of the Local Geary c statistic to a multivariate context. The statistic is conceptualized as a weighted distance in multivariate attribute space between an observation and its geographical neighbors. Inference is based on a conditional permutation approach. The interpretation of significant univariate Local Geary statistics is clarified and the differences with a multivariate case outlined. An empirical illustration uses Guerry’s classic data on moral statistics in 1830s France.

Keywords: spatial clusters, LISA, multivariate spatial association, Local Geary c, spatial data science

∗This research was funded in part by Award 1R01HS021752-01A1 from the Agency for Healthcare Research and Quality (AHRQ), “Advancing spatial evaluation methods to improve healthcare efficiency and quality.” Comments by Julia Koschinsky on an earlier draft are greatly appreciated.

1 Introduction

An important and growing component of geographical analysis is a focus on the local, reflected in new methods to deal with local spatial autocorrelation and local spatial heterogeneity (for general overviews, see, e.g., Unwin, 1996; Fotheringham, 1997; Unwin and Unwin, 1998; Fotheringham and Brunsdon, 1999; Boots and Okabe, 2007; Lloyd, 2010). Specifically, considerable interest has been devoted to local indicators of spatial association (LISA) since the original LISA framework was outlined in Anselin (1995, 1996), building upon the initial work by Getis and Ord (1992, 1996) and Ord and Getis (1995).

The idea of a local test for spatial autocorrelation has been extended in multiple directions, such as applications to categorical data (Boots, 2003, 2006), points on networks (Yamada and Thill, 2007), the construction of optimal spatial weights (Getis and Aldstadt, 2004; Aldstadt and Getis, 2006), as well as space-time and income mobility (Rey, 2016). Considerable attention has been paid to problems of statistical inference, both exact and asymptotic, as well as more fundamental issues of multiple comparisons and correlated tests. For example, Sokal et al. (1998) examined the properties of asymptotic approximations based on analytical moments, whereas Tiefelsdorf (2002) developed a saddlepoint approximation to exact inference. The multiple comparison problem was discussed in general by de Castro and Singer (2006), and investigations into inference in the presence of global spatial autocorrelation are reported by Ord and Getis (2001) and Rogerson (2015). More technical issues have been considered as well, such as power calculations (Bivand et al., 2009), the design of optimal spatial weights (Rogerson, 2010; Rogerson and Kedron, 2012), and conceptual and computational issues pertaining to randomization inference (Lee, 2009; Hardisty and Klippel, 2010). In addition, the Local Moran test and the Getis-Ord $G_i$ statistics have been implemented in both commercial and open source spatial analytical software, such as GeoDa (Anselin et al., 2006), the spdep and other packages in R (Bivand, 2006; Bivand et al., 2013), the PySAL Python library for spatial analysis (Rey and Anselin, 2007), ESRI’s ArcGIS Spatial Analyst, and the online spatial analytical functionality in Carto (https://carto.com/blog/cluster-outlier-intro).

Most of the discussion of local spatial autocorrelation has been situated in a univariate context. The treatment of spatial autocorrelation in a multivariate setting has focused on a global statistic, specifically Moran’s I. This started with the work by Wartenberg (1985) that extended the notion of principal components to include spatial autocorrelation. This line of thinking was further generalized by Dray et al. (2008) into the concept of MULTISPATI, which adds a matrix of spatially lagged variables to the statistical triplet used in co-inertia analysis (see also Dray and Jombart, 2011). A different approach was taken in Lee (2001) specifically for a bivariate case, where a distinction is made between the correlative and the spatial association between two variables.

The current paper has two objectives. One is a closer examination of the univariate Local Geary statistic. This test was also proposed as part of the general LISA framework in Anselin (1995), but it has received little attention to date. Specifically, the interpretation and visualization of this statistic are discussed in some detail, with the emphasis on its use as a data exploratory tool rather than a statistical test in a strict sense. The second and main goal is to extend the univariate case to a multivariate setting, and to introduce a Local Geary statistic for multivariate spatial autocorrelation. The statistic is outlined and its inference and interpretation are discussed in detail, again with an emphasis on its use in data exploration, rather than as a strict test statistic. The new statistics are illustrated with a local take on the analysis by Dray and Jombart (2011) of global multivariate spatial autocorrelation based on the classic data set with “moral statistics of France,” attributed to an 1833 essay by André-Michel Guerry. The paper closes with some concluding remarks.

2 Local Geary c Revisited

As is well known in the spatial analysis literature, Geary (1954) introduced a global measure of spatial autocorrelation as:

$$c = \frac{(n-1) \sum_i \sum_j w_{ij} (x_i - x_j)^2}{2 S_0 \sum_i (x_i - \bar{x})^2}, \tag{1}$$

where $x_i$ is an observation on the variable of interest at location $i$, $\bar{x}$ is its mean, $n$ is the total number of observations, and $w_{ij}$ are the elements of the familiar spatial weights matrix, which embodies a prior notion of the neighbor structure of the observations.¹ The term $S_0$ corresponds to the sum of all the weights ($\sum_i \sum_j w_{ij}$). The Geary c statistic can equivalently be expressed as a ratio of two sums of squares, i.e., the squared difference between observations at $i$ and $j$ in the numerator, and the sum of squared deviations from the mean in the denominator:

$$c = \frac{\sum_i \sum_j w_{ij} (x_i - x_j)^2 / 2 S_0}{\sum_i (x_i - \bar{x})^2 / (n-1)}. \tag{2}$$

Clearly, the denominator is an unbiased estimator for the variance. The numerator, on the other hand, is a rescaled sum of weighted squared differences. The factor 2 is included to center the expected value of the statistic under the null hypothesis of no spatial autocorrelation on the value of 1 (not zero). Statistics smaller than one, indicating a small difference between an observation and its neighbors, suggest positive spatial autocorrelation. Statistics larger than one suggest negative spatial autocorrelation (large differences between an observation and its neighbors).

Geary’s c statistic is reminiscent of the pairwise squared deviation measure that underlies the empirical semi-variogram in geostatistics. However, there are two important differences. First, in the semi-variogram, all pairwise differences are considered, which results in $n(n-1)/2$ estimates. In Geary’s c statistic, the difference between an observation and its neighbors is summarized as a weighted average for each location (roughly $n\bar{k}$ comparisons, with $\bar{k}$ as the average number of neighbors), and these summaries are combined into a single statistic. Second, whereas in the variogram the squared difference measure is sorted by the distance that separates the observation pairs, in Geary’s c the neighbors are pre-defined through the spatial weights. In sum, the variogram focuses on all pairs of observations, whereas Geary’s c summarizes the differences between each individual observation and its neighbors. Both approaches take the same perspective in the sense that small values of the statistic suggest similarity, or positive spatial autocorrelation, with large values of the statistic suggesting the reverse.

A local version of Geary’s c was outlined in Anselin (1995) as:

$$c_i = \sum_j w_{ij} (x_i - x_j)^2. \tag{3}$$

Since the mean cancels out of the differences $x_i - x_j$, and standardization only rescales every $c_i$ by the same constant, it makes no difference for interpretation or inference whether the variable is expressed on the original scale or in standardized form, although in a multivariate setting the latter is the preferred practice. Also, a number of variants of this statistic can be defined, depending on which of the scaling constants are included. For example, an alternative form, also given in Anselin (1995) and further investigated by Sokal et al. (1998), includes a consistent estimate for the variance as a scaling factor:

$$c_i = (1/m_2) \sum_j w_{ij} (x_i - x_j)^2, \tag{4}$$

¹ By convention, $w_{ii} = 0$, so that self-neighbors are excluded.

where $m_2 = \sum_i (x_i - \bar{x})^2 / n$.²

The inclusion of the scaling factor only results in a monotone transformation of the value in Equation 3, so it is easier to keep the simplest formulation. This expression is also the only aspect of the global Geary c that changes with each observation $i$, since both the denominator (the variance) and $S_0$ are constants.
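
To make Equations 1 and 3 concrete, here is a minimal NumPy sketch (our illustration, not code from the paper or from GeoDa). It assumes a dense weights matrix `W` with a zero diagonal; the helper names `local_geary`, `global_geary` and `line_weights`, as well as the toy data, are hypothetical. It checks the point just made: the global c equals the sum of the $c_i$ multiplied by a constant that does not depend on $i$.

```python
import numpy as np


def local_geary(x, W):
    """Unscaled Local Geary c_i = sum_j w_ij (x_i - x_j)^2 (Equation 3)."""
    x = np.asarray(x, dtype=float)
    diff_sq = (x[:, None] - x[None, :]) ** 2   # (x_i - x_j)^2 for every pair
    return (W * diff_sq).sum(axis=1)


def global_geary(x, W):
    """Global Geary c (Equation 1)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    s0 = W.sum()                                # S_0: sum of all weights
    num = (n - 1) * (W * (x[:, None] - x[None, :]) ** 2).sum()
    den = 2.0 * s0 * ((x - x.mean()) ** 2).sum()
    return num / den


def line_weights(n):
    """Hypothetical toy layout: n units on a line, row-standardized."""
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = 1.0                       # right-hand neighbor
    W[idx + 1, idx] = 1.0                       # left-hand neighbor
    return W / W.sum(axis=1, keepdims=True)


x = np.array([1.0, 2.0, 2.5, 7.0, 8.0, 8.5])
W = line_weights(x.size)
ci = local_geary(x, W)

# Only the c_i vary by location; n - 1, S_0 and the sum of squares are constants.
scale = (x.size - 1) / (2.0 * W.sum() * ((x - x.mean()) ** 2).sum())
print(np.isclose(global_geary(x, W), scale * ci.sum()))  # True
```

Because the toy values form two smooth groups along the line, the global c for this example falls well below 1, i.e., on the positive autocorrelation side.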

The analytical moments for the Local Geary $c_i$ were given in Sokal et al. (1998, p. 353), using the expression in Equation 4. More specifically, the expected value for the Local Geary under a randomization approach is shown to be:

$$E[c_i] = 2 n w_i / (n-1), \tag{5}$$

where $w_i$ is the sum of the weights in row $i$, i.e., $\sum_j w_{ij}$. Some straightforward algebraic manipulations yield the expected value of the expression in Equation 3 as:

$$E[c_i] = 2 n w_i m_2 / (n-1). \tag{6}$$

In the case of row-standardized weights, $w_i = 1$. Furthermore, with standardized $z_i$, $E[m_2] = (n-1)/n$. Substituting these results in the expression for the expected value yields:

$$\begin{align}
E[c_i] &= 2n(n-1)/[(n-1)n] \tag{7} \\
&= 2 \tag{8}
\end{align}$$
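
A small numerical check of Equations 6 to 8 (our illustration; the seed and simulated data are arbitrary). Under full randomization, each neighbor term $(z_i - z_j)^2$ has as its expectation the mean of $(z_a - z_b)^2$ over all ordered pairs of distinct observations, so with row-standardized weights $E[c_i]$ equals that pair mean. Standardizing with the unbiased variance estimate makes $m_2$ equal to $(n-1)/n$ exactly.

```python
import numpy as np

rng = np.random.default_rng(12345)           # arbitrary seed and data
x = rng.normal(loc=10.0, scale=3.0, size=20)
n = x.size
z = (x - x.mean()) / x.std(ddof=1)            # standardized with the unbiased variance
m2 = (z ** 2).sum() / n                        # consistent variance estimate m_2

# Under full randomization, each neighbor term (z_i - z_j)^2 has the expectation
# of (z_a - z_b)^2 averaged over all ordered pairs of distinct observations.
diff_sq = (z[:, None] - z[None, :]) ** 2
pair_mean = diff_sq.sum() / (n * (n - 1))      # the diagonal terms are zero

print(np.isclose(m2, (n - 1) / n))                     # True
print(np.isclose(pair_mean, 2 * n * m2 / (n - 1)))     # True: Equation 6 with w_i = 1
print(np.isclose(pair_mean, 2.0))                      # True: Equation 8
```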

With expressions for the expected value and the variance in hand, an asymptotic approximation can be developed, as shown in Sokal et al. (1998). However, these same authors also cautioned that asymptotic inference based on these moments tends to fail. Instead, the approach taken in practice is to use conditional permutation, as outlined in Anselin (1995). This consists of creating a reference distribution for each individual location by randomly permuting the remaining values (i.e., all observations except the value at location $i$) and recomputing the statistic each time. Inference can then be based on a pseudo p-value of a one-sided test computed from the number of replicated statistics that are more extreme (either larger or smaller) than the observed local statistic. As is well known, the resulting pseudo p-values should be interpreted with caution, since they suffer from multiple comparisons, the potential biasing effect of global autocorrelation, and other such complicating factors (see, among others, the discussion in Ord and Getis, 2001; de Castro and Singer, 2006; Rogerson, 2015).
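
The conditional permutation procedure can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the GeoDa or PySAL implementation: the helper names and toy data are hypothetical, the weights are a dense row-standardized matrix, and the pseudo p-value uses the common convention (number of replicates at least as extreme + 1)/(permutations + 1), counted in the direction (smaller or larger) in which the observed $c_i$ lies relative to the mean of its reference distribution.

```python
import numpy as np


def line_weights(n):
    """Hypothetical toy layout: n units on a line, row-standardized."""
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = 1.0
    W[idx + 1, idx] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def local_geary_perm(x, W, permutations=999, seed=None):
    """Local Geary c_i (Equation 3) with conditional permutation pseudo p-values."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.size
    c_obs = (W * (x[:, None] - x[None, :]) ** 2).sum(axis=1)
    p_values = np.empty(n)
    for i in range(n):
        others = np.delete(np.arange(n), i)       # every location except i
        w_i = W[i, others]                        # weights of i on those locations
        ref = np.empty(permutations)
        for r in range(permutations):
            # x_i stays at i; the remaining n - 1 values are reshuffled
            shuffled = x[rng.permutation(others)]
            ref[r] = (w_i * (x[i] - shuffled) ** 2).sum()
        # One-sided test in the direction of the observed value
        if c_obs[i] <= ref.mean():
            extreme = (ref <= c_obs[i]).sum()     # as small or smaller
        else:
            extreme = (ref >= c_obs[i]).sum()     # as large or larger
        p_values[i] = (extreme + 1) / (permutations + 1)
    return c_obs, p_values


x = np.array([0.1, 0.3, 0.2, 0.4, 5.0, 5.2, 4.9, 0.0, 0.2, 0.1])
W = line_weights(x.size)
c, p = local_geary_perm(x, W, permutations=999, seed=123456789)
print(np.round(c, 3))
print(p)
```

Fixing the seed makes the pseudo p-values reproducible, which is the same reason the empirical illustration uses a common random seed; with 999 permutations the smallest attainable pseudo p-value is 0.001.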

However, even with these caveats in mind, the interpretation of a “significant” Local Geary statistic is not straightforward. This contrasts with both the Getis-Ord $G_i$ and $G_i^*$ statistics as well as the Local Moran. For the former, a positive and significant value indicates a hot spot or cluster of high values, whereas a negative and significant value suggests a cold spot, or cluster of low values (Getis and Ord, 1992, 1996; Ord and Getis, 1995). The interpretation of the Local Moran is facilitated in conjunction with the quadrants of the Moran scatter plot and suggests spatial clusters (high-high and low-low) as well as spatial outliers (high-low and low-high) (Anselin, 1995, 1996).

² Note how this is a consistent estimator for the variance, but not an unbiased one. The unbiased estimator used in the expression for the global Geary c divides the sum of squared deviations by $n - 1$.

A significant $c_i$ statistic that is less than its expected value under the null hypothesis (either the analytical value, or the average of the empirical reference distribution in a permutation approach) suggests a clustering of similar values. Unlike the Moran scatter plot, however, the Local Geary does not unambiguously differentiate the type of association. Similar neighbors could be either similar high values (the counterpart of high-high in the Local Moran case) or similar low values (the counterpart of low-low in the Local Moran case). However, they could also result from two data points that span the mean (e.g., one above the mean and one below) but that are very close together in value. So, unlike the Local Moran case, where there is a clear categorization of the results, for the Local Geary statistic we can distinguish three cases of positive local spatial autocorrelation: high-high, low-low, and other. For negative local spatial autocorrelation, the Local Geary statistic simply indicates a large (larger than under spatial randomness) difference between neighboring values, without suggesting a particular high-low or low-high pattern (because a squared difference is used as the criterion for attribute similarity, this distinction is not possible).
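
The three-way split of positive association can be sketched as a labeling rule. This is our hypothetical rule, not a documented GeoDa algorithm: a significant location with $c_i$ below its expected value is labeled high-high when both $z_i$ and its spatial lag $\sum_j w_{ij} z_j$ are positive, low-low when both are negative, and "other positive" otherwise; a significant location with $c_i$ above its expected value is labeled negative. The default `expected=2.0` assumes standardized $z$ and row-standardized weights, and the statistics and p-values in the example are made up.

```python
import numpy as np


def classify_local_geary(z, W, c, p, expected=2.0, alpha=0.05):
    """Hypothetical labeling rule for Local Geary results (see lead-in)."""
    z = np.asarray(z, dtype=float)
    lag = W @ z                                     # spatial lag sum_j w_ij z_j
    sig = np.asarray(p) < alpha
    c = np.asarray(c, dtype=float)
    positive = sig & (c < expected)                 # neighbors more alike than expected
    high_high = positive & (z > 0) & (lag > 0)
    low_low = positive & (z < 0) & (lag < 0)
    labels = np.full(z.size, "not significant", dtype=object)
    labels[positive] = "other positive"             # default for positive association
    labels[high_high] = "high-high"
    labels[low_low] = "low-low"
    labels[sig & (c > expected)] = "negative"       # neighbors less alike than expected
    return labels


W = np.array([
    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0, 1.0, 0.0],
])
z = np.array([1.0, 1.0, -0.1, 0.1, -1.0, -1.0])
c = np.array([0.2, 0.3, 0.5, 0.5, 0.3, 5.0])        # made-up statistics
p = np.array([0.01, 0.01, 0.01, 0.2, 0.01, 0.01])  # made-up pseudo p-values
print(list(classify_local_geary(z, W, c, p)))
```

The third location illustrates the "other" case: its value is slightly below the mean while its neighbors are above it on average, yet the values are close, so the association is positive without being high-high or low-low. No high-low or low-high split is attempted for the negative case.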

3 A Multivariate Extension

3.1 A Multivariate Local Geary Statistic

The Local Geary $c_i$ is a univariate statistic. In essence, it measures the squared distance in attribute space (i.e., along a line for the univariate case) between the value at a geographic location and that at each neighboring location (in geographic space), and summarizes this in the form of a weighted sum.³ In practice, since the spatial weights are typically row-standardized, this boils down to a weighted average of the squared distances in attribute space between an observation and its geographic neighbors (as defined by a spatial weights matrix).

This concept can be extended in a straightforward manner to a multivariate context. For example, consider two variables, $z_1$ and $z_2$. Following standard practice in multivariate clustering analysis, these variables have been standardized such that the mean of the transformed variable is zero and its variance is one. The squared distance $d_{ij}^2$ in two-dimensional attribute space between the values at observation $i$ and its geographic neighbor $j$ is:

$$d_{ij}^2 = (z_{1,i} - z_{1,j})^2 + (z_{2,i} - z_{2,j})^2 \tag{9}$$

A weighted average of this expression incorporating the squared distance in two-dimensional attribute space between location $i$ and all its geographic neighbors is then:

$$\begin{align}
\sum_j w_{ij} d_{ij}^2 &= \sum_j w_{ij} \left[ (z_{1,i} - z_{1,j})^2 + (z_{2,i} - z_{2,j})^2 \right] \tag{10} \\
&= \sum_j w_{ij} (z_{1,i} - z_{1,j})^2 + \sum_j w_{ij} (z_{2,i} - z_{2,j})^2 \tag{11} \\
&= c_{1,i} + c_{2,i} \tag{12}
\end{align}$$

In other words, the concept of a Local Geary statistic is additive in the attribute dimension. In general then, for $k$ attributes, a multivariate Local Geary can be defined as:

$$c_{k,i} = \sum_{v=1}^{k} c_{v,i}, \tag{13}$$

with $c_{v,i}$ as the Local Geary statistic for variable $v$. This measure corresponds to a weighted average of the squared distances in multidimensional attribute space between the values observed at a given geographic location $i$ and those at its geographic neighbors. As an alternative to the simple sum in Equation 13, the average could be taken, which would keep the scale of the multivariate measure in line with the univariate measures:

$$c_{k,i} = \sum_{v=1}^{k} c_{v,i} / k. \tag{14}$$

³ Note that the squared distance is used to keep the similarity with the original formulation of Geary’s c, but the distance itself, i.e., the square root of this expression, could be used instead.

The expected value of the multivariate statistic under the randomization null hypothesis follows as a direct extension of the univariate case given above. For the expression in Equation 14, this remains $E[c_{k,i}] = 2$, and for the unscaled version in Equation 13, $E[c_{k,i}] = 2k$. However, unlike the univariate case, the derivation of the variance of the multivariate counterpart is quite complex and not analytically tractable, since the general variance-covariance structure among the variables needs to be accounted for (in addition to their spatial correlation). Given the poor results of an asymptotic approximation reported in the literature for the univariate case, the more practical way to obtain inference is again conditional permutation.
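
The contrast between the two moments follows from a standard identity for sums of random variables (our addition, not a derivation taken from the paper). For the sum form in Equation 13, the expectation is additive, so each univariate term contributes 2 (standardized variables, row-standardized weights), whereas the variance also involves every pairwise covariance between the univariate statistics, and these covariances depend on how the variables are correlated:

$$E[c_{k,i}] = \sum_{v=1}^{k} E[c_{v,i}] = 2k, \qquad \operatorname{Var}[c_{k,i}] = \sum_{v=1}^{k} \operatorname{Var}[c_{v,i}] + 2 \sum_{v=1}^{k-1} \sum_{u=v+1}^{k} \operatorname{Cov}[c_{v,i}, c_{u,i}]$$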

3.2 Inference and Interpretation

A conditional permutation approach consists of holding the tuple of values observed at $i$ fixed, and computing the statistic from permutations of the remaining tuples over the other locations. This results in an empirical reference distribution that provides a computational approximation to the distribution of the statistic under the null. The resulting pseudo p-value corresponds to the fraction of statistics in the empirical reference distribution that are equal to or more extreme than the observed statistic.
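
The multivariate conditional permutation differs from the univariate one in one respect: entire rows (tuples) of the data are permuted, so the correlation among the variables is preserved in every replicate. The NumPy sketch below assumes the sum form of Equation 13, a dense row-standardized `W`, and the common pseudo p-value convention (replicates at least as extreme + 1)/(permutations + 1), counted in the direction of the observed value; function names, seed and data are hypothetical. The last lines divide the target p-value by the number of variables, following the guidance given below.

```python
import numpy as np


def line_weights(n):
    """Hypothetical toy layout: n units on a line, row-standardized."""
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = 1.0
    W[idx + 1, idx] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def multivariate_local_geary_perm(Z, W, permutations=999, seed=None):
    """Sum-form multivariate Local Geary (Equation 13) with conditional permutation."""
    rng = np.random.default_rng(seed)
    Z = np.asarray(Z, dtype=float)
    n = Z.shape[0]
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)   # d_ij^2
    c_obs = (W * d2).sum(axis=1)
    p_values = np.empty(n)
    for i in range(n):
        others = np.delete(np.arange(n), i)       # every location except i
        w_i = W[i, others]
        ref = np.empty(permutations)
        for r in range(permutations):
            rows = Z[rng.permutation(others)]     # whole k-tuples move together
            ref[r] = (w_i * ((Z[i] - rows) ** 2).sum(axis=1)).sum()
        if c_obs[i] <= ref.mean():
            extreme = (ref <= c_obs[i]).sum()
        else:
            extreme = (ref >= c_obs[i]).sum()
        p_values[i] = (extreme + 1) / (permutations + 1)
    return c_obs, p_values


rng = np.random.default_rng(7)
X = rng.normal(size=(12, 2))
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
W = line_weights(12)
c, p = multivariate_local_geary_perm(Z, W, permutations=999, seed=123456789)
alpha = 0.05 / Z.shape[1]      # target p-value divided by the number of variables
significant = p < alpha
print(np.round(c, 3))
print(np.flatnonzero(significant))
```

Permuting the variables independently of each other would destroy their correlation and answer a different null hypothesis; moving whole rows keeps the attribute profile of each observation intact and only breaks its link to a location.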

Such an approach suffers from the usual problems of multiple comparisons mentioned for the univariate case. Also, since the multivariate statistic is a sum of the statistics for the univariate cases, there is an additional complication of correlated tests. In some initial experiments, there was some evidence that the number of extreme cases increased with the number of variables included in the statistic. To take this into account, dividing the target p-value by the number of variables seems to provide reasonable guidance and prevents too many locations from being designated as significant. However, as in the univariate case, these pseudo p-values should only be taken as providing some indication of interesting locations in a data exploration exercise, and they should not be interpreted in a strict sense. In practice, some sensitivity analysis is therefore in order.
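
Applied to a nominal level of 0.05, this Bonferroni-style division by the number of variables $k$ gives the thresholds used later in the empirical illustration for the bivariate and the six-variable cases:

$$\alpha^{*} = \frac{\alpha}{k}: \qquad k = 2:\ \frac{0.05}{2} = 0.025, \qquad k = 6:\ \frac{0.05}{6} \approx 0.0083$$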

As for the univariate Local Geary, the interpretation of a location with a “significant” statistic (in the limited sense outlined above) is more complex than for the Local Moran or Getis-Ord statistics. Since multiple variables are involved, the notion of a hot spot or cold spot is not necessarily meaningful. In low-dimensional comparisons, such as in a bivariate case, it is possible to construct a cross-classification of whether each individual variable is above or below the mean relative to its neighbors, but for higher dimensions this quickly becomes unwieldy, resulting in many cells with zero elements. In an interactive exploration environment such as GeoDa (Anselin et al., 2006), it is possible to combine a cluster map of the locations with a significant multivariate Geary statistic with a brushing and linking operation on the respective quadrants of the univariate Moran scatter plots, yielding some insight into the combinations involved. Overall, however, the statistic indicates a combination of the notion of distance in multi-attribute space with that of geographic neighbors. In a broader sense, this is similar to the trade-off encountered in spatially constrained multivariate clustering methods (for a recent discussion, see, for example, Grubesic et al., 2014).
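
To see why the cross-classification becomes unwieldy, suppose (our interpretation) that each variable is coded by its Moran scatter plot quadrant, i.e., whether $z_{v,i}$ and its spatial lag are above or below the mean of zero. With $k$ variables there are $4^k$ possible profiles: 16 for two variables, but 4,096 for six, far more than the 85 départements, so most cells are necessarily empty. The sketch uses random placeholder data and a hypothetical line-shaped neighbor structure; a lag of exactly zero is coded as low.

```python
import numpy as np


def line_weights(n):
    """Hypothetical toy layout: n units on a line, row-standardized."""
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = 1.0
    W[idx + 1, idx] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def quadrant_profiles(Z, W):
    """Per location, the tuple of Moran scatter plot quadrants ('HH', 'HL', ...)."""
    lag = W @ Z                                   # spatial lags, shape (n, k)
    own = np.where(Z > 0, "H", "L")               # value above/below the mean of 0
    nbr = np.where(lag > 0, "H", "L")             # lag above/below 0
    quads = np.char.add(own, nbr)
    return [tuple(row) for row in quads.tolist()]


rng = np.random.default_rng(1)
n = 85
W = line_weights(n)
results = {}
for k in (2, 6):
    X = rng.normal(size=(n, k))
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    possible = 4 ** k                             # cells in the cross-classification
    occupied = len(set(quadrant_profiles(Z, W)))  # cells with at least one location
    results[k] = (possible, occupied)
    print(f"k={k}: {possible} cells, {occupied} occupied, {possible - occupied} empty")
```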

Finally, it is important to keep in mind that, even though the multivariate statistic is the sum of the univariate statistics, locations that are significant for the univariate statistics are not necessarily significant for the multivariate statistic.

4 Empirical Illustration

The Local Geary statistics are illustrated by means of the classic data set with “moral statistics of France,” attributed to an 1833 essay by André-Michel Guerry. The data set consists of a collection of observations for 86 French départements on a range of social indicators, including crime, literacy, child mortality, etc. The data is contained in the R package Guerry, developed by Michael Friendly and Stéphane Dray, available at https://CRAN.R-project.org/package=Guerry. The analysis by Guerry was pioneering in that it combined aspects of multivariate statistical analysis with geographic visualization (see Friendly, 2007, for a detailed description and examples of multivariate analyses). More recently, the data was re-analyzed in Dray and Jombart (2011), who illustrated new concepts of multivariate global spatial autocorrelation, based on the inclusion of spatially lagged variables in what they refer to as co-inertia analysis (see also Dray et al., 2008).

The same six variables are considered as in Dray and Jombart (2011): Crimes Against Persons, Crimes Against Property, Literacy, Donations, Infant Mortality, and Suicides. All variables are expressed such that larger values are “better.” For example, rather than expressing Crimes Against Persons as the usual crime rate consisting of the ratio of crimes over population, the reverse is used, i.e., the ratio of population over crime. This operation is applied to the two Crime variables, to Infant Mortality and to Suicides. In the analysis that follows, all variables are also standardized, such that their mean equals zero and their variance equals one. Finally, as in Dray and Jombart (2011), Corsica (an island) is removed from the data, which results in a final set of 85 observations.
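
The preprocessing just described can be sketched as follows. This is a hypothetical helper, not code from the paper or the Guerry package, and no column names of the actual data are assumed: `invert` flags columns stored as rates for which larger values are worse (population over crime is the reciprocal of crimes over population), `drop` lists rows to exclude (as with Corsica), and the standardization assumes the unbiased variance estimator.

```python
import numpy as np


def prepare(raw, invert, drop=()):
    """Hypothetical preprocessing helper.

    raw    : (n, k) array, one column per variable
    invert : k booleans, True where the column is a rate for which larger is
             worse; its reciprocal (e.g. population per crime) is used instead
    drop   : row indices to exclude before standardizing
    Returns columns with mean 0 and variance 1 (unbiased estimator, ddof=1).
    """
    X = np.delete(np.asarray(raw, dtype=float), np.asarray(drop, dtype=int), axis=0)
    inv = np.asarray(invert, dtype=bool)
    X[:, inv] = 1.0 / X[:, inv]                   # reverse direction: larger is better
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


raw = np.array([[2.0, 10.0], [4.0, 20.0], [8.0, 30.0], [5.0, 25.0]])
Z = prepare(raw, invert=[True, False], drop=[3])
print(Z)
```

Dropping the row before standardizing matters: the mean and variance are then computed over the retained observations only, so the standardized columns have exactly mean zero and variance one for the analysis sample.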

A summary overview of the spatial distribution of the data is given in Figure 1, which contains a box map for each of the six variables. A box map is a variant of a quartile map, with two extra categories to highlight outliers, similar to the visualization contained in a box plot (Anselin, 1994).⁴ For each of the variables, there seems to be a high degree of regional grouping, suggesting positive global spatial autocorrelation (this is confirmed by the analysis of Moran’s I in Dray and Jombart, 2011). However, this spatial correlation is not matched by a similarly strong bivariate correlation between the variables, as shown in Table 1. In fact, none of the correlations is particularly high, with the largest value 0.523, between Property Crime and Suicides. Literacy turns out to be negatively correlated with all the other variables. Several of the bivariate relationships are very weak, such as the correlation between Crime Against Persons and Literacy (−0.021) as well as with Infant Mortality (−0.027), and between Donations and Suicides (−0.035).
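
A box map classification can be sketched as a simple rule. This is our hypothetical helper: it assumes NumPy's default percentile interpolation for the quartiles and box-plot fences at 1.5 times the interquartile range beyond the first and third quartiles; other software may compute quartiles slightly differently or use a different hinge. Categories 1 to 4 are the quartile groups, and 0 and 5 flag the lower and upper outliers.

```python
import numpy as np


def box_map_categories(x, hinge=1.5):
    """Box-map classes: 0 lower outlier, 1-4 quartile groups, 5 upper outlier."""
    x = np.asarray(x, dtype=float)
    q1, q2, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    lower, upper = q1 - hinge * iqr, q3 + hinge * iqr   # box-plot fences
    cats = np.digitize(x, [q1, q2, q3]) + 1             # 1..4 for the quartile groups
    cats[x < lower] = 0
    cats[x > upper] = 5
    return cats


x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype=float)
print(box_map_categories(x))  # [1 1 1 2 2 3 3 4 4 5]
```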

[Figure 1 about here.]

[Table 1 about here.]

First, in order to put the Local Geary statistics into context, in Figure 2, a local cluster map for each variable is depicted based on the familiar Local Moran statistic.⁵ The four colors in the map correspond to two types of spatial clusters (high-high as red, and low-low as blue), and two types of spatial outliers (high-low as light rose, and low-high as light blue).⁶ The cluster maps highlight regional differences between the variables. For example, Crime Against Property and Suicides have low-low clusters (i.e., high crime and suicides) in the north of the country, whereas the low-low clusters for Crime Against Persons and Donations are in the south. The cluster map for Literacy clearly confirms the split between la France éclairée (North-East) and la France obscure also highlighted in Figure 4 of Dray and Jombart (2011).

[Figure 2 about here.]

⁴ Note that the higher values for the variables labeled with “(inv)” denote “better outcomes.” For example, the upper outliers for Crimes Against Property indicate départements with an extremely low crime rate. This is also the case for Infant Mortality and Suicides. In contrast, for Donations, the high outliers indicate départements with higher levels of donations.

⁵ Significance is based on p < 0.05 from a conditional permutation approach with 999 permutations that uses the same random seed to allow for replication.

⁶ For details, see Anselin et al. (2006).

The corresponding cluster maps for the Local Geary statistic are shown in Figure 3, with significance set at 0.05.⁷ Like the Local Moran cluster map, this cluster map also has four colors for the significant locations, but three of those pertain to positive local spatial autocorrelation: dark brown for high-high clusters, light brown for low-low clusters, and even lighter brown for positive local autocorrelation that could not be assigned to a low or high category (in the Moran scatter plot sense). Negative local spatial autocorrelation is depicted in light blue.

[Figure 3 about here.]

[Table 2 about here.]

Overall, the maps show a close correspondence with those for the Local Moran, although they are by no means identical. This suggests that the Local Geary provides additional insight into local patterns due to the use of a different criterion for attribute similarity.⁸ This is further elaborated in Table 2, which shows the number of significant locations as well as their make-up in terms of positive and negative autocorrelation for both univariate Local Moran and Local Geary maps. For three of the variables (Crimes Against Property, Donations and Suicides), the total number of significant locations is roughly the same. For Literacy (47 for Geary vs 37 for Moran), Infant Mortality (23 for Geary vs 16 for Moran) and Crime Against Persons (20 for Geary vs 26 for Moran), the counts differ somewhat. The breakdown between positive and negative autocorrelation is very similar as well, with a tendency for slightly fewer negative autocorrelations in Geary (except for Infant Mortality). For Crime Against Persons and Literacy, the Local Geary does not identify any negative spatial autocorrelation (for Literacy, this is also the case for the Local Moran).

While the overall number of significant locations is roughly the same, the particular locations that are significant show much less correspondence. In Table 3, the number of locations that are significant for both local statistics is listed for each variable, as well as their make-up between positive and negative autocorrelation. The closest match is for Literacy, where almost all the locations identified in the Local Moran cluster map are also significant in the Local Geary cluster map (there is no negative spatial autocorrelation for this variable). In some sense, this is to be expected, given the clear spatial stratification for this variable, well documented in the literature (see Dray and Jombart, 2011). For the other variables, the match for the significant locations is less than perfect, with roughly a third to a half of the significant locations being the same in both maps.

[Table 3 about here.]

Moving on to the bivariate case, the variables Crime Against Persons and Literacy are taken as an example (both lack negative spatial autocorrelation for the univariate Local Geary). The locations that are significant in both univariate Local Geary cluster maps for these variables are shown in Figure 4. There are 8 such locations. Compare this to the cluster map for the bivariate Local Geary measure shown in Figure 5.⁹ All locations that are significant in both univariate measures are also significant for the bivariate measure. However, the bivariate measure clearly goes beyond the simple matching of significant univariate locations and contains 38 significant locations. The resulting bivariate map thus reflects a trade-off between the similarity for each of the variables, in the sense that a location does not need to be highly similar to its neighbors on both variables at once.

⁷ To facilitate comparison with the Local Moran maps as well as between the different variables, the same random seed and the same number of permutations (999) are used in all procedures.

⁸ Note that in practice, the Local Moran and the Getis-Ord local cluster maps tend to be near identical, with the only differences due to the lack of spatial outliers in Getis-Ord.

⁹ Again, the same random seed is used as before. Significance is defined as p < 0.025 to account for the correlated tests in the bivariate case.

[Figure 4 about here.]

[Figure 5 about here.]

Finally, Figure 6 shows the significant locations for the multivariate Local Geary statistic computed for all six variables, with a modern map as backdrop.¹⁰ There are 34 significant locations, organized in two major groups and three smaller ones. The first major group has an L shape, starting in Brittany and moving east towards Orléans, then turning north all the way to the border with Belgium. Interestingly, Paris (the département Seine) is not a significant location in this group. The second larger grouping consists of départements in the south-west, surrounding (but not including) Toulouse (the département Haute-Garonne). Of the three smaller groups, two are single départements, Bas-Rhin in the upper north-east around Strasbourg and Haute-Loire, and the third is a three-département cluster in the south-east, in the Provence area around Toulon (Basses-Alpes, Var and Vaucluse). It should be kept in mind that the locations identified are significant in the sense that their multivariate profile is similar to that of their neighbors. A more meaningful interpretation of the spatial imprint of the “clusters” should therefore include the neighbors as well.

[Figure 6 about here.]

Overall, the multivariate measure brings out patterns that are not obvious in its univariate counterparts. Even taking all the univariate maps together, no location is significant for all six variables. In fact, four is the highest number of variables that share a significant location in the univariate Local Geary. This is the case for 8 observations (15 share three variables, 31 two, and 25 one; 6 locations are not significant for any variable). Again, this emphasizes the point that the multivariate measure of attribute similarity is not a simple extrapolation of the univariate measures, but it involves complex trade-offs in all attribute dimensions considered.
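
The tally given above (8 + 15 + 31 + 25 + 6 = 85 locations) can be reproduced from a location-by-variable indicator matrix of univariate significance. The helper below is hypothetical, and the small matrix in the example is made up; it is not the Guerry result.

```python
import numpy as np


def tally_significance(sig):
    """counts[m] = number of locations significant for exactly m variables."""
    sig = np.asarray(sig, dtype=bool)
    return np.bincount(sig.sum(axis=1), minlength=sig.shape[1] + 1)


# Made-up indicator matrix: 5 locations (rows) by 6 variables (columns).
sig = np.array([
    [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
    [1, 0, 1, 0, 0, 0],
])
counts = tally_significance(sig)
print(counts)  # [1 1 2 0 1 0 0]
```

Since every location falls in exactly one class, the counts always add up to the number of locations, which is a quick consistency check on a table such as the one summarized in the text.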

5 Conclusion

The transition from a univariate setting to a multivariate context brings interesting challenges to the construction of measures of local spatial autocorrelation. Such statistics represent a mathematical compromise between a measure of attribute similarity and an indication of locational similarity. In one dimension, there are several candidates for attribute similarity, the most commonly used one being the cross product (e.g., in Moran's I). In a multivariate setting, it seems more intuitive to use a concept related to the distance in attribute space (i.e., squared differences) between a point (an observation) and the points that correspond to its geographic neighbors. As is well known from the literature on contiguity-constrained spatial clustering, neighbors in attribute space are not necessarily also neighbors in geographic space.
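The contrast between the two notions of attribute similarity can be made explicit for a single standardized variable z with spatial weights w_ij (notation assumed here; it may differ from the earlier sections, and the Local Moran is shown up to a scaling constant). Expanding the squared difference shows that the Local Geary contains the Local Moran cross product as one of its terms, alongside two terms in the squared values, and that with k variables the squared difference becomes a squared Euclidean distance in attribute space:

```latex
\begin{aligned}
I_i &= z_i \sum_j w_{ij} z_j, \\
c_i &= \sum_j w_{ij} (z_i - z_j)^2
     = z_i^2 \sum_j w_{ij} + \sum_j w_{ij} z_j^2 - 2\, z_i \sum_j w_{ij} z_j
     = z_i^2 \sum_j w_{ij} + \sum_j w_{ij} z_j^2 - 2\, I_i, \\
c_{k,i} &= \sum_j w_{ij} \sum_{v=1}^{k} (z_{vi} - z_{vj})^2
         = \sum_j w_{ij} \,\lVert \mathbf{z}_i - \mathbf{z}_j \rVert^2 .
\end{aligned}
```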

The generalization of the Local Geary c statistic to multiple variables is a way to formalize the combination of attribute similarity and locational similarity. It turns out that the statistic is simply the sum of the individual local statistics for each variable. This corresponds to the notion of the average squared distance in multivariate attribute space to the observations that are neighbors in geographic space, as formalized in a spatial weights matrix (with row-standardized weights, the weighted sum is exactly such an average). Combining the different dimensions introduces trade-offs, so that the resulting clusters provide insights that differ from a simple overlay of univariate statistics.
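Both properties stated above, that the multivariate statistic is the sum of the univariate ones and that with row-standardized weights it is an average squared distance to the geographic neighbors, can be checked numerically. A minimal sketch follows, assuming standardized variables, a dense weights matrix, and hypothetical ring-contiguity weights rather than the actual Guerry contiguity structure; the function names are illustrative only.

```python
import numpy as np

def local_geary(z, W):
    """Univariate Local Geary for all i: c_i = sum_j w_ij (z_i - z_j)^2 (dense W)."""
    diff = z[:, None] - z[None, :]            # (n, n) matrix of z_i - z_j
    return (W * diff ** 2).sum(axis=1)

def multivariate_local_geary(Z, W):
    """c_{k,i} = sum_j w_ij ||z_i - z_j||^2 over all k columns of Z (dense W)."""
    diff = Z[:, None, :] - Z[None, :, :]      # (n, n, k) pairwise differences
    sq_dist = (diff ** 2).sum(axis=2)         # (n, n) squared distances in attribute space
    return (W * sq_dist).sum(axis=1)

def ring_weights(n):
    """Hypothetical row-standardized weights: location i neighbors i - 1 and i + 1 on a ring."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1.0
    return W / W.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 3))              # hypothetical raw data
Z = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize each variable
W = ring_weights(10)

c_multi = multivariate_local_geary(Z, W)
c_sum = sum(local_geary(Z[:, v], W) for v in range(Z.shape[1]))
print(np.allclose(c_multi, c_sum))            # True: multivariate = sum of univariate
```

With row-standardized weights each c_{k,i} is the mean squared distance from location i to its neighbors; with binary weights it would be the sum instead, which rescales the statistic but leaves permutation inference at a given location unchanged.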

Further work is needed to assess inference and the power of the test more precisely, and to evaluate computational issues when scaling to the size of data sets encountered in current spatial data science.
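On the computational side, the dense n × n × k difference array used in a direct implementation is only practical for small n. One way to avoid it, sketched here as an assumption rather than as the method of any particular package, expands the square so that only spatial lags are needed: Σ_j w_ij (z_i − z_j)² = z_i² r_i − 2 z_i (Wz)_i + (Wz²)_i, where r_i is the row sum of W. The cost is then proportional to the number of nonzero weights times k, and W may be either a dense NumPy array or a SciPy sparse matrix (e.g. `scipy.sparse.csr_matrix`). The function name is hypothetical.

```python
import numpy as np

def multivariate_local_geary_lagged(Z, W):
    """c_{k,i} from spatial lags only: sum over v of z_vi^2 r_i - 2 z_vi (W z_v)_i + (W z_v^2)_i.

    W can be a dense ndarray or a scipy.sparse matrix; no n x n x k array is formed.
    """
    r = np.asarray(W.sum(axis=1)).ravel()     # row sums r_i (all 1 when W is row-standardized)
    lag = np.asarray(W @ Z)                   # (n, k) spatial lags of each variable
    lag_sq = np.asarray(W @ (Z ** 2))         # (n, k) spatial lags of the squared values
    per_variable = (Z ** 2) * r[:, None] - 2.0 * Z * lag + lag_sq
    return per_variable.sum(axis=1)

# Check against the direct computation on hypothetical data
rng = np.random.default_rng(1)
n, k = 50, 4
Z = rng.standard_normal((n, k))
W = (rng.random((n, n)) < 0.1).astype(float)    # hypothetical random neighbor structure
np.fill_diagonal(W, 0.0)
W[np.arange(n), (np.arange(n) + 1) % n] = 1.0   # ensure every row has at least one neighbor
W /= W.sum(axis=1, keepdims=True)                # row-standardize

brute = (W * ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)).sum(axis=1)
print(np.allclose(multivariate_local_geary_lagged(Z, W), brute))   # True
```

The expanded form can lose precision through cancellation when values are large relative to their differences, so standardizing the variables first, as is usual for these statistics, also helps numerically.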

¹⁰ The significance level used is p < 0.05/6 (approximately 0.0083).

References

Aldstadt, J. and Getis, A. (2006). Using AMOEBA to create a spatial weights matrix and identify spatial clusters. Geographical Analysis, 38:327–343.

Anselin, L. (1994). Exploratory spatial data analysis and geographic information systems. In Painho, M., editor, New Tools for Spatial Analysis, pages 45–54. Eurostat, Luxembourg.

Anselin, L. (1995). Local indicators of spatial association – LISA. Geographical Analysis, 27:93–115.

Anselin, L. (1996). The Moran scatterplot as an ESDA tool to assess local instability in spatial association. In Fischer, M., Scholten, H., and Unwin, D., editors, Spatial Analytical Perspectives on GIS in Environmental and Socio-Economic Sciences, pages 111–125. Taylor and Francis, London.

Anselin, L., Syabri, I., and Kho, Y. (2006). GeoDa, an introduction to spatial data analysis. Geographical Analysis, 38:5–22.

Bivand, R. S. (2006). Implementing spatial data analysis software in R. Geographical Analysis, 38:23–40.

Bivand, R. S., Muller, W., and Reder, M. (2009). Power calculations for global and local Moran's I. Computational Statistics and Data Analysis, 53:2859–2872.

Bivand, R. S., Pebesma, E. J., and Gómez-Rubio, V. (2013). Applied Spatial Data Analysis with R, Second Edition. Springer, New York, NY.

Boots, B. (2003). Developing local measures of spatial association for categorical data. Journal of Geographical Systems, 5:139–160.

Boots, B. (2006). Local configuration measures for categorical spatial data: Binary regular lattices. Journal of Geographical Systems, 8:1–24.

Boots, B. and Okabe, A. (2007). Local statistical spatial analysis: Inventory and prospect. International Journal of Geographical Information Science, 21:355–375.

de Castro, M. C. and Singer, B. H. (2006). Controlling the false discovery rate: An application to account for multiple and dependent tests in local statistics of spatial association. Geographical Analysis, 38:180–208.

Dray, S. and Jombart, T. (2011). Revisiting Guerry's data: Introducing spatial constraints in multivariate analysis. The Annals of Applied Statistics, 5(4):2278–2299.

Dray, S., Saïd, S., and Débias, F. (2008). Spatial ordination of vegetation data using a generalization of Wartenberg's multivariate spatial correlation. Journal of Vegetation Science, 19:45–56.

Fotheringham, A. S. (1997). Trends in quantitative methods I: Stressing the local. Progress in Human Geography, 21:88–96.

Fotheringham, A. S. and Brunsdon, C. (1999). Local forms of spatial analysis. Geographical Analysis, 31:340–358.

Friendly, M. (2007). A.-M. Guerry's Moral Statistics of France: Challenges for multivariable spatial analysis. Statistical Science, 22(3):368–399.

Geary, R. (1954). The contiguity ratio and statistical mapping. The Incorporated Statistician, 5:115–145.

Getis, A. and Aldstadt, J. (2004). Constructing the spatial weights matrix using a local statistic. Geographical Analysis, 36:90–104.

Getis, A. and Ord, J. K. (1992). The analysis of spatial association by use of distance statistics. Geographical Analysis, 24:189–206.

Getis, A. and Ord, J. K. (1996). Local spatial statistics: an overview. In Longley, P. and Batty, M., editors, Spatial Analysis: Modeling in a GIS Environment, pages 261–277. GeoInformation International.

Grubesic, T. H., Wei, R., and Murray, A. T. (2014). Spatial clustering overview and comparison: Accuracy, sensitivity, and computational expense. Annals of the Association of American Geographers, 104:1134–1156.

Hardisty, F. and Klippel, A. (2010). Analysing spatio-temporal autocorrelation with LISTA-Viz. International Journal of Geographical Information Science, 24:1515–1526.

Lee, S.-I. (2001). Developing a bivariate spatial association measure: An integration of Pearson's r and Moran's I. Journal of Geographical Systems, 3:369–385.

Lee, S.-I. (2009). A generalized randomization approach to local measures of spatial association. Geographical Analysis, 41:221–248.

Lloyd, C. D. (2010). Local Models for Spatial Analysis, Second Edition. CRC Press, Boca Raton, FL.

Ord, J. K. and Getis, A. (1995). Local spatial autocorrelation statistics: Distributional issues and an application. Geographical Analysis, 27:286–306.

Ord, J. K. and Getis, A. (2001). Testing for local spatial autocorrelation in the presence of global autocorrelation. Journal of Regional Science, 41:411–432.

Rey, S. J. (2016). Space-time patterns of rank concordance: Local indicators of mobility association with application to spatial income inequality dynamics. Annals of the American Association of Geographers, 106:788–803.

Rey, S. J. and Anselin, L. (2007). PySAL, a Python library of spatial analytical methods. The Review of Regional Studies, 37(1):5–27.

Rogerson, P. A. (2010). Optimal geographic scales for local spatial statistics. Statistical Methods in Medical Research, 20:119–129.

Rogerson, P. A. (2015). Maximum Getis-Ord statistic adjusted for spatially autocorrelated data. Geographical Analysis, 47:20–33.

Rogerson, P. A. and Kedron, P. (2012). Optimal weights for focused tests of clustering using the local Moran statistic. Geographical Analysis, 44:121–133.

Sokal, R. R., Oden, N. L., and Thompson, B. A. (1998). Local spatial autocorrelation in a biological model. Geographical Analysis, 30:331–354.

Tiefelsdorf, M. (2002). The saddlepoint approximation of Moran's I and local Moran's I_i's reference distribution and their numerical evaluation. Geographical Analysis, 34:187–206.

Unwin, A. (1996). Exploratory spatial analysis and local statistics. Computational Statistics, 11:387–400.

Unwin, A. and Unwin, D. (1998). Exploratory spatial data analysis with local statistics. The Statistician, 47(3):415–421.

Wartenberg, D. (1985). Multivariate spatial correlation: A method for exploratory geographical analysis. Geographical Analysis, 17:263–283.

Yamada, I. and Thill, J.-C. (2007). Local indicators of network-constrained clusters in spatial point patterns. Geographical Analysis, 39:268–292.

List of Figures

1 Guerry data box maps
2 Guerry data Local Moran cluster maps
3 Guerry data Univariate Local Geary cluster maps
4 Overlap in univariate Local Geary – Crime and Literacy
5 Bivariate Local Geary Map – Crime and Literacy
6 Multivariate Local Geary Map – All six variables

Figure 1: Guerry data box maps. Panels: Crimes Against Persons (inv), Crime Against Property (inv), Literacy, Donations, Infant Mortality (inv), Suicides (inv); legend: Low Out, Q1, Q2, Q3, Q4, Up Out.

Figure 2: Guerry data Local Moran cluster maps. Panels: Crimes Against Persons (inv), Crime Against Property (inv), Literacy, Donations, Infant Mortality (inv), Suicides (inv); legend: High-High, Low-Low, Low-High, High-Low.

Figure 3: Guerry data Univariate Local Geary cluster maps. Panels: Crimes Against Persons (inv), Crime Against Property (inv), Literacy, Donations, Infant Mortality (inv), Suicides (inv); legend: High-High, Low-Low, Other Pos, Negative.

Figure 4: Overlap in univariate Local Geary – Crime and Literacy

Figure 5: Bivariate Local Geary Map – Crime and Literacy

Figure 6: Multivariate Local Geary Map – All six variables

List of Tables

1 Correlations between the six variables
2 Number of significant locations by type
3 Matching significant locations by type

Table 1: Correlations between the six variables

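| | Crime persons | Crime property | Literacy | Donations | Infant mortality | Suicides |
|---|---|---|---|---|---|---|
| Crime persons | 1.000 | 0.255 | -0.021 | 0.134 | -0.027 | -0.134 |
| Crime property | | 1.000 | -0.363 | -0.082 | 0.278 | 0.523 |
| Literacy | | | 1.000 | -0.196 | -0.412 | -0.374 |
| Donations | | | | 1.000 | 0.159 | -0.035 |
| Infant mortality | | | | | 1.000 | 0.289 |
| Suicides | | | | | | 1.000 |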

Table 2: Number of significant locations by type

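| | Moran All | Moran Pos | Moran Neg | Geary All | Geary Pos | Geary Neg |
|---|---|---|---|---|---|---|
| Crime persons | 26 | 23 | 3 | 20 | 20 | 0 |
| Crime property | 16 | 13 | 3 | 14 | 11 | 3 |
| Literacy | 37 | 37 | 0 | 47 | 47 | 0 |
| Donations | 29 | 25 | 4 | 26 | 24 | 2 |
| Infant mortality | 16 | 13 | 3 | 23 | 19 | 4 |
| Suicides | 32 | 30 | 2 | 34 | 32 | 2 |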

Table 3: Matching significant locations by type

| | All | Pos | Neg |
|---|---|---|---|
| Crime persons | 13 | 13 | 0 |
| Crime property | 6 | 5 | 1 |
| Literacy | 33 | 33 | 0 |
| Donations | 17 | 16 | 1 |
| Infant mortality | 7 | 5 | 2 |
| Suicides | 18 | 17 | 1 |
