METU, GGIT 532
GGIT 538 Spatial Data Analysis
Instructor: Dr. H. Şebnem Düzgün, Room: K4-123
Basic Aim of the Course
To introduce certain spatial statistical concepts and their use in GIS, so that students can apply them in their studies at GGIT.
OUTLINE: Introduction to Spatial Data Analysis
1. Introduction
1.1. Introduction
1.1.1. Scope of spatial statistics
1.2. Spatial versus non-spatial data analysis
1.2.1. Relationship between classes of spatial entities
1.2.2. Facts on attributes of spatial entities
1.3. Types of spatial phenomena and relationships
1.4. Problem types in spatial data analysis
1.4.1. Problems of spatially discrete point data
1.4.2. Problems of spatially continuous point data
1.4.3. Problems of area data
1.4.4. Problems of spatial interaction data
CHAPTER I: Introduction to Spatial Data Analysis
1.1. Introduction
Spatial statistics deals with ways of analyzing all varieties of data in a spatial context. Some examples of the kinds of problems it addresses are:
Seismologists collect data on the regional distribution of earthquakes. Does this distribution show any pattern or predictability over space?
Public health specialists collect data on the occurrence of diseases. Does the distribution of cases of a disease form a pattern in space? Is there some association with possible sources of environmental pollution?
Police wish to investigate whether there is any spatial pattern in the locations of certain crimes. Does the crime rate in particular areas correlate with the socio-economic characteristics of those areas?
Geologists wish to estimate the extent of a mineral deposit over a particular region, given data on borehole samples taken from locations scattered across the area. How can we make sensible estimates?
A groundwater hydrologist collects data on the concentration of a toxic chemical in samples collected from a series of wells. Can we use these samples to construct a regional map of likely contamination?
Retailers wish to use socio-economic data, available for small areas from the population census, to assess the likely demand for their products if they open or expand an outlet. How are we to classify such areas?
The same retailers collect information on movements of shoppers from residential zones to stores. Can we build models of such flows? Can we predict changes in such flows if we expand an outlet or open a new one?
Spatial data analysis is relevant to specialists in many different fields, such as:
Geographers, statisticians, economists, sociologists, epidemiologists, planners, biologists, environmental scientists, earth scientists and engineers.
Scope of Spatial Statistics
General concepts in spatial analysis: spatial versus non-spatial data analysis, problem types, kinds of spatial phenomena and relationships. (Chapter 1)
Review of basic statistics: random variables, expectations, probability distributions, maximum likelihood estimation, stationarity and anisotropy. (Chapter 2)
General concepts in spatial data analysis: visualizing spatial data, exploring spatial data, modeling spatial data. (Chapter 3)
Point pattern analysis: visualizing, exploring and modeling point patterns. (Chapters 4 & 5)
Spatially continuous data analysis: visualizing, exploring and modeling spatially continuous data. (Chapter 6)
Analysis of area data: visualizing, exploring and modeling area data. (Chapter 7)
1.2. Spatial Versus Non-spatial Data Analysis
Spatial data analysis deals with situations where observational data are available on some process operating in space, and methods are sought to describe or explain the behavior of this process and its possible relationship to other spatial phenomena.
The main purposes of the analysis are:
• To increase our basic understanding of the process
• To assess the evidence in favor of various hypotheses concerning it
• To predict values in areas where observations have not been made
Spatial data analysis is involved when the data are spatially located and explicit consideration is given to the possible importance of their spatial arrangement, either in the analysis or in the interpretation of results.
E.g. Consider the relationship between the number of plant species and geographical area for a set of small islands. Empirical evidence suggests that the logarithm of the number of species is linearly related to the logarithm of the island's area.
Reason: as area increases, a wider range of habitats is likely to be available.
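The species-area relationship can be fitted as an ordinary regression on log-transformed variables. The sketch below assumes the form log S = log c + z log A and uses made-up island data generated from c = 10 and z = 0.25; the function name `fit_log_log` is hypothetical. Note that island locations never enter the calculation, which is exactly why this step is not yet spatial data analysis.

```python
import math

def fit_log_log(areas, species):
    """Least-squares fit of log(S) = a + z * log(A); returns (a, z)."""
    xs = [math.log(a) for a in areas]
    ys = [math.log(s) for s in species]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    z = sxy / sxx          # slope on the log-log scale
    a = my - z * mx        # intercept, i.e. log(c)
    return a, z

# Hypothetical islands generated exactly from S = 10 * A**0.25
areas = [1.0, 10.0, 100.0, 1000.0, 10000.0]
species = [10.0 * a ** 0.25 for a in areas]
a, z = fit_log_log(areas, species)
print(f"z = {z:.3f}, c = {math.exp(a):.3f}")
```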
At this stage there is nothing spatial about the analysis: the fact that one of the variables involved (area) is geographical does not by itself make the analysis a spatial one.
However, if we ask whether the isolation of an island, measured by its distance from other islands or from a continental area, is an important factor, this hypothesis falls within the scope of spatial data analysis.
If the basic concern is to analyze the spatial relationship between different classes of entities, the aim is to determine whether there is an association between a set of points and a set of lines, or between a set of points and a set of areas.
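Isolation becomes a spatial variable once it is computed from locations. In this hypothetical sketch each island is reduced to a single centroid and the mainland coast is assumed to be the straight line x = 0; a real analysis would more likely use shoreline-to-shoreline distances.

```python
import math

# Hypothetical island centroids (km); the mainland coast is assumed to be the line x = 0
islands = {"A": (10.0, 0.0), "B": (13.0, 4.0), "C": (40.0, 10.0)}

def nearest_island_distance(name, islands):
    """Centroid-to-centroid distance from one island to the closest other island."""
    here = islands[name]
    return min(math.dist(here, p) for other, p in islands.items() if other != name)

def distance_to_coast(point):
    """Distance to the hypothetical straight coastline x = 0."""
    return abs(point[0])

for name, p in islands.items():
    print(name, round(nearest_island_distance(name, islands), 2), distance_to_coast(p))
```

These two isolation measures could then enter the species-area regression as additional explanatory variables.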
E.g. Testing for an association between the occurrence of mineral deposits (point data) and configurations of geological lineaments (line data).
Testing the hypothesis that there is a link between childhood leukemia (point data) and proximity to high-voltage power lines (line data).
Testing for a relationship between the locations of a set of plants (point data) and soil type (areal units).
Testing for a relationship between the incidence of Alzheimer's disease (point data) and the presence of aluminum in water sampled in a set of water supply zones (areal units).
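Point-versus-line questions such as the first two examples typically start from the distance of each point event to the nearest line feature. The helper names and coordinates below are hypothetical; the sketch computes planar Euclidean distances from points to a polyline. Judging whether such distances are unusually small would require comparison with a reference distribution (for example, randomly placed points), which is not shown here.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:                       # degenerate segment
        return math.dist(p, a)
    # Project p onto the infinite line, then clamp the parameter to the segment
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.dist(p, (ax + t * dx, ay + t * dy))

def point_polyline_distance(p, line):
    """Distance from p to a polyline given as an ordered list of vertices."""
    return min(point_segment_distance(p, line[i], line[i + 1]) for i in range(len(line) - 1))

# Hypothetical lineament and deposit locations
lineament = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
deposits = [(5.0, 2.0), (12.0, 5.0), (-3.0, -4.0)]
dists = [point_polyline_distance(d, lineament) for d in deposits]
print(dists)
```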
Figure 1.1. Locations of rainfall measurement sites in California
E.g. Suppose we wish to model spatial variation in precipitation in California using a set of 30 monitoring stations distributed across the state.
For each of the stations we have recordings of:
• Average annual precipitation (Y)
• Altitude (X1)
• Latitude (X2)
• Distance from the coast (X3)
A standard multiple linear regression model is fitted to the data, and all three independent variables are found to be significant predictors of rainfall, together explaining 60% of its variation. (Non-spatial data analysis)
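The non-spatial step can be sketched as an ordinary least-squares fit with NumPy. The station data below are synthetic stand-ins for the 30 California stations (the coefficients used to generate them are invented), so the fitted values and R² will not reproduce the 60% quoted above; the residuals returned are what would then be mapped.

```python
import numpy as np

def fit_ols(X, y):
    """OLS with an intercept; returns (coefficients, residuals, R^2)."""
    A = np.column_stack([np.ones(len(y)), X])        # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ beta                         # observed minus predicted
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return beta, residuals, 1.0 - ss_res / ss_tot

# Synthetic data for 30 hypothetical stations (NOT the California data set)
rng = np.random.default_rng(0)
n = 30
altitude = rng.uniform(0, 3000, n)        # X1, metres
latitude = rng.uniform(32.5, 42.0, n)     # X2, degrees
coast_dist = rng.uniform(0, 400, n)       # X3, km
precip = (200 + 0.3 * altitude + 40 * (latitude - 32.5) - 0.5 * coast_dist
          + rng.normal(0, 150, n))        # Y, mm per year

X = np.column_stack([altitude, latitude, coast_dist])
beta, residuals, r2 = fit_ols(X, precip)
print(np.round(beta, 3), round(r2, 2))
```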
The residuals (the differences between the observed precipitation at the stations and the values predicted by the regression model) are then mapped to see whether any spatial pattern exists. The map reveals a cluster of negative residuals on the leeward side of the mountains; in other words, the model overpredicts precipitation at these locations.
• This leads the researcher to introduce a new variable that takes the value 1 if the station lies in the lee of the mountains and 0 otherwise. With this variable added to the regression model, the explained variation rises to 74%. (Spatial data analysis)
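The spatial step amounts to adding the lee-side indicator as an extra column. This sketch again uses synthetic data, with an assumed rain-shadow effect built in, and compares R² with and without the dummy variable. Because OLS R² cannot decrease when a regressor is added, the comparison alone says nothing about significance; the residual map is what motivated the variable.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on X (intercept added)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(1)
n = 30
altitude = rng.uniform(0, 3000, n)
latitude = rng.uniform(32.5, 42.0, n)
coast_dist = rng.uniform(0, 400, n)
# Hypothetical dummy: 1 if the station lies in the lee of the mountains, else 0
lee = (rng.uniform(size=n) < 0.3).astype(float)
# Synthetic rain shadow: lee stations are drier than the other predictors suggest
precip = (200 + 0.3 * altitude + 40 * (latitude - 32.5) - 0.5 * coast_dist
          - 400 * lee + rng.normal(0, 100, n))

base = np.column_stack([altitude, latitude, coast_dist])
r2_base = r_squared(base, precip)
r2_lee = r_squared(np.column_stack([base, lee]), precip)
print(round(r2_base, 2), round(r2_lee, 2))
```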
Relationship between Classes of Spatial Entities
Sometimes it is necessary to transform one class of objects into another one.
• Point-to-area transformation: use of Thiessen polygons
• Area-to-point transformation: use of centroids
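Both transformations can be sketched in a few lines, assuming planar coordinates. A location belongs to the Thiessen polygon of its nearest generating point, and an area can be reduced to its centroid with the shoelace formula. The function names are illustrative only; note that for non-convex polygons the centroid can fall outside the area itself.

```python
import math

def nearest_site(point, sites):
    """Index of the generating site whose Thiessen polygon contains `point` (ties go to the first)."""
    return min(range(len(sites)), key=lambda i: math.dist(point, sites[i]))

def polygon_centroid(vertices):
    """Area centroid of a simple polygon (vertices in order) via the shoelace formula."""
    a = cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0      # twice the signed area of the triangle with the origin
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5                           # signed polygon area
    return cx / (6.0 * a), cy / (6.0 * a)

# Hypothetical usage
sites = [(0.0, 0.0), (10.0, 0.0)]
print(nearest_site((3.0, 1.0), sites), nearest_site((8.0, -2.0), sites))
print(polygon_centroid([(0, 0), (4, 0), (4, 2), (0, 2)]))
```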
This notion of "new objects for old" relates to the subject of relations between entities. Such relations can be of many types, for example:
If the basic concern is to analyze the spatial arrangements of points, this involves the measurement of distances between points; distance is a spatial relation.
E.g. Comparing the distribution of a set of disease cases with that of a set of healthy controls, which involves distance measurements.
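One simple distance-based comparison is the mean distance over all pairs of cases versus all pairs of controls. The coordinates below are hypothetical. A noticeably smaller value for cases would hint at clustering beyond that of the population at risk, although a formal test would be needed to judge this.

```python
import math
from itertools import combinations

def mean_interpoint_distance(points):
    """Mean Euclidean distance over all unordered pairs of points."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

# Hypothetical locations (km)
cases = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
controls = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
m_cases = mean_interpoint_distance(cases)
m_controls = mean_interpoint_distance(controls)
print(round(m_cases, 3), round(m_controls, 3))
```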
If the basic concern is to analyze areal data, simple information about spatial adjacency may be of interest. Usually spatial proximity is linked to attribute information: in many cases, the question is whether areas close to each other on the ground have similar values on one or more attributes.
E.g. Do neighboring health districts tend to have similar mortality rates? Do adjacent pixels in a remotely sensed image tend to have similar electromagnetic reflectance?
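Similarity between neighboring areas is commonly summarized with Moran's I, a statistic not defined in this chapter. The sketch assumes a regular grid of cells (such as pixels) with binary rook adjacency, where cells sharing an edge are neighbors. Positive values indicate that neighbors tend to be alike; negative values indicate that they tend to differ.

```python
def rook_neighbors(rows, cols):
    """Binary rook-contiguity neighbors for cells of a rows x cols grid (row-major indices)."""
    nbrs = {}
    for r in range(rows):
        for c in range(cols):
            nbrs[r * cols + c] = [rr * cols + cc
                                  for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                                  if 0 <= rr < rows and 0 <= cc < cols]
    return nbrs

def morans_i(values, nbrs):
    """Moran's I with binary weights: w_ij = 1 if j is a neighbor of i, else 0."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    w_sum = sum(len(js) for js in nbrs.values())                 # total weight W
    num = sum(z[i] * z[j] for i, js in nbrs.items() for j in js)
    den = sum(zi * zi for zi in z)
    return (n / w_sum) * (num / den)

nbrs = rook_neighbors(4, 4)
checkerboard = [1.0 if (i // 4 + i % 4) % 2 == 0 else -1.0 for i in range(16)]
gradient = [float(i // 4) for i in range(16)]    # values increase row by row
print(morans_i(checkerboard, nbrs), morans_i(gradient, nbrs))
```

The checkerboard, where every neighbor differs, gives the extreme negative value, while the smooth gradient gives a positive one.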
Facts on Attributes of Spatial Entities
If attributes are analyzed alone, ignoring the spatial relationships between sample locations, we cannot claim to be doing spatial data analysis. Spatial data analysis requires, as a minimum, information on location, and usually on both location and attributes.
If we wish to study the spatial arrangement or pattern of entities, this is essentially a geometric question, and data on the locations of the entities alone will suffice.
If the aim is to compare the arrangements of different types of entities, or to study spatial pattern in measurements taken at locations, we need both attribute and location information.
1.3. Types of Spatial Phenomena and Relationships
There are different types of spatial phenomena and spatial relationships that may be involved in spatial data analysis. They follow from two basic views of space:
• Entity view of space
• Field view of space
* Entity view: space is considered as being filled with “objects”. The spatial phenomena being analyzed are usually conceptualized as points, lines or areas.
Points: plants, people, shops, soil pits, the epicenters of earthquakes, etc.
Lines: roads, streams, fault lines, etc.
Areas: countries, voting areas, health regions, land covers, etc.
Note that representing objects as points, lines or areas is always scale dependent; for example, a city may be represented as a point on a national map but as an area on a local map.
* Field view: space is considered as being covered with “surfaces”. In this view the emphasis is on the continuity of spatial phenomena. Phenomena in the natural environment, such as temperature, relief, atmospheric pressure, and soil or rock characteristics, can be observed and measured anywhere on the earth's surface. In practice, however, such variables are “discretised”: they are sampled at a set of discrete locations, and these samples are used to represent the continuously varying field.
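Discretisation can be illustrated by sampling a continuous surface at the centers of regular grid cells. The surface function below is purely hypothetical (a smooth sine pattern standing in for, say, relief), and the cell size and grid dimensions are arbitrary choices; the resulting grid of values is the discrete representation that would actually be stored and analyzed.

```python
import math

def field(x, y):
    """Hypothetical continuous surface, defined at every point of the plane."""
    return 100.0 + 10.0 * math.sin(x) * math.cos(y)

def discretise(f, x0, y0, cell, nx, ny):
    """Sample f at the centers of an nx-by-ny grid of square cells; returns a list of rows."""
    return [[f(x0 + (i + 0.5) * cell, y0 + (j + 0.5) * cell) for i in range(nx)]
            for j in range(ny)]

grid = discretise(field, 0.0, 0.0, 0.5, 4, 3)   # 3 rows by 4 columns, cell size 0.5
for row in grid:
    print([round(v, 2) for v in row])
```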
The relation between kinds of spatial phenomena and problem types:
• Entity view: point pattern data and area data
• Field view: spatially continuous data
In the entity view, spatial objects have features or attributes attached to them; in the field view, by contrast, the feature of interest is an attribute that varies continuously over space. Such attributes are measured on one of the classic measurement scales:
Nominal, ordinal, interval / ratio
Entity | Example | Nominal | Ordinal | Interval
Point | Tree | Tree species | Short, medium, long | Age
Line | Stream | Clean or polluted | 1st, 2nd or higher order | Discharge
Area | Land | Land-use class | High, medium, low quality | Pollution density
Table 1.1. Attributes of spatial entities according to measurement scale
1.4. Problem Types in Spatial Data Analysis
There are basically four classes of problems encountered in spatial data analysis:
1. Problems of spatially discrete point data
2. Problems of spatially continuous point data
3. Problems of area data
4. Problems of spatial interaction data
1. Problems of spatially discrete point data:
This type of problem deals with data for a set of point events, i.e., a point pattern. The points sometimes carry simple attributes that distinguish one kind of event from another. The main concern of such analysis is the pattern of the event locations themselves.
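A classical first summary of a point pattern is the Clark-Evans nearest-neighbor ratio, a technique not described in this chapter. It compares the observed mean nearest-neighbor distance with its expected value 1/(2√λ) under complete spatial randomness, where λ is the intensity (points per unit area). This sketch ignores edge effects, and the test pattern is a hypothetical regular lattice.

```python
import math

def clark_evans_ratio(points, area):
    """R = observed mean nearest-neighbor distance / (1 / (2 * sqrt(lambda))).

    R < 1 suggests clustering, R near 1 complete spatial randomness, R > 1 regularity.
    No edge correction is applied.
    """
    n = len(points)
    nn = [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
          for i, p in enumerate(points)]
    observed = sum(nn) / n
    expected = 1.0 / (2.0 * math.sqrt(n / area))
    return observed / expected

# Hypothetical regular 10 x 10 lattice in the unit square (spacing 0.1)
lattice = [((i + 0.5) / 10, (j + 0.5) / 10) for i in range(10) for j in range(10)]
print(round(clark_evans_ratio(lattice, 1.0), 3))
```

A perfectly regular lattice gives R = 2, well above 1, as expected for a regular pattern.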
E.g.
• The locations of craters in a volcanic field
• The locations of a certain tree type in a forest
• The locations of the centers of biological cells in a section of tissue
• The locations of a certain type of crime in a neighborhood
• The locations of cases of a certain disease in an area
• The locations of a certain type of cancer in part of a country
Figure 1.2. Locations of cases of Legionnaires' disease in Glasgow
2. Problems of spatially continuous point data:
This class of problems arises where there is again a set of points, but the pattern of their locations is not itself the subject of analysis. Rather, one or more variables are measured at these sites, and the problem is to understand the process generating these values and possibly to use this understanding to predict values where no measurements have been made.
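Prediction at unsampled locations can be sketched with inverse distance weighting, one simple interpolator that the text does not itself name; geostatistical methods such as kriging are the more formal alternative. The gauge readings below are hypothetical, and the power parameter of 2 is an assumption.

```python
import math

def idw(x, y, samples, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (xi, yi, value) samples."""
    num = den = 0.0
    for xi, yi, v in samples:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            return v                    # honor the observation at a sampled site
        w = 1.0 / d ** power            # nearer samples get larger weights
        num += w * v
        den += w
    return num / den

# Hypothetical rain-gauge readings: (x km, y km, rainfall mm)
gauges = [(0.0, 0.0, 800.0), (10.0, 0.0, 1000.0), (0.0, 10.0, 900.0)]
print(round(idw(5.0, 0.0, gauges), 1))
```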
E.g.
Rainfall measurements, temperatures at weather stations, groundwater levels, radon gas levels, geochemical data, climate measures, ore grades, soil and rock properties
Figure 1.3. Rainfall maps in England and Wales: locations of rainfall measurement sites; contoured precipitation levels (mm); prediction errors (mm) of precipitation.
3. Problems of area data: This class of problems concerns data that have been aggregated to a set of areal units, such as counties, districts or census zones. One or more variables are measured over this set of zones, and the problem is to understand the spatial arrangement of these values, to detect patterns and to examine relationships among the variables.
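Area data often arise by aggregating point events (such as disease cases) to zones. The sketch below counts events per zone using a ray-casting point-in-polygon test. The zones and events are hypothetical, and points lying exactly on a shared boundary are not handled specially.

```python
def point_in_polygon(pt, poly):
    """Ray-casting test: True if pt lies inside the simple polygon poly (list of vertices)."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        if (y0 > y) != (y1 > y):                          # edge straddles the horizontal ray
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:                               # crossing lies to the right of pt
                inside = not inside
    return inside

def count_by_zone(points, zones):
    """Aggregate point events to areal units: number of points falling in each zone."""
    return {name: sum(point_in_polygon(p, poly) for p in points) for name, poly in zones.items()}

# Hypothetical zones and event locations
zones = {"west": [(0, 0), (5, 0), (5, 5), (0, 5)], "east": [(5, 0), (10, 0), (10, 5), (5, 5)]}
events = [(1, 1), (2, 3), (7, 4), (9, 1), (6, 2), (12, 2)]
print(count_by_zone(events, zones))
```

Dividing such counts by zone populations would turn them into rates like those mapped in the examples below.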
E.g.
Child mortality rates, socio-economic data, census data, voting data, prevalence of human blood groups, emissions of nitrogen and ammonia
Figure 1.4. Incidence of AIDS in Ugandan districts
Figure 1.5. AIDS residuals in Uganda
4. Problems of spatial interaction data:
This class of problems examines data on flows that link a set of locations (areas or points). The basic aim is to understand the arrangement of flows, to build models of such flows, and possibly to use these models to predict how the flows may change under certain scenarios.
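A classic starting point for modeling flows is the gravity model, which the text does not itself name. In this model the flow between two zones grows with their sizes and decays with distance. The constants `k` and `beta` below are hypothetical; in practice they would be calibrated against observed flows. Changing a zone's size or a distance and recomputing the flows gives a crude scenario prediction.

```python
def gravity_flow(origin_mass, dest_mass, distance, k=1.0, beta=2.0):
    """Unconstrained gravity model: T_ij = k * O_i * D_j / d_ij**beta."""
    return k * origin_mass * dest_mass / distance ** beta

# Hypothetical zone populations and inter-zone distances (km)
pop = {"A": 50_000, "B": 20_000, "C": 80_000}
dist = {("A", "B"): 10.0, ("A", "C"): 40.0, ("B", "C"): 30.0}

def predicted_flows(pop, dist, k=1e-4, beta=2.0):
    """Predicted flows in both directions for every pair of zones listed in dist."""
    flows = {}
    for (i, j), d in dist.items():
        flows[(i, j)] = gravity_flow(pop[i], pop[j], d, k, beta)
        flows[(j, i)] = gravity_flow(pop[j], pop[i], d, k, beta)
    return flows

flows = predicted_flows(pop, dist)
for pair, t in sorted(flows.items()):
    print(pair, round(t, 1))
```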
E.g.
• Business trips made by air within a country
• Migration between the provinces of a country
• Patients from different districts treated at a hospital
• The relative attractiveness of different shopping centers as branch sites for a financial district
• The effect of opening a new swimming pool
• The impact of a new housing district on existing flows