Texas A&M University
Zachary Department of Civil Engineering
Instructor: Dr. Francisco Olivera
CVEN 658 Civil Engineering Applications of GIS
The Use of ArcGIS Geostatistical Analyst Exploratory Spatial Data Analysis and an Integrated
Regionalization of Colorado Precipitation and Elevation Data
Lacey Bodnar
December 6, 2010
Contents
Abstract
1 Introduction
2 Literature Review
3 Methodology
3.1 Geostatistics
3.1.1 The Model for Ordinary Kriging and Cokriging
3.2 ArcGIS Geostatistical Analyst
3.2.1 Exploratory spatial data analysis
3.2.2 Validation
4 Application, Results, and Discussion
4.1 Description of the Study Area and Datasets
4.1.1 Colorado Topographic Features
4.1.2 Colorado Climate
4.1.3 Datasets
4.2 Results
4.2.1 Exploratory Spatial Data Analysis
4.2.2 Regionalization Maps
4.3 Discussion
Conclusion
Works Cited
Appendix A: Exploratory Spatial Data Analysis Graphs
Appendix B: Regionalization Maps
ArcGIS Default Map: Inverse Distance Weighted
ESDA Map: Ordinary Kriging with Outlier Removal and 2nd Degree Local Trend Removal
Elevation Map: Ordinary Cokriging with Outlier Removal and 2nd Degree Local Trend Removal
Elevation and World Image Map
Abstract
The primary goal of this regionalization analysis was to determine which interpolation
methodology resulted in the least prediction error for precipitation in the mountainous state of
Colorado. This error was systematically quantified for three models (inverse distance weighted, kriging,
and cokriging) and nine variations, including combinations of logarithmic transformations, trend
removal, anisotropy, and outlier removal. A secondary objective was to account for the effect of
elevation on spatial variation in precipitation. The results of method comparisons demonstrated that
most method variations are improvements in certain error measurements, but not in all. Variations
from the default were better at improving mean error than maximum or minimum error. By combining
several variations, it was possible to arrive at a prediction map with all error values lower than the
default setting. Using the ordinary kriging and cokriging methods in conjunction with outlier and 2nd
degree local trend removal resulted in the lowest error. The fact that the cokriging map, for which
elevation is the secondary variable, produced the least error confirms the expectation that precipitation
and elevation are positively correlated.
1 Introduction
Regionalization procedures allow scientists and water managers to take point data, such as
precipitation and stream flow, and estimate values for locations where no monitoring is in effect.
This is essential because it is not possible to put a gauging station at every possible location.
Additionally, large scale analysis, such as watershed modeling or soil quality models, often requires
knowing aggregate values, such as total runoff (Carrera-Hernandez et al. 2007, Shen et al. 2001, Langella
et al. 2010). In turn, calculating runoff requires first knowing the depth of precipitation across a
surface. This report demonstrates how the application of regionalization procedures to point data
allows for the creation of a continuous predicted rainfall surface map.
This project explores the uses of ArcGIS Geostatistical Analyst for Exploratory Spatial Data
Analysis (ESDA) and large scale regionalization of average monthly precipitation for the year 2000 across
the state of Colorado. A secondary objective of the project is to account for the effect of elevation on
precipitation across the Rocky Mountains. An advantageous feature of geostatistical methods is that
they allow for the statistical determination of the accuracy of the predicted surface. This capability,
called validation, was applied to compare multiple regionalization methods and arrive at a final
recommendation for the method with least error.
2 Literature Review
A great variety of regionalization methods exist, and their complexity has tended to increase over
time. In 1998, Frei and Schar used a moderately simple distance weighted deterministic method to
assess precipitation climatology across the European Alps. In 2001, Shen et al. used a method of
nearest-station assignment to average a dense network of gridpoints into polygons for assessment of
soil quality models in Alberta, Canada. More recently, Langella et al. (2010) investigated the application
of neural computing for matrix analysis in regionalization of climate data in the Campania region of
Italy.
The link between elevation and precipitation has also been studied by several authors. In 2007,
J.J. Carrera-Hernandez and S.J. Gaskin used kriging algorithms to analyze the spatial and temporal
characteristics of minimum and maximum temperature, precipitation, and its correlation with elevation.
Additionally, in 2010, Diodato et al. used cokriging in GIS to estimate rainfall in the Eastern Nepalese
Mountains. This report goes beyond that research by incorporating exploratory spatial data analysis in
the selection of numerous variations in kriging models. A validation of each variation allows for the
quantification of error in a large scope assessment of model prediction accuracy, and enables the author
to make recommendations for the highest accuracy modeling.
3 Methodology
The reference material for all information about, and procedures regarding, the Geostatistical
Analyst extension is the ESRI manual “ArcGIS 9: Using ArcGIS Geostatistical Analyst” by Johnson et al.
3.1 Geostatistics
Geostatistical interpolation is based on the assumption that locations that are closer together
will be more similar than locations that are farther apart. As a result, the values of closer points are
weighted more heavily than those far away. The empirical semivariogram is the key link between point
data and regionalized surfaces. It relates the distance between points to the squared difference
between their values; it is the graphical representation of the similarity of points with distance. The
model that best fits the semivariogram is then used in the interpolation method that creates a
surface prediction. The geostatistical interpolation methods chosen for this report were kriging and
cokriging.
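As a rough illustration of how an empirical semivariogram relates separation distance to squared differences, the following sketch bins every pair of points by distance and averages half the squared difference in each bin. The function name, the lag settings, and the sample values are hypothetical and are not taken from the report's data or from the Geostatistical Analyst itself.

```python
import math
from collections import defaultdict

def empirical_semivariogram(points, lag_width, n_lags):
    """Bin all point pairs by separation distance and average
    half the squared difference of their values in each bin."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(len(points)):
        xi, yi, zi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, zj = points[j]
            dist = math.hypot(xi - xj, yi - yj)
            lag = int(dist // lag_width)
            if lag < n_lags:
                sums[lag] += 0.5 * (zi - zj) ** 2
                counts[lag] += 1
    # One (bin start distance, semivariance, pair count) tuple per non-empty bin.
    return [(lag * lag_width, sums[lag] / counts[lag], counts[lag])
            for lag in sorted(counts)]

# Hypothetical (x, y, precipitation) samples, not the report's stations.
samples = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.2), (2.0, 0.0, 1.9), (3.0, 0.0, 2.1)]
semivariogram = empirical_semivariogram(samples, lag_width=1.0, n_lags=4)
for start, gamma, n_pairs in semivariogram:
    print(f"lag from {start:.1f}: semivariance {gamma:.3f} from {n_pairs} pairs")
```

In this toy sample the semivariance grows with distance, which is the behavior the assumption of spatial similarity predicts.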
3.1.1 The Model for Ordinary Kriging and Cokriging
In kriging, the statistical weighting is based not only on the distance between measured points,
but also the overall spatial relationships between locations. Cokriging uses the same procedure, but
allows for the assessment of multiple variables which have an effect on the variable of interest.
The model for kriging is based on the following equations:
Eq 1) $Z(s) = \mu + \varepsilon(s)$
The value at location s [Z(s)] is equal to a constant mean (µ) plus random errors with
spatial dependence [ε(s)].
Eq 2) $\hat{Z}(s_0) = \sum_{i=1}^{N} \lambda_i Z(s_i)$
The predicted value [Ẑ(s0)] is equal to the sum from one to N of the product of an
unknown weight (λi) of the observed value at the ith location and the value at the ith
location [Z(si)].
Eq 3) $E\left[ \left( Z(s_0) - \sum_{i=1}^{N} \lambda_i Z(s_i) \right)^2 \right]$
The lowest error occurs when the difference between the true value [Z(s0)] and the
predictor [Σλi Z(si)] is as small as possible. This occurs when the result of Eq 3 is
minimized.
Eq 4) $\Gamma \lambda = g$
The purpose of kriging is to solve for all the weights (λ). The gamma (Г) matrix is
populated with semivariogram values based on a given distance between two locations i
and j. The vector g is a list of semivariogram values between the predicted location and
each known point.
The procedure for cokriging is conceptually similar, but it is complicated by the fact that the
covariance of two or more variables must be accounted for in predictions. By solving these kriging
equations for the given sample points, the ArcGIS Geostatistical Analyst was able to produce
regionalized surface maps.
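The kriging equations above can be sketched in a few lines of code. The example below is only an illustration under stated assumptions: it uses a hypothetical spherical semivariogram model with made-up parameters (the report does not list the fitted model), it adds the usual row and column of ones so that the weights sum to one, and it solves the small system with a hand-written Gaussian elimination rather than whatever solver ArcGIS uses internally.

```python
import math

def semivariogram(h, nugget=0.0, sill=1.0, rng=10.0):
    """Hypothetical spherical model; parameters are illustrative only."""
    if h == 0.0:
        return 0.0
    if h >= rng:
        return sill
    r = h / rng
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)

def solve(a, b):
    """Gaussian elimination with partial pivoting for a small system a x = b."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ordinary_kriging(points, target):
    """Return (prediction, weights) at target from (x, y, value) points."""
    n = len(points)

    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    # Gamma matrix of pairwise semivariogram values, bordered by ones
    # so that the solved weights are constrained to sum to one.
    gamma = [[semivariogram(dist(points[i], points[j])) for j in range(n)] + [1.0]
             for i in range(n)]
    gamma.append([1.0] * n + [0.0])
    # Vector g: semivariogram values between the target and each known point.
    g = [semivariogram(dist(p, target)) for p in points] + [1.0]
    weights = solve(gamma, g)[:n]
    prediction = sum(w * p[2] for w, p in zip(weights, points))
    return prediction, weights

# Hypothetical (x, y, precipitation) samples, not the report's stations.
samples = [(0.0, 0.0, 1.0), (4.0, 0.0, 1.4), (0.0, 3.0, 0.9), (5.0, 5.0, 2.0)]
prediction, weights = ordinary_kriging(samples, (2.0, 2.0))
at_sample, _ = ordinary_kriging(samples, (0.0, 0.0))
```

With a zero nugget, predicting at a sampled location returns the sampled value, which is a quick sanity check on the solved weights.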
3.2 ArcGIS Geostatistical Analyst
For the purposes of the geostatistical analysis used in this project, the raw data for each point
(precipitation station) must have at least the following five attributes: STATION ID, LATITUDE,
LONGITUDE, AVERAGE ANNUAL PRECIPITATION, and the station ELEVATION. The station ID was then
used to identify each point and relate it to its record in the precipitation database. Latitude and
longitude were needed for the Display XY Data function in ArcGIS to plot each point on the map. The
precipitation record was the primary spatial parameter analyzed by the Geostatistical Wizard. Elevation
was used as the second spatial parameter in the cokriging operation.
The Geostatistical Analyst extension allows for the exploration of spatial data. An improved
understanding of the characteristics of a data set can be used to make improvements to the
regionalization model. The extension also provides a Geostatistical Wizard for prediction mapping. The
user has the option of accepting the pre-defined (default) settings, or making adjustments to better
represent the spatial properties of the data. The default settings were used in this report to create the
first regionalization map. Then, multiple adjustments were assessed based on information obtained
from the spatial data exploration. A comparison of the error associated with these models, to the error
of the default model, allowed for a quantifiable assessment of the advantages and disadvantages of
each attempted model.
3.2.1 Exploratory spatial data analysis
The tools available for ESDA in the Geostatistical Analyst include: Histogram, Normal QQ Plot,
Trend Analysis, Voronoi Map, Semivariogram / Covariance Cloud, General QQ Plot, and Crosscovariance
Cloud. Each tool is defined in greater detail on the following pages.
3.2.2 Validation
Geostatistical validation required that the points be split into two groups, one to be used for the
prediction surface (training), and the other to contain known values used to test the accuracy (test).
Using the Geostatistical Analyst function “Create Subsets,” the data were split 70:30 and put in a
Training_Testing geodatabase. The validation function was then invoked after the generation of a
regionalized map by simply clicking on the map layer and selecting “Validation.” Validation calculated
an error associated with each point in the test group. The statistics of the error column of the test
station attribute table were used as a quantifiable measure of the accuracy of the various interpolation methods.
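The validation workflow can be approximated outside ArcGIS as follows. This is a minimal sketch, not the "Create Subsets" or "Validation" tools themselves: the shuffling, the placeholder predictor (the training mean), the sign convention for error (predicted minus measured), and the use of the sample standard deviation are all assumptions made for illustration.

```python
import math
import random

def split_train_test(records, train_fraction=0.7, seed=0):
    """Shuffle a copy of the records and cut it into training and test subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = int(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

def error_statistics(errors):
    """Minimum, maximum, mean, and sample standard deviation of the errors."""
    n = len(errors)
    mean = sum(errors) / n
    sd = math.sqrt(sum((e - mean) ** 2 for e in errors) / (n - 1))
    return {"min": min(errors), "max": max(errors), "mean": mean, "sd": sd}

# Hypothetical stand-in for 185 station records: (station index, measured value).
stations = [(i, 1.0 + 0.01 * i) for i in range(185)]
train, test = split_train_test(stations)

# Placeholder predictor (the training mean), used only so the example runs;
# in the report the predictions come from the interpolated surface.
train_mean = sum(value for _, value in train) / len(train)
errors = [train_mean - measured for _, measured in test]  # assumed: predicted minus measured
stats = error_statistics(errors)
print(len(train), len(test), stats)
```

For 185 records, a 70:30 split with the training count rounded down gives 129 training and 56 test stations.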
4 Application, Results, and Discussion
4.1 Description of the Study Area and Datasets
The scope of the project included the entire state of Colorado. Colorado extends from 37° to
41°N latitude and from 102° to 109°W longitude (Doesken et al. 2003). It is the eighth largest American
state, and has an area of over 104,000 mi². As a mountainous state, Colorado was chosen in order to
develop a method for integrating elevation spatial data with predictions of precipitation.
Records for average monthly precipitation in 2000 were used from 185 precipitation stations
distributed across the state. The relative locations of the sites are shown on the map below.
Figure 1: Colorado Study Sites
4.1.1 Colorado Topographic Features
The average elevation of Colorado is 6,800 feet above sea level, making it the highest
contiguous state (Doesken et al, 2003). It has 59 mountains over 14,000 feet, and 830 mountains
between 11,000 and 14,000 feet. Along Colorado’s eastern border, elevations range from
approximately 3,350 feet to 4,000 feet. Elevations increase westward, and reach between 5,000 and
6,000 feet where the plains transition to the Front Range of the Rocky Mountains. In the Rocky
Mountain foothills, elevations rise sharply to 7,000 to 9,000 feet. Across the mountains, elevations reach
9,000 feet, with the highest points over 14,000 feet. These topographic features affect temperatures,
wind patterns, and storm tracks during all seasons.
Though Colorado is well known for its mountains, the eastern high plains account for almost 40
percent of the state area. High ground is also present along the eastern border to the south along New
Mexico, and to the north along Nebraska and Wyoming (Doesken et al, 2003).
4.1.2 Colorado Climate
The overall climate of Colorado is semi-arid (Doesken et al. 2003). It is highly affected by
variations in elevation, and to a lesser extent, by the orientation of the mountains and valleys with regard
to typical air movements.
The average annual precipitation of Colorado is 17 inches, but that number varies significantly
with location. Precipitation tends to decline gradually from the eastern border, and reaches the lowest
point in the state near the Rocky Mountain foothills. Precipitation then increases rapidly across the
mountain range as elevation increases. To the west of the Rockies, elevation declines, and precipitation
gets progressively lower (Doesken et al, 2003).
4.1.3 Datasets
The foundational dataset for the project was the monthly and annual precipitation data from
climate gauging stations across Colorado. This data was retrieved in a raw format, meaning in an Excel
table only, with no link to ArcGIS. For the year 2000, there were 185 records available. The information
for accumulated precipitation per month was averaged to find the average monthly precipitation. The
information was provided by the Colorado Decision Support System, and may be found online at
http://cdss.state.co.us.
Three additional datasets were also used to facilitate the regionalization analysis. One was a
Shapefile of the locations of Colorado Precipitation Stations, used to plot the stations in ArcMap.
Important attributes of this dataset were: NAME, STATION_ID, LATDECDEG, and LONGDECDEG. The raw
average annual precipitation data was joined to the precipitation stations Shapefile by the field
STATION_ID. This allowed the stations with precipitation data to be projected in ArcGIS, using the
Display XY function, based on the latitude and longitude record of the Shapefile.
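The averaging and join steps described above can be pictured with a small sketch. The field names NAME, STATION_ID, LATDECDEG, and LONGDECDEG come from the Shapefile described in the text; the output field AVG_MONTHLY_PRECIP, the function name, and all values are hypothetical, and the real join was performed inside ArcGIS rather than in code.

```python
def join_precipitation(stations, monthly_precip):
    """Attach the average monthly precipitation to each station that has a record."""
    joined = []
    for station in stations:
        totals = monthly_precip.get(station["STATION_ID"])
        if totals:  # skip stations without a precipitation record
            record = dict(station)
            record["AVG_MONTHLY_PRECIP"] = sum(totals) / len(totals)
            joined.append(record)
    return joined

# Hypothetical station attributes and monthly totals (inches), keyed by STATION_ID.
stations = [
    {"NAME": "Station A", "STATION_ID": 101, "LATDECDEG": 39.7, "LONGDECDEG": -105.0},
    {"NAME": "Station B", "STATION_ID": 102, "LATDECDEG": 38.3, "LONGDECDEG": -104.6},
]
monthly_precip = {
    101: [0.5, 0.7, 1.2, 1.8, 2.4, 1.6, 1.9, 1.5, 1.1, 1.0, 0.8, 0.5],
}
joined = join_precipitation(stations, monthly_precip)
for record in joined:
    print(record["NAME"], record["LATDECDEG"], record["LONGDECDEG"],
          round(record["AVG_MONTHLY_PRECIP"], 3))
```

Only stations with a matching precipitation record survive the join, which mirrors how stations without data were left off the map.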
The joined table of precipitation station records was then split into two datasets for the
validation process. Seventy percent of the records (129 stations) were included in the training set. The
remaining thirty percent (56 stations) were put in the test set, and stored in a personal geodatabase.
The remaining two datasets, State Boundaries and World Imagery, were used for presentation
purposes only, to help display the location and characteristics of the project site.
Table 2: Description of Data Sets
| Data | Source | Format | URL |
|---|---|---|---|
| Monthly and Annual Precipitation Data | Colorado Decision Support System | Excel table | http://cdss.state.co.us/DNN/ViewData/tabid/60/Default.aspx |
| Colorado Precipitation Stations | Colorado Decision Support System | Shapefile | http://cdss.state.co.us/DNN/default.aspx |
| State Boundaries | USGS | Shapefile | http://coastalmap.marine.usgs.gov/GISdata/basemaps/boundaries/state_bounds/state_bounds.zip |
| World Imagery | ESRI | JPEG | http://services.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer |
4.2 Results
4.2.1 Exploratory Spatial Data Analysis
The Geostatistical Analyst was used to search for spatial trends in the state-wide distribution of
precipitation. Graphs were generated for each of the ESDA tools described in section 3.2.1. Please refer
to Appendix A for the complete sequence of graphs.
For a normal distribution, the mean will equal the median, the Skewness will equal zero, and the
Kurtosis will be equal to three. The histogram generated from the complete data set had a mean of
1.2184 and a median of 1.1883, with a difference of 0.0301. The Skewness factor was 1.2886, and the
Kurtosis was 7.2863. Station 9181 appears to be a positive outlier. Positive outliers will contribute to a
high positive skew factor. After removing the highest and lowest values (possible outliers), the mean
became 1.21, the median 1.1883, the Skewness 0.64798, and the Kurtosis 3.4907. Also, a logarithmic
transformation resulted in a mean of 0.13755, and a median of 0.17255. The Skewness changed to
-0.20969, and the Kurtosis to 3.4864.
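The histogram statistics can be reproduced for any list of values with a short moment-based calculation. The sketch below assumes the moment definitions of skewness and (non-excess) kurtosis, under which a normal distribution has a kurtosis of three; whether the Geostatistical Analyst applies exactly these formulas is an assumption, and the sample values are hypothetical.

```python
def moment_statistics(values):
    """Mean, median, moment-based skewness, and non-excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    ordered = sorted(values)
    mid = n // 2
    median = ordered[mid] if n % 2 else 0.5 * (ordered[mid - 1] + ordered[mid])
    return {
        "mean": mean,
        "median": median,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }

# Hypothetical samples: one symmetric, one with a single large positive outlier.
symmetric = moment_statistics([1.0, 2.0, 3.0, 4.0, 5.0])
right_skewed = moment_statistics([1.0, 1.0, 1.0, 1.0, 10.0])
print(symmetric)
print(right_skewed)
```

A single large value pulls the mean above the median and produces a positive skewness, which is the pattern attributed to the high-value station in the text.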
The Normal QQ Plot was also used to assess data normality. The closer the data quantiles are to
the straight line, the more normal the distribution. The data quantiles were high for both low (below
-1.11) and high (above 1.67) standard normal values. For mid-range values, the quantiles fit the line
well. After a log transformation, the low (below -1.67) standard normal values were below, rather than
above, the line of normality.
Trend analysis was used to study spatial patterns in the data. Two trends were discovered in the
data. The green line projected against the XZ (north-south) plane represented a 2nd order trend. With a
rotation angle of 117°, the trend stretched from the N-NW to the S-SE. The blue line projected against
the YZ plane at 45° was also a second-order trend, from the NE to the SW.
The standard deviation Voronoi map displayed areas of high standard deviation relative to
neighboring values. The map showed that the greatest differences in precipitation are concentrated in
the western half of the map. The southwestern quadrant of the western half had especially high deviations.
4.2.2 Regionalization Maps
The primary goal of this regionalization analysis was to determine which variation in
methodology resulted in the least prediction error. A secondary objective was to account for the effect
of elevation on spatial variation in precipitation. In order to accomplish these goals, three maps were
created (See Appendix B: Regionalization Maps). The first map was generated using all the default
ArcGIS Geostatistical Analyst settings. This map was called “ArcGIS Default Map: Inverse Distance
Weighted.” The second map was produced using kriging and adjustments chosen from the exploratory
spatial data analysis. Many variations of models were tested, and their error was validated. A
comparison of the model errors is provided in Table 3. The second map, called “ESDA Map: Ordinary
Kriging with Outlier Removal and 2nd Degree Local Trend Removal” was made using the model
variations which produced the least error. The third map was generated using cokriging, with elevation
of the stations as the second dataset. It was called “Elevation Map: Ordinary Cokriging with Outlier
Removal and 2nd Degree Local Trend Removal.”
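For comparison with the kriging models, the inverse distance weighted method behind the first map can be sketched as below. This is a simplified illustration: it weights every sample point rather than a search neighborhood, and the power of 2 and the sample values are assumptions rather than settings confirmed by the report.

```python
import math

def idw_predict(points, target, power=2.0):
    """Inverse distance weighted estimate at target from (x, y, value) points."""
    numerator = 0.0
    denominator = 0.0
    for x, y, value in points:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value  # the target coincides with a sample point
        weight = 1.0 / d ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator

# Hypothetical (x, y, precipitation) samples, not the report's stations.
samples = [(0.0, 0.0, 1.0), (2.0, 0.0, 3.0)]
midpoint = idw_predict(samples, (1.0, 0.0))
near_first = idw_predict(samples, (0.5, 0.0))
print(midpoint, near_first)
```

Unlike kriging, the weights here depend only on distance to the target, not on a fitted semivariogram or on the spatial arrangement of the samples.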
Table 3: Comparison of Regionalization Models
Default Inverse Distance Weighted: Minimum: -1.078505; Maximum: 0.525964; Mean: -0.083442; SD: 0.346012
| Model Type | Model Variation | Error | Advantages | Disadvantages |
|---|---|---|---|---|
| Ordinary Kriging | Default | MIN: -1.193436; MAX: 1.045916; Mean: -0.069173; SD: 0.35928 | The mean error is reduced compared to the default IDW. | MIN, MAX, and SD measurements are higher than the default. |
| Ordinary Kriging | Log transformation | MIN: -1.198215; MAX: 0.865418; Mean: -0.068397; SD: 0.337955 | Mean error and error SD are less than default. | Minimum and maximum error is slightly higher. |
| Ordinary Kriging | Global 2nd degree trend removal | MIN: -1.177671; MAX: 1.106821; Mean: -0.070623; SD: 0.371293 | Mean error is reduced compared to default. | Minimum and maximum error and standard deviation are slightly higher than default. |
| Ordinary Kriging | Local 2nd degree trend removal | MIN: -1.099931; MAX: 1.40959; Mean: -0.059516; SD: 0.378031 | Mean error is reduced compared to default. | Minimum and maximum error and standard deviation are higher. |
| Ordinary Kriging | Anisotropy | MIN: -1.137199; MAX: 0.839349; Mean: -0.061707; SD: 0.343763 | Mean error is lower than default. SD is slightly reduced. | Minimum and maximum errors are higher. |
| Ordinary Kriging | Outlier removal | MIN: -1.100251; MAX: 0.453051; Mean: -0.066228; SD: 0.340979 | MAX, Mean, and SD are improvements over the default setting. | Minimum error is slightly larger. |
| Ordinary Kriging | Outlier removal and 2nd degree local trend removal | MIN: -1.077709; MAX: 0.436838; Mean: -0.074807; SD: 0.337523 | All values are improvements over the default setting. | NA |
| Ordinary Cokriging | Default | MIN: -1.027828; MAX: 0.61427; Mean: -0.077562; SD: 0.352175 | Minimum and mean errors are better than the default IDW map. | The MAX error and standard deviation of the error are slightly higher. |
| Ordinary Cokriging | Outlier removal and 2nd degree local trend removal | MIN: -1.043587; MAX: 0.387861; Mean: -0.04397; SD: 0.220565 | All values are improvements over the default setting of all maps. | NA |
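The advantage and disadvantage columns of Table 3 can be checked mechanically. The sketch below copies the error statistics from the table and lists, for each model, which measures are smaller in magnitude than the default inverse distance weighted values. Treating "smaller absolute value" as "better" is an assumption about how the table was read, and the dictionary labels are shorthand introduced here.

```python
# Error statistics of the default IDW map, copied from Table 3.
IDW_DEFAULT = {"min": -1.078505, "max": 0.525964, "mean": -0.083442, "sd": 0.346012}

# Error statistics of each model variation, copied from Table 3.
MODELS = {
    "Kriging: default": {"min": -1.193436, "max": 1.045916, "mean": -0.069173, "sd": 0.35928},
    "Kriging: log transformation": {"min": -1.198215, "max": 0.865418, "mean": -0.068397, "sd": 0.337955},
    "Kriging: global trend removal": {"min": -1.177671, "max": 1.106821, "mean": -0.070623, "sd": 0.371293},
    "Kriging: local trend removal": {"min": -1.099931, "max": 1.40959, "mean": -0.059516, "sd": 0.378031},
    "Kriging: anisotropy": {"min": -1.137199, "max": 0.839349, "mean": -0.061707, "sd": 0.343763},
    "Kriging: outlier removal": {"min": -1.100251, "max": 0.453051, "mean": -0.066228, "sd": 0.340979},
    "Kriging: outlier + local trend removal": {"min": -1.077709, "max": 0.436838, "mean": -0.074807, "sd": 0.337523},
    "Cokriging: default": {"min": -1.027828, "max": 0.61427, "mean": -0.077562, "sd": 0.352175},
    "Cokriging: outlier + local trend removal": {"min": -1.043587, "max": 0.387861, "mean": -0.04397, "sd": 0.220565},
}

def improved_measures(model, baseline=IDW_DEFAULT):
    """Names of the error measures whose magnitude is smaller than the baseline's."""
    return [name for name in baseline if abs(model[name]) < abs(baseline[name])]

improvements = {label: improved_measures(stats) for label, stats in MODELS.items()}
all_better = [label for label, names in improvements.items() if len(names) == 4]
for label, names in improvements.items():
    print(f"{label}: improves {', '.join(names)}")
```

Under this reading, every variation improves the mean error, and only the two models that combine outlier removal with local trend removal improve all four measures.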
4.3 Discussion
The results of method comparison demonstrated that most methods are improvements in
certain respects, but not in all.
A log transformation changed the error in ways consistent with what was observed on the log
transformation of the Normal QQ Plot. As seen below, the transformation was an overall improvement.
The maximum values and middle values are all better fit. However, the minimum values on the log
graph deviated farther away from the line. This is consistent with the slightly higher minimum error, and
the reduced maximum, mean, and standard deviation of error, of the log transformation regionalization
map relative to the default ordinary kriging model.
Figure 2: Normal QQ Plot, No Transformation
Figure 3: Normal QQ Plot, Log Transformation
Removing trends proved to be effective at reducing average error. However, that came at a
trade-off with higher minimum and maximum errors, and as a result, higher standard deviation. Trends
are the non-random aspect of a spatial model, meaning they are deterministic.
Recall the first kriging equation:
Eq 1) $Z(s) = \mu + \varepsilon(s)$
The value at location s [Z(s)] is equal to a constant mean (µ) plus random errors with
spatial dependence [ε(s)].
Kriging methods are intended to account for random errors. Trend removal allows for the
separation of deterministic variation from random variation in the ε(s) factor. As described earlier in
section 4.2.1, two second-order trends were identified in the data. One stretched from the N-NW to the
S-SE. The other extended from the NE to the SW.
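One common way to write a second-order trend, offered here as an illustrative formulation rather than the exact form used by the Geostatistical Wizard, is to replace the constant mean of Eq 1 with a polynomial in the map coordinates (x, y). The coefficients β are symbols introduced for this sketch and do not appear in the report.

```latex
Z(s) = \mu(s) + \varepsilon(s), \qquad
\mu(s) = \beta_0 + \beta_1 x + \beta_2 y + \beta_3 x^2 + \beta_4 x y + \beta_5 y^2
```

Under this formulation, the polynomial surface µ(s) is fitted and subtracted, kriging is applied to the residuals ε(s), and the trend is added back to produce the final prediction.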
Figure 4: Global Trend Removal
Figure 5: Local Trend Removal
Anisotropy was also effective in reducing mean error, and to a slight degree, standard deviation
of error. However, the minimum and maximum errors increased. Anisotropy occurs when spatial
autocorrelation changes with both distance and direction. Adding anisotropy to the regionalization
method allowed for the incorporation of directional influences. In a situation such as this, where
geographical variation produces large scale physical patterns that are related to the variable of interest,
it seems that quantifying the effect of direction on the predicted surface will result in greater accuracy.
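A simple way to picture directional influence is geometric anisotropy, in which distances across the direction of greatest continuity are stretched before the semivariogram is evaluated. The sketch below assumes that form of anisotropy, an angle measured counterclockwise from the x-axis, and a ratio of major to minor range; these conventions and values are illustrative and may differ from those of the Geostatistical Wizard.

```python
import math

def anisotropic_distance(p, q, major_angle_deg, ratio):
    """Effective separation between p and q under geometric anisotropy.

    major_angle_deg: direction of the major axis, counterclockwise from the x-axis.
    ratio: major range divided by minor range (1.0 means isotropic).
    """
    dx, dy = q[0] - p[0], q[1] - p[1]
    theta = math.radians(major_angle_deg)
    along = dx * math.cos(theta) + dy * math.sin(theta)    # component along the major axis
    across = -dx * math.sin(theta) + dy * math.cos(theta)  # component across the major axis
    return math.hypot(along, ratio * across)

# Hypothetical case: the major axis lies along x and the range ratio is 2.
along_axis = anisotropic_distance((0.0, 0.0), (1.0, 0.0), 0.0, 2.0)
across_axis = anisotropic_distance((0.0, 0.0), (0.0, 1.0), 0.0, 2.0)
isotropic = anisotropic_distance((0.0, 0.0), (3.0, 4.0), 30.0, 1.0)
print(along_axis, across_axis, isotropic)
```

Two points one unit apart across the major axis are treated as if they were two units apart, so they receive less weight than points the same distance away along the axis.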
Figure 6: Accounting for Anisotropy in Geostatistical Wizard
Method comparison showed that outlier removal is a strong technique for minimizing error. In
the ordinary kriging default, log transformation, and trend removal error distributions, the highest
positive error was always station number 6259. Investigating the map showed that the station with the
highest positive error was located closest to the station with the highest recorded value (9181). When
examining the histogram, station 9181 appears to be a definite positive outlier. Removing the outlier
reduced the skew, as well as the maximum, average, and standard deviation of error.
Figure 7: Spatial Relationship of Positive Outlier and Highest Positive Error
The station with the highest negative error, station 1959, is also located in close proximity to the
station with the lowest recorded average annual precipitation, station 797. This close spatial
relationship, between the lowest value station and the most underestimated location from the kriging
analysis, suggests that station 797 is skewing the results too low. However, removing the minimum
value increased the minimum error from -1.0785 to -1.100251. Thus, it does not seem justified to say
that station 797 is an outlier. Rather, it likely contributes to an accurate prediction of that area.
The cokriging map was an improvement over the first two, because it produced the lowest
prediction error. The unique aspect of cokriging was factoring in the cross-correlation between elevation
and precipitation. This reinforces the fact that precipitation increases with increasing elevation. By
inspection, it is evident that the cokriging elevation map fits the geographic features of the state (See
Elevation and World Imagery Map, Appendix B). Precipitation is highest (darkest red) over the mountain
peaks. It tapers off moving east and west. Precipitation increases again noticeably along the eastern
border. This is consistent with the topography of the state, where higher elevations occur to the south
along New Mexico, and to the north along Nebraska and Wyoming (refer to Section 4.1.1).
Conclusion
The primary goal of this regionalization analysis was to determine which interpolation
methodology resulted in the least prediction error for the mountainous state of Colorado. A secondary
objective was to account for the effect of elevation on spatial variation in precipitation. The results of
method comparisons demonstrated that most methods are improvements in certain error
measurements, but not in all. Model variations from the default were better at improving mean error
than maximum or minimum error. Every variation, including default kriging, log, trend, anisotropy, and
outlier removal, resulted in a lower mean error than the default map, and either higher minimum or
maximum error, or both. By combining several variations, it was possible to arrive at a prediction map
with all error values lower than the default setting. Using the ordinary kriging and cokriging methods in
conjunction with outlier and 2nd degree local trend removal resulted in lower error. The fact that the
cokriging map produced the least error confirms the expectation that precipitation increases with
increasing elevation.
Works Cited
Carrera-Hernandez, J.J., and Gaskin, S.J. (2007). “Spatio temporal analysis of daily precipitation and temperature in the Basin of Mexico.” Journal of Hydrology, 336, 231-249.
Diodato, N., Tartari, G., Bellocchi, G. (2010). “Geospatial rainfall modeling at Eastern Nepalese Highland from ground environmental data.” Water Resources Management, 24, 2703-2720.
Doesken, N.J., Pielke, R.A., Bliss, O.A.P. (2003). “Climate of Colorado.” Colorado State University, <http://ccc.atmos.colostate.edu/climateofcolorado.php> (Nov. 13, 2010).
Frei, C., and Schar, C. (1998). “A precipitation climatology of the Alps from high-resolution rain-gauge observations.” International Journal of Climatology, 18, 873-900.
Johnson, K., Ver Hoef, J.M., Krivoruchko, K., Lucas, N. (2003). “ArcGIS 9: Using ArcGIS Geostatistical Analyst.” <http://dusk.geo.orst.edu/gis/geostat_analyst.pdf> (Nov. 13, 2010).
Langella, G., Basile, A., Bonfante, A., Terribile, F. (2010). “High-resolution space-time rainfall analysis using integrated ANN inference systems.” Journal of Hydrology, 387, 328-342.
Shen, S.S., Dzikowski, P., Li, G., Griffith, D. (2001). “Interpolation of 1961-97 daily temperature and precipitation data onto Alberta Polygons of Ecodistrict and Soil Landscapes of Canada.” Journal of Applied Meteorology, 40, 2162-2177.
Appendix A: Exploratory Spatial Data Analysis Graphs
Histogram: No Transformation
Histogram: Log Transformation
Histogram: Outlier Removal (Highest and Lowest Value)
Normal QQ Plot: No Transformation
Normal QQ Plot: Log Transformation
Trend Analysis
Voronoi Map: Mean
Voronoi Map: Standard Deviation (used to look for local variation)
Voronoi Map: Cluster (used to look for local outliers)
Semivariogram
Covariance Cloud
General QQ Plot, High and Low Outlier Removed (X-axis is precipitation, Y-axis is elevation)
Crosscovariance Cloud (High and Low Outlier Removed)
Appendix B: Regionalization Maps
ArcGIS Default Map: Inverse Distance Weighted
ESDA Map: Ordinary Kriging with Outlier Removal and 2nd Degree Local Trend Removal
Elevation Map: Ordinary Cokriging with Outlier Removal and 2nd Degree Local Trend Removal
Elevation and World Image Map