
JB Industries

March 21st, 2014

Geostatistics Major Assignment

Geostatistical Analysis of Student Collected Spatial Data

John Bull


Geostatistics Major Assignment March 19, 2014

JB Industries | 3 Jasmin Crescent, St Catharines, ON, L2T 2B9


Executive Summary

The Geostatistical Analysis of Student Collected Spatial Data project was a major project that included data collection, preprocessing, and surface creation. The study area was defined as the city of St. Catharines. Data used for the project were obtained online, primarily from government ministries, and included water well data whose elevation values were used as the z-value. Using these elevation values, two interpolated surfaces were created: an Inverse Distance Weighted (IDW) surface and an Empirical Bayesian Kriging (EBK) surface.

The IDW interpolator was used to create a smooth, gradual surface from the elevation points. The search neighbourhood was adjusted to account for the high variation along the escarpment, and smoothing was used to diminish the effect of outliers. The IDW result came out as desired, with a slow, gradual elevation change across the study area, apart from a few outliers that created small depressions in high-elevation areas.

The EBK interpolator was used to create a more exact surface that captures more of the elevation changes in the study area. EBK uses spatial autocorrelation to weight the measured points around each prediction location, which produced a slightly more accurate surface. The EBK result captured more of the elevation changes in the area, as exemplified by the 12 Mile Creek area.

Data coverage within the study area posed minor problems, with two major data gaps and several small clusters of data points. These areas slightly affected the accuracy of both interpolation techniques, and both surfaces would improve if these gaps were filled with supplemental elevation measurements.

Overall, both interpolation techniques did the job they were set out to do. The added geostatistical power of kriging allows a stricter surface to be derived than with IDW, owing to its interactive modelling and additional parameters. Depending on the final use of the surface, however, neither technique is necessarily better than the other.


Table of Contents

1. Introduction
1.1. Project Goals
1.2. Study Area
2. Data
2.1. Statistics
2.2. Trends
2.3. Preprocessing
3. Methodology
3.1. Data Transformations and Trend Removal
3.2. Inverse Distance Weighted
3.3. Kriging
4. Analysis
4.1. Interpolation Techniques
4.2. Data Coverage
4.3. Accuracy of Results
5. Future Recommendations
6. Conclusion
Bibliography


Figures and Tables

Figure 1: Outline of study area
Figure 2: Locations of water wells within study area
Figure 3: Summary statistics and distribution for water well data
Figure 4: Trend analysis of water well elevation measurements
Figure 5: Example of declustering
Figure 6: Distribution of water wells data before transformation
Figure 7: Water well data distribution after transformation
Figure 8: Cross-section of study area
Figure 9: Comparison of IDW model to digital elevation model
Figure 10: Method report for IDW interpolation
Figure 11: Final IDW map output
Figure 12: Method report for EBK interpolator
Figure 13: Empirical Bayesian Kriging output
Figure 14: Problems with IDW result
Figure 15: Issues with EBK output
Figure 16: Differences between EBK and IDW
Figure 17: Study area data gaps
Figure 18: IDW overlaid onto satellite imagery
Figure 19: EBK overlaid on satellite imagery
Table 1: Descriptive statistics for water well dataset


1. Introduction

The following report details the work undertaken to complete the project Geostatistical Analysis of Student Collected Spatial Data. The report begins by detailing the goals of the project and then explains the methodology used for each analysis technique. Lastly, analytic commentary describes the differences between the two techniques, the overall quality of the data coverage, and the accuracy of the results compared with real-world conditions.

1.1. Project Goals

The main purpose and goals of the Geostatistical Analysis project are listed below, as per the Terms of Reference (Smith, 2014):

- To derive a working ability to report on the collection of geospatial data and to describe the data both geostatistically and practically.
- To predict geospatial coverage in areas not directly measured or observed, through interpolation.

The following report outlines the data, methodology, and analysis techniques used to achieve these goals, and concludes with an assessment of the two interpolation techniques as well as the data collection and coverage within the study area.


1.2. Study Area

The area of interest for the Geostatistical Analysis project is the city of St. Catharines, in the Niagara Region. The study area limits were delineated using ESRI's municipal polygons layer and can be seen in Figure 1.

Figure 1: Outline of study area


2. Data

Data for the project were gathered entirely from online sources between January 17th and January 19th, 2014.

The water well dataset is the primary dataset used within this project and was obtained from Ontario's Ministry of the Environment. It includes easting, northing, and elevation measurements, which are used to interpolate the two surfaces. The municipal boundaries used to derive the study area were obtained online from Ontario Basic Mapping; however, the data are property of the Government of Ontario.


The obtained data points, clipped to the study area, can be seen in Figure 2.

Figure 2: Locations of water wells within study area


2.1. Statistics

The water well dataset contains 573 data points within the area of interest, each with a unique easting, northing, and elevation value. Descriptive statistics for these data points can be seen in Table 1.

Descriptive Statistics for Well Data

Variable        Minimum         Maximum         Mean            Median          Std. Deviation
Easting (m)     637,216.00      647,235.00      642,008.00      642,146.00      2,912.00
Northing (m)    4,774,403.00    4,788,203.00    4,780,876.00    4,780,903.00    2,561.00
Elevation (m)   75.08           177.63          100.81          99.84           13.92

Table 1: Descriptive statistics for water well dataset

The range of elevation values (177.63 m - 75.08 m = 102.55 m) immediately shows that the study area contains some fairly large elevation changes. The mean (100.81 m) and median (99.84 m) are close together and both sit well below the midpoint of the range (about 126.4 m). This suggests that most wells lie at lower elevations and that the rise to the highest values is fairly steep and confined to a few points (otherwise the mean would fall closer to the middle of the range). It also suggests that the data are roughly normal but positively skewed, since the mean lies much closer to the minimum than to the maximum and slightly exceeds the median.


Further investigation shows that the data are farther from a normal distribution than first assumed. Skewness and kurtosis are two statistical measures that can be used to describe the distribution of a dataset; a skewness value of 0 and a kurtosis value of 3 are indicative of a normal distribution (Babish, 2006). Figure 3 shows that the water well elevations have a skewness value of 2.31 and a kurtosis value of 9.75.

Figure 3: Summary statistics and distribution for water well data

The skewness value of 2.31 confirms that the dataset is positively skewed, as previously assumed: values cluster tightly around the mean, with a long tail extending into the higher values (Babish, 2006). The kurtosis value of 9.75 indicates that the distribution is leptokurtic, with a higher peak and thicker tails than a normal distribution (Babish, 2006). This lack of normality can potentially be addressed before analysis by transforming the data.

Summary for Elevation (Figure 3): Anderson-Darling normality test A-Squared = 19.05, P-Value < 0.005; Mean = 100.81; StDev = 13.92; Variance = 193.84; Skewness = 2.30888; Kurtosis = 9.74837; N = 573; Minimum = 75.08; 1st Quartile = 93.70; Median = 99.84; 3rd Quartile = 105.30; Maximum = 177.63. 95% confidence intervals: Mean 99.66 to 101.95; Median 98.42 to 100.15; StDev 13.16 to 14.78.
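The skewness and kurtosis figures above are moment statistics. The Python sketch below shows one way to compute them alongside the range and midrange used earlier. It assumes population-moment (divide-by-n) formulas and reports non-excess kurtosis, where a normal distribution scores 3, matching the convention in the text. Statistical packages often apply small-sample corrections, and some report excess kurtosis (0 for a normal distribution), so their values can differ from this sketch. The elevations are hypothetical, not the project data.

```python
from statistics import mean, median


def moments_summary(values):
    """Range, midrange, mean, median, skewness and non-excess kurtosis of a sample."""
    n = len(values)
    m = mean(values)
    # Central moments using population (divide-by-n) formulas.
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return {
        "range": max(values) - min(values),
        "midrange": (max(values) + min(values)) / 2,
        "mean": m,
        "median": median(values),
        "skewness": m3 / m2 ** 1.5,  # 0 for a symmetric distribution
        "kurtosis": m4 / m2 ** 2,    # 3 for a normal distribution
    }


# Hypothetical elevations (m): most wells low, a few on the escarpment.
sample = [80, 85, 90, 95, 100, 100, 105, 110, 150, 177]
for key, value in moments_summary(sample).items():
    print(f"{key:>9}: {value:.2f}")
```

For this hypothetical sample the mean exceeds the median and the skewness is positive, the same pattern described for the water well elevations.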


2.2. Trends

The unique geomorphology of the Niagara Region, and of the study area in particular, imposes obvious trends on an elevation dataset. The Niagara Escarpment creates an area where large elevation changes occur over short distances. Aside from this, the area has a gradual increasing trend in the north-south direction: elevation values gradually increase moving south from Lake Ontario until the escarpment is reached, where they rise sharply (Figure 4).

Figure 4: Trend analysis of water well elevation measurements

The green line in Figure 4 represents the projected trend in the data on the YZ plane, and the blue line represents the projected trend on the XZ plane. The strong trend in the YZ plane is clearly visible, showing that elevation increases as northing values decrease (that is, moving south). The blue line also shows a slight trend in the XZ plane, with elevation rising slightly toward the centre of the study area. Trends such as these are sometimes removed before performing kriging or cokriging so that the analysis can be performed more accurately (ESRI, 2012).
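A trend analysis of this kind projects each point onto a vertical plane and fits a curve through the projection. The sketch below fits a second-order polynomial, z = a + b*y + c*y^2, to points projected onto the YZ plane by ordinary least squares. The second-order curve, the use of northing in kilometres from the study-area centre, and the sample values are assumptions for illustration; this is not the ArcGIS Trend Analysis tool itself.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_quadratic(ys, zs):
    """Least-squares fit of z = a + b*y + c*y**2; returns [a, b, c]."""
    s = [sum(y ** k for y in ys) for k in range(5)]                  # sums of y^k
    t = [sum(z * y ** k for y, z in zip(ys, zs)) for k in range(3)]  # sums of z*y^k
    normal = [[s[0], s[1], s[2]],
              [s[1], s[2], s[3]],
              [s[2], s[3], s[4]]]
    return solve(normal, t)


# Hypothetical projected points: northing offset (km from the study-area centre)
# and elevation (m), generated from an exact quadratic for illustration.
northing_km = [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5]
elevation = [100 - 3 * y + 0.8 * y ** 2 for y in northing_km]
a, b, c = fit_quadratic(northing_km, elevation)
print(f"z = {a:.2f} + ({b:.2f})*y + ({c:.2f})*y^2")
```

A negative b means elevation rises as northing decreases (moving south), and a clearly non-zero c captures curvature that a straight line would miss.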


2.3. Preprocessing

Preprocessing was performed on the water wells dataset to ensure accuracy in both the measurements and the interpolated surface. The water well locations have a positional accuracy of ±500 m, so elevation values were checked against Google Earth, allowing a small degree of leeway in position.

Water well measurements were also combed through to eliminate tight clusters of data points, which had two benefits. Firstly, kriging is a processing-intensive procedure, so reducing the dataset to fewer than 500 points helps both processing time and the accuracy of the results. Secondly, clustered data are redundant and can also affect the outcome of any interpolation.
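The declustering described above was done by hand. As an illustration only, a common automated alternative is cell-based thinning: overlay a regular grid and keep one point per cell. The sketch below assumes a hypothetical 200 m cell size and simply keeps the first point encountered in each cell; it is not the procedure used in this project.

```python
import math


def thin_by_grid(points, cell_size):
    """Keep at most one (x, y, z) point per square grid cell of side cell_size."""
    kept = {}
    for x, y, z in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        # setdefault keeps only the first point that falls in each cell.
        kept.setdefault(cell, (x, y, z))
    return list(kept.values())


# Hypothetical wells (easting m, northing m, elevation m).
wells = [
    (642000, 4780000, 98.5),
    (642030, 4780040, 98.9),  # same 200 m cell as the first well
    (642060, 4780010, 99.1),  # same cell again
    (643500, 4781200, 101.2),
    (641100, 4779300, 95.0),
]
thinned = thin_by_grid(wells, cell_size=200)
print(len(wells), "->", len(thinned))
```

Keeping the first point per cell is arbitrary; keeping the point closest to the cell centre, or averaging the cell, are equally simple variants.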


An example of these clusters and points that have been removed can be seen in Figure 5.

Figure 5: Example of declustering


3. Methodology

The following section details the methodologies applied when creating the interpolated surfaces, including all parameters used within the GIS analysis tools. Both interpolated surfaces were created using ESRI's Geostatistical Wizard.

3.1. Data Transformations and Trend Removal

The lack of normality within the water well dataset indicates that a transformation should be attempted in order to obtain a more accurate result, particularly when using the kriging model (Babish, 2006). The histogram tool in ESRI's Geostatistical Analyst extension can apply two transformations to the dataset and automatically displays the updated distribution, immediately showing the effect of the transformation. The original distribution, before any transformation, can be seen in Figure 6.

Figure 6: Distribution of water wells data before transformation

The Box-Cox transformation is useful for alleviating heteroscedasticity, the condition in which some sub-populations in the data have different variability from others (Babish, 2006; Wikipedia, 2014). In the water wells dataset, sub-populations in the lower part of the study area, closest to Lake Ontario, should have similar variability; however, sub-populations occurring on or around the Niagara Escarpment will have greater variability because of the rapid elevation changes there. Babish (2006) also explains that heteroscedasticity can be caused by non-normality of one of the variables or by an indirect relationship between variables; the water well dataset exhibits both.


The Box-Cox transformation may therefore help make the variance more constant throughout the study area, and it often makes the data appear more normally distributed (ESRI, 2012). With a Box-Cox parameter of -2, the dataset appeared more normally distributed, and thus more suitable for interpolation (Figure 7).

Figure 7: Water well data distribution after transformation

Figure 7 shows that the distribution appears far more normal than before the transformation. The skewness and kurtosis statistics discussed earlier also changed to reflect a more normal distribution: the skewness value of 0.088 is far closer to the ideal value of 0 than the original 2.31, and the kurtosis value of 3.87 is closer to the value of 3 expected for a normal distribution.
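The Box-Cox transformation with parameter λ maps each positive value x to (x^λ - 1)/λ, or to ln x when λ = 0. The sketch below applies λ = -2, the value used in the text, to hypothetical elevations and compares moment-based skewness before and after. Because the sample is hypothetical and the skewness uses the population-moment formula, the numbers will not match the Geostatistical Analyst histogram; the point is only how a negative λ pulls in a long upper tail.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform of a positive value x with parameter lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires positive values")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1) / lam


def skewness(values):
    """Population-moment skewness (0 for a symmetric distribution)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


elevations = [80, 85, 90, 95, 100, 100, 105, 110, 150, 177]  # hypothetical (m)
transformed = [box_cox(e, -2) for e in elevations]
print(f"skewness before: {skewness(elevations):.2f}")
print(f"skewness after:  {skewness(transformed):.2f}")
```

The transform is monotonic, so the ordering of the wells is preserved; only the spacing between high values is compressed.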

A global trend is an overriding process that affects all measurements in a deterministic manner, meaning that every data point within the study area is affected by it (ESRI, 2012). The trend examined in the earlier sections showed an increasing curve in the YZ plane, indicative of a global trend within the dataset. This trend can be represented by a mathematical formula, removed from the dataset prior to the kriging analysis, and then added back when the predictions are made (ESRI, 2012).


The most common way to model a trend is with a polynomial function whose degree depends on the trends in the dataset. For the water well dataset, the rapid increase in elevation at the escarpment means that a second-degree polynomial fits the trend better than a first-degree polynomial. A first-degree polynomial is simply a linear function (a tilted plane) that would account for the gradual elevation change from Lake Ontario to the Niagara Escarpment; because the dataset also includes elevation values on top of the escarpment, a linear polynomial no longer fits as well. Figure 8 shows a cross-section of St. Catharines in terms of elevation values.

Figure 8: Cross-section of study area

Values at the right of the cross-section increase dramatically because of the escarpment, so a linear (planar) trend model would not account for these data points. A second-degree polynomial, which appears as a parabola in cross-section, allows the trend model to curve; this curved surface fits the water well dataset far better.
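Removing a second-order global trend amounts to fitting z ≈ b0 + b1*x + b2*y + b3*x^2 + b4*x*y + b5*y^2 by least squares, modelling the residuals, and adding the fitted trend back at each prediction location. The pure-Python sketch below shows the fit-and-remove step. It assumes coordinates have been centred and scaled (here, kilometres from the study-area centre) so the normal equations stay well-conditioned, and it uses a hypothetical grid of wells generated from a known surface.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def quad_terms(x, y):
    """Terms of a second-order trend surface in x and y."""
    return [1.0, x, y, x * x, x * y, y * y]


def fit_trend(points):
    """Least-squares second-order trend through (x, y, z) points; returns 6 coefficients."""
    rows = [quad_terms(x, y) for x, y, _ in points]
    zs = [z for _, _, z in points]
    k = len(rows[0])
    # Normal equations (R^T R) beta = R^T z.
    rtr = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    rtz = [sum(r[i] * z for r, z in zip(rows, zs)) for i in range(k)]
    return solve(rtr, rtz)


def trend_value(beta, x, y):
    return sum(b * t for b, t in zip(beta, quad_terms(x, y)))


def detrend(points, beta):
    """Residuals z - trend; kriging would model these, then the trend is added back."""
    return [(x, y, z - trend_value(beta, x, y)) for x, y, z in points]


# Hypothetical 7 x 7 grid of wells, coordinates in km from the study-area centre.
true_beta = [100.0, 0.5, -4.0, 0.0, 0.1, 1.2]
wells = [(x, y, trend_value(true_beta, x, y)) for x in range(-3, 4) for y in range(-3, 4)]
beta = fit_trend(wells)
residuals = detrend(wells, beta)
print([round(b, 3) for b in beta])
print(max(abs(r) for _, _, r in residuals))
```

With real data the residuals would not vanish; they carry the local variation that the kriging step is meant to model.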

3.2. Inverse Distance Weighted

The Inverse Distance Weighted (IDW) interpolation model operates under the assumption that things that are close to one another are more alike than things that are farther apart (ESRI, 2012). Using this assumption, unmeasured locations are predicted with a weighting system: measured points closer to the prediction location are given greater weight than those farther away (ESRI, 2012).
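In formula terms, IDW predicts the value at a location as a weighted average, ẑ = Σ w_i z_i / Σ w_i, with w_i = 1 / d_i^p, where d_i is the distance to measured point i and p is the power parameter discussed below. The minimal sketch below uses every point (no search neighbourhood) and Euclidean distance in projected coordinates; the wells are hypothetical, and 1.22844 is the optimized power reported later in this section.

```python
import math


def idw_predict(points, x0, y0, power=2.0):
    """IDW prediction at (x0, y0): sum(w_i * z_i) / sum(w_i) with w_i = 1 / d_i**power."""
    numerator = 0.0
    denominator = 0.0
    for x, y, z in points:
        d = math.hypot(x - x0, y - y0)
        if d == 0.0:
            return z  # IDW is exact at measured locations
        w = 1.0 / d ** power
        numerator += w * z
        denominator += w
    return numerator / denominator


wells = [(0.0, 0.0, 95.0), (1000.0, 0.0, 105.0), (0.0, 1000.0, 110.0)]  # hypothetical
print(idw_predict(wells, 250.0, 250.0, power=1.22844))
print(idw_predict(wells, 250.0, 250.0, power=3.0))
```

Raising the power concentrates weight on the nearest well, so the second prediction sits closer to 95 m than the first.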



The IDW model generally works reasonably well for elevation, because nearby elevation values are typically more similar than distant ones. The IDW model was run within the Geostatistical Wizard to create the interpolated surface.

The original intention going into this project was to create an IDW surface that was as exact as possible, to better compare with the fluctuations expected in the kriged surface. However, after working through the wizard with the dataset, it was decided to make the IDW a smoother surface. This decision was made in the hope of increasing the overall accuracy of the surface and reducing error, and also to avoid the bull's-eye effect around data points whose z-values differ from the surrounding area.


The first step in creating the IDW model was choosing the major and minor semiaxes of the search neighbourhood. Because the Niagara Escarpment runs east to west across the study area, it was assumed that values in the east-west direction would be more similar to a predicted location than values in the north-south direction, particularly near the escarpment. For this reason, an elliptical search neighbourhood was chosen and aligned parallel to the escarpment, with a major semiaxis of 4,000 m, a minor semiaxis of 1,500 m, and an angle of 60° from north. Figure 9 compares the IDW model and its search neighbourhood against a digital elevation model, showing the elevation change to which the neighbourhood was aligned.

Figure 9: Comparison of IDW model to digital elevation model
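The elliptical neighbourhood (major semiaxis 4,000, minor semiaxis 1,500, 60° from north), combined with the eight-sector, 20-neighbour selection described next, can be sketched as below. Several details are assumptions rather than ArcGIS behaviour: sectors are equal 45° wedges measured from the major axis, the nearest points are kept per sector, and the minimum-neighbour rule (5) is not implemented. The points are hypothetical offsets in metres.

```python
import math


def ellipse_sector_neighbours(points, x0, y0, major=4000.0, minor=1500.0,
                              azimuth_deg=60.0, sectors=8, max_per_sector=20):
    """Select neighbours of (x0, y0) inside a rotated ellipse, nearest-first per sector.

    azimuth_deg is the major-axis direction measured clockwise from north.
    """
    theta = math.radians(azimuth_deg)
    buckets = [[] for _ in range(sectors)]
    for x, y, z in points:
        dx, dy = x - x0, y - y0
        # Rotate the offset: u runs along the major axis, v along the minor axis.
        u = dx * math.sin(theta) + dy * math.cos(theta)
        v = dx * math.cos(theta) - dy * math.sin(theta)
        if (u / major) ** 2 + (v / minor) ** 2 > 1.0:
            continue  # outside the search ellipse
        # Sector index from the angle measured from the major axis (assumed layout).
        angle = math.atan2(v, u) % (2 * math.pi)
        s = int(angle / (2 * math.pi / sectors)) % sectors
        buckets[s].append((math.hypot(dx, dy), (x, y, z)))
    neighbours = []
    for bucket in buckets:
        bucket.sort(key=lambda item: item[0])
        neighbours.extend(p for _, p in bucket[:max_per_sector])
    return neighbours


wells = [  # hypothetical (easting offset m, northing offset m, elevation m)
    (500.0, 0.0, 1.0),    # due east: 30 degrees off the major axis, inside
    (1000.0, 0.0, 2.0),
    (1500.0, 0.0, 3.0),
    (0.0, 2000.0, 4.0),   # due north, 2 km: outside the narrow ellipse
]
print(ellipse_sector_neighbours(wells, 0.0, 0.0, max_per_sector=2))
```

Capping each sector separately is what spreads the neighbours around the prediction location instead of drawing them all from one dense direction.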

Aligning the search neighbourhood this way helps with measurements near the escarpment, since elevation values to the north and south differ considerably, whereas values to the east and west should be more similar. Eight sectors were chosen so that the surface would be smoother: with eight sectors, the maximum (20) and minimum (5) neighbour values apply to each sector rather than to the neighbourhood as a whole (ESRI, 2012). This spreads the weight across more of the surrounding data points, rather than using at most 20 points overall and placing very high weights on the nearest ones.

With the search neighbourhood parameters finalized, the power parameter was optimized using the Geostatistical Wizard. The power parameter controls how quickly the weight of a data point decreases with distance; it is optimized by minimizing the root mean square prediction error, which for the water well dataset gave a value of 1.22844 (ESRI, 2012). With the parameters complete, the wizard can be finished and the interpolated surface produced. Completing the wizard also produces a method report detailing the parameters used in the interpolation; the method report for the water wells can be seen in Figure 10.

Figure 10: Method report for IDW interpolation
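The power optimization above minimizes the root mean square prediction error. One way to estimate that error is leave-one-out cross-validation: predict each well from all the others and compare. The sketch below substitutes a simple grid search over candidate powers for whatever optimizer the Geostatistical Wizard uses, ignores the search neighbourhood, and runs on hypothetical wells, so it will not reproduce the 1.22844 value.

```python
import math


def idw_predict(points, x0, y0, power):
    """IDW prediction with weights d**-power; exact at measured locations."""
    num = den = 0.0
    for x, y, z in points:
        d = math.hypot(x - x0, y - y0)
        if d == 0.0:
            return z
        w = d ** -power
        num += w * z
        den += w
    return num / den


def loo_rmse(points, power):
    """Leave-one-out cross-validation root mean square prediction error for IDW."""
    total = 0.0
    for i, (x, y, z) in enumerate(points):
        others = points[:i] + points[i + 1:]
        total += (idw_predict(others, x, y, power) - z) ** 2
    return math.sqrt(total / len(points))


def best_power(points, candidates):
    """Pick the candidate power with the lowest cross-validation RMSE (grid search)."""
    return min(candidates, key=lambda p: loo_rmse(points, p))


wells = [  # hypothetical (easting offset m, northing offset m, elevation m)
    (0, 0, 95.0), (400, 100, 97.5), (800, 0, 99.0), (200, 500, 101.0),
    (600, 600, 104.0), (1000, 500, 108.0), (300, 1000, 112.0), (900, 1000, 118.0),
]
candidates = [k / 4 for k in range(2, 17)]  # powers 0.5 to 4.0
p = best_power(wells, candidates)
print(p, round(loo_rmse(wells, p), 3))
```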


The final IDW output can be seen in Figure 11.

Figure 11: Final IDW map output


3.3. Kriging

The kriging interpolation technique is similar to the IDW interpolator, but with a far more geostatistically intensive approach. Like IDW, kriging assigns weights to the surrounding measured values to derive its predictions; unlike IDW, however, the kriging weights depend not only on distance to the prediction location but also on the spatial arrangement and autocorrelation of the measured points (ESRI, 2012). Kriging therefore quantifies the basic rule that things closer together are more similar than those far apart, and uses it as part of the weighting method.
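To make the contrast with IDW concrete, the sketch below solves the ordinary kriging system for one prediction location. It assumes an exponential semivariogram with a hypothetical sill and practical range and no nugget. The weights depend on the semivariances among the data points as well as to the target, which is how spatial arrangement enters. This is plain ordinary kriging, not EBK, which additionally simulates and combines many semivariograms over local subsets.

```python
import math


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def exponential_gamma(h, sill=150.0, practical_range=1500.0):
    """Exponential semivariogram with no nugget (parameters are hypothetical)."""
    return sill * (1.0 - math.exp(-3.0 * h / practical_range))


def ordinary_kriging(points, x0, y0, gamma=exponential_gamma):
    """Return (prediction, weights) at (x0, y0) from (x, y, z) points."""
    n = len(points)
    A = [[0.0] * (n + 1) for _ in range(n + 1)]
    b = [0.0] * (n + 1)
    for i, (xi, yi, _) in enumerate(points):
        for j, (xj, yj, _) in enumerate(points):
            A[i][j] = gamma(math.hypot(xi - xj, yi - yj))
        A[i][n] = 1.0  # Lagrange multiplier column
        A[n][i] = 1.0  # unbiasedness row: weights must sum to 1
        b[i] = gamma(math.hypot(xi - x0, yi - y0))
    b[n] = 1.0
    weights = solve(A, b)[:n]
    prediction = sum(w * z for w, (_, _, z) in zip(weights, points))
    return prediction, weights


# Hypothetical wells: two clustered points to the west, one isolated point to the east.
wells = [(0.0, 0.0, 95.0), (50.0, 0.0, 96.0), (1000.0, 0.0, 110.0)]
pred, w = ordinary_kriging(wells, 500.0, 0.0)
print(round(pred, 2), [round(v, 3) for v in w])
```

In this example the isolated eastern well receives a larger weight than either clustered well, because kriging treats the two clustered wells as largely redundant; IDW would weight all three almost equally by distance alone.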

The first step in the kriging process is choosing the kriging method; for this project, the Empirical Bayesian Kriging (EBK) method was chosen. The main reason is that, for large datasets, EBK divides the input data into overlapping subsets, for each of which multiple semivariograms are calculated and analyzed (ESRI, 2012). The prediction at each location is then generated using these semivariogram distributions, weighting subsets closer to the location more heavily than those farther away (ESRI, 2012). Given the extreme variation near the escarpment, subsetting the data and analyzing multiple semivariograms was expected to provide a more reliable and accurate approach than the other kriging methods.

EBK accounts for the error introduced by estimating the semivariogram, whereas other kriging methods assume that the estimated semivariogram is the true semivariogram for the entire interpolation region (ESRI, 2012). Because the semivariogram is estimated automatically, EBK also requires only minimal interactive modelling, which in turn can reduce the amount of human error introduced into the model.
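Every kriging method, EBK included, starts from an empirical semivariogram: point pairs are grouped into distance bins (lags), and for each bin γ(h) = (1 / 2N(h)) Σ (z_i - z_j)^2, where N(h) is the number of pairs in the bin. The sketch below computes that quantity. EBK repeats this estimation on each local subset and on data simulated from fitted models; that simulation loop is not shown, and the lag width, number of lags and sample points are hypothetical.

```python
import math
from collections import defaultdict


def empirical_semivariogram(points, lag_width, n_lags):
    """Bin point pairs by distance; return [(mean lag distance, semivariance, pair count)]."""
    sq_sums = defaultdict(float)
    dist_sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(len(points)):
        xi, yi, zi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, zj = points[j]
            h = math.hypot(xi - xj, yi - yj)
            k = int(h // lag_width)
            if k >= n_lags:
                continue  # pair is beyond the largest lag considered
            sq_sums[k] += (zi - zj) ** 2
            dist_sums[k] += h
            counts[k] += 1
    return [(dist_sums[k] / counts[k], sq_sums[k] / (2 * counts[k]), counts[k])
            for k in sorted(counts)]


# Hypothetical wells along a transect (m), elevation rising steadily.
transect = [(0.0, 0.0, 0.0), (100.0, 0.0, 1.0), (200.0, 0.0, 2.0), (300.0, 0.0, 3.0)]
for lag, gamma, pairs in empirical_semivariogram(transect, lag_width=150.0, n_lags=3):
    print(f"lag {lag:6.1f} m  gamma {gamma:5.2f}  pairs {pairs}")
```

Semivariance that keeps growing with lag, as here, is the signature of a trend, which is why trend removal is considered before kriging.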

With the kriging method chosen, the parameters must again be filled out in the Geostatistical Wizard. The parameters were chosen to highlight the variation in elevation within the study area: the aim was a strict prediction surface that picks up smaller elevation changes than the IDW.

The subset size, which defaults to 100 data points, was reduced to 50 data points to create more, smaller subsets. The reasoning was that smaller subsets would give even less weight to values that are farther away and less spatially autocorrelated. The overlap factor for the subsets was 1.2, meaning that each data point falls into 1.2 subsets on average; for example, about 20% of data points used in two subsets and the remaining 80% in only one. The number of simulations was left at 100, as changing this value had very little effect on the result. The output surface type was also left at its default, a prediction surface, rather than a surface showing probability or prediction standard error.

The search neighbourhood parameters were set up in an attempt to capture all of the variation in the study area. The radius of the search neighbourhood was increased slightly from 750 m to 800 m, but the minimum number of neighbours was reduced significantly, to only 3. The reasoning was that predictions in areas with very few surrounding data points should not have to reach a great distance to obtain their minimum number of neighbours. The maximum number of neighbours was increased to 20 to account for areas with many data points. Lastly, the neighbourhood was left as a standard circle with a single sector, again to capture the variation in the surface. Whereas the IDW aimed for a smooth surface and used eight sectors, the kriged surface is far stricter, so a single sector is preferred.
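The neighbourhood rules above (radius 800, at least 3 and at most 20 neighbours, one circular sector) can be sketched as follows. How the Geostatistical Wizard satisfies the minimum when too few points fall inside the radius is an assumption here: this sketch simply tops up with the nearest points outside the circle. Points are hypothetical offsets in metres.

```python
import math


def circular_neighbours(points, x0, y0, radius=800.0, min_n=3, max_n=20):
    """One-sector circular search around (x0, y0).

    Returns up to max_n nearest points within radius; if fewer than min_n fall
    inside, returns the min_n nearest points overall (an assumed top-up rule).
    """
    def dist(p):
        return math.hypot(p[0] - x0, p[1] - y0)

    ranked = sorted(points, key=dist)
    inside = [p for p in ranked if dist(p) <= radius]
    if len(inside) >= min_n:
        return inside[:max_n]
    return ranked[:min_n]


wells = [(100.0, 0.0, 1.0), (0.0, 500.0, 2.0), (900.0, 0.0, 3.0), (0.0, 2000.0, 4.0)]
print(circular_neighbours(wells, 0.0, 0.0))           # only 2 inside, topped up to 3
print(circular_neighbours(wells, 0.0, 0.0, min_n=2))  # the 2 points inside suffice
```

A low minimum keeps sparse areas from borrowing distant wells, at the cost of predictions there resting on very few points.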

With the kriging parameters completed, the Geostatistical Wizard can be finished and the method report produced. The method report for the EBK can be seen in Figure 12.

Figure 12: Method report for EBK interpolator

The final EBK output can be seen in Figure 13.

Figure 13: Empirical Bayesian Kriging output

4. Analysis

The following section details an analysis of the two interpolation techniques, the data coverage within the study area, and the accuracy of the results.

4.1. Interpolation Techniques

The IDW and EBK interpolation techniques each have advantages and disadvantages, some of which can be seen by comparing the two outputs created for the water well dataset. The outputs differ in several ways because of the different user-defined inputs, as well as the major statistical differences between the two interpolation methods.

The IDW technique is extremely sensitive to clusters and outliers, because it is based directly on a distance-weighting scheme in which each data point's influence decreases with its distance from the prediction location (ESRI, 2012). With clustered data, predictions near clusters may be very accurate, but predictions in areas with few data points probably will not be, unless there is very little variation in the study area. Outliers can also greatly affect the result: extreme values that are highly weighted degrade the accuracy of the prediction. Figure 14 shows examples, outlined in blue circles, of areas where abnormal elevation values created small, localized depressions.

Figure 14: Problems with IDW result

These areas may not be statistical outliers; they may simply differ considerably from the surrounding data points. In this case, many of these low spots are known to be wells dug along the 12 Mile Creek waterway, which is why their elevations are lower than those of the surrounding points. Aside from these isolated points, the IDW produced a smooth surface in which elevation gradually increases toward the south.
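The distance weighting that makes IDW sensitive to a single unusual well can be shown directly. The sketch below assumes the common form of IDW, in which each weight is proportional to 1/d^p, with power p = 2 (the ArcGIS default; the power actually used for this report's surface may differ). The well coordinates and elevations are invented: three upland wells near 180 m and one creek-bed well at 120 m.

```python
import math


def idw(points, values, target, power=2.0):
    """Inverse distance weighted estimate at `target`.

    Each weight is 1 / distance**power, normalised so the weights sum to 1.
    A sample that coincides with the target is returned exactly.
    """
    weights = []
    for p, v in zip(points, values):
        d = math.dist(p, target)
        if d == 0:
            return v  # exact interpolator: honour the measured value
        weights.append(1.0 / d**power)
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


# Hypothetical wells (metres): three upland wells and one nearby creek-bed well.
wells = [(0, 500), (500, 0), (-500, 0), (0, -100)]
elev = [180.0, 182.0, 178.0, 120.0]
print(round(idw(wells, elev, (0, 0)), 1))          # with the creek-bed well
print(round(idw(wells[:3], elev[:3], (0, 0)), 1))  # upland wells only
```

Because the creek-bed well is five times closer, its weight is 25 times larger, pulling the estimate at the origin down to about 126 m instead of 180 m; this is the kind of local depression visible in Figure 14.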

The EBK technique is affected by clusters and outliers, but not to the same extent as IDW. The use of subsets and multiple semivariograms can help reduce the weight assigned to points that are less spatially autocorrelated with the prediction location. Parts of the study area with large gaps will again be poorly represented by the model, but because EBK is a more advanced technique, the error may not be as severe as with IDW. An example of this can be seen in Figure 15.

Figure 15: Issues with EBK output

The area outlined in blue shows a linear depression in the EBK output that closely follows the true location of 12 Mile Creek. Using spatial autocorrelation, rather than distance alone, allows the EBK interpolator to make better-informed predictions at any location. In this example, the trouble area (outlined in purple) has very few data points, yet the EBK modelled it far better than the IDW did.
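Modelling spatial autocorrelation starts from the semivariogram, which EBK estimates repeatedly, once per subset and per simulation. The sketch below computes only the classical empirical semivariogram, γ(h) = 1/(2N(h)) Σ (z_i − z_j)², binned by separation distance; it does not fit a model, simulate, or krige, so it is one building block of EBK rather than EBK itself. The lag width, number of lags, and sample data are assumptions for illustration.

```python
import math
from collections import defaultdict


def empirical_semivariogram(points, values, lag=250.0, n_lags=4):
    """Classical (Matheron) semivariogram estimator.

    Returns {bin_index: gamma}, where bin k holds point pairs separated by
    k*lag <= distance < (k+1)*lag. Empty bins are omitted.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            k = int(math.dist(points[i], points[j]) // lag)
            if k < n_lags:
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    # Half the mean squared difference in each distance bin.
    return {k: sums[k] / (2 * counts[k]) for k in sorted(counts)}


# Hypothetical transect: nearby wells are more alike than distant ones.
print(empirical_semivariogram([(0, 0), (100, 0), (300, 0)], [10.0, 12.0, 20.0]))
```

A semivariance that rises with distance, as in this toy example, is what allows kriging to give less weight to points that are less spatially autocorrelated with the prediction location.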

It is difficult to say which technique produced the better interpolation, because the IDW was deliberately configured to produce a smoother surface and the EBK a stricter one. The expected differences become visible when the two surfaces are compared directly (Figure 16).

Figure 16: Differences between EBK and IDW

Figure 16 shows the differences in elevation between the two techniques. Areas in blue are locations where the EBK was considerably higher than the IDW, whereas areas in orange and red are locations where the EBK was considerably lower than the IDW. Large differences can be seen in the aforementioned ‘trouble area’; this could be due to a lack of data points, or simply because the IDW followed a smooth gradient and did not pick up the elevation decrease near 12 Mile Creek.
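A difference map like Figure 16 is a cell-by-cell subtraction of one raster from another. The NumPy sketch below assumes both surfaces have already been exported to arrays on the same grid (the values shown are invented) and uses a hypothetical ±5 m threshold to stand in for "considerably" higher or lower; the report does not state the class breaks behind its colours.

```python
import numpy as np


def classify_difference(ebk, idw, threshold=5.0):
    """Return (diff, classes) for EBK minus IDW on a shared grid.

    classes is +1 where EBK exceeds IDW by more than `threshold` (blue in
    the report's colour scheme), -1 where EBK is lower by more than
    `threshold` (orange/red), and 0 otherwise.
    """
    diff = np.asarray(ebk, dtype=float) - np.asarray(idw, dtype=float)
    classes = np.zeros(diff.shape, dtype=int)
    classes[diff > threshold] = 1
    classes[diff < -threshold] = -1
    return diff, classes


# Hypothetical 2 x 2 grids (metres); the top-right cell mimics a creek valley
# that the EBK picks up and the smooth IDW gradient misses.
ebk = np.array([[180.0, 150.0], [176.0, 190.0]])
idw = np.array([[179.0, 170.0], [175.0, 182.0]])
diff, classes = classify_difference(ebk, idw)
print(diff)
print(classes)
```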

4.2. Data Coverage

The data coverage within the study area greatly affects the output of an interpolated surface. Areas with few or no data points are extremely difficult to predict without data
points close enough to them to interpolate from. The water well elevation measurements show

two major gaps within the study area (Figure 17).

Figure 17: Study area data gaps

Both areas are large, yet contain very few data points. The lower yellow polygon was previously described as a trouble area: elevation values there are high, apart from a select few within the 12 Mile Creek basin, and the lack of data makes it extremely difficult for the interpolator to pick up on a feature like 12 Mile Creek. The upper yellow polygon is located in the urban area of St. Catharines, which could explain why so few wells have been drilled there. This area caused less trouble because it is not an area of high variability: it lies north of the escarpment, on the Lake Iroquois bench, where elevation changes are very gradual and follow the same trend as the closest data points, even those that are far away.
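Gaps like those in Figure 17 can be located systematically by measuring, for every cell of a prediction grid, the distance to the nearest data point and flagging cells beyond a cutoff. In the NumPy sketch below, the grid, the well coordinates, and the cutoff (here 800 map units, borrowed from the EBK search radius) are all assumptions for illustration; the brute-force distance matrix is fine for modest grid sizes.

```python
import numpy as np


def gap_mask(wells, xs, ys, max_distance=800.0):
    """Flag grid cells whose nearest well is farther than `max_distance`.

    wells: sequence of (x, y) coordinates; xs, ys: 1-D cell-centre coordinates.
    Returns a boolean array of shape (len(ys), len(xs)); True marks a gap.
    """
    wells = np.asarray(wells, dtype=float)
    gx, gy = np.meshgrid(xs, ys)  # cell-centre coordinates
    cells = np.column_stack([gx.ravel(), gy.ravel()])
    # Distance from every cell to every well, then the nearest well per cell.
    d = np.linalg.norm(cells[:, None, :] - wells[None, :, :], axis=2)
    nearest = d.min(axis=1)
    return (nearest > max_distance).reshape(gx.shape)


# Hypothetical single well at the origin and a one-row grid running east.
print(gap_mask([(0, 0)], [0, 500, 1000, 2000], [0]))
```

Contiguous blocks of flagged cells would correspond to data gaps such as the two yellow polygons; whether a gap actually matters still depends on how variable the terrain is inside it.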

Aside from these two trouble areas, data coverage across the study area is good and allows for fairly accurate interpolation.

4.3. Accuracy of Results

The accuracy of the results is difficult to quantify, so the accuracy of each interpolation technique is assessed mainly by visual comparison with satellite imagery.
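Although this report relies on visual assessment, a common numeric complement is leave-one-out cross-validation: each well is removed in turn, predicted from the remaining wells, and the prediction errors are summarised as a root-mean-square error (RMSE). The sketch below applies this to a plain IDW estimator (power 2, using every remaining point rather than the search neighbourhood described earlier). It is not an analysis that was performed for this report, and the sample data are invented.

```python
import math


def idw_estimate(points, values, target, power=2.0):
    """Plain IDW using every supplied point (no search neighbourhood)."""
    num = den = 0.0
    for p, v in zip(points, values):
        d = math.dist(p, target)
        if d == 0:
            return v
        w = 1.0 / d**power
        num += w * v
        den += w
    return num / den


def loo_rmse(points, values, power=2.0):
    """Leave-one-out cross-validation RMSE for IDW.

    points and values are lists; each point is predicted from all others.
    """
    errors = []
    for k in range(len(points)):
        rest_p = points[:k] + points[k + 1:]
        rest_v = values[:k] + values[k + 1:]
        pred = idw_estimate(rest_p, rest_v, points[k], power)
        errors.append(pred - values[k])
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical wells: three upland wells and one creek-bed well.
wells = [(0, 500), (500, 0), (-500, 0), (0, -100)]
elev = [180.0, 182.0, 178.0, 120.0]
print(round(loo_rmse(wells, elev), 2))
```

Wells that differ sharply from their neighbours, such as those in a creek bed, contribute the largest errors, so an RMSE like this would complement, not replace, the visual comparison against imagery.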

For the most part, the IDW gave an accurate result, but with less precision than the EBK. The IDW result shows a constant, smooth gradient, with elevation increasing toward the south. Small elevation changes, such as those near water bodies, are mostly not picked up by the IDW. Figure 18 shows the IDW result overlaid onto satellite imagery to assess its accuracy.

Figure 18: IDW overlaid onto satellite imagery

Neither 12 Mile Creek nor the Welland Canal shows any elevation change, indicating that the IDW interpolator smoothed over those areas. The green circular areas in the bottom-left of the figure may appear to be errors, but the elevation measurements are actually accurate because they lie within the creek bed. The only noticeable error in the study area is the location outlined by the yellow ellipse. This small depression is concluded to be the result of a measurement error, as elevation values in Google Earth contradict the value obtained from the water well.
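The measurement-error check described above, comparing a well's recorded elevation against an independent source such as Google Earth, can be expressed as a simple tolerance test. The sketch assumes reference elevations have already been read off for each well; the well IDs, the 10 m tolerance, and the sample values are hypothetical, not figures from this project.

```python
def flag_suspect_wells(well_ids, recorded, reference, tolerance=10.0):
    """Return IDs whose recorded elevation differs from the reference
    elevation by more than `tolerance` (same units, e.g. metres)."""
    return [
        wid
        for wid, rec, ref in zip(well_ids, recorded, reference)
        if abs(rec - ref) > tolerance
    ]


# Hypothetical wells: W3 is low but agrees with its reference (a creek-bed
# well), while W2 disagrees with its reference by 41 m.
ids = ["W1", "W2", "W3"]
print(flag_suspect_wells(ids, [182.0, 140.0, 121.0], [180.0, 181.0, 123.0]))  # ['W2']
```

Comparing against a reference rather than against neighbouring wells is what separates a genuine low spot, like the creek-bed wells, from a likely recording error.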

The EBK result was slightly more precise than the IDW, picking up smaller elevation changes and better representing the actual topography of the area, even where few data points lie near the predicted location. This can be seen in Figure 19, where the low-elevation colours follow 12 Mile Creek far more closely than in the IDW result.

Figure 19: EBK overlaid on satellite imagery

The area outlined in the yellow circle is the depression caused by the previously mentioned measurement error. The only other slight error is indicated by the yellow ellipse, which shows a small linear depression that is slightly offset from where 12 Mile Creek actually runs.

Overall, the accuracy of both interpolation techniques was acceptable, given what each technique was intended to achieve.

5. Future Recommendations

The major issue encountered in completing the project was the lack of adequate data coverage in certain areas. Major data gaps create areas where it is very difficult for software to interpolate a surface because of a lack of reference points. Even smaller gaps can pose large problems if they occur in areas of high variability. For example, if more data points were available along the escarpment and along 12 Mile Creek, both features could be better identified through interpolation.

In the future, additional datasets could be used to supplement the water well data with more elevation measurements. Additional data points in the areas mentioned above would provide a far more accurate result for the entire study area.

6. Conclusion

The Geostatistical Analysis of Student Collected Spatial Data project was successfully completed with the creation of two interpolated surfaces. Water well data, with elevation measurements, were obtained for the city of St. Catharines and used as the basis for interpolating each surface.

Two surface interpolation techniques were used: Inverse Distance Weighted (IDW) and Empirical Bayesian Kriging (EBK). The IDW technique was used to create a much smoother surface, which gave a good overall estimate of elevations in the area, whereas the EBK technique was used to create a much stricter surface.

Both interpolation techniques produced surfaces that met the original goals of the client and the project. Few errors occurred throughout the project; however, data coverage in the area could be improved, which would in turn improve the interpolated surfaces.

Bibliography

Babish, G. (2006). Geostatistics Without Tears: A practical guide to surface interpolation, geostatistics, variograms and kriging. Regina: Environment Canada.

ESRI. (2012). ArcGIS 10.1 Help. Redlands, CA: ESRI.

Smith, I. D. (2014, January 24). Geostatistical Analysis of Student Collected Spatial Data. Retrieved from Terms of Reference: https://niagara.blackboard.com/webapps/portal/frameset.jsp?tab_tab_group_id=_2_1&url=%2Fwebapps%2Fblackboard%2Fexecute%2Flauncher%3Ftype%3DCourse%26id%3D_113105_1%26url%3D

Wikipedia. (2014, March 18). Heteroscedasticity. Retrieved from Wikipedia: http://en.wikipedia.org/wiki/Heteroscedasticity

