Chapter 4 Spatial preprocessing
4. Preprocessing
4.1 Introduction
Once we acquire spatial data, we may think that we can start our study immediately. Unfortunately, this is not always the case.
Why?
Spatial datasets differ in data format, georeferencing system, map projection, data resolution, date of data acquisition, and the spatial unit used for reporting attribute data.
Preprocessing is a computational operation that helps us to use various spatial datasets together in our analysis. It includes conversion of data format, conversion of georeferencing system, and spatial interpolation.
If you are interested in a spatial phenomenon at a local scale, a Cartesian coordinate system is better suited than the longitude-latitude system.
If the data are obtained only at sample points, you may have to interpolate them.
If you want to know the population in a small region and the data are aggregated by prefectures, you have to estimate the population from the data.
The problems discussed in this chapter may seem trivial and not very interesting.
If you use only one database, you may never be troubled by such problems.
However, when you use several datasets simultaneously, overlaying them and performing calculations across them, you will face these intricate but important problems.
Spatial preprocessing
1. Conversion of data format
2. Conversion of georeferencing system
3. Map transformation (including conversion of map projection)
4. Spatial interpolation
5. Spatial smoothing
6. Raster-vector conversion
7. Areal interpolation
8. Spatial data fusion
4.2 Conversion of data format
1. ArcView (ESRI) shape
2. ArcInfo (ESRI) coverage
3. MapInfo (MapInfo) MIF
4. SIS (Informatix) SIS
5. AtlasGIS (?)
6. TIGER (USGS)
7. KIWI (DD)
8. DXF, DIL, PIX
9. JPEG…
You can convert a data format by writing a program in C or Pascal if the format is open (publicly documented).
However, it is usually better to use existing GIS software, because it can at least read various data formats and, in some cases, can also save data in different formats.
For example, ArcView can read spatial data in shape, coverage, MIF, ….
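As a minimal sketch of the "write your own converter" route, assume the source is a simple CSV point file with hypothetical columns `name`, `lon`, and `lat`. The Python code below converts it into GeoJSON, an open format whose coordinate order is longitude first, then latitude. Real formats such as shape or MIF carry far more structure (topology, attribute tables, projection metadata), which is why existing GIS software is usually the better choice.

```python
import csv
import io
import json

def csv_points_to_geojson(csv_text):
    """Convert CSV text with hypothetical columns name, lon, lat into a
    GeoJSON FeatureCollection string (coordinates are [lon, lat])."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                "coordinates": [float(row["lon"]), float(row["lat"])],
            },
            "properties": {"name": row["name"]},
        })
    return json.dumps({"type": "FeatureCollection", "features": features})

print(csv_points_to_geojson("name,lon,lat\nA,139.76,35.71\n"))
```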
4.3 Conversion of georeferencing system
1. Address system
2. Longitude-latitude system
3. Cartesian coordinate system
Today's GIS software provides conversion among georeferencing systems as a built-in function, so you can convert the georeferencing system with GIS instead of writing a computer program.
Figure: Conversion among the Cartesian coordinate system, the longitude-latitude system, and the address system (geocoding links the address system and the longitude-latitude system)
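As a rough illustration of what a conversion from longitude-latitude to a local Cartesian system involves, the sketch below computes coordinates in metres around a chosen origin. It assumes a spherical earth of radius 6,371 km and an equirectangular approximation, chosen here only for brevity; GIS software uses proper map projections and ellipsoids, and this approximation degrades as points move away from the origin. The function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean earth radius; spherical earth is an assumption

def lonlat_to_local_xy(lon, lat, lon0, lat0):
    """Approximate local Cartesian coordinates (metres) of (lon, lat)
    relative to the origin (lon0, lat0), all in degrees."""
    x = math.radians(lon - lon0) * EARTH_RADIUS_M * math.cos(math.radians(lat0))
    y = math.radians(lat - lat0) * EARTH_RADIUS_M
    return x, y

def local_xy_to_lonlat(x, y, lon0, lat0):
    """Inverse of lonlat_to_local_xy under the same approximation."""
    lon = lon0 + math.degrees(x / (EARTH_RADIUS_M * math.cos(math.radians(lat0))))
    lat = lat0 + math.degrees(y / EARTH_RADIUS_M)
    return lon, lat

# A point 0.01 degrees north of the origin lies about 1.1 km away.
print(lonlat_to_local_xy(0.0, 0.01, 0.0, 0.0))
```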
4.4 Map transformation
1. Map projection
2. Affine transformation
3. Rubber sheeting
4.4.1 Map projections
Map projection is the process of projecting the earth's surface onto a plane, and there are numerous methods of doing so. Since a projection can be pictured as casting the earth's surface onto a plane with a light source, map projections are classified by the location and direction of the light source and by the location and shape of the projection plane.
Figure: Location and direction of source light (orthographic, stereographic, gnomonic)
Figure: Location and shape of projection plane (azimuthal, conic, and cylindrical projections)
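For an azimuthal projection onto a plane tangent to a sphere of radius R, the three light-source positions reduce to simple formulas for the map distance r of a point whose angular distance from the tangent point is c: r = R sin c (orthographic, light at infinity), r = 2R tan(c/2) (stereographic, light at the opposite point of the sphere), and r = R tan c (gnomonic, light at the centre). The sketch below evaluates them; the function name is hypothetical.

```python
import math

def azimuthal_radius(c_deg, light, R=1.0):
    """Map distance from the tangent point for a point at angular distance
    c_deg (degrees) from it, on a plane tangent to a sphere of radius R."""
    c = math.radians(c_deg)
    if light == "orthographic":    # light source at infinity (parallel rays)
        return R * math.sin(c)
    if light == "stereographic":   # light source at the opposite point of the sphere
        return 2.0 * R * math.tan(c / 2.0)
    if light == "gnomonic":        # light source at the centre of the sphere
        return R * math.tan(c)
    raise ValueError("unknown light source: " + light)

for light in ("orthographic", "stereographic", "gnomonic"):
    print(light, azimuthal_radius(60, light))
```

At c = 90° the gnomonic distance is unbounded (tan 90° is infinite), so a gnomonic map cannot show a whole hemisphere, whereas the stereographic and orthographic distances stay finite (2R and R).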
Mercator’s projection
Source light: gnomonic projection
Projection plane: (modified) cylindrical projection
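The word "modified" can be made concrete with the standard formulas on a sphere of radius R. A purely gnomonic (central) cylindrical projection onto a cylinder tangent at the equator places latitude φ at y = R tan φ, whereas Mercator's projection uses y = R ln tan(π/4 + φ/2), which spaces the parallels so that the map is conformal (angles are preserved locally). In both, x = R(λ - λ0). The sketch below compares them; the function names are hypothetical.

```python
import math

def central_cylindrical_y(lat_deg, R=1.0):
    """Gnomonic projection (light at the earth's centre) onto a tangent cylinder."""
    return R * math.tan(math.radians(lat_deg))

def mercator_y(lat_deg, R=1.0):
    """Mercator's projection: parallels spaced so that the map is conformal."""
    phi = math.radians(lat_deg)
    return R * math.log(math.tan(math.pi / 4 + phi / 2))

def cylindrical_x(lon_deg, lon0_deg=0.0, R=1.0):
    """Both projections place meridians at x = R (lambda - lambda0)."""
    return R * math.radians(lon_deg - lon0_deg)

for lat in (0, 30, 60, 80):
    print(lat, round(central_cylindrical_y(lat), 3), round(mercator_y(lat), 3))
```

At 60° the Mercator parallel lies at about 1.32 R, compared with about 1.73 R for the central cylindrical projection, so the modification compresses high latitudes relative to the pure gnomonic case.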
Like data formats, map projections can be converted easily with today's GIS software.
Figure: Mercator's projection, Mollweide's projection, and Goode's projection
4.4.2 Affine transformation
Suppose we have a map whose projection or georeferencing system is unknown. To convert it into digital data, we have to fit it to another map whose projection and georeferencing system are known.
To do this, we scan both maps and, in GIS, specify spatial objects recorded on both of them. We then transform the former map to fit the latter.
If we want to preserve the shape of spatial objects on a map, we can use only the similarity transformation, which consists of 1) translation, 2) rotation, and 3) scaling.
In addition to these three transformations, the Affine transformation includes 4) reflection and 5) shearing.
In GIS, you can transform spatial data by an Affine transformation simply by giving its parameter values.
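An Affine transformation maps (x, y) to (a x + b y + c, d x + e y + f), so it has six parameters. When they are not known in advance, one common approach, assumed here and not necessarily what a particular GIS does, is to estimate them by least squares from corresponding point pairs, such as the spatial objects identified on both scanned maps. The sketch below uses NumPy; the function names are hypothetical.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares estimate of the six Affine parameters mapping src -> dst:
    x' = a x + b y + c,  y' = d x + e y + f.
    Needs at least three non-collinear point pairs."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    A = np.column_stack([src[:, 0], src[:, 1], np.ones(len(src))])
    # Solve for both output coordinates at once; coeffs has shape (3, 2)
    coeffs, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return coeffs.T  # rows: (a, b, c) and (d, e, f)

def apply_affine(params, pts):
    """Apply a 2x3 parameter matrix to an array of (x, y) points."""
    pts = np.asarray(pts, dtype=float)
    A = np.column_stack([pts[:, 0], pts[:, 1], np.ones(len(pts))])
    return A @ params.T

# Rotation by 90 degrees, scaling by 2, translation by (10, 5)
true_params = np.array([[0.0, -2.0, 10.0], [2.0, 0.0, 5.0]])
src = [(0, 0), (1, 0), (0, 1), (3, 2)]
est = fit_affine(src, apply_affine(true_params, src))
print(np.round(est, 6))
```

With more than three pairs the least-squares fit averages out small digitizing errors; with exactly three non-collinear pairs it reproduces them exactly.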
4.4.3 Rubber sheeting
In some cases an Affine transformation is enough to fit one map to another.
However, if a map is seriously distorted, a more flexible transformation is needed to obtain a good result.
The ‘rubber sheeting’ operation permits flexible transformation of spatial data. It is often used for treating sketch maps, historical maps and cognitive maps in GIS.
Figure: Two inconsistent maps
We perform the rubber sheeting operation simply by giving a set of corresponding point pairs.
The GIS then calculates a transformation function, not limited to the Affine transformation, from the coordinates of these points and applies it to one of the maps.
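The text does not specify which transformation function a GIS computes. As one illustration, the sketch below assumes a piecewise Affine approach: the corresponding points have already been triangulated (the triangle pairs are given as input), and each point is moved by the Affine map defined by the source triangle that contains it, expressed with barycentric coordinates. The function names and input layout are hypothetical, and triangles must be non-degenerate.

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates (u, v, w) of point p in triangle abc."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    u = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    v = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return u, v, 1.0 - u - v

def rubber_sheet(p, src_tris, dst_tris):
    """Move p with the Affine map of the triangle pair whose source
    triangle contains p; return None if p lies outside all triangles."""
    for src, dst in zip(src_tris, dst_tris):
        u, v, w = barycentric(p, *src)
        if min(u, v, w) >= -1e-12:          # p is inside or on this triangle
            (ax, ay), (bx, by), (cx, cy) = dst
            return (u * ax + v * bx + w * cx, u * ay + v * by + w * cy)
    return None

# A square split into two triangles; its corners are displaced on the target map.
src_tris = [((0, 0), (10, 0), (10, 10)), ((0, 0), (10, 10), (0, 10))]
dst_tris = [((1, 0), (10, 1), (11, 11)), ((1, 0), (11, 11), (0, 9))]
print(rubber_sheet((8, 2), src_tris, dst_tris))
```

Corresponding points are mapped exactly, and neighbouring triangles agree along their shared edge, so the transformed map has no gaps.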
Figure: Rubber sheeting
Figure: Result of rubber sheeting operation
Figure: Portable map of Edo, 1861
Figure: A map of Hongo area
Figure: Old map of Hongo area (Edo era)
Figure: Overlay of old and new maps
Figure: Sketch map of an 8-year-old child
Figure: One resident's personal map of New York City
4.5 Spatial interpolation
Spatial interpolation is a mathematical process of estimating the unknown value of a (usually scalar) function at a given point from a set of known values at sample points.
Figure: Sample points with known values (135, 127, 119, 112, 121, 108, 124) and a point whose value is unknown (?)
Why is spatial interpolation important?
Many spatial objects are represented as scalar functions defined over a plane: elevation, temperature distribution, land price, etc.
However, it is impossible to measure the function value at every location, so we have to estimate it from a set of known values.
Basic idea of spatial interpolation
Consider two points and their function values.
If they are located close together, their values are expected to be close.
If they are far apart, their values are more likely to be quite different.
Figure: Function values at nearby and distant points (221, 108, 124)
Therefore, when estimating the value at a given point, we put more weight on the values at nearby sample points than on those at distant points.
Another important point is the continuity of the function. Scalar functions used in GIS are generally continuous, so spatial interpolation should generate a continuous function from the sample values.
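Inverse distance weighting (IDW) is one common way to put more weight on nearby sample points; it is not introduced in this chapter and is used here only as an illustration of the two principles above. The estimate is a weighted average of the sample values with weights 1/d^p, where d is the distance to a sample point and p is a power parameter (assumed to be 2 below). The result is continuous and reproduces the known value at each sample point. The function name is hypothetical.

```python
import math

def idw(x, y, samples, power=2.0):
    """Inverse distance weighted estimate at (x, y); samples are (xi, yi, hi)."""
    num = den = 0.0
    for xi, yi, hi in samples:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            return hi               # at a sample point, return its known value
        w = 1.0 / d ** power        # nearer samples receive larger weights
        num += w * hi
        den += w
    return num / den

samples = [(0, 0, 100), (10, 0, 200)]
print(idw(2, 0, samples))  # closer to 100 than to 200
```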
4.5.1 Discrete interpolation: Nearest neighbor method
This method replaces the unknown function value at a point with the value at its nearest sample point.
To find the nearest sample point to a given point, we use the Voronoi diagram defined for the set of sample points.
The Voronoi diagram assigns every location to its nearest sample point, so it gives a spatial tessellation.
Figure: Voronoi diagram
The Voronoi diagram is also referred to as the Dirichlet tessellation. Similarly, Voronoi regions are often called Dirichlet polygons or Thiessen polygons.
You can easily construct a Voronoi diagram with GIS. Free programs are also available on the Internet.
http://www.voronoi.com/ImplFrames.htm
Figure: Voronoi diagram of sample points with known values (52, 93, 67, 51, 72, 25, 43, 24, 19, 37, 21, 13, 24) and a point whose value is unknown (?)
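The nearest neighbor method can be sketched without building the Voronoi diagram explicitly: finding the nearest sample point is the same as finding the Voronoi region that contains the query point. The brute-force search below (hypothetical function name) is fine for small sample sets; GIS software uses the Voronoi diagram or spatial indexes for speed.

```python
import math

def nearest_neighbor(x, y, samples):
    """Value of the sample point nearest to (x, y); samples are (xi, yi, hi).
    Equivalent to looking up the Voronoi region that contains (x, y)."""
    _, _, h = min(samples, key=lambda s: math.hypot(x - s[0], y - s[1]))
    return h

samples = [(0, 0, 10), (10, 0, 20), (0, 10, 30)]
# The estimate jumps from 10 to 20 at the Voronoi boundary x = 5.
print(nearest_neighbor(4.999, 0, samples), nearest_neighbor(5.001, 0, samples))
```

The jump at the region boundary is exactly the discontinuity that motivates the continuous methods that follow.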
4.5.2 One-dimensional continuous interpolation
Although discrete interpolation is theoretically natural and easy to perform, the result is unsatisfactory because it yields discontinuities in the scalar function.
It is obviously better to interpolate scalar functions by using continuous functions.
Spatial data are usually two-dimensional, so interpolating them requires two-dimensional interpolation methods.
However, two-dimensional interpolation methods are more complicated and harder to understand than one-dimensional methods, so we explain one-dimensional interpolation methods before turning to the two-dimensional case.
Terminology
Sample points
Sample points are the locations where the value of a scalar function is known.
For example, elevation is measured at sample points and interpolated to generate the earth's surface.
Figure: Sample points
Interpolation region
The interpolation region is the region where a scalar function is estimated from known values at sample points. This term is not widely used in GIS, but it is convenient for explaining spatial interpolation.
Usually, the interpolation region is bounded by the sample points. However, it can also be extended to areas without sample points; in that case we use the term 'extrapolation' instead of interpolation.
Figure: Interpolation region, showing interpolation and extrapolation
The interpolation region is divided into subregions.
In the one-dimensional case, their boundary points are called knots. In the two-dimensional case, they are simply called boundaries.
Figure: Sample points and knots
Knots can be chosen arbitrarily. However, sample points are usually used as knots.
Figure: Knots = sample points
Outline of the methods
Unlike discrete interpolation, continuous interpolation always gives a smooth function whose values are continuous across the knots between subregions.
In each subregion, a continuous function is fitted, estimated from the function values at the sample points.
Usually, a polynomial function called a spline function is used because of its tractability.
Spline function
m: the number of subregions (the number of knots is m-1, and the number of sample points is m+1)
$x_i$: the coordinate of the i-th sample point (i = 0, ..., m)
$h_i$: the measured value at the i-th sample point (i = 0, ..., m)
Figure: Sample points $x_0, x_1, \ldots, x_m$ on a line
n: the degree of the spline function
Spline function for the i-th subregion (i = 1, ..., m):
$$f_i(x) = a_{i0} + a_{i1}x + a_{i2}x^2 + \cdots + a_{in}x^n$$
We have m(n+1) unknown parameters ($a_{ij}$) to be estimated.
Figure: Spline functions $f_1(x), f_2(x), \ldots, f_m(x)$ on the subregions between $x_0, x_1, \ldots, x_m$
How do we estimate parameter values?
To obtain a continuous scalar function, we impose two conditions that the spline functions must satisfy.
Condition C1
At each sample point, the value of the spline function agrees with the known (measured) value:
$$f_i(x_{i-1}) = h_{i-1}, \qquad f_i(x_i) = h_i \qquad (i = 1, \ldots, m)$$
Condition C2
At each knot, the two adjacent spline functions have the same 1st, 2nd, ..., (n-1)-th derivative values:
$$\frac{d^l f_i}{dx^l}(x_i) = \frac{d^l f_{i+1}}{dx^l}(x_i) \qquad (i = 1, \ldots, m-1;\ l = 1, \ldots, n-1)$$
We have m(n+1) free parameters.
Condition C1 imposes one constraint at each end point and two constraints at each interior sample point (knot). In total, C1 gives 2 + 2(m-1) = 2m constraints.
Condition C2 imposes n-1 constraints at each knot. Consequently, it gives (m-1)(n-1) constraints.
Therefore, we have
$$m(n+1) - 2m - (m-1)(n-1) = (mn + m) - 2m - (mn - m - n + 1) = n - 1$$
free parameters (degree of freedom).
This indicates that if n = 1, the spline functions are uniquely determined. If n > 1, n - 1 additional conditions are necessary to determine them.
Linear spline
Linear spline uses linear functions (n = 1):
$$f_i(x) = a_{i0} + a_{i1}x$$
Degree of freedom: 0
The result is a set of line segments that simply connect the known function values at the sample points.
Figure: Linear spline
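Because the linear spline has no free parameters, each piece follows directly from condition C1: the two known values at the ends of a subregion fix $a_{i0}$ and $a_{i1}$. A minimal sketch (hypothetical function name), assuming the sample coordinates are strictly increasing:

```python
from bisect import bisect_right

def linear_spline(xs, hs, x):
    """Evaluate the linear spline through (xs[i], hs[i]) at x.
    xs must be strictly increasing and x must lie in [xs[0], xs[-1]]."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the interpolation region")
    # Index i of the subregion [xs[i-1], xs[i]] that contains x (1 <= i <= m)
    i = min(max(bisect_right(xs, x), 1), len(xs) - 1)
    x0, x1, h0, h1 = xs[i - 1], xs[i], hs[i - 1], hs[i]
    # f_i(x) = a_i0 + a_i1 x, with both coefficients fixed by condition C1
    a1 = (h1 - h0) / (x1 - x0)
    a0 = h0 - a1 * x0
    return a0 + a1 * x

print(linear_spline([0, 1, 3], [0, 2, 5], 2))
```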
Quadratic spline
Quadratic spline uses quadratic functions (n = 2):
$$f_i(x) = a_{i0} + a_{i1}x + a_{i2}x^2$$
Degree of freedom: 1
To determine the spline functions, we either
1. give the first derivative at one end of the interpolation region, or
2. estimate the spline functions that minimize their total length.
Figure: Quadratic spline
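The sketch below implements the first option for the quadratic spline, assuming the first derivative d0 at the left end is given. Each piece is written in the shifted form $h_{i-1} + d\,(x - x_{i-1}) + c\,(x - x_{i-1})^2$, which is algebraically equivalent to $a_{i0} + a_{i1}x + a_{i2}x^2$. Condition C1 at the right end of each subregion fixes c, and condition C2 (a continuous first derivative) passes the slope on to the next subregion. The function name is hypothetical.

```python
from bisect import bisect_right

def quadratic_spline(xs, hs, d0):
    """Quadratic spline through (xs[i], hs[i]), xs strictly increasing,
    with first derivative d0 at the left end xs[0]. Returns f(x)."""
    pieces = []          # (x_left, h_left, slope_left, c) per subregion
    d = d0
    for i in range(1, len(xs)):
        dx = xs[i] - xs[i - 1]
        # The shifted form already meets C1 at x_{i-1}; C1 at x_i fixes c.
        c = (hs[i] - hs[i - 1] - d * dx) / dx ** 2
        pieces.append((xs[i - 1], hs[i - 1], d, c))
        # C2: the slope at x_i becomes the starting slope of the next piece
        d = d + 2 * c * dx

    def f(x):
        k = min(max(bisect_right(xs, x) - 1, 0), len(pieces) - 1)
        xl, hl, s, c = pieces[k]
        t = x - xl
        return hl + s * t + c * t * t

    return f

# Samples of x^2 with derivative 0 at x = 0: the spline reproduces x^2.
f = quadratic_spline([0, 1, 2, 3], [0, 1, 4, 9], 0.0)
print(f(2.5))
```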
Cubic spline
Cubic spline uses cubic functions (n = 3):
$$f_i(x) = a_{i0} + a_{i1}x + a_{i2}x^2 + a_{i3}x^3$$
Degree of freedom: 2
To determine the spline functions, we usually give the first derivative at both ends of the interpolation region.
Figure: Cubic spline
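For the cubic spline with the first derivatives given at both ends (often called a clamped cubic spline), a convenient route, assumed here, is to solve for the first derivative $s_i$ at every sample point and write each piece in cubic Hermite form. The Hermite form already satisfies C1 and continuity of the first derivative; continuity of the second derivative at each knot (the rest of C2) gives one linear equation per knot, and the two end derivatives close the system. The sketch uses NumPy's dense solver for clarity (a tridiagonal solver would be faster); the function names are hypothetical.

```python
import numpy as np

def clamped_cubic_slopes(xs, hs, d_start, d_end):
    """First derivatives s_i at the sample points of the cubic spline
    through (xs[i], hs[i]) with end derivatives d_start and d_end."""
    xs = np.asarray(xs, dtype=float)
    hs = np.asarray(hs, dtype=float)
    npts = len(xs)
    A = np.zeros((npts, npts))
    b = np.zeros(npts)
    A[0, 0], b[0] = 1.0, d_start        # given derivative at the left end
    A[-1, -1], b[-1] = 1.0, d_end       # given derivative at the right end
    for i in range(1, npts - 1):
        dl = xs[i] - xs[i - 1]
        dr = xs[i + 1] - xs[i]
        # continuity of the 2nd derivative at knot x_i
        A[i, i - 1] = 1.0 / dl
        A[i, i] = 2.0 * (1.0 / dl + 1.0 / dr)
        A[i, i + 1] = 1.0 / dr
        b[i] = 3.0 * ((hs[i] - hs[i - 1]) / dl**2 + (hs[i + 1] - hs[i]) / dr**2)
    return np.linalg.solve(A, b)

def eval_cubic_spline(xs, hs, slopes, x):
    """Evaluate the spline at x using the cubic Hermite form of the piece
    on the subregion [xs[i-1], xs[i]] that contains x."""
    i = int(np.clip(np.searchsorted(xs, x, side="right"), 1, len(xs) - 1))
    dx = xs[i] - xs[i - 1]
    t = (x - xs[i - 1]) / dx
    h00 = 2 * t**3 - 3 * t**2 + 1       # Hermite basis functions
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return (h00 * hs[i - 1] + h10 * dx * slopes[i - 1]
            + h01 * hs[i] + h11 * dx * slopes[i])

# Samples of x^3 with the exact end derivatives: the spline reproduces x^3.
xs, hs = [0, 1, 2, 3], [0, 1, 8, 27]
s = clamped_cubic_slopes(xs, hs, 0.0, 27.0)
print(s, eval_cubic_spline(xs, hs, s, 1.5))
```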
Cubic spline is the most popular among polynomial spline functions, because
1. it is well balanced. Functions interpolated by linear or quadratic splines are not very smooth, whereas polynomial functions of higher degree are too sensitive to variations in the known values at sample points; that is, they often yield functions that fluctuate wildly between sample points.
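2. it minimizes the total curvature, measured as the integral of the squared second derivative, among all smooth (twice continuously differentiable) functions that pass through the sample points with the same end conditions. This has been proved mathematically.
3. it gives natural curves found in the real world (the same curve as a flexible curve ruler, the draftsman's spline from which the method takes its name).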
Extensions
Spline functions and their extensions are also used to represent smooth curves on a two-dimensional plane, for example in:
• Bézier curves
• PostScript
• TrueType
Figure: An open curved figure on a plane
Figure: A closed curved figure on a plane
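Unlike the interpolating splines above, a Bézier curve is defined by control points that the curve generally does not pass through (except the first and last). The sketch below evaluates a point on such a curve with de Casteljau's algorithm, i.e. repeated linear interpolation between consecutive control points. It is only an illustration; for reference, PostScript uses cubic Bézier segments and TrueType outlines use quadratic ones. The function name is hypothetical.

```python
def de_casteljau(control_points, t):
    """Point at parameter t (0 <= t <= 1) on the Bezier curve defined by
    a list of (x, y) control points."""
    pts = [tuple(p) for p in control_points]
    while len(pts) > 1:
        # Replace each pair of consecutive points by their interpolation at t
        pts = [((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
               for (x0, y0), (x1, y1) in zip(pts, pts[1:])]
    return pts[0]

cubic = [(0, 0), (0, 1), (1, 1), (1, 0)]   # four control points: a cubic curve
print([de_casteljau(cubic, t) for t in (0.0, 0.5, 1.0)])
```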
4.5.3 Two-dimensional continuous interpolation 1: Introduction
Two-dimensional interpolation is a natural extension of one-dimensional spline interpolation.
A crucial difference lies in how the interpolation region is divided into subregions.
In one-dimensional interpolation, sample points are distributed in a one-dimensional space, that is, on a line segment. This leads to a natural definition of the subregions and knots used for spatial interpolation: the sample points themselves are used as knots.
In two-dimensional interpolation, on the other hand, sample points are distributed on a two-dimensional plane, either in a regular pattern or irregularly.
This makes it difficult to divide the interpolation region naturally into the subregions in which spline functions are estimated. Therefore, in two-dimensional interpolation we also have to discuss how to divide the interpolation region into subregions.
Apart from the division of the interpolation region, two-dimensional interpolation is not substantially different from one-dimensional interpolation.
In the following, we first discuss two-dimensional interpolation with regularly located sample points, because it is similar to one-dimensional interpolation, although such a configuration rarely occurs in the real world. After that, we explain the case of irregular sample points.
4.5.4 Two-dimensional continuous interpolation 2: Sample points on a rectangular lattice
Figure: Sample values on a rectangular lattice of $m_x \times m_y$ cells
67 71 72 69 64 65 59
66 67 75 74 70 69 60
60 77 78 79 74 71 64
61 82 84 81 76 73 74
76 88 86 83 80 78 77
66 75 72 76 74 70 69
Figure: A different spline function $f_{ij}(x, y)$ in each of the $m_x \times m_y$ cells
In each cell, we consider a different spline function.
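n: the degree of the spline function
Spline function for each cell:
$$f_{ij}(x, y) = a_{00}^{ij} + a_{10}^{ij}x + a_{01}^{ij}y + a_{11}^{ij}xy + a_{20}^{ij}x^2 + a_{02}^{ij}y^2 + a_{21}^{ij}x^2y + a_{12}^{ij}xy^2 + a_{22}^{ij}x^2y^2 + \cdots = \sum_{k=0}^{n}\sum_{l=0}^{n} a_{kl}^{ij}x^k y^l$$
We have $m_x m_y (n+1)^2$ unknown parameters to be estimated.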
On the other hand, the spline functions must satisfy constraints at the corners and boundaries of the cells, analogous to conditions C1 and C2 in the one-dimensional case.
As a result, the degree of freedom is
$$(m_x + m_y + 2)(n - 1) + (n - 1)^2$$
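The parameter count and the degree of freedom can be checked numerically, and compared with the numbers of boundary derivatives given in this section for the bilinear (n = 1), biquadratic (n = 2), and bicubic (n = 3) splines. The sketch below uses the lattice of this section (7 × 6 sample points, so $m_x = 6$, $m_y = 5$); the function names are hypothetical.

```python
def tensor_spline_counts(mx, my, n):
    """(unknown parameters, degree of freedom) for degree-n splines on a
    lattice of mx-by-my cells, following the formulas in the text."""
    unknowns = mx * my * (n + 1) ** 2          # (n+1)^2 coefficients per cell
    dof = (mx + my + 2) * (n - 1) + (n - 1) ** 2
    return unknowns, dof

def boundary_derivative_count(mx, my, n):
    """Number of boundary derivatives given in the text for n = 1, 2, 3."""
    if n == 1:
        return 0
    if n == 2:   # d/dy on one edge, d/dx on another, one corner mixed derivative
        return (mx + 1) + (my + 1) + 1
    if n == 3:   # both pairs of opposite edges, mixed derivatives at four corners
        return 2 * (mx + 1) + 2 * (my + 1) + 4
    raise ValueError("only n = 1, 2, 3 are covered")

for n in (1, 2, 3):
    unknowns, dof = tensor_spline_counts(6, 5, n)
    print(n, unknowns, dof, boundary_derivative_count(6, 5, n))
```

In each case the degree of freedom equals the number of boundary derivatives listed, which is why those derivatives are exactly enough to fix the spline functions.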
Bilinear spline
Bilinear spline uses a two-dimensional linear function defined on a plane.
Spline function for the (i, j)-th subregion (i = 1, ..., $m_x$; j = 1, ..., $m_y$):
$$f_{ij}(x, y) = a_{00}^{ij} + a_{10}^{ij}x + a_{01}^{ij}y + a_{11}^{ij}xy$$
Note that, although a linear function on a one-dimensional space is a straight line, this function does not represent a flat surface but a curved one, because of the xy term.
Degree of freedom: 0 (no additional constraint is necessary)
Figure: A surface generated by the bilinear spline
Figure: A surface generated by the bilinear spline
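Because the bilinear spline has no free parameters, each cell's function is fixed by the four known values at its corners. The sketch below evaluates it on the lattice values listed in this section, assuming unit spacing, with column index as x and row index (in the order the rows are listed) as y; that layout and the function name are assumptions. The corner-value form used here is the same bilinear polynomial written in local coordinates u, v.

```python
import math

LATTICE = [  # sample values from the rectangular-lattice figure
    [67, 71, 72, 69, 64, 65, 59],
    [66, 67, 75, 74, 70, 69, 60],
    [60, 77, 78, 79, 74, 71, 64],
    [61, 82, 84, 81, 76, 73, 74],
    [76, 88, 86, 83, 80, 78, 77],
    [66, 75, 72, 76, 74, 70, 69],
]

def bilinear_spline(grid, x, y):
    """Bilinear spline on a unit-spaced lattice; grid[r][c] is the value
    at lattice point (x = c, y = r)."""
    rows, cols = len(grid), len(grid[0])
    if not (0 <= x <= cols - 1 and 0 <= y <= rows - 1):
        raise ValueError("(x, y) lies outside the interpolation region")
    # Cell whose corner (c0, r0) has the smallest indices; the far edge uses the last cell
    c0 = min(int(math.floor(x)), cols - 2)
    r0 = min(int(math.floor(y)), rows - 2)
    u, v = x - c0, y - r0          # local coordinates within the cell, in [0, 1]
    f00, f10 = grid[r0][c0], grid[r0][c0 + 1]
    f01, f11 = grid[r0 + 1][c0], grid[r0 + 1][c0 + 1]
    # Linear in u, linear in v, with a u*v term: a bilinear function
    return (f00 * (1 - u) * (1 - v) + f10 * u * (1 - v)
            + f01 * (1 - u) * v + f11 * u * v)

print(bilinear_spline(LATTICE, 0.5, 0.5))  # centre of the first cell
```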
Biquadratic spline
Biquadratic spline uses a two-dimensional quadratic function:
$$f_{ij}(x, y) = a_{00}^{ij} + a_{10}^{ij}x + a_{01}^{ij}y + a_{11}^{ij}xy + a_{20}^{ij}x^2 + a_{02}^{ij}y^2 + a_{21}^{ij}x^2y + a_{12}^{ij}xy^2 + a_{22}^{ij}x^2y^2$$
Degree of freedom: $m_x + m_y + 3$
To determine the spline functions, we have to impose $m_x + m_y + 3$ additional constraints. We usually give the 1st derivatives on the outer boundary of the interpolation region:
1. The 1st derivatives with respect to y at the lower end points: $m_x + 1$
2. The 1st derivatives with respect to x at the right end points: $m_y + 1$
3. The mixed derivative with respect to x and y at the lower right corner point: 1
Figure: Lattice of $m_x \times m_y$ cells
Figure: A surface generated by the biquadratic spline
Figure: A surface generated by the biquadratic spline
Bicubic spline
Bicubic spline uses a two-dimensional cubic function:
$$f_{ij}(x, y) = a_{00}^{ij} + a_{10}^{ij}x + a_{01}^{ij}y + a_{11}^{ij}xy + a_{20}^{ij}x^2 + a_{02}^{ij}y^2 + a_{21}^{ij}x^2y + a_{12}^{ij}xy^2 + a_{22}^{ij}x^2y^2 + a_{30}^{ij}x^3 + a_{03}^{ij}y^3 + a_{31}^{ij}x^3y + a_{13}^{ij}xy^3 + a_{32}^{ij}x^3y^2 + a_{23}^{ij}x^2y^3 + a_{33}^{ij}x^3y^3$$
Degree of freedom: $2(m_x + m_y + 4)$
To determine the spline functions, we have to impose $2(m_x + m_y + 4)$ additional constraints. As with the biquadratic spline, we give the 1st derivatives on the outer boundary of the interpolation region:
1. The 1st derivatives with respect to y at the upper and lower end points: $2(m_x + 1)$
2. The 1st derivatives with respect to x at the right and left end points: $2(m_y + 1)$
3. The mixed derivatives with respect to x and y at the corner points: 4
Figure: Lattice of $m_x \times m_y$ cells
Figure: A surface generated by the bicubic spline
Figure: A surface generated by the bicubic spline
4.5.5 Two-dimensional continuous interpolation 3: Irregular sample points
If the sample points are regularly distributed, we can naturally define rectangular subregions for interpolating continuous function.
However, measuring the function value at regularly distributed points is practically impossible: some of those points may fall in a river, on a cliff, or inside a building.
Figure: Distribution of sample points in reality
Division of the interpolation region
We use a Triangular Irregular Network (TIN) to divide the interpolation region into subregions.
A different spline function is defined in each triangular subregion. The functions are estimated so that they satisfy constraints similar to those imposed for regularly distributed sample points.
Cubic spline is generally used as the spline function.
Triangular Irregular Network (TIN)
There are numerous triangular networks that divide the interpolation region into subregions.
Figure: A point distribution
Figure: Two different triangular networks
Choice of triangular networks
Among the numerous possible TINs, the Delaunay triangulation is usually chosen for spatial interpolation, because it avoids elongated, acute-angled triangles as far as possible (it maximizes the smallest angle among all triangulations of the points). This keeps the spline functions smooth rather than fluctuating.
We can construct the Delaunay triangulation by connecting points whose Voronoi regions are adjacent. Recent GIS software provides a function for constructing the Delaunay triangulation from a point distribution.
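The defining property of the Delaunay triangulation is that no sample point lies strictly inside the circumcircle of any triangle. The brute-force sketch below applies that test directly to every triple of points; it runs in O(n^4) time and is meant only to show the criterion, not to replace the efficient algorithms in GIS software. It assumes no four points are cocircular in a way that creates ties; the function names are hypothetical.

```python
from itertools import combinations

def orient(a, b, c):
    """Twice the signed area of triangle abc (> 0 if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def in_circumcircle(a, b, c, d):
    """True if d lies strictly inside the circumcircle of triangle abc."""
    if orient(a, b, c) < 0:          # the determinant test needs abc counter-clockwise
        b, c = c, b
    m = [(p[0] - d[0], p[1] - d[1]) for p in (a, b, c)]
    m = [(dx, dy, dx * dx + dy * dy) for dx, dy in m]
    det = (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
           - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
           + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    return det > 0

def delaunay_brute_force(points):
    """Index triples (i, j, k) of the Delaunay triangles of the points."""
    tris = []
    for i, j, k in combinations(range(len(points)), 3):
        a, b, c = points[i], points[j], points[k]
        if orient(a, b, c) == 0:     # skip collinear triples
            continue
        if not any(in_circumcircle(a, b, c, points[m])
                   for m in range(len(points)) if m not in (i, j, k)):
            tris.append((i, j, k))
    return tris

# A flat rhombus: the Delaunay triangulation uses the short diagonal (1-3),
# avoiding two obtuse, elongated triangles along the long diagonal (0-2).
print(delaunay_brute_force([(0, 0), (2, -1), (4, 0), (2, 1)]))
```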
Homework Q.4.1 (10 pts)
Find an application of spatial interpolation, and report its objective, data acquisition method, interpolation method, and so forth.
Homework Q.4.2 (20 pts)
Suppose you are creating spatial data of the surface elevation of the earth. You then have to 1) determine the sample points, that is, the locations whose elevation you measure, and 2) interpolate the measured elevation data to obtain the surface data.
Discuss how you should determine the sample points.