Source: Chapter 4 Spatial preprocessing, 東京大学, ua.t.u-tokyo.ac.jp/okabelab/sada/docs/pdf_class/Ch04_1.pdf

Chapter 4 Spatial preprocessing

4. Preprocessing

4.1 Introduction

Once we acquire spatial data, we may think that we can start our study immediately. Unfortunately, this is not always the case.

Why?

Spatial datasets differ in data format, georeferencing system, map projection, data resolution, date of data acquisition, and the spatial data unit used for reporting attribute data.

Preprocessing is a computational operation that helps us to use various spatial datasets together in our analysis. It includes conversion of data format, conversion of georeferencing system, and spatial interpolation.

If you are interested in a spatial phenomenon at a local scale, a Cartesian coordinate system is usually more convenient than the longitude-latitude system.

If the data are obtained only at sample points, you may have to interpolate them.

If you want to know the population in a small region and the data are aggregated by prefectures, you have to estimate the population from the data.

The problems discussed in this chapter may seem trivial and may not be so interesting.

If you use only one database, you may not be troubled with such problems.

However, when you use several datasets simultaneously, overlaying them and performing calculations among datasets, you will face those intricate but important problems.

Spatial preprocessing

1. Conversion of data format
2. Conversion of georeferencing system
3. Map transformation (including conversion of map projection)
4. Spatial interpolation
5. Spatial smoothing
6. Raster-vector conversion
7. Areal interpolation
8. Spatial data fusion

4.2 Conversion of data format

1. ArcView (ESRI) shape
2. ArcInfo (ESRI) coverage
3. MapInfo (MapInfo) MIF
4. SIS (Informatix) SIS
5. AtlasGIS (?)
6. TIGER (U.S. Census Bureau)
7. KIWI (DD)
8. DXF, DIL, PIX
9. JPEG…

You can convert the data format by writing a program in C or Pascal if the data format is open.

However, it is usually better to use existing GIS software, because it can at least read various data formats; in some cases it can even save data in different formats.

For example, ArcView can read spatial data in shape, coverage, MIF, ….
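
If an open format does have to be converted by hand, the program is often short. The following is a minimal sketch, assuming point records stored as CSV text and GeoJSON as the target format; the function name, the column names `lon` and `lat`, and the sample coordinates are illustrative assumptions, not part of any GIS package.

```python
import csv
import io
import json

def csv_points_to_geojson(csv_text, x_field="lon", y_field="lat"):
    """Convert CSV rows with coordinate columns into a GeoJSON FeatureCollection."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        x = float(row.pop(x_field))   # remove the coordinate columns ...
        y = float(row.pop(y_field))
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [x, y]},
            "properties": row,        # ... and keep the rest as attributes
        })
    return {"type": "FeatureCollection", "features": features}

# Illustrative records only (approximate coordinates).
sample_csv = "name,lon,lat\nHongo,139.76,35.71\nKomaba,139.68,35.66\n"
geojson = csv_points_to_geojson(sample_csv)
print(json.dumps(geojson, indent=2))
```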

4.3 Conversion of georeferencing system

1. Address system
2. Longitude-latitude system
3. Cartesian coordinate system

Conversion between the Cartesian coordinate system and the longitude-latitude system:

Today's GIS software has a conversion program as an internal function. You can convert the georeferencing system by using GIS; you do not have to write a computer program.

Conversion between the longitude-latitude system and the address system: geocoding
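
To give a feel for what such a conversion does internally, here is a rough sketch of converting longitude-latitude to local Cartesian coordinates. It assumes a spherical earth and a small study area (an equirectangular approximation around a reference point); real GIS software uses more accurate ellipsoidal formulas, and the function name and radius constant are assumptions made for this example.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean earth radius; spherical approximation (assumption)

def lonlat_to_local_xy(lon_deg, lat_deg, lon0_deg, lat0_deg):
    """Approximate metres east (x) and north (y) of the reference point (lon0, lat0)."""
    lat0 = math.radians(lat0_deg)
    x = EARTH_RADIUS_M * math.radians(lon_deg - lon0_deg) * math.cos(lat0)
    y = EARTH_RADIUS_M * math.radians(lat_deg - lat0_deg)
    return x, y

# A point 0.01 degrees north-east of the reference point.
print(lonlat_to_local_xy(139.77, 35.72, 139.76, 35.71))
```

The approximation is only reasonable at a local scale, which is the situation in which Cartesian coordinates are preferred in the first place.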

4.4 Map transformation

1. Map projection
2. Affine transformation
3. Rubber sheeting

4.4.1 Map projections

• References - map projection

1. Bugayevskiy, L. M. and Snyder, J. P. (1995): Map Projections: A Reference Manual, Taylor & Francis.

2. Snyder, J. P. (1997): Flattening the Earth: Two Thousand Years of Map Projections, University of Chicago Press.

3. Tobler, W. R., Yang, O. H., and Snyder, J. P. (2000): Map Projection Transformation: Principles and Applications, Taylor & Francis.

There are numerous methods of map projection. Map projection is a process of projecting the earth’s surface on a plane. Therefore, map projections are classified by the location and direction of source light and the location and shape of projection plane.

Figure: Location and direction of source light (orthographic, stereographic, gnomonic)

Figure: Location and shape of projection plane (azimuthal projection, conic projection, cylindrical projection)

Mercator's projection

Source light: gnomonic projection
Projection plane: (modified) cylindrical projection
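
For reference, the forward formulas of the spherical Mercator projection are x = Rλ and y = R ln tan(π/4 + φ/2), where λ is longitude and φ is latitude in radians. The sketch below simply evaluates them; the function name and the default radius of 1 are assumptions for illustration.

```python
import math

def mercator(lon_deg, lat_deg, radius=1.0):
    """Spherical Mercator: x = R*lon, y = R*ln(tan(pi/4 + lat/2)), angles in radians."""
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    return radius * lam, radius * math.log(math.tan(math.pi / 4 + phi / 2))

# y grows faster and faster toward the poles.
for lat in (0, 30, 60, 80):
    print(lat, round(mercator(0.0, lat)[1], 4))
```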

Like the data format, the map projection can be converted easily by today's GIS software.

Figure: Mollweide's projection, Mercator's projection, and Goode's projection

4.4.2 Affine transformation

Suppose we have a map whose projection or georeferencing system is unknown. If we want to convert it into digital data, we have to fit it to another map whose projection and georeferencing system are known.

To do this, we scan the maps and identify, in GIS, spatial objects recorded on both maps. After that, we transform the former map to fit the latter one.

If we want to keep the shape of spatial objects on a map, we can use only the similarity transformation, which consists of 1) translation, 2) rotation, and 3) uniform scaling.

In addition to the above three transformations, the affine transformation includes 4) reflection and 5) shearing.

You can apply an affine transformation to spatial data in GIS just by giving the parameter values.
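
A two-dimensional affine transformation can be written with six parameters as x' = ax + by + c, y' = dx + ey + f. The following is a minimal sketch under that parameterization; the function names and the sample square are assumptions for illustration, not the interface of any particular GIS.

```python
import math

def affine(point, a, b, c, d, e, f):
    """Apply x' = a*x + b*y + c, y' = d*x + e*y + f."""
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)

def similarity_params(scale, angle_deg, tx, ty):
    """Parameters for uniform scaling, then rotation, then translation (shape-preserving)."""
    t = math.radians(angle_deg)
    return (scale * math.cos(t), -scale * math.sin(t), tx,
            scale * math.sin(t), scale * math.cos(t), ty)

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
# Similarity: double the size, rotate by 90 degrees, shift 10 units along x.
rotated = [affine(p, *similarity_params(2.0, 90.0, 10.0, 0.0)) for p in square]
# Shearing along x: the square becomes a parallelogram, so its shape is not kept.
sheared = [affine(p, 1.0, 0.5, 0.0, 0.0, 1.0, 0.0) for p in square]
print(rotated)
print(sheared)
```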

4.4.3 Rubber sheeting

In some cases it is enough to use an affine transformation to fit one map to another.

However, if a map is seriously distorted, it is necessary to apply a more flexible transformation to obtain a good result.

The ‘rubber sheeting’ operation permits flexible transformation of spatial data. It is often used for treating sketch maps, historical maps and cognitive maps in GIS.

Figure: Two inconsistent maps

We perform the rubber sheeting operation just by giving a set of corresponding point pairs.

The GIS then calculates a transformation function, not limited to the affine transformation, from the coordinates of the points and applies it to one of the maps.
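
The transformation function can take many forms. As one simple illustration, and not necessarily what any given GIS implements, the sketch below moves each point by the inverse-distance-weighted average of the displacements of the control points. The function name and the `power` parameter are assumptions for this example.

```python
def rubber_sheet(point, pairs, power=2.0):
    """Move `point` by the inverse-distance-weighted mean displacement of control points.

    pairs: list of ((x_src, y_src), (x_dst, y_dst)) corresponding point pairs.
    """
    x, y = point
    wsum = dx = dy = 0.0
    for (xs, ys), (xd, yd) in pairs:
        d2 = (x - xs) ** 2 + (y - ys) ** 2
        if d2 == 0.0:
            return (xd, yd)  # a control point maps exactly onto its partner
        w = d2 ** (-power / 2.0)  # weight = 1 / distance**power
        wsum += w
        dx += w * (xd - xs)
        dy += w * (yd - ys)
    return (x + dx / wsum, y + dy / wsum)

# Hypothetical control points: (position on distorted map, position on reference map).
pairs = [((0, 0), (0, 0)), ((10, 0), (12, 1)), ((0, 10), (-1, 10))]
print(rubber_sheet((5, 0), pairs))
```

Points near a control point follow that control point closely, while points far from all control points receive a blend of the displacements.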

Figure: Rubber sheeting

Figure: Result of rubber sheeting operation

Figure: Portable map of Edo, 1861

Figure: A map of Hongo area

Figure: Old map of Hongo area (Edo era)

Figure: Overlay of old and new maps

Figure: Sketch map of an 8-year-old child

Figure: One resident's personal map of New York City

4.5 Spatial interpolation

Spatial interpolation is a mathematical process of estimating the unknown value of a (usually scalar) function at a given point from a set of known values at sample points.

Figure: Sample points with known values (135, 127, 119, 112, 121, 108, 124) and a point (?) whose value is to be estimated

Why is spatial interpolation important?

Many spatial phenomena are represented as scalar functions defined over a plane: elevation, temperature distribution, land price, etc.

However, it is impossible to measure the function value at all the locations, so we have to estimate it from a set of known values.

Basic idea of spatial interpolation

Consider two points and their function values.

If they are closely located, their values are expected to be close.

If they are distant, it is probable that their values are quite different.

Figure: Function values at nearby and distant points (108, 124, 221)

Therefore, in estimation of a value at a certain point, we put more weight on function values of near sample points than those of distant points.

Another important point is the continuity of function. Scalar functions used in GIS are generally continuous, so spatial interpolation should generate continuous functions estimated from sample values.

4.5.1 Discrete interpolation: Nearest neighbor method

This method substitutes the function value of a point by that of its nearest sample point.

To find the nearest sample point from a certain point, we use the Voronoi diagram defined for a set of points.

The Voronoi diagram shows the assignment of points to their nearest sample points, so it gives a spatial tessellation.
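
For a small number of sample points, the nearest sample point can also be found by directly comparing distances, which gives the same result as looking up the Voronoi region. A minimal sketch follows; the function name and the sample values are illustrative assumptions.

```python
import math

def nearest_neighbor(point, samples):
    """Return the value of the sample closest to `point`.

    samples: list of ((x, y), value). Ties go to the first closest sample in the list.
    """
    _, value = min(samples, key=lambda s: math.dist(point, s[0]))
    return value

samples = [((0, 0), 52), ((4, 0), 93), ((0, 3), 51)]
print(nearest_neighbor((1, 1), samples))
print(nearest_neighbor((3, 1), samples))
```

The estimated value jumps abruptly when the query point crosses a Voronoi boundary, which is why this method is called discrete.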

Figure: Voronoi diagram

The Voronoi diagram is also referred to as the Dirichlet tessellation. Similarly, Voronoi regions are often called Dirichlet polygons or Thiessen polygons.

You can easily construct a Voronoi diagram with GIS. Free programs are also available on the Internet.

http://www.voronoi.com/ImplFrames.htm

Figure: Voronoi diagram of sample points with known values and a point (?) whose value is to be estimated

4.5.2 One-dimensional continuous interpolation

Though discrete interpolation is theoretically natural and easy to perform, the result is not satisfactory because it yields discontinuity in scalar functions.

It is obviously better to interpolate scalar functions by using continuous functions.

Spatial data are usually two-dimensional, so interpolation of spatial data requires two-dimensional interpolation methods.

However, two-dimensional interpolation methods are more complicated and more difficult to understand than one-dimensional methods. For this reason, one-dimensional interpolation methods are explained before we go on to the two-dimensional case.

• References - spline interpolation

1. Späth, H. (1995): One Dimensional Spline Interpolation Algorithms, A. K. Peters.

2. Späth, H. (1995): Two Dimensional Spline Interpolation Algorithms, A. K. Peters.

Terminology

Sample points

Sample points are the locations where the value of a scalar function is known.

For example, elevation is measured at sample points and interpolated to generate the earth's surface.

Figure: Sample points

Interpolation region

The interpolation region is the region where a scalar function is estimated from known values at sample points. This term is not widely used in GIS, but it is convenient for explaining spatial interpolation.

Usually, the interpolation region is bounded by sample points. However, it can also be extended to an area with no sample points. In this case we use the term ‘extrapolation’ instead of interpolation.

Figure: Interpolation region (interpolation and extrapolation)

The interpolation region is divided into subregions.

In the one-dimensional case, their boundary points are called knots. In the two-dimensional case, they are simply called boundaries.

Figure: Sample points and knots

Knots can be determined arbitrarily. Usually, however, sample points are used as knots.

Figure: Knots = sample points

Outline of the methods

Unlike discrete interpolation, continuous interpolation always gives a continuous function: the values of the functions fitted in adjacent subregions agree at the knots.

In each subregion, we fit a continuous function that is estimated from the function values at sample points.

Usually, a polynomial function called a spline function is used because of its tractability.

Spline function

$m$: the number of subregions (the number of knots is $m-1$; the number of sample points is $m+1$)

$x_i$: the coordinate of the ith sample point (i=0, ..., m)

$h_i$: the measured value at the ith sample point (i=0, ..., m)

Figure: Sample points $x_0, x_1, x_2, x_3, x_4, \ldots, x_{m-1}, x_m$ on a line

n: the degree of spline function

Spline function for the ith (i=1, ..., m) subregion:

$$f_i(x) = a_{i0} + a_{i1} x + a_{i2} x^2 + \cdots + a_{in} x^n$$

We have m(n+1) unknown parameters ($a_{ij}$) to be estimated.

Figure: Spline functions $f_1(x), f_2(x), f_3(x), f_4(x), \ldots, f_m(x)$ on successive subregions

How do we estimate parameter values?

We impose two conditions on the spline functions in order to obtain a continuous scalar function.

Condition C1

At each sample point the value of spline function agrees with the known (measured) value.

$$f_i(x_{i-1}) = h_{i-1}, \qquad f_i(x_i) = h_i \qquad (i = 1, \ldots, m)$$

Condition C2

At each knot, the two adjacent spline functions have the same 1st, 2nd, ..., (n-1)th derivative values.

$$\frac{d^l}{dx^l} f_i(x_i) = \frac{d^l}{dx^l} f_{i+1}(x_i) \qquad (i = 1, \ldots, m-1;\ l = 1, \ldots, n-1)$$

We have m(n+1) free parameters.

Condition C1 imposes one constraint at each end point and two constraints at each middle point. In total, C1 gives 2m constraints.

Condition C2 imposes n-1 constraints at each knot. Consequently, it gives (m-1)(n-1) constraints.

Therefore, we have

$$m(n+1) - 2m - (m-1)(n-1) = n - 1$$

free parameters (degree of freedom).

This indicates that if n=1 the spline functions are unique. If n>1, additional conditions are necessary to determine spline functions.
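
The count of free parameters can be checked by expanding the products. The following is only a worked check of the arithmetic, using the same symbols m and n as above:

```latex
\begin{aligned}
m(n+1) - 2m - (m-1)(n-1)
  &= (mn + m) - 2m - (mn - m - n + 1) \\
  &= mn - m - mn + m + n - 1 \\
  &= n - 1
\end{aligned}
```

The number of subregions m cancels out, so the degree of freedom depends only on the degree n: 0 for n=1, 1 for n=2, and 2 for n=3.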

Linear spline

Linear spline uses linear functions (n=1):

$$f_i(x) = a_{i0} + a_{i1} x$$

Degree of freedom: 0

The result is a set of line segments that simply connect the known values of the function at sample points.
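
A linear spline is short enough to write out directly. The sketch below is a minimal illustration, assuming sample coordinates in increasing order; the function name and the sample values are assumptions for this example.

```python
from bisect import bisect_right

def linear_spline(xs, hs):
    """Return f(x) that linearly interpolates (xs[i], hs[i]); xs must be increasing."""
    def f(x):
        if not xs[0] <= x <= xs[-1]:
            raise ValueError("x lies outside the interpolation region")
        i = min(bisect_right(xs, x), len(xs) - 1)        # subregion index, 1..m
        a1 = (hs[i] - hs[i - 1]) / (xs[i] - xs[i - 1])   # slope a_i1
        a0 = hs[i - 1] - a1 * xs[i - 1]                  # intercept a_i0
        return a0 + a1 * x
    return f

f_lin = linear_spline([0, 1, 3], [108, 124, 112])
print(f_lin(0.5), f_lin(2), f_lin(3))
```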

Figure: Linear spline

Quadratic spline

Quadratic spline uses quadratic functions (n=2):

$$f_i(x) = a_{i0} + a_{i1} x + a_{i2} x^2$$

Degree of freedom: 1

To determine spline functions, we either

1. give the first derivative at one end of the interpolation region, or
2. estimate the spline functions that minimize their total length.
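
The first option can be sketched as follows. Once the first derivative at the left end is given, conditions C1 and C2 determine the slope at every subsequent sample point one after another. The function name and the variable `z` (the slopes at the sample points) are assumptions for this illustration, and each piece is written in powers of (x - x_{i-1}) rather than x for convenience.

```python
from bisect import bisect_right

def quadratic_spline(xs, hs, slope_at_start):
    """Quadratic spline through (xs[i], hs[i]) with f'(xs[0]) = slope_at_start."""
    z = [slope_at_start]  # z[i] = first derivative at xs[i]
    for i in range(1, len(xs)):
        # C1 at the right end of subregion i fixes the slope there.
        z.append(2.0 * (hs[i] - hs[i - 1]) / (xs[i] - xs[i - 1]) - z[-1])

    def f(x):
        # Points outside [xs[0], xs[-1]] are extrapolated with the end pieces.
        i = min(max(bisect_right(xs, x), 1), len(xs) - 1)
        dx = x - xs[i - 1]
        width = xs[i] - xs[i - 1]
        return hs[i - 1] + z[i - 1] * dx + (z[i] - z[i - 1]) / (2.0 * width) * dx * dx
    return f

# Samples taken from h = x**2 with slope 0 at x = 0: the spline reproduces x**2.
f_quad = quadratic_spline([0, 1, 2], [0, 1, 4], 0.0)
print(f_quad(0.5), f_quad(1.5))
```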

Figure: Quadratic spline

Cubic spline

Cubic spline uses cubic functions (n=3):

$$f_i(x) = a_{i0} + a_{i1} x + a_{i2} x^2 + a_{i3} x^3$$

Degree of freedom: 2

To determine spline functions, we usually give the first derivative at both ends of the interpolation region.

Figure: Cubic spline
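
A cubic spline with given first derivatives at both ends can be computed by solving a tridiagonal linear system for the second derivatives at the sample points. The following is a sketch of one standard formulation, not the only possible one; the function name and the variable `M` (second derivatives at the sample points) are assumptions for this example.

```python
from bisect import bisect_right

def clamped_cubic_spline(xs, hs, slope_start, slope_end):
    """Cubic spline through (xs[i], hs[i]) with given first derivatives at both ends."""
    n = len(xs) - 1                                      # number of subregions
    w = [xs[i + 1] - xs[i] for i in range(n)]            # subregion widths
    d = [(hs[i + 1] - hs[i]) / w[i] for i in range(n)]   # secant slopes
    # Tridiagonal system for the second derivatives M[0..n].
    lower = [0.0] + [w[i - 1] for i in range(1, n)] + [w[n - 1]]
    diag = [2.0 * w[0]] + [2.0 * (w[i - 1] + w[i]) for i in range(1, n)] + [2.0 * w[n - 1]]
    upper = [w[0]] + [w[i] for i in range(1, n)] + [0.0]
    rhs = ([6.0 * (d[0] - slope_start)]
           + [6.0 * (d[i] - d[i - 1]) for i in range(1, n)]
           + [6.0 * (slope_end - d[n - 1])])
    # Forward elimination and back substitution (Thomas algorithm).
    for i in range(1, n + 1):
        k = lower[i] / diag[i - 1]
        diag[i] -= k * upper[i - 1]
        rhs[i] -= k * rhs[i - 1]
    M = [0.0] * (n + 1)
    M[n] = rhs[n] / diag[n]
    for i in range(n - 1, -1, -1):
        M[i] = (rhs[i] - upper[i] * M[i + 1]) / diag[i]

    def f(x):
        i = min(max(bisect_right(xs, x) - 1, 0), n - 1)  # piece on [xs[i], xs[i+1]]
        a, b = xs[i + 1] - x, x - xs[i]
        return ((M[i] * a ** 3 + M[i + 1] * b ** 3) / (6.0 * w[i])
                + (hs[i] / w[i] - M[i] * w[i] / 6.0) * a
                + (hs[i + 1] / w[i] - M[i + 1] * w[i] / 6.0) * b)
    return f

# Samples taken from h = x**3 with its true end slopes: the spline reproduces x**3.
f_cub = clamped_cubic_spline([0, 1, 2, 3], [0, 1, 8, 27], 0.0, 27.0)
print(f_cub(1.5), f_cub(2.5))
```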

Cubic spline is the most popular among polynomial spline functions, because

1. it is well balanced. Functions interpolated by linear or quadratic spline functions are not very smooth. On the other hand, polynomial functions of higher degree are too sensitive to variation in the known values at sample points; that is, they often yield functions that fluctuate wildly between sample points.

2. it minimizes the total curvature (the integral of the squared second derivative) among all the smooth functions that pass through the sample values, under suitable end conditions. This fact has been proved mathematically.

3. it gives natural curves found in the real world (it gives the same curve as a flexible curve ruler bent to pass through the sample points).

Extensions

Spline function and its extensions are also used for representing smooth curves on a two-dimensional plane.

• Bezier curves
• PostScript
• TrueType

Figure: An open curved figure on a plane Figure: A closed curved figure on a plane

Figure: An open curved figure on a plane

4.5.3 Two-dimensional continuous interpolation 1: Introduction

Two-dimensional interpolation is a natural extension of one-dimensional spline interpolation.

A crucial difference lies in the way of dividing interpolation region into subregions.

In one-dimensional interpolation, sample points are distributed on a one-dimensional space, that is, a line segment. This leads us to natural definition of subregions and knots used for spatial interpolation – sample points are also used as knots.

In two-dimensional interpolation, on the other hand, sample points are distributed on a two-dimensional plane, which may be a regular pattern or may be an irregular distribution.

This makes it difficult to divide the interpolation region naturally into the subregions in which spline functions are estimated. In two-dimensional interpolation, we also have to discuss the way of dividing interpolation region into subregions.

Except for the division of the interpolation region, two-dimensional interpolation is not substantially different from one-dimensional interpolation.

In the following, we first discuss two-dimensional interpolation when sample points are regularly located, because it is similar to one-dimensional interpolation, although such a regular arrangement rarely happens in the real world. After that, the case of irregular sample points is explained.

4.5.4 Two-dimensional continuous interpolation 2: Sample points of rectangular lattice

Figure: Sample values on a rectangular lattice (mx cells by my cells)

67 71 72 69 64 65 59
66 67 75 74 70 69 60
60 77 78 79 74 71 64
61 82 84 81 76 73 74
76 88 86 83 80 78 77
66 75 72 76 74 70 69

Figure: Rectangular lattice (mx cells by my cells) with the spline functions $f_{11}(x, y), f_{12}(x, y), \ldots, f_{1 m_x}(x, y)$ in the first row through $f_{m_y 1}(x, y), f_{m_y 2}(x, y), \ldots, f_{m_y m_x}(x, y)$ in the last row

In each cell, we consider a different spline function.

n: the degree of spline function

$$
\begin{aligned}
f_{ij}(x, y) = {} & a^{ij}_{00} + a^{ij}_{10} x + a^{ij}_{01} y \\
& + a^{ij}_{11} xy + a^{ij}_{20} x^2 + a^{ij}_{02} y^2 \\
& + a^{ij}_{21} x^2 y + a^{ij}_{12} x y^2 + a^{ij}_{22} x^2 y^2 + \cdots + a^{ij}_{nn} x^n y^n
\end{aligned}
$$

We have $m_x m_y (n+1)^2$ unknown parameters to be estimated.

On the other hand, we have constraints to be satisfied at the corners and boundaries of cells.

In the end, the degree of freedom is

$$(m_x + m_y + 2)(n - 1) + (n - 1)^2$$
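
As a quick check, substituting n = 1, 2, 3 into this expression gives the degrees of freedom quoted in this chapter for the bilinear, biquadratic, and bicubic splines. This is only a worked evaluation of the formula:

```latex
\begin{aligned}
n = 1:&\quad (m_x + m_y + 2)\cdot 0 + 0^2 = 0 \\
n = 2:&\quad (m_x + m_y + 2)\cdot 1 + 1^2 = m_x + m_y + 3 \\
n = 3:&\quad (m_x + m_y + 2)\cdot 2 + 2^2 = 2(m_x + m_y + 4)
\end{aligned}
```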

Bilinear spline

Bilinear spline uses a function on a two-dimensional plane that is linear in each of x and y.

Spline function for the (i, j)-th (i=1, ..., my; j=1, ..., mx) subregion:

$$f_{ij}(x, y) = a^{ij}_{00} + a^{ij}_{10} x + a^{ij}_{01} y + a^{ij}_{11} xy$$

Note that, because of the xy term, the function represents not a flat surface but a curved surface (in contrast, a linear function on a one-dimensional space is a straight line).

Degree of freedom: 0 (no additional constraint is necessary)
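
Within a single cell, the four coefficients of the bilinear function are determined by the values at the four corners, so the function can be evaluated directly from them. A minimal sketch follows; the function and argument names are assumptions for this example.

```python
def bilinear(x, y, x0, x1, y0, y1, h00, h10, h01, h11):
    """Bilinear interpolation in the cell [x0, x1] x [y0, y1].

    hXY is the value at corner (xX, yY); for example, h10 is the value at (x1, y0).
    """
    s = (x - x0) / (x1 - x0)  # relative position along x, 0..1
    t = (y - y0) / (y1 - y0)  # relative position along y, 0..1
    return (h00 * (1 - s) * (1 - t) + h10 * s * (1 - t)
            + h01 * (1 - s) * t + h11 * s * t)

# Centre of a unit cell with corner values 67, 71, 66, 67.
print(bilinear(0.5, 0.5, 0, 1, 0, 1, 67, 71, 66, 67))
# With corner values 0, 0, 0, 1 the centre value is 0.25, not the 0.5 found
# halfway along the diagonal of a plane, so the surface is curved.
print(bilinear(0.5, 0.5, 0, 1, 0, 1, 0, 0, 0, 1))
```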

Figure: A surface generated by the bilinear spline

Figure: A surface generated by the bilinear spline

Biquadratic spline

Biquadratic spline uses two-dimensional quadratic function:

$$
\begin{aligned}
f_{ij}(x, y) = {} & a^{ij}_{00} + a^{ij}_{10} x + a^{ij}_{01} y \\
& + a^{ij}_{11} xy + a^{ij}_{20} x^2 + a^{ij}_{02} y^2 \\
& + a^{ij}_{21} x^2 y + a^{ij}_{12} x y^2 + a^{ij}_{22} x^2 y^2
\end{aligned}
$$

Degree of freedom: mx+my+3

To determine spline functions, we have to impose mx+my+3 constraints. We usually give the 1st derivatives on the outer boundary of interpolation region.

1. The 1st derivatives by y at the lower end points: mx+1
2. The 1st derivatives by x at the right end points: my+1
3. The 1st derivatives by y and x at the lower right corner point: 1

Figure: Rectangular lattice (mx cells by my cells)

Figure: A surface generated by the biquadratic spline

Figure: A surface generated by the biquadratic spline

Bicubic spline

Bicubic spline uses two-dimensional cubic function:

$$
\begin{aligned}
f_{ij}(x, y) = {} & a^{ij}_{00} + a^{ij}_{10} x + a^{ij}_{01} y + a^{ij}_{11} xy \\
& + a^{ij}_{20} x^2 + a^{ij}_{02} y^2 + a^{ij}_{21} x^2 y + a^{ij}_{12} x y^2 + a^{ij}_{22} x^2 y^2 \\
& + a^{ij}_{30} x^3 + a^{ij}_{03} y^3 + a^{ij}_{31} x^3 y + a^{ij}_{13} x y^3 \\
& + a^{ij}_{32} x^3 y^2 + a^{ij}_{23} x^2 y^3 + a^{ij}_{33} x^3 y^3
\end{aligned}
$$

Degree of freedom: 2(mx+my+4)

To determine spline functions, we have to impose 2(mx+my+4) constraints. In the bicubic spline, we again give the 1st derivatives on the outer boundary of the interpolation region.

1. The 1st derivatives by y at the upper and lower end points: 2(mx+1)

2. The 1st derivatives by x at the right and left end points: 2(my+1)

3. The 1st derivatives by y and x at the corner points: 4

Figure: Rectangular lattice (mx cells by my cells)

Figure: A surface generated by the bicubic spline

Figure: A surface generated by the bicubic spline

4.5.5 Two-dimensional continuous interpolation 3: Irregular sample points

If the sample points are regularly distributed, we can naturally define rectangular subregions for interpolating continuous function.

However, measuring the function value at regularly distributed points is practically impossible: a lattice point may be located in a river, on a cliff, or in a building.

Distribution of sample points in reality

Figure: Irregularly located sample points with their measured values

Division of the interpolation region

We use a Triangulated Irregular Network (TIN) to divide the interpolation region into subregions.

Different spline functions are defined in individual triangular subregions. They are estimated to satisfy constraints similar to those imposed for interpolation by regularly-distributed sample points.

Cubic spline is generally used as the spline function.
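
To illustrate interpolation within one triangular subregion, the sketch below uses the simplest possible choice, a linear function (a flat facet) determined by the three vertex values, instead of the cubic spline mentioned above. It is meant only to show how a triangle serves as a subregion; the function name and sample values are assumptions for this example.

```python
def interpolate_in_triangle(p, tri, values):
    """Linear interpolation at p in triangle tri = [(x1, y1), (x2, y2), (x3, y3)].

    values holds the known function values at the three vertices, in the same order.
    """
    (x1, y1), (x2, y2), (x3, y3) = tri
    x, y = p
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    # Barycentric weights of p with respect to the three vertices.
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    return w1 * values[0] + w2 * values[1] + w3 * values[2]

tri = [(0, 0), (1, 0), (0, 1)]
print(interpolate_in_triangle((1 / 3, 1 / 3), tri, [67, 71, 66]))
```

Linear facets are continuous across triangle edges but not smooth, which is one reason a higher-degree spline is preferred in practice.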

Triangulated Irregular Network (TIN)

There are numerous triangular networks that divide the interpolation region into subregions.

Figure: A point distribution

Figure: Two different triangular networks

Choice of triangular networks

Among numerous TINs, the Delaunay triangulation is usually chosen in spatial interpolation, because it avoids elongated, acute-angled triangles as far as possible (it maximizes the smallest angle among all triangulations of the points). This helps keep the spline functions smooth rather than wildly fluctuating.

We can construct the Delaunay triangulation by connecting points whose Voronoi regions are neighboring. Recent GIS software provides a function for constructing the Delaunay triangulation from a point distribution.
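
An equivalent characterization, used here instead of the Voronoi construction, is that a triangle belongs to the Delaunay triangulation when its circumcircle contains no other point. The brute-force sketch below applies this test to every triple of points; it is far too slow for real data and assumes no four points lie on a common circle. The function names are assumptions for this example.

```python
from itertools import combinations

def circumcircle_contains(a, b, c, p):
    """True if p lies strictly inside the circle through a, b, c."""
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    # The sign of det depends on the orientation of a, b, c; correct for it.
    orient = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return det * orient > 0

def delaunay(points):
    """Brute-force Delaunay triangulation: keep triangles with empty circumcircles."""
    triangles = []
    for i, j, k in combinations(range(len(points)), 3):
        a, b, c = points[i], points[j], points[k]
        if (b[0] - a[0]) * (c[1] - a[1]) == (b[1] - a[1]) * (c[0] - a[0]):
            continue  # collinear points form no triangle
        if not any(circumcircle_contains(a, b, c, points[l])
                   for l in range(len(points)) if l not in (i, j, k)):
            triangles.append((i, j, k))
    return triangles

pts = [(0, 0), (1, 0), (0, 1), (2, 2)]
print(delaunay(pts))  # triangles as index triples into pts
```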

Homework Q.4.1 (10 pts)

Find an application of spatial interpolation, and report its objective, data acquisition method, interpolation method, and so forth.

Homework Q.4.2 (20 pts)

Suppose you create spatial data of the surface elevation of the earth. Then you have to 1) determine the sample points, the locations whose elevation you measure, and 2) interpolate the elevation data measured to obtain the surface data.

Discuss how you should determine the sample points.