    Chapter 5

    Objective Mapping and Kriging

    Most of you are familiar with topographic contour maps. Those squiggly lines connect locations on the map of equal elevation. Many of you have probably seen the same mode of presentation used for scientific data, where each line likewise joins points of equal value. What many of you are probably not familiar with is the mathematics behind the creation of those maps and their uses.

    5.1 Contouring and gridding concepts

    This chapter addresses the question: what do you do when your data are not on a regular grid? The question comes up frequently because a computer can only draw contour lines, for example, if it knows where to draw them. Often a contouring package will grid your data for you with a default method, and the result will be acceptable. But there is more to gridding than making it easy for a computer to draw contour lines: different gridding methods produce different-looking maps. How, then, can we objectively decide among these results? The answer, of course, depends on the problem you are working on.

    5.1.1 Data on a regular grid

    There is a straightforward way to contour irregularly spaced data: Delaunay triangulation. The individual data points are connected in a network of triangles that are as nearly equiangular as possible, with the longest side of each triangle as short as possible. To build the network, we surround each data point with an irregularly shaped polygon such that every location inside the polygon is closer to the enclosed data point than to any other data point, and every location outside it is closer to some other data point. These irregular polygons are known as Thiessen polygons, and each of the surrounding Thiessen polygons likewise encloses a data point. Straight lines drawn only between data points whose Thiessen polygons are neighbors create a Delaunay triangular network. The location of a contour line along the sides of these triangles is then computed by simple linear interpolation (see Fig. 5.1). This approach is fine for producing contour maps, but it is difficult to use for derived


    104 Modeling Methods for Marine Science

    products (gradients, etc.) and is computationally intensive. Furthermore, if you were to sample at different locations, you would get a different contour map.
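    The interpolation step can be sketched in Python. This is a minimal illustration rather than a contouring package: it assumes the Delaunay triangles have already been built elsewhere, handles one triangle and one contour level, and the helper names `edge_crossing` and `contour_segment` (and the triangle values) are hypothetical.

```python
def edge_crossing(p1, z1, p2, z2, level):
    """Point where the contour `level` crosses the edge p1-p2, or None.

    The contour passes through the edge only if `level` lies between the
    values z1 and z2 at the two ends; its position is found by linear
    interpolation along the edge.
    """
    if z1 == z2 or not (min(z1, z2) <= level <= max(z1, z2)):
        return None
    t = (level - z1) / (z2 - z1)  # fraction of the way from p1 to p2
    return (p1[0] + t * (p2[0] - p1[0]), p1[1] + t * (p2[1] - p1[1]))


def contour_segment(tri_pts, tri_vals, level):
    """Crossing points of one contour level with the three sides of a triangle."""
    points = []
    for a, b in ((0, 1), (1, 2), (2, 0)):
        c = edge_crossing(tri_pts[a], tri_vals[a], tri_pts[b], tri_vals[b], level)
        if c is not None and c not in points:  # a vertex can be hit by two edges
            points.append(c)
    return points


# Hypothetical triangle (vertices in km) with measured values at its corners.
triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
values = [10.0, 30.0, 20.0]
print(contour_segment(triangle, values, 15.0))
```

    Joining the two returned points with a straight segment gives the piece of the 15-unit contour inside this triangle; repeating this over all triangles and levels draws the map.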

    Figure 5.1: An example of what a triangulation grid looks like. Choosing the optimal way to draw the connecting lines is a form of the Delaunay triangulation problem.

    It is better, then, to put your data onto a regular, rectangular grid. A regular grid is easier for the computer to use but harder for the user to generate; the benefits gained for this extra trouble, however, are large.

    5.1.2 Methods: nearest neighbor, bilinear, inverse square of distance, etc.

    Nearest neighbor: this method works the way its name suggests. The grid value (\hat{Z}_j) is estimated from the value of the nearest data point. The distance from a grid point to each of the actual data points is given by:

        d_{ij} = \sqrt{(x_j - x_i)^2 + (y_j - y_i)^2}    (5.1)

    and the grid value is set equal to the value at the closest data point:

        \hat{Z}_j = Z_k, \quad \text{where } d_{kj} = \min_i d_{ij}    (5.2)

    where the index j is a number identifying the grid point (in a sequential sense) and the index i refers to sequential numbers identifying the actual data points.
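    A direct transcription of the nearest neighbor idea, as a hedged sketch in plain Python (no gridding library assumed); the function names and sample data are hypothetical, and only two-dimensional Euclidean distance is handled.

```python
import math


def distance(grid_pt, data_pt):
    """Euclidean distance between a grid point and a data point."""
    return math.hypot(grid_pt[0] - data_pt[0], grid_pt[1] - data_pt[1])


def nearest_neighbor(grid_pt, data_xy, data_z):
    """The grid value is the value at the closest data point."""
    d = [distance(grid_pt, p) for p in data_xy]
    k = min(range(len(d)), key=d.__getitem__)  # index of the smallest distance
    return data_z[k]


def grid_nearest(xs, ys, data_xy, data_z):
    """Fill a regular grid; rows follow ys, columns follow xs."""
    return [[nearest_neighbor((x, y), data_xy, data_z) for x in xs] for y in ys]


# Hypothetical irregularly spaced samples.
data_xy = [(0.2, 0.1), (1.8, 0.3), (1.1, 1.9)]
data_z = [5.0, 7.0, 9.0]
print(grid_nearest([0.0, 1.0, 2.0], [0.0, 2.0], data_xy, data_z))
```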


    Glover, Jenkins and Doney; 9 May 2005 DRAFT 105

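    Equations 5.1 and 5.2 are the bare bones of the nearest neighbor formulation. Sometimes the method is augmented by using the N nearest neighbors, weighted by the inverse of their distance (see Fig 5.2):

        \hat{Z}_j = \frac{\sum_{i=1}^{N} Z_i / d_{ij}}{\sum_{i=1}^{N} 1 / d_{ij}}    (5.3)

    This method of generating grids is particularly useful for filling in gaps in data that are already on, or very nearly on, a regular grid.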
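    The N-nearest-neighbor average can be sketched the same way. This assumes the weights in Eqn 5.3 are 1/d_ij; the `power` parameter is our addition so the same hypothetical function also covers the inverse-distance-to-a-power variants, and `n=1` falls back to plain nearest neighbor.

```python
import math


def idw_n_nearest(grid_pt, data_xy, data_z, n=3, power=1.0):
    """Estimate a grid value from its n nearest data points.

    Each of the n closest points is weighted by 1 / d**power; power=1 is the
    plain inverse-distance form, and n=1 reduces to nearest neighbor.
    """
    d = [math.hypot(grid_pt[0] - x, grid_pt[1] - y) for x, y in data_xy]
    nearest = sorted(range(len(d)), key=d.__getitem__)[:n]
    if d[nearest[0]] == 0.0:  # grid point sits exactly on a datum
        return data_z[nearest[0]]
    w = [1.0 / d[i] ** power for i in nearest]
    return sum(wi * data_z[i] for wi, i in zip(w, nearest)) / sum(w)


# Hypothetical data around a grid point at the origin.
data_xy = [(1.0, 0.0), (0.0, 2.0), (3.0, 0.0), (0.0, -10.0)]
data_z = [10.0, 20.0, 40.0, 100.0]
for p in (1, 2, 3):
    print(p, round(idw_n_nearest((0.0, 0.0), data_xy, data_z, n=3, power=p), 3))
```

    For these made-up data the estimate moves from about 18.2 toward the nearest value (10) as the power increases, because the closest point increasingly dominates.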

    Bilinear interpolation: this method is frequently described as "good enough for government work." The value at the grid point is an interpolation product (\hat{Z}) obtained from the following formulas for the 2-dimensional case, in which the grid point (x, y) lies inside a cell whose corners are (x_1, y_1), (x_2, y_1), (x_2, y_2) and (x_1, y_2):

        t = \frac{x - x_1}{x_2 - x_1}, \qquad u = \frac{y - y_1}{y_2 - y_1}

        \hat{Z} = (1 - t)(1 - u) Z_1 + t (1 - u) Z_2 + t u Z_3 + (1 - t) u Z_4

    where Z_1, Z_2, Z_3 and Z_4 are the actual data values at those four corners surrounding the grid point (sometimes called a node). Because it needs a surrounding rectangle of data, this method is best used for interpolating between data already on a grid. It can be augmented, and there are logical extensions such as bicubic interpolation, which yields higher order accuracy but suffers more frequently from over- and under-shooting the target.

    Inverse distance: this is actually a class of methods that weight each data point's contribution to the grid point by the inverse of the distance between the grid point and the data point (sometimes the distance is raised to a power, 2, 3, or even higher if there is a reason). This is basically Eqn 5.3 with d_{ij} raised to that power. The method is fast, but it has a tendency to generate bull's-eyes around the actual data points.
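    A sketch of the bilinear formula, assuming the usual corner numbering (counterclockwise from the lower left) and a rectangular cell; the function name and the corner values are hypothetical.

```python
def bilinear(x, y, x1, x2, y1, y2, z1, z2, z3, z4):
    """Bilinear interpolation at (x, y) inside the cell [x1, x2] x [y1, y2].

    Corner values, counterclockwise from the lower left:
    z1 at (x1, y1), z2 at (x2, y1), z3 at (x2, y2), z4 at (x1, y2).
    """
    t = (x - x1) / (x2 - x1)  # fractional position across the cell in x
    u = (y - y1) / (y2 - y1)  # fractional position across the cell in y
    return ((1 - t) * (1 - u) * z1 + t * (1 - u) * z2
            + t * u * z3 + (1 - t) * u * z4)


# Hypothetical 2 x 2 km cell with values at its four corners.
print(bilinear(1.0, 1.0, 0.0, 2.0, 0.0, 2.0, 1.0, 2.0, 4.0, 3.0))  # cell center
print(bilinear(0.5, 1.5, 0.0, 2.0, 0.0, 2.0, 1.0, 2.0, 4.0, 3.0))
```

    At the cell center the result is simply the average of the four corners (2.5 here); at a corner it returns that corner's value exactly.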

    Kriging: this is a method for determining the best linear unbiased estimate of the grid points. We will discuss it in greater detail in section 5.4. The method is very flexible, but requires the user


    Figure 5.2: An example of a regular grid. The lines connecting the data points to the grid point are the distances that could be calculated; the dashed lines indicate distances too large for the data to be expected to have any significant influence on the grid point value. For example, the N-nearest neighbors method (N = 3) would estimate the grid point from the three closest data points, whereas the simpler nearest neighbor method would use only the single closest one. This is a two-dimensional example.

    to bring a priori information about the data to the problem. This information takes the form of a variogram of the semivariances, and there are several variogram models that can be used. Typically, real data are best dealt with using a linear variogram unless there is a reasonable amount of data from which to derive a robust variogram (more in sections 5.3 and 5.4).

    5.1.3 Weighted averaging

    Several of the above methods can also have weighting added to improve the fidelity of the grid to the actual data. Consider Eqn 5.3 (N-nearest neighbors): a weighting factor w_i can be added to the numerator (and to the denominator, so that the weights stay normalized) to increase the influence of some data points over others on the value of the grid points. Typically the weighting reflects the uncertainty in the individual data points themselves (such as w_i = 1/\sigma_i^2, where \sigma_i is the measurement uncertainty of point i).
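    To make the weighting concrete, here is a hedged sketch that multiplies an inverse-distance weight by 1/σ_i², one common choice of uncertainty weight; the function name, the default power of 2, and the sample values are assumptions for illustration.

```python
import math


def uncertainty_weighted_idw(grid_pt, data_xy, data_z, sigma, power=2.0):
    """Inverse-distance estimate with an extra weight 1/sigma_i**2 per datum.

    The combined weight appears in both numerator and denominator, so the
    weights remain normalized.
    """
    num = den = 0.0
    for (x, y), z, s in zip(data_xy, data_z, sigma):
        d = math.hypot(grid_pt[0] - x, grid_pt[1] - y)
        if d == 0.0:
            return z  # grid point coincides with a datum
        w = (1.0 / s ** 2) / d ** power  # uncertainty weight x distance weight
        num += w * z
        den += w
    return num / den


# Two hypothetical samples equidistant from the grid point; the second is
# twice as uncertain, so it gets a quarter of the weight.
xy = [(1.0, 0.0), (-1.0, 0.0)]
z = [10.0, 20.0]
print(uncertainty_weighted_idw((0.0, 0.0), xy, z, sigma=[1.0, 2.0]))
print(uncertainty_weighted_idw((0.0, 0.0), xy, z, sigma=[1.0, 1.0]))
```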



    5.1.4 Splines

    We don't plan on covering splines per se. Like many of the topics covered in this course, splines are a course unto themselves, but we would be remiss if we did not mention them here. Splines got their start as long, flexible pieces of wood or metal. They were used to fit smoothly curving shapes when the mathematics and/or the tools were not available to machine the shapes directly (e.g. ship hulls and the curvature of airplane wings).

    Since then, a mathematical equivalent has grown up around their use, and they are extremely useful for fitting a smooth line or surface to irregularly spaced data points. They are also useful for interpolating between data points. They exist as piecewise polynomials constrained to have continuous derivatives at the joints between segments. By piecewise we mean: if you don't know what to do with the entire data array at once, fit pieces of it one at a time. Essentially, then, splines are piecewise functions for connecting points in 2 or 3 dimensions. They are neither analytical functions nor statistical models; they are purely empirical and devoid of any theoretical basis.

    The most common spline (there are many of them) is the cubic spline. A cubic polynomial can pass through any four points at once. To make sure the curve is continuously smooth, however, each cubic piece of a cubic spline is fit to only two adjacent data points. The remaining freedom in each cubic is then used to match the slopes (and curvatures) of neighboring pieces, maintaining the smoothness.

    If you consider Fig. 5.3, there are four data points (numbered 1 through 4). Cubic polynomials are fit to only two data points at a time (1 to 2, 2 to 3, etc.). By requiring the tangent of the first polynomial at point 2 to equal the tangent of the second polynomial at point 2 (and likewise at point 3), we can write a series of simultaneous equations and solve for the unknown coefficients. See Davis (1986) for more details, and MATLAB's spline toolbox (based on de Boor, 1978).
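    The simultaneous equations for a cubic spline can be written out in a short sketch. This is not MATLAB's spline toolbox: it assumes "natural" end conditions (zero curvature at the two end points), one of several possible choices, and solves the resulting tridiagonal system for the curvature at each data point; the function name is hypothetical.

```python
import bisect


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys).

    xs must be strictly increasing. Each interval gets its own cubic; adjacent
    cubics share slope and curvature at interior knots, and the curvature is
    zero at both ends ("natural" end conditions).
    """
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Tridiagonal system for the knot curvatures M_0..M_n; rows 0 and n
    # simply state M_0 = M_n = 0.
    sub, diag = [0.0] * (n + 1), [1.0] * (n + 1)
    sup, rhs = [0.0] * (n + 1), [0.0] * (n + 1)
    for i in range(1, n):
        sub[i], diag[i], sup[i] = h[i - 1], 2.0 * (h[i - 1] + h[i]), h[i]
        rhs[i] = 6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, n + 1):
        m = sub[i] / diag[i - 1]
        diag[i] -= m * sup[i - 1]
        rhs[i] -= m * rhs[i - 1]
    M = [0.0] * (n + 1)
    M[n] = rhs[n] / diag[n]
    for i in range(n - 1, -1, -1):
        M[i] = (rhs[i] - sup[i] * M[i + 1]) / diag[i]

    def spline(x):
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 1)  # interval index
        A = (xs[i + 1] - x) / h[i]
        B = (x - xs[i]) / h[i]
        return (A * ys[i] + B * ys[i + 1]
                + ((A ** 3 - A) * M[i] + (B ** 3 - B) * M[i + 1]) * h[i] ** 2 / 6.0)

    return spline


s = natural_cubic_spline([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 0.0, 1.0])
print([round(s(x), 4) for x in (0.0, 0.5, 1.0, 1.5, 2.0, 3.0)])
```

    Evaluating `s` outside [0, 3] simply extends the end cubics, which illustrates how poorly constrained any extrapolation is.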

    There are a number of known problems with splines. Extrapolating beyond the edges of the data domain quite often yields wildly erratic results, because there is no information beyond the data domain to constrain the extrapolation, and splines are essentially higher order polynomials that will grow to large values (positive or negative). Closely spaced data points can develop "aneurysms": in an attempt to squeeze a higher order polynomial into a tight space, large over- and under-shoots of the true function can occur. These problems also occur in 3-D applications of splines. However, if a smooth surface is what you are looking for, a spline (see spline relaxation in other texts) will frequently give you a good, usable smooth fit to your data.


    Figure 5.3: A cubic polynomial is fit piecewise from one data point to the next (1 to 2, 2 to 3, etc.). Because only two points are used at any one time, the additional information from the other points can be used to constrain the tangents to be equal where the piecewise polynomials meet, for example at point 2.

    5.2 Moving Averages

    Sometimes it is possible to put your data onto a regular grid through various averaging schemes, one of the most common being the moving average. These averaging schemes are an outgrowth of a school of thought largely credited to mining operations in France and South Africa, and they are precursors to kriging, the main topic of this segment. Each averaging scheme applies some variant of the following equation:

        \hat{Z}_j = \sum_{i=1}^{n} w_{ij} Z_i    (5.10)

    where the grid estimate (\hat{Z}_j) is the sum of the weights (w_{ij}) times the actual observations (Z_i), with the weights normally scaled to sum to one. The nature of w_{ij} varies, as we saw in the first part of this segment (N-nearest neighbors, inverse of the distance, inverse of the square of the distance, etc.).


    5.2.1 Block Means

    The first and simplest of these averaging techniques is the block mean. This technique involves dividing your field area (containing somewhat randomly located samples) into blocks of equal area or volume. Consider a two-dimensional field divided into nine sub-areas, or blocks, of equal area (Fig 5.4).










    Figure 5.4: A hypothetical study area divided into nine equal sub-areas or blocks. The red points represent actual data sampling locations, and the blue cross represents just one of several grid points at which an estimate is desired. As shown in Eqn 5.11, the value at the blue cross can be estimated as the weighted sum of the means of the surrounding blocks.

    An estimator for the center of this design is then given by Eqn 5.11, and each sub-area can be estimated by making it the center of its own 3-by-3 block:

        \hat{Z}_c = \sum_{k=1}^{9} w_k \bar{Z}_k    (5.11)

    Here \bar{Z}_k is the mean of the samples in block k, and the w_k are the weights applied to the block means. These weights can be determined by a number of methods, some of which are outlined in Section 5.1.3, or from field data that allow the system of equations in Eqn 5.11 to be inverted.
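    The block mean and the weighted estimate of Eqn 5.11 can be sketched as follows. The square blocks, their lower-left origin, the 1-2-4 weighting pattern and the rule of rescaling the weights when a block is empty are all assumptions made for illustration.

```python
def block_means(data_xy, data_z, x0, y0, size, nblocks=3):
    """Mean of the samples falling in each of nblocks x nblocks square blocks.

    Blocks are indexed [row][col] from the lower-left corner (x0, y0);
    blocks without samples get None.
    """
    sums = [[0.0] * nblocks for _ in range(nblocks)]
    counts = [[0] * nblocks for _ in range(nblocks)]
    for (x, y), z in zip(data_xy, data_z):
        col, row = int((x - x0) // size), int((y - y0) // size)
        if 0 <= row < nblocks and 0 <= col < nblocks:
            sums[row][col] += z
            counts[row][col] += 1
    return [[sums[r][c] / counts[r][c] if counts[r][c] else None
             for c in range(nblocks)] for r in range(nblocks)]


def block_estimate(means, weights):
    """Weighted sum of block means (Eqn 5.11), skipping empty blocks.

    The weights of the non-empty blocks are rescaled to sum to one.
    """
    num = den = 0.0
    for mean_row, weight_row in zip(means, weights):
        for m, w in zip(mean_row, weight_row):
            if m is not None:
                num += w * m
                den += w
    return num / den


# Hypothetical samples: two in the lower-left block, one in each of the others.
cells = [(r, c) for r in range(3) for c in range(3) if (r, c) != (0, 0)]
pts = [(0.2, 0.3), (0.7, 0.6)] + [(c + 0.5, r + 0.5) for r, c in cells]
vals = [-1.0, 1.0] + [10.0 * r + c for r, c in cells]
means = block_means(pts, vals, 0.0, 0.0, 1.0)
weights = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]  # center block counts most
print(means)
print(block_estimate(means, weights))
```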

    One drawback to this approach is that although the mean of a block is relatively independent of its size (once the block is above a certain, data-dependent size), the variance of the block estimate tends to increase with increasing block size. It is quite possible that the variance of the block estimate may be too large to make the estimate of much use in your investigation. Choosing a block size is therefore a trade-off: blocks that are too small contain too few data to give a realistic mean, while blocks that are too large average out all of the structure (see the discussion of stationarity below).

    5.2.2 Moving average design

    To produce estimates with lower variance, and so increase their reliability, we can use a variation on the block mean called a moving average. Once the study area has been divided into blocks, these same blocks can be re-divided to give you more block means (\bar{Z}_k) in Eqn 5.11; consider Fig 5.5.










    Figure 5.5: The same study area as in Fig 5.4, but with the area surrounding the point to be estimated divided into four new blocks, indicated by the shaded areas. Instead of having only nine block averages to work with, Eqn 5.11 will have 13. The red data points have been removed from the figure for clarity, and the new blocks have not been numbered.

    It is left to the reader's imagination how other geometries could be used to divide and re-divide the study area into blocks for estimating the blue cross. Keep in mind that only one blue cross is shown for demonstration purposes; each block has its own blue cross, estimated in the same fashion as the one we just discussed.


    The averages from the blocks, the \bar{Z}_k, can also be weighted, or windowed. The results, in histogram form, are shown in Fig 5.6. Now the weights w_k in Eqn 5.11 are not equal over the entire block, but rather are a function of how far the block centroid is from the point to be estimated.


    Figure 5.6: The effects of two kinds of windowing on averaging (there are many forms of windowing; these are just two examples). In a) a simple boxcar type of windowing was applied to the data; here the windows are all of equal width, producing a classical-looking histogram. In b) the windows are tapered, like a Gaussian, making the data points closer to the point of estimation more important (of greater weight) than points farther away. As an example, compare the shaded areas, which enclose the data and the weights used to estimate the point being estimated.
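    The two kinds of window in Fig 5.6 differ only in how the weight depends on distance. A hedged sketch, with the Gaussian width parameter and the normalization to unit total weight as our assumptions (the function names are hypothetical):

```python
import math


def window_weights(distances, width, kind="boxcar"):
    """Normalized moving-average weights as a function of distance.

    boxcar:   equal weight for every datum within `width`, zero beyond it.
    gaussian: weight exp(-0.5 * (d / width)**2), tapering smoothly with distance.
    """
    if kind == "boxcar":
        w = [1.0 if d <= width else 0.0 for d in distances]
    elif kind == "gaussian":
        w = [math.exp(-0.5 * (d / width) ** 2) for d in distances]
    else:
        raise ValueError("kind must be 'boxcar' or 'gaussian'")
    total = sum(w)
    return [wi / total for wi in w]


def windowed_average(distances, values, width, kind="boxcar"):
    """Estimate at the target point as the window-weighted mean of the data."""
    w = window_weights(distances, width, kind)
    return sum(wi * v for wi, v in zip(w, values))


# Hypothetical data: distances from the estimation point and measured values.
dist = [0.5, 1.0, 3.0]
vals = [10.0, 20.0, 100.0]
print(windowed_average(dist, vals, 2.0, "boxcar"))    # far point excluded
print(windowed_average(dist, vals, 2.0, "gaussian"))  # far point down-weighted
```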

    5.2.3 Trend surface analysis

    There are two aspects of trend surface analysis that are important for gridding your data and for kriging (which may sound redundant, but there are subtle differences between the two). In the first case, by fitting a trend surface to your data you can use the fitted function to re-sample your data field on a regular grid; this reflects an interest in the trend surface itself. In the second case, you may want to remove a trend surface from your data before proceeding with the kriging operation.

    Sometimes it is desirable, or just convenient, to have a function that represents your data in terms of the coordinate system of your study area (e.g. in terms of longitude and latitude). In these cases it is possible to make your study variable a function of your coordinate system. You are, in fact, fitting a trend surface to your data in terms of the coordinates that you use to locate your samples. It can be a trend surface of any order and rank, meaning that the trend can be first order (a straight line, a flat plane) or second order (a quadratic curve, surface or hyper-surface), and so on. The order refers to the highest power to which any independent variable is raised; the rank refers to the dimensionality. You can set up the equations and solve them with either the normal equations or the design matrix; in certain advanced cases you may need to apply the non-linear fitting technique of Levenberg-Marquardt. Or, in most cases, you can use a handy little m-file that Bill Jenkins wrote, called surfit.m. It exploits the regular structure of successively higher order polynomials and the SVD solution to the normal equations to fit surfaces to your data of the form (shown here for two dimensions):

        Z(x, y) = b_0 + b_1 x + b_2 y + b_3 x^2 + b_4 x y + b_5 y^2 + \cdots    (5.12)
    Most grid generation schemes work best when the data fluctuate about a constant mean, with no large-scale trend. To accomplish this, it is important to remove any trend surface from your data first. At the very least you should remove a first order, n-dimensional surface (n refers to the rank of your coordinate system) from your data before running your grid generation routine. You can always add it back in at your grid estimation points, because you now have an analytical equation that relates your property to the coordinate system of your study area. Higher order n-dimensional surfaces can also be fitted. The higher the order you go, the better your fit will be, regardless of what you use as a goodness-of-fit parameter. But keep in mind that the better fit may not be statistically significant; you can use ANOVA to test for this (see also Davis, 1986, pp 419-425).
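    As a rough stand-in for fitting and removing a first-order trend (this is not surfit.m, and it solves the normal equations directly rather than by SVD), a plane can be fit by least squares in plain Python. The function names and data are hypothetical.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_plane(data_xy, data_z):
    """First-order trend surface Z = b0 + b1*x + b2*y from the normal equations."""
    G = [[1.0, x, y] for x, y in data_xy]  # design matrix, one row per sample
    GtG = [[sum(g[i] * g[j] for g in G) for j in range(3)] for i in range(3)]
    Gtz = [sum(g[i] * z for g, z in zip(G, data_z)) for i in range(3)]
    return solve(GtG, Gtz)


def detrend(data_xy, data_z):
    """Return the plane coefficients and the residuals left for gridding."""
    b0, b1, b2 = fit_plane(data_xy, data_z)
    resid = [z - (b0 + b1 * x + b2 * y) for (x, y), z in zip(data_xy, data_z)]
    return (b0, b1, b2), resid


# Hypothetical samples: a plane 2 + 3x - y plus small local departures.
xy = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]
bumps = [0.1, -0.1, 0.0, 0.2, -0.1, -0.1]
z = [2 + 3 * x - y + e for (x, y), e in zip(xy, bumps)]
coeffs, resid = detrend(xy, z)
print([round(c, 3) for c in coeffs], [round(r, 3) for r in resid])
```

    Gridding would then proceed on `resid`, with b0 + b1 x + b2 y added back at each grid node afterwards.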

    5.3 Variograms

    At the heart of kriging is the semivariogram, or structure function, of the regionalized variables that you are trying to estimate. This amounts to the a priori information that you must supply to the software in order to make a regular grid out of your irregularly spaced data. Basically, the idea is to estimate how far apart two data points must be before they are uncorrelated. This information is usually presented in the form of the variogram, in which the semivariance is plotted as a function of distance, or lag (h).

    5.3.1 Regionalized variables

    Simply put, a regionalized variable is a variable that can be said to be distributed in space. This space is not limited to the three-dimensional kind of space that we move around in every day, but can be extended to include time, parameter space, property space, etc. The definition "distributed in space" is purely descriptive and makes no probabilistic assumptions. It merely recognizes the fact that properties measured in space follow an irregular pattern that cannot be described by a mathematical function. Nevertheless, at every point in the space the variable has a value (at \mathbf{x} = (x_1, \ldots, x_n), where n is equal to the dimensionality of your space). A regionalized variable is typically represented as Z(\mathbf{x}) and the grid point estimate of it as \hat{Z}(\mathbf{x}). A regionalized variable, then, seems to have two contradictory characteristics:

        a local, random, erratic aspect, which calls to mind the notion of a random variable;

        a general (or average) structured aspect, which requires a certain functional representation.

    Hence we are dealing with a naturally occurring property (variable) whose characteristics are intermediate between those of a truly random variable and those of a completely deterministic one. In addition, this variable can have what is known as a drift associated with it. Drifts are generally handled with trend surface analysis: they can be identified and subtracted out of the data in much the same way an offset can be subtracted out of a data set.

    5.3.2 Semivariance

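    First, recall the definition of variance:

        s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (Z_i - \bar{Z})^2    (5.13)

    The variance of a data set is a single number (a scalar). The semivariance, by contrast, is a curve (a vector) derived from the data according to:

        \gamma^*(h) = \frac{1}{2 N(h)} \sum_{i=1}^{N(h)} \left[ Z(\mathbf{x}_i) - Z(\mathbf{x}_i + h) \right]^2    (5.14)

    where the asterisk indicates an experimental variogram computed from the data, h is the lag distance between data point pairs, and N(h) is the number of pairs separated by h. There are also theoretical semivariograms, which model the structure of the underlying correlation between data points, such as the exponential model:

        \gamma(h) = c_0 + c_1 \left[ 1 - \exp(-h / a) \right]    (5.15)

    where c_0 equals the nugget, c_1 equals the sill (strictly, the curve levels off at c_0 + c_1, which equals c_1 when there is no nugget) and a equals the range of the semivariogram model.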
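    Eqn 5.14 requires grouping pairs by lag. A hedged sketch, assuming isotropic data, Euclidean distances and fixed-width lag classes (a choice the text does not specify); the exponential model is written in the form c0 + c1[1 - exp(-h/a)], which other texts sometimes scale differently. Function names and data are hypothetical.

```python
import math


def experimental_semivariogram(data_xy, data_z, lag_width, n_lags):
    """Classical experimental semivariogram (Eqn 5.14), grouped into lag classes.

    Pairs whose separation falls in [k*lag_width, (k+1)*lag_width) share class k.
    Returns (mean lag, gamma*, number of pairs N(h)) for each non-empty class.
    """
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    lag_sums = [0.0] * n_lags
    n = len(data_z)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.dist(data_xy[i], data_xy[j])
            k = int(h // lag_width)
            if k < n_lags:
                sums[k] += (data_z[i] - data_z[j]) ** 2
                counts[k] += 1
                lag_sums[k] += h
    return [(lag_sums[k] / counts[k], sums[k] / (2 * counts[k]), counts[k])
            for k in range(n_lags) if counts[k]]


def exponential_model(h, c0, c1, a):
    """Exponential model: nugget c0, sill c1, range parameter a."""
    return c0 + c1 * (1.0 - math.exp(-h / a))


# Hypothetical transect of four equally spaced samples.
transect_xy = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
transect_z = [1.0, 3.0, 2.0, 5.0]
print(experimental_semivariogram(transect_xy, transect_z, 1.0, 3))
# A model curve like Fig 5.7 (no nugget, sill 10, range 5) at a few lags.
print([round(exponential_model(h, 0.0, 10.0, 5.0), 2) for h in (0, 5, 10, 30)])
```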

    5.3.3 The nugget, range and sill

    These three parameters define the semivariogram:

    Nugget (c_0): Represents unresolved, sub-grid scale variation or measurement error, and is seen as the intercept of the variogram.

    Range (a): The scalar that controls the degree of correlation between data points, usually represented as a distance.

    Sill (c_1): The value of the semivariance as the lag (h) goes to infinity; it is equal to the total variance of the data set.

    Given the two parameters range and sill (plus the nugget, if there is one) and the appropriate semivariogram model, the semivariance can be calculated for any lag h. These quantities are best visualized in Fig 5.7, a simple exponential model of semivariance.

    [Plot: Exponential Semivariogram; horizontal axis Lag (h), 0 to 30]

    Figure 5.7: A simple exponential semivariogram with a range of 5 and a sill of 10.

    The constant offset (c_0) added to the theoretical semivariance models is known as the nugget effect. This constant accounts for the influence of high-concentration centers in the data that prevent the experimental semivariogram from passing through the origin. The term has its beginnings with mining geologists who were looking for nuggets of gold; the nuggets were rarely sampled directly, hence the unresolved, sub-sampling-grid scale variability.

    There are several models of semivariance to pick from; the trick is to pick the one that best fits your data. We will mention later, in our discussions of kriging and cokriging, that if you are estimating the semivariogram experimentally (i.e. from actual data), the linear model often seems to give the best results. But there is quite a bit of debate over which model, if any, is universal. You have already seen the exponential model; there are also the:

    spherical model - which rises to the sill value more quickly than the exponential model; its general equation is:

        \gamma(h) = \begin{cases} c_0 + c_1 \left[ \dfrac{3h}{2a} - \dfrac{h^3}{2a^3} \right], & 0 < h \le a \\ c_0 + c_1, & h > a \end{cases}    (5.16)

    Gaussian model - a semivariogram model that displays parabolic behavior near the origin (unlike the previous models, which display linear behavior near the origin). The formula that describes a Gaussian model is:

        \gamma(h) = c_0 + c_1 \left[ 1 - \exp(-h^2 / a^2) \right]    (5.17)

    linear model - in this model the data do not support any evidence for a sill or a range and instead show semivariance that keeps increasing as the lag increases. This is a key sign that the linear model is the proper choice. In these cases the linear model is concerned only with the slope and intercept of the experimental semivariogram. It is given simply as:

        \gamma(h) = c_0 + b h    (5.18)

    and the slope (b) is nothing more than the ratio of the sill (c_1) to the range (a).
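    For comparison, the bounded models and the linear model in their common textbook forms (our assumption about the exact parameterization), evaluated with the same hypothetical nugget, sill and range:

```python
import math


def exponential(h, c0, c1, a):
    """Exponential model; approaches c0 + c1 only asymptotically."""
    return c0 + c1 * (1.0 - math.exp(-h / a))


def spherical(h, c0, c1, a):
    """Spherical model; reaches c0 + c1 exactly at h = a and stays there."""
    if h >= a:
        return c0 + c1
    r = h / a
    return c0 + c1 * (1.5 * r - 0.5 * r ** 3)


def gaussian(h, c0, c1, a):
    """Gaussian model; parabolic (flat) near the origin."""
    return c0 + c1 * (1.0 - math.exp(-(h / a) ** 2))


def linear(h, c0, b):
    """Linear model; no sill or range, b = c1 / a when both are quoted."""
    return c0 + b * h


# Same nugget (0), sill (10) and range (5) for the bounded models.
for h in (0.5, 2.5, 5.0, 10.0):
    print(h, round(exponential(h, 0, 10, 5), 2), round(spherical(h, 0, 10, 5), 2),
          round(gaussian(h, 0, 10, 5), 2), linear(h, 0, 10 / 5))
```

    The printed rows show the spherical curve climbing to the sill faster than the exponential one, and the Gaussian curve starting off flat.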

    5.3.4 Second Order Stationarity

    Data fields are said to be first order stationary when there is no trend, i.e. the mean of the field is the same in all sub-regions. This is easily accomplished by fitting and removing a trend surface to/from the data (if you know what the trend is in the first place). A data field is second order stationary when, in addition, the variance is constant from one sub-region to the next (more formally, the covariance between two points depends only on their separation, not on their location). We say the data (really the residuals) are homoscedastic, that is to say, equally scattered about a mean of zero.

    5.3.5 Isotropic and anisotropic data

    The simplest semivariance model to envision for your data is one in which the sill and range values are always the same, regardless of the direction being considered. But that is not always the case, and data are often found to display anisotropic behavior in their range. Nevertheless, if the data are second order stationary, the sill will be the same in all directions; if it is not, this is a warning that not all of the large-scale structure has been removed from the data. Consider again an exponential model, but now look at the difference revealed when the semivariances are calculated only in the north-south direction compared to only in the east-west direction (Fig 5.8). Knowledge of these anisotropies is necessary when designing an appropriate semivariogram model of your data prior to kriging.
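    Directional semivariograms like those in Fig 5.8 come from restricting Eqn 5.14 to pairs aligned with one direction. A minimal sketch; the azimuth convention, the angular tolerance, the function name and the synthetic grid are assumptions for illustration.

```python
import math


def directional_semivariogram(data_xy, data_z, azimuth_deg, tol_deg, lag_width, n_lags):
    """Experimental semivariogram from pairs aligned with one direction only.

    azimuth_deg is measured clockwise from north (0 = north-south,
    90 = east-west); a pair is used if its separation vector lies within
    tol_deg of that axis. Empty lag classes are returned as None.
    """
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    n = len(data_z)
    for i in range(n):
        for j in range(i + 1, n):
            dx = data_xy[j][0] - data_xy[i][0]
            dy = data_xy[j][1] - data_xy[i][1]
            h = math.hypot(dx, dy)
            if h == 0.0:
                continue
            pair_az = math.degrees(math.atan2(dx, dy)) % 180.0  # orientation only
            off = abs(pair_az - azimuth_deg % 180.0)
            if min(off, 180.0 - off) > tol_deg:
                continue
            k = int(h // lag_width)
            if k < n_lags:
                sums[k] += (data_z[i] - data_z[j]) ** 2
                counts[k] += 1
    return [sums[k] / (2 * counts[k]) if counts[k] else None for k in range(n_lags)]


# Hypothetical 3 x 3 grid whose values change east-west but not north-south.
pts = [(float(x), float(y)) for x in range(3) for y in range(3)]
vals = [x for x, _ in pts]
print("N-S:", directional_semivariogram(pts, vals, 0.0, 10.0, 1.0, 3))
print("E-W:", directional_semivariogram(pts, vals, 90.0, 10.0, 1.0, 3))
```

    The north-south semivariances are zero because nothing changes in that direction, while the east-west ones grow with lag: an extreme case of anisotropy.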


    [Plot: north-south and east-west semivariograms; horizontal axis Lag (h), 0 to 30]

    Figure 5.8: Two semivariograms showing the presence of anisotropies in the data. In this case the range and sill for the east-west direction are 5 and 8, but in the north-south direction they are 3 and

    5.3.6 Robust semivariogram

    There will be times when you hear references to a robust semivariance estimator. This idea was championed by Noel Cressie and is dealt with in some detail in his book (Statistics for Spatial Data). Basically, it is a variant of Eqn 5.14 that accounts for the effects of outliers in your data. Outliers (data in the tails of your data distribution that fall outside Gaussian expectations) have a tendency to distort the results of Eqn 5.14. Cressie has put forward the following equation to make the experimentally determined semivariogram less sensitive to these outliers (hence, robust):

        \bar{\gamma}(h) = \frac{\left[ \frac{1}{N(h)} \sum_{i=1}^{N(h)} \left| Z(\mathbf{x}_i) - Z(\mathbf{x}_i + h) \right|^{1/2} \right]^4}{2 \left( 0.457 + 0.494 / N(h) \right)}    (5.19)

    While somewhat overwhelming looking, upon inspection we see that this is just Eqn 5.14 modified. By taking the absolute value of the difference between two data points separated by a distance h, taking its square root, averaging over the N(h) data pairs separated by the distance h, and then raising the result to the fourth power, we diminish the effects of these outliers. The denominator is nothing more than a normalization that makes \bar{\gamma} approximately unbiased. This form of the experimental semivariogram is very useful in cases where we have a lot of data from which to estimate the semivariogram and outliers can become an irksome problem, although the equation also works at lower data densities.
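    Side by side, the classical and robust estimators respond very differently to a single outlier. A hedged sketch using the Cressie-Hawkins form with its 0.457 + 0.494/N(h) correction; the lag classes, the function name and the synthetic transect are assumptions.

```python
import math


def classical_and_robust(data_xy, data_z, lag_width, n_lags):
    """Classical (Eqn 5.14) and robust semivariances for each lag class.

    Returns a list with (N, classical, robust) per class, or None if empty.
    """
    sq = [0.0] * n_lags      # sums of squared differences
    roots = [0.0] * n_lags   # sums of |difference| ** 0.5
    counts = [0] * n_lags
    n = len(data_z)
    for i in range(n):
        for j in range(i + 1, n):
            k = int(math.dist(data_xy[i], data_xy[j]) // lag_width)
            if k < n_lags:
                diff = data_z[i] - data_z[j]
                sq[k] += diff ** 2
                roots[k] += math.sqrt(abs(diff))
                counts[k] += 1
    out = []
    for k in range(n_lags):
        N = counts[k]
        if N == 0:
            out.append(None)
            continue
        classical = sq[k] / (2 * N)
        robust = (roots[k] / N) ** 4 / (2 * (0.457 + 0.494 / N))
        out.append((N, classical, robust))
    return out


# Hypothetical transect: five equally spaced samples, the last one an outlier.
xy = [(float(x), 0.0) for x in range(5)]
print(classical_and_robust(xy, [0.0, 0.0, 0.0, 0.0, 10.0], 1.0, 2))
```

    With one anomalous value, the classical estimate at unit lag is 12.5 while the robust one is about 0.34.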

    5.4 Kriging

    Kriging is a method devised by geostatisticians to provide the best local estimate of the mean value of a regionalized variable. Typically this was an ore grade, and the work was motivated by the desire to extract the most value from an ore deposit with the minimum amount of capital investment. The technique and theory of geostatistics have grown since those early days into a field dedicated to finding the best linear unbiased estimator (BLUE) of the unknown characteristic being studied.

    5.4.1 Variogram criticality

    While one can use ANOVA to determine at what level a trend surface is significant, there is still something of an art to determining the correct variogram model to use. Using ANOVA as a guide, one can fit trend surfaces of ever increasing order to the data; eventually your ANOVA will tell you that, at some specified level of significance, a trend surface of that order does not provide a statistically significant improvement in the fit to the data. That's where you stop, keeping the surface whose order is one less. The weights and neighborhood of the trend surface analysis depend upon the semivariance of the data, i.e. upon the structure function the data display. This interdependency between trend surface and semivariance means that there is no unique combination of trend and semivariance, hence the art. The degree of spatial continuity of the data (regionalized variable) is given by the semivariogram (see section 5.3), and some of the types of models used are covered in sections 5.3.2 and 5.3.3.

    5.4.2 Punctual kriging

    To explain what kriging is, we are going to concentrate on the simplest form of kriging, punctual kriging. Consider that you want to find:

        \hat{Z}_p = \sum_{i=1}^{n} W_i Z_i    (5.20)

    That is to say, find the best linear weighted (unbiased) estimate of property Z at point p. In addition, suppose you also want to know what the estimation error is:

        \varepsilon_p = \hat{Z}_p - Z_p    (5.21)

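    That is to say, you want to know the difference between what you estimated Z_p to be and what it really is, a quantity we usually don't know (Z_p). There is a way to control this error: require that the weights sum to one, which results in an unbiased estimate if there is no trend. You can then calculate the error variance as:

        \sigma_\varepsilon^2 = 2 \sum_{i=1}^{n} W_i \gamma(d_{ip}) - \sum_{i=1}^{n} \sum_{j=1}^{n} W_i W_j \gamma(d_{ij})    (5.22)

    where \gamma(d) is the semivariance at separation d, d_{ip} is the distance from data point i to point p, and d_{ij} is the distance between data points i and j.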
    It seems only logical that the closer a data point is to the grid point you wish to estimate, the more weight it should carry. The weights (W_i) and the error of estimate (\sigma_\varepsilon^2) are related to each other through the semivariogram. So, if we had three data points from which to estimate one grid point (as in Fig 5.9), we would have:

        \hat{Z}_p = W_1 Z_1 + W_2 Z_2 + W_3 Z_3    (5.23)

    for the estimate and:

        W_1 + W_2 + W_3 = 1    (5.24)

    for the weights. The question that remains is: how do we find the best set of W's? Consider Fig 5.9: here we have three data (control) points, and from them we wish to make a best linear unbiased estimate of the Z-field at grid point p.

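    Using the semivariogram we can create the following set of equations:

        \begin{aligned}
        W_1 \gamma(d_{11}) + W_2 \gamma(d_{12}) + W_3 \gamma(d_{13}) &= \gamma(d_{1p}) \\
        W_1 \gamma(d_{21}) + W_2 \gamma(d_{22}) + W_3 \gamma(d_{23}) &= \gamma(d_{2p}) \\
        W_1 \gamma(d_{31}) + W_2 \gamma(d_{32}) + W_3 \gamma(d_{33}) &= \gamma(d_{3p})
        \end{aligned}    (5.25)

    where \gamma(d_{ij}) is the semivariance over the distance between control points i and j (so \gamma(d_{ii}) = 0), and \gamma(d_{ip}) is the semivariance over the distance between control point i and the grid point p. Eqn 5.25 gives us three unknowns and, together with Eqn 5.24, four equations. To force Eqn 5.24 to always be true, we add a slack variable (\lambda), resulting in a matrix set of equations like:

        \begin{bmatrix}
        \gamma(d_{11}) & \gamma(d_{12}) & \gamma(d_{13}) & 1 \\
        \gamma(d_{21}) & \gamma(d_{22}) & \gamma(d_{23}) & 1 \\
        \gamma(d_{31}) & \gamma(d_{32}) & \gamma(d_{33}) & 1 \\
        1 & 1 & 1 & 0
        \end{bmatrix}
        \begin{bmatrix} W_1 \\ W_2 \\ W_3 \\ \lambda \end{bmatrix}
        =
        \begin{bmatrix} \gamma(d_{1p}) \\ \gamma(d_{2p}) \\ \gamma(d_{3p}) \\ 1 \end{bmatrix}    (5.27)

    This yields the W_i and \lambda, and one more equation,

        \sigma_\varepsilon^2 = W_1 \gamma(d_{1p}) + W_2 \gamma(d_{2p}) + W_3 \gamma(d_{3p}) + \lambda    (5.28)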
















    Figure 5.9: The layout of the three control points and the grid point p to be estimated in Example 5.1. The distances (d_{ij}) between control points (dashed lines) are used to calculate the left-hand side of Eqn 5.29, and the distances from the control points to p are used to calculate the right-hand side.

    Equation 5.28 then yields the error of estimate (the kriging variance).
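    Why Eqn 5.28 takes such a simple form can be checked with a short derivation. It assumes, as is standard for punctual kriging, that the error variance is written in terms of semivariances as 2 Σ W_i γ(d_ip) - Σ Σ W_i W_j γ(d_ij), and that the slack variable λ enters each semivariance equation with a plus sign; the notation is ours.

```latex
% Row i of the kriging system, with the slack variable:
\sum_{j=1}^{n} W_j \gamma(d_{ij}) + \lambda = \gamma(d_{ip}), \qquad i = 1, \ldots, n
% Multiply by W_i, sum over i, and use \sum_i W_i = 1 (Eqn 5.24):
\sum_{i=1}^{n} \sum_{j=1}^{n} W_i W_j \gamma(d_{ij}) = \sum_{i=1}^{n} W_i \gamma(d_{ip}) - \lambda
% Substitute into the error variance:
\sigma_\varepsilon^2 = 2 \sum_{i=1}^{n} W_i \gamma(d_{ip})
                     - \sum_{i=1}^{n} \sum_{j=1}^{n} W_i W_j \gamma(d_{ij})
                     = \sum_{i=1}^{n} W_i \gamma(d_{ip}) + \lambda
```

    With n = 3 this is exactly the three-point error equation; the slack variable carries the part of the double sum that the weights alone cannot account for.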

    Now we want to point out that something really cool is happening here. If you stop and think about it, you may wonder why the weights should apply to both the data points and the semivariances. We shouldn't have any problem accepting Eqn 5.23; after all, it's just the best linearly weighted combination of the surrounding data points. But what about Eqn 5.25? Why should these equations also be true? Well, strictly they aren't, not until you add the slack variable (\lambda) that allows Eqn 5.24 to always be true. What insight does this give you into the nature of regionalized variables? We'll let you ponder that for a while.

    Now it sometimes happens that you don't want to, or can't, remove the trend surface prior to kriging. It is still possible to come up with a best linear unbiased estimate of your grid points using Universal Kriging; the matrix you form is even more complicated than the one in Eqn 5.27 and is covered in Davis, Chapter 5.


    Table 5.1: Example 5.1

        Location     x coordinate (km)   y coordinate (km)   Water elevation (m)
        Well 1       3.0                 4.0                 120
        Well 2       6.3                 3.4                 103
        Well 3       2.0                 1.3                 142
        Your site    3.0                 3.0                 ???

    5.4.3 An example

    This example is taken from Davis, Chapter 5, and addresses only the concept of punctual kriging. Suppose you wanted to dig a well and wanted a good estimate of the elevation of the water table before you began digging. Suppose further that you had three wells already dug, distributed about your proposed site (p) in much the same fashion as the control points in Fig 5.9. Given the data in Table 5.1, you can use punctual kriging to make a best linear unbiased estimate of the water table elevation at your proposed site.

    From this information and a structure analysis (semivariogram) you can fill out the equations in Eqn 5.27 and then solve for the water table elevation at your site. The semivariogram analysis revealed a linear semivariogram out to 20 km with an intercept of zero and a slope of 4 m²/km. So the matrices in Eqn 5.27 look like:

        \begin{bmatrix}
        0 & 13.42 & 11.52 & 1 \\
        13.42 & 0 & 19.14 & 1 \\
        11.52 & 19.14 & 0 & 1 \\
        1 & 1 & 1 & 0
        \end{bmatrix}
        \begin{bmatrix} W_1 \\ W_2 \\ W_3 \\ \lambda \end{bmatrix}
        =
        \begin{bmatrix} 4.00 \\ 13.30 \\ 7.89 \\ 1 \end{bmatrix}    (5.29)
    The numbers on the left-hand side of the equation come from the semivariances between control points, calculated from the distances between them and the linear semivariogram model. The numbers on the right-hand side come from the distances between the proposed site and each control point, again through the linear semivariogram model.

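    If the condition number of the matrix on the left-hand side isn't too bad, you can invert it directly to solve for the W's and λ; otherwise, you can use SVD to solve for the answer. Either way, in this case you get a column vector of:

    $$
    \begin{bmatrix} W_1 \\ W_2 \\ W_3 \\ \lambda \end{bmatrix}
    =
    \begin{bmatrix} 0.6039 \\ 0.0868 \\ 0.3093 \\ -0.7267 \end{bmatrix}
    $$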
    which, when substituted into Eqn 5.23, gives an estimate of 125.3 m, and Eqn 5.28 yields an error estimate (estimation variance) of 5.28 m². The square root of this number is one standard deviation (2.30 m), which bounds an approximately 68% confidence interval. So plus or minus two standard deviations (125.3 ± 4.6 m) gives the elevation of the water table at your proposed site with approximately 95% confidence.

    MATLAB's answers are a little different from the ones in Davis (1986), but we attribute that to the fact that Davis does not use singular value decomposition to invert his matrices (see Davis, 1986, Chap 3).

    Glover, Jenkins and Doney; 9 May 2005 DRAFT 121
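    The punctual kriging calculation above can be reproduced in a few lines. What follows is a minimal Python/NumPy sketch (not the course's MATLAB code), assuming the linear semivariogram γ(h) = 4h (m², with h in km) and the data in Table 5.1. It builds the left- and right-hand sides of Eqn 5.27, solves them with the SVD-based pseudoinverse `np.linalg.pinv` (the analogue of MATLAB's `pinv`), and then applies the weights as in Eqns 5.23 and 5.28. The function names `linear_semivariogram` and `punctual_krige` are our own, hypothetical helpers.

```python
import numpy as np

def linear_semivariogram(h, slope=4.0):
    """Linear semivariogram with zero intercept: gamma(h) = slope * h (m^2, h in km)."""
    return slope * h

def punctual_krige(xy, z, site, gamma=linear_semivariogram):
    """Ordinary (punctual) kriging estimate and estimation variance at one site.

    xy   : (n, 2) control-point coordinates (km)
    z    : (n,)   observed values at the control points (m)
    site : (2,)   coordinates of the point to estimate (km)
    """
    n = len(z)
    # Left-hand side: semivariances between every pair of control points,
    # bordered by a row and column of ones for the unbiasedness condition.
    dists = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(dists)
    A[n, n] = 0.0
    # Right-hand side: semivariances between the site and each control point.
    b = np.ones(n + 1)
    b[:n] = gamma(np.linalg.norm(xy - site, axis=1))
    # SVD-based solve; identical to inv(A) @ b when A is well conditioned.
    sol = np.linalg.pinv(A) @ b
    weights, lam = sol[:n], sol[n]
    estimate = weights @ z               # weighted sum of the control values
    variance = weights @ b[:n] + lam     # estimation variance (m^2)
    return estimate, variance, weights, lam

# Table 5.1: well coordinates (km) and water table elevations (m).
wells = np.array([[3.0, 4.0], [6.3, 3.4], [2.0, 1.3]])
elev = np.array([120.0, 103.0, 142.0])

est, var, w, lam = punctual_krige(wells, elev, np.array([3.0, 3.0]))
sd = np.sqrt(var)
print(f"estimate = {est:.1f} m, variance = {var:.2f} m^2, sd = {sd:.2f} m")
print(f"~95% interval: {est - 2 * sd:.1f} to {est + 2 * sd:.1f} m")
```

    Run as is, it should print an estimate of 125.3 m, a variance of 5.28 m², and a standard deviation of 2.30 m, matching the values quoted in the text; the weights come out near 0.604, 0.087 and 0.309.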

    5.5 Cokriging with MATLAB

    We have been very fortunate to obtain from Denis Marcotte (via e-mail) copies of the m-files published in Marcotte (1991). This section of the lecture notes covers how to use this program. Although it was not covered in lecture, this very powerful program will be extremely useful to any of you who must make objective grids of your data during your careers.

    The concept of cokriging is nothing more than a multivariate extension of the kriging technique we went over in class, which is covered in section 5.4 of these lecture notes. Instead of going through all of the machinations necessary for kriging one property at a time, we grid all of the properties of interest in one calculation. In addition, covariance information about the way the properties relate to each other is used to improve the grid estimates and reduce the error associated with the grid.

    5.5.1 Estimating the variogram

    Along with the types of variograms estimated in section 5.3 of these lecture notes, cross-variograms are also necessary. These are logical extensions of the variograms we have already dealt with. Remember that the semivariance is given by:

    $$
    \gamma(h) = \frac{1}{2n}\sum_{i=1}^{n}\left[z(x_i) - z(x_i + h)\right]^2
    $$

    The cross-semivariance is given by:

    $$
    \gamma_{jk}(h) = \frac{1}{2n}\sum_{i=1}^{n}\left[z_j(x_i) - z_j(x_i + h)\right]\left[z_k(x_i) - z_k(x_i + h)\right]
    $$

    where n refers to the number of data pairs that are separated by the same distance h, and z_j and z_k are two of the properties being gridded; when j = k you have the definition of the semivariogram. One interesting thing about the cross-semivariance is that it can take on negative values. The semivariance must, by definition, always be non-negative, but the cross-semivariance can be negative because the value of one property may be increasing while the other in the pair is decreasing.
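    To make the two definitions concrete, here is a small Python/NumPy sketch for data sampled at regular intervals along a 1-D transect, where a lag of `lag` samples plays the role of the separation distance h. The function names and the two synthetic "properties" are our own illustrations; real data on irregular spacing would need pairs binned by distance instead. Pairing a property with a scaled mirror image of itself shows a negative cross-semivariance.

```python
import numpy as np

def cross_semivariance(zj, zk, lag):
    """Experimental cross-semivariance of two properties at a positive integer lag.

    Uses all n pairs of samples separated by `lag` positions:
    gamma_jk(h) = 1/(2n) * sum[(zj(x) - zj(x+h)) * (zk(x) - zk(x+h))]
    """
    dj = zj[:-lag] - zj[lag:]   # zj(x) - zj(x + h) for every pair
    dk = zk[:-lag] - zk[lag:]   # zk(x) - zk(x + h) for the same pairs
    return np.sum(dj * dk) / (2 * len(dj))

def semivariance(z, lag):
    """Ordinary semivariance: the cross-semivariance of z with itself (j = k)."""
    return cross_semivariance(z, z, lag)

x = np.linspace(0.0, 10.0, 101)
temp = np.sin(x)               # hypothetical property 1
nutrient = -0.5 * np.sin(x)    # hypothetical property 2: falls where temp rises

print(semivariance(temp, 5))                   # > 0
print(cross_semivariance(temp, nutrient, 5))   # < 0
```

    Because `nutrient` is just `temp` scaled by -0.5, the cross-semivariance here is exactly -0.5 times the semivariance of `temp`.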


    5.5.2 The coregionalized model

    As we discussed earlier, a regionalized variable is a variable that is distributed in space, where the meaning of space can be extended to include phenomena that are generally thought of as occurring in time. A regionalized phenomenon can be represented by several inter-correlated variables, for example, lead-zinc deposits or nutrients in the ocean. In that case there may be some advantage to studying them simultaneously. This is an extension of regionalized variable theory to multivariate space, and it amounts to a coregionalized model. We can see from Eqn 5.32 that the cross-variogram is symmetric in its two property indices (swapping the two properties leaves it unchanged), which is not always the case in the covariance matrix formed from the data.
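    The symmetry claim can be checked numerically. In this hypothetical Python/NumPy sketch, the second property is the first one delayed by one sample. The lagged cross-covariance (our own simple estimator, computed from anomalies about each property's mean) changes when the order of the two properties is swapped, while the cross-semivariance does not.

```python
import numpy as np

def cross_semivariance(zj, zk, lag):
    """gamma_jk(h) = 1/(2n) * sum[(zj(x) - zj(x+h)) * (zk(x) - zk(x+h))]."""
    dj = zj[:-lag] - zj[lag:]
    dk = zk[:-lag] - zk[lag:]
    return np.sum(dj * dk) / (2 * len(dj))

def cross_covariance(zj, zk, lag):
    """C_jk(h): mean of (zj(x) - mean_j) * (zk(x+h) - mean_k) over all pairs."""
    aj = zj - zj.mean()
    ak = zk - zk.mean()
    return np.mean(aj[:-lag] * ak[lag:])

z1 = np.array([0.0, 1.0, 0.0, 0.0, 0.0])
z2 = np.roll(z1, 1)   # z1 delayed by one sample: [0, 0, 1, 0, 0]

print(cross_semivariance(z1, z2, 1), cross_semivariance(z2, z1, 1))  # equal
print(cross_covariance(z1, z2, 1), cross_covariance(z2, z1, 1))      # differ (about 0.19 vs -0.06)
```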

    5.5.3 Using cokri.m

    In this section we will try to give you our best understanding of the program cokri.m. In this way we hope to make the simplest and most straightforward application of this program available to you, while leaving open the possibility of more complicated uses in the future.

    Sometimes the easiest way to understand a program is to understand, as well as possible, what its input and output variables are. But first, let's define some of the indices we will be using when talking about the parts of cokri.m: n represents the number of data points (real ones, not estimated ones); d represents the number of dimensions you are working with (in the water table example above you have x and y coordinates, hence d = 2; remember that the elevation of the water table was your regionalized variable); lowercase p represents the number of properties you are working with (again, in the example above there was only water table elevation, so p = 1); m represents the total number of grid points (nodes) that you are working with; and nm represents the number of variogram models you are working with. Now for the input and output variables. In the case of cokri.m they are as follows:


    x is the n by (d + p) matrix of data points. Here n is the total number of sample locations (stations, well locations, etc.), p is the number of properties you are estimating, and d is the dimensionality of the problem being studied (1-D, 2-D, 3-D, etc.).

    x0 is the m by d matrix of points on your grid that you will be kriging (cokriging) onto. Here m is the number of grid points; e.g., if you are working a 2-D problem and decide to put your estimates onto a 21 by 57 point grid, m will be equal to 1197. x0 is then represented not as a 21 by 57 array but as a 1197 by 2 matrix of coordinate doublets.

    model is perhaps one of the trickiest variables in the program. It represents half of the coregionalization model to be used (the sills in c are the other half). In general it is an nm by (d + 1) matrix. If you are using only one model in 3-D, then model is a 1 by 4 matrix wherein the first column is the code of the model (1=nugget, 2=exponential, 3=gaussian, 4=spherical, 5=linear). The remaining columns represent the range of your model in the x, y, and z directions. A special note should be made here about the use of a linear model. As stated in the help information of cokri.m, the ranges in a linear model are arbitrary, so that when they are divided into the (also arbitrary) sill values in c, they produce the slope of the linear semivariogram model in that direction.

    c is the variable containing the (nm × p) by p sills of the coregionalization model in use. Here nm is the number of variogram models being applied to the problem. For example, one might wish to combine the effects of a nugget model with a linear model for three properties in 3 dimensions; then nm = 2, model is a 2 by 4 matrix, and c is a 6 by 3 matrix of numbers. A nugget model is indicated when the intercept of a semivariogram model is not zero, and that intercept value is put in the first p by p sub-matrix of c to correspond to the first row of the model variable.

    itype is a scalar variable indicating the type of cokriging to be done. Its five possible values cover just about everything, from simple cokriging to universal cokriging with a trend

    surface of order 2. In general, simple cokriging should be used when the mean of the data is

    known and the data field is globally stationary in its mean as well as locally stationary in it


    avg: Marcotte (1991) states in his paper that this variable is not used, but later he uses it in one of his examples. We cannot get the program to run when doing simple cokriging unless we provide a 1 by p matrix of the averages of the individual properties being cokriged.

    block is a 1 by d vector of the size of the blocks to be cokriged. If we were certain of the volume of our individual samples, we could use something other than point kriging, in which case any positive values will work.

    nd is a 1 by d vector of the discretization grid for cokriging; if using point cokriging, make them all ones.

    ival is a scalar describing whether or not cross-validation should be done, and how. We find it easier and quicker to run the program with ival set to zero for no cross-validation.

    nk is a scalar indicating the number of nearest neighbors from the input matrix x to use in estimating each cokriged grid point. It is difficult to give hard and fast rules for how large to make this. You may wish cokri.m to use all of the data points, in which case set this scalar to a very large number; on the other hand, you may wish only local effects to factor into the weighted estimates for the grid point. If you don't get satisfactory results the first time around, increase or decrease this number.

    rad is a scalar that describes the search radius for the nearest neighbors in x. Clearly nk and rad are interrelated, and one helps constrain the other. It is also clear that the coordinates all need to be in the same units; if they are not, standardization helps.


    ntok is a scalar describing how many grid points in x0 will be grouped together and cokriged as one. When ntok is greater than one, the points inside the search radius will be found from the centroid location of the group.


    x0s is, of course, your answer. It is an m by (d + p) matrix of the grid point estimates. The first d columns correspond to the grid point coordinates given in x0, and the last p columns correspond to the estimates of the properties at those grid point coordinates.

    s is an m by (d + p) matrix of the error estimates of the grid points. This is the big benefit of kriging: it provides you with not only an estimate of a property's value at a grid point, but also an estimate of the uncertainty in that estimate.

    sv is a 1 by p vector of variances of points in the universe.

    id is an (nk × p) by 2 matrix of the identifiers of the λ (or, in Davis, W) weights for the last cokriging system solved (i.e., the last grid point system of equations).

    l is an ((nk × p) + nc) by (ntok × p) matrix with the λ (or W) weights and Lagrange multipliers of the last cokriging system solved. Here nc is the number of constraints (Lagrange multipliers) applied to the cokriging system; simple cokriging has none.
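    Getting the array shapes right is most of the battle with cokri.m. The Python/NumPy sketch below does not call or reimplement cokri.m (which is MATLAB); it only builds arrays in the shapes listed above for two cases from the text: the water table example (d = 2, p = 1, a 21 by 57 grid) and the nugget-plus-linear model for three properties in 3-D. The grid extent, model ranges, and sill values are placeholders, not values from the text. The helper `nearest_neighbors` is our own hypothetical illustration of how nk and rad jointly limit the data used for one grid point; cokri.m's own search may differ in detail.

```python
import numpy as np

# x: n by (d + p) data matrix for the water table example (d = 2, p = 1).
wells = np.array([[3.0, 4.0, 120.0],
                  [6.3, 3.4, 103.0],
                  [2.0, 1.3, 142.0]])   # columns: x (km), y (km), elevation (m)

# x0: m by d grid; a 21 by 57 grid flattens to m = 1197 coordinate doublets.
gx, gy = np.meshgrid(np.linspace(0.0, 7.0, 21),    # placeholder x extent
                     np.linspace(0.0, 5.0, 57))    # placeholder y extent
x0 = np.column_stack([gx.ravel(), gy.ravel()])     # shape (1197, 2)

# model and c: nugget + linear model for p = 3 properties in 3-D (nm = 2).
p = 3
model = np.array([[1, 1.0, 1.0, 1.0],    # code 1 = nugget; placeholder ranges
                  [5, 1.0, 1.0, 1.0]])   # code 5 = linear; nm by (d + 1) = 2 by 4
nugget_sills = 0.1 * np.eye(p)           # placeholder p by p nugget sills (first block)
linear_sills = np.eye(p)                 # placeholder p by p sills for the linear model
c = np.vstack([nugget_sills, linear_sills])   # (nm * p) by p = 6 by 3

def nearest_neighbors(data_xy, point, nk, rad):
    """Indices of at most nk data points lying within distance rad of point."""
    dist = np.linalg.norm(data_xy - point, axis=1)
    inside = np.flatnonzero(dist <= rad)           # points inside the search radius
    return inside[np.argsort(dist[inside])][:nk]   # keep the nk closest of those

print(x0.shape, model.shape, c.shape)
print(nearest_neighbors(wells[:, :2], np.array([3.0, 3.0]), nk=2, rad=5.0))
```

    For the proposed well site at (3.0, 3.0) km, all three wells lie within 5 km, so with nk = 2 the search keeps Wells 1 and 3 (indices 0 and 2), the two closest.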

    A word of caution: for some reason, Marcotte has set up cokri.m to turn off case sensitivity. When the program is finished running, the variables Axb and axb are considered the same, and making reference to a variable such as Axb will generate a "variable or function not found" error. Simply issue the command casesen and case sensitivity will be restored. We have modified the code we provide to you by simply commenting out the casesen off command (as %casesen off), so you needn't worry about this at first (but it is available; just remove the % sign).

    5.5.4 Things to Remember

    When using cokri.m it may be helpful to remember the following three insights as to how the

    program works.

    1. If the data have been properly detrended (rendered second-order stationary), then it is only logical to assume that the nuggets will be equal regardless of direction. As the lag goes to zero, the subgrid-scale noise (composed of both real geophysical noise and measurement error) will converge to the same value for all directions.

    2. By a similar argument, if the data have been properly detrended, then the sills have to be equal regardless of direction for each property (and cross-property). Think about the mostly gridded example in class: the sill represents the total variance in the anomalies (residuals = data minus trend). Just because you calculated the semivariances in different directions doesn't mean you haven't used all of the data points. Since you've used all of the data and the sill represents the total variance contained in the data (anomalies), the sills will also be equal regardless of direction.

    3. This last one may seem a little odd. The ranges should all be the same in a given direction, regardless of the property or cross-property. Think of it this way: the decorrelation length scale is always the same in a given direction; the medium (seawater, granitic batholith, etc.) doesn't change even though the property might.

    Now, of course, we've told you how difficult it is to render your data second-order stationary, and the above insights might not be strictly, numerically true. Your options are to return to your trend surface analysis and see if you can find a better filter to remove the large-scale trend that is contaminating your anomalies. Or, if the fitted parameters are close in value (remember that nlleasqr.m gives you error estimates of these parameters), averaging them can still yield useful results. Remember, the sill and nugget are averaged over directions, but the ranges are averaged over properties.
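    The averaging rule in the last sentence can be written down directly. In this hypothetical Python sketch, `fits` holds nugget, sill, and range values from separate semivariogram fits (for example, from nlleasqr.m) keyed by (property, direction); the property names, directions, and numbers are made up for illustration. Nuggets and sills are averaged over directions for each property, and ranges are averaged over properties for each direction. Cross-properties would be handled the same way as properties.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical fitted parameters keyed by (property, direction).
fits = {
    ("temperature", "east-west"):   {"nugget": 0.10, "sill": 1.00, "range": 50.0},
    ("temperature", "north-south"): {"nugget": 0.12, "sill": 0.96, "range": 30.0},
    ("nitrate", "east-west"):       {"nugget": 0.30, "sill": 2.10, "range": 54.0},
    ("nitrate", "north-south"):     {"nugget": 0.26, "sill": 2.02, "range": 34.0},
}

by_property = defaultdict(list)    # fits grouped by property (for nugget and sill)
by_direction = defaultdict(list)   # ranges grouped by direction
for (prop, direction), f in fits.items():
    by_property[prop].append(f)
    by_direction[direction].append(f["range"])

# Nugget and sill: average over directions, one value per property.
nugget = {prop: mean(f["nugget"] for f in fs) for prop, fs in by_property.items()}
sill = {prop: mean(f["sill"] for f in fs) for prop, fs in by_property.items()}
# Range: average over properties, one value per direction.
ranges = {direction: mean(r) for direction, r in by_direction.items()}

print(nugget, sill, ranges)
```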

    5.6 Problems

    All of your problem sets are served from the web page:


    which can be reached via a number of links from the main course web page. In addition, the date each problem set comes out, the date it is due, and the date the answers will be posted are also available in a number of locations (including the one above) on the course web page.


    Clark, I., 1979, Practical Geostatistics, Elsevier, New York, 129 p.

    Cressie, N.A., 1993, Statistics for Spatial Data, Wiley-Interscience, New York, 900 p.

    Davis, J.C., 1986, Statistics and Data Analysis in Geology, 2nd Edition, John Wiley and Sons, New York, 646 p.

    deBoor, C., 1978, A Practical Guide to Splines, Springer-Verlag, New York, 392 p.

    Marcotte, D., 1991, Cokriging with MATLAB, Comp. and Geosci., 17(9): 1265-1280.

