Stevens.olsen.jsm Short Course.2006


Spatial Sampling

Don Stevens, Department of Statistics

Oregon State University

Corvallis, Oregon

[email protected]

Anthony (Tony) R. Olsen, Western Ecology Division

US EPA

Corvallis, Oregon

[email protected]

Aquatic Resource Monitoring Web Page

http://www.epa.gov/nheerl/arm

Short Course Learning Objectives

• learn how to develop survey designs when the population of interest occurs in 2-dimensional space

• learn the distinction between, and the importance of, populations that can be modeled as points, lines, and polygons

• learn how spatial survey designs can be selected using GIS shapefiles and the R statistical language

• be introduced to a local neighborhood variance estimator

Course Outline

• Population elements: modeled as points, lines, or polygons. The point case is a finite population; the other two are continuous (infinite) populations. How does this impact the survey design process?

• What are spatial survey designs and why are they different?

• What is important about ensuring spatial balance?

• Generalized Random Tessellation Stratified (GRTS) survey designs

Ohio River illustration of spatial balance process

2-dimensional illustration of spatial balance process

Theory behind GRTS survey designs

• Need for flexibility in spatial survey designs

Imperfect sampling frames

Addressing non-response issues

• Spatial balance with respect to space versus spatial balance with respect to target population

• Sample selection for GRTS survey designs using R and spsurvey library

• Variance estimation for GRTS

• Population estimation for GRTS using R and spsurvey library

Spatial Sampling Framework

• Applies to all natural resource monitoring

• Monitoring pieces must be designed and

implemented to fit together

• View as information system

• Short course focuses on (1) “Design”, specifically sample selection, and (2) “Assess”, population estimation

• Reference: National Water Quality Monitoring Council, Water Resources IMPACT, September 2003 issue

• Kish (1965): “The survey objectives should determine the sample design; but the determination is actually a two-way process…”

• Initially, objectives are stated as common-sense statements; the challenge is to transform them into quantitative questions that can be used to specify the design.

• Statistical perspective

Know whether a monitoring design can answer the question

Know when the question is not precise enough – multiple interpretations

Figure: Monitoring cycle, with stages Develop monitoring objectives, Design monitoring program, Collect field and lab data, and Convey results and findings

• Key components of monitoring design

What resource will be monitored? (target population)

What will be measured? (variables or indicators)

How will indicators be measured? (response design)

When and how frequently will the measurements be taken? (temporal design)

Where will the measurements be taken? (spatial survey design)

• Statistical perspective

Target population and its representation, the sample frame

Spatial survey design for site selection

Panel design for monitoring across years



Spatial Survey Design Process

Figure: Resource Characteristics, Monitoring Objectives, and Institutional Constraints → Design Requirements → Target Population → Sample Frame → Spatial Survey Design → Site Selection using R → Design File

Sampling in Space: Spatial Characteristics of Target Population Elements & Sample Units

Spatial Objects

• 0-, 1-, 2-dimensional representation (points, lines, polygons)

Point-like

• Finite population of discrete units, e.g., small- to medium-sized lakes

Linear

• Width is very small relative to length, e.g., streams or riparian vegetation belts

Extensive

• Covers large area in a more or less continuous and connected fashion, e.g., a forest stand, large estuary (San Francisco Bay), or wetland (Florida Everglades)

Point Example: Elements and Sample Units are all centroids of lakes/reservoirs

Linear Example: Elements and Sample Units are all points in the linear network

Polygon Example: Elements and Sample Units are all points within the polygons

Linear Network Modeled as Points: elements and sample units are all segments defined by linear network

Polygon (Minnesota) modeled as point sample units

Spatial Survey Design and Spatial Balance


• Survey design for natural resources: spatial relationships in the population are critical

• Elements near one another tend to share substrate

• Elements near one another are subject to the same or similar natural and anthropogenic stressors

• Tobler's First Law of Geography: Things that are close together in space tend to have more similar properties than things that are far apart.

OR

• Spatial correlation functions tend to decrease with distance


• Survey design for natural resources: standard survey design methodology is not well suited to natural resource populations

• Overwhelming emphasis on finite populations

• Space is an intrinsic feature of the resource

• Most natural resource populations are naturally conceptualized as continua of points

Can discretize stream networks by chopping them into fixed-length segments or variable-length reaches, but it is difficult to retain spatial relationships

Can use a grid to divide a forest or estuary into a finite collection of cells, but cells don’t have a natural or ecological interpretation

Inference is then to number, not extent, e.g., number of reaches, not miles of stream

Sampling Natural Resource Populations

• Patterned response (gradients, patches, periodic responses)

• Variable inclusion probability

Ecological importance, economic importance, environmental stressor levels, scientific interest, and political importance are not uniform over the extent of the resource

• Pattern in population occurrence (density)

Stream or lake density (NE US versus Western US)

• Unreliable frame material

• Access difficulty

• Temporal panels often needed



Desirable Properties of Natural Resource Samples

(1) Accommodate varying spatial sample intensity

(2) Spread the sample points evenly and regularly over the domain, subject to (1)

(3) Allow augmentation of the sample after the fact, while maintaining (2)

(4) Accommodate varying population spatial density for finite & linear populations, subject to (1) & (2).


• Natural resource populations exist in a spatial matrix

• Population elements close to one another tend to be more similar than widely separated elements

• Good sampling designs tend to spread out the sample points more or less regularly

• Simple random sampling tends to exhibit uneven spatial patterns

• Some survey techniques address spatial relationships

Basic Spatial Survey Designs

• Simple random sample

• Systematic sample

Regular grid

Regular spacing on linear resource

• Spatially stratified

Strata can be geometric figures (grid cells), political boundaries (township lines), or natural boundaries (watersheds)

Maximum regularity ⇒ few (1 or 2) samples per stratum and equal-area strata

Basic Spatial Survey Designs (continued)

• Spatial ordering

Serpentine strips

Graph-theoretical

• Spatially balanced sample

Combination of simple random and systematic characteristics

Guarantees all possible samples are distributed across the resource (target population)

Spatially Balanced Sample

• A balanced sample has the property that the number of samples in any interval of the range of the response is proportional to the extent of the population in that range.

• Let $n_I$ be the number of samples in the interval $I = (z_1, z_2)$.

• For spatial balance we want $n_I \approx n\,(F(z_2) - F(z_1))$, where $F(\cdot)$ is the distribution function of the response $z$.

• In a perfectly balanced sample, we would have equality, but if we knew enough to perfectly balance, we wouldn’t need to sample.

Spatially Balanced Sample

• For a response with spatial pattern, we get approximate balance over the response by ensuring spatial balance, i.e., that for any $A \subset R$, we have $n_A \approx n|A|/|R|$, where $R$ is the domain of the response.

• Of course, for any equi-probable sample, $E[n_A] = n|A|/|R|$, so we really want $V[n_A]$ to be “small”.
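To make the “small V[n_A]” point concrete, here is a minimal Python simulation sketch. The helper names are hypothetical, and a spatially stratified sample with one uniform point per cell of a 4 × 4 grid stands in for a spatially balanced design. It counts how many of n = 16 points fall in A = [0, 0.3) × [0, 0.3) of the unit square R, so n|A|/|R| = 1.44 for both designs.

```python
import random
import statistics

def srs_points(n, rng):
    """Simple random sample of n points in the unit square."""
    return [(rng.random(), rng.random()) for _ in range(n)]

def stratified_points(m, rng):
    """One uniform point in each cell of an m x m grid (n = m * m points)."""
    h = 1.0 / m
    return [((i + rng.random()) * h, (j + rng.random()) * h)
            for i in range(m) for j in range(m)]

def count_in_A(points, side=0.3):
    """n_A: number of sample points in the square A = [0, side) x [0, side)."""
    return sum(1 for x, y in points if x < side and y < side)

def mean_var_nA(sampler, reps=4000, seed=1):
    """Monte Carlo mean and variance of n_A over repeated samples."""
    rng = random.Random(seed)
    counts = [count_in_A(sampler(rng)) for _ in range(reps)]
    return statistics.mean(counts), statistics.pvariance(counts)

n = 16
srs_mean, srs_var = mean_var_nA(lambda rng: srs_points(n, rng))
str_mean, str_var = mean_var_nA(lambda rng: stratified_points(4, rng))
print(f"n|A|/|R| = {n * 0.09:.2f}")
print(f"SRS:        E[n_A] ~ {srs_mean:.2f}, V[n_A] ~ {srs_var:.2f}")
print(f"stratified: E[n_A] ~ {str_mean:.2f}, V[n_A] ~ {str_var:.2f}")
```

Both designs should give E[n_A] close to 1.44, but for this A the exact variances are 16 × 0.09 × 0.91 ≈ 1.31 for SRS versus about 0.36 for the stratified sample, because only the three cells straddling the edge of A contribute randomness.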



Figure: Simple random sample of a domain with 3 subdomains (subdomains A, B, C; values 28, 28, 15)


• Systematic sample has substantial disadvantages

Well-known problems with periodic response

Less well recognized problem: patch-like response or gradient oriented along grid line

Figure: Systematic sample with 3 subdomains (subdomains A, B, C; values 26, 24, 15)

Figure: Systematic sample with 3 subdomains (subdomains A, B, C; values 32, 20, 16)

Sampling Aquatic Resource Populations

• Systematic sample has substantial disadvantages

Difficult to apply to finite populations, e.g., lakes

Limited flexibility to change sample point density

Difficult to accommodate variable inclusion probability or sample adjustment for frame errors

Figure: Sample point intensity can be changed using nested grids (subdomains A, B, C; values 26, 88, 15)



RANDOM-TESSELLATION STRATIFIED (RTS) DESIGN

• Compromise between systematic & SRS that resolves periodic/patchy response

• Cover the population domain with a grid

Randomly located

Regular (square or triangular)

Spacing chosen to give required spatial resolution

Tile the domain with equal-sized regular polygons containing the grid points

Select one sample point at random from each tessellation polygon
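The RTS steps can be sketched in Python for a domain inside the unit square. This is a minimal illustration under stated assumptions: square (not triangular) cells, a domain given by a hypothetical membership function `in_domain`, and points falling outside the domain simply dropped; `rts_sample` is a hypothetical name, not an spsurvey function.

```python
import random

def rts_sample(in_domain, spacing, rng):
    """Random-tessellation stratified (RTS) sample in the unit square.

    A square grid with cell side `spacing` is shifted by a random offset,
    one uniform point is drawn in each tessellation cell, and points that
    fall outside [0, 1)^2 or outside the domain are discarded.
    """
    ox, oy = rng.uniform(0, spacing), rng.uniform(0, spacing)  # random grid location
    ncell = int(1 / spacing) + 2          # enough shifted cells to cover [0, 1)
    sample = []
    for i in range(ncell):
        for j in range(ncell):
            # uniform point inside shifted cell (i, j)
            x = ox - spacing + (i + rng.random()) * spacing
            y = oy - spacing + (j + rng.random()) * spacing
            if 0 <= x < 1 and 0 <= y < 1 and in_domain(x, y):
                sample.append((x, y))
    return sample

def in_disk(x, y):
    """Example domain: disk of radius 0.5 centred in the unit square."""
    return (x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.25

rng = random.Random(42)
disk_sample = rts_sample(in_disk, 0.125, rng)
square_sample = rts_sample(lambda x, y: True, 0.25, rng)
print(len(disk_sample), len(square_sample))
```

Each cell contributes at most one point, so two sample points can be close only when they sit on either side of a shared cell edge, which is what keeps the sample spread out.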

RANDOM-TESSELLATION STRATIFIED (RTS) DESIGN

• Solves some of systematic sample problems

Non-zero pairwise inclusion probability

Alignment with geographic features of population

Lets points get close together with low probability

RTS Sample

RTS DESIGN

• Does not resolve systematic sample difficulties with

variable probability

finite & linear populations

pattern in population occurrence (density)

unreliable frame material

Limited ability to change density

Generalized Random Tessellation Stratified (GRTS) Survey Designs

• Emphasize spatial balance: every replication of the sample exhibits a spatial density pattern that closely mimics the spatial density pattern of the resource

Historical Context

• GRTS design evolved from EMAP work on global tessellations in the early 1990s

• It became clear that the basic systematic structure did not have enough flexibility to accommodate the characteristics of environmental resource sampling



Theoretical Development of GRTS

Generalized Random-Tessellation Stratified (GRTS) Design

• Conceptual structure:

Population indexed by points contained within a region R

Have inclusion probability p(s) defined on R

Select a sample by picking points

• Finite: points represent units

p(s) is the usual inclusion probability

• Linear: points on the lines

p(s) is a density: E(# sample points)/unit length

• Extensive: points are in region area

p(s) is a density: E(# sample points)/unit area

GRTS Design Mechanics

• Map R into first quadrant of unit square, & add a random offset

• Subdivide unit square into “small” grid cells

At least small enough so that total inclusion probability for a cell (expected number of samples in the cell) is less than 1

Total inclusion probability for cell is sum or integral of p(s) over the extent of the cell

Figure: Population region image (unit square, axes 0.0 to 1.0)

Figure: Population region image + random offset

GRTS Design Mechanics

• Order the cells so that some 2-dimensional proximity relationships are preserved

Can’t preserve everything, because a 1-1, onto, continuous map from unit square to unit interval is impossible

Can get 1-1, onto, & measurable, which is good enough

GRTS uses a quadrant-recursive function, similar to the space-filling curve developed by Giuseppe Peano in 1890.


Figure: Unit square (axes x and y, 0 to 1) subdivided into quadrants and sub-quadrants, each labeled 0, 1, 2, or 3

Assign each cell an address corresponding to the order of subdivision

The address of the shaded quadrant is 0.213

Order the cells following the address order
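The addressing step can be written as repeated halving of the coordinates. This Python sketch uses an assumed digit convention (0 = lower-left, 1 = lower-right, 2 = upper-left, 3 = upper-right); the figure's numbering may differ, and the function names are hypothetical.

```python
def quadrant_address(x, y, levels):
    """Base-4 address digits (t1, t2, ...) of the cell containing (x, y) in [0, 1)^2.

    Illustrative digit convention (an assumption): 0 = lower-left,
    1 = lower-right, 2 = upper-left, 3 = upper-right, applied again
    inside each chosen quadrant.
    """
    digits = []
    for _ in range(levels):
        x, y = 2 * x, 2 * y
        xbit, ybit = int(x), int(y)    # 1 if the point is in the right / upper half
        digits.append(2 * ybit + xbit)
        x, y = x - xbit, y - ybit      # rescale the chosen quadrant back to [0, 1)^2
    return digits

def address_to_fraction(digits):
    """Value of the base-4 fraction 0.t1 t2 t3 ..."""
    return sum(d / 4 ** (k + 1) for k, d in enumerate(digits))

def cell_centres_in_address_order(levels):
    """Centres of the 4**levels grid cells, ordered by their addresses."""
    m = 2 ** levels
    centres = [((i + 0.5) / m, (j + 0.5) / m) for i in range(m) for j in range(m)]
    return sorted(centres, key=lambda c: quadrant_address(c[0], c[1], levels))

print(address_to_fraction([2, 1, 3]))   # address 0.213 in decimal: prints 0.609375
print(cell_centres_in_address_order(1))
```

Sorting by address is exactly the "order the cells following the address order" step: every cell in a quadrant comes before every cell in the next quadrant, at every level.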

GRTS Design Mechanics

• If we carry the process to the limit, letting the grid cell size → 0, the result is a quadrant-recursive function, that is, a function that maps the unit square onto the unit interval such that the image of every (sub)quadrant is an interval.

• The resulting function is 1-1, onto, and measurable.

• Apply a restricted randomization that preserves quadrant recursiveness.

HIERARCHICAL RANDOMIZATION

Each cell address is a base-4 fraction, that is, $t = 0.t_1 t_2 t_3 \ldots$, where each digit $t_i$ is either a 0, 1, 2, or 3. A function $h_p$ is a hierarchical permutation if

$$h_p(t) = 0.\,\pi(t_1)\,\pi_{t_1}(t_2)\,\pi_{t_1 t_2}(t_3)\ldots$$

where $\pi_{t_1 t_2 \ldots t_{n-1}}(\cdot)$ is a possibly distinct permutation of {0,1,2,3} for each unique combination of digits $t_1, t_2, \ldots, t_{n-1}$.

Every time a cell is subdivided, we choose a random permutation to order the sub-quadrants.

HIERARCHICAL RANDOMIZATION

• If the permutations that define $h_p(\cdot)$ are chosen at random and independently from the set of all possible permutations, we call $h_p(\cdot)$ a hierarchical randomization function, and the process of applying $h_p(\cdot)$ hierarchical randomization.

• Compose the basic q-r map with a hierarchical randomization function to get a random, quadrant-recursive function.
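A hierarchical randomization function can be sketched by caching one random permutation of {0, 1, 2, 3} per address prefix. This Python version (hypothetical names) works on finite-length addresses only, which is enough for a grid of “small” cells.

```python
import itertools
import random

def make_hierarchical_randomizer(rng):
    """Return a random hierarchical permutation h(digits) -> digits.

    For each distinct prefix (t1, ..., t_{k-1}) an independent random
    permutation of {0, 1, 2, 3} is drawn (lazily, then cached) and applied
    to the k-th digit t_k, so cells that share a prefix stay together.
    """
    perms = {}

    def h(digits):
        out = []
        for k, d in enumerate(digits):
            prefix = tuple(digits[:k])           # original digits t1 .. t_{k-1}
            if prefix not in perms:
                perms[prefix] = rng.sample(range(4), 4)
            out.append(perms[prefix][d])
        return tuple(out)

    return h

rng = random.Random(2006)
h = make_hierarchical_randomizer(rng)
addresses = list(itertools.product(range(4), repeat=2))   # 00, 01, ..., 33
randomized = {a: h(a) for a in addresses}
# Random quadrant-recursive order of the 16 cells: sort by randomized address.
cell_order = sorted(addresses, key=lambda a: randomized[a])
print(cell_order)
```

Because each digit's permutation depends only on the original digits before it, cells in the same quadrant keep a shared (randomized) leading digit, so sorting by the randomized address gives a random order that is still quadrant-recursive.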

Figure: The 16 cells of a 4 × 4 grid in the unit square (axes x and y, 0 to 1), numbered 1 to 16 in visiting order, with the start of each ordering marked

GRTS Design Mechanics

• The result is a random order of the “small” grid cells such that

All grid cells in the same quadrant have consecutive order positions

But they are randomly ordered within those positions

This holds for all quadrant levels

• This induces a random ordering of population elements



GRTS Design Mechanics

• Assign each grid cell a length equal to its total inclusion probability

• String the lengths in the random order

Result is a line with length equal to target sample size

• Take systematic sample along line (random start + unit interval)

• Map back to population using inverse random q-r function
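The line-building and systematic-sampling steps amount to a cumulative sum plus a search. In this Python sketch (hypothetical names, made-up inclusion probabilities) the cells are assumed to be already in their random quadrant-recursive order; the final step, mapping a hit back to a location inside its cell, is omitted.

```python
import bisect
import itertools
import random

def systematic_on_line(lengths, rng):
    """Unit-spaced systematic sample along a line built from cell lengths.

    `lengths` are the total inclusion probabilities of the cells, already in
    the random quadrant-recursive order; each is assumed to be < 1, so no
    cell is hit twice, and they sum to the target sample size.
    Returns the indices of the cells containing u, u + 1, u + 2, ...
    """
    ends = list(itertools.accumulate(lengths))   # right end of each cell's interval
    u = rng.random()                             # random start in [0, 1)
    hits, point = [], u
    while point < ends[-1]:
        hits.append(bisect.bisect_right(ends, point))  # cell whose interval holds point
        point += 1.0
    return hits

probs = [0.5, 0.3, 0.7, 0.5, 0.2, 0.8, 0.6, 0.4]   # hypothetical; sums to 4
rng = random.Random(7)
print(systematic_on_line(probs, rng))

# Check: each cell's selection frequency should approach its inclusion probability.
reps = 20000
freq = [0] * len(probs)
for _ in range(reps):
    for i in systematic_on_line(probs, rng):
        freq[i] += 1
print([round(f / reps, 2) for f in freq])
```

Since the spacing is 1 and every cell length is below 1, no cell can be hit twice, and each cell is hit with probability equal to its length.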

GRTS DESIGN

Let

• $I = (0, 1]$, $I^2 = (0, 1] \times (0, 1]$

• $\phi(s)$ be a measure on $I^2$ (number, length, area)

• $\pi(s)$ be an inclusion intensity function on $I^2$

• $f : I^2 \to I$ be a hierarchically randomized quadrant-recursive function

GRTS DESIGN

• Map population domain R into $(0, \tfrac{1}{2}] \times (0, \tfrac{1}{2}]$, add a random offset to get image $R^* \subset I^2$

• Set $F(x) = \int_{f^{-1}((0, x))} \pi(s)\, d\phi(s)$

• F(x) is a random distribution function with range (0, M]

GRTS DESIGN

• Pick u1



Reverse Hierarchical Order

• Illustrate for 2 levels of addressing:

First 16 addresses as base-4 numbers

00 01 02 03 10 11 12 13 20 21 22 23 30 31 32 33

Reversed digits

00 10 20 30 01 11 21 31 02 12 22 32 03 13 23 33

Reversed digits as base-10 numbers

0 4 8 12 1 5 9 13 2 6 10 14 3 7 11 15


• The above algorithm works for sample sizes that are powers of 4, i.e., $n = 4^k$

• For other sample sizes, we need to modify it:

Let k be the smallest integer such that $n \le 4^k$

Form the reverse hierarchical order for the integers $1, \ldots, 4^k$

Scale the ordered integers to the range (1, n)

Eliminate any duplicates

• For example, let n = 12

Then k = 2, so that $4^k = 16$

RHO(16) = (1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15, 4, 8, 12, 16)

RHO(12) = (1, 4, 7, 10, 2, 5, 8, 11, 3, 6, 9, 12)
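The reverse hierarchical order and the scaling rule for general n can be sketched in Python as follows. Scaling by ceil(v · n / 4^k) is an assumption: it is one rule that reproduces the RHO(12) example above, and the course's exact rule may differ. Function names are hypothetical.

```python
def rho_power_of_4(k):
    """Reverse hierarchical order of 1..4**k: read each index's k base-4 digits backwards."""
    order = []
    for i in range(4 ** k):
        rev, x = 0, i
        for _ in range(k):
            rev = rev * 4 + x % 4     # take the last digit of x and append it to rev
            x //= 4
        order.append(rev + 1)         # 1-based, as in RHO(16) above
    return order

def rho(n):
    """Reverse hierarchical order for any n >= 1, by scaling RHO(4**k) and dropping duplicates."""
    k = 0
    while 4 ** k < n:
        k += 1                        # smallest k with n <= 4**k
    big = 4 ** k
    out, seen = [], set()
    for v in rho_power_of_4(k):
        scaled = (v * n + big - 1) // big   # ceil(v * n / 4**k), an integer in 1..n
        if scaled not in seen:
            seen.add(scaled)
            out.append(scaled)
    return out

print(rho(16))
print(rho(12))   # [1, 4, 7, 10, 2, 5, 8, 11, 3, 6, 9, 12]
```

Integer arithmetic is used for the ceiling to avoid floating-point rounding; because n ≤ 4^k, every integer from 1 to n is reached, so the result is always a permutation of 1..n.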

SPATIAL PROPERTIES OF REVERSE HIERARCHICAL ORDERED GRTS SAMPLE

• The complete sample is nearly regular, capturing much of the potential efficiency of a systematic sample without the potential flaws

• Any subsample consisting of a consecutive subsequence is almost as regular as the full sample; in particular, the subsequence $S_k = \{s_1, s_2, \ldots, s_k\}$, for $k \le M$, is a spatially well-balanced sample.

• Any consecutive sequence subsample, restricted to the accessible domain, is a spatially well-balanced sample of the accessible domain.


Figure: Inclusion probability density surface (axes X, Y, Z); region is (0,1) × (0,0.8); companion plot of the region in the unit square (axes 0.0 to 1.0)


HIERARCHICAL ORDERED GRTS SAMPLE

• Assess spatial balance by variance of size of Voronoi polygons, compared to SRS sample of the same size.

• Voronoi polygons for a set of points $\{s_1, s_2, \ldots, s_k\}$: the ith polygon is the collection of points in the domain that are closer to $s_i$ than to any other $s_j$ in the set.

• Estimate variance by 1000 replications of a sample of size 256 in unit square
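A scaled-down version of this comparison can be sketched in pure Python. Assumptions: Voronoi areas are approximated by counting pixel centres nearest to each point, a one-point-per-cell stratified sample stands in for a GRTS sample, and the sizes are cut to 16 points and 40 replications to keep the run short; all names are hypothetical.

```python
import random
import statistics

def srs_points(n, rng):
    """Simple random sample of n points in the unit square."""
    return [(rng.random(), rng.random()) for _ in range(n)]

def stratified_points(m, rng):
    """One uniform point per cell of an m x m grid: a simple spatially balanced stand-in."""
    return [((i + rng.random()) / m, (j + rng.random()) / m)
            for i in range(m) for j in range(m)]

def voronoi_areas(points, g=40):
    """Approximate Voronoi polygon areas inside the unit square by assigning
    each of g * g pixel centres to its nearest sample point."""
    counts = [0] * len(points)
    for a in range(g):
        px = (a + 0.5) / g
        for b in range(g):
            py = (b + 0.5) / g
            nearest = min(range(len(points)),
                          key=lambda i: (points[i][0] - px) ** 2 + (points[i][1] - py) ** 2)
            counts[nearest] += 1
    return [c / (g * g) for c in counts]

def mean_area_variance(sampler, reps=40, seed=0):
    """Average, over replicate samples, of the variance of Voronoi polygon areas."""
    rng = random.Random(seed)
    return statistics.mean(statistics.pvariance(voronoi_areas(sampler(rng)))
                           for _ in range(reps))

v_srs = mean_area_variance(lambda rng: srs_points(16, rng))
v_bal = mean_area_variance(lambda rng: stratified_points(4, rng))
print(f"polygon area variance ratio (balanced / SRS): {v_bal / v_srs:.2f}")
```

A ratio well below 1 indicates the more regular sample; the slide's own results use GRTS samples of 256 points and 1000 replications.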

Spatial Balance: 256 points


HIERARCHICAL ORDERED GRTS SAMPLE

• Compare regularity as points are added one at a time, following reverse hierarchical order, under 4 scenarios:

Complete, continuous domain

Domains with “holes” excluding 20%, modeling non-response/access refusal

• 20 randomly located square holes, constant size

• 20 randomly located square holes, increasing linearly in size

• 10 randomly located square holes, increasing exponentially in size

• Holes model inaccessibility: elements that are in the target domain but cannot be sampled for some reason

Figure: Inaccessible Domain Patterns (20% Inaccessible); panels: Constant Size, Linear Increase, Exponential Increase


Spatial Balance: With oversample

Figure: Polygon area variance ratio (0.0 to 1.0) versus point density (0 to 250) for four cases: continuous domain with no voids; constant polygon size, total perimeter = 8; linearly increasing polygon size, total perimeter = 7.6; exponentially increasing polygon size, total perimeter = 4.2

Figure panels: 20-point GRTS sample; four 20-point GRTS panels; five 20-point GRTS panels; five 20-point GRTS panels + special study area

Ohio River GRTS Process Example

Ohio River GRTS Illustration

Create Straight Line

All Reaches

Create Line for Ohio River (length)

Divide into 4 segments

Create Random Sequence (3, 0, 2, 1)

Assign address & color

Sort address

Repeat for each segment


Repeating Process

Divide each segment into 4 segments

Create New Random Sequences (2, 0, 3, 1) (1, 2, 0, 3) (0, 3, 1, 2) (2, 3, 1, 0)

Assign address & colors

Sort

Figure: two-digit segment addresses along the river, before and after sorting
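The two-level Ohio River addressing can be reproduced with the random sequences listed above. This Python sketch assumes that the j-th number in each sequence labels the j-th piece along the river and that the four second-level sequences apply to the segments in river order; the figure may pair them differently, and the function name is hypothetical.

```python
def two_level_addresses(first, second):
    """Two-digit hierarchical addresses for 16 equal pieces of a line.

    first[i] is the random label of top-level segment i (in river order);
    second[i][j] is the random label of sub-segment j of segment i.
    """
    return [f"{first[i]}{second[i][j]}" for i in range(4) for j in range(4)]

first = [3, 0, 2, 1]
second = [[2, 0, 3, 1], [1, 2, 0, 3], [0, 3, 1, 2], [2, 3, 1, 0]]
addresses = two_level_addresses(first, second)
# Positions (0 = first piece along the river) listed in sorted-address order:
line_order = sorted(range(16), key=lambda p: addresses[p])
print(addresses)
print(line_order)
```

Sorting places the four pieces of each top-level segment next to one another on the sampling line, while their order inside the block is set by the second-level sequence.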

Selecting 16 Sample Points

Figure: sample points numbered 1 to 16 along the line

Subdivide, Create Random Sequence, Assign Address & Colors

Sort Addresses

Random Starting point, Uniformly Sample Line

Assign Sequence Number to Each Point


Figure: Assigning RHO site numbers. Steps: Create Base4 Addresses; Reverse Address Digits; Sort; Assign RHO Site Nos. (panels show the original order of sites 1 to 16, their base-4 addresses 00 to 33, the reversed addresses, and the RHO site numbers)

Map Sites

Figure: RHO site numbers 1 to 12 placed on the original (unsorted) line and mapped to Ohio River sites

GRTS Implementation Steps

• Concept of selecting a probability sample from a

sampling line for the resource

• Create a hierarchical grid with hierarchical addressing

• Randomize hierarchical addresses

• Construct sampling line using randomized hierarchical

addresses

• Select a systematic sample with a random start from

sampling line

• Place sample in reverse hierarchical address order

Selecting a Probability Sample from a Sampling Line: Linear Network Case

• Place all stream segments in frame on a linear line

Preserve segment length

Identify segments by ID

• In what order do we place segments on the line?

Randomly

Systematically (minimal spanning tree)

Randomized hierarchical grid

• Systematic sample with random start

k = L/n, where L = length of line and n = sample size

Random start d in [0, k)

Sample: d + (i-1)k for i = 1, …, n

Selecting a Probability Sample from a Sampling Line: Point and Area Cases

• Point Case:

Identify all points in frame

Assign each point unit length

Place on sample line

• Area Case:

• Create grid covering region of interest

• Generate random points within each grid cell

• Keep random points within resource (A)

• Assign each point unit length

• Place on sample line

Randomized Hierarchical Grid

• Step 1: Frame: Large lakes: blue; Small lakes: pink; Randomly place grid over the region

• Step 2: Subdivide region and randomly assign numbers to sub-regions

• Step 3: Subdivide sub-regions; randomly assign numbers independently to each new sub-region; create hierarchical address. Continue subdividing until only one lake per cell.

• Step 4: Identify each lake with cell address; assign each lake length 1; place lakes on line in numerical cell address order.

Step 1 Step 2 Step 3 Step 4
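Steps 2 to 4 can be sketched as a recursive split that stops once a cell holds at most one lake. Assumptions in this Python sketch: lake locations are distinct points already randomly offset into [0, 1)^2, every lake gets length 1 (equal probability), and `lake_addresses` and `_subdivide` are hypothetical names.

```python
import random

def _subdivide(points, idx, rng, x0, y0, size, prefix, out):
    """Split the square [x0, x0+size) x [y0, y0+size) until it holds <= 1 point."""
    if len(idx) <= 1:
        for i in idx:
            out[i] = prefix                      # final cell address for this lake
        return
    half = size / 2
    labels = rng.sample(range(4), 4)             # random numbering of the 4 sub-squares
    corners = [(x0, y0), (x0 + half, y0), (x0, y0 + half), (x0 + half, y0 + half)]
    for label, (qx, qy) in zip(labels, corners):
        inside = [i for i in idx
                  if qx <= points[i][0] < qx + half and qy <= points[i][1] < qy + half]
        _subdivide(points, inside, rng, qx, qy, half, prefix + str(label), out)

def lake_addresses(points, rng):
    """Randomized hierarchical address for each (distinct) point in [0, 1)^2."""
    out = {}
    _subdivide(points, list(range(len(points))), rng, 0.0, 0.0, 1.0, "", out)
    return out

lakes = [(0.1, 0.2), (0.15, 0.22), (0.8, 0.1), (0.55, 0.9), (0.3, 0.7), (0.9, 0.95)]
addr = lake_addresses(lakes, random.Random(3))
# Step 4: place lakes on the line in cell-address order.
line_order = sorted(addr, key=addr.get)
print(addr)
print(line_order)
```

Addresses end up with different lengths, but because no address is a prefix of another, plain string order agrees with the base-4 fraction order used to place lakes on the line. Duplicate locations would make the recursion never stop, hence the distinct-points assumption.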



Hierarchical Grid Addressing

213: hierarchical address

Population of 120 points

Figure: Two panels plotting the population of 120 points in the unit square (axes x and y, 0.0 to 1.0): Hierarchical Order and Hierarchical Randomized Order


• RHO: simply reverse the order of the digits and sort

Reverse: 21 31 22 32 23 43 14 34 44

Sort: 14 21 22 23 31 32 34 43 44

Orig (original addresses in sorted order): 41 12 22 32 13 23 43 34 44

• Why use reverse hierarchical order?

Results in any contiguous set of sample sites being spatially balanced

Consequence: can begin at the beginning of the list and continue using sites until the required number of sites have been sampled in the field
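The RHO example can be checked with a one-line sort key. The Python sketch assumes the original addresses, listed in address order, are 12, 13, 22, 23, 32, 34, 41, 43, 44 (obtained by reversing the digits of the Reverse line); the function name is hypothetical. All addresses have the same length, so string order matches numeric order.

```python
def reverse_hierarchical_order(addresses):
    """Sort hierarchical addresses by their digit-reversed values (RHO)."""
    return sorted(addresses, key=lambda a: a[::-1])

original = ["12", "13", "22", "23", "32", "34", "41", "43", "44"]
reversed_digits = [a[::-1] for a in original]           # Reverse line
sorted_reversed = sorted(reversed_digits)               # Sort line
rho_order = reverse_hierarchical_order(original)        # Orig line
print(reversed_digits)
print(sorted_reversed)
print(rho_order)
```

Taking the first k entries of `rho_order` gives the working sample; later entries are used in order when more sites are needed.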

Unequal Probability of Selection

• Assume we want large lakes to be twice as likely to be selected as small lakes

• Instead of giving all lakes the same unit length, give large lakes twice the unit length of small lakes

• To select 5 sites, divide line length by 5 (11/5 units); randomly select a starting point within first interval; select 4 additional sites at intervals of 11/5 units

• Same process is used for points and areas (using random points in area)

spsurvey Library for R

Implementation Examples

R and spsurvey library

• R statistics program and spsurvey library are free

• Information on where to get them and how to install


under “Download Software” on left hand menu

• All commands necessary to create Illinois designs were given on previous slides

• Example “R scripts” and shapefiles are available on ARM web site

• Challenges

Creating appropriate shapefile for the sample frame

Learning basics of R

Selecting appropriate spatial survey design



Options to use with GRTS

• Three sample frame types (shapefile types)

Points

Lines

Polygons

• Survey Design features

Stratification

Equal, unequal, or continuous probability of selection

Oversample for use when some sites cannot be used

Panels for surveys over time

Two stage survey designs (requires two steps)

Specifying Designs in R

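# Unstratified design: one panel of 50 sites, equal probability
design1 = list(None=list(panel=c(Panel_1=50),
                         seltype="Equal"))

# Unstratified design: two panels of 50 sites plus an oversample of 100
design2 = list(None=list(panel=c(Panel_1=50, Panel_2=50),
                         seltype="Equal",
                         over=100))

# Stratified design: Stratum1 equal probability with an oversample of 50;
# Stratum2 unequal probability with 25 expected sites in each of two categories
design3 = list(Stratum1=list(panel=c(Panel_1=50),
                             seltype="Equal",
                             over=50),
               Stratum2=list(panel=c(Panel_1=50),
                             seltype="Unequal",
                             caty.n=c(category1=25, category2=25)))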

Illinois River Basin GRTS designs for Streams

dsgn



Example Design File

Use examples to illustrate generation of different spatial survey design requirements and selection of spatial survey designs

• Lakes

South Carolina Lakes as area resource

National Lake Assessment lakes as point lake resource

• Streams

Illinois River Basin streams as linear stream resource

Pennsylvania attaining stream segments as point stream resource

• Estuaries

Chesapeake Bay

Southern California Bight

• Wetlands

Iowa points

Ohio area

Minnesota wetlands as two-stage design

Lake Design: South Carolina

• Monitoring Objectives

Estimate the number of hectares of major and minor lakes in South Carolina that meet water quality criteria (also other indicators)

• Target Population and Resource Characteristics

State identifies 17 major lakes and 35 minor lakes

Require estimates for major, minor, and combined lake subpopulations

Elements are all possible locations within surface area of identified lakes

• Sample Frame

Shapefile from NHD

Attribute that identifies minor, major, and other lakes within state

• Institutional Constraints

Sample size 30 sites per year across target population

Complete survey over 5 year period



Figure: NHD Lakes

Lake Design: National Lake Assessment

• Monitoring Objectives

Estimate number of lakes in 48 states that are in “good” condition nationally and by 9 aggregated ecoregions

Estimate change in eutrophication status for 1972-76 National Eutrophication Study lakes

• Target Population and Resource Characteristics

All lakes/reservoirs/ponds greater than 4 hectares

Elements are individual lakes

Very skewed lake area size distribution

• Sample Frame

Shapefile based on NHD

Attributes for state, lake area category, ecoregion, and NES lake

• Institutional Constraints

Total number of lakes that can be sampled: 1000

States operate independently

Survey occurs in one year

NHD Lake Sample Frame: Points

National Lake Survey: Overview

Distribution of Lakes in Survey

Lake Size Category                 Total # of Lakes in the US    # of Lakes Selected
10-25 acres (4-10 hectares)                            68,559                    104
25-50 acres (10-20 hectares)                           24,902                    185
50-125 acres (20-50 hectares)                          16,488                    184
125-250 acres (50-100 hectares)                         6,134                    172
> 250 acres (>100 hectares)                             7,356                    264
Total                                                 123,439                    909

Total number of lake visits: 1,000 (909 unique lakes; 91 lakes for repeat sampling)

Number of lakes from 1972-76 National Lake Eutrophication Study (NES): 113

Number of lakes per state: range 4-41, median 18

Number of lakes per ecoregion: range 84-119, median 101



Stream Design: Pennsylvania Attaining Segments

• Monitoring Objectives

Estimate number of currently attaining stream segments within each basin that remain attaining

• Target Population and Resource Characteristics

All attaining stream segments within each basin in Pennsylvania

Elements are stream segments, not points on the stream linear network

• Sample Frame

Polyline shapefile of stream network and point shapefile of segment centroids

• Institutional Constraints

30 segments sampled per basin

5 random locations on each of the 30 segments, one of which will be sampled

• Two-stage spatial survey design

Stage 1: select equal probability sample of segments within basin using GRTS for finite/point resource

Stage 2: select sites within each segment using GRTS for linear resource

Estuary Design: Chesapeake Bay NCA

• Monitoring Objectives

Estimate the square kilometers of Chesapeake Bay and 10 subregions that are in “good” condition

• Target Population and Resource Characteristics

Surface area of Chesapeake Bay estuary

Elements are all locations

Subpopulations are 10 subregions

• Sample Frame

NCA-generated polygon shapefile

Attribute for subregions

• Institutional Constraints

125 sites sampled in 2005 and 2006

• Spatial survey design for an areal resource with unequal probability for 10 subregions



Wetland Design: Pennsylvania

• Monitoring Objectives

Estimate number of hectares of palustrine wetlands that are in “good” condition based on a level 2 assessment for each basin in Pennsylvania and for four landcover classes within each basin

• Target Population and Resource Characteristics

All mapped NWI vegetated wetlands within the Palustrine Emergent, Palustrine Scrub Shrub and Palustrine Forested classifications that have a predominance (>50%) of emergent, herbaceous or woody vegetation

Elements are all possible locations within the mapped polygons

• Sample Frame

NWI polygon shapefile restricted to the palustrine classes defined

Attributes added identify 4 landcover classes and reporting basins

• Institutional Constraints

Monitoring to be completed over 5 years; each year a basin in each of the six reporting regions of the state will be sampled

Expected sample size of 50 in each landcover class in each basin

Oversample of 200% due to sample frame deficiencies

• Spatially balanced survey design for an areal resource with unequal probability

Wetland Design: Minnesota

• Monitoring Objectives

Estimate total hectares of wetlands by wetland class and major basin in Minnesota

Estimate number of hectares of depressional wetlands that are in good condition by major basin and state-wide

• Target Population and Resource Characteristics

All wetlands that can be identified from aerial photointerpretation using USFWS NWI status and trends mapping procedures

For extent, the elements are 1 sq mile pixels that cover Minnesota

For condition, the elements are all locations within wetland polygons delineated on aerial photos

• Sample Frame

For extent, a point shapefile of centroids of 1 sq mile pixels: an “area frame”

For condition, all wetland polygons within sampled extent pixels

• Institutional Constraints

1800 1 sq mile pixels can be photo-interpreted each year

Must cover entire state each year

• Two-stage survey design

Stage 1: Split panel design (annual repeat panel, 3-year panels), equal probability

Stage 2: GRTS design for area resource; remainder to be determined



Spatial Balance over Population contrasted with Spatial Balance over Space

Figure: Finite Population Example; Equi-probable GRTS Sample; Sample: Probability inversely proportional to population density (panels: Equi-probable, Inverse density)

Groundwater Wells in Florida: Using the Inverse of Population Density to Generate a GRTS Sample Spatially Balanced over Space

[Figure: panels showing the 2-d well population density and the inverse well inclusion probability, followed by two GRTS maps of existing wells and sample wells: left, spatially balanced over the population; right, spatially balanced over space.]
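The inverse-density idea can be sketched as follows. The sketch assumes that local density is approximated by counting wells in square grid cells, which is a hypothetical stand-in for whatever density surface was used for the Florida map. With this rule, each occupied cell receives the same total inclusion probability. That is what shifts the sample from balance over the population toward balance over space. The function name and the `cell` parameter are illustrative. In a real design, any probability above 1 would need adjustment.

```python
from collections import Counter

import numpy as np


def inverse_density_probs(xy: np.ndarray, n: int, cell: float) -> np.ndarray:
    """Inclusion probabilities proportional to 1 / (local density), scaled to sum to n.

    Hypothetical density proxy: the number of population points in the same
    square grid cell of side `cell`.
    """
    keys = [tuple(c) for c in np.floor(xy / cell).astype(int)]  # grid cell of each point
    counts = Counter(keys)
    weights = np.array([1.0 / counts[k] for k in keys])        # proportional to 1 / density
    return n * weights / weights.sum()                         # probabilities sum to n
```

For example, with three wells in one cell and one well in another, and n = 2, each cell gets total probability 1.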

Need for Flexibility in Spatial Survey Designs

Target Population, Sample Frame, Sampled Population

We Live in an Imperfect World…

Ideally, the cyan, yellow, and gray squares would overlap completely, that is, the target population, sample frame, and sampled population would coincide.

Local Neighborhood Variance Estimator



Derivation of Variance Estimator

(Continuous Horvitz-Thompson): Let $s_1, s_2, \ldots, s_n$ be a sample selected from a universe $U$ according to a design with inclusion function $\pi(s)$ and joint inclusion function $\pi(s, t)$, with $\pi(s) > 0$ on $U$. Let $R \subset U$, and let $z(s)$ be a real-valued integrable function defined on $R$.

An unbiased estimator of $z_T = \int_R z(s)\,ds$ is

$$\hat z_T = \sum_{i=1}^{n} \frac{z(s_i)\, I_R(s_i)}{\pi(s_i)}$$

where $I_R$ is the indicator function of $R$.
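A quick Monte Carlo check of the unbiasedness claim follows. It assumes the simplest continuous design: n independent uniform points on the unit square, so $\pi(s) = n$ everywhere. It also assumes $R = U$, so $I_R(s) = 1$. The response $z(s) = x + y$ is a hypothetical choice whose integral over the square is 1.

```python
import numpy as np


def ht_total(z_vals: np.ndarray, pi_vals: np.ndarray) -> np.ndarray:
    """Horvitz-Thompson estimate: sum of z(s_i) / pi(s_i) over the last axis (one sample per row)."""
    return np.sum(z_vals / pi_vals, axis=-1)


def z(x, y):
    return x + y  # illustrative surface; its integral over [0, 1]^2 is 1


rng = np.random.default_rng(1)
n, reps = 50, 2000
pts = rng.random((reps, n, 2))          # reps independent samples of n uniform points
pi = np.full((reps, n), float(n))       # pi(s) = n on the unit square
estimates = ht_total(z(pts[..., 0], pts[..., 1]), pi)
mean_estimate = float(estimates.mean())  # should be close to the true total, 1
```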


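The variance of the HT estimator is

$$V_{HT}(\hat z_T) = \int_R \frac{z^2(s)}{\pi(s)}\,ds + \int_R \int_R \left[\frac{\pi(s,t) - \pi(s)\pi(t)}{\pi(s)\pi(t)}\right] z(s)\, z(t)\,dt\,ds$$

or equivalently, for a design with fixed sample size $n$, in Yates-Grundy form,

$$V_{YG}(\hat z_T) = \frac{1}{2} \int_U \int_U \left[\pi(s)\pi(t) - \pi(s,t)\right] \left[\frac{z(s) I_R(s)}{\pi(s)} - \frac{z(t) I_R(t)}{\pi(t)}\right]^2 dt\,ds$$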
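As a worked check of these two formulas, consider n independent points that are uniform over $U$, with $R = U$. This design is assumed here only for illustration. Both forms reduce to the familiar variance of a mean of independent draws.

```latex
% n independent uniform points on U (area |U|), R = U
\pi(s) = \frac{n}{|U|}, \qquad \pi(s,t) = \frac{n(n-1)}{|U|^2}

% HT form: the bracketed factor equals (n(n-1) - n^2)/n^2 = -1/n
V_{HT}(\hat z_T) = \frac{|U|}{n}\int_U z^2(s)\,ds - \frac{1}{n}\Big(\int_U z(s)\,ds\Big)^2
                 = \frac{1}{n}\Big[\,|U|\int_U z^2(s)\,ds - z_T^2\Big]

% YG form: \pi(s)\pi(t) - \pi(s,t) = n/|U|^2 and the squared term is (|U|/n)^2 [z(s) - z(t)]^2
V_{YG}(\hat z_T) = \frac{1}{2}\int_U\int_U \frac{n}{|U|^2}\cdot\frac{|U|^2}{n^2}\,\big[z(s) - z(t)\big]^2\,dt\,ds
                 = \frac{1}{n}\Big[\,|U|\int_U z^2(s)\,ds - z_T^2\Big]
```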


For the HR spatially-balanced designs, $\pi(s, t)$ has the form

$$\pi(s, t) = \pi(s)\,\pi(t)\,\{1 - h(s, t)\}$$

where $h(s, t)$ has the properties:

• $0 \le h(s, t) \le 1$

• $h(s, t) = h(t, s)$

• $h(s, s) = 1$

• There exists a neighborhood $D(s)$ of $s$ such that $\int_{D(s)} \pi(t)\,dt \approx 4$ and $h(s, t) = 0$ for $t \notin D(s)$


It follows that

$$\pi(s, t) = \pi(s)\,\pi(t), \quad t \notin D(s),$$

an independence-like condition.

Intuitively, the design achieves spatial balance by making the probability of two points being close together small, and by making the point locations effectively independent if the points are far enough apart.


Applying the above property in the YG expression for variance gives:

$$V_{YG}(\hat z_T) = \frac{1}{2} \int_U \int_{D(s)} \left[\pi(s)\pi(t) - \pi(s,t)\right] \left[\frac{z(s) I_R(s)}{\pi(s)} - \frac{z(t) I_R(t)}{\pi(t)}\right]^2 dt\,ds$$

For fixed $s$, the expression

$$\left[\pi(s)\pi(t) - \pi(s,t)\right] \left[\frac{z(s) I_R(s)}{\pi(s)} - \frac{z(t) I_R(t)}{\pi(t)}\right]^2$$

vanishes for $t \notin D(s)$, so variance arises LOCALLY around $s$. We are going to construct a variance estimator that mimics the behavior of the YG expression for variance.


Let B be the random event that determines the boundaries of the regions from which the sample points are eventually selected. Those boundaries define a random partition of the universe, i.e., a random stratification. The design, conditional on B, is a 1-sample-per-stratum spatially stratified sample.




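• Conditioning the variance on the event B leads to the expression

$$V(\hat z_T) = E\left[V(\hat z_T \mid B)\right] + V\left(E[\hat z_T \mid B]\right) = E\left[V(\hat z_T \mid B)\right] = E\left[V\left(\sum_{s_i \in R} \frac{z(s_i)}{\pi(s_i)} \,\middle|\, B\right)\right]$$

The middle equality holds because, conditional on B, $\hat z_T$ is the estimator from a stratified sample with one point per stratum. That estimator is unbiased, so $E[\hat z_T \mid B] = z_T$ for every B and the term $V(E[\hat z_T \mid B])$ vanishes.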
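The reason for dropping $V(E[\hat z_T \mid B])$ can be illustrated in one dimension under simplifying assumptions. Take $U = [0, 1)$ with $\pi(s) = n$, and model the random partition B as n equal cells shifted by a random offset that wraps around at 1. For any fixed offset, drawing one uniform point per cell gives an estimator whose conditional mean is the true integral. This partition is a hypothetical stand-in with the same one-point-per-stratum structure, not the GRTS partition itself.

```python
import numpy as np


def one_per_cell_samples(offset: float, n: int, reps: int, rng) -> np.ndarray:
    """reps samples; each row has one uniform point in each of n equal cells of [0, 1),
    with cell boundaries shifted by `offset` (the last cell wraps around 1)."""
    k = np.arange(n)
    return ((k + offset + rng.random((reps, n))) / n) % 1.0


def z(x):
    return x ** 2  # illustrative response; its integral over [0, 1) is 1/3


rng = np.random.default_rng(7)
n, reps = 10, 100_000
cond_means = {}
for offset in (0.0, 0.37):  # two fixed realizations of the partition B
    samples = one_per_cell_samples(offset, n, reps, rng)
    estimates = z(samples).sum(axis=1) / n  # HT estimate with pi(s) = n
    cond_means[offset] = float(estimates.mean())  # approximates E[z_hat_T | B]
```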


We approximate $E\left[V\left(\sum_{s_i \in R} \frac{z(s_i)}{\pi(s_i)} \,\middle|\, B\right)\right]$ by distributing the observations over local neighborhoods, using a weighting function that mimics the behavior of $\pi(s, t) - \pi(s)\pi(t)$. We replace the term corresponding to

$$\left(\frac{z(s_j)}{\pi(s_j)} - \frac{z(s_i)}{\pi(s_i)}\right)^2$$

in $V_{YG}$ with a term of the form

$$\left(\frac{z(s_j)}{\pi(s_j)} - \bar z_{D_i}\right)^2,$$

where $\bar z_{D_i}$ is a local mean over the neighborhood $D(s_i)$.


The local mean is given by

$$\bar z_{D_i} = \sum_{s_j \in D(s_i)} w_{ij}\, \frac{z(s_j)}{\pi(s_j)}$$

We pick the neighborhoods $D(s_i)$ so that each neighborhood contains about 4 sample points and satisfies

$$s_j \in D(s_i) \iff s_i \in D(s_j).$$
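Below is one way to build neighborhoods of about 4 points that satisfy the symmetry condition. For each $s_i$, take the 4 nearest sample points ($s_i$ included), then add the reverse links. This rule is an assumption for illustration and is not claimed to be how spsurvey constructs $D(s_i)$. Symmetrizing can leave some neighborhoods with more than 4 points.

```python
import numpy as np


def symmetric_neighborhoods(xy: np.ndarray, k: int = 4) -> np.ndarray:
    """Boolean matrix m with m[i, j] True when s_j is in D(s_i) (hypothetical rule)."""
    n = len(xy)
    kk = min(k, n)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)  # pairwise distances
    nearest = np.argsort(d, axis=1)[:, :kk]                       # kk closest points per row
    m = np.zeros((n, n), dtype=bool)
    m[np.repeat(np.arange(n), kk), nearest.ravel()] = True
    np.fill_diagonal(m, True)                                     # s_i is always in D(s_i)
    return m | m.T                                                # s_j in D(s_i) <=> s_i in D(s_j)
```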


The weights $w_{ij}$ are selected using the following criteria:

1. The weight $w_{ij}$ should vary inversely as $\pi(s_j)$ and decrease as the distance between $s_i$ and $s_j$ increases.

2. $\sum_j w_{ij} = \sum_i w_{ij} = 1$, so that the neighborhood totals are averages over the neighborhoods, and the sum of the neighborhood totals is equal to the estimated overall total.


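Initially, we set

$$w^*_{ij} = \frac{1 - \left(\mathrm{rank}(|s_j - s_i|) - 1\right)/\mathrm{count}(D(s_i))}{\pi(s_j)}, \quad s_j \in D(s_i),$$

where the rank orders the points of $D(s_i)$ by their distance from $s_i$. The constraint $\sum_{s_j \in D(s_i)} w_{ij} = 1$ is satisfied by setting

$$w_{ij} = \frac{w^*_{ij}}{\sum_{s_k \in D(s_i)} w^*_{ik}}$$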
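The initial rank-based weights and the normalization step can be sketched as below. The inputs are assumed to be the sample coordinates `xy`, the inclusion values `pi`, and a boolean neighborhood matrix `m`, where `m[i, j]` is true when $s_j \in D(s_i)$. Both function names are hypothetical.

```python
import numpy as np


def initial_weights(xy: np.ndarray, pi: np.ndarray, m: np.ndarray) -> np.ndarray:
    """w*_ij = [1 - (rank(|s_j - s_i|) - 1) / count(D(s_i))] / pi(s_j) for s_j in D(s_i), else 0."""
    n = len(xy)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    w = np.zeros((n, n))
    for i in range(n):
        nbrs = np.flatnonzero(m[i])                # indices j with s_j in D(s_i)
        order = nbrs[np.argsort(d[i, nbrs])]       # nearest first (s_i, at distance 0, first)
        count = len(nbrs)
        for rank, j in enumerate(order, start=1):
            w[i, j] = (1.0 - (rank - 1) / count) / pi[j]  # decreases with distance, ~ 1/pi(s_j)
    return w


def row_normalize(w: np.ndarray) -> np.ndarray:
    """Divide each row by its total so that sum_j w_ij = 1."""
    return w / w.sum(axis=1, keepdims=True)
```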


There is no unique way to satisfy both constraints in criterion (2), so we select the set of weights $w_{ij}$ that minimize

$$\sum_{i,j} \left(w_{ij} - w^*_{ij}\right)^2$$

while satisfying criterion (2). We solve this constrained minimization problem using Lagrange multipliers. The unconstrained minimization is then

$$\min_{w_{ij},\, \lambda_k,\, \gamma_l} \; \sum_{i,j} \left(w_{ij} - w^*_{ij}\right)^2 + \sum_k \lambda_k \left(\sum_j w_{kj} - 1\right) + \sum_l \gamma_l \left(\sum_i w_{il} - 1\right)$$




We can eliminate the $w_{ij}$ by setting derivatives equal to 0. The resulting equations in $\hat\lambda_k$ and $\hat\gamma_l$ are singular, so we use the Moore-Penrose generalized inverse to solve. The resulting set of weights is given by

$$w_{ij} = w^*_{ij} - \frac{\hat\lambda_i + \hat\gamma_j}{2}, \quad s_j \in D(s_i).$$
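Here is a sketch of the constrained least-squares step, assuming the Lagrangian is written with "+" signs as above. Setting its derivative to zero gives $w_{ij} = w^*_{ij} - (\lambda_i + \gamma_j)/2$ on the neighborhood mask. Substituting this into the row and column constraints gives a linear system in $(\lambda, \gamma)$. That system is singular, because adding a constant to every $\lambda_i$ and subtracting it from every $\gamma_j$ changes nothing. It is therefore solved with the Moore-Penrose pseudo-inverse `numpy.linalg.pinv`. The function name is hypothetical, and nothing in this step forces the weights to stay nonnegative.

```python
import numpy as np


def balance_weights(w_star: np.ndarray, m: np.ndarray) -> np.ndarray:
    """Weights closest (least squares) to w_star on mask m with all row and column sums equal to 1."""
    mf = m.astype(float)
    n = mf.shape[0]
    ws = w_star * mf                                   # keep only neighborhood entries
    # Row i:    0.5 * (deg_i * lambda_i + sum_j m_ij * gamma_j) = sum_j w*_ij - 1
    # Column j: 0.5 * (sum_i m_ij * lambda_i + deg_j * gamma_j) = sum_i w*_ij - 1
    a = 0.5 * np.block([[np.diag(mf.sum(axis=1)), mf],
                        [mf.T, np.diag(mf.sum(axis=0))]])
    b = np.concatenate([ws.sum(axis=1) - 1.0, ws.sum(axis=0) - 1.0])
    x = np.linalg.pinv(a) @ b                          # Moore-Penrose generalized inverse
    lam, gam = x[:n], x[n:]
    return mf * (ws - 0.5 * (lam[:, None] + gam[None, :]))
```

If `w_star` already has unit row and column sums, the right-hand side is zero, both multipliers come out zero, and the weights are returned unchanged.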


The resulting estimator is

$$\hat V_{NBH}(\hat z_T) = \sum_{s_i \in R} \sum_{s_j \in D(s_i)} w_{ij} \left(\frac{z(s_j)}{\pi(s_j)} - \bar z_{D_i}\right)^2 = \sum_{s_i \in R} \sum_{s_j \in D(s_i)} w_{ij} \left(\frac{z(s_j)}{\pi(s_j)} - \sum_{s_k \in D(s_i)} w_{ik}\, \frac{z(s_k)}{\pi(s_k)}\right)^2$$


• By using the symmetry of h(s, t), the estimator can be rewritten as

$$\hat V_{NBH}(\hat z_T) = \sum_{s_j \in R} \sum_{s_i \in D(s_j)} w_{ij} \left(\frac{z(s_j)}{\pi(s_j)} - \bar z_{D_i}\right)^2$$
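Once the weights are available, the estimator is a short computation. In this sketch, `y` holds $z(s_j)/\pi(s_j)$ for the sample points in $R$, and `w` is a weight matrix that is zero outside the symmetric mask `m`. Both function names are hypothetical. The second function sums the same terms grouped by observation. It gives the identical value because $s_j \in D(s_i) \iff s_i \in D(s_j)$.

```python
import numpy as np


def nbh_variance(y: np.ndarray, w: np.ndarray, m: np.ndarray) -> float:
    """Sum over i and s_j in D(s_i) of w_ij * (y_j - zbar_i)**2, with zbar_i = sum_k w_ik * y_k."""
    local_mean = w @ y                                  # zbar_{D_i} for each sample point i
    resid2 = (y[None, :] - local_mean[:, None]) ** 2    # entry [i, j] = (y_j - zbar_{D_i})**2
    return float(np.sum(np.where(m, w * resid2, 0.0)))


def nbh_variance_by_point(y: np.ndarray, w: np.ndarray, m: np.ndarray) -> float:
    """Same terms grouped by observation j, summing over s_i in D(s_j) (requires symmetric m)."""
    local_mean = w @ y
    total = 0.0
    for j in range(len(y)):
        for i in np.flatnonzero(m[j]):                  # s_i in D(s_j)
            total += w[i, j] * (y[j] - local_mean[i]) ** 2
    return float(total)
```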

Simulation Study

• Used finite, linear, and extensive populations

• Drew 1000 GRTS samples of size 50 in each case

• Generated z by interpolating on a surface to the sample point coordinates

• Used two surfaces, a “smooth” one and a “rough” one

• Variable probability in all cases, proportional to a “weight” variable

• Compared to the IRS variance estimator



Finite Population Summary

Columns: Surface, $V_{1000}$, $\hat V_{NBH}$, SD of $\hat V_{NBH}$, $\hat V_{IRS}$, SD of $\hat V_{IRS}$

1519753513836727899273252

563221678352612768104721



Other Applications of Local Neighborhood Variance Estimator

Application to systematic designs

• It is widely accepted that a more or less regular pattern of points (e.g., systematic sampling) is more efficient than SRS

• A variety of variance estimators for the estimated mean are available for 1-dimensional systematic sampling

• Examine the behavior of some variance estimators for 2-dimensional systematic and spatially-balanced (not necessarily regular) designs

[Figure: point pattern plotted on axes xg and yg, each running from 0 to 30.]



[Figure: plot over axes X and Y of the surface $z = x^2 + y^2 - x - y + 0.1\cos(20x) + 0.1\sin(15y) + 1.5$, given in R as xy[,1]^2 + xy[,2]^2 -xy[,1] -xy[,2]+.1* cos(20*xy[,1]) + 0.1*sin (15*xy[,2]) +1.5.]

Patchy Surfaces

[Figure: four panels, each showing one patchy surface z.ptch32.ary[, , i] (axes Y, Z).]
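For readers who want to reproduce a surface of this kind, the formula from the first plot can be evaluated on a grid as below. The unit-square domain and the grid resolution are assumptions. The patchy surfaces (z.ptch32.ary) are not reconstructed here.

```python
import numpy as np


def slide_surface(x, y):
    """x^2 + y^2 - x - y + 0.1 cos(20x) + 0.1 sin(15y) + 1.5, as written in R on the slide."""
    return x**2 + y**2 - x - y + 0.1 * np.cos(20 * x) + 0.1 * np.sin(15 * y) + 1.5


# Assumed domain: the unit square, evaluated on a 101 x 101 grid.
gx, gy = np.meshgrid(np.linspace(0.0, 1.0, 101), np.linspace(0.0, 1.0, 101))
surface = slide_surface(gx, gy)
```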

Variance Estimators Examined

• SRS – simple random sample

• HS – horizontal stratification

• VS – vertical stratification

• AC – 1st order autocorrelation correction

• CAC – Cochran’s autocorrelation correction

Results: GRF, cv = 1 - exp(-2x)

Estimator   Mean        Bias       95% Coverage
V_NBH       0.004949     0.00113   0.973062
V_cac       0.001436    -0.0023    0.788187
V_ac        0.005709     0.00189   0.982000
V_sv        0.005728     0.00190   0.978625
V_sh        0.005722     0.00190   0.97706
V_SRS       0.011834     0.00801   0.99943

Results: GRF, cv = 1 - exp(-0.5x)

Estimator   Mean        Bias       95% Coverage
V_NBH       0.00992      0.00238   0.97681
V_cac       0.00626     -0.0013    0.89950
V_ac        0.01270      0.00515   0.98881
V_sv        0.01270      0.00515   0.98731
V_sh        0.01275      0.00521   0.98719
V_SRS       0.01448      0.00693   0.99356

[Figure: 95% coverage (y-axis, 0.85 to 1.00) versus CV parameter (x-axis, 0.0 to 2.0) for the srs, sh, sv, ac, cac, and nbh estimators.]



Results: Patchy Surface

Estimator   Mean        Bias       95% Coverage
V_NBH       0.00033      0.00003   0.95408
V_cac       0.00009     -0.0002    0.74199
V_ac        0.00036      0.00006   0.96290
V_sv        0.00032      0.00002   0.93487
V_sh        0.00040      0.00011   0.95796
V_SRS       0.00087      0.00058   0.99950
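The Mean, Bias, and 95% Coverage columns can be computed from simulation output along the lines below. The sketch assumes Bias = (average variance estimate) minus (true variance of the estimator). This matches the tables: Mean minus Bias is nearly constant across estimators, about 0.0038 in the first GRF table. For coverage, the sketch assumes a normal-theory interval, estimate ± 1.96·sqrt(V̂). The exact interval used in the study is not stated. The function name is hypothetical.

```python
import numpy as np


def summarize(est_means, var_hats, true_mean: float, true_var: float, z: float = 1.96) -> dict:
    """Monte Carlo summary of one variance estimator across replicate samples."""
    est_means = np.asarray(est_means, dtype=float)
    var_hats = np.asarray(var_hats, dtype=float)
    mean_v = float(var_hats.mean())                    # "Mean" column
    half_width = z * np.sqrt(var_hats)                 # assumed normal-theory interval
    covered = np.abs(est_means - true_mean) <= half_width
    return {"mean": mean_v,
            "bias": mean_v - true_var,                 # "Bias" column
            "coverage": float(covered.mean())}         # "95% Coverage" column
```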

Conclusions

• The hs, vs, ac, and nbh estimators all seem to work reasonably well for both the GRF and patchy surfaces

• The nbh estimator seems to give coverages that are closer to nominal than the hs, vs, or ac estimators

• The nbh estimator works for variable-probability, spatially constrained designs, for which the other estimators do not

Population Estimation for GRTS Designs Using the spsurvey R Library

Population Estimation Using spsurvey Library Functions

• Continuous data: cont.analysis

Estimates CDF (percent and size of resource), percentiles, and mean

• Categorical data: cat.analysis

Estimates percent and size of resource in each category

• Functionality for both functions

Specify subpopulations for which estimates are required

Use Horvitz-Thompson estimator

Two variance estimator options

• local neighborhood variance estimator

• Horvitz-Thompson estimator

• Analyses for stratified, unequal probability, and two-stage designs

• Comparison of two CDFs
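At its core, a design-based CDF or category estimate is a weighted proportion that uses Horvitz-Thompson weights 1/π. The sketch below illustrates that idea only. It is not the implementation of cont.analysis or cat.analysis, and it omits variance estimation. It divides by the estimated population size (the sum of 1/π), which makes it a ratio-type estimator. The function names are hypothetical.

```python
import numpy as np


def ht_cdf(y, pi, x: float) -> float:
    """Estimated proportion of the resource with value <= x, using weights 1/pi."""
    y = np.asarray(y, dtype=float)
    wts = 1.0 / np.asarray(pi, dtype=float)
    return float(np.sum(wts * (y <= x)) / np.sum(wts))


def ht_category_percent(cats, pi, category) -> float:
    """Estimated percent of the resource falling in `category`, using weights 1/pi."""
    wts = 1.0 / np.asarray(pi, dtype=float)
    in_cat = np.asarray(cats) == category
    return float(100.0 * np.sum(wts * in_cat) / np.sum(wts))
```

With values [1, 2, 3] and inclusion probabilities [0.5, 0.25, 0.25], the weights are [2, 4, 4]. The estimated CDF at 2 is therefore (2 + 4) / 10 = 0.6.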




References

Stevens, D. L., Jr., and A. R. Olsen. 2004. Spatially-balanced sampling of natural resources. Journal of the American Statistical Association 99(465):262-278.

Stevens, D. L., Jr., and A. R. Olsen. 2003. Variance estimation for spatially balanced samples of environmental resources. Environmetrics 14:593-610.

Stevens, D. L., Jr., and A. R. Olsen. 1999. Spatially restricted surveys over time for aquatic resources. Journal of Agricultural, Biological, and Environmental Statistics 4:415-428.

Stevens, D. L., Jr., and T. M. Kincaid. 1998. Variance estimation for subpopulation parameters from samples of spatial environmental populations. 1997 Proceedings of the Section on Survey Research Methods. American Statistical Association, Alexandria, VA. p. 86-85.

Stevens, D. L., Jr. 1997. Variable density grid-based sampling designs for continuous spatial populations. Environmetrics 8:167-195.

