8/17/2019 Stevens.olsen.jsm Short Course.2006
Spatial Sampling
Don Stevens, Department of Statistics
Oregon State University
Corvallis, Oregon
Anthony (Tony) R. Olsen, Western Ecology Division
US EPA
Corvallis, Oregon
Aquatic Resource Monitoring Web Page
http://www.epa.gov/nheerl/arm
Short Course Learning Objectives
• learn how to develop survey designs when the
population of interest occurs in 2-dimensional space
• learn the distinction between, and the importance of,
populations that can be modeled as points, lines, and
polygons
• learn how spatial survey designs can be selected using
GIS shapefiles and the R statistical language
• be introduced to a local neighborhood variance
estimator
Course Outline
• Population elements: modeled as points, lines, or polygons. The point case is a finite population; the other two are continuous (infinite) populations. How does this impact the survey design process?
• What are spatial survey designs and why are they different?
• What is important about ensuring spatial balance?
• Generalized Random Tessellation Stratified (GRTS) survey designs
Ohio River illustration of spatial balance process
2-dimensional illustration of spatial balance process
Theory behind GRTS survey designs
• Need for flexibility in spatial survey designs
Imperfect sampling frames
Addressing non-response issues
• Spatial balance with respect to space versus spatial balance with respect to the target population
• Sample selection for GRTS survey designs using R and spsurvey library
• Variance estimation for GRTS
• Population estimation for GRTS using R and spsurvey library
Spatial Sampling Framework
• Applies to all natural resource monitoring
• Monitoring pieces must be designed and
implemented to fit together
• View as information system
• Short course focuses on (1) “Design”,
specifically sample selection, and (2)
“Assess”, specifically population estimation
• Reference: National Water Quality
Monitoring Council, Water Resources
IMPACT, September 2003 issue
• Kish (1965): “The survey objectives should determine the
sample design; but the determination is actually a two-way
process…”
• Objectives are initially stated as common-sense statements;
the challenge is to transform them into quantitative questions that
can be used to specify the design.
• Statistical perspective
Know whether a monitoring design can answer the question
Know when the question is not precise enough, i.e., it admits multiple interpretations
[Figure: monitoring cycle with stages Develop monitoring objectives, Design monitoring program, Collect field and lab data, Convey results and findings]
• Key components of monitoring design
What resource will be monitored? (target population)
What will be measured? (variables or indicators)
How will indicators be measured? (response design)
When and how frequently will the measurements be taken?
(temporal design)
Where will the measurements be taken? (spatial survey design)
• Statistical perspective
Target population and its representation, the sample frame
Spatial survey design for site selection
Panel design for monitoring across years
Spatial Survey Design Process
[Figure: flow chart with boxes Resource Characteristics, Monitoring Objectives, Institutional Constraints, Design Requirements, Target Population, Sample Frame, Spatial Survey Design, Site Selection using R, Design File]
Sampling in Space: Spatial Characteristics of Target Population Elements & Sample Units
Spatial Objects
• 0-, 1-, 2-dimensional representation
(points, lines, polygons)
Point-like
• Finite population of discrete units, e.g., small- to medium-sized
lakes
Linear
• Width is very small relative to length, e.g., streams or riparian
vegetation belts
Extensive
• Covers large area in a more or less continuous and connected
fashion, e.g., a forest stand, large estuary (San Francisco Bay), or
wetland (Florida Everglades)
Point Example: Elements and Sample Units are all centroids of lakes/reservoirs
Linear Example: Elements and Sample Units are all points in the linear network
Polygon Example: Elements and Sample Units are all points within the polygons
Linear Network Modeled as Points: elements and sample units are all segments defined by linear network
Polygon (Minnesota) modeled as point sample units
Spatial Survey Design and Spatial Balance
Spatial Survey Design
• Survey design for natural resources
Spatial relationships in the population are critical
• Elements near one another tend to share substrate
• Elements near one another are subject to the same or similar natural and
anthropogenic stressors
• Tobler's First Law of Geography: Things that are close
together in space tend to have more similar properties
than things that are far apart.
OR
• Spatial correlation functions tend to decrease with
distance
Spatial Survey Design
• Survey design for natural resources
Standard survey design methodology is not well suited to natural resource populations
• Overwhelming emphasis on finite populations
• Space is an intrinsic feature of the resource
• Most natural resource populations are naturally conceptualized as continua of points
Can discretize stream networks by chopping them into fixed-length segments or variable-length reaches, but it is difficult to retain spatial relationships
Can use a grid to divide a forest or estuary into a finite collection of cells, but cells don’t have a natural or ecological interpretation
Inference is then to number, not extent, e.g., number of reaches, not miles of stream
Sampling Natural Resource Populations
• Patterned response (gradients, patches, periodic
responses)
• Variable inclusion probability
Ecological importance, economic importance, environmental
stressor levels, scientific interest, and political importance are not uniform over the extent of the resource
• Pattern in population occurrence (density)
Stream or lake density (NE US versus Western US)
• Unreliable frame material
• Access difficulty
• Temporal panels often needed
Desirable Properties of Natural Resource
Samples
(1) Accommodate varying spatial sample intensity
(2) Spread the sample points evenly and regularly over the
domain, subject to (1)
(3) Allow augmentation of the sample after-the-fact, while
maintaining (2)
(4) Accommodate varying population spatial density for
finite & linear populations, subject to (1) & (2).
Sampling Natural Resource Populations
• Natural resource populations exist in a spatial matrix
• Population elements close to one another tend to be more
similar than widely separated elements
• Good sampling designs tend to spread out the sample
points more or less regularly
• Simple random sampling tends to exhibit uneven spatial patterns
• Some survey techniques address spatial relationships
Basic Spatial Survey Designs
• Simple random sample
• Systematic sample
Regular grid
Regular spacing on linear resource
• Spatially stratified
Strata can be geometric figures (grid cells), political boundaries
(township lines), or natural boundaries (watersheds)
Maximum regularity ⇒
• few (1 or 2) samples per stratum
• equal area strata
Basic Spatial Survey Designs (continued)
• Spatial ordering
Serpentine strips
Graph-theoretical
• Spatially balanced sample
Combination of simple random and systematic characteristics
Guarantees all possible samples are distributed across the resource
(target population)
Spatially Balanced Sample
• A balanced sample has the property that the number of
samples in any interval of the range of the response is
proportional to the extent of the population in that range.
• Let $n_I$ be the number of samples in the interval $I$, where
$I = (z_1, z_2)$.
• For balance over the response we want
$$n_I \approx n\,\big(F(z_2) - F(z_1)\big),$$
where $F(\cdot)$ is the distribution function of the response $z$.
• In a perfectly balanced sample, we would have equality,
but if we knew enough to perfectly balance, we wouldn’t
need to sample.
Spatially Balanced Sample
• For a response with spatial pattern, we get approximate
balance over the response by ensuring spatial balance,
i.e., that for any $A \subset R$, we have $n_A \approx n|A|/|R|$, where
$R$ is the domain of the response.
• Of course, for any equi-probable sample,
$$E[n_A] \approx n|A|/|R|,$$
so we really want $V[n_A]$ to be “small”.
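The point that $E[n_A]$ is fixed while $V[n_A]$ depends on the design can be checked by simulation. The sketch below is a minimal Python illustration, not part of the course materials: it counts the points falling in a hypothetical subregion $A = [0, 0.3) \times [0, 0.3)$ of the unit square under simple random sampling and under a one-point-per-grid-cell (spatially stratified) design. The region, the sample size $n = 64$, and the replication count are assumptions chosen for illustration.

```python
import random
from statistics import mean, pvariance

def count_in_A(points, box=(0.0, 0.0, 0.3, 0.3)):
    """n_A: number of sample points in the rectangle A = [x0, x1) x [y0, y1)."""
    x0, y0, x1, y1 = box
    return sum(1 for x, y in points if x0 <= x < x1 and y0 <= y < y1)

def srs(n, rng):
    """Simple random sample of n points on the unit square R."""
    return [(rng.random(), rng.random()) for _ in range(n)]

def one_per_cell(m, rng):
    """Spatially stratified sample: one uniform point in each cell of an m x m grid."""
    return [((i + rng.random()) / m, (j + rng.random()) / m)
            for i in range(m) for j in range(m)]

def count_moments(sampler, reps=2000, seed=1):
    """Monte Carlo mean and variance of n_A over repeated samples."""
    rng = random.Random(seed)
    counts = [count_in_A(sampler(rng)) for _ in range(reps)]
    return mean(counts), pvariance(counts)

m = 8
n = m * m                              # 64 points in both designs
expected = n * 0.3 * 0.3               # n|A|/|R| = 5.76
srs_mean, srs_var = count_moments(lambda rng: srs(n, rng))
str_mean, str_var = count_moments(lambda rng: one_per_cell(m, rng))
print(f"n|A|/|R| = {expected:.2f}")
print(f"SRS:        mean {srs_mean:.2f}, variance {srs_var:.2f}")
print(f"stratified: mean {str_mean:.2f}, variance {str_var:.2f}")
```

Both means should be close to 5.76. In theory the SRS count is binomial with variance 64 × 0.09 × 0.91 ≈ 5.24, while the stratified count has variance of about 1.09, because only the grid cells straddling the boundary of $A$ contribute randomness.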
[Figure: simple random sample of a domain with 3 subdomains; sample counts A = 28, B = 28, C = 15]
Sampling Natural Resource Populations
• Systematic sample has substantial disadvantages
Well known problems with periodic response
Less well recognized problem: patch-like response or gradient
oriented along grid line
[Figure: systematic samples of a domain with 3 subdomains; sample counts A = 26, B = 24, C = 15 in one panel and A = 32, B = 20, C = 16 in another]
Sampling Aquatic Resource Populations
• Systematic sample has substantial disadvantages
Difficult to apply to finite populations, e.g., lakes
Limited flexibility to change sample point density
Difficult to accommodate variable inclusion probability or sample adjustment for frame errors
[Figure: systematic sample with 3 subdomains; counts A = 26, B = 88, C = 15. Sample point intensity can be changed using nested grids]
RANDOM-TESSELLATION STRATIFIED
(RTS) DESIGN
• Compromise between systematic & SRS that resolves periodic/patchy response
• Cover the population domain with a grid
Randomly located
Regular (square or triangular)
Spacing chosen to give required spatial resolution
Tile the domain with equal-sized regular polygons containing the grid points
Select one sample point at random from each tessellation polygon
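A minimal sketch of RTS selection, assuming a square (rather than triangular) tessellation and a domain described by a membership test; `rts_sample` and `unit_square` are hypothetical names for illustration. The grid origin is randomly offset, one point is drawn uniformly within each tile, and points falling outside the domain are dropped, so every location has the same inclusion density, 1/spacing².

```python
import math
import random

def rts_sample(spacing, in_domain, bbox=(0.0, 0.0, 1.0, 1.0), seed=None):
    """Random-tessellation stratified sample with square tiles (simplified sketch)."""
    rng = random.Random(seed)
    x0, y0, x1, y1 = bbox
    # Random grid origin at or just below/left of the bounding box
    ox = x0 - rng.random() * spacing
    oy = y0 - rng.random() * spacing
    nx = math.ceil((x1 - ox) / spacing)   # enough tiles to cover the box
    ny = math.ceil((y1 - oy) / spacing)
    sample = []
    for i in range(nx):
        for j in range(ny):
            # One uniform point inside tile (i, j)
            x = ox + (i + rng.random()) * spacing
            y = oy + (j + rng.random()) * spacing
            if in_domain(x, y):            # keep only points in the population domain
                sample.append((x, y))
    return sample

unit_square = lambda x, y: 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0
pts = rts_sample(0.25, unit_square, seed=42)
print(len(pts))   # 16 on average: domain area 1, tile area 1/16
```

Because tiles straddling the edge of the domain may or may not yield a point, the realized sample size varies around its expected value; this is one of the features GRTS handles differently.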
RANDOM-TESSELLATION STRATIFIED
(RTS) DESIGN
• Solves some of systematic sample problems
Non-zero pairwise inclusion probability
Alignment with geographic features of population
Lets points get close together with low probability
RTS Sample
RTS DESIGN
• Does not resolve systematic sample difficulties with
variable probability
finite & linear populations
pattern in population occurrence (density)
unreliable frame material
Limited ability to change density
Generalized Random Tessellation Stratified
(GRTS) Survey Designs
• Emphasize spatial balance: Every replication of the
sample exhibits a spatial density pattern that closely mimics the spatial density pattern of the resource
Historical Context
• GRTS design evolved from EMAP work on
global tessellations in the early 1990s
• Became clear that the basic systematic structure did
not have enough flexibility to accommodate the characteristics of environmental resource
sampling
Theoretical Development of GRTS
Generalized Random-Tessellation Stratified
(GRTS) Design
• Conceptual structure:
Population indexed by points contained within a region $R$
Have inclusion probability $p(s)$ defined on $R$
Select a sample by picking points
• Finite: points represent units
$p(s)$ is the usual inclusion probability
• Linear: points on the lines
$p(s)$ is a density: E(# sample points)/unit length
• Extensive: points are in the region area
$p(s)$ is a density: E(# sample points)/unit area
GRTS Design
Mechanics
• Map R into first quadrant of unit square, & add a random
offset
• Subdivide unit square into “small” grid cells
At least small enough so that total inclusion probability for a cell
(expected number of samples in the cell) is less than 1
Total inclusion probability for a cell is the sum or integral of $p(s)$ over
the extent of the cell
[Figure panels: Population region image; Population region image + random offset (unit square, axes 0.0 to 1.0)]
GRTS Design
Mechanics
• Order the cells so that some 2-dimensional proximity relationships are preserved
Can’t preserve everything, because a 1-1, onto, continuous map from the unit square to the unit interval is impossible
Can get 1-1, onto, & measurable, which is good enough
GRTS uses a quadrant-recursive function, similar to the space-filling curve developed by Giuseppe Peano in 1890.
[Figure: unit square (axes x and y) subdivided recursively into quadrants, with each sub-quadrant labeled 0, 1, 2, or 3 at every level]
Assign each cell an address corresponding to the order of subdivision
The address of the shaded quadrant is 0.213
Order the cells following the address order
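The addressing step can be sketched as follows. This is an illustration only: the mapping of digits to quadrants (0 = lower-left, 1 = lower-right, 2 = upper-left, 3 = upper-right) is an assumption and may not match the figure, and `quadrant_address` is a hypothetical helper. Each subdivision contributes one base-4 digit, and sorting cells by their addresses gives a quadrant-recursive order in which the cells of every quadrant occupy consecutive positions.

```python
def quadrant_address(x, y, levels):
    """Base-4 address of the level-`levels` grid cell containing (x, y) in [0, 1)^2.

    Digit convention is an assumption for illustration:
    0 = lower-left, 1 = lower-right, 2 = upper-left, 3 = upper-right.
    """
    digits = []
    for _ in range(levels):
        x, y = 2 * x, 2 * y           # zoom into the current cell
        bx, by = int(x), int(y)       # right half? upper half? (0 or 1 each)
        digits.append(2 * by + bx)
        x, y = x - bx, y - by         # coordinates within the chosen sub-quadrant
    return "0." + "".join(str(d) for d in digits)

# Centers of the 16 cells of a 4 x 4 grid, ordered by their 2-level addresses
cells = [((i + 0.5) / 4, (j + 0.5) / 4) for i in range(4) for j in range(4)]
order = sorted(cells, key=lambda c: quadrant_address(c[0], c[1], 2))
print(quadrant_address(0.7, 0.2, 3))   # '0.103'
```

With this convention the first four cells in `order` all lie in the lower-left quadrant, the next four in the lower-right quadrant, and so on.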
GRTS Design
Mechanics
• If we carry the process to the limit, letting the grid cell
size → 0, the result is a quadrant-recursive function, that is, a function that maps the unit square onto the unit
interval such that the image of every (sub) quadrant is an
interval.
• The resulting function is 1-1, onto, and measurable.
• Apply a restricted randomization that preserves quadrant
recursiveness.
HIERARCHICAL RANDOMIZATION
Each cell address is a base-4 fraction, that is, $t = 0.t_1 t_2 t_3 \ldots$, where each digit $t_i$ is either 0, 1, 2, or 3. A function $h_p$ is a hierarchical permutation if
$$h_p(t) = 0.\,p_1(t_1)\,p_{t_1}(t_2)\,p_{t_1 t_2}(t_3)\ldots$$
where $p_{t_1 t_2 \ldots t_{n-1}}(\cdot)$ is a possibly distinct permutation of $\{0, 1, 2, 3\}$ for each unique combination of digits $t_1, t_2, \ldots, t_{n-1}$.
Every time a cell is sub-divided, we choose a random permutation to order the sub-quadrants.
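A hierarchical permutation can be sketched by drawing one permutation of {0, 1, 2, 3} for each digit prefix the first time that prefix is seen, so that digit $t_n$ is replaced by $p_{t_1 \ldots t_{n-1}}(t_n)$. The class name and the tuple representation of addresses are assumptions for illustration; this is not the spsurvey code.

```python
import random

class HierarchicalPermutation:
    """Sketch of h_p: the n-th digit t_n is replaced by p_{t_1...t_{n-1}}(t_n), where
    each prefix t_1...t_{n-1} gets its own random permutation of {0, 1, 2, 3}."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.perms = {}   # prefix of original digits -> permutation (list of 4 digits)

    def _perm(self, prefix):
        # Draw the permutation for this prefix the first time it is needed
        if prefix not in self.perms:
            p = [0, 1, 2, 3]
            self.rng.shuffle(p)
            self.perms[prefix] = p
        return self.perms[prefix]

    def __call__(self, digits):
        """Randomized address for an address given as a tuple of base-4 digits."""
        return tuple(self._perm(tuple(digits[:k]))[d] for k, d in enumerate(digits))

h = HierarchicalPermutation(seed=7)
addresses = [(a, b) for a in range(4) for b in range(4)]   # 16 two-level cells
randomized_order = sorted(addresses, key=h)
print(randomized_order)
```

Sorting the 16 two-level addresses by their randomized images keeps the four cells of each first-level quadrant together, while the order of the quadrants, and of the cells within each quadrant, is random.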
HIERARCHICAL RANDOMIZATION
• If the permutations that define $h_p(\cdot)$ are chosen at random
and independently from the set of all possible
permutations, we call $h_p(\cdot)$ a hierarchical randomization
function, and the process of applying $h_p(\cdot)$ hierarchical
randomization.
• Compose the basic q-r map with a hierarchical
randomization function to get a random, quadrant-recursive function.
[Figure panels: 4 × 4 grids (axes x and y) with cells numbered 1 to 16 along quadrant-recursive paths; the start of each path is marked]
GRTS Design
Mechanics
• The result is a random order of the “small” grid cells
such that
All grid cells in the same quadrant have consecutive order
positions
• But will be randomly ordered within those positions
This holds for all quadrant levels
• This induces a random ordering of population elements
GRTS Design
Mechanics
• Assign each grid cell a length equal to its total inclusion
probability
• String the lengths in the random order
Result is a line with length equal to target sample size
• Take systematic sample along line (random start + unit
interval)
• Map back to population using inverse random q-r
function
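The line-and-systematic-sample step can be sketched as below, assuming the cells are already in their randomized order and each cell's length is its total inclusion probability. `sample_ordered_cells` and the six cell probabilities are hypothetical. The function returns the cell each sample point falls in and its offset within that cell; mapping the offset back to a location through the inverse random q-r function is not shown.

```python
import bisect
import itertools
import random

def sample_ordered_cells(lengths, seed=None):
    """Systematic sample with random start u in [0, 1) and unit spacing along a line
    built from cells laid end to end in (randomized) order.

    lengths[i] is the total inclusion probability of the i-th cell in that order,
    so the line length equals the expected sample size.
    Returns (cell index, offset within the cell) for each sample point.
    """
    rng = random.Random(seed)
    ends = list(itertools.accumulate(lengths))   # right end of each cell on the line
    hits = []
    point = rng.random()                         # random start
    while point < ends[-1]:
        i = bisect.bisect_right(ends, point)     # cell i spans [ends[i-1], ends[i])
        start = ends[i - 1] if i > 0 else 0.0
        hits.append((i, point - start))
        point += 1.0                             # unit interval between sample points
    return hits

# Hypothetical inclusion probabilities of six cells, already in random order; sum = 4
lens = [0.5, 0.25, 0.75, 1.0, 0.5, 1.0]
hits = sample_ordered_cells(lens, seed=3)
print(hits)
```

Because every cell here has length at most 1 and the points are exactly 1 apart, no cell can receive more than one sample point, which is why the grid cells must be small enough that each has total inclusion probability below 1.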
GRTS DESIGN
Let
• $I = (0, 1]$ and $I^2 = (0, 1] \times (0, 1]$
• $\phi(s)$ be a measure on $I^2$ (number, length, area)
• $\pi(s)$ be an inclusion intensity function on $I^2$
• $f : I^2 \to I$ be a hierarchically randomized quadrant-recursive function
GRTS DESIGN
• Map the population domain $R$ into $(0, \tfrac{1}{2}] \times (0, \tfrac{1}{2}]$ and add a random offset to get the image $R^* \subset I^2$
• Set
$$F(x) = \int_{f^{-1}((0,\,x))} \pi(s)\, d\phi(s)$$
• $F(x)$ is a random distribution function with range $(0, M]$
GRTS DESIGN
• Pick u1
Reverse Hierarchical Order
• Illustrate for 2 levels of addressing:
First 16 addresses as base-4 numbers
00 01 02 03 10 11 12 13 20 21 22 23 30 31 32 33
Reversed digits
00 10 20 30 01 11 21 31 02 12 22 32 03 13 23 33
Reversed digits as base 10 numbers
0 4 8 12 1 5 9 13 2 6 10 14 3 7 11 15
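Reverse hierarchical ordering amounts to sorting addresses by their reversed digit strings. The minimal Python sketch below uses hypothetical helper names and assumes all addresses have the same number of digits, so that string order equals numeric order. Applied to the 16 two-digit addresses, it reproduces the sequence 0 4 8 12 1 5 9 13 2 6 10 14 3 7 11 15.

```python
def rho_order(addresses):
    """Reverse hierarchical order: sort equal-length address strings by reversed digits."""
    return sorted(addresses, key=lambda a: a[::-1])

def base4(i, ndigits):
    """i written with ndigits base-4 digits, most significant digit first."""
    return "".join(str((i // 4 ** k) % 4) for k in reversed(range(ndigits)))

addresses = [base4(i, 2) for i in range(16)]     # '00', '01', ..., '33'
ordered = rho_order(addresses)
print(ordered)
print([addresses.index(a) for a in ordered])     # positions of the addresses in RHO order
```

The same function applies unchanged to any list of equal-length addresses, including the 9-site example with addresses 12, 13, 22, 23, 32, 34, 41, 43, 44.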
Reverse Hierarchical Order
• The above algorithm works for sample sizes that are
powers of 4, i.e., $n = 4^k$
• For other sample sizes, need to modify:
Let $k$ be the smallest integer such that $n \le 4^k$
Form the reverse hierarchical order for the integers $1, \ldots, 4^k$
Scale the ordered integers to the range $(1, n)$
Eliminate any duplicates
• For example, let $n = 12$
Then $k = 2$, so that $4^k = 16$
RHO(16) = (1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15, 4, 8, 12, 16)
RHO(12) = (1, 4, 7, 10, 2, 5, 8, 11, 3, 6, 9, 12)
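The modification for general $n$ can be sketched as follows. The slide does not state exactly how the ordered integers are scaled; this sketch assumes each $i \in \{1, \ldots, 4^k\}$ maps to $\lceil i\,n / 4^k \rceil$, keeping the first occurrence of each value, a rule that reproduces RHO(12) above. `rho_indices` is a hypothetical name.

```python
def rho_indices(n):
    """Reverse hierarchical order of 1..n: RHO of 1..4**k (smallest k with n <= 4**k),
    scaled to 1..n with duplicates dropped. The ceil(i * n / 4**k) scaling is an assumption."""
    k = 0
    while 4 ** k < n:
        k += 1
    big = 4 ** k

    def reverse_digits(i):
        # Value of i with its k base-4 digits written in reverse order
        r = 0
        for _ in range(k):
            r = 4 * r + i % 4
            i //= 4
        return r

    order = sorted(range(big), key=reverse_digits)   # RHO of 0..big-1
    seen, result = set(), []
    for i in order:
        j = -(-(i + 1) * n // big)                    # ceil((i + 1) * n / big), 1-based
        if j not in seen:                             # eliminate duplicates
            seen.add(j)
            result.append(j)
    return result

print(rho_indices(16))   # [1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15, 4, 8, 12, 16]
print(rho_indices(12))   # [1, 4, 7, 10, 2, 5, 8, 11, 3, 6, 9, 12]
```

Since the scaled values step by at most 1, every integer from 1 to $n$ appears exactly once in the result.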
SPATIAL PROPERTIES OF REVERSE HIERARCHICAL ORDERED GRTS SAMPLE
• The complete sample is nearly regular, capturing much of the potential efficiency of a systematic sample without the
potential flaws
• Any subsample consisting of a consecutive subsequence is
almost as regular as the full sample; in particular, the subsequence
$S_k = \{s_1, s_2, \ldots, s_k\}$, for $k \le M$, is a spatially well-balanced sample.
• Any consecutive sequence subsample, restricted to the accessible domain, is a spatially well-balanced sample of the accessible domain.
[Figure: inclusion probability density surface (Z plotted over X and Y); region is (0,1) × (0,0.8), with a second panel on the unit square (axes 0.0 to 1.0)]
SPATIAL PROPERTIES OF REVERSE HIERARCHICAL ORDERED GRTS SAMPLE
• Assess spatial balance by the variance of the size of Voronoi
polygons, compared to an SRS sample of the same size.
• Voronoi polygons for a set of points $\{s_1, s_2, \ldots, s_k\}$:
The $i$th polygon is the collection of points in the domain
that are closer to $s_i$ than to any other $s_j$ in the set.
• Estimate variance by 1000 replications of a sample of
size 256 in the unit square
Spatial Balance: 256 points
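A rough version of the Voronoi balance measure can be computed without a geometry library by rasterizing the unit square and assigning each raster cell to its nearest sample point. The sketch below is a simplified illustration under stated assumptions: it uses 16 points, 20 replications, and a 40 × 40 raster rather than 256 points and 1000 replications, and it compares a one-point-per-cell stratified sample (standing in for GRTS, which is not implemented here) with SRS. A ratio below 1 indicates better spatial balance than SRS.

```python
import random
from statistics import pvariance

def voronoi_areas(points, grid=40):
    """Approximate Voronoi polygon areas in the unit square: each cell of a
    grid x grid raster is assigned to its nearest sample point."""
    counts = [0] * len(points)
    for a in range(grid):
        for b in range(grid):
            cx, cy = (a + 0.5) / grid, (b + 0.5) / grid
            nearest = min(range(len(points)),
                          key=lambda k: (points[k][0] - cx) ** 2 + (points[k][1] - cy) ** 2)
            counts[nearest] += 1
    return [c / grid ** 2 for c in counts]

def mean_area_variance(sampler, reps=20, seed=1):
    """Average, over replications, of the variance of Voronoi polygon areas."""
    rng = random.Random(seed)
    return sum(pvariance(voronoi_areas(sampler(rng))) for _ in range(reps)) / reps

m = 4   # 4 x 4 strata, so n = 16 points
srs = lambda rng: [(rng.random(), rng.random()) for _ in range(m * m)]
stratified = lambda rng: [((i + rng.random()) / m, (j + rng.random()) / m)
                          for i in range(m) for j in range(m)]
ratio = mean_area_variance(stratified) / mean_area_variance(srs)
print(f"polygon area variance ratio (stratified / SRS): {ratio:.2f}")
```

The raster approximation trades accuracy for simplicity; a finer grid or an exact Voronoi routine would sharpen the estimate.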
SPATIAL PROPERTIES OF REVERSE HIERARCHICAL ORDERED GRTS SAMPLE
• Compare regularity as points are added one at a time,
following reverse hierarchical order, under 4 scenarios:
Complete, continuous domain
Domains with “holes” excluding 20%, modeling non-response/access refusal:
• 20 randomly-located square holes, constant size
• 20 randomly-located square holes, increasing linearly in size
• 10 randomly-located square holes, increasing exponentially in size
• Holes model inaccessibility: elements that are
in the target domain but cannot be sampled for
some reason
[Figure: Inaccessible Domain Patterns (20% Inaccessible); panels Constant Size, Linear Increase, Exponential Increase]
[Figure: Spatial Balance: With oversample. Polygon area variance ratio (0.0 to 1.0) versus point density (0 to 250) for a continuous domain with no voids, constant polygon size (total perimeter = 8), linearly increasing polygon size (total perimeter = 7.6), and exponentially increasing polygon size (total perimeter = 4.2)]
[Figure panels: 20-point GRTS sample; four 20-point GRTS panels; five 20-point GRTS panels; five 20-point GRTS panels + special study area]
Ohio River GRTS Process Example
Ohio River GRTS Illustration
• Create straight line from all reaches: create line for Ohio River (length)
• Divide into 4 segments
• Create random sequence (3, 0, 2, 1)
• Assign address & color
• Sort address
• Repeat for each segment
[Figure: Ohio River line divided into 4 segments labeled with their random addresses, shown before and after sorting]
Repeating Process
• Divide each segment into 4 segments
• Create new random sequences (2,0,3,1) (1,2,0,3) (0,3,1,2) (2,3,1,0)
• Assign address & colors
• Sort
[Figure: 16 sub-segments labeled with two-digit addresses, shown before and after sorting]
Selecting 16 Sample Points
• Subdivide, create random sequence, assign address & colors
• Sort addresses
• Random starting point, uniformly sample line
• Assign sequence number to each point
[Figure: sorted line with 16 sample points numbered 1 to 16]
Reverse Hierarchical Order
• Create base-4 addresses
• Reverse address digits
• Sort
• Assign RHO site numbers
[Figure: table of the 16 points in original order with base-4 addresses 00 to 33, reversed addresses, sorted order, and RHO site numbers]
Map Sites
[Figure: Ohio River sites mapped from the original (unsorted) line, labeled with RHO site numbers 1 to 12]
GRTS Implementation Steps
• Concept of selecting a probability sample from a
sampling line for the resource
• Create a hierarchical grid with hierarchical addressing
• Randomize hierarchical addresses
• Construct sampling line using randomized hierarchical
addresses
• Select a systematic sample with a random start from
sampling line
• Place sample in reverse hierarchical address order
Selecting a Probability Sample from a Sampling Line:
Linear Network Case
• Place all stream segments in frame on a
linear line
Preserve segment length
Identify segments by ID
• In what order do we place segments on the line?
Randomly
Systematically (minimal spanning tree)
Randomized hierarchical grid
• Systematic sample with random start
$k = L/n$, where $L$ = length of line and $n$ = sample size
Random start $d$ in $[0, k)$
Sample: $d + (i-1)k$ for $i = 1, \ldots, n$
Selecting a Probability Sample from a Sampling
Line: Point and Area Cases
• Point Case:
Identify all points in frame
Assign each point unit length
Place on sample line
• Area Case:
• Create grid covering region of interest
• Generate random points within each grid cell
• Keep random points within resource (A)
• Assign each point unit length
• Place on sample line
Randomized Hierarchical Grid
• Step 1: Frame: large lakes: blue; small lakes: pink. Randomly place a grid over the region
• Step 2: Sub-divide the region and randomly assign numbers to sub-regions
• Step 3: Sub-divide sub-regions; randomly assign numbers independently to each new sub-region; create hierarchical address. Continue sub-dividing until there is only one lake per cell.
• Step 4: Identify each lake with its cell address; assign each lake length 1; place lakes on the line in numerical cell address order.
[Figure panels: Step 1, Step 2, Step 3, Step 4]
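Steps 2 to 4 can be sketched as a recursive subdivision that numbers the four sub-quadrants of each cell with an independent random permutation and stops when a cell holds at most one lake. This is a hypothetical illustration (`randomized_addresses`, the depth cap `max_levels`, and the simulated lake centroids are assumptions), not the spsurvey implementation. Sorting lakes by address gives the order in which they are placed on the line, each with length 1.

```python
import random

def randomized_addresses(points, seed=None, max_levels=20):
    """Recursively split the unit square into quadrants, numbering the four
    sub-quadrants of every cell with an independent random permutation of 0..3,
    until each cell holds at most one point. Returns {point index: address string}.
    Assumes distinct points in [0, 1)^2; max_levels caps the recursion depth."""
    rng = random.Random(seed)
    addresses = {}

    def split(idx, x0, y0, size, prefix):
        if len(idx) <= 1 or len(prefix) >= max_levels:
            for i in idx:
                addresses[i] = prefix
            return
        labels = [0, 1, 2, 3]
        rng.shuffle(labels)                      # random numbering of the sub-quadrants
        half = size / 2
        quads = [(x0, y0), (x0 + half, y0), (x0, y0 + half), (x0 + half, y0 + half)]
        for q, (qx, qy) in enumerate(quads):
            inside = [i for i in idx
                      if qx <= points[i][0] < qx + half and qy <= points[i][1] < qy + half]
            split(inside, qx, qy, half, prefix + str(labels[q]))

    split(list(range(len(points))), 0.0, 0.0, 1.0, "")
    return addresses

rng = random.Random(1)
lakes = [(rng.random(), rng.random()) for _ in range(12)]   # hypothetical lake centroids
addr = randomized_addresses(lakes, seed=5)
line_order = sorted(addr, key=addr.get)   # lake indices in the order placed on the line
print(line_order)
```

Sorting the address strings keeps lakes that share an address prefix (the same cell at some level) next to each other on the line, which is what carries the spatial structure into the systematic sample.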
Hierarchical Grid Addressing
[Figure panels: population of 120 points on the unit square (axes x and y, 0.0 to 1.0) with hierarchical grid cell addresses (e.g., 213: hierarchical address), shown in Hierarchical Order and in Hierarchical Randomized Order]
Reverse Hierarchical Order
• RHO: simply reverse the order of the digits
and sort
Reversed: 21 31 22 32 23 43 14 34 44
Sorted: 14 21 22 23 31 32 34 43 44
Original addresses in RHO order: 41 12 22 32 13 23 43 34 44
• Why use reverse hierarchical order?
Results in any contiguous set of
sample sites being spatially balanced
Consequence: can begin at the
beginning of the list and continue using
sites until the required number of
sites has been sampled in the field
Unequal Probability of Selection
• Assume we want large lakes to be twice as
likely to be selected as small lakes
• Instead of giving all lakes same unit
length, give large lakes twice unit
length of small lakes
• To select 5 sites divide line length by 5
(11/5 units); randomly select a starting
point within first interval; select 4
additional sites at intervals of 11/5
units
• Same process is used for points and
areas (using random points in area)
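The unequal-probability rule can be checked numerically. The sketch below assumes a hypothetical frame of 3 large lakes (length 2) and 5 small lakes (length 1), giving total length 11, and selects 5 sites at spacing $k = 11/5$. Sweeping the random start over an even grid of values in $[0, k)$ shows that each lake's inclusion probability is close to $n \times \text{length} / L$: 10/11 for large lakes and 5/11 for small lakes, so large lakes are twice as likely to be selected. `select` is a hypothetical helper.

```python
import bisect
import itertools

def select(lengths, n, d):
    """Indices of units hit by the systematic points d + i*k (i = 0..n-1), k = L/n,
    with units laid end to end on a line in the given order."""
    ends = list(itertools.accumulate(lengths))   # right end of each unit on the line
    k = ends[-1] / n
    picks = []
    for i in range(n):
        j = bisect.bisect_right(ends, d + i * k)  # unit whose interval contains the point
        picks.append(min(j, len(lengths) - 1))    # guard against round-off at the far end
    return picks

# Hypothetical frame: 3 large lakes (length 2) and 5 small lakes (length 1); L = 11
lengths = [2, 1, 1, 2, 1, 2, 1, 1]
n = 5
k = sum(lengths) / n                              # 11/5 = 2.2 units between sites

# Sweep the random start d over an even grid in [0, k) and record how often each lake is hit
starts = [(j + 0.5) * k / 1100 for j in range(1100)]
freq = [0] * len(lengths)
for d in starts:
    for lake in set(select(lengths, n, d)):
        freq[lake] += 1
print([round(f / len(starts), 3) for f in freq])  # near 10/11 for large, 5/11 for small lakes
```

Because every lake's length is no larger than the spacing $k$, no lake can be selected twice in the same sample.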
spsurvey Library for R
Implementation Examples
R and spsurvey library
• R statistics program and spsurvey library are free
• Information on where to get them and how to install
http://www.epa.gov/nheerl/arm
under “Download Software” on left hand menu
• All commands necessary to create Illinois designs were
given on previous slides
• Example “R scripts” and shapefiles are available on
ARM web site
• Challenges
Creating appropriate shapefile for the sample frame
Learning basics of R
Selecting appropriate spatial survey design
Options to use with GRTS
• Three sample frame types (shapefile types)
Points
Lines
Polygons
• Survey Design features
Stratification
Equal, unequal, or continuous probability of selection
Oversample for use when some sites cannot be used
Panels for surveys over time
Two stage survey designs (requires two steps)
Specifying Designs in R
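# One stratum ("None"), equal probability, one panel of 50 sites
design1 <- list(None = list(panel = c(Panel_1 = 50),
                            seltype = "Equal"))
# Two panels of 50 sites each, plus an oversample of 100 replacement sites
design2 <- list(None = list(panel = c(Panel_1 = 50, Panel_2 = 50),
                            seltype = "Equal",
                            over = 100))
# Two strata: equal probability with an oversample in Stratum1;
# unequal probability across two categories in Stratum2
design3 <- list(Stratum1 = list(panel = c(Panel_1 = 50),
                                seltype = "Equal",
                                over = 50),
                Stratum2 = list(panel = c(Panel_1 = 50),
                                seltype = "Unequal",
                                caty.n = c(category1 = 25, category2 = 25)))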
[Figure: Illinois River Basin GRTS designs for Streams]
Example Design File
Use examples to illustrate generation of different
spatial survey design requirements and selection of
spatial survey designs
• Lakes
South Carolina Lakes as area resource
National Lake Assessment lakes as point lake resource
• Streams
Illinois River Basin streams as linear stream resource
Pennsylvania attaining stream segments as point stream resource
• Estuaries
Chesapeake Bay
Southern California Bight
• Wetlands
Iowa points
Ohio area
Minnesota wetlands as two-stage design
Lake Design: South Carolina
• Monitoring Objectives
Estimate the number of hectares of major and minor lakes in South Carolina that meet water quality criteria (also other indicators)
• Target Population and Resource Characteristics
State identifies 17 major lakes and 35 minor lakes
Require estimates for major, minor, and combined lake subpopulations
Elements are all possible locations within the surface area of identified lakes
• Sample Frame
Shapefile from NHD
Attribute that identifies minor, major, and other lakes within state
• Institutional Constraints
Sample size 30 sites per year across target population
Complete survey over 5 year period
[Figure: NHD Lakes]
Lake Design: National Lake Assessment
• Monitoring Objectives
Estimate number of lakes in 48 states that are in “good” condition nationally and by 9 aggregated ecoregions
Estimate change in eutrophication status for 1972-76 National Eutrophication Study lakes
• Target Population and Resource Characteristics
All lakes/reservoirs/ponds greater than 4 hectares
Elements are individual lakes
Very skewed lake area size distribution
• Sample Frame
Shapefile based on NHD
Attributes for state, lake area category, ecoregion, and NES lake
• Institutional Constraints
Total number of lakes that can be sampled: 1000
States operate independently
Survey occurs in one year
NHD Lake Sample Frame: Points
National Lake Survey: Overview
Lake Size Category | # of Lakes Selected | Total # of Lakes in the US
10-25 acres (4-10 hectares) | 104 | 68,559
25-50 acres (10-20 hectares) | 185 | 24,902
50-125 acres (20-50 hectares) | 184 | 16,488
125-250 acres (50-100 hectares) | 172 | 6,134
> 250 acres (>100 hectares) | 264 | 7,356
Total | 909 | 123,439
Distribution of Lakes in Survey
Total number of lake visits: 1,000 (909 unique lakes, 91 lakes for repeat sampling)
Number of lakes from the 1972-76 National Lake Eutrophication Study (NES): 113
Number of lakes per state: range 4-41, median 18
Number of lakes per ecoregion: range 84-119, median 101
Stream Design: Pennsylvania Attaining Segments
• Monitoring Objectives
Estimate number of currently attaining stream segments within each basin that remain attaining
• Target Population and Resource Characteristics
All attaining stream segments within each basin in Pennsylvania
Elements are stream segments, not points on the stream linear network
• Sample Frame
Polyline shapefile of stream network and point shapefile of segment centroids
• Institutional Constraints
30 segments sampled per basin
5 random locations on each of the 30 segments, one of which will be sampled
• Two-stage spatial survey design
Stage 1: select equal probability sample of segments within basin using GRTS for finite/point resource
Stage 2: select sites within each segment using GRTS for linear resource
Estuary Design: Chesapeake Bay NCA
• Monitoring Objectives
Estimate the square kilometers of Chesapeake Bay and 10 subregions
that are in “good” condition
• Target Population and Resource Characteristics
Surface area of Chesapeake Bay estuary
Elements are all locations
Subpopulations are 10 subregions
• Sample Frame
NCA generated polygon shapefile
Attribute for subregions
• Institutional Constraints
125 sites sampled in 2005 and 2006
• Spatial survey design for an areal resource with unequal probability
for 10 subregions
Wetland Design: Pennsylvania
• Monitoring Objectives
Estimate number of hectares of palustrine wetlands that are in “good” condition based on a level 2 assessment for each basin in Pennsylvania and for four landcover classes within each basin
• Target Population and Resource Characteristics
All mapped NWI vegetated wetlands within the Palustrine Emergent, Palustrine Scrub Shrub and Palustrine Forested classifications that have a predominance (>50%) of emergent, herbaceous or woody vegetation
Elements are all possible locations within the mapped polygons
• Sample Frame
NWI polygon shapefile restricted to the palustrine classes defined
Attributes added identify 4 landcover classes and reporting basins
• Institutional Constraints
Monitoring to be completed over 5 years; each year a basin in each of the six reporting regions of the state will be sampled
Expected sample size of 50 in each landcover class in each basin
Oversample of 200% due to sample frame deficiencies
• Spatially balanced survey design for an areal resource with unequal probability
Wetland Design: Minnesota
• Monitoring Objectives
Estimate total hectares of wetlands by wetland class and major basin in Minnesota
Estimate number of hectares of depressional wetlands that are in good condition by major basin and state-wide
• Target Population and Resource Characteristics
All wetlands that can be identified from aerial photointerpretation using USFWS NWI status and trends mapping procedures
For extent, the elements are 1 sq mile pixels that cover Minnesota
For condition, the elements are all locations within wetland polygons delineated on aerial photos
• Sample Frame
For extent, a point shapefile of centroids of 1 sq mile pixels: an “area frame”
For condition, all wetland polygons within sampled extent pixels
• Institutional Constraints
1800 1 sq mile pixels can be photo-interpreted each year
Must cover entire state each year
• Two-stage survey design
Stage 1: split panel design (annual repeat panel, 3-year panels), equal probability
Stage 2: GRTS design for area resource; remainder to be determined
Spatial Balance over Population Contrasted with Spatial Balance over Space
[Figure panels: Finite Population Example; Equi-probable GRTS Sample; Sample: Probability inversely proportional to population density (legend: equi-probable, inverse density)]
Groundwater Wells in Florida: Using the Inverse of Population Density to Generate a GRTS Sample Spatially Balanced over Space
[Figure: panels show the 2-d well population density, the inverse well inclusion probability, a GRTS sample spatially balanced over the population, and a GRTS sample spatially balanced over space; legend: existing well, sample well]
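The inverse-density idea in the wells example can be sketched as follows. The helper `inverse_density_probs`, the fixed-radius neighbor count used as a density estimate, and the coordinates are all assumptions for illustration; the sketch computes only the inclusion probabilities that would be handed to a GRTS selection, not the GRTS selection itself.

```python
import math


def inverse_density_probs(points, n, radius):
    """Inclusion probabilities proportional to 1 / (local point density).

    Local density is approximated by the number of population points within
    `radius` of each point (the point itself included). Probabilities are
    scaled to sum to the sample size n; values above 1 would need capping
    and rescaling, which this sketch omits.
    """
    counts = [sum(1 for q in points if math.dist(p, q) <= radius) for p in points]
    raw = [1.0 / c for c in counts]
    scale = n / sum(raw)
    return [scale * r for r in raw]


# A tight cluster of 10 wells plus 5 isolated wells (made-up coordinates).
cluster = [(0.01 * i, 0.0) for i in range(10)]
isolated = [(10.0 * k, 10.0) for k in range(1, 6)]
probs = inverse_density_probs(cluster + isolated, n=3, radius=0.5)
print([round(p, 3) for p in probs])
```

Each clustered well gets probability 0.05 and each isolated well 0.5, so only 0.5 of the expected 3 sample points fall in the dense cluster: the sample is pushed toward balance over space rather than over the well population.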
Need for flexibility in spatial survey designs
Target Population, Sample Frame, Sampled Population
We Live in an Imperfect World…
Ideally, the cyan, yellow, and gray squares would overlap completely
Local Neighborhood Variance Estimator
Derivation of Variance Estimator
(Continuous Horvitz-Thompson) Let $s_1, s_2, \ldots, s_n$ be a sample selected from a universe $U$ according to a design with inclusion function $\pi(s)$ and joint inclusion function $\pi(s, t)$, with $\pi(s) > 0$ on $U$. Let $R \subset U$, and let $z(s)$ be a real-valued integrable function defined on $R$.
An unbiased estimator of $z_T = \int_R z(s)\,ds$ is
$$\hat{z}_T = \sum_{i=1}^{n} \frac{z(s_i)\, I_R(s_i)}{\pi(s_i)}$$
where $I_R$ is the indicator function of $R$.
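A small Monte Carlo check of the continuous Horvitz-Thompson estimator. As an assumption chosen only because its inclusion function is known exactly, the design here is $n$ independent uniform points on the unit square (so $\pi(s) = n$); it is not a GRTS design. With $z(s) = x + y$ and $R = [0, 0.5] \times [0, 1]$, the exact value of $z_T$ is 0.375.

```python
import random


def ht_total(sample, z, in_R, pi):
    """Continuous Horvitz-Thompson estimate of the integral of z over R."""
    return sum(z(s) / pi(s) for s in sample if in_R(s))


def uniform_sample(n, rng):
    """n independent uniform points on the unit square U, so pi(s) = n on U."""
    return [(rng.random(), rng.random()) for _ in range(n)]


n = 50


def z(s):
    return s[0] + s[1]          # integrand z(s) = x + y


def in_R(s):
    return s[0] < 0.5           # R = left half of the unit square


def pi(s):
    return float(n)             # inclusion density of the uniform design


rng = random.Random(2006)
estimates = [ht_total(uniform_sample(n, rng), z, in_R, pi) for _ in range(2000)]
mean_est = sum(estimates) / len(estimates)
print(mean_est)  # should be close to the exact value 0.375
```

Setting $z \equiv 1$ and $R = U$ makes every term $1/n$, so a single sample returns the area of $U$ exactly, a quick sanity check on the $1/\pi(s)$ weighting.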
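Derivation of Variance Estimator
The variance of the HT estimator is
$$V(\hat{z}_T) = \int_R \frac{z^2(s)}{\pi(s)}\,ds + \int_R \int_R \left[\frac{\pi(s,t) - \pi(s)\pi(t)}{\pi(s)\pi(t)}\right] z(s)\, z(t)\,dt\,ds$$
or, equivalently for a fixed sample size $n$ (Yates-Grundy form),
$$V_{YG}(\hat{z}_T) = \frac{1}{2} \int_U \int_U \left[\pi(s)\pi(t) - \pi(s,t)\right] \left[\frac{z(s)\, I_R(s)}{\pi(s)} - \frac{z(t)\, I_R(t)}{\pi(t)}\right]^2 dt\,ds$$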
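The Yates-Grundy form has a familiar finite-population analogue: replace the double integral by a sum over sampled pairs, each weighted by $1/\pi_{ij}$ (the Sen-Yates-Grundy variance estimator). The sketch below is that discrete estimator, not the continuous derivation on this slide; the function name and the data are illustrative. Under simple random sampling without replacement it should reproduce the textbook estimator $N^2 (1 - n/N)\, s^2 / n$, which the check confirms.

```python
from itertools import combinations
from statistics import variance


def syg_variance(y, pi, pi_joint):
    """Sen-Yates-Grundy variance estimator for a fixed-size design.

    y[i], pi[i]: value and inclusion probability of sampled unit i.
    pi_joint(i, j): joint inclusion probability of sampled units i and j.
    Summing over unordered pairs absorbs the factor 1/2 of the double sum.
    """
    total = 0.0
    for i, j in combinations(range(len(y)), 2):
        pij = pi_joint(i, j)
        total += (pi[i] * pi[j] - pij) / pij * (y[i] / pi[i] - y[j] / pi[j]) ** 2
    return total


# Check against the SRS-without-replacement result N^2 (1 - n/N) s^2 / n.
N, y = 20, [3.0, 7.0, 4.0, 10.0, 6.0]
n = len(y)
pi = [n / N] * n
pair_pi = n * (n - 1) / (N * (N - 1))
v_syg = syg_variance(y, pi, lambda i, j: pair_pi)
v_srs = N ** 2 * (1 - n / N) * variance(y) / n
print(v_syg, v_srs)
```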
Derivation of Variance Estimator
For the HR spatially-balanced designs, $\pi(s, t)$ has the form
$$\pi(s,t) = \pi(s)\,\pi(t)\,\{1 - h(s,t)\}$$
where $h(s, t)$ has the properties:
$$0 \le h(s,t) \le 1, \qquad h(s,t) = h(t,s), \qquad h(s,s) = 1$$
There exists a neighborhood $D(s)$ of $s$ such that
$$h(s,t) = 0 \ \text{for} \ t \notin D(s), \qquad \int_{D(s)} \pi(t)\,dt \approx 4$$
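To make the listed properties concrete, here is a hypothetical kernel on a line, $h(s,t) = \max(0, 1 - |s - t|/r)^3$, with constant $\pi$. It is not the $h$ of any particular GRTS design; it was chosen so that $D(s) = [s - r, s + r]$ carries about 4 expected points and so that the fixed-sample-size identity $\int_U \pi(s,t)\,dt = (n - 1)\,\pi(s)$ holds away from the ends of the interval. For this form of $\pi(s,t)$ that identity requires $\int \pi(t)\, h(s,t)\,dt = 1$.

```python
def make_h(radius):
    """Hypothetical kernel h(s, t) = max(0, 1 - |s - t| / radius) ** 3 on a line."""
    def h(s, t):
        return max(0.0, 1.0 - abs(s - t) / radius) ** 3
    return h


n, length = 50, 100.0
pi = n / length               # constant inclusion density on U = [0, length]
radius = 2.0 / pi             # D(s) = [s - r, s + r] carries pi * 2r = 4 expected points
h = make_h(radius)


def pi_joint(s, t):
    """Joint inclusion density pi(s) pi(t) {1 - h(s, t)}."""
    return pi * pi * (1.0 - h(s, t))


def integrate(f, a, b, steps=100_000):
    """Midpoint-rule integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (k + 0.5) * width) for k in range(steps)) * width


print(h(10.0, 10.0), h(10.0, 12.0), h(12.0, 10.0), h(10.0, 15.0))
print(pi * 2 * radius)                                     # expected points in D(s)
print(integrate(lambda t: pi_joint(50.0, t), 0.0, length), (n - 1) * pi)
```

The first line shows $h(s,s) = 1$, symmetry, and $h = 0$ outside $D(s)$; the last line shows the joint density integrating to $(n-1)\pi(s)$ at an interior point.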
Derivation of Variance Estimator
It follows that
$$\pi(s,t) = \pi(s)\,\pi(t), \quad t \notin D(s),$$
an independence-like condition.
Intuitively, the design achieves spatial balance by making the probability of two points being close together small, and making the point locations effectively independent if the points are far enough apart.
Derivation of Variance Estimator
Applying the above property in the YG expression for variance gives:
$$V_{YG}(\hat{z}_T) = \frac{1}{2} \int_U \int_{D(s)} \left[\pi(s)\pi(t) - \pi(s,t)\right] \left[\frac{z(s)\, I_R(s)}{\pi(s)} - \frac{z(t)\, I_R(t)}{\pi(t)}\right]^2 dt\,ds$$
For fixed $s$, the expression
$$\left[\pi(s)\pi(t) - \pi(s,t)\right] \left[\frac{z(s)\, I_R(s)}{\pi(s)} - \frac{z(t)\, I_R(t)}{\pi(t)}\right]^2$$
vanishes for $t \notin D(s)$, so variance arises LOCALLY around $s$. We are going to construct a variance estimator that mimics the behavior of the YG expression for variance.
Derivation of Variance Estimator
Let B be the random event that determines the boundaries from which a point is eventually selected. Those boundaries define a random partition of the universe, i.e., a random stratification. The design, conditional on B, is a 1-sample-per-stratum spatially stratified sample.
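The conditioning idea can be illustrated with the simplest random partition: a randomly shifted grid, with one uniform point per cell. This stand-in is chosen for illustration only; it is not the hierarchical partition GRTS actually uses, and the function name is hypothetical. The random shift plays the role of B: once it is fixed, the design is a 1-sample-per-stratum stratified sample.

```python
import random


def random_grid_sample(m, rng):
    """One uniform point in each cell of a randomly shifted m x m grid on the unit square.

    The random shift (u, v) plays the role of the event B. Cells wrap around
    the edges (torus), so each cell has area 1 / m**2, and averaging over the
    shift makes every point marginally uniform.
    """
    u, v = rng.random(), rng.random()
    points = []
    for i in range(m):
        for j in range(m):
            x = (u + (i + rng.random()) / m) % 1.0
            y = (v + (j + rng.random()) / m) % 1.0
            points.append((x, y))
    return (u, v), points


rng = random.Random(7)
(u, v), pts = random_grid_sample(5, rng)
# Recover each point's cell in the shifted grid: every cell holds exactly one point.
cells = {(int((x - u) % 1.0 * 5), int((y - v) % 1.0 * 5)) for x, y in pts}
print(len(pts), len(cells))
```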
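Derivation of Variance Estimator
• Conditioning the variance on the event B leads to the expression
$$V(\hat{z}_T) = E\left[V(\hat{z}_T \mid B)\right] + V\left(E\left[\hat{z}_T \mid B\right]\right) = E\left[V(\hat{z}_T \mid B)\right] = E\left[V\left(\sum_{s_i \in R} \frac{z(s_i)}{\pi(s_i)} \,\middle|\, B\right)\right]$$
The second term vanishes because, conditional on B, the one point in each stratum is drawn with density $\pi(s)$ over that stratum, so $\hat{z}_T$ is conditionally unbiased and $E[\hat{z}_T \mid B] = z_T$ does not depend on B.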
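The decomposition used above is the law of total variance. The exact check below uses a small made-up discrete distribution (computed with rational arithmetic). It shows both terms contributing when the conditional means differ with B, and the second term dropping to zero when every conditional mean is the same, which is the situation created by conditional unbiasedness.

```python
from fractions import Fraction as F


def mean(dist):
    return sum(x * p for x, p in dist.items())


def var(dist):
    m = mean(dist)
    return sum((x - m) ** 2 * p for x, p in dist.items())


def decompose(p_b, x_given_b):
    """Return (V(X), E[V(X | B)], V(E[X | B])) for a discrete joint distribution."""
    marginal = {}
    for b, pb in p_b.items():
        for x, px in x_given_b[b].items():
            marginal[x] = marginal.get(x, 0) + pb * px
    overall = mean(marginal)
    within = sum(pb * var(x_given_b[b]) for b, pb in p_b.items())
    between = sum(pb * (mean(x_given_b[b]) - overall) ** 2 for b, pb in p_b.items())
    return var(marginal), within, between


p_b = {0: F(1, 2), 1: F(1, 3), 2: F(1, 6)}
# Conditional means differ with B: both terms contribute.
unequal = {0: {1: F(1, 2), 3: F(1, 2)}, 1: {0: F(1, 4), 4: F(3, 4)}, 2: {2: F(1)}}
# Every conditional mean equals 2 (like E[z_T | B] = z_T): the second term is 0.
equal = {0: {1: F(1, 2), 3: F(1, 2)}, 1: {0: F(1, 2), 4: F(1, 2)}, 2: {2: F(1)}}
print(decompose(p_b, unequal))
print(decompose(p_b, equal))
```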
Derivation of Variance Estimator
We approximate $E\left[V\left(\frac{z(s_i)}{\pi(s_i)} \,\middle|\, B\right)\right]$ by distributing the observations over local neighborhoods, using a weighting function that mimics the behavior of $\pi(s,t) - \pi(s)\pi(t)$. We replace the term corresponding to
$$\left(\frac{z(s_j)}{\pi(s_j)} - \frac{z(s_i)}{\pi(s_i)}\right)^2$$
in $V_{YG}$ with a term of the form
$$\left(\frac{z(s_j)}{\pi(s_j)} - z_D(s_i)\right)^2$$
Derivation of Variance Estimator
The local mean is given by
$$z_D(s_i) = \sum_{s_j \in D(s_i)} w_{ij}\, \frac{z(s_j)}{\pi(s_j)}$$
We pick the neighborhoods $D(s_i)$ so that each neighborhood contains about 4 sample points and satisfies
$$s_j \in D(s_i) \iff s_i \in D(s_j)$$
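One simple way to build neighborhoods that meet both requirements is to start from each point's four nearest sample points (itself included) and then symmetrize by adding $s_i$ to $D(s_j)$ whenever $s_j$ is in $D(s_i)$. This rule and the function name `neighborhoods` are assumptions for illustration. Symmetrizing can push some neighborhoods above four points, which is consistent with "about 4".

```python
import math


def neighborhoods(points, k=4):
    """Hypothetical neighborhood rule: each point's k nearest sample points
    (itself included), then symmetrized so that j in D(i) <=> i in D(j)."""
    n = len(points)
    nbrs = []
    for i in range(n):
        order = sorted(range(n), key=lambda j: math.dist(points[i], points[j]))
        nbrs.append(set(order[:k]))
    for i in range(n):
        for j in list(nbrs[i]):
            nbrs[j].add(i)       # sets only grow, so the final sets are symmetric
    return nbrs


pts = [(0.0, 0.0), (1.0, 0.2), (2.1, 0.1), (0.2, 1.1), (1.3, 1.4), (3.0, 3.0)]
D = neighborhoods(pts)
for i, d in enumerate(D):
    print(i, sorted(d))
```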
Derivation of Variance Estimator
The weights $w_{ij}$ are selected using the following criteria:
1. The weight $w_{ij}$ should vary inversely as $\pi(s_j)$ and decrease as the distance between $s_i$ and $s_j$ increases.
2. $\sum_i w_{ij} = \sum_j w_{ij} = 1$, so that the neighborhood totals are averages over the neighborhoods, and the sum of the neighborhood totals is equal to the estimated overall total.
Derivation of Variance Estimator
Initially, we set
$$w^*_{ij} = \frac{1 - \left(\operatorname{rank}(|s_j - s_i|) - 1\right) / \operatorname{count}(D(s_i))}{\pi(s_j)}$$
The column total constraint is satisfied by setting
$$w_{ij} = \frac{w^*_{ij}}{\sum_{s_k \in D(s_i)} w^*_{ik}}$$
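The rank-based initial weights and their normalization can be transcribed directly. The function names are hypothetical, and ties in distance are broken arbitrarily, a detail the slides do not address. In the example all four points share one neighborhood and $\pi = 0.5$ everywhere.

```python
import math


def initial_weights(points, pi, nbrs):
    """w*_ij = (1 - (rank(|s_j - s_i|) - 1) / count(D(s_i))) / pi(s_j) for s_j in D(s_i)."""
    w_star = {}
    for i, D in enumerate(nbrs):
        # s_i itself is at distance 0, so it gets rank 1.
        ranked = sorted(D, key=lambda j: math.dist(points[i], points[j]))
        for rank, j in enumerate(ranked, start=1):
            w_star[i, j] = (1.0 - (rank - 1) / len(D)) / pi[j]
    return w_star


def normalize(w_star, nbrs):
    """w_ij = w*_ij / sum_{s_k in D(s_i)} w*_ik."""
    w = {}
    for i, D in enumerate(nbrs):
        total = sum(w_star[i, k] for k in D)
        for j in D:
            w[i, j] = w_star[i, j] / total
    return w


pts = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (7.0, 0.0)]
pi = [0.5, 0.5, 0.5, 0.5]
nbrs = [set(range(4)) for _ in range(4)]
w = normalize(initial_weights(pts, pi, nbrs), nbrs)
print([round(w[0, j], 3) for j in range(4)])                          # weights within D(s_0)
print([round(sum(w[i, j] for i in range(4)), 3) for j in range(4)])   # totals over i
```

The first line sums to 1, but the totals over $i$ come out as 1.0, 1.2, 1.1, and 0.7. Normalization alone therefore cannot satisfy both parts of criterion (2).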
Derivation of Variance Estimator
There is no unique way to satisfy both constraints in criterion (2), so we select the set of weights $w_{ij}$ that minimizes
$$\sum_{i,j} \left(w_{ij} - w^*_{ij}\right)^2$$
while satisfying criterion (2). We solve this constrained minimization problem using Lagrange multipliers. The unconstrained minimization is then
$$\min_{w_{ij},\, \lambda_k,\, \gamma_l} \; \sum_{i,j} \left(w_{ij} - w^*_{ij}\right)^2 + \sum_k \lambda_k \left(\sum_j w_{kj} - 1\right) + \sum_l \gamma_l \left(\sum_i w_{il} - 1\right)$$
Derivation of Variance Estimator
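We can eliminate the $w_{ij}$ by setting derivatives equal to 0, which gives $2(w_{ij} - w^*_{ij}) + \hat{\lambda}_i + \hat{\gamma}_j = 0$. The resulting equations in $\hat{\lambda}_k$ and $\hat{\gamma}_l$ are singular, so we use the Moore-Penrose generalized inverse to solve. The resulting set of weights is given by
$$w_{ij} = w^*_{ij} - \frac{\hat{\lambda}_i + \hat{\gamma}_j}{2}$$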
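The constrained least-squares adjustment can be sketched without forming the singular system. Instead of the Moore-Penrose inverse used on the slide, this sketch alternates exact projections onto the two sets of sum constraints (von Neumann alternating projections). That is a swapped-in technique, but for affine constraint sets that share a point, it converges to the same least-squares solution. Here the constraints are jointly feasible because each $s_i$ lies in its own neighborhood, so the identity weights satisfy both. The function name, the 5-point neighborhoods, and the starting weights are hypothetical.

```python
def balance_weights(w_star, nbrs, iters=2000):
    """Least-squares adjustment of w_star so that, over the neighborhood entries,
    every sum_j w_ij and every sum_i w_ij equals 1 (alternating projections)."""
    n = len(nbrs)
    w = dict(w_star)
    cols = [[i for i in range(n) if j in nbrs[i]] for j in range(n)]
    for _ in range(iters):
        for i in range(n):                      # project onto the sum_j w_ij = 1 constraints
            shift = (1.0 - sum(w[i, j] for j in nbrs[i])) / len(nbrs[i])
            for j in nbrs[i]:
                w[i, j] += shift
        for j in range(n):                      # project onto the sum_i w_ij = 1 constraints
            shift = (1.0 - sum(w[i, j] for i in cols[j])) / len(cols[j])
            for i in cols[j]:
                w[i, j] += shift
    return w


# Five points on a line with symmetric neighborhoods D(s_i) = {i - 1, i, i + 1}.
nbrs = [{j for j in (i - 1, i, i + 1) if 0 <= j < 5} for i in range(5)]
w_star = {(i, j): (2.0 if i == j else 1.0) for i in range(5) for j in nbrs[i]}
w = balance_weights(w_star, nbrs)
print([round(sum(w[i, j] for j in nbrs[i]), 6) for i in range(5)])
print([round(sum(w[i, j] for i in range(5) if j in nbrs[i]), 6) for j in range(5)])
```

Each projection step adds a constant to one row or one column. The final adjustment therefore has the same row-term-plus-column-term structure as the Lagrange solution.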
Derivation of Variance Estimator
The resulting estimator is
$$\hat{V}_{NBH}(\hat{z}_T) = \sum_{s_i \in R} \sum_{s_j \in D(s_i)} w_{ij} \left(\frac{z(s_j)}{\pi(s_j)} - z_D(s_i)\right)^2 = \sum_{s_i \in R} \sum_{s_j \in D(s_i)} w_{ij} \left(\frac{z(s_j)}{\pi(s_j)} - \sum_{s_k \in D(s_i)} w_{ik}\, \frac{z(s_k)}{\pi(s_k)}\right)^2$$
Derivation of Variance Estimator
• By using the symmetry of $h(s, t)$, the estimator can be re-written as
$$\hat{V}_{NBH}(\hat{z}_T) = \sum_{s_i \in R} \sum_{s_j \in D(s_i)} w_{ij} \left(\frac{z(s_j)}{\pi(s_j)} - z_D(s_i)\right)^2$$
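A direct transcription of the local neighborhood estimator, assuming $R$ is the whole universe and that the neighborhoods and weights $w_{ij}$ have already been computed. The function name and the two-point example are hypothetical; this is not the spsurvey implementation.

```python
def nbh_variance(z, pi, nbrs, w):
    """Local neighborhood variance estimate, assuming R is the whole universe:

        sum_i sum_{j in D(s_i)} w_ij * (z_j / pi_j - zbar_D(s_i)) ** 2,
        zbar_D(s_i) = sum_{k in D(s_i)} w_ik * z_k / pi_k.
    """
    y = [zj / pj for zj, pj in zip(z, pi)]          # pi-expanded values z(s_j) / pi(s_j)
    total = 0.0
    for i, D in enumerate(nbrs):
        local_mean = sum(w[i, k] * y[k] for k in D)
        total += sum(w[i, j] * (y[j] - local_mean) ** 2 for j in D)
    return total


# Two sample points sharing one neighborhood, each weight 1/2 (made-up values).
nbrs = [{0, 1}, {0, 1}]
w = {(i, j): 0.5 for i in range(2) for j in range(2)}
print(nbh_variance([0.0, 1.0], [0.5, 0.5], nbrs, w))  # -> 2.0
print(nbh_variance([1.0, 1.0], [0.5, 0.5], nbrs, w))  # -> 0.0
```

In the first call the expanded values are 0 and 2 with local mean 1, giving 0.5 + 0.5 per neighborhood. In the second call the expanded values are constant, so there is no local variation to measure.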
Simulation Study
• Used finite, linear, and extensive populations
• Drew 1000 GRTS samples of size 50 in each case
• Generated z by interpolating a surface at the sample point coordinates
• Used two surfaces, a “smooth” and a “rough” one
• Variable probability in all cases, proportional to a “weight” variable
• Compared to the IRS variance estimator
Finite Population Summary
[Table: by surface, the variance V compared with the mean and SD of V_NBH and V_IRS]
Other Applications of Local Neighborhood Variance Estimator
Application to systematic designs
• Widely accepted that a more or less regular pattern of points (e.g., systematic sampling) is more efficient than SRS
• A variety of variance estimators for the estimated mean are available for 1-dimensional systematic sampling
• Examine the behavior of some variance estimators for 2-dimensional systematic and spatially-balanced (not necessarily regular) designs
[Figure: systematic sample point pattern; axes xg and yg, each from 0 to 30]
[Figure: surface xy[,1]^2 + xy[,2]^2 - xy[,1] - xy[,2] + 0.1*cos(20*xy[,1]) + 0.1*sin(15*xy[,2]) + 1.5, plotted over X and Y]
Patchy Surfaces
[Figure: four patchy surface realizations, z.ptch32.ary[, , i], plotted over Y and Z]
Variance Estimators Examined
• SRS – simple random sample
• HS – horizontal stratification
• VS – vertical stratification
• AC – 1st order autocorrelation correction
• CAC – Cochran’s autocorrelation correction
• NBH – local neighborhood variance estimator
Result: GRF, cv = 1 − exp(−2x)
Estimator   Mean        Bias       95% Coverage
V_NBH       0.004949    0.00113    0.973062
V_CAC       0.001436    -0.0023    0.788187
V_AC        0.005709    0.00189    0.982000
V_VS        0.005728    0.00190    0.978625
V_HS        0.005722    0.00190    0.97706
V_SRS       0.011834    0.00801    0.99943
Result: GRF, cv = 1 − exp(−0.5x)
Estimator   Mean        Bias       95% Coverage
V_NBH       0.00992     0.00238    0.97681
V_CAC       0.00626     -0.0013    0.89950
V_AC        0.01270     0.00515    0.98881
V_VS        0.01270     0.00515    0.98731
V_HS        0.01275     0.00521    0.98719
V_SRS       0.01448     0.00693    0.99356
[Figure: 95% coverage (0.85 to 1.00) versus CV parameter (0.0 to 2.0) for the srs, sh, sv, ac, cac, and nbh estimators]
Results: Patchy Surface
Estimator   Mean        Bias       95% Coverage
V_NBH       0.00033     0.00003    0.95408
V_CAC       0.00009     -0.0002    0.74199
V_AC        0.00036     0.00006    0.96290
V_VS        0.00032     0.00002    0.93487
V_HS        0.00040     0.00011    0.95796
V_SRS       0.00087     0.00058    0.99950
Conclusions
• The HS, VS, AC, and NBH estimators all seem to work reasonably well for both the GRF and patchy surfaces
• The NBH estimator seems to give coverages that are closer to nominal than the HS, VS, or AC estimators
• The NBH estimator works for variable probability, spatially constrained designs for which the other estimators do not.
Population estimation for GRTS designs using the spsurvey R library
Population Estimation using spsurvey Library Functions
• Continuous data: cont.analysis
  Estimates CDF (percent and size of resource), percentiles, and mean
• Categorical data: cat.analysis
  Estimates percent and size of resource in each category
• Functionality for both functions
  Specify subpopulations for which estimates are required
  Use Horvitz-Thompson estimator
  Two variance estimator options
    • local neighborhood variance estimator
    • Horvitz-Thompson variance estimator
• Analyses for stratified, unequal probability, and two-stage designs
• Comparison of two CDFs