Spatial Prediction of Coho Salmon Counts on Stream Networks
Dan Dalthorp, Lisa Madsen
Oregon State University
September 8, 2005
Sponsors
• U.S. EPA STAR grant # CR-829095
• U.S. EPA Program for Cooperative Research on Aquatic Indicators at Oregon State University grant # CR-83168201-0.
Outline
• Introduction
(i) Coho salmon data
(ii) GEEs for spatial data
• Latent process model for spatially correlated counts
• Estimation and results
• Cross-validation
• Simulation study
• Conclusions and future research
Coho Salmon Data
• Adult Coho salmon counts at selected points in Oregon coastal stream networks for 1998 through 2003.
• Euclidean distance between sampled points.
• Stream distance between sampled points.
Coastal Stream Networks and Sampling Locations
GEEs for Spatially Correlated Data
• Liang and Zeger’s (1986) pioneering paper in Biometrika introduced GEEs for longitudinal data.
• Zeger (1988) developed GEE analysis for a time series of counts using a latent process model.
• McShane, Albert, and Palmatier (1997) adapted Zeger’s model and analysis to spatially correlated count data.
• Gotway and Stroup (1997) used GEEs to model and predict spatially correlated binary and count data.
• Lin and Clayton (2005) developed asymptotic theory for GEE estimators of parameters in a spatially correlated logistic regression model.
The Latent Process Model
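Suppose:
• $y(s_1), y(s_2), \dots, y(s_n)$ are observed counts;
• $x(s)$ is a covariate vector at location $s$;
• $\varepsilon(s)$ is a non-negative stationary latent spatial process at $s$, with
$$E[\varepsilon(s)] = 1, \qquad \operatorname{var}[\varepsilon(s)] = \sigma^2, \qquad \operatorname{corr}[\varepsilon(s_i), \varepsilon(s_j)] = \rho(h_{ij}; \theta),$$
where $h_{ij}$ is the distance between $s_i$ and $s_j$;
• conditional on the latent process,
$$E[y(s_i) \mid \varepsilon(s_i)] = \operatorname{var}[y(s_i) \mid \varepsilon(s_i)] = \exp(x(s_i)'\beta)\,\varepsilon(s_i).$$
The latent process allows for overdispersion and spatial correlation in $y$.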
The Marginal Model
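These assumptions imply:
$$E[y(s_i)] = \mu_i = \exp(x(s_i)'\beta),$$
$$\operatorname{var}[y(s_i)] = \mu_i + \sigma^2 \mu_i^2,$$
$$\operatorname{cov}[y(s_i), y(s_j)] = \sigma^2 \mu_i \mu_j\, \rho(h_{ij}; \theta), \qquad i \neq j.$$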
For now, we assume a simple constant-mean model and a one-parameter exponential correlation function:
$$E[y(s_i)] = \exp(\beta), \qquad \rho(h; \theta) = \begin{cases} 1, & h = 0, \\ \exp(-h/\theta), & h > 0. \end{cases}$$
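As a concrete check on the marginal moments, here is a minimal NumPy sketch that builds the mean vector and covariance matrix of y for the constant-mean model. It assumes the exponential correlation is parameterized as exp(-h/θ), so that θ plays the role of a range, and that `dist` is an n × n matrix of pairwise distances (Euclidean or stream); the function name `marginal_moments` is hypothetical.

```python
import numpy as np


def marginal_moments(beta, sigma2, theta, dist):
    """Mean vector and covariance matrix of y for the constant-mean model.

    Assumes rho(h; theta) = exp(-h / theta), which equals 1 at h = 0.
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    mu = np.full(n, np.exp(beta))      # E[y(s_i)] = exp(beta) at every site
    R = np.exp(-dist / theta)          # correlation matrix of the latent process
    A = np.diag(mu)
    # Diagonal: mu_i + sigma2 * mu_i^2; off-diagonal: sigma2 * mu_i * mu_j * rho(h_ij)
    V = A + sigma2 * A @ R @ A
    return mu, V


# Usage: two sites one distance unit apart
mu, V = marginal_moments(np.log(2.0), 0.5, 1.0, [[0.0, 1.0], [1.0, 0.0]])
```

Writing V as A + σ²ARA makes the overdispersion (the σ²μ² term) and the spatial covariance come from the same latent-process term.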
Estimating the Model Parameters
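To estimate the parameters $\beta$, $\sigma^2$, and $\theta$, solve the estimating equations
$$D'(\beta)\, V^{-1}(\beta, \sigma^2, \theta)\, [y - \mu(\beta)] = 0,$$
$$E'(\beta, \sigma^2, \theta)\, [z(\beta) - v(\beta, \sigma^2, \theta)] = 0,$$
where
• $\mu(\beta) = (\mu_1, \dots, \mu_n)'$ and $D(\beta) = \partial \mu / \partial \beta$;
• $V = \operatorname{cov}(y) = A + \sigma^2 A R_\varepsilon(\theta) A$, with $A = \operatorname{diag}(\mu_1, \dots, \mu_n)$ and $R_\varepsilon(\theta) = \operatorname{corr}(\varepsilon)$;
• $z(\beta)$ is the vector of elements below the diagonal of the $n \times n$ matrix with $ij$th element $(y(s_i) - \mu_i)(y(s_j) - \mu_j)$;
• $v(\beta, \sigma^2, \theta) = E[z]$ is the vector of elements below the diagonal of $V$;
• $E = \partial v / \partial \theta$.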
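The vectors z and v in the second estimating equation can be formed directly from the below-diagonal site pairs. This is a sketch assuming the constant-mean model and the exp(-h/θ) correlation; `pairwise_z_v` is a hypothetical helper name, and `np.tril_indices` with `k=-1` selects the strictly-below-diagonal pairs (i > j).

```python
import numpy as np


def pairwise_z_v(y, mu, sigma2, theta, dist):
    """Return z, v and the pair distances h for all site pairs (i, j) with i > j."""
    y = np.asarray(y, dtype=float)
    mu = np.broadcast_to(np.asarray(mu, dtype=float), y.shape)
    dist = np.asarray(dist, dtype=float)
    i, j = np.tril_indices(len(y), k=-1)            # strictly below the diagonal
    r = y - mu
    z = r[i] * r[j]                                 # (y(s_i) - mu_i)(y(s_j) - mu_j)
    h = dist[i, j]
    v = sigma2 * mu[i] * mu[j] * np.exp(-h / theta)  # E[z] = model covariance
    return z, v, h


y = [1.0, 3.0, 5.0]
d = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
z, v, h = pairwise_z_v(y, 3.0, 0.5, 1.0, d)
```

With n sites this produces n(n - 1)/2 pairs, so for n ≈ 400 the vectors have roughly 80,000 entries, which is still small enough to hold in memory.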
Iterative Modified Scoring Algorithm
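Step 0: Calculate initial estimates
$$\hat\mu^{(0)} = \text{sample mean}, \qquad \hat\beta^{(0)} = \log \hat\mu^{(0)},$$
$$\hat\sigma^{2(0)} = \frac{s^2 - \hat\mu^{(0)}}{\left(\hat\mu^{(0)}\right)^2}, \qquad s^2 = \text{sample variance},$$
$$\hat\theta^{(0)} = 10{,}000.$$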
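Step 0 translates directly into code. In this sketch the starting range `theta0` is an argument whose default of 10,000 follows Step 0 and is assumed to be in the same units as the distance matrix; the function name `initial_estimates` is hypothetical.

```python
import numpy as np


def initial_estimates(y, theta0=10_000.0):
    """Step 0 starting values for (beta, sigma2, theta)."""
    y = np.asarray(y, dtype=float)
    mu0 = y.mean()                  # sample mean
    s2 = y.var(ddof=1)              # sample variance
    beta0 = np.log(mu0)
    # Moment match var[y] = mu + sigma2 * mu^2 (negative if the data are underdispersed)
    sigma2_0 = (s2 - mu0) / mu0**2
    return beta0, sigma2_0, theta0


beta0, sigma2_0, theta0 = initial_estimates([1, 2, 3, 6])
```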
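Step 1: Update $\beta$.
$$\hat\beta^{(m+1)} = \hat\beta^{(m)} + \left[D' V^{-1} D\right]^{-1} D' V^{-1} (y - \mu),$$
with $D$, $V$, and $\mu$ evaluated at $\left(\hat\beta^{(m)}, \hat\sigma^{2(m)}, \hat\theta^{(m)}\right)$.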
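A sketch of one Step 1 update for the constant-mean model, where μ_i = exp(β) so that ∂μ/∂β = μ. It assumes the exp(-h/θ) correlation and uses `np.linalg.solve` rather than forming V⁻¹ explicitly; `update_beta` is a hypothetical name.

```python
import numpy as np


def update_beta(y, beta, sigma2, theta, dist):
    """One scoring step for beta under the constant-mean model."""
    y = np.asarray(y, dtype=float)
    dist = np.asarray(dist, dtype=float)
    mu = np.full(len(y), np.exp(beta))
    A = np.diag(mu)
    V = A + sigma2 * A @ np.exp(-dist / theta) @ A   # cov(y) at current estimates
    D = mu[:, None]                                  # d mu / d beta = mu, as an n x 1 matrix
    Vinv_D = np.linalg.solve(V, D)                   # V^{-1} D
    info = D.T @ Vinv_D                              # D' V^{-1} D
    score = Vinv_D.T @ (y - mu)                      # D' V^{-1} (y - mu)
    return beta + np.linalg.solve(info, score).item()
```

Under spatial independence (σ² = 0) the step reduces to β + (ȳ - μ)/μ, which is a quick way to sanity-check an implementation.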
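Step 2: Update $\sigma^2$.
$$\hat\sigma^{2(m+1)} = \frac{1}{n} \sum_{i=1}^{n} \frac{\left(y(s_i) - \hat\mu^{(m+1)}\right)^2 - \hat\mu^{(m+1)}}{\left(\hat\mu^{(m+1)}\right)^2}, \qquad \hat\mu^{(m+1)} = \exp\left(\hat\beta^{(m+1)}\right).$$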
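The Step 2 moment update is one line of NumPy. This sketch assumes the constant-mean model; `update_sigma2` is a hypothetical name.

```python
import numpy as np


def update_sigma2(y, beta):
    """Moment update of sigma2 given the current beta (constant-mean model)."""
    y = np.asarray(y, dtype=float)
    mu = np.exp(beta)
    # Each term solves var[y] = mu + sigma2 * mu^2 for sigma2, using (y - mu)^2 for var[y]
    return np.mean(((y - mu) ** 2 - mu) / mu**2)
```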
Step 3: Update $\theta$.
$$\hat\theta^{(m+1)} = \hat\theta^{(m)} + \left[E' E\right]^{-1} E' (z - v),$$
with $E$, $z$, and $v$ evaluated at $\left(\hat\beta^{(m+1)}, \hat\sigma^{2(m+1)}, \hat\theta^{(m)}\right)$.
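Because the θ update uses E'E rather than a weighted information matrix, it is a Gauss-Newton step for fitting v(θ) to the cross-products z by least squares. The sketch below assumes ρ(h;θ) = exp(-h/θ), so ∂v/∂θ = σ²μ² exp(-h/θ) h/θ²; `update_theta` is a hypothetical name, and the usage feeds it noise-free "cross-products" generated at θ = 5 to show the iteration converging.

```python
import numpy as np


def update_theta(z, h, sigma2, mu, theta):
    """One modified-scoring step for theta.

    z: below-diagonal cross-products, h: matching pair distances,
    mu: constant mean exp(beta). Assumes rho(h; theta) = exp(-h / theta).
    """
    z = np.asarray(z, dtype=float)
    h = np.asarray(h, dtype=float)
    rho = np.exp(-h / theta)
    v = sigma2 * mu**2 * rho                  # E[z] under the model
    E = sigma2 * mu**2 * rho * h / theta**2   # dv / dtheta
    # theta + (E'E)^{-1} E'(z - v); theta is a scalar, so E'E is a number
    return theta + E @ (z - v) / (E @ E)


# Usage: exact "data" generated at theta = 5, starting from theta = 4
h = np.array([1.0, 2.0, 3.0, 4.0])
z = 0.5 * 2.0**2 * np.exp(-h / 5.0)
theta = 4.0
for _ in range(50):
    theta = update_theta(z, h, 0.5, 2.0, theta)
```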
Iterate steps 1, 2, and 3 until convergence.
Assessing Model Fit – Estimating the Mean
Year   Sample mean   Euclidean distance   Stream distance
1998   6.2451        6.0                  6.4941
1999   9.0025        8.7286               9.0765
2000   11.92         10.898               11.481
2001   31.359        31.597               34.541
2002   46.494        46.782               46.725
2003   44.453        41.005               41.829
Assessing Model Fit – Estimating the Variance
Year   Sample variance   Euclidean distance   Stream distance
1998   222.07            221.59               221.59
1999   443.65            442.61               442.54
2000   384.59            384.75               383.90
2001   2508.6            2502.3               2512.3
2002   9286.6            9265.4               9265.4
2003   3650.2            3653.4               3648.4
Assessing Model Fit – Estimating the Range (Euclidean Distance)
Assessing Model Fit – Estimating the Range (Stream Distance)
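Cross-validation to compare predictions based on three different assumptions about the underlying spatial process:
1. Null model (spatial independence):
$$\hat z_i = \sum_{j \neq i} z_j / (n - 1)$$
2. Spatial correlation as a function of Euclidean distance (ed):
$$\hat z_i^{ed} = \hat\mu_{Z_i} + \hat\Sigma^{ed}_{Z_i Z_{[i]}} \left(\hat\Sigma^{ed}_{Z_{[i]} Z_{[i]}}\right)^{-1} \left(Z_{[i]} - \hat\mu_{Z_{[i]}}\right)$$
3. Spatial correlation as a function of stream network distance (id):
$$\hat z_i^{id} = \hat\mu_{Z_i} + \hat\Sigma^{id}_{Z_i Z_{[i]}} \left(\hat\Sigma^{id}_{Z_{[i]} Z_{[i]}}\right)^{-1} \left(Z_{[i]} - \hat\mu_{Z_{[i]}}\right)$$
Here $[i]$ indexes all sampled sites except site $i$.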
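The three leave-one-out predictors can be sketched as follows. The sketch assumes the covariance matrix `Sigma` (built from Euclidean or stream distances) and the mean `mu` have already been estimated from all sites; whether they were re-estimated without site i is not stated, so treat that as an assumption. The function names are hypothetical.

```python
import numpy as np


def loo_null(z):
    """Null model: predict each site by the mean of the other n - 1 sites."""
    z = np.asarray(z, dtype=float)
    return (z.sum() - z) / (len(z) - 1)


def loo_kriging(z, Sigma, mu):
    """Leave-one-out predictor mu + Sigma[i,[i]] Sigma[[i],[i]]^{-1} (z[[i]] - mu)."""
    z = np.asarray(z, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    n = len(z)
    zhat = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                    # the index set [i]
        # Sigma is symmetric, so solving against column i gives the weights
        w = np.linalg.solve(Sigma[np.ix_(keep, keep)], Sigma[keep, i])
        zhat[i] = mu + w @ (z[keep] - mu)
    return zhat
```

The same `loo_kriging` serves both the ed and id predictors; only the distance matrix used to build `Sigma` changes.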
1. Bias? Not an issue.
$$\text{Bias} = \sum_{i=1}^{n} (\hat z_i - z_i) / n$$
Covariance model:
Year   Euclidean   Stream distance
1998   -0.001      -0.047
1999   0.007       -0.037
2000   0.013       0.011
2001   -0.005      -0.005
2002   -0.008      -0.007
2003   -0.002      0.020
2. Precision?
$$\text{MSPE} = \sum_{i=1}^{n} (\hat z_i - z_i)^2 / n$$
Covariance model:
Year   Null    Euclidean   Stream distance
1998   14.72   13.25       14.00
1999   20.58   19.75       21.17
2000   20.05   19.83       19.74
2001   48.69   34.38       37.75
2002   98.53   97.04       97.35
2003   60.49   60.92       58.61
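Both cross-validation summaries come from the same vector of prediction errors. This sketch takes the error as ẑ_i - z_i (the sign only affects the bias); `bias_and_mspe` is a hypothetical name.

```python
import numpy as np


def bias_and_mspe(z, zhat):
    """Cross-validation bias sum(zhat - z)/n and MSPE sum((zhat - z)^2)/n."""
    err = np.asarray(zhat, dtype=float) - np.asarray(z, dtype=float)
    return err.mean(), np.mean(err**2)


bias, mspe = bias_and_mspe([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```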
Variances of predicted values
Year   Null   Euclidean   Stream
1998   0.04   10.32       4.95
1999   0.05   12.07       7.68
2000   0.04   11.46       6.40
2001   0.13   38.08       33.36
2002   0.22   15.74       10.65
2003   0.14   24.34       25.99
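Odds$(|E_{ed}| < |E_{id}|)$, where $E_{ed} = \hat Z_i^{ed} - Z_i$ and $E_{id} = \hat Z_i^{id} - Z_i$ are the cross-validation prediction errors:
Year    Odds
1998    256:152
1999    267:132
2000    266:171
2001    197:198
2002    266:171
2003    222:197
Total   1474:1021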
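The odds above compare the absolute prediction errors site by site. A minimal sketch, assuming sites with exactly equal absolute errors are counted on neither side (the slides do not say how ties were handled); `odds_ed_vs_id` is a hypothetical name.

```python
import numpy as np


def odds_ed_vs_id(z, zhat_ed, zhat_id):
    """Count sites where the Euclidean error is smaller, and where the stream error is smaller."""
    z = np.asarray(z, dtype=float)
    e_ed = np.abs(np.asarray(zhat_ed, dtype=float) - z)
    e_id = np.abs(np.asarray(zhat_id, dtype=float) - z)
    # Ties (equal absolute errors) are not counted on either side in this sketch
    return int(np.sum(e_ed < e_id)), int(np.sum(e_id < e_ed))
```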
Simulations
For each year, 8 scenarios that mimic the sample means, variances, and ranges from the data were simulated.
Mean and variance constant:
1. Euclidean spatial correlation
2. Stream network spatial correlation
Mean varies randomly by stream network; variance = 3.66 1.741
3. Euclidean spatial correlation; long range
4. Euclidean spatial correlation; medium range
5. Euclidean spatial correlation; short range
6. Stream network spatial correlation; long range
7. Stream network spatial correlation; medium range
8. Stream network spatial correlation; short range
[Figure: map of sampling sites; blue = sites used for estimation, red = prediction sites]
Simulation procedure
1. Simulate a vector Z of correlated lognormal-Poisson counts covering all sampling sites (n ≈ 400).
2. Estimate the parameters (including the range) via latent process regression from the simulated data for a subset of the sampling sites (blue).
3. Predict Z at the remaining sites (red, m ≈ 400) using (Gotway and Stroup 1997):
$$\hat Z_0 = \hat\mu_{Z_0} + \hat\Sigma_{Z_0 Z}\, \hat\Sigma_{ZZ}^{-1} \left(Z - \hat\mu_Z\right)$$
4. Repeat 100 times for each scenario (8) and year (6).
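Steps 1 and 3 can be sketched as below. The slides do not say how the correlated lognormal-Poisson counts were generated, so this assumes a Gaussian log-process with exponential covariance in Euclidean distance, scaled so that E[ε] = 1 and var(ε) = σ²; with that construction the correlation of ε is (exp(τ²ρ) - 1)/(exp(τ²) - 1), only approximately ρ. The predictor uses the marginal covariance of the latent process model. Function names and the 40/20 split are hypothetical, and for brevity the usage plugs in the true parameters instead of estimating them from the blue sites (Step 2).

```python
import numpy as np


def simulate_lognormal_poisson(coords, mu, sigma2, theta, rng):
    """Draw correlated lognormal-Poisson counts at the given site coordinates."""
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    tau2 = np.log1p(sigma2)               # var of log(eps) giving var(eps) = sigma2
    C = tau2 * np.exp(-d / theta)         # covariance of the Gaussian log-process
    g = rng.multivariate_normal(np.zeros(len(coords)), C)
    eps = np.exp(g - tau2 / 2.0)          # lognormal latent process with mean 1
    return rng.poisson(mu * eps)


def predict_new_sites(z_obs, d_oo, d_no, mu, sigma2, theta):
    """Linear predictor mu + Sigma_01 Sigma_11^{-1} (z - mu) at new sites.

    New sites are assumed distinct from observed ones (h > 0), so the
    cross-covariance has no Poisson (mu) term.
    """
    z_obs = np.asarray(z_obs, dtype=float)
    S11 = mu * np.eye(len(z_obs)) + sigma2 * mu**2 * np.exp(-np.asarray(d_oo) / theta)
    S01 = sigma2 * mu**2 * np.exp(-np.asarray(d_no) / theta)
    return mu + S01 @ np.linalg.solve(S11, z_obs - mu)


rng = np.random.default_rng(1)
coords = rng.uniform(0.0, 100.0, size=(60, 2))
y = simulate_lognormal_poisson(coords, 5.0, 0.5, 10.0, rng)
obs, new = np.arange(40), np.arange(40, 60)       # hypothetical blue/red split
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
zhat = predict_new_sites(y[obs], d[np.ix_(obs, obs)], d[np.ix_(new, obs)], 5.0, 0.5, 10.0)
```

For stream network scenarios, the Euclidean distance matrix would be replaced by stream distances, which requires a network representation not shown here.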
Use Euclidean distance or stream distance in covariance model?
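Evaluation of predictions via two measures:
$$\text{MSPE}_m = \sum_{i=1}^{m} (\hat z_{0i} - z_{0i})^2 / m$$
$$\text{Rsq} = 1 - \frac{\text{SSE}}{\text{SST}}, \quad \text{where} \quad \text{SST} = \sum_{i=1}^{m} (z_{0i} - \bar z_0)^2, \qquad \text{SSE} = \sum_{i=1}^{m} (z_{0i} - \hat z_{0i})^2.$$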
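Both evaluation measures can be computed from the m held-out values and their predictions. A minimal sketch; `mspe_and_rsq` is a hypothetical name. Note that Rsq is negative when the predictions do worse than the mean of the held-out values.

```python
import numpy as np


def mspe_and_rsq(z0, z0_hat):
    """MSPE_m = SSE / m and Rsq = 1 - SSE / SST over the m prediction sites."""
    z0 = np.asarray(z0, dtype=float)
    z0_hat = np.asarray(z0_hat, dtype=float)
    sse = np.sum((z0 - z0_hat) ** 2)
    sst = np.sum((z0 - z0.mean()) ** 2)
    return sse / len(z0), 1.0 - sse / sst


mspe, rsq = mspe_and_rsq([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
```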
Summary of Findings
Cross-validations:
1. MSPEs about the same for Euclidean distance and stream network distance;
2. Errors usually smaller with Euclidean distance;
3. Population spikes more likely to be detected with Euclidean distance.
Simulations:
1. Euclidean spatial process: the Euclidean covariance model gives smaller MSPE than does the stream network distance covariance model;
2. Stream network process: Euclidean covariance model MSPEs are comparable to those of the stream distance model EXCEPT when network means varied and the range of correlation was large.
Future work
-- Incorporate covariates (with some misaligned data);
-- Incorporate downstream distances/flow ratios;
-- Spatio-temporal modeling;
-- Rank correlations in place of covariances;
-- Model selection;
-- Non-random data;