Post on 14-Dec-2014
Combining Data in Species Distribution Models
Bob O’Hara (1), Petr Keil (2), Walter Jetz (2)
(1) BiK-F, Biodiversity and Climate Change Research Centre, Frankfurt am Main, Germany bobohara
(2) Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, USA
Motivation
Map Of Life
www.mol.org/
The Problem
Different data sources
- GBIF
- expert range maps
- eBird and similar citizen science efforts
- organised surveys (BBS, BMSs)
Point Process Models
Point process representation of actual distribution
- Continuous space models
Build different sampling models on top
Point Processes: Model
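Intensity $\rho(\xi)$ at point $\xi$. Assume covariates (features?) $X(\xi)$, and a random field $\nu(\xi)$:
$$\log \rho(\xi) = \eta(\xi) = \sum \beta X(\xi) + \nu(\xi)$$
then, for an area $A$,
$$P(N(A) = r) = \frac{\lambda(A)^{r} e^{-\lambda(A)}}{r!}$$
where
$$\lambda(A) = \int_A e^{\eta(s)}\,ds$$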
In practice...
Constrained refined Delaunay triangulation
$$\lambda(A) \approx \sum_{s=1}^{N} |A(s)|\, e^{\eta(s)}$$
Approximate $\lambda(A)$ numerically: select some integration points, and sum over those
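The weighted sum over integration points can be sketched in R. This is only a one-dimensional illustration on a regular grid, not a Delaunay mesh; the linear predictor `eta` and the helper `approx_lambda` are hypothetical names chosen for the example, with a predictor simple enough that the integral has a closed form to compare against.

```r
# Hypothetical linear predictor on a 1-d space (an assumption for illustration).
eta <- function(s) -1 + 0.5 * s

# Approximate lambda(A) = integral over A of exp(eta(s)) ds by a weighted sum
# over N integration points; each point s carries the weight |A(s)|,
# the length of the cell it represents.
approx_lambda <- function(lower, upper, N) {
  width <- (upper - lower) / N               # |A(s)|, equal for every cell here
  s <- lower + (seq_len(N) - 0.5) * width    # integration points: cell midpoints
  sum(width * exp(eta(s)))
}

lambda_approx <- approx_lambda(0, 2, N = 200)

# Closed form for this particular eta over A = [0, 2].
lambda_exact <- 2 * (1 - exp(-1))

# Probability of no points in A under the Poisson model.
p_empty <- dpois(0, lambda_approx)
```

More integration points give a closer approximation at the cost of more computation, which is the trade-off a mesh has to balance.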
Some Data Types
- Abundance
  - e.g. point counts
- Presence/absence
  - surveys, areal lists
- Point observations
  - museum archives, citizen science observations
- Expert range maps
Abundance
Assume a small area $A$, so that $\eta(\xi)$ is constant, and observation for a time $t$, then $n(A, t) \sim \mathrm{Po}(e^{\mu_A(A,t)})$ with
$$\mu_A(A, t) = \eta(A) + \log(|A|) + \log(t) + \log(p)$$
where $p$ is the probability of observing each individual.
Don't know all of $|A|$, $t$ and $p$, so estimate an intercept.
Can also add a sampling model to $\log(p)$.
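A small simulation shows how the unknown area, observation time and detection probability are absorbed into the intercept of a Poisson GLM. All numbers and variable names below are hypothetical and chosen only for illustration; this is a sketch with a single covariate and no random field.

```r
set.seed(1)

n_sites  <- 1000
x        <- rnorm(n_sites)   # hypothetical covariate
alpha    <- 0.5              # intercept of eta (assumed)
beta     <- 0.4              # covariate effect (assumed)
area     <- 0.5              # |A|, same for every site (assumed)
obs_time <- 10               # t, same for every site (assumed)
p_detect <- 0.3              # probability of observing each individual (assumed)

# mu_A = eta + log|A| + log(t) + log(p)
mu_A   <- alpha + beta * x + log(area) + log(obs_time) + log(p_detect)
counts <- rpois(n_sites, lambda = exp(mu_A))

# Fit without knowing area, obs_time or p_detect: only their combined
# effect is identifiable, and it ends up in the fitted intercept.
fit <- glm(counts ~ x, family = poisson(link = "log"))

expected_intercept <- alpha + log(area * obs_time * p_detect)
coef(fit)
```

The fitted intercept should sit close to `expected_intercept` rather than `alpha`, so the intensity is only recovered up to an unknown constant.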
Presence/Absence for ’points’
As $n(A, t) \sim \mathrm{Po}(e^{\mu_I(A,t)})$,
$$\mathrm{cloglog}\left(\Pr(n(A, t) > 0)\right) = \mu_I(A, t)$$
with $\mu_I(A, t)$ as before.
Again, can make $\log(|A|) + \log(t) + \log(p)$ an intercept.
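The cloglog relationship follows from the Poisson probability of a zero count, and can be checked numerically. The values of `mu` below are arbitrary, and `cloglog` is a helper defined here for the example.

```r
# Complementary log-log link, written out explicitly.
cloglog <- function(prob) log(-log(1 - prob))

mu <- c(-2, -0.5, 0, 1)   # arbitrary values of the linear predictor

# For a Poisson count with mean exp(mu):
# Pr(n > 0) = 1 - Pr(n = 0) = 1 - exp(-exp(mu))
p_presence <- 1 - exp(-exp(mu))

# The same probability from the Poisson mass function.
p_from_dpois <- 1 - dpois(0, lambda = exp(mu))

# Applying cloglog to the presence probability recovers mu.
mu_recovered <- cloglog(p_presence)
```

This is why abundance and presence/absence data can share the same linear predictor: only the likelihood and link differ.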
Presence only: point process
log Gaussian Cox Process
Likelihood is a Poisson GLM (but with non-integer response)
Areal Presence/absence
If an area is large enough, we can’t assume constant covariates, so
$$\Pr(n(A) > 0) = 1 - e^{-\int_A e^{\eta(\xi)}\,d\xi}$$
in practice this is calculated as
$$1 - e^{-\sum_s |A(s)|\, e^{\eta(s)}}$$
which causes problems with the fitting
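The areal calculation can be sketched for a handful of integration points. The values of `eta` and the cell areas are made up for illustration. On the cloglog scale the result is the log of a sum of exponentials, not a linear function of the predictors, which is one plausible reason (an assumption here, not a claim from the slides) why the fitting is awkward.

```r
# Hypothetical values of eta at four integration points inside the area A,
# and the area |A(s)| each point represents.
eta       <- c(-3.0, -2.5, -2.0, -2.8)
cell_area <- c( 1.0,  0.5,  2.0,  1.5)

# lambda(A) is approximated by the weighted sum of exp(eta).
lambda_A <- sum(cell_area * exp(eta))

# Probability of at least one individual in A.
p_present <- 1 - exp(-lambda_A)

# On the cloglog scale this is log(sum(|A(s)| * exp(eta(s)))),
# i.e. no longer linear in the eta(s).
cloglog_p <- log(-log(1 - p_present))
```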
Expert Range Maps
Not the same as areal presence.
Instead, use distance to range as a covariate
- within range, this is 0.
- Have to estimate the slope for outside the range
Use informative priors to force the slope to be negative
[Figure: intensity (0 to 1) against space (1d, 0 to 100), with the species' range marked]
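The distance-to-range covariate can be illustrated in one dimension, on the same 0 to 100 scale as the figure. The range limits and the slope below are hypothetical values chosen for the example, and the intensity is shown relative to its value inside the range.

```r
space <- 0:100            # 1-d space

# Hypothetical expert range limits (assumptions for illustration).
range_lower <- 30
range_upper <- 60

# Distance to the range: 0 inside, distance to the nearest edge outside.
dist_to_range <- pmax(range_lower - space, 0, space - range_upper)

# A negative slope (assumed value) makes intensity decline outside the range.
beta_dist <- -0.1
intensity <- exp(beta_dist * dist_to_range)
```

With a slope near zero the range map has little influence; a strongly negative slope effectively confines the species to the expert range.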
Put these together with INLA
Quicker than MCMC
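library(INLA)
# SolTim.formula and stk.all (the combined inla.stack) are defined elsewhere.
# Two likelihoods: Poisson with a log link and binomial with a cloglog link.
SolTim.res <- inla(SolTim.formula,
                   family = c('poisson', 'binomial'),
                   data = inla.stack.data(stk.all),
                   control.family = list(list(link = "log"),
                                         list(link = "cloglog")),
                   control.predictor = list(A = inla.stack.A(stk.all)),
                   Ntrials = 1, E = inla.stack.data(stk.all)$e, verbose = FALSE)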
The Solitary Tinamou
Photo credit: Francesco Veronesi on Flickr (https://www.flickr.com/photos/francesco veronesi/12797666343)
Data
[Map legend: Whole Region, Expert range, Park (absent), Park (present), eBird, GBIF]
- expert range
- 2 point processes (49 points)
- 28 parks
A Fitted Model
              mean    sd   mode
Intercept    -0.30  0.09  -0.30
b.PP          1.37  0.40   1.37
b.GBIF        1.43  0.26   1.43
Forest       -0.03  0.04  -0.03
NPP           0.15  0.05   0.15
Altitude     -0.02  0.04  -0.02
DistToRange  -0.01  0.02  -0.01
Predicted Distribution
[Map: predicted distribution, colour scale from −0.10 to 0.25; legend: Whole Region, Expert range, Park (absent), Park (present), eBird, GBIF]
Individual Data Types
[Figure: five map panels with their colour scales: Expert Range (−10 to 0), GBIF (−0.060 to −0.048), eBird (−0.060 to −0.048), Parks (−10 to 0), all data (−0.10 to 0.25)]
Summary
Parks and expert range seem to drive distribution
NPP is main covariate, not forest or altitude
What Next
Multiple species
- already being done elsewhere
- estimate sampling biases
More Data
- Point counts (have it working)
Can we estimate absolute probability of presence?
- Distance sampling?
- Mark-recapture?
- scaling issues (in time and space)
Not the final answer...
http://www.gocomics.com/nonsequitur/2014/06/24