
Posted on 10-Aug-2021



Introduction, Day 2

Peter Solymos

Point count data analysis workshop, BIOS2 2021, March 16-25

1 / 36

Outline

Day 1

• Introduction
• We need to talk about data
• A primer in regression techniques

Day 2

• Behavioral complexities
• Removal models and assumptions

Day 3

• The detection process
• Dealing with recordings

Day 4

• Putting it all together
• Roadside surveys

2 / 36

Get course materials

1. Visit https://github.com/psolymos/qpad-workshop/releases
2. Download the latest release into a NEW folder
3. Extract the zip/tar.gz archive
4. Open the workshop.Rproj file in RStudio (or open any other R GUI/console and setwd() to the directory where you downloaded the file)
5. Move your LOCAL files into the new folder to keep things together

3 / 36

Local copy

Avoid conflicts as we update the workshop materials: work in a LOCAL copy of the R markdown files.

source("src/functions.R")
qpad_local(day=2)

LOCAL copies will not be tracked or overwritten by git. You can copy new files over and it will not impact your local copies.

4 / 36

Update bSims

Patched the Shiny apps, please update!

remotes::install_github("psolymos/bSims")

5 / 36

What is detectability?

In the most colloquial terms, δ is the probability that a species is detected given it is present:

P(Y > 0 | N > 0)

6 / 36

Occupancy

In an occupancy framework, we can have:

• A detection: true positives (the false positive rate is 0)
  • $P(Y > 0) = P(Y > 0 \mid N > 0)\,P(N > 0) + P(Y > 0 \mid N = 0)\,P(N = 0)$
  • $P(Y > 0) = \delta\phi + 0(1 - \phi) = \delta\phi$
• A non-detection: false negatives + true negatives
  • $P(Y = 0) = P(Y = 0 \mid N > 0)\,P(N > 0) + P(Y = 0 \mid N = 0)\,P(N = 0)$
  • $P(Y = 0) = (1 - \delta)\phi + 1(1 - \phi) = 1 - \delta\phi$

where $\phi = P(N > 0)$ is the probability of occupancy.

(These are the marginal probabilities used to estimate the parameters using maximum likelihood.)
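A quick simulation can confirm that the marginal probabilities above hold. This is a minimal sketch: the values of δ (`delta`) and ϕ (`phi`) are assumed for illustration, presence is drawn as Bernoulli(ϕ), and detection as Bernoulli(δ) only where the species is present (no false positives).

```r
# Sketch: check P(Y > 0) = delta * phi and P(Y = 0) = 1 - delta * phi by simulation.
# delta and phi are assumed values, not estimates from real data.
set.seed(123)
n     <- 100000  # number of sites
phi   <- 0.6     # occupancy probability, P(N > 0)
delta <- 0.4     # detection probability given presence, P(Y > 0 | N > 0)

z <- rbinom(n, size = 1, prob = phi)        # true presence (1) / absence (0)
y <- rbinom(n, size = 1, prob = z * delta)  # detection; always 0 where z == 0

c(simulated = mean(y > 0),  expected = delta * phi)
c(simulated = mean(y == 0), expected = 1 - delta * phi)
```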

7 / 36

Side note on occupancy

People often confuse these two conditional probabilities:

1. Observing 0 given that the species is present:

• P(Y = 0 | N > 0) = 1− δ

2. Presence of a species given that we observe 0:

• $P(N > 0 \mid Y = 0) = \frac{P(Y = 0 \mid N > 0)\,P(N > 0)}{P(Y = 0)}$

• $= \frac{(1 - \delta)\phi}{\phi(1 - \delta) + (1 - \phi)} = \frac{\phi(1 - \delta)}{1 - \delta\phi}$
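To see how different the two conditional probabilities can be, here is a small sketch with assumed values ϕ = 0.6 and δ = 0.4 (`phi`, `delta`), plus a Monte Carlo check of the Bayes' rule result.

```r
# Sketch: P(Y = 0 | N > 0) versus P(N > 0 | Y = 0), using assumed values
phi   <- 0.6   # occupancy probability, P(N > 0)
delta <- 0.4   # detection probability, P(Y > 0 | N > 0)

# 1. Observing 0 given presence
p_y0_given_present <- 1 - delta

# 2. Presence given that we observe 0 (Bayes' rule)
p_present_given_y0 <- phi * (1 - delta) / (1 - delta * phi)

round(c(p_y0_given_present, p_present_given_y0), 3)  # 0.600 0.474

# Monte Carlo check: share of zero-detection sites that are actually occupied
set.seed(1)
z <- rbinom(1e5, size = 1, prob = phi)
y <- rbinom(1e5, size = 1, prob = z * delta)
mean(z[y == 0])
```

With these values, a non-detection still leaves almost a 50% chance that the species is present.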

8 / 36

Abundance

A lot more combinations of true abundance and observed counts:

          Y = 0   Y = 1   Y = 2   ...
N = 0       x
N = 1       x       x
N = 2       x       x       x
...         x       x       x      x
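Each possible cell in the table above has a probability. As a sketch, assuming the observed count is Binomial given true abundance with an arbitrary detection probability δ = 0.5 (`delta`), the cells with Y > N come out as exactly zero:

```r
# Sketch: P(Y = y | N = n) under an assumed Binomial(n, delta) observation model
delta  <- 0.5
n_vals <- 0:3
y_vals <- 0:3

# Rows are true abundance N, columns are observed counts Y
P <- outer(n_vals, y_vals, function(n, y) dbinom(y, size = n, prob = delta))
dimnames(P) <- list(N = n_vals, Y = y_vals)

round(P, 3)  # upper triangle (y > n) is 0; each row sums to 1
```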

9 / 36

Estimating detectability

To estimate δ, we need:

• ancillary information (multiple visits, distance bands, time intervals, multiple observers),
• parametric model assumptions (i.e. δ varies across locations).
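As an illustration of the first route (multiple visits), here is a minimal sketch of the standard single-season occupancy likelihood fitted with `optim()` to simulated detection histories. The number of sites, visits, and the true values are assumptions chosen for the example; this is not the workshop's own code.

```r
# Sketch: estimate constant detectability (delta) and occupancy (phi) from
# repeated visits. All data are simulated; values are assumed.
set.seed(42)
n_sites <- 500
K       <- 3      # visits per site
phi     <- 0.6    # true occupancy
delta   <- 0.4    # true per-visit detection probability

z <- rbinom(n_sites, 1, phi)                           # true presence
y <- matrix(rbinom(n_sites * K, 1, rep(z, K) * delta),
            nrow = n_sites, ncol = K)                  # detection histories
d <- rowSums(y)                                        # detections per site

# Negative log-likelihood, parameters on the logit scale
# (the binomial coefficient is constant and dropped)
nll <- function(par) {
  phi_hat   <- plogis(par[1])
  delta_hat <- plogis(par[2])
  lik <- phi_hat * delta_hat^d * (1 - delta_hat)^(K - d) +
    (1 - phi_hat) * (d == 0)
  -sum(log(lik))
}

fit <- optim(c(0, 0), nll)
est <- setNames(plogis(fit$par), c("phi", "delta"))
est
```

Because each site is visited K times, the pattern of repeated zeros versus detections is what separates δ from ϕ; with a single visit, only the product δϕ would be identifiable without further assumptions.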

10 / 36

The myth of constant detectability

Detectability zealots often view a method that cannot estimate constant detection probability δ (e.g. single-visit occupancy and N-mixture models) as inferior.

Fortunately for the rest of us: δ can only be constant in very narrow situations, e.g. when surveys are conducted:

• in the same region,
• in similar habitat,
• in the same year,
• on the same day,
• at the same time,
• by the same observer,
• using the same protocol.

11 / 36

Constant detectability is rare

When constant detectability is supported, it is often a consequence of small sample size (i.e. not a lot of detections for a species)¹ ².

¹ Solymos et al. 2013, Methods Ecol. Evol. 4:1047–1058.
² Solymos et al. 2018, Condor 120:765–786.

12 / 36

Availability varies

13 / 36

Availability really varies

14 / 36

Biological mechanisms

• Migration timing drives phenology for many species, e.g. ordinal day of year (DAY),
• when a study spans across biomes, use time since local spring (multi-year average),
• or time since spring green-up, last snow day, etc. based on the actual survey year,
• time of day,
• time since local sunrise (TSSR).
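A minimal sketch of deriving the DAY and TSSR covariates from survey records. The survey dates, start times, and sunrise times below are hypothetical; in practice, sunrise would be computed for each location and date, which we do not attempt here. The helper `to_hours()` is also hypothetical.

```r
# Hypothetical survey records: date, start time, and local sunrise time
surveys <- data.frame(
  date    = as.Date(c("2021-05-20", "2021-06-01", "2021-06-15")),
  start   = c("05:30", "06:10", "07:45"),
  sunrise = c("05:02", "04:51", "04:45"),  # assumed values, not computed
  stringsAsFactors = FALSE
)

# Ordinal day of year (DAY): yday is 0-based, so add 1 (January 1 = 1)
surveys$DAY <- as.POSIXlt(surveys$date)$yday + 1

# Hypothetical helper: convert "HH:MM" strings to decimal hours
to_hours <- function(x) {
  hm <- do.call(rbind, strsplit(x, ":"))
  as.numeric(hm[, 1]) + as.numeric(hm[, 2]) / 60
}

# Time since local sunrise (TSSR), in hours
surveys$TSSR <- to_hours(surveys$start) - to_hours(surveys$sunrise)
surveys
```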

15 / 36

Perceptibility varies too

16 / 36

Physical mechanisms

• Trees block the transmission of sound
• Broad leaves rustle more
• Louder sounds travel farther
• Low frequency sounds travel farther

17 / 36

Let’s unwrap δ

1. Once the species/individual is present (N > 0),
2. it needs to signal its presence: make itself heard/visible, i.e. make itself available (p);
3. then the signal needs to be received by a sensor: the human ear or a microphone (and then the human ear in the lab listening to a recording), i.e. perceptibility (q).

These imply a pre-defined total time duration and maximum counting radius.

18 / 36

QPAD

Now we can expand our equation:

$$E[Y] = NC = (AD)(pq) = qpAD$$

The expected value of the observed count becomes a function of the:

• population density (D),
• area sampled (A),
• availability (p),
• and perceptibility (q).

19 / 36

Space and time to the rescue

p is a monotonic function of time (it increases with survey duration), while q is a monotonic function of area (space) (it decreases as the counting radius grows).
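To illustrate the two monotonic relationships, here is a sketch that assumes a constant singing rate φ for availability, $p(t) = 1 - e^{-t\varphi}$, and a half-normal distance function with parameter τ for perceptibility, averaged over a circle of radius r: $q(r) = \tau^2 (1 - e^{-r^2/\tau^2}) / r^2$. Both functional forms, the parameter values, and the helper names `p_t()` and `q_r()` are assumptions for this example.

```r
# Sketch under assumed functional forms and parameter values
phi <- 0.5   # singing rate, events per minute (assumed)
tau <- 1.0   # half-normal scale, in 100 m units (assumed)

p_t <- function(t) 1 - exp(-t * phi)                      # availability by duration t
q_r <- function(r) tau^2 * (1 - exp(-r^2 / tau^2)) / r^2  # mean perceptibility within radius r

durations <- c(3, 5, 10)      # minutes
radii     <- c(0.5, 1, 1.5)   # 100 m units

p_t(durations)   # increases with duration
q_r(radii)       # decreases with radius

# Expected count E[Y] = D * A * p * q for a 5-min, 100-m radius survey
D <- 2           # individuals per ha (assumed)
A <- pi * 1^2    # area of a 100-m radius circle, in ha
D * A * p_t(5) * q_r(1)
```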

20 / 36

Simulate QPAD

D <- 2.0 # inds / unit area
A <- 1.0 # area
p <- 0.8 # availability
q <- 0.5 # perceptibility

n <- 100 # sample size
N <- rpois(n, lambda = A * D)
Y <- rbinom(n, size = N, prob = p * q)

21 / 36

Output

table(N=N, Y=Y)

##    Y
## N    0  1  2  3  4
##   0  9  0  0  0  0
##   1 15 12  0  0  0
##   2  7 19  6  0  0
##   3  2  9  5  0  0
##   4  2  4  3  0  2
##   5  0  0  3  0  0
##   6  0  1  0  1  0

22 / 36

The model

• (N | D, A) ∼ Poisson(DA)
• (Y | N, p, q) ∼ Binomial(N, pq)

Incorporates key components of reality, but also ignores a lot of details.
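One consequence of this model: a Binomial thinning of a Poisson count is again Poisson, so marginally Y ∼ Poisson(DApq), and counts alone only inform the product of the four terms. The sketch below reuses the parameter values from the simulation slide with a larger, assumed sample size so the moments are stable.

```r
# Sketch: marginal distribution of Y under the Poisson-Binomial model
set.seed(2021)
D <- 2.0; A <- 1.0; p <- 0.8; q <- 0.5
n <- 100000   # larger than on the slide, to stabilize the moments

N <- rpois(n, lambda = A * D)
Y <- rbinom(n, size = N, prob = p * q)

# For a Poisson variable, mean and variance both equal D * A * p * q
c(mean = mean(Y), var = var(Y), expected = D * A * p * q)
```

Any combination of D, A, p, and q with the same product would produce the same distribution of Y, which is why ancillary information is needed to separate them.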

23 / 36

Assumptions

• Observations are independent
• Y involves no double counting
• Area is known and measured without error
• Detectability (pq) is independent of N

And a lot more that we’ll cover later

24 / 36

Why do we need simulation

Statistical techniques are expected to provide unbiased estimates when the data generation process follows the assumptions of the model. Testing this is standard practice for any new technique.

But whether the model assumptions are realistic is rarely evaluated with the same rigor.

For that, we need to link the model to biological mechanisms and do sensitivity analysis. This is a lot more complex than a coin toss.
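The standard practice described above can be sketched as follows: simulate many data sets that follow the model assumptions exactly, estimate a parameter each time, and compare the average estimate with the truth. Here the parameter is the product pq, estimated as `sum(Y) / sum(N)` with N treated as known, a simplification that is only possible in simulations. The number of replicates is an assumption.

```r
# Sketch: checking unbiasedness when data follow the model assumptions
set.seed(1)
D <- 2.0; A <- 1.0; p <- 0.8; q <- 0.5
n <- 100     # sample size per data set
B <- 1000    # number of simulated data sets (assumed)

est <- replicate(B, {
  N <- rpois(n, lambda = A * D)
  Y <- rbinom(n, size = N, prob = p * q)
  sum(Y) / sum(N)   # MLE of p * q given known N
})

c(mean_estimate = mean(est), truth = p * q, bias = mean(est) - p * q)
```

Passing this kind of check says nothing about whether the assumptions themselves are realistic, which is what mechanistic simulation is for.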

25 / 36

Capturing complexity

“I’ve yet to see any problem, however complicated, which when you looked at it the right way didn’t become still more complicated.” (Poul Anderson, Call Me Joe)

26 / 36

It is popular

“Love the simulation we’re dreaming in” — Dua Lipa, Physical

27 / 36

Some history

Movement trajectories of bird flocks to mitigate mortality near airports and wind farms.

Beth Rigby (2016): Simulating Effects of Imperfect Detectability in Bird Surveys

But not a lot more...

28 / 36

bSims goals

• Allow easy testing of statistical assumptions and exploration of the effects of violating these assumptions
• Aid survey design by comparing different options
• And most importantly, have fun while doing it via an intuitive and interactive user interface

29 / 36

bSims design

• Isolation: the spatial scale is small (local point count scale) so that we can treat individual landscapes as more or less homogeneous units (but see below how certain stratified designs and edge effects can be incorporated) and independent in space and time
• Realism: the implementation of biological mechanisms and observation processes is realistic; defaults are chosen to reflect common practice and assumptions
• Efficiency: the implementation is computationally efficient, utilizing parallel computing backends when available
• Extensibility: the package functionality is well documented and easily extensible

30 / 36

bSims verbs

• Initialize (bsims_init): the landscape is defined by the extent and possible habitat stratification
• Populate (bsims_populate): the population of a finite number of individuals within the extent of the landscape
• Animate (bsims_animate): individual behaviours described by movement and vocalization events, i.e. the frequency of sending various types of signals
• Detect (bsims_detect): the physical side of the observation process, i.e. transmitting and receiving the signal
• Transcribe (bsims_transcribe): the “human” aspect of the observation process, i.e. the perception of the received signal

31 / 36

Behavioral events

Event time ($T$) is a continuous random variable.

In the simplest case, its probability density function is the Exponential distribution: $f(t) = \varphi e^{-t\varphi}$

The corresponding cumulative distribution function is:

$$F(t) = \int_0^t f(x)\,dx = 1 - e^{-t\varphi} = p_t,$$

the probability that the event has occurred by duration $t$.

The parameter $\varphi$ is the rate of the Exponential distribution with mean $1/\varphi$ and variance $1/\varphi^2$.
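A quick numerical check of these properties, as a sketch with an assumed rate φ = 0.4 events per minute (`phi`): the share of simulated event times falling within t₀ = 3 minutes should match F(t₀), and the sample mean and variance should approach 1/φ and 1/φ².

```r
# Sketch: Exponential event times with an assumed rate phi
set.seed(7)
phi     <- 0.4                      # events per minute (assumed)
T_event <- rexp(100000, rate = phi) # simulated event times

t0 <- 3  # duration in minutes
c(empirical = mean(T_event <= t0),
  F_t       = 1 - exp(-t0 * phi),
  pexp      = pexp(t0, rate = phi))

c(mean = mean(T_event), expected_mean = 1 / phi,
  var  = var(T_event),  expected_var  = 1 / phi^2)
```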

32 / 36

Survival and hazard function

The complement of the CDF $F(t)$ is called the survival function ($S(t) = 1 - F(t)$, $S(0) = 1$), which gives the probability that the event has not occurred by duration $t$.

The hazard function ($\lambda(t) = f(t)/S(t)$) defines the instantaneous rate of occurrence of the event (risk: the density of events at $t$ divided by the probability of surviving to $t$).

The cumulative hazard (cumulative risk) is the sum of the risks between duration 0 and $t$: $\Lambda(t) = \int_0^t \lambda(x)\,dx$.
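These definitions can be checked numerically for the Exponential case. The sketch below assumes a rate φ = 0.4 and builds S(t), λ(t), and Λ(t) from R's `dexp()`, `pexp()`, and `integrate()`; it also checks the standard identity $S(t) = e^{-\Lambda(t)}$. The function names `S`, `hazard`, and `cum_hazard` are illustrative.

```r
# Sketch: survival, hazard, and cumulative hazard for an assumed Exponential rate
phi <- 0.4

S          <- function(t) pexp(t, rate = phi, lower.tail = FALSE)  # S(t) = 1 - F(t)
hazard     <- function(t) dexp(t, rate = phi) / S(t)               # lambda(t) = f(t) / S(t)
cum_hazard <- function(t) integrate(hazard, lower = 0, upper = t)$value  # Lambda(t)

ts <- c(0.5, 2, 5)
hazard(ts)                              # constant, equal to phi
sapply(ts, cum_hazard)                  # equal to phi * t
exp(-sapply(ts, cum_hazard)) - S(ts)    # close to 0, since S(t) = exp(-Lambda(t))
```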

33 / 36

Exponential model

The simplest survival distribution assumes constant risk over time ($\lambda(t) = \varphi$), which corresponds to the Exponential distribution.

The Exponential distribution also happens to describe the lengths of the inter-event times in a homogeneous Poisson process (events are independent, it is a ‘memory-less’ process).
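A sketch of this connection: chain Exponential inter-event times with an assumed rate φ = 0.4 per minute until a 10-minute window is exceeded, then count the events. Across many simulated individuals, the counts should behave like Poisson(φ × duration), and the share of individuals with at least one event should match $1 - e^{-\varphi t}$. The helper `count_events()` is hypothetical.

```r
# Sketch: homogeneous Poisson process from Exponential inter-event times
set.seed(11)
phi      <- 0.4   # events per minute (assumed)
duration <- 10    # minutes (assumed)

# Hypothetical helper: number of events one individual makes within the window
count_events <- function(phi, duration) {
  n_events <- 0
  t_now <- rexp(1, rate = phi)             # time of the first event
  while (t_now <= duration) {
    n_events <- n_events + 1
    t_now <- t_now + rexp(1, rate = phi)   # memoryless waiting time
  }
  n_events
}

counts <- replicate(20000, count_events(phi, duration))
c(mean = mean(counts), var = var(counts), expected = phi * duration)

# Share of individuals with at least one event vs. 1 - exp(-phi * duration)
c(simulated = mean(counts > 0), expected = 1 - exp(-phi * duration))
```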

34 / 36

Exponential model visualized

[Figure: Exponential model visualized; x axis: Time (0 to 10), y axis: Individuals (0 to 50)]

35 / 36

Zoom in on mechanisms

Today we are focusing on availability, p ∈ (0, 1)

We assume that perceptibility is q = 1.

36 / 36