Page 1: Dataintegrationmethodsforstudying animalpopulationdynamicsstat.sfu.ca/content/dam/sfu/stat/alumnitheses/2015... · 2016-01-21 · Dataintegrationmethodsforstudying animalpopulationdynamics

Data integration methods for studying animal population dynamics

by

Audrey Béliveau

M.Sc., Université de Montréal, 2012
B.Sc., Université de Montréal, 2010

Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

in the Department of Statistics and Actuarial Science

Faculty of Science

© Audrey Béliveau 2015
SIMON FRASER UNIVERSITY

Fall 2015

All rights reserved. However, in accordance with the Copyright Act of Canada, this work may be reproduced without authorization under the conditions for “Fair Dealing.” Therefore, limited reproduction of this work for the purposes of private study, research, criticism, review and news reporting is likely to be in accordance with the law, particularly if cited appropriately.


Approval

Name: Audrey Béliveau

Degree: Doctor of Philosophy (Statistics)

Title: Data integration methods for studying animal population dynamics

Examining Committee:

Chair: Gary Parker, Professor

Richard Lockhart, Senior Supervisor, Professor

Carl Schwarz, Co-Supervisor, Professor

Steven Thompson, Supervisor, Professor

Rick Routledge, Internal Examiner, Professor

Paul Conn, External Examiner, Research Mathematical Statistician, National Marine Mammal Laboratory, NOAA/NMFS Alaska Fisheries Science Center

Date Defended: 22 December 2015


Abstract

In this thesis, we develop new data integration methods to better understand animal population dynamics. In a first project, we study the problem of integrating aerial and access data from aerial-access creel surveys to estimate angling effort, catch and harvest. We propose new estimation methods, study their statistical properties theoretically and conduct a simulation study to compare their performance. We apply our methods to data from an annual Kootenay Lake (Canada) survey.

In a second project, we present a new Bayesian modeling approach to integrate capture-recapture data with other sources of data without relying on the usual independence assumption. We use a simulation study to compare, under various scenarios, our approach with the usual approach of simply multiplying likelihoods. In the simulation study, the Monte Carlo RMSEs and expected posterior standard deviations obtained with our approach are always smaller than or equal to those obtained with the usual approach of simply multiplying likelihoods. Finally, we compare the performance of the two approaches using real data from a colony of Greater horseshoe bats (Rhinolophus ferrumequinum) in the Valais, Switzerland.

In a third project, we develop an explicit integrated population model to integrate capture-recapture survey data, dead recovery survey data and snorkel survey data to better understand the movement from the ocean to spawning grounds of Chinook salmon (Oncorhynchus tshawytscha) on the West Coast of Vancouver Island, Canada. In addition to providing spawning escapement estimates, the model provides estimates of stream residence time and snorkel survey observer efficiency, which are crucial but currently lacking for the use of the area-under-the-curve method currently used to estimate escapement on the West Coast of Vancouver Island.

Keywords: Aerial-access; Capture-recapture; Creel surveys; Independence assumption; Integrated population modeling; Oncorhynchus tshawytscha


Acknowledgements

First and foremost, I am very grateful to my supervisors Richard Lockhart and Carl Schwarz for their time, advice, financial support and the collaboration opportunities offered throughout my doctoral program.

I would like to thank my collaborators: Steve Arndt for providing insight on the creel survey data; Roger Pradel for hosting me at the CEFE and introducing me to integrated population modeling; Michael Schaub and Raphaël Arlettaz for providing the bats data and insight; and finally Roger Dunlop for hosting me during the 2014 Burman River survey and for the numerous discussions that have followed.

I can say without a doubt that those PhD years were the best of my life so far, for the most part thanks to the incredibly friendly atmosphere in the Department and the amazing people I met there. I would like to thank Derek Bingham for hosting me in his lab and providing access to computing resources. I am also grateful to Gary Parker for his support in a wide array of instances. To my fellow graduate students and friends Ararat, Biljana, Elena, Huijing, Mike, Ofir, Oksana, Ruth, Shirin, Zheng and many others, thank you for cheering up my days and for the many dinners, concerts, tennis matches and more! A very special mention goes to Shirin and Ofir for their support in difficult times.

I would like to say a big thank you to all my dancing friends and teammates for all the fun times that helped maintain a good balance in my life. I am also thankful to David Haziza for always believing in me!

Finally, I gratefully acknowledge the financial support from the Natural Sciences and Engineering Research Council of Canada.


Table of Contents

Approval

Abstract

Acknowledgements

Table of Contents

List of Tables

List of Figures

1 Introduction

2 Adjusting for undercoverage of access-points in creel surveys with fewer overflights
    2.1 Introduction
    2.2 Sampling Protocol
    2.3 Statistical Methods
        2.3.1 Inference Framework
        2.3.2 Study of the Bias
        2.3.3 Study of the Variance
        2.3.4 Optimal Allocation
        2.3.5 Stratification
    2.4 Simulation Study
    2.5 Application
    2.6 Discussion

3 Explicit integrated population modeling: escaping the conventional assumption of independence
    3.1 Introduction
    3.2 Background and notation
        3.2.1 Capture-recapture survey
        3.2.2 Population count survey
        3.2.3 Integrated population modeling via likelihood multiplication
    3.3 Integrated population modeling based on the true joint likelihood
        3.3.1 Capture-recapture and count data
        3.3.2 Model variations
    3.4 Simulation Study
    3.5 Application
    3.6 Discussion

4 Integrated population modeling of Chinook salmon (Oncorhynchus tshawytscha) migration on the West Coast of Vancouver Island
    4.1 Introduction
    4.2 Sampling Protocol
    4.3 Notation
    4.4 A Jolly-Seber approach to estimate escapement
    4.5 Integrated population modeling
    4.6 Analysis of the 2012 data
        4.6.1 Assessment of the integrated population model
    4.7 Discussion

Bibliography

Appendix A Supplementary materials for Chapter 2
    A.1 First-order Taylor expansions
    A.2 Assumptions, propositions and proofs for the study of $\mathrm{Err}_{\mathrm{party}}$
        A.2.1 Assumptions
        A.2.2 Study of $\mathrm{Err}_{\mathrm{party}}$ for the estimators $C_R$ and $C_{DE}$
        A.2.3 Study of $\mathrm{Err}_{\mathrm{party}}$ for the estimator $C_1$
        A.2.4 Study of $\mathrm{Err}_{\mathrm{party}}$ for the estimator $C_2$
    A.3 Proof of the Optimal Allocation
    A.4 Monte Carlo measures
    A.5 Figures

Appendix B Supplementary materials for Chapter 3
    B.1 Monte Carlo measures used in the simulation study
    B.2 Plots of the results of the simulation study
    B.3 Bats data analysis

Appendix C Supplementary materials for Chapter 4
    C.1 Analysis of the 2012 capture-recapture data using the software MARK


List of Tables

Table 2.1 Values of $\alpha_i$ and $\beta_i$ for the variance formulas

Table 2.2 Parameter values used to generate the data for the simulation study.

Table 2.3 Monte Carlo measures for the simulation with $\mu_b = 130$. Numbers are expressed in %.

Table 2.4 Allocation of sample size in the 2010-2011 Kootenay Lake Creel Survey.

Table 2.5 Optimal values of $n_o/n_g$ for each month and day type combination for the number of rainbow trout kept. Note that we do not present results for the double expansion estimator because in that case the optimal allocation is $n_o = n_g$.

Table 2.6 Seasonal combined estimates (Est) of total number of rainbow trout kept along with approximate 95% confidence intervals (Low, Upp). The last column is computed as a separate total estimate over the three seasons.

Table 3.1 Changes in the population size per state over time for a study with $K = 3$ periods. The table follows the timeline in Figure 3.1. Starting in the upper left corner of the table, the population is comprised of $N_1$ unmarked individuals at the beginning of period 1. Then, the count survey occurs (which does not affect the state nor size of the population). Then, $B_1$ births occur resulting in $N_1 + B_1$ unmarked individuals in the population. Then, $C_1$ individuals are captured, marked and released which leaves $N_1 + B_1 - C_1$ unmarked individuals in the population. Then, $D^{u}_{1}$ unmarked individuals die and $D^{m}_{11}$ marked individuals die. When period 2 begins, there are respectively $N_1 + B_1 - C_1 - D^{u}_{1}$ and $C_1 - D^{m}_{11}$ unmarked and marked individuals in the population. The table goes on like this until the study is finished. Note: C & R is used to abbreviate “captures and recaptures”.

Table 3.2 Monte Carlo measures comparing the performance of the true joint likelihood approach ($L$) and the composite likelihood approach ($L_c$) in the simulation study, across scenarios and parameters. Each Monte Carlo measure is based on 250 simulated datasets.

Table 3.3 Monte Carlo estimates of $P(W_L \le W_{L_c})$, where $W$ stands for either the absolute error (AE), the standard deviation of the posterior sample (SD) or the length of the 95% HPD credible interval (LCI). Each Monte Carlo measure is based on 250 simulated datasets.

Table 4.1 Notation for the data collected at Burman River. The subscript $s$ can take the values $m$ (males) and $f$ (females).

Table 4.2 Notation for the parameters used in the Jolly-Seber model and/or the integrated population model. The subscript $s$ can take the values $m$ (males) and $f$ (females).

Table 4.3 Variables used in the Jolly-Seber model, categorized based on their role in the model.

Table 4.4 Formulas used to compute quantities of interest for the Jolly-Seber model or the integrated population model. Residence time and alive population size in the stream cannot be estimated from the Jolly-Seber model. Notes: (1) Sums are defined as zero when backwards; (2) The use of $d - 0.5$ in the mean stopover time calculation is based on the assumption that within a day, the movement of fish upstream to the spawning grounds is distributed uniformly over the day; (3) The latent variables $N^{m}_{i,j,s}$ and $A^{m}_{i,j,s}$ are defined as 0 when $i$ is not a capture-recapture day; (4) The time unit is days.

Table 4.5 Variables used in the integrated population model, categorized based on their role in the model.

Table 4.6 Escapement estimates obtained from the Jolly-Seber model and the integrated population model. The formulas used to calculate escapement are given in Table 4.4. CI denotes credible intervals.

Table 4.7 Integrated population modeling marginal estimates and credible intervals of observer efficiency in the snorkel survey, based on the fish visibility covariate.


List of Figures

Figure 2.1 Boxplots of relative bias due to the partial interview of parties for 100 population replicates with varying mean number of boats per day. Left column: scenario (A); right column: scenario (B). The first to fourth rows relate, in order, to the estimators $C_1$, $C_2$, $C_{DE}$ and $C_R$.

Figure 2.2 Kootenay Lake and the creel survey access points. Riondel/Crawford Bay and Boswell/Kuskanook ramps were combined for field monitoring and data analysis. Map provided by A. Waterhouse, Ministry of Forests, Lands, and Natural Resource Operations.

Figure 2.3 Monthly estimates of total number of rainbow trout kept along with approximate 95% confidence intervals. The top and bottom graphs represent weekends and weekdays respectively. The estimators (2.1) to (2.4) are represented respectively by the following symbols: triangle, circle, x mark and square.

Figure 3.1 Timeline of events of the animal population study. The symbols “C”, “B” and “CR” stand for count survey, births and capture-recapture, respectively. Note that the time between the count survey, the births and the capture-recapture survey in each period is negligible.

Figure 3.2 Marginal posterior distributions (smoothed) obtained from analyzing the bats data. The plain line represents the true joint likelihood method while the dashed line represents the composite likelihood method.

Figure 4.1 Map of Burman River on the West Coast of Vancouver Island, Canada

Figure 4.2 Schematic representation of Chinook salmon migration at Burman River, as assumed by the integrated population model. The arrows denote transitions while boxes denote states.

Figure 4.3 Timeline when surveys were performed in 2012. Each occurrence is denoted by a symbol “×”. Adjacent symbols correspond to consecutive days.

Figure 4.4 Summary time series of the 2012 data.

Figure 4.5 Daily discharge measured at Gold River over the 2012 migration period. Although discharge data is not available at Burman River, the data at nearby Gold River are thought to be a good proxy for Burman River. The first big freshet occurred on October 14th.

Figure 4.6 Estimates of the population size in the pool obtained using the Jolly-Seber model based on the formula in Table 4.4. Each estimate is represented along with a 95% HPD credible interval.

Figure 4.7 Stopover time estimates obtained using the Jolly-Seber model based on the formula in Table 4.4. Each estimate is represented along with a 95% HPD credible interval.

Figure 4.8 Estimates of the population size in the tagging pool obtained using the integrated population modeling approach based on the formula in Table 4.4. Each estimate is represented along with a 95% HPD credible interval.

Figure 4.9 Stopover time estimates obtained using the integrated population modeling approach based on the formula in Table 4.4. Each estimate is represented along with a 95% HPD credible interval.

Figure 4.10 Residence time estimates obtained using the integrated population modeling approach based on the formula in Table 4.4. Each estimate is represented along with a 95% HPD credible interval.

Figure 4.11 Estimates of alive population size in the spawning area obtained using the integrated population modeling approach based on the formula in Table 4.4. Each estimate is represented along with a 95% HPD credible interval.

Figure 4.12 Bayesian p-values for the assessment of the capture-recapture component of the integrated population model, using discrepancy $D_1$.

Figure 4.13 Bayesian p-values for the assessment of the snorkel survey component of the integrated population model, using discrepancy $D_2$.


Chapter 1

Introduction

The study of animal population dynamics is important for the management and conservation of animal populations. A variety of surveys can be used for that purpose: capture-recapture surveys, population counts, newborn counts, dead recoveries, telemetry surveys, creel surveys, etc. When a population is studied using more than one type of survey, the integration of all the data in a single statistical analysis can be very challenging. It is currently an area of active research and will be the main topic of this work.

Chapters 2, 3 and 4 form the core of this thesis. They are self-sufficient in the sense that they can be read in any order, they each contain an introduction and the notation is not shared between chapters.

In Chapter 2, we propose new statistical methods to integrate the data from aerial-access creel surveys in order to estimate angling effort, catch and harvest in recreational fisheries. Aerial-access creel surveys rely on two components: (1) A ground component in which fishing parties returning from their trips are interviewed at some access-points of the fishery; (2) An aerial component in which an instantaneous count of the number of fishing parties is conducted. It is common practice to sample fewer aerial survey days than ground survey days. This is thought by practitioners to reduce the cost of the survey, but there is a lack of sound statistical methodology for this case. In Chapter 2, we propose various estimation methods to handle this situation and evaluate their asymptotic properties from a design-based perspective (see Lohr, 2009). The performance of the proposed estimators is studied empirically using a simulation study with varying sampling scenarios. Another aspect that we study in this work is the optimal allocation of the effort between the ground and the aerial portion of the survey, for given costs and budget, for which we derive formulas using the Lagrange multipliers method. Finally, we apply our methods to data from an annual Kootenay Lake (Canada) survey.

Capture-recapture surveys are periodic surveys that take place on a series of capture occasions. On each occasion, a survey crew captures animals from a population. When an animal is captured for the first time, it is marked with a unique identification number and released back into the population, and the identification number is recorded. When an animal is recaptured, its identification number is recorded and it is released back into the population. The data collected by capture-recapture can be used to estimate the survival probability of the marked animals between capture occasions.
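As an illustration of the data such a survey produces, the following minimal Python sketch turns the identification numbers recorded on each occasion into one 0/1 capture history per marked animal. The input layout, the function name `build_capture_histories` and the tag labels are hypothetical choices made for this example; they are not taken from the thesis.

```python
def build_capture_histories(captures_by_occasion):
    """Build a 0/1 capture history for every identification number.

    captures_by_occasion: list whose k-th element holds the identification
    numbers recorded on capture occasion k (hypothetical input layout).
    Returns a dict mapping each identification number to a list with one
    entry per occasion: 1 if the animal was captured, 0 otherwise.
    """
    n_occasions = len(captures_by_occasion)
    histories = {}
    for k, tags in enumerate(captures_by_occasion):
        for tag in tags:
            # First capture: the animal is marked and its record is created.
            if tag not in histories:
                histories[tag] = [0] * n_occasions
            # First capture or recapture: record the sighting on occasion k.
            histories[tag][k] = 1
    return histories


# Three capture occasions with made-up tag numbers.
occasions = [["A1", "A2"], ["A2", "A3"], ["A1", "A3"]]
histories = build_capture_histories(occasions)
```

Animals that are never captured do not appear in `histories`, which is why capture-recapture data alone inform survival of the marked animals only.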

In Chapter 3, we develop new statistical methods for the integrated population modeling of capture-recapture data with other types of data, such as population counts and dead recoveries. Typically, integrated population models rely on the assumption that the datasets are independent so that a joint likelihood is easily formed as a product of likelihoods (Schaub and Abadi, 2011). In our work, we develop a new capture-recapture Bayesian model that takes into account the dependency between datasets. A key aspect of the model is that it uses latent variables that keep track of all population gains (e.g. births) and losses (e.g. deaths) in the unmarked population and the marked cohorts over time. A simulation study compares, under various scenarios, our approach with the common likelihood multiplication approach. Finally, we compare the performance of the two approaches using a real dataset comprised of capture-recapture data, count data and newborn count data on a colony of Greater horseshoe bats (Rhinolophus ferrumequinum) in the Valais, Switzerland.

In Chapter 4, we develop a Bayesian integrated population model to study the return of Chinook salmon (Oncorhynchus tshawytscha) from the ocean to the spawning grounds in Burman River, on the west coast of Vancouver Island, Canada. Chinook salmon on the west coast of Vancouver Island return to their natal stream in the fall after reaching maturity to spawn and die. When entering Burman River, fish stop for at least some time at a stopover pool, where a capture-recapture survey takes place, then move upstream where they spawn and die. The upstream portion of the river is surveyed periodically by snorkelers who count the number of marked fish and the total number of fish seen (alive). Carcass surveys also take place periodically, during which marked and unmarked carcasses are picked up. Our integrated population model integrates the capture-recapture data, carcass data and snorkel data all in a single analysis. This is, to our knowledge, the first use of explicit integrated population modeling applied to salmon migration. Our explicit integrated population model uses latent variables to follow explicitly the movement and state of fish throughout the migration. In this work, we also apply a Bayesian version of the Jolly-Seber model (Schwarz and Arnason, 1996) to the capture-recapture data alone and compare estimates between the integrated method and the Jolly-Seber method.


Chapter 2

Adjusting for undercoverage of access-points in creel surveys with fewer overflights

The work in this chapter underwent a peer-review process for publication in Biometrics, a journal of the International Biometric Society published by Wiley. The paper is currently available in Early View on Wiley Online Library, see Béliveau et al. (2015).

The 2010-2011 Kootenay Lake creel survey was conducted with financial support of the Fish and Wildlife Compensation Program on behalf of its program partners BC Hydro, the Province of BC, Fisheries and Oceans Canada, First Nations and the public. Access interviews and overflight boat count data were collected by Redfish Consulting Ltd. (Nelson, British Columbia).

2.1 Introduction

Sustainability of recreational fisheries relies on well-advised management decisions. To inform those decisions, fishery agencies conduct creel surveys. Many characteristics of a fishery can be of interest, including total catch (number of fish released or kept), total harvest (number of fish kept), or total fishing effort (number of fishing days or hours) over a period of time. The data collection for creel surveys can be of two types: off-site (mail, telephone, door-to-door, logbooks) or on-site (Pollock, Jones and Brown, 1994). In this work, we focus on on-site surveys, which are conducted at the water body location during fishing hours. A common type of on-site survey is the access-point survey: it is a ground survey, which relies on survey agents intercepting and interviewing angling parties immediately at the return of their fishing trip. The survey agents can be posted, for example, at public boat ramps, piers or marinas. If a list of all access-points of the water body can be constructed, and access-points are selected randomly (each with strictly positive probability), an unbiased estimate of, e.g., total catch can be obtained for each survey day. However, in many practical situations, this option is impossible because some access-points may be private (for example private docks or piers) or some parties may use unregulated sites. Consequently, if these cases represent a significant proportion of the parties and/or if these cases differ significantly in their variables of interest from parties that use the covered access-points, then standard estimation methods will have a substantial undercoverage bias (Lohr, 2009).
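To make the size of this effect concrete, the short Python sketch below computes, for one made-up survey day, the total catch over all parties and the total over only those parties that return to covered access-points. All numbers and names (`parties`, `undercoverage_summary`) are hypothetical and serve only to illustrate undercoverage; they are not data or estimators from this work.

```python
def undercoverage_summary(parties):
    """Compare the true daily total with the total seen at covered access-points.

    parties: list of (catch, covered) pairs for one day, where `covered` is
    True when the party returns to a surveyed access-point (hypothetical data).
    Returns (true_total, interviewed_total, relative_bias).
    """
    true_total = sum(catch for catch, _ in parties)
    interviewed_total = sum(catch for catch, covered in parties if covered)
    # Relative bias of using the interviewed total as if it were the day total.
    relative_bias = (interviewed_total - true_total) / true_total
    return true_total, interviewed_total, relative_bias


# Five fishing parties; two of them use private or unregulated access.
parties = [(3, True), (0, True), (5, False), (2, True), (4, False)]
true_total, interviewed_total, relative_bias = undercoverage_summary(parties)
```

In this toy day the interviewed total misses every fish caught by the two uncovered parties, so treating it as the day total understates catch no matter how many days are sampled.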

To address this problem, it is typically assumed in creel surveys that parties that use uncovered access-points do not differ in the variables of interest (catch, harvest, fishing effort, etc.) from parties that use the covered access-points. For instance, they should not be more or less experienced anglers. Still, this assumption is not sufficient for the estimation of totals because the number of parties using uncovered access-points is unknown and typically not negligible. This last piece of information is deduced using aerial surveys. Aerial surveys can be conducted, for example, using aircraft overflights or well-suited viewpoints from which an instantaneous count of the number of fishing parties at a time of the day is obtained. Ideally, aerial surveys should be scheduled at random times of the day (Pollock et al., 1994) but environmental conditions (e.g. inclement weather, daylight hours, airport delay) can make it hard for survey agents to respect the planned schedules. With this in mind, Dauk and Schwarz (2001) proposed estimation methods in the case when the aerial survey is conducted at a convenient time of the day, typically around the peak of fishing activity. The use of deterministic aerial survey times is justified if parties’ choice of access point is not related to their fishing schedule.

In this work, we focus on multi-day surveys for which we wish to estimate totals of variables of interest over multiple days, for example, the weekends of August. Statistical methodology is currently available when ground and overflight surveys are conducted on the same set of days, chosen at random among the days under study (Dauk and Schwarz, 2001). In practice, it is also common that aerial surveys are carried out on a random sample of the ground survey days only. This is thought by fisheries managers to be more economical because flights are costly, and the biological (fish size, age, species) and angler data provided by ground sampling are highly valuable for management. Rather surprisingly, there is a lack of statistical methodology for this type of aerial-access creel survey.

The motivating application for the work in this chapter is the annual creel survey on Kootenay Lake, British Columbia. Estimates of catch (per species and overall), harvest (per species and overall) and fishing effort are required at the monthly level, separately for weekdays and weekends/holidays. In each stratum (e.g. weekends of August), the sampling of days follows a two-phase design: in the first phase, a simple random sample (srs) of days is selected to conduct the ground portion of the survey; in the second phase, a simple random sample of days for the overflight survey is selected from the days when access surveys are done. The access-points to be surveyed are selected deterministically to maximize the proportion of anglers that are interviewed. Also, in practice, because of inclement weather, mechanical breakdown or other reasons, some of the scheduled overflights might not be carried out. In this work, we assume that all scheduled overflights are conducted or that, if any are missed, they are missed at random.
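The two-phase selection of days within one stratum can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `two_phase_sample`, the day labels and the sample sizes are hypothetical, and the sketch covers the selection of days only, not the choice of access-points or any estimator.

```python
import random


def two_phase_sample(days, n_ground, n_overflight, seed=None):
    """Draw ground-survey days, then overflight days nested within them.

    days: list of day labels in the stratum (hypothetical labels).
    n_ground: number of ground-survey days (first phase).
    n_overflight: number of overflight days (second phase).
    Both phases use simple random sampling without replacement.
    """
    if not 0 <= n_overflight <= n_ground <= len(days):
        raise ValueError("need 0 <= n_overflight <= n_ground <= len(days)")
    rng = random.Random(seed)
    # Phase 1: simple random sample of days for the ground (access) survey.
    ground_days = rng.sample(list(days), n_ground)
    # Phase 2: simple random sample of overflight days among the ground days.
    overflight_days = rng.sample(ground_days, n_overflight)
    return sorted(ground_days), sorted(overflight_days)


# A stratum of 8 days (e.g. the weekend days of one month), labelled 1 to 8.
stratum = list(range(1, 9))
ground_days, overflight_days = two_phase_sample(stratum, n_ground=5, n_overflight=3, seed=1)
```

By construction every overflight day is also a ground-survey day, which is the nesting that distinguishes this design from one where both surveys share the same set of days.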

In Section 2.2, the sampling protocol is described in detail and the notation is introduced. Then a variety of estimation methods are provided in Section 2.3 along with their design-based asymptotic properties and a strategy for optimal allocation of resources between the ground and overflight components. In Section 2.4, a simulation study investigates the performance of the estimators. Finally, in Section 2.5, the methods are applied to the 2010-2011 Kootenay Lake creel survey data.

2.2 Sampling Protocol

Consider a population $U$ of size $N$ days. On every day $i \in U$, a set $V_i$ of $M_i$ parties (or boats) fishes on the water body of interest. For every party $j \in V_i$ on day $i \in U$, the variable of interest is $c_{ij}$. It may represent, for example, the number of fish caught, the number of fish kept or the number of rod-hours. For every party $j \in V_i$ on day $i \in U$, an indicator variable, $I_{ij}$, indicates whether party $j$ returns to one of the ground survey access points. In addition, for every day $i \in U$, if an overflight could be conducted, it would be conducted at time $t_i$ (we make the usual assumption that overflights are instantaneous). Then, for every party $j \in V_i$ on day $i \in U$, an indicator variable, $\delta_{ij}(t_i)$, indicates whether party $j$ is fishing at time $t_i$. For the rest of the chapter, we drop the dependence on $t_i$ in $\delta_{ij}(t_i)$ for ease of notation.

In the first phase, a simple random sample $s_g \subset U$ of size $n_g$ days is selected to conduct the ground surveys. On every sampled day $i \in s_g$, the parties that return to the surveyed access-points are interviewed (i.e. the parties for which $I_{ij} = 1$): their corresponding variables $c_{ij}$ as well as the start and end times of their fishing trip are collected. On every day $i \in s_g$, the total of the variable of interest over the interviewed parties, $C_i \equiv \sum_{j \in V_i} c_{ij} I_{ij}$, can be computed from the data.

In the second phase, a simple random sample $s_o \subset s_g$ of size $n_o$ days is selected. Overflight surveys are conducted on those days and, for every day $i \in s_o$, the number of active boats at time $t_i$, $A_{oi} \equiv \sum_{j \in V_i} \delta_{ij}$, is recorded. In addition, for every day $i \in s_o$, one can deduce the value of $\delta_{ij}$ for each party interviewed during the ground survey on that day using the start and end times of their fishing trip. Then, one can compute the number of interviewed parties that are fishing at time $t_i$, $A_{gi} \equiv \sum_{j \in V_i} \delta_{ij} I_{ij}$.
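The day-level quantities $C_i$ and $A_{gi}$ are simple functions of the interview records. The sketch below illustrates one way to compute them, assuming each interview stores a catch value and trip start and end times in hours since midnight; the `Interview` class, the function name and the example values are hypothetical and not part of the survey protocol.

```python
from dataclasses import dataclass


@dataclass
class Interview:
    """One interviewed party (I_ij = 1) on a ground-survey day (hypothetical record)."""
    catch: float  # c_ij, e.g. number of fish kept
    start: float  # trip start time, in hours since midnight
    end: float    # trip end time, in hours since midnight


def day_summaries(interviews, overflight_time):
    """Return (C_i, A_gi) for one day from its interview records."""
    # C_i: total of the variable of interest over the interviewed parties.
    c_i = sum(party.catch for party in interviews)
    # A_gi: number of interviewed parties whose trip covers the overflight time t_i.
    a_gi = sum(1 for party in interviews if party.start <= overflight_time <= party.end)
    return c_i, a_gi


# Made-up interviews for a single day with an overflight at noon.
interviews = [
    Interview(catch=3, start=8.0, end=13.0),
    Interview(catch=0, start=12.5, end=16.0),
    Interview(catch=2, start=6.0, end=11.0),
]
c_i, a_gi = day_summaries(interviews, overflight_time=12.0)
print(c_i, a_gi)
```

Only the first party is on the water at noon in this made-up example, so it alone contributes to $A_{gi}$, while all three parties contribute to $C_i$.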


2.3 Statistical Methods

The goal is to estimate the sum of a variable of interest over all angling parties during the study period: $C^* = \sum_{i \in U} C_i^*$, where $C_i^* = \sum_{j \in V_i} c_{ij}$ is the sum over all angling parties active on day $i$ of the variable of interest. In this section, we propose a number of estimators for $C^*$. We start by suggesting two intuitive estimators:

$$\widehat{C}_1 = \frac{N}{n_g} \sum_{i \in s_g} C_i \, \frac{\sum_{i \in s_o} A_{oi}}{\sum_{i \in s_o} A_{gi}} \tag{2.1}$$

and

$$\widehat{C}_2 = \frac{N}{n_g} \sum_{i \in s_g} C_i \, \frac{1}{n_o} \sum_{i \in s_o} \frac{A_{oi}}{A_{gi}}. \tag{2.2}$$

The general idea behind these two estimators is to calculate an estimate of the total of the variable of interest at the surveyed access-points and expand it to all access-points using an inflation factor computed as a ratio of the $A_{oi}$'s and $A_{gi}$'s. The difference between $\widehat{C}_1$ and $\widehat{C}_2$ lies in how that ratio is computed: $\widehat{C}_1$ uses a ratio of sums, whereas $\widehat{C}_2$ uses a mean of daily ratios.

As a third estimator, we suggest

$$\widehat{C}_{DE} = \frac{N}{n_o} \sum_{i \in s_o} C_i \frac{A_{oi}}{A_{gi}}, \tag{2.3}$$

which uses only information from days when both access and aerial components are available. Setting $y_i = C_i \frac{A_{oi}}{A_{gi}}$, this estimator is a double expansion estimator (see, e.g., Särndal, Swensson and Wretman (1992), p. 348), where $y_i$ can be seen as a proxy for $C_i^*$. The double expansion estimator is a generalization of the (single-phase) expansion estimator (also called Horvitz-Thompson estimator) to two-phase designs. It is simply a weighted sum of the $y_i$'s computed from the aerial survey days' data, where the weights correspond to the inverse probability of inclusion in the sample, $N/n_o$. The estimator is design-unbiased but does not integrate auxiliary information, namely the information collected on ground survey days that do not have an overflight. Hence, we propose to use that auxiliary information in a two-phase ratio estimator (see again Särndal et al., p. 359):

$$\widehat{C}_R = \frac{N}{n_o} \sum_{i \in s_o} y_i \, \frac{\frac{1}{n_g} \sum_{j \in s_g} C_j}{\frac{1}{n_o} \sum_{j \in s_o} C_j}. \tag{2.4}$$

Ratio estimators are asymptotically design-unbiased and have improved design-efficiency over expansion estimators when $y_i$ is approximately proportional to $C_i$.

These four estimators are consistent in the sense that if we sample all days and interview all fishing parties every day, then the estimators are equal to the true total catch $C^*$.
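The four estimators (2.1) to (2.4) can be computed directly from day-level summaries. The following is a minimal sketch, assuming the data are held in two dictionaries keyed by day; the function name, the data layout and the numbers are illustrative assumptions, not the code used for this chapter.

```python
def creel_estimators(N, ground, overflight):
    """Return the four estimators (2.1)-(2.4) of the total.

    N          -- number of days in the population U
    ground     -- dict: day -> C_i, for every day in s_g
    overflight -- dict: day -> (A_oi, A_gi), for every day in s_o (a subset of s_g);
                  every A_gi is assumed to be positive
    """
    n_g, n_o = len(ground), len(overflight)
    expanded_c = N / n_g * sum(ground.values())  # (N / n_g) * sum of C_i over s_g
    sum_ao = sum(a_o for a_o, _ in overflight.values())
    sum_ag = sum(a_g for _, a_g in overflight.values())
    ratios = [a_o / a_g for a_o, a_g in overflight.values()]
    # y_i = C_i * A_oi / A_gi, available on overflight days only.
    y = {i: ground[i] * a_o / a_g for i, (a_o, a_g) in overflight.items()}

    c1 = expanded_c * sum_ao / sum_ag        # (2.1): ratio of sums
    c2 = expanded_c * sum(ratios) / n_o      # (2.2): mean of daily ratios
    c_de = N / n_o * sum(y.values())         # (2.3): double expansion
    mean_c_sg = sum(ground.values()) / n_g
    mean_c_so = sum(ground[i] for i in overflight) / n_o
    c_r = c_de * mean_c_sg / mean_c_so       # (2.4): two-phase ratio
    return {"C1": c1, "C2": c2, "CDE": c_de, "CR": c_r}


# Made-up data: N = 10 days, four ground days, two of them with an overflight.
ground = {1: 20, 2: 30, 3: 10, 4: 40}
overflight = {1: (30, 15), 3: (20, 10)}
est = creel_estimators(10, ground, overflight)
print(est)
```

In this made-up example the ratio $A_{oi}/A_{gi}$ equals 2 on both overflight days, so the estimators (2.1), (2.2) and (2.4) coincide, while the double expansion estimator differs because the overflight days happen to have below-average $C_i$.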


Before describing the inference framework in which we study the proposed estimators, let us make some general remarks. First, note that if $C_i$ is constant across days (all $i \in U$), then $\widehat{C}_2 = \widehat{C}_R = \widehat{C}_{DE}$. Second, note that if $A_{oi}/A_{gi}$ is constant across days (all $i \in U$), then $\widehat{C}_1 = \widehat{C}_2 = \widehat{C}_R$. Therefore, if the total catch per day and the proportion of interviewed parties are similar across days, all estimators are expected to be roughly equivalent. However, the first condition seems very unlikely to be satisfied in practice because daily environmental conditions (such as weather) could significantly affect the number of fishing parties and the success of the parties. Also, regarding the second condition, there can be, for example, a greater use of non-sampled access points on good weather days in summer, which tends to decrease the proportion of interviewed parties.

2.3.1 Inference Framework

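Throughout this chapter, we use the generic notation $\widehat{C}$ to denote an estimator of $C^*$. The total error of an estimator $\widehat{C}$ is $\widehat{C} - C^* = (\widehat{C} - \widetilde{C}) + (\widetilde{C} - C^*) \equiv \mathrm{Err}_{\mathrm{day}}(\widehat{C}) + \mathrm{Err}_{\mathrm{party}}(\widehat{C})$, where the first term, $\mathrm{Err}_{\mathrm{day}}(\widehat{C})$, is the error due to the sampling of days while the second term, $\mathrm{Err}_{\mathrm{party}}(\widehat{C})$, is the error due to the partial interview of fishing parties. Here, $\widetilde{C}$ denotes the estimator one would have used in the case of a census of ground and overflight days. For example, if the estimation strategy is to use the estimator $\widehat{C}_R$, the estimator used in the presence of a census of days would be

$$\widetilde{C} = \left(\frac{N}{N} \sum_{i \in U} y_i\right) \frac{\frac{1}{N} \sum_{j \in U} C_j}{\frac{1}{N} \sum_{j \in U} C_j} = \sum_{i \in U} y_i.$$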

First, we assume that there is a superpopulation model, $m$, that randomly generates, for each day $i \in U$, the number $M_i$ of fishing parties. In addition, it generates, for each party $j$ on day $i$: variables of interest, $c_{ij}$'s; fishing status at time $t_i$, $\delta_{ij}$'s; and indicators of return to one of the surveyed access-points, $I_{ij}$'s. Then, following the established sampling design, a two-phase sample of days is randomly selected by the survey practitioner. Inference can be made following different approaches depending on the sources of randomness one is willing to take into account. In this chapter, we adopt the design-based mode of inference, which considers only the randomness coming from the design. For example, unbiasedness under the design-based approach means that on average, over all the possible samples of days, the total error is null. Another type of inference, which we do not pursue in this work, would be joint design and model-based inference. Although we are doing design-based inference, we make use of the superpopulation model mentioned at the beginning of this paragraph. The purpose of that model is to give guidance concerning the design-based biases of our estimates.

From a design-based perspective, the contribution $\mathrm{Err}_{\mathrm{day}}(\widehat{C})$ to the total error is random (design-dependent) while the contribution $\mathrm{Err}_{\mathrm{party}}(\widehat{C})$ is a fixed quantity, because $\widetilde{C}$ and $C^*$ do not depend on the sample of days. As a consequence, $\mathrm{Err}_{\mathrm{party}}(\widehat{C})$ contributes to the bias of $\widehat{C}$ but not to its variance:

$$\mathrm{Bias}_p(\widehat{C}) \equiv E_p(\widehat{C} - C^*) = E_p\left\{\mathrm{Err}_{\mathrm{day}}(\widehat{C}) + \mathrm{Err}_{\mathrm{party}}(\widehat{C})\right\} = E_p\left\{\mathrm{Err}_{\mathrm{day}}(\widehat{C})\right\} + \mathrm{Err}_{\mathrm{party}}(\widehat{C}) \tag{2.5}$$

and

$$\mathrm{Var}_p(\widehat{C}) = \mathrm{Var}_p(\widehat{C} - C^*) = \mathrm{Var}_p\left\{\mathrm{Err}_{\mathrm{day}}(\widehat{C}) + \mathrm{Err}_{\mathrm{party}}(\widehat{C})\right\} = \mathrm{Var}_p\left\{\mathrm{Err}_{\mathrm{day}}(\widehat{C})\right\},$$

where $E_p(\cdot)$ and $\mathrm{Var}_p(\cdot)$ denote respectively the expectation and the variance under the sampling plan.

2.3.2 Study of the Bias

From equation (2.5), two terms contribute to the bias: $E_p\{\mathrm{Err}_{\mathrm{day}}(\widehat{C})\}$ and $\mathrm{Err}_{\mathrm{party}}(\widehat{C})$.

To begin, we focus on the first term. In the case of the double expansion estimator $\widehat{C}_{DE}$, we have $E_p\{\mathrm{Err}_{\mathrm{day}}(\widehat{C}_{DE})\} = E_p\left(\frac{N}{n_o} \sum_{i \in s_o} y_i\right) - \sum_{i \in U} y_i = 0$. This result follows from classical survey sampling theory for two-phase designs (see, e.g., Lohr (2009), p. 473). The other estimators are smooth non-linear functions of estimated totals that can be linearized using a first-order Taylor series in the traditional finite population asymptotic framework of Isaki and Fuller (1982). The Taylor series expansions of the estimators are given in Appendix A.1. Consequently, $E_p\{\mathrm{Err}_{\mathrm{day}}(\widehat{C})\}$ is negligible relative to the true total catch $C^*$ when $n_o$ is large enough.

We now focus on the second term of (2.5). In general, this term is not negligible but we are interested in finding situations in which it is. Note that if all fishing parties were interviewed on the sampled days (all access-points are known, accessible and surveyed), we would have $\mathrm{Err}_{\mathrm{party}}(\widehat{C}) = 0$ for all four estimators. Therefore, sampling as many access-points as possible helps in reducing the bias of the estimators.

Now, we study the quantity $\mathrm{Err}_{\mathrm{party}}(\widehat{C})/C^*$ in an asymptotic framework consisting of a sequence of superpopulation models, $\{m_\eta\}_{\eta=1}^{\infty}$. For any superpopulation model $m_\eta$, the number of fishing parties on each day $i$ in the population of days $U$ is denoted $M_{\eta i}$ and tends to infinity in probability, as $\eta \to \infty$. The subscript $\eta$ is dropped for ease of notation.

Note that $\widehat{C}_{DE}$ and $\widehat{C}_R$ have the same value of $\widetilde{C}$ and therefore the same value of $\mathrm{Err}_{\mathrm{party}}(\widehat{C})$, so they can be studied at the same time. Because neither the access-points nor the time of the overflight were selected randomly, it is necessary to assume that the superpopulation model is such that parties generated on a given day have the same probability of being interviewed (this probability should not depend on, e.g., fishing period or ability). More formally, we assume that on any day $i \in U$, the random variables

$$I_{ij} \mid (M_i, c_{i1}, \ldots, c_{iM_i}, \delta_{i1}, \ldots, \delta_{iM_i}), \quad j = 1, \ldots, M_i,$$

are i.i.d. Bernoulli($p_i$), where $p_i$ is a non-zero probability. This is an important assumption whose validity must be gauged by fisheries scientists prior to the survey. In addition, we use two assumptions that are mainly technical and are normally satisfied. See Appendix A.2.1 for the assumptions.

It can be shown that, under the assumptions previously described, $\mathrm{Err}_{\mathrm{party}}(\widehat{C}_{DE})$ and $\mathrm{Err}_{\mathrm{party}}(\widehat{C}_R)$ are negligible relative to the target $C^*$ when the number of fishing parties on each day is large enough. See Appendix A.2.2 for a proof of this assertion. On the other hand, the estimators $\widehat{C}_1$ and $\widehat{C}_2$ require additional assumptions in order for the error due to the partial interview of parties to be negligible. In the case of $\widehat{C}_2$, $p_i$ should be approximately constant across days $i \in U$ (see Appendix A.2.4 for a proof of this assertion). Regarding $\widehat{C}_1$, we found two cases that provide negligible error:

1. $p_i$ is approximately constant across days $i \in U$, or

2. $C_i^*/M_i$ (the average catch per boat) and $A_{oi}/M_i$ (the proportion of parties fishing at the time of the aerial count) are approximately constant across days $i \in U$.

See Appendix A.2.3 for a proof of this assertion. To sum up, we found that the estimators $\widehat{C}_R$ and $\widehat{C}_{DE}$ are those that require the weakest assumptions to get negligible error due to partial interview of parties.

2.3.3 Study of the Variance

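The variance of $\widehat{C}$ can be expressed using the usual two-phase decomposition of the variance: $\mathrm{Var}_p(\widehat{C}) = \mathrm{Var}_1\{E_2(\widehat{C} \mid s_g)\} + E_1\{\mathrm{Var}_2(\widehat{C} \mid s_g)\}$, where $E_1(\cdot)$ and $\mathrm{Var}_1(\cdot)$ denote respectively the first-phase expectation and variance and $E_2(\cdot \mid s_g)$ and $\mathrm{Var}_2(\cdot \mid s_g)$ denote respectively the second-phase expectation and variance, conditional on the first-phase sample $s_g$.

In the case of the double expansion estimator, we have $\mathrm{Var}_1\{E_2(\widehat{C} \mid s_g)\} = N^2 \left(1 - \frac{n_g}{N}\right) \frac{S_y^2}{n_g}$ and $E_1\{\mathrm{Var}_2(\widehat{C} \mid s_g)\} = N^2 \left(1 - \frac{n_o}{n_g}\right) \frac{S_y^2}{n_o}$, where $S_y^2 = \frac{1}{N-1} \sum_{i \in U} (y_i - \bar{y}_U)^2$ and $\bar{y}_U = \frac{1}{N} \sum_{i \in U} y_i$. The total variance is therefore:

$$\mathrm{Var}_p(\widehat{C}) = N^2 \left(1 - \frac{n_o}{N}\right) \frac{S_y^2}{n_o}. \tag{2.6}$$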
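As a worked step, the two components of the decomposition add up to (2.6) because the terms in $1/n_g$ cancel; the display below uses only the quantities already defined for the double expansion estimator.

```latex
\begin{aligned}
\mathrm{Var}_p(\widehat{C}_{DE})
  &= N^2 \left(1 - \frac{n_g}{N}\right) \frac{S_y^2}{n_g}
   + N^2 \left(1 - \frac{n_o}{n_g}\right) \frac{S_y^2}{n_o} \\
  &= N^2 S_y^2 \left[ \left(\frac{1}{n_g} - \frac{1}{N}\right)
   + \left(\frac{1}{n_o} - \frac{1}{n_g}\right) \right] \\
  &= N^2 S_y^2 \left(\frac{1}{n_o} - \frac{1}{N}\right)
   = N^2 \left(1 - \frac{n_o}{N}\right) \frac{S_y^2}{n_o}.
\end{aligned}
```

The result depends on $n_g$ only through the constraint $n_o \le n_g$, which reflects the fact that ground days without an overflight do not enter the double expansion estimator.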

The asymptotic variances of the remaining three estimators can be obtained from their first-order Taylor expansions. Here, the large sample properties refer again to the finite population framework. We use $\mathrm{AV}_p(\cdot)$ to denote asymptotic design-variance. For all three remaining estimators, the asymptotic variance can be written in the form

$$\mathrm{AV}_p(\widehat{C}) = N^2 \left(1 - \frac{n_g}{N}\right) \frac{S_\alpha^2}{n_g} + N^2 \left(1 - \frac{n_o}{n_g}\right) \frac{S_\beta^2}{n_o}, \tag{2.7}$$

where $S_\alpha^2$ and $S_\beta^2$ are measures of dispersion defined analogously to $S_y^2$. The values of $\alpha_i$ and $\beta_i$ associated with each estimator can be found in Table 2.1.

| $\widehat{C}$ | $\alpha_i$ | $\beta_i$ |
|---|---|---|
| $\widehat{C}_1$ | $C_i \frac{\bar{A}_{oU}}{\bar{A}_{gU}} + A_{oi} \frac{\bar{C}_U}{\bar{A}_{gU}} - A_{gi} \frac{\bar{C}_U \bar{A}_{oU}}{\bar{A}_{gU}^2}$ | $\frac{\bar{C}_U}{\bar{A}_{gU}} \left(A_{oi} - A_{gi} \frac{\bar{A}_{oU}}{\bar{A}_{gU}}\right)$ |
| $\widehat{C}_2$ | $C_i \bar{R}_U + R_i \bar{C}_U$, with $R_i = \frac{A_{oi}}{A_{gi}}$ | $R_i \bar{C}_U$ |
| $\widehat{C}_R$ | $y_i$ | $y_i - C_i \frac{\bar{y}_U}{\bar{C}_U}$ |

Table 2.1: Values of $\alpha_i$ and $\beta_i$ for the variance formulas

Note that the variance formulas do not rely upon the assumptions in Section 2.3.2, which are only relevant to the study of bias. Theoretical comparisons between variances seem ambitious for most estimators. However, it can be seen that $\widehat{C}_R$ will be more efficient than the double expansion estimator if its corresponding value of $S_\beta^2$ is smaller than $S_y^2$ or, equivalently, if the population correlation coefficient between $y$ and $C$ is sufficiently large, that is, greater than $\frac{1}{2} \frac{\mathrm{CV}(C)}{\mathrm{CV}(y)}$, where CV stands for the population coefficient of variation.
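The equivalence between the two conditions can be checked in a few lines. The sketch below writes $S_{yC}$ for the population covariance between $y$ and $C$ and $\rho$ for their correlation (notation introduced here for the derivation only), and assumes $\bar{y}_U / \bar{C}_U > 0$.

```latex
\begin{aligned}
S_\beta^2
  &= S_y^2 - 2\,\frac{\bar{y}_U}{\bar{C}_U}\, S_{yC}
     + \left(\frac{\bar{y}_U}{\bar{C}_U}\right)^{2} S_C^2
  \qquad \text{since } \beta_i = y_i - C_i\,\frac{\bar{y}_U}{\bar{C}_U}, \\
S_\beta^2 < S_y^2
  &\iff 2\, S_{yC} > \frac{\bar{y}_U}{\bar{C}_U}\, S_C^2
  \iff 2\,\rho\, S_y S_C > \frac{\bar{y}_U}{\bar{C}_U}\, S_C^2 \\
  &\iff \rho > \frac{1}{2}\,\frac{S_C / \bar{C}_U}{S_y / \bar{y}_U}
   = \frac{1}{2}\,\frac{\mathrm{CV}(C)}{\mathrm{CV}(y)}.
\end{aligned}
```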

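Variance estimators can be obtained from (2.7) by replacing all $S^2$ quantities by their equivalent at the sample level. For example, an estimator of the variance of $\widehat{C}_R$ would be

$$\widehat{\mathrm{Var}}_p(\widehat{C}_R) = N^2 \left(1 - \frac{n_g}{N}\right) \frac{s_\alpha^2}{n_g} + N^2 \left(1 - \frac{n_o}{n_g}\right) \frac{s_\beta^2}{n_o},$$

where $s_\alpha^2 = \frac{1}{n_o - 1} \sum_{i \in s_o} (y_i - \bar{y}_{s_o})^2$ and $s_\beta^2 = \frac{1}{n_o - 1} \sum_{i \in s_o} (\beta_i - \bar{\beta}_{s_o})^2$, with $\bar{y}_{s_o} = \frac{1}{n_o} \sum_{i \in s_o} y_i$, $\bar{\beta}_{s_o} = \frac{1}{n_o} \sum_{i \in s_o} \beta_i$ and $\beta_i = y_i - \frac{\bar{y}_{s_o}}{\bar{C}_{s_o}} C_i$.

A confidence interval of approximate level $1 - \alpha$ can be obtained as $\widehat{C} \pm t_{n_g - 1, 1 - \alpha/2} \sqrt{\widehat{\mathrm{Var}}_p(\widehat{C})}$.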
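The plug-in variance estimator for the two-phase ratio estimator and the associated confidence interval can be sketched as follows. The function names and data layout are illustrative assumptions; the $t$ quantile is passed in by the caller (here 3.182, the usual table value for 3 degrees of freedom at level 0.975) so that the sketch needs only the standard library.

```python
from math import sqrt


def ratio_variance_estimate(N, ground, y_overflight):
    """Plug-in variance estimate for the two-phase ratio estimator.

    N            -- number of days in the population U
    ground       -- dict: day -> C_i, for every day in s_g
    y_overflight -- dict: day -> y_i, for every day in s_o (at least two days)
    """
    n_g, n_o = len(ground), len(y_overflight)
    if n_o < 2:
        raise ValueError("at least two overflight days are needed")
    days = list(y_overflight)
    y = [y_overflight[i] for i in days]
    c = [ground[i] for i in days]
    y_bar = sum(y) / n_o
    c_bar = sum(c) / n_o
    # s_alpha^2: sample variance of the y_i (alpha_i = y_i for the ratio estimator).
    s2_alpha = sum((v - y_bar) ** 2 for v in y) / (n_o - 1)
    # beta_i = y_i - (y_bar / c_bar) * C_i, then its sample variance.
    beta = [yi - y_bar / c_bar * ci for yi, ci in zip(y, c)]
    beta_bar = sum(beta) / n_o
    s2_beta = sum((b - beta_bar) ** 2 for b in beta) / (n_o - 1)
    return (N ** 2 * (1 - n_g / N) * s2_alpha / n_g
            + N ** 2 * (1 - n_o / n_g) * s2_beta / n_o)


def confidence_interval(estimate, variance, t_quantile):
    """Return estimate -/+ t_quantile * sqrt(variance)."""
    half_width = t_quantile * sqrt(variance)
    return estimate - half_width, estimate + half_width


# Made-up data: N = 10, n_g = 4, n_o = 2.
ground = {1: 20, 2: 30, 3: 10, 4: 40}
y_overflight = {1: 40.0, 3: 20.0}
variance = ratio_variance_estimate(10, ground, y_overflight)
lower, upper = confidence_interval(500.0, variance, t_quantile=3.182)
print(variance, lower, upper)
```

In this made-up example $y_i$ is exactly proportional to $C_i$ on the overflight days, so $s_\beta^2 = 0$ and only the first-phase term contributes to the variance estimate.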

2.3.4 Optimal Allocation

Suppose that a budget $B$ is allocated to the survey and that each overflight has a cost of $\kappa_o$ and each ground survey has a cost of $\kappa_g$. In the case of $\widehat{C}_{DE}$, the allocation that minimizes the variance (2.6) subject to the constraint $n_o \kappa_o + n_g \kappa_g \le B$ is obviously $n_o = n_g = B/(\kappa_o + \kappa_g)$ because the information collected on ground survey days that do not have an aerial survey is not used to compute $\widehat{C}_{DE}$. For the other estimators, the allocation that minimizes the asymptotic variance (2.7) subject to the constraint $n_o \kappa_o + n_g \kappa_g \le B$ is found using the method of Lagrange multipliers; see Appendix A.3. We obtain:

$$n_g = \frac{B}{\kappa_g} \left(1 + \sqrt{\frac{\kappa_o}{\kappa_g} \, \frac{S_\beta^2}{S_\alpha^2 - S_\beta^2}}\right)^{-1}$$

and

$$n_o = n_g \sqrt{\frac{\kappa_g}{\kappa_o} \, \frac{S_\beta^2}{S_\alpha^2 - S_\beta^2}}.$$

Although the quantities $S_\alpha^2$ and $S_\beta^2$ are unknown, they can be approximated using previous years' survey data. We now make two practical remarks. First, the optimal allocation formulas require that $S_\alpha^2 - S_\beta^2 > 0$. If this is not the case, then the optimal allocation is necessarily $n_o = n_g = \frac{B}{\kappa_o + \kappa_g}$. Second, the optimal allocation formulas can lead to allocations that do not satisfy $n_o \le n_g \le N$. In that case, the optimal solution is found on the boundary, which means that either $n_o = n_g < N$ or $n_o < n_g = N$. In the former case, the optimal allocation is $n_o = n_g = \frac{B}{\kappa_o + \kappa_g}$ and in the latter, $n_o = \frac{B - N \kappa_g}{\kappa_o}$ and $n_g = N$. It suffices to compute both allocations along with their variance and choose the allocation with the smallest variance.

The optimal fraction of overflight days to ground days depends on two ratios. As the ratio of the cost of an aerial survey to the cost of a ground survey increases, fewer overflight days should be performed. As $S_\alpha^2$ increases relative to $S_\beta^2$, the optimal allocation also favors ground surveys.
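The allocation rule, including the two boundary cases, can be sketched as below. This is an illustration under stated assumptions: the function names are hypothetical, allocations are returned as real numbers (rounding to whole days is left to the user), and the budget and dispersion values in the example are made up; only the two unit costs are those quoted for Kootenay Lake in Section 2.5.

```python
from math import sqrt


def asymptotic_variance(N, n_g, n_o, s2_alpha, s2_beta):
    """Asymptotic variance (2.7) for a given allocation."""
    return (N ** 2 * (1 - n_g / N) * s2_alpha / n_g
            + N ** 2 * (1 - n_o / n_g) * s2_beta / n_o)


def optimal_allocation(B, cost_o, cost_g, s2_alpha, s2_beta, N):
    """Return a real-valued allocation (n_g, n_o) minimising (2.7) under budget B."""
    equal = B / (cost_o + cost_g)  # allocation with n_o = n_g
    if s2_alpha - s2_beta <= 0:
        # The interior formulas do not apply: use n_o = n_g (capped at N).
        n = min(equal, float(N))
        return n, n
    root = sqrt(s2_beta / (s2_alpha - s2_beta))
    n_g = (B / cost_g) / (1 + sqrt(cost_o / cost_g) * root)
    n_o = n_g * sqrt(cost_g / cost_o) * root
    if n_o <= n_g <= N:
        return n_g, n_o
    # Otherwise compare the two boundary allocations and keep the better one.
    candidates = []
    if equal <= N:
        candidates.append((equal, equal))
    n_o_census = (B - N * cost_g) / cost_o  # overflights affordable when n_g = N
    if n_o_census > 0:
        candidates.append((float(N), min(n_o_census, float(N))))
    return min(candidates,
               key=lambda a: asymptotic_variance(N, a[0], a[1], s2_alpha, s2_beta))


# Hypothetical budget and dispersions; unit costs as quoted in Section 2.5.
n_g, n_o = optimal_allocation(B=14000, cost_o=1200, cost_g=1600,
                              s2_alpha=500.0, s2_beta=100.0, N=22)
print(round(n_g, 2), round(n_o, 2))
```

With these made-up inputs the interior solution is feasible and spends the whole budget, with fewer overflight days than ground days.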

2.3.5 Stratification

Aerial-access creel surveys, such as the Kootenay Lake survey, may also be obtained through a stratified two-phase design. Stratification occurs at the population level, that is, the study period is divided into strata and two-phase srs/srs samples are selected independently in each stratum. When such a design is used, it is usually desirable to estimate the total of variables of interest over larger periods of time such as seasons or years. In this section, we modify our notation by including stratum indices. The stratum populations are denoted by $U_1, \ldots, U_H$, and $U$ now represents the overall population $U = \bigcup_{h=1}^{H} U_h$. As well, the stratum first-phase and second-phase samples are denoted respectively by $s_{g1}, \ldots, s_{gH}$ and $s_{o1}, \ldots, s_{oH}$ and are of size $n_{g1}, \ldots, n_{gH}$ and $n_{o1}, \ldots, n_{oH}$. We are interested in estimating the total $C^* = \sum_{h=1}^{H} C_h^*$, where $C_h^*$ is the total of the variable of interest in stratum $h$.

In the case of the double expansion estimator, one can obtain a stratified estimator and variance estimator by simply summing the stratum estimators and variance estimators. For the two-phase nonlinear estimators (2.1), (2.2) and (2.4) there are typically two ways to combine the information across strata, that is: separate ratio estimators and combined ratio estimators (see Lohr (2009), p. 144).

The separate estimator, which we denote $\widehat{C}_s$, is obtained by summing the estimators computed within each stratum: $\widehat{C}_s = \sum_{h=1}^{H} \widehat{C}_h$, where $\widehat{C}_h$ represents one of the estimators (2.1), (2.2) or (2.4) computed within stratum $h$. The variance of the separate estimator is the sum of the stratum variances, and therefore a variance estimator is simply obtained by summing the variance estimates within each stratum.

For the combined estimators, we present only the case of estimator $\widehat{C}_R$ for the sake of simplicity, but the other combined estimators can be obtained analogously.

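The combined ratio estimator is given by

$$\widehat{C}_{Rc} = \left(\sum_{h=1}^{H} \frac{N_h}{n_{oh}} \sum_{i \in s_{oh}} y_{hi}\right) \frac{\sum_{h=1}^{H} \frac{N_h}{n_{gh}} \sum_{j \in s_{gh}} C_{hj}}{\sum_{h=1}^{H} \frac{N_h}{n_{oh}} \sum_{j \in s_{oh}} C_{hj}}$$

and the variance estimator is

$$\widehat{\mathrm{Var}}_p(\widehat{C}_{Rc}) = \sum_{h=1}^{H} \left\{ N_h^2 \left(1 - \frac{n_{gh}}{N_h}\right) \frac{s_{\alpha h}^2}{n_{gh}} + N_h^2 \left(1 - \frac{n_{oh}}{n_{gh}}\right) \frac{s_{\beta h}^2}{n_{oh}} \right\},$$

where $\alpha_{hi} = y_{hi}$ and

$$\beta_{hi} = y_{hi} - \frac{\sum_{l=1}^{H} \frac{N_l}{n_{ol}} \sum_{j \in s_{ol}} y_{lj}}{\sum_{l=1}^{H} \frac{N_l}{n_{ol}} \sum_{j \in s_{ol}} C_{lj}} \, C_{hi}.$$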

A known fact about the separate ratio estimator is that it sums up the separate biases while the standard error generally decreases relative to the total of interest (Lohr (2009), p. 145). As a result, the bias-to-SE ratio increases. If the separate biases are negligible, the use of the separate ratio estimators is appropriate; otherwise the combined estimators are preferable.
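The difference between the separate and combined strategies for the ratio estimator can be illustrated with a short sketch. The data layout (one dictionary per stratum), the function names and the numbers are illustrative assumptions.

```python
def separate_ratio_estimate(strata):
    """Sum over strata of the within-stratum two-phase ratio estimators (2.4)."""
    total = 0.0
    for st in strata:
        n_g, n_o = len(st["ground"]), len(st["y"])
        c_de = st["N"] / n_o * sum(st["y"].values())
        mean_c_sg = sum(st["ground"].values()) / n_g
        mean_c_so = sum(st["ground"][i] for i in st["y"]) / n_o
        total += c_de * mean_c_sg / mean_c_so
    return total


def combined_ratio_estimate(strata):
    """Combined ratio estimator: a single ratio of stratified expansion totals."""
    y_total = 0.0    # sum_h (N_h / n_oh) * sum of y_hi over s_oh
    c_total_g = 0.0  # sum_h (N_h / n_gh) * sum of C_hj over s_gh
    c_total_o = 0.0  # sum_h (N_h / n_oh) * sum of C_hj over s_oh
    for st in strata:
        n_g, n_o = len(st["ground"]), len(st["y"])
        y_total += st["N"] / n_o * sum(st["y"].values())
        c_total_g += st["N"] / n_g * sum(st["ground"].values())
        c_total_o += st["N"] / n_o * sum(st["ground"][i] for i in st["y"])
    return y_total * c_total_g / c_total_o


# Two made-up strata; 'ground' maps day -> C_hi on s_gh, 'y' maps day -> y_hi on s_oh.
strata = [
    {"N": 10, "ground": {1: 20, 2: 30, 3: 10, 4: 40}, "y": {1: 40.0, 3: 20.0}},
    {"N": 8, "ground": {1: 10, 2: 30}, "y": {1: 30.0, 2: 90.0}},
]
separate = separate_ratio_estimate(strata)
combined = combined_ratio_estimate(strata)
print(separate, combined)
```

The separate version applies one ratio adjustment per stratum, whereas the combined version applies a single adjustment based on totals pooled across strata, which is why the two generally differ.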

2.4 Simulation Study

The first part of the simulation study was designed to study the bias due to the partial interview of parties under different scenarios. We considered one scenario (A) where we use the same model to generate the fishing parties on every day and another scenario (B) where there are three different types of days (e.g. based on weather conditions) and a different generating model for the parties for each day type. Hence scenario (A) makes the strong assumption that $p_i$ is the same for all days $i$ in $U$, and in that case we expect $\mathrm{Err}_{\mathrm{party}}(\widehat{C})/C^*$ to be asymptotically equal to zero for all four estimators based on the results in Section 2.3.2. Scenario (B) allows $p_i$ to vary across days, and in this case it was asserted in Section 2.3.2 that $\mathrm{Err}_{\mathrm{party}}(\widehat{C})/C^*$ is asymptotically equal to zero for the estimators $\widehat{C}_{DE}$ and $\widehat{C}_R$. For each scenario, and for increasing mean number of boats (fishing parties) per day, $\mu_b = 50$, $100$, $250$ and $500$, we generated 100 populations of size $N = 22$ days (corresponding roughly to weekdays in a month). For every day $i$ in $U = \{1, \ldots, 22\}$, we proceeded in the following way:

• In the case of scenario (B), generate the day type: type (i) with probability 0.3, type (ii) with probability 0.4 or type (iii) with probability 0.3.

• Generate the number of fishing parties: $M_i \sim \mathrm{Poisson}(\mu_b)$.

• Generate $M_i$ fishing parties with catch, activity indicator at time of overflight and access-point landing indicator using the following distributions:

$$c_{ij} \sim (\mathrm{Poisson}(\nu_i - 1) + 1) \times \mathrm{Poisson}(\lambda_i);$$

$$\delta_{ij} \sim \mathrm{Bernoulli}(\phi_i);$$

$$I_{ij} \sim \mathrm{Bernoulli}(p_i).$$

The parameters used for data generation are listed in Table 2.2.

| Parameter | Scenario (A) | Scenario (B) |
|---|---|---|
| Mean number of fishermen per party, $\nu_i$ | 2 | 2 |
| Mean catch per fisherman, $\lambda_i$ | 0.3 | 0.8 (i), 0.5 (ii), 0.2 (iii) |
| Probability of fishing at time of overflight, $\phi_i$ | 0.5 | 0.6 (i), 0.7 (ii), 0.8 (iii) |
| Probability of returning to an access-point, $p_i$ | 0.66 | 0.5 (i), 0.7 (ii), 0.8 (iii) |

Table 2.2: Parameter values used to generate the data for the simulation study.
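The generation steps above can be sketched as follows. This is not the code used for the study: the function names and the seed are hypothetical, the Poisson sampler is a simple stdlib implementation of Knuth's multiplication method, and the catch is read here as the product of two independent draws (party size times a Poisson count), which is one interpretation of the distribution stated for $c_{ij}$.

```python
import math
import random

# Parameter values transcribed from Table 2.2.
SCENARIO_A = {"all": {"nu": 2, "lam": 0.3, "phi": 0.5, "p": 0.66}}
WEIGHTS_A = {"all": 1.0}
SCENARIO_B = {
    "i": {"nu": 2, "lam": 0.8, "phi": 0.6, "p": 0.5},
    "ii": {"nu": 2, "lam": 0.5, "phi": 0.7, "p": 0.7},
    "iii": {"nu": 2, "lam": 0.2, "phi": 0.8, "p": 0.8},
}
WEIGHTS_B = {"i": 0.3, "ii": 0.4, "iii": 0.3}


def poisson(rng, mu):
    """Draw from Poisson(mu) by Knuth's multiplication method (fine for moderate mu)."""
    if mu <= 0:
        return 0
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def generate_day(rng, mu_b, par):
    """Return a list of (c_ij, delta_ij, I_ij) tuples for one day."""
    parties = []
    for _ in range(poisson(rng, mu_b)):                  # M_i ~ Poisson(mu_b)
        size = poisson(rng, par["nu"] - 1) + 1           # fishermen in the party
        catch = size * poisson(rng, par["lam"])          # c_ij (assumed reading)
        fishing = rng.random() < par["phi"]              # delta_ij
        interviewed = rng.random() < par["p"]            # I_ij
        parties.append((catch, fishing, interviewed))
    return parties


def generate_population(rng, scenario, weights, mu_b, n_days=22):
    """Generate n_days days, drawing a day type first (a single type in scenario A)."""
    types = list(scenario)
    population = []
    for _ in range(n_days):
        day_type = rng.choices(types, weights=[weights[t] for t in types])[0]
        population.append(generate_day(rng, mu_b, scenario[day_type]))
    return population


population = generate_population(random.Random(2015), SCENARIO_B, WEIGHTS_B, mu_b=50)
print(len(population), sum(len(day) for day in population))
```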

Then, for each population that was generated, we computed the relative bias due to the partial interview of parties for estimators (2.1) to (2.4) as $\mathrm{RB}_{\mathrm{party}}(\widehat{C}) = \mathrm{Err}_{\mathrm{party}}(\widehat{C})/C^*$.

Note that the probability distributions used to generate the data were chosen arbitrarily, but an accurate match between model and reality is not required here in order to study the design-based properties of our estimators. Figure 2.1 shows the simulation results. In the case of scenario (A), the four estimators have similar behaviors, i.e. the relative biases are distributed closer around zero as the mean number of fishing parties per day gets larger. In the case of scenario (B), the relative biases of $\widehat{C}_{DE}$ and $\widehat{C}_R$ are centered around zero while the other estimators exhibit a systematic bias that does not diminish as the number of fishing parties increases.
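For a generated population, the relative bias due to the partial interview of parties only requires the census-of-days version of each estimator. The sketch below assumes a population stored as a list of days, each a list of `(c, delta, I)` tuples; this layout, the function name and the two toy populations are hypothetical, and every day is assumed to have at least one interviewed party fishing at the overflight time.

```python
def relative_party_bias(population):
    """Return Err_party / C* for the census-of-days version of each estimator.

    population -- list of days; each day is a list of (c_ij, delta_ij, I_ij) tuples
    """
    N = len(population)
    c_star = sum(c for day in population for c, _, _ in day)               # C*
    C = [sum(c for c, _, i in day if i) for day in population]             # C_i
    A_o = [sum(1 for _, d, _ in day if d) for day in population]           # A_oi
    A_g = [sum(1 for _, d, i in day if d and i) for day in population]     # A_gi
    y = [c * a_o / a_g for c, a_o, a_g in zip(C, A_o, A_g)]                # y_i
    census = {
        "C1": sum(C) * sum(A_o) / sum(A_g),
        "C2": sum(C) * sum(a_o / a_g for a_o, a_g in zip(A_o, A_g)) / N,
        "CDE": sum(y),
        "CR": sum(y),
    }
    return {name: (value - c_star) / c_star for name, value in census.items()}


# Toy population in which every party is interviewed: no party error at all.
full = [
    [(2, True, True), (0, False, True), (3, True, True)],
    [(1, True, True), (4, True, True)],
]
# Same parties, but the most successful party of each day is not interviewed.
partial = [
    [(2, True, True), (0, False, True), (3, True, False)],
    [(1, True, True), (4, True, False)],
]
rb_full = relative_party_bias(full)
rb_partial = relative_party_bias(partial)
print(rb_full, rb_partial)
```

The second toy population violates the assumption that interview probability is unrelated to catch, which is why all four census estimators fall short of $C^*$ there.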


Figure 2.1: Boxplots of relative bias due to the partial interview of parties for 100 population replicates with varying mean number of boats per day. Left column: scenario (A); right column: scenario (B). The first to fourth rows relate, in order, to the estimators $\widehat{C}_1$, $\widehat{C}_2$, $\widehat{C}_{DE}$ and $\widehat{C}_R$.


The second part of the simulation study was designed to compare the estimators in terms of accuracy and confidence interval coverage. In order to do so, we generated, for each of the two scenarios, a single population with $\mu_b = 130$ using the algorithm already described. For both scenarios we generated, from the population, $K = 50{,}000$ two-phase srs/srs samples of size $n_g$ and $n_o$. We varied the first-phase sampling fraction by investigating the cases $n_g = 4$, $n_o = 2$ and $n_g = 10$, $n_o = 5$. For each replicated sample, we computed the estimators of total catch (2.1) to (2.4). Note that given the small sample and population sizes, we could have generated all the possible samples rather than simulating a large number of samples, but our code would not have been as generally usable for larger population and/or sample sizes. With $K = 50{,}000$ the two approaches give similar results.
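Drawing a two-phase srs/srs sample of days and counting the possible samples can be sketched with the standard library alone. The function names and the seed are hypothetical; the count assumes that a sample is identified by the pair $(s_g, s_o)$.

```python
import math
import random


def two_phase_sample(rng, days, n_g, n_o):
    """Draw s_g by srs from the population of days, then s_o by srs from s_g."""
    s_g = rng.sample(days, n_g)
    s_o = rng.sample(s_g, n_o)
    return s_g, s_o


def number_of_two_phase_samples(N, n_g, n_o):
    """Number of distinct (s_g, s_o) pairs under a two-phase srs/srs design."""
    return math.comb(N, n_g) * math.comb(n_g, n_o)


days = list(range(1, 23))  # U = {1, ..., 22}
s_g, s_o = two_phase_sample(random.Random(42), days, n_g=4, n_o=2)
n_small = number_of_two_phase_samples(22, 4, 2)
n_large = number_of_two_phase_samples(22, 10, 5)
print(sorted(s_g), sorted(s_o), n_small, n_large)
```

Comparing the two counts with $K$ gives a sense of when full enumeration would be a practical alternative to Monte Carlo replication.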

We summarize the results using the following Monte Carlo measures: the relative bias due to the sampling of days ($\mathrm{RB}^{\mathrm{days}}_{\mathrm{MC}}$), the relative root mean squared error ($\mathrm{RRMSE}_{\mathrm{MC}}$), the coverage probability of a 95% confidence interval ($\mathrm{CP}_{\mathrm{MC}}$) and the bias ratio ($\mathrm{BR}_{\mathrm{MC}}$). The formulas used for the calculations are given explicitly in Appendix A.4.

The simulation results are displayed in Table 2.3. For both scenarios (A) and (B), the $\mathrm{RRMSE}_{\mathrm{MC}}$ of the double expansion estimator is larger than that of $\widehat{C}_1$, $\widehat{C}_2$ and $\widehat{C}_R$. In scenario (A), the coverage probability is close to 95% for all estimators. The bias ratio is also relatively low (Cochran, 1977, p. 14), which explains why the confidence intervals have proper coverage. The coverages are consistently slightly over 95%, which we think is a consequence of the $t$ distribution being an approximation for the distribution of $\widehat{C}$. In the case of scenario (B), when the first-phase sampling fraction is smaller, the bias ratios remain fairly low and all estimators have coverage probability close to 95%. However, when the first-phase sampling fraction is larger, the biases of $\widehat{C}_1$ and $\widehat{C}_2$ become important relative to the standard error and affect the coverage probability negatively. Hence, the estimator $\widehat{C}_R$ is preferable in this study because it has the smallest $\mathrm{RRMSE}_{\mathrm{MC}}$ along with 95% coverage probability for the confidence interval.

2.5 Application

An aerial-access creel survey was conducted on Kootenay Lake, British Columbia, from December 2010 through November 2011. The study period was stratified by month and by day status: weekday or weekend. Statutory holidays were also defined as weekends. In each stratum, a simple random sample of days was selected to conduct ground surveys. Within each of these samples, a simple random sample of days was selected to conduct overflights. The allocation of sample sizes for the study is displayed in Table 2.4. The number of samples per month was adjusted seasonally to increase the intensity during months when fishing effort was expected to be higher (based on previous data). Unsafe weather conditions also resulted in cancellation of some flights but we assume here that the simple random sample assumption is valid.


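| Scenario | $n_g$ | $n_o$ | $\widehat{C}$ | $\mathrm{RB}_{\mathrm{party}}$ | $\mathrm{RB}^{\mathrm{days}}_{\mathrm{MC}}$ | $\mathrm{BR}_{\mathrm{MC}}$ | $\mathrm{RRMSE}_{\mathrm{MC}}$ | $\mathrm{CP}_{\mathrm{MC}}$ |
|---|---|---|---|---|---|---|---|---|
| (A) | 4 | 2 | $\widehat{C}_1$ | 2 | 0 | 14 | 13 | 96 |
| (A) | 4 | 2 | $\widehat{C}_2$ | 2 | 0 | 17 | 13 | 96 |
| (A) | 4 | 2 | $\widehat{C}_{DE}$ | 2 | 0 | 9 | 18 | 97 |
| (A) | 4 | 2 | $\widehat{C}_R$ | 2 | 0 | 15 | 13 | 96 |
| (A) | 10 | 5 | $\widehat{C}_1$ | 2 | 0 | 23 | 7 | 96 |
| (A) | 10 | 5 | $\widehat{C}_2$ | 2 | 0 | 32 | 7 | 96 |
| (A) | 10 | 5 | $\widehat{C}_{DE}$ | 2 | 0 | 17 | 11 | 97 |
| (A) | 10 | 5 | $\widehat{C}_R$ | 2 | 0 | 27 | 7 | 96 |
| (B) | 4 | 2 | $\widehat{C}_1$ | -12 | 3 | -36 | 25 | 95 |
| (B) | 4 | 2 | $\widehat{C}_2$ | -7 | 1 | -23 | 25 | 96 |
| (B) | 4 | 2 | $\widehat{C}_{DE}$ | -2 | 0 | -5 | 35 | 98 |
| (B) | 4 | 2 | $\widehat{C}_R$ | -2 | -1 | -12 | 26 | 97 |
| (B) | 10 | 5 | $\widehat{C}_1$ | -12 | 1 | -89 | 16 | 83 |
| (B) | 10 | 5 | $\widehat{C}_2$ | -7 | 0 | -47 | 14 | 91 |
| (B) | 10 | 5 | $\widehat{C}_{DE}$ | -2 | 0 | -8 | 21 | 97 |
| (B) | 10 | 5 | $\widehat{C}_R$ | -2 | 0 | -15 | 14 | 95 |

Table 2.3: Monte Carlo measures for the simulation with $\mu_b = 130$. Numbers are expressed in %.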

The survey also recorded data on shore anglers but we focus on boat anglers only. There were fifteen derby days during the study period. During those days, a fishing derby was organized on Kootenay Lake, with entry fees and substantial prize money (\$100s or \$1000) for the largest fish. Derbies are organized mostly by local businesses (or a community group). For the sake of simplicity we chose to exclude derby days from this analysis. Estimates of total catch on these days could have been obtained separately and then added to our total estimates. Because the sampling of derby days is independent of the sampling on other days, biases and variances add up. Hence a variance estimate for the total over derby and non-derby days altogether could be obtained by summing the variance estimates obtained for derby and non-derby days respectively.

The ground portion of the survey was located at the following access points: Balfour, Boswell, Kuskanook, Kaslo, Riondel, Crawford Bay and Woodbury; see map in Figure 2.2. During the sampled days, angling parties returning to those access points were interviewed to determine the number of fish kept and released from each species, the start and the end time of the angling trip, and other variables.

Figure 2.2: Kootenay Lake and the creel survey access points (map; projection/coordinate system: NAD 1983 UTM Zone 11N). Riondel/Crawford Bay and Boswell/Kuskanook ramps were combined for field monitoring and data analysis. Map provided by A. Waterhouse, Ministry of Forests, Lands, and Natural Resource Operations.

| Period (yyyy-mm) | Weekdays $N$ | Weekdays $n_g$ | Weekdays $n_o$ | Weekends $N$ | Weekends $n_g$ | Weekends $n_o$ |
|---|---|---|---|---|---|---|
| 2010-12 | 22 | 2 | 1 | 9 | 2 | 1 |
| 2011-01 | 21 | 1 | 1 | 10 | 2 | 1 |
| 2011-02 | 20 | 1 | 0 | 8 | 2 | 0 |
| 2011-03 | 23 | 1 | 1 | 8 | 2 | 1 |
| 2011-04 | 21 | 2 | 1 | 6 | 2 | 1 |
| 2011-05 | 22 | 3 | 2 | 6 | 4 | 3 |
| 2011-06 | 22 | 3 | 2 | 8 | 4 | 2 |
| 2011-07 | 21 | 3 | 2 | 10 | 4 | 2 |
| 2011-08 | 23 | 3 | 2 | 8 | 3 | 2 |
| 2011-09 | 22 | 3 | 2 | 8 | 2 | 2 |
| 2011-10 | 20 | 3 | 1 | 5 | 2 | 0 |
| 2011-11 | 22 | 3 | 1 | 5 | 2 | 2 |

Table 2.4: Allocation of sample size in the 2010-2011 Kootenay Lake Creel Survey.

The aerial survey was conducted around noon, which is the peak of daily activity. The number of boats showing fishing activity was counted once as the airplane flew out and again on the return flight. We compute the quantity $A_o$ as the average of the inbound and outbound counts. We compute $A_g$ as the average of the number of parties fishing at the inbound overflight midtime and the outbound overflight midtime. For example, if the overflight takes place from 12 pm to 1 pm on the way out and 1 pm to 2 pm on the way in, then $A_g$ is the average of the number of interviewed parties fishing at 12:30 pm and 1:30 pm. This can introduce a bias in the estimates that we assume to be negligible. Finding the most suitable way of computing the quantities $A_o$ and $A_g$ for aerial surveys that span considerable time remains an open question. This is a significant consideration on this lake, which can take between 45 minutes and 1.5 hours to fly over in one direction.
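The averaging of the outbound and inbound legs described above can be sketched as follows. The function name, the representation of trips as `(start, end)` pairs in hours since midnight, and all counts are hypothetical.

```python
def averaged_counts(outbound_count, inbound_count, trips, outbound_mid, inbound_mid):
    """Return (A_o, A_g) for a day whose overflight has an outbound and an inbound leg.

    outbound_count, inbound_count -- boats counted from the air on each leg
    trips                         -- (start, end) times of the interviewed parties
    outbound_mid, inbound_mid     -- midtime of each leg, in hours since midnight
    """
    a_o = (outbound_count + inbound_count) / 2

    def fishing_at(t):
        # Number of interviewed parties whose trip covers time t.
        return sum(1 for start, end in trips if start <= t <= end)

    a_g = (fishing_at(outbound_mid) + fishing_at(inbound_mid)) / 2
    return a_o, a_g


# Outbound leg 12 pm to 1 pm (midtime 12.5), inbound leg 1 pm to 2 pm (midtime 13.5).
trips = [(8.0, 12.75), (11.0, 15.0), (13.0, 16.0), (13.25, 17.0)]
a_o, a_g = averaged_counts(30, 26, trips, outbound_mid=12.5, inbound_mid=13.5)
print(a_o, a_g)
```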

In this work, we present the results for the variable number of rainbow trout kept. Plots of the data in Appendix A.5 give insight on the proportion of parties interviewed, the total catch and the number of fishing parties per day, respectively.

Variance estimates cannot be obtained in strata that have zero or one aerial survey. For this reason, we present stratum estimates for the months of May to September only; see Figure 2.3 (those total estimates do not include derby days). The estimators $\widehat{C}_1$, $\widehat{C}_2$ and $\widehat{C}_R$ produce similar results in each stratum. This may not be the case in other scenarios or studies. Furthermore, the confidence intervals in some strata are very wide, whereas they are considerably narrower in other strata. This is explained by the small sample sizes in some strata producing quite variable estimates. Rather surprisingly, the estimator $\widehat{C}_{DE}$ has a much narrower confidence interval in some strata, especially in August. An inspection of the data reveals that, in those cases, the values of $y_i$ for the $n_o = 2$ sampled overflight days turn out to be very close, thus leading to a small variance estimate for $\widehat{C}_{DE}$.


Figure 2.3: Monthly estimates of total number of rainbow trout kept along with approximate 95% confidence intervals. The top and bottom graphs represent weekends and weekdays respectively. The estimators (2.1) to (2.4) are represented respectively by the following symbols: triangle, circle, x mark and square.


The optimal allocation was computed for the months of May to September, separately for weekends and weekdays. The cost of an overflight on Kootenay Lake is approximately $1,200 whereas the daily cost of an access-point survey is approximately $1,600. From the results in Table 2.5, we cannot conclude generally that conducting fewer overflights than ground surveys is the best strategy. In particular, the results suggest that estimation for June to September weekdays would be more efficient with an equal number of overflight and ground survey days. Note that these results apply only to the variable "number of rainbow trout kept", so optimal allocation should also be investigated for other key variables of the study before a decision on sample size allocation is made.

| Month | Weekdays $C_1$ | Weekdays $C_2$ | Weekdays $C_R$ | Weekends $C_1$ | Weekends $C_2$ | Weekends $C_R$ |
|---|---|---|---|---|---|---|
| May | 0.14 | 0.14 | 0.08 | 0.60 | 0.65 | 0.69 |
| June | 1.00 | 1.00 | 1.00 | 0.87 | 0.87 | 0.87 |
| July | 1.00 | 1.00 | 1.00 | 0.69 | 0.69 | 0.66 |
| August | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| September | 1.00 | 1.00 | 1.00 | 0.68 | 0.67 | 0.60 |

Table 2.5: Optimal values of $n_o/n_g$ for each month and day type combination for the number of rainbow trout kept. Note that we do not present results for the double expansion estimator because in that case the optimal allocation is $n_o = n_g$.

In order to illustrate the methods described in Section 2.3.5, we produced estimates at the seasonal level (see Table 2.6). The year was divided into three seasons: winter (December to March), shoulder (April, May, October, November) and summer (June to September). We chose to compute combined estimates rather than separate estimates in order to prevent the bias from becoming important relative to the standard error. However, when $n_o$ is equal to zero or one in some stratum, the combined variance estimators cannot be computed. In that case, because stratification is expected to enhance the efficiency of estimators (provided that the strata are sufficiently homogeneous, which should be satisfied here), one can pool some strata and treat the data as if they were obtained from a two-phase srs/srs sample (without stratification) when computing the variance estimate. This variance estimate is expected to overestimate the variance and provide confidence intervals with coverage probability greater than $1 - \alpha$. For the winter analysis, we pooled all weekdays together and all weekends together. For the shoulder analysis, we pooled April and May weekdays, October and November weekdays, and similarly for weekends. No pooling was necessary for summer. We also computed a total estimate over the whole survey period by summing the seasonal estimates and their variance estimates (separate estimator strategy). We observe that estimates are highest in the summer season and lowest during winter. The confidence intervals are also narrower (relative to the estimate values) than those associated with monthly estimation in Figure 2.3.


| Estimator | Shoulder season (Apr, May, Oct, Nov): Est | Low | Upp | Summer season (June to Sept): Est | Low | Upp | Winter season (Dec to Mar): Est | Low | Upp | Total (Dec to Nov): Est | Low | Upp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $C_1$ | 1672 | 1265 | 2079 | 3341 | 2872 | 3809 | 874 | 576 | 1172 | 5887 | 4767 | 7006 |
| $C_2$ | 1639 | 1231 | 2048 | 3526 | 2895 | 4156 | 887 | 612 | 1162 | 6052 | 4796 | 7308 |
| $C_{DE}$ | 2274 | 1840 | 2708 | 4027 | 3357 | 4698 | 1312 | 865 | 1758 | 7613 | 6136 | 9090 |
| $C_R$ | 1715 | 1368 | 2062 | 3616 | 2992 | 4239 | 873 | 565 | 1181 | 6203 | 4983 | 7424 |

Table 2.6: Seasonal combined estimates (Est) of total number of rainbow trout kept along with approximate 95% confidence intervals (Low, Upp). The last column is computed as a separate total estimate over the three seasons.

2.6 Discussion

In this chapter, we have provided estimation strategies to be used for aerial-access creel surveys with overflights occurring only on a subset of access survey days. The estimators $C_R$ and $C_{DE}$ were shown to be the most suitable in terms of bias. However, the bias may be substantial if one cannot assume that parties fishing on a given day are generated from a model which gives each party the same probability of being interviewed. Simulation results have shown that, when the first-phase sampling fraction is small, the bias of $C_1$ and $C_2$ can be negligible relative to the standard error and thus does not affect the coverage of confidence intervals.

We have applied our methods to the 2010-2011 Kootenay Lake creel survey for one variable of interest of the survey: the number of rainbow trout kept. Although conducting fewer overflights than ground surveys is thought to be more economical by the fisheries managers, our optimal allocation results suggest that this might not be true for a number of month/day type combinations for the estimation of totals, as many allocations lie on the boundary $n_o = n_g$. However, the biological data obtained from the ground surveys are quite valuable for fisheries scientists. For example, changes in fish size and age composition are often used to evaluate population responses to management decisions such as changed daily catch limits. These variables do not require aerial surveys (the purpose of aerial surveys is to be able to estimate total effort and catch) but may call for more ground survey effort to adequately describe their trends. A decision about optimal allocation for future years thus needs to balance the relative importance of the different quantities of interest of the survey.

It also remains to determine the best way to compute the quantities $A_g$ when the overflight is not quite instantaneous, as in the Kootenay Lake survey where the average flight time one way is one hour. Another topic of interest is to investigate the appropriateness of the assumption that all scheduled overflights are conducted or that those missed occurred at random. Missing overflights are often due to weather conditions, which can be related to variables of interest such as catch and fishing effort. Ignoring the nonresponse in this case could possibly lead to biased estimates.


Finally, the methods presented in this chapter can be applied in contexts other than fisheries; for instance, to estimate the attendance at a multi-day street festival. In this case, the access survey can consist of posting interviewers at some access locations and collecting arrival and departure times. The aerial survey can be replaced by a ground count of people at the peak attendance time of the day. Large areas can be covered by partitioning the total area into smaller sections and assigning a surveyor to each of them.


Chapter 3

Explicit integrated population modeling: escaping the conventional assumption of independence

3.1 Introduction

Monitoring changes in population size and structure (age, sex) provides valuable insight for effective management of animal populations. A common way to gain insight into population dynamics is to capture and mark cohorts of individuals with a unique identifier followed by recaptures and/or resightings and/or dead recoveries of the marked animals. Capture-recapture, mark-resight, mark-recovery or mark-recapture-recovery surveys may be supplemented by other types of surveys on the same population such as periodic counts of individuals (all, adults, females, unmarked, etc.) or nests. Those counts are typically subject to observational error.

When multiple surveys are used to study a single population, the data can be analyzed separately by survey. However, a joint analysis of the data via integrated population modeling is often preferred because it can provide more precise estimates and/or permit the estimation of parameters that cannot be estimated using separate analyses. For instance, capture-recapture data and population count data alone do not permit the estimation of a fecundity rate but an integrated population model that combines both datasets does. For a recent review of publications where integrated population modeling has been used with bird and mammal populations, see Schaub and Abadi (2011).

Currently, integrated population models are typically formulated by multiplying the likelihoods of the various datasets. In some circumstances, this approach, while approximate, is indeed very good, for example if the different surveys are conducted on sub-populations which do not share many individuals in common (nearly independent datasets), but have common demographic parameters. Simulation studies have been conducted to compare the estimates obtained when multiplying the likelihoods for both dependent and independent datasets; see Besbeas, Borysiewicz and Morgan (2008) and Abadi et al. (2010). However, an important approach has not been compared in these empirical studies, that is, the analysis of dependent datasets using the true joint likelihood. We pursue this idea in this chapter. In parallel with our work, there has recently been growing interest in integrated population modeling methods that do not rely on an independence assumption; see, e.g., Chandler and Clark (2014) for a solution based on data augmentation.

To simplify the presentation of our methodology, we focus, for most of this chapter, on the case of a population studied using two surveys: a capture-recapture survey and a population count survey. In Section 3.2, we give some background and notation. In Section 3.3, we develop the model based on the true joint likelihood and we further explain how our model can be modified to accommodate a variety of situations (not only capture-recapture and population count data). In Section 3.4, we present the results of a simulation study. Finally, in Section 3.5, we apply our methodology to data from a colony of Greater horseshoe bats (Rhinolophus ferrumequinum) in Switzerland.

3.2 Background and notation

3.2.1 Capture-recapture survey

The data collection process of capture-recapture entails sending a survey crew into the field on a series of capture occasions. When an animal is captured for the first time, it is marked with a unique tag and released in its environment so that it can be identified if recaptured on a later occasion. When marked individuals are recaptured, their identity is recorded.

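Suppose that there are $K$ capture occasions. The capture-recapture data can be summarized into an m-array[1] with $K-1$ lines and $K$ columns:

$$
\begin{array}{ccccc}
M_{12} & M_{13} & \cdots & M_{1K} & Z_1 \\
 & M_{23} & \cdots & M_{2K} & Z_2 \\
 & & \ddots & \vdots & \vdots \\
 & & & M_{K-1,K} & Z_{K-1}
\end{array}
$$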

The first $K-1$ columns of the m-array form an upper-triangular array that we denote by $\mathbf{M}$, with the lines indexed by $i = 1, \ldots, K-1$ and the columns indexed by $j = i+1, \ldots, K$. The elements $M_{ij}$ represent the number of individuals released on occasion $i$ (after either being captured for the first time or recaptured) that are alive and recaptured for the next time on occasion $j$. The last column of the m-array is denoted by $\mathbf{Z}$ and is indexed by $1, \ldots, K-1$, with $Z_i$ representing the number of individuals released at occasion $i$ that are never recaptured.

[1] The term m-array is commonly used in capture-recapture studies to summarize the capture-recapture data from individual capture histories.
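The definitions of $M_{ij}$ and $Z_i$ above can be illustrated with a minimal Python sketch that builds an m-array from individual capture histories. The function name `m_array`, the 0/1 history encoding and the toy histories are assumptions made for illustration only; the sketch also assumes that every captured animal is released.

```python
def m_array(histories):
    """Summarize 0/1 capture histories into (M, Z), stored 0-based.

    M[i][j] = number of animals released on occasion i+1 that are next
              recaptured on occasion j+1 (j > i).
    Z[i]    = number of animals released on occasion i+1 and never seen again.
    """
    K = len(histories[0])
    M = [[0] * K for _ in range(K - 1)]
    Z = [0] * (K - 1)
    for h in histories:
        seen = [t for t, x in enumerate(h) if x == 1]
        # consecutive captures (a, b): released at a, next recaptured at b
        for a, b in zip(seen, seen[1:]):
            M[a][b] += 1
        # last capture before the final occasion: released, never recaptured
        if seen and seen[-1] < K - 1:
            Z[seen[-1]] += 1
    return M, Z


# Hypothetical capture histories for K = 4 occasions (illustration only).
histories = [
    (1, 1, 0, 1),
    (1, 0, 0, 0),
    (0, 1, 1, 0),
    (0, 0, 1, 1),
    (1, 0, 1, 0),
]
M, Z = m_array(histories)
```

Each row total of `M` plus the matching entry of `Z` gives the number of animals released on that occasion.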

The capture-recapture data can be modeled using a Cormack-Jolly-Seber model, which conditions on the number of animals released at each occasion. The lines of the m-array are modeled using independent multinomial distributions conditional on the number of releases:

$$
[M_{i,i+1}, \ldots, M_{i,K}, Z_i \mid R_i = r_i] \overset{\text{indep}}{\sim} \text{Multinomial}(r_i, \mathbf{q}_i), \quad \text{for } i = 1, \ldots, K-1, \tag{3.1}
$$

where $R_i = \sum_{l=i+1}^{K} M_{il} + Z_i$ is the number of individuals released at time $i$ and

$$
\mathbf{q}_i = \left( q_{(i,i+1)}, \ldots, q_{(i,K)}, 1 - \sum_{l=1}^{K-i} q_{(i,i+l)} \right)^\top
$$

is a vector of size $K-i+1$ where $q_{ij}$ represents the probability that a marked individual survives[2] from occasion $i$ to occasion $j$, and is not recaptured until occasion $j$. Let $\boldsymbol{\phi} = (\phi_1, \ldots, \phi_{K-1})^\top$ with $\phi_j$ representing the individual's probability of apparent survival from occasion $j$ to $j+1$ and $\mathbf{p} = (p_2, \ldots, p_K)^\top$ with $p_j$ representing the individual's probability of recapture at occasion $j$. Then, the $q_{ij}$'s can be expressed in terms of $\boldsymbol{\phi}$ and $\mathbf{p}$. For example, $q_{46} = \phi_4 (1 - p_5) \phi_5 p_6$ is the probability that a marked individual survives from occasion 4 to occasion 6, and is not recaptured until occasion 6.

The Cormack-Jolly-Seber model relies on a number of assumptions:

i. Survival is independent between individuals and does not depend on individual characteristics (sex, age etc.)

ii. Capture is independent between individuals and does not depend on individual characteristics (sex, age etc.)

iii. No temporary emigration (permanent emigration is confounded with death)

iv. No tag loss, no recording errors and marking does not affect the future behavior of anindividual

v. Capture occasions are instantaneous

For inference, a conditional likelihood, $L(\boldsymbol{\phi}, \mathbf{p} \mid \mathbf{M}, \mathbf{Z})$, is formed simply as the product of the $K-1$ multinomial densities in (3.1).
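As a rough illustration of how the cell probabilities $q_{ij}$ and the conditional likelihood fit together, the following Python sketch computes the multinomial cell probabilities for one release occasion and the log of the product of the $K-1$ multinomial densities in (3.1). The function names, the dictionary-based 1-indexed storage and the numerical values are assumptions chosen for illustration, not part of the original analysis.

```python
import math


def cjs_cell_probs(phi, p, i, K):
    """Return [q_{i,i+1}, ..., q_{i,K}, 1 - sum] for release occasion i.

    phi[j] is apparent survival from occasion j to j+1 (j = 1..K-1);
    p[j] is the recapture probability at occasion j (j = 2..K).
    """
    probs = []
    alive_unseen = 1.0  # alive at the previous occasion and not yet recaptured
    for j in range(i + 1, K + 1):
        survive = alive_unseen * phi[j - 1]    # survives from j-1 to j
        probs.append(survive * p[j])           # next recapture occurs at j
        alive_unseen = survive * (1.0 - p[j])  # alive but missed at j
    probs.append(1.0 - sum(probs))             # never recaptured
    return probs


def cjs_loglik(M, Z, phi, p, K):
    """Log of the product of the K-1 multinomial densities in (3.1).

    M[(i, j)] and Z[i] use the 1-based indices of the text.
    """
    total = 0.0
    for i in range(1, K):
        counts = [M[(i, j)] for j in range(i + 1, K + 1)] + [Z[i]]
        probs = cjs_cell_probs(phi, p, i, K)
        total += math.lgamma(sum(counts) + 1)  # log r_i!
        for n, q in zip(counts, probs):
            total -= math.lgamma(n + 1)        # minus log n!
            if n > 0:
                total += n * math.log(q)
    return total


# Hypothetical constant parameters for K = 6 occasions (illustration only).
K = 6
phi = {j: 0.5 for j in range(1, K)}
p = {j: 0.2 for j in range(2, K + 1)}
q4 = cjs_cell_probs(phi, p, 4, K)  # [q_45, q_46, P(never recaptured)]
```

Here `q4[1]` reproduces the example $q_{46} = \phi_4 (1 - p_5) \phi_5 p_6$ from the text.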

[2] Apparent survival is used because permanent emigration from the study area is indistinguishable from death. Unless explicitly stated, survival in this chapter is always apparent survival.


3.2.2 Population count survey

A count survey of the population provides information on the relative changes in population size over time. Let us suppose that a population is studied using $K$ population counts equally spaced in time. Let the count data be denoted by a vector $\mathbf{Y}$, of size $K$, with $Y_i$ being the number of individuals counted on occasion $i$. Note that the counts $Y_i$ are typically imperfect because they are subject to observational error. The count data are typically modeled using a state-space model (Buckland et al., 2004). State-space models involve a state process and an observation process. In this case, the state process is the latent process that governs the changes in population size between counts. Let $\mathbf{N} = (N_1, \ldots, N_K)^\top$, where $N_j$ is the population size at the time of the $j$th population count. The state process is defined by specifying a distribution for $N_j$ conditional on $N_{j-1}$. We assume that the birth process is instantaneous and that births occur right after population counts. A simple model for $N_j$, $j = 2, \ldots, K$, that accounts for births and survival could be

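$$
N_j \mid N_{j-1}, B_{j-1} \sim \text{Binomial}(N_{j-1} + B_{j-1}, \phi_{j-1}), \quad \text{for } j = 2, \ldots, K, \tag{3.2}
$$

with the number of births right after the $j$th count defined as

$$
B_j \mid N_j \sim \text{Poisson}(N_j f_j / 2), \quad \text{for } j = 1, \ldots, K-1. \tag{3.3}
$$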

The division by 2 allows $f_j$ to be interpreted, assuming a 50/50 sex ratio, as a fecundity per female. For the sake of simplicity, this model assumes that juvenile and adult survival probabilities are the same, although this is often not true in real populations. This state-space model relies on a number of assumptions:

i. Females start reproducing at the age of one year old

ii. The expected sex ratio of newborns is 50%

iii. Survival is independent between individuals and does not depend on individual characteristics (sex, age etc.)

iv. No immigration and no temporary emigration (permanent emigration is confounded with death).

In addition to the state process, an observation process describes the population count data, $\mathbf{Y}$, conditional on $\mathbf{N}$. In practice, a normal distribution is often used as an approximation:

$$
Y_j \mid N_j \overset{\text{indep}}{\sim} \text{Normal}(N_j, \sigma^2), \quad \text{for } j = 1, \ldots, K.
$$
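To make the state and observation processes concrete, here is a minimal simulation sketch of (3.2), (3.3) and the normal observation model, written in Python with the standard library only. The helper samplers `rbinom` and `rpois`, the function `simulate_counts`, the use of time-constant $\phi$ and $f$, and the parameter values are all assumptions made for illustration.

```python
import math
import random


def rbinom(rng, n, prob):
    """Binomial(n, prob) draw as a sum of n Bernoulli trials."""
    return sum(1 for _ in range(n) if rng.random() < prob)


def rpois(rng, lam):
    """Poisson(lam) draw (Knuth's method), summing independent chunks
    of mean at most 30 so that exp(-chunk) does not underflow."""
    total, remaining = 0, float(lam)
    while remaining > 0:
        chunk = min(remaining, 30.0)
        threshold = math.exp(-chunk)
        k, prod = 0, rng.random()
        while prod > threshold:
            k += 1
            prod *= rng.random()
        total += k
        remaining -= chunk
    return total


def simulate_counts(N1, phi, f, sigma, K, seed=1):
    """Simulate N (state), B (births) and Y (counts) with constant phi and f."""
    rng = random.Random(seed)
    N, B = [N1], []
    for _ in range(1, K):
        births = rpois(rng, N[-1] * f / 2.0)         # (3.3): B_j given N_j
        B.append(births)
        N.append(rbinom(rng, N[-1] + births, phi))   # (3.2): N_{j+1}
    Y = [rng.gauss(n, sigma) for n in N]             # Y_j ~ Normal(N_j, sigma^2)
    return N, B, Y


# Illustrative values only.
N, B, Y = simulate_counts(N1=500, phi=0.5, f=2.0, sigma=30.0, K=4)
```

The pure-Python samplers are slow for very large populations; they are used here only to keep the sketch free of external dependencies.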


Note that if larger counts are thought to be less precise than smaller ones, a log-normal distribution can be used instead:

$$
\log(Y_j) \mid N_j \overset{\text{indep}}{\sim} \text{Normal}(\log(N_j), \sigma^2), \quad \text{for } j = 1, \ldots, K.
$$

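The likelihood of the count survey data is thus

$$
L(\boldsymbol{\phi}, \mathbf{f}, \sigma^2, N_1 \mid \mathbf{Y}) = \sum_{(\mathbf{N}^*, \mathbf{B}) \in \Omega} P(Y_1 \mid N_1) \prod_{j=2}^{K} \left[ P(Y_j \mid N_j) \, P(N_j \mid B_{j-1}, N_{j-1}) \, P(B_{j-1} \mid N_{j-1}) \right],
$$

where $\mathbf{N}^* = (N_2, \ldots, N_K)^\top$ and $\Omega$ is the set of all possible values for $(\mathbf{N}^*, \mathbf{B})$.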

3.2.3 Integrated population modeling via likelihood multiplication

In the ecological literature, integrated population models have typically been obtained by multiplying the likelihoods of the separate datasets. In the case of a population studied with both a capture-recapture and a count survey, the following pseudo-likelihood is constructed by multiplying the capture-recapture likelihood and the population count likelihood:

$$
L_c(\boldsymbol{\phi}, \mathbf{f}, \mathbf{p}, N_1 \mid \mathbf{Y}, \mathbf{M}, \mathbf{Z}) = L(\boldsymbol{\phi}, \mathbf{p} \mid \mathbf{M}, \mathbf{Z}) \, L(\boldsymbol{\phi}, \mathbf{f}, N_1, \sigma^2 \mid \mathbf{Y}). \tag{3.4}
$$

Note that the capture-recapture likelihood and the population count survey likelihood have a parameter $\boldsymbol{\phi}$ in common, which represents both the survival between capture occasions and between counts. Therefore, this approach assumes that the $j$th capture occasion and the $j$th count occur at about the same time for all $j$.

The capture-recapture data and the count data are not independent when both surveys are conducted on a single population (or overlapping populations). In the literature so far, the term independence assumption has been used to describe the likelihood (3.4). The use of this likelihood is attractive in practice because of its simplicity and because it uses a reduced number of parameters. This likelihood multiplication approach is reminiscent of the naive Bayes approach (Koller and Friedman, 2009).

Equation (3.4) is not the true joint likelihood but rather a composite likelihood (Varin et al., 2011). Hence, it provides unbiased estimating equations. However, pretending that it is the true joint likelihood for inference leads to incorrect variance estimates and hence confidence intervals that do not have the targeted confidence level. Surprisingly, this characteristic of the composite likelihood seems to have been overlooked in the integrated population modeling literature so far. In particular, the simulation studies of Besbeas, Borysiewicz and Morgan (2008) and Abadi et al. (2010) investigate the frequentist properties of the estimators but do not investigate the properties of the variance estimators and confidence/credible intervals. This will be addressed in Section 3.4.

3.3 Integrated population modeling based on the true joint likelihood

3.3.1 Capture-recapture and count data

Suppose, as in Section 3.2, that we have capture-recapture data $(\mathbf{M}, \mathbf{Z})$ and count data $\mathbf{Y}$ and that assumptions i.-v. and i.-iv. from Sections 3.2.1 and 3.2.2, respectively, are met. In order to formulate an explicit model, we have to take into account the order in which the surveys and the demographic gains and losses occur in the population. For the sake of illustration, we assume that the events follow the timeline represented in Figure 3.1.

Figure 3.1: Timeline of events of the animal population study. The symbols "C", "B" and "CR" stand for count survey, births and capture-recapture, respectively. Note that the time between the count survey, the births and the capture-recapture survey in each period is negligible.

The formulation of an explicit integrated population model based on the true joint likelihood can be achieved using a Bayesian model (Koller and Friedman, 2009). The key to formulating the true joint likelihood is to introduce a set of latent variables so that when combined with the capture-recapture data, one can deduce, at any point in time, the state of the population, that is

• the number of unmarked animals in the population

• the number of marked animals remaining (alive and not recaptured) in each released cohort.

A set of variables that is appropriate is

• N1, the population size at the beginning of the study

• $\mathbf{D}^u$, a vector of length $K-1$, where $D^u_j$ represents the number of unmarked animals that died in period $j$

• $\mathbf{D}^m$, an upper triangular array indexed by $i = 1, \ldots, K-1$ and $j = i, \ldots, K-1$, where $D^m_{ij}$ represents the number of marked animals released for the last time in period $i$ that died in period $j$

• $\mathbf{B}$, a vector of length $K$, where $B_j$ represents the number of births[3] in period $j$.

In order to be able to follow the transition of individuals from an unmarked state to a marked state, it is convenient to reparametrize the capture-recapture data as $(\mathbf{M}, \mathbf{C})$ rather than $(\mathbf{M}, \mathbf{Z})$, where $\mathbf{C}$ is a vector of length $K-1$ where $C_j$ represents the number of individuals captured for the first time (i.e. marked) in period $j$. The relationship between $(\mathbf{M}, \mathbf{C})$ and $(\mathbf{M}, \mathbf{Z})$ is one-to-one; they contain the same information. The quantity $C_j$ can be computed from the capture-recapture data $\mathbf{M}$ and $\mathbf{Z}$ as the number of individuals released at period $j$ minus the number of individuals recaptured at period $j$, that is:

$$
C_j = \left( Z_j + \sum_{l=j+1}^{K} M_{jl} \right) - \sum_{k=1}^{j-1} M_{kj} \quad \text{for } 2 \le j \le K-1,
$$

with $C_1 = Z_1 + \sum_{l=2}^{K} M_{1l}$.
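The conversion from $(\mathbf{M}, \mathbf{Z})$ to $\mathbf{C}$ is simple arithmetic, as the following Python sketch shows. The function name `first_captures`, the dictionary storage keyed by the 1-based indices of the text, and the toy m-array are assumptions for illustration.

```python
def first_captures(M, Z, K):
    """C_j = (Z_j + sum_{l>j} M_jl) - sum_{k<j} M_kj for j = 1..K-1.

    M[(i, j)] and Z[i] use the 1-based indices of the text.
    """
    C = {}
    for j in range(1, K):
        released = Z[j] + sum(M[(j, l)] for l in range(j + 1, K + 1))
        recaptured = sum(M[(k, j)] for k in range(1, j))  # empty sum when j = 1
        C[j] = released - recaptured
    return C


# Hypothetical m-array for K = 4 occasions (illustration only).
M = {(1, 2): 1, (1, 3): 1, (1, 4): 0, (2, 3): 1, (2, 4): 1, (3, 4): 1}
Z = {1: 1, 2: 0, 3: 2}
C = first_captures(M, Z, K=4)
```

For $j = 1$ the second sum is empty, which reproduces the special case $C_1$.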

To show that our parametrization $\{N_1, \mathbf{D}^u, \mathbf{D}^m, \mathbf{B}, \mathbf{M}, \mathbf{C}\}$ allows us to track the state of the population at any point in time, we constructed Table 3.1, which illustrates the case of $K = 3$ periods. Each line of the table shows the distribution of the population across states at a given time. Each column of the table follows the change in population size, over time, per state. We added a column to the right of the table to keep track of the total population, because this column will be useful for modeling the count data.

[3] Births is the term generally used to represent ANY source of new animals to the study area. The new animals in general do not have to be juvenile animals.


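| Timeline of events | Number of unmarked individuals | Number of marked individuals last released during period 1 | Number of marked individuals last released during period 2 | Number of marked individuals last released during period 3 | Total number of individuals |
|---|---|---|---|---|---|
| Period 1: start | $N_1$ | | | | $N_1$ |
| Period 1: after count | $N_1$ | | | | $N_1$ |
| Period 1: after births | $N_1 + B_1$ | | | | $N_1 + B_1$ |
| Period 1: after captures | $N_1 + B_1 - C_1$ | $C_1$ | | | $N_1 + B_1$ |
| Period 2: start (after deaths) | $N_1 + B_1 - C_1 - D^u_1$ | $C_1 - D^m_{11}$ | | | $N_1 + B_1 - D^u_1 - D^m_{11}$ |
| Period 2: after count | $N_1 + B_1 - C_1 - D^u_1$ | $C_1 - D^m_{11}$ | | | $N_1 + B_1 - D^u_1 - D^m_{11}$ |
| Period 2: after births | $N_1 + \sum_{j=1}^{2} B_j - C_1 - D^u_1$ | $C_1 - D^m_{11}$ | | | $N_1 + \sum_{j=1}^{2} B_j - D^u_1 - D^m_{11}$ |
| Period 2: after C & R | $N_1 + \sum_{j=1}^{2} B_j - \sum_{j=1}^{2} C_j - D^u_1$ | $C_1 - D^m_{11} - M_{12}$ | $C_2 + M_{12}$ | | $N_1 + \sum_{j=1}^{2} B_j - D^u_1 - D^m_{11}$ |
| Period 3: start (after deaths) | $N_1 + \sum_{j=1}^{2} B_j - \sum_{j=1}^{2} C_j - \sum_{j=1}^{2} D^u_j$ | $C_1 - \sum_{j=1}^{2} D^m_{1j} - M_{12}$ | $C_2 + M_{12} - D^m_{22}$ | | $N_1 + \sum_{j=1}^{2} B_j - \sum_{j=1}^{2} D^u_j - \sum_{i=1}^{2} \sum_{j=i}^{2} D^m_{ij}$ |
| Period 3: after count | $N_1 + \sum_{j=1}^{2} B_j - \sum_{j=1}^{2} C_j - \sum_{j=1}^{2} D^u_j$ | $C_1 - \sum_{j=1}^{2} D^m_{1j} - M_{12}$ | $C_2 + M_{12} - D^m_{22}$ | | $N_1 + \sum_{j=1}^{2} B_j - \sum_{j=1}^{2} D^u_j - \sum_{i=1}^{2} \sum_{j=i}^{2} D^m_{ij}$ |
| Period 3: after births | $N_1 + \sum_{j=1}^{3} B_j - \sum_{j=1}^{2} C_j - \sum_{j=1}^{2} D^u_j$ | $C_1 - \sum_{j=1}^{2} D^m_{1j} - M_{12}$ | $C_2 + M_{12} - D^m_{22}$ | | $N_1 + \sum_{j=1}^{3} B_j - \sum_{j=1}^{2} D^u_j - \sum_{i=1}^{2} \sum_{j=i}^{2} D^m_{ij}$ |
| Period 3: after C & R | $N_1 + \sum_{j=1}^{3} B_j - \sum_{j=1}^{3} C_j - \sum_{j=1}^{2} D^u_j$ | $C_1 - \sum_{j=1}^{2} D^m_{1j} - \sum_{j=2}^{3} M_{1j}$ | $C_2 + M_{12} - D^m_{22} - M_{23}$ | $C_3 + \sum_{i=1}^{2} M_{i3}$ | $N_1 + \sum_{j=1}^{3} B_j - \sum_{j=1}^{2} D^u_j - \sum_{i=1}^{2} \sum_{j=i}^{2} D^m_{ij}$ |
| Period 3: deaths | | | | | |

Table 3.1: Changes in the population size per state over time for a study with $K = 3$ periods. The table follows the timeline in Figure 3.1. Starting in the upper left corner of the table, the population is comprised of $N_1$ unmarked individuals at the beginning of period 1. Then, the count survey occurs (which affects neither the state nor the size of the population). Then, $B_1$ births occur, resulting in $N_1 + B_1$ unmarked individuals in the population. Then, $C_1$ individuals are captured, marked and released, which leaves $N_1 + B_1 - C_1$ unmarked individuals in the population. Then, $D^u_1$ unmarked individuals die and $D^m_{11}$ marked individuals die. When period 2 begins, there are respectively $N_1 + B_1 - C_1 - D^u_1$ and $C_1 - D^m_{11}$ unmarked and marked individuals in the population. The table goes on like this until the study is finished. Note: C & R is used to abbreviate "captures and recaptures".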


Next, we exploit Table 3.1 to define conditional distributions for the data $\mathbf{M}$, $\mathbf{C}$ and $\mathbf{Y}$ as well as for the latent random variables $\mathbf{B}$, $\mathbf{D}^u$ and $\mathbf{D}^m$ (we do not model $N_1$ because this parameter is at the top of the hierarchy). For ease of notation, we do not specify the variables that the distributions are conditioned upon. Also, sums whose upper limit is smaller than their lower limit are defined as zero, and undefined variables (e.g. $M_{11}$) are defined as zero. The parts of the equations that are highlighted are derived from Table 3.1.

• $C_j \sim \text{Binomial}(N^u_j + B_j, \xi_j)$ for $j = 1, \ldots, K-1$, where $N^u_j = N_1 + \sum_{l=1}^{j-1} (B_l - C_l - D^u_l)$ is the number of unmarked individuals at the beginning of period $j$

• $M_{ij} \sim \text{Binomial}\left( R_i - \sum_{l=i}^{j-1} D^m_{il} - \sum_{l=i+1}^{j-1} M_{il}, \; p_j \right)$, for $i = 1, \ldots, K-1$ and $j = i+1, \ldots, K$, where $R_i = C_i + \sum_{k=1}^{i-1} M_{ki}$ is the number of released individuals in period $i$

• $Y_j \sim \text{Normal}(N_j, \sigma^2)$, for $j = 1, \ldots, K$, where $N_j = N_1 + \sum_{l=1}^{j-1} \left( B_l - D^u_l - \sum_{k=1}^{j-1} D^m_{kl} \right)$ is the population size at the beginning of time $j$

• $B_j \sim \text{Poisson}(N_j f_j / 2)$, for $j = 1, \ldots, K$

• $D^u_j \sim \text{Binomial}(N^u_j + B_j - C_j, \; 1 - \phi_j)$, for $j = 1, \ldots, K-1$

• $D^m_{ij} \sim \text{Binomial}\left( R_i - \sum_{l=i}^{j-1} D^m_{il} - \sum_{l=i+1}^{j} M_{il}, \; 1 - \phi_j \right)$, for $i = 1, \ldots, K-1$ and $j = i, \ldots, K-1$

The parameters $\boldsymbol{\phi}$, $\sigma^2$, $\mathbf{p}$ and $\mathbf{f}$ used throughout are defined as in Section 3.2. The Poisson and Normal distributions were chosen arbitrarily for the sake of illustration. In addition, we introduced the parameter $\boldsymbol{\xi} = (\xi_1, \ldots, \xi_{K-1})^\top$, where $\xi_j$ represents the probability for an unmarked individual to be captured on occasion $j$. In some capture-recapture studies, we would set $(\xi_2, \ldots, \xi_{K-1}) = (p_2, \ldots, p_{K-1})$ when marked and unmarked individuals have the same capture probability at each occasion.
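One way to check one's reading of the conditional distributions listed above is to simulate from them in timeline order (count, births, captures and recaptures, deaths). The Python sketch below does this with time-constant parameters and standard-library samplers. The function `simulate_joint`, the helper samplers, the bookkeeping variable `rem` and the parameter values are assumptions for illustration and are not taken from the original implementation.

```python
import math
import random


def rbinom(rng, n, prob):
    """Binomial(n, prob) draw as a sum of n Bernoulli trials."""
    return sum(1 for _ in range(n) if rng.random() < prob)


def rpois(rng, lam):
    """Poisson(lam) draw (Knuth's method) summed over chunks of mean <= 30."""
    total, remaining = 0, float(lam)
    while remaining > 0:
        chunk = min(remaining, 30.0)
        threshold = math.exp(-chunk)
        k, prod = 0, rng.random()
        while prod > threshold:
            k += 1
            prod *= rng.random()
        total += k
        remaining -= chunk
    return total


def simulate_joint(N1, phi, f, p, xi, sigma, K, seed=1):
    rng = random.Random(seed)
    Nu = N1    # unmarked animals currently alive
    rem = {}   # rem[i]: marked animals last released in period i, still alive
    N, Y, B, C, Du, M, Dm = {}, {}, {}, {}, {}, {}, {}
    for j in range(1, K + 1):
        N[j] = Nu + sum(rem.values())        # population size N_j
        Y[j] = rng.gauss(N[j], sigma)        # count survey
        B[j] = rpois(rng, N[j] * f / 2.0)    # births join the unmarked group
        Nu += B[j]
        for i in list(rem):                  # recaptures from cohorts i < j
            M[(i, j)] = rbinom(rng, rem[i], p)
            rem[i] -= M[(i, j)]
        if j == K:
            break                            # C and deaths stop at K - 1
        C[j] = rbinom(rng, Nu, xi)           # first captures (marking)
        Nu -= C[j]
        rem[j] = C[j] + sum(M[(i, j)] for i in range(1, j))  # releases R_j
        Du[j] = rbinom(rng, Nu, 1.0 - phi)   # deaths of unmarked animals
        Nu -= Du[j]
        for i in list(rem):                  # deaths of marked animals
            Dm[(i, j)] = rbinom(rng, rem[i], 1.0 - phi)
            rem[i] -= Dm[(i, j)]
    return {"N": N, "Y": Y, "B": B, "C": C, "M": M, "Du": Du, "Dm": Dm}


# Illustrative values only.
sim = simulate_joint(N1=500, phi=0.5, f=2.0, p=0.2, xi=0.2, sigma=30.0, K=4)
```

Because first captures and recaptures only move animals between columns of Table 3.1, the simulated totals `N[j]` depend only on `N1`, the births and the deaths, in agreement with the last column of the table.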

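The true joint likelihood is obtained as

$$
L(\boldsymbol{\phi}, \mathbf{f}, \mathbf{p}, N_1, \boldsymbol{\xi} \mid \mathbf{M}, \mathbf{C}, \mathbf{Y}) = \sum_{(\mathbf{B}, \mathbf{D}^m, \mathbf{D}^u) \in \Omega^*} \left\{ \left[ \prod_{i=1}^{K} P(Y_i) P(B_i) \right] \prod_{i=1}^{K-1} \left[ P(D^u_i) P(C_i) \prod_{j=i+1}^{K} P(M_{ij}) \prod_{j=i}^{K-1} P(D^m_{ij}) \right] \right\}, \tag{3.5}
$$

where the densities in (3.5) are given in the earlier bullet list and are actually conditional densities, but the conditioning variables have been left implicit to simplify the notation. Also, $\Omega^*$ is the set of all possible values for $(\mathbf{B}, \mathbf{D}^m, \mathbf{D}^u)$. The observed data likelihood (3.5) highlights that the datasets $(\mathbf{M}, \mathbf{C})$ and $\mathbf{Y}$ are not independent, but they are conditionally independent given the latent variables $\mathbf{B}$, $\mathbf{D}^m$ and $\mathbf{D}^u$.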


It is computationally convenient to fit the model (3.5) using a Bayesian approach via MCMC techniques. Bayesian software packages with built-in MCMC algorithms, such as WinBUGS and JAGS, have become widespread in the statistical ecology community and are user-friendly. Thus, we adopt a Bayesian framework in Sections 3.4 and 3.5.

3.3.2 Model variations

The model that we developed can easily be modified to suit a variety of situations that we have not considered explicitly in this chapter so far. In this section, we give three examples of this. Some involve combining the capture-recapture and count data with other types of data (e.g. dead-recovery data, newborn counts). Other modifications are driven by the availability of individual categorical covariates (e.g. sex, age) in the capture-recapture data.

• Consider the case where a count of newborns is carried out in every period, right after births occur. Let this data be given by $\mathbf{J} = (J_1, \ldots, J_K)$, where $J_j$ is the newborn count in period $j$. The data can be incorporated in the model by using a distribution that models the measurement error in the newborn counts. For example, one could model $J_j$, for $j = 1, \ldots, K$, using $J_j \sim \text{Poisson}(B_j)$.

• Consider the case where dead-recovery data is collected. The data consists of reported identification numbers of marked individuals found dead throughout the study. The data can be summarized in the form of an upper-triangular array, $\mathbf{H}$, indexed by $i = 1, \ldots, K-1$ and $j = i, \ldots, K-1$. Cell $H_{ij}$ contains the number of marks recovered dead in period $j$ from individuals that were released for the last time in period $i$. Assuming that the dead recoveries occur at the very end of each time period and that a mark may only be recovered in the period in which an individual died, the dead recovery data can be modeled in the following way:

$$
H_{ij} \sim \text{Binomial}(D^m_{ij}, \rho_j),
$$

where $\rho_j$ is the probability that a marked individual that died during period $j$ has its mark recovered. If marks can be recovered in any period after death occurs, then we suggest the use of a more complicated model to account for all the possible periods in which death might have occurred:

$$
H_{ij} = \sum_{l=i}^{j} X_{ilj},
$$

where $X_{111} \sim \text{Binomial}(D^m_{11}, \rho_1)$ and the other $X_{ilj}$'s are given by

$$
X_{ilj} \sim \text{Binomial}\left( D^m_{il}, \; \rho_j \prod_{m=l}^{j-1} (1 - \rho_m) \right).
$$


• Individual categorical covariates such as age or sex may be recorded at the time of capture. The capture-recapture data can thus be expressed in a number of separate m-arrays, e.g. one for individuals released as juveniles and one for individuals released as adults. In order to model the extra covariate information properly, the latent state structure of the marked population in Table 3.1 needs to be changed. Each combination of release period and covariate value should have its own column, e.g. juveniles released in period 2. The modeling may also require the use of different parameters for each covariate value, e.g. $\phi_1$ and $\phi_A$ for juvenile and adult survival probabilities.
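For the dead-recovery variation with delayed recoveries, the binomial success probability $\rho_j \prod_{m=l}^{j-1} (1 - \rho_m)$ can be computed as in the short Python sketch below. The function name `delayed_recovery_prob` and the values of $\rho$ are hypothetical and serve only to illustrate the formula.

```python
def delayed_recovery_prob(rho, l, j):
    """Probability that the mark of an animal that died in period l is
    recovered in period j >= l: rho_j * prod_{m=l}^{j-1} (1 - rho_m)."""
    prob = rho[j]
    for m in range(l, j):
        prob *= 1.0 - rho[m]   # mark not recovered in periods l, ..., j-1
    return prob


# Hypothetical recovery probabilities for three periods (illustration only).
rho = {1: 0.1, 2: 0.2, 3: 0.3}
probs_from_period_1 = [delayed_recovery_prob(rho, 1, j) for j in (1, 2, 3)]
```

When $j = l$ the product is empty, so the probability reduces to $\rho_j$, which matches the simpler model $H_{ij} \sim \text{Binomial}(D^m_{ij}, \rho_j)$.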

In Section 3.5, we apply some of the variations discussed in this section to a real dataset.

3.4 Simulation Study

We conduct a simulation study in order to compare inference under the composite likelihood and the true joint likelihood approaches presented in Sections 3.2.3 and 3.3.1, respectively. For the sake of simplicity and faster convergence, we assume that parameters $\xi$, $p$, $f$ and $\phi$ are constant over time. We consider four scenarios:

| Scenario | $\xi$ | $p$ | $f$ | $\phi$ | Characteristic |
|---|---|---|---|---|---|
| 1 | 0.2 | 0.2 | 2 | 0.5 | Low sample effort, low turnover |
| 2 | 0.2 | 0.2 | 6 | 0.25 | Low sample effort, high turnover |
| 3 | 0.5 | 0.5 | 2 | 0.5 | High sample effort, low turnover |
| 4 | 0.5 | 0.5 | 6 | 0.25 | High sample effort, high turnover |

By varying $\xi$, $p$, $f$ and $\phi$ across scenarios, we aim to vary the amount of dependency between the capture-recapture and the count data. Scenario 3 has the highest dependency between the datasets because it has the largest expected proportion of marked animals over time. That is, it has the highest capture and recapture probabilities along with the slowest renewal of the population (lowest fecundity rate and highest survival probability). Conversely, scenario 2 has the lowest dependency between the datasets.

Common to all four scenarios, we set N1 = 500 for the initial population size and σ = 30 for the standard deviation of the population counts. For each of the four scenarios, we generate H = 250 datasets with K = 4 years of data, by sampling from the true model described in Section 3.3.1. Each dataset is analyzed using both the composite likelihood and true joint likelihood inference methods (described in Sections 3.2 and 3.3.1, respectively) in a Bayesian framework. Parameter estimates are computed as posterior means. For each scenario and inference method, we summarize the results for each parameter using the Monte Carlo bias (Bias), relative root mean square error (RMSE), bias ratio (BR), expected posterior standard deviation (E.SD), expected length of the 95% highest probability density (HPD) credible interval (E.LCI) and coverage probability of the HPD credible interval (CP). The formulas used to compute those Monte Carlo measures are given in Appendix B.1. For each parameter, we also compute the proportion of the H = 250 populations for which the parameter estimate from the true joint likelihood approach is closer in absolute value to the true parameter than the estimate from the composite likelihood approach (PS.AE). Similarly, we compute the proportion of the H = 250 populations for which the posterior standard deviation from the true joint likelihood approach is smaller than that from the composite likelihood approach (PS.SD) and for which the credible interval from the true joint likelihood approach is shorter than that from the composite likelihood approach (PS.LCI). The formulas used to compute those Monte Carlo measures are also given in Appendix B.1.
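The Monte Carlo measures listed above could be computed along the following lines. This is only a sketch: the exact formulas are those of Appendix B.1, which is not reproduced here, so the definitions below are assumptions (in particular, the RMSE is left unscaled and the bias ratio is taken as the bias divided by the Monte Carlo standard deviation of the estimates). The function and argument names are hypothetical.

```python
import numpy as np

def monte_carlo_summary(estimates, post_sd, ci_lower, ci_upper, truth):
    """Summarise H replicate analyses of one parameter.

    All array arguments have length H (one entry per simulated dataset).
    The definitions are assumed, not taken from Appendix B.1.
    """
    estimates = np.asarray(estimates, dtype=float)
    post_sd = np.asarray(post_sd, dtype=float)
    ci_lower = np.asarray(ci_lower, dtype=float)
    ci_upper = np.asarray(ci_upper, dtype=float)

    bias = estimates.mean() - truth
    rmse = np.sqrt(np.mean((estimates - truth) ** 2))
    # Assumed definition: bias relative to the spread of the estimates.
    bias_ratio = bias / estimates.std(ddof=1)
    return {
        "Bias": bias,
        "RMSE": rmse,
        "BR": bias_ratio,
        "E.SD": post_sd.mean(),                    # mean posterior SD
        "E.LCI": np.mean(ci_upper - ci_lower),     # mean interval length
        "CP": np.mean((ci_lower <= truth) & (truth <= ci_upper)),
    }

def proportion_smaller(w_joint, w_composite):
    """Monte Carlo estimate of P(W_L <= W_Lc) from paired replicate values."""
    w_joint = np.asarray(w_joint, dtype=float)
    w_composite = np.asarray(w_composite, dtype=float)
    return float(np.mean(w_joint <= w_composite))
```

Passing absolute errors, posterior standard deviations or interval lengths from the two methods to `proportion_smaller` would give quantities in the spirit of PS.AE, PS.SD and PS.LCI, respectively.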

Sampling from the posterior distributions is done using MCMC techniques through the JAGS software. In order to save computational time, we set the initial parameter values of the Markov chains equal to the true parameter values. The number of iterations per chain was chosen conservatively, based on visual inspection of the trace plots for the first three populations. The chains for the true joint likelihood method are run for 2,000,000 iterations for scenarios 1 and 2 and 1,000,000 iterations for scenarios 3 and 4. The chains for the composite likelihood method are run for 600,000 iterations. All chains are thinned so as to keep 100,000 iterations.
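The thinning intervals implied by these run lengths can be checked with a few lines of arithmetic. The sketch below assumes that thinning is uniform and that no separate burn-in is discarded before thinning, neither of which is stated in the text; the helper name is hypothetical.

```python
def thinning_interval(total_iterations, kept_iterations):
    """Return the thinning interval that keeps `kept_iterations` draws.

    Assumes uniform thinning over the whole run (no burn-in removed).
    """
    if total_iterations % kept_iterations != 0:
        raise ValueError("run length is not a multiple of the kept sample size")
    return total_iterations // kept_iterations

# Run lengths quoted in the text, all thinned to 100,000 kept iterations.
runs = {
    "joint, scenarios 1-2": 2_000_000,
    "joint, scenarios 3-4": 1_000_000,
    "composite": 600_000,
}
intervals = {name: thinning_interval(n, 100_000) for name, n in runs.items()}
```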

For our Bayesian data analysis, we use, whenever possible, the same prior distributions for both the composite likelihood and the true joint likelihood approaches. This is important because we do not want differences in the results between the two methods to be artifacts of different priors. The priors for the parameters p and φ are Beta(1,1), the prior for f is Uniform(0,10), the prior for N1 is a discrete uniform on [1,2000] and the prior for σ is Uniform(0,100). The prior for ξ in the true joint likelihood approach is Beta(1,1).
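To make the prior specification concrete, the sketch below draws one parameter vector from the priors listed above using NumPy. It is an illustration only: the analyses in this chapter were run in JAGS, and the function name and dictionary keys are hypothetical.

```python
import numpy as np

def draw_from_priors(rng, include_xi=True):
    """Draw one set of parameters from the simulation-study priors."""
    draw = {
        "p": rng.beta(1, 1),               # Beta(1,1)
        "phi": rng.beta(1, 1),             # Beta(1,1)
        "f": rng.uniform(0, 10),           # Uniform(0,10)
        "N1": int(rng.integers(1, 2001)),  # discrete uniform on 1..2000
        "sigma": rng.uniform(0, 100),      # Uniform(0,100)
    }
    if include_xi:
        # xi only appears in the true joint likelihood approach.
        draw["xi"] = rng.beta(1, 1)
    return draw

rng = np.random.default_rng(2015)
joint_draw = draw_from_priors(rng)
composite_draw = draw_from_priors(rng, include_xi=False)
```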


Scenario  Method  True value    Bias    RMSE     BR    E.SD   E.LCI    CP

Parameter φ
1         L       0.5          -0.01    0.06  -0.11    0.07    0.26  0.96
1         Lc      0.5          -0.01    0.07  -0.14    0.08    0.29  0.96
2         L       0.25          0.01    0.04   0.25    0.05    0.18  0.98
2         Lc      0.25          0.01    0.04   0.15    0.05    0.19  0.98
3         L       0.5          -0.00    0.02  -0.03    0.02    0.09  0.95
3         Lc      0.5           0.00    0.02   0.00    0.02    0.10  0.96
4         L       0.25          0.00    0.02   0.06    0.02    0.07  0.94
4         Lc      0.25          0.00    0.02   0.04    0.02    0.08  0.94

Parameter f
1         L       2             0.21    0.59   0.37    0.56    2.10  0.96
1         Lc      2             0.30    0.77   0.42    0.72    2.75  0.97
2         L       6             0.13    1.10   0.12    1.43    5.22  0.97
2         Lc      6             0.26    1.26   0.22    1.57    5.74  0.97
3         L       2             0.02    0.14   0.11    0.14    0.56  0.96
3         Lc      2             0.02    0.22   0.08    0.28    1.12  0.99
4         L       6             0.05    0.55   0.10    0.52    2.02  0.95
4         Lc      6             0.09    0.67   0.14    0.75    2.95  0.96

Parameter p
1         L       0.2           0.02    0.05   0.39    0.04    0.17  0.96
1         Lc      0.2           0.02    0.05   0.45    0.05    0.19  0.96
2         L       0.2           0.01    0.04   0.15    0.05    0.19  0.96
2         Lc      0.2           0.01    0.05   0.26    0.05    0.20  0.96
3         L       0.5           0.00    0.03   0.09    0.03    0.12  0.96
3         Lc      0.5           0.00    0.03   0.06    0.03    0.13  0.96
4         L       0.5           0.00    0.05   0.09    0.04    0.17  0.94
4         Lc      0.5           0.01    0.05   0.12    0.05    0.18  0.93

Parameter N1
1         L       500          -3.85   23.24  -0.17   30.39  124.10  0.98
1         Lc      500          -2.52   25.49  -0.10   39.76  168.00  0.99
2         L       500          -2.62   22.94  -0.11   28.51  115.48  0.98
2         Lc      500           3.14   25.47   0.12   40.01  169.16  1.00
3         L       500          -0.02   20.89  -0.00   29.49  119.66  0.96
3         Lc      500           1.30   25.04   0.05   41.24  173.78  0.99
4         L       500          -2.86   20.33  -0.14   26.54  107.70  0.98
4         Lc      500          -1.43   26.08  -0.05   40.47  171.07  0.98

Parameter σ
1         L       30            8.62   16.31   0.62   20.65   72.78  0.96
1         Lc      30           10.78   17.47   0.78   22.06   77.02  0.93
2         L       30            9.19   16.48   0.67   20.50   72.65  0.95
2         Lc      30           10.84   17.53   0.79   22.35   77.79  0.92
3         L       30           10.60   17.29   0.78   19.94   71.22  0.93
3         Lc      30           12.77   18.63   0.94   22.06   77.18  0.93
4         L       30            8.94   17.19   0.61   19.37   68.90  0.93
4         Lc      30           11.50   18.02   0.83   22.33   77.80  0.91

Table 3.2: Monte Carlo measures comparing the performance of the true joint likelihood approach (L) and the composite likelihood approach (Lc) in the simulation study, across scenarios and parameters. Each Monte Carlo measure is based on 250 simulated datasets.


Scenario   PS.AE   PS.SD   PS.LCI

Parameter φ
1          0.64    0.86    0.86
2          0.61    0.64    0.65
3          0.61    0.93    0.93
4          0.53    0.82    0.82

Parameter f
1          0.61    1.00    1.00
2          0.65    0.82    0.76
3          0.69    1.00    1.00
4          0.60    1.00    1.00

Parameter p
1          0.58    0.99    0.98
2          0.61    0.82    0.84
3          0.58    0.96    0.96
4          0.53    0.93    0.92

Parameter N1
1          0.61    0.95    0.98
2          0.59    0.94    0.98
3          0.60    0.96    0.97
4          0.58    0.96    0.98

Parameter σ
1          0.66    0.90    0.84
2          0.59    0.88    0.79
3          0.62    0.91    0.82
4          0.55    0.96    0.85

Table 3.3: Monte Carlo estimates of $P(W_L \le W_{L_c})$, where W stands for either the absolute error (AE), the standard deviation of the posterior sample (SD) or the length of the 95% HPD credible interval (LCI). Each Monte Carlo measure is based on 250 simulated datasets.


The results are displayed in Tables 3.2 and 3.3. Additional plots are provided in Appendix B.2. We observe from Table 3.2 that the Monte Carlo RMSEs and expected posterior standard deviations obtained with the true joint likelihood approach are always smaller than or equal to those obtained with the composite likelihood approach. Furthermore, turning to Table 3.3, the estimators were closer to the true value with the true likelihood approach more than half of the time (between 53% and 69% of the samples, depending on the parameter and scenario). The true likelihood approach yielded shorter credible intervals than the composite likelihood approach more than half of the time (between 65% and 100% of the samples, depending on the parameter and scenario).

We did not find evidence that the difference in performance between the true likelihood approach and the composite likelihood approach is larger when there is more dependency between the datasets. The boxplots for the parameters φ and f in Appendix B.2 show this quite well, when comparing scenario 1 with 3 and scenario 2 with 4. Because the variability in the estimates is higher when the sampling effort is lower, the gain in absolute value from using the true likelihood approach is greater. This result is quite interesting because we feel that there is at the moment a general presumption in the integrated population modeling literature that, when the dependency between the capture-recapture data and the count data is low, the performance of the composite likelihood method is very close to that of the true joint likelihood because the composite likelihood is similar to the true joint likelihood.

A reassuring result in favor of the use of the simpler composite likelihood method is that the credible interval coverage probabilities were all close to the 95% target (between 91% and 100% across scenarios and parameters, with most of them greater than or equal to 95%). However, there is no guarantee of a similar behavior in other studies, when modeling assumptions and/or data types are different.

With both the true likelihood approach and the composite likelihood approach, the bias in the estimates of the parameter σ was significant. We believe that this is due to the fact that the parameter σ is quite sensitive to the prior choice and the estimates got pulled towards the upper tail of the prior distribution; see Gelman (2006). Also, with both the true likelihood approach and the composite likelihood approach, the bias in the estimates of the parameter f was significant and positive in scenarios 1 and 2. We believe that this is due to the influence of the prior on f, whose effect diminishes with an increasing amount of data. The prior on σ could potentially also have an effect.

For the parameters φ, f and p, we notice a lower bias ratio in scenarios that have higher sampling effort. We expected the opposite since increasing sampling effort typically reduces standard errors. However, it seems that in this study, the decrease in bias ratio with increasing sampling effort is explained by the fact that increasing sampling effort reduces the effect of the prior on the estimate.


For the parameters N1 and σ, we notice that, for each method, the difference in the performance of the estimates across scenarios is small. In particular, increasing the capture-recapture sampling effort does not significantly improve the estimation of N1 and σ. The boxplots for the parameters N1 and σ in Appendix B.2 show this quite well. Regarding N1, we attribute this result to the fact that most of the information on N1 comes from Y1, which has the same expected value across scenarios. Regarding σ, its estimate is based on the difference between the counts and the expected population size at the time of the counts, which does not vary across scenarios (our values of f and φ were chosen such that the growth factor φ + φf/2 is the same in all scenarios).
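The statement that the growth factor φ + φf/2 is identical in all scenarios can be verified directly from the scenario table. The short sketch below does so; the variable and function names are hypothetical.

```python
# Scenario values of f and phi from the simulation study.
scenarios = {
    1: {"f": 2, "phi": 0.5},
    2: {"f": 6, "phi": 0.25},
    3: {"f": 2, "phi": 0.5},
    4: {"f": 6, "phi": 0.25},
}

def growth_factor(phi, f):
    """Expected per-period growth factor phi + phi * f / 2."""
    return phi + phi * f / 2

growth = {k: growth_factor(v["phi"], v["f"]) for k, v in scenarios.items()}
```

Each scenario gives a growth factor of 1, so the expected population size stays at its initial value over the K = 4 periods.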

3.5 Application

We compared the performance of our integrated population modeling approach to the composite likelihood approach using data from a greater horseshoe bat colony (Rhinolophus ferrumequinum) that lives in the attic of a twelfth-century chapel in Vex, Valais, Switzerland (46°13′N, 7°24′E); see Sierro et al. (2009). The data consist of capture-recapture data, population counts and newborn counts from 1991 to 2005. These data were analyzed in Schaub et al. (2007) using integrated population modeling based on a composite likelihood approach.

The survey protocol included the following activities:

• Counts of individuals shortly before parturition
  Every year, except for 1991 and 2001, individuals emerging from the roost at dusk were counted on a day shortly before parturition. These population counts consist of young and adults from both sexes that are present at the colony. Flying bats cannot be aged or sexed.

• Chapel visits for captures and newborn counts while young were left unattended
  Every year, during the first weeks after parturition, when young were left unattended in the attic, a count of the number of young was recorded and most young were ringed. The ring number along with the sex of the bats were recorded. Generally, the aim was to mark all the newborns in each year. The young must have a certain size in order to be marked, but they must not be independent yet, otherwise they fly away when they are approached. Thus, there is a time window in which the young can be marked. Since the births are never very synchronous, several visits have to be made to the chapel in order to mark as many young as possible. In some years, only one visit was made due to time constraints and, in some years, the births were very asynchronous and more visits would have been necessary, so a number of young remained unmarked. It is also possible for the timing to be misjudged, in which case some young were already independent during the first visit. In all cases, the number of young could be counted with very high accuracy.

• Recaptures in the chapel in 2004 and 2005
  In 2004 and 2005, at about the same period as the count survey, the main entrance of the chapel was blocked on one day at daylight and all individuals present in the chapel were recaptured. The ring number along with the sex of the bats were recorded. These recaptures were only carried out in 2004 and 2005 in order not to perturb the colony too much.

When modeling the data, we assume the following timeline of events. The study has 15 time periods, each of which starts with a population count (except in 1991 and 2001) and simultaneous recaptures (2004 and 2005 only), immediately followed by births, immediately followed by newborn counts and simultaneous captures of unmarked animals.

In this section, we use the superscript age,sex to define variables for age and sex categories. Age can take the values 0 or 1+, which represent respectively zero years old (first year of life) and at least one year old. Sex can take the values m or f, denoting males and females, respectively.

The count data is denoted by the vector Y and the newborn data by the vector J. Both vectors have length K = 15. The capture-recapture data takes the form of four m-arrays, based on sex and age at release. The data is coded in the variables $(M^{0,f}, Z^{0,f})$, $(M^{0,m}, Z^{0,m})$, $(M^{1+,f}, Z^{1+,f})$ and $(M^{1+,m}, Z^{1+,m})$, defined similarly to (M, Z) in Section 3.2.1. For example, $M^{0,m}_{3,14}$ is the number of males released in their first year of life in period 3 that were recaptured (as adults) in period 14.

To model the data, we use the parametrization that was selected based on the DIC criterion in Schaub et al. (2007). We do not perform model selection because our goal with this work is to illustrate the difference between two inference approaches when using a common set of parameters. Following Schaub et al. (2007), we assume that the survival probability in the first year of age, $\phi^{0}$, is different from the survival probability in subsequent years, $\phi^{1+}$. A fecundity parameter, f, is constant over time and stands for the yearly mean number of newborns per female old enough to reproduce. We assume that females start reproducing in their second year of life because reproduction at an earlier age is rare. We also assume that a bat present in the chapel at a recapture occasion is recaptured with probability one. Therefore, the probability of recapture corresponds to the probability of being present in the nursery colony at the time of recaptures. The probability of presence in the chapel, $p^{\text{age},\text{sex}}_j$, is assumed to vary by year (j = 14 or 15, for 2004 and 2005), sex (m or f) and age category (0 or 1+).

To define the true joint likelihood model, we introduce the following latent variables, defined analogously to the ones in Section 3.3.1: $N^{\text{age},\text{sex}}_1$, $B^{\text{sex}}$, $D^{u;\text{age},\text{sex}}$, $D^{m;\text{age},\text{sex}}$, where age = 0 or 1+ and sex = m or f. Note that in $D^{m;\text{age},\text{sex}}_{ij}$, age indicates age at release at time i, not at recapture at time j. Analogously to Section 3.3.1, we compute the number of first captures $C^{0,m}$ and $C^{0,f}$ from the capture-recapture data.

The true joint likelihood model is described by the following distributions. For ease of notation, we omit specifying the variables that the distributions are conditioned upon. Also, sums that go backwards are defined as zero and undefined variables are defined as zero.

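• $C^{0,\text{sex}}_j \sim \text{Binomial}\left(B^{\text{sex}}_j,\ \xi_j\right)$, for $j = 1, \dots, K-1$

• $M^{0,\text{sex}}_{ij} \sim \text{Binomial}\left(C^{0,\text{sex}}_i - \sum_{l=i}^{j-1} D^{m;0,\text{sex}}_{il} - \sum_{l=i+1}^{j-1} M^{0,\text{sex}}_{il},\ p^{0,\text{sex}}_j\right)$, for $(i,j) = (13,14), (14,15)$

• $M^{0,\text{sex}}_{ij} \sim \text{Binomial}\left(C^{0,\text{sex}}_i - \sum_{l=i}^{j-1} D^{m;0,\text{sex}}_{il} - \sum_{l=i+1}^{j-1} M^{0,\text{sex}}_{il},\ p^{1+,\text{sex}}_j\right)$, for $i = 1, \dots, 12$ and $j = 14, 15$ or $(i,j) = (13,15)$

• $M^{1+,\text{sex}}_{ij} \sim \text{Binomial}\left(R^{1+,\text{sex}}_i - \sum_{l=i}^{j-1} D^{m;1+,\text{sex}}_{il} - \sum_{l=i+1}^{j-1} M^{1+,\text{sex}}_{il},\ p^{1+,\text{sex}}_j\right)$, for $i = 1, \dots, 13$ and $j = 14, 15$ or $(i,j) = (14,15)$, where

\[ R^{1+,\text{sex}}_i = C^{1+,\text{sex}}_i + \sum_{k=1}^{i-1} \left( M^{1+,\text{sex}}_{ki} + M^{0,\text{sex}}_{ki} \right) \]

is the number of released adult individuals of a given sex in period i

• $Y_j \sim \text{Normal}\left(N^{1+,m}_j \tau^{1+,m} + N^{1+,f}_j \tau^{1+,f} + N^{0,m}_j \tau^{0,m} + N^{0,f}_j \tau^{0,f},\ \sigma^2\right)$, for $j = 1, \dots, K$, where

\[ N^{0,\text{sex}}_j = B^{\text{sex}}_{j-1} - D^{u;0,\text{sex}}_{j-1} - D^{m;0,\text{sex}}_{j-1,j-1} \]

is the number of individuals of a given sex in their first year of life at the beginning of period j,

\[ N^{1+,\text{sex}}_j = N^{1+,\text{sex}}_{j-1} + N^{0,\text{sex}}_{j-1} - D^{u;1+,\text{sex}}_{j-1} - \sum_{k=1}^{j-1} D^{m;1+,\text{sex}}_{k,j-1} - \sum_{k=1}^{j-2} D^{m;0,\text{sex}}_{k,j-1} \]

is the number of individuals of a given sex over one year old at the beginning of period j, and

\[ \tau^{\text{age},\text{sex}}_j = \sqrt{p^{\text{age},\text{sex}}_{14}\, p^{\text{age},\text{sex}}_{15}} \tag{3.6} \]

is an index of the presence rate at the colony at the time of the count.

• $J_j \sim \text{Poisson}\left(N^{1+,f}_j f\right)$, for $j = 1, \dots, K$

• $\left(B^{m}_j, B^{f}_j\right) \sim \text{Multinomial}\left(J_j;\ 0.5, 0.5\right)$, for $j = 1, \dots, K$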


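• $D^{u;1+,\text{sex}}_j \sim \text{Binomial}\left(N^{u;1+,\text{sex}}_j + N^{u;0,\text{sex}}_j,\ 1 - \phi^{1+}_j\right)$, for $j = 1, \dots, K-1$, where

\[ N^{u;0,\text{sex}}_j = B^{\text{sex}}_{j-1} - C^{0,\text{sex}}_{j-1} - D^{u;0,\text{sex}}_{j-1} \]

is the number of unmarked individuals of a given sex in their first year of life at the beginning of period j and

\[ N^{u;1+,\text{sex}}_j = N^{u;1+,\text{sex}}_{j-1} + N^{u;0,\text{sex}}_{j-1} - D^{u;1+,\text{sex}}_{j-1} \]

is the number of unmarked individuals of a given sex over one year old at the beginning of period j

• $D^{u;0,\text{sex}}_j \sim \text{Binomial}\left(B^{\text{sex}}_j - C^{0,\text{sex}}_j,\ 1 - \phi^{0}_j\right)$, for $j = 1, \dots, K-1$

• $D^{m;0,\text{sex}}_{ij} \sim \text{Binomial}\left(C^{0,\text{sex}}_i - \sum_{l=i}^{j-1} D^{m;0,\text{sex}}_{il} - \sum_{l=i+1}^{j} M^{0,\text{sex}}_{il},\ 1 - \phi^{1+}_j\right)$, for $i = 1, \dots, K-1$, $j = i+1, \dots, K-1$

• $D^{m;1+,\text{sex}}_{ij} \sim \text{Binomial}\left(R^{1+,\text{sex}}_i - \sum_{l=i}^{j-1} D^{m;1+,\text{sex}}_{il} - \sum_{l=i+1}^{j} M^{1+,\text{sex}}_{il},\ 1 - \phi^{1+}_j\right)$, for $i = 1, \dots, K-1$, $j = i+1, \dots, K-1$

• $D^{m;0,\text{sex}}_{ii} \sim \text{Binomial}\left(C^{0,\text{sex}}_i,\ 1 - \phi^{0}_j\right)$, for $i = 1, \dots, K-1$

• $D^{m;1+,\text{sex}}_{ii} \sim \text{Binomial}\left(R^{1+,\text{sex}}_i,\ 1 - \phi^{1+}_j\right)$, for $i = 1, \dots, K-1$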
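To give a feel for how the latent variables above are generated, the following sketch simulates the within-period steps for newborns only: births, their split by sex, first captures, and first-year deaths of unmarked and marked young. It is a simplified illustration, not the JAGS implementation used in this chapter; the function name and the numerical values in the example call (in particular the marking probability) are hypothetical.

```python
import numpy as np

def simulate_newborn_step(n_adult_females, f, xi, phi0, rng):
    """Simulate one period of the newborn part of the latent process."""
    # J_j ~ Poisson(N^{1+,f}_j * f)
    newborns = int(rng.poisson(n_adult_females * f))
    # (B^m_j, B^f_j) ~ Multinomial(J_j; 0.5, 0.5)
    births_m, births_f = rng.multinomial(newborns, [0.5, 0.5])
    result = {}
    for sex, births in (("m", int(births_m)), ("f", int(births_f))):
        # C^{0,sex}_j ~ Binomial(B^{sex}_j, xi_j): newborns that get marked
        marked = int(rng.binomial(births, xi))
        # D^{u;0,sex}_j ~ Binomial(B^{sex}_j - C^{0,sex}_j, 1 - phi^0)
        dead_unmarked = int(rng.binomial(births - marked, 1 - phi0))
        # D^{m;0,sex}_{jj} ~ Binomial(C^{0,sex}_j, 1 - phi^0)
        dead_marked = int(rng.binomial(marked, 1 - phi0))
        result[sex] = {
            "births": births,
            "marked": marked,
            "dead_unmarked": dead_unmarked,
            "dead_marked": dead_marked,
        }
    return newborns, result

rng = np.random.default_rng(1)
# Illustrative values only; xi = 0.8 is a made-up marking probability.
newborns, step = simulate_newborn_step(40, f=0.64, xi=0.8, phi0=0.41, rng=rng)
```

The survivors of this step, births minus both kinds of deaths, correspond to $N^{0,\text{sex}}$ at the beginning of the next period.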

For the approach based on a composite likelihood, we specify models for the count data, the newborn count data and the capture-recapture data separately. For the capture-recapture data, we model the data in each of the four m-arrays separately using products of multinomial distributions as in Schaub et al. (2007). Because there is no recapture effort before 2004, the multinomial probabilities are set to zero prior to 2004. For the count data, we use

\[ Y_j \sim \text{Normal}\left(N^{0,f}_j \tau^{0,f} + N^{0,m}_j \tau^{0,m} + N^{1+,f}_j \tau^{1+,f} + N^{1+,m}_j \tau^{1+,m},\ \sigma^2\right), \quad \text{for } j = 1, \dots, K, \]

where the τ's are defined in equation (3.6) and

\[ N^{1+,\text{sex}}_j \sim \text{Binomial}\left(N^{1+,\text{sex}}_{j-1} + N^{0,\text{sex}}_{j-1},\ \phi^{1+}\right) \]
\[ N^{0,\text{sex}}_j \sim \text{Poisson}\left(N^{1+,f}_{j-1}\, \frac{f}{2}\, \phi^{0}\right), \]

for $j = 2, \dots, K$. For the newborn count data, we use $J_j \sim \text{Poisson}\left(N^{1+,f}_j f\right)$ for $j = 1, \dots, K$.

Whenever possible, we used the same prior distributions for the true likelihood and the composite approaches. For the prior distribution of ξ, we set $\xi_j \sim \text{Beta}(1,1)$ for all j.
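The state process used for the counts in the composite likelihood approach is simple enough to simulate forward. The sketch below does this with NumPy under the distributions given above; the function name, the initial population sizes, the presence indices τ and σ in the example call are hypothetical values chosen for illustration, not estimates from the bat data.

```python
import numpy as np

def simulate_composite_process(n0_first, n1_first, f, phi0, phi1,
                               tau, sigma, n_periods, rng):
    """Simulate N^{0,sex}_j, N^{1+,sex}_j and counts Y_j for j = 1..K."""
    sexes = ("m", "f")
    n0 = {s: [int(n0_first[s])] for s in sexes}
    n1 = {s: [int(n1_first[s])] for s in sexes}
    for j in range(1, n_periods):
        adult_females_prev = n1["f"][j - 1]
        for s in sexes:
            # N^{1+,sex}_j ~ Binomial(N^{1+,sex}_{j-1} + N^{0,sex}_{j-1}, phi^{1+})
            adults = rng.binomial(n1[s][j - 1] + n0[s][j - 1], phi1)
            # N^{0,sex}_j ~ Poisson(N^{1+,f}_{j-1} * f / 2 * phi^0)
            young = rng.poisson(adult_females_prev * f / 2 * phi0)
            n1[s].append(int(adults))
            n0[s].append(int(young))
    # Expected count: presence-weighted sum over age and sex classes.
    expected = np.array([
        sum(n0[s][j] * tau[("0", s)] + n1[s][j] * tau[("1+", s)] for s in sexes)
        for j in range(n_periods)
    ])
    counts = rng.normal(expected, sigma)  # Y_j ~ Normal(expected_j, sigma^2)
    return n0, n1, counts

rng = np.random.default_rng(7)
tau = {("0", "m"): 0.5, ("0", "f"): 0.7, ("1+", "m"): 0.3, ("1+", "f"): 0.9}
n0, n1, counts = simulate_composite_process(
    {"m": 10, "f": 10}, {"m": 20, "f": 20},
    f=0.71, phi0=0.45, phi1=0.92, tau=tau, sigma=5.0, n_periods=15, rng=rng,
)
```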


For the other parameters, we used the priors in Schaub et al. (2007). We used Beta(1,1) priors for the survival and the recapture probabilities. We used a Gamma prior with shape and rate parameters both set at $10^{-3}$ for the inverse of $\sigma^2$ and a Normal(0, $10^4$) prior on the log fecundity. We used Normal priors truncated to positive values and rounded to the nearest integer for the initial population sizes: Normal(10, $10^4$) for $N^{1+,f}_1$ and $N^{1+,m}_1$ and Normal(20, $10^4$) for $N^{0,f}_1$ and $N^{0,m}_1$.

For both methods, we ran the MCMC chains for 1,500,000 iterations, with a burn-in/adaptation period of 500,000 iterations and a thinning factor of 10. Plots of the marginal posterior distributions are shown in Figure 3.2. The posterior distributions for the fecundity and survival parameters ($f$, $\phi^{0}$, $\phi^{1+}$) differ the most between the true joint likelihood approach and the composite likelihood approach. The true joint likelihood approach yields narrower posterior distributions for those parameters, indicating that for this data there is a benefit in using the true joint likelihood approach because it yields narrower credible intervals. The posterior means and standard deviations are summarized in Appendix B.3. The true joint likelihood approach yields smaller estimates of fecundity, $f$ = 0.64 vs 0.71, and juvenile survival, $\phi^{0}$ = 0.41 vs 0.45, but a larger estimate of survival when older, $\phi^{1+}$ = 0.94 vs 0.92. All 95% HPD credible intervals are narrower with the true joint likelihood method than with the composite likelihood method, except for the parameter $\tau^{1+,f}$.


[Figure 3.2 has eight panels, each plotting posterior density against the parameter value: survival probability in the first year; survival probability after the first year; fecundity rate; presence probability of females in their first year; presence probability of females after the first year; presence probability of males in their first year; presence probability of males after the first year; standard deviation in counts.]

Figure 3.2: Marginal posterior distributions (smoothed) obtained from analyzing the bats data. The solid line represents the true joint likelihood method while the dashed line represents the composite likelihood method.

3.6 Discussion

In this chapter, we introduced an integrated population modeling approach that does not rely on the typical independence assumption. The independence assumption is widely used in practice and there appears to be a belief in the literature that the use of the independence assumption is justified when the dependency between the capture-recapture data and the count data is low. However, this belief was not supported by our simulation study, which suggests that, even in cases of low dependency, inference based on the true joint likelihood might be significantly different from inference based on the independence assumption. When the marking effort is lower, the dependency between the datasets is thought to be weaker but the precision of estimates is also expected to be lower. Thus, using an explicit integrated population modeling approach rather than a composite likelihood approach in lower dependency cases could be worth the extra effort even if the percent gain in precision is smaller relative to higher dependency scenarios, because this gain would be larger in absolute value.

The composite likelihood method performed surprisingly well in our simulation study, with credible intervals having the targeted coverage probability. However, satisfactory behavior in this study does not guarantee a similar behavior in studies with different modeling assumptions and/or types of data.

In the simulation study, we did not investigate cases with values of ξ = p smaller than 0.2. That is because the MCMC algorithm used in JAGS was slower to converge for the true likelihood method, which increased the computational burden of the simulation study. We had initially considered ξ = p = 0.1 for scenarios 1 and 2, and the MCMC chains had clearly not converged after one million iterations due to high autocorrelations. In further work, it would be interesting to investigate cases with smaller values of p and ξ by either investing more computer resources and time in the simulation study or reducing the number of replicates. Improving the mixing of the chain with a custom algorithm does not seem easy.

At some point during this work, we considered an alternative parametrization to the one given in Section 3.3.1 when formulating the true joint likelihood model. We have successfully implemented this parametrization, which is given by

• $N^{u}$, a vector of length K, where $N^{u}_j$ represents the size of the unmarked population at the beginning of period j

• $N^{m}$, an upper triangular array indexed by $i = 1, \dots, K-1$ and $j = i+1, \dots, K$, where $N^{m}_{ij}$ represents the number of animals released for the last time in period i that are alive and not recaptured prior to period j

• $B$, a vector of length K, where $B_j$ represents the number of births in period j.

This parametrization is similar in spirit to that of Lee et al. (2015). However, Lee et al. did not introduce $N^{m}$ because they analyzed the capture histories rather than the m-array summaries.


Chapter 4

Integrated population modeling of Chinook salmon (Oncorhynchus tshawytscha) migration on the West Coast of Vancouver Island

4.1 Introduction

As reported by Fisheries and Oceans Canada (DFO), “Chinook (Oncorhynchus tshawytscha) from the west coast of Vancouver Island (WCVI) are one of British Columbia’s most important natural resources. These stocks have long been major contributors to First Nations, commercial troll, and sport catches, from Alaska to southern Vancouver Island” (DFO, 2012). The current population status of wild WCVI Chinook salmon is poor despite management actions taken over the last 15 years (DFO, 2012), including harvest restrictions and hatchery propagation. The factors contributing to the low abundance and failure to rebuild stocks remain uncertain.

Chinook salmon on the West Coast of Vancouver Island return to their natal streams or rivers in the fall to spawn and die once they have spawned. Burman River, on the West Coast of Vancouver Island, is one of the six streams selected to represent the escapement of naturally spawned Chinook salmon in WCVI streams for management under the Pacific Salmon Treaty, which provides a joint Canada-US framework for conservation and management of Pacific salmon (escapement is the number of fish that escape fisheries to spawn in freshwater). When transitioning from saltwater to freshwater, an acclimation of the salmon’s osmoregulatory function is required to maintain homeostasis in freshwater. To this end, Chinook salmon migrating to the Burman River hold at least briefly at the upper limit of tidal influence in a trench pool scoured by a confining bedrock (Figure 4.1). Note that similar stopover behavior is observed in other local Chinook salmon populations, e.g. Conuma River, Gold River. Although occasional early freshets in August may stimulate Chinook salmon to move upstream, most hold in the stopover pool until the higher flows of September and October stimulate upstream movements and spawning. The freshet provides not only access to spawning areas but also sufficient freshwater volume to initiate the physiological osmoregulatory changes. Migration into the upstream spawning area (km 0 to 7.5, in Figure 4.1) is generally complete by mid-October. Spawning is complete by late October or early November.

Figure 4.1: Map of Burman River on the West Coast of Vancouver Island, Canada

In the WCVI region, DFO relies on periodic snorkel surveys to estimate escapement using an area-under-the-curve (AUC) method (Hilborn et al. 1999; Parken et al. 2003). The traditional AUC method estimates escapement by dividing the AUC (the area under the time curve formed by interpolating snorkel counts) by an estimate of observer efficiency and an estimate of mean fish residence time in the counting area. As detection probability and residence time are known to vary annually, by location and with environmental conditions, stream-specific parameter measurements are required to produce reliable species-specific estimates with the method (English et al. 1992; Parken et al. 2003). DFO’s AUC estimation approach in the WCVI has been criticized (DFO, 2014) because the observer efficiency and mean residence time estimates are chosen subjectively rather than estimated rigorously using, for example, a radio-tagging survey. Periodic snorkel surveys will likely remain the method of choice to monitor escapements in the future on the WCVI unless replaced by other technology. In order for estimates to be reliable under this method, it is necessary to gain a better understanding of the migration dynamics affecting residence time in the counting area as well as estimates of observer efficiency. For example, it is believed that in years with late freshets, fish spend more time in the stopover pool and less time in the spawning area where the snorkel survey takes place. Residence time is also believed to be related to local temperature. Although costs of initial studies to establish the relationship between time of first freshet and residence time in the counting area are substantial, such studies would provide better estimates of spawning escapement in the long term.
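The traditional AUC calculation described above can be sketched in a few lines. The sketch assumes linear interpolation between snorkel counts (trapezoidal rule) and that the survey series starts and ends at or near zero fish; the function name and the numbers in the example are hypothetical and are not Burman River data.

```python
import numpy as np

def auc_escapement(survey_days, counts, observer_efficiency, residence_time_days):
    """Escapement estimate: AUC / (observer efficiency * mean residence time).

    survey_days: day of each snorkel survey (increasing).
    counts: number of fish counted on each survey.
    """
    days = np.asarray(survey_days, dtype=float)
    counts = np.asarray(counts, dtype=float)
    if np.any(np.diff(days) <= 0):
        raise ValueError("survey_days must be strictly increasing")
    # Area under the linearly interpolated count curve, in fish-days.
    auc = float(np.sum(np.diff(days) * (counts[:-1] + counts[1:]) / 2.0))
    return auc / (observer_efficiency * residence_time_days)

# Made-up example: three surveys, 50% observer efficiency, 10-day residence.
estimate = auc_escapement([0, 10, 20], [0, 100, 0], 0.5, 10.0)
```

Because the estimate is inversely proportional to both observer efficiency and residence time, subjectively chosen values of either quantity carry over directly into the escapement estimate, which is the criticism noted above.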

Lack of certainty regarding the accuracy of population estimates with the AUC estimation approach prompted establishment of the Sentinel Stocks Program (SSP) under the 2009 Pacific Salmon Treaty to improve estimates of Chinook salmon escapement in WCVI and other regions. Funding was provided annually from 2009 to 2013 by SSP for surveys in the Burman River. Funding was also provided in 2014 by the PSC Southern Endowment Fund. Every year from 2009 to 2014, capture-recapture surveys, dead recovery surveys and snorkel surveys were conducted. In addition, a radio-tagging survey was conducted in 2012. One hundred radio-tags were applied to fish in the stopover pool over the course of the capture-recapture survey. Radio-tag signals were recorded by fixed receivers at km 0 and km 7.5 and by foot survey. The radio-tags were set so that a different signal was sent if a fish had not moved for at least 12 hours.

In Section 4.2, we describe the sampling protocol used in 2009-2014 to conduct the capture-recapture, dead recovery and snorkel surveys. In Section 4.4, we show how to apply the Jolly-Seber method to the capture-recapture data to estimate escapement and other parameters of interest. In Section 4.5, we show how to analyze the capture-recapture, carcass and snorkel data together using an explicit integrated population model, along the lines of Chapter 3. In Section 4.6, we apply the methods presented in Sections 4.4 and 4.5 to the 2012 Burman data. We conclude with a discussion in Section 4.7.

4.2 Sampling Protocol

From 2009 to 2014, Chinook salmon surveys consisting of capture-recapture, dead recovery and snorkel surveys took place at Burman River. Surveys were conducted periodically but could not be conducted at high flows. In 2012 only, a radio-tagging study also took place but it is not the focus of this work.

Capture-recapture surveys took place at the Burman stopover pool, starting in September when fish started to arrive in the lower river; they were continued until three consecutive capture-recapture occasions led to no catch, suggesting that most fish had moved to the spawning area. On capture-recapture occasions (two to three days per week), fish were caught with a beach seine. Sampling was limited to three beach-seine sets per day in an attempt to keep absolute sampling effort approximately constant each week. Fish captured for the first time were tagged, measured for post orbital hypural length (POH) and visually sexed before being released. The tags used for marking were dorsally visible, uniquely numbered 80 lb. monofilament-cored Floy™ tags inserted between the pterygiophores with a needle and secured with size “J” metal sleeves. In order to assess tag loss, a secondary permanent mutilation mark was applied to the opercula. Recaptured fish had their tag color and number recorded and sex was assessed visually. Although some jacks (subadult fish) were marked during the capture-recapture survey, adults are the main interest of this study so the jack data is not analyzed in this work. The POH length cutoff used for jacks was set at 500 mm.

As spawners moved upstream from the stopover site, carcass surveys began and continued until carcasses were no longer present. On each carcass survey day, a crew recovered all accessible carcasses along a given route down the main channel. Recovered carcasses were sexed and measured (POH), and the tag id and color were recorded for marked fish. However, not all carcasses present in the stream could be sampled on a given carcass survey occasion; for example, some carcasses along the sampling route were stuck in a log jam and therefore not accessible. Resampling of carcasses was prevented by sectioning the head. Sampled carcasses were rarely observed again on subsequent sampling occasions, as they most likely got flushed out by flows. Although some jack carcasses were recovered, these data are not analyzed in this work.

Snorkel surveys were conducted periodically over the study period. The river, from rkm 7.5 to rkm 0, was typically swum by two snorkelers who recorded the number of marked and total fish seen. It is not possible to read the tag number of marked fish in the snorkel survey. Other variables, such as fish visibility, were recorded on survey days. The standard survey procedure consisted of recording one joint observation for each river section, agreed upon by the two snorkelers. The individual observers’ counts are not available.

Finally, extraneous to the survey protocol, the number of fish removed from the pool by the hatchery during migration was recorded.

4.3 Notation

Throughout this chapter, we use the notation given in Table 4.1 to describe the data. We discard any snorkel survey data that fall outside the range of dates spanned by the capture-recapture and carcass surveys. Data are defined as zero on days when surveys did not take place.

The notation for the parameters used to define the Jolly-Seber and/or the integrated population models is given in Table 4.2.


| Survey | Variable | Definition |
|---|---|---|
| Capture-recapture | $M_{i,j,s}$ | Number of fish of sex $s$ released on day $i$ and recaptured next on day $j$. |
| Capture-recapture | $C_{j,s}$ | Number of individuals of sex $s$ captured and released for the first time on day $j$. |
| Capture-recapture | $R_{j,s}$ | Number of individuals of sex $s$ released on day $j$. Note: this is a redundant variable because it can be computed from $C$ and $M$, as $C_{j,s} + \sum_{i=1}^{j-1} M_{i,j,s}$. |
| Carcass survey | $Z^u_{j,s}$ | Number of unmarked fish of sex $s$ whose carcass was recovered on day $j$. |
| Carcass survey | $Z^m_{i,j,s}$ | Number of marked fish of sex $s$ whose carcass was recovered on day $j$ and that were released previously on day $i$ and not recaptured since. |
| Snorkel survey | $Y^u_j$ | Snorkel count of unmarked fish on day $j$. |
| Snorkel survey | $Y^m_j$ | Snorkel count of marked fish on day $j$. |
| Snorkel survey | $v_j$ | Fish visibility on day $j$. Can take the values low, medium, high or unknown. |
| Hatchery removals | $H^u_{j,s}$ | Number of unmarked fish of sex $s$ removed by the hatchery on day $j$. |

Table 4.1: Notation for the data collected at Burman River. The subscript $s$ can take the values $m$ (males) and $f$ (females).
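The redundancy of $R_{j,s}$ noted in Table 4.1 can be illustrated with a minimal sketch. The function name, the 0-based day indexing and the toy arrays are assumptions made for illustration; they are not the Burman data.

```python
import numpy as np

def releases_from_captures(C, M):
    """Return R[j] = C[j] + sum over i < j of M[i, j], for one sex.

    C : 1-D array, C[j] = fish captured and released for the first time on day j.
    M : 2-D array, M[i, j] = fish released on day i and recaptured next on day j
        (only entries with i < j are used).
    Days are 0-based here purely for convenience.
    """
    C = np.asarray(C)
    M = np.asarray(M)
    # The strict upper triangle keeps i < j; column sums add over release days i.
    return C + np.triu(M, k=1).sum(axis=0)

# Hypothetical toy data for three capture days.
C_toy = np.array([10, 8, 5])
M_toy = np.array([[0, 2, 1],
                  [0, 0, 3],
                  [0, 0, 0]])
R_toy = releases_from_captures(C_toy, M_toy)
```

On each day, the releases are the newly tagged fish plus every recapture made that day, whatever its previous release day.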

The first day of the capture-recapture study is considered as day one. Let $K^{capt}$ denote the day on which the last capture occasion with nonzero catch occurred. Let $K^{pool}$ denote an arbitrary date which we assume is the last day when Chinook salmon are present in the stopover pool. Let $K^{carc}$ be the last carcass survey day, i.e. the last day of the study.

For improved readability, throughout this chapter, we purposely omit the conditioning variables in distributions.

4.4 A Jolly-Seber approach to estimate escapement

As a straightforward approach to estimating escapement in a given survey year, we suggest analyzing the capture-recapture data at the stopover pool using a Jolly-Seber model with the POPAN parametrization of Schwarz and Arnason (1996). Although inference under the Jolly-Seber model is typically conducted using a frequentist framework, we adopt a Bayesian approach in order to make pertinent comparisons with the integrated population model in Section 4.5.


| Parameter | Definition |
|---|---|
| $B_{j,s}$ | Number of individuals of sex $s$ that arrive (from the ocean) at the stopover pool after midday on day $j$ and before midday the next day. Note: $B_{0,s}$ is the number of individuals of sex $s$ in the pool right before midday on day 1. |
| $N^u_{j,s}$ | Number of unmarked individuals of sex $s$ in the stopover pool at midday on day $j$. |
| $N^m_{i,j,s}$ | Number of marked individuals of sex $s$ that are in the stopover pool at midday on day $j$ and were released previously on day $i$ and not recaptured prior to day $j$. |
| $T^u_{j,s}$ | Number of unmarked individuals of sex $s$ that transition from the stopover pool to the spawning area after midday on day $j$ and before midday the next day. |
| $T^m_{i,j,s}$ | Number of marked individuals of sex $s$ that transition from the stopover pool to the spawning area between midday on day $j$ and midday the next day and were released previously on day $i$ and not recaptured since. |
| $A^u_{j,s}$ | Number of unmarked individuals of sex $s$ alive in the spawning area before midday on day $j$. |
| $A^m_{i,j,s}$ | Number of marked individuals of sex $s$ that are alive in the spawning area before midday on day $j$ and were released previously on day $i$ and not recaptured since. |
| $D^u_{j,s}$ | Number of unmarked fish of sex $s$ that died after midday on day $j$ and before midday the next day. |
| $D^m_{i,j,s}$ | Number of marked fish of sex $s$ that died between midday on day $j$ and midday the next day and were released previously on day $i$ and not recaptured since. |
| $X^u_{j,s}$ | Number of dead unmarked fish of sex $s$ present in the river at midday on day $j$. |
| $X^m_{i,j,s}$ | Number of marked fish of sex $s$ that died and are present in the river at midday on day $j$ and were released previously on day $i$ and not recaptured since. |
| $F^u_{j,s}$ | Number of dead unmarked fish of sex $s$ that got flushed out between midday on day $j$ and midday the next day. |
| $F^m_{i,j,s}$ | Number of dead marked fish of sex $s$ that got flushed out between midday on day $j$ and midday the next day and were released previously on day $i$ and not recaptured since. |
| $p^{move}_{j,s}$ | Probability for individuals of sex $s$ in the stopover pool at midday on day $j$ to move to the spawning area before midday the next day. |
| $\phi_{j,s}$ | Probability for individuals of sex $s$ alive in the spawning area at midday on day $j$ to survive until midday the next day. |
| $p^{flush}_j$ | Probability for dead individuals in the river at midday on day $j$ to get flushed out before midday the next day. |
| $p^{capt}_{i,s}$ | Capture probability for individuals of sex $s$ in the stopover pool at midday on day $i$. |
| $p^{recov}$ | Recovery probability of carcasses present in the river at midday on a given carcass survey day. |
| $p^{snor}_j$ | Probability for fish alive in the river at midday on day $j$ to be counted in the snorkel survey. |
| $\mu^{snor}_{low}$ | Intercept, on the non-logit scale, used to model $\mathrm{logit}(p^{snor}_j)$. |
| $\alpha_v$ | Linear effect of fish visibility $v$, on the logit scale, used to model $\mathrm{logit}(p^{snor}_j)$. $\alpha_{low}$ is set equal to 0. |
| $\sigma_v$ | Standard deviation used to model $\mathrm{logit}(p^{snor}_j)$ for a given visibility $v$. |
| $\Delta_j$ | Number of marked adult fish miscounted as unmarked in the snorkel survey on day $j$. |
| $p^{\Delta}$ | Probability for marked fish to be miscounted as unmarked in a given snorkel survey. |

Table 4.2: Notation for the parameters used in the Jolly-Seber model and/or the integrated population model. The subscript $s$ can take the values $m$ (males) and $f$ (females).

Table 4.3 gives an organized representation of the notation used in the Jolly-Seber model formulation. We suppose that before the study begins, there is a superpopulation of adult Chinook salmon in the ocean that will soon migrate to Burman River. Before the first capture-recapture sampling occasion, $B_{0,m}$ and $B_{0,f}$ fish (males and females respectively) enter the lower river and are available to be sampled at the stopover pool at the first capture occasion. Between midday on day $j = 1, \ldots, K^{capt} - 1$ and midday the next day, $B_{j,m}$ male and $B_{j,f}$ female Chinook salmon newly arrive at the stopover pool. We assume that fish that newly arrive in the pool do not leave before midday the next day. Fish present in the pool at midday on day $j = 1, \ldots, K^{capt} - 1$ move to the upper sections of the river before midday the next day with probability $p^{move}_{j,m}$ and $p^{move}_{j,f}$, for males and females respectively. In absolute numbers, $T^u_{j,m}$ unmarked males and $T^u_{j,f}$ unmarked females move upstream; and $T^m_{i,j,m}$ marked males and $T^m_{i,j,f}$ marked females move upstream and were released previously on day $i$ and not recaptured since.

Note that the entrants to the pool in this description are analogous to “births” in a typical Jolly-Seber model, while fish that move out of the stopover pool towards the spawning area are analogous to “apparent deaths” in a typical Jolly-Seber model. Typically, in a Jolly-Seber model, arrival (“birth”) and transition (“apparent death”) parameters are defined on capture occasions only, but the Bayesian framework here allows us to have births and movement every day and estimate $B_{j,s}$, $T^u_{j,s}$, $T^m_{i,j,s}$ and $p^{move}_{j,s}$ even for days $j$ that do not have capture occasions. This way, we can obtain daily estimates of population size in the pool as well as better estimates of escapement. Having births and movement every day also allows for better comparisons with the integrated population model.

| Transition | State | Latent number of fish | State transition probabilities | Data | Parameters to relate the data to the latent number of fish |
|---|---|---|---|---|---|
| Arrive at Burman | | $B$ | | | |
| | In stopover pool | $N^u$, $N^m$ | | $M$, $C$ | $p^{capt}$ |
| Move upstream | | $T^u$, $T^m$ | $p^{move}$ | | |

Table 4.3: Variables used in the Jolly-Seber model, categorized based on their role in the model.

On capture occasion $i$, males and females are captured with probability $p^{capt}_{i,m}$ and $p^{capt}_{i,f}$ respectively. To ensure identifiability of all parameters, we set the capture probability on the first capture occasion, for each sex, equal to that on the second occasion. We impose the same equality on the last two occasions.

The Jolly-Seber model is described by the following distributions and state equations, where $i$ takes values within the set of capture-recapture survey days and $s = f$ or $m$ for females or males, respectively. First, the size of the unmarked population of sex $s$ in the pool at midday on day $j$ is governed by the following equation

$$N^u_{j,s} = N^u_{j-1,s} - C_{j-1,s} + B_{j-1,s} - T^u_{j-1,s} - H^u_{j-1,s}, \tag{4.1}$$

for $j = 2, \ldots, K^{capt}$ and with $N^u_{1,s} = B_{0,s}$. The number of marked individuals of sex $s$ released previously on capture-recapture day $i$, not recaptured since, and in the pool at midday on day $j$ is governed by the following equations

$$N^m_{i,i+1,s} = R_{i,s} - T^m_{i,i,s} \tag{4.2}$$

and

$$N^m_{i,j,s} = N^m_{i,j-1,s} - M_{i,j-1,s} - T^m_{i,j-1,s}, \tag{4.3}$$

for $j = i + 2, \ldots, K^{capt}$.

The transitions (unobservable) are modeled as

• $T^u_{j,s} \sim \mathrm{Binomial}(N^u_{j,s} - C_{j,s},\, p^{move}_{j,s})$, for $j = 1, \ldots, K^{capt} - 1$

• $T^m_{i,i,s} \sim \mathrm{Binomial}(R_{i,s},\, p^{move}_{i,s})$ and $T^m_{i,j,s} \sim \mathrm{Binomial}(N^m_{i,j,s} - M_{i,j,s},\, p^{move}_{j,s})$, for $j = i + 1, \ldots, K^{capt} - 1$.


The capture-recapture data (observable) are modeled as

• $M_{i,j,s} \sim \mathrm{Binomial}(N^m_{i,j,s},\, p^{capt}_{j,s})$, for capture survey days $j \in \{i + 1, \ldots, K^{capt}\}$

• $C_{j,s} \sim \mathrm{Binomial}(N^u_{j,s},\, p^{capt}_{j,s})$, for capture survey days $j \in \{1, \ldots, K^{capt}\}$.
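To make the state equation (4.1) and the binomial distributions for the unmarked fish concrete, the following is a minimal forward-simulation sketch for one sex. It is not the fitting code used in this chapter: the function name, the toy inputs and the simplification that hatchery removals are zero are all assumptions made for illustration.

```python
import numpy as np

def simulate_unmarked_pool(B, p_move, p_capt, capture_days, seed=0):
    """Simulate N^u_j, C_j and T^u_j for one sex over days 1..K (removals H set to 0).

    B[j]      : entrants between midday on day j and midday on day j+1, j = 0..K-1
                (B[0] is the number in the pool just before midday on day 1).
    p_move[j] : probability of leaving the pool after midday on day j (index 0 unused).
    p_capt[j] : capture probability on day j (index 0 unused).
    """
    rng = np.random.default_rng(seed)
    K = len(B)
    N = np.zeros(K + 1, dtype=int)  # N[j] holds N^u_j for j = 1..K
    C = np.zeros(K + 1, dtype=int)
    T = np.zeros(K + 1, dtype=int)
    N[1] = B[0]
    for j in range(1, K + 1):
        if j in capture_days:
            # C_j ~ Binomial(N^u_j, p^capt_j); captured fish become marked
            C[j] = rng.binomial(N[j], p_capt[j])
        if j < K:
            # T^u_j ~ Binomial(N^u_j - C_j, p^move_j)
            T[j] = rng.binomial(N[j] - C[j], p_move[j])
            # State equation (4.1) with H = 0
            N[j + 1] = N[j] - C[j] + B[j] - T[j]
    return N, C, T

# Hypothetical inputs for a five-day study.
B_toy = [50, 20, 10, 5, 0]
N_toy, C_toy, T_toy = simulate_unmarked_pool(
    B_toy, p_move=np.full(6, 0.3), p_capt=np.full(6, 0.2), capture_days={1, 3, 5}
)
```

Every entrant is accounted for exactly once: it is either still unmarked in the pool on the last day, was marked on an earlier capture day, or moved upstream unmarked.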

Quantities of interest can be calculated once a posterior sample is obtained. These include escapement (per sex and total), population size in the pool (per sex, over time) and mean stopover time in the pool in days (per sex and arrival day in the pool). The formulas used to calculate these quantities are given in Table 4.4. The escapement estimates obtained with the Jolly-Seber model are likely to be biased low because they do not account for fish that enter the river after the last capture-recapture occasion with nonzero catch. The mean stopover time estimates may also be biased low because they do not account for the fact that some fish most likely remain in the pool after the last capture-recapture occasion with nonzero catch.
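As an illustration of how the mean stopover time formula of Table 4.4 could be evaluated for a single posterior draw, here is a minimal sketch. The function name and the toy movement probabilities are hypothetical; the argument `K` stands for $K^{capt}$ in the Jolly-Seber model or $K^{pool}$ in the integrated population model.

```python
def mean_stopover_time(p_move, j, K):
    """Evaluate 0.5*p_j + sum_{d=2}^{K-j} (d-0.5) * p_{j+d-1} * prod_{l=j}^{j+d-2} (1-p_l).

    p_move : mapping or sequence indexed by day, giving p^move for one sex.
    j      : arrival interval (between midday on day j and midday on day j+1).
    K      : last day considered (K^capt or K^pool).
    """
    total = 0.5 * p_move[j]
    stay = 1.0  # running product of (1 - p_move[l]) for l = j, ..., j+d-2
    for d in range(2, K - j + 1):
        stay *= 1.0 - p_move[j + d - 2]
        total += (d - 0.5) * p_move[j + d - 1] * stay
    return total

# Hypothetical daily movement probabilities for days 1 to 3.
p_toy = {1: 0.0, 2: 1.0, 3: 0.5}
t_toy = mean_stopover_time(p_toy, j=1, K=4)
```

In the toy example no fish leaves during the first interval and all leave during the second, so the value is 1.5 days. When the movement probability on day $K - 1$ is below one, the weights in the sum do not add up to one, which is consistent with the downward bias noted for the Jolly-Seber estimates.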

4.5 Integrated population modeling

In this section, we present an integrated population model that incorporates all sources of data in a single analysis, along the lines of the work in Chapter 3. The Chinook salmon migration is assumed to follow the steps illustrated in Figure 4.2. Table 4.5 gives an organized representation of the notation used in the integrated population model.

Figure 4.2: Schematic representation of Chinook salmon migration at Burman River, as assumed by the integrated population model. The arrows denote transitions while boxes denote states.

For the integrated population model, we assume that movement into and out of the stopover pool is as described in Section 4.4, except that we allow for new entrants up to $j = K^{pool} - 1$ and we assume that after midday on day $K^{pool}$, no new entrants arrive and all fish remaining in the pool leave the pool by midday the next day. Thus we set $p^{move}_{K^{pool},s} = 1$ and $B_{j,s} = 0$ for $j = K^{pool}, \ldots, K^{carc} - 1$, as well as $T^u_{j,s} = 0$ and $T^m_{i,j,s} = 0$ for $j = K^{pool} + 1, \ldots, K^{carc} - 1$. We assume that the number of fish of each sex alive in the spawning area at midday on day 1 is equal to that on day 2. Fish alive in the spawning area after midday on day $j = 1, \ldots, K^{carc} - 1$ die before midday the next day with probability $1 - \phi_{j,m}$ for males and $1 - \phi_{j,f}$ for females. We also allow for fish that move to the upstream area between midday on day $j$ and midday the next day to die with the same probabilities. In absolute numbers, $D^u_{j,f}$ unmarked females and $D^u_{j,m}$ unmarked males die between midday on day $j$ and midday the following day. Also, $D^m_{i,j,f}$ marked females and $D^m_{i,j,m}$ marked males die between midday on day $j$ and midday the following day and were previously released on day $i$ and not recaptured since. We assume that there are no carcasses in the stream at midday on day 1. Fish dead in the stream at midday on day $j = 2, \ldots, K^{carc} - 1$ get flushed out before midday the next day with probability $p^{flush}_j$. In absolute numbers, $F^u_{j,f}$ unmarked carcasses of female fish and $F^u_{j,m}$ unmarked carcasses of male fish get flushed out. In addition, $F^m_{i,j,f}$ marked carcasses of female fish and $F^m_{i,j,m}$ marked carcasses of male fish get flushed out and were released previously on day $i$ and not recaptured since.

| Quantity | Formula for the Jolly-Seber model | Formula for the integrated population model |
|---|---|---|
| Escapement for sex $s$ | $\sum_{j=0}^{K^{capt}-1} B_{j,s}$ | $\sum_{j=0}^{K^{pool}-1} B_{j,s}$ |
| Total escapement | $\sum_{j=0}^{K^{capt}-1} (B_{j,m} + B_{j,f})$ | $\sum_{j=0}^{K^{pool}-1} (B_{j,m} + B_{j,f})$ |
| Number of individuals of sex $s$ in the pool at midday on day $j$ | $N^u_{j,s} + \sum_{i=1}^{j} N^m_{i,j,s}$ | $N^u_{j,s} + \sum_{i=1}^{j} N^m_{i,j,s}$ |
| Mean stopover time for individuals of sex $s$ that arrived in the pool between midday on day $j$ and midday the next day | $0.5\, p^{move}_{j,s} + \sum_{d=2}^{K^{capt}-j} (d - 0.5)\, p^{move}_{j+d-1,s} \prod_{l=j}^{j+d-2} (1 - p^{move}_{l,s})$ | $0.5\, p^{move}_{j,s} + \sum_{d=2}^{K^{pool}-j} (d - 0.5)\, p^{move}_{j+d-1,s} \prod_{l=j}^{j+d-2} (1 - p^{move}_{l,s})$ |
| Mean residence time for individuals of sex $s$ that arrived in the spawning area between midday on day $j$ and midday the next day | n/a | $0.5\, (1 - \phi_{j,s}) + \sum_{d=2}^{K^{carc}-j} (d - 0.5)(1 - \phi_{j+d-1,s}) \prod_{l=j}^{j+d-2} \phi_{l,s}$ |
| Number of individuals of sex $s$ alive in the spawning area at midday on day $j$ | n/a | $A^u_{j,s} + \sum_{i=1}^{j} A^m_{i,j,s}$ |
| Median snorkel observer efficiency at visibility $v$ | n/a | $\mathrm{logit}^{-1}\left(\mathrm{logit}(\mu^{snor}_{low}) + \alpha_v\right)$ |

Table 4.4: Formulas used to compute quantities of interest for the Jolly-Seber model or the integrated population model. Residence time and alive population size in the stream cannot be estimated from the Jolly-Seber model. Notes: (1) Sums are defined as zero when the lower limit exceeds the upper limit; (2) The use of $d - 0.5$ in the mean stopover time calculation is based on the assumption that within a day, the movement of fish upstream to the spawning grounds is distributed uniformly over the day; (3) The latent variables $N^m_{i,j,s}$ and $A^m_{i,j,s}$ are defined as 0 when $i$ is not a capture-recapture day; (4) The time unit is days.

| Transition | State | Latent number of fish | State transition probabilities | Data | Parameters to relate the data to the latent number of fish |
|---|---|---|---|---|---|
| Arrive at Burman | | $B$ | | | |
| | In stopover pool | $N^u$, $N^m$ | | $M$, $C$ | $p^{capt}$ |
| Move upstream | | $T^u$, $T^m$ | $p^{move}$ | | |
| | Alive in stream | $A^u$, $A^m$ | | $Y^u$, $Y^m$, $v$ | $p^{snor}$, $\mu^{snor}_{low}$, $\alpha$, $\sigma$, $\Delta$, $p^{\Delta}$ |
| Die | | $D^u$, $D^m$ | $\phi$ | | |
| | Dead in stream | $X^u$, $X^m$ | | $Z^u$, $Z^m$ | $p^{recov}$ |
| Get flushed out | | $F^u$, $F^m$ | $p^{flush}$ | | |

Table 4.5: Variables used in the integrated population model, categorized based on their role in the model.

We assume that capture-recapture surveys, carcass surveys and snorkel surveys are instantaneous and occur simultaneously at midday. On capture-recapture day $i$, males and females are captured with probability $p^{capt}_{i,m}$ and $p^{capt}_{i,f}$ respectively. To avoid identifiability issues, and to keep the model comparable with the Jolly-Seber model, we set the capture probability on the first capture occasion, for each sex, equal to that on the second occasion. We impose the same equality on the last two occasions. On carcass survey day $j$, carcasses present in the stream are recovered with probability $p^{recov}$. On snorkel survey day $j$, a number $\Delta_j$ of marked fish appears as unmarked if counted. Fish are counted with probability $p^{snor}_j$, which depends on the visibility on day $j$.

While there are many possible candidate integrated population models, we present only one candidate model for the sake of simplicity and as a starting point in better understanding the migration of Chinook salmon at Burman River. The model is developed with the perspective that it will be fitted using Bayesian methods. It is parametrized using a number of latent variables and fundamental parameters, described in Table 4.2.

Our integrated population model is described by the following distributions and state equations, where $i$ takes values within the set of capture-recapture survey days and $s = f$ or $m$ for females or males, respectively. First, the sizes of the marked and unmarked populations in the pool at midday, $N^m_{i,j,s}$ and $N^u_{j,s}$ respectively, are governed by equations (4.1)-(4.3), where $j = 1, \ldots, K^{pool}$. The state equation describing the number of unmarked individuals of sex $s$ alive in the river at midday on day $j$ is

$$A^u_{j,s} = A^u_{j-1,s} + T^u_{j-1,s} - D^u_{j-1,s},$$

for $j = 2, \ldots, K^{carc}$ and with $A^u_{1,s} = A^u_{2,s}$. The state equation describing the number of marked individuals of sex $s$ released last on day $i$ and alive in the river at midday on day $j$ is

$$A^m_{i,j,s} = A^m_{i,j-1,s} + T^m_{i,j-1,s} - D^m_{i,j-1,s},$$

for $j = i + 1, \ldots, K^{carc}$. The state equation describing the number of dead unmarked fish of sex $s$ that are present in the river at midday on day $j$ is

$$X^u_{j,s} = X^u_{j-1,s} - F^u_{j-1,s} - Z^u_{j-1,s} + D^u_{j-1,s},$$

for $j = 2, \ldots, K^{carc}$. The state equation describing the number of dead marked fish of sex $s$ present in the river at midday on day $j$ and released previously on day $i$ and not recaptured since is

$$X^m_{i,j,s} = X^m_{i,j-1,s} - F^m_{i,j-1,s} - Z^m_{i,j-1,s} + D^m_{i,j-1,s}.$$

The transition distributions are given by

• $T^u_{j,s} \sim \mathrm{Binomial}(N^u_{j,s} - C_{j,s},\, p^{move}_{j,s})$, for $j = 1, \ldots, K^{pool}$

• $T^m_{i,i,s} \sim \mathrm{Binomial}(R_{i,s},\, p^{move}_{i,s})$ and $T^m_{i,j,s} \sim \mathrm{Binomial}(N^m_{i,j,s} - M_{i,j,s},\, p^{move}_{j,s})$, for $j = i + 1, \ldots, K^{pool}$

• $D^u_{j,s} \sim \mathrm{Binomial}(T^u_{j,s} + A^u_{j,s},\, 1 - \phi_{j,s})$, for $j = 1, \ldots, K^{carc} - 1$

• $D^m_{i,j,s} \sim \mathrm{Binomial}(T^m_{i,j,s} + A^m_{i,j,s},\, 1 - \phi_{j,s})$, for $j = i, \ldots, K^{carc} - 1$

• $F^u_{j,s} \sim \mathrm{Binomial}(X^u_{j,s},\, p^{flush}_j)$, for $j = 2, \ldots, K^{carc} - 1$

• $F^m_{i,j,s} \sim \mathrm{Binomial}(X^m_{i,j,s},\, p^{flush}_j)$, for $j = 2, \ldots, K^{carc} - 1$
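The spawning-area part of the model (alive fish, deaths, carcasses, flushing and recoveries) can be sketched as a forward simulation for the unmarked fish of one sex. This is an illustrative sketch only, under several assumptions that are not part of the model as stated: the survival, flushing and recovery probabilities are held constant, the simulation starts with no fish alive upstream, and flushed carcasses are drawn from the carcasses left after that day's recoveries so that counts stay non-negative. All names and inputs are hypothetical.

```python
import numpy as np

def simulate_spawning_area(T, phi, p_flush, p_recov, survey_days, seed=0):
    """Simulate A^u, X^u, D^u, F^u and Z^u over days 1..K for one sex.

    T[j] : fish moving upstream between midday on day j and midday on day j+1,
           for j = 1..K-1 (index 0 unused, so len(T) = K + 1).
    """
    rng = np.random.default_rng(seed)
    K = len(T) - 1
    A = np.zeros(K + 1, dtype=int)  # alive in the spawning area at midday
    X = np.zeros(K + 1, dtype=int)  # carcasses present at midday
    D = np.zeros(K + 1, dtype=int)  # deaths during the following interval
    F = np.zeros(K + 1, dtype=int)  # carcasses flushed during the interval
    Z = np.zeros(K + 1, dtype=int)  # carcasses recovered at midday
    for j in range(1, K):
        # D_j ~ Binomial(T_j + A_j, 1 - phi): new arrivals can also die
        D[j] = rng.binomial(T[j] + A[j], 1.0 - phi)
        if j in survey_days:
            # Z_j ~ Binomial(X_j, p^recov) on carcass survey days
            Z[j] = rng.binomial(X[j], p_recov)
        # Sketch assumption: only carcasses not recovered today can be flushed
        F[j] = rng.binomial(X[j] - Z[j], p_flush)
        A[j + 1] = A[j] + T[j] - D[j]
        X[j + 1] = X[j] - F[j] - Z[j] + D[j]
    return A, X, D, F, Z

# Hypothetical upstream movements for a six-day study.
T_toy = np.array([0, 30, 20, 10, 0, 0, 0])
A_toy, X_toy, D_toy, F_toy, Z_toy = simulate_spawning_area(
    T_toy, phi=0.7, p_flush=0.2, p_recov=0.1, survey_days={3, 5}
)
```

Every fish that moved upstream ends up in exactly one place: still alive, present as a carcass, flushed out, or recovered.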

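The distributions used to relate the data to the latent variables are given by

• $Z^u_{j,s} \sim \mathrm{Binomial}(X^u_{j,s},\, p^{recov})$, for carcass survey days $j \in \{1, \ldots, K^{carc}\}$

• $Z^m_{i,j,s} \sim \mathrm{Binomial}(X^m_{i,j,s},\, p^{recov})$, for carcass survey days $j \in \{i + 1, \ldots, K^{carc}\}$

• $Y^u_j \sim \mathrm{Binomial}(A^u_{j,f} + A^u_{j,m} + \Delta_j,\, p^{snor}_j)$, for snorkel survey days $j \in \{1, \ldots, K^{carc}\}$

• $Y^m_j \sim \mathrm{Binomial}\left(\sum_{\text{all possible } i} \left[A^m_{i,j,f} + A^m_{i,j,m}\right] - \Delta_j,\, p^{snor}_j\right)$, for snorkel survey days $j \in \{1, \ldots, K^{carc}\}$

• $\mathrm{logit}(p^{snor}_j) \sim \mathrm{Normal}(\mathrm{logit}(\mu^{snor}_{low}) + \alpha_{v_j},\, \sigma^2_{v_j})$, for snorkel survey days $j \in \{1, \ldots, K^{carc}\}$

• $\Delta_j \sim \mathrm{Binomial}\left(\sum_{\text{all possible } i} \left[N^m_{i,j,f} + N^m_{i,j,m}\right],\, p^{\Delta}\right)$, for snorkel survey days $j \in \{1, \ldots, K^{carc}\}$.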
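The snorkel observation model can be illustrated with a short sketch for a single survey day. The helper names and all numerical inputs are hypothetical, and the number of miscounted marked fish is passed in directly rather than drawn from its binomial distribution.

```python
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def median_efficiency(mu_low, alpha_v):
    """Median of p^snor at visibility v: logit^{-1}(logit(mu_low) + alpha_v)."""
    return expit(logit(mu_low) + alpha_v)

def simulate_snorkel_counts(A_unmarked, A_marked, delta, mu_low, alpha_v, sigma_v, rng):
    """Draw p^snor on the logit scale, then the unmarked and marked snorkel counts."""
    p_snor = expit(rng.normal(logit(mu_low) + alpha_v, sigma_v))
    # delta marked fish are miscounted as unmarked, so they shift between the two pools
    Y_u = rng.binomial(A_unmarked + delta, p_snor)
    Y_m = rng.binomial(A_marked - delta, p_snor)
    return p_snor, Y_u, Y_m

# Hypothetical values for one survey day.
rng = np.random.default_rng(0)
p_day, Y_u_day, Y_m_day = simulate_snorkel_counts(
    A_unmarked=400, A_marked=120, delta=5,
    mu_low=0.35, alpha_v=1.2, sigma_v=0.5, rng=rng,
)
```

Because the logit transform is monotone, the median of $p^{snor}_j$ is the inverse logit of the normal mean, which is why the corresponding row of Table 4.4 is labelled a median rather than a mean.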

Quantities of interest can be calculated once a posterior sample is obtained. These include escapement (per sex and total), population size in the pool (per sex, over time), mean stopover time in the pool (in days, per sex and arrival day in the pool), mean residence time in the spawning area (in days, per sex and arrival day in the spawning area) and median observer efficiency for the snorkel survey (per visibility category). The formulas used to calculate these quantities are given in Table 4.4.

4.6 Analysis of the 2012 data

The 2012 data were collected following the timeline shown in Figure 4.3. A summary of the data is given in Figure 4.4. For the data analysis, we focused on the study period of September 10th 2012 to October 27th 2012, which spans from the first day of live captures to the last day of carcass recoveries. In other words, for this analysis we did not consider the snorkel data collected outside of this period.

Over the course of the capture-recapture survey, a total of 1179 adult Chinook salmon were tagged, of which 35% were females. A total of 348 recaptures occurred, with most fish being recaptured once. Over the course of the carcass survey, 299 adult carcasses were recovered, of which 15% were marked males, 7% were marked females, 40% were unmarked males and 38% were unmarked females. Snorkel counts varied between 7 (November 10th 2012, high visibility) and 725 (October 17th 2012, medium visibility). In the 2012 study, the hatchery removed 25 females and 55 males on September 21st and 67 females and 47 males on September 22nd.


[Figure 4.3: timeline plot. Rows: hatchery removals, snorkel counts, dead recoveries, live captures. Horizontal axis: date, from 07-Sep to 09-Nov.]

Figure 4.3: Timeline when surveys were performed in 2012. Each occurrence is denoted by a symbol “×”. Adjacent symbols correspond to consecutive days.

[Figure 4.4: time series plot. Vertical axis: number of fish (0 to 800). Horizontal axis: date, from 07-Sep to 09-Nov. Types of data: snorkel counts, new live captures, live recaptures, carcass recoveries.]

Figure 4.4: Summary time series of the 2012 data.


Migration timing of Chinook salmon in the Burman River is believed to be related to water discharge. Figure 4.5 shows the daily water discharge measured at Gold River, near the Burman River. The first large water discharge in 2012 was observed on October 14th. The last capture-recapture survey with positive catch took place shortly before, on October 11th. This was followed by three capture-recapture occasions with no catch. Unfortunately, because no data were recorded on those days, those three exact dates are unknown, but we hypothesize that they must have been after October 14th. Hence, for the integrated population model we assume that all fish have left the stopover area by October 16th and we ensure this by setting $p^{move}_{j,s} = 1$ on October 15th.

[Figure 4.5: time series plot. Vertical axis: mean daily discharge at Gold River in 2012 (m³ s⁻¹), 0 to 350. Horizontal axis: date, from Aug 15 to Nov 15. Annotated dates: Oct. 14 and Oct. 19.]

Figure 4.5: Daily discharge measured at Gold River over the 2012 migration period. Although discharge data are not available at Burman River, the data at nearby Gold River are thought to be a good proxy for Burman River. The first big freshet occurred on October 14th.

For our Bayesian approach, we fit the models using the JAGS software. Before running the chains, we ran an adaptation/burn-in phase of 250,000 iterations. We ran the chains for 2.5 million iterations, thinned by a factor of 10, and summarized the results using marginal posterior means and highest posterior density (HPD) credible intervals. Convergence was assessed through traceplots.
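For readers unfamiliar with HPD intervals, the following minimal sketch shows one common way to compute them from posterior draws: take the shortest interval that contains the requested share of the sorted sample. It is not the code used for this analysis, the function name is hypothetical, and the approach assumes a unimodal marginal posterior. The last line works out the number of draws retained per chain, assuming thinning keeps every tenth iteration.

```python
import numpy as np

def hpd_interval(samples, mass=0.95):
    """Shortest interval containing at least `mass` of the sampled values."""
    s = np.sort(np.asarray(samples, dtype=float))
    n = s.size
    n_in = int(np.ceil(mass * n))  # number of draws the interval must contain
    # Width of every window of n_in consecutive sorted draws
    widths = s[n_in - 1:] - s[:n - n_in + 1]
    start = int(np.argmin(widths))
    return s[start], s[start + n_in - 1]

# Draws retained per chain: 2.5 million iterations thinned by a factor of 10.
n_retained = 2_500_000 // 10
```

For a skewed posterior, such as an escapement total bounded below by the number of tagged fish, the HPD interval is shifted towards the mode compared with an equal-tailed interval.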

| | Jolly-Seber model: estimate | Jolly-Seber model: 95% CI | Integrated model: estimate | Integrated model: 95% CI |
|---|---|---|---|---|
| Male escapement | 3022 | (2548-3511) | 2411 | (2154-2670) |
| Female escapement | 2443 | (1798-3120) | 2878 | (2316-3437) |
| Total escapement | 5465 | (4641-6293) | 5289 | (4710-5913) |

Table 4.6: Escapement estimates obtained from the Jolly-Seber model and the integrated population model. The formulas used to calculate escapement are given in Table 4.4. CI denotes credible intervals.

Whenever possible, we use the same priors for the Jolly-Seber approach and the integrated population modeling approach. The priors on the number of entrants, $B_{j,s}$, are Uniform(0,400), rounded to the nearest integer. For the various probabilities $p^{move}_{j,s}$, $\phi_{j,s}$, $p^{capt}_{i,s}$, $\mu^{snor}_{low}$, $p^{\Delta}$, $p^{recov}$ and $p^{flush}_j$, we use Beta(1,1) distributions, which are equivalent to Uniform(0,1). For the visibility effects, we use a Gamma(shape = 0.5, rate = 0.005) prior on $\alpha_{medium}$ and $\alpha_{unknown}$. The positivity of the Gamma prior ensures that observer efficiency is higher for medium visibility than for low visibility because $\alpha_{low} = 0$. Along the same line of thought, to form a prior on $\alpha_{high}$ that ensures higher observer efficiency than at medium visibility, we add a Gamma(shape = 0.5, rate = 0.005) effect to $\alpha_{medium}$. Finally, the priors on $\sigma_v$ are Uniform(0,4).
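The ordering built into the visibility-effect priors can be checked by simulation. The sketch below draws from the priors as described in the text, using NumPy rather than JAGS; the variable names, the seed and the number of draws are arbitrary choices. NumPy parametrizes the Gamma distribution by shape and scale, so the rate of 0.005 is converted to a scale of 1/0.005 = 200.

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws = 10_000
shape, rate = 0.5, 0.005

alpha_low = np.zeros(n_draws)                                  # fixed at 0
alpha_medium = rng.gamma(shape, 1.0 / rate, size=n_draws)      # scale = 1 / rate
# alpha_high is alpha_medium plus a further positive Gamma effect
alpha_high = alpha_medium + rng.gamma(shape, 1.0 / rate, size=n_draws)
alpha_unknown = rng.gamma(shape, 1.0 / rate, size=n_draws)

sigma_v = rng.uniform(0.0, 4.0, size=n_draws)                  # Uniform(0, 4)
# Uniform(0, 400) rounded to the nearest integer, as for the entrants B
B_prior = np.rint(rng.uniform(0.0, 400.0, size=n_draws)).astype(int)
```

Every draw satisfies $\alpha_{low} \le \alpha_{medium} \le \alpha_{high}$, so the implied median observer efficiency is ordered by visibility, while the unknown category is only constrained to be at least as efficient as low visibility. The Gamma prior has mean shape/rate = 100 on the logit scale, so it is very diffuse.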

The escapement estimates obtained with the Jolly-Seber approach and the integrated population modeling approach are displayed in Table 4.6. With the Jolly-Seber model, the escapement estimate is larger for males than for females, whereas the integrated population model gives the opposite ordering. However, credible intervals for males and females overlap for both methods. Note that more males than females were marked in the capture-recapture survey, but this does not imply that male escapement is larger than female escapement because capture probabilities may depend on sex. Our escapement estimates can be compared with those obtained under a frequentist implementation of the Jolly-Seber (POPAN) model using the software MARK; see Appendix C.1.
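The overlap statement can be verified directly from the reported numbers. The short sketch below transcribes the escapement estimates and credible intervals into a dictionary; the helper name and data layout are illustrative only.

```python
def intervals_overlap(a, b):
    """True when the closed intervals a = (low, high) and b = (low, high) intersect."""
    return max(a[0], b[0]) <= min(a[1], b[1])

# (estimate, (lower, upper)) as reported for the two approaches
escapement = {
    "Jolly-Seber": {"male": (3022, (2548, 3511)), "female": (2443, (1798, 3120)),
                    "total": (5465, (4641, 6293))},
    "Integrated":  {"male": (2411, (2154, 2670)), "female": (2878, (2316, 3437)),
                    "total": (5289, (4710, 5913))},
}

overlap = {
    model: intervals_overlap(rows["male"][1], rows["female"][1])
    for model, rows in escapement.items()
}
```

Overlap of marginal intervals is only a rough guide; a direct comparison would look at the posterior distribution of the difference between male and female escapement.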

Figures 4.6 and 4.7 show plots of results obtained with the Jolly-Seber approach; Figures4.8 to 4.11 show plots of results obtained with the integrated population modeling approach.

For the Jolly-Seber approach, the number of Chinook salmon in the stopover pool at midday can only be estimated up to October 11th, the last capture-recapture occasion with positive catch. On October 11th, the population in the pool is near its maximum.

The mean stopover time estimates from the Jolly-Seber model are typically lower than those from the integrated population model. This was expected because the calculation of stopover time with the Jolly-Seber model is truncated at the last capture-recapture occasion with positive catch and does not account for fish still in the pool after this time.

The plots representing the number of individuals alive in the stream in Figure 4.11 show that the population in the spawning area peaked shortly after the big freshet on October 14th.

The mean residence time plots in Figure 4.10 show the estimated mean residence time as a function of the date of entry in the spawning area. Residence time assessment is an important component for the use of the AUC method by DFO.


[Figure 4.6: two panels. Vertical axes: number of females in the tagging pool; number of males in the tagging pool (0 to 1500). Horizontal axes: date, from Sep 10 to Oct 08.]

Figure 4.6: Estimates of the population size in the pool obtained using the Jolly-Seber model based on the formula in Table 4.4. Each estimate is represented along with a 95% HPD credible interval.

[Figure 4.7: two panels. Vertical axes: mean stopover time for females, by arrival time in the tagging pool (days); mean stopover time for males, by arrival time in the tagging pool (days) (0 to 10). Horizontal axes: date, from Sep 10 to Oct 08.]

Figure 4.7: Stopover time estimates obtained using the Jolly-Seber model based on the formula in Table 4.4. Each estimate is represented along with a 95% HPD credible interval.


[Figure 4.8: two panels. Vertical axes: number of females in the tagging pool; number of males in the tagging pool (0 to 1600). Horizontal axes: date, from Sep 10 to Oct 15.]

Figure 4.8: Estimates of the population size in the tagging pool obtained using the integrated population modeling approach based on the formula in Table 4.4. Each estimate is represented along with a 95% HPD credible interval.

[Figure 4.9: two panels. Vertical axes: mean stopover time for females, by arrival time in the tagging pool (days); mean stopover time for males, by arrival time in the tagging pool (days) (0 to 15). Horizontal axes: date, from Sep 10 to Oct 15.]

Figure 4.9: Stopover time estimates obtained using the integrated population modeling approach based on the formula in Table 4.4. Each estimate is represented along with a 95% HPD credible interval.


[Figure: two panels showing the mean spawning area residence time of females and of males, by arrival time in the spawning area (days, y-axis), against date, Sep 10 to Oct 15 (x-axis).]

Figure 4.10: Residence time estimates obtained using the integrated population modeling approach based on the formula in Table 4.4. Each estimate is represented along with a 95% HPD credible interval.

[Figure: two panels showing the number of alive females and the number of alive males in the spawning area (y-axis) against date, Sep 10 to Oct 29 (x-axis).]

Figure 4.11: Estimates of alive population size in the spawning area obtained using the integrated population modeling approach based on the formula in Table 4.4. Each estimate is represented along with a 95% HPD credible interval.


| Observer efficiency | Estimate | 95% HPD CI |
|---|---|---|
| Low visibility | 0.35 | (0.01-0.73) |
| Medium visibility | 0.65 | (0.29-0.95) |
| High visibility | 0.77 | (0.60-0.97) |
| Unknown visibility | 0.84 | (0.45-1.00) |

Table 4.7: Integrated population modeling marginal estimates and credible intervals of observer efficiency in the snorkel survey, based on the fish visibility covariate.

Table 4.7 presents the observer efficiency estimates obtained with the integrated population model. This information is pertinent to the use of the AUC method.
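To illustrate why observer efficiency matters for the AUC method, the sketch below assumes the common form of the estimator, in which escapement is the trapezoidal area under the snorkel count curve divided by the product of mean residence time and observer efficiency. The function names, the survey days and the counts are hypothetical and are not taken from the study data.

```python
def auc_trapezoid(days, counts):
    """Area under the count curve (fish-days) by the trapezoidal rule."""
    if len(days) != len(counts) or len(days) < 2:
        raise ValueError("need at least two paired (day, count) observations")
    points = list(zip(days, counts))
    return sum(
        (d1 - d0) * (c0 + c1) / 2.0
        for (d0, c0), (d1, c1) in zip(points, points[1:])
    )


def auc_escapement(days, counts, residence_time, observer_efficiency):
    """Assumed AUC form: fish-days / (residence time x observer efficiency)."""
    if residence_time <= 0 or not 0 < observer_efficiency <= 1:
        raise ValueError("residence time must be > 0 and efficiency in (0, 1]")
    return auc_trapezoid(days, counts) / (residence_time * observer_efficiency)


if __name__ == "__main__":
    # Hypothetical survey: day of survey and snorkel count on that day.
    survey_days = [0, 7, 14, 21]
    snorkel_counts = [0, 300, 200, 0]
    print(auc_escapement(survey_days, snorkel_counts, 5.0, 0.7))
```

Under this form, halving the assumed observer efficiency doubles the escapement estimate, which is why the estimates in Table 4.7 are pertinent.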

With the integrated population modeling approach, the probability of recovering a carcass present in the stream during a carcass survey occasion is estimated at 0.11 (95% HPD CI of 0.06-0.17). This estimate is considerably higher than the proportion of marked individuals recovered dead (0.06) because it is conditional upon the carcasses not being flushed.

4.6.1 Assessment of the integrated population model

Model fit can be assessed using posterior predictive p-values, also called Bayesian p-values (Meng, 1994; Gelman et al., 2003). Approximate p-values are computed as follows, based on a sample of size $\nu$ from the posterior distribution:

$$p = \frac{1}{\nu} \sum_{k=1}^{\nu} \mathbb{1}\left[ D(W'_k, \theta_k) > D(W, \theta_k) \right], \qquad (4.4)$$

where $D$ is a discrepancy measure, $W$ is the data, $\theta_k$ is the $k$th posterior sample and $W'_k$ is data simulated from $\theta_k$ using the model.

We used (4.4) to assess the capture-recapture and snorkel components of the integrated population model. A more thorough model assessment was not possible in the given time frame because not all parameter values from the MCMC run were saved and the algorithm is computationally demanding. We used a sample size of $\nu = 5000$. For the discrepancies, we used Freeman-Tukey statistics (Freeman and Tukey, 1950), which have the form $(\sqrt{O} - \sqrt{E})^2$, where $O$ is observed (or simulated) count data and $E$ is the expectation of the count under the model. We use, for the capture-recapture and snorkel components respectively:

$$D_1(W, \theta) = \left( \sqrt{R_{j,s}} - \sqrt{ \left( N^{u}_{j,s} + \sum_{\text{all possible } i} N^{m}_{i,j,s} \right) p^{\mathrm{capt}}_{j,s} } \right)^2$$

$$D_2(W, \theta) = \left( \sqrt{Y^{u}_{j} + Y^{m}_{j}} - \sqrt{ \sum_{s \in \{m, f\}} \left( A^{u}_{j,s} + \sum_{\text{all possible } i} A^{m}_{i,j,s} \right) \mathrm{logit}^{-1}\left( \mathrm{logit}(\mu^{\mathrm{snor}}_{\mathrm{low}}) + \alpha_{v_j} \right) } \right)^2 .$$
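The computation in (4.4) with a Freeman-Tukey discrepancy can be sketched as follows. This is a toy illustration and not the model used in this chapter: it assumes a single count whose expectation under each posterior draw is already available, and it simulates replicate counts from a Poisson distribution. The names `bayesian_p_value`, `simulate_poisson` and the example numbers are hypothetical.

```python
import math
import random


def freeman_tukey(count, expected):
    """Freeman-Tukey discrepancy (sqrt(O) - sqrt(E))^2."""
    return (math.sqrt(count) - math.sqrt(expected)) ** 2


def simulate_poisson(mean, rng):
    """Draw one Poisson variate (Knuth's method, adequate for small means)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def bayesian_p_value(observed, expected_draws, simulate, rng):
    """Share of posterior draws whose replicate is more discrepant than the data.

    expected_draws holds the model expectation of the count under each
    posterior draw theta_k; simulate(expected, rng) returns a replicate W'_k.
    """
    exceed = 0
    for expected in expected_draws:
        replicate = simulate(expected, rng)
        if freeman_tukey(replicate, expected) > freeman_tukey(observed, expected):
            exceed += 1
    return exceed / len(expected_draws)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical posterior draws of the expected count.
    draws = [rng.uniform(18.0, 22.0) for _ in range(5000)]
    print(bayesian_p_value(20, draws, simulate_poisson, rng))
```

Because the observed and replicate discrepancies are evaluated at the same posterior draw, the p-value reflects posterior uncertainty in the parameters as well as sampling variability.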


Bayesian p-values lie between 0 and 1. A Bayesian p-value not “close” to 0.5 suggests that the discrepancies of the simulated data and of the observed data differ systematically, which provides evidence of a poor fit of the model. Figure 4.12 shows the p-values computed for assessing the capture-recapture component of the model. The average of all p-values on the graph is 0.51. The p-values farthest from 0.5 are from the beginning and end of the capture-recapture survey. This suggests that the assumption that the capture probabilities are the same at the first two and last two occasions might be inappropriate. Figure 4.13 shows the p-values computed for assessing the snorkel survey component of the model. The average of all p-values on the graph is 0.49. We do not have evidence that the snorkel survey component of the model fits poorly, except perhaps for the second-to-last survey occasion, which could simply be an outlier.

[Figure: Bayesian p-value (y-axis, 0 to 1) against date, Sep 10 to Oct 08 (x-axis), plotted separately by sex (male, female).]

Figure 4.12: Bayesian p-values for the assessment of the capture-recapture component of the integrated population model, using discrepancy D1.


[Figure: Bayesian p-value (y-axis, 0 to 1) against date, Sep 17 to Oct 22 (x-axis).]

Figure 4.13: Bayesian p-values for the assessment of the snorkel survey component of the integrated population model, using discrepancy D2.

4.7 Discussion

In this paper, we developed an integrated population model to integrate capture-recapture data along with carcass recovery data and snorkel survey data to gain insight on the biological processes driving the migration of Chinook salmon populations that use a single stopover pool on their migration route. An integrated population approach has the advantage over a Jolly-Seber approach of providing insight on snorkel observer efficiency and mean residence time spent in the counting area. These two issues are crucial to assess in order to use the area-under-the-curve method properly, which is likely to remain the method of choice for the coming years on the West Coast of Vancouver Island. Radio-tagging surveys could also provide insight on observer efficiency and mean residence time, but they are typically more expensive.

In future work, we want to apply the integrated population modeling methodology to all 2009-2014 study years, compare different modeling assumptions and reflect on similarities and differences in estimates between years. Much remains to be done, namely evaluating the potential impact of transients and incorporating tag loss, loss on capture and hatchery removals of marked fish in the models. We are also interested in studying the relationship between stopover time, residence time and time of the first big freshet. Work by Dunlop (2015) suggests that this relationship is strong and thus residence time needed for the area-under-the-curve method could be estimated yearly from the time of the first big freshet alone.


Bibliography

[1] Abadi, F., Gimenez, O., Arlettaz, R., and Schaub, M. (2010). An assessment of integrated population models: bias, accuracy, and violation of the assumption of independence. Ecology, 91, 7-14.

[2] Béliveau A., Lockhart R.A., Schwarz C. J. and Arndt S.K. (2015). Adjusting for undercoverage of access-points in creel surveys with fewer overflights. Biometrics, doi: 10.1111/biom.12335.

[3] Besbeas, P. and Morgan, B.J.T. (2012). Kalman Filter Initialization for Integrated Population Modelling. Journal of the Royal Statistical Society: Series C, 61, 151-162.

[4] Besbeas, P., Borysiewicz, R.S. and Morgan, B.J.T. (2008). Completing the ecological jigsaw. In: Thomson D.L., Cooch E.G. and Conroy M.J., eds. Modeling Demographic Processes in Marked Populations. Environmental and Ecological Statistics, 3. Springer, New York, pp. 513-539.

[5] Besbeas, P., Lebreton, J.-D. and Morgan, B.J.T. (2003). The Efficient Integration of Abundance and Demographic Data. Journal of the Royal Statistical Society: Series C, 52, 95-102.

[6] Besbeas, P., Freeman, S.N., Morgan, B.J.T. and Catchpole E.A. (2002). Integrating Mark-Recapture-Recovery and Census Data to Estimate Animal Abundance and Demographic Parameters. Biometrics, 58, 540-547.

[7] Brooks S.P., Catchpole E.A. and Morgan B.J.T. (2000). Bayesian animal survival estimation. Statistical Science, 15, 357-376.

[8] Buckland, S.T., Newman, K.B., Thomas, L. and Koesters, N.B. (2004). State-Space Models for the Dynamics of Wild Animal Populations. Ecological Modelling, 171, 157-175.

[9] Chandler, R.B. and Clark, J.D. (2014). Spatially explicit integrated population models. Methods in Ecology and Evolution, 5, 1351-1360.

[10] Cochran W.G. (1977). Sampling techniques. 3rd edition. New York: John Wiley.

[11] Dauk P.C. and Schwarz C.J. (2001). Catch estimation with restricted randomization in the effort survey. Biometrics, 57, 461-468.

[12] Dunlop R.H. (2015). Open population mark-recapture estimation of ocean-type Chinook salmon spawning escapements at stopover sites on the west coast of Vancouver Island. Report prepared for the Sentinel Stocks & Southern Boundary and Enhancement Committee, Pacific Salmon Commission.

[13] DFO. (2014). Proceedings of the Regional Peer Review on the West Coast Vancouver Island Chinook Salmon Escapement Estimation and Stock Aggregation Procedures; June 18-20, 2013. DFO Can. Sci. Advis. Sec., Proceed Ser. 2014/025.

[14] DFO. (2012). Assessment of west coast Vancouver Island Chinook and 2010 Forecast. DFO Can. Sci. Advis. Sec. Sci. Advis. Rep. 2011/032.

[15] English K.K., Bocking R.C. and Irvine J.R. (1992). A robust procedure for estimating salmon escapement based on the area-under-the-curve method. Canadian Journal of Fisheries and Aquatic Science, 49, 1982-1989.

[16] Freeman M.F., Tukey J.W. (1950). Transformations related to the angular and the square root. Ann. Math. Statist., 21, 607-611.

[17] Gelman A., Meng X-L. and Stern H. (1996). Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 6, 733-807.

[18] Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models. Bayesian Analysis, 1, 515-533.

[19] Hilborn R., Bue B.G. and Sharr S. (1999). Estimating spawning escapements from periodic counts: a comparison of methods. Canadian Journal of Fisheries and Aquatic Science, 56, 888-896.

[20] Isaki C.T. and Fuller W.W. (1982). Survey design under the regression superpopulation model. Journal of the American Statistical Association, 77, 89-96.

[21] Kéry, M. and Schaub, M. (2012). Bayesian Population Analysis Using WinBUGS: A Hierarchical Perspective. Academic Press, Burlington, MA.

[22] King, R. (2012). A review of Bayesian state-space modelling of capture-recapture-recovery data. Interface Focus, 2, 190-204.

[23] Koller D. and Friedman N. (2009). Probabilistic Graphical Models: Principles and Techniques. The MIT Press, Cambridge, MA.

[24] Lee A.M., Bjørkvoll E.M., Hansen B.B., Albon S.D., Stien A., Sæther B.-E., Engen S., Veiberg V., Loe L.E. and Grøtan V. (2015). An integrated population model for a long-lived ungulate: more efficient data use with Bayesian methods. Oikos. In press.

[25] Lohr S.L. (2009). Sampling: Design and Analysis. Second Edition. Boston, MA: Brooks/Cole Cengage Learning.

[26] Matechou, E., Morgan, B.J.T., Pledger, S., Collazo, J. and Lyons J. (2013). Integrated Analysis of Capture-Recapture-Resighting Data and Counts of Unmarked Birds at Stop-Over Sites. Journal of Agricultural, Biological, and Environmental Statistics, 18, 120-135.

[27] Mazzetta, C., Morgan, B.J.T. and Coulson T. (2010). A state-space modelling approach to population size estimation. University of Warwick institutional repository, 1-27.


[28] McCrea, R.S., Morgan, B.J.T., Gimenez, O., Besbeas, P., Lebreton, J.-D. and Bregnballe, T. (2010). Multi-Site Integrated Population Modelling. Journal of Agricultural, Biological, and Environmental Statistics, 15, 539-561.

[29] Meng X-L. (1994). Posterior Predictive p-Values. The Annals of Statistics, 22, 1142-1160.

[30] Parken C.K., Bailey R.E. and Irvine J.R. (2003). Incorporating Uncertainty into Area-under-the-Curve and Peak Count Salmon Escapement Estimation. North American Journal of Fisheries Management, 23, 78-90.

[31] Pollock K.H., Jones C.M. and Brown T.L. (1994). Angler Survey Methods and their Applications in Fisheries Management. Bethesda, Maryland: American Fisheries Society Special Publication 25.

[32] Särndal C-E., Swensson B. and Wretman J. (1992). Model Assisted Survey Sampling. New York: Springer-Verlag.

[33] Schaub, M., Gimenez, O., Sierro, A. and Arlettaz, R. (2007). Use of Integrated Modeling to Enhance Estimates of Population Dynamics Obtained from Limited Data. Conservation Biology, 21, 945-955.

[34] Schaub, M. and Abadi, F. (2011). Integrated population models: a novel analysis framework for deeper insights into population dynamics. Journal of Ornithology, 152 (Suppl 1), S227-S237.

[35] Schwarz C.J. and Arnason A.N. (1996). A General Methodology for the Analysis of Capture-Recapture Experiments in Open Populations. Biometrics, 52, 860-873.

[36] Sierro A., Lugon, A. and Arlettaz, R. (2009). La colonie de grands rhinolophes Rhinolophus ferrumequinum de l’église St-Sylve à Vex (Valais, Suisse): évolution sur deux décennies (1986-2006). Le Rhinolophe, 18, 75-82.

[37] Stoklosa, J., Hwang, W-H., Wu, S-H. and Huggins, R. (2011). Heterogeneous Capture-Recapture Models with Covariates: A Partial Likelihood Approach for Closed Populations. Biometrics, 67, 1659-1665.

[38] Varin C., Reid N. and Firth, D. (2011). An Overview of Composite Likelihood Methods. Statistica Sinica, 21, 5-42.


Appendix A

Supplementary materials for Chapter 2

A.1 First-order Taylor expansions

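$$N^{-1}\hat{C}_1 \doteq \frac{1}{n_g} \sum_{i \in s_g} \left( C_i \frac{\bar{A}_{oU}}{\bar{A}_{gU}} \right) + \frac{1}{n_o} \sum_{i \in s_o} \left( \bar{C}_U \frac{A_{oi}}{\bar{A}_{gU}} - \bar{C}_U \frac{\bar{A}_{oU}}{\bar{A}_{gU}} \frac{A_{gi}}{\bar{A}_{gU}} \right)$$

$$N^{-1}\hat{C}_2 \doteq \frac{1}{n_g} \sum_{i \in s_g} C_i \bar{R}_U + \frac{1}{n_o} \sum_{i \in s_o} \bar{C}_U R_i - \bar{C}_U \bar{R}_U, \quad \text{where } R_i = \frac{A_{oi}}{A_{gi}}$$

$$N^{-1}\hat{C}_R \doteq \frac{1}{n_g} \sum_{i \in s_g} C_i \frac{\bar{y}_U}{\bar{C}_U} + \frac{1}{n_o} \sum_{i \in s_o} \left( y_i - C_i \frac{\bar{y}_U}{\bar{C}_U} \right)$$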
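As an illustration of how such a linearization arises, consider the second estimator. The derivation below is a sketch under assumed notation: it takes $N^{-1}\hat{C}_2$ to be the product of the sample mean of the $C_i$ over $s_g$ and the sample mean of the $R_i$ over $s_o$, written $\bar{c}_{s_g}$ and $\bar{r}_{s_o}$ (hypothetical symbols), and it reads $\bar{C}_U$ and $\bar{R}_U$ as the corresponding population means. A first-order expansion of the product $f(c, r) = cr$ around $(\bar{C}_U, \bar{R}_U)$ gives

$$\begin{aligned}
\bar{c}_{s_g}\,\bar{r}_{s_o} &\doteq \bar{C}_U \bar{R}_U + \bar{R}_U\left(\bar{c}_{s_g} - \bar{C}_U\right) + \bar{C}_U\left(\bar{r}_{s_o} - \bar{R}_U\right) \\
&= \bar{R}_U\,\bar{c}_{s_g} + \bar{C}_U\,\bar{r}_{s_o} - \bar{C}_U \bar{R}_U ,
\end{aligned}$$

which is linear in the two sample means, so its variance can be handled with standard results for means under two-phase sampling.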

A.2 Assumptions, propositions and proofs for the study of Errparty

A.2.1 Assumptions

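(A1.) For every $i \in U$, $I_{i1}, \ldots, I_{iM_i}$ are independent conditionally on $(M_i, c_{i1}, \ldots, c_{iM_i}, \delta_{i1}, \ldots, \delta_{iM_i})$;

(A2.) For every $i \in U$ and $j \in V_i$,

$$I_{ij} \mid (M_i, c_{i1}, \ldots, c_{iM_i}, \delta_{i1}, \ldots, \delta_{iM_i}) \sim \mathrm{Bernoulli}(p_i);$$

(A3.) For every $i \in U$, $A_{oi} \xrightarrow{p} \infty$;

(A4.) For every $i \in U$,

$$CV_i(c_{ij}) \equiv \frac{\sqrt{\frac{1}{M_i} \sum_{j \in V_i} \left( c_{ij} - \frac{1}{M_i} C_i^* \right)^2}}{\frac{1}{M_i} C_i^*} = o_p\left(\sqrt{M_i}\right).$$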


A.2.2 Study of Errparty for the estimators CR and CDE

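Proposition 1. Under assumptions (A1.)–(A4.),

$$\frac{\mathrm{Err}_{\mathrm{party}}(\hat{C})}{C^*} \to 0 \text{ in probability},$$

where $\hat{C}$ stands for either $\hat{C}_{DE}$ or $\hat{C}_R$.

Proof. We start by bounding the absolute relative error in the following way:

$$\frac{|\mathrm{Err}_{\mathrm{party}}(\hat{C})|}{C^*} = \frac{\left| \sum_{i \in U} y_i - \sum_{i \in U} C_i^* \right|}{\sum_{i \in U} C_i^*} = \frac{\left| \sum_{i \in U} \varepsilon_i C_i^* \right|}{\sum_{i \in U} C_i^*} \le \frac{\sum_{i \in U} |\varepsilon_i| C_i^*}{\sum_{i \in U} C_i^*} \le \frac{\sum_{i \in U} \max_{i \in U}(|\varepsilon_i|)\, C_i^*}{\sum_{i \in U} C_i^*} = \max_{i \in U}(|\varepsilon_i|),$$

where

$$\varepsilon_i = \frac{y_i - C_i^*}{C_i^*} = \left[ \frac{A_{oi} p_i - A_{gi}}{A_{oi} p_i} \left( \frac{A_{gi} - A_{oi} p_i}{A_{oi} p_i} + 1 \right)^{-1} + 1 \right] \left( \frac{C_i - C_i^* p_i}{C_i^* p_i} - \frac{A_{gi} - A_{oi} p_i}{A_{oi} p_i} \right).$$

Noting that $A_{gi}$ and $A_{oi}$ can be expressed as $A_{gi} = \sum_{j \in V_i} \delta_{ij} I_{ij}$ and $A_{oi} = \sum_{j \in V_i} \delta_{ij}$, respectively, conditional moment calculations show that

$$E\left( \frac{A_{gi} - A_{oi} p_i}{A_{oi} p_i} \,\middle|\, M_i, \delta_{i1}, \ldots, \delta_{iM_i} \right) = \frac{A_{oi} p_i - A_{oi} p_i}{A_{oi} p_i} = 0,$$

using assumption (A2.) and

$$\mathrm{Var}\left( \frac{A_{gi} - A_{oi} p_i}{A_{oi} p_i} \,\middle|\, M_i, \delta_{i1}, \ldots, \delta_{iM_i} \right) = \frac{A_{oi} p_i (1 - p_i)}{A_{oi}^2 p_i^2} = \frac{(1 - p_i)}{A_{oi} p_i} \to 0 \text{ in probability}$$

using assumptions (A1.)–(A3.).

Also, noting that $C_i$ and $C_i^*$ can be expressed as $C_i = \sum_{j \in V_i} c_{ij} I_{ij}$ and $C_i^* = \sum_{j \in V_i} c_{ij}$, respectively, conditional moment calculations show that

$$E\left( \frac{C_i - C_i^* p_i}{C_i^* p_i} \,\middle|\, M_i, \delta_{i1}, \ldots, \delta_{iM_i} \right) = \frac{C_i^* p_i - C_i^* p_i}{C_i^* p_i} = 0,$$

using assumption (A2.) and

$$\mathrm{Var}\left( \frac{C_i - C_i^* p_i}{C_i^* p_i} \,\middle|\, M_i, \delta_{i1}, \ldots, \delta_{iM_i} \right) = \frac{p_i (1 - p_i) \sum_{j \in V_i} c_{ij}^2}{C_i^{*2} p_i^2} = \frac{(1 - p_i) \left( CV_i(c_{ij})^2 + 1 \right)}{p_i M_i} \to 0 \text{ in probability}$$

using assumptions (A1.), (A2.) and (A4.). Then, applying Chebyshev’s inequality, we have that for any $\varepsilon > 0$,

$$P\left( \left| \frac{A_{gi} - A_{oi} p_i}{A_{oi} p_i} \right| \ge \varepsilon \,\middle|\, M_i, \delta_{i1}, \ldots, \delta_{iM_i} \right) \to 0$$

and

$$P\left( \left| \frac{C_i - C_i^* p_i}{C_i^* p_i} \right| \ge \varepsilon \,\middle|\, M_i, \delta_{i1}, \ldots, \delta_{iM_i} \right) \to 0$$

which imply

$$P\left( \left| \frac{\mathrm{Err}_{\mathrm{party}}(\hat{C})}{C^*} \right| \ge \varepsilon \,\middle|\, M_i, c_{i1}, \ldots, c_{iM_i}, \delta_{i1}, \ldots, \delta_{iM_i} \right) \to 0 \text{ in probability}.$$

Taking expected value and applying the Dominated Convergence Theorem gives the conclusion of the proposition.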
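The convergence in Proposition 1 can be illustrated numerically for a single day. The following is a minimal simulation sketch, not part of the proof: it assumes each party is interviewed independently with probability p, is active at the overflight time with probability 0.5, and has a catch drawn uniformly from 0 to 4. These distributions and the function name are hypothetical choices made only for illustration.

```python
import random


def relative_party_error(n_parties, p, rng):
    """Simulate one day and return eps = (y - C*) / C*, with y = C * A_o / A_g."""
    true_catch = 0    # C*: total catch of all parties on the day
    seen_catch = 0    # C: catch of the interviewed parties
    active = 0        # A_o: parties active at the overflight time
    active_seen = 0   # A_g: interviewed parties active at the overflight time
    for _ in range(n_parties):
        catch = rng.randint(0, 4)
        is_active = rng.random() < 0.5
        interviewed = rng.random() < p
        true_catch += catch
        active += is_active
        if interviewed:
            seen_catch += catch
            active_seen += is_active
    if active_seen == 0 or true_catch == 0:
        raise ValueError("day too small for the expansion to be defined")
    expanded = seen_catch * active / active_seen
    return (expanded - true_catch) / true_catch


if __name__ == "__main__":
    rng = random.Random(2015)
    for n_parties in (100, 1000, 10000, 100000):
        print(n_parties, abs(relative_party_error(n_parties, 0.3, rng)))
```

As the number of parties grows, so does the overflight count, and the printed absolute relative errors should shrink toward zero, in line with the proposition.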

A.2.3 Study of Errparty for the estimator C1

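Proposition 2. If assumptions (A1.)–(A4.) are satisfied, then for any $\varepsilon > 0$,

$$P\left( \frac{\min_{i \in U}(p_i)}{\max_{i \in U}(p_i)} - 1 - \varepsilon \le \frac{\mathrm{Err}_{\mathrm{party}}(\hat{C}_1)}{C^*} \le \frac{\max_{i \in U}(p_i)}{\min_{i \in U}(p_i)} - 1 + \varepsilon \right) \to 1.$$

Proof. $\mathrm{Err}_{\mathrm{party}}(\hat{C}_1)$ can be bounded above in the following way:

$$\begin{aligned}
\frac{\mathrm{Err}_{\mathrm{party}}(\hat{C}_1)}{C^*} &= \frac{\sum_{i \in U} C_i \dfrac{\sum_{i \in U} A_{oi}}{\sum_{i \in U} A_{gi}} - \sum_{i \in U} C_i^*}{\sum_{i \in U} C_i^*} \\
&= \left( \frac{\sum_{i \in U} C_i^* p_i}{\sum_{i \in U} C_i^*} \, \frac{\sum_{i \in U} A_{oi}}{\sum_{i \in U} A_{oi} p_i} \right) \left( \frac{\sum_{i \in U} (C_i - C_i^* p_i)}{\sum_{i \in U} C_i^* p_i} + 1 \right) \\
&\quad \times \left[ \frac{\sum_{i \in U} (A_{oi} p_i - A_{gi})}{\sum_{i \in U} A_{oi} p_i} \left( \frac{\sum_{i \in U} (A_{gi} - A_{oi} p_i)}{\sum_{i \in U} A_{oi} p_i} + 1 \right)^{-1} + 1 \right] - 1 \qquad \text{(A.1)} \\
&\le \frac{\max_{i \in U}(p_i)}{\min_{i \in U}(p_i)} \left( \frac{\sum_{i \in U} (C_i - C_i^* p_i)}{\sum_{i \in U} C_i^* p_i} + 1 \right) \\
&\quad \times \left[ \frac{\sum_{i \in U} (A_{oi} p_i - A_{gi})}{\sum_{i \in U} A_{oi} p_i} \left( \frac{\sum_{i \in U} (A_{gi} - A_{oi} p_i)}{\sum_{i \in U} A_{oi} p_i} + 1 \right)^{-1} + 1 \right] - 1.
\end{aligned}$$

Similarly, we can obtain a lower bound:

$$\begin{aligned}
\frac{\mathrm{Err}_{\mathrm{party}}(\hat{C}_1)}{C^*} &\ge \frac{\min_{i \in U}(p_i)}{\max_{i \in U}(p_i)} \left( \frac{\sum_{i \in U} (C_i - C_i^* p_i)}{\sum_{i \in U} C_i^* p_i} + 1 \right) \\
&\quad \times \left[ \frac{\sum_{i \in U} (A_{oi} p_i - A_{gi})}{\sum_{i \in U} A_{oi} p_i} \left( \frac{\sum_{i \in U} (A_{gi} - A_{oi} p_i)}{\sum_{i \in U} A_{oi} p_i} + 1 \right)^{-1} + 1 \right] - 1.
\end{aligned}$$

The first conditional moment computations are analogous to the ones in Appendix A.2.2:

$$E\left( \frac{A_{gi} - A_{oi} p_i}{\sum_{j \in U} A_{oj} p_j} \,\middle|\, M_i, \delta_{i1}, \ldots, \delta_{iM_i} \right) = E\left( \frac{C_i - C_i^* p_i}{\sum_{j \in U} C_j^* p_j} \,\middle|\, M_i, \delta_{i1}, \ldots, \delta_{iM_i} \right) = 0.$$

Also following from Appendix A.2.2:

$$\mathrm{Var}\left( \frac{A_{gi} - A_{oi} p_i}{\sum_{j \in U} A_{oj} p_j} \,\middle|\, M_i, \delta_{i1}, \ldots, \delta_{iM_i} \right) \le \mathrm{Var}\left( \frac{A_{gi} - A_{oi} p_i}{A_{oi} p_i} \,\middle|\, M_i, \delta_{i1}, \ldots, \delta_{iM_i} \right) \to 0 \text{ in probability}$$

and

$$\mathrm{Var}\left( \frac{C_i - C_i^* p_i}{\sum_{j \in U} C_j^* p_j} \,\middle|\, M_i, \delta_{i1}, \ldots, \delta_{iM_i} \right) \le \mathrm{Var}\left( \frac{C_i - C_i^* p_i}{C_i^* p_i} \,\middle|\, M_i, \delta_{i1}, \ldots, \delta_{iM_i} \right) \to 0 \text{ in probability}.$$

Then, we use Chebyshev’s inequality the same way as in Appendix A.2.2 and obtain that for any $\varepsilon > 0$,

$$P\left( \frac{\min_{i \in U}(p_i)}{\max_{i \in U}(p_i)} - 1 - \varepsilon \le \frac{\mathrm{Err}_{\mathrm{party}}(\hat{C}_1)}{C^*} \le \frac{\max_{i \in U}(p_i)}{\min_{i \in U}(p_i)} - 1 + \varepsilon \,\middle|\, M_i, c_{i1}, \ldots, c_{iM_i}, \delta_{i1}, \ldots, \delta_{iM_i} \right) \to 1 \text{ in probability}.$$

Taking expected value and applying the Dominated Convergence Theorem gives the conclusion of the proposition.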

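Proposition 3. If assumptions (A1.)–(A4.) are satisfied and if, for every $i \in U$,

(A5.) $\dfrac{C_i^*}{M_i} = \mu_{ci} + O_p\left( \dfrac{1}{\sqrt{M_i}} \right)$

(A6.) $\dfrac{A_{oi}}{M_i} = \mu_{\delta i} + O_p\left( \dfrac{1}{\sqrt{M_i}} \right)$

then

$$\frac{\mathrm{Err}_{\mathrm{party}}(\hat{C}_1)}{C^*} - \left( \frac{\sum_{i \in U} \mu_{ci} p_i}{\sum_{i \in U} \mu_{ci}} \, \frac{\sum_{i \in U} \mu_{\delta i}}{\sum_{i \in U} \mu_{\delta i} p_i} - 1 \right) \to 0 \text{ in probability}.$$

Proof. In the expression for the relative error due to party sampling (A.1), we develop the factor

$$\left( \frac{\sum_{i \in U} C_i^* p_i}{\sum_{i \in U} C_i^*} \, \frac{\sum_{i \in U} A_{oi}}{\sum_{i \in U} A_{oi} p_i} \right)$$

into a Taylor series expansion at the point

$$\left( \frac{C_1^*}{M_1}, \ldots, \frac{C_N^*}{M_N}, \frac{A_{o1}}{M_1}, \ldots, \frac{A_{oN}}{M_N} \right) = (\mu_{c1}, \ldots, \mu_{cN}, \mu_{\delta 1}, \ldots, \mu_{\delta N}).$$

It is then easy to see that

$$\left( \frac{\sum_{i \in U} C_i^* p_i}{\sum_{i \in U} C_i^*} \, \frac{\sum_{i \in U} A_{oi}}{\sum_{i \in U} A_{oi} p_i} \right) = \left( \frac{\sum_{i \in U} \mu_{ci} M_i p_i}{\sum_{i \in U} \mu_{ci} M_i} \, \frac{\sum_{i \in U} \mu_{\delta i} M_i}{\sum_{i \in U} \mu_{\delta i} M_i p_i} \right) + O_p\left( \frac{1}{\sqrt{M_i}} \right),$$

under assumptions (A5.) and (A6.). Then, using the conditional moment calculations and Chebyshev’s inequalities from Appendix A.2.3, and noting that $O_p\left( \frac{1}{\sqrt{M_i}} \right)$ is already $o_p(1)$, we get: for every $\varepsilon > 0$,

$$P\left( \left| \frac{\mathrm{Err}_{\mathrm{party}}(\hat{C}_1)}{C^*} - \left( \frac{\sum_{i \in U} \mu_{ci} p_i}{\sum_{i \in U} \mu_{ci}} \, \frac{\sum_{i \in U} \mu_{\delta i}}{\sum_{i \in U} \mu_{\delta i} p_i} - 1 \right) \right| > \varepsilon \,\middle|\, M_i, c_{i1}, \ldots, c_{iM_i}, \delta_{i1}, \ldots, \delta_{iM_i} \right) \to 0 \text{ in probability}.$$

Taking expected value and applying the Dominated Convergence Theorem gives the conclusion of the proposition.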

A.2.4 Study of Errparty for the estimator C2

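Proposition 4. If assumptions (A1.)–(A4.) are satisfied, then for any $\varepsilon > 0$,

$$P\left( \frac{\min_{i \in U}(p_i)}{\max_{i \in U}(p_i)} - 1 - \varepsilon \le \frac{\mathrm{Err}_{\mathrm{party}}(\hat{C}_2)}{C^*} \le \frac{\max_{i \in U}(p_i)}{\min_{i \in U}(p_i)} - 1 + \varepsilon \right) \to 1.$$

Proof. $\mathrm{Err}_{\mathrm{party}}(\hat{C}_2)$ can be bounded above in the following way:

$$\begin{aligned}
\frac{\mathrm{Err}_{\mathrm{party}}(\hat{C}_2)}{C^*} &= \frac{\left( \sum_{i \in U} C_i \right) \left( \dfrac{1}{N} \sum_{i \in U} \dfrac{A_{oi}}{A_{gi}} \right) - \sum_{i \in U} C_i^*}{\sum_{i \in U} C_i^*} \\
&= \left( \frac{\sum_{i \in U} C_i^* p_i}{\sum_{i \in U} C_i^*} \right) \left[ \frac{\sum_{i \in U} (C_i - C_i^* p_i)}{\sum_{i \in U} C_i^* p_i} + 1 \right] \\
&\quad \times \left[ \frac{1}{N} \sum_{i \in U} \left( \frac{(A_{oi} p_i - A_{gi})}{A_{oi} p_i} \left( \frac{A_{gi} - A_{oi} p_i}{A_{oi} p_i} + 1 \right)^{-1} \frac{1}{p_i} + \frac{1}{p_i} \right) \right] - 1 \\
&\le \max_{i \in U}(p_i) \left[ \frac{\sum_{i \in U} (C_i - C_i^* p_i)}{\sum_{i \in U} C_i^* p_i} + 1 \right] \\
&\quad \times \left[ \frac{1}{N} \sum_{i \in U} \frac{(A_{oi} p_i - A_{gi})}{A_{oi} p_i} \left( \frac{A_{gi} - A_{oi} p_i}{A_{oi} p_i} + 1 \right)^{-1} \frac{1}{p_i} + \frac{1}{\min_{i \in U}(p_i)} \right] - 1.
\end{aligned}$$

Similarly, we can obtain a lower bound:

$$\begin{aligned}
\frac{\mathrm{Err}_{\mathrm{party}}(\hat{C}_2)}{C^*} &\ge \min_{i \in U}(p_i) \left[ \frac{\sum_{i \in U} (C_i - C_i^* p_i)}{\sum_{i \in U} C_i^* p_i} + 1 \right] \\
&\quad \times \left[ \frac{1}{N} \sum_{i \in U} \frac{(A_{oi} p_i - A_{gi})}{A_{oi} p_i} \left( \frac{A_{gi} - A_{oi} p_i}{A_{oi} p_i} + 1 \right)^{-1} \frac{1}{p_i} + \frac{1}{\max_{i \in U}(p_i)} \right] - 1.
\end{aligned}$$

From here, the rest of the proof is analogous to the proof in Appendix A.2.3.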


A.3 Proof of the Optimal Allocation

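Introducing the Lagrange multiplier $\lambda$ in the asymptotic variance formula, the function to minimize is

$$N^2 \left( \frac{1}{n_g} - \frac{1}{N} \right) S_\alpha^2 + N^2 \left( \frac{1}{n_o} - \frac{1}{n_g} \right) S_\beta^2 + \lambda \left( n_g C_g + n_o C_o - B \right).$$

Setting the derivative with respect to $n_g$ equal to zero gives the equation

$$-\frac{N^2 S_\alpha^2}{n_g^2} + \frac{N^2 S_\beta^2}{n_g^2} + \lambda C_g = 0.$$

Similarly, setting the derivative with respect to $n_o$ equal to zero gives

$$-\frac{N^2 S_\beta^2}{n_o^2} + \lambda C_o = 0$$

and setting the derivative with respect to $\lambda$ equal to zero gives

$$n_g C_g + n_o C_o - B = 0.$$

Solving the first equation, we get

$$n_g = N \sqrt{\frac{S_\alpha^2 - S_\beta^2}{\lambda C_g}}. \qquad \text{(A.2)}$$

Similarly, solving the second equation, we get

$$n_o = N \sqrt{\frac{S_\beta^2}{\lambda C_o}}. \qquad \text{(A.3)}$$

It remains to find the value of $\lambda$. Inserting (A.2) and (A.3) in the third equation gives

$$\sqrt{\lambda} = \frac{N}{B} \left[ \sqrt{C_g (S_\alpha^2 - S_\beta^2)} + \sqrt{C_o S_\beta^2} \right].$$

The final expressions for $n_g$ and $n_o$ are obtained by inserting $\sqrt{\lambda}$ in (A.2) and (A.3).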
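The allocation that results from this derivation can be sketched numerically. The function below is a minimal illustration that substitutes the value of the multiplier implied by the budget constraint into (A.2) and (A.3), so that N cancels; it treats the sample sizes as real numbers and does not round them. The function name and the example inputs are hypothetical.

```python
import math


def optimal_allocation(s2_alpha, s2_beta, cost_g, cost_o, budget):
    """Return (n_g, n_o) minimizing the variance subject to the budget.

    s2_alpha, s2_beta: variance components S_alpha^2 and S_beta^2
    cost_g, cost_o:    unit costs C_g and C_o
    budget:            total budget B
    """
    if s2_alpha <= s2_beta:
        raise ValueError("the solution requires S_alpha^2 > S_beta^2")
    root_g = math.sqrt((s2_alpha - s2_beta) / cost_g)
    root_o = math.sqrt(s2_beta / cost_o)
    # N / sqrt(lambda), obtained from the constraint n_g*C_g + n_o*C_o = B.
    scale = budget / (cost_g * root_g + cost_o * root_o)
    return scale * root_g, scale * root_o


if __name__ == "__main__":
    n_g, n_o = optimal_allocation(9.0, 4.0, 2.0, 1.0, 100.0)
    print(n_g, n_o, 2.0 * n_g + 1.0 * n_o)
```

In practice the two values would be rounded to integers that respect the budget and the bounds on the number of available days, which this sketch leaves out.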


A.4 Monte Carlo measures

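The Monte Carlo relative bias due to the sampling of days is computed as

$$\mathrm{RB}^{\mathrm{days}}_{MC}(\hat{C}) = \frac{E_{MC}(\hat{C}) - C}{C^*},$$

where

$$E_{MC}(\hat{C}) = \frac{1}{K} \sum_{k=1}^{K} \hat{C}_k$$

is the Monte Carlo expectation, with $\hat{C}_k$ representing the estimator $\hat{C}$ computed within the $k$th sample. The Monte Carlo relative root mean squared error is given by

$$\mathrm{RRMSE}_{MC}(\hat{C}) = \frac{\sqrt{\mathrm{MSE}_{MC}(\hat{C})}}{C^*},$$

where

$$\mathrm{MSE}_{MC}(\hat{C}) = \frac{1}{K} \sum_{k=1}^{K} \left( \hat{C}_k - C^* \right)^2.$$

The Monte Carlo coverage probability of confidence intervals is given by

$$\mathrm{CP}_{MC}(\hat{C}) = \frac{1}{K} \sum_{k=1}^{K} I_k,$$

where $I_k$ is an indicator variable of the coverage of the true total $C^*$ by the 95% confidence interval $\hat{C}_k \pm t_{n_g - 1,\, 0.975} \sqrt{\widehat{\mathrm{Var}}(\hat{C}_k)}$. The Monte Carlo bias ratio is given by

$$\mathrm{BR}_{MC}(\hat{C}) = \frac{E_{MC}(\hat{C}) - C^*}{\sqrt{\mathrm{Var}_{MC}(\hat{C})}},$$

where

$$\mathrm{Var}_{MC}(\hat{C}) = \frac{1}{K} \sum_{k=1}^{K} \left\{ \hat{C}_k - E_{MC}(\hat{C}) \right\}^2.$$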
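Three of these measures (relative root mean squared error, coverage probability and bias ratio) can be computed from simulation output as sketched below. The function name, the dictionary keys and the example numbers are hypothetical; the t quantile is passed in as an argument rather than computed, so that the sketch needs only the standard library.

```python
import math


def monte_carlo_summary(estimates, variance_estimates, true_total, t_quantile):
    """Summarize K simulated estimates of a total.

    estimates:          the K values of the estimator, one per sample
    variance_estimates: the K estimated variances of the estimator
    true_total:         the true total C*
    t_quantile:         the t quantile used for the 95% interval
    """
    k = len(estimates)
    if k == 0 or k != len(variance_estimates):
        raise ValueError("need K estimates and K variance estimates")
    mean = sum(estimates) / k
    mse = sum((c - true_total) ** 2 for c in estimates) / k
    var = sum((c - mean) ** 2 for c in estimates) / k
    covered = sum(
        abs(c - true_total) <= t_quantile * math.sqrt(v)
        for c, v in zip(estimates, variance_estimates)
    )
    return {
        "rrmse": math.sqrt(mse) / true_total,
        "coverage": covered / k,
        "bias_ratio": (mean - true_total) / math.sqrt(var),
    }


if __name__ == "__main__":
    summary = monte_carlo_summary(
        [90.0, 110.0, 100.0, 120.0], [100.0] * 4, 100.0, 2.0
    )
    print(summary)
```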


A.5 Figures

Figure A.1: Time series of the ratio $A_{gi}/A_{oi}$, calculated from the Kootenay Lake data for overflight survey days, separately for weekends (WE) and weekdays (WD).


Figure A.2: Time series of the quantity $y_i = C_i A_{oi}/A_{gi}$, calculated from the Kootenay Lake data for overflight survey days, separately for weekends (WE) and weekdays (WD).


Figure A.3: Time series of the quantity $B_i A_{oi}/A_{gi}$, where $B_i$ is the number of parties interviewed on day $i$. The quantity was calculated from the Kootenay Lake data for overflight survey days, separately for weekends (WE) and weekdays (WD).


Appendix B

Supplementary materials for Chapter 3

B.1 Monte Carlo measures used in the simulation study

Let $\hat{\theta}_h$ be an estimate of a parameter $\theta$ obtained by analyzing the $h$th simulated dataset (out of $H = 250$) in a given scenario.

The Monte Carlo bias is calculated as

$$\mathrm{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta,$$

where

$$E(\hat{\theta}) = \frac{1}{H} \sum_{h=1}^{H} \hat{\theta}_h.$$

The Monte Carlo root mean square error is calculated as

$$\mathrm{RMSE}(\hat{\theta}) = \sqrt{\frac{1}{H} \sum_{h=1}^{H} (\hat{\theta}_h - \theta)^2}.$$

The Monte Carlo bias ratio is calculated as

$$\mathrm{BR}(\hat{\theta}) = \frac{\mathrm{Bias}(\hat{\theta})}{\sqrt{V(\hat{\theta})}},$$

where

$$V(\hat{\theta}) = \frac{1}{H} \sum_{h=1}^{H} \left\{ \hat{\theta}_h - E(\hat{\theta}) \right\}^2.$$

The Monte Carlo expected credible interval length is calculated as

$$\mathrm{E.LCI}(\theta) = \frac{1}{H} \sum_{h=1}^{H} (U_{\theta,h} - L_{\theta,h}),$$

where $U_{\theta,h}$ and $L_{\theta,h}$ are respectively the upper and lower bound of the highest posterior density (HPD) credible interval for $\theta$ in the dataset $h$.

The Monte Carlo coverage probability of the credible interval is calculated as

$$\mathrm{CP}(\theta) = \frac{1}{H} \sum_{h=1}^{H} I(L_{\theta,h} \le \theta \le U_{\theta,h}),$$

where $I$ is an indicator function.

Let $\hat{\theta}_{h,L}$ be an estimator obtained by true joint likelihood modeling of dataset $h$ and $\hat{\theta}_{h,L_c}$ be an estimator obtained by composite likelihood modeling of dataset $h$. The Monte Carlo probability that the true likelihood method has the smallest absolute error is calculated as

$$\mathrm{PS.AE}(\theta \mid L, L_c) = \frac{1}{H} \sum_{h=1}^{H} I\left( |\hat{\theta}_{h,L} - \theta| \le |\hat{\theta}_{h,L_c} - \theta| \right).$$

Let $SD_{\theta,h,L}$ be the posterior standard deviation for parameter $\theta$ obtained by true joint likelihood modeling of dataset $h$ and $SD_{\theta,h,L_c}$ be the posterior standard deviation for parameter $\theta$ obtained by composite likelihood modeling of dataset $h$. The Monte Carlo probability that the true likelihood method has the smallest posterior standard deviation is calculated as

$$\mathrm{PS.SD}(\theta \mid L, L_c) = \frac{1}{H} \sum_{h=1}^{H} I\left( SD_{\theta,h,L} \le SD_{\theta,h,L_c} \right).$$

Let $L_{\theta,h,L}$ and $U_{\theta,h,L}$ be the lower and upper bounds of the credible interval obtained for $\theta$ by the true joint likelihood modeling of dataset $h$, and let $L_{\theta,h,L_c}$ and $U_{\theta,h,L_c}$ be the lower and upper bounds of the credible interval obtained for $\theta$ by the composite likelihood modeling of dataset $h$. The Monte Carlo probability that the true likelihood method has the smallest credible interval length is calculated as

$$\mathrm{PS.LCI}(\theta \mid L, L_c) = \frac{1}{H} \sum_{h=1}^{H} I\left( U_{\theta,h,L} - L_{\theta,h,L} \le U_{\theta,h,L_c} - L_{\theta,h,L_c} \right).$$
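The three pairwise comparison measures share the same structure: each is the proportion of simulated datasets in which a quantity from the true joint likelihood fit is no larger than its composite likelihood counterpart. A minimal sketch follows; the function names and the small example inputs are hypothetical.

```python
def proportion_at_most(first, second):
    """Share of paired values where the first is <= the second."""
    if not first or len(first) != len(second):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(a <= b for a, b in zip(first, second)) / len(first)


def ps_ae(theta, estimates_true, estimates_composite):
    """PS.AE: the true joint likelihood has the smaller absolute error."""
    return proportion_at_most(
        [abs(e - theta) for e in estimates_true],
        [abs(e - theta) for e in estimates_composite],
    )


def ps_sd(sd_true, sd_composite):
    """PS.SD: the true joint likelihood has the smaller posterior SD."""
    return proportion_at_most(sd_true, sd_composite)


def ps_lci(intervals_true, intervals_composite):
    """PS.LCI: the true joint likelihood has the shorter credible interval.

    Each interval is a (lower, upper) pair.
    """
    return proportion_at_most(
        [upper - lower for lower, upper in intervals_true],
        [upper - lower for lower, upper in intervals_composite],
    )


if __name__ == "__main__":
    print(ps_ae(0.5, [0.52, 0.45, 0.60, 0.50], [0.55, 0.49, 0.58, 0.50]))
```

Ties count in favour of the true joint likelihood method because the indicator uses a non-strict inequality, as in the formulas above.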


B.2 Plots of the results of the simulation study

[Figure: Survival probability. Scatter plots for (a) Scenario 1, (b) Scenario 2, (c) Scenario 3 and (d) Scenario 4; x-axis: $\hat{\phi}$ via the true joint likelihood approach; y-axis: $\hat{\phi}$ via the composite likelihood approach.]

Figure B.1: Plots comparing the values of $\hat{\phi}$ (posterior mean) obtained for the true joint likelihood approach and the composite likelihood approach in the simulation study. Each plot contains 250 data points. The points that lie in the gray hourglass region are the simulation runs for which the value of $\hat{\phi}$ obtained from the true joint likelihood approach is closer (in absolute value) to the true parameter value $\phi$. Note that the scales differ across plots.


[Figure: Recapture probability. Scatter plots for (a) Scenario 1, (b) Scenario 2, (c) Scenario 3 and (d) Scenario 4; x-axis: $\hat{p}$ via the true joint likelihood approach; y-axis: $\hat{p}$ via the composite likelihood approach.]

Figure B.2: Plots comparing the values of $\hat{p}$ (posterior mean) obtained for the true joint likelihood approach and the composite likelihood approach in the simulation study. Each plot contains 250 data points. The points that lie in the gray hourglass region are the simulation runs for which the value of $\hat{p}$ obtained from the true joint likelihood approach is closer (in absolute value) to the true parameter value $p$. Note that the scales differ across plots.


[Figure: Fecundity rate. Scatter plots for (a) Scenario 1, (b) Scenario 2, (c) Scenario 3 and (d) Scenario 4; x-axis: $\hat{f}$ via the true joint likelihood approach; y-axis: $\hat{f}$ via the composite likelihood approach.]

Figure B.3: Plots comparing the values of $\hat{f}$ (posterior mean) obtained for the true joint likelihood approach and the composite likelihood approach in the simulation study. Each plot contains 250 data points. The points that lie in the gray hourglass region are the simulation runs for which the value of $\hat{f}$ obtained from the true joint likelihood approach is closer (in absolute value) to the true parameter value $f$. Note that the scales differ across plots.


[Figure: Initial population size. Panels: (a) Scenario 1, (b) Scenario 2, (c) Scenario 3, (d) Scenario 4. Horizontal axis: N1 via the true joint likelihood approach. Vertical axis: N1 via the composite likelihood approach.]

Figure B.4: Plots comparing the values of N1 (posterior mean) obtained for the true joint likelihood approach and the composite likelihood approach in the simulation study. Each plot contains 250 data points. The points that lie in the gray hourglass region are the simulation runs for which the value of N1 obtained from the true joint likelihood approach is closer (in absolute value) to the true parameter value N1. Note that the scales differ across plots.


[Figure: Count variability. Panels: (a) Scenario 1, (b) Scenario 2, (c) Scenario 3, (d) Scenario 4. Horizontal axis: σ via the true joint likelihood approach. Vertical axis: σ via the composite likelihood approach.]

Figure B.5: Plots comparing the values of σ (posterior mean) obtained for the true joint likelihood approach and the composite likelihood approach in the simulation study. Each plot contains 250 data points. The points that lie in the gray hourglass region are the simulation runs for which the value of σ obtained from the true joint likelihood approach is closer (in absolute value) to the true parameter value σ. Note that the scales differ across plots.


[Figure: Survival probability. Panels: (a) Scenario 1, (b) Scenario 2, (c) Scenario 3, (d) Scenario 4. Each panel shows boxplots of φ for the true joint likelihood approach and the composite likelihood approach.]

Figure B.6: Boxplots of φ (posterior mean) per scenario and estimation method. The horizontal line in each plot indicates the true value of the parameter φ in that scenario.


[Figure: Recapture probability. Panels: (a) Scenario 1, (b) Scenario 2, (c) Scenario 3, (d) Scenario 4. Each panel shows boxplots of p for the true joint likelihood approach and the composite likelihood approach.]

Figure B.7: Boxplots of p (posterior mean) per scenario and estimation method. The horizontal line in each plot indicates the true value of the parameter p in that scenario.


[Figure: Fecundity rate. Panels: (a) Scenario 1, (b) Scenario 2, (c) Scenario 3, (d) Scenario 4. Each panel shows boxplots of f for the true joint likelihood approach and the composite likelihood approach.]

Figure B.8: Boxplots of f (posterior mean) per scenario and estimation method. The horizontal line in each plot indicates the true value of the parameter f in that scenario.


[Figure: Initial population size. Panels: (a) Scenario 1, (b) Scenario 2, (c) Scenario 3, (d) Scenario 4. Each panel shows boxplots of N1 for the true joint likelihood approach and the composite likelihood approach.]

Figure B.9: Boxplots of N1 (posterior mean) per scenario and estimation method. The horizontal line in each plot indicates the true value of the parameter N1 in that scenario.


[Figure: Count variability. Panels: (a) Scenario 1, (b) Scenario 2, (c) Scenario 3, (d) Scenario 4. Each panel shows boxplots of σ for the true joint likelihood approach and the composite likelihood approach.]

Figure B.10: Boxplots of σ (posterior mean) per scenario and estimation method. The horizontal line in each plot indicates the true value of the parameter σ in that scenario.


B.3 Bats data analysis

Parameters                               Mean           Credible interval
                                         Lc     L       Lc            L
First year survival (φ0)                 0.45   0.41    (0.32-0.58)   (0.30-0.52)
Subsequent survival (φ≥1)                0.92   0.94    (0.88-0.97)   (0.91-0.97)
Fecundity (f)                            0.71   0.64    (0.49-0.95)   (0.48-0.82)
Presence of first year females (τ0,f)    0.83   0.84    (0.63-0.99)   (0.65-0.99)
Presence of females ≥ 1 y.o. (τ≥1,f)     0.91   0.89    (0.82-0.99)   (0.79-0.98)
Presence of first year males (τ0,m)      0.70   0.71    (0.43-0.96)   (0.45-0.96)
Presence of males ≥ 1 y.o. (τ≥1,m)       0.28   0.27    (0.12-0.47)   (0.11-0.45)
Standard deviation in counts (σ)         4.47   4.17    (2.28-7.03)   (2.29-6.38)

Table B.1: Comparison of posterior means and credible intervals between the true likelihood (L) and the composite likelihood (Lc) analysis.


Appendix C

Supplementary materials for Chapter 4

C.1 Analysis of the 2012 capture-recapture data using the software MARK

This section contains results from a frequentist analysis of the 2012 capture-recapture data on Chinook at Burman River. The analysis was conducted using the software MARK. As in Section 4.4, we use the POPAN formulation of the Jolly-Seber model. In addition, we set the capture probability on the first capture occasion, for each sex, equal to that on the second occasion. We impose the same equality on the last two occasions.

                     Estimate   95% CI
Male escapement      2767       (2141-3575)
Female escapement    1898       (1266-2845)
Total escapement     4664       (3610-5718)

Table C.1: Escapement estimates obtained from analysis of the capture-recapture data in MARK. CI denotes confidence interval.
