Accounting for variable group sizes when determining availability of individuals during camera trap distance sampling (CTDS) more accurately predicts detection-level parameters and thus more accurately estimates abundance.
Load relevant packages into environment and include some custom functions.
knitr::opts_chunk$set(echo = FALSE)
library(tidyverse)
library(cmdstanr)
check_cmdstan_toolchain(fix = TRUE, quiet = TRUE)
register_knitr_engine(override = FALSE)
library(bayesplot)
library(data.table)
library(kableExtra)
# Distance from the camera to the closest of n individuals placed uniformly
# at random within a circle of radius max_distance
get_closest <- function(n, max_distance) {
rho <- sqrt(runif(n)) * max_distance # sqrt() gives uniform density per unit area
theta <- runif(n, 0, 2*base::pi)
x <- rho * cos(theta)
y <- rho * sin(theta)
d <- sqrt(x^2 + y^2)
return(min(d))
}
# Coordinates of the closest of n individuals placed uniformly at random
# within a circle of radius max_distance
get_closest_x_y <- function(n, max_distance) {
rho <- sqrt(runif(n)) * max_distance
theta <- runif(n, 0, 2*base::pi)
x <- rho * cos(theta)
y <- rho * sin(theta)
d <- sqrt(x^2 + y^2)
mind <- which.min(d) # index of the closest individual
return(data.frame(x = x[[mind]], y = y[[mind]]))
}
midpoints <- seq(from = 0.5, to = 12.5)
delta <- 1
max_distance <- 13
breaks <- 0:13
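For example (an illustrative call, not part of the original setup), the distance to the nearest of five individuals placed uniformly at random within 13 m of a camera can be simulated with:
get_closest(n = 5, max_distance = 13) # returns a single nearest-individual distance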
Camera trap distance sampling (CTDS) can be used to estimate the abundance of a species. CTDS applies conventional distance sampling (DS) methods at a point, with the survey effort scaled by the period over which a camera is deployed. Key to CTDS are the DS assumptions that detection rates decline with distance from the observer (the camera) and that the availability of an individual within the surveyed area (camera to truncation distance) is uniform per square metre.
However, when CTDS is used to measure the abundance of a species whose group size is not fixed at 1, this assumption is likely violated in practice. For larger groups, CTDS should account for the availability of the closest individual rather than the availability of all individuals: as group size increases, it becomes more likely that the closest individual in the group is near the camera. If we assume that the triggering of the camera depends on the closest individual, then we must adjust our estimated availability to account for variable group sizes. If we do not adjust for group size and use only the distance to the closest individual in our DS models, we will likely underestimate the detection rate, as the recorded distances will be smaller than the true distances of individuals in the population. Alternatively, if we record distances to multiple individuals in the same photo and average them, or model them independently, we will likely overestimate detection probability, because individuals at greater distances are only recorded when a closer individual has triggered the camera trap.
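To put numbers on this (an illustrative sketch only; pr_nearest_within() is a helper defined just for this example), the probability that the nearest of n uniformly placed individuals lies within distance r of the camera, given a truncation distance w, is 1 - (1 - r^2/w^2)^n:
# Probability that the nearest of n individuals falls within r of the camera
pr_nearest_within <- function(r, n, w) 1 - (1 - (r / w)^2)^n
pr_nearest_within(r = 3, n = 1, w = 13) # ~0.05 for a single individual
pr_nearest_within(r = 3, n = 5, w = 13) # ~0.24 for a group of five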
Here we simulate camera trap distance sampling data with variable group sizes to compare the effect of accounting for an adjusted availability, which estimates the availability of the closest individual for a group of size >= 1.
For illustration, we can visualise the probability of the closest individual in groups of various sizes falling within a given distance bin. For a group size of 1, the probability is uniform over a given area. As group size increases, however, the probability that the closest individual is near the camera (the centre) increases.
#### Test with a simulation ####
# emp_mult <- matrix(data = NA, nrow = length(midpoints), ncol = 5)
#
# for(i in 1:5) {
# cat("ind ",i,"\n")
# d<- replicate(1e6, get_closest(n=i, max_distance))
# bins<- cut(d, breaks = breaks)
# emp_mult[,i]<- table(bins)/sum(table(bins))
# }
#
# list(Simulation = round(emp_mult, 3),
# Theoretical = round(pi_mult, 3))
xy <- list()
for(i in 1:5) {
cat("ind ",i,"\n")
xy[[i]] <- replicate(1e4, get_closest_x_y(n=i, max_distance)) %>%
str2str::a2lm() %>%
lapply(., function(x) as.data.frame(matrix(unlist(x, recursive = TRUE), ncol = 2, byrow = T))) %>%
bind_rows() %>%
`colnames<-`(c("x", "y")) %>%
mutate(`Group size` = i)
}
ind 1
ind 2
ind 3
ind 4
ind 5
xy_joined <- bind_rows(xy)
# Area + contour
ggplot(xy_joined, aes(x=x, y=y) ) +
stat_density_2d(aes(fill = after_stat(level)), geom = "polygon", colour = "white") +
facet_wrap(~ `Group size`, nrow = 3) +
scale_fill_viridis_c(option = "A") +
labs(x = "distance (x)", y = "distance (y)", fill = "Probability \nClosest individual \n at Location") +
theme_minimal()
Figure 1: Probability of availability under various group size scenarios
For convenience, we also create functions to calculate the availability of the closest individual. These are useful for more general applications to other datasets.
#' Availability calculation for a point transect
#'
#' @param delta bin width
#' @param midpoint bin midpoint
#' @param max_distance maximum distance of survey (truncation distance)
#'
#' @return numeric
availability_fn <- function(delta, midpoint, max_distance) {
(2 * delta * midpoint)/max_distance^2
}
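# Quick check (illustrative, not part of the original post): for a single individual,
# the binned availabilities sum to 1 across the whole point-transect area
sum(availability_fn(delta, midpoints, max_distance)) # 1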
#' Probability closest animal is in a given bin, given a provided bin and number of individuals in a photo
#'
#' @param bin_start start point of bin
#' @param bin_end end point of bin
#' @param max_distance maximum distance of survey (truncation distance)
#' @param n number of individuals in photo
#'
#' @return numeric
pr_closest <- function(bin_start, bin_end, max_distance, n) {
if(bin_start == 0) {
prob_closer <- 0
} else {
closer_midpoint <- (bin_start)/2
prob_closer <- 1 - (1 - availability_fn(bin_start, closer_midpoint, max_distance = max_distance))^n
}
if(bin_end == max_distance) {
prob_further <- 0
} else {
further_midpoint <- (bin_end+max_distance)/2
further_delta <- max_distance-bin_end
prob_further <- availability_fn(further_delta, further_midpoint, max_distance = max_distance)
}
# Combined probability
# probability that the closest indiv is in this bin is 1 minus the probability that the closest was in a closer bin, minus the probability that all individuals are in further bins
pi_m <- 1 - (prob_closer + prob_further^n)
return(pi_m)
}
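As a quick check of pr_closest() (not part of the original analysis), the per-bin probabilities should sum to 1 for any group size, and with n = 1 they should reduce to the plain point-transect availability:
# Probabilities of the closest of three individuals falling in each bin sum to 1
probs_n3 <- sapply(head(breaks, -1), function(b) pr_closest(b, b + delta, max_distance, n = 3))
sum(probs_n3) # ~1
# With a single individual, pr_closest() matches availability_fn()
probs_n1 <- sapply(head(breaks, -1), function(b) pr_closest(b, b + delta, max_distance, n = 1))
all.equal(probs_n1, availability_fn(delta, midpoints, max_distance)) # TRUE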
Below we simulate the data for the model. We simulate point-transect data for camera traps across 10 sites. Each site has 250 entries by individuals or groups of a species, with group sizes drawn from a gamma distribution (rounded up to the nearest integer) whose alpha (shape) parameter varies across the simulated datasets.
We simulate detection probability using a half-normal detection function, g(r) = exp(-r^2 / (2 * sigma^2)), with a sigma value of 5.
A matrix of the availability for detections is generated using the ‘probability of closest individual’ described earlier.
# Simulate number of sites and number of entries for each site
sites <- 10
entries_per_site <- rep(250, sites)
total_entries <- sum(entries_per_site)
density_average <- total_entries/(base::pi*max_distance^2) # average density per unit area
group_lambdas <- c(1,2,3,4,5)
counts <- lapply(group_lambdas, function(x) {ceiling(rgamma(n = total_entries, shape = x, scale = 0.5))})
unique_groups <- list()
length_unique_groups <- list()
group_sizes <- sapply(counts, max)
length_group_sizes <- length(group_sizes)
camera_data_base <- data.frame(Site = rep(1:sites, times = entries_per_site))
detected_prob <- function(distance, key = "hn", scale = 5) {
if(key == "hn") {
sigma <- scale
pr <- exp(-(distance)^2/(2*sigma^2))
}
return(pr)
}
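# Illustrative check (not in the original workflow): with sigma = 5 the half-normal
# detection probability is ~0.61 at 5 m and drops to ~0.04 at 12.5 m
round(detected_prob(c(5, 12.5)), 2)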
# Create data for 3 scenarios with different group sizes
camera_data <- vector("list", length_group_sizes)
y <- vector("list", length_group_sizes)
y_a <- vector("list", length_group_sizes)
# y_dets <- vector("list", 5)
# pa <- vector("list", 5)
n_obs <- vector("list", length_group_sizes)
log_gs_cont <- vector("list", length_group_sizes)
for (i in 1:length(group_sizes)) {
camera_data[[i]] <- camera_data_base
camera_data[[i]]$count <- counts[[i]]
camera_data[[i]]$distance <- sapply(camera_data[[i]]$count, function(x) get_closest(n=x, max_distance))
camera_data[[i]]$detected_prob <- detected_prob(camera_data[[i]]$distance, scale = 5)
camera_data[[i]]$detected <- rbinom(n = total_entries, size = 1, prob = camera_data[[i]]$detected_prob)
camera_data[[i]] <- camera_data[[i]] %>%
mutate(detected_group_size = detected*count,
midpoint = case_when(detected == 1 ~ as.numeric(cut(distance, breaks = 0:max_distance, include.lowest = TRUE, right = FALSE, labels = midpoints)),
detected == 0 ~ NA_real_))
# recalculate the unique group sizes based on detections
unique_groups[[i]] <- camera_data[[i]] %>%
filter(detected == 1) %>%
pull(detected_group_size) %>%
unique() %>%
sort()
length_unique_groups[[i]] <- length(unique_groups[[i]])
y[[i]] <- list()
for(j in 1:length(unique_groups[[i]])) {
y[[i]][[j]] <- camera_data[[i]] %>%
mutate(midpoint = factor(midpoint, levels = 1:length(midpoints)),
Site = factor(Site, levels = 1:sites)) %>%
group_by(Site, midpoint, .drop = FALSE) %>%
filter(detected_group_size == unique_groups[[i]][j]) %>%
summarise(n = length(detected_group_size)) %>%
as.data.table() %>%
dcast(Site ~ midpoint, value.var = "n") %>%
dplyr::select(-Site, - `NA`)
y[[i]][[j]][is.na(y[[i]][[j]])] <- 0
}
y_a[[i]] <- str2str::ld2a(y[[i]])
n_obs[[i]] <- camera_data[[i]] %>%
filter(detected == 1) %>%
mutate(Site = factor(Site, levels = 1:sites),
dgs = factor(detected_group_size, levels = unique_groups[[i]])) %>%
group_by(Site, dgs, .drop = TRUE) %>%
summarise(n = sum(detected, na.rm = T)) %>%
as.data.table() %>%
dcast(Site ~ dgs, value.var = "n") %>%
dplyr::select(-Site) %>%
as.matrix()
n_obs[[i]][is.na(n_obs[[i]])] <- 0
log_gs_cont[[i]] <- log((n_obs[[i]] %>% colSums())/(sum(n_obs[[i]])/(n_obs[[i]] %>% ncol())))
}
pa <- matrix(data = NA, nrow = max(group_sizes), ncol = length(midpoints))
pa_static <- matrix(data = NA, nrow = max(group_sizes), ncol = length(midpoints))
for(j in 1:max(group_sizes)) {
pa[j, ] <- sapply(midpoints, function(x) {pr_closest(x-(0.5*delta),
x+(0.5*delta),
max_distance = max_distance,
n = j)})
pa_static[j, ] <- sapply(midpoints, function(x) {pr_closest(x-(0.5*delta),
x+(0.5*delta),
max_distance = max_distance,
n = 1)})
}
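As a quick sanity check (illustrative only), each row of the adjusted availability matrix should sum to 1, since the closest individual must fall in some bin, while every row of the unadjusted matrix simply repeats the single-individual availability:
rowSums(pa) # all ~1
all(apply(pa_static, 1, function(r) isTRUE(all.equal(r, pa_static[1, ])))) # TRUE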
Using a model written in the Stan programming language, we test the effect of accounting for variable group size with modified availability functions. The model can be described as:
data {
real delta; // bin width
int<lower=1> n_site; // site
int<lower=1> n_distance_bins; // distance bins
int<lower=1> n_gs; // number of group sizes
vector[n_gs] gs; // group sizes
vector[n_distance_bins] midpts; // midpt of each interval
real<lower=1> max_distance;
int<lower=1> max_int_dist; // max distance as an integer
array[n_site, n_distance_bins, n_gs] int y; // observations matrix
array[n_gs, n_distance_bins] real pi ; // averaged availability for multiple individuals
array[n_site, n_gs] int n_obs; //number of observations
}
parameters {
// detection parameters
real log_sigma;
// abundance parameters
real beta_intercept;
vector[n_site] eps_site;
simplex[n_gs] eps_ngs; // random variation in group size
}
transformed parameters {
// distance parameters
real sigma = exp(log_sigma);
array[n_distance_bins] real<lower=0,upper=1> p_raw; // detection probability
array[n_distance_bins, n_gs] real<lower=0,upper=1> p_raw_scale; // detection probability for point independence model and accounting for availability
array[n_distance_bins, n_gs] real<upper=0> log_p_raw; // log detection probability accounting for availability
array[n_gs] real log_p; // mean site detection probability for a group size
array[n_gs] real<lower=0,upper=1> p; // log probability with summed exponential for the multinomial logit model
// real log_theta = log(theta)
array[n_site, n_gs] real lambda;
for (i in 1:n_distance_bins) {
// assuming a half-normal detection fn
p_raw[i] = exp(-(midpts[i])^2/(2*sigma^2)); // half-normal function (pg 419 of AHM)
// p_raw[i] = 1 - exp(-(midpts[i]/theta)^(-sigma)); // hazard function (pg 507 of AHM)
for(j in 1:n_gs) {
// pr(animal occurs and is detected in bin i)
p_raw_scale[i,j] = p_raw[i]*pi[j,i];
log_p_raw[i,j] = log(p_raw_scale[i,j]);
}
}
for(j in 1:n_gs) {
log_p[j] = log_sum_exp(log_p_raw[,j]);
p[j] = exp(log_p[j]);
}
// model site abundance
for(n in 1:n_site) {
for(j in 1:n_gs) {
lambda[n, j] = exp(beta_intercept + log_p[j] + log(eps_ngs[j]) + eps_site[n]);
}
}
}
model {
log_sigma ~ normal(0, 5); // prior for sigma
eps_site ~ student_t(4, 0, 1); // prior for site effect
eps_ngs ~ uniform(0, 1); // prior for group size effect
beta_intercept ~ student_t(4, 0, 1); // prior for abundance
for(n in 1:n_site) {
for(j in 1:n_gs) {
n_obs[n,j] ~ poisson(lambda[n,j]);
}
}
for(n in 1:n_site) {
for(j in 1:n_gs) {
y[n,,j] ~ multinomial_logit(to_vector(log_p_raw[,j]));
}
}
}
generated quantities {
array[max_int_dist+1] real DetCurve;
array[n_site, n_gs] real log_lik;
array[n_site, n_gs] real n_obs_true;
array[n_site] real N_site;
real N;
real D;
for(i in 1:n_site) {
for(j in 1:n_gs) {
//for loo
log_lik[i,j] = multinomial_logit_lpmf(y[i,,j] | to_vector(log_p_raw[,j]));
n_obs_true[i, j] = gs[j] * (poisson_rng(exp(beta_intercept + log(1-p[j]) + log(eps_ngs[j]) + eps_site[i])) + n_obs[i,j]);
}
N_site[i] = sum(n_obs_true[i,]);
}
N = sum(N_site);
D = N/(pi()*max_distance^2);
// loop over distance bins
for(j in 0:max_int_dist) { // get DS predictions for distance 0 to max bin distance
DetCurve[j+1] = exp(-(j+0.5)^2/(2*sigma^2)); // half normal
// DetCurve[j+1] = 1 - exp(-(j+0.5/theta)^(-sigma)); // hazard rate
}
}
We generate two sets of model data: one supplied with the adjusted availability matrix for the various group sizes, and the other with an unadjusted availability matrix (i.e., availability modelled as if group size were always 1), which assumes availability is uniform per unit area (though still adjusted for the point-transect geometry).
modelpartial <- list()
modelcrook <- list()
for(i in 1:length(group_sizes)) {
modelpartial[[i]] <- list(delta = delta,
n_site = sites,
n_distance_bins = length(midpoints),
n_gs = length_unique_groups[[i]],
gs = unique_groups[[i]],
midpts = midpoints,
max_distance = max_distance,
max_int_dist = as.integer(round(max_distance)),
y = y_a[[i]],
pi = pa[unique_groups[[i]],,drop = F],
n_obs = n_obs[[i]])
}
for(i in 1:length(group_sizes)) {
modelcrook[[i]] <- list(delta = delta,
n_site = sites,
n_distance_bins = length(midpoints),
n_gs = length_unique_groups[[i]],
gs = unique_groups[[i]],
midpts = midpoints,
max_distance = max_distance,
max_int_dist = as.integer(round(max_distance)),
y = y_a[[i]],
pi = pa_static[unique_groups[[i]],,drop = F],
n_obs = n_obs[[i]])
}
Using cmdstanr, we run the models for 1,500 sampling iterations with a warmup of 500 iterations across 3 parallel chains.
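The compiled model object (model, used in the sampling calls below) is created from the Stan program above. A minimal sketch, assuming the Stan code has been saved to a file; the file name ctds_groups.stan is hypothetical:
model <- cmdstan_model("ctds_groups.stan") # compile the Stan program with cmdstanr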
ni <- 1500
nw <- 500
nc <- 3
inits = lapply(1:nc, function(i) list(log_sigma=runif(1),
beta_intercept=runif(1),
eps_ngs = runif(1)))
fit_partial <- list()
for(i in 1:length(group_sizes)) {
fit_partial[[i]] <- model$sample(data = modelpartial[[i]],
chains = nc,
parallel_chains = nc,
show_messages = FALSE,
save_warmup = FALSE,
iter_sampling = ni,
iter_warmup = nw, refresh = 0)
}
fit_crook <- list()
for(i in 1:length(group_sizes)) {
fit_crook[[i]] <- model$sample(data = modelcrook[[i]],
chains = nc,
parallel_chains = nc,
show_messages = FALSE,
save_warmup = FALSE,
iter_sampling = ni,
iter_warmup = nw, refresh = 0)
}
data_combined <- c(modelpartial, modelcrook)
fits_combined <- c(fit_partial, fit_crook)
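Before summarising, sampler diagnostics (divergences, treedepth saturation, E-BFMI) can be checked for every fit; an optional step, not part of the original workflow:
lapply(fits_combined, function(f) f$diagnostic_summary())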
From the models we obtain estimates of sigma (the half-normal detection function shape parameter) and the total abundance of individuals that entered the sampling area of the camera trap. We compare these values against the true simulated values.
sigma_values <- list()
model_names <- c("Adjusted availability, group size = gamma(1,0.5)",
"Adjusted availability, group size = gamma(2,0.5)",
"Adjusted availability, group size = gamma(3,0.5)",
"Adjusted availability, group size = gamma(4,0.5)",
"Adjusted availability, group size = gamma(5,0.5)",
"Unadjusted availability, group size = gamma(1,0.5)",
"Unadjusted availability, group size = gamma(2,0.5)",
"Unadjusted availability, group size = gamma(3,0.5)",
"Unadjusted availability, group size = gamma(4,0.5)",
"Unadjusted availability, group size = gamma(5,0.5)")
cam_data_rep <- rep(camera_data, 2)
total_counts <- vector()
for(i in 1:length(fits_combined)) {
max_modelled_distance <- data_combined[[i]][["max_distance"]]
camera_data_corrected <- cam_data_rep[[i]] %>%
filter(distance < max_modelled_distance)
average_count <- mean(camera_data_corrected$count)
average_det <- sum(camera_data_corrected$detected == 1) / nrow(camera_data_corrected)
total_counts[i] <- sum(camera_data_corrected$count)
sigma_values[[i]] <- fits_combined[[i]]$summary(c("sigma" , "N")) %>%
mutate(Model = model_names[i],
`Average count` = average_count,
`True value` = c(5, total_counts[i]))
}
summary_bound <- bind_rows(sigma_values)
summary_bound %>%
select(Model, `Average count`, variable, `True value`, mean, sd, q5, q95) %>%
kbl(format = "html",
digits = 2,
caption = "Comparison of estimated half-normal shape (sigma) and the total population size (N) to the simulated true value") %>%
kable_styling(bootstrap_options = "striped") %>%
collapse_rows(1:3)
Table 1: Comparison of estimated half-normal shape (sigma) and the total population size (N) to the simulated true value

| Model | Average count | variable | True value | mean | sd | q5 | q95 |
|---|---|---|---|---|---|---|---|
| Adjusted availability, group size = gamma(1,0.5) | 1.16 | sigma | 5 | 4.98 | 0.12 | 4.79 | 5.19 |
| | | N | 2912 | 3051.89 | 150.01 | 2814.00 | 3306.00 |
| Adjusted availability, group size = gamma(2,0.5) | 1.53 | sigma | 5 | 5.00 | 0.12 | 4.80 | 5.21 |
| | | N | 3827 | 3826.52 | 171.26 | 3553.95 | 4112.00 |
| Adjusted availability, group size = gamma(3,0.5) | 1.99 | sigma | 5 | 4.91 | 0.13 | 4.71 | 5.14 |
| | | N | 4970 | 5187.71 | 212.06 | 4842.00 | 5527.05 |
| Adjusted availability, group size = gamma(4,0.5) | 2.49 | sigma | 5 | 5.05 | 0.14 | 4.83 | 5.29 |
| | | N | 6229 | 6290.95 | 230.62 | 5914.00 | 6685.00 |
| Adjusted availability, group size = gamma(5,0.5) | 3.01 | sigma | 5 | 4.92 | 0.14 | 4.70 | 5.16 |
| | | N | 7535 | 7670.72 | 273.11 | 7240.00 | 8127.00 |
| Unadjusted availability, group size = gamma(1,0.5) | 1.16 | sigma | 5 | 4.77 | 0.10 | 4.60 | 4.95 |
| | | N | 2912 | 3779.90 | 199.48 | 3461.95 | 4119.00 |
| Unadjusted availability, group size = gamma(2,0.5) | 1.53 | sigma | 5 | 4.43 | 0.08 | 4.30 | 4.57 |
| | | N | 3827 | 6531.21 | 311.24 | 6031.95 | 7054.00 |
| Unadjusted availability, group size = gamma(3,0.5) | 1.99 | sigma | 5 | 4.07 | 0.07 | 3.95 | 4.18 |
| | | N | 4970 | 11634.74 | 520.04 | 10786.95 | 12497.10 |
| Unadjusted availability, group size = gamma(4,0.5) | 2.49 | sigma | 5 | 3.88 | 0.06 | 3.78 | 3.98 |
| | | N | 6229 | 17710.37 | 756.76 | 16492.90 | 18976.05 |
| Unadjusted availability, group size = gamma(5,0.5) | 3.01 | sigma | 5 | 3.64 | 0.05 | 3.56 | 3.74 |
| | | N | 7535 | 25384.43 | 1015.44 | 23783.95 | 27089.00 |
Below we plot the posterior estimates for the total abundance against the true simulated value (red line).
N_dist <- list()
for(i in 1:length(fits_combined)) {
N_dist[[i]] <- fits_combined[[i]]$draws(c("N")) %>%
bayesplot::mcmc_areas(prob = 0.9, prob_outer = 0.99) +
geom_vline(xintercept = total_counts[i], linetype = "dashed", colour = "red") +
scale_x_continuous(labels = function(x) format(x, scientific = TRUE)) +
ggtitle(model_names[i])
}
cowplot::plot_grid(plotlist = N_dist, ncol = 2, byrow = FALSE)
Camera trap distance sampling of species where group size is not always equal to 1 should account for the adjusted availability of the closest individual. Our simulation shows that abundance estimates are likely to be biased (overestimated) if group size is not taken into account.
We also provide a model that allows for the joint estimation of parameters across various sites and group sizes.
For attribution, please cite this work as
Cally (2023, Jan. 18). Justin's Code Blog: CTDS For Groups. Retrieved from https://justincally.github.io/blog/posts/2022-11-10-ctds-for-groups/
BibTeX citation
@misc{cally2023ctds,
  author = {Cally, Justin G},
  title = {Justin's Code Blog: CTDS For Groups},
  url = {https://justincally.github.io/blog/posts/2022-11-10-ctds-for-groups/},
  year = {2023}
}