Investigating the longevity of baits and the response of cats and other species when encountering the bait
Load useful packages and some data for the analysis
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
cache = FALSE,
message = FALSE,
warning = FALSE,
tidy.opts = list(width.cutoff = 60), tidy = TRUE
)
library(tidyverse)
library(survival)
library(tidycmprsk)
library(survminer)
library(ggsurvfit)
library(data.table)
library(brms)
library(survminer)
library(ggplot2)
library(kableExtra)
require(AICcmodavg) # has the aictab function
library(adjustedCurves)
library(patchwork)
read_excel_allsheets <- function(filename, tibble = FALSE) {
sheets <- readxl::excel_sheets(filename)
x <- lapply(sheets, function(X) readxl::read_excel(filename, sheet = X))
if(!tibble) x <- lapply(x, as.data.frame)
names(x) <- sheets
x
}
bait_state <- read_csv("data/Bait_Attractiveness.csv") %>%
mutate(`24-hour precipitation (sqrt)` = sqrt(prev24),
`24-hour temperature` = prev24temp,
Location = factor(Location)) %>%
# mutate(Season = as.integer(factor(Season)),
# Habitat = factor(Habitat),
# Location = factor(Location)) %>%
as.data.frame()
bait_state <- read_csv("data/Bait_Attractiveness_time_varying.csv") %>%
mutate(`24-hour precipitation (sqrt)` = sqrt(rain24),
`24-hour temperature` = temp24,
`Cumulative Rain (sqrt)` = sqrt(cumrain),
Location = factor(Location)) %>%
group_by(BaitID) %>%
arrange(time1) %>%
mutate(cumtemp = cumsum(temp24)) %>%
ungroup() %>%
as.data.frame()
delwp_cols <- c(Teal = "#00B2A9",
Navy = "#201547",
Environment = "#CEDC00",
`Climate Change` = "#FDDA24",
Energy = "#0072CE",
Water = "#71C5E8",
Planning = "#642667",
Infrastructure = "#AF272F",
FFR = "#E57200",
Corporate = "#201547")
We investigate the effect of several environmental covariates on the survivorship of feral cat baits. For this analysis we use a Cox-proportional hazard regression model that accounts for various longevity of the bait (Andersen and Gill 1982). Models were implemented using the survival
R package (Therneau 2022; Terry M. Therneau and Patricia M. Grambsch 2000). We compared model fits for several candidate models using second-order Akaike’s information criterion (AICc) with the AICcmodavg
R package (Mazerolle 2020).
We have various deployments across sites and different seasons. Given that the experimental design has not been set-up to look at the crossed effects we should really only use one categorical group: Location
Given, that sample sizes are most balanced for region (GLCP, HKNP and WPNP) we will exclusively use this group for the analysis. However, it should be kept in mind that the regional effects may be confounded by season and habitat types. Based on ‘location’, rainfall and temperature, we use AIC-based model selection to compare and select a model to use in the analysis.
# time varying model
baitcrr <- coxph(Surv(time2, time2 = time1, status) ~ `24-hour temperature`
+ `Cumulative Rain (sqrt)`
+ Location, data = bait_state, x = TRUE)
baitcrr1 <- coxph(Surv(time2, time2 = time1, status) ~ Location, data = bait_state)
baitcrr2 <- coxph(Surv(time2, time2 = time1, status) ~ `24-hour temperature`
+ `Cumulative Rain (sqrt)`, data = bait_state)
baitcrr3 <- coxph(Surv(time2, time2 = time1, status) ~ `Cumulative Rain (sqrt)`
+ Location, data = bait_state)
baitcrrNull <- coxph(Surv(time2, time2 = time1, status) ~ 1, data = bait_state)
model_list <- list(baitcrr, baitcrr1, baitcrr2, baitcrr3, baitcrrNull)
names(model_list) <- 1:5
# Compare models with AIC table
aic_table <- aictab(model_list)
# Add a column to the table with variable names
model_vars <- sapply(model_list, function(x) gsub(":", "*", paste(names(x$means), collapse = " + ")))
aic_table <- cbind(Modnames = names(model_vars), model_vars) %>%
merge(aic_table, by="Modnames") %>%
arrange(AICc)
aic_table %>%
kbl("html", caption = "Model comparison of models", digits = 2) %>%
kable_styling(bootstrap_options = "striped")
Modnames | model_vars | K | AICc | Delta_AICc | ModelLik | AICcWt | LL | Cum.Wt |
---|---|---|---|---|---|---|---|---|
1 |
24-hour temperature + Cumulative Rain (sqrt) + LocationHKNP + LocationWPNP
|
4 | 430.93 | 0.00 | 1.00 | 0.73 | -211.45 | 0.73 |
4 |
Cumulative Rain (sqrt) + LocationHKNP + LocationWPNP
|
3 | 432.90 | 1.97 | 0.37 | 0.27 | -213.44 | 1.00 |
2 | LocationHKNP + LocationWPNP | 2 | 471.54 | 40.61 | 0.00 | 0.00 | -233.76 | 1.00 |
5 | 0 | 484.63 | 53.70 | 0.00 | 0.00 | -242.32 | 1.00 | |
3 |
24-hour temperature + Cumulative Rain (sqrt)
|
2 | 485.49 | 54.56 | 0.00 | 0.00 | -240.74 | 1.00 |
lapply(model_list[1:4], cox.zph)
#> $`1`
#> chisq df p
#> `24-hour temperature` 17.7 1 2.6e-05
#> `Cumulative Rain (sqrt)` 32.3 1 1.3e-08
#> Location 45.8 2 1.1e-10
#> GLOBAL 55.1 4 3.2e-11
#>
#> $`2`
#> chisq df p
#> Location 32.3 2 9.7e-08
#> GLOBAL 32.3 2 9.7e-08
#>
#> $`3`
#> chisq df p
#> `24-hour temperature` 13.5 1 0.00024
#> `Cumulative Rain (sqrt)` 17.5 1 2.8e-05
#> GLOBAL 22.5 2 1.3e-05
#>
#> $`4`
#> chisq df p
#> `Cumulative Rain (sqrt)` 21.2 1 4.2e-06
#> Location 29.6 2 3.8e-07
#> GLOBAL 29.8 3 1.5e-06
Based on the above checks, no models meet the assumptions of a cox-proportional hazards model. We can try and circumvent this issue by stratifying the location:
#> Call:
#> coxph(formula = Surv(time2, time2 = time1, status) ~ `24-hour temperature` +
#> `Cumulative Rain (sqrt)` + strata(Location), data = bait_state)
#>
#> n= 1187, number of events= 68
#>
#> coef exp(coef) se(coef) z Pr(>|z|)
#> `24-hour temperature` -6.597e-02 9.362e-01 3.759e+06 0 1
#> `Cumulative Rain (sqrt)` NA NA 0.000e+00 NA NA
#>
#> exp(coef) exp(-coef) lower .95 upper .95
#> `24-hour temperature` 0.9362 1.068 0 Inf
#> `Cumulative Rain (sqrt)` NA NA NA NA
#>
#> Concordance= 0.5 (se = 0 )
#> Likelihood ratio test= 0 on 1 df, p=1
#> Wald test = 0 on 1 df, p=1
#> Score (logrank) test = 0 on 1 df, p=1
cox.zph(baitcrrS)
#> chisq df p
#> `24-hour temperature` 4.54e-16 1 1
#> GLOBAL 4.54e-16 1 1
Based on the stratification it looks like the model has issues in estimate coeffieiceients for more than one parameter. It does not seem to be the solution we are after.
To circumvent this issue we are going to try a different approach
Given the issues with the above approach in survival analysis and the violation of the cox-proportional hazards test it is not feasible to use a cox-proportional hazard model to evaluate the response of baits to various environmental factors. Another option is to use logistic regression with the time-varying components. This approach models the likelihood of a bait becoming unattractive as a function of environmental and other covariates for each given bait at each period the bait is still uncensored. For a vignette on the topic on how to apply this approach in brms see:
We can assess the correlations between predictors with the following checks:
panel.cor <- function(x, y, digits=2, prefix="", cex.cor, ...)
{
usr <- par("usr"); on.exit(par(usr))
par(usr = c(0, 1, 0, 1))
r <- abs(cor(x, y, use="pairwise.complete.obs"))
txt <- format(c(r, 0.123456789), digits=digits)[1]
txt <- paste0(prefix, txt)
if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
text(0.5, 0.5, txt, cex = cex.cor * r)
}
baitselectdata <- bait_state %>%
dplyr::select(cumtemp, cumrain, time1, Location, rain24, temp24)
pairs(baitselectdata, lower.panel=panel.smooth, upper.panel=panel.cor)
Based on this we should not use ‘cumtemp’ and ‘time1’ (days) in the same model. We should also avoid ‘rain24’ and ‘cumrain’. We run several select models and compare the models using leave-one-out cross validation (LOO-CV) to select the best performing model (Vehtari et al. 2020; Vehtari, Gelman, and Gabry 2017). Models were implemented using Bayesian methods with the brms
R package (Bürkner 2017, 2018, 2021). We used a weakly informative normal prior with a standard deviation of 4. 4 chains were run in parallel with 1,000 warmup iterations and 2,000 sampling iterations.
brms_bait_survival_model <- brm(data = bait_state,
family = bernoulli(link = "logit"),
formula = status ~ scale(cumtemp) + scale(sqrt(cumrain)) + Location + (1|BaitID),
prior(normal(0, 4), class = b),
chains = 4, cores = 4, iter = 2000, warmup = 1000,
seed = 12)
brms_bait_survival_model2 <- brm(data = bait_state,
family = bernoulli(link = "logit"),
formula = status ~ scale(cumtemp) + scale(sqrt(cumrain)) + (1|Location) + (1|BaitID),
prior(normal(0, 4), class = b),
chains = 4, cores = 4, iter = 2000, warmup = 1000,
seed = 12)
brms_bait_survival_model3 <- brm(data = bait_state,
family = bernoulli(link = "logit"),
formula = status ~ scale(time1) + scale(sqrt(cumrain)) + Location + (1|BaitID),
prior(normal(0, 4), class = b),
chains = 4, cores = 4, iter = 2000, warmup = 1000,
seed = 12)
brms_bait_survival_model4 <- brm(data = bait_state,
family = bernoulli(link = "logit"),
formula = status ~ scale(temp24) + scale(sqrt(cumrain)) + scale(time1) + Location + (1|BaitID),
prior(normal(0, 4), class = b),
chains = 4, cores = 4, iter = 2000, warmup = 1000,
seed = 12)
brms_bait_survival_model5 <- brm(data = bait_state,
family = bernoulli(link = "logit"),
formula = status ~ Location*scale(time1) + (1|BaitID),
prior(normal(0, 4), class = b),
chains = 4, cores = 4, iter = 2000, warmup = 1000,
seed = 12)
brms_bait_survival_model6 <- brm(data = bait_state,
family = bernoulli(link = "logit"),
formula = status ~ Location*scale(time1) + scale(temp24) + scale(sqrt(rain24)) + (1|BaitID),
prior(normal(0, 4), class = b),
chains = 4, cores = 4, iter = 2000, warmup = 1000,
seed = 12)
brms_bait_survival_modelHS <- brm(data = bait_state,
family = bernoulli(link = "logit"),
formula = status ~ Location*scale(time1) + scale(temp24) + scale(sqrt(rain24)) + (1|BaitID),
prior(horseshoe(df = 2, par_ratio = 1), class = b),
chains = 4, cores = 4, iter = 2000, warmup = 1000,
seed = 12)
brms_bait_survival_modelNull <- brm(data = bait_state,
family = bernoulli(link = "logit"),
formula = status ~ 1,
chains = 4, cores = 4, iter = 2000, warmup = 1000,
seed = 12)
model_list <- list(brms_bait_survival_model,
brms_bait_survival_model2,
brms_bait_survival_model3,
brms_bait_survival_model4,
brms_bait_survival_model5,
brms_bait_survival_model6,
brms_bait_survival_modelHS,
brms_bait_survival_modelNull)
loo_compare <- loo_compare(loo(brms_bait_survival_model),
loo(brms_bait_survival_model2),
loo(brms_bait_survival_model3),
loo(brms_bait_survival_model4),
loo(brms_bait_survival_model5),
loo(brms_bait_survival_model6),
loo(brms_bait_survival_modelHS),
loo(brms_bait_survival_modelNull))
saveRDS(loo_compare, "models/loo_compare.rds")
saveRDS(brms_bait_survival_model6, "models/brms_bait_survival_model6.rds", compress = "xz")
loo_compare <- readRDS("models/loo_compare.rds")
loo_compare
#> elpd_diff se_diff
#> brms_bait_survival_model6 0.0 0.0
#> brms_bait_survival_modelHS -2.0 1.2
#> brms_bait_survival_model5 -2.8 3.2
#> brms_bait_survival_model3 -7.0 6.6
#> brms_bait_survival_model4 -8.0 6.5
#> brms_bait_survival_model2 -10.9 7.8
#> brms_bait_survival_model -11.1 7.6
#> brms_bait_survival_modelNull -117.7 17.7
brms_bait_survival_model6 <- readRDS("models/brms_bait_survival_model6.rds")
bait_survival_top_model <- brms_bait_survival_model6
bayesplot::mcmc_trace(bait_survival_top_model, facet_args = list(ncol = 2), regex_pars = c("time1", "Location", "temp24", "rain24", "lp"))
Bait survival was primarily impacted by the location of the bait station and the time the bait was left out for; with some slight evidence suggesting rainfall may improve bait survival (reduce the rate of bait decay). The effect of location on bait survival suggests that there is unexplained variation driving bait decay; with GLCP appearing to have faster rates of decay.
summary(bait_survival_top_model)
#> Family: bernoulli
#> Links: mu = logit
#> Formula: status ~ Location * scale(time1) + scale(temp24) + scale(sqrt(rain24)) + (1 | BaitID)
#> Data: bait_state (Number of observations: 1187)
#> Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
#> total post-warmup draws = 4000
#>
#> Group-Level Effects:
#> ~BaitID (Number of levels: 117)
#> Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
#> sd(Intercept) 0.37 0.29 0.02 1.11 1.00 1052
#> Tail_ESS
#> sd(Intercept) 1509
#>
#> Population-Level Effects:
#> Estimate Est.Error l-95% CI u-95% CI Rhat
#> Intercept -6.88 1.23 -9.49 -4.72 1.00
#> LocationHKNP -4.21 2.17 -8.63 -0.11 1.00
#> LocationWPNP 2.77 1.29 0.36 5.35 1.00
#> scaletime1 6.36 1.31 3.95 9.04 1.00
#> scaletemp24 -0.04 0.26 -0.57 0.46 1.00
#> scalesqrtrain24 -0.45 0.21 -0.87 -0.05 1.00
#> LocationHKNP:scaletime1 -0.37 1.53 -3.48 2.56 1.00
#> LocationWPNP:scaletime1 -4.28 1.31 -6.91 -1.82 1.00
#> Bulk_ESS Tail_ESS
#> Intercept 1653 2259
#> LocationHKNP 2019 2231
#> LocationWPNP 1782 2379
#> scaletime1 1630 2039
#> scaletemp24 2851 3091
#> scalesqrtrain24 3734 3085
#> LocationHKNP:scaletime1 1627 2293
#> LocationWPNP:scaletime1 1634 2104
#>
#> Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).
bait_aggregated <-
bait_state %>%
mutate(event = case_when(status == 1 ~ "event",
TRUE ~ "no_event")) %>%
group_by(Location) %>%
count(event, temp24, rain24, time1) %>%
ungroup() %>%
complete(Location, event, temp24, rain24, time1) %>%
pivot_wider(names_from = event,
values_from = n) %>%
mutate(total = event + no_event) %>%
mutate(proportion = event / total) %>%
mutate(logit = log(proportion / (1 - proportion)))
scenarios <- data.frame(cond = c("Hot and Dry",
"Hot and Wet",
"Hot and Mid-precipitation",
"Cold and Dry",
"Cold and Wet",
"Cold and Mid-precipitation",
"Mid-temp and Dry",
"Mid-temp and Wet",
"Mid-temp and Mid-precipitation"),
q1 = c(0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.5, 0.5, 0.5),
q2 = c(0.1, 0.9, 0.5, 0.1, 0.9, 0.5, 0.1, 0.9, 0.5))
newdata_env <- list()
for(i in 1:nrow(scenarios)) {
newdata_env[[i]] <- expand.grid(Type = scenarios$cond[i],
rain24 = quantile(bait_state$rain24, scenarios$q1[i]),
temp24 = quantile(bait_state$temp24, scenarios$q2[i]),
Location = unique(bait_state$Location),
time1 = as.numeric(1:14),
# status = 1,
Title = paste0("Temp = ", round(quantile(Temp, scenarios$q1[i]),1), "°C, Precip = ", round(quantile(Precip, scenarios$q2[i]),1), "mm")) %>%
mutate(time2 = time1 - 1)
}
newdata_all <- bind_rows(newdata_env)
preds_bs <- cbind(newdata_all, predict(bait_survival_top_model, newdata = newdata_all, re_formula = NA)) %>%
rowwise() %>%
mutate(LCI = max(c(Estimate - Est.Error, 0)),
UCI = min(c(Estimate + Est.Error, 1))) %>%
ungroup() %>%
group_by(Location, time1, Type, Title) %>%
mutate_at(vars(Estimate, LCI, UCI), .funs = ~cumprod(1 - .))
ggplot(data = preds_bs) +
geom_ribbon(aes(x = time1, ymin = LCI, ymax = UCI, fill = Location), alpha = 0.5) +
geom_line(aes(x = time1, y = Estimate, colour = Location)) +
scale_fill_manual(values = unname(delwp_cols[1:3])) +
scale_colour_manual(values = unname(delwp_cols[1:3])) +
facet_wrap(~ Type + Title, nrow = 3) +
scale_x_continuous(breaks = seq(0,14, by = 2), expand = expansion(0, 0), name = "Days") +
ylab("Survival probability") +
theme_bw()
Figure 1: Survival of bits under different possible scenarios. Scenarios were derived from the 0.9, 0.5 and 0.1 quantiles of the data for temperature and precipitation values. Lines are the mean prediction and shaded area represents ± 1 standard error
Based on the above scenarios we can generate the following table that shows the day when the probability of bait survival drops below 0.5.Note that in some cases survival probability does not drop below 0.5 within the 14 days.
preds_bs %>%
group_by(Type, Location) %>%
arrange(desc(Estimate)) %>%
filter(Estimate <= 0.5) %>%
slice(1) %>%
select(Type, Conditions = Title, Location, Time = time1, `Survival Probability` = Estimate, `Survival Probability SE` = Est.Error) %>%
kbl("html", digits = 2, caption = "Time at which bait survival probability falls below 0.5") %>%
kable_styling(bootstrap_options = "striped") %>%
kableExtra::collapse_rows(columns = 1:2)
Type | Conditions | Location | Time | Survival Probability | Survival Probability SE |
---|---|---|---|---|---|
Hot and Dry | Temp = 20.2°C, Precip = 1.2mm | GLCP | 11 | 0.25 | 0.43 |
HKNP | 14 | 0.21 | 0.40 | ||
WPNP | 14 | 0.49 | 0.50 | ||
Hot and Wet | Temp = 20.2°C, Precip = 15.2mm | GLCP | 11 | 0.27 | 0.44 |
HKNP | 14 | 0.23 | 0.42 | ||
Hot and Mid-precipitation | Temp = 20.2°C, Precip = 3.6mm | GLCP | 11 | 0.26 | 0.44 |
HKNP | 14 | 0.21 | 0.41 | ||
Cold and Dry | Temp = 13.5°C, Precip = 1.2mm | GLCP | 10 | 0.46 | 0.50 |
HKNP | 13 | 0.39 | 0.49 | ||
WPNP | 13 | 0.46 | 0.50 | ||
Cold and Wet | Temp = 13.5°C, Precip = 15.2mm | GLCP | 10 | 0.48 | 0.50 |
HKNP | 13 | 0.41 | 0.49 | ||
WPNP | 13 | 0.49 | 0.50 | ||
Cold and Mid-precipitation | Temp = 13.5°C, Precip = 3.6mm | GLCP | 10 | 0.47 | 0.50 |
HKNP | 13 | 0.38 | 0.49 | ||
WPNP | 13 | 0.47 | 0.50 | ||
Mid-temp and Dry | Temp = 16.8°C, Precip = 1.2mm | GLCP | 11 | 0.17 | 0.37 |
HKNP | 13 | 0.43 | 0.50 | ||
WPNP | 14 | 0.39 | 0.49 | ||
Mid-temp and Wet | Temp = 16.8°C, Precip = 15.2mm | GLCP | 11 | 0.20 | 0.40 |
HKNP | 13 | 0.46 | 0.50 | ||
WPNP | 14 | 0.40 | 0.49 | ||
Mid-temp and Mid-precipitation | Temp = 16.8°C, Precip = 3.6mm | GLCP | 11 | 0.18 | 0.39 |
HKNP | 13 | 0.45 | 0.50 | ||
WPNP | 14 | 0.39 | 0.49 |
If we predict survival rates for mean levels of various environmental predictors (rainfall and temperature) we can obtain more general survival curves:
f <-
fitted(bait_survival_top_model,
newdata = newdata_all, re_formula = NA) %>%
data.frame() %>%
bind_cols(newdata_all)
# hazard (top panel)
p1 <-
f %>%
group_by(Location, time1) %>%
summarise_all(mean) %>%
ggplot(aes(x = time1, group = Location, color = Location, fill = Location)) +
geom_ribbon(aes(ymin = Q2.5, ymax = Q97.5),
alpha = 0.5, size = 0) +
geom_line(aes(y = Estimate)) +
scale_y_continuous("fitted hazard")
# survival (bottom panel)
p2 <-
f %>%
group_by(Location, time1) %>%
mutate_at(vars(Estimate, Q2.5, Q97.5), .funs = ~cumprod(1 - .)) %>%
summarise_all(mean) %>%
ggplot(aes(x = time1, group = Location, color = Location, fill = Location)) +
geom_ribbon(aes(ymin = Q2.5, ymax = Q97.5),
alpha = 0.5, size = 0) +
geom_line(aes(y = Estimate)) +
geom_hline(yintercept = .5, color = "red", linetype = "dashed") +
scale_y_continuous("fitted survival probability", limits = c(0, 1)) +
theme(strip.background.x = element_blank(),
strip.text.x = element_blank())
(
(p1 / p2) &
scale_fill_manual(values = unname(delwp_cols[1:3])) &
scale_colour_manual(values = unname(delwp_cols[1:3])) &
theme_bw() &
scale_x_continuous(breaks = seq(0,14, by = 2), expand = expansion(0, 0), name = "Days")
) +
plot_layout(guides = "collect")
Figure 2: Hazard rates and survival probability of fitted models
Due to the data structures provided a decent amount of data wrangling was needed in order to format the data into a suitable long format for binomial regression. The aim of this analysis is to determine the probability of a cat (or other animal) taking a bait, having encountered it.
melt_taken <- function(x, region, round) {
df <- data.table::as.data.table(x) %>%
melt(id.vars = "Station", variable.name = "Date") %>%
mutate(Date = as.Date(Date)) %>%
group_by(Station) %>%
arrange(Date) %>%
tidyr::fill(Date, .direction = "down") %>%
mutate(value = coalesce(value, "BP")) %>%
filter(value != "BA") %>%
mutate(StartDate = min(Date),
EndDate = max(Date)) %>%
slice_tail(n = 1) %>%
ungroup() %>%
mutate(Date = as.Date(Date)) %>%
mutate(Taken = case_when(value == "BP" ~ 0,
TRUE ~ 1)) %>%
rename(Species = value)
df$Region <- region
df$Round <- round
return(df)
}
observations <- list()
takes <- list()
# BD
BD_data_R1 <- readxl::read_excel("data/FateData/BD_record_table_15min_BaitFate.xlsx", sheet = 1) %>%
mutate(Region = "BD",
Round = 1)
BD_data_R2 <- readxl::read_excel("data/FateData/BD_record_table_15min_BaitFate.xlsx", sheet = 2) %>%
mutate(Region = "BD",
Round = 2)
BD_taken_matrix <- openxlsx::read.xlsx("data/FateData/BD_record_table_15min_BaitFate.xlsx", sheet = 3, detectDates = T)
BD_take_R1 <- BD_taken_matrix[2:nrow(BD_taken_matrix),c(1,which(BD_taken_matrix[1,] == "R1"))]
BD_take_R2 <- BD_taken_matrix[2:nrow(BD_taken_matrix),c(1,which(BD_taken_matrix[1,] == "R2"))]
observations[["BD"]] <- bind_rows(BD_data_R1, BD_data_R2)
melt_BD_take_R1 <- melt_taken(BD_take_R1, region = "BD", round = 1)
melt_BD_take_R2 <- melt_taken(BD_take_R2, region = "BD", round = 2)
takes[["BD"]] <- bind_rows(melt_BD_take_R1, melt_BD_take_R2)
# BM
BM_data_R1 <- readxl::read_excel("data/FateData/BM_record_table_15min_BaitFate.xlsx", sheet = 1)[,1:5] %>%
mutate(Region = "BM",
Round = 1)
BM_taken_matrix <- openxlsx::read.xlsx("data/FateData/BM_record_table_15min_BaitFate.xlsx", sheet = 2, detectDates = T) %>%
rename(Station = CamID)
observations[["BM"]] <- BM_data_R1
takes[["BM"]] <- melt_taken(BM_taken_matrix, region = "BM", round = 1)
#GLCP
GLCP_data_R1 <- readxl::read_excel("data/FateData/GLCP_record_table_15min_BaitFate.xlsx", sheet = 1,
col_types = c("numeric", "text", "text", "date", "date"))[,1:5] %>%
mutate(Region = "GLCP",
Round = 1,
Station = paste0("GLCP", Station))
GLCP_taken_matrix <- openxlsx::read.xlsx("data/FateData/GLCP_record_table_15min_BaitFate.xlsx", sheet = "BaitFate", detectDates = T) %>%
rename(Station = `Site.ID`)
observations[["GLCP"]] <- GLCP_data_R1
takes[["GLCP"]] <- melt_taken(GLCP_taken_matrix, region = "GLCP", round = 1)
#HKNP
HKNP_data_R1 <- readxl::read_excel("data/FateData/HKNP_record_table_15min_BaitFate_2.xlsx", sheet = 1) %>%
mutate(Region = "HKNP",
Round = 1)
HKNP_data_R2 <- readxl::read_excel("data/FateData/HKNP_record_table_15min_BaitFate_2.xlsx", sheet = 3) %>%
mutate(Region = "HKNP",
Round = 2)
HKNP_take_R1 <- openxlsx::read.xlsx("data/FateData/HKNP_record_table_15min_BaitFate_2.xlsx", sheet = 2, detectDates = T)
HKNP_take_R2 <- openxlsx::read.xlsx("data/FateData/HKNP_record_table_15min_BaitFate_2.xlsx", sheet = 4, detectDates = T) %>%
rename(Station = `Row.Labels`)
observations[["HKNP"]] <- bind_rows(HKNP_data_R1, HKNP_data_R2)
melt_HKNP_take_R1 <- melt_taken(HKNP_take_R1, region = "HKNP", round = 1)
melt_HKNP_take_R2 <- melt_taken(HKNP_take_R1, region = "HKNP", round = 2)
takes[["HKNP"]] <- bind_rows(melt_HKNP_take_R1, melt_HKNP_take_R2)
#WPNP
WPNP_data_R1 <- readxl::read_excel("data/FateData/WPNP_record_table_15min_BaitFate.xlsx", sheet = 1)[,1:5] %>%
mutate(Region = "WPNP",
Round = 1,
Station = as.character(Station))
WPNP_taken_matrix <- openxlsx::read.xlsx("data/FateData/WPNP_record_table_15min_BaitFate.xlsx", sheet = "BaitFate", detectDates = T)
observations[["WPNP"]] <- WPNP_data_R1
takes[["WPNP"]] <- melt_taken(WPNP_taken_matrix, region = "WPNP", round = 1) %>%
mutate(Station = as.character(Station))
# TA Data
TA_data <- readxl::read_excel("data/FateData/TA_record_table_15min_BaitFate.xlsx", sheet = 1)[,1:5] %>%
mutate(Region = "TA",
Round = as.numeric(substr(Round,2,2)),
Station = as.character(Station),
Date = as.Date(DateTimeOriginal))
observations[["TA"]] <- TA_data
TA_taken_matrix <- openxlsx::read.xlsx("data/FateData/TA_record_table_15min_BaitFate.xlsx", sheet = 2, detectDates = T) %>%
rename(Station = Date)
TA_take_R1 <- TA_taken_matrix[2:nrow(TA_taken_matrix),c(1,which(TA_taken_matrix[1,] == "R1"))]
TA_take_R2 <- TA_taken_matrix[2:nrow(TA_taken_matrix),c(1,which(TA_taken_matrix[1,] == "R2"))]
melt_TA_take_R1 <- melt_taken(TA_take_R1, region = "TA", round = 1)
melt_TA_take_R2 <- melt_taken(TA_take_R2, region = "TA", round = 2)
takes[["TA"]] <- bind_rows(melt_TA_take_R1, melt_TA_take_R2)
##############
observations_joined <- bind_rows(observations)
taken_joined <- bind_rows(takes)
unique_sp_df <- list()
unique_sp_df[["unique_obs_species"]] <- data.frame(`NameInData` = unique(observations_joined$Species) %>% sort(), `ConsistentStandardisedSpeciesName` = NA, GroupLabel = NA)
unique_sp_df[["unique_taken_species"]] <- data.frame(`NameInData` = unique(taken_joined$Species) %>% sort(), `ConsistentStandardisedSpeciesName` = NA, GroupLabel = NA)
# openxlsx::write.xlsx(unique_sp_df, "data/unique_sp_df.xlsx", overwrite = T)
matched_sp_data <- readxl::read_excel("data/matched_sp.xlsx", sheet = 1) %>% unique()
matched_sp_data2 <- readxl::read_excel("data/matched_sp.xlsx", sheet = 2) %>% unique()
taken_date <- taken_joined %>%
# filter(Taken == 1) %>%
mutate(Date = coalesce(Date, as.Date("2050-01-01"))) %>%
dplyr::select(Station, TakenDate = Date, Region, Round)
taken_by <- taken_joined %>%
mutate(Taken = 1) %>%
left_join(matched_sp_data2, by = c("Species" = "NameInData")) %>%
dplyr::select(Station, Date, Region, Round, ConsistentStandardisedSpeciesName, Taken)
StationStarts <- taken_joined %>%
dplyr::select(Station, Region, Round, StartDate)
observations_anti <- observations_joined %>%
left_join(matched_sp_data, by = c("Species" = "NameInData")) %>%
left_join(taken_date, by = c("Station", "Region", "Round")) %>%
anti_join(taken_by, y = .) %>% filter(!is.na(ConsistentStandardisedSpeciesName))
write.csv(observations_anti, "data/non-matching-data.csv")
data_size <- observations_joined %>%
left_join(StationStarts, by = c("Station", "Region", "Round"))
observations_matched<- observations_joined %>%
left_join(StationStarts, by = c("Station", "Region", "Round")) %>%
mutate(Day = as.numeric(as.Date(Date) - StartDate + 1)) %>%
left_join(matched_sp_data, by = c("Species" = "NameInData")) %>%
left_join(taken_date, by = c("Station", "Region", "Round")) %>%
left_join(taken_by) %>%
filter(Date <= TakenDate) %>%
mutate(Taken = coalesce(Taken, 0)) %>%
group_by(Station, Region, Round, Taken) %>%
filter(Taken == 0 | Taken == 1 & DateTimeOriginal == min(DateTimeOriginal)) %>% ungroup() %>%
dplyr::select(Station, Species, DateTimeOriginal, Date, Region, Round, StartDate, Day, ConsistentStandardisedSpeciesName, GroupLabel_1, GroupLabel_2, TakenDate, Taken) %>%
filter(!is.na(GroupLabel_2))
write_csv(observations_matched, "data/observations_matched.csv")
Given we are modelling the probability of an animal taking the bait, having already encountered it, we do not need to model factors affecting encounter rate (which is likely linked to abundance and behaviour). Instead, we model the probability an animal will take a bat, having encountered it based on the days a bait has been deployed and a random effect of region and station. ‘Days’ is modelled as an interaction with species group (cat or other). Below we run a binomial regression (logit-link) model with region and station as random effects, just region and a null model. We compare the models using leave-one-out cross validation (LOO-CV) to select the best performing model (Vehtari et al. 2020; Vehtari, Gelman, and Gabry 2017). Models were implemented using Bayesian methods with the brms
R package (Bürkner 2017, 2018, 2021). We used a weakly informative student-t prior with 3 degrees of freedom. 3 chains were run in parallel with 1,000 warmup iterations and 2,000 sampling iterations.
brms_bait_model <- brm(formula = Taken ~ GroupLabel_2*scale(Day) + (1|Region),
data = observations_matched,
family = bernoulli(link = "logit"),
warmup = 1000,
iter = 2000,
chains = 3,
inits= "0",
cores=3,
seed = 123, silent = 2, refresh = 0)
brms_bait_model_st <- brm(formula = Taken ~ GroupLabel_2*scale(Day) + (1|Region) + (1|Station),
data = observations_matched,
family = bernoulli(link = "logit"),
warmup = 1000,
iter = 2000,
chains = 3,
inits= "0",
cores=3,
seed = 123, silent = 2, refresh = 0)
brms_bait_model_null <- brm(formula = Taken ~ 1,
data = observations_matched,
family = bernoulli(link = "logit"),
warmup = 1000,
iter = 2000,
chains = 3,
inits= "0",
cores=3,
seed = 123, silent = 2, refresh = 0)
We use leave-one-out cross validation to compare the written model against null model. Broadly, this tests the posterior predictive power of the given model against one with no predictors.
loo_compare(loo(brms_bait_model), loo(brms_bait_model_null), loo(brms_bait_model_st))
#> elpd_diff se_diff
#> brms_bait_model_st 0.0 0.0
#> brms_bait_model -12.1 5.6
#> brms_bait_model_null -43.7 10.9
Based on the above comparison, our model with days and a region and station random effect performs better than a null model.
summary(brms_bait_model_st)
#> Family: bernoulli
#> Links: mu = logit
#> Formula: Taken ~ GroupLabel_2 * scale(Day) + (1 | Region) + (1 | Station)
#> Data: observations_matched (Number of observations: 1761)
#> Draws: 3 chains, each with iter = 2000; warmup = 1000; thin = 1;
#> total post-warmup draws = 3000
#>
#> Group-Level Effects:
#> ~Region (Number of levels: 6)
#> Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
#> sd(Intercept) 1.01 0.49 0.38 2.32 1.00 807
#> Tail_ESS
#> sd(Intercept) 1392
#>
#> ~Station (Number of levels: 280)
#> Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
#> sd(Intercept) 1.23 0.23 0.78 1.70 1.01 616
#> Tail_ESS
#> sd(Intercept) 749
#>
#> Population-Level Effects:
#> Estimate Est.Error l-95% CI u-95% CI Rhat
#> Intercept 0.23 0.74 -1.20 1.69 1.00
#> GroupLabel_2Other -2.86 0.58 -4.00 -1.71 1.00
#> scaleDay 1.29 0.46 0.45 2.21 1.00
#> GroupLabel_2Other:scaleDay -1.14 0.45 -2.06 -0.30 1.00
#> Bulk_ESS Tail_ESS
#> Intercept 1497 1392
#> GroupLabel_2Other 2688 2262
#> scaleDay 1459 1628
#> GroupLabel_2Other:scaleDay 1667 1544
#>
#> Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).
We assess convergenve via inspection of traceplots and Rhat values. Traceplots show good evidence of convergence and no Rhat values are above 1.05.
bayesplot::mcmc_trace(brms_bait_model_st, facet_args = list(ncol = 3), regex_pars = c("GroupLabel_2", "Region", "Day", "lp"))
Figure 3: Traceplots for the binomial regression encounter model
From our model, we can generate predictions of the probability a cat or other species will take a bait, having encountered it. This probability will vary with the ‘days’ since the bait was deployed.
ce_plot <- conditional_effects(brms_bait_model_st, effects = "Day:GroupLabel_2")
plot(ce_plot, plot = FALSE)[[1]] +
scale_fill_manual(values = unname(delwp_cols[1:2]), name = "Species") +
scale_colour_manual(values = unname(delwp_cols[1:2]), name = "Species") +
ylab("Probability of taking bait, having encountered it") +
ggplot2::theme_bw()
From the results we see that the probability a cat will take a bait increases the longer a bait is deployed. This does not appear to be the case for other species. This then raises the question on the various probabilities a bait will be taken by a cat under different encounter rates (by cat and others). To investigate these scenarios we will use the distribution of encounter rates present in the original data:
encounter_scenarios <- list()
encounter_scenarios[["cat_1"]] <- data.frame(GroupLabel_2 = "Cat",
Day = 1:38)
encounter_scenarios[["cat_2"]] <- data.frame(GroupLabel_2 = "Cat",
Day = seq(2, 38, by = 2))
encounter_scenarios[["cat_week"]] <- data.frame(GroupLabel_2 = "Cat",
Day = seq(7, 35, by = 7))
# encounter_scenarios_cat_once <- data.frame(GroupLabel_2 = "Cat",
# Day = round(runif(1,1,38)))
encounter_scenarios[["other_1"]] <- data.frame(GroupLabel_2 = "Other",
Day = 1:38)
encounter_scenarios[["other_5"]] <- data.frame(GroupLabel_2 = "Other",
Day = rep(1:38, each = 5))
encounter_scenarios[["other_10"]] <- data.frame(GroupLabel_2 = "Other",
Day = rep(1:38, each = 10))
postdraws <- list()
pp_day <- list()
for(i in 1:length(encounter_scenarios)) {
postdraws[[i]] <- posterior_predict(brms_bait_model_st, newdata = encounter_scenarios[[i]], re_formula = NA, ndraws = 1000)
pp_day[[i]] <- data.frame(DayTaken = encounter_scenarios[[i]]$Day[apply(postdraws[[i]], 1, which.max)])
colnames(pp_day[[i]]) <- encounter_scenarios[[i]]$GroupLabel_2[1]
}
bind_cols_scenarios <- function(df1, df2, name) {
new_df <- bind_cols(df1, df2) %>%
mutate(Draw = 1:n()) %>%
rowwise() %>%
mutate(Scenario = name,
DayTaken = min(Cat, Other),
SpeciesTaken = case_when(DayTaken == Cat & DayTaken !=Other ~ "Cat",
DayTaken != Cat & DayTaken ==Other ~ "Other",
DayTaken == Cat & DayTaken == Other ~ "Same Day")) %>%
ungroup()
return(new_df)
}
scenario_joined <- bind_cols_scenarios(pp_day[[2]], pp_day[[5]], "Cat encounter every second day, \nothers have 5 encounters a day") %>%
bind_rows(bind_cols_scenarios(pp_day[[3]], pp_day[[5]], "Cat encounter every 7 days, \nothers have 5 encounters a day")) %>%
bind_rows(bind_cols_scenarios(pp_day[[2]], pp_day[[6]], "Cat encounter every second day, \nothers have 10 encounters a day"))
scenario_joined_summary <- scenario_joined %>%
group_by(Scenario, SpeciesTaken) %>%
summarise(`Proportion %` = 100*n()/1000)
scenario_joined_summary %>%
kbl("html", digits = 2, caption = "Simulated bait taking rates based on different encounter volumes") %>%
kable_styling(bootstrap_options = "striped") %>%
collapse_rows(1:2)
Scenario | SpeciesTaken | Proportion % |
---|---|---|
Cat encounter every 7 days, others have 5 encounters a day | Cat | 7.6 |
Other | 90.1 | |
Same Day | 2.3 | |
Cat encounter every second day, others have 10 encounters a day | Cat | 11.5 |
Other | 80.8 | |
Same Day | 7.7 | |
Cat encounter every second day, others have 5 encounters a day | Cat | 25.2 |
Other | 65.7 | |
Same Day | 9.1 |
If other species encounter the baits a lot it will be less likely that cats encounter and take the baits first. Because probability of bait taking by cats, having encountered the bait is low early in deployment; even several encounters early on do not ensure bait taking.
rdraws <- sample(1:1000, 100) # change random sample number
ggplot(scenario_joined %>% filter(Draw %in% rdraws) %>% group_by(Scenario) %>% mutate(Draw = 1:100) %>% ungroup(), aes(x=Draw, y=DayTaken)) +
geom_segment( aes(x=Draw, xend=Draw, y=0, yend=DayTaken, colour = SpeciesTaken)) +
geom_point(aes(fill = SpeciesTaken), size=3, alpha=0.9, shape = 21) +
theme_light() +
coord_flip() +
xlab("Bait number") +
ylab("Days") +
scale_fill_manual(values = c(Cat = "#e41a1c", `Same Day` = "#377eb8", Other = "#4daf4a")) +
scale_colour_manual(values = c(Cat = "#e41a1c", `Same Day` = "#377eb8", Other = "#4daf4a")) +
theme(
panel.grid.major.y = element_blank(),
panel.border = element_blank(),
axis.ticks.y = element_blank(), legend.position = "bottom"
) + facet_wrap(~Scenario)
Figure 4: Simulation of 100 posterior draws showing the fait of 100 baits for various encounter volumes
The above plot visualises the variability in probability that a given bait will be consumed by a cat or other species. Unless cat-bait encounter rates are high, the probability the bait will be consumed by a cat is low.
If you see mistakes or want to suggest changes, please create an issue on the source repository.
For attribution, please cite this work as
Cally (2022, Oct. 10). Justin's Code Blog: Feral Cat Bait Analysis. Retrieved from https://justincally.github.io/blog/posts/2022-10-10-feral-cat-bait-analysis/
BibTeX citation
@misc{cally2022feral, author = {Cally, Justin G}, title = {Justin's Code Blog: Feral Cat Bait Analysis}, url = {https://justincally.github.io/blog/posts/2022-10-10-feral-cat-bait-analysis/}, year = {2022} }