Justin's Code Blog: Feral Cat Bait Analysis

Justin G Cally

Setup

Load useful packages and some data for the analysis

Show code

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  cache = FALSE,
  message = FALSE, 
  warning = FALSE, 
  tidy.opts = list(width.cutoff = 60), tidy = TRUE
)
library(tidyverse)
library(survival)
library(tidycmprsk)
library(survminer)
library(ggsurvfit)
library(data.table)
library(brms)
library(survminer)
library(ggplot2)
library(kableExtra)
require(AICcmodavg) # has the aictab function
library(adjustedCurves)
library(patchwork)

read_excel_allsheets <- function(filename, tibble = FALSE) {
    sheets <- readxl::excel_sheets(filename)
    x <- lapply(sheets, function(X) readxl::read_excel(filename, sheet = X))
    if(!tibble) x <- lapply(x, as.data.frame)
    names(x) <- sheets
    x
}

bait_state <- read_csv("data/Bait_Attractiveness.csv") %>%
  mutate(`24-hour precipitation (sqrt)` = sqrt(prev24), 
         `24-hour temperature` = prev24temp, 
         Location = factor(Location)) %>%

  # mutate(Season = as.integer(factor(Season)),
  #        Habitat = factor(Habitat),
  #        Location = factor(Location)) %>%
  as.data.frame()

bait_state <- read_csv("data/Bait_Attractiveness_time_varying.csv") %>%
  mutate(`24-hour precipitation (sqrt)` = sqrt(rain24), 
         `24-hour temperature` = temp24, 
         `Cumulative Rain (sqrt)` = sqrt(cumrain),
         Location = factor(Location)) %>%
  group_by(BaitID) %>%
  arrange(time1) %>%
  mutate(cumtemp = cumsum(temp24)) %>%
  ungroup() %>%
  as.data.frame() 

delwp_cols <- c(Teal = "#00B2A9",
                Navy = "#201547",
                Environment = "#CEDC00",
                `Climate Change` = "#FDDA24",
                Energy = "#0072CE",
                Water = "#71C5E8",
                Planning = "#642667",
                Infrastructure = "#AF272F",
                FFR = "#E57200",
                Corporate = "#201547")

Bait Fate Analysis

We investigate the effect of several environmental covariates on the survivorship of feral cat baits. For this analysis we use a Cox-proportional hazard regression model that accounts for various longevity of the bait (Andersen and Gill 1982). Models were implemented using the survival R package (Therneau 2022; Terry M. Therneau and Patricia M. Grambsch 2000). We compared model fits for several candidate models using second-order Akaike’s information criterion (AICc) with the AICcmodavg R package (Mazerolle 2020).

We have various deployments across sites and different seasons. Given that the experimental design has not been set-up to look at the crossed effects we should really only use one categorical group: Location

Given, that sample sizes are most balanced for region (GLCP, HKNP and WPNP) we will exclusively use this group for the analysis. However, it should be kept in mind that the regional effects may be confounded by season and habitat types. Based on ‘location’, rainfall and temperature, we use AIC-based model selection to compare and select a model to use in the analysis.

Show code

Temp <- bait_state$`24-hour temperature`
Precip <- bait_state$`cumrain`
par(mfrow = c(1,2))
hist(Temp, breaks = 10)
hist(Precip, breaks = 10)

Show code

# time varying model
baitcrr <- coxph(Surv(time2, time2 = time1, status) ~ `24-hour temperature` 
                 + `Cumulative Rain (sqrt)` 
                 + Location, data = bait_state, x = TRUE)

baitcrr1 <- coxph(Surv(time2, time2 = time1, status) ~ Location, data = bait_state)

baitcrr2 <- coxph(Surv(time2, time2 = time1, status) ~ `24-hour temperature` 
                 + `Cumulative Rain (sqrt)`, data = bait_state)

baitcrr3 <- coxph(Surv(time2, time2 = time1, status) ~ `Cumulative Rain (sqrt)` 
                 + Location, data = bait_state)

baitcrrNull <- coxph(Surv(time2, time2 = time1, status) ~ 1, data = bait_state)

model_list <- list(baitcrr, baitcrr1, baitcrr2, baitcrr3, baitcrrNull)
names(model_list) <- 1:5

# Compare models with AIC table
aic_table <- aictab(model_list)

# Add a column to the table with variable names
model_vars <- sapply(model_list, function(x) gsub(":", "*", paste(names(x$means), collapse = " + ")))
aic_table <- cbind(Modnames = names(model_vars), model_vars) %>%
  merge(aic_table, by="Modnames") %>%
  arrange(AICc)

aic_table %>% 
  kbl("html", caption = "Model comparison of models", digits = 2) %>%
  kable_styling(bootstrap_options = "striped")

Table 1: Model comparison of models
Modnames	model_vars	K	AICc	Delta_AICc	ModelLik	AICcWt	LL	Cum.Wt
1	`24-hour temperature` + `Cumulative Rain (sqrt)` + LocationHKNP + LocationWPNP	4	430.93	0.00	1.00	0.73	-211.45	0.73
4	`Cumulative Rain (sqrt)` + LocationHKNP + LocationWPNP	3	432.90	1.97	0.37	0.27	-213.44	1.00
2	LocationHKNP + LocationWPNP	2	471.54	40.61	0.00	0.00	-233.76	1.00
5		0	484.63	53.70	0.00	0.00	-242.32	1.00
3	`24-hour temperature` + `Cumulative Rain (sqrt)`	2	485.49	54.56	0.00	0.00	-240.74	1.00

Show code

lapply(model_list[1:4], cox.zph)

#> $`1`
#>                          chisq df       p
#> `24-hour temperature`     17.7  1 2.6e-05
#> `Cumulative Rain (sqrt)`  32.3  1 1.3e-08
#> Location                  45.8  2 1.1e-10
#> GLOBAL                    55.1  4 3.2e-11
#> 
#> $`2`
#>          chisq df       p
#> Location  32.3  2 9.7e-08
#> GLOBAL    32.3  2 9.7e-08
#> 
#> $`3`
#>                          chisq df       p
#> `24-hour temperature`     13.5  1 0.00024
#> `Cumulative Rain (sqrt)`  17.5  1 2.8e-05
#> GLOBAL                    22.5  2 1.3e-05
#> 
#> $`4`
#>                          chisq df       p
#> `Cumulative Rain (sqrt)`  21.2  1 4.2e-06
#> Location                  29.6  2 3.8e-07
#> GLOBAL                    29.8  3 1.5e-06

Based on the above checks, no models meet the assumptions of a cox-proportional hazards model. We can try and circumvent this issue by stratifying the location:

Show code

baitcrrS <- coxph(Surv(time2, time2 = time1, status) ~ `24-hour temperature` 
                  + `Cumulative Rain (sqrt)`  
                  + strata(Location), data = bait_state)

summary(baitcrrS)

#> Call:
#> coxph(formula = Surv(time2, time2 = time1, status) ~ `24-hour temperature` + 
#>     `Cumulative Rain (sqrt)` + strata(Location), data = bait_state)
#> 
#>   n= 1187, number of events= 68 
#> 
#>                                coef  exp(coef)   se(coef)  z Pr(>|z|)
#> `24-hour temperature`    -6.597e-02  9.362e-01  3.759e+06  0        1
#> `Cumulative Rain (sqrt)`         NA         NA  0.000e+00 NA       NA
#> 
#>                          exp(coef) exp(-coef) lower .95 upper .95
#> `24-hour temperature`       0.9362      1.068         0       Inf
#> `Cumulative Rain (sqrt)`        NA         NA        NA        NA
#> 
#> Concordance= 0.5  (se = 0 )
#> Likelihood ratio test= 0  on 1 df,   p=1
#> Wald test            = 0  on 1 df,   p=1
#> Score (logrank) test = 0  on 1 df,   p=1

Show code


cox.zph(baitcrrS)

#>                          chisq df p
#> `24-hour temperature` 4.54e-16  1 1
#> GLOBAL                4.54e-16  1 1

Based on the stratification it looks like the model has issues in estimate coeffieiceients for more than one parameter. It does not seem to be the solution we are after.

To circumvent this issue we are going to try a different approach

Bait fate analysis using logistic regression

Given the issues with the above approach in survival analysis and the violation of the cox-proportional hazards test it is not feasible to use a cox-proportional hazard model to evaluate the response of baits to various environmental factors. Another option is to use logistic regression with the time-varying components. This approach models the likelihood of a bait becoming unattractive as a function of environmental and other covariates for each given bait at each period the bait is still uncensored. For a vignette on the topic on how to apply this approach in brms see:

https://bookdown.org/content/ef0b28f7-8bdf-4ba7-ae2c-bc2b1f012283/extending-the-discrete-time-hazard-model.html#time-varying-predictors-1

We can assess the correlations between predictors with the following checks:

Show code

panel.cor <- function(x, y, digits=2, prefix="", cex.cor, ...)
{
  usr <- par("usr"); on.exit(par(usr))
  par(usr = c(0, 1, 0, 1))
  r <- abs(cor(x, y, use="pairwise.complete.obs"))
  txt <- format(c(r, 0.123456789), digits=digits)[1]
  txt <- paste0(prefix, txt)
  if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
  text(0.5, 0.5, txt, cex = cex.cor * r)
}

baitselectdata <- bait_state %>%
  dplyr::select(cumtemp, cumrain, time1, Location, rain24, temp24)

pairs(baitselectdata, lower.panel=panel.smooth, upper.panel=panel.cor)

Based on this we should not use ‘cumtemp’ and ‘time1’ (days) in the same model. We should also avoid ‘rain24’ and ‘cumrain’. We run several select models and compare the models using leave-one-out cross validation (LOO-CV) to select the best performing model (Vehtari et al. 2020; Vehtari, Gelman, and Gabry 2017). Models were implemented using Bayesian methods with the brms R package (Bürkner 2017, 2018, 2021). We used a weakly informative normal prior with a standard deviation of 4. 4 chains were run in parallel with 1,000 warmup iterations and 2,000 sampling iterations.

Show code

brms_bait_survival_model <- brm(data = bait_state,
      family = bernoulli(link = "logit"),
      formula = status ~ scale(cumtemp) + scale(sqrt(cumrain)) + Location  + (1|BaitID),
      prior(normal(0, 4), class = b),
      chains = 4, cores = 4, iter = 2000, warmup = 1000,
      seed = 12)

brms_bait_survival_model2 <- brm(data = bait_state,
      family = bernoulli(link = "logit"),
      formula = status ~ scale(cumtemp) + scale(sqrt(cumrain)) + (1|Location)  + (1|BaitID),
      prior(normal(0, 4), class = b),
      chains = 4, cores = 4, iter = 2000, warmup = 1000,
      seed = 12)

brms_bait_survival_model3 <- brm(data = bait_state,
      family = bernoulli(link = "logit"),
      formula = status ~ scale(time1) + scale(sqrt(cumrain)) + Location  + (1|BaitID),
      prior(normal(0, 4), class = b),
      chains = 4, cores = 4, iter = 2000, warmup = 1000,
      seed = 12)

brms_bait_survival_model4 <- brm(data = bait_state,
      family = bernoulli(link = "logit"),
      formula = status ~ scale(temp24) + scale(sqrt(cumrain)) + scale(time1) + Location  + (1|BaitID),
      prior(normal(0, 4), class = b),
      chains = 4, cores = 4, iter = 2000, warmup = 1000,
      seed = 12)

brms_bait_survival_model5 <- brm(data = bait_state,
      family = bernoulli(link = "logit"),
      formula = status ~ Location*scale(time1)  + (1|BaitID),
      prior(normal(0, 4), class = b),
      chains = 4, cores = 4, iter = 2000, warmup = 1000,
      seed = 12)

brms_bait_survival_model6 <- brm(data = bait_state,
      family = bernoulli(link = "logit"),
      formula = status ~ Location*scale(time1) + scale(temp24) + scale(sqrt(rain24)) + (1|BaitID),
      prior(normal(0, 4), class = b),
      chains = 4, cores = 4, iter = 2000, warmup = 1000,
      seed = 12)

brms_bait_survival_modelHS <- brm(data = bait_state,
      family = bernoulli(link = "logit"),
      formula = status ~ Location*scale(time1) + scale(temp24) + scale(sqrt(rain24)) + (1|BaitID),
      prior(horseshoe(df = 2, par_ratio = 1), class = b),
      chains = 4, cores = 4, iter = 2000, warmup = 1000,
      seed = 12)

brms_bait_survival_modelNull <- brm(data = bait_state,
      family = bernoulli(link = "logit"),
      formula = status ~ 1,
      chains = 4, cores = 4, iter = 2000, warmup = 1000,
      seed = 12)

model_list <- list(brms_bait_survival_model, 
            brms_bait_survival_model2, 
            brms_bait_survival_model3, 
            brms_bait_survival_model4, 
            brms_bait_survival_model5, 
            brms_bait_survival_model6, 
            brms_bait_survival_modelHS,
            brms_bait_survival_modelNull)

loo_compare <- loo_compare(loo(brms_bait_survival_model), 
            loo(brms_bait_survival_model2), 
            loo(brms_bait_survival_model3), 
            loo(brms_bait_survival_model4), 
            loo(brms_bait_survival_model5), 
            loo(brms_bait_survival_model6), 
            loo(brms_bait_survival_modelHS),
            loo(brms_bait_survival_modelNull))

saveRDS(loo_compare, "models/loo_compare.rds")
saveRDS(brms_bait_survival_model6, "models/brms_bait_survival_model6.rds", compress = "xz")

Show code

loo_compare <- readRDS("models/loo_compare.rds")
loo_compare

#>                              elpd_diff se_diff
#> brms_bait_survival_model6       0.0       0.0 
#> brms_bait_survival_modelHS     -2.0       1.2 
#> brms_bait_survival_model5      -2.8       3.2 
#> brms_bait_survival_model3      -7.0       6.6 
#> brms_bait_survival_model4      -8.0       6.5 
#> brms_bait_survival_model2     -10.9       7.8 
#> brms_bait_survival_model      -11.1       7.6 
#> brms_bait_survival_modelNull -117.7      17.7

Show code

brms_bait_survival_model6 <- readRDS("models/brms_bait_survival_model6.rds")
bait_survival_top_model <- brms_bait_survival_model6

Bait survival evaluation

Show code

bayesplot::mcmc_trace(bait_survival_top_model, facet_args = list(ncol = 2), regex_pars = c("time1", "Location", "temp24", "rain24", "lp"))

Bait survival results

Bait survival was primarily impacted by the location of the bait station and the time the bait was left out for; with some slight evidence suggesting rainfall may improve bait survival (reduce the rate of bait decay). The effect of location on bait survival suggests that there is unexplained variation driving bait decay; with GLCP appearing to have faster rates of decay.

Show code

summary(bait_survival_top_model)

#>  Family: bernoulli 
#>   Links: mu = logit 
#> Formula: status ~ Location * scale(time1) + scale(temp24) + scale(sqrt(rain24)) + (1 | BaitID) 
#>    Data: bait_state (Number of observations: 1187) 
#>   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
#>          total post-warmup draws = 4000
#> 
#> Group-Level Effects: 
#> ~BaitID (Number of levels: 117) 
#>               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
#> sd(Intercept)     0.37      0.29     0.02     1.11 1.00     1052
#>               Tail_ESS
#> sd(Intercept)     1509
#> 
#> Population-Level Effects: 
#>                         Estimate Est.Error l-95% CI u-95% CI Rhat
#> Intercept                  -6.88      1.23    -9.49    -4.72 1.00
#> LocationHKNP               -4.21      2.17    -8.63    -0.11 1.00
#> LocationWPNP                2.77      1.29     0.36     5.35 1.00
#> scaletime1                  6.36      1.31     3.95     9.04 1.00
#> scaletemp24                -0.04      0.26    -0.57     0.46 1.00
#> scalesqrtrain24            -0.45      0.21    -0.87    -0.05 1.00
#> LocationHKNP:scaletime1    -0.37      1.53    -3.48     2.56 1.00
#> LocationWPNP:scaletime1    -4.28      1.31    -6.91    -1.82 1.00
#>                         Bulk_ESS Tail_ESS
#> Intercept                   1653     2259
#> LocationHKNP                2019     2231
#> LocationWPNP                1782     2379
#> scaletime1                  1630     2039
#> scaletemp24                 2851     3091
#> scalesqrtrain24             3734     3085
#> LocationHKNP:scaletime1     1627     2293
#> LocationWPNP:scaletime1     1634     2104
#> 
#> Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).

Show code

bait_aggregated <-
  bait_state %>% 
  mutate(event = case_when(status == 1 ~ "event",
                           TRUE ~ "no_event")) %>% 
  group_by(Location) %>% 
  count(event, temp24, rain24, time1) %>% 
  ungroup() %>% 
  complete(Location, event, temp24, rain24, time1) %>% 
  pivot_wider(names_from = event,
              values_from = n) %>% 
  mutate(total = event + no_event) %>% 
  mutate(proportion = event / total) %>% 
  mutate(logit = log(proportion / (1 - proportion)))

Show code

scenarios <- data.frame(cond = c("Hot and Dry", 
                                 "Hot and Wet", 
                                 "Hot and Mid-precipitation",
                                 "Cold and Dry", 
                                 "Cold and Wet", 
                                 "Cold and Mid-precipitation",
                                 "Mid-temp and Dry", 
                                 "Mid-temp and Wet", 
                                 "Mid-temp and Mid-precipitation"), 
                        q1 = c(0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.5, 0.5, 0.5), 
                        q2 = c(0.1, 0.9, 0.5, 0.1, 0.9, 0.5, 0.1, 0.9, 0.5))


newdata_env <- list()

for(i in 1:nrow(scenarios)) {
newdata_env[[i]] <- expand.grid(Type = scenarios$cond[i], 
                           rain24 = quantile(bait_state$rain24, scenarios$q1[i]), 
                           temp24 = quantile(bait_state$temp24, scenarios$q2[i]), 
                           Location = unique(bait_state$Location), 
                           time1 = as.numeric(1:14), 
                           # status = 1, 
                           Title = paste0("Temp = ", round(quantile(Temp, scenarios$q1[i]),1), "°C, Precip = ", round(quantile(Precip, scenarios$q2[i]),1), "mm")) %>%
  mutate(time2 = time1 - 1)

}

newdata_all <- bind_rows(newdata_env)

preds_bs <- cbind(newdata_all, predict(bait_survival_top_model, newdata = newdata_all, re_formula = NA)) %>%
  rowwise() %>%
  mutate(LCI = max(c(Estimate - Est.Error, 0)), 
         UCI = min(c(Estimate + Est.Error, 1))) %>%
  ungroup() %>%
  group_by(Location, time1, Type, Title) %>% 
  mutate_at(vars(Estimate, LCI, UCI), .funs = ~cumprod(1 - .))

ggplot(data = preds_bs) +
  geom_ribbon(aes(x = time1, ymin = LCI, ymax = UCI, fill = Location), alpha = 0.5) +
  geom_line(aes(x = time1, y = Estimate, colour = Location)) + 
  scale_fill_manual(values = unname(delwp_cols[1:3])) + 
  scale_colour_manual(values = unname(delwp_cols[1:3])) + 
  facet_wrap(~ Type + Title, nrow = 3) +
  scale_x_continuous(breaks = seq(0,14, by = 2), expand = expansion(0, 0), name = "Days") +
  ylab("Survival probability") +
  theme_bw()

Survival of bits under different possible scenarios. Scenarios were derived from the 0.9, 0.5 and 0.1 quantiles of the data for temperature and precipitation values. Lines are the mean prediction and shaded area represents ± 1 standard error

Figure 1: Survival of bits under different possible scenarios. Scenarios were derived from the 0.9, 0.5 and 0.1 quantiles of the data for temperature and precipitation values. Lines are the mean prediction and shaded area represents ± 1 standard error

Based on the above scenarios we can generate the following table that shows the day when the probability of bait survival drops below 0.5.Note that in some cases survival probability does not drop below 0.5 within the 14 days.

Show code

preds_bs %>%
  group_by(Type, Location) %>%
  arrange(desc(Estimate)) %>%
  filter(Estimate <= 0.5) %>%
  slice(1) %>%
  select(Type, Conditions = Title, Location, Time = time1, `Survival Probability` = Estimate, `Survival Probability SE` = Est.Error) %>%
  kbl("html", digits = 2, caption = "Time at which bait survival probability falls below 0.5") %>%
  kable_styling(bootstrap_options = "striped") %>%
  kableExtra::collapse_rows(columns = 1:2)

Table 2: Time at which bait survival probability falls below 0.5
Type	Conditions	Location	Time	Survival Probability	Survival Probability SE
Hot and Dry	Temp = 20.2°C, Precip = 1.2mm	GLCP	11	0.25	0.43
		HKNP	14	0.21	0.40
		WPNP	14	0.49	0.50
Hot and Wet	Temp = 20.2°C, Precip = 15.2mm	GLCP	11	0.27	0.44
Hot and Wet	Temp = 20.2°C, Precip = 15.2mm	HKNP	14	0.23	0.42
Hot and Mid-precipitation	Temp = 20.2°C, Precip = 3.6mm	GLCP	11	0.26	0.44
Hot and Mid-precipitation	Temp = 20.2°C, Precip = 3.6mm	HKNP	14	0.21	0.41
Cold and Dry	Temp = 13.5°C, Precip = 1.2mm	GLCP	10	0.46	0.50
		HKNP	13	0.39	0.49
		WPNP	13	0.46	0.50
Cold and Wet	Temp = 13.5°C, Precip = 15.2mm	GLCP	10	0.48	0.50
		HKNP	13	0.41	0.49
		WPNP	13	0.49	0.50
Cold and Mid-precipitation	Temp = 13.5°C, Precip = 3.6mm	GLCP	10	0.47	0.50
		HKNP	13	0.38	0.49
		WPNP	13	0.47	0.50
Mid-temp and Dry	Temp = 16.8°C, Precip = 1.2mm	GLCP	11	0.17	0.37
		HKNP	13	0.43	0.50
		WPNP	14	0.39	0.49
Mid-temp and Wet	Temp = 16.8°C, Precip = 15.2mm	GLCP	11	0.20	0.40
		HKNP	13	0.46	0.50
		WPNP	14	0.40	0.49
Mid-temp and Mid-precipitation	Temp = 16.8°C, Precip = 3.6mm	GLCP	11	0.18	0.39
		HKNP	13	0.45	0.50
		WPNP	14	0.39	0.49

If we predict survival rates for mean levels of various environmental predictors (rainfall and temperature) we can obtain more general survival curves:

Show code

f <-
  fitted(bait_survival_top_model,
         newdata = newdata_all, re_formula = NA) %>% 
  data.frame() %>% 
  bind_cols(newdata_all)

# hazard (top panel)
p1 <-
  f %>% 
  group_by(Location, time1) %>% 
  summarise_all(mean) %>%
  ggplot(aes(x = time1, group = Location, color = Location, fill = Location)) +
  geom_ribbon(aes(ymin = Q2.5, ymax = Q97.5),
              alpha = 0.5, size = 0) +
  geom_line(aes(y = Estimate)) +
  scale_y_continuous("fitted hazard")

# survival (bottom panel)
p2 <-
  f %>%
  group_by(Location, time1) %>% 
  mutate_at(vars(Estimate, Q2.5, Q97.5), .funs = ~cumprod(1 - .)) %>% 
  summarise_all(mean) %>%
  ggplot(aes(x = time1, group = Location, color = Location, fill = Location)) +
  geom_ribbon(aes(ymin = Q2.5, ymax = Q97.5),
              alpha = 0.5, size = 0) +
  geom_line(aes(y = Estimate)) +
    geom_hline(yintercept = .5, color = "red", linetype = "dashed") +
  scale_y_continuous("fitted survival probability", limits = c(0, 1)) +
  theme(strip.background.x = element_blank(),
        strip.text.x = element_blank())

(
  (p1 / p2) & 
  scale_fill_manual(values = unname(delwp_cols[1:3])) & 
  scale_colour_manual(values = unname(delwp_cols[1:3])) & 
  theme_bw() &
  scale_x_continuous(breaks = seq(0,14, by = 2), expand = expansion(0, 0), name = "Days")
) +
  plot_layout(guides = "collect")

Figure 2: Hazard rates and survival probability of fitted models

Bait encounter analysis

Due to the data structures provided a decent amount of data wrangling was needed in order to format the data into a suitable long format for binomial regression. The aim of this analysis is to determine the probability of a cat (or other animal) taking a bait, having encountered it.

Show code

melt_taken <- function(x, region, round) {
  df <- data.table::as.data.table(x) %>%
    melt(id.vars = "Station", variable.name = "Date") %>% 
    mutate(Date = as.Date(Date)) %>%
    group_by(Station) %>%
    arrange(Date) %>%
    tidyr::fill(Date, .direction = "down") %>%
    mutate(value = coalesce(value, "BP")) %>%
    filter(value != "BA") %>%
    mutate(StartDate = min(Date), 
           EndDate = max(Date)) %>%
    slice_tail(n = 1) %>%
    ungroup() %>%
    mutate(Date = as.Date(Date)) %>%
    mutate(Taken = case_when(value == "BP" ~ 0, 
                             TRUE ~ 1)) %>%
    rename(Species = value)
  
  df$Region <- region
  df$Round <- round
  
  return(df)
}

observations <- list()
takes <- list()

# BD
BD_data_R1 <- readxl::read_excel("data/FateData/BD_record_table_15min_BaitFate.xlsx", sheet = 1) %>%
  mutate(Region = "BD", 
         Round = 1)
BD_data_R2 <- readxl::read_excel("data/FateData/BD_record_table_15min_BaitFate.xlsx", sheet = 2) %>%
    mutate(Region = "BD", 
         Round = 2)
BD_taken_matrix <- openxlsx::read.xlsx("data/FateData/BD_record_table_15min_BaitFate.xlsx", sheet = 3, detectDates = T)
BD_take_R1 <- BD_taken_matrix[2:nrow(BD_taken_matrix),c(1,which(BD_taken_matrix[1,] == "R1"))]
BD_take_R2 <- BD_taken_matrix[2:nrow(BD_taken_matrix),c(1,which(BD_taken_matrix[1,] == "R2"))]
observations[["BD"]] <- bind_rows(BD_data_R1, BD_data_R2)

melt_BD_take_R1 <- melt_taken(BD_take_R1, region = "BD", round = 1)
melt_BD_take_R2 <- melt_taken(BD_take_R2, region = "BD", round = 2)
takes[["BD"]] <- bind_rows(melt_BD_take_R1, melt_BD_take_R2)

# BM 
BM_data_R1 <- readxl::read_excel("data/FateData/BM_record_table_15min_BaitFate.xlsx", sheet = 1)[,1:5] %>%
  mutate(Region = "BM", 
         Round = 1)

BM_taken_matrix <- openxlsx::read.xlsx("data/FateData/BM_record_table_15min_BaitFate.xlsx", sheet = 2, detectDates = T) %>%
  rename(Station = CamID)

observations[["BM"]] <- BM_data_R1

takes[["BM"]] <- melt_taken(BM_taken_matrix, region = "BM", round = 1)

#GLCP 
GLCP_data_R1 <- readxl::read_excel("data/FateData/GLCP_record_table_15min_BaitFate.xlsx", sheet = 1, 
                                   col_types = c("numeric", "text", "text", "date", "date"))[,1:5] %>%
  mutate(Region = "GLCP", 
         Round = 1, 
         Station = paste0("GLCP", Station))

GLCP_taken_matrix <- openxlsx::read.xlsx("data/FateData/GLCP_record_table_15min_BaitFate.xlsx", sheet = "BaitFate", detectDates = T) %>%
  rename(Station = `Site.ID`)

observations[["GLCP"]] <- GLCP_data_R1

takes[["GLCP"]] <- melt_taken(GLCP_taken_matrix, region = "GLCP", round = 1)

#HKNP
HKNP_data_R1 <- readxl::read_excel("data/FateData/HKNP_record_table_15min_BaitFate_2.xlsx", sheet = 1) %>%
  mutate(Region = "HKNP", 
         Round = 1)
HKNP_data_R2 <- readxl::read_excel("data/FateData/HKNP_record_table_15min_BaitFate_2.xlsx", sheet = 3) %>%
  mutate(Region = "HKNP", 
         Round = 2)

HKNP_take_R1 <- openxlsx::read.xlsx("data/FateData/HKNP_record_table_15min_BaitFate_2.xlsx", sheet = 2, detectDates = T)

HKNP_take_R2 <- openxlsx::read.xlsx("data/FateData/HKNP_record_table_15min_BaitFate_2.xlsx", sheet = 4, detectDates = T) %>%
  rename(Station = `Row.Labels`)

observations[["HKNP"]] <- bind_rows(HKNP_data_R1, HKNP_data_R2)

melt_HKNP_take_R1 <- melt_taken(HKNP_take_R1, region = "HKNP", round = 1)
melt_HKNP_take_R2 <- melt_taken(HKNP_take_R1, region = "HKNP", round = 2)
takes[["HKNP"]] <- bind_rows(melt_HKNP_take_R1, melt_HKNP_take_R2)

#WPNP
WPNP_data_R1 <- readxl::read_excel("data/FateData/WPNP_record_table_15min_BaitFate.xlsx", sheet = 1)[,1:5] %>%
  mutate(Region = "WPNP", 
         Round = 1, 
         Station = as.character(Station))

WPNP_taken_matrix <- openxlsx::read.xlsx("data/FateData/WPNP_record_table_15min_BaitFate.xlsx", sheet = "BaitFate", detectDates = T) 

observations[["WPNP"]] <- WPNP_data_R1

takes[["WPNP"]] <- melt_taken(WPNP_taken_matrix, region = "WPNP", round = 1) %>%
  mutate(Station = as.character(Station))

# TA Data 
TA_data <- readxl::read_excel("data/FateData/TA_record_table_15min_BaitFate.xlsx", sheet = 1)[,1:5] %>%
  mutate(Region = "TA", 
         Round = as.numeric(substr(Round,2,2)), 
         Station = as.character(Station), 
         Date = as.Date(DateTimeOriginal))

observations[["TA"]] <- TA_data

TA_taken_matrix <- openxlsx::read.xlsx("data/FateData/TA_record_table_15min_BaitFate.xlsx", sheet = 2, detectDates = T) %>%
  rename(Station = Date)
TA_take_R1 <- TA_taken_matrix[2:nrow(TA_taken_matrix),c(1,which(TA_taken_matrix[1,] == "R1"))]
TA_take_R2 <- TA_taken_matrix[2:nrow(TA_taken_matrix),c(1,which(TA_taken_matrix[1,] == "R2"))]


melt_TA_take_R1 <- melt_taken(TA_take_R1, region = "TA", round = 1)
melt_TA_take_R2 <- melt_taken(TA_take_R2, region = "TA", round = 2)
takes[["TA"]] <- bind_rows(melt_TA_take_R1, melt_TA_take_R2)

##############

observations_joined <- bind_rows(observations)
taken_joined <- bind_rows(takes)

unique_sp_df <- list()
unique_sp_df[["unique_obs_species"]] <- data.frame(`NameInData` = unique(observations_joined$Species) %>% sort(), `ConsistentStandardisedSpeciesName` = NA, GroupLabel = NA)
unique_sp_df[["unique_taken_species"]] <- data.frame(`NameInData` = unique(taken_joined$Species) %>% sort(), `ConsistentStandardisedSpeciesName` = NA, GroupLabel = NA)

# openxlsx::write.xlsx(unique_sp_df, "data/unique_sp_df.xlsx", overwrite = T)

Show code

matched_sp_data <- readxl::read_excel("data/matched_sp.xlsx", sheet = 1) %>% unique()
matched_sp_data2 <- readxl::read_excel("data/matched_sp.xlsx", sheet = 2) %>% unique()

taken_date <- taken_joined %>% 
  # filter(Taken == 1) %>%
  mutate(Date = coalesce(Date, as.Date("2050-01-01"))) %>%
  dplyr::select(Station, TakenDate = Date, Region, Round)

taken_by <- taken_joined %>%
  mutate(Taken = 1) %>%
  left_join(matched_sp_data2, by = c("Species" = "NameInData")) %>%
  dplyr::select(Station, Date, Region, Round, ConsistentStandardisedSpeciesName, Taken) 

StationStarts <- taken_joined %>%
  dplyr::select(Station, Region, Round, StartDate)

observations_anti <- observations_joined %>% 
  left_join(matched_sp_data, by = c("Species" = "NameInData")) %>%
  left_join(taken_date, by = c("Station", "Region", "Round")) %>%
  anti_join(taken_by, y = .) %>% filter(!is.na(ConsistentStandardisedSpeciesName))

write.csv(observations_anti, "data/non-matching-data.csv")

data_size <- observations_joined %>% 
  left_join(StationStarts, by = c("Station", "Region", "Round"))

observations_matched<- observations_joined %>% 
  left_join(StationStarts, by = c("Station", "Region", "Round")) %>%
  mutate(Day = as.numeric(as.Date(Date) - StartDate + 1)) %>%
  left_join(matched_sp_data, by = c("Species" = "NameInData")) %>%
  left_join(taken_date, by = c("Station", "Region", "Round")) %>%
  left_join(taken_by) %>%
  filter(Date <= TakenDate) %>%
  mutate(Taken = coalesce(Taken, 0)) %>%
  group_by(Station, Region, Round, Taken) %>%
  filter(Taken == 0 | Taken == 1 & DateTimeOriginal == min(DateTimeOriginal)) %>% ungroup() %>%
  dplyr::select(Station, Species, DateTimeOriginal, Date, Region, Round, StartDate, Day, ConsistentStandardisedSpeciesName, GroupLabel_1, GroupLabel_2, TakenDate, Taken) %>%
filter(!is.na(GroupLabel_2))

write_csv(observations_matched, "data/observations_matched.csv")

Bait encounter model

Given we are modelling the probability of an animal taking the bait, having already encountered it, we do not need to model factors affecting encounter rate (which is likely linked to abundance and behaviour). Instead, we model the probability an animal will take a bat, having encountered it based on the days a bait has been deployed and a random effect of region and station. ‘Days’ is modelled as an interaction with species group (cat or other). Below we run a binomial regression (logit-link) model with region and station as random effects, just region and a null model. We compare the models using leave-one-out cross validation (LOO-CV) to select the best performing model (Vehtari et al. 2020; Vehtari, Gelman, and Gabry 2017). Models were implemented using Bayesian methods with the brms R package (Bürkner 2017, 2018, 2021). We used a weakly informative student-t prior with 3 degrees of freedom. 3 chains were run in parallel with 1,000 warmup iterations and 2,000 sampling iterations.

Show code

brms_bait_model <- brm(formula = Taken ~ GroupLabel_2*scale(Day) + (1|Region), 
                       data = observations_matched, 
                       family = bernoulli(link = "logit"),
                       warmup = 1000, 
                      iter = 2000, 
                      chains = 3, 
                      inits= "0", 
                      cores=3,
                      seed = 123, silent = 2, refresh = 0)

brms_bait_model_st <- brm(formula = Taken ~ GroupLabel_2*scale(Day) + (1|Region) + (1|Station), 
                       data = observations_matched, 
                       family = bernoulli(link = "logit"),
                       warmup = 1000, 
                      iter = 2000, 
                      chains = 3, 
                      inits= "0", 
                      cores=3,
                      seed = 123, silent = 2, refresh = 0)

brms_bait_model_null <- brm(formula = Taken ~ 1, 
                       data = observations_matched, 
                       family = bernoulli(link = "logit"),
                       warmup = 1000, 
                      iter = 2000, 
                      chains = 3, 
                      inits= "0", 
                      cores=3,
                      seed = 123, silent = 2, refresh = 0)

LOO

We use leave-one-out cross validation to compare the written model against null model. Broadly, this tests the posterior predictive power of the given model against one with no predictors.

Show code

loo_compare(loo(brms_bait_model), loo(brms_bait_model_null), loo(brms_bait_model_st))

#>                      elpd_diff se_diff
#> brms_bait_model_st     0.0       0.0  
#> brms_bait_model      -12.1       5.6  
#> brms_bait_model_null -43.7      10.9

Based on the above comparison, our model with days and a region and station random effect performs better than a null model.

Show code

summary(brms_bait_model_st)

#>  Family: bernoulli 
#>   Links: mu = logit 
#> Formula: Taken ~ GroupLabel_2 * scale(Day) + (1 | Region) + (1 | Station) 
#>    Data: observations_matched (Number of observations: 1761) 
#>   Draws: 3 chains, each with iter = 2000; warmup = 1000; thin = 1;
#>          total post-warmup draws = 3000
#> 
#> Group-Level Effects: 
#> ~Region (Number of levels: 6) 
#>               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
#> sd(Intercept)     1.01      0.49     0.38     2.32 1.00      807
#>               Tail_ESS
#> sd(Intercept)     1392
#> 
#> ~Station (Number of levels: 280) 
#>               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
#> sd(Intercept)     1.23      0.23     0.78     1.70 1.01      616
#>               Tail_ESS
#> sd(Intercept)      749
#> 
#> Population-Level Effects: 
#>                            Estimate Est.Error l-95% CI u-95% CI Rhat
#> Intercept                      0.23      0.74    -1.20     1.69 1.00
#> GroupLabel_2Other             -2.86      0.58    -4.00    -1.71 1.00
#> scaleDay                       1.29      0.46     0.45     2.21 1.00
#> GroupLabel_2Other:scaleDay    -1.14      0.45    -2.06    -0.30 1.00
#>                            Bulk_ESS Tail_ESS
#> Intercept                      1497     1392
#> GroupLabel_2Other              2688     2262
#> scaleDay                       1459     1628
#> GroupLabel_2Other:scaleDay     1667     1544
#> 
#> Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).

Traceplots

We assess convergenve via inspection of traceplots and Rhat values. Traceplots show good evidence of convergence and no Rhat values are above 1.05.

Show code

bayesplot::mcmc_trace(brms_bait_model_st, facet_args = list(ncol = 3), regex_pars = c("GroupLabel_2", "Region", "Day", "lp"))

Figure 3: Traceplots for the binomial regression encounter model

Model Results

From our model, we can generate predictions of the probability a cat or other species will take a bait, having encountered it. This probability will vary with the ‘days’ since the bait was deployed.

Show code

ce_plot <- conditional_effects(brms_bait_model_st, effects = "Day:GroupLabel_2")

plot(ce_plot, plot = FALSE)[[1]] +
  scale_fill_manual(values = unname(delwp_cols[1:2]), name = "Species") + 
  scale_colour_manual(values = unname(delwp_cols[1:2]), name = "Species") + 
  ylab("Probability of taking bait, having encountered it") +
  ggplot2::theme_bw()

From the results we see that the probability a cat will take a bait increases the longer a bait is deployed. This does not appear to be the case for other species. This then raises the question on the various probabilities a bait will be taken by a cat under different encounter rates (by cat and others). To investigate these scenarios we will use the distribution of encounter rates present in the original data:

Show code

encounter_scenarios <- list()
encounter_scenarios[["cat_1"]] <- data.frame(GroupLabel_2 = "Cat", 
                                  Day = 1:38)
encounter_scenarios[["cat_2"]] <- data.frame(GroupLabel_2 = "Cat", 
                                  Day = seq(2, 38, by = 2))
encounter_scenarios[["cat_week"]] <- data.frame(GroupLabel_2 = "Cat", 
                                  Day = seq(7, 35, by = 7))
# encounter_scenarios_cat_once <- data.frame(GroupLabel_2 = "Cat", 
#                                   Day = round(runif(1,1,38)))

encounter_scenarios[["other_1"]] <- data.frame(GroupLabel_2 = "Other", 
                                  Day = 1:38)
encounter_scenarios[["other_5"]] <- data.frame(GroupLabel_2 = "Other", 
                                  Day = rep(1:38, each = 5))
encounter_scenarios[["other_10"]] <- data.frame(GroupLabel_2 = "Other", 
                                  Day = rep(1:38, each = 10))

postdraws <- list()
pp_day <- list()

for(i in 1:length(encounter_scenarios)) {
  
postdraws[[i]] <- posterior_predict(brms_bait_model_st, newdata = encounter_scenarios[[i]], re_formula = NA, ndraws = 1000) 

pp_day[[i]] <- data.frame(DayTaken = encounter_scenarios[[i]]$Day[apply(postdraws[[i]], 1, which.max)])

colnames(pp_day[[i]]) <- encounter_scenarios[[i]]$GroupLabel_2[1]

}

bind_cols_scenarios <- function(df1, df2, name) {
  new_df <- bind_cols(df1, df2) %>%
    mutate(Draw = 1:n()) %>%
    rowwise() %>%
    mutate(Scenario = name, 
           DayTaken = min(Cat, Other), 
           SpeciesTaken = case_when(DayTaken == Cat & DayTaken !=Other ~ "Cat", 
                                    DayTaken != Cat & DayTaken ==Other ~ "Other", 
                                    DayTaken == Cat & DayTaken == Other ~ "Same Day")) %>%
    ungroup()
  
  return(new_df)
}

scenario_joined <- bind_cols_scenarios(pp_day[[2]], pp_day[[5]], "Cat encounter every second day, \nothers have 5 encounters a day")  %>%
  bind_rows(bind_cols_scenarios(pp_day[[3]], pp_day[[5]], "Cat encounter every 7 days, \nothers have 5 encounters a day")) %>%
  bind_rows(bind_cols_scenarios(pp_day[[2]], pp_day[[6]], "Cat encounter every second day, \nothers have 10 encounters a day"))

scenario_joined_summary <- scenario_joined %>%
    group_by(Scenario, SpeciesTaken) %>%
    summarise(`Proportion %` = 100*n()/1000)

scenario_joined_summary %>%
  kbl("html", digits = 2, caption = "Simulated bait taking rates based on different encounter volumes") %>%
  kable_styling(bootstrap_options = "striped") %>%
  collapse_rows(1:2)

Table 3: Simulated bait taking rates based on different encounter volumes
Scenario	SpeciesTaken	Proportion %
Cat encounter every 7 days, others have 5 encounters a day	Cat	7.6
	Other	90.1
	Same Day	2.3
Cat encounter every second day, others have 10 encounters a day	Cat	11.5
	Other	80.8
	Same Day	7.7
Cat encounter every second day, others have 5 encounters a day	Cat	25.2
	Other	65.7
	Same Day	9.1

If other species encounter the baits a lot it will be less likely that cats encounter and take the baits first. Because probability of bait taking by cats, having encountered the bait is low early in deployment; even several encounters early on do not ensure bait taking.

Show code

rdraws <- sample(1:1000, 100) # change random sample number
ggplot(scenario_joined %>% filter(Draw %in% rdraws) %>% group_by(Scenario) %>% mutate(Draw = 1:100) %>% ungroup(), aes(x=Draw, y=DayTaken)) +
  geom_segment( aes(x=Draw, xend=Draw, y=0, yend=DayTaken, colour = SpeciesTaken)) +
  geom_point(aes(fill = SpeciesTaken), size=3, alpha=0.9, shape = 21) +
  theme_light() +
  coord_flip() +
  xlab("Bait number") +
  ylab("Days") +
  scale_fill_manual(values = c(Cat = "#e41a1c", `Same Day` = "#377eb8", Other = "#4daf4a")) + 
  scale_colour_manual(values = c(Cat = "#e41a1c", `Same Day` = "#377eb8", Other = "#4daf4a")) + 
  theme(
    panel.grid.major.y = element_blank(),
    panel.border = element_blank(),
    axis.ticks.y = element_blank(), legend.position = "bottom"
  ) + facet_wrap(~Scenario)

Figure 4: Simulation of 100 posterior draws showing the fait of 100 baits for various encounter volumes

The above plot visualises the variability in probability that a given bait will be consumed by a cat or other species. Unless cat-bait encounter rates are high, the probability the bait will be consumed by a cat is low.

Andersen, Per Kragh, and Richard D Gill. 1982. “Cox’s Regression Model for Counting Processes: A Large Sample Study.” The Annals of Statistics, 1100–1120.

Bürkner, Paul-Christian. 2017. “brms: An R Package for Bayesian Multilevel Models Using Stan.” Journal of Statistical Software 80 (1): 1–28. https://doi.org/10.18637/jss.v080.i01.

———. 2018. “Advanced Bayesian Multilevel Modeling with the R Package brms.” The R Journal 10 (1): 395–411. https://doi.org/10.32614/RJ-2018-017.

———. 2021. “Bayesian Item Response Modeling in R with brms and Stan.” Journal of Statistical Software 100 (5): 1–54. https://doi.org/10.18637/jss.v100.i05.

Mazerolle, Marc J. 2020. AICcmodavg: Model Selection and Multimodel Inference Based on (q)AIC(c). https://cran.r-project.org/package=AICcmodavg.

Terry M. Therneau, and Patricia M. Grambsch. 2000. Modeling Survival Data: Extending the Cox Model. New York: Springer.

Therneau, Terry M. 2022. A Package for Survival Analysis in r. https://CRAN.R-project.org/package=survival.

Vehtari, Aki, Jonah Gabry, Mans Magnusson, Yuling Yao, Paul-Christian Bürkner, Topi Paananen, and Andrew Gelman. 2020. “Loo: Efficient Leave-One-Out Cross-Validation and WAIC for Bayesian Models.” https://mc-stan.org/loo/.

Vehtari, Aki, Andrew Gelman, and Jonah Gabry. 2017. “Practical Bayesian Model Evaluation Using Leave-One-Out Cross-Validation and WAIC.” Statistics and Computing 27 (5): 1413–32.

Feral Cat Bait Analysis

Setup

Bait Fate Analysis

Bait fate analysis using logistic regression

Bait survival evaluation

Bait survival results

Bait encounter analysis

Bait encounter model

LOO

Traceplots

Model Results

References

Corrections

Citation