Location–scale models in ecology: handling heterogeneity in continuous, counts and proportions

step by step online tutorial

Authors

Shinichi Nakagawa

Santiago Ortega

Elena Gazzea

Malgorzata Lagisz

Anna Lenz

Erick Lundgren

Ayumi Mizuno*

Published

July 20, 2025

If you have any questions, errors or bug reports, please feel free to contact Ayumi Mizuno (amizuno@ualberta.ca / ayumi.mizuno5@gmail.com).

Some of the models in this tutorial take a long time to run. To make the tutorial faster and easier to follow, we have precomputed selected models and saved them as .rds files.

If you would like to re-run the code yourself and reproduce the results shown in the tutorial, please make sure to do the following:

  1. Download the following files:

  2. Create a folder structure like this on your machine:

    your-project-folder/
    ├── online_tutorial.qmd
    ├── data/
    │   └── (CSV files)
    └── Rdata/
        └── (RDS files from Google Drive)
  1. Adjust file paths if your folder structure is different. We use here::here()to help make paths more portable, but if you move or rename files/folders, make sure to update functions like readRDS() or read.csv() accordingly:
# this is example

dat <- read.csv(here("data", "XXX.csv"), header = TRUE) 
brms_m1 <- readRDS(here("Rdata", "brms_m1.rds"))

You will see similar code snippets throughout the tutorial. Feel free to modify them based on your own setup. If everything is placed correctly, the .qmd file should run without any issues.

Introduction

This tutorial provides a step-by-step guide to applying location-scale models in ecology, evolution, and environmental sciences. We focus on practical applications and demonstrate how to implement these models in R using the glmmTMB and brms packages.

After a brief introduction to both modeling frameworks, we present worked examples using real datasets. In the final section, we also discuss approaches for model comparison and selection.

Preparation

Load required packages

Our tutorial uses R statistical software and existing R packages, which you will first need to download and install.

This tutorial makes use of several R packages for data manipulation, model fitting, diagnostics, visualisation, and reporting:

  • dplyr, tibble, tidyverse - for efficient and tidy data manipulation.

  • brms, glmmTMB, arm - for fitting generalised linear mixed models.

  • cmdstanr - interfaces with the CmdStan backend to accelerate Bayesian sampling via within-chain parallelisation. is a C++ library for Bayesian inference, in order to enable within-chain parallelisation to speeding up the sampling process.

  • DHARMa, loo, MuMIn - for model diagnostics, cross-validation, and multi-model inference (note that both loo and Mumin contain functions with the same names, so call them with explicit namespaces, e.g. loo::loo( )).

  • ggplot2, patchwork, bayesplot, tidybayes - for flexible and publication-ready visualisations.

  • gt, kableExtra, knitr - for creating clean tables.

  • here - for consistent file path management across projects.

  • ape, TreeTools - for phylogenetic analyses and tree manipulation.

# Load required packages

pacman::p_load(
  ## data manipulation
  dplyr, tibble, tidyverse, broom, broom.mixed,
  
  ## model fitting
  ape, arm, brms, broom.mixed, cmdstanr, emmeans, glmmTMB, MASS, phytools, rstan, TreeTools,
  
  ## model checking and evaluation
  DHARMa, loo, MuMIn, parallel,
  
  ## visualisation
  bayesplot, ggplot2, patchwork, tidybayes,
  
  ## reporting and utilities
  gt, here, kableExtra, knitr
)

glmmTMB vs brms

glmmTMB is a powerful and flexible R package for fitting generalized linear mixed models (GLMMs), including models with random effect structures and scale (dispersion) part. It is built on the Template Model Builder (TMB) framework, which allows fast and efficient maximum likelihood estimation even for large and complex models.

brms is an R package that allows users to fit Bayesian generalized (non-)linear multilevel models using the probabilistic programming language Stan. It provides a user-friendly formula syntax similar to that of lme4 or glmmTMB, and supports a wide range of distributions, link functions, and advanced model components, including location-scale modeling.

Both packages are suitable for fitting location-scale models and are widely used in ecology and its related fields. Therefore, we selected them to illustrate how location-scale models can be practically applied in real data analysis.

While brms is a powerful and flexible package for Bayesian regression modeling, some readers may not be familiar with its usage. Below, we provide a brief introduction to fitting models using brms, focusing on the basic location-scale structure and key functions relevant to our analysis. You can also find some examples each section…

If you get stuck or are unsure about something, it might be helpful to check the below:

This example shows how to fit a simple location-scale model, where both the mean (\(\mu\)) and the variability (\(\sigma\)) of a continuous outcome variable \(y\) are modeled as functions of a predictor \(x\).

# Example dataset
# y is continuous response, x is a predictor
# specify the model using bf()
formula1 <- bf(
  y ~ 1 + x,  # location part
  sigma = ~ 1 + x # scale part - specified by sigma
)
#ASK Shinichi if this is correct
#' [EJL: Unlike in frequentist models, such as with glmmTMB, here we explicitly write the intercept term as '1']
# I do not think that is accurate. brms automatically includes the intercept - just like most R modeling frameworks. Writing `1 + x` is equivalent to `x` here... but may be I should ask Shinichi?

# generate default priors based on the formula and data
default_priors <- default_prior(
                        formula1,
                        data = dat,                             
                        family = gaussian()                               
                          )

# fit the model - you can change N of iter, warmup (and thin), and also chains.
  m1 <- brm(formula1,
                  data = dat,           
                  family = gaussian(),                   
                  prior = default_priors,                
                  iter = 2000,   # total iterations per Markov-chain (i.e., how many posterior samples are drawn, including warm-up)
                  warmup = 1000, # number of early draws used only for adapting the sampler (step-size, mass matrix). These samples are discarded
                  thin = 1,      # keep every n-th post-warm-up draw. 1 means keep all draws                    
                  chains = 2,    #  number of independent MCMC chains run in parallel. Provides a convergence check (via Rhat)                    
             )
summary(m1)

After fitting the model, you can use summary(m1) to inspect the estimated coefficients and sigma with 95% Credible Intervals, along with diagnostic statistics such as Rhat and effective sample size. To better understand how to interpret the model output, please refer to the “Bonus - brms” part in the next section.

Parallel Processing

Before fitting our models with brms, we configure some global options to optimize sampling speed using parallel processing:

  1. parallel::detectCores(): This function automatically detects the number of logical CPU cores available on your machine. This is a convenient way to ensure your code adapts to different computing environments.
  2. options(mc.cores = parallel::detectCores()): The mc.cores option is a global setting primarily used by rstan (the engine behind brms). It controls the number of MCMC chains that will be run in parallel. By setting it to detectCores(), you are telling brms to run as many chains concurrently as your CPU allows, significantly speeding up the overall sampling process.
  3. options(brms.threads = 6): The brms.threads option specifies the number of CPU threads that Stan’s internal operations can utilize within a single MCMC chain. This enables within-chain parallelisation, further accelerating computations, especially for complex or large datasets. The value 6 is an example; you can adjust this based on your specific CPU architecture and memory.

Threading is a powerful feature that enables you to split a chain into multiple parallel threads, significantly reducing computation time. However, it requires installing both cmdstanr and the underlying CmdStan backend.

Code
cmdstanr::install_cmdstan()
cmdstanr::check_cmdstan_toolchain()
cmdstanr::cmdstan_version()

These settings are crucial for making Bayesian model fitting with brms more efficient, particularly for complex models or large datasets.

parallel::detectCores()
[1] 12
options(mc.cores = parallel::detectCores())

options(brms.threads = 6)  # Set global default

brms models can be computationally intensive and take a significant amount of time to run. To streamline your workflow, we provide the pre-fit models in RDS files, allowing you to load them directly without needing to re-run the lengthy estimation process.

Specifying the Scale Component

Here’s how the scale component is handled in glmmTMB and brms, along with common parameter names for various distributions:

Distribution Scale Parameter (Example) glmmTMB Specification brms Specification (Example)
Gaussian \(\sigma\) dispformula = ~ ... bf(..., sigma ~ ...)
Negative Binomial \(\theta\) dispformula = ~ ... bf(..., shape ~ ...)
Conway-Maxwell-Poisson \(\nu\) dispformula = ~ ... bf(..., nu ~ ...)
Beta-Binomial \(\phi\) dispformula = ~ ... bf(..., phi ~ ...)

Note: In glmmTMB, dispformula is generally used to model the dispersion or scale parameter, regardless of its specific Greek letter notation, which varies by distribution.

The scale part varies depending on the distribution: for example, \(\sigma\) for Gaussian or \(\theta\) for negative binomial, \(\nu\) for Conway–Maxwell–Poisson, \(\phi\) for beta- binomial distribution (see the main text).

Fixed-effects location–scale model (model 1)

Model 1 is a fixed-effects location-scale model, which allows us to model both the mean (location) and variance (scale) of a response variable simultaneously. This is particularly useful when we suspect that the variance of the response variable may differ across groups or treatments.

Dataset overview

This dataset comes from a study by Cleasby et al. (2011), which investigated the effects of early-life food supplementation on adult morphology in a wild population of house sparrows (Passer domesticus). In particular, we focus on adult tarsus length as a measure of skeletal size. The dataset includes adult birds that either received supplemental food as chicks or not, and compares their tarsus length by treatment and sex.

Questions

  1. Does early-life food supplementation increase adult size? Specifically, does it increase the average tarsus length in adulthood?
  2. Does early-life food supplementation lead to lower variation in adult tarsus length?
  3. Are there sex-specific effects of food supplementation? Do the effects on mean or variance differ between males and females?

Variables included

The dataset includes adult birds that either received supplemental food as chicks or not, and compares their tarsus length by treatment and sex. We use the following variables:

  • Sex: Biological sex of the bird (“Male” or “Female”)
  • AdTarsus: Adult tarsus length (mm)
  • Treatment: Whether the bird received food supplementation as a chick (Fed) or not (Control)

Visualise the datasets

The plot shows how adult tarsus length varies by treatment (early-life food supplementation) and sex. Boxplots summarise the central tendency and spread, while the jittered points reveal the distribution of individual values. As shown in the plot, there is a clear tendency for reduced variability in the male treatment group (Fed), suggesting that early-life food supplementation may lead to more canalised development in males.

# load the datasets ----

dat_tarsus <- read.csv(here("data", "SparrowTarsusData.csv"), header = TRUE)

ggplot(dat_tarsus, aes(x = Treatment, y = AdTarsus, fill = Sex)) +
  geom_boxplot(outlier.shape = NA, alpha = 0.5, position = position_dodge(width = 0.8)) +
  geom_jitter(
    aes(color = Sex),
    size = 2, alpha = 0.7,
    position = position_jitterdodge(jitter.width = 0.2, dodge.width = 0.8)
  ) +
  scale_fill_manual(values = c("Male" = "#1f78b4", "Female" = "#e31a1c")) + 
  scale_color_manual(values = c("Male" = "#1f78b4", "Female" = "#e31a1c")) +
  labs(title = "Adult tarsus length by treatment and sex",
       x = "Treatment", y = "Tarsus length (mm)") +
  theme_classic() +
  theme(legend.position = "right")

Run models and interpret results

We fit and compare two types of models to understand the structure of adult tarsus length:

  • Location-only model: Estimates the mean of adult tarsus length as a function of sex and early-life food supplementation (treatment).

  • Location-scale model: Estimates both the mean and the variability (residual dispersion) of adult tarsus length, allowing us to examine whether treatment and sex influence not only the average trait value but also its individual variation.

This approach enables us to detect subtle patterns, such as sex-specific canalisation, that may not be captured when modeling the mean alone.

Model fitting

First, we fit a location-only model as the baseline model.

# location-only model ----

## log-transformation was applied to AdTarsus to reduce skewness and stabilize residual variance.
model_0 <- glmmTMB(
    log(AdTarsus) ~ 1 + Sex + Treatment + Sex:Treatment,
    data = dat_tarsus, 
    family = gaussian)

summary(model_0)
 Family: gaussian  ( identity )
Formula:          log(AdTarsus) ~ 1 + Sex + Treatment + Sex:Treatment
Data: dat_tarsus

      AIC       BIC    logLik -2*log(L)  df.resid 
   -221.0    -210.2     115.5    -231.0        59 


Dispersion estimate for gaussian family (sigma^2): 0.00158 

Conditional model:
                     Estimate Std. Error z value Pr(>|z|)    
(Intercept)           2.88728    0.01064  271.42   <2e-16 ***
SexMale               0.01043    0.01373    0.76    0.447    
TreatmentFed          0.01036    0.01436    0.72    0.471    
SexMale:TreatmentFed  0.01353    0.02034    0.67    0.506    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## to quantify the uncertainty in parameter estimates, we computed 95% confidence intervals using the confint() function.
## this function returns the lower and upper bounds for each fixed/random effect parameters. If a confidence interval does not include zero, it suggests that the corresponding predictor has a statistically significant effect (at approximately the 0.05 level).

confint(model_0) # check 95%CI
                           2.5 %     97.5 %   Estimate
(Intercept)           2.86643380 2.90813203 2.88728291
SexMale              -0.01648352 0.03734866 0.01043257
TreatmentFed         -0.01779914 0.03850937 0.01035511
SexMale:TreatmentFed -0.02633701 0.05340220 0.01353260

We can check the residuals of the model to assess the model fit and assumptions. The Q-Q plot should show points falling along a straight line.

# | label: model0_diagnostics - model1

# plot a q-q plot of residuals to visually assess the normality assumption
# the data points should fail approximately along the reference line
res <- residuals(model_0)

qqnorm(res) # visual check for normality of residuals
qqline(res) # reference line for normal distribution

The residuals mostly follow a straight line in the Q-Q plot, but there are some deviations, particularly at the lower and upper ends. This suggests that the model may not fully capture the distribution of the data, indicating some potential issues with normality or heteroscedasticity.

Then, we fit a location-scale model.

# location-scale model ---- 

model_1 <- glmmTMB(
    log(AdTarsus) ~ 1 + Sex + Treatment + Sex:Treatment, # location part
    dispformula = ~ 1 + Sex + Treatment + Sex:Treatment, # scale part
    data = dat_tarsus, 
    family = gaussian
    )
summary(model_1)
 Family: gaussian  ( identity )
Formula:          log(AdTarsus) ~ 1 + Sex + Treatment + Sex:Treatment
Dispersion:                     ~1 + Sex + Treatment + Sex:Treatment
Data: dat_tarsus

      AIC       BIC    logLik -2*log(L)  df.resid 
   -224.0    -206.7     120.0    -240.0        56 


Conditional model:
                     Estimate Std. Error z value Pr(>|z|)    
(Intercept)           2.88728    0.01006  286.87   <2e-16 ***
SexMale               0.01043    0.01343    0.78    0.437    
TreatmentFed          0.01036    0.01566    0.66    0.508    
SexMale:TreatmentFed  0.01353    0.01897    0.71    0.476    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Dispersion model:
                     Estimate Std. Error z value Pr(>|z|)    
(Intercept)          -3.27919    0.18898 -17.352  < 2e-16 ***
SexMale               0.07846    0.24397   0.322  0.74775    
TreatmentFed          0.27232    0.25520   1.067  0.28592    
SexMale:TreatmentFed -0.95054    0.36139  -2.630  0.00853 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
confint(model_1) 
                                2.5 %      97.5 %    Estimate
cond.(Intercept)           2.86755650  2.90700935  2.88728293
cond.SexMale              -0.01588533  0.03675044  0.01043255
cond.TreatmentFed         -0.02033051  0.04104072  0.01035511
cond.SexMale:TreatmentFed -0.02364145  0.05070669  0.01353262
disp.(Intercept)          -3.64959165 -2.90879561 -3.27919363
disp.SexMale              -0.39971908  0.55664476  0.07846284
disp.TreatmentFed         -0.22785555  0.77250192  0.27232319
disp.SexMale:TreatmentFed -1.65885084 -0.24223149 -0.95054117

Now we can compare the two models to see if the location-scale model provides a better fit to the data than the location-only model. We can use the anova() function to compare the two models based on their AIC values. Alternatively, we can use the model.sel() function from the MuMIn package to compare AICc values, which is more appropriate for small sample sizes.

# compare models ----
## we can use the anova() function to compare the two models of AIC 
anova(model_0, model_1)
Data: dat_tarsus
Models:
model_0: log(AdTarsus) ~ 1 + Sex + Treatment + Sex:Treatment, zi=~0, disp=~1
model_1: log(AdTarsus) ~ 1 + Sex + Treatment + Sex:Treatment, zi=~0, disp=~1 + Sex + Treatment + Sex:Treatment
        Df     AIC     BIC logLik deviance  Chisq Chi Df Pr(>Chisq)  
model_0  5 -221.03 -210.23 115.51  -231.03                           
model_1  8 -223.95 -206.68 119.98  -239.95 8.9248      3    0.03031 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## model.sel() from the MuMIn package can be used to compare AICc values.
model.sel(model_0, model_1)
Model selection table 
        cnd((Int)) dsp((Int)) cnd(Sex) cnd(Trt) cnd(Sex:Trt) dsp(Sex) dsp(Trt)
model_1      2.887     -3.279        +        +            +        +        +
model_0      2.887                   +        +            +                  
        dsp(Sex:Trt) df  logLik   AICc delta weight
model_1            +  8 119.976 -221.3  0.00  0.662
model_0               5 115.514 -220.0  1.34  0.338
Models ranked by AICc(x) 

The results of the location-only model (Model 0) and the location-scale model (Model 1) are summarised below.

Location-only model
Term Estimate Std. Error 95% CI (low) 95% CI (high)
(Intercept) 2.887 0.011 2.866 2.908
SexMale 0.010 0.014 −0.016 0.037
TreatmentFed 0.010 0.014 −0.018 0.039
SexMale:TreatmentFed 0.014 0.020 −0.026 0.053
Location-scale model (location part)
Term Estimate Std. Error 95% CI (low) 95% CI (high)
(Intercept) 2.887 0.010 2.868 2.907
SexMale 0.010 0.013 −0.016 0.037
TreatmentFed 0.010 0.016 −0.020 0.041
SexMale:TreatmentFed 0.014 0.019 −0.024 0.051
Location-scale model (dispersion part)
Term Estimate Std. Error 95% CI (low) 95% CI (high)
(Intercept) −3.279 0.189 −3.650 −2.909
SexMale 0.078 0.244 −0.400 0.557
TreatmentFed 0.272 0.255 −0.228 0.773
SexMale:TreatmentFed −0.951 0.361 −1.659 −0.242

Of course, we can also fit the location–scale model using the brms! Here we show how to fit the same model as above using brms. The results should be similar to those obtained with glmmTMB.

# specify the model using bf()
formula1 <- bf(
  log(AdTarsus) ~  1 + Sex + Treatment + Sex:Treatment, 
  sigma = ~ 1 + Sex + Treatment + Sex:Treatment
)

# generate default priors based on the formula and data
default_priors <- default_prior(
                        formula1,
                        data = dat_tarsus,                             
                        family = gaussian() # default link function for gaussian family                                 
                          )

# fit the model - you can change N of iter, warmup, thin, and also chains.
# adapt_delta = 0.95 helps to reduce divergent transitions
system.time(
  brms_g1 <- brm(formula1,
                  data = dat_tarsus,           
                  family = gaussian(),                   
                  prior = default_priors,                
                  iter = 2000,                          
                  warmup = 1000, 
                  thin = 1,                              
                  chains = 2,                            
                  control = list(adapt_delta = 0.95) 
             )
)

summary(brms_g1)
 Family: gaussian 
  Links: mu = identity; sigma = log 
Formula: log(AdTarsus) ~ 1 + Sex + Treatment + Sex:Treatment 
         sigma ~ 1 + Sex + Treatment + Sex:Treatment
   Data: dat_tarsus (Number of observations: 64) 
  Draws: 2 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 2000

Regression Coefficients:
                           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
Intercept                      2.89      0.01     2.86     2.91 1.00     1162
sigma_Intercept               -3.19      0.21    -3.56    -2.72 1.00     1041
SexMale                        0.01      0.01    -0.02     0.04 1.00      994
TreatmentFed                   0.01      0.02    -0.03     0.05 1.00      795
SexMale:TreatmentFed           0.01      0.02    -0.03     0.06 1.00      711
sigma_SexMale                  0.04      0.27    -0.52     0.56 1.00     1025
sigma_TreatmentFed             0.24      0.28    -0.30     0.77 1.00      917
sigma_SexMale:TreatmentFed    -0.87      0.41    -1.66    -0.01 1.00      890
                           Tail_ESS
Intercept                      1130
sigma_Intercept                1202
SexMale                        1040
TreatmentFed                    756
SexMale:TreatmentFed            865
sigma_SexMale                  1079
sigma_TreatmentFed             1082
sigma_SexMale:TreatmentFed     1050

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

First, you need to check the effective sample size (**_ESS) and Rhat values. ESS should be greater than 400 (Vehtari et al. 2021), and \(\hat{R}\) should be close to 1.0 (ideally < 1.01). If these conditions are not met, you may need to increase the number of iterations or adjust the model specification.

Then, we can check the output. It is divided into two parts: Location (mean) part (how the average changes) and Scale (dispersion) part (how the variability changes). In Gaussian data, the scale part is \(\sigma\)

Here is the explanation of the output table:


Estimate: posterior mean

Est.Err: standard error of posterior mean

l-95% CI and u-95% CI: Lower and upper bounds of the 95% credible interval (range where the true value lies with 95% probability, given the model and data)

Rhat: Convergence diagnostic. Should be close to 1.00. If >1.01, convergence may be poor.

Bulk_ESS and Tail_ESS: Effective sample sizes for bulk and tail distributions. Should be >400 for reliable estimates (larger is better).


Now, you can find the results from glmmTMB and brms are very close to each other, but there are some differences in the estimates and standard errors. This came from the different estimation methods used by the two packages. glmmTMB uses maximum likelihood estimation, while brms uses Bayesian estimation with Markov Chain Monte Carlo (MCMC) sampling.

Comparison of location-only model and location-scale model

There was no significant difference in the fit of the two models. location-scale model (model 1) had a lower AICc (-221.0) than location-only model (model 0: -220.0), with an AIC weight of 0.662 vs. 0.338 (see Model comparison tab). 

Note on AIC vs AICc:

While both AIC (Akaike Information Criterion) and AICc (AIC with correction) assess model fit by balancing goodness of fit and model complexity, AICc includes an additional correction for small sample sizes. When the sample size is limited relative to the number of estimated parameters, AICc is generally preferred because it reduces the risk of overfitting. As a result, AIC and AICc values may differ slightly, and model rankings based on them may also vary.

Interpretation of location-scale model :

Location (mean) part:

  • There was no significant effect of treatment, sex, or their interaction on the mean of log(AdTarsus) (all 95% CIs include 0).

Scale (dispersion) part:

  • There are a significant negative interaction between sex and treatment (SexMale:TreatmentFed; \(\beta_{[\text{interaction}]}^{(s)}\) = -0.95, 95% CI = [-1.66, -0.24]). The estimate -0.95 in the dispersion model indicates that the residual variance in log(AdTarsus) is significantly lower in Fed males than in Control males.
  • This indicates that variance in adult tarsus length was significantly reduced among males that received food supplementation as chicks.
  • Neither treatment nor sex alone had a significant effect on variance (i.e., the reduction was specific to supplemented males).

Conclusion

Q1. Does early-life food supplementation increase adult size? Specifically, does it increase the average tarsus length in adulthood?

Answer: No clear evidence. In both the location-only and location-scale models, the effect of feeding (treatment) on the mean adult tarsus length was small and not statistically significant. This suggests that food supplementation did not lead to a measurable increase in average tarsus length.

Q2. Does early-life food supplementation lead to lower variation in adult tarsus length??

Answer: Partially yes - especially in males. The location-scale model revealed a significant reduction in variance in the male treatment group (Fed) compared to the male control group. This was supported by a significant negative interaction between sex and treatment in the dispersion model. In contrast, females showed no significant difference in variance between treatment groups. This indicates that early-life food supplementation reduced size variation only in males, not across all individuals.

Q3. Are there sex-specific effects of food supplementation? Do the effects on mean or variance differ between males and females?

Answer: Yes. The male treatment group (Fed) showed significantly reduced variance in adult tarsus length compared to the control group (Control), while females did not show a significant difference in variance between treatment groups. There were no significant differences in mean tarsus length between sexes or treatments. This pattern suggests that early-life food supplementation may canalise trait development in males, leading to more uniform adult morphology under favourable nutritional conditions.


Although the model comparison did not show a strong difference in overall fit between the location-only and location-scale models (\(\Delta AICc = 1.3\)), the location-scale model revealed an important and previously overlooked pattern:

Early-life food supplementation significantly reduced trait variance in males, but not in females.

This result would have been missed in a traditional location-only analysis that focuses solely on mean differences. By modeling both the mean and the dispersion, we were able to detect a sex-specific canalisation effect, highlighting the value of using location-scale models when investigating trait variability and developmental plasticity.

Adding random effects in the location part only (model 2)

Model 2 is a location-scale model with random effects in the location part only. This allows us to account for individual-level variation in the mean response while still modeling the dispersion separately.

Dataset overview

This dataset comes from Drummond et al (2025), which examined the benefits of brood reduction in the blue-footed booby (Sula nebouxii).Focusing on two-chick broods, the research explored whether the death of one chick (brood reduction) primarily benefits the surviving sibling (via increased resource acquisition) or the parents (by lessening parental investment). In this species, older chicks establish a strong dominance hierarchy over their younger siblings. Under stressful environmental conditions, this often results in the death of the second-hatched chick, with parental intervention being extremely rare.

For this tutorial, we will specifically examine data from two-chick broods where both chicks survived to fledgling. Our analysis will focus on how hatching order influences the body condition of chicks at fledgling.

Questions

  1. Do first hatched chicks have a higher body mass than second-hatched chicks?
  2. Does the variability in body mass differ between first and second-hatched chicks?

Variables included

  • SMI: Scaled Mass Index, a measure that accounts for body mass relative to size
  • RANK: The order in which chicks hatched (1 for first-hatched, 2 for second-hatched)
  • NEST_ID: Identifier for the nest where the chicks were raised
  • YEAR: The year in which the chicks were born

Visualise the dataset

The plot displays the distribution of body condition, measured as Scaled Mass Index (SMI), for blue-footed booby chicks based on their hatching order. Violin plots illustrate the overall density distribution of SMI for both “First-Hatched” and “Second-Hatched” chicks. Each individual empty circle represents the SMI of a single chick. The plot visually highlights individual variability and allows for the comparison of both central tendency (black solid line) and spread of body condition between the two hatching orders.

dat <- read_csv(here("data","SMI_Drummond_et_al_2025.csv")) %>%subset(REDUCTION == 0)

dat<- dat%>%
  dplyr::select(-TIME, -HATCHING_DATE,-REDUCTION,-RING) %>%
  mutate(SMI = as.numeric(SMI), # Scaled mass index
         NEST = as.factor(NEST), 
         WORKYEAR = as.factor(WORKYEAR),
         RANK = factor(RANK, levels = c("1", "2")))


ggplot(dat, aes(x = RANK, y = SMI, fill = RANK, color = RANK)) +
 geom_violin(aes(fill = RANK),
              color = "#8B8B83",
              width = 0.8, 
              alpha = 0.3,
              position = position_dodge(width = 0.7)) +
  geom_jitter(aes(color = RANK),
              size = 3,
              alpha = 0.4,
              shape = 1,
              position = position_jitterdodge(dodge.width = 0.5, jitter.width = 0.15)) + 
  stat_summary(fun = mean,              
               geom = "crossbar",       
               width = 0.1,             
               color = "black",         
               linewidth = 0.5,         
               position = position_dodge(width = 0.7))+
  labs(
    title = "Body Condition (SMI) by Hatching Order",
    x = "Hatching Order",
    y = "Scaled Mass Index (g)"
  ) +
  scale_fill_manual(
    values = c("1" = "#1F78B4", "2" = "#E31A1C") # 
  ) +
  scale_color_manual(
    values = c("1" = "#1F78B4", "2" = "#E31A1C") # 
  ) +
  scale_x_discrete(
    labels = c("1" = "First-Hatched", "2" = "Second-Hatched"),
    expand = expansion(add = 0.5)
  ) +
  theme_classic(base_size = 16) +
  theme(
    axis.text = element_text(color = "#6E7B8B", size = 14),
    axis.title = element_text(color = "#6E7B8B", size = 14),
    legend.position = "none",
    axis.text.x = element_text(angle = 0, hjust = 0.5)
  )

Run models and interpret results

All models incorporate NEST_ID and YEAR random effects to account for non-independence of chicks within the same nest and observations within the same year.

The models are as follows:

  1. Location-only model: Estimates the average difference in Scaled Mass Index (SMI) between the two hatching orders.
  2. Location-scale model: Estimates both the average difference (location) and differences in the variability (scale) of SMI between hatching orders.

Following the previous section, we first fit a location-only model as our baseline and applied a log-transformation to SMI.

model2_1<-glmmTMB(log(SMI) ~ 1 + RANK + (1|NEST)+(1|WORKYEAR),
              family = gaussian(), data=dat)


summary(model2_1)
 Family: gaussian  ( identity )
Formula:          log(SMI) ~ 1 + RANK + (1 | NEST) + (1 | WORKYEAR)
Data: dat

      AIC       BIC    logLik -2*log(L)  df.resid 
  -8048.8   -8016.5    4029.4   -8058.8      4732 

Random effects:

Conditional model:
 Groups   Name        Variance Std.Dev.
 NEST     (Intercept) 0.002434 0.04934 
 WORKYEAR (Intercept) 0.008858 0.09412 
 Residual             0.008245 0.09080 
Number of obs: 4737, groups:  NEST, 2873; WORKYEAR, 24

Dispersion estimate for gaussian family (sigma^2): 0.00824 

Conditional model:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  7.35953    0.01973   372.9  < 2e-16 ***
RANK2       -0.01872    0.00272    -6.9 5.91e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
confint(model2_1)
                                   2.5 %      97.5 %    Estimate
(Intercept)                   7.32085592  7.39820911  7.35953252
RANK2                        -0.02405065 -0.01338785 -0.01871925
Std.Dev.(Intercept)|NEST      0.04463325  0.05454148  0.04933927
Std.Dev.(Intercept)|WORKYEAR  0.07008890  0.12638674  0.09411859

Then we use the ‘DHARMa’ package to simulate residuals and plot them automatically. This step is crucial for checking the model’s assumptions, such as normality and homoscedasticity of residuals.

# | label: model_diagnostics - model0
simulationOutput <- simulateResiduals(fittedModel = model2_1, plot = F)
plot(simulationOutput)

The QQ plot and Kolmogorov-Smirnov (KS) test reveal that the model’s residuals aren’t uniformly distributed, hinting at potential issues with the chosen distribution or the model’s underlying structure. While tests for overdispersion and outliers showed no significant concerns, the boxplots and Levene test highlight a lack of uniformity within residual groups and non-homogeneous variances. This suggests the model doesn’t fully capture the data’s complexity and might benefit from adjustments, perhaps by incorporating a location-scale model to better handle varying dispersion.

# location-scale model ---- 

model2_2 <- glmmTMB(
    log(SMI) ~ 1 + RANK + (1|NEST)+(1|WORKYEAR),
    dispformula = ~ 1 + RANK,     
    data = dat, 
    family = gaussian
    )
summary(model2_2)
 Family: gaussian  ( identity )
Formula:          log(SMI) ~ 1 + RANK + (1 | NEST) + (1 | WORKYEAR)
Dispersion:                ~1 + RANK
Data: dat

      AIC       BIC    logLik -2*log(L)  df.resid 
  -8070.8   -8032.1    4041.4   -8082.8      4731 

Random effects:

Conditional model:
 Groups   Name        Variance Std.Dev.
 NEST     (Intercept) 0.002430 0.04929 
 WORKYEAR (Intercept) 0.008595 0.09271 
 Residual                   NA      NA 
Number of obs: 4737, groups:  NEST, 2873; WORKYEAR, 24

Conditional model:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  7.36075    0.01944   378.7  < 2e-16 ***
RANK2       -0.01880    0.00271    -6.9 3.93e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Dispersion model:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -2.47026    0.02242 -110.20  < 2e-16 ***
RANK2        0.13035    0.02687    4.85 1.23e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
confint(model2_2)
                                        2.5 %      97.5 %    Estimate
cond.(Intercept)                   7.32265786  7.39885037  7.36075412
cond.RANK2                        -0.02411510 -0.01349325 -0.01880417
disp.(Intercept)                  -2.51419214 -2.42631943 -2.47025578
disp.RANK2                         0.07768165  0.18301526  0.13034846
cond.Std.Dev.(Intercept)|NEST      0.04462552  0.05445269  0.04929482
cond.Std.Dev.(Intercept)|WORKYEAR  0.06903183  0.12450065  0.09270657

Finally, we compare the location-scale model with the location-only model based on AICc values (model.sel() from the MuMIn package).

model.sel(model2_1, model2_2)
Model selection table 
         cnd((Int)) dsp((Int)) cnd(RAN) dsp(RAN) df   logLik    AICc delta
model2_2      7.361      -2.47        +        +  6 4041.417 -8070.8  0.00
model2_1      7.360                   +           5 4029.421 -8048.8 21.99
         weight
model2_2      1
model2_1      0
Models ranked by AICc(x) 
Random terms (all models): 
  cond(1 | NEST), cond(1 | WORKYEAR)

The results of the location-only model (Model 0) and the location-scale model (Model 1) are summarized below.

Location-only model
Term Estimate Std. Error 95% CI (low) 95% CI (high)
(Intercept) 7.360 0.020 7.321 7.398
RANK2 −0.019 0.003 −0.024 −0.013
Location-scale model (location part)
Term Estimate Std. Error 95% CI (low) 95% CI (high)
(Intercept) 7.361 0.019 7.323 7.399
RANK2 −0.019 0.003 −0.024 −0.013
Location-scale model (dispersion part)
Term Estimate Std. Error 95% CI (low) 95% CI (high)
(Intercept) −2.470 0.022 −2.514 −2.426
RANK2 0.130 0.027 0.078 0.183

Here is how to fit the same location-scale models using the brms package for comparison.

Location-only model

m1 <- bf(log(SMI) ~ 1 + RANK + (1|NEST) + (1|WORKYEAR))
prior1<-default_prior(m1, data = dat, family = gaussian())

fit1 <- brm(
  m1,
  prior = prior1,
  data = dat,
  family = gaussian(),
  iter = 6000,     
  warmup = 1000,   
  chains = 4,  cores=4,
  backend = "cmdstanr",
  control = list(
    adapt_delta = 0.99,  # Keep high if you have divergent transitions
    max_treedepth = 15   # Keep high if hitting max_treedepth warnings
    ),
  seed = 123,      
  refresh = 500    # Less frequent progress updates (reduces overhead)
)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: log(SMI) ~ (1 | NEST) + (1 | WORKYEAR) + RANK 
   Data: dat (Number of observations: 4737) 
  Draws: 4 chains, each with iter = 6000; warmup = 1000; thin = 1;
         total post-warmup draws = 20000

Multilevel Hyperparameters:
~NEST (Number of levels: 2873) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.05      0.00     0.04     0.05 1.00     4028     7765

~WORKYEAR (Number of levels: 24) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.10      0.02     0.08     0.14 1.00     5145     9062

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     7.36      0.02     7.32     7.40 1.00     2947     5200
RANK2        -0.02      0.00    -0.02    -0.01 1.00    25883    12974

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.09      0.00     0.09     0.09 1.00     5732    10919

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Location-scale model

m2 <- bf(log(SMI) ~ 1 + RANK + (1|NEST) + (1|WORKYEAR),
         sigma~ 1 + RANK)
prior2 <- default_prior(m2,data = dat,family = gaussian())

fit2 <- brm(
 m2,
 prior= prior2,
 data = dat,
 family = gaussian(),
 iter = 6000,     
 warmup = 1000,   
 chains = 4,  cores=4,
 backend = "cmdstanr",
 control = list(
   adapt_delta = 0.99,  # Keep high if you have divergent transitions
   max_treedepth = 15   # Keep high if hitting max_treedepth warnings
 ),
 seed = 123,      
 refresh = 500    # Less frequent progress updates (reduces overhead)
)
 Family: gaussian 
  Links: mu = identity; sigma = log 
Formula: log(SMI) ~ 1 + RANK + (1 | NEST) + (1 | WORKYEAR) 
         sigma ~ 1 + RANK
   Data: dat (Number of observations: 4737) 
  Draws: 4 chains, each with iter = 6000; warmup = 1000; thin = 1;
         total post-warmup draws = 20000

Multilevel Hyperparameters:
~NEST (Number of levels: 2873) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.05      0.00     0.04     0.05 1.00     3871     8155

~WORKYEAR (Number of levels: 24) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.10      0.02     0.07     0.14 1.00     3937     7503

Regression Coefficients:
                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept           7.36      0.02     7.32     7.40 1.00     1826     4007
sigma_Intercept    -2.47      0.02    -2.51    -2.43 1.00     5963    11191
RANK2              -0.02      0.00    -0.02    -0.01 1.00    31665    14172
sigma_RANK2         0.13      0.03     0.08     0.18 1.00    16024    15761

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Model comparison

f1loo <- loo::loo(fit1)
f2loo <- loo::loo(fit2)

#Model comparison 
fc <- loo::loo_compare(f1loo, f2loo)
fc

#     elpd_diff se_diff
# fit2   0.0       0.0  
# fit1 -10.3       7.6  

LOO (Leave-One-Out) model comparison helps us estimate which Bayesian model will best predict new data. The elpd_diff (Estimated Log Predictive Density difference) tells us the difference in predictive accuracy, with a more positive number indicating better performance. The se_diff (Standard Error of the Difference) provides the uncertainty around this difference. In our results, fit2 (the location-scale model) has an elpd_diff of 0.0, meaning it’s the reference or best-performing model. fit1 (the location-only model) has an elpd_diff of -10.3 with an se_diff of 7.6, indicating that fit2 is estimated to be substantially better at predicting new data than fit1, although there is some uncertainty in that difference. As a rule of thumb, if the absolute value of the elpd_diff is more than twice the se_diff, we can consider the difference as meaningful (\(|elpd\_diff| > 2 * se\_diff\)).

Summary of brms model results

Model 1: Posterior Summary
Term Estimate Std.error 95% CI (low) 95% CI (high)
Location Model
Intercept 7.359 0.022 7.316 7.403
Second hatched chick −0.019 0.003 −0.024 −0.013
Random Effects
Nest ID 0.049 0.003 0.044 0.054
Year 0.102 0.017 0.075 0.141
Residual Standard Deviation
Sigma 0.091 0.001 0.088 0.094
Model 2: Posterior Summary
Term Estimate Std.error 95% CI (low) 95% CI (high)
Location Submodel
Intercept 7.360 0.021 7.318 7.402
Second hatched chick −0.019 0.003 −0.024 −0.014
Scale Submodel
Intercept (sigma) −2.469 0.022 −2.514 −2.426
Second hatched chick (sigma) 0.130 0.027 0.077 0.182
Random effects
Nest ID 0.049 0.003 0.044 0.054
Year 0.100 0.017 0.073 0.139

Comparison of location-only model and location-scale model

The location-scale model (Model 2) was the most-supported model based on AICc values (-8070.8) compared to the location-only model (-8048.8; see Model comparison tab for details).

Interpretation of location-scale model :

Location (mean) part:

  • There was a conclusive effect of hatching order on the mean of log(SMI), with second-hatched chicks having a lower SMI than first-hatched chicks (\(\beta_{[\text{first-second}]}^{(l)}\) = -0.018, 95% CI = [-0.024, -0.013]).

Scale (dispersion) part:

  • The scale component revealed a conclusive difference in the variability of SMI between hatching orders. The second-hatched chicks exhibited greater variability in SMI compared to first-hatched chicks (\(\beta_{[\text{first-second}]}^{(s)}\) = 0.130, 95% CI = [0.077, 0.183]).

Location (mean) random effects:

  • There is very little variation in the average body condition across different nests (\(\sigma_{[\text{Nest\_ID}]}^{(l)}\) = 0.049, 95% CI [0.044, 0.054]), suggesting that most nests have similar average body conditions for their chicks.

  • There is also very little variation in the average body condition across different years (\(\sigma_{[\text{Year}]}^{(l)}\) = 0.092, 95% CI [0.069, 0.124]),implying generally consistent average body conditions from year to year.

Conclusion

Do first hatched chicks have a higher body mass than second-hatched chicks?

Answer: Answer: Yes, first-hatched chicks have a significantly higher Scaled Mass Index (SMI) than second-hatched chicks at fledgling, with a mean difference of approximately 0.018 on the log scale.

Does the variability in body mass differ between first and second-hatched chicks?

Answer: Yes, the variability in body mass (SMI) is significantly greater in second-hatched chicks compared to first-hatched chicks. This indicates that second-hatched chicks show more variation in their body condition at fledgling.

Double-hierarchical location-scale model (Model 3)

Following up on the previous example analyzing the scaled mass index (SMI) at fledgling Drummond et al(2025), we now extend our approach by fitting a double-hierarchical Gaussian location-scale model. This extends Model 2 by introducing nest identity as a correlated random effect in both the location (mean) and scale (variance) components. This extension allows us to examine not only how average log(SMI) and its variability differ across nests, but also whether these two forms of nest-level variation are correlated. The main distinction between Model 2 and Model 3 lies in how they handle variation. Model 2 includes random effects only for the mean, assuming constant variance across groups. In contrast, Model 3 accounts for group-level variability by including random effects in both the mean and the variance components. This makes Model 3 particularly useful when:

  • Group-level variability (heteroscedasticity) is present
  • There may be a correlation between group-level means and variances (e.g., nests with higher average SMI also show more variability)

  • Both the average and variability are biologically meaningful or expected to vary across groups

By modeling variation explicitly, Model 3 can yield better fit and more nuanced inference, especially in ecological or evolutionary contexts where both central tendencies and dispersion carry important signals.

Dataset overview

We use the same dataset as in Model 2, but extend the location-scale framework by moving from Model 2 to Model 3, a double-hierarchical location-scale model. This extension allows us to model group-level variation in both the mean (location) and the variance (scale), as well as their potential correlation.

Question

Variation between and within nests Do nests with generally healthier (heavier) chicks also tend to have chicks that are more similar in their body condition?

Run models and interpret results

In this section, we compare Model 1 and Model 3 using the brms package, since glmmTMB does not support estimating correlations between random effects in the location and scale components. Model 3 - the double-hierarchical location-scale model - extends Model 2 by incorporating random effects for NEST_ID in both the mean and the variability of SMI. Crucially, it also estimates the correlation between these nest-level effects. For instance, a positive correlation would indicate that nests with higher average SMI also exhibit greater variability in SMI.

m3 <- bf(log(SMI) ~ 1 + RANK+ (1|q|NEST) + (1|WORKYEAR),
         sigma~ 1 + RANK + (1|q|NEST) + (1|WORKYEAR))
prior3<-default_prior(m3,data = dat,family = gaussian())
 
fit3 <- brm(
  m3,
  prior= prior3,
  data = dat,
  family = gaussian(),
  iter = 6000,     
  warmup = 1000,   
  chains = 4,  
  cores=4,
  backend = "cmdstanr",
  control = list(
    adapt_delta = 0.99,  
    max_treedepth = 15  
    ),
  seed = 123,      
  refresh = 500)
 Family: gaussian 
  Links: mu = identity; sigma = log 
Formula: log(SMI) ~ (1 | q | NEST) + (1 | WORKYEAR) + RANK 
         sigma ~ (1 | q | NEST) + (1 | WORKYEAR) + RANK
   Data: dat (Number of observations: 4737) 
  Draws: 4 chains, each with iter = 6000; warmup = 1000; thin = 1;
         total post-warmup draws = 20000

Multilevel Hyperparameters:
~NEST (Number of levels: 2873) 
                               Estimate Est.Error l-95% CI u-95% CI Rhat
sd(Intercept)                      0.05      0.00     0.04     0.05 1.00
sd(sigma_Intercept)                0.36      0.02     0.32     0.40 1.00
cor(Intercept,sigma_Intercept)    -0.46      0.07    -0.58    -0.33 1.00
                               Bulk_ESS Tail_ESS
sd(Intercept)                      3596     7826
sd(sigma_Intercept)                4668     9783
cor(Intercept,sigma_Intercept)     3805     7596

~WORKYEAR (Number of levels: 24) 
                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)           0.10      0.02     0.07     0.14 1.00     5949    10071
sd(sigma_Intercept)     0.28      0.06     0.18     0.41 1.00     8421    12541

Regression Coefficients:
                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept           7.36      0.02     7.32     7.40 1.00     3428     6959
sigma_Intercept    -2.52      0.07    -2.65    -2.38 1.00     5030     8529
RANK2              -0.01      0.00    -0.02    -0.01 1.00    24251    17886
sigma_RANK2         0.13      0.03     0.07     0.19 1.00    19128    15170

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

We compare the double-hierarchical location-scale model (Model 3) with the previous location-only model (Model 1) to examine whether the added complexity of Model 3 leads to a better fit to the data.

f1loo <- loo::loo(fit1)
f3loo <- loo::loo(fit3)

#Model comparison 
fc2 <- loo::loo_compare(f1loo, f3loo)
fc2
#  elpd_diff se_diff
# fit3    0.0       0.0 
# fit1 -357.2      37.4 

The impact of including random effects in the scale component becomes clear when comparing the results of Model 1 with those of the double-hierarchical location-scale model (Model 3).

Comparison of location-only model and location-scale model

The model comparison results indicate that the double-hierarchical location-scale model (Model 3) is the most supported by the data, as it has the lowest LOO (Leave-One-Out Cross-Validation) information criterion value. This suggests that this model provides the best fit to the data while accounting for both the average differences in body condition and the variability in body condition between first and second-hatched chicks.

Interpretation of location-scale model :

Scale (dispersion) random effects:

  • There is notable variation in body condition within nests (\(\sigma_{[\text{Nest\_ID}]}^{(s)}\) = 0.37, 95% CI [0.316, 0.399]). This indicates that some nests consistently produce chicks with more consistent body conditions than others.

  • The consistency of chick body conditions varies notably across work years (\(\sigma_{[\text{Year}]}^{(s)}\) = 0.277, 95% CI [0.183, 0.411]), implying some years yield chicks with more consistent body conditions than others.

Correlation between location and scale random effects:

  • The correlation between the nest-specific random effects for the average body condition and the variability in body condition is negative (\(\rho_{[\text{Nest\_ID}]}\) = -0.457, 95% CI [-0.584, -0.326]). This suggests that nests with higher average body conditions tend to produce chicks with more consistent body condition values (less dispersion).

Conclusion

Do nests with generally healthier (heavier) chicks also tend to have chicks that are more similar in their body condition?

Answer: Yes, there’s a tendency for nests with higher average chick body condition to also exhibit greater consistency among their chicks’ body conditions. This is supported by a negative correlation between nest-specific average body condition and its variability.

Beyond Gaussian 1

This dataset comes from Mizuno and Soma (2023), which investigates the visual preferences of estrildid finches. The study measured how often birds gazed at different visual stimuli (white dots vs. white stripes) under two conditions: food-deprived and food-supplied. The research originally tested the sensory bias hypothesis, which suggests that a pre-existing preference for whitish, round objects (like seeds) may have influenced the evolution of plumage patterns.

Here, to keep things simple, we use only a subset of the data related to the white dot pattern stimulus. We analyse how gaze frequency toward dot stimuli differs between the two conditions (food-deprived vs. food-supplied), using a location-scale model with a negative binomial distribution.

Although this model includes two random effects (individual and species), it follows the same basic structure as Model 2 - the only difference is that we use a non-Gaussian distribution (negative binomial) instead of a Gaussian one.

Questions

  1. Do birds gaze at dot patterns more (or less) when food-deprived compared to food-supplied?
  2. Does the variability in gaze responses differ between conditions?

Variables included

The dataset contains the following variables:

  • species and phylo: The species of the bird (12 species).
  • id: The individual bird’s ID.
  • sex: Male or female.
  • condition: The condition under which the data was collected (food-deprived or food-supplied).
  • frequency: The total frequency of gazes towards the stimulus over a 1-hour period.
  • p_dot: Having plumage dot patterns or not.
  • termite: The presence of termite-eating diet.

Visualise the datasets

First, we visualise the data to understand how gaze frequency varies by condition and species.

set.seed(42) 

# load the dataset ----
dat_pref <- read.csv(here("data", "AM_preference.csv"), header = TRUE)

dat_pref <- dat_pref %>%
  dplyr::select(-stripe, -full, -subset) %>%
  rename(frequency = dot) %>%
  mutate(species = phylo, across(c(condition, sex), as.factor))
  
ggplot(dat_pref, aes(x = condition,
                     y = frequency)) +
    geom_violin(color = "#8B8B83", fill = "white",
              width = 1.2, alpha = 0.3) +
  geom_jitter(aes(shape = sex, color = sex),
              size = 3, alpha = 0.8,
              width = 0.15, height = 0) +
  labs(
    title = "Total frequency of gazes towards dot patterns",
    x = "Condition",
    y = "Total frequency of gazes (1hr)"
  ) +
  scale_shape_manual(values = c("M" = 17, "F" = 16),
                     labels = c("M" = "Male", "F" = "Female")) +
  scale_color_manual(values = c("M" = "#009ACD", "F" = "#FF4D4D"),
                     labels = c("M" = "Male", "F" = "Female")) +
  theme_classic(base_size = 16) +
  theme(
    axis.text = element_text(color = "#6E7B8B", size = 14),
    axis.title = element_text(color = "#6E7B8B", size = 14),
    legend.title = element_text(color = "#6E7B8B"),
    legend.text = element_text(color = "#6E7B8B"),
    legend.position = "right",
  axis.text.x = element_text(angle = 45, hjust = 1)
  )

The plot breaks down the same gaze data by species. Each point is a bird’s gaze frequency under a condition. Colored lines represent species-specific means across conditions, helping to visualise - how different species vary in their average gazing behaviour, whether species respond differently to food deprivation. Faceting by species can help compare species-level patterns.

ggplot(dat_pref, aes(x = condition, y = frequency)) +
  geom_point(alpha = 0.5) +
  labs(
    title = "Total frequency of gazes towards dot patterns by species",
    x = "Condition",
    y = "Total frequency of gazes (1hr)"
  ) +
  stat_summary(fun = mean, geom = "line", aes(group = species, color = species)) +
  theme_classic() +
  facet_wrap(~ species)

Run models and interpret results

We fit and compare two types of models to analyse gaze frequency toward white dot patterns using the glmmTMB package:

  • Location-only model: Estimates how the mean gaze frequency varies between the two experimental conditions (food-deprived vs. food-supplied).

  • Location-scale model: Estimates how both the mean and the variability (dispersion) of gaze frequency vary between conditions.

These models help us examine whether internal states such as hunger (represented by food deprivation) influence not only the average number of gazes, but also the species/individual variability in gaze behaviour. Such variability may reflect differences in exploratory tendencies among species/individuals. Please note that in this section, species are not modelled with phylogenetic relatedness. For an example of how to fit a location–scale phylogenetic regression model, see Section Model selection (example 2).

Model fitting

We use negative binomial model here, as the response variable frequency is a count of gazes and may exhibit overdispersion. In The glmmTMB package, we can specify the family as nbinom2 for negative binomial distribution with a log link function.

# location model ----
## the mean (location) of gaze frequency with condition and sex as fixed effects; assumes constant variance.

model_0 <- glmmTMB(
  frequency ~ 1 +  condition + sex + (1 | species) + (1 | id), # location part (mean)
  data = dat_pref,
  family = nbinom2(link = "log")
)
 
summary(model_0)
 Family: nbinom2  ( log )
Formula:          frequency ~ 1 + condition + sex + (1 | species) + (1 | id)
Data: dat_pref

      AIC       BIC    logLik -2*log(L)  df.resid 
   2176.3    2195.8   -1082.2    2164.3       184 

Random effects:

Conditional model:
 Groups  Name        Variance Std.Dev.
 species (Intercept) 0.3725   0.6103  
 id      (Intercept) 0.1353   0.3679  
Number of obs: 190, groups:  species, 12; id, 95

Dispersion parameter for nbinom2 family (): 1.72 

Conditional model:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)        4.84086    0.22151  21.854  < 2e-16 ***
conditionsupplied -0.92168    0.12080  -7.630 2.36e-14 ***
sexM              -0.06673    0.14422  -0.463    0.644    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
confint(model_0)
                                 2.5 %     97.5 %   Estimate
(Intercept)                  4.4067149  5.2750101  4.8408625
conditionsupplied           -1.1584490 -0.6849075 -0.9216782
sexM                        -0.3493977  0.2159315 -0.0667331
Std.Dev.(Intercept)|species  0.3467243  1.0743326  0.6103255
Std.Dev.(Intercept)|id       0.2105812  0.6427136  0.3678905

For Gaussian models, we can use the Q-Q plot to show whether the residuals fall along a straight line. However, for non-Gaussian models, we can use the DHARMa package to check the residuals. The DHARMa package provides a set of diagnostic tools for generalized linear mixed models (GLMMs) and allows us to simulate residuals and check for overdispersion, outliers, and uniformity.

# location-only model (NB) ----
# main diagnostic plots
model0_res <- simulateResiduals(model_0, plot = TRUE, seed = 42)

# formal test for over/underdispersion
testDispersion(model0_res) 


    DHARMa nonparametric dispersion test via sd of residuals fitted vs.
    simulated

data:  simulationOutput
dispersion = 0.45606, p-value = 0.456
alternative hypothesis: two.sided

The DHARMa diagnostics indicate no issues with overdispersion or outliers (Dispersion test: p = 0.456; Outlier test: p = 0.88). However, the KS test for uniformity is significant (p = 2e-05), suggesting deviation from the expected residual distribution. Additionally, residuals show non-uniform patterns across levels of the categorical predictor (catPred), and the Levene’s test indicates heterogeneity of variance among groups. These results suggest that the model may not fully capture the structure associated with the categorical predictor.

The result fromDHARMa diagnostics suggests that the location-only model may not fully capture the structure of the data. Therefore, we can fit a location-scale model to account for both the mean and variance of the response variable.

# location-scale model ---- 
# both the mean (location) and the variance (scale) as functions of condition and sex.

model_1 <- glmmTMB(
  frequency ~ 1 + condition + sex + (1 | species) + (1 | id),
   dispformula = ~ condition + sex,     
  data = dat_pref,                          
  family = nbinom2(link = "log")
)
summary(model_1)
 Family: nbinom2  ( log )
Formula:          frequency ~ 1 + condition + sex + (1 | species) + (1 | id)
Dispersion:                 ~condition + sex
Data: dat_pref

      AIC       BIC    logLik -2*log(L)  df.resid 
     2172      2198     -1078      2156       182 

Random effects:

Conditional model:
 Groups  Name        Variance Std.Dev.
 species (Intercept) 0.3047   0.5520  
 id      (Intercept) 0.1173   0.3425  
Number of obs: 190, groups:  species, 12; id, 95

Conditional model:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)         4.8737     0.2042  23.868  < 2e-16 ***
conditionsupplied  -0.8452     0.1206  -7.010 2.38e-12 ***
sexM               -0.1040     0.1370  -0.759    0.448    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Dispersion model:
                  Estimate Std. Error z value Pr(>|z|)   
(Intercept)         0.8394     0.2597   3.232  0.00123 **
conditionsupplied  -0.6616     0.2480  -2.667  0.00765 **
sexM                0.1320     0.2486   0.531  0.59549   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
confint(model_1)
                                      2.5 %     97.5 %   Estimate
cond.(Intercept)                  4.4734822  5.2739084  4.8736953
cond.conditionsupplied           -1.0814991 -0.6088973 -0.8451982
cond.sexM                        -0.3725499  0.1646364 -0.1039568
disp.(Intercept)                  0.3303327  1.3485194  0.8394261
disp.conditionsupplied           -1.1476836 -0.1754389 -0.6615612
disp.sexM                        -0.3552045  0.6191168  0.1319561
cond.Std.Dev.(Intercept)|species  0.3079456  0.9894436  0.5519916
cond.Std.Dev.(Intercept)|id       0.1719628  0.6820707  0.3424774

Let’s compare the two models to see if the location-scale model provides a better fit to the data than the location-only model.

# compare models ----
model.sel(model_0, model_1)
Model selection table 
        cond((Int)) dsp((Int)) cond(cndt) cond(sex) dsp(cndt) dsp(sex) df
model_1       4.874     0.8394          +         +         +        +  8
model_0       4.841                     +         +                     6
           logLik   AICc delta weight
model_1 -1078.021 2172.8  0.00  0.877
model_0 -1082.155 2176.8  3.93  0.123
Models ranked by AICc(x) 
Random terms (all models): 
  cond(1 | species), cond(1 | id)

You can quickly check the results of the location-only model (model 0) and the location-scale model (model 1) below.

Location-only model
Term Estimate Std. Error 95% CI (low) 95% CI (high)
(Intercept) 4.841 0.222 4.407 5.275
conditionsupplied −0.922 0.121 −1.158 −0.685
sexM −0.067 0.144 −0.349 0.216
Location-scale model (location part)
Term Estimate Std. Error 95% CI (low) 95% CI (high)
(Intercept) 4.874 0.204 4.473 5.274
conditionsupplied −0.845 0.121 −1.081 −0.609
sexM −0.104 0.137 −0.373 0.165
Location-scale model (dispersion part)
Term Estimate Std. Error 95% CI (low) 95% CI (high)
(Intercept) 0.839 0.260 0.330 1.349
conditionsupplied −0.662 0.248 −1.148 −0.175
sexM 0.132 0.249 −0.355 0.619

The Conway-Maxwell-Poisson (CMP) model is a flexible count data model that can handle overdispersion and underdispersion.

# CMP model ----

model_2 <- glmmTMB(
  frequency ~ 1 + condition + sex + (1 | species) + (1 | id),
  dispformula = ~ condition + sex, 
  data = dat_pref,
  family = compois(link = "log")
)
summary(model_2)

confint(model_2)

model.sel(model_0, model_1, model_2)
 Family: compois  ( log )
Formula:          frequency ~ 1 + condition + sex + (1 | species) + (1 | id)
Dispersion:                 ~condition + sex
Data: dat_pref

      AIC       BIC    logLik -2*log(L)  df.resid 
   2158.4    2184.4   -1071.2    2142.4       182 

Random effects:

Conditional model:
 Groups  Name        Variance Std.Dev.
 species (Intercept) 0.246    0.4960  
 id      (Intercept) 0.151    0.3886  
Number of obs: 190, groups:  species, 12; id, 95

Conditional model:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)         4.8614     0.1857  26.178  < 2e-16 ***
conditionsupplied  -0.7823     0.1044  -7.495 6.62e-14 ***
sexM               -0.1099     0.1257  -0.874    0.382    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Dispersion model:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)         4.0338     0.4457   9.050   <2e-16 ***
conditionsupplied   1.1965     0.7681   1.558    0.119    
sexM               -0.1428     0.5530  -0.258    0.796    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
                                      2.5 %     97.5 %   Estimate
cond.(Intercept)                  4.4974266  5.2253653  4.8613959
cond.conditionsupplied           -0.9868149 -0.5776988 -0.7822569
cond.sexM                        -0.3562620  0.1364832 -0.1098894
disp.(Intercept)                  3.1602735  4.9074247  4.0338491
disp.conditionsupplied           -0.3089514  2.7019951  1.1965218
disp.sexM                        -1.2266046  0.9409510 -0.1428268
cond.Std.Dev.(Intercept)|species  0.2691415  0.9139714  0.4959714
cond.Std.Dev.(Intercept)|id       0.2503174  0.6033250  0.3886165
Model selection table 
        cond((Int)) dsp((Int)) cond(cndt) cond(sex) dsp(cndt) dsp(sex) family
model_2       4.861     4.0340          +         +         +        + cm(lg)
model_1       4.874     0.8394          +         +         +        + n2(lg)
model_0       4.841                     +         +                    n2(lg)
        df    logLik   AICc delta weight
model_2  8 -1071.193 2159.2  0.00  0.999
model_1  8 -1078.021 2172.8 13.66  0.001
model_0  6 -1082.155 2176.8 17.59  0.000
Abbreviations:
 family: cm(lg) = 'compois(log)', n2(lg) = 'nbinom2(log)'
Models ranked by AICc(x) 
Random terms (all models): 
  cond(1 | species), cond(1 | id)

Here, we compared three generalised linear mixed models. model_0 was a Negative Binomial (NB) model with a single, overall estimated dispersion parameter (i.e., not modelled as a function of predictors). model_1 extended this by modelling the NB dispersion parameter as a function of condition and sex. model_2 further used a Conway-Maxwell-Poisson (CMP) distribution, which can accommodate both under- and over-dispersion, with its dispersion parameter similarly modelled by condition and sex.

Model comparison based on AICc strongly favored the location-scale CMP model (model_2: AICc = 2159.2, model weight = 0.999), with the location-scale NB model (model_1: AICc = 2172.8) and the NB model with a single dispersion parameter (model_0: AICc = 2176.8) performing substantially worse.

In all three models, food-supplied birds showed significantly lower gaze frequencies toward dot stimuli. There was no evidence of sex differences in the location part in any model.

Regarding the dispersion parameter, in the location-scale NB model (model_1), the dispersion part revealed a significant change in residual variance in the food-supplied condition (\(\beta_{[\text{conditionSupplied}]}^{(l)}\) = –0.662). Given the parameterization of the Negative Binomial distribution (where a smaller dispersion parameter \(\theta\) indicates greater overdispersion), this suggests an increase in overdispersion when food was supplied, implying less consistent behavior across individuals. However, this effect was not statistically significant in the location-scale CMP model (model_2), where the corresponding estimate was 1.20 (p = 0.12). In the CMP model, a positive estimate for the dispersion parameter \(\nu\) would imply a decrease in overdispersion (i.e., more consistent behavior) if it were significant.

Random effects in all models consistently showed greater variance among species than among id within species. For example, in the location-scale CMP model, species-level variance was 0.246, while individual-level variance was 0.151.

Comparing Negative Binomial and CMP Models

Below, we compare the Negative Binomial (NB) and Conway-Maxwell-Poisson (CMP) models to assess which better fits the data. Model selection is based on AICc and residual diagnostics.

Model comparison of location-only and location-scale models in negative Binomial (NB) and Conway-Maxwell-Poisson (CMP) distributions.

All models passed the DHARMa dispersion and outlier tests, indicating appropriate handling of overall variance and absence of extreme observations. However, the Kolmogorov–Smirnov (KS) test consistently revealed significant deviations from the expected uniform distribution of residuals in all three cases, suggesting remaining misfits in distributional shape. The location-scale CMP model performed best in terms of within-group residual uniformity, showing no significant deviation in any predictor level. In contrast, both the location-only NB model and the location-scale NB model exhibited within-group deviations in some categories. These results collectively suggest that while none of the models perfectly capture the residual structure, the location-scale CMP model may offer the best overall fit among the candidates for the observed data characteristics.

You can also fit the location-scale model using the brms package. The difference is that in brms, you can specify scale parts using shape. Below, we show how to fit the same location-scale negative binomial model as above using brms.

formula1 <- bf(
  frequency ~ 1 + condition + sex + (1 | species) + (1 | id),
  shape ~ 1 + condition + sex
)

prior1 <- default_prior(formula1, 
                        data = dat_pref, 
                        family = negbinomial(link = "log", link_shape = "log")
)

system.time(model_nb_brms <- brm(formula1, 
            data = dat_pref, 
            prior = prior1,
            chains = 2, 
            iter = 5000, 
            warmup = 3000,
            thin = 1,
            family = negbinomial(link = "log", link_shape = "log"),
            # control = list(adapt_delta = 0.95)
)
)
summary(model_nb_brms)
 Family: negbinomial 
  Links: mu = log; shape = log 
Formula: frequency ~ condition + sex + (1 | species) + (1 | id) 
         shape ~ condition + sex
   Data: dat_pref (Number of observations: 190) 
  Draws: 2 chains, each with iter = 5000; warmup = 3000; thin = 1;
         total post-warmup draws = 4000

Multilevel Hyperparameters:
~id (Number of levels: 95) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.38      0.13     0.08     0.60 1.00      482      563

~species (Number of levels: 12) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.69      0.23     0.36     1.24 1.00      978     1816

Regression Coefficients:
                        Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
Intercept                   4.85      0.25     4.35     5.34 1.00     1044
shape_Intercept             0.87      0.28     0.36     1.45 1.00      966
conditionsupplied          -0.84      0.12    -1.09    -0.60 1.00     4843
sexM                       -0.11      0.14    -0.40     0.17 1.00     3632
shape_conditionsupplied    -0.70      0.28    -1.29    -0.19 1.00     2044
shape_sexM                  0.14      0.25    -0.35     0.61 1.00     4208
                        Tail_ESS
Intercept                   1637
shape_Intercept             1938
conditionsupplied           3049
sexM                        3220
shape_conditionsupplied     1777
shape_sexM                  2673

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Comparison of location-only model and location-scale model

  • Modeling dispersion improves model fit and better captures the structure of the data.

  • Location-scale model had a lower AICc (2172.8) than location-only model (2176.8), indicating better model fit. It also had a higher model weight (0.877 vs. 0.123), suggesting stronger support for the location-scale model.

Interpretation of location-scale model :

Location (mean) part:

  • The supplied condition significantly lowers gaze frequency (\(\beta_{[\text{conditionSupplied}]}^{(l)}\) = -0.845, 95% CI [-1.08, -0.61]$).
  • There is no significant sex difference in mean gaze frequency (\(\beta_{[\text{sex-male}]}^{(l)}\) = -0.104, 95% CI [-0.37, 0.16]$).

Scale (dispersion) part:

  • The supplied condition significantly increases residual variance (\(\beta_{[\text{deprived–supplied}]}^{(s)}\) = -0.662, 95% CI [1.15, -0.18]$). This suggests more varied gaze behaviour across individuals under the supplied condition.

  • There is no significant sex difference in the variability of gaze frequency (\(\beta_{[\text{sex-male}]}^{(s)}\) = 0.132, 95% CI [-0.36, 0.62]$).

Location (mean) random effects:

  • Species-level (\(sd_{[\text{species}]}\) = 0.55, 95% CI [0.31, 0.99]$): Indicates some variation in mean gaze frequency among species.

Location (dispersion) random effects:

  • Individual-level (\(sd_{[\text{species}]}\) = 0.34, 95% CI [0.17, 0.68]): Suggests variation among individuals within species, but this is smaller than the variation across species.

Conclusion

Q1: Do birds gaze at dot patterns more (or less) when food-deprived compared to food-supplied?

Answer: Yes. In the location (mean) part of the model, the coefficient for the supplied condition is significantly negative. This indicates that participants made fewer gazes toward dot stimuli in the food-supplied condition compared to the food-deprived condition. So, the average gaze frequency is lower when food is supplied.

Q2: Does the variability in gaze responses differ between conditions?

Answer: Yes. In the scale (dispersion) part of the model, the coefficient for the food-supplied condition is significantly negative. Since lower θ values imply greater dispersion, this result suggests that individual gaze responses were more variable when food was supplied. In other words, birds showed more consistent gaze responses when food was deprived, and more varied responses when food was available.

Beyond Gaussian 2

This dataset comes from Lundgren et al. (2022), which examined the activity and impact of feral donkeys (Equus africanus asinus) between wetlands with and without cougar (Puma concolor) predation.

For this example, we will focus on the effect of cougar predation on the percentage of ground trampled by feral donkeys.

Questions

  1. Does cougar predation affect the percentage of ground trampled by feral donkeys?
  2. Does cougar predation affect the variability in the percentage of ground trampled by feral donkeys?

Variables included

  • cover: Percentage of ground trampled by feral donkeys
  • Site: Name of the site where the data was collected
  • Access_point: Access point to the sites
  • if_kill: Whether a cougar kill (dead donkey) was found at the site (1 = yes, 0 = no)

Visualise the dataset

Violin plots illustrate the overall density distribution of percentage of trampled Bare Ground for both areas with (Yes) and without (No) donkey kills Each individual empty circle represents the trampled bare ground cover of a single observation. The plot visually highlights individual variability and allows for the comparison of both central tendency (indicated by the black solid line, representing the mean) and spread of trampled bare ground cover between areas where burro kills are absent versus present.

dat <- read.csv(here("data","Lundgren_Cougar_Burro_Trophic_Cascade_Trampled_BareGround.csv"))

dat <- dat %>%
  dplyr::select(Site, Pool, if_kill, cover) %>%
  mutate(Site=as.factor(Site),
         if_kill=as.factor(if_kill),
         cover=as.numeric(cover)
  )

str(dat)
'data.frame':   115 obs. of  4 variables:
 $ Site   : Factor w/ 13 levels "Anvil Spring",..: 1 1 1 1 1 1 1 1 2 2 ...
 $ Pool   : chr  "Anvil Spring" "Anvil Spring" "Anvil Spring" "Anvil Spring" ...
 $ if_kill: Factor w/ 2 levels "burro kills absent",..: 1 1 1 1 1 1 1 1 2 2 ...
 $ cover  : num  1 1 1 1 1 1 1 1 1 0.6 ...
ggplot(dat, aes(x = if_kill, y = cover, fill = if_kill)) +
  geom_violin(
    aes(fill = if_kill), # Fill violins based on 'if_kill'
    color = "#8B8B83", # Outline color for violins
    width = 0.8,
    alpha = 0.3,
    position = position_dodge(width = 0.7)
  ) +
  geom_jitter(
    aes(color = if_kill), # Color jittered points based on 'if_kill'
    size = 3,
    alpha = 0.4,
    shape = 1, # Open circles for jittered points
    position = position_jitterdodge(dodge.width = 0.5, jitter.width = 0.15)
  ) +
  stat_summary(
    fun = mean,
    geom = "crossbar",
    width = 0.1,
    color = "black", # Black crossbar for mean
    linewidth = 0.5,
    position = position_dodge(width = 0.7)
  ) +
  labs(
    title = "Donkey Trampling by Cougar Kill Presence",
    x = "Cougar Kill Presence",
    y = "Proportion Trampled Bare Ground"
  ) +
  scale_fill_manual(
    values = c("burro kills absent" = "cornflowerblue", "burro kills present" = "firebrick")
  ) +
  scale_color_manual( # Add scale_color_manual for jitter points
    values = c("burro kills absent" = "cornflowerblue", "burro kills present" = "firebrick")
  ) +
  scale_x_discrete(
    labels = c("burro kills absent" = "No", "burro kills present" = "Yes"),
    expand = expansion(add = 0.5)
  ) +
  theme_classic(base_size = 16) +
  theme(
    axis.text = element_text(color = "#6E7B8B", size = 14),
    axis.title = element_text(color = "#6E7B8B", size = 14),
    legend.position = "none",
    axis.text.x = element_text(angle = 0, hjust = 0.5)
  )

Run models and interpret results

We will fit two zero-one inflated beta regression models to the data using the brms package. We’ll use the brms package because the glmmTMB (frequentist) approach does not allow for values of ‘1’, while the brms package allows us to fit zero-one inflation models to account for these boundary values. All models incorporate Pool as a random effect to account for non-independence of observations collected at the same wetland access point

The models are as follows:

  1. Location-Only Model: This model will estimate the average difference in the percentage of trampled bare ground between areas where “burro kills are absent” and where “burro kills are present.”
  2. Location-Scale Model: This more comprehensive model will estimate both the average difference (location) and any differences in the variability (scale) of the percentage of trampled bare ground between areas with and without burro kills.

Model fitting

zoi ~ if_kill (Zero-One Inflation; zoi): This part of the model estimates how if_kill (cougar kill presence) influences the probability that an observation is exactly 0% or exactly 100% trampled bare ground. If if_kill increases the zoi probability, it means that areas with burro kills are more likely to have either no trampled ground or completely trampled ground, compared to areas without burro kills.

coi ~ if_kill (Conditional One-Inflation): This is where we distinguish between the 0% and 100% boundaries. If an observation is exactly at a boundary (0% or 100% trampled bare ground), the coi part of the model estimates how if_kill influences the probability that this boundary observation is 100% (fully trampled) rather than 0% (no trampled ground). So, if coi increases with if_kill, it suggests that if a site has exact boundary values, the presence of burro kills makes it more likely for that site to be completely trampled (100%) rather than completely untrampled (0%).

m0<-bf(cover ~ if_kill + (1|Pool),
       zoi~  if_kill, 
       coi~  if_kill) 
prior1<-default_prior(m0, family=zero_one_inflated_beta(), data=dat)

# Since this model is time-consuming, reload without running it:
rerun <- F
if(rerun){
  
  fit0 <- brm(
   m0,
   data = dat,
   family = zero_one_inflated_beta(),
   prior = prior1,
   iter = 6000,
   warmup = 1000,
   chains = 2,  cores=2,
   control = list(
     adapt_delta = 0.99,
     max_treedepth = 15
   ),
   seed = 123,
   refresh = 500
  )
  saveRDS(fit0, file = here("Rdata", "fit0_BETA_Burros.rds"))
  
}else{
  fit0 <- readRDS(here("Rdata", "fit0_BETA_Burros.rds"))
}

summary(fit0)
 Family: zero_one_inflated_beta 
  Links: mu = logit; phi = identity; zoi = logit; coi = logit 
Formula: cover ~ if_kill + (1 | Pool) 
         zoi ~ if_kill
         coi ~ if_kill
   Data: dat (Number of observations: 115) 
  Draws: 2 chains, each with iter = 6000; warmup = 1000; thin = 1;
         total post-warmup draws = 10000

Multilevel Hyperparameters:
~Pool (Number of levels: 16) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.42      0.25     0.03     0.98 1.00     2136     2656

Regression Coefficients:
                             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
Intercept                        0.70      0.41    -0.11     1.52 1.00     6799
zoi_Intercept                    0.72      0.33     0.10     1.37 1.00    15641
coi_Intercept                    6.31      2.97     2.51    13.90 1.00     4844
if_killburrokillspresent        -1.02      0.46    -1.95    -0.11 1.00     7279
zoi_if_killburrokillspresent    -1.70      0.42    -2.53    -0.89 1.00    15364
coi_if_killburrokillspresent    -4.24      3.06   -11.91    -0.07 1.00     4529
                             Tail_ESS
Intercept                        6130
zoi_Intercept                    7396
coi_Intercept                    3104
if_killburrokillspresent         6371
zoi_if_killburrokillspresent     7048
coi_if_killburrokillspresent     3118

Further Distributional Parameters:
    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
phi     2.11      0.35     1.51     2.87 1.00     6161     7129

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

phi (Precision/Scale Parameter): This part of the model estimates how if_kill (cougar kill presence) influences the variability of the percentage of trampled bare ground for observations that fall between 0% and 100% (i.e., not exactly at the boundaries). A higher phi value indicates lower variability (data points are more tightly clustered around the mean). A lower phi value indicates higher variability (data points are more spread out).

m1<-bf(cover ~ if_kill + (1|Pool),
       zoi ~ if_kill, 
       coi ~ if_kill,
       phi ~ if_kill) 
prior2<-default_prior(m1, family=zero_one_inflated_beta(), data=dat)

rerun <- F
if(rerun){
  fit1 <- brm(
   m1,
   data = dat,
   family = zero_one_inflated_beta(),
   prior = prior2,
   iter = 6000,
   warmup = 1000,
   chains = 2,  cores=2,
   control = list(
     adapt_delta = 0.99,
     max_treedepth = 15
   ),
   seed = 123,      #
   refresh = 500    #
  )
  saveRDS(fit1, file = here("Rdata", "fit1_BETA_Burros.rds"))
}else{
  fit1 <- readRDS(here("Rdata", "fit1_BETA_Burros.rds"))
}

summary(fit1)
 Family: zero_one_inflated_beta 
  Links: mu = logit; phi = log; zoi = logit; coi = logit 
Formula: cover ~ if_kill + (1 | Pool) 
         zoi ~ if_kill
         coi ~ if_kill
         phi ~ if_kill
   Data: dat (Number of observations: 115) 
  Draws: 2 chains, each with iter = 6000; warmup = 1000; thin = 1;
         total post-warmup draws = 10000

Multilevel Hyperparameters:
~Pool (Number of levels: 16) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.63      0.26     0.13     1.19 1.00     2237     2163

Regression Coefficients:
                             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
Intercept                        0.92      0.46     0.01     1.85 1.00     4499
phi_Intercept                    1.73      0.49     0.72     2.61 1.00     3574
zoi_Intercept                    0.72      0.33     0.10     1.39 1.00    12262
coi_Intercept                    6.28      2.95     2.61    14.08 1.00     4100
if_killburrokillspresent        -1.22      0.52    -2.27    -0.18 1.00     4920
phi_if_killburrokillspresent    -1.07      0.51    -2.01    -0.04 1.00     4334
zoi_if_killburrokillspresent    -1.71      0.42    -2.55    -0.89 1.00    11924
coi_if_killburrokillspresent    -4.20      3.05   -12.07    -0.08 1.00     3953
                             Tail_ESS
Intercept                        5041
phi_Intercept                    5090
zoi_Intercept                    7049
coi_Intercept                    2701
if_killburrokillspresent         5730
phi_if_killburrokillspresent     5570
zoi_if_killburrokillspresent     6844
coi_if_killburrokillspresent     2698

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
f0loo <- loo::loo(fit0)
f1loo <- loo::loo(fit1)

fc<-loo::loo_compare(f0loo, f1loo)
fc
#      elpd_diff se_diff
# fit1  0.0       0.0   
# fit0 -2.7       1.6   

Model Comparison Results

Our analysis of the model comparison results shows that the location-scale model is the most supported by the data. This is indicated by its lowest LOO (Leave-One-Out Cross-Validation) information criterion value.

Model Interpretation

Location (mean) part:

On average, the log-odds of the mean percentage of trampled bare ground are lower when burro kills are present compared to when they are absent (\(\beta_{[\text{No-Yes}]}^{(l)}\) = -1.22, 95% CI [-2.27, -0.18]). This suggests that the presence of burro kills is associated with a lower average percentage of trampled bare ground.

Scale (dispersion/phi) part: The log-precision (phi) is lower when burro kills are present, compared to when they are absent (\(\beta_{[\text{No-Yes}]}^{(s)}\) = -1.07, 95% CI [-2.01, -0.04]). Suggesting that the percentage of trampled bare ground exhibits more variation or less consistency in areas where burro kills are present.

Zero-One Inflation (zoi) part: The log-odds of observing exactly 0% or 100% trampled bare ground are lower when burro kills are present (\(\beta_{[\text{No-Yes}]}^{(l)}\) = -1.71, 95% CI [-2.55, -0.89]). In other words, areas with burro kills are less likely to be at the extreme boundaries of either no trampled ground or entirely trampled ground.

Conditional on Inflation (coi) part: The log-odds of a boundary observation being 100% rather than 0% are lower when burro kills are present (\(\beta_{[\text{No-Yes}]}^{(l)}\) = -4.20, 95% CI [-12.07, -0.08]). This implies that if a boundary value is encountered with burro kills, it is less likely to be 100% (and potentially more likely to be 0%) compared to the “burro kills absent” scenario.

Conclusion

Does cougar predation affect the percentage of ground trampled by feral donkeys?

Answer: Yes, cougar predation, affects the average percentage of ground trampled by feral donkeys.

Does cougar predation affect the variability in the percentage of ground trampled by feral donkeys?

Answer: The model provides evidence that cougar predation affects the variability in the percentage of ground trampled by feral donkeys.

Model selection

This section outlines general guidelines for model selection and refinement, based on the results from the previous examples (sections XX and Beyond Gaussian 1). These steps can be applied to any model fitting process, whether using glmmTMB, brms, or other packages.

  1. Identify data type and plot raw data to understand the distribution and structure of the response variable. This helps in selecting an appropriate model family (e.g., Gaussian, Poisson, Negative Binomial, Beta).
  2. Begin with a simpler model (e.g., a location-only model assuming homoscedasticity) to establish a baseline for interpretation.
  3. Check residual diagnostics to detect possible model misspecification, such as overdispersion or non-uniformity of residuals. If issues are detected, consider more complex models (e.g., location-scale models) that account for heterogeneity in variance.
  4. Gradually increase model complexity: Introduce additional components, such as group-level effects on the scale (variance), and consider estimating correlations between random effects when theoretically or empirically justified—especially when supported by sufficient sample size.
  5. Compare models using information criteria:
  • For frequentist models: use AIC (Akaike Information Criterion)
  • For Bayesian models: use LOO (Leave-One-Out Cross-Validation) or WAIC (Widely Applicable Information Criterion).

Ultimately, the goal is not only to improve statistical fit but also to gain biological insights. For example, finding that a scale predictor improves model fit may suggest biologically meaningful heterogeneity.

We demonstrate these steps using the previous examples, where we started with a location-only model and then moved to a location-scale model to account for heterogeneity in variance. We also compared models using AICc and LOO criteria to select the best-fitting model. Although parts of the examples below may repeat content from earlier sections, we present them again here in sequence to illustrate a typical approach to model selection and refinement.

One important thing to keep in mind is that DHARMa does not work with models fitted using brms. This means that you cannot check residual diagnostics for the location-only double-hierarchical model (model 3) if it was fitted with brms. If you want to use DHARMa for residual diagnostics, you need to run model 2.5 instead - in other words, you cannot include the correlation part.

Example 1

In our ongoing exploration of Blue-footed Booby chick development, we previously investigated whether a chick’s scaled mass index (SMI) at fledgling was influenced by its hatching order (Model 2). Building upon that foundation, we now delve into the critical role of additional covariates, a common practice in ecological fieldwork where numerous factors are hypothesized to affect our response variable. While exhaustive inclusion of all potential covariates is often constrained by data availability, strategic model selection allows us to pinpoint those that provide significant explanatory power.

One particularly relevant covariate in this study is the chick’s hatching date. In this specific booby population, earlier hatching dates are frequently associated with enhanced chick survival. Conversely, later hatching can coincide with periods of increased food scarcity as ocean temperatures rise and fish prey disperse. Given these environmental pressures, we hypothesize that under the stressful conditions of late-season hatching, the disparity in fledgling mass between chicks might be more pronounced compared to those hatched earlier in the season. This suggests a potential interactive effect between hatching date and hatching order.

In the subsequent sections, we will systematically investigate this hypothesis by:

Incorporating hatching date as a fixed effect into the location (mean) component of our model.

Adding an interaction term between hatching date and hatching order to the location part of the model.

Conducting model comparisons to assess whether accounting for hatching date, both as a main effect and in interaction with hatching order, significantly improves our model’s fit and provides novel insights into chick fledgling success.

Identify data type and plot raw data

Our response variable, Scaled Mass Index (SMI), is a continuous variable. Therefore, a Gaussian distribution will be employed for our statistical modeling. The raw data, illustrating the SMI of chicks at fledgling, categorized by their hatching order (1st or 2nd hatched), is visualized below.

dat <- read_csv(here("data","SMI_Drummond_et_al_2025.csv")) %>%subset(REDUCTION == 0)

dat<- dat%>%
  dplyr::select(-TIME, -HATCHING_DATE,-REDUCTION,-RING) %>%
  mutate(SMI = as.numeric(SMI), # Scaled mass index
         NEST = as.factor(NEST), 
         WORKYEAR = as.factor(WORKYEAR),
         RANK = factor(RANK, levels = c("1", "2")))


ggplot(dat, aes(x = RANK, y = SMI, fill = RANK, color = RANK)) +
 geom_violin(aes(fill = RANK),
              color = "#8B8B83",
              width = 0.8, 
              alpha = 0.3,
              position = position_dodge(width = 0.7)) +
  geom_jitter(aes(color = RANK),
              size = 3,
              alpha = 0.4,
              shape = 1,
              position = position_jitterdodge(dodge.width = 0.5, jitter.width = 0.15)) + 
  stat_summary(fun = mean,              
               geom = "crossbar",       
               width = 0.1,             
               color = "black",         
               linewidth = 0.5,         
               position = position_dodge(width = 0.7))+
  labs(
    title = "Body Condition (SMI) by Hatching Order",
    x = "Hatching Order",
    y = "Scaled Mass Index (g)"
  ) +
  scale_fill_manual(
    values = c("1" = "#1F78B4", "2" = "#E31A1C") # 
  ) +
  scale_color_manual(
    values = c("1" = "#1F78B4", "2" = "#E31A1C") # 
  ) +
  scale_x_discrete(
    labels = c("1" = "First-Hatched", "2" = "Second-Hatched"),
    expand = expansion(add = 0.5)
  ) +
  theme_classic(base_size = 16) +
  theme(
    axis.text = element_text(color = "#6E7B8B", size = 14),
    axis.title = element_text(color = "#6E7B8B", size = 14),
    legend.position = "none",
    axis.text.x = element_text(angle = 0, hjust = 0.5)
  )

Begin with a simpler model and check residual diagnostics

Having established our initial modeling framework, we now move to a critical phase: assessing the model’s assumptions through residual analysis. We will begin with a foundational model that includes only hatching order as a fixed effect. To facilitate this, we’ll leverage the DHARMa package, which simulates scaled residuals and provides automated diagnostic plots. This step is indispensable for verifying key model assumptions, particularly the normality and homoscedasticity of the residuals, ensuring the reliability of our inferences.

model2_1<-glmmTMB(log(SMI) ~ 1 + RANK + (1|NEST)+(1|WORKYEAR),
              family = gaussian(),data=dat)



# | label: model_diagnostics - model0
simulationOutput <- simulateResiduals(fittedModel = model2_1, plot = F)
plot(simulationOutput)

Our initial residual diagnostics, specifically the QQ plot and Kolmogorov-Smirnov (KS) test, clearly indicate that the model’s residuals are not uniformly distributed. This non-uniformity signals potential issues with our chosen statistical distribution or the fundamental structure of the current model.

Furthermore, despite no significant concerns regarding overdispersion or outliers, a closer look at the boxplots and the Levene test revealed a lack of consistency within residual groups and non-homogeneous variances. These collective findings strongly suggest that our current model is not fully capturing the inherent complexity and variability within the data.

To effectively address this non-uniformity and varying dispersion, we will now proceed with fitting a location-scale model. For simplicity and to focus on the effects of our new covariate, we will specifically structure this model to include RANK as a fixed effect only in the scale (variance) component. The new covariate, and its interaction with RANK, will be added solely to the location (mean) part of the model. This approach allows us to investigate the impact of the new covariate on the average SMI while simultaneously accounting for the varying dispersion observed across RANK levels.

Gradually increase model complexity

Our next step involves enhancing the model by incorporating additional covariates. We begin by preparing our new covariate, ST_DATE, which represents the standardized hatching date (with November 3rd designated as day 1).

Crucially, we will center ST_DATE before including it in the model. Centering involves subtracting the mean of the variable from each observation, resulting in a new variable with a mean of zero.

This practice is particularly beneficial because it:

Facilitates Interpretation of Main Effects: When interaction terms are present in a model, the uncentered main effects can be difficult to interpret, as they represent the effect of that variable when all other interacting variables are at zero (which might be outside the observed range of the data). Centering ensures that the main effect of a predictor is interpreted at the average value of the other interacting predictors, making its meaning more intuitive and relevant within the context of our data.

Reduces Multicollinearity: While less critical for simple interactions, centering can sometimes help reduce multicollinearity between main effects and their interaction terms, which can improve model stability and estimation.

By centering ST_DATE, we set the stage for a more robust and interpretable analysis as we gradually build out our model.

dat<- dat%>%
  dplyr::mutate(ST_DATE=scale(ST_DATE,center=T,scale = F))
model2_2 <- glmmTMB(
    log(SMI) ~ 1 + RANK + (1|NEST)+(1|WORKYEAR),
    dispformula = ~ 1 + RANK,     
    data = dat, 
    family = gaussian
    )

First, we build a location-scale model with ST_DATE (centered) as a fixed effect in the location (mean) part of the model.

model2_3a <- glmmTMB(
    log(SMI) ~ 1 + RANK + ST_DATE + (1|NEST)+(1|WORKYEAR),
    dispformula = ~ 1 + RANK,     
    data = dat, 
    family = gaussian
    )

Following the previous model, we now increase complexity further by including a two-way interaction.

model2_3b<- glmmTMB(
    log(SMI) ~ 1 + RANK + ST_DATE + RANK*ST_DATE +(1|NEST)+(1|WORKYEAR),
    dispformula = ~ 1 + RANK,     
    data = dat, 
    family = gaussian
    )

Compare models using information criteria

To evaluate whether our model’s fit is significantly improved by the inclusion of additional covariates and interaction terms, we will now employ information criteria, specifically the Akaike Information Criterion corrected for small sample sizes (AICc). AICc provides a balance between model fit and model complexity, penalizing models with more parameters to help identify the most parsimonious yet well-fitting model.

Our first comparison will be between the model that includes hatching date as a fixed effect in the location part and the simpler model that omits this covariate. This will allow us to assess the individual contribution of ST_DATE to explaining the variation in log(SMI).

model.sel(model2_2, model2_3a)
Model selection table 
          cnd((Int)) dsp((Int)) cnd(RAN) dsp(RAN) cnd(ST_DAT) df   logLik
model2_3a      7.381     -2.454        +        +   -0.001207  7 4114.695
model2_2       7.361     -2.470        +        +              6 4041.417
             AICc  delta weight
model2_3a -8215.4   0.00      1
model2_2  -8070.8 144.55      0
Models ranked by AICc(x) 
Random terms (all models): 
  cond(1 | NEST), cond(1 | WORKYEAR)

Our first comparison shows a clear improvement: the model including hatching date (model2_3a) has a lower AICc value (AICc = -8215.4) compared to the model without it (model2_2, AICc = -8070.8). This confirms that hatching date significantly improves model fit.

Next, we will compare model2_3a (hatching date as a fixed effect) with the model that includes an interaction between hatching date and hatching order in the location part.

model.sel(model2_3a, model2_3b)
Model selection table 
          cnd((Int)) dsp((Int)) cnd(RAN) cnd(ST_DAT) dsp(RAN) cnd(RAN:ST_DAT)
model2_3b      7.381     -2.466        +  -0.0008479        +               +
model2_3a      7.381     -2.454        +  -0.0012070        +                
          df   logLik    AICc delta weight
model2_3b  8 4136.841 -8257.7  0.00      1
model2_3a  7 4114.695 -8215.4 42.28      0
Models ranked by AICc(x) 
Random terms (all models): 
  cond(1 | NEST), cond(1 | WORKYEAR)

Comparing the models, we find that the model incorporating the interaction between hatching date and hatching order (model2_3b) has a lower AICc value (AICc = -8257.7) compared to the model without it (model2_3a, AICc = -8215.4). This improvement in AICc indicates that the interaction model provides a superior fit, effectively capturing how the influence of hatching date on fledgling mass varies between first- and second-hatched chicks.

 Family: gaussian  ( identity )
Formula:          
log(SMI) ~ 1 + RANK + ST_DATE + RANK * ST_DATE + (1 | NEST) +  
    (1 | WORKYEAR)
Dispersion:                ~1 + RANK
Data: dat

      AIC       BIC    logLik -2*log(L)  df.resid 
  -8257.7   -8206.0    4136.8   -8273.7      4729 

Random effects:

Conditional model:
 Groups   Name        Variance Std.Dev.
 NEST     (Intercept) 0.002171 0.04660 
 WORKYEAR (Intercept) 0.007207 0.08489 
 Residual                   NA      NA 
Number of obs: 4737, groups:  NEST, 2873; WORKYEAR, 24

Conditional model:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)    7.3813266  0.0179296   411.7  < 2e-16 ***
RANK2         -0.0194548  0.0026743    -7.3 3.47e-13 ***
ST_DATE       -0.0008479  0.0001117    -7.6 3.19e-14 ***
RANK2:ST_DATE -0.0007263  0.0001085    -6.7 2.18e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Dispersion model:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -2.46607    0.02180 -113.14  < 2e-16 ***
RANK2        0.10114    0.02614    3.87 0.000109 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

By systematically comparing models and gradually increasing their complexity, we can gain deeper insights into the underlying data generating process. In this analysis, we expanded our model by incorporating ST_DATE as a covariate and, importantly, its interaction with RANK. The results from this more complex model show that both ST_DATE and the RANK:ST_DATE interaction term are statistically significant predictors of log(SMI). This suggests that the relationship between SMI and RANK changes over time, and/or the effect of ST_DATE on SMI depends on RANK. While these terms demonstrably improve the model’s fit, it’s worth noting that their estimated slopes are quite shallow (e.g., -0.0008479 for ST_DATE and -0.00007263 for the interaction). This implies that, despite their statistical significance, the practical impact of ST_DATE and its interaction on log(SMI) might be relatively small. Therefore, depending on the specific goals of the analysis and the desired level of model parsimony, a simpler model without these terms might perform nearly as well in terms of predictive power, while being easier to interpret

Example 2

We revisit the dataset on Estrildid finches to examine how food supplementation affects gaze frequency toward dot patterns (as introduced in Beyond Gaussian 1 section).

To avoid repeating the structure of the previous example, we now take a more realistic modeling approach by incorporating phylogenetic non-independence as a random effect. The response variable, frequency, is a count variable, which supports the use of Poisson or Negative Binomial distributions. However, as visualised in the plots below, there is substantial between-species variation—both in mean gaze frequency and in the degree of within-species variability.

To better capture this structure, we include phylogeny as a random effect, distinct from the previously used species effect. While species was treated as a simple categorical grouping factor, the phylogenetic random effect accounts for shared evolutionary history and allows for correlated responses among closely related species.

In the following sections, we will:

  • Include phylogeny as a random effect in the location and/or scale parts of the model.

  • Compare models with and without phylogenetic structure to evaluate whether accounting for evolutionary history improves model fit.

Identify data type and plot raw data

The violin plot by condition and sex shows that gaze frequency is generally higher in the food-deprived condition, with some individuals—especially females—showing very high values (over 500). The faceted line plot by species shows:

  • Consistent directional trends (e.g., reduced gazing in the supplied condition)
  • Species-specific differences in variance, suggesting possible heteroscedasticity

These patterns raise two important considerations:

  1. Overdispersion is likely present, particularly in the deprived condition, where the spread of counts is large.
  2. A phylogenetic effect may be at play, since species vary not only in their average gaze frequencies but also in how consistent individuals are within each species - possibly due to traits shaped by common ancestry.
ggplot(dat_pref, aes(x = condition,
                     y = frequency)) +
    geom_violin(color = "#8B8B83", fill = "white",
              width = 1.2, alpha = 0.3) +
  geom_jitter(aes(shape = sex, color = sex),
              size = 3, alpha = 0.8,
              width = 0.15, height = 0) +
  labs(
    title = "Total frequency of gazes towards dot patterns",
    x = "Condition",
    y = "Total frequency of gazes (1hr)"
  ) +
  scale_shape_manual(values = c("M" = 17, "F" = 16),
                     labels = c("M" = "Male", "F" = "Female")) +
  scale_color_manual(values = c("M" = "#009ACD", "F" = "#FF4D4D"),
                     labels = c("M" = "Male", "F" = "Female")) +
  theme_classic(base_size = 16) +
  theme(
    axis.text = element_text(color = "#6E7B8B", size = 14),
    axis.title = element_text(color = "#6E7B8B", size = 14),
    legend.title = element_text(color = "#6E7B8B"),
    legend.text = element_text(color = "#6E7B8B"),
    legend.position = "right",
  axis.text.x = element_text(angle = 45, hjust = 1)
  )

ggplot(dat_pref, aes(x = condition, y = frequency)) +
  geom_point(alpha = 0.5) +
  labs(
    title = "Total frequency of gazes towards dot patterns by species",
    x = "Condition",
    y = "Total frequency of gazes (1hr)"
  ) +
  stat_summary(fun = mean, geom = "line", aes(group = species, color = species)) +
  theme_classic() +
  facet_wrap(~ species)

Begin with a simpler model and check residual diagnostics

From the previous analysis, we already know that a location-only model without phylogenetic effects does not adequately capture the structure of the data. However, here we re-examine this model before moving on.

model_0 <- glmmTMB(
  frequency ~ 1 +  condition + sex + (1 | species) + (1 | id), # location part (mean)
  data = dat_pref,
  family = nbinom2(link = "log")
)

To evaluate model adequacy, we use the DHARMa package to simulate and visualise residuals. The plots allow us to assess residual uniformity, potential over- or underdispersion, outliers, and leverage.

# main diagnostic plots

model0_res <- simulateResiduals(model_0, plot = TRUE, seed = 42)

# formal test for over/underdispersion
testDispersion(model0_res) 


    DHARMa nonparametric dispersion test via sd of residuals fitted vs.
    simulated

data:  simulationOutput
dispersion = 0.45606, p-value = 0.456
alternative hypothesis: two.sided

Although the dispersion test (p = 0.456) suggests no clear evidence of overdispersion based on the mean–variance relationship, the KS test indicates that the residuals deviate from a uniform distribution. These two tests assess different aspects of model adequacy: the dispersion test evaluates whether the variance is correctly specified as a function of the mean, while the KS test can detect broader misfits, such as unmodeled structure, zero inflation, or group-specific heteroscedasticity. Thus, even in the absence of overdispersion, the presence of non-uniform residuals supports the need for a more flexible model.

Since we have found the limitations of the model without phylogenetic effects, we now proceed to fit the same model using the brms package, this time including the phylogenetic effect as a random effect to account for the non-independence among species. Including both species and phylogeny as random effects allows us to distinguish between two sources of variation: the phylogenetic effect captures correlations arising from shared evolutionary history, whereas the species-level random intercept absorbs residual species-specific variation not accounted for by the tree—such as ecological or methodological factors. Including both terms helps avoid underestimating heterogeneity at the species level and provides a more accurate partitioning of variance.

First, we run the same model as above, but using the brms package. Then, we will gradually increase the model complexity by adding phylogenetic effects and a scale part to the model.

formula_eg2.0 <- bf(
  frequency ~ 1 + condition + sex + (1 | species) + (1 | id)
)

prior_eg2.0 <- default_prior(formula_eg2.0, 
                        data = dat_pref, 
                        family = negbinomial(link = "log")
)

model_eg2.0 <- brm(formula_eg2.0, 
            data = dat_pref, 
            prior = prior_eg2.0,
            chains = 2, 
            iter = 5000, 
            warmup = 3000,
            thin = 1,
            family = negbinomial(link = "log"),
            save_pars = save_pars(all = TRUE)
            # control = list(adapt_delta = 0.95)
)

Then, we add the phylogenetic effect as a random effect to the model. This allows us to account for the non-independence among species due to shared evolutionary history.

tree <- read.nexus(here("data", "AM_est_tree.txt"))
tree <-force.ultrametric(tree, method = "extend")
# is.ultrametric(tree)

tip <- c("Lonchura_cantans","Uraeginthus_bengalus","Neochmia_modesta",
         "Lonchura_atricapilla","Padda_oryzivora","Taeniopygia_bichenovii",
         "Emblema_pictum","Neochmia_ruficauda","Lonchura_maja",
         "Taeniopygia_guttata","Chloebia_gouldiae","Lonchura_castaneothorax")
tree <- KeepTip(tree, tip, preorder = TRUE, check = TRUE)

# Create phylogenetic correlation matrix
A <- ape::vcv.phylo(tree, corr = TRUE)

# Specify the model formula (location-only)
formula_eg2.1 <- bf(
  frequency ~ 1 + condition + sex + 
    (1 | a | gr(phylo, cov = A)) +  # phylogenetic random effect
    (1 | species) +             # non-phylogenetic species-level random effect (ecological factors)
    (1 | id)                    # individual-level random effect
)

prior_eg2.1 <- brms::get_prior(
  formula = formula_eg2.1, 
  data = dat_pref, 
  data2 = list(A = A),
  family = negbinomial(link = "log")
)

model_eg2.1 <- brm(
  formula = formula_eg2.1, 
  data = dat_pref, 
  data2 = list(A = A),
  chains = 2, 
  iter = 12000, 
  warmup = 10000,
  thin = 1,
  family = negbinomial(link = "log"),
  prior = prior_eg2.1,
  control = list(adapt_delta = 0.95),
  save_pars = save_pars(all = TRUE)
)

Gradually increase model complexity

In the previous model, variation in individual gaze frequency was modeled assuming a constant dispersion parameter across all groups. However, visual inspection of the raw data suggests heteroscedasticity (i.e., group-specific variance). To accommodate this, we now fit a location-scale model where the shape (dispersion) parameter is allowed to vary by condition and sex.

This model helps capture structured variability in the data and may reveal whether the precision (or variability) of gaze frequency differs systematically between experimental groups.

# Specify the model formula (location-scale) - the scale part does not include random effects
formula_eg2.2 <- bf(
  frequency ~ 1 + condition + sex + 
    (1 | a | gr(phylo, cov = A)) +  
    (1 | species) +          
    (1 | id),              
shape ~ 1 + condition + sex
)

prior_eg2.2 <- brms::get_prior(
  formula = formula_eg2.2, 
  data = dat_pref, 
  data2 = list(A = A),
  family = negbinomial(link = "log", link_shape = "log")
)

model_eg2.2 <- brm(
  formula = formula_eg2.2, 
  data = dat_pref, 
  data2 = list(A = A),
  chains = 2, 
  iter = 12000, 
  warmup = 10000,
  thin = 1,
  family = negbinomial(link = "log", link_shape = "log"),
  prior = prior_eg2.2,
  control = list(adapt_delta = 0.95),
  save_pars = save_pars(all = TRUE)
)
# specify the model formula (location-scale)
## the scale part includes phylogenetic random effects and individual-level random effects

formula_eg2.3 <- bf(
  frequency ~ 1 + condition + sex + 
    (1 |a| gr(phylo, cov = A)) +  # phylogenetic random effect
    (1 | species) +            # non-phylogenetic species-level random effect (ecological factors)
    (1 | id),              # individual-level random effect
shape ~ 1 + condition + sex +
    (1 |a| gr(phylo, cov = A)) +  
    (1 | species) +  
    (1 | id)          
    )

prior_eg2.3 <- brms::get_prior(
  formula = formula_eg2.3, 
  data = dat_pref, 
  data2 = list(A = A),
  family = negbinomial(link = "log", link_shape = "log")
)

model_eg2.3 <- brm(
  formula = formula_eg2.3, 
  data = dat_pref, 
  data2 = list(A = A),
  chains = 2, 
  iter = 12000, 
  warmup = 10000,
  thin = 1,
  family = negbinomial(link = "log", link_shape = "log"),
  prior = prior_eg2.3,
  control = list(adapt_delta = 0.95),
  save_pars = save_pars(all = TRUE)
)

summary(model_eg2.3)

Compare models using information criteria

To assess whether model fit improves with increased complexity, we compare models using Leave-One-Out Cross-Validation (LOO).

options(future.globals.maxSize = 2 * 1024^3)
loo_eg2.0 <- loo::loo(model_eg2.0, moment_match = TRUE,
                      reloo = TRUE, cores = 2)
loo_eg2.1 <- loo::loo(model_eg2.1, moment_match = TRUE, 
                      reloo = TRUE, cores = 2)
loo_eg2.2 <- loo::loo(model_eg2.2, moment_match = TRUE,
                      reloo = TRUE, cores = 2)
loo_eg2.3 <- loo::loo(model_eg2.3, moment_match = TRUE,
                      reloo = TRUE, cores = 2)

fc_eg2 <- loo::loo_compare(loo_eg2.0, loo_eg2.1, loo_eg2.2, loo_eg2.3)

print(fc_eg2)
#             elpd_diff se_diff
# model_eg2.3   0.0       0.0  
# model_eg2.1 -11.5       4.5  
# model_eg2.2 -11.7       5.3  
# model_eg2.0 -13.5       4.9  

summary(model_eg2.3)

Model eg2.3, which includes both phylogenetic and species-level effects on the location and scale parts, performs best (highest elpd).

ELPD stands for Expected Log Predictive Density. It is a measure of out-of-sample predictive accuracy—that is, how well a model is expected to predict new, unseen data. - It is used in Bayesian model comparison, especially when using LOO-CV (Leave-One-Out Cross-Validation). - Higher ELPD means better predictive performance. - It is calculated by taking the log-likelihood of each observation, averaged over posterior draws, and then summed over all observations. - It rewards models that fit the data well without overfitting.

 Family: negbinomial 
  Links: mu = log; shape = log 
Formula: frequency ~ condition + sex + (1 | a | gr(phylo, cov = A)) + (1 | species) + (1 | id) 
         shape ~ condition + sex + (1 | a | gr(phylo, cov = A)) + (1 | species) + (1 | id)
   Data: dat_pref (Number of observations: 190) 
  Draws: 2 chains, each with iter = 12000; warmup = 10000; thin = 1;
         total post-warmup draws = 4000

Multilevel Hyperparameters:
~id (Number of levels: 95) 
                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)           0.31      0.11     0.05     0.51 1.00      533      501
sd(shape_Intercept)     0.21      0.15     0.01     0.57 1.00     1125     1444

~phylo (Number of levels: 12) 
                               Estimate Est.Error l-95% CI u-95% CI Rhat
sd(Intercept)                      0.49      0.25     0.05     1.04 1.00
sd(shape_Intercept)                0.74      0.38     0.09     1.63 1.00
cor(Intercept,shape_Intercept)     0.51      0.44    -0.64     0.99 1.00
                               Bulk_ESS Tail_ESS
sd(Intercept)                       759      670
sd(shape_Intercept)                1060      983
cor(Intercept,shape_Intercept)      854     1112

~species (Number of levels: 12) 
                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)           0.27      0.20     0.01     0.74 1.00      833     1644
sd(shape_Intercept)     0.46      0.33     0.02     1.24 1.00      826     1519

Regression Coefficients:
                        Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
Intercept                   5.07      0.30     4.46     5.65 1.00     1501
shape_Intercept             0.85      0.50    -0.16     1.84 1.00     1384
conditionsupplied          -0.77      0.11    -0.99    -0.54 1.00     3387
sexM                       -0.15      0.13    -0.40     0.11 1.00     2191
shape_conditionsupplied    -0.70      0.26    -1.22    -0.21 1.00     2251
shape_sexM                  0.12      0.26    -0.40     0.62 1.00     2814
                        Tail_ESS
Intercept                   2213
shape_Intercept             1884
conditionsupplied           2669
sexM                        2480
shape_conditionsupplied     2726
shape_sexM                  2867

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Accounting for phylogeny captures much of the between-species structure, it highlights evolutionary constraints or shared ecological traits. The remaining species-specific variance suggests additional factors (e.g., experimental nuances) not captured by the tree alone.

Extra

Here, we examine how sample size (N) affects estimates of the sample mean and sample standard deviation (SD). For each N = 2 to 50, draw 10000 samples from the standard normal distribution \(\mathcal{N}(0, 1)\). Then, compute mean and SD for each sample. After that, calculate the average of these estimates to see how close they are to the true values (mean = 0, SD = 1).

set.seed(2025)

n_reps <- 10000
Ns     <- 2:50

# generate all combinations of sample size (N) and replicate number
results <- expand.grid(
  N      = Ns,
  replic = seq_len(n_reps)
) %>%
  arrange(N, replic)

# pre-calculate how many total simulations to run and 
# generate a vector of sample sizes for all rows
n_total <- nrow(results)
Ns_vec <- results$N

# system.time({
# generate all random samples in one go (total number of draws = sum of all Ns)
samples_all <- rnorm(sum(Ns_vec), mean = 0, sd = 1)

# assign an ID to each sample, to keep track of which row (i.e., which N) it belongs to
row_id <- rep(seq_along(Ns_vec), times = Ns_vec)

# split the generated samples by row
split_samples <- split(samples_all, row_id)

# compute sample mean and SD for each group
means <- sapply(split_samples, mean)
sds   <- sapply(split_samples, sd)
# })

# add results into the data frame
results$samp_mean <- means
results$samp_sd <- sds

# summarise the average sample mean and SD per sample size (N)
summary2 <- results %>%
  group_by(N) %>%
  summarize(
    avg_mean = mean(samp_mean),
    avg_sd = mean(samp_sd),
    .groups = "drop"
  )

# plot A: Average Sample Mean vs. N
p_avg_mean <- ggplot(summary2, aes(x = N, y = avg_mean)) +
  geom_point(color = "#1f77b4", size = 2) +
  geom_line(color = "#1f77b4", size = 1) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "black") +
  labs(
    title = "Average Sample Mean vs. Sample Size (N = 2…50)",
    subtitle = "True mean = 0 (dashed line)",
    x = "Sample size (N)",
    y = "Average of 10000 sample means"
  ) +
  coord_cartesian(ylim = c(-0.4, 0.4)) +
  theme_classic(base_size = 13) +
  theme(
    panel.grid.minor = element_blank()
  )

# plot B: Average Sample SD vs. N
p_avg_sd <- ggplot(summary2, aes(x = N, y = avg_sd)) +
  geom_point(color = "#ff7f0e", size = 2) +
  geom_line(color = "#ff7f0e", size = 1) +
  geom_hline(yintercept = 1, linetype = "dashed", color = "black") +
  labs(
    title = "Average Sample SD vs. Sample Size (N = 2…50)",
    subtitle = "True SD = 1 (dashed line)",
    x = "Sample size (N)",
    y = "Average of 10000 sample SDs"
  ) +
  # coord_cartesian(ylim = c(-1.0, 1.0)) +
  theme_classic(base_size = 13) +
  theme(
    panel.grid.minor = element_blank()
  )

# display the two plots
p_avg_mean

p_avg_sd

For small sample sizes, sample variance and standard deviation tend to be biased, as shown in the bottom plot, the average sample SD is clearly lower than 1 when N is small. Once N exceeds approximately 20, the average sample SD (and variance) gets much closer to 1, and the bias becomes negligible.

References

  • Cleasby IR, Burke T, Schroeder J, Nakagawa S. (2011) Food supplements increase adult tarsus length, but not growth rate, in an island population of house sparrows (Passer domesticus). BMC Research Notes. 4:1-1. doi: 10.1186/1756-0500-4-431
  • Drummond H, Rodriguez C, Ortega S. (2025). Long-Term Insights into Who Benefits from Brood Reduction. Behavioral Ecology. doi: 10.1093/beheco/araf050
  • Mizuno A, Soma M. (2023) Pre-existing visual preference for white dot patterns in estrildid finches: a comparative study of a multi-species experiment. Royal Society Open Science. 10:231057. doi: 10.1098/rsos.231057
  • Lundgren EJ, Ramp D, Middleton OS, Wooster EI, Kusch E, Balisi M, Ripple WJ, Hasselerharm CD, Sanchez JN, Mills M, Wallach AD. (2022) A novel trophic cascade between cougars and feral donkeys shapes desert wetlands. Journal of Animal Ecology. 91:2348-57. doi: 10.1111/1365-2656.13766
  • Aki Vehtari, Andrew Gelman, Daniel Simpson, Bob Carpenter, Paul-Christian Bürkner (2021). Rank-Normalization, Folding, and Localization: An Improved Rhat for Assessing Convergence of MCMC (with Discussion). Bayesian Analysis. 16:667-718. doi: 10.1214/20-BA1221

Information about R session

This section shows the current R session information, including R version, platform, and loaded packages.

R version 4.4.2 (2024-10-31)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.5

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Edmonton
tzcode source: internal

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] knitr_1.50          kableExtra_1.4.0    here_1.0.1         
 [4] gt_1.0.0            tidybayes_3.0.7     patchwork_1.3.1    
 [7] bayesplot_1.13.0    MuMIn_1.48.11       loo_2.8.0          
[10] DHARMa_0.4.7        TreeTools_1.14.0    rstan_2.32.7       
[13] StanHeaders_2.32.10 phytools_2.4-4      maps_3.4.3         
[16] glmmTMB_1.1.11      emmeans_1.11.1      cmdstanr_0.9.0.9000
[19] brms_2.22.0         Rcpp_1.1.0          arm_1.14-4         
[22] lme4_1.1-37         Matrix_1.7-3        MASS_7.3-65        
[25] ape_5.8-1           broom.mixed_0.2.9.6 broom_1.0.8        
[28] lubridate_1.9.4     forcats_1.0.0       stringr_1.5.1      
[31] purrr_1.0.4         readr_2.1.5         tidyr_1.3.1        
[34] ggplot2_3.5.2       tidyverse_2.0.0     tibble_3.3.0       
[37] dplyr_1.1.4        

loaded via a namespace (and not attached):
  [1] svUnit_1.0.6            splines_4.4.2           bitops_1.0-9           
  [4] R.oo_1.27.1             lifecycle_1.0.4         Rdpack_2.6.4           
  [7] doParallel_1.0.17       rprojroot_2.0.4         vroom_1.6.5            
 [10] globals_0.18.0          processx_3.8.6          lattice_0.22-7         
 [13] ggdist_3.3.3            backports_1.5.0         magrittr_2.0.3         
 [16] sass_0.4.10             rmarkdown_2.29          yaml_2.3.10            
 [19] gap_1.6                 pkgbuild_1.4.8          minqa_1.2.8            
 [22] RColorBrewer_1.1-3      multcomp_1.4-28         abind_1.4-8            
 [25] expm_1.0-0              R.cache_0.17.0          quadprog_1.5-8         
 [28] R.utils_2.13.0          RCurl_1.98-1.17         TH.data_1.1-3          
 [31] tensorA_0.36.2.1        sandwich_3.1-1          inline_0.3.21          
 [34] listenv_0.9.1           bridgesampling_1.1-2    parallelly_1.45.0      
 [37] svglite_2.2.1           codetools_0.2-20        xml2_1.3.8             
 [40] tidyselect_1.2.1        farver_2.1.2            matrixStats_1.5.0      
 [43] stats4_4.4.2            jsonlite_2.0.0          survival_3.8-3         
 [46] iterators_1.0.14        systemfonts_1.2.3       foreach_1.5.2          
 [49] tools_4.4.2             glue_1.8.0              mnormt_2.1.1           
 [52] gridExtra_2.3           xfun_0.52               mgcv_1.9-3             
 [55] distributional_0.5.0    withr_3.0.2             numDeriv_2016.8-1.1    
 [58] combinat_0.0-8          fastmap_1.2.0           boot_1.3-31            
 [61] digest_0.6.37           timechange_0.3.0        R6_2.6.1               
 [64] estimability_1.5.1      textshaping_1.0.1       colorspace_2.1-1       
 [67] R.methodsS3_1.8.2       generics_0.1.4          clusterGeneration_1.3.8
 [70] htmlwidgets_1.6.4       scatterplot3d_0.3-44    pkgconfig_2.0.3        
 [73] gtable_0.3.6            furrr_0.3.1             htmltools_0.5.8.1      
 [76] TMB_1.9.17              scales_1.4.0            posterior_1.6.1        
 [79] reformulas_0.4.1        rstudioapi_0.17.1       tzdb_0.5.0             
 [82] reshape2_1.4.4          coda_0.19-4.1           checkmate_2.3.2        
 [85] nlme_3.1-168            curl_6.4.0              nloptr_2.2.1           
 [88] zoo_1.8-14              DEoptim_2.2-8           pillar_1.11.0          
 [91] grid_4.4.2              vctrs_0.6.5             PlotTools_0.3.1        
 [94] arrayhelpers_1.1-0      xtable_1.8-4            evaluate_1.0.4         
 [97] mvtnorm_1.3-3           cli_3.6.5               compiler_4.4.2         
[100] crayon_1.5.3            rlang_1.1.6             rstantools_2.4.0       
[103] labeling_0.4.3          ps_1.9.1                plyr_1.8.9             
[106] gap.datasets_0.0.6      stringi_1.8.7           viridisLite_0.4.2      
[109] QuickJSR_1.8.0          optimParallel_1.0-2     Brobdingnag_1.2-9      
[112] pacman_0.5.1            V8_6.0.4                hms_1.1.3              
[115] bit64_4.6.0-1           future_1.58.0           rbibutils_2.3          
[118] igraph_2.1.4            RcppParallel_5.1.10     phangorn_2.12.1        
[121] fastmatch_1.1-6         bit_4.6.0