Location–scale models in ecology and evolution: handling heterogeneity in continuous, count and proportion data

Step-by-step online tutorial

Published

September 12, 2025

This tutorial provides a comprehensive guide to location-scale models, focusing on their application in ecology and evolution. Location-scale models allow researchers to simultaneously model both the mean (location) and variability (scale) of a response variable as functions of predictors. This is useful in biological data analysis, where understanding not just the average response but also the variability around that average can provide deeper insights into biological processes.

Some of the models in this tutorial take a long time to run. To make the tutorial faster and easier to follow, we have precomputed selected models and saved them as .rds files.

If you would like to re-run the code yourself and reproduce the results shown in the tutorial, please make sure to do the following:

  1. Download the following files:

  2. Create a folder structure like this on your machine:

    your-project-folder/
    ├── online_tutorial.qmd
    ├── data/
    │   └── (CSV files)
    └── Rdata/
        └── (RDS files from Google Drive)
  3. Adjust file paths if your folder structure is different. We use here::here() to help make paths more portable, but if you move or rename files/folders, make sure to update functions like readRDS() or read.csv() accordingly:
# example: adjust these paths to match your own folder structure

dat <- read.csv(here("data", "XXX.csv"), header = TRUE) 
brms_m1 <- readRDS(here("Rdata", "brms_m1.rds"))

You will see similar code snippets throughout the tutorial. Feel free to modify them based on your own setup. If everything is placed correctly, the .qmd file should run without any issues.

This tutorial provides a step-by-step guide to applying location-scale models in ecology, evolution, and environmental sciences. We focus on practical applications and demonstrate how to implement these models in R using the glmmTMB and brms packages.

For models that can be implemented in both (e.g., Model 1, Model 2, or the negative binomial model in Beyond Gaussian 1), we provide code for both packages. However, for models that are currently only feasible in brms (e.g., Model 3, where the location and scale parts include random effects), we illustrate the analysis using brms alone.

After a brief introduction to both modeling frameworks, we present worked examples using real datasets. In the final section, we also discuss approaches for model comparison and selection.

1 Preparation

1.1 Load required packages

Our tutorial uses R statistical software and existing R packages, which you will first need to download and install.

This tutorial makes use of several R packages for data manipulation, model fitting, diagnostics, visualisation, and reporting:

  • dplyr, tibble, tidyverse - for efficient and tidy data manipulation.

  • brms, glmmTMB, arm - for fitting generalised linear mixed models.

  • cmdstanr - interfaces with the CmdStan backend (a C++ library for Bayesian inference), enabling within-chain parallelisation to speed up the sampling process.

  • DHARMa, loo, MuMIn - for model diagnostics, cross-validation, and multi-model inference (note that loo and MuMIn export some functions with the same names, so call them with explicit namespaces, e.g. loo::loo()).

  • ggplot2, patchwork, bayesplot, tidybayes - for flexible and publication-ready visualisations.

  • gt, kableExtra, knitr - for creating clean tables.

  • here - for consistent file path management across projects.

  • ape, TreeTools - for phylogenetic analyses and tree manipulation.

# Load required packages

pacman::p_load(
  ## data manipulation
  dplyr, tibble, tidyverse, broom, broom.mixed,
  
  ## model fitting
  ape, arm, brms, cmdstanr, emmeans, glmmTMB, MASS, phytools, rstan, TreeTools,
  
  ## model checking and evaluation
  DHARMa, loo, MuMIn, parallel,
  
  ## visualisation
  bayesplot, ggplot2, patchwork, tidybayes,
  
  ## reporting and utilities
  gt, here, kableExtra, knitr
)

1.2 glmmTMB vs brms

glmmTMB is a powerful and flexible R package for fitting generalized linear mixed models (GLMMs), including models with complex random-effect structures and a scale (dispersion) component. It is built on the Template Model Builder (TMB) framework, which allows fast and efficient maximum likelihood estimation even for large and complex models.

brms is an R package that allows users to fit Bayesian generalized (non-)linear multilevel models using the probabilistic programming language Stan. It provides a user-friendly formula syntax similar to that of lme4 or glmmTMB, and supports a wide range of distributions, link functions, and advanced model components, including location-scale modeling.

Both packages are suitable for fitting location-scale models and are widely used in ecology and its related fields. Therefore, we selected them to illustrate how location-scale models can be practically applied in real data analysis.

While brms is a powerful and flexible package for Bayesian regression modeling, some readers may not be familiar with its usage. Below, we provide a brief introduction to fitting models with brms, focusing on the basic location-scale structure and the key functions relevant to our analysis. You can also find examples in each section.

If you get stuck or are unsure about something, it might be helpful to check the example below:

This example shows how to fit a simple location-scale model, where both the mean (\(\mu\)) and the variability (\(\sigma\)) of a continuous outcome variable \(y\) are modeled as functions of a predictor \(x\). Note that, throughout this tutorial, we explicitly write y ~ 1 + x to indicate that the model includes an intercept. However, y ~ x also includes an intercept by default, so both formulations are equivalent.

# Example dataset
# y is continuous response, x is a predictor
# specify the model using bf()
formula1 <- bf(
  y ~ 1 + x,  # location part
  sigma = ~ 1 + x # scale part - specified by sigma
)

# generate default priors based on the formula and data
default_priors <- default_prior(
  formula1,
  data = dat,
  family = gaussian()
)

# fit the model - you can change the number of iter, warmup (and thin), and also chains.
m1 <- brm(
  formula1,
  data = dat,
  family = gaussian(),
  prior = default_priors,
  iter = 2000,   # total iterations per Markov chain (i.e., how many draws are taken, including warm-up)
  warmup = 1000, # early draws used only for adapting the sampler (step size, mass matrix); these are discarded
  thin = 1,      # keep every n-th post-warm-up draw; 1 means keep all draws
  chains = 2     # number of independent MCMC chains run in parallel; provides a convergence check (via Rhat)
)
summary(m1)

After fitting the model, you can use summary(m1) to inspect the estimated coefficients and sigma with 95% Credible Intervals, along with diagnostic statistics such as Rhat and effective sample size. To better understand how to interpret the model output, please refer to the “Bonus - brms” part in the next section.
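Beyond summary(), a few other brms functions are useful for inspecting the fitted object. A brief sketch, applied to the m1 model fitted above:

fixef(m1)     # posterior means, errors, and 95% credible intervals of the population-level effects
plot(m1)      # trace plots and posterior densities for a quick convergence check
pp_check(m1)  # posterior predictive check: compares data simulated from the model with the observed data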

1.2.1 Parallel Processing

Before fitting our models with brms, we configure some global options to optimize sampling speed using parallel processing:

  1. parallel::detectCores(): This function automatically detects the number of logical CPU cores available on your machine. This is a convenient way to ensure your code adapts to different computing environments.
  2. options(mc.cores = parallel::detectCores()): The mc.cores option is a global setting primarily used by rstan (the engine behind brms). It controls the number of MCMC chains that will be run in parallel. By setting it to detectCores(), you are telling brms to run as many chains concurrently as your CPU allows, significantly speeding up the overall sampling process.
  3. options(brms.threads = 6): The brms.threads option specifies the number of CPU threads that Stan’s internal operations can utilize within a single MCMC chain. This enables within-chain parallelisation, further accelerating computations, especially for complex or large datasets. The value 6 is an example; you can adjust this based on your specific CPU architecture and memory.

Threading is a powerful feature that enables you to split a chain into multiple parallel threads, significantly reducing computation time. However, it requires installing both cmdstanr and the underlying CmdStan backend.

cmdstanr::install_cmdstan()
cmdstanr::check_cmdstan_toolchain()
cmdstanr::cmdstan_version()

These settings are crucial for making Bayesian model fitting with brms more efficient, particularly for complex models or large datasets.

parallel::detectCores()
[1] 12
options(mc.cores = parallel::detectCores())

options(brms.threads = 6)  # Set global default
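If you prefer to set these options per model instead of (or in addition to) globally, brm() accepts the corresponding arguments directly. Below is a minimal sketch reusing formula1 and dat from the example above; the object name m1_threaded and the specific numbers of cores and threads are only illustrative and should be adapted to your machine:

m1_threaded <- brm(
  formula1,
  data = dat,
  family = gaussian(),
  chains = 2,
  cores = 2,               # between-chain parallelisation: one core per chain
  backend = "cmdstanr",    # the CmdStan backend is required for threading
  threads = threading(6)   # within-chain parallelisation: 6 threads per chain
)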

brms models can be computationally intensive and take a significant amount of time to run. To streamline your workflow, we provide the pre-fit models in RDS files, allowing you to load them directly without needing to re-run the lengthy estimation process.

1.3 Specifying the Scale Component

Here’s how the scale component is handled in glmmTMB and brms, along with common parameter names for various distributions:

Distribution             Scale Parameter (Example)   glmmTMB Specification   brms Specification (Example)
Gaussian                 \(\sigma\)                  dispformula = ~ ...     bf(..., sigma ~ ...)
Negative Binomial        \(\theta\)                  dispformula = ~ ...     bf(..., shape ~ ...)
Conway-Maxwell-Poisson   \(\nu\)                     dispformula = ~ ...     bf(..., shape ~ ...)
Beta-Binomial            \(\phi\)                    dispformula = ~ ...     bf(..., phi ~ ...)

Note: In glmmTMB, dispformula is generally used to model the dispersion or scale parameter, regardless of its specific Greek letter notation, which varies by distribution.

The name of the scale parameter varies by distribution: for example, \(\sigma\) for Gaussian, \(\theta\) for negative binomial, \(\nu\) for Conway–Maxwell–Poisson, and \(\phi\) for the beta-binomial distribution (see the main text).
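To make this concrete, here is a minimal sketch of a negative binomial location-scale model written in both packages. The count response count, the predictor x, and the data frame dat are hypothetical placeholders:

## glmmTMB: the scale (dispersion) part is given by dispformula
m_nb_tmb <- glmmTMB(count ~ 1 + x,
                    dispformula = ~ 1 + x,
                    family = nbinom2(),
                    data = dat)

## brms: the scale part is the distributional parameter shape inside bf()
m_nb_brm <- brm(bf(count ~ 1 + x,
                   shape ~ 1 + x),
                family = negbinomial(),
                data = dat)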

1.4 Interpreting regression coefficients on the log scale

Before moving on to the examples, we provide a brief overview of how to interpret regression coefficients when the response variable or a distributional parameter is modelled on a transformed (e.g. log) scale. This is relevant for many of the models we will discuss in this tutorial. The models we use (Gaussian with log-transformed response, negative binomial, zero-one-inflated beta, Conway–Maxwell–Poisson, etc.) employ a log link for at least one distributional parameter. In these cases, regression coefficients (\(\beta\)) describe multiplicative changes in the parameter of interest. After exponentiation, they can usually be interpreted as percentage changes, with the important exception of logit-linked parameters, which yield odds ratios instead.

1.4.1 Rule of thumb

  • When using a log link:
    • \(\exp(\beta)\) gives the multiplicative ratio associated with a one-unit increase in the predictor.
    • \(100 \times (\exp(\beta) - 1)\) gives the percent change in the parameter associated with a one-unit increase in the predictor (positive values indicate an increase, negative values a decrease).
      This applies to parameters such as the mean, variance/dispersion, or precision when they are modelled with a log link.
  • When using a logit link:
    Exponentiated coefficients represent odds ratios, not percentage changes.
    Example: \(\beta = 0.7 \;\Rightarrow\; \exp(0.7) \approx 2.0\), meaning the odds are about twice as high for a one-unit increase in the predictor.
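As a quick numeric illustration of these rules (the coefficient values below are made up):

# log link: multiplicative ratio and percent change
beta_log <- 0.05
exp(beta_log)                # ratio ~ 1.05
100 * (exp(beta_log) - 1)    # ~ +5% per one-unit increase in the predictor

# logit link: odds ratio, not a percent change
beta_logit <- 0.7
exp(beta_logit)              # odds ratio ~ 2.0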

1.4.2 Intercept vs. coefficients

  • Intercept:
    • If the response was log-transformed before fitting (e.g. Gaussian on log(Y)), exponentiating the intercept gives the geometric mean of Y (the median under a log-normal assumption) in the reference group.
    • If a log link was used (e.g. Poisson, negative binomial), exponentiating the intercept gives the mean of Y in the reference group.
  • Coefficients:
    • Each coefficient is an additive effect on the log scale. After exponentiation, it represents a multiplicative ratio on the original response scale.
    • Example: \(\beta = 0.05 \;\Rightarrow\; \exp(0.05) \approx 1.05\) -> the expected response is about 5% larger.
    • Negative coefficients imply proportional decreases (e.g. \(\beta = -0.30 \;\Rightarrow\; \exp(-0.30) \approx 0.74\) -> about a 26% reduction).

1.4.3 What the change applies to…

1.4.3.1 Location (mean of Y)

  • Log-transformed response (e.g. Gaussian on log(Y)):
    Coefficients describe multiplicative changes in the original response Y.

  • Models with a log link (e.g. Poisson, negative binomial, Gaussian with log link):
    Coefficients describe multiplicative changes in the mean of Y.

  • Zero-one-inflated beta (mean, logit link):
    Exponentiated coefficients are odds ratios, not percent changes.

1.4.3.2 Scale (variance, dispersion, precision)

  • Negative binomial (log link for \(\theta\), precision):
    Coefficients describe % changes in the precision parameter \(\theta\) (larger \(\theta\) = less overdispersion).

  • Conway–Maxwell–Poisson (\(\nu\), log link):
    Coefficients describe % changes in \(\nu\) (\(\nu > 1\) = under-dispersion, \(\nu < 1\) = over-dispersion).

  • Zero-one-inflated beta:
    • Precision \(\phi\) (log link): interpret as % change in \(\phi\) (larger \(\phi\) = less extra-binomial scatter).
    • Zero-one inflation (logit link): exponentiated coefficients are odds ratios for the probability of structural zeros or ones.

1.4.4 Multiple-unit changes, interactions, and uncertainty

  • Multiple-unit changes: For a change of \(k\) units in the predictor, multiply the coefficient by \(k\) before exponentiating: the ratio is \(\exp(k \times \beta)\) and the percent change is \(100 \times (\exp(k \times \beta) - 1)\).
  • Interactions: Sum the relevant coefficients (main effects + interaction term) before exponentiating.
    • Why not sum CIs directly? Because fixed effects are correlated. Proper CI calculation requires using the full variance–covariance matrix (frequentist) or summing posterior draws (Bayesian).
  • Confidence/credible intervals: If a coefficient's interval ranges from L to U, the multiplicative ratio lies between \(\exp(L)\) and \(\exp(U)\), and the % change lies between \(100 \times (\exp(L) - 1)\) and \(100 \times (\exp(U) - 1)\) (see the short code sketch below).
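The following sketch shows how these back-transformations look in practice. Here m1 is the brms model fitted in Section 1.2, m_tmb stands for any glmmTMB fit, and the coefficient names (b_x, b_x:z) are hypothetical; replace them with the names from your own model:

## frequentist (glmmTMB): exponentiate the estimate and the interval limits
## (only for coefficients modelled with a log link, e.g. the dispersion part)
ci <- confint(m_tmb)                          # columns: 2.5 %, 97.5 %, Estimate
100 * (exp(ci[, c("2.5 %", "97.5 %")]) - 1)   # percent changes with 95% CIs

## Bayesian (brms): work on the posterior draws, then summarise
draws <- as_draws_df(m1)
ratio <- exp(draws$b_x)                       # multiplicative ratio for predictor x
# for an interaction, sum the relevant draws first, e.g. exp(draws$b_x + draws$`b_x:z`)
quantile(ratio, probs = c(0.025, 0.5, 0.975))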