5  Beyond Gaussian 1

This dataset comes from Mizuno and Soma (2023), which investigates the visual preferences of estrildid finches. The study measured how often birds gazed at different visual stimuli (white dots vs. white stripes) under two conditions: food-deprived and food-supplied. The research originally tested the sensory bias hypothesis, which suggests that a pre-existing preference for whitish, round objects (like seeds) may have influenced the evolution of plumage patterns.

Here, to keep things simple, we use only a subset of the data related to the white dot pattern stimulus. We analyse how gaze frequency toward dot stimuli differs between the two conditions (food-deprived vs. food-supplied), using a location-scale model with a negative binomial distribution.

Although this model includes two random effects (individual and species), it follows the same basic structure as Model 2 - the only difference is that we use a non-Gaussian distribution (negative binomial) instead of a Gaussian one.

5.0.1 Questions

  1. Do birds gaze at dot patterns more (or less) when food-deprived compared to food-supplied?
  2. Does the variability in gaze responses differ between conditions?

5.0.2 Variables included

The dataset contains the following variables:

  • species and phylo: The species of the bird (12 species).
  • id: The individual bird’s ID.
  • sex: Male or female.
  • condition: The condition under which the data was collected (food-deprived or food-supplied).
  • frequency: The total frequency of gazes towards the stimulus over a 1-hour period.
  • p_dot: Having plumage dot patterns or not.
  • termite: The presence of termite-eating diet.

5.1 Visualise the datasets

First, we visualise the data to understand how gaze frequency varies by condition and species.

set.seed(42) 

# load the dataset ----
dat_pref <- read.csv(here("data", "AM_preference.csv"), header = TRUE)

dat_pref <- dat_pref %>%
  dplyr::select(-stripe, -full, -subset) %>%
  rename(frequency = dot) %>%
  mutate(species = phylo, across(c(condition, sex), as.factor))
  
ggplot(dat_pref, aes(x = condition,
                     y = frequency)) +
    geom_violin(color = "#8B8B83", fill = "white",
              width = 1.2, alpha = 0.3) +
  geom_jitter(aes(shape = sex, color = sex),
              size = 3, alpha = 0.8,
              width = 0.15, height = 0) +
  labs(
    title = "Total frequency of gazes towards dot patterns",
    x = "Condition",
    y = "Total frequency of gazes (1hr)"
  ) +
  scale_shape_manual(values = c("M" = 17, "F" = 16),
                     labels = c("M" = "Male", "F" = "Female")) +
  scale_color_manual(values = c("M" = "#009ACD", "F" = "#FF4D4D"),
                     labels = c("M" = "Male", "F" = "Female")) +
  theme_classic(base_size = 16) +
  theme(
    axis.text = element_text(color = "#6E7B8B", size = 14),
    axis.title = element_text(color = "#6E7B8B", size = 14),
    legend.title = element_text(color = "#6E7B8B"),
    legend.text = element_text(color = "#6E7B8B"),
    legend.position = "right",
  axis.text.x = element_text(angle = 45, hjust = 1)
  )

The plot breaks down the same gaze data by species. Each point is a bird’s gaze frequency under a condition. Colored lines represent species-specific means across conditions, helping to visualise - how different species vary in their average gazing behaviour, whether species respond differently to food deprivation. Faceting by species can help compare species-level patterns.

ggplot(dat_pref, aes(x = condition, y = frequency)) +
  geom_point(alpha = 0.5) +
  labs(
    title = "Total frequency of gazes towards dot patterns by species",
    x = "Condition",
    y = "Total frequency of gazes (1hr)"
  ) +
  stat_summary(fun = mean, geom = "line", aes(group = species, color = species)) +
  theme_classic() +
  facet_wrap(~ species)

5.2 Run models and interpret results

We fit and compare two types of models to analyse gaze frequency toward white dot patterns using the glmmTMB package:

  • Location-only model: Estimates how the mean gaze frequency varies between the two experimental conditions (food-deprived vs. food-supplied).

  • Location-scale model: Estimates how both the mean and the variability (dispersion) of gaze frequency vary between conditions.

These models help us examine whether internal states such as hunger (represented by food deprivation) influence not only the average number of gazes, but also the species/individual variability in gaze behaviour. Such variability may reflect differences in exploratory tendencies among species/individuals. Please note that in this section, species are not modelled with phylogenetic relatedness. For an example of how to fit a location–scale phylogenetic regression model, see Section Model selection (example 2).

5.2.1 Model fitting

We use negative binomial model here, as the response variable frequency is a count of gazes and may exhibit overdispersion. In The glmmTMB package, we can specify the family as nbinom2 for negative binomial distribution with a log link function.

# location model ----
## the mean (location) of gaze frequency with condition and sex as fixed effects; assumes constant variance.

model_0 <- glmmTMB(
  frequency ~ 1 +  condition + sex + (1 | species) + (1 | id), # location part (mean)
  data = dat_pref,
  family = nbinom2(link = "log")
)
 
summary(model_0)
 Family: nbinom2  ( log )
Formula:          frequency ~ 1 + condition + sex + (1 | species) + (1 | id)
Data: dat_pref

     AIC      BIC   logLik deviance df.resid 
  2176.3   2195.8  -1082.2   2164.3      184 

Random effects:

Conditional model:
 Groups  Name        Variance Std.Dev.
 species (Intercept) 0.3725   0.6103  
 id      (Intercept) 0.1353   0.3679  
Number of obs: 190, groups:  species, 12; id, 95

Dispersion parameter for nbinom2 family (): 1.72 

Conditional model:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)        4.84086    0.22151  21.854  < 2e-16 ***
conditionsupplied -0.92168    0.12080  -7.630 2.36e-14 ***
sexM              -0.06673    0.14422  -0.463    0.644    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
confint(model_0)
                                 2.5 %     97.5 %   Estimate
(Intercept)                  4.4067149  5.2750101  4.8408625
conditionsupplied           -1.1584490 -0.6849075 -0.9216782
sexM                        -0.3493977  0.2159315 -0.0667331
Std.Dev.(Intercept)|species  0.3467243  1.0743326  0.6103255
Std.Dev.(Intercept)|id       0.2105812  0.6427136  0.3678905

For Gaussian models, we can use the Q-Q plot to show whether the residuals fall along a straight line. However, for non-Gaussian models, we can use the DHARMa package to check the residuals. The DHARMa package provides a set of diagnostic tools for generalized linear mixed models (GLMMs) and allows us to simulate residuals and check for overdispersion, outliers, and uniformity.

# location-only model (NB) ----
# main diagnostic plots
model0_res <- simulateResiduals(model_0, plot = TRUE, seed = 42)

# formal test for over/underdispersion
testDispersion(model0_res) 


    DHARMa nonparametric dispersion test via sd of residuals fitted vs.
    simulated

data:  simulationOutput
dispersion = 0.45606, p-value = 0.456
alternative hypothesis: two.sided

The DHARMa diagnostics indicate no issues with overdispersion or outliers (Dispersion test: p = 0.456; Outlier test: p = 0.88). However, the KS test for uniformity is significant (p = 2e-05), suggesting deviation from the expected residual distribution. Additionally, residuals show non-uniform patterns across levels of the categorical predictor (catPred), and the Levene’s test indicates heterogeneity of variance among groups. These results suggest that the model may not fully capture the structure associated with the categorical predictor.

The result fromDHARMa diagnostics suggests that the location-only model may not fully capture the structure of the data. Therefore, we can fit a location-scale model to account for both the mean and variance of the response variable.

# location-scale model ---- 
# both the mean (location) and the variance (scale) as functions of condition and sex.

model_1 <- glmmTMB(
  frequency ~ 1 + condition + sex + (1 | species) + (1 | id),
   dispformula = ~ condition + sex,     
  data = dat_pref,                          
  family = nbinom2(link = "log")
)
summary(model_1)
 Family: nbinom2  ( log )
Formula:          frequency ~ 1 + condition + sex + (1 | species) + (1 | id)
Dispersion:                 ~condition + sex
Data: dat_pref

     AIC      BIC   logLik deviance df.resid 
    2172     2198    -1078     2156      182 

Random effects:

Conditional model:
 Groups  Name        Variance Std.Dev.
 species (Intercept) 0.3047   0.5520  
 id      (Intercept) 0.1173   0.3425  
Number of obs: 190, groups:  species, 12; id, 95

Conditional model:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)         4.8737     0.2042  23.868  < 2e-16 ***
conditionsupplied  -0.8452     0.1206  -7.010 2.38e-12 ***
sexM               -0.1040     0.1370  -0.759    0.448    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Dispersion model:
                  Estimate Std. Error z value Pr(>|z|)   
(Intercept)         0.8394     0.2597   3.232  0.00123 **
conditionsupplied  -0.6616     0.2480  -2.667  0.00765 **
sexM                0.1320     0.2486   0.531  0.59549   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
confint(model_1)
                                      2.5 %     97.5 %   Estimate
cond.(Intercept)                  4.4734822  5.2739084  4.8736953
cond.conditionsupplied           -1.0814991 -0.6088973 -0.8451982
cond.sexM                        -0.3725499  0.1646364 -0.1039568
disp.(Intercept)                  0.3303327  1.3485194  0.8394261
disp.conditionsupplied           -1.1476836 -0.1754389 -0.6615612
disp.sexM                        -0.3552045  0.6191168  0.1319561
cond.Std.Dev.(Intercept)|species  0.3079456  0.9894436  0.5519916
cond.Std.Dev.(Intercept)|id       0.1719628  0.6820707  0.3424774

Let’s compare the two models to see if the location-scale model provides a better fit to the data than the location-only model.

# compare models ----
model.sel(model_0, model_1)

You can quickly check the results of the location-only model (model 0) and the location-scale model (model 1) below.

Location-only model
Term Estimate Std. Error 95% CI (low) 95% CI (high)
(Intercept) 4.841 0.222 4.407 5.275
conditionsupplied −0.922 0.121 −1.158 −0.685
sexM −0.067 0.144 −0.349 0.216
Location-scale model (location part)
Term Estimate Std. Error 95% CI (low) 95% CI (high)
(Intercept) 4.874 0.204 4.473 5.274
conditionsupplied −0.845 0.121 −1.081 −0.609
sexM −0.104 0.137 −0.373 0.165
Location-scale model (dispersion part)
Term Estimate Std. Error 95% CI (low) 95% CI (high)
(Intercept) 0.839 0.260 0.330 1.349
conditionsupplied −0.662 0.248 −1.148 −0.175
sexM 0.132 0.249 −0.355 0.619

The Conway-Maxwell-Poisson (CMP) model is a flexible count data model that can handle overdispersion and underdispersion.

# CMP model ----

model_2 <- glmmTMB(
  frequency ~ 1 + condition + sex + (1 | species) + (1 | id),
  dispformula = ~ condition + sex, 
  data = dat_pref,
  family = compois(link = "log")
)
summary(model_2)

confint(model_2)

model.sel(model_0, model_1, model_2)
 Family: compois  ( log )
Formula:          frequency ~ 1 + condition + sex + (1 | species) + (1 | id)
Dispersion:                 ~condition + sex
Data: dat_pref

     AIC      BIC   logLik deviance df.resid 
  2158.4   2184.4  -1071.2   2142.4      182 

Random effects:

Conditional model:
 Groups  Name        Variance Std.Dev.
 species (Intercept) 0.246    0.4960  
 id      (Intercept) 0.151    0.3886  
Number of obs: 190, groups:  species, 12; id, 95

Conditional model:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)         4.8614     0.1857  26.178  < 2e-16 ***
conditionsupplied  -0.7823     0.1044  -7.495 6.62e-14 ***
sexM               -0.1099     0.1257  -0.874    0.382    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Dispersion model:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)         4.0338     0.4457   9.050   <2e-16 ***
conditionsupplied   1.1965     0.7681   1.558    0.119    
sexM               -0.1428     0.5530  -0.258    0.796    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
                                      2.5 %     97.5 %   Estimate
cond.(Intercept)                  4.4974266  5.2253653  4.8613959
cond.conditionsupplied           -0.9868149 -0.5776988 -0.7822569
cond.sexM                        -0.3562620  0.1364832 -0.1098894
disp.(Intercept)                  3.1602735  4.9074247  4.0338491
disp.conditionsupplied           -0.3089514  2.7019951  1.1965218
disp.sexM                        -1.2266046  0.9409510 -0.1428268
cond.Std.Dev.(Intercept)|species  0.2691415  0.9139714  0.4959714
cond.Std.Dev.(Intercept)|id       0.2503174  0.6033250  0.3886165

Here, we compared three generalised linear mixed models. model_0 was a Negative Binomial (NB) model with a single, overall estimated dispersion parameter (i.e., not modelled as a function of predictors). model_1 extended this by modelling the NB dispersion parameter as a function of condition and sex. model_2 further used a Conway-Maxwell-Poisson (CMP) distribution, which can accommodate both under- and over-dispersion, with its dispersion parameter similarly modelled by condition and sex.

Model comparison based on AICc strongly favored the location-scale CMP model (model_2: AICc = 2159.2, model weight = 0.999), with the location-scale NB model (model_1: AICc = 2172.8) and the NB model with a single dispersion parameter (model_0: AICc = 2176.8) performing substantially worse.

In all three models, food-supplied birds showed significantly lower gaze frequencies toward dot stimuli. There was no evidence of sex differences in the location part in any model.

Regarding the dispersion parameter, in the location-scale NB model (model_1), the dispersion part revealed a significant change in residual variance in the food-supplied condition (\(\beta_{[\text{conditionSupplied}]}^{(l)}\) = –0.662). Given the parameterisation of the Negative Binomial distribution (where a smaller dispersion parameter \(\theta\) indicates greater overdispersion), this suggests an increase in overdispersion when food was supplied, implying less consistent behaviour across individuals. However, this effect was not statistically significant in the location-scale CMP model (model_2), where the corresponding estimate was 1.20 (p = 0.12). In the CMP model, a positive estimate for the dispersion parameter \(\nu\) would imply a decrease in overdispersion (i.e., more consistent behavior) if it were significant.

Random effects in all models consistently showed greater variance among species than among id within species. For example, in the location-scale CMP model, species-level variance was 0.246, while individual-level variance was 0.151.

5.2.2 Comparing Negative Binomial and CMP Models

Below, we compare the Negative Binomial (NB) and Conway-Maxwell-Poisson (CMP) models to assess which better fits the data. Model selection is based on AICc and residual diagnostics.

Model comparison of location-only and location-scale models in negative Binomial (NB) and Conway-Maxwell-Poisson (CMP) distributions.

All models passed the DHARMa dispersion and outlier tests, indicating appropriate handling of overall variance and absence of extreme observations. However, the Kolmogorov–Smirnov (KS) test consistently revealed significant deviations from the expected uniform distribution of residuals in all three cases, suggesting remaining misfits in distributional shape. The location-scale CMP model performed best in terms of within-group residual uniformity, showing no significant deviation in any predictor level. In contrast, both the location-only NB model and the location-scale NB model exhibited within-group deviations in some categories. These results collectively suggest that while none of the models perfectly capture the residual structure, the location-scale CMP model may offer the best overall fit among the candidates for the observed data characteristics.

You can also fit the location-scale model using the brms package. The difference is that in brms, you can specify scale parts using shape. Below, we show how to fit the same location-scale negative binomial model as above using brms.

formula1 <- bf(
  frequency ~ 1 + condition + sex + (1 | species) + (1 | id),
  shape ~ 1 + condition + sex
)

prior1 <- default_prior(formula1, 
                        data = dat_pref, 
                        family = negbinomial(link = "log", link_shape = "log")
)

system.time(model_nb_brms <- brm(formula1, 
            data = dat_pref, 
            prior = prior1,
            chains = 2, 
            iter = 5000, 
            warmup = 3000,
            thin = 1,
            family = negbinomial(link = "log", link_shape = "log"),
            # control = list(adapt_delta = 0.95)
)
)
summary(model_nb_brms)
 Family: negbinomial 
  Links: mu = log; shape = log 
Formula: frequency ~ condition + sex + (1 | species) + (1 | id) 
         shape ~ condition + sex
   Data: dat_pref (Number of observations: 190) 
  Draws: 2 chains, each with iter = 5000; warmup = 3000; thin = 1;
         total post-warmup draws = 4000

Multilevel Hyperparameters:
~id (Number of levels: 95) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.38      0.13     0.08     0.60 1.00      482      563

~species (Number of levels: 12) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.69      0.23     0.36     1.24 1.00      978     1816

Regression Coefficients:
                        Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
Intercept                   4.85      0.25     4.35     5.34 1.00     1044
shape_Intercept             0.87      0.28     0.36     1.45 1.00      966
conditionsupplied          -0.84      0.12    -1.09    -0.60 1.00     4843
sexM                       -0.11      0.14    -0.40     0.17 1.00     3632
shape_conditionsupplied    -0.70      0.28    -1.29    -0.19 1.00     2044
shape_sexM                  0.14      0.25    -0.35     0.61 1.00     4208
                        Tail_ESS
Intercept                   1637
shape_Intercept             1938
conditionsupplied           3049
sexM                        3220
shape_conditionsupplied     1777
shape_sexM                  2673

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Posterior predictive checks indicated that the negative binomial location–scale model adequately captured the central tendency of gaze frequencies. However, the model slightly underestimated the heaviness of the right tail, suggesting that a small number of extreme observations were not fully reproduced.

Note that we can also conduct the CMP model in brms using the family = brmsfamily("com_poisson") family, but it is still at an experimental stage and may not work well, so we recommend the glmmTMB package for CMP models for now…

  • Location part: Points and error bars indicate the estimated log(mean frequency) for each combination of feeding condition and sex. For glmmTMB, values are estimated marginal means (on the log scale) with 95% confidence intervals. For brms, values are posterior medians with 95% credible intervals. In addition, the brms panels include violin plots, which visualise the full posterior distributions of the estimates. Faint dots represent model-based predictions of log(mean frequency) for individual observations (i.e. fitted values including random effects), shown to illustrate how the model maps observed data points onto the underlying mean structure.

  • Scale part: Points and error bars show the estimated log(theta), the dispersion (shape) parameter of the negative binomial distribution, for each condition–sex cell. Because theta is a distributional parameter rather than an observed variable, there are no fitted or raw data points that can be plotted at the observation level. In the brms panels, violin plots again illustrate the posterior distributions for log(theta), while points and intervals indicate the posterior median and 95% credible interval.

5.2.3 Comparison of location-only model and location-scale model

  • Modeling dispersion improves model fit and better captures the structure of the data.

  • Location-scale model had a lower AICc (2172.8) than location-only model (2176.8), indicating better model fit. It also had a higher model weight (0.877 vs. 0.123), suggesting stronger support for the location-scale model.

5.2.4 Interpretation of location-scale model :

5.2.4.2 Biological meanings:

Location (mean) part:

  • Birds gazed significantly less at dots when food was supplied (\(\beta^{(l)}_{\text{deprived–supplied}} = -0.85\), 95% CI -1.08, -0.61), corresponding to a rate ratio of \(\exp(\beta^{(l)}) = 0.43\) (i.e., mean frequency reduced by 57.3%; CI −66.0% to −45.7%).
  • There is no significant sex difference in mean gaze frequency (\(\beta_{[\text{sex-male}]}^{(l)}\) = -0.104, 95% CI [-0.37, 0.16]$).

Scale (dispersion) part:

  • The supplied condition significantly reduced the precision parameter (\(\beta^{(s)}_{\text{deprived–supplied}}=-0.66\), 95% CI -1.15, -0.18), giving a \(\theta\)-ratio of \(\exp(\beta^{(s)})=0.52\) (i.e., 48.3% lower precision; CI −68.3% to −16.5%). Because lower \(\theta\) implies more scatter than the Poisson expectation, this means individual gaze behaviour was more variable when food was supplied.

  • There was no significant sex difference in the variability of gaze frequency (\(\beta^{(s)}_{\text{sex-male}}=0.13\), 95% CI -0.36, 0.62).

Location (mean) random effects:

  • Species-level variation (\(sd^{(l)}_{\text{species}}=0.55\), 95% CI 0.31, 0.99) exceeded within-species individual variation.

Location (dispersion) random effects:

  • Individual-level variation (\(sd^{(l)}_{\text{id}}=0.34\), 95% CI 0.17, 0.68) was present but smaller than species-level differences.

5.3 Conclusion

Q1: Do birds gaze at dot patterns more (or less) when food-deprived compared to food-supplied?

Answer: Yes. The average gaze frequency is lower when food is supplied.

Q2: Does the variability in gaze responses differ between conditions?

Answer: Yes. Birds showed more consistent gaze responses when food was deprived, and more varied responses when food was available.