::p_load("ape", "coda", "tidyverse", "here",
pacman"MCMCglmm", "brms", "MASS",
"phytools", "arm", "posterior")
A step-by-step tutorial
This online tutorial belongs to the paper “Promoting the use of phylogenetic multinomial generalised mixed-effects model to understand the evolution of discrete traits.” We are still updating the tutorial more reader-friendly - keep checking for the latest version!
Introduction
This tutorial provides a step-by-step guide to model different types of variables using MCMCglmm
and brms
. We show and explain how to implement 1. Gaussian, 2. binary, 3. ordinal, and 4. nominal models and interpret obtain results. We will also provide extensions: models with multiple data points per species and models with simulated data (simulation is powerful tool to understand the behaviour of the model and the effect of the prior distribution on the results!)
Why use phylogenetic comparative methods?
Phylogenetic comparative methods are used to consider the non-independence of species relatedness due to shared evolutionary history. If we ignore the phylogenetic relationship among species, we may violate the assumption of independence and underestimate the standard errors of the estimated parameters. These methods are widely used in evolutionary biology and ecology to test hypotheses about the evolution of traits and behaviours, reconstruct common ancestral traits, and investigate how different traits have co-evolved across species.
Why use MCMCglmm
and brms
in this tutorial?
MCMCglmm
and brms
allow us to make hierarchical and non-linear models. We can create more appropriate models that accurately reflect the underlying relationships within the data. Additionally, both packages can handle various types of response variables, including continuous, Poisson, binary, ordinal, and nominal data. Using Bayesian methods for parameter estimation, MCMCglmm
and brms
can incorporate prior distributions, which is particularly beneficial when working with limited data.
Intended audience
Our tutorial is designed for students and researchers who are familiar with using R
language (at least know what is R
!) and have an interest in using phylogenetic comparative methods. Although the examples in our tutorial focus on questions and data from Behavioural Ecology and Evolutionary Biology, we provide detailed explanations. Therefore, we believe these techniques are versatile and can be applied in other scientific disciplines, such as environment science, psychology, and beyond.
Setup R in your computer
Please install the following packages before running the code: MCMCglmm
, brms
, ape
, phytools
, here
, coda
, MASS
, tidyverse
, arm
. If this is the first time using brms
, you need to install Rstan
before installing brms
. Please follow the instruction on the official website.
Note 1: some models (especially nominal models) will take a long time to run. Thus, we recommend when you want to compare the results, you can save the model output as .rds
file using saveRDS()
and read it by readRDS()
when you want to compare the results.
Note 2: Because of the stochastic nature of MCMC, every time you (re)run a model, you will get a slightly different output even the model was well-mixing and converge. So even if you run the same model using the same computer, it would always be different to whatever our output here.
Note 3: Ideally, the effective sample size of a model should be close to 1000. However, We consider a model’s results reliable if the effective sample size (ESS) exceeds 400, as suggested by Vehtari et al. (2021). According to their recommendation, an ESS greater than 400 is typically sufficient to ensure stable estimates of variances, autocorrelations, and Monte Carlo standard errors based on their practical experience and simulations (more detailed information; Vehtari et al. 2021). Here, qe evaluate comprehensively in line with other convergence diagnostic indicators, such as R hat and autocorrelation plots.
1. Gaussian models
Gaussian models are used when the response variable is continuous (numeric) and normally distributed. For example, age, body size (e.g. weight, length, or height of an organism), temperature, and distance.
Explanation of dataset
We use the rodent dataset and phylogenetic tree provided by Sheard et al. (2024) to explain how to model Gaussian models. The data is about the rodent suborder Sciuromorpha (223 species), which includes squirrels, chipmunks, dormice, and the mountain beaver. The dataset contains several information including the tail length, body mass, mean annual temperature, and the presence of a tail contrasting tip. The tail contrasting tip is a white or black tip at the end of the tail. In this section, we test the evolutionary relationships between tail length & body mass (response variables) and mean annual temperature and the presence of tail contrasting tip (predictor variables) in rodents.
First of all, we modify the original dataset.
# 1. Read dataset
<- read.csv(here("data", "potential", "Rodent_tail", "RodentData.csv")) # Replace with your own folder path to where the data is stored
dt
str(dt) # check dataset - replace table format!!
<- dt %>% rename(Phylo = UphamTreeName.full) # Rename the 'UphamTreeName.full' column to 'Phylo'
dt <- dt %>% filter(Suborder == "Sciuromorpha") # Filter the rodent suborder
dt <- subset(dt, !is.na(Mass)) # Remove the species do not have body mass data
dt
# Tail length and body mass: log-transform to reduce skewness, then standardise (centre and scale) to have a mean of 0 and a standard deviation of 1, making it easier to compare with other variables.
# Temperature, shade score and litter size: standardise
$zLength <- scale(log(dt$Tail_length))
dt$zMass <- scale(log(dt$Mass))
dt$zShade<-scale(dt$shade_score)
dt$zTemp <- scale(dt$Mean_Annual_Temp)
dt$zLitter_size <- scale(dt$Litter_size)
dt
# Visualise zLength distribution
# hist(dt$zLength)
# Also, convert some columns to factors for analysis just in case
<- dt %>%
dt mutate(across(c(White.tips, Black.tips, Tufts, Naked, Fluffy, Contrasting, Noc, Di, Crep, Autotomy), as.factor))
# 2. Read phylogenetic tree
<- read.tree(here("data", "potential", "Rodent_tail", "RodentTrees.tre")) # 100 trees
trees <- trees[[1]] # In this time, we use only one tree
tree
#Trim out everything from the tree that's not in the modified dataset
<- lapply(trees, drop.tip,tip=setdiff(tree$tip.label, dt$Phylo))
trees
# Select one tree for trimming purposes
<- trees[[1]]
tree <- force.ultrametric(tree, method = c("extend")) # Convert non-ultrametric tree to ultrametric tree tree
How to model? How to interpret the output?
Let’s start by running the model. The first model we will fit is a simple animal model with no fixed effects, and only a random effect relating phylogeny. From here on, we will always begin with this simplest model to assess the phylogenetic signal of the response variable before adding more complexity by including one continuous and one categorical explanatory variable.
Making model is basically 3-4 steps:
Get phylogenetic variance-covariance (
MCMCglmm
) / correlation (brms
) matrixSet prior based on your model and run model with default
iteration
,thinning
, andburn-in
Check mixing whether the samples are adequately exploring the parameter space and convergence, posterior distribution of random and fixed effects
If model do not show good mixing or converge…
- change
iteration
,thinning
, andburn-in
- change prior setting
- change
Univariate model
Intercept-only momdel
MCMCglmm
To get the variance-covariance matrix, we use the inverseA()
function in MCMCglmm
:
<- inverseA(tree, nodes = "ALL", scale = TRUE) # Calculate the inverse of the phylogenetic relatedness matrix for all nodes, scaling the results inv_phylo
Then, we will define the priors for the phylogenetic effect and the residual variance using the following code:
<- list(R = list(V = 1, nu = 0.002), # Prior for residuals: weak inverse-Wishart prior for residual variance
prior1 G = list(G1 = list(V = 1, nu = 0.002))) # Prior for random effect: weak inverse-Wishart for random effect
we set the prior for the residual variance to be a scaled inverse-Wishart distribution with a scale parameter of 1 and degrees of freedom of 0.002. The prior for the phylogenetic effect is set to be a scaled inverse-Wishart distribution with a scale parameter of 1 and degrees of freedom of 0.002. This prior minimizes the influence of prior information, allowing the data to have a stronger impact on the model’s results. However, if you know your dataset characteristics, you can set a stronger (informative) prior.
InMCMCglmm
, we can set four elements in prior: R
(R-structure for residual), G
(G-structure for random effect), B
(fixed effects), and S
(theta_scale parameter). Here, we explain B, R, G
.
Parameter | Which elements | Meaning | Effect of increasing the value | Effect of decreasing the value |
---|---|---|---|---|
V | B, R, G | Variance parameter for the prior distribution | Increases variability in the prior, allowing more flexibility in parameter estimation. The model becomes more influenced by the data (= wider prior distribution) | Decreases variability in the prior, leading to more constrained estimates. The model becomes more influenced by the prior (= narrower prior distribution) |
mu | B | Mean of the prior distribution for fixed effects | Shifts the centre of the fixed effect estimates towards a larger value | Shifts the centre of the fixed effect estimates towards a smaller value |
nu | R, G | Degrees of freedom for the inverse Wishart distribution. Controls the “certainty” of the prior. Larger values result in smaller variance of the prior (higher certainty) | The prior becomes narrower, tending to decrease the variability of the variance-covariance matrix estimates. Stronger prior information is given | The prior becomes wider, tending to increase the variability of the variance-covariance matrix estimates. Weaker prior information is given. nu = 0.002 is often used |
fix | R, G | If fix = 1, the variance is fixed, and the specified prior value (V) is used. No estimation of variance is performed | NA | NA |
alpha.mu | R, G | Prior mean for the variance-covariance matrix. Specifies the central matrix for the variance-covariance matrix estimates. | The variance-covariance matrix estimates tend to approach the matrix specified by alpha.mu. | The variance-covariance matrix estimates tend to approach the matrix specified by alpha.mu (influences each element in the decreasing direction). |
alpha.V | R, G | Prior variance for the variance-covariance matrix. Controls the variability of the variance-covariance matrix estimates. | The variability of the variance-covariance matrix estimates tends to increase. | The variability of the variance-covariance matrix estimates tends to decrease. |
Note that the G element is for random effects variance (G1, G2, G3…). That means if you have 3 random effects, you need to define 3 priors in G.
InMCMCglmm
, the prior setting allows the model to account for different variance structures by dividing the variance components into two distinct matrices: G (phylogeny) and R (non-phylogeny, i.e., residual variance). From this setup, we can estimate the phylogenetic heritability, also known as the phylogenetic signal (Pagel’s \(\lambda\)), using the following simple equation:
\[ H^2 = \frac{\sigma_{a}^2}{\sigma_{a}^2 + \sigma_{e}^2} \]
When we do not include any explanatory variables (= intercept-only model), the model is added ~ 1
. We set a bit large nitt
, thin
, and burnin
to improve mixing and effective sample sizes than default settings (nitt
= 13000, thin
= 10, burnin
= 3000).
# You can check model running time by system.time() function. Knowing the running time is important when you try and error to find the best model setting!
system.time(
<- MCMCglmm(zLength ~ 1, # Response variable zLength with an intercept-only model
mcmcglmm_mg1 random = ~ Phylo, # Random effect for Phylo
ginverse = list(Phylo = inv_phylo$Ainv), # Specifies the inverse phylogenetic covariance matrix Ainv for Phylo, which accounts for phylogenetic relationships.
prior = prior1, # Prior distributions for the model parameters
family = "gaussian", # Family of the response variable (Gaussian for continuous data)
data = dt, # Dataset used for fitting the model
nitt = 13000*20, # Total number of iterations (multiplied by 20 to match scale)
thin = 10*20, # Thinning rate (multiplied by 20)
burnin = 3000*20 # Number of burn-in iterations (multiplied by 20)
)
)summary(mcmcglmm_mg1)
To begin with, we should check mixing and model convergence. We can check them via trace plots and effective sample size. The plot should look like a fuzzy caterpillar to make sure your model has converged.
plot(mcmcglmm_mg1$VCV) # Visualise variance component
plot(mcmcglmm_mg1$Sol) # Visualise location effects
autocorr.plot(mcmcglmm_mg1$VCV) # Check chain mixing
autocorr.plot(mcmcglmm_mg1$Sol) # Check chain mixing
summary(mcmcglmm_mg1) # See the result 95%HPD
# Iterations = 60001:259801
# Thinning interval = 200
# Sample size = 1000
# DIC: -26.20546
# G-structure: ~Phylo
# post.mean l-95% CI u-95% CI eff.samp
# Phylo 1.514 1.118 1.968 1000
# R-structure: ~units
# post.mean l-95% CI u-95% CI eff.samp
# units 0.02785 0.00892 0.04964 1000
# Location effects: zLength ~ 1
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) -0.1778 -1.3683 1.0638 1000 0.822
summary(mcmcglmm_mg1$VCV) # See the result of 95% Credible Interval of the variance component
# Iterations = 60001:259801
# Thinning interval = 200
# Number of chains = 1
# Sample size per chain = 1000
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
# Mean SD Naive SE Time-series SE
# Phylo 1.51394 0.22740 0.007191 0.007191
# units 0.02785 0.01066 0.000337 0.000337
# 2. Quantiles for each variable:
# 2.5% 25% 50% 75% 97.5%
# Phylo 1.11828 1.34700 1.49706 1.66954 1.9651
# units 0.01027 0.01996 0.02735 0.03436 0.0519
summary(mcmcglmm_mg1$Sol) # See the result of 95% Credible Interval of the location effects
# Iterations = 60001:259801
# Thinning interval = 200
# Number of chains = 1
# Sample size per chain = 1000
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
# Mean SD Naive SE Time-series SE
# -0.17781 0.63444 0.02006 0.02006
# 2. Quantiles for each variable:
# 2.5% 25% 50% 75% 97.5%
# -1.4874 -0.6218 -0.1435 0.2346 0.9961
It looks like both estimates mixed well. And the effective sample size (eff.samp
) is large enough. The summary output shows the posterior mean, standard deviation, and 95% credible interval of the fixed (Location effects
) and random effects (G-structure
) and also residuals (R-structure
).
brms
The syntax is similar to the packages lme4
and lmer
in R
. One of the advantages of brms
is that it can fit complex models with multiple random effects and non-linear terms. The model fitting process is similar to MCMCglmm
but with a different syntax. The output also differs between the two packages.
In brms
, We can use the default_prior
(or get_prior
) function to set prior for the model. This function automatically make a weakly informative prior, which is suitable for most models (more detailed information is here[url]). Of course, you can also set your own prior if you are familiar with your datasets and want to set more informative one (see the section 4).
Things to keep in your mind are to use vcv.phylo(phylogenetic_tree, corr = TRUE)
function to get the phylogenetic correlation matrix. brms
vignettes use vcv.phylo(phylogenetic_tree)
, but it looks like incorrect. You need to specify corr = TRUE
. Because the default is corr = FALSE
, which means the function returns the phylogenetic variance-covariance matrix, not the correlation matrix.
We set large iter
and warmup
to improve mixing and effective sample sizes than default settings (iter
= 2000, warmup
= 1000). Also, we set control = list(adapt_delta = 0.95)
to control parameter to avoid divergent transitions. Setting adapt_delta
closer to 1 increases the step size during sampling to improve model stability.
By default, Stan runs four independent Markov chains (chain = 4
), each exploring the posterior distribution. But four chains are rarely required; two are enough (chain = 2
).
<- ape::vcv.phylo(tree, corr = TRUE) # Get the phylogenetic correlation matrix A
In brms
, we can also check the Rhat
value to see if the model has converged. The Rhat
value should be close to 1.0.
plot(brms_mg1) # Visualise effects
summary(brms_mg1) # See the result
# Family: gaussian
# Links: mu = identity; sigma = identity
# Formula: zLength ~ 1 + (1 | gr(Phylo, cov = A))
# Data: dt (Number of observations: 300)
# Draws: 2 chains, each with iter = 10000; warmup = 5000; thin = 1;
# total post-warmup draws = 10000
# Multilevel Hyperparameters:
# ~Phylo (Number of levels: 300)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 1.09 0.07 0.95 1.24 1.00 1156 2572
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept -0.10 0.54 -1.16 0.97 1.00 2228 3739
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma 0.22 0.02 0.18 0.26 1.00 1091 2257
# Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
# and Tail_ESS are effective sample size measures, and Rhat is the potential
# scale reduction factor on split chains (at convergence, Rhat = 1).
To check the consistency between the results obtained from MCMCglmm
and brms
, we compare the mean of posterior distribution of the fixed and random effect estimates. You should note that the output of random effect estimates and residuals are presented in slightly different formats between MCMCglmm
and brms
. In MCMCglmm
, the outputs (G-structure
/ R-structure
) are provided as variance, while in brms
, the outputs (Multilevel Hyperparameters
/ Further Distributional Parameters
) are given as standard deviation. Therefore, when comparing these estimates between the two packages, you need to square the values obtained from brms
.
As you can see below, the MCMCglmm
and brms
estimates do not exactly match. However, the values are close, and the overall trend is consistent. This suggests that the outcomes obtained from either approach remain largely the same. Thus, even with some differences in the estimates, the broader results and patterns can be interpreted similarly. *The difference comes from factors such as each package’s sampling algorithms and the prior setting.
In this model, the phylogenetic signal is quite high - over 0.95
. This means that 95% of the variance in tail length is explained by the phylogenetic relationship among species. This is a strong phylogenetic signal, suggesting that closely related species tend to have similar tail lengths!
Intercept
summary(mcmcglmm_mg1$Sol) # MCMCglmm
# Iterations = 60001:259801
# Thinning interval = 200
# Number of chains = 1
# Sample size per chain = 1000
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
# Mean SD Naive SE Time-series SE
# -0.17781 0.63444 0.02006 0.02006
# 2. Quantiles for each variable:
# 2.5% 25% 50% 75% 97.5%
# -1.4874 -0.6218 -0.1435 0.2346 0.9961
<- as_draws_df(brms_mg1) # Convert brms object to data frame
draws_df summary(draws_df$b_Intercept) # brms
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# -2.21282 -0.46570 -0.09844 -0.10152 0.26306 1.84947
Random effect and phylogenetic signals (and also residuals)
# random effect
mean(mcmcglmm_mg1$VCV[, "Phylo"]) # MCMCglmm(g-structure)
# [1] 1.513938
mean(draws_df$sd_Phylo__Intercept)^2 # brms(Multilevel Hyperparameters)
# [1] 1.179172
# residuals
mean(mcmcglmm_mg1$VCV[, "units"]) # MCMCglmm(r-structure)
# [1] 0.02785073
mean(draws_df$sigma)^2 # brms(Further Distributional Parameters)
# [1] 0.04712953
# phylogenetic signals
<- mean(mcmcglmm_mg1$VCV[, "Phylo"]) / (mean(mcmcglmm_mg1$VCV[, "Phylo"]) + mean(mcmcglmm_mg1$VCV[, "units"]))
phylo_signal_mcmcglmm
<- mean(draws_df$sd_Phylo__Intercept)^2 / (mean(draws_df$sd_Phylo__Intercept)^2 + mean(draws_df$sigma)^2)
phylo_signal_brms
phylo_signal_mcmcglmm# [1] 0.9819361
phylo_signal_brms# [1] 0.9615677
One continuous explanatory variable model
MCMCglmm
system.time(
<- MCMCglmm(zLength ~ zTemp,
mcmcglmm_mg2 random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
prior = prior1,
family = "gaussian",
data = dt,
nitt = 13000*25,
thin = 10*25,
burnin = 3000*25
) )
summary(mcmcglmm_mg2)
# Iterations = 75001:324751
# Thinning interval = 250
# Sample size = 1000
# DIC: -5.777706
# G-structure: ~Phylo
# post.mean l-95% CI u-95% CI eff.samp
# Phylo 1.464 1.065 1.912 1000
# R-structure: ~units
# post.mean l-95% CI u-95% CI eff.samp
# units 0.03053 0.009984 0.05334 1000
# Location effects: zLength ~ zTemp
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) -0.17061 -1.48930 0.99052 1000 0.768
# zTemp 0.03310 -0.05767 0.11829 1000 0.482
posterior_summary(mcmcglmm_mg2$VCV)
# Estimate Est.Error Q2.5 Q97.5
# Phylo 1.46421444 0.2224519 1.06723218 1.912723
# units 0.03052545 0.0113458 0.01195931 0.057253
posterior_summary(mcmcglmm_mg2$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) -0.1706142 0.63514628 -1.4145991 1.1523617
# zTemp 0.0331045 0.04637504 -0.0522391 0.1252606
The fixed effects estimate for the relationship between tail length and standardized temperature is 0.0331 (posterior mean), with a 95% HPD interval of [-0.05, 0.11]. The pMCMC value is 0.482, indicating that the effect of standardised temperature is not statistically supported under this model. The intercept, representing the expected tail length when standardised temperature is zero, has a posterior mean of -0.171, with a 95% credible interval of [-1.42, 1.15], also suggesting no strong evidence for deviation from zero.
The 95% HPD (Highest Posterior Density interval) from MCMCglmm
and the 95% CI ( 95% equal-tailed credible interval) from brms
serve a similar purpose, but they are constructed differently and can yield different intervals, especially in asymmetric posterior distributions.
Highest Posterior Density (HPD) Interval:
The HPD interval is the narrowest interval that contains a specified proportion (e.g., 95%) of the posterior distribution. It includes the most probable values of the parameter, ensuring that every point inside the interval has a higher posterior density than any point outside.
Equal-Tailed Credible Interval (CI):
An equal-tailed credible interval includes the central 95% of the posterior distribution, leaving equal probabilities (2.5%) in each tail. This means that there’s a 2.5% chance that the parameter is below the interval and a 2.5% chance that it’s above.
https://search.r-project.org/CRAN/refmans/HDInterval/html/hdi.html
brms
<- default_prior(
default_priors2 ~ zTemp + (1|gr(Phylo, cov = A)),
zLength data = dt,
data2 = list(A = A),
family = gaussian()
)
system.time(
<- brm(zLength ~ zTemp + (1|gr(Phylo, cov = A)),
brms_mg2 data = dt,
data2 = list(A = A),
family = gaussian(),
prior = default_priors2,
iter = 10000,
warmup = 5000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
core = 2,
threads = threading(5)
) )
summary(brms_mg2)
# Family: gaussian
# Links: mu = identity; sigma = identity
# Formula: zLength ~ zTemp + (1 | gr(Phylo, cov = A))
# Data: dt (Number of observations: 223)
# Draws: 2 chains, each with iter = 10000; warmup = 5000; thin = 1;
# total post-warmup draws = 10000
# Multilevel Hyperparameters:
# ~Phylo (Number of levels: 223)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 1.20 0.09 1.03 1.38 1.00 864 1701
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept -0.15 0.60 -1.33 1.03 1.00 1900 3128
# zTemp 0.04 0.05 -0.05 0.13 1.00 1961 4758
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma 0.18 0.03 0.12 0.24 1.00 537 923
# Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
# and Tail_ESS are effective sample size measures, and Rhat is the potential
# scale reduction factor on split chains (at convergence, Rhat = 1).
The posterior estimate for the effect of zTemp on zLength is 0.04, with a 95% credible interval of [-0.05, 0.13]. The uncertainty surrounding this effect suggests that the relationship between standardised temperature and tail length is weak and not statistically significant. The intercept has a posterior estimate of -0.15 with a 95% credible interval of [-1.33, 1.03], representing the average tail length when zTemp = 0, although this is also uncertain.
The results of MCMCglmm
and brms are generally consistent, with a few differences. The estimates and credible intervals for the fixed effect (zTemp) are almost identical, both indicating a weak relationship. These differences mainly arise from the computational methods and output formats of the models but do not affect the conclusions.
One continuous and one categorical explanatory variable model
Finally, we add categorical explanatory variable, Contrasting
, to the above model. In this model, we test whether species with contrasting tail tips have longer tails than those without contrasting tail tips, after controlling for the effect of temperature.
MCMCglmm
system.time(
<- MCMCglmm(zLength ~ zTemp + Contrasting,
mcmcglmm_mg3 random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
prior = prior1,
family = "gaussian",
data = dt,
nitt=13000*30,
thin=10*30,
burnin=3000*30
) )
summary(mcmcglmm_mg3)
# Iterations = 90001:389701
# Thinning interval = 300
# Sample size = 1000
# DIC: 87.21227
# G-structure: ~Phylo
# post.mean l-95% CI u-95% CI eff.samp
# Phylo 1.19 0.9022 1.5 1113
# R-structure: ~units
# post.mean l-95% CI u-95% CI eff.samp
# units 0.04284 0.02576 0.05855 1168
# Location effects: zLength ~ zTemp + Contrasting
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) -0.184003 -1.362926 0.869770 1000 0.740
# zTemp 0.082215 -0.006456 0.155338 1000 0.050 .
# Contrasting1 0.163943 0.059351 0.290087 1000 0.004 **
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
posterior_summary(mcmcglmm_mg3$VCV)
# Estimate Est.Error Q2.5 Q97.5
# Phylo 1.19045414 0.16451779 0.90569889 1.51171543
# units 0.04283667 0.00864498 0.02680094 0.06011227
posterior_summary(mcmcglmm_mg3$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) -0.18400256 0.5840963 -1.262384395 1.0278077
# zTemp 0.08221542 0.0420212 0.001294616 0.1679503
# Contrasting1 0.16394271 0.0581162 0.036145589 0.2736073
Both temperature (\(zTemp\)) and contrasting patterns appear to influence tail length. Contrasting has a stronger effect than temperature.
brms
<- default_prior(
default_priors3 ~ zTemp + Contrasting + (1|gr(Phylo, cov = A)),
zLength data = dt,
data2 = list(A = A),
family = gaussian()
)
system.time(
<- brm(zLength ~ zTemp + Contrasting + (1|gr(Phylo, cov = A)),
brms_mg3 data = dt,
data2 = list(A = A),
family = gaussian(),
prior = default_priors3,
iter = 10000,
warmup = 5000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
core = 2,
threads = threading(5)
) )
summary(brms_mg3)
# Family: gaussian
# Links: mu = identity; sigma = identity
# Formula: zLength ~ zTemp + Contrasting + (1 | gr(Phylo, cov = A))
# Data: dt (Number of observations: 223)
# Draws: 2 chains, each with iter = 10000; warmup = 5000; thin = 1;
# total post-warmup draws = 10000
# Multilevel Hyperparameters:
# ~Phylo (Number of levels: 223)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 1.23 0.09 1.06 1.42 1.01 488 945
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept -0.16 0.62 -1.38 1.04 1.00 1728 3132
# zTemp 0.03 0.05 -0.05 0.12 1.00 1447 2230
# Contrasting1 0.15 0.06 0.02 0.27 1.00 2826 4736
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma 0.16 0.03 0.09 0.22 1.01 295 286
# Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
# and Tail_ESS are effective sample size measures, and Rhat is the potential
# scale reduction factor on split chains (at convergence, Rhat = 1).
In brms
, \(Contrasting\) has a positive effect on tail length. The posterior estimate is 0.15, with a 95% credible interval of [0.03, 0.27]. This suggests that species with contrasting tail tips have longer tails than those without contrasting tail tips, after controlling for the effect of temperature. The effect of temperature on tail length is similar to the previous model, with a posterior estimate of 0.03 and a 95% credible interval of [-0.05, 0.13].
Although there is some overlap in the estimates of temperature’s effect between the two models, the key difference is that MCMCglmm
provides a positive effect with stronger evidence (CI not including zero), whereas brms
shows more uncertainty (CI including zero). This suggests that while both models suggest a potential effect, brms
does not provide as clear evidence for it being significant, and the effect size might be smaller or potentially null.
Bivariate model
Here, we provide more advanced models - bivariate models. Because the bivariate model is similar to the nominal model, it allows for examining relationships between two response variables while taking into account multiple explanatory variables. In this model, we will include two response variables, zLength
and zMass
, and test the relationship between these two variables and the explanatory variables, zTemp
and Contrasting
. This model can examine the evolutionary correlation between tail length and body mass in rodents, considering the phylogenetic relationships.
In MCMCglmm
, we use cbind(traitA, traitB)
to set the response variables (this allows us to estimate the covariance between the two response variables). In brms
, we use mvbind(traitA, traitB)
to set the response variable. We also need to specify family
for both response variables (unless you use a custom prior, you need not define what distribution fit to the random effect in brms
- default prior automatically set them).
The important point is, MCMCglmm
does not provide the correlation estimate between the two response variables (it estimate the covariance between two response variables), while brms
does. However, you need to use (1|a|gr(Phylo, cov = A))
instead of (1|gr(Phylo, cov = A))
and set set_rescor(TRUE).
Without the newly added |a|
and set_rescor(TRUE)
, brms
will not calculate the correlation (a
can be any letter). We need to calculate the correlation using covariance estimates in MCMCglmm
. Fourtunately, the correlation can calculate easily by dividing the covariance by the product of the standard deviations of the two variables.
<- inverseA(tree, nodes = "ALL", scale = TRUE)
inv_phylo <- list(G = list(G1 = list(V = diag(2),
prior2 nu = 2, alpha.mu = rep(0, 2),
alpha.V = diag(2) * 1000)),
R = list(V = diag(2), nu = 0.002)
)
system.time(
<- MCMCglmm(cbind(zLength, zMass) ~ trait - 1,
mcmcglmm_mg4 random = ~ us(trait):Phylo,
rcov = ~ us(trait):units,
family = c("gaussian", "gaussian"),
ginv = list(Phylo = inv_phylo$Ainv),
data = dt,
prior = prior2,
nitt = 13000*55,
thin = 10*55,
burnin = 3000*55
)
)
system.time(
<- MCMCglmm(cbind(zLength, zMass) ~ zTemp:trait + trait - 1,
mcmcglmm_mg5 random = ~ us(trait):Phylo,
rcov = ~ us(trait):units,
family = c("gaussian", "gaussian"),
ginv = list(Phylo = inv_phylo$Ainv),
data = dt,
prior = prior2,
nitt = 13000*55,
thin = 10*55,
burnin = 3000*55
)
)
system.time(
<- MCMCglmm(cbind(zLength, zMass) ~ zTemp:trait + Contrasting:trait + trait -1,
mcmcglmm_mg6 random = ~ us(trait):Phylo,
rcov = ~ us(trait):units,
family = c("gaussian", "gaussian"),
ginv = list(Phylo = inv_phylo$Ainv),
data = dt,
prior = prior2,
nitt = 13000*55,
thin = 10*55,
burnin = 3000*55
)
)
## brms
<- ape::vcv.phylo(tree, corr = TRUE)
A
<- bf(mvbind(zLength, zMass) ~ 1 +
formula 1|a|gr(Phylo, cov = A)),
(set_rescor(rescor = TRUE)
)
<- default_prior(formula,
default_prior data = dt,
data2 = list(A = A),
family = gaussian()
)
system.time(
<- brm(formula = formula,
brms_mg4 data = dt,
data2 = list(A = A),
family = gaussian(),
prior = default_prior,
iter = 25000,
warmup = 10000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.99),
)
)
<- bf(mvbind(zLength, zMass) ~ zTemp + (1|a|gr(Phylo, cov = A)), set_rescor(TRUE))
formula2
<- default_prior(formula2,
default_prior2 data = dt,
data2 = list(A = A),
family = gaussian()
)
system.time(
<- brm(formula = formula2,
brms_mg5_1 data = dt,
data2 = list(A = A),
family = gaussian(),
prior = default_prior,
iter = 55000,
warmup = 45000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.99),
)
)
<- bf(mvbind(zLength, zMass) ~ zTemp + Contrasting + (1|a|gr(Phylo, cov = A)), set_rescor(TRUE))
formula3
<- default_prior(formula3,
default_prior3 data = dt,
data2 = list(A = A),
family = gaussian()
)
system.time(
<- brm(formula = formula3,
brms_mg6 data = dt,
data2 = list(A = A),
family = gaussian(),
prior = default_prior3,
iter = 25000,
warmup = 20000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.98),
) )
summary(mcmcglmm_mg4)
# Iterations = 165001:714451
# Thinning interval = 550
# Sample size = 1000
# DIC: 56.02311
# G-structure: ~us(trait):Phylo
# post.mean l-95% CI u-95% CI eff.samp
# traitzLength:traitzLength.Phylo 1.8450 1.3355 2.4126 1000
# traitzMass:traitzLength.Phylo -0.1288 -0.5302 0.2168 1000
# traitzLength:traitzMass.Phylo -0.1288 -0.5302 0.2168 1000
# traitzMass:traitzMass.Phylo 2.1016 1.5978 2.6773 1120
# R-structure: ~us(trait):units
# post.mean l-95% CI u-95% CI eff.samp
# traitzLength:traitzLength.units 0.024426 0.006637 0.04178 1000
# traitzMass:traitzLength.units 0.005761 -0.008694 0.02082 1000
# traitzLength:traitzMass.units 0.005761 -0.008694 0.02082 1000
# traitzMass:traitzMass.units 0.057850 0.033641 0.08572 1000
# Location effects: cbind(zLength, zMass) ~ trait - 1
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# traitzLength -0.04079 -2.04401 1.70436 1000 0.968
# traitzMass -0.44244 -2.33966 1.55863 1000 0.688
posterior_summary(mcmcglmm_mg4$VCV)
# Estimate Est.Error Q2.5 Q97.5
# traitzLength:traitzLength.Phylo 1.845005322 0.272081110 1.347508423 2.47378715
# traitzMass:traitzLength.Phylo -0.128783301 0.192333313 -0.517573376 0.23065613
# traitzLength:traitzMass.Phylo -0.128783301 0.192333313 -0.517573376 0.23065613
# traitzMass:traitzMass.Phylo 2.101601613 0.288665978 1.624968122 2.74203079
# traitzLength:traitzLength.units 0.024426281 0.009195449 0.008110895 0.04557695
# traitzMass:traitzLength.units 0.005761157 0.007496943 -0.008150784 0.02198074
# traitzLength:traitzMass.units 0.005761157 0.007496943 -0.008150784 0.02198074
# traitzMass:traitzMass.units 0.057849736 0.013834738 0.036274710 0.09066023
posterior_summary(mcmcglmm_mg4$Sol)
# Estimate Est.Error Q2.5 Q97.5
# traitzLength -0.04079478 0.9568184 -1.827382 2.059106
# traitzMass -0.44243620 1.0046735 -2.486688 1.513088
<- (mcmcglmm_mg4$VCV[, 2]) /sqrt((mcmcglmm_mg4$VCV[, 1] * mcmcglmm_mg4$VCV[, 4])) # calculate phylogenetic correlation between zLength and zMass
corr_p <- (mcmcglmm_mg4$VCV[, 6]) /sqrt((mcmcglmm_mg4$VCV[, 5] * mcmcglmm_mg4$VCV[, 8]))
corr_nonp posterior_summary(corr_p)
# Estimate Est.Error Q2.5 Q97.5
# var1 -0.06650913 0.09770663 -0.2628186 0.1173509
posterior_summary(corr_nonp)
# Estimate Est.Error Q2.5 Q97.5
# var1 0.1527294 0.1890808 -0.2165917 0.5260998
summary(brms_mg4)
# Family: MV(gaussian, gaussian)
# Links: mu = identity; sigma = identity
# mu = identity; sigma = identity
# Formula: zLength ~ 1 + (1 | a | gr(Phylo, cov = A))
# zMass ~ 1 + (1 | a | gr(Phylo, cov = A))
# Data: dt (Number of observations: 223)
# Draws: 2 chains, each with iter = 35000; warmup = 25000; thin = 1;
# total post-warmup draws = 20000
# Multilevel Hyperparameters:
# ~Phylo (Number of levels: 223)
# Estimate Est.Error l-95% CI u-95% CI
# sd(zLength_Intercept) 1.22 0.09 1.05 1.40
# sd(zMass_Intercept) 1.29 0.09 1.12 1.48
# cor(zLength_Intercept,zMass_Intercept) -0.06 0.10 -0.26 0.13
# Rhat Bulk_ESS Tail_ESS
# sd(zLength_Intercept) 1.00 1048 2370
# sd(zMass_Intercept) 1.00 3079 6693
# cor(zLength_Intercept,zMass_Intercept) 1.00 1912 4279
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# zLength_Intercept -0.14 0.61 -1.35 1.05 1.00 2159 4378
# zMass_Intercept -0.42 0.63 -1.67 0.83 1.00 3442 6223
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma_zLength 0.17 0.03 0.11 0.23 1.00 751 1004
# sigma_zMass 0.25 0.03 0.20 0.31 1.00 2092 4823
# Residual Correlations:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
# rescor(zLength,zMass) 0.14 0.18 -0.22 0.47 1.00 2017
# Tail_ESS
# rescor(zLength,zMass) 4659
As you can see the above, the estimates from both models are consistent:
the random effects are…
zLength is 1.845 (1.348, 2.474)
and zMass is 2.102 (1.625, 2.742)
in MCMCglmm
and 1.4884 (1.1025, 1.96)
and 1.664 (1.2544, 2.1904)
in brms
. Phylogenetic correlation between zLength and zMass are -0.067 (-0.263, 0.117)
in MCMCglmm
and -0.06 (-0.26, 0.13)
Both length and mass exhibit very high phylogenetic signals (heritabilities), indicating that these traits are strongly conserved evolutionarily and that closely related species tend to resemble each other.
<- as_draws_df(brms_mg4) # Convert brms object to data frame
draws_df
# random effect - length
## mcmcglmm
mean(mcmcglmm_mg4$VCV[, "traitzLength:traitzLength.Phylo"]) # MCMCglmm(g-structure)
# [1] 1.845005
## brms
mean(draws_df$sd_Phylo__zLength_Intercept)^2 # brms(Multilevel Hyperparameters)
# [1] 1.179172
# random effect - mass
## mcmcglmm
mean(draws_df$sd_Phylo__zLength_Intercept)^2
## brms
mean(draws_df$sd_Phylo__zMass_Intercept)^2 # brms(Multilevel Hyperparameters)
# [1] 1.664106
# residuals - length
mean(mcmcglmm_mg4$VCV[, "traitzLength:traitzLength.units"]) # MCMCglmm(r-structure)
# [1] 0.02442628
mean(draws_df$sigma_zLength)^2 # brms(Further Distributional Parameters)
# [1] 0.02886549
mean(mcmcglmm_mg4$VCV[, "traitzMass:traitzMass.units"]) # MCMCglmm(r-structure)
# [1] 0.05784974
mean(draws_df$sigma_zMass)^2 # brms(Further Distributional Parameters)
# [1] 0.05784974
# phylogenetic heritability - length
## mcmcglmm
<- mean(mcmcglmm_mg4$VCV[, "traitzLength:traitzLength.Phylo"]) / (mean(mcmcglmm_mg4$VCV[, "traitzLength:traitzLength.Phylo"]) + mean(mcmcglmm_mg4$VCV[, "traitzLength:traitzLength.units"]))
phylo_signal_mcmcglmm
<- mean(draws_df$sd_Phylo__zLength_Intercept)^2 / (mean(draws_df$sd_Phylo__zLength_Intercept)^2 + mean(draws_df$sigma_zLength)^2)
phylo_signal_brms
phylo_signal_mcmcglmm# [1] 0.9869338
phylo_signal_brms# [1] 0.9810099
# phylogenetic heritability - mass
<- mean(mcmcglmm_mg4$VCV[, "traitzMass:traitzMass.Phylo"]) / (mean(mcmcglmm_mg4$VCV[, "traitzMass:traitzMass.Phylo"]) + mean(mcmcglmm_mg4$VCV[, "traitzMass:traitzMass.units"]))
phylo_signal_mcmcglmm
<- mean(draws_df$sd_Phylo__zMass_Intercept)^2 / (mean(draws_df$sd_Phylo__zMass_Intercept)^2 + mean(draws_df$sigma_zMass)^2)
phylo_signal_brms
phylo_signal_mcmcglmm# [1] 0.9732109
phylo_signal_brms# [1] 0.9732109
2. Binary models
Binary models are used when the response variable is binary (0 or 1), such as presence (1) vs. absence (0), survival (1) vs. death (0), success (1) vs. failure (0), or female vs. male. The results are straightforward to interpret: the model provides the probability of a ‘success’ (coded as 1) given the predictor variables. For example, in a female/male classification, the output represents the probability of being male, as R typically treats the alphabetically first category as the reference level.
Explanation of dataset
We tested the relationships between the presence of red–orange pelage on limbs (response variable) and average social group size and activity cycle in primates (265 species). We used the dataset and tree from Macdonald et al. (2024). The dataset contains information on the presence of skin and pelage coloration data, average social group size, presence or absence of multilevel hierarchical societies, colour visiual system, and activity cycle. We aim to test whether the presence of red–orange pelage on limbs is associated with social group size and activity cycle in primates.
For run model, we need to prepare the suitable format data.
<- read.nexus(here("data", "potential", "primate", "trees100m.nex"))
p_trees <- read.csv(here("data", "potential", "primate", "primate_data_male.csv"))
primate_data <- primate_data
p_dat
<- subset(p_dat, !is.na(vs_male)) # omit NAs by VS predictor
p_dat <- subset(p_dat, !is.na(vs_female))
p_dat <- subset(p_dat, !is.na(activity_cycle)) # omit NAs by activity cycle
p_dat <- subset(p_dat, !is.na(redpeachpink_facial_skin)) # omit NAs by response variable
p_dat <- subset(p_dat, !is.na(social_group_size)) # omit NAs by response variable
p_dat <- subset(p_dat, !is.na(multilevel)) # omit NAs by response variable
p_dat
# Rename the column 'social_group_size' to 'cSocial_group_size' - all continuous variables were centred and standardized by authors
<- p_dat %>% rename(cSocial_group_size = social_group_size)
p_dat
<- p_dat %>%
p_dat mutate(across(c(vs_female, vs_male, redpeachpink_facial_skin,red_genitals,red_pelage_head, red_pelage_body_limbs, red_pelage_tail), as.factor))
<- p_trees[[1]]
p_tree <- lapply(p_trees, drop.tip,tip = setdiff(p_tree$tip.label, p_dat$PhyloName)) #trim out everything from the tree that's not in the dataset
p_trees <- p_trees[[1]] # select one tree for trimming purposes
p_tree <- force.ultrametric(p_tree) # force tree to be ultrametric - all tips equidistant from root p_tree
How to impliment models and interpret the outputs?
InMCMCglmm
, the binary model can take two kinds of link functions: logit and probit link functions. The logit link function (family = "categorical"
) is the default inMCMCglmm
for the binary model, while the probit link function can be defined using family = "threshold"
or family = "ordinal"
. We recommend using family = "threshold"
as it is often more suitable for modelling under biological assumption and can provide better convergence properties in certain situations. Here, we show both models.
Univariate model
Intercept-only model
Probit model
MCMCglmm
We need to set different prior for binary models from the Gaussian mopdel. Actually, the residual variance does not need to be estimated in binary models, so we only need to set the prior for the random effect. In binary models, the response variable y
is either 0 or 1, which means there are no variety residuals to measure variability around a predicted mean. For logistic regression (logit link), the residual variance is fixed at ^2/3 on the latent scale. For probit regression (probit link), the residual variance is fixed at 1 on the latent scale. InMCMCglmm
, the prior settings do not differ between the probit and logit models.
<- inverseA(p_tree, nodes = "ALL", scale = TRUE)
inv.phylo
<- list(R = list(V = 1, fix = 1), # fix residual variance = 1
prior1 G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)
)
)system.time(
<- MCMCglmm(red_pelage_body_limbs ~ 1,
mcmcglmm_BP1 random = ~ PhyloName,
family = "threshold",
data = p_dat,
prior = prior1,
ginverse = list(PhyloName = inv.phylo$Ainv),
nitt = 13000*20,
thin = 10*20,
burnin = 3000*20)
)
summary(mcmcglmm_BP1) # 95%HPD Interval
# Iterations = 60001:259801
# Thinning interval = 200
# Sample size = 1000
#
# DIC: 298.7917
#
# G-structure: ~PhyloName
#
# post.mean l-95% CI u-95% CI eff.samp
# PhyloName 3.186 0.2795 7.242 1000
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: red_pelage_body_limbs ~ 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) -0.8487 -2.8595 0.8645 1093 0.296
posterior_summary(mcmcglmm_BP1$VCV) # 95%CI
# Estimate Est.Error Q2.5 Q97.5
# PhyloName 3.185968 2.226905 0.6918403 8.6888
# units 1.000000 0.000000 1.0000000 1.0000
posterior_summary(mcmcglmm_BP1$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) -0.8487221 0.8998161 -2.859676 0.8638048
As you can see here, residual variance is fixed 1 - the model did not estimate the residual variance (R-structure
). The MCMC chain mixed well. The positive posterior mean of phylogenetic random effect (3.186) suggests that closely related species tend to have more red limb pelages.The fact that the 95% HPD interval / 95% equal-tailed credible interval does not include zero indicates that the effect of lineage on red pelages is statistically significant.
brms
family
argument is set to bernoulli()
(not binomial()
). The bernoulli()
family is used for binary data, while the binomial()
family is used for the combination of count data, such as (c(N_sucsess, N_failure)
).
<- ape::vcv.phylo(p_tree, corr = TRUE)
A
<- default_prior(red_pelage_body_limbs ~ 1 + (1 | gr(PhyloName, cov = A)),
priors_brms data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "probit"))
system.time(
<- brm(red_pelage_body_limbs ~ 1 + (1 | gr(PhyloName, cov = A)),
brms_BP1 data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "probit"),
prior = priors_brms,
iter = 6000,
warmup = 5000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
) )
summary(brms_BP1)
# Family: bernoulli
# Links: mu = probit
# Formula: red_pelage_body_limbs ~ 1 + (1 | gr(PhyloName, cov = A))
# Data: p_dat (Number of observations: 265)
# Draws: 2 chains, each with iter = 6000; warmup = 5000; thin = 1;
# total post-warmup draws = 2000
#
# Multilevel Hyperparameters:
# ~PhyloName (Number of levels: 265)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 1.67 0.53 0.78 2.85 1.00 393 727
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept -0.72 0.81 -2.38 0.82 1.00 752 1027
The same trend in results was observed in brms
. Note that the phylogenetic random effect estimate in brms is given as the standard deviation (sd
), so to compare it with MCMCglmm
, we need to square the sd. There were no differences between the fixed effects in brms
and MCMCglmm
. Outputs of brms
is usually 95% equal-tailed credible interval.
Phylogenetic signals (probit model)
# MCMCglmm
<- ((mcmcglmm_BP1$VCV[, "PhyloName"]) / (mcmcglmm_BP1$VCV[, "PhyloName"] + 1))
phylo_signal_mcmcglmm_BP %>% mean()
phylo_signal_mcmcglmm_BP # [1] 0.7052307
%>% quantile(probs = c(0.025,0.5,0.975))
phylo_signal_mcmcglmm_BP # 2.5% 50% 97.5%
# 0.4089277 0.7260425 0.8967865
# brms
<- brms_BP1 %>% as_tibble() %>%
phylo_signal_brms_BP ::select(Sigma_phy = sd_PhyloName__Intercept) %>%
dplyrmutate(lambda_probit = Sigma_phy^2 / (Sigma_phy^2 + 1)) %>%
pull(lambda_probit)
%>% mean()
phylo_signal_brms_BP # [1] 0.6992207
%>% quantile(probs = c(0.025,0.5,0.975))
phylo_signal_brms_BP # 2.5% 50% 97.5%
# 0.3795359 0.7267670 0.8903665
As expected from the random effects estimates above, the systematic signal is estimated to be 0.71 for MCMCMCglmm
and 0.70 for brms
. The difference with the estimation from Gaussian model is that the residuals are fixed at 1 in the binary model. That is the same for the other discrete models.
Logit model
MCMCglmm
c2 correction
In the logit model, we need to correct the estimates obtained from MCMCglmm
as MCMCglmm
implements additive overdispersion GLMM fitted by MCMC sampling from the posterior distribution. This overdispersion lead to inflated variance estimates (see also Nakagawa & Schielzeth (2010)). To adjust for this, we need to rescale the estimates of both location effects and variance components by considering the residual variance adjustment (MCMCglmm course note
written by Jarrod Hadfield). We can use the below code:
<- (16 * sqrt(3) / (15 * pi))^2
c2 <- model$Sol / sqrt(1+c2) # for fixed effects
res_1 <- model$VCV / (1+c2) # for variance components res_2
The logit model is a bit difficult to converge than the probit model. So, we set a bit large nitt
, thin
, and burnin
to improve mixing and effective sample sizes than the probit model.
<- inverseA(p_tree, nodes = "ALL", scale = TRUE)
inv.phylo <- list(R = list(V = 1, fix = 1),
prior1 G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)
)
)
system.time(
<- MCMCglmm(red_pelage_body_limbs ~ 1,
mcmcglmm_BL1 random = ~ PhyloName,
family = "categorical",
data = p_dat,
prior = prior1,
ginverse = list(PhyloName = inv.phylo$Ainv),
nitt = 13000*60,
thin = 10*60,
burnin = 3000*60)
)
brms
In brms
, we do not need to conduct c2 correction as brms
automatically corrects the overdispersion.
<- ape::vcv.phylo(p_tree, corr = TRUE)
A
<- default_prior(red_pelage_body_limbs ~ 1 + (1 | gr(PhyloName, cov = A)),
priors_brms2 data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "logit"))
system.time(
<- brm(red_pelage_body_limbs ~ 1 + (1 | gr(PhyloName, cov = A)),
brms_BL1 data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "logit"),
prior = priors_brms2,
iter = 7500,
warmup = 6500,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
core = 2,
) )
# MCMCglmm
summary(mcmcglmm_BL1)
# Iterations = 180001:779401
# Thinning interval = 600
# Sample size = 1000
#
# DIC: 294.3311
#
# G-structure: ~PhyloName
#
# post.mean l-95% CI u-95% CI eff.samp
# PhyloName 11.64 0.7224 26.95 1000
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: red_pelage_body_limbs ~ 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) -1.665 -5.154 1.833 1000 0.316
<- (16 * sqrt(3) / (15 * pi))^2
c2 <- mcmcglmm_BL1$Sol/sqrt(1+c2)
res_1 <- mcmcglmm_BL1$VCV/(1+c2)
res_2
posterior_summary(res_1)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) -1.435339 1.560119 -4.565329 1.502008
posterior_summary(res_2)
# Estimate Est.Error Q2.5 Q97.5
# PhyloName 8.6476147 6.01105 1.5836214 24.0221167
# units 0.7430287 0.00000 0.7430287 0.7430287
#brms
# summary(brms_BL1)
# Family: bernoulli
# Links: mu = logit
# Formula: red_pelage_body_limbs ~ 1 + (1 | gr(PhyloName, cov = A))
# Data: p_dat (Number of observations: 265)
# Draws: 2 chains, each with iter = 7500; warmup = 6500; thin = 1;
# total post-warmup draws = 2000
#
# Multilevel Hyperparameters:
# ~PhyloName (Number of levels: 265)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 2.61 0.85 1.21 4.46 1.00 550 971
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept -1.08 1.19 -3.54 1.13 1.00 1433 1398
#
# Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
# and Tail_ESS are effective sample size measures, and Rhat is the potential
# scale reduction factor on split chains (at convergence, Rhat = 1).
The point estimates and the overlaps of 95%CI of both random and fixed effects look fine.
Phylogenetic signals (logit model)
Then, we can obtain the phylogenetic signal using
\[ H^2 = \frac{\sigma_{a}^2}{\sigma_{a}^2 + \pi^2/3} \]
# MCMCglmm
<- ((mcmcglmm_BL1$VCV[, "PhyloName"]/(1+c2)) / (mcmcglmm_BL1$VCV[, "PhyloName"]/(1+c2)+1))
phylo_signal_mcmcglmm_BL %>% mean()
phylo_signal_mcmcglmm_BL # [1] 0.8541816
%>% quantile(probs = c(0.025,0.5,0.975))
phylo_signal_mcmcglmm_BL # 2.5% 50% 97.5%
# 0.6129464 0.8776149 0.9600353
# brms
<- brms_BL1 %>% as_tibble() %>%
phylo_signal_brms_BL ::select(Sigma_phy = sd_PhyloName__Intercept) %>%
dplyrmutate(lambda_logit = (Sigma_phy^2 / (Sigma_phy^2+1))) %>%
pull(lambda_logit)
%>% mean()
phylo_signal_brms_BL %>% quantile(probs = c(0.025,0.5,0.975)) phylo_signal_brms_BL
Both estimated phylogenetic signals took close values.
One continuous explanatory variable model
For the next step, we examine whether the group size affect the presence or absence of red pelagic on the body limbs in primates to test the possibility that red pelage is related to social roles and behaviours within the group.
Probit model
The model using MCMCglmm
is
<- inverseA(p_tree, nodes = "ALL", scale = TRUE)
inv.phylo <- list(R = list(V = 1, fix = 1),
prior1 G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)
)
)
system.time(
<- MCMCglmm(red_pelage_body_limbs ~ cSocial_group_size,
mcmcglmm_BP2 random = ~ PhyloName,
family = "threshold",
data = p_dat,
prior = prior1,
ginverse = list(PhyloName = inv.phylo$Ainv),
nitt = 13000*30,
thin = 10*30,
burnin = 3000*30)
)
For brms
…
<- ape::vcv.phylo(p_tree, corr = TRUE)
A
<- default_prior(red_pelage_body_limbs ~ cSocial_group_size + (1 | gr(PhyloName, cov = A)),
priors_brms3 data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "probit"))
system.time(
<- brm(red_pelage_body_limbs ~ cSocial_group_size + (1 | gr(PhyloName, cov = A)),
brms_BP2 data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "probit"),
prior = priors_brms3,
iter = 10000,
warmup = 8500,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
core = 2,
thread = threading(5)
) )
# MCMCglmm
summary(mcmcglmm_BP2)
# Iterations = 90001:389701
# Thinning interval = 300
# Sample size = 1000
#
# DIC: 299.4154
#
# G-structure: ~PhyloName
#
# post.mean l-95% CI u-95% CI eff.samp
# PhyloName 3.489 0.2165 7.618 1000
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: red_pelage_body_limbs ~ cSocial_group_size
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) -0.9754 -2.8047 0.8503 1979 0.258
# cSocial_group_size -0.1136 -0.3679 0.1707 1091 0.418
posterior_summary(mcmcglmm_BP2$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) -0.9753971 0.9209577 -2.9483573 0.7464915
# cSocial_group_size -0.1135599 0.1364510 -0.4137361 0.1260902
posterior_summary(mcmcglmm_BP2$VCV)
# Estimate Est.Error Q2.5 Q97.5
# PhyloName 3.488721 2.419961 0.6210229 8.835258
# units 1.000000 0.000000 1.0000000 1.000000
#brms
summary(brms_BP2)
# Family: bernoulli
# Links: mu = probit
# Formula: red_pelage_body_limbs ~ cSocial_group_size + (1 | gr(PhyloName, cov = A))
# Data: p_dat (Number of observations: 265)
# Draws: 2 chains, each with iter = 10000; warmup = 8500; thin = 1;
# total post-warmup draws = 3000
#
# Multilevel Hyperparameters:
# ~PhyloName (Number of levels: 265)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 1.68 0.53 0.78 2.86 1.00 440 944
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept -0.75 0.85 -2.54 0.88 1.00 1318 1299
# cSocial_group_size -0.11 0.13 -0.40 0.12 1.00 3127 1926
Both models suggest that social group size does not significantly affect the presence or absence of red pelage on the body limbs in primates, as the credible intervals include zero and the p-values are higher than 0.05.
Logit model
The model for MCMCglmm…
<- inverseA(p_tree, nodes = "ALL", scale = TRUE)
inv.phylo <- list(R = list(V = 1, fix = 1),
prior1 G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)
)
)
system.time(
<- MCMCglmm(red_pelage_body_limbs ~ cSocial_group_size,
mcmcglmm_BL2 random = ~ PhyloName,
family = "categorical",
data = p_dat,
prior = prior1,
ginverse = list(PhyloName = inv.phylo$Ainv),
nitt = 13000*60,
thin = 10*60,
burnin = 3000*60)
)
For brms
…
<- ape::vcv.phylo(p_tree, corr = TRUE)
A
<- default_prior(red_pelage_body_limbs ~ cSocial_group_size + (1 | gr(PhyloName, cov = A)),
priors_brms4 data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "logit"))
system.time(
<- brm(red_pelage_body_limbs ~ cSocial_group_size + (1 | gr(PhyloName, cov = A)),
brms_biL2 data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "logit"),
prior = priors_brms4,
iter = 18000,
warmup = 8000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
core = 2,
thread = threading(5)
) )
summary(mcmcglmm_BL2)
# Iterations = 180001:779401
# Thinning interval = 600
# Sample size = 1000
#
# DIC: 294.8728
#
# G-structure: ~PhyloName
#
# post.mean l-95% CI u-95% CI eff.samp
# PhyloName 11.95 0.8832 26.74 835
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: red_pelage_body_limbs ~ cSocial_group_size
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) -1.8090 -5.3966 1.7961 1000 0.266
# cSocial_group_size -0.2622 -0.8166 0.2671 1000 0.356
<- (16 * sqrt(3) / (15 * pi))^2
c2 <- mcmcglmm_BL2$Sol/sqrt(1+c2)
res_1 <- mcmcglmm_BL2$VCV/(1+c2)
res_2
posterior_summary(res_1)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) -1.5593137 1.5801956 -4.9135095 1.3437762
# cSocial_group_size -0.2260184 0.2484719 -0.7398858 0.2081084
posterior_summary(res_2)
# Estimate Est.Error Q2.5 Q97.5
# PhyloName 8.8770592 5.968142 1.4813098 23.2748934
# units 0.7430287 0.000000 0.7430287 0.7430287
# brms
summary(brms_BP2)
# Family: bernoulli
# Links: mu = probit
# Formula: red_pelage_body_limbs ~ cSocial_group_size + (1 | gr(PhyloName, cov = A))
# Data: p_dat (Number of observations: 265)
# Draws: 2 chains, each with iter = 10000; warmup = 8500; thin = 1;
# total post-warmup draws = 3000
#
# Multilevel Hyperparameters:
# ~PhyloName (Number of levels: 265)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 1.68 0.53 0.78 2.86 1.00 440 944
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept -0.75 0.85 -2.54 0.88 1.00 1318 1299
# cSocial_group_size -0.11 0.13 -0.40 0.12 1.00 3127 1926
Estimates from both packages were similar and the logit model yielded the same results as the probit model: the size of the social group has no significant effect on this trait.
One continuous and one categorical explanatory variable model
Therefore, we hypothesised that there might be a relationship between the presence of red pelage and whether a primate species is diurnal or not. We included this as a predictor variable in our model. This is because red colouration may not be as visible in the dark, making it potentially less advantageous for nocturnal species. On the other hand, diurnal species may benefit more from the visibility of red pelage, which could play a role in social/sexual signalling or other ecological factors during the day. Note that the activity cycle variable comprised 01 data, whether diurnal or not, to simplify understanding the model outputs.
Probit model
For the model MCMCglmm the following:
$diurnal <- ifelse(p_dat$activity_cycle == "di", 1, 0)
p_dat
<- inverseA(p_tree, nodes = "ALL", scale = TRUE)
inv.phylo <- list(R = list(V = 1, fix = 1),
prior1 G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)
)
)
system.time(
<- MCMCglmm(red_pelage_body_limbs ~ cSocial_group_size + diurnal,
mcmcglmm_BP3 random = ~ PhyloName,
family = "threshold",
data = p_dat,
prior = prior1,
ginverse = list(PhyloName = inv.phylo$Ainv),
nitt = 13000*50,
thin = 10*50,
burnin = 3000*50)
)
For brms
,
<- ape::vcv.phylo(p_tree, corr = TRUE)
A
<- default_prior(red_pelage_body_limbs ~ cSocial_group_size + diurnal + (1 | gr(PhyloName, cov = A)),
priors_brms5 data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "probit"))
system.time(
<- brm(red_pelage_body_limbs ~ cSocial_group_size + diurnal + (1 | gr(PhyloName, cov = A)),
brms_BP3 data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "probit"),
prior = priors_brms5,
iter = 5000,
warmup = 3000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
core = 2,
thread = threading(5)
) )
# MCMCglmm
summary(mcmcglmm_BP3)
# Iterations = 150001:649501
# Thinning interval = 500
# Sample size = 1000
#
# DIC: 299.2849
#
# G-structure: ~PhyloName
#
# post.mean l-95% CI u-95% CI eff.samp
# PhyloName 4.063 0.535 9.133 900.6
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: red_pelage_body_limbs ~ cSocial_group_size + activity_cycle
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) -0.685030 -3.381045 1.969969 1000 0.584
# cSocial_group_size -0.110831 -0.373846 0.151850 1000 0.392
# activity_cycledi -0.533313 -2.486488 1.364789 1000 0.572
# activity_cyclenoct -0.007557 -1.807080 1.823124 1000 0.986
posterior_summary(mcmcglmm_BP3$VCV)
# Estimate Est.Error Q2.5 Q97.5
# PhyloName 4.062649 2.673005 0.947084 11.42444
# units 1.000000 0.000000 1.000000 1.00000
posterior_summary(mcmcglmm_BP3$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) -0.685030297 1.3457124 -3.6844040 1.7696060
# cSocial_group_size -0.110831330 0.1342969 -0.4257179 0.1235912
# activity_cycledi -0.533313064 0.9653465 -2.6461670 1.2170270
# activity_cyclenoct -0.007557281 0.9589911 -1.9164287 1.7255631
# brms
summary(brms_BP3)
# Family: bernoulli
# Links: mu = probit
# Formula: red_pelage_body_limbs ~ cSocial_group_size + activity_cycle + (1 | gr(PhyloName, cov = A))
# Data: p_dat (Number of observations: 265)
# Draws: 2 chains, each with iter = 18000; warmup = 8000; thin = 1;
# total post-warmup draws = 20000
#
# Multilevel Hyperparameters:
# ~PhyloName (Number of levels: 265)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 1.86 0.58 0.89 3.21 1.00 3870 6693
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept -0.54 1.21 -2.96 1.87 1.00 9333 10083
# cSocial_group_size -0.11 0.13 -0.40 0.13 1.00 21096 14055
# activity_cycledi -0.50 0.93 -2.46 1.24 1.00 9758 9062
# activity_cyclenoct -0.01 0.96 -2.00 1.81 1.00 9588 9736
The results indicate that both models (MCMCglmm
and brms
) are reliable based on diagnostic metrics. Effective sample sizes are sufficient, and Rhat
values are close to 1, indicating good convergence.
For random effects, the estimated variance of the phylogenetic effect is consistent between models (MCMCglmm
: 3.59; brms
: 1.73^2 = 2.99), with overlapping 95% credible intervals. This suggests still phylogenetic influence on red pelage presence, though with some uncertainty.
For fixed effects, neither social group size (MCMCglmm
: -0.10, brms
: -0.11) nor diurnal (MCMCglmm
: -0.49, brms
: -0.48) showed statistically significant effects, as their 95% CIs include zero. This suggests that group size and diurnal activity are unlikely to be major determinants of red pelage presence.
Logit model Finally, MCMCglmm
<- inverseA(p_tree, nodes = "ALL", scale = TRUE)
inv.phylo <- list(R = list(V = 1, fix = 1),
prior1 G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)
)
)
system.time(
<- MCMCglmm(red_pelage_body_limbs ~ cSocial_group_size + diurnal,
mcmcglmm_BL3 random = ~ PhyloName,
family = "categorical",
data = p_dat,
prior = prior1,
ginverse = list(PhyloName = inv.phylo$Ainv),
nitt = 13000*50,
thin = 10*50,
burnin = 3000*50)
)
And brms
,
<- ape::vcv.phylo(p_tree, corr = TRUE)
A <- default_prior(red_pelage_body_limbs ~ cSocial_group_size + diurnal + (1 | gr(PhyloName, cov = A)),
priors_brms6 data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "logit"))
system.time(
<- brm(red_pelage_body_limbs ~ cSocial_group_size + diurnal + (1 | gr(PhyloName, cov = A)),
brms_BL3 data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "logit"),
prior = priors_brms6,
iter = 18000,
warmup = 8000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
core = 2,
thread = threading(5)
) )
summary(mcmcglmm_BL3)
# Iterations = 240001:1039201
# Thinning interval = 800
# Sample size = 1000
#
# DIC: 293.6985
#
# G-structure: ~PhyloName
#
# post.mean l-95% CI u-95% CI eff.samp
# PhyloName 15.16 1.792 37.31 905.8
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: red_pelage_body_limbs ~ cSocial_group_size + activity_cycle
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) -1.30757 -6.63033 3.76436 1000 0.594
# cSocial_group_size -0.23416 -0.76302 0.28469 1000 0.392
# activity_cycledi -1.15936 -5.18073 2.29078 1000 0.528
# activity_cyclenoct -0.09113 -3.90477 3.45304 1000 0.980
<- (16 * sqrt(3) / (15 * pi))^2
c2 <- mcmcglmm_BL3$Sol/sqrt(1+c2)
res_1 <- mcmcglmm_BL3$VCV/(1+c2)
res_2
posterior_summary(res_1)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) -1.12711731 2.2299457 -5.7979353 3.1769798
# cSocial_group_size -0.20184346 0.2376961 -0.7177624 0.1888044
# activity_cycledi -0.99936034 1.6233908 -4.3827738 2.1066431
# activity_cyclenoct -0.07855316 1.6257651 -3.4238818 2.9358848
posterior_summary(res_2)
# Estimate Est.Error Q2.5 Q97.5
# PhyloName 11.2672019 8.147431 2.2323210 31.2476370
# units 0.7430287 0.000000 0.7430287 0.7430287
summary(brms_BL3)
# Family: bernoulli
# Links: mu = logit
# Formula: red_pelage_body_limbs ~ cSocial_group_size + activity_cycle + (1 | gr(PhyloName, cov = A))
# Data: p_dat (Number of observations: 265)
# Draws: 2 chains, each with iter = 18000; warmup = 8000; thin = 1;
# total post-warmup draws = 20000
#
# Multilevel Hyperparameters:
# ~PhyloName (Number of levels: 265)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 2.93 0.96 1.33 5.06 1.00 3607 5803
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept -0.62 1.88 -4.35 3.21 1.00 9854 10536
# cSocial_group_size -0.21 0.25 -0.79 0.21 1.00 19703 12846
# activity_cycledi -0.81 1.54 -4.04 2.07 1.00 9439 8872
# activity_cyclenoct -0.07 1.56 -3.29 2.93 1.00 9517 9450
#
# Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
# and Tail_ESS are effective sample size measures, and Rhat is the potential
# scale reduction factor on split chains (at convergence, Rhat = 1).
Similar results were obtained with the logit model as with the probit model. In conclusion, the models are consistent and suggest that neither social group size nor diurnal activity significantly affect red pelage presence.
Bivariate model
We jointly analysed two binary traits, presence of red pelage on the body/limbs and on the head, using a multivariate probit/logit threshold model with a phylogenetic random effect. Residual variances were fixed to 1 for identification and residual cross-trait covariance was set to 0. We fitted three specifications: intercept-only, adding centred social group size, and adding activity cycle.
Probit link
<- inverseA(p_tree, nodes = "ALL", scale = TRUE) # invert covariance matrix for use by MCMCglmm
inv.phylo <- list(G = list(G1 = list(V = diag(2),
prior2 nu = 2, alpha.mu = rep(0, 2),
alpha.V = diag(2) * 10)),
R = list(V = diag(2), fix = 1)
)
system.time(
<- MCMCglmm(cbind(red_pelage_body_limbs, red_pelage_head) ~ trait - 1,
mcmc_BPB1 random = ~ us(trait):PhyloName,
rcov = ~ us(trait):units,
family = c("threshold", "threshold"),
data = p_dat,
prior = prior2,
ginverse = list(PhyloName = inv.phylo$Ainv),
nitt = 13000*25,
thin = 10*25,
burnin = 3000*25
)
)
system.time(
<- MCMCglmm(cbind(red_pelage_body_limbs, red_pelage_head) ~ cSocial_group_size:trait + trait - 1,
mcmc_BPB2 random = ~ us(trait):PhyloName,
rcov = ~ us(trait):units,
family = c("threshold", "threshold"),
data = p_dat,
prior = prior2,
ginverse = list(PhyloName = inv.phylo$Ainv),
nitt = 13000*55,
thin = 10*55,
burnin = 3000*55
)
)
system.time(
<- MCMCglmm(cbind(red_pelage_body_limbs, red_pelage_head) ~ cSocial_group_size:trait + diurnal :trait + trait - 1,
mcmc_BPB3 random = ~ us(trait):PhyloName,
rcov = ~ us(trait):units,
family = c("threshold", "threshold"),
data = p_dat,
prior = prior2,
ginverse = list(PhyloName = inv.phylo$Ainv),
nitt = 13000*100,
thin = 10*100,
burnin = 3000*100
)
)
<- ape::vcv.phylo(p_tree, corr = TRUE)
A <- bf(mvbind(red_pelage_body_limbs, red_pelage_head) ~ 1 +
formula_biPv1 1|a|gr(PhyloName, cov = A))
(
)
<- default_prior(formula_biPv1,
default_prior2 data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "probit")
)
system.time(
<- brm(formula = formula_biPv1,
brms_BPB1 data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "probit"),
prior = default_prior2,
iter = 25000,
warmup = 5000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95)
)
)
<- bf(mvbind(red_pelage_body_limbs, red_pelage_head) ~ cSocial_group_size +
formula_biPv2 1|a|gr(PhyloName, cov = A))
(# set_rescor(TRUE)
)
<- default_prior(formula_biPv2,
default_prior3 data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "probit")
)
system.time(
<- brm(formula = formula_biPv2,
brms_BPB2 data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "probit"),
prior = default_prior3,
iter = 35000,
warmup = 15000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95)
)
)
<- bf(mvbind(red_pelage_body_limbs, red_pelage_head) ~ cSocial_group_size + diurnal +
formula_biPv3 1|a|gr(PhyloName, cov = A))
(# set_rescor(TRUE)
)
<- default_prior(formula_biPv3,
default_prior4 data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "probit")
)
system.time(
<- brm(formula = formula_biPv3,
brms_BPB3 data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "probit"),
prior = default_prior4,
iter = 35000,
warmup = 25000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95)
) )
Logit link
# prior setting for mcmcglmm
<- inverseA(p_tree, nodes = "ALL", scale = TRUE)
inv.phylo <- list(R = list(V = diag(2), fix = 1),
prior G = list(G1 = list(V = diag(2), nu = 2, alpha.mu = rep(0, 2),
alpha.V = diag(2) * 10)
)
)
# function to run mcmcglmm
<- function(formula, data, prior, inv_phylo) {
run_mcmcglmm MCMCglmm(
fixed = formula,
random = ~ us(trait):PhyloName,
rcov = ~ us(trait):units,
family = c("categorical", "categorical"),
data = data,
prior = prior,
ginverse = list(PhyloName = inv_phylo),
nitt = 13000*2500, # original *250
thin = 10*2500, # original *250
burnin = 3000*2500 # original *250
)
}
# model list - mcmcglmm
<- list(
mcmcglmm_formulas formula1 = cbind(red_pelage_body_limbs, red_pelage_head) ~ trait - 1,
formula2 = cbind(red_pelage_body_limbs, red_pelage_head) ~ cSocial_group_size:trait + trait - 1,
formula3 = cbind(red_pelage_body_limbs, red_pelage_head) ~ cSocial_group_size:trait + activity_cycle:trait + trait - 1
)
#### brms ####
# prior setting for brms
<- ape::vcv.phylo(p_tree, corr = TRUE)
A
<- function(formula, data, A) {
default_priors default_prior(formula,
data = data,
data2 = list(A = A),
family = bernoulli(link = "logit")
)
}
# function to run brms
<- function(formula, data, A, prior) {
run_brms brm(
formula = formula,
data = data,
data2 = list(A = A),
family = bernoulli(link = "logit"),
prior = prior,
iter = 10000, # original 4000
warmup = 5000, # original 3000
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95)
)
}
# model list
<- list(
brms_formulas formula1 = bf(mvbind(red_pelage_body_limbs, red_pelage_head) ~ 1 + (1|a|gr(PhyloName, cov = A))),
formula2 = bf(mvbind(red_pelage_body_limbs, red_pelage_head) ~ cSocial_group_size + (1|a|gr(PhyloName, cov = A)) ),
formula3 = bf(mvbind(red_pelage_body_limbs, red_pelage_head) ~ cSocial_group_size + activity_cycle + (1|a|gr(PhyloName, cov = A)))
)
#### run mcmcglm and brms ####
# mcmcglmm
<- lapply(seq_along(mcmcglmm_formulas), function(i) {
mcmcglmm_results <- mcmcglmm_formulas[[i]]
formula <- run_mcmcglmm(formula = formula, data = p_dat, prior = prior, inv_phylo = inv.phylo$Ainv)
model
saveRDS(model, file = here("Rdata_tutorial", "binomial", "bivariate", paste0("mcmcglmm_BLB_v2", i, ".rds")))
return(model)
})
# brms
<- lapply(seq_along(brms_formulas), function(i) {
brms_results <- brms_formulas[[i]]
formula <- default_prior(formula,
prior2 data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "logit"))
<- run_brms(formula = formula, data = p_dat, A = A, prior = prior2)
model
saveRDS(model, file = here("Rdata_tutorial", "binomial", "bivariate", paste0("brms_BLB_v2", i, ".rds")))
return(model)
})
Results from each models
Probit link model:
Across all three models, the 95% CIs were wide, particularly for phylogenetic random effect variances and correlations. This indicates substantial uncertainty around the exact magnitude of phylogenetic effects, even though the overall patterns were consistent. Estimate differed somewhat between MCMCglmm
and brms
, but their 95%CIs largely overlapped and the direction of effects was the same, suggesting that the results are robust.
Intercept-only model
summary(mcmc_BPB1)
# Iterations = 75001:324751
# Thinning interval = 250
# Sample size = 1000
#
# DIC:
#
# G-structure: ~us(trait):PhyloName
#
# post.mean l-95% CI u-95% CI eff.samp
# traitred_pelage_body_limbs:traitred_pelage_body_limbs.PhyloName 8.585 1.4925 20.01 1405.5
# traitred_pelage_head:traitred_pelage_body_limbs.PhyloName 6.770 1.1077 15.62 895.0
# traitred_pelage_body_limbs:traitred_pelage_head.PhyloName 6.770 1.1077 15.62 895.0
# traitred_pelage_head:traitred_pelage_head.PhyloName 9.812 0.9357 25.06 736.1
#
# R-structure: ~us(trait):units
#
# post.mean l-95% CI u-95% CI eff.samp
# traitred_pelage_body_limbs:traitred_pelage_body_limbs.units 1 1 1 0
# traitred_pelage_head:traitred_pelage_body_limbs.units 0 0 0 0
# traitred_pelage_body_limbs:traitred_pelage_head.units 0 0 0 0
# traitred_pelage_head:traitred_pelage_head.units 1 1 1 0
#
# Location effects: cbind(red_pelage_body_limbs, red_pelage_head) ~ trait - 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# traitred_pelage_body_limbs -1.016 -3.755 1.871 1000 0.454
# traitred_pelage_head -1.077 -4.036 2.085 1000 0.448
posterior_summary(mcmc_BPB1$Sol)
# Estimate Est.Error Q2.5 Q97.5
# traitred_pelage_body_limbs -1.015976 1.433527 -4.002762 1.659291
# traitred_pelage_head -1.076598 1.581606 -4.486771 1.754392
posterior_summary(mcmc_BPB1$VCV)
# Estimate Est.Error Q2.5 Q97.5
# traitred_pelage_body_limbs:traitred_pelage_body_limbs.PhyloName 8.585286 5.994698 2.308280 24.00470
# traitred_pelage_head:traitred_pelage_body_limbs.PhyloName 6.770038 4.153591 1.843889 18.91487
# traitred_pelage_body_limbs:traitred_pelage_head.PhyloName 6.770038 4.153591 1.843889 18.91487
# traitred_pelage_head:traitred_pelage_head.PhyloName 9.811660 8.425988 1.944167 34.12230
# traitred_pelage_body_limbs:traitred_pelage_body_limbs.units 1.000000 0.000000 1.000000 1.00000
# traitred_pelage_head:traitred_pelage_body_limbs.units 0.000000 0.000000 0.000000 0.00000
# traitred_pelage_body_limbs:traitred_pelage_head.units 0.000000 0.000000 0.000000 0.00000
# traitred_pelage_head:traitred_pelage_head.units 1.000000 0.000000 1.000000 1.00000
summary(brms_BPB1)
# Family: MV(bernoulli, bernoulli)
# Links: mu = probit
# mu = probit
# Formula: red_pelage_body_limbs ~ 1 + (1 | a | gr(PhyloName, cov = A))
# red_pelage_head ~ 1 + (1 | a | gr(PhyloName, cov = A))
# Data: p_dat (Number of observations: 265)
# Draws: 2 chains, each with iter = 25000; warmup = 5000; thin = 1;
# total post-warmup draws = 40000
#
# Multilevel Hyperparameters:
# ~PhyloName (Number of levels: 265)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(redpelagebodylimbs_Intercept) 2.47 0.70 1.35 4.11 1.00 9742 19379
# sd(redpelagehead_Intercept) 2.54 0.86 1.20 4.54 1.00 8959 17895
# cor(redpelagebodylimbs_Intercept,redpelagehead_Intercept) 0.79 0.12 0.50 0.96 1.00 13059 17565
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# redpelagebodylimbs_Intercept -0.57 1.03 -2.68 1.45 1.00 19113 23116
# redpelagehead_Intercept -0.63 1.07 -2.74 1.52 1.00 22258 24066
#
# Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
# and Tail_ESS are effective sample size measures, and Rhat is the potential
# scale reduction factor on split chains (at convergence, Rhat = 1).
Both MCMCglmm
and brms
indicated a strong phylogenetic random effects (body/limbs: 8.59 (MCMCglmm
) and 6.10 (brms
), head: 9.81 (MCMCglmm
) and 6.45 (brms
)) and correlation between red pelage on the body/limbs and on the head (: 0.74 (MCMCglmm
) and 0.79 (brms
)). The intercepts were close to zero with wide intervals, suggesting no overall bias in trait prevalence, but a clear tendency for the two traits to co-occur along phylogenetic lineages. Note: You can calculate correlation using cov(body/limbs, head)/var(body/limbs)*var(head)
.
One explanatory variable model
summary(mcmc_BPB2)
# Iterations = 165001:714451
# Thinning interval = 550
# Sample size = 1000
#
# DIC:
#
# G-structure: ~us(trait):PhyloName
#
# post.mean l-95% CI u-95% CI eff.samp
# traitred_pelage_body_limbs:traitred_pelage_body_limbs.PhyloName 9.206 0.8927 20.70 936.9
# traitred_pelage_head:traitred_pelage_body_limbs.PhyloName 7.773 0.9654 18.15 1000.0
# traitred_pelage_body_limbs:traitred_pelage_head.PhyloName 7.773 0.9654 18.15 1000.0
# traitred_pelage_head:traitred_pelage_head.PhyloName 11.574 1.2028 28.12 1000.0
#
# R-structure: ~us(trait):units
#
# post.mean l-95% CI u-95% CI eff.samp
# traitred_pelage_body_limbs:traitred_pelage_body_limbs.units 1 1 1 0
# traitred_pelage_head:traitred_pelage_body_limbs.units 0 0 0 0
# traitred_pelage_body_limbs:traitred_pelage_head.units 0 0 0 0
# traitred_pelage_head:traitred_pelage_head.units 1 1 1 0
#
# Location effects: cbind(red_pelage_body_limbs, red_pelage_head) ~ cSocial_group_size:trait + trait - 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# traitred_pelage_body_limbs -0.96134 -3.71702 1.92201 1000 0.468
# traitred_pelage_head -1.03760 -4.15635 2.23607 1000 0.468
# cSocial_group_size:traitred_pelage_body_limbs -0.11524 -0.40543 0.15340 1159 0.450
# cSocial_group_size:traitred_pelage_head 0.06917 -0.16555 0.32105 1000 0.596
posterior_summary(mcmc_BPB2$VCV)
# Estimate Est.Error Q2.5 Q97.5
# traitred_pelage_body_limbs:traitred_pelage_body_limbs.PhyloName 9.205964 7.089659 2.195793 27.00238
# traitred_pelage_head:traitred_pelage_body_limbs.PhyloName 7.772934 4.949530 2.069536 21.01101
# traitred_pelage_body_limbs:traitred_pelage_head.PhyloName 7.772934 4.949530 2.069536 21.01101
# traitred_pelage_head:traitred_pelage_head.PhyloName 11.573686 9.504936 2.138451 38.94923
# traitred_pelage_body_limbs:traitred_pelage_body_limbs.units 1.000000 0.000000 1.000000 1.00000
# traitred_pelage_head:traitred_pelage_body_limbs.units 0.000000 0.000000 0.000000 0.00000
# traitred_pelage_body_limbs:traitred_pelage_head.units 0.000000 0.000000 0.000000 0.00000
# traitred_pelage_head:traitred_pelage_head.units 1.000000 0.000000 1.000000 1.00000
posterior_summary(mcmc_BPB2$Sol)
# Estimate Est.Error Q2.5 Q97.5
# traitred_pelage_body_limbs -0.96134135 1.462895 -4.0424929 1.6874910
# traitred_pelage_head -1.03760246 1.604165 -4.4335258 2.1006014
# cSocial_group_size:traitred_pelage_body_limbs -0.11524039 0.150902 -0.4464785 0.1389921
# cSocial_group_size:traitred_pelage_head 0.06917466 0.125719 -0.1645042 0.3210948
summary(brms_BPB2)
# Family: MV(bernoulli, bernoulli)
# Links: mu = probit
# mu = probit
# Formula: red_pelage_body_limbs ~ cSocial_group_size + (1 | a | gr(PhyloName, cov = A))
# red_pelage_head ~ cSocial_group_size + (1 | a | gr(PhyloName, cov = A))
# Data: p_dat (Number of observations: 265)
# Draws: 2 chains, each with iter = 35000; warmup = 15000; thin = 1;
# total post-warmup draws = 40000
#
# Multilevel Hyperparameters:
# ~PhyloName (Number of levels: 265)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(redpelagebodylimbs_Intercept) 2.56 0.74 1.39 4.26 1.00 8659 18347
# sd(redpelagehead_Intercept) 2.70 0.95 1.28 4.95 1.00 8461 16509
# cor(redpelagebodylimbs_Intercept,redpelagehead_Intercept) 0.80 0.12 0.52 0.96 1.00 14328 20705
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# redpelagebodylimbs_Intercept -0.59 1.05 -2.75 1.46 1.00 22699 26397
# redpelagehead_Intercept -0.62 1.10 -2.85 1.60 1.00 25511 25658
# redpelagebodylimbs_cSocial_group_size -0.11 0.14 -0.43 0.14 1.00 54965 30027
# redpelagehead_cSocial_group_size 0.06 0.12 -0.17 0.30 1.00 50035 27765
#
# Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
# and Tail_ESS are effective sample size measures, and Rhat is the potential
# scale reduction factor on split chains (at convergence, Rhat = 1).
Adding social group size as a predictor did not change the results in any important way. The estimated effects of group size were very small and the uncertainty intervals included zero, meaning that species living in larger or smaller groups were not more or less likely to show red pelage. The strong positive phylogenetic correlation between the two body regions remained, showing that shared ancestry, rather than social structure, is the main factor explaining why the traits occur together.
Two explanatory variables model
summary(mcmc_BPB3)
# Iterations = 225001:974251
# Thinning interval = 750
# Sample size = 1000
#
# DIC:
#
# G-structure: ~us(trait):PhyloName
#
# post.mean l-95% CI u-95% CI eff.samp
# traitred_pelage_body_limbs:traitred_pelage_body_limbs.PhyloName 10.593 1.140 26.01 1000.0
# traitred_pelage_head:traitred_pelage_body_limbs.PhyloName 9.358 1.550 22.36 886.2
# traitred_pelage_body_limbs:traitred_pelage_head.PhyloName 9.358 1.550 22.36 886.2
# traitred_pelage_head:traitred_pelage_head.PhyloName 15.643 1.257 43.19 721.6
#
# R-structure: ~us(trait):units
#
# post.mean l-95% CI u-95% CI eff.samp
# traitred_pelage_body_limbs:traitred_pelage_body_limbs.units 1 1 1 0
# traitred_pelage_head:traitred_pelage_body_limbs.units 0 0 0 0
# traitred_pelage_body_limbs:traitred_pelage_head.units 0 0 0 0
# traitred_pelage_head:traitred_pelage_head.units 1 1 1 0
#
# Location effects: cbind(red_pelage_body_limbs, red_pelage_head) ~ cSocial_group_size:trait + diurnal:trait + trait - 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# traitred_pelage_body_limbs -0.80913 -4.13667 2.39332 1000.0 0.590
# traitred_pelage_head -1.08133 -5.11647 2.61995 1000.0 0.504
# cSocial_group_size:traitred_pelage_body_limbs -0.11514 -0.43031 0.17890 934.6 0.476
# cSocial_group_size:traitred_pelage_head 0.07673 -0.18447 0.34485 1000.0 0.542
# traitred_pelage_body_limbs:diurnal -0.48680 -2.12428 1.33723 1000.0 0.570
# traitred_pelage_head:diurnal 0.15765 -1.75338 2.13502 1000.0 0.926
posterior_summary(mcmc_BPB3$Sol)
# Estimate Est.Error Q2.5 Q97.5
# traitred_pelage_body_limbs -0.8091325 1.6473735 -4.0927622 2.4659980
# traitred_pelage_head -1.0813268 1.8811821 -5.1975758 2.5249237
# cSocial_group_size:traitred_pelage_body_limbs -0.1151393 0.1595438 -0.4547532 0.1720350
# cSocial_group_size:traitred_pelage_head 0.0767252 0.1374082 -0.1824891 0.3458651
# traitred_pelage_body_limbs:diurnal -0.4867977 0.8865311 -2.2848913 1.1781883
# traitred_pelage_head:diurnal 0.1576550 1.0020809 -1.7291174 2.2491917
posterior_summary(mcmc_BPB3$VCV)
# Estimate Est.Error Q2.5 Q97.5
# traitred_pelage_body_limbs:traitred_pelage_body_limbs.PhyloName 10.593195 8.204460 2.421832 32.32659
# traitred_pelage_head:traitred_pelage_body_limbs.PhyloName 9.358446 6.292992 2.161271 27.21691
# traitred_pelage_body_limbs:traitred_pelage_head.PhyloName 9.358446 6.292992 2.161271 27.21691
# traitred_pelage_head:traitred_pelage_head.PhyloName 15.642777 16.000102 2.606732 59.73609
# traitred_pelage_body_limbs:traitred_pelage_body_limbs.units 1.000000 0.000000 1.000000 1.00000
# traitred_pelage_head:traitred_pelage_body_limbs.units 0.000000 0.000000 0.000000 0.00000
# traitred_pelage_body_limbs:traitred_pelage_head.units 0.000000 0.000000 0.000000 0.00000
# traitred_pelage_head:traitred_pelage_head.units 1.000000 0.000000 1.000000 1.00000
# traitred_pelage_head:traitred_pelage_head.units 1.000000 0.000000 1.000000 1.00000
summary(brms_BPB3)
# Family: MV(bernoulli, bernoulli)
# Links: mu = probit
# mu = probit
# Formula: red_pelage_body_limbs ~ cSocial_group_size + activity_cycle + (1 | a | gr(PhyloName, cov = A))
# red_pelage_head ~ cSocial_group_size + activity_cycle + (1 | a | gr(PhyloName, cov = A))
# Data: p_dat (Number of observations: 265)
# Draws: 2 chains, each with iter = 35000; warmup = 25000; thin = 1;
# total post-warmup draws = 20000
#
# Multilevel Hyperparameters:
# ~PhyloName (Number of levels: 265)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(redpelagebodylimbs_Intercept) 2.84 0.88 1.51 4.90 1.00 3193 6522
# sd(redpelagehead_Intercept) 3.07 1.21 1.39 5.97 1.00 3135 5097
# cor(redpelagebodylimbs_Intercept,redpelagehead_Intercept) 0.80 0.11 0.52 0.96 1.00 5791 8929
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# redpelagebodylimbs_Intercept -0.67 1.45 -3.60 2.19 1.00 7758 10369
# redpelagehead_Intercept -0.42 1.54 -3.61 2.56 1.00 8030 8538
# redpelagebodylimbs_cSocial_group_size -0.11 0.15 -0.44 0.16 1.00 15582 11724
# redpelagebodylimbs_activity_cycledi -0.13 1.02 -2.23 1.85 1.00 8423 10023
# redpelagebodylimbs_activity_cyclenoct 0.39 1.07 -1.75 2.50 1.00 8554 9618
# redpelagehead_cSocial_group_size 0.07 0.12 -0.18 0.31 1.00 14941 11504
# redpelagehead_activity_cycledi -0.07 1.08 -2.07 2.21 1.00 8644 8966
# redpelagehead_activity_cyclenoct -0.44 1.20 -2.75 2.02 1.00 9458 8874
#
# Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
# and Tail_ESS are effective sample size measures, and Rhat is the potential
# scale reduction factor on split chains (at convergence, Rhat = 1).
When both social group size and activity cycle were included, neither predictor showed strong or consistent effects. The estimated coefficients were uncertain and close to zero. As in the previous models. there was a strong positive phylogenetic random effect and correlation between the two traits, indicating that shared ancestry is the main factor explaining why the traits occur together.
Logit link model:
In the logit link model, we need to correct the variances and estimates in the results from MCMCglmm
…
Intercept-only model
# summary(mcmc_BLB1)
<- (16 * sqrt(3) / (15 * pi))^2
c2 <- mcmc_BLB1$Sol/sqrt(1+c2)
res_1 <- mcmc_BLB1$VCV/(1+c2)
res_2
posterior_summary(res_1)
# Estimate Est.Error Q2.5 Q97.5
# traitred_pelage_body_limbs.2 -1.521216 2.276932 -6.235553 2.645887
# traitred_pelage_head.2 -1.697114 2.392179 -6.463650 3.067338
posterior_summary(res_2)
# Estimate Est.Error Q2.5 Q97.5
# traitred_pelage_body_limbs.2:traitred_pelage_body_limbs.2.PhyloName 22.3183259 15.06258 5.8362624 60.1615736
# traitred_pelage_head.2:traitred_pelage_body_limbs.2.PhyloName 18.0581310 10.73053 4.5460350 46.1693777
# traitred_pelage_body_limbs.2:traitred_pelage_head.2.PhyloName 18.0581310 10.73053 4.5460350 46.1693777
# traitred_pelage_head.2:traitred_pelage_head.2.PhyloName 25.3350646 20.48068 4.7065443 77.9581473
# traitred_pelage_body_limbs.2:traitred_pelage_body_limbs.2.units 0.7430287 0.00000 0.7430287 0.7430287
# traitred_pelage_head.2:traitred_pelage_body_limbs.2.units 0.0000000 0.00000 0.0000000 0.0000000
# traitred_pelage_body_limbs.2:traitred_pelage_head.2.units 0.0000000 0.00000 0.0000000 0.0000000
# traitred_pelage_head.2:traitred_pelage_head.2.units 0.7430287 0.00000 0.7430287 0.7430287
summary(brms_BLB1)
# Family: MV(bernoulli, bernoulli)
# Links: mu = logit
# mu = logit
# Formula: red_pelage_body_limbs ~ 1 + (1 | a | gr(PhyloName, cov = A))
# red_pelage_head ~ 1 + (1 | a | gr(PhyloName, cov = A))
# Data: data (Number of observations: 265)
# Draws: 2 chains, each with iter = 4000; warmup = 3000; thin = 1;
# total post-warmup draws = 2000
#
# Multilevel Hyperparameters:
# ~PhyloName (Number of levels: 265)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(redpelagebodylimbs_Intercept) 3.88 1.17 2.08 6.63 1.00 509 963
# sd(redpelagehead_Intercept) 3.99 1.37 1.86 7.18 1.01 402 854
# cor(redpelagebodylimbs_Intercept,redpelagehead_Intercept) 0.79 0.13 0.49 0.96 1.02 434 823
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# redpelagebodylimbs_Intercept -0.63 1.42 -3.54 2.06 1.00 957 1060
# redpelagehead_Intercept -0.82 1.46 -3.63 2.23 1.00 1092 1298
Intercepts were near zero with very wide intervals in MCMCglmm
and brms
. Phylogenetic variances were large and the cross‐trait covariance was positive, giving a high phylogenetic correlation. Overall, both models agree on a strong positive phylogenetic association between traits, although estimates are uncertain and intervals are wide.
One explanatory variable model
# summary(mcmc_BLB2)
<- (16 * sqrt(3) / (15 * pi))^2
c2 <- mcmc_BLB2$Sol/sqrt(1+c2)
res_1 <- mcmc_BLB2$VCV/(1+c2)
res_2
posterior_summary(res_1)
# Estimate Est.Error Q2.5 Q97.5
# traitred_pelage_body_limbs.2 -1.7247280 2.5117027 -7.4454943 2.7521580
# traitred_pelage_head.2 -1.8714622 2.6036701 -7.4632854 3.1199567
# cSocial_group_size:traitred_pelage_body_limbs.2 -0.2114015 0.2693973 -0.7956990 0.2218048
# cSocial_group_size:traitred_pelage_head.2 0.1132643 0.2118232 -0.3220887 0.5181248
posterior_summary(res_2)
# Estimate Est.Error Q2.5 Q97.5
# traitred_pelage_body_limbs.2:traitred_pelage_body_limbs.2.PhyloName 25.8375512 17.21042 6.4036205 71.0978415
# traitred_pelage_head.2:traitred_pelage_body_limbs.2.PhyloName 22.0012511 13.81957 5.1039500 58.9474559
# traitred_pelage_body_limbs.2:traitred_pelage_head.2.PhyloName 22.0012511 13.81957 5.1039500 58.9474559
# traitred_pelage_head.2:traitred_pelage_head.2.PhyloName 31.8824714 28.59083 5.5360062 95.3746806
# traitred_pelage_body_limbs.2:traitred_pelage_body_limbs.2.units 0.7430287 0.00000 0.7430287 0.7430287
# traitred_pelage_head.2:traitred_pelage_body_limbs.2.units 0.0000000 0.00000 0.0000000 0.0000000
# traitred_pelage_body_limbs.2:traitred_pelage_head.2.units 0.0000000 0.00000 0.0000000 0.0000000
# traitred_pelage_head.2:traitred_pelage_head.2.units 0.7430287 0.00000 0.7430287 0.7430287
summary(brms_BLB2)
# Family: MV(bernoulli, bernoulli)
# Links: mu = logit
# mu = logit
# Formula: red_pelage_body_limbs ~ cSocial_group_size + (1 | a | gr(PhyloName, cov = A))
# red_pelage_head ~ cSocial_group_size + (1 | a | gr(PhyloName, cov = A))
# Data: data (Number of observations: 265)
# Draws: 2 chains, each with iter = 4000; warmup = 3000; thin = 1;
# total post-warmup draws = 2000
#
# Multilevel Hyperparameters:
# ~PhyloName (Number of levels: 265)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(redpelagebodylimbs_Intercept) 3.93 1.19 2.10 6.80 1.00 526 916
# sd(redpelagehead_Intercept) 4.09 1.34 1.89 7.02 1.00 521 774
# cor(redpelagebodylimbs_Intercept,redpelagehead_Intercept) 0.78 0.13 0.47 0.96 1.00 827 1113
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# redpelagebodylimbs_Intercept -0.71 1.43 -3.58 2.12 1.00 864 1130
# redpelagehead_Intercept -0.84 1.42 -3.68 1.99 1.00 1059 1183
# redpelagebodylimbs_cSocial_group_size -0.19 0.25 -0.74 0.23 1.00 2204 1489
# redpelagehead_cSocial_group_size 0.11 0.20 -0.30 0.50 1.00 2265 1554
#
# Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
# and Tail_ESS are effective sample size measures, and Rhat is the potential
# scale reduction factor on split chains (at convergence, Rhat = 1).
Intercepts for both traits were close to zero with very wide intervals. Social group size effects were small and did not overlap with zero. Phylogenetic random effects are large, yielding a strong phylogenetic correlation in both models. Adding social group size did not materially change the results. Both approaches support a strong phylogenetic association between traits.
Two explanatory variables model
# summary(mcmc_BLB3)
<- (16 * sqrt(3) / (15 * pi))^2
c2 <- mcmc_BLB3$Sol/sqrt(1+c2)
res_1 <- mcmc_BLB3$VCV/(1+c2)
res_2
posterior_summary(res_1)
# Estimate Est.Error Q2.5 Q97.5
# traitred_pelage_body_limbs.2 -52.47073523 153.0210964 -470.6945375 213.7168079
# traitred_pelage_head.2 -1.59381221 4.2268293 -9.9107244 6.2757031
# cSocial_group_size:traitred_pelage_body_limbs.2 -0.05546842 5.6201453 -12.2032240 14.3404769
# cSocial_group_size:traitred_pelage_head.2 0.15627339 0.2463914 -0.3461778 0.6621622
# traitred_pelage_body_limbs.2:activity_cycledi -8.13420554 61.9395801 -159.5704053 139.5901896
# traitred_pelage_head.2:activity_cycledi -0.29106864 2.1108018 -4.1396017 4.2412983
# traitred_pelage_body_limbs.2:activity_cyclenoct 13.99091102 63.9019094 -121.4704631 173.7572996
# traitred_pelage_head.2:activity_cyclenoct -0.83013530 2.2719205 -5.0979767 3.5671068
posterior_summary(res_2)
# Estimate Est.Error Q2.5 Q97.5
# traitred_pelage_body_limbs.2:traitred_pelage_body_limbs.2.PhyloName 7.672423e+04 111379.21048 9.1326987 3.823663e+05
# traitred_pelage_head.2:traitred_pelage_body_limbs.2.PhyloName 1.199345e+03 1623.30073 8.2063366 5.272497e+03
# traitred_pelage_body_limbs.2:traitred_pelage_head.2.PhyloName 1.199345e+03 1623.30073 8.2063366 5.272497e+03
# traitred_pelage_head.2:traitred_pelage_head.2.PhyloName 5.319362e+01 72.57619 7.6215483 1.778477e+02
# traitred_pelage_body_limbs.2:traitred_pelage_body_limbs.2.units 7.430287e-01 0.00000 0.7430287 7.430287e-01
# traitred_pelage_head.2:traitred_pelage_body_limbs.2.units 0.000000e+00 0.00000 0.0000000 0.000000e+00
# traitred_pelage_body_limbs.2:traitred_pelage_head.2.units 0.000000e+00 0.00000 0.0000000 0.000000e+00
# traitred_pelage_head.2:traitred_pelage_head.2.units 7.430287e-01 0.00000 0.7430287 7.430287e-01
summary(brms_BLB3)
#
# Family: MV(bernoulli, bernoulli)
# Links: mu = logit
# mu = logit
# Formula: red_pelage_body_limbs ~ cSocial_group_size + activity_cycle + (1 | a | gr(PhyloName, cov = A))
# red_pelage_head ~ cSocial_group_size + activity_cycle + (1 | a | gr(PhyloName, cov = A))
# Data: data (Number of observations: 265)
# Draws: 2 chains, each with iter = 10000; warmup = 5000; thin = 1;
# total post-warmup draws = 10000
#
# Multilevel Hyperparameters:
# ~PhyloName (Number of levels: 265)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(redpelagebodylimbs_Intercept) 4.37 1.32 2.34 7.44 1.00 2900 4826
# sd(redpelagehead_Intercept) 4.66 2.05 2.06 8.70 1.00 2332 3687
# cor(redpelagebodylimbs_Intercept,redpelagehead_Intercept) 0.79 0.12 0.51 0.96 1.00 3899 5188
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# redpelagebodylimbs_Intercept -0.70 2.08 -4.86 3.43 1.00 8051 7049
# redpelagehead_Intercept -0.34 2.19 -4.85 3.93 1.00 8470 6758
# redpelagebodylimbs_cSocial_group_size -0.19 0.26 -0.77 0.25 1.00 13234 6843
# redpelagebodylimbs_activity_cycledi -0.22 1.66 -3.72 2.95 1.00 8089 6616
# redpelagebodylimbs_activity_cyclenoct 0.54 1.75 -3.03 3.98 1.00 7681 6273
# redpelagehead_cSocial_group_size 0.11 0.21 -0.30 0.52 1.00 12432 5620
# redpelagehead_activity_cycledi -0.17 1.78 -3.45 3.50 1.00 6806 4880
# redpelagehead_activity_cyclenoct -0.94 1.92 -4.62 3.11 1.00 7076 5211
In MCMCglmm
, estimates were generally unstable. In particular, the intercept for body/limbs and the activity cycle effects showed extremely large errors and wide intervals. The effects of social group size and activity cycle both crossed zero, providing no evidence of significant associations. Phylogenetic variance estimates were extremely large with very wide intervals, indicating low precision. In brms
, the estimates were more stable. The phylogenetic random‐effect standard deviations were large, and the between‐trait phylogenetic correlation remained high. The effects of social group size and activity cycle were small, with 95% intervals overlapping zero. Overall, in both implementations, neither social group size nor activity cycle emerged as important predictors of red pelage, while the main effect was a consistently strong positive phylogenetic correlation between traits. However, MCMCglmm
produced unstable estimates with very wide intervals, and this discrepancy likely reflects differences in prior specification between the two frameworks.
3. Ordinal (threshold) models (ordered multinomial model)
In the ordinal threshold model and nominal model, we will show 2 realistic examples. As we explained in the BOX2
on main text, MCMCglmm
and brms
take different parametrisation for the threshold model. At first sight, the outputs about thresholds (cutpoints) appear to be quite difficult. But do not worry about it,we will guide you step by step to understand and interpret these outputs. By the end of this section, you will be able to compare the threshold estimates from both models and understand their implications.
Explanation of dataset
We used the dataset from AVONET (Tobias et al. (2022)) and the phylogenetic tree from BirdTree.
For first example, we tested the relationships between migration level (response variable) and body mass and habitat density in birds of prey (136 species).
For second example, we investigated the difference of habitat density within family Phasianidae (179 species) are affected by tail length and diet (herbivore).
Datasets for example 1
<- read.csv(here("data", "bird body mass", "accipitridae_sampled.csv"))
dat <- dat %>%
dat mutate(across(c(Trophic.Level, Trophic.Niche, Primary.Lifestyle, Migration, Habitat, Species.Status), as.factor))
# rename: numbers to descriptive name
$Migration_ordered <- factor(dat$Migration, levels = c("sedentary", "partially_migratory", "migratory"), ordered = TRUE)
dat<- dat %>%
dat mutate(Habitat.Density = case_when(
== 1 ~ "dense",
Habitat.Density == 2 ~ "semi-open",
Habitat.Density == 3 ~ "open",
Habitat.Density TRUE ~ as.character(Habitat.Density)
))
table(dat$Habitat.Density)
$logMass <- log(dat$Mass)
dathist(dat$logMass)
boxplot(logMass ~ Habitat.Density, data = dat)
summary(dat)
<- read.nexus(here("data", "bird body mass", "accipitridae_sampled.nex"))
trees <- trees[[1]]
tree
summary(dat)
Datasets for example 2
<- read.csv(here("data", "bird body mass", "9993spp_clearned.csv"))
dat <- dat %>%
dat mutate(across(c(Trophic.Level, Trophic.Niche,
Primary.Lifestyle, Migration, Habitat, Habitat.Density , Species.Status), as.factor),Habitat.Density = factor(Habitat.Density, ordered = TRUE))
<- dat %>%
Family_HD group_by(Family, Habitat.Density) %>%
tally() %>%
group_by(Family) %>%
filter(all(n >= 5)) %>%
ungroup()
<- dat %>%
Family_HD filter(Family %in% c("Phasianidae", "Ploceidae", "Sturnidae")) %>%
count(Family, Habitat.Density)
<- dat %>%
Phasianidae_dat filter(Family == "Phasianidae") %>%
mutate(Habitat.Density = case_when(
== 1 ~ "dense",
Habitat.Density == 2 ~ "semi-open",
Habitat.Density == 3 ~ "open"
Habitat.Density
),Habitat.Density = factor(Habitat.Density, ordered = TRUE)
)<- Phasianidae_dat %>%
Phasianidae_dat mutate(Habitat_Forest = ifelse(Habitat == "Forest", 1, 0),
Habitat_Forest = replace_na(Habitat_Forest, 0))
$cTail_length <- scale(log(Phasianidae_dat$Tail.Length), center = TRUE, scale = FALSE)
Phasianidae_dat
View(Phasianidae_dat)
<- read.nexus(here("data", "bird body mass", "Phasianidae.nex"))
Phasianidae_tree <- Phasianidae_tree[[1]] tree
Example 1
Intercept-only model
MCMCglmm
When using MCMCglmm for ordinal data, you can choose between two options for the family
parameter: "threshold"
and "ordinal"
. These options differ in how they handle residual variances and error distributions.
We generally recommend using family = "threshold"
for following reasons - directly corresponds to the standard probit regression with a variance of 1, no correcting of outputs is necessary, and it provides a more computationally efficient approach.
We show both models through this section;
<- inverseA(tree, nodes = "ALL", scale = TRUE)
inv_phylo <- list(R = list(V = 1, fix = 1),
prior1 G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)))
system.time(
<- MCMCglmm(Migration_ordered ~ 1,
mcmcglmm_T1_1 random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "threshold",
data = dat,
prior = prior1,
nitt = 13000*40,
thin = 10*40,
burnin = 3000*40
)
)
system.time(
<- MCMCglmm(Migration_ordered ~ 1,
mcmcglmm_O1_1 random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "ordinal",
data = dat,
prior = prior1,
nitt = 13000*40,
thin = 10*40,
burnin = 3000*40
) )
For ordinal model, we need the correction;
<- 1
c2 <- model$Sol / sqrt(1+c2) # for fixed effects
res_1 <- model$VCV / (1+c2) # for variance components res_2
summary(mcmcglmm_T1_1) # 95%HPD Interval
# Iterations = 120001:519601
# Thinning interval = 400
# Sample size = 1000
#
# DIC: 297.9054
#
# G-structure: ~Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# Phylo 1.43 1.196e-06 4.766 1000
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: Migration_ordered ~ 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) 0.3178 -0.7651 1.5749 1000 0.464
#
# Cutpoints:
#
# post.mean l-95% CI u-95% CI eff.samp
# cutpoint.traitMigration_ordered.1 1.011 0.7089 1.408 1000
posterior_summary(mcmcglmm_T1_1$VCV) # 95%CI
# Estimate Est.Error Q2.5 Q97.5
# Phylo 1.429685 2.187129 0.0020836 6.25389
# units 1.000000 0.000000 1.0000000 1.00000
posterior_summary(mcmcglmm_T1_1$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) 0.3178127 0.5848262 -0.8350972 1.513238
posterior_summary(mcmcglmm_T1_3$CP)
# Estimate Est.Error Q2.5 Q97.5
# cutpoint.traitMigration_ordered.1 1.116292 0.2420588 0.7754895 1.677832
summary(mcmcglmm_O1_1) # 95%HPD Interval
# Iterations = 120001:519601
# Thinning interval = 400
# Sample size = 1000
#
# DIC: 267.7801
#
# G-structure: ~Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# Phylo 2.94 1.447e-05 10.78 779.5
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: Migration_ordered ~ 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) 0.4319 -1.0821 2.2693 889.5 0.456
#
# Cutpoints:
#
# post.mean l-95% CI u-95% CI eff.samp
# cutpoint.traitMigration_ordered.1 1.43 0.9309 1.959 817.1
posterior_summary(mcmcglmm_O1_1$VCV) # 95%CI
# Estimate Est.Error Q2.5 Q97.5
# Phylo 2.940261 4.749804 0.003052149 13.92029
# units 1.000000 0.000000 1.000000000 1.00000
posterior_summary(mcmcglmm_O1_1$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) 0.4319314 0.7973933 -1.170937 2.245297
<- 1
c2 <- mcmcglmm_O1_1$Sol / sqrt(1+c2) # for fixed effects
res_1 <- mcmcglmm_O1_1$VCV / (1+c2) # for variance components
res_2 <- mcmcglmm_O1_1$CP / sqrt(1+c2) # for cutpoint
res_3
posterior_summary(res_1)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) 0.3723211 0.6873459 -1.009337 1.935426
posterior_summary(res_2)
# Estimate Est.Error Q2.5 Q97.5
# Phylo 2.1846987 3.529241 0.002267834 10.3431749
# units 0.7430287 0.000000 0.743028733 0.7430287
posterior_summary(res_3)
# Estimate Est.Error Q2.5 Q97.5
# cutpoint.traitMigration_ordered.1 1.232696 0.2358161 0.8556872 1.782325
the ordinal
models show a smaller effect size compared to the threshold model, but the difference is minor and should not be an issue. The convergences appear to be satisfactory for both models. Also, it is well-known that ordinal models in MCMCglmm tend to be more difficult to converge compared to the threshold models. Despite this, the results obtained from both models are largely consistent, showing minimal difference. As you can see here, residual variance is fixed 1 - the model did not estimate the residual variance (
R-structure`).
Look at the output more closely - only one cutpoint is provided although our response variable is three ordered categorical traits! As we have explained in main text, MCMCglmm
implicitly assumes that the first cut point is always set to 0, making the intercept represent the centre of the distribution.
brms
The family argument is set to cumulative()
with link = "probit"
, which uses the standard probit parametrisation where the residual variance is fixed at 1. Since this is the standard parametrisation for ordinal probit regression, no correcting of coefficients is necessary.
<- ape::vcv.phylo(tree, corr = TRUE)
A
<- default_prior(
default_priors ~ 1 + (1 | gr(Phylo, cov = A)),
Migration_ordered data = dat,
family = cumulative(link = "probit"),
data2 = list(A = A)
)
# Fit the model
system.time(
<- brm(
brms_OT1_1 formula = Migration_ordered ~ 1 + (1 | gr(Phylo, cov = A)),
data = dat,
family = cumulative(link = "probit"),
data2 = list(A = A),
prior = default_priors,
iter = 19000,
warmup =16000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.99),
) )
summary(brms_OT1_1)
# Family: cumulative
# Links: mu = probit; disc = identity
# Formula: Migration_ordered ~ 1 + (1 | gr(Phylo, cov = A))
# Data: dat (Number of observations: 136)
# Draws: 2 chains, each with iter = 20000; warmup = 10000; thin = 1;
# total post-warmup draws = 20000
#
# Multilevel Hyperparameters:
# ~Phylo (Number of levels: 136)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.85 0.61 0.04 2.28 1.00 1594 3377
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept[1] -0.32 0.47 -1.36 0.61 1.00 12133 7267
# Intercept[2] 0.66 0.47 -0.24 1.75 1.00 11680 8235
#
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# disc 1.00 0.00 1.00 1.00 NA NA NA
In brms
, now you can see the two intercepts and no cutpoint in the output! In fact, the intercept is described each cutpoint. However, the results (how many species fall into each category) show little difference between MCMCglmm
and brms
, as can be seen below.
Probabilities
We can obtain the probabilities for each category (migrate level: sedentary, partially migratory, migratory) using pnorm()
. We made simple functions to calculate these and use them for the threshold models in MCMCglmm
and brms
.
# l = intercept
<- function(cutpoint0, cutpoint1, l) {
calculate_probabilities
<- pnorm(cutpoint0 - l)
category1_prob <- pnorm(cutpoint1 - l) - pnorm(cutpoint0 - l)
category2_prob <- 1 - pnorm(cutpoint1 - l)
category3_prob
return(c(sedentary = category1_prob,
partially_migratory = category2_prob,
migratory = category3_prob)
)
}
<- calculate_probabilities(0, 1.01, 0.3178)
probabilities_mcmcglmm print(probabilities_mcmcglmm)
# sedentary partially_migratory migratory
# 0.3753183 0.3802758 0.2444059
<- calculate_probabilities(-0.32, 0.66, 0)
probabilities_brms print(probabilities_brms)
# sedentary partially_migratory migratory
# 0.3744842 0.3708889 0.2546269
The results obtained in MCMCglmm
are as follows: Sedentary: 0.3753183, Partially migratory: 0.3802758, and Migratory: 0.2444059. In brms
, the results are Sedentary: 0.3744842, Partially migratory: 0.3708889, and Migratory: 0.2546269. As shown, although MCMCglmm
and brms
use parametrisations of and , respectively, the results are very similar (see Box 2 in main text).
Phylogenetic signals
# MCMCglmm
<- ((mcmcglmm_T1_1$VCV[, "Phylo"]) / (mcmcglmm_T1_1$VCV[, "Phylo"] + 1))
phylo_signal_mcmcglmm_T %>% mean()
phylo_signal_mcmcglmm_T # [1] 0.4172334
%>% quantile(probs = c(0.025, 0.5, 0.975))
phylo_signal_mcmcglmm_T # 2.5% 50% 97.5%
# 0.002079267 0.430888403 0.862142904
<- ((mcmcglmm_O1_1$VCV[, "Phylo"]/(1+c2)) / (mcmcglmm_O1_1$VCV[, "Phylo"]/(1+c2) + 1))
phylo_signal_mcmcglmm_O %>% mean()
phylo_signal_mcmcglmm_O # [1] 0.4731027
%>% quantile(probs = c(0.025,0.5,0.975))
phylo_signal_mcmcglmm_O # 2.5% 50% 97.5%
# 0.002262703 0.515490529 0.911841208
# brms
<- brms_OT1_1 %>% as_tibble() %>%
phylo_signal_brms_OT ::select(Sigma_phy = sd_Phylo__Intercept) %>%
dplyrmutate(lambda = Sigma_phy^2 / (Sigma_phy^2 + 1)) %>%
pull(lambda)
%>% mean()
phylo_signal_brms_OT # [1] 0.3692979
%>% quantile(probs = c(0.025,0.5,0.975))
phylo_signal_brms_OT # 2.5% 50% 97.5%
# 0.001264342 0.359406642 0.838180812
In both MCMCglmm
(threshold and ordinal models) and brms
, the phylogenetic signal is estimated to be moderate. The moderately high phylogenetic signal suggests that closely related species tend to exhibit similar migratory behaviours, indicating that phylogeny plays a role in shaping these traits. However, this also implies that additional factors, such as environmental or ecological influences, likely contribute to the variation in migratory behavior. This balance between phylogenetic constraints and external influences may reflect evolutionary flexibility in migratory strategies among species.
One continuous explanatory variable model
The model using MCMCglmm
is
<- inverseA(tree, nodes = "ALL", scale = TRUE)
inv_phylo <- list(R = list(V = 1, fix = 1),
prior1 G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)))
system.time(
<- MCMCglmm(Migration_ordered ~ logMass,
mcmcglmm_T1_2 random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "threshold",
data = dat,
prior = prior1,
nitt = 13000*45,
thin = 10*45,
burnin = 3000*45
)
)
system.time(
<- MCMCglmm(Migration_ordered ~ logMass,
mcmcglmm_O1_2 random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "ordinal",
data = dat,
prior = prior1,
nitt = 13000*60,
thin = 10*60,
burnin = 3000*60
) )
For brms
…
<- ape::vcv.phylo(tree,corr = TRUE)
A
<- default_prior(
default_priors2 ~ logMass + (1 | gr(Phylo, cov = A)),
Migration_ordered data = dat,
family = cumulative(link = "probit"),
data2 = list(A = A)
)
# Fit the model
system.time(
<- brm(
brms_OT1_2 formula = Migration_ordered ~ logMass + (1 | gr(Phylo, cov = A)),
data = dat,
family = cumulative(link = "probit"),
data2 = list(A = A),
prior = default_priors2,
iter = 20000,
warmup = 10000,
thin = 1,
chain = 2,
core = 2,
control = list(adapt_delta = 0.99)
) )
# MCMCglmm
summary(mcmcglmm_T1_2)
# Iterations = 135001:584551
# Thinning interval = 450
# Sample size = 1000
#
# DIC: 300.7449
#
# G-structure: ~Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# Phylo 2.068 6.389e-06 7.225 1000
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: Migration_ordered ~ logMass
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) 0.88928 -1.69994 3.49007 1000 0.436
# logMass -0.08033 -0.37129 0.21982 1000 0.550
#
# Cutpoints:
#
# post.mean l-95% CI u-95% CI eff.samp
# cutpoint.traitMigration_ordered.1 1.052 0.6441 1.469 897.7
posterior_summary(mcmcglmm_T1_2$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) 0.88927720 1.2935980 -1.5704745 3.6731840
# logMass -0.08033062 0.1488659 -0.3928655 0.2076132
posterior_summary(mcmcglmm_T1_2$VCV)
# Estimate Est.Error Q2.5 Q97.5
# Phylo 2.068366 3.804493 0.00210611 11.97129
# units 1.000000 0.000000 1.00000000 1.00000
posterior_summary(mcmcglmm_T1_3$CP)
# Estimate Est.Error Q2.5 Q97.5
# cutpoint.traitMigration_ordered.1 1.116292 0.2420588 0.7754895 1.677832
summary(mcmcglmm_O1_2) # 95%HPD Interval
# Iterations = 180001:779401
# Thinning interval = 600
# Sample size = 1000
#
# DIC: 268.7424
#
# G-structure: ~Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# Phylo 4.133 8.171e-05 14.26 1000
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: Migration_ordered ~ logMass
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) 1.2622 -2.7953 4.3184 1000 0.432
# logMass -0.1087 -0.5494 0.3449 1000 0.574
#
# Cutpoints:
#
# post.mean l-95% CI u-95% CI eff.samp
# cutpoint.traitMigration_ordered.1 1.489 0.9695 2.098 1000
posterior_summary(mcmcglmm_O1_2$VCV) # 95%CI
# Estimate Est.Error Q2.5 Q97.5
# Phylo 4.132637 6.62053 0.004598829 19.22872
# units 1.000000 0.00000 1.000000000 1.00000
posterior_summary(mcmcglmm_O1_2$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) 1.262162 1.8674403 -2.1994721 5.0542830
# logMass -0.108659 0.2176232 -0.5751705 0.3314989
<- 1
c2 <- mcmcglmm_O1_2$Sol / sqrt(1+c2) # for fixed effects
res_1 <- mcmcglmm_O1_2$VCV / (1+c2) # for variance components
res_2 <- mcmcglmm_O1_2$CP / sqrt(1+c2)
res_3
posterior_summary(res_1)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) 0.89248312 1.3204797 -1.555262 3.5739178
# logMass -0.07683353 0.1538828 -0.406707 0.2344051
posterior_summary(res_2)
# Estimate Est.Error Q2.5 Q97.5
# Phylo 2.066318 3.310265 0.002299414 9.614361
# units 0.500000 0.000000 0.500000000 0.500000
posterior_summary(res_3)
# Estimate Est.Error Q2.5 Q97.5
# cutpoint.traitMigration_ordered.1 1.053063 0.2218614 0.7173326 1.575309
#brms
summary(brms_OT1_2)
# Family: cumulative
# Links: mu = probit; disc = identity
# Formula: Migration_ordered ~ logMass + (1 | gr(Phylo, cov = A))
# Data: dat (Number of observations: 136)
# Draws: 2 chains, each with iter = 20000; warmup = 10000; thin = 1;
# total post-warmup draws = 20000
#
# Multilevel Hyperparameters:
# ~Phylo (Number of levels: 136)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.93 0.66 0.04 2.45 1.00 1192 3611
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept[1] -0.85 1.07 -3.16 1.26 1.00 15424 9201
# Intercept[2] 0.15 1.06 -2.03 2.33 1.00 16388 10923
# logMass -0.07 0.13 -0.34 0.19 1.00 18410 10854
#
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# disc 1.00 0.00 1.00 1.00 NA NA NA
Both models (brms
and MCMCglmm
) indicate body mass (logMass) does not have a statistically significant effect on the migration order.
<- calculate_probabilities(0, 0.88, 1.05)
probabilities_mcmcglmm print(probabilities_mcmcglmm)
# sedentary partially_migratory migratory
# 0.1468591 0.2856460 0.5674949
<- calculate_probabilities(-0.85, 0.15, 0)
probabilities_brms print(probabilities_brms)
# sedentary partially_migratory migratory
# 0.1976625 0.3619551 0.4403823
The coefficient of logMass is negative, but the actual results show an increase in the probability of being migratory. This is likely because the effect of logMass is not statistically significant, and due to wide credible intervals, the probabilities may shift due to sampling variability. Therefore, the results should be interpreted with caution, as they may reflect uncertainty in the data rather than a true underlying effect of logMass on migratory tendencies.
One continuous and one categorical explanatory variable model
For the model MCMCglmm
the following - here, we use the gelman,prior()
function for prior setting for fixed effects to ensure weakly informative prior that help with model convergence. You can find further information in Chapter 4. Nominal models.
<- inverseA(tree, nodes = "ALL", scale = TRUE)
inv_phylo <- list(B = list(mu = rep(0,4),
prior1 V = gelman.prior(~logMass + Habitat.Density, data = dat, scale = sqrt(1+1))),
R = list(V = 1, fix = 1),
G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)))
system.time(
<- MCMCglmm(Migration_ordered ~ logMass + Habitat.Density,
mcmcglmm_T1_3 random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "threshold",
data = dat,
prior = prior1,
nitt = 13000*60,
thin = 10*60,
burnin = 3000*60
)
)
system.time(
<- MCMCglmm(Migration_ordered ~ logMass + Habitat.Density,
mcmcglmm_O1_3 random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "ordinal",
data = dat,
prior = prior1,
nitt = 13000*150,
thin = 10*150,
burnin = 3000*150
) )
For brms
,
<- ape::vcv.phylo(tree, cor = TRUE)
A
<- default_prior(
default_priors3 ~ logMass + Habitat.Density + (1 | gr(Phylo, cov = A)),
Migration_ordered data = dat,
family = cumulative(link = "probit"),
data2 = list(A = A)
)
system.time(
<- brm(
brm_OT1_3 formula = Migration_ordered ~ logMass + Habitat.Density + (1 | gr(Phylo, cov = A)),
data = dat,
family = cumulative(link = "probit"),
data2 = list(A = A),
prior = default_priors3,
iter = 6500,
warmup = 5500,
thin = 1,
chain = 2,
core = 2,
control = list(adapt_delta = 0.99)
) )
# MCMCglmm
summary(mcmcglmm_T1_3)
# Iterations = 180001:779401
# Thinning interval = 600
# Sample size = 1000
#
# DIC: 289.3043
#
# G-structure: ~Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# Phylo 1.988 4.136e-07 6.227 758.7
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: Migration_ordered ~ logMass + Habitat.Density
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) 1.3097 -1.1727 3.6674 1000.0 0.254
# logMass -0.2664 -0.5703 0.0479 891.2 0.058 .
# Habitat.Densityopen 1.3154 0.4916 2.0664 1054.1 <0.001 ***
# Habitat.Densitysemi-open 0.3955 -0.3020 0.9998 1000.0 0.218
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Cutpoints:
#
# post.mean l-95% CI u-95% CI eff.samp
# cutpoint.traitMigration_ordered.1 1.116 0.6674 1.503 1022
posterior_summary(mcmcglmm_T1_3$VCV)
# Estimate Est.Error Q2.5 Q97.5
# Phylo 1.987738 4.411821 0.009463594 9.04495
# units 1.000000 0.000000 1.000000000 1.00000
posterior_summary(mcmcglmm_T1_3$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) 1.3096841 1.2767628 -1.0290056 3.809598026
# logMass -0.2664037 0.1660313 -0.6334123 0.007099615
# Habitat.Densityopen 1.3153548 0.4385812 0.6134014 2.247115510
# Habitat.Densitysemi-open 0.3954618 0.3425752 -0.2200316 1.117789403
posterior_summary(mcmcglmm_T1_3$CP)
# Estimate Est.Error Q2.5 Q97.5
# cutpoint.traitMigration_ordered.1 1.116292 0.2420588 0.7754895 1.677832
summary(mcmcglmm_O1_3) # 95%HPD Interval
# Iterations = 450001:1948501
# Thinning interval = 1500
# Sample size = 1000
#
# DIC: 255.0597
#
# G-structure: ~Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# Phylo 4.142 1.314e-06 14.07 839
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: Migration_ordered ~ logMass + Habitat.Density
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) 1.87956 -1.61645 5.74879 1000.0 0.262
# logMass -0.38074 -0.82066 0.06748 1112.4 0.078 .
# Habitat.Densityopen 1.85563 0.72168 3.01111 842.4 <0.001 ***
# Habitat.Densitysemi-open 0.58208 -0.29301 1.48943 1000.0 0.190
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Cutpoints:
#
# post.mean l-95% CI u-95% CI eff.samp
# cutpoint.traitMigration_ordered.1 1.579 1.064 2.204 885.1
posterior_summary(mcmcglmm_O1_3$VCV) # 95%CI
# Estimate Est.Error Q2.5 Q97.5
# Phylo 4.142472 9.805007 0.03628788 19.59177
# units 1.000000 0.000000 1.00000000 1.00000
posterior_summary(mcmcglmm_O1_3$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) 1.8795575 1.8603927 -1.5225125 6.0417116
# logMass -0.3807427 0.2337957 -0.8716018 0.0461751
# Habitat.Densityopen 1.8556261 0.6258944 0.8289186 3.1677050
# Habitat.Densitysemi-open 0.5820800 0.4612845 -0.2342549 1.5671716
<- 1
c2 <- mcmcglmm_O1_3$Sol / sqrt(1+c2) # for fixed effects
res_1 <- mcmcglmm_O1_3$VCV / (1+c2) # for variance components
res_2 <- mcmcglmm_O1_3$CP / sqrt(1+c2)
res_3
posterior_summary(res_1)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) 1.3290479 1.3154963 -1.0765789 4.27213525
# logMass -0.2692257 0.1653185 -0.6163155 0.03265073
# Habitat.Densityopen 1.3121258 0.4425742 0.5861339 2.23990572
# Habitat.Densitysemi-open 0.4115927 0.3261774 -0.1656433 1.10815765
posterior_summary(res_2)
# Estimate Est.Error Q2.5 Q97.5
# Phylo 2.071236 4.902504 0.01814394 9.795886
# units 0.500000 0.000000 0.50000000 0.500000
posterior_summary(res_3)
# Estimate Est.Error Q2.5 Q97.5
# cutpoint.traitMigration_ordered.1 1.116852 0.2491617 0.7783082 1.6727
# brms
summary(brms_OT1_3)
# Family: cumulative
# Links: mu = probit; disc = identity
# Formula: Migration_ordered ~ logMass + Habitat.Density + (1 | gr(Phylo, cov = A))
# Data: dat (Number of observations: 136)
# Draws: 2 chains, each with iter = 6500; warmup = 5500; thin = 1;
# total post-warmup draws = 2000
#
# Multilevel Hyperparameters:
# ~Phylo (Number of levels: 136)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.91 0.58 0.08 2.31 1.01 231 460
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept[1] -1.34 1.10 -3.63 0.73 1.00 1364 1095
# Intercept[2] -0.27 1.09 -2.45 1.85 1.00 1523 1156
# logMass -0.26 0.14 -0.56 0.01 1.00 1300 1172
# Habitat.Densityopen 1.25 0.38 0.59 2.08 1.00 533 941
# Habitat.DensitysemiMopen 0.37 0.31 -0.20 1.01 1.00 1236 926
#
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# disc 1.00 0.00 1.00 1.00 NA NA NA
The species living in open environments have significantly higher migration levels compared to those living in dense environments. No significant difference in migration levels was observed between species living in semi-open and dense environments.logMass
does not have a significant impact on migration levels, suggesting that Habitat.Density
is the primary factor influencing migration levels.
<- calculate_probabilities(0, 1.116, 1.3097)
probabilities_mcmcglmm print(probabilities_mcmcglmm)
# sedentary partially_migratory migratory
# 0.09514867 0.32805672 0.57679460
<- calculate_probabilities(-1.33, -0.26, 0)
probabilities_brms print(probabilities_brms)
# sedentary partially_migratory migratory
# 0.09175914 0.30567275 0.60256811
The probability of the Migratory category is the highest, around 60%. This reflects the higher migration likelihood, especially for species living in open environments, which leads to a higher probability of being migratory.
Example 2
The second example is similar to the first one, but we will use a different dataset and different explanatory variables. The response variable is Habitat.Density
(1 = Dense habitats, 2 = Semi-open habitats, 3 = Open habitats) and the explanatory variables are cTail_length
(continuous) and Diet
(categorical).
Intercept only model
<- inverseA(tree, nodes = "ALL", scale = TRUE)
inv_phylo <- list(R = list(V = 1, fix = 1),
prior1 G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)))
system.time(
<- MCMCglmm(Habitat.Density ~ 1,
mcmcglmm_T2_1 random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "threshold",
data = dat,
prior = prior1,
nitt = 13000*100,
thin = 10*10,
burnin = 3000*100
)
)
system.time(
<- MCMCglmm(Habitat.Density ~ 1,
mcmcglmm_O2_1 random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "ordinal",
data = dat,
prior = prior1,
nitt = 13000*40,
thin = 10*40,
burnin = 3000*40
)
)
<- ape::vcv.phylo(tree, corr = TRUE)
A
<- default_prior(
default_priors ~ 1 + (1 | gr(Phylo, cov = A)),
Habitat.Density data = dat,
family = cumulative(link = "probit"),
data2 = list(A = A)
)
# Fit the model
system.time(
<- brm(
brms_O2_1 formula = Habitat.Density ~ 1 + (1 | gr(Phylo, cov = A)),
data = dat,
family = cumulative(link = "probit"),
data2 = list(A = A),
prior = default_priors,
iter = 19000,
warmup =16000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.99),
) )
# MCMCglmm
summary(mcmcglmm_T2_1)
# Iterations = 195001:844351
# Thinning interval = 650
# Sample size = 1000
#
# DIC: 280.497
#
# G-structure: ~Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# Phylo 8.552 1.018 20.69 726.6
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: Habitat.Density ~ 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) -0.2226 -3.6231 2.5774 1000 0.9
#
# Cutpoints:
#
# post.mean l-95% CI u-95% CI eff.samp
# cutpoint.traitHabitat.Density.1 1.609 1.055 2.36 829.5
posterior_summary(mcmcglmm_T2_1$VCV)
# Estimate Est.Error Q2.5 Q97.5
# Phylo 8.552458 6.42742 2.217233 24.88162
# units 1.000000 0.00000 1.000000 1.00000
posterior_summary(mcmcglmm_T2_1$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) -0.2226369 1.574717 -3.545592 2.680834
summary(mcmcglmm_O2_1) # 95%HPD Interval
# Iterations = 1500001:6495001
# Thinning interval = 5000
# Sample size = 1000
#
# DIC: 244.2456
#
# G-structure: ~Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# Phylo 15.59 2.906 36.05 844
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: Habitat.Density ~ 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) -0.210 -3.987 4.017 1000 0.904
#
# Cutpoints:
#
# post.mean l-95% CI u-95% CI eff.samp
# cutpoint.traitHabitat.Density.1 2.221 1.418 3.062 1000
posterior_summary(mcmcglmm_O2_1$VCV) # 95%CI
# Estimate Est.Error Q2.5 Q97.5
# Phylo 15.58885 10.86135 3.928927 43.26684
# units 1.00000 0.00000 1.000000 1.00000
posterior_summary(mcmcglmm_O2_1$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) -0.2100358 2.077207 -4.373171 3.795935
<- 1
c2 <- mcmcglmm_T2_1$Sol / sqrt(1+c2) # for fixed effects
res_1 <- mcmcglmm_T2_1$VCV / (1+c2) # for variance components
res_2 <- mcmcglmm_T2_1$CP / sqrt(1+c2)
res_3
posterior_summary(res_1)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) -0.157428 1.113493 -2.507112 1.895636
posterior_summary(res_2)
# Estimate Est.Error Q2.5 Q97.5
# Phylo 4.276229 3.21371 1.108617 12.44081
# units 0.500000 0.00000 0.500000 0.50000
posterior_summary(res_3)
# Estimate Est.Error Q2.5 Q97.5
# cutpoint.traitMigration_ordered.1 1.116852 0.2491617 0.7783082 1.6727
# brms
summary(brms_OT2_1)
# Family: cumulative
# Links: mu = probit; disc = identity
# Formula: Habitat.Density ~ 1 + (1 | gr(Phylo, cov = A))
# Data: Phasianidae_dat (Number of observations: 179)
# Draws: 2 chains, each with iter = 6000; warmup = 4000; thin = 1;
# total post-warmup draws = 4000
#
# Multilevel Hyperparameters:
# ~Phylo (Number of levels: 179)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 2.54 0.70 1.45 4.20 1.01 542 1179
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept[1] -0.21 1.09 -2.40 1.93 1.00 2080 2257
# Intercept[2] 1.33 1.10 -0.75 3.58 1.00 2219 2312
#
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# disc 1.00 0.00 1.00 1.00 NA NA NA
Phylogenetic signals
# MCMCglmm
<- ((mcmcglmm_T2_1$VCV[, "Phylo"]) / (mcmcglmm_T2_1$VCV[, "Phylo"] + 1))
phylo_signal_mcmcglmm_T2 %>% mean()
phylo_signal_mcmcglmm_T2 # [1] 0.8578778
%>% quantile(probs = c(0.025, 0.5, 0.975))
phylo_signal_mcmcglmm_T2 # 2.5% 50% 97.5%
# 0.6891739 0.8711971 0.9613625
<- ((mcmcglmm_O2_1$VCV[, "Phylo"]/(1+c2)) / (mcmcglmm_O2_1$VCV[, "Phylo"]/(1+c2) + 1))
phylo_signal_mcmcglmm_O2 %>% mean()
phylo_signal_mcmcglmm_O2 # [1] 0.8513276
%>% quantile(probs = c(0.025,0.5,0.975))
phylo_signal_mcmcglmm_O2 # 2.5% 50% 97.5%
# 0.6626708 0.8640197 0.9558175
# brms
<- brms_OT2_1 %>% as_tibble() %>%
phylo_signal_brms_OT2 ::select(Sigma_phy = sd_Phylo__Intercept) %>%
dplyrmutate(lambda = Sigma_phy^2 / (Sigma_phy^2 + 1)) %>%
pull(lambda)
%>% mean()
phylo_signal_brms_OT2 # [1] 0.8456712
%>% quantile(probs = c(0.025,0.5,0.975))
phylo_signal_brms_OT2 # 2.5% 50% 97.5%
# 0.6783932 0.8563418 0.9463123
Summary All three models (MCMCglmm with threshold, MCMCglmm
with ordinal, and brms
with cumulative probit) estimate high phylogenetic signal in Habitat.Density
(\(\lambda \approx 0.85\)), suggesting strong phylogenetic structuring in habitat density preferences within Phasianidae.
One continuous explanatory variable model
The model using MCMCglmm
is
<- inverseA(tree, nodes = "ALL", scale = TRUE)
inv_phylo <- list(R = list(V = 1, fix = 1),
prior1 G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)))
system.time(
<- MCMCglmm(Habitat.Density ~ logMass,
mcmcglmm_T2_2 random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "threshold",
data = dat,
prior = prior1,
nitt = 13000*250,
thin = 10*25,
burnin = 3000*250
)
)
system.time(
<- MCMCglmm(Habitat.Density ~ logMass,
mcmcglmm_O2_2 random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "ordinal",
data = dat,
prior = prior1,
nitt = 13000*60,
thin = 10*60,
burnin = 3000*60
)
)
<- ape::vcv.phylo(tree,corr = TRUE)
A
<- default_prior(
default_priors2 ~ logMass + (1 | gr(Phylo, cov = A)),
Habitat.Density data = dat,
family = cumulative(link = "probit"),
data2 = list(A = A)
)
# Fit the model
system.time(
<- brm(
brms_OT2_2 formula = Migration_ordered ~ logMass + (1 | gr(Phylo, cov = A)),
data = dat,
family = cumulative(link = "probit"),
data2 = list(A = A),
prior = default_priors2,
iter = 20000,
warmup = 10000,
thin = 1,
chain = 2,
core = 2,
control = list(adapt_delta = 0.99)
) )
# MCMCglmm
summary(mcmcglmm_T1_2)
# Iterations = 135001:584551
# Thinning interval = 450
# Sample size = 1000
#
# DIC: 300.7449
#
# G-structure: ~Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# Phylo 2.068 6.389e-06 7.225 1000
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: Migration_ordered ~ logMass
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) 0.88928 -1.69994 3.49007 1000 0.436
# logMass -0.08033 -0.37129 0.21982 1000 0.550
#
# Cutpoints:
#
# post.mean l-95% CI u-95% CI eff.samp
# cutpoint.traitMigration_ordered.1 1.052 0.6441 1.469 897.7
posterior_summary(mcmcglmm_T1_2$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) 0.88927720 1.2935980 -1.5704745 3.6731840
# logMass -0.08033062 0.1488659 -0.3928655 0.2076132
posterior_summary(mcmcglmm_T1_2$VCV)
# Estimate Est.Error Q2.5 Q97.5
# Phylo 2.068366 3.804493 0.00210611 11.97129
# units 1.000000 0.000000 1.00000000 1.00000
summary(mcmcglmm_O1_2) # 95%HPD Interval
# Iterations = 180001:779401
# Thinning interval = 600
# Sample size = 1000
#
# DIC: 268.7424
#
# G-structure: ~Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# Phylo 4.133 8.171e-05 14.26 1000
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: Migration_ordered ~ logMass
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) 1.2622 -2.7953 4.3184 1000 0.432
# logMass -0.1087 -0.5494 0.3449 1000 0.574
#
# Cutpoints:
#
# post.mean l-95% CI u-95% CI eff.samp
# cutpoint.traitMigration_ordered.1 1.489 0.9695 2.098 1000
posterior_summary(mcmcglmm_O1_2$VCV) # 95%CI
# Estimate Est.Error Q2.5 Q97.5
# Phylo 4.132637 6.62053 0.004598829 19.22872
# units 1.000000 0.00000 1.000000000 1.00000
posterior_summary(mcmcglmm_O1_2$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) 1.262162 1.8674403 -2.1994721 5.0542830
# logMass -0.108659 0.2176232 -0.5751705 0.3314989
posterior_summary(mcmcglmm_O1_2$CP)
# Estimate Est.Error Q2.5 Q97.5
# cutpoint.traitMigration_ordered.1 1.489256 0.3137594 1.014461 2.227824
<- 1
c2 <- mcmcglmm_O1_2$Sol / sqrt(1+c2) # for fixed effects
res_1 <- mcmcglmm_O1_2$VCV / (1+c2) # for variance components
res_2 <- mcmcglmm_O1_2$VCV / sqrt(1+c2)
res_3 posterior_summary(res_1)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) 0.89248312 1.3204797 -1.555262 3.5739178
# logMass -0.07683353 0.1538828 -0.406707 0.2344051
posterior_summary(res_2)
# Estimate Est.Error Q2.5 Q97.5
# Phylo 2.066318 3.310265 0.002299414 9.614361
# units 0.500000 0.000000 0.500000000 0.500000
posterior_summary(res_3)
# Estimate Est.Error Q2.5 Q97.5
# Phylo 2.9222156 4.681422 0.003251863 13.5967596
# units 0.7071068 0.000000 0.707106781 0.7071068
#brms
summary(brms_OT2_1)
# Family: cumulative
# Links: mu = probit; disc = identity
# Formula: Habitat.Density ~ 1 + (1 | gr(Phylo, cov = A))
# Data: Phasianidae_dat (Number of observations: 179)
# Draws: 2 chains, each with iter = 6000; warmup = 4000; thin = 1;
# total post-warmup draws = 4000
#
# Multilevel Hyperparameters:
# ~Phylo (Number of levels: 179)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 2.54 0.70 1.45 4.20 1.01 542 1179
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept[1] -0.21 1.09 -2.40 1.93 1.00 2080 2257
# Intercept[2] 1.33 1.10 -0.75 3.58 1.00 2219 2312
#
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# disc 1.00 0.00 1.00 1.00 NA NA NA
summary
No significant effect of body mass on habitat density: In all models, the posterior estimates of the slope for
logMass
were close to zero, with wide credible intervals that include zero. This suggests no strong evidence that body size influences habitat density preference in Phasianidae.Moderate phylogenetic signal: Phylogenetic variance was estimated as non-negligible across models (e.g., \(\lambda \approx 0.67–0.80\) after transformation), indicating moderate phylogenetic structuring in habitat density even after accounting for body mass.
One continuous and one categorical explanatory variable model
<- inverseA(tree, nodes = "ALL", scale = TRUE)
inv_phylo <- list(
prior1 R = list(V = 1, fix = 1),
G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)))
system.time(
<- MCMCglmm(Habitat.Density ~ logMass + Habitat.Density,
mcmcglmm_T2_3 random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "threshold",
data = dat,
prior = prior1,
nitt = 13000*150,
thin = 10*150,
burnin = 3000*150
)
)
system.time(
<- MCMCglmm(Habitat.Density ~ logMass + Habitat.Density,
mcmcglmm_O2_3 random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "ordinal",
data = dat,
prior = prior1,
nitt = 13000*150,
thin = 10*150,
burnin = 3000*150
)
)
<- ape::vcv.phylo(tree, cor = TRUE)
A
<- default_prior(
default_priors3 ~ logMass + Habitat.Density + (1 | gr(Phylo, cov = A)),
Habitat.Density data = dat,
family = cumulative(link = "probit"),
data2 = list(A = A)
)
system.time(
<- brm(
brm_OT2_3 formula = Habitat.Density ~ logMass + Habitat.Density + (1 | gr(Phylo, cov = A)),
data = dat,
family = cumulative(link = "probit"),
data2 = list(A = A),
prior = default_priors3,
iter = 20000,
warmup = 10000,
thin = 1,
chain = 2,
core = 2,
control = list(adapt_delta = 0.99)
) )
# MCMCglmm
summary(mcmcglmm_T2_3)
# Iterations = 1800001:7794001
# Thinning interval = 6000
# Sample size = 1000
#
# DIC: 264.3581
#
# G-structure: ~Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# Phylo 13.42 1.332 33.11 869.4
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: Habitat.Density ~ cTail_length + IsHarbivore
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) -1.2636 -5.2133 2.9340 1000 0.498
# cTail_length 0.5337 -0.4854 1.6519 1000 0.302
# IsHarbivore 1.0925 0.1875 2.1084 1100 0.012 *
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Cutpoints:
#
# post.mean l-95% CI u-95% CI eff.samp
# cutpoint.traitHabitat.Density.1 1.838 1.142 2.773 1000
posterior_summary(mcmcglmm_T1_3$VCV)
# Estimate Est.Error Q2.5 Q97.5
# Phylo 1.987738 4.411821 0.009463594 9.04495
# units 1.000000 0.000000 1.000000000 1.00000
posterior_summary(mcmcglmm_T1_3$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) 1.3096841 1.2767628 -1.0290056 3.809598026
# logMass -0.2664037 0.1660313 -0.6334123 0.007099615
# Habitat.Densityopen 1.3153548 0.4385812 0.6134014 2.247115510
# Habitat.Densitysemi-open 0.3954618 0.3425752 -0.2200316 1.117789403
posterior_summary(mcmcglmm_T1_3$CP)
# Estimate Est.Error Q2.5 Q97.5
# cutpoint.traitMigration_ordered.1 1.116292 0.2420588 0.7754895 1.677832
summary(mcmcglmm_O2_3) # 95%HPD Interval
# Iterations = 1800001:7794001
# Thinning interval = 6000
# Sample size = 1000
#
# DIC: 223.0502
#
# G-structure: ~Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# Phylo 24.41 2.961 64.89 1000
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: Habitat.Density ~ cTail_length + IsHarbivore
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) -1.6861 -8.1646 3.3928 1000.0 0.520
# cTail_length 0.7124 -0.6599 2.2055 987.7 0.266
# IsHarbivore 1.5078 0.1850 2.8961 986.1 0.010 *
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Cutpoints:
#
# post.mean l-95% CI u-95% CI eff.samp
# cutpoint.traitHabitat.Density.1 2.536 1.54 3.812 855.9
posterior_summary(mcmcglmm_O2_3$VCV) # 95%CI
# Estimate Est.Error Q2.5 Q97.5
# Phylo 24.40718 24.4568 5.150294 82.38519
# units 1.00000 0.0000 1.000000 1.00000
posterior_summary(mcmcglmm_O2_3$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) -1.6861096 2.8536673 -8.1735105 3.363553
# cTail_length 0.7124418 0.7462599 -0.5101511 2.467996
# IsHarbivore 1.5078371 0.7204467 0.2713007 3.103070
posterior_summary(mcmcglmm_O2_3$CP)
# Estimate Est.Error Q2.5 Q97.5
# cutpoint.traitHabitat.Density.1 2.53607 0.6201751 1.660939 4.032219
<- 1
c2 <- mcmcglmm_O2_3$Sol / sqrt(1+c2) # for fixed effects
res_1 <- mcmcglmm_O2_3$VCV / (1+c2) # for variance components
res_2 <- mcmcglmm_O2_3$CP / sqrt(1+c2)
res_3
posterior_summary(res_1)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) -1.1922596 2.0178475 -5.7795447 2.378391
# cTail_length 0.5037724 0.5276855 -0.3607313 1.745137
# IsHarbivore 1.0662019 0.5094327 0.1918386 2.194202
posterior_summary(res_2)
# Estimate Est.Error Q2.5 Q97.5
# Phylo 12.20359 12.2284 2.575147 41.1926
# units 0.50000 0.0000 0.500000 0.5000
posterior_summary(res_3)
# Estimate Est.Error Q2.5 Q97.5
# cutpoint.traitHabitat.Density.1 1.793272 0.43853 1.174461 2.851209
# brms
summary(brms_OT2_3)
# Family: cumulative
# Links: mu = probit; disc = identity
# Formula: Habitat.Density ~ cTail_length + IsHarbivore + (1 | gr(Phylo, cov = A))
# Data: Phasianidae_dat (Number of observations: 179)
# Draws: 2 chains, each with iter = 8000; warmup = 6000; thin = 1;
# total post-warmup draws = 4000
#
# Multilevel Hyperparameters:
# ~Phylo (Number of levels: 179)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 2.90 0.92 1.57 5.05 1.00 428 833
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept[1] 0.56 1.24 -1.96 3.00 1.00 1655 2098
# Intercept[2] 2.25 1.28 -0.18 4.85 1.00 1532 1706
# cTail_length 0.46 0.48 -0.39 1.50 1.00 1244 1370
# IsHarbivore 0.98 0.45 0.18 1.96 1.00 1524 1594
#
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# disc 1.00 0.00 1.00 1.00 NA NA NA
Summary
Diet is positively associated with habitat: The posterior estimates of the slope for
Diet
were positive and significant across all models, indicating that species with a herbivorous diet tend to occupy open habitats compared to those with other diets.No significant effect of body mass on habitat density: The posterior estimates of the slope for
logMass
were close to zero, with wide credible intervals that include zero, suggesting no strong evidence that body size influences habitat density preference in Phasianidae.
probabilities
This probabilities section uses the posterior summaries of fixed effects (linear predictors) and cut points from each of the three models (intercept-only, one predictor, two predictors) to calculate the predicted probabilities of each category in Habitat.Density.
<- function(cutpoint0, cutpoint1, l) {
calculate_probabilities2
<- pnorm(cutpoint0 - l)
category1_prob <- pnorm(cutpoint1 - l) - pnorm(cutpoint0 - l)
category2_prob <- 1 - pnorm(cutpoint1 - l)
category3_prob
return(c(Dense_habitats = category1_prob,
Semi_open_habitats = category2_prob,
Open_habitats = category3_prob)
)
}<- calculate_probabilities2(0, 1.01, 0.3178)
probabilities_mcmcglmm print(probabilities_mcmcglmm)
# Dense_habitats Semi_open_habitats Open_habitats
# 0.3753183 0.3802758 0.2444059
<- calculate_probabilities2(-0.32, 0.66, 0)
probabilities_brms print(probabilities_brms)
# Dense_habitats Semi_open_habitats Open_habitats
# 0.3744842 0.3708889 0.2546269
<- calculate_probabilities2(0, 1.052, 0.88928)
probabilities_mcmcglmm print(probabilities_mcmcglmm)
# Dense_habitats Semi_open_habitats Open_habitats
# 0.1869263 0.3777042 0.4353694
<- calculate_probabilities2(-0.85, 0.15, 0)
probabilities_brms print(probabilities_brms)
# Dense_habitats Semi_open_habitats Open_habitats
# 0.1976625 0.3619551 0.4403823
<- calculate_probabilities2(0, 1.116, 1.3097)
probabilities_mcmcglmm print(probabilities_mcmcglmm)
# Dense_habitats Semi_open_habitats Open_habitats
# 0.09514867 0.32805672 0.57679460
<- calculate_probabilities2(-1.33, -0.26, 0)
probabilities_brms print(probabilities_brms)
# Dense_habitats Semi_open_habitats Open_habitats
# 0.09175914 0.30567275 0.60256811
4. Nominal models
When the response variable has more than two categories and un-ordered, it is treated as a nominal variable.
Prior setting
Sometimes, model convergence can be challenging, particularly in discrete models. In such cases, it is time to take a step forward - changing the default prior to a custom prior (i.e. appropriately informative prior, not over-informed prior) can improve performance and convergence.
Fixed Effects
For fixed effects, custom prior adjustments are typically not necessary. Because data usually provides enough information to estimate fixed effects accurately, so the prior does not have as much effect. Fixed effect priors can be a good thing if you know some biological limits (Gaussian models - e.g. bird morphology - some values are not realistic). Also, in MCMCglmm
, the gelman.prior
function can be particularly useful for logit and probit models. This function allows users to define a reasonable prior covariance matrix for the fixed effects, enhancing model stability and convergence. Specifically…
- For logit regression, where the variance of the error distribution (with mean = 0) is pi^2 / 3, the prior accounts for this variance to remain flat on the probability scale.
- For probit regression, where the variance of the error distribution is 1, the prior is adjusted to achieve a similarly flat distribution.
By incorporating these considerations, gelman.prior
avoids excessive regularisation and ensures that the prior reflects the data while remaining weakly informative, as Gelman et al. (2008) recommended.
Random effects
Convergence issues often arise with random effects because variance components are inherently more challenging to estimate than fixed effects. This is because variance components represent variability around the mean, which requires a larger sample size to be estimated precisely. In contrast, fixed effects estimate the mean (e.g. the overall mean in an intercept-only model), which is more straightforward and usually requires a smaller sample size to stabilise. In such cases, using more informative priors can help, but careful consideration is necessary to balance convergence improvement and interpret model output.
Key points
Check autocorrelation: An auto-correlation plot helps assess the mixing of the chain. It shows the relationship between current and past samples in the MCMC chain. High auto-correlation (i.e., slow decay) indicates poor mixing because the chain is highly correlated across iterations, meaning that it is not exploring the parameter space efficiently. Ideally, the auto-correlation should drop to near zero quickly, indicating that the chain is moving smoothly and independently across the parameter space. - Lag (X-axis): The interval between the current and past samples. - Autocorrelation coefficient (Y-axis): Indicates the strength of the correlation, ranging from -1 to 1. A value close to 0 (excluding lag 0) is ideal.
In the context of MCMC, it would be important to differentiate between convergence and mixing, as these two terms are often conflated, but they refer to distinct aspects of the MCMC process. Convergence refers to whether the Markov chains have reached the true posterior distribution. For MCMC to be valid, the chains should eventually explore the entire parameter space of the target distribution. In many cases, this is evaluated by running multiple chains and checking whether they all converge to the same distribution (e.g., Rhat in brms
). A value of Rhat close to 1.0 suggests that the chains have converged to the same posterior distribution.
Mixing refers to how well the Markov chain explores the parameter space during the sampling process. Good mixing means that the chain moves smoothly across the parameter space, avoiding long periods of stagnation in any particular region. Poor mixing can result in the chain getting stuck in one area of the parameter space and not exploring others effectively.
InMCMCglmm
, since you can run a single chain per single model, you cannot directly check convergence of one model using multiple chains (but you can compare multiple chains using multipleMCMCglmm
models). Instead, auto-correlation plots can give you insights into the quality of the sampling.
Practical benefits of informative priors for random effects
Informative priors for random effects improve model convergence and reduce computation time by restricting the MCMC sampler to a smaller parameter space. Such priors can influence both the point estimates (often) and uncertainty (always) of random effects (i.e. variance and covariance). However, they typically have less impact on fixed effect estimates. This allows you to obtain fixed effect estimates more quickly while improving convergence by constraining random effects. If your primary interest lies in fixed effect estimates rather than random effects, you can take advantage of this approach.
Potential Problems with informative priors
Over-constrained parameters: Excessively narrow priors can provide us with incorrect parameter estimates by limiting them to a small range.
Small sample size issues: When sample sizes are small, informative priors may dominate the posterior distribution, overshadowing the data. This can lead to the model failing to capture the data’s variability and oversimplifies reality.
When you want to estimate random effects properly, you need to use uninformative prior. But as you already know, we are hard to get convergent… The things we can do are:
- increase iteration (this means increasing the number of samples from the posterior)
- use longer burn-in (
MCMCglmm
) / warm-up (brms
) period to ensure convergence - increase thinning in
MCMCglmm
- While thinning can enhance convergence inMCMCglmm
by reducing autocorrelation,brms
generally do not need to control thinning
Examples Here is example of uninformative vs informative prior…
<- inverseA(tree, nodes = "ALL", scale = TRUE)
inv_phylo
#uninformative
<- list(
prior1 R = list(V = (matrix(1, 2, 2) + diag(2)) / 3, fix = 1),
G = list(G1 = list(V = diag(2), nu = 2,
alpha.mu = rep(0, 2), alpha.V = diag(2)
)
)
)
system.time(
<- MCMCglmm(Primary.Lifestyle ~ trait -1,
mcmcglmm_m1_100 random = ~us(trait):Phylo,
rcov = ~us(trait):units,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "categorical",
data = dat,
prior = prior1,
nitt = 13000*100,
thin = 10*100,
burnin = 3000*100
)
)
system.time(
<- MCMCglmm(Primary.Lifestyle ~ trait -1,
mcmcglmm_m1_1000 random = ~us(trait):Phylo,
rcov = ~us(trait):units,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "categorical",
data = dat,
prior = prior1,
nitt = 13000*10000,
thin = 10*10000,
burnin = 3000*10000
)
)
# informative prior
<- list(
prior2 R = list(V = (matrix(1, 2, 2) + diag(2)) / 3, fix = 1),
G = list(G1 = list(V = diag(2), nu = 200,
alpha.mu = rep(0, 2), alpha.V = diag(2)
)
)
)
system.time(
<- MCMCglmm(Primary.Lifestyle ~ trait -1,
mcmcglmm_m2 random = ~us(trait):Phylo,
rcov = ~us(trait):units,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "categorical",
data = dat,
prior = prior2,
nitt = 13000*75,
thin = 10*75,
burnin = 3000*75
) )
You can see that the increasing iteration, thinning, and burn-in can promote increasing effect sampling size and improving mixing.
summary(mcmcglmm_m1_100)
# Iterations = 300001:1299001
# Thinning interval = 1000
# Sample size = 1000
#
# DIC: 259.4007
#
# G-structure: ~us(trait):Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 31.18 1.252 83.25 37.85
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -12.08 -70.340 59.37 41.96
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -12.08 -70.340 59.37 41.96
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 104.23 1.118 408.78 28.75
#
# R-structure: ~us(trait):units
#
# post.mean l-95% CI u-95% CI eff.samp
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.6667 0.6667 0.6667 0
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.3333 0.3333 0.3333 0
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.3333 0.3333 0.3333 0
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.6667 0.6667 0.6667 0
#
# Location effects: Primary.Lifestyle ~ trait - 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# traitPrimary.Lifestyle.Insessorial -1.628 -10.272 7.019 148.9 0.662
# traitPrimary.Lifestyle.Terrestrial 5.279 -7.006 19.131 125.6 0.314
summary(mcmcglmm_m1_1000)
# Iterations = 3000001:12990001
# Thinning interval = 10000
# Sample size = 1000
#
# DIC: 273.7709
#
# G-structure: ~us(trait):Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 30.88 1.314 99.34 270.66
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -25.45 -94.475 29.87 218.26
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -25.45 -94.475 29.87 218.26
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 81.67 1.144 369.51 89.38
#
# R-structure: ~us(trait):units
#
# post.mean l-95% CI u-95% CI eff.samp
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.6667 0.6667 0.6667 0
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.3333 0.3333 0.3333 0
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.3333 0.3333 0.3333 0
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.6667 0.6667 0.6667 0
#
# Location effects: Primary.Lifestyle ~ trait - 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# traitPrimary.Lifestyle.Insessorial -1.3033 -5.3581 2.3037 1000 0.478
# traitPrimary.Lifestyle.Terrestrial 0.9187 -3.7341 5.2263 1000 0.676
summary(mcmcglmm_m2)
# Iterations = 225001:974251
# Thinning interval = 750
# Sample size = 1000
#
# DIC: 298.0892
#
# G-structure: ~us(trait):Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 6.0759 0.2940 12.1120 731.3
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -0.0885 -0.9723 0.9838 1000.0
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -0.0885 -0.9723 0.9838 1000.0
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 6.8348 1.5880 13.6006 820.7
#
# R-structure: ~us(trait):units
#
# post.mean l-95% CI u-95% CI eff.samp
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.6667 0.6667 0.6667 0
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.3333 0.3333 0.3333 0
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.3333 0.3333 0.3333 0
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.6667 0.6667 0.6667 0
#
# Location effects: Primary.Lifestyle ~ trait - 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# traitPrimary.Lifestyle.Insessorial -0.3218 -3.7935 3.0988 1000.0 0.908
# traitPrimary.Lifestyle.Terrestrial 2.2780 -1.3172 5.8322 888.4 0.180
autocorr.plot(mcmcglmm_m1_100$VCV)
autocorr.plot(mcmcglmm_m1_1000$VCV)
autocorr.plot(mcmcglmm_m2$VCV)
Explanation of dataset
Datasets for example 1
This dataset focuses on Turdidae species. A binary variable IsOmnivore based on trophic level is created.
<- read.nexus(here("data", "potential", "avonet", "trees.nex"))
trees <- trees[[1]]
tree <- read.csv(here("data", "potential", "avonet", "turdidae.csv"))
dat
# Check tree$tip.label matches dat$Scientific_name
<- setequal(tree$tip.label, dat$Phylo)
match_result
# Centering for continuous variables
<- dat %>%
dat mutate(
log_Mass_centered = scale(log(Mass), center = TRUE, scale = FALSE),
log_Tail_Length_centered = scale(log(Tail.Length), center = TRUE, scale = FALSE)
)
<- dat %>%
dat mutate(across(c(Trophic.Level, Trophic.Niche, Primary.Lifestyle, Migration, Habitat, Species.Status), as.factor))
$IsOmnivore <- ifelse(dat$Trophic.Level == "Omnivore", 1, 0) dat
Datasets for example 2
For Example 2, data for the Sylviidae (warblers) were extracted from the AVONET dataset. The habitat categories were reclassified, migratory status was converted into a binary variable, and body mass was log-transformed and centered, resulting in a clean dataset suitable for subsequent statistical modelling.
<- read.csv(here("data", "bird body mass", "9993spp_clearned.csv"))
dat <- dat %>%
dat mutate(across(c(Trophic.Level, Trophic.Niche,
Primary.Lifestyle, Migration, Habitat, Habitat.Density, Species.Status), as.factor),Habitat.Density = factor(Habitat.Density, ordered = TRUE))
<- dat %>%
dat mutate(
Habitat_Category = case_when(
%in% c("Desert", "Rock", "Grassland", "Shrubland") ~ "Arid_Open",
Habitat %in% c("Woodland", "Forest") ~ "Forested_Vegetated",
Habitat == "Human modified" ~ "Human-Modified",
Habitat %in% c("Riverine", "Coastal", "Marine", "Wetland") ~ "Aquatic_Coastal",
Habitat TRUE ~ "Other"
)
)
<- dat %>%
Family_habitat group_by(Family, Habitat_Category) %>%
tally() %>%
group_by(Family) %>%
ungroup()
<- dat %>%
Sylviidae_dat filter(Family == "Sylviidae") %>%
mutate(Habitat_Category = factor(Habitat_Category, ordered = FALSE))
<- Sylviidae_dat %>%
Sylviidae_dat mutate(IsMigrate = ifelse(Migration == "1", 1, 0)
)table(Sylviidae_dat$Habitat_Category)
Aquatic_Coastal Arid_Open Forested_Vegetated Other
34 90 168 2
<- Sylviidae_dat %>%
Sylviidae_dat mutate(Habitat_Category = recode(Habitat_Category,
"Aquatic_Coastal" = "Others",
"Other" = "Others"))
$cMass <- scale(log(Sylviidae_dat$Mass), center = TRUE, scale = FALSE)
Sylviidae_dat
<- read.nexus(here("data", "bird body mass", "Sylviidae.nex"))
Sylviidae_tree <- Sylviidae_tree[[1]] tree
Example 1
Intercept-only model
MCMCglmm
<- inverseA(tree, nodes = "ALL", scale = TRUE)
inv_phylo
<- list(
prior5 R = list(V = (matrix(1, 2, 2) + diag(2)) / 3, fix = 1),
G = list(G1 = list(V = diag(2), nu = 20,
alpha.mu = rep(0, 2), alpha.V = diag(2)*(pi^2/3))
)
)
system.time(
<-MCMCglmm(Primary.Lifestyle ~ trait -1,
mcmcglmm_mn1_1 random = ~us(trait):Phylo,
rcov = ~us(trait):units,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "categorical",
data = dat,
prior = prior5,
nitt = 13000*150,
thin = 10*150,
burnin = 3000*150
) )
brms
<- ape::vcv.phylo(tree, corr = TRUE)
A <- default_prior(Primary.Lifestyle ~ 1 + (1 |a| gr(Phylo, cov = A)),
priors_brms1 data = dat,
data2 = list(A = A),
family = categorical(link = "logit")
)
system.time(
<- brm(Primary.Lifestyle ~ 1 + (1 |a| gr(Phylo, cov = A)),
brms_mn1_1 data = dat,
data2 = list(A = A),
family = categorical(link = "logit"),
prior = priors_brms1,
iter = 18000,
warmup = 8000,
chains = 2,
thin = 1,
) )
As well as the binary model in MCMCglmm
, nominal model is need to correct the estimates obtained from MCMCglmm
as MCMCglmm
implements additive overdispersion GLMM fitted by MCMC sampling from the posterior distribution. Again, we can use the below code:
<- (16 * sqrt(3)/(15 * pi))^2
c2 <- (16 * sqrt(3)/(15 * pi))^2*(2/3) c2a
The results are…
summary(mcmcglmm_mn1_1)
# Iterations = 300001:1299001
# Thinning interval = 1000
# Sample size = 1000
#
# DIC: 295.756
#
# G-structure: ~us(trait):Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 7.338 0.8372 15.386 764.5
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -1.164 -5.3745 2.788 885.3
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -1.164 -5.3745 2.788 885.3
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 8.168 1.1463 18.245 615.7
#
# R-structure: ~us(trait):units
#
# post.mean l-95% CI u-95% CI eff.samp
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.6667 0.6667 0.6667 0
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.3333 0.3333 0.3333 0
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.3333 0.3333 0.3333 0
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.6667 0.6667 0.6667 0
#
# Location effects: Primary.Lifestyle ~ trait - 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# traitPrimary.Lifestyle.Insessorial -0.4015 -4.0226 2.5627 1000 0.804
# traitPrimary.Lifestyle.Terrestrial 1.6199 -1.2917 4.6445 1000 0.270
<- mcmcglmm_mn1_1$Sol / sqrt(1+c2a) # for fixed effects
res_1 <- mcmcglmm_mn1_1$VCV / (1+c2a) # for variance components
res_2 <- (mcmcglmm_mn1_1$VCV[, 2]/(1+c2a)) /sqrt((mcmcglmm_mn1_1$VCV[, 1] * mcmcglmm_mn1_1$VCV[, 4])/(1+c2a))
res_3_corr_phylo
summary(res_1)
# Iterations = 300001:1299001
# Thinning interval = 1000
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# traitPrimary.Lifestyle.Insessorial -0.362 1.480 0.04680 0.04680
# traitPrimary.Lifestyle.Terrestrial 1.460 1.385 0.04379 0.04379
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# traitPrimary.Lifestyle.Insessorial -3.475 -1.2921 -0.3067 0.5483 2.605
# traitPrimary.Lifestyle.Terrestrial -1.125 0.5264 1.4485 2.3820 4.226
summary(res_2)
# Iterations = 300001:1299001
# Thinning interval = 1000
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 5.9634 3.486 0.11023 0.12608
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -0.9456 1.643 0.05197 0.05524
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -0.9456 1.643 0.05197 0.05524
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 6.6372 4.233 0.13385 0.17058
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.5418 0.000 0.00000 0.00000
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.2709 0.000 0.00000 0.00000
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.2709 0.000 0.00000 0.00000
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.5418 0.000 0.00000 0.00000
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 1.6277 3.4946 5.2197 7.52933 14.5903
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -4.7030 -1.7201 -0.7662 -0.06118 2.0866
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -4.7030 -1.7201 -0.7662 -0.06118 2.0866
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 1.7250 3.8512 5.6452 8.20433 17.3970
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.5418 0.5418 0.5418 0.54176 0.5418
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.2709 0.2709 0.2709 0.27088 0.2709
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.2709 0.2709 0.2709 0.27088 0.2709
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.5418 0.5418 0.5418 0.54176 0.5418
summary(res_3_corr_phylo)
# Iterations = 300001:1299001
# Thinning interval = 1000
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# -0.144879 0.207344 0.006557 0.006916
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# -0.51690 -0.28714 -0.14528 -0.01081 0.26479
summary(brms_mn1_1)
# Family: categorical
# Links: muInsessorial = logit; muTerrestrial = logit
# Formula: Primary.Lifestyle ~ 1 + (1 | a | gr(Phylo, cov = A))
# Data: dat (Number of observations: 173)
# Draws: 2 chains, each with iter = 15000; warmup = 10000; thin = 1;
# total post-warmup draws = 10000
#
# Multilevel Hyperparameters:
# ~Phylo (Number of levels: 173)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(muInsessorial_Intercept) 4.76 2.40 1.96 10.81 1.00 1550 2181
# sd(muTerrestrial_Intercept) 5.76 3.37 2.14 14.61 1.00 1215 2349
# cor(muInsessorial_Intercept,muTerrestrial_Intercept) -0.28 0.36 -0.88 0.46 1.00 858 1508
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# muInsessorial_Intercept -0.55 2.03 -4.86 3.32 1.00 4681 5207
# muTerrestrial_Intercept 1.30 2.29 -2.87 6.21 1.00 4028 4113
When comparing the results, the fixed effects estimated by both MCMCglmm
and brms
are similar. However, there are notable differences in the random effect variances and correlations. The variances estimated by brms
show higher uncertainty, with wider 95% credible intervals, and the correlations are closer to zero. This discrepancy is likely due to the informative priors used in MCMCglmm
to prioritize model convergence.
Phylogenetic signals
And phylogenetic signal is…
# average phylogenetic signal - both vs. left
## MCMCglmm
<- ((mcmcglmm_mn1_1$VCV[, "traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo"]/(1+c2a)) / (mcmcglmm_mn1_1$VCV[, "traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo"]/(1+c2a) + 1))
phylo_signalI_mcmcglmm
%>% mean()
phylo_signalI_mcmcglmm # [1] 0.8207548
%>% quantile(probs = c(0.025,0.5,0.975))
phylo_signalI_mcmcglmm # 2.5% 50% 97.5%
# 0.6194383 0.8392215 0.9358569
## brms
<- brms_mn1_1 %>% as_tibble() %>%
phylo_signalI_brms ::select(Sigma_phy = sd_Phylo__muInsessorial_Intercept) %>%
dplyrmutate(lambda_nominalI = Sigma_phy^2 / (Sigma_phy^2 + 1)) %>%
pull(lambda_nominalI)
%>% mean()
phylo_signalI_brms # [1] 0.9330662
%>% quantile(probs = c(0.025,0.5,0.975))
phylo_signalI_brms # 2.5% 50% 97.5%
# 0.7940292 0.9474234 0.9915224
# average phylogenetic signal - both vs. right
## MCMCglmm
<- ((mcmcglmm_mn1_1$VCV[, "traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo"]/(1+c2a)) / (mcmcglmm_mn1_1$VCV[, "traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo"]/(1+c2a) + 1))
phylo_signalT_mcmcglmm
%>% mean()
phylo_signalT_mcmcglmm # [1] 0.8336959
%>% quantile(probs = c(0.025,0.5,0.975))
phylo_signalT_mcmcglmm # 2.5% 50% 97.5%
# 0.6330234 0.8495150 0.9456432
## brms
<- brms_mn1_1 %>% as_tibble() %>%
phylo_signalT_brms ::select(Sigma_phy = sd_Phylo__muTerrestrial_Intercept) %>%
dplyrmutate(lambda_nominalT = Sigma_phy^2 / (Sigma_phy^2 + 1)) %>%
pull(lambda_nominalT)
%>% mean()
phylo_signalT_brms # [1] 0.9477057
%>% quantile(probs = c(0.025,0.5,0.975))
phylo_signalT_brms # 2.5% 50% 97.5%
# 0.8209225 0.9606255 0.9953342
Across both modelling frameworks, Primary.Lifestyle
categories show high phylogenetic signal.
For the Insessorial category:
MCMCglmm
estimates \(\lambda \approx 0.84\) (95% CI: 0.62–0.94)brms
estimates \(\lambda \approx 0.95\) (95% CI: 0.79–0.99)
For the Terrestrial category:
MCMCglmm
estimates \(\lambda \approx 0.85\) (95% CI: 0.63–0.95)brms
estimates \(\lambda \approx 0.96\) (95% CI: 0.82–1.00)
This indicates a strong phylogenetic signal in the distribution of Primary.Lifestyle
categories, suggesting that closely related species tend to share similar lifestyle traits.
One continuous explanatory variable model
MCMCglmm
<- inverseA(tree, nodes = "ALL", scale = TRUE)
inv_phylo
<- list(
prior5 R = list(V = (matrix(1, 2, 2) + diag(2)) / 3, fix = 1),
G = list(G1 = list(V = diag(2), nu = 20,
alpha.mu = rep(0, 2), alpha.V = diag(2)*(pi^2/3))
)
)
system.time(
<- MCMCglmm(Primary.Lifestyle ~ log_Tail_Length_centered:trait + trait -1,
mcmcglmm_mn1_2 random = ~us(trait):Phylo,
rcov = ~us(trait):units,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "categorical",
data = dat,
prior = prior5,
nitt = 13000*200,
thin = 10*200,
burnin = 3000*200
) )
brms
<- default_prior(Primary.Lifestyle ~ log_Tail_Length_centered + (1 |a| gr(Phylo, cov = A)),
priors_brms2 data = dat,
data2 = list(A = A),
family = categorical(link = "logit")
)
system.time(
<- brm(Primary.Lifestyle ~ log_Tail_Length_centered + (1 |a| gr(Phylo, cov = A)),
brms_m1_2 data = dat,
data2 = list(A = A),
family = categorical(link = "logit"),
prior = priors_brms2,
iter = 10000,
warmup = 5000,
chains = 2,
thin = 1
) )
Output correction and summary
summary(mcmcglmm_mn1_2)
# Iterations = 600001:2598001
# Thinning interval = 2000
# Sample size = 1000
#
# DIC: 279.4446
#
# G-structure: ~us(trait):Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 18.854 1.984 42.79 542.4
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -1.169 -10.648 8.87 784.9
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -1.169 -10.648 8.87 784.9
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 18.974 2.593 44.49 499.9
#
# R-structure: ~us(trait):units
#
# post.mean l-95% CI u-95% CI eff.samp
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.6667 0.6667 0.6667 0
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.3333 0.3333 0.3333 0
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.3333 0.3333 0.3333 0
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.6667 0.6667 0.6667 0
#
# Location effects: Primary.Lifestyle ~ log_Tail_Length_centered:trait + trait - 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# traitPrimary.Lifestyle.Insessorial -1.072 -7.345 5.322 912.0 0.744
# traitPrimary.Lifestyle.Terrestrial 3.279 -1.929 9.466 855.8 0.202
# log_Tail_Length_centered:traitPrimary.Lifestyle.Insessorial -1.187 -5.100 2.964 908.8 0.594
# log_Tail_Length_centered:traitPrimary.Lifestyle.Terrestrial -2.694 -6.748 1.571 1000.0 0.178
# res_1 <- mcmcglmm_mn1_2$Sol / sqrt(1+c2a) # for fixed effects
# res_2 <- mcmcglmm_mn1_2$VCV / (1+c2a) # for variance components
# res_3_corr_phylo <- (mcmcglmm_mn1_2$VCV[, 2]/(1+c2a)) /sqrt((mcmcglmm_mn1_2$VCV[, 1] * mcmcglmm_mn1_2$VCV[, 4])/(1+c2a))
summary(res_1)
# Iterations = 600001:2598001
# Thinning interval = 2000
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# traitPrimary.Lifestyle.Insessorial -0.9666 2.803 0.08865 0.09283
# traitPrimary.Lifestyle.Terrestrial 2.9557 2.670 0.08443 0.09126
# log_Tail_Length_centered:traitPrimary.Lifestyle.Insessorial -1.0701 1.953 0.06176 0.06479
# log_Tail_Length_centered:traitPrimary.Lifestyle.Terrestrial -2.4283 1.909 0.06037 0.06037
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# traitPrimary.Lifestyle.Insessorial -7.020 -2.654 -0.8061 0.7953 4.602
# traitPrimary.Lifestyle.Terrestrial -1.639 1.231 2.6791 4.4126 8.724
# log_Tail_Length_centered:traitPrimary.Lifestyle.Insessorial -4.746 -2.317 -1.0952 0.2483 2.557
# log_Tail_Length_centered:traitPrimary.Lifestyle.Terrestrial -6.266 -3.592 -2.4131 -1.1127 1.320
summary(res_2)
# Iterations = 600001:2598001
# Thinning interval = 2000
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 15.3215 10.870 0.3437 0.4667
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -0.9500 4.038 0.1277 0.1441
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -0.9500 4.038 0.1277 0.1441
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 15.4192 10.808 0.3418 0.4834
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.5418 0.000 0.0000 0.0000
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.2709 0.000 0.0000 0.0000
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.2709 0.000 0.0000 0.0000
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.5418 0.000 0.0000 0.0000
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 3.2225 8.5847 12.7689 19.2764 42.4203
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -9.2289 -2.7137 -0.8335 1.1495 6.6341
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -9.2289 -2.7137 -0.8335 1.1495 6.6341
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 3.4482 8.1572 12.7788 19.6677 43.7071
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.5418 0.5418 0.5418 0.5418 0.5418
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.2709 0.2709 0.2709 0.2709 0.2709
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.2709 0.2709 0.2709 0.2709 0.2709
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.5418 0.5418 0.5418 0.5418 0.5418
summary(res_3_corr_phylo)
# Iterations = 600001:2598001
# Thinning interval = 2000
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# -0.064355 0.212862 0.006731 0.006731
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# -0.46205 -0.21687 -0.07002 0.08667 0.35059
summary(brms_mn1_2)
# Family: categorical
# Links: muInsessorial = logit; muTerrestrial = logit
# Formula: Primary.Lifestyle ~ log_Tail_Length_centered + (1 | a | gr(Phylo, cov = A))
# Data: dat (Number of observations: 173)
# Draws: 2 chains, each with iter = 15000; warmup = 10000; thin = 1;
# total post-warmup draws = 10000
#
# Multilevel Hyperparameters:
# ~Phylo (Number of levels: 173)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(muInsessorial_Intercept) 6.14 3.83 2.13 15.68 1.00 1627 2326
# sd(muTerrestrial_Intercept) 6.90 4.68 2.32 19.89 1.00 1412 1902
# cor(muInsessorial_Intercept,muTerrestrial_Intercept) -0.15 0.40 -0.84 0.64 1.00 1304 2835
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# muInsessorial_Intercept -0.53 2.29 -5.38 3.86 1.00 12414 6264
# muTerrestrial_Intercept 1.34 2.45 -3.08 6.58 1.00 10541 5718
# muInsessorial_log_Tail_Length_centered -2.17 3.08 -9.60 2.81 1.00 4513 3030
# muTerrestrial_log_Tail_Length_centered -2.95 3.18 -9.99 2.58 1.00 4134 2746
Both MCMCglmm
and brms
indicate very high phylogenetic variance for both Insessorial and Terrestrial lifestyles with only weak and uncertain phylogenetic correlation between the two categories. The differences in variance estimates likely stem from the informative priors used in MCMCglmm
to aid convergence.
One continuous and one binary explanatory variables model
MCMCglmm
<- list(
prior5 R = list(V = (matrix(1, 2, 2) + diag(2)) / 3, fix = 1),
G = list(G1 = list(V = diag(2), nu = 20,
alpha.mu = rep(0, 2), alpha.V = diag(2)*(pi^2/3))
)
)
system.time(
<- MCMCglmm(Primary.Lifestyle ~ log_Tail_Length_centered:trait + IsOmnivore:trait + trait -1,
mcmcglmm_mn1_3 random = ~us(trait):Phylo,
rcov = ~us(trait):units,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "categorical",
data = dat,
prior = prior5,
nitt = 13000*250,
thin = 10*250,
burnin = 3000*250
) )
brms
system.time(
<- brm(
brms_mn1_3 ~ log_Tail_Length_centered + IsOmnivore + (1 |a| gr(Phylo, cov = A)),
Primary.Lifestyle data = dat,
data2 = list(A = A),
family = categorical(link = "logit"),
prior = custom_priors,
iter = 15000,
warmup = 5000,
chains = 2,
thin = 1
) )
Outputs are…
summary(mcmcglmm_mn1_3)
# Iterations = 750001:3247501
# Thinning interval = 2500
# Sample size = 1000
#
# DIC: 244.733
#
# G-structure: ~us(trait):Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 22.932 2.484e+00 51.826 581.7
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -1.274 -1.179e+01 8.518 899.2
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -1.274 -1.179e+01 8.518 899.2
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 15.898 3.475e-04 44.088 627.4
#
# R-structure: ~us(trait):units
#
# post.mean l-95% CI u-95% CI eff.samp
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.6667 0.6667 0.6667 0
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.3333 0.3333 0.3333 0
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.3333 0.3333 0.3333 0
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.6667 0.6667 0.6667 0
#
# Location effects: Primary.Lifestyle ~ log_Tail_Length_centered:trait + IsOmnivore:trait + trait - 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# traitPrimary.Lifestyle.Insessorial -1.01465 -8.38540 5.59217 894.2 0.818
# traitPrimary.Lifestyle.Terrestrial 3.98040 -1.31604 9.80666 1200.6 0.092 .
# log_Tail_Length_centered:traitPrimary.Lifestyle.Insessorial -1.23330 -6.68483 2.66615 1000.0 0.610
# log_Tail_Length_centered:traitPrimary.Lifestyle.Terrestrial -1.89662 -6.62854 2.37056 1113.2 0.406
# traitPrimary.Lifestyle.Insessorial:IsOmnivore 0.08497 -1.87839 2.24356 835.3 0.962
# traitPrimary.Lifestyle.Terrestrial:IsOmnivore -5.14025 -8.12432 -2.71908 848.1 <0.001 ***
<- mcmcglmm_mn1_3$Sol / sqrt(1+c2a) # for fixed effects
res_1 <- mcmcglmm_mn1_3$VCV / (1+c2a) # for variance components
res_2 <- (mcmcglmm_mn1_3$VCV[, 2]/(1+c2a)) /sqrt((mcmcglmm_mn1_3$VCV[, 1] * mcmcglmm_mn1_3$VCV[, 4])/(1+c2a))
res_3_corr_phylo <- (mcmcglmm_mn1_3$VCV[, 6]/(1+c2a)) /sqrt((mcmcglmm_mn1_3$VCV[, 5] * mcmcglmm_mn1_3$VCV[, 8])/(1+c2a))
res_3_corr_nonphylo
summary(res_1)
# Iterations = 750001:3247501
# Thinning interval = 2500
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# traitPrimary.Lifestyle.Insessorial -0.9147 3.2692 0.10338 0.10932
# traitPrimary.Lifestyle.Terrestrial 3.5882 2.4961 0.07893 0.07204
# log_Tail_Length_centered:traitPrimary.Lifestyle.Insessorial -1.1118 2.1376 0.06760 0.06760
# log_Tail_Length_centered:traitPrimary.Lifestyle.Terrestrial -1.7097 2.0720 0.06552 0.06210
# traitPrimary.Lifestyle.Insessorial:IsOmnivore 0.0766 0.9584 0.03031 0.03316
# traitPrimary.Lifestyle.Terrestrial:IsOmnivore -4.6338 1.2856 0.04065 0.04415
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# traitPrimary.Lifestyle.Insessorial -7.874 -2.8074 -0.64157 1.1515 4.945
# traitPrimary.Lifestyle.Terrestrial -1.002 1.8582 3.25656 5.0788 9.368
# log_Tail_Length_centered:traitPrimary.Lifestyle.Insessorial -5.840 -2.4495 -0.92478 0.3622 2.750
# log_Tail_Length_centered:traitPrimary.Lifestyle.Terrestrial -5.978 -3.0120 -1.64566 -0.2734 2.103
# traitPrimary.Lifestyle.Insessorial:IsOmnivore -1.675 -0.5924 0.06417 0.7068 2.066
# traitPrimary.Lifestyle.Terrestrial:IsOmnivore -7.561 -5.3247 -4.47341 -3.6857 -2.669
summary(res_2)
# Iterations = 750001:3247501
# Thinning interval = 2500
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 18.6355 11.978 0.3788 0.4966
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -1.0351 3.971 0.1256 0.1324
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -1.0351 3.971 0.1256 0.1324
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 12.9191 11.113 0.3514 0.4437
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.5418 0.000 0.0000 0.0000
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.2709 0.000 0.0000 0.0000
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.2709 0.000 0.0000 0.0000
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.5418 0.000 0.0000 0.0000
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 4.0308 10.2520 15.9642 24.0021 49.6277
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -10.0779 -2.7629 -0.7674 0.8149 6.5968
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -10.0779 -2.7629 -0.7674 0.8149 6.5968
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 0.4373 4.6612 10.1930 17.3835 44.3732
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.5418 0.5418 0.5418 0.5418 0.5418
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.2709 0.2709 0.2709 0.2709 0.2709
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.2709 0.2709 0.2709 0.2709 0.2709
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.5418 0.5418 0.5418 0.5418 0.5418
summary(res_3_corr_phylo)
# Iterations = 750001:3247501
# Thinning interval = 2500
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# -0.069527 0.220096 0.006960 0.006403
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# -0.49120 -0.21345 -0.07966 0.08421 0.37013
summary(res_3_corr_nonphylo)
# Iterations = 750001:3247501
# Thinning interval = 2500
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# 0.4507 0.0000 0.0000 0.0000
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# 0.4507 0.4507 0.4507 0.4507 0.4507
summary(brms_mn1_3)
# Family: categorical
# Links: muInsessorial = logit; muTerrestrial = logit
# Formula: Primary.Lifestyle ~ log_Tail_Length_centered + IsOmnivore + (1 | a | gr(Phylo, cov = A))
# Data: dat (Number of observations: 173)
# Draws: 2 chains, each with iter = 15000; warmup = 10000; thin = 1;
# total post-warmup draws = 10000
#
# Multilevel Hyperparameters:
# ~Phylo (Number of levels: 173)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(muInsessorial_Intercept) 7.18 4.74 2.26 20.74 1.00 1052 1590
# sd(muTerrestrial_Intercept) 5.76 4.05 0.63 16.04 1.00 920 963
# cor(muInsessorial_Intercept,muTerrestrial_Intercept) -0.24 0.42 -0.92 0.64 1.00 1248 2800
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# muInsessorial_Intercept -0.96 2.39 -6.16 3.36 1.00 3821 3835
# muTerrestrial_Intercept 3.72 2.49 0.09 9.85 1.00 2250 2363
# muInsessorial_log_Tail_Length_centered -2.05 3.19 -9.66 3.14 1.00 2889 2273
# muInsessorial_IsOmnivore 0.87 1.76 -1.87 5.32 1.00 1783 1843
# muTerrestrial_log_Tail_Length_centered -1.50 2.89 -7.61 4.13 1.00 4115 3124
# muTerrestrial_IsOmnivore -6.36 3.10 -14.70 -2.92 1.00 1269 1638
- The effect of tail length on lifestyle probability was negative for both categories in both models, but the 95% CIs include zero, indicating no strong evidence for an effect.
- Results from
MCMCglmm
andbrms
were broadly consistent in magnitude and direction of effects. - Only weak and uncertain phylogenetic correlation between the two categories.
Example 2
We use the data of habitat category on Sylviidae (Old World warbler - 294 species) as the response variable. Mass and presence or absence of migration are used as the explanatory variable.
Intercept-only model
MCMCglmm
<- inverseA(tree, nodes = "ALL", scale = TRUE)
inv_phylo
<- list(
prior R = list(V = (matrix(1, 2, 2) + diag(2)) / 3, fix = 1),
G = list(G1 = list(V = diag(2), nu = 20,
alpha.mu = rep(0, 2), alpha.V = diag(2)*(pi^2/3))
)
)
<- MCMCglmm(Habitat_Category ~ trait -1,
mcmcglmm_mn2_1 random = ~us(trait):Phylo,
rcov = ~us(trait):units,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "categorical",
data = Sylviidae_dat,
prior = prior,
nitt = 13000*500,
thin = 10*500,
burnin = 3000*500
)
brms
<- ape::vcv.phylo(tree, corr = TRUE)
A <- default_prior(Habitat_Category ~ 1 + (1 |a| gr(Phylo, cov = A)),
priors_brms1 data = Sylviidae_dat,
data2 = list(A = A),
family = categorical(link = "logit")
)
system.time(
<- brm(Habitat_Category ~ 1 + (1 |a| gr(Phylo, cov = A)),
brms_mn2_1 data = Sylviidae_dat,
data2 = list(A = A),
family = categorical(link = "logit"),
prior = priors_brms1,
iter = 18000,
warmup = 8000,
chains = 2,
thin = 1,
) )
Before checking the output,we need to correct the results from MCMCglmm
using the following equation:
<- (16 * sqrt(3) / (15 * pi))^2
c2 <- c2*(2/3) c2a
summary(mcmcglmm_mn2_1)
# Iterations = 1500001:6495001
# Thinning interval = 5000
# Sample size = 1000
#
# DIC: 379.9416
#
# G-structure: ~us(trait):Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 7.151 1.003 16.839 854.5
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 2.010 -2.392 7.668 1000.0
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 2.010 -2.392 7.668 1000.0
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 14.710 3.210 29.105 784.9
#
# R-structure: ~us(trait):units
#
# post.mean l-95% CI u-95% CI eff.samp
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.6667 0.6667 0.6667 0
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.3333 0.3333 0.3333 0
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.3333 0.3333 0.3333 0
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.6667 0.6667 0.6667 0
#
# Location effects: Habitat_Category ~ trait - 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# traitHabitat_Category.Arid_Open 2.44871 0.09382 4.88885 905.8 0.036 *
# traitHabitat_Category.Forested_Vegetated 4.56420 1.35941 7.49998 892.7 <0.001 ***
<- mcmcglmm_mn2_1$Sol / sqrt(1+c2a) # for fixed effects
res_1 <- mcmcglmm_mn2_1$VCV / (1+c2a) # for variance components
res_2 <- (mcmcglmm_mn2_1$VCV[, 2]/(1+c2a)) /sqrt((mcmcglmm_mn2_1$VCV[, 1] * mcmcglmm_mn2_1$VCV[, 4])/(1+c2a))
res_3_corr_phylo <- (mcmcglmm_mn2_1$VCV[, 6]/(1+c2a)) /sqrt((mcmcglmm_mn2_1$VCV[, 5] * mcmcglmm_mn2_1$VCV[, 8])/(1+c2a))
res_3_corr_nonphylo
summary(res_1)
# Iterations = 1500001:6495001
# Thinning interval = 5000
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# traitHabitat_Category.Arid_Open 2.207 1.140 0.03605 0.03788
# traitHabitat_Category.Forested_Vegetated 4.114 1.417 0.04480 0.04741
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# traitHabitat_Category.Arid_Open 0.1303 1.522 2.095 2.828 4.518
# traitHabitat_Category.Forested_Vegetated 1.5750 3.173 4.004 4.958 7.352
summary(res_2)
# Iterations = 1500001:6495001
# Thinning interval = 5000
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 5.8115 4.203 0.13290 0.14377
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 1.6331 2.320 0.07337 0.07337
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 1.6331 2.320 0.07337 0.07337
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 11.9537 6.815 0.21551 0.24325
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.5418 0.000 0.00000 0.00000
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.2709 0.000 0.00000 0.00000
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.2709 0.000 0.00000 0.00000
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.5418 0.000 0.00000 0.00000
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 1.3308 3.1661 4.8034 6.9772 15.7767
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo -1.5487 0.2950 1.2254 2.4747 7.0730
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo -1.5487 0.2950 1.2254 2.4747 7.0730
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 4.0126 7.3840 10.4777 14.8650 28.3674
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.5418 0.5418 0.5418 0.5418 0.5418
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.2709 0.2709 0.2709 0.2709 0.2709
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.2709 0.2709 0.2709 0.2709 0.2709
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.5418 0.5418 0.5418 0.5418 0.5418
summary(res_3_corr_phylo)
# Iterations = 1500001:6495001
# Thinning interval = 5000
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# 0.172049 0.184925 0.005848 0.005848
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# -0.17512 0.03699 0.17474 0.30106 0.51285
summary(res_3_corr_nonphylo)
# Iterations = 1500001:6495001
# Thinning interval = 5000
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# 0.4507 0.0000 0.0000 0.0000
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# 0.4507 0.4507 0.4507 0.4507 0.4507
summary(brms_mn2_1)
# Family: categorical
# Links: muAridOpen = logit; muForestedVegetated = logit
# Formula: Habitat_Category ~ 1 + (1 | a | gr(Phylo, cov = A))
# Data: Sylviidae_dat (Number of observations: 294)
# Draws: 2 chains, each with iter = 18000; warmup = 8000; thin = 1;
# total post-warmup draws = 20000
#
# Multilevel Hyperparameters:
# ~Phylo (Number of levels: 294)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(muAridOpen_Intercept) 2.58 0.90 1.24 4.69 1.00 4049 7462
# sd(muForestedVegetated_Intercept) 4.14 1.33 2.20 7.19 1.00 2119 5431
# cor(muAridOpen_Intercept,muForestedVegetated_Intercept) 0.44 0.31 -0.25 0.90 1.00 1116 2854
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# muAridOpen_Intercept 2.24 1.17 -0.05 4.66 1.00 3552 6462
# muForestedVegetated_Intercept 3.62 1.49 0.88 6.83 1.00 12256 10030
And average phylogenetic signals are…
# average phylogenteic signal - others vs. Arid and open
# MCMCglmm
<- ((mcmcglmm_mn2_1$VCV[, "traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo"]/(1+c2a)) / (mcmcglmm_mn2_1$VCV[, "traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo"]/(1+c2a) + 1))
phylo_signalA_mcmcglmm_nominal
%>% mean()
phylo_signalA_mcmcglmm_nominal # [1] 0.808917
%>% quantile(probs = c(0.025,0.5,0.975))
phylo_signalA_mcmcglmm_nominal # 2.5% 50% 97.5%
# 0.5709550 0.8276878 0.9403931
## brms
<- brms_mn2_1 %>% as_tibble() %>%
phylo_signalA_brms ::select(Sigma_phy = sd_Phylo__muAridOpen_Intercept) %>%
dplyrmutate(lambda_nominalA = Sigma_phy^2 / (Sigma_phy^2 + 1)) %>%
pull(lambda_nominalA)
%>% mean()
phylo_signalA_brms # [1] 0.8370854
%>% quantile(probs = c(0.025,0.5,0.975))
phylo_signalA_brms # 2.5% 50% 97.5%
# 0.6055780 0.8563134 0.9565012
# average phylogenteic signal - both vs. right
## MCMCglmm
<- ((mcmcglmm_mn2_1$VCV[, "traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo"]/(1+c2a)) / (mcmcglmm_mn2_1$VCV[, "traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo"]/(1+c2a) + 1))
phylo_signalF_mcmcglmm_nominal
%>% mean()
phylo_signalF_mcmcglmm_nominal # [1] 0.9044678
%>% quantile(probs = c(0.025,0.5,0.975))
phylo_signalF_mcmcglmm_nominal # 2.5% 50% 97.5%
# 0.8004989 0.9128743 0.9659487
# brms
<- brms_mn2_1 %>% as_tibble() %>%
phylo_signalF_brms ::select(Sigma_phy = sd_Phylo__muForestedVegetated_Intercept) %>%
dplyrmutate(lambda_nominalF = Sigma_phy^2 / (Sigma_phy^2 + 1)) %>%
pull(lambda_nominalF)
%>% mean()
phylo_signalF_brms # [1] 0.9304826
%>% quantile(probs = c(0.025,0.5,0.975))
phylo_signalF_brms # 2.5% 50% 97.5%
# 0.8292983 0.9394892 0.9810195
Both models detected high phylogenetic signal for Habitat_Category, especially for Forested_Vegetated and slightly lower for Arid_Open category. The phylogenetic correlation between the two habitat categories tended to be positive, but the 95% credible intervals included zero in both models, indicating that the correlation was not statistically supported.
One continuous explanatory variable model
MCMCglmm
<- MCMCglmm(Habitat_Category ~ cMass:trait + trait -1,
mcmcglmm_mn2_2 random = ~us(trait):Phylo,
rcov = ~us(trait):units,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "categorical",
data = Sylviidae_dat,
prior = prior,
nitt = 13000*500,
thin = 10*500,
burnin = 3000*500
)
brms
<- ape::vcv.phylo(tree, corr = TRUE)
A <- default_prior(Habitat_Category ~ cMass + (1 |a| gr(Phylo, cov = A)),
priors_brms2 data = Sylviidae_dat,
data2 = list(A = A),
family = categorical(link = "logit")
)
system.time(
<- brm(Habitat_Category ~ cMass + (1 |a| gr(Phylo, cov = A)),
brms_mn2_2 data = Sylviidae_dat,
data2 = list(A = A),
family = categorical(link = "logit"),
prior = priors_brms2,
iter = 18000,
warmup = 8000,
chains = 2,
thin = 1,
) )
summary(mcmcglmm_mn2_2)
# Iterations = 1500001:6495001
# Thinning interval = 5000
# Sample size = 1000
#
# DIC: 369.9206
#
# G-structure: ~us(trait):Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 7.923 1.119 18.446 649.8
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 2.344 -2.666 9.503 770.0
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 2.344 -2.666 9.503 770.0
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 18.943 4.131 41.543 745.4
#
# R-structure: ~us(trait):units
#
# post.mean l-95% CI u-95% CI eff.samp
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.6667 0.6667 0.6667 0
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.3333 0.3333 0.3333 0
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.3333 0.3333 0.3333 0
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.6667 0.6667 0.6667 0
#
# Location effects: Habitat_Category ~ cMass:trait + trait - 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# traitHabitat_Category.Arid_Open 2.62403 -0.08498 5.40782 1000.0 0.048 *
# traitHabitat_Category.Forested_Vegetated 4.74629 1.55204 8.47272 1000.0 0.002 **
# cMass:traitHabitat_Category.Arid_Open -0.79705 -2.34051 0.62992 1000.0 0.294
# cMass:traitHabitat_Category.Forested_Vegetated 1.03745 -0.87700 2.97350 892.6 0.252
<- mcmcglmm_mn2_2$Sol / sqrt(1+c2a) # for fixed effects
res_1 <- mcmcglmm_mn2_2$VCV / (1+c2a) # for variance components
res_2 <- (mcmcglmm_mn2_2$VCV[, 2]/(1+c2a)) /sqrt((mcmcglmm_mn2_2$VCV[, 1] * mcmcglmm_mn2_2$VCV[, 4])/(1+c2a))
res_3_corr_phylo <- (mcmcglmm_mn2_2$VCV[, 6]/(1+c2a)) /sqrt((mcmcglmm_mn2_2$VCV[, 5] * mcmcglmm_mn2_2$VCV[, 8])/(1+c2a))
res_3_corr_nonphylo
summary(res_1)
# Iterations = 1500001:6495001
# Thinning interval = 5000
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# traitHabitat_Category.Arid_Open 2.207 1.140 0.03605 0.03788
# traitHabitat_Category.Forested_Vegetated 4.114 1.417 0.04480 0.04741
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# traitHabitat_Category.Arid_Open 0.1303 1.522 2.095 2.828 4.518
# traitHabitat_Category.Forested_Vegetated 1.5750 3.173 4.004 4.958 7.352
summary(res_2)
# Iterations = 1500001:6495001
# Thinning interval = 5000
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 5.8115 4.203 0.13290 0.14377
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 1.6331 2.320 0.07337 0.07337
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 1.6331 2.320 0.07337 0.07337
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 11.9537 6.815 0.21551 0.24325
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.5418 0.000 0.00000 0.00000
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.2709 0.000 0.00000 0.00000
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.2709 0.000 0.00000 0.00000
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.5418 0.000 0.00000 0.00000
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 1.3308 3.1661 4.8034 6.9772 15.7767
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo -1.5487 0.2950 1.2254 2.4747 7.0730
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo -1.5487 0.2950 1.2254 2.4747 7.0730
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 4.0126 7.3840 10.4777 14.8650 28.3674
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.5418 0.5418 0.5418 0.5418 0.5418
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.2709 0.2709 0.2709 0.2709 0.2709
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.2709 0.2709 0.2709 0.2709 0.2709
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.5418 0.5418 0.5418 0.5418 0.5418
summary(res_3_corr_phylo)
# Iterations = 1500001:6495001
# Thinning interval = 5000
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# 0.172049 0.184925 0.005848 0.005848
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# -0.17512 0.03699 0.17474 0.30106 0.51285
summary(res_3_corr_nonphylo)
# Iterations = 1500001:6495001
# Thinning interval = 5000
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# 0.4507 0.0000 0.0000 0.0000
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# 0.4507 0.4507 0.4507 0.4507 0.4507
summary(brms_mn2_2)
# Family: categorical
# Links: muAridOpen = logit; muForestedVegetated = logit
# Formula: Habitat_Category ~ cMass + (1 | a | gr(Phylo, cov = A))
# Data: Sylviidae_dat (Number of observations: 294)
# Draws: 2 chains, each with iter = 18000; warmup = 8000; thin = 1;
# total post-warmup draws = 20000
#
# Multilevel Hyperparameters:
# ~Phylo (Number of levels: 294)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(muAridOpen_Intercept) 2.63 0.94 1.22 4.90 1.00 3376 6591
# sd(muForestedVegetated_Intercept) 5.00 1.69 2.60 9.08 1.00 1488 4134
# cor(muAridOpen_Intercept,muForestedVegetated_Intercept) 0.48 0.31 -0.23 0.93 1.00 794 1765
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# muAridOpen_Intercept 2.47 1.20 0.18 5.03 1.00 2925 5132
# muForestedVegetated_Intercept 3.59 1.72 0.41 7.25 1.00 10315 9625
# muAridOpen_cMass -0.55 0.75 -2.08 0.92 1.00 8518 11678
# muForestedVegetated_cMass 1.29 1.08 -0.62 3.61 1.00 6534 8329
#
# Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
# and Tail_ESS are effective sample size measures, and Rhat is the potential
# scale reduction factor on split chains (at convergence, Rhat = 1).
Summary The model intercept represents the log-odds of belonging to a given habitat category when predictors are at their reference values (here, cMass = 0). For Arid/Open and Forested/Vegetated, the intercepts were positive, and in most cases the 95% credible intervals did not include zero. This means that, under baseline conditions, the probability of belonging to these two habitat categories is higher than for the reference category (Aquatic/Coastal). In other words, the model predicts that species are more likely to occur in Arid/Open or Forested/Vegetated habitats than in Aquatic/Coastal habitats under the reference conditions.
Mass effects: The effect of centred log body mass (cMass) was negative for Arid/Open and positive for Forested/Vegetated, but in both models the 95% credible intervals crossed zero, suggesting no clear statistical support.
Phylogenetic variances: Substantial phylogenetic variance was observed for both habitat categories, indicating moderate to strong phylogenetic structuring.
Phylogenetic correlation: The correlation between habitat categories was small and positive but highly uncertain, with 95% credible intervals spanning zero.
One continuous and one binary explanatory variables model
MCMCglmm
<- MCMCglmm(Habitat_Category ~ cMass:trait + IsMigrate:trait + trait -1,
mcmcglmm_mn2_3 random = ~us(trait):Phylo,
rcov = ~us(trait):units,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "categorical",
data = Sylviidae_dat,
prior = prior,
nitt = 13000*750,
thin = 10*750,
burnin = 3000*750
)
brms
<- ape::vcv.phylo(tree, corr = TRUE)
A <- default_prior(Habitat_Category ~ cMass + IsMigrate + (1 |a| gr(Phylo, cov = A)),
priors_brms3 data = Sylviidae_dat,
data2 = list(A = A),
family = categorical(link = "logit")
)
system.time(
<- brm(Habitat_Category ~ cMass + IsMigrate + (1 |a| gr(Phylo, cov = A)),
brms_mn2_3 data = Sylviidae_dat,
data2 = list(A = A),
family = categorical(link = "logit"),
prior = priors_brms3,
iter = 18000,
warmup = 8000,
chains = 2,
thin = 1,
) )
summary(mcmcglmm_mn2_3)
# Iterations = 2250001:9742501
# Thinning interval = 7500
# Sample size = 1000
#
# DIC: 353.666
#
# G-structure: ~us(trait):Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 9.083 1.196 21.74 718.3
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 1.660 -4.574 8.40 779.0
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 1.660 -4.574 8.40 779.0
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 20.892 2.772 44.08 678.4
#
# R-structure: ~us(trait):units
#
# post.mean l-95% CI u-95% CI eff.samp
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.6667 0.6667 0.6667 0
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.3333 0.3333 0.3333 0
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.3333 0.3333 0.3333 0
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.6667 0.6667 0.6667 0
#
# Location effects: Habitat_Category ~ cMass:trait + IsMigrate:trait + trait - 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# traitHabitat_Category.Arid_Open 2.3783 -0.5852 5.1494 988.4 0.090 .
# traitHabitat_Category.Forested_Vegetated 2.4444 -0.8470 6.7465 1000.0 0.172
# cMass:traitHabitat_Category.Arid_Open -0.8778 -2.6132 0.6125 1588.3 0.258
# cMass:traitHabitat_Category.Forested_Vegetated 0.8132 -1.2024 2.7663 1000.0 0.424
# traitHabitat_Category.Arid_Open:IsMigrate 0.4607 -0.9485 1.8999 905.7 0.544
# traitHabitat_Category.Forested_Vegetated:IsMigrate 3.0348 1.1467 5.1125 1000.0 <0.001 ***
<- mcmcglmm_mn2_3$Sol / sqrt(1+c2a) # for fixed effects
res_1 <- mcmcglmm_mn2_3$VCV / (1+c2a) # for variance components
res_2 <- (mcmcglmm_mn2_3$VCV[, 2]/(1+c2a)) /sqrt((mcmcglmm_mn2_3$VCV[, 1] * mcmcglmm_mn2_3$VCV[, 4])/(1+c2a))
res_3_corr_phylo <- (mcmcglmm_mn2_3$VCV[, 6]/(1+c2a)) /sqrt((mcmcglmm_mn2_3$VCV[, 5] * mcmcglmm_mn2_3$VCV[, 8])/(1+c2a))
res_3_corr_nonphylo
summary(res_1)
# Iterations = 2250001:9742501
# Thinning interval = 7500
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# traitHabitat_Category.Arid_Open 2.1440 1.2735 0.04027 0.04051
# traitHabitat_Category.Forested_Vegetated 2.2035 1.7196 0.05438 0.05438
# cMass:traitHabitat_Category.Arid_Open -0.7913 0.7334 0.02319 0.01840
# cMass:traitHabitat_Category.Forested_Vegetated 0.7331 0.9346 0.02955 0.02955
# traitHabitat_Category.Arid_Open:IsMigrate 0.4153 0.6657 0.02105 0.02212
# traitHabitat_Category.Forested_Vegetated:IsMigrate 2.7358 0.9074 0.02870 0.02870
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# traitHabitat_Category.Arid_Open -0.4477 1.37995 2.1559 2.8690 4.8198
# traitHabitat_Category.Forested_Vegetated -0.9808 1.11585 2.1584 3.1992 5.9649
# cMass:traitHabitat_Category.Arid_Open -2.3303 -1.24354 -0.7433 -0.2922 0.5856
# cMass:traitHabitat_Category.Forested_Vegetated -1.0181 0.08259 0.6961 1.3172 2.7128
# traitHabitat_Category.Arid_Open:IsMigrate -0.9151 -0.03301 0.4117 0.8748 1.7025
# traitHabitat_Category.Forested_Vegetated:IsMigrate 1.1262 2.12920 2.6469 3.2897 4.723
summary(res_2)
# Iterations = 2250001:9742501
# Thinning interval = 7500
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 7.3814 5.319 0.16821 0.19848
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 1.3490 2.643 0.08358 0.09469
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 1.3490 2.643 0.08358 0.09469
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 16.9777 10.703 0.33847 0.41092
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.5418 0.000 0.00000 0.00000
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.2709 0.000 0.00000 0.00000
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.2709 0.000 0.00000 0.00000
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.5418 0.000 0.00000 0.00000
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 1.5684 3.7512 5.9438 9.6229 20.8510
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo -3.6719 -0.1351 1.1143 2.6647 7.1611
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo -3.6719 -0.1351 1.1143 2.6647 7.1611
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 4.2987 9.7029 14.3953 21.6290 43.4114
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.5418 0.5418 0.5418 0.5418 0.5418
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.2709 0.2709 0.2709 0.2709 0.2709
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.2709 0.2709 0.2709 0.2709 0.2709
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.5418 0.5418 0.5418 0.5418 0.5418
summary(res_3_corr_phylo)
# Iterations = 2250001:9742501
# Thinning interval = 7500
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# 0.117832 0.197752 0.006253 0.006253
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# -0.27624 -0.01752 0.12623 0.25980 0.49079
summary(res_3_corr_nonphylo)
# Iterations = 2250001:9742501
# Thinning interval = 7500
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# 0.4507 0.0000 0.0000 0.0000
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# 0.4507 0.4507 0.4507 0.4507 0.4507
summary(brms_mn2_3)
# Family: categorical
# Links: muAridOpen = logit; muForestedVegetated = logit
# Formula: Habitat_Category ~ cMass + IsMigrate + (1 | a | gr(Phylo, cov = A))
# Data: Sylviidae_dat (Number of observations: 294)
# Draws: 2 chains, each with iter = 18000; warmup = 8000; thin = 1;
# total post-warmup draws = 20000
#
# Multilevel Hyperparameters:
# ~Phylo (Number of levels: 294)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(muAridOpen_Intercept) 2.93 1.24 1.28 5.93 1.00 2322 3372
# sd(muForestedVegetated_Intercept) 6.45 3.41 2.65 15.31 1.00 1254 1903
# cor(muAridOpen_Intercept,muForestedVegetated_Intercept) 0.35 0.35 -0.42 0.90 1.00 680 1883
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# muAridOpen_Intercept 2.08 1.30 -0.42 4.81 1.00 3324 5760
# muForestedVegetated_Intercept 0.81 2.34 -4.26 5.08 1.00 4304 3182
# muAridOpen_cMass -0.64 0.81 -2.26 0.92 1.00 4954 9099
# muAridOpen_IsMigrate 0.53 0.72 -0.92 1.94 1.00 5983 9135
# muForestedVegetated_cMass 1.44 1.38 -0.81 4.62 1.00 3507 3935
# muForestedVegetated_IsMigrate 3.63 1.73 1.37 7.83 1.00 2677 2363
#
# Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
# and Tail_ESS are effective sample size measures, and Rhat is the potential
# scale reduction factor on split chains (at convergence, Rhat = 1).
Summary
Baseline habitat use: When migration status and body mass are not considered, Sylviidae species are somewhat more likely to be in Arid/Open or Forested/Vegetated habitats than in the reference category (Aquatic/Coastal), but the statistical uncertainty is large enough that we cannot be confident in these differences. Body mass effects: There is no clear evidence that heavier or lighter species prefer either Arid/Open or Forested/Vegetated habitats.
Migration effects: Migratory species are much more likely to be associated with Forested/Vegetated habitats compared to Aquatic/Coastal ones, even after accounting for body mass and phylogeny. Migration does not appear to strongly influence the likelihood of using Arid/Open habitats.
Phylogenetic patterns: The evolutionary tendencies for using Arid/Open vs. Forested/Vegetated habitats are only weakly related, and the data do not clearly indicate a consistent positive or negative correlation across the family’s evolutionary history.
Extension: models with multiple data point per species
Tips for simulating dataset
Unfortunately, there are few published dataset for multiple datapoints per species. Therefore, we generate 1. Gaussian, 2. binary. 3. ordinal, and 4. nominal data for this section and use for MCMCglmm
and brms
models.
Before moving each section, we share some tips we need to keep in mind when simulating data:
Make the dataset as realistic as possible
Simulated data should reflect the real-world context. If you are simulating biological data, consider the typical distributions, relationships, and range of values that could appear in your real dataset. You may need check published papers whether your assumption is suitable.Carefully define random effect(s) and fixed effect(s) parameters
For generating a response variable, ensure that both fixed and random effects are specified clearly. The fixed effects could represent known, deterministic influences (like treatment groups), while random effects account for variability due to unmeasured factors (like species or individuals). When setting residuals (errors) for the Gaussian response variable, remember that the variance of the residuals should typically be smaller than the variance of the random effects. Because they represent smaller-scale noise, while random effects capture larger-scale variability. If you set the residual variance is larger, it can lead model unrealistic.Check true values (we set) and model estimates
After generating the data and fitting your model, compare the “true” parameter values you set for the simulation with the estimates your model provides. This will help you assess the accuracy of your model and how well it can recover the true underlying parameters. It is important to check whether the model overestimates or underestimates certain effects.Ensure all created variables are properly aligned
When simulating data, it is essential to check that all the variables are correctly aligned and correspond to each other as intended. Specifically, pay attention to the following points:Matching species names with those in the phylogenetic tree
Ensure that the species names used in your data set are consistent with those in the phylogenetic tree (if applicable). If these names don’t match, the integrity of the data could be compromised, leading to inaccurate results (or model can not run). For instance, if a species is named “SpeciesA” in the data but referred to as “SpeciesB” in the phylogeny, the data won’t be interpreted correctly.Correspondence between variables
Also check that other variables, such as individual IDs, environmental factors, gender, or group differences, are correctly matched. For example, verify that data points for each individual are correctly associated with the intended ID, and that variables like age and gender follow expected distributions. How to check: After creating your data frame, use functions, such ashead()
,summary()
, orView()
in R (you can find other functions as well) to check for any discrepancies, such as duplicated species names or missing values. This helps that your dataset is internally consistent.
Other small things…
- Read the documentation
It may not be worth mentioning again, but always read the documentation for the functions yo are using. Understand how they work, what parameters they take, and what outputs they produce. - Test with small examples
If you are using a new function, start by testing it on a small sample of data to see how it behaves before using it in your actual analysis. This helps you avoid larger mistakes when you apply it to the full dataset. - Stick to familiar methods
However, stick to methods and functions you are familiar with, especially when you are starting. As you encounter issues or need more advanced features, gradually learn new functions and techniques. It is a safer approach and will help you build confidence.
Simulate datasets and run models
In this extension, We aim to create a dataset for models that take different types of response variables: Gaussian, binary, ordinal, and nominal traits. The explanatory variables are all the same - body mass and sex. All datasets were generated based on observations of 100 bird species, with each species being observed 5 times.
Gaussian
We simulated the habitat range (log-transformed) of each species as the continuous response variable. We assumed habitat range is more strongly influenced by body mass than by sex, and phylogenetic factors are expected to have a greater impact than non-phylogenetic factors.
set.seed(1234)
<- pbtree(n = 100, scale = 1)
tree
# extract the correlation matrix from the phylogenetic tree
<- vcv(tree, corr = TRUE)
phylo_cor
# define parameters
<- 100 # number of species (clusters)
n_species <- 5 # number of observations per species
obs_per_species <- n_species * obs_per_species # total number of observations
n
# simulate fixed effects
## continuous variable: body mass
<- (log(4)+ log(2500))/2 # mean of whole 100 species body mass
mu_mass <- mvrnorm(n = 1, # species level body mass
mu_species_mass mu = rep(mu_mass, ncol(phylo_cor)),
Sigma = phylo_cor*1)
<- sapply(mu_species_mass,
obs_mass function(x) rnorm(obs_per_species, mean = x, sd = sqrt(0.1))) %>%
as.vector() # each observation body mass
## binomial variable: sex
<- rbinom(n, 1, 0.5)
sex
# simulate random effect parameter
## variance components
<- 2
sigma_phylo <- 1 # standard deviation of non-phylogenetic random effect
sigma_non
# phylogenetic random effect
<- mvrnorm(
random_effect_phylo n = 1,
mu = rep(0, ncol(phylo_cor)),
Sigma = sigma_phylo*phylo_cor)
<- rep(random_effect_phylo, each = obs_per_species)
phylo_effect
# non-phylogenetic random effect
<- rnorm(n_species, sd = sqrt(sigma_non))
random_effect_non_phylo <- rep(random_effect_non_phylo, each = obs_per_species)
non_phylo_effect
# residual
<- 0.2
sigma_residual <- rnorm(n, mean = 0, sd = sqrt(sigma_residual))
residual
# define fixed effect parameter
<- 0 # intercept
beta_c0 <- 1.2 # coefficient for body mass
beta_c1 <- 0.1 # coefficient for sex
beta_c2
<- beta_c0 + beta_c1 * obs_mass + beta_c2 * sex + phylo_effect + non_phylo_effect + residual
y
<- data.frame(
sim_data1 habitat_range = y,
mass = obs_mass,
sex = factor(sex, labels = c("F", "M")),
species = names(phylo_effect),
individual_id = factor(1:n), # create unique individual IDs for each observation
phylo = names(phylo_effect)
%>%
) arrange(species)
Run models
<- inverseA(tree, nodes = "ALL", scale = TRUE)$Ainv
Ainv <- list(R = list(V = 1, nu = 0.002),
prior1_mcmcglmm G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 100),
G2 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 100)
)
)
system.time(
<- MCMCglmm(
mod1_mcmcglmm ~ 1,
habitat_range random = ~ phylo + species,
ginverse = list(phylo = Ainv),
prior = prior1_mcmcglmm,
family = "gaussian",
data = sim_data1,
nitt = 13000*60,
thin = 10*60,
burnin = 3000*60
)
)
system.time(
<- MCMCglmm(
mod1_mcmcglmm2 ~ mass,
habitat_range random = ~ phylo + species,
ginverse = list(phylo = Ainv),
prior = prior1_mcmcglmm,
family = "gaussian",
data = sim_data1,
nitt = 13000*100,
thin = 10*100,
burnin = 3000*100
)
)
system.time(
<- MCMCglmm(
mod1_mcmcglmm3 ~ mass + sex,
habitat_range random = ~ phylo + species,
ginverse = list(phylo = Ainv),
prior = prior1_mcmcglmm,
family = "gaussian",
data = sim_data1,
nitt = 13000*250,
thin = 10*250,
burnin = 3000*250
)
)
# brms
<- ape::vcv.phylo(tree, corr = TRUE)
A
<- default_prior(
priors1_brms ~ 1 + (1 | gr(phylo, cov = A)) + (1 | species),
habitat_range data = sim_data1,
data2 = list(A = A),
family = gaussian()
)
system.time(
<- brm(habitat_range ~ 1 + (1 | gr(phylo, cov = A)) + (1 | species),
mod1_brms data = sim_data1,
data2 = list(A = A),
family = gaussian(),
prior = priors1_brms,
iter = 8000,
warmup = 6000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
)
)
<- default_prior(
priors1_brms2 ~ mass + (1 | gr(phylo, cov = A)) + (1 | species),
habitat_range data = sim_data1,
data2 = list(A = A),
family = gaussian()
)
system.time(
<- brm(habitat_range ~ mass + (1 | gr(phylo, cov = A)) + (1 | species),
mod1_brms2 data = sim_data1,
data2 = list(A = A),
family = gaussian(),
prior = priors1_brms2,
iter = 8000,
warmup = 6000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
)
)
<- default_prior(
priors1_brms3 ~ mass + sex + (1 | gr(phylo, cov = A)) + (1 | species),
habitat_range data = sim_data1,
data2 = list(A = A),
family = gaussian()
)
system.time(
<- brm(habitat_range ~ mass + sex + (1 | gr(phylo, cov = A)) + (1 | species),
mod1_brms3 data = sim_data1,
data2 = list(A = A),
family = gaussian(),
prior = priors1_brms3,
iter = 8000,
warmup = 6000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
) )
Results
actual mean and variance of the model is…
mean(y)
[1] 5.560533
var(random_effect_phylo)
[1] 0.9475315
var(random_effect_non_phylo)
[1] 0.8428301
var(residual)
[1] 0.1892837
summary(mod1_mcmcglmm)
# Iterations = 180001:779401
# Thinning interval = 600
# Sample size = 1000
#
# DIC: 998.4984
#
# G-structure: ~phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# phylo 1.165 0.1108 2.704 881.6
#
# ~species
#
# post.mean l-95% CI u-95% CI eff.samp
# species 1.036 0.5946 1.504 1323
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 0.3573 0.3125 0.4029 1000
#
# Location effects: habitat_range ~ 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) 5.239 4.131 6.302 1000 <0.001 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
summary(mod1_brms)
# Family: gaussian
# Links: mu = identity; sigma = identity
# Formula: habitat_range ~ 1 + (1 | gr(phylo, cov = A)) + (1 | species)
# Data: sim_data1 (Number of observations: 500)
# Draws: 2 chains, each with iter = 8000; warmup = 6000; thin = 1;
# total post-warmup draws = 4000
#
# Multilevel Hyperparameters:
# ~phylo (Number of levels: 100)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.99 0.36 0.33 1.76 1.00 345 722
#
# ~species (Number of levels: 100)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 1.02 0.12 0.78 1.26 1.00 661 1451
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept 5.28 0.51 4.22 6.30 1.00 4816 2515
#
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma 0.60 0.02 0.56 0.64 1.00 3973 2562
autocorr.plot(mod1_mcmcglmm$Sol)
autocorr.plot(mod1_mcmcglmm$VCV)
posterior_summary(mod1_mcmcglmm$Sol)
Estimate Est.Error Q2.5 Q97.5
(Intercept) 5.238814 0.5481793 4.036911 6.270514
posterior_summary(mod1_mcmcglmm$VCV)
Estimate Est.Error Q2.5 Q97.5
phylo 1.1653905 0.78259640 0.1622860 2.9869859
species 1.0359759 0.24017414 0.6052211 1.5251053
units 0.3572733 0.02396132 0.3150877 0.4083013
We will now verify whether the model could show estimates close to the true values from the simulated data - there are several points to check during this process. First, confirm whether the model converged by examining metrics such as the effective sampling size, autocorrelation, and R-hat values. Then, compare the true value with model point estimates and check the 95% credible intervals of point estimates. The intercept-only model does not include any independent variables. Instead, it focuses on the intercept (mean value) and variance components to explain the distribution of the dependent variable, habitat range. Looking at the results, the model appears to estimate the mean (fixed effect beta/intercept) and random effect variances close to their true values. However, the residual variance is not accurately estimated.
# MCMCglmm
<- ((mod1_mcmcglmm$VCV[, "phylo"]) / (mod1_mcmcglmm$VCV[, "phylo"] + mod1_mcmcglmm$VCV[, "species"] + mod1_mcmcglmm$VCV[, "units"]))
phylo_signal_mod1_mcmcglmm
%>% mean()
phylo_signal_mod1_mcmcglmm # [1] 0.4170196
%>% quantile(probs = c(0.025,0.5,0.975))
phylo_signal_mod1_mcmcglmm # 2.5% 50% 97.5%
# 0.08783016 0.42359454 0.73669927
# brms
<- mod1_brms %>% as_tibble() %>%
phylo_signal_mod1_brms ::select(Sigma_phy = sd_phylo__Intercept, Sigma_non_phy = sd_species__Intercept, Res = sigma) %>%
dplyrmutate(lambda_gaussian = Sigma_phy^2 / (Sigma_phy^2 + Sigma_non_phy^2 + Res^2)) %>%
pull(lambda_gaussian)
%>% mean()
phylo_signal_mod1_brms # [1] 0.3984794
%>% quantile(probs = c(0.025,0.5,0.975))
phylo_signal_mod1_brms # 2.5% 50% 97.5%
# 0.06146131 0.39645518 0.74712062
The phylogenetic signal was estimated as 0.42 (95% CI: 0.09, 0.74) using MCMCglmm
and 0.40 (95% CI: 0.06, 0.75) using brms
.It is important to note that we can now consider non-phylogenetic variance (here, species
) more accurately, as the non-phylogenetic variance is directly estimated and is not confounded by within-species effects or other sources of variance.
summary(mod1_mcmcglmm2)
# Iterations = 300001:1299001
# Thinning interval = 1000
# Sample size = 1000
#
# DIC: 680.8927
#
# G-structure: ~phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# phylo 2.31 0.5428 4.753 844.2
#
# ~species
#
# post.mean l-95% CI u-95% CI eff.samp
# species 1.164 0.6272 1.72 867.3
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 0.1881 0.1618 0.2127 1000
#
# Location effects: habitat_range ~ mass
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) -0.2339 -1.7632 1.4823 1000 0.774
# mass 1.1143 0.9890 1.2515 1000 <0.001 ***
posterior_summary(mod1_mcmcglmm2$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) -0.233917 0.86362777 -1.8194889 1.430111
# mass 1.114301 0.06819815 0.9859776 1.250126
posterior_summary(mod1_mcmcglmm2$VCV)
# Estimate Est.Error Q2.5 Q97.5
# phylo 2.309659 1.1768520 0.7238491 5.2748480
# species 1.163885 0.2830942 0.6511874 1.7726979
# units 0.188124 0.0129251 0.1628494 0.2151048
summary(mod1_brms2)
# Family: gaussian
# Links: mu = identity; sigma = identity
# Formula: habitat_range ~ mass + (1 | gr(phylo, cov = A)) + (1 | species)
# Data: sim_data1 (Number of observations: 500)
# Draws: 2 chains, each with iter = 8000; warmup = 6000; thin = 1;
# total post-warmup draws = 4000
#
# Multilevel Hyperparameters:
# ~phylo (Number of levels: 100)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 1.44 0.34 0.84 2.18 1.00 563 1266
#
# ~species (Number of levels: 100)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 1.07 0.13 0.82 1.34 1.00 851 1427
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept -0.15 0.80 -1.73 1.45 1.00 3901 2789
# mass 1.11 0.07 0.98 1.25 1.00 6559 2347
#
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma 0.43 0.02 0.40 0.47 1.00 3843 2959
Next, regarding the model that includes log-transformed body mass as an explanatory variable, the results showed that, as we had set(beta_c1
), mass was estimated to have a positive effect on habitat range. Additionally, examining the residuals reveals that they are much closer to the true value compared to the intercept-only model. This improvement suggests that including an explanatory variable helps the model capture more of the variability in the data, reducing the unexplained variance (residual variance).
summary(mod1_mcmcglmm3)
# Iterations = 750001:3247501
# Thinning interval = 2500
# Sample size = 1000
#
# DIC: 668.5273
#
# G-structure: ~phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# phylo 2.325 0.5523 4.646 1073
#
# ~species
#
# post.mean l-95% CI u-95% CI eff.samp
# species 1.151 0.6569 1.764 900.3
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 0.1832 0.1572 0.2097 1000
#
# Location effects: habitat_range ~ mass + sex
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) -0.28323 -1.97048 1.38771 1000 0.674
# mass 1.11239 0.97750 1.24672 1000 <0.001 ***
# sexM 0.15016 0.06697 0.22862 1000 <0.001 ***
posterior_summary(mod1_mcmcglmm3$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) -0.2832264 0.85695461 -1.93569141 1.4800298
# mass 1.1123899 0.06844718 0.97159841 1.2423309
# sexM 0.1501593 0.04167755 0.06778142 0.2286585
posterior_summary(mod1_mcmcglmm3$VCV)
# Estimate Est.Error Q2.5 Q97.5
# phylo 2.3252270 1.16940589 0.7405934 5.2388026
# species 1.1512006 0.29343601 0.6567145 1.7615075
# units 0.1831836 0.01342064 0.1586928 0.2117723
summary(mod1_brms3)
# Family: gaussian
# Links: mu = identity; sigma = identity
# Formula: habitat_range ~ mass + sex + (1 | gr(phylo, cov = A)) + (1 | species)
# Data: sim_data1 (Number of observations: 500)
# Draws: 2 chains, each with iter = 8000; warmup = 6000; thin = 1;
# total post-warmup draws = 4000
#
# Multilevel Hyperparameters:
# ~phylo (Number of levels: 100)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 1.43 0.34 0.84 2.14 1.00 619 1255
#
# ~species (Number of levels: 100)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 1.08 0.13 0.83 1.33 1.00 964 1578
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept -0.22 0.77 -1.76 1.26 1.00 3957 2850
# mass 1.11 0.07 0.98 1.24 1.00 8492 2851
# sexM 0.15 0.04 0.06 0.23 1.00 8621 2798
#
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma 0.43 0.02 0.40 0.46 1.00 4112 2962
Finally, let’s take a look at the results when both mass and sex are included in the model. We assumed that males are, on average, 0.1 units larger than females. This assumption is successfully reflected in the model estimates
Binary
The response variable in this analysis is breeding success, defined as whether a species reproduced or not. The assumptions are as follows: phylogeny has a weak relationship with breeding success, while some non-phylogenetic factors, such as environmental conditions, are likely to play a role. Additionally, heavier species are assumed to have a higher likelihood of breeding success, and females (coded as 0) are expected to succeed in breeding more often than males.
set.seed(456)
<- pbtree(n = 100, scale = 1)
tree2
# extract the correlation matrix from the phylogenetic tree
<- vcv(tree2, corr = TRUE)
phylo_cor
# define parameters
<- 100 # number of species (clusters)
n_species <- 5 # number of observations per species
obs_per_species <- n_species * obs_per_species # total number of observations
n
# simulate fixed effects
## continuous variable: body mass
<- (log(4)+ log(2500))/2 # mean of whole 100 species body mass
mu_mass <- mvrnorm(n = 1, # species level body mass
mu_species_mass mu = rep(mu_mass, ncol(phylo_cor)),
Sigma = phylo_cor*1)
<- sapply(mu_species_mass,
obs_mass function(x) rnorm(obs_per_species, mean = x, sd = sqrt(0.2))) %>%
as.vector() # each observation body mass
## binomial variable: sex
<- rbinom(n, 1, 0.5)
sex
# simulate random effect parameter
## variance components
<- 0.4
sigma_phylo <- 0.3 # standard deviation of non-phylogenetic random effect
sigma_non
## phylogenetic random effect
<- mvrnorm(
random_effect_phylo n = 1,
mu = rep(0, ncol(phylo_cor)),
Sigma = sigma_phylo*phylo_cor)
<- rep(random_effect_phylo, each = obs_per_species)
phylo_effect
## non-phylogenetic random effect
<- rnorm(n, sd = sqrt(sigma_non))
random_effect_non_phylo <- rep(random_effect_non_phylo, each = obs_per_species)
non_phylo_effect
# define fixed effect parameter
<- -0.5 # intercept
beta_b0 <- 0.4 # coefficient for body mass
beta_b1 <- -1.5 # coefficient for sex
beta_b2
# simulate linear predictor
<- beta_b0 + beta_b1 * obs_mass + beta_b2 * sex + phylo_effect + non_phylo_effect
eta1 # eta1 <- beta_b0 + phylo_effect + non_phylo_effect
# probability
<- 1 / (1 + exp(-eta1))
p1 <- rbinom(n, 1, p1)
y1
# combine simulated data
<- data.frame(
sim_data2 breeding = y1,
mass = obs_mass,
sex = factor(sex, labels = c("F", "M")),
species = names(phylo_effect),
individual_id = factor(1:n), # create unique individual IDs for each observation
phylo = names(phylo_effect)
%>%
) arrange(species) %>%
as.data.frame()
# head(sim_data2, 20)
# breeding mass sex species individual_id phylo
# 1 0 4.425580 M t1 146 t1
# 2 0 4.518592 M t1 147 t1
# 3 0 4.553237 M t1 148 t1
# 4 1 4.728218 F t1 149 t1
# 5 1 4.125475 F t1 150 t1
# 6 1 4.181977 M t10 251 t10
# 7 1 4.366759 F t10 252 t10
# 8 0 4.286815 F t10 253 t10
# 9 0 3.998315 M t10 254 t10
# 10 1 3.877700 F t10 255 t10
# 11 1 5.367269 M t100 311 t100
# 12 0 4.287155 M t100 312 t100
# 13 0 4.715682 F t100 313 t100
# 14 1 4.969713 M t100 314 t100
# 15 1 4.818693 F t100 315 t100
# 16 0 3.750907 M t11 256 t11
# 17 1 4.016580 F t11 257 t11
# 18 1 3.722079 F t11 258 t11
# 19 1 3.325677 M t11 259 t11
# 20 1 3.100769 F t11 260 t11
Run models
# MCMCglmm
<- inverseA(tree2, nodes = "ALL", scale = TRUE)
inv_phylo <- list(R = list(V = 1, fix = 1),
prior2_mcmcglmm G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10),
G2 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)
)
)
system.time(
<- MCMCglmm(breeding ~ 1,
mod2_mcmcglmm_logit random = ~ phylo + species,
family = "categorical",
data = sim_data2,
prior = prior2_mcmcglmm,
ginverse = list(phylo = inv_phylo$Ainv),
nitt = 13000*10,
thin = 10*10,
burnin = 3000*10)
)
system.time(
<- MCMCglmm(breeding ~ mass,
mod2_mcmcglmm_logit2 random = ~ phylo + species,
family = "categorical",
data = sim_data2,
prior = prior2_mcmcglmm,
ginverse = list(phylo = inv_phylo$Ainv),
nitt = 13000*20,
thin = 10*20,
burnin = 3000*20)
)
system.time(
<- MCMCglmm(breeding ~ mass + sex,
mod2_mcmcglmm_logit3 random = ~ phylo + species,
family = "categorical",
data = sim_data2,
prior = prior2_mcmcglmm,
ginverse = list(phylo = inv_phylo$Ainv),
nitt = 13000*20,
thin = 10*20,
burnin = 3000*20)
)
# brms
<- ape::vcv.phylo(tree2, corr = TRUE)
A
<- default_prior(
priors2_brms ~ 1 + (1 | gr(phylo, cov = A)) + (1 | species),
breeding data = sim_data2,
data2 = list(A = A),
family = bernoulli(link = "logit")
)
system.time(
<- brm(breeding ~ 1 + (1 | gr(phylo, cov = A)) + (1 | species),
mod2_brms_logit data = sim_data2,
data2 = list(A = A),
family = bernoulli(link = "logit"),
prior = priors2_brms,
iter = 8000,
warmup = 6000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
)
)
<- default_prior(
priors2_brms2 ~ mass + (1 | gr(phylo, cov = A)) + (1 | species),
breeding data = sim_data2,
data2 = list(A = A),
family = bernoulli(link = "logit")
)
system.time(
<- brm(breeding ~ mass + (1 | gr(phylo, cov = A)) + (1 | species),
mod2_brms_logit2 data = sim_data2,
data2 = list(A = A),
family = bernoulli(link = "logit"),
prior = priors2_brms2,
iter = 8000,
warmup = 6000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
)
)
<- default_prior(
priors2_brms3 ~ mass + sex + (1 | gr(phylo, cov = A)) + (1 | species),
breeding data = sim_data2,
data2 = list(A = A),
family = bernoulli(link = "logit")
)
system.time(
<- brm(breeding ~ mass + sex + (1 | gr(phylo, cov = A)) + (1 | species),
mod2_brms_logit3 data = sim_data2,
data2 = list(A = A),
family = bernoulli(link = "logit"),
prior = priors2_brms3,
iter = 8000,
warmup = 6000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
) )
Results
True values are here:
# actual mean and variance of y
mean(eta1)
[1] 1.191048
var(eta1)
[1] 1.277676
var(random_effect_phylo)
[1] 0.2981068
var(random_effect_non_phylo)
[1] 0.3090161
Before checking the output,we need to correct the results from MCMCglmm
using the following equation:
<- (16 * sqrt(3) / (15 * pi))^2 c2
summary(mod2_mcmcglmm_logit)
# Iterations = 30001:129901
# Thinning interval = 100
# Sample size = 1000
#
# DIC: 550.715
#
# G-structure: ~phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# phylo 0.9548 8.97e-06 2.355 639.4
#
# ~species
#
# post.mean l-95% CI u-95% CI eff.samp
# species 0.4156 3.523e-08 1.201 486.4
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: breeding ~ 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) 1.1605 0.1331 2.1428 1272 0.034 *
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# > res_1 <- mod2_mcmcglmm_logit$Sol / sqrt(1+c2) # for fixed effects
# > res_2 <- mod2_mcmcglmm_logit$VCV / (1+c2) # for variance components
# > summary(res_1)
#
# Iterations = 30001:129901
# Thinning interval = 100
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# 1.00036 0.43343 0.01371 0.01215
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# 0.1144 0.7425 1.0093 1.2709 1.8407
<- mod2_mcmcglmm_logit$Sol / sqrt(1+c2) # for fixed effects
res_1 <- mod2_mcmcglmm_logit$VCV / (1+c2) # for variance components
res_2
summary(res_1)
# Iterations = 30001:129901
# Thinning interval = 100
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# 1.00036 0.43343 0.01371 0.01215
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# 0.1144 0.7425 1.0093 1.2709 1.8407
summary(res_2)
# Iterations = 30001:129901
# Thinning interval = 100
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# phylo 0.7094 0.5174 0.016361 0.02046
# species 0.3088 0.2845 0.008996 0.01290
# units 0.7430 0.0000 0.000000 0.00000
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# phylo 0.02672 0.3361 0.5973 0.9889 2.040
# species 0.00126 0.0810 0.2530 0.4560 1.028
# units 0.74303 0.7430 0.7430 0.7430 0.743
summary(mod2_brms_logit)
# Family: bernoulli
# Links: mu = logit
# Formula: breeding ~ 1 + (1 | gr(phylo, cov = A)) + (1 | species)
# Data: sim_data2 (Number of observations: 500)
# Draws: 2 chains, each with iter = 8000; warmup = 6000; thin = 1;
# total post-warmup draws = 4000
#
# Multilevel Hyperparameters:
# ~phylo (Number of levels: 100)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.77 0.31 0.18 1.41 1.01 542 820
#
# ~species (Number of levels: 100)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.51 0.25 0.03 1.00 1.00 475 924
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept 0.99 0.42 0.13 1.81 1.00 1407 1848
The fixed effect estimates in both MCMCglmm
and brms
are close to the true values, but the phylogenetic random effects are not performing well. However, the wide 95% confidence intervals indicate that they have very large uncertainties.
And phylogenetic signal is…
# MCMCglmm
<- ((mod2_mcmcglmm_logit$VCV[, "phylo"]/(1+c2)) / (mod2_mcmcglmm_logit$VCV[, "phylo"]/(1+c2) + mod2_mcmcglmm_logit$VCV[, "species"]/(1+c2) + 1))
phylo_signal_mod2_mcmcglmm
%>% mean()
phylo_signal_mod2_mcmcglmm # [1] 0.3233245
%>% quantile(probs = c(0.025,0.5,0.975))
phylo_signal_mod2_mcmcglmm # 2.5% 50% 97.5%
# 0.01696071 0.32167758 0.64318606
# brms
<- mod2_brms_logit %>% as_tibble() %>%
phylo_signal_mod2_brms ::select(Sigma_phy = sd_phylo__Intercept, Sigma_non_phy = sd_species__Intercept) %>%
dplyrmutate(lambda_binary = Sigma_phy^2 / (Sigma_phy^2 + Sigma_non_phy^2 +1)) %>%
pull(lambda_binary)
%>% mean()
phylo_signal_mod2_brms # [1] 0.3117823
%>% quantile(probs = c(0.025,0.5,0.975))
phylo_signal_mod2_brms # 2.5% 50% 97.5%
# 0.01973947 0.30714261 0.63302530
The estimated phylogenetic signal values were similar in both models.
summary(mod2_mcmcglmm_logit2)
# Iterations = 60001:259801
# Thinning interval = 200
# Sample size = 1000
#
# DIC: 543.8511
#
# G-structure: ~phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# phylo 1.002 2.289e-06 2.566 972.4
#
# ~species
#
# post.mean l-95% CI u-95% CI eff.samp
# species 0.5078 4.507e-06 1.34 868.1
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: breeding ~ mass
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) -1.5979 -3.6412 0.6322 1000 0.126
# mass 0.6094 0.2202 1.0957 1000 <0.001 ***
<- (16 * sqrt(3) / (15 * pi))^2
c2 <- mod2_mcmcglmm_logit2$Sol / sqrt(1+c2) # for fixed effects
res_1 <- mod2_mcmcglmm_logit2$VCV / (1+c2) # for variance components
res_2
summary(res_1)
# Iterations = 60001:259801
# Thinning interval = 200
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# (Intercept) -1.3774 0.9619 0.030419 0.030419
# mass 0.5253 0.1910 0.006038 0.006038
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# (Intercept) -3.2469 -2.0173 -1.3249 -0.7434 0.4585
# mass 0.1571 0.3912 0.5174 0.6488 0.9140
summary(res_2)
# Iterations = 60001:259801
# Thinning interval = 200
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# phylo 0.7442 0.5982 0.01892 0.01918
# species 0.3773 0.3251 0.01028 0.01103
# units 0.7430 0.0000 0.00000 0.00000
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# phylo 0.016307 0.3010 0.6173 1.0441 2.215
# species 0.001679 0.1132 0.3198 0.5479 1.194
# units 0.743029 0.7430 0.7430 0.7430 0.743
summary(mod2_brms_logit2)
# Family: bernoulli
# Links: mu = logit
# Formula: breeding ~ mass + (1 | gr(phylo, cov = A)) + (1 | species)
# Data: sim_data2 (Number of observations: 500)
# Draws: 2 chains, each with iter = 8000; warmup = 6000; thin = 1;
# total post-warmup draws = 4000
#
# Multilevel Hyperparameters:
# ~phylo (Number of levels: 100)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.80 0.35 0.15 1.53 1.00 418 497
#
# ~species (Number of levels: 100)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.55 0.28 0.03 1.09 1.00 414 766
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept -1.42 1.01 -3.51 0.45 1.00 2097 1850
# mass 0.52 0.19 0.15 0.92 1.00 2664 2631
summary(mod2_mcmcglmm_logit3)
# Iterations = 60001:259801
# Thinning interval = 200
# Sample size = 1000
#
# DIC: 493.3098
#
# G-structure: ~phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# phylo 1.629 0.0009646 4.062 910
#
# ~species
#
# post.mean l-95% CI u-95% CI eff.samp
# species 0.6195 1.258e-06 1.682 756.2
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: breeding ~ mass + sex
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) -0.8882 -3.4396 1.7246 1000 0.462
# mass 0.7312 0.2433 1.2183 1000 0.002 **
# sexM -2.0358 -2.6616 -1.4547 1000 <0.001 ***
<- (16 * sqrt(3) / (15 * pi))^2
c2 <- mod2_mcmcglmm_logit3$Sol / sqrt(1+c2) # for fixed effects
res_1 <- mod2_mcmcglmm_logit3$VCV / (1+c2) # for variance components
res_2
summary(res_1)
# Iterations = 60001:259801
# Thinning interval = 200
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# (Intercept) -0.7656 1.1431 0.036149 0.036149
# mass 0.6303 0.2197 0.006946 0.006946
# sexM -1.7548 0.2780 0.008791 0.008791
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# (Intercept) -2.9652 -1.4957 -0.7345 -0.08661 1.485
# mass 0.2142 0.4818 0.6342 0.77977 1.060
# sexM -2.3071 -1.9429 -1.7496 -1.56044 -1.263
summary(res_2)
# Iterations = 60001:259801
# Thinning interval = 200
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# phylo 1.2105 0.9233 0.02920 0.03061
# species 0.4603 0.3948 0.01249 0.01436
# units 0.7430 0.0000 0.00000 0.00000
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# phylo 0.043912 0.6072 1.031 1.573 3.571
# species 0.003595 0.1428 0.374 0.671 1.391
# units 0.743029 0.7430 0.743 0.743 0.743
summary(mod2_brms_logit3)
# Family: bernoulli
# Links: mu = logit
# Formula: breeding ~ mass + sex + (1 | gr(phylo, cov = A)) + (1 | species)
# Data: sim_data2 (Number of observations: 500)
# Draws: 2 chains, each with iter = 8000; warmup = 6000; thin = 1;
# total post-warmup draws = 4000
#
# Multilevel Hyperparameters:
# ~phylo (Number of levels: 100)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 1.01 0.40 0.24 1.86 1.00 776 1072
#
# ~species (Number of levels: 100)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.60 0.30 0.05 1.18 1.00 670 1537
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept -0.79 1.14 -3.12 1.38 1.00 3595 3216
# mass 0.62 0.22 0.19 1.06 1.00 4898 3060
# sexM -1.75 0.28 -2.29 -1.23 1.00 6333 3267
The coefficient estimates for both mass and sex were close to the values we set :)
Ordinal
The response variable is the aggressiveness level, which has three categories: Low, Medium, and High. It is assumed that phylogeny plays a significant role in influencing aggressiveness. However, non-phylogenetic factors are expected to have only a weak relationship with aggressiveness. The relationship between body mass and aggressiveness is unclear, and it may vary across species. Additionally, males are more likely to exhibit higher levels of aggressiveness compared to females.
set.seed(789)
<- pbtree(n = 100, scale = 1)
tree3
# extract the correlation matrix from the phylogenetic tree
<- vcv(tree3, corr = TRUE)
phylo_cor
# define parameters
<- 100 # number of species (clusters)
n_species <- 5 # number of observations per species
obs_per_species <- n_species * obs_per_species # Total number of observations
n
# simulate fixed effects
## continuous variable: body mass
<- (log(4)+ log(2500))/2 # mean of whole 100 species body mass
mu_mass <- mvrnorm(n = 1, # species level body mass
mu_species_mass mu = rep(mu_mass, ncol(phylo_cor)),
Sigma = phylo_cor*1)
<- sapply(mu_species_mass,
obs_mass function(x) rnorm(obs_per_species, mean = x, sd = sqrt(0.2))) %>%
as.vector() # each observation body mass
## binomial variable: sex
<- rbinom(n, 1, 0.5)
sex
# simulate random effect parameter
## variance components
<- 1
sigma_phylo <- 0.4 # standard deviation of non-phylogenetic random effect
sigma_non
## phylogenetic random effect
<- mvrnorm(
random_effect_phylo n = 1,
mu = rep(0, ncol(phylo_cor)),
Sigma = sigma_phylo*phylo_cor)
<- rep(random_effect_phylo, each = obs_per_species)
phylo_effect
## non-phylogenetic random effect
<- rnorm(n_species, sd = sqrt(sigma_non))
random_effect_non_phylo <- rep(random_effect_non_phylo, each = obs_per_species)
non_phylo_effect
# define fixed effect coefficients
<- -0.5 # intercept
beta_o0 <- 0 # coefficient for mass - controversial whether body mass is related to aggressiveness
beta_o1 <- 1 # coefficient for sex - males are often more aggressive than females
beta_o2
# simulate the linear predictor
<- beta_o0 + beta_o1 * obs_mass + beta_o2 * sex + non_phylo_effect + phylo_effect
eta
# define thresholds (cutpoints) for the ordinal categories
<- c(-0.5, 0.5)
cutpoints
# Calculate cumulative probabilities using the probit link function
<- pnorm(cutpoints[1] - eta)
p1 <- pnorm(cutpoints[2] - eta) - p1
p2 <- 1 - pnorm(cutpoints[2] - eta)
p3
# combine probabilities into a matrix
<- cbind(p1, p2, p3)
probs
# simulate the ordinal response
<- apply(probs, 1, function(prob) sample(1:3, 1, prob = prob))
aggressive_level
# create a data frame
<- data.frame(
sim_data3 <- rep(1:n, each = 1),
individual_id phylo = names(phylo_effect),
species = names(phylo_effect),
aggressive_level = factor(aggressive_level,
levels = 1:3,
labels = c("Low", "Medium", "High"),
ordered = TRUE),
sex = factor(sex, labels = c("F", "M")),
mass = obs_mass
%>%
) arrange(species)
# table(sim_data3$aggressive_level)
# Low Medium High
# 97 143 260
Run models
# mcmcglmm
<- inverseA(tree3, nodes = "ALL", scale = TRUE)
inv_phylo
<- list(R = list(V = 1, fix = 1),
prior1 G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10),
G2 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)
)
)
system.time(
<- MCMCglmm(aggressive_level ~ 1,
mod3_mcmcglmm random = ~ phylo + species,
ginverse = list(phylo = inv_phylo$Ainv),
family = "threshold",
data = sim_data3,
prior = prior1,
nitt = 13000*20,
thin = 10*20,
burnin = 3000*20
)
)
system.time(
<- MCMCglmm(aggressive_level ~ mass,
mod3_mcmcglmm2 random = ~ phylo + species,
ginverse = list(phylo = inv_phylo$Ainv),
family = "threshold",
data = sim_data3,
prior = prior1,
nitt = 13000*20,
thin = 10*20,
burnin = 3000*20
)
)
system.time(
<- MCMCglmm(aggressive_level ~ mass + sex,
mod3_mcmcglmm3 random = ~ phylo + species,
ginverse = list(phylo = inv_phylo$Ainv),
family = "threshold",
data = sim_data3,
prior = prior1,
nitt = 13000*20,
thin = 10*20,
burnin = 3000*20
)
)
# brms
<- ape::vcv.phylo(tree3, corr = TRUE)
A
<- default_prior(
default_priors ~ 1 + (1 | gr(phylo, cov = A)) + (1 | species),
aggressive_level data = sim_data3,
family = cumulative(link = "probit"),
data2 = list(A = A)
)
system.time(
<- brm(
mod3_brms formula = aggressive_level ~ 1 + (1 | gr(phylo, cov = A)) + (1 | species),
data = sim_data3,
family = cumulative(link = "probit"),
data2 = list(A = A),
prior = default_priors,
iter = 9000,
warmup = 7000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
)
)
<- default_prior(
default_priors2 ~ mass + (1 | gr(phylo, cov = A)) + (1 | species),
aggressive_level data = sim_data3,
family = cumulative(link = "probit"),
data2 = list(A = A)
)
system.time(
<- brm(
mod3_brms2 formula = aggressive_level ~ mass + (1 | gr(phylo, cov = A)) + (1 | species),
data = sim_data3,
family = cumulative(link = "probit"),
data2 = list(A = A),
prior = default_priors2,
iter = 9000,
warmup = 7000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
)
)
<- default_prior(
default_priors3 ~ mass + sex + (1 | gr(phylo, cov = A)) + (1 | species),
aggressive_level data = sim_data3,
family = cumulative(link = "probit"),
data2 = list(A = A)
)
system.time(
<- brm(
mod3_brms3 formula = aggressive_level ~ mass + sex + (1 | gr(phylo, cov = A)) + (1 | species),
data = sim_data3,
family = cumulative(link = "probit"),
data2 = list(A = A),
prior = default_priors3,
iter = 9000,
warmup = 7000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
) )
Results
The estimates should align with the following values…
mean(eta)
[1] 0.7347869
var(eta)
[1] 1.484684
var(random_effect_phylo)
[1] 0.8430567
var(random_effect_non_phylo)
[1] 0.3352239
summary(mod3_mcmcglmm)
# Iterations = 60001:259801
# Thinning interval = 200
# Sample size = 1000
#
# DIC: 864.7065
#
# G-structure: ~phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# phylo 1.117 0.2213 2.167 1000
#
# ~species
#
# post.mean l-95% CI u-95% CI eff.samp
# species 0.3178 9.188e-07 0.6704 1099
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: aggressive_level ~ 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) 1.3987 0.5334 2.1765 1000 <0.001 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Cutpoints:
#
# post.mean l-95% CI u-95% CI eff.samp
# cutpoint.traitaggressive_level.1 1.12 0.9581 1.277 1007
posterior_summary(mod3_mcmcglmm$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) 1.398716 0.4201698 0.5994656 2.295179
posterior_summary(mod3_mcmcglmm$VCV)
# Estimate Est.Error Q2.5 Q97.5
# phylo 1.1170856 0.5334151 0.36097048 2.3726763
# species 0.3178276 0.1906041 0.02409845 0.7331308
# units 1.0000000 0.0000000 1.00000000 1.0000000
posterior_summary(mod3_mcmcglmm$CP)
# Estimate Est.Error Q2.5 Q97.5
# cutpoint.traitaggressive_level.1 1.119532 0.08512686 0.9590519 1.278939
summary(mod3_brms)
# Family: cumulative
# Links: mu = probit; disc = identity
# Formula: aggressive_level ~ 1 + (1 | gr(phylo, cov = A)) + (1 | species)
# Data: sim_data3 (Number of observations: 500)
# Draws: 2 chains, each with iter = 9000; warmup = 7000; thin = 1;
# total post-warmup draws = 4000
#
# Multilevel Hyperparameters:
# ~phylo (Number of levels: 100)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 1.01 0.24 0.58 1.52 1.00 731 1479
#
# ~species (Number of levels: 100)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.55 0.17 0.19 0.87 1.00 578 657
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept[1] -1.32 0.42 -2.16 -0.51 1.00 2085 2250
# Intercept[2] -0.20 0.41 -1.03 0.60 1.00 2074 2333
#
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# disc 1.00 0.00 1.00 1.00 NA NA NA
the estimates are consistent with the true values, and both model outputs are as well. Then, phylogenetic signal is…
# MCMCglmm
<- ((mod3_mcmcglmm$VCV[, "phylo"]) / (mod3_mcmcglmm$VCV[, "phylo"] + mod3_mcmcglmm$VCV[, "species"] + 1))
phylo_signal_mod3_mcmcglmm
%>% mean()
phylo_signal_mod3_mcmcglmm # [1] 0.438553
%>% quantile(probs = c(0.025,0.5,0.975))
phylo_signal_mod3_mcmcglmm # 2.5% 50% 97.5%
# 0.1911082 0.4385983 0.6808732
# brms
<- mod3_brms %>% as_tibble() %>%
phylo_signal_mod3_brms ::select(Sigma_phy = sd_phylo__Intercept, Sigma_non_phy = sd_species__Intercept) %>%
dplyrmutate(lambda_ordinal = Sigma_phy^2 / (Sigma_phy^2 + Sigma_non_phy^2 +1)) %>%
pull(lambda_ordinal)
%>% mean()
phylo_signal_mod3_brms # [1] 0.4288363
%>% quantile(probs = c(0.025,0.5,0.975))
phylo_signal_mod3_brms # 2.5% 50% 97.5%
# 0.1788238 0.4319170 0.6689190
Nice - both models yielded nearly identical phylogenetic signals.
summary(mod3_mcmcglmm2)
# Iterations = 60001:259801
# Thinning interval = 200
# Sample size = 1000
#
# DIC: 862.4181
#
# G-structure: ~phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# phylo 1.208 0.3157 2.251 1000
#
# ~species
#
# post.mean l-95% CI u-95% CI eff.samp
# species 0.2989 2.879e-06 0.6329 1000
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: aggressive_level ~ mass
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) 2.16331 0.84674 3.45252 1000 0.002 **
# mass -0.16112 -0.36658 0.03244 1000 0.126
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Cutpoints:
#
# post.mean l-95% CI u-95% CI eff.samp
# cutpoint.traitaggressive_level.1 1.126 0.9627 1.299 1000
posterior_summary(mod3_mcmcglmm2$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) 2.163314 0.6643733 0.8905397 3.49905134
# mass -0.161119 0.1028227 -0.3548954 0.04503046
posterior_summary(mod3_mcmcglmm2$VCV)
# Estimate Est.Error Q2.5 Q97.5
# phylo 1.2078921 0.5324432 0.400101109 2.4861135
# species 0.2989292 0.1824707 0.009196968 0.7111517
# units 1.0000000 0.0000000 1.000000000 1.0000000
posterior_summary(mod3_mcmcglmm2$CP)
# Estimate Est.Error Q2.5 Q97.5
# cutpoint.traitaggressive_level.1 1.125844 0.08611264 0.9675762 1.303084
summary(mod3_brms2)
# Family: cumulative
# Links: mu = probit; disc = identity
# Formula: aggressive_level ~ mass + (1 | gr(phylo, cov = A)) + (1 | species)
# Data: sim_data3 (Number of observations: 500)
# Draws: 2 chains, each with iter = 9000; warmup = 7000; thin = 1;
# total post-warmup draws = 4000
#
# Multilevel Hyperparameters:
# ~phylo (Number of levels: 100)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 1.08 0.25 0.63 1.63 1.00 797 1655
#
# ~species (Number of levels: 100)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.50 0.19 0.08 0.84 1.00 526 647
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept[1] -2.11 0.66 -3.41 -0.84 1.00 4761 3053
# Intercept[2] -0.99 0.65 -2.28 0.29 1.00 4540 2992
# mass -0.16 0.10 -0.37 0.04 1.00 6574 2840
#
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# disc 1.00 0.00 1.00 1.00 NA NA NA
summary(mod3_mcmcglmm3)
# Iterations = 60001:259801
# Thinning interval = 200
# Sample size = 1000
#
# DIC: 785.7335
#
# G-structure: ~phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# phylo 1.664 0.3929 3.216 1000
#
# ~species
#
# post.mean l-95% CI u-95% CI eff.samp
# species 0.4061 9.11e-06 0.8454 1000
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: aggressive_level ~ mass + sex
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) 1.6060 0.1500 3.1995 1000 0.028 *
# mass -0.1199 -0.3519 0.1025 1000 0.308
# sexM 1.0586 0.7945 1.3036 1117 <0.001 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Cutpoints:
#
# post.mean l-95% CI u-95% CI eff.samp
# cutpoint.traitaggressive_level.1 1.277 1.091 1.487 1000
posterior_summary(mod3_mcmcglmm3$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) 1.6059858 0.7827238 0.09396171 3.1803784
# mass -0.1198975 0.1181710 -0.35162836 0.1026163
# sexM 1.0585665 0.1318889 0.81133692 1.3344116
posterior_summary(mod3_mcmcglmm3$VCV)
# Estimate Est.Error Q2.5 Q97.5
# phylo 1.663718 0.8053196 0.58185142 3.5586348
# species 0.406108 0.2440002 0.02279807 0.9607917
# units 1.000000 0.0000000 1.00000000 1.0000000
posterior_summary(mod3_mcmcglmm3$CP)
# Estimate Est.Error Q2.5 Q97.5
# cutpoint.traitaggressive_level.1 1.277118 0.1012452 1.088106 1.485881
summary(mod3_brms3)
# Family: cumulative
# Links: mu = probit; disc = identity
# Formula: aggressive_level ~ mass + sex + (1 | gr(phylo, cov = A)) + (1 | species)
# Data: sim_data3 (Number of observations: 500)
# Draws: 2 chains, each with iter = 9000; warmup = 7000; thin = 1;
# total post-warmup draws = 4000
#
# Multilevel Hyperparameters:
# ~phylo (Number of levels: 100)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 1.25 0.30 0.72 1.90 1.00 423 1179
#
# ~species (Number of levels: 100)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.59 0.21 0.11 0.97 1.00 357 408
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept[1] -1.58 0.76 -3.08 -0.12 1.00 2467 2888
# Intercept[2] -0.30 0.76 -1.80 1.14 1.00 2420 3074
# mass -0.13 0.11 -0.35 0.09 1.00 3431 3348
# sexM 1.05 0.13 0.80 1.31 1.00 5609 3295
#
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# disc 1.00 0.00 1.00 1.00 NA NA NA
Both MCMCglmm
and brms
estimated very similar values, and the fixed effect coefficients were close to the values set in the simulation.
Nominal
The response variable is lateralisation (handedness/footedness), categorized as Left, Right, or Both, with “Both” serving as the reference category, representing non-preference. It is assumed that phylogeny is related to lateralisation, with a weak negative correlation between left and right preferences. Non-phylogenetic factors are expected to have only a weak influence on lateralisation. While body mass may have a minor effect, sex would not have significant effect on lateralisation. The majority of individuals tend to show a preference for either the left or right side, with only a few exhibiting no clear preference.
Here, we create a simple dataset and check the results from the intercept-only model, as there are some challenges. The nominal model is more complex, so when we use similar settings to the three models above, the sample size we have set is too small. As a result, the variance and correlation do not align. Therefore, in this case, we do not include the effects of mass or sex when simulating the linear predictor. Additionally, both phylogenetic and non-phylogenetic correlations are set to zero.
set.seed(789)
<- pbtree(n = 100, scale = 1)
tree4
# extract the correlation matrix from the phylogenetic tree
<- vcv(tree4, corr = TRUE)
phylo_cor
# define parameters
<- 100 # number of species (clusters)
n_species <- 5 # number of observations per species
obs_per_species <- n_species * obs_per_species # total number of observations
n <- 3
n_categories
# define standard deviations and correlations for the random effects
<- c(1, 1.5) # standard deviations for phylogenetic random effects for left and right
sigma_phylo <- 0 # correlation between phylogenetic random effects for left and right
cor_phylo <- c(1.1, 1) # standard deviations for non-phylogenetic random effects for left and right
sigma_u <- 0 # correlation between non-phylogenetic random effects for left and right
cor_u
# covariance matrices
<- diag(sigma_u^2)
cov_u 1, 2] <- cov_u[2, 1] <- cor_u * sigma_u[1] * sigma_u[2]
cov_u[
<- diag(sigma_phylo^2)
cov_phylo 1, 2] <- cov_phylo[2, 1] <- cor_phylo * sigma_phylo[1] * sigma_phylo[2]
cov_phylo[
# simulate phylogenetic random effects
<- mvrnorm(n = 1, mu = rep(0, n_species * 2), Sigma = kronecker(cov_phylo, phylo_cor))
phylo_random_effects <- rep(phylo_random_effects[1:n_species], each = obs_per_species)
phylo_effect_1 <- rep(phylo_random_effects[(n_species + 1):(2 * n_species)], each = obs_per_species)
phylo_effect_2
# simulate non-phylogenetic random effects
<- mvrnorm(n_species, mu = c(0, 0), Sigma = cov_u)
non_phylo_random_effects <- rep(non_phylo_random_effects[, 1], each = obs_per_species)
non_phylo_effect_1 <- rep(non_phylo_random_effects[, 2], each = obs_per_species)
non_phylo_effect_2
# define fixed effect coefficients for each category relative to the reference category (Both)
<- c(0.4, 0.8) # intercept for left and right
beta_m0 <- c(0.2, -0.1) # coefficient for mass for left and right
beta_m1 <- c(0, 0) # coefficient for sex for left and right
beta_m2
# simulate the linear predictor for each category relative to the reference
<- matrix(0, nrow = n, ncol = 2)
eta 1] <- beta_m0[1] + non_phylo_effect_1 + phylo_effect_1
eta[, 2] <- beta_m0[2] + non_phylo_effect_2 + phylo_effect_2
eta[,
# calculate probabilities using the multinomial logit link function
<- exp(cbind(eta, 0)) # adding a column of zeros for the reference category
exp_eta <- exp_eta / rowSums(exp_eta)
probs
# simulate the multinomial response
<- apply(probs, 1, function(prob) sample(1:3, 1, prob = prob))
lateralisation
# generate individual ID for each observation
<- rep(1:n, each = 1) # create unique individual IDs for each observation
individual_id
# create a data frame
<- data.frame(
sim_data4 individual_id = individual_id,
phylo = rep(tree4$tip.label, each = 5),
species = rep(tree4$tip.label, each = 5),
lateralisation = factor(lateralisation,
levels = 1:n_categories,
labels = c("Left", "Right", "Both"),
ordered = FALSE),
sex = factor(sex, labels = c("F", "M")),
mass = obs_mass
%>%
) arrange(species)
$lateralisation <- relevel(sim_data4$lateralisation, ref = "Both")
sim_data4
# table(sim_data4$lateralisation)
# Both Left Right
# 45 208 247
Run models
<- inverseA(tree4, nodes = "ALL", scale = TRUE)
inv_phylo
<- list(
prior1 R = list(V = (matrix(1, 2, 2) + diag(2)) / 3, fix = 1),
G = list(G1 = list(V = diag(2), nu = 2,
alpha.mu = rep(0, 2), alpha.V = diag(2)
),G2 = list(V = diag(2), nu = 2,
alpha.mu = rep(0, 2), alpha.V = diag(2)
)
)
)
system.time(
<- MCMCglmm(lateralisation ~ trait - 1,
mcmcglmm_mod4 random = ~us(trait):phylo + us(trait):species,
rcov = ~us(trait):units,
ginverse = list(phylo = inv_phylo$Ainv),
family = "categorical",
data = sim_data4,
prior = prior1,
nitt = 13000*150,
thin = 10*150,
burnin = 3000*150
)
)
# brms
<- ape::vcv.phylo(tree4, corr = TRUE)
A <- default_prior(lateralisation ~ 1 + (1 |a| gr(phylo, cov = A)) + (1 |b| species),
priors_brms1 data = sim_data4,
data2 = list(A = A),
family = categorical(link = "logit")
)
system.time(
<- brm(lateralisation ~ 1 + (1 |a| gr(phylo, cov = A)) + (1 |b| species),
brms_mod4 data = sim_data4,
data2 = list(A = A),
family = categorical(link = "logit"),
prior = priors_brms1,
iter = 12000,
warmup = 8000,
chains = 2,
thin = 1,
cores = 2,
control = list(adapt_delta = 0.99),
) )
Results
True values are…
mean(eta[, 1])
[1] 1.498934
mean(eta[, 2])
[1] 1.71958
var(phylo_effect_1)
[1] 0.888621
var(phylo_effect_2)
[1] 1.853071
var(non_phylo_effect_1)
[1] 1.135754
var(non_phylo_effect_2)
[1] 1.184511
cor(phylo_effect_1, phylo_effect_2)
[1] 0.1435537
cor(non_phylo_effect_1, non_phylo_effect_2)
[1] 0.01583177
Before checking the output,we need to correct the results from MCMCglmm
using the following equation:
<- (16 * sqrt(3) / (15 * pi))^2
c2 <- c2*(2/3) c2a
summary(mod4_mcmcglmm)
# Iterations = 450001:1948501
# Thinning interval = 1500
# Sample size = 1000
#
# DIC: 758.4723
#
# G-structure: ~us(trait):phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# traitlateralisation.Left:traitlateralisation.Left.phylo 0.5216 2.422e-07 1.6198 1000
# traitlateralisation.Right:traitlateralisation.Left.phylo -0.4840 -2.272e+00 0.8416 1000
# traitlateralisation.Left:traitlateralisation.Right.phylo -0.4840 -2.272e+00 0.8416 1000
# traitlateralisation.Right:traitlateralisation.Right.phylo 4.6297 1.388e+00 9.1477 1000
#
# ~us(trait):species
#
# post.mean l-95% CI u-95% CI eff.samp
# traitlateralisation.Left:traitlateralisation.Left.species 1.6045 3.293e-01 3.3113 1000
# traitlateralisation.Right:traitlateralisation.Left.species -0.4089 -1.402e+00 0.5503 1000
# traitlateralisation.Left:traitlateralisation.Right.species -0.4089 -1.402e+00 0.5503 1000
# traitlateralisation.Right:traitlateralisation.Right.species 1.0943 1.113e-06 2.6165 1294
#
# R-structure: ~us(trait):units
#
# post.mean l-95% CI u-95% CI eff.samp
# traitlateralisation.Left:traitlateralisation.Left.units 0.6667 0.6667 0.6667 0
# traitlateralisation.Right:traitlateralisation.Left.units 0.3333 0.3333 0.3333 0
# traitlateralisation.Left:traitlateralisation.Right.units 0.3333 0.3333 0.3333 0
# traitlateralisation.Right:traitlateralisation.Right.units 0.6667 0.6667 0.6667 0
#
# Location effects: lateralisation ~ trait - 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# traitlateralisation.Left 0.70100 -0.04992 1.49781 1000 0.086 .
# traitlateralisation.Right 0.48833 -1.14012 2.22379 1000 0.578
<- mod4_mcmcglmm$Sol / sqrt(1+c2a) # for fixed effects
res_1 <- mod4_mcmcglmm$VCV / (1+c2a) # for variance components
res_2 <- (mod4_mcmcglmm$VCV[, 2]/(1+c2a)) /sqrt((mod4_mcmcglmm$VCV[, 1] * mod4_mcmcglmm$VCV[, 4])/(1+c2a))
res_3_corr_phylo <- (mod4_mcmcglmm$VCV[, 6]/(1+c2a)) /sqrt((mod4_mcmcglmm$VCV[, 5] * mod4_mcmcglmm$VCV[, 8])/(1+c2a))
res_3_corr_nonphylo
summary(res_1)
# Iterations = 450001:1948501
# Thinning interval = 1500
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# traitlateralisation.Left 0.6319 0.3392 0.01073 0.01073
# traitlateralisation.Right 0.4402 0.7946 0.02513 0.02513
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# traitlateralisation.Left -0.1209 0.44513 0.6354 0.8306 1.306
# traitlateralisation.Right -1.0131 -0.09662 0.4496 0.9362 2.079
summary(res_2)
# Iterations = 450001:1948501
# Thinning interval = 1500
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# traitlateralisation.Left:traitlateralisation.Left.phylo 0.4239 0.4635 0.01466 0.01466
# traitlateralisation.Right:traitlateralisation.Left.phylo -0.3933 0.6352 0.02009 0.02009
# traitlateralisation.Left:traitlateralisation.Right.phylo -0.3933 0.6352 0.02009 0.02009
# traitlateralisation.Right:traitlateralisation.Right.phylo 3.7622 1.7766 0.05618 0.05618
# traitlateralisation.Left:traitlateralisation.Left.species 1.3038 0.6761 0.02138 0.02138
# traitlateralisation.Right:traitlateralisation.Left.species -0.3323 0.3875 0.01226 0.01226
# traitlateralisation.Left:traitlateralisation.Right.species -0.3323 0.3875 0.01226 0.01226
# traitlateralisation.Right:traitlateralisation.Right.species 0.8892 0.7171 0.02268 0.01993
# traitlateralisation.Left:traitlateralisation.Left.units 0.5418 0.0000 0.00000 0.00000
# traitlateralisation.Right:traitlateralisation.Left.units 0.2709 0.0000 0.00000 0.00000
# traitlateralisation.Left:traitlateralisation.Right.units 0.2709 0.0000 0.00000 0.00000
# traitlateralisation.Right:traitlateralisation.Right.units 0.5418 0.0000 0.00000 0.00000
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# traitlateralisation.Left:traitlateralisation.Left.phylo 0.0009474 0.09136 0.2797 0.606425 1.6498
# traitlateralisation.Right:traitlateralisation.Left.phylo -1.8018394 -0.71723 -0.3356 -0.003957 0.8138
# traitlateralisation.Left:traitlateralisation.Right.phylo -1.8018394 -0.71723 -0.3356 -0.003957 0.8138
# traitlateralisation.Right:traitlateralisation.Right.phylo 1.3864388 2.45513 3.4326 4.670559 8.0450
# traitlateralisation.Left:traitlateralisation.Left.species 0.3061715 0.82577 1.1969 1.673444 2.8234
# traitlateralisation.Right:traitlateralisation.Left.species -1.1075741 -0.56624 -0.3359 -0.104277 0.4842
# traitlateralisation.Left:traitlateralisation.Right.species -1.1075741 -0.56624 -0.3359 -0.104277 0.4842
# traitlateralisation.Right:traitlateralisation.Right.species 0.0251312 0.33532 0.7494 1.269344 2.6216
# traitlateralisation.Left:traitlateralisation.Left.units 0.5417579 0.54176 0.5418 0.541758 0.5418
# traitlateralisation.Right:traitlateralisation.Left.units 0.2708789 0.27088 0.2709 0.270879 0.2709
# traitlateralisation.Left:traitlateralisation.Right.units 0.2708789 0.27088 0.2709 0.270879 0.2709
# traitlateralisation.Right:traitlateralisation.Right.units 0.5417579 0.54176 0.5418 0.541758 0.5418
summary(res_3_corr_phylo)
# Iterations = 450001:1948501
# Thinning interval = 1500
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# -0.32095 0.43943 0.01390 0.01465
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# -0.86925 -0.68977 -0.42416 -0.02273 0.67656
summary(res_3_corr_nonphylo)
# Iterations = 450001:1948501
# Thinning interval = 1500
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# -0.35545 0.34670 0.01096 0.01096
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# -0.8562 -0.6255 -0.4143 -0.1322 0.4288
summary(mod4_brms)
# Family: categorical
# Links: muLeft = logit; muRight = logit
# Formula: lateralisation ~ 1 + (1 | a | gr(phylo, cov = A)) + (1 | b | species)
# Data: sim_data4 (Number of observations: 500)
# Draws: 2 chains, each with iter = 12000; warmup = 8000; thin = 1;
# total post-warmup draws = 8000
#
# Multilevel Hyperparameters:
# ~phylo (Number of levels: 100)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(muLeft_Intercept) 0.60 0.37 0.03 1.41 1.00 1258 2496
# sd(muRight_Intercept) 1.99 0.46 1.21 3.02 1.00 2445 4477
# cor(muLeft_Intercept,muRight_Intercept) -0.28 0.46 -0.96 0.73 1.00 480 746
#
# ~species (Number of levels: 100)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(muLeft_Intercept) 1.16 0.31 0.56 1.79 1.00 2264 2590
# sd(muRight_Intercept) 0.94 0.41 0.14 1.76 1.00 1100 1616
# cor(muLeft_Intercept,muRight_Intercept) -0.35 0.39 -0.97 0.45 1.00 1699 2808
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# muLeft_Intercept 0.63 0.35 -0.11 1.32 1.00 3793 4140
# muRight_Intercept 0.37 0.79 -1.18 1.89 1.00 3953 4832
Please look at the correlation part - estimating random effect covariance is harder than estimating the random effects variance(s) - you now understand there is a different difficulty in estimating parameters. The 95% CIs are a quite large, they have very large uncertainties and with such large uncertainties, we do not expect they match…
fixed effects (beta) < random effects variance(s) <<< random effects covariance (correlation)
And phylogenetic signal is…
# average phylogenteic signal - both vs. left
## MCMCglmm
<- ((mod4_mcmcglmm$VCV[, "traitlateralisation.Left:traitlateralisation.Left.phylo"]/(1+c2a)) / (mod4_mcmcglmm$VCV[, "traitlateralisation.Left:traitlateralisation.Left.phylo"]/(1+c2a) + mod4_mcmcglmm$VCV[, "traitlateralisation.Left:traitlateralisation.Left.species"]/(1+c2a) + 1))
phylo_signalL_mod4_mcmcglmm
%>% mean()
phylo_signalL_mod4_mcmcglmm # [1] 0.1452383
%>% quantile(probs = c(0.025, 0.5, 0.975))
phylo_signalL_mod4_mcmcglmm # 2.5% 50% 97.5%
# 0.000305831 0.112652234 0.482842085
## brms
<- mod4_brms %>% as_tibble() %>%
phylo_signalL_mod4_brms ::select(Sigma_phy = sd_phylo__muLeft_Intercept, Sigma_non_phy = sd_species__muLeft_Intercept) %>%
dplyrmutate(lambda_nominalL = Sigma_phy^2 / (Sigma_phy^2 + Sigma_non_phy^2 + 1)) %>%
pull(lambda_nominalL)
%>% mean()
phylo_signalL_mod4_brms # [1] 0.1557183
%>% quantile(probs = c(0.025,0.5,0.975))
phylo_signalL_mod4_brms # 2.5% 50% 97.5%
# 0.0004165978 0.1182931266 0.5124994918
# average phylogenteic signal - both vs. right
## MCMCglmm
<- ((mod4_mcmcglmm$VCV[, "traitlateralisation.Right:traitlateralisation.Right.phylo"]/(1+c2a)) / (mod4_mcmcglmm$VCV[, "traitlateralisation.Right:traitlateralisation.Right.phylo"]/(1+c2a) + mod4_mcmcglmm$VCV[, "traitlateralisation.Right:traitlateralisation.Right.species"]/(1+c2a) + 1))
phylo_signalR_mod4_mcmcglmm
%>% mean()
phylo_signalR_mod4_mcmcglmm # [1] 0.6446448
%>% quantile(probs = c(0.025,0.5,0.975))
phylo_signalR_mod4_mcmcglmm # 2.5% 50% 97.5%
# 0.3433895 0.6569055 0.8555424
## brms
<- mod4_brms %>% as_tibble() %>%
phylo_signalR_mod4_brms ::select(Sigma_phy = sd_phylo__muRight_Intercept, Sigma_non_phy = sd_species__muRight_Intercept) %>%
dplyrmutate(lambda_nominalR = Sigma_phy^2 / (Sigma_phy^2 + Sigma_non_phy^2 + 1)) %>%
pull(lambda_nominalR)
%>% mean()
phylo_signalR_mod4_brms # [1] 0.6524086
%>% quantile(probs = c(0.025,0.5,0.975))
phylo_signalR_mod4_brms # 2.5% 50% 97.5%
# 0.3728231 0.6642912 0.8648875
References and useful links
Sheard C, Skinner N, and Caro T. The evolution of rodent tail morphology. Am. Nat. 2024. 203:629-43. doi.org/10.1086/729751
Macdonald RX, Sheard C, Howell N, and Caro T. Primate coloration and colour vision: a comparative approach. Biol. J. Linn. Soc. Lond 2024. 141:435-455. doi.org/10.1093/biolinnean/blad089
Tobias et al. AVONET: morphological, ecological and geographical data for all birds. Ecol. Lett 2022. 25:581-597. doi.org/10.1111/ele.13898
Vehtari A, Gelman A, Simpson D, Carpenter B, Bürkner PC. Rank-normalization, folding, and localization: An improved \(\hat{\mathbf{R}}\) for assessing convergence of MCMC (with discussion). Bayesian. anal 2021. 16:667-718. doi.org/10.1214/20-BA1221