pacman::p_load("ape", "coda", "tidyverse", "here",
"MCMCglmm", "brms", "MASS",
"phytools", "arm", "posterior")A step-by-step tutorial
This online tutorial belongs to the paper “Promoting the use of phylogenetic multinomial generalised mixed-effects model to understand the evolution of discrete traits.” We are still updating the tutorial more reader-friendly - keep checking for the latest version!
Introduction
This tutorial provides a step-by-step guide to model different types of variables using MCMCglmm and brms. We show and explain how to implement 1. Gaussian, 2. binary, 3. ordinal, and 4. nominal models and interpret obtain results. We will also provide extensions: models with multiple data points per species and models with simulated data (simulation is powerful tool to understand the behaviour of the model and the effect of the prior distribution on the results!)
Why use phylogenetic comparative methods?
Phylogenetic comparative methods are used to consider the non-independence of species relatedness due to shared evolutionary history. If we ignore the phylogenetic relationship among species, we may violate the assumption of independence and underestimate the standard errors of the estimated parameters. These methods are widely used in evolutionary biology and ecology to test hypotheses about the evolution of traits and behaviours, reconstruct common ancestral traits, and investigate how different traits have co-evolved across species.
Why use MCMCglmm and brms in this tutorial?
MCMCglmm and brms allow us to make hierarchical and non-linear models. We can create more appropriate models that accurately reflect the underlying relationships within the data. Additionally, both packages can handle various types of response variables, including continuous, Poisson, binary, ordinal, and nominal data. Using Bayesian methods for parameter estimation, MCMCglmm and brms can incorporate prior distributions, which is particularly beneficial when working with limited data.
Intended audience
Our tutorial is designed for students and researchers who are familiar with using R language (at least know what is R!) and have an interest in using phylogenetic comparative methods. Although the examples in our tutorial focus on questions and data from Behavioural Ecology and Evolutionary Biology, we provide detailed explanations. Therefore, we believe these techniques are versatile and can be applied in other scientific disciplines, such as environment science, psychology, and beyond.
Setup R in your computer
Please install the following packages before running the code: MCMCglmm, brms, ape, phytools, here, coda, MASS, tidyverse, arm. If this is the first time using brms, you need to install Rstan before installing brms. Please follow the instruction on the official website.
Note 1: some models (especially nominal models) will take a long time to run. Thus, we recommend when you want to compare the results, you can save the model output as .rds file using saveRDS() and read it by readRDS() when you want to compare the results.
Note 2: Because of the stochastic nature of MCMC, every time you (re)run a model, you will get a slightly different output even the model was well-mixing and converge. So even if you run the same model using the same computer, it would always be different to whatever our output here.
Note 3: Ideally, the effective sample size of a model should be close to 1000. However, We consider a model’s results reliable if the effective sample size (ESS) exceeds 400, as suggested by Vehtari et al. (2021). According to their recommendation, an ESS greater than 400 is typically sufficient to ensure stable estimates of variances, autocorrelations, and Monte Carlo standard errors based on their practical experience and simulations (more detailed information; Vehtari et al. 2021). Here, qe evaluate comprehensively in line with other convergence diagnostic indicators, such as R hat and autocorrelation plots.
1. Gaussian models
Gaussian models are used when the response variable is continuous (numeric) and normally distributed. For example, age, body size (e.g. weight, length, or height of an organism), temperature, and distance.
Explanation of dataset
We use the rodent dataset and phylogenetic tree provided by Sheard et al. (2024) to explain how to model Gaussian models. The data is about the rodent suborder Sciuromorpha (223 species), which includes squirrels, chipmunks, dormice, and the mountain beaver. The dataset contains several information including the tail length, body mass, mean annual temperature, and the presence of a tail contrasting tip. The tail contrasting tip is a white or black tip at the end of the tail. In this section, we test the evolutionary relationships between tail length & body mass (response variables) and mean annual temperature and the presence of tail contrasting tip (predictor variables) in rodents.
First of all, we modify the original dataset.
# 1. Read dataset
dt <- read.csv(here("data", "potential", "Rodent_tail", "RodentData.csv")) # Replace with your own folder path to where the data is stored
str(dt) # check dataset - replace table format!!
dt <- dt %>% rename(Phylo = UphamTreeName.full) # Rename the 'UphamTreeName.full' column to 'Phylo'
dt <- dt %>% filter(Suborder == "Sciuromorpha") # Filter the rodent suborder
dt <- subset(dt, !is.na(Mass)) # Remove the species do not have body mass data
# Tail length and body mass: log-transform to reduce skewness, then standardise (centre and scale) to have a mean of 0 and a standard deviation of 1, making it easier to compare with other variables.
# Temperature, shade score and litter size: standardise
dt$zLength <- scale(log(dt$Tail_length))
dt$zMass <- scale(log(dt$Mass))
dt$zShade<-scale(dt$shade_score)
dt$zTemp <- scale(dt$Mean_Annual_Temp)
dt$zLitter_size <- scale(dt$Litter_size)
# Visualise zLength distribution
# hist(dt$zLength)
# Also, convert some columns to factors for analysis just in case
dt <- dt %>%
mutate(across(c(White.tips, Black.tips, Tufts, Naked, Fluffy, Contrasting, Noc, Di, Crep, Autotomy), as.factor))
# 2. Read phylogenetic tree
trees <- read.tree(here("data", "potential", "Rodent_tail", "RodentTrees.tre")) # 100 trees
tree <- trees[[1]] # In this time, we use only one tree
#Trim out everything from the tree that's not in the modified dataset
trees <- lapply(trees, drop.tip,tip=setdiff(tree$tip.label, dt$Phylo))
# Select one tree for trimming purposes
tree <- trees[[1]]
tree <- force.ultrametric(tree, method = c("extend")) # Convert non-ultrametric tree to ultrametric treeHow to model? How to interpret the output?
Let’s start by running the model. The first model we will fit is a simple animal model with no fixed effects, and only a random effect relating phylogeny. From here on, we will always begin with this simplest model to assess the phylogenetic signal of the response variable before adding more complexity by including one continuous and one categorical explanatory variable.
Making model is basically 3-4 steps:
Get phylogenetic variance-covariance (
MCMCglmm) / correlation (brms) matrixSet prior based on your model and run model with default
iteration,thinning, andburn-inCheck mixing whether the samples are adequately exploring the parameter space and convergence, posterior distribution of random and fixed effects
If model do not show good mixing or converge…
- change
iteration,thinning, andburn-in - change prior setting
- change
Univariate model
Intercept-only momdel
MCMCglmm
To get the variance-covariance matrix, we use the inverseA() function in MCMCglmm:
inv_phylo <- inverseA(tree, nodes = "ALL", scale = TRUE) # Calculate the inverse of the phylogenetic relatedness matrix for all nodes, scaling the resultsThen, we will define the priors for the phylogenetic effect and the residual variance using the following code:
prior1 <- list(R = list(V = 1, nu = 0.002), # Prior for residuals: weak inverse-Wishart prior for residual variance
G = list(G1 = list(V = 1, nu = 0.002))) # Prior for random effect: weak inverse-Wishart for random effectwe set the prior for the residual variance to be a scaled inverse-Wishart distribution with a scale parameter of 1 and degrees of freedom of 0.002. The prior for the phylogenetic effect is set to be a scaled inverse-Wishart distribution with a scale parameter of 1 and degrees of freedom of 0.002. This prior minimizes the influence of prior information, allowing the data to have a stronger impact on the model’s results. However, if you know your dataset characteristics, you can set a stronger (informative) prior.
InMCMCglmm, we can set four elements in prior: R (R-structure for residual), G (G-structure for random effect), B (fixed effects), and S (theta_scale parameter). Here, we explain B, R, G.
| Parameter | Which elements | Meaning | Effect of increasing the value | Effect of decreasing the value |
|---|---|---|---|---|
| V | B, R, G | Variance parameter for the prior distribution | Increases variability in the prior, allowing more flexibility in parameter estimation. The model becomes more influenced by the data (= wider prior distribution) | Decreases variability in the prior, leading to more constrained estimates. The model becomes more influenced by the prior (= narrower prior distribution) |
| mu | B | Mean of the prior distribution for fixed effects | Shifts the centre of the fixed effect estimates towards a larger value | Shifts the centre of the fixed effect estimates towards a smaller value |
| nu | R, G | Degrees of freedom for the inverse Wishart distribution. Controls the “certainty” of the prior. Larger values result in smaller variance of the prior (higher certainty) | The prior becomes narrower, tending to decrease the variability of the variance-covariance matrix estimates. Stronger prior information is given | The prior becomes wider, tending to increase the variability of the variance-covariance matrix estimates. Weaker prior information is given. nu = 0.002 is often used |
| fix | R, G | If fix = 1, the variance is fixed, and the specified prior value (V) is used. No estimation of variance is performed | NA | NA |
| alpha.mu | R, G | Prior mean for the variance-covariance matrix. Specifies the central matrix for the variance-covariance matrix estimates. | The variance-covariance matrix estimates tend to approach the matrix specified by alpha.mu. | The variance-covariance matrix estimates tend to approach the matrix specified by alpha.mu (influences each element in the decreasing direction). |
| alpha.V | R, G | Prior variance for the variance-covariance matrix. Controls the variability of the variance-covariance matrix estimates. | The variability of the variance-covariance matrix estimates tends to increase. | The variability of the variance-covariance matrix estimates tends to decrease. |
Note that the G element is for random effects variance (G1, G2, G3…). That means if you have 3 random effects, you need to define 3 priors in G.
InMCMCglmm, the prior setting allows the model to account for different variance structures by dividing the variance components into two distinct matrices: G (phylogeny) and R (non-phylogeny, i.e., residual variance). From this setup, we can estimate the phylogenetic heritability, also known as the phylogenetic signal (Pagel’s \(\lambda\)), using the following simple equation:
\[ H^2 = \frac{\sigma_{a}^2}{\sigma_{a}^2 + \sigma_{e}^2} \]
When we do not include any explanatory variables (= intercept-only model), the model is added ~ 1. We set a bit large nitt, thin, and burnin to improve mixing and effective sample sizes than default settings (nitt = 13000, thin = 10, burnin = 3000).
# You can check model running time by system.time() function. Knowing the running time is important when you try and error to find the best model setting!
system.time(
mcmcglmm_mg1 <- MCMCglmm(zLength ~ 1, # Response variable zLength with an intercept-only model
random = ~ Phylo, # Random effect for Phylo
ginverse = list(Phylo = inv_phylo$Ainv), # Specifies the inverse phylogenetic covariance matrix Ainv for Phylo, which accounts for phylogenetic relationships.
prior = prior1, # Prior distributions for the model parameters
family = "gaussian", # Family of the response variable (Gaussian for continuous data)
data = dt, # Dataset used for fitting the model
nitt = 13000*20, # Total number of iterations (multiplied by 20 to match scale)
thin = 10*20, # Thinning rate (multiplied by 20)
burnin = 3000*20 # Number of burn-in iterations (multiplied by 20)
)
)
summary(mcmcglmm_mg1)To begin with, we should check mixing and model convergence. We can check them via trace plots and effective sample size. The plot should look like a fuzzy caterpillar to make sure your model has converged.
plot(mcmcglmm_mg1$VCV) # Visualise variance component plot(mcmcglmm_mg1$Sol) # Visualise location effectsautocorr.plot(mcmcglmm_mg1$VCV) # Check chain mixingautocorr.plot(mcmcglmm_mg1$Sol) # Check chain mixingsummary(mcmcglmm_mg1) # See the result 95%HPD
# Iterations = 60001:259801
# Thinning interval = 200
# Sample size = 1000
# DIC: -26.20546
# G-structure: ~Phylo
# post.mean l-95% CI u-95% CI eff.samp
# Phylo 1.514 1.118 1.968 1000
# R-structure: ~units
# post.mean l-95% CI u-95% CI eff.samp
# units 0.02785 0.00892 0.04964 1000
# Location effects: zLength ~ 1
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) -0.1778 -1.3683 1.0638 1000 0.822
summary(mcmcglmm_mg1$VCV) # See the result of 95% Credible Interval of the variance component
# Iterations = 60001:259801
# Thinning interval = 200
# Number of chains = 1
# Sample size per chain = 1000
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
# Mean SD Naive SE Time-series SE
# Phylo 1.51394 0.22740 0.007191 0.007191
# units 0.02785 0.01066 0.000337 0.000337
# 2. Quantiles for each variable:
# 2.5% 25% 50% 75% 97.5%
# Phylo 1.11828 1.34700 1.49706 1.66954 1.9651
# units 0.01027 0.01996 0.02735 0.03436 0.0519
summary(mcmcglmm_mg1$Sol) # See the result of 95% Credible Interval of the location effects
# Iterations = 60001:259801
# Thinning interval = 200
# Number of chains = 1
# Sample size per chain = 1000
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
# Mean SD Naive SE Time-series SE
# -0.17781 0.63444 0.02006 0.02006
# 2. Quantiles for each variable:
# 2.5% 25% 50% 75% 97.5%
# -1.4874 -0.6218 -0.1435 0.2346 0.9961 It looks like both estimates mixed well. And the effective sample size (eff.samp) is large enough. The summary output shows the posterior mean, standard deviation, and 95% credible interval of the fixed (Location effects) and random effects (G-structure) and also residuals (R-structure).
brms
The syntax is similar to the packages lme4 and lmer in R. One of the advantages of brms is that it can fit complex models with multiple random effects and non-linear terms. The model fitting process is similar to MCMCglmm but with a different syntax. The output also differs between the two packages.
In brms, We can use the default_prior(or get_prior) function to set prior for the model. This function automatically make a weakly informative prior, which is suitable for most models (more detailed information is here[url]). Of course, you can also set your own prior if you are familiar with your datasets and want to set more informative one (see the section 4).
Things to keep in your mind are to use vcv.phylo(phylogenetic_tree, corr = TRUE) function to get the phylogenetic correlation matrix. brms vignettes use vcv.phylo(phylogenetic_tree), but it looks like incorrect. You need to specify corr = TRUE. Because the default is corr = FALSE, which means the function returns the phylogenetic variance-covariance matrix, not the correlation matrix.
We set large iter and warmup to improve mixing and effective sample sizes than default settings (iter = 2000, warmup = 1000). Also, we set control = list(adapt_delta = 0.95) to control parameter to avoid divergent transitions. Setting adapt_delta closer to 1 increases the step size during sampling to improve model stability.
By default, Stan runs four independent Markov chains (chain = 4), each exploring the posterior distribution. But four chains are rarely required; two are enough (chain = 2).
A <- ape::vcv.phylo(tree, corr = TRUE) # Get the phylogenetic correlation matrixIn brms, we can also check the Rhat value to see if the model has converged. The Rhat value should be close to 1.0.
plot(brms_mg1) # Visualise effectssummary(brms_mg1) # See the result
# Family: gaussian
# Links: mu = identity; sigma = identity
# Formula: zLength ~ 1 + (1 | gr(Phylo, cov = A))
# Data: dt (Number of observations: 300)
# Draws: 2 chains, each with iter = 10000; warmup = 5000; thin = 1;
# total post-warmup draws = 10000
# Multilevel Hyperparameters:
# ~Phylo (Number of levels: 300)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 1.09 0.07 0.95 1.24 1.00 1156 2572
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept -0.10 0.54 -1.16 0.97 1.00 2228 3739
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma 0.22 0.02 0.18 0.26 1.00 1091 2257
# Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
# and Tail_ESS are effective sample size measures, and Rhat is the potential
# scale reduction factor on split chains (at convergence, Rhat = 1).To check the consistency between the results obtained from MCMCglmm and brms, we compare the mean of posterior distribution of the fixed and random effect estimates. You should note that the output of random effect estimates and residuals are presented in slightly different formats between MCMCglmm and brms. In MCMCglmm, the outputs (G-structure / R-structure) are provided as variance, while in brms, the outputs (Multilevel Hyperparameters / Further Distributional Parameters) are given as standard deviation. Therefore, when comparing these estimates between the two packages, you need to square the values obtained from brms.
As you can see below, the MCMCglmm and brms estimates do not exactly match. However, the values are close, and the overall trend is consistent. This suggests that the outcomes obtained from either approach remain largely the same. Thus, even with some differences in the estimates, the broader results and patterns can be interpreted similarly. *The difference comes from factors such as each package’s sampling algorithms and the prior setting.
In this model, the phylogenetic signal is quite high - over 0.95. This means that 95% of the variance in tail length is explained by the phylogenetic relationship among species. This is a strong phylogenetic signal, suggesting that closely related species tend to have similar tail lengths!
Intercept
summary(mcmcglmm_mg1$Sol) # MCMCglmm
# Iterations = 60001:259801
# Thinning interval = 200
# Number of chains = 1
# Sample size per chain = 1000
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
# Mean SD Naive SE Time-series SE
# -0.17781 0.63444 0.02006 0.02006
# 2. Quantiles for each variable:
# 2.5% 25% 50% 75% 97.5%
# -1.4874 -0.6218 -0.1435 0.2346 0.9961
draws_df <- as_draws_df(brms_mg1) # Convert brms object to data frame
summary(draws_df$b_Intercept) # brms
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# -2.21282 -0.46570 -0.09844 -0.10152 0.26306 1.84947 Random effect and phylogenetic signals (and also residuals)
# random effect
mean(mcmcglmm_mg1$VCV[, "Phylo"]) # MCMCglmm(g-structure)
# [1] 1.513938
mean(draws_df$sd_Phylo__Intercept)^2 # brms(Multilevel Hyperparameters)
# [1] 1.179172
# residuals
mean(mcmcglmm_mg1$VCV[, "units"]) # MCMCglmm(r-structure)
# [1] 0.02785073
mean(draws_df$sigma)^2 # brms(Further Distributional Parameters)
# [1] 0.04712953
# phylogenetic signals
phylo_signal_mcmcglmm <- mean(mcmcglmm_mg1$VCV[, "Phylo"]) / (mean(mcmcglmm_mg1$VCV[, "Phylo"]) + mean(mcmcglmm_mg1$VCV[, "units"]))
phylo_signal_brms <- mean(draws_df$sd_Phylo__Intercept)^2 / (mean(draws_df$sd_Phylo__Intercept)^2 + mean(draws_df$sigma)^2)
phylo_signal_mcmcglmm
# [1] 0.9819361
phylo_signal_brms
# [1] 0.9615677One continuous explanatory variable model
MCMCglmm
system.time(
mcmcglmm_mg2 <- MCMCglmm(zLength ~ zTemp,
random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
prior = prior1,
family = "gaussian",
data = dt,
nitt = 13000*25,
thin = 10*25,
burnin = 3000*25
)
)summary(mcmcglmm_mg2)
# Iterations = 75001:324751
# Thinning interval = 250
# Sample size = 1000
# DIC: -5.777706
# G-structure: ~Phylo
# post.mean l-95% CI u-95% CI eff.samp
# Phylo 1.464 1.065 1.912 1000
# R-structure: ~units
# post.mean l-95% CI u-95% CI eff.samp
# units 0.03053 0.009984 0.05334 1000
# Location effects: zLength ~ zTemp
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) -0.17061 -1.48930 0.99052 1000 0.768
# zTemp 0.03310 -0.05767 0.11829 1000 0.482
posterior_summary(mcmcglmm_mg2$VCV)
# Estimate Est.Error Q2.5 Q97.5
# Phylo 1.46421444 0.2224519 1.06723218 1.912723
# units 0.03052545 0.0113458 0.01195931 0.057253
posterior_summary(mcmcglmm_mg2$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) -0.1706142 0.63514628 -1.4145991 1.1523617
# zTemp 0.0331045 0.04637504 -0.0522391 0.1252606The fixed effects estimate for the relationship between tail length and standardized temperature is 0.0331 (posterior mean), with a 95% HPD interval of [-0.05, 0.11]. The pMCMC value is 0.482, indicating that the effect of standardised temperature is not statistically supported under this model. The intercept, representing the expected tail length when standardised temperature is zero, has a posterior mean of -0.171, with a 95% credible interval of [-1.42, 1.15], also suggesting no strong evidence for deviation from zero.
The 95% HPD (Highest Posterior Density interval) from MCMCglmm and the 95% CI ( 95% equal-tailed credible interval) from brms serve a similar purpose, but they are constructed differently and can yield different intervals, especially in asymmetric posterior distributions.
Highest Posterior Density (HPD) Interval:
The HPD interval is the narrowest interval that contains a specified proportion (e.g., 95%) of the posterior distribution. It includes the most probable values of the parameter, ensuring that every point inside the interval has a higher posterior density than any point outside.
Equal-Tailed Credible Interval (CI):
An equal-tailed credible interval includes the central 95% of the posterior distribution, leaving equal probabilities (2.5%) in each tail. This means that there’s a 2.5% chance that the parameter is below the interval and a 2.5% chance that it’s above.
https://search.r-project.org/CRAN/refmans/HDInterval/html/hdi.html
brms
default_priors2 <- default_prior(
zLength ~ zTemp + (1|gr(Phylo, cov = A)),
data = dt,
data2 = list(A = A),
family = gaussian()
)
system.time(
brms_mg2 <- brm(zLength ~ zTemp + (1|gr(Phylo, cov = A)),
data = dt,
data2 = list(A = A),
family = gaussian(),
prior = default_priors2,
iter = 10000,
warmup = 5000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
core = 2,
threads = threading(5)
)
)summary(brms_mg2)
# Family: gaussian
# Links: mu = identity; sigma = identity
# Formula: zLength ~ zTemp + (1 | gr(Phylo, cov = A))
# Data: dt (Number of observations: 223)
# Draws: 2 chains, each with iter = 10000; warmup = 5000; thin = 1;
# total post-warmup draws = 10000
# Multilevel Hyperparameters:
# ~Phylo (Number of levels: 223)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 1.20 0.09 1.03 1.38 1.00 864 1701
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept -0.15 0.60 -1.33 1.03 1.00 1900 3128
# zTemp 0.04 0.05 -0.05 0.13 1.00 1961 4758
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma 0.18 0.03 0.12 0.24 1.00 537 923
# Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
# and Tail_ESS are effective sample size measures, and Rhat is the potential
# scale reduction factor on split chains (at convergence, Rhat = 1).The posterior estimate for the effect of zTemp on zLength is 0.04, with a 95% credible interval of [-0.05, 0.13]. The uncertainty surrounding this effect suggests that the relationship between standardised temperature and tail length is weak and not statistically significant. The intercept has a posterior estimate of -0.15 with a 95% credible interval of [-1.33, 1.03], representing the average tail length when zTemp = 0, although this is also uncertain.
The results of MCMCglmm and brms are generally consistent, with a few differences. The estimates and credible intervals for the fixed effect (zTemp) are almost identical, both indicating a weak relationship. These differences mainly arise from the computational methods and output formats of the models but do not affect the conclusions.
One continuous and one categorical explanatory variable model
Finally, we add categorical explanatory variable, Contrasting, to the above model. In this model, we test whether species with contrasting tail tips have longer tails than those without contrasting tail tips, after controlling for the effect of temperature.
MCMCglmm
system.time(
mcmcglmm_mg3 <- MCMCglmm(zLength ~ zTemp + Contrasting,
random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
prior = prior1,
family = "gaussian",
data = dt,
nitt=13000*30,
thin=10*30,
burnin=3000*30
)
)summary(mcmcglmm_mg3)
# Iterations = 90001:389701
# Thinning interval = 300
# Sample size = 1000
# DIC: 87.21227
# G-structure: ~Phylo
# post.mean l-95% CI u-95% CI eff.samp
# Phylo 1.19 0.9022 1.5 1113
# R-structure: ~units
# post.mean l-95% CI u-95% CI eff.samp
# units 0.04284 0.02576 0.05855 1168
# Location effects: zLength ~ zTemp + Contrasting
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) -0.184003 -1.362926 0.869770 1000 0.740
# zTemp 0.082215 -0.006456 0.155338 1000 0.050 .
# Contrasting1 0.163943 0.059351 0.290087 1000 0.004 **
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
posterior_summary(mcmcglmm_mg3$VCV)
# Estimate Est.Error Q2.5 Q97.5
# Phylo 1.19045414 0.16451779 0.90569889 1.51171543
# units 0.04283667 0.00864498 0.02680094 0.06011227
posterior_summary(mcmcglmm_mg3$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) -0.18400256 0.5840963 -1.262384395 1.0278077
# zTemp 0.08221542 0.0420212 0.001294616 0.1679503
# Contrasting1 0.16394271 0.0581162 0.036145589 0.2736073Both temperature (\(zTemp\)) and contrasting patterns appear to influence tail length. Contrasting has a stronger effect than temperature.
brms
default_priors3 <- default_prior(
zLength ~ zTemp + Contrasting + (1|gr(Phylo, cov = A)),
data = dt,
data2 = list(A = A),
family = gaussian()
)
system.time(
brms_mg3 <- brm(zLength ~ zTemp + Contrasting + (1|gr(Phylo, cov = A)),
data = dt,
data2 = list(A = A),
family = gaussian(),
prior = default_priors3,
iter = 10000,
warmup = 5000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
core = 2,
threads = threading(5)
)
)summary(brms_mg3)
# Family: gaussian
# Links: mu = identity; sigma = identity
# Formula: zLength ~ zTemp + Contrasting + (1 | gr(Phylo, cov = A))
# Data: dt (Number of observations: 223)
# Draws: 2 chains, each with iter = 10000; warmup = 5000; thin = 1;
# total post-warmup draws = 10000
# Multilevel Hyperparameters:
# ~Phylo (Number of levels: 223)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 1.23 0.09 1.06 1.42 1.01 488 945
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept -0.16 0.62 -1.38 1.04 1.00 1728 3132
# zTemp 0.03 0.05 -0.05 0.12 1.00 1447 2230
# Contrasting1 0.15 0.06 0.02 0.27 1.00 2826 4736
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma 0.16 0.03 0.09 0.22 1.01 295 286
# Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
# and Tail_ESS are effective sample size measures, and Rhat is the potential
# scale reduction factor on split chains (at convergence, Rhat = 1).In brms, \(Contrasting\) has a positive effect on tail length. The posterior estimate is 0.15, with a 95% credible interval of [0.03, 0.27]. This suggests that species with contrasting tail tips have longer tails than those without contrasting tail tips, after controlling for the effect of temperature. The effect of temperature on tail length is similar to the previous model, with a posterior estimate of 0.03 and a 95% credible interval of [-0.05, 0.13].
Although there is some overlap in the estimates of temperature’s effect between the two models, the key difference is that MCMCglmm provides a positive effect with stronger evidence (CI not including zero), whereas brms shows more uncertainty (CI including zero). This suggests that while both models suggest a potential effect, brms does not provide as clear evidence for it being significant, and the effect size might be smaller or potentially null.
Bivariate model
Here, we provide more advanced models - bivariate models. Because the bivariate model is similar to the nominal model, it allows for examining relationships between two response variables while taking into account multiple explanatory variables. In this model, we will include two response variables, zLength and zMass, and test the relationship between these two variables and the explanatory variables, zTemp and Contrasting. This model can examine the evolutionary correlation between tail length and body mass in rodents, considering the phylogenetic relationships.
In MCMCglmm, we use cbind(traitA, traitB) to set the response variables (this allows us to estimate the covariance between the two response variables). In brms, we use mvbind(traitA, traitB) to set the response variable. We also need to specify family for both response variables (unless you use a custom prior, you need not define what distribution fit to the random effect in brms- default prior automatically set them).
The important point is, MCMCglmm does not provide the correlation estimate between the two response variables (it estimate the covariance between two response variables), while brms does. However, you need to use (1|a|gr(Phylo, cov = A)) instead of (1|gr(Phylo, cov = A)) and set set_rescor(TRUE). Without the newly added |a| and set_rescor(TRUE), brms will not calculate the correlation (a can be any letter). We need to calculate the correlation using covariance estimates in MCMCglmm. Fourtunately, the correlation can calculate easily by dividing the covariance by the product of the standard deviations of the two variables.
inv_phylo <- inverseA(tree, nodes = "ALL", scale = TRUE)
prior2 <- list(G = list(G1 = list(V = diag(2),
nu = 2, alpha.mu = rep(0, 2),
alpha.V = diag(2) * 1000)),
R = list(V = diag(2), nu = 0.002)
)
system.time(
mcmcglmm_mg4 <- MCMCglmm(cbind(zLength, zMass) ~ trait - 1,
random = ~ us(trait):Phylo,
rcov = ~ us(trait):units,
family = c("gaussian", "gaussian"),
ginv = list(Phylo = inv_phylo$Ainv),
data = dt,
prior = prior2,
nitt = 13000*55,
thin = 10*55,
burnin = 3000*55
)
)
system.time(
mcmcglmm_mg5 <- MCMCglmm(cbind(zLength, zMass) ~ zTemp:trait + trait - 1,
random = ~ us(trait):Phylo,
rcov = ~ us(trait):units,
family = c("gaussian", "gaussian"),
ginv = list(Phylo = inv_phylo$Ainv),
data = dt,
prior = prior2,
nitt = 13000*55,
thin = 10*55,
burnin = 3000*55
)
)
system.time(
mcmcglmm_mg6 <- MCMCglmm(cbind(zLength, zMass) ~ zTemp:trait + Contrasting:trait + trait -1,
random = ~ us(trait):Phylo,
rcov = ~ us(trait):units,
family = c("gaussian", "gaussian"),
ginv = list(Phylo = inv_phylo$Ainv),
data = dt,
prior = prior2,
nitt = 13000*55,
thin = 10*55,
burnin = 3000*55
)
)
## brms
A <- ape::vcv.phylo(tree, corr = TRUE)
formula <- bf(mvbind(zLength, zMass) ~ 1 +
(1|a|gr(Phylo, cov = A)),
set_rescor(rescor = TRUE)
)
default_prior <- default_prior(formula,
data = dt,
data2 = list(A = A),
family = gaussian()
)
system.time(
brms_mg4 <- brm(formula = formula,
data = dt,
data2 = list(A = A),
family = gaussian(),
prior = default_prior,
iter = 25000,
warmup = 10000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.99),
)
)
formula2 <- bf(mvbind(zLength, zMass) ~ zTemp + (1|a|gr(Phylo, cov = A)), set_rescor(TRUE))
default_prior2 <- default_prior(formula2,
data = dt,
data2 = list(A = A),
family = gaussian()
)
system.time(
brms_mg5_1 <- brm(formula = formula2,
data = dt,
data2 = list(A = A),
family = gaussian(),
prior = default_prior,
iter = 55000,
warmup = 45000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.99),
)
)
formula3 <- bf(mvbind(zLength, zMass) ~ zTemp + Contrasting + (1|a|gr(Phylo, cov = A)), set_rescor(TRUE))
default_prior3 <- default_prior(formula3,
data = dt,
data2 = list(A = A),
family = gaussian()
)
system.time(
brms_mg6 <- brm(formula = formula3,
data = dt,
data2 = list(A = A),
family = gaussian(),
prior = default_prior3,
iter = 25000,
warmup = 20000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.98),
)
)summary(mcmcglmm_mg4)
# Iterations = 165001:714451
# Thinning interval = 550
# Sample size = 1000
# DIC: 56.02311
# G-structure: ~us(trait):Phylo
# post.mean l-95% CI u-95% CI eff.samp
# traitzLength:traitzLength.Phylo 1.8450 1.3355 2.4126 1000
# traitzMass:traitzLength.Phylo -0.1288 -0.5302 0.2168 1000
# traitzLength:traitzMass.Phylo -0.1288 -0.5302 0.2168 1000
# traitzMass:traitzMass.Phylo 2.1016 1.5978 2.6773 1120
# R-structure: ~us(trait):units
# post.mean l-95% CI u-95% CI eff.samp
# traitzLength:traitzLength.units 0.024426 0.006637 0.04178 1000
# traitzMass:traitzLength.units 0.005761 -0.008694 0.02082 1000
# traitzLength:traitzMass.units 0.005761 -0.008694 0.02082 1000
# traitzMass:traitzMass.units 0.057850 0.033641 0.08572 1000
# Location effects: cbind(zLength, zMass) ~ trait - 1
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# traitzLength -0.04079 -2.04401 1.70436 1000 0.968
# traitzMass -0.44244 -2.33966 1.55863 1000 0.688
posterior_summary(mcmcglmm_mg4$VCV)
# Estimate Est.Error Q2.5 Q97.5
# traitzLength:traitzLength.Phylo 1.845005322 0.272081110 1.347508423 2.47378715
# traitzMass:traitzLength.Phylo -0.128783301 0.192333313 -0.517573376 0.23065613
# traitzLength:traitzMass.Phylo -0.128783301 0.192333313 -0.517573376 0.23065613
# traitzMass:traitzMass.Phylo 2.101601613 0.288665978 1.624968122 2.74203079
# traitzLength:traitzLength.units 0.024426281 0.009195449 0.008110895 0.04557695
# traitzMass:traitzLength.units 0.005761157 0.007496943 -0.008150784 0.02198074
# traitzLength:traitzMass.units 0.005761157 0.007496943 -0.008150784 0.02198074
# traitzMass:traitzMass.units 0.057849736 0.013834738 0.036274710 0.09066023
posterior_summary(mcmcglmm_mg4$Sol)
# Estimate Est.Error Q2.5 Q97.5
# traitzLength -0.04079478 0.9568184 -1.827382 2.059106
# traitzMass -0.44243620 1.0046735 -2.486688 1.513088
corr_p <- (mcmcglmm_mg4$VCV[, 2]) /sqrt((mcmcglmm_mg4$VCV[, 1] * mcmcglmm_mg4$VCV[, 4])) # calculate phylogenetic correlation between zLength and zMass
corr_nonp <- (mcmcglmm_mg4$VCV[, 6]) /sqrt((mcmcglmm_mg4$VCV[, 5] * mcmcglmm_mg4$VCV[, 8]))
posterior_summary(corr_p)
# Estimate Est.Error Q2.5 Q97.5
# var1 -0.06650913 0.09770663 -0.2628186 0.1173509
posterior_summary(corr_nonp)
# Estimate Est.Error Q2.5 Q97.5
# var1 0.1527294 0.1890808 -0.2165917 0.5260998
summary(brms_mg4)
# Family: MV(gaussian, gaussian)
# Links: mu = identity; sigma = identity
# mu = identity; sigma = identity
# Formula: zLength ~ 1 + (1 | a | gr(Phylo, cov = A))
# zMass ~ 1 + (1 | a | gr(Phylo, cov = A))
# Data: dt (Number of observations: 223)
# Draws: 2 chains, each with iter = 35000; warmup = 25000; thin = 1;
# total post-warmup draws = 20000
# Multilevel Hyperparameters:
# ~Phylo (Number of levels: 223)
# Estimate Est.Error l-95% CI u-95% CI
# sd(zLength_Intercept) 1.22 0.09 1.05 1.40
# sd(zMass_Intercept) 1.29 0.09 1.12 1.48
# cor(zLength_Intercept,zMass_Intercept) -0.06 0.10 -0.26 0.13
# Rhat Bulk_ESS Tail_ESS
# sd(zLength_Intercept) 1.00 1048 2370
# sd(zMass_Intercept) 1.00 3079 6693
# cor(zLength_Intercept,zMass_Intercept) 1.00 1912 4279
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# zLength_Intercept -0.14 0.61 -1.35 1.05 1.00 2159 4378
# zMass_Intercept -0.42 0.63 -1.67 0.83 1.00 3442 6223
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma_zLength 0.17 0.03 0.11 0.23 1.00 751 1004
# sigma_zMass 0.25 0.03 0.20 0.31 1.00 2092 4823
# Residual Correlations:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
# rescor(zLength,zMass) 0.14 0.18 -0.22 0.47 1.00 2017
# Tail_ESS
# rescor(zLength,zMass) 4659As you can see the above, the estimates from both models are consistent:
the random effects are…
zLength is 1.845 (1.348, 2.474) and zMass is 2.102 (1.625, 2.742)in MCMCglmm and 1.4884 (1.1025, 1.96) and 1.664 (1.2544, 2.1904) in brms. Phylogenetic correlation between zLength and zMass are -0.067 (-0.263, 0.117) in MCMCglmm and -0.06 (-0.26, 0.13)
Both length and mass exhibit very high phylogenetic signals (heritabilities), indicating that these traits are strongly conserved evolutionarily and that closely related species tend to resemble each other.
draws_df <- as_draws_df(brms_mg4) # Convert brms object to data frame
# random effect - length
## mcmcglmm
mean(mcmcglmm_mg4$VCV[, "traitzLength:traitzLength.Phylo"]) # MCMCglmm(g-structure)
# [1] 1.845005
## brms
mean(draws_df$sd_Phylo__zLength_Intercept)^2 # brms(Multilevel Hyperparameters)
# [1] 1.179172
# random effect - mass
## mcmcglmm
mean(draws_df$sd_Phylo__zLength_Intercept)^2
## brms
mean(draws_df$sd_Phylo__zMass_Intercept)^2 # brms(Multilevel Hyperparameters)
# [1] 1.664106
# residuals - length
mean(mcmcglmm_mg4$VCV[, "traitzLength:traitzLength.units"]) # MCMCglmm(r-structure)
# [1] 0.02442628
mean(draws_df$sigma_zLength)^2 # brms(Further Distributional Parameters)
# [1] 0.02886549
mean(mcmcglmm_mg4$VCV[, "traitzMass:traitzMass.units"]) # MCMCglmm(r-structure)
# [1] 0.05784974
mean(draws_df$sigma_zMass)^2 # brms(Further Distributional Parameters)
# [1] 0.05784974
# phylogenetic heritability - length
## mcmcglmm
phylo_signal_mcmcglmm <- mean(mcmcglmm_mg4$VCV[, "traitzLength:traitzLength.Phylo"]) / (mean(mcmcglmm_mg4$VCV[, "traitzLength:traitzLength.Phylo"]) + mean(mcmcglmm_mg4$VCV[, "traitzLength:traitzLength.units"]))
phylo_signal_brms <- mean(draws_df$sd_Phylo__zLength_Intercept)^2 / (mean(draws_df$sd_Phylo__zLength_Intercept)^2 + mean(draws_df$sigma_zLength)^2)
phylo_signal_mcmcglmm
# [1] 0.9869338
phylo_signal_brms
# [1] 0.9810099
# phylogenetic heritability - mass
phylo_signal_mcmcglmm <- mean(mcmcglmm_mg4$VCV[, "traitzMass:traitzMass.Phylo"]) / (mean(mcmcglmm_mg4$VCV[, "traitzMass:traitzMass.Phylo"]) + mean(mcmcglmm_mg4$VCV[, "traitzMass:traitzMass.units"]))
phylo_signal_brms <- mean(draws_df$sd_Phylo__zMass_Intercept)^2 / (mean(draws_df$sd_Phylo__zMass_Intercept)^2 + mean(draws_df$sigma_zMass)^2)
phylo_signal_mcmcglmm
# [1] 0.9732109
phylo_signal_brms
# [1] 0.97321092. Binary models
Binary models are used when the response variable is binary (0 or 1), such as presence (1) vs. absence (0), survival (1) vs. death (0), success (1) vs. failure (0), or female vs. male. The results are straightforward to interpret: the model provides the probability of a ‘success’ (coded as 1) given the predictor variables. For example, in a female/male classification, the output represents the probability of being male, as R typically treats the alphabetically first category as the reference level.
Explanation of dataset
We tested the relationships between the presence of red–orange pelage on limbs (response variable) and average social group size and activity cycle in primates (265 species). We used the dataset and tree from Macdonald et al. (2024). The dataset contains information on the presence of skin and pelage coloration data, average social group size, presence or absence of multilevel hierarchical societies, colour visiual system, and activity cycle. We aim to test whether the presence of red–orange pelage on limbs is associated with social group size and activity cycle in primates.
For run model, we need to prepare the suitable format data.
p_trees <- read.nexus(here("data", "potential", "primate", "trees100m.nex"))
primate_data <- read.csv(here("data", "potential", "primate", "primate_data_male.csv"))
p_dat <- primate_data
p_dat <- subset(p_dat, !is.na(vs_male)) # omit NAs by VS predictor
p_dat <- subset(p_dat, !is.na(vs_female))
p_dat <- subset(p_dat, !is.na(activity_cycle)) # omit NAs by activity cycle
p_dat <- subset(p_dat, !is.na(redpeachpink_facial_skin)) # omit NAs by response variable
p_dat <- subset(p_dat, !is.na(social_group_size)) # omit NAs by response variable
p_dat <- subset(p_dat, !is.na(multilevel)) # omit NAs by response variable
# Rename the column 'social_group_size' to 'cSocial_group_size' - all continuous variables were centred and standardized by authors
p_dat <- p_dat %>% rename(cSocial_group_size = social_group_size)
p_dat <- p_dat %>%
mutate(across(c(vs_female, vs_male, redpeachpink_facial_skin,red_genitals,red_pelage_head, red_pelage_body_limbs, red_pelage_tail), as.factor))
p_tree <- p_trees[[1]]
p_trees <- lapply(p_trees, drop.tip,tip = setdiff(p_tree$tip.label, p_dat$PhyloName)) #trim out everything from the tree that's not in the dataset
p_tree <- p_trees[[1]] # select one tree for trimming purposes
p_tree <- force.ultrametric(p_tree) # force tree to be ultrametric - all tips equidistant from rootHow to impliment models and interpret the outputs?
InMCMCglmm, the binary model can take two kinds of link functions: logit and probit link functions. The logit link function (family = "categorical") is the default inMCMCglmm for the binary model, while the probit link function can be defined using family = "threshold" or family = "ordinal". We recommend using family = "threshold" as it is often more suitable for modelling under biological assumption and can provide better convergence properties in certain situations. Here, we show both models.
Univariate model
Intercept-only model
Probit model
MCMCglmm
We need to set different prior for binary models from the Gaussian mopdel. Actually, the residual variance does not need to be estimated in binary models, so we only need to set the prior for the random effect. In binary models, the response variable y is either 0 or 1, which means there are no variety residuals to measure variability around a predicted mean. For logistic regression (logit link), the residual variance is fixed at ^2/3 on the latent scale. For probit regression (probit link), the residual variance is fixed at 1 on the latent scale. InMCMCglmm, the prior settings do not differ between the probit and logit models.
inv.phylo <- inverseA(p_tree, nodes = "ALL", scale = TRUE)
prior1 <- list(R = list(V = 1, fix = 1), # fix residual variance = 1
G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)
)
)
system.time(
mcmcglmm_BP1 <- MCMCglmm(red_pelage_body_limbs ~ 1,
random = ~ PhyloName,
family = "threshold",
data = p_dat,
prior = prior1,
ginverse = list(PhyloName = inv.phylo$Ainv),
nitt = 13000*20,
thin = 10*20,
burnin = 3000*20)
)summary(mcmcglmm_BP1) # 95%HPD Interval
# Iterations = 60001:259801
# Thinning interval = 200
# Sample size = 1000
#
# DIC: 298.7917
#
# G-structure: ~PhyloName
#
# post.mean l-95% CI u-95% CI eff.samp
# PhyloName 3.186 0.2795 7.242 1000
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: red_pelage_body_limbs ~ 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) -0.8487 -2.8595 0.8645 1093 0.296
posterior_summary(mcmcglmm_BP1$VCV) # 95%CI
# Estimate Est.Error Q2.5 Q97.5
# PhyloName 3.185968 2.226905 0.6918403 8.6888
# units 1.000000 0.000000 1.0000000 1.0000
posterior_summary(mcmcglmm_BP1$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) -0.8487221 0.8998161 -2.859676 0.8638048As you can see here, residual variance is fixed 1 - the model did not estimate the residual variance (R-structure). The MCMC chain mixed well. The positive posterior mean of phylogenetic random effect (3.186) suggests that closely related species tend to have more red limb pelages.The fact that the 95% HPD interval / 95% equal-tailed credible interval does not include zero indicates that the effect of lineage on red pelages is statistically significant.
brms
family argument is set to bernoulli() (not binomial()). The bernoulli() family is used for binary data, while the binomial() family is used for the combination of count data, such as (c(N_sucsess, N_failure)).
A <- ape::vcv.phylo(p_tree, corr = TRUE)
priors_brms <- default_prior(red_pelage_body_limbs ~ 1 + (1 | gr(PhyloName, cov = A)),
data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "probit"))
system.time(
brms_BP1 <- brm(red_pelage_body_limbs ~ 1 + (1 | gr(PhyloName, cov = A)),
data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "probit"),
prior = priors_brms,
iter = 6000,
warmup = 5000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
)
)summary(brms_BP1)
# Family: bernoulli
# Links: mu = probit
# Formula: red_pelage_body_limbs ~ 1 + (1 | gr(PhyloName, cov = A))
# Data: p_dat (Number of observations: 265)
# Draws: 2 chains, each with iter = 6000; warmup = 5000; thin = 1;
# total post-warmup draws = 2000
#
# Multilevel Hyperparameters:
# ~PhyloName (Number of levels: 265)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 1.67 0.53 0.78 2.85 1.00 393 727
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept -0.72 0.81 -2.38 0.82 1.00 752 1027The same trend in results was observed in brms. Note that the phylogenetic random effect estimate in brms is given as the standard deviation (sd), so to compare it with MCMCglmm, we need to square the sd. There were no differences between the fixed effects in brms and MCMCglmm. Outputs of brms is usually 95% equal-tailed credible interval.
Phylogenetic signals (probit model)
# MCMCglmm
phylo_signal_mcmcglmm_BP <- ((mcmcglmm_BP1$VCV[, "PhyloName"]) / (mcmcglmm_BP1$VCV[, "PhyloName"] + 1))
phylo_signal_mcmcglmm_BP %>% mean()
# [1] 0.7052307
phylo_signal_mcmcglmm_BP %>% quantile(probs = c(0.025,0.5,0.975))
# 2.5% 50% 97.5%
# 0.4089277 0.7260425 0.8967865
# brms
phylo_signal_brms_BP <- brms_BP1 %>% as_tibble() %>%
dplyr::select(Sigma_phy = sd_PhyloName__Intercept) %>%
mutate(lambda_probit = Sigma_phy^2 / (Sigma_phy^2 + 1)) %>%
pull(lambda_probit)
phylo_signal_brms_BP %>% mean()
# [1] 0.6992207
phylo_signal_brms_BP %>% quantile(probs = c(0.025,0.5,0.975))
# 2.5% 50% 97.5%
# 0.3795359 0.7267670 0.8903665As expected from the random effects estimates above, the systematic signal is estimated to be 0.71 for MCMCMCglmm and 0.70 for brms. The difference with the estimation from Gaussian model is that the residuals are fixed at 1 in the binary model. That is the same for the other discrete models.
Logit model
MCMCglmm
c2 correction
In the logit model, we need to correct the estimates obtained from MCMCglmm as MCMCglmm implements additive overdispersion GLMM fitted by MCMC sampling from the posterior distribution. This overdispersion lead to inflated variance estimates (see also Nakagawa & Schielzeth (2010)). To adjust for this, we need to rescale the estimates of both location effects and variance components by considering the residual variance adjustment (MCMCglmm course note written by Jarrod Hadfield). We can use the below code:
c2 <- (16 * sqrt(3) / (15 * pi))^2
res_1 <- model$Sol / sqrt(1+c2) # for fixed effects
res_2 <- model$VCV / (1+c2) # for variance componentsThe logit model is a bit difficult to converge than the probit model. So, we set a bit large nitt, thin, and burnin to improve mixing and effective sample sizes than the probit model.
inv.phylo <- inverseA(p_tree, nodes = "ALL", scale = TRUE)
prior1 <- list(R = list(V = 1, fix = 1),
G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)
)
)
system.time(
mcmcglmm_BL1 <- MCMCglmm(red_pelage_body_limbs ~ 1,
random = ~ PhyloName,
family = "categorical",
data = p_dat,
prior = prior1,
ginverse = list(PhyloName = inv.phylo$Ainv),
nitt = 13000*60,
thin = 10*60,
burnin = 3000*60)
)brms
In brms, we do not need to conduct c2 correction as brms automatically corrects the overdispersion.
A <- ape::vcv.phylo(p_tree, corr = TRUE)
priors_brms2 <- default_prior(red_pelage_body_limbs ~ 1 + (1 | gr(PhyloName, cov = A)),
data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "logit"))
system.time(
brms_BL1 <- brm(red_pelage_body_limbs ~ 1 + (1 | gr(PhyloName, cov = A)),
data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "logit"),
prior = priors_brms2,
iter = 7500,
warmup = 6500,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
core = 2,
)
)# MCMCglmm
summary(mcmcglmm_BL1)
# Iterations = 180001:779401
# Thinning interval = 600
# Sample size = 1000
#
# DIC: 294.3311
#
# G-structure: ~PhyloName
#
# post.mean l-95% CI u-95% CI eff.samp
# PhyloName 11.64 0.7224 26.95 1000
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: red_pelage_body_limbs ~ 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) -1.665 -5.154 1.833 1000 0.316
c2 <- (16 * sqrt(3) / (15 * pi))^2
res_1 <- mcmcglmm_BL1$Sol/sqrt(1+c2)
res_2 <- mcmcglmm_BL1$VCV/(1+c2)
posterior_summary(res_1)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) -1.435339 1.560119 -4.565329 1.502008
posterior_summary(res_2)
# Estimate Est.Error Q2.5 Q97.5
# PhyloName 8.6476147 6.01105 1.5836214 24.0221167
# units 0.7430287 0.00000 0.7430287 0.7430287
#brms
# summary(brms_BL1)
# Family: bernoulli
# Links: mu = logit
# Formula: red_pelage_body_limbs ~ 1 + (1 | gr(PhyloName, cov = A))
# Data: p_dat (Number of observations: 265)
# Draws: 2 chains, each with iter = 7500; warmup = 6500; thin = 1;
# total post-warmup draws = 2000
#
# Multilevel Hyperparameters:
# ~PhyloName (Number of levels: 265)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 2.61 0.85 1.21 4.46 1.00 550 971
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept -1.08 1.19 -3.54 1.13 1.00 1433 1398
#
# Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
# and Tail_ESS are effective sample size measures, and Rhat is the potential
# scale reduction factor on split chains (at convergence, Rhat = 1).The point estimates and the overlaps of 95%CI of both random and fixed effects look fine.
Phylogenetic signals (logit model)
Then, we can obtain the phylogenetic signal using
\[ H^2 = \frac{\sigma_{a}^2}{\sigma_{a}^2 + \pi^2/3} \]
# MCMCglmm
phylo_signal_mcmcglmm_BL <- ((mcmcglmm_BL1$VCV[, "PhyloName"]/(1+c2)) / (mcmcglmm_BL1$VCV[, "PhyloName"]/(1+c2)+1))
phylo_signal_mcmcglmm_BL %>% mean()
# [1] 0.8541816
phylo_signal_mcmcglmm_BL %>% quantile(probs = c(0.025,0.5,0.975))
# 2.5% 50% 97.5%
# 0.6129464 0.8776149 0.9600353
# brms
phylo_signal_brms_BL <- brms_BL1 %>% as_tibble() %>%
dplyr::select(Sigma_phy = sd_PhyloName__Intercept) %>%
mutate(lambda_logit = (Sigma_phy^2 / (Sigma_phy^2+1))) %>%
pull(lambda_logit)
phylo_signal_brms_BL %>% mean()
phylo_signal_brms_BL %>% quantile(probs = c(0.025,0.5,0.975))Both estimated phylogenetic signals took close values.
One continuous explanatory variable model
For the next step, we examine whether the group size affect the presence or absence of red pelagic on the body limbs in primates to test the possibility that red pelage is related to social roles and behaviours within the group.
Probit model
The model using MCMCglmm is
inv.phylo <- inverseA(p_tree, nodes = "ALL", scale = TRUE)
prior1 <- list(R = list(V = 1, fix = 1),
G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)
)
)
system.time(
mcmcglmm_BP2 <- MCMCglmm(red_pelage_body_limbs ~ cSocial_group_size,
random = ~ PhyloName,
family = "threshold",
data = p_dat,
prior = prior1,
ginverse = list(PhyloName = inv.phylo$Ainv),
nitt = 13000*30,
thin = 10*30,
burnin = 3000*30)
)For brms…
A <- ape::vcv.phylo(p_tree, corr = TRUE)
priors_brms3 <- default_prior(red_pelage_body_limbs ~ cSocial_group_size + (1 | gr(PhyloName, cov = A)),
data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "probit"))
system.time(
brms_BP2 <- brm(red_pelage_body_limbs ~ cSocial_group_size + (1 | gr(PhyloName, cov = A)),
data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "probit"),
prior = priors_brms3,
iter = 10000,
warmup = 8500,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
core = 2,
thread = threading(5)
)
)# MCMCglmm
summary(mcmcglmm_BP2)
# Iterations = 90001:389701
# Thinning interval = 300
# Sample size = 1000
#
# DIC: 299.4154
#
# G-structure: ~PhyloName
#
# post.mean l-95% CI u-95% CI eff.samp
# PhyloName 3.489 0.2165 7.618 1000
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: red_pelage_body_limbs ~ cSocial_group_size
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) -0.9754 -2.8047 0.8503 1979 0.258
# cSocial_group_size -0.1136 -0.3679 0.1707 1091 0.418
posterior_summary(mcmcglmm_BP2$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) -0.9753971 0.9209577 -2.9483573 0.7464915
# cSocial_group_size -0.1135599 0.1364510 -0.4137361 0.1260902
posterior_summary(mcmcglmm_BP2$VCV)
# Estimate Est.Error Q2.5 Q97.5
# PhyloName 3.488721 2.419961 0.6210229 8.835258
# units 1.000000 0.000000 1.0000000 1.000000
#brms
summary(brms_BP2)
# Family: bernoulli
# Links: mu = probit
# Formula: red_pelage_body_limbs ~ cSocial_group_size + (1 | gr(PhyloName, cov = A))
# Data: p_dat (Number of observations: 265)
# Draws: 2 chains, each with iter = 10000; warmup = 8500; thin = 1;
# total post-warmup draws = 3000
#
# Multilevel Hyperparameters:
# ~PhyloName (Number of levels: 265)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 1.68 0.53 0.78 2.86 1.00 440 944
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept -0.75 0.85 -2.54 0.88 1.00 1318 1299
# cSocial_group_size -0.11 0.13 -0.40 0.12 1.00 3127 1926Both models suggest that social group size does not significantly affect the presence or absence of red pelage on the body limbs in primates, as the credible intervals include zero and the p-values are higher than 0.05.
Logit model
The model for MCMCglmm…
inv.phylo <- inverseA(p_tree, nodes = "ALL", scale = TRUE)
prior1 <- list(R = list(V = 1, fix = 1),
G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)
)
)
system.time(
mcmcglmm_BL2 <- MCMCglmm(red_pelage_body_limbs ~ cSocial_group_size,
random = ~ PhyloName,
family = "categorical",
data = p_dat,
prior = prior1,
ginverse = list(PhyloName = inv.phylo$Ainv),
nitt = 13000*60,
thin = 10*60,
burnin = 3000*60)
)For brms…
A <- ape::vcv.phylo(p_tree, corr = TRUE)
priors_brms4 <- default_prior(red_pelage_body_limbs ~ cSocial_group_size + (1 | gr(PhyloName, cov = A)),
data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "logit"))
system.time(
brms_biL2 <- brm(red_pelage_body_limbs ~ cSocial_group_size + (1 | gr(PhyloName, cov = A)),
data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "logit"),
prior = priors_brms4,
iter = 18000,
warmup = 8000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
core = 2,
thread = threading(5)
)
)summary(mcmcglmm_BL2)
# Iterations = 180001:779401
# Thinning interval = 600
# Sample size = 1000
#
# DIC: 294.8728
#
# G-structure: ~PhyloName
#
# post.mean l-95% CI u-95% CI eff.samp
# PhyloName 11.95 0.8832 26.74 835
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: red_pelage_body_limbs ~ cSocial_group_size
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) -1.8090 -5.3966 1.7961 1000 0.266
# cSocial_group_size -0.2622 -0.8166 0.2671 1000 0.356
c2 <- (16 * sqrt(3) / (15 * pi))^2
res_1 <- mcmcglmm_BL2$Sol/sqrt(1+c2)
res_2 <- mcmcglmm_BL2$VCV/(1+c2)
posterior_summary(res_1)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) -1.5593137 1.5801956 -4.9135095 1.3437762
# cSocial_group_size -0.2260184 0.2484719 -0.7398858 0.2081084
posterior_summary(res_2)
# Estimate Est.Error Q2.5 Q97.5
# PhyloName 8.8770592 5.968142 1.4813098 23.2748934
# units 0.7430287 0.000000 0.7430287 0.7430287
# brms
summary(brms_BP2)
# Family: bernoulli
# Links: mu = probit
# Formula: red_pelage_body_limbs ~ cSocial_group_size + (1 | gr(PhyloName, cov = A))
# Data: p_dat (Number of observations: 265)
# Draws: 2 chains, each with iter = 10000; warmup = 8500; thin = 1;
# total post-warmup draws = 3000
#
# Multilevel Hyperparameters:
# ~PhyloName (Number of levels: 265)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 1.68 0.53 0.78 2.86 1.00 440 944
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept -0.75 0.85 -2.54 0.88 1.00 1318 1299
# cSocial_group_size -0.11 0.13 -0.40 0.12 1.00 3127 1926Estimates from both packages were similar and the logit model yielded the same results as the probit model: the size of the social group has no significant effect on this trait.
One continuous and one categorical explanatory variable model
Therefore, we hypothesised that there might be a relationship between the presence of red pelage and whether a primate species is diurnal or not. We included this as a predictor variable in our model. This is because red colouration may not be as visible in the dark, making it potentially less advantageous for nocturnal species. On the other hand, diurnal species may benefit more from the visibility of red pelage, which could play a role in social/sexual signalling or other ecological factors during the day. Note that the activity cycle variable comprised 01 data, whether diurnal or not, to simplify understanding the model outputs.
Probit model
For the model MCMCglmm the following:
p_dat$diurnal <- ifelse(p_dat$activity_cycle == "di", 1, 0)
inv.phylo <- inverseA(p_tree, nodes = "ALL", scale = TRUE)
prior1 <- list(R = list(V = 1, fix = 1),
G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)
)
)
system.time(
mcmcglmm_BP3 <- MCMCglmm(red_pelage_body_limbs ~ cSocial_group_size + diurnal,
random = ~ PhyloName,
family = "threshold",
data = p_dat,
prior = prior1,
ginverse = list(PhyloName = inv.phylo$Ainv),
nitt = 13000*50,
thin = 10*50,
burnin = 3000*50)
)For brms,
A <- ape::vcv.phylo(p_tree, corr = TRUE)
priors_brms5 <- default_prior(red_pelage_body_limbs ~ cSocial_group_size + diurnal + (1 | gr(PhyloName, cov = A)),
data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "probit"))
system.time(
brms_BP3 <- brm(red_pelage_body_limbs ~ cSocial_group_size + diurnal + (1 | gr(PhyloName, cov = A)),
data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "probit"),
prior = priors_brms5,
iter = 5000,
warmup = 3000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
core = 2,
thread = threading(5)
)
)# MCMCglmm
summary(mcmcglmm_BP3)
# Iterations = 150001:649501
# Thinning interval = 500
# Sample size = 1000
#
# DIC: 299.2849
#
# G-structure: ~PhyloName
#
# post.mean l-95% CI u-95% CI eff.samp
# PhyloName 4.063 0.535 9.133 900.6
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: red_pelage_body_limbs ~ cSocial_group_size + activity_cycle
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) -0.685030 -3.381045 1.969969 1000 0.584
# cSocial_group_size -0.110831 -0.373846 0.151850 1000 0.392
# activity_cycledi -0.533313 -2.486488 1.364789 1000 0.572
# activity_cyclenoct -0.007557 -1.807080 1.823124 1000 0.986
posterior_summary(mcmcglmm_BP3$VCV)
# Estimate Est.Error Q2.5 Q97.5
# PhyloName 4.062649 2.673005 0.947084 11.42444
# units 1.000000 0.000000 1.000000 1.00000
posterior_summary(mcmcglmm_BP3$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) -0.685030297 1.3457124 -3.6844040 1.7696060
# cSocial_group_size -0.110831330 0.1342969 -0.4257179 0.1235912
# activity_cycledi -0.533313064 0.9653465 -2.6461670 1.2170270
# activity_cyclenoct -0.007557281 0.9589911 -1.9164287 1.7255631
# brms
summary(brms_BP3)
# Family: bernoulli
# Links: mu = probit
# Formula: red_pelage_body_limbs ~ cSocial_group_size + activity_cycle + (1 | gr(PhyloName, cov = A))
# Data: p_dat (Number of observations: 265)
# Draws: 2 chains, each with iter = 18000; warmup = 8000; thin = 1;
# total post-warmup draws = 20000
#
# Multilevel Hyperparameters:
# ~PhyloName (Number of levels: 265)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 1.86 0.58 0.89 3.21 1.00 3870 6693
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept -0.54 1.21 -2.96 1.87 1.00 9333 10083
# cSocial_group_size -0.11 0.13 -0.40 0.13 1.00 21096 14055
# activity_cycledi -0.50 0.93 -2.46 1.24 1.00 9758 9062
# activity_cyclenoct -0.01 0.96 -2.00 1.81 1.00 9588 9736The results indicate that both models (MCMCglmm and brms) are reliable based on diagnostic metrics. Effective sample sizes are sufficient, and Rhat values are close to 1, indicating good convergence.
For random effects, the estimated variance of the phylogenetic effect is consistent between models (MCMCglmm: 3.59; brms: 1.73^2 = 2.99), with overlapping 95% credible intervals. This suggests still phylogenetic influence on red pelage presence, though with some uncertainty.
For fixed effects, neither social group size (MCMCglmm: -0.10, brms: -0.11) nor diurnal (MCMCglmm: -0.49, brms: -0.48) showed statistically significant effects, as their 95% CIs include zero. This suggests that group size and diurnal activity are unlikely to be major determinants of red pelage presence.
Logit model Finally, MCMCglmm
inv.phylo <- inverseA(p_tree, nodes = "ALL", scale = TRUE)
prior1 <- list(R = list(V = 1, fix = 1),
G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)
)
)
system.time(
mcmcglmm_BL3 <- MCMCglmm(red_pelage_body_limbs ~ cSocial_group_size + diurnal,
random = ~ PhyloName,
family = "categorical",
data = p_dat,
prior = prior1,
ginverse = list(PhyloName = inv.phylo$Ainv),
nitt = 13000*50,
thin = 10*50,
burnin = 3000*50)
)And brms,
A <- ape::vcv.phylo(p_tree, corr = TRUE)
priors_brms6 <- default_prior(red_pelage_body_limbs ~ cSocial_group_size + diurnal + (1 | gr(PhyloName, cov = A)),
data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "logit"))
system.time(
brms_BL3 <- brm(red_pelage_body_limbs ~ cSocial_group_size + diurnal + (1 | gr(PhyloName, cov = A)),
data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "logit"),
prior = priors_brms6,
iter = 18000,
warmup = 8000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
core = 2,
thread = threading(5)
)
)summary(mcmcglmm_BL3)
# Iterations = 240001:1039201
# Thinning interval = 800
# Sample size = 1000
#
# DIC: 293.6985
#
# G-structure: ~PhyloName
#
# post.mean l-95% CI u-95% CI eff.samp
# PhyloName 15.16 1.792 37.31 905.8
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: red_pelage_body_limbs ~ cSocial_group_size + activity_cycle
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) -1.30757 -6.63033 3.76436 1000 0.594
# cSocial_group_size -0.23416 -0.76302 0.28469 1000 0.392
# activity_cycledi -1.15936 -5.18073 2.29078 1000 0.528
# activity_cyclenoct -0.09113 -3.90477 3.45304 1000 0.980
c2 <- (16 * sqrt(3) / (15 * pi))^2
res_1 <- mcmcglmm_BL3$Sol/sqrt(1+c2)
res_2 <- mcmcglmm_BL3$VCV/(1+c2)
posterior_summary(res_1)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) -1.12711731 2.2299457 -5.7979353 3.1769798
# cSocial_group_size -0.20184346 0.2376961 -0.7177624 0.1888044
# activity_cycledi -0.99936034 1.6233908 -4.3827738 2.1066431
# activity_cyclenoct -0.07855316 1.6257651 -3.4238818 2.9358848
posterior_summary(res_2)
# Estimate Est.Error Q2.5 Q97.5
# PhyloName 11.2672019 8.147431 2.2323210 31.2476370
# units 0.7430287 0.000000 0.7430287 0.7430287
summary(brms_BL3)
# Family: bernoulli
# Links: mu = logit
# Formula: red_pelage_body_limbs ~ cSocial_group_size + activity_cycle + (1 | gr(PhyloName, cov = A))
# Data: p_dat (Number of observations: 265)
# Draws: 2 chains, each with iter = 18000; warmup = 8000; thin = 1;
# total post-warmup draws = 20000
#
# Multilevel Hyperparameters:
# ~PhyloName (Number of levels: 265)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 2.93 0.96 1.33 5.06 1.00 3607 5803
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept -0.62 1.88 -4.35 3.21 1.00 9854 10536
# cSocial_group_size -0.21 0.25 -0.79 0.21 1.00 19703 12846
# activity_cycledi -0.81 1.54 -4.04 2.07 1.00 9439 8872
# activity_cyclenoct -0.07 1.56 -3.29 2.93 1.00 9517 9450
#
# Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
# and Tail_ESS are effective sample size measures, and Rhat is the potential
# scale reduction factor on split chains (at convergence, Rhat = 1).Similar results were obtained with the logit model as with the probit model. In conclusion, the models are consistent and suggest that neither social group size nor diurnal activity significantly affect red pelage presence.
Bivariate model
We jointly analysed two binary traits, presence of red pelage on the body/limbs and on the head, using a multivariate probit/logit threshold model with a phylogenetic random effect. Residual variances were fixed to 1 for identification and residual cross-trait covariance was set to 0. We fitted three specifications: intercept-only, adding centred social group size, and adding activity cycle.
Probit link
inv.phylo <- inverseA(p_tree, nodes = "ALL", scale = TRUE) # invert covariance matrix for use by MCMCglmm
prior2 <- list(G = list(G1 = list(V = diag(2),
nu = 2, alpha.mu = rep(0, 2),
alpha.V = diag(2) * 10)),
R = list(V = diag(2), fix = 1)
)
system.time(
mcmc_BPB1 <- MCMCglmm(cbind(red_pelage_body_limbs, red_pelage_head) ~ trait - 1,
random = ~ us(trait):PhyloName,
rcov = ~ us(trait):units,
family = c("threshold", "threshold"),
data = p_dat,
prior = prior2,
ginverse = list(PhyloName = inv.phylo$Ainv),
nitt = 13000*25,
thin = 10*25,
burnin = 3000*25
)
)
system.time(
mcmc_BPB2 <- MCMCglmm(cbind(red_pelage_body_limbs, red_pelage_head) ~ cSocial_group_size:trait + trait - 1,
random = ~ us(trait):PhyloName,
rcov = ~ us(trait):units,
family = c("threshold", "threshold"),
data = p_dat,
prior = prior2,
ginverse = list(PhyloName = inv.phylo$Ainv),
nitt = 13000*55,
thin = 10*55,
burnin = 3000*55
)
)
system.time(
mcmc_BPB3 <- MCMCglmm(cbind(red_pelage_body_limbs, red_pelage_head) ~ cSocial_group_size:trait + diurnal :trait + trait - 1,
random = ~ us(trait):PhyloName,
rcov = ~ us(trait):units,
family = c("threshold", "threshold"),
data = p_dat,
prior = prior2,
ginverse = list(PhyloName = inv.phylo$Ainv),
nitt = 13000*100,
thin = 10*100,
burnin = 3000*100
)
)
A <- ape::vcv.phylo(p_tree, corr = TRUE)
formula_biPv1 <- bf(mvbind(red_pelage_body_limbs, red_pelage_head) ~ 1 +
(1|a|gr(PhyloName, cov = A))
)
default_prior2 <- default_prior(formula_biPv1,
data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "probit")
)
system.time(
brms_BPB1 <- brm(formula = formula_biPv1,
data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "probit"),
prior = default_prior2,
iter = 25000,
warmup = 5000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95)
)
)
formula_biPv2 <- bf(mvbind(red_pelage_body_limbs, red_pelage_head) ~ cSocial_group_size +
(1|a|gr(PhyloName, cov = A))
# set_rescor(TRUE)
)
default_prior3 <- default_prior(formula_biPv2,
data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "probit")
)
system.time(
brms_BPB2 <- brm(formula = formula_biPv2,
data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "probit"),
prior = default_prior3,
iter = 35000,
warmup = 15000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95)
)
)
formula_biPv3 <- bf(mvbind(red_pelage_body_limbs, red_pelage_head) ~ cSocial_group_size + diurnal +
(1|a|gr(PhyloName, cov = A))
# set_rescor(TRUE)
)
default_prior4 <- default_prior(formula_biPv3,
data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "probit")
)
system.time(
brms_BPB3 <- brm(formula = formula_biPv3,
data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "probit"),
prior = default_prior4,
iter = 35000,
warmup = 25000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95)
)
)Logit link
# prior setting for mcmcglmm
inv.phylo <- inverseA(p_tree, nodes = "ALL", scale = TRUE)
prior <- list(R = list(V = diag(2), fix = 1),
G = list(G1 = list(V = diag(2), nu = 2, alpha.mu = rep(0, 2),
alpha.V = diag(2) * 10)
)
)
# function to run mcmcglmm
run_mcmcglmm <- function(formula, data, prior, inv_phylo) {
MCMCglmm(
fixed = formula,
random = ~ us(trait):PhyloName,
rcov = ~ us(trait):units,
family = c("categorical", "categorical"),
data = data,
prior = prior,
ginverse = list(PhyloName = inv_phylo),
nitt = 13000*2500, # original *250
thin = 10*2500, # original *250
burnin = 3000*2500 # original *250
)
}
# model list - mcmcglmm
mcmcglmm_formulas <- list(
formula1 = cbind(red_pelage_body_limbs, red_pelage_head) ~ trait - 1,
formula2 = cbind(red_pelage_body_limbs, red_pelage_head) ~ cSocial_group_size:trait + trait - 1,
formula3 = cbind(red_pelage_body_limbs, red_pelage_head) ~ cSocial_group_size:trait + activity_cycle:trait + trait - 1
)
#### brms ####
# prior setting for brms
A <- ape::vcv.phylo(p_tree, corr = TRUE)
default_priors <- function(formula, data, A) {
default_prior(formula,
data = data,
data2 = list(A = A),
family = bernoulli(link = "logit")
)
}
# function to run brms
run_brms <- function(formula, data, A, prior) {
brm(
formula = formula,
data = data,
data2 = list(A = A),
family = bernoulli(link = "logit"),
prior = prior,
iter = 10000, # original 4000
warmup = 5000, # original 3000
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95)
)
}
# model list
brms_formulas <- list(
formula1 = bf(mvbind(red_pelage_body_limbs, red_pelage_head) ~ 1 + (1|a|gr(PhyloName, cov = A))),
formula2 = bf(mvbind(red_pelage_body_limbs, red_pelage_head) ~ cSocial_group_size + (1|a|gr(PhyloName, cov = A)) ),
formula3 = bf(mvbind(red_pelage_body_limbs, red_pelage_head) ~ cSocial_group_size + activity_cycle + (1|a|gr(PhyloName, cov = A)))
)
#### run mcmcglm and brms ####
# mcmcglmm
mcmcglmm_results <- lapply(seq_along(mcmcglmm_formulas), function(i) {
formula <- mcmcglmm_formulas[[i]]
model <- run_mcmcglmm(formula = formula, data = p_dat, prior = prior, inv_phylo = inv.phylo$Ainv)
saveRDS(model, file = here("Rdata_tutorial", "binomial", "bivariate", paste0("mcmcglmm_BLB_v2", i, ".rds")))
return(model)
})
# brms
brms_results <- lapply(seq_along(brms_formulas), function(i) {
formula <- brms_formulas[[i]]
prior2 <- default_prior(formula,
data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "logit"))
model <- run_brms(formula = formula, data = p_dat, A = A, prior = prior2)
saveRDS(model, file = here("Rdata_tutorial", "binomial", "bivariate", paste0("brms_BLB_v2", i, ".rds")))
return(model)
})Results from each models
Probit link model:
Across all three models, the 95% CIs were wide, particularly for phylogenetic random effect variances and correlations. This indicates substantial uncertainty around the exact magnitude of phylogenetic effects, even though the overall patterns were consistent. Estimate differed somewhat between MCMCglmm and brms, but their 95%CIs largely overlapped and the direction of effects was the same, suggesting that the results are robust.
Intercept-only model
summary(mcmc_BPB1)
# Iterations = 75001:324751
# Thinning interval = 250
# Sample size = 1000
#
# DIC:
#
# G-structure: ~us(trait):PhyloName
#
# post.mean l-95% CI u-95% CI eff.samp
# traitred_pelage_body_limbs:traitred_pelage_body_limbs.PhyloName 8.585 1.4925 20.01 1405.5
# traitred_pelage_head:traitred_pelage_body_limbs.PhyloName 6.770 1.1077 15.62 895.0
# traitred_pelage_body_limbs:traitred_pelage_head.PhyloName 6.770 1.1077 15.62 895.0
# traitred_pelage_head:traitred_pelage_head.PhyloName 9.812 0.9357 25.06 736.1
#
# R-structure: ~us(trait):units
#
# post.mean l-95% CI u-95% CI eff.samp
# traitred_pelage_body_limbs:traitred_pelage_body_limbs.units 1 1 1 0
# traitred_pelage_head:traitred_pelage_body_limbs.units 0 0 0 0
# traitred_pelage_body_limbs:traitred_pelage_head.units 0 0 0 0
# traitred_pelage_head:traitred_pelage_head.units 1 1 1 0
#
# Location effects: cbind(red_pelage_body_limbs, red_pelage_head) ~ trait - 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# traitred_pelage_body_limbs -1.016 -3.755 1.871 1000 0.454
# traitred_pelage_head -1.077 -4.036 2.085 1000 0.448
posterior_summary(mcmc_BPB1$Sol)
# Estimate Est.Error Q2.5 Q97.5
# traitred_pelage_body_limbs -1.015976 1.433527 -4.002762 1.659291
# traitred_pelage_head -1.076598 1.581606 -4.486771 1.754392
posterior_summary(mcmc_BPB1$VCV)
# Estimate Est.Error Q2.5 Q97.5
# traitred_pelage_body_limbs:traitred_pelage_body_limbs.PhyloName 8.585286 5.994698 2.308280 24.00470
# traitred_pelage_head:traitred_pelage_body_limbs.PhyloName 6.770038 4.153591 1.843889 18.91487
# traitred_pelage_body_limbs:traitred_pelage_head.PhyloName 6.770038 4.153591 1.843889 18.91487
# traitred_pelage_head:traitred_pelage_head.PhyloName 9.811660 8.425988 1.944167 34.12230
# traitred_pelage_body_limbs:traitred_pelage_body_limbs.units 1.000000 0.000000 1.000000 1.00000
# traitred_pelage_head:traitred_pelage_body_limbs.units 0.000000 0.000000 0.000000 0.00000
# traitred_pelage_body_limbs:traitred_pelage_head.units 0.000000 0.000000 0.000000 0.00000
# traitred_pelage_head:traitred_pelage_head.units 1.000000 0.000000 1.000000 1.00000
summary(brms_BPB1)
# Family: MV(bernoulli, bernoulli)
# Links: mu = probit
# mu = probit
# Formula: red_pelage_body_limbs ~ 1 + (1 | a | gr(PhyloName, cov = A))
# red_pelage_head ~ 1 + (1 | a | gr(PhyloName, cov = A))
# Data: p_dat (Number of observations: 265)
# Draws: 2 chains, each with iter = 25000; warmup = 5000; thin = 1;
# total post-warmup draws = 40000
#
# Multilevel Hyperparameters:
# ~PhyloName (Number of levels: 265)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(redpelagebodylimbs_Intercept) 2.47 0.70 1.35 4.11 1.00 9742 19379
# sd(redpelagehead_Intercept) 2.54 0.86 1.20 4.54 1.00 8959 17895
# cor(redpelagebodylimbs_Intercept,redpelagehead_Intercept) 0.79 0.12 0.50 0.96 1.00 13059 17565
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# redpelagebodylimbs_Intercept -0.57 1.03 -2.68 1.45 1.00 19113 23116
# redpelagehead_Intercept -0.63 1.07 -2.74 1.52 1.00 22258 24066
#
# Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
# and Tail_ESS are effective sample size measures, and Rhat is the potential
# scale reduction factor on split chains (at convergence, Rhat = 1).Both MCMCglmm and brms indicated a strong phylogenetic random effects (body/limbs: 8.59 (MCMCglmm) and 6.10 (brms), head: 9.81 (MCMCglmm) and 6.45 (brms)) and correlation between red pelage on the body/limbs and on the head (: 0.74 (MCMCglmm) and 0.79 (brms)). The intercepts were close to zero with wide intervals, suggesting no overall bias in trait prevalence, but a clear tendency for the two traits to co-occur along phylogenetic lineages. Note: You can calculate correlation using cov(body/limbs, head)/var(body/limbs)*var(head).
One explanatory variable model
summary(mcmc_BPB2)
# Iterations = 165001:714451
# Thinning interval = 550
# Sample size = 1000
#
# DIC:
#
# G-structure: ~us(trait):PhyloName
#
# post.mean l-95% CI u-95% CI eff.samp
# traitred_pelage_body_limbs:traitred_pelage_body_limbs.PhyloName 9.206 0.8927 20.70 936.9
# traitred_pelage_head:traitred_pelage_body_limbs.PhyloName 7.773 0.9654 18.15 1000.0
# traitred_pelage_body_limbs:traitred_pelage_head.PhyloName 7.773 0.9654 18.15 1000.0
# traitred_pelage_head:traitred_pelage_head.PhyloName 11.574 1.2028 28.12 1000.0
#
# R-structure: ~us(trait):units
#
# post.mean l-95% CI u-95% CI eff.samp
# traitred_pelage_body_limbs:traitred_pelage_body_limbs.units 1 1 1 0
# traitred_pelage_head:traitred_pelage_body_limbs.units 0 0 0 0
# traitred_pelage_body_limbs:traitred_pelage_head.units 0 0 0 0
# traitred_pelage_head:traitred_pelage_head.units 1 1 1 0
#
# Location effects: cbind(red_pelage_body_limbs, red_pelage_head) ~ cSocial_group_size:trait + trait - 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# traitred_pelage_body_limbs -0.96134 -3.71702 1.92201 1000 0.468
# traitred_pelage_head -1.03760 -4.15635 2.23607 1000 0.468
# cSocial_group_size:traitred_pelage_body_limbs -0.11524 -0.40543 0.15340 1159 0.450
# cSocial_group_size:traitred_pelage_head 0.06917 -0.16555 0.32105 1000 0.596
posterior_summary(mcmc_BPB2$VCV)
# Estimate Est.Error Q2.5 Q97.5
# traitred_pelage_body_limbs:traitred_pelage_body_limbs.PhyloName 9.205964 7.089659 2.195793 27.00238
# traitred_pelage_head:traitred_pelage_body_limbs.PhyloName 7.772934 4.949530 2.069536 21.01101
# traitred_pelage_body_limbs:traitred_pelage_head.PhyloName 7.772934 4.949530 2.069536 21.01101
# traitred_pelage_head:traitred_pelage_head.PhyloName 11.573686 9.504936 2.138451 38.94923
# traitred_pelage_body_limbs:traitred_pelage_body_limbs.units 1.000000 0.000000 1.000000 1.00000
# traitred_pelage_head:traitred_pelage_body_limbs.units 0.000000 0.000000 0.000000 0.00000
# traitred_pelage_body_limbs:traitred_pelage_head.units 0.000000 0.000000 0.000000 0.00000
# traitred_pelage_head:traitred_pelage_head.units 1.000000 0.000000 1.000000 1.00000
posterior_summary(mcmc_BPB2$Sol)
# Estimate Est.Error Q2.5 Q97.5
# traitred_pelage_body_limbs -0.96134135 1.462895 -4.0424929 1.6874910
# traitred_pelage_head -1.03760246 1.604165 -4.4335258 2.1006014
# cSocial_group_size:traitred_pelage_body_limbs -0.11524039 0.150902 -0.4464785 0.1389921
# cSocial_group_size:traitred_pelage_head 0.06917466 0.125719 -0.1645042 0.3210948
summary(brms_BPB2)
# Family: MV(bernoulli, bernoulli)
# Links: mu = probit
# mu = probit
# Formula: red_pelage_body_limbs ~ cSocial_group_size + (1 | a | gr(PhyloName, cov = A))
# red_pelage_head ~ cSocial_group_size + (1 | a | gr(PhyloName, cov = A))
# Data: p_dat (Number of observations: 265)
# Draws: 2 chains, each with iter = 35000; warmup = 15000; thin = 1;
# total post-warmup draws = 40000
#
# Multilevel Hyperparameters:
# ~PhyloName (Number of levels: 265)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(redpelagebodylimbs_Intercept) 2.56 0.74 1.39 4.26 1.00 8659 18347
# sd(redpelagehead_Intercept) 2.70 0.95 1.28 4.95 1.00 8461 16509
# cor(redpelagebodylimbs_Intercept,redpelagehead_Intercept) 0.80 0.12 0.52 0.96 1.00 14328 20705
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# redpelagebodylimbs_Intercept -0.59 1.05 -2.75 1.46 1.00 22699 26397
# redpelagehead_Intercept -0.62 1.10 -2.85 1.60 1.00 25511 25658
# redpelagebodylimbs_cSocial_group_size -0.11 0.14 -0.43 0.14 1.00 54965 30027
# redpelagehead_cSocial_group_size 0.06 0.12 -0.17 0.30 1.00 50035 27765
#
# Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
# and Tail_ESS are effective sample size measures, and Rhat is the potential
# scale reduction factor on split chains (at convergence, Rhat = 1).Adding social group size as a predictor did not change the results in any important way. The estimated effects of group size were very small and the uncertainty intervals included zero, meaning that species living in larger or smaller groups were not more or less likely to show red pelage. The strong positive phylogenetic correlation between the two body regions remained, showing that shared ancestry, rather than social structure, is the main factor explaining why the traits occur together.
Two explanatory variables model
summary(mcmc_BPB3)
# Iterations = 225001:974251
# Thinning interval = 750
# Sample size = 1000
#
# DIC:
#
# G-structure: ~us(trait):PhyloName
#
# post.mean l-95% CI u-95% CI eff.samp
# traitred_pelage_body_limbs:traitred_pelage_body_limbs.PhyloName 10.593 1.140 26.01 1000.0
# traitred_pelage_head:traitred_pelage_body_limbs.PhyloName 9.358 1.550 22.36 886.2
# traitred_pelage_body_limbs:traitred_pelage_head.PhyloName 9.358 1.550 22.36 886.2
# traitred_pelage_head:traitred_pelage_head.PhyloName 15.643 1.257 43.19 721.6
#
# R-structure: ~us(trait):units
#
# post.mean l-95% CI u-95% CI eff.samp
# traitred_pelage_body_limbs:traitred_pelage_body_limbs.units 1 1 1 0
# traitred_pelage_head:traitred_pelage_body_limbs.units 0 0 0 0
# traitred_pelage_body_limbs:traitred_pelage_head.units 0 0 0 0
# traitred_pelage_head:traitred_pelage_head.units 1 1 1 0
#
# Location effects: cbind(red_pelage_body_limbs, red_pelage_head) ~ cSocial_group_size:trait + diurnal:trait + trait - 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# traitred_pelage_body_limbs -0.80913 -4.13667 2.39332 1000.0 0.590
# traitred_pelage_head -1.08133 -5.11647 2.61995 1000.0 0.504
# cSocial_group_size:traitred_pelage_body_limbs -0.11514 -0.43031 0.17890 934.6 0.476
# cSocial_group_size:traitred_pelage_head 0.07673 -0.18447 0.34485 1000.0 0.542
# traitred_pelage_body_limbs:diurnal -0.48680 -2.12428 1.33723 1000.0 0.570
# traitred_pelage_head:diurnal 0.15765 -1.75338 2.13502 1000.0 0.926
posterior_summary(mcmc_BPB3$Sol)
# Estimate Est.Error Q2.5 Q97.5
# traitred_pelage_body_limbs -0.8091325 1.6473735 -4.0927622 2.4659980
# traitred_pelage_head -1.0813268 1.8811821 -5.1975758 2.5249237
# cSocial_group_size:traitred_pelage_body_limbs -0.1151393 0.1595438 -0.4547532 0.1720350
# cSocial_group_size:traitred_pelage_head 0.0767252 0.1374082 -0.1824891 0.3458651
# traitred_pelage_body_limbs:diurnal -0.4867977 0.8865311 -2.2848913 1.1781883
# traitred_pelage_head:diurnal 0.1576550 1.0020809 -1.7291174 2.2491917
posterior_summary(mcmc_BPB3$VCV)
# Estimate Est.Error Q2.5 Q97.5
# traitred_pelage_body_limbs:traitred_pelage_body_limbs.PhyloName 10.593195 8.204460 2.421832 32.32659
# traitred_pelage_head:traitred_pelage_body_limbs.PhyloName 9.358446 6.292992 2.161271 27.21691
# traitred_pelage_body_limbs:traitred_pelage_head.PhyloName 9.358446 6.292992 2.161271 27.21691
# traitred_pelage_head:traitred_pelage_head.PhyloName 15.642777 16.000102 2.606732 59.73609
# traitred_pelage_body_limbs:traitred_pelage_body_limbs.units 1.000000 0.000000 1.000000 1.00000
# traitred_pelage_head:traitred_pelage_body_limbs.units 0.000000 0.000000 0.000000 0.00000
# traitred_pelage_body_limbs:traitred_pelage_head.units 0.000000 0.000000 0.000000 0.00000
# traitred_pelage_head:traitred_pelage_head.units 1.000000 0.000000 1.000000 1.00000
# traitred_pelage_head:traitred_pelage_head.units 1.000000 0.000000 1.000000 1.00000
summary(brms_BPB3)
# Family: MV(bernoulli, bernoulli)
# Links: mu = probit
# mu = probit
# Formula: red_pelage_body_limbs ~ cSocial_group_size + activity_cycle + (1 | a | gr(PhyloName, cov = A))
# red_pelage_head ~ cSocial_group_size + activity_cycle + (1 | a | gr(PhyloName, cov = A))
# Data: p_dat (Number of observations: 265)
# Draws: 2 chains, each with iter = 35000; warmup = 25000; thin = 1;
# total post-warmup draws = 20000
#
# Multilevel Hyperparameters:
# ~PhyloName (Number of levels: 265)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(redpelagebodylimbs_Intercept) 2.84 0.88 1.51 4.90 1.00 3193 6522
# sd(redpelagehead_Intercept) 3.07 1.21 1.39 5.97 1.00 3135 5097
# cor(redpelagebodylimbs_Intercept,redpelagehead_Intercept) 0.80 0.11 0.52 0.96 1.00 5791 8929
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# redpelagebodylimbs_Intercept -0.67 1.45 -3.60 2.19 1.00 7758 10369
# redpelagehead_Intercept -0.42 1.54 -3.61 2.56 1.00 8030 8538
# redpelagebodylimbs_cSocial_group_size -0.11 0.15 -0.44 0.16 1.00 15582 11724
# redpelagebodylimbs_activity_cycledi -0.13 1.02 -2.23 1.85 1.00 8423 10023
# redpelagebodylimbs_activity_cyclenoct 0.39 1.07 -1.75 2.50 1.00 8554 9618
# redpelagehead_cSocial_group_size 0.07 0.12 -0.18 0.31 1.00 14941 11504
# redpelagehead_activity_cycledi -0.07 1.08 -2.07 2.21 1.00 8644 8966
# redpelagehead_activity_cyclenoct -0.44 1.20 -2.75 2.02 1.00 9458 8874
#
# Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
# and Tail_ESS are effective sample size measures, and Rhat is the potential
# scale reduction factor on split chains (at convergence, Rhat = 1).When both social group size and activity cycle were included, neither predictor showed strong or consistent effects. The estimated coefficients were uncertain and close to zero. As in the previous models. there was a strong positive phylogenetic random effect and correlation between the two traits, indicating that shared ancestry is the main factor explaining why the traits occur together.
Logit link model:
In the logit link model, we need to correct the variances and estimates in the results from MCMCglmm…
Intercept-only model
# summary(mcmc_BLB1)
c2 <- (16 * sqrt(3) / (15 * pi))^2
res_1 <- mcmc_BLB1$Sol/sqrt(1+c2)
res_2 <- mcmc_BLB1$VCV/(1+c2)
posterior_summary(res_1)
# Estimate Est.Error Q2.5 Q97.5
# traitred_pelage_body_limbs.2 -1.521216 2.276932 -6.235553 2.645887
# traitred_pelage_head.2 -1.697114 2.392179 -6.463650 3.067338
posterior_summary(res_2)
# Estimate Est.Error Q2.5 Q97.5
# traitred_pelage_body_limbs.2:traitred_pelage_body_limbs.2.PhyloName 22.3183259 15.06258 5.8362624 60.1615736
# traitred_pelage_head.2:traitred_pelage_body_limbs.2.PhyloName 18.0581310 10.73053 4.5460350 46.1693777
# traitred_pelage_body_limbs.2:traitred_pelage_head.2.PhyloName 18.0581310 10.73053 4.5460350 46.1693777
# traitred_pelage_head.2:traitred_pelage_head.2.PhyloName 25.3350646 20.48068 4.7065443 77.9581473
# traitred_pelage_body_limbs.2:traitred_pelage_body_limbs.2.units 0.7430287 0.00000 0.7430287 0.7430287
# traitred_pelage_head.2:traitred_pelage_body_limbs.2.units 0.0000000 0.00000 0.0000000 0.0000000
# traitred_pelage_body_limbs.2:traitred_pelage_head.2.units 0.0000000 0.00000 0.0000000 0.0000000
# traitred_pelage_head.2:traitred_pelage_head.2.units 0.7430287 0.00000 0.7430287 0.7430287
summary(brms_BLB1)
# Family: MV(bernoulli, bernoulli)
# Links: mu = logit
# mu = logit
# Formula: red_pelage_body_limbs ~ 1 + (1 | a | gr(PhyloName, cov = A))
# red_pelage_head ~ 1 + (1 | a | gr(PhyloName, cov = A))
# Data: data (Number of observations: 265)
# Draws: 2 chains, each with iter = 4000; warmup = 3000; thin = 1;
# total post-warmup draws = 2000
#
# Multilevel Hyperparameters:
# ~PhyloName (Number of levels: 265)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(redpelagebodylimbs_Intercept) 3.88 1.17 2.08 6.63 1.00 509 963
# sd(redpelagehead_Intercept) 3.99 1.37 1.86 7.18 1.01 402 854
# cor(redpelagebodylimbs_Intercept,redpelagehead_Intercept) 0.79 0.13 0.49 0.96 1.02 434 823
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# redpelagebodylimbs_Intercept -0.63 1.42 -3.54 2.06 1.00 957 1060
# redpelagehead_Intercept -0.82 1.46 -3.63 2.23 1.00 1092 1298Intercepts were near zero with very wide intervals in MCMCglmm and brms. Phylogenetic variances were large and the cross‐trait covariance was positive, giving a high phylogenetic correlation. Overall, both models agree on a strong positive phylogenetic association between traits, although estimates are uncertain and intervals are wide.
One explanatory variable model
# summary(mcmc_BLB2)
c2 <- (16 * sqrt(3) / (15 * pi))^2
res_1 <- mcmc_BLB2$Sol/sqrt(1+c2)
res_2 <- mcmc_BLB2$VCV/(1+c2)
posterior_summary(res_1)
# Estimate Est.Error Q2.5 Q97.5
# traitred_pelage_body_limbs.2 -1.7247280 2.5117027 -7.4454943 2.7521580
# traitred_pelage_head.2 -1.8714622 2.6036701 -7.4632854 3.1199567
# cSocial_group_size:traitred_pelage_body_limbs.2 -0.2114015 0.2693973 -0.7956990 0.2218048
# cSocial_group_size:traitred_pelage_head.2 0.1132643 0.2118232 -0.3220887 0.5181248
posterior_summary(res_2)
# Estimate Est.Error Q2.5 Q97.5
# traitred_pelage_body_limbs.2:traitred_pelage_body_limbs.2.PhyloName 25.8375512 17.21042 6.4036205 71.0978415
# traitred_pelage_head.2:traitred_pelage_body_limbs.2.PhyloName 22.0012511 13.81957 5.1039500 58.9474559
# traitred_pelage_body_limbs.2:traitred_pelage_head.2.PhyloName 22.0012511 13.81957 5.1039500 58.9474559
# traitred_pelage_head.2:traitred_pelage_head.2.PhyloName 31.8824714 28.59083 5.5360062 95.3746806
# traitred_pelage_body_limbs.2:traitred_pelage_body_limbs.2.units 0.7430287 0.00000 0.7430287 0.7430287
# traitred_pelage_head.2:traitred_pelage_body_limbs.2.units 0.0000000 0.00000 0.0000000 0.0000000
# traitred_pelage_body_limbs.2:traitred_pelage_head.2.units 0.0000000 0.00000 0.0000000 0.0000000
# traitred_pelage_head.2:traitred_pelage_head.2.units 0.7430287 0.00000 0.7430287 0.7430287
summary(brms_BLB2)
# Family: MV(bernoulli, bernoulli)
# Links: mu = logit
# mu = logit
# Formula: red_pelage_body_limbs ~ cSocial_group_size + (1 | a | gr(PhyloName, cov = A))
# red_pelage_head ~ cSocial_group_size + (1 | a | gr(PhyloName, cov = A))
# Data: data (Number of observations: 265)
# Draws: 2 chains, each with iter = 4000; warmup = 3000; thin = 1;
# total post-warmup draws = 2000
#
# Multilevel Hyperparameters:
# ~PhyloName (Number of levels: 265)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(redpelagebodylimbs_Intercept) 3.93 1.19 2.10 6.80 1.00 526 916
# sd(redpelagehead_Intercept) 4.09 1.34 1.89 7.02 1.00 521 774
# cor(redpelagebodylimbs_Intercept,redpelagehead_Intercept) 0.78 0.13 0.47 0.96 1.00 827 1113
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# redpelagebodylimbs_Intercept -0.71 1.43 -3.58 2.12 1.00 864 1130
# redpelagehead_Intercept -0.84 1.42 -3.68 1.99 1.00 1059 1183
# redpelagebodylimbs_cSocial_group_size -0.19 0.25 -0.74 0.23 1.00 2204 1489
# redpelagehead_cSocial_group_size 0.11 0.20 -0.30 0.50 1.00 2265 1554
#
# Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
# and Tail_ESS are effective sample size measures, and Rhat is the potential
# scale reduction factor on split chains (at convergence, Rhat = 1).Intercepts for both traits were close to zero with very wide intervals. Social group size effects were small and did not overlap with zero. Phylogenetic random effects are large, yielding a strong phylogenetic correlation in both models. Adding social group size did not materially change the results. Both approaches support a strong phylogenetic association between traits.
Two explanatory variables model
# summary(mcmc_BLB3)
c2 <- (16 * sqrt(3) / (15 * pi))^2
res_1 <- mcmc_BLB3$Sol/sqrt(1+c2)
res_2 <- mcmc_BLB3$VCV/(1+c2)
posterior_summary(res_1)
# Estimate Est.Error Q2.5 Q97.5
# traitred_pelage_body_limbs.2 -52.47073523 153.0210964 -470.6945375 213.7168079
# traitred_pelage_head.2 -1.59381221 4.2268293 -9.9107244 6.2757031
# cSocial_group_size:traitred_pelage_body_limbs.2 -0.05546842 5.6201453 -12.2032240 14.3404769
# cSocial_group_size:traitred_pelage_head.2 0.15627339 0.2463914 -0.3461778 0.6621622
# traitred_pelage_body_limbs.2:activity_cycledi -8.13420554 61.9395801 -159.5704053 139.5901896
# traitred_pelage_head.2:activity_cycledi -0.29106864 2.1108018 -4.1396017 4.2412983
# traitred_pelage_body_limbs.2:activity_cyclenoct 13.99091102 63.9019094 -121.4704631 173.7572996
# traitred_pelage_head.2:activity_cyclenoct -0.83013530 2.2719205 -5.0979767 3.5671068
posterior_summary(res_2)
# Estimate Est.Error Q2.5 Q97.5
# traitred_pelage_body_limbs.2:traitred_pelage_body_limbs.2.PhyloName 7.672423e+04 111379.21048 9.1326987 3.823663e+05
# traitred_pelage_head.2:traitred_pelage_body_limbs.2.PhyloName 1.199345e+03 1623.30073 8.2063366 5.272497e+03
# traitred_pelage_body_limbs.2:traitred_pelage_head.2.PhyloName 1.199345e+03 1623.30073 8.2063366 5.272497e+03
# traitred_pelage_head.2:traitred_pelage_head.2.PhyloName 5.319362e+01 72.57619 7.6215483 1.778477e+02
# traitred_pelage_body_limbs.2:traitred_pelage_body_limbs.2.units 7.430287e-01 0.00000 0.7430287 7.430287e-01
# traitred_pelage_head.2:traitred_pelage_body_limbs.2.units 0.000000e+00 0.00000 0.0000000 0.000000e+00
# traitred_pelage_body_limbs.2:traitred_pelage_head.2.units 0.000000e+00 0.00000 0.0000000 0.000000e+00
# traitred_pelage_head.2:traitred_pelage_head.2.units 7.430287e-01 0.00000 0.7430287 7.430287e-01
summary(brms_BLB3)
#
# Family: MV(bernoulli, bernoulli)
# Links: mu = logit
# mu = logit
# Formula: red_pelage_body_limbs ~ cSocial_group_size + activity_cycle + (1 | a | gr(PhyloName, cov = A))
# red_pelage_head ~ cSocial_group_size + activity_cycle + (1 | a | gr(PhyloName, cov = A))
# Data: data (Number of observations: 265)
# Draws: 2 chains, each with iter = 10000; warmup = 5000; thin = 1;
# total post-warmup draws = 10000
#
# Multilevel Hyperparameters:
# ~PhyloName (Number of levels: 265)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(redpelagebodylimbs_Intercept) 4.37 1.32 2.34 7.44 1.00 2900 4826
# sd(redpelagehead_Intercept) 4.66 2.05 2.06 8.70 1.00 2332 3687
# cor(redpelagebodylimbs_Intercept,redpelagehead_Intercept) 0.79 0.12 0.51 0.96 1.00 3899 5188
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# redpelagebodylimbs_Intercept -0.70 2.08 -4.86 3.43 1.00 8051 7049
# redpelagehead_Intercept -0.34 2.19 -4.85 3.93 1.00 8470 6758
# redpelagebodylimbs_cSocial_group_size -0.19 0.26 -0.77 0.25 1.00 13234 6843
# redpelagebodylimbs_activity_cycledi -0.22 1.66 -3.72 2.95 1.00 8089 6616
# redpelagebodylimbs_activity_cyclenoct 0.54 1.75 -3.03 3.98 1.00 7681 6273
# redpelagehead_cSocial_group_size 0.11 0.21 -0.30 0.52 1.00 12432 5620
# redpelagehead_activity_cycledi -0.17 1.78 -3.45 3.50 1.00 6806 4880
# redpelagehead_activity_cyclenoct -0.94 1.92 -4.62 3.11 1.00 7076 5211In MCMCglmm, estimates were generally unstable. In particular, the intercept for body/limbs and the activity cycle effects showed extremely large errors and wide intervals. The effects of social group size and activity cycle both crossed zero, providing no evidence of significant associations. Phylogenetic variance estimates were extremely large with very wide intervals, indicating low precision. In brms, the estimates were more stable. The phylogenetic random‐effect standard deviations were large, and the between‐trait phylogenetic correlation remained high. The effects of social group size and activity cycle were small, with 95% intervals overlapping zero. Overall, in both implementations, neither social group size nor activity cycle emerged as important predictors of red pelage, while the main effect was a consistently strong positive phylogenetic correlation between traits. However, MCMCglmm produced unstable estimates with very wide intervals, and this discrepancy likely reflects differences in prior specification between the two frameworks.
3. Ordinal (threshold) models (ordered multinomial model)
In the ordinal threshold model and nominal model, we will show 2 realistic examples. As we explained in the BOX2 on main text, MCMCglmm and brms take different parametrisation for the threshold model. At first sight, the outputs about thresholds (cutpoints) appear to be quite difficult. But do not worry about it,we will guide you step by step to understand and interpret these outputs. By the end of this section, you will be able to compare the threshold estimates from both models and understand their implications.
Explanation of dataset
We used the dataset from AVONET (Tobias et al. (2022)) and the phylogenetic tree from BirdTree.
For first example, we tested the relationships between migration level (response variable) and body mass and habitat density in birds of prey (136 species).
For second example, we investigated the difference of habitat density within family Phasianidae (179 species) are affected by tail length and diet (herbivore).
Datasets for example 1
dat <- read.csv(here("data", "bird body mass", "accipitridae_sampled.csv"))
dat <- dat %>%
mutate(across(c(Trophic.Level, Trophic.Niche, Primary.Lifestyle, Migration, Habitat, Species.Status), as.factor))
# rename: numbers to descriptive name
dat$Migration_ordered <- factor(dat$Migration, levels = c("sedentary", "partially_migratory", "migratory"), ordered = TRUE)
dat <- dat %>%
mutate(Habitat.Density = case_when(
Habitat.Density == 1 ~ "dense",
Habitat.Density == 2 ~ "semi-open",
Habitat.Density == 3 ~ "open",
TRUE ~ as.character(Habitat.Density)
))
table(dat$Habitat.Density)
dat$logMass <- log(dat$Mass)
hist(dat$logMass)
boxplot(logMass ~ Habitat.Density, data = dat)
summary(dat)
trees <- read.nexus(here("data", "bird body mass", "accipitridae_sampled.nex"))
tree <- trees[[1]]
summary(dat)Datasets for example 2
dat <- read.csv(here("data", "bird body mass", "9993spp_clearned.csv"))
dat <- dat %>%
mutate(across(c(Trophic.Level, Trophic.Niche,
Primary.Lifestyle, Migration, Habitat, Habitat.Density , Species.Status), as.factor),
Habitat.Density = factor(Habitat.Density, ordered = TRUE))
Family_HD <- dat %>%
group_by(Family, Habitat.Density) %>%
tally() %>%
group_by(Family) %>%
filter(all(n >= 5)) %>%
ungroup()
Family_HD <- dat %>%
filter(Family %in% c("Phasianidae", "Ploceidae", "Sturnidae")) %>%
count(Family, Habitat.Density)
Phasianidae_dat <- dat %>%
filter(Family == "Phasianidae") %>%
mutate(Habitat.Density = case_when(
Habitat.Density == 1 ~ "dense",
Habitat.Density == 2 ~ "semi-open",
Habitat.Density == 3 ~ "open"
),
Habitat.Density = factor(Habitat.Density, ordered = TRUE)
)
Phasianidae_dat <- Phasianidae_dat %>%
mutate(Habitat_Forest = ifelse(Habitat == "Forest", 1, 0),
Habitat_Forest = replace_na(Habitat_Forest, 0))
Phasianidae_dat$cTail_length <- scale(log(Phasianidae_dat$Tail.Length), center = TRUE, scale = FALSE)
View(Phasianidae_dat)
Phasianidae_tree <- read.nexus(here("data", "bird body mass", "Phasianidae.nex"))
tree <- Phasianidae_tree[[1]]Example 1
Intercept-only model
MCMCglmm
When using MCMCglmm for ordinal data, you can choose between two options for the family parameter: "threshold" and "ordinal". These options differ in how they handle residual variances and error distributions.
We generally recommend using family = "threshold" for following reasons - directly corresponds to the standard probit regression with a variance of 1, no correcting of outputs is necessary, and it provides a more computationally efficient approach.
We show both models through this section;
inv_phylo <- inverseA(tree, nodes = "ALL", scale = TRUE)
prior1 <- list(R = list(V = 1, fix = 1),
G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)))
system.time(
mcmcglmm_T1_1 <- MCMCglmm(Migration_ordered ~ 1,
random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "threshold",
data = dat,
prior = prior1,
nitt = 13000*40,
thin = 10*40,
burnin = 3000*40
)
)
system.time(
mcmcglmm_O1_1 <- MCMCglmm(Migration_ordered ~ 1,
random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "ordinal",
data = dat,
prior = prior1,
nitt = 13000*40,
thin = 10*40,
burnin = 3000*40
)
)For ordinal model, we need the correction;
c2 <- 1
res_1 <- model$Sol / sqrt(1+c2) # for fixed effects
res_2 <- model$VCV / (1+c2) # for variance componentssummary(mcmcglmm_T1_1) # 95%HPD Interval
# Iterations = 120001:519601
# Thinning interval = 400
# Sample size = 1000
#
# DIC: 297.9054
#
# G-structure: ~Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# Phylo 1.43 1.196e-06 4.766 1000
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: Migration_ordered ~ 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) 0.3178 -0.7651 1.5749 1000 0.464
#
# Cutpoints:
#
# post.mean l-95% CI u-95% CI eff.samp
# cutpoint.traitMigration_ordered.1 1.011 0.7089 1.408 1000
posterior_summary(mcmcglmm_T1_1$VCV) # 95%CI
# Estimate Est.Error Q2.5 Q97.5
# Phylo 1.429685 2.187129 0.0020836 6.25389
# units 1.000000 0.000000 1.0000000 1.00000
posterior_summary(mcmcglmm_T1_1$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) 0.3178127 0.5848262 -0.8350972 1.513238
posterior_summary(mcmcglmm_T1_3$CP)
# Estimate Est.Error Q2.5 Q97.5
# cutpoint.traitMigration_ordered.1 1.116292 0.2420588 0.7754895 1.677832
summary(mcmcglmm_O1_1) # 95%HPD Interval
# Iterations = 120001:519601
# Thinning interval = 400
# Sample size = 1000
#
# DIC: 267.7801
#
# G-structure: ~Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# Phylo 2.94 1.447e-05 10.78 779.5
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: Migration_ordered ~ 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) 0.4319 -1.0821 2.2693 889.5 0.456
#
# Cutpoints:
#
# post.mean l-95% CI u-95% CI eff.samp
# cutpoint.traitMigration_ordered.1 1.43 0.9309 1.959 817.1
posterior_summary(mcmcglmm_O1_1$VCV) # 95%CI
# Estimate Est.Error Q2.5 Q97.5
# Phylo 2.940261 4.749804 0.003052149 13.92029
# units 1.000000 0.000000 1.000000000 1.00000
posterior_summary(mcmcglmm_O1_1$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) 0.4319314 0.7973933 -1.170937 2.245297
c2 <- 1
res_1 <- mcmcglmm_O1_1$Sol / sqrt(1+c2) # for fixed effects
res_2 <- mcmcglmm_O1_1$VCV / (1+c2) # for variance components
res_3 <- mcmcglmm_O1_1$CP / sqrt(1+c2) # for cutpoint
posterior_summary(res_1)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) 0.3723211 0.6873459 -1.009337 1.935426
posterior_summary(res_2)
# Estimate Est.Error Q2.5 Q97.5
# Phylo 2.1846987 3.529241 0.002267834 10.3431749
# units 0.7430287 0.000000 0.743028733 0.7430287
posterior_summary(res_3)
# Estimate Est.Error Q2.5 Q97.5
# cutpoint.traitMigration_ordered.1 1.232696 0.2358161 0.8556872 1.782325the ordinal models show a smaller effect size compared to the threshold model, but the difference is minor and should not be an issue. The convergences appear to be satisfactory for both models. Also, it is well-known that ordinal models in MCMCglmm tend to be more difficult to converge compared to the threshold models. Despite this, the results obtained from both models are largely consistent, showing minimal difference. As you can see here, residual variance is fixed 1 - the model did not estimate the residual variance (R-structure`).
Look at the output more closely - only one cutpoint is provided although our response variable is three ordered categorical traits! As we have explained in main text, MCMCglmm implicitly assumes that the first cut point is always set to 0, making the intercept represent the centre of the distribution.
brms
The family argument is set to cumulative() with link = "probit", which uses the standard probit parametrisation where the residual variance is fixed at 1. Since this is the standard parametrisation for ordinal probit regression, no correcting of coefficients is necessary.
A <- ape::vcv.phylo(tree, corr = TRUE)
default_priors <- default_prior(
Migration_ordered ~ 1 + (1 | gr(Phylo, cov = A)),
data = dat,
family = cumulative(link = "probit"),
data2 = list(A = A)
)
# Fit the model
system.time(
brms_OT1_1 <- brm(
formula = Migration_ordered ~ 1 + (1 | gr(Phylo, cov = A)),
data = dat,
family = cumulative(link = "probit"),
data2 = list(A = A),
prior = default_priors,
iter = 19000,
warmup =16000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.99),
)
)summary(brms_OT1_1)
# Family: cumulative
# Links: mu = probit; disc = identity
# Formula: Migration_ordered ~ 1 + (1 | gr(Phylo, cov = A))
# Data: dat (Number of observations: 136)
# Draws: 2 chains, each with iter = 20000; warmup = 10000; thin = 1;
# total post-warmup draws = 20000
#
# Multilevel Hyperparameters:
# ~Phylo (Number of levels: 136)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.85 0.61 0.04 2.28 1.00 1594 3377
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept[1] -0.32 0.47 -1.36 0.61 1.00 12133 7267
# Intercept[2] 0.66 0.47 -0.24 1.75 1.00 11680 8235
#
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# disc 1.00 0.00 1.00 1.00 NA NA NAIn brms, now you can see the two intercepts and no cutpoint in the output! In fact, the intercept is described each cutpoint. However, the results (how many species fall into each category) show little difference between MCMCglmm and brms, as can be seen below.
Probabilities
We can obtain the probabilities for each category (migrate level: sedentary, partially migratory, migratory) using pnorm(). We made simple functions to calculate these and use them for the threshold models in MCMCglmm and brms.
# l = intercept
calculate_probabilities <- function(cutpoint0, cutpoint1, l) {
category1_prob <- pnorm(cutpoint0 - l)
category2_prob <- pnorm(cutpoint1 - l) - pnorm(cutpoint0 - l)
category3_prob <- 1 - pnorm(cutpoint1 - l)
return(c(sedentary = category1_prob,
partially_migratory = category2_prob,
migratory = category3_prob)
)
}probabilities_mcmcglmm <- calculate_probabilities(0, 1.01, 0.3178)
print(probabilities_mcmcglmm)
# sedentary partially_migratory migratory
# 0.3753183 0.3802758 0.2444059
probabilities_brms <- calculate_probabilities(-0.32, 0.66, 0)
print(probabilities_brms)
# sedentary partially_migratory migratory
# 0.3744842 0.3708889 0.2546269The results obtained in MCMCglmm are as follows: Sedentary: 0.3753183, Partially migratory: 0.3802758, and Migratory: 0.2444059. In brms, the results are Sedentary: 0.3744842, Partially migratory: 0.3708889, and Migratory: 0.2546269. As shown, although MCMCglmm and brms use parametrisations of and , respectively, the results are very similar (see Box 2 in main text).
Phylogenetic signals
# MCMCglmm
phylo_signal_mcmcglmm_T <- ((mcmcglmm_T1_1$VCV[, "Phylo"]) / (mcmcglmm_T1_1$VCV[, "Phylo"] + 1))
phylo_signal_mcmcglmm_T %>% mean()
# [1] 0.4172334
phylo_signal_mcmcglmm_T %>% quantile(probs = c(0.025, 0.5, 0.975))
# 2.5% 50% 97.5%
# 0.002079267 0.430888403 0.862142904
phylo_signal_mcmcglmm_O <- ((mcmcglmm_O1_1$VCV[, "Phylo"]/(1+c2)) / (mcmcglmm_O1_1$VCV[, "Phylo"]/(1+c2) + 1))
phylo_signal_mcmcglmm_O %>% mean()
# [1] 0.4731027
phylo_signal_mcmcglmm_O %>% quantile(probs = c(0.025,0.5,0.975))
# 2.5% 50% 97.5%
# 0.002262703 0.515490529 0.911841208
# brms
phylo_signal_brms_OT <- brms_OT1_1 %>% as_tibble() %>%
dplyr::select(Sigma_phy = sd_Phylo__Intercept) %>%
mutate(lambda = Sigma_phy^2 / (Sigma_phy^2 + 1)) %>%
pull(lambda)
phylo_signal_brms_OT %>% mean()
# [1] 0.3692979
phylo_signal_brms_OT %>% quantile(probs = c(0.025,0.5,0.975))
# 2.5% 50% 97.5%
# 0.001264342 0.359406642 0.838180812 In both MCMCglmm (threshold and ordinal models) and brms, the phylogenetic signal is estimated to be moderate. The moderately high phylogenetic signal suggests that closely related species tend to exhibit similar migratory behaviours, indicating that phylogeny plays a role in shaping these traits. However, this also implies that additional factors, such as environmental or ecological influences, likely contribute to the variation in migratory behavior. This balance between phylogenetic constraints and external influences may reflect evolutionary flexibility in migratory strategies among species.
One continuous explanatory variable model
The model using MCMCglmm is
inv_phylo <- inverseA(tree, nodes = "ALL", scale = TRUE)
prior1 <- list(R = list(V = 1, fix = 1),
G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)))
system.time(
mcmcglmm_T1_2 <- MCMCglmm(Migration_ordered ~ logMass,
random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "threshold",
data = dat,
prior = prior1,
nitt = 13000*45,
thin = 10*45,
burnin = 3000*45
)
)
system.time(
mcmcglmm_O1_2 <- MCMCglmm(Migration_ordered ~ logMass,
random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "ordinal",
data = dat,
prior = prior1,
nitt = 13000*60,
thin = 10*60,
burnin = 3000*60
)
)For brms…
A <- ape::vcv.phylo(tree,corr = TRUE)
default_priors2 <- default_prior(
Migration_ordered ~ logMass + (1 | gr(Phylo, cov = A)),
data = dat,
family = cumulative(link = "probit"),
data2 = list(A = A)
)
# Fit the model
system.time(
brms_OT1_2 <- brm(
formula = Migration_ordered ~ logMass + (1 | gr(Phylo, cov = A)),
data = dat,
family = cumulative(link = "probit"),
data2 = list(A = A),
prior = default_priors2,
iter = 20000,
warmup = 10000,
thin = 1,
chain = 2,
core = 2,
control = list(adapt_delta = 0.99)
)
)# MCMCglmm
summary(mcmcglmm_T1_2)
# Iterations = 135001:584551
# Thinning interval = 450
# Sample size = 1000
#
# DIC: 300.7449
#
# G-structure: ~Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# Phylo 2.068 6.389e-06 7.225 1000
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: Migration_ordered ~ logMass
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) 0.88928 -1.69994 3.49007 1000 0.436
# logMass -0.08033 -0.37129 0.21982 1000 0.550
#
# Cutpoints:
#
# post.mean l-95% CI u-95% CI eff.samp
# cutpoint.traitMigration_ordered.1 1.052 0.6441 1.469 897.7
posterior_summary(mcmcglmm_T1_2$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) 0.88927720 1.2935980 -1.5704745 3.6731840
# logMass -0.08033062 0.1488659 -0.3928655 0.2076132
posterior_summary(mcmcglmm_T1_2$VCV)
# Estimate Est.Error Q2.5 Q97.5
# Phylo 2.068366 3.804493 0.00210611 11.97129
# units 1.000000 0.000000 1.00000000 1.00000
posterior_summary(mcmcglmm_T1_3$CP)
# Estimate Est.Error Q2.5 Q97.5
# cutpoint.traitMigration_ordered.1 1.116292 0.2420588 0.7754895 1.677832
summary(mcmcglmm_O1_2) # 95%HPD Interval
# Iterations = 180001:779401
# Thinning interval = 600
# Sample size = 1000
#
# DIC: 268.7424
#
# G-structure: ~Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# Phylo 4.133 8.171e-05 14.26 1000
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: Migration_ordered ~ logMass
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) 1.2622 -2.7953 4.3184 1000 0.432
# logMass -0.1087 -0.5494 0.3449 1000 0.574
#
# Cutpoints:
#
# post.mean l-95% CI u-95% CI eff.samp
# cutpoint.traitMigration_ordered.1 1.489 0.9695 2.098 1000
posterior_summary(mcmcglmm_O1_2$VCV) # 95%CI
# Estimate Est.Error Q2.5 Q97.5
# Phylo 4.132637 6.62053 0.004598829 19.22872
# units 1.000000 0.00000 1.000000000 1.00000
posterior_summary(mcmcglmm_O1_2$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) 1.262162 1.8674403 -2.1994721 5.0542830
# logMass -0.108659 0.2176232 -0.5751705 0.3314989
c2 <- 1
res_1 <- mcmcglmm_O1_2$Sol / sqrt(1+c2) # for fixed effects
res_2 <- mcmcglmm_O1_2$VCV / (1+c2) # for variance components
res_3 <- mcmcglmm_O1_2$CP / sqrt(1+c2)
posterior_summary(res_1)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) 0.89248312 1.3204797 -1.555262 3.5739178
# logMass -0.07683353 0.1538828 -0.406707 0.2344051
posterior_summary(res_2)
# Estimate Est.Error Q2.5 Q97.5
# Phylo 2.066318 3.310265 0.002299414 9.614361
# units 0.500000 0.000000 0.500000000 0.500000
posterior_summary(res_3)
# Estimate Est.Error Q2.5 Q97.5
# cutpoint.traitMigration_ordered.1 1.053063 0.2218614 0.7173326 1.575309
#brms
summary(brms_OT1_2)
# Family: cumulative
# Links: mu = probit; disc = identity
# Formula: Migration_ordered ~ logMass + (1 | gr(Phylo, cov = A))
# Data: dat (Number of observations: 136)
# Draws: 2 chains, each with iter = 20000; warmup = 10000; thin = 1;
# total post-warmup draws = 20000
#
# Multilevel Hyperparameters:
# ~Phylo (Number of levels: 136)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.93 0.66 0.04 2.45 1.00 1192 3611
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept[1] -0.85 1.07 -3.16 1.26 1.00 15424 9201
# Intercept[2] 0.15 1.06 -2.03 2.33 1.00 16388 10923
# logMass -0.07 0.13 -0.34 0.19 1.00 18410 10854
#
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# disc 1.00 0.00 1.00 1.00 NA NA NABoth models (brms and MCMCglmm) indicate body mass (logMass) does not have a statistically significant effect on the migration order.
probabilities_mcmcglmm <- calculate_probabilities(0, 0.88, 1.05)
print(probabilities_mcmcglmm)
# sedentary partially_migratory migratory
# 0.1468591 0.2856460 0.5674949
probabilities_brms <- calculate_probabilities(-0.85, 0.15, 0)
print(probabilities_brms)
# sedentary partially_migratory migratory
# 0.1976625 0.3619551 0.4403823 The coefficient of logMass is negative, but the actual results show an increase in the probability of being migratory. This is likely because the effect of logMass is not statistically significant, and due to wide credible intervals, the probabilities may shift due to sampling variability. Therefore, the results should be interpreted with caution, as they may reflect uncertainty in the data rather than a true underlying effect of logMass on migratory tendencies.
One continuous and one categorical explanatory variable model
For the model MCMCglmm the following - here, we use the gelman,prior() function for prior setting for fixed effects to ensure weakly informative prior that help with model convergence. You can find further information in Chapter 4. Nominal models.
inv_phylo <- inverseA(tree, nodes = "ALL", scale = TRUE)
prior1 <- list(B = list(mu = rep(0,4),
V = gelman.prior(~logMass + Habitat.Density, data = dat, scale = sqrt(1+1))),
R = list(V = 1, fix = 1),
G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)))
system.time(
mcmcglmm_T1_3 <- MCMCglmm(Migration_ordered ~ logMass + Habitat.Density,
random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "threshold",
data = dat,
prior = prior1,
nitt = 13000*60,
thin = 10*60,
burnin = 3000*60
)
)
system.time(
mcmcglmm_O1_3 <- MCMCglmm(Migration_ordered ~ logMass + Habitat.Density,
random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "ordinal",
data = dat,
prior = prior1,
nitt = 13000*150,
thin = 10*150,
burnin = 3000*150
)
)For brms,
A <- ape::vcv.phylo(tree, cor = TRUE)
default_priors3 <- default_prior(
Migration_ordered ~ logMass + Habitat.Density + (1 | gr(Phylo, cov = A)),
data = dat,
family = cumulative(link = "probit"),
data2 = list(A = A)
)
system.time(
brm_OT1_3 <- brm(
formula = Migration_ordered ~ logMass + Habitat.Density + (1 | gr(Phylo, cov = A)),
data = dat,
family = cumulative(link = "probit"),
data2 = list(A = A),
prior = default_priors3,
iter = 6500,
warmup = 5500,
thin = 1,
chain = 2,
core = 2,
control = list(adapt_delta = 0.99)
)
)# MCMCglmm
summary(mcmcglmm_T1_3)
# Iterations = 180001:779401
# Thinning interval = 600
# Sample size = 1000
#
# DIC: 289.3043
#
# G-structure: ~Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# Phylo 1.988 4.136e-07 6.227 758.7
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: Migration_ordered ~ logMass + Habitat.Density
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) 1.3097 -1.1727 3.6674 1000.0 0.254
# logMass -0.2664 -0.5703 0.0479 891.2 0.058 .
# Habitat.Densityopen 1.3154 0.4916 2.0664 1054.1 <0.001 ***
# Habitat.Densitysemi-open 0.3955 -0.3020 0.9998 1000.0 0.218
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Cutpoints:
#
# post.mean l-95% CI u-95% CI eff.samp
# cutpoint.traitMigration_ordered.1 1.116 0.6674 1.503 1022
posterior_summary(mcmcglmm_T1_3$VCV)
# Estimate Est.Error Q2.5 Q97.5
# Phylo 1.987738 4.411821 0.009463594 9.04495
# units 1.000000 0.000000 1.000000000 1.00000
posterior_summary(mcmcglmm_T1_3$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) 1.3096841 1.2767628 -1.0290056 3.809598026
# logMass -0.2664037 0.1660313 -0.6334123 0.007099615
# Habitat.Densityopen 1.3153548 0.4385812 0.6134014 2.247115510
# Habitat.Densitysemi-open 0.3954618 0.3425752 -0.2200316 1.117789403
posterior_summary(mcmcglmm_T1_3$CP)
# Estimate Est.Error Q2.5 Q97.5
# cutpoint.traitMigration_ordered.1 1.116292 0.2420588 0.7754895 1.677832
summary(mcmcglmm_O1_3) # 95%HPD Interval
# Iterations = 450001:1948501
# Thinning interval = 1500
# Sample size = 1000
#
# DIC: 255.0597
#
# G-structure: ~Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# Phylo 4.142 1.314e-06 14.07 839
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: Migration_ordered ~ logMass + Habitat.Density
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) 1.87956 -1.61645 5.74879 1000.0 0.262
# logMass -0.38074 -0.82066 0.06748 1112.4 0.078 .
# Habitat.Densityopen 1.85563 0.72168 3.01111 842.4 <0.001 ***
# Habitat.Densitysemi-open 0.58208 -0.29301 1.48943 1000.0 0.190
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Cutpoints:
#
# post.mean l-95% CI u-95% CI eff.samp
# cutpoint.traitMigration_ordered.1 1.579 1.064 2.204 885.1
posterior_summary(mcmcglmm_O1_3$VCV) # 95%CI
# Estimate Est.Error Q2.5 Q97.5
# Phylo 4.142472 9.805007 0.03628788 19.59177
# units 1.000000 0.000000 1.00000000 1.00000
posterior_summary(mcmcglmm_O1_3$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) 1.8795575 1.8603927 -1.5225125 6.0417116
# logMass -0.3807427 0.2337957 -0.8716018 0.0461751
# Habitat.Densityopen 1.8556261 0.6258944 0.8289186 3.1677050
# Habitat.Densitysemi-open 0.5820800 0.4612845 -0.2342549 1.5671716
c2 <- 1
res_1 <- mcmcglmm_O1_3$Sol / sqrt(1+c2) # for fixed effects
res_2 <- mcmcglmm_O1_3$VCV / (1+c2) # for variance components
res_3 <- mcmcglmm_O1_3$CP / sqrt(1+c2)
posterior_summary(res_1)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) 1.3290479 1.3154963 -1.0765789 4.27213525
# logMass -0.2692257 0.1653185 -0.6163155 0.03265073
# Habitat.Densityopen 1.3121258 0.4425742 0.5861339 2.23990572
# Habitat.Densitysemi-open 0.4115927 0.3261774 -0.1656433 1.10815765
posterior_summary(res_2)
# Estimate Est.Error Q2.5 Q97.5
# Phylo 2.071236 4.902504 0.01814394 9.795886
# units 0.500000 0.000000 0.50000000 0.500000
posterior_summary(res_3)
# Estimate Est.Error Q2.5 Q97.5
# cutpoint.traitMigration_ordered.1 1.116852 0.2491617 0.7783082 1.6727
# brms
summary(brms_OT1_3)
# Family: cumulative
# Links: mu = probit; disc = identity
# Formula: Migration_ordered ~ logMass + Habitat.Density + (1 | gr(Phylo, cov = A))
# Data: dat (Number of observations: 136)
# Draws: 2 chains, each with iter = 6500; warmup = 5500; thin = 1;
# total post-warmup draws = 2000
#
# Multilevel Hyperparameters:
# ~Phylo (Number of levels: 136)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.91 0.58 0.08 2.31 1.01 231 460
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept[1] -1.34 1.10 -3.63 0.73 1.00 1364 1095
# Intercept[2] -0.27 1.09 -2.45 1.85 1.00 1523 1156
# logMass -0.26 0.14 -0.56 0.01 1.00 1300 1172
# Habitat.Densityopen 1.25 0.38 0.59 2.08 1.00 533 941
# Habitat.DensitysemiMopen 0.37 0.31 -0.20 1.01 1.00 1236 926
#
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# disc 1.00 0.00 1.00 1.00 NA NA NAThe species living in open environments have significantly higher migration levels compared to those living in dense environments. No significant difference in migration levels was observed between species living in semi-open and dense environments.logMass does not have a significant impact on migration levels, suggesting that Habitat.Density is the primary factor influencing migration levels.
probabilities_mcmcglmm <- calculate_probabilities(0, 1.116, 1.3097)
print(probabilities_mcmcglmm)
# sedentary partially_migratory migratory
# 0.09514867 0.32805672 0.57679460
probabilities_brms <- calculate_probabilities(-1.33, -0.26, 0)
print(probabilities_brms)
# sedentary partially_migratory migratory
# 0.09175914 0.30567275 0.60256811 The probability of the Migratory category is the highest, around 60%. This reflects the higher migration likelihood, especially for species living in open environments, which leads to a higher probability of being migratory.
Example 2
The second example is similar to the first one, but we will use a different dataset and different explanatory variables. The response variable is Habitat.Density (1 = Dense habitats, 2 = Semi-open habitats, 3 = Open habitats) and the explanatory variables are cTail_length (continuous) and Diet (categorical).
Intercept only model
inv_phylo <- inverseA(tree, nodes = "ALL", scale = TRUE)
prior1 <- list(R = list(V = 1, fix = 1),
G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)))
system.time(
mcmcglmm_T2_1 <- MCMCglmm(Habitat.Density ~ 1,
random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "threshold",
data = dat,
prior = prior1,
nitt = 13000*100,
thin = 10*10,
burnin = 3000*100
)
)
system.time(
mcmcglmm_O2_1 <- MCMCglmm(Habitat.Density ~ 1,
random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "ordinal",
data = dat,
prior = prior1,
nitt = 13000*40,
thin = 10*40,
burnin = 3000*40
)
)
A <- ape::vcv.phylo(tree, corr = TRUE)
default_priors <- default_prior(
Habitat.Density ~ 1 + (1 | gr(Phylo, cov = A)),
data = dat,
family = cumulative(link = "probit"),
data2 = list(A = A)
)
# Fit the model
system.time(
brms_O2_1 <- brm(
formula = Habitat.Density ~ 1 + (1 | gr(Phylo, cov = A)),
data = dat,
family = cumulative(link = "probit"),
data2 = list(A = A),
prior = default_priors,
iter = 19000,
warmup =16000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.99),
)
)# MCMCglmm
summary(mcmcglmm_T2_1)
# Iterations = 195001:844351
# Thinning interval = 650
# Sample size = 1000
#
# DIC: 280.497
#
# G-structure: ~Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# Phylo 8.552 1.018 20.69 726.6
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: Habitat.Density ~ 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) -0.2226 -3.6231 2.5774 1000 0.9
#
# Cutpoints:
#
# post.mean l-95% CI u-95% CI eff.samp
# cutpoint.traitHabitat.Density.1 1.609 1.055 2.36 829.5
posterior_summary(mcmcglmm_T2_1$VCV)
# Estimate Est.Error Q2.5 Q97.5
# Phylo 8.552458 6.42742 2.217233 24.88162
# units 1.000000 0.00000 1.000000 1.00000
posterior_summary(mcmcglmm_T2_1$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) -0.2226369 1.574717 -3.545592 2.680834
summary(mcmcglmm_O2_1) # 95%HPD Interval
# Iterations = 1500001:6495001
# Thinning interval = 5000
# Sample size = 1000
#
# DIC: 244.2456
#
# G-structure: ~Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# Phylo 15.59 2.906 36.05 844
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: Habitat.Density ~ 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) -0.210 -3.987 4.017 1000 0.904
#
# Cutpoints:
#
# post.mean l-95% CI u-95% CI eff.samp
# cutpoint.traitHabitat.Density.1 2.221 1.418 3.062 1000
posterior_summary(mcmcglmm_O2_1$VCV) # 95%CI
# Estimate Est.Error Q2.5 Q97.5
# Phylo 15.58885 10.86135 3.928927 43.26684
# units 1.00000 0.00000 1.000000 1.00000
posterior_summary(mcmcglmm_O2_1$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) -0.2100358 2.077207 -4.373171 3.795935
c2 <- 1
res_1 <- mcmcglmm_T2_1$Sol / sqrt(1+c2) # for fixed effects
res_2 <- mcmcglmm_T2_1$VCV / (1+c2) # for variance components
res_3 <- mcmcglmm_T2_1$CP / sqrt(1+c2)
posterior_summary(res_1)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) -0.157428 1.113493 -2.507112 1.895636
posterior_summary(res_2)
# Estimate Est.Error Q2.5 Q97.5
# Phylo 4.276229 3.21371 1.108617 12.44081
# units 0.500000 0.00000 0.500000 0.50000
posterior_summary(res_3)
# Estimate Est.Error Q2.5 Q97.5
# cutpoint.traitMigration_ordered.1 1.116852 0.2491617 0.7783082 1.6727
# brms
summary(brms_OT2_1)
# Family: cumulative
# Links: mu = probit; disc = identity
# Formula: Habitat.Density ~ 1 + (1 | gr(Phylo, cov = A))
# Data: Phasianidae_dat (Number of observations: 179)
# Draws: 2 chains, each with iter = 6000; warmup = 4000; thin = 1;
# total post-warmup draws = 4000
#
# Multilevel Hyperparameters:
# ~Phylo (Number of levels: 179)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 2.54 0.70 1.45 4.20 1.01 542 1179
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept[1] -0.21 1.09 -2.40 1.93 1.00 2080 2257
# Intercept[2] 1.33 1.10 -0.75 3.58 1.00 2219 2312
#
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# disc 1.00 0.00 1.00 1.00 NA NA NAPhylogenetic signals
# MCMCglmm
phylo_signal_mcmcglmm_T2 <- ((mcmcglmm_T2_1$VCV[, "Phylo"]) / (mcmcglmm_T2_1$VCV[, "Phylo"] + 1))
phylo_signal_mcmcglmm_T2 %>% mean()
# [1] 0.8578778
phylo_signal_mcmcglmm_T2 %>% quantile(probs = c(0.025, 0.5, 0.975))
# 2.5% 50% 97.5%
# 0.6891739 0.8711971 0.9613625
phylo_signal_mcmcglmm_O2 <- ((mcmcglmm_O2_1$VCV[, "Phylo"]/(1+c2)) / (mcmcglmm_O2_1$VCV[, "Phylo"]/(1+c2) + 1))
phylo_signal_mcmcglmm_O2 %>% mean()
# [1] 0.8513276
phylo_signal_mcmcglmm_O2 %>% quantile(probs = c(0.025,0.5,0.975))
# 2.5% 50% 97.5%
# 0.6626708 0.8640197 0.9558175
# brms
phylo_signal_brms_OT2 <- brms_OT2_1 %>% as_tibble() %>%
dplyr::select(Sigma_phy = sd_Phylo__Intercept) %>%
mutate(lambda = Sigma_phy^2 / (Sigma_phy^2 + 1)) %>%
pull(lambda)
phylo_signal_brms_OT2 %>% mean()
# [1] 0.8456712
phylo_signal_brms_OT2 %>% quantile(probs = c(0.025,0.5,0.975))
# 2.5% 50% 97.5%
# 0.6783932 0.8563418 0.9463123 Summary All three models (MCMCglmm with threshold, MCMCglmm with ordinal, and brms with cumulative probit) estimate high phylogenetic signal in Habitat.Density (\(\lambda \approx 0.85\)), suggesting strong phylogenetic structuring in habitat density preferences within Phasianidae.
One continuous explanatory variable model
The model using MCMCglmm is
inv_phylo <- inverseA(tree, nodes = "ALL", scale = TRUE)
prior1 <- list(R = list(V = 1, fix = 1),
G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)))
system.time(
mcmcglmm_T2_2 <- MCMCglmm(Habitat.Density ~ logMass,
random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "threshold",
data = dat,
prior = prior1,
nitt = 13000*250,
thin = 10*25,
burnin = 3000*250
)
)
system.time(
mcmcglmm_O2_2 <- MCMCglmm(Habitat.Density ~ logMass,
random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "ordinal",
data = dat,
prior = prior1,
nitt = 13000*60,
thin = 10*60,
burnin = 3000*60
)
)
A <- ape::vcv.phylo(tree,corr = TRUE)
default_priors2 <- default_prior(
Habitat.Density ~ logMass + (1 | gr(Phylo, cov = A)),
data = dat,
family = cumulative(link = "probit"),
data2 = list(A = A)
)
# Fit the model
system.time(
brms_OT2_2 <- brm(
formula = Migration_ordered ~ logMass + (1 | gr(Phylo, cov = A)),
data = dat,
family = cumulative(link = "probit"),
data2 = list(A = A),
prior = default_priors2,
iter = 20000,
warmup = 10000,
thin = 1,
chain = 2,
core = 2,
control = list(adapt_delta = 0.99)
)
)# MCMCglmm
summary(mcmcglmm_T1_2)
# Iterations = 135001:584551
# Thinning interval = 450
# Sample size = 1000
#
# DIC: 300.7449
#
# G-structure: ~Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# Phylo 2.068 6.389e-06 7.225 1000
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: Migration_ordered ~ logMass
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) 0.88928 -1.69994 3.49007 1000 0.436
# logMass -0.08033 -0.37129 0.21982 1000 0.550
#
# Cutpoints:
#
# post.mean l-95% CI u-95% CI eff.samp
# cutpoint.traitMigration_ordered.1 1.052 0.6441 1.469 897.7
posterior_summary(mcmcglmm_T1_2$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) 0.88927720 1.2935980 -1.5704745 3.6731840
# logMass -0.08033062 0.1488659 -0.3928655 0.2076132
posterior_summary(mcmcglmm_T1_2$VCV)
# Estimate Est.Error Q2.5 Q97.5
# Phylo 2.068366 3.804493 0.00210611 11.97129
# units 1.000000 0.000000 1.00000000 1.00000
summary(mcmcglmm_O1_2) # 95%HPD Interval
# Iterations = 180001:779401
# Thinning interval = 600
# Sample size = 1000
#
# DIC: 268.7424
#
# G-structure: ~Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# Phylo 4.133 8.171e-05 14.26 1000
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: Migration_ordered ~ logMass
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) 1.2622 -2.7953 4.3184 1000 0.432
# logMass -0.1087 -0.5494 0.3449 1000 0.574
#
# Cutpoints:
#
# post.mean l-95% CI u-95% CI eff.samp
# cutpoint.traitMigration_ordered.1 1.489 0.9695 2.098 1000
posterior_summary(mcmcglmm_O1_2$VCV) # 95%CI
# Estimate Est.Error Q2.5 Q97.5
# Phylo 4.132637 6.62053 0.004598829 19.22872
# units 1.000000 0.00000 1.000000000 1.00000
posterior_summary(mcmcglmm_O1_2$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) 1.262162 1.8674403 -2.1994721 5.0542830
# logMass -0.108659 0.2176232 -0.5751705 0.3314989
posterior_summary(mcmcglmm_O1_2$CP)
# Estimate Est.Error Q2.5 Q97.5
# cutpoint.traitMigration_ordered.1 1.489256 0.3137594 1.014461 2.227824
c2 <- 1
res_1 <- mcmcglmm_O1_2$Sol / sqrt(1+c2) # for fixed effects
res_2 <- mcmcglmm_O1_2$VCV / (1+c2) # for variance components
res_3 <- mcmcglmm_O1_2$VCV / sqrt(1+c2)
posterior_summary(res_1)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) 0.89248312 1.3204797 -1.555262 3.5739178
# logMass -0.07683353 0.1538828 -0.406707 0.2344051
posterior_summary(res_2)
# Estimate Est.Error Q2.5 Q97.5
# Phylo 2.066318 3.310265 0.002299414 9.614361
# units 0.500000 0.000000 0.500000000 0.500000
posterior_summary(res_3)
# Estimate Est.Error Q2.5 Q97.5
# Phylo 2.9222156 4.681422 0.003251863 13.5967596
# units 0.7071068 0.000000 0.707106781 0.7071068
#brms
summary(brms_OT2_1)
# Family: cumulative
# Links: mu = probit; disc = identity
# Formula: Habitat.Density ~ 1 + (1 | gr(Phylo, cov = A))
# Data: Phasianidae_dat (Number of observations: 179)
# Draws: 2 chains, each with iter = 6000; warmup = 4000; thin = 1;
# total post-warmup draws = 4000
#
# Multilevel Hyperparameters:
# ~Phylo (Number of levels: 179)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 2.54 0.70 1.45 4.20 1.01 542 1179
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept[1] -0.21 1.09 -2.40 1.93 1.00 2080 2257
# Intercept[2] 1.33 1.10 -0.75 3.58 1.00 2219 2312
#
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# disc 1.00 0.00 1.00 1.00 NA NA NAsummary
No significant effect of body mass on habitat density: In all models, the posterior estimates of the slope for
logMasswere close to zero, with wide credible intervals that include zero. This suggests no strong evidence that body size influences habitat density preference in Phasianidae.Moderate phylogenetic signal: Phylogenetic variance was estimated as non-negligible across models (e.g., \(\lambda \approx 0.67–0.80\) after transformation), indicating moderate phylogenetic structuring in habitat density even after accounting for body mass.
One continuous and one categorical explanatory variable model
inv_phylo <- inverseA(tree, nodes = "ALL", scale = TRUE)
prior1 <- list(
R = list(V = 1, fix = 1),
G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)))
system.time(
mcmcglmm_T2_3 <- MCMCglmm(Habitat.Density ~ logMass + Habitat.Density,
random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "threshold",
data = dat,
prior = prior1,
nitt = 13000*150,
thin = 10*150,
burnin = 3000*150
)
)
system.time(
mcmcglmm_O2_3 <- MCMCglmm(Habitat.Density ~ logMass + Habitat.Density,
random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "ordinal",
data = dat,
prior = prior1,
nitt = 13000*150,
thin = 10*150,
burnin = 3000*150
)
)
A <- ape::vcv.phylo(tree, cor = TRUE)
default_priors3 <- default_prior(
Habitat.Density ~ logMass + Habitat.Density + (1 | gr(Phylo, cov = A)),
data = dat,
family = cumulative(link = "probit"),
data2 = list(A = A)
)
system.time(
brm_OT2_3 <- brm(
formula = Habitat.Density ~ logMass + Habitat.Density + (1 | gr(Phylo, cov = A)),
data = dat,
family = cumulative(link = "probit"),
data2 = list(A = A),
prior = default_priors3,
iter = 20000,
warmup = 10000,
thin = 1,
chain = 2,
core = 2,
control = list(adapt_delta = 0.99)
)
)# MCMCglmm
summary(mcmcglmm_T2_3)
# Iterations = 1800001:7794001
# Thinning interval = 6000
# Sample size = 1000
#
# DIC: 264.3581
#
# G-structure: ~Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# Phylo 13.42 1.332 33.11 869.4
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: Habitat.Density ~ cTail_length + IsHarbivore
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) -1.2636 -5.2133 2.9340 1000 0.498
# cTail_length 0.5337 -0.4854 1.6519 1000 0.302
# IsHarbivore 1.0925 0.1875 2.1084 1100 0.012 *
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Cutpoints:
#
# post.mean l-95% CI u-95% CI eff.samp
# cutpoint.traitHabitat.Density.1 1.838 1.142 2.773 1000
posterior_summary(mcmcglmm_T1_3$VCV)
# Estimate Est.Error Q2.5 Q97.5
# Phylo 1.987738 4.411821 0.009463594 9.04495
# units 1.000000 0.000000 1.000000000 1.00000
posterior_summary(mcmcglmm_T1_3$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) 1.3096841 1.2767628 -1.0290056 3.809598026
# logMass -0.2664037 0.1660313 -0.6334123 0.007099615
# Habitat.Densityopen 1.3153548 0.4385812 0.6134014 2.247115510
# Habitat.Densitysemi-open 0.3954618 0.3425752 -0.2200316 1.117789403
posterior_summary(mcmcglmm_T1_3$CP)
# Estimate Est.Error Q2.5 Q97.5
# cutpoint.traitMigration_ordered.1 1.116292 0.2420588 0.7754895 1.677832
summary(mcmcglmm_O2_3) # 95%HPD Interval
# Iterations = 1800001:7794001
# Thinning interval = 6000
# Sample size = 1000
#
# DIC: 223.0502
#
# G-structure: ~Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# Phylo 24.41 2.961 64.89 1000
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: Habitat.Density ~ cTail_length + IsHarbivore
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) -1.6861 -8.1646 3.3928 1000.0 0.520
# cTail_length 0.7124 -0.6599 2.2055 987.7 0.266
# IsHarbivore 1.5078 0.1850 2.8961 986.1 0.010 *
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Cutpoints:
#
# post.mean l-95% CI u-95% CI eff.samp
# cutpoint.traitHabitat.Density.1 2.536 1.54 3.812 855.9
posterior_summary(mcmcglmm_O2_3$VCV) # 95%CI
# Estimate Est.Error Q2.5 Q97.5
# Phylo 24.40718 24.4568 5.150294 82.38519
# units 1.00000 0.0000 1.000000 1.00000
posterior_summary(mcmcglmm_O2_3$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) -1.6861096 2.8536673 -8.1735105 3.363553
# cTail_length 0.7124418 0.7462599 -0.5101511 2.467996
# IsHarbivore 1.5078371 0.7204467 0.2713007 3.103070
posterior_summary(mcmcglmm_O2_3$CP)
# Estimate Est.Error Q2.5 Q97.5
# cutpoint.traitHabitat.Density.1 2.53607 0.6201751 1.660939 4.032219
c2 <- 1
res_1 <- mcmcglmm_O2_3$Sol / sqrt(1+c2) # for fixed effects
res_2 <- mcmcglmm_O2_3$VCV / (1+c2) # for variance components
res_3 <- mcmcglmm_O2_3$CP / sqrt(1+c2)
posterior_summary(res_1)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) -1.1922596 2.0178475 -5.7795447 2.378391
# cTail_length 0.5037724 0.5276855 -0.3607313 1.745137
# IsHarbivore 1.0662019 0.5094327 0.1918386 2.194202
posterior_summary(res_2)
# Estimate Est.Error Q2.5 Q97.5
# Phylo 12.20359 12.2284 2.575147 41.1926
# units 0.50000 0.0000 0.500000 0.5000
posterior_summary(res_3)
# Estimate Est.Error Q2.5 Q97.5
# cutpoint.traitHabitat.Density.1 1.793272 0.43853 1.174461 2.851209
# brms
summary(brms_OT2_3)
# Family: cumulative
# Links: mu = probit; disc = identity
# Formula: Habitat.Density ~ cTail_length + IsHarbivore + (1 | gr(Phylo, cov = A))
# Data: Phasianidae_dat (Number of observations: 179)
# Draws: 2 chains, each with iter = 8000; warmup = 6000; thin = 1;
# total post-warmup draws = 4000
#
# Multilevel Hyperparameters:
# ~Phylo (Number of levels: 179)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 2.90 0.92 1.57 5.05 1.00 428 833
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept[1] 0.56 1.24 -1.96 3.00 1.00 1655 2098
# Intercept[2] 2.25 1.28 -0.18 4.85 1.00 1532 1706
# cTail_length 0.46 0.48 -0.39 1.50 1.00 1244 1370
# IsHarbivore 0.98 0.45 0.18 1.96 1.00 1524 1594
#
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# disc 1.00 0.00 1.00 1.00 NA NA NASummary
Diet is positively associated with habitat: The posterior estimates of the slope for
Dietwere positive and significant across all models, indicating that species with a herbivorous diet tend to occupy open habitats compared to those with other diets.No significant effect of body mass on habitat density: The posterior estimates of the slope for
logMasswere close to zero, with wide credible intervals that include zero, suggesting no strong evidence that body size influences habitat density preference in Phasianidae.
probabilities
This probabilities section uses the posterior summaries of fixed effects (linear predictors) and cut points from each of the three models (intercept-only, one predictor, two predictors) to calculate the predicted probabilities of each category in Habitat.Density.
calculate_probabilities2 <- function(cutpoint0, cutpoint1, l) {
category1_prob <- pnorm(cutpoint0 - l)
category2_prob <- pnorm(cutpoint1 - l) - pnorm(cutpoint0 - l)
category3_prob <- 1 - pnorm(cutpoint1 - l)
return(c(Dense_habitats = category1_prob,
Semi_open_habitats = category2_prob,
Open_habitats = category3_prob)
)
}
probabilities_mcmcglmm <- calculate_probabilities2(0, 1.01, 0.3178)
print(probabilities_mcmcglmm)
# Dense_habitats Semi_open_habitats Open_habitats
# 0.3753183 0.3802758 0.2444059
probabilities_brms <- calculate_probabilities2(-0.32, 0.66, 0)
print(probabilities_brms)
# Dense_habitats Semi_open_habitats Open_habitats
# 0.3744842 0.3708889 0.2546269 probabilities_mcmcglmm <- calculate_probabilities2(0, 1.052, 0.88928)
print(probabilities_mcmcglmm)
# Dense_habitats Semi_open_habitats Open_habitats
# 0.1869263 0.3777042 0.4353694
probabilities_brms <- calculate_probabilities2(-0.85, 0.15, 0)
print(probabilities_brms)
# Dense_habitats Semi_open_habitats Open_habitats
# 0.1976625 0.3619551 0.4403823 probabilities_mcmcglmm <- calculate_probabilities2(0, 1.116, 1.3097)
print(probabilities_mcmcglmm)
# Dense_habitats Semi_open_habitats Open_habitats
# 0.09514867 0.32805672 0.57679460
probabilities_brms <- calculate_probabilities2(-1.33, -0.26, 0)
print(probabilities_brms)
# Dense_habitats Semi_open_habitats Open_habitats
# 0.09175914 0.30567275 0.60256811 4. Nominal models
When the response variable has more than two categories and un-ordered, it is treated as a nominal variable.
Prior setting
Sometimes, model convergence can be challenging, particularly in discrete models. In such cases, it is time to take a step forward - changing the default prior to a custom prior (i.e. appropriately informative prior, not over-informed prior) can improve performance and convergence.
Fixed Effects
For fixed effects, custom prior adjustments are typically not necessary. Because data usually provides enough information to estimate fixed effects accurately, so the prior does not have as much effect. Fixed effect priors can be a good thing if you know some biological limits (Gaussian models - e.g. bird morphology - some values are not realistic). Also, in MCMCglmm, the gelman.prior function can be particularly useful for logit and probit models. This function allows users to define a reasonable prior covariance matrix for the fixed effects, enhancing model stability and convergence. Specifically…
- For logit regression, where the variance of the error distribution (with mean = 0) is pi^2 / 3, the prior accounts for this variance to remain flat on the probability scale.
- For probit regression, where the variance of the error distribution is 1, the prior is adjusted to achieve a similarly flat distribution.
By incorporating these considerations, gelman.prior avoids excessive regularisation and ensures that the prior reflects the data while remaining weakly informative, as Gelman et al. (2008) recommended.
Random effects
Convergence issues often arise with random effects because variance components are inherently more challenging to estimate than fixed effects. This is because variance components represent variability around the mean, which requires a larger sample size to be estimated precisely. In contrast, fixed effects estimate the mean (e.g. the overall mean in an intercept-only model), which is more straightforward and usually requires a smaller sample size to stabilise. In such cases, using more informative priors can help, but careful consideration is necessary to balance convergence improvement and interpret model output.
Key points
Check autocorrelation: An auto-correlation plot helps assess the mixing of the chain. It shows the relationship between current and past samples in the MCMC chain. High auto-correlation (i.e., slow decay) indicates poor mixing because the chain is highly correlated across iterations, meaning that it is not exploring the parameter space efficiently. Ideally, the auto-correlation should drop to near zero quickly, indicating that the chain is moving smoothly and independently across the parameter space. - Lag (X-axis): The interval between the current and past samples. - Autocorrelation coefficient (Y-axis): Indicates the strength of the correlation, ranging from -1 to 1. A value close to 0 (excluding lag 0) is ideal.
In the context of MCMC, it would be important to differentiate between convergence and mixing, as these two terms are often conflated, but they refer to distinct aspects of the MCMC process. Convergence refers to whether the Markov chains have reached the true posterior distribution. For MCMC to be valid, the chains should eventually explore the entire parameter space of the target distribution. In many cases, this is evaluated by running multiple chains and checking whether they all converge to the same distribution (e.g., Rhat in brms). A value of Rhat close to 1.0 suggests that the chains have converged to the same posterior distribution.
Mixing refers to how well the Markov chain explores the parameter space during the sampling process. Good mixing means that the chain moves smoothly across the parameter space, avoiding long periods of stagnation in any particular region. Poor mixing can result in the chain getting stuck in one area of the parameter space and not exploring others effectively.
InMCMCglmm, since you can run a single chain per single model, you cannot directly check convergence of one model using multiple chains (but you can compare multiple chains using multipleMCMCglmm models). Instead, auto-correlation plots can give you insights into the quality of the sampling.
Practical benefits of informative priors for random effects
Informative priors for random effects improve model convergence and reduce computation time by restricting the MCMC sampler to a smaller parameter space. Such priors can influence both the point estimates (often) and uncertainty (always) of random effects (i.e. variance and covariance). However, they typically have less impact on fixed effect estimates. This allows you to obtain fixed effect estimates more quickly while improving convergence by constraining random effects. If your primary interest lies in fixed effect estimates rather than random effects, you can take advantage of this approach.
Potential Problems with informative priors
Over-constrained parameters: Excessively narrow priors can provide us with incorrect parameter estimates by limiting them to a small range.
Small sample size issues: When sample sizes are small, informative priors may dominate the posterior distribution, overshadowing the data. This can lead to the model failing to capture the data’s variability and oversimplifies reality.
When you want to estimate random effects properly, you need to use uninformative prior. But as you already know, we are hard to get convergent… The things we can do are:
- increase iteration (this means increasing the number of samples from the posterior)
- use longer burn-in (
MCMCglmm) / warm-up (brms) period to ensure convergence - increase thinning in
MCMCglmm- While thinning can enhance convergence inMCMCglmmby reducing autocorrelation,brmsgenerally do not need to control thinning
Examples Here is example of uninformative vs informative prior…
inv_phylo <- inverseA(tree, nodes = "ALL", scale = TRUE)
#uninformative
prior1 <- list(
R = list(V = (matrix(1, 2, 2) + diag(2)) / 3, fix = 1),
G = list(G1 = list(V = diag(2), nu = 2,
alpha.mu = rep(0, 2), alpha.V = diag(2)
)
)
)
system.time(
mcmcglmm_m1_100 <- MCMCglmm(Primary.Lifestyle ~ trait -1,
random = ~us(trait):Phylo,
rcov = ~us(trait):units,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "categorical",
data = dat,
prior = prior1,
nitt = 13000*100,
thin = 10*100,
burnin = 3000*100
)
)
system.time(
mcmcglmm_m1_1000 <- MCMCglmm(Primary.Lifestyle ~ trait -1,
random = ~us(trait):Phylo,
rcov = ~us(trait):units,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "categorical",
data = dat,
prior = prior1,
nitt = 13000*10000,
thin = 10*10000,
burnin = 3000*10000
)
)
# informative prior
prior2 <- list(
R = list(V = (matrix(1, 2, 2) + diag(2)) / 3, fix = 1),
G = list(G1 = list(V = diag(2), nu = 200,
alpha.mu = rep(0, 2), alpha.V = diag(2)
)
)
)
system.time(
mcmcglmm_m2 <- MCMCglmm(Primary.Lifestyle ~ trait -1,
random = ~us(trait):Phylo,
rcov = ~us(trait):units,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "categorical",
data = dat,
prior = prior2,
nitt = 13000*75,
thin = 10*75,
burnin = 3000*75
)
)You can see that the increasing iteration, thinning, and burn-in can promote increasing effect sampling size and improving mixing.
summary(mcmcglmm_m1_100)
# Iterations = 300001:1299001
# Thinning interval = 1000
# Sample size = 1000
#
# DIC: 259.4007
#
# G-structure: ~us(trait):Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 31.18 1.252 83.25 37.85
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -12.08 -70.340 59.37 41.96
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -12.08 -70.340 59.37 41.96
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 104.23 1.118 408.78 28.75
#
# R-structure: ~us(trait):units
#
# post.mean l-95% CI u-95% CI eff.samp
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.6667 0.6667 0.6667 0
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.3333 0.3333 0.3333 0
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.3333 0.3333 0.3333 0
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.6667 0.6667 0.6667 0
#
# Location effects: Primary.Lifestyle ~ trait - 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# traitPrimary.Lifestyle.Insessorial -1.628 -10.272 7.019 148.9 0.662
# traitPrimary.Lifestyle.Terrestrial 5.279 -7.006 19.131 125.6 0.314
summary(mcmcglmm_m1_1000)
# Iterations = 3000001:12990001
# Thinning interval = 10000
# Sample size = 1000
#
# DIC: 273.7709
#
# G-structure: ~us(trait):Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 30.88 1.314 99.34 270.66
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -25.45 -94.475 29.87 218.26
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -25.45 -94.475 29.87 218.26
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 81.67 1.144 369.51 89.38
#
# R-structure: ~us(trait):units
#
# post.mean l-95% CI u-95% CI eff.samp
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.6667 0.6667 0.6667 0
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.3333 0.3333 0.3333 0
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.3333 0.3333 0.3333 0
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.6667 0.6667 0.6667 0
#
# Location effects: Primary.Lifestyle ~ trait - 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# traitPrimary.Lifestyle.Insessorial -1.3033 -5.3581 2.3037 1000 0.478
# traitPrimary.Lifestyle.Terrestrial 0.9187 -3.7341 5.2263 1000 0.676
summary(mcmcglmm_m2)
# Iterations = 225001:974251
# Thinning interval = 750
# Sample size = 1000
#
# DIC: 298.0892
#
# G-structure: ~us(trait):Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 6.0759 0.2940 12.1120 731.3
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -0.0885 -0.9723 0.9838 1000.0
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -0.0885 -0.9723 0.9838 1000.0
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 6.8348 1.5880 13.6006 820.7
#
# R-structure: ~us(trait):units
#
# post.mean l-95% CI u-95% CI eff.samp
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.6667 0.6667 0.6667 0
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.3333 0.3333 0.3333 0
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.3333 0.3333 0.3333 0
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.6667 0.6667 0.6667 0
#
# Location effects: Primary.Lifestyle ~ trait - 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# traitPrimary.Lifestyle.Insessorial -0.3218 -3.7935 3.0988 1000.0 0.908
# traitPrimary.Lifestyle.Terrestrial 2.2780 -1.3172 5.8322 888.4 0.180autocorr.plot(mcmcglmm_m1_100$VCV)autocorr.plot(mcmcglmm_m1_1000$VCV)autocorr.plot(mcmcglmm_m2$VCV)Explanation of dataset
Datasets for example 1
This dataset focuses on Turdidae species. A binary variable IsOmnivore based on trophic level is created.
trees <- read.nexus(here("data", "potential", "avonet", "trees.nex"))
tree <- trees[[1]]
dat <- read.csv(here("data", "potential", "avonet", "turdidae.csv"))
# Check tree$tip.label matches dat$Scientific_name
match_result <- setequal(tree$tip.label, dat$Phylo)
# Centering for continuous variables
dat <- dat %>%
mutate(
log_Mass_centered = scale(log(Mass), center = TRUE, scale = FALSE),
log_Tail_Length_centered = scale(log(Tail.Length), center = TRUE, scale = FALSE)
)
dat <- dat %>%
mutate(across(c(Trophic.Level, Trophic.Niche, Primary.Lifestyle, Migration, Habitat, Species.Status), as.factor))
dat$IsOmnivore <- ifelse(dat$Trophic.Level == "Omnivore", 1, 0)Datasets for example 2
For Example 2, data for the Sylviidae (warblers) were extracted from the AVONET dataset. The habitat categories were reclassified, migratory status was converted into a binary variable, and body mass was log-transformed and centered, resulting in a clean dataset suitable for subsequent statistical modelling.
dat <- read.csv(here("data", "bird body mass", "9993spp_clearned.csv"))
dat <- dat %>%
mutate(across(c(Trophic.Level, Trophic.Niche,
Primary.Lifestyle, Migration, Habitat, Habitat.Density, Species.Status), as.factor),
Habitat.Density = factor(Habitat.Density, ordered = TRUE))
dat <- dat %>%
mutate(
Habitat_Category = case_when(
Habitat %in% c("Desert", "Rock", "Grassland", "Shrubland") ~ "Arid_Open",
Habitat %in% c("Woodland", "Forest") ~ "Forested_Vegetated",
Habitat == "Human modified" ~ "Human-Modified",
Habitat %in% c("Riverine", "Coastal", "Marine", "Wetland") ~ "Aquatic_Coastal",
TRUE ~ "Other"
)
)
Family_habitat <- dat %>%
group_by(Family, Habitat_Category) %>%
tally() %>%
group_by(Family) %>%
ungroup()
Sylviidae_dat <- dat %>%
filter(Family == "Sylviidae") %>%
mutate(Habitat_Category = factor(Habitat_Category, ordered = FALSE))
Sylviidae_dat <- Sylviidae_dat %>%
mutate(IsMigrate = ifelse(Migration == "1", 1, 0)
)
table(Sylviidae_dat$Habitat_Category)
Aquatic_Coastal Arid_Open Forested_Vegetated Other
34 90 168 2
Sylviidae_dat <- Sylviidae_dat %>%
mutate(Habitat_Category = recode(Habitat_Category,
"Aquatic_Coastal" = "Others",
"Other" = "Others"))
Sylviidae_dat$cMass <- scale(log(Sylviidae_dat$Mass), center = TRUE, scale = FALSE)
Sylviidae_tree <- read.nexus(here("data", "bird body mass", "Sylviidae.nex"))
tree <- Sylviidae_tree[[1]]Example 1
Intercept-only model
MCMCglmm
inv_phylo <- inverseA(tree, nodes = "ALL", scale = TRUE)
prior5 <- list(
R = list(V = (matrix(1, 2, 2) + diag(2)) / 3, fix = 1),
G = list(G1 = list(V = diag(2), nu = 20,
alpha.mu = rep(0, 2), alpha.V = diag(2)*(pi^2/3))
)
)
system.time(
mcmcglmm_mn1_1 <-MCMCglmm(Primary.Lifestyle ~ trait -1,
random = ~us(trait):Phylo,
rcov = ~us(trait):units,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "categorical",
data = dat,
prior = prior5,
nitt = 13000*150,
thin = 10*150,
burnin = 3000*150
)
)brms
A <- ape::vcv.phylo(tree, corr = TRUE)
priors_brms1 <- default_prior(Primary.Lifestyle ~ 1 + (1 |a| gr(Phylo, cov = A)),
data = dat,
data2 = list(A = A),
family = categorical(link = "logit")
)
system.time(
brms_mn1_1 <- brm(Primary.Lifestyle ~ 1 + (1 |a| gr(Phylo, cov = A)),
data = dat,
data2 = list(A = A),
family = categorical(link = "logit"),
prior = priors_brms1,
iter = 18000,
warmup = 8000,
chains = 2,
thin = 1,
)
)As well as the binary model in MCMCglmm, nominal model is need to correct the estimates obtained from MCMCglmm as MCMCglmm implements additive overdispersion GLMM fitted by MCMC sampling from the posterior distribution. Again, we can use the below code:
c2 <- (16 * sqrt(3)/(15 * pi))^2
c2a <- (16 * sqrt(3)/(15 * pi))^2*(2/3)The results are…
summary(mcmcglmm_mn1_1)
# Iterations = 300001:1299001
# Thinning interval = 1000
# Sample size = 1000
#
# DIC: 295.756
#
# G-structure: ~us(trait):Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 7.338 0.8372 15.386 764.5
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -1.164 -5.3745 2.788 885.3
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -1.164 -5.3745 2.788 885.3
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 8.168 1.1463 18.245 615.7
#
# R-structure: ~us(trait):units
#
# post.mean l-95% CI u-95% CI eff.samp
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.6667 0.6667 0.6667 0
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.3333 0.3333 0.3333 0
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.3333 0.3333 0.3333 0
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.6667 0.6667 0.6667 0
#
# Location effects: Primary.Lifestyle ~ trait - 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# traitPrimary.Lifestyle.Insessorial -0.4015 -4.0226 2.5627 1000 0.804
# traitPrimary.Lifestyle.Terrestrial 1.6199 -1.2917 4.6445 1000 0.270
res_1 <- mcmcglmm_mn1_1$Sol / sqrt(1+c2a) # for fixed effects
res_2 <- mcmcglmm_mn1_1$VCV / (1+c2a) # for variance components
res_3_corr_phylo <- (mcmcglmm_mn1_1$VCV[, 2]/(1+c2a)) /sqrt((mcmcglmm_mn1_1$VCV[, 1] * mcmcglmm_mn1_1$VCV[, 4])/(1+c2a))
summary(res_1)
# Iterations = 300001:1299001
# Thinning interval = 1000
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# traitPrimary.Lifestyle.Insessorial -0.362 1.480 0.04680 0.04680
# traitPrimary.Lifestyle.Terrestrial 1.460 1.385 0.04379 0.04379
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# traitPrimary.Lifestyle.Insessorial -3.475 -1.2921 -0.3067 0.5483 2.605
# traitPrimary.Lifestyle.Terrestrial -1.125 0.5264 1.4485 2.3820 4.226
summary(res_2)
# Iterations = 300001:1299001
# Thinning interval = 1000
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 5.9634 3.486 0.11023 0.12608
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -0.9456 1.643 0.05197 0.05524
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -0.9456 1.643 0.05197 0.05524
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 6.6372 4.233 0.13385 0.17058
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.5418 0.000 0.00000 0.00000
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.2709 0.000 0.00000 0.00000
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.2709 0.000 0.00000 0.00000
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.5418 0.000 0.00000 0.00000
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 1.6277 3.4946 5.2197 7.52933 14.5903
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -4.7030 -1.7201 -0.7662 -0.06118 2.0866
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -4.7030 -1.7201 -0.7662 -0.06118 2.0866
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 1.7250 3.8512 5.6452 8.20433 17.3970
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.5418 0.5418 0.5418 0.54176 0.5418
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.2709 0.2709 0.2709 0.27088 0.2709
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.2709 0.2709 0.2709 0.27088 0.2709
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.5418 0.5418 0.5418 0.54176 0.5418
summary(res_3_corr_phylo)
# Iterations = 300001:1299001
# Thinning interval = 1000
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# -0.144879 0.207344 0.006557 0.006916
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# -0.51690 -0.28714 -0.14528 -0.01081 0.26479
summary(brms_mn1_1)
# Family: categorical
# Links: muInsessorial = logit; muTerrestrial = logit
# Formula: Primary.Lifestyle ~ 1 + (1 | a | gr(Phylo, cov = A))
# Data: dat (Number of observations: 173)
# Draws: 2 chains, each with iter = 15000; warmup = 10000; thin = 1;
# total post-warmup draws = 10000
#
# Multilevel Hyperparameters:
# ~Phylo (Number of levels: 173)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(muInsessorial_Intercept) 4.76 2.40 1.96 10.81 1.00 1550 2181
# sd(muTerrestrial_Intercept) 5.76 3.37 2.14 14.61 1.00 1215 2349
# cor(muInsessorial_Intercept,muTerrestrial_Intercept) -0.28 0.36 -0.88 0.46 1.00 858 1508
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# muInsessorial_Intercept -0.55 2.03 -4.86 3.32 1.00 4681 5207
# muTerrestrial_Intercept 1.30 2.29 -2.87 6.21 1.00 4028 4113When comparing the results, the fixed effects estimated by both MCMCglmm and brms are similar. However, there are notable differences in the random effect variances and correlations. The variances estimated by brms show higher uncertainty, with wider 95% credible intervals, and the correlations are closer to zero. This discrepancy is likely due to the informative priors used in MCMCglmm to prioritize model convergence.
Phylogenetic signals
And phylogenetic signal is…
# average phylogenetic signal - both vs. left
## MCMCglmm
phylo_signalI_mcmcglmm <- ((mcmcglmm_mn1_1$VCV[, "traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo"]/(1+c2a)) / (mcmcglmm_mn1_1$VCV[, "traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo"]/(1+c2a) + 1))
phylo_signalI_mcmcglmm %>% mean()
# [1] 0.8207548
phylo_signalI_mcmcglmm %>% quantile(probs = c(0.025,0.5,0.975))
# 2.5% 50% 97.5%
# 0.6194383 0.8392215 0.9358569
## brms
phylo_signalI_brms <- brms_mn1_1 %>% as_tibble() %>%
dplyr::select(Sigma_phy = sd_Phylo__muInsessorial_Intercept) %>%
mutate(lambda_nominalI = Sigma_phy^2 / (Sigma_phy^2 + 1)) %>%
pull(lambda_nominalI)
phylo_signalI_brms %>% mean()
# [1] 0.9330662
phylo_signalI_brms %>% quantile(probs = c(0.025,0.5,0.975))
# 2.5% 50% 97.5%
# 0.7940292 0.9474234 0.9915224
# average phylogenetic signal - both vs. right
## MCMCglmm
phylo_signalT_mcmcglmm <- ((mcmcglmm_mn1_1$VCV[, "traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo"]/(1+c2a)) / (mcmcglmm_mn1_1$VCV[, "traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo"]/(1+c2a) + 1))
phylo_signalT_mcmcglmm %>% mean()
# [1] 0.8336959
phylo_signalT_mcmcglmm %>% quantile(probs = c(0.025,0.5,0.975))
# 2.5% 50% 97.5%
# 0.6330234 0.8495150 0.9456432
## brms
phylo_signalT_brms <- brms_mn1_1 %>% as_tibble() %>%
dplyr::select(Sigma_phy = sd_Phylo__muTerrestrial_Intercept) %>%
mutate(lambda_nominalT = Sigma_phy^2 / (Sigma_phy^2 + 1)) %>%
pull(lambda_nominalT)
phylo_signalT_brms %>% mean()
# [1] 0.9477057
phylo_signalT_brms %>% quantile(probs = c(0.025,0.5,0.975))
# 2.5% 50% 97.5%
# 0.8209225 0.9606255 0.9953342Across both modelling frameworks, Primary.Lifestyle categories show high phylogenetic signal.
For the Insessorial category:
MCMCglmmestimates \(\lambda \approx 0.84\) (95% CI: 0.62–0.94)brmsestimates \(\lambda \approx 0.95\) (95% CI: 0.79–0.99)
For the Terrestrial category:
MCMCglmmestimates \(\lambda \approx 0.85\) (95% CI: 0.63–0.95)brmsestimates \(\lambda \approx 0.96\) (95% CI: 0.82–1.00)
This indicates a strong phylogenetic signal in the distribution of Primary.Lifestyle categories, suggesting that closely related species tend to share similar lifestyle traits.
One continuous explanatory variable model
MCMCglmm
inv_phylo <- inverseA(tree, nodes = "ALL", scale = TRUE)
prior5 <- list(
R = list(V = (matrix(1, 2, 2) + diag(2)) / 3, fix = 1),
G = list(G1 = list(V = diag(2), nu = 20,
alpha.mu = rep(0, 2), alpha.V = diag(2)*(pi^2/3))
)
)
system.time(
mcmcglmm_mn1_2 <- MCMCglmm(Primary.Lifestyle ~ log_Tail_Length_centered:trait + trait -1,
random = ~us(trait):Phylo,
rcov = ~us(trait):units,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "categorical",
data = dat,
prior = prior5,
nitt = 13000*200,
thin = 10*200,
burnin = 3000*200
)
)brms
priors_brms2 <- default_prior(Primary.Lifestyle ~ log_Tail_Length_centered + (1 |a| gr(Phylo, cov = A)),
data = dat,
data2 = list(A = A),
family = categorical(link = "logit")
)
system.time(
brms_m1_2 <- brm(Primary.Lifestyle ~ log_Tail_Length_centered + (1 |a| gr(Phylo, cov = A)),
data = dat,
data2 = list(A = A),
family = categorical(link = "logit"),
prior = priors_brms2,
iter = 10000,
warmup = 5000,
chains = 2,
thin = 1
)
)Output correction and summary
summary(mcmcglmm_mn1_2)
# Iterations = 600001:2598001
# Thinning interval = 2000
# Sample size = 1000
#
# DIC: 279.4446
#
# G-structure: ~us(trait):Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 18.854 1.984 42.79 542.4
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -1.169 -10.648 8.87 784.9
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -1.169 -10.648 8.87 784.9
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 18.974 2.593 44.49 499.9
#
# R-structure: ~us(trait):units
#
# post.mean l-95% CI u-95% CI eff.samp
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.6667 0.6667 0.6667 0
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.3333 0.3333 0.3333 0
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.3333 0.3333 0.3333 0
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.6667 0.6667 0.6667 0
#
# Location effects: Primary.Lifestyle ~ log_Tail_Length_centered:trait + trait - 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# traitPrimary.Lifestyle.Insessorial -1.072 -7.345 5.322 912.0 0.744
# traitPrimary.Lifestyle.Terrestrial 3.279 -1.929 9.466 855.8 0.202
# log_Tail_Length_centered:traitPrimary.Lifestyle.Insessorial -1.187 -5.100 2.964 908.8 0.594
# log_Tail_Length_centered:traitPrimary.Lifestyle.Terrestrial -2.694 -6.748 1.571 1000.0 0.178
# res_1 <- mcmcglmm_mn1_2$Sol / sqrt(1+c2a) # for fixed effects
# res_2 <- mcmcglmm_mn1_2$VCV / (1+c2a) # for variance components
# res_3_corr_phylo <- (mcmcglmm_mn1_2$VCV[, 2]/(1+c2a)) /sqrt((mcmcglmm_mn1_2$VCV[, 1] * mcmcglmm_mn1_2$VCV[, 4])/(1+c2a))
summary(res_1)
# Iterations = 600001:2598001
# Thinning interval = 2000
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# traitPrimary.Lifestyle.Insessorial -0.9666 2.803 0.08865 0.09283
# traitPrimary.Lifestyle.Terrestrial 2.9557 2.670 0.08443 0.09126
# log_Tail_Length_centered:traitPrimary.Lifestyle.Insessorial -1.0701 1.953 0.06176 0.06479
# log_Tail_Length_centered:traitPrimary.Lifestyle.Terrestrial -2.4283 1.909 0.06037 0.06037
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# traitPrimary.Lifestyle.Insessorial -7.020 -2.654 -0.8061 0.7953 4.602
# traitPrimary.Lifestyle.Terrestrial -1.639 1.231 2.6791 4.4126 8.724
# log_Tail_Length_centered:traitPrimary.Lifestyle.Insessorial -4.746 -2.317 -1.0952 0.2483 2.557
# log_Tail_Length_centered:traitPrimary.Lifestyle.Terrestrial -6.266 -3.592 -2.4131 -1.1127 1.320
summary(res_2)
# Iterations = 600001:2598001
# Thinning interval = 2000
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 15.3215 10.870 0.3437 0.4667
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -0.9500 4.038 0.1277 0.1441
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -0.9500 4.038 0.1277 0.1441
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 15.4192 10.808 0.3418 0.4834
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.5418 0.000 0.0000 0.0000
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.2709 0.000 0.0000 0.0000
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.2709 0.000 0.0000 0.0000
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.5418 0.000 0.0000 0.0000
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 3.2225 8.5847 12.7689 19.2764 42.4203
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -9.2289 -2.7137 -0.8335 1.1495 6.6341
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -9.2289 -2.7137 -0.8335 1.1495 6.6341
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 3.4482 8.1572 12.7788 19.6677 43.7071
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.5418 0.5418 0.5418 0.5418 0.5418
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.2709 0.2709 0.2709 0.2709 0.2709
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.2709 0.2709 0.2709 0.2709 0.2709
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.5418 0.5418 0.5418 0.5418 0.5418
summary(res_3_corr_phylo)
# Iterations = 600001:2598001
# Thinning interval = 2000
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# -0.064355 0.212862 0.006731 0.006731
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# -0.46205 -0.21687 -0.07002 0.08667 0.35059
summary(brms_mn1_2)
# Family: categorical
# Links: muInsessorial = logit; muTerrestrial = logit
# Formula: Primary.Lifestyle ~ log_Tail_Length_centered + (1 | a | gr(Phylo, cov = A))
# Data: dat (Number of observations: 173)
# Draws: 2 chains, each with iter = 15000; warmup = 10000; thin = 1;
# total post-warmup draws = 10000
#
# Multilevel Hyperparameters:
# ~Phylo (Number of levels: 173)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(muInsessorial_Intercept) 6.14 3.83 2.13 15.68 1.00 1627 2326
# sd(muTerrestrial_Intercept) 6.90 4.68 2.32 19.89 1.00 1412 1902
# cor(muInsessorial_Intercept,muTerrestrial_Intercept) -0.15 0.40 -0.84 0.64 1.00 1304 2835
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# muInsessorial_Intercept -0.53 2.29 -5.38 3.86 1.00 12414 6264
# muTerrestrial_Intercept 1.34 2.45 -3.08 6.58 1.00 10541 5718
# muInsessorial_log_Tail_Length_centered -2.17 3.08 -9.60 2.81 1.00 4513 3030
# muTerrestrial_log_Tail_Length_centered -2.95 3.18 -9.99 2.58 1.00 4134 2746Both MCMCglmm and brms indicate very high phylogenetic variance for both Insessorial and Terrestrial lifestyles with only weak and uncertain phylogenetic correlation between the two categories. The differences in variance estimates likely stem from the informative priors used in MCMCglmm to aid convergence.
One continuous and one binary explanatory variables model
MCMCglmm
prior5 <- list(
R = list(V = (matrix(1, 2, 2) + diag(2)) / 3, fix = 1),
G = list(G1 = list(V = diag(2), nu = 20,
alpha.mu = rep(0, 2), alpha.V = diag(2)*(pi^2/3))
)
)
system.time(
mcmcglmm_mn1_3 <- MCMCglmm(Primary.Lifestyle ~ log_Tail_Length_centered:trait + IsOmnivore:trait + trait -1,
random = ~us(trait):Phylo,
rcov = ~us(trait):units,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "categorical",
data = dat,
prior = prior5,
nitt = 13000*250,
thin = 10*250,
burnin = 3000*250
)
)brms
system.time(
brms_mn1_3 <- brm(
Primary.Lifestyle ~ log_Tail_Length_centered + IsOmnivore + (1 |a| gr(Phylo, cov = A)),
data = dat,
data2 = list(A = A),
family = categorical(link = "logit"),
prior = custom_priors,
iter = 15000,
warmup = 5000,
chains = 2,
thin = 1
)
)Outputs are…
summary(mcmcglmm_mn1_3)
# Iterations = 750001:3247501
# Thinning interval = 2500
# Sample size = 1000
#
# DIC: 244.733
#
# G-structure: ~us(trait):Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 22.932 2.484e+00 51.826 581.7
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -1.274 -1.179e+01 8.518 899.2
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -1.274 -1.179e+01 8.518 899.2
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 15.898 3.475e-04 44.088 627.4
#
# R-structure: ~us(trait):units
#
# post.mean l-95% CI u-95% CI eff.samp
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.6667 0.6667 0.6667 0
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.3333 0.3333 0.3333 0
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.3333 0.3333 0.3333 0
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.6667 0.6667 0.6667 0
#
# Location effects: Primary.Lifestyle ~ log_Tail_Length_centered:trait + IsOmnivore:trait + trait - 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# traitPrimary.Lifestyle.Insessorial -1.01465 -8.38540 5.59217 894.2 0.818
# traitPrimary.Lifestyle.Terrestrial 3.98040 -1.31604 9.80666 1200.6 0.092 .
# log_Tail_Length_centered:traitPrimary.Lifestyle.Insessorial -1.23330 -6.68483 2.66615 1000.0 0.610
# log_Tail_Length_centered:traitPrimary.Lifestyle.Terrestrial -1.89662 -6.62854 2.37056 1113.2 0.406
# traitPrimary.Lifestyle.Insessorial:IsOmnivore 0.08497 -1.87839 2.24356 835.3 0.962
# traitPrimary.Lifestyle.Terrestrial:IsOmnivore -5.14025 -8.12432 -2.71908 848.1 <0.001 ***
res_1 <- mcmcglmm_mn1_3$Sol / sqrt(1+c2a) # for fixed effects
res_2 <- mcmcglmm_mn1_3$VCV / (1+c2a) # for variance components
res_3_corr_phylo <- (mcmcglmm_mn1_3$VCV[, 2]/(1+c2a)) /sqrt((mcmcglmm_mn1_3$VCV[, 1] * mcmcglmm_mn1_3$VCV[, 4])/(1+c2a))
res_3_corr_nonphylo <- (mcmcglmm_mn1_3$VCV[, 6]/(1+c2a)) /sqrt((mcmcglmm_mn1_3$VCV[, 5] * mcmcglmm_mn1_3$VCV[, 8])/(1+c2a))
summary(res_1)
# Iterations = 750001:3247501
# Thinning interval = 2500
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# traitPrimary.Lifestyle.Insessorial -0.9147 3.2692 0.10338 0.10932
# traitPrimary.Lifestyle.Terrestrial 3.5882 2.4961 0.07893 0.07204
# log_Tail_Length_centered:traitPrimary.Lifestyle.Insessorial -1.1118 2.1376 0.06760 0.06760
# log_Tail_Length_centered:traitPrimary.Lifestyle.Terrestrial -1.7097 2.0720 0.06552 0.06210
# traitPrimary.Lifestyle.Insessorial:IsOmnivore 0.0766 0.9584 0.03031 0.03316
# traitPrimary.Lifestyle.Terrestrial:IsOmnivore -4.6338 1.2856 0.04065 0.04415
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# traitPrimary.Lifestyle.Insessorial -7.874 -2.8074 -0.64157 1.1515 4.945
# traitPrimary.Lifestyle.Terrestrial -1.002 1.8582 3.25656 5.0788 9.368
# log_Tail_Length_centered:traitPrimary.Lifestyle.Insessorial -5.840 -2.4495 -0.92478 0.3622 2.750
# log_Tail_Length_centered:traitPrimary.Lifestyle.Terrestrial -5.978 -3.0120 -1.64566 -0.2734 2.103
# traitPrimary.Lifestyle.Insessorial:IsOmnivore -1.675 -0.5924 0.06417 0.7068 2.066
# traitPrimary.Lifestyle.Terrestrial:IsOmnivore -7.561 -5.3247 -4.47341 -3.6857 -2.669
summary(res_2)
# Iterations = 750001:3247501
# Thinning interval = 2500
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 18.6355 11.978 0.3788 0.4966
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -1.0351 3.971 0.1256 0.1324
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -1.0351 3.971 0.1256 0.1324
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 12.9191 11.113 0.3514 0.4437
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.5418 0.000 0.0000 0.0000
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.2709 0.000 0.0000 0.0000
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.2709 0.000 0.0000 0.0000
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.5418 0.000 0.0000 0.0000
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 4.0308 10.2520 15.9642 24.0021 49.6277
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -10.0779 -2.7629 -0.7674 0.8149 6.5968
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -10.0779 -2.7629 -0.7674 0.8149 6.5968
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 0.4373 4.6612 10.1930 17.3835 44.3732
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.5418 0.5418 0.5418 0.5418 0.5418
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.2709 0.2709 0.2709 0.2709 0.2709
# traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.2709 0.2709 0.2709 0.2709 0.2709
# traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.5418 0.5418 0.5418 0.5418 0.5418
summary(res_3_corr_phylo)
# Iterations = 750001:3247501
# Thinning interval = 2500
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# -0.069527 0.220096 0.006960 0.006403
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# -0.49120 -0.21345 -0.07966 0.08421 0.37013
summary(res_3_corr_nonphylo)
# Iterations = 750001:3247501
# Thinning interval = 2500
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# 0.4507 0.0000 0.0000 0.0000
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# 0.4507 0.4507 0.4507 0.4507 0.4507
summary(brms_mn1_3)
# Family: categorical
# Links: muInsessorial = logit; muTerrestrial = logit
# Formula: Primary.Lifestyle ~ log_Tail_Length_centered + IsOmnivore + (1 | a | gr(Phylo, cov = A))
# Data: dat (Number of observations: 173)
# Draws: 2 chains, each with iter = 15000; warmup = 10000; thin = 1;
# total post-warmup draws = 10000
#
# Multilevel Hyperparameters:
# ~Phylo (Number of levels: 173)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(muInsessorial_Intercept) 7.18 4.74 2.26 20.74 1.00 1052 1590
# sd(muTerrestrial_Intercept) 5.76 4.05 0.63 16.04 1.00 920 963
# cor(muInsessorial_Intercept,muTerrestrial_Intercept) -0.24 0.42 -0.92 0.64 1.00 1248 2800
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# muInsessorial_Intercept -0.96 2.39 -6.16 3.36 1.00 3821 3835
# muTerrestrial_Intercept 3.72 2.49 0.09 9.85 1.00 2250 2363
# muInsessorial_log_Tail_Length_centered -2.05 3.19 -9.66 3.14 1.00 2889 2273
# muInsessorial_IsOmnivore 0.87 1.76 -1.87 5.32 1.00 1783 1843
# muTerrestrial_log_Tail_Length_centered -1.50 2.89 -7.61 4.13 1.00 4115 3124
# muTerrestrial_IsOmnivore -6.36 3.10 -14.70 -2.92 1.00 1269 1638- The effect of tail length on lifestyle probability was negative for both categories in both models, but the 95% CIs include zero, indicating no strong evidence for an effect.
- Results from
MCMCglmmandbrmswere broadly consistent in magnitude and direction of effects. - Only weak and uncertain phylogenetic correlation between the two categories.
Example 2
We use the data of habitat category on Sylviidae (Old World warbler - 294 species) as the response variable. Mass and presence or absence of migration are used as the explanatory variable.
Intercept-only model
MCMCglmm
inv_phylo <- inverseA(tree, nodes = "ALL", scale = TRUE)
prior <- list(
R = list(V = (matrix(1, 2, 2) + diag(2)) / 3, fix = 1),
G = list(G1 = list(V = diag(2), nu = 20,
alpha.mu = rep(0, 2), alpha.V = diag(2)*(pi^2/3))
)
)
mcmcglmm_mn2_1 <- MCMCglmm(Habitat_Category ~ trait -1,
random = ~us(trait):Phylo,
rcov = ~us(trait):units,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "categorical",
data = Sylviidae_dat,
prior = prior,
nitt = 13000*500,
thin = 10*500,
burnin = 3000*500
)brms
A <- ape::vcv.phylo(tree, corr = TRUE)
priors_brms1 <- default_prior(Habitat_Category ~ 1 + (1 |a| gr(Phylo, cov = A)),
data = Sylviidae_dat,
data2 = list(A = A),
family = categorical(link = "logit")
)
system.time(
brms_mn2_1 <- brm(Habitat_Category ~ 1 + (1 |a| gr(Phylo, cov = A)),
data = Sylviidae_dat,
data2 = list(A = A),
family = categorical(link = "logit"),
prior = priors_brms1,
iter = 18000,
warmup = 8000,
chains = 2,
thin = 1,
)
)Before checking the output,we need to correct the results from MCMCglmm using the following equation:
c2 <- (16 * sqrt(3) / (15 * pi))^2
c2a <- c2*(2/3)summary(mcmcglmm_mn2_1)
# Iterations = 1500001:6495001
# Thinning interval = 5000
# Sample size = 1000
#
# DIC: 379.9416
#
# G-structure: ~us(trait):Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 7.151 1.003 16.839 854.5
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 2.010 -2.392 7.668 1000.0
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 2.010 -2.392 7.668 1000.0
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 14.710 3.210 29.105 784.9
#
# R-structure: ~us(trait):units
#
# post.mean l-95% CI u-95% CI eff.samp
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.6667 0.6667 0.6667 0
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.3333 0.3333 0.3333 0
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.3333 0.3333 0.3333 0
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.6667 0.6667 0.6667 0
#
# Location effects: Habitat_Category ~ trait - 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# traitHabitat_Category.Arid_Open 2.44871 0.09382 4.88885 905.8 0.036 *
# traitHabitat_Category.Forested_Vegetated 4.56420 1.35941 7.49998 892.7 <0.001 ***
res_1 <- mcmcglmm_mn2_1$Sol / sqrt(1+c2a) # for fixed effects
res_2 <- mcmcglmm_mn2_1$VCV / (1+c2a) # for variance components
res_3_corr_phylo <- (mcmcglmm_mn2_1$VCV[, 2]/(1+c2a)) /sqrt((mcmcglmm_mn2_1$VCV[, 1] * mcmcglmm_mn2_1$VCV[, 4])/(1+c2a))
res_3_corr_nonphylo <- (mcmcglmm_mn2_1$VCV[, 6]/(1+c2a)) /sqrt((mcmcglmm_mn2_1$VCV[, 5] * mcmcglmm_mn2_1$VCV[, 8])/(1+c2a))
summary(res_1)
# Iterations = 1500001:6495001
# Thinning interval = 5000
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# traitHabitat_Category.Arid_Open 2.207 1.140 0.03605 0.03788
# traitHabitat_Category.Forested_Vegetated 4.114 1.417 0.04480 0.04741
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# traitHabitat_Category.Arid_Open 0.1303 1.522 2.095 2.828 4.518
# traitHabitat_Category.Forested_Vegetated 1.5750 3.173 4.004 4.958 7.352
summary(res_2)
# Iterations = 1500001:6495001
# Thinning interval = 5000
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 5.8115 4.203 0.13290 0.14377
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 1.6331 2.320 0.07337 0.07337
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 1.6331 2.320 0.07337 0.07337
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 11.9537 6.815 0.21551 0.24325
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.5418 0.000 0.00000 0.00000
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.2709 0.000 0.00000 0.00000
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.2709 0.000 0.00000 0.00000
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.5418 0.000 0.00000 0.00000
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 1.3308 3.1661 4.8034 6.9772 15.7767
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo -1.5487 0.2950 1.2254 2.4747 7.0730
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo -1.5487 0.2950 1.2254 2.4747 7.0730
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 4.0126 7.3840 10.4777 14.8650 28.3674
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.5418 0.5418 0.5418 0.5418 0.5418
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.2709 0.2709 0.2709 0.2709 0.2709
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.2709 0.2709 0.2709 0.2709 0.2709
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.5418 0.5418 0.5418 0.5418 0.5418
summary(res_3_corr_phylo)
# Iterations = 1500001:6495001
# Thinning interval = 5000
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# 0.172049 0.184925 0.005848 0.005848
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# -0.17512 0.03699 0.17474 0.30106 0.51285
summary(res_3_corr_nonphylo)
# Iterations = 1500001:6495001
# Thinning interval = 5000
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# 0.4507 0.0000 0.0000 0.0000
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# 0.4507 0.4507 0.4507 0.4507 0.4507
summary(brms_mn2_1)
# Family: categorical
# Links: muAridOpen = logit; muForestedVegetated = logit
# Formula: Habitat_Category ~ 1 + (1 | a | gr(Phylo, cov = A))
# Data: Sylviidae_dat (Number of observations: 294)
# Draws: 2 chains, each with iter = 18000; warmup = 8000; thin = 1;
# total post-warmup draws = 20000
#
# Multilevel Hyperparameters:
# ~Phylo (Number of levels: 294)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(muAridOpen_Intercept) 2.58 0.90 1.24 4.69 1.00 4049 7462
# sd(muForestedVegetated_Intercept) 4.14 1.33 2.20 7.19 1.00 2119 5431
# cor(muAridOpen_Intercept,muForestedVegetated_Intercept) 0.44 0.31 -0.25 0.90 1.00 1116 2854
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# muAridOpen_Intercept 2.24 1.17 -0.05 4.66 1.00 3552 6462
# muForestedVegetated_Intercept 3.62 1.49 0.88 6.83 1.00 12256 10030And average phylogenetic signals are…
# average phylogenteic signal - others vs. Arid and open
# MCMCglmm
phylo_signalA_mcmcglmm_nominal <- ((mcmcglmm_mn2_1$VCV[, "traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo"]/(1+c2a)) / (mcmcglmm_mn2_1$VCV[, "traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo"]/(1+c2a) + 1))
phylo_signalA_mcmcglmm_nominal %>% mean()
# [1] 0.808917
phylo_signalA_mcmcglmm_nominal %>% quantile(probs = c(0.025,0.5,0.975))
# 2.5% 50% 97.5%
# 0.5709550 0.8276878 0.9403931
## brms
phylo_signalA_brms <- brms_mn2_1 %>% as_tibble() %>%
dplyr::select(Sigma_phy = sd_Phylo__muAridOpen_Intercept) %>%
mutate(lambda_nominalA = Sigma_phy^2 / (Sigma_phy^2 + 1)) %>%
pull(lambda_nominalA)
phylo_signalA_brms %>% mean()
# [1] 0.8370854
phylo_signalA_brms %>% quantile(probs = c(0.025,0.5,0.975))
# 2.5% 50% 97.5%
# 0.6055780 0.8563134 0.9565012
# average phylogenteic signal - both vs. right
## MCMCglmm
phylo_signalF_mcmcglmm_nominal <- ((mcmcglmm_mn2_1$VCV[, "traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo"]/(1+c2a)) / (mcmcglmm_mn2_1$VCV[, "traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo"]/(1+c2a) + 1))
phylo_signalF_mcmcglmm_nominal %>% mean()
# [1] 0.9044678
phylo_signalF_mcmcglmm_nominal %>% quantile(probs = c(0.025,0.5,0.975))
# 2.5% 50% 97.5%
# 0.8004989 0.9128743 0.9659487
# brms
phylo_signalF_brms <- brms_mn2_1 %>% as_tibble() %>%
dplyr::select(Sigma_phy = sd_Phylo__muForestedVegetated_Intercept) %>%
mutate(lambda_nominalF = Sigma_phy^2 / (Sigma_phy^2 + 1)) %>%
pull(lambda_nominalF)
phylo_signalF_brms %>% mean()
# [1] 0.9304826
phylo_signalF_brms %>% quantile(probs = c(0.025,0.5,0.975))
# 2.5% 50% 97.5%
# 0.8292983 0.9394892 0.9810195 Both models detected high phylogenetic signal for Habitat_Category, especially for Forested_Vegetated and slightly lower for Arid_Open category. The phylogenetic correlation between the two habitat categories tended to be positive, but the 95% credible intervals included zero in both models, indicating that the correlation was not statistically supported.
One continuous explanatory variable model
MCMCglmm
mcmcglmm_mn2_2 <- MCMCglmm(Habitat_Category ~ cMass:trait + trait -1,
random = ~us(trait):Phylo,
rcov = ~us(trait):units,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "categorical",
data = Sylviidae_dat,
prior = prior,
nitt = 13000*500,
thin = 10*500,
burnin = 3000*500
)brms
A <- ape::vcv.phylo(tree, corr = TRUE)
priors_brms2 <- default_prior(Habitat_Category ~ cMass + (1 |a| gr(Phylo, cov = A)),
data = Sylviidae_dat,
data2 = list(A = A),
family = categorical(link = "logit")
)
system.time(
brms_mn2_2 <- brm(Habitat_Category ~ cMass + (1 |a| gr(Phylo, cov = A)),
data = Sylviidae_dat,
data2 = list(A = A),
family = categorical(link = "logit"),
prior = priors_brms2,
iter = 18000,
warmup = 8000,
chains = 2,
thin = 1,
)
)summary(mcmcglmm_mn2_2)
# Iterations = 1500001:6495001
# Thinning interval = 5000
# Sample size = 1000
#
# DIC: 369.9206
#
# G-structure: ~us(trait):Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 7.923 1.119 18.446 649.8
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 2.344 -2.666 9.503 770.0
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 2.344 -2.666 9.503 770.0
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 18.943 4.131 41.543 745.4
#
# R-structure: ~us(trait):units
#
# post.mean l-95% CI u-95% CI eff.samp
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.6667 0.6667 0.6667 0
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.3333 0.3333 0.3333 0
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.3333 0.3333 0.3333 0
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.6667 0.6667 0.6667 0
#
# Location effects: Habitat_Category ~ cMass:trait + trait - 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# traitHabitat_Category.Arid_Open 2.62403 -0.08498 5.40782 1000.0 0.048 *
# traitHabitat_Category.Forested_Vegetated 4.74629 1.55204 8.47272 1000.0 0.002 **
# cMass:traitHabitat_Category.Arid_Open -0.79705 -2.34051 0.62992 1000.0 0.294
# cMass:traitHabitat_Category.Forested_Vegetated 1.03745 -0.87700 2.97350 892.6 0.252
res_1 <- mcmcglmm_mn2_2$Sol / sqrt(1+c2a) # for fixed effects
res_2 <- mcmcglmm_mn2_2$VCV / (1+c2a) # for variance components
res_3_corr_phylo <- (mcmcglmm_mn2_2$VCV[, 2]/(1+c2a)) /sqrt((mcmcglmm_mn2_2$VCV[, 1] * mcmcglmm_mn2_2$VCV[, 4])/(1+c2a))
res_3_corr_nonphylo <- (mcmcglmm_mn2_2$VCV[, 6]/(1+c2a)) /sqrt((mcmcglmm_mn2_2$VCV[, 5] * mcmcglmm_mn2_2$VCV[, 8])/(1+c2a))
summary(res_1)
# Iterations = 1500001:6495001
# Thinning interval = 5000
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# traitHabitat_Category.Arid_Open 2.207 1.140 0.03605 0.03788
# traitHabitat_Category.Forested_Vegetated 4.114 1.417 0.04480 0.04741
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# traitHabitat_Category.Arid_Open 0.1303 1.522 2.095 2.828 4.518
# traitHabitat_Category.Forested_Vegetated 1.5750 3.173 4.004 4.958 7.352
summary(res_2)
# Iterations = 1500001:6495001
# Thinning interval = 5000
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 5.8115 4.203 0.13290 0.14377
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 1.6331 2.320 0.07337 0.07337
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 1.6331 2.320 0.07337 0.07337
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 11.9537 6.815 0.21551 0.24325
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.5418 0.000 0.00000 0.00000
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.2709 0.000 0.00000 0.00000
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.2709 0.000 0.00000 0.00000
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.5418 0.000 0.00000 0.00000
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 1.3308 3.1661 4.8034 6.9772 15.7767
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo -1.5487 0.2950 1.2254 2.4747 7.0730
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo -1.5487 0.2950 1.2254 2.4747 7.0730
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 4.0126 7.3840 10.4777 14.8650 28.3674
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.5418 0.5418 0.5418 0.5418 0.5418
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.2709 0.2709 0.2709 0.2709 0.2709
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.2709 0.2709 0.2709 0.2709 0.2709
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.5418 0.5418 0.5418 0.5418 0.5418
summary(res_3_corr_phylo)
# Iterations = 1500001:6495001
# Thinning interval = 5000
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# 0.172049 0.184925 0.005848 0.005848
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# -0.17512 0.03699 0.17474 0.30106 0.51285
summary(res_3_corr_nonphylo)
# Iterations = 1500001:6495001
# Thinning interval = 5000
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# 0.4507 0.0000 0.0000 0.0000
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# 0.4507 0.4507 0.4507 0.4507 0.4507
summary(brms_mn2_2)
# Family: categorical
# Links: muAridOpen = logit; muForestedVegetated = logit
# Formula: Habitat_Category ~ cMass + (1 | a | gr(Phylo, cov = A))
# Data: Sylviidae_dat (Number of observations: 294)
# Draws: 2 chains, each with iter = 18000; warmup = 8000; thin = 1;
# total post-warmup draws = 20000
#
# Multilevel Hyperparameters:
# ~Phylo (Number of levels: 294)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(muAridOpen_Intercept) 2.63 0.94 1.22 4.90 1.00 3376 6591
# sd(muForestedVegetated_Intercept) 5.00 1.69 2.60 9.08 1.00 1488 4134
# cor(muAridOpen_Intercept,muForestedVegetated_Intercept) 0.48 0.31 -0.23 0.93 1.00 794 1765
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# muAridOpen_Intercept 2.47 1.20 0.18 5.03 1.00 2925 5132
# muForestedVegetated_Intercept 3.59 1.72 0.41 7.25 1.00 10315 9625
# muAridOpen_cMass -0.55 0.75 -2.08 0.92 1.00 8518 11678
# muForestedVegetated_cMass 1.29 1.08 -0.62 3.61 1.00 6534 8329
#
# Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
# and Tail_ESS are effective sample size measures, and Rhat is the potential
# scale reduction factor on split chains (at convergence, Rhat = 1).Summary The model intercept represents the log-odds of belonging to a given habitat category when predictors are at their reference values (here, cMass = 0). For Arid/Open and Forested/Vegetated, the intercepts were positive, and in most cases the 95% credible intervals did not include zero. This means that, under baseline conditions, the probability of belonging to these two habitat categories is higher than for the reference category (Aquatic/Coastal). In other words, the model predicts that species are more likely to occur in Arid/Open or Forested/Vegetated habitats than in Aquatic/Coastal habitats under the reference conditions.
Mass effects: The effect of centred log body mass (cMass) was negative for Arid/Open and positive for Forested/Vegetated, but in both models the 95% credible intervals crossed zero, suggesting no clear statistical support.
Phylogenetic variances: Substantial phylogenetic variance was observed for both habitat categories, indicating moderate to strong phylogenetic structuring.
Phylogenetic correlation: The correlation between habitat categories was small and positive but highly uncertain, with 95% credible intervals spanning zero.
One continuous and one binary explanatory variables model
MCMCglmm
mcmcglmm_mn2_3 <- MCMCglmm(Habitat_Category ~ cMass:trait + IsMigrate:trait + trait -1,
random = ~us(trait):Phylo,
rcov = ~us(trait):units,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "categorical",
data = Sylviidae_dat,
prior = prior,
nitt = 13000*750,
thin = 10*750,
burnin = 3000*750
)brms
A <- ape::vcv.phylo(tree, corr = TRUE)
priors_brms3 <- default_prior(Habitat_Category ~ cMass + IsMigrate + (1 |a| gr(Phylo, cov = A)),
data = Sylviidae_dat,
data2 = list(A = A),
family = categorical(link = "logit")
)
system.time(
brms_mn2_3 <- brm(Habitat_Category ~ cMass + IsMigrate + (1 |a| gr(Phylo, cov = A)),
data = Sylviidae_dat,
data2 = list(A = A),
family = categorical(link = "logit"),
prior = priors_brms3,
iter = 18000,
warmup = 8000,
chains = 2,
thin = 1,
)
)summary(mcmcglmm_mn2_3)
# Iterations = 2250001:9742501
# Thinning interval = 7500
# Sample size = 1000
#
# DIC: 353.666
#
# G-structure: ~us(trait):Phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 9.083 1.196 21.74 718.3
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 1.660 -4.574 8.40 779.0
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 1.660 -4.574 8.40 779.0
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 20.892 2.772 44.08 678.4
#
# R-structure: ~us(trait):units
#
# post.mean l-95% CI u-95% CI eff.samp
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.6667 0.6667 0.6667 0
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.3333 0.3333 0.3333 0
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.3333 0.3333 0.3333 0
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.6667 0.6667 0.6667 0
#
# Location effects: Habitat_Category ~ cMass:trait + IsMigrate:trait + trait - 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# traitHabitat_Category.Arid_Open 2.3783 -0.5852 5.1494 988.4 0.090 .
# traitHabitat_Category.Forested_Vegetated 2.4444 -0.8470 6.7465 1000.0 0.172
# cMass:traitHabitat_Category.Arid_Open -0.8778 -2.6132 0.6125 1588.3 0.258
# cMass:traitHabitat_Category.Forested_Vegetated 0.8132 -1.2024 2.7663 1000.0 0.424
# traitHabitat_Category.Arid_Open:IsMigrate 0.4607 -0.9485 1.8999 905.7 0.544
# traitHabitat_Category.Forested_Vegetated:IsMigrate 3.0348 1.1467 5.1125 1000.0 <0.001 ***
res_1 <- mcmcglmm_mn2_3$Sol / sqrt(1+c2a) # for fixed effects
res_2 <- mcmcglmm_mn2_3$VCV / (1+c2a) # for variance components
res_3_corr_phylo <- (mcmcglmm_mn2_3$VCV[, 2]/(1+c2a)) /sqrt((mcmcglmm_mn2_3$VCV[, 1] * mcmcglmm_mn2_3$VCV[, 4])/(1+c2a))
res_3_corr_nonphylo <- (mcmcglmm_mn2_3$VCV[, 6]/(1+c2a)) /sqrt((mcmcglmm_mn2_3$VCV[, 5] * mcmcglmm_mn2_3$VCV[, 8])/(1+c2a))
summary(res_1)
# Iterations = 2250001:9742501
# Thinning interval = 7500
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# traitHabitat_Category.Arid_Open 2.1440 1.2735 0.04027 0.04051
# traitHabitat_Category.Forested_Vegetated 2.2035 1.7196 0.05438 0.05438
# cMass:traitHabitat_Category.Arid_Open -0.7913 0.7334 0.02319 0.01840
# cMass:traitHabitat_Category.Forested_Vegetated 0.7331 0.9346 0.02955 0.02955
# traitHabitat_Category.Arid_Open:IsMigrate 0.4153 0.6657 0.02105 0.02212
# traitHabitat_Category.Forested_Vegetated:IsMigrate 2.7358 0.9074 0.02870 0.02870
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# traitHabitat_Category.Arid_Open -0.4477 1.37995 2.1559 2.8690 4.8198
# traitHabitat_Category.Forested_Vegetated -0.9808 1.11585 2.1584 3.1992 5.9649
# cMass:traitHabitat_Category.Arid_Open -2.3303 -1.24354 -0.7433 -0.2922 0.5856
# cMass:traitHabitat_Category.Forested_Vegetated -1.0181 0.08259 0.6961 1.3172 2.7128
# traitHabitat_Category.Arid_Open:IsMigrate -0.9151 -0.03301 0.4117 0.8748 1.7025
# traitHabitat_Category.Forested_Vegetated:IsMigrate 1.1262 2.12920 2.6469 3.2897 4.723
summary(res_2)
# Iterations = 2250001:9742501
# Thinning interval = 7500
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 7.3814 5.319 0.16821 0.19848
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 1.3490 2.643 0.08358 0.09469
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 1.3490 2.643 0.08358 0.09469
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 16.9777 10.703 0.33847 0.41092
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.5418 0.000 0.00000 0.00000
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.2709 0.000 0.00000 0.00000
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.2709 0.000 0.00000 0.00000
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.5418 0.000 0.00000 0.00000
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 1.5684 3.7512 5.9438 9.6229 20.8510
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo -3.6719 -0.1351 1.1143 2.6647 7.1611
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo -3.6719 -0.1351 1.1143 2.6647 7.1611
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 4.2987 9.7029 14.3953 21.6290 43.4114
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.5418 0.5418 0.5418 0.5418 0.5418
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.2709 0.2709 0.2709 0.2709 0.2709
# traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.2709 0.2709 0.2709 0.2709 0.2709
# traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.5418 0.5418 0.5418 0.5418 0.5418
summary(res_3_corr_phylo)
# Iterations = 2250001:9742501
# Thinning interval = 7500
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# 0.117832 0.197752 0.006253 0.006253
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# -0.27624 -0.01752 0.12623 0.25980 0.49079
summary(res_3_corr_nonphylo)
# Iterations = 2250001:9742501
# Thinning interval = 7500
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# 0.4507 0.0000 0.0000 0.0000
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# 0.4507 0.4507 0.4507 0.4507 0.4507
summary(brms_mn2_3)
# Family: categorical
# Links: muAridOpen = logit; muForestedVegetated = logit
# Formula: Habitat_Category ~ cMass + IsMigrate + (1 | a | gr(Phylo, cov = A))
# Data: Sylviidae_dat (Number of observations: 294)
# Draws: 2 chains, each with iter = 18000; warmup = 8000; thin = 1;
# total post-warmup draws = 20000
#
# Multilevel Hyperparameters:
# ~Phylo (Number of levels: 294)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(muAridOpen_Intercept) 2.93 1.24 1.28 5.93 1.00 2322 3372
# sd(muForestedVegetated_Intercept) 6.45 3.41 2.65 15.31 1.00 1254 1903
# cor(muAridOpen_Intercept,muForestedVegetated_Intercept) 0.35 0.35 -0.42 0.90 1.00 680 1883
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# muAridOpen_Intercept 2.08 1.30 -0.42 4.81 1.00 3324 5760
# muForestedVegetated_Intercept 0.81 2.34 -4.26 5.08 1.00 4304 3182
# muAridOpen_cMass -0.64 0.81 -2.26 0.92 1.00 4954 9099
# muAridOpen_IsMigrate 0.53 0.72 -0.92 1.94 1.00 5983 9135
# muForestedVegetated_cMass 1.44 1.38 -0.81 4.62 1.00 3507 3935
# muForestedVegetated_IsMigrate 3.63 1.73 1.37 7.83 1.00 2677 2363
#
# Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
# and Tail_ESS are effective sample size measures, and Rhat is the potential
# scale reduction factor on split chains (at convergence, Rhat = 1).Summary
Baseline habitat use: When migration status and body mass are not considered, Sylviidae species are somewhat more likely to be in Arid/Open or Forested/Vegetated habitats than in the reference category (Aquatic/Coastal), but the statistical uncertainty is large enough that we cannot be confident in these differences. Body mass effects: There is no clear evidence that heavier or lighter species prefer either Arid/Open or Forested/Vegetated habitats.
Migration effects: Migratory species are much more likely to be associated with Forested/Vegetated habitats compared to Aquatic/Coastal ones, even after accounting for body mass and phylogeny. Migration does not appear to strongly influence the likelihood of using Arid/Open habitats.
Phylogenetic patterns: The evolutionary tendencies for using Arid/Open vs. Forested/Vegetated habitats are only weakly related, and the data do not clearly indicate a consistent positive or negative correlation across the family’s evolutionary history.
Extension: models with multiple data point per species
Tips for simulating dataset
Unfortunately, there are few published dataset for multiple datapoints per species. Therefore, we generate 1. Gaussian, 2. binary. 3. ordinal, and 4. nominal data for this section and use for MCMCglmm and brms models.
Before moving each section, we share some tips we need to keep in mind when simulating data:
Make the dataset as realistic as possible
Simulated data should reflect the real-world context. If you are simulating biological data, consider the typical distributions, relationships, and range of values that could appear in your real dataset. You may need check published papers whether your assumption is suitable.Carefully define random effect(s) and fixed effect(s) parameters
For generating a response variable, ensure that both fixed and random effects are specified clearly. The fixed effects could represent known, deterministic influences (like treatment groups), while random effects account for variability due to unmeasured factors (like species or individuals). When setting residuals (errors) for the Gaussian response variable, remember that the variance of the residuals should typically be smaller than the variance of the random effects. Because they represent smaller-scale noise, while random effects capture larger-scale variability. If you set the residual variance is larger, it can lead model unrealistic.Check true values (we set) and model estimates
After generating the data and fitting your model, compare the “true” parameter values you set for the simulation with the estimates your model provides. This will help you assess the accuracy of your model and how well it can recover the true underlying parameters. It is important to check whether the model overestimates or underestimates certain effects.Ensure all created variables are properly aligned
When simulating data, it is essential to check that all the variables are correctly aligned and correspond to each other as intended. Specifically, pay attention to the following points:Matching species names with those in the phylogenetic tree
Ensure that the species names used in your data set are consistent with those in the phylogenetic tree (if applicable). If these names don’t match, the integrity of the data could be compromised, leading to inaccurate results (or model can not run). For instance, if a species is named “SpeciesA” in the data but referred to as “SpeciesB” in the phylogeny, the data won’t be interpreted correctly.Correspondence between variables
Also check that other variables, such as individual IDs, environmental factors, gender, or group differences, are correctly matched. For example, verify that data points for each individual are correctly associated with the intended ID, and that variables like age and gender follow expected distributions. How to check: After creating your data frame, use functions, such ashead(),summary(), orView()in R (you can find other functions as well) to check for any discrepancies, such as duplicated species names or missing values. This helps that your dataset is internally consistent.
Other small things…
- Read the documentation
It may not be worth mentioning again, but always read the documentation for the functions yo are using. Understand how they work, what parameters they take, and what outputs they produce. - Test with small examples
If you are using a new function, start by testing it on a small sample of data to see how it behaves before using it in your actual analysis. This helps you avoid larger mistakes when you apply it to the full dataset. - Stick to familiar methods
However, stick to methods and functions you are familiar with, especially when you are starting. As you encounter issues or need more advanced features, gradually learn new functions and techniques. It is a safer approach and will help you build confidence.
Simulate datasets and run models
In this extension, We aim to create a dataset for models that take different types of response variables: Gaussian, binary, ordinal, and nominal traits. The explanatory variables are all the same - body mass and sex. All datasets were generated based on observations of 100 bird species, with each species being observed 5 times.
Gaussian
We simulated the habitat range (log-transformed) of each species as the continuous response variable. We assumed habitat range is more strongly influenced by body mass than by sex, and phylogenetic factors are expected to have a greater impact than non-phylogenetic factors.
set.seed(1234)
tree <- pbtree(n = 100, scale = 1)
# extract the correlation matrix from the phylogenetic tree
phylo_cor <- vcv(tree, corr = TRUE)
# define parameters
n_species <- 100 # number of species (clusters)
obs_per_species <- 5 # number of observations per species
n <- n_species * obs_per_species # total number of observations
# simulate fixed effects
## continuous variable: body mass
mu_mass <- (log(4)+ log(2500))/2 # mean of whole 100 species body mass
mu_species_mass <- mvrnorm(n = 1, # species level body mass
mu = rep(mu_mass, ncol(phylo_cor)),
Sigma = phylo_cor*1)
obs_mass <- sapply(mu_species_mass,
function(x) rnorm(obs_per_species, mean = x, sd = sqrt(0.1))) %>%
as.vector() # each observation body mass
## binomial variable: sex
sex <- rbinom(n, 1, 0.5)
# simulate random effect parameter
## variance components
sigma_phylo <- 2
sigma_non <- 1 # standard deviation of non-phylogenetic random effect
# phylogenetic random effect
random_effect_phylo <- mvrnorm(
n = 1,
mu = rep(0, ncol(phylo_cor)),
Sigma = sigma_phylo*phylo_cor)
phylo_effect <- rep(random_effect_phylo, each = obs_per_species)
# non-phylogenetic random effect
random_effect_non_phylo <- rnorm(n_species, sd = sqrt(sigma_non))
non_phylo_effect <- rep(random_effect_non_phylo, each = obs_per_species)
# residual
sigma_residual <- 0.2
residual <- rnorm(n, mean = 0, sd = sqrt(sigma_residual))
# define fixed effect parameter
beta_c0 <- 0 # intercept
beta_c1 <- 1.2 # coefficient for body mass
beta_c2 <- 0.1 # coefficient for sex
y <- beta_c0 + beta_c1 * obs_mass + beta_c2 * sex + phylo_effect + non_phylo_effect + residual
sim_data1 <- data.frame(
habitat_range = y,
mass = obs_mass,
sex = factor(sex, labels = c("F", "M")),
species = names(phylo_effect),
individual_id = factor(1:n), # create unique individual IDs for each observation
phylo = names(phylo_effect)
) %>%
arrange(species)Run models
Ainv <- inverseA(tree, nodes = "ALL", scale = TRUE)$Ainv
prior1_mcmcglmm <- list(R = list(V = 1, nu = 0.002),
G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 100),
G2 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 100)
)
)
system.time(
mod1_mcmcglmm <- MCMCglmm(
habitat_range ~ 1,
random = ~ phylo + species,
ginverse = list(phylo = Ainv),
prior = prior1_mcmcglmm,
family = "gaussian",
data = sim_data1,
nitt = 13000*60,
thin = 10*60,
burnin = 3000*60
)
)
system.time(
mod1_mcmcglmm2 <- MCMCglmm(
habitat_range ~ mass,
random = ~ phylo + species,
ginverse = list(phylo = Ainv),
prior = prior1_mcmcglmm,
family = "gaussian",
data = sim_data1,
nitt = 13000*100,
thin = 10*100,
burnin = 3000*100
)
)
system.time(
mod1_mcmcglmm3 <- MCMCglmm(
habitat_range ~ mass + sex,
random = ~ phylo + species,
ginverse = list(phylo = Ainv),
prior = prior1_mcmcglmm,
family = "gaussian",
data = sim_data1,
nitt = 13000*250,
thin = 10*250,
burnin = 3000*250
)
)
# brms
A <- ape::vcv.phylo(tree, corr = TRUE)
priors1_brms <- default_prior(
habitat_range ~ 1 + (1 | gr(phylo, cov = A)) + (1 | species),
data = sim_data1,
data2 = list(A = A),
family = gaussian()
)
system.time(
mod1_brms <- brm(habitat_range ~ 1 + (1 | gr(phylo, cov = A)) + (1 | species),
data = sim_data1,
data2 = list(A = A),
family = gaussian(),
prior = priors1_brms,
iter = 8000,
warmup = 6000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
)
)
priors1_brms2 <- default_prior(
habitat_range ~ mass + (1 | gr(phylo, cov = A)) + (1 | species),
data = sim_data1,
data2 = list(A = A),
family = gaussian()
)
system.time(
mod1_brms2 <- brm(habitat_range ~ mass + (1 | gr(phylo, cov = A)) + (1 | species),
data = sim_data1,
data2 = list(A = A),
family = gaussian(),
prior = priors1_brms2,
iter = 8000,
warmup = 6000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
)
)
priors1_brms3 <- default_prior(
habitat_range ~ mass + sex + (1 | gr(phylo, cov = A)) + (1 | species),
data = sim_data1,
data2 = list(A = A),
family = gaussian()
)
system.time(
mod1_brms3 <- brm(habitat_range ~ mass + sex + (1 | gr(phylo, cov = A)) + (1 | species),
data = sim_data1,
data2 = list(A = A),
family = gaussian(),
prior = priors1_brms3,
iter = 8000,
warmup = 6000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
)
)Results
actual mean and variance of the model is…
mean(y)[1] 5.560533
var(random_effect_phylo)[1] 0.9475315
var(random_effect_non_phylo)[1] 0.8428301
var(residual)[1] 0.1892837
summary(mod1_mcmcglmm)
# Iterations = 180001:779401
# Thinning interval = 600
# Sample size = 1000
#
# DIC: 998.4984
#
# G-structure: ~phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# phylo 1.165 0.1108 2.704 881.6
#
# ~species
#
# post.mean l-95% CI u-95% CI eff.samp
# species 1.036 0.5946 1.504 1323
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 0.3573 0.3125 0.4029 1000
#
# Location effects: habitat_range ~ 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) 5.239 4.131 6.302 1000 <0.001 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
summary(mod1_brms)
# Family: gaussian
# Links: mu = identity; sigma = identity
# Formula: habitat_range ~ 1 + (1 | gr(phylo, cov = A)) + (1 | species)
# Data: sim_data1 (Number of observations: 500)
# Draws: 2 chains, each with iter = 8000; warmup = 6000; thin = 1;
# total post-warmup draws = 4000
#
# Multilevel Hyperparameters:
# ~phylo (Number of levels: 100)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.99 0.36 0.33 1.76 1.00 345 722
#
# ~species (Number of levels: 100)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 1.02 0.12 0.78 1.26 1.00 661 1451
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept 5.28 0.51 4.22 6.30 1.00 4816 2515
#
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma 0.60 0.02 0.56 0.64 1.00 3973 2562autocorr.plot(mod1_mcmcglmm$Sol)autocorr.plot(mod1_mcmcglmm$VCV)posterior_summary(mod1_mcmcglmm$Sol) Estimate Est.Error Q2.5 Q97.5
(Intercept) 5.238814 0.5481793 4.036911 6.270514
posterior_summary(mod1_mcmcglmm$VCV) Estimate Est.Error Q2.5 Q97.5
phylo 1.1653905 0.78259640 0.1622860 2.9869859
species 1.0359759 0.24017414 0.6052211 1.5251053
units 0.3572733 0.02396132 0.3150877 0.4083013
We will now verify whether the model could show estimates close to the true values from the simulated data - there are several points to check during this process. First, confirm whether the model converged by examining metrics such as the effective sampling size, autocorrelation, and R-hat values. Then, compare the true value with model point estimates and check the 95% credible intervals of point estimates. The intercept-only model does not include any independent variables. Instead, it focuses on the intercept (mean value) and variance components to explain the distribution of the dependent variable, habitat range. Looking at the results, the model appears to estimate the mean (fixed effect beta/intercept) and random effect variances close to their true values. However, the residual variance is not accurately estimated.
# MCMCglmm
phylo_signal_mod1_mcmcglmm <- ((mod1_mcmcglmm$VCV[, "phylo"]) / (mod1_mcmcglmm$VCV[, "phylo"] + mod1_mcmcglmm$VCV[, "species"] + mod1_mcmcglmm$VCV[, "units"]))
phylo_signal_mod1_mcmcglmm %>% mean()
# [1] 0.4170196
phylo_signal_mod1_mcmcglmm %>% quantile(probs = c(0.025,0.5,0.975))
# 2.5% 50% 97.5%
# 0.08783016 0.42359454 0.73669927
# brms
phylo_signal_mod1_brms <- mod1_brms %>% as_tibble() %>%
dplyr::select(Sigma_phy = sd_phylo__Intercept, Sigma_non_phy = sd_species__Intercept, Res = sigma) %>%
mutate(lambda_gaussian = Sigma_phy^2 / (Sigma_phy^2 + Sigma_non_phy^2 + Res^2)) %>%
pull(lambda_gaussian)
phylo_signal_mod1_brms %>% mean()
# [1] 0.3984794
phylo_signal_mod1_brms %>% quantile(probs = c(0.025,0.5,0.975))
# 2.5% 50% 97.5%
# 0.06146131 0.39645518 0.74712062 The phylogenetic signal was estimated as 0.42 (95% CI: 0.09, 0.74) using MCMCglmm and 0.40 (95% CI: 0.06, 0.75) using brms.It is important to note that we can now consider non-phylogenetic variance (here, species) more accurately, as the non-phylogenetic variance is directly estimated and is not confounded by within-species effects or other sources of variance.
summary(mod1_mcmcglmm2)
# Iterations = 300001:1299001
# Thinning interval = 1000
# Sample size = 1000
#
# DIC: 680.8927
#
# G-structure: ~phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# phylo 2.31 0.5428 4.753 844.2
#
# ~species
#
# post.mean l-95% CI u-95% CI eff.samp
# species 1.164 0.6272 1.72 867.3
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 0.1881 0.1618 0.2127 1000
#
# Location effects: habitat_range ~ mass
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) -0.2339 -1.7632 1.4823 1000 0.774
# mass 1.1143 0.9890 1.2515 1000 <0.001 ***
posterior_summary(mod1_mcmcglmm2$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) -0.233917 0.86362777 -1.8194889 1.430111
# mass 1.114301 0.06819815 0.9859776 1.250126
posterior_summary(mod1_mcmcglmm2$VCV)
# Estimate Est.Error Q2.5 Q97.5
# phylo 2.309659 1.1768520 0.7238491 5.2748480
# species 1.163885 0.2830942 0.6511874 1.7726979
# units 0.188124 0.0129251 0.1628494 0.2151048
summary(mod1_brms2)
# Family: gaussian
# Links: mu = identity; sigma = identity
# Formula: habitat_range ~ mass + (1 | gr(phylo, cov = A)) + (1 | species)
# Data: sim_data1 (Number of observations: 500)
# Draws: 2 chains, each with iter = 8000; warmup = 6000; thin = 1;
# total post-warmup draws = 4000
#
# Multilevel Hyperparameters:
# ~phylo (Number of levels: 100)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 1.44 0.34 0.84 2.18 1.00 563 1266
#
# ~species (Number of levels: 100)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 1.07 0.13 0.82 1.34 1.00 851 1427
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept -0.15 0.80 -1.73 1.45 1.00 3901 2789
# mass 1.11 0.07 0.98 1.25 1.00 6559 2347
#
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma 0.43 0.02 0.40 0.47 1.00 3843 2959Next, regarding the model that includes log-transformed body mass as an explanatory variable, the results showed that, as we had set(beta_c1), mass was estimated to have a positive effect on habitat range. Additionally, examining the residuals reveals that they are much closer to the true value compared to the intercept-only model. This improvement suggests that including an explanatory variable helps the model capture more of the variability in the data, reducing the unexplained variance (residual variance).
summary(mod1_mcmcglmm3)
# Iterations = 750001:3247501
# Thinning interval = 2500
# Sample size = 1000
#
# DIC: 668.5273
#
# G-structure: ~phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# phylo 2.325 0.5523 4.646 1073
#
# ~species
#
# post.mean l-95% CI u-95% CI eff.samp
# species 1.151 0.6569 1.764 900.3
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 0.1832 0.1572 0.2097 1000
#
# Location effects: habitat_range ~ mass + sex
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) -0.28323 -1.97048 1.38771 1000 0.674
# mass 1.11239 0.97750 1.24672 1000 <0.001 ***
# sexM 0.15016 0.06697 0.22862 1000 <0.001 ***
posterior_summary(mod1_mcmcglmm3$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) -0.2832264 0.85695461 -1.93569141 1.4800298
# mass 1.1123899 0.06844718 0.97159841 1.2423309
# sexM 0.1501593 0.04167755 0.06778142 0.2286585
posterior_summary(mod1_mcmcglmm3$VCV)
# Estimate Est.Error Q2.5 Q97.5
# phylo 2.3252270 1.16940589 0.7405934 5.2388026
# species 1.1512006 0.29343601 0.6567145 1.7615075
# units 0.1831836 0.01342064 0.1586928 0.2117723
summary(mod1_brms3)
# Family: gaussian
# Links: mu = identity; sigma = identity
# Formula: habitat_range ~ mass + sex + (1 | gr(phylo, cov = A)) + (1 | species)
# Data: sim_data1 (Number of observations: 500)
# Draws: 2 chains, each with iter = 8000; warmup = 6000; thin = 1;
# total post-warmup draws = 4000
#
# Multilevel Hyperparameters:
# ~phylo (Number of levels: 100)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 1.43 0.34 0.84 2.14 1.00 619 1255
#
# ~species (Number of levels: 100)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 1.08 0.13 0.83 1.33 1.00 964 1578
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept -0.22 0.77 -1.76 1.26 1.00 3957 2850
# mass 1.11 0.07 0.98 1.24 1.00 8492 2851
# sexM 0.15 0.04 0.06 0.23 1.00 8621 2798
#
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma 0.43 0.02 0.40 0.46 1.00 4112 2962Finally, let’s take a look at the results when both mass and sex are included in the model. We assumed that males are, on average, 0.1 units larger than females. This assumption is successfully reflected in the model estimates
Binary
The response variable in this analysis is breeding success, defined as whether a species reproduced or not. The assumptions are as follows: phylogeny has a weak relationship with breeding success, while some non-phylogenetic factors, such as environmental conditions, are likely to play a role. Additionally, heavier species are assumed to have a higher likelihood of breeding success, and females (coded as 0) are expected to succeed in breeding more often than males.
set.seed(456)
tree2 <- pbtree(n = 100, scale = 1)
# extract the correlation matrix from the phylogenetic tree
phylo_cor <- vcv(tree2, corr = TRUE)
# define parameters
n_species <- 100 # number of species (clusters)
obs_per_species <- 5 # number of observations per species
n <- n_species * obs_per_species # total number of observations
# simulate fixed effects
## continuous variable: body mass
mu_mass <- (log(4)+ log(2500))/2 # mean of whole 100 species body mass
mu_species_mass <- mvrnorm(n = 1, # species level body mass
mu = rep(mu_mass, ncol(phylo_cor)),
Sigma = phylo_cor*1)
obs_mass <- sapply(mu_species_mass,
function(x) rnorm(obs_per_species, mean = x, sd = sqrt(0.2))) %>%
as.vector() # each observation body mass
## binomial variable: sex
sex <- rbinom(n, 1, 0.5)
# simulate random effect parameter
## variance components
sigma_phylo <- 0.4
sigma_non <- 0.3 # standard deviation of non-phylogenetic random effect
## phylogenetic random effect
random_effect_phylo <- mvrnorm(
n = 1,
mu = rep(0, ncol(phylo_cor)),
Sigma = sigma_phylo*phylo_cor)
phylo_effect <- rep(random_effect_phylo, each = obs_per_species)
## non-phylogenetic random effect
random_effect_non_phylo <- rnorm(n, sd = sqrt(sigma_non))
non_phylo_effect <- rep(random_effect_non_phylo, each = obs_per_species)
# define fixed effect parameter
beta_b0 <- -0.5 # intercept
beta_b1 <- 0.4 # coefficient for body mass
beta_b2 <- -1.5 # coefficient for sex
# simulate linear predictor
eta1 <- beta_b0 + beta_b1 * obs_mass + beta_b2 * sex + phylo_effect + non_phylo_effect
# eta1 <- beta_b0 + phylo_effect + non_phylo_effect
# probability
p1 <- 1 / (1 + exp(-eta1))
y1 <- rbinom(n, 1, p1)
# combine simulated data
sim_data2 <- data.frame(
breeding = y1,
mass = obs_mass,
sex = factor(sex, labels = c("F", "M")),
species = names(phylo_effect),
individual_id = factor(1:n), # create unique individual IDs for each observation
phylo = names(phylo_effect)
) %>%
arrange(species) %>%
as.data.frame()
# head(sim_data2, 20)
# breeding mass sex species individual_id phylo
# 1 0 4.425580 M t1 146 t1
# 2 0 4.518592 M t1 147 t1
# 3 0 4.553237 M t1 148 t1
# 4 1 4.728218 F t1 149 t1
# 5 1 4.125475 F t1 150 t1
# 6 1 4.181977 M t10 251 t10
# 7 1 4.366759 F t10 252 t10
# 8 0 4.286815 F t10 253 t10
# 9 0 3.998315 M t10 254 t10
# 10 1 3.877700 F t10 255 t10
# 11 1 5.367269 M t100 311 t100
# 12 0 4.287155 M t100 312 t100
# 13 0 4.715682 F t100 313 t100
# 14 1 4.969713 M t100 314 t100
# 15 1 4.818693 F t100 315 t100
# 16 0 3.750907 M t11 256 t11
# 17 1 4.016580 F t11 257 t11
# 18 1 3.722079 F t11 258 t11
# 19 1 3.325677 M t11 259 t11
# 20 1 3.100769 F t11 260 t11Run models
# MCMCglmm
inv_phylo <- inverseA(tree2, nodes = "ALL", scale = TRUE)
prior2_mcmcglmm <- list(R = list(V = 1, fix = 1),
G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10),
G2 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)
)
)
system.time(
mod2_mcmcglmm_logit <- MCMCglmm(breeding ~ 1,
random = ~ phylo + species,
family = "categorical",
data = sim_data2,
prior = prior2_mcmcglmm,
ginverse = list(phylo = inv_phylo$Ainv),
nitt = 13000*10,
thin = 10*10,
burnin = 3000*10)
)
system.time(
mod2_mcmcglmm_logit2 <- MCMCglmm(breeding ~ mass,
random = ~ phylo + species,
family = "categorical",
data = sim_data2,
prior = prior2_mcmcglmm,
ginverse = list(phylo = inv_phylo$Ainv),
nitt = 13000*20,
thin = 10*20,
burnin = 3000*20)
)
system.time(
mod2_mcmcglmm_logit3 <- MCMCglmm(breeding ~ mass + sex,
random = ~ phylo + species,
family = "categorical",
data = sim_data2,
prior = prior2_mcmcglmm,
ginverse = list(phylo = inv_phylo$Ainv),
nitt = 13000*20,
thin = 10*20,
burnin = 3000*20)
)
# brms
A <- ape::vcv.phylo(tree2, corr = TRUE)
priors2_brms <- default_prior(
breeding ~ 1 + (1 | gr(phylo, cov = A)) + (1 | species),
data = sim_data2,
data2 = list(A = A),
family = bernoulli(link = "logit")
)
system.time(
mod2_brms_logit <- brm(breeding ~ 1 + (1 | gr(phylo, cov = A)) + (1 | species),
data = sim_data2,
data2 = list(A = A),
family = bernoulli(link = "logit"),
prior = priors2_brms,
iter = 8000,
warmup = 6000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
)
)
priors2_brms2 <- default_prior(
breeding ~ mass + (1 | gr(phylo, cov = A)) + (1 | species),
data = sim_data2,
data2 = list(A = A),
family = bernoulli(link = "logit")
)
system.time(
mod2_brms_logit2 <- brm(breeding ~ mass + (1 | gr(phylo, cov = A)) + (1 | species),
data = sim_data2,
data2 = list(A = A),
family = bernoulli(link = "logit"),
prior = priors2_brms2,
iter = 8000,
warmup = 6000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
)
)
priors2_brms3 <- default_prior(
breeding ~ mass + sex + (1 | gr(phylo, cov = A)) + (1 | species),
data = sim_data2,
data2 = list(A = A),
family = bernoulli(link = "logit")
)
system.time(
mod2_brms_logit3 <- brm(breeding ~ mass + sex + (1 | gr(phylo, cov = A)) + (1 | species),
data = sim_data2,
data2 = list(A = A),
family = bernoulli(link = "logit"),
prior = priors2_brms3,
iter = 8000,
warmup = 6000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
)
)Results
True values are here:
# actual mean and variance of y
mean(eta1)[1] 1.191048
var(eta1)[1] 1.277676
var(random_effect_phylo)[1] 0.2981068
var(random_effect_non_phylo)[1] 0.3090161
Before checking the output,we need to correct the results from MCMCglmm using the following equation:
c2 <- (16 * sqrt(3) / (15 * pi))^2summary(mod2_mcmcglmm_logit)
# Iterations = 30001:129901
# Thinning interval = 100
# Sample size = 1000
#
# DIC: 550.715
#
# G-structure: ~phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# phylo 0.9548 8.97e-06 2.355 639.4
#
# ~species
#
# post.mean l-95% CI u-95% CI eff.samp
# species 0.4156 3.523e-08 1.201 486.4
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: breeding ~ 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) 1.1605 0.1331 2.1428 1272 0.034 *
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# > res_1 <- mod2_mcmcglmm_logit$Sol / sqrt(1+c2) # for fixed effects
# > res_2 <- mod2_mcmcglmm_logit$VCV / (1+c2) # for variance components
# > summary(res_1)
#
# Iterations = 30001:129901
# Thinning interval = 100
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# 1.00036 0.43343 0.01371 0.01215
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# 0.1144 0.7425 1.0093 1.2709 1.8407
res_1 <- mod2_mcmcglmm_logit$Sol / sqrt(1+c2) # for fixed effects
res_2 <- mod2_mcmcglmm_logit$VCV / (1+c2) # for variance components
summary(res_1)
# Iterations = 30001:129901
# Thinning interval = 100
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# 1.00036 0.43343 0.01371 0.01215
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# 0.1144 0.7425 1.0093 1.2709 1.8407
summary(res_2)
# Iterations = 30001:129901
# Thinning interval = 100
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# phylo 0.7094 0.5174 0.016361 0.02046
# species 0.3088 0.2845 0.008996 0.01290
# units 0.7430 0.0000 0.000000 0.00000
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# phylo 0.02672 0.3361 0.5973 0.9889 2.040
# species 0.00126 0.0810 0.2530 0.4560 1.028
# units 0.74303 0.7430 0.7430 0.7430 0.743
summary(mod2_brms_logit)
# Family: bernoulli
# Links: mu = logit
# Formula: breeding ~ 1 + (1 | gr(phylo, cov = A)) + (1 | species)
# Data: sim_data2 (Number of observations: 500)
# Draws: 2 chains, each with iter = 8000; warmup = 6000; thin = 1;
# total post-warmup draws = 4000
#
# Multilevel Hyperparameters:
# ~phylo (Number of levels: 100)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.77 0.31 0.18 1.41 1.01 542 820
#
# ~species (Number of levels: 100)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.51 0.25 0.03 1.00 1.00 475 924
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept 0.99 0.42 0.13 1.81 1.00 1407 1848The fixed effect estimates in both MCMCglmm and brms are close to the true values, but the phylogenetic random effects are not performing well. However, the wide 95% confidence intervals indicate that they have very large uncertainties.
And phylogenetic signal is…
# MCMCglmm
phylo_signal_mod2_mcmcglmm <- ((mod2_mcmcglmm_logit$VCV[, "phylo"]/(1+c2)) / (mod2_mcmcglmm_logit$VCV[, "phylo"]/(1+c2) + mod2_mcmcglmm_logit$VCV[, "species"]/(1+c2) + 1))
phylo_signal_mod2_mcmcglmm %>% mean()
# [1] 0.3233245
phylo_signal_mod2_mcmcglmm %>% quantile(probs = c(0.025,0.5,0.975))
# 2.5% 50% 97.5%
# 0.01696071 0.32167758 0.64318606
# brms
phylo_signal_mod2_brms <- mod2_brms_logit %>% as_tibble() %>%
dplyr::select(Sigma_phy = sd_phylo__Intercept, Sigma_non_phy = sd_species__Intercept) %>%
mutate(lambda_binary = Sigma_phy^2 / (Sigma_phy^2 + Sigma_non_phy^2 +1)) %>%
pull(lambda_binary)
phylo_signal_mod2_brms %>% mean()
# [1] 0.3117823
phylo_signal_mod2_brms %>% quantile(probs = c(0.025,0.5,0.975))
# 2.5% 50% 97.5%
# 0.01973947 0.30714261 0.63302530 The estimated phylogenetic signal values were similar in both models.
summary(mod2_mcmcglmm_logit2)
# Iterations = 60001:259801
# Thinning interval = 200
# Sample size = 1000
#
# DIC: 543.8511
#
# G-structure: ~phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# phylo 1.002 2.289e-06 2.566 972.4
#
# ~species
#
# post.mean l-95% CI u-95% CI eff.samp
# species 0.5078 4.507e-06 1.34 868.1
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: breeding ~ mass
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) -1.5979 -3.6412 0.6322 1000 0.126
# mass 0.6094 0.2202 1.0957 1000 <0.001 ***
c2 <- (16 * sqrt(3) / (15 * pi))^2
res_1 <- mod2_mcmcglmm_logit2$Sol / sqrt(1+c2) # for fixed effects
res_2 <- mod2_mcmcglmm_logit2$VCV / (1+c2) # for variance components
summary(res_1)
# Iterations = 60001:259801
# Thinning interval = 200
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# (Intercept) -1.3774 0.9619 0.030419 0.030419
# mass 0.5253 0.1910 0.006038 0.006038
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# (Intercept) -3.2469 -2.0173 -1.3249 -0.7434 0.4585
# mass 0.1571 0.3912 0.5174 0.6488 0.9140
summary(res_2)
# Iterations = 60001:259801
# Thinning interval = 200
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# phylo 0.7442 0.5982 0.01892 0.01918
# species 0.3773 0.3251 0.01028 0.01103
# units 0.7430 0.0000 0.00000 0.00000
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# phylo 0.016307 0.3010 0.6173 1.0441 2.215
# species 0.001679 0.1132 0.3198 0.5479 1.194
# units 0.743029 0.7430 0.7430 0.7430 0.743
summary(mod2_brms_logit2)
# Family: bernoulli
# Links: mu = logit
# Formula: breeding ~ mass + (1 | gr(phylo, cov = A)) + (1 | species)
# Data: sim_data2 (Number of observations: 500)
# Draws: 2 chains, each with iter = 8000; warmup = 6000; thin = 1;
# total post-warmup draws = 4000
#
# Multilevel Hyperparameters:
# ~phylo (Number of levels: 100)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.80 0.35 0.15 1.53 1.00 418 497
#
# ~species (Number of levels: 100)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.55 0.28 0.03 1.09 1.00 414 766
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept -1.42 1.01 -3.51 0.45 1.00 2097 1850
# mass 0.52 0.19 0.15 0.92 1.00 2664 2631summary(mod2_mcmcglmm_logit3)
# Iterations = 60001:259801
# Thinning interval = 200
# Sample size = 1000
#
# DIC: 493.3098
#
# G-structure: ~phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# phylo 1.629 0.0009646 4.062 910
#
# ~species
#
# post.mean l-95% CI u-95% CI eff.samp
# species 0.6195 1.258e-06 1.682 756.2
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: breeding ~ mass + sex
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) -0.8882 -3.4396 1.7246 1000 0.462
# mass 0.7312 0.2433 1.2183 1000 0.002 **
# sexM -2.0358 -2.6616 -1.4547 1000 <0.001 ***
c2 <- (16 * sqrt(3) / (15 * pi))^2
res_1 <- mod2_mcmcglmm_logit3$Sol / sqrt(1+c2) # for fixed effects
res_2 <- mod2_mcmcglmm_logit3$VCV / (1+c2) # for variance components
summary(res_1)
# Iterations = 60001:259801
# Thinning interval = 200
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# (Intercept) -0.7656 1.1431 0.036149 0.036149
# mass 0.6303 0.2197 0.006946 0.006946
# sexM -1.7548 0.2780 0.008791 0.008791
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# (Intercept) -2.9652 -1.4957 -0.7345 -0.08661 1.485
# mass 0.2142 0.4818 0.6342 0.77977 1.060
# sexM -2.3071 -1.9429 -1.7496 -1.56044 -1.263
summary(res_2)
# Iterations = 60001:259801
# Thinning interval = 200
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# phylo 1.2105 0.9233 0.02920 0.03061
# species 0.4603 0.3948 0.01249 0.01436
# units 0.7430 0.0000 0.00000 0.00000
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# phylo 0.043912 0.6072 1.031 1.573 3.571
# species 0.003595 0.1428 0.374 0.671 1.391
# units 0.743029 0.7430 0.743 0.743 0.743
summary(mod2_brms_logit3)
# Family: bernoulli
# Links: mu = logit
# Formula: breeding ~ mass + sex + (1 | gr(phylo, cov = A)) + (1 | species)
# Data: sim_data2 (Number of observations: 500)
# Draws: 2 chains, each with iter = 8000; warmup = 6000; thin = 1;
# total post-warmup draws = 4000
#
# Multilevel Hyperparameters:
# ~phylo (Number of levels: 100)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 1.01 0.40 0.24 1.86 1.00 776 1072
#
# ~species (Number of levels: 100)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.60 0.30 0.05 1.18 1.00 670 1537
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept -0.79 1.14 -3.12 1.38 1.00 3595 3216
# mass 0.62 0.22 0.19 1.06 1.00 4898 3060
# sexM -1.75 0.28 -2.29 -1.23 1.00 6333 3267The coefficient estimates for both mass and sex were close to the values we set :)
Ordinal
The response variable is the aggressiveness level, which has three categories: Low, Medium, and High. It is assumed that phylogeny plays a significant role in influencing aggressiveness. However, non-phylogenetic factors are expected to have only a weak relationship with aggressiveness. The relationship between body mass and aggressiveness is unclear, and it may vary across species. Additionally, males are more likely to exhibit higher levels of aggressiveness compared to females.
set.seed(789)
tree3 <- pbtree(n = 100, scale = 1)
# extract the correlation matrix from the phylogenetic tree
phylo_cor <- vcv(tree3, corr = TRUE)
# define parameters
n_species <- 100 # number of species (clusters)
obs_per_species <- 5 # number of observations per species
n <- n_species * obs_per_species # Total number of observations
# simulate fixed effects
## continuous variable: body mass
mu_mass <- (log(4)+ log(2500))/2 # mean of whole 100 species body mass
mu_species_mass <- mvrnorm(n = 1, # species level body mass
mu = rep(mu_mass, ncol(phylo_cor)),
Sigma = phylo_cor*1)
obs_mass <- sapply(mu_species_mass,
function(x) rnorm(obs_per_species, mean = x, sd = sqrt(0.2))) %>%
as.vector() # each observation body mass
## binomial variable: sex
sex <- rbinom(n, 1, 0.5)
# simulate random effect parameter
## variance components
sigma_phylo <- 1
sigma_non <- 0.4 # standard deviation of non-phylogenetic random effect
## phylogenetic random effect
random_effect_phylo <- mvrnorm(
n = 1,
mu = rep(0, ncol(phylo_cor)),
Sigma = sigma_phylo*phylo_cor)
phylo_effect <- rep(random_effect_phylo, each = obs_per_species)
## non-phylogenetic random effect
random_effect_non_phylo <- rnorm(n_species, sd = sqrt(sigma_non))
non_phylo_effect <- rep(random_effect_non_phylo, each = obs_per_species)
# define fixed effect coefficients
beta_o0 <- -0.5 # intercept
beta_o1 <- 0 # coefficient for mass - controversial whether body mass is related to aggressiveness
beta_o2 <- 1 # coefficient for sex - males are often more aggressive than females
# simulate the linear predictor
eta <- beta_o0 + beta_o1 * obs_mass + beta_o2 * sex + non_phylo_effect + phylo_effect
# define thresholds (cutpoints) for the ordinal categories
cutpoints <- c(-0.5, 0.5)
# Calculate cumulative probabilities using the probit link function
p1 <- pnorm(cutpoints[1] - eta)
p2 <- pnorm(cutpoints[2] - eta) - p1
p3 <- 1 - pnorm(cutpoints[2] - eta)
# combine probabilities into a matrix
probs <- cbind(p1, p2, p3)
# simulate the ordinal response
aggressive_level <- apply(probs, 1, function(prob) sample(1:3, 1, prob = prob))
# create a data frame
sim_data3 <- data.frame(
individual_id <- rep(1:n, each = 1),
phylo = names(phylo_effect),
species = names(phylo_effect),
aggressive_level = factor(aggressive_level,
levels = 1:3,
labels = c("Low", "Medium", "High"),
ordered = TRUE),
sex = factor(sex, labels = c("F", "M")),
mass = obs_mass
) %>%
arrange(species)
# table(sim_data3$aggressive_level)
# Low Medium High
# 97 143 260 Run models
# mcmcglmm
inv_phylo <- inverseA(tree3, nodes = "ALL", scale = TRUE)
prior1 <- list(R = list(V = 1, fix = 1),
G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10),
G2 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)
)
)
system.time(
mod3_mcmcglmm <- MCMCglmm(aggressive_level ~ 1,
random = ~ phylo + species,
ginverse = list(phylo = inv_phylo$Ainv),
family = "threshold",
data = sim_data3,
prior = prior1,
nitt = 13000*20,
thin = 10*20,
burnin = 3000*20
)
)
system.time(
mod3_mcmcglmm2 <- MCMCglmm(aggressive_level ~ mass,
random = ~ phylo + species,
ginverse = list(phylo = inv_phylo$Ainv),
family = "threshold",
data = sim_data3,
prior = prior1,
nitt = 13000*20,
thin = 10*20,
burnin = 3000*20
)
)
system.time(
mod3_mcmcglmm3 <- MCMCglmm(aggressive_level ~ mass + sex,
random = ~ phylo + species,
ginverse = list(phylo = inv_phylo$Ainv),
family = "threshold",
data = sim_data3,
prior = prior1,
nitt = 13000*20,
thin = 10*20,
burnin = 3000*20
)
)
# brms
A <- ape::vcv.phylo(tree3, corr = TRUE)
default_priors <- default_prior(
aggressive_level ~ 1 + (1 | gr(phylo, cov = A)) + (1 | species),
data = sim_data3,
family = cumulative(link = "probit"),
data2 = list(A = A)
)
system.time(
mod3_brms <- brm(
formula = aggressive_level ~ 1 + (1 | gr(phylo, cov = A)) + (1 | species),
data = sim_data3,
family = cumulative(link = "probit"),
data2 = list(A = A),
prior = default_priors,
iter = 9000,
warmup = 7000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
)
)
default_priors2 <- default_prior(
aggressive_level ~ mass + (1 | gr(phylo, cov = A)) + (1 | species),
data = sim_data3,
family = cumulative(link = "probit"),
data2 = list(A = A)
)
system.time(
mod3_brms2 <- brm(
formula = aggressive_level ~ mass + (1 | gr(phylo, cov = A)) + (1 | species),
data = sim_data3,
family = cumulative(link = "probit"),
data2 = list(A = A),
prior = default_priors2,
iter = 9000,
warmup = 7000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
)
)
default_priors3 <- default_prior(
aggressive_level ~ mass + sex + (1 | gr(phylo, cov = A)) + (1 | species),
data = sim_data3,
family = cumulative(link = "probit"),
data2 = list(A = A)
)
system.time(
mod3_brms3 <- brm(
formula = aggressive_level ~ mass + sex + (1 | gr(phylo, cov = A)) + (1 | species),
data = sim_data3,
family = cumulative(link = "probit"),
data2 = list(A = A),
prior = default_priors3,
iter = 9000,
warmup = 7000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
)
)Results
The estimates should align with the following values…
mean(eta)[1] 0.7347869
var(eta)[1] 1.484684
var(random_effect_phylo)[1] 0.8430567
var(random_effect_non_phylo)[1] 0.3352239
summary(mod3_mcmcglmm)
# Iterations = 60001:259801
# Thinning interval = 200
# Sample size = 1000
#
# DIC: 864.7065
#
# G-structure: ~phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# phylo 1.117 0.2213 2.167 1000
#
# ~species
#
# post.mean l-95% CI u-95% CI eff.samp
# species 0.3178 9.188e-07 0.6704 1099
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: aggressive_level ~ 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) 1.3987 0.5334 2.1765 1000 <0.001 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Cutpoints:
#
# post.mean l-95% CI u-95% CI eff.samp
# cutpoint.traitaggressive_level.1 1.12 0.9581 1.277 1007
posterior_summary(mod3_mcmcglmm$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) 1.398716 0.4201698 0.5994656 2.295179
posterior_summary(mod3_mcmcglmm$VCV)
# Estimate Est.Error Q2.5 Q97.5
# phylo 1.1170856 0.5334151 0.36097048 2.3726763
# species 0.3178276 0.1906041 0.02409845 0.7331308
# units 1.0000000 0.0000000 1.00000000 1.0000000
posterior_summary(mod3_mcmcglmm$CP)
# Estimate Est.Error Q2.5 Q97.5
# cutpoint.traitaggressive_level.1 1.119532 0.08512686 0.9590519 1.278939
summary(mod3_brms)
# Family: cumulative
# Links: mu = probit; disc = identity
# Formula: aggressive_level ~ 1 + (1 | gr(phylo, cov = A)) + (1 | species)
# Data: sim_data3 (Number of observations: 500)
# Draws: 2 chains, each with iter = 9000; warmup = 7000; thin = 1;
# total post-warmup draws = 4000
#
# Multilevel Hyperparameters:
# ~phylo (Number of levels: 100)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 1.01 0.24 0.58 1.52 1.00 731 1479
#
# ~species (Number of levels: 100)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.55 0.17 0.19 0.87 1.00 578 657
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept[1] -1.32 0.42 -2.16 -0.51 1.00 2085 2250
# Intercept[2] -0.20 0.41 -1.03 0.60 1.00 2074 2333
#
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# disc 1.00 0.00 1.00 1.00 NA NA NAthe estimates are consistent with the true values, and both model outputs are as well. Then, phylogenetic signal is…
# MCMCglmm
phylo_signal_mod3_mcmcglmm <- ((mod3_mcmcglmm$VCV[, "phylo"]) / (mod3_mcmcglmm$VCV[, "phylo"] + mod3_mcmcglmm$VCV[, "species"] + 1))
phylo_signal_mod3_mcmcglmm %>% mean()
# [1] 0.438553
phylo_signal_mod3_mcmcglmm %>% quantile(probs = c(0.025,0.5,0.975))
# 2.5% 50% 97.5%
# 0.1911082 0.4385983 0.6808732
# brms
phylo_signal_mod3_brms <- mod3_brms %>% as_tibble() %>%
dplyr::select(Sigma_phy = sd_phylo__Intercept, Sigma_non_phy = sd_species__Intercept) %>%
mutate(lambda_ordinal = Sigma_phy^2 / (Sigma_phy^2 + Sigma_non_phy^2 +1)) %>%
pull(lambda_ordinal)
phylo_signal_mod3_brms %>% mean()
# [1] 0.4288363
phylo_signal_mod3_brms %>% quantile(probs = c(0.025,0.5,0.975))
# 2.5% 50% 97.5%
# 0.1788238 0.4319170 0.6689190 Nice - both models yielded nearly identical phylogenetic signals.
summary(mod3_mcmcglmm2)
# Iterations = 60001:259801
# Thinning interval = 200
# Sample size = 1000
#
# DIC: 862.4181
#
# G-structure: ~phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# phylo 1.208 0.3157 2.251 1000
#
# ~species
#
# post.mean l-95% CI u-95% CI eff.samp
# species 0.2989 2.879e-06 0.6329 1000
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: aggressive_level ~ mass
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) 2.16331 0.84674 3.45252 1000 0.002 **
# mass -0.16112 -0.36658 0.03244 1000 0.126
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Cutpoints:
#
# post.mean l-95% CI u-95% CI eff.samp
# cutpoint.traitaggressive_level.1 1.126 0.9627 1.299 1000
posterior_summary(mod3_mcmcglmm2$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) 2.163314 0.6643733 0.8905397 3.49905134
# mass -0.161119 0.1028227 -0.3548954 0.04503046
posterior_summary(mod3_mcmcglmm2$VCV)
# Estimate Est.Error Q2.5 Q97.5
# phylo 1.2078921 0.5324432 0.400101109 2.4861135
# species 0.2989292 0.1824707 0.009196968 0.7111517
# units 1.0000000 0.0000000 1.000000000 1.0000000
posterior_summary(mod3_mcmcglmm2$CP)
# Estimate Est.Error Q2.5 Q97.5
# cutpoint.traitaggressive_level.1 1.125844 0.08611264 0.9675762 1.303084
summary(mod3_brms2)
# Family: cumulative
# Links: mu = probit; disc = identity
# Formula: aggressive_level ~ mass + (1 | gr(phylo, cov = A)) + (1 | species)
# Data: sim_data3 (Number of observations: 500)
# Draws: 2 chains, each with iter = 9000; warmup = 7000; thin = 1;
# total post-warmup draws = 4000
#
# Multilevel Hyperparameters:
# ~phylo (Number of levels: 100)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 1.08 0.25 0.63 1.63 1.00 797 1655
#
# ~species (Number of levels: 100)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.50 0.19 0.08 0.84 1.00 526 647
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept[1] -2.11 0.66 -3.41 -0.84 1.00 4761 3053
# Intercept[2] -0.99 0.65 -2.28 0.29 1.00 4540 2992
# mass -0.16 0.10 -0.37 0.04 1.00 6574 2840
#
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# disc 1.00 0.00 1.00 1.00 NA NA NA
summary(mod3_mcmcglmm3)
# Iterations = 60001:259801
# Thinning interval = 200
# Sample size = 1000
#
# DIC: 785.7335
#
# G-structure: ~phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# phylo 1.664 0.3929 3.216 1000
#
# ~species
#
# post.mean l-95% CI u-95% CI eff.samp
# species 0.4061 9.11e-06 0.8454 1000
#
# R-structure: ~units
#
# post.mean l-95% CI u-95% CI eff.samp
# units 1 1 1 0
#
# Location effects: aggressive_level ~ mass + sex
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) 1.6060 0.1500 3.1995 1000 0.028 *
# mass -0.1199 -0.3519 0.1025 1000 0.308
# sexM 1.0586 0.7945 1.3036 1117 <0.001 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Cutpoints:
#
# post.mean l-95% CI u-95% CI eff.samp
# cutpoint.traitaggressive_level.1 1.277 1.091 1.487 1000
posterior_summary(mod3_mcmcglmm3$Sol)
# Estimate Est.Error Q2.5 Q97.5
# (Intercept) 1.6059858 0.7827238 0.09396171 3.1803784
# mass -0.1198975 0.1181710 -0.35162836 0.1026163
# sexM 1.0585665 0.1318889 0.81133692 1.3344116
posterior_summary(mod3_mcmcglmm3$VCV)
# Estimate Est.Error Q2.5 Q97.5
# phylo 1.663718 0.8053196 0.58185142 3.5586348
# species 0.406108 0.2440002 0.02279807 0.9607917
# units 1.000000 0.0000000 1.00000000 1.0000000
posterior_summary(mod3_mcmcglmm3$CP)
# Estimate Est.Error Q2.5 Q97.5
# cutpoint.traitaggressive_level.1 1.277118 0.1012452 1.088106 1.485881
summary(mod3_brms3)
# Family: cumulative
# Links: mu = probit; disc = identity
# Formula: aggressive_level ~ mass + sex + (1 | gr(phylo, cov = A)) + (1 | species)
# Data: sim_data3 (Number of observations: 500)
# Draws: 2 chains, each with iter = 9000; warmup = 7000; thin = 1;
# total post-warmup draws = 4000
#
# Multilevel Hyperparameters:
# ~phylo (Number of levels: 100)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 1.25 0.30 0.72 1.90 1.00 423 1179
#
# ~species (Number of levels: 100)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.59 0.21 0.11 0.97 1.00 357 408
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept[1] -1.58 0.76 -3.08 -0.12 1.00 2467 2888
# Intercept[2] -0.30 0.76 -1.80 1.14 1.00 2420 3074
# mass -0.13 0.11 -0.35 0.09 1.00 3431 3348
# sexM 1.05 0.13 0.80 1.31 1.00 5609 3295
#
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# disc 1.00 0.00 1.00 1.00 NA NA NABoth MCMCglmm and brms estimated very similar values, and the fixed effect coefficients were close to the values set in the simulation.
Nominal
The response variable is lateralisation (handedness/footedness), categorized as Left, Right, or Both, with “Both” serving as the reference category, representing non-preference. It is assumed that phylogeny is related to lateralisation, with a weak negative correlation between left and right preferences. Non-phylogenetic factors are expected to have only a weak influence on lateralisation. While body mass may have a minor effect, sex would not have significant effect on lateralisation. The majority of individuals tend to show a preference for either the left or right side, with only a few exhibiting no clear preference.
Here, we create a simple dataset and check the results from the intercept-only model, as there are some challenges. The nominal model is more complex, so when we use similar settings to the three models above, the sample size we have set is too small. As a result, the variance and correlation do not align. Therefore, in this case, we do not include the effects of mass or sex when simulating the linear predictor. Additionally, both phylogenetic and non-phylogenetic correlations are set to zero.
set.seed(789)
tree4 <- pbtree(n = 100, scale = 1)
# extract the correlation matrix from the phylogenetic tree
phylo_cor <- vcv(tree4, corr = TRUE)
# define parameters
n_species <- 100 # number of species (clusters)
obs_per_species <- 5 # number of observations per species
n <- n_species * obs_per_species # total number of observations
n_categories <- 3
# define standard deviations and correlations for the random effects
sigma_phylo <- c(1, 1.5) # standard deviations for phylogenetic random effects for left and right
cor_phylo <- 0 # correlation between phylogenetic random effects for left and right
sigma_u <- c(1.1, 1) # standard deviations for non-phylogenetic random effects for left and right
cor_u <- 0 # correlation between non-phylogenetic random effects for left and right
# covariance matrices
cov_u <- diag(sigma_u^2)
cov_u[1, 2] <- cov_u[2, 1] <- cor_u * sigma_u[1] * sigma_u[2]
cov_phylo <- diag(sigma_phylo^2)
cov_phylo[1, 2] <- cov_phylo[2, 1] <- cor_phylo * sigma_phylo[1] * sigma_phylo[2]
# simulate phylogenetic random effects
phylo_random_effects <- mvrnorm(n = 1, mu = rep(0, n_species * 2), Sigma = kronecker(cov_phylo, phylo_cor))
phylo_effect_1 <- rep(phylo_random_effects[1:n_species], each = obs_per_species)
phylo_effect_2 <- rep(phylo_random_effects[(n_species + 1):(2 * n_species)], each = obs_per_species)
# simulate non-phylogenetic random effects
non_phylo_random_effects <- mvrnorm(n_species, mu = c(0, 0), Sigma = cov_u)
non_phylo_effect_1 <- rep(non_phylo_random_effects[, 1], each = obs_per_species)
non_phylo_effect_2 <- rep(non_phylo_random_effects[, 2], each = obs_per_species)
# define fixed effect coefficients for each category relative to the reference category (Both)
beta_m0 <- c(0.4, 0.8) # intercept for left and right
beta_m1 <- c(0.2, -0.1) # coefficient for mass for left and right
beta_m2 <- c(0, 0) # coefficient for sex for left and right
# simulate the linear predictor for each category relative to the reference
eta <- matrix(0, nrow = n, ncol = 2)
eta[, 1] <- beta_m0[1] + non_phylo_effect_1 + phylo_effect_1
eta[, 2] <- beta_m0[2] + non_phylo_effect_2 + phylo_effect_2
# calculate probabilities using the multinomial logit link function
exp_eta <- exp(cbind(eta, 0)) # adding a column of zeros for the reference category
probs <- exp_eta / rowSums(exp_eta)
# simulate the multinomial response
lateralisation <- apply(probs, 1, function(prob) sample(1:3, 1, prob = prob))
# generate individual ID for each observation
individual_id <- rep(1:n, each = 1) # create unique individual IDs for each observation
# create a data frame
sim_data4 <- data.frame(
individual_id = individual_id,
phylo = rep(tree4$tip.label, each = 5),
species = rep(tree4$tip.label, each = 5),
lateralisation = factor(lateralisation,
levels = 1:n_categories,
labels = c("Left", "Right", "Both"),
ordered = FALSE),
sex = factor(sex, labels = c("F", "M")),
mass = obs_mass
) %>%
arrange(species)
sim_data4$lateralisation <- relevel(sim_data4$lateralisation, ref = "Both")
# table(sim_data4$lateralisation)
# Both Left Right
# 45 208 247Run models
inv_phylo <- inverseA(tree4, nodes = "ALL", scale = TRUE)
prior1 <- list(
R = list(V = (matrix(1, 2, 2) + diag(2)) / 3, fix = 1),
G = list(G1 = list(V = diag(2), nu = 2,
alpha.mu = rep(0, 2), alpha.V = diag(2)
),
G2 = list(V = diag(2), nu = 2,
alpha.mu = rep(0, 2), alpha.V = diag(2)
)
)
)
system.time(
mcmcglmm_mod4 <- MCMCglmm(lateralisation ~ trait - 1,
random = ~us(trait):phylo + us(trait):species,
rcov = ~us(trait):units,
ginverse = list(phylo = inv_phylo$Ainv),
family = "categorical",
data = sim_data4,
prior = prior1,
nitt = 13000*150,
thin = 10*150,
burnin = 3000*150
)
)
# brms
A <- ape::vcv.phylo(tree4, corr = TRUE)
priors_brms1 <- default_prior(lateralisation ~ 1 + (1 |a| gr(phylo, cov = A)) + (1 |b| species),
data = sim_data4,
data2 = list(A = A),
family = categorical(link = "logit")
)
system.time(
brms_mod4 <- brm(lateralisation ~ 1 + (1 |a| gr(phylo, cov = A)) + (1 |b| species),
data = sim_data4,
data2 = list(A = A),
family = categorical(link = "logit"),
prior = priors_brms1,
iter = 12000,
warmup = 8000,
chains = 2,
thin = 1,
cores = 2,
control = list(adapt_delta = 0.99),
)
)Results
True values are…
mean(eta[, 1])[1] 1.498934
mean(eta[, 2])[1] 1.71958
var(phylo_effect_1)[1] 0.888621
var(phylo_effect_2)[1] 1.853071
var(non_phylo_effect_1)[1] 1.135754
var(non_phylo_effect_2)[1] 1.184511
cor(phylo_effect_1, phylo_effect_2)[1] 0.1435537
cor(non_phylo_effect_1, non_phylo_effect_2)[1] 0.01583177
Before checking the output,we need to correct the results from MCMCglmm using the following equation:
c2 <- (16 * sqrt(3) / (15 * pi))^2
c2a <- c2*(2/3)summary(mod4_mcmcglmm)
# Iterations = 450001:1948501
# Thinning interval = 1500
# Sample size = 1000
#
# DIC: 758.4723
#
# G-structure: ~us(trait):phylo
#
# post.mean l-95% CI u-95% CI eff.samp
# traitlateralisation.Left:traitlateralisation.Left.phylo 0.5216 2.422e-07 1.6198 1000
# traitlateralisation.Right:traitlateralisation.Left.phylo -0.4840 -2.272e+00 0.8416 1000
# traitlateralisation.Left:traitlateralisation.Right.phylo -0.4840 -2.272e+00 0.8416 1000
# traitlateralisation.Right:traitlateralisation.Right.phylo 4.6297 1.388e+00 9.1477 1000
#
# ~us(trait):species
#
# post.mean l-95% CI u-95% CI eff.samp
# traitlateralisation.Left:traitlateralisation.Left.species 1.6045 3.293e-01 3.3113 1000
# traitlateralisation.Right:traitlateralisation.Left.species -0.4089 -1.402e+00 0.5503 1000
# traitlateralisation.Left:traitlateralisation.Right.species -0.4089 -1.402e+00 0.5503 1000
# traitlateralisation.Right:traitlateralisation.Right.species 1.0943 1.113e-06 2.6165 1294
#
# R-structure: ~us(trait):units
#
# post.mean l-95% CI u-95% CI eff.samp
# traitlateralisation.Left:traitlateralisation.Left.units 0.6667 0.6667 0.6667 0
# traitlateralisation.Right:traitlateralisation.Left.units 0.3333 0.3333 0.3333 0
# traitlateralisation.Left:traitlateralisation.Right.units 0.3333 0.3333 0.3333 0
# traitlateralisation.Right:traitlateralisation.Right.units 0.6667 0.6667 0.6667 0
#
# Location effects: lateralisation ~ trait - 1
#
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# traitlateralisation.Left 0.70100 -0.04992 1.49781 1000 0.086 .
# traitlateralisation.Right 0.48833 -1.14012 2.22379 1000 0.578
res_1 <- mod4_mcmcglmm$Sol / sqrt(1+c2a) # for fixed effects
res_2 <- mod4_mcmcglmm$VCV / (1+c2a) # for variance components
res_3_corr_phylo <- (mod4_mcmcglmm$VCV[, 2]/(1+c2a)) /sqrt((mod4_mcmcglmm$VCV[, 1] * mod4_mcmcglmm$VCV[, 4])/(1+c2a))
res_3_corr_nonphylo <- (mod4_mcmcglmm$VCV[, 6]/(1+c2a)) /sqrt((mod4_mcmcglmm$VCV[, 5] * mod4_mcmcglmm$VCV[, 8])/(1+c2a))
summary(res_1)
# Iterations = 450001:1948501
# Thinning interval = 1500
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# traitlateralisation.Left 0.6319 0.3392 0.01073 0.01073
# traitlateralisation.Right 0.4402 0.7946 0.02513 0.02513
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# traitlateralisation.Left -0.1209 0.44513 0.6354 0.8306 1.306
# traitlateralisation.Right -1.0131 -0.09662 0.4496 0.9362 2.079
summary(res_2)
# Iterations = 450001:1948501
# Thinning interval = 1500
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# traitlateralisation.Left:traitlateralisation.Left.phylo 0.4239 0.4635 0.01466 0.01466
# traitlateralisation.Right:traitlateralisation.Left.phylo -0.3933 0.6352 0.02009 0.02009
# traitlateralisation.Left:traitlateralisation.Right.phylo -0.3933 0.6352 0.02009 0.02009
# traitlateralisation.Right:traitlateralisation.Right.phylo 3.7622 1.7766 0.05618 0.05618
# traitlateralisation.Left:traitlateralisation.Left.species 1.3038 0.6761 0.02138 0.02138
# traitlateralisation.Right:traitlateralisation.Left.species -0.3323 0.3875 0.01226 0.01226
# traitlateralisation.Left:traitlateralisation.Right.species -0.3323 0.3875 0.01226 0.01226
# traitlateralisation.Right:traitlateralisation.Right.species 0.8892 0.7171 0.02268 0.01993
# traitlateralisation.Left:traitlateralisation.Left.units 0.5418 0.0000 0.00000 0.00000
# traitlateralisation.Right:traitlateralisation.Left.units 0.2709 0.0000 0.00000 0.00000
# traitlateralisation.Left:traitlateralisation.Right.units 0.2709 0.0000 0.00000 0.00000
# traitlateralisation.Right:traitlateralisation.Right.units 0.5418 0.0000 0.00000 0.00000
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# traitlateralisation.Left:traitlateralisation.Left.phylo 0.0009474 0.09136 0.2797 0.606425 1.6498
# traitlateralisation.Right:traitlateralisation.Left.phylo -1.8018394 -0.71723 -0.3356 -0.003957 0.8138
# traitlateralisation.Left:traitlateralisation.Right.phylo -1.8018394 -0.71723 -0.3356 -0.003957 0.8138
# traitlateralisation.Right:traitlateralisation.Right.phylo 1.3864388 2.45513 3.4326 4.670559 8.0450
# traitlateralisation.Left:traitlateralisation.Left.species 0.3061715 0.82577 1.1969 1.673444 2.8234
# traitlateralisation.Right:traitlateralisation.Left.species -1.1075741 -0.56624 -0.3359 -0.104277 0.4842
# traitlateralisation.Left:traitlateralisation.Right.species -1.1075741 -0.56624 -0.3359 -0.104277 0.4842
# traitlateralisation.Right:traitlateralisation.Right.species 0.0251312 0.33532 0.7494 1.269344 2.6216
# traitlateralisation.Left:traitlateralisation.Left.units 0.5417579 0.54176 0.5418 0.541758 0.5418
# traitlateralisation.Right:traitlateralisation.Left.units 0.2708789 0.27088 0.2709 0.270879 0.2709
# traitlateralisation.Left:traitlateralisation.Right.units 0.2708789 0.27088 0.2709 0.270879 0.2709
# traitlateralisation.Right:traitlateralisation.Right.units 0.5417579 0.54176 0.5418 0.541758 0.5418
summary(res_3_corr_phylo)
# Iterations = 450001:1948501
# Thinning interval = 1500
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# -0.32095 0.43943 0.01390 0.01465
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# -0.86925 -0.68977 -0.42416 -0.02273 0.67656
summary(res_3_corr_nonphylo)
# Iterations = 450001:1948501
# Thinning interval = 1500
# Number of chains = 1
# Sample size per chain = 1000
#
# 1. Empirical mean and standard deviation for each variable,
# plus standard error of the mean:
#
# Mean SD Naive SE Time-series SE
# -0.35545 0.34670 0.01096 0.01096
#
# 2. Quantiles for each variable:
#
# 2.5% 25% 50% 75% 97.5%
# -0.8562 -0.6255 -0.4143 -0.1322 0.4288
summary(mod4_brms)
# Family: categorical
# Links: muLeft = logit; muRight = logit
# Formula: lateralisation ~ 1 + (1 | a | gr(phylo, cov = A)) + (1 | b | species)
# Data: sim_data4 (Number of observations: 500)
# Draws: 2 chains, each with iter = 12000; warmup = 8000; thin = 1;
# total post-warmup draws = 8000
#
# Multilevel Hyperparameters:
# ~phylo (Number of levels: 100)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(muLeft_Intercept) 0.60 0.37 0.03 1.41 1.00 1258 2496
# sd(muRight_Intercept) 1.99 0.46 1.21 3.02 1.00 2445 4477
# cor(muLeft_Intercept,muRight_Intercept) -0.28 0.46 -0.96 0.73 1.00 480 746
#
# ~species (Number of levels: 100)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(muLeft_Intercept) 1.16 0.31 0.56 1.79 1.00 2264 2590
# sd(muRight_Intercept) 0.94 0.41 0.14 1.76 1.00 1100 1616
# cor(muLeft_Intercept,muRight_Intercept) -0.35 0.39 -0.97 0.45 1.00 1699 2808
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# muLeft_Intercept 0.63 0.35 -0.11 1.32 1.00 3793 4140
# muRight_Intercept 0.37 0.79 -1.18 1.89 1.00 3953 4832Please look at the correlation part - estimating random effect covariance is harder than estimating the random effects variance(s) - you now understand there is a different difficulty in estimating parameters. The 95% CIs are a quite large, they have very large uncertainties and with such large uncertainties, we do not expect they match…
fixed effects (beta) < random effects variance(s) <<< random effects covariance (correlation)
And phylogenetic signal is…
# average phylogenteic signal - both vs. left
## MCMCglmm
phylo_signalL_mod4_mcmcglmm <- ((mod4_mcmcglmm$VCV[, "traitlateralisation.Left:traitlateralisation.Left.phylo"]/(1+c2a)) / (mod4_mcmcglmm$VCV[, "traitlateralisation.Left:traitlateralisation.Left.phylo"]/(1+c2a) + mod4_mcmcglmm$VCV[, "traitlateralisation.Left:traitlateralisation.Left.species"]/(1+c2a) + 1))
phylo_signalL_mod4_mcmcglmm %>% mean()
# [1] 0.1452383
phylo_signalL_mod4_mcmcglmm %>% quantile(probs = c(0.025, 0.5, 0.975))
# 2.5% 50% 97.5%
# 0.000305831 0.112652234 0.482842085
## brms
phylo_signalL_mod4_brms <- mod4_brms %>% as_tibble() %>%
dplyr::select(Sigma_phy = sd_phylo__muLeft_Intercept, Sigma_non_phy = sd_species__muLeft_Intercept) %>%
mutate(lambda_nominalL = Sigma_phy^2 / (Sigma_phy^2 + Sigma_non_phy^2 + 1)) %>%
pull(lambda_nominalL)
phylo_signalL_mod4_brms %>% mean()
# [1] 0.1557183
phylo_signalL_mod4_brms %>% quantile(probs = c(0.025,0.5,0.975))
# 2.5% 50% 97.5%
# 0.0004165978 0.1182931266 0.5124994918
# average phylogenteic signal - both vs. right
## MCMCglmm
phylo_signalR_mod4_mcmcglmm <- ((mod4_mcmcglmm$VCV[, "traitlateralisation.Right:traitlateralisation.Right.phylo"]/(1+c2a)) / (mod4_mcmcglmm$VCV[, "traitlateralisation.Right:traitlateralisation.Right.phylo"]/(1+c2a) + mod4_mcmcglmm$VCV[, "traitlateralisation.Right:traitlateralisation.Right.species"]/(1+c2a) + 1))
phylo_signalR_mod4_mcmcglmm %>% mean()
# [1] 0.6446448
phylo_signalR_mod4_mcmcglmm %>% quantile(probs = c(0.025,0.5,0.975))
# 2.5% 50% 97.5%
# 0.3433895 0.6569055 0.8555424
## brms
phylo_signalR_mod4_brms <- mod4_brms %>% as_tibble() %>%
dplyr::select(Sigma_phy = sd_phylo__muRight_Intercept, Sigma_non_phy = sd_species__muRight_Intercept) %>%
mutate(lambda_nominalR = Sigma_phy^2 / (Sigma_phy^2 + Sigma_non_phy^2 + 1)) %>%
pull(lambda_nominalR)
phylo_signalR_mod4_brms %>% mean()
# [1] 0.6524086
phylo_signalR_mod4_brms %>% quantile(probs = c(0.025,0.5,0.975))
# 2.5% 50% 97.5%
# 0.3728231 0.6642912 0.8648875References and useful links
Sheard C, Skinner N, and Caro T. The evolution of rodent tail morphology. Am. Nat. 2024. 203:629-43. doi.org/10.1086/729751
Macdonald RX, Sheard C, Howell N, and Caro T. Primate coloration and colour vision: a comparative approach. Biol. J. Linn. Soc. Lond 2024. 141:435-455. doi.org/10.1093/biolinnean/blad089
Tobias et al. AVONET: morphological, ecological and geographical data for all birds. Ecol. Lett 2022. 25:581-597. doi.org/10.1111/ele.13898
Vehtari A, Gelman A, Simpson D, Carpenter B, Bürkner PC. Rank-normalization, folding, and localization: An improved \(\hat{\mathbf{R}}\) for assessing convergence of MCMC (with discussion). Bayesian. anal 2021. 16:667-718. doi.org/10.1214/20-BA1221