::p_load("ape", "coda", "tidyverse", "here",
pacman"MCMCglmm", "brms", "MASS",
"phytools", "arm", "posterior")
A step-by-step tutorial
This online tutorial belongs to the paper “Promoting the use of phylogenetic multinomial generalised mixed-effects model to understand the evolution of discrete traits.” We are still updating the tutorial more reader-friendly - keep checking for the latest version!
Introduction
This tutorial provides a step-by-step guide to model different types of variables using MCMCglmm
and brms
. We show and explain how to implement 1. Gaussian, 2. binary, 3. ordinal, and 4. nominal models and interpret obtain results. We will also provide extensions: models with multiple data points per species and models with simulated data (simulation is powerful tool to understand the behaviour of the model and the effect of the prior distribution on the results!)
Why use phylogenetic comparative methods?
Phylogenetic comparative methods are used to consider the non-independence of species relatedness due to shared evolutionary history. If we ignore the phylogenetic relationship among species, we may violate the assumption of independence and underestimate the standard errors of the estimated parameters. These methods are widely used in evolutionary biology and ecology to test hypotheses about the evolution of traits and behaviours, reconstruct common ancestral traits, and investigate how different traits have co-evolved across species.
Why use MCMCglmm
and brms
in this tutorial?
MCMCglmm
and brms
allow us to make hierarchical and non-linear models. We can create more appropriate models that accurately reflect the underlying relationships within the data. Additionally, both packages can handle various types of response variables, including continuous, Poisson, binary, ordinal, and nominal data. Using Bayesian methods for parameter estimation, MCMCglmm
and brms
can incorporate prior distributions, which is particularly beneficial when working with limited data.
Intended audience
Our tutorial is designed for students and researchers who are familiar with using R
language (at least know what is R
!) and have an interest in using phylogenetic comparative methods. Although the examples in our tutorial focus on questions and data from Behavioural Ecology and Evolutionary Biology, we provide detailed explanations. Therefore, we believe these techniques are versatile and can be applied in other scientific disciplines, such as environment science, psychology, and beyond.
Setup R in your computer
Please install the following packages before running the code: MCMCglmm
, brms
, ape
, phytools
, here
, coda
, MASS
, tidyverse
, arm
. If this is the first time using brms
, you need to install Rstan
before installing brms
. Please follow the instruction on the official website.
Note 1: some models (especially nominal models) will take a long time to run. Thus, we recommend when you want to compare the results, you can save the model output as .rds
file using saveRDS()
and read it by readRDS()
when you want to compare the results.
Note 2: Because of the stochastic nature of MCMC, every time you (re)run a model, you will get a slightly different output even the model was well-mixing and converge. So even if you run the same model using the same computer, it would always be different to whatever our output here.
Note 3: Ideally, the effective sample size of a model should be close to 1000. However, We consider a model’s results reliable if the effective sample size (ESS) exceeds 400, as suggested by Vehtari et al. (2021). According to their recommendation, an ESS greater than 400 is typically sufficient to ensure stable estimates of variances, autocorrelations, and Monte Carlo standard errors based on their practical experience and simulations (more detailed information; Vehtari et al. 2021). Here, qe evaluate comprehensively in line with other convergence diagnostic indicators, such as R hat and autocorrelation plots.
1. Gaussian models
Gaussian models are used when the response variable is continuous (numeric) and normally distributed. For example, age, body size (e.g. weight, length, or height of an organism), temperature, and distance.
Explanation of dataset
We use the rodent dataset and phylogenetic tree provided by Sheard et al. (2024) to explain how to model Gaussian models. The data is about the rodent suborder Sciuromorpha (223 species), which includes squirrels, chipmunks, dormice, and the mountain beaver. The dataset contains several information including the tail length, body mass, mean annual temperature, and the presence of a tail contrasting tip. The tail contrasting tip is a white or black tip at the end of the tail. In this section, we test the evolutionary relationships between tail length & body mass (response variables) and mean annual temperature and the presence of tail contrasting tip (predictor variables) in rodents.
First of all, we modify the original dataset.
# 1. Read dataset
<- read.csv(here("data", "potential", "Rodent_tail", "RodentData.csv")) # Replace with your own folder path to where the data is stored
dt
str(dt) # check dataset - replace table format!!
<- dt %>% rename(Phylo = UphamTreeName.full) # Rename the 'UphamTreeName.full' column to 'Phylo'
dt <- dt %>% filter(Suborder == "Sciuromorpha") # Filter the rodent suborder
dt <- subset(dt, !is.na(Mass)) # Remove the species do not have body mass data
dt
# Tail length and body mass: log-transform to reduce skewness, then standardise (centre and scale) to have a mean of 0 and a standard deviation of 1, making it easier to compare with other variables.
# Temperature, shade score and litter size: standardise
$zLength <- scale(log(dt$Tail_length))
dt$zMass <- scale(log(dt$Mass))
dt$zShade<-scale(dt$shade_score)
dt$zTemp <- scale(dt$Mean_Annual_Temp)
dt$zLitter_size <- scale(dt$Litter_size)
dt
# Visualise zLength distribution
# hist(dt$zLength)
# Also, convert some columns to factors for analysis just in case
<- dt %>%
dt mutate(across(c(White.tips, Black.tips, Tufts, Naked, Fluffy, Contrasting, Noc, Di, Crep, Autotomy), as.factor))
# 2. Read phylogenetic tree
<- read.tree(here("data", "potential", "Rodent_tail", "RodentTrees.tre")) # 100 trees
trees <- trees[[1]] # In this time, we use only one tree
tree
#Trim out everything from the tree that's not in the modified dataset
<- lapply(trees, drop.tip,tip=setdiff(tree$tip.label, dt$Phylo))
trees
# Select one tree for trimming purposes
<- trees[[1]]
tree <- force.ultrametric(tree, method = c("extend")) # Convert non-ultrametric tree to ultrametric tree tree
How to model? How to interpret the output?
Let’s start by running the model. The first model we will fit is a simple animal model with no fixed effects, and only a random effect relating phylogeny. From here on, we will always begin with this simplest model to assess the phylogenetic signal of the response variable before adding more complexity by including one continuous and one categorical explanatory variable.
Making model is basically 3-4 steps:
Get phylogenetic variance-covariance (
MCMCglmm
) / correlation (brms
) matrixSet prior based on your model and run model with default
iteration
,thinning
, andburn-in
Check mixing whether the samples are adequately exploring the parameter space and convergence, posterior distribution of random and fixed effects
If model do not show good mixing or converge…
- change
iteration
,thinning
, andburn-in
- change prior setting
- change
Univariate model
Intercept-only momdel
MCMCglmm
To get the variance-covariance matrix, we use the inverseA()
function in MCMCglmm
:
<- inverseA(tree, nodes = "ALL", scale = TRUE) # Calculate the inverse of the phylogenetic relatedness matrix for all nodes, scaling the results inv_phylo
Then, we will define the priors for the phylogenetic effect and the residual variance using the following code:
<- list(R = list(V = 1, nu = 0.002), # Prior for residuals: weak inverse-Wishart prior for residual variance
prior1 G = list(G1 = list(V = 1, nu = 0.002))) # Prior for random effect: weak inverse-Wishart for random effect
we set the prior for the residual variance to be a scaled inverse-Wishart distribution with a scale parameter of 1 and degrees of freedom of 0.002. The prior for the phylogenetic effect is set to be a scaled inverse-Wishart distribution with a scale parameter of 1 and degrees of freedom of 0.002. This prior minimizes the influence of prior information, allowing the data to have a stronger impact on the model’s results. However, if you know your dataset characteristics, you can set a stronger (informative) prior.
In MCMCglmm, we can set four elements in prior: R
(R-structure for residual), G
(G-structure for random effect), B
(fixed effects), and S
(theta_scale parameter). Here, we explain B, R, G
.
Parameter | Which elements | Meaning | Effect of increasing the value | Effect of decreasing the value |
---|---|---|---|---|
V | B, R, G | Variance parameter for the prior distribution | Increases variability in the prior, allowing more flexibility in parameter estimation. The model becomes more influenced by the data (= wider prior distribution) | Decreases variability in the prior, leading to more constrained estimates. The model becomes more influenced by the prior (= narrower prior distribution) |
mu | B | Mean of the prior distribution for fixed effects | Shifts the centre of the fixed effect estimates towards a larger value | Shifts the centre of the fixed effect estimates towards a smaller value |
nu | R, G | Degrees of freedom for the inverse Wishart distribution. Controls the “certainty” of the prior. Larger values result in smaller variance of the prior (higher certainty) | The prior becomes narrower, tending to decrease the variability of the variance-covariance matrix estimates. Stronger prior information is given | The prior becomes wider, tending to increase the variability of the variance-covariance matrix estimates. Weaker prior information is given. nu = 0.002 is often used |
fix | R, G | If fix = 1, the variance is fixed, and the specified prior value (V) is used. No estimation of variance is performed | NA | NA |
alpha.mu | R, G | Prior mean for the variance-covariance matrix. Specifies the central matrix for the variance-covariance matrix estimates. | The variance-covariance matrix estimates tend to approach the matrix specified by alpha.mu. | The variance-covariance matrix estimates tend to approach the matrix specified by alpha.mu (influences each element in the decreasing direction). |
alpha.V | R, G | Prior variance for the variance-covariance matrix. Controls the variability of the variance-covariance matrix estimates. | The variability of the variance-covariance matrix estimates tends to increase. | The variability of the variance-covariance matrix estimates tends to decrease. |
Note that the G element is for random effects variance (G1, G2, G3…). That means if you have 3 random effects, you need to define 3 priors in G.
In MCMCglmm, the prior setting allows the model to account for different variance structures by dividing the variance components into two distinct matrices: G (phylogeny) and R (non-phylogeny, i.e., residual variance). From this setup, we can estimate the phylogenetic heritability, also known as the phylogenetic signal (Pagel’s \(\lambda\)), using the following simple equation:
\[ H^2 = \frac{\sigma_{a}^2}{\sigma_{a}^2 + \sigma_{e}^2} \]
When we do not include any explanatory variables (= intercept-only model), the model is added ~ 1
. We set a bit large nitt
, thin
, and burnin
to improve mixing and effective sample sizes than default settings (nitt
= 13000, thin
= 10, burnin
= 3000).
# You can check model running time by system.time() function. Knowing the running time is important when you try and error to find the best model setting!
system.time(
<- MCMCglmm(zLength ~ 1, # Response variable zLength with an intercept-only model
mcmcglmm_mg1 random = ~ Phylo, # Random effect for Phylo
ginverse = list(Phylo = inv_phylo$Ainv), # Specifies the inverse phylogenetic covariance matrix Ainv for Phylo, which accounts for phylogenetic relationships.
prior = prior1, # Prior distributions for the model parameters
family = "gaussian", # Family of the response variable (Gaussian for continuous data)
data = dt, # Dataset used for fitting the model
nitt = 13000*20, # Total number of iterations (multiplied by 20 to match scale)
thin = 10*20, # Thinning rate (multiplied by 20)
burnin = 3000*20 # Number of burn-in iterations (multiplied by 20)
)
)summary(mcmcglmm_mg1)
To begin with, we should check mixing and model convergence. We can check them via trace plots and effective sample size. The plot should look like a fuzzy caterpillar to make sure your model has converged.
plot(mcmcglmm_mg1$VCV) # Visualise variance component
plot(mcmcglmm_mg1$Sol) # Visualise location effects
autocorr.plot(mcmcglmm_mg1$VCV) # Check chain mixing
autocorr.plot(mcmcglmm_mg1$Sol) # Check chain mixing
summary(mcmcglmm_mg1) # See the result 95%HPD
Iterations = 60001:259801
Thinning interval = 200
Sample size = 1000
DIC: -26.20546
G-structure: ~Phylo
post.mean l-95% CI u-95% CI eff.samp
Phylo 1.514 1.118 1.968 1000
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 0.02785 0.00892 0.04964 1000
Location effects: zLength ~ 1
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) -0.1778 -1.3683 1.0638 1000 0.822
summary(mcmcglmm_mg1$VCV) # See the result of 95% Credible Interval of the variance component
Iterations = 60001:259801
Thinning interval = 200
Number of chains = 1
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
Phylo 1.51394 0.22740 0.007191 0.007191
units 0.02785 0.01066 0.000337 0.000337
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
Phylo 1.11828 1.34700 1.49706 1.66954 1.9651
units 0.01027 0.01996 0.02735 0.03436 0.0519
summary(mcmcglmm_mg1$Sol) # See the result of 95% Credible Interval of the location effects
Iterations = 60001:259801
Thinning interval = 200
Number of chains = 1
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
-0.17781 0.63444 0.02006 0.02006
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
-1.4874 -0.6218 -0.1435 0.2346 0.9961
It looks like both estimates mixed well. And the effective sample size (eff.samp
) is large enough. The summary output shows the posterior mean, standard deviation, and 95% credible interval of the fixed (Location effects
) and random effects (G-structure
) and also residuals (R-structure
).
brms
The syntax is similar to the packages lme4
and lmer
in R
. One of the advantages of brms
is that it can fit complex models with multiple random effects and non-linear terms. The model fitting process is similar to MCMCglmm
but with a different syntax. The output also differs between the two packages.
In brms
, We can use the default_prior
(or get_prior
) function to set prior for the model. This function automatically make a weakly informative prior, which is suitable for most models (more detailed information is here[url]). Of course, you can also set your own prior if you are familiar with your datasets and want to set more informative one (see the section 4).
Things to keep in your mind are to use vcv.phylo(phylogenetic_tree, corr = TRUE)
function to get the phylogenetic correlation matrix. brms vignettes use vcv.phylo(phylogenetic_tree)
, but it looks like incorrect. You need to specify corr = TRUE
. Because the default is corr = FALSE
, which means the function returns the phylogenetic variance-covariance matrix, not the correlation matrix.
We set large iter
and warmup
to improve mixing and effective sample sizes than default settings (iter
= 2000, warmup
= 1000). Also, we set control = list(adapt_delta = 0.95)
to control parameter to avoid divergent transitions. Setting adapt_delta
closer to 1 increases the step size during sampling to improve model stability.
By default, Stan runs four independent Markov chains (chain = 4
), each exploring the posterior distribution. But four chains are rarely required; two are enough (chain = 2
).
<- ape::vcv.phylo(tree, corr = TRUE) # Get the phylogenetic correlation matrix A
In brms
, we can also check the Rhat
value to see if the model has converged. The Rhat
value should be close to 1.0.
plot(brms_mg1) # Visualise effects
summary(brms_mg1) # See the result
Family: gaussian
Links: mu = identity; sigma = identity
Formula: zLength ~ 1 + (1 | gr(Phylo, cov = A))
Data: dt (Number of observations: 300)
Draws: 2 chains, each with iter = 10000; warmup = 5000; thin = 1;
total post-warmup draws = 10000
Multilevel Hyperparameters:
~Phylo (Number of levels: 300)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 1.09 0.07 0.95 1.24 1.00 1156 2572
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -0.10 0.54 -1.16 0.97 1.00 2228 3739
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.22 0.02 0.18 0.26 1.00 1091 2257
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
To check the consistency between the results obtained from MCMCglmm
and brms
, we compare the mean of posterior distribution of the fixed and random effect estimates. You should note that the output of random effect estimates and residuals are presented in slightly different formats between MCMCglmm
and brms
. In MCMCglmm
, the outputs (G-structure
/ R-structure
) are provided as variance, while in brms
, the outputs (Multilevel Hyperparameters
/ Further Distributional Parameters
) are given as standard deviation. Therefore, when comparing these estimates between the two packages, you need to square the values obtained from brms
.
As you can see below, the MCMCglmm
and brms
estimates do not exactly match. However, the values are close, and the overall trend is consistent. This suggests that the outcomes obtained from either approach remain largely the same. Thus, even with some differences in the estimates, the broader results and patterns can be interpreted similarly. *The difference comes from factors such as each package’s sampling algorithms and the prior setting.
In this model, the phylogenetic signal is quite high - over 0.95
. This means that 95% of the variance in tail length is explained by the phylogenetic relationship among species. This is a strong phylogenetic signal, suggesting that closely related species tend to have similar tail lengths!
Intercept
summary(mcmcglmm_mg1$Sol) # MCMCglmm
Iterations = 60001:259801
Thinning interval = 200
Number of chains = 1
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
-0.17781 0.63444 0.02006 0.02006
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
-1.4874 -0.6218 -0.1435 0.2346 0.9961
<- as_draws_df(brms_mg1) # Convert brms object to data frame
draws_df summary(draws_df$b_Intercept) # brms
Min. 1st Qu. Median Mean 3rd Qu. Max.
-2.21282 -0.46570 -0.09844 -0.10152 0.26306 1.84947
Random effect and phylogenetic signals (and also residuals)
# random effect
mean(mcmcglmm_mg1$VCV[, "Phylo"]) # MCMCglmm(g-structure)
[1] 1.513938
mean(draws_df$sd_Phylo__Intercept)^2 # brms(Multilevel Hyperparameters)
[1] 1.179172
# residuals
mean(mcmcglmm_mg1$VCV[, "units"]) # MCMCglmm(r-structure)
[1] 0.02785073
mean(draws_df$sigma)^2 # brms(Further Distributional Parameters)
[1] 0.04712953
# phylogenetic signals
<- mean(mcmcglmm_mg1$VCV[, "Phylo"]) / (mean(mcmcglmm_mg1$VCV[, "Phylo"]) + mean(mcmcglmm_mg1$VCV[, "units"]))
phylo_signal_mcmcglmm
<- mean(draws_df$sd_Phylo__Intercept)^2 / (mean(draws_df$sd_Phylo__Intercept)^2 + mean(draws_df$sigma)^2)
phylo_signal_brms
phylo_signal_mcmcglmm
[1] 0.9819361
phylo_signal_brms
[1] 0.9615677
One continuous explanatory variable model
MCMCglmm
system.time(
<- MCMCglmm(zLength ~ zTemp,
mcmcglmm_mg2 random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
prior = prior1,
family = "gaussian",
data = dt,
nitt = 13000*25,
thin = 10*25,
burnin = 3000*25
) )
summary(mcmcglmm_mg2)
Iterations = 75001:324751
Thinning interval = 250
Sample size = 1000
DIC: -5.777706
G-structure: ~Phylo
post.mean l-95% CI u-95% CI eff.samp
Phylo 1.464 1.065 1.912 1000
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 0.03053 0.009984 0.05334 1000
Location effects: zLength ~ zTemp
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) -0.17061 -1.48930 0.99052 1000 0.768
zTemp 0.03310 -0.05767 0.11829 1000 0.482
posterior_summary(mcmcglmm_mg2$VCV)
Estimate Est.Error Q2.5 Q97.5
Phylo 1.46421444 0.2224519 1.06723218 1.912723
units 0.03052545 0.0113458 0.01195931 0.057253
posterior_summary(mcmcglmm_mg2$Sol)
Estimate Est.Error Q2.5 Q97.5
(Intercept) -0.1706142 0.63514628 -1.4145991 1.1523617
zTemp 0.0331045 0.04637504 -0.0522391 0.1252606
The fixed effects estimate for the relationship between tail length and standardized temperature is 0.0331 (posterior mean), with a 95% HPD interval of [-0.05, 0.11]. The pMCMC value is 0.482, indicating that the effect of standardised temperature is not statistically supported under this model. The intercept, representing the expected tail length when standardised temperature is zero, has a posterior mean of -0.171, with a 95% credible interval of [-1.42, 1.15], also suggesting no strong evidence for deviation from zero.
The 95% HPD (Highest Posterior Density interval) from MCMCglmm
and the 95% CI ( 95% equal-tailed credible interval) from brms
serve a similar purpose, but they are constructed differently and can yield different intervals, especially in asymmetric posterior distributions.
Highest Posterior Density (HPD) Interval:
The HPD interval is the narrowest interval that contains a specified proportion (e.g., 95%) of the posterior distribution. It includes the most probable values of the parameter, ensuring that every point inside the interval has a higher posterior density than any point outside.
Equal-Tailed Credible Interval (CI):
An equal-tailed credible interval includes the central 95% of the posterior distribution, leaving equal probabilities (2.5%) in each tail. This means that there’s a 2.5% chance that the parameter is below the interval and a 2.5% chance that it’s above.
https://search.r-project.org/CRAN/refmans/HDInterval/html/hdi.html
brms
<- default_prior(
default_priors2 ~ zTemp + (1|gr(Phylo, cov = A)),
zLength data = dt,
data2 = list(A = A),
family = gaussian()
)
system.time(
<- brm(zLength ~ zTemp + (1|gr(Phylo, cov = A)),
brms_mg2 data = dt,
data2 = list(A = A),
family = gaussian(),
prior = default_priors2,
iter = 10000,
warmup = 5000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
core = 2,
threads = threading(5)
) )
summary(brms_mg2)
Family: gaussian
Links: mu = identity; sigma = identity
Formula: zLength ~ zTemp + (1 | gr(Phylo, cov = A))
Data: dt (Number of observations: 223)
Draws: 2 chains, each with iter = 10000; warmup = 5000; thin = 1;
total post-warmup draws = 10000
Multilevel Hyperparameters:
~Phylo (Number of levels: 223)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 1.20 0.09 1.03 1.38 1.00 864 1701
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -0.15 0.60 -1.33 1.03 1.00 1900 3128
zTemp 0.04 0.05 -0.05 0.13 1.00 1961 4758
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.18 0.03 0.12 0.24 1.00 537 923
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The posterior estimate for the effect of zTemp on zLength is 0.04, with a 95% credible interval of [-0.05, 0.13]. The uncertainty surrounding this effect suggests that the relationship between standardised temperature and tail length is weak and not statistically significant. The intercept has a posterior estimate of -0.15 with a 95% credible interval of [-1.33, 1.03], representing the average tail length when zTemp = 0, although this is also uncertain.
The results of MCMCglmm and brms are generally consistent, with a few differences. The estimates and credible intervals for the fixed effect (zTemp) are almost identical, both indicating a weak relationship. These differences mainly arise from the computational methods and output formats of the models but do not affect the conclusions.
One continuous and one categorical explanatory variable model
Finally, we add categorical explanatory variable, Contrasting
, to the above model. In this model, we test whether species with contrasting tail tips have longer tails than those without contrasting tail tips, after controlling for the effect of temperature.
MCMCglmm
system.time(
<- MCMCglmm(zLength ~ zTemp + Contrasting,
mcmcglmm_mg3 random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
prior = prior1,
family = "gaussian",
data = dt,
nitt=13000*30,
thin=10*30,
burnin=3000*30
) )
summary(mcmcglmm_mg3)
Iterations = 90001:389701
Thinning interval = 300
Sample size = 1000
DIC: 87.21227
G-structure: ~Phylo
post.mean l-95% CI u-95% CI eff.samp
Phylo 1.19 0.9022 1.5 1113
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 0.04284 0.02576 0.05855 1168
Location effects: zLength ~ zTemp + Contrasting
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) -0.184003 -1.362926 0.869770 1000 0.740
zTemp 0.082215 -0.006456 0.155338 1000 0.050 .
Contrasting1 0.163943 0.059351 0.290087 1000 0.004 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
posterior_summary(mcmcglmm_mg3$VCV)
Estimate Est.Error Q2.5 Q97.5
Phylo 1.19045414 0.16451779 0.90569889 1.51171543
units 0.04283667 0.00864498 0.02680094 0.06011227
posterior_summary(mcmcglmm_mg3$Sol)
Estimate Est.Error Q2.5 Q97.5
(Intercept) -0.18400256 0.5840963 -1.262384395 1.0278077
zTemp 0.08221542 0.0420212 0.001294616 0.1679503
Contrasting1 0.16394271 0.0581162 0.036145589 0.2736073
Both temperature (\(zTemp\)) and contrasting patterns appear to influence tail length. Contrasting has a stronger effect than temperature
brms
<- default_prior(
default_priors3 ~ zTemp + Contrasting + (1|gr(Phylo, cov = A)),
zLength data = dt,
data2 = list(A = A),
family = gaussian()
)
system.time(
<- brm(zLength ~ zTemp + Contrasting + (1|gr(Phylo, cov = A)),
brms_mg3 data = dt,
data2 = list(A = A),
family = gaussian(),
prior = default_priors3,
iter = 10000,
warmup = 5000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
core = 2,
threads = threading(5)
) )
summary(brms_mg3)
Family: gaussian
Links: mu = identity; sigma = identity
Formula: zLength ~ zTemp + Contrasting + (1 | gr(Phylo, cov = A))
Data: dt (Number of observations: 223)
Draws: 2 chains, each with iter = 10000; warmup = 5000; thin = 1;
total post-warmup draws = 10000
Multilevel Hyperparameters:
~Phylo (Number of levels: 223)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 1.23 0.09 1.06 1.42 1.01 488 945
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -0.16 0.62 -1.38 1.04 1.00 1728 3132
zTemp 0.03 0.05 -0.05 0.12 1.00 1447 2230
Contrasting1 0.15 0.06 0.02 0.27 1.00 2826 4736
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.16 0.03 0.09 0.22 1.01 295 286
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
In brms, \(Contrasting\) has a positive effect on tail length. The posterior estimate is 0.15, with a 95% credible interval of [0.03, 0.27]. This suggests that species with contrasting tail tips have longer tails than those without contrasting tail tips, after controlling for the effect of temperature. The effect of temperature on tail length is similar to the previous model, with a posterior estimate of 0.03 and a 95% credible interval of [-0.05, 0.13].
Although there is some overlap in the estimates of temperature’s effect between the two models, the key difference is that MCMCglmm provides a positive effect with stronger evidence (CI not including zero), whereas brms shows more uncertainty (CI including zero). This suggests that while both models suggest a potential effect, brms does not provide as clear evidence for it being significant, and the effect size might be smaller or potentially null.
Bivariate model
Here, we provide more advanced models - bivariate models. Because the bivariate model is similar to the nominal model, it allows for examining relationships between two response variables while taking into account multiple explanatory variables. In this model, we will include two response variables, zLength
and zMass
, and test the relationship between these two variables and the explanatory variables, zTemp
and Contrasting
. This model can examine the evolutionary correlation between tail length and body mass in rodents, considering the phylogenetic relationships.
In MCMCglmm
, we use cbind(traitA, traitB)
to set the response variables (this allows us to estimate the covariance between the two response variables). In brms
, we use mvbind(traitA, traitB)
to set the response variable. We also need to specify family
for both response variables (unless you use a custom prior, you need not define what distribution fit to the random effect in brms
- default prior automatically set them).
The important point is, MCMCglmm
does not provide the correlation estimate between the two response variables (it estimate the covariance between two response variables), while brms
does. However, you need to use (1|a|gr(Phylo, cov = A))
instead of (1|gr(Phylo, cov = A))
and set set_rescor(TRUE).
Without the newly added |a|
and set_rescor(TRUE)
, brms
will not calculate the correlation (a
can be any letter). We need to calculate the correlation using covariance estimates in MCMCglmm
. Fourtunately, the correlation can calculate easily by dividing the covariance by the product of the standard deviations of the two variables.
<- inverseA(tree, nodes = "ALL", scale = TRUE)
inv_phylo <- list(G = list(G1 = list(V = diag(2),
prior2 nu = 2, alpha.mu = rep(0, 2),
alpha.V = diag(2) * 1000)),
R = list(V = diag(2), nu = 0.002)
)
system.time(
<- MCMCglmm(cbind(zLength, zMass) ~ trait - 1,
mcmcglmm_mg4 random = ~ us(trait):Phylo,
rcov = ~ us(trait):units,
family = c("gaussian", "gaussian"),
ginv = list(Phylo = inv_phylo$Ainv),
data = dt,
prior = prior2,
nitt = 13000*55,
thin = 10*55,
burnin = 3000*55
)
)
system.time(
<- MCMCglmm(cbind(zLength, zMass) ~ zTemp:trait + trait - 1,
mcmcglmm_mg5 random = ~ us(trait):Phylo,
rcov = ~ us(trait):units,
family = c("gaussian", "gaussian"),
ginv = list(Phylo = inv_phylo$Ainv),
data = dt,
prior = prior2,
nitt = 13000*55,
thin = 10*55,
burnin = 3000*55
)
)
system.time(
<- MCMCglmm(cbind(zLength, zMass) ~ zTemp:trait + Contrasting:trait + trait -1,
mcmcglmm_mg6 random = ~ us(trait):Phylo,
rcov = ~ us(trait):units,
family = c("gaussian", "gaussian"),
ginv = list(Phylo = inv_phylo$Ainv),
data = dt,
prior = prior2,
nitt = 13000*55,
thin = 10*55,
burnin = 3000*55
)
)
## brms
<- ape::vcv.phylo(tree, corr = TRUE)
A
<- bf(mvbind(zLength, zMass) ~ 1 +
formula 1|a|gr(Phylo, cov = A)),
(set_rescor(rescor = TRUE)
)
<- default_prior(formula,
default_prior data = dt,
data2 = list(A = A),
family = gaussian()
)
system.time(
<- brm(formula = formula,
brms_mg4 data = dt,
data2 = list(A = A),
family = gaussian(),
prior = default_prior,
iter = 25000,
warmup = 10000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.99),
)
)
<- bf(mvbind(zLength, zMass) ~ zTemp + (1|a|gr(Phylo, cov = A)), set_rescor(TRUE))
formula2
<- default_prior(formula2,
default_prior2 data = dt,
data2 = list(A = A),
family = gaussian()
)
system.time(
<- brm(formula = formula2,
brms_mg5_1 data = dt,
data2 = list(A = A),
family = gaussian(),
prior = default_prior,
iter = 55000,
warmup = 45000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.99),
)
)
<- bf(mvbind(zLength, zMass) ~ zTemp + Contrasting + (1|a|gr(Phylo, cov = A)), set_rescor(TRUE))
formula3
<- default_prior(formula3,
default_prior3 data = dt,
data2 = list(A = A),
family = gaussian()
)
system.time(
<- brm(formula = formula3,
brms_mg6 data = dt,
data2 = list(A = A),
family = gaussian(),
prior = default_prior3,
iter = 25000,
warmup = 20000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.98),
) )
summary(mcmcglmm_mg4)
Iterations = 165001:714451
Thinning interval = 550
Sample size = 1000
DIC: 56.02311
G-structure: ~us(trait):Phylo
post.mean l-95% CI u-95% CI eff.samp
traitzLength:traitzLength.Phylo 1.8450 1.3355 2.4126 1000
traitzMass:traitzLength.Phylo -0.1288 -0.5302 0.2168 1000
traitzLength:traitzMass.Phylo -0.1288 -0.5302 0.2168 1000
traitzMass:traitzMass.Phylo 2.1016 1.5978 2.6773 1120
R-structure: ~us(trait):units
post.mean l-95% CI u-95% CI eff.samp
traitzLength:traitzLength.units 0.024426 0.006637 0.04178 1000
traitzMass:traitzLength.units 0.005761 -0.008694 0.02082 1000
traitzLength:traitzMass.units 0.005761 -0.008694 0.02082 1000
traitzMass:traitzMass.units 0.057850 0.033641 0.08572 1000
Location effects: cbind(zLength, zMass) ~ trait - 1
post.mean l-95% CI u-95% CI eff.samp pMCMC
traitzLength -0.04079 -2.04401 1.70436 1000 0.968
traitzMass -0.44244 -2.33966 1.55863 1000 0.688
posterior_summary(mcmcglmm_mg4$VCV)
Estimate Est.Error Q2.5
traitzLength:traitzLength.Phylo 1.845005322 0.272081110 1.347508423
traitzMass:traitzLength.Phylo -0.128783301 0.192333313 -0.517573376
traitzLength:traitzMass.Phylo -0.128783301 0.192333313 -0.517573376
traitzMass:traitzMass.Phylo 2.101601613 0.288665978 1.624968122
traitzLength:traitzLength.units 0.024426281 0.009195449 0.008110895
traitzMass:traitzLength.units 0.005761157 0.007496943 -0.008150784
traitzLength:traitzMass.units 0.005761157 0.007496943 -0.008150784
traitzMass:traitzMass.units 0.057849736 0.013834738 0.036274710
Q97.5
traitzLength:traitzLength.Phylo 2.47378715
traitzMass:traitzLength.Phylo 0.23065613
traitzLength:traitzMass.Phylo 0.23065613
traitzMass:traitzMass.Phylo 2.74203079
traitzLength:traitzLength.units 0.04557695
traitzMass:traitzLength.units 0.02198074
traitzLength:traitzMass.units 0.02198074
traitzMass:traitzMass.units 0.09066023
posterior_summary(mcmcglmm_mg4$Sol)
Estimate Est.Error Q2.5 Q97.5
traitzLength -0.04079478 0.9568184 -1.827382 2.059106
traitzMass -0.44243620 1.0046735 -2.486688 1.513088
<- (mcmcglmm_mg4$VCV[, 2]) /sqrt((mcmcglmm_mg4$VCV[, 1] * mcmcglmm_mg4$VCV[, 4])) # calculate phylogenetic correlation between zLength and zMass
corr_p <- (mcmcglmm_mg4$VCV[, 6]) /sqrt((mcmcglmm_mg4$VCV[, 5] * mcmcglmm_mg4$VCV[, 8]))
corr_nonp posterior_summary(corr_p)
Estimate Est.Error Q2.5 Q97.5
var1 -0.06650913 0.09770663 -0.2628186 0.1173509
posterior_summary(corr_nonp)
Estimate Est.Error Q2.5 Q97.5
var1 0.1527294 0.1890808 -0.2165917 0.5260998
summary(brms_mg4)
Family: MV(gaussian, gaussian)
Links: mu = identity; sigma = identity
mu = identity; sigma = identity
Formula: zLength ~ 1 + (1 | a | gr(Phylo, cov = A))
zMass ~ 1 + (1 | a | gr(Phylo, cov = A))
Data: dt (Number of observations: 223)
Draws: 2 chains, each with iter = 35000; warmup = 25000; thin = 1;
total post-warmup draws = 20000
Multilevel Hyperparameters:
~Phylo (Number of levels: 223)
Estimate Est.Error l-95% CI u-95% CI
sd(zLength_Intercept) 1.22 0.09 1.05 1.40
sd(zMass_Intercept) 1.29 0.09 1.12 1.48
cor(zLength_Intercept,zMass_Intercept) -0.06 0.10 -0.26 0.13
Rhat Bulk_ESS Tail_ESS
sd(zLength_Intercept) 1.00 1048 2370
sd(zMass_Intercept) 1.00 3079 6693
cor(zLength_Intercept,zMass_Intercept) 1.00 1912 4279
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
zLength_Intercept -0.14 0.61 -1.35 1.05 1.00 2159 4378
zMass_Intercept -0.42 0.63 -1.67 0.83 1.00 3442 6223
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma_zLength 0.17 0.03 0.11 0.23 1.00 751 1004
sigma_zMass 0.25 0.03 0.20 0.31 1.00 2092 4823
Residual Correlations:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
rescor(zLength,zMass) 0.14 0.18 -0.22 0.47 1.00 2017
Tail_ESS
rescor(zLength,zMass) 4659
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
As you can see the above, the estimates from both models are consistent:
the random effects are…
zLength is 1.845 (1.348, 2.474)
and zMass is 2.102 (1.625, 2.742)
in MCMCglmm
and 1.4884 (1.1025, 1.96)
and 1.664 (1.2544, 2.1904)
in brms
. Phylogenetic correlation between zLength and zMass are -0.067 (-0.263, 0.117)
in MCMCglmm
and -0.06 (-0.26, 0.13)
The phylogenetic signal is
2. Binary models
Binary models are used when the response variable is binary (0 or 1), such as presence (1) vs. absence (0), survival (1) vs. death (0), success (1) vs. failure (0), or female vs. male. The results are straightforward to interpret: the model provides the probability of a ‘success’ (coded as 1) given the predictor variables. For example, in a female/male classification, the output represents the probability of being male, as R typically treats the alphabetically first category as the reference level.
Explanation of dataset
We tested the relationships between the presence of red–orange pelage on limbs (response variable) and average social group size and activity cycle in primates (265 species). We used the dataset and tree from Macdonald et al. (2024). The dataset contains information on the presence of skin and pelage coloration data, average social group size, presence or absence of multilevel hierarchical societies, colour visiual system, and activity cycle. We aim to test whether the presence of red–orange pelage on limbs is associated with social group size and activity cycle in primates.
For run model, we need to prepare the suitable format data.
<- read.nexus(here("data", "potential", "primate", "trees100m.nex"))
p_trees <- read.csv(here("data", "potential", "primate", "primate_data_male.csv"))
primate_data <- primate_data
p_dat
<- subset(p_dat, !is.na(vs_male)) # omit NAs by VS predictor
p_dat <- subset(p_dat, !is.na(vs_female))
p_dat <- subset(p_dat, !is.na(activity_cycle)) # omit NAs by activity cycle
p_dat <- subset(p_dat, !is.na(redpeachpink_facial_skin)) # omit NAs by response variable
p_dat <- subset(p_dat, !is.na(social_group_size)) # omit NAs by response variable
p_dat <- subset(p_dat, !is.na(multilevel)) # omit NAs by response variable
p_dat
# Rename the column 'social_group_size' to 'cSocial_group_size' - all continuous variables were centred and standardized by authors
<- p_dat %>% rename(cSocial_group_size = social_group_size)
p_dat
<- p_dat %>%
p_dat mutate(across(c(vs_female, vs_male, redpeachpink_facial_skin,red_genitals,red_pelage_head, red_pelage_body_limbs, red_pelage_tail), as.factor))
<- p_trees[[1]]
p_tree <- lapply(p_trees, drop.tip,tip = setdiff(p_tree$tip.label, p_dat$PhyloName)) #trim out everything from the tree that's not in the dataset
p_trees <- p_trees[[1]] # select one tree for trimming purposes
p_tree <- force.ultrametric(p_tree) # force tree to be ultrametric - all tips equidistant from root p_tree
How to impliment models and interpret the outputs?
In MCMCglmm, the binary model can take two kinds of link functions: logit and probit link functions. The logit link function (family = "categorical"
) is the default in MCMCglmm for the binary model, while the probit link function can be defined using family = "threshold"
or family = "ordinal"
. We recommend using family = "threshold"
as it is often more suitable for modelling under biological assumption and can provide better convergence properties in certain situations. Here, we show both models.
Univariate model
Intercept-only model
Probit model
MCMCglmm
We need to set different prior for binary models from the Gaussian mopdel. Actually, the residual variance does not need to be estimated in binary models, so we only need to set the prior for the random effect. In binary models, the response variable y
is either 0 or 1, which means there are no variety residuals to measure variability around a predicted mean. For logistic regression (logit link), the residual variance is fixed at ^2/3 on the latent scale. For probit regression (probit link), the residual variance is fixed at 1 on the latent scale. In MCMCglmm, the prior settings do not differ between the probit and logit models.
<- inverseA(p_tree, nodes = "ALL", scale = TRUE)
inv.phylo
<- list(R = list(V = 1, fix = 1), # fix residual variance = 1
prior1 G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)
)
)system.time(
<- MCMCglmm(red_pelage_body_limbs ~ 1,
mcmcglmm_BP1 random = ~ PhyloName,
family = "threshold",
data = p_dat,
prior = prior1,
ginverse = list(PhyloName = inv.phylo$Ainv),
nitt = 13000*20,
thin = 10*20,
burnin = 3000*20)
)
summary(mcmcglmm_BP1) # 95%HPD Interval
Iterations = 60001:259801
Thinning interval = 200
Sample size = 1000
DIC: 298.7917
G-structure: ~PhyloName
post.mean l-95% CI u-95% CI eff.samp
PhyloName 3.186 0.2795 7.242 1000
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 1 1 1 0
Location effects: red_pelage_body_limbs ~ 1
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) -0.8487 -2.8595 0.8645 1093 0.296
posterior_summary(mcmcglmm_BP1$VCV) # 95%CI
Estimate Est.Error Q2.5 Q97.5
PhyloName 3.185968 2.226905 0.6918403 8.6888
units 1.000000 0.000000 1.0000000 1.0000
posterior_summary(mcmcglmm_BP1$Sol)
Estimate Est.Error Q2.5 Q97.5
(Intercept) -0.8487221 0.8998161 -2.859676 0.8638048
As you can see here, residual variance is fixed 1 - the model did not estimate the residual variance (R-structure
). The MCMC chain mixed well. The positive posterior mean of phylogenetic random effect (3.186) suggests that closely related species tend to have more red limb pelages.The fact that the 95% HPD interval / 95% equal-tailed credible interval does not include zero indicates that the effect of lineage on red pelages is statistically significant.
brms
family
argument is set to bernoulli()
(not binomial()
). The bernoulli()
family is used for binary data, while the binomial()
family is used for the combination of count data, such as (c(N_sucsess, N_failure)
).
<- ape::vcv.phylo(p_tree, corr = TRUE)
A
<- default_prior(red_pelage_body_limbs ~ 1 + (1 | gr(PhyloName, cov = A)),
priors_brms data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "probit"))
system.time(
<- brm(red_pelage_body_limbs ~ 1 + (1 | gr(PhyloName, cov = A)),
brms_BP1 data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "probit"),
prior = priors_brms,
iter = 6000,
warmup = 5000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
) )
summary(brms_BP1)
Family: bernoulli
Links: mu = probit
Formula: red_pelage_body_limbs ~ 1 + (1 | gr(PhyloName, cov = A))
Data: p_dat (Number of observations: 265)
Draws: 2 chains, each with iter = 6000; warmup = 5000; thin = 1;
total post-warmup draws = 2000
Multilevel Hyperparameters:
~PhyloName (Number of levels: 265)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 1.67 0.53 0.78 2.85 1.00 393 727
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -0.72 0.81 -2.38 0.82 1.00 752 1027
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
# posterior_interval(brms_BP1)
The same trend in results was observed in brms. Note that the phylogenetic random effect estimate in brms is given as the standard deviation (sd), so to compare it with MCMCglmm, we need to square the sd. There were no differences between the fixed effects in brms and MCMCglmm. Outputs of brms is usually 95% equal-tailed credible interval.
Phylogenetic signals (probit model)
# MCMCglmm
<- ((mcmcglmm_BP1$VCV[, "PhyloName"]) / (mcmcglmm_BP1$VCV[, "PhyloName"] + 1))
phylo_signal_mcmcglmm_BP %>% mean() phylo_signal_mcmcglmm_BP
[1] 0.7052307
%>% quantile(probs = c(0.025,0.5,0.975)) phylo_signal_mcmcglmm_BP
2.5% 50% 97.5%
0.4089277 0.7260425 0.8967865
# brms
<- brms_BP1 %>% as_tibble() %>%
phylo_signal_brms_BP ::select(Sigma_phy = sd_PhyloName__Intercept) %>%
dplyrmutate(lambda_probit = Sigma_phy^2 / (Sigma_phy^2 + 1)) %>%
pull(lambda_probit)
%>% mean() phylo_signal_brms_BP
[1] 0.6992207
%>% quantile(probs = c(0.025,0.5,0.975)) phylo_signal_brms_BP
2.5% 50% 97.5%
0.3795359 0.7267670 0.8903665
As expected from the random effects estimates above, the systematic signal is estimated to be 0.71 for MCMCMCglmm
and 0.70 for brms
. The difference with the estimation from Gaussian model is that the residuals are fixed at 1 in the binary model. That is the same for the other discrete models.
Logit model
MCMCglmm
c2 correction
In the logit model, we need to correct the estimates obtained from MCMCglmm
as MCMCglmm
implements additive overdispersion GLMM fitted by MCMC sampling from the posterior distribution. This overdispersion lead to inflated variance estimates (see also Nakagawa & Schielzeth (2010)). To adjust for this, we need to rescale the estimates of both location effects and variance components by considering the residual variance adjustment (MCMCglmm course note
written by Jarrod Hadfield). We can use the below code:
<- (16 * sqrt(3) / (15 * pi))^2
c2 <- model$Sol / sqrt(1+c2) # for fixed effects
res_1 <- model$VCV / (1+c2) # for variance components res_2
The logit model is a bit difficult to converge than the probit model. So, we set a bit large nitt
, thin
, and burnin
to improve mixing and effective sample sizes than the probit model.
<- inverseA(p_tree, nodes = "ALL", scale = TRUE)
inv.phylo <- list(R = list(V = 1, fix = 1),
prior1 G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)
)
)
system.time(
<- MCMCglmm(red_pelage_body_limbs ~ 1,
mcmcglmm_BL1 random = ~ PhyloName,
family = "categorical",
data = p_dat,
prior = prior1,
ginverse = list(PhyloName = inv.phylo$Ainv),
nitt = 13000*60,
thin = 10*60,
burnin = 3000*60)
)
brms
In brms, we do not need to conduct c2 correction as brms
automatically corrects the overdispersion.
<- ape::vcv.phylo(p_tree, corr = TRUE)
A
<- default_prior(red_pelage_body_limbs ~ 1 + (1 | gr(PhyloName, cov = A)),
priors_brms2 data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "logit"))
system.time(
<- brm(red_pelage_body_limbs ~ 1 + (1 | gr(PhyloName, cov = A)),
brms_BL1 data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "logit"),
prior = priors_brms2,
iter = 7500,
warmup = 6500,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
core = 2,
) )
# MCMCglmm
summary(mcmcglmm_BL1)
Iterations = 180001:779401
Thinning interval = 600
Sample size = 1000
DIC: 294.3311
G-structure: ~PhyloName
post.mean l-95% CI u-95% CI eff.samp
PhyloName 11.64 0.7224 26.95 1000
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 1 1 1 0
Location effects: red_pelage_body_limbs ~ 1
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) -1.665 -5.154 1.833 1000 0.316
<- (16 * sqrt(3) / (15 * pi))^2
c2 <- mcmcglmm_BL1$Sol/sqrt(1+c2)
res_1 <- mcmcglmm_BL1$VCV/(1+c2)
res_2
posterior_summary(res_1)
Estimate Est.Error Q2.5 Q97.5
(Intercept) -1.435339 1.560119 -4.565329 1.502008
posterior_summary(res_2)
Estimate Est.Error Q2.5 Q97.5
PhyloName 8.6476147 6.01105 1.5836214 24.0221167
units 0.7430287 0.00000 0.7430287 0.7430287
#brms
summary(brms_BL1)
Family: bernoulli
Links: mu = logit
Formula: red_pelage_body_limbs ~ 1 + (1 | gr(PhyloName, cov = A))
Data: p_dat (Number of observations: 265)
Draws: 2 chains, each with iter = 7500; warmup = 6500; thin = 1;
total post-warmup draws = 2000
Multilevel Hyperparameters:
~PhyloName (Number of levels: 265)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 2.61 0.85 1.21 4.46 1.00 550 971
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -1.08 1.19 -3.54 1.13 1.00 1433 1398
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The point estimates and the overlaps of 95%CI of both random and fixed effects look fine.
Phylogenetic signals (logit model)
Then, we can obtain the phylogenetic signal using
\[ H^2 = \frac{\sigma_{a}^2}{\sigma_{a}^2 + \pi^2/3} \]
# MCMCglmm
<- ((mcmcglmm_BL1$VCV[, "PhyloName"]/(1+c2)) / (mcmcglmm_BL1$VCV[, "PhyloName"]/(1+c2)+1))
phylo_signal_mcmcglmm_BL %>% mean() phylo_signal_mcmcglmm_BL
[1] 0.8541816
%>% quantile(probs = c(0.025,0.5,0.975)) phylo_signal_mcmcglmm_BL
2.5% 50% 97.5%
0.6129464 0.8776149 0.9600353
# brms
<- brms_BL1 %>% as_tibble() %>%
phylo_signal_brms_BL ::select(Sigma_phy = sd_PhyloName__Intercept) %>%
dplyrmutate(lambda_logit = (Sigma_phy^2 / (Sigma_phy^2+1))) %>%
pull(lambda_logit)
%>% mean() phylo_signal_brms_BL
[1] 0.8411519
%>% quantile(probs = c(0.025,0.5,0.975)) phylo_signal_brms_BL
2.5% 50% 97.5%
0.5941574 0.8636453 0.9520353
Both estimated phylogenetic signals took close values.
One continuous explanatory variable model
For the next step, we examine whether the group size affect the presence or absence of red pelagic on the body limbs in primates to test the possibility that red pelage is related to social roles and behaviours within the group.
Probit model
The model using MCMCglmm
is
<- inverseA(p_tree, nodes = "ALL", scale = TRUE)
inv.phylo <- list(R = list(V = 1, fix = 1),
prior1 G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)
)
)
system.time(
<- MCMCglmm(red_pelage_body_limbs ~ cSocial_group_size,
mcmcglmm_BP2 random = ~ PhyloName,
family = "threshold",
data = p_dat,
prior = prior1,
ginverse = list(PhyloName = inv.phylo$Ainv),
nitt = 13000*30,
thin = 10*30,
burnin = 3000*30)
)
For brms
…
<- ape::vcv.phylo(p_tree, corr = TRUE)
A
<- default_prior(red_pelage_body_limbs ~ cSocial_group_size + (1 | gr(PhyloName, cov = A)),
priors_brms3 data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "probit"))
system.time(
<- brm(red_pelage_body_limbs ~ cSocial_group_size + (1 | gr(PhyloName, cov = A)),
brms_BP2 data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "probit"),
prior = priors_brms3,
iter = 10000,
warmup = 8500,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
core = 2,
thread = threading(5)
) )
# MCMCglmm
summary(mcmcglmm_BP2)
Iterations = 90001:389701
Thinning interval = 300
Sample size = 1000
DIC: 299.4154
G-structure: ~PhyloName
post.mean l-95% CI u-95% CI eff.samp
PhyloName 3.489 0.2165 7.618 1000
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 1 1 1 0
Location effects: red_pelage_body_limbs ~ cSocial_group_size
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) -0.9754 -2.8047 0.8503 1979 0.258
cSocial_group_size -0.1136 -0.3679 0.1707 1091 0.418
posterior_summary(mcmcglmm_BP2$Sol)
Estimate Est.Error Q2.5 Q97.5
(Intercept) -0.9753971 0.9209577 -2.9483573 0.7464915
cSocial_group_size -0.1135599 0.1364510 -0.4137361 0.1260902
posterior_summary(mcmcglmm_BP2$VCV)
Estimate Est.Error Q2.5 Q97.5
PhyloName 3.488721 2.419961 0.6210229 8.835258
units 1.000000 0.000000 1.0000000 1.000000
#brms
summary(brms_BP2)
Family: bernoulli
Links: mu = probit
Formula: red_pelage_body_limbs ~ cSocial_group_size + (1 | gr(PhyloName, cov = A))
Data: p_dat (Number of observations: 265)
Draws: 2 chains, each with iter = 10000; warmup = 8500; thin = 1;
total post-warmup draws = 3000
Multilevel Hyperparameters:
~PhyloName (Number of levels: 265)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 1.68 0.53 0.78 2.86 1.00 440 944
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -0.75 0.85 -2.54 0.88 1.00 1318 1299
cSocial_group_size -0.11 0.13 -0.40 0.12 1.00 3127 1926
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Both models suggest that social group size does not significantly affect the presence or absence of red pelage on the body limbs in primates, as the credible intervals include zero and the p-values are higher than 0.05.
Logit model
The model for MCMCglmm…
<- inverseA(p_tree, nodes = "ALL", scale = TRUE)
inv.phylo <- list(R = list(V = 1, fix = 1),
prior1 G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)
)
)
system.time(
<- MCMCglmm(red_pelage_body_limbs ~ cSocial_group_size,
mcmcglmm_BL2 random = ~ PhyloName,
family = "categorical",
data = p_dat,
prior = prior1,
ginverse = list(PhyloName = inv.phylo$Ainv),
nitt = 13000*60,
thin = 10*60,
burnin = 3000*60)
)
For brms…
<- ape::vcv.phylo(p_tree, corr = TRUE)
A
<- default_prior(red_pelage_body_limbs ~ cSocial_group_size + (1 | gr(PhyloName, cov = A)),
priors_brms4 data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "logit"))
system.time(
<- brm(red_pelage_body_limbs ~ cSocial_group_size + (1 | gr(PhyloName, cov = A)),
brms_biL2 data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "logit"),
prior = priors_brms4,
iter = 18000,
warmup = 8000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
core = 2,
thread = threading(5)
) )
summary(mcmcglmm_BL2)
Iterations = 180001:779401
Thinning interval = 600
Sample size = 1000
DIC: 294.8728
G-structure: ~PhyloName
post.mean l-95% CI u-95% CI eff.samp
PhyloName 11.95 0.8832 26.74 835
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 1 1 1 0
Location effects: red_pelage_body_limbs ~ cSocial_group_size
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) -1.8090 -5.3966 1.7961 1000 0.266
cSocial_group_size -0.2622 -0.8166 0.2671 1000 0.356
<- (16 * sqrt(3) / (15 * pi))^2
c2 <- mcmcglmm_BL2$Sol/sqrt(1+c2)
res_1 <- mcmcglmm_BL2$VCV/(1+c2)
res_2
posterior_summary(res_1)
Estimate Est.Error Q2.5 Q97.5
(Intercept) -1.5593137 1.5801956 -4.9135095 1.3437762
cSocial_group_size -0.2260184 0.2484719 -0.7398858 0.2081084
posterior_summary(res_2)
Estimate Est.Error Q2.5 Q97.5
PhyloName 8.8770592 5.968142 1.4813098 23.2748934
units 0.7430287 0.000000 0.7430287 0.7430287
# brms
summary(brms_BP2)
Family: bernoulli
Links: mu = probit
Formula: red_pelage_body_limbs ~ cSocial_group_size + (1 | gr(PhyloName, cov = A))
Data: p_dat (Number of observations: 265)
Draws: 2 chains, each with iter = 10000; warmup = 8500; thin = 1;
total post-warmup draws = 3000
Multilevel Hyperparameters:
~PhyloName (Number of levels: 265)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 1.68 0.53 0.78 2.86 1.00 440 944
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -0.75 0.85 -2.54 0.88 1.00 1318 1299
cSocial_group_size -0.11 0.13 -0.40 0.12 1.00 3127 1926
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Estimates from both packages were similar and the logit model yielded the same results as the probit model: the size of the social group has no significant effect on this trait.
One continuous and one categorical explanatory variable model
Therefore, we hypothesised that there might be a relationship between the presence of red pelage and whether a primate species is diurnal or not. We included this as a predictor variable in our model. This is because red colouration may not be as visible in the dark, making it potentially less advantageous for nocturnal species. On the other hand, diurnal species may benefit more from the visibility of red pelage, which could play a role in social/sexual signalling or other ecological factors during the day. Note that the activity cycle variable comprised 01 data, whether diurnal or not, to simplify understanding the model outputs.
Probit model
For the model MCMCglmm the following:
$diurnal <- ifelse(p_dat$activity_cycle == "di", 1, 0)
p_dat
<- inverseA(p_tree, nodes = "ALL", scale = TRUE)
inv.phylo <- list(R = list(V = 1, fix = 1),
prior1 G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)
)
)
system.time(
<- MCMCglmm(red_pelage_body_limbs ~ cSocial_group_size + diurnal,
mcmcglmm_BP3 random = ~ PhyloName,
family = "threshold",
data = p_dat,
prior = prior1,
ginverse = list(PhyloName = inv.phylo$Ainv),
nitt = 13000*50,
thin = 10*50,
burnin = 3000*50)
)
For brms,
<- ape::vcv.phylo(p_tree, corr = TRUE)
A
<- default_prior(red_pelage_body_limbs ~ cSocial_group_size + diurnal + (1 | gr(PhyloName, cov = A)),
priors_brms5 data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "probit"))
system.time(
<- brm(red_pelage_body_limbs ~ cSocial_group_size + diurnal + (1 | gr(PhyloName, cov = A)),
brms_BP3 data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "probit"),
prior = priors_brms5,
iter = 5000,
warmup = 3000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
core = 2,
thread = threading(5)
) )
# MCMCglmm
summary(mcmcglmm_BP3)
Iterations = 150001:649501
Thinning interval = 500
Sample size = 1000
DIC: 299.2849
G-structure: ~PhyloName
post.mean l-95% CI u-95% CI eff.samp
PhyloName 4.063 0.535 9.133 900.6
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 1 1 1 0
Location effects: red_pelage_body_limbs ~ cSocial_group_size + activity_cycle
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) -0.685030 -3.381045 1.969969 1000 0.584
cSocial_group_size -0.110831 -0.373846 0.151850 1000 0.392
activity_cycledi -0.533313 -2.486488 1.364789 1000 0.572
activity_cyclenoct -0.007557 -1.807080 1.823124 1000 0.986
posterior_summary(mcmcglmm_BP3$VCV)
Estimate Est.Error Q2.5 Q97.5
PhyloName 4.062649 2.673005 0.947084 11.42444
units 1.000000 0.000000 1.000000 1.00000
posterior_summary(mcmcglmm_BP3$Sol)
Estimate Est.Error Q2.5 Q97.5
(Intercept) -0.685030297 1.3457124 -3.6844040 1.7696060
cSocial_group_size -0.110831330 0.1342969 -0.4257179 0.1235912
activity_cycledi -0.533313064 0.9653465 -2.6461670 1.2170270
activity_cyclenoct -0.007557281 0.9589911 -1.9164287 1.7255631
# brms
summary(brms_BP3)
Family: bernoulli
Links: mu = probit
Formula: red_pelage_body_limbs ~ cSocial_group_size + activity_cycle + (1 | gr(PhyloName, cov = A))
Data: p_dat (Number of observations: 265)
Draws: 2 chains, each with iter = 18000; warmup = 8000; thin = 1;
total post-warmup draws = 20000
Multilevel Hyperparameters:
~PhyloName (Number of levels: 265)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 1.86 0.58 0.89 3.21 1.00 3870 6693
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -0.54 1.21 -2.96 1.87 1.00 9333 10083
cSocial_group_size -0.11 0.13 -0.40 0.13 1.00 21096 14055
activity_cycledi -0.50 0.93 -2.46 1.24 1.00 9758 9062
activity_cyclenoct -0.01 0.96 -2.00 1.81 1.00 9588 9736
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results indicate that both models (MCMCglmm and brms) are reliable based on diagnostic metrics. Effective sample sizes are sufficient, and Rhat values are close to 1, indicating good convergence.
For random effects, the estimated variance of the phylogenetic effect is consistent between models (MCMCglmm: 3.59; brms: 1.73^2 = 2.99), with overlapping 95% credible intervals. This suggests still phylogenetic influence on red pelage presence, though with some uncertainty.
For fixed effects, neither social group size (MCMCglmm: -0.10, brms: -0.11) nor diurnal (MCMCglmm: -0.49, brms: -0.48) showed statistically significant effects, as their 95% CIs include zero. This suggests that group size and diurnal activity are unlikely to be major determinants of red pelage presence.
Logit model Finally, MCMCglmm
<- inverseA(p_tree, nodes = "ALL", scale = TRUE)
inv.phylo <- list(R = list(V = 1, fix = 1),
prior1 G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)
)
)
system.time(
<- MCMCglmm(red_pelage_body_limbs ~ cSocial_group_size + diurnal,
mcmcglmm_BL3 random = ~ PhyloName,
family = "categorical",
data = p_dat,
prior = prior1,
ginverse = list(PhyloName = inv.phylo$Ainv),
nitt = 13000*50,
thin = 10*50,
burnin = 3000*50)
)
And brms,
<- ape::vcv.phylo(p_tree, corr = TRUE)
A <- default_prior(red_pelage_body_limbs ~ cSocial_group_size + diurnal + (1 | gr(PhyloName, cov = A)),
priors_brms6 data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "logit"))
system.time(
<- brm(red_pelage_body_limbs ~ cSocial_group_size + diurnal + (1 | gr(PhyloName, cov = A)),
brms_BL3 data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "logit"),
prior = priors_brms6,
iter = 18000,
warmup = 8000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
core = 2,
thread = threading(5)
) )
summary(mcmcglmm_BL3)
Iterations = 240001:1039201
Thinning interval = 800
Sample size = 1000
DIC: 293.6985
G-structure: ~PhyloName
post.mean l-95% CI u-95% CI eff.samp
PhyloName 15.16 1.792 37.31 905.8
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 1 1 1 0
Location effects: red_pelage_body_limbs ~ cSocial_group_size + activity_cycle
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) -1.30757 -6.63033 3.76436 1000 0.594
cSocial_group_size -0.23416 -0.76302 0.28469 1000 0.392
activity_cycledi -1.15936 -5.18073 2.29078 1000 0.528
activity_cyclenoct -0.09113 -3.90477 3.45304 1000 0.980
<- (16 * sqrt(3) / (15 * pi))^2
c2 <- mcmcglmm_BL3$Sol/sqrt(1+c2)
res_1 <- mcmcglmm_BL3$VCV/(1+c2)
res_2
posterior_summary(res_1)
Estimate Est.Error Q2.5 Q97.5
(Intercept) -1.12711731 2.2299457 -5.7979353 3.1769798
cSocial_group_size -0.20184346 0.2376961 -0.7177624 0.1888044
activity_cycledi -0.99936034 1.6233908 -4.3827738 2.1066431
activity_cyclenoct -0.07855316 1.6257651 -3.4238818 2.9358848
posterior_summary(res_2)
Estimate Est.Error Q2.5 Q97.5
PhyloName 11.2672019 8.147431 2.2323210 31.2476370
units 0.7430287 0.000000 0.7430287 0.7430287
summary(brms_BL3)
Family: bernoulli
Links: mu = logit
Formula: red_pelage_body_limbs ~ cSocial_group_size + activity_cycle + (1 | gr(PhyloName, cov = A))
Data: p_dat (Number of observations: 265)
Draws: 2 chains, each with iter = 18000; warmup = 8000; thin = 1;
total post-warmup draws = 20000
Multilevel Hyperparameters:
~PhyloName (Number of levels: 265)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 2.93 0.96 1.33 5.06 1.00 3607 5803
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -0.62 1.88 -4.35 3.21 1.00 9854 10536
cSocial_group_size -0.21 0.25 -0.79 0.21 1.00 19703 12846
activity_cycledi -0.81 1.54 -4.04 2.07 1.00 9439 8872
activity_cyclenoct -0.07 1.56 -3.29 2.93 1.00 9517 9450
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Similar results were obtained with the logit model as with the probit model. In conclusion, the models are consistent and suggest that neither social group size nor diurnal activity significantly affect red pelage presence.
Bivariate model
ADD: explanations #### Probit link
<- inverseA(p_tree, nodes = "ALL", scale = TRUE) # invert covariance matrix for use by MCMCglmm
inv.phylo <- list(G = list(G1 = list(V = diag(2),
prior2 nu = 2, alpha.mu = rep(0, 2),
alpha.V = diag(2) * 10)),
R = list(V = diag(2), fix = 1)
)
system.time(
<- MCMCglmm(cbind(red_pelage_body_limbs, red_pelage_head) ~ trait - 1,
mcmc_BPB1 random = ~ us(trait):PhyloName,
rcov = ~ us(trait):units,
family = c("threshold", "threshold"),
data = p_dat,
prior = prior2,
ginverse = list(PhyloName = inv.phylo$Ainv),
nitt = 13000*25,
thin = 10*25,
burnin = 3000*25
)
)
system.time(
<- MCMCglmm(cbind(red_pelage_body_limbs, red_pelage_head) ~ cSocial_group_size:trait + trait - 1,
mcmc_BPB2 random = ~ us(trait):PhyloName,
rcov = ~ us(trait):units,
family = c("threshold", "threshold"),
data = p_dat,
prior = prior2,
ginverse = list(PhyloName = inv.phylo$Ainv),
nitt = 13000*55,
thin = 10*55,
burnin = 3000*55
)
)
system.time(
<- MCMCglmm(cbind(red_pelage_body_limbs, red_pelage_head) ~ cSocial_group_size:trait + diurnal :trait + trait - 1,
mcmc_BPB3 random = ~ us(trait):PhyloName,
rcov = ~ us(trait):units,
family = c("threshold", "threshold"),
data = p_dat,
prior = prior2,
ginverse = list(PhyloName = inv.phylo$Ainv),
nitt = 13000*100,
thin = 10*100,
burnin = 3000*100
)
)
<- ape::vcv.phylo(p_tree, corr = TRUE)
A <- bf(mvbind(red_pelage_body_limbs, red_pelage_head) ~ 1 +
formula_biPv1 1|a|gr(PhyloName, cov = A))
(
)
<- default_prior(formula_biPv1,
default_prior2 data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "probit")
)
system.time(
<- brm(formula = formula_biPv1,
brms_BPB1 data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "probit"),
prior = default_prior2,
iter = 25000,
warmup = 5000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95)
)
)
<- bf(mvbind(red_pelage_body_limbs, red_pelage_head) ~ cSocial_group_size +
formula_biPv2 1|a|gr(PhyloName, cov = A))
(# set_rescor(TRUE)
)
<- default_prior(formula_biPv2,
default_prior3 data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "probit")
)
system.time(
<- brm(formula = formula_biPv2,
brms_BPB2 data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "probit"),
prior = default_prior3,
iter = 35000,
warmup = 15000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95)
)
)
<- bf(mvbind(red_pelage_body_limbs, red_pelage_head) ~ cSocial_group_size + diurnal +
formula_biPv3 1|a|gr(PhyloName, cov = A))
(# set_rescor(TRUE)
)
<- default_prior(formula_biPv3,
default_prior4 data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "probit")
)
system.time(
<- brm(formula = formula_biPv3,
brms_BPB3 data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "probit"),
prior = default_prior4,
iter = 35000,
warmup = 25000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95)
) )
Logit link
# prior setting for mcmcglmm
<- inverseA(p_tree, nodes = "ALL", scale = TRUE)
inv.phylo <- list(R = list(V = diag(2), fix = 1),
prior G = list(G1 = list(V = diag(2), nu = 2, alpha.mu = rep(0, 2),
alpha.V = diag(2) * 10)
)
)
# function to run mcmcglmm
<- function(formula, data, prior, inv_phylo) {
run_mcmcglmm MCMCglmm(
fixed = formula,
random = ~ us(trait):PhyloName,
rcov = ~ us(trait):units,
family = c("categorical", "categorical"),
data = data,
prior = prior,
ginverse = list(PhyloName = inv_phylo),
nitt = 13000*2500, # original *250
thin = 10*2500, # original *250
burnin = 3000*2500 # original *250
)
}
# model list - mcmcglmm
<- list(
mcmcglmm_formulas formula1 = cbind(red_pelage_body_limbs, red_pelage_head) ~ trait - 1,
formula2 = cbind(red_pelage_body_limbs, red_pelage_head) ~ cSocial_group_size:trait + trait - 1,
formula3 = cbind(red_pelage_body_limbs, red_pelage_head) ~ cSocial_group_size:trait + activity_cycle:trait + trait - 1
)
#### brms ####
# prior setting for brms
<- ape::vcv.phylo(p_tree, corr = TRUE)
A
<- function(formula, data, A) {
default_priors default_prior(formula,
data = data,
data2 = list(A = A),
family = bernoulli(link = "logit")
)
}
# function to run brms
<- function(formula, data, A, prior) {
run_brms brm(
formula = formula,
data = data,
data2 = list(A = A),
family = bernoulli(link = "logit"),
prior = prior,
iter = 10000, # original 4000
warmup = 5000, # original 3000
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95)
)
}
# model list
<- list(
brms_formulas formula1 = bf(mvbind(red_pelage_body_limbs, red_pelage_head) ~ 1 + (1|a|gr(PhyloName, cov = A))),
formula2 = bf(mvbind(red_pelage_body_limbs, red_pelage_head) ~ cSocial_group_size + (1|a|gr(PhyloName, cov = A)) ),
formula3 = bf(mvbind(red_pelage_body_limbs, red_pelage_head) ~ cSocial_group_size + activity_cycle + (1|a|gr(PhyloName, cov = A)))
)
#### run mcmcglm and brms ####
# mcmcglmm
<- lapply(seq_along(mcmcglmm_formulas), function(i) {
mcmcglmm_results <- mcmcglmm_formulas[[i]]
formula <- run_mcmcglmm(formula = formula, data = p_dat, prior = prior, inv_phylo = inv.phylo$Ainv)
model
saveRDS(model, file = here("Rdata_tutorial", "binomial", "bivariate", paste0("mcmcglmm_BLB_v2", i, ".rds")))
return(model)
})
# brms
<- lapply(seq_along(brms_formulas), function(i) {
brms_results <- brms_formulas[[i]]
formula <- default_prior(formula,
prior2 data = p_dat,
data2 = list(A = A),
family = bernoulli(link = "logit"))
<- run_brms(formula = formula, data = p_dat, A = A, prior = prior2)
model
saveRDS(model, file = here("Rdata_tutorial", "binomial", "bivariate", paste0("brms_BLB_v2", i, ".rds")))
return(model)
})
#|echo: false
#|
<- readRDS(here("Rdata_tutorial", "binomial", "bivariate", "mcmcglmm_mbiP_biv1.rds"))
mcmc_BPB1 <- readRDS(here("Rdata_tutorial", "binomial", "bivariate", "mcmcglmm_mbiP_biv2.rds"))
mcmc_BPB2 <- readRDS(here("Rdata_tutorial", "binomial", "bivariate", "mcmcglmm_mbiP_biv3.rds"))
mcmc_BPB3 <- readRDS(here("Rdata_tutorial", "binomial", "bivariate", "brms_biPv1.rds"))
brms_BPB1 <- readRDS(here("Rdata_tutorial", "binomial", "bivariate", "brms_biPv2.rds"))
brms_BPB2 <- readRDS(here("Rdata_tutorial", "binomial", "bivariate", "brms_biPv3.rds"))
brms_BPB3 <- readRDS(here("Rdata_tutorial", "binomial", "bivariate", "mcmcglmm_BLB1.rds"))
mcmc_BLB1 <- readRDS(here("Rdata_tutorial", "binomial", "bivariate", "mcmcglmm_BLB_v2_2.rds"))
mcmc_BLB2 <- readRDS(here("Rdata_tutorial", "binomial", "bivariate", "mcmcglmm_BLB_v2_3.rds"))
mcmc_BLB3 <- readRDS(here("Rdata_tutorial", "binomial", "bivariate", "brms_BLB1.rds"))
brms_BLB1 <- readRDS(here("Rdata_tutorial", "binomial", "bivariate", "brms_BLB2.rds"))
brms_BLB2 <- readRDS(here("Rdata_tutorial", "binomial", "bivariate", "brms_BLB_v2_3.rds")) brms_BLB3
the results are…
3. Ordinal (threshold) models (ordered multinomial model)
In the ordinal threshold model and nominal model, we will show 2 realistic examples. As we explained in the BOX2
on main text, MCMCglmm
and brms
take different parametrisation for the threshold model. At first sight, the outputs about thresholds (cutpoints) appear to be quite difficult. But do not worry about it,we will guide you step by step to understand and interpret these outputs. By the end of this section, you will be able to compare the threshold estimates from both models and understand their implications.
Explanation of dataset
We used the dataset from AVONET (Tobias et al. (2022)) and the phylogenetic tree from BirdTree.
For first example, we tested the relationships between migration level (response variable) and body mass and habitat density in birds of prey (136 species).
For second example, we investigated the difference of habitat density within family Phasianidae (179 species) are affected by tail length and diet (herbivore).
Datasets for example 1
<- read.csv(here("data", "bird body mass", "accipitridae_sampled.csv"))
dat <- dat %>%
dat mutate(across(c(Trophic.Level, Trophic.Niche, Primary.Lifestyle, Migration, Habitat, Species.Status), as.factor))
# rename: numbers to descriptive name
$Migration_ordered <- factor(dat$Migration, levels = c("sedentary", "partially_migratory", "migratory"), ordered = TRUE)
dat<- dat %>%
dat mutate(Habitat.Density = case_when(
== 1 ~ "dense",
Habitat.Density == 2 ~ "semi-open",
Habitat.Density == 3 ~ "open",
Habitat.Density TRUE ~ as.character(Habitat.Density)
))
table(dat$Habitat.Density)
$logMass <- log(dat$Mass)
dathist(dat$logMass)
boxplot(logMass ~ Habitat.Density, data = dat)
summary(dat)
<- read.nexus(here("data", "bird body mass", "accipitridae_sampled.nex"))
trees <- trees[[1]]
tree
summary(dat)
Datasets for example 2
<- read.csv(here("data", "bird body mass", "9993spp_clearned.csv"))
dat <- dat %>%
dat mutate(across(c(Trophic.Level, Trophic.Niche,
Primary.Lifestyle, Migration, Habitat, Habitat.Density , Species.Status), as.factor),Habitat.Density = factor(Habitat.Density, ordered = TRUE))
<- dat %>%
Family_HD group_by(Family, Habitat.Density) %>%
tally() %>%
group_by(Family) %>%
filter(all(n >= 5)) %>%
ungroup()
<- dat %>%
Family_HD filter(Family %in% c("Phasianidae", "Ploceidae", "Sturnidae")) %>%
count(Family, Habitat.Density)
<- dat %>%
Phasianidae_dat filter(Family == "Phasianidae") %>%
mutate(Habitat.Density = case_when(
== 1 ~ "dense",
Habitat.Density == 2 ~ "semi-open",
Habitat.Density == 3 ~ "open"
Habitat.Density
),Habitat.Density = factor(Habitat.Density, ordered = TRUE)
)<- Phasianidae_dat %>%
Phasianidae_dat mutate(Habitat_Forest = ifelse(Habitat == "Forest", 1, 0),
Habitat_Forest = replace_na(Habitat_Forest, 0))
$cTail_length <- scale(log(Phasianidae_dat$Tail.Length), center = TRUE, scale = FALSE)
Phasianidae_dat
View(Phasianidae_dat)
<- read.nexus(here("data", "bird body mass", "Phasianidae.nex"))
Phasianidae_tree <- Phasianidae_tree[[1]] tree
Example 1
Intercept-only model
MCMCglmm
When using MCMCglmm for ordinal data, you can choose between two options for the family
parameter: "threshold"
and "ordinal"
. These options differ in how they handle residual variances and error distributions.
We generally recommend using family = "threshold"
for following reasons - directly corresponds to the standard probit regression with a variance of 1, no correcting of outputs is necessary, and it provides a more computationally efficient approach.
We show both models through this section;
<- inverseA(tree, nodes = "ALL", scale = TRUE)
inv_phylo <- list(R = list(V = 1, fix = 1),
prior1 G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)))
system.time(
<- MCMCglmm(Migration_ordered ~ 1,
mcmcglmm_T1_1 random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "threshold",
data = dat,
prior = prior1,
nitt = 13000*40,
thin = 10*40,
burnin = 3000*40
)
)
system.time(
<- MCMCglmm(Migration_ordered ~ 1,
mcmcglmm_O1_1 random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "ordinal",
data = dat,
prior = prior1,
nitt = 13000*40,
thin = 10*40,
burnin = 3000*40
) )
For ordinal model, we need the correction;
<- 1
c2 <- model$Sol / sqrt(1+c2) # for fixed effects
res_1 <- model$VCV / (1+c2) # for variance components res_2
summary(mcmcglmm_T1_1) # 95%HPD Interval
Iterations = 120001:519601
Thinning interval = 400
Sample size = 1000
DIC: 297.9054
G-structure: ~Phylo
post.mean l-95% CI u-95% CI eff.samp
Phylo 1.43 1.196e-06 4.766 1000
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 1 1 1 0
Location effects: Migration_ordered ~ 1
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) 0.3178 -0.7651 1.5749 1000 0.464
Cutpoints:
post.mean l-95% CI u-95% CI eff.samp
cutpoint.traitMigration_ordered.1 1.011 0.7089 1.408 1000
posterior_summary(mcmcglmm_T1_1$VCV) # 95%CI
Estimate Est.Error Q2.5 Q97.5
Phylo 1.429685 2.187129 0.0020836 6.25389
units 1.000000 0.000000 1.0000000 1.00000
posterior_summary(mcmcglmm_T1_1$Sol)
Estimate Est.Error Q2.5 Q97.5
(Intercept) 0.3178127 0.5848262 -0.8350972 1.513238
posterior_summary(mcmcglmm_T1_3$CP)
Estimate Est.Error Q2.5 Q97.5
cutpoint.traitMigration_ordered.1 1.116292 0.2420588 0.7754895 1.677832
summary(mcmcglmm_O1_1) # 95%HPD Interval
Iterations = 120001:519601
Thinning interval = 400
Sample size = 1000
DIC: 267.7801
G-structure: ~Phylo
post.mean l-95% CI u-95% CI eff.samp
Phylo 2.94 1.447e-05 10.78 779.5
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 1 1 1 0
Location effects: Migration_ordered ~ 1
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) 0.4319 -1.0821 2.2693 889.5 0.456
Cutpoints:
post.mean l-95% CI u-95% CI eff.samp
cutpoint.traitMigration_ordered.1 1.43 0.9309 1.959 817.1
posterior_summary(mcmcglmm_O1_1$VCV) # 95%CI
Estimate Est.Error Q2.5 Q97.5
Phylo 2.940261 4.749804 0.003052149 13.92029
units 1.000000 0.000000 1.000000000 1.00000
posterior_summary(mcmcglmm_O1_1$Sol)
Estimate Est.Error Q2.5 Q97.5
(Intercept) 0.4319314 0.7973933 -1.170937 2.245297
<- 1
c2 <- mcmcglmm_O1_1$Sol / sqrt(1+c2) # for fixed effects
res_1 <- mcmcglmm_O1_1$VCV / (1+c2) # for variance components
res_2
<- mcmcglmm_O1_1$CP / sqrt(1+c2) # for cutpoint
res_3 posterior_summary(res_1)
Estimate Est.Error Q2.5 Q97.5
(Intercept) 0.3054216 0.5638422 -0.8279773 1.587665
posterior_summary(res_2)
Estimate Est.Error Q2.5 Q97.5
Phylo 1.470131 2.374902 0.001526075 6.960145
units 0.500000 0.000000 0.500000000 0.500000
posterior_summary(res_3)
Estimate Est.Error Q2.5 Q97.5
cutpoint.traitMigration_ordered.1 1.011203 0.1934441 0.7019355 1.462073
the ordinal
models show a smaller effect size compared to the threshold model, but the difference is minor and should not be an issue. The convergences appear to be satisfactory for both models. Also, it is well-known that ordinal models in MCMCglmm tend to be more difficult to converge compared to the threshold models. Despite this, the results obtained from both models are largely consistent, showing minimal difference. As you can see here, residual variance is fixed 1 - the model did not estimate the residual variance (R-structure
).
Look at the output more closely - only one cutpoint is provided although our response variable is three ordered categorical traits! As we have explained in main text, MCMCglmm implicitly assumes that the first cut point is always set to 0, making the intercept represent the centre of the distribution.
brms
The family argument is set to cumulative()
with link = "probit"
, which uses the standard probit parametrisation where the residual variance is fixed at 1. Since this is the standard parametrisation for ordinal probit regression, no correcting of coefficients is necessary.
<- ape::vcv.phylo(tree, corr = TRUE)
A
<- default_prior(
default_priors ~ 1 + (1 | gr(Phylo, cov = A)),
Migration_ordered data = dat,
family = cumulative(link = "probit"),
data2 = list(A = A)
)
# Fit the model
system.time(
<- brm(
brms_OT1_1 formula = Migration_ordered ~ 1 + (1 | gr(Phylo, cov = A)),
data = dat,
family = cumulative(link = "probit"),
data2 = list(A = A),
prior = default_priors,
iter = 19000,
warmup =16000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.99),
) )
summary(brms_OT1_1)
Family: cumulative
Links: mu = probit; disc = identity
Formula: Migration_ordered ~ 1 + (1 | gr(Phylo, cov = A))
Data: dat (Number of observations: 136)
Draws: 2 chains, each with iter = 20000; warmup = 10000; thin = 1;
total post-warmup draws = 20000
Multilevel Hyperparameters:
~Phylo (Number of levels: 136)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.85 0.61 0.04 2.28 1.00 1594 3377
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1] -0.32 0.47 -1.36 0.61 1.00 12133 7267
Intercept[2] 0.66 0.47 -0.24 1.75 1.00 11680 8235
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc 1.00 0.00 1.00 1.00 NA NA NA
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
In brms, now you can see the two intercepts and no cutpoint in the output! In fact, the intercept is described each cutpoint. However, the results (how many species fall into each category) show little difference between MCMCglmm and brms, as can be seen below.
Probabilities
We can obtain the probabilities for each category (migrate level: sedentary, partially migratory, migratory) using pnorm()
. We made simple functions to calculate these and use them for the threshold models in MCMcglmm and brms.
# l = intercept
<- function(cutpoint0, cutpoint1, l) {
calculate_probabilities
<- pnorm(cutpoint0 - l)
category1_prob <- pnorm(cutpoint1 - l) - pnorm(cutpoint0 - l)
category2_prob <- 1 - pnorm(cutpoint1 - l)
category3_prob
return(c(sedentary = category1_prob,
partially_migratory = category2_prob,
migratory = category3_prob)
)
}
<- calculate_probabilities(0, 1.01, 0.3178)
probabilities_mcmcglmm print(probabilities_mcmcglmm)
sedentary partially_migratory migratory
0.3753183 0.3802758 0.2444059
<- calculate_probabilities(-0.32, 0.66, 0)
probabilities_brms print(probabilities_brms)
sedentary partially_migratory migratory
0.3744842 0.3708889 0.2546269
The results obtained in MCMCglmm are as follows: Sedentary: 0.3753183, Partially migratory: 0.3802758, and Migratory: 0.2444059. In brms, the results are Sedentary: 0.3744842, Partially migratory: 0.3708889, and Migratory: 0.2546269. As shown, although MCMCglmm and brms use parametrisations of and , respectively, the results are very similar (see Box 2 in main text).
Phylogenetic signals
# MCMCglmm
<- ((mcmcglmm_T1_1$VCV[, "Phylo"]) / (mcmcglmm_T1_1$VCV[, "Phylo"] + 1))
phylo_signal_mcmcglmm_T %>% mean() phylo_signal_mcmcglmm_T
[1] 0.4172334
%>% quantile(probs = c(0.025, 0.5, 0.975)) phylo_signal_mcmcglmm_T
2.5% 50% 97.5%
0.002079267 0.430888403 0.862142904
<- ((mcmcglmm_O1_1$VCV[, "Phylo"]/(1+c2)) / (mcmcglmm_O1_1$VCV[, "Phylo"]/(1+c2) + 1))
phylo_signal_mcmcglmm_O %>% mean() phylo_signal_mcmcglmm_O
[1] 0.4080353
%>% quantile(probs = c(0.025,0.5,0.975)) phylo_signal_mcmcglmm_O
2.5% 50% 97.5%
0.001523749 0.417232582 0.874374073
# brms
<- brms_OT1_1 %>% as_tibble() %>%
phylo_signal_brms_OT ::select(Sigma_phy = sd_Phylo__Intercept) %>%
dplyrmutate(lambda = Sigma_phy^2 / (Sigma_phy^2 + 1)) %>%
pull(lambda)
%>% mean() phylo_signal_brms_OT
[1] 0.3692979
%>% quantile(probs = c(0.025,0.5,0.975)) phylo_signal_brms_OT
2.5% 50% 97.5%
0.001264342 0.359406642 0.838180812
In both MCMCglmm (threshold and ordinal models) and brms, the phylogenetic signal is estimated to be moderate. The moderately high phylogenetic signal suggests that closely related species tend to exhibit similar migratory behaviours, indicating that phylogeny plays a role in shaping these traits. However, this also implies that additional factors, such as environmental or ecological influences, likely contribute to the variation in migratory behavior. This balance between phylogenetic constraints and external influences may reflect evolutionary flexibility in migratory strategies among species.
One continuous explanatory variable model
The model using MCMCglmm
is
<- inverseA(tree, nodes = "ALL", scale = TRUE)
inv_phylo <- list(R = list(V = 1, fix = 1),
prior1 G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)))
system.time(
<- MCMCglmm(Migration_ordered ~ logMass,
mcmcglmm_T1_2 random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "threshold",
data = dat,
prior = prior1,
nitt = 13000*45,
thin = 10*45,
burnin = 3000*45
)
)
system.time(
<- MCMCglmm(Migration_ordered ~ logMass,
mcmcglmm_O1_2 random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "ordinal",
data = dat,
prior = prior1,
nitt = 13000*60,
thin = 10*60,
burnin = 3000*60
) )
For brms
…
<- ape::vcv.phylo(tree,corr = TRUE)
A
<- default_prior(
default_priors2 ~ logMass + (1 | gr(Phylo, cov = A)),
Migration_ordered data = dat,
family = cumulative(link = "probit"),
data2 = list(A = A)
)
# Fit the model
system.time(
<- brm(
brms_OT1_2 formula = Migration_ordered ~ logMass + (1 | gr(Phylo, cov = A)),
data = dat,
family = cumulative(link = "probit"),
data2 = list(A = A),
prior = default_priors2,
iter = 20000,
warmup = 10000,
thin = 1,
chain = 2,
core = 2,
control = list(adapt_delta = 0.99)
) )
# MCMCglmm
summary(mcmcglmm_T1_2)
Iterations = 135001:584551
Thinning interval = 450
Sample size = 1000
DIC: 300.7449
G-structure: ~Phylo
post.mean l-95% CI u-95% CI eff.samp
Phylo 2.068 6.389e-06 7.225 1000
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 1 1 1 0
Location effects: Migration_ordered ~ logMass
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) 0.88928 -1.69994 3.49007 1000 0.436
logMass -0.08033 -0.37129 0.21982 1000 0.550
Cutpoints:
post.mean l-95% CI u-95% CI eff.samp
cutpoint.traitMigration_ordered.1 1.052 0.6441 1.469 897.7
posterior_summary(mcmcglmm_T1_2$Sol)
Estimate Est.Error Q2.5 Q97.5
(Intercept) 0.88927720 1.2935980 -1.5704745 3.6731840
logMass -0.08033062 0.1488659 -0.3928655 0.2076132
posterior_summary(mcmcglmm_T1_2$VCV)
Estimate Est.Error Q2.5 Q97.5
Phylo 2.068366 3.804493 0.00210611 11.97129
units 1.000000 0.000000 1.00000000 1.00000
posterior_summary(mcmcglmm_T1_3$CP)
Estimate Est.Error Q2.5 Q97.5
cutpoint.traitMigration_ordered.1 1.116292 0.2420588 0.7754895 1.677832
summary(mcmcglmm_O1_2) # 95%HPD Interval
Iterations = 180001:779401
Thinning interval = 600
Sample size = 1000
DIC: 268.7424
G-structure: ~Phylo
post.mean l-95% CI u-95% CI eff.samp
Phylo 4.133 8.171e-05 14.26 1000
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 1 1 1 0
Location effects: Migration_ordered ~ logMass
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) 1.2622 -2.7953 4.3184 1000 0.432
logMass -0.1087 -0.5494 0.3449 1000 0.574
Cutpoints:
post.mean l-95% CI u-95% CI eff.samp
cutpoint.traitMigration_ordered.1 1.489 0.9695 2.098 1000
posterior_summary(mcmcglmm_O1_2$VCV) # 95%CI
Estimate Est.Error Q2.5 Q97.5
Phylo 4.132637 6.62053 0.004598829 19.22872
units 1.000000 0.00000 1.000000000 1.00000
posterior_summary(mcmcglmm_O1_2$Sol)
Estimate Est.Error Q2.5 Q97.5
(Intercept) 1.262162 1.8674403 -2.1994721 5.0542830
logMass -0.108659 0.2176232 -0.5751705 0.3314989
<- 1
c2 <- mcmcglmm_O1_2$Sol / sqrt(1+c2) # for fixed effects
res_1 <- mcmcglmm_O1_2$VCV / (1+c2) # for variance components
res_2 <- mcmcglmm_O1_2$CP / sqrt(1+c2)
res_3 posterior_summary(res_1)
Estimate Est.Error Q2.5 Q97.5
(Intercept) 0.89248312 1.3204797 -1.555262 3.5739178
logMass -0.07683353 0.1538828 -0.406707 0.2344051
posterior_summary(res_2)
Estimate Est.Error Q2.5 Q97.5
Phylo 2.066318 3.310265 0.002299414 9.614361
units 0.500000 0.000000 0.500000000 0.500000
posterior_summary(res_3)
Estimate Est.Error Q2.5 Q97.5
cutpoint.traitMigration_ordered.1 1.053063 0.2218614 0.7173326 1.575309
#brms
summary(brms_OT1_2)
Family: cumulative
Links: mu = probit; disc = identity
Formula: Migration_ordered ~ logMass + (1 | gr(Phylo, cov = A))
Data: dat (Number of observations: 136)
Draws: 2 chains, each with iter = 20000; warmup = 10000; thin = 1;
total post-warmup draws = 20000
Multilevel Hyperparameters:
~Phylo (Number of levels: 136)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.93 0.66 0.04 2.45 1.00 1192 3611
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1] -0.85 1.07 -3.16 1.26 1.00 15424 9201
Intercept[2] 0.15 1.06 -2.03 2.33 1.00 16388 10923
logMass -0.07 0.13 -0.34 0.19 1.00 18410 10854
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc 1.00 0.00 1.00 1.00 NA NA NA
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Both models (brms and MCMCglmm) indicate body mass (logMass) does not have a statistically significant effect on the migration order.
<- calculate_probabilities(0, 0.88, 1.05)
probabilities_mcmcglmm print(probabilities_mcmcglmm)
sedentary partially_migratory migratory
0.1468591 0.2856460 0.5674949
<- calculate_probabilities(-0.85, 0.15, 0)
probabilities_brms print(probabilities_brms)
sedentary partially_migratory migratory
0.1976625 0.3619551 0.4403823
The coefficient of logMass is negative, but the actual results show an increase in the probability of being migratory. This is likely because the effect of logMass is not statistically significant, and due to wide credible intervals, the probabilities may shift due to sampling variability. Therefore, the results should be interpreted with caution, as they may reflect uncertainty in the data rather than a true underlying effect of logMass on migratory tendencies.
One continuous and one categorical explanatory variable model
For the model MCMCglmm the following - here, we use the gelman,prior()
function for prior setting for fixed effects to ensure weakly informative prior that help with model convergence. You can find further information in Chapter 4. Nominal models.
<- inverseA(tree, nodes = "ALL", scale = TRUE)
inv_phylo <- list(B = list(mu = rep(0,4),
prior1 V = gelman.prior(~logMass + Habitat.Density, data = dat, scale = sqrt(1+1))),
R = list(V = 1, fix = 1),
G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)))
system.time(
<- MCMCglmm(Migration_ordered ~ logMass + Habitat.Density,
mcmcglmm_T1_3 random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "threshold",
data = dat,
prior = prior1,
nitt = 13000*60,
thin = 10*60,
burnin = 3000*60
)
)
system.time(
<- MCMCglmm(Migration_ordered ~ logMass + Habitat.Density,
mcmcglmm_O1_3 random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "ordinal",
data = dat,
prior = prior1,
nitt = 13000*150,
thin = 10*150,
burnin = 3000*150
) )
For brms,
<- ape::vcv.phylo(tree, cor = TRUE)
A
<- default_prior(
default_priors3 ~ logMass + Habitat.Density + (1 | gr(Phylo, cov = A)),
Migration_ordered data = dat,
family = cumulative(link = "probit"),
data2 = list(A = A)
)
system.time(
<- brm(
brm_OT1_3 formula = Migration_ordered ~ logMass + Habitat.Density + (1 | gr(Phylo, cov = A)),
data = dat,
family = cumulative(link = "probit"),
data2 = list(A = A),
prior = default_priors3,
iter = 6500,
warmup = 5500,
thin = 1,
chain = 2,
core = 2,
control = list(adapt_delta = 0.99)
) )
# MCMCglmm
summary(mcmcglmm_T1_3)
Iterations = 180001:779401
Thinning interval = 600
Sample size = 1000
DIC: 289.3043
G-structure: ~Phylo
post.mean l-95% CI u-95% CI eff.samp
Phylo 1.988 4.136e-07 6.227 758.7
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 1 1 1 0
Location effects: Migration_ordered ~ logMass + Habitat.Density
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) 1.3097 -1.1727 3.6674 1000.0 0.254
logMass -0.2664 -0.5703 0.0479 891.2 0.058 .
Habitat.Densityopen 1.3154 0.4916 2.0664 1054.1 <0.001 ***
Habitat.Densitysemi-open 0.3955 -0.3020 0.9998 1000.0 0.218
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Cutpoints:
post.mean l-95% CI u-95% CI eff.samp
cutpoint.traitMigration_ordered.1 1.116 0.6674 1.503 1022
posterior_summary(mcmcglmm_T1_3$VCV)
Estimate Est.Error Q2.5 Q97.5
Phylo 1.987738 4.411821 0.009463594 9.04495
units 1.000000 0.000000 1.000000000 1.00000
posterior_summary(mcmcglmm_T1_3$Sol)
Estimate Est.Error Q2.5 Q97.5
(Intercept) 1.3096841 1.2767628 -1.0290056 3.809598026
logMass -0.2664037 0.1660313 -0.6334123 0.007099615
Habitat.Densityopen 1.3153548 0.4385812 0.6134014 2.247115510
Habitat.Densitysemi-open 0.3954618 0.3425752 -0.2200316 1.117789403
posterior_summary(mcmcglmm_T1_3$CP)
Estimate Est.Error Q2.5 Q97.5
cutpoint.traitMigration_ordered.1 1.116292 0.2420588 0.7754895 1.677832
summary(mcmcglmm_O1_3) # 95%HPD Interval
Iterations = 450001:1948501
Thinning interval = 1500
Sample size = 1000
DIC: 255.0597
G-structure: ~Phylo
post.mean l-95% CI u-95% CI eff.samp
Phylo 4.142 1.314e-06 14.07 839
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 1 1 1 0
Location effects: Migration_ordered ~ logMass + Habitat.Density
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) 1.87956 -1.61645 5.74879 1000.0 0.262
logMass -0.38074 -0.82066 0.06748 1112.4 0.078 .
Habitat.Densityopen 1.85563 0.72168 3.01111 842.4 <0.001 ***
Habitat.Densitysemi-open 0.58208 -0.29301 1.48943 1000.0 0.190
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Cutpoints:
post.mean l-95% CI u-95% CI eff.samp
cutpoint.traitMigration_ordered.1 1.579 1.064 2.204 885.1
posterior_summary(mcmcglmm_O1_3$VCV) # 95%CI
Estimate Est.Error Q2.5 Q97.5
Phylo 4.142472 9.805007 0.03628788 19.59177
units 1.000000 0.000000 1.00000000 1.00000
posterior_summary(mcmcglmm_O1_3$Sol)
Estimate Est.Error Q2.5 Q97.5
(Intercept) 1.8795575 1.8603927 -1.5225125 6.0417116
logMass -0.3807427 0.2337957 -0.8716018 0.0461751
Habitat.Densityopen 1.8556261 0.6258944 0.8289186 3.1677050
Habitat.Densitysemi-open 0.5820800 0.4612845 -0.2342549 1.5671716
<- 1
c2 <- mcmcglmm_O1_3$Sol / sqrt(1+c2) # for fixed effects
res_1 <- mcmcglmm_O1_3$VCV / (1+c2) # for variance components
res_2 <- mcmcglmm_O1_3$CP / sqrt(1+c2)
res_3 posterior_summary(res_1)
Estimate Est.Error Q2.5 Q97.5
(Intercept) 1.3290479 1.3154963 -1.0765789 4.27213525
logMass -0.2692257 0.1653185 -0.6163155 0.03265073
Habitat.Densityopen 1.3121258 0.4425742 0.5861339 2.23990572
Habitat.Densitysemi-open 0.4115927 0.3261774 -0.1656433 1.10815765
posterior_summary(res_2)
Estimate Est.Error Q2.5 Q97.5
Phylo 2.071236 4.902504 0.01814394 9.795886
units 0.500000 0.000000 0.50000000 0.500000
posterior_summary(res_3)
Estimate Est.Error Q2.5 Q97.5
cutpoint.traitMigration_ordered.1 1.116852 0.2491617 0.7783082 1.6727
# brms
summary(brms_OT1_3)
Family: cumulative
Links: mu = probit; disc = identity
Formula: Migration_ordered ~ logMass + Habitat.Density + (1 | gr(Phylo, cov = A))
Data: dat (Number of observations: 136)
Draws: 2 chains, each with iter = 6500; warmup = 5500; thin = 1;
total post-warmup draws = 2000
Multilevel Hyperparameters:
~Phylo (Number of levels: 136)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.91 0.58 0.08 2.31 1.01 231 460
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
Intercept[1] -1.34 1.10 -3.63 0.73 1.00 1364
Intercept[2] -0.27 1.09 -2.45 1.85 1.00 1523
logMass -0.26 0.14 -0.56 0.01 1.00 1300
Habitat.Densityopen 1.25 0.38 0.59 2.08 1.00 533
Habitat.DensitysemiMopen 0.37 0.31 -0.20 1.01 1.00 1236
Tail_ESS
Intercept[1] 1095
Intercept[2] 1156
logMass 1172
Habitat.Densityopen 941
Habitat.DensitysemiMopen 926
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc 1.00 0.00 1.00 1.00 NA NA NA
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The species living in open environments have significantly higher migration levels compared to those living in dense environments. No significant difference in migration levels was observed between species living in semi-open and dense environments.logMass
does not have a significant impact on migration levels, suggesting that Habitat.Density
is the primary factor influencing migration levels.
<- calculate_probabilities(0, 1.116, 1.3097)
probabilities_mcmcglmm print(probabilities_mcmcglmm)
sedentary partially_migratory migratory
0.09514867 0.32805672 0.57679460
<- calculate_probabilities(-1.33, -0.26, 0)
probabilities_brms print(probabilities_brms)
sedentary partially_migratory migratory
0.09175914 0.30567275 0.60256811
The probability of the Migratory category is the highest, around 60%. This reflects the higher migration likelihood, especially for species living in open environments, which leads to a higher probability of being migratory.
Example 2
The second example is similar to the first one, but we will use a different dataset and different explanatory variables. The response variable is Habitat.Density
(1 = Dense habitats, 2 = Semi-open habitats, 3 = Open habitats) and the explanatory variables are cTail_length
(continuous) and Diet
(categorical).
Intercept only model
<- inverseA(tree, nodes = "ALL", scale = TRUE)
inv_phylo <- list(R = list(V = 1, fix = 1),
prior1 G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)))
system.time(
<- MCMCglmm(Habitat.Density ~ 1,
mcmcglmm_T2_1 random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "threshold",
data = dat,
prior = prior1,
nitt = 13000*100,
thin = 10*10,
burnin = 3000*100
)
)
system.time(
<- MCMCglmm(Habitat.Density ~ 1,
mcmcglmm_O2_1 random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "ordinal",
data = dat,
prior = prior1,
nitt = 13000*40,
thin = 10*40,
burnin = 3000*40
)
)
<- ape::vcv.phylo(tree, corr = TRUE)
A
<- default_prior(
default_priors ~ 1 + (1 | gr(Phylo, cov = A)),
Habitat.Density data = dat,
family = cumulative(link = "probit"),
data2 = list(A = A)
)
# Fit the model
system.time(
<- brm(
brms_O2_1 formula = Habitat.Density ~ 1 + (1 | gr(Phylo, cov = A)),
data = dat,
family = cumulative(link = "probit"),
data2 = list(A = A),
prior = default_priors,
iter = 19000,
warmup =16000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.99),
) )
# MCMCglmm
summary(mcmcglmm_T2_1)
Iterations = 195001:844351
Thinning interval = 650
Sample size = 1000
DIC: 280.497
G-structure: ~Phylo
post.mean l-95% CI u-95% CI eff.samp
Phylo 8.552 1.018 20.69 726.6
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 1 1 1 0
Location effects: Habitat.Density ~ 1
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) -0.2226 -3.6231 2.5774 1000 0.9
Cutpoints:
post.mean l-95% CI u-95% CI eff.samp
cutpoint.traitHabitat.Density.1 1.609 1.055 2.36 829.5
posterior_summary(mcmcglmm_T2_1$VCV)
Estimate Est.Error Q2.5 Q97.5
Phylo 8.552458 6.42742 2.217233 24.88162
units 1.000000 0.00000 1.000000 1.00000
posterior_summary(mcmcglmm_T2_1$Sol)
Estimate Est.Error Q2.5 Q97.5
(Intercept) -0.2226369 1.574717 -3.545592 2.680834
summary(mcmcglmm_O2_1) # 95%HPD Interval
Iterations = 1500001:6495001
Thinning interval = 5000
Sample size = 1000
DIC: 244.2456
G-structure: ~Phylo
post.mean l-95% CI u-95% CI eff.samp
Phylo 15.59 2.906 36.05 844
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 1 1 1 0
Location effects: Habitat.Density ~ 1
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) -0.210 -3.987 4.017 1000 0.904
Cutpoints:
post.mean l-95% CI u-95% CI eff.samp
cutpoint.traitHabitat.Density.1 2.221 1.418 3.062 1000
posterior_summary(mcmcglmm_O2_1$VCV) # 95%CI
Estimate Est.Error Q2.5 Q97.5
Phylo 15.58885 10.86135 3.928927 43.26684
units 1.00000 0.00000 1.000000 1.00000
posterior_summary(mcmcglmm_O2_1$Sol)
Estimate Est.Error Q2.5 Q97.5
(Intercept) -0.2100358 2.077207 -4.373171 3.795935
<- 1
c2 <- mcmcglmm_T2_1$Sol / sqrt(1+c2) # for fixed effects
res_1 <- mcmcglmm_T2_1$VCV / (1+c2) # for variance components
res_2 posterior_summary(res_1)
Estimate Est.Error Q2.5 Q97.5
(Intercept) -0.157428 1.113493 -2.507112 1.895636
posterior_summary(res_2)
Estimate Est.Error Q2.5 Q97.5
Phylo 4.276229 3.21371 1.108617 12.44081
units 0.500000 0.00000 0.500000 0.50000
# brms
summary(brms_OT2_1)
Family: cumulative
Links: mu = probit; disc = identity
Formula: Habitat.Density ~ 1 + (1 | gr(Phylo, cov = A))
Data: Phasianidae_dat (Number of observations: 179)
Draws: 2 chains, each with iter = 6000; warmup = 4000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~Phylo (Number of levels: 179)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 2.54 0.70 1.45 4.20 1.01 542 1179
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1] -0.21 1.09 -2.40 1.93 1.00 2080 2257
Intercept[2] 1.33 1.10 -0.75 3.58 1.00 2219 2312
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc 1.00 0.00 1.00 1.00 NA NA NA
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Phylogenetic signals
# MCMCglmm
<- ((mcmcglmm_T2_1$VCV[, "Phylo"]) / (mcmcglmm_T2_1$VCV[, "Phylo"] + 1))
phylo_signal_mcmcglmm_T2 %>% mean() phylo_signal_mcmcglmm_T2
[1] 0.8578778
%>% quantile(probs = c(0.025, 0.5, 0.975)) phylo_signal_mcmcglmm_T2
2.5% 50% 97.5%
0.6891739 0.8711971 0.9613625
<- ((mcmcglmm_O2_1$VCV[, "Phylo"]/(1+c2)) / (mcmcglmm_O2_1$VCV[, "Phylo"]/(1+c2) + 1))
phylo_signal_mcmcglmm_O2 %>% mean() phylo_signal_mcmcglmm_O2
[1] 0.8513276
%>% quantile(probs = c(0.025,0.5,0.975)) phylo_signal_mcmcglmm_O2
2.5% 50% 97.5%
0.6626708 0.8640197 0.9558175
# brms
<- brms_OT2_1 %>% as_tibble() %>%
phylo_signal_brms_OT2 ::select(Sigma_phy = sd_Phylo__Intercept) %>%
dplyrmutate(lambda = Sigma_phy^2 / (Sigma_phy^2 + 1)) %>%
pull(lambda)
%>% mean() phylo_signal_brms_OT2
[1] 0.8456712
%>% quantile(probs = c(0.025,0.5,0.975)) phylo_signal_brms_OT2
2.5% 50% 97.5%
0.6783932 0.8563418 0.9463123
Summary All three models (MCMCglmm with threshold, MCMCglmm with ordinal, and brms with cumulative probit) estimate high phylogenetic signal in Habitat.Density
(\(\lambda \approx 0.85\)), suggesting strong phylogenetic structuring in habitat density preferences within Phasianidae.
One continuous explanatory variable model
The model using MCMCglmm
is
<- inverseA(tree, nodes = "ALL", scale = TRUE)
inv_phylo <- list(R = list(V = 1, fix = 1),
prior1 G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)))
system.time(
<- MCMCglmm(Habitat.Density ~ logMass,
mcmcglmm_T2_2 random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "threshold",
data = dat,
prior = prior1,
nitt = 13000*250,
thin = 10*25,
burnin = 3000*250
)
)
system.time(
<- MCMCglmm(Habitat.Density ~ logMass,
mcmcglmm_O2_2 random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "ordinal",
data = dat,
prior = prior1,
nitt = 13000*60,
thin = 10*60,
burnin = 3000*60
)
)
<- ape::vcv.phylo(tree,corr = TRUE)
A
<- default_prior(
default_priors2 ~ logMass + (1 | gr(Phylo, cov = A)),
Habitat.Density data = dat,
family = cumulative(link = "probit"),
data2 = list(A = A)
)
# Fit the model
system.time(
<- brm(
brms_OT2_2 formula = Migration_ordered ~ logMass + (1 | gr(Phylo, cov = A)),
data = dat,
family = cumulative(link = "probit"),
data2 = list(A = A),
prior = default_priors2,
iter = 20000,
warmup = 10000,
thin = 1,
chain = 2,
core = 2,
control = list(adapt_delta = 0.99)
) )
# MCMCglmm
summary(mcmcglmm_T1_2)
Iterations = 135001:584551
Thinning interval = 450
Sample size = 1000
DIC: 300.7449
G-structure: ~Phylo
post.mean l-95% CI u-95% CI eff.samp
Phylo 2.068 6.389e-06 7.225 1000
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 1 1 1 0
Location effects: Migration_ordered ~ logMass
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) 0.88928 -1.69994 3.49007 1000 0.436
logMass -0.08033 -0.37129 0.21982 1000 0.550
Cutpoints:
post.mean l-95% CI u-95% CI eff.samp
cutpoint.traitMigration_ordered.1 1.052 0.6441 1.469 897.7
posterior_summary(mcmcglmm_T1_2$Sol)
Estimate Est.Error Q2.5 Q97.5
(Intercept) 0.88927720 1.2935980 -1.5704745 3.6731840
logMass -0.08033062 0.1488659 -0.3928655 0.2076132
posterior_summary(mcmcglmm_T1_2$VCV)
Estimate Est.Error Q2.5 Q97.5
Phylo 2.068366 3.804493 0.00210611 11.97129
units 1.000000 0.000000 1.00000000 1.00000
summary(mcmcglmm_O1_2) # 95%HPD Interval
Iterations = 180001:779401
Thinning interval = 600
Sample size = 1000
DIC: 268.7424
G-structure: ~Phylo
post.mean l-95% CI u-95% CI eff.samp
Phylo 4.133 8.171e-05 14.26 1000
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 1 1 1 0
Location effects: Migration_ordered ~ logMass
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) 1.2622 -2.7953 4.3184 1000 0.432
logMass -0.1087 -0.5494 0.3449 1000 0.574
Cutpoints:
post.mean l-95% CI u-95% CI eff.samp
cutpoint.traitMigration_ordered.1 1.489 0.9695 2.098 1000
posterior_summary(mcmcglmm_O1_2$VCV) # 95%CI
Estimate Est.Error Q2.5 Q97.5
Phylo 4.132637 6.62053 0.004598829 19.22872
units 1.000000 0.00000 1.000000000 1.00000
posterior_summary(mcmcglmm_O1_2$Sol)
Estimate Est.Error Q2.5 Q97.5
(Intercept) 1.262162 1.8674403 -2.1994721 5.0542830
logMass -0.108659 0.2176232 -0.5751705 0.3314989
<- 1
c2 <- mcmcglmm_O1_2$Sol / sqrt(1+c2) # for fixed effects
res_1 <- mcmcglmm_O1_2$VCV / (1+c2) # for variance components
res_2 posterior_summary(res_1)
Estimate Est.Error Q2.5 Q97.5
(Intercept) 0.89248312 1.3204797 -1.555262 3.5739178
logMass -0.07683353 0.1538828 -0.406707 0.2344051
posterior_summary(res_2)
Estimate Est.Error Q2.5 Q97.5
Phylo 2.066318 3.310265 0.002299414 9.614361
units 0.500000 0.000000 0.500000000 0.500000
#brms
summary(brms_OT2_1)
Family: cumulative
Links: mu = probit; disc = identity
Formula: Habitat.Density ~ 1 + (1 | gr(Phylo, cov = A))
Data: Phasianidae_dat (Number of observations: 179)
Draws: 2 chains, each with iter = 6000; warmup = 4000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~Phylo (Number of levels: 179)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 2.54 0.70 1.45 4.20 1.01 542 1179
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1] -0.21 1.09 -2.40 1.93 1.00 2080 2257
Intercept[2] 1.33 1.10 -0.75 3.58 1.00 2219 2312
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc 1.00 0.00 1.00 1.00 NA NA NA
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
summary
No significant effect of body mass on habitat density: In all models, the posterior estimates of the slope for
logMass
were close to zero, with wide credible intervals that include zero. This suggests no strong evidence that body size influences habitat density preference in Phasianidae.Moderate phylogenetic signal: Phylogenetic variance was estimated as non-negligible across models (e.g., \(\lambda \approx 0.67–0.80\) after transformation), indicating moderate phylogenetic structuring in habitat density even after accounting for body mass.
One continuous and one categorical explanatory variable model
<- inverseA(tree, nodes = "ALL", scale = TRUE)
inv_phylo <- list(
prior1 R = list(V = 1, fix = 1),
G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)))
system.time(
<- MCMCglmm(Habitat.Density ~ logMass + Habitat.Density,
mcmcglmm_T2_3 random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "threshold",
data = dat,
prior = prior1,
nitt = 13000*150,
thin = 10*150,
burnin = 3000*150
)
)
system.time(
<- MCMCglmm(Habitat.Density ~ logMass + Habitat.Density,
mcmcglmm_O2_3 random = ~ Phylo,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "ordinal",
data = dat,
prior = prior1,
nitt = 13000*150,
thin = 10*150,
burnin = 3000*150
)
)
<- ape::vcv.phylo(tree, cor = TRUE)
A
<- default_prior(
default_priors3 ~ logMass + Habitat.Density + (1 | gr(Phylo, cov = A)),
Habitat.Density data = dat,
family = cumulative(link = "probit"),
data2 = list(A = A)
)
system.time(
<- brm(
brm_OT2_3 formula = Habitat.Density ~ logMass + Habitat.Density + (1 | gr(Phylo, cov = A)),
data = dat,
family = cumulative(link = "probit"),
data2 = list(A = A),
prior = default_priors3,
iter = 20000,
warmup = 10000,
thin = 1,
chain = 2,
core = 2,
control = list(adapt_delta = 0.99)
) )
# MCMCglmm
summary(mcmcglmm_T2_3)
Iterations = 1800001:7794001
Thinning interval = 6000
Sample size = 1000
DIC: 264.3581
G-structure: ~Phylo
post.mean l-95% CI u-95% CI eff.samp
Phylo 13.42 1.332 33.11 869.4
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 1 1 1 0
Location effects: Habitat.Density ~ cTail_length + IsHarbivore
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) -1.2636 -5.2133 2.9340 1000 0.498
cTail_length 0.5337 -0.4854 1.6519 1000 0.302
IsHarbivore 1.0925 0.1875 2.1084 1100 0.012 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Cutpoints:
post.mean l-95% CI u-95% CI eff.samp
cutpoint.traitHabitat.Density.1 1.838 1.142 2.773 1000
posterior_summary(mcmcglmm_T1_3$VCV)
Estimate Est.Error Q2.5 Q97.5
Phylo 1.987738 4.411821 0.009463594 9.04495
units 1.000000 0.000000 1.000000000 1.00000
posterior_summary(mcmcglmm_T1_3$Sol)
Estimate Est.Error Q2.5 Q97.5
(Intercept) 1.3096841 1.2767628 -1.0290056 3.809598026
logMass -0.2664037 0.1660313 -0.6334123 0.007099615
Habitat.Densityopen 1.3153548 0.4385812 0.6134014 2.247115510
Habitat.Densitysemi-open 0.3954618 0.3425752 -0.2200316 1.117789403
summary(mcmcglmm_O2_3) # 95%HPD Interval
Iterations = 1800001:7794001
Thinning interval = 6000
Sample size = 1000
DIC: 223.0502
G-structure: ~Phylo
post.mean l-95% CI u-95% CI eff.samp
Phylo 24.41 2.961 64.89 1000
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 1 1 1 0
Location effects: Habitat.Density ~ cTail_length + IsHarbivore
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) -1.6861 -8.1646 3.3928 1000.0 0.520
cTail_length 0.7124 -0.6599 2.2055 987.7 0.266
IsHarbivore 1.5078 0.1850 2.8961 986.1 0.010 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Cutpoints:
post.mean l-95% CI u-95% CI eff.samp
cutpoint.traitHabitat.Density.1 2.536 1.54 3.812 855.9
posterior_summary(mcmcglmm_O2_3$VCV) # 95%CI
Estimate Est.Error Q2.5 Q97.5
Phylo 24.40718 24.4568 5.150294 82.38519
units 1.00000 0.0000 1.000000 1.00000
posterior_summary(mcmcglmm_O2_3$Sol)
Estimate Est.Error Q2.5 Q97.5
(Intercept) -1.6861096 2.8536673 -8.1735105 3.363553
cTail_length 0.7124418 0.7462599 -0.5101511 2.467996
IsHarbivore 1.5078371 0.7204467 0.2713007 3.103070
<- 1
c2 <- mcmcglmm_O2_3$Sol / sqrt(1+c2) # for fixed effects
res_1 <- mcmcglmm_O2_3$VCV / (1+c2) # for variance components
res_2 posterior_summary(res_1)
Estimate Est.Error Q2.5 Q97.5
(Intercept) -1.1922596 2.0178475 -5.7795447 2.378391
cTail_length 0.5037724 0.5276855 -0.3607313 1.745137
IsHarbivore 1.0662019 0.5094327 0.1918386 2.194202
posterior_summary(res_2)
Estimate Est.Error Q2.5 Q97.5
Phylo 12.20359 12.2284 2.575147 41.1926
units 0.50000 0.0000 0.500000 0.5000
# brms
summary(brms_OT2_3)
Family: cumulative
Links: mu = probit; disc = identity
Formula: Habitat.Density ~ cTail_length + IsHarbivore + (1 | gr(Phylo, cov = A))
Data: Phasianidae_dat (Number of observations: 179)
Draws: 2 chains, each with iter = 8000; warmup = 6000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~Phylo (Number of levels: 179)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 2.90 0.92 1.57 5.05 1.00 428 833
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1] 0.56 1.24 -1.96 3.00 1.00 1655 2098
Intercept[2] 2.25 1.28 -0.18 4.85 1.00 1532 1706
cTail_length 0.46 0.48 -0.39 1.50 1.00 1244 1370
IsHarbivore 0.98 0.45 0.18 1.96 1.00 1524 1594
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc 1.00 0.00 1.00 1.00 NA NA NA
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Summary
Diet is positively associated with habitat: The posterior estimates of the slope for
Diet
were positive and significant across all models, indicating that species with a herbivorous diet tend to occupy open habitats compared to those with other diets.No significant effect of body mass on habitat density: The posterior estimates of the slope for
logMass
were close to zero, with wide credible intervals that include zero, suggesting no strong evidence that body size influences habitat density preference in Phasianidae.
probabilities
This probabilities section uses the posterior summaries of fixed effects (linear predictors) and cut points from each of the three models (intercept-only, one predictor, two predictors) to calculate the predicted probabilities of each category in Habitat.Density.
<- function(cutpoint0, cutpoint1, l) {
calculate_probabilities2
<- pnorm(cutpoint0 - l)
category1_prob <- pnorm(cutpoint1 - l) - pnorm(cutpoint0 - l)
category2_prob <- 1 - pnorm(cutpoint1 - l)
category3_prob
return(c(Dense_habitats = category1_prob,
Semi_open_habitats = category2_prob,
Open_habitats = category3_prob)
)
}<- calculate_probabilities2(0, 1.01, 0.3178)
probabilities_mcmcglmm print(probabilities_mcmcglmm)
Dense_habitats Semi_open_habitats Open_habitats
0.3753183 0.3802758 0.2444059
<- calculate_probabilities2(-0.32, 0.66, 0)
probabilities_brms print(probabilities_brms)
Dense_habitats Semi_open_habitats Open_habitats
0.3744842 0.3708889 0.2546269
<- calculate_probabilities2(0, 1.052, 0.88928)
probabilities_mcmcglmm print(probabilities_mcmcglmm)
Dense_habitats Semi_open_habitats Open_habitats
0.1869263 0.3777042 0.4353694
<- calculate_probabilities2(-0.85, 0.15, 0)
probabilities_brms print(probabilities_brms)
Dense_habitats Semi_open_habitats Open_habitats
0.1976625 0.3619551 0.4403823
<- calculate_probabilities2(0, 1.116, 1.3097)
probabilities_mcmcglmm print(probabilities_mcmcglmm)
Dense_habitats Semi_open_habitats Open_habitats
0.09514867 0.32805672 0.57679460
<- calculate_probabilities2(-1.33, -0.26, 0)
probabilities_brms print(probabilities_brms)
Dense_habitats Semi_open_habitats Open_habitats
0.09175914 0.30567275 0.60256811
4. Nominal models
When the response variable has more than two categories and un-ordered, it is treated as a nominal variable.
Prior setting
Sometimes, model convergence can be challenging, particularly in discrete models. In such cases, it is time to take a step forward - changing the default prior to a custom prior (i.e. appropriately informative prior, not over-informed prior) can improve performance and convergence.
Fixed Effects
For fixed effects, custom prior adjustments are typically not necessary. Because data usually provides enough information to estimate fixed effects accurately, so the prior does not have as much effect. Fixed effect priors can be a good thing if you know some biological limits (Gaussian models - e.g. bird morphology - some values are not realistic). Also, in MCMCglmm
, the gelman.prior
function can be particularly useful for logit and probit models. This function allows users to define a reasonable prior covariance matrix for the fixed effects, enhancing model stability and convergence. Specifically…
- For logit regression, where the variance of the error distribution (with mean = 0) is pi^2 / 3, the prior accounts for this variance to remain flat on the probability scale.
- For probit regression, where the variance of the error distribution is 1, the prior is adjusted to achieve a similarly flat distribution.
By incorporating these considerations, gelman.prior
avoids excessive regularisation and ensures that the prior reflects the data while remaining weakly informative, as Gelman et al. (2008) recommended.
Random effects
Convergence issues often arise with random effects because variance components are inherently more challenging to estimate than fixed effects. This is because variance components represent variability around the mean, which requires a larger sample size to be estimated precisely. In contrast, fixed effects estimate the mean (e.g. the overall mean in an intercept-only model), which is more straightforward and usually requires a smaller sample size to stabilise. In such cases, using more informative priors can help, but careful consideration is necessary to balance convergence improvement and interpret model output.
Key points
Check autocorrelation: An auto-correlation plot helps assess the mixing of the chain. It shows the relationship between current and past samples in the MCMC chain. High auto-correlation (i.e., slow decay) indicates poor mixing because the chain is highly correlated across iterations, meaning that it is not exploring the parameter space efficiently. Ideally, the auto-correlation should drop to near zero quickly, indicating that the chain is moving smoothly and independently across the parameter space. - Lag (X-axis): The interval between the current and past samples. - Autocorrelation coefficient (Y-axis): Indicates the strength of the correlation, ranging from -1 to 1. A value close to 0 (excluding lag 0) is ideal.
In the context of MCMC, it would be important to differentiate between convergence and mixing, as these two terms are often conflated, but they refer to distinct aspects of the MCMC process. Convergence refers to whether the Markov chains have reached the true posterior distribution. For MCMC to be valid, the chains should eventually explore the entire parameter space of the target distribution. In many cases, this is evaluated by running multiple chains and checking whether they all converge to the same distribution (e.g., Rhat in brms
). A value of Rhat close to 1.0 suggests that the chains have converged to the same posterior distribution.
Mixing refers to how well the Markov chain explores the parameter space during the sampling process. Good mixing means that the chain moves smoothly across the parameter space, avoiding long periods of stagnation in any particular region. Poor mixing can result in the chain getting stuck in one area of the parameter space and not exploring others effectively.
In MCMCglmm, since you can run a single chain per single model, you cannot directly check convergence of one model using multiple chains (but you can compare multiple chains using multiple MCMCglmm models). Instead, auto-correlation plots can give you insights into the quality of the sampling.
Practical benefits of informative priors for random effects
Informative priors for random effects improve model convergence and reduce computation time by restricting the MCMC sampler to a smaller parameter space. Such priors can influence both the point estimates (often) and uncertainty (always) of random effects (i.e. variance and covariance). However, they typically have less impact on fixed effect estimates. This allows you to obtain fixed effect estimates more quickly while improving convergence by constraining random effects. If your primary interest lies in fixed effect estimates rather than random effects, you can take advantage of this approach.
Potential Problems with informative priors
Over-constrained parameters: Excessively narrow priors can provide us with incorrect parameter estimates by limiting them to a small range.
Small sample size issues: When sample sizes are small, informative priors may dominate the posterior distribution, overshadowing the data. This can lead to the model failing to capture the data’s variability and oversimplifies reality.
When you want to estimate random effects properly, you need to use uninformative prior. But as you already know, we are hard to get convergent… The things we can do are:
- increase iteration (this means increasing the number of samples from the posterior)
- use longer burn-in (
MCMCglmm
) / warm-up (brms
) period to ensure convergence - increase thinning in
MCMCglmm
- While thinning can enhance convergence inMCMCglmm
by reducing autocorrelation,brms
generally do not need to control thinning
Examples Here is example of uninformative vs informative prior…
<- inverseA(tree, nodes = "ALL", scale = TRUE)
inv_phylo
#uninformative
<- list(
prior1 R = list(V = (matrix(1, 2, 2) + diag(2)) / 3, fix = 1),
G = list(G1 = list(V = diag(2), nu = 2,
alpha.mu = rep(0, 2), alpha.V = diag(2)
)
)
)
system.time(
<- MCMCglmm(Primary.Lifestyle ~ trait -1,
mcmcglmm_m1_100 random = ~us(trait):Phylo,
rcov = ~us(trait):units,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "categorical",
data = dat,
prior = prior1,
nitt = 13000*100,
thin = 10*100,
burnin = 3000*100
)
)
system.time(
<- MCMCglmm(Primary.Lifestyle ~ trait -1,
mcmcglmm_m1_1000 random = ~us(trait):Phylo,
rcov = ~us(trait):units,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "categorical",
data = dat,
prior = prior1,
nitt = 13000*10000,
thin = 10*10000,
burnin = 3000*10000
)
)
# informative prior
<- list(
prior2 R = list(V = (matrix(1, 2, 2) + diag(2)) / 3, fix = 1),
G = list(G1 = list(V = diag(2), nu = 200,
alpha.mu = rep(0, 2), alpha.V = diag(2)
)
)
)
system.time(
<- MCMCglmm(Primary.Lifestyle ~ trait -1,
mcmcglmm_m2 random = ~us(trait):Phylo,
rcov = ~us(trait):units,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "categorical",
data = dat,
prior = prior2,
nitt = 13000*75,
thin = 10*75,
burnin = 3000*75
) )
You can see that the increasing iteration, thinning, and burn-in can promote increasing effect sampling size and improving mixing.
summary(mcmcglmm_m1_100)
Iterations = 300001:1299001
Thinning interval = 1000
Sample size = 1000
DIC: 259.4007
G-structure: ~us(trait):Phylo
post.mean
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 31.18
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -12.08
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -12.08
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 104.23
l-95% CI
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 1.252
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -70.340
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -70.340
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 1.118
u-95% CI
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 83.25
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo 59.37
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo 59.37
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 408.78
eff.samp
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 37.85
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo 41.96
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo 41.96
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 28.75
R-structure: ~us(trait):units
post.mean
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.6667
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.3333
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.3333
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.6667
l-95% CI
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.6667
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.3333
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.3333
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.6667
u-95% CI
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.6667
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.3333
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.3333
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.6667
eff.samp
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0
Location effects: Primary.Lifestyle ~ trait - 1
post.mean l-95% CI u-95% CI eff.samp pMCMC
traitPrimary.Lifestyle.Insessorial -1.628 -10.272 7.019 148.9 0.662
traitPrimary.Lifestyle.Terrestrial 5.279 -7.006 19.131 125.6 0.314
summary(mcmcglmm_m1_1000)
Iterations = 3000001:12990001
Thinning interval = 10000
Sample size = 1000
DIC: 273.7709
G-structure: ~us(trait):Phylo
post.mean
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 30.88
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -25.45
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -25.45
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 81.67
l-95% CI
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 1.314
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -94.475
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -94.475
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 1.144
u-95% CI
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 99.34
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo 29.87
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo 29.87
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 369.51
eff.samp
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 270.66
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo 218.26
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo 218.26
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 89.38
R-structure: ~us(trait):units
post.mean
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.6667
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.3333
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.3333
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.6667
l-95% CI
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.6667
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.3333
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.3333
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.6667
u-95% CI
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.6667
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.3333
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.3333
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.6667
eff.samp
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0
Location effects: Primary.Lifestyle ~ trait - 1
post.mean l-95% CI u-95% CI eff.samp pMCMC
traitPrimary.Lifestyle.Insessorial -1.3033 -5.3581 2.3037 1000 0.478
traitPrimary.Lifestyle.Terrestrial 0.9187 -3.7341 5.2263 1000 0.676
summary(mcmcglmm_m2)
Iterations = 225001:974251
Thinning interval = 750
Sample size = 1000
DIC: 298.0892
G-structure: ~us(trait):Phylo
post.mean
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 6.0759
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -0.0885
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -0.0885
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 6.8348
l-95% CI
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 0.2940
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -0.9723
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -0.9723
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 1.5880
u-95% CI
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 12.1120
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo 0.9838
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo 0.9838
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 13.6006
eff.samp
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 731.3
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo 1000.0
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo 1000.0
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 820.7
R-structure: ~us(trait):units
post.mean
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.6667
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.3333
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.3333
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.6667
l-95% CI
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.6667
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.3333
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.3333
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.6667
u-95% CI
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.6667
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.3333
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.3333
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.6667
eff.samp
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0
Location effects: Primary.Lifestyle ~ trait - 1
post.mean l-95% CI u-95% CI eff.samp pMCMC
traitPrimary.Lifestyle.Insessorial -0.3218 -3.7935 3.0988 1000.0 0.908
traitPrimary.Lifestyle.Terrestrial 2.2780 -1.3172 5.8322 888.4 0.180
autocorr.plot(mcmcglmm_m1_100$VCV)
autocorr.plot(mcmcglmm_m1_1000$VCV)
autocorr.plot(mcmcglmm_m2$VCV)
Explanation of dataset
Datasets for example 1
This dataset focuses on Turdidae species. A binary variable IsOmnivore based on trophic level is created.
<- read.nexus(here("data", "potential", "avonet", "trees.nex"))
trees <- trees[[1]]
tree <- read.csv(here("data", "potential", "avonet", "turdidae.csv"))
dat
# Check tree$tip.label matches dat$Scientific_name
<- setequal(tree$tip.label, dat$Phylo)
match_result
# Centering for continuous variables
<- dat %>%
dat mutate(
log_Mass_centered = scale(log(Mass), center = TRUE, scale = FALSE),
log_Tail_Length_centered = scale(log(Tail.Length), center = TRUE, scale = FALSE)
)
<- dat %>%
dat mutate(across(c(Trophic.Level, Trophic.Niche, Primary.Lifestyle, Migration, Habitat, Species.Status), as.factor))
$IsOmnivore <- ifelse(dat$Trophic.Level == "Omnivore", 1, 0) dat
Datasets for example 2
For Example 2, data for the Sylviidae (warblers) were extracted from the AVONET dataset. The habitat categories were reclassified, migratory status was converted into a binary variable, and body mass was log-transformed and centered, resulting in a clean dataset suitable for subsequent statistical modelling.
<- read.csv(here("data", "bird body mass", "9993spp_clearned.csv"))
dat <- dat %>%
dat mutate(across(c(Trophic.Level, Trophic.Niche,
Primary.Lifestyle, Migration, Habitat, Habitat.Density, Species.Status), as.factor),Habitat.Density = factor(Habitat.Density, ordered = TRUE))
<- dat %>%
dat mutate(
Habitat_Category = case_when(
%in% c("Desert", "Rock", "Grassland", "Shrubland") ~ "Arid_Open",
Habitat %in% c("Woodland", "Forest") ~ "Forested_Vegetated",
Habitat == "Human modified" ~ "Human-Modified",
Habitat %in% c("Riverine", "Coastal", "Marine", "Wetland") ~ "Aquatic_Coastal",
Habitat TRUE ~ "Other"
)
)
<- dat %>%
Family_habitat group_by(Family, Habitat_Category) %>%
tally() %>%
group_by(Family) %>%
ungroup()
<- dat %>%
Sylviidae_dat filter(Family == "Sylviidae") %>%
mutate(Habitat_Category = factor(Habitat_Category, ordered = FALSE))
<- Sylviidae_dat %>%
Sylviidae_dat mutate(IsMigrate = ifelse(Migration == "1", 1, 0)
)table(Sylviidae_dat$Habitat_Category)
Aquatic_Coastal Arid_Open Forested_Vegetated Other
34 90 168 2
<- Sylviidae_dat %>%
Sylviidae_dat mutate(Habitat_Category = recode(Habitat_Category,
"Aquatic_Coastal" = "Others",
"Other" = "Others"))
$cMass <- scale(log(Sylviidae_dat$Mass), center = TRUE, scale = FALSE)
Sylviidae_dat
<- read.nexus(here("data", "bird body mass", "Sylviidae.nex"))
Sylviidae_tree <- Sylviidae_tree[[1]] tree
Example 1
Intercept-only model
MCMCglmm
<- inverseA(tree, nodes = "ALL", scale = TRUE)
inv_phylo
<- list(
prior5 R = list(V = (matrix(1, 2, 2) + diag(2)) / 3, fix = 1),
G = list(G1 = list(V = diag(2), nu = 20,
alpha.mu = rep(0, 2), alpha.V = diag(2)*(pi^2/3))
)
)
system.time(
<-MCMCglmm(Primary.Lifestyle ~ trait -1,
mcmcglmm_mn1_1 random = ~us(trait):Phylo,
rcov = ~us(trait):units,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "categorical",
data = dat,
prior = prior5,
nitt = 13000*150,
thin = 10*150,
burnin = 3000*150
) )
brms
<- ape::vcv.phylo(tree, corr = TRUE)
A <- default_prior(Primary.Lifestyle ~ 1 + (1 |a| gr(Phylo, cov = A)),
priors_brms1 data = dat,
data2 = list(A = A),
family = categorical(link = "logit")
)
system.time(
<- brm(Primary.Lifestyle ~ 1 + (1 |a| gr(Phylo, cov = A)),
brms_mn1_1 data = dat,
data2 = list(A = A),
family = categorical(link = "logit"),
prior = priors_brms1,
iter = 18000,
warmup = 8000,
chains = 2,
thin = 1,
) )
As well as the binary model in MCMCglmm
, nominal model is need to correct the estimates obtained from MCMCglmm
as MCMCglmm
implements additive overdispersion GLMM fitted by MCMC sampling from the posterior distribution. Again, we can use the below code:
<- (16 * sqrt(3)/(15 * pi))^2
c2 <- (16 * sqrt(3)/(15 * pi))^2*(2/3) c2a
The results are…
summary(mcmcglmm_mn1_1)
Iterations = 300001:1299001
Thinning interval = 1000
Sample size = 1000
DIC: 295.756
G-structure: ~us(trait):Phylo
post.mean
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 7.338
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -1.164
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -1.164
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 8.168
l-95% CI
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 0.8372
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -5.3745
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -5.3745
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 1.1463
u-95% CI
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 15.386
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo 2.788
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo 2.788
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 18.245
eff.samp
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 764.5
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo 885.3
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo 885.3
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 615.7
R-structure: ~us(trait):units
post.mean
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.6667
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.3333
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.3333
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.6667
l-95% CI
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.6667
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.3333
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.3333
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.6667
u-95% CI
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.6667
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.3333
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.3333
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.6667
eff.samp
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0
Location effects: Primary.Lifestyle ~ trait - 1
post.mean l-95% CI u-95% CI eff.samp pMCMC
traitPrimary.Lifestyle.Insessorial -0.4015 -4.0226 2.5627 1000 0.804
traitPrimary.Lifestyle.Terrestrial 1.6199 -1.2917 4.6445 1000 0.270
<- mcmcglmm_mn1_1$Sol / sqrt(1+c2a) # for fixed effects
res_1 <- mcmcglmm_mn1_1$VCV / (1+c2a) # for variance components
res_2 <- (mcmcglmm_mn1_1$VCV[, 2]/(1+c2a)) /sqrt((mcmcglmm_mn1_1$VCV[, 1] * mcmcglmm_mn1_1$VCV[, 4])/(1+c2a))
res_3_corr_phylo
summary(res_1)
Iterations = 300001:1299001
Thinning interval = 1000
Number of chains = 1
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
traitPrimary.Lifestyle.Insessorial -0.362 1.480 0.04680 0.04680
traitPrimary.Lifestyle.Terrestrial 1.460 1.385 0.04379 0.04379
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
traitPrimary.Lifestyle.Insessorial -3.475 -1.2921 -0.3067 0.5483 2.605
traitPrimary.Lifestyle.Terrestrial -1.125 0.5264 1.4485 2.3820 4.226
summary(res_2)
Iterations = 300001:1299001
Thinning interval = 1000
Number of chains = 1
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 5.9634
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -0.9456
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -0.9456
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 6.6372
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.5418
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.2709
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.2709
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.5418
SD
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 3.486
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo 1.643
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo 1.643
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 4.233
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.000
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.000
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.000
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.000
Naive SE
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 0.11023
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo 0.05197
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo 0.05197
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 0.13385
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.00000
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.00000
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.00000
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.00000
Time-series SE
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 0.12608
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo 0.05524
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo 0.05524
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 0.17058
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.00000
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.00000
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.00000
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.00000
2. Quantiles for each variable:
2.5%
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 1.6277
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -4.7030
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -4.7030
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 1.7250
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.5418
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.2709
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.2709
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.5418
25%
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 3.4946
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -1.7201
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -1.7201
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 3.8512
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.5418
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.2709
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.2709
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.5418
50%
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 5.2197
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -0.7662
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -0.7662
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 5.6452
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.5418
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.2709
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.2709
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.5418
75%
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 7.52933
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -0.06118
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -0.06118
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 8.20433
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.54176
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.27088
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.27088
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.54176
97.5%
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 14.5903
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo 2.0866
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo 2.0866
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 17.3970
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.5418
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.2709
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.2709
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.5418
summary(res_3_corr_phylo)
Iterations = 300001:1299001
Thinning interval = 1000
Number of chains = 1
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
-0.144879 0.207344 0.006557 0.006916
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
-0.51690 -0.28714 -0.14528 -0.01081 0.26479
summary(brms_mn1_1)
Family: categorical
Links: muInsessorial = logit; muTerrestrial = logit
Formula: Primary.Lifestyle ~ 1 + (1 | a | gr(Phylo, cov = A))
Data: dat (Number of observations: 173)
Draws: 2 chains, each with iter = 15000; warmup = 10000; thin = 1;
total post-warmup draws = 10000
Multilevel Hyperparameters:
~Phylo (Number of levels: 173)
Estimate Est.Error
sd(muInsessorial_Intercept) 4.76 2.40
sd(muTerrestrial_Intercept) 5.76 3.37
cor(muInsessorial_Intercept,muTerrestrial_Intercept) -0.28 0.36
l-95% CI u-95% CI Rhat
sd(muInsessorial_Intercept) 1.96 10.81 1.00
sd(muTerrestrial_Intercept) 2.14 14.61 1.00
cor(muInsessorial_Intercept,muTerrestrial_Intercept) -0.88 0.46 1.00
Bulk_ESS Tail_ESS
sd(muInsessorial_Intercept) 1550 2181
sd(muTerrestrial_Intercept) 1215 2349
cor(muInsessorial_Intercept,muTerrestrial_Intercept) 858 1508
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
muInsessorial_Intercept -0.55 2.03 -4.86 3.32 1.00 4681
muTerrestrial_Intercept 1.30 2.29 -2.87 6.21 1.00 4028
Tail_ESS
muInsessorial_Intercept 5207
muTerrestrial_Intercept 4113
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
When comparing the results, the fixed effects estimated by both MCMCglmm and brms are similar. However, there are notable differences in the random effect variances and correlations. The variances estimated by brms show higher uncertainty, with wider 95% credible intervals, and the correlations are closer to zero. This discrepancy is likely due to the informative priors used in MCMCglmm to prioritize model convergence.
Phylogenetic signals
And phylogenetic signal is…
# average phylogenetic signal - both vs. left
## MCMCglmm
<- ((mcmcglmm_mn1_1$VCV[, "traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo"]/(1+c2a)) / (mcmcglmm_mn1_1$VCV[, "traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo"]/(1+c2a) + 1))
phylo_signalI_mcmcglmm %>% mean() phylo_signalI_mcmcglmm
[1] 0.8207548
%>% quantile(probs = c(0.025,0.5,0.975)) phylo_signalI_mcmcglmm
2.5% 50% 97.5%
0.6194383 0.8392215 0.9358569
## brms
<- brms_mn1_1 %>% as_tibble() %>%
phylo_signalI_brms ::select(Sigma_phy = sd_Phylo__muInsessorial_Intercept) %>%
dplyrmutate(lambda_nominalI = Sigma_phy^2 / (Sigma_phy^2 + 1)) %>%
pull(lambda_nominalI)
%>% mean() phylo_signalI_brms
[1] 0.9330662
%>% quantile(probs = c(0.025,0.5,0.975)) phylo_signalI_brms
2.5% 50% 97.5%
0.7940292 0.9474234 0.9915224
# average phylogenetic signal - both vs. right
## MCMCglmm
<- ((mcmcglmm_mn1_1$VCV[, "traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo"]/(1+c2a)) / (mcmcglmm_mn1_1$VCV[, "traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo"]/(1+c2a) + 1))
phylo_signalT_mcmcglmm %>% mean() phylo_signalT_mcmcglmm
[1] 0.8336959
%>% quantile(probs = c(0.025,0.5,0.975)) phylo_signalT_mcmcglmm
2.5% 50% 97.5%
0.6330234 0.8495150 0.9456432
## brms
<- brms_mn1_1 %>% as_tibble() %>%
phylo_signalT_brms ::select(Sigma_phy = sd_Phylo__muTerrestrial_Intercept) %>%
dplyrmutate(lambda_nominalT = Sigma_phy^2 / (Sigma_phy^2 + 1)) %>%
pull(lambda_nominalT)
%>% mean() phylo_signalT_brms
[1] 0.9477057
%>% quantile(probs = c(0.025,0.5,0.975)) phylo_signalT_brms
2.5% 50% 97.5%
0.8209225 0.9606255 0.9953342
Across both modelling frameworks, Primary.Lifestyle
categories show high phylogenetic signal.
For the Insessorial category:
MCMCglmm estimates \(\lambda \approx 0.84\) (95% CI: 0.62–0.94)
brms estimates \(\lambda \approx 0.95\) (95% CI: 0.79–0.99)
For the Terrestrial category:
MCMCglmm estimates \(\lambda \approx 0.85\) (95% CI: 0.63–0.95)
brms estimates \(\lambda \approx 0.96\) (95% CI: 0.82–1.00)
This indicates a strong phylogenetic signal in the distribution of Primary.Lifestyle
categories, suggesting that closely related species tend to share similar lifestyle traits.
One continuous explanatory variable model
MCMCglmm
<- inverseA(tree, nodes = "ALL", scale = TRUE)
inv_phylo
<- list(
prior5 R = list(V = (matrix(1, 2, 2) + diag(2)) / 3, fix = 1),
G = list(G1 = list(V = diag(2), nu = 20,
alpha.mu = rep(0, 2), alpha.V = diag(2)*(pi^2/3))
)
)
system.time(
<- MCMCglmm(Primary.Lifestyle ~ log_Tail_Length_centered:trait + trait -1,
mcmcglmm_mn1_2 random = ~us(trait):Phylo,
rcov = ~us(trait):units,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "categorical",
data = dat,
prior = prior5,
nitt = 13000*200,
thin = 10*200,
burnin = 3000*200
) )
brms
<- default_prior(Primary.Lifestyle ~ log_Tail_Length_centered + (1 |a| gr(Phylo, cov = A)),
priors_brms2 data = dat,
data2 = list(A = A),
family = categorical(link = "logit")
)
system.time(
<- brm(Primary.Lifestyle ~ log_Tail_Length_centered + (1 |a| gr(Phylo, cov = A)),
brms_m1_2 data = dat,
data2 = list(A = A),
family = categorical(link = "logit"),
prior = priors_brms2,
iter = 10000,
warmup = 5000,
chains = 2,
thin = 1
) )
Output correction and summary
summary(mcmcglmm_mn1_2)
Iterations = 600001:2598001
Thinning interval = 2000
Sample size = 1000
DIC: 279.4446
G-structure: ~us(trait):Phylo
post.mean
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 18.854
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -1.169
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -1.169
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 18.974
l-95% CI
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 1.984
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -10.648
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -10.648
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 2.593
u-95% CI
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 42.79
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo 8.87
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo 8.87
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 44.49
eff.samp
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 542.4
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo 784.9
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo 784.9
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 499.9
R-structure: ~us(trait):units
post.mean
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.6667
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.3333
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.3333
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.6667
l-95% CI
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.6667
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.3333
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.3333
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.6667
u-95% CI
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.6667
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.3333
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.3333
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.6667
eff.samp
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0
Location effects: Primary.Lifestyle ~ log_Tail_Length_centered:trait + trait - 1
post.mean l-95% CI
traitPrimary.Lifestyle.Insessorial -1.072 -7.345
traitPrimary.Lifestyle.Terrestrial 3.279 -1.929
log_Tail_Length_centered:traitPrimary.Lifestyle.Insessorial -1.187 -5.100
log_Tail_Length_centered:traitPrimary.Lifestyle.Terrestrial -2.694 -6.748
u-95% CI eff.samp
traitPrimary.Lifestyle.Insessorial 5.322 912.0
traitPrimary.Lifestyle.Terrestrial 9.466 855.8
log_Tail_Length_centered:traitPrimary.Lifestyle.Insessorial 2.964 908.8
log_Tail_Length_centered:traitPrimary.Lifestyle.Terrestrial 1.571 1000.0
pMCMC
traitPrimary.Lifestyle.Insessorial 0.744
traitPrimary.Lifestyle.Terrestrial 0.202
log_Tail_Length_centered:traitPrimary.Lifestyle.Insessorial 0.594
log_Tail_Length_centered:traitPrimary.Lifestyle.Terrestrial 0.178
<- mcmcglmm_mn1_2$Sol / sqrt(1+c2a) # for fixed effects
res_1 <- mcmcglmm_mn1_2$VCV / (1+c2a) # for variance components
res_2 <- (mcmcglmm_mn1_2$VCV[, 2]/(1+c2a)) /sqrt((mcmcglmm_mn1_2$VCV[, 1] * mcmcglmm_mn1_2$VCV[, 4])/(1+c2a))
res_3_corr_phylo
summary(res_1)
Iterations = 600001:2598001
Thinning interval = 2000
Number of chains = 1
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD
traitPrimary.Lifestyle.Insessorial -0.9666 2.803
traitPrimary.Lifestyle.Terrestrial 2.9557 2.670
log_Tail_Length_centered:traitPrimary.Lifestyle.Insessorial -1.0701 1.953
log_Tail_Length_centered:traitPrimary.Lifestyle.Terrestrial -2.4283 1.909
Naive SE
traitPrimary.Lifestyle.Insessorial 0.08865
traitPrimary.Lifestyle.Terrestrial 0.08443
log_Tail_Length_centered:traitPrimary.Lifestyle.Insessorial 0.06176
log_Tail_Length_centered:traitPrimary.Lifestyle.Terrestrial 0.06037
Time-series SE
traitPrimary.Lifestyle.Insessorial 0.09283
traitPrimary.Lifestyle.Terrestrial 0.09126
log_Tail_Length_centered:traitPrimary.Lifestyle.Insessorial 0.06479
log_Tail_Length_centered:traitPrimary.Lifestyle.Terrestrial 0.06037
2. Quantiles for each variable:
2.5% 25%
traitPrimary.Lifestyle.Insessorial -7.020 -2.654
traitPrimary.Lifestyle.Terrestrial -1.639 1.231
log_Tail_Length_centered:traitPrimary.Lifestyle.Insessorial -4.746 -2.317
log_Tail_Length_centered:traitPrimary.Lifestyle.Terrestrial -6.266 -3.592
50% 75%
traitPrimary.Lifestyle.Insessorial -0.8061 0.7953
traitPrimary.Lifestyle.Terrestrial 2.6791 4.4126
log_Tail_Length_centered:traitPrimary.Lifestyle.Insessorial -1.0952 0.2483
log_Tail_Length_centered:traitPrimary.Lifestyle.Terrestrial -2.4131 -1.1127
97.5%
traitPrimary.Lifestyle.Insessorial 4.602
traitPrimary.Lifestyle.Terrestrial 8.724
log_Tail_Length_centered:traitPrimary.Lifestyle.Insessorial 2.557
log_Tail_Length_centered:traitPrimary.Lifestyle.Terrestrial 1.320
summary(res_2)
Iterations = 600001:2598001
Thinning interval = 2000
Number of chains = 1
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 15.3215
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -0.9500
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -0.9500
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 15.4192
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.5418
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.2709
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.2709
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.5418
SD
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 10.870
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo 4.038
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo 4.038
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 10.808
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.000
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.000
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.000
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.000
Naive SE
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 0.3437
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo 0.1277
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo 0.1277
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 0.3418
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.0000
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.0000
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.0000
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.0000
Time-series SE
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 0.4667
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo 0.1441
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo 0.1441
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 0.4834
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.0000
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.0000
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.0000
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.0000
2. Quantiles for each variable:
2.5%
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 3.2225
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -9.2289
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -9.2289
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 3.4482
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.5418
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.2709
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.2709
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.5418
25%
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 8.5847
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -2.7137
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -2.7137
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 8.1572
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.5418
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.2709
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.2709
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.5418
50%
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 12.7689
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -0.8335
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -0.8335
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 12.7788
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.5418
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.2709
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.2709
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.5418
75%
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 19.2764
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo 1.1495
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo 1.1495
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 19.6677
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.5418
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.2709
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.2709
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.5418
97.5%
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 42.4203
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo 6.6341
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo 6.6341
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 43.7071
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.5418
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.2709
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.2709
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.5418
summary(res_3_corr_phylo)
Iterations = 600001:2598001
Thinning interval = 2000
Number of chains = 1
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
-0.064355 0.212862 0.006731 0.006731
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
-0.46205 -0.21687 -0.07002 0.08667 0.35059
summary(brms_mn1_2)
Family: categorical
Links: muInsessorial = logit; muTerrestrial = logit
Formula: Primary.Lifestyle ~ log_Tail_Length_centered + (1 | a | gr(Phylo, cov = A))
Data: dat (Number of observations: 173)
Draws: 2 chains, each with iter = 15000; warmup = 10000; thin = 1;
total post-warmup draws = 10000
Multilevel Hyperparameters:
~Phylo (Number of levels: 173)
Estimate Est.Error
sd(muInsessorial_Intercept) 6.14 3.83
sd(muTerrestrial_Intercept) 6.90 4.68
cor(muInsessorial_Intercept,muTerrestrial_Intercept) -0.15 0.40
l-95% CI u-95% CI Rhat
sd(muInsessorial_Intercept) 2.13 15.68 1.00
sd(muTerrestrial_Intercept) 2.32 19.89 1.00
cor(muInsessorial_Intercept,muTerrestrial_Intercept) -0.84 0.64 1.00
Bulk_ESS Tail_ESS
sd(muInsessorial_Intercept) 1627 2326
sd(muTerrestrial_Intercept) 1412 1902
cor(muInsessorial_Intercept,muTerrestrial_Intercept) 1304 2835
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI
muInsessorial_Intercept -0.53 2.29 -5.38 3.86
muTerrestrial_Intercept 1.34 2.45 -3.08 6.58
muInsessorial_log_Tail_Length_centered -2.17 3.08 -9.60 2.81
muTerrestrial_log_Tail_Length_centered -2.95 3.18 -9.99 2.58
Rhat Bulk_ESS Tail_ESS
muInsessorial_Intercept 1.00 12414 6264
muTerrestrial_Intercept 1.00 10541 5718
muInsessorial_log_Tail_Length_centered 1.00 4513 3030
muTerrestrial_log_Tail_Length_centered 1.00 4134 2746
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Both MCMCglmm and brms indicate very high phylogenetic variance for both Insessorial and Terrestrial lifestyles with only weak and uncertain phylogenetic correlation between the two categories. The differences in variance estimates likely stem from the informative priors used in MCMCglmm to aid convergence.
One continuous and one binary explanatory variables model
MCMCglmm
<- list(
prior5 R = list(V = (matrix(1, 2, 2) + diag(2)) / 3, fix = 1),
G = list(G1 = list(V = diag(2), nu = 20,
alpha.mu = rep(0, 2), alpha.V = diag(2)*(pi^2/3))
)
)
system.time(
<- MCMCglmm(Primary.Lifestyle ~ log_Tail_Length_centered:trait + IsOmnivore:trait + trait -1,
mcmcglmm_mn1_3 random = ~us(trait):Phylo,
rcov = ~us(trait):units,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "categorical",
data = dat,
prior = prior5,
nitt = 13000*250,
thin = 10*250,
burnin = 3000*250
) )
brms
system.time(
<- brm(
brms_mn1_3 ~ log_Tail_Length_centered + IsOmnivore + (1 |a| gr(Phylo, cov = A)),
Primary.Lifestyle data = dat,
data2 = list(A = A),
family = categorical(link = "logit"),
prior = custom_priors,
iter = 15000,
warmup = 5000,
chains = 2,
thin = 1
) )
Outputs are…
summary(mcmcglmm_mn1_3)
Iterations = 750001:3247501
Thinning interval = 2500
Sample size = 1000
DIC: 244.733
G-structure: ~us(trait):Phylo
post.mean
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 22.932
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -1.274
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -1.274
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 15.898
l-95% CI
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 2.484e+00
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -1.179e+01
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -1.179e+01
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 3.475e-04
u-95% CI
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 51.826
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo 8.518
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo 8.518
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 44.088
eff.samp
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 581.7
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo 899.2
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo 899.2
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 627.4
R-structure: ~us(trait):units
post.mean
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.6667
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.3333
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.3333
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.6667
l-95% CI
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.6667
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.3333
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.3333
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.6667
u-95% CI
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.6667
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.3333
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.3333
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.6667
eff.samp
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0
Location effects: Primary.Lifestyle ~ log_Tail_Length_centered:trait + IsOmnivore:trait + trait - 1
post.mean l-95% CI
traitPrimary.Lifestyle.Insessorial -1.01465 -8.38540
traitPrimary.Lifestyle.Terrestrial 3.98040 -1.31604
log_Tail_Length_centered:traitPrimary.Lifestyle.Insessorial -1.23330 -6.68483
log_Tail_Length_centered:traitPrimary.Lifestyle.Terrestrial -1.89662 -6.62854
traitPrimary.Lifestyle.Insessorial:IsOmnivore 0.08497 -1.87839
traitPrimary.Lifestyle.Terrestrial:IsOmnivore -5.14025 -8.12432
u-95% CI eff.samp
traitPrimary.Lifestyle.Insessorial 5.59217 894.2
traitPrimary.Lifestyle.Terrestrial 9.80666 1200.6
log_Tail_Length_centered:traitPrimary.Lifestyle.Insessorial 2.66615 1000.0
log_Tail_Length_centered:traitPrimary.Lifestyle.Terrestrial 2.37056 1113.2
traitPrimary.Lifestyle.Insessorial:IsOmnivore 2.24356 835.3
traitPrimary.Lifestyle.Terrestrial:IsOmnivore -2.71908 848.1
pMCMC
traitPrimary.Lifestyle.Insessorial 0.818
traitPrimary.Lifestyle.Terrestrial 0.092 .
log_Tail_Length_centered:traitPrimary.Lifestyle.Insessorial 0.610
log_Tail_Length_centered:traitPrimary.Lifestyle.Terrestrial 0.406
traitPrimary.Lifestyle.Insessorial:IsOmnivore 0.962
traitPrimary.Lifestyle.Terrestrial:IsOmnivore <0.001 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
<- mcmcglmm_mn1_3$Sol / sqrt(1+c2a) # for fixed effects
res_1 <- mcmcglmm_mn1_3$VCV / (1+c2a) # for variance components
res_2 <- (mcmcglmm_mn1_3$VCV[, 2]/(1+c2a)) /sqrt((mcmcglmm_mn1_3$VCV[, 1] * mcmcglmm_mn1_3$VCV[, 4])/(1+c2a))
res_3_corr_phylo <- (mcmcglmm_mn1_3$VCV[, 6]/(1+c2a)) /sqrt((mcmcglmm_mn1_3$VCV[, 5] * mcmcglmm_mn1_3$VCV[, 8])/(1+c2a))
res_3_corr_nonphylo
summary(res_1)
Iterations = 750001:3247501
Thinning interval = 2500
Number of chains = 1
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD
traitPrimary.Lifestyle.Insessorial -0.9147 3.2692
traitPrimary.Lifestyle.Terrestrial 3.5882 2.4961
log_Tail_Length_centered:traitPrimary.Lifestyle.Insessorial -1.1118 2.1376
log_Tail_Length_centered:traitPrimary.Lifestyle.Terrestrial -1.7097 2.0720
traitPrimary.Lifestyle.Insessorial:IsOmnivore 0.0766 0.9584
traitPrimary.Lifestyle.Terrestrial:IsOmnivore -4.6338 1.2856
Naive SE
traitPrimary.Lifestyle.Insessorial 0.10338
traitPrimary.Lifestyle.Terrestrial 0.07893
log_Tail_Length_centered:traitPrimary.Lifestyle.Insessorial 0.06760
log_Tail_Length_centered:traitPrimary.Lifestyle.Terrestrial 0.06552
traitPrimary.Lifestyle.Insessorial:IsOmnivore 0.03031
traitPrimary.Lifestyle.Terrestrial:IsOmnivore 0.04065
Time-series SE
traitPrimary.Lifestyle.Insessorial 0.10932
traitPrimary.Lifestyle.Terrestrial 0.07204
log_Tail_Length_centered:traitPrimary.Lifestyle.Insessorial 0.06760
log_Tail_Length_centered:traitPrimary.Lifestyle.Terrestrial 0.06210
traitPrimary.Lifestyle.Insessorial:IsOmnivore 0.03316
traitPrimary.Lifestyle.Terrestrial:IsOmnivore 0.04415
2. Quantiles for each variable:
2.5% 25%
traitPrimary.Lifestyle.Insessorial -7.874 -2.8074
traitPrimary.Lifestyle.Terrestrial -1.002 1.8582
log_Tail_Length_centered:traitPrimary.Lifestyle.Insessorial -5.840 -2.4495
log_Tail_Length_centered:traitPrimary.Lifestyle.Terrestrial -5.978 -3.0120
traitPrimary.Lifestyle.Insessorial:IsOmnivore -1.675 -0.5924
traitPrimary.Lifestyle.Terrestrial:IsOmnivore -7.561 -5.3247
50% 75%
traitPrimary.Lifestyle.Insessorial -0.64157 1.1515
traitPrimary.Lifestyle.Terrestrial 3.25656 5.0788
log_Tail_Length_centered:traitPrimary.Lifestyle.Insessorial -0.92478 0.3622
log_Tail_Length_centered:traitPrimary.Lifestyle.Terrestrial -1.64566 -0.2734
traitPrimary.Lifestyle.Insessorial:IsOmnivore 0.06417 0.7068
traitPrimary.Lifestyle.Terrestrial:IsOmnivore -4.47341 -3.6857
97.5%
traitPrimary.Lifestyle.Insessorial 4.945
traitPrimary.Lifestyle.Terrestrial 9.368
log_Tail_Length_centered:traitPrimary.Lifestyle.Insessorial 2.750
log_Tail_Length_centered:traitPrimary.Lifestyle.Terrestrial 2.103
traitPrimary.Lifestyle.Insessorial:IsOmnivore 2.066
traitPrimary.Lifestyle.Terrestrial:IsOmnivore -2.669
summary(res_2)
Iterations = 750001:3247501
Thinning interval = 2500
Number of chains = 1
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 18.6355
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -1.0351
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -1.0351
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 12.9191
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.5418
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.2709
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.2709
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.5418
SD
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 11.978
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo 3.971
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo 3.971
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 11.113
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.000
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.000
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.000
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.000
Naive SE
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 0.3788
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo 0.1256
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo 0.1256
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 0.3514
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.0000
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.0000
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.0000
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.0000
Time-series SE
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 0.4966
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo 0.1324
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo 0.1324
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 0.4437
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.0000
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.0000
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.0000
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.0000
2. Quantiles for each variable:
2.5%
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 4.0308
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -10.0779
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -10.0779
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 0.4373
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.5418
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.2709
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.2709
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.5418
25%
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 10.2520
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -2.7629
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -2.7629
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 4.6612
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.5418
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.2709
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.2709
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.5418
50%
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 15.9642
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo -0.7674
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo -0.7674
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 10.1930
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.5418
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.2709
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.2709
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.5418
75%
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 24.0021
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo 0.8149
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo 0.8149
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 17.3835
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.5418
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.2709
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.2709
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.5418
97.5%
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.Phylo 49.6277
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.Phylo 6.5968
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.Phylo 6.5968
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.Phylo 44.3732
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Insessorial.units 0.5418
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Insessorial.units 0.2709
traitPrimary.Lifestyle.Insessorial:traitPrimary.Lifestyle.Terrestrial.units 0.2709
traitPrimary.Lifestyle.Terrestrial:traitPrimary.Lifestyle.Terrestrial.units 0.5418
summary(res_3_corr_phylo)
Iterations = 750001:3247501
Thinning interval = 2500
Number of chains = 1
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
-0.069527 0.220096 0.006960 0.006403
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
-0.49120 -0.21345 -0.07966 0.08421 0.37013
summary(res_3_corr_nonphylo)
Iterations = 750001:3247501
Thinning interval = 2500
Number of chains = 1
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
0.4507 0.0000 0.0000 0.0000
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
0.4507 0.4507 0.4507 0.4507 0.4507
summary(brms_mn1_3)
Family: categorical
Links: muInsessorial = logit; muTerrestrial = logit
Formula: Primary.Lifestyle ~ log_Tail_Length_centered + IsOmnivore + (1 | a | gr(Phylo, cov = A))
Data: dat (Number of observations: 173)
Draws: 2 chains, each with iter = 15000; warmup = 10000; thin = 1;
total post-warmup draws = 10000
Multilevel Hyperparameters:
~Phylo (Number of levels: 173)
Estimate Est.Error
sd(muInsessorial_Intercept) 7.18 4.74
sd(muTerrestrial_Intercept) 5.76 4.05
cor(muInsessorial_Intercept,muTerrestrial_Intercept) -0.24 0.42
l-95% CI u-95% CI Rhat
sd(muInsessorial_Intercept) 2.26 20.74 1.00
sd(muTerrestrial_Intercept) 0.63 16.04 1.00
cor(muInsessorial_Intercept,muTerrestrial_Intercept) -0.92 0.64 1.00
Bulk_ESS Tail_ESS
sd(muInsessorial_Intercept) 1052 1590
sd(muTerrestrial_Intercept) 920 963
cor(muInsessorial_Intercept,muTerrestrial_Intercept) 1248 2800
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI
muInsessorial_Intercept -0.96 2.39 -6.16 3.36
muTerrestrial_Intercept 3.72 2.49 0.09 9.85
muInsessorial_log_Tail_Length_centered -2.05 3.19 -9.66 3.14
muInsessorial_IsOmnivore 0.87 1.76 -1.87 5.32
muTerrestrial_log_Tail_Length_centered -1.50 2.89 -7.61 4.13
muTerrestrial_IsOmnivore -6.36 3.10 -14.70 -2.92
Rhat Bulk_ESS Tail_ESS
muInsessorial_Intercept 1.00 3821 3835
muTerrestrial_Intercept 1.00 2250 2363
muInsessorial_log_Tail_Length_centered 1.00 2889 2273
muInsessorial_IsOmnivore 1.00 1783 1843
muTerrestrial_log_Tail_Length_centered 1.00 4115 3124
muTerrestrial_IsOmnivore 1.00 1269 1638
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
- The effect of tail length on lifestyle probability was negative for both categories in both models, but the 95% CIs include zero, indicating no strong evidence for an effect.
- Results from MCMCglmm and brms were broadly consistent in magnitude and direction of effects.
- Only weak and uncertain phylogenetic correlation between the two categories.
Example 2
We use the data of habitat category on Sylviidae (Old World warbler - 294 species) as the response variable. Mass and presence or absence of migration are used as the explanatory variable.
Intercept-only model
MCMCglmm
<- inverseA(tree, nodes = "ALL", scale = TRUE)
inv_phylo
<- list(
prior R = list(V = (matrix(1, 2, 2) + diag(2)) / 3, fix = 1),
G = list(G1 = list(V = diag(2), nu = 20,
alpha.mu = rep(0, 2), alpha.V = diag(2)*(pi^2/3))
)
)
<- MCMCglmm(Habitat_Category ~ trait -1,
mcmcglmm_mn2_1 random = ~us(trait):Phylo,
rcov = ~us(trait):units,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "categorical",
data = Sylviidae_dat,
prior = prior,
nitt = 13000*500,
thin = 10*500,
burnin = 3000*500
)
brms
<- ape::vcv.phylo(tree, corr = TRUE)
A <- default_prior(Habitat_Category ~ 1 + (1 |a| gr(Phylo, cov = A)),
priors_brms1 data = Sylviidae_dat,
data2 = list(A = A),
family = categorical(link = "logit")
)
system.time(
<- brm(Habitat_Category ~ 1 + (1 |a| gr(Phylo, cov = A)),
brms_mn2_1 data = Sylviidae_dat,
data2 = list(A = A),
family = categorical(link = "logit"),
prior = priors_brms1,
iter = 18000,
warmup = 8000,
chains = 2,
thin = 1,
) )
Before checking the output,we need to correct the results from MCMCglmm
using the following equation:
<- (16 * sqrt(3) / (15 * pi))^2
c2 <- c2*(2/3) c2a
summary(mcmcglmm_mn2_1)
Iterations = 1500001:6495001
Thinning interval = 5000
Sample size = 1000
DIC: 379.9416
G-structure: ~us(trait):Phylo
post.mean
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 7.151
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 2.010
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 2.010
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 14.710
l-95% CI
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 1.003
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo -2.392
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo -2.392
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 3.210
u-95% CI
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 16.839
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 7.668
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 7.668
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 29.105
eff.samp
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 854.5
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 1000.0
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 1000.0
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 784.9
R-structure: ~us(trait):units
post.mean
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.6667
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.3333
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.3333
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.6667
l-95% CI
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.6667
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.3333
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.3333
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.6667
u-95% CI
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.6667
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.3333
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.3333
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.6667
eff.samp
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0
Location effects: Habitat_Category ~ trait - 1
post.mean l-95% CI u-95% CI eff.samp
traitHabitat_Category.Arid_Open 2.44871 0.09382 4.88885 905.8
traitHabitat_Category.Forested_Vegetated 4.56420 1.35941 7.49998 892.7
pMCMC
traitHabitat_Category.Arid_Open 0.036 *
traitHabitat_Category.Forested_Vegetated <0.001 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
<- mcmcglmm_mn2_1$Sol / sqrt(1+c2a) # for fixed effects
res_1 <- mcmcglmm_mn2_1$VCV / (1+c2a) # for variance components
res_2 <- (mcmcglmm_mn2_1$VCV[, 2]/(1+c2a)) /sqrt((mcmcglmm_mn2_1$VCV[, 1] * mcmcglmm_mn2_1$VCV[, 4])/(1+c2a))
res_3_corr_phylo
summary(res_1)
Iterations = 1500001:6495001
Thinning interval = 5000
Number of chains = 1
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
traitHabitat_Category.Arid_Open 2.207 1.140 0.03605 0.03788
traitHabitat_Category.Forested_Vegetated 4.114 1.417 0.04480 0.04741
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
traitHabitat_Category.Arid_Open 0.1303 1.522 2.095 2.828 4.518
traitHabitat_Category.Forested_Vegetated 1.5750 3.173 4.004 4.958 7.352
summary(res_2)
Iterations = 1500001:6495001
Thinning interval = 5000
Number of chains = 1
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 5.8115
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 1.6331
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 1.6331
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 11.9537
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.5418
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.2709
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.2709
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.5418
SD
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 4.203
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 2.320
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 2.320
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 6.815
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.000
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.000
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.000
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.000
Naive SE
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 0.13290
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 0.07337
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 0.07337
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 0.21551
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.00000
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.00000
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.00000
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.00000
Time-series SE
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 0.14377
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 0.07337
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 0.07337
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 0.24325
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.00000
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.00000
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.00000
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.00000
2. Quantiles for each variable:
2.5%
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 1.3308
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo -1.5487
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo -1.5487
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 4.0126
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.5418
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.2709
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.2709
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.5418
25%
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 3.1661
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 0.2950
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 0.2950
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 7.3840
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.5418
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.2709
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.2709
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.5418
50%
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 4.8034
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 1.2254
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 1.2254
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 10.4777
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.5418
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.2709
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.2709
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.5418
75%
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 6.9772
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 2.4747
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 2.4747
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 14.8650
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.5418
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.2709
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.2709
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.5418
97.5%
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 15.7767
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 7.0730
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 7.0730
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 28.3674
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.5418
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.2709
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.2709
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.5418
summary(res_3_corr_phylo)
Iterations = 1500001:6495001
Thinning interval = 5000
Number of chains = 1
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
0.172049 0.184925 0.005848 0.005848
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
-0.17512 0.03699 0.17474 0.30106 0.51285
summary(brms_mn2_1)
Family: categorical
Links: muAridOpen = logit; muForestedVegetated = logit
Formula: Habitat_Category ~ 1 + (1 | a | gr(Phylo, cov = A))
Data: Sylviidae_dat (Number of observations: 294)
Draws: 2 chains, each with iter = 18000; warmup = 8000; thin = 1;
total post-warmup draws = 20000
Multilevel Hyperparameters:
~Phylo (Number of levels: 294)
Estimate Est.Error
sd(muAridOpen_Intercept) 2.58 0.90
sd(muForestedVegetated_Intercept) 4.14 1.33
cor(muAridOpen_Intercept,muForestedVegetated_Intercept) 0.44 0.31
l-95% CI u-95% CI Rhat
sd(muAridOpen_Intercept) 1.24 4.69 1.00
sd(muForestedVegetated_Intercept) 2.20 7.19 1.00
cor(muAridOpen_Intercept,muForestedVegetated_Intercept) -0.25 0.90 1.00
Bulk_ESS Tail_ESS
sd(muAridOpen_Intercept) 4049 7462
sd(muForestedVegetated_Intercept) 2119 5431
cor(muAridOpen_Intercept,muForestedVegetated_Intercept) 1116 2854
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat
muAridOpen_Intercept 2.24 1.17 -0.05 4.66 1.00
muForestedVegetated_Intercept 3.62 1.49 0.88 6.83 1.00
Bulk_ESS Tail_ESS
muAridOpen_Intercept 3552 6462
muForestedVegetated_Intercept 12256 10030
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
And average phylogenetic signals are…
# average phylogenteic signal - others vs. Arid and open
# MCMCglmm
<- ((mcmcglmm_mn2_1$VCV[, "traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo"]/(1+c2a)) / (mcmcglmm_mn2_1$VCV[, "traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo"]/(1+c2a) + 1))
phylo_signalA_mcmcglmm_nominal %>% mean() phylo_signalA_mcmcglmm_nominal
[1] 0.808917
%>% quantile(probs = c(0.025,0.5,0.975)) phylo_signalA_mcmcglmm_nominal
2.5% 50% 97.5%
0.5709550 0.8276878 0.9403931
## brms
<- brms_mn2_1 %>% as_tibble() %>%
phylo_signalA_brms ::select(Sigma_phy = sd_Phylo__muAridOpen_Intercept) %>%
dplyrmutate(lambda_nominalA = Sigma_phy^2 / (Sigma_phy^2 + 1)) %>%
pull(lambda_nominalA)
%>% mean() phylo_signalA_brms
[1] 0.8370854
%>% quantile(probs = c(0.025,0.5,0.975)) phylo_signalA_brms
2.5% 50% 97.5%
0.6055780 0.8563134 0.9565012
# average phylogenteic signal - both vs. right
## MCMCglmm
<- ((mcmcglmm_mn2_1$VCV[, "traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo"]/(1+c2a)) / (mcmcglmm_mn2_1$VCV[, "traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo"]/(1+c2a) + 1))
phylo_signalF_mcmcglmm_nominal %>% mean() phylo_signalF_mcmcglmm_nominal
[1] 0.9044678
%>% quantile(probs = c(0.025,0.5,0.975)) phylo_signalF_mcmcglmm_nominal
2.5% 50% 97.5%
0.8004989 0.9128743 0.9659487
# brms
<- brms_mn2_1 %>% as_tibble() %>%
phylo_signalF_brms ::select(Sigma_phy = sd_Phylo__muForestedVegetated_Intercept) %>%
dplyrmutate(lambda_nominalF = Sigma_phy^2 / (Sigma_phy^2 + 1)) %>%
pull(lambda_nominalF)
%>% mean() phylo_signalF_brms
[1] 0.9304826
%>% quantile(probs = c(0.025,0.5,0.975)) phylo_signalF_brms
2.5% 50% 97.5%
0.8292983 0.9394892 0.9810195
Both models detected high phylogenetic signal for Habitat_Category, especially for Forested_Vegetated and slightly lower for Arid_Open category. The phylogenetic correlation between the two habitat categories tended to be positive, but the 95% credible intervals included zero in both models, indicating that the correlation was not statistically supported.
One continuous explanatory variable model
MCMCglmm
<- MCMCglmm(Habitat_Category ~ cMass:trait + trait -1,
mcmcglmm_mn2_2 random = ~us(trait):Phylo,
rcov = ~us(trait):units,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "categorical",
data = Sylviidae_dat,
prior = prior,
nitt = 13000*500,
thin = 10*500,
burnin = 3000*500
)
brms
<- ape::vcv.phylo(tree, corr = TRUE)
A <- default_prior(Habitat_Category ~ cMass + (1 |a| gr(Phylo, cov = A)),
priors_brms2 data = Sylviidae_dat,
data2 = list(A = A),
family = categorical(link = "logit")
)
system.time(
<- brm(Habitat_Category ~ cMass + (1 |a| gr(Phylo, cov = A)),
brms_mn2_2 data = Sylviidae_dat,
data2 = list(A = A),
family = categorical(link = "logit"),
prior = priors_brms2,
iter = 18000,
warmup = 8000,
chains = 2,
thin = 1,
) )
summary(mcmcglmm_mn2_2)
Iterations = 1500001:6495001
Thinning interval = 5000
Sample size = 1000
DIC: 369.9206
G-structure: ~us(trait):Phylo
post.mean
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 7.923
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 2.344
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 2.344
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 18.943
l-95% CI
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 1.119
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo -2.666
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo -2.666
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 4.131
u-95% CI
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 18.446
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 9.503
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 9.503
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 41.543
eff.samp
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 649.8
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 770.0
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 770.0
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 745.4
R-structure: ~us(trait):units
post.mean
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.6667
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.3333
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.3333
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.6667
l-95% CI
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.6667
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.3333
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.3333
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.6667
u-95% CI
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.6667
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.3333
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.3333
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.6667
eff.samp
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0
Location effects: Habitat_Category ~ cMass:trait + trait - 1
post.mean l-95% CI u-95% CI
traitHabitat_Category.Arid_Open 2.62403 -0.08498 5.40782
traitHabitat_Category.Forested_Vegetated 4.74629 1.55204 8.47272
cMass:traitHabitat_Category.Arid_Open -0.79705 -2.34051 0.62992
cMass:traitHabitat_Category.Forested_Vegetated 1.03745 -0.87700 2.97350
eff.samp pMCMC
traitHabitat_Category.Arid_Open 1000.0 0.048 *
traitHabitat_Category.Forested_Vegetated 1000.0 0.002 **
cMass:traitHabitat_Category.Arid_Open 1000.0 0.294
cMass:traitHabitat_Category.Forested_Vegetated 892.6 0.252
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
<- mcmcglmm_mn2_2$Sol / sqrt(1+c2a) # for fixed effects
res_1 <- mcmcglmm_mn2_2$VCV / (1+c2a) # for variance components
res_2 <- (mcmcglmm_mn2_2$VCV[, 2]/(1+c2a)) /sqrt((mcmcglmm_mn2_2$VCV[, 1] * mcmcglmm_mn2_2$VCV[, 4])/(1+c2a))
res_3_corr_phylo
summary(res_1)
Iterations = 1500001:6495001
Thinning interval = 5000
Number of chains = 1
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE
traitHabitat_Category.Arid_Open 2.3655 1.2243 0.03872
traitHabitat_Category.Forested_Vegetated 4.2786 1.5798 0.04996
cMass:traitHabitat_Category.Arid_Open -0.7185 0.6901 0.02182
cMass:traitHabitat_Category.Forested_Vegetated 0.9352 0.8609 0.02722
Time-series SE
traitHabitat_Category.Arid_Open 0.03872
traitHabitat_Category.Forested_Vegetated 0.04996
cMass:traitHabitat_Category.Arid_Open 0.02182
cMass:traitHabitat_Category.Forested_Vegetated 0.02882
2. Quantiles for each variable:
2.5% 25% 50% 75%
traitHabitat_Category.Arid_Open 0.01923 1.6198 2.2918 3.0933
traitHabitat_Category.Forested_Vegetated 1.53683 3.1620 4.1529 5.2215
cMass:traitHabitat_Category.Arid_Open -2.10314 -1.1670 -0.6945 -0.2558
cMass:traitHabitat_Category.Forested_Vegetated -0.66742 0.3931 0.8633 1.4720
97.5%
traitHabitat_Category.Arid_Open 5.0795
traitHabitat_Category.Forested_Vegetated 7.8460
cMass:traitHabitat_Category.Arid_Open 0.5787
cMass:traitHabitat_Category.Forested_Vegetated 2.8657
summary(res_2)
Iterations = 1500001:6495001
Thinning interval = 5000
Number of chains = 1
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 6.4389
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 1.9045
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 1.9045
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 15.3942
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.5418
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.2709
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.2709
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.5418
SD
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 4.320
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 2.559
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 2.559
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 8.817
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.000
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.000
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.000
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.000
Naive SE
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 0.13661
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 0.08093
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 0.08093
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 0.27883
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.00000
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.00000
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.00000
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.00000
Time-series SE
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 0.16948
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 0.09223
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 0.09223
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 0.32297
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.00000
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.00000
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.00000
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.00000
2. Quantiles for each variable:
2.5%
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 1.2723
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo -2.0319
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo -2.0319
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 4.7187
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.5418
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.2709
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.2709
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.5418
25%
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 3.4168
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 0.3359
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 0.3359
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 9.3997
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.5418
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.2709
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.2709
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.5418
50%
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 5.3290
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 1.4980
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 1.4980
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 12.9514
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.5418
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.2709
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.2709
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.5418
75%
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 8.3308
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 3.0477
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 3.0477
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 18.8536
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.5418
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.2709
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.2709
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.5418
97.5%
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 18.1859
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 8.1388
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 8.1388
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 38.7921
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.5418
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.2709
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.2709
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.5418
summary(res_3_corr_phylo)
Iterations = 1500001:6495001
Thinning interval = 5000
Number of chains = 1
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
0.172506 0.195639 0.006187 0.006234
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
-0.23534 0.04138 0.18461 0.31305 0.53443
summary(brms_mn2_2)
Family: categorical
Links: muAridOpen = logit; muForestedVegetated = logit
Formula: Habitat_Category ~ cMass + (1 | a | gr(Phylo, cov = A))
Data: Sylviidae_dat (Number of observations: 294)
Draws: 2 chains, each with iter = 18000; warmup = 8000; thin = 1;
total post-warmup draws = 20000
Multilevel Hyperparameters:
~Phylo (Number of levels: 294)
Estimate Est.Error
sd(muAridOpen_Intercept) 2.63 0.94
sd(muForestedVegetated_Intercept) 5.00 1.69
cor(muAridOpen_Intercept,muForestedVegetated_Intercept) 0.48 0.31
l-95% CI u-95% CI Rhat
sd(muAridOpen_Intercept) 1.22 4.90 1.00
sd(muForestedVegetated_Intercept) 2.60 9.08 1.00
cor(muAridOpen_Intercept,muForestedVegetated_Intercept) -0.23 0.93 1.00
Bulk_ESS Tail_ESS
sd(muAridOpen_Intercept) 3376 6591
sd(muForestedVegetated_Intercept) 1488 4134
cor(muAridOpen_Intercept,muForestedVegetated_Intercept) 794 1765
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat
muAridOpen_Intercept 2.47 1.20 0.18 5.03 1.00
muForestedVegetated_Intercept 3.59 1.72 0.41 7.25 1.00
muAridOpen_cMass -0.55 0.75 -2.08 0.92 1.00
muForestedVegetated_cMass 1.29 1.08 -0.62 3.61 1.00
Bulk_ESS Tail_ESS
muAridOpen_Intercept 2925 5132
muForestedVegetated_Intercept 10315 9625
muAridOpen_cMass 8518 11678
muForestedVegetated_cMass 6534 8329
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Summary The model intercept represents the log-odds of belonging to a given habitat category when predictors are at their reference values (here, cMass = 0). For Arid/Open and Forested/Vegetated, the intercepts were positive, and in most cases the 95% credible intervals did not include zero. This means that, under baseline conditions, the probability of belonging to these two habitat categories is higher than for the reference category (Aquatic/Coastal). In other words, the model predicts that species are more likely to occur in Arid/Open or Forested/Vegetated habitats than in Aquatic/Coastal habitats under the reference conditions.
Mass effects: The effect of centred log body mass (cMass) was negative for Arid/Open and positive for Forested/Vegetated, but in both models the 95% credible intervals crossed zero, suggesting no clear statistical support.
Phylogenetic variances: Substantial phylogenetic variance was observed for both habitat categories, indicating moderate to strong phylogenetic structuring.
Phylogenetic correlation: The correlation between habitat categories was small and positive but highly uncertain, with 95% credible intervals spanning zero.
One continuous and one binary explanatory variables model
MCMCglmm
<- MCMCglmm(Habitat_Category ~ cMass:trait + IsMigrate:trait + trait -1,
mcmcglmm_mn2_3 random = ~us(trait):Phylo,
rcov = ~us(trait):units,
ginverse = list(Phylo = inv_phylo$Ainv),
family = "categorical",
data = Sylviidae_dat,
prior = prior,
nitt = 13000*750,
thin = 10*750,
burnin = 3000*750
)
brms
<- ape::vcv.phylo(tree, corr = TRUE)
A <- default_prior(Habitat_Category ~ cMass + IsMigrate + (1 |a| gr(Phylo, cov = A)),
priors_brms3 data = Sylviidae_dat,
data2 = list(A = A),
family = categorical(link = "logit")
)
system.time(
<- brm(Habitat_Category ~ cMass + IsMigrate + (1 |a| gr(Phylo, cov = A)),
brms_mn2_3 data = Sylviidae_dat,
data2 = list(A = A),
family = categorical(link = "logit"),
prior = priors_brms3,
iter = 18000,
warmup = 8000,
chains = 2,
thin = 1,
) )
summary(mcmcglmm_mn2_3)
Iterations = 2250001:9742501
Thinning interval = 7500
Sample size = 1000
DIC: 353.666
G-structure: ~us(trait):Phylo
post.mean
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 9.083
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 1.660
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 1.660
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 20.892
l-95% CI
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 1.196
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo -4.574
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo -4.574
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 2.772
u-95% CI
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 21.74
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 8.40
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 8.40
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 44.08
eff.samp
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 718.3
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 779.0
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 779.0
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 678.4
R-structure: ~us(trait):units
post.mean
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.6667
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.3333
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.3333
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.6667
l-95% CI
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.6667
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.3333
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.3333
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.6667
u-95% CI
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.6667
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.3333
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.3333
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.6667
eff.samp
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0
Location effects: Habitat_Category ~ cMass:trait + IsMigrate:trait + trait - 1
post.mean l-95% CI u-95% CI
traitHabitat_Category.Arid_Open 2.3783 -0.5852 5.1494
traitHabitat_Category.Forested_Vegetated 2.4444 -0.8470 6.7465
cMass:traitHabitat_Category.Arid_Open -0.8778 -2.6132 0.6125
cMass:traitHabitat_Category.Forested_Vegetated 0.8132 -1.2024 2.7663
traitHabitat_Category.Arid_Open:IsMigrate 0.4607 -0.9485 1.8999
traitHabitat_Category.Forested_Vegetated:IsMigrate 3.0348 1.1467 5.1125
eff.samp pMCMC
traitHabitat_Category.Arid_Open 988.4 0.090 .
traitHabitat_Category.Forested_Vegetated 1000.0 0.172
cMass:traitHabitat_Category.Arid_Open 1588.3 0.258
cMass:traitHabitat_Category.Forested_Vegetated 1000.0 0.424
traitHabitat_Category.Arid_Open:IsMigrate 905.7 0.544
traitHabitat_Category.Forested_Vegetated:IsMigrate 1000.0 <0.001 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
<- mcmcglmm_mn2_3$Sol / sqrt(1+c2a) # for fixed effects
res_1 <- mcmcglmm_mn2_3$VCV / (1+c2a) # for variance components
res_2 <- (mcmcglmm_mn2_3$VCV[, 2]/(1+c2a)) /sqrt((mcmcglmm_mn2_3$VCV[, 1] * mcmcglmm_mn2_3$VCV[, 4])/(1+c2a))
res_3_corr_phylo
summary(res_1)
Iterations = 2250001:9742501
Thinning interval = 7500
Number of chains = 1
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE
traitHabitat_Category.Arid_Open 2.1440 1.2735 0.04027
traitHabitat_Category.Forested_Vegetated 2.2035 1.7196 0.05438
cMass:traitHabitat_Category.Arid_Open -0.7913 0.7334 0.02319
cMass:traitHabitat_Category.Forested_Vegetated 0.7331 0.9346 0.02955
traitHabitat_Category.Arid_Open:IsMigrate 0.4153 0.6657 0.02105
traitHabitat_Category.Forested_Vegetated:IsMigrate 2.7358 0.9074 0.02870
Time-series SE
traitHabitat_Category.Arid_Open 0.04051
traitHabitat_Category.Forested_Vegetated 0.05438
cMass:traitHabitat_Category.Arid_Open 0.01840
cMass:traitHabitat_Category.Forested_Vegetated 0.02955
traitHabitat_Category.Arid_Open:IsMigrate 0.02212
traitHabitat_Category.Forested_Vegetated:IsMigrate 0.02870
2. Quantiles for each variable:
2.5% 25% 50%
traitHabitat_Category.Arid_Open -0.4477 1.37995 2.1559
traitHabitat_Category.Forested_Vegetated -0.9808 1.11585 2.1584
cMass:traitHabitat_Category.Arid_Open -2.3303 -1.24354 -0.7433
cMass:traitHabitat_Category.Forested_Vegetated -1.0181 0.08259 0.6961
traitHabitat_Category.Arid_Open:IsMigrate -0.9151 -0.03301 0.4117
traitHabitat_Category.Forested_Vegetated:IsMigrate 1.1262 2.12920 2.6469
75% 97.5%
traitHabitat_Category.Arid_Open 2.8690 4.8198
traitHabitat_Category.Forested_Vegetated 3.1992 5.9649
cMass:traitHabitat_Category.Arid_Open -0.2922 0.5856
cMass:traitHabitat_Category.Forested_Vegetated 1.3172 2.7128
traitHabitat_Category.Arid_Open:IsMigrate 0.8748 1.7025
traitHabitat_Category.Forested_Vegetated:IsMigrate 3.2897 4.7233
summary(res_2)
Iterations = 2250001:9742501
Thinning interval = 7500
Number of chains = 1
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 7.3814
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 1.3490
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 1.3490
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 16.9777
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.5418
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.2709
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.2709
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.5418
SD
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 5.319
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 2.643
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 2.643
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 10.703
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.000
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.000
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.000
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.000
Naive SE
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 0.16821
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 0.08358
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 0.08358
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 0.33847
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.00000
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.00000
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.00000
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.00000
Time-series SE
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 0.19848
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 0.09469
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 0.09469
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 0.41092
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.00000
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.00000
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.00000
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.00000
2. Quantiles for each variable:
2.5%
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 1.5684
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo -3.6719
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo -3.6719
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 4.2987
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.5418
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.2709
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.2709
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.5418
25%
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 3.7512
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo -0.1351
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo -0.1351
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 9.7029
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.5418
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.2709
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.2709
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.5418
50%
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 5.9438
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 1.1143
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 1.1143
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 14.3953
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.5418
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.2709
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.2709
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.5418
75%
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 9.6229
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 2.6647
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 2.6647
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 21.6290
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.5418
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.2709
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.2709
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.5418
97.5%
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.Phylo 20.8510
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.Phylo 7.1611
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.Phylo 7.1611
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.Phylo 43.4114
traitHabitat_Category.Arid_Open:traitHabitat_Category.Arid_Open.units 0.5418
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Arid_Open.units 0.2709
traitHabitat_Category.Arid_Open:traitHabitat_Category.Forested_Vegetated.units 0.2709
traitHabitat_Category.Forested_Vegetated:traitHabitat_Category.Forested_Vegetated.units 0.5418
summary(res_3_corr_phylo)
Iterations = 2250001:9742501
Thinning interval = 7500
Number of chains = 1
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
0.117832 0.197752 0.006253 0.006253
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
-0.27624 -0.01752 0.12623 0.25980 0.49079
summary(brms_mn2_3)
Family: categorical
Links: muAridOpen = logit; muForestedVegetated = logit
Formula: Habitat_Category ~ cMass + IsMigrate + (1 | a | gr(Phylo, cov = A))
Data: Sylviidae_dat (Number of observations: 294)
Draws: 2 chains, each with iter = 18000; warmup = 8000; thin = 1;
total post-warmup draws = 20000
Multilevel Hyperparameters:
~Phylo (Number of levels: 294)
Estimate Est.Error
sd(muAridOpen_Intercept) 2.93 1.24
sd(muForestedVegetated_Intercept) 6.45 3.41
cor(muAridOpen_Intercept,muForestedVegetated_Intercept) 0.35 0.35
l-95% CI u-95% CI Rhat
sd(muAridOpen_Intercept) 1.28 5.93 1.00
sd(muForestedVegetated_Intercept) 2.65 15.31 1.00
cor(muAridOpen_Intercept,muForestedVegetated_Intercept) -0.42 0.90 1.00
Bulk_ESS Tail_ESS
sd(muAridOpen_Intercept) 2322 3372
sd(muForestedVegetated_Intercept) 1254 1903
cor(muAridOpen_Intercept,muForestedVegetated_Intercept) 680 1883
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat
muAridOpen_Intercept 2.08 1.30 -0.42 4.81 1.00
muForestedVegetated_Intercept 0.81 2.34 -4.26 5.08 1.00
muAridOpen_cMass -0.64 0.81 -2.26 0.92 1.00
muAridOpen_IsMigrate 0.53 0.72 -0.92 1.94 1.00
muForestedVegetated_cMass 1.44 1.38 -0.81 4.62 1.00
muForestedVegetated_IsMigrate 3.63 1.73 1.37 7.83 1.00
Bulk_ESS Tail_ESS
muAridOpen_Intercept 3324 5760
muForestedVegetated_Intercept 4304 3182
muAridOpen_cMass 4954 9099
muAridOpen_IsMigrate 5983 9135
muForestedVegetated_cMass 3507 3935
muForestedVegetated_IsMigrate 2677 2363
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Summary
Baseline habitat use: When migration status and body mass are not considered, Sylviidae species are somewhat more likely to be in Arid/Open or Forested/Vegetated habitats than in the reference category (Aquatic/Coastal), but the statistical uncertainty is large enough that we cannot be confident in these differences. Body mass effects: There is no clear evidence that heavier or lighter species prefer either Arid/Open or Forested/Vegetated habitats.
Migration effects: Migratory species are much more likely to be associated with Forested/Vegetated habitats compared to Aquatic/Coastal ones, even after accounting for body mass and phylogeny. Migration does not appear to strongly influence the likelihood of using Arid/Open habitats.
Phylogenetic patterns: The evolutionary tendencies for using Arid/Open vs. Forested/Vegetated habitats are only weakly related, and the data do not clearly indicate a consistent positive or negative correlation across the family’s evolutionary history.
Extension: models with multiple data point per species
Tips for simulating dataset
Unfortunately, there are few published dataset for multiple datapoints per species. Therefore, we generate 1. Gaussian, 2. binary. 3. ordinal, and 4. nominal data for this section and use for MCMCglmm
and brms
models.
Before moving each section, we share some tips we need to keep in mind when simulating data:
Make the dataset as realistic as possible
Simulated data should reflect the real-world context. If you are simulating biological data, consider the typical distributions, relationships, and range of values that could appear in your real dataset. You may need check published papers whether your assumption is suitable.Carefully define random effect(s) and fixed effect(s) parameters
For generating a response variable, ensure that both fixed and random effects are specified clearly. The fixed effects could represent known, deterministic influences (like treatment groups), while random effects account for variability due to unmeasured factors (like species or individuals). When setting residuals (errors) for the Gaussian response variable, remember that the variance of the residuals should typically be smaller than the variance of the random effects. Because they represent smaller-scale noise, while random effects capture larger-scale variability. If you set the residual variance is larger, it can lead model unrealistic.Check true values (we set) and model estimates
After generating the data and fitting your model, compare the “true” parameter values you set for the simulation with the estimates your model provides. This will help you assess the accuracy of your model and how well it can recover the true underlying parameters. It is important to check whether the model overestimates or underestimates certain effects.Ensure all created variables are properly aligned
When simulating data, it is essential to check that all the variables are correctly aligned and correspond to each other as intended. Specifically, pay attention to the following points:Matching species names with those in the phylogenetic tree
Ensure that the species names used in your data set are consistent with those in the phylogenetic tree (if applicable). If these names don’t match, the integrity of the data could be compromised, leading to inaccurate results (or model can not run). For instance, if a species is named “SpeciesA” in the data but referred to as “SpeciesB” in the phylogeny, the data won’t be interpreted correctly.Correspondence between variables
Also check that other variables, such as individual IDs, environmental factors, gender, or group differences, are correctly matched. For example, verify that data points for each individual are correctly associated with the intended ID, and that variables like age and gender follow expected distributions. How to check: After creating your data frame, use functions, such ashead()
,summary()
, orView()
in R (you can find other functions as well) to check for any discrepancies, such as duplicated species names or missing values. This helps that your dataset is internally consistent.
Other small things…
- Read the documentation
It may not be worth mentioning again, but always read the documentation for the functions yo are using. Understand how they work, what parameters they take, and what outputs they produce. - Test with small examples
If you are using a new function, start by testing it on a small sample of data to see how it behaves before using it in your actual analysis. This helps you avoid larger mistakes when you apply it to the full dataset. - Stick to familiar methods
However, stick to methods and functions you are familiar with, especially when you are starting. As you encounter issues or need more advanced features, gradually learn new functions and techniques. It is a safer approach and will help you build confidence.
Simulate datasets and run models
In this extension, We aim to create a dataset for models that take different types of response variables: Gaussian, binary, ordinal, and nominal traits. The explanatory variables are all the same - body mass and sex. All datasets were generated based on observations of 100 bird species, with each species being observed 5 times.
Gaussian
We simulated the habitat range (log-transformed) of each species as the continuous response variable. We assumed habitat range is more strongly influenced by body mass than by sex, and phylogenetic factors are expected to have a greater impact than non-phylogenetic factors.
set.seed(1234)
<- pbtree(n = 100, scale = 1)
tree
# extract the correlation matrix from the phylogenetic tree
<- vcv(tree, corr = TRUE)
phylo_cor
# define parameters
<- 100 # number of species (clusters)
n_species <- 5 # number of observations per species
obs_per_species <- n_species * obs_per_species # total number of observations
n
# simulate fixed effects
## continuous variable: body mass
<- (log(4)+ log(2500))/2 # mean of whole 100 species body mass
mu_mass <- mvrnorm(n = 1, # species level body mass
mu_species_mass mu = rep(mu_mass, ncol(phylo_cor)),
Sigma = phylo_cor*1)
<- sapply(mu_species_mass,
obs_mass function(x) rnorm(obs_per_species, mean = x, sd = sqrt(0.1))) %>%
as.vector() # each observation body mass
## binomial variable: sex
<- rbinom(n, 1, 0.5)
sex
# simulate random effect parameter
## variance components
<- 2
sigma_phylo <- 1 # standard deviation of non-phylogenetic random effect
sigma_non
# phylogenetic random effect
<- mvrnorm(
random_effect_phylo n = 1,
mu = rep(0, ncol(phylo_cor)),
Sigma = sigma_phylo*phylo_cor)
<- rep(random_effect_phylo, each = obs_per_species)
phylo_effect
# non-phylogenetic random effect
<- rnorm(n_species, sd = sqrt(sigma_non))
random_effect_non_phylo <- rep(random_effect_non_phylo, each = obs_per_species)
non_phylo_effect
# residual
<- 0.2
sigma_residual <- rnorm(n, mean = 0, sd = sqrt(sigma_residual))
residual
# define fixed effect parameter
<- 0 # intercept
beta_c0 <- 1.2 # coefficient for body mass
beta_c1 <- 0.1 # coefficient for sex
beta_c2
<- beta_c0 + beta_c1 * obs_mass + beta_c2 * sex + phylo_effect + non_phylo_effect + residual
y
<- data.frame(
sim_data1 habitat_range = y,
mass = obs_mass,
sex = factor(sex, labels = c("F", "M")),
species = names(phylo_effect),
individual_id = factor(1:n), # create unique individual IDs for each observation
phylo = names(phylo_effect)
%>%
) arrange(species)
Run models
<- inverseA(tree, nodes = "ALL", scale = TRUE)$Ainv
Ainv <- list(R = list(V = 1, nu = 0.002),
prior1_mcmcglmm G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 100),
G2 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 100)
)
)
system.time(
<- MCMCglmm(
mod1_mcmcglmm ~ 1,
habitat_range random = ~ phylo + species,
ginverse = list(phylo = Ainv),
prior = prior1_mcmcglmm,
family = "gaussian",
data = sim_data1,
nitt = 13000*60,
thin = 10*60,
burnin = 3000*60
)
)
system.time(
<- MCMCglmm(
mod1_mcmcglmm2 ~ mass,
habitat_range random = ~ phylo + species,
ginverse = list(phylo = Ainv),
prior = prior1_mcmcglmm,
family = "gaussian",
data = sim_data1,
nitt = 13000*100,
thin = 10*100,
burnin = 3000*100
)
)
system.time(
<- MCMCglmm(
mod1_mcmcglmm3 ~ mass + sex,
habitat_range random = ~ phylo + species,
ginverse = list(phylo = Ainv),
prior = prior1_mcmcglmm,
family = "gaussian",
data = sim_data1,
nitt = 13000*250,
thin = 10*250,
burnin = 3000*250
)
)
# brms
<- ape::vcv.phylo(tree, corr = TRUE)
A
<- default_prior(
priors1_brms ~ 1 + (1 | gr(phylo, cov = A)) + (1 | species),
habitat_range data = sim_data1,
data2 = list(A = A),
family = gaussian()
)
system.time(
<- brm(habitat_range ~ 1 + (1 | gr(phylo, cov = A)) + (1 | species),
mod1_brms data = sim_data1,
data2 = list(A = A),
family = gaussian(),
prior = priors1_brms,
iter = 8000,
warmup = 6000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
)
)
<- default_prior(
priors1_brms2 ~ mass + (1 | gr(phylo, cov = A)) + (1 | species),
habitat_range data = sim_data1,
data2 = list(A = A),
family = gaussian()
)
system.time(
<- brm(habitat_range ~ mass + (1 | gr(phylo, cov = A)) + (1 | species),
mod1_brms2 data = sim_data1,
data2 = list(A = A),
family = gaussian(),
prior = priors1_brms2,
iter = 8000,
warmup = 6000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
)
)
<- default_prior(
priors1_brms3 ~ mass + sex + (1 | gr(phylo, cov = A)) + (1 | species),
habitat_range data = sim_data1,
data2 = list(A = A),
family = gaussian()
)
system.time(
<- brm(habitat_range ~ mass + sex + (1 | gr(phylo, cov = A)) + (1 | species),
mod1_brms3 data = sim_data1,
data2 = list(A = A),
family = gaussian(),
prior = priors1_brms3,
iter = 8000,
warmup = 6000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
) )
Results
actual mean and variance of the model is…
mean(y)
[1] 5.560533
var(random_effect_phylo)
[1] 0.9475315
var(random_effect_non_phylo)
[1] 0.8428301
var(residual)
[1] 0.1892837
autocorr.plot(mod1_mcmcglmm$Sol)
autocorr.plot(mod1_mcmcglmm$VCV)
summary(mod1_mcmcglmm)
Iterations = 180001:779401
Thinning interval = 600
Sample size = 1000
DIC: 998.4984
G-structure: ~phylo
post.mean l-95% CI u-95% CI eff.samp
phylo 1.165 0.1108 2.704 881.6
~species
post.mean l-95% CI u-95% CI eff.samp
species 1.036 0.5946 1.504 1323
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 0.3573 0.3125 0.4029 1000
Location effects: habitat_range ~ 1
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) 5.239 4.131 6.302 1000 <0.001 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
posterior_summary(mod1_mcmcglmm$Sol)
Estimate Est.Error Q2.5 Q97.5
(Intercept) 5.238814 0.5481793 4.036911 6.270514
posterior_summary(mod1_mcmcglmm$VCV)
Estimate Est.Error Q2.5 Q97.5
phylo 1.1653905 0.78259640 0.1622860 2.9869859
species 1.0359759 0.24017414 0.6052211 1.5251053
units 0.3572733 0.02396132 0.3150877 0.4083013
summary(mod1_brms)
Family: gaussian
Links: mu = identity; sigma = identity
Formula: habitat_range ~ 1 + (1 | gr(phylo, cov = A)) + (1 | species)
Data: sim_data1 (Number of observations: 500)
Draws: 2 chains, each with iter = 8000; warmup = 6000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~phylo (Number of levels: 100)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.99 0.36 0.33 1.76 1.00 345 722
~species (Number of levels: 100)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 1.02 0.12 0.78 1.26 1.00 661 1451
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 5.28 0.51 4.22 6.30 1.00 4816 2515
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.60 0.02 0.56 0.64 1.00 3973 2562
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
We will now verify whether the model could show estimates close to the true values from the simulated data - there are several points to check during this process. First, confirm whether the model converged by examining metrics such as the effective sampling size, autocorrelation, and R-hat values. Then, compare the true value with model point estimates and check the 95% credible intervals of point estimates. The intercept-only model does not include any independent variables. Instead, it focuses on the intercept (mean value) and variance components to explain the distribution of the dependent variable, habitat range. Looking at the results, the model appears to estimate the mean (fixed effect beta/intercept) and random effect variances close to their true values. However, the residual variance is not accurately estimated.
# MCMCglmm
<- ((mod1_mcmcglmm$VCV[, "phylo"]) / (mod1_mcmcglmm$VCV[, "phylo"] + mod1_mcmcglmm$VCV[, "species"] + mod1_mcmcglmm$VCV[, "units"]))
phylo_signal_mod1_mcmcglmm %>% mean() phylo_signal_mod1_mcmcglmm
[1] 0.4170196
%>% quantile(probs = c(0.025,0.5,0.975)) phylo_signal_mod1_mcmcglmm
2.5% 50% 97.5%
0.08783016 0.42359454 0.73669927
# brms
<- mod1_brms %>% as_tibble() %>%
phylo_signal_mod1_brms ::select(Sigma_phy = sd_phylo__Intercept, Sigma_non_phy = sd_species__Intercept, Res = sigma) %>%
dplyrmutate(lambda_gaussian = Sigma_phy^2 / (Sigma_phy^2 + Sigma_non_phy^2 + Res^2)) %>%
pull(lambda_gaussian)
%>% mean() phylo_signal_mod1_brms
[1] 0.3984794
%>% quantile(probs = c(0.025,0.5,0.975)) phylo_signal_mod1_brms
2.5% 50% 97.5%
0.06146131 0.39645518 0.74712062
The phylogenetic signal was estimated as 0.42 (95% CI: 0.09, 0.74) using MCMCglmm and 0.40 (95% CI: 0.06, 0.75) using brms.It is important to note that we can now consider non-phylogenetic variance (here, species
) more accurately, as the non-phylogenetic variance is directly estimated and is not confounded by within-species effects or other sources of variance.
summary(mod1_mcmcglmm2)
Iterations = 300001:1299001
Thinning interval = 1000
Sample size = 1000
DIC: 680.8927
G-structure: ~phylo
post.mean l-95% CI u-95% CI eff.samp
phylo 2.31 0.5428 4.753 844.2
~species
post.mean l-95% CI u-95% CI eff.samp
species 1.164 0.6272 1.72 867.3
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 0.1881 0.1618 0.2127 1000
Location effects: habitat_range ~ mass
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) -0.2339 -1.7632 1.4823 1000 0.774
mass 1.1143 0.9890 1.2515 1000 <0.001 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
posterior_summary(mod1_mcmcglmm2$Sol)
Estimate Est.Error Q2.5 Q97.5
(Intercept) -0.233917 0.86362777 -1.8194889 1.430111
mass 1.114301 0.06819815 0.9859776 1.250126
posterior_summary(mod1_mcmcglmm2$VCV)
Estimate Est.Error Q2.5 Q97.5
phylo 2.309659 1.1768520 0.7238491 5.2748480
species 1.163885 0.2830942 0.6511874 1.7726979
units 0.188124 0.0129251 0.1628494 0.2151048
summary(mod1_brms2)
Family: gaussian
Links: mu = identity; sigma = identity
Formula: habitat_range ~ mass + (1 | gr(phylo, cov = A)) + (1 | species)
Data: sim_data1 (Number of observations: 500)
Draws: 2 chains, each with iter = 8000; warmup = 6000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~phylo (Number of levels: 100)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 1.44 0.34 0.84 2.18 1.00 563 1266
~species (Number of levels: 100)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 1.07 0.13 0.82 1.34 1.00 851 1427
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -0.15 0.80 -1.73 1.45 1.00 3901 2789
mass 1.11 0.07 0.98 1.25 1.00 6559 2347
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.43 0.02 0.40 0.47 1.00 3843 2959
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Next, regarding the model that includes log-transformed body mass as an explanatory variable, the results showed that, as we had set(beta_c1
), mass was estimated to have a positive effect on habitat range. Additionally, examining the residuals reveals that they are much closer to the true value compared to the intercept-only model. This improvement suggests that including an explanatory variable helps the model capture more of the variability in the data, reducing the unexplained variance (residual variance).
summary(mod1_mcmcglmm3)
Iterations = 750001:3247501
Thinning interval = 2500
Sample size = 1000
DIC: 668.5273
G-structure: ~phylo
post.mean l-95% CI u-95% CI eff.samp
phylo 2.325 0.5523 4.646 1073
~species
post.mean l-95% CI u-95% CI eff.samp
species 1.151 0.6569 1.764 900.3
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 0.1832 0.1572 0.2097 1000
Location effects: habitat_range ~ mass + sex
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) -0.28323 -1.97048 1.38771 1000 0.674
mass 1.11239 0.97750 1.24672 1000 <0.001 ***
sexM 0.15016 0.06697 0.22862 1000 <0.001 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
posterior_summary(mod1_mcmcglmm3$Sol)
Estimate Est.Error Q2.5 Q97.5
(Intercept) -0.2832264 0.85695461 -1.93569141 1.4800298
mass 1.1123899 0.06844718 0.97159841 1.2423309
sexM 0.1501593 0.04167755 0.06778142 0.2286585
posterior_summary(mod1_mcmcglmm3$VCV)
Estimate Est.Error Q2.5 Q97.5
phylo 2.3252270 1.16940589 0.7405934 5.2388026
species 1.1512006 0.29343601 0.6567145 1.7615075
units 0.1831836 0.01342064 0.1586928 0.2117723
summary(mod1_brms3)
Family: gaussian
Links: mu = identity; sigma = identity
Formula: habitat_range ~ mass + sex + (1 | gr(phylo, cov = A)) + (1 | species)
Data: sim_data1 (Number of observations: 500)
Draws: 2 chains, each with iter = 8000; warmup = 6000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~phylo (Number of levels: 100)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 1.43 0.34 0.84 2.14 1.00 619 1255
~species (Number of levels: 100)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 1.08 0.13 0.83 1.33 1.00 964 1578
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -0.22 0.77 -1.76 1.26 1.00 3957 2850
mass 1.11 0.07 0.98 1.24 1.00 8492 2851
sexM 0.15 0.04 0.06 0.23 1.00 8621 2798
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.43 0.02 0.40 0.46 1.00 4112 2962
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Finally, let’s take a look at the results when both mass and sex are included in the model. We assumed that males are, on average, 0.1 units larger than females. This assumption is successfully reflected in the model estimates
Binary
The response variable in this analysis is breeding success, defined as whether a species reproduced or not. The assumptions are as follows: phylogeny has a weak relationship with breeding success, while some non-phylogenetic factors, such as environmental conditions, are likely to play a role. Additionally, heavier species are assumed to have a higher likelihood of breeding success, and females (coded as 0) are expected to succeed in breeding more often than males.
set.seed(456)
<- pbtree(n = 100, scale = 1)
tree2
# extract the correlation matrix from the phylogenetic tree
<- vcv(tree2, corr = TRUE)
phylo_cor
# define parameters
<- 100 # number of species (clusters)
n_species <- 5 # number of observations per species
obs_per_species <- n_species * obs_per_species # total number of observations
n
# simulate fixed effects
## continuous variable: body mass
<- (log(4)+ log(2500))/2 # mean of whole 100 species body mass
mu_mass <- mvrnorm(n = 1, # species level body mass
mu_species_mass mu = rep(mu_mass, ncol(phylo_cor)),
Sigma = phylo_cor*1)
<- sapply(mu_species_mass,
obs_mass function(x) rnorm(obs_per_species, mean = x, sd = sqrt(0.2))) %>%
as.vector() # each observation body mass
## binomial variable: sex
<- rbinom(n, 1, 0.5)
sex
# simulate random effect parameter
## variance components
<- 0.4
sigma_phylo <- 0.3 # standard deviation of non-phylogenetic random effect
sigma_non
## phylogenetic random effect
<- mvrnorm(
random_effect_phylo n = 1,
mu = rep(0, ncol(phylo_cor)),
Sigma = sigma_phylo*phylo_cor)
<- rep(random_effect_phylo, each = obs_per_species)
phylo_effect
## non-phylogenetic random effect
<- rnorm(n, sd = sqrt(sigma_non))
random_effect_non_phylo <- rep(random_effect_non_phylo, each = obs_per_species)
non_phylo_effect
# define fixed effect parameter
<- -0.5 # intercept
beta_b0 <- 0.4 # coefficient for body mass
beta_b1 <- -1.5 # coefficient for sex
beta_b2
# simulate linear predictor
<- beta_b0 + beta_b1 * obs_mass + beta_b2 * sex + phylo_effect + non_phylo_effect
eta1 # eta1 <- beta_b0 + phylo_effect + non_phylo_effect
# probability
<- 1 / (1 + exp(-eta1))
p1 <- rbinom(n, 1, p1)
y1
# combine simulated data
<- data.frame(
sim_data2 breeding = y1,
mass = obs_mass,
sex = factor(sex, labels = c("F", "M")),
species = names(phylo_effect),
individual_id = factor(1:n), # create unique individual IDs for each observation
phylo = names(phylo_effect)
%>%
) arrange(species) %>%
as.data.frame()
head(sim_data2, 20)
breeding mass sex species individual_id phylo
1 0 4.425580 M t1 146 t1
2 0 4.518592 M t1 147 t1
3 0 4.553237 M t1 148 t1
4 1 4.728218 F t1 149 t1
5 1 4.125475 F t1 150 t1
6 1 4.181977 M t10 251 t10
7 1 4.366759 F t10 252 t10
8 0 4.286815 F t10 253 t10
9 0 3.998315 M t10 254 t10
10 1 3.877700 F t10 255 t10
11 1 5.367269 M t100 311 t100
12 0 4.287155 M t100 312 t100
13 0 4.715682 F t100 313 t100
14 1 4.969713 M t100 314 t100
15 1 4.818693 F t100 315 t100
16 0 3.750907 M t11 256 t11
17 1 4.016580 F t11 257 t11
18 1 3.722079 F t11 258 t11
19 1 3.325677 M t11 259 t11
20 1 3.100769 F t11 260 t11
Run models
# MCMCglmm
<- inverseA(tree2, nodes = "ALL", scale = TRUE)
inv_phylo <- list(R = list(V = 1, fix = 1),
prior2_mcmcglmm G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10),
G2 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)
)
)
system.time(
<- MCMCglmm(breeding ~ 1,
mod2_mcmcglmm_logit random = ~ phylo + species,
family = "categorical",
data = sim_data2,
prior = prior2_mcmcglmm,
ginverse = list(phylo = inv_phylo$Ainv),
nitt = 13000*10,
thin = 10*10,
burnin = 3000*10)
)
system.time(
<- MCMCglmm(breeding ~ mass,
mod2_mcmcglmm_logit2 random = ~ phylo + species,
family = "categorical",
data = sim_data2,
prior = prior2_mcmcglmm,
ginverse = list(phylo = inv_phylo$Ainv),
nitt = 13000*20,
thin = 10*20,
burnin = 3000*20)
)
system.time(
<- MCMCglmm(breeding ~ mass + sex,
mod2_mcmcglmm_logit3 random = ~ phylo + species,
family = "categorical",
data = sim_data2,
prior = prior2_mcmcglmm,
ginverse = list(phylo = inv_phylo$Ainv),
nitt = 13000*20,
thin = 10*20,
burnin = 3000*20)
)
# brms
<- ape::vcv.phylo(tree2, corr = TRUE)
A
<- default_prior(
priors2_brms ~ 1 + (1 | gr(phylo, cov = A)) + (1 | species),
breeding data = sim_data2,
data2 = list(A = A),
family = bernoulli(link = "logit")
)
system.time(
<- brm(breeding ~ 1 + (1 | gr(phylo, cov = A)) + (1 | species),
mod2_brms_logit data = sim_data2,
data2 = list(A = A),
family = bernoulli(link = "logit"),
prior = priors2_brms,
iter = 8000,
warmup = 6000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
)
)
<- default_prior(
priors2_brms2 ~ mass + (1 | gr(phylo, cov = A)) + (1 | species),
breeding data = sim_data2,
data2 = list(A = A),
family = bernoulli(link = "logit")
)
system.time(
<- brm(breeding ~ mass + (1 | gr(phylo, cov = A)) + (1 | species),
mod2_brms_logit2 data = sim_data2,
data2 = list(A = A),
family = bernoulli(link = "logit"),
prior = priors2_brms2,
iter = 8000,
warmup = 6000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
)
)
<- default_prior(
priors2_brms3 ~ mass + sex + (1 | gr(phylo, cov = A)) + (1 | species),
breeding data = sim_data2,
data2 = list(A = A),
family = bernoulli(link = "logit")
)
system.time(
<- brm(breeding ~ mass + sex + (1 | gr(phylo, cov = A)) + (1 | species),
mod2_brms_logit3 data = sim_data2,
data2 = list(A = A),
family = bernoulli(link = "logit"),
prior = priors2_brms3,
iter = 8000,
warmup = 6000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
) )
Results
True values are here:
# actual mean and variance of y
mean(eta1)
[1] 1.191048
var(eta1)
[1] 1.277676
var(random_effect_phylo)
[1] 0.2981068
var(random_effect_non_phylo)
[1] 0.3090161
Before checking the output,we need to correct the results from MCMCglmm
using the following equation:
<- (16 * sqrt(3) / (15 * pi))^2 c2
summary(mod2_mcmcglmm_logit)
Iterations = 30001:129901
Thinning interval = 100
Sample size = 1000
DIC: 550.715
G-structure: ~phylo
post.mean l-95% CI u-95% CI eff.samp
phylo 0.9548 8.97e-06 2.355 639.4
~species
post.mean l-95% CI u-95% CI eff.samp
species 0.4156 3.523e-08 1.201 486.4
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 1 1 1 0
Location effects: breeding ~ 1
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) 1.1605 0.1331 2.1428 1272 0.034 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
<- mod2_mcmcglmm_logit$Sol / sqrt(1+c2) # for fixed effects
res_1 <- mod2_mcmcglmm_logit$VCV / (1+c2) # for variance components
res_2
summary(res_1)
Iterations = 30001:129901
Thinning interval = 100
Number of chains = 1
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
1.00036 0.43343 0.01371 0.01215
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
0.1144 0.7425 1.0093 1.2709 1.8407
summary(res_2)
Iterations = 30001:129901
Thinning interval = 100
Number of chains = 1
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
phylo 0.7094 0.5174 0.016361 0.02046
species 0.3088 0.2845 0.008996 0.01290
units 0.7430 0.0000 0.000000 0.00000
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
phylo 0.02672 0.3361 0.5973 0.9889 2.040
species 0.00126 0.0810 0.2530 0.4560 1.028
units 0.74303 0.7430 0.7430 0.7430 0.743
summary(mod2_brms_logit)
Family: bernoulli
Links: mu = logit
Formula: breeding ~ 1 + (1 | gr(phylo, cov = A)) + (1 | species)
Data: sim_data2 (Number of observations: 500)
Draws: 2 chains, each with iter = 8000; warmup = 6000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~phylo (Number of levels: 100)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.77 0.31 0.18 1.41 1.01 542 820
~species (Number of levels: 100)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.51 0.25 0.03 1.00 1.00 475 924
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.99 0.42 0.13 1.81 1.00 1407 1848
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The fixed effect estimates in both MCMCglmm and brms are close to the true values, but the phylogenetic random effects are not performing well. However, the wide 95% confidence intervals indicate that they have very large uncertainties.
And phylogenetic signal is…
# MCMCglmm
<- ((mod2_mcmcglmm_logit$VCV[, "phylo"]/(1+c2)) / (mod2_mcmcglmm_logit$VCV[, "phylo"]/(1+c2) + mod2_mcmcglmm_logit$VCV[, "species"]/(1+c2) + 1))
phylo_signal_mod2_mcmcglmm %>% mean() phylo_signal_mod2_mcmcglmm
[1] 0.3233245
%>% quantile(probs = c(0.025,0.5,0.975)) phylo_signal_mod2_mcmcglmm
2.5% 50% 97.5%
0.01696071 0.32167758 0.64318606
# brms
<- mod2_brms_logit %>% as_tibble() %>%
phylo_signal_mod2_brms ::select(Sigma_phy = sd_phylo__Intercept, Sigma_non_phy = sd_species__Intercept) %>%
dplyrmutate(lambda_binary = Sigma_phy^2 / (Sigma_phy^2 + Sigma_non_phy^2 +1)) %>%
pull(lambda_binary)
%>% mean() phylo_signal_mod2_brms
[1] 0.3117823
%>% quantile(probs = c(0.025,0.5,0.975)) phylo_signal_mod2_brms
2.5% 50% 97.5%
0.01973947 0.30714261 0.63302530
The estimated phylogenetic signal values were similar in both models.
summary(mod2_mcmcglmm_logit2)
Iterations = 60001:259801
Thinning interval = 200
Sample size = 1000
DIC: 543.8511
G-structure: ~phylo
post.mean l-95% CI u-95% CI eff.samp
phylo 1.002 2.289e-06 2.566 972.4
~species
post.mean l-95% CI u-95% CI eff.samp
species 0.5078 4.507e-06 1.34 868.1
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 1 1 1 0
Location effects: breeding ~ mass
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) -1.5979 -3.6412 0.6322 1000 0.126
mass 0.6094 0.2202 1.0957 1000 <0.001 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
<- (16 * sqrt(3) / (15 * pi))^2
c2 <- mod2_mcmcglmm_logit2$Sol / sqrt(1+c2) # for fixed effects
res_1 <- mod2_mcmcglmm_logit2$VCV / (1+c2) # for variance components
res_2
summary(res_1)
Iterations = 60001:259801
Thinning interval = 200
Number of chains = 1
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
(Intercept) -1.3774 0.9619 0.030419 0.030419
mass 0.5253 0.1910 0.006038 0.006038
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
(Intercept) -3.2469 -2.0173 -1.3249 -0.7434 0.4585
mass 0.1571 0.3912 0.5174 0.6488 0.9140
summary(res_2)
Iterations = 60001:259801
Thinning interval = 200
Number of chains = 1
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
phylo 0.7442 0.5982 0.01892 0.01918
species 0.3773 0.3251 0.01028 0.01103
units 0.7430 0.0000 0.00000 0.00000
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
phylo 0.016307 0.3010 0.6173 1.0441 2.215
species 0.001679 0.1132 0.3198 0.5479 1.194
units 0.743029 0.7430 0.7430 0.7430 0.743
summary(mod2_brms_logit2)
Family: bernoulli
Links: mu = logit
Formula: breeding ~ mass + (1 | gr(phylo, cov = A)) + (1 | species)
Data: sim_data2 (Number of observations: 500)
Draws: 2 chains, each with iter = 8000; warmup = 6000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~phylo (Number of levels: 100)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.80 0.35 0.15 1.53 1.00 418 497
~species (Number of levels: 100)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.55 0.28 0.03 1.09 1.00 414 766
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -1.42 1.01 -3.51 0.45 1.00 2097 1850
mass 0.52 0.19 0.15 0.92 1.00 2664 2631
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
summary(mod2_mcmcglmm_logit3)
Iterations = 60001:259801
Thinning interval = 200
Sample size = 1000
DIC: 493.3098
G-structure: ~phylo
post.mean l-95% CI u-95% CI eff.samp
phylo 1.629 0.0009646 4.062 910
~species
post.mean l-95% CI u-95% CI eff.samp
species 0.6195 1.258e-06 1.682 756.2
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 1 1 1 0
Location effects: breeding ~ mass + sex
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) -0.8882 -3.4396 1.7246 1000 0.462
mass 0.7312 0.2433 1.2183 1000 0.002 **
sexM -2.0358 -2.6616 -1.4547 1000 <0.001 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
<- (16 * sqrt(3) / (15 * pi))^2
c2 <- mod2_mcmcglmm_logit3$Sol / sqrt(1+c2) # for fixed effects
res_1 <- mod2_mcmcglmm_logit3$VCV / (1+c2) # for variance components
res_2
summary(res_1)
Iterations = 60001:259801
Thinning interval = 200
Number of chains = 1
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
(Intercept) -0.7656 1.1431 0.036149 0.036149
mass 0.6303 0.2197 0.006946 0.006946
sexM -1.7548 0.2780 0.008791 0.008791
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
(Intercept) -2.9652 -1.4957 -0.7345 -0.08661 1.485
mass 0.2142 0.4818 0.6342 0.77977 1.060
sexM -2.3071 -1.9429 -1.7496 -1.56044 -1.263
summary(res_2)
Iterations = 60001:259801
Thinning interval = 200
Number of chains = 1
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
phylo 1.2105 0.9233 0.02920 0.03061
species 0.4603 0.3948 0.01249 0.01436
units 0.7430 0.0000 0.00000 0.00000
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
phylo 0.043912 0.6072 1.031 1.573 3.571
species 0.003595 0.1428 0.374 0.671 1.391
units 0.743029 0.7430 0.743 0.743 0.743
summary(mod2_brms_logit3)
Family: bernoulli
Links: mu = logit
Formula: breeding ~ mass + sex + (1 | gr(phylo, cov = A)) + (1 | species)
Data: sim_data2 (Number of observations: 500)
Draws: 2 chains, each with iter = 8000; warmup = 6000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~phylo (Number of levels: 100)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 1.01 0.40 0.24 1.86 1.00 776 1072
~species (Number of levels: 100)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.60 0.30 0.05 1.18 1.00 670 1537
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -0.79 1.14 -3.12 1.38 1.00 3595 3216
mass 0.62 0.22 0.19 1.06 1.00 4898 3060
sexM -1.75 0.28 -2.29 -1.23 1.00 6333 3267
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The coefficient estimates for both mass and sex were close to the values we set :)
Ordinal
The response variable is the aggressiveness level, which has three categories: Low, Medium, and High. It is assumed that phylogeny plays a significant role in influencing aggressiveness. However, non-phylogenetic factors are expected to have only a weak relationship with aggressiveness. The relationship between body mass and aggressiveness is unclear, and it may vary across species. Additionally, males are more likely to exhibit higher levels of aggressiveness compared to females.
set.seed(789)
<- pbtree(n = 100, scale = 1)
tree3
# extract the correlation matrix from the phylogenetic tree
<- vcv(tree3, corr = TRUE)
phylo_cor
# define parameters
<- 100 # number of species (clusters)
n_species <- 5 # number of observations per species
obs_per_species <- n_species * obs_per_species # Total number of observations
n
# simulate fixed effects
## continuous variable: body mass
<- (log(4)+ log(2500))/2 # mean of whole 100 species body mass
mu_mass <- mvrnorm(n = 1, # species level body mass
mu_species_mass mu = rep(mu_mass, ncol(phylo_cor)),
Sigma = phylo_cor*1)
<- sapply(mu_species_mass,
obs_mass function(x) rnorm(obs_per_species, mean = x, sd = sqrt(0.2))) %>%
as.vector() # each observation body mass
## binomial variable: sex
<- rbinom(n, 1, 0.5)
sex
# simulate random effect parameter
## variance components
<- 1
sigma_phylo <- 0.4 # standard deviation of non-phylogenetic random effect
sigma_non
## phylogenetic random effect
<- mvrnorm(
random_effect_phylo n = 1,
mu = rep(0, ncol(phylo_cor)),
Sigma = sigma_phylo*phylo_cor)
<- rep(random_effect_phylo, each = obs_per_species)
phylo_effect
## non-phylogenetic random effect
<- rnorm(n_species, sd = sqrt(sigma_non))
random_effect_non_phylo <- rep(random_effect_non_phylo, each = obs_per_species)
non_phylo_effect
# define fixed effect coefficients
<- -0.5 # intercept
beta_o0 <- 0 # coefficient for mass - controversial whether body mass is related to aggressiveness
beta_o1 <- 1 # coefficient for sex - males are often more aggressive than females
beta_o2
# simulate the linear predictor
<- beta_o0 + beta_o1 * obs_mass + beta_o2 * sex + non_phylo_effect + phylo_effect
eta
# define thresholds (cutpoints) for the ordinal categories
<- c(-0.5, 0.5)
cutpoints
# Calculate cumulative probabilities using the probit link function
<- pnorm(cutpoints[1] - eta)
p1 <- pnorm(cutpoints[2] - eta) - p1
p2 <- 1 - pnorm(cutpoints[2] - eta)
p3
# combine probabilities into a matrix
<- cbind(p1, p2, p3)
probs
# simulate the ordinal response
<- apply(probs, 1, function(prob) sample(1:3, 1, prob = prob))
aggressive_level
# create a data frame
<- data.frame(
sim_data3 <- rep(1:n, each = 1),
individual_id phylo = names(phylo_effect),
species = names(phylo_effect),
aggressive_level = factor(aggressive_level,
levels = 1:3,
labels = c("Low", "Medium", "High"),
ordered = TRUE),
sex = factor(sex, labels = c("F", "M")),
mass = obs_mass
%>%
) arrange(species)
table(sim_data3$aggressive_level)
Low Medium High
97 143 260
Run models
# mcmcglmm
<- inverseA(tree3, nodes = "ALL", scale = TRUE)
inv_phylo
<- list(R = list(V = 1, fix = 1),
prior1 G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10),
G2 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 10)
)
)
system.time(
<- MCMCglmm(aggressive_level ~ 1,
mod3_mcmcglmm random = ~ phylo + species,
ginverse = list(phylo = inv_phylo$Ainv),
family = "threshold",
data = sim_data3,
prior = prior1,
nitt = 13000*20,
thin = 10*20,
burnin = 3000*20
)
)
system.time(
<- MCMCglmm(aggressive_level ~ mass,
mod3_mcmcglmm2 random = ~ phylo + species,
ginverse = list(phylo = inv_phylo$Ainv),
family = "threshold",
data = sim_data3,
prior = prior1,
nitt = 13000*20,
thin = 10*20,
burnin = 3000*20
)
)
system.time(
<- MCMCglmm(aggressive_level ~ mass + sex,
mod3_mcmcglmm3 random = ~ phylo + species,
ginverse = list(phylo = inv_phylo$Ainv),
family = "threshold",
data = sim_data3,
prior = prior1,
nitt = 13000*20,
thin = 10*20,
burnin = 3000*20
)
)
# brms
<- ape::vcv.phylo(tree3, corr = TRUE)
A
<- default_prior(
default_priors ~ 1 + (1 | gr(phylo, cov = A)) + (1 | species),
aggressive_level data = sim_data3,
family = cumulative(link = "probit"),
data2 = list(A = A)
)
system.time(
<- brm(
mod3_brms formula = aggressive_level ~ 1 + (1 | gr(phylo, cov = A)) + (1 | species),
data = sim_data3,
family = cumulative(link = "probit"),
data2 = list(A = A),
prior = default_priors,
iter = 9000,
warmup = 7000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
)
)
<- default_prior(
default_priors2 ~ mass + (1 | gr(phylo, cov = A)) + (1 | species),
aggressive_level data = sim_data3,
family = cumulative(link = "probit"),
data2 = list(A = A)
)
system.time(
<- brm(
mod3_brms2 formula = aggressive_level ~ mass + (1 | gr(phylo, cov = A)) + (1 | species),
data = sim_data3,
family = cumulative(link = "probit"),
data2 = list(A = A),
prior = default_priors2,
iter = 9000,
warmup = 7000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
)
)
<- default_prior(
default_priors3 ~ mass + sex + (1 | gr(phylo, cov = A)) + (1 | species),
aggressive_level data = sim_data3,
family = cumulative(link = "probit"),
data2 = list(A = A)
)
system.time(
<- brm(
mod3_brms3 formula = aggressive_level ~ mass + sex + (1 | gr(phylo, cov = A)) + (1 | species),
data = sim_data3,
family = cumulative(link = "probit"),
data2 = list(A = A),
prior = default_priors3,
iter = 9000,
warmup = 7000,
thin = 1,
chain = 2,
control = list(adapt_delta = 0.95),
) )
Results
The estimates should align with the following values…
mean(eta)
[1] 0.7347869
var(eta)
[1] 1.484684
var(random_effect_phylo)
[1] 0.8430567
var(random_effect_non_phylo)
[1] 0.3352239
summary(mod3_mcmcglmm)
Iterations = 60001:259801
Thinning interval = 200
Sample size = 1000
DIC: 864.7065
G-structure: ~phylo
post.mean l-95% CI u-95% CI eff.samp
phylo 1.117 0.2213 2.167 1000
~species
post.mean l-95% CI u-95% CI eff.samp
species 0.3178 9.188e-07 0.6704 1099
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 1 1 1 0
Location effects: aggressive_level ~ 1
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) 1.3987 0.5334 2.1765 1000 <0.001 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Cutpoints:
post.mean l-95% CI u-95% CI eff.samp
cutpoint.traitaggressive_level.1 1.12 0.9581 1.277 1007
posterior_summary(mod3_mcmcglmm$Sol)
Estimate Est.Error Q2.5 Q97.5
(Intercept) 1.398716 0.4201698 0.5994656 2.295179
posterior_summary(mod3_mcmcglmm$VCV)
Estimate Est.Error Q2.5 Q97.5
phylo 1.1170856 0.5334151 0.36097048 2.3726763
species 0.3178276 0.1906041 0.02409845 0.7331308
units 1.0000000 0.0000000 1.00000000 1.0000000
posterior_summary(mod3_mcmcglmm$CP)
Estimate Est.Error Q2.5 Q97.5
cutpoint.traitaggressive_level.1 1.119532 0.08512686 0.9590519 1.278939
summary(mod3_brms)
Family: cumulative
Links: mu = probit; disc = identity
Formula: aggressive_level ~ 1 + (1 | gr(phylo, cov = A)) + (1 | species)
Data: sim_data3 (Number of observations: 500)
Draws: 2 chains, each with iter = 9000; warmup = 7000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~phylo (Number of levels: 100)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 1.01 0.24 0.58 1.52 1.00 731 1479
~species (Number of levels: 100)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.55 0.17 0.19 0.87 1.00 578 657
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1] -1.32 0.42 -2.16 -0.51 1.00 2085 2250
Intercept[2] -0.20 0.41 -1.03 0.60 1.00 2074 2333
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc 1.00 0.00 1.00 1.00 NA NA NA
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
the estimates are consistent with the true values, and both model outputs are as well. Then, phylogenetic signal is…
# MCMCglmm
<- ((mod3_mcmcglmm$VCV[, "phylo"]) / (mod3_mcmcglmm$VCV[, "phylo"] + mod3_mcmcglmm$VCV[, "species"] + 1))
phylo_signal_mod3_mcmcglmm %>% mean() phylo_signal_mod3_mcmcglmm
[1] 0.438553
%>% quantile(probs = c(0.025,0.5,0.975)) phylo_signal_mod3_mcmcglmm
2.5% 50% 97.5%
0.1911082 0.4385983 0.6808732
# brms
<- mod3_brms %>% as_tibble() %>%
phylo_signal_mod3_brms ::select(Sigma_phy = sd_phylo__Intercept, Sigma_non_phy = sd_species__Intercept) %>%
dplyrmutate(lambda_ordinal = Sigma_phy^2 / (Sigma_phy^2 + Sigma_non_phy^2 +1)) %>%
pull(lambda_ordinal)
%>% mean() phylo_signal_mod3_brms
[1] 0.4288363
%>% quantile(probs = c(0.025,0.5,0.975)) phylo_signal_mod3_brms
2.5% 50% 97.5%
0.1788238 0.4319170 0.6689190
Nice - both models yielded nearly identical phylogenetic signals.
summary(mod3_mcmcglmm2)
Iterations = 60001:259801
Thinning interval = 200
Sample size = 1000
DIC: 862.4181
G-structure: ~phylo
post.mean l-95% CI u-95% CI eff.samp
phylo 1.208 0.3157 2.251 1000
~species
post.mean l-95% CI u-95% CI eff.samp
species 0.2989 2.879e-06 0.6329 1000
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 1 1 1 0
Location effects: aggressive_level ~ mass
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) 2.16331 0.84674 3.45252 1000 0.002 **
mass -0.16112 -0.36658 0.03244 1000 0.126
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Cutpoints:
post.mean l-95% CI u-95% CI eff.samp
cutpoint.traitaggressive_level.1 1.126 0.9627 1.299 1000
posterior_summary(mod3_mcmcglmm2$Sol)
Estimate Est.Error Q2.5 Q97.5
(Intercept) 2.163314 0.6643733 0.8905397 3.49905134
mass -0.161119 0.1028227 -0.3548954 0.04503046
posterior_summary(mod3_mcmcglmm2$VCV)
Estimate Est.Error Q2.5 Q97.5
phylo 1.2078921 0.5324432 0.400101109 2.4861135
species 0.2989292 0.1824707 0.009196968 0.7111517
units 1.0000000 0.0000000 1.000000000 1.0000000
posterior_summary(mod3_mcmcglmm2$CP)
Estimate Est.Error Q2.5 Q97.5
cutpoint.traitaggressive_level.1 1.125844 0.08611264 0.9675762 1.303084
summary(mod3_brms2)
Family: cumulative
Links: mu = probit; disc = identity
Formula: aggressive_level ~ mass + (1 | gr(phylo, cov = A)) + (1 | species)
Data: sim_data3 (Number of observations: 500)
Draws: 2 chains, each with iter = 9000; warmup = 7000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~phylo (Number of levels: 100)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 1.08 0.25 0.63 1.63 1.00 797 1655
~species (Number of levels: 100)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.50 0.19 0.08 0.84 1.00 526 647
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1] -2.11 0.66 -3.41 -0.84 1.00 4761 3053
Intercept[2] -0.99 0.65 -2.28 0.29 1.00 4540 2992
mass -0.16 0.10 -0.37 0.04 1.00 6574 2840
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc 1.00 0.00 1.00 1.00 NA NA NA
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
summary(mod3_mcmcglmm3)
Iterations = 60001:259801
Thinning interval = 200
Sample size = 1000
DIC: 785.7335
G-structure: ~phylo
post.mean l-95% CI u-95% CI eff.samp
phylo 1.664 0.3929 3.216 1000
~species
post.mean l-95% CI u-95% CI eff.samp
species 0.4061 9.11e-06 0.8454 1000
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 1 1 1 0
Location effects: aggressive_level ~ mass + sex
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) 1.6060 0.1500 3.1995 1000 0.028 *
mass -0.1199 -0.3519 0.1025 1000 0.308
sexM 1.0586 0.7945 1.3036 1117 <0.001 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Cutpoints:
post.mean l-95% CI u-95% CI eff.samp
cutpoint.traitaggressive_level.1 1.277 1.091 1.487 1000
posterior_summary(mod3_mcmcglmm3$Sol)
Estimate Est.Error Q2.5 Q97.5
(Intercept) 1.6059858 0.7827238 0.09396171 3.1803784
mass -0.1198975 0.1181710 -0.35162836 0.1026163
sexM 1.0585665 0.1318889 0.81133692 1.3344116
posterior_summary(mod3_mcmcglmm3$VCV)
Estimate Est.Error Q2.5 Q97.5
phylo 1.663718 0.8053196 0.58185142 3.5586348
species 0.406108 0.2440002 0.02279807 0.9607917
units 1.000000 0.0000000 1.00000000 1.0000000
posterior_summary(mod3_mcmcglmm3$CP)
Estimate Est.Error Q2.5 Q97.5
cutpoint.traitaggressive_level.1 1.277118 0.1012452 1.088106 1.485881
summary(mod3_brms3)
Family: cumulative
Links: mu = probit; disc = identity
Formula: aggressive_level ~ mass + sex + (1 | gr(phylo, cov = A)) + (1 | species)
Data: sim_data3 (Number of observations: 500)
Draws: 2 chains, each with iter = 9000; warmup = 7000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~phylo (Number of levels: 100)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 1.25 0.30 0.72 1.90 1.00 423 1179
~species (Number of levels: 100)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.59 0.21 0.11 0.97 1.00 357 408
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1] -1.58 0.76 -3.08 -0.12 1.00 2467 2888
Intercept[2] -0.30 0.76 -1.80 1.14 1.00 2420 3074
mass -0.13 0.11 -0.35 0.09 1.00 3431 3348
sexM 1.05 0.13 0.80 1.31 1.00 5609 3295
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc 1.00 0.00 1.00 1.00 NA NA NA
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Both MCMCglmm and brms estimated very similar values, and the fixed effect coefficients were close to the values set in the simulation.
Nominal
The response variable is lateralisation (handedness/footedness), categorized as Left, Right, or Both, with “Both” serving as the reference category, representing non-preference. It is assumed that phylogeny is related to lateralisation, with a weak negative correlation between left and right preferences. Non-phylogenetic factors are expected to have only a weak influence on lateralisation. While body mass may have a minor effect, sex would not have significant effect on lateralisation. The majority of individuals tend to show a preference for either the left or right side, with only a few exhibiting no clear preference.
Here, we create a simple dataset and check the results from the intercept-only model, as there are some challenges. The nominal model is more complex, so when we use similar settings to the three models above, the sample size we have set is too small. As a result, the variance and correlation do not align. Therefore, in this case, we do not include the effects of mass or sex when simulating the linear predictor. Additionally, both phylogenetic and non-phylogenetic correlations are set to zero.
set.seed(789)
<- pbtree(n = 100, scale = 1)
tree4
# extract the correlation matrix from the phylogenetic tree
<- vcv(tree4, corr = TRUE)
phylo_cor
# define parameters
<- 100 # number of species (clusters)
n_species <- 5 # number of observations per species
obs_per_species <- n_species * obs_per_species # total number of observations
n <- 3
n_categories
# define standard deviations and correlations for the random effects
<- c(1, 1.5) # standard deviations for phylogenetic random effects for left and right
sigma_phylo <- 0 # correlation between phylogenetic random effects for left and right
cor_phylo <- c(1.1, 1) # standard deviations for non-phylogenetic random effects for left and right
sigma_u <- 0 # correlation between non-phylogenetic random effects for left and right
cor_u
# covariance matrices
<- diag(sigma_u^2)
cov_u 1, 2] <- cov_u[2, 1] <- cor_u * sigma_u[1] * sigma_u[2]
cov_u[
<- diag(sigma_phylo^2)
cov_phylo 1, 2] <- cov_phylo[2, 1] <- cor_phylo * sigma_phylo[1] * sigma_phylo[2]
cov_phylo[
# simulate phylogenetic random effects
<- mvrnorm(n = 1, mu = rep(0, n_species * 2), Sigma = kronecker(cov_phylo, phylo_cor))
phylo_random_effects <- rep(phylo_random_effects[1:n_species], each = obs_per_species)
phylo_effect_1 <- rep(phylo_random_effects[(n_species + 1):(2 * n_species)], each = obs_per_species)
phylo_effect_2
# simulate non-phylogenetic random effects
<- mvrnorm(n_species, mu = c(0, 0), Sigma = cov_u)
non_phylo_random_effects <- rep(non_phylo_random_effects[, 1], each = obs_per_species)
non_phylo_effect_1 <- rep(non_phylo_random_effects[, 2], each = obs_per_species)
non_phylo_effect_2
# define fixed effect coefficients for each category relative to the reference category (Both)
<- c(0.4, 0.8) # intercept for left and right
beta_m0 <- c(0.2, -0.1) # coefficient for mass for left and right
beta_m1 <- c(0, 0) # coefficient for sex for left and right
beta_m2
# simulate the linear predictor for each category relative to the reference
<- matrix(0, nrow = n, ncol = 2)
eta 1] <- beta_m0[1] + non_phylo_effect_1 + phylo_effect_1
eta[, 2] <- beta_m0[2] + non_phylo_effect_2 + phylo_effect_2
eta[,
# calculate probabilities using the multinomial logit link function
<- exp(cbind(eta, 0)) # adding a column of zeros for the reference category
exp_eta <- exp_eta / rowSums(exp_eta)
probs
# simulate the multinomial response
<- apply(probs, 1, function(prob) sample(1:3, 1, prob = prob))
lateralisation
# generate individual ID for each observation
<- rep(1:n, each = 1) # create unique individual IDs for each observation
individual_id
# create a data frame
<- data.frame(
sim_data4 individual_id = individual_id,
phylo = rep(tree4$tip.label, each = 5),
species = rep(tree4$tip.label, each = 5),
lateralisation = factor(lateralisation,
levels = 1:n_categories,
labels = c("Left", "Right", "Both"),
ordered = FALSE),
sex = factor(sex, labels = c("F", "M")),
mass = obs_mass
%>%
) arrange(species)
$lateralisation <- relevel(sim_data4$lateralisation, ref = "Both")
sim_data4
table(sim_data4$lateralisation)
Both Left Right
45 208 247
Run models
<- inverseA(tree4, nodes = "ALL", scale = TRUE)
inv_phylo
<- list(
prior1 R = list(V = (matrix(1, 2, 2) + diag(2)) / 3, fix = 1),
G = list(G1 = list(V = diag(2), nu = 2,
alpha.mu = rep(0, 2), alpha.V = diag(2)
),G2 = list(V = diag(2), nu = 2,
alpha.mu = rep(0, 2), alpha.V = diag(2)
)
)
)
system.time(
<- MCMCglmm(lateralisation ~ trait - 1,
mcmcglmm_mod4 random = ~us(trait):phylo + us(trait):species,
rcov = ~us(trait):units,
ginverse = list(phylo = inv_phylo$Ainv),
family = "categorical",
data = sim_data4,
prior = prior1,
nitt = 13000*150,
thin = 10*150,
burnin = 3000*150
)
)
# brms
<- ape::vcv.phylo(tree4, corr = TRUE)
A <- default_prior(lateralisation ~ 1 + (1 |a| gr(phylo, cov = A)) + (1 |b| species),
priors_brms1 data = sim_data4,
data2 = list(A = A),
family = categorical(link = "logit")
)
system.time(
<- brm(lateralisation ~ 1 + (1 |a| gr(phylo, cov = A)) + (1 |b| species),
brms_mod4 data = sim_data4,
data2 = list(A = A),
family = categorical(link = "logit"),
prior = priors_brms1,
iter = 12000,
warmup = 8000,
chains = 2,
thin = 1,
cores = 2,
control = list(adapt_delta = 0.99),
) )
Results
True values are…
mean(eta[, 1])
[1] 1.498934
mean(eta[, 2])
[1] 1.71958
var(phylo_effect_1)
[1] 0.888621
var(phylo_effect_2)
[1] 1.853071
var(non_phylo_effect_1)
[1] 1.135754
var(non_phylo_effect_2)
[1] 1.184511
cor(phylo_effect_1, phylo_effect_2)
[1] 0.1435537
cor(non_phylo_effect_1, non_phylo_effect_2)
[1] 0.01583177
Before checking the output,we need to correct the results from MCMCglmm
using the following equation:
<- (16 * sqrt(3) / (15 * pi))^2
c2 <- c2*(2/3) c2a
summary(mod4_mcmcglmm)
Iterations = 450001:1948501
Thinning interval = 1500
Sample size = 1000
DIC: 758.4723
G-structure: ~us(trait):phylo
post.mean l-95% CI
traitlateralisation.Left:traitlateralisation.Left.phylo 0.5216 2.422e-07
traitlateralisation.Right:traitlateralisation.Left.phylo -0.4840 -2.272e+00
traitlateralisation.Left:traitlateralisation.Right.phylo -0.4840 -2.272e+00
traitlateralisation.Right:traitlateralisation.Right.phylo 4.6297 1.388e+00
u-95% CI eff.samp
traitlateralisation.Left:traitlateralisation.Left.phylo 1.6198 1000
traitlateralisation.Right:traitlateralisation.Left.phylo 0.8416 1000
traitlateralisation.Left:traitlateralisation.Right.phylo 0.8416 1000
traitlateralisation.Right:traitlateralisation.Right.phylo 9.1477 1000
~us(trait):species
post.mean
traitlateralisation.Left:traitlateralisation.Left.species 1.6045
traitlateralisation.Right:traitlateralisation.Left.species -0.4089
traitlateralisation.Left:traitlateralisation.Right.species -0.4089
traitlateralisation.Right:traitlateralisation.Right.species 1.0943
l-95% CI u-95% CI
traitlateralisation.Left:traitlateralisation.Left.species 3.293e-01 3.3113
traitlateralisation.Right:traitlateralisation.Left.species -1.402e+00 0.5503
traitlateralisation.Left:traitlateralisation.Right.species -1.402e+00 0.5503
traitlateralisation.Right:traitlateralisation.Right.species 1.113e-06 2.6165
eff.samp
traitlateralisation.Left:traitlateralisation.Left.species 1000
traitlateralisation.Right:traitlateralisation.Left.species 1000
traitlateralisation.Left:traitlateralisation.Right.species 1000
traitlateralisation.Right:traitlateralisation.Right.species 1294
R-structure: ~us(trait):units
post.mean l-95% CI
traitlateralisation.Left:traitlateralisation.Left.units 0.6667 0.6667
traitlateralisation.Right:traitlateralisation.Left.units 0.3333 0.3333
traitlateralisation.Left:traitlateralisation.Right.units 0.3333 0.3333
traitlateralisation.Right:traitlateralisation.Right.units 0.6667 0.6667
u-95% CI eff.samp
traitlateralisation.Left:traitlateralisation.Left.units 0.6667 0
traitlateralisation.Right:traitlateralisation.Left.units 0.3333 0
traitlateralisation.Left:traitlateralisation.Right.units 0.3333 0
traitlateralisation.Right:traitlateralisation.Right.units 0.6667 0
Location effects: lateralisation ~ trait - 1
post.mean l-95% CI u-95% CI eff.samp pMCMC
traitlateralisation.Left 0.70100 -0.04992 1.49781 1000 0.086 .
traitlateralisation.Right 0.48833 -1.14012 2.22379 1000 0.578
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
<- mod4_mcmcglmm$Sol / sqrt(1+c2a) # for fixed effects
res_1 <- mod4_mcmcglmm$VCV / (1+c2a) # for variance components
res_2 <- (mod4_mcmcglmm$VCV[, 2]/(1+c2a)) /sqrt((mod4_mcmcglmm$VCV[, 1] * mod4_mcmcglmm$VCV[, 4])/(1+c2a))
res_3_corr_phylo <- (mod4_mcmcglmm$VCV[, 6]/(1+c2a)) /sqrt((mod4_mcmcglmm$VCV[, 5] * mod4_mcmcglmm$VCV[, 8])/(1+c2a))
res_3_corr_nonphylo
summary(res_1)
Iterations = 450001:1948501
Thinning interval = 1500
Number of chains = 1
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
traitlateralisation.Left 0.6319 0.3392 0.01073 0.01073
traitlateralisation.Right 0.4402 0.7946 0.02513 0.02513
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
traitlateralisation.Left -0.1209 0.44513 0.6354 0.8306 1.306
traitlateralisation.Right -1.0131 -0.09662 0.4496 0.9362 2.079
summary(res_2)
Iterations = 450001:1948501
Thinning interval = 1500
Number of chains = 1
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD
traitlateralisation.Left:traitlateralisation.Left.phylo 0.4239 0.4635
traitlateralisation.Right:traitlateralisation.Left.phylo -0.3933 0.6352
traitlateralisation.Left:traitlateralisation.Right.phylo -0.3933 0.6352
traitlateralisation.Right:traitlateralisation.Right.phylo 3.7622 1.7766
traitlateralisation.Left:traitlateralisation.Left.species 1.3038 0.6761
traitlateralisation.Right:traitlateralisation.Left.species -0.3323 0.3875
traitlateralisation.Left:traitlateralisation.Right.species -0.3323 0.3875
traitlateralisation.Right:traitlateralisation.Right.species 0.8892 0.7171
traitlateralisation.Left:traitlateralisation.Left.units 0.5418 0.0000
traitlateralisation.Right:traitlateralisation.Left.units 0.2709 0.0000
traitlateralisation.Left:traitlateralisation.Right.units 0.2709 0.0000
traitlateralisation.Right:traitlateralisation.Right.units 0.5418 0.0000
Naive SE
traitlateralisation.Left:traitlateralisation.Left.phylo 0.01466
traitlateralisation.Right:traitlateralisation.Left.phylo 0.02009
traitlateralisation.Left:traitlateralisation.Right.phylo 0.02009
traitlateralisation.Right:traitlateralisation.Right.phylo 0.05618
traitlateralisation.Left:traitlateralisation.Left.species 0.02138
traitlateralisation.Right:traitlateralisation.Left.species 0.01226
traitlateralisation.Left:traitlateralisation.Right.species 0.01226
traitlateralisation.Right:traitlateralisation.Right.species 0.02268
traitlateralisation.Left:traitlateralisation.Left.units 0.00000
traitlateralisation.Right:traitlateralisation.Left.units 0.00000
traitlateralisation.Left:traitlateralisation.Right.units 0.00000
traitlateralisation.Right:traitlateralisation.Right.units 0.00000
Time-series SE
traitlateralisation.Left:traitlateralisation.Left.phylo 0.01466
traitlateralisation.Right:traitlateralisation.Left.phylo 0.02009
traitlateralisation.Left:traitlateralisation.Right.phylo 0.02009
traitlateralisation.Right:traitlateralisation.Right.phylo 0.05618
traitlateralisation.Left:traitlateralisation.Left.species 0.02138
traitlateralisation.Right:traitlateralisation.Left.species 0.01226
traitlateralisation.Left:traitlateralisation.Right.species 0.01226
traitlateralisation.Right:traitlateralisation.Right.species 0.01993
traitlateralisation.Left:traitlateralisation.Left.units 0.00000
traitlateralisation.Right:traitlateralisation.Left.units 0.00000
traitlateralisation.Left:traitlateralisation.Right.units 0.00000
traitlateralisation.Right:traitlateralisation.Right.units 0.00000
2. Quantiles for each variable:
2.5% 25%
traitlateralisation.Left:traitlateralisation.Left.phylo 0.0009474 0.09136
traitlateralisation.Right:traitlateralisation.Left.phylo -1.8018394 -0.71723
traitlateralisation.Left:traitlateralisation.Right.phylo -1.8018394 -0.71723
traitlateralisation.Right:traitlateralisation.Right.phylo 1.3864388 2.45513
traitlateralisation.Left:traitlateralisation.Left.species 0.3061715 0.82577
traitlateralisation.Right:traitlateralisation.Left.species -1.1075741 -0.56624
traitlateralisation.Left:traitlateralisation.Right.species -1.1075741 -0.56624
traitlateralisation.Right:traitlateralisation.Right.species 0.0251312 0.33532
traitlateralisation.Left:traitlateralisation.Left.units 0.5417579 0.54176
traitlateralisation.Right:traitlateralisation.Left.units 0.2708789 0.27088
traitlateralisation.Left:traitlateralisation.Right.units 0.2708789 0.27088
traitlateralisation.Right:traitlateralisation.Right.units 0.5417579 0.54176
50% 75%
traitlateralisation.Left:traitlateralisation.Left.phylo 0.2797 0.606425
traitlateralisation.Right:traitlateralisation.Left.phylo -0.3356 -0.003957
traitlateralisation.Left:traitlateralisation.Right.phylo -0.3356 -0.003957
traitlateralisation.Right:traitlateralisation.Right.phylo 3.4326 4.670559
traitlateralisation.Left:traitlateralisation.Left.species 1.1969 1.673444
traitlateralisation.Right:traitlateralisation.Left.species -0.3359 -0.104277
traitlateralisation.Left:traitlateralisation.Right.species -0.3359 -0.104277
traitlateralisation.Right:traitlateralisation.Right.species 0.7494 1.269344
traitlateralisation.Left:traitlateralisation.Left.units 0.5418 0.541758
traitlateralisation.Right:traitlateralisation.Left.units 0.2709 0.270879
traitlateralisation.Left:traitlateralisation.Right.units 0.2709 0.270879
traitlateralisation.Right:traitlateralisation.Right.units 0.5418 0.541758
97.5%
traitlateralisation.Left:traitlateralisation.Left.phylo 1.6498
traitlateralisation.Right:traitlateralisation.Left.phylo 0.8138
traitlateralisation.Left:traitlateralisation.Right.phylo 0.8138
traitlateralisation.Right:traitlateralisation.Right.phylo 8.0450
traitlateralisation.Left:traitlateralisation.Left.species 2.8234
traitlateralisation.Right:traitlateralisation.Left.species 0.4842
traitlateralisation.Left:traitlateralisation.Right.species 0.4842
traitlateralisation.Right:traitlateralisation.Right.species 2.6216
traitlateralisation.Left:traitlateralisation.Left.units 0.5418
traitlateralisation.Right:traitlateralisation.Left.units 0.2709
traitlateralisation.Left:traitlateralisation.Right.units 0.2709
traitlateralisation.Right:traitlateralisation.Right.units 0.5418
summary(res_3_corr_phylo)
Iterations = 450001:1948501
Thinning interval = 1500
Number of chains = 1
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
-0.32095 0.43943 0.01390 0.01465
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
-0.86925 -0.68977 -0.42416 -0.02273 0.67656
summary(res_3_corr_nonphylo)
Iterations = 450001:1948501
Thinning interval = 1500
Number of chains = 1
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
-0.35545 0.34670 0.01096 0.01096
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
-0.8562 -0.6255 -0.4143 -0.1322 0.4288
summary(mod4_brms)
Family: categorical
Links: muLeft = logit; muRight = logit
Formula: lateralisation ~ 1 + (1 | a | gr(phylo, cov = A)) + (1 | b | species)
Data: sim_data4 (Number of observations: 500)
Draws: 2 chains, each with iter = 12000; warmup = 8000; thin = 1;
total post-warmup draws = 8000
Multilevel Hyperparameters:
~phylo (Number of levels: 100)
Estimate Est.Error l-95% CI u-95% CI
sd(muLeft_Intercept) 0.60 0.37 0.03 1.41
sd(muRight_Intercept) 1.99 0.46 1.21 3.02
cor(muLeft_Intercept,muRight_Intercept) -0.28 0.46 -0.96 0.73
Rhat Bulk_ESS Tail_ESS
sd(muLeft_Intercept) 1.00 1258 2496
sd(muRight_Intercept) 1.00 2445 4477
cor(muLeft_Intercept,muRight_Intercept) 1.00 480 746
~species (Number of levels: 100)
Estimate Est.Error l-95% CI u-95% CI
sd(muLeft_Intercept) 1.16 0.31 0.56 1.79
sd(muRight_Intercept) 0.94 0.41 0.14 1.76
cor(muLeft_Intercept,muRight_Intercept) -0.35 0.39 -0.97 0.45
Rhat Bulk_ESS Tail_ESS
sd(muLeft_Intercept) 1.00 2264 2590
sd(muRight_Intercept) 1.00 1100 1616
cor(muLeft_Intercept,muRight_Intercept) 1.00 1699 2808
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
muLeft_Intercept 0.63 0.35 -0.11 1.32 1.00 3793 4140
muRight_Intercept 0.37 0.79 -1.18 1.89 1.00 3953 4832
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Please look at the correlation part - estimating random effect covariance is harder than estimating the random effects variance(s) - you now understand there is a different difficulty in estimating parameters. The 95% CIs are a quite large, they have very large uncertainties and with such large uncertainties, we do not expect they match…
fixed effects (beta) < random effects variance(s) <<< random effects covariance (correlation)
And phylogenetic signal is…
# average phylogenteic signal - both vs. left
## MCMCglmm
<- ((mod4_mcmcglmm$VCV[, "traitlateralisation.Left:traitlateralisation.Left.phylo"]/(1+c2a)) / (mod4_mcmcglmm$VCV[, "traitlateralisation.Left:traitlateralisation.Left.phylo"]/(1+c2a) + mod4_mcmcglmm$VCV[, "traitlateralisation.Left:traitlateralisation.Left.species"]/(1+c2a) + 1))
phylo_signalL_mod4_mcmcglmm %>% mean() phylo_signalL_mod4_mcmcglmm
[1] 0.1452383
%>% quantile(probs = c(0.025, 0.5, 0.975)) phylo_signalL_mod4_mcmcglmm
2.5% 50% 97.5%
0.000305831 0.112652234 0.482842085
## brms
<- mod4_brms %>% as_tibble() %>%
phylo_signalL_mod4_brms ::select(Sigma_phy = sd_phylo__muLeft_Intercept, Sigma_non_phy = sd_species__muLeft_Intercept) %>%
dplyrmutate(lambda_nominalL = Sigma_phy^2 / (Sigma_phy^2 + Sigma_non_phy^2 + 1)) %>%
pull(lambda_nominalL)
%>% mean() phylo_signalL_mod4_brms
[1] 0.1557183
%>% quantile(probs = c(0.025,0.5,0.975)) phylo_signalL_mod4_brms
2.5% 50% 97.5%
0.0004165978 0.1182931266 0.5124994918
# average phylogenteic signal - both vs. right
## MCMCglmm
<- ((mod4_mcmcglmm$VCV[, "traitlateralisation.Right:traitlateralisation.Right.phylo"]/(1+c2a)) / (mod4_mcmcglmm$VCV[, "traitlateralisation.Right:traitlateralisation.Right.phylo"]/(1+c2a) + mod4_mcmcglmm$VCV[, "traitlateralisation.Right:traitlateralisation.Right.species"]/(1+c2a) + 1))
phylo_signalR_mod4_mcmcglmm %>% mean() phylo_signalR_mod4_mcmcglmm
[1] 0.6446448
%>% quantile(probs = c(0.025,0.5,0.975)) phylo_signalR_mod4_mcmcglmm
2.5% 50% 97.5%
0.3433895 0.6569055 0.8555424
## brms
<- mod4_brms %>% as_tibble() %>%
phylo_signalR_mod4_brms ::select(Sigma_phy = sd_phylo__muRight_Intercept, Sigma_non_phy = sd_species__muRight_Intercept) %>%
dplyrmutate(lambda_nominalR = Sigma_phy^2 / (Sigma_phy^2 + Sigma_non_phy^2 + 1)) %>%
pull(lambda_nominalR)
%>% mean() phylo_signalR_mod4_brms
[1] 0.6524086
%>% quantile(probs = c(0.025,0.5,0.975)) phylo_signalR_mod4_brms
2.5% 50% 97.5%
0.3728231 0.6642912 0.8648875
References and useful links (add useful link!)
Sheard C, Skinner N, and Caro T. The evolution of rodent tail morphology. Am. Nat. 2024. 203:629-43. doi.org/10.1086/729751
Macdonald RX, Sheard C, Howell N, and Caro T. Primate coloration and colour vision: a comparative approach. Biol. J. Linn. Soc. Lond 2024. 141:435-455. doi.org/10.1093/biolinnean/blad089
Tobias et al. AVONET: morphological, ecological and geographical data for all birds. Ecol. Lett 2022. 25:581-597. doi.org/10.1111/ele.13898
Vehtari A, Gelman A, Simpson D, Carpenter B, Bürkner PC. Rank-normalization, folding, and localization: An improved \(\hat{\mathbf{R}}\) for assessing convergence of MCMC (with discussion). Bayesian. anal 2021. 16:667-718. doi.org/10.1214/20-BA1221