rm(list = ls()) # remove all objects from the workspace to start from a clean session and avoid using stale variables.
gc() # trigger garbage collection to free memory and reduce RAM usage after clearing the workspace.Intended audience and scope
This online tutorial accompanies our tutorial paper: A unified framework for phylogenetic and spatial meta-analysis: concepts, implementation, and practical guidance [LINK]
This is intended for researchers who already have a basic understanding of meta-analysis and some familiarity with statistical modelling in R. It is aimed at readers who are interested in phylogenetic and spatial meta-analytic modelling, given the increasing availability of large-scale datasets that include species-level traits and geographic information.
In this online tutorial, we demonstrate how to conduct phylogenetic and spatial meta-analyses using R packages such as metafor, brms, and glmmTMB. We provide step-by-step instructions on creating phylogenetic and spatial correlation matrices, fitting models with different correlation structures, and interpreting the results.
Setup
# install.packages("pacman") # if not already installed
pacman::p_load(brms, metafor, metadat, tidyverse, crayon, here, ape, patchwork, dplyr, tidyr, MCMCglmm, bayesplot, tidybayes, posterior, orchaRd,purrr, stringr, readr, magrittr, janitor, glmmTMB, rotl, sf, gtools)Set up parallel processing for brms - please adjust according to your computer’s capabilities.
max_cores <- 10
num_chains <- 2
threads_per_chain <- floor(max_cores / num_chains)
options(mc.cores = num_chains) Phylogenetic meta-analysis
How to make phylogenetic correlation matrix?
First, we will show how to make a phylogenetic correlation matrix to account for the shared evolutionary history of species in your meta-analytic dataset. A phylogenetic correlation matrix reflects how closely related species are. Relations among a set of species (phylogenies) can be derived in two main ways:
- Using a specific phylogenetic tree
- Using the
rotlpackage (accessing a synthetic tree from Open Tree of Life; https://opentreeoflife.github.io/)
In the following two sections, we introduce how to make the correlation matrix from phylogenetic trees in these two ways. When making a correlation matrix from a phylogenetic tree we need to assume an evolutionary model (process), which determines how branch lengths are transformed into expected covariances between species. The example below demonstrates how to compute a correlation matrix under the Brownian motion process.
In some cases, the phylogenetic relationships among the species of interest are already well established, and a corresponding phylogenetic tree for a given taxon is available from a published source or a public repository. When this is the case, it is usually preferable to use the existing tree rather than constructing a new one. For example, large, time-calibrated phylogenies are available for many taxonomic groups (e.g. birds, mammals, plants), often accompanied by branch lengths and documentation of the underlying assumptions. Using such trees ensures consistency with previous studies and avoids unnecessary reconstruction steps
We now illustrate how to use an existing phylogenetic tree provided as a file and prepare it for use in downstream analyses, such as phylogenetic meta-analysis (of course, you can also apply what we explain to broader phylogenetic comparative analysis.)
Step 1. Read the tree from a file
Phylogenetic trees are commonly distributed in formats such as Newick (.nwk, .tre) or Nexus (.nex). The ape package can read most of these standard-format files.
Here, we will use a sample dataset on Sylviidae (Sylviid warblers; a family of passerines). We have a Nexus file containing a phylogenetic tree for 294 species from the Sylvidae family. However, our meta-analytic dataset only includes 10 species (for didactic purposes). So, we first need to “prune” the tree to remove those species that are not in our data set.
library(ape)
#load ape library for reading and manipulating phylogenetic trees in R
dat <- read.csv(here("data", "Sylviidae_dat.csv"))
dat$Phylo
# [1] "Abroscopus_albogularis" "Abroscopus_schisticeps"
# [3] "Abroscopus_superciliaris" "Achaetops_pycnopygius"
# [5] "Acrocephalus_aedon" "Acrocephalus_aequinoctialis"
# [7] "Acrocephalus_agricola" "Acrocephalus_arundinaceus"
# [9] "Acrocephalus_atyphus" "Acrocephalus_australis"
# Load phylogenetic tree – note that this file contains multiple alternative trees
trees <- read.nexus(here("data", "Sylviidae.nex"))
# select the first tree from a list of many alternative trees for Sylviidae
tree <- trees[[1]]
#plot(tree) #you can try to plot the tree, but its going to be quite big!Step 2. Basic structural checks
Before using the tree in analyses, it is good practice to perform some basic checks to ensure that the tree structure is appropriate for the intended analysis (i.e. in this case, is the tree ultrametric and binary – see below).
# Check whether the tree is ultrametric (time-calibrated)
is.ultrametric(tree)
# [1] TRUE
# Check whether the tree is fully bifurcating
is.binary(tree)
# [1] TRUEStep 3. Match tree tip labels to the dataset
The species names in the tree (tip labels) must match exactly the species names used in the dataset.
# species names used in the dataset
species_data <- unique(dat$Phylo)
length(tree$tip.label) # number if species names on the tree tips
# [1] 294
length(species_data) # number of species names in the dataset
# [1] 10
# check for mismatches between tree tip labels and dataset species names
setdiff(species_data, tree$tip.label) # species names from the dataset not found in the tree
setdiff(tree$tip.label, species_data) # species names from the tree not found in the datasetGiven that we only have 10 species in the dataset and 294 species in the tree, most species from tree will not be present in the dataset. At this point, it is also very important to look at the other mismatch result, which checks whether all of the species in our dataset can be matched to the tree. If mismatches occur, we need to consider whether it is a truly missing species, typographical errors, outdated taxonomy, hybrid species which do not fit on binomial tree, or differences in naming conventions (e.g. using underscores vs spaces between name segments, or using subspecies names). These issues must be resolved manually before proceeding. In our example, we have a dataset with species names that match the tree, si we do not need to deal with such issues.
Step 4. Prune the tree to include only species in the dataset
Phylogenetic trees often contain more species than required for a given analysis (as in our example above). Extraneous tips should be removed to match the dataset.
tree_pruned <- drop.tip(tree,setdiff(tree$tip.label, species_data))
lwngth(tree_pruned$tip.label) # check how many species left on the pruned tree
plot(tree_pruned) # check the pruned tree# check the binary and ultrametric status again
is.binary(tree_pruned)
# [1] TRUE
is.ultrametric(tree_pruned)
# [1] TRUEStep 5. Compute the phylogenetic correlation matrix
Once the tree is time-calibrated (= ultrametric), fully bifurcating (= binary), and pruned to match the dataset, you can compute the phylogenetic correlation matrix under a Brownian motion model using the vcv() function from the ape package.
A <- vcv(tree_pruned, corr = TRUE)
# check the dimensions and preview the top-left block
dim(A) This matrix represents the expected pirwise correlation among species due to shared evolutionary history and can be used directly in phylogenetic comparative or meta-analytic models.
In practice, you will not always have a phylogenetic tree readily available for the species in your dataset. In such cases, the rotl package can be used to access the Open Tree of Life and retrieve a tree based on the Latin species names in your dataset.
- Make sure that the species names in your dataset are in Latin (scientific names) and are spelled correctly. The
rotlpackage relies on these names to find the corresponding species in the Open Tree of Life (https://opentreeoflife.github.io/use). - The OpenTree synthetic tree and name resolution services are actively updated. Running the same code at different times may yield slightly different trees due to updates in the database.
- For reproducible research, it is good practice to save the retrieved tree to a file (e.g., in Newick or Nexus format) after obtaining it using
rotl. This way, you can ensure that you are using the same tree in future analyses. - Automated name matching is convenient but may not always be perfect. It is advisable to manually check the matched names to ensure accuracy.
# install and load necessary packages - if not already installed
# install.packages("rotl")
# install.packages("ape")
library(rotl)
library(ape)
# record info about OpenTree version
tol_about()
# OpenTree Synthetic Tree of Life.
# Tree version: opentree16.1
# Taxonomy version: 3.7draft3
# Constructed on: 2025-12-20 00:55:58
# Number of terminal taxa: 2385875
# Number of source trees: 2064
# Number of source studies: 1931
# Source list present: false
# Root taxon: cellular organisms
# Root ott_id: 93302
# Root node_id: ott93302Step 1. Provide a species list
Here, we use 10 commonly used lab model taxa across different animal groups as an example. Use Latin names, not common names.
# make species list
myspecies <- c(
"Escherichia colli", # typo on purpose
"Chlamydomonas reinhardtii",
"Drosophila melanogaster",
"Arabidopsis thaliana",
"Rattus norvegicus",
"Mus musculus",
"Cavia porcellus",
"Xenopus laevis",
"Saccharomyces cervisae", # typo on purpose
"Danio rerio"
)Step 2. Resolve species names using OpenTree TNRS
We start with strict matching to identify exact matches and flag names that may require manual attention. We then use approximate matching only as a diagnostic step for unresolved names, because fuzzy matches can help detect likely typos or synonyms but should always be checked before being accepted.
# strict matching first:
taxa_strict <- tnrs_match_names(
names = myspecies,
do_approximate_matching = FALSE # set FALSE
)
# Warning message:
# Escherichia colli, Saccharomyces cervisae are not matched
taxa_strict
# search_string unique_name approximate_match score
# 1 escherichia colli <NA> NA NA
# 2 chlamydomonas reinhardtii Chlamydomonas reinhardtii FALSE 1
# 3 drosophila melanogaster Drosophila melanogaster FALSE 1
# 4 arabidopsis thaliana Arabidopsis thaliana FALSE 1
# 5 rattus norvegicus Rattus norvegicus FALSE 1
# 6 mus musculus Mus musculus FALSE 1
# 7 cavia porcellus Cavia porcellus FALSE 1
# 8 xenopus laevis Xenopus laevis FALSE 1
# 9 saccharomyces cervisae <NA> NA NA
# 10 danio rerio Danio rerio FALSE 1
# ott_id is_synonym flags number_matches
# 1 NA NA <NA> NA
# 2 33153 FALSE 1
# 3 505714 FALSE 1
# 4 309263 FALSE 1
# 5 271555 FALSE 1
# 6 542509 FALSE 1
# 7 744000 FALSE 1
# 8 465096 FALSE 1
# 9 NA NA <NA> NA
# 10 1005914 FALSE 1
# fuzzy matching for unresolved names:
taxa <- tnrs_match_names(
names = myspecies,
do_approximate_matching = TRUE # set TRUE
)
taxa
# search_string unique_name approximate_match
# 1 escherichia colli Escherichia coli TRUE
# 2 chlamydomonas reinhardtii Chlamydomonas reinhardtii FALSE
# 3 drosophila melanogaster Drosophila melanogaster FALSE
# 4 arabidopsis thaliana Arabidopsis thaliana FALSE
# 5 rattus norvegicus Rattus norvegicus FALSE
# 6 mus musculus Mus musculus FALSE
# 7 cavia porcellus Cavia porcellus FALSE
# 8 xenopus laevis Xenopus laevis FALSE
# 9 saccharomyces cervisae Saccharomyces cerevisiae TRUE
# 10 danio rerio Danio rerio FALSE
# score ott_id is_synonym flags number_matches
# 1 0.9375000 474506 FALSE sibling_higher 2
# 2 1.0000000 33153 FALSE 1
# 3 1.0000000 505714 FALSE 1
# 4 1.0000000 309263 FALSE 1
# 5 1.0000000 271555 FALSE 1
# 6 1.0000000 542509 FALSE 1
# 7 1.0000000 744000 FALSE 1
# 8 1.0000000 465096 FALSE 1
# 9 0.9166667 356221 FALSE 2
# 10 1.0000000 1005914 FALSE 1In this example, the strict match flags two misspelled names, and approximate matching helps identify their likely intended taxa. In practice, once these likely matches have been identified, it is better to correct the original species names and rerun strict matching than to proceed with fuzzy matches in the final workflow.
If the strict match fails or returns NAs, you can enable approximate matching. Approximate matching uses fuzzy string matching to find the closest matches in the OpenTree taxonomy, but you should always inspect the returned matches carefully.
Key columns to check include
unique_name: the matched OpenTree taxon nameott_id: the OpenTree taxonomy identifier used for tree retrievalapproximate_match:TRUEindicates a non-exact match, for example due to a typois_synonym:TRUEindicates the input name was treated as a synonym
A useful rule of thumb is as follows:
- if
approximate_match == TRUEand the reason is an obvious typo, correct the original species list and rerun the name matching. This ensures that the final dataset contains clean and unambiguous names. - If a match looks suspicious, for example because the matched taxon belongs to a different genus or the name could plausibly refer to multiple taxa, do not accept the match blindly. Instead, inspect the additional information returned by
tnrs_match_names()to confirm that the matched taxon is indeed the one you intended to include.
Step 3. Fix typos and re-run matching
myspecies_fixed <- c(
"Escherichia coli",
"Chlamydomonas reinhardtii",
"Drosophila melanogaster",
"Arabidopsis thaliana",
"Rattus norvegicus",
"Mus musculus",
"Cavia porcellus",
"Xenopus laevis",
"Saccharomyces cerevisiae",
"Danio rerio"
)
taxa_fixed <- tnrs_match_names(
names = myspecies_fixed,
do_approximate_matching = FALSE
)
# taxa_fixed
# search_string unique_name approximate_match score
# 1 escherichia coli Escherichia coli FALSE 1
# 2 chlamydomonas reinhardtii Chlamydomonas reinhardtii FALSE 1
# 3 drosophila melanogaster Drosophila melanogaster FALSE 1
# 4 arabidopsis thaliana Arabidopsis thaliana FALSE 1
# 5 rattus norvegicus Rattus norvegicus FALSE 1
# 6 mus musculus Mus musculus FALSE 1
# 7 cavia porcellus Cavia porcellus FALSE 1
# 8 xenopus laevis Xenopus laevis FALSE 1
# 9 saccharomyces cerevisiae Saccharomyces cerevisiae FALSE 1
# 10 danio rerio Danio rerio FALSE 1
# ott_id is_synonym flags number_matches
# 1 474506 FALSE sibling_higher 1
# 2 33153 FALSE 1
# 3 505714 FALSE 1
# 4 309263 FALSE 1
# 5 271555 FALSE 1
# 6 542509 FALSE 1
# 7 744000 FALSE 1
# 8 465096 FALSE 1
# 9 356221 FALSE 1
# 10 1005914 FALSE 1At this point, you should confirm that: - All species have a valid ott_id. - the returned matches correspond to the intended taxa.
Step 4. Retrieve the phylogenetic tree from OpenTree
Once you are satisfied with the matched OpenTree identifiers, you can request a trimmed subtree from the OpenTree synthetic tree using tol_induced_subtree(). At this stage, the OpenTree subtree provides a convenient phylogenetic topology for the matched taxa. However, before using it in comparative or meta-analytic models, you should still check whether the tip labels are appropriate, whether branch lengths are available, and whether the tree satisfies assumptions such as bifurcation and ultrametricity.
tree <- tol_induced_subtree(
ott_ids = taxa_fixed[["ott_id"]],
label_format = "name"
)
# returns a trimmed subtree as an ape::phylo object
# warnings about collapsing single nodes are common and usually not problematic here
Progress [---------------------------------] 0/304 ( 0%) ?s
Progress [==============================] 304/304 (100%) 0s
Step 5. Clean and standardise tip labels
The tip labels returned by OpenTree are not always directly ready for downstream analyses. They may contain underscores, appended identifiers, or labels corresponding to internal nodes rather than named terminal taxa. The goal of this step is therefore to standardise the labels for readability and to check that each tip can be matched unambiguously to the intended species in the dataset.
In our example, one tip is labelled "mrcaott616ott617" rather than Escherichia coli. This is a non-taxon internal node within Bacteria, which can arise when only one representative from a large clade is included. Since there are no other bacteria in the dataset and the tree position is appropriate for our purposes, we relabel this tip as Escherichia coli for didactic purposes.
tree$tip.label # see the current tree tip labels
# [1] "Arabidopsis_thaliana" "Chlamydomonas_reinhardtii"
# [3] "Mus_musculus" "Rattus_norvegicus"
# [5] "Cavia_porcellus" "Xenopus_laevis"
# [7] "Danio_rerio" "Drosophila_melanogaster"
# [9] "Saccharomyces_cerevisiae" "mrcaott616ott617"
# replace underscores with spaces for readability
tree$tip.label <- gsub("_", " ", tree$tip.label)
# inspect the lineage of the E. coli OTT identifier
ecoli <- tol_node_info(ott_id = 474506, include_lineage = TRUE)
tax_lineage(ecoli) # domain within Bacteria
# rank name unique_name ott_id
# 1 domain Bacteria Bacteria 844192
# 2 no rank cellular organisms cellular organisms 93302
# we can see what info is available for this genus - not much, looks like a messy bit of a tree flagged as "INCONSISTENT"
# tnrs_match_names(names = ("Escherichia"))
# relabel the MRCA-like tip for plotting and matching
tree$tip.label <- gsub("mrcaott616ott617", "Escherichia coli", tree$tip.label)We can now plot the cleaned tree for inspection. Note that branch lengths are not yet included, so at this stage the tree is mainly useful for checking topology and tip labels.
Step 6. Final checks (binary tree and matching tip labels)
Before using the tree in downstream analyses, it is good practice to confirm that:
- The tree is fully binary, if that is required for your method.
- The tip labels match exactly the species list in your dataset.
# check whether the tree is binary
is.binary(tree) # should return TRUE
# [1] TRUE
# check exact matching between the species list and tree tip labels
intersect(as.character(tree$tip.label), myspecies_fixed)
# [1] "Arabidopsis thaliana" "Chlamydomonas reinhardtii"
# [3] "Mus musculus" "Rattus norvegicus"
# [5] "Cavia porcellus" "Xenopus laevis"
# [7] "Danio rerio" "Drosophila melanogaster"
# [9] "Saccharomyces cerevisiae" "Escherichia coli"
setdiff(myspecies_fixed, as.character(tree$tip.label))
# character(0)
setdiff(as.character(tree$tip.label), myspecies_fixed)
# character(0)Step 7. Compute the phylogenetic correlation matrix (handling missing branch lengths)
At this point, the tree topology may be usable for plotting or simple inspection, but it is not yet guaranteed to be suitable for phylogenetic covariance calculations. In particular, vcv() requires branch lengths, and many comparative models additionally assume a fully bifurcating and often ultrametric tree. The Open Tree of Life synthetic tree often provides a well-resolved topology, but it does not always include branch lengths.
To compute a phylogenetic variance-covariance (VCV) or correlation matrix under a Brownian motion (BM) process, we use vcv() from the ape package. If the tree has no branch lengths, vcv() will fail with:
A <- vcv(tree, corr = TRUE)
# Error in vcv.phylo(tree, corr = TRUE) : the tree has no branch lengthsFor didactic purposes in this tutorial, we assign equal branch lengths to all edges (that is, all branch lengths are set to 1). This allows us to demonstrate the workflow and obtain a valid correlation matrix. In real analyses, you should consider using a tree with biologically meaningful branch lengths (for example, a time-calibrated phylogeny).
# check whether the OpenTree synthetic tree includes branch lengths
is.null(tree$edge.length)
# [1] TRUE # TRUE indicates that branch lengths are missing
# if branch lengths are missing, add equal branch lengths for didactic purposes
if (is.null(tree$edge.length)) {
tree$edge.length <- rep(1, nrow(tree$edge))
}
# compute the phylogenetic correlation matrix under Brownian motion
A <- vcv(tree, corr = TRUE)
# check the dimensions and preview the top-left block
dim(A) # should return the number of species (rows/columns)
# [1] 10 10
A[1:5, 1:5]
# Arabidopsis_thaliana Chlamydomonas_reinhardtii
# Arabidopsis_thaliana 1.0000000 0.6666667
# Chlamydomonas_reinhardtii 0.6666667 1.0000000
# Mus_musculus 0.2041241 0.2041241
# Rattus_norvegicus 0.2041241 0.2041241
# Cavia_porcellus 0.2182179 0.2182179
# Mus_musculus Rattus_norvegicus Cavia_porcellus
# Arabidopsis_thaliana 0.2041241 0.2041241 0.2182179
# Chlamydomonas_reinhardtii 0.2041241 0.2041241 0.2182179
# Mus_musculus 1.0000000 0.8750000 0.8017837
# Rattus_norvegicus 0.8750000 1.0000000 0.8017837
# Cavia_porcellus 0.8017837 0.8017837 1.0000000The subtree returned by OpenTree may be sufficient for demonstration purposes, but many phylogenetic models require additional assumptions about branching structure and branch lengths. We therefore illustrate how to diagnose and handle two common issues: polytomies and non-ultrametric trees.
A polytomy occurs when an internal node has more than two descendant branches (a multifurcation).
A non-ultrametric tree is a tree in which not all tips are equidistant from the root, meaning that the tree is not time-calibrated.
Many phylogenetic comparative and meta-analytic models assume a fully bifurcating and ultrametric tree, because the expected covariance among species is defined under a continuous-time branching process (for example Brownian motion or Ornstein–Uhlenbeck evolution). Here we illustrate what these issues mean, and how they can be handled, using a simple toy example.
Create a toy tree with a polytomy
library(ape)
library(phytools)
# A simple tree with one polytomy (A, B, C diverge simultaneously)
tree_poly <- read.tree(text = "((A,B,C),D,E);")
plot(tree_poly)is.binary(tree_poly)
# [1] FALSEHere, species A, B, and C form a polytomy because their common ancestor has three descendants instead of two…
Under a Brownian motion or OU model, the phylogenetic covariance matrix is derived from branch lengths along a fully specified bifurcating tree.
When a node is a polytomy, the relative order of divergence among its descendants is undefined, which means that their shared evolutionary history cannot be uniquely translated into a covariance structure.
Resolving polytomies
multi2di() resolves a polytomy by inserting very short branches, creating an arbitrary but fully bifurcating tree. This corresponds to assuming that the true branching order is unknown but occurred over a very short evolutionary time.
set.seed(1)
tree_bin <- multi2di(tree_poly, random = TRUE)is.binary(tree_bin)
# [1] TRUEtree_bin <- multi2di(tree_poly, random = TRUE)
plot(tree_bin)Because the resolution is random, different runs of multi2di() will yield different bifurcating trees. If you want reproducible results, set a random seed before calling multi2di().
set.seed(15)
tree_bin1 <- multi2di(tree_poly, random = TRUE)
set.seed(50)
tree_bin2 <- multi2di(tree_poly, random = TRUE)Create a non-ultrametric toy tree
tree_nonultra <- read.tree(text = "((A:2,B:2):3,(C:1,D:4):2,E:5);")
is.ultrametric(tree_nonultra)
# [1] FALSEAs you can see, the tips do not all end at the same distance from the root, so the tree is not ultrametric. This means that branch lengths cannot be interpreted directly as shared evolutionary time, which is required for Brownian motion or Ornstein–Uhlenbeck based phylogenetic covariance models.
chronos() estimates branch lengths under a relaxed molecular clock, producing a tree in which all tips represent the same present time.
This is required when phylogenetic covariance is interpreted as shared evolutionary time.
tree_ultra1 <- chronos(tree_nonultra)
Setting initial dates...
Fitting in progress... get a first set of estimates
(Penalised) log-lik = -17.41843
Optimising rates... dates... -17.41843
Optimising rates... dates... -17.4165
log-Lik = -14.93714
PHIIC = 49.73
is.ultrametric(tree_ultra1)
# [1] TRUE
Setting initial dates...
Fitting in progress... get a first set of estimates
(Penalised) log-lik = -17.41843
Optimising rates... dates... -17.41843
Optimising rates... dates... -17.4165
log-Lik = -14.93714
PHIIC = 49.73
Or, you can use force.ultrametric() from the phytools package to coerce a tree into an ultrametric form by extending branch lengths.
tree_ultra2 <- force.ultrametric(tree_nonultra, method = "extend")
is.ultrametric(tree_ultra2)***************************************************************
* Note: *
* force.ultrametric does not include a formal method to *
* ultrametricize a tree & should only be used to coerce *
* a phylogeny that fails is.ultrametric due to rounding -- *
* not as a substitute for formal rate-smoothing methods. *
***************************************************************
In practice, the typical workflow is:
- Check whether the tree is binary and ultrametric.
- If polytomies are present, resolve them using
multi2di(), ideally multiple times. - If the tree is not ultrametric, apply a dating method such as
chronos()or use a time-calibrated tree from the literature. - Only then compute the phylogenetic covariance or correlation matrix using
vcv().
This ensures that the phylogenetic random effect used in meta-analysis corresponds to a well-defined evolutionary model.
Examples from real meta-analyses
1. Moura et al. (2021)
Here we use the dataset from Moura et al. (2021), which collated 1,828 effect sizes from 457 studies and 341 animal species to investigate assortative mating patterns. Each effect size is a Fisher’s Z transformed correlation coefficient for body size correlations paired couples for within a single population and sampling period; each correlation is species-specific, and there are multiple correlations per species. The effect size indicates how strongly mates resemble each other in body size across a very broad taxonomic and ecological range. The dataset and phylogenetic tree are available in the metadat package as dat.moura2021.
Before fitting hierarchical meta-analytic models, it is important to assess whether key categorical predictor variables have sufficient replication to support the estimation of random effects (or fixed effects). In particular, the identifiability of variance components depends on having an adequate number of levels for each random factor, as well as repeated observations within those levels.
So, we begin by summarising the number of unique levels for all categorical variables in the dataset, with particular attention to variables commonly used as random effects in meta-analysis, such as study identity, species identity, and effect-size identity.
# read data from metadat package
dat_moura2021 <- dat.moura2021$dat
# duplicate the column with species name - additional column is needed for adding phylogenetic correlation matrix and non-phylogenetic random effect separately.
dat_moura2021$species.id.phy <- dat_moura2021$species.id
# add a column to be used as an ID of effect sizes to specify random errors in the meta-analytic model
dat_moura2021$effect.size.id <- 1:nrow(dat_moura2021)
# convert ID of effect sizes to a factor (categorical) variable
dat_moura2021$effect.size.id <- as.factor(dat_moura2021$effect.size.id)
## check data structure
cat_vars <- dat_moura2021 |>
dplyr::select(where( ~ is.factor(.x) || is.character(.x)))
n_levels <- cat_vars |>
dplyr::summarise(across(
everything(),
~ dplyr::n_distinct(.x)
)) |>
tidyr::pivot_longer(
cols = everything(),
names_to = "variable",
values_to = "n_levels"
)
n_levels # shows how many levels each categorical variable has
# # A tibble: 13 × 2
# variable n_levels
# <chr> <int>
# 1 study.id 457
# 2 effect.size.id 1828
# 3 species 345
# 4 species.id 341
# 5 subphylum 11
# 6 phylum 3
# 7 assortment.trait 233
# 8 trait.dimensions 5
# 9 field.collection 2
# 10 pooled.data 2
# 11 spatially.pooled 2
# 12 temporally.pooled 2
# 13 species.id.phy 341The dataset contains 1,828 effect sizes (effect.size.id) drawn from 457 independent studies (study.id) and 341 animal species (species.id.phy). This provides replication at both the study and species levels, which is generally important for estimating random-effects variance components. There is, however, no universal rule for what constitutes sufficient replication, because this depends on the number of levels, the distribution of observations across levels, and the magnitudes of the variance components. As a general guide, variance components are more likely to be estimable when each random effect has multiple levels and when at least some levels contribute repeated observations. In the present dataset, the relatively large number of species particularly supports the inclusion of species-level random effects, including both non-phylogenetic (species.id) and phylogenetic (species.id.phy) terms.
Let’s move on to compute the phylogenetic correlation matrix. A matching tree is supplied as a data object in the metadat package, so we just load it and can compute the matrix without additional checking and processing
# read tree from metadat package
tree <- dat.moura2021$tree
# calculate r-to-z transformed correlations and corresponding sampling variances
dat_moura2021 <- escalc(measure = "ZCOR", ri = ri, ni = ni, data = dat_moura2021)
# turn tree into an ultrametric onevusing the default "Grafen" method (Grafen 1989)
tree <- compute.brlen(tree)
# compute phylogenetic correlation matrix
A <- vcv(tree, corr = TRUE)Brownian motion (meta-analysis)
The Brownian Motion (BM) model assumes that traits evolve according to a random walk process along the branches of the phylogenetic tree. BM is default model for continuous trait evolution in comparative methods. In the model, the expected variance-covariance structure among species is directly proportional to their shared evolutionary history, as represented by the phylogenetic tree.
phylo_eg1_meta_ma_BM <- rma.mv(yi, vi,
random = list(
~ 1 | study.id, # among-study variation
~ 1 | effect.size.id, # additional effect-size-level variation
~ 1 | species.id, # species-specific, non-phylogenetic variation
~ 1 | species.id.phy), # species-specific, phylogenetic variation
R = list(species.id.phy = A), # Brownian motion phylogenetic correlation matrix
data = dat_moura2021,
verbose = TRUE,
sparse = TRUE,
method = "REML"
)
summary(phylo_eg1_meta_ma_BM)
# Multivariate Meta-Analysis Model (k = 1828; method: REML)
#
# logLik Deviance AIC BIC AICc
# -167.6726 335.3453 345.3453 372.8974 345.3782
#
# Variance Components:
#
# estim sqrt nlvls fixed factor R
# sigma^2.1 0.0192 0.1384 457 no study.id no
# sigma^2.2 0.0145 0.1202 1828 no effect.size.id no
# sigma^2.3 0.0557 0.2359 341 no species.id no
# sigma^2.4 0.0512 0.2263 341 no species.id.phy yes
#
# Test for Heterogeneity:
# Q(df = 1827) = 10743.8076, p-val < .0001
#
# Model Results:
#
# estimate se zval pval ci.lb ci.ub
# 0.3682 0.1300 2.8311 0.0046 0.1133 0.6230 **
confint(phylo_eg1_meta_ma_BM)
# estimate ci.lb ci.ub
# sigma^2.1 0.0192 0.0108 0.0325
# sigma.1 0.1384 0.1038 0.1802
# estimate ci.lb ci.ub
# sigma^2.2 0.0145 0.0121 0.0172
# sigma.2 0.1202 0.1099 0.1311
# estimate ci.lb ci.ub
# sigma^2.3 0.0557 0.0334 0.0788
# sigma.3 0.2359 0.1827 0.2807
# estimate ci.lb ci.ub
# sigma^2.4 0.0512 0.0179 0.1792
# sigma.4 0.2263 0.1336 0.4233
# calculate I2 estimate
orchaRd::i2_ml(phylo_eg1_meta_ma_BM)
# I2_Total I2_study.id I2_effect.size.id I2_species.id I2_species.id.phy
# 97.30527 13.26902 10.00808 38.55098 35.47718
# estimate phylogenetic heritability
phylo_heritability <- phylo_eg1_meta_ma_BM$sigma2[4] / (phylo_eg1_meta_ma_BM$sigma2[4] + phylo_eg1_meta_ma_BM$sigma2[3])
phylo_heritability
# [1] 0.479239The overall estimate \(\beta_0\) is 0.368 (95% CI 0.113, 0.623) on the Fisher’s Z scale. Back transforming to the Pearson’s r, this corresponds to a correlation of roughly 0.35 - which indicates a positive correlation between mates in body size across species. On average, larger males tend to pair with larger females across species.
The \(I^2\) (\(\approx\) 97.31%) indicate that there is high heterogeneity among studies, effect sizes, species, and phylogenetic relatedness. About 35.48% of the total variance can be attributed to phylogenetic relatedness among species, indicating a moderate phylogenetic signal in mate size correlation. Biologically, this means that different species tend to have different average levels of mate size correlation, and closely related species tend to have more similar levels of mate size correlation than distantly related species.
By partitioning species level variance into phylogenetic and non-phylogenetic components, we can estimate the phylogenetic heritability (or Pagel’s \(\lambda\)) of mate size correlation. This implies that about 48 percent of the differences among species in assortative mating strength are explained by phylogenetic relatedness.
In other words, closely related species tend to have similar levels of size assortative mating, but nearly half of the variation is species specific and not explained by shared ancestry.
We show two ways of specifying sampling error in brms:
Using
se(sqrt(vi))to supply known standard errors.Representing sampling error explicitly via a group level term with a known variance covariance structure. In this formulation, the term
(1 | gr(effect.size.id, cov = vcv))defines a random effect whose covariance matrix is fixed to the diagonal matrix of sampling variances for lnRR. This is mathematically equivalent to providing a sampling variance matrix tometafor::rma.mv(), but implemented within brms.
Because sampling variances are treated as known in meta analysis, the standard deviation of this group level term is fixed to 1. Consequently, the magnitude of sampling error is fully determined by the supplied variance covariance matrix and is not estimated from the data.
The second approach is less common, but it can be much faster than usual (see more detailed information: link).
If you are not familiar with brms, we highly recommend checking out the instruction of brms first.
1. Using se(sqrt(vi))
In this specification, sampling error is modelled by supplying known standard errors via se() in the model formula. The expression yi|se(sqrt(vi)) indicates that the response variable yi has known standard errors equal to the square root of the sampling variances vi. Under this formulation, brms treats the observed effect sizes as noisy measurements of latent true effects. The remaining terms in the model specify sources of between study and between species variation in these latent true effects. Specifically, (1 | study.id) and (1 | effect.size.id) model study level and effect size level heterogeneity, (1 | species.id) and (1 | gr(species.id.phy, cov = A)) model non phylogenetic and phylogenetic species level variation, respectively.
fit_phylo_eg1_brms_ma <- bf(yi | se(sqrt(vi)) ~ 1 +
(1 | study.id) +
(1 | effect.size.id) +
(1 | species.id) +
(1 | gr(species.id.phy, cov = A))
)
# set prior
prior_brms1 <- get_prior(
formula = fit_phylo_eg1_brms_ma,
data = dat_moura2021,
data2 = list(A = A),
family = gaussian()
)
phylo_eg1_brms_ma <- brm(
formula = fit_phylo_eg1_brms_ma,
family = gaussian(),
data = dat_moura2021,
data2 = list(A = A),
prior = prior_brms1,
iter = 10000,
warmup = 7000,
chains = num_chains,
backend = "cmdstanr",
threads = threading(threads_per_chain),
control = list(adapt_delta = 0.95, max_treedepth = 15)
)
summary(phylo_eg1_brms_ma)
# Family: gaussian
# Links: mu = identity
# Formula: yi | se(sqrt(vi)) ~ 1 + (1 | study.id) + (1 | effect.size.id) + (1 | species.id) + (1 | gr(species.id.phy, cov = A))
# Data: dat_moura2021 (Number of observations: 1828)
# Draws: 2 chains, each with iter = 10000; warmup = 7000; thin = 1;
# total post-warmup draws = 6000
#
# Multilevel Hyperparameters:
# ~effect.size.id (Number of levels: 1828)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.12 0.01 0.11 0.13 1.00 2091 3894
#
# ~species.id (Number of levels: 341)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.23 0.03 0.17 0.28 1.00 795 1292
#
# ~species.id.phy (Number of levels: 341)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.27 0.09 0.14 0.50 1.00 1158 954
#
# ~study.id (Number of levels: 457)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.14 0.02 0.11 0.18 1.00 661 1223
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept 0.37 0.16 0.05 0.70 1.00 3504 2965
#
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma 0.00 0.00 0.00 0.00 NA NA NA2. Using gr(…, vcv = vcv)
In this alternative specification, sampling error is represented explicitly as a group level effect with a known variance covariance structure. The term (1 | gr(effect.size.id, cov = vcv)) defines a random effect for each effect size whose covariance matrix is fixed to vcv, a diagonal matrix containing the sampling variances v_i on the diagonal. This implies the following hierarchical formulation:
\(\text{Sampling error} \sim \mathcal{N}(0, V),\)
where is the supplied variance covariance matrix vcv. In this setup, sampling error is modelled as a latent random effect rather than being placed directly in the likelihood via se().
Because sampling variances are treated as known in meta analysis, the standard deviation of this group level term is fixed to 1 using constant(1). As a result, the magnitude of the sampling error is entirely determined by the supplied matrix V and is not estimated from the data.
This formulation is mathematically equivalent to providing a sampling variance matrix to metafor::rma.mv(), but implemented here within the brms modeling framework. It also yields the same likelihood for the observed effect sizes as the se(sqrt(vi)) approach, differing only in how sampling error is represented in the model.
vcv <- diag(dat_moura2021$vi)
rownames(vcv) <- colnames(vcv) <- dat_moura2021$effect.size.id
fit_phylo_eg1.1_brms_ma <- bf(yi ~ 1 +
(1 | study.id) +
(1 | species.id) +
(1 | gr(species.id.phy, cov = A)) +
(1 | gr(effect.size.id, cov = vcv))
)
prior_brms1.1 <- default_prior(fit_phylo_eg1.1_brms,
data = dat_moura2021,
data2 = list(A = A, vcv = vcv),
family = gaussian())
prior_ma$prior[3] = "constant(1)"
phylo_eg1.1_brms_ma <- brm(
formula = fit_phylo_eg1.1_brms_ma,
family = gaussian(),
data = dat_moura2021,
data2 = list(A = A, vcv = vcv),
prior = prior_brms1.1,
iter = 18000,
warmup = 10000,
chains = num_chains,
backend = "cmdstanr",
threads = threading(threads_per_chain),
control = list(adapt_delta = 0.98, max_treedepth = 18)
)
summary(phylo_eg1.1_brms)
# Family: gaussian
# Links: mu = identity
# Formula: yi ~ 1 + (1 | study.id) + (1 | species.id) + (1 | gr(species.id.phy, cov = A)) + (1 | gr(effect.size.id, cov = vcv))
# Data: dat_moura2021 (Number of observations: 1828)
# Draws: 2 chains, each with iter = 18000; warmup = 10000; thin = 1;
# total post-warmup draws = 16000
#
# Multilevel Hyperparameters:
# ~effect.size.id (Number of levels: 1828)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 1.00 0.00 1.00 1.00 NA NA NA
#
# ~species.id (Number of levels: 341)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.23 0.03 0.17 0.28 1.00 1390 1570
#
# ~species.id.phy (Number of levels: 341)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.27 0.09 0.15 0.50 1.00 1788 1515
#
# ~study.id (Number of levels: 457)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.14 0.02 0.11 0.18 1.00 1068 1500
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept 0.37 0.16 0.03 0.70 1.00 6398 6604
#
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma 0.12 0.01 0.11 0.13 1.00 1801 3462Nice - the outputs are very similar to the previous model using se(), as expected and also consistent with the metafor results. In this fit, the phylogenetic species-level variance is slightly larger than the non-phylogenetic component, whereas the opposite pattern was observed in the metafor model.
The small differences in the estimates arise because this model is fitted in a Bayesian framework.
Let’s calculate \(I^2\) statistics to quantify the proportion of variance attributable to different sources of heterogeneity.
fit <- phylo_eg1_brms
dat <- dat_moura2021
v_harm <- 1 / mean(1/dat$vi, na.rm = TRUE)
I2_total_harm <- var_total_hetero / (var_total_hetero + v_harm)
I2_effect_harm <- var_effect / (var_total_hetero + v_harm)
I2_study_harm <- var_study / (var_total_hetero + v_harm)
I2_species_harm <- var_species / (var_total_hetero + v_harm)
I2_phylo_harm <- var_phylo / (var_total_hetero + v_harm)
round(rbind(
total = summ_I2(I2_total_harm),
effect = summ_I2(I2_effect_harm),
study = summ_I2(I2_study_harm),
species = summ_I2(I2_species_harm),
phylo = summ_I2(I2_phylo_harm)
), 3)
# mean q2.5.2.5% q50.50% q97.5.97.5%
# total 0.978 0.969 0.978 0.990
# effect 0.090 0.043 0.091 0.134
# study 0.129 0.048 0.123 0.247
# species 0.330 0.115 0.333 0.530
# phylo 0.429 0.178 0.414 0.759glmmTMB is a popular package for fitting generalised linear mixed models. Unlike dedicated meta-analytic packages such as metafor, it does not allow to easily account for known sampling variances of effect sizes - so when we implement a meta-analysis in glmmTMB, we need explicitly specify the sampling error structure of the effect sizes.
In practice, this require three steps:
1. Treating each effect size as a unique random-effect level
# random effects in glmmTMB are defined over factor levels. By converting the effect-size identifier into a factor, each effect size (i) is assigned its own random effect (e_i), which will repredsent its sampling error - this is what allows the model to attach the correlation variance (v_i) to each observation
dat_moura2021$effect.size.id <- as.factor(dat_moura2021$effect.size.id)In glmmTMB, grouping variables for random effects must be factors. By converting effect_size_ID to a factor, we ensure that each effect size is treated as a distinct level in the random-effects structure. This step is required for the model to correctly associate each observation with its corresponding sampling variance.
2. Specifying the random effect independent grouping structure
# the equalto() and propto() terms require a grouping variable, even when there is only one group. the dummy variable 'g' assigns all observations to single group, while the covariance structure is defined entirely by supplied matrices…
dat_moura2021$g <- 1 3. Constructing a variance-covariance matrix (VCV) that contains the known sampling variances of each effect size
# this creates a variance-covariance matrix - diagonal elements are the known sampling variances of the effect sizes. The off-diagonal elements are zero, reflecting the usual assumption that effect sizes are independent. This matrix plays the same rule as V = vi in metafor
VCV <- diag(dat_moura2021$vi, nrow = nrow(dat_moura2021))
rownames(VCV)<- colnames(VCV)<- dat_moura2021$effect.size.idWe create a variance–covariance matrix whose diagonal elements are the known sampling variances of the effect sizes. Off-diagonal elements are set to zero, reflecting the assumption that effect sizes are independent. This mirrors the standard assumption made in most meta-analyses.
The equalto() term in glmmTMB requires a grouping factor. The dummy variable g serves this purpose, which we set to a single level (one group), i.e. all observations are assigned to the same group. If the grouping factor had more than one level, each group would have a separate independent random-effect vector with the same variance-covariance parameters. For meta-analysis, this grouping factor should always have one level.
For more details, you can also refer to this vignette: https://cran.r-project.org/web/packages/glmmTMB/vignettes/covstruct.html
And the following supplementary material for fitting meta-analysis with glmmTMB: https://coraliewilliams.github.io/equalto_sim_study/webpage.html.
# reorders the rows and columns of the phylogenetic covariance matrix so that species appear in a consistent alphabetical order, ensuring that the matrix aligns correctly with the species factor used in the model.
A <- A[sort(rownames(A)), sort(rownames(A))]
system.time(
phylo_eg1_tmb <- glmmTMB(yi ~ 1 +
# the term equalto(0 + effect.size.id|g, VCV) specifies that the random effect associated with each effect size has a covariance structure defined by the matrix VCV. This effectively encodes the known sampling variances into the model
equalto(0 + effect.size.id|g, VCV) +
(1|study.id) +
# the terms (1|species.id) and propto(0 + species.id.phy|g, A) model non-phylogenetic and phylogenetic species-level variation, respectively. The propto() term uses the phylogenetic covariance matrix A to capture shared evolutionary history among species
(1|species.id) +
propto(0 + species.id.phy|g, A),
data = dat_moura2021,
REML = T)
)Now, we want to know the model output. Although the fitted model is statistically equivalent to the models fitted metafor and brms, glmmTMB reports variance components in a different way, which can initially confusing…
head(confint(phylo_eg1_tmb), 10)
# 2.5 % 97.5 % Estimate
# (Intercept) 0.1132610 0.6230707 0.3681658
# Std.Dev.(Intercept)|study.id 0.1056462 0.1813458 0.1384142
# Std.Dev.(Intercept)|species.id 0.1932854 0.2879763 0.2359271
# Std.Dev.species.id.phyAcanthurus_leucosternon_ott388125|g.1 0.1293612 0.3959730 0.2263262
# Std.Dev.species.id.phyAcanthurus_nigricans_ott467313|g.1 0.1293612 0.3959730 0.2263262
# Std.Dev.species.id.phyAcanthurus_nigrofuscus_ott605289|g.1 0.1293612 0.3959730 0.2263262
# Std.Dev.species.id.phyAchatina_fulica_ott997087|g.1 0.1293612 0.3959730 0.2263262
# Std.Dev.species.id.phyAegithalos_glaucogularis_vinaceus_ott5560982|g.1 0.1293612 0.3959730 0.2263262
# Std.Dev.species.id.phyAethia_pusilla_ott855484|g.1 0.1293612 0.3959730 0.2263262
# Std.Dev.species.id.phyAgalychnis_callidryas_ott9483|g.1 0.1293612 0.3959730 0.2263262
sigma(phylo_eg1_tmb)^2
# [1] 0.01445014 # residual variance
phylo_eg1_tmb_varcor <- VarCorr(phylo_eg1_tmb)$cond
exp(phylo_eg1_tmb$fit$par)
# betadisp theta theta theta
# 0.12020875 0.13841423 0.23592711 0.05122353 <- residual, study.id, species.id, species.id.phyYou can see the outputs both summary() and confint(). The confint() output reports standard deviations (SD) of random effects, not variances
For example…
(Intercept) -> overall mean
Std.Dev.(Intercept) | study.id -> correspond to the square roots of study-level variance
Std.Dev.(Intercept) | species.id -> correspond to the square roots of non-phylogenetic species-level varianceYou see the output also includes many rows such as…
Std.Dev.species.id.phyAcanthurus_leucosternon_ott388125|g.1
Std.Dev.species.id.phyAcanthurus_nigricans_ott467313|g.1do not represent separate variance parameters for each species. They are phylogenetic random effect! They are the conditional standard deviations of individual species’ phylogenetic random effects, which are all draws from the same multivariate normal distribution defined by the phylogenetic covariance matrix and a single scaling parameter. So, we should not be interpreted as species-specific variance components.
To obtain the actual variance parameters, it is more reliable to inspect the internal parameter vector:
exp(phylo_eg1_tmb$fit$par)
# betadisp -> residual SD
# theta[1] -> SD of study-level random effects
# theta[2] -> SD of non-phylogenetic species effects
# theta[3] -> Variance of phylogenetic effectsBut be careful - the meaning of theta can be changed. It depends on what random effect you included in your model. The residual variance is reported by sigma(model)^2.
Brownian motion (meta-regression)
We next extend the meta-analysis model by including the categorical moderator as a fixed effect (predictor). We will use variable called temporally.pooled (yes/no) as our moderator - it codes whether effect sizes were calculated from data pooled across multiple sampling periods. The resulting meta-regression model will allows us to examine whether methodological features of the primary studies help explain some of the variation among effect sizes, over and above the hierarchical and phylogenetic structure already accounted for.
phylo_eg1.1_meta_BM <- rma.mv(
yi, vi,
mods = ~ temporally.pooled,
random = list(
~ 1 | study.id,
~ 1 | effect.size.id,
~ 1 | species.id,
~ 1 | species.id.phy
),
R = list(species.id.phy = A),
data = dat_moura2021,
verbose = TRUE,
sparse = TRUE,
method = "REML"
)
summary(phylo_eg1.1_meta_BM)
# logLik Deviance AIC BIC AICc
# -165.9738 331.9476 343.9476 377.0069 343.9938
# Variance Components:
# estim sqrt nlvls fixed factor R
# sigma^2.1 0.0194 0.1393 457 no study.id no
# sigma^2.2 0.0145 0.1204 1828 no effect.size.id no
# sigma^2.3 0.0540 0.2323 341 no species.id no
# sigma^2.4 0.0520 0.2280 341 no species.id.phy yes
# Test for Residual Heterogeneity:
# QE(df = 1826) = 10668.4998, p-val < .0001
# Test of Moderators (coefficient 2):
# QM(df = 1) = 3.1936, p-val = 0.0739
# Model Results:
# estimate se zval pval ci.lb ci.ub
# intrcpt 0.3562 0.1311 2.7163 0.0066 0.0992 0.6132 **
# temporally.pooledyes 0.0395 0.0221 1.7871 0.0739 -0.0038 0.0829 . For studies without temporal pooling, the intercept \(\beta_0\) represents the estimated mean effect size and was 0.356 (95% CI 0.099, 0.613) on the Fisher’s Z scale. Relative to this baseline, temporally pooled studies tended to show slightly larger effect sizes on average, but the uncertainty interval overlapped zero (\(\beta_{\text{temporally.pooled}_{\text{yes}}}\) = 0.040, 95% CI = [-0.004, 0.083]).
Importantly, this inference is made while accounting for non-independence among effect sizes arising from shared studies, repeated measurements, species identity, and phylogenetic relatedness. Thus, the moderator effect reflects systematic differences associated with temporal pooling rather than confounding hierarchical structure.
orchaRd::r2_ml(phylo_eg1.1_meta_BM)
# R2_marginal R2_conditional
# 0.002500313 0.896644494 We further quantified the explanatory power of the model using marginal and conditional \(R^2\). The R2_marginal (0.003) represents the proportion of variance explained by the fixed effect alone, in this case the moderator temporally.pooled. In contrast, the conditional R2_conditional (0.897) reflects the variance explained by the full model, including both fixed effects and all random effects.
The very low marginal \(R^2\) indicates that temporal pooling explains only a negligible fraction of the total variability in effect sizes. By contrast, the high conditional \(R^2\) shows that most of the variation is captured by the hierarchical and phylogenetic structure of the model.
res1 <- orchaRd::mod_results(moura2021_BM_meta_reg, mod = "temporally.pooled", group = "study.id", subset = TRUE)
orchard_plot(res1,
mod = "temporally.pooled",
group = "study.id",
xlab = "Effect size (ZCOR)",
angle = 45) +
# scale_x_discrete(labels = c("Overall effect")) +
# scale_color_manual(values = "#CDBE70") +
# scale_fill_manual(values = "#EEDC82") +
# scale_y_continuous(breaks = seq(-4.0, 4.0, 1), limits = c(-4.0, 4.0)) +
theme_classic()vcv <- diag(dat_moura2021$vi)
rownames(vcv) <- colnames(vcv) <- dat_moura2021$effect.size.id
fit_phylo_eg1_brms_mr <- bf(yi ~ 1 + temporally.pooled +
(1 | study.id) +
(1 | species.id) +
(1 | gr(species.id.phy, cov = A)) +
(1 | gr(effect.size.id, cov = vcv))
)
prior_mr <- default_prior(fit_phylo_eg1_brms_mr,
data = dat_moura2021,
data2 = list(A = A, vcv = vcv),
family = gaussian())
prior_mr$prior[5] = "constant(1)"
system.time(
phylo_eg1_brms_mr <- brm(
formula = fit_phylo_eg1_brms_mr,
family = gaussian(),
data = dat_moura2021,
data2 = list(A = A, vcv = vcv),
prior = prior_mr,
iter = 12000,
warmup = 8000,
chains = num_chains,
backend = "cmdstanr",
threads = threading(threads_per_chain),
control = list(adapt_delta = 0.95, max_treedepth = 15)
)
)
summary(phylo_eg1_brms_mr)
# Family: gaussian
# Links: mu = identity
# Formula: yi ~ 1 + temporally.pooled + (1 | study.id) + (1 | species.id) + (1 | gr(species.id.phy, cov = A)) + (1 | gr(effect.size.id, cov = vcv))
# Data: dat_moura2021 (Number of observations: 1828)
# Draws: 2 chains, each with iter = 12000; warmup = 8000; thin = 1;
# total post-warmup draws = 8000
#
# Multilevel Hyperparameters:
# ~effect.size.id (Number of levels: 1828)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 1.00 0.00 1.00 1.00 NA NA NA
#
# ~species.id (Number of levels: 341)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.22 0.03 0.16 0.27 1.00 733 941
#
# ~species.id.phy (Number of levels: 341)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.28 0.10 0.15 0.52 1.00 925 798
#
# ~study.id (Number of levels: 457)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.14 0.02 0.11 0.19 1.01 569 887
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept 0.35 0.16 0.01 0.68 1.00 4844 3939
# temporally.pooledyes 0.04 0.02 -0.00 0.08 1.00 4646 5357
#
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma 0.12 0.01 0.11 0.13 1.00 914 2060A <- A[sort(rownames(A)), sort(rownames(A))]
dat_moura2021$g <- 1
VCV <- diag(dat_moura2021$vi, nrow = nrow(dat_moura2021))
rownames(VCV)<- colnames(VCV)<- dat_moura2021$effect.size.id
dat_moura2021$effect.size.id <- as.factor(dat_moura2021$effect.size.id)
head(dat_moura2021)
system.time(
phylo_eg1.1_tmb <- glmmTMB(
yi ~ 1 + temporally.pooled +
equalto(0 + effect.size.id|g, VCV) +
(1|study.id) +
(1|species.id.phy) +
propto(0 + species.id.phy|g, A),
data = dat_moura2021,
REML = T)
)
head(confint(phylo_eg1.1_tmb), 10)
# 2.5 % 97.5 % Estimate
# (Intercept) 0.099133194 0.61322985 0.35618152
# temporally.pooledyes -0.004080496 0.08314767 0.03953359
# Std.Dev.(Intercept)|study.id 0.106434012 0.18231634 0.13930061
# Std.Dev.(Intercept)|species.id.phy 0.189351801 0.28492224 0.23227256
# Std.Dev.species.id.phyAcanthurus_leucosternon_ott388125|g.1 0.130542545 0.39822099 0.22800171
# Std.Dev.species.id.phyAcanthurus_nigricans_ott467313|g.1 0.130542545 0.39822099 0.22800171
# Std.Dev.species.id.phyAcanthurus_nigrofuscus_ott605289|g.1 0.130542545 0.39822099 0.22800171
# Std.Dev.species.id.phyAchatina_fulica_ott997087|g.1 0.130542545 0.39822099 0.22800171
# Std.Dev.species.id.phyAegithalos_glaucogularis_vinaceus_ott5560982|g.1 0.130542545 0.39822099 0.22800171
# Std.Dev.species.id.phyAethia_pusilla_ott855484|g.1 0.130542545 0.39822099 0.22800171
sigma(phylo_eg1.1_tmb)^2
# [1] 0.01448824 # residual variance
phylo_eg1.1_tmb_varcor <- VarCorr(phylo_eg1.1_tmb)$cond
exp(phylo_eg1.1_tmb$fit$par)
# betadisp theta theta theta
# 0.12036710 0.13930061 0.23227256 0.05198478 <- residual variance (sd), study.id (sd), species.id (sd), species.id.phy (variance)Ornstein-Uhlenbeck
We next fit a phylogenetic meta-analysis model under an Ornstein-Uhlenbeck (OU) model of trait evolution. The OU model is useful here because it implies that correlation between species declines with phylogenetic distance, and this decay can be represented using an exponential correlation function.
Under an OU model, the phylogenetic covariance between species \(i\) and \(j\) can be written as
\[ \operatorname{Cov}_{ij} = \sigma^2 \exp(-\alpha d_{ij}), \]
where \(d_{ij}\) is the phylogenetic distance between species \(i\) and \(j\), \(\sigma^2\) is the trait variance, and \(\alpha\) controls how quickly correlation decays with distance. Larger values of \(\alpha\) imply faster decay of phylogenetic correlation, corresponding to a stronger pull towards a species-specific optimum.
This has the same functional form as the exponential correlation structure often used in distance-based models:
\[ \operatorname{Cov}_{ij} = \sigma^2 \exp(-\rho d_{ij}), \]
which allows the OU model to be fitted by exploiting this equivalence.
Let \(\mathbf{A}\) be the Brownian motion phylogenetic correlation matrix derived from the tree, where each entry represents phylogenetic similarity based on shared branch length. We then define a distance matrix \(\mathbf{D}\) as
\[ \mathbf{D} = \mathbf{I} - \mathbf{A}, \]
where \(\mathbf{I}\) is the identity matrix. The entries of \(\mathbf{D}\) represent phylogenetic distances between species, scaled between 0 and 1. Closely related species have smaller distances, whereas distantly related species have larger distances.
We fit the OU phylogenetic model using a two-stage procedure. First, we estimate the rate at which phylogenetic correlation decays with distance by fitting a meta-analysis model with an exponential correlation structure, using \(\mathbf{D}\) as the phylogenetic distance matrix. Second, we use this fitted decay structure to construct an OU based phylogenetic correlation matrix for the final meta-analysis or meta-regression model. This provides a practical way to incorporate OU style phylogenetic dependence within the standard multilevel meta-analytic framework implemented in metafor.
We first fit a meta-analysis model with an exponential correlation structure, using \(\mathbf{D}\) as the phylogenetic distance matrix:
# create phylogenetic distance matrix
dat_moura2021$const <- 1
I <- matrix(1, nrow = length(tree$tip.label), ncol = length(tree$tip.label))
D <- I-A
D[1:5, 1:5]
phy_ou_meta <- rma.mv(yi, vi,
random = list(~ 1 | study.id,
~ 1 | effect.size.id,
~ 1 | species.id,
~ species.id.phy | const),
dist = list(D),
struct = "SPEXP",
control = list(rho.init = 1),
data = dat_moura2021,
verbose = TRUE,
sparse = TRUE,
method = "REML",
test = "t"
)
summary(phy_ou_meta)
# Multivariate Meta-Analysis Model (k = 1828; method: REML)
# logLik Deviance AIC BIC AICc
# -160.3783 320.7567 332.7567 365.8192 332.8028
# Variance Components:
# estim sqrt nlvls fixed factor
# sigma^2.1 0.0157 0.1254 457 no study.id
# sigma^2.2 0.0144 0.1201 1828 no effect.size.id
# sigma^2.3 0.0000 0.0000 341 no species.id
# outer factor: const (nlvls = 1)
# inner term: ~species.id.phy (nlvls = 341)
# estim sqrt fixed
# tau^2 0.1029 0.3208 no
# rho 0.0182 no
# Test for Heterogeneity:
# Q(df = 1827) = 10743.8076, p-val < .0001
# Model Results:
# estimate se zval pval ci.lb ci.ub
# 0.3514 0.0355 9.9090 <.0001 0.2819 0.4209 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1This model estimates the exponential decay parameter governing how quickly phylogenetic correlation declines with distance. Using this fitted decay parameter, we then construct an OU based phylogenetic correlation matrix as
\[ \mathbf{A}_{OU} = \exp(-\alpha \mathbf{D}). \]
This matrix represents the phylogenetic correlation structure implied by the fitted OU model. We then refit the meta-analysis, supplying \(\mathbf{A}_{OU}\) as a fixed phylogenetic correlation matrix:
rho <- phy_ou_meta$rho #[1] 0.01820382
alpha <- 1/rho # 54.93351
A_OU <- exp(-alpha*D)
phy_ou_meta1 <- rma.mv(yi, vi,
random = list(~ 1 | study.id,
~ 1 | effect.size.id,
~ 1 | species.id,
~ 1 | species.id.phy),
R = list(species.id.phy = A_OU),
data = dat_moura2021,
verbose = TRUE,
sparse = TRUE,
method = "REML",
test = "t"
)
summary(phy_ou_meta1)
# Multivariate Meta-Analysis Model (k = 1828; method: REML)
#
# logLik Deviance AIC BIC AICc
# -160.3783 320.7567 330.7567 358.3088 330.7896
#
# Variance Components:
#
# estim sqrt nlvls fixed factor R
# sigma^2.1 0.0157 0.1254 457 no study.id no
# sigma^2.2 0.0144 0.1201 1828 no effect.size.id no
# sigma^2.3 0.0000 0.0001 341 no species.id no
# sigma^2.4 0.1029 0.3208 341 no species.id.phy yes
#
# Test for Heterogeneity:
# Q(df = 1827) = 10743.8076, p-val < .0001
#
# Model Results:
#
# estimate se tval df pval ci.lb ci.ub
# 0.3514 0.0355 9.9090 1827 <.0001 0.2819 0.4210 *** We can extend the same correlation structure to a meta-regression model by including moderators while retaining the OU based phylogenetic random effect:
phy_ou_meta_reg2 <- rma.mv(yi, vi,
mods = ~ temporally.pooled,
random = list(~ 1 | study.id,
~ 1 | effect.size.id,
~ 1 | species.id,
~ 1 | species.id.phy),
R = list(species.id.phy = A_OU),
data = dat_moura2021,
verbose = TRUE,
sparse = TRUE,
method = "REML",
test = "t"
)
summary(phy_ou_meta_reg2)
# Multivariate Meta-Analysis Model (k = 1828; method: REML)
#
# logLik Deviance AIC BIC AICc
# -159.3682 318.7364 330.7364 363.7957 330.7826
#
# Variance Components:
#
# estim sqrt nlvls fixed factor R
# sigma^2.1 0.0159 0.1261 457 no study.id no
# sigma^2.2 0.0144 0.1202 1828 no effect.size.id no
# sigma^2.3 0.0000 0.0001 341 no species.id no
# sigma^2.4 0.1014 0.3184 341 no species.id.phy yes
#
# Test for Residual Heterogeneity:
# QE(df = 1826) = 10668.4998, p-val < .0001
#
# Test of Moderators (coefficient 2):
# F(df1 = 1, df2 = 1826) = 1.8609, p-val = 0.1727
#
# Model Results:
#
# estimate se tval df pval ci.lb ci.ub
# intrcpt 0.3421 0.0359 9.5422 1826 <.0001 0.2718 0.4125 ***
# temporally.pooledyes 0.0293 0.0215 1.3641 1826 0.1727 -0.0128 0.0713 The OU phylogenetic model fitted the data substantially better than the BM model (AIC = 330.76 vs 345.35), indicating that phylogenetic similarity in assortative mating strength decays rapidly with evolutionary distance. In other words, closely related species tend to resemble one another in mate size correlation, but this resemblance weakens quickly as phylogenetic distance increases. This pattern is consistent with an OU process in which species are pulled towards their own lineage-specific optima, rather than retaining strong similarity over deep evolutionary time-scales.
Although the pooled effect size (meta-analysis) and moderator effect(meta-regression) was similar under both models, the uncertainty around this estimate was slightly smaller under OU, as reflected by the narrower confidence interval and smaller standard error. Under the OU model, species-level variation was captured almost entirely by the OU-structured phylogenetic term, whereas the BM model partitioned this variation between phylogenetic and non-phylogenetic species effects. Together, these results suggest that assortative mating strength shows only limited deep phylogenetic inertia and is instead dominated by relatively shallow phylogenetic resemblance and species-specific variation.
Note: We cannot currently fit this OU model directly in brms or glmmTMB.
Visualisations
Visualising results often makes them much easier to understand than presenting them only in tables or in text. We want to visualise both the overall effect size estimate and the variance components from the random effects - but how to do this nicely?
Traditionally, forest plots have bee used to report the pooled estimate and confidence intervals. However, forest plots does not clearly convey each study’s contribution to the overall estimates (that is, the relative precision or inverse-variance weight of each effect size). The orchaRd package provides a nice alternative, the orchard plot, which visualises the overall effect size along with the precision of each effect size estimate.
Although orchard plots provide an effective summary of model-based estimates, they are not designed to display variance-component decomposition. We therefore construct custom visualisations in ggplot2 to present the pooled effect size alongside the estimated variance components.
Here, we present how to make figures from metafor and brms using meta-analysis model outputs.
metafor_p1 <- orchaRd::orchard_plot(moura_BM_metafor1,
group = "study.id",
xlab = "Effect size",
angle = 45) +
scale_x_discrete(labels = c("Overall effect")) +
scale_color_manual(values = "#CDAD00") +
scale_fill_manual(values = "#FFD700") +
theme_classic()
## random effect
ci_var <- confint(moura_BM_metafor1, level = 0.95)
# estimate ci.lb ci.ub
# sigma^2.1 0.0192 0.0108 0.0325
# sigma.1 0.1384 0.1038 0.1802
#
# estimate ci.lb ci.ub
# sigma^2.2 0.0145 0.0121 0.0172
# sigma.2 0.1202 0.1099 0.1311
#
# estimate ci.lb ci.ub
# sigma^2.3 0.0557 0.0334 0.0788
# sigma.3 0.2359 0.1827 0.2807
#
# estimate ci.lb ci.ub
# sigma^2.4 0.0512 0.0179 0.1792
# sigma.4 0.2263 0.1336 0.4233
tbl <- tibble::as_tibble(ci_var, rownames = "term")
label_map <- c("Effect_id", "Study_id", "Species", "Phylo")
tbl_var <- tbl |>
filter(str_detect(term, "^sigma\\^2\\.")) |>
mutate(
idx = as.integer(str_match(term, "\\.(\\d+)$")[,2]),
label = case_when(
idx == 1 ~ "Study_id",
idx == 2 ~ "Effect_id",
idx == 3 ~ "Species",
idx == 4 ~ "Phylo"
),
label = factor(label, levels = label_map)
)
metafor_p2 <- ggplot(tbl_var, aes(x = label, y = estimate)) +
geom_point(size = 3.0, color = "#528B8B") +
geom_errorbar(aes(ymin = ci.lb, ymax = ci.ub), width = 0.25, color = "#528B8B") +
coord_flip() +
labs(x = NULL, y = "Variance 95% CI") +
# scale_y_continuous(breaks = seq(0, 10.0, 1), limits = c(0, 10.0)) +
theme_classic(base_size = 12)
metafor_eg1_1 <- metafor_p1 / metafor_p2
metafor_eg1_1In brms, we can extract posterior samples for both fixed and random effects, and visualise them using ggplot2.
moura_BM_brms1 <- readRDS(here("Rdata", "tutorial_v2", "moura2021_BM_brms.rds"))
get_variables(moura_BM_brms1)
## fixed effects
fixed_effects_samples_brms <- moura_BM_brms1 |>
spread_draws(b_Intercept)
fixed_effects_samples_brms <- fixed_effects_samples_brms |>
pivot_longer(cols = starts_with("b_"),
names_to = ".variable",
values_to = ".value")
brms_p1 <- ggplot(fixed_effects_samples_brms, aes(x = .value, y = .variable)) +
stat_halfeye(
normalize = "xy",
point_interval = "mean_qi",
fill = "#FFD700",
color = "#CDAD00"
) +
geom_vline(xintercept = 0, linetype = "dashed", color = "#005") +
scale_x_continuous(breaks = seq(-1.0, 2.0, 1)) +
labs(y = "Overall effect"
) +
theme_classic()
head(fixed_effects_samples_brms)
## random effects
random_effects_samples_brms <- moura_BM_brms1 |>
spread_draws(sd_species.id.phy__Intercept, sd_effect.size.id__Intercept, sd_species.id__Intercept, sd_study.id__Intercept)
random_effects_samples_brms <- random_effects_samples_brms |>
pivot_longer(cols = starts_with("sd_"),
names_to = ".variable",
values_to = ".value") |>
mutate(.value = .value^2,
.variable = case_when(
.variable == "sd_species.id.phy__Intercept" ~ "Phylo",
.variable == "sd_study.id__Intercept" ~ "Study",
.variable == "sd_species.id__Intercept" ~ "Species",
.variable == "sd_effect.size.id__Intercept" ~ "Effect_id"
),
.variable = factor(.variable,
levels = c("Effect_id", "Study", "Species", "Phylo")
)
)
head(random_effects_samples_brms)
brms_p2 <- ggplot(random_effects_samples_brms, aes(x = .value, y = .variable)) +
stat_halfeye(
normalize = "xy",
point_interval = "mean_qi",
fill = "#98F5FF",
color = "#528B8B"
) +
geom_vline(xintercept = 0, linetype = "dashed", color = "#005") +
scale_x_continuous(breaks = seq(0, 0.08, 0.02), limits = c(0, 0.08)) +
labs(
# title = "Posterior distributions of random effects - brms",
y = "Random effects (variance)"
) +
theme_classic() 2. Lim et al. (2014)
We used the o_o_unadj dataset from dat.lim2014 in the metadat package. The dataset come from Lim et al. (2014). It includes correlation coefficients (ri) and sample sizes (ni) describing the association between offspring size and offspring number, unadjusted for maternal size, across multiple species, along with a matching phylogenetic tree. We converted ri to Fisher’s z effect sizes (yi) and derived the corresponding sampling variances (vi) prior to model fitting. The resulting example dataset contained 170 effect sizes and 120 species.
# load dataset and tree
dat_lim2014 <- dat.lim2014$o_o_unadj
tre_lim2014 <- dat.lim2014$o_o_unadj_tree
# calculate effect size and sampling variances
dat_lim2014 <- escalc(measure = "ZCOR", ri = ri, ni = ni, data = dat_lim2014)
# check the tree
is.binary(tre_lim2014) # TRUE
is.ultrametric(tre_lim2014)
# in .is.ultrametric_ape(phy, tol, option, length(phy$tip.label)) : the tree has no branch lengths
# -> so we need to get the branch length…
# compute branch length
tre_lim2014 <- compute.brlen(tre_lim2014)
# make the additional species column - we will use this to consider phylo- and non-phylo random effects
dat_lim2014$phy <- dat_lim2014$species
# make effect size id
dat_lim2014$id <- 1:nrow(dat_lim2014)
head(dat_lim2014)# check the datasetBrownian motion (meta-analysis)
# make phylogenetic correlation matrix
A <- vcv(tre_lim2014, corr = TRUE)
fit_phylo_eg2_metafor_ma <- rma.mv(yi, vi,
random = list(~ 1 | id,
~ 1 | phy,
~ 1| species),
R = list(phy = A),
data = dat_lim2014,
verbose = TRUE,
sparse = TRUE,
method = "REML"
)
summary(fit_phylo_eg2_metafor_ma)
# Multivariate Meta-Analysis Model (k = 170; method: REML)
#
# logLik Deviance AIC BIC AICc
# -93.3327 186.6653 194.6653 207.1849 194.9092
#
# Variance Components:
#
# estim sqrt nlvls fixed factor R
# sigma^2.1 0.0630 0.2510 170 no id no
# sigma^2.2 0.0534 0.2311 120 no phy yes
# sigma^2.3 0.0777 0.2788 120 no species no
#
# Test for Heterogeneity:
# Q(df = 169) = 1823.4239, p-val < .0001
#
# Model Results:
#
# estimate se zval pval ci.lb ci.ub
# -0.1308 0.1219 -1.0729 0.2833 -0.3696 0.1081 lim_vcv <- vcv.phylo(tre_lim2014, corr = TRUE)
vcv <- diag(dat_lim2014$vi)
rownames(vcv) <- colnames(vcv) <- dat_lim2014$id
fit_phylo_eg2_brms_ma <- bf(yi ~ 1 +
(1 | species) +
(1 | gr(phy, cov = lim_vcv)) +
(1 | gr(id, cov = vcv))
)
prior_ma <- default_prior(fit_phylo_eg2_brms_ma,
data = dat_lim2014,
data2 = list(lim_vcv = lim_vcv, vcv = vcv),
family = gaussian())
prior_ma$prior[3] = "constant(1)"
system.time(
phylo_eg2_brms_ma <- brm(
formula = fit_phylo_eg2_brms_ma,
family = gaussian(),
data = dat_lim2014,
data2 = list(lim_vcv = lim_vcv, vcv = vcv),
prior = prior_ma,
iter = 5000,
warmup = 2000,
chains = 4,
backend = "cmdstanr",
threads = threading(threads_per_chain),
control = list(adapt_delta = 0.98, max_treedepth = 15)
)
)
summary(phylo_eg2_brms_ma)
# Family: gaussian
# Links: mu = identity
# Formula: yi ~ 1 + (1 | species) + (1 | gr(phy, cov = lim_vcv)) + (1 | gr(id, cov = vcv))
# Data: dat_lim2014 (Number of observations: 170)
# Draws: 4 chains, each with iter = 5000; warmup = 2000; thin = 1;
# total post-warmup draws = 12000
#
# Multilevel Hyperparameters:
# ~id (Number of levels: 170)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 1.00 0.00 1.00 1.00 NA NA NA
#
# ~phy (Number of levels: 120)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.29 0.12 0.09 0.57 1.00 1935 2772
#
# ~species (Number of levels: 120)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.26 0.07 0.10 0.37 1.00 896 978
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept -0.13 0.16 -0.45 0.19 1.00 6557 6385
#
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma 0.26 0.04 0.19 0.35 1.00 870 1995
fit_phylo_eg2_brms_ma <- bf(yi | se(sqrt(vi)) ~ 1+
(1 | species) +
(1 | gr(phy, cov = lim_vcv)) +
(1 | id))
prior_ma <- default_prior(fit_phylo_eg2_brms_ma,
data = dat_lim2014,
data2 = list(lim_vcv = lim_vcv),
family = gaussian())
system.time(
phylo_eg2_brms_ma_2 <- brm(
formula = fit_phylo_eg2_brms_ma,
family = gaussian(),
data = dat_lim2014,
data2 = list(lim_vcv = lim_vcv),
prior = prior_ma,
iter = 5000,
warmup = 2000,
chains = 4,
backend = "cmdstanr",
threads = threading(threads_per_chain),
control = list(adapt_delta = 0.98, max_treedepth = 15)
)
)
summary(phylo_eg2_brms_ma_2)
# Family: gaussian
# Links: mu = identity
# Formula: yi | se(sqrt(vi)) ~ 1 + (1 | species) + (1 | gr(phy, cov = lim_vcv)) + (1 | id)
# Data: dat_lim2014 (Number of observations: 170)
# Draws: 4 chains, each with iter = 5000; warmup = 2000; thin = 1;
# total post-warmup draws = 12000
#
# Multilevel Hyperparameters:
# ~id (Number of levels: 170)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.26 0.04 0.19 0.35 1.00 1311 1410
#
# ~phy (Number of levels: 120)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.29 0.12 0.09 0.58 1.00 1732 3147
#
# ~species (Number of levels: 120)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.26 0.07 0.08 0.37 1.01 1010 714
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept -0.13 0.16 -0.44 0.18 1.00 5189 6304
#
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma 0.00 0.00 0.00 0.00 NA NA NAdat_lim2014$id <- factor(
as.character(dat_lim2014$id),
levels = mixedsort(unique(as.character(dat_lim2014$id)))
)
class(dat_lim2014$id)
A <- A[sort(rownames(A)), sort(rownames(A))]
vcv <- diag(dat_lim2014$vi, nrow = nrow(dat_lim2014))
rownames(vcv)<- colnames(vcv)<- dat_lim2014$id
dat_lim2014$g <- 1
phylo_eg2_tmb_ma <- glmmTMB(yi ~ 1 +
equalto(0 + id|g, vcv) +
(1| species) +
propto(0 + phy|g, A),
data = dat_lim2014,
REML = T)
summary(phylo_eg2_tmb_ma)
# Family: gaussian ( identity )
# Formula: yi ~ 1 + equalto(0 + id | g, vcv) + (1 | species) + propto(0 + phy | g, A)
# Data: dat_lim2014
#
# AIC BIC logLik -2*log(L) df.resid
# 199.8 212.3 -95.9 191.8 166
#
# Random effects:
#
# Conditional model:
# Groups Name Variance Std.Dev. Corr
# g id1 0.0555556 0.23570
# id2 0.0909091 0.30151 0.00
# id3 0.0555556 0.23570 0.00 0.00
# id4 0.0909091 0.30151 0.00 0.00 0.00
# id5 0.0140845 0.11868 0.00 0.00 0.00 0.00
# id6 0.0098039 0.09901 0.00 0.00 0.00 0.00 0.00
# id7 0.0099010 0.09950 0.00 0.00 0.00 0.00 0.00 0.00
# id8 0.0217391 0.14744 0.00 0.00 0.00 0.00 0.00 0.00 0.00
# id9 0.0031746 0.05634 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
# id10 0.0208333 0.14434 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
# ... ... ... ... ... ... ... ... ... ... ...
# species (Intercept) 0.0777194 0.27878
# g.1 phyAlectoris_rufa 0.0533848 0.23105
# phyAlytes_obstetricans 0.0533848 0.23105 0.20
# phyAnas_platyrhynchos 0.0533848 0.23105 0.92 0.20
# phyAnguis_fragilis 0.0533848 0.23105 0.32 0.20 0.32
# phyApalone_ferox 0.0533848 0.23105 0.65 0.20 0.65 0.32
# phyApalone_mutica 0.0533848 0.23105 0.65 0.20 0.65 0.32 0.99
# phyAphrastura_spinicauda 0.0533848 0.23105 0.77 0.20 0.77 0.32 0.65 0.65
# phyAraschnia_levana 0.0533848 0.23105 0.00 0.00 0.00 0.00 0.00 0.00 0.00
# phyAustropotamobius_italicus 0.0533848 0.23105 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.93
# phyAythya_ferina 0.0533848 0.23105 0.92 0.20 0.99 0.32 0.65 0.65 0.77 0.00
# ... ... ... ... ... ... ... ... ... ... ...
# Residual 0.0629794 0.25096
#
# 0.00
# ... ...
#
# 0.00
# ... ...
#
# Number of obs: 170, groups: g, 1; species, 120
#
# Dispersion estimate for gaussian family (sigma^2): 0.063
#
# Conditional model:
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) -0.1308 0.1220 -1.072 0.284
phylo_eg2_tmb$fit$par
# betadisp theta theta
# -1.382474 -1.277325 -2.930230 Across metafor, brms, and glmmTMB, this example yielded highly consistent results. The estimated overall mean effect was approximately \(\beta_0\) = -0.13 in all three packages, and heterogeneity was partitioned similarly across effect-size, phylogenetic, and species-level components. Small differences in the reported variance terms arise from package-specific parametrisations, particularly in the way sampling variance is represented.
Biologically, the negative mean effect suggests a tendency for larger offspring size to be associated with fewer offspring, consistent with the classic offspring size-number trade-off. At the same time, the non-zero phylogenetic and species-level variance components indicate that the strength of this relationship differs among taxa and is only partly explained by shared evolutionary history.
Brawnian motion (meta-regression)
fit_phylo_eg2_metafor_mr <- rma.mv(yi, vi,
mods = ~ environment,
random = list(~ 1 | id,
~ 1 | phy,
~ 1 | species),
R = list(phy = A),
data = dat_lim2014,
verbose = TRUE,
sparse = TRUE,
method = "REML"
)
summary(fit_phylo_eg2_metafor_mr)
# Multivariate Meta-Analysis Model (k = 170; method: REML)
#
# logLik Deviance AIC BIC AICc
# -93.0600 186.1200 196.1200 211.7398 196.4904
#
# Variance Components:
#
# estim sqrt nlvls fixed factor R
# sigma^2.1 0.0630 0.2511 170 no id no
# sigma^2.2 0.0521 0.2283 120 no phy yes
# sigma^2.3 0.0792 0.2815 120 no species no
#
# Test for Residual Heterogeneity:
# QE(df = 168) = 1774.0613, p-val < .0001
#
# Test of Moderators (coefficient 2):
# QM(df = 1) = 0.0401, p-val = 0.8413
#
# Model Results:
#
# estimate se zval pval ci.lb ci.ub
# intrcpt -0.1395 0.1289 -1.0826 0.2790 -0.3921 0.1131
# environmentwild 0.0166 0.0829 0.2003 0.8413 -0.1460 0.1792 fit_phylo_eg2_brms_mr <- bf(yi ~ 1 + environment +
(1 | species) +
(1 | gr(phy, cov = lim_vcv)) +
(1 | gr(id, cov = vcv))
)
prior_mr <- default_prior(fit_phylo_eg2_brms_mr,
data = dat_lim2014,
data2 = list(lim_vcv = lim_vcv, vcv = vcv),
family = gaussian())
prior_mr$prior[5] = "constant(1)"
phylo_eg2_brms_mr <- brm(
formula = fit_phylo_eg2_brms_mr,
family = gaussian(),
data = dat_lim2014,
data2 = list(lim_vcv = lim_vcv, vcv = vcv),
prior = prior_mr,
iter = 5000,
warmup = 2000,
chains = 4,
backend = "cmdstanr",
threads = threading(threads_per_chain),
control = list(adapt_delta = 0.98, max_treedepth = 15)
)
summary(phylo_eg2_brms_mr)
# Family: gaussian
# Links: mu = identity
# Formula: yi ~ 1 + environment + (1 | species) + (1 | gr(phy, cov = lim_vcv)) + (1 | gr(id, cov = vcv))
# Data: dat_lim2014 (Number of observations: 170)
# Draws: 4 chains, each with iter = 5000; warmup = 2000; thin = 1;
# total post-warmup draws = 12000
#
# Multilevel Hyperparameters:
# ~id (Number of levels: 170)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 1.00 0.00 1.00 1.00 NA NA NA
#
# ~phy (Number of levels: 120)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.28 0.12 0.08 0.57 1.00 2150 2926
#
# ~species (Number of levels: 120)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.26 0.07 0.10 0.37 1.00 1017 1099
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept -0.14 0.16 -0.46 0.19 1.00 9372 7645
# environmentwild 0.02 0.09 -0.15 0.18 1.00 10459 8813
#
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma 0.26 0.04 0.19 0.35 1.00 949 1860
#
# Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
# and Tail_ESS are effective sample size measures, and Rhat is the potential
# scale reduction factor on split chains (at convergence, Rhat = 1).phylo_eg2_tmb_mr <- glmmTMB(yi ~ 1 + environment +
equalto(0 + id|g, vcv) +
(1| species) +
propto(0 + phy|g, A),
data = dat_lim2014,
REML = T)
summary(phylo_eg2_tmb_mr)
# Family: gaussian ( identity )
# Formula: yi ~ 1 + environment + equalto(0 + id | g, vcv) + (1 | species) +
# propto(0 + phy | g, A)
# Data: dat_lim2014
#
# AIC BIC logLik -2*log(L) df.resid
# 204.9 220.6 -97.5 194.9 165
#
# Random effects:
#
# Conditional model:
# Groups Name Variance Std.Dev. Corr
# g id1 0.0555556 0.23570
# id2 0.0909091 0.30151 0.00
# id3 0.0555556 0.23570 0.00 0.00
# id4 0.0909091 0.30151 0.00 0.00 0.00
# id5 0.0140845 0.11868 0.00 0.00 0.00 0.00
# id6 0.0098039 0.09901 0.00 0.00 0.00 0.00 0.00
# id7 0.0099010 0.09950 0.00 0.00 0.00 0.00 0.00 0.00
# id8 0.0217391 0.14744 0.00 0.00 0.00 0.00 0.00 0.00 0.00
# id9 0.0031746 0.05634 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
# id10 0.0208333 0.14434 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
# ... ... ... ... ... ... ... ... ... ... ...
# species (Intercept) 0.0792370 0.28149
# g.1 phyAlectoris_rufa 0.0521124 0.22828
# phyAlytes_obstetricans 0.0521124 0.22828 0.20
# phyAnas_platyrhynchos 0.0521124 0.22828 0.92 0.20
# phyAnguis_fragilis 0.0521124 0.22828 0.32 0.20 0.32
# phyApalone_ferox 0.0521124 0.22828 0.65 0.20 0.65 0.32
# phyApalone_mutica 0.0521124 0.22828 0.65 0.20 0.65 0.32 0.99
# phyAphrastura_spinicauda 0.0521124 0.22828 0.77 0.20 0.77 0.32 0.65 0.65
# phyAraschnia_levana 0.0521124 0.22828 0.00 0.00 0.00 0.00 0.00 0.00 0.00
# phyAustropotamobius_italicus 0.0521124 0.22828 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.93
# phyAythya_ferina 0.0521124 0.22828 0.92 0.20 0.99 0.32 0.65 0.65 0.77 0.00
# ... ... ... ... ... ... ... ... ... ... ...
# Residual 0.0630428 0.25108
# 0.00
# ... ...
# 0.00
# ... ...
#
# Number of obs: 170, groups: g, 1; species, 120
#
# Dispersion estimate for gaussian family (sigma^2): 0.063
#
# Conditional model:
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) -0.13951 0.12893 -1.082 0.279
# environmentwild 0.01661 0.08469 0.196 0.844Visualisations
In this dataset, we plot the meta-regression model output.
mr_res <- orchaRd::mod_results(
fit_phylo_eg2_metafor_mr,
mod = "environment",
group = "id"
)
mr_res
## 3. orchard plot
metafor_mr_p1 <- orchaRd::orchard_plot(
mr_res,
mod = "environment",
group = "id",
xlab = "Effect size",
angle = 45
) +
# scale_color_manual(values = "#CDAD00") +
# scale_fill_manual(values = "#FFD700") +
theme_classic()
metafor_mr_p1
## for random effect
ci_var_mr <- confint(fit_phylo_eg2_metafor_mr, level = 0.95)
tbl_mr <- tibble::as_tibble(ci_var_mr, rownames = "term")
label_map_mr <- c("Effect_id", "Species", "Phylo")
tbl_var_mr <- tbl_mr |>
filter(str_detect(term, "^sigma\\^2\\.")) |>
mutate(
idx = as.integer(str_match(term, "\\.(\\d+)$")[, 2]),
label = case_when(
idx == 1 ~ "Effect_id", # ~1 | id
idx == 2 ~ "Phylo", # ~1 | phy
idx == 3 ~ "Species" # ~1 | species
),
label = factor(label, levels = label_map_mr)
)
metafor_mr_p2 <- ggplot(tbl_var_mr, aes(x = label, y = estimate)) +
geom_point(size = 3.0, color = "#528B8B") +
geom_errorbar(aes(ymin = ci.lb, ymax = ci.ub), width = 0.25, color = "#528B8B") +
coord_flip() +
labs(x = NULL, y = "Variance (95% CI)") +
theme_classic(base_size = 12)
metafor_eg2_1 <- metafor_mr_p1 / metafor_mr_p2
metafor_eg2_1get_variables(phylo_eg2_brms_mr)
## fixed effects
fixed_effects_samples_brms <- phylo_eg2_brms_mr |>
spread_draws(b_Intercept)
fixed_effects_samples_brms <- fixed_effects_samples_brms |>
pivot_longer(cols = starts_with("b_"),
names_to = ".variable",
values_to = ".value")
brms_p1 <- ggplot(fixed_effects_samples_brms, aes(x = .value, y = .variable)) +
stat_halfeye(
normalize = "xy",
point_interval = "mean_qi",
fill = "#FFD700",
color = "#CDAD00"
) +
geom_vline(xintercept = 0, linetype = "dashed", color = "#005") +
scale_x_continuous(breaks = seq(-1.0, 2.0, 1)) +
labs(y = "Overall effect"
) +
theme_classic()
head(fixed_effects_samples_brms)
## random effects
random_effects_samples_brms <- phylo_eg2_brms_mr |>
spread_draws(sd_phy__Intercept, sd_species__Intercept, sigma) |>
as_tibble()
random_effects_samples_brms <- random_effects_samples_brms |>
pivot_longer(
cols = c(sd_phy__Intercept, sd_species__Intercept, sigma),
names_to = ".variable",
values_to = ".value"
) |>
mutate(
.value = .value^2, # sd -> variance
.variable = case_when(
.variable == "sd_phy__Intercept" ~ "Phylo",
.variable == "sd_species__Intercept" ~ "Species",
.variable == "sigma" ~ "Effect_id" # remember use "sigma" as we set sd of id = 1
),
.variable = factor(.variable,
levels = c("Effect_id", "Species", "Phylo"))
)
head(random_effects_samples_brms)
brms_p2 <- ggplot(random_effects_samples_brms, aes(x = .value, y = .variable)) +
stat_halfeye(
normalize = "xy",
point_interval = "mean_qi",
fill = "#98F5FF",
color = "#528B8B"
) +
geom_vline(xintercept = 0, linetype = "dashed", color = "#005") +
# scale_x_continuous(breaks = seq(0, 0.08, 0.02), limits = c(0, 0.08)) +
labs(
# title = "Posterior distributions of random effects - brms",
y = "Random effects (variance)"
) +
theme_classic()
brms_eg2 <- brms_p1 / brms_p2Spatial meta-analysis
How to make spatial correlation matrix?
In spatial meta-analysis, we model the residual similarity between effect sizes as a function of the pairwise spatial separation among observations. In many applications, each effect size is associated with a sampling location defined by latitude and longitude, from which pairwise geographic distances can be calculated. More generally, however, the key input is a matrix describing the pairwise separation among effect sizes, which may reflect straight-line geographic distance or other biologically meaningful measures of spatial separation, such as habitat connectivity, landscape resistance, barriers to movement, or distances among broader spatial units when exact sampling coordinates are unavailable. Depending on the spatial correlation structure assumed (for example, exponential, Gaussian, or spherical), this pairwise distance matrix is then transformed into a correlation matrix that describes the expected similarity between effect sizes based on their spatial proximity. In this context, the kernel is the mathematical function that maps pairwise distances to expected correlations, thereby specifying how quickly similarity declines with distance. In the examples below, we use pairwise distances calculated from exact sampling coordinates.
Step 1. Prepare data:
Ensure each effect size has geographic coordinates, a unique identifier and projected coordinates in km
Each effect size should have geographic coordinates (latitude and longitude) indicating where the data were collected. Normally, these data would be extracted from the study and/or imputed/approximated if unreported. Because latitude and longitude are measured in degrees, we first project them to a planar coordinate reference system (CRS) and then convert the resulting coordinates to kilometres. This allows us to compute meaningful Euclidean distances.
library(tidyverse)
library(sf)
toy_sites <- tibble(
effect_id = 1:5,
latitude = c(34.05, 36.16, 40.71, 34.05, 37.77),
longitude = c(-118.24, -115.15, -74.00, -118.25, -122.42)
)
# convert to an sf object (WGS84 longitude / latitude)
toy_sites_sf <- st_as_sf(
toy_sites,
coords = c("longitude", "latitude"),
crs = 4326)
# project to a planar coordinate system and convert to kilometers
# EPSG:3857 is convenient for small-scale distance calculations
# for real analyses, consider using an appropriate local projection
toy_sites_sf_proj <- st_transform(toy_sites_sf, crs = 3857)
xy_m <- st_coordinates(toy_sites_sf_proj)
toy_sites$x_km <- xy_m[,1] / 1000
toy_sites$y_km <- xy_m[,2] / 1000
toy_sites
# A tibble: 5 × 5
# effect_id latitude longitude x_km y_km
# <int> <dbl> <dbl> <dbl> <dbl>
# 1 1 34.0 -118. -13162. 4036.
# 2 2 36.2 -115. -12818. 4323.
# 3 3 40.7 -74 -8238. 4970.
# 4 4 34.0 -118. -13164. 4036.
# 5 5 37.8 -122. -13628. 4547.The projected coordinates (x_km, y_km) can be negative because they represent positions relative to the origin of the coordinate system. We first assign a geographic CRS (EPSG:4326) to declare that the coordinates are in latitude and longitude.
Then we transform the coordinates to a planar CRS (EPSG:3857) suitable for distance calculations, and convert the resulting x and y coordinates from meters to kilometres.
Step 2. Compute pairwise Euclidean distance matrix
Once each effect size has projected coordinates in kilometres, we can compute the Euclidean distance between all pairs of sampling locations. These distances form the basis of spatial correlation structures in spatial meta-analysis.
# extract projected coordinates
coords_km <- as.matrix(toy_sites[, c("x_km", "y_km")])
# compute pairwise Euclidean distances (in km)
D <- as.matrix(dist(coords_km))
# assign row and column names using effect_id
rownames(D) <- toy_sites$effect_id
colnames(D) <- toy_sites$effect_id
D
# 1 2 3 4 5
# 1 0.000000 448.0744 5012.587 1.113195 691.4604
# 2 448.074446 0.0000 4626.263 448.929589 839.8075
# 3 5012.586560 4626.2633 0.000 5013.680258 5406.6368
# 4 1.113195 448.9296 5013.680 0.000000 690.7118
# 5 691.460407 839.8075 5406.637 690.711777 0.0000Each element \(D_{ij}\) is the straight-line distance (in kilometers) between effect sizes i and j.
Why this matrix matters?
Spatial meta-analytic models assume that residual similarity between effect sizes decays with geographic distance. The distance matrix \(\mathbf{D}\) encodes this information.
Differences in implementation and requirements between \(R\) packages:
- In
metafor, we supply \(\mathbf{D}\) directly, andmetafortransforms it into a residual correlation matrix using a specified kernel, such as a linear, exponential, or Gaussian kernel(more info: here). - In
brms, we supply the geographic coordinates, andbrmsinternally computes the pairwise distances and applies the corresponding distance-based kernel. - In
glmmTMB, we supply geographic coordinates, and it models spatial dependence through a continuous spatial random field whose covariance structure is implicitly governed by distance rather than by an explicit user-supplied distance matrix.
Examples from real meta-analyses
1. Grau-Andrés et al. (2024)
The dataset comes from Grau-Andrés et al. (2024) and contains 2361 effect sizes of Hedges’ d. Each effect size is associated with a geographic location given by latitude and longitude. These locations represent sampling sites at which the original biological measurement were collected.
The original data provide latitude and longitude in degrees. We first project these coordinates to a planar CRS and convert them to kilometres. We then compute the pairwise Euclidean distance matrix among all effect sizes based on their projected coordinates.
dat_Roger <- read.csv(here("data", "Roger_etal_2024", "Roger_etal_2024.csv"))
dat_Roger <- dat_Roger |>
filter(!is.na(longitude))
coords2 <- cbind(dat_Roger$longitude, dat_Roger$latitude)
# project to a planar coordinate system and convert to kilometers
dat_sf2 <- st_as_sf(dat_Roger, coords = c("longitude", "latitude"), crs = 4326)
dat_sf2_proj <- st_transform(dat_sf2, crs = 3857)
coords_m <- st_coordinates(dat_sf2_proj)
dat_Roger$x_km <- coords_m[,1] / 1000
dat_Roger$y_km <- coords_m[,2] / 1000
coords_km <- cbind(dat_Roger$x_km, dat_Roger$y_km)
dist_matrix_euclid2 <- as.matrix(dist(coords_km))
dat_Roger$effect_id <- seq_len(nrow(dat_Roger))
dat_Roger$effect_id <- factor(
as.character(dat_Roger$effect_id),
levels = mixedsort(unique(as.character(dat_Roger$effect_id)))
)
# row and column names
rownames(dist_matrix_euclid2) <- dat_Roger$effect_id
colnames(dist_matrix_euclid2) <- dat_Roger$effect_id
dat_Roger <- dat_Roger |>
group_by(latitude, longitude) |>
mutate(site_id = cur_group_id()) |>
ungroup()
names(dat_Roger)The exponential kernel assumes that similarity between two effect sizes decreases smoothly as the geographic distance between their location increases. Therefore, nearby sites are strongly correlated, while distant sites are effectively independent.
metafor
In metafor, the spatial process is introduced by specifying struct = "SPEXP" together with either (1) via a supplied distance matrix using random = ~ effect_id|const and dist = list(…), or (2) via coordinates using random = ~ x_km + y_km|const, in which case distances are computed internally.
dat_Roger$const <- 1 # add constant for spatial modelsHere, const is a dummy outer factor set to a constant (all ones) so that all effects are allowed to be spatially correlated. If outer were set to a grouping variable (e.g., study ID), spatial correlation would only be modelled within groups (block-diagonal structure). The spatial random field is parametrised by (1) \(\tau^2\), the marginal variance of the spatial process (its amplitude), and (2) \(\rho\), the range/scale parameter controlling how quickly correlation decays with distance under the chosen kernel (we show SPEXP- exponential kernel and also SPGAU- Gaussian process version later). Let \(d_{ij}\) be the distance between locations \(i\) and \(j\). The spatial random effects satisfy \(\mathrm{Cov}(u_i,u_j)=\tau^2\,\mathrm{cor}_{\rho}(d_{ij})\), where \(\mathrm{cor}_{\rho}(d_{ij})\) is the kernel-specific correlation function that decreases with distance, and \(\rho\) controls the rate of decay. Distances can be supplied directly via text{dist=} or computed internally from coordinates (~ x_km + y_km | const); both are equivalent when they imply the same Euclidean distance matrix.
# Use distance matrix
ma_sp1_exp0_mf <- rma.mv(d_Hedges, var_Hedges,
random = list(
~ 1|effect_id,
~ effect_id|const
),
struct = "SPEXP",
data = dat_Roger,
dist = list(Site = dist_matrix_euclid2),
test = "t",
method = "REML",
sparse = TRUE,
verbose = TRUE)
summary(ma_sp1_exp0_mf)
# Multivariate Meta-Analysis Model (k = 2361; method: REML)
#
# logLik Deviance AIC BIC AICc
# -4034.4852 8068.9704 8076.9704 8100.0360 8076.9874
#
# Variance Components:
#
# estim sqrt nlvls fixed factor
# sigma^2 0.7935 0.8908 2361 no effect_id
#
# outer factor: const (nlvls = 1)
# inner term: ~effect_id (nlvls = 2361)
#
# estim sqrt fixed
# tau^2 1.2330 1.1104 no
# rho 0.1744 no
#
# Test for Heterogeneity:
# Q(df = 2360) = 21273.5149, p-val < .0001
#
# Model Results:
#
# estimate se tval df pval ci.lb ci.ub
# -0.3349 0.0644 -5.2006 2360 <.0001 -0.4612 -0.2086 ***
# Use the coordinates directly (Euclidean distance)
system.time(sp_eg1_exp_mf <- rma.mv(d_Hedges, var_Hedges,
random = list(
~ 1|effect_id,
~ x_km + y_km |const
),
struct = "SPEXP",
data = dat_Roger,
sparse = TRUE,
verbose = TRUE,
method = "REML",
test = "t"
)
)
summary(sp_eg1_exp_mf)
# Multivariate Meta-Analysis Model (k = 2361; method: REML)
#
# logLik Deviance AIC BIC AICc
# -4034.4852 8068.9704 8076.9704 8100.0360 8076.9874
#
# Variance Components:
#
# estim sqrt nlvls fixed factor
# sigma^2 0.7935 0.8908 2361 no effect_id
#
# outer factor: const (nlvls = 1)
# inner term: ~x_km + y_km (nlvls = 383)
#
# estim sqrt fixed
# tau^2 1.2330 1.1104 no
# rho 0.1744 no
#
# Test for Heterogeneity:
# Q(df = 2360) = 21273.5149, p-val < .0001
#
# Model Results:
#
# estimate se tval df pval ci.lb ci.ub
# -0.3349 0.0644 -5.2006 2360 <.0001 -0.4612 -0.2086 ***
confint(sp_eg1_exp_mf)
# estimate ci.lb ci.ub
# sigma^2 0.7935 0.7257 0.8678
# sigma 0.8908 0.8519 0.9316
#
# estimate ci.lb ci.ub
# tau^2 1.2330 1.0147 1.4976
# tau 1.1104 1.0073 1.2238
#
# estimate ci.lb ci.ub
# rho 0.1744 <0.0174 0.8572 Both metafor models (use distance matrix or coordinates directly or Euclidean distance) returned the same output.
brms
In brms, the covariance function of the Gaussian process is controlled by the cov argument. To use an exponential kernel, set cov = "exponential". Note that scale determines whether the input predictors are rescaled (by default, to make the maximum Euclidean distance equal to 1). If the coordinates are already expressed in meaningful units such as kilometres, and you want the GP length-scale parameter to be interpretable in those units, it is advisable to use scale = FALSE.
dat_Roger$effect_id <- as.character(dat_Roger$effect_id)
vcv <- diag(dat_Roger$var_Hedges)
rownames(vcv) <- colnames(vcv) <- dat_Roger$effect_id
fit_sp_eg1_exp_brms <- bf(d_Hedges ~ 1 +
# (1|site_id) + # this is site-level random effect
(1|gr(effect_id, cov = vcv)) + # this is m (sampling error)
gp(x_km, y_km,
cov = "exponential",
scale = FALSE)
)
# generate default priors
prior <- default_prior(fit_sp_eg1_exp_brms,
data = dat_Roger,
data2 = list(vcv = vcv),
family = gaussian())
prior$prior[5] = "constant(1)" # meta-analysis assumes sampling variance is known so fixing this to 1
prior
# prior class coef group resp dpar nlpar lb ub tag source
# student_t(3, -0.1, 2.5) Intercept default
# (flat) lscale 0 default
# inv_gamma(1.494197, 2141.138175) lscale gpx_kmy_km 0 default
# student_t(3, 0, 2.5) sd 0 default
# constant(1) sd effect_id 0 default <- check this
# constant(1) sd Intercept effect_id 0 (vectorized) <- check this
# student_t(3, 0, 2.5) sdgp 0 default
# student_t(3, 0, 2.5) sdgp gpx_kmy_km 0 (vectorized)
# student_t(3, 0, 2.5) sigma 0 default
# fitting model
sp_eg1_exp_brms <- brm(
formula = fit_sp_eg1_exp_brms,
data = dat_Roger,
data2 = list(vcv=vcv),
chains = 2,
iter = 6000,
warmup = 3000,
prior = prior,
control = list(adapt_delta = 0.95, max_treedepth = 15)
)
summary(sp_eg1_exp_brms)
# Family: gaussian
# Links: mu = identity
# Formula: d_Hedges ~ 1 + (1 | gr(effect_id, cov = vcv)) + gp(x_km, y_km, cov = "exponential", scale = FALSE)
# Data: dat_Roger (Number of observations: 2361)
# Draws: 2 chains, each with iter = 6000; warmup = 3000; thin = 1;
# total post-warmup draws = 6000
#
# Gaussian Process Hyperparameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sdgp(gpx_kmy_km) 1.14 0.07 1.00 1.29 1.00 1818 3552
# lscale(gpx_kmy_km) 128.18 26.02 85.59 187.10 1.00 2227 2567
#
# Multilevel Hyperparameters:
# ~effect_id (Number of levels: 2361)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 1.00 0.00 1.00 1.00 NA NA NA
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept -0.32 0.09 -0.51 -0.15 1.00 1503 2737
#
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma 0.97 0.02 0.92 1.01 1.00 3047 4199glmmTMB
In glmmTMB, the exponential spatial field is specified by exp(pos + 0 | const), where pos is an index constructed from the spatial coordinates.
dat_Roger$effect_id <- factor(dat_Roger$effect_id)
dat_Roger$site_id <- factor(dat_Roger$site_id)
VCV <- diag(dat_Roger$var_Hedges, nrow = nrow(dat_Roger))
rownames(VCV)<- colnames(VCV)<- dat_Roger$effect_id
VCV[1:5, 1:5]
dat_Roger$pos <- numFactor(dat_Roger$x_km, dat_Roger$y_km)
system.time(
sp_eg1_exp_tmb <- glmmTMB(d_Hedges ~ 1
+ equalto(0+effect_id|const, VCV)
# + (1|site_id)
+ exp(pos+0|const),
data = dat_Roger,
REML=TRUE)
)
# user system elapsed
# 45.570 1.369 47.530
summary(sp_eg1_exp_tmb)
# Family: gaussian ( identity )
# Formula: d_Hedges ~ 1 + equalto(0 + effect_id | const, VCV) + exp(pos + 0 | const)
# Data: dat_Roger
#
# AIC BIC logLik -2*log(L) df.resid
# 9088.5 9111.5 -4540.2 9080.5 2357
#
# Random effects:
#
# Conditional model:
# Groups Name Variance Std.Dev. Corr
# const effect_id1 0.202208 0.44968
# effect_id10 0.166676 0.40826 0.00
# effect_id100 0.225770 0.47515 0.00 0.00
# effect_id1000 0.197994 0.44497 0.00 0.00 0.00
# effect_id1001 3.676968 1.91754 0.00 0.00 0.00 0.00
# effect_id1002 0.315616 0.56180 0.00 0.00 0.00 0.00 0.00
# effect_id1003 0.305460 0.55268 0.00 0.00 0.00 0.00 0.00 0.00
# effect_id1004 0.313426 0.55984 0.00 0.00 0.00 0.00 0.00 0.00 0.00
# effect_id1005 0.413540 0.64307 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
# effect_id1006 0.325726 0.57072 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
# ... ... ... ... ... ... ... ... ... ... ... ... ...
# const.1 pos(16262.6644099893,-5204.51951303405) 1.866150 1.36607
# pos(16280.4755285163,-5116.14629455727) 1.866150 1.36607 0.00
# pos(-7965.96086752978,-5019.4733863405) 1.866150 1.36607 0.00 0.00
# pos(-7959.34359171906,-4693.06364429579) 1.866150 1.36607 0.00 0.00 0.00
# pos(-7960.45678662699,-4691.63535918108) 1.866150 1.36607 0.00 0.00 0.00 0.00
# pos(-7096.61753807119,-4685.92422115392) 1.866150 1.36607 0.00 0.00 0.00 0.00 0.00
# pos(-7992.73943895704,-4607.717759142) 1.866150 1.36607 0.00 0.00 0.00 0.00 0.00 0.00
# pos(-7903.68384632242,-4579.4258128701) 1.866150 1.36607 0.00 0.00 0.00 0.00 0.00 0.00 0.00
# pos(16237.0609271069,-4578.0132445546) 1.866150 1.36607 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
# pos(16304.594788628,-4572.36489569299) 1.866150 1.36607 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
# ... ... ... ... ... ... ... ... ... ... ... ... ...
# Residual 1.864305 1.36540
# Number of obs: 2361, groups: const, 1
#
# Dispersion estimate for gaussian family (sigma^2): 1.86
#
# Conditional model:
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) -0.40936 0.08139 -5.029 4.92e-07 ***
head(confint(sp_eg1_exp_tmb), 10)
# 2.5 % 97.5 % Estimate
# (Intercept) -0.5688798 -0.2498315 -0.4093556
# Std.Dev.pos(16262.6644099893,-5204.51951303405)|const.1 1.2417049 1.5028930 1.3660708
# Std.Dev.pos(16280.4755285163,-5116.14629455727)|const.1 1.2417049 1.5028930 1.3660708
# Std.Dev.pos(-7965.96086752978,-5019.4733863405)|const.1 1.2417049 1.5028930 1.3660708
# Std.Dev.pos(-7959.34359171906,-4693.06364429579)|const.1 1.2417049 1.5028930 1.3660708
# Std.Dev.pos(-7960.45678662699,-4691.63535918108)|const.1 1.2417049 1.5028930 1.3660708
# Std.Dev.pos(-7096.61753807119,-4685.92422115392)|const.1 1.2417049 1.5028930 1.3660708
# Std.Dev.pos(-7992.73943895704,-4607.717759142)|const.1 1.2417049 1.5028930 1.3660708
# Std.Dev.pos(-7903.68384632242,-4579.4258128701)|const.1 1.2417049 1.5028930 1.3660708
# Std.Dev.pos(16237.0609271069,-4578.0132445546)|const.1 1.2417049 1.5028930 1.3660708
sigma(sp_eg1_exp_tmb)
# 1.365396
sp_eg1_exp_tmb_varcor <- VarCorr(sp_eg1_exp_tmb)$cond
exp(sp_eg1_exp_tmb$fit$par)
# betadisp (effect_id) theta (tau/tau^2) theta (rho)
# 1.3653956 1.3660708 (1.866149) 0.2476221 Each model estimates two key aspects of the spatial structure.
The first is the spatial variance (\(\rho^2\)), which quantifies how much of the between effect size variability is organised by geography. If this variance is close to zero, spatial structure is negligible. If it is large, nearby effect sizes tend to be much more similar than distant ones.
The second is the range of spatial correlation, which describes how quickly this similarity decays with distance. Short ranges imply that only very nearby locations are correlated, while long ranges indicate broad scale spatial trends.
It is important to note that metafor and brms use different parametrisations for this range. In metafor, correlation is expressed as \(exp(− \rho * distance)\), whereas in brms it is \(exp(− distance / lscale)\). When distances are measured in kilometres, these are approximately related by \(\rho \approx 1 / lscale\). The numerical values therefore should not be compared directly across packages.
Because range parameters are often weakly identified, their uncertainty is typically large. Interpretation should focus more on whether spatial structure exists and on its approximate scale rather than on the exact numeric estimate.
In all three implementations, sampling error is treated as known through the reported variances of Hedges’ d. In brms and glmmTMB, this is enforced explicitly by supplying the variance covariance matrix of the effect sizes, which ensures that within effect size uncertainty is not re estimated. In metafor, the same role is played by the vi argument.
All three models, metafor, brms, and glmmTMB, give a consistent output. There is strong spatial autocorrelation in effect sizes. Nearby locations tend to show much more similar Hedges’ d values than would be expected from sampling error alone. This means that a substantial fraction of the apparent heterogeneity in the dataset is structured geographically rather than being random noise. The estimated spatial range is large, on the order of hundreds of kilometres, indicating that the spatial pattern is driven by broad scale geographic gradients rather than by very local site effects. Although the exact range is uncertain, all models agree that spatial dependence extends over long distances. After accounting for this spatial structure, the overall mean effect (\(\beta_0\)) remains negative, around −0.3 to −0.4 across all three packges. This shows that the global negative effect is not an artefact of spatial clustering.
Figures
TBU
metafor
sp_eg1_gau_mf <- rma.mv(d_Hedges, var_Hedges,
random = list(
~ 1|effect_id,
~ x_km + y_km |const
),
struct = "SPGAU",
data = dat_Roger,
sparse = TRUE,
verbose = TRUE,
method = "REML",
test = "t"
)
summary(sp_eg1_gau_mf)
# Multivariate Meta-Analysis Model (k = 2361; method: REML)
#
# logLik Deviance AIC BIC AICc
# -4034.4772 8068.9544 8076.9544 8100.0201 8076.9714
#
# Variance Components:
#
# estim sqrt nlvls fixed factor
# sigma^2 0.7934 0.8907 2361 no effect_id
#
# outer factor: const (nlvls = 1)
# inner term: ~x_km + y_km (nlvls = 383)
#
# estim sqrt fixed
# tau^2 1.2308 1.1094 no
# rho 0.0377 no
#
# Test for Heterogeneity:
# Q(df = 2360) = 21273.5149, p-val < .0001
#
# Model Results:
#
# estimate se tval df pval ci.lb ci.ub
# -0.3340 0.0642 -5.2036 2360 <.0001 -0.4599 -0.2082 *** brms
- model is running -
dat_Roger$effect_id <- as.character(dat_Roger$effect_id)
vcv <- diag(dat_Roger$var_Hedges)
rownames(vcv) <- colnames(vcv) <- dat_Roger$effect_id
fit_sp_eg1_gau_brms <- bf(d_Hedges ~ 1 +
# (1|site_id) + # this is site-level random effect
(1|gr(effect_id, cov = vcv)) + # this is m (sampling error)
gp(x_km, y_km,
cov = "exp_quad",
scale = FALSE)
)
# generate default priors
prior <- default_prior(fit_sp_eg1_gau_brms,
data = dat_Roger,
data2 = list(vcv = vcv),
family = gaussian())
prior$prior[5] = "constant(1)" # meta-analysis assumes sampling variance is known so fixing this to 1
prior
# fitting model
sp_eg1_gau_brms <- brm(
formula = fit_sp_eg1_gau_brms,
data = dat_Roger,
data2 = list(vcv=vcv),
chains = 2,
iter = 6000,
warmup = 3000,
prior = prior,
control = list(adapt_delta = 0.95, max_treedepth = 15)
)
summary(sp_eg1_gau_brms)glmmTMB
VCV <- diag(dat_Roger$var_Hedges, nrow = nrow(dat_Roger))
rownames(VCV)<- colnames(VCV)<- dat_Roger$effect_id
VCV[1:5, 1:5]
dat_Roger$pos <- numFactor(dat_Roger$x_km, dat_Roger$y_km)
system.time(
sp_eg1_gau_tmb <- glmmTMB(d_Hedges ~ 1
+ equalto(0+effect_id|const, VCV)
# + (1|site_id)
+ exp(pos+0|const),
data = dat_Roger,
REML=TRUE)
)
# user system elapsed
# 45.570 1.369 47.530
summary(sp_eg1_gau_tmb)
# Family: gaussian ( identity )
# Formula: d_Hedges ~ 1 + equalto(0 + effect_id | const, VCV) + exp(pos + 0 | const)
# Data: dat_Roger
#
# AIC BIC logLik -2*log(L) df.resid
# 8084.7 8107.8 -4038.4 8076.7 2357
#
# Random effects:
#
# Conditional model:
# Groups Name Variance Std.Dev. Corr
# const effect_id1 0.202208 0.44968
# effect_id2 0.166676 0.40826 0.00
# effect_id3 0.225770 0.47515 0.00 0.00
# effect_id4 0.197994 0.44497 0.00 0.00 0.00
# effect_id5 3.676968 1.91754 0.00 0.00 0.00 0.00
# effect_id6 0.315616 0.56180 0.00 0.00 0.00 0.00 0.00
# effect_id7 0.305460 0.55268 0.00 0.00 0.00 0.00 0.00
# effect_id8 0.313426 0.55984 0.00 0.00 0.00 0.00 0.00
# effect_id9 0.413540 0.64307 0.00 0.00 0.00 0.00 0.00
# effect_id10 0.325726 0.57072 0.00 0.00 0.00 0.00 0.00
# ... ... ... ... ... ... ... ...
# const.1 pos(16262.6644099893,-5204.51951303405) 1.232997 1.11040
# pos(16280.4755285163,-5116.14629455727) 1.232997 1.11040 0.00
# pos(-7965.96086752978,-5019.4733863405) 1.232997 1.11040 0.00 0.00
# pos(-7959.34359171906,-4693.06364429579) 1.232997 1.11040 0.00 0.00 0.00
# pos(-7960.45678662699,-4691.63535918108) 1.232997 1.11040 0.00 0.00 0.00 0.00
# pos(-7096.61753807119,-4685.92422115392) 1.232997 1.11040 0.00 0.00 0.00 0.00 0.00
# pos(-7992.73943895704,-4607.717759142) 1.232997 1.11040 0.00 0.00 0.00 0.00 0.00
# pos(-7903.68384632242,-4579.4258128701) 1.232997 1.11040 0.00 0.00 0.00 0.00 0.00
# pos(16237.0609271069,-4578.0132445546) 1.232997 1.11040 0.00 0.00 0.00 0.00 0.00
# pos(16304.594788628,-4572.36489569299) 1.232997 1.11040 0.00 0.00 0.00 0.00 0.00
# ... ... ... ... ... ... ... ...
# Residual 0.793492 0.89078
#
# 0.00
# 0.00 0.00
# 0.00 0.00 0.00
# 0.00 0.00 0.00 0.00
# ... ... ... ... ...
#
# 0.00
# 0.00 0.00
# 0.00 0.00 0.00
# 0.00 0.00 0.00 0.00
# ... ... ... ... ...
#
# Number of obs: 2361, groups: const, 1
#
# Dispersion estimate for gaussian family (sigma^2): 0.793
#
# Conditional model:
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) -0.33493 0.06444 -5.197 2.02e-07 ***
sp_eg1_gau_tmb$fit$par
# betadisp theta theta
# -0.1156562 0.1047238 -1.7463695
confint(sp_eg1_gau_tmb)TBU
2. Scholer et al. (2021)
We used the dataset accompanying Scholer et al. (2020), which was assembled for a meta-analysis of global avian survival across species and latitude. The dataset includes 949 estimates from 204 studies, with each row containing a survival estimate, its standard error, study and species identifiers, geographic coordinates, and several climatic and life-history variables. In our analyses, logit_survival was treated as the effect size (yi), and the known sampling variance was calculated as vi = se^2.
dat_scholer2020 <- read.csv(here("data","Scholer_2020", "Scholer_2020.csv"))
head(dat_scholer2020)
# logit_survival -> yi, 1/se -> vi^2
dat_scholer2020 <- dat_scholer2020 |>
mutate(vi = se^2)
dat_scholer2020$const <- 1 # add constant for spatial models
dat_scholer2020$effect_id <- seq_len(nrow(dat_scholer2020))
coords2 <- cbind(dat_scholer2020$long, dat_scholer2020$lat)
# project to a planar coordinate system and convert to kilometers
dat_sf2 <- st_as_sf(dat_scholer2020, coords = c("long", "lat"), crs = 4326)
dat_sf2_proj <- st_transform(dat_sf2, crs = 3857)
coords_m <- st_coordinates(dat_sf2_proj)
dat_scholer2020$x_km <- coords_m[,1] / 1000
dat_scholer2020$y_km <- coords_m[,2] / 1000
coords_km <- cbind(dat_scholer2020$x_km, dat_scholer2020$y_km)
dist_matrix_euclid2 <- as.matrix(dist(coords_km))
# row and column names
rownames(dist_matrix_euclid2) <- dat_scholer2020$effect_id
colnames(dist_matrix_euclid2) <- dat_scholer2020$effect_id
dat_scholer2020 <- dat_scholer2020 |>
group_by(lat, long) |>
mutate(site_id = cur_group_id()) |>
ungroup()
names(dat_scholer2020)
dat_scholer2020 <- dat_scholer2020 |>
mutate(site_id = as.factor(site_id),
effect_id = as.factor(effect_id))
nrow(dat_scholer2020)
# [1] 949
cat_vars <- dat_scholer2020 |>
dplyr::select(where( ~ is.factor(.x) || is.character(.x)))
n_levels <- cat_vars |>
dplyr::summarise(across(
everything(),
~ dplyr::n_distinct(.x)
)) |>
tidyr::pivot_longer(
cols = everything(),
names_to = "variable",
values_to = "n_levels"
)
n_levels
# A tibble: 15 × 2
# variable n_levels
# <chr> <int>
# 1 ref 205
# 2 order 15
# 3 species 636
# 4 common 636
# 5 tip_label 625
# 6 migration 2
# 7 sitename 207
# 8 country 46
# 9 hemisphere 2
# 10 passerine 2
# 11 realm 2
# 12 study_type 4
# 13 auk_used 2
# 14 geomean_used 9
# 15 effect_id 949
# 16 site_id 454It looks multiple sampling locations and effect sizes per studies…
site_coords <- dat_scholer2020 |>
distinct(site_id, x_km, y_km) |>
arrange(site_id)
coords_site <- as.matrix(site_coords[, c("x_km", "y_km")])
dist_site <- as.matrix(dist(coords_site))
rownames(dist_site) <- as.character(site_coords$site_id)
colnames(dist_site) <- as.character(site_coords$site_id)
dist_site
dat_scholer2020$site_id <- as.character(dat_scholer2020$site_id)We initially included site_id as a random effect, but the model yielded a zero estimate for it. As a result, we removed it from the following models.
metafor
sp_exp_metafor_eg2 <- rma.mv(
yi = logit_survival,
V = vi,
random = list(
~ 1 | effect_id,
~ 1 | ref,
~ effect_id | const
),
struct = "SPEXP",
data = dat_scholer2020,
dist = list(site = dist_site),
method = "REML",
test = "t",
sparse = TRUE,
verbose = TRUE
)
summary(sp_exp_metafor_eg2)
# Multivariate Meta-Analysis Model (k = 949; method: REML)
#
# Variance Components:
#
# estim sqrt nlvls fixed factor
# sigma^2.1 0.2255 0.4749 949 no effect_id
# sigma^2.2 0.4471 0.6687 205 no ref
#
# outer factor: const (nlvls = 1)
# inner term: ~site_id (nlvls = 454)
#
# estim sqrt fixed
# tau^2 0.0207 0.1438 no
# rho 737.8164 no
#
# Test for Heterogeneity:
# Q(df = 948) = 974007.4599, p-val < .0001
#
# Model Results:
#
# estimate se tval df pval ci.lb ci.ub
# 0.6540 0.0614 10.6455 948 <.0001 0.5334 0.7746 ***
confint(ma_sp2_exp)
# estimate ci.lb ci.ub
# sigma^2.1 0.2255 0.2025 0.2519
# sigma.1 0.4749 0.4500 0.5019
#
# estimate ci.lb ci.ub
# sigma^2.2 0.4471 0.3308 0.5947
# sigma.2 0.6687 0.5751 0.7712
#
# estimate ci.lb ci.ub
# tau^2 0.0207 0.0000 >10.0000
# tau 0.1438 0.0000 >3.1623
#
# estimate ci.lb ci.ub
# rho 737.8164 <73.7816 >7378.1637 brms
dat_scholer2020$ref_id <- as.integer(factor(dat_scholer2020$ref))
dat_scholer2020$effect_id <- as.character(dat_scholer2020$effect_id)
vcv <- diag(dat_scholer2020$vi)
rownames(vcv) <- colnames(vcv) <- dat_scholer2020$effect_id
fit_gp_brms <- bf(logit_survival ~ 1 +
(1|ref_id) + # this is site-level random effect
(1|gr(effect_id, cov = vcv)) + # this is m (sampling error)
gp(x_km, y_km,
cov = "exponential",
scale = FALSE)
)
# generate default priors
prior <- default_prior(fit_gp_brms,
data = dat_scholer2020,
data2 = list(vcv = vcv),
family = gaussian())
prior$prior[5] = "constant(1)" # meta-analysis assumes sampling variance is known so fixing this to 1
prior
sp_exp_brms_eg2 <- brm(
formula = fit_gp_brms,
data = dat_scholer2020,
data2 = list(vcv = vcv),
prior = prior,
iter = 6000,
warmup = 3000,
chains = num_chains,
backend = "cmdstanr",
threads = threading(threads_per_chain),
control = list(adapt_delta = 0.95, max_treedepth = 15)
)
summary(sp_exp_brms_eg2)
# Family: gaussian
# Links: mu = identity
# Formula: logit_survival ~ 1 + (1 | ref_id) + (1 | gr(effect_id, cov = vcv)) + gp(x_km, y_km, cov = "exponential", scale = FALSE)
# Data: dat_scholer2020 (Number of observations: 949)
# Draws: 2 chains, each with iter = 6000; warmup = 3000; thin = 1;
# total post-warmup draws = 6000
#
# Gaussian Process Hyperparameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sdgp(gpx_kmy_km) 0.18 0.09 0.02 0.39 1.00 588 998
# lscale(gpx_kmy_km) 4062.02 11170.84 495.01 21405.11 1.00 1562 2126
#
# Multilevel Hyperparameters:
# ~effect_id (Number of levels: 949)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 1.00 0.00 1.00 1.00 NA NA NA
#
# ~ref_id (Number of levels: 205)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.67 0.05 0.57 0.77 1.00 1520 2637
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept 0.64 0.10 0.45 0.82 1.00 1854 2469
#
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma 0.48 0.01 0.45 0.50 1.00 3887 4592
#
# Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
# and Tail_ESS are effective sample size measures, and Rhat is the potential
# scale reduction factor on split chains (at convergence, Rhat = 1).glmmTMB
TBU
The metafor and brms implementations produced very similar substantive results. In both models, the estimated intercept was about 0.65, with broadly overlapping uncertainty intervals. Among-study heterogeneity was also highly consistent across packages: metafor estimated a study-level variance of 0.45, while brms estimated standard deviation as 0.68 (var = 0.462). At the effect-size level, metafor estimated a variance of 0.23. In the analogous brms model, the residual standard deviation (sigma) was 0.48, giving a closely comparable estimate of effect-size-level variation.
The spatial term was smaller in magnitude, with metafor estimating \(\tau^2\) and brms estimating sdgp at about 0.18 (sdgp^2 = 0.032). However, the spatial range parameter was very imprecisely estimated in both implementations, suggesting limited information about the exact distance scale of spatial autocorrelation.
This implies that average adult survival was positive on the logit scale, while also varying appreciably among studies and locations. The spatial component suggests that geographically proximate populations may show somewhat similar survival values, although the exact scale of that dependence is not estimated very precisely in this example.
metafor
sp_gau_metafor_eg2 <- rma.mv(
yi = logit_survival,
V = vi,
random = list(
~ 1 | effect_id,
~ 1 | ref,
~ site_id | const
),
struct = "SPGAU",
data = dat_scholer2020,
dist = list(site = dist_site),
method = "REML",
test = "t",
sparse = TRUE,
verbose = TRUE
)
summary(sp_gau_metafor_eg2)
# Multivariate Meta-Analysis Model (k = 949; method: REML)
#
# logLik Deviance AIC BIC AICc
# -824.5113 1649.0225 1659.0225 1683.2943 1659.0862
#
# Variance Components:
#
# estim sqrt nlvls fixed factor
# sigma^2.1 0.2297 0.4793 949 no effect_id
# sigma^2.2 0.4578 0.6766 205 no ref
#
# outer factor: const (nlvls = 1)
# inner term: ~site_id (nlvls = 454)
#
# estim sqrt fixed
# tau^2 0.0202 0.1420 no
# rho 4713.8765 no
#
# Test for Heterogeneity:
# Q(df = 948) = 974007.4599, p-val < .0001
#
# Model Results:
#
# estimate se tval df pval ci.lb ci.ub
# 0.6466 0.0830 7.7876 948 <.0001 0.4837 0.8096 ***
confint(sp_gau_metafor_eg2)
# estimate ci.lb ci.ub
# sigma^2.1 0.2297 0.2069 0.2559
# sigma.1 0.4793 0.4548 0.5059
#
# estimate ci.lb ci.ub
# sigma^2.2 0.4578 0.3435 0.6054
# sigma.2 0.6766 0.5861 0.7781
#
# estimate ci.lb ci.ub
# tau^2 0.0202 0.0000 >10.0000
# tau 0.1420 0.0000 >3.1623
#
# estimate ci.lb ci.ub
# rho 4713.8765 <471.3877 >47138.7654 brms
fit_gp_brms <- bf(logit_survival ~ 1 +
(1|ref) + # this is site-level random effect
(1|gr(effect_id, cov = vcv)) + # this is m (sampling error)
gp(x_km, y_km,
cov = "exp_quad",
scale = FALSE)
)
# generate default priors
prior <- default_prior(fit_gp_brms,
data = dat_scholer2020,
data2 = list(vcv = vcv),
family = gaussian())
prior$prior[5] = "constant(1)" # meta-analysis assumes sampling variance is known so fixing this to 1
prior
sp_gau_brms_eg2 <- brm(
formula = fit_gp_brms,
data = dat_scholer2020,
data2 = list(vcv = vcv),
prior = prior,
iter = 8000,
warmup = 4000,
chains = num_chains,
backend = "cmdstanr",
threads = threading(threads_per_chain),
control = list(adapt_delta = 0.95, max_treedepth = 15)
)
summary(sp_gau_brms_eg2)
# Family: gaussian
# Links: mu = identity
# Formula: logit_survival ~ 1 + (1 | ref) + (1 | gr(effect_id, cov = vcv)) + gp(x_km, y_km, cov = "exp_quad", scale = FALSE)
# Data: dat_scholer2020 (Number of observations: 949)
# Draws: 2 chains, each with iter = 8000; warmup = 4000; thin = 1;
# total post-warmup draws = 8000
#
# Gaussian Process Hyperparameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sdgp(gpx_kmy_km) 0.16 0.11 0.02 0.42 1.00 836 969
# lscale(gpx_kmy_km) 3465.74 13257.32 365.32 19298.85 1.00 242 525
#
# Multilevel Hyperparameters:
# ~effect_id (Number of levels: 949)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 1.00 0.00 1.00 1.00 NA NA NA
#
# ~ref (Number of levels: 205)
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept) 0.68 0.05 0.58 0.78 1.00 2069 3664
#
# Regression Coefficients:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept 0.65 0.10 0.47 0.82 1.00 2347 1774
#
# Further Distributional Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma 0.48 0.01 0.45 0.51 1.00 4263 5271
#
# Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
# and Tail_ESS are effective sample size measures, and Rhat is the potential
# scale reduction factor on split chains (at convergence, Rhat = 1).glmmTMB
TBU
The Gaussian-process kernel gave us the identical results to exponential kernel model except for rho and lscale. The metafor and brms implementations gave very similar substantive results. In both models, the estimated intercept was about 0.65, with broadly overlapping 95% CI and CrI. Variation among individual effect sizes was estimated at approximately 0.48 on the standard deviation scale, and between-study heterogeneity was also nearly identical across packages at about 0.68. The spatial term was smaller in magnitude, with metafor estimating and brms estimating the Gaussian process standard deviation at about 0.16 (sdgp^2 = 0.026). However, the distance-scale parameter of spatial correlation remained weakly identified, with wide uncertainty in both implementations.
TBU
Software and package versions
R version 4.5.2 (2025-10-31)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.7.4
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/Edmonton
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] phytools_2.5-2 maps_3.4.3 gtools_3.9.5
[4] sf_1.0-24 rotl_3.1.0 glmmTMB_1.1.14
[7] janitor_2.2.1 magrittr_2.0.4 orchaRd_2.1.3
[10] posterior_1.6.1 tidybayes_3.0.7 bayesplot_1.15.0
[13] MCMCglmm_2.36 coda_0.19-4.1 patchwork_1.3.2
[16] ape_5.8-1 here_1.0.2 crayon_1.5.3
[19] lubridate_1.9.4 forcats_1.0.1 stringr_1.6.0
[22] dplyr_1.1.4 purrr_1.2.0 readr_2.1.6
[25] tidyr_1.3.2 tibble_3.3.1 ggplot2_4.0.2
[28] tidyverse_2.0.0 metafor_4.8-0 numDeriv_2016.8-1.1
[31] metadat_1.4-0 Matrix_1.7-4 brms_2.23.0
[34] Rcpp_1.1.1
loaded via a namespace (and not attached):
[1] mathjaxr_2.0-0 RColorBrewer_1.1-3 tensorA_0.36.2.1
[4] rstudioapi_0.17.1 jsonlite_2.0.0 estimability_1.5.1
[7] farver_2.1.2 nloptr_2.2.1 rmarkdown_2.30
[10] vctrs_0.7.1 minqa_1.2.8 htmltools_0.5.9
[13] progress_1.2.3 distributional_0.5.0 curl_7.0.0
[16] DEoptim_2.2-8 KernSmooth_2.23-26 htmlwidgets_1.6.4
[19] sandwich_3.1-1 emmeans_2.0.1 zoo_1.8-15
[22] TMB_1.9.19 igraph_2.2.1 lifecycle_1.0.5
[25] iterators_1.0.14 pkgconfig_2.0.3 R6_2.6.1
[28] fastmap_1.2.0 rbibutils_2.4 snakecase_0.11.1
[31] digest_0.6.39 rprojroot_2.1.1 clusterGeneration_1.3.8
[34] timechange_0.3.0 httr_1.4.7 abind_1.4-8
[37] mgcv_1.9-3 compiler_4.5.2 proxy_0.4-29
[40] withr_3.0.2 doParallel_1.0.17 S7_0.2.1
[43] backports_1.5.0 optimParallel_1.0-2 DBI_1.2.3
[46] MASS_7.3-65 scatterplot3d_0.3-44 classInt_0.4-11
[49] corpcor_1.6.10 loo_2.9.0 tools_4.5.2
[52] units_1.0-0 rncl_0.8.8 otel_0.2.0
[55] rentrez_1.2.4 quadprog_1.5-8 glue_1.8.0
[58] nlme_3.1-168 grid_4.5.2 checkmate_2.3.3
[61] generics_0.1.4 gtable_0.3.6 tzdb_0.5.0
[64] class_7.3-23 hms_1.1.4 foreach_1.5.2
[67] pillar_1.11.1 ggdist_3.3.3 splines_4.5.2
[70] lattice_0.22-7 tidyselect_1.2.1 knitr_1.51
[73] reformulas_0.4.3.1 arrayhelpers_1.1-0 xfun_0.55
[76] expm_1.0-0 bridgesampling_1.2-1 matrixStats_1.5.0
[79] stringi_1.8.7 yaml_2.3.12 pacman_0.5.1
[82] boot_1.3-32 evaluate_1.0.5 codetools_0.2-20
[85] cli_3.6.5 RcppParallel_5.1.11-1 xtable_1.8-4
[88] Rdpack_2.6.4 svUnit_1.0.8 XML_3.99-0.20
[91] parallel_4.5.2 rstantools_2.5.0 prettyunits_1.2.0
[94] cubature_2.1.4-1 Brobdingnag_1.2-9 phangorn_2.12.1
[97] lme4_1.1-38 mvtnorm_1.3-3 scales_1.4.0
[100] e1071_1.7-17 combinat_0.0-8 rlang_1.1.7
[103] fastmatch_1.1-6 mnormt_2.1.1
References
- Jetz, W., G. H. Thomas, J. B. Joy, K. Hartmann, and A. O. Mooers. (2012). The global diversity of birds in space and time. Nature. 491:444-448. https://doi.org/10.1038/nature11631
- Grafen A. (1989). The phylogenetic regression. Philosophical Transactions of the Royal Society of London. B, Biological Sciences. 326:119-157. https://doi.org/10.1098/rstb.1989.0106
- Grau-Andrés, R., Moreira, B., & Pausas, J. G. (2024). Global plant responses to intensified fire regimes. Global Ecology and Biogeography. 33:e13858. https://doi.org/10.1111/geb.13858
- Grenié M, Berti E, Carvajal‐Quintero J, Dädlow GM, Sagouis A, Winter M. (2022). Harmonizing taxon names in biodiversity data: A review of tools, databases and best practices. Methods in Ecology and Evolution. 14:12-25. https://doi.org/10.1111/2041-210X.13802
- Lim, J. N., Senior, A. M., & Nakagawa, S. (2014). Heterogeneity in individual quality and reproductive trade-offs within species. Evolution. 68:2306–2318. https://doi.org/10.1111/evo.12446
- Nakagawa, S., Lagisz, M., O’Dea, R. E., Pottier, P., Rutkowska, J., Senior, A. M., Yang, Y., & Noble, D. W. A. (2023). orchaRd 2.0: An R package for visualising meta-analyses with orchard plots. Methods in Ecology and Evolution. 14:2003–2010. https://doi.org/10.1111/2041-210X.14152
- Scholer, M. N., Strimas‐Mackey, M., & Jankowski, J. E. (2020). A meta‐analysis of global avian survival across species and latitude. Ecology Letters. 23:1537-1549. https://doi.org/10.1111/ele.13573
- Viechtbauer, W., White, T., Noble, D., Senior, A., & Hamilton, W. K. (2025). metadat: Meta-analysis datasets (Version 1.5-2) R package. https://github.com/wviechtb/metadat
- Rios Moura, R., Oliveira Gonzaga, M., Silva Pinto, N., Vasconcellos-Neto, J., & Requena, G. S. (2021). Assortative mating in space and time: Patterns and biases. Ecology Letters. 24:1089–1102. https://doi.org/10.1111/ele.13690