Intended audience and scope

This online tutorial accompanies our tutorial paper: A unified framework for phylogenetic and spatial meta-analysis: concepts, implementation, and practical guidance [LINK]

This is intended for researchers who already have a basic understanding of meta-analysis and some familiarity with statistical modelling in R. It is aimed at readers who are interested in phylogenetic and spatial meta-analytic modelling, given the increasing availability of large-scale datasets that include species-level traits and geographic information.

In this online tutorial, we demonstrate how to conduct phylogenetic and spatial meta-analyses using R packages such as metafor, brms, and glmmTMB. We provide step-by-step instructions on creating phylogenetic and spatial correlation matrices, fitting models with different correlation structures, and interpreting the results.

Setup

rm(list = ls())  # remove all objects from the workspace to start from a clean session and avoid using stale variables.
gc()  # trigger garbage collection to free memory and reduce RAM usage after clearing the workspace.

# install.packages("pacman") # if not already installed
pacman::p_load(brms, metafor, metadat, tidyverse, crayon, here, ape, patchwork, dplyr, tidyr, MCMCglmm, bayesplot, tidybayes, posterior, orchaRd,purrr, stringr, readr, magrittr, janitor, glmmTMB, rotl, sf, gtools)

Set up parallel processing for brms - please adjust according to your computer’s capabilities.

max_cores <- 10
num_chains <- 2
threads_per_chain <- floor(max_cores / num_chains)
options(mc.cores = num_chains)

Phylogenetic meta-analysis

How to make phylogenetic correlation matrix?

First, we will show how to make a phylogenetic correlation matrix to account for the shared evolutionary history of species in your meta-analytic dataset. A phylogenetic correlation matrix reflects how closely related species are. Relations among a set of species (phylogenies) can be derived in two main ways:

Using a specific phylogenetic tree
Using the rotl package (accessing a synthetic tree from Open Tree of Life; https://opentreeoflife.github.io/)

In the following two sections, we introduce how to make the correlation matrix from phylogenetic trees in these two ways. When making a correlation matrix from a phylogenetic tree we need to assume an evolutionary model (process), which determines how branch lengths are transformed into expected covariances between species. The example below demonstrates how to compute a correlation matrix under the Brownian motion process.

Note on tree preparation and cleaning/pruning

The starting point usually is a list of species represented in your data set – ideally, it is a list of correct Latin binomial names, using the most recent and accepted taxonomic naming convention. Sometimes, the species names in your data may not match those in the tree you are using to create your phylogenetic correlation matrix. Data matching and cleaning are not covered in this tutorial - please ensure that naming is consistent and fully matched before creating the phylogenetic correlation matrix. There are some helper R packages for resolving mismatched taxonomic names among datasets and phylogenetic trees (reviewed in Grenié et al. 2022). Meta-analytic models will not run if the lists of names in the data set and those in the correlation matrix do not match exactly.

Also, when subsetting a larger existing phylogenetic tree to match your list of species, it will be necessary to remove (drop) extra species from the tree that are not in your dataset (i.e. prune the tree). In that case, you can use TreeTools::DropTip() / ape::drop.tip() (orTreeTools::KeepTip() / ape::keep.tip()). You can see how to do this in the example below)

In some cases, the phylogenetic relationships among the species of interest are already well established, and a corresponding phylogenetic tree for a given taxon is available from a published source or a public repository. When this is the case, it is usually preferable to use the existing tree rather than constructing a new one. For example, large, time-calibrated phylogenies are available for many taxonomic groups (e.g. birds, mammals, plants), often accompanied by branch lengths and documentation of the underlying assumptions. Using such trees ensures consistency with previous studies and avoids unnecessary reconstruction steps

We now illustrate how to use an existing phylogenetic tree provided as a file and prepare it for use in downstream analyses, such as phylogenetic meta-analysis (of course, you can also apply what we explain to broader phylogenetic comparative analysis.)

Step 1. Read the tree from a file

Phylogenetic trees are commonly distributed in formats such as Newick (.nwk, .tre) or Nexus (.nex). The ape package can read most of these standard-format files.

Here, we will use a sample dataset on Sylviidae (Sylviid warblers; a family of passerines). We have a Nexus file containing a phylogenetic tree for 294 species from the Sylvidae family. However, our meta-analytic dataset only includes 10 species (for didactic purposes). So, we first need to “prune” the tree to remove those species that are not in our data set.

library(ape)

#load ape library for reading and manipulating phylogenetic trees in R

dat <- read.csv(here("data", "Sylviidae_dat.csv"))
dat$Phylo 
#  [1] "Abroscopus_albogularis"      "Abroscopus_schisticeps"     
#  [3] "Abroscopus_superciliaris"    "Achaetops_pycnopygius"      
#  [5] "Acrocephalus_aedon"          "Acrocephalus_aequinoctialis"
#  [7] "Acrocephalus_agricola"       "Acrocephalus_arundinaceus"  
#  [9] "Acrocephalus_atyphus"        "Acrocephalus_australis"     

# Load phylogenetic tree – note that this file contains multiple alternative trees
trees <- read.nexus(here("data", "Sylviidae.nex"))

# select the first tree from a list of many alternative trees for Sylviidae
tree <- trees[[1]]

#plot(tree) #you can try to plot the tree, but its going to be quite big!

Step 2. Basic structural checks

Before using the tree in analyses, it is good practice to perform some basic checks to ensure that the tree structure is appropriate for the intended analysis (i.e. in this case, is the tree ultrametric and binary – see below).

# Check whether the tree is ultrametric (time-calibrated)
is.ultrametric(tree)
# [1] TRUE

# Check whether the tree is fully bifurcating
is.binary(tree)
# [1] TRUE

Step 3. Match tree tip labels to the dataset

The species names in the tree (tip labels) must match exactly the species names used in the dataset.

# species names used in the dataset
species_data <- unique(dat$Phylo)

length(tree$tip.label) # number if species names on the tree tips
# [1] 294
length(species_data) # number of species names in the dataset
# [1] 10

# check for mismatches between tree tip labels and dataset species names
setdiff(species_data, tree$tip.label) # species names from the dataset not found in the tree
setdiff(tree$tip.label, species_data) # species names from the tree not found in the dataset

Given that we only have 10 species in the dataset and 294 species in the tree, most species from tree will not be present in the dataset. At this point, it is also very important to look at the other mismatch result, which checks whether all of the species in our dataset can be matched to the tree. If mismatches occur, we need to consider whether it is a truly missing species, typographical errors, outdated taxonomy, hybrid species which do not fit on binomial tree, or differences in naming conventions (e.g. using underscores vs spaces between name segments, or using subspecies names). These issues must be resolved manually before proceeding. In our example, we have a dataset with species names that match the tree, si we do not need to deal with such issues.

Step 4. Prune the tree to include only species in the dataset

Phylogenetic trees often contain more species than required for a given analysis (as in our example above). Extraneous tips should be removed to match the dataset.

tree_pruned  <- drop.tip(tree,setdiff(tree$tip.label, species_data))

lwngth(tree_pruned$tip.label) # check how many species left on the pruned tree
plot(tree_pruned) # check the pruned tree

# check the binary and ultrametric status again
is.binary(tree_pruned)
# [1] TRUE

is.ultrametric(tree_pruned)
# [1] TRUE

Step 5. Compute the phylogenetic correlation matrix

Once the tree is time-calibrated (= ultrametric), fully bifurcating (= binary), and pruned to match the dataset, you can compute the phylogenetic correlation matrix under a Brownian motion model using the vcv() function from the ape package.

A <-  vcv(tree_pruned, corr = TRUE)
# check the dimensions and preview the top-left block
dim(A)

This matrix represents the expected pirwise correlation among species due to shared evolutionary history and can be used directly in phylogenetic comparative or meta-analytic models.

In practice, you will not always have a phylogenetic tree readily available for the species in your dataset. In such cases, the rotl package can be used to access the Open Tree of Life and retrieve a tree based on the Latin species names in your dataset.

Note

Make sure that the species names in your dataset are in Latin (scientific names) and are spelled correctly. The rotl package relies on these names to find the corresponding species in the Open Tree of Life (https://opentreeoflife.github.io/use).
The OpenTree synthetic tree and name resolution services are actively updated. Running the same code at different times may yield slightly different trees due to updates in the database.
For reproducible research, it is good practice to save the retrieved tree to a file (e.g., in Newick or Nexus format) after obtaining it using rotl. This way, you can ensure that you are using the same tree in future analyses.
Automated name matching is convenient but may not always be perfect. It is advisable to manually check the matched names to ensure accuracy.

# install and load necessary packages - if not already installed
# install.packages("rotl")
# install.packages("ape")

library(rotl)
library(ape)

# record info about OpenTree version
tol_about()
# OpenTree Synthetic Tree of Life.

# Tree version: opentree16.1
# Taxonomy version: 3.7draft3
# Constructed on: 2025-12-20 00:55:58
# Number of terminal taxa: 2385875
# Number of source trees: 2064
# Number of source studies: 1931
# Source list present: false
# Root taxon: cellular organisms
# Root ott_id: 93302
# Root node_id: ott93302

Step 1. Provide a species list

Here, we use 10 commonly used lab model taxa across different animal groups as an example. Use Latin names, not common names.

# make species list
myspecies <- c(
  "Escherichia colli",                # typo on purpose
  "Chlamydomonas reinhardtii",
  "Drosophila melanogaster",
  "Arabidopsis thaliana",
  "Rattus norvegicus",
  "Mus musculus",
  "Cavia porcellus",
  "Xenopus laevis",
  "Saccharomyces cervisae",           # typo on purpose
  "Danio rerio"
)

Step 2. Resolve species names using OpenTree TNRS

We start with strict matching to identify exact matches and flag names that may require manual attention. We then use approximate matching only as a diagnostic step for unresolved names, because fuzzy matches can help detect likely typos or synonyms but should always be checked before being accepted.

# strict matching first:
taxa_strict <- tnrs_match_names(
  names = myspecies, 
  do_approximate_matching = FALSE # set FALSE
  )
#   Warning message:
# Escherichia colli, Saccharomyces cervisae are not matched 

taxa_strict
#                search_string               unique_name approximate_match score
# 1          escherichia colli                      <NA>                NA    NA
# 2  chlamydomonas reinhardtii Chlamydomonas reinhardtii             FALSE     1
# 3    drosophila melanogaster   Drosophila melanogaster             FALSE     1
# 4       arabidopsis thaliana      Arabidopsis thaliana             FALSE     1
# 5          rattus norvegicus         Rattus norvegicus             FALSE     1
# 6               mus musculus              Mus musculus             FALSE     1
# 7            cavia porcellus           Cavia porcellus             FALSE     1
# 8             xenopus laevis            Xenopus laevis             FALSE     1
# 9     saccharomyces cervisae                      <NA>                NA    NA
# 10               danio rerio               Danio rerio             FALSE     1
#     ott_id is_synonym flags number_matches
# 1       NA         NA  <NA>             NA
# 2    33153      FALSE                    1
# 3   505714      FALSE                    1
# 4   309263      FALSE                    1
# 5   271555      FALSE                    1
# 6   542509      FALSE                    1
# 7   744000      FALSE                    1
# 8   465096      FALSE                    1
# 9       NA         NA  <NA>             NA
# 10 1005914      FALSE                    1


# fuzzy matching for unresolved names:
taxa <- tnrs_match_names(
  names = myspecies,
  do_approximate_matching = TRUE # set TRUE
)

taxa
#                search_string               unique_name approximate_match
# 1          escherichia colli          Escherichia coli              TRUE
# 2  chlamydomonas reinhardtii Chlamydomonas reinhardtii             FALSE
# 3    drosophila melanogaster   Drosophila melanogaster             FALSE
# 4       arabidopsis thaliana      Arabidopsis thaliana             FALSE
# 5          rattus norvegicus         Rattus norvegicus             FALSE
# 6               mus musculus              Mus musculus             FALSE
# 7            cavia porcellus           Cavia porcellus             FALSE
# 8             xenopus laevis            Xenopus laevis             FALSE
# 9     saccharomyces cervisae  Saccharomyces cerevisiae              TRUE
# 10               danio rerio               Danio rerio             FALSE
#        score  ott_id is_synonym          flags number_matches
# 1  0.9375000  474506      FALSE sibling_higher              2
# 2  1.0000000   33153      FALSE                             1
# 3  1.0000000  505714      FALSE                             1
# 4  1.0000000  309263      FALSE                             1
# 5  1.0000000  271555      FALSE                             1
# 6  1.0000000  542509      FALSE                             1
# 7  1.0000000  744000      FALSE                             1
# 8  1.0000000  465096      FALSE                             1
# 9  0.9166667  356221      FALSE                             2
# 10 1.0000000 1005914      FALSE                             1

In this example, the strict match flags two misspelled names, and approximate matching helps identify their likely intended taxa. In practice, once these likely matches have been identified, it is better to correct the original species names and rerun strict matching than to proceed with fuzzy matches in the final workflow.

If the strict match fails or returns NAs, you can enable approximate matching. Approximate matching uses fuzzy string matching to find the closest matches in the OpenTree taxonomy, but you should always inspect the returned matches carefully.

Key columns to check include

unique_name: the matched OpenTree taxon name
ott_id: the OpenTree taxonomy identifier used for tree retrieval
approximate_match: TRUE indicates a non-exact match, for example due to a typo
is_synonym: TRUE indicates the input name was treated as a synonym

A useful rule of thumb is as follows:

if approximate_match == TRUE and the reason is an obvious typo, correct the original species list and rerun the name matching. This ensures that the final dataset contains clean and unambiguous names.
If a match looks suspicious, for example because the matched taxon belongs to a different genus or the name could plausibly refer to multiple taxa, do not accept the match blindly. Instead, inspect the additional information returned by tnrs_match_names() to confirm that the matched taxon is indeed the one you intended to include.

Step 3. Fix typos and re-run matching

myspecies_fixed <- c(
          "Escherichia coli",
          "Chlamydomonas reinhardtii",
          "Drosophila melanogaster",
          "Arabidopsis thaliana",
          "Rattus norvegicus",
          "Mus musculus",
          "Cavia porcellus",
          "Xenopus laevis",
          "Saccharomyces cerevisiae",
          "Danio rerio"
           )
  
taxa_fixed <- tnrs_match_names(
                    names = myspecies_fixed,
                    do_approximate_matching = FALSE
                     )

# taxa_fixed
#                search_string               unique_name approximate_match score
# 1           escherichia coli          Escherichia coli             FALSE     1
# 2  chlamydomonas reinhardtii Chlamydomonas reinhardtii             FALSE     1
# 3    drosophila melanogaster   Drosophila melanogaster             FALSE     1
# 4       arabidopsis thaliana      Arabidopsis thaliana             FALSE     1
# 5          rattus norvegicus         Rattus norvegicus             FALSE     1
# 6               mus musculus              Mus musculus             FALSE     1
# 7            cavia porcellus           Cavia porcellus             FALSE     1
# 8             xenopus laevis            Xenopus laevis             FALSE     1
# 9   saccharomyces cerevisiae  Saccharomyces cerevisiae             FALSE     1
# 10               danio rerio               Danio rerio             FALSE     1
#     ott_id is_synonym          flags number_matches
# 1   474506      FALSE sibling_higher              1
# 2    33153      FALSE                             1
# 3   505714      FALSE                             1
# 4   309263      FALSE                             1
# 5   271555      FALSE                             1
# 6   542509      FALSE                             1
# 7   744000      FALSE                             1
# 8   465096      FALSE                             1
# 9   356221      FALSE                             1
# 10 1005914      FALSE                             1

At this point, you should confirm that: - All species have a valid ott_id. - the returned matches correspond to the intended taxa.

Step 4. Retrieve the phylogenetic tree from OpenTree

Once you are satisfied with the matched OpenTree identifiers, you can request a trimmed subtree from the OpenTree synthetic tree using tol_induced_subtree(). At this stage, the OpenTree subtree provides a convenient phylogenetic topology for the matched taxa. However, before using it in comparative or meta-analytic models, you should still check whether the tip labels are appropriate, whether branch lengths are available, and whether the tree satisfies assumptions such as bifurcation and ultrametricity.

tree <- tol_induced_subtree(
            ott_ids = taxa_fixed[["ott_id"]],
            label_format = "name"
            ) 
# returns a trimmed subtree as an ape::phylo object
# warnings about collapsing single nodes are common and usually not problematic here


Progress [---------------------------------] 0/304 (  0%) ?s
Progress [==============================] 304/304 (100%)  0s

Step 5. Clean and standardise tip labels

The tip labels returned by OpenTree are not always directly ready for downstream analyses. They may contain underscores, appended identifiers, or labels corresponding to internal nodes rather than named terminal taxa. The goal of this step is therefore to standardise the labels for readability and to check that each tip can be matched unambiguously to the intended species in the dataset.

In our example, one tip is labelled "mrcaott616ott617" rather than Escherichia coli. This is a non-taxon internal node within Bacteria, which can arise when only one representative from a large clade is included. Since there are no other bacteria in the dataset and the tree position is appropriate for our purposes, we relabel this tip as Escherichia coli for didactic purposes.

tree$tip.label # see the current tree tip labels
#  [1] "Arabidopsis_thaliana"      "Chlamydomonas_reinhardtii"
#  [3] "Mus_musculus"              "Rattus_norvegicus"        
#  [5] "Cavia_porcellus"           "Xenopus_laevis"           
#  [7] "Danio_rerio"               "Drosophila_melanogaster"  
#  [9] "Saccharomyces_cerevisiae"  "mrcaott616ott617"   

# replace underscores with spaces for readability
tree$tip.label <- gsub("_", " ", tree$tip.label)

# inspect the lineage of the E. coli OTT identifier
ecoli <- tol_node_info(ott_id = 474506, include_lineage = TRUE)
tax_lineage(ecoli) # domain within Bacteria
#      rank               name        unique_name ott_id
# 1  domain           Bacteria           Bacteria 844192
# 2 no rank cellular organisms cellular organisms  93302

# we can see what info is available for this genus - not much, looks like a messy bit of a tree flagged as "INCONSISTENT"
# tnrs_match_names(names = ("Escherichia")) 

# relabel the MRCA-like tip for plotting and matching
tree$tip.label <- gsub("mrcaott616ott617", "Escherichia coli", tree$tip.label)

We can now plot the cleaned tree for inspection. Note that branch lengths are not yet included, so at this stage the tree is mainly useful for checking topology and tip labels.

Step 6. Final checks (binary tree and matching tip labels)

Before using the tree in downstream analyses, it is good practice to confirm that:

The tree is fully binary, if that is required for your method.
The tip labels match exactly the species list in your dataset.

# check whether the tree is binary
is.binary(tree) # should return TRUE
# [1] TRUE

# check exact matching between the species list and tree tip labels
intersect(as.character(tree$tip.label), myspecies_fixed)
#  [1] "Arabidopsis thaliana"      "Chlamydomonas reinhardtii"
#  [3] "Mus musculus"              "Rattus norvegicus"        
#  [5] "Cavia porcellus"           "Xenopus laevis"           
#  [7] "Danio rerio"               "Drosophila melanogaster"  
#  [9] "Saccharomyces cerevisiae"  "Escherichia coli"  

setdiff(myspecies_fixed, as.character(tree$tip.label))
# character(0)
setdiff(as.character(tree$tip.label), myspecies_fixed)
# character(0)

Step 7. Compute the phylogenetic correlation matrix (handling missing branch lengths)

At this point, the tree topology may be usable for plotting or simple inspection, but it is not yet guaranteed to be suitable for phylogenetic covariance calculations. In particular, vcv() requires branch lengths, and many comparative models additionally assume a fully bifurcating and often ultrametric tree. The Open Tree of Life synthetic tree often provides a well-resolved topology, but it does not always include branch lengths.

To compute a phylogenetic variance-covariance (VCV) or correlation matrix under a Brownian motion (BM) process, we use vcv() from the ape package. If the tree has no branch lengths, vcv() will fail with:

A <- vcv(tree, corr = TRUE)
# Error in vcv.phylo(tree, corr = TRUE) : the tree has no branch lengths

For didactic purposes in this tutorial, we assign equal branch lengths to all edges (that is, all branch lengths are set to 1). This allows us to demonstrate the workflow and obtain a valid correlation matrix. In real analyses, you should consider using a tree with biologically meaningful branch lengths (for example, a time-calibrated phylogeny).

# check whether the OpenTree synthetic tree includes branch lengths
is.null(tree$edge.length)
# [1] TRUE # TRUE indicates that branch lengths are missing

# if branch lengths are missing, add equal branch lengths for didactic purposes
if (is.null(tree$edge.length)) {
  tree$edge.length <- rep(1, nrow(tree$edge))
    }

# compute the phylogenetic correlation matrix under Brownian motion
A <- vcv(tree, corr = TRUE)

# check the dimensions and preview the top-left block
dim(A)        # should return the number of species (rows/columns)
# [1] 10 10

A[1:5, 1:5]
#                           Arabidopsis_thaliana Chlamydomonas_reinhardtii
# Arabidopsis_thaliana                 1.0000000                 0.6666667
# Chlamydomonas_reinhardtii            0.6666667                 1.0000000
# Mus_musculus                         0.2041241                 0.2041241
# Rattus_norvegicus                    0.2041241                 0.2041241
# Cavia_porcellus                      0.2182179                 0.2182179
#                           Mus_musculus Rattus_norvegicus Cavia_porcellus
# Arabidopsis_thaliana         0.2041241         0.2041241       0.2182179
# Chlamydomonas_reinhardtii    0.2041241         0.2041241       0.2182179
# Mus_musculus                 1.0000000         0.8750000       0.8017837
# Rattus_norvegicus            0.8750000         1.0000000       0.8017837
# Cavia_porcellus              0.8017837         0.8017837       1.0000000

The subtree returned by OpenTree may be sufficient for demonstration purposes, but many phylogenetic models require additional assumptions about branching structure and branch lengths. We therefore illustrate how to diagnose and handle two common issues: polytomies and non-ultrametric trees.

A polytomy occurs when an internal node has more than two descendant branches (a multifurcation).

A non-ultrametric tree is a tree in which not all tips are equidistant from the root, meaning that the tree is not time-calibrated.

Many phylogenetic comparative and meta-analytic models assume a fully bifurcating and ultrametric tree, because the expected covariance among species is defined under a continuous-time branching process (for example Brownian motion or Ornstein–Uhlenbeck evolution). Here we illustrate what these issues mean, and how they can be handled, using a simple toy example.

Create a toy tree with a polytomy

library(ape)
library(phytools)

# A simple tree with one polytomy (A, B, C diverge simultaneously)
tree_poly <- read.tree(text = "((A,B,C),D,E);")
plot(tree_poly)

is.binary(tree_poly)
# [1] FALSE

Here, species A, B, and C form a polytomy because their common ancestor has three descendants instead of two…

Under a Brownian motion or OU model, the phylogenetic covariance matrix is derived from branch lengths along a fully specified bifurcating tree.
When a node is a polytomy, the relative order of divergence among its descendants is undefined, which means that their shared evolutionary history cannot be uniquely translated into a covariance structure.

Resolving polytomies

multi2di() resolves a polytomy by inserting very short branches, creating an arbitrary but fully bifurcating tree. This corresponds to assuming that the true branching order is unknown but occurred over a very short evolutionary time.

set.seed(1)

tree_bin <- multi2di(tree_poly, random = TRUE)

is.binary(tree_bin)
# [1] TRUE

tree_bin <- multi2di(tree_poly, random = TRUE)
plot(tree_bin)

Because the resolution is random, different runs of multi2di() will yield different bifurcating trees. If you want reproducible results, set a random seed before calling multi2di().

set.seed(15)
tree_bin1 <- multi2di(tree_poly, random = TRUE)
set.seed(50)
tree_bin2 <- multi2di(tree_poly, random = TRUE)

Create a non-ultrametric toy tree

tree_nonultra <- read.tree(text = "((A:2,B:2):3,(C:1,D:4):2,E:5);")
is.ultrametric(tree_nonultra)
# [1] FALSE

As you can see, the tips do not all end at the same distance from the root, so the tree is not ultrametric. This means that branch lengths cannot be interpreted directly as shared evolutionary time, which is required for Brownian motion or Ornstein–Uhlenbeck based phylogenetic covariance models.

chronos() estimates branch lengths under a relaxed molecular clock, producing a tree in which all tips represent the same present time.
This is required when phylogenetic covariance is interpreted as shared evolutionary time.

tree_ultra1 <- chronos(tree_nonultra)


Setting initial dates...
Fitting in progress... get a first set of estimates
         (Penalised) log-lik = -17.41843 
Optimising rates... dates... -17.41843 
Optimising rates... dates... -17.4165 

log-Lik = -14.93714 
PHIIC = 49.73

is.ultrametric(tree_ultra1)
# [1] TRUE


Setting initial dates...
Fitting in progress... get a first set of estimates
         (Penalised) log-lik = -17.41843 
Optimising rates... dates... -17.41843 
Optimising rates... dates... -17.4165 

log-Lik = -14.93714 
PHIIC = 49.73

Or, you can use force.ultrametric() from the phytools package to coerce a tree into an ultrametric form by extending branch lengths.

tree_ultra2 <- force.ultrametric(tree_nonultra, method = "extend")
is.ultrametric(tree_ultra2)

***************************************************************
*                          Note:                              *
*    force.ultrametric does not include a formal method to    *
*    ultrametricize a tree & should only be used to coerce    *
*   a phylogeny that fails is.ultrametric due to rounding --  *
*    not as a substitute for formal rate-smoothing methods.   *
***************************************************************

In practice, the typical workflow is:

Check whether the tree is binary and ultrametric.
If polytomies are present, resolve them using multi2di(), ideally multiple times.
If the tree is not ultrametric, apply a dating method such as chronos() or use a time-calibrated tree from the literature.
Only then compute the phylogenetic covariance or correlation matrix using vcv().

This ensures that the phylogenetic random effect used in meta-analysis corresponds to a well-defined evolutionary model.

Examples from real meta-analyses

1. Moura et al. (2021)

Here we use the dataset from Moura et al. (2021), which collated 1,828 effect sizes from 457 studies and 341 animal species to investigate assortative mating patterns. Each effect size is a Fisher’s Z transformed correlation coefficient for body size correlations paired couples for within a single population and sampling period; each correlation is species-specific, and there are multiple correlations per species. The effect size indicates how strongly mates resemble each other in body size across a very broad taxonomic and ecological range. The dataset and phylogenetic tree are available in the metadat package as dat.moura2021.

Before fitting hierarchical meta-analytic models, it is important to assess whether key categorical predictor variables have sufficient replication to support the estimation of random effects (or fixed effects). In particular, the identifiability of variance components depends on having an adequate number of levels for each random factor, as well as repeated observations within those levels.

So, we begin by summarising the number of unique levels for all categorical variables in the dataset, with particular attention to variables commonly used as random effects in meta-analysis, such as study identity, species identity, and effect-size identity.

# read data from metadat package
dat_moura2021 <- dat.moura2021$dat

# duplicate the column with species name - additional column is needed for adding phylogenetic correlation matrix and non-phylogenetic random effect separately.
dat_moura2021$species.id.phy <- dat_moura2021$species.id

# add a column to be used as an ID of effect sizes to specify random errors in the meta-analytic model
dat_moura2021$effect.size.id <- 1:nrow(dat_moura2021)

# convert ID of effect sizes to a factor (categorical) variable
dat_moura2021$effect.size.id <- as.factor(dat_moura2021$effect.size.id)

## check data structure
cat_vars <- dat_moura2021 |>
  dplyr::select(where( ~ is.factor(.x) || is.character(.x)))

n_levels <- cat_vars |> 
  dplyr::summarise(across(
    everything(),
    ~ dplyr::n_distinct(.x)
  )) |> 
  tidyr::pivot_longer(
    cols = everything(),
    names_to = "variable",
    values_to = "n_levels"
  )

n_levels # shows how many levels each categorical variable has
# # A tibble: 13 × 2
#    variable          n_levels
#    <chr>                <int>
#  1 study.id               457
#  2 effect.size.id        1828
#  3 species                345
#  4 species.id             341
#  5 subphylum               11
#  6 phylum                   3
#  7 assortment.trait       233
#  8 trait.dimensions         5
#  9 field.collection         2
# 10 pooled.data              2
# 11 spatially.pooled         2
# 12 temporally.pooled        2
# 13 species.id.phy         341

The dataset contains 1,828 effect sizes (effect.size.id) drawn from 457 independent studies (study.id) and 341 animal species (species.id.phy). This provides replication at both the study and species levels, which is generally important for estimating random-effects variance components. There is, however, no universal rule for what constitutes sufficient replication, because this depends on the number of levels, the distribution of observations across levels, and the magnitudes of the variance components. As a general guide, variance components are more likely to be estimable when each random effect has multiple levels and when at least some levels contribute repeated observations. In the present dataset, the relatively large number of species particularly supports the inclusion of species-level random effects, including both non-phylogenetic (species.id) and phylogenetic (species.id.phy) terms.

Let’s move on to compute the phylogenetic correlation matrix. A matching tree is supplied as a data object in the metadat package, so we just load it and can compute the matrix without additional checking and processing

# read tree from metadat package
tree <- dat.moura2021$tree

# calculate r-to-z transformed correlations and corresponding sampling variances
dat_moura2021 <- escalc(measure = "ZCOR", ri = ri, ni = ni, data = dat_moura2021)

# turn tree into an ultrametric onevusing the default "Grafen" method (Grafen 1989)
tree <- compute.brlen(tree)

# compute phylogenetic correlation matrix
A <- vcv(tree, corr = TRUE)

Brownian motion (meta-analysis)

The Brownian Motion (BM) model assumes that traits evolve according to a random walk process along the branches of the phylogenetic tree. BM is default model for continuous trait evolution in comparative methods. In the model, the expected variance-covariance structure among species is directly proportional to their shared evolutionary history, as represented by the phylogenetic tree.

phylo_eg1_meta_ma_BM <- rma.mv(yi, vi,
                 random = list(
                     ~ 1 | study.id, # among-study variation 
                     ~ 1 | effect.size.id,　# additional effect-size-level variation
                     ~ 1 | species.id, # species-specific, non-phylogenetic variation
                     ~ 1 | species.id.phy), # species-specific, phylogenetic variation
                     R = list(species.id.phy = A), # Brownian motion phylogenetic correlation matrix
                 data = dat_moura2021,
                 verbose = TRUE,
                 sparse = TRUE,
                 method = "REML"
                       )

summary(phylo_eg1_meta_ma_BM)
# Multivariate Meta-Analysis Model (k = 1828; method: REML)
# 
#    logLik   Deviance        AIC        BIC       AICc   
# -167.6726   335.3453   345.3453   372.8974   345.3782   
# 
# Variance Components:
# 
#             estim    sqrt  nlvls  fixed          factor    R 
# sigma^2.1  0.0192  0.1384    457     no        study.id   no 
# sigma^2.2  0.0145  0.1202   1828     no  effect.size.id   no 
# sigma^2.3  0.0557  0.2359    341     no      species.id   no 
# sigma^2.4  0.0512  0.2263    341     no  species.id.phy  yes 
# 
# Test for Heterogeneity:
# Q(df = 1827) = 10743.8076, p-val < .0001
# 
# Model Results:
# 
# estimate      se    zval    pval   ci.lb   ci.ub     
#   0.3682  0.1300  2.8311  0.0046  0.1133  0.6230  ** 

confint(phylo_eg1_meta_ma_BM)
#           estimate  ci.lb  ci.ub 
# sigma^2.1   0.0192 0.0108 0.0325 
# sigma.1     0.1384 0.1038 0.1802 

#           estimate  ci.lb  ci.ub 
# sigma^2.2   0.0145 0.0121 0.0172 
# sigma.2     0.1202 0.1099 0.1311 

#           estimate  ci.lb  ci.ub 
# sigma^2.3   0.0557 0.0334 0.0788 
# sigma.3     0.2359 0.1827 0.2807 

#           estimate  ci.lb  ci.ub 
# sigma^2.4   0.0512 0.0179 0.1792 
# sigma.4     0.2263 0.1336 0.4233 

# calculate I2 estimate
orchaRd::i2_ml(phylo_eg1_meta_ma_BM)
# I2_Total       I2_study.id I2_effect.size.id     I2_species.id I2_species.id.phy 
# 97.30527          13.26902          10.00808          38.55098          35.47718 

# estimate phylogenetic heritability
phylo_heritability <- phylo_eg1_meta_ma_BM$sigma2[4] / (phylo_eg1_meta_ma_BM$sigma2[4] + phylo_eg1_meta_ma_BM$sigma2[3])
phylo_heritability
# [1] 0.479239

The overall estimate \(\beta_0\) is 0.368 (95% CI 0.113, 0.623) on the Fisher’s Z scale. Back transforming to the Pearson’s r, this corresponds to a correlation of roughly 0.35 - which indicates a positive correlation between mates in body size across species. On average, larger males tend to pair with larger females across species.

The \(I^2\) (\(\approx\) 97.31%) indicate that there is high heterogeneity among studies, effect sizes, species, and phylogenetic relatedness. About 35.48% of the total variance can be attributed to phylogenetic relatedness among species, indicating a moderate phylogenetic signal in mate size correlation. Biologically, this means that different species tend to have different average levels of mate size correlation, and closely related species tend to have more similar levels of mate size correlation than distantly related species.

By partitioning species level variance into phylogenetic and non-phylogenetic components, we can estimate the phylogenetic heritability (or Pagel’s \(\lambda\)) of mate size correlation. This implies that about 48 percent of the differences among species in assortative mating strength are explained by phylogenetic relatedness.

In other words, closely related species tend to have similar levels of size assortative mating, but nearly half of the variation is species specific and not explained by shared ancestry.

We show two ways of specifying sampling error in brms:

Using se(sqrt(vi)) to supply known standard errors.
Representing sampling error explicitly via a group level term with a known variance covariance structure. In this formulation, the term (1 | gr(effect.size.id, cov = vcv)) defines a random effect whose covariance matrix is fixed to the diagonal matrix of sampling variances for lnRR. This is mathematically equivalent to providing a sampling variance matrix to metafor::rma.mv(), but implemented within brms.

Because sampling variances are treated as known in meta analysis, the standard deviation of this group level term is fixed to 1. Consequently, the magnitude of sampling error is fully determined by the supplied variance covariance matrix and is not estimated from the data.

The second approach is less common, but it can be much faster than usual (see more detailed information: link).

If you are not familiar with brms, we highly recommend checking out the instruction of brms first.

1. Using se(sqrt(vi))

In this specification, sampling error is modelled by supplying known standard errors via se() in the model formula. The expression yi|se(sqrt(vi)) indicates that the response variable yi has known standard errors equal to the square root of the sampling variances vi. Under this formulation, brms treats the observed effect sizes as noisy measurements of latent true effects. The remaining terms in the model specify sources of between study and between species variation in these latent true effects. Specifically, (1 | study.id) and (1 | effect.size.id) model study level and effect size level heterogeneity, (1 | species.id) and (1 | gr(species.id.phy, cov = A)) model non phylogenetic and phylogenetic species level variation, respectively.

fit_phylo_eg1_brms_ma <- bf(yi | se(sqrt(vi)) ~ 1 +
                     (1 | study.id) + 
                     (1 | effect.size.id) + 
                     (1 | species.id) + 
                     (1 | gr(species.id.phy, cov = A))
                     )

# set prior
prior_brms1 <- get_prior(
  formula = fit_phylo_eg1_brms_ma,
  data = dat_moura2021,
  data2 = list(A = A),
  family = gaussian()
  )

phylo_eg1_brms_ma <- brm(
    formula = fit_phylo_eg1_brms_ma,
    family = gaussian(),
    data = dat_moura2021,
    data2 = list(A = A),
    prior = prior_brms1,
    iter = 10000, 
    warmup = 7000,  
    chains = num_chains,
    backend = "cmdstanr",
    threads = threading(threads_per_chain), 
    control = list(adapt_delta = 0.95, max_treedepth = 15)
    )

summary(phylo_eg1_brms_ma)
#  Family: gaussian 
#   Links: mu = identity 
# Formula: yi | se(sqrt(vi)) ~ 1 + (1 | study.id) + (1 | effect.size.id) + (1 | species.id) + (1 | gr(species.id.phy, cov = A)) 
#    Data: dat_moura2021 (Number of observations: 1828) 
#   Draws: 2 chains, each with iter = 10000; warmup = 7000; thin = 1;
#          total post-warmup draws = 6000
# 
# Multilevel Hyperparameters:
# ~effect.size.id (Number of levels: 1828) 
#               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept)     0.12      0.01     0.11     0.13 1.00     2091     3894
# 
# ~species.id (Number of levels: 341) 
#               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept)     0.23      0.03     0.17     0.28 1.00      795     1292
# 
# ~species.id.phy (Number of levels: 341) 
#               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept)     0.27      0.09     0.14     0.50 1.00     1158      954
# 
# ~study.id (Number of levels: 457) 
#               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept)     0.14      0.02     0.11     0.18 1.00      661     1223
# 
# Regression Coefficients:
#           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept     0.37      0.16     0.05     0.70 1.00     3504     2965
# 
# Further Distributional Parameters:
#       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma     0.00      0.00     0.00     0.00   NA       NA       NA

2. Using gr(…, vcv = vcv)

In this alternative specification, sampling error is represented explicitly as a group level effect with a known variance covariance structure. The term (1 | gr(effect.size.id, cov = vcv)) defines a random effect for each effect size whose covariance matrix is fixed to vcv, a diagonal matrix containing the sampling variances v_i on the diagonal. This implies the following hierarchical formulation:

\(\text{Sampling error} \sim \mathcal{N}(0, V),\)

where is the supplied variance covariance matrix vcv. In this setup, sampling error is modelled as a latent random effect rather than being placed directly in the likelihood via se().

Because sampling variances are treated as known in meta analysis, the standard deviation of this group level term is fixed to 1 using constant(1). As a result, the magnitude of the sampling error is entirely determined by the supplied matrix V and is not estimated from the data.

This formulation is mathematically equivalent to providing a sampling variance matrix to metafor::rma.mv(), but implemented here within the brms modeling framework. It also yields the same likelihood for the observed effect sizes as the se(sqrt(vi)) approach, differing only in how sampling error is represented in the model.

vcv <- diag(dat_moura2021$vi)
rownames(vcv) <- colnames(vcv) <- dat_moura2021$effect.size.id

fit_phylo_eg1.1_brms_ma <-  bf(yi ~ 1 + 
                               (1 | study.id) + 
                               (1 | species.id) + 
                               (1 | gr(species.id.phy, cov = A)) +
                               (1 | gr(effect.size.id, cov = vcv))
                             )

prior_brms1.1 <- default_prior(fit_phylo_eg1.1_brms, 
                      data = dat_moura2021, 
                      data2 = list(A = A, vcv = vcv), 
                      family = gaussian())

prior_ma$prior[3] = "constant(1)"

phylo_eg1.1_brms_ma <- brm(
    formula = fit_phylo_eg1.1_brms_ma,
    family = gaussian(),
    data = dat_moura2021,
    data2 = list(A = A, vcv = vcv),
    prior = prior_brms1.1,
    iter = 18000, 
    warmup = 10000,  
    chains = num_chains,
    backend = "cmdstanr",
    threads = threading(threads_per_chain), 
    control = list(adapt_delta = 0.98, max_treedepth = 18)
    )

summary(phylo_eg1.1_brms)
# Family: gaussian 
#   Links: mu = identity 
# Formula: yi ~ 1 + (1 | study.id) + (1 | species.id) + (1 | gr(species.id.phy, cov = A)) + (1 | gr(effect.size.id, cov = vcv)) 
#    Data: dat_moura2021 (Number of observations: 1828) 
#   Draws: 2 chains, each with iter = 18000; warmup = 10000; thin = 1;
#          total post-warmup draws = 16000
# 
# Multilevel Hyperparameters:
# ~effect.size.id (Number of levels: 1828) 
#               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept)     1.00      0.00     1.00     1.00   NA       NA       NA
# 
# ~species.id (Number of levels: 341) 
#               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept)     0.23      0.03     0.17     0.28 1.00     1390     1570
# 
# ~species.id.phy (Number of levels: 341) 
#               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept)     0.27      0.09     0.15     0.50 1.00     1788     1515
# 
# ~study.id (Number of levels: 457) 
#               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept)     0.14      0.02     0.11     0.18 1.00     1068     1500
# 
# Regression Coefficients:
#           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept     0.37      0.16     0.03     0.70 1.00     6398     6604
# 
# Further Distributional Parameters:
#       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma     0.12      0.01     0.11     0.13 1.00     1801     3462

Nice - the outputs are very similar to the previous model using se(), as expected and also consistent with the metafor results. In this fit, the phylogenetic species-level variance is slightly larger than the non-phylogenetic component, whereas the opposite pattern was observed in the metafor model.

The small differences in the estimates arise because this model is fitted in a Bayesian framework.

Let’s calculate \(I^2\) statistics to quantify the proportion of variance attributable to different sources of heterogeneity.

fit <- phylo_eg1_brms  
dat <- dat_moura2021
v_harm <- 1 / mean(1/dat$vi, na.rm = TRUE)

I2_total_harm  <- var_total_hetero / (var_total_hetero + v_harm)
I2_effect_harm <- var_effect / (var_total_hetero + v_harm)
I2_study_harm  <- var_study / (var_total_hetero + v_harm)
I2_species_harm　<- var_species / (var_total_hetero + v_harm)
I2_phylo_harm  <- var_phylo / (var_total_hetero + v_harm)

round(rbind(
      total  = summ_I2(I2_total_harm),
      effect = summ_I2(I2_effect_harm),
      study = summ_I2(I2_study_harm),
      species = summ_I2(I2_species_harm),
      phylo = summ_I2(I2_phylo_harm)
    ), 3)
#         mean q2.5.2.5% q50.50% q97.5.97.5%
# total   0.978     0.969   0.978       0.990
# effect  0.090     0.043   0.091       0.134
# study   0.129     0.048   0.123       0.247
# species 0.330     0.115   0.333       0.530
# phylo   0.429     0.178   0.414       0.759

glmmTMB is a popular package for fitting generalised linear mixed models. Unlike dedicated meta-analytic packages such as metafor, it does not allow to easily account for known sampling variances of effect sizes - so when we implement a meta-analysis in glmmTMB, we need explicitly specify the sampling error structure of the effect sizes.

In practice, this require three steps:

1. Treating each effect size as a unique random-effect level

# random effects in glmmTMB are defined over factor levels. By converting the effect-size identifier into a factor, each effect size (i) is assigned its own random effect (e_i), which will repredsent its sampling error - this is what allows the model to attach the correlation variance (v_i) to each observation

dat_moura2021$effect.size.id <- as.factor(dat_moura2021$effect.size.id)

In glmmTMB, grouping variables for random effects must be factors. By converting effect_size_ID to a factor, we ensure that each effect size is treated as a distinct level in the random-effects structure. This step is required for the model to correctly associate each observation with its corresponding sampling variance.

2. Specifying the random effect independent grouping structure

# the equalto() and propto() terms require a grouping variable, even when there is only one group. the dummy variable 'g' assigns all observations to single group, while the covariance structure is defined entirely by supplied matrices…
dat_moura2021$g <- 1

3. Constructing a variance-covariance matrix (VCV) that contains the known sampling variances of each effect size

# this creates a variance-covariance matrix - diagonal elements are the known sampling variances of the effect sizes. The off-diagonal elements are zero, reflecting the usual assumption that effect sizes are independent. This matrix plays the same rule as V = vi in metafor

VCV <- diag(dat_moura2021$vi, nrow = nrow(dat_moura2021)) 
rownames(VCV)<- colnames(VCV)<- dat_moura2021$effect.size.id

We create a variance–covariance matrix whose diagonal elements are the known sampling variances of the effect sizes. Off-diagonal elements are set to zero, reflecting the assumption that effect sizes are independent. This mirrors the standard assumption made in most meta-analyses.

The equalto() term in glmmTMB requires a grouping factor. The dummy variable g serves this purpose, which we set to a single level (one group), i.e. all observations are assigned to the same group. If the grouping factor had more than one level, each group would have a separate independent random-effect vector with the same variance-covariance parameters. For meta-analysis, this grouping factor should always have one level.

For more details, you can also refer to this vignette: https://cran.r-project.org/web/packages/glmmTMB/vignettes/covstruct.html

And the following supplementary material for fitting meta-analysis with glmmTMB: https://coraliewilliams.github.io/equalto_sim_study/webpage.html.

# reorders the rows and columns of the phylogenetic covariance matrix so that species appear in a consistent alphabetical order, ensuring that the matrix aligns correctly with the species factor used in the model.
A <- A[sort(rownames(A)), sort(rownames(A))]

system.time(
    phylo_eg1_tmb <- glmmTMB(yi ~ 1 +

# the term equalto(0 + effect.size.id|g, VCV) specifies that the random effect associated with each effect size has a covariance structure defined by the matrix VCV. This effectively encodes the known sampling variances into the model
                         equalto(0 + effect.size.id|g, VCV) + 
                         (1|study.id) + 
  
# the terms (1|species.id) and propto(0 + species.id.phy|g, A) model non-phylogenetic and phylogenetic species-level variation, respectively. The propto() term uses the phylogenetic covariance matrix A to capture shared evolutionary history among species
                         (1|species.id) +
                         propto(0 + species.id.phy|g, A), 
                       data = dat_moura2021,
                       REML = T)
　　　　　　　)

Now, we want to know the model output. Although the fitted model is statistically equivalent to the models fitted metafor and brms, glmmTMB reports variance components in a different way, which can initially confusing…

head(confint(phylo_eg1_tmb), 10)
#                                                                           2.5 %    97.5 %   Estimate
# (Intercept)                                                            0.1132610 0.6230707 0.3681658
# Std.Dev.(Intercept)|study.id                                           0.1056462 0.1813458 0.1384142
# Std.Dev.(Intercept)|species.id                                         0.1932854 0.2879763 0.2359271
# Std.Dev.species.id.phyAcanthurus_leucosternon_ott388125|g.1            0.1293612 0.3959730 0.2263262
# Std.Dev.species.id.phyAcanthurus_nigricans_ott467313|g.1               0.1293612 0.3959730 0.2263262
# Std.Dev.species.id.phyAcanthurus_nigrofuscus_ott605289|g.1             0.1293612 0.3959730 0.2263262
# Std.Dev.species.id.phyAchatina_fulica_ott997087|g.1                    0.1293612 0.3959730 0.2263262
# Std.Dev.species.id.phyAegithalos_glaucogularis_vinaceus_ott5560982|g.1 0.1293612 0.3959730 0.2263262
# Std.Dev.species.id.phyAethia_pusilla_ott855484|g.1                     0.1293612 0.3959730 0.2263262
# Std.Dev.species.id.phyAgalychnis_callidryas_ott9483|g.1                0.1293612 0.3959730 0.2263262
sigma(phylo_eg1_tmb)^2
# [1] 0.01445014 # residual variance

phylo_eg1_tmb_varcor <- VarCorr(phylo_eg1_tmb)$cond
exp(phylo_eg1_tmb$fit$par)
#   betadisp      theta      theta      theta 
# 0.12020875 0.13841423 0.23592711 0.05122353 <- residual, study.id, species.id, species.id.phy

You can see the outputs both summary() and confint(). The confint() output reports standard deviations (SD) of random effects, not variances

For example…

(Intercept) -> overall mean
Std.Dev.(Intercept) | study.id -> correspond to the square roots of study-level variance
Std.Dev.(Intercept) | species.id -> correspond to the square roots of non-phylogenetic species-level variance

You see the output also includes many rows such as…

Std.Dev.species.id.phyAcanthurus_leucosternon_ott388125|g.1
Std.Dev.species.id.phyAcanthurus_nigricans_ott467313|g.1

do not represent separate variance parameters for each species. They are phylogenetic random effect! They are the conditional standard deviations of individual species’ phylogenetic random effects, which are all draws from the same multivariate normal distribution defined by the phylogenetic covariance matrix and a single scaling parameter. So, we should not be interpreted as species-specific variance components.

To obtain the actual variance parameters, it is more reliable to inspect the internal parameter vector:

exp(phylo_eg1_tmb$fit$par)
# betadisp   -> residual SD
# theta[1]   -> SD of study-level random effects
# theta[2]   -> SD of non-phylogenetic species effects
# theta[3]   -> Variance of phylogenetic effects

But be careful - the meaning of theta can be changed. It depends on what random effect you included in your model. The residual variance is reported by sigma(model)^2.

Brownian motion (meta-regression)

We next extend the meta-analysis model by including the categorical moderator as a fixed effect (predictor). We will use variable called temporally.pooled　(yes/no) as our moderator - it codes whether effect sizes were calculated from data pooled across multiple sampling periods. The resulting meta-regression model will allows us to examine whether methodological features of the primary studies help explain some of the variation among effect sizes, over and above the hierarchical and phylogenetic structure already accounted for.

 phylo_eg1.1_meta_BM <- rma.mv(
  yi, vi,
  mods = ~ temporally.pooled,
  random = list(
    ~ 1 | study.id, 
    ~ 1 | effect.size.id, 
    ~ 1 | species.id, 
    ~ 1 | species.id.phy
    ),
  R = list(species.id.phy = A), 
  data = dat_moura2021,
  verbose = TRUE,
  sparse = TRUE,
  method = "REML"
  )

summary(phylo_eg1.1_meta_BM)

#    logLik   Deviance        AIC        BIC       AICc   
# -165.9738   331.9476   343.9476   377.0069   343.9938   

# Variance Components:

#             estim    sqrt  nlvls  fixed          factor    R 
# sigma^2.1  0.0194  0.1393    457     no        study.id   no 
# sigma^2.2  0.0145  0.1204   1828     no  effect.size.id   no 
# sigma^2.3  0.0540  0.2323    341     no      species.id   no 
# sigma^2.4  0.0520  0.2280    341     no  species.id.phy  yes 

# Test for Residual Heterogeneity:
# QE(df = 1826) = 10668.4998, p-val < .0001

# Test of Moderators (coefficient 2):
# QM(df = 1) = 3.1936, p-val = 0.0739

# Model Results:

#                       estimate      se    zval    pval    ci.lb   ci.ub     
# intrcpt                 0.3562  0.1311  2.7163  0.0066   0.0992  0.6132  ** 
# temporally.pooledyes    0.0395  0.0221  1.7871  0.0739  -0.0038  0.0829   .

For studies without temporal pooling, the intercept \(\beta_0\) represents the estimated mean effect size and was 0.356 (95% CI 0.099, 0.613) on the Fisher’s Z scale. Relative to this baseline, temporally pooled studies tended to show slightly larger effect sizes on average, but the uncertainty interval overlapped zero (\(\beta_{\text{temporally.pooled}_{\text{yes}}}\) = 0.040, 95% CI = [-0.004, 0.083]).

Importantly, this inference is made while accounting for non-independence among effect sizes arising from shared studies, repeated measurements, species identity, and phylogenetic relatedness. Thus, the moderator effect reflects systematic differences associated with temporal pooling rather than confounding hierarchical structure.

orchaRd::r2_ml(phylo_eg1.1_meta_BM)
   # R2_marginal R2_conditional 
   # 0.002500313    0.896644494

We further quantified the explanatory power of the model using marginal and conditional \(R^2\). The R2_marginal (0.003) represents the proportion of variance explained by the fixed effect alone, in this case the moderator temporally.pooled. In contrast, the conditional R2_conditional (0.897) reflects the variance explained by the full model, including both fixed effects and all random effects.

The very low marginal \(R^2\) indicates that temporal pooling explains only a negligible fraction of the total variability in effect sizes. By contrast, the high conditional \(R^2\) shows that most of the variation is captured by the hierarchical and phylogenetic structure of the model.

res1 <- orchaRd::mod_results(moura2021_BM_meta_reg, mod = "temporally.pooled",  group = "study.id", subset = TRUE)

orchard_plot(res1, 
             mod = "temporally.pooled",
             group = "study.id",
             xlab = "Effect size (ZCOR)",
             angle = 45) + 
  # scale_x_discrete(labels = c("Overall effect")) +
  # scale_color_manual(values = "#CDBE70") +
  # scale_fill_manual(values = "#EEDC82") + 
  # scale_y_continuous(breaks = seq(-4.0, 4.0, 1), limits = c(-4.0, 4.0)) + 
  theme_classic()

vcv <- diag(dat_moura2021$vi)
rownames(vcv) <- colnames(vcv) <- dat_moura2021$effect.size.id
fit_phylo_eg1_brms_mr <-  bf(yi ~ 1 + temporally.pooled + 
                     (1 | study.id) + 
                     (1 | species.id) + 
                     (1 | gr(species.id.phy, cov = A)) +
                     (1 | gr(effect.size.id, cov = vcv))
                     )

prior_mr <- default_prior(fit_phylo_eg1_brms_mr, 
                data = dat_moura2021, 
                data2 = list(A = A, vcv = vcv), 
                family = gaussian())

prior_mr$prior[5] = "constant(1)"

system.time(
  phylo_eg1_brms_mr <- brm(
    formula = fit_phylo_eg1_brms_mr,
    family = gaussian(),
    data = dat_moura2021,
    data2 = list(A = A, vcv = vcv),
    prior = prior_mr,
    iter = 12000, 
    warmup = 8000,  
    chains = num_chains,
    backend = "cmdstanr",
    threads = threading(threads_per_chain), 
    control = list(adapt_delta = 0.95, max_treedepth = 15)
    )
  )

summary(phylo_eg1_brms_mr)
# Family: gaussian 
#   Links: mu = identity 
# Formula: yi ~ 1 + temporally.pooled + (1 | study.id) + (1 | species.id) + (1 | gr(species.id.phy, cov = A)) + (1 | gr(effect.size.id, cov = vcv)) 
#    Data: dat_moura2021 (Number of observations: 1828) 
#   Draws: 2 chains, each with iter = 12000; warmup = 8000; thin = 1;
#          total post-warmup draws = 8000
# 
# Multilevel Hyperparameters:
# ~effect.size.id (Number of levels: 1828) 
#               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept)     1.00      0.00     1.00     1.00   NA       NA       NA
# 
# ~species.id (Number of levels: 341) 
#               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept)     0.22      0.03     0.16     0.27 1.00      733      941
# 
# ~species.id.phy (Number of levels: 341) 
#               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept)     0.28      0.10     0.15     0.52 1.00      925      798
# 
# ~study.id (Number of levels: 457) 
#               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept)     0.14      0.02     0.11     0.19 1.01      569      887
# 
# Regression Coefficients:
#                      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept                0.35      0.16     0.01     0.68 1.00     4844     3939
# temporally.pooledyes     0.04      0.02    -0.00     0.08 1.00     4646     5357
# 
# Further Distributional Parameters:
#       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma     0.12      0.01     0.11     0.13 1.00      914     2060

A <- A[sort(rownames(A)), sort(rownames(A))]

dat_moura2021$g <- 1 

VCV <- diag(dat_moura2021$vi, nrow = nrow(dat_moura2021)) 
rownames(VCV)<- colnames(VCV)<- dat_moura2021$effect.size.id

dat_moura2021$effect.size.id <- as.factor(dat_moura2021$effect.size.id)
head(dat_moura2021)

system.time(
    phylo_eg1.1_tmb <- glmmTMB(
      yi ~ 1 + temporally.pooled + 
        equalto(0 + effect.size.id|g, VCV) +
        (1|study.id) + 
        (1|species.id.phy) +
        propto(0 + species.id.phy|g, A),
      data = dat_moura2021,
      REML = T)
    )
head(confint(phylo_eg1.1_tmb), 10)
#                                                                             2.5 %     97.5 %   Estimate
# (Intercept)                                                             0.099133194 0.61322985 0.35618152
# temporally.pooledyes                                                   -0.004080496 0.08314767 0.03953359
# Std.Dev.(Intercept)|study.id                                            0.106434012 0.18231634 0.13930061
# Std.Dev.(Intercept)|species.id.phy                                      0.189351801 0.28492224 0.23227256
# Std.Dev.species.id.phyAcanthurus_leucosternon_ott388125|g.1             0.130542545 0.39822099 0.22800171
# Std.Dev.species.id.phyAcanthurus_nigricans_ott467313|g.1                0.130542545 0.39822099 0.22800171
# Std.Dev.species.id.phyAcanthurus_nigrofuscus_ott605289|g.1              0.130542545 0.39822099 0.22800171
# Std.Dev.species.id.phyAchatina_fulica_ott997087|g.1                     0.130542545 0.39822099 0.22800171
# Std.Dev.species.id.phyAegithalos_glaucogularis_vinaceus_ott5560982|g.1  0.130542545 0.39822099 0.22800171
# Std.Dev.species.id.phyAethia_pusilla_ott855484|g.1                      0.130542545 0.39822099 0.22800171


sigma(phylo_eg1.1_tmb)^2
# [1] 0.01448824 # residual variance
phylo_eg1.1_tmb_varcor <- VarCorr(phylo_eg1.1_tmb)$cond
exp(phylo_eg1.1_tmb$fit$par)
#   betadisp      theta      theta      theta 
# 0.12036710 0.13930061 0.23227256 0.05198478  <- residual variance (sd), study.id (sd), species.id (sd), species.id.phy (variance)

Ornstein-Uhlenbeck

We next fit a phylogenetic meta-analysis model under an Ornstein-Uhlenbeck (OU) model of trait evolution. The OU model is useful here because it implies that correlation between species declines with phylogenetic distance, and this decay can be represented using an exponential correlation function.

Under an OU model, the phylogenetic covariance between species \(i\) and \(j\) can be written as

\[ \operatorname{Cov}_{ij} = \sigma^2 \exp(-\alpha d_{ij}), \]

where \(d_{ij}\) is the phylogenetic distance between species \(i\) and \(j\), \(\sigma^2\) is the trait variance, and \(\alpha\) controls how quickly correlation decays with distance. Larger values of \(\alpha\) imply faster decay of phylogenetic correlation, corresponding to a stronger pull towards a species-specific optimum.

This has the same functional form as the exponential correlation structure often used in distance-based models:

\[ \operatorname{Cov}_{ij} = \sigma^2 \exp(-\rho d_{ij}), \]

which allows the OU model to be fitted by exploiting this equivalence.

Let \(\mathbf{A}\) be the Brownian motion phylogenetic correlation matrix derived from the tree, where each entry represents phylogenetic similarity based on shared branch length. We then define a distance matrix \(\mathbf{D}\) as

\[ \mathbf{D} = \mathbf{I} - \mathbf{A}, \]

where \(\mathbf{I}\) is the identity matrix. The entries of \(\mathbf{D}\) represent phylogenetic distances between species, scaled between 0 and 1. Closely related species have smaller distances, whereas distantly related species have larger distances.

We fit the OU phylogenetic model using a two-stage procedure. First, we estimate the rate at which phylogenetic correlation decays with distance by fitting a meta-analysis model with an exponential correlation structure, using \(\mathbf{D}\) as the phylogenetic distance matrix. Second, we use this fitted decay structure to construct an OU based phylogenetic correlation matrix for the final meta-analysis or meta-regression model. This provides a practical way to incorporate OU style phylogenetic dependence within the standard multilevel meta-analytic framework implemented in metafor.

We first fit a meta-analysis model with an exponential correlation structure, using \(\mathbf{D}\) as the phylogenetic distance matrix:

# create phylogenetic distance matrix
dat_moura2021$const <- 1
I  <- matrix(1, nrow = length(tree$tip.label), ncol = length(tree$tip.label))
D <- I-A

D[1:5, 1:5]

phy_ou_meta <- rma.mv(yi, vi,
                 random = list(~ 1 | study.id, 
                 ~ 1 | effect.size.id, 
                 ~ 1 | species.id, 
                 ~ species.id.phy | const),
                 dist = list(D), 
                 struct = "SPEXP",
                 control = list(rho.init = 1),
                 data = dat_moura2021,
                 verbose = TRUE,
                 sparse = TRUE,
                 method = "REML",
                 test = "t"
                 )

summary(phy_ou_meta)
# Multivariate Meta-Analysis Model (k = 1828; method: REML)

#    logLik   Deviance       AIC        BIC       AICc   
# -160.3783   320.7567   332.7567   365.8192   332.8028   

# Variance Components:

#             estim    sqrt  nlvls  fixed          factor 
# sigma^2.1  0.0157  0.1254    457     no        study.id 
# sigma^2.2  0.0144  0.1201   1828     no  effect.size.id 
# sigma^2.3  0.0000  0.0000    341     no      species.id 

# outer factor: const           (nlvls = 1)
# inner term:   ~species.id.phy (nlvls = 341)

#             estim    sqrt  fixed 
# tau^2      0.1029  0.3208     no 
# rho        0.0182             no 

# Test for Heterogeneity:
# Q(df = 1827) = 10743.8076, p-val < .0001

# Model Results:

# estimate      se    zval    pval   ci.lb   ci.ub      
#   0.3514  0.0355  9.9090  <.0001  0.2819  0.4209  *** 

# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

This model estimates the exponential decay parameter governing how quickly phylogenetic correlation declines with distance. Using this fitted decay parameter, we then construct an OU based phylogenetic correlation matrix as

\[ \mathbf{A}_{OU} = \exp(-\alpha \mathbf{D}). \]

This matrix represents the phylogenetic correlation structure implied by the fitted OU model. We then refit the meta-analysis, supplying \(\mathbf{A}_{OU}\) as a fixed phylogenetic correlation matrix:

rho <- phy_ou_meta$rho  #[1] 0.01820382
alpha <- 1/rho # 54.93351
A_OU <- exp(-alpha*D)

phy_ou_meta1 <- rma.mv(yi, vi,
     random = list(~ 1 | study.id, 
    ~ 1 | effect.size.id, 
    ~ 1 | species.id, 
    ~ 1 | species.id.phy),
    R = list(species.id.phy = A_OU), 
    data = dat_moura2021,
    verbose = TRUE,
    sparse = TRUE,
    method = "REML",
    test = "t"
    )

summary(phy_ou_meta1)
# Multivariate Meta-Analysis Model (k = 1828; method: REML)
# 
#    logLik   Deviance        AIC        BIC       AICc   
# -160.3783   320.7567   330.7567   358.3088   330.7896   
# 
# Variance Components:
# 
#             estim    sqrt  nlvls  fixed          factor    R 
# sigma^2.1  0.0157  0.1254    457     no        study.id   no 
# sigma^2.2  0.0144  0.1201   1828     no  effect.size.id   no 
# sigma^2.3  0.0000  0.0001    341     no      species.id   no 
# sigma^2.4  0.1029  0.3208    341     no  species.id.phy  yes 
# 
# Test for Heterogeneity:
# Q(df = 1827) = 10743.8076, p-val < .0001
# 
# Model Results:
# 
# estimate      se    tval    df    pval   ci.lb   ci.ub      
#   0.3514  0.0355  9.9090  1827  <.0001  0.2819  0.4210  ***

We can extend the same correlation structure to a meta-regression model by including moderators while retaining the OU based phylogenetic random effect:

phy_ou_meta_reg2 <- rma.mv(yi, vi,
                  mods = ~ temporally.pooled,
                  random = list(~ 1 | study.id, 
                  ~ 1 | effect.size.id, 
                  ~ 1 | species.id, 
                  ~ 1 | species.id.phy),
                  R = list(species.id.phy = A_OU), 
                  data = dat_moura2021,
                  verbose = TRUE,
                  sparse = TRUE,
                  method = "REML",
                  test = "t"
                  )

summary(phy_ou_meta_reg2)
# Multivariate Meta-Analysis Model (k = 1828; method: REML)
# 
#    logLik   Deviance        AIC        BIC       AICc   
# -159.3682   318.7364   330.7364   363.7957   330.7826   
# 
# Variance Components:
# 
#             estim    sqrt  nlvls  fixed          factor    R 
# sigma^2.1  0.0159  0.1261    457     no        study.id   no 
# sigma^2.2  0.0144  0.1202   1828     no  effect.size.id   no 
# sigma^2.3  0.0000  0.0001    341     no      species.id   no 
# sigma^2.4  0.1014  0.3184    341     no  species.id.phy  yes 
# 
# Test for Residual Heterogeneity:
# QE(df = 1826) = 10668.4998, p-val < .0001
# 
# Test of Moderators (coefficient 2):
# F(df1 = 1, df2 = 1826) = 1.8609, p-val = 0.1727
# 
# Model Results:
# 
#                       estimate      se    tval    df    pval    ci.lb   ci.ub      
# intrcpt                 0.3421  0.0359  9.5422  1826  <.0001   0.2718  0.4125  *** 
# temporally.pooledyes    0.0293  0.0215  1.3641  1826  0.1727  -0.0128  0.0713

The OU phylogenetic model fitted the data substantially better than the BM model (AIC = 330.76 vs 345.35), indicating that phylogenetic similarity in assortative mating strength decays rapidly with evolutionary distance. In other words, closely related species tend to resemble one another in mate size correlation, but this resemblance weakens quickly as phylogenetic distance increases. This pattern is consistent with an OU process in which species are pulled towards their own lineage-specific optima, rather than retaining strong similarity over deep evolutionary time-scales.

Although the pooled effect size (meta-analysis) and moderator effect(meta-regression) was similar under both models, the uncertainty around this estimate was slightly smaller under OU, as reflected by the narrower confidence interval and smaller standard error. Under the OU model, species-level variation was captured almost entirely by the OU-structured phylogenetic term, whereas the BM model partitioned this variation between phylogenetic and non-phylogenetic species effects. Together, these results suggest that assortative mating strength shows only limited deep phylogenetic inertia and is instead dominated by relatively shallow phylogenetic resemblance and species-specific variation.

Note: We cannot currently fit this OU model directly in brms or glmmTMB.

Visualisations

Visualising results often makes them much easier to understand than presenting them only in tables or in text. We want to visualise both the overall effect size estimate and the variance components from the random effects - but how to do this nicely?

Traditionally, forest plots have bee used to report the pooled estimate and confidence intervals. However, forest plots does not clearly convey each study’s contribution to the overall estimates (that is, the relative precision or inverse-variance weight of each effect size). The orchaRd package provides a nice alternative, the orchard plot, which visualises the overall effect size along with the precision of each effect size estimate.

Although orchard plots provide an effective summary of model-based estimates, they are not designed to display variance-component decomposition. We therefore construct custom visualisations in ggplot2 to present the pooled effect size alongside the estimated variance components.

Here, we present how to make figures from metafor and brms using meta-analysis model outputs.

metafor_p1 <- orchaRd::orchard_plot(moura_BM_metafor1, 
                           group = "study.id",
                           xlab = "Effect size",
                           angle = 45) + 
  scale_x_discrete(labels = c("Overall effect")) +
  scale_color_manual(values = "#CDAD00") +
  scale_fill_manual(values = "#FFD700") + 
  theme_classic()

## random effect
ci_var <- confint(moura_BM_metafor1, level = 0.95) 
# estimate  ci.lb  ci.ub 
# sigma^2.1   0.0192 0.0108 0.0325 
# sigma.1     0.1384 0.1038 0.1802 
# 
# estimate  ci.lb  ci.ub 
# sigma^2.2   0.0145 0.0121 0.0172 
# sigma.2     0.1202 0.1099 0.1311 
# 
# estimate  ci.lb  ci.ub 
# sigma^2.3   0.0557 0.0334 0.0788 
# sigma.3     0.2359 0.1827 0.2807 
# 
# estimate  ci.lb  ci.ub 
# sigma^2.4   0.0512 0.0179 0.1792 
# sigma.4     0.2263 0.1336 0.4233 

tbl <- tibble::as_tibble(ci_var, rownames = "term")

label_map <- c("Effect_id", "Study_id", "Species", "Phylo")
tbl_var <- tbl |>
  filter(str_detect(term, "^sigma\\^2\\.")) |>
  mutate(
    idx   = as.integer(str_match(term, "\\.(\\d+)$")[,2]),
    label = case_when(
      idx == 1 ~ "Study_id",
      idx == 2 ~ "Effect_id",
      idx == 3 ~ "Species",
      idx == 4 ~ "Phylo"
    ),
    label = factor(label, levels = label_map) 
  )

metafor_p2 <- ggplot(tbl_var, aes(x = label, y = estimate)) +
  geom_point(size = 3.0, color = "#528B8B") +
  geom_errorbar(aes(ymin = ci.lb, ymax = ci.ub), width = 0.25, color = "#528B8B") +
  coord_flip() +
  labs(x = NULL, y = "Variance 95% CI") +
  # scale_y_continuous(breaks = seq(0, 10.0, 1), limits = c(0, 10.0)) + 
  theme_classic(base_size = 12)

metafor_eg1_1 <- metafor_p1 / metafor_p2
metafor_eg1_1

In brms, we can extract posterior samples for both fixed and random effects, and visualise them using ggplot2.

moura_BM_brms1 <- readRDS(here("Rdata", "tutorial_v2", "moura2021_BM_brms.rds"))
get_variables(moura_BM_brms1)

## fixed effects
fixed_effects_samples_brms <- moura_BM_brms1 |>
  spread_draws(b_Intercept)
fixed_effects_samples_brms <- fixed_effects_samples_brms |>
  pivot_longer(cols = starts_with("b_"), 
               names_to = ".variable", 
               values_to = ".value")

brms_p1 <- ggplot(fixed_effects_samples_brms, aes(x = .value, y = .variable)) +
  stat_halfeye(
    normalize = "xy", 
    point_interval = "mean_qi", 
    fill           = "#FFD700",
    color          = "#CDAD00"
  ) +
  geom_vline(xintercept = 0, linetype = "dashed", color = "#005") +
  scale_x_continuous(breaks = seq(-1.0, 2.0, 1)) + 
  labs(y = "Overall effect"
  ) +
  theme_classic()
head(fixed_effects_samples_brms)

## random effects
random_effects_samples_brms <- moura_BM_brms1 |>
  spread_draws(sd_species.id.phy__Intercept, sd_effect.size.id__Intercept, sd_species.id__Intercept, sd_study.id__Intercept)
random_effects_samples_brms <- random_effects_samples_brms |>
  pivot_longer(cols = starts_with("sd_"), 
               names_to = ".variable", 
               values_to = ".value")　|> 
  mutate(.value = .value^2,
         .variable = case_when(
           .variable == "sd_species.id.phy__Intercept" ~ "Phylo",
           .variable == "sd_study.id__Intercept" ~ "Study",
           .variable == "sd_species.id__Intercept" ~ "Species",
           .variable == "sd_effect.size.id__Intercept" ~ "Effect_id"
         ),
         .variable = factor(.variable, 
                            levels = c("Effect_id", "Study", "Species", "Phylo")
                            )
         )

head(random_effects_samples_brms)

brms_p2 <- ggplot(random_effects_samples_brms, aes(x = .value, y = .variable)) +
  stat_halfeye(
    normalize = "xy",
    point_interval = "mean_qi", 
    fill = "#98F5FF",
    color = "#528B8B"
  ) +
  geom_vline(xintercept = 0, linetype = "dashed", color = "#005") +
　scale_x_continuous(breaks = seq(0, 0.08, 0.02), limits = c(0, 0.08)) + 
  labs(
    # title = "Posterior distributions of random effects - brms",
    y = "Random effects (variance)"
  ) +
  theme_classic()

2. Lim et al. (2014)

We used the o_o_unadj dataset from dat.lim2014 in the metadat package. The dataset come from Lim et al. (2014). It includes correlation coefficients (ri) and sample sizes (ni) describing the association between offspring size and offspring number, unadjusted for maternal size, across multiple species, along with a matching phylogenetic tree. We converted ri to Fisher’s z effect sizes (yi) and derived the corresponding sampling variances (vi) prior to model fitting. The resulting example dataset contained 170 effect sizes and 120 species.

# load dataset and tree
dat_lim2014 <- dat.lim2014$o_o_unadj
tre_lim2014 <- dat.lim2014$o_o_unadj_tree

# calculate effect size and sampling variances
dat_lim2014 <- escalc(measure = "ZCOR", ri = ri, ni = ni, data = dat_lim2014)

# check the tree
is.binary(tre_lim2014)  # TRUE
is.ultrametric(tre_lim2014)
# in .is.ultrametric_ape(phy, tol, option, length(phy$tip.label)) : the tree has no branch lengths
# -> so we need to get the branch length…
# compute branch length
tre_lim2014 <- compute.brlen(tre_lim2014)

# make the additional species column - we will use this to consider phylo- and non-phylo random effects
dat_lim2014$phy <- dat_lim2014$species

# make effect size id
dat_lim2014$id <- 1:nrow(dat_lim2014)

head(dat_lim2014)# check the dataset

Brownian motion (meta-analysis)

# make phylogenetic correlation matrix
A <- vcv(tre_lim2014, corr = TRUE)

fit_phylo_eg2_metafor_ma <- rma.mv(yi, vi,
                      random = list(~ 1 | id,
                                    ~ 1 | phy,
                                    ~ 1| species),
                      R = list(phy = A),
                      data = dat_lim2014,
                      verbose = TRUE,
                      sparse = TRUE,
                      method = "REML"
                      )

summary(fit_phylo_eg2_metafor_ma)
# Multivariate Meta-Analysis Model (k = 170; method: REML)
# 
#   logLik  Deviance       AIC       BIC      AICc   
# -93.3327  186.6653  194.6653  207.1849  194.9092   
# 
# Variance Components:
# 
#             estim    sqrt  nlvls  fixed   factor    R 
# sigma^2.1  0.0630  0.2510    170     no       id   no 
# sigma^2.2  0.0534  0.2311    120     no      phy  yes 
# sigma^2.3  0.0777  0.2788    120     no  species   no 
# 
# Test for Heterogeneity:
# Q(df = 169) = 1823.4239, p-val < .0001
# 
# Model Results:
# 
# estimate      se     zval    pval    ci.lb   ci.ub    
#  -0.1308  0.1219  -1.0729  0.2833  -0.3696  0.1081

lim_vcv <- vcv.phylo(tre_lim2014, corr = TRUE)
vcv <- diag(dat_lim2014$vi)
rownames(vcv) <- colnames(vcv) <- dat_lim2014$id

fit_phylo_eg2_brms_ma <-  bf(yi ~ 1 +
                     (1 | species) +
                     (1 | gr(phy, cov = lim_vcv)) +
                     (1 | gr(id, cov = vcv))
                     )

prior_ma <- default_prior(fit_phylo_eg2_brms_ma,
                data = dat_lim2014,
                data2 = list(lim_vcv = lim_vcv, vcv = vcv),
                family = gaussian())

prior_ma$prior[3] = "constant(1)"

system.time(
  phylo_eg2_brms_ma <- brm(
    formula = fit_phylo_eg2_brms_ma,
    family = gaussian(),
    data = dat_lim2014,
    data2 = list(lim_vcv = lim_vcv, vcv = vcv),
    prior = prior_ma,
    iter = 5000,
    warmup = 2000,
    chains = 4,
    backend = "cmdstanr",
    threads = threading(threads_per_chain),
    control = list(adapt_delta = 0.98, max_treedepth = 15)
    )
  )

summary(phylo_eg2_brms_ma)
#  Family: gaussian 
#   Links: mu = identity 
# Formula: yi ~ 1 + (1 | species) + (1 | gr(phy, cov = lim_vcv)) + (1 | gr(id, cov = vcv)) 
#    Data: dat_lim2014 (Number of observations: 170) 
#   Draws: 4 chains, each with iter = 5000; warmup = 2000; thin = 1;
#          total post-warmup draws = 12000
# 
# Multilevel Hyperparameters:
# ~id (Number of levels: 170) 
#               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept)     1.00      0.00     1.00     1.00   NA       NA       NA
# 
# ~phy (Number of levels: 120) 
#               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept)     0.29      0.12     0.09     0.57 1.00     1935     2772
# 
# ~species (Number of levels: 120) 
#               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept)     0.26      0.07     0.10     0.37 1.00      896      978
# 
# Regression Coefficients:
#           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept    -0.13      0.16    -0.45     0.19 1.00     6557     6385
# 
# Further Distributional Parameters:
#       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma     0.26      0.04     0.19     0.35 1.00      870     1995


fit_phylo_eg2_brms_ma <-  bf(yi | se(sqrt(vi)) ~ 1+
                     (1 | species) +
                     (1 | gr(phy, cov = lim_vcv)) +
                     (1 | id))

prior_ma <- default_prior(fit_phylo_eg2_brms_ma,
                data = dat_lim2014,
                data2 = list(lim_vcv = lim_vcv),
                family = gaussian())

system.time(
  phylo_eg2_brms_ma_2 <- brm(
    formula = fit_phylo_eg2_brms_ma,
    family = gaussian(),
    data = dat_lim2014,
    data2 = list(lim_vcv = lim_vcv),
    prior = prior_ma,
    iter = 5000,
    warmup = 2000,
    chains = 4,
    backend = "cmdstanr",
    threads = threading(threads_per_chain),
    control = list(adapt_delta = 0.98, max_treedepth = 15)
    )
  )

summary(phylo_eg2_brms_ma_2)
#  Family: gaussian 
#   Links: mu = identity 
# Formula: yi | se(sqrt(vi)) ~ 1 + (1 | species) + (1 | gr(phy, cov = lim_vcv)) + (1 | id) 
#    Data: dat_lim2014 (Number of observations: 170) 
#   Draws: 4 chains, each with iter = 5000; warmup = 2000; thin = 1;
#          total post-warmup draws = 12000
# 
# Multilevel Hyperparameters:
# ~id (Number of levels: 170) 
#               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept)     0.26      0.04     0.19     0.35 1.00     1311     1410
# 
# ~phy (Number of levels: 120) 
#               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept)     0.29      0.12     0.09     0.58 1.00     1732     3147
# 
# ~species (Number of levels: 120) 
#               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept)     0.26      0.07     0.08     0.37 1.01     1010      714
# 
# Regression Coefficients:
#           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept    -0.13      0.16    -0.44     0.18 1.00     5189     6304
# 
# Further Distributional Parameters:
#       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma     0.00      0.00     0.00     0.00   NA       NA       NA

dat_lim2014$id <- factor(
          as.character(dat_lim2014$id),
          levels = mixedsort(unique(as.character(dat_lim2014$id)))
          )
class(dat_lim2014$id)

A <- A[sort(rownames(A)), sort(rownames(A))]
vcv <- diag(dat_lim2014$vi, nrow = nrow(dat_lim2014))

rownames(vcv)<- colnames(vcv)<- dat_lim2014$id

dat_lim2014$g <- 1

phylo_eg2_tmb_ma <- glmmTMB(yi ~ 1 +
                           equalto(0 + id|g, vcv) +
                            (1| species) +
                          propto(0 + phy|g, A),
                          data = dat_lim2014,
                          REML = T)

summary(phylo_eg2_tmb_ma)
# Family: gaussian  ( identity )
# Formula:          yi ~ 1 + equalto(0 + id | g, vcv) + (1 | species) + propto(0 +      phy | g, A)
# Data: dat_lim2014
# 
#       AIC       BIC    logLik -2*log(L)  df.resid 
#     199.8     212.3     -95.9     191.8       166 
# 
# Random effects:
# 
# Conditional model:
#  Groups   Name                         Variance  Std.Dev. Corr                                   
#  g        id1                          0.0555556 0.23570                                         
#           id2                          0.0909091 0.30151  0.00                                   
#           id3                          0.0555556 0.23570  0.00 0.00                              
#           id4                          0.0909091 0.30151  0.00 0.00 0.00                         
#           id5                          0.0140845 0.11868  0.00 0.00 0.00 0.00                    
#           id6                          0.0098039 0.09901  0.00 0.00 0.00 0.00 0.00               
#           id7                          0.0099010 0.09950  0.00 0.00 0.00 0.00 0.00 0.00          
#           id8                          0.0217391 0.14744  0.00 0.00 0.00 0.00 0.00 0.00 0.00     
#           id9                          0.0031746 0.05634  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
#           id10                         0.0208333 0.14434  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
#           ...                          ...       ...      ...  ...  ...  ...  ...  ...  ...  ... 
#  species  (Intercept)                  0.0777194 0.27878                                         
#  g.1      phyAlectoris_rufa            0.0533848 0.23105                                         
#           phyAlytes_obstetricans       0.0533848 0.23105  0.20                                   
#           phyAnas_platyrhynchos        0.0533848 0.23105  0.92 0.20                              
#           phyAnguis_fragilis           0.0533848 0.23105  0.32 0.20 0.32                         
#           phyApalone_ferox             0.0533848 0.23105  0.65 0.20 0.65 0.32                    
#           phyApalone_mutica            0.0533848 0.23105  0.65 0.20 0.65 0.32 0.99               
#           phyAphrastura_spinicauda     0.0533848 0.23105  0.77 0.20 0.77 0.32 0.65 0.65          
#           phyAraschnia_levana          0.0533848 0.23105  0.00 0.00 0.00 0.00 0.00 0.00 0.00     
#           phyAustropotamobius_italicus 0.0533848 0.23105  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.93
#           phyAythya_ferina             0.0533848 0.23105  0.92 0.20 0.99 0.32 0.65 0.65 0.77 0.00
#           ...                          ...       ...      ...  ...  ...  ...  ...  ...  ...  ... 
#  Residual                              0.0629794 0.25096                    
#          
#  0.00    
#  ...  ...
#          
#  0.00    
#  ...  ...
#          
# Number of obs: 170, groups:  g, 1; species, 120
# 
# Dispersion estimate for gaussian family (sigma^2): 0.063 
# 
# Conditional model:
#             Estimate Std. Error z value Pr(>|z|)
# (Intercept)  -0.1308     0.1220  -1.072    0.284


phylo_eg2_tmb$fit$par
#  betadisp     theta     theta 
# -1.382474 -1.277325 -2.930230

Across metafor, brms, and glmmTMB, this example yielded highly consistent results. The estimated overall mean effect was approximately \(\beta_0\) = -0.13 in all three packages, and heterogeneity was partitioned similarly across effect-size, phylogenetic, and species-level components. Small differences in the reported variance terms arise from package-specific parametrisations, particularly in the way sampling variance is represented.

Biologically, the negative mean effect suggests a tendency for larger offspring size to be associated with fewer offspring, consistent with the classic offspring size-number trade-off. At the same time, the non-zero phylogenetic and species-level variance components indicate that the strength of this relationship differs among taxa and is only partly explained by shared evolutionary history.

Brawnian motion (meta-regression)

fit_phylo_eg2_metafor_mr <- rma.mv(yi, vi,
                              mods = ~ environment,
                              random = list(~ 1 | id,
                                            ~ 1 | phy,
                                            ~ 1 | species),
                              R = list(phy = A),
                              data = dat_lim2014,
                              verbose = TRUE,
                              sparse = TRUE,
                              method = "REML"
                              )

summary(fit_phylo_eg2_metafor_mr)
# Multivariate Meta-Analysis Model (k = 170; method: REML)
# 
#   logLik  Deviance       AIC       BIC      AICc   
# -93.0600  186.1200  196.1200  211.7398  196.4904   
# 
# Variance Components:
# 
#             estim    sqrt  nlvls  fixed   factor    R 
# sigma^2.1  0.0630  0.2511    170     no       id   no 
# sigma^2.2  0.0521  0.2283    120     no      phy  yes 
# sigma^2.3  0.0792  0.2815    120     no  species   no 
# 
# Test for Residual Heterogeneity:
# QE(df = 168) = 1774.0613, p-val < .0001
# 
# Test of Moderators (coefficient 2):
# QM(df = 1) = 0.0401, p-val = 0.8413
# 
# Model Results:
# 
#                  estimate      se     zval    pval    ci.lb   ci.ub    
# intrcpt           -0.1395  0.1289  -1.0826  0.2790  -0.3921  0.1131    
# environmentwild    0.0166  0.0829   0.2003  0.8413  -0.1460  0.1792

fit_phylo_eg2_brms_mr <-  bf(yi ~ 1 + environment +
                             (1 | species) +
                             (1 | gr(phy, cov = lim_vcv)) +
                             (1 | gr(id, cov = vcv))
                             )

prior_mr <- default_prior(fit_phylo_eg2_brms_mr,
                data = dat_lim2014,
                data2 = list(lim_vcv = lim_vcv, vcv = vcv),
                family = gaussian())

prior_mr$prior[5] = "constant(1)"

phylo_eg2_brms_mr <- brm(
        formula = fit_phylo_eg2_brms_mr,
        family = gaussian(),
        data = dat_lim2014,
        data2 = list(lim_vcv = lim_vcv, vcv = vcv),
        prior = prior_mr,
        iter = 5000,
        warmup = 2000,
        chains = 4,
        backend = "cmdstanr",
        threads = threading(threads_per_chain),
        control = list(adapt_delta = 0.98, max_treedepth = 15)
        )

summary(phylo_eg2_brms_mr)
#  Family: gaussian 
#   Links: mu = identity 
# Formula: yi ~ 1 + environment + (1 | species) + (1 | gr(phy, cov = lim_vcv)) + (1 | gr(id, cov = vcv)) 
#    Data: dat_lim2014 (Number of observations: 170) 
#   Draws: 4 chains, each with iter = 5000; warmup = 2000; thin = 1;
#          total post-warmup draws = 12000
# 
# Multilevel Hyperparameters:
# ~id (Number of levels: 170) 
#               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept)     1.00      0.00     1.00     1.00   NA       NA       NA
# 
# ~phy (Number of levels: 120) 
#               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept)     0.28      0.12     0.08     0.57 1.00     2150     2926
# 
# ~species (Number of levels: 120) 
#               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept)     0.26      0.07     0.10     0.37 1.00     1017     1099
# 
# Regression Coefficients:
#                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept          -0.14      0.16    -0.46     0.19 1.00     9372     7645
# environmentwild     0.02      0.09    -0.15     0.18 1.00    10459     8813
# 
# Further Distributional Parameters:
#       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma     0.26      0.04     0.19     0.35 1.00      949     1860
# 
# Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
# and Tail_ESS are effective sample size measures, and Rhat is the potential
# scale reduction factor on split chains (at convergence, Rhat = 1).

phylo_eg2_tmb_mr <- glmmTMB(yi ~ 1 + environment +
                             equalto(0 + id|g, vcv) +
                             (1| species) +
                             propto(0 + phy|g, A),
                           data = dat_lim2014,
                           REML = T)

summary(phylo_eg2_tmb_mr)
#  Family: gaussian  ( identity )
# Formula:          yi ~ 1 + environment + equalto(0 + id | g, vcv) + (1 | species) +  
#     propto(0 + phy | g, A)
# Data: dat_lim2014
# 
#       AIC       BIC    logLik -2*log(L)  df.resid 
#     204.9     220.6     -97.5     194.9       165 
# 
# Random effects:
# 
# Conditional model:
#  Groups   Name                         Variance  Std.Dev. Corr                                   
#  g        id1                          0.0555556 0.23570                                         
#           id2                          0.0909091 0.30151  0.00                                   
#           id3                          0.0555556 0.23570  0.00 0.00                              
#           id4                          0.0909091 0.30151  0.00 0.00 0.00                         
#           id5                          0.0140845 0.11868  0.00 0.00 0.00 0.00                    
#           id6                          0.0098039 0.09901  0.00 0.00 0.00 0.00 0.00               
#           id7                          0.0099010 0.09950  0.00 0.00 0.00 0.00 0.00 0.00          
#           id8                          0.0217391 0.14744  0.00 0.00 0.00 0.00 0.00 0.00 0.00     
#           id9                          0.0031746 0.05634  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
#           id10                         0.0208333 0.14434  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
#           ...                          ...       ...      ...  ...  ...  ...  ...  ...  ...  ... 
#  species  (Intercept)                  0.0792370 0.28149                                         
#  g.1      phyAlectoris_rufa            0.0521124 0.22828                                         
#           phyAlytes_obstetricans       0.0521124 0.22828  0.20                                   
#           phyAnas_platyrhynchos        0.0521124 0.22828  0.92 0.20                              
#           phyAnguis_fragilis           0.0521124 0.22828  0.32 0.20 0.32                         
#           phyApalone_ferox             0.0521124 0.22828  0.65 0.20 0.65 0.32                    
#           phyApalone_mutica            0.0521124 0.22828  0.65 0.20 0.65 0.32 0.99               
#           phyAphrastura_spinicauda     0.0521124 0.22828  0.77 0.20 0.77 0.32 0.65 0.65          
#           phyAraschnia_levana          0.0521124 0.22828  0.00 0.00 0.00 0.00 0.00 0.00 0.00     
#           phyAustropotamobius_italicus 0.0521124 0.22828  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.93
#           phyAythya_ferina             0.0521124 0.22828  0.92 0.20 0.99 0.32 0.65 0.65 0.77 0.00
#           ...                          ...       ...      ...  ...  ...  ...  ...  ...  ...  ... 
#  Residual                              0.0630428 0.25108                 
#  0.00    
#  ...  ...
#  0.00    
#  ...  ...
#          
# Number of obs: 170, groups:  g, 1; species, 120
# 
# Dispersion estimate for gaussian family (sigma^2): 0.063 
# 
# Conditional model:
#                 Estimate Std. Error z value Pr(>|z|)
# (Intercept)     -0.13951    0.12893  -1.082    0.279
# environmentwild  0.01661    0.08469   0.196    0.844

Visualisations

In this dataset, we plot the meta-regression model output.

mr_res <- orchaRd::mod_results(
  fit_phylo_eg2_metafor_mr,
  mod   = "environment",
  group = "id"
)

mr_res

## 3. orchard plot
metafor_mr_p1 <- orchaRd::orchard_plot(
  mr_res,
  mod   = "environment",
  group = "id",
  xlab  = "Effect size",
  angle = 45
) +
#   scale_color_manual(values = "#CDAD00") +
#   scale_fill_manual(values = "#FFD700") +
  theme_classic()

metafor_mr_p1

## for random effect
ci_var_mr <- confint(fit_phylo_eg2_metafor_mr, level = 0.95)

tbl_mr <- tibble::as_tibble(ci_var_mr, rownames = "term")

label_map_mr <- c("Effect_id", "Species", "Phylo")

tbl_var_mr <- tbl_mr |>
  filter(str_detect(term, "^sigma\\^2\\.")) |>
  mutate(
    idx = as.integer(str_match(term, "\\.(\\d+)$")[, 2]),
    label = case_when(
      idx == 1 ~ "Effect_id",   # ~1 | id
      idx == 2 ~ "Phylo",       # ~1 | phy
      idx == 3 ~ "Species"      # ~1 | species
    ),
    label = factor(label, levels = label_map_mr)
  )

metafor_mr_p2 <- ggplot(tbl_var_mr, aes(x = label, y = estimate)) +
  geom_point(size = 3.0, color = "#528B8B") +
  geom_errorbar(aes(ymin = ci.lb, ymax = ci.ub), width = 0.25, color = "#528B8B") +
  coord_flip() +
  labs(x = NULL, y = "Variance (95% CI)") +
  theme_classic(base_size = 12)

metafor_eg2_1 <- metafor_mr_p1 / metafor_mr_p2
metafor_eg2_1

get_variables(phylo_eg2_brms_mr)

## fixed effects
fixed_effects_samples_brms <- phylo_eg2_brms_mr |>
  spread_draws(b_Intercept)
fixed_effects_samples_brms <- fixed_effects_samples_brms |>
  pivot_longer(cols = starts_with("b_"), 
               names_to = ".variable", 
               values_to = ".value")

brms_p1 <- ggplot(fixed_effects_samples_brms, aes(x = .value, y = .variable)) +
  stat_halfeye(
    normalize = "xy", 
    point_interval = "mean_qi", 
    fill           = "#FFD700",
    color          = "#CDAD00"
  ) +
  geom_vline(xintercept = 0, linetype = "dashed", color = "#005") +
  scale_x_continuous(breaks = seq(-1.0, 2.0, 1)) + 
  labs(y = "Overall effect"
  ) +
  theme_classic()
head(fixed_effects_samples_brms)

## random effects
random_effects_samples_brms <- phylo_eg2_brms_mr |>
  spread_draws(sd_phy__Intercept, sd_species__Intercept, sigma) |>
  as_tibble()

random_effects_samples_brms <- random_effects_samples_brms |>
  pivot_longer(
    cols = c(sd_phy__Intercept, sd_species__Intercept, sigma),
    names_to = ".variable",
    values_to = ".value"
  ) |>
  mutate(
    .value = .value^2, # sd -> variance
    .variable = case_when(
      .variable == "sd_phy__Intercept" ~ "Phylo",
      .variable == "sd_species__Intercept" ~ "Species",
      .variable == "sigma" ~ "Effect_id" # remember use "sigma" as we set sd of id = 1
    ),
    .variable = factor(.variable,
                       levels = c("Effect_id", "Species", "Phylo"))
  )

head(random_effects_samples_brms)

brms_p2 <- ggplot(random_effects_samples_brms, aes(x = .value, y = .variable)) +
  stat_halfeye(
    normalize = "xy",
    point_interval = "mean_qi", 
    fill = "#98F5FF",
    color = "#528B8B"
  ) +
  geom_vline(xintercept = 0, linetype = "dashed", color = "#005") +
　# scale_x_continuous(breaks = seq(0, 0.08, 0.02), limits = c(0, 0.08)) + 
  labs(
    # title = "Posterior distributions of random effects - brms",
    y = "Random effects (variance)"
  ) +
  theme_classic() 

brms_eg2 <- brms_p1 / brms_p2

Spatial meta-analysis

How to make spatial correlation matrix?

In spatial meta-analysis, we model the residual similarity between effect sizes as a function of the pairwise spatial separation among observations. In many applications, each effect size is associated with a sampling location defined by latitude and longitude, from which pairwise geographic distances can be calculated. More generally, however, the key input is a matrix describing the pairwise separation among effect sizes, which may reflect straight-line geographic distance or other biologically meaningful measures of spatial separation, such as habitat connectivity, landscape resistance, barriers to movement, or distances among broader spatial units when exact sampling coordinates are unavailable. Depending on the spatial correlation structure assumed (for example, exponential, Gaussian, or spherical), this pairwise distance matrix is then transformed into a correlation matrix that describes the expected similarity between effect sizes based on their spatial proximity. In this context, the kernel is the mathematical function that maps pairwise distances to expected correlations, thereby specifying how quickly similarity declines with distance. In the examples below, we use pairwise distances calculated from exact sampling coordinates.

Step 1. Prepare data:

Ensure each effect size has geographic coordinates, a unique identifier and projected coordinates in km

Each effect size should have geographic coordinates (latitude and longitude) indicating where the data were collected. Normally, these data would be extracted from the study and/or imputed/approximated if unreported. Because latitude and longitude are measured in degrees, we first project them to a planar coordinate reference system (CRS) and then convert the resulting coordinates to kilometres. This allows us to compute meaningful Euclidean distances.

library(tidyverse)
library(sf)

toy_sites <- tibble(
  effect_id = 1:5,
  latitude = c(34.05, 36.16, 40.71, 34.05, 37.77),
  longitude = c(-118.24, -115.15, -74.00, -118.25, -122.42)
  )

# convert to an sf object (WGS84 longitude / latitude)
toy_sites_sf <- st_as_sf(
  toy_sites, 
  coords = c("longitude", "latitude"), 
  crs = 4326)

# project to a planar coordinate system and convert to kilometers
# EPSG:3857 is convenient for small-scale distance calculations
# for real analyses, consider using an appropriate local projection
toy_sites_sf_proj <- st_transform(toy_sites_sf, crs = 3857)

xy_m <- st_coordinates(toy_sites_sf_proj)
toy_sites$x_km <- xy_m[,1] / 1000
toy_sites$y_km <- xy_m[,2] / 1000
toy_sites
# A tibble: 5 × 5
#   effect_id latitude longitude    x_km  y_km
#       <int>    <dbl>     <dbl>   <dbl> <dbl>
# 1         1     34.0     -118. -13162. 4036.
# 2         2     36.2     -115. -12818. 4323.
# 3         3     40.7      -74   -8238. 4970.
# 4         4     34.0     -118. -13164. 4036.
# 5         5     37.8     -122. -13628. 4547.

The projected coordinates (x_km, y_km) can be negative because they represent positions relative to the origin of the coordinate system. We first assign a geographic CRS (EPSG:4326) to declare that the coordinates are in latitude and longitude.
Then we transform the coordinates to a planar CRS (EPSG:3857) suitable for distance calculations, and convert the resulting x and y coordinates from meters to kilometres.

Step 2. Compute pairwise Euclidean distance matrix

Once each effect size has projected coordinates in kilometres, we can compute the Euclidean distance between all pairs of sampling locations. These distances form the basis of spatial correlation structures in spatial meta-analysis.

# extract projected coordinates
coords_km <- as.matrix(toy_sites[, c("x_km", "y_km")])

# compute pairwise Euclidean distances (in km)
D <- as.matrix(dist(coords_km))

# assign row and column names using effect_id
rownames(D) <- toy_sites$effect_id
colnames(D) <- toy_sites$effect_id

D
#           1         2        3           4         5
# 1    0.000000  448.0744 5012.587    1.113195  691.4604
# 2  448.074446    0.0000 4626.263  448.929589  839.8075
# 3 5012.586560 4626.2633    0.000 5013.680258 5406.6368
# 4    1.113195  448.9296 5013.680    0.000000  690.7118
# 5  691.460407  839.8075 5406.637  690.711777    0.0000

Each element \(D_{ij}\) is the straight-line distance (in kilometers) between effect sizes i and j.

Why this matrix matters?

Spatial meta-analytic models assume that residual similarity between effect sizes decays with geographic distance. The distance matrix \(\mathbf{D}\) encodes this information.

Differences in implementation and requirements between \(R\) packages:

In metafor, we supply \(\mathbf{D}\) directly, and metafor transforms it into a residual correlation matrix using a specified kernel, such as a linear, exponential, or Gaussian kernel(more info: here).
In brms, we supply the geographic coordinates, and brms internally computes the pairwise distances and applies the corresponding distance-based kernel.
In glmmTMB, we supply geographic coordinates, and it models spatial dependence through a continuous spatial random field whose covariance structure is implicitly governed by distance rather than by an explicit user-supplied distance matrix.

Examples from real meta-analyses

1. Grau-Andrés et al. (2024)

The dataset comes from Grau-Andrés et al. (2024) and contains 2361 effect sizes of Hedges’ d. Each effect size is associated with a geographic location given by latitude and longitude. These locations represent sampling sites at which the original biological measurement were collected.

The original data provide latitude and longitude in degrees. We first project these coordinates to a planar CRS and convert them to kilometres. We then compute the pairwise Euclidean distance matrix among all effect sizes based on their projected coordinates.

dat_Roger <- read.csv(here("data", "Roger_etal_2024", "Roger_etal_2024.csv"))

dat_Roger <- dat_Roger |>
  filter(!is.na(longitude))
coords2 <- cbind(dat_Roger$longitude, dat_Roger$latitude)

# project to a planar coordinate system and convert to kilometers
dat_sf2 <- st_as_sf(dat_Roger, coords = c("longitude", "latitude"), crs = 4326)
dat_sf2_proj <- st_transform(dat_sf2, crs = 3857)

coords_m <- st_coordinates(dat_sf2_proj)
dat_Roger$x_km <- coords_m[,1] / 1000
dat_Roger$y_km <- coords_m[,2] / 1000

coords_km <- cbind(dat_Roger$x_km, dat_Roger$y_km)
dist_matrix_euclid2 <- as.matrix(dist(coords_km))


dat_Roger$effect_id <- seq_len(nrow(dat_Roger))

dat_Roger$effect_id <- factor(
  as.character(dat_Roger$effect_id),
  levels = mixedsort(unique(as.character(dat_Roger$effect_id)))
  )

# row and column names
rownames(dist_matrix_euclid2) <- dat_Roger$effect_id
colnames(dist_matrix_euclid2) <- dat_Roger$effect_id

dat_Roger <- dat_Roger |>
  group_by(latitude, longitude) |>
  mutate(site_id = cur_group_id()) |>
  ungroup()
names(dat_Roger)

The exponential kernel assumes that similarity between two effect sizes decreases smoothly as the geographic distance between their location increases. Therefore, nearby sites are strongly correlated, while distant sites are effectively independent.

metafor

In metafor, the spatial process is introduced by specifying struct = "SPEXP" together with either (1) via a supplied distance matrix using random = ~ effect_id|const and dist = list(…), or (2) via coordinates using random = ~ x_km + y_km|const, in which case distances are computed internally.

dat_Roger$const <- 1 # add constant for spatial models

Here, const is a dummy outer factor set to a constant (all ones) so that all effects are allowed to be spatially correlated. If outer were set to a grouping variable (e.g., study ID), spatial correlation would only be modelled within groups (block-diagonal structure). The spatial random field is parametrised by (1) \(\tau^2\), the marginal variance of the spatial process (its amplitude), and (2) \(\rho\), the range/scale parameter controlling how quickly correlation decays with distance under the chosen kernel (we show SPEXP- exponential kernel and also SPGAU- Gaussian process version later). Let \(d_{ij}\) be the distance between locations \(i\) and \(j\). The spatial random effects satisfy \(\mathrm{Cov}(u_i,u_j)=\tau^2\,\mathrm{cor}_{\rho}(d_{ij})\), where \(\mathrm{cor}_{\rho}(d_{ij})\) is the kernel-specific correlation function that decreases with distance, and \(\rho\) controls the rate of decay. Distances can be supplied directly via text{dist=} or computed internally from coordinates (~ x_km + y_km | const); both are equivalent when they imply the same Euclidean distance matrix.

# Use distance matrix
ma_sp1_exp0_mf <- rma.mv(d_Hedges, var_Hedges, 
                 random = list(
                   ~ 1|effect_id, 
                   ~ effect_id|const
                 ), 
                 struct = "SPEXP", 
                 data = dat_Roger,
                 dist = list(Site = dist_matrix_euclid2),
                 test = "t",
                 method = "REML",
                 sparse = TRUE,
                 verbose = TRUE)
summary(ma_sp1_exp0_mf)
# Multivariate Meta-Analysis Model (k = 2361; method: REML)
# 
#     logLik    Deviance         AIC         BIC        AICc   
# -4034.4852   8068.9704   8076.9704   8100.0360   8076.9874   
# 
# Variance Components:
# 
#             estim    sqrt  nlvls  fixed     factor 
# sigma^2    0.7935  0.8908   2361     no  effect_id 
# 
# outer factor: const      (nlvls = 1)
# inner term:   ~effect_id (nlvls = 2361)
# 
#             estim    sqrt  fixed 
# tau^2      1.2330  1.1104     no 
# rho        0.1744             no 
# 
# Test for Heterogeneity:
# Q(df = 2360) = 21273.5149, p-val < .0001
# 
# Model Results:
# 
# estimate      se     tval    df    pval    ci.lb    ci.ub      
#  -0.3349  0.0644  -5.2006  2360  <.0001  -0.4612  -0.2086  *** 


# Use the coordinates directly (Euclidean distance)
system.time(sp_eg1_exp_mf <- rma.mv(d_Hedges, var_Hedges, 
                              random = list(
                                ~ 1|effect_id,
                                ~ x_km + y_km |const
                              ), 
                              struct = "SPEXP", 
                              data = dat_Roger,
                              sparse = TRUE,
                              verbose = TRUE,
                              method = "REML",
                              test = "t"
                              )
            )

summary(sp_eg1_exp_mf)
# Multivariate Meta-Analysis Model (k = 2361; method: REML)
# 
#     logLik    Deviance         AIC         BIC        AICc   
# -4034.4852   8068.9704   8076.9704   8100.0360   8076.9874   
# 
# Variance Components:
# 
#             estim    sqrt  nlvls  fixed     factor 
# sigma^2    0.7935  0.8908   2361     no  effect_id 
# 
# outer factor: const        (nlvls = 1)
# inner term:   ~x_km + y_km (nlvls = 383)
# 
#             estim    sqrt  fixed 
# tau^2      1.2330  1.1104     no 
# rho        0.1744             no 
# 
# Test for Heterogeneity:
# Q(df = 2360) = 21273.5149, p-val < .0001
# 
# Model Results:
# 
# estimate      se     tval    df    pval    ci.lb    ci.ub      
#  -0.3349  0.0644  -5.2006  2360  <.0001  -0.4612  -0.2086  *** 

confint(sp_eg1_exp_mf)
#         estimate  ci.lb  ci.ub 
# sigma^2   0.7935 0.7257 0.8678 
# sigma     0.8908 0.8519 0.9316 
# 
#       estimate  ci.lb  ci.ub 
# tau^2   1.2330 1.0147 1.4976 
# tau     1.1104 1.0073 1.2238 
# 
#     estimate   ci.lb  ci.ub 
# rho   0.1744 <0.0174 0.8572

Both metafor models (use distance matrix or coordinates directly or Euclidean distance) returned the same output.

brms

In brms, the covariance function of the Gaussian process is controlled by the cov argument. To use an exponential kernel, set cov = "exponential". Note that scale determines whether the input predictors are rescaled (by default, to make the maximum Euclidean distance equal to 1). If the coordinates are already expressed in meaningful units such as kilometres, and you want the GP length-scale parameter to be interpretable in those units, it is advisable to use scale = FALSE.

dat_Roger$effect_id <- as.character(dat_Roger$effect_id)
vcv <- diag(dat_Roger$var_Hedges)

rownames(vcv) <- colnames(vcv) <- dat_Roger$effect_id

fit_sp_eg1_exp_brms <- bf(d_Hedges ~ 1 +
                 # (1|site_id) + # this is site-level random effect
                 (1|gr(effect_id, cov = vcv)) + # this is m (sampling error)
                gp(x_km, y_km, 
                   cov = "exponential",
                   scale = FALSE)
                )

# generate default priors
prior <- default_prior(fit_sp_eg1_exp_brms, 
                           data = dat_Roger, 
                           data2 = list(vcv = vcv),
                           family = gaussian())
prior$prior[5] = "constant(1)" # meta-analysis assumes sampling variance is known so fixing this to 1
prior
 #                            prior     class       coef     group resp dpar nlpar lb ub tag       source
 #          student_t(3, -0.1, 2.5) Intercept                                                     default
 #                           (flat)    lscale                                       0             default
 # inv_gamma(1.494197, 2141.138175)    lscale gpx_kmy_km                            0             default
 #             student_t(3, 0, 2.5)        sd                                       0             default
 #                      constant(1)        sd            effect_id                  0             default <- check this
 #                      constant(1)        sd  Intercept effect_id                  0        (vectorized) <- check this
 #             student_t(3, 0, 2.5)      sdgp                                       0             default
 #             student_t(3, 0, 2.5)      sdgp gpx_kmy_km                            0        (vectorized)
 #             student_t(3, 0, 2.5)     sigma                                       0             default

# fitting model
sp_eg1_exp_brms <- brm(
  formula = fit_sp_eg1_exp_brms,
  data = dat_Roger,
  data2 = list(vcv=vcv),
  chains = 2,
  iter = 6000,
  warmup = 3000,
  prior = prior,
  control = list(adapt_delta = 0.95, max_treedepth = 15)
  )
summary(sp_eg1_exp_brms)
#  Family: gaussian 
#   Links: mu = identity 
# Formula: d_Hedges ~ 1 + (1 | gr(effect_id, cov = vcv)) + gp(x_km, y_km, cov = "exponential", scale = FALSE) 
#    Data: dat_Roger (Number of observations: 2361) 
#   Draws: 2 chains, each with iter = 6000; warmup = 3000; thin = 1;
#          total post-warmup draws = 6000
# 
# Gaussian Process Hyperparameters:
#                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sdgp(gpx_kmy_km)       1.14      0.07     1.00     1.29 1.00     1818     3552
# lscale(gpx_kmy_km)   128.18     26.02    85.59   187.10 1.00     2227     2567
# 
# Multilevel Hyperparameters:
# ~effect_id (Number of levels: 2361) 
#               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept)     1.00      0.00     1.00     1.00   NA       NA       NA
# 
# Regression Coefficients:
#           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept    -0.32      0.09    -0.51    -0.15 1.00     1503     2737
# 
# Further Distributional Parameters:
#       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma     0.97      0.02     0.92     1.01 1.00     3047     4199

glmmTMB

In glmmTMB, the exponential spatial field is specified by exp(pos + 0 | const), where pos is an index constructed from the spatial coordinates.

dat_Roger$effect_id <- factor(dat_Roger$effect_id)
dat_Roger$site_id <- factor(dat_Roger$site_id)

VCV <- diag(dat_Roger$var_Hedges, nrow = nrow(dat_Roger)) 
rownames(VCV)<- colnames(VCV)<- dat_Roger$effect_id
VCV[1:5, 1:5]

dat_Roger$pos <- numFactor(dat_Roger$x_km, dat_Roger$y_km)

system.time(
  sp_eg1_exp_tmb <- glmmTMB(d_Hedges ~ 1 
                   + equalto(0+effect_id|const, VCV)
                   # + (1|site_id)
                   + exp(pos+0|const),
                   data = dat_Roger, 
                   REML=TRUE)
  )
 #   user  system elapsed 
 # 45.570   1.369  47.530 

summary(sp_eg1_exp_tmb)　
# Family: gaussian  ( identity )
# Formula:          d_Hedges ~ 1 + equalto(0 + effect_id | const, VCV) + exp(pos +      0 | const)
# Data: dat_Roger
# 
#       AIC       BIC    logLik -2*log(L)  df.resid 
#    9088.5    9111.5   -4540.2    9080.5      2357 
# 
# Random effects:
# 
# Conditional model:
#  Groups   Name                                     Variance  Std.Dev. Corr                                            
#  const    effect_id1                                0.202208 0.44968                                                  
#           effect_id10                               0.166676 0.40826  0.00                                            
#           effect_id100                              0.225770 0.47515  0.00 0.00                                       
#           effect_id1000                             0.197994 0.44497  0.00 0.00 0.00                                  
#           effect_id1001                             3.676968 1.91754  0.00 0.00 0.00 0.00                             
#           effect_id1002                             0.315616 0.56180  0.00 0.00 0.00 0.00 0.00                        
#           effect_id1003                             0.305460 0.55268  0.00 0.00 0.00 0.00 0.00 0.00                   
#           effect_id1004                             0.313426 0.55984  0.00 0.00 0.00 0.00 0.00 0.00 0.00              
#           effect_id1005                             0.413540 0.64307  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00         
#           effect_id1006                             0.325726 0.57072  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00    
#           ...                                      ...       ...      ...  ...  ...  ...  ...  ...  ...  ...  ...  ...
#  const.1  pos(16262.6644099893,-5204.51951303405)   1.866150 1.36607                                                  
#           pos(16280.4755285163,-5116.14629455727)   1.866150 1.36607  0.00                                            
#           pos(-7965.96086752978,-5019.4733863405)   1.866150 1.36607  0.00 0.00                                       
#           pos(-7959.34359171906,-4693.06364429579)  1.866150 1.36607  0.00 0.00 0.00                                  
#           pos(-7960.45678662699,-4691.63535918108)  1.866150 1.36607  0.00 0.00 0.00 0.00                             
#           pos(-7096.61753807119,-4685.92422115392)  1.866150 1.36607  0.00 0.00 0.00 0.00 0.00                        
#           pos(-7992.73943895704,-4607.717759142)    1.866150 1.36607  0.00 0.00 0.00 0.00 0.00 0.00                   
#           pos(-7903.68384632242,-4579.4258128701)   1.866150 1.36607  0.00 0.00 0.00 0.00 0.00 0.00 0.00              
#           pos(16237.0609271069,-4578.0132445546)    1.866150 1.36607  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00         
#           pos(16304.594788628,-4572.36489569299)    1.866150 1.36607  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00    
#           ...                                      ...       ...      ...  ...  ...  ...  ...  ...  ...  ...  ...  ...
#  Residual                                           1.864305 1.36540                                                  
# Number of obs: 2361, groups:  const, 1
# 
# Dispersion estimate for gaussian family (sigma^2): 1.86 
# 
# Conditional model:
#             Estimate Std. Error z value Pr(>|z|)    
# (Intercept) -0.40936    0.08139  -5.029 4.92e-07 ***

head(confint(sp_eg1_exp_tmb), 10) 
#                                                               2.5 %     97.5 %   Estimate
# (Intercept)                                              -0.5688798 -0.2498315 -0.4093556
# Std.Dev.pos(16262.6644099893,-5204.51951303405)|const.1   1.2417049  1.5028930  1.3660708
# Std.Dev.pos(16280.4755285163,-5116.14629455727)|const.1   1.2417049  1.5028930  1.3660708
# Std.Dev.pos(-7965.96086752978,-5019.4733863405)|const.1   1.2417049  1.5028930  1.3660708
# Std.Dev.pos(-7959.34359171906,-4693.06364429579)|const.1  1.2417049  1.5028930  1.3660708
# Std.Dev.pos(-7960.45678662699,-4691.63535918108)|const.1  1.2417049  1.5028930  1.3660708
# Std.Dev.pos(-7096.61753807119,-4685.92422115392)|const.1  1.2417049  1.5028930  1.3660708
# Std.Dev.pos(-7992.73943895704,-4607.717759142)|const.1    1.2417049  1.5028930  1.3660708
# Std.Dev.pos(-7903.68384632242,-4579.4258128701)|const.1   1.2417049  1.5028930  1.3660708
# Std.Dev.pos(16237.0609271069,-4578.0132445546)|const.1    1.2417049  1.5028930  1.3660708

sigma(sp_eg1_exp_tmb) 
# 1.365396
sp_eg1_exp_tmb_varcor <- VarCorr(sp_eg1_exp_tmb)$cond
exp(sp_eg1_exp_tmb$fit$par)
#   betadisp (effect_id)     theta (tau/tau^2)      theta (rho)
#   1.3653956               1.3660708 (1.866149)   0.2476221

Each model estimates two key aspects of the spatial structure.

The first is the spatial variance (\(\rho^2\)), which quantifies how much of the between effect size variability is organised by geography. If this variance is close to zero, spatial structure is negligible. If it is large, nearby effect sizes tend to be much more similar than distant ones.

The second is the range of spatial correlation, which describes how quickly this similarity decays with distance. Short ranges imply that only very nearby locations are correlated, while long ranges indicate broad scale spatial trends.

It is important to note that metafor and brms use different parametrisations for this range. In metafor, correlation is expressed as \(exp(− \rho * distance)\), whereas in brms it is \(exp(− distance / lscale)\). When distances are measured in kilometres, these are approximately related by \(\rho \approx 1 / lscale\). The numerical values therefore should not be compared directly across packages.

Because range parameters are often weakly identified, their uncertainty is typically large. Interpretation should focus more on whether spatial structure exists and on its approximate scale rather than on the exact numeric estimate.

In all three implementations, sampling error is treated as known through the reported variances of Hedges’ d. In brms and glmmTMB, this is enforced explicitly by supplying the variance covariance matrix of the effect sizes, which ensures that within effect size uncertainty is not re estimated. In metafor, the same role is played by the vi argument.

All three models, metafor, brms, and glmmTMB, give a consistent output. There is strong spatial autocorrelation in effect sizes. Nearby locations tend to show much more similar Hedges’ d values than would be expected from sampling error alone. This means that a substantial fraction of the apparent heterogeneity in the dataset is structured geographically rather than being random noise. The estimated spatial range is large, on the order of hundreds of kilometres, indicating that the spatial pattern is driven by broad scale geographic gradients rather than by very local site effects. Although the exact range is uncertain, all models agree that spatial dependence extends over long distances. After accounting for this spatial structure, the overall mean effect (\(\beta_0\)) remains negative, around −0.3 to −0.4 across all three packges. This shows that the global negative effect is not an artefact of spatial clustering.

Figures

TBU

metafor

sp_eg1_gau_mf <- rma.mv(d_Hedges, var_Hedges, 
                         random = list(
                          ~ 1|effect_id,
                          ~ x_km + y_km |const
                          ), 
                        struct = "SPGAU", 
                        data = dat_Roger,
                        sparse = TRUE,
                        verbose = TRUE,
                        method = "REML",
                        test = "t"
                        )

summary(sp_eg1_gau_mf)
# Multivariate Meta-Analysis Model (k = 2361; method: REML)
# 
# logLik    Deviance         AIC         BIC        AICc   
# -4034.4772   8068.9544   8076.9544   8100.0201   8076.9714   
# 
# Variance Components:
#   
#   estim    sqrt  nlvls  fixed     factor 
# sigma^2    0.7934  0.8907   2361     no  effect_id 
# 
# outer factor: const        (nlvls = 1)
# inner term:   ~x_km + y_km (nlvls = 383)
# 
# estim    sqrt  fixed 
# tau^2      1.2308  1.1094     no 
# rho        0.0377             no 
# 
# Test for Heterogeneity:
#   Q(df = 2360) = 21273.5149, p-val < .0001
# 
# Model Results:
#   
#   estimate      se     tval    df    pval    ci.lb    ci.ub      
# -0.3340  0.0642  -5.2036  2360  <.0001  -0.4599  -0.2082  ***

brms

model is running -

dat_Roger$effect_id <- as.character(dat_Roger$effect_id)
vcv <- diag(dat_Roger$var_Hedges)

rownames(vcv) <- colnames(vcv) <- dat_Roger$effect_id

fit_sp_eg1_gau_brms <- bf(d_Hedges ~ 1 +
                            # (1|site_id) + # this is site-level random effect
                            (1|gr(effect_id, cov = vcv)) + # this is m (sampling error)
                            gp(x_km, y_km, 
                               cov = "exp_quad",
                               scale = FALSE)
)

# generate default priors
prior <- default_prior(fit_sp_eg1_gau_brms, 
                       data = dat_Roger, 
                       data2 = list(vcv = vcv),
                       family = gaussian())
prior$prior[5] = "constant(1)" # meta-analysis assumes sampling variance is known so fixing this to 1
prior

# fitting model
sp_eg1_gau_brms <- brm(
  formula = fit_sp_eg1_gau_brms,
  data = dat_Roger,
  data2 = list(vcv=vcv),
  chains = 2,
  iter = 6000,
  warmup = 3000,
  prior = prior,
  control = list(adapt_delta = 0.95, max_treedepth = 15)
)
summary(sp_eg1_gau_brms)

glmmTMB

VCV <- diag(dat_Roger$var_Hedges, nrow = nrow(dat_Roger)) 
rownames(VCV)<- colnames(VCV)<- dat_Roger$effect_id
VCV[1:5, 1:5]

dat_Roger$pos <- numFactor(dat_Roger$x_km, dat_Roger$y_km)

system.time(
  sp_eg1_gau_tmb <- glmmTMB(d_Hedges ~ 1 
                            + equalto(0+effect_id|const, VCV)
                            # + (1|site_id)
                            + exp(pos+0|const),
                            data = dat_Roger, 
                            REML=TRUE)
)
#   user  system elapsed 
# 45.570   1.369  47.530 

summary(sp_eg1_gau_tmb)　
# Family: gaussian  ( identity )
# Formula:          d_Hedges ~ 1 + equalto(0 + effect_id | const, VCV) + exp(pos +      0 | const)
# Data: dat_Roger
# 
# AIC       BIC    logLik -2*log(L)  df.resid 
# 8084.7    8107.8   -4038.4    8076.7      2357 
# 
# Random effects:
#   
#   Conditional model:
#   Groups   Name                                     Variance  Std.Dev. Corr                    
# const    effect_id1                                0.202208 0.44968                          
# effect_id2                                0.166676 0.40826  0.00                    
# effect_id3                                0.225770 0.47515  0.00 0.00               
# effect_id4                                0.197994 0.44497  0.00 0.00 0.00          
# effect_id5                                3.676968 1.91754  0.00 0.00 0.00 0.00     
# effect_id6                                0.315616 0.56180  0.00 0.00 0.00 0.00 0.00
# effect_id7                                0.305460 0.55268  0.00 0.00 0.00 0.00 0.00
# effect_id8                                0.313426 0.55984  0.00 0.00 0.00 0.00 0.00
# effect_id9                                0.413540 0.64307  0.00 0.00 0.00 0.00 0.00
# effect_id10                               0.325726 0.57072  0.00 0.00 0.00 0.00 0.00
# ...                                      ...       ...      ...  ...  ...  ...  ... 
# const.1  pos(16262.6644099893,-5204.51951303405)   1.232997 1.11040                          
# pos(16280.4755285163,-5116.14629455727)   1.232997 1.11040  0.00                    
# pos(-7965.96086752978,-5019.4733863405)   1.232997 1.11040  0.00 0.00               
# pos(-7959.34359171906,-4693.06364429579)  1.232997 1.11040  0.00 0.00 0.00          
# pos(-7960.45678662699,-4691.63535918108)  1.232997 1.11040  0.00 0.00 0.00 0.00     
# pos(-7096.61753807119,-4685.92422115392)  1.232997 1.11040  0.00 0.00 0.00 0.00 0.00
# pos(-7992.73943895704,-4607.717759142)    1.232997 1.11040  0.00 0.00 0.00 0.00 0.00
# pos(-7903.68384632242,-4579.4258128701)   1.232997 1.11040  0.00 0.00 0.00 0.00 0.00
# pos(16237.0609271069,-4578.0132445546)    1.232997 1.11040  0.00 0.00 0.00 0.00 0.00
# pos(16304.594788628,-4572.36489569299)    1.232997 1.11040  0.00 0.00 0.00 0.00 0.00
# ...                                      ...       ...      ...  ...  ...  ...  ... 
# Residual                                           0.793492 0.89078                          
# 
# 0.00                   
# 0.00 0.00              
# 0.00 0.00 0.00         
# 0.00 0.00 0.00 0.00    
# ...  ...  ...  ...  ...
# 
# 0.00                   
# 0.00 0.00              
# 0.00 0.00 0.00         
# 0.00 0.00 0.00 0.00    
# ...  ...  ...  ...  ...
# 
# Number of obs: 2361, groups:  const, 1
# 
# Dispersion estimate for gaussian family (sigma^2): 0.793 
# 
# Conditional model:
#   Estimate Std. Error z value Pr(>|z|)    
# (Intercept) -0.33493    0.06444  -5.197 2.02e-07 ***

sp_eg1_gau_tmb$fit$par
# betadisp      theta      theta 
# -0.1156562  0.1047238 -1.7463695 

confint(sp_eg1_gau_tmb)

TBU

2. Scholer et al. (2021)

We used the dataset accompanying Scholer et al. (2020), which was assembled for a meta-analysis of global avian survival across species and latitude. The dataset includes 949 estimates from 204 studies, with each row containing a survival estimate, its standard error, study and species identifiers, geographic coordinates, and several climatic and life-history variables. In our analyses, logit_survival was treated as the effect size (yi), and the known sampling variance was calculated as vi = se^2.

dat_scholer2020 <- read.csv(here("data","Scholer_2020", "Scholer_2020.csv"))

head(dat_scholer2020)
# logit_survival -> yi, 1/se -> vi^2

dat_scholer2020 <- dat_scholer2020 |> 
  mutate(vi = se^2)

dat_scholer2020$const <- 1 # add constant for spatial models
dat_scholer2020$effect_id <- seq_len(nrow(dat_scholer2020))

coords2 <- cbind(dat_scholer2020$long, dat_scholer2020$lat)

# project to a planar coordinate system and convert to kilometers
dat_sf2 <- st_as_sf(dat_scholer2020, coords = c("long", "lat"), crs = 4326)
dat_sf2_proj <- st_transform(dat_sf2, crs = 3857)

coords_m <- st_coordinates(dat_sf2_proj)
dat_scholer2020$x_km <- coords_m[,1] / 1000
dat_scholer2020$y_km <- coords_m[,2] / 1000

coords_km <- cbind(dat_scholer2020$x_km, dat_scholer2020$y_km)
dist_matrix_euclid2 <- as.matrix(dist(coords_km))

# row and column names
rownames(dist_matrix_euclid2) <- dat_scholer2020$effect_id
colnames(dist_matrix_euclid2) <- dat_scholer2020$effect_id

dat_scholer2020 <- dat_scholer2020 |>
  group_by(lat, long) |>
  mutate(site_id = cur_group_id()) |>
  ungroup()
names(dat_scholer2020)
dat_scholer2020 <- dat_scholer2020 |>
  mutate(site_id = as.factor(site_id),
         effect_id = as.factor(effect_id))
nrow(dat_scholer2020)
# [1] 949

cat_vars <- dat_scholer2020 |>
  dplyr::select(where( ~ is.factor(.x) || is.character(.x)))

n_levels <- cat_vars |> 
  dplyr::summarise(across(
    everything(),
    ~ dplyr::n_distinct(.x)
  )) |> 
  tidyr::pivot_longer(
    cols = everything(),
    names_to = "variable",
    values_to = "n_levels"
  )

n_levels
# A tibble: 15 × 2
# variable     n_levels
# <chr>           <int>
# 1 ref               205
# 2 order              15
# 3 species           636
# 4 common            636
# 5 tip_label         625
# 6 migration           2
# 7 sitename          207
# 8 country            46
# 9 hemisphere          2
# 10 passerine           2
# 11 realm               2
# 12 study_type          4
# 13 auk_used            2
# 14 geomean_used        9
# 15 effect_id         949
# 16 site_id           454

It looks multiple sampling locations and effect sizes per studies…

site_coords <- dat_scholer2020 |>
  distinct(site_id, x_km, y_km) |>
  arrange(site_id)

coords_site <- as.matrix(site_coords[, c("x_km", "y_km")])
dist_site <- as.matrix(dist(coords_site))

rownames(dist_site) <- as.character(site_coords$site_id)
colnames(dist_site) <- as.character(site_coords$site_id)
dist_site

dat_scholer2020$site_id <- as.character(dat_scholer2020$site_id)

We initially included site_id as a random effect, but the model yielded a zero estimate for it. As a result, we removed it from the following models.

metafor

sp_exp_metafor_eg2 <- rma.mv(
  yi = logit_survival,
  V  = vi,                      
  random = list(
    ~ 1 | effect_id,          
    ~ 1 | ref,                   
    ~ effect_id | const           
  ),
  struct = "SPEXP",
  data = dat_scholer2020,
  dist = list(site = dist_site), 
  method = "REML",
  test = "t",
  sparse = TRUE,
  verbose = TRUE
)

summary(sp_exp_metafor_eg2)
# Multivariate Meta-Analysis Model (k = 949; method: REML)
# 
# Variance Components:
# 
#             estim    sqrt  nlvls  fixed     factor 
# sigma^2.1  0.2255  0.4749    949     no  effect_id 
# sigma^2.2  0.4471  0.6687    205     no        ref 
# 
# outer factor: const    (nlvls = 1)
# inner term:   ~site_id (nlvls = 454)
# 
#               estim    sqrt  fixed 
# tau^2        0.0207  0.1438     no 
# rho        737.8164             no 
# 
# Test for Heterogeneity:
# Q(df = 948) = 974007.4599, p-val < .0001
# 
# Model Results:
# 
# estimate      se     tval   df    pval   ci.lb   ci.ub      
#   0.6540  0.0614  10.6455  948  <.0001  0.5334  0.7746  *** 

confint(ma_sp2_exp)
#           estimate  ci.lb  ci.ub 
# sigma^2.1   0.2255 0.2025 0.2519 
# sigma.1     0.4749 0.4500 0.5019 
# 
#           estimate  ci.lb  ci.ub 
# sigma^2.2   0.4471 0.3308 0.5947 
# sigma.2     0.6687 0.5751 0.7712 
# 
#       estimate  ci.lb    ci.ub 
# tau^2   0.0207 0.0000 >10.0000 
# tau     0.1438 0.0000  >3.1623 
# 
#     estimate    ci.lb      ci.ub 
# rho 737.8164 <73.7816 >7378.1637

brms

dat_scholer2020$ref_id <- as.integer(factor(dat_scholer2020$ref))
dat_scholer2020$effect_id <- as.character(dat_scholer2020$effect_id)
vcv <- diag(dat_scholer2020$vi)

rownames(vcv) <- colnames(vcv) <- dat_scholer2020$effect_id

fit_gp_brms <- bf(logit_survival ~ 1 +
                    (1|ref_id) + # this is site-level random effect
                    (1|gr(effect_id, cov = vcv)) + # this is m (sampling error)
                    gp(x_km, y_km, 
                       cov = "exponential",
                       scale = FALSE)
)

# generate default priors
prior <- default_prior(fit_gp_brms, 
                       data = dat_scholer2020, 
                       data2 = list(vcv = vcv),
                       family = gaussian())
prior$prior[5] = "constant(1)" # meta-analysis assumes sampling variance is known so fixing this to 1
prior

sp_exp_brms_eg2 <- brm(
        formula = fit_gp_brms,
        data = dat_scholer2020,
        data2 = list(vcv = vcv),
        prior = prior,
        iter = 6000, 
        warmup = 3000,  
        chains = num_chains,
        backend = "cmdstanr",
        threads = threading(threads_per_chain), 
        control = list(adapt_delta = 0.95, max_treedepth = 15)
        )

summary(sp_exp_brms_eg2)
# Family: gaussian 
#   Links: mu = identity 
# Formula: logit_survival ~ 1 + (1 | ref_id) + (1 | gr(effect_id, cov = vcv)) + gp(x_km, y_km, cov = "exponential", scale = FALSE) 
#    Data: dat_scholer2020 (Number of observations: 949) 
#   Draws: 2 chains, each with iter = 6000; warmup = 3000; thin = 1;
#          total post-warmup draws = 6000
# 
# Gaussian Process Hyperparameters:
#                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sdgp(gpx_kmy_km)       0.18      0.09     0.02     0.39 1.00      588      998
# lscale(gpx_kmy_km)  4062.02  11170.84   495.01 21405.11 1.00     1562     2126
# 
# Multilevel Hyperparameters:
# ~effect_id (Number of levels: 949) 
#               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept)     1.00      0.00     1.00     1.00   NA       NA       NA
# 
# ~ref_id (Number of levels: 205) 
#               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept)     0.67      0.05     0.57     0.77 1.00     1520     2637
# 
# Regression Coefficients:
#           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept     0.64      0.10     0.45     0.82 1.00     1854     2469
# 
# Further Distributional Parameters:
#       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma     0.48      0.01     0.45     0.50 1.00     3887     4592
# 
# Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
# and Tail_ESS are effective sample size measures, and Rhat is the potential
# scale reduction factor on split chains (at convergence, Rhat = 1).

glmmTMB

TBU

The metafor and brms implementations produced very similar substantive results. In both models, the estimated intercept was about 0.65, with broadly overlapping uncertainty intervals. Among-study heterogeneity was also highly consistent across packages: metafor estimated a study-level variance of 0.45, while brms estimated standard deviation as 0.68 (var = 0.462). At the effect-size level, metafor estimated a variance of 0.23. In the analogous brms model, the residual standard deviation (sigma) was 0.48, giving a closely comparable estimate of effect-size-level variation.

The spatial term was smaller in magnitude, with metafor estimating \(\tau^2\) and brms estimating sdgp at about 0.18 (sdgp^2 = 0.032). However, the spatial range parameter was very imprecisely estimated in both implementations, suggesting limited information about the exact distance scale of spatial autocorrelation.

This implies that average adult survival was positive on the logit scale, while also varying appreciably among studies and locations. The spatial component suggests that geographically proximate populations may show somewhat similar survival values, although the exact scale of that dependence is not estimated very precisely in this example.

metafor

sp_gau_metafor_eg2 <- rma.mv(
  yi = logit_survival,
  V  = vi,                      
  random = list(
    ~ 1 | effect_id,          
    ~ 1 | ref,                   
    ~ site_id | const           
  ),
  struct = "SPGAU",
  data = dat_scholer2020,
  dist = list(site = dist_site), 
  method = "REML",
  test = "t",
  sparse = TRUE,
  verbose = TRUE
)

summary(sp_gau_metafor_eg2)
# Multivariate Meta-Analysis Model (k = 949; method: REML)
# 
#    logLik   Deviance        AIC        BIC       AICc   
# -824.5113  1649.0225  1659.0225  1683.2943  1659.0862   
# 
# Variance Components:
# 
#             estim    sqrt  nlvls  fixed     factor 
# sigma^2.1  0.2297  0.4793    949     no  effect_id 
# sigma^2.2  0.4578  0.6766    205     no        ref 
# 
# outer factor: const    (nlvls = 1)
# inner term:   ~site_id (nlvls = 454)
# 
#                estim    sqrt  fixed 
# tau^2         0.0202  0.1420     no 
# rho        4713.8765             no 
# 
# Test for Heterogeneity:
# Q(df = 948) = 974007.4599, p-val < .0001
# 
# Model Results:
# 
# estimate      se    tval   df    pval   ci.lb   ci.ub      
#   0.6466  0.0830  7.7876  948  <.0001  0.4837  0.8096  *** 

confint(sp_gau_metafor_eg2)
#           estimate  ci.lb  ci.ub 
# sigma^2.1   0.2297 0.2069 0.2559 
# sigma.1     0.4793 0.4548 0.5059 
# 
#           estimate  ci.lb  ci.ub 
# sigma^2.2   0.4578 0.3435 0.6054 
# sigma.2     0.6766 0.5861 0.7781 
# 
#       estimate  ci.lb    ci.ub 
# tau^2   0.0202 0.0000 >10.0000 
# tau     0.1420 0.0000  >3.1623 
# 
#      estimate     ci.lb       ci.ub 
# rho 4713.8765 <471.3877 >47138.7654

brms

fit_gp_brms <- bf(logit_survival ~ 1 +
                    (1|ref) + # this is site-level random effect
                    (1|gr(effect_id, cov = vcv)) + # this is m (sampling error)
                    gp(x_km, y_km, 
                       cov = "exp_quad",
                       scale = FALSE)
)

# generate default priors
prior <- default_prior(fit_gp_brms, 
                       data = dat_scholer2020, 
                       data2 = list(vcv = vcv),
                       family = gaussian())
prior$prior[5] = "constant(1)" # meta-analysis assumes sampling variance is known so fixing this to 1
prior

sp_gau_brms_eg2 <- brm(
  formula = fit_gp_brms,
  data = dat_scholer2020,
  data2 = list(vcv = vcv),
  prior = prior,
  iter = 8000, 
  warmup = 4000,  
  chains = num_chains,
  backend = "cmdstanr",
  threads = threading(threads_per_chain), 
  control = list(adapt_delta = 0.95, max_treedepth = 15)
)

summary(sp_gau_brms_eg2)
#  Family: gaussian 
#   Links: mu = identity 
# Formula: logit_survival ~ 1 + (1 | ref) + (1 | gr(effect_id, cov = vcv)) + gp(x_km, y_km, cov = "exp_quad", scale = FALSE) 
#    Data: dat_scholer2020 (Number of observations: 949) 
#   Draws: 2 chains, each with iter = 8000; warmup = 4000; thin = 1;
#          total post-warmup draws = 8000
# 
# Gaussian Process Hyperparameters:
#                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sdgp(gpx_kmy_km)       0.16      0.11     0.02     0.42 1.00      836      969
# lscale(gpx_kmy_km)  3465.74  13257.32   365.32 19298.85 1.00      242      525
# 
# Multilevel Hyperparameters:
# ~effect_id (Number of levels: 949) 
#               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept)     1.00      0.00     1.00     1.00   NA       NA       NA
# 
# ~ref (Number of levels: 205) 
#               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(Intercept)     0.68      0.05     0.58     0.78 1.00     2069     3664
# 
# Regression Coefficients:
#           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept     0.65      0.10     0.47     0.82 1.00     2347     1774
# 
# Further Distributional Parameters:
#       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma     0.48      0.01     0.45     0.51 1.00     4263     5271
# 
# Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
# and Tail_ESS are effective sample size measures, and Rhat is the potential
# scale reduction factor on split chains (at convergence, Rhat = 1).

glmmTMB

TBU

The Gaussian-process kernel gave us the identical results to exponential kernel model except for rho and lscale. The metafor and brms implementations gave very similar substantive results. In both models, the estimated intercept was about 0.65, with broadly overlapping 95% CI and CrI. Variation among individual effect sizes was estimated at approximately 0.48 on the standard deviation scale, and between-study heterogeneity was also nearly identical across packages at about 0.68. The spatial term was smaller in magnitude, with metafor estimating and brms estimating the Gaussian process standard deviation at about 0.16 (sdgp^2 = 0.026). However, the distance-scale parameter of spatial correlation remained weakly identified, with wide uncertainty in both implementations.

TBU

Software and package versions

R version 4.5.2 (2025-10-31)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.7.4

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Edmonton
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] phytools_2.5-2      maps_3.4.3          gtools_3.9.5       
 [4] sf_1.0-24           rotl_3.1.0          glmmTMB_1.1.14     
 [7] janitor_2.2.1       magrittr_2.0.4      orchaRd_2.1.3      
[10] posterior_1.6.1     tidybayes_3.0.7     bayesplot_1.15.0   
[13] MCMCglmm_2.36       coda_0.19-4.1       patchwork_1.3.2    
[16] ape_5.8-1           here_1.0.2          crayon_1.5.3       
[19] lubridate_1.9.4     forcats_1.0.1       stringr_1.6.0      
[22] dplyr_1.1.4         purrr_1.2.0         readr_2.1.6        
[25] tidyr_1.3.2         tibble_3.3.1        ggplot2_4.0.2      
[28] tidyverse_2.0.0     metafor_4.8-0       numDeriv_2016.8-1.1
[31] metadat_1.4-0       Matrix_1.7-4        brms_2.23.0        
[34] Rcpp_1.1.1         

loaded via a namespace (and not attached):
  [1] mathjaxr_2.0-0          RColorBrewer_1.1-3      tensorA_0.36.2.1       
  [4] rstudioapi_0.17.1       jsonlite_2.0.0          estimability_1.5.1     
  [7] farver_2.1.2            nloptr_2.2.1            rmarkdown_2.30         
 [10] vctrs_0.7.1             minqa_1.2.8             htmltools_0.5.9        
 [13] progress_1.2.3          distributional_0.5.0    curl_7.0.0             
 [16] DEoptim_2.2-8           KernSmooth_2.23-26      htmlwidgets_1.6.4      
 [19] sandwich_3.1-1          emmeans_2.0.1           zoo_1.8-15             
 [22] TMB_1.9.19              igraph_2.2.1            lifecycle_1.0.5        
 [25] iterators_1.0.14        pkgconfig_2.0.3         R6_2.6.1               
 [28] fastmap_1.2.0           rbibutils_2.4           snakecase_0.11.1       
 [31] digest_0.6.39           rprojroot_2.1.1         clusterGeneration_1.3.8
 [34] timechange_0.3.0        httr_1.4.7              abind_1.4-8            
 [37] mgcv_1.9-3              compiler_4.5.2          proxy_0.4-29           
 [40] withr_3.0.2             doParallel_1.0.17       S7_0.2.1               
 [43] backports_1.5.0         optimParallel_1.0-2     DBI_1.2.3              
 [46] MASS_7.3-65             scatterplot3d_0.3-44    classInt_0.4-11        
 [49] corpcor_1.6.10          loo_2.9.0               tools_4.5.2            
 [52] units_1.0-0             rncl_0.8.8              otel_0.2.0             
 [55] rentrez_1.2.4           quadprog_1.5-8          glue_1.8.0             
 [58] nlme_3.1-168            grid_4.5.2              checkmate_2.3.3        
 [61] generics_0.1.4          gtable_0.3.6            tzdb_0.5.0             
 [64] class_7.3-23            hms_1.1.4               foreach_1.5.2          
 [67] pillar_1.11.1           ggdist_3.3.3            splines_4.5.2          
 [70] lattice_0.22-7          tidyselect_1.2.1        knitr_1.51             
 [73] reformulas_0.4.3.1      arrayhelpers_1.1-0      xfun_0.55              
 [76] expm_1.0-0              bridgesampling_1.2-1    matrixStats_1.5.0      
 [79] stringi_1.8.7           yaml_2.3.12             pacman_0.5.1           
 [82] boot_1.3-32             evaluate_1.0.5          codetools_0.2-20       
 [85] cli_3.6.5               RcppParallel_5.1.11-1   xtable_1.8-4           
 [88] Rdpack_2.6.4            svUnit_1.0.8            XML_3.99-0.20          
 [91] parallel_4.5.2          rstantools_2.5.0        prettyunits_1.2.0      
 [94] cubature_2.1.4-1        Brobdingnag_1.2-9       phangorn_2.12.1        
 [97] lme4_1.1-38             mvtnorm_1.3-3           scales_1.4.0           
[100] e1071_1.7-17            combinat_0.0-8          rlang_1.1.7            
[103] fastmatch_1.1-6         mnormt_2.1.1

References

Jetz, W., G. H. Thomas, J. B. Joy, K. Hartmann, and A. O. Mooers. (2012). The global diversity of birds in space and time. Nature. 491:444-448. https://doi.org/10.1038/nature11631
Grafen A. (1989). The phylogenetic regression. Philosophical Transactions of the Royal Society of London. B, Biological Sciences. 326:119-157. https://doi.org/10.1098/rstb.1989.0106
Grau-Andrés, R., Moreira, B., & Pausas, J. G. (2024). Global plant responses to intensified fire regimes. Global Ecology and Biogeography. 33:e13858. https://doi.org/10.1111/geb.13858
Grenié M, Berti E, Carvajal‐Quintero J, Dädlow GM, Sagouis A, Winter M. (2022). Harmonizing taxon names in biodiversity data: A review of tools, databases and best practices. Methods in Ecology and Evolution. 14:12-25. https://doi.org/10.1111/2041-210X.13802
Lim, J. N., Senior, A. M., & Nakagawa, S. (2014). Heterogeneity in individual quality and reproductive trade-offs within species. Evolution. 68:2306–2318. https://doi.org/10.1111/evo.12446
Nakagawa, S., Lagisz, M., O’Dea, R. E., Pottier, P., Rutkowska, J., Senior, A. M., Yang, Y., & Noble, D. W. A. (2023). orchaRd 2.0: An R package for visualising meta-analyses with orchard plots. Methods in Ecology and Evolution. 14:2003–2010. https://doi.org/10.1111/2041-210X.14152
Scholer, M. N., Strimas‐Mackey, M., & Jankowski, J. E. (2020). A meta‐analysis of global avian survival across species and latitude. Ecology Letters. 23:1537-1549. https://doi.org/10.1111/ele.13573
Viechtbauer, W., White, T., Noble, D., Senior, A., & Hamilton, W. K. (2025). metadat: Meta-analysis datasets (Version 1.5-2) R package. https://github.com/wviechtb/metadat
Rios Moura, R., Oliveira Gonzaga, M., Silva Pinto, N., Vasconcellos-Neto, J., & Requena, G. S. (2021). Assortative mating in space and time: Patterns and biases. Ecology Letters. 24:1089–1102. https://doi.org/10.1111/ele.13690