R/rmm.R
rmm.Rd
The rmm package provides an interface to fit Bayesian multiple membership multilevel models with endogenized weights using JAGS.
rmm(
formula,
family = "Gaussian",
priors = NULL,
inits = NULL,
iter = 1000,
burnin = 100,
chains = 3,
seed = NULL,
run = T,
parallel = F,
monitor = T,
hdi = F,
r = 4,
transform = "center",
modelfile = F,
data = NULL
)
A symbolic description of the model in form of an R formula. More details below.
Character vector. Currently supported are "Gaussian", "Logit", "Condlogit", "Weibull", or "Cox".
A list with parameter names as tags and their prior specification as values. More details below.
A list with parameter as tags and their initial values as values. This list will be used in all chains. If NULL, JAGS and rmm select appropriate inits.
Total number of iterations.
Number of iterations that will be discarded .
Number of chains.
A random number.
A logical value (True or False) indicating whether JAGS should estimate the model.
A logical value (True or False). If True
, weights, random effects, predictions, and JAGS output is saved as well.
Numeric or False. If confidence level x
is specified (e.g. x=0.95), mode, standard deviation, and (x*100)% HDI are given. If False
is specified, mean, standard deviation, and 95% CI are given.
Numeric. Rounding value. Default is 3.
Character vector or FALSE. Specifying center
or std
to center or standardize continuous predictors before estimation. Specifying std2
will divide by two times the standard deviation, so that regression coefficients are comparable to those of binary predictors (Gelman 2008).
Character vector or TRUE|False. If TRUE, the JAGS model is saved in rmm/temp/modelstring.txt. If a file path is supplied as string, rmm will just create the data structure and use the provided modelfile.
Dataframe object. The dataset must have level 1 as unit of analysis. More details below.
A list with 7 elements: reg.table, w, re.l1, re.l3, pred, input, jags.out. If monitor=F, only the regression table is returned. If monitor=T, the predicted weights, level-1 random effects (if specified in the model), level-3 random effects (if specified in the model), predicted values of the dependent variable, and the internally created variables are returned. The last element of the list is the unformatted Jags output.
The main function of rmm is rmm
, which uses formula syntax to specify a multiple membership
multilevel model with endogenized weights. Based on the supplied formulas, data, and additional information,
it writes code to fit the model in JAGS. Subsequently, the JAGS output is processed to ease the
interpretation of the model results.
In order to fit the models, JAGS must be installed.
The package rmm estimates models with a complex nonstandard multilevel structure, known
as multiple membership multilevel structure. The difference of this package to other packages and programs
to estimate multiple membership multilevel models, such as brms
or MLwiN,
is that rmm allows to endogenize the membership weights with a weight function. In doing so, rmm allows
to examine the process by which the effects of lower-level units aggregate to a higher level (micro-macro link).
Accessible introductions to multiple membership models are given by the report by Fielding and Goldstein (2006) and the book chapter by Beretvas (2010). More advanced treatments of multiple membership models are provided in the multilevel textbook by Goldstein (2011, Chapter 13), the book chapters on multiple membership models by Rasbash and Browne (2001, 2008), the paper by Browne et al. (2001), and the report by Leckie (2013).
General formula structure
Y ~ 1 + mm(id(l1id, l2id), mmc(X.L1), mmw(w ~ 1 / offset(N), constraint=1)) + X.L2 + X.L3 + hm(id=l3id, name=l3name, type=FE, showFE=F)
Dependent variable: Y
Multiple membership object: mm() to analyze how the effects of level-1 predictors from multiple constituting members aggregate to level 2
Level-2 predictors: X.L2, being something like X1 + ... + XN
Level-3 predictors: X.L3, being something like X1 + ... + XN
Hierarchical membership object: hm() to recognize that level 2 units embedded in a third level
Currently supported dependent variables / link functions
Gaussian continuous variable Y
Binomial outcome for logistic regression Y
Conditional logistic outcomes ???
Weibull survival time: Surv(survivaltime, event)
or Surv(survivaltime, event, upperlimit)
(where upperlimit is the upper limit for predictions(!))
Cox survival time: Surv(survivaltime, event)
Vector of level-2 predictors
An intercept at the main level 2 is added whether or not a 1
is specified in the beginning. Interaction terms have to be included as separate variables.
Multiple membership object mm()
id()
to indicate level-1 and level-2 ids
mmc()
to specify level-1 predictors. No intercept allowed. Interaction terms have to be included as separate variables.
mmw()
to specify the weight function (micro-macro link). The function can be nonlinear and contain variables but needs to be identifiable.
To give a few examples: w ~ 1/offset(N)
specifies mean-aggregation, with N being a variables that indicates the number of level-1 units per level-2 entity.
If no mmw() is specified, w ~ 1/offset(N)
is assumed. In Rosche (2021), I propose to use w ~ 1/offset(N)^exp(-(X.W))
as the general form for weight functions.
This function ensures that weights are limited by 0 and 1. Specifying variables as offset(X.W)
will not estimate a parameter for this variable.
Two different identification restrictions are provided:
mmw(w ~ ..., constraint=1)
: constraint=1
restricts the weights to sum to 1 for each level-2 entity. (default)
mmw(w ~ ..., constraint=2)
: constraint=2
restricts the weights to sum to the total number of level-2 entities over the whole dataset, allowing some level-2 entities to have weights smaller/larger than 1.
mmw(w ~ ..., ar=TRUE)
: Allows random effects of level-1 units to change across memberships in level-2 entities.
mmw(w ~ ..., ar=FALSE)
: Assumes all level-1 units to have one random effect
Hierachical membership object hm()
id=l3id
to indicate level-3 id
name=l3name
to specify value labels for level 3 units.
type=RE
(default) or type=FE
to choose between random- or fixed effect estimation.
If RE is chosen, level 3 predictors can be added. If FE is chosen, each level 3 unit has its own intercept and level 3 predictors are removed.
If showFE=TRUE
the fixed effects are reported, otherwise omitted (default). The first l3id is the base.
More details on changing priors
Priors of the following parameters may be changed: b.l1, b.l2, b.l3, b.w, tau.l1, tau.l2, tau.l3
.
The priors are specified as a list with parameter names as tags and their prior specification as values:
priors=list("b.l1"="dnorm(0,0.01)")
. In this example, the priors of all level-1 regression coefficients
are changed to a more informative prior that has a smaller variance than the default (dnorm(0,0.0001)
).
I refer to the JAGS manual
for more details on possible prior specifications.
More details on the weight function
...
More details on constructing the data
...
Tips
Error in update.jags(model, n.iter, ...) : Error in node w[1285] Invalid parent values
The weight function must be designed such that the distribution of weights is in
line with the priors for all other parameters. This error could, for instance, be caused if weights can be negative but negative weights cause the distribution of other parameters to be outside of the distribution of their priors.
Carefully designing the weight function so that it is properly bounded may therefore help. Specifying transform="std"
may help as well.
Including weight regressors demands a lot from your data It is therefore a good idea to start with slightly more informative priors.
I suggest starting with priors = list("b.w"="dnorm(0,0.1)"
and then increasing the variance step by step.
...
Rosche, B. (2021). A multilevel model for coalition governments: Uncovering dependencies within and across governments due to parties. https://doi.org/10.31235/osf.io/4bafr
data(coalgov)
m1 <- rmm(Surv(govdur, earlyterm, govmaxdur) ~ 1 + mm(id(pid, gid), mmc(fdep), mmw(w ~ 1/offset(n), constraint=1)) + majority + hm(id=cid, name=cname, type=RE, showFE=F),
family="Weibull", monitor=T, data=coalgov)
#> Compiling model graph
#> Resolving undeclared variables
#> Allocating nodes
#> Graph information:
#> Observed stochastic nodes: 983
#> Unobserved stochastic nodes: 843
#> Total graph size: 15792
#>
#> Initializing model
#>
#> Error in jags.samples(model, variable.names, n.iter, thin, type = "trace", ...): thin must be a positive integer
m1$reg.table # the regression output
#> Error in eval(expr, envir, enclos): object 'm1' not found
m1$w # the estimated weights
#> Error in eval(expr, envir, enclos): object 'm1' not found
m1$re.l1 # the level-1 random effects
#> Error in eval(expr, envir, enclos): object 'm1' not found
m1$re.l3 # the level-3 random effects
#> Error in eval(expr, envir, enclos): object 'm1' not found
m1$pred # posterior predictions of the dependent variable (linear predictor for \code{family="Gaussian"}, survival time for \code{family="Weibull"})
#> Error in eval(expr, envir, enclos): object 'm1' not found
m1$input # internal variables
#> Error in eval(expr, envir, enclos): object 'm1' not found
jags.out <- m1$jags.out # JAGS output
#> Error in eval(expr, envir, enclos): object 'm1' not found
monetPlot(m1, "b.l1") # monetPlot to inspect the posterior distribution of the model parameters
#> Error in eval(expr, envir, enclos): object 'm1' not found