Comparisons of different propensity score methods in a multilevel framework: implications for cluster-based program evaluation
Date
2022-11-07
Authors
Liu, Kun
Abstract
Background: Propensity score (PS) methods are used to reduce bias when inferring causal effects
from observational or experimental studies in which participants are not randomly assigned to
treatment conditions. Conventional PS methods were developed for independently sampled,
non-nested data. However, in health, psychology, organizational sciences, and education, the
data collected often have a multilevel or hierarchical structure. In cluster-based
intervention programs where clusters are treated as the unit of assignment, each cluster has its
own probability (PS) of being assigned to the treatment group, and this probability is associated
with factors at both the individual and cluster levels. There is a lack of both methodological and
empirical research on the use of PS methods to estimate treatment effects with multilevel data
from cluster-based programs.
Objectives: The objectives of this study are (i) to compare the performance of PS models and
PS conditioning methods in reproducing the treatment effect estimates with multilevel data from
cluster-based programs; (ii) to examine the impact of different PS methods on the evaluation of a
school-based mental health prevention program and investigate the implications of different PS
methods in program evaluations.
Methods: Using Monte Carlo simulations, we examined the appropriateness of using PS
methods to reproduce treatment effect estimates in cluster-based programs. The data simulations
incorporated a clustered observational study (COS) design with treatment assignment at the
cluster level. The design factors in the simulation study included: cluster size, number of
clusters, intra-class correlation (ICC), as well as the treatment effect size. Specifically, this study
compared two different PS models and four different PS conditioning methods across different
simulation scenarios defined by these design factors. The first PS model disaggregates
cluster-level covariates to the individual level and uses a logistic regression at the individual
level to estimate PSs for individuals, and the second PS model aggregates lower-level covariates
to the cluster level and performs a logistic regression at the cluster level. Four different conditioning techniques
(covariate adjustment, stratification, weighting, and matching) were combined with each of the
two PS models to estimate the average treatment effect (ATE) or the average treatment effect on
the treated (ATT). The performance of these PS methods was examined using relative bias, mean
squared error (MSE), and 95% CI coverage across the simulated scenarios. We also
applied different PS methods to the evaluation of a real mental health prevention program, PAX
Good Behavior Game (PAX). The impact of different methods on PAX evaluation was
illustrated using three-level multilevel regression combined with PS methods.
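The two PS models described above can be illustrated with a minimal sketch. It is not the study's code: the data-generating values, the covariate names (`x` for an individual-level covariate, `w` for a cluster-level covariate), the helper functions, and the simple gradient-ascent fit are all assumptions made for illustration, using only the Python standard library.

```python
import math
import random

def sigmoid(eta):
    # Clip the linear predictor to avoid overflow in exp().
    eta = max(-30.0, min(30.0, eta))
    return 1.0 / (1.0 + math.exp(-eta))

def fit_logistic(X, y, lr=0.5, n_iter=500):
    """Approximate logistic regression fit by plain gradient ascent.
    Returns [intercept, coef_1, ..., coef_p]."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            resid = yi - sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))
            grad[0] += resid
            for j, v in enumerate(xi):
                grad[j + 1] += resid * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def predict(beta, xi):
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))

def simulate(n_clusters=30, cluster_size=10, seed=1):
    """Toy clustered data (hypothetical values): treatment z is assigned to whole clusters."""
    rng = random.Random(seed)
    clusters = []
    for c in range(n_clusters):
        w = rng.gauss(0, 1)                                        # cluster-level covariate
        xs = [rng.gauss(0.5 * w, 1) for _ in range(cluster_size)]  # individual-level covariate
        x_bar = sum(xs) / cluster_size
        z = 1 if rng.random() < sigmoid(0.8 * w + 0.8 * x_bar) else 0
        clusters.append({"id": c, "w": w, "xs": xs, "x_bar": x_bar, "z": z})
    return clusters

clusters = simulate()

# PS model 1: disaggregate the cluster-level covariate w to every individual,
# then fit the logistic regression on individual rows.
ind_X = [[x, cl["w"]] for cl in clusters for x in cl["xs"]]
ind_z = [cl["z"] for cl in clusters for _ in cl["xs"]]
beta_ind = fit_logistic(ind_X, ind_z)
ps_individual = [predict(beta_ind, xi) for xi in ind_X]  # one PS per individual

# PS model 2: aggregate the individual-level covariate x to its cluster mean,
# then fit the logistic regression on cluster rows.
clu_X = [[cl["x_bar"], cl["w"]] for cl in clusters]
clu_z = [cl["z"] for cl in clusters]
beta_clu = fit_logistic(clu_X, clu_z)
ps_cluster = [predict(beta_clu, xi) for xi in clu_X]     # one PS per cluster
```

The individual-level model yields one PS per person, while the cluster-level model yields one PS per cluster; either set of scores can then be passed to a conditioning method such as weighting or stratification.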
Results: The results of our simulation study suggest that the performance of PS analyses depends
on the PS estimation model (i.e., individual-level PS model vs. cluster-level PS model) and
conditioning strategy (i.e., matching, stratification, covariate adjustment, weighting), as well as
other factors including the number of clusters and the ICC. Overall, the individual-level PS model
worked better than the cluster-level PS model when combined with the same conditioning method, and
PS-based methods generated less biased and more stable estimates when the number of clusters was
large. In terms of conditioning methods, covariate adjustment (adjusting for the PS) and
weighting produced less biased and more stable estimates than stratification when estimating
ATE, and weighting and stratification produced more reliable estimates than matching when
estimating ATT. When the number of clusters (e.g., schools) is large, the differences among
PS methods in the estimated program effect size are minimal, as shown by the
application of PS methods to the PAX program data. Nevertheless, using the PS methods
reduced the imbalance at both individual and cluster levels.
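As an illustration of how covariate imbalance might be checked before and after PS weighting, the sketch below computes a standardized mean difference (SMD). The abstract does not state which balance metric was used, so the SMD, the unweighted pooled-SD denominator, the ATE weighting formula, and the toy numbers are all assumptions chosen for illustration.

```python
import math

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def sample_var(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def smd(values, treated, weights=None):
    """Standardized mean difference (treated minus control).
    Group means are weighted; the denominator is the unweighted pooled SD."""
    if weights is None:
        weights = [1.0] * len(values)
    t = [(v, w) for v, z, w in zip(values, treated, weights) if z == 1]
    c = [(v, w) for v, z, w in zip(values, treated, weights) if z == 0]
    mean_t = weighted_mean([v for v, _ in t], [w for _, w in t])
    mean_c = weighted_mean([v for v, _ in c], [w for _, w in c])
    pooled_sd = math.sqrt(
        (sample_var([v for v, _ in t]) + sample_var([v for v, _ in c])) / 2
    )
    return (mean_t - mean_c) / pooled_sd

def ate_weights(treated, ps):
    """Inverse probability weights targeting the ATE: 1/ps if treated, 1/(1-ps) otherwise."""
    return [z / p + (1 - z) / (1 - p) for z, p in zip(treated, ps)]

# Toy data (hypothetical): one covariate, treatment indicator, and assumed PS values.
x = [1.0, 2.0, 3.0, 4.0]
z = [1, 1, 0, 0]
ps = [0.8, 0.6, 0.4, 0.2]

smd_before = smd(x, z)
smd_after = smd(x, z, ate_weights(z, ps))
```

The same function can be applied to individual-level covariates and to cluster-level covariates (one row per cluster) to examine balance at each level.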
Conclusions and significance: In the evaluation of cluster-based programs with treatment assigned
at the cluster level, it is important to consider the potential bias due to imbalance at both the
individual and cluster levels between the treatment arms. PS-based methods have the potential to
reduce the imbalance and produce more accurate estimates of treatment effects. Overall, the
individual-level PS models fared slightly better than the cluster-level PS models. The impact of
different PS conditioning techniques might depend on many factors, such as the ICC, the sample
sizes at each level, and the covariate information available. Our results provide guidance for practitioners who
implement group-based interventions.
Keywords
Cluster-randomized controlled trial (CRCT), Clustered observational study (COS), Propensity score models, Propensity score conditioning methods, Simulation