| Title: | Benchmark, Expand, and Calibrate (BenchExCal) Trial Emulation Tools |
|---|---|
| Description: | Lightweight tools for evaluating real-world evidence (RWE) studies that emulate randomized clinical trials (RCTs). Provides (1) computation of the three pre-specified RCT-DUPLICATE / ENCORE agreement metrics -- statistical significance agreement (SA), estimate agreement (EA), and standardized difference agreement (SD) -- and (2) the Benchmark, Expand, and Calibrate (BenchExCal) calibration of a Stage 2 RWE study using the divergence observed in a Stage 1 RCT-RWE pair, plus tipping-point sensitivity analysis and forest plots. Methods follow Wang et al. JAMA 2023;329:1376 and Wang et al. CPT 2025;117:1820. |
| Authors: | Xiangzhong Xue [aut, cre] (Postdoctoral fellow, BWH/HMS) |
| Maintainer: | Xiangzhong Xue |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-05-18 02:50:29 UTC |
| Source: | https://github.com/xxue064/benchexcal |
For an RCT and its database-study emulation, computes the three pre-specified binary agreement metrics used in RCT-DUPLICATE (Wang et al., JAMA 2023) and ENCORE (Weberpals et al., CPT 2026): statistical significance agreement (SA), estimate agreement (EA), and standardized difference agreement (SD).
```r
agreement(
  rct_hr, rct_lower, rct_upper,
  rwe_hr, rwe_lower, rwe_upper,
  trial_type = c("superiority", "noninferiority"),
  ni_margin = NULL
)
```
| `rct_hr` | Numeric. Point estimate (HR) from the RCT. |
|---|---|
| `rct_lower`, `rct_upper` | Numeric. 95% CI bounds for the RCT HR. |
| `rwe_hr` | Numeric. Point estimate (HR) from the database study. |
| `rwe_lower`, `rwe_upper` | Numeric. 95% CI bounds for the RWE HR. |
| `trial_type` | One of `"superiority"` or `"noninferiority"`. |
| `ni_margin` | Numeric on the HR scale (e.g. 1.3). The non-inferiority margin defined by the trial protocol. Only used when `trial_type = "noninferiority"`. |
Metric definitions follow Wang et al. (JAMA 2023):
- SA (statistical significance agreement): TRUE when both point estimates and both 95% CIs sit on the same side of the null (HR = 1). For non-inferiority trials, partial SA (SAP) is flagged if the database upper bound lies below the NI margin, even when the RCT was not powered for superiority.
- EA (estimate agreement): TRUE when the database point estimate falls inside the RCT 95% CI.
- SD (standardized difference agreement): TRUE when $|z| < 1.96$, where

  $$z = \frac{\log \widehat{HR}_{\mathrm{RCT}} - \log \widehat{HR}_{\mathrm{RWE}}}{\sqrt{\widehat{SE}_{\mathrm{RCT}}^2 + \widehat{SE}_{\mathrm{RWE}}^2}}$$

  and each standard error is derived from its 95% CI via $SE = (\log UL - \log LL) / (2 \times 1.96)$.
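A minimal base-R sketch of the three metrics, assuming the definitions above. The helper names (`se_from_ci()`, `ci_side()`, `agreement_sketch()`, `partial_sa_sketch()`) are hypothetical and this is not the package's implementation. It reads the SA rule literally (both CIs must exclude 1 on the same side), so edge cases such as two non-significant estimates may be handled differently by `agreement()`.

```r
# Hypothetical helpers illustrating the metric definitions; not package internals.

# SE of log(HR) recovered from a 95% CI: (log(UL) - log(LL)) / (2 * 1.96).
se_from_ci <- function(lower, upper) {
  (log(upper) - log(lower)) / (2 * 1.96)
}

# Side of the null a 95% CI sits on: -1 if entirely below HR = 1,
# +1 if entirely above, 0 if the CI includes 1.
ci_side <- function(lower, upper) {
  if (upper < 1) -1 else if (lower > 1) 1 else 0
}

# SA, EA and SD for a superiority pair. SA is read literally: both CIs
# exclude 1 on the same side (which puts both point estimates there too).
agreement_sketch <- function(rct_hr, rct_lower, rct_upper,
                             rwe_hr, rwe_lower, rwe_upper) {
  side_rct <- ci_side(rct_lower, rct_upper)
  side_rwe <- ci_side(rwe_lower, rwe_upper)
  z <- (log(rct_hr) - log(rwe_hr)) /
    sqrt(se_from_ci(rct_lower, rct_upper)^2 +
         se_from_ci(rwe_lower, rwe_upper)^2)
  list(
    SA = side_rct != 0 && side_rct == side_rwe,
    EA = rwe_hr >= rct_lower && rwe_hr <= rct_upper,
    SD = abs(z) < 1.96,
    z  = z
  )
}

# Partial SA for a noninferiority pair: database upper bound below the NI margin.
partial_sa_sketch <- function(rwe_upper, ni_margin) {
  rwe_upper < ni_margin
}

# LEADER inputs from the examples
leader <- agreement_sketch(0.87, 0.78, 0.97, 0.82, 0.76, 0.87)
str(leader)
# PRONOUNCE inputs: upper bound 1.93 is not below the 1.225 margin
partial_sa_sketch(rwe_upper = 1.93, ni_margin = 1.225)
```

For the LEADER inputs all three metrics agree, with $z \approx 0.90$.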
An object of class `bx_agreement` (a list). Use `print()` or `summary()` for formatted output.
Wang SV, Schneeweiss S, RCT-DUPLICATE Initiative. Emulation of randomized clinical trials with nonrandomized database analyses: results of 32 clinical trials. JAMA. 2023;329(16):1376-1385.
Weberpals J, Schneeweiss S, et al. Emulating Comparative Oncology Trials With Real-World Evidence Studies (ENCORE). Clin Pharmacol Ther. 2026.
```r
# LEADER trial (Wang et al., JAMA 2023, Table 1, study #1)
agreement(
  rct_hr = 0.87, rct_lower = 0.78, rct_upper = 0.97,
  rwe_hr = 0.82, rwe_lower = 0.76, rwe_upper = 0.87
)

# PRONOUNCE (noninferiority, NI margin 1.225)
agreement(
  rct_hr = 1.28, rct_lower = 0.59, rct_upper = 2.79,
  rwe_hr = 1.35, rwe_lower = 0.94, rwe_upper = 1.93,
  trial_type = "noninferiority", ni_margin = 1.225
)
```
Implements the Benchmark, Expand, Calibrate (BenchExCal) framework (Wang et al., Clin Pharmacol Ther 2025): uses the divergence observed between a Stage 1 RCT and its database emulation to calibrate the point estimate and 95% CI of a Stage 2 database study designed to inform a supplemental indication (no Stage 2 RCT yet).
```r
calibrate(
  stage1_rct_hr, stage1_rct_lower, stage1_rct_upper,
  stage1_rwe_hr, stage1_rwe_lower, stage1_rwe_upper,
  stage2_rwe_hr, stage2_rwe_lower, stage2_rwe_upper,
  check_benchmark = TRUE
)
```
| `stage1_rct_hr`, `stage1_rct_lower`, `stage1_rct_upper` | Stage 1 RCT HR and 95% CI. |
|---|---|
| `stage1_rwe_hr`, `stage1_rwe_lower`, `stage1_rwe_upper` | Stage 1 database study HR and 95% CI (emulation of the Stage 1 RCT). |
| `stage2_rwe_hr`, `stage2_rwe_lower`, `stage2_rwe_upper` | Stage 2 database study HR and 95% CI (the supplemental indication). |
| `check_benchmark` | Logical (default `TRUE`). If … |
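Let $\hat\theta_{\mathrm{RCT},1}$, $\hat\theta_{\mathrm{RWE},1}$, and $\hat\theta_{\mathrm{RWE},2}$ denote the Stage 1 RCT, Stage 1 RWE, and Stage 2 RWE log hazard ratios, with variances $\hat\sigma^2_{\mathrm{RCT},1}$, $\hat\sigma^2_{\mathrm{RWE},1}$, and $\hat\sigma^2_{\mathrm{RWE},2}$. The Stage 1 divergence is

$$\hat\xi_1 = \hat\theta_{\mathrm{RWE},1} - \hat\theta_{\mathrm{RCT},1}.$$

Following Wang et al. (CPT 2025), $\hat\xi_1$ is rescaled to give the Stage 2 divergence $\hat\xi_2$, and the calibrated Stage 2 log HR is

$$\hat\theta_{2}^{\mathrm{cal}} = \hat\theta_{\mathrm{RWE},2} - \hat\xi_2,$$

with variance inflated by the uncertainty in the Stage 1 divergence,

$$\widehat{\mathrm{Var}}\left(\hat\theta_{2}^{\mathrm{cal}}\right) = \hat\sigma^2_{\mathrm{RWE},2} + \widehat{\mathrm{Var}}(\hat\xi_2).$$

Sign convention. The paper writes the mean as $\hat\theta_{\mathrm{RWE},2} + \hat\xi_2$. Because $\hat\xi$ (RWE minus RCT) is defined as the systematic bias, bias correction is a subtraction; this implementation uses $\hat\theta_{\mathrm{RWE},2} - \hat\xi_2$. If you would rather match the paper's formula literally, negate the returned `xi_hat_2` before applying it.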
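A simplified sketch of the calibration arithmetic in base R, for intuition only. The paper's rescaling of the Stage 1 divergence is not reproduced; as a stand-in, the sketch assumes $\hat\xi_2 = k\,\hat\xi_1$ with a hypothetical factor `k` (default 1, "same bias as Stage 1"), and it assumes the Stage 1 RCT and database study are independent, so their variances add. `log_hr_se()` and `calibrate_sketch()` are hypothetical names; `calibrate()` is the package's method.

```r
# Hypothetical helper: log(HR) and its SE from v = c(hr, lower, upper).
log_hr_se <- function(v) {
  c(est = log(v[[1]]), se = (log(v[[3]]) - log(v[[2]])) / (2 * 1.96))
}

# Simplified calibration. ASSUMPTION: xi_2 = k * xi_1 with a hypothetical
# factor k; the rescaling in Wang et al. (CPT 2025) may differ.
calibrate_sketch <- function(s1_rct, s1_rwe, s2_rwe, k = 1) {
  rct1 <- log_hr_se(s1_rct)
  rwe1 <- log_hr_se(s1_rwe)
  rwe2 <- log_hr_se(s2_rwe)
  # Stage 1 divergence (RWE minus RCT); independent samples, so variances add.
  xi1     <- rwe1[["est"]] - rct1[["est"]]
  var_xi1 <- rct1[["se"]]^2 + rwe1[["se"]]^2
  xi2     <- k * xi1
  var_xi2 <- k^2 * var_xi1
  # Bias correction is a subtraction (sign convention above); the Stage 2
  # variance is inflated by the divergence uncertainty.
  est <- rwe2[["est"]] - xi2
  se  <- sqrt(rwe2[["se"]]^2 + var_xi2)
  c(xi_hat_2 = xi2,
    hr       = exp(est),
    lower    = exp(est - 1.96 * se),
    upper    = exp(est + 1.96 * se))
}

# Same LEADER-based inputs as the calibrate() example
cal <- calibrate_sketch(s1_rct = c(0.87, 0.78, 0.97),
                        s1_rwe = c(0.82, 0.76, 0.87),
                        s2_rwe = c(0.85, 0.78, 0.93))
round(cal, 3)
```

With `k = 1` the sketch moves the Stage 2 HR from 0.85 to about 0.90 and widens the interval to roughly 0.77 to 1.05, so the calibrated upper bound crosses 1 even though the uncalibrated one (0.93) did not.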
An object of class bx_calibration.
Wang SV, Russo M, Glynn RJ, et al. A Benchmark, Expand, and Calibration (BenchExCal) Trial Emulation Approach for Using Real-World Evidence to Support Indication Expansions. Clin Pharmacol Ther. 2025;117(6):1820-1828.
```r
# Stage 1: LEADER liraglutide (RCT-DUPLICATE)
# Stage 2: hypothetical expanded indication, RWE only
cal <- calibrate(
  stage1_rct_hr = 0.87, stage1_rct_lower = 0.78, stage1_rct_upper = 0.97,
  stage1_rwe_hr = 0.82, stage1_rwe_lower = 0.76, stage1_rwe_upper = 0.87,
  stage2_rwe_hr = 0.85, stage2_rwe_lower = 0.78, stage2_rwe_upper = 0.93
)
print(cal)
```
A quick reference that prints the inputs you need to supply to `calibrate()`. Call it if you forget which six (or nine) HR/CI values `calibrate()` expects.
```r
calibration_inputs()
```
Invisibly returns NULL. Called for the printed side effect.
```r
calibration_inputs()
```
Plots the Stage 1 RCT, Stage 1 RWE, Stage 2 RWE (uncalibrated), and Stage 2 RWE (calibrated) HRs on a single log-scale forest plot.
```r
## S3 method for class 'bx_calibration'
plot(x, ...)
```
| `x` | A `bx_calibration` object, as returned by `calibrate()`. |
|---|---|
| `...` | Currently unused. |
A ggplot object.
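A dependency-free sketch of a log-scale forest plot using base R graphics (not ggplot2, and not the package's `plot()` method). `forest_data()` and `forest_plot_sketch()` are hypothetical. The usage draws the three uncalibrated rows from the `calibrate()` example; a calibrated Stage 2 row would be appended the same way from a calibration result.

```r
# Hypothetical helper: one row per estimate, each supplied as c(hr, lower, upper).
forest_data <- function(...) {
  rows <- list(...)
  data.frame(
    label = names(rows),
    hr    = vapply(rows, function(v) v[[1]], numeric(1)),
    lower = vapply(rows, function(v) v[[2]], numeric(1)),
    upper = vapply(rows, function(v) v[[3]], numeric(1)),
    row.names = NULL
  )
}

# Hypothetical forest plot: squares for point estimates, segments for CIs,
# dashed line at the null, log-scaled HR axis.
forest_plot_sketch <- function(df, null = 1) {
  k <- nrow(df)
  y <- rev(seq_len(k))                # first row drawn at the top
  old <- par(mar = c(4, 12, 1, 1))    # wide left margin for row labels
  on.exit(par(old))
  plot(df$hr, y, log = "x", pch = 15,
       xlim = range(df$lower, df$upper, null), ylim = c(0.5, k + 0.5),
       yaxt = "n", xlab = "Hazard ratio (log scale)", ylab = "")
  segments(df$lower, y, df$upper, y)
  abline(v = null, lty = 2)
  axis(2, at = y, labels = df$label, las = 1)
  invisible(df)
}

fd <- forest_data(
  "Stage 1 RCT"                = c(0.87, 0.78, 0.97),
  "Stage 1 RWE"                = c(0.82, 0.76, 0.87),
  "Stage 2 RWE (uncalibrated)" = c(0.85, 0.78, 0.93)
)
forest_plot_sketch(fd)
```

A log axis keeps Wald intervals, which are symmetric on the log-HR scale, visually symmetric around each estimate.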
Visualises how the calibrated Stage 2 HR (with its 95% CI) shifts as a function of the assumed Stage 2 divergence $\xi_2$.
```r
## S3 method for class 'bx_tipping'
plot(x, ...)
```
| `x` | A `bx_tipping` object, as returned by `tipping_point()`. |
|---|---|
| `...` | Currently unused. |
A ggplot object.
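A base-R sketch of the kind of curve this plot shows: the calibrated Stage 2 HR and 95% CI across a grid of assumed divergences, with a horizontal line at the decision threshold. Following the tipping-point details, the CI uses the uncalibrated Stage 2 SE. `tipping_curve_sketch()` and `plot_tipping_sketch()` are hypothetical, and the grid from -0.15 to 0.05 is an arbitrary illustrative choice.

```r
# Hypothetical sketch: calibrated Stage 2 HR and 95% CI for each assumed
# divergence xi_2, using the uncalibrated Stage 2 SE.
tipping_curve_sketch <- function(hr2, lower2, upper2, xi_grid) {
  est2 <- log(hr2)
  se2  <- (log(upper2) - log(lower2)) / (2 * 1.96)
  est  <- est2 - xi_grid  # bias correction is a subtraction
  data.frame(xi = xi_grid, hr = exp(est),
             lower = exp(est - 1.96 * se2), upper = exp(est + 1.96 * se2))
}

# Hypothetical plot: solid line for the HR, dashed lines for the CI,
# grey horizontal line at the threshold, log-scaled HR axis.
plot_tipping_sketch <- function(tc, threshold = 1) {
  plot(tc$xi, tc$hr, type = "l", log = "y",
       ylim = range(tc$lower, tc$upper, threshold),
       xlab = "Assumed Stage 2 divergence (log-HR scale)",
       ylab = "Calibrated Stage 2 HR (log scale)")
  lines(tc$xi, tc$lower, lty = 2)
  lines(tc$xi, tc$upper, lty = 2)
  abline(h = threshold, col = "grey40")
  invisible(tc)
}

tip_curve <- tipping_curve_sketch(0.85, 0.78, 0.93,
                                  xi_grid = seq(-0.15, 0.05, length.out = 200))
plot_tipping_sketch(tip_curve)
```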
Effect estimates (pooled, adjusted) from selected trials in Wang et al. (JAMA 2023), useful for testing and illustrating `agreement()` and `calibrate()`.
```r
data(rct_duplicate)
```
A data frame with the following columns:

- Trial acronym.
- RCT HR and 95% CI.
- Database study (adjusted, pooled) HR and 95% CI.
- Trial type: `"superiority"` or `"noninferiority"`.
- NI margin on the HR scale, or `NA`.
Wang SV et al., JAMA 2023;329:1376-1385, Table 1.
Sweeps over a range of plausible Stage 2 divergence values (by default from the 5th to the 95th percentile of the BenchExCal prior on $\xi_2$) and reports the calibrated Stage 2 HR at each point. The tipping point is the value of $\xi_2$ at which the calibrated 95% CI just crosses the decision threshold (HR = 1 by default).
```r
tipping_point(x, n_grid = 200, quantile_range = c(0.05, 0.95), threshold = 1)
```
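| `x` | A `bx_calibration` object, as returned by `calibrate()`. |
|---|---|
| `n_grid` | Integer. Number of grid points to evaluate (default 200). |
| `quantile_range` | Length-2 numeric. Quantile range of the BenchExCal prior on $\xi_2$ to scan (default `c(0.05, 0.95)`). |
| `threshold` | Numeric on the HR scale. Decision threshold for the tipping point (default `1`). |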
At each grid value of $\xi_2$, the function computes the calibrated Stage 2 log HR, $\log \widehat{HR}_{\mathrm{RWE},2} - \xi_2$ (the Stage 2 RWE log HR minus the assumed divergence), and constructs a 95% CI using the uncalibrated Stage 2 standard error. The prior uncertainty in $\xi_2$ is already represented by the breadth of the scan itself, so additional variance inflation would double-count. The first grid point at which the relevant CI bound crosses the threshold is returned as the tipping point.
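A sketch of the scan described above. It assumes a normal prior on $\xi_2$ whose mean and SD are passed in directly (here the Stage 1 divergence and its standard error, computed inline from the LEADER inputs); the actual prior used by `tipping_point()` may differ. The "relevant bound" is taken to be the upper bound when the uncalibrated estimate is below the threshold and the lower bound otherwise, and the tipping point is the first grid value at which that bound changes side. `tipping_point_sketch()` is a hypothetical name.

```r
# Hypothetical tipping-point scan. ASSUMPTION: normal prior on xi_2 with
# the supplied mean and SD; the grid spans the requested prior quantiles.
tipping_point_sketch <- function(hr2, lower2, upper2, xi_mean, xi_sd,
                                 n_grid = 200, quantile_range = c(0.05, 0.95),
                                 threshold = 1) {
  est2 <- log(hr2)
  se2  <- (log(upper2) - log(lower2)) / (2 * 1.96)  # uncalibrated Stage 2 SE
  xi <- seq(qnorm(quantile_range[1], xi_mean, xi_sd),
            qnorm(quantile_range[2], xi_mean, xi_sd),
            length.out = n_grid)
  est <- est2 - xi  # calibrated log HR at each grid point
  # Relevant bound: upper bound for a protective uncalibrated estimate,
  # lower bound otherwise (an assumption of this sketch).
  bound <- if (est2 < log(threshold)) est + 1.96 * se2 else est - 1.96 * se2
  below <- bound < log(threshold)
  flip  <- which(below != below[1])
  if (length(flip) == 0) return(NA_real_)  # no crossing within the scanned range
  xi[flip[1]]
}

# Prior centred on the Stage 1 divergence, SD = its standard error
xi1 <- log(0.82) - log(0.87)
sd1 <- sqrt(((log(0.97) - log(0.78)) / 3.92)^2 +
            ((log(0.87) - log(0.76)) / 3.92)^2)
tp <- tipping_point_sketch(0.85, 0.78, 0.93, xi_mean = xi1, xi_sd = sd1)
tp
exp(log(0.85) - tp)  # calibrated HR at the tipping point
```

For these inputs the tipping point falls near $\xi_2 \approx -0.075$: any assumed divergence above that value keeps the calibrated upper bound below 1.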
An object of class bx_tipping.