---
title: "Getting started with benchexcal"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting started with benchexcal}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

The **benchexcal** package gives you two things:

1. `agreement()`: compute the three RCT-DUPLICATE / ENCORE agreement metrics
   (SA, EA, SD) between an RCT and a real-world emulation;
2. `calibrate()`: apply the BenchExCal (Wang et al., CPT 2025) calibration to a
   Stage 2 RWE estimate, using the divergence observed in a Stage 1 RCT-RWE
   pair as prior information.

It also provides tipping-point sensitivity analysis and forest plots.

```{r}
library(benchexcal)
```

## 1. Agreement metrics

These are the three metrics from Wang et al. (JAMA 2023). To reproduce LEADER
(study #1 in Table 1):

```{r}
agreement(
  rct_hr = 0.87, rct_lower = 0.78, rct_upper = 0.97,
  rwe_hr = 0.82, rwe_lower = 0.76, rwe_upper = 0.87
)
```

The published `SD = 0.90` matches.

For a noninferiority trial, pass `trial_type` and `ni_margin` so that partial
agreement (SAP) can be triggered:

```{r}
agreement(
  rct_hr = 1.28, rct_lower = 0.59, rct_upper = 2.79,
  rwe_hr = 1.35, rwe_lower = 0.94, rwe_upper = 1.93,
  trial_type = "noninferiority", ni_margin = 1.225
)
```

`summary()` returns a tidy data frame that is easy to bind into a report table:

```{r}
res <- agreement(0.87, 0.78, 0.97, 0.82, 0.76, 0.87)
summary(res)
```

### Looping over a library of trials

The package ships with selected RCT-DUPLICATE estimates:

```{r}
data(rct_duplicate)
head(rct_duplicate)

# Apply agreement() to every row
do.call(rbind, lapply(seq_len(nrow(rct_duplicate)), function(i) {
  r <- rct_duplicate[i, ]
  a <- agreement(r$rct_hr, r$rct_lower, r$rct_upper,
                 r$rwe_hr, r$rwe_lower, r$rwe_upper,
                 trial_type = r$trial_type,
                 ni_margin  = if (is.na(r$ni_margin)) NULL else r$ni_margin)
  data.frame(trial = r$trial,
             SA = a$sa$status,
             EA = a$ea,
             SD = a$sd$agree,
             z  = round(a$sd$z, 2))
}))
```

## 2. BenchExCal calibration

If you forget the inputs:

```{r}
calibration_inputs()
```

A worked example: Stage 1 is LEADER, and Stage 2 is a hypothetical expanded
indication for which we have only an RWE estimate:

```{r}
cal <- calibrate(
  stage1_rct_hr = 0.87, stage1_rct_lower = 0.78, stage1_rct_upper = 0.97,
  stage1_rwe_hr = 0.82, stage1_rwe_lower = 0.76, stage1_rwe_upper = 0.87,
  stage2_rwe_hr = 0.85, stage2_rwe_lower = 0.78, stage2_rwe_upper = 0.93
)
print(cal)
```

Note three things in the output:

- `xi_hat_1 < 0` means the Stage 1 RWE was more protective than the RCT.
- The calibrated Stage 2 HR is **shifted toward the null** to remove that
  apparent bias.
- The calibrated 95% CI is **wider** than the uncalibrated one; this is the
  bias-variance trade-off that BenchExCal makes explicit.

### Forest plot

```{r, fig.width=7, fig.height=4}
if (requireNamespace("ggplot2", quietly = TRUE)) plot(cal)
```

## 3. Tipping-point sensitivity

```{r}
tp <- tipping_point(cal)
print(tp)
```

```{r, fig.width=7, fig.height=4}
if (requireNamespace("ggplot2", quietly = TRUE)) plot(tp)
```

If the tipping point sits outside the prior 95% interval for ξ₂, the Stage 2
conclusion is robust to plausible Stage 1–type bias. If it sits inside, the
regulatory claim should be treated with caution.

## References

- Wang SV, Schneeweiss S, RCT-DUPLICATE Initiative. Emulation of randomized
  clinical trials with nonrandomized database analyses: results of 32 clinical
  trials. *JAMA*. 2023;329:1376-1385.
- Wang SV, Russo M, Glynn RJ, et al. A Benchmark, Expand, and Calibration
  (BenchExCal) Trial Emulation Approach for Using Real-World Evidence to
  Support Indication Expansions. *Clin Pharmacol Ther*. 2025;117:1820-1828.
- Weberpals J, Schneeweiss S, et al. Emulating Comparative Oncology Trials With
  Real-World Evidence Studies (ENCORE). *Clin Pharmacol Ther*. 2026.
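
## Appendix: checking the LEADER SD by hand

As a cross-check on the LEADER result in Section 1, the standardized difference can be approximated in base R without the package. This is only a sketch under stated assumptions, not a description of how `agreement()` is implemented: it assumes standard errors are back-calculated from the 95% CI limits on the log-HR scale, and it takes the difference as RCT minus RWE. The helper names `se_from_ci()` and `std_diff()` are hypothetical and are not part of benchexcal.

```{r}
# Hypothetical helper: back-calculate the SE of log(HR) from a CI,
# assuming the interval is symmetric on the log scale.
se_from_ci <- function(lower, upper, level = 0.95) {
  z_crit <- qnorm(1 - (1 - level) / 2)
  (log(upper) - log(lower)) / (2 * z_crit)
}

# Hypothetical helper: standardized difference of two log hazard ratios.
# Sign convention (RCT minus RWE) is an assumption; the package may differ.
std_diff <- function(rct_hr, rct_lower, rct_upper,
                     rwe_hr, rwe_lower, rwe_upper) {
  se_rct <- se_from_ci(rct_lower, rct_upper)
  se_rwe <- se_from_ci(rwe_lower, rwe_upper)
  (log(rct_hr) - log(rwe_hr)) / sqrt(se_rct^2 + se_rwe^2)
}

# LEADER inputs from Section 1
z_leader <- std_diff(0.87, 0.78, 0.97, 0.82, 0.76, 0.87)
round(z_leader, 2)
```

Under these assumptions the result rounds to about 0.90, in line with the published value quoted in Section 1. A different rounding of the CI limits or the opposite sign convention would change the last digit or the sign, so treat this as a sanity check rather than a reference implementation.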