---
title: "Getting started with benchexcal"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting started with benchexcal}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

The **benchexcal** package gives you two things:

1. `agreement()`: computes the three RCT-DUPLICATE / ENCORE agreement
   metrics (statistical significance agreement, SA; estimate agreement, EA;
   standardized difference agreement, SD) between an RCT and a real-world
   emulation;
2. `calibrate()`: applies the BenchExCal (Wang et al., CPT 2025)
   calibration to a Stage 2 RWE estimate, using the divergence observed
   in a Stage 1 RCT-RWE pair as prior information.

It also provides tipping-point sensitivity analysis and forest plots.

```{r}
library(benchexcal)
```

## 1. Agreement metrics

`agreement()` computes the three metrics from Wang et al. (JAMA 2023). The
call below reproduces LEADER (study #1 in Table 1):

```{r}
agreement(
  rct_hr = 0.87, rct_lower = 0.78, rct_upper = 0.97,
  rwe_hr = 0.82, rwe_lower = 0.76, rwe_upper = 0.87
)
```

The standardized difference in the output matches the published `SD = 0.90`.
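To see where that value comes from, the standardized difference and the estimate agreement can be recomputed by hand in base R. This is a minimal sketch, not the package's internal code: it assumes the standard errors are recovered from the 95% CIs on the log scale, and both the helper `se_from_ci()` and the sign convention (RWE minus RCT) are illustrative choices.

```{r}
# Hypothetical helper: log-scale standard error implied by a 95% CI
se_from_ci <- function(lower, upper) {
  (log(upper) - log(lower)) / (2 * qnorm(0.975))
}

se_rct <- se_from_ci(0.78, 0.97)
se_rwe <- se_from_ci(0.76, 0.87)

# Standardized difference between the two log hazard ratios
z_manual <- (log(0.82) - log(0.87)) / sqrt(se_rct^2 + se_rwe^2)
round(abs(z_manual), 2)

# SD agreement: the absolute standardized difference is below 1.96
abs(z_manual) < qnorm(0.975)

# Estimate agreement (EA): the RWE point estimate lies inside the RCT 95% CI
0.82 >= 0.78 && 0.82 <= 0.97
```

Because the published CIs are rounded to two decimals, a hand calculation like this can differ from published values in the last digit for other trials.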

For a noninferiority trial, pass `trial_type` and `ni_margin` so that partial
agreement (SAP) can be reported:

```{r}
agreement(
  rct_hr = 1.28, rct_lower = 0.59, rct_upper = 2.79,
  rwe_hr = 1.35, rwe_lower = 0.94, rwe_upper = 1.93,
  trial_type = "noninferiority", ni_margin = 1.225
)
```

`summary()` returns a tidy data frame, easy to bind into a report table:

```{r}
res <- agreement(0.87, 0.78, 0.97, 0.82, 0.76, 0.87)
summary(res)
```

### Looping over a library of trials

The package ships with selected RCT-DUPLICATE estimates:

```{r}
data(rct_duplicate)
head(rct_duplicate)

# Apply agreement() to every row
do.call(rbind, lapply(seq_len(nrow(rct_duplicate)), function(i) {
  r <- rct_duplicate[i, ]
  a <- agreement(r$rct_hr, r$rct_lower, r$rct_upper,
                 r$rwe_hr, r$rwe_lower, r$rwe_upper,
                 trial_type = r$trial_type,
                 ni_margin  = if (is.na(r$ni_margin)) NULL else r$ni_margin)
  data.frame(trial = r$trial,
             SA = a$sa$status,
             EA = a$ea,
             SD = a$sd$agree,
             z  = round(a$sd$z, 2))
}))
```

## 2. BenchExCal calibration

If you forget which inputs `calibrate()` needs, `calibration_inputs()` lists them:

```{r}
calibration_inputs()
```

In the worked example below, Stage 1 is LEADER and Stage 2 is a hypothetical
expanded indication for which we have only an RWE estimate:

```{r}
cal <- calibrate(
  stage1_rct_hr = 0.87, stage1_rct_lower = 0.78, stage1_rct_upper = 0.97,
  stage1_rwe_hr = 0.82, stage1_rwe_lower = 0.76, stage1_rwe_upper = 0.87,
  stage2_rwe_hr = 0.85, stage2_rwe_lower = 0.78, stage2_rwe_upper = 0.93
)
print(cal)
```

Note three things in the output:

- `xi_hat_1 < 0` means the Stage 1 RWE was more protective than the RCT.
- The calibrated Stage 2 HR is **shifted toward the null** to remove that
  apparent bias.
- The calibrated 95% CI is **wider** than the uncalibrated one — this is
  the bias-variance trade-off that BenchExCal makes explicit.
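The direction of the shift and the widening can be reproduced with a simplified normal-approximation version of the calibration. This sketch is an illustration under stated assumptions, not the package's implementation: it takes the Stage 1 divergence as the log-HR difference (RWE minus RCT), subtracts it from the Stage 2 log HR, and adds the Stage 1 variance to the Stage 2 variance. All object names are hypothetical, and the numbers from `calibrate()` may differ.

```{r}
z975  <- qnorm(0.975)
ci_se <- function(lower, upper) (log(upper) - log(lower)) / (2 * z975)

# Stage 1 divergence on the log scale and its variance (assumed definition)
xi1_manual <- log(0.82) - log(0.87)
xi1_var    <- ci_se(0.78, 0.97)^2 + ci_se(0.76, 0.87)^2

# Stage 2: remove the divergence and propagate its uncertainty
s2_se      <- ci_se(0.78, 0.93)
cal_log_hr <- log(0.85) - xi1_manual
cal_se     <- sqrt(s2_se^2 + xi1_var)

round(exp(c(hr    = cal_log_hr,
            lower = cal_log_hr - z975 * cal_se,
            upper = cal_log_hr + z975 * cal_se)), 2)
```

Under these assumptions the point estimate moves from 0.85 toward 1, and the interval is wider than the uncalibrated 0.78 to 0.93 because the Stage 1 uncertainty is added to the Stage 2 variance.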

### Forest plot

```{r, fig.width=7, fig.height=4}
if (requireNamespace("ggplot2", quietly = TRUE)) plot(cal)
```

## 3. Tipping-point sensitivity

```{r}
tp <- tipping_point(cal)
print(tp)
```

```{r, fig.width=7, fig.height=4}
if (requireNamespace("ggplot2", quietly = TRUE)) plot(tp)
```

If the tipping point sits outside the prior 95% interval for ξ₂, the
Stage 2 conclusion is robust to plausible Stage 1–type bias. If it sits
inside, the regulatory claim should be treated with caution.
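One way to read this rule can be sketched in base R under strong simplifying assumptions. Here the tipping point is taken to be the fixed bias ξ₂ at which the upper 95% limit of the bias-corrected Stage 2 HR reaches 1, and the prior 95% interval is taken to be the Stage 1 divergence plus or minus 1.96 standard errors. Both definitions and all object names are assumptions made for illustration; `tipping_point()` may define them differently.

```{r}
z975  <- qnorm(0.975)
ci_se <- function(lower, upper) (log(upper) - log(lower)) / (2 * z975)

# Assumed prior for xi_2: centred on the Stage 1 divergence (RWE minus RCT)
xi1_manual <- log(0.82) - log(0.87)
xi1_se     <- sqrt(ci_se(0.78, 0.97)^2 + ci_se(0.76, 0.87)^2)
prior_ci   <- xi1_manual + c(-1, 1) * z975 * xi1_se

# Assumed tipping point: bias at which the corrected upper limit equals HR = 1,
# i.e. log(upper) - xi = 0, so xi = log(upper)
tip_manual <- log(0.93)

round(c(tipping     = tip_manual,
        prior_lower = prior_ci[1],
        prior_upper = prior_ci[2]), 3)

# TRUE means the tipping point lies inside the assumed prior interval
tip_manual > prior_ci[1] && tip_manual < prior_ci[2]
```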

## References

- Wang SV, Schneeweiss S, RCT-DUPLICATE Initiative. Emulation of randomized
  clinical trials with nonrandomized database analyses: results of 32 clinical
  trials. *JAMA*. 2023;329:1376-1385.
- Wang SV, Russo M, Glynn RJ, et al. A Benchmark, Expand, and Calibration
  (BenchExCal) Trial Emulation Approach for Using Real-World Evidence to
  Support Indication Expansions. *Clin Pharmacol Ther*. 2025;117:1820-1828.
- Weberpals J, Schneeweiss S, et al. Emulating Comparative Oncology Trials
  With Real-World Evidence Studies (ENCORE). *Clin Pharmacol Ther*. 2026.