Fits one or more Cox models for each exposure variable and returns a tidy result table suitable for downstream forest plots. By default (base = TRUE), two standard adjustment models, Unadjusted and Age and sex adjusted, are included alongside the optional user-specified Fully adjusted model; see Details.
Usage
assoc_coxph(
  data,
  outcome_col,
  time_col,
  exposure_col,
  covariates = NULL,
  base = TRUE,
  strata = NULL,
  test = c("wald", "lrt"),
  cluster_col = NULL,
  conf_level = 0.95
)
assoc_cox(
  data,
  outcome_col,
  time_col,
  exposure_col,
  covariates = NULL,
  base = TRUE,
  strata = NULL,
  test = c("wald", "lrt"),
  cluster_col = NULL,
  conf_level = 0.95
)
Arguments
- data
(data.frame or data.table) Analysis dataset. Must contain all columns referenced by outcome_col, time_col, and exposure_col.
- outcome_col
(character) Name of the event indicator column. Accepts logical (TRUE/FALSE) or numeric/integer (0/1).
- time_col
(character) Name of the follow-up time column (numeric, in consistent units, e.g. years).
- exposure_col
(character) One or more exposure variable names. Each variable is analysed separately; results are stacked row-wise.
- covariates
(character or NULL) Additional covariate column names for the Fully adjusted model (e.g. c("tdi", "smoking", paste0("pc", 1:10))). When NULL (default), the Fully adjusted model is not run.
- base
(logical) If TRUE (default), always include the Unadjusted and Age and sex adjusted models in addition to any user-specified covariates model. Set to FALSE to run only the Fully adjusted model (requires covariates to be non-NULL).
- strata
(character or NULL) Optional stratification variable. Passed to survival::strata() in the Cox formula.
- test
(character) P-value method for Cox models: "wald" (default) or "lrt".
- cluster_col
(character or NULL) Optional clustering variable for cluster-robust Cox variance, added to the model as cluster(cluster_col). When supplied, test must be "wald".
- conf_level
(numeric) Confidence level for hazard ratio intervals. Default: 0.95.
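The strata and cluster_col arguments correspond to survival formula specials. The following is a minimal sketch of how a Cox formula could be assembled from these argument values; build_cox_formula() is a hypothetical helper (not part of this package's API), and centre and family_id are placeholder column names.

```r
# build_cox_formula() is a hypothetical illustration, not the package's
# internal code. It joins the exposure, optional covariates, an optional
# strata() term and an optional cluster() term into one Cox formula.
build_cox_formula <- function(outcome_col, time_col, exposure,
                              covariates = NULL, strata = NULL,
                              cluster_col = NULL, env = parent.frame()) {
  rhs <- c(exposure, covariates)
  if (!is.null(strata)) rhs <- c(rhs, sprintf("strata(%s)", strata))
  if (!is.null(cluster_col)) rhs <- c(rhs, sprintf("cluster(%s)", cluster_col))
  # Surv() is not evaluated here; the formula is only built, not fitted
  stats::as.formula(
    sprintf("Surv(%s, %s) ~ %s", time_col, outcome_col,
            paste(rhs, collapse = " + ")),
    env = env
  )
}

# centre and family_id are placeholder column names
f <- build_cox_formula("dm_status", "dm_followup_years", "p20116_i0",
                       covariates = c("bmi_cat", "tdi_cat"),
                       strata = "centre", cluster_col = "family_id")
print(f)
```

The resulting formula could then be passed to survival::coxph() together with the analysis data.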
Value
A data.table with one row per exposure \(\times\) term \(\times\) model combination, and the following columns:
- exposure
Exposure variable name.
- term
Coefficient name as returned by coxph (e.g. "bmi_categoryObese" for a factor, or the variable name itself for numeric/binary exposures).
- model
Ordered factor: Unadjusted < Age and sex adjusted < Fully adjusted.
- n
Number of participants included in the model (after NA removal).
- n_events
Number of events in the model's analysis set (after NA removal).
- person_years
Total person-years of follow-up in the model's analysis set (rounded, after NA removal).
- HR
Hazard ratio (point estimate).
- CI_lower
Lower bound of the confidence interval.
- CI_upper
Upper bound of the confidence interval.
- p_value
P-value from the method selected by test. When cluster_col is supplied, this is the cluster-robust Wald p-value.
- HR_label
Formatted string, e.g. "1.23 (1.05-1.44)".
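To make the per-model columns concrete, here is a rough sketch that derives most of them from a single survival::coxph() fit on simulated data. tidy_cox() is a hypothetical helper, not the package's implementation; it assumes a right-censored Surv(time, status) outcome, and rounding person-years to whole units is an assumption.

```r
library(survival)

# Simulated toy data (not from the package): a 3-level exposure with
# a few missing values, so that n reflects NA removal.
set.seed(42)
n <- 300
sim <- data.frame(
  time   = rexp(n, rate = 0.1),
  status = rbinom(n, 1, 0.4),
  smoke  = factor(sample(c("Never", "Previous", "Current"), n, replace = TRUE),
                  levels = c("Never", "Previous", "Current"))
)
sim$smoke[1:5] <- NA

fit <- coxph(Surv(time, status) ~ smoke, data = sim)

# tidy_cox() is a hypothetical helper mapping one fit onto the documented
# columns (exposure and model are omitted here).
tidy_cox <- function(fit, conf_level = 0.95) {
  s <- summary(fit, conf.int = conf_level)
  out <- data.frame(
    term         = rownames(s$coefficients),
    n            = fit$n,       # observations used after NA removal
    n_events     = fit$nevent,  # events in the analysis set
    # first Surv column is follow-up time for right-censored data;
    # rounding to whole units is an assumption
    person_years = round(sum(unclass(fit$y)[, 1])),
    HR           = unname(s$conf.int[, 1]),
    CI_lower     = unname(s$conf.int[, 3]),
    CI_upper     = unname(s$conf.int[, 4]),
    p_value      = unname(s$coefficients[, "Pr(>|z|)"]),
    row.names    = NULL
  )
  out$HR_label <- sprintf("%.2f (%.2f-%.2f)", out$HR, out$CI_lower, out$CI_upper)
  out
}

res <- tidy_cox(fit)
print(res)
```

Because n, n_events and person_years come from the fitted object rather than the input data, they automatically reflect rows dropped for missing values.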
Details
For each exposure, up to three models are fitted:
- Unadjusted: no covariates (crude).
- Age and sex adjusted: age and sex, auto-detected from standard UKB names (p21022/p31) or decoded names (age_at_recruitment/sex). An error is raised if either column cannot be found.
- Fully adjusted: the covariates supplied via the covariates argument. Run only when covariates is non-NULL.
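The model selection rules above can be sketched as follows. find_age_sex() and model_covariates() are hypothetical helpers written for illustration only; in particular, detecting age and sex independently (so that, say, p21022 can pair with a decoded sex column) is an assumption about how the lookup works.

```r
# Hypothetical helper: look up age and sex independently, preferring the
# raw UKB field names, and fail if either cannot be found.
find_age_sex <- function(data) {
  pick <- function(choices, label) {
    hit <- intersect(choices, names(data))
    if (length(hit) == 0L) {
      stop(sprintf("Cannot find %s column (tried: %s)", label,
                   paste(choices, collapse = ", ")), call. = FALSE)
    }
    hit[1L]
  }
  c(age = pick(c("p21022", "age_at_recruitment"), "age"),
    sex = pick(c("p31", "sex"), "sex"))
}

# Hypothetical helper: return the covariate set for each model to fit,
# in the documented model order.
model_covariates <- function(data, covariates = NULL, base = TRUE) {
  models <- list()
  if (base) {
    models[["Unadjusted"]] <- character(0)
    models[["Age and sex adjusted"]] <- unname(find_age_sex(data))
  }
  if (!is.null(covariates)) models[["Fully adjusted"]] <- covariates
  if (length(models) == 0L) {
    stop("base = FALSE requires non-NULL covariates", call. = FALSE)
  }
  models
}

toy <- data.frame(p21022 = c(60, 55), p31 = c(0, 1), bmi_cat = c("a", "b"))
str(model_covariates(toy, covariates = c("p21022", "p31", "bmi_cat")))
```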
Outcome coding: outcome_col may be logical
(TRUE/FALSE) or integer/numeric (0/1).
Logical values are converted to integer internally.
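A small sketch of the outcome conversion described above. as_event_indicator() is a hypothetical helper, and rejecting numeric values other than 0/1 is an assumed validation step, not documented behaviour.

```r
# Hypothetical helper: coerce an event indicator to integer 0/1.
as_event_indicator <- function(x) {
  if (is.logical(x)) return(as.integer(x))  # TRUE/FALSE -> 1L/0L
  if (is.numeric(x)) {
    # assumed check: only 0, 1 or NA are accepted
    bad <- !is.na(x) & !(x %in% c(0, 1))
    if (any(bad)) stop("outcome_col must be 0/1 or TRUE/FALSE", call. = FALSE)
    return(as.integer(x))
  }
  stop("outcome_col must be logical or numeric", call. = FALSE)
}

as_event_indicator(c(TRUE, FALSE, NA))
```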
Exposure types supported:
- Binary: 0/1 or TRUE/FALSE; produces one term row per model.
- Factor: produces one term row per non-reference level.
- Numeric (continuous): produces one term row per model.
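The number of term rows follows from how survival::coxph() names coefficients. This sketch uses simulated data (the column names are illustrative) to show one coefficient for a numeric or 0/1 exposure and one per non-reference level for a factor.

```r
library(survival)

# Simulated toy data; column names are illustrative only.
set.seed(1)
n <- 200
d <- data.frame(
  time    = rexp(n, 0.1),
  status  = rbinom(n, 1, 0.5),
  bmi     = rnorm(n, 27, 4),   # numeric (continuous)
  smoker  = rbinom(n, 1, 0.3), # binary 0/1
  bmi_cat = factor(sample(c("Normal", "Overweight", "Obese"), n, TRUE),
                   levels = c("Normal", "Overweight", "Obese"))  # factor
)

# Fit one crude Cox model per exposure and keep the coefficient names
terms_by_type <- lapply(
  c(bmi = "bmi", smoker = "smoker", bmi_cat = "bmi_cat"),
  function(x) {
    fit <- coxph(as.formula(paste("Surv(time, status) ~", x)), data = d)
    names(coef(fit))
  }
)
print(terms_by_type)
```

For bmi_cat, "Normal" is the reference level, so only the Overweight and Obese rows appear.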
P-value method: test = "wald" returns the term-level Wald
p-value from summary.coxph(). test = "lrt" returns the
exposure-level likelihood-ratio p-value from single-term deletion
(drop1(..., test = "Chisq")); for factor exposures, the same overall
exposure p-value is repeated across the non-reference level rows.
When cluster_col is supplied, only test = "wald" is supported;
the Wald p-value and confidence interval use the cluster-robust variance from
summary.coxph().
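The three p-value routes can be reproduced directly with survival and stats. This is a sketch on simulated data (variable names are illustrative), not the package's code: Wald p-values come from summary.coxph(), the LRT p-value from drop1(..., test = "Chisq"), and the cluster-robust Wald p-value from a fit with a cluster() term.

```r
library(survival)

# Simulated toy data; family plays the role of cluster_col.
set.seed(7)
n <- 400
d <- data.frame(
  time   = rexp(n, 0.1),
  status = rbinom(n, 1, 0.5),
  age    = rnorm(n, 56, 8),
  smoke  = factor(sample(c("Never", "Previous", "Current"), n, TRUE),
                  levels = c("Never", "Previous", "Current")),
  family = sample(1:150, n, TRUE)
)
lv <- c("smokePrevious", "smokeCurrent")

fit <- coxph(Surv(time, status) ~ smoke + age, data = d)

# test = "wald": one term-level p-value per non-reference level
wald_p <- summary(fit)$coefficients[lv, "Pr(>|z|)"]

# test = "lrt": one exposure-level p-value from single-term deletion,
# repeated across the factor's level rows
lrt_tab <- drop1(fit, test = "Chisq")
lrt_p <- rep(lrt_tab["smoke", "Pr(>Chi)"], length(lv))

# cluster_col: robust (sandwich) variance; the Wald z and p use the robust SE
fit_cl <- coxph(Surv(time, status) ~ smoke + age + cluster(family), data = d)
robust_p <- summary(fit_cl)$coefficients[lv, "Pr(>|z|)"]

print(rbind(wald = wald_p, lrt = lrt_p, robust_wald = robust_p))
```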
Examples
dt <- ops_toy(scenario = "association", n = 500)
#> ✔ ops_toy: 500 participants | 33 columns | scenario = "association" | seed = 42
dt <- dt[dm_timing != 1L]
# Fully adjusted model using BMI and TDI categories only (no age or sex)
res <- assoc_coxph(
  data = dt,
  outcome_col = "dm_status",
  time_col = "dm_followup_years",
  exposure_col = "p20116_i0",
  covariates = c("bmi_cat", "tdi_cat"),
  base = FALSE
)
#> ℹ outcome_col dm_status: logical detected, converting TRUE/FALSE -> 1/0
#>
#> ── assoc_coxph ─────────────────────────────────────────────────────────────────
#> ℹ 1 exposure x 1 model = 1 Cox regression
#> ℹ Input cohort: 463 participants (n/n_events/person_years reflect each model's actual analysis set)
#>
#> ── p20116_i0 ──
#>
#> ✔ Fully adjusted | p20116_i0Previous: HR 0.39 (0.13-1.16), p = 0.0918
#> ✔ Fully adjusted | p20116_i0Current: HR 0.41 (0.09-1.77), p = 0.232
#> ✔ Done: 2 result rows across 1 exposure and 1 model.
# Fully adjusted only (skip Unadjusted + Age-sex), adding age (p21022) and sex (p31) as covariates
res <- assoc_coxph(
  data = dt,
  outcome_col = "dm_status",
  time_col = "dm_followup_years",
  exposure_col = "p20116_i0",
  covariates = c("p21022", "p31", "bmi_cat", "tdi_cat"),
  base = FALSE
)
#> ℹ outcome_col dm_status: logical detected, converting TRUE/FALSE -> 1/0
#>
#> ── assoc_coxph ─────────────────────────────────────────────────────────────────
#> ℹ 1 exposure x 1 model = 1 Cox regression
#> ℹ Input cohort: 463 participants (n/n_events/person_years reflect each model's actual analysis set)
#>
#> ── p20116_i0 ──
#>
#> ✔ Fully adjusted | p20116_i0Previous: HR 0.41 (0.14-1.21), p = 0.105
#> ✔ Fully adjusted | p20116_i0Current: HR 0.42 (0.10-1.84), p = 0.253
#> ✔ Done: 2 result rows across 1 exposure and 1 model.
