Linear regression association analysis — assoc

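Fits one or more linear regression models for each exposure variable and returns a tidy result table. By default (base = TRUE), two standard adjustment models are included: Unadjusted and Age and sex adjusted. A third model, Fully adjusted, is added when covariates is supplied.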

Usage

assoc_linear(
  data,
  outcome_col,
  exposure_col,
  covariates = NULL,
  base = TRUE,
  test = c("wald", "lrt"),
  conf_level = 0.95
)

assoc_lm(
  data,
  outcome_col,
  exposure_col,
  covariates = NULL,
  base = TRUE,
  test = c("wald", "lrt"),
  conf_level = 0.95
)

Arguments

data: (data.frame or data.table) Analysis dataset.
outcome_col: (character) Name of the continuous numeric outcome column.
exposure_col: (character) One or more exposure variable names.
covariates: (character or NULL) Covariate column names for the Fully adjusted model. Default: NULL.
base: (logical) Include Unadjusted and Age and sex adjusted models. Default: TRUE.
test: (character) P-value method for linear models: "wald" (default) or "lrt". For lm, "lrt" uses the conventional nested-model F test.
conf_level: (numeric) Confidence level. Default: 0.95.

Value

A data.table with one row per exposure \(\times\) term \(\times\) model combination, and columns:

exposure: Exposure variable name.
term: Coefficient name (e.g. "bmi_categoryObese").
model: Ordered factor: Unadjusted < Age and sex adjusted < Fully adjusted.
n: Number of participants in the model (after NA removal).
beta: Regression coefficient (\(\beta\)).
se: Standard error of \(\beta\).
CI_lower: Lower confidence bound.
CI_upper: Upper confidence bound.
p_value: P-value from the method selected by test.
beta_label: Formatted string, e.g. "0.23 (0.05-0.41)".
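
The beta_label format can be reproduced from the numeric columns when a different number of decimals is needed. The sketch below is only an illustration: fmt_beta is a hypothetical helper, not a function exported by this package, and the package's own rounding rules may differ.

```r
# Hypothetical helper (not part of the package): rebuilds a label such as
# "0.23 (0.05-0.41)" from beta, CI_lower and CI_upper.
fmt_beta <- function(beta, lower, upper, digits = 2) {
  # Build a format string like "%.2f (%.2f-%.2f)" for the requested digits
  fmt <- paste0("%.", digits, "f (%.", digits, "f-%.", digits, "f)")
  sprintf(fmt, beta, lower, upper)
}

label_pos <- fmt_beta(0.23, 0.05, 0.41)
label_neg <- fmt_beta(-0.05, -0.59, 0.49)
print(label_pos)
#> [1] "0.23 (0.05-0.41)"
print(label_neg)
#> [1] "-0.05 (-0.59-0.49)"
```

Because sprintf() is vectorised, the same helper can be applied to whole columns of a result table.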

Details

Models: up to three models are fitted per exposure.
Unadjusted - no covariates (crude).
Age and sex adjusted - age and sex, auto-detected from standard UKB field names (p21022/p31) or decoded names (age_at_recruitment/sex). An error is raised if either column cannot be found.
Fully adjusted - the columns supplied via the covariates argument. Fitted only when covariates is non-NULL.
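
As a rough illustration of the three model specifications, the following sketch fits the equivalent formulas directly with lm() on simulated data. The column names (bmi, smoking, age, sex, tdi) are made up for the example, and the code is an assumption about what the models amount to, not the package's internal implementation.

```r
# Simulated stand-in data; all column names here are illustrative only.
set.seed(1)
n <- 200
d <- data.frame(
  age     = rnorm(n, 55, 8),
  sex     = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  smoking = factor(sample(c("Never", "Previous", "Current"), n, replace = TRUE),
                   levels = c("Never", "Previous", "Current")),
  tdi     = rnorm(n)
)
d$bmi <- 27 + 0.03 * d$age + 0.5 * (d$sex == "Male") + rnorm(n, sd = 4)

exposure   <- "smoking"
covariates <- c("age", "sex", "tdi")  # user-chosen set for the Fully adjusted model

# One formula per model; reformulate() builds outcome ~ term1 + term2 + ...
formulas <- list(
  "Unadjusted"           = reformulate(exposure, response = "bmi"),
  "Age and sex adjusted" = reformulate(c(exposure, "age", "sex"), response = "bmi"),
  "Fully adjusted"       = reformulate(c(exposure, covariates), response = "bmi")
)

# Fit each model on the same data
fits <- lapply(formulas, lm, data = d)

# Keep only the exposure coefficients (one per non-reference level)
exposure_coefs <- lapply(fits, function(f) {
  cf <- coef(f)
  cf[grep("^smoking", names(cf))]
})
print(exposure_coefs)
```

Each fitted model contributes one row per non-reference exposure level, which corresponds to the exposure x term x model layout of the returned table.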

Outcome: intended for continuous numeric variables. Passing a binary (0/1) or logical column is permitted (linear probability model) but will trigger a warning recommending assoc_logistic instead.

CI method: based on the t-distribution via confint.lm(), which is exact under the normal linear model assumption. There is no ci_method argument (unlike assoc_logistic) as profile likelihood does not apply to lm.
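
To make the CI method concrete, the sketch below checks on simulated data that confint() for an lm fit equals the estimate plus or minus a t quantile times the standard error. The variable names are illustrative and the code is independent of this package.

```r
set.seed(2)
x <- rnorm(100)
y <- 1 + 0.5 * x + rnorm(100)
fit <- lm(y ~ x)

conf_level <- 0.95
ci <- confint(fit, level = conf_level)

# Manual reconstruction: beta +/- t quantile * SE on the residual df
coefs <- coef(summary(fit))
est   <- coefs[, "Estimate"]
se    <- coefs[, "Std. Error"]
tcrit <- qt(1 - (1 - conf_level) / 2, df = df.residual(fit))
manual <- cbind(est - tcrit * se, est + tcrit * se)

# The two intervals agree up to numerical tolerance
same <- isTRUE(all.equal(unname(ci), unname(manual)))
print(same)
```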

SE column: the standard error of \(\beta\) is included to support downstream meta-analysis and GWAS-style summary statistics.
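
As one example of such downstream use, a fixed-effect inverse-variance meta-analysis needs only beta and se from each study. The numbers below are invented purely for illustration and do not come from any dataset.

```r
# Invented beta/se pairs from three hypothetical cohorts
beta <- c(0.20, 0.30, 0.10)
se   <- c(0.10, 0.20, 0.10)

# Inverse-variance weights: more precise estimates count more
w <- 1 / se^2

pooled_beta <- sum(w * beta) / sum(w)
pooled_se   <- sqrt(1 / sum(w))

# Normal-approximation 95% interval for the pooled estimate
pooled_ci <- pooled_beta + c(-1, 1) * qnorm(0.975) * pooled_se
print(c(beta = pooled_beta, se = pooled_se))
```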

P-value method: test = "wald" returns coefficient-level t-test p-values from summary.lm(). test = "lrt" returns an exposure-level nested-model p-value from single-term deletion (drop1(..., test = "F")); for factor exposures, the same overall exposure p-value is repeated across the non-reference level rows.
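
The difference between the two p-value methods can be seen with plain lm() on simulated data. This is a sketch using base R only, with made-up column names; it mirrors the calls named above but is not the package's code.

```r
set.seed(3)
n <- 300
d <- data.frame(
  smoking = factor(sample(c("Never", "Previous", "Current"), n, replace = TRUE),
                   levels = c("Never", "Previous", "Current")),
  age = rnorm(n, 55, 8)
)
d$bmi <- 27 + 0.02 * d$age + rnorm(n, sd = 4)

fit <- lm(bmi ~ smoking + age, data = d)

# "wald" style: one t-test p-value per non-reference level
wald_p <- coef(summary(fit))[c("smokingPrevious", "smokingCurrent"), "Pr(>|t|)"]

# "lrt" style: a single F-test p-value for the exposure as a whole,
# obtained by deleting the exposure term from the model
d1 <- drop1(fit, scope = ~ smoking, test = "F")
lrt_p <- d1["smoking", "Pr(>F)"]

# The same p-value from an explicit nested-model comparison
reduced <- lm(bmi ~ age, data = d)
anova_p <- anova(reduced, fit)[2, "Pr(>F)"]

print(wald_p)
print(c(drop1 = lrt_p, anova = anova_p))
```

With a three-level factor, the Wald approach yields two p-values (one per contrast against the reference level), whereas the nested-model test yields one p-value that applies to the exposure overall.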

Examples

dt <- ops_toy(scenario = "association", n = 500)
#> ✔ ops_toy: 500 participants | 33 columns | scenario = "association" | seed = 42

res <- assoc_linear(
  data         = dt,
  outcome_col  = "p21001_i0",
  exposure_col = "p20116_i0",
  covariates   = c("bmi_cat", "tdi_cat"),
  base         = FALSE
)
#> 
#> ── assoc_linear ────────────────────────────────────────────────────────────────
#> ℹ 1 exposure x 1 model = 1 linear regression
#> ℹ Input cohort: 500 participants | test: wald (n reflects each model's actual analysis set)
#> 
#> ── p20116_i0 ──
#> 
#> ✔   Fully adjusted | p20116_i0Previous: beta 0.10 (-0.31-0.51), p = 0.628
#> ✔   Fully adjusted | p20116_i0Current: beta -0.05 (-0.59-0.49), p = 0.856
#> ✔ Done: 2 result rows across 1 exposure and 1 model.