
Derive a binary disease flag from UKB First Occurrence fields
Source: R/derive.R
derive_first_occurrence.Rd
UKB pre-computes the earliest recorded date for hundreds of ICD-10 chapters
and categories as First Occurrence fields (p131xxx). Each
field contains a single date per participant; no array or instance depth is
involved. This function reads that date, converts it to IDate,
and writes two analysis-ready columns: {name}_fo and {name}_fo_date.
Arguments
- data
(data.frame or data.table) UKB phenotype data.
- name
(character) Output column prefix, e.g.
"disease" produces disease_fo and disease_fo_date.
- field
(integer or character) UKB field ID of the First Occurrence field, e.g.
131666 for E11 (type 2 diabetes).
- col
(character or NULL) Name of the source column in
data. When NULL (default) the column is detected automatically from field.
Value
The input data (invisibly) with two new columns added
in-place: {name}_fo (logical) and {name}_fo_date (IDate).
Details
- {name}_fo_date
Earliest First Occurrence date (IDate). Values that cannot be coerced to a valid date (e.g. UKB error codes) are silently set to NA.
- {name}_fo
Logical flag derived from {name}_fo_date: TRUE if and only if a valid date exists. This guarantees that every positive case has a usable date, which is essential for time-to-event and prevalent/incident classification.
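The relationship between the two columns can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the package's implementation: base Date stands in for IDate, the raw values are made up, and the baseline date is a hypothetical addition used only to show the prevalent/incident split.

```r
# Illustrative raw values: two valid dates, an empty string, NA, and junk text.
raw <- c("2012-03-04", "", NA, "not-a-date", "1999-12-31")

# With an explicit format, unparseable values become NA instead of raising an error.
fo_date <- as.Date(raw, format = "%Y-%m-%d")

# The flag is derived from the date, so TRUE always implies a usable date.
fo <- !is.na(fo_date)

# Hypothetical baseline (assessment) date, the same for every participant here.
baseline <- as.Date("2008-01-01")

# FALSE & NA is FALSE in R, so non-cases never produce NA in these flags.
prevalent <- fo & fo_date <= baseline
incident  <- fo & fo_date > baseline

print(data.frame(raw, fo_date, fo, prevalent, incident))
```

Because the flag is computed from the coerced date rather than from the raw column, a participant with an unusable value is counted as a non-case instead of a case with a missing date.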
Column detection: the function locates the source column
automatically from field, handling both the raw format used by
extract_pheno (participant.p131666) and the
snake_case format produced by decode_names
(date_e11_first_reported_type_2_diabetes). Supply col
to override auto-detection.
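One way such detection could work is sketched below. The helper find_fo_column and its decoded_lookup argument are hypothetical names introduced for illustration; how the package actually maps a field ID to a decoded column name is not documented here, so the lookup is assumed to be supplied by the caller.

```r
# Hypothetical helper, not the package's actual implementation.
find_fo_column <- function(col_names, field, decoded_lookup = NULL) {
  # Raw format: optional entity prefix, then "p" + field ID,
  # e.g. "participant.p131666" or "p131666".
  raw_pattern <- paste0("(^|\\.)p", field, "$")
  hits <- grep(raw_pattern, col_names, value = TRUE)

  # Decoded format: the snake_case name carries no field ID, so a
  # field -> name mapping is needed (assumed to be a named list).
  if (length(hits) == 0L && !is.null(decoded_lookup)) {
    hits <- intersect(col_names, decoded_lookup[[as.character(field)]])
  }

  if (length(hits) != 1L) {
    stop("Expected exactly one column for field ", field,
         ", found ", length(hits), "; supply `col` explicitly.")
  }
  hits
}

raw_names <- c("participant.eid", "participant.p131666")
find_fo_column(raw_names, 131666L)

decoded_names <- c("eid", "date_e11_first_reported_type_2_diabetes")
lookup <- list("131666" = "date_e11_first_reported_type_2_diabetes")
find_fo_column(decoded_names, 131666L, decoded_lookup = lookup)
```

Failing loudly when zero or several columns match mirrors the role of col in the real function: ambiguous cases are resolved by the caller rather than guessed.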
data.table pass-by-reference: when the input is a
data.table, new columns are added in-place via :=.
The returned object and the original variable point to the same memory.
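The practical consequence of reference semantics can be seen with plain data.table code. The sketch below does not call derive_first_occurrence; it reproduces the same kind of := assignment on a small made-up table, with column names chosen to match the naming scheme described above.

```r
library(data.table)

# Made-up table with a raw First Occurrence column.
dt <- data.table(eid = 1:3, p131666 = c("2012-03-04", NA, "2001-11-30"))

# Plain assignment does not copy a data.table: both names refer to one table.
same_dt <- dt
same_dt[, outcome_fo_date := as.IDate(p131666, format = "%Y-%m-%d")]
same_dt[, outcome_fo := !is.na(outcome_fo_date)]

# The columns added through `same_dt` are visible through `dt` as well.
names(dt)

# To keep the original untouched, take an explicit copy first.
safe_dt <- copy(dt)
safe_dt[, extra := 1L]
"extra" %in% names(dt)
```

If the input must remain unchanged, passing copy(dt) to the function is the usual data.table idiom; for a plain data.frame input this concern does not arise in the same way, since base R assignment copies on modification.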
Examples
dt <- ops_toy(n = 100)
#> ✔ ops_toy: 100 participants | 75 columns | scenario = "cohort" | seed = 42
derive_first_occurrence(dt, name = "outcome", field = 131742L, col = "p131742")
#> ✔ derive_first_occurrence (outcome): 5 cases with valid date.