Submits an asynchronous table-exporter job on the DNAnexus Research Analysis Platform for large-scale phenotype extraction. Use this instead of extract_pheno() when extracting many fields (e.g. 50+).

Usage

extract_batch(
  field_id,
  dataset = NULL,
  file = NULL,
  instance_type = NULL,
  priority = c("low", "high")
)

Arguments

field_id

(integer) Vector of UKB Field IDs to extract. eid is always included automatically.

dataset

(character) Dataset file name. Default: NULL (auto-detect from project root).

file

(character) Output file name on the cloud (without extension), e.g. "ad_cscc_pheno". Default: NULL (auto-generate as "ukb_pheno_YYYYMMDD_HHMMSS" to avoid same-day collisions).

instance_type

(character) DNAnexus instance type, e.g. "mem1_ssd1_v2_x16". Default: NULL (auto-select by number of columns: x4 for up to 20 columns, x8 for up to 100, x16 for up to 500, x36 for more than 500).

priority

(character) Job scheduling priority. "low" (recommended, cheaper) or "high" (faster queue). Default: "low".

Value

Invisibly returns the job ID string (e.g. "job-XXXX").

Details

The job runs on the cloud and typically completes in 20 to 40 minutes. Monitor progress and retrieve results with the job_*() functions, such as job_status() and job_result().
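
The defaults documented for file and instance_type can be illustrated with a minimal sketch. This is not the package's actual implementation: pick_instance_type() and default_file_name() are hypothetical helper names, the "mem1_ssd1_v2_" family prefix is assumed from the example value above, and whether the column count includes eid is not stated in this page.

```r
# Hypothetical helpers that mirror the documented defaults.
# They are illustrations only and are not exported by the package.

# Map a column count to an instance type using the documented thresholds.
# Assumption: the instance family is "mem1_ssd1_v2_", as in the example value.
pick_instance_type <- function(n_cols) {
  stopifnot(is.numeric(n_cols), length(n_cols) == 1, n_cols >= 1)
  size <- if (n_cols <= 20) {
    "x4"
  } else if (n_cols <= 100) {
    "x8"
  } else if (n_cols <= 500) {
    "x16"
  } else {
    "x36"
  }
  paste0("mem1_ssd1_v2_", size)
}

# Build a timestamped output name in the documented
# "ukb_pheno_YYYYMMDD_HHMMSS" pattern.
default_file_name <- function(now = Sys.time()) {
  format(now, "ukb_pheno_%Y%m%d_%H%M%S")
}

pick_instance_type(50)
default_file_name()
```

Passing instance_type or file explicitly to extract_batch() overrides the corresponding default, so a sketch like this is only useful for predicting what the NULL defaults would resolve to.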

Examples

if (FALSE) { # \dontrun{
job_id <- extract_batch(core_field_ids)
job_id <- extract_batch(core_field_ids, file = "ad_cscc_pheno")
job_id <- extract_batch(core_field_ids, priority = "high")
# Monitor: job_status(job_id)
# Download: job_result(job_id, dest = "data/pheno.csv")
} # }