Downloads GEO (Gene Expression Omnibus) datasets, including expression data, supplemental files, and platform annotations, with error handling and logging.
Usage
download_geo_data(
  gse_id,
  dest_dir,
  overwrite = FALSE,
  log = TRUE,
  log_file = NULL,
  retries = 2,
  timeout = 300
)
Arguments
- gse_id
Character. GEO Series accession ID (e.g., "GSE12345").
- dest_dir
Character. Destination directory for downloaded files.
- overwrite
Logical. Whether to overwrite existing files (default: FALSE).
- log
Logical. Whether to create a log file (default: TRUE).
- log_file
Character or NULL. Log file path (auto-generated if NULL).
- retries
Numeric. Number of retry attempts (default: 2).
- timeout
Numeric. Timeout in seconds (default: 300).
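The argument types listed above can be checked before any network request is made. The following is a minimal sketch of such a check, not the package's actual validation code. The helper name `check_geo_args` is hypothetical, and the pattern assumes that a series accession is the prefix "GSE" followed by digits, as in the "GSE12345" example.

```r
# Hypothetical helper for illustration; not part of the documented package.
check_geo_args <- function(gse_id, dest_dir, overwrite = FALSE,
                           retries = 2, timeout = 300) {
  # Assumption: a series accession is "GSE" followed by one or more digits.
  stopifnot(
    is.character(gse_id), length(gse_id) == 1L,
    grepl("^GSE[0-9]+$", gse_id),
    is.character(dest_dir), length(dest_dir) == 1L,
    is.logical(overwrite), length(overwrite) == 1L, !is.na(overwrite),
    is.numeric(retries), length(retries) == 1L, retries >= 0,
    is.numeric(timeout), length(timeout) == 1L, timeout > 0
  )
  invisible(TRUE)
}

# A well-formed call passes silently; a platform ID such as "GPL570"
# would raise an error because it does not match the GSE pattern.
check_geo_args("GSE12345", dest_dir = tempdir())
```

Failing early on a malformed accession avoids spending retries and timeout budget on a request that cannot succeed.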
Value
A list with components:
- gse_object
ExpressionSet object with expression data and annotations
- supplemental_files
Paths to downloaded supplemental files
- platform_info
Platform information (platform_id, gpl_files)
- meta
Download metadata (timing, file counts, etc.)
Details
Downloads GSEMatrix files, supplemental files, and GPL annotations. Includes a retry mechanism, timeout control, and logging. Requires the GEOquery, Biobase, withr, and cli packages.
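The retry and timeout behaviour described above can be illustrated with a small base R sketch. This is not the package's implementation: the helper `with_retries` and its `wait` argument are hypothetical, the sketch uses `options(timeout = ...)` directly rather than withr, and it assumes that `retries = 2` means two further attempts after the first failure.

```r
# Hypothetical helper for illustration; not part of the documented package.
with_retries <- function(fn, retries = 2, timeout = 300, wait = 0) {
  old <- options(timeout = timeout)  # timeout used by base R download functions
  on.exit(options(old), add = TRUE)  # restore the previous value on exit
  attempts <- retries + 1            # assumption: one initial try plus retries
  last_error <- NULL
  for (i in seq_len(attempts)) {
    result <- tryCatch(
      list(ok = TRUE, value = fn()),
      error = function(e) list(ok = FALSE, error = e)
    )
    if (result$ok) return(result$value)
    last_error <- result$error
    if (i < attempts && wait > 0) Sys.sleep(wait)  # optional pause between tries
  }
  stop("All ", attempts, " attempts failed: ",
       conditionMessage(last_error), call. = FALSE)
}

# Usage with a stand-in task that fails twice before succeeding.
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("transient failure")
  "done"
}
with_retries(flaky, retries = 2, timeout = 5)
```

Restoring the option in `on.exit()` keeps the caller's global timeout unchanged even when every attempt fails.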
References
https://www.ncbi.nlm.nih.gov/geo/
Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, Holko M, Yefanov A, Lee H, Zhang N, Robertson CL, Serova N, Davis S, Soboleva A. NCBI GEO: archive for functional genomics data sets–update. Nucleic Acids Res. 2013 Jan; 41(Database issue):D991-5.
Examples
# Basic usage (commented to avoid network operations):
# result <- download_geo_data("GSE12345", dest_dir = tempdir())
# Advanced usage with custom settings:
# result <- download_geo_data(
#   gse_id = "GSE7305",
#   dest_dir = tempdir(),
#   log = TRUE,
#   retries = 3,
#   timeout = 600
# )
# Access downloaded data:
# expr_data <- Biobase::exprs(result$gse_object)
# sample_info <- Biobase::pData(result$gse_object)
# feature_info <- Biobase::fData(result$gse_object)
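As an alternative to commenting the calls out, the examples can be guarded so that they run only when the required packages are installed and the session is interactive. The sketch below is one possible guard, not something the package provides. The variable names are illustrative, and the `exists()` check assumes that `download_geo_data` has already been attached from its package.

```r
# Packages named in the Details section.
required <- c("GEOquery", "Biobase", "withr", "cli")

# requireNamespace() returns FALSE (rather than erroring) for missing packages.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (length(missing_pkgs) > 0) {
  message("Skipping example; missing packages: ",
          paste(missing_pkgs, collapse = ", "))
} else if (interactive() && exists("download_geo_data")) {
  # Network access happens only here, in an interactive session.
  result <- download_geo_data("GSE12345", dest_dir = tempdir())
  expr_data <- Biobase::exprs(result$gse_object)
}
```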