Toy Data Utilities

Overview

The toy module provides two helper functions for examples, demos, and tests:

| Task | Functions |
|---|---|
| Toy gene reference data | `toy_gene_ref()` |
| Toy GMT file generation | `toy_gmt()` |

library(evanverse)

Note: All code examples in this vignette are static (eval = FALSE). Output is hand-written to reflect the current implementation.

1 Toy Gene Reference Data

`toy_gene_ref()` - Generate a compact gene reference table

toy_gene_ref() returns a small deterministic reference table compatible with gene2entrez() / gene2ensembl() workflows. The human reference includes the symbols used by toy_gmt(), so GMT parsing and gene ID conversion examples can run offline together.

ref_human <- toy_gene_ref()
head(ref_human, 3)
#>          symbol       ensembl_id entrez_id      gene_type species ensembl_version download_date
#> 1          TP53 ENSG00000141510      7157 protein_coding   human             113    2025-04-23
#> 2         BRCA1 ENSG00000012048       672         lncRNA   human             113    2025-04-23
#> 3           MYC ENSG00000136997      4609     pseudogene   human             113    2025-04-23

Key behavior:

- `species` accepts only `"human"` or `"mouse"`.
- `n` sets the number of rows returned and is capped at the 100 available rows.
- Symbols and Ensembl IDs are unique within each species.
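The uniqueness guarantee can be checked with base R. The sketch below does not call toy_gene_ref(). Instead, it builds a hypothetical stand-in (`demo_ref`) from the three human rows printed above and tests, per species, that a column contains no duplicates. `unique_within_species()` is an illustrative helper and is not part of evanverse.

```r
# Hypothetical stand-in for toy_gene_ref() output, built from the three
# human rows printed above (only the columns needed for the check).
demo_ref <- data.frame(
  symbol     = c("TP53", "BRCA1", "MYC"),
  ensembl_id = c("ENSG00000141510", "ENSG00000012048", "ENSG00000136997"),
  species    = c("human", "human", "human"),
  stringsAsFactors = FALSE
)

# Illustrative helper: TRUE when `col` has no duplicated values
# inside any species group.
unique_within_species <- function(df, col) {
  groups <- split(df[[col]], df$species)
  all(vapply(groups, function(x) !anyDuplicated(x), logical(1)))
}

unique_within_species(demo_ref, "symbol")
#> [1] TRUE
unique_within_species(demo_ref, "ensembl_id")
#> [1] TRUE
```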

Mouse example:

ref_mouse <- toy_gene_ref(species = "mouse", n = 10)
ref_mouse[, c("symbol", "ensembl_id", "species")]
#>          symbol          ensembl_id species
#> 1         Trp53 ENSMUSG00000059552   mouse
#> 2         Brca1 ENSMUSG00000017146   mouse
#> ...

Values of `n` above 100 are silently capped:

nrow(toy_gene_ref(n = 999))
#> [1] 100

An invalid `n` (zero, negative, or non-integer) raises an error.
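The capping and validation rules for `n` can be illustrated with a small base-R helper. `cap_n()` is hypothetical and does not come from evanverse. The package's actual checks and error messages may differ. The helper only mirrors the documented behavior: whole numbers of at least 1 are accepted, and anything above the available maximum is reduced to that maximum.

```r
# Hypothetical helper mirroring the documented rule for `n`:
# it must be a single positive whole number, and values above
# `max_n` are capped rather than rejected.
cap_n <- function(n, max_n) {
  if (!is.numeric(n) || length(n) != 1L || is.na(n) ||
      n < 1 || n != round(n)) {
    stop("`n` must be a single positive integer.", call. = FALSE)
  }
  as.integer(min(n, max_n))
}

cap_n(999, 100)  # silently capped, like toy_gene_ref(n = 999)
#> [1] 100
cap_n(3, 5)
#> [1] 3
try(cap_n(2.5, 100))  # non-integer input is rejected
```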

2 Toy GMT File Generation

`toy_gmt()` - Write a temporary GMT file

toy_gmt() writes a GMT file to a temporary path and returns that file path. It is designed to feed directly into gmt2df() and gmt2list().

path <- toy_gmt()
path
#> [1] "C:/Users/.../Rtmp.../file....gmt"

readLines(path)[1]
#> [1] "HALLMARK_P53_PATHWAY\tGenes regulated by p53\tTP53\tBRCA1\tMYC\tEGFR\t..."
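The following sketch gives a rough idea of what toy_gmt() writes. It creates a one-set GMT file with base R: `tempfile()` supplies a throwaway path, and each line joins the term, the description, and the genes with tabs. `write_gmt_sketch()` is a hypothetical name, not the package's implementation. The gene list is truncated to the four genes visible in the line above.

```r
# Hypothetical writer: one tab-separated GMT line per gene set
# (term, description, then genes), written to a temporary file.
write_gmt_sketch <- function(sets, descriptions) {
  path <- tempfile(fileext = ".gmt")
  lines <- vapply(names(sets), function(term) {
    paste(c(term, descriptions[[term]], sets[[term]]), collapse = "\t")
  }, character(1), USE.NAMES = FALSE)
  writeLines(lines, path)
  path
}

demo_sets <- list(HALLMARK_P53_PATHWAY = c("TP53", "BRCA1", "MYC", "EGFR"))
demo_desc <- c(HALLMARK_P53_PATHWAY = "Genes regulated by p53")

demo_path <- write_gmt_sketch(demo_sets, demo_desc)
readLines(demo_path)
#> [1] "HALLMARK_P53_PATHWAY\tGenes regulated by p53\tTP53\tBRCA1\tMYC\tEGFR"
```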

Key behavior:

- The default, `n = 5`, writes all 5 built-in gene sets.
- `n` is capped at the 5 available built-in sets.
- Every line follows the GMT layout: term, description, then genes, separated by tabs.

length(readLines(toy_gmt(n = 1)))
#> [1] 1

length(readLines(toy_gmt(n = 3)))
#> [1] 3

length(readLines(toy_gmt(n = 99)))
#> [1] 5

An invalid `n` (zero, negative, or non-integer) raises an error.
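The line layout (term, description, then genes, tab-separated) is easy to check on any GMT file. The base-R validator below is a sketch. `is_gmt_line()` is hypothetical, and its requirements of at least one gene and a non-empty term are assumptions chosen for illustration. The description field may be empty.

```r
# Hypothetical check: a GMT line needs a non-empty term, a description
# field (possibly empty), and at least one non-empty gene.
is_gmt_line <- function(line) {
  fields <- strsplit(line, "\t", fixed = TRUE)[[1]]
  length(fields) >= 3L &&
    nzchar(fields[1L]) &&
    all(nzchar(fields[-(1:2)]))
}

demo_lines <- c(
  "HALLMARK_P53_PATHWAY\tGenes regulated by p53\tTP53\tBRCA1",
  "BROKEN_SET\tno genes here"
)
vapply(demo_lines, is_gmt_line, logical(1), USE.NAMES = FALSE)
#> [1]  TRUE FALSE
```

Applied to a whole file, `all(vapply(readLines(path), is_gmt_line, logical(1)))` gives a single pass/fail result.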

3 Compatibility With Base Utilities

The outputs are designed to plug directly into the package's existing base utilities: gene2entrez() for gene ID conversion, and gmt2df() / gmt2list() for GMT parsing.

`toy_gene_ref()` with gene ID conversion

ref <- toy_gene_ref(species = "human", n = 50)
gene2entrez(c("TP53", "BRCA1", "GHOST"), ref = ref, species = "human")
#>   symbol symbol_std entrez_id
#> 1   TP53       TP53      7157
#> 2  BRCA1      BRCA1       672
#> 3  GHOST      GHOST      <NA>
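As the output above shows, gene2entrez() returns one row per input symbol, in input order, with NA for symbols missing from `ref`. The core of such a lookup can be sketched with base R `match()`. `lookup_entrez()` and `demo_ref` are hypothetical. The sketch also skips the symbol standardisation that the real function reports in `symbol_std`.

```r
# Hypothetical three-row reference built from the values printed earlier.
demo_ref <- data.frame(
  symbol    = c("TP53", "BRCA1", "MYC"),
  entrez_id = c("7157", "672", "4609"),
  stringsAsFactors = FALSE
)

# match() returns the row of the first exact hit, or NA when there is none,
# so the result keeps the input's order and length.
lookup_entrez <- function(symbols, ref) {
  data.frame(
    symbol    = symbols,
    entrez_id = ref$entrez_id[match(symbols, ref$symbol)],
    stringsAsFactors = FALSE
  )
}

lookup_entrez(c("TP53", "BRCA1", "GHOST"), demo_ref)
#>   symbol entrez_id
#> 1   TP53      7157
#> 2  BRCA1       672
#> 3  GHOST      <NA>
```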

`toy_gmt()` with GMT parsers

tmp <- toy_gmt(n = 3)

df <- gmt2df(tmp)
head(df, 4)
#>                    term               description  gene
#> 1 HALLMARK_P53_PATHWAY   Genes regulated by p53  TP53
#> 2 HALLMARK_P53_PATHWAY   Genes regulated by p53 BRCA1
#> 3 HALLMARK_P53_PATHWAY   Genes regulated by p53   MYC
#> 4 HALLMARK_P53_PATHWAY   Genes regulated by p53  EGFR
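The long layout returned by gmt2df() has one row per term-gene pair. The base-R sketch below reproduces that shape to show what such a parser does. It is illustrative, not the package's implementation. `parse_gmt_long()` is a hypothetical name, and the sketch writes its own small GMT file so it runs without evanverse.

```r
# Hypothetical parser: GMT file -> data frame with term, description, gene.
parse_gmt_long <- function(path) {
  fields <- strsplit(readLines(path), "\t", fixed = TRUE)
  rows <- lapply(fields, function(f) {
    genes <- f[-(1:2)]  # everything after term and description
    data.frame(term        = rep(f[1], length(genes)),
               description = rep(f[2], length(genes)),
               gene        = genes,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

demo_path <- tempfile(fileext = ".gmt")
writeLines("HALLMARK_P53_PATHWAY\tGenes regulated by p53\tTP53\tBRCA1\tMYC\tEGFR",
           demo_path)

demo_long <- parse_gmt_long(demo_path)
nrow(demo_long)
#> [1] 4
unique(demo_long$term)
#> [1] "HALLMARK_P53_PATHWAY"
```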

lst <- gmt2list(tmp)
names(lst)
#> [1] "HALLMARK_P53_PATHWAY" "HALLMARK_MTORC1_SIGNALING" "HALLMARK_HYPOXIA"
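As the output above shows, gmt2list() keeps the file's set order in `names(lst)`. The base-R sketch below builds the same list shape, with one named character vector of genes per line. `parse_gmt_list()` is hypothetical. The second set, `EXAMPLE_SET`, and its description and genes are made up for illustration.

```r
# Hypothetical parser: GMT file -> named list of gene vectors,
# named by term and kept in file order.
parse_gmt_list <- function(path) {
  fields <- strsplit(readLines(path), "\t", fixed = TRUE)
  sets <- lapply(fields, function(f) f[-(1:2)])
  names(sets) <- vapply(fields, `[`, character(1), 1L)
  sets
}

demo_path <- tempfile(fileext = ".gmt")
writeLines(c(
  "HALLMARK_P53_PATHWAY\tGenes regulated by p53\tTP53\tBRCA1\tMYC\tEGFR",
  "EXAMPLE_SET\tMade-up set for illustration\tTP53\tMYC"  # hypothetical set
), demo_path)

demo_lst <- parse_gmt_list(demo_path)
names(demo_lst)      # file order: HALLMARK_P53_PATHWAY, then EXAMPLE_SET
demo_lst$EXAMPLE_SET # the genes after the first two fields
```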

4 A Combined Workflow

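A common offline testing flow is:

1. Build a small reference with toy_gene_ref().
2. Build a small gene-set file with toy_gmt().
3. Parse the GMT file and convert gene symbols to Entrez IDs with the base converters.
4. Regroup the converted IDs into one vector per gene set with df2list().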

library(evanverse)

# 1. Toy reference
ref <- toy_gene_ref(species = "human", n = 100)

# 2. Toy gene sets
path <- toy_gmt(n = 3)
long <- gmt2df(path)

# 3. Convert symbols to Entrez IDs
id_map <- gene2entrez(long$gene, ref = ref, species = "human")
long$entrez_id <- id_map$entrez_id

# 4. Rebuild list of Entrez IDs per term
long2 <- long[!is.na(long$entrez_id), ]
sets_entrez <- df2list(long2, group_col = "term", value_col = "entrez_id")

names(sets_entrez)
#> [1] "HALLMARK_P53_PATHWAY" "HALLMARK_MTORC1_SIGNALING" "HALLMARK_HYPOXIA"
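Step 4 can also be expressed with base `split()`, which is handy when df2list() is not available. One subtlety: `split()` orders groups by factor levels, which for a character column means alphabetical order. The sketch therefore fixes the levels to first-appearance order to match the set order shown above. `demo_long` is a hypothetical stand-in for `long2`. Its Entrez IDs come from the reference rows shown earlier, and `EXAMPLE_SET` is made up.

```r
# Hypothetical stand-in for `long2` after rows with NA IDs were dropped.
demo_long <- data.frame(
  term      = c("HALLMARK_P53_PATHWAY", "HALLMARK_P53_PATHWAY", "EXAMPLE_SET"),
  entrez_id = c("7157", "672", "4609"),
  stringsAsFactors = FALSE
)

# Plain split() would sort groups alphabetically (EXAMPLE_SET first);
# levels = unique(...) keeps them in order of first appearance instead.
demo_grp <- factor(demo_long$term, levels = unique(demo_long$term))
demo_sets_entrez <- split(demo_long$entrez_id, demo_grp)

names(demo_sets_entrez)  # HALLMARK_P53_PATHWAY first, then EXAMPLE_SET
```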
