This function identifies "driver genes" from phenotype-cell type association analyses. For a given phenotype-cell type pair, driver genes are defined as the intersection of genes that have a phenotype evidence score >0 and fall within roughly the top quarter of expression specificity (at or above approximately the 75th percentile; quantiles 30-40 out of 40) for the associated cell type.
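The definition above can be illustrated with a minimal base R sketch. It is not the package's implementation: the gene names, scores, and quantile values are hypothetical toy data, and only the default `keep_quantiles = seq(30, 40)` is taken from the usage below.

```r
# Hypothetical toy data; none of these values come from the package.
# Phenotype evidence score per gene for one phenotype.
evidence <- c(GENE_A = 3, GENE_B = 0, GENE_C = 1, GENE_D = 2)
# Specificity quantile (1-40) per gene for the associated cell type.
spec_q <- c(GENE_A = 38, GENE_B = 40, GENE_C = 12, GENE_D = 30)

# Same default as the keep_quantiles argument.
keep_quantiles <- seq(30, 40)

# Genes with any phenotype evidence (score > 0).
pheno_genes <- names(evidence)[evidence > 0]
# Genes whose specificity quantile is among the retained quantiles.
celltype_genes <- names(spec_q)[spec_q %in% keep_quantiles]

# Driver genes are the intersection of the two sets.
driver_genes <- intersect(pheno_genes, celltype_genes)
print(driver_genes)
```

In this toy case GENE_B is dropped because it has no phenotype evidence, and GENE_C is dropped because its specificity quantile falls outside the retained range.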

add_driver_genes(
  results = load_example_results(),
  ctd_list = load_example_ctd(file = paste0("ctd_", unique(results$ctd), ".rds"),
    multi_dataset = TRUE),
  annotLevels = map_ctd_levels(results),
  keep_quantiles = seq(30, 40),
  min_value = NULL,
  metric = "specificity_quantiles",
  top_n = NULL,
  group_var = "hpo_id",
  celltype_var = "CellType",
  ...
)

Arguments

results

The cell type-phenotype enrichment results generated by gen_results and merged together with merge_results.

ctd_list

A named list of CellTypeDataset objects each created with generate_celltype_data.

annotLevels

The annotation level to use within each CTD in ctd_list.

keep_quantiles

Quantiles to keep in each CellTypeDataset of the ctd_list.

min_value

Minimum specificity quantile to keep.

metric

Which metric to use in the CellTypeDatasets.

top_n

Top N genes to keep when grouping by group_var.

group_var

Grouping variable to use when selecting top N genes. Only used when top_n is not NULL.

celltype_var

The name of the cell type column to merge on.

...

Arguments passed on to HPOExplorer::add_genes

phenos

A data.table containing HPO IDs and other metadata.

hpo

Human Phenotype Ontology object, loaded from get_ontology.

all.x

logical; if TRUE, rows from x which have no matching row in y are included. These rows will have 'NA's in the columns that are usually filled with values from y. The default is FALSE so that only rows with data from both x and y are included in the output.

allow.cartesian

See allow.cartesian in [.data.table.

phenotype_to_genes

Output of load_phenotype_to_genes mapping phenotypes to gene annotations.

by

A vector of shared column names in x and y to merge on. This defaults to the shared key columns between the two tables. If y has no key columns, this defaults to the key of x.

gene_col

Name of the gene column.
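To make the interaction between top_n and group_var more concrete, here is a rough base R sketch of keeping the top N genes per group. It assumes that genes are ranked by descending value of the chosen metric within each group, which this page does not confirm. The toy table, its identifiers, and its values are hypothetical, and the code is not taken from the package.

```r
# Hypothetical toy table: one row per phenotype-gene pair.
toy <- data.frame(
  hpo_id = c("HP:toy1", "HP:toy1", "HP:toy1", "HP:toy2", "HP:toy2"),
  gene = c("A", "B", "C", "A", "D"),
  specificity_quantiles = c(40, 35, 31, 33, 39),
  stringsAsFactors = FALSE
)

top_n <- 2
group_var <- "hpo_id"
metric <- "specificity_quantiles"

# Sort by group, then by descending metric (assumed ranking rule).
toy <- toy[order(toy[[group_var]], -toy[[metric]]), ]

# Rank rows within each group (1 = highest metric value).
rank_in_group <- ave(seq_len(nrow(toy)), toy[[group_var]], FUN = seq_along)

# Keep only the top N rows per group.
top_genes <- toy[rank_in_group <= top_n, ]
print(top_genes)
```

With top_n = 2, gene C is dropped from the first group because it ranks third there, while both genes in the second group are retained.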

Examples

res <- load_example_results()[seq(100)]
res <- add_driver_genes(results=res)
#> Reading cached RDS file: phenotype_to_genes.txt
#> + Version: v2025-11-24
#> Adding genes and disease IDs.
#> Loading ctd_DescartesHuman.rds