Predict cell types — predict_celltypes • MSTExplorer

Predict the causal cell types underlying a patient's phenotypes, given varying degrees of prior knowledge.

predict_celltypes(
  phenotypes,
  diseases_include = NULL,
  diseases_exclude = NULL,
  genes_include = NULL,
  genes_exclude = NULL,
  gene_weights = list(include = 2, default = 1, exclude = 0),
  results = MSTExplorer::load_example_results(),
  phenotype_to_genes = HPOExplorer::load_phenotype_to_genes(),
  agg_var = c("cl_name"),
  effect_var = "logFC",
  x_var = agg_var[1],
  y_var = "score_mean",
  fill_var = "score_sum",
  evidence_score_var = "evidence_score_sum",
  max_x_var = 10,
  subtitle_size = 9,
  plot.margin = ggplot2::margin(1, 1, 1, 40),
  show_plot = TRUE,
  save_path = NULL,
  width = NULL,
  height = NULL
)

Arguments

phenotypes: Phenotypes observed in the patient. Can be a list of HPO phenotype IDs or HPO phenotype names.
diseases_include: Diseases that the patient is known to have. Can be provided as OMIM, Orphanet, or DECIPHER disease IDs.
diseases_exclude: Diseases that the patient is known NOT to have. Can be provided as OMIM, Orphanet, or DECIPHER disease IDs.
genes_include: Genes in which the patient is known to have abnormalities.
genes_exclude: Genes in which the patient is known NOT to have abnormalities.
gene_weights: A named list describing the weight to apply to genes in the include, default, and exclude lists.
results: The cell type-phenotype enrichment results generated by gen_results and merged with merge_results.
phenotype_to_genes: Phenotype to gene mapping from load_phenotype_to_genes.
agg_var: The variable(s) to aggregate results by.
effect_var: Name of the effect size column in the results.
x_var: Variable to plot on the x-axis.
y_var: Variable to plot on the y-axis.
fill_var: Variable to fill by.
evidence_score_var: Which variable from add_evidence to use when weighting genes.
max_x_var: The maximum number of cell types to display.
subtitle_size: Size of the plot subtitle.
plot.margin: Margin around the entire plot (a unit with the sizes of the top, right, bottom, and left margins).
show_plot: Print the plot to the console.
save_path: Save the plot to a file. Set to NULL to not save the plot.
width: Width of the saved plot.
height: Height of the saved plot.
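
To make the gene_weights argument more concrete, here is a minimal sketch of how a named list of weights could be mapped onto individual genes. The helper assign_gene_weight is hypothetical and not part of MSTExplorer, and the rule that exclusion wins when a gene appears in both lists is an assumption of this sketch. The gene "UBE3A" is an arbitrary stand-in for a gene that appears in neither list.

```r
# Hypothetical helper (not part of MSTExplorer): look up a per-gene weight
# from the include/exclude lists, falling back to the default weight.
assign_gene_weight <- function(genes,
                               genes_include = NULL,
                               genes_exclude = NULL,
                               gene_weights = list(include = 2,
                                                   default = 1,
                                                   exclude = 0)) {
  # Start every gene at the default weight.
  w <- rep(gene_weights$default, length(genes))
  # Upweight genes the patient is known to have abnormalities in.
  w[genes %in% genes_include] <- gene_weights$include
  # Exclusion is applied last, so it wins if a gene is in both lists
  # (an assumption of this sketch).
  w[genes %in% genes_exclude] <- gene_weights$exclude
  stats::setNames(w, genes)
}

# "UBE3A" is an arbitrary gene that is in neither list.
weights <- assign_gene_weight(
  genes = c("MAGEL2", "HERC2", "SNORD115-1", "UBE3A"),
  genes_include = c("MAGEL2", "HERC2"),
  genes_exclude = "SNORD115-1"
)
print(weights)
```

With the default weights, an excluded gene contributes a factor of 0, so any quantity multiplied by that weight drops out entirely, while an included gene counts twice as much as an unlisted one.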

Value

data.table of prioritised cell types, sorted by a "score" that combines:

The phenotype-cell type enrichment p-values ("p").
The phenotype-cell type enrichment effect size ("effect").
A gene-wise factor that upweights included genes and downweights excluded genes, multiplied by the evidence score of the phenotype-gene association. This factor is only applied when genes_include or genes_exclude is provided.
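
The exact formula behind the score is not given on this page. As a rough illustration only, the sketch below combines the three components multiplicatively on toy data and then aggregates per cell type into score_mean and score_sum, the column names used by the y_var and fill_var defaults. The use of -log10(p), the multiplicative combination, and every value in the toy table are assumptions, not the package's implementation.

```r
# Toy data: all values are made up for illustration.
toy <- data.frame(
  cl_name        = c("neuron", "neuron", "hepatocyte"),
  p              = c(0.001, 0.01, 0.1),  # enrichment p-values
  logFC          = c(2, 1, 0.5),         # enrichment effect sizes
  gene_weight    = c(2, 1, 1),           # e.g. include = 2, default = 1
  evidence_score = c(1, 1, 1)            # phenotype-gene evidence
)

# Assumed combination: smaller p-values, larger effects, and higher
# gene-wise factors all raise the score.
toy$score <- -log10(toy$p) * toy$logFC *
  (toy$gene_weight * toy$evidence_score)

# Aggregate per cell type (mirrors agg_var = "cl_name").
score_mean <- tapply(toy$score, toy$cl_name, mean)
score_sum  <- tapply(toy$score, toy$cl_name, sum)
ranked <- data.frame(
  cl_name    = names(score_mean),
  score_mean = as.numeric(score_mean),
  score_sum  = as.numeric(score_sum)
)

# Sort so the highest-scoring cell type comes first.
ranked <- ranked[order(-ranked$score_mean), ]
print(ranked)
```

In this toy table the "neuron" rows rank first because they pair small p-values with larger effect sizes, and one of them carries an upweighted gene.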

Examples

phenotypes <- c("Generalized neonatal hypotonia",
                "Scrotal hypospadias",
                "Increased circulating progesterone")
# diseases_include <- "OMIM:176270"
genes_include <- c("MAGEL2","HERC2")
genes_exclude <- c("SNORD115-1")
ct <- predict_celltypes(phenotypes = phenotypes,
                        genes_include = genes_include,
                        genes_exclude = genes_exclude)
#> Translating ontology terms to ids.
#> Adding logFC column.
#> Reading cached RDS file: phenotype_to_genes.txt
#> + Version: v2025-05-06
#> Adding genes and disease IDs.
#> Mapping cell types to cell ontology terms.
#> Adding stage information.
#> Reading cached RDS file: phenotype_to_genes.txt
#> + Version: v2025-05-06
#> Loading ctd_DescartesHuman.rds
#> Loading ctd_HumanCellLandscape.rds
#> Annotating gene-disease associations with Evidence Score
#> Gathering data from GenCC.
#> Importing cached file.
#> Evidence scores for: 
#>  - 11050 diseases 
#>  - 5533 genes
#> + Version: 2025-08-08
#> Warning: A shallow copy of this data.table was taken so that := can add or remove 1 columns by reference. At an earlier point, this data.table was copied by R (or was created manually using structure() or similar). Avoid names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: ?set, ?setnames and ?setattr. It's also not unusual for data.table-agnostic packages to produce tables affected by this issue. If this message doesn't help, please report your use case to the data.table issue tracker so the root cause can be fixed or this message improved.