Run Term Frequency - Inverse Document Frequency (TF-IDF) analysis on sample metadata to characterise each cluster.

run_tfidf(
  obj = NULL,
  reduction = "UMAP",
  label_var = "label",
  cluster_var = "seurat_clusters",
  replace_regex = "[.]|[_]|[-]",
  terms_per_cluster = 3,
  force_new = FALSE,
  return_all_results = FALSE,
  verbose = TRUE
)

Arguments

obj

Single-cell data object.

reduction

Name of the reduction to use (case insensitive).

label_var

Which cell metadata column to input to NLP analysis.

cluster_var

Which cell metadata column to use to identify which cluster each cell is assigned to.

replace_regex

Regular expression matching the characters used to split label_var into terms (i.e. tokens) for NLP enrichment analysis.

terms_per_cluster

The maximum number of words to return per cluster.

force_new

If NLP results are already detected in the metadata, set force_new=TRUE to replace them with new results.

return_all_results

Whether to return just the obj with updated metadata (FALSE), or all intermediate results as well (TRUE).

verbose

Whether to print messages.
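
As a rough illustration of what replace_regex does, the base R sketch below splits a few labels into terms using the default pattern. The example labels are made up, and the package's own tokenisation may differ (for instance in case handling or stop-word filtering); this only shows the effect of the regular expression itself.

```r
# Default pattern from the usage above: split on ".", "_" or "-".
replace_regex <- "[.]|[_]|[-]"

# Hypothetical labels, for illustration only.
labels <- c("T_cell.CD4", "B-cell_naive")

# Replace every separator with a space, then split on whitespace.
tokens <- strsplit(gsub(replace_regex, " ", labels), "\\s+")
names(tokens) <- labels
print(tokens)
```

A narrower pattern such as "[_]" would leave "cell.CD4" as a single term, so the choice of pattern controls how fine-grained the terms are.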

Value

The input object with TF-IDF results added to metadata (enriched_words and tf_idf columns), or if return_all_results=TRUE, a list with the object and intermediate results.
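
To make the returned scores easier to interpret, here is a minimal base R sketch of a TF-IDF calculation in which each cluster is treated as one document. It is not the package's implementation: the toy metadata, the variable names, and the use of a natural-log IDF are all assumptions made for illustration.

```r
# Toy cell metadata: one row per cell (hypothetical values).
meta <- data.frame(
  cluster = c("0", "0", "0", "1", "1", "2"),
  label   = c("T_cell.CD4", "T_cell.CD8", "T_cell.CD4",
              "B-cell_naive", "B-cell_memory", "NK_cell"),
  stringsAsFactors = FALSE
)
replace_regex <- "[.]|[_]|[-]"
terms_per_cluster <- 2

# Tokenise each label, keeping track of which cluster each term came from.
tokens <- strsplit(gsub(replace_regex, " ", meta$label), "\\s+")
long <- data.frame(
  cluster = rep(meta$cluster, lengths(tokens)),
  term    = unlist(tokens),
  stringsAsFactors = FALSE
)

# Clusters x terms count matrix (each cluster acts as one "document").
counts <- unclass(table(long$cluster, long$term))

# Term frequency: share of a cluster's terms accounted for by each term.
tf <- counts / rowSums(counts)
# Inverse document frequency: log(n clusters / n clusters containing the term).
idf <- log(nrow(counts) / colSums(counts > 0))
# TF-IDF: multiply every column of tf by that term's idf.
tfidf <- sweep(tf, 2, idf, `*`)

# Keep up to terms_per_cluster terms with a positive score in each cluster.
top_terms <- lapply(rownames(tfidf), function(cl) {
  scores <- sort(tfidf[cl, ], decreasing = TRUE)
  head(scores[scores > 0], terms_per_cluster)
})
names(top_terms) <- rownames(tfidf)
print(top_terms)
```

In this toy data "cell" occurs in every cluster, so its IDF is log(3/3) = 0 and it is never reported, whereas "T" and "NK" are specific to one cluster and score highest there.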

Examples

data("pseudo_seurat")
obj2 <- run_tfidf(obj = pseudo_seurat,
                  cluster_var = "cluster",
                  label_var = "celltype")
#> Extracting obsm from Seurat: umap
#> + Dropping 2 conflicting obs variables: UMAP.1, UMAP.2
#> Setting cell metadata (obs) in obj.