Run Term Frequency - Inverse Document Frequency (TF-IDF) analysis on sample metadata to characterise each cluster.
run_tfidf(
  obj = NULL,
  reduction = "UMAP",
  label_var = "label",
  cluster_var = "seurat_clusters",
  replace_regex = "[.]|[_]|[-]",
  terms_per_cluster = 3,
  force_new = FALSE,
  return_all_results = FALSE,
  verbose = TRUE
)
obj: Single-cell data object.
reduction: Name of the reduction to use (case insensitive).
label_var: Which cell metadata column to use as input for the NLP analysis.
cluster_var: Which cell metadata column identifies the cluster each cell is assigned to.
replace_regex: Regular expression matching the characters used to split label_var into terms (i.e. tokens) for NLP enrichment analysis.
terms_per_cluster: The maximum number of words to return per cluster.
force_new: If NLP results are already detected in the metadata, set force_new=TRUE to replace them with new results.
return_all_results: Whether to return just the obj with updated metadata (FALSE), or all intermediate results as well (TRUE).
verbose: Whether to print messages.
The input object with TF-IDF results added to metadata
(enriched_words and tf_idf columns), or if
return_all_results=TRUE, a list with the object and intermediate
results.
data("pseudo_seurat")
obj2 <- run_tfidf(obj = pseudo_seurat,
                  cluster_var = "cluster",
                  label_var = "celltype")
#> Extracting obsm from Seurat: umap
#> + Dropping 2 conflicting obs variables: UMAP.1, UMAP.2
#> Setting cell metadata (obs) in obj.
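
To make the scoring more concrete, the following base-R sketch shows one common way to compute TF-IDF over cluster labels. It is an illustration only, not the implementation used by run_tfidf: the toy `meta` data frame and the helper names `tokenize` and `top_terms` are hypothetical, and the package may use a different weighting. Each cluster is treated as one document made of the tokens from its cells' labels; a term scores highly when it is frequent within a cluster but rare across clusters.

```r
# Toy cell metadata (hypothetical values, not taken from pseudo_seurat).
meta <- data.frame(
  cluster  = c("0", "0", "0", "1", "1", "2"),
  celltype = c("T_cell.CD4", "T_cell.CD8", "T_cell.CD4",
               "B_cell.naive", "B_cell.memory", "NK_cell"),
  stringsAsFactors = FALSE
)

# Split labels into lower-case terms, using the same pattern as the
# replace_regex default shown above.
tokenize <- function(x, regex = "[.]|[_]|[-]") {
  terms <- tolower(unlist(strsplit(x, regex)))
  terms[nzchar(terms)]
}

# Treat each cluster as one "document": the pooled terms of its cells.
docs <- lapply(split(meta$celltype, meta$cluster), tokenize)

# Term counts per cluster (rows = clusters, columns = terms).
vocab  <- sort(unique(unlist(docs)))
counts <- t(sapply(docs, function(d) table(factor(d, levels = vocab))))

# TF: share of a term within its cluster.
# IDF: log(number of clusters / number of clusters containing the term).
tf    <- counts / rowSums(counts)
idf   <- log(nrow(counts) / colSums(counts > 0))
tfidf <- sweep(tf, 2, idf, `*`)

# Keep at most n positively scoring terms per cluster
# (analogous in spirit to terms_per_cluster).
top_terms <- function(scores, n = 3) {
  scores <- sort(scores[scores > 0], decreasing = TRUE)
  names(head(scores, n))
}
enriched <- sapply(rownames(tfidf),
                   function(cl) top_terms(tfidf[cl, ], n = 3),
                   simplify = FALSE)
```

In this toy data the term "cell" occurs in every cluster, so its inverse document frequency is log(1) = 0 and it never appears among the enriched terms, whereas cluster-specific tokens such as "nk" do.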