Run TF-IDF (term frequency-inverse document frequency) enrichment on a per-cell metadata table.

Usage:
tfidf(
clusts,
label_var = "dataset",
cluster_var = "seurat_clusters",
terms_per_cluster = 1,
replace_regex = "[.]|[_]|[-]",
force_new = FALSE,
with_ties = FALSE
)

Arguments:

clusts: A data.frame or data.table containing the per-cell metadata and cluster assignments.
label_var: Name of the cell metadata column whose text labels are used as input to the NLP analysis.
cluster_var: Name of the cell metadata column that identifies the cluster each cell is assigned to.
terms_per_cluster: The maximum number of words to return per cluster (more may be returned when with_ties = TRUE).
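replace_regex: Regular expression matching the characters used to split label_var into terms (i.e. tokens) for NLP enrichment analysis. The default splits on periods, underscores, and hyphens.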
force_new: If NLP results are already detected in the metadata, set force_new = TRUE to replace them with new results.
with_ties: Should ties be kept together? If TRUE, a cluster may return more than terms_per_cluster rows when several terms share the same score. The default, FALSE, ignores ties and returns only the first terms_per_cluster rows.
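The with_ties wording mirrors the with_ties argument of dplyr::slice_max(). This page does not say whether tfidf() calls slice_max() internally. The base R sketch below therefore only illustrates the two behaviors on hypothetical tf-idf scores. top_terms() is a hypothetical helper and is not part of the package.

```r
# Hypothetical helper (not part of the package): keep the top-n scores,
# optionally keeping every term tied with the n-th highest score.
top_terms <- function(scores, n = 1, with_ties = FALSE) {
  # Sort scores from highest to lowest; tied terms keep their input order
  sorted <- scores[order(scores, decreasing = TRUE)]
  if (with_ties) {
    # Keep all terms scoring at least as high as the n-th ranked term
    cutoff <- sorted[min(n, length(sorted))]
    sorted[sorted >= cutoff]
  } else {
    # Keep exactly the first n terms after sorting
    head(sorted, n)
  }
}

# Hypothetical tf-idf scores for one cluster: "brain" and "cortex" tie
scores <- c(brain = 0.4, cortex = 0.4, adult = 0.1)
top_terms(scores, n = 1, with_ties = FALSE)  # keeps only "brain"
top_terms(scores, n = 1, with_ties = TRUE)   # keeps "brain" and "cortex"
```

With with_ties = FALSE, the output always has at most n entries. With TRUE, the output length depends on how many terms share the cutoff score.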
Value:

A data.frame with TF-IDF enrichment results per cluster, including the columns cluster, word, n, total, samples, tf, idf, and tf_idf.
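The returned columns n, total, tf, idf, and tf_idf follow the naming used by tidytext::bind_tf_idf(). The sketch below assumes the standard definitions and treats each cluster as one document:

- tf = n / total within a cluster
- idf = natural log of (number of clusters / number of clusters containing the word)

Under those assumptions, the computation can be written in base R. This is not the package's own implementation. The toy metadata is hypothetical, and the samples column is not reproduced.

```r
# Hypothetical toy metadata: one label per cell plus its cluster assignment
meta <- data.frame(
  cluster = c("1", "1", "1", "2", "2"),
  dataset = c("human_brain_cortex", "human_brain_cortex",
              "human_brain_hippocampus",
              "human_liver_hepatocyte", "human_liver_hepatocyte"),
  stringsAsFactors = FALSE
)

# Tokenize each label on ".", "_" and "-" (the default replace_regex)
tokens <- strsplit(gsub("[.]|[_]|[-]", " ", meta$dataset), "\\s+")
long <- data.frame(
  cluster = rep(meta$cluster, lengths(tokens)),
  word = unlist(tokens),
  stringsAsFactors = FALSE
)

# n: occurrences of each word in each cluster (drop absent combinations)
res <- as.data.frame(table(cluster = long$cluster, word = long$word),
                     responseName = "n", stringsAsFactors = FALSE)
res <- res[res$n > 0, ]

# total: number of words in the cluster; tf: within-cluster term frequency
res$total <- ave(res$n, res$cluster, FUN = sum)
res$tf <- res$n / res$total

# idf: log(number of clusters / number of clusters containing the word)
n_clusters <- length(unique(res$cluster))
clusters_with_word <- ave(res$n, res$word, FUN = length)
res$idf <- log(n_clusters / clusters_with_word)
res$tf_idf <- res$tf * res$idf

# Rank words within each cluster by tf-idf, highest first
res <- res[order(res$cluster, -res$tf_idf), ]
print(res)

# Keep the top-ranked word per cluster (like terms_per_cluster = 1, no ties)
top <- res[!duplicated(res$cluster), ]
```

Because "human" occurs in every cluster, its idf, and therefore its tf_idf, is 0. In cluster 2, "liver" and "hepatocyte" tie, which is the situation the with_ties argument controls.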
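Examples:

data("pseudo_seurat")
# Extract the per-cell metadata table from the Seurat object
clusts <- pseudo_seurat[[]]
# Copy the cluster assignments into a column named "cluster"
clusts$cluster <- clusts$seurat_clusters
tfidf_results <- tfidf(clusts = clusts,
                       label_var = "celltype",
                       cluster_var = "cluster")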