Perform correlation analysis of DEG effect sizes between single-cell datasets

Runs the correlation analysis pipeline by computing Spearman’s rank correlations of log₂ fold-changes for differentially expressed genes (DEGs) across and within multiple scRNA-seq datasets. Uses a user-specified reference dataset to define DEGs and compares effect sizes across studies, independently sampled subsets, and permuted controls.
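The core comparison described above can be sketched in base R. This is a minimal illustration on simulated data, not the package's internal code: the data frames `ref` and `other`, the noise model, and the single threshold are all assumptions made for the example. The shuffled comparison at the end is only a crude stand-in for a permuted control, since the pipeline's own permutation scheme is not reproduced here.

```r
# Toy data only: simulated log2 fold-changes and p-values, not real results.
set.seed(1)
n_genes <- 500
genes <- paste0("gene", seq_len(n_genes))

# Assumption for illustration: a shared effect plus dataset-specific noise.
true_effect <- rnorm(n_genes)
ref <- data.frame(
  gene  = genes,
  logFC = true_effect + rnorm(n_genes, sd = 0.5),
  pval  = runif(n_genes)
)
other <- data.frame(
  gene  = genes,
  logFC = true_effect + rnorm(n_genes, sd = 0.5)
)

# DEGs are defined in the reference dataset only, at one p-value threshold.
threshold <- 0.05
degs <- ref$gene[ref$pval < threshold]

# Align both datasets on the selected genes.
ref_fc   <- ref$logFC[match(degs, ref$gene)]
other_fc <- other$logFC[match(degs, other$gene)]

# Spearman's rank correlation of effect sizes for the selected genes.
rho <- cor(ref_fc, other_fc, method = "spearman")

# Crude stand-in for a permuted control: shuffling breaks the gene pairing.
rho_shuffled <- cor(ref_fc, sample(other_fc), method = "spearman")

print(c(rho = rho, rho_shuffled = rho_shuffled))
```

Repeating the selection step for each value in `pvals` would give one correlation per threshold, which is the kind of quantity the function summarises in its plots.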

correlation_analysis(
  main_dataset,
  SCEs,
  sampleIDs,
  celltypeIDs,
  celltype_correspondence,
  dataset_names,
  assay_names = "counts",
  pvals = c(0.05, 0.025, 0.01, 0.001, 1e-04),
  alphaval = 0.25,
  N_randperms = 5,
  N_subsets = 5,
  sex_DEGs = FALSE,
  fontsize_yaxislabels = 12,
  fontsize_yaxisticks = 9,
  fontsize_title = 14,
  fontsize_legendlabels = 9,
  fontsize_legendtitle = 9,
  fontsize_facet_labels = 9,
  output_path = getwd()
)

Arguments

main_dataset: Name of the dataset from which significant DEGs are selected (a string matching one of the entries in dataset_names).
SCEs: A list of SingleCellExperiment (SCE) objects, each representing a scRNA-seq dataset.
sampleIDs: A character vector specifying the column name in each SCE that represents sample or donor IDs (in order of SCEs).
celltypeIDs: A character vector specifying the column name in each SCE that denotes cell type identity (in order of SCEs).
celltype_correspondence: A named list that maps each standard cell type label to how that cell type appears in each dataset, e.g., list(Micro=c("Micro",NA), Astro=c(NA,"Astro")). Use NA if the cell type is not present in a given dataset.
dataset_names: A vector of names corresponding to each dataset (as you would like them to appear in output plots).
assay_names: A character vector specifying the assay names in each SCE that will be used for the analysis (in order of SCEs). Default is a vector with all entries "counts", which uses the count assay in each SCE.
pvals: Numeric vector of P-value thresholds for selecting DEGs in each individual dataset. Default is c(0.05,0.025,0.01,0.001,0.0001).
alphaval: Transparency of the non-mean boxplots. The value of alpha ranges between 0 (completely transparent) and 1 (completely opaque).
N_randperms: Number of random permutations of the main dataset (the one used to select significant DEGs). Default is 5.
N_subsets: Number of pairs of random subsets of the main dataset (the one used to select significant DEGs). Default is 5.
sex_DEGs: If TRUE, only keep genes present on sex chromosomes. Queries the hsapiens gene Ensembl dataset. Default is FALSE.
fontsize_yaxislabels: font size for axis labels in plot
fontsize_yaxisticks: font size for axis tick labels in plot
fontsize_title: font size for plot title
fontsize_legendlabels: font size for legend labels in plot
fontsize_legendtitle: font size for legend title in plot
fontsize_facet_labels: font size for facet labels in plot
output_path: A directory path where outputs will be saved. The function saves all plots and DGE analysis outputs in the appropriate directories. Default is the current working directory.
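The structure of celltype_correspondence is easiest to see in a small example. The sketch below uses hypothetical dataset names and labels, and assumes that each entry lists one label per dataset in the same order as dataset_names. The helper `datasets_with()` is not part of the package; it only illustrates how NA entries mark a cell type as absent.

```r
# Hypothetical dataset names and cell type labels, for illustration only.
dataset_names <- c("StudyA", "StudyB")

celltype_correspondence <- list(
  Micro = c("Micro", NA),                # present in StudyA only
  Astro = c(NA, "Astro"),                # present in StudyB only
  Oligo = c("Oligodendrocyte", "Oligo")  # present in both, under different names
)

# Assumption: one entry per dataset, in the order of dataset_names.
stopifnot(all(lengths(celltype_correspondence) == length(dataset_names)))

# Hypothetical helper: datasets in which a standard label is available.
datasets_with <- function(label) {
  dataset_names[!is.na(celltype_correspondence[[label]])]
}

print(datasets_with("Oligo"))
print(datasets_with("Micro"))
```

A check like the `stopifnot()` line is a cheap way to catch a mapping whose length does not match the number of datasets before starting a long analysis.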

Examples
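A call might be assembled along the following lines. This is a sketch under stated assumptions rather than a tested workflow: the objects `sce_a` and `sce_b`, the column names, and the cell type labels are hypothetical placeholders, and the call is only attempted if the function and both objects exist in the session.

```r
# All dataset, column, and cell type names below are hypothetical placeholders.
dataset_names <- c("StudyA", "StudyB")

call_args <- list(
  main_dataset = "StudyA",                    # must match an entry of dataset_names
  sampleIDs    = c("donor_id", "sample"),     # one column name per SCE
  celltypeIDs  = c("cell_type", "celltype"),  # one column name per SCE
  celltype_correspondence = list(
    Micro = c("Micro", "Microglia"),
    Astro = c("Astro", "Astrocyte")
  ),
  dataset_names = dataset_names,
  assay_names   = rep("counts", length(dataset_names)),
  N_randperms   = 5,
  N_subsets     = 5,
  output_path   = tempdir()                   # keep outputs out of the working directory
)

# Run only when the function and both SingleCellExperiment objects are available.
if (exists("correlation_analysis") && exists("sce_a") && exists("sce_b")) {
  call_args$SCEs <- list(sce_a, sce_b)
  do.call(correlation_analysis, call_args)
}
```

Building the arguments as a list first makes it easy to confirm that the per-dataset vectors all have the same length before the analysis starts.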