Runs the complete bulk RNA-seq-informed power analysis pipeline by performing downsampling-based DEG detection across multiple scRNA-seq datasets, comparing overlaps with bulk RNA-seq DEGs, and generating summary plots for evaluation.

bulk_power_analysis(
  SCEs,
  dataset_names,
  celltype_correspondence,
  output_path = getwd(),
  celltypeIDs = "cell_type",
  sampled = "individuals",
  sampleIDs = "donor_id",
  bulkDE = "placeholder",
  bulk_cutoff = 0.9,
  pvalue = 0.05,
  Nperms = 20,
  fontsize_axislabels = 12,
  fontsize_axisticks = 9,
  fontsize_title = 14,
  fontsize_legendlabels = 9,
  fontsize_legendtitle = 9,
  plot_title = "placeholder"
)

Arguments

SCEs

A list of SingleCellExperiment (SCE) objects, each representing a scRNA-seq dataset.

dataset_names

A vector of names corresponding to each dataset (as you would like them to appear in output plots).

celltype_correspondence

A named list that maps each standard cell type label to the label used for that cell type in each dataset, e.g., list(Micro=c("Micro",NA), Astro=c(NA,"Astro")). Each entry has one element per dataset, in the order of SCEs. Use NA if the cell type is not present in a given dataset.
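As a minimal sketch of the structure described above, the snippet below builds a correspondence for two datasets and checks that each entry has one element per dataset. The dataset names are hypothetical, and the length check is an illustration rather than something the function is known to perform.

```r
# Hypothetical display names for two datasets (illustrative only).
dataset_names <- c("DatasetA", "DatasetB")

# One entry per standard cell type; each entry has one element per dataset,
# in the same order as the datasets. NA marks a dataset lacking that type.
celltype_correspondence <- list(
  Micro = c("Micro", NA),
  Astro = c(NA, "Astro")
)

# Sanity check: every entry must have exactly one element per dataset.
stopifnot(all(lengths(celltype_correspondence) == length(dataset_names)))

# Logical matrix: which standard cell types are present in which dataset?
present <- sapply(celltype_correspondence, function(x) !is.na(x))
rownames(present) <- dataset_names
print(present)
```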

output_path

Path to a clean directory (one containing no subdirectories) where the DGE analysis outputs for the down-sampled datasets and the summary plots will be saved. Defaults to the current working directory.

celltypeIDs

A character vector specifying the column name in each SCE that denotes cell type identity (in order of SCEs).

sampled

Specifies the unit of down-sampling: "individuals" to downsample across samples (donors), or "cells" to downsample across cells. Default is "individuals".

sampleIDs

A character vector specifying the column name in each SCE that represents sample or donor IDs (in order of SCEs).

bulkDE

DGE analysis output for a bulk RNA-seq dataset (e.g., LFSR.tsv). Rows (row names) should be genes, columns should be tissues, and entries should be significance levels.
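To illustrate the expected layout, here is a small sketch of a table with genes as row names, tissues as columns, and significance values as entries. The gene names, tissue names, and values are made up, and this page does not state whether the function expects the table itself or a file path, so treat the round-trip through a temporary TSV file as one plausible way of preparing the input.

```r
# Hypothetical genes, tissues, and significance values (not real results).
bulkDE_example <- data.frame(
  Tissue1 = c(0.001, 0.200, 0.030),
  Tissue2 = c(0.004, 0.010, 0.600),
  row.names = c("GeneA", "GeneB", "GeneC")
)

# Round-trip through a tab-separated file to mimic a file such as LFSR.tsv.
# col.names = NA writes an empty header cell above the row-name column.
tsv_path <- tempfile(fileext = ".tsv")
write.table(bulkDE_example, tsv_path, sep = "\t", quote = FALSE, col.names = NA)

# Read it back with the first column used as row names (genes).
bulkDE_loaded <- read.delim(tsv_path, row.names = 1, check.names = FALSE)
print(bulkDE_loaded)
```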

bulk_cutoff

Proportion (0–1) of bulk tissues in which a gene must be differentially expressed to be considered (e.g., 0.9 selects DEGs found in ≥90% of tissues). Default is 0.9.
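The cutoff can be pictured with a small worked example. The sketch below assumes a gene counts as differentially expressed in a tissue when its significance value falls below a per-tissue threshold (sig_threshold, a hypothetical name and value); the function's internal rule is not documented here and may differ.

```r
# Hypothetical significance table: 3 genes (rows) by 10 tissues (columns).
sig <- matrix(
  c(0.01, 0.02, 0.03, 0.01, 0.04, 0.02, 0.01, 0.03, 0.02, 0.01,   # GeneA
    0.01, 0.02, 0.03, 0.01, 0.04, 0.02, 0.01, 0.03, 0.02, 0.50,   # GeneB
    0.01, 0.02, 0.30, 0.01, 0.40, 0.02, 0.01, 0.03, 0.02, 0.50),  # GeneC
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB", "GeneC"), paste0("Tissue", 1:10))
)

sig_threshold <- 0.05  # assumed per-tissue significance threshold
bulk_cutoff   <- 0.9   # proportion of tissues required

# Proportion of tissues in which each gene passes the per-tissue threshold.
prop_sig <- rowMeans(sig < sig_threshold)

# Keep genes that pass in at least bulk_cutoff of the tissues.
bulk_DEGs <- names(prop_sig)[prop_sig >= bulk_cutoff]
print(prop_sig)
print(bulk_DEGs)
```

In this toy table, GeneA passes in all ten tissues and is kept, whereas GeneC passes in only seven and is dropped.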

pvalue

P-value threshold for selecting DEGs in each individual dataset. Default is 0.05.

Nperms

Number of subsets (permutations) to generate at each downsampling level during power analysis. Each subset is analyzed independently to estimate variability. Default is 20.
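The idea of drawing several subsets per downsampling level can be sketched in base R as follows. The donor IDs and downsampling levels are hypothetical, and sampling without replacement is an assumption; the levels and sampling scheme actually used by the function are not specified on this page.

```r
set.seed(1)  # for reproducibility of this illustration only

donor_ids <- paste0("donor", 1:12)  # hypothetical donor IDs
levels_n  <- c(4, 8, 12)            # hypothetical downsampling levels
Nperms    <- 20

# For each level, draw Nperms random subsets of donors without replacement.
subsets <- lapply(levels_n, function(n) {
  replicate(Nperms, sample(donor_ids, n), simplify = FALSE)
})
names(subsets) <- paste0("n", levels_n)

# Number of subsets generated at each level.
print(sapply(subsets, length))
```

Each subset would then be analyzed independently, so the spread of results across the Nperms subsets at a given level gives a sense of variability at that sample size.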

fontsize_axislabels

Font size for axis labels in the plot. Default is 12.

fontsize_axisticks

Font size for axis tick labels in the plot. Default is 9.

fontsize_title

Font size for the plot title. Default is 14.

fontsize_legendlabels

Font size for legend labels in the plot. Default is 9.

fontsize_legendtitle

Font size for the legend title in the plot. Default is 9.

plot_title

Plot title.

The function saves all plots in the appropriate directories.

Examples
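
A hedged sketch of a call, using the arguments from the usage above. The objects sce_a, sce_b, and bulk_de are placeholders for real SingleCellExperiment objects and a bulk DGE table; they are not provided here, so the call is guarded and only runs when they and the function are available in the session. The dataset names and output directory are likewise hypothetical.

```r
# Arguments that can be prepared with base R alone (all values illustrative).
call_args <- list(
  dataset_names = c("DatasetA", "DatasetB"),
  celltype_correspondence = list(Micro = c("Micro", NA), Astro = c(NA, "Astro")),
  output_path = file.path(tempdir(), "power_analysis_out"),
  celltypeIDs = c("cell_type", "cell_type"),
  sampled = "individuals",
  sampleIDs = c("donor_id", "donor_id"),
  bulk_cutoff = 0.9,
  pvalue = 0.05,
  Nperms = 20
)

# Start from a fresh output directory with no subdirectories.
dir.create(call_args$output_path, showWarnings = FALSE, recursive = TRUE)

# Run only if the function and the placeholder inputs exist in this session.
if (exists("bulk_power_analysis") && exists("sce_a") &&
    exists("sce_b") && exists("bulk_de")) {
  do.call(
    bulk_power_analysis,
    c(list(SCEs = list(sce_a, sce_b), bulkDE = bulk_de), call_args)
  )
}
```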