Validating Power Analysis using Bulk Data

This section introduces a validation approach for scRNA-seq power analysis that uses high-confidence, sex-biased differentially expressed genes (DEGs) from bulk GTEx data. The goal is to quantify what fraction of “PTP DEGs” (genes consistently sex-biased in ≥90% of GTEx tissues) can be recovered in your single-cell differential gene expression (DGE) results when individuals or cells are down-sampled to various levels. Because these DEGs are defined across many tissues, they serve as a robust benchmark for assessing how detectable sex bias is.
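The benchmark rests on two simple calculations: selecting genes that are sex-biased in at least 90% of tissues, then reporting the percentage of those genes that are significant in a single-cell DGE result. The sketch below illustrates both on toy data. It is not the package's internal code: the helper names select_ptp() and ptp_recovery(), the gene-by-tissue logical matrix layout, the gene/padj column names and the 0.05 significance cutoff are all hypothetical assumptions.

```r
# Minimal sketch with hypothetical helpers and toy data (not the package's code):
# select PTP genes, then score how many a single-cell DGE result recovers.

# Toy gene-by-tissue calls: TRUE = gene called sex-biased in that tissue.
# How a gene is "called" sex-biased in a tissue is left open (assumption).
sex_biased <- rbind(
  geneA = rep(TRUE, 10),                  # biased in 10/10 tissues
  geneB = rep(TRUE, 10),                  # biased in 10/10 tissues
  geneC = c(rep(TRUE, 8), FALSE, FALSE),  # 8/10 tissues: below the 90% cut
  geneD = rep(c(TRUE, FALSE), 5)          # 5/10 tissues
)
colnames(sex_biased) <- paste0("tissue", seq_len(ncol(sex_biased)))

# PTP genes: sex-biased in at least min_frac of tissues (90% by default)
select_ptp <- function(calls, min_frac = 0.9) {
  rownames(calls)[rowMeans(calls) >= min_frac]
}

# Percentage of PTP genes that are significant in the single-cell results;
# sc_res is assumed to have 'gene' and 'padj' (adjusted p-value) columns
ptp_recovery <- function(ptp_genes, sc_res, alpha = 0.05) {
  sc_hits <- sc_res$gene[sc_res$padj < alpha]
  100 * mean(ptp_genes %in% sc_hits)
}

ptp_genes <- select_ptp(sex_biased)

# Toy single-cell DGE table (values are made up for illustration)
sc_res <- data.frame(gene = c("geneA", "geneC", "geneD"),
                     padj = c(0.01, 0.02, 0.30))

recovery <- ptp_recovery(ptp_genes, sc_res)
print(ptp_genes)  # "geneA" "geneB"
print(recovery)   # 50
```

Note that geneC is significant in the toy single-cell table but does not count toward recovery, because it is not a PTP gene. Only the benchmark genes enter the denominator.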

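Down-sampling individuals and down-sampling cells reduce a dataset in different ways: the first keeps every cell from a random subset of donors, and the second keeps a limited number of cells from every donor. The sketch below shows one way to draw each kind of subset from a per-cell metadata data.frame. It is only an illustration and is not how bulk_power_analysis() is documented to work. The helpers downsample_individuals() and downsample_cells() and the per-individual cell cap are hypothetical. With a SingleCellExperiment, the same kept row indices could be used to subset columns, for example sce[, idx].

```r
# Minimal sketch with hypothetical helpers (not the package's implementation):
# two down-sampling schemes applied to a per-cell metadata data.frame.

# Keep all cells from a random subset of n_keep individuals
downsample_individuals <- function(cell_meta, sample_col, n_keep) {
  donors <- unique(cell_meta[[sample_col]])
  if (n_keep > length(donors)) stop("n_keep exceeds the number of individuals")
  kept <- donors[sample.int(length(donors), n_keep)]
  cell_meta[cell_meta[[sample_col]] %in% kept, , drop = FALSE]
}

# Keep at most n_cells randomly chosen cells from every individual
downsample_cells <- function(cell_meta, sample_col, n_cells) {
  rows_by_donor <- split(seq_len(nrow(cell_meta)), cell_meta[[sample_col]])
  kept <- unlist(lapply(rows_by_donor, function(rows) {
    rows[sample.int(length(rows), min(n_cells, length(rows)))]
  }), use.names = FALSE)
  cell_meta[sort(kept), , drop = FALSE]
}

# Toy metadata: 4 individuals with 3 cells each
cell_meta <- data.frame(cell = paste0("c", 1:12),
                        donor_id = rep(paste0("d", 1:4), each = 3))

set.seed(1)
# Repeated random draws at one down-sampling level (here, 2 of 4 individuals)
subsets <- lapply(1:3, function(i) {
  downsample_individuals(cell_meta, "donor_id", n_keep = 2)
})
# At most 2 cells from each of the 4 individuals
fewer_cells <- downsample_cells(cell_meta, "donor_id", n_cells = 2)
```

Drawing several independent subsets at each level, as the lapply() call does, conveys the idea behind repeated permutations: a power estimate averaged over random draws is less sensitive to which particular donors or cells were kept.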
The bulk_power_analysis() function takes one or more SingleCellExperiment (SCE) objects as input, along with metadata that specifies how cell types and samples are labeled in each dataset. Using a bulk DEG reference (here, the GTEx sex-biased DEGs), it estimates the statistical power to detect these DEGs under user-defined down-sampling schemes. The example below runs bulk_power_analysis() on two scRNA-seq datasets:

library(poweranalysis)
library(SingleCellExperiment)

# Load SCE objects (example data shipped with the package; replace with your own)
allen_path <- system.file("extdata", "Allen_Endo_subset.qs", package="poweranalysis")
allen_endo <- qs::qread(allen_path)
tsai_path <- system.file("extdata", "Tsai_Micro.qs", package="poweranalysis")
tsai_micro <- qs::qread(tsai_path)

# List of SCE datasets
SCEs <- list(allen_endo, tsai_micro)

# Dataset names (used in plots and output files)
dataset_names <- c("Allen_Endo", "Tsai_Micro")

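# Cell type mapping: for each cell type, its label in each SCE
# (same order as SCEs), or NA if the cell type is absent from that dataset
celltype_corr <- list(Endo = c("cerebral cortex endothelial cell", NA),
                      Micro = c(NA, "Micro"))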

# Metadata columns holding cell type and sample labels (one entry per SCE)
celltypeIDs <- c("cell_type","cluster_celltype")
sampleIDs <- c("donor_id","sample_id")

# Load GTEx bulk DEGs
bulk_path <- system.file("extdata", "LFSR.tsv", package="poweranalysis")
bulkDE <- read.table(bulk_path, sep = "\t", header = TRUE)

# Run power analysis (Nperms = 3 keeps this example fast; use more permutations in practice)
bulk_power_analysis(
  SCEs = SCEs,
  dataset_names = dataset_names,
  celltype_corr = celltype_corr,
  celltypeIDs = celltypeIDs,
  sampleIDs = sampleIDs,
  bulkDE = bulkDE,
  sampled = "individuals",
  Nperms = 3
)

The output plot displays the percentage of PTP DEGs from the bulk data that are detected across all cell types at different down-sampling levels of the scRNA-seq datasets: