The poweranalysis R package is designed to run robust power analysis for differential gene expression in scRNA-seq studies and provides tools to estimate the optimal number of samples and cells needed to achieve reliable power levels.

Differential Gene Expression (DGE) Analysis

The DGE_analysis function performs cell type–specific pseudobulk differential expression from a SingleCellExperiment (SCE) object. This enables robust identification of differentially expressed genes (DEGs) across biological conditions or groups.

Start by loading your own SCE object:

# Load your SCE object (replace with actual file path)
library(qs)

SCE_path <- system.file("extdata", "Tsai_Micro.qs", package="poweranalysis")
SCE <- qs::qread(SCE_path)

The function requires the following key arguments:

· design: A formula specifying covariates to include in the differential expression model. For example, ~ sex + diagnosis will compare gene expression across diagnosis groups while adjusting for sex.

· coef: The level (group) of the response variable you want to test for. For instance, "AD" in diagnosis identifies genes upregulated in Alzheimer’s disease relative to the reference group.

· sampleID and celltypeID: Column names in the SCE metadata that specify the biological replicate (e.g., patient ID) and cell type, respectively.

· y (optional): The name of the response variable. If not provided, the function uses the last variable in the design formula.

Example 1: Comparing Sex Differences
To identify genes differentially expressed between males and females:

# Run the DGE_analysis function for a sex comparison
DGE_analysis.sex <- DGE_analysis(
  SCE,
  design = ~ sex,
  coef = "M",
  celltypeID="cluster_celltype",
  sampleID = "sample_id",
)

Example 2: Adjusting for Covariates in a Disease Comparison
To compare Alzheimer’s disease vs. control while adjusting for sex:

# Run the DGE_analysis function for a disease vs. control comparison
DGE_analysis.AD <- DGE_analysis(
  SCE,
  design = ~ sex + pathological_diagnosis,
  y = "pathological_diagnosis",
  coef = "AD",                     # Group of interest
  sampleID = "sample_id",
  celltypeID = "cluster_celltype"
)

Power Analysis

Perform power analysis to estimate the accuracy and reliability of DEG detection in your scRNA‑seq dataset under different levels of sampling. This evaluates how well DEGs can be recovered at varying numbers of individuals and cells. DEGs identified in each down‑sampled subset are compared to those from the full dataset to compute the percentage of true positives recovered, along with the False Discovery Rate (FDR).

To assess power based on sex‑specific DEGs, use the following function:

# Specify the down-sampling range (or use the default range)
range_ind = c(10,20,30,40)
range_cell = c(10,30,50,70)

# Run the power_analysis function for a sex comparison
power_analysis.sex <- power_analysis(
  SCE,
  range_downsampled_individuals = range_ind,  
  range_downsampled_cells = range_cell,
  design = ~ sex,
  coef = "M",
  sampleID = "sample_id",
  celltypeID = "cluster_celltype",
  Nperms = 3)

The power_analysis function generates several key outputs:

  • QC plots display distributions of effect sizes (log2 fold-change) across detected DEGs and the number of cells per individual in the full dataset.

  • DGE analysis results identify PTP DEGs and non-DEGs using a 0.05 cut-off for both nominal and adjusted p-values over the down-sampling range of datasets.

  • Power plots show the mean percentage of PTP DEGs detected and FDR trends as sample size or number of cells per sample increases.
    Down-sampling individuals:

Down-sampling cells:

  • Effect size-specific detection rates assess the DEG recovery across different absolute log2 fold-change bins.
    Down-sampling individuals:

Down-sampling cells:

  • Correlation plots assess the correlation of effect sizes for the top 500 and 1000 genes across down-sampling levels.
    Down-sampling individuals:

Down-sampling cells: