This section introduces a validation approach for scRNA-seq power
analysis that uses high-confidence, sex-biased DEGs from bulk GTEx data.
The goal is to quantify what fraction of “PTP DEGs” (genes consistently
sex-biased in ≥90% of GTEx tissues) can be recovered in your single-cell
DGE results at different down-sampling levels of individuals or cells.
Because these DEGs are sex-biased across nearly all tissues, they serve
as a robust benchmark for assessing how detectable sex bias is.
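To make the recovery metric concrete, here is a minimal base-R sketch. It assumes the bulk reference can be arranged as a gene-by-tissue matrix of LFSR values (presumably local false sign rates), treats a gene as sex-biased in a tissue when its LFSR is below 0.05, and ignores the direction of the effect. The toy matrix, the 0.05 cutoff, and the helpers `define_ptp_degs()` and `ptp_recovery()` are hypothetical and are not part of poweranalysis.

```r
# Hypothetical sketch: matrix layout, cutoff, and helper names are assumptions.

# Toy gene-by-tissue matrix of LFSR values (smaller = more confident sex bias)
lfsr <- matrix(
  c(0.01, 0.02, 0.01, 0.03,   # geneA: below cutoff in 4/4 tissues
    0.01, 0.04, 0.02, 0.01,   # geneB: 4/4
    0.01, 0.20, 0.01, 0.01,   # geneC: 3/4
    0.50, 0.40, 0.01, 0.60),  # geneD: 1/4
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"), paste0("tissue", 1:4))
)

# PTP DEGs: genes whose LFSR is below `cutoff` in at least `min_frac` of tissues
define_ptp_degs <- function(lfsr, cutoff = 0.05, min_frac = 0.9) {
  frac_sig <- rowMeans(lfsr < cutoff)
  rownames(lfsr)[frac_sig >= min_frac]
}

# Percentage of PTP DEGs that also appear among the single-cell DEGs
ptp_recovery <- function(ptp_genes, sc_degs) {
  100 * mean(ptp_genes %in% sc_degs)
}

ptp <- define_ptp_degs(lfsr)                      # c("geneA", "geneB")
ptp_recovery(ptp, sc_degs = c("geneA", "geneC"))  # 50
```

With four tissues, the 90% threshold requires a gene to pass in all four, so only geneA and geneB qualify; the single-cell result recovers one of the two.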
The `bulk_power_analysis()` function takes as input one or more
`SingleCellExperiment` (SCE) objects, along with metadata that specifies
how cell types and samples are labeled in each dataset. Using a bulk DEG
reference (here, GTEx data), the function estimates the statistical power
to detect these DEGs under user-defined down-sampling schemes. The example
below runs `bulk_power_analysis()` on two scRNA-seq datasets:
```r
library(poweranalysis)
library(SingleCellExperiment)

# Load the example SCE objects shipped with the package
# (replace with your own file paths or objects)
allen_path <- system.file("extdata", "Allen_Endo_subset.qs", package = "poweranalysis")
allen_endo <- qs::qread(allen_path)
tsai_path <- system.file("extdata", "Tsai_Micro.qs", package = "poweranalysis")
tsai_micro <- qs::qread(tsai_path)

# List of SCE datasets
SCEs <- list(allen_endo, tsai_micro)

# Dataset names (used in plots and output files), in the same order as SCEs
dataset_names <- c("Allen_Endo", "Tsai_Micro")

# Cell-type mapping: each element holds one label per dataset (in SCEs order),
# with NA where that cell type is absent from a dataset
celltype_corr <- list(
  Endo  = c("cerebral cortex endothelial cell", NA),
  Micro = c(NA, "Micro")
)

# Metadata column names for cell-type and sample labels, one per SCE
celltypeIDs <- c("cell_type", "cluster_celltype")
sampleIDs <- c("donor_id", "sample_id")

# Load the GTEx bulk DEG reference
bulk_path <- system.file("extdata", "LFSR.tsv", package = "poweranalysis")
bulkDE <- read.table(bulk_path, sep = "\t", header = TRUE)

# Run the power analysis, down-sampling individuals
# (Nperms = 3 keeps the example fast)
bulk_power_analysis(
  SCEs = SCEs,
  dataset_names = dataset_names,
  celltype_corr = celltype_corr,
  celltypeIDs = celltypeIDs,
  sampleIDs = sampleIDs,
  bulkDE = bulkDE,
  sampled = "individuals",
  Nperms = 3
)
```
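The down-sampling idea can be illustrated with the loop below. This is a minimal base-R sketch under stated assumptions: for each down-sampling level, a random subset of individuals is drawn `Nperms` times, DEGs are called on that subset, and the percentage of PTP DEGs recovered is averaged over permutations. `downsample_recovery()` and the toy detector are hypothetical stand-ins, and the package's actual procedure (for example, how it balances sexes or which DE test it runs) may differ.

```r
# Hypothetical sketch of individual-level down-sampling; not the package's internals.
downsample_recovery <- function(donors, levels, n_perms, ptp_genes, detect_fun) {
  # One row per (down-sampling level, permutation) pair
  res <- expand.grid(n_donors = levels, perm = seq_len(n_perms))
  res$recovery <- mapply(function(n, p) {
    kept <- sample(donors, n)            # random subset of individuals
    sc_degs <- detect_fun(kept)          # DEGs called using only those donors
    100 * mean(ptp_genes %in% sc_degs)   # % of PTP DEGs recovered
  }, res$n_donors, res$perm)
  # Average recovery over permutations at each down-sampling level
  aggregate(recovery ~ n_donors, data = res, FUN = mean)
}

# Toy detector (hypothetical): each retained donor uncovers one more PTP gene
ptp_genes <- paste0("gene", 1:10)
toy_detect <- function(kept) ptp_genes[seq_len(length(kept))]

set.seed(42)
donors <- paste0("donor", 1:10)
downsample_recovery(donors, levels = c(2, 5, 10), n_perms = 3,
                    ptp_genes = ptp_genes, detect_fun = toy_detect)
```

With this toy detector, mean recovery is 20% with 2 donors, 50% with 5, and 100% with all 10. A real DE test would give noisier, permutation-dependent values, which is why averaging over several permutations matters.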
The output plot displays the percentage of PTP DEGs from the bulk data that are detected across all cell types at different down-sampling levels of the scRNA-seq datasets: