Adds biomaRt annotations (e.g. gene, gene_biotype) and QC metric annotations.

annotate_sce(
  sce,
  min_library_size = 300,
  max_library_size = "adaptive",
  min_features = 100,
  max_features = "adaptive",
  max_mito = "adaptive",
  min_ribo = 0,
  max_ribo = 1,
  min_counts = 2,
  min_cells = 2,
  drop_unmapped = TRUE,
  drop_mito = TRUE,
  drop_ribo = FALSE,
  annotate_genes = TRUE,
  annotate_cells = TRUE,
  nmads = 4,
  ensembl_mapping_file = NULL,
  species = getOption("scflow_species", default = "human")
)

Arguments

sce

a SingleCellExperiment object

min_library_size

the minimum number of counts per cell

max_library_size

the maximum number of counts per cell or "adaptive"

min_features

the minimum number of features per cell (i.e. the minimum number of genes with >0 counts)

max_features

the maximum number of features per cell or "adaptive"

max_mito

the maximum proportion of counts mapping to mitochondrial genes (0 - 1) or "adaptive"

min_ribo

the minimum proportion of counts mapping to ribosomal genes (0 - 1)

max_ribo

the maximum proportion of counts mapping to ribosomal genes (0 - 1)

min_counts

the minimum number of counts a gene must have in a cell for that cell to count towards min_cells

min_cells

the minimum number of cells in which a gene must have at least min_counts counts

drop_unmapped

set TRUE to remove genes whose ensembl_gene_id could not be mapped

drop_mito

set TRUE to remove mitochondrial genes

drop_ribo

set TRUE to remove ribosomal genes

annotate_genes

optionally skip gene annotation with FALSE

annotate_cells

optionally skip cell annotation with FALSE

nmads

The number of median absolute deviations used to define outliers for adaptive thresholding.

ensembl_mapping_file

a local tsv file with ensembl_gene_id and additional columns for mapping ensembl_gene_id to gene info. If not provided, the biomaRt db is queried (slower).

species

The biological species of the sample.

Value

sce an annotated SingleCellExperiment object

Quality control options and thresholds

In addition to calculating QC metrics and annotating gene information, this function adds boolean (TRUE/FALSE) indicators of which cells/genes met the QC criteria. This enables QC reports, plots, and various QC-related tables to be saved before filtering with the filter_sce() function.
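The "adaptive" thresholds and the nmads argument can be illustrated with a minimal base R sketch. It assumes the upper cutoff is the median plus nmads median absolute deviations; the exact computation used internally is not stated here (for example, whether values are transformed first). The helper adaptive_upper_threshold() and the toy values are hypothetical and are not part of the package.

```r
# Hypothetical helper (not a package function): upper cutoff defined as
# median + nmads * MAD. stats::mad() uses the default scaling constant 1.4826.
adaptive_upper_threshold <- function(x, nmads = 4) {
  stats::median(x) + nmads * stats::mad(x)
}

# Made-up library sizes with one obvious outlier.
library_sizes <- c(900, 1000, 1100, 1050, 950, 1000, 20000)
max_library_size <- adaptive_upper_threshold(library_sizes, nmads = 4)

# Flag cells instead of removing them, so that reports can be produced
# before any filtering step.
library_size_ok <- library_sizes <= max_library_size
```

In this toy example only the last cell exceeds the cutoff and is flagged FALSE; nothing is removed from the data at this stage.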

Annotations

With the default settings, the SingleCellExperiment object is annotated with:

Cell-level annotations

  • total_counts - sum of counts across all genes

  • total_features_by_counts - total number of unique genes with expression >0

  • qc_metric_min_library_size - did the cell have at least min_library_size counts?

  • qc_metric_min_features - did the cell have counts >0 in at least min_features genes?

  • pc_mito - percentage of counts mapping to mitochondrial genes in this cell

  • qc_metric_pc_mito_ok - was pc_mito <= the max_mito cutoff?

  • pc_ribo - percentage of counts mapping to ribosomal genes in this cell

  • qc_metric_pc_ribo_ok - was pc_ribo <= the max_ribo cutoff?

  • qc_metric_passed - did the cell pass all of the cell QC tests?
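As a rough illustration of how the cell-level annotations relate to each other, the following base R sketch derives them from a small made-up counts matrix. It is not the package implementation: the matrix, the is_mito vector, and the thresholds are assumptions chosen for illustration, the ribosomal checks are omitted for brevity, and the mitochondrial fraction is expressed as a proportion so that it can be compared directly with max_mito (0 - 1).

```r
# Toy counts matrix: rows are genes, columns are cells (values are made up).
counts <- matrix(
  c(5, 0, 3,
    0, 0, 2,
    4, 1, 0,
    1, 9, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "MT-ND1"),
                  c("cell1", "cell2", "cell3"))
)
# Assumed to be known from the gene-level annotation step.
is_mito <- c(FALSE, FALSE, FALSE, TRUE)

# Illustrative thresholds, far smaller than the function defaults.
min_library_size <- 5
min_features <- 2
max_mito <- 0.5

# Per-cell metrics.
total_counts <- colSums(counts)
total_features_by_counts <- colSums(counts > 0)
pc_mito <- colSums(counts[is_mito, , drop = FALSE]) / total_counts

# Boolean QC indicators; a cell passes only if every test is TRUE.
qc_metric_min_library_size <- total_counts >= min_library_size
qc_metric_min_features <- total_features_by_counts >= min_features
qc_metric_pc_mito_ok <- pc_mito <= max_mito
qc_metric_passed <- qc_metric_min_library_size &
  qc_metric_min_features &
  qc_metric_pc_mito_ok
```

Here cell2 fails only the mitochondrial test (9 of its 10 counts are mitochondrial), so its qc_metric_passed value is FALSE while the cell itself remains in the data.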

Gene-level annotations

  • gene - official gene name

  • gene_biotype - protein_coding, lncRNA, pseudogene, etc.

  • qc_metric_ensembl_mapped - was the ensembl_gene_id found in biomaRt?

  • qc_metric_is_mito - is the gene mitochondrial?

  • qc_metric_is_ribo - is the gene ribosomal?

  • qc_metric_n_cells_expressing - number of cells in which the gene has at least min_counts counts

  • qc_metric_is_expressive - did at least min_cells cells have at least min_counts counts for this gene?
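The interplay of min_counts and min_cells in the gene-level expressivity annotations can be sketched in base R as follows. This is an illustration under assumptions, not the package code: the toy matrix is made up, and "at least" is interpreted as >= for both thresholds.

```r
# Toy counts matrix: rows are genes, columns are cells (values are made up).
counts <- matrix(
  c(3, 0, 2, 5,
    1, 1, 0, 0,
    0, 0, 0, 4),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("cell1", "cell2", "cell3", "cell4"))
)

min_counts <- 2  # a cell "expresses" a gene if it has at least this many counts
min_cells <- 2   # a gene is "expressive" if at least this many cells express it

# Number of cells per gene reaching min_counts, then the boolean indicator.
qc_metric_n_cells_expressing <- rowSums(counts >= min_counts)
qc_metric_is_expressive <- qc_metric_n_cells_expressing >= min_cells
```

With these values geneA is expressive (three cells reach min_counts), whereas geneB (no cells) and geneC (one cell) are not.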