Aggregate or expand a gene matrix (gene_df) using a gene mapping data.frame (gene_map). Mappings can be performed across all of the scenarios that arise during within-species and between-species gene mapping:

  • 1 gene : 1 gene

  • many genes : 1 gene

  • 1 gene : many genes

  • many genes : many genes

For more details on how aggregation/expansion is performed, please see: many2many_rows.
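As a rough illustration of the scenarios listed above, the following base-R sketch expands and then aggregates a toy matrix. It is not the package's implementation (see many2many_rows for that). The gene names and values are invented, and it assumes that expanded rows are copied unchanged and that aggregation is a sum.

```r
# Toy expression matrix: rows are input genes, columns are samples.
gene_df <- matrix(
  c(1, 2,
    3, 4,
    5, 6),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2"))
)

# Toy mapping (hypothetical gene names):
#   geneA and geneB both map to G1 (many genes : 1 gene)
#   geneC maps to both G2 and G3   (1 gene : many genes)
gene_map <- data.frame(
  input_gene    = c("geneA", "geneB", "geneC", "geneC"),
  ortholog_gene = c("G1", "G1", "G2", "G3"),
  stringsAsFactors = FALSE
)

# Expansion (assumption: rows are duplicated as-is): repeat each input
# row once for every mapping it takes part in.
expanded <- gene_df[gene_map$input_gene, , drop = FALSE]

# Aggregation: sum the rows that share the same output gene.
aggregated <- rowsum(expanded, group = gene_map$ortholog_gene)
aggregated
```

Here geneA and geneB collapse into a single G1 row, while geneC contributes one row each to G2 and G3.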

aggregate_mapped_genes(
  gene_df,
  gene_map = NULL,
  input_col = "input_gene",
  output_col = "ortholog_gene",
  input_species = "human",
  output_species = input_species,
  method = c("gprofiler", "homologene", "babelgene"),
  agg_fun = "sum",
  agg_method = c("monocle3", "stats"),
  aggregate_orthologs = TRUE,
  transpose = FALSE,
  mthreshold = 1,
  target = "ENSG",
  numeric_ns = "",
  as_integers = FALSE,
  as_sparse = TRUE,
  as_DelayedArray = FALSE,
  dropNA = TRUE,
  sort_rows = FALSE,
  verbose = TRUE
)

Arguments

gene_df

Input matrix where row names are genes.

gene_map

A data.frame that maps the current gene names to new gene names. This function's behaviour will adapt to different situations as follows:

  • gene_map=<data.frame> :
    When a data.frame containing the gene key:value columns (specified by input_col and output_col, respectively) is provided, this will be used to perform aggregation/expansion.

  • gene_map=NULL and input_species!=output_species :
    A gene_map is automatically generated by map_orthologs to perform inter-species gene aggregation/expansion.

  • gene_map=NULL and input_species==output_species :
    A gene_map is automatically generated by map_genes to perform within-species gene symbol standardization and aggregation/expansion.
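The three cases above can be summarised as a small decision function. This is only a sketch of the documented behaviour: describe_mapping_route is a hypothetical helper, not part of the package, and it returns a label rather than performing any mapping.

```r
# Hypothetical helper (not part of the package): reports which of the
# three documented routes applies for a given set of arguments.
describe_mapping_route <- function(gene_map = NULL,
                                   input_species = "human",
                                   output_species = input_species) {
  if (is.data.frame(gene_map)) {
    # Case 1: the user-supplied mapping is used directly.
    "user-supplied gene_map"
  } else if (is.null(gene_map) && input_species != output_species) {
    # Case 2: different species, so orthologs are needed.
    "map_orthologs (between species)"
  } else if (is.null(gene_map)) {
    # Case 3: same species, so gene names are standardised.
    "map_genes (within species)"
  } else {
    stop("gene_map must be a data.frame or NULL")
  }
}

describe_mapping_route(input_species = "mouse", output_species = "human")
describe_mapping_route(input_species = "mouse")
```

Because output_species defaults to input_species, omitting it selects the within-species route.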

input_col

Column name within gene_map with gene names matching the row names of gene_df.

output_col

Column name within gene_map with the gene names that you wish to map the row names of gene_df onto.

input_species

Name of the input species (e.g. "mouse", "fly"). Use map_species to return a full list of available species.

output_species

Name of the output species (e.g. "human", "chicken"). Use map_species to return a full list of available species.

method

R package to use for gene mapping:

  • "gprofiler" : Slower but more species and genes.

  • "homologene" : Faster but fewer species and genes.

  • "babelgene" : Faster but fewer species and genes. Also gives consensus scores for each gene mapping based on several different data sources.

agg_fun

Aggregation function (default: "sum").

agg_method

Aggregation method: "monocle3" or "stats" (default: "monocle3").
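To show how the choice of aggregation function changes the result, here is a minimal base-R sketch using stats::aggregate. It assumes the function is looked up by name (as match.fun does) and is not a description of the package's internal code. The toy matrix and group labels are invented.

```r
# Toy matrix: three input genes (rows) measured in two samples (columns).
mat <- matrix(
  c(1, 3, 5,
    2, 4, 6),
  nrow = 3,
  dimnames = list(c("a", "b", "c"), c("s1", "s2"))
)

# Output gene assigned to each input row: "a" and "b" share G1.
groups <- c("G1", "G1", "G2")

# Resolve the aggregation function from its name, e.g. "sum" or "mean".
agg_sum  <- stats::aggregate(mat, by = list(gene = groups),
                             FUN = match.fun("sum"))
agg_mean <- stats::aggregate(mat, by = list(gene = groups),
                             FUN = match.fun("mean"))

agg_sum
agg_mean
```

Summing is the natural choice for count data, whereas a mean keeps aggregated genes on the same scale as unaggregated ones.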

aggregate_orthologs

[Optional] After performing an initial round of many:many aggregation/expansion with many2many_rows, ensure each orthologous gene only appears in one row by using the aggregate_rows function (default: TRUE).

transpose

Transpose gene_df before mapping genes.

mthreshold

Maximum number of results to return per initial alias (default: 1).

target

Target namespace (default: "ENSG").

numeric_ns

Namespace to use for fully numeric IDs (list of available namespaces).

as_integers

Force all values in the matrix to become integers, by applying floor (default: FALSE).

as_sparse

Convert aggregated matrix to sparse matrix.

as_DelayedArray

Convert aggregated matrix to DelayedArray.

dropNA

Drop genes assigned to NA in groupings.

sort_rows

Sort gene_df rows alphanumerically.

verbose

Print messages.

Value

Aggregated matrix

Examples

#### Aggregate within species: gene synonyms ####
data("exp_mouse_enst")
X_agg <- aggregate_mapped_genes(gene_df = exp_mouse_enst,
                                input_species = "mouse")
#> Retrieving all organisms available in gprofiler.
#> Using stored `gprofiler_orgs`.
#> Mapping species name: mouse
#> Common name mapping found for mouse
#> 1 organism identified from search: mmusculus
#> 482 / 482 (100%) genes mapped.
#> Aggregating rows using: monocle3
#> Converting obj to sparseMatrix.
#> Matrix aggregated:
#>   - Input: 482 x 7 
#>   - Output: 92 x 7
                                 
#### Aggregate across species: gene orthologs ####
data("exp_mouse")
X_agg2 <- aggregate_mapped_genes(gene_df = exp_mouse,
                                 input_species = "mouse",
                                 output_species = "human",
                                 method = "homologene")
#> Converting mouse ==> human orthologs using: homologene
#> Retrieving all organisms available in homologene.
#> Mapping species name: mouse
#> Common name mapping found for mouse
#> 1 organism identified from search: 10090
#> Retrieving all organisms available in homologene.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Mapping many:many rows.
#> 1110012L19Rik : converting 1 row(s) --> 2 row(s).
#> 2610034B18Rik : converting 1 row(s) --> 2 row(s).
#> AA415398 : converting 1 row(s) --> 2 row(s).
#> Ankhd1 : converting 1 row(s) --> 2 row(s).
#> Anxa8 : converting 1 row(s) --> 2 row(s).
#> Apitd1 : converting 1 row(s) --> 2 row(s).
#> Arhgap8 : converting 1 row(s) --> 2 row(s).
#> Asb3 : converting 1 row(s) --> 2 row(s).
#> C4a : converting 1 row(s) --> 2 row(s).
#> C4b : converting 1 row(s) --> 2 row(s).
#> Cbs : converting 1 row(s) --> 2 row(s).
#> Ccz1 : converting 1 row(s) --> 2 row(s).
#> Ckmt1 : converting 1 row(s) --> 2 row(s).
#> Coro7 : converting 1 row(s) --> 2 row(s).
#> Cryaa : converting 1 row(s) --> 2 row(s).
#> D10Jhu81e : converting 1 row(s) --> 2 row(s).
#> F8a : converting 1 row(s) --> 3 row(s).
#> Fam21 : converting 1 row(s) --> 2 row(s).
#> Fcgr4 : converting 1 row(s) --> 2 row(s).
#> Gpr89 : converting 1 row(s) --> 2 row(s).
#> Gstt2 : converting 1 row(s) --> 2 row(s).
#> H3f3a : converting 1 row(s) --> 2 row(s).
#> H3f3b : converting 1 row(s) --> 2 row(s).
#> Hspa1a : converting 1 row(s) --> 2 row(s).
#> Icosl : converting 1 row(s) --> 2 row(s).
#> Klhl23 : converting 1 row(s) --> 2 row(s).
#> Mrpl23 : converting 1 row(s) --> 2 row(s).
#> Nbl1 : converting 1 row(s) --> 2 row(s).
#> Nomo1 : converting 1 row(s) --> 3 row(s).
#> Pmf1 : converting 1 row(s) --> 2 row(s).
#> Pom121 : converting 1 row(s) --> 2 row(s).
#> Pramef8 : converting 1 row(s) --> 2 row(s).
#> Prodh : converting 1 row(s) --> 2 row(s).
#> Ranbp2 : converting 1 row(s) --> 7 row(s).
#> Serf1 : converting 1 row(s) --> 2 row(s).
#> Sgk3 : converting 1 row(s) --> 2 row(s).
#> Slx1b : converting 1 row(s) --> 2 row(s).
#> Smn1 : converting 1 row(s) --> 2 row(s).
#> Spin2d : converting 1 row(s) --> 2 row(s).
#> Aggregating rows using: monocle3
#> Converting obj to sparseMatrix.
#> Matrix aggregated:
#>   - Input: 15,259 x 7 
#>   - Output: 13,316 x 7