Expand/aggregate rows of a matrix with any combination of many:many mappings. This method ensures that the total counts per gene remain the same regardless of how many genes each gene is split or condensed into. This allows for many:many mappings that are otherwise not possible with standard aggregation functions, which all require many:1 mappings.
Internally, this is done as follows:

  1. Identify genes that appear more than once in gene_map[[input_col]].

  2. For each gene identified, split its row into multiple rows, where the number of new rows is equal to the number of times that gene appears within gene_map[[input_col]]. In the new expanded matrix, each new row holds the original gene's counts divided by the number of new rows. This means that the counts are split equally amongst the new rows, in a column-specific manner.
    Thus, the column sums of the output matrix will be equal to the column sums of the input matrix. In the case of gene expression count matrices, this means that the total counts remain equal between matrices, without being forced to drop genes with many:many mappings (as is the case with most other aggregation methods).

  3. Map the row names of the expanded matrix onto the orthologous gene names from gene_map[[output_col]] ("ortholog_gene" by default).

  4. [Optional]: When aggregate_orthologs=TRUE, aggregate rows of the expanded/mapped matrix such that there will only be 1 row per ortholog gene, using aggregate_rows. The arguments agg_fun, agg_method, as_sparse, as_DelayedArray, and dropNA will all be passed to aggregate_rows if this step is selected.
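
Steps 1 to 3 above can be illustrated with a minimal base R sketch. This is not the package's internal code: the toy matrix, the toy gene_map, and the variable names (n_copies, idx, X_expanded) are all hypothetical, and every gene in the toy matrix is assumed to have at least one mapping.

```r
# Toy count matrix: 3 genes x 2 samples (hypothetical data).
X <- matrix(c(10, 4,
               6, 2,
               8, 8),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))

# Toy mapping: geneA maps to two orthologs (1:many),
# while geneB and geneC both map to ORTH3 (many:1).
gene_map <- data.frame(
  input_gene    = c("geneA", "geneA", "geneB", "geneC"),
  ortholog_gene = c("ORTH1", "ORTH2", "ORTH3", "ORTH3"),
  stringsAsFactors = FALSE
)

# Steps 1-2: count how often each input gene appears, then give every
# mapping row an equal share of that gene's counts.
n_copies <- table(gene_map$input_gene)
idx <- match(gene_map$input_gene, rownames(X))
X_expanded <- X[idx, , drop = FALSE] /
  as.numeric(n_copies[gene_map$input_gene])

# Step 3: rename the rows to the ortholog names.
rownames(X_expanded) <- gene_map$ortholog_gene

# The expansion leaves the column sums unchanged.
stopifnot(isTRUE(all.equal(colSums(X_expanded), colSums(X))))
```

Here geneA's counts (10, 4) are split into two rows of (5, 2), so each sample's total is the same before and after expansion.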

many2many_rows(
  X,
  gene_map,
  input_col = "input_gene",
  output_col = "ortholog_gene",
  agg_fun = "sum",
  agg_method = c("monocle3", "stats"),
  as_sparse = TRUE,
  as_DelayedArray = FALSE,
  dropNA = TRUE,
  aggregate_orthologs = TRUE,
  verbose = TRUE
)

Examples

data("exp_mouse")
X <- exp_mouse
gene_map <- orthogene:::map_orthologs(genes = rownames(exp_mouse),
                                      input_species = "mouse",
                                      method = "homologene")
X_agg <- orthogene:::many2many_rows(X = X,
                                    gene_map = gene_map)
sum(duplicated(rownames(exp_mouse)))     # 0
sum(duplicated(gene_map$input_gene))     # 46
sum(duplicated(gene_map$ortholog_gene))  # 56
sum(duplicated(rownames(X_agg)))         # 56

Arguments

X

Input matrix.

gene_map

A data.frame generated by map_orthologs, with columns mapping input_col to output_col.

input_col

Column name within gene_map with gene names matching the row names of X.

output_col

Column name within gene_map with the gene names that you wish to map the row names of X onto.

agg_fun

Aggregation function (default: "sum").

agg_method

Aggregation method: "monocle3" or "stats".

as_sparse

Convert aggregated matrix to sparse matrix.

as_DelayedArray

Convert aggregated matrix to DelayedArray.

dropNA

Drop genes assigned to NA in groupings.

aggregate_orthologs

[Optional] After performing an initial round of many:many aggregation/expansion with many2many_rows, ensure each orthologous gene only appears in one row by using the aggregate_rows function (default: TRUE).

verbose

Print messages.

Value

Expanded/aggregated matrix.
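
The difference between an expanded-only matrix and an aggregated one can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the toy matrix is hypothetical, and base rowsum() stands in for aggregate_rows with a sum aggregation.

```r
# Hypothetical expanded/mapped matrix in which ORTH3 appears twice,
# i.e. the state before any ortholog aggregation.
X_expanded <- matrix(c(5, 2,
                       5, 2,
                       6, 2,
                       8, 8),
                     nrow = 4, byrow = TRUE,
                     dimnames = list(c("ORTH1", "ORTH2", "ORTH3", "ORTH3"),
                                     c("s1", "s2")))

# Collapse rows that share a name by summing them, leaving one row
# per ortholog (base R stand-in for a "sum" aggregation).
X_agg <- rowsum(X_expanded, group = rownames(X_expanded))

sum(duplicated(rownames(X_expanded)))  # 1
sum(duplicated(rownames(X_agg)))       # 0

# Summation keeps the column totals intact.
stopifnot(isTRUE(all.equal(colSums(X_agg), colSums(X_expanded))))
```

Summing is the natural choice when the goal is to preserve total counts; an aggregation such as a mean would collapse the rows but change the column sums.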