Expand/aggregate rows of a matrix with any combination of
many:many mappings.
This method ensures that the total counts per gene remain the
same regardless of how many genes each gene is split or condensed into.
This allows for many:many mappings that are otherwise not possible
using standard aggregation functions,
since they all require many:1 scenarios.
Internally, this is done as follows:
1. Identify genes that appear more than once in gene_map[[input_col]].
2. For each gene identified, split its row into multiple rows, where the number of new rows equals the number of times that gene appears in gene_map[[input_col]].
3. In the new expanded matrix, each new row holds the original row's values divided by the number of new rows, so a gene's counts are split equally among its new rows, column by column. The column sums of the output matrix therefore equal the column sums of the input matrix. For gene expression count matrices, this means that total counts remain equal between matrices, without being forced to drop genes with many:many mappings (as is the case with most other aggregation methods).
4. Map the row names of the expanded matrix onto the orthologous gene names in gene_map[[output_col]] (gene_map$ortholog_gene by default).
5. [Optional] When aggregate_orthologs=TRUE, aggregate rows of the expanded/mapped matrix so that there is only one row per ortholog gene, using aggregate_rows. The arguments agg_fun, agg_method, as_sparse, as_DelayedArray, and dropNA are all passed to aggregate_rows if this step is selected.
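The steps above can be sketched in base R on a toy matrix. This is an illustration of the split-and-aggregate idea only, not the package's implementation: the matrix, the gene names, and the gene map are made up, and base R's rowsum() stands in for aggregate_rows.

```r
# Toy count matrix (hypothetical data): 3 genes x 2 samples.
X <- matrix(c(10, 4, 6,
              20, 8, 2),
            nrow = 3,
            dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))

# Hypothetical gene map: geneA maps to two orthologs (1:many),
# while geneB and geneC both map to ORTH3 (many:1).
gene_map <- data.frame(
  input_gene    = c("geneA", "geneA", "geneB", "geneC"),
  ortholog_gene = c("ORTH1", "ORTH2", "ORTH3", "ORTH3"),
  stringsAsFactors = FALSE
)

# Steps 1-2: count how many times each input gene appears in the map.
n_copies <- table(gene_map$input_gene)

# Step 3: repeat each row once per mapping, then divide every copy by the
# number of copies so the gene's counts are split equally.
idx <- match(gene_map$input_gene, rownames(X))
X_expanded <- X[idx, , drop = FALSE] /
  as.numeric(n_copies[gene_map$input_gene])

# Step 4: relabel the expanded rows with the ortholog names.
rownames(X_expanded) <- gene_map$ortholog_gene

# Step 5 (optional): collapse rows that share an ortholog name.
# rowsum() is a base R stand-in here, not the package's aggregate_rows.
X_agg <- rowsum(X_expanded, group = rownames(X_expanded))

# Column sums are unchanged by the expansion and by the aggregation.
stopifnot(isTRUE(all.equal(colSums(X), colSums(X_expanded))))
stopifnot(isTRUE(all.equal(colSums(X), colSums(X_agg))))
```

In this toy case geneA's counts are halved between ORTH1 and ORTH2, while geneB and geneC are summed into ORTH3, so both the 1:many and many:1 mappings are handled without dropping any gene.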
many2many_rows(
  X,
  gene_map,
  input_col = "input_gene",
  output_col = "ortholog_gene",
  agg_fun = "sum",
  agg_method = c("monocle3", "stats"),
  as_sparse = TRUE,
  as_DelayedArray = FALSE,
  dropNA = TRUE,
  aggregate_orthologs = TRUE,
  verbose = TRUE
)
data("exp_mouse")
X <- exp_mouse
gene_map <- orthogene:::map_orthologs(genes = rownames(exp_mouse),
                                      input_species = "mouse",
                                      method = "homologene")
X_agg <- orthogene:::many2many_rows(X = X,
                                    gene_map = gene_map)
sum(duplicated(rownames(exp_mouse))) # 0
sum(duplicated(gene_map$input_gene)) # 46
sum(duplicated(gene_map$ortholog_gene)) # 56
sum(duplicated(rownames(X_agg))) # 56
X: Input matrix.
gene_map: A data.frame generated by map_orthologs, with columns mapping input_col to output_col.
input_col: Column name within gene_map with gene names matching the row names of X.
output_col: Column name within gene_map with gene names that you wish to map the row names of X onto.
agg_fun: Aggregation function (default: "sum").
agg_method: Aggregation method ("monocle3" or "stats").
as_sparse: Convert aggregated matrix to sparse matrix.
as_DelayedArray: Convert aggregated matrix to DelayedArray.
dropNA: Drop genes assigned to NA in groupings.
aggregate_orthologs: [Optional] After performing an initial round of many:many aggregation/expansion with many2many_rows, ensure each orthologous gene only appears in one row by using the aggregate_rows function (default: TRUE).
verbose: Print messages.
Returns the expanded/aggregated matrix.