R/aggregate_mapped_genes.R
aggregate_mapped_genes.Rd
Aggregate/expand a gene matrix (gene_df) using a gene mapping data.frame (gene_map).
Importantly, mappings can be performed in any of the following scenarios, which can occur during both within-species and between-species gene mapping:
1 gene : 1 gene
many genes : 1 gene
1 gene : many genes
many genes : many genes
For details on how aggregation/expansion is performed, see many2many_rows.
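To make the four scenarios concrete, the base-R sketch below labels each row of a toy gene map by counting how many distinct output genes each input gene maps to, and how many distinct input genes each output gene receives. The toy map and the `scenario` column are hypothetical; this per-row rule is a simplification and is not necessarily how many2many_rows classifies mappings.

```r
# Hypothetical toy gene_map using the default column names
# (input_col = "input_gene", output_col = "ortholog_gene").
gene_map <- data.frame(
  input_gene    = c("a", "b", "c", "d", "d", "e", "f", "e", "f"),
  ortholog_gene = c("A", "B", "B", "D1", "D2", "G1", "G1", "G2", "G2"),
  stringsAsFactors = FALSE
)

# Distinct output genes per input gene, and distinct input genes per output gene
out_per_in <- tapply(gene_map$ortholog_gene, gene_map$input_gene,
                     function(x) length(unique(x)))
in_per_out <- tapply(gene_map$input_gene, gene_map$ortholog_gene,
                     function(x) length(unique(x)))

# Label each mapping row as "<input side> : <output side>"
gene_map$scenario <- paste(
  ifelse(in_per_out[gene_map$ortholog_gene] > 1, "many genes", "1 gene"),
  ifelse(out_per_in[gene_map$input_gene] > 1, "many genes", "1 gene"),
  sep = " : "
)
print(gene_map)
```

Rows "e" and "f" form a many genes : many genes block, since each maps to both "G1" and "G2".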
aggregate_mapped_genes(
  gene_df,
  gene_map = NULL,
  input_col = "input_gene",
  output_col = "ortholog_gene",
  input_species = "human",
  output_species = input_species,
  method = c("gprofiler", "homologene", "babelgene"),
  agg_fun = "sum",
  agg_method = c("monocle3", "stats"),
  aggregate_orthologs = TRUE,
  transpose = FALSE,
  mthreshold = 1,
  target = "ENSG",
  numeric_ns = "",
  as_integers = FALSE,
  as_sparse = TRUE,
  as_DelayedArray = FALSE,
  dropNA = TRUE,
  sort_rows = FALSE,
  verbose = TRUE
)
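Two defaults in this signature are easy to misread. Vector defaults such as method and agg_method are conventionally resolved in R with match.arg(), which returns the first element when the argument is left unset, and output_species = input_species is a lazily evaluated default, so it follows whatever input_species is passed. The sketch below assumes that convention; resolve_args() is a hypothetical stand-in, not part of the package. The example output further down (gprofiler and monocle3 being used when neither is specified) is consistent with this reading.

```r
# Hypothetical stand-in that mirrors only the defaults of the documented signature
resolve_args <- function(input_species = "human",
                         output_species = input_species,  # lazy default
                         method = c("gprofiler", "homologene", "babelgene"),
                         agg_method = c("monocle3", "stats")) {
  # match.arg() picks the first choice when the argument is not supplied
  method <- match.arg(method)
  agg_method <- match.arg(agg_method)
  list(input_species = input_species,
       output_species = output_species,
       method = method,
       agg_method = agg_method)
}

# Within-species call: output_species falls back to "mouse"
res_default <- resolve_args(input_species = "mouse")
# Cross-species call with an explicit method
res_ortho <- resolve_args(input_species = "mouse",
                          output_species = "human",
                          method = "homologene")
str(res_default)
```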
gene_df: Input matrix where row names are genes.
gene_map: A data.frame that maps the current gene names to new gene names. This function's behaviour adapts to different situations as follows:
- gene_map=<data.frame>: When a data.frame containing the gene key:value columns (specified by input_col and output_col, respectively) is provided, it is used to perform aggregation/expansion.
- gene_map=NULL and input_species!=output_species: A gene_map is automatically generated by map_orthologs to perform inter-species gene aggregation/expansion.
- gene_map=NULL and input_species==output_species: A gene_map is automatically generated by map_genes to perform within-species gene symbol standardization and aggregation/expansion.
input_col: Column name within gene_map with gene names matching the row names of gene_df.
output_col: Column name within gene_map with gene names that you wish to map the row names of gene_df onto.
input_species: Name of the input species (e.g., "mouse", "fly"). Use map_species to return a full list of available species.
output_species: Name of the output species (e.g., "human", "chicken"; default: input_species). Use map_species to return a full list of available species.
method: R package to use for gene mapping:
- "gprofiler": Slower, but supports more species and genes.
- "homologene": Faster, but supports fewer species and genes.
- "babelgene": Faster, but supports fewer species and genes. Also gives consensus scores for each gene mapping based on several different data sources.
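agg_fun: Aggregation function applied to rows that map to the same gene (default: "sum").
agg_method: Aggregation method: "monocle3" or "stats".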
aggregate_orthologs: [Optional] After performing an initial round of many:many aggregation/expansion with many2many_rows, ensure each orthologous gene only appears in one row by using the aggregate_rows function (default: TRUE).
transpose: Transpose gene_df before mapping genes.
mthreshold: Maximum number of results per initial alias to show (default: 1).
target: Target namespace (default: "ENSG").
numeric_ns: Namespace to use for fully numeric IDs (see the g:Profiler documentation for the list of available namespaces).
as_integers: Force all values in the matrix to become integers by applying floor (default: FALSE).
as_sparse: Convert the aggregated matrix to a sparse matrix.
as_DelayedArray: Convert the aggregated matrix to a DelayedArray.
dropNA: Drop genes assigned to NA in groupings.
sort_rows: Sort gene_df rows alphanumerically.
verbose: Print messages.
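As a rough illustration of how gene_map, input_col, output_col, agg_fun = "sum", dropNA and as_integers fit together, the base-R sketch below expands rows with matrix indexing and aggregates them with rowsum(). It is not the package's implementation (which uses many2many_rows and the monocle3 or stats aggregation backends); the toy matrix, map and variable names are hypothetical, and here a 1 gene : many genes row is simply duplicated, which may differ from how many2many_rows handles values.

```r
# Hypothetical toy expression matrix: rows are input genes, columns are samples
gene_df <- matrix(c(1.5, 2, 3, 4, 10,
                    5, 6, 7, 8.9, 20),
                  nrow = 5,
                  dimnames = list(c("a", "b", "c", "d", "e"), c("s1", "s2")))

# Hypothetical gene_map using the documented default column names
input_col  <- "input_gene"
output_col <- "ortholog_gene"
gene_map <- data.frame(input_gene    = c("a", "b", "c", "d", "d", "e"),
                       ortholog_gene = c("A", "B", "B", "D1", "D2", NA),
                       stringsAsFactors = FALSE)

# dropNA-style step: discard mappings whose output gene is NA
gene_map <- gene_map[!is.na(gene_map[[output_col]]), ]
# Keep only mappings whose input gene is a row of gene_df
gene_map <- gene_map[gene_map[[input_col]] %in% rownames(gene_df), ]

# Expansion: one row per mapping, so "d" (1 gene : many genes) is duplicated
expanded <- gene_df[gene_map[[input_col]], , drop = FALSE]

# Sum aggregation (like agg_fun = "sum"): "b" and "c" both map to "B",
# so their rows collapse into one; rowsum() sorts the group names
agg <- rowsum(expanded, group = gene_map[[output_col]])

# as_integers-style step: round values down with floor()
agg_int <- floor(agg)
print(agg)
```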
Value: Aggregated matrix.
#### Aggregate within species: gene synonyms ####
data("exp_mouse_enst")
X_agg <- aggregate_mapped_genes(gene_df = exp_mouse_enst,
                                input_species = "mouse")
#> Retrieving all organisms available in gprofiler.
#> Using stored `gprofiler_orgs`.
#> Mapping species name: mouse
#> Common name mapping found for mouse
#> 1 organism identified from search: mmusculus
#> 482 / 482 (100%) genes mapped.
#> Aggregating rows using: monocle3
#> Converting obj to sparseMatrix.
#> Matrix aggregated:
#> - Input: 482 x 7
#> - Output: 92 x 7
#### Aggregate across species: gene orthologs ####
data("exp_mouse")
X_agg2 <- aggregate_mapped_genes(gene_df = exp_mouse,
                                 input_species = "mouse",
                                 output_species = "human",
                                 method = "homologene")
#> Converting mouse ==> human orthologs using: homologene
#> Retrieving all organisms available in homologene.
#> Mapping species name: mouse
#> Common name mapping found for mouse
#> 1 organism identified from search: 10090
#> Retrieving all organisms available in homologene.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Mapping many:many rows.
#> 1110012L19Rik : converting 1 row(s) --> 2 row(s).
#> 2610034B18Rik : converting 1 row(s) --> 2 row(s).
#> AA415398 : converting 1 row(s) --> 2 row(s).
#> Ankhd1 : converting 1 row(s) --> 2 row(s).
#> Anxa8 : converting 1 row(s) --> 2 row(s).
#> Apitd1 : converting 1 row(s) --> 2 row(s).
#> Arhgap8 : converting 1 row(s) --> 2 row(s).
#> Asb3 : converting 1 row(s) --> 2 row(s).
#> C4a : converting 1 row(s) --> 2 row(s).
#> C4b : converting 1 row(s) --> 2 row(s).
#> Cbs : converting 1 row(s) --> 2 row(s).
#> Ccz1 : converting 1 row(s) --> 2 row(s).
#> Ckmt1 : converting 1 row(s) --> 2 row(s).
#> Coro7 : converting 1 row(s) --> 2 row(s).
#> Cryaa : converting 1 row(s) --> 2 row(s).
#> D10Jhu81e : converting 1 row(s) --> 2 row(s).
#> F8a : converting 1 row(s) --> 3 row(s).
#> Fam21 : converting 1 row(s) --> 2 row(s).
#> Fcgr4 : converting 1 row(s) --> 2 row(s).
#> Gpr89 : converting 1 row(s) --> 2 row(s).
#> Gstt2 : converting 1 row(s) --> 2 row(s).
#> H3f3a : converting 1 row(s) --> 2 row(s).
#> H3f3b : converting 1 row(s) --> 2 row(s).
#> Hspa1a : converting 1 row(s) --> 2 row(s).
#> Icosl : converting 1 row(s) --> 2 row(s).
#> Klhl23 : converting 1 row(s) --> 2 row(s).
#> Mrpl23 : converting 1 row(s) --> 2 row(s).
#> Nbl1 : converting 1 row(s) --> 2 row(s).
#> Nomo1 : converting 1 row(s) --> 3 row(s).
#> Pmf1 : converting 1 row(s) --> 2 row(s).
#> Pom121 : converting 1 row(s) --> 2 row(s).
#> Pramef8 : converting 1 row(s) --> 2 row(s).
#> Prodh : converting 1 row(s) --> 2 row(s).
#> Ranbp2 : converting 1 row(s) --> 7 row(s).
#> Serf1 : converting 1 row(s) --> 2 row(s).
#> Sgk3 : converting 1 row(s) --> 2 row(s).
#> Slx1b : converting 1 row(s) --> 2 row(s).
#> Smn1 : converting 1 row(s) --> 2 row(s).
#> Spin2d : converting 1 row(s) --> 2 row(s).
#> Aggregating rows using: monocle3
#> Converting obj to sparseMatrix.
#> Matrix aggregated:
#> - Input: 15,259 x 7
#> - Output: 13,316 x 7