Currently supports ortholog mapping between any
pair of 700+ species.
Use map_species to return a full list of available organisms.
convert_orthologs(
  gene_df,
  gene_input = "rownames",
  gene_output = "rownames",
  standardise_genes = FALSE,
  input_species,
  output_species = "human",
  method = c("gprofiler", "homologene", "babelgene"),
  drop_nonorths = TRUE,
  non121_strategy = "drop_both_species",
  agg_fun = NULL,
  mthreshold = Inf,
  as_sparse = FALSE,
  as_DelayedArray = FALSE,
  sort_rows = FALSE,
  verbose = TRUE,
  ...
)
gene_df : Data object containing the genes
(see gene_input for options on how
the genes can be stored within the object).
Can be one of the following formats:
matrix : A sparse or dense matrix.
data.frame : A data.frame, data.table, or tibble.
list : A list or character vector.
Genes, transcripts, proteins, SNPs, or genomic ranges
can be provided in any format
(HGNC, Ensembl, RefSeq, UniProt, etc.) and will be
automatically converted to gene symbols unless
specified otherwise with the ... arguments.
Note: If you set method="homologene", you
must either supply genes in gene symbol format (e.g. "Sox2")
OR set standardise_genes=TRUE.
gene_input : Which aspect of gene_df to
get gene names from:
"rownames" : From row names of data.frame/matrix.
"colnames" : From column names of data.frame/matrix.
<column name> : From a column in gene_df.
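The three places genes can be read from can be pictured with a small base R sketch. The toy table and its column name gene_names are hypothetical; the snippet only shows where the gene identifiers live in each layout and does not call orthogene.

```r
# Hypothetical toy table: genes stored both as row names and in a column.
gene_df <- data.frame(
  gene_names = c("Sox2", "Gfap", "Snap25"),
  score = c(1.5, 2.0, 0.7),
  row.names = c("Sox2", "Gfap", "Snap25"),
  stringsAsFactors = FALSE
)

# Layout 1: genes as row names.
genes_from_rownames <- rownames(gene_df)

# Layout 2: genes as column names (e.g. a samples-by-genes matrix).
gene_mat <- t(as.matrix(gene_df["score"]))
genes_from_colnames <- colnames(gene_mat)

# Layout 3: genes in a named column.
genes_from_column <- gene_df[["gene_names"]]
```

In each case the same three identifiers are recovered; only their location in the object differs.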
gene_output : How to return genes. Options include:
"rownames" : As row names of gene_df.
"colnames" : As column names of gene_df.
"columns" : As new columns "input_gene", "ortholog_gene" (and "input_gene_standard" if standardise_genes=TRUE) in gene_df.
"dict" : As a dictionary (named list) where the names are input_gene and the values are ortholog_gene.
"dict_rev" : As a reversed dictionary (named list) where the names are ortholog_gene and the values are input_gene.
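To make the two dictionary shapes concrete, here is a minimal base R sketch built from a hypothetical three-gene mapping table. The gene pairs are illustrative only, and the use of a plain named list follows the description above rather than any inspection of the package's return value.

```r
# Hypothetical mouse-to-human mapping table (illustrative values only).
gene_map <- data.frame(
  input_gene = c("Sox2", "Gfap", "Snap25"),
  ortholog_gene = c("SOX2", "GFAP", "SNAP25"),
  stringsAsFactors = FALSE
)

# Dictionary: names are input genes, values are orthologs.
dict <- as.list(stats::setNames(gene_map$ortholog_gene, gene_map$input_gene))

# Reversed dictionary: names are orthologs, values are input genes.
dict_rev <- as.list(stats::setNames(gene_map$input_gene, gene_map$ortholog_gene))

ortholog_of_gfap <- dict[["Gfap"]]    # look up the ortholog of an input gene
input_of_sox2 <- dict_rev[["SOX2"]]   # look up the input gene behind an ortholog
```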
standardise_genes : If TRUE AND gene_output="columns", a new column "input_gene_standard"
will be added to
gene_df containing standardised HGNC symbols
identified by gorth.
input_species : Name of the input species (e.g. "mouse", "fly"). Use map_species to return a full list of available species.
output_species : Name of the output species (e.g. "human", "chicken"). Use map_species to return a full list of available species.
method : R package to use for gene mapping:
"gprofiler" : Slower but more species and genes.
"homologene" : Faster but fewer species and genes.
"babelgene" : Faster but fewer species and genes.
Also gives consensus scores for each gene mapping based on
several different data sources.
drop_nonorths : Drop genes that don't have an ortholog in the output_species.
non121_strategy : How to handle genes that don't have
1:1 mappings between input_species and output_species. Options include:
"drop_both_species" or "dbs" or 1 :
Drop genes that have duplicate mappings in either the input_species or the output_species (DEFAULT).
"drop_input_species" or "dis" or 2 :
Only drop genes that have duplicate mappings in the input_species.
"drop_output_species" or "dos" or 3 :
Only drop genes that have duplicate mappings in the output_species.
"keep_both_species" or "kbs" or 4 :
Keep all genes regardless of whether they have duplicate mappings in either species.
"keep_popular" or "kp" or 5 :
Return only the most "popular" interspecies ortholog mappings. This procedure tends to yield a greater number of returned genes but at the cost of many of them not being true biological 1:1 orthologs.
"sum","mean","median","min" or "max" :
When gene_df is a matrix and gene_output="rownames",
these options will aggregate many-to-one gene mappings
after dropping any duplicate genes in the output_species.
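The strictest option, drop_both_species, can be sketched in base R on a hypothetical mapping table. This is not the package's implementation; it only illustrates the idea, using the many:1 and 1:many wording that appears in the function's log messages. The table contents and the helper is_dup are made up for illustration.

```r
# Hypothetical mapping table with one many:1 and one 1:many case.
gene_map <- data.frame(
  input_gene    = c("A1", "A2", "B", "C",  "C",  "D"),
  ortholog_gene = c("A",  "A",  "B", "C1", "C2", "D"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: flag every element whose value occurs more than once.
is_dup <- function(x) x %in% x[duplicated(x)]

# many:1, i.e. multiple input_gene per ortholog_gene (rows A1 and A2).
many_to_1 <- is_dup(gene_map$ortholog_gene)

# 1:many, i.e. multiple ortholog_gene per input_gene (both rows for C).
one_to_many <- is_dup(gene_map$input_gene)

# Keep only rows that are unique in both columns (strict 1:1 pairs).
one_to_one <- gene_map[!many_to_1 & !one_to_many, ]
```

Only B and D survive here. Keeping every row instead corresponds to the keep_both_species idea, at the cost of non-unique gene names.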
agg_fun : Aggregation function passed to aggregate_mapped_genes.
Set to NULL to skip aggregation step (default).
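Aggregating many-to-one mappings means collapsing several input-gene rows into a single ortholog row. The base R sketch below shows the idea on a hypothetical 3 x 2 matrix; the matrix, the mapping vector, and the choice of rowsum are assumptions for illustration, not a description of how the package performs the step.

```r
# Hypothetical expression matrix: rows are input genes, columns are samples.
expr <- matrix(
  c(1, 2,
    3, 4,
    5, 6),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A1", "A2", "B"), c("s1", "s2"))
)

# Hypothetical many-to-one mapping: A1 and A2 share the ortholog "A".
ortholog <- c(A1 = "A", A2 = "A", B = "B")
group <- unname(ortholog[rownames(expr)])

# Sum the rows that map to the same ortholog.
agg_sum <- rowsum(expr, group = group)

# Generic version for any summary function, here the mean.
agg_mean <- do.call(rbind, lapply(
  split(seq_len(nrow(expr)), group),
  function(i) apply(expr[i, , drop = FALSE], 2, mean)
))
```

Both results have one row per ortholog ("A" and "B"); swapping mean for median, min, or max gives the other summaries.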
mthreshold : Maximum number of ortholog names per gene to show.
Passed to gorth.
Only used when
method="gprofiler" (DEFAULT : Inf).
as_sparse : Convert gene_df to a sparse matrix.
Only works if gene_df is one of the supported classes.
If gene_df is a sparse matrix to begin with,
it will be returned as a sparse matrix
(so long as gene_output is "rownames" or "colnames").
as_DelayedArray : Convert aggregated matrix to DelayedArray.
sort_rows : Sort gene_df rows alphanumerically.
... : Additional arguments to be passed to
gorth or homologene.
NOTE: To return only the most "popular" interspecies ortholog mappings, supply
mthreshold=1 here AND set method="gprofiler" above.
This procedure tends to yield a greater number of returned genes but at
the cost of many of them not being true biological 1:1 orthologs.
For more details, please see here.
Value: gene_df with orthologs converted to the output_species.
Instead returned as a dictionary (named list) if gene_output is "dict" or "dict_rev".
data("exp_mouse")
gene_df <- convert_orthologs(
  gene_df = exp_mouse,
  input_species = "mouse"
)
#> Preparing gene_df.
#> sparseMatrix format detected.
#> Extracting genes from rownames.
#> 15,259 genes extracted.
#> Converting mouse ==> human orthologs using: gprofiler
#> Retrieving all organisms available in gprofiler.
#> Using stored `gprofiler_orgs`.
#> Mapping species name: mouse
#> Common name mapping found for mouse
#> 1 organism identified from search: mmusculus
#> Retrieving all organisms available in gprofiler.
#> Using stored `gprofiler_orgs`.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: hsapiens
#> Checking for genes without orthologs in human.
#> Extracting genes from input_gene.
#> 16,022 genes extracted.
#> Extracting genes from ortholog_gene.
#> 16,022 genes extracted.
#> Dropping 2,659 NAs of all kinds from ortholog_gene.
#> Checking for genes without 1:1 orthologs.
#> Dropping 452 genes that have multiple input_gene per ortholog_gene (many:1).
#> Dropping 341 genes that have multiple ortholog_gene per input_gene (1:many).
#> Filtering gene_df with gene_map
#> Setting ortholog_gene to rownames.
#> Loading required namespace: DelayedArray
#>
#> =========== REPORT SUMMARY ===========
#> Total genes dropped after convert_orthologs :
#>    2,908 / 15,259 (19%)
#> Total genes remaining after convert_orthologs :
#>    12,351 / 15,259 (81%)
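The totals in the report summary can be checked by hand. A minimal sketch using only the two gene counts printed in the log above (the variable names are made up for illustration):

```r
# Gene counts copied from the report summary.
total_genes <- 15259
remaining <- 12351

# Genes dropped, and both figures as rounded percentages of the input.
dropped <- total_genes - remaining
pct_dropped <- round(100 * dropped / total_genes)
pct_remaining <- round(100 * remaining / total_genes)
```

This reproduces the 2,908 dropped genes and the 19% / 81% split reported by the function.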