Identify the number of orthologous genes between two species.
report_orthologs(
  target_species = "mouse",
  reference_species = "human",
  standardise_genes = FALSE,
  method_all_genes = c("homologene", "gprofiler", "babelgene"),
  method_convert_orthologs = method_all_genes,
  drop_nonorths = TRUE,
  non121_strategy = "drop_both_species",
  round_digits = 2,
  return_report = TRUE,
  mc.cores = 1,
  verbose = TRUE,
  ...
)
target_species: Target species.
reference_species: Reference species.
standardise_genes: If TRUE AND gene_output="columns", a new column "input_gene_standard" will be added to gene_df containing standardised HGNC symbols identified by gorth.
method_all_genes: R package to use in the all_genes step:
  "gprofiler": Slower but more species and genes.
  "homologene": Faster but fewer species and genes.
  "babelgene": Faster but fewer species and genes. Also gives consensus scores for each gene mapping based on several different data sources.
method_convert_orthologs: R package to use in the convert_orthologs step:
  "gprofiler": Slower but more species and genes.
  "homologene": Faster but fewer species and genes.
  "babelgene": Faster but fewer species and genes. Also gives consensus scores for each gene mapping based on several different data sources.
drop_nonorths: Drop genes that don't have an ortholog in the output_species.
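As a rough illustration of what dropping non-orthologs means for a mapping table, the base R sketch below keeps only rows whose ortholog is present. The table, its gene symbols, and the helper drop_missing_orthologs are hypothetical; treating an empty string as "no ortholog" is an assumption, and this is not orthogene's internal code.

```r
# Toy mapping table (hypothetical gene symbols, not real orthology data).
gene_map <- data.frame(
  input_gene    = c("g1", "g2", "g3", "g4"),
  ortholog_gene = c("G1", NA,   "",   "G4"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only rows that have a usable ortholog.
# NA and "" are both treated as "no ortholog" (an assumption).
drop_missing_orthologs <- function(map) {
  has_ortholog <- !is.na(map$ortholog_gene) & nzchar(map$ortholog_gene)
  map[has_ortholog, , drop = FALSE]
}

kept <- drop_missing_orthologs(gene_map)
kept$input_gene  # "g1" "g4"
```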
non121_strategy: How to handle genes that don't have 1:1 mappings between input_species:output_species. Options include:
  "drop_both_species" or "dbs" or 1: Drop genes that have duplicate mappings in either the input_species or output_species (DEFAULT).
  "drop_input_species" or "dis" or 2: Only drop genes that have duplicate mappings in the input_species.
  "drop_output_species" or "dos" or 3: Only drop genes that have duplicate mappings in the output_species.
  "keep_both_species" or "kbs" or 4: Keep all genes regardless of whether they have duplicate mappings in either species.
  "keep_popular" or "kp" or 5: Return only the most "popular" interspecies ortholog mappings. This procedure tends to yield a greater number of returned genes, but at the cost of many of them not being true biological 1:1 orthologs.
  "sum", "mean", "median", "min" or "max": When gene_df is a matrix and gene_output="rownames", these options will aggregate many-to-one gene mappings (input_species-to-output_species) after dropping any duplicate genes in the output_species.
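To make the 1:1, 1:many and many:1 cases concrete, here is a base R sketch of two of the strategies listed above, under stated assumptions. keep_one_to_one is a hypothetical, strict approximation of "drop_both_species": it drops any row whose input gene or ortholog occurs more than once, evaluated on the whole table at once, so its results may differ from orthogene's own ordering of the many:1 and 1:many filters. sum_by_ortholog is a hypothetical approximation of the "sum" option using base R's rowsum(). The gene symbols are toy values.

```r
# Toy input -> output mapping (hypothetical symbols):
#   a -> A, b -> B     : 1:1
#   c -> C1, c -> C2   : 1:many (one input gene, two orthologs)
#   d -> D, e -> D     : many:1 (two input genes, one ortholog)
gene_map <- data.frame(
  input_gene    = c("a", "b", "c",  "c",  "d", "e"),
  ortholog_gene = c("A", "B", "C1", "C2", "D", "D"),
  stringsAsFactors = FALSE
)

# TRUE for every element of x that occurs more than once.
is_repeated <- function(x) duplicated(x) | duplicated(x, fromLast = TRUE)

# Hypothetical helper approximating "drop_both_species": keep a row only if
# both its input gene and its ortholog appear exactly once in the table.
keep_one_to_one <- function(map) {
  ok <- !is_repeated(map$input_gene) & !is_repeated(map$ortholog_gene)
  map[ok, , drop = FALSE]
}

# Hypothetical helper approximating the "sum" option: rows of an expression
# matrix are input genes; rows that share an ortholog are summed together.
sum_by_ortholog <- function(mat, map) {
  orth <- map$ortholog_gene[match(rownames(mat), map$input_gene)]
  rowsum(mat, group = orth)
}

one2one <- keep_one_to_one(gene_map)
one2one$input_gene  # "a" "b"

expr <- matrix(c(1, 2, 3), nrow = 3,
               dimnames = list(c("a", "d", "e"), "sample1"))
sum_by_ortholog(expr, gene_map)  # rows "A" (1) and "D" (2 + 3 = 5)
```

The strict version is the most conservative reading of "duplicate mappings in either species"; the aggregation variant instead keeps the information from many:1 genes by collapsing them onto their shared ortholog.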
round_digits: Number of digits to round to when printing percentages.
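The effect of round_digits on a printed percentage can be sketched in base R. format_pct_line is a hypothetical helper that mimics the "kept / total (pct%)" layout seen in the example output; it is not an orthogene function.

```r
# Hypothetical helper: format "kept / total (pct%)" with a configurable
# number of decimal places for the percentage.
format_pct_line <- function(n, total, round_digits = 2) {
  pct <- round(100 * n / total, round_digits)
  sprintf("%s / %s (%s%%)",
          formatC(n, format = "d", big.mark = ","),
          formatC(total, format = "d", big.mark = ","),
          pct)
}

format_pct_line(4058, 8438)     # "4,058 / 8,438 (48.09%)"
format_pct_line(4058, 8438, 0)  # "4,058 / 8,438 (48%)"
```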
return_report: If FALSE, return just the ortholog mapping between the two species; if TRUE, return both the ortholog mapping and a data.frame of the report statistics.
mc.cores: Number of cores to use when parallelising across target_species.
verbose: Print messages.
...: Additional arguments to be passed to gorth or homologene.
NOTE: To return only the most "popular" interspecies ortholog mappings, supply mthreshold=1 here AND set method_convert_orthologs="gprofiler" above. This procedure tends to yield a greater number of returned genes, but at the cost of many of them not being true biological 1:1 orthologs. For more details, see the documentation for gorth.
A list containing:
  map: A table of inter-species gene mappings.
  report: A list of aggregate orthology report statistics.
If more than one target_species is provided, a table of aggregated report statistics concatenated across species is returned instead.
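When several target_species are supplied, the description above implies one report per species, parallelised over mc.cores and then concatenated into a single table. The base R sketch below shows that general pattern with parallel::mclapply and do.call(rbind, ...). toy_report and its placeholder column are hypothetical stand-ins, not orthogene functions or real report fields, and whether orthogene uses mclapply internally is an assumption.

```r
# Hypothetical stand-in for a per-species report. The "placeholder" column
# is a dummy value, not a real orthology statistic.
toy_report <- function(species) {
  data.frame(target_species = species,
             placeholder = nchar(species),
             stringsAsFactors = FALSE)
}

species <- c("mouse", "fly")

# One report per target species. mc.cores > 1 relies on forking,
# which is not available on Windows, so 1 is used here.
reports <- parallel::mclapply(species, toy_report, mc.cores = 1)

# Concatenate the per-species tables into a single data.frame.
combined <- do.call(rbind, reports)
combined
```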
orth_fly <- orthogene::report_orthologs(
target_species = "fly",
reference_species = "human"
)
#> Retrieving all genes using: homologene.
#> Retrieving all organisms available in homologene.
#> Mapping species name: fly
#> Common name mapping found for fly
#> 1 organism identified from search: 7227
#> Gene table with 8,438 rows retrieved.
#> Returning all 8,438 genes from fly.
#> --
#> Retrieving all genes using: homologene.
#> Retrieving all organisms available in homologene.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Gene table with 19,129 rows retrieved.
#> Returning all 19,129 genes from human.
#> --
#> Preparing gene_df.
#> data.frame format detected.
#> Extracting genes from Gene.Symbol.
#> 8,438 genes extracted.
#> Converting fly ==> human orthologs using: homologene
#> Retrieving all organisms available in homologene.
#> Mapping species name: fly
#> Common name mapping found for fly
#> 1 organism identified from search: 7227
#> Retrieving all organisms available in homologene.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Checking for genes without orthologs in human.
#> Extracting genes from input_gene.
#> 4,500 genes extracted.
#> Dropping 1 NAs of all kinds from input_gene.
#> Extracting genes from ortholog_gene.
#> 4,499 genes extracted.
#> Checking for genes without 1:1 orthologs.
#> Dropping 19 genes that have multiple input_gene per ortholog_gene (many:1).
#> Dropping 266 genes that have multiple ortholog_gene per input_gene (1:many).
#> Filtering gene_df with gene_map
#> Adding input_gene col to gene_df.
#> Adding ortholog_gene col to gene_df.
#>
#> =========== REPORT SUMMARY ===========
#> Total genes dropped after convert_orthologs :
#> 4,380 / 8,438 (52%)
#> Total genes remaining after convert_orthologs :
#> 4,058 / 8,438 (48%)
#> --
#>
#> =========== REPORT SUMMARY ===========
#> 4,058 / 8,438 (48.09%) target_species genes remain after ortholog conversion.
#> 4,058 / 19,129 (21.21%) reference_species genes remain after ortholog conversion.
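The percentages in the two REPORT SUMMARY blocks can be re-derived from the printed counts. The reference-species line divides by the 19,129 human genes rather than the 8,438 fly genes, which is why it is much lower; the first block appears to round to whole percentages. The values below are copied from the output above.

```r
# Counts copied from the REPORT SUMMARY lines above.
n_target    <- 8438   # fly (target_species) genes retrieved
n_reference <- 19129  # human (reference_species) genes retrieved
n_remaining <- 4058   # genes remaining after ortholog conversion

n_dropped <- n_target - n_remaining        # 4380
round(100 * n_dropped / n_target)          # 52
round(100 * n_remaining / n_target, 2)     # 48.09
round(100 * n_remaining / n_reference, 2)  # 21.21
```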