Identify the number of orthologous genes between two species.

report_orthologs(
  target_species = "mouse",
  reference_species = "human",
  standardise_genes = FALSE,
  method_all_genes = c("homologene", "gprofiler", "babelgene"),
  method_convert_orthologs = method_all_genes,
  drop_nonorths = TRUE,
  non121_strategy = "drop_both_species",
  round_digits = 2,
  return_report = TRUE,
  mc.cores = 1,
  verbose = TRUE,
  ...
)

Arguments

target_species

Target species.

reference_species

Reference species.

standardise_genes

If TRUE AND gene_output="columns" (arguments of the underlying convert_orthologs step), a new column "input_gene_standard" is added to gene_df containing standardised HGNC symbols identified by gorth.

method_all_genes

R package to use in the all_genes step:

  • "gprofiler" : Slower but more species and genes.

  • "homologene" : Faster but fewer species and genes.

  • "babelgene" : Faster but fewer species and genes. Also gives consensus scores for each gene mapping based on several different data sources.

method_convert_orthologs

R package to use in the convert_orthologs step:

  • "gprofiler" : Slower but more species and genes.

  • "homologene" : Faster but fewer species and genes.

  • "babelgene" : Faster but fewer species and genes. Also gives consensus scores for each gene mapping based on several different data sources.

drop_nonorths

Drop genes that don't have an ortholog in the output_species (here, the reference_species).
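As a rough illustration of what dropping non-orthologs means, the base-R sketch below filters a toy mapping table down to rows that actually have an ortholog. The column names input_gene and ortholog_gene mirror those in the example log further down; the helper name drop_nonorths_sketch and the toy gene names are hypothetical, and this is not orthogene's internal code.

```r
# Hypothetical sketch, not orthogene's implementation:
# keep only rows whose ortholog_gene is present (non-NA and non-empty).
drop_nonorths_sketch <- function(gene_map) {
  has_ortholog <- !is.na(gene_map$ortholog_gene) & nzchar(gene_map$ortholog_gene)
  gene_map[has_ortholog, , drop = FALSE]
}

# Toy mapping table with made-up gene names.
toy_map <- data.frame(
  input_gene    = c("a1", "a2", "a3", "a4"),
  ortholog_gene = c("A1", NA, "", "A4"),
  stringsAsFactors = FALSE
)

kept <- drop_nonorths_sketch(toy_map)
print(kept$input_gene)
```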

non121_strategy

How to handle genes that don't have 1:1 mappings between input_species:output_species. Options include:

  • "drop_both_species" or "dbs" or 1 :
    Drop genes that have duplicate mappings in either the input_species or output_species
    (DEFAULT).

  • "drop_input_species" or "dis" or 2 :
    Only drop genes that have duplicate mappings in the input_species.

  • "drop_output_species" or "dos" or 3 :
    Only drop genes that have duplicate mappings in the output_species.

  • "keep_both_species" or "kbs" or 4 :
    Keep all genes regardless of whether they have duplicate mappings in either species.

  • "keep_popular" or "kp" or 5 :
    Return only the most "popular" interspecies ortholog mappings. This procedure tends to yield a greater number of returned genes but at the cost of many of them not being true biological 1:1 orthologs.

  • "sum", "mean", "median", "min" or "max" :
    When gene_df is a matrix and gene_output="rownames", these options will aggregate many-to-one gene mappings (input_species-to-output_species) after dropping any duplicate genes in the output_species.
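Two of the strategies above can be sketched in base R: "drop_both_species" keeps only strict 1:1 pairs, and "sum" collapses many-to-one rows of a matrix. This is a minimal illustration under assumptions, not orthogene's implementation: the helper names, the toy genes, and the idea that the mapping is a data.frame with input_gene and ortholog_gene columns are hypothetical. The example log suggests orthogene drops many:1 and 1:many genes in two separate steps, whereas this sketch flags both on the original table at once. The aggregation sketch also assumes any 1:many mappings have already been removed.

```r
# Hypothetical sketches of two non121_strategy options; not orthogene's code.

# "drop_both_species": keep only rows whose input gene AND ortholog gene
# each occur exactly once in the mapping (strict 1:1 pairs).
drop_both_species_sketch <- function(gene_map) {
  dup_input <- gene_map$input_gene %in%
    gene_map$input_gene[duplicated(gene_map$input_gene)]
  dup_ortholog <- gene_map$ortholog_gene %in%
    gene_map$ortholog_gene[duplicated(gene_map$ortholog_gene)]
  gene_map[!dup_input & !dup_ortholog, , drop = FALSE]
}

# "sum": collapse many-to-one mappings by summing the rows of an
# input-species matrix that share the same ortholog gene.
aggregate_sum_sketch <- function(mat, gene_map) {
  orth <- gene_map$ortholog_gene[match(rownames(mat), gene_map$input_gene)]
  keep <- !is.na(orth)  # rows without an ortholog are dropped
  rowsum(mat[keep, , drop = FALSE], group = orth[keep])
}

# Toy mapping: a1/a2 -> A1 is many:1, a3 -> A3/A4 is 1:many.
toy_map <- data.frame(
  input_gene    = c("a1", "a2", "a3", "a3", "a5"),
  ortholog_gene = c("A1", "A1", "A3", "A4", "A5"),
  stringsAsFactors = FALSE
)
one2one <- drop_both_species_sketch(toy_map)
print(one2one)

# Toy matrix with only many:1 mappings (a1, a2 -> A1; a5 -> A5).
many1_map <- toy_map[toy_map$input_gene != "a3", ]
mat <- matrix(c(1, 2, 3, 10, 20, 30), nrow = 3,
              dimnames = list(c("a1", "a2", "a5"), c("s1", "s2")))
agg <- aggregate_sum_sketch(mat, many1_map)
print(agg)
```

Replacing rowsum() with a per-group mean, median, min or max would give the corresponding aggregation options.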

round_digits

Number of digits to round to when printing percentages.
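The percentages in the report are counts divided by totals, multiplied by 100 and rounded to round_digits places. A minimal sketch using base R's round() and sprintf(); the helper name format_report_line is hypothetical, and the counts are the ones printed in the fly example below.

```r
# Hypothetical helper reproducing the "n / total (pct%)" lines of the report.
format_report_line <- function(n, total, round_digits = 2) {
  pct <- round(n / total * 100, round_digits)
  sprintf("%s / %s (%s%%)",
          format(n, big.mark = ","), format(total, big.mark = ","), pct)
}

# Counts taken from the fly vs human example output.
format_report_line(4058, 8438)
format_report_line(4058, 19129)
```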

return_report

Return just the ortholog mapping between two species (FALSE), or return both the ortholog mapping and a data.frame of the report statistics (TRUE).

mc.cores

Number of cores used to parallelise across multiple target_species.

verbose

Print messages.

...

Additional arguments to be passed to gorth or homologene.

NOTE: To return only the most "popular" interspecies ortholog mappings, supply mthreshold=1 here AND set method_convert_orthologs="gprofiler" above. This procedure tends to yield a greater number of returned genes, but at the cost of many of them not being true biological 1:1 orthologs.


Value

A list containing:

  • map : A table of inter-species gene mappings.

  • report : A list of aggregate orthology report statistics.

If more than one target_species is provided, a table of aggregated report statistics concatenated across species is returned instead.
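With several target species, one report per species is produced and the rows are stacked into a single table, with mc.cores controlling parallelism across species. A minimal sketch of that pattern using parallel::mclapply() and do.call(rbind, ...); toy_counts, toy_report and the species names are hypothetical placeholders with made-up counts, and this is not how report_orthologs is implemented internally.

```r
library(parallel)

# Made-up per-species gene counts (hypothetical, for illustration only).
toy_counts <- list(
  speciesA = c(total = 100, remaining = 40),
  speciesB = c(total = 250, remaining = 200)
)

# Hypothetical per-species report: one row of summary statistics.
toy_report <- function(species) {
  n <- toy_counts[[species]]
  data.frame(
    target_species   = species,
    target_total     = n[["total"]],
    target_remaining = n[["remaining"]],
    pct_remaining    = round(n[["remaining"]] / n[["total"]] * 100, 2),
    stringsAsFactors = FALSE
  )
}

species <- names(toy_counts)
# mc.cores > 1 is not supported on Windows, so 1 is used here.
reports <- mclapply(species, toy_report, mc.cores = 1)
combined <- do.call(rbind, reports)  # one row per target species
print(combined)
```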

Examples

orth_fly <- orthogene::report_orthologs(
    target_species = "fly",
    reference_species = "human"
)
#> Retrieving all genes using: homologene.
#> Retrieving all organisms available in homologene.
#> Mapping species name: fly
#> Common name mapping found for fly
#> 1 organism identified from search: 7227
#> Gene table with 8,438 rows retrieved.
#> Returning all 8,438 genes from fly.
#> --
#> Retrieving all genes using: homologene.
#> Retrieving all organisms available in homologene.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Gene table with 19,129 rows retrieved.
#> Returning all 19,129 genes from human.
#> --
#> Preparing gene_df.
#> data.frame format detected.
#> Extracting genes from Gene.Symbol.
#> 8,438 genes extracted.
#> Converting fly ==> human orthologs using: homologene
#> Retrieving all organisms available in homologene.
#> Mapping species name: fly
#> Common name mapping found for fly
#> 1 organism identified from search: 7227
#> Retrieving all organisms available in homologene.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Checking for genes without orthologs in human.
#> Extracting genes from input_gene.
#> 4,500 genes extracted.
#> Dropping 1 NAs of all kinds from input_gene.
#> Extracting genes from ortholog_gene.
#> 4,499 genes extracted.
#> Checking for genes without 1:1 orthologs.
#> Dropping 19 genes that have multiple input_gene per ortholog_gene (many:1).
#> Dropping 266 genes that have multiple ortholog_gene per input_gene (1:many).
#> Filtering gene_df with gene_map
#> Adding input_gene col to gene_df.
#> Adding ortholog_gene col to gene_df.
#> 
#> =========== REPORT SUMMARY ===========
#> Total genes dropped after convert_orthologs :
#>    4,380 / 8,438 (52%)
#> Total genes remaining after convert_orthologs :
#>    4,058 / 8,438 (48%)
#> --
#> 
#> =========== REPORT SUMMARY ===========
#> 4,058 / 8,438 (48.09%) target_species genes remain after ortholog conversion.
#> 4,058 / 19,129 (21.21%) reference_species genes remain after ortholog conversion.