Create a gene background as the union/intersect of all orthologs between the input species (species1 and species2) and the output_species.
This can be useful when generating random lists of background genes to test against in analyses with data from multiple species (e.g. enrichment of mouse cell-type marker gene sets in human GWAS-derived gene sets).
create_background(
  species1,
  species2,
  output_species = "human",
  as_output_species = TRUE,
  use_intersect = TRUE,
  bg = NULL,
  gene_map = NULL,
  method = "homologene",
  non121_strategy = "drop_both_species",
  verbose = TRUE
)
species1: First species.
species2: Second species.
output_species: Species to convert all genes from species1 and species2 to first. Default is "human", but this can be any species supported by orthogene, including species1 or species2.
as_output_species: Return the background gene list as output_species orthologs, instead of the gene names of the original input species.
use_intersect: When species1 and species2 are both different from output_species, this argument determines whether to use the intersect (TRUE) or union (FALSE) of all genes from species1 and species2.
bg: User-supplied background list that will be returned to the user after removing duplicate genes.
gene_map: User-supplied gene_map data table from map_orthologs or map_genes.
method: R package to use for gene mapping:
"gprofiler": Slower but more species and genes.
"homologene": Faster but fewer species and genes.
"babelgene": Faster but fewer species and genes. Also gives consensus scores for each gene mapping based on several different data sources.
non121_strategy: How to handle genes that don't have 1:1 mappings between input_species:output_species. Options include:
"drop_both_species" or "dbs" or 1: Drop genes that have duplicate mappings in either the input_species or output_species (DEFAULT).
"drop_input_species" or "dis" or 2: Only drop genes that have duplicate mappings in the input_species.
"drop_output_species" or "dos" or 3: Only drop genes that have duplicate mappings in the output_species.
"keep_both_species" or "kbs" or 4: Keep all genes regardless of whether they have duplicate mappings in either species.
"keep_popular" or "kp" or 5: Return only the most "popular" interspecies ortholog mappings. This procedure tends to yield a greater number of returned genes but at the cost of many of them not being true biological 1:1 orthologs.
"sum", "mean", "median", "min" or "max": When gene_df is a matrix and gene_output="rownames", these options will aggregate many-to-one gene mappings (input_species-to-output_species) after dropping any duplicate genes in the output_species.
verbose: Print messages.
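The drop and keep options of non121_strategy described above can be illustrated with a small base R sketch. This is one reading of those descriptions, not orthogene's actual implementation: the toy mapping table, the is_dup helper, and the result names are all hypothetical, and the column names are borrowed from the input_gene and ortholog_gene columns mentioned in the example log.

```r
# Hypothetical mapping table (not real orthogene output).
gene_map <- data.frame(
  input_gene    = c("A1", "A2", "B1", "C1", "C1"),
  ortholog_gene = c("A",  "A",  "B",  "C",  "D"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: flag every value that occurs more than once.
is_dup <- function(x) duplicated(x) | duplicated(x, fromLast = TRUE)

dup_input  <- is_dup(gene_map$input_gene)     # 1:many (C1 maps to C and D)
dup_output <- is_dup(gene_map$ortholog_gene)  # many:1 (A1 and A2 map to A)

# Assumed analogues of the documented options:
drop_both   <- gene_map[!dup_input & !dup_output, ]  # "drop_both_species"
drop_input  <- gene_map[!dup_input, ]                # "drop_input_species"
drop_output <- gene_map[!dup_output, ]               # "drop_output_species"
keep_both   <- gene_map                              # "keep_both_species"
```

In this toy table only the B1 to B row is a strict 1:1 mapping, so it is the only row that survives the strictest option.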
Background gene list.
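How use_intersect shapes the returned background can be sketched with base R set operations. The two gene vectors below are hypothetical stand-ins for the species1 and species2 genes after conversion to output_species; the sketch shows only the set logic, not how orthogene computes it internally.

```r
# Hypothetical human ortholog sets for species1 and species2
# (illustrative values, not real orthogene output).
species1_orthologs <- c("TP53", "BRCA1", "APOE", "SNCA")
species2_orthologs <- c("TP53", "APOE", "MAPT")

# use_intersect = TRUE: keep only genes with orthologs in both species.
bg_intersect <- intersect(species1_orthologs, species2_orthologs)

# use_intersect = FALSE: keep genes with orthologs in either species.
bg_union <- union(species1_orthologs, species2_orthologs)
```

The intersect is the more conservative choice, since every background gene is then testable in both input species.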
bg <- orthogene::create_background(species1 = "mouse",
                                   species2 = "rat",
                                   output_species = "human")
#> Generating gene background for mouse x rat ==> human
#> Gathering ortholog reports.
#> Retrieving all genes using: homologene.
#> Retrieving all organisms available in homologene.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Gene table with 19,129 rows retrieved.
#> Returning all 19,129 genes from human.
#> Retrieving all genes using: homologene.
#> Retrieving all organisms available in homologene.
#> Mapping species name: mouse
#> Common name mapping found for mouse
#> 1 organism identified from search: 10090
#> Gene table with 21,207 rows retrieved.
#> Returning all 21,207 genes from mouse.
#> --
#> --
#> Preparing gene_df.
#> data.frame format detected.
#> Extracting genes from Gene.Symbol.
#> 21,207 genes extracted.
#> Converting mouse ==> human orthologs using: homologene
#> Retrieving all organisms available in homologene.
#> Mapping species name: mouse
#> Common name mapping found for mouse
#> 1 organism identified from search: 10090
#> Retrieving all organisms available in homologene.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Checking for genes without orthologs in human.
#> Extracting genes from input_gene.
#> 17,355 genes extracted.
#> Extracting genes from ortholog_gene.
#> 17,355 genes extracted.
#> Checking for genes without 1:1 orthologs.
#> Dropping 131 genes that have multiple input_gene per ortholog_gene (many:1).
#> Dropping 498 genes that have multiple ortholog_gene per input_gene (1:many).
#> Filtering gene_df with gene_map
#> Adding input_gene col to gene_df.
#> Adding ortholog_gene col to gene_df.
#>
#> =========== REPORT SUMMARY ===========
#> Total genes dropped after convert_orthologs :
#> 4,725 / 21,207 (22%)
#> Total genes remaining after convert_orthologs :
#> 16,482 / 21,207 (78%)
#> --
#>
#> =========== REPORT SUMMARY ===========
#> 16,482 / 21,207 (77.72%) target_species genes remain after ortholog conversion.
#> 16,482 / 19,129 (86.16%) reference_species genes remain after ortholog conversion.
#> Gathering ortholog reports.
#> Retrieving all genes using: homologene.
#> Retrieving all organisms available in homologene.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Gene table with 19,129 rows retrieved.
#> Returning all 19,129 genes from human.
#> Retrieving all genes using: homologene.
#> Retrieving all organisms available in homologene.
#> Mapping species name: rat
#> Common name mapping found for rat
#> 1 organism identified from search: 10116
#> Gene table with 20,616 rows retrieved.
#> Returning all 20,616 genes from rat.
#> --
#> --
#> Preparing gene_df.
#> data.frame format detected.
#> Extracting genes from Gene.Symbol.
#> 20,616 genes extracted.
#> Converting rat ==> human orthologs using: homologene
#> Retrieving all organisms available in homologene.
#> Mapping species name: rat
#> Common name mapping found for rat
#> 1 organism identified from search: 10116
#> Retrieving all organisms available in homologene.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Checking for genes without orthologs in human.
#> Extracting genes from input_gene.
#> 16,989 genes extracted.
#> Extracting genes from ortholog_gene.
#> 16,989 genes extracted.
#> Checking for genes without 1:1 orthologs.
#> Dropping 122 genes that have multiple input_gene per ortholog_gene (many:1).
#> Dropping 607 genes that have multiple ortholog_gene per input_gene (1:many).
#> Filtering gene_df with gene_map
#> Adding input_gene col to gene_df.
#> Adding ortholog_gene col to gene_df.
#>
#> =========== REPORT SUMMARY ===========
#> Total genes dropped after convert_orthologs :
#> 4,813 / 20,616 (23%)
#> Total genes remaining after convert_orthologs :
#> 15,803 / 20,616 (77%)
#> --
#>
#> =========== REPORT SUMMARY ===========
#> 15,803 / 20,616 (76.65%) target_species genes remain after ortholog conversion.
#> 15,803 / 19,129 (82.61%) reference_species genes remain after ortholog conversion.
#> 15,450 intersect background genes used.
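As a rough sketch of the use case named in the description, random gene lists can be drawn from a background with base R. This assumes the returned background is a character vector of gene symbols; the bg values, n_target, and the number of draws below are hypothetical placeholders so that the snippet runs without the package or network access.

```r
set.seed(42)  # make the random draws reproducible

# Hypothetical stand-in for the background returned by create_background().
bg <- paste0("GENE", 1:1000)

# Hypothetical target list size, e.g. the size of a GWAS-derived gene set.
n_target <- 50

# Draw 100 random background lists of that size, each without replacement.
random_lists <- replicate(100, sample(bg, size = n_target), simplify = FALSE)
```

Each element of random_lists can then be scored with the same enrichment statistic as the real gene list to build an empirical null distribution.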