Identify the number of orthologous genes between two species.
report_orthologs(
target_species = "mouse",
reference_species = "human",
standardise_genes = FALSE,
method_all_genes = c("homologene", "gprofiler", "babelgene"),
method_convert_orthologs = method_all_genes,
drop_nonorths = TRUE,
non121_strategy = "drop_both_species",
round_digits = 2,
return_report = TRUE,
ref_genes = NULL,
mc.cores = 1,
verbose = TRUE,
...
)
target_species: Target species. Several species can be supplied at once (see mc.cores).
reference_species: Reference species.
standardise_genes: If TRUE and gene_output="columns", a new column "input_gene_standard" will be added to gene_df containing standardised HGNC symbols identified by gorth.
method_all_genes: Package to use in the all_genes step:
"gprofiler": Slower, but covers more species and genes.
"homologene": Faster, but covers fewer species and genes.
"babelgene": Faster, fewer species/genes; also provides consensus scores for each mapping from multiple data sources.
method_convert_orthologs: Package to use in the convert_orthologs step (defaults to method_all_genes):
"gprofiler": Slower, but covers more species and genes.
"homologene": Faster, but covers fewer species and genes.
"babelgene": Faster, fewer species/genes; also provides consensus scores for each mapping from multiple data sources.
drop_nonorths: Drop genes that don't have an ortholog in the output_species.
non121_strategy: How to handle genes that don't have 1:1 mappings between input_species and output_species. Options include:
"drop_both_species" or "dbs" or 1: Drop genes that have duplicate mappings in either the input_species or output_species (DEFAULT).
"drop_input_species" or "dis" or 2: Only drop genes that have duplicate mappings in the input_species.
"drop_output_species" or "dos" or 3: Only drop genes that have duplicate mappings in the output_species.
"keep_both_species" or "kbs" or 4: Keep all genes regardless of whether they have duplicate mappings in either species.
"keep_popular" or "kp" or 5: Return only the most "popular" interspecies ortholog mappings. This procedure tends to yield a greater number of returned genes but at the cost of many of them not being true biological 1:1 orthologs.
"sum", "mean", "median", "min" or "max": When gene_df is a matrix and gene_output="rownames", these options will aggregate many-to-one gene mappings (input_species-to-output_species) after dropping any duplicate genes in the output_species.
round_digits: Number of digits to round to when printing percentages.
return_report: If FALSE, return only the ortholog mapping between the two species. If TRUE, return both the ortholog mapping and a data.frame of report statistics.
ref_genes: A table of all genes for reference_species. If NULL (default), this is created via all_genes.
mc.cores: Number of cores to parallelize across target_species.
verbose: Print messages.
...: Arguments passed on to convert_orthologs:
gene_df: Data object containing the genes (see gene_input for options on how the genes can be stored within the object). Can be one of the following formats:
matrix: A sparse or dense matrix.
data.frame: A data.frame, data.table, or tibble.
list: A list or character vector.
Genes, transcripts, proteins, SNPs, or genomic ranges can be provided in any format (HGNC, Ensembl, RefSeq, UniProt, etc.) and will be automatically converted to gene symbols unless specified otherwise with the ... arguments.
Note: If you set method="homologene", you must either supply genes in gene symbol format (e.g. "Sox2") OR set standardise_genes=TRUE.
gene_input: Which aspect of gene_df to get gene names from:
"rownames": From row names of data.frame/matrix.
"colnames": From column names of data.frame/matrix.
<column name>: From a column in gene_df, e.g. "gene_names".
gene_output: How to return genes. Options include:
"rownames": As row names of gene_df.
"colnames": As column names of gene_df.
"columns": As new columns "input_gene", "ortholog_gene" (and "input_gene_standard" if standardise_genes=TRUE) in gene_df.
"dict": As a dictionary (named list) where the names are input_gene and the values are ortholog_gene.
"dict_rev": As a reversed dictionary (named list) where the names are ortholog_gene and the values are input_gene.
input_species: Name of the input species (e.g., "mouse", "fly"). Use map_species to return a full list of available species.
output_species: Name of the output species (e.g., "human", "chicken"). Use map_species to return a full list of available species.
agg_fun: Aggregation function passed to aggregate_mapped_genes. Set to NULL to skip the aggregation step (default).
mthreshold: Maximum number of ortholog names per gene to show. Passed to gorth. Only used when method="gprofiler" (DEFAULT: Inf).
method: R package to use for gene mapping:
"gprofiler": Slower but more species and genes.
"homologene": Faster but fewer species and genes.
"babelgene": Faster but fewer species and genes. Also gives consensus scores for each gene mapping based on several different data sources.
as_sparse: Convert gene_df to a sparse matrix. Only works if gene_df is one of the following classes:
matrix
Matrix
data.frame
data.table
tibble
If gene_df is a sparse matrix to begin with, it will be returned as a sparse matrix (so long as gene_output="rownames" or "colnames").
sort_rows: Sort gene_df rows alphanumerically.
gene_map: A data.frame that maps the current gene names to new gene names. This function's behaviour will adapt to different situations as follows:
gene_map=<data.frame>: When a data.frame containing the gene key:value columns (specified by input_col and output_col, respectively) is provided, this will be used to perform aggregation/expansion.
gene_map=NULL and input_species!=output_species: A gene_map is automatically generated by map_orthologs to perform inter-species gene aggregation/expansion.
gene_map=NULL and input_species==output_species: A gene_map is automatically generated by map_genes to perform within-species gene symbol standardization and aggregation/expansion.
as_DelayedArray: Convert aggregated matrix to DelayedArray.
input_col: Column name within gene_map with gene names matching the row names of gene_df.
output_col: Column name within gene_map with gene names that you wish to map the row names of gene_df onto.
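As a rough illustration of the default non121_strategy ("drop_both_species"), the base-R sketch below keeps only the rows of a toy mapping table in which both the input gene and the ortholog gene occur exactly once. The gene names and the helper keep_one_to_one() are hypothetical, and this is not orthogene's own implementation: the example log reports many:1 and 1:many drops as separate steps, so results may differ in edge cases.

```r
# Hypothetical toy ortholog map (not real genes):
#   a -> A, b -> B   are 1:1
#   c -> C1, C2      is 1:many
#   d, e -> X        is many:1
gene_map <- data.frame(
  input_gene    = c("a", "b", "c", "c", "d", "e"),
  ortholog_gene = c("A", "B", "C1", "C2", "X", "X"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose input gene and ortholog gene
# each appear exactly once, i.e. drop non-1:1 mappings on both sides.
keep_one_to_one <- function(map, input_col = "input_gene",
                            output_col = "ortholog_gene") {
  # For every row, count how many rows share its input / ortholog gene
  n_in  <- ave(seq_len(nrow(map)), map[[input_col]],  FUN = length)
  n_out <- ave(seq_len(nrow(map)), map[[output_col]], FUN = length)
  map[n_in == 1 & n_out == 1, , drop = FALSE]
}

one_to_one <- keep_one_to_one(gene_map)
print(one_to_one)
```

Only a and b survive: c is dropped as 1:many, and d and e are dropped as many:1.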
A list containing:
A table of inter-species gene mappings.
A list of aggregate orthology report statistics.
If more than one target_species is provided, the function returns a
table of aggregated report statistics concatenated across species.
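The mapping table can also be pictured as the named-list "dictionary" described under gene_output ("dict" and "dict_rev"). The base-R sketch below assumes a mapping table with the input_gene and ortholog_gene columns seen in the example log, filled with hypothetical mouse-to-human pairs; it only illustrates the data structure, since orthogene builds these dictionaries itself when asked.

```r
# Hypothetical 1:1 mapping table (mouse -> human symbols)
gene_map <- data.frame(
  input_gene    = c("Sox2", "Pax6", "Gapdh"),
  ortholog_gene = c("SOX2", "PAX6", "GAPDH"),
  stringsAsFactors = FALSE
)

# "dict": names are input genes, values are ortholog genes
dict <- setNames(as.list(gene_map$ortholog_gene), gene_map$input_gene)

# "dict_rev": names are ortholog genes, values are input genes
dict_rev <- setNames(as.list(gene_map$input_gene), gene_map$ortholog_gene)

dict[["Sox2"]]      # "SOX2"
dict_rev[["PAX6"]]  # "Pax6"
```

Reversing a dictionary is only lossless for 1:1 mappings; with many:1 mappings the reversed list would contain duplicate names, which is one reason to filter to 1:1 orthologs first.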
orth_fly <- report_orthologs(
target_species = "fly",
reference_species = "human"
)
#> Gathering ortholog reports.
#> Retrieving all genes using: homologene.
#> Retrieving all organisms available in homologene.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Gene table with 19,129 rows retrieved.
#> Returning all 19,129 genes from human.
#> Retrieving all genes using: homologene.
#> Retrieving all organisms available in homologene.
#> Mapping species name: fly
#> Common name mapping found for fly
#> 1 organism identified from search: 7227
#> Gene table with 8,438 rows retrieved.
#> Returning all 8,438 genes from fly.
#> --
#> --
#> Preparing gene_df.
#> data.frame format detected.
#> Extracting genes from Gene.Symbol.
#> 8,438 genes extracted.
#> Converting fly ==> human orthologs using: homologene
#> Retrieving all organisms available in homologene.
#> Mapping species name: fly
#> Common name mapping found for fly
#> 1 organism identified from search: 7227
#> Retrieving all organisms available in homologene.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Checking for genes without orthologs in human.
#> Extracting genes from input_gene.
#> 4,500 genes extracted.
#> Dropping 1 NAs of all kinds from input_gene.
#> Extracting genes from ortholog_gene.
#> 4,499 genes extracted.
#> Checking for genes without 1:1 orthologs.
#> Dropping 19 genes that have multiple input_gene per ortholog_gene (many:1).
#> Dropping 266 genes that have multiple ortholog_gene per input_gene (1:many).
#> Filtering gene_df with gene_map
#> Adding input_gene col to gene_df.
#> Adding ortholog_gene col to gene_df.
#>
#> =========== REPORT SUMMARY ===========
#> Total genes dropped after convert_orthologs :
#> 4,380 / 8,438 (52%)
#> Total genes remaining after convert_orthologs :
#> 4,058 / 8,438 (48%)
#> --
#>
#> =========== REPORT SUMMARY ===========
#> 4,058 / 8,438 (48.09%) target_species genes remain after ortholog conversion.
#> 4,058 / 19,129 (21.21%) reference_species genes remain after ortholog conversion.
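Each line of the final REPORT SUMMARY is a simple ratio of the genes remaining after conversion to that species' total gene count, shown to round_digits decimal places. The sketch below reproduces the two lines from the counts in the log; pct_line() is a hypothetical helper that only mimics the printed format, not orthogene's internal code.

```r
# Counts taken from the example log above
n_remaining  <- 4058   # genes left after convert_orthologs
n_target     <- 8438   # all fly genes (target_species)
n_reference  <- 19129  # all human genes (reference_species)
round_digits <- 2

# Hypothetical helper mimicking the "x / y (z%)" summary format
pct_line <- function(n, total, digits = 2) {
  sprintf("%s / %s (%s%%)",
          format(n, big.mark = ","),
          format(total, big.mark = ","),
          formatC(100 * n / total, format = "f", digits = digits))
}

target_line    <- pct_line(n_remaining, n_target, round_digits)
reference_line <- pct_line(n_remaining, n_reference, round_digits)
cat(target_line, "\n", reference_line, "\n", sep = "")
```

The reference-species percentage is much lower simply because the denominator is the full human gene table (19,129 genes) rather than the smaller fly table.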