Functions to map IDs across ontologies/databases.
map_colors(
  dat,
  columns = NULL,
  as = c("vector", "dict", "name", "function"),
  what = "nodes",
  preferred_palettes = NULL
)
map_genes_monarch(
  dat,
  gene_col,
  as_graph = methods::is(dat, "tbl_graph"),
  map_by_merge = FALSE,
  all.x = FALSE
)
map_medgen(dat, input_col, ...)
map_mondo(
  dat,
  input_col = "id",
  output_col = "mondo_id",
  to = "mondo",
  map_types = NULL,
  map_to = NULL,
  top_n = NULL,
  add_name = TRUE,
  add_definitions = TRUE,
  all.x = TRUE,
  allow.cartesian = FALSE,
  save_dir = cache_dir()
)
map_ontology_terms(
  ont,
  terms = NULL,
  to = c("name", "id", "short_id"),
  keep_order = TRUE,
  invert = FALSE,
  ignore_case = TRUE,
  ignore_char = c("-", "/", ",", "\\."),
  verbose = 1
)
map_upheno(
  pheno_map_method = c("upheno", "monarch"),
  gene_map_method = c("monarch"),
  filters = list(db1 = "HP", gene_taxon_label1 = "Homo sapiens"),
  terms = NULL,
  fill_scores = NULL,
  show_plot = TRUE,
  force_new = FALSE,
  save_dir = cache_dir()
)
map_variants(
  gr,
  build = c("GRCh37", "GRCh38"),
  upstream = 2000L,
  downstream = 200L,
  keep_chr = paste0("chr", c(seq_len(22), "X", "Y")),
  ignore.strand = TRUE
)
https://data.monarchinitiative.org/upheno2/current/qc/index
https://data.monarchinitiative.org/upheno2/current/upheno-release/all/index.html
data.table with genes.
Names of columns to map colour palettes to.
A character string specifying the format to convert to.
What should get activated? Possible values are nodes or edges.
Preferred palettes to use for each column.
Name of the gene column in dat.
Return the object as a tbl_graph.
Map orthologs by merging the node data such that the orthologous genes will appear as a new column (TRUE). Otherwise, the orthologs will be added as new nodes to the graph (FALSE).
logical; if TRUE, rows from x which have no matching row in y are included. These rows will have 'NA's in the columns that are usually filled with values from y. The default is FALSE so that only rows with data from both x and y are included in the output.
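The effect of all.x can be seen with a plain base R merge(). This is only a sketch on toy tables (the column names and values are made up for illustration), not the code path these functions use internally.

```r
# Toy tables; column names and values are illustrative only.
x <- data.frame(id = c("A", "B", "C"), score = c(1, 2, 3))
y <- data.frame(id = c("A", "C"), label = c("alpha", "gamma"))

# all.x = FALSE: only ids present in both x and y are kept.
inner <- merge(x, y, by = "id", all.x = FALSE)

# all.x = TRUE: every row of x is kept; unmatched rows get NA in y's columns.
left <- merge(x, y, by = "id", all.x = TRUE)

print(inner)
print(left)
```

With all.x = TRUE the row for id "B" survives with NA in the label column, which is the behaviour described above.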
Column name of input IDs.
Arguments passed on to map_mondo
Column name of output IDs.
Character vector of database(s) to map IDs to. When not "mondo", can supply multiple alternative databases to map to (e.g. c("OMIM","Orphanet","DECIPHER")).
Mapping types to include.
Mapping outputs to include (from Mondo IDs to another database's IDs).
Top number of mappings to return per top_by grouping. Set to NULL to skip this step.
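A rough base R sketch of keeping the top n rows per group follows. It assumes the rows are already sorted by preference within each group; the ranking criterion, the grouping column, and the toy values below are hypothetical and are not taken from map_mondo().

```r
# Hypothetical mappings table, already ordered by preference within each group.
mappings <- data.frame(
  group  = c("g1", "g1", "g1", "g2"),
  target = c("t1", "t2", "t3", "t4"),
  stringsAsFactors = FALSE
)
top_n <- 2

# Split by group, keep the first top_n rows of each piece, then recombine.
top <- do.call(rbind, lapply(split(mappings, mappings$group), head, n = top_n))
print(top)
```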
Logical; if TRUE, add Mondo name column.
Logical; if TRUE, add Mondo definition column.
See allow.cartesian in [.data.table.
Directory to save cached data.
An ontology of class ontology_DAG.
A subset of HPO IDs to include in the final dataset and plots (e.g. c("HP:0001508","HP:0001507")).
Return a named list of the same length and order as terms. If FALSE, return a named list of only the unique terms, sometimes in a different order.
Invert the keys/values of the dictionary, such that the key becomes the values (and vice versa).
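The ordering and inversion behaviour described above can be illustrated with a named character vector standing in for the dictionary. The IDs and names below are hypothetical placeholders rather than real ontology content, and the real function may return a list instead of a vector.

```r
# Hypothetical ID-to-name dictionary (not real ontology content).
dict <- c("EX:1" = "term one", "EX:2" = "term two")
terms <- c("EX:2", "EX:1", "EX:2")

# Order-preserving lookup: same length and order as `terms`, duplicates kept.
ordered <- dict[terms]

# Unique-only lookup: one entry per distinct term.
unique_only <- dict[unique(terms)]

# Inversion: the names become the values and the values become the names.
inverted <- setNames(names(dict), dict)

print(ordered)
print(unique_only)
print(inverted)
```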
Ignore case when mapping terms.
A character vector of characters to ignore when mapping terms.
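One way such case- and character-insensitive matching could work is to reduce every term to a normalised key before comparing. The helper below is a hypothetical sketch, not the package's implementation; in particular, dropping whitespace is an added assumption so that the three spellings from the examples collapse to the same key.

```r
# Hypothetical helper: build a comparison key for each term.
normalise_terms <- function(x,
                            ignore_case = TRUE,
                            ignore_char = c("-", "/", ",", "\\.")) {
  if (ignore_case) x <- tolower(x)
  # Assumption: whitespace is dropped along with the ignored characters.
  pattern <- paste(c(ignore_char, "[[:space:]]"), collapse = "|")
  gsub(pattern, "", x)
}

terms <- c("Focal motor seizure", "Focal MotoR SEIzure", "Focal-motor,/seizure.")
keys <- normalise_terms(terms)
print(keys)
```

All three inputs reduce to the same key, so they would be treated as the same term under these assumptions.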
Print messages.
Method to use for mapping phenotypes across ontologies.
"upheno": Use uPheno's phenotype-to-phenotype mappings. Contains fewer ontologies but with greater coverage of phenotypes.
"monarch": Use Monarch's phenotype-to-phenotype mappings. Contains more ontologies but with less coverage of phenotypes.
Method to use for mapping genes across species.
"monarch": Use Monarch's gene-to-gene mappings.
A named list, where each element in the list is the name of a column in the data, and the vector within each element represents the values to include in the final data.
Fill missing scores in the "equivalence_score" and "subclass_score" columns with this value. These columns represent the quality of mapping between two phenotypes on a scale from 0-1.
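As a sketch of how a filters list and a fill_scores value might be applied, the base R snippet below filters a toy table column by column and then replaces missing scores. The column names come from the descriptions above; the rows are invented, and the order of the two steps is an assumption rather than a statement about map_upheno().

```r
# Toy data; values are illustrative only.
dat <- data.frame(
  db1 = c("HP", "HP", "MP"),
  gene_taxon_label1 = c("Homo sapiens", "Mus musculus", "Homo sapiens"),
  equivalence_score = c(0.9, NA, 0.5),
  subclass_score = c(NA, 0.2, 0.7),
  stringsAsFactors = FALSE
)

# Each list element names a column; its values are the ones to keep.
filters <- list(db1 = "HP", gene_taxon_label1 = "Homo sapiens")
for (col in names(filters)) {
  dat <- dat[dat[[col]] %in% filters[[col]], , drop = FALSE]
}

# Replace missing scores with a fixed value on the 0-1 scale.
fill_scores <- 0
for (col in c("equivalence_score", "subclass_score")) {
  dat[[col]][is.na(dat[[col]])] <- fill_scores
}
print(dat)
```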
Show the plot.
Create a new file instead of using any cached files.
A GRanges object.
Genome build to use when mapping genomic coordinates.
Single integer values representing the number of base pairs upstream of the 5'-end and downstream of the 3'-end. Used in constructing PromoterVariants() and IntergenicVariants() objects only.
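To make the upstream and downstream values concrete, the sketch below computes a promoter window around a transcription start site in plain R. It assumes the convention used by GenomicRanges::promoters(), where the window runs from TSS - upstream to TSS + downstream - 1 on the plus strand and is mirrored on the minus strand. The helper name and the coordinates are hypothetical.

```r
# Hypothetical helper: promoter window for one transcript.
promoter_window <- function(start, end, strand,
                            upstream = 2000L, downstream = 200L) {
  if (strand == "-") {
    tss <- end    # on the minus strand the 5'-end is the range end
    c(start = tss - downstream + 1L, end = tss + upstream)
  } else {
    tss <- start  # on the plus strand the 5'-end is the range start
    c(start = tss - upstream, end = tss + downstream - 1L)
  }
}

plus  <- promoter_window(10000L, 20000L, "+")  # start 8000, end 10199
minus <- promoter_window(10000L, 20000L, "-")  # start 19801, end 22000
print(plus)
print(minus)
```

Both windows span upstream + downstream = 2200 bp with the default values.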
Which chromosomes to keep.
A logical indicating if strand should be ignored when performing overlaps.
Mapped data.
Mapped dat.
Character vector.
A list containing the data and plot.
map_colors(): map_
map_genes_monarch(): map_
Map Monarch genes
Map Monarch gene IDs to HGNC gene symbols, within or across species.
map_medgen(): map_
Map Medgen.
map_mondo(): map_
Map to/from mondo IDs
map_ontology_terms(): map_
Map ontology terms to an alternative name.
Harmonise a mixed vector of term names (e.g. "Focal motor seizure") and term IDs (e.g. c("HP:0000002","HP:0000003")).
map_upheno(): map_
Map phenotypes across uPheno
Map phenotypes across species within the Unified Phenotype Ontology (uPheno). First, gathers phenotype-phenotype mappings across ontologies. Next, gathers all phenotype-gene associations for each ontology, converts all genes to human HGNC orthologs, and computes the number of overlapping orthologs between all pairs of mapped phenotypes. Finally, plots the results as the proportion of intersecting genes between all pairs of phenotypes.
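The overlap step can be sketched for a single pair of phenotypes as follows. The gene symbols are placeholders, and the choice of denominator (the gene count of the first phenotype) is an assumption made for illustration; the description above does not specify which denominator map_upheno() uses.

```r
# Placeholder ortholog sets for two mapped phenotypes.
genes_a <- c("GENE1", "GENE2", "GENE3", "GENE4")
genes_b <- c("GENE3", "GENE4", "GENE5")

# Number of orthologs shared by the two phenotypes.
n_overlap <- length(intersect(genes_a, genes_b))

# Assumption: proportion is taken relative to the first phenotype's genes.
prop_overlap <- n_overlap / length(genes_a)
print(prop_overlap)
```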
map_variants(): map_
colors <- map_colors(dat=mtcars, columns=c("cyl","gear"), preferred_palettes="viridis")
#> Using palette: viridis
#> Using palette: okabe
dat <- example_dat("gene")
dt2 <- map_genes_monarch(dat=dat, gene_col="gene")
#> Filtering with `queries`.
#> Files found: 1
#> Importing 1 Monarch files.
#> - 1/1: gene_homology.all
#> Unique species with orthologs: 25
#> Filtered 'subject_db' : 6,503,843 / 6,906,371 rows dropped.
#> Unique orthologs: 314,817
#> 6 / 6 rows remain after gene orthology mapping.
dat <- example_dat(rm_types="gene")
dat2 <- map_mondo(dat = dat, map_to="hpo")
#> Loading required namespace: echogithub
#> Searching for all branches in: monarch-initiative/mondo
#> 1 matching branch(es) found:
#> - master
#> 1,433 files found in GitHub repo: monarch-initiative/mondo
#> 70 file(s) found matching query.
#> Mapping id --> mondo_id
#> Loading cached ontology: /github/home/.cache/R/KGExplorer/mondo.rds
#> 4 / 20 (20%) mondo_id missing.
#> 4 / 20 (20%) mondo_name missing.
#> 9 / 20 (45%) mondo_def missing.
ont <- get_ontology("hp")
#> Loading cached ontology: /github/home/.cache/R/KGExplorer/hp.rds
terms <- c("Focal motor seizure",
           "Focal MotoR SEIzure",
           "Focal-motor,/seizure.",
           "HP:0000002","HP:0000003")
term_names <- map_ontology_terms(ont=ont, terms=terms)
#> Translating ontology terms to names.
term_ids <- map_ontology_terms(ont=ont, terms=terms, to="id")
#> Translating ontology terms to ids.
if (FALSE) { # \dontrun{
  res <- map_upheno()
} # }
if (interactive()) {
  gr <- GenomicRanges::GRanges("1:100-10000")
  hits <- map_variants(gr)
}