Map functions — map_ • KGExplorer

Functions to map IDs across ontologies/databases.

map_colors(
  dat,
  columns = NULL,
  as = c("vector", "dict", "name", "function"),
  what = "nodes",
  preferred_palettes = NULL
)

map_genes_monarch(
  dat,
  gene_col,
  as_graph = methods::is(dat, "tbl_graph"),
  map_by_merge = FALSE,
  all.x = FALSE
)

map_medgen(dat, input_col, ...)

map_mondo(
  dat,
  input_col = "id",
  output_col = "mondo_id",
  to = "mondo",
  map_types = NULL,
  map_to = NULL,
  top_n = NULL,
  add_name = TRUE,
  add_definitions = TRUE,
  all.x = TRUE,
  allow.cartesian = FALSE,
  save_dir = cache_dir()
)

map_ontology_terms(
  ont,
  terms = NULL,
  to = c("name", "id", "short_id"),
  keep_order = TRUE,
  invert = FALSE,
  ignore_case = TRUE,
  ignore_char = c("-", "/", ",", "\\."),
  verbose = 1
)

map_upheno(
  pheno_map_method = c("upheno", "monarch"),
  gene_map_method = c("monarch"),
  filters = list(db1 = "HP", gene_taxon_label1 = "Homo sapiens"),
  terms = NULL,
  fill_scores = NULL,
  show_plot = TRUE,
  force_new = FALSE,
  save_dir = cache_dir()
)

map_variants(
  gr,
  build = c("GRCh37", "GRCh38"),
  upstream = 2000L,
  downstream = 200L,
  keep_chr = paste0("chr", c(seq_len(22), "X", "Y")),
  ignore.strand = TRUE
)

Source

https://data.monarchinitiative.org/upheno2/current/qc/index

https://data.monarchinitiative.org/upheno2/current/upheno-release/all/index.html

Arguments

dat

data.table with genes.

columns

Names of columns to map colour palettes to.

as

A character string specifying the format to convert to.

what

What should get activated? Possible values are nodes or edges.

preferred_palettes

Preferred palettes to use for each column.

gene_col

Name of the gene column in dat.

as_graph

Return the object as a tbl_graph.

map_by_merge

Map orthologs by merging the node data such that the orthologous genes will appear as a new column (TRUE). Otherwise, the orthologs will be added as new nodes to the graph (FALSE).

all.x

logical; if TRUE, rows from x which have no matching row in y are included. These rows will have 'NA's in the columns that are usually filled with values from y. The default is FALSE so that only rows with data from both x and y are included in the output.
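
The effect of all.x can be illustrated with base R's merge(), which uses the same argument name. This is a minimal sketch on made-up toy tables, not the package's internal code; the column names and values are hypothetical.

```r
# Toy tables (hypothetical values): `x` holds input genes, `y` holds mappings.
x <- data.frame(gene = c("A", "B", "C"))
y <- data.frame(gene = c("A", "B"), ortholog = c("a1", "b1"))

# all.x = FALSE: only rows with a match in both tables are kept.
inner <- merge(x, y, by = "gene", all.x = FALSE)

# all.x = TRUE: every row of `x` is kept; unmatched rows get NA in `ortholog`.
left <- merge(x, y, by = "gene", all.x = TRUE)
```

Here gene "C" has no mapping, so it is dropped from inner and kept with an NA ortholog in left.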

input_col

Column name of input IDs.

...

Arguments passed on to map_mondo

output_col

Column name of output IDs.

to

Character vector of database(s) to map IDs to. When not "mondo", can supply multiple alternative databases to map to (e.g. c("OMIM","Orphanet","DECIPHER")).

map_types

Mapping types to include.

map_to

Mapping outputs to include (from Mondo IDs to another database's IDs).

top_n

Top number of mappings to return per top_by grouping. Set to NULL to skip this step.

add_name

Logical; if TRUE, add a Mondo name column.

add_definitions

Logical; if TRUE, add a Mondo definition column.

allow.cartesian

See allow.cartesian in [.data.table.

save_dir

Directory to save cached data.

ont

An ontology of class ontology_DAG.

terms

A subset of HPO IDs to include in the final dataset and plots (e.g. c("HP:0001508","HP:0001507")).

keep_order

If TRUE, return a named list of the same length and order as terms. If FALSE, return a named list of only the unique terms, possibly in a different order.
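
The difference between the two keep_order settings can be sketched with a plain named vector. This assumes a lookup by name and uses hypothetical IDs and term names; the function's own return type and internals may differ.

```r
# Hypothetical name -> ID dictionary.
dict <- c("term one" = "ID:0001", "term two" = "ID:0002")

# Input with a repeated term, in a specific order.
terms <- c("term two", "term one", "term two")

# Analogous to keep_order = TRUE: one result per input, in input order.
ordered <- dict[terms]

# Analogous to keep_order = FALSE: one result per unique term.
deduped <- dict[unique(terms)]
```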

invert

Invert the keys/values of the dictionary, such that the keys become the values (and vice versa).
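
Inverting a dictionary amounts to swapping names and values. A minimal sketch with a named character vector and hypothetical IDs (not the package's implementation):

```r
# Hypothetical ID -> name dictionary.
dict <- c("ID:0001" = "term one", "ID:0002" = "term two")

# Swap keys and values: the names become the values and vice versa.
inverted <- stats::setNames(names(dict), dict)
```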

ignore_case

Ignore case when mapping terms.

ignore_char

A character vector of characters to ignore when mapping terms.
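
One way to picture ignore_case and ignore_char is as a normalisation step applied to terms before matching. The helper below, normalise(), is hypothetical and only sketches the idea. It assumes each ignore_char entry is treated as a regular expression (as the default "\\." suggests) and replaced with a space; the package may handle these characters differently.

```r
# Hypothetical helper: strip ignored characters and case before matching.
normalise <- function(x,
                      ignore_case = TRUE,
                      ignore_char = c("-", "/", ",", "\\.")) {
  # Replace each ignored pattern with a space (assumption, see lead-in).
  for (ch in ignore_char) {
    x <- gsub(ch, " ", x)
  }
  # Collapse repeated whitespace and trim the ends.
  x <- trimws(gsub("\\s+", " ", x))
  if (ignore_case) {
    x <- tolower(x)
  }
  x
}

terms <- c("Focal motor seizure",
           "Focal MotoR SEIzure",
           "Focal-motor,/seizure.")
normalised <- normalise(terms)
```

Under these assumptions all three spellings collapse to the same string, so they would match the same ontology term.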

verbose

Print messages.

pheno_map_method

Method to use for mapping phenotypes across ontologies.

"upheno": Use uPheno's phenotype-to-phenotype mappings. Contains fewer ontologies but with greater coverage of phenotypes.
"monarch": Use Monarch's phenotype-to-phenotype mappings. Contains more ontologies but with less coverage of phenotypes.

gene_map_method

Method to use for mapping genes across species.

"monarch": Use Monarch's gene-to-gene mappings.

filters

A named list, where each element in the list is the name of a column in the data, and the vector within each element represents the values to include in the final data.
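
A named-list filter of this shape can be applied column by column. The sketch below uses a toy data.frame and assumes the filters are combined with a logical AND, which this page does not state explicitly.

```r
# Toy data with the two columns used in the default `filters`.
dat <- data.frame(
  db1 = c("HP", "MP", "HP"),
  gene_taxon_label1 = c("Homo sapiens", "Mus musculus", "Mus musculus")
)

filters <- list(db1 = "HP", gene_taxon_label1 = "Homo sapiens")

# Keep rows whose value in each named column is among the allowed values.
keep <- rep(TRUE, nrow(dat))
for (col in names(filters)) {
  keep <- keep & dat[[col]] %in% filters[[col]]
}
filtered <- dat[keep, , drop = FALSE]
```

Only the first row satisfies both conditions, so filtered has a single row.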

fill_scores

Fill missing scores in the "equivalence_score" and "subclass_score" columns with this value. These columns represent the quality of mapping between two phenotypes on a scale from 0-1.
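
Filling missing scores can be sketched as replacing NA in the two score columns with a single value. The table and the fill value of 0 below are illustrative only.

```r
# Toy mapping table with missing scores.
scores <- data.frame(
  equivalence_score = c(0.9, NA),
  subclass_score = c(NA, 0.4)
)

fill_scores <- 0  # illustrative fill value

# Replace NA in each score column with the fill value.
for (col in c("equivalence_score", "subclass_score")) {
  scores[[col]][is.na(scores[[col]])] <- fill_scores
}
```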

show_plot

Show the plot.

force_new

Create a new file instead of using any cached files.

gr

A GRanges object.

build

Genome build to use when mapping genomic coordinates.

upstream, downstream

Single integer values representing the number of base pairs upstream of the 5'-end and downstream of the 3'-end. Used in constructing PromoterVariants() and IntergenicVariants() objects only.

keep_chr

Which chromosomes to keep.

ignore.strand

A logical indicating if strand should be ignored when performing overlaps.

Value

Mapped data.

Mapped dat

Character vector.

A list containing the data and plot.

Functions

map_colors(): map_
map_genes_monarch(): map_ Map Monarch genes

Map Monarch gene IDs to HGNC gene symbols, within or across species.
map_medgen(): map_ Map Medgen.
map_mondo(): map_ Map to/from mondo IDs
map_ontology_terms(): map_ Map ontology terms to an alternative name.

Harmonise a mixed vector of term names (e.g. "Focal motor seizure") and term IDs (e.g. c("HP:0000002","HP:0000003")).
map_upheno(): map_ Map phenotypes across uPheno

Map phenotypes across species within the Unified Phenotype Ontology (uPheno). First, gathers phenotype-phenotype mappings across ontologies. Next, gathers all phenotype-gene associations for each ontology, converts all genes to human HGNC orthologs, and computes the number of overlapping orthologs between all pairs of mapped phenotypes. Finally, plots the results as the proportion of intersecting genes between all pairs of phenotypes.
map_variants(): map_
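
The ortholog-overlap step described for map_upheno() can be pictured with two toy gene sets. The gene symbols are made up, and the choice of denominator (the number of genes for the first phenotype) is an assumption; this page does not specify how the proportion is normalised.

```r
# Hypothetical human-ortholog gene sets for two mapped phenotypes.
genes_by_pheno <- list(
  p1 = c("G1", "G2", "G3", "G4"),
  p2 = c("G2", "G3", "G5")
)

# Number of orthologs shared by the two phenotypes.
n_overlap <- length(intersect(genes_by_pheno$p1, genes_by_pheno$p2))

# Proportion of p1's genes that are shared with p2 (assumed denominator).
prop_overlap <- n_overlap / length(unique(genes_by_pheno$p1))
```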

Examples

colors <- map_colors(dat=mtcars, columns=c("cyl","gear"), preferred_palettes="viridis")
#> Using palette: viridis
#> Using palette: okabe
dat <- example_dat("gene")
dt2 <- map_genes_monarch(dat=dat, gene_col="gene")
#> Filtering with `queries`.
#> Files found: 1
#> Importing 1 Monarch files.
#> - 1/1: gene_homology.all
#> Unique species with orthologs: 25
#> Filtered 'subject_db' : 6,503,843 / 6,906,371 rows dropped.
#> Unique orthologs: 314,817
#> 6 / 6 rows remain after gene orthology mapping.
dat <- example_dat(rm_types="gene")
dat2 <- map_mondo(dat = dat, map_to="hpo")
#> Loading required namespace: echogithub
#> Searching for all branches in: monarch-initiative/mondo
#> 1 matching branch(es) found: 
#>  - master
#> 1,446 files found in GitHub repo: monarch-initiative/mondo
#> 70 file(s) found matching query.
#> Mapping id --> mondo_id
#> Loading cached ontology: /github/home/.cache/R/KGExplorer/mondo.rds
#> 4 / 20 (20%) mondo_id missing.
#> 4 / 20 (20%) mondo_name missing.
#> 9 / 20 (45%) mondo_def missing.
ont <- get_ontology("hp")
#> Loading cached ontology: /github/home/.cache/R/KGExplorer/hp.rds
terms <- c("Focal motor seizure",
            "Focal MotoR SEIzure",
            "Focal-motor,/seizure.",
            "HP:0000002","HP:0000003")
term_names <- map_ontology_terms(ont=ont, terms=terms)
#> Translating ontology terms to names.
term_ids <- map_ontology_terms(ont=ont, terms=terms, to="id")
#> Translating ontology terms to ids.
if (FALSE) { # \dontrun{
res <- map_upheno()
} # }
if(interactive()){
gr <- GenomicRanges::GRanges("1:100-10000")
hits <- map_variants(gr)
}