Searches the provided motif database for motifs similar to the input. A light wrapper around TOMTOM from the MEME Suite.
Usage
find_motifs(
streme_out,
motif_db,
out_dir = tempdir(),
meme_path = NULL,
workers = 1,
verbose = FALSE,
debug = FALSE,
...
)
Arguments
- streme_out
Output from denovo_motifs.
- motif_db
Path to a .meme format file to use as the reference database, or a list of universalmotif-class objects. (optional) Results from de-novo motif discovery are searched against this database to find similar motifs. If not provided, the JASPAR CORE database will be used. NOTE: p-value estimates are inaccurate when the database has fewer than 50 entries.
- out_dir
A character vector giving the output directory to save STREME results to. (default = tempdir())
- meme_path
Path to "meme/bin/" (default: NULL). If unset, the default search behavior described in check_meme_install() is used.
- workers
The number of workers to use for parallel processing.
- verbose
A logical indicating whether to print verbose messages while running the function. (default = FALSE)
- debug
A logical indicating whether to print debug/error messages in the HTML report. (default = FALSE)
- ...
Additional arguments to pass to TOMTOM. For more information, refer to the official MEME Suite documentation on TOMTOM.
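The note on database size under motif_db can be checked before running a search. The following is a minimal sketch in base R, assuming the database is a MEME-format text file in which each entry starts with a line beginning with MOTIF. The helper name count_meme_motifs is hypothetical and not part of MotifPeeker, and the two-motif file is a toy written only for illustration.

```r
# Hypothetical helper (not part of MotifPeeker): count entries in a
# MEME-format file by counting lines that start with "MOTIF".
count_meme_motifs <- function(path) {
  lines <- readLines(path, warn = FALSE)
  sum(grepl("^MOTIF[[:space:]]", lines))
}

# Toy two-motif database written to a temporary file for illustration.
db_path <- tempfile(fileext = ".meme")
writeLines(c(
  "MEME version 4",
  "",
  "ALPHABET= ACGT",
  "",
  "MOTIF toy1",
  "letter-probability matrix: alength= 4 w= 2",
  "0.7 0.1 0.1 0.1",
  "0.1 0.7 0.1 0.1",
  "",
  "MOTIF toy2",
  "letter-probability matrix: alength= 4 w= 2",
  "0.1 0.1 0.7 0.1",
  "0.1 0.1 0.1 0.7"
), db_path)

# Warn when the database is below the 50-entry threshold from the note.
n_motifs <- count_meme_motifs(db_path)
if (n_motifs < 50) {
  message("Database has ", n_motifs,
          " motifs; p-value estimates may be inaccurate.")
}
```

A check like this only applies when motif_db is a file path. For a list of universalmotif-class objects, length() of the list would give the same count.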
Value
A data.frame of match results. It contains a best_match_motif column of universalmotif objects with the matched PWM from the database, a series of best_match_* columns describing the TomTom results of the match, and a tomtom list column storing the ranked list of possible matches to each motif. If a universalmotif data.frame is used as input, these columns are appended to the data.frame. If no matches are returned, the tomtom and best_match_motif columns will be set to NA and a message indicating this will print.
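As a rough illustration of working with the returned columns, the sketch below flattens a nested result and keeps only confident matches. It assumes the result is nested as printed in the Examples output (one list per peak set, one single-row data.frame per discovered motif). The object is a mock built from a few values in that output, not a real find_motifs() return value, and the 0.05 q-value cutoff is an arbitrary choice.

```r
# Mock of the nested result printed in the Examples; values are copied from
# that output and only a few of the columns are kept.
res2 <- list(list(
  data.frame(name = "m01_GCCCTCTGSTGGC", best_match_name = "MA0139.2",
             best_match_altname = "CTCF", best_match_qval = 1.56e-09),
  data.frame(name = "m02_GGAAGTAA", best_match_name = "MA1931.1",
             best_match_altname = "ELK1::HOXA1", best_match_qval = 0.0725)
))

# Flatten one level of nesting, then row-bind the per-motif data.frames,
# keeping only plain (non-list) columns so that rbind() is safe.
cols <- c("name", "best_match_name", "best_match_altname", "best_match_qval")
flat <- do.call(rbind, lapply(unlist(res2, recursive = FALSE),
                              function(df) df[, cols, drop = FALSE]))

# Keep matches below an illustrative q-value cutoff (0.05 is an assumption).
# The is.na() guard drops rows whose q-value is missing.
confident <- flat[!is.na(flat$best_match_qval) &
                    flat$best_match_qval < 0.05, ]
print(confident)
```

With a real result, list columns such as best_match_motif and tomtom would need separate handling, since they hold universalmotif objects and ranked match lists rather than scalar values.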
Examples
data("CTCF_TIP_peaks", package = "MotifPeeker")
# \donttest{
if (memes::meme_is_installed()) {
    if (requireNamespace("BSgenome.Hsapiens.UCSC.hg38", quietly = TRUE)) {
        genome_build <-
            BSgenome.Hsapiens.UCSC.hg38::BSgenome.Hsapiens.UCSC.hg38
        res <- denovo_motifs(list(CTCF_TIP_peaks),
                             trim_seq_width = 100,
                             genome_build = genome_build,
                             denovo_motifs = 2,
                             filter_n = 6,
                             out_dir = tempdir())
        res2 <- find_motifs(res, motif_db = get_JASPARCORE(),
                            out_dir = tempdir())
        print(res2)
    }
}
#> Warning: p-values will be inaccurate if primary and control
#> Warning: p-values will be inaccurate if primary and control
#> [[1]]
#> [[1]][[1]]
#> motif name altname consensus alphabet strand
#> 1 <mot:m01_..> m01_GCCCTCTGSTGGC STREME-1 GCCMYCTGSTGGC DNA +-
#> icscore nsites pval type pseudocount bkg best_match_name
#> 1 15.22492 54 0.12 PCM 0 0.241, 0.... MA0139.2
#> best_match_altname best_db_name
#> 1 CTCF 22eb665f8684_JASPAR2024_CORE_non-redundant_pfms_meme.txt
#> best_match_offset best_match_pval best_match_eval best_match_qval
#> 1 2 3.33e-13 7.81e-10 1.56e-09
#> best_match_strand best_match_motif tomtom
#> 1 - <mot:MA01..> c("MA013....
#>
#> [Hidden empty columns: family, organism, bkgsites, qval, eval.]
#>
#> [[1]][[2]]
#> motif name altname consensus alphabet strand icscore nsites
#> 1 <mot:m02_..> m02_GGAAGTAA STREME-2 GGAAGTAA DNA +- 10.9251 21
#> pval type pseudocount bkg best_match_name best_match_altname
#> 1 0.5 PCM 0 0.241, 0.... MA1931.1 ELK1::HOXA1
#> best_db_name best_match_offset
#> 1 22eb665f8684_JASPAR2024_CORE_non-redundant_pfms_meme.txt 3
#> best_match_pval best_match_eval best_match_qval best_match_strand
#> 1 1.55e-05 0.0365 0.0725 +
#> best_match_motif tomtom
#> 1 <mot:MA19..> c("MA193....
#>
#> [Hidden empty columns: family, organism, bkgsites, qval, eval.]
#>
#>
# }