
Searches the provided motif database for motifs similar to the input. A light wrapper around TOMTOM from the MEME Suite.

Usage

find_motifs(
  streme_out,
  motif_db,
  out_dir = tempdir(),
  meme_path = NULL,
  workers = 1,
  verbose = FALSE,
  debug = FALSE,
  ...
)

Arguments

streme_out

Output from denovo_motifs().

motif_db

Path to a .meme format file to use as the reference database, or a list of universalmotif-class objects. (optional) Results from de novo motif discovery are searched against this database to find similar motifs. If not provided, the JASPAR CORE database is used. NOTE: p-value estimates are inaccurate when the database has fewer than 50 entries.

out_dir

A character string giving the output directory in which to save results. (default = tempdir())

meme_path

Path to "meme/bin/". (default = NULL) If unset, the default search behavior described in check_meme_install() is used.

workers

The number of workers to use for parallel processing. (default = 1)

verbose

A logical indicating whether to print verbose messages while running the function. (default = FALSE)

debug

A logical indicating whether to print debug/error messages in the HTML report. (default = FALSE)

...

Additional arguments to pass to TOMTOM. For more information, refer to the official MEME Suite documentation on TOMTOM.
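
The motif_db note above warns that p-value estimates are inaccurate when the database has fewer than 50 entries. A minimal base-R sketch of checking the size of a .meme file before searching is shown below. It assumes the file follows the MEME text format, in which each motif record starts with a line beginning with "MOTIF". The file contents, the variable names, and the toy two-motif database are hypothetical and serve only as illustration; this check is not part of find_motifs().

```r
# Write a tiny, hypothetical .meme file (two motifs) to a temporary location.
meme_file <- tempfile(fileext = ".meme")
writeLines(c(
    "MEME version 4",
    "",
    "ALPHABET= ACGT",
    "",
    "MOTIF toy_motif_1 first",
    "letter-probability matrix: alength= 4 w= 2 nsites= 20 E= 0",
    "0.25 0.25 0.25 0.25",
    "0.10 0.40 0.40 0.10",
    "",
    "MOTIF toy_motif_2 second",
    "letter-probability matrix: alength= 4 w= 2 nsites= 20 E= 0",
    "0.70 0.10 0.10 0.10",
    "0.25 0.25 0.25 0.25"
), meme_file)

# Count motif records: each one begins with a "MOTIF" line.
meme_lines <- readLines(meme_file)
n_motifs <- sum(grepl("^MOTIF\\s", meme_lines))

# Threshold taken from the motif_db note (fewer than 50 entries).
if (n_motifs < 50) {
    message("Database has ", n_motifs,
            " motifs; p-value estimates may be inaccurate.")
}
```

For a list of universalmotif-class objects, length() on the list would give the same count.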

Value

A data.frame of match results. It contains a best_match_motif column of universalmotif objects holding the matched PWM from the database, a series of best_match_* columns describing the TOMTOM results for the match, and a tomtom list column storing the ranked list of possible matches to each motif. If a universalmotif data.frame is used as input, these columns are appended to it. If no matches are returned, the tomtom and best_match_motif columns are set to NA and a message is printed.
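
As a rough illustration of working with the best_match_* columns, the sketch below filters matches by q-value using base R only. It uses a mock data.frame rather than a real find_motifs() result: the column names and values are copied from the example output on this page, while combining both motifs into a single data.frame and the 0.05 cutoff are assumptions made for illustration.

```r
# Mock of a few columns from a find_motifs() result; values are copied from
# the example output on this page. A real result carries more columns.
res_mock <- data.frame(
    name = c("m01_GCCCTCTGSTGGC", "m02_GGAAGTAA"),
    best_match_name = c("MA0139.2", "MA1931.1"),
    best_match_altname = c("CTCF", "ELK1::HOXA1"),
    best_match_qval = c(1.56e-09, 0.0725),
    stringsAsFactors = FALSE
)

# Hypothetical cutoff, chosen only for illustration.
qval_cutoff <- 0.05

# Keep only the motifs whose best database match passes the cutoff.
confident <- res_mock[res_mock$best_match_qval < qval_cutoff, ]
print(confident[, c("name", "best_match_altname", "best_match_qval")])
```

With these mock values only the CTCF match is retained, since the ELK1::HOXA1 match has a q-value of 0.0725.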

Examples

data("CTCF_TIP_peaks", package = "MotifPeeker")
    
# \donttest{
    if (requireNamespace("BSgenome.Hsapiens.UCSC.hg38", quietly = TRUE)) {
        genome_build <-
            BSgenome.Hsapiens.UCSC.hg38::BSgenome.Hsapiens.UCSC.hg38
   
        res <- denovo_motifs(list(CTCF_TIP_peaks),
                        trim_seq_width = 100,
                        genome_build = genome_build,
                        denovo_motifs = 2,
                        filter_n = 6,
                        out_dir = tempdir())
        res2 <- find_motifs(res, motif_db = get_JASPARCORE(),
                            out_dir = tempdir())
        print(res2)
    }
#> Warning: p-values will be inaccurate if primary and control
#> Warning: p-values will be inaccurate if primary and control
#> [[1]]
#> [[1]][[1]]
#>          motif              name  altname     consensus alphabet strand
#> 1 <mot:m01_..> m01_GCCCTCTGSTGGC STREME-1 GCCMYCTGSTGGC      DNA     +-
#>    icscore nsites pval type pseudocount          bkg best_match_name
#> 1 15.22492     54 0.12  PCM           0 0.241, 0....        MA0139.2
#>   best_match_altname                                            best_db_name
#> 1               CTCF 702433f7104_JASPAR2024_CORE_non-redundant_pfms_meme.txt
#>   best_match_offset best_match_pval best_match_eval best_match_qval
#> 1                 2        3.33e-13        7.81e-10        1.56e-09
#>   best_match_strand best_match_motif       tomtom
#> 1                 -     <mot:MA01..> c("MA013....
#> 
#> [Hidden empty columns: family, organism, bkgsites, qval, eval.]
#> 
#> [[1]][[2]]
#>          motif         name  altname consensus alphabet strand icscore nsites
#> 1 <mot:m02_..> m02_GGAAGTAA STREME-2  GGAAGTAA      DNA     +- 10.9251     21
#>   pval type pseudocount          bkg best_match_name best_match_altname
#> 1  0.5  PCM           0 0.241, 0....        MA1931.1        ELK1::HOXA1
#>                                              best_db_name best_match_offset
#> 1 702433f7104_JASPAR2024_CORE_non-redundant_pfms_meme.txt                 3
#>   best_match_pval best_match_eval best_match_qval best_match_strand
#> 1        1.55e-05          0.0365          0.0725                 +
#>   best_match_motif       tomtom
#> 1     <mot:MA19..> c("MA193....
#> 
#> [Hidden empty columns: family, organism, bkgsites, qval, eval.]
#> 
#> 
# }