
Use STREME from the MEME Suite to find de novo motifs in the provided sequences. To speed up the process, the sequences can optionally be trimmed to reduce the search space. The result can then optionally be filtered to remove motifs with a high number of nucleotide repeats.

Usage

denovo_motifs(
  seqs,
  trim_seq_width,
  genome_build,
  denovo_motifs = 3,
  minw = 8,
  maxw = 25,
  filter_n = 6,
  out_dir = tempdir(),
  meme_path = NULL,
  workers = 1,
  verbose = FALSE,
  debug = FALSE,
  ...
)

Arguments

seqs

A list of GRanges objects containing sequences to search for motifs.

trim_seq_width

An integer specifying the width of the sequence to extract around the summit (default = NULL). This sequence is used to search for de novo motifs. If not provided, the entire peak region will be used. This parameter is intended to reduce the search space and speed up motif discovery; therefore, a value less than the average peak width is recommended. Peaks are trimmed symmetrically around the summit while respecting the peak bounds.

genome_build

The genome build that the peak sequences should be derived from.

denovo_motifs

An integer specifying the number of de-novo motifs to discover. (default = 3) Note that higher values take longer to compute.

minw

An integer specifying the minimum width of the motif. (default = 8)

maxw

An integer specifying the maximum width of the motif. (default = 25)

filter_n

An integer specifying the number of consecutive nucleotide repeats a de-novo discovered motif must contain to be filtered out. (default = 6)

out_dir

A character string specifying the output directory in which to save the STREME results. (default = tempdir())

meme_path

Path to "meme/bin/". (default = NULL) If unset, the default search behavior described in check_meme_install() is used.

workers

An integer specifying the number of workers to use for parallel processing. (default = 1)

verbose

A logical indicating whether to print verbose messages while running the function. (default = FALSE)

debug

A logical indicating whether to print debug/error messages in the HTML report. (default = FALSE)

...

Additional arguments to pass to STREME. For more information, refer to the official MEME Suite documentation on STREME.
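
The trimming controlled by trim_seq_width can be pictured with a small base R sketch. This is only an illustration under stated assumptions, not the package's implementation: trim_around_summit() is a hypothetical helper, the peak coordinates are made up, and "respecting the peak bounds" is assumed to mean that each end of the window is clamped independently to the original peak.

```r
# Hypothetical helper (not part of MotifPeeker): take a window of `width`
# bases around each summit, then clamp it to the original peak bounds.
trim_around_summit <- function(start, end, summit, width) {
    win_start <- summit - width %/% 2
    win_end <- win_start + width - 1
    data.frame(
        start = pmax(start, win_start),  # never extend left of the peak
        end = pmin(end, win_end)         # never extend right of the peak
    )
}

# Made-up peaks: the first is wider than the window, the second is narrower.
peaks <- data.frame(
    start = c(100, 1000),
    end = c(500, 1040),
    summit = c(300, 1010)
)
trimmed <- trim_around_summit(peaks$start, peaks$end, peaks$summit,
                              width = 100)
print(trimmed)
```

Under these assumptions, a peak wider than the requested width is cut down to the window around its summit, while a peak already narrower than the window is left unchanged. This is why a trim_seq_width below the average peak width is what actually reduces the search space.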

Value

A list of universalmotif objects and associated metadata.

Examples

data("CTCF_TIP_peaks", package = "MotifPeeker")
if (requireNamespace("BSgenome.Hsapiens.UCSC.hg38", quietly = TRUE)) {
    genome_build <- BSgenome.Hsapiens.UCSC.hg38::BSgenome.Hsapiens.UCSC.hg38
   
    res <- denovo_motifs(list(CTCF_TIP_peaks),
                        trim_seq_width = 100,
                        genome_build = genome_build,
                        denovo_motifs = 2,
                        filter_n = 6,
                        out_dir = tempdir())
    print(res[[1]]$consensus)
}
#> Warning: p-values will be inaccurate if primary and control
#> Warning: p-values will be inaccurate if primary and control
#> [1] "GCCMYCTGSTGGC" "GGAAGTAA"
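
The effect of filter_n can be approximated on consensus strings such as the ones printed above. The following is a rough base R sketch, not the package's filtering code: has_repeat_run() is a hypothetical helper, the third consensus string is invented for illustration, and the threshold is assumed to mean "at least filter_n identical consecutive letters".

```r
# Hypothetical helper (not part of MotifPeeker): TRUE when a consensus
# string contains a run of at least `n` identical consecutive letters.
has_repeat_run <- function(consensus, n) {
    # One captured letter followed by (n - 1) or more copies of itself.
    pattern <- sprintf("([A-Z])\\1{%d,}", n - 1)
    grepl(pattern, consensus, perl = TRUE)
}

# The first two strings come from the example output above; the third is
# made up to show a motif that would be dropped.
consensus <- c("GCCMYCTGSTGGC", "GGAAGTAA", "TTTTTTTTAC")
keep <- !has_repeat_run(consensus, n = 6)
print(consensus[keep])
```

With n = 6, the two discovered motifs are kept because neither contains six identical letters in a row, and the invented poly-T string is removed.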