Calculate the distance between peak summits and motifs

summit_to_motif() calculates the distance between each motif and its nearest peak summit. runFimo() from the memes package is used to recover the location of each motif.

Usage

summit_to_motif(
  peak_input,
  motif,
  fp_rate = 0.05,
  genome_build,
  out_dir = tempdir(),
  meme_path = NULL,
  verbose = FALSE,
  ...
)

Arguments

peak_input

Either a path to a narrowPeak file or a GRanges peak object generated by read_peak_file().

motif

An object of class universalmotif.

fp_rate

The desired false-positive rate. A p-value threshold will be selected based on this value. The default false-positive rate is 0.05.

genome_build

The genome build that the peak sequences should be derived from.

out_dir

Location to save the 0-order background file. By default, the background file will be written to a temporary directory.

meme_path

Path to "meme/bin/" (default: NULL). If unset, the default search behavior described in check_meme_install() is used.

verbose

A logical indicating whether to print verbose messages while running the function. (default = FALSE)

...

Arguments passed on to memes::runFimo

parse_genomic_coord: logical(1) whether to parse genomic position from fasta headers. Fasta headers must be UCSC format positions (i.e. "chr:start-end"), but 1-based (GRanges format). If names of fasta entries are genomic coordinates and parse_genomic_coord == TRUE, results will contain genomic coordinates of motif matches; otherwise FIMO will return relative coordinates (i.e. positions from 1 to the length of the fasta entry).
skip_matched_sequence: logical(1) whether to omit the DNA sequence of each match from the results. Default: FALSE. Note: jobs will complete faster if set to TRUE. add_sequence() can be used to look up the sequence after data import if parse_genomic_coord is TRUE, so including the sequence at this stage is not strictly needed.
max_strand: if match is found on both strands, only report strand with best match (default: TRUE).
text: logical(1) (default: TRUE). No output files will be created on the filesystem. The results are unsorted and no q-values are computed. This setting allows fast searches on very large inputs. When set to FALSE, FIMO will discard 50% of the lower significance matches if >100,000 matches are detected. text = FALSE will also incur a performance penalty because it must first write a file to disk, then read it into memory. For these reasons, I suggest keeping text = TRUE.
silent: logical(1) whether to suppress stdout/stderr printing to console (default: TRUE). If the command is failing or giving unexpected output, setting silent = FALSE can aid troubleshooting.

Value

A list containing an expanded GRanges peak object with metadata columns relating to motif positions along with a vector of summit-to-motif distances for each valid peak.

Details

To calculate the p-value threshold for a desired false-positive rate, we use the approximate formula: $$p \approx \frac{fp\_rate}{2 \times \text{average peak width}}$$ (Derived from the FIMO documentation.)
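The formula above can be illustrated with a minimal base R sketch. The helper name fp_rate_to_pvalue() and the peak widths are hypothetical and chosen only for illustration; they are not part of the package API, and the function's internal implementation may differ. The factor of 2 is assumed here to reflect scanning both strands of each peak sequence.

```r
# Hypothetical helper (not part of the package API): converts a desired
# false-positive rate into an approximate p-value threshold using
# p ~ fp_rate / (2 * average peak width).
fp_rate_to_pvalue <- function(fp_rate, peak_widths) {
  stopifnot(
    is.numeric(fp_rate), length(fp_rate) == 1, fp_rate > 0, fp_rate < 1,
    is.numeric(peak_widths), length(peak_widths) > 0, all(peak_widths > 0)
  )
  fp_rate / (2 * mean(peak_widths))
}

# Made-up peak widths in base pairs, for illustration only.
peak_widths <- c(200, 300, 400)

# Threshold for the default false-positive rate of 0.05.
p_threshold <- fp_rate_to_pvalue(0.05, peak_widths)
print(p_threshold)
```

With these made-up widths the average peak width is 300 bp, so the threshold is 0.05 / 600. Wider peaks give a stricter (smaller) threshold for the same fp_rate, because each peak offers more positions at which a spurious match could occur.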

Examples