Skip to contents

summit_to_motif() calculates the distance between each motif and its nearest peak summit. runFimo from the memes package is used to recover the locations of each motif.

Usage

summit_to_motif(
  peak_input,
  motif,
  fp_rate = 0.05,
  genome_build,
  out_dir = tempdir(),
  meme_path = NULL,
  verbose = FALSE,
  ...
)

Arguments

peak_input

Either a path to the narrowPeak file or a GRanges peak object generated by read_peak_file().

motif

An object of class universalmotif.

fp_rate

The desired false-positive rate. A p-value threshold will be selected based on this value. The default false-positive rate is 0.05.

genome_build

The genome build that the peak sequences should be derived from.

out_dir

Location to save the 0-order background file. By default, the background file will be written to a temporary directory.

meme_path

path to "meme/bin/" (default: NULL). Will use default search behavior as described in check_meme_install() if unset.

verbose

A logical indicating whether to print verbose messages while running the function. (default = FALSE)

...

Arguments passed on to memes::runFimo

parse_genomic_coord

logical(1) whether to parse genomic position from fasta headers. Fasta headers must be UCSC format positions (ie "chr:start-end"), but base 1 indexed (GRanges format). If names of fasta entries are genomic coordinates and parse_genomic_coord == TRUE, results will contain genomic coordinates of motif matches, otherwise FIMO will return relative coordinates (i.e. positions from 1 to length of the fasta entry).

skip_matched_sequence

logical(1) whether or not to include the DNA sequence of the match. Default: FALSE. Note: jobs will complete faster if set to TRUE. add_sequence() can be used to lookup the sequence after data import if parse_genomic_coord is TRUE, so setting this flag is not strictly needed.

max_strand

if match is found on both strands, only report strand with best match (default: TRUE).

text

logical(1) (default: TRUE). No output files will be created on the filesystem. The results are unsorted and no q-values are computed. This setting allows fast searches on very large inputs. When set to FALSE FIMO will discard 50% of the lower significance matches if >100,000 matches are detected. text = FALSE will also incur a performance penalty because it must first read a file to disk, then read it into memory. For these reasons, I suggest keeping text = TRUE.

silent

logical(1) whether to suppress stdout/stderr printing to console (default: TRUE). If the command is failing or giving unexpected output, setting silent = FALSE can aid troubleshooting.

Value

A list containing an expanded GRanges peak object with metadata columns relating to motif positions along with a vector of summit-to-motif distances for each valid peak.

Details

To calculate the p-value threshold for a desired false-positive rate, we use the approximate formula: $$p \approx \frac{fp\_rate}{2 \times \text{average peak width}}$$ (Dervied from FIMO documentation)

See also

Examples

data("CTCF_TIP_peaks", package = "MotifPeeker")
data("motif_MA1102.3", package = "MotifPeeker")

res <- summit_to_motif(
    peak_input = CTCF_TIP_peaks,
    motif = motif_MA1102.3,
    fp_rate = 5e-02,
    genome_build = BSgenome.Hsapiens.UCSC.hg38::BSgenome.Hsapiens.UCSC.hg38
)
print(res)
#> $peak_set
#> GRanges object with 30 ranges and 10 metadata columns:
#>                   seqnames            ranges strand |            name     score
#>                      <Rle>         <IRanges>  <Rle> |     <character> <numeric>
#>    TIPseq_peak_15    chr10 67788408-67788724      * |  TIPseq_peak_15       129
#>    TIPseq_peak_39    chr10 68988538-68988838      * |  TIPseq_peak_39       111
#>    TIPseq_peak_42    chr10 69179798-69180156      * |  TIPseq_peak_42        75
#>    TIPseq_peak_47    chr10 69408915-69409153      * |  TIPseq_peak_47       146
#>    TIPseq_peak_48    chr10 69413328-69413612      * |  TIPseq_peak_48       102
#>               ...      ...               ...    ... .             ...       ...
#>   TIPseq_peak_157    chr10 73529363-73529638      * | TIPseq_peak_157       106
#>   TIPseq_peak_162    chr10 73849014-73849393      * | TIPseq_peak_162       137
#>   TIPseq_peak_163    chr10 73879729-73880038      * | TIPseq_peak_163       248
#>   TIPseq_peak_167    chr10 73957451-73957745      * | TIPseq_peak_167        73
#>   TIPseq_peak_170    chr10 73997853-73998166      * | TIPseq_peak_170        91
#>                   signalValue    pValue    qValue      peak    summit
#>                     <numeric> <numeric> <numeric> <integer> <integer>
#>    TIPseq_peak_15     9.06865   17.2614  12.96200       153  67788561
#>    TIPseq_peak_39     9.10857   15.2180  11.14710       168  68988706
#>    TIPseq_peak_42     7.28686   11.2099   7.52416       121  69179919
#>    TIPseq_peak_47     8.42133   19.1676  14.67840       110  69409025
#>    TIPseq_peak_48     7.00187   14.2174  10.25130       141  69413469
#>               ...         ...       ...       ...       ...       ...
#>   TIPseq_peak_157     8.79456   14.6750  10.66540       105  73529468
#>   TIPseq_peak_162     9.57946   18.1045  13.72430       240  73849254
#>   TIPseq_peak_163    12.53940   30.6089  24.84330       140  73879869
#>   TIPseq_peak_167     6.58212   11.0334   7.35935       125  73957576
#>   TIPseq_peak_170     7.78776   12.9930   9.14808       138  73997991
#>                   motif_start motif_end distance_to_summit
#>                     <integer> <integer>          <numeric>
#>    TIPseq_peak_15    67788624  67788631              -66.5
#>    TIPseq_peak_39    68988582  68988589              120.5
#>    TIPseq_peak_42    69179983  69179990              -67.5
#>    TIPseq_peak_47    69408938  69408945               83.5
#>    TIPseq_peak_48    69413548  69413555              -82.5
#>               ...         ...       ...                ...
#>   TIPseq_peak_157    73529430  73529437               34.5
#>   TIPseq_peak_162    73849208  73849215               42.5
#>   TIPseq_peak_163    73879829  73879836               36.5
#>   TIPseq_peak_167    73957554  73957561               18.5
#>   TIPseq_peak_170    73998063  73998070              -75.5
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
#> 
#> $distance_to_summit
#>  [1]  -66.5  120.5  -67.5   83.5  -82.5  -48.5   25.5   50.5  -59.5  -19.5
#> [11]  -70.5 -110.5 -316.5  387.5  -45.5  -64.5   91.5  -60.5   48.5  -26.5
#> [21]   31.5 -107.5 -189.5 -146.5 -114.5   34.5   42.5   36.5   18.5  -75.5
#>