Calculate the distance between peak summits and motifs
Source:R/summit_to_motif.R
summit_to_motif.Rd
summit_to_motif()
calculates the distance between each motif and its
nearest peak summit. runFimo
from the memes
package is used to
recover the locations of each motif.
Usage
summit_to_motif(
peak_input,
motif,
fp_rate = 0.05,
genome_build,
out_dir = tempdir(),
meme_path = NULL,
verbose = FALSE,
...
)
Arguments
- peak_input
Either a path to the narrowPeak file or a GRanges peak object generated by
read_peak_file()
.- motif
An object of class
universalmotif
.- fp_rate
The desired false-positive rate. A p-value threshold will be selected based on this value. The default false-positive rate is 0.05.
- genome_build
The genome build that the peak sequences should be derived from.
- out_dir
Location to save the 0-order background file. By default, the background file will be written to a temporary directory.
- meme_path
path to "meme/bin/" (default:
NULL
). Will use default search behavior as described incheck_meme_install()
if unset.- verbose
A logical indicating whether to print verbose messages while running the function. (default = FALSE)
- ...
Arguments passed on to
memes::runFimo
parse_genomic_coord
logical(1)
whether to parse genomic position from fasta headers. Fasta headers must be UCSC format positions (ie "chr:start-end"), but base 1 indexed (GRanges format). If names of fasta entries are genomic coordinates and parse_genomic_coord == TRUE, results will contain genomic coordinates of motif matches, otherwise FIMO will return relative coordinates (i.e. positions from 1 to length of the fasta entry).skip_matched_sequence
logical(1)
whether or not to include the DNA sequence of the match. Default:FALSE
. Note: jobs will complete faster if set toTRUE
.add_sequence()
can be used to lookup the sequence after data import ifparse_genomic_coord
isTRUE
, so setting this flag is not strictly needed.max_strand
if match is found on both strands, only report strand with best match (default: TRUE).
text
logical(1)
(default:TRUE
). No output files will be created on the filesystem. The results are unsorted and no q-values are computed. This setting allows fast searches on very large inputs. When set toFALSE
FIMO will discard 50% of the lower significance matches if >100,000 matches are detected.text = FALSE
will also incur a performance penalty because it must first read a file to disk, then read it into memory. For these reasons, I suggest keepingtext = TRUE
.silent
logical(1)
whether to suppress stdout/stderr printing to console (default: TRUE). If the command is failing or giving unexpected output, settingsilent = FALSE
can aid troubleshooting.
Value
A list containing an expanded GRanges peak object with metadata columns relating to motif positions along with a vector of summit-to-motif distances for each valid peak.
Details
To calculate the p-value threshold for a desired false-positive rate, we use the approximate formula: $$p \approx \frac{fp\_rate}{2 \times \text{average peak width}}$$ (Dervied from FIMO documentation)
Examples
if (memes::meme_is_installed()) {
data("CTCF_TIP_peaks", package = "MotifPeeker")
data("motif_MA1102.3", package = "MotifPeeker")
res <- summit_to_motif(
peak_input = CTCF_TIP_peaks,
motif = motif_MA1102.3,
fp_rate = 5e-02,
genome_build = BSgenome.Hsapiens.UCSC.hg38::BSgenome.Hsapiens.UCSC.hg38
)
print(res)
}
#> $peak_set
#> GRanges object with 30 ranges and 10 metadata columns:
#> seqnames ranges strand | name score
#> <Rle> <IRanges> <Rle> | <character> <numeric>
#> TIPseq_peak_15 chr10 67788408-67788724 * | TIPseq_peak_15 129
#> TIPseq_peak_39 chr10 68988538-68988838 * | TIPseq_peak_39 111
#> TIPseq_peak_42 chr10 69179798-69180156 * | TIPseq_peak_42 75
#> TIPseq_peak_47 chr10 69408915-69409153 * | TIPseq_peak_47 146
#> TIPseq_peak_48 chr10 69413328-69413612 * | TIPseq_peak_48 102
#> ... ... ... ... . ... ...
#> TIPseq_peak_157 chr10 73529363-73529638 * | TIPseq_peak_157 106
#> TIPseq_peak_162 chr10 73849014-73849393 * | TIPseq_peak_162 137
#> TIPseq_peak_163 chr10 73879729-73880038 * | TIPseq_peak_163 248
#> TIPseq_peak_167 chr10 73957451-73957745 * | TIPseq_peak_167 73
#> TIPseq_peak_170 chr10 73997853-73998166 * | TIPseq_peak_170 91
#> signalValue pValue qValue peak summit
#> <numeric> <numeric> <numeric> <integer> <integer>
#> TIPseq_peak_15 9.06865 17.2614 12.96200 153 67788561
#> TIPseq_peak_39 9.10857 15.2180 11.14710 168 68988706
#> TIPseq_peak_42 7.28686 11.2099 7.52416 121 69179919
#> TIPseq_peak_47 8.42133 19.1676 14.67840 110 69409025
#> TIPseq_peak_48 7.00187 14.2174 10.25130 141 69413469
#> ... ... ... ... ... ...
#> TIPseq_peak_157 8.79456 14.6750 10.66540 105 73529468
#> TIPseq_peak_162 9.57946 18.1045 13.72430 240 73849254
#> TIPseq_peak_163 12.53940 30.6089 24.84330 140 73879869
#> TIPseq_peak_167 6.58212 11.0334 7.35935 125 73957576
#> TIPseq_peak_170 7.78776 12.9930 9.14808 138 73997991
#> motif_start motif_end distance_to_summit
#> <integer> <integer> <numeric>
#> TIPseq_peak_15 67788624 67788631 -66.5
#> TIPseq_peak_39 68988582 68988589 120.5
#> TIPseq_peak_42 69179983 69179990 -67.5
#> TIPseq_peak_47 69408938 69408945 83.5
#> TIPseq_peak_48 69413548 69413555 -82.5
#> ... ... ... ...
#> TIPseq_peak_157 73529430 73529437 34.5
#> TIPseq_peak_162 73849208 73849215 42.5
#> TIPseq_peak_163 73879829 73879836 36.5
#> TIPseq_peak_167 73957554 73957561 18.5
#> TIPseq_peak_170 73998063 73998070 -75.5
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
#>
#> $distance_to_summit
#> [1] -66.5 120.5 -67.5 83.5 -82.5 -48.5 25.5 50.5 -59.5 -19.5
#> [11] -70.5 -110.5 -316.5 387.5 -45.5 -64.5 91.5 -60.5 48.5 -26.5
#> [21] 31.5 -107.5 -189.5 -146.5 -114.5 34.5 42.5 36.5 18.5 -75.5
#>