Standardise a list of peak files by rebinning them into fixd-width tiles across the genome.
rebin_peaks(
peakfiles,
genome_build,
intensity_cols = c("total_signal", "qValue", "Peak Score", "score"),
bin_size = 5000,
keep_chr = NULL,
sep = c(":", "-"),
drop_empty_chr = FALSE,
as_sparse = TRUE,
workers = check_workers(),
verbose = TRUE,
...
)
A list of peak files as GRanges object and/or as paths to
BED files. If paths are provided, EpiCompare imports the file as GRanges
object. EpiCompare also accepts a list containing a mix of GRanges objects
and paths.Files must be listed and named using list()
.
E.g. list("name1"=file1, "name2"=file2)
. If no names are specified,
default file names will be assigned.
The build of **all** peak and reference files to calculate the correlation matrix on. If all peak and reference files are not of the same build use liftover_grlist to convert them all before running. Genome build should be one of hg19, hg38, mm9, mm10.
Depending on which columns are present, this value will be used to get quantiles and ultimately calculate the correlations:
"total_signal" : Used by the peak calling software SEACR. NOTE: Another SEACR column (e.g. "max_signal") can be used together or instead of "total_signal".
"qValue"Used by the peak calling software MACS2/3. Should contain the negative log of the p-values after multiple testing correction.
"Peak Score" : Used by the peak calling software HOMER.
Default of 100. Base-pair size of the bins created to measure correlation. Use smaller value for higher resolution but longer run time and larger memory usage.
Which chromosomes to keep.
Separator to be used after chromosome name (first item) and between start/end genomic coordinates (second item).
Drop chromosomes that are not present in any of the
peakfiles
(default: FALSE
).
Return the rebinned peaks as a sparse matrix
(default: TRUE
),
which is more efficiently stored than a dense matrix (FALSE
).
Number of threads to parallelize across.
Print messages.
Arguments passed on to bpplapply
apply_fun
Iterator function to use.
register_now
Register the cores now with
register (TRUE
),
or simply return the BPPARAM object
(default: FALSE
).
use_snowparam
Whether to use
SnowParam (default: TRUE
) or
MulticoreParam (FALSE
)
when parallelising across multiple workers
.
progressbar
logical(1)
Enable progress bar (based on plyr:::progress_text).
X
Any object for which methods length
, [
, and
[[
are implemented.
FUN
The function
to be applied to each element of X
.
Binned peaks matrix
data("CnR_H3K27ac")
data("CnT_H3K27ac")
peakfiles <- list(CnR_H3K27ac=CnR_H3K27ac, CnT_H3K27ac=CnT_H3K27ac)
#increasing bin_size for speed
peakfiles_rebinned <- rebin_peaks(peakfiles = peakfiles,
genome_build = "hg19",
bin_size = 5000,
workers = 1)
#> Standardising peak files in 647,114 bins of 5,000 bp.
#> Merging data into matrix.
#> Converting to sparse matrix.
#> Binned matrix size: 647,114 x 2
#> Matrix sparsity: 0.9968