Standardise a list of peak files by rebinning them into fixd-width tiles across the genome.

rebin_peaks(
  peakfiles,
  genome_build,
  intensity_cols = c("total_signal", "qValue", "Peak Score", "score"),
  bin_size = 5000,
  keep_chr = NULL,
  sep = c(":", "-"),
  drop_empty_chr = FALSE,
  as_sparse = TRUE,
  workers = check_workers(),
  verbose = TRUE,
  ...
)

Arguments

peakfiles

A list of peak files as GRanges object and/or as paths to BED files. If paths are provided, EpiCompare imports the file as GRanges object. EpiCompare also accepts a list containing a mix of GRanges objects and paths.Files must be listed and named using list(). E.g. list("name1"=file1, "name2"=file2). If no names are specified, default file names will be assigned.

genome_build

The build of **all** peak and reference files to calculate the correlation matrix on. If all peak and reference files are not of the same build use liftover_grlist to convert them all before running. Genome build should be one of hg19, hg38, mm9, mm10.

intensity_cols

Depending on which columns are present, this value will be used to get quantiles and ultimately calculate the correlations:

  • "total_signal" : Used by the peak calling software SEACR. NOTE: Another SEACR column (e.g. "max_signal") can be used together or instead of "total_signal".

  • "qValue"Used by the peak calling software MACS2/3. Should contain the negative log of the p-values after multiple testing correction.

  • "Peak Score" : Used by the peak calling software HOMER.

bin_size

Default of 100. Base-pair size of the bins created to measure correlation. Use smaller value for higher resolution but longer run time and larger memory usage.

keep_chr

Which chromosomes to keep.

sep

Separator to be used after chromosome name (first item) and between start/end genomic coordinates (second item).

drop_empty_chr

Drop chromosomes that are not present in any of the peakfiles (default: FALSE).

as_sparse

Return the rebinned peaks as a sparse matrix (default: TRUE), which is more efficiently stored than a dense matrix (FALSE).

workers

Number of threads to parallelize across.

verbose

Print messages.

...

Arguments passed on to bpplapply

apply_fun

Iterator function to use.

register_now

Register the cores now with register (TRUE), or simply return the BPPARAM object (default: FALSE).

use_snowparam

Whether to use SnowParam (default: TRUE) or MulticoreParam (FALSE) when parallelising across multiple workers.

progressbar

logical(1) Enable progress bar (based on plyr:::progress_text).

X

Any object for which methods length, [, and [[ are implemented.

FUN

The function to be applied to each element of X.

Value

Binned peaks matrix

Examples

data("CnR_H3K27ac") 
data("CnT_H3K27ac")
peakfiles <- list(CnR_H3K27ac=CnR_H3K27ac, CnT_H3K27ac=CnT_H3K27ac) 

#increasing bin_size for speed
peakfiles_rebinned <- rebin_peaks(peakfiles = peakfiles,
                                  genome_build = "hg19",
                                  bin_size = 5000,
                                  workers = 1)
#> Standardising peak files in 647,114 bins of 5,000 bp.
#> Merging data into matrix.
#> Converting to sparse matrix.
#> Binned matrix size: 647,114 x 2
#> Matrix sparsity: 0.9968