Compute correlation matrix on all peak files.
compute_corr(
peakfiles,
reference = NULL,
genome_build,
keep_chr = NULL,
drop_empty_chr = FALSE,
bin_size = 5000,
method = "spearman",
intensity_cols = c("total_signal", "qValue", "Peak Score", "score"),
return_bins = FALSE,
fill_diag = NA,
workers = check_workers(),
save_path = tempfile(fileext = ".corr.csv.gz")
)
A list of peak files as GRanges object and/or as paths to
BED files. If paths are provided, EpiCompare imports the file as GRanges
object. EpiCompare also accepts a list containing a mix of GRanges objects
and paths.Files must be listed and named using list()
.
E.g. list("name1"=file1, "name2"=file2)
. If no names are specified,
default file names will be assigned.
A named list containing reference peak file(s) as GRanges
object. Please ensure that the reference file is listed and named
i.e. list("reference_name" = reference_peak)
. If more than one
reference is specified, individual reports for each reference will be
generated. However, please note that specifying more than one reference can
take awhile. If a reference is specified, it enables two analyses: (1) plot
showing statistical significance of overlapping/non-overlapping peaks; and
(2) ChromHMM of overlapping/non-overlapping peaks.
The build of **all** peak and reference files to calculate the correlation matrix on. If all peak and reference files are not of the same build use liftover_grlist to convert them all before running. Genome build should be one of hg19, hg38, mm9, mm10.
Which chromosomes to keep.
Drop chromosomes that are not present in any of the
peakfiles
(default: FALSE
).
Default of 100. Base-pair size of the bins created to measure correlation. Use smaller value for higher resolution but longer run time and larger memory usage.
Default spearman (i.e. non-parametric). A character string indicating which correlation coefficient (or covariance) is to be computed. One of "pearson", "kendall", or "spearman": can be abbreviated.
Depending on which columns are present, this value will be used to get quantiles and ultimately calculate the correlations:
"total_signal" : Used by the peak calling software SEACR. NOTE: Another SEACR column (e.g. "max_signal") can be used together or instead of "total_signal".
"qValue"Used by the peak calling software MACS2/3. Should contain the negative log of the p-values after multiple testing correction.
"Peak Score" : Used by the peak calling software HOMER.
If TRUE
, returns a named list
with both the rebinned
(standardised) peaks ("bin") and the correlation matrix ("cor").
If FALSE
(default), returns only the correlation matrix (unlisted).
Fill the diagonal of the overlap matrix.
Number of threads to parallelize across.
Path to save a table of correlation results to.
correlation matrix
data("CnR_H3K27ac")
data("CnT_H3K27ac")
data("encode_H3K27ac")
peakfiles <- list(CnR_H3K27ac=CnR_H3K27ac, CnT_H3K27ac=CnT_H3K27ac)
reference <- list("encode_H3K27ac"=encode_H3K27ac)
#increasing bin_size for speed but lower values will give more granular corr
corr_mat <- compute_corr(peakfiles = peakfiles,
reference = reference,
genome_build = "hg19",
bin_size = 200000,
workers = 1)
#> Standardising peak files in 16,318 bins of 2e+05 bp.
#> Merging data into matrix.
#> Binned matrix size: 16,318 x 3
#> Matrix sparsity: 0.9646
#> Calculating correlation matrix.
#> Done computing correlations in 1 seconds.
#> Saving correlation results ==> /tmp/RtmpKu2U4A/file44302aee9381.corr.csv.gz