This function compares and analyses multiple epigenomic datasets and outputs an HTML report containing all results of the analysis. The report is mainly divided into three sections: (1) General Metrics on Peakfiles, (2) Peak Overlaps and (3) Functional Annotation of Peaks.
EpiCompare(
peakfiles,
genome_build,
genome_build_output = "hg19",
blacklist = NULL,
picard_files = NULL,
reference = NULL,
upset_plot = FALSE,
stat_plot = FALSE,
chromHMM_plot = FALSE,
chromHMM_annotation = "K562",
chipseeker_plot = FALSE,
enrichment_plot = FALSE,
tss_plot = FALSE,
tss_distance = c(-3000, 3000),
precision_recall_plot = FALSE,
n_threshold = 20,
corr_plot = FALSE,
bin_size = 5000,
interact = TRUE,
add_download_button = FALSE,
save_output = FALSE,
output_filename = "EpiCompare",
output_timestamp = FALSE,
output_dir,
display = NULL,
run_all = FALSE,
workers = 1,
quiet = FALSE,
error = FALSE,
debug = FALSE
)
A list of peak files as GRanges object and/or as paths to
BED files. If paths are provided, EpiCompare imports the file as GRanges
object. EpiCompare also accepts a list containing a mix of GRanges objects
and paths.Files must be listed and named using list()
.
E.g. list("name1"=file1, "name2"=file2)
. If no names are specified,
default file names will be assigned.
A named list indicating the human genome build used to generate each of the following inputs:
"peakfiles" : Genome build for the peakfiles
input.
Assumes genome build is the same for each element in the peakfiles
list.
"reference" : Genome build for the reference
input.
"blacklist" : Genome build for the blacklist
input.
Example input list:genome_build = list(peakfiles="hg38",
reference="hg19", blacklist="hg19")
Alternatively, you can supply a single character string instead of a list.
This should only be done in situations where all three inputs
(peakfiles
, reference
, blacklist
) are of the same
genome build. For example:genome_build = "hg19"
Genome build to standardise all inputs to. Liftovers will be performed automatically as needed. Default: "hg19".
A GRanges object containing blacklisted genomic regions. Blacklists included in EpiCompare are:
NULL
(default): Automatically selects the appropriate
blacklist based on the genome_build_output
argument.
"hg19_blacklist": Regions of hg19 genome that have anomalous and/or unstructured signals. hg19_blacklist
"hg38_blacklist": Regions of hg38 genome that have anomalous and/or unstructured signals. hg38_blacklist
"mm10_blacklist": Regions of mm10 genome that have anomalous and/or unstructured signals. mm10_blacklist
"mm9_blacklist": Blacklisted regions of mm10 genome that have been lifted over from mm10_blacklist. mm9_blacklist
<user_input>
: A custom user-provided blacklist in
GRanges format.
A list of summary metrics output from Picard.
Files must be in data.frame format and listed using list()
and named using names()
.
To import Picard duplication metrics (.txt file)
into R as data frame, use:picard <- read.table("/path/to/picard/output",
header = TRUE, fill = TRUE)
.
A named list containing reference peak file(s) as GRanges
object. Please ensure that the reference file is listed and named
i.e. list("reference_name" = reference_peak)
. If more than one
reference is specified, individual reports for each reference will be
generated. However, please note that specifying more than one reference can
take awhile. If a reference is specified, it enables two analyses: (1) plot
showing statistical significance of overlapping/non-overlapping peaks; and
(2) ChromHMM of overlapping/non-overlapping peaks.
Default FALSE. If TRUE, the report includes upset plot of overlapping peaks.
Default FALSE. If TRUE, the function creates a plot showing the statistical significance of overlapping/non-overlapping peaks. Reference peak file must be provided.
Default FALSE. If TRUE, the function outputs ChromHMM heatmap of individual peak files. If a reference peak file is provided, ChromHMM annotation of overlapping and non-overlapping peaks is also provided.
ChromHMM annotation for ChromHMM plots. Default K562 cell-line. Cell-line options are:
"K562" = K-562 cells
"Gm12878" = Cellosaurus cell-line GM12878
"H1hesc" = H1 Human Embryonic Stem Cell
"Hepg2" = Hep G2 cell
"Hmec" = Human Mammary Epithelial Cell
"Hsmm" = Human Skeletal Muscle Myoblasts
"Huvec" = Human Umbilical Vein Endothelial Cells
"Nhek" = Normal Human Epidermal Keratinocytes
"Nhlf" = Normal Human Lung Fibroblasts
Default FALSE. If TRUE, the report includes a barplot of ChIPseeker annotation of peak files.
Default FALSE. If TRUE, the report includes dotplots of KEGG and GO enrichment analysis of peak files.
Default FALSE. If TRUE, the report includes peak count frequency around transcriptional start site. Note that this can take awhile.
A vector specifying the distance upstream and downstream
around transcription start sites (TSS).
The default value is c(-3000,3000)
; meaning peak frequency
3000bp upstream and downstream of TSS will be displayed.
Default is FALSE. If TRUE, creates a precision-recall curve plot and an F1 plot using plot_precision_recall.
Number of thresholds to test.
Default is FALSE. If TRUE, creates a correlation plot across all peak files using plot_corr.
Default of 100. Base-pair size of the bins created to measure correlation. Use smaller value for higher resolution but longer run time and larger memory usage.
Default TRUE. By default, plots are interactive. If set FALSE, all plots in the report will be static.
Add download buttons for each plot or dataset.
Default FALSE. If TRUE, all outputs (tables and plots) of the analysis will be saved in a folder (EpiCompare_file).
Default EpiCompare.html. If otherwise, the html report will be saved in the specified name.
Default FALSE. If TRUE, date will be included in the file name.
Path to where output HTML file should be saved.
After completion, automatically display the HTML report file in one of the following ways:
"browser" : Display the report in your default web browser.
"rsstudio" : Display the report in Rstudio.
NULL (default) : Do not display the report.
Convenience argument that enables all plots/features
(without specifying each argument manually)
by overriding the default values.
Default: FALSE
.
Number of threads to parallelize across.
An option to suppress printing during rendering from knitr,
pandoc command line and others. To only suppress printing of the last
"Output created: " message, you can set rmarkdown.render.message
to
FALSE
If TRUE
, the Rmarkdown report will continue to render
even when some chunks encounter errors (default: FALSE
).
Passed to opts_chunk.
Run in debug mode, where are messages and warnings
are printed within the HTML report (default: FALSE
).
Path to one or more HTML report files.
### Load Data ###
data("encode_H3K27ac") # example dataset as GRanges object
data("CnT_H3K27ac") # example dataset as GRanges object
data("CnR_H3K27ac") # example dataset as GRanges object
data("CnT_H3K27ac_picard") # example Picard summary output
data("CnR_H3K27ac_picard") # example Picard summary output
#### Prepare Input ####
# create named list of peakfiles
peakfiles <- list(CnR=CnR_H3K27ac, CnT=CnT_H3K27ac)
# create named list of picard outputs
picard_files <- list(CnR=CnR_H3K27ac_picard, CnT=CnT_H3K27ac_picard)
# reference peak file
reference <- list("ENCODE" = encode_H3K27ac)
### Run EpiCompare ###
output_html <- EpiCompare(peakfiles = peakfiles,
genome_build = list(peakfiles="hg19",
reference="hg19"),
picard_files = picard_files,
reference = reference,
output_filename = "EpiCompare_test",
output_dir = tempdir())
#> NOTE: The following EpiCompare features are NOT being used:
#> - upset_plot=
#> - stat_plot=
#> - chromHMM_plot=
#> - chipseeker_plot=
#> - enrichment_plot=
#> - tss_plot=
#> - precision_recall_plot=
#> - corr_plot=
#> - add_download_button=
#>
#>
#> processing file: EpiCompare.Rmd
#> 1/59
#> 2/59 [setup]
#> 3/59
#> 4/59 [list file names]
#> 5/59
#> 6/59 [download processed peaks]
#> 7/59
#> 8/59 [report_command]
#> 9/59
#> 10/59 [peak_info]
#> 11/59
#> 12/59 [picard_files]
#> 13/59
#> 14/59 [width_boxplot]
#> 15/59
#> 16/59 [overlap_heatmap]
#> 17/59
#> 18/59 [overlap_upset_plot]
#> 19/59
#> 20/59 [overlap_stat_plot]
#> 21/59
#> 22/59 [precision_recall_plot]
#> 23/59
#> 24/59 [Correlation Plot]
#> 25/59
#> 26/59 [plot_chromHMM1]
#> 27/59
#> 28/59 [overlap_percent1]
#> 29/59
#> 30/59 [plot_chromHMM2]
#> 31/59
#> 32/59 [overlap_percent2]
#> 33/59
#> 34/59 [plot_chromHMM3]
#> 35/59
#> 36/59 [overlap_percent3]
#> 37/59
#> 38/59 [plot_chromHMM4]
#> 39/59
#> 40/59 [overlap_percent4]
#> 41/59
#> 42/59 [plot_chromHMM5]
#> 43/59
#> 44/59 [Annotate Peaks]
#> 45/59
#> 46/59 [plot_enrichment]
#> 47/59
#> 48/59 [KEGG_plot]
#> 49/59
#> 50/59 [GO_plot]
#> 51/59
#> 52/59 [unnamed-chunk-1]
#> 53/59
#> 54/59 [tss_plots]
#> 55/59
#> 56/59 [unnamed-chunk-2]
#> 57/59
#> 58/59 [Session Info]
#> 59/59
#> output file: EpiCompare.knit.md
#> /usr/bin/pandoc +RTS -K512m -RTS EpiCompare.knit.md --to html4 --from markdown+autolink_bare_uris+tex_math_single_backslash --output /tmp/RtmpKu2U4A/EpiCompare_test.html --lua-filter /__w/_temp/Library/rmarkdown/rmarkdown/lua/pagebreak.lua --lua-filter /__w/_temp/Library/rmarkdown/rmarkdown/lua/latex-div.lua --lua-filter /__w/_temp/Library/rmarkdown/rmarkdown/lua/table-classes.lua --embed-resources --standalone --variable bs3=TRUE --section-divs --table-of-contents --toc-depth 3 --variable toc_float=1 --variable toc_selectors=h1,h2,h3 --variable toc_collapsed=1 --variable toc_smooth_scroll=1 --variable toc_print=1 --template /__w/_temp/Library/rmarkdown/rmd/h/default.html --no-highlight --variable highlightjs=1 --number-sections --variable theme=bootstrap --css custom.css --mathjax --variable 'mathjax-url=https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' --include-in-header /tmp/RtmpKu2U4A/rmarkdown-str44305c4cc624.html --variable code_folding=hide --variable code_menu=1
#>
#> Output created: /tmp/RtmpKu2U4A/EpiCompare_test.html
#> [1] "Done in 0.09 min."
#> All outputs saved to: /tmp/RtmpKu2U4A
# utils::browseURL(output_html)