EpiCompare compares different epigenetic datasets for quality control and benchmarking purposes. The report is divided into three sections:

  1. General Metrics: Metrics on fragments (duplication rate) and peaks (blacklisted peaks and peak widths) of input files.
  2. Peak Overlap: Percentage and statistical significance of overlapping and non-overlapping peaks.
  3. Functional Annotation: Functional annotation (ChromHMM, ChIPseeker and enrichment analysis) of peaks.

Input Datasets
  • Reference peakfile: ENCODE_H3K27ac
  • Total of 19 peak files:
      File1: H3K27ac_CnT_ActiveMotif_SEACR
      File2: H3K27ac_CnT_ActiveMotif_MACS2
      File3: H3K27ac_CnT_Abcamab4729_SEACR
      File4: H3K27ac_CnT_Abcamab4729_MACS2
      File5: H3K27ac_CnT_KayaOkur_SEACR
      File6: H3K27ac_CnT_KayaOkur_MACS2
      File7: H3K27ac_CnR_Meers_SEACR
      File8: H3K27ac_CnR_Meers_MACS2
      File9: H3K27me3_CnR_Meers_SEACR
      File10: H3K27me3_CnR_Meers_MACS2
      File11: H3K27me3_ENCODE
      File12: H3K27ac_ENCODE
      File13: H3K27ac_TIP_Abcam.phase_1_05_jan_2022.S_1_R1
      File14: H3K27ac_TIP_Abcam.phase_2_03_feb_2022.S_4_R1
      File15: H3K27ac_TIP_Diagenode.phase_2_28_jan_2022.S_2_R1
      File16: H3K27me3_TIP_Diagenode.phase_2_28_jan_2022.S_3_R1
      File17: H3K27ac_TIP_Diagenode.phase_2_28_jan_2022.S_4_R1
      File18: H3K27me3_TIP_Diagenode.phase_2_28_jan_2022.S_5_R1
      File19: H3K27ac_TIP_Diagenode.phase_2_28_jan_2022.S_6_R1
Code
EpiCompare(peakfiles = list(H3K27ac_CnT_ActiveMotif_SEACR, H3K27ac_CnT_ActiveMotif_MACS2,
                            H3K27ac_CnT_Abcamab4729_SEACR, H3K27ac_CnT_Abcamab4729_MACS2,
                            H3K27ac_CnT_KayaOkur_SEACR, H3K27ac_CnT_KayaOkur_MACS2,
                            H3K27ac_CnR_Meers_SEACR, H3K27ac_CnR_Meers_MACS2,
                            H3K27me3_CnR_Meers_SEACR, H3K27me3_CnR_Meers_MACS2,
                            H3K27me3_ENCODE, H3K27ac_ENCODE,
                            H3K27ac_TIP_Abcam.phase_1_05_jan_2022.S_1_R1, H3K27ac_TIP_Abcam.phase_2_03_feb_2022.S_4_R1,
                            H3K27ac_TIP_Diagenode.phase_2_28_jan_2022.S_2_R1, H3K27me3_TIP_Diagenode.phase_2_28_jan_2022.S_3_R1,
                            H3K27ac_TIP_Diagenode.phase_2_28_jan_2022.S_4_R1, H3K27me3_TIP_Diagenode.phase_2_28_jan_2022.S_5_R1,
                            H3K27ac_TIP_Diagenode.phase_2_28_jan_2022.S_6_R1),
           blacklist = blacklist,
           picard_files = list(),
           reference = ENCODE_H3K27ac,
           stat_plot = TRUE,
           chrmHMM_plot = TRUE,
           chrmHMM_annotation = "K562",
           chipseeker_plot = TRUE,
           enrichment_plot = TRUE,
           interact = TRUE,
           save_output = FALSE,
           output_dir = "/Users/serachoi/Documents/EpiCompare")

1. General Metrics

Peak Information

Column Description:

  • PeakN before tidy: Total number of peaks including those blacklisted and those in non-standard chromosomes.

  • Blacklisted peaks removed (%): Percentage of blacklisted peaks present in the sample. ENCODE blacklist includes regions in the hg19 genome that have anomalous and/or unstructured signals independent of the cell-line or experiment.

  • Non-standard peaks removed (%): Percentage of peaks identified in non-standard and/or mitochondrial chromosomes. Identified using BRGenomics::tidyChromosomes().

  • PeakN after tidy: Total number of peaks after filtering blacklisted peaks and those in non-standard chromosomes.

    NB: All analyses in EpiCompare are conducted on tidied datasets (i.e. with blacklisted peaks and peaks in non-standard chromosomes removed).
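
The columns described above can be illustrated on a toy example. This is a minimal base R sketch, not EpiCompare's implementation: the coordinates, the overlaps_any helper and the order of the two filters are assumptions made for illustration. Both percentages are taken relative to the pre-tidy peak count, which appears consistent with the values reported in this section.

```r
# Toy peaks and blacklist (hypothetical coordinates)
peaks <- data.frame(
  chr   = c("chr1", "chr1", "chr2", "chrM", "chrUn_gl000220"),
  start = c(100, 5000, 200, 10, 50),
  end   = c(300, 5200, 400, 90, 150)
)
blacklist <- data.frame(chr = "chr1", start = 250, end = 600)

# Hypothetical helper: TRUE for each query interval that overlaps any subject interval
overlaps_any <- function(query, subject) {
  vapply(seq_len(nrow(query)), function(i) {
    any(subject$chr == query$chr[i] &
          subject$start <= query$end[i] &
          subject$end >= query$start[i])
  }, logical(1))
}

# Assumption: "standard" means autosomes plus X and Y (mitochondrial excluded)
standard_chr <- paste0("chr", c(1:22, "X", "Y"))

n_before       <- nrow(peaks)
is_blacklisted <- overlaps_any(peaks, blacklist)
kept           <- peaks[!is_blacklisted, ]
is_nonstandard <- !(kept$chr %in% standard_chr)
tidy           <- kept[!is_nonstandard, ]

summary_row <- data.frame(
  PeakN_before_tidy       = n_before,
  Blacklisted_removed_pct = 100 * sum(is_blacklisted) / n_before,
  Nonstandard_removed_pct = 100 * sum(is_nonstandard) / n_before,
  PeakN_after_tidy        = nrow(tidy)
)
print(summary_row)
```

In the real report the overlap and chromosome filtering are done on genomic range objects (the text above names BRGenomics::tidyChromosomes() for the second step); the data frames here only stand in for those.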


Sample | PeakN before tidy | Blacklisted peaks removed (%) | Non-standard peaks removed (%) | PeakN after tidy
H3K27ac_CnT_ActiveMotif_SEACR | 3211 | 17.800 | 4.390 | 2497
H3K27ac_CnT_ActiveMotif_MACS2 | 2526 | 23.200 | 7.050 | 1762
H3K27ac_CnT_Abcamab4729_SEACR | 13530 | 5.690 | 0.761 | 12657
H3K27ac_CnT_Abcamab4729_MACS2 | 26077 | 6.800 | 0.874 | 24076
H3K27ac_CnT_KayaOkur_SEACR | 6669 | 3.420 | 0.390 | 6415
H3K27ac_CnT_KayaOkur_MACS2 | 13456 | 3.470 | 0.498 | 12922
H3K27ac_CnR_Meers_SEACR | 21498 | 3.480 | 0.744 | 20589
H3K27ac_CnR_Meers_MACS2 | 17627 | 3.140 | 0.567 | 16974
H3K27me3_CnR_Meers_SEACR | 83685 | 2.440 | 0.213 | 81465
H3K27me3_CnR_Meers_MACS2 | 103635 | 2.430 | 0.323 | 100777
H3K27me3_ENCODE | 164472 | 0.389 | 0.000 | 163833
H3K27ac_ENCODE | 51176 | 0.952 | 0.000 | 50689
H3K27ac_TIP_Abcam.phase_1_05_jan_2022.S_1_R1 | 60064 | 2.670 | 0.526 | 58143
H3K27ac_TIP_Abcam.phase_2_03_feb_2022.S_4_R1 | 71682 | 2.630 | 0.632 | 69346
H3K27ac_TIP_Diagenode.phase_2_28_jan_2022.S_2_R1 | 61249 | 2.940 | 0.410 | 59195
H3K27me3_TIP_Diagenode.phase_2_28_jan_2022.S_3_R1 | 73765 | 2.620 | 0.493 | 71472
H3K27ac_TIP_Diagenode.phase_2_28_jan_2022.S_4_R1 | 65526 | 2.630 | 0.491 | 63480
H3K27me3_TIP_Diagenode.phase_2_28_jan_2022.S_5_R1 | 64540 | 2.630 | 0.318 | 62638
H3K27ac_TIP_Diagenode.phase_2_28_jan_2022.S_6_R1 | 41891 | 2.400 | 0.465 | 40689

Fragment Information

Metrics on fragments are shown only if a Picard summary is provided. See the manual for help.

Column Description:

  • Mapped_Fragments: Number of mapped read pairs in the file.
  • Duplication_Rate: Percentage of mapped fragments that are marked as duplicates.
  • Unique_Fragments: Number of mapped fragments that are not marked as duplicates.
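
The relationship between the three columns can be shown with simple arithmetic. The counts below are hypothetical, and exactly how EpiCompare derives each column from the Picard summary is not stated in this report, so this is a sketch of the definitions above rather than of the package's code.

```r
# Hypothetical counts standing in for one Picard duplication summary
mapped_fragments    <- 1200000  # mapped read pairs
duplicate_fragments <- 180000   # read pairs marked as duplicates

# Duplication rate as a percentage of mapped fragments
duplication_rate <- 100 * duplicate_fragments / mapped_fragments

# Unique fragments are the mapped fragments left after removing duplicates
unique_fragments <- mapped_fragments - duplicate_fragments

fragment_info <- data.frame(
  Mapped_Fragments = mapped_fragments,
  Duplication_Rate = round(duplication_rate, 2),
  Unique_Fragments = unique_fragments
)
print(fragment_info)
```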



Peak widths

Distribution of peak widths in each sample.



2. Peak Overlap

Percentage Overlap

Heatmap of percentage of overlapping peaks between samples. Hover over the heatmap for percentage values.

N.B. How to interpret the heatmap: each cell shows the percentage of peaks from the sample on the x-axis that are found in the peaks of the sample on the y-axis.
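
The directionality of the heatmap is easier to see on a small example. The following base R sketch builds the same kind of matrix for two toy samples; the sample names, coordinates and the pct_in helper are hypothetical, and simple interval overlap is assumed as the definition of a peak being "in" another sample.

```r
# Two toy samples (hypothetical coordinates)
samples <- list(
  A = data.frame(chr = c("chr1", "chr1", "chr2"),
                 start = c(100, 1000, 50),
                 end   = c(200, 1100, 150)),
  B = data.frame(chr = c("chr1", "chr2"),
                 start = c(150, 500),
                 end   = c(250, 600))
)

# Hypothetical helper: percentage of query peaks overlapping any subject peak
pct_in <- function(query, subject) {
  hit <- vapply(seq_len(nrow(query)), function(i) {
    any(subject$chr == query$chr[i] &
          subject$start <= query$end[i] &
          subject$end >= query$start[i])
  }, logical(1))
  100 * mean(hit)
}

# Columns index the x-axis sample, rows index the y-axis sample
overlap_matrix <- sapply(samples, function(x_sample) {
  sapply(samples, function(y_sample) pct_in(x_sample, y_sample))
})
print(round(overlap_matrix, 1))
```

The matrix is not symmetric: overlap_matrix["B", "A"] is the percentage of A peaks found in B, while overlap_matrix["A", "B"] is the percentage of B peaks found in A.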



Statistical Significance

The plot is shown only if a reference peak file is provided and stat_plot = TRUE. Depending on the format of the reference file, EpiCompare outputs different plots:

  • Reference dataset has BED6+4 format (peak calling performed with MACS2): EpiCompare generates a paired boxplot per sample showing the distribution of -log10(q-value) of reference peaks that overlap and do not overlap the sample dataset.
  • Reference dataset does not have BED6+4 format: EpiCompare generates a barplot of the percentage of sample peaks overlapping the reference, coloured by the statistical significance (adjusted p-value) of the overlap.
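
For the BED6+4 case, the data behind each paired boxplot can be sketched as follows. This is an illustrative base R example under stated assumptions: the reference is a toy data frame whose qValue column already holds -log10(q-value), the coordinates are made up, and the group labels follow the Overlap/Unique wording used in this report.

```r
# Toy reference peaks with a -log10(q-value) column (hypothetical values)
reference <- data.frame(
  chr    = c("chr1", "chr1", "chr2", "chr3"),
  start  = c(100, 1000, 50, 10),
  end    = c(200, 1100, 150, 90),
  qValue = c(35.2, 4.1, 18.7, 2.3)
)
# Toy sample peaks
sample_peaks <- data.frame(chr = c("chr1", "chr2"),
                           start = c(150, 100),
                           end   = c(250, 180))

# Flag reference peaks that overlap at least one sample peak
overlapping <- vapply(seq_len(nrow(reference)), function(i) {
  any(sample_peaks$chr == reference$chr[i] &
        sample_peaks$start <= reference$end[i] &
        sample_peaks$end >= reference$start[i])
}, logical(1))

# Split the reference scores into the two groups that the boxplot compares
groups  <- split(reference$qValue, ifelse(overlapping, "overlap", "unique"))
medians <- vapply(groups, median, numeric(1))
print(medians)
```

Passing groups to boxplot() would draw the two distributions side by side for this one sample.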

Keys:

Reference peakfile: ENCODE_H3K27ac

  • Overlap: Sample peaks in Reference peaks
  • Unique: Sample peaks not in Reference peaks



3. Functional Annotation

3.1 ChromHMM

ChromHMM annotates and characterises peaks into different chromatin states. ChromHMM annotations used in EpiCompare were obtained from here.

  • Cell-type annotation file used in this analysis: K562
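
As a rough illustration of what annotating peaks with chromatin states involves, the base R sketch below computes, for each state, the percentage of peaks that overlap at least one segment of that state. The state names, segment coordinates and peaks are hypothetical, and this is a simplified stand-in rather than the procedure EpiCompare runs.

```r
# Toy chromatin-state segmentation (hypothetical states and coordinates)
states <- data.frame(
  chr   = c("chr1", "chr1", "chr1"),
  start = c(0, 500, 2000),
  end   = c(499, 1999, 5000),
  state = c("Active_Promoter", "Strong_Enhancer", "Heterochromatin")
)
# Toy peaks
peaks <- data.frame(chr = "chr1",
                    start = c(100, 450, 2500, 3000),
                    end   = c(200, 600, 2600, 3100))

# For each state, the percentage of peaks overlapping any segment of that state
pct_by_state <- vapply(split(states, states$state), function(seg) {
  hit <- vapply(seq_len(nrow(peaks)), function(i) {
    any(seg$chr == peaks$chr[i] &
          seg$start <= peaks$end[i] &
          seg$end >= peaks$start[i])
  }, logical(1))
  100 * mean(hit)
}, numeric(1))
print(pct_by_state)
```

Because a peak can span more than one state (the second toy peak crosses a promoter/enhancer boundary), the percentages do not have to sum to 100.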

All samples

ChromHMM annotation of individual samples.

Overlap: Sample peaks in Reference peaks

Percentage of Sample peaks found in Reference peaks (Reference peakfile: ENCODE_H3K27ac)

Percentage
H3K27ac_CnT_ActiveMotif_SEACR 77.200
H3K27ac_CnT_ActiveMotif_MACS2 63.200
H3K27ac_CnT_Abcamab4729_SEACR 89.700
H3K27ac_CnT_Abcamab4729_MACS2 78.700
H3K27ac_CnT_KayaOkur_SEACR 92.500
H3K27ac_CnT_KayaOkur_MACS2 83.300
H3K27ac_CnR_Meers_SEACR 80.600
H3K27ac_CnR_Meers_MACS2 80.700
H3K27me3_CnR_Meers_SEACR 0.308
H3K27me3_CnR_Meers_MACS2 0.331
H3K27me3_ENCODE 0.128
H3K27ac_ENCODE 100.000
H3K27ac_TIP_Abcam.phase_1_05_jan_2022.S_1_R1 8.250
H3K27ac_TIP_Abcam.phase_2_03_feb_2022.S_4_R1 8.440
H3K27ac_TIP_Diagenode.phase_2_28_jan_2022.S_2_R1 5.690
H3K27me3_TIP_Diagenode.phase_2_28_jan_2022.S_3_R1 9.730
H3K27ac_TIP_Diagenode.phase_2_28_jan_2022.S_4_R1 9.820
H3K27me3_TIP_Diagenode.phase_2_28_jan_2022.S_5_R1 3.620
H3K27ac_TIP_Diagenode.phase_2_28_jan_2022.S_6_R1 7.930

ChromHMM annotation of sample peaks found in reference peaks.

Overlap: Reference peaks in Sample peaks

Percentage of Reference peaks found in Sample peaks (Reference peakfile: ENCODE_H3K27ac)

Percentage
H3K27ac_CnT_ActiveMotif_SEACR 4.650
H3K27ac_CnT_ActiveMotif_MACS2 2.240
H3K27ac_CnT_Abcamab4729_SEACR 46.300
H3K27ac_CnT_Abcamab4729_MACS2 44.400
H3K27ac_CnT_KayaOkur_SEACR 25.300
H3K27ac_CnT_KayaOkur_MACS2 27.100
H3K27ac_CnR_Meers_SEACR 39.600
H3K27ac_CnR_Meers_MACS2 53.900
H3K27me3_CnR_Meers_SEACR 0.582
H3K27me3_CnR_Meers_MACS2 0.793
H3K27me3_ENCODE 0.422
H3K27ac_ENCODE 100.000
H3K27ac_TIP_Abcam.phase_1_05_jan_2022.S_1_R1 12.700
H3K27ac_TIP_Abcam.phase_2_03_feb_2022.S_4_R1 16.000
H3K27ac_TIP_Diagenode.phase_2_28_jan_2022.S_2_R1 10.900
H3K27me3_TIP_Diagenode.phase_2_28_jan_2022.S_3_R1 23.000
H3K27ac_TIP_Diagenode.phase_2_28_jan_2022.S_4_R1 19.600
H3K27me3_TIP_Diagenode.phase_2_28_jan_2022.S_5_R1 5.270
H3K27ac_TIP_Diagenode.phase_2_28_jan_2022.S_6_R1 6.200

ChromHMM annotation of reference peaks found in sample peaks.

Unique: Sample peaks not in Reference peaks

Percentage of sample peaks not found in reference peaks (Reference peakfile: ENCODE_H3K27ac)

Percentage
H3K27ac_CnT_ActiveMotif_SEACR 22.80
H3K27ac_CnT_ActiveMotif_MACS2 36.80
H3K27ac_CnT_Abcamab4729_SEACR 10.30
H3K27ac_CnT_Abcamab4729_MACS2 21.30
H3K27ac_CnT_KayaOkur_SEACR 7.48
H3K27ac_CnT_KayaOkur_MACS2 16.70
H3K27ac_CnR_Meers_SEACR 19.40
H3K27ac_CnR_Meers_MACS2 19.30
H3K27me3_CnR_Meers_SEACR 99.70
H3K27me3_CnR_Meers_MACS2 99.70
H3K27me3_ENCODE 99.90
H3K27ac_ENCODE 0.00
H3K27ac_TIP_Abcam.phase_1_05_jan_2022.S_1_R1 91.70
H3K27ac_TIP_Abcam.phase_2_03_feb_2022.S_4_R1 91.60
H3K27ac_TIP_Diagenode.phase_2_28_jan_2022.S_2_R1 94.30
H3K27me3_TIP_Diagenode.phase_2_28_jan_2022.S_3_R1 90.30
H3K27ac_TIP_Diagenode.phase_2_28_jan_2022.S_4_R1 90.20
H3K27me3_TIP_Diagenode.phase_2_28_jan_2022.S_5_R1 96.40
H3K27ac_TIP_Diagenode.phase_2_28_jan_2022.S_6_R1 92.10

ChromHMM annotation of sample peaks not found in reference peaks.

Unique: Reference peaks not in Sample peaks

Percentage of reference peaks not found in sample peaks (Reference peakfile: ENCODE_H3K27ac)

Percentage
H3K27ac_CnT_ActiveMotif_SEACR 95.4
H3K27ac_CnT_ActiveMotif_MACS2 97.8
H3K27ac_CnT_Abcamab4729_SEACR 53.7
H3K27ac_CnT_Abcamab4729_MACS2 55.6
H3K27ac_CnT_KayaOkur_SEACR 74.7
H3K27ac_CnT_KayaOkur_MACS2 72.9
H3K27ac_CnR_Meers_SEACR 60.4
H3K27ac_CnR_Meers_MACS2 46.1
H3K27me3_CnR_Meers_SEACR 99.4
H3K27me3_CnR_Meers_MACS2 99.2
H3K27me3_ENCODE 99.6
H3K27ac_ENCODE 0.0
H3K27ac_TIP_Abcam.phase_1_05_jan_2022.S_1_R1 87.3
H3K27ac_TIP_Abcam.phase_2_03_feb_2022.S_4_R1 84.0
H3K27ac_TIP_Diagenode.phase_2_28_jan_2022.S_2_R1 89.1
H3K27me3_TIP_Diagenode.phase_2_28_jan_2022.S_3_R1 77.0
H3K27ac_TIP_Diagenode.phase_2_28_jan_2022.S_4_R1 80.4
H3K27me3_TIP_Diagenode.phase_2_28_jan_2022.S_5_R1 94.7
H3K27ac_TIP_Diagenode.phase_2_28_jan_2022.S_6_R1 93.8

ChromHMM annotation of reference peaks not found in sample peaks.

3.2 ChIPseeker

EpiCompare uses the annotatePeak function in the ChIPseeker package to annotate each peak with its nearest gene and the genomic region in which the peak is located. The peaks are annotated with genes taken from the human genome hg19 annotations provided by Bioconductor.
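
The idea of nearest-gene annotation can be sketched in base R. This is a deliberately simplified stand-in for annotatePeak, not a reimplementation: the gene names, TSS positions and the annotate_nearest helper are hypothetical, only two region labels are used, and the 3000 bp promoter window is an assumption chosen to mirror the default tssRegion of annotatePeak.

```r
# Toy transcription start sites (hypothetical genes and positions)
tss <- data.frame(gene = c("GENE_A", "GENE_B", "GENE_C"),
                  chr  = c("chr1", "chr1", "chr2"),
                  pos  = c(1000, 50000, 2000))
# Toy peaks
peaks <- data.frame(chr   = c("chr1", "chr1", "chr2"),
                    start = c(900, 30000, 10000),
                    end   = c(1100, 30400, 10200))

# Hypothetical helper: assign each peak to the nearest TSS on the same chromosome.
# Assumes every peak's chromosome has at least one TSS in the table.
annotate_nearest <- function(peaks, tss, promoter_window = 3000) {
  mid <- (peaks$start + peaks$end) / 2
  out <- lapply(seq_len(nrow(peaks)), function(i) {
    cand <- tss[tss$chr == peaks$chr[i], ]
    d <- abs(cand$pos - mid[i])
    j <- which.min(d)
    data.frame(gene     = cand$gene[j],
               distance = d[j],
               region   = if (d[j] <= promoter_window) "Promoter" else "Distal")
  })
  cbind(peaks, do.call(rbind, out))
}

annotated <- annotate_nearest(peaks, tss)
print(annotated)
```

ChIPseeker itself distinguishes more region types (for example exons, introns and UTRs) and works from a transcript database rather than a hand-written TSS table.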

3.3 Functional Enrichment Analysis

EpiCompare performs KEGG pathway and GO enrichment analysis using clusterProfiler. The annotatePeak function in the ChIPseeker package is first used to assign peaks to their nearest genes; biological themes among those genes are then identified using ontologies (KEGG and GO). The peaks are annotated with genes taken from the human genome hg19 annotations provided by Bioconductor.
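
Enrichment of this kind is an over-representation test, which can be sketched in base R with the hypergeometric distribution. The gene identifiers, gene sets and the ora helper below are hypothetical; the sketch shows the statistical idea only and does not query KEGG or GO.

```r
# Hypothetical background, gene sets and genes assigned to peaks
universe   <- paste0("g", 1:100)
gene_sets  <- list(set_1 = paste0("g", 1:10),
                   set_2 = paste0("g", 41:60))
peak_genes <- paste0("g", c(1:5, 51:55))

# Hypothetical helper: P(overlap >= observed) under the hypergeometric distribution
ora <- function(set) {
  k <- length(intersect(peak_genes, set))   # peak genes in the set
  phyper(k - 1,
         length(set),                       # genes in the set
         length(universe) - length(set),    # genes outside the set
         length(peak_genes),                # genes drawn
         lower.tail = FALSE)
}

p_values   <- vapply(gene_sets, ora, numeric(1))
# Benjamini-Hochberg correction across the tested gene sets
p_adjusted <- p.adjust(p_values, method = "BH")
print(cbind(p_values, p_adjusted))
```

Both toy sets share five genes with the peak gene list, but set_1 is half the size of set_2, so the same overlap is less likely by chance and its p-value is smaller.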

KEGG

GO

4. Session Info

## R version 4.1.2 (2021-11-01)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 11.2
## 
## Matrix products: default
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] EpiCompare_0.99.0    org.Hs.eg.db_3.14.0  AnnotationDbi_1.56.2 IRanges_2.28.0       S4Vectors_0.32.3     Biobase_2.54.0       BiocGenerics_0.40.0 
##  [8] testthat_3.1.2       devtools_2.4.3       usethis_2.1.5       
## 
## loaded via a namespace (and not attached):
##   [1] rappdirs_0.3.3                           rtracklayer_1.54.0                       tidyr_1.2.0                             
##   [4] ggplot2_3.3.5                            bit64_4.0.5                              knitr_1.37                              
##   [7] DelayedArray_0.20.0                      data.table_1.14.2                        KEGGREST_1.34.0                         
##  [10] RCurl_1.98-1.6                           generics_0.1.2                           GenomicFeatures_1.46.5                  
##  [13] callr_3.7.0                              RSQLite_2.2.10                           shadowtext_0.1.1                        
##  [16] bit_4.0.4                                tzdb_0.2.0                               enrichplot_1.14.2                       
##  [19] webshot_0.5.2                            xml2_1.3.3                               SummarizedExperiment_1.24.0             
##  [22] assertthat_0.2.1                         viridis_0.6.2                            xfun_0.29                               
##  [25] hms_1.1.1                                jquerylib_0.1.4                          evaluate_0.15                           
##  [28] TSP_1.2-0                                fansi_1.0.2                              restfulr_0.0.13                         
##  [31] progress_1.2.2                           caTools_1.18.2                           dendextend_1.15.2                       
##  [34] dbplyr_2.1.1                             igraph_1.2.11                            DBI_1.1.2                               
##  [37] geneplotter_1.72.0                       htmlwidgets_1.5.4                        purrr_0.3.4                             
##  [40] ellipsis_0.3.2                           crosstalk_1.2.0                          dplyr_1.0.8                             
##  [43] ggpubr_0.4.0.999                         backports_1.4.1                          annotate_1.72.0                         
##  [46] gridBase_0.4-7                           biomaRt_2.50.3                           MatrixGenerics_1.6.0                    
##  [49] vctrs_0.3.8                              remotes_2.4.2                            abind_1.4-5                             
##  [52] cachem_1.0.6                             withr_2.4.3                              ggforce_0.3.3                           
##  [55] BSgenome_1.62.0                          genomation_1.26.0                        vroom_1.5.7                             
##  [58] GenomicAlignments_1.30.0                 treeio_1.18.1                            prettyunits_1.1.1                       
##  [61] DOSE_3.20.1                              ape_5.6-1                                lazyeval_0.2.2                          
##  [64] crayon_1.5.0                             genefilter_1.76.0                        pkgconfig_2.0.3                         
##  [67] labeling_0.4.2                           tweenr_1.0.2                             GenomeInfoDb_1.30.1                     
##  [70] nlme_3.1-155                             pkgload_1.2.4                            seriation_1.3.2                         
##  [73] rlang_1.0.1                              lifecycle_1.0.1                          downloader_0.4                          
##  [76] registry_0.5-1                           filelock_1.0.2                           BiocFileCache_2.2.1                     
##  [79] seqPattern_1.26.0                        rprojroot_2.0.2                          polyclip_1.10-0                         
##  [82] matrixStats_0.61.0                       Matrix_1.4-0                             aplot_0.1.2                             
##  [85] carData_3.0-5                            boot_1.3-28                              processx_3.5.2                          
##  [88] png_0.1-7                                viridisLite_0.4.0                        rjson_0.2.21                            
##  [91] bitops_1.0-7                             KernSmooth_2.23-20                       Biostrings_2.62.0                       
##  [94] blob_1.2.2                               stringr_1.4.0                            qvalue_2.26.0                           
##  [97] readr_2.1.2                              rstatix_0.7.0                            gridGraphics_0.5-1                      
## [100] ggsignif_0.6.3                           scales_1.1.1                             memoise_2.0.1                           
## [103] magrittr_2.0.2                           plyr_1.8.6                               gplots_3.1.1                            
## [106] zlibbioc_1.40.0                          compiler_4.1.2                           scatterpie_0.1.7                        
## [109] TxDb.Hsapiens.UCSC.hg38.knownGene_3.14.0 BiocIO_1.4.0                             RColorBrewer_1.1-2                      
## [112] plotrix_3.8-2                            DESeq2_1.34.0                            Rsamtools_2.10.0                        
## [115] cli_3.2.0                                XVector_0.34.0                           patchwork_1.1.1                         
## [118] ps_1.6.0                                 MASS_7.3-55                              tidyselect_1.1.2                        
## [121] stringi_1.7.6                            highr_0.9                                yaml_2.3.5                              
## [124] GOSemSim_2.20.0                          locfit_1.5-9.4                           ggrepel_0.9.1                           
## [127] grid_4.1.2                               sass_0.4.0                               fastmatch_1.1-3                         
## [130] tools_4.1.2                              parallel_4.1.2                           rstudioapi_0.13                         
## [133] foreach_1.5.2                            TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2  gridExtra_2.3                           
## [136] BRGenomics_1.6.0                         farver_2.1.0                             ggraph_2.0.5                            
## [139] digest_0.6.29                            Rcpp_1.0.8                               GenomicRanges_1.46.1                    
## [142] car_3.0-12                               broom_0.7.12                             httr_1.4.2                              
## [145] colorspace_2.0-3                         brio_1.1.3                               XML_3.99-0.9                            
## [148] fs_1.5.2                                 splines_4.1.2                            yulab.utils_0.0.4                       
## [151] tidytree_0.3.8                           graphlayouts_0.8.0                       ggplotify_0.1.0                         
## [154] plotly_4.10.0                            sessioninfo_1.2.2                        xtable_1.8-4                            
## [157] jsonlite_1.8.0                           ggtree_3.2.1                             heatmaply_1.3.0                         
## [160] tidygraph_1.2.0                          ggfun_0.0.5                              R6_2.5.1                                
## [163] pillar_1.7.0                             htmltools_0.5.2                          glue_1.6.2                              
## [166] fastmap_1.1.0                            clusterProfiler_4.2.2                    BiocParallel_1.28.3                     
## [169] codetools_0.2-18                         ChIPseeker_1.30.3                        fgsea_1.20.0                            
## [172] pkgbuild_1.3.1                           utf8_1.2.2                               lattice_0.20-45                         
## [175] bslib_0.3.1                              tibble_3.1.6                             curl_4.3.2                              
## [178] gtools_3.9.2                             GO.db_3.14.0                             roxygen2_7.1.2.9000                     
## [181] survival_3.2-13                          rmarkdown_2.11                           desc_1.4.0                              
## [184] munsell_0.5.0                            DO.db_2.9                                GenomeInfoDbData_1.2.7                  
## [187] iterators_1.0.14                         impute_1.68.0                            reshape2_1.4.4                          
## [190] gtable_0.3.0