Import peaks

Pre-computed peaks

Some peaks files are already made available on GEO. import_peaks will import these as a named list GenomicRanges, where the names are the GEO sample ids.

grl <- tryCatch({
    PeakyFinders::import_peaks(
        ids = "GSM945244",
        searches = construct_searches(keys = c("narrowpeak",
                                               "broadpeak")))
}, error = function(e) {
    message("GEO import skipped: ", conditionMessage(e))
    NULL
})
## Processing id(s).
## 1 unique GEO id(s) identified.
## Querying 1 id(s) from: GEO
## Processing id: >>> GSM945244 <<<
## Determining available file types.
## Found file link(s) for 2 categories.
## narrowpeak : 
## >>> https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM945nnn/GSM945244/suppl//GSM945244_hg19_wgEncodeUwHistoneA549H3k04me3StdPkRep1.narrowPeak.gz
## >>> https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM945nnn/GSM945244/suppl//GSM945244_hg19_wgEncodeUwHistoneA549H3k04me3StdPkRep2.narrowPeak.gz
## broadpeak : 
## >>> https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM945nnn/GSM945244/suppl//GSM945244_hg19_wgEncodeUwHistoneA549H3k04me3StdHotspotsRep1.broadPeak.gz
## >>> https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM945nnn/GSM945244/suppl//GSM945244_hg19_wgEncodeUwHistoneA549H3k04me3StdHotspotsRep2.broadPeak.gz
## 
## Importing 2 pre-computed narrowPeak files.
##  - https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM945nnn/GSM945244/suppl//GSM945244_hg19_wgEncodeUwHistoneA549H3k04me3StdPkRep1.narrowPeak.gz
##  - https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM945nnn/GSM945244/suppl//GSM945244_hg19_wgEncodeUwHistoneA549H3k04me3StdPkRep2.narrowPeak.gz
## Importing pre-computed broadPeak files.
##  - https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM945nnn/GSM945244/suppl//GSM945244_hg19_wgEncodeUwHistoneA549H3k04me3StdHotspotsRep1.broadPeak.gz
##  - https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM945nnn/GSM945244/suppl//GSM945244_hg19_wgEncodeUwHistoneA549H3k04me3StdHotspotsRep2.broadPeak.gz
## 
## Removing empty entries.
## Saving results ==>  /tmp/RtmpLB7Sbb/filea14639feb300_PeakyFinders_grl.rds

Pre-computed peak: subset

By default, the genome-wide files are imported. However, you can also query specific subset of these peak files defined by query_granges.

Here, we’ll just construct query for all of chromosome 22 using the helper function get_genome

query_granges <- PeakyFinders::get_genome(genome = "hg19",
                                          keep.chr = 22)
## 
## Attaching package: 'generics'
## The following objects are masked from 'package:base':
## 
##     as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
##     setequal, union
## 
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
## 
##     anyDuplicated, aperm, append, as.data.frame, basename, cbind,
##     colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
##     get, grep, grepl, is.unsorted, lapply, Map, mapply, match, mget,
##     order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
##     rbind, Reduce, rownames, sapply, saveRDS, table, tapply, unique,
##     unsplit, which.max, which.min
## 
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:utils':
## 
##     findMatches
## The following objects are masked from 'package:base':
## 
##     expand.grid, I, unname
## 
## Attaching package: 'Biostrings'
## The following object is masked from 'package:base':
## 
##     strsplit
grl2 <- tryCatch({
    PeakyFinders::import_peaks(
        ids = "GSM945244",
        searches = construct_searches(keys = c("narrowpeak",
                                               "broadpeak")),
        builds = "hg19",
        query_granges = query_granges,
        query_granges_build = "hg19")
}, error = function(e) {
    message("GEO import skipped: ", conditionMessage(e))
    NULL
})
## Processing id(s).
## 1 unique GEO id(s) identified.
## dat is already a GRanges object.
## Querying 1 id(s) from: GEO
## Processing id: >>> GSM945244 <<<
## Determining available file types.
## Found file link(s) for 2 categories.
## narrowpeak : 
## >>> https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM945nnn/GSM945244/suppl//GSM945244_hg19_wgEncodeUwHistoneA549H3k04me3StdPkRep1.narrowPeak.gz
## >>> https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM945nnn/GSM945244/suppl//GSM945244_hg19_wgEncodeUwHistoneA549H3k04me3StdPkRep2.narrowPeak.gz
## broadpeak : 
## >>> https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM945nnn/GSM945244/suppl//GSM945244_hg19_wgEncodeUwHistoneA549H3k04me3StdHotspotsRep1.broadPeak.gz
## >>> https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM945nnn/GSM945244/suppl//GSM945244_hg19_wgEncodeUwHistoneA549H3k04me3StdHotspotsRep2.broadPeak.gz
## grlist is already in the output_build format. Skipping liftover.
## 
## Importing 2 pre-computed narrowPeak files.
##  - https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM945nnn/GSM945244/suppl//GSM945244_hg19_wgEncodeUwHistoneA549H3k04me3StdPkRep1.narrowPeak.gz
##  - https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM945nnn/GSM945244/suppl//GSM945244_hg19_wgEncodeUwHistoneA549H3k04me3StdPkRep2.narrowPeak.gz
## Importing pre-computed broadPeak files.
##  - https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM945nnn/GSM945244/suppl//GSM945244_hg19_wgEncodeUwHistoneA549H3k04me3StdHotspotsRep1.broadPeak.gz
##  - https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM945nnn/GSM945244/suppl//GSM945244_hg19_wgEncodeUwHistoneA549H3k04me3StdHotspotsRep2.broadPeak.gz
## Subsetting peaks.
## 
## Removing empty entries.
## Saving results ==>  /tmp/RtmpLB7Sbb/filea146124f5d5e_PeakyFinders_grl.rds
if(!is.null(grl2)) {
    knitr::kable(
        head(data.frame(grl2$GSM945244))
    )
}

Call peaks

When no pre-computed peaks are available, you can instead use import_peaks to compute peaks from bigWig or bedGraph files. The default peak calling method is currently MACS3 via the MACSr package.

Note:

By default, peaks are called using whole-chromosomes, from which a subset specified by query_granges is returned. This ensures that an appropriate background is used when computing peaks. You can also call peaks using the whole genome at once, but this is usally less computationally efficient than splitting the task by chromosome (i.e. setting split_chromosomes=TRUE when query_granges is not provided).

By default, import_peaks (and the subfunction call_peaks) automatically infer a reasonable cutoff threshold based on an iterative testing procedure. See ?PeakyFinders::call_peaks for more details.

Warning:

Peak calling with MACS3/MACSr is not currently compatible with Windows OS.

### rtracklayer::import.bw() is currently broken on Windows.
### Peak calling with MACS3 can also fail in CI environments due to
### external data dependencies, so we wrap in tryCatch.
if(.Platform$OS.type!="windows"){
    grl3 <- tryCatch({
        PeakyFinders::import_peaks(
            ids = "GSM945244",
            searches = construct_searches(keys = "bigwig"),
            builds = "hg19",
            query_granges = query_granges,
            query_granges_build = "hg19"
        )
    }, error = function(e) {
        message("Peak calling example skipped due to external dependency: ",
                conditionMessage(e))
        NULL
    })

    if(!is.null(grl3)) {
        knitr::kable(
            head(data.frame(grl3$GSM945244))
        )
    }
}
## Processing id(s).
## 1 unique GEO id(s) identified.
## dat is already a GRanges object.
## Querying 1 id(s) from: GEO
## Processing id: >>> GSM945244 <<<
## Determining available file types.
## Found file link(s) for 1 category.
## bigwig : 
## >>> https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM945nnn/GSM945244/suppl//GSM945244_hg19_wgEncodeUwHistoneA549H3k04me3StdRawRep1.bigWig
## >>> https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM945nnn/GSM945244/suppl//GSM945244_hg19_wgEncodeUwHistoneA549H3k04me3StdRawRep2.bigWig
## grlist is already in the output_build format. Skipping liftover.
## 
## Computing peaks from bigWig file.
##  - https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM945nnn/GSM945244/suppl//GSM945244_hg19_wgEncodeUwHistoneA549H3k04me3StdRawRep1.bigWig
## Importing chromosome(s): chr22
## + Importing as: BigWig
## Returning BigWig of length 242,161
## Fixing GRanges seqinfo.
## Loading required namespace: TxDb.Hsapiens.UCSC.hg19.knownGene
## Writing bigWig subset as bedGraph.
## File already in bedGraph format.
## Analyzing cutoff thresholds.
## INFO  @ 25 Jan 2026 22:50:04: [1201 MB] Read and build bedGraph... 
## INFO  @ 25 Jan 2026 22:50:04: [1201 MB] Analyze cutoff vs number of peaks/total length of peaks/average length of peak 
## INFO  @ 25 Jan 2026 22:50:05: [1201 MB] Write report... 
## INFO  @ 25 Jan 2026 22:50:05: [1201 MB] Done
## Calling peaks.
## Peak calling example skipped due to external dependency: TypeError: float() argument must be a string or a real number, not 'NoneType'
## Run `reticulate::py_last_error()` for details.

Parallelization

To speed up the process of importing and/or calling peaks, you can take advantage of the parallelisation features in import_peaks.

For example, if you want to call peaks across the entire genome from a given bigWig file, you could speed up this process by dividing the task by chromosome (split_chromosomes=TRUE) and distributing it across multiple cores or compute nodes (nThread=2 or greater).

Internally, PeakyFinders takes a number of steps to optimise these parallel processes, so you don’t have to worry about it as much. This helps to avoid common issues stemming from excessive numbers of queries, both to the servers where the peak data is hosted, and to the C libraries that R relies on to makes these queries.

Note:

It’s usually a good idea to leave several cores free when running parallel processes, so that you can still have enough resources to use your computer while the function is running. In the example below, I’m using 10/12 cores on my local laptop.

Warning:

Parallelisation is not currently compatible with Windows OS. If Windows OS is detected, import_peaks will automatically switch to serial processing with nThread=1.

grl4 <- PeakyFinders::import_peaks(
    ids = "GSM945244",
    searches = construct_searches(keys = "bigwig"),
    split_chromosomes = TRUE,
    nThread = 10
    )  

Session Info

utils::sessionInfo()
## R Under development (unstable) (2026-01-22 r89323)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.3 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] BSgenome.Hsapiens.UCSC.hg19.masked_1.3.993
##  [2] BSgenome.Hsapiens.UCSC.hg19_1.4.3         
##  [3] BSgenome_1.79.1                           
##  [4] rtracklayer_1.71.3                        
##  [5] BiocIO_1.21.0                             
##  [6] Biostrings_2.79.4                         
##  [7] XVector_0.51.0                            
##  [8] GenomicRanges_1.63.1                      
##  [9] Seqinfo_1.1.0                             
## [10] IRanges_2.45.0                            
## [11] S4Vectors_0.49.0                          
## [12] BiocGenerics_0.57.0                       
## [13] generics_0.1.4                            
## [14] PeakyFinders_0.99.4                       
## [15] BiocStyle_2.39.0                          
## 
## loaded via a namespace (and not attached):
##   [1] piggyback_0.1.5                         
##   [2] DBI_1.2.3                               
##   [3] bitops_1.0-9                            
##   [4] httr2_1.2.2                             
##   [5] rlang_1.1.7                             
##   [6] magrittr_2.0.4                          
##   [7] otel_0.2.0                              
##   [8] matrixStats_1.5.0                       
##   [9] compiler_4.6.0                          
##  [10] RSQLite_2.4.5                           
##  [11] GenomicFeatures_1.63.1                  
##  [12] dir.expiry_1.19.0                       
##  [13] png_0.1-8                               
##  [14] systemfonts_1.3.1                       
##  [15] vctrs_0.7.1                             
##  [16] rvest_1.0.5                             
##  [17] stringr_1.6.0                           
##  [18] pkgconfig_2.0.3                         
##  [19] crayon_1.5.3                            
##  [20] fastmap_1.2.0                           
##  [21] dbplyr_2.5.1                            
##  [22] Rsamtools_2.27.0                        
##  [23] rmarkdown_2.30                          
##  [24] tzdb_0.5.0                              
##  [25] UCSC.utils_1.7.1                        
##  [26] ragg_1.5.0                              
##  [27] purrr_1.2.1                             
##  [28] bit_4.6.0                               
##  [29] xfun_0.56                               
##  [30] cachem_1.1.0                            
##  [31] cigarillo_1.1.0                         
##  [32] GenomeInfoDb_1.47.2                     
##  [33] jsonlite_2.0.0                          
##  [34] blob_1.3.0                              
##  [35] DelayedArray_0.37.0                     
##  [36] BiocParallel_1.45.0                     
##  [37] echoconda_0.99.10                       
##  [38] parallel_4.6.0                          
##  [39] R6_2.6.1                                
##  [40] bslib_0.9.0                             
##  [41] stringi_1.8.7                           
##  [42] limma_3.67.0                            
##  [43] reticulate_1.44.1                       
##  [44] jquerylib_0.1.4                         
##  [45] Rcpp_1.1.1                              
##  [46] bookdown_0.46                           
##  [47] SummarizedExperiment_1.41.0             
##  [48] knitr_1.51                              
##  [49] R.utils_2.13.0                          
##  [50] readr_2.1.6                             
##  [51] rentrez_1.2.4                           
##  [52] Matrix_1.7-4                            
##  [53] tidyselect_1.2.1                        
##  [54] abind_1.4-8                             
##  [55] yaml_2.3.12                             
##  [56] codetools_0.2-20                        
##  [57] curl_7.0.0                              
##  [58] regioneR_1.43.0                         
##  [59] lattice_0.22-7                          
##  [60] tibble_3.3.1                            
##  [61] Biobase_2.71.0                          
##  [62] basilisk.utils_1.23.1                   
##  [63] KEGGREST_1.51.1                         
##  [64] evaluate_1.0.5                          
##  [65] desc_1.4.3                              
##  [66] BiocFileCache_3.1.0                     
##  [67] zip_2.3.3                               
##  [68] xml2_1.5.2                              
##  [69] ExperimentHub_3.1.0                     
##  [70] pillar_1.11.1                           
##  [71] BiocManager_1.30.27                     
##  [72] filelock_1.0.3                          
##  [73] MatrixGenerics_1.23.0                   
##  [74] TxDb.Hsapiens.UCSC.hg19.knownGene_3.22.1
##  [75] DT_0.34.0                               
##  [76] RCurl_1.98-1.17                         
##  [77] BiocVersion_3.23.1                      
##  [78] hms_1.1.4                               
##  [79] glue_1.8.0                              
##  [80] tools_4.6.0                             
##  [81] AnnotationHub_4.1.0                     
##  [82] data.table_1.18.0                       
##  [83] GenomicAlignments_1.47.0                
##  [84] openxlsx_4.2.8.1                        
##  [85] GEOquery_2.79.0                         
##  [86] fs_1.6.6                                
##  [87] XML_3.99-0.20                           
##  [88] grid_4.6.0                              
##  [89] tidyr_1.3.2                             
##  [90] echodata_0.99.17                        
##  [91] AnnotationDbi_1.73.0                    
##  [92] basilisk_1.23.0                         
##  [93] restfulr_0.0.16                         
##  [94] cli_3.6.5                               
##  [95] rappdirs_0.3.4                          
##  [96] textshaping_1.0.4                       
##  [97] S4Arrays_1.11.1                         
##  [98] dplyr_1.1.4                             
##  [99] R.methodsS3_1.8.2                       
## [100] sass_0.4.10                             
## [101] digest_0.6.39                           
## [102] SparseArray_1.11.10                     
## [103] rjson_0.2.23                            
## [104] htmlwidgets_1.6.4                       
## [105] memoise_2.0.1                           
## [106] htmltools_0.5.9                         
## [107] pkgdown_2.2.0                           
## [108] R.oo_1.27.1                             
## [109] lifecycle_1.0.5                         
## [110] httr_1.4.7                              
## [111] MACSr_1.13.2                            
## [112] statmod_1.5.1                           
## [113] bit64_4.6.0-1