Import peaks — import_peaks • PeakyFinders

Import pre-computed peak files, or compute new peaks from bedGraph/bigWig files. Can import a subset of ranges specified by query_granges, or across the whole genome by setting query_granges=NULL.
Currently recognizes IDs from:

GEO :
ENCODE : See peaks_metadata_encode for example metadata.
ROADMAP : See peaks_metadata_roadmap for example metadata.
AnnotationHub : See peaks_metadata_annotationhub for example metadata.

Notable features:

Automatically infers which database each accession ID is from and organizes the outputs accordingly.
Automatically infers which function is needed to import which file types.
Automatically calls peaks from any bedGraph/bigWig files.
query_granges can be on a different genome build than the files being imported; it will be lifted over to the correct genome build with liftover_grlist.
When nThread>1, accelerates file importing and peak calling using multi-core parallelisation.

import_peaks(
  ids,
  builds = "hg19",
  query_granges = NULL,
  query_granges_build = NULL,
  split_chromosomes = FALSE,
  condense_queries = TRUE,
  force_new = FALSE,
  call_peaks_method = "MACSr",
  cutoff = NULL,
  searches = construct_searches(),
  peaks_dir = tempdir(),
  save_path = tempfile(fileext = "_PeakyFinders_grl.rds"),
  nThread = 1,
  verbose = TRUE
)

Arguments

ids

IDs from one of the supported databases. IDs can be at any level: file, sample, or experiment.

builds

Genome build that each sample in ids is aligned to. This determines whether the query_granges data need to be lifted over to a different genome build before querying. Can be a single character string applied to all ids (e.g. "hg19"), or a vector of the same length as ids, named using the ids (e.g. c("GSM4271282"="hg19", "ENCFF048VDO"="hg38")).
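The two accepted forms of builds can be sketched in base R as follows. The helper expand_builds is hypothetical (it is not part of PeakyFinders) and only illustrates how a single string might be recycled across ids; the IDs are the ones from the description above.

```r
ids <- c("GSM4271282", "ENCFF048VDO")

# Form 1: a single build applied to every id.
builds_single <- "hg19"

# Form 2: one build per id, named by id.
builds_named <- c("GSM4271282" = "hg19", "ENCFF048VDO" = "hg38")

# Hypothetical helper (an assumption, not the package's internal code):
# normalise either form to a character vector named by id.
expand_builds <- function(ids, builds) {
  if (length(builds) == 1L) {
    return(stats::setNames(rep(unname(builds), length(ids)), ids))
  }
  stopifnot(
    length(builds) == length(ids),
    !is.null(names(builds)),
    all(ids %in% names(builds))
  )
  builds[ids]
}

expanded_single <- expand_builds(ids, builds_single)
expanded_named  <- expand_builds(ids, builds_named)
print(expanded_single)
print(expanded_named)
```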

query_granges

[Optional] GRanges object indicating which genomic regions to extract from each sample.

query_granges_build

[Optional] Genome build that query_granges is aligned to.

split_chromosomes

Split a single-threaded query into a multi-threaded query across chromosomes. This can be especially helpful when calling peaks from large bigWig/bedGraph files. The number of threads used is set by the nThread argument.
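The general idea of splitting work by chromosome can be sketched with the base parallel package. This is an illustration only: whether PeakyFinders uses parallel::mclapply internally is an assumption, process_chromosome is a hypothetical placeholder for the real per-chromosome import or peak calling, and a data.frame stands in for a GRanges object.

```r
nThread <- 2

# Toy regions; a data.frame stands in for a GRanges object.
regions <- data.frame(
  chr   = c("chr1", "chr1", "chr2", "chr3"),
  start = c(100, 5000, 200, 50),
  end   = c(250, 5200, 300, 80)
)

# One chunk of work per chromosome.
by_chr <- split(regions, regions$chr)

# Hypothetical placeholder for the per-chromosome work:
# here it just sums the widths of the regions.
process_chromosome <- function(df) sum(df$end - df$start)

# mclapply relies on forking, which is unavailable on Windows,
# so fall back to a single core there.
cores <- if (.Platform$OS.type == "windows") 1L else nThread
per_chr <- parallel::mclapply(by_chr, process_chromosome, mc.cores = cores)

# Combine the per-chromosome results.
total <- sum(unlist(per_chr))
```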

condense_queries

Condense query_granges by taking the min/max position per chromosome (default: TRUE). This reduces the total number of queries; a large number of queries can cause memory allocation problems due to repeated calls to the underlying C libraries.
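A minimal sketch of the min/max condensing described above, written in base R with a data.frame standing in for a GRanges object. The function condense is hypothetical and is not the package's own implementation.

```r
# Toy query regions; a data.frame stands in for a GRanges object.
query <- data.frame(
  chr   = c("chr1", "chr1", "chr2", "chr2", "chr2"),
  start = c(100, 5000, 200, 900, 50),
  end   = c(250, 5200, 300, 1500, 80)
)

# Hypothetical helper: collapse all ranges on a chromosome into one
# range spanning the smallest start to the largest end.
condense <- function(df) {
  starts <- tapply(df$start, df$chr, min)
  ends   <- tapply(df$end, df$chr, max)
  data.frame(
    chr   = names(starts),
    start = as.numeric(starts),
    end   = as.numeric(ends[names(starts)])
  )
}

condensed <- condense(query)
print(condensed)
```

Five queries become two here. The trade-off is that each condensed range also covers the gaps between the original ranges, so more data may be retrieved per query.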

force_new

By default, results previously saved at save_path are imported instead of running new queries. Set force_new=TRUE to run new queries regardless and overwrite the existing save_path file.
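The caching behaviour described above can be sketched in base R. The names cached_query and run_query are hypothetical, and the real function may differ in detail; the sketch only shows the reuse-unless-forced pattern.

```r
# Hypothetical helper illustrating the caching pattern:
# reuse the saved file unless force_new is TRUE.
cached_query <- function(save_path, run_query, force_new = FALSE) {
  if (file.exists(save_path) && !force_new) {
    return(readRDS(save_path))
  }
  res <- run_query()
  saveRDS(res, save_path)  # overwrites any existing file
  res
}

save_path <- tempfile(fileext = "_PeakyFinders_grl.rds")

# Stand-in for an expensive query; counts how often it actually runs.
n_runs <- 0
run_query <- function() {
  n_runs <<- n_runs + 1
  list(result = n_runs)
}

first  <- cached_query(save_path, run_query)                    # runs the query
second <- cached_query(save_path, run_query)                    # read from save_path
third  <- cached_query(save_path, run_query, force_new = TRUE)  # runs again
```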

call_peaks_method

Method to call peaks with:

"MACSr" : Uses MACS3 via bdgpeakcall.

cutoff

The cutoff depends on which method was used to generate the score track. If the file contains p-value scores from MACS3, a score of 5 means a p-value of 1e-5. If NULL, a reasonable cutoff value will be inferred through a cutoff_analysis.
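The relationship between score and p-value stated above (score 5 corresponds to p-value 1e-5) is a -log10 transform. A short sketch, with hypothetical helper names that are not part of PeakyFinders or MACS3:

```r
# Convert between p-values and -log10(p) scores.
pvalue_to_score <- function(p) -log10(p)
score_to_pvalue <- function(score) 10^(-score)

score_for_1e5  <- pvalue_to_score(1e-5)
pvalue_for_5   <- score_to_pvalue(5)

# A stricter p-value threshold corresponds to a higher cutoff.
score_for_1e8  <- pvalue_to_score(1e-8)

print(c(score_for_1e5, score_for_1e8))
print(pvalue_for_5)
```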

searches

Named list of regex queries.

peaks_dir

Directory to save peaks to (only used when calling peaks from bedGraph files).

save_path

Path to save query results to in .rds format.

nThread

When nThread>1, accelerates file importing and peak calling using multi-core parallelisation.

verbose

Print messages.

Value

A nested named list of peak files in GRanges format. Nesting structure is as follows: database -> id -> GRanges object

Each GRanges object contains all the peak data that was found for that particular id, merged into one. You can differentiate the various source file types by looking at the column "peaktype". If peaks could not be recovered for a sample, that element will be set to NULL.
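The nesting described above can be navigated with ordinary list indexing. The sketch below uses a mock result so that it runs without any downloads: data.frames stand in for GRanges objects, the peak coordinates are invented, and the NULL entry for ENCFF048VDO is purely hypothetical.

```r
# Mock of the returned structure: database -> id -> peaks.
grl <- list(
  GEO = list(
    GSM945244 = data.frame(
      chr      = c("chr1", "chr2"),
      start    = c(100, 200),
      end      = c(150, 260),
      peaktype = c("narrowPeak", "narrowPeak")
    )
  ),
  ENCODE = list(
    ENCFF048VDO = NULL  # hypothetical: peaks could not be recovered
  )
)

# Extract the peaks for one id within one database.
peaks <- grl[["GEO"]][["GSM945244"]]

# Tabulate the source file types.
peaktype_counts <- table(peaks$peaktype)

# List the ids whose peaks could not be recovered (NULL elements).
failed <- unlist(lapply(grl, function(db) {
  names(db)[vapply(db, is.null, logical(1))]
}))
print(failed)
```

With real output, `peaks$peaktype` works the same way because `$` on a GRanges object accesses its metadata columns.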

Examples

grl <- PeakyFinders::import_peaks(
    ids = c("GSM945244"),# "ENCSR000AHD"
    searches = PeakyFinders::construct_searches(keys = "narrowpeak"))
#> Processing id(s).
#> 1 unique GEO id(s) identified.
#> Querying 1 id(s) from: GEO
#> Processing id: >>> GSM945244 <<<
#> Determining available file types.
#> Found file link(s) for 1 category.
#> narrowpeak : 
#> >>> https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM945nnn/GSM945244/suppl//GSM945244_hg19_wgEncodeUwHistoneA549H3k04me3StdPkRep1.narrowPeak.gz
#> >>> https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM945nnn/GSM945244/suppl//GSM945244_hg19_wgEncodeUwHistoneA549H3k04me3StdPkRep2.narrowPeak.gz
#> 
#> Importing pre-computed narrowPeak files.
#> 
#> Saving results ==>  /tmp/RtmppYFQrz/file163c479b1747_PeakyFinders_grl.rds