Import pre-computed peak files, or compute new peaks from bedGraph/bigWig files.
Can import a subset of ranges specified by query_granges, or across the whole genome by setting query_granges=NULL.
Currently recognizes IDs from:
GEO
ENCODE: See peaks_metadata_encode for example metadata.
ROADMAP: See peaks_metadata_roadmap for example metadata.
AnnotationHub: See peaks_metadata_annotationhub for example metadata.
Notable features:
Automatically infers which database each accession ID is from and organizes the outputs accordingly.
Automatically infers which function is needed to import which file types.
Automatically calls peaks from any bedGraph/bigWig files.
query_granges can be a different genome build than the files being imported, as the query_granges will be lifted over to the correct genome build with liftover_grlist.
When nThread>1, accelerates file importing and peak calling using multi-core parallelisation.
import_peaks(
  ids,
  builds = "hg19",
  query_granges = NULL,
  query_granges_build = NULL,
  split_chromosomes = FALSE,
  condense_queries = TRUE,
  force_new = FALSE,
  call_peaks_method = "MACSr",
  cutoff = NULL,
  searches = construct_searches(),
  peaks_dir = tempdir(),
  save_path = tempfile(fileext = "_PeakyFinders_grl.rds"),
  nThread = 1,
  verbose = TRUE
)
ids: IDs from one of the supported databases. IDs can be at any level: file, sample, or experiment.
builds: Genome build that each sample in ids is aligned to.
This determines whether the query_granges data need to be lifted over to a different genome build before querying.
Can be a single character string applied to all ids (e.g. "hg19"), or a vector of the same length as ids, named using the ids (e.g. c("GSM4271282"="hg19", "ENCFF048VDO"="hg38")).
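The two accepted forms of builds can be illustrated with a small base R sketch. The helper expand_builds below is hypothetical (it is not a PeakyFinders function) and only shows one way a single build string or a per-ID named vector might be normalised to one build per ID; how the package handles this internally may differ.

```r
# Hypothetical helper (not part of PeakyFinders): normalise `builds`
# to a character vector with one entry per id, named by id.
expand_builds <- function(ids, builds = "hg19") {
  if (length(builds) == 1L) {
    # A single build is recycled across all ids.
    out <- rep(unname(builds), length(ids))
    names(out) <- ids
    return(out)
  }
  if (length(builds) != length(ids)) {
    stop("`builds` must have length 1 or the same length as `ids`.")
  }
  if (is.null(names(builds))) {
    # Assumption: unnamed vectors follow the order of `ids`.
    names(builds) <- ids
  }
  missing_ids <- setdiff(ids, names(builds))
  if (length(missing_ids) > 0L) {
    stop("No build supplied for: ", paste(missing_ids, collapse = ", "))
  }
  # Reorder so the result follows the order of `ids`.
  builds[ids]
}

ids <- c("GSM4271282", "ENCFF048VDO")
builds_single <- expand_builds(ids, "hg19")
builds_named <- expand_builds(
  ids,
  c("ENCFF048VDO" = "hg38", "GSM4271282" = "hg19")
)
```

Naming the vector by ID makes the mapping independent of the order in which the IDs are listed.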
query_granges: [Optional] GRanges object indicating which genomic regions to extract from each sample.
query_granges_build: [Optional] Genome build that query_granges is aligned to.
split_chromosomes: Split a single-threaded query into a multi-threaded query across chromosomes.
This can be especially helpful when calling peaks from large bigWig/bedGraph files.
The number of threads used is set by the nThread argument.
condense_queries: Condense query_granges by taking the min/max position per chromosome (default: TRUE).
This reduces the total number of queries; a large number of queries can cause memory allocation problems due to repeated calls to the underlying C libraries.
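The condensing step described above can be sketched in base R, using a plain data.frame as a stand-in for a GRanges object. The function name condense_ranges and the toy coordinates are illustrative assumptions, not the package's implementation.

```r
# Hypothetical sketch: collapse many query ranges into one span per
# chromosome by taking the minimum start and maximum end.
condense_ranges <- function(df) {
  starts <- tapply(df$start, df$seqnames, min)
  ends <- tapply(df$end, df$seqnames, max)
  data.frame(
    seqnames = names(starts),
    start = as.numeric(starts),
    end = as.numeric(ends),
    stringsAsFactors = FALSE
  )
}

# Toy query: three ranges on chr1 and one on chr2.
query <- data.frame(
  seqnames = c("chr1", "chr1", "chr2", "chr1"),
  start = c(500, 100, 2000, 900),
  end = c(600, 250, 2500, 1200),
  stringsAsFactors = FALSE
)

# Four ranges become two: one span per chromosome.
condensed <- condense_ranges(query)
```

The trade-off is that each condensed span also covers the gaps between the original ranges, so fewer queries are issued at the cost of retrieving some data outside the regions of interest.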
force_new: By default, saved results at the same save_path will be imported instead of running new queries. However, you can override this by setting force_new=TRUE to perform new queries regardless and overwrite the old save_path file.
call_peaks_method: Method to call peaks with:
"MACSr": Uses MACS3 via bdgpeakcall.
cutoff: The appropriate cutoff depends on which method was used to generate the score track.
If the file contains p-value scores from MACS3, a score of 5 means a p-value of 1e-5.
If NULL, a reasonable cutoff value will be inferred through a cutoff_analysis.
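The relationship between a score of 5 and a p-value of 1e-5 is a -log10 transform. A short sketch of the conversion follows; the function names are illustrative and not part of PeakyFinders or MACS3.

```r
# Convert between a p-value and a -log10(p-value) score.
# Illustrative helper names; not PeakyFinders or MACS3 functions.
pvalue_to_score <- function(p) -log10(p)
score_to_pvalue <- function(score) 10^(-score)

cutoff_score <- 5
cutoff_pvalue <- score_to_pvalue(cutoff_score)

# Going the other way: the score that corresponds to p = 0.001.
score_for_p001 <- pvalue_to_score(0.001)
```

A higher cutoff is therefore more stringent: each additional unit of score corresponds to a tenfold smaller p-value.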
searches: Named list of regex queries.
peaks_dir: Directory to save peaks to (only used when calling peaks from bedGraph files).
save_path: Path to save query results to in .rds format.
nThread: When nThread>1, accelerates file importing and peak calling using multi-core parallelisation.
verbose: Print messages.
A nested named list of peak files in GRanges format.
Nesting structure is as follows:
database -> id -> GRanges object
Each GRanges object contains all the peak data that was found for that particular id, merged into one.
You can differentiate the various source file types by looking at the column "peaktype".
If peaks could not be recovered for a sample, that element will be set to NULL.
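The nesting described above can be navigated with ordinary list operations. The sketch below uses a toy stand-in for the return value: data.frames replace the GRanges objects, and the coordinates and "peaktype" values are made up for illustration.

```r
# Toy stand-in for the returned structure: database -> id -> peaks.
# data.frames are used here in place of GRanges objects, and all
# values (including the peaktype labels) are invented.
grl <- list(
  GEO = list(
    GSM945244 = data.frame(
      seqnames = c("chr1", "chr1", "chr2"),
      start = c(100, 400, 50),
      end = c(200, 600, 90),
      peaktype = c("narrowPeak", "narrowPeak", "broadPeak"),
      stringsAsFactors = FALSE
    )
  ),
  ENCODE = list(
    ENCSR000AHD = NULL # peaks could not be recovered for this id
  )
)

# Extract the peaks for one id.
peaks <- grl[["GEO"]][["GSM945244"]]

# Count peaks per source file type.
peaktype_counts <- table(peaks$peaktype)

# List the ids for which no peaks were recovered (NULL elements).
failed <- unlist(
  lapply(grl, function(db) names(db)[vapply(db, is.null, logical(1))]),
  use.names = FALSE
)
```

With real GRanges objects, metadata columns such as "peaktype" can be accessed with `$` in the same way.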
grl <- PeakyFinders::import_peaks(
  ids = c("GSM945244"), # "ENCSR000AHD"
  searches = PeakyFinders::construct_searches(keys = "narrowpeak")
)
#> Processing id(s).
#> 1 unique GEO id(s) identified.
#> Querying 1 id(s) from: GEO
#> Processing id: >>> GSM945244 <<<
#> Determining available file types.
#> Found file link(s) for 1 category.
#> narrowpeak :
#> >>> https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM945nnn/GSM945244/suppl//GSM945244_hg19_wgEncodeUwHistoneA549H3k04me3StdPkRep1.narrowPeak.gz
#> >>> https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM945nnn/GSM945244/suppl//GSM945244_hg19_wgEncodeUwHistoneA549H3k04me3StdPkRep2.narrowPeak.gz
#>
#> Importing pre-computed narrowPeak files.
#>
#> Saving results ==> /tmp/RtmppYFQrz/file163c479b1747_PeakyFinders_grl.rds