Generates EWCE results on multiple gene lists in parallel by calling ewce_para. It allows you to stop the analysis and continue later from where you left off, because it checks the results output directory for finished gene lists and removes them from the input. It also excludes gene lists with fewer than 4 unique genes (which cause errors in EWCE analysis).
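The filtering and resume behaviour described above can be sketched in base R. This is only an illustration, not the package's implementation: the helper name `pending_lists` is hypothetical, and the assumption that each finished gene list is stored as a "<list name>.rds" file in the temporary results folder is made up for the example.

```r
# Hypothetical helper: decide which gene lists still need to be analysed.
pending_lists <- function(gene_data, list_names, save_dir_tmp,
                          list_name_column = "hpo_id",
                          gene_column = "gene_symbol",
                          min_genes = 4) {
  # Count unique genes per list and keep lists that meet the minimum.
  genes_by_list <- split(gene_data[[gene_column]],
                         gene_data[[list_name_column]])
  n_unique <- vapply(genes_by_list, function(g) length(unique(g)), integer(1))
  valid <- intersect(list_names, names(n_unique)[n_unique >= min_genes])
  # Assumed file layout: one "<list name>.rds" file per finished list.
  finished <- sub("\\.rds$", "", list.files(save_dir_tmp, pattern = "\\.rds$"))
  # Lists that are valid and not yet finished remain in the input.
  setdiff(valid, finished)
}

# Toy data: list "A" has 4 unique genes, "B" has 2, "C" has 5.
gene_data <- data.frame(
  hpo_id = rep(c("A", "B", "C"), times = c(4, 2, 5)),
  gene_symbol = c(paste0("g", 1:4), "g1", "g2", paste0("g", 1:5)),
  stringsAsFactors = FALSE
)
tmp <- file.path(tempdir(), "gen_results_sketch")
dir.create(tmp, showWarnings = FALSE)
saveRDS(list(), file.path(tmp, "A.rds"))  # pretend "A" already finished
todo <- pending_lists(gene_data, c("A", "B", "C"), tmp)
print(todo)
```

In this toy case "B" is dropped for having too few unique genes and "A" is skipped because a result file already exists, so only the remaining list would be analysed.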
gen_results(
  ctd,
  gene_data,
  list_name_column = "hpo_id",
  gene_column = "gene_symbol",
  list_names = unique(gene_data[[list_name_column]]),
  reps = 100,
  annotLevel = 1,
  genelistSpecies = "human",
  sctSpecies = "human",
  force_new = FALSE,
  bg = get_bg(species1 = genelistSpecies, species2 = sctSpecies, overwrite = force_new),
  min_genes = 4,
  cores = 1,
  parallel_boot = FALSE,
  save_dir_tmp = NULL,
  save_dir = tempdir(),
  verbose = 1,
  ...
)
ctd
Cell Type Data list generated using generate_celltype_data.
gene_data
Data frame of gene list names and genes (see get_gene_lists).
list_name_column
The name of the gene_data column that contains the gene list names.
gene_column
The name of the gene_data column that contains the genes.
list_names
Character vector of gene list names.
reps
Number of random gene lists to generate (Default: 100, but should be >=10,000 for publication-quality results).
annotLevel
An integer indicating which level of sct_data to analyse (Default: 1).
genelistSpecies
Species that hits genes came from (no longer limited to just "mouse" and "human"). See list_species for all available species.
sctSpecies
Species that sct_data is currently formatted as (no longer limited to just "mouse" and "human"). See list_species for all available species.
force_new
Overwrite previous results in the save_dir_tmp.
bg
List of gene symbols containing the background gene list (including hit genes). If bg=NULL, an appropriate gene background will be created automatically.
min_genes
Minimum number of genes per list (Default: 4).
cores
The number of cores to run in parallel (integer, e.g. 8).
parallel_boot
Parallelise at the level of bootstrap iterations, rather than across gene lists.
save_dir_tmp
Folder to save intermediate results files to (one file per gene list). Set to NULL to skip saving temporary files.
save_dir
Folder to save merged results in.
verbose
Print messages.
...
Arguments passed on to EWCE::bootstrap_enrichment_test
sct_data
List generated using generate_celltype_data.
hits
List of gene symbols containing the target gene list. Will automatically be converted to human gene symbols if geneSizeControl=TRUE.
sctSpecies_origin
Species that the sct_data originally came from, regardless of its current gene format (e.g. it was previously converted from mouse to human gene orthologs). This is used for computing an appropriate background.
output_species
Species to convert sct_data and hits to (Default: "human"). See list_species for all available species.
method
R package to use for gene mapping:
"gprofiler": Slower but more species and genes.
"homologene": Faster but fewer species and genes.
"babelgene": Faster but fewer species and genes. Also gives consensus scores for each gene mapping based on several different data sources.
no_cores
Number of cores to parallelise bootstrapping reps over.
geneSizeControl
Whether you want to control for GC content and transcript length. Recommended if the gene list originates from genetic studies (Default: FALSE). If set to TRUE, then hits must be from humans.
controlledCT
[Optional] If not NULL, this should be the name of a cell type; the bootstrapping then controls for expression within that cell type.
mtc_method
Multiple-testing correction method (passed to p.adjust).
sort_results
Sort enrichment results from smallest to largest p-values.
standardise_sct_data
Should sct_data be standardised? If TRUE:
When sctSpecies!=output_species, the sct_data will be checked for object formatting and the genes will be converted to the orthologs of the output_species with standardise_ctd (which calls map_genes internally).
When sctSpecies==output_species, the sct_data will be checked for object formatting with standardise_ctd, but the gene names will remain untouched.
standardise_hits
Should hits be standardised? If TRUE:
When genelistSpecies!=output_species, the genes will be converted to the orthologs of the output_species with convert_orthologs.
When genelistSpecies==output_species, the genes will be standardised with map_genes.
If FALSE, hits will be passed on to subsequent steps as-is.
localHub
If working offline, add argument localHub=TRUE to work with a local, non-updated hub. It will only have resources available that have previously been downloaded. If offline, please also see the BiocManager vignette section on offline use to ensure proper functionality.
store_gene_data
Store sampled gene data for every bootstrap iteration. When the number of bootstrap reps is very high (>=100k) and/or the number of genes in hits is very high, you may want to set store_gene_data=FALSE to avoid using excessive amounts of memory.
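To make the roles of reps and mtc_method more concrete, here is a minimal base R sketch of a bootstrap enrichment p-value followed by multiple-testing correction with p.adjust. It is an illustration under simplifying assumptions, not the statistic or sampling scheme EWCE uses internally: the function `boot_p`, the toy specificity scores, and the choice of "BH" as the correction method are all made up for the example.

```r
set.seed(1)

# Toy specificity scores for one cell type across a background of 1,000 genes.
bg <- paste0("gene", 1:1000)
specificity <- setNames(runif(length(bg)), bg)
hits <- bg[1:20]
specificity[hits] <- specificity[hits] + 0.5  # make the hit genes stand out

# Hypothetical helper: proportion of random gene lists that score at least
# as high as the observed hit list.
boot_p <- function(hits, specificity, reps) {
  observed <- mean(specificity[hits])
  # Draw `reps` random gene lists of the same size from the background.
  null <- replicate(reps, mean(sample(specificity, length(hits))))
  mean(null >= observed)
}

# The p-value is a count out of `reps`, so more reps give finer resolution.
p_100 <- boot_p(hits, specificity, reps = 100)
print(p_100)

# Correct a vector of raw p-values (one per test) with p.adjust.
p_raw <- c(0.001, 0.01, 0.02, 0.2)
p_adj <- p.adjust(p_raw, method = "BH")
print(p_adj)
```

Because the estimate is a count divided by reps, a run with reps = 100 cannot distinguish p-values smaller than 0.01, which is one way to read the recommendation to use many more reps for publication-quality results.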
All results as a data frame.
The gene_data should be a data frame that contains a column of gene list names (e.g. the column may be called "hpo_name"), and a column of genes (e.g. "gene_symbol"). For example:
hpo_name         | gene_symbol
"Abnormal heart" | gene X
"Abnormal heart" | gene Y
"Poor vision"    | gene Z
"Poor vision"    | gene Y
etc...
For more information, see the documentation for get_gene_lists.
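As a small sketch of the gene_data layout described above, the toy table can be built as a plain data frame in base R. The values are the placeholder entries from the table, and the variable `hits` is used here only to illustrate pulling out the genes of one list.

```r
# Toy gene_data: one row per (gene list, gene) pair.
gene_data <- data.frame(
  hpo_name = c("Abnormal heart", "Abnormal heart", "Poor vision", "Poor vision"),
  gene_symbol = c("gene X", "gene Y", "gene Z", "gene Y"),
  stringsAsFactors = FALSE
)

# Column names as they would be supplied via list_name_column / gene_column.
list_name_column <- "hpo_name"
gene_column <- "gene_symbol"

# Same expression as the default for the list_names argument.
list_names <- unique(gene_data[[list_name_column]])
print(list_names)

# Genes belonging to a single list.
hits <- unique(gene_data[gene_data[[list_name_column]] == "Poor vision",
                         gene_column])
print(hits)
```

Note that the same gene can appear in several lists, which is why the table stores one row per list and gene pair rather than one row per gene.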
gene_data <- HPOExplorer::load_phenotype_to_genes()
#> Reading cached RDS file: phenotype_to_genes.txt
#> + Version: v2023-10-09
list_names <- unique(gene_data$hpo_id)[seq(5)]
ctd <- load_example_ctd()
all_results <- gen_results(ctd = ctd,
                           gene_data = gene_data,
                           list_names = list_names,
                           reps = 10)
#> Validating gene lists..
#> 2 / 5 gene lists are valid.
#> Useing cached bg.
#> + Version: 2023-11-14
#> Background contains 62,663 genes.
#> Computing gene counts.
#> Computing gene counts.
#> Done in: 5.5 seconds.
#>
#> Saving results ==> /tmp/Rtmp0tNWxK/gen_results.rds