Prioritise target genes using the following multi-step filtering procedure:
Disease-level: keep_deaths
:
Keep only diseases with a certain age of death.
Disease-level: severity_threshold_max
:
Keep only diseases annotated as a certain degree of severity or greater
(filters on maximum severity per disease).
Phenotype-level: prune_ancestors
:
Remove redundant ancestral phenotypes when at least one of their
descendants already exists.
Phenotype-level: keep_descendants
:
Keep only phenotypes belonging to a certain branch of the HPO,
as defined by an ancestor term.
Phenotype-level: keep_ont_levels
:
Keep only phenotypes at certain absolute ontology levels within the HPO.
Phenotype-level: pheno_ndiseases_threshold
:
The maximum number of diseases each phenotype can be associated with.
Phenotype-level: keep_tiers
:
Keep only phenotypes with high severity Tiers.
Phenotype-level: severity_threshold
:
Keep only phenotypes with mean Severity equal to or below the threshold.
Phenotype-level: gpt_filters
:
Keep only phenotypes with certain GPT annotations in specific
severity metrics.
Phenotype-level: severity_score_gpt_threshold
:
Keep only phenotypes with a minimum GPT severity score.
Phenotype-level: info_content_threshold
:
Keep only phenotypes with a minimum information content score
(computed from the HPO).
Symptom-level: pheno_frequency_threshold
:
Keep only phenotypes with mean frequency equal to or above the threshold
(i.e. how frequently a phenotype is associated with any diseases in
which it occurs).
Symptom-level: keep_onsets
:
Keep only symptoms with a certain age of onset.
Symptom-level: symptom_p_threshold
:
Uncorrected p-value threshold to filter cell type-symptom associations by.
Symptom-level: symptom_intersection_threshold
:
Minimum proportion of genes overlapping between a symptom gene list
(phenotype-associated genes in the context of a particular disease)
and the phenotype-cell type association driver genes.
Cell type-level: q_threshold
:
Keep only cell type-phenotype association results at q<=0.05.
Cell type-level: effect_threshold
:
Keep only cell type-phenotype association results at effect size>=1.
Cell type-level: keep_celltypes
:
Keep only terminally differentiated cell types.
Gene-level: keep_chr
:
Remove genes on non-standard chromosomes.
Gene-level: evidence_score_threshold
:
Remove genes that are below an aggregate phenotype-gene
evidence score threshold.
Gene-level: gene_size
:
Keep only genes within a given size range (e.g. <4.3kb in length).
Gene-level: add_driver_genes
:
Keep only genes that are driving the association with a given phenotype
(inferred by the intersection of phenotype-associated genes and genes with
high-specificity quantiles in the target cell type).
Gene-level: keep_biotypes
:
Keep only genes belonging to certain biotypes.
Gene-level: gene_frequency_threshold
:
Keep only genes at or above a certain mean frequency threshold
(i.e. how frequently a gene is associated with a given phenotype
when observed within a disease).
Gene-level: keep_specificity_quantiles
:
Keep only genes in top specificity quantiles
from the cell type dataset (CTD).
Gene-level: keep_mean_exp_quantiles
:
Keep only genes in top mean expression quantiles
from the cell type dataset (CTD).
Gene-level: symptom_gene_overlap
:
Ensure that genes nominated at the phenotype-level also
appear in the genes overlapping at the cell type-specific symptom-level.
All levels: sort_cols
:
Sort candidate targets by one or more columns
(e.g. "severity_score_gpt", "q").
All levels: top_n
:
Only return the top N targets per variable group
(specified with the "group_vars" argument).
For example, setting "group_vars" to "hpo_id" and "top_n" to 1 would
only return one target (row) per phenotype ID after sorting.
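The "top_n" and "group_vars" behaviour described above can be sketched with data.table. This is a minimal illustration on a made-up table, not the package's internal code: the column names "gene_symbol" and "q", the placeholder IDs, and all values are assumptions chosen for the example.

```r
library(data.table)

# Toy table of candidate targets (hypothetical columns and values)
targets <- data.table(
  hpo_id = c("pheno_1", "pheno_1", "pheno_2", "pheno_2"),
  gene_symbol = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  q = c(0.04, 0.01, 0.03, 0.002)
)

group_vars <- "hpo_id"
top_n <- 1

# Sort first so that the most significant association comes first
setorderv(targets, cols = "q", order = 1L)

# Then keep the first top_n rows within each group
top_targets <- targets[, head(.SD, top_n), by = group_vars]
print(top_targets)
```

Because the rows are sorted before grouping, the row kept for each phenotype is the one ranked highest by the sort columns.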
prioritise_targets(
  results = load_example_results(),
  ctd_list = load_example_ctd(c("ctd_DescartesHuman.rds", "ctd_HumanCellLandscape.rds"),
    multi_dataset = TRUE),
  phenotype_to_genes = HPOExplorer::load_phenotype_to_genes(),
  hpo = HPOExplorer::get_hpo(),
  keep_deaths = HPOExplorer::list_deaths(exclude = c("Miscarriage", "Stillbirth",
    "Prenatal death"), include_na = TRUE),
  keep_descendants = c("Phenotypic abnormality"),
  keep_ont_levels = NULL,
  pheno_ndiseases_threshold = NULL,
  gpt_filters = NULL,
  severity_score_gpt_threshold = 20,
  keep_tiers = NULL,
  severity_threshold_max = NULL,
  info_content_threshold = 8,
  run_prune_ancestors = TRUE,
  severity_threshold = NULL,
  pheno_frequency_threshold = NULL,
  keep_onsets = HPOExplorer::list_onsets(include_na = TRUE),
  effect_var = "logFC",
  q_threshold = 0.05,
  effect_threshold = 1,
  symptom_intersection_threshold = 0.25,
  keep_celltypes = NULL,
  evidence_score_threshold = 15,
  keep_chr = c(seq(22), "X", "Y"),
  gene_size = list(min = 0, max = Inf),
  gene_frequency_threshold = NULL,
  keep_biotypes = NULL,
  keep_specificity_quantiles = seq(30, 40),
  keep_mean_exp_quantiles = seq(30, 40),
  sort_cols = c(severity_score_gpt = -1, q = 1, logFC = -1, specificity = -1,
    mean_exp = -1, pheno_freq_mean = -1, gene_freq_mean = -1, width = 1),
  top_n = NULL,
  group_vars = c("hpo_id"),
  return_report = TRUE,
  verbose = TRUE,
  save_path = tempfile(fileext = ".rds"),
  force_new = FALSE
)
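To make the gene-level defaults more concrete, the sketch below evaluates the default "keep_chr" and "gene_size" expressions and applies them to a toy gene table in base R. The "chr" column name, the chromosome naming style, and all values are assumptions for illustration; the function's actual filtering code may differ.

```r
# Default expressions taken from the usage: seq(22) is coerced to character
# when combined with "X" and "Y", giving 24 chromosome names
keep_chr <- c(seq(22), "X", "Y")
gene_size <- list(min = 0, max = Inf)

# Toy gene table (hypothetical columns and values)
genes <- data.frame(
  gene_symbol = c("GENE_A", "GENE_B", "GENE_C"),
  chr = c("1", "X", "KI270728.1"),
  width = c(3500, 120000, 900),
  stringsAsFactors = FALSE
)

# Keep genes on the listed chromosomes and within the size range
keep <- genes$chr %in% keep_chr &
  genes$width >= gene_size$min &
  genes$width <= gene_size$max
filtered <- genes[keep, ]
print(filtered$gene_symbol)
```

With the default size range of 0 to Inf, only the chromosome filter removes anything here; narrowing "max" would additionally drop long genes.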
results: The cell type-phenotype enrichment results generated by gen_results and merged together with merge_results.
ctd_list: A named list of CellTypeDataset objects, each created with generate_celltype_data.
phenotype_to_genes: Output of load_phenotype_to_genes mapping phenotypes to gene annotations.
hpo: Human Phenotype Ontology object, loaded from get_ontology.
keep_deaths: The age of death associated with each HPO ID to keep. If >1 age of death is associated with the term, only the earliest age is considered. See add_death for details.
keep_descendants: Terms whose descendants should be kept (including themselves). Set to NULL to skip this filtering step.
keep_ont_levels: Only keep phenotypes at certain absolute ontology levels. See add_ont_lvl for details.
pheno_ndiseases_threshold: Filter phenotypes by the maximum number of diseases they are associated with.
gpt_filters: A named list of filters to apply to the GPT annotations.
severity_score_gpt_threshold: The minimum GPT severity score that a phenotype can have across any disease.
keep_tiers: Tiers from hpo_tiers to keep. Include NA if you wish to retain phenotypes that do not have any Tier assignment.
severity_threshold_max: The max severity score that a phenotype can have across any disease.
info_content_threshold: Minimum phenotype information content threshold.
run_prune_ancestors: Prune redundant ancestral terms if any of their descendants are present. Passes to prune_ancestors.
severity_threshold: Only keep phenotypes with a mean severity score (averaged across multiple associated diseases) below the set threshold. The severity score ranges from 1-4 where 1 is the MOST severe. Include NA if you wish to retain phenotypes that do not have any severity score.
pheno_frequency_threshold: Only keep phenotypes with frequency above the set threshold. Frequency ranges from 0-100 where 100 is a phenotype that occurs 100% of the time in all associated diseases. Include NA if you wish to retain phenotypes that do not have any frequency data. See add_pheno_frequency for details.
keep_onsets: The age of onset associated with each HPO ID to keep. If >1 age of onset is associated with the term, only the earliest age is considered. See add_onset for details.
effect_var: Name of the effect size column in the results.
q_threshold: The q value threshold to subset the results by.
effect_threshold: The minimum fold change in specific expression to subset the results by.
symptom_intersection_threshold: Minimum proportion of genes overlapping between a symptom gene list (phenotype-associated genes in the context of a particular disease) and the phenotype-cell type association driver genes.
keep_celltypes: Cell types to keep.
evidence_score_threshold: The minimum threshold of mean evidence scores of each gene-phenotype association to keep.
keep_chr: Chromosomes to keep.
gene_size: Min/max gene size (important for therapeutics design).
gene_frequency_threshold: Only keep genes with frequency above the set threshold. Frequency ranges from 0-100 where 100 is a gene that occurs 100% of the time in a given phenotype. Include NA if you wish to retain genes that do not have any frequency data. See add_gene_frequency for details.
keep_biotypes: Which gene biotypes to keep (e.g. "protein_coding", "processed_transcript", "snRNA", "lincRNA", "snoRNA", "IG_C_gene").
keep_specificity_quantiles: Which cell type specificity quantiles to keep (max quantile is 40).
keep_mean_exp_quantiles: Which cell type mean expression quantiles to keep (max quantile is 40).
sort_cols: How to sort the rows using setorderv. names(sort_cols) will be supplied to the cols= argument and values will be supplied to the order= argument.
top_n: Top N genes to keep when grouping by group_vars.
group_vars: Columns to group by when selecting top_n genes.
return_report: If TRUE, will return a named list containing a report that shows the number of phenotypes/celltypes/genes remaining after each filtering step.
verbose: Print messages.
save_path: Path to save results to.
force_new: Don't use previously saved results when TRUE.
A data.table of the prioritised phenotype- and cell type-specific gene targets.
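The way a named "sort_cols" vector maps onto data.table::setorderv can be sketched as follows. The table and its values are made up for illustration, and the explicit as.integer() conversion is a choice made for this sketch rather than something the documentation specifies.

```r
library(data.table)

# Toy results table (hypothetical values)
dt <- data.table(
  gene_symbol = c("GENE_A", "GENE_B", "GENE_C"),
  severity_score_gpt = c(10, 30, 30),
  q = c(0.01, 0.04, 0.001)
)

# Names are the columns to sort by; values are the sort directions
# (1 = ascending, -1 = descending)
sort_cols <- c(severity_score_gpt = -1, q = 1)

# Names go to cols=, values go to order=; setorderv sorts dt by reference
setorderv(dt, cols = names(sort_cols), order = as.integer(sort_cols))
print(dt$gene_symbol)
```

Columns listed earlier take precedence, so here rows are ranked by descending GPT severity score and ties are broken by ascending q value.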
Term key:
Disease: A disease defined in the OMIM, DECIPHER and/or Orphanet databases.
Phenotype: A clinical feature associated with one or more diseases.
Symptom: A phenotype within the context of a particular disease. Within a given phenotype, there may be multiple symptoms with partially overlapping genetic mechanisms.
Association: A cell type-specific enrichment test result conducted at the disease-level, phenotype-level, or symptom-level.
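The distinction between a phenotype and its symptoms can be illustrated with a small base R sketch of the overlap used by "symptom_intersection_threshold". All gene sets here are made up, and the choice of denominator (the size of the symptom gene list) is an assumption of this sketch; the package may compute the proportion differently.

```r
# Hypothetical driver genes for one phenotype-cell type association
driver_genes <- c("GENE_A", "GENE_B", "GENE_C", "GENE_D")

# Hypothetical symptom gene lists: the same phenotype in two diseases
symptom_genes <- list(
  disease_1 = c("GENE_A", "GENE_B", "GENE_X"),
  disease_2 = c("GENE_Y", "GENE_Z", "GENE_W", "GENE_D", "GENE_V")
)

# ASSUMPTION: proportion = |symptom genes that are driver genes| / |symptom genes|
overlap_prop <- vapply(
  symptom_genes,
  function(genes) length(intersect(genes, driver_genes)) / length(genes),
  numeric(1)
)

# Keep symptoms whose overlap meets the threshold
symptom_intersection_threshold <- 0.25
kept <- names(overlap_prop)[overlap_prop >= symptom_intersection_threshold]
print(kept)
```

In this toy case the two symptoms share a phenotype but only one of them overlaps enough with the driver genes to be retained.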
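if (FALSE) { # \dontrun{
# Pre-filter the example results to significant associations (q < 0.05)
results <- load_example_results()[q < 0.05]
# Run the prioritisation procedure with default settings
out <- prioritise_targets(results = results)
} # }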