Convert gene-phenotype associations from the Human Phenotype Ontology (HPO)
into a gene x phenotype matrix. The returned matrix is sparse and binary,
such that 1 indicates that the gene is associated with a given phenotype
according to the HPO annotation, and 0 indicates it is not.
Column names are set by the right-hand side of the formula argument.
The default, formula = "gene_symbol ~ hpo_id", labels the columns with HPO IDs
rather than full phenotype names (e.g. "Abnormality of body height").
Phenotypes that are not present in the phenotype_to_genes annotations
are omitted from the final matrix.
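To illustrate the kind of object described above, here is a minimal sketch that builds a sparse, binary gene x phenotype matrix from a toy long-format table. The gene symbols, phenotype labels, and the name X_toy are hypothetical, and the sketch uses Matrix::sparseMatrix directly; it is not necessarily how hpo_to_matrix builds its result internally.

```r
library(Matrix)

# Toy long-format annotations (hypothetical values, for illustration only).
annot <- data.frame(
  gene_symbol = c("GENE_A", "GENE_A", "GENE_B", "GENE_C"),
  hpo_id      = c("HP:toy1", "HP:toy2", "HP:toy2", "HP:toy3"),
  stringsAsFactors = FALSE
)

# Drop duplicated gene-phenotype pairs so every stored entry equals 1.
annot <- unique(annot)

genes  <- sort(unique(annot$gene_symbol))
phenos <- sort(unique(annot$hpo_id))

# One stored 1 per gene-phenotype pair; all other cells are implicit 0s.
X_toy <- sparseMatrix(
  i = match(annot$gene_symbol, genes),
  j = match(annot$hpo_id, phenos),
  x = rep(1, nrow(annot)),
  dims = c(length(genes), length(phenos)),
  dimnames = list(genes, phenos)
)
```

Only the four annotated pairs are stored; a phenotype with no annotated gene never appears as a column, which mirrors the omission rule stated above.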
hpo_to_matrix(
  terms = NULL,
  phenotype_to_genes = load_phenotype_to_genes(),
  formula = "gene_symbol ~ hpo_id",
  fun.aggregate = mean,
  value.var = "evidence_score_sum",
  fill = 0,
  run_cor = FALSE,
  as_matrix = TRUE,
  as_sparse = TRUE,
  method = "pearson",
  verbose = TRUE
)
terms: A subset of HPO IDs to include. Set to NULL (default) to include all terms.
phenotype_to_genes: Output of load_phenotype_to_genes mapping phenotypes to gene annotations.
formula: A formula of the form LHS ~ RHS used to cast the data, where LHS gives the rows and RHS gives the columns (e.g. "gene_symbol ~ hpo_id").
fun.aggregate: Function used to aggregate the data before casting. If the formula does not identify a single observation for each cell and no function is supplied, aggregation defaults to length with a warning of class 'dt_missing_fun_aggregate_warning'. To use multiple aggregation functions, pass a list.
value.var: Name of the column whose values will be filled to cast. If none is provided, the function guess() tries to guess this column automatically. To cast multiple value.var columns simultaneously, pass their names as a character vector.
fill: Value with which to fill missing cells. If fill=NULL and missing cells are present, then fun.aggregate is used on a 0-length vector to obtain a fill value.
run_cor: If TRUE, return a matrix of pairwise phenotype correlations instead of the gene x phenotype matrix.
as_matrix: Return the results as a matrix (TRUE). Otherwise, return the results as a data.table with an extra column "gene_symbol".
as_sparse: Convert the data to a sparse matrix. Only used when as_matrix=TRUE.
method: Correlation method to use when run_cor=TRUE (default "pearson").
verbose: Print messages.
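The formula, fun.aggregate, value.var and fill arguments read like those of data.table::dcast. The sketch below shows how they interact on a toy table, assuming that is the casting step involved; the table contents and the names toy, wide and X_wide are hypothetical.

```r
library(data.table)

# Toy long-format table (hypothetical values). GENE_A has two records for
# HP:toy1, so that cell must be aggregated.
toy <- data.table(
  gene_symbol        = c("GENE_A", "GENE_A", "GENE_A", "GENE_B"),
  hpo_id             = c("HP:toy1", "HP:toy1", "HP:toy2", "HP:toy2"),
  evidence_score_sum = c(2, 4, 3, 5)
)

# Rows come from the LHS (gene_symbol), columns from the RHS (hpo_id).
# Cells with several records are averaged; cells with none are set to 0.
wide <- dcast(
  toy,
  formula = gene_symbol ~ hpo_id,
  fun.aggregate = mean,
  value.var = "evidence_score_sum",
  fill = 0
)

# Move the gene column into the row names to get a plain numeric matrix.
value_cols <- setdiff(names(wide), "gene_symbol")
X_wide <- as.matrix(wide[, value_cols, with = FALSE])
rownames(X_wide) <- wide$gene_symbol
```

Note that aggregating a score column with mean gives numeric values rather than strictly 0/1 entries, so the choice of value.var and fun.aggregate determines what a cell means.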
A gene x phenotype matrix,
or a phenotype x phenotype matrix if run_cor=TRUE.
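As a rough illustration of how a phenotype x phenotype matrix can arise from a gene x phenotype one, the sketch below correlates the columns of a toy matrix with stats::cor. This assumes run_cor performs a column-wise correlation of this kind, which the text above does not confirm; the matrix values and the names X_toy and C_toy are hypothetical.

```r
# Toy gene x phenotype matrix (hypothetical values, for illustration only).
X_toy <- matrix(
  c(1, 1, 0,
    1, 1, 0,
    0, 0, 1,
    0, 1, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    paste0("GENE_", LETTERS[1:4]),
    c("HP:toy1", "HP:toy2", "HP:toy3")
  )
)

# cor() correlates columns, so the result has one row and one column per
# phenotype: phenotypes sharing many genes get a high correlation.
C_toy <- stats::cor(X_toy, method = "pearson")
```

Here HP:toy1 and HP:toy3 never share a gene and are exact complements, so their correlation is -1, while the diagonal is 1 by construction.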
phenos <- example_phenos()
X <- hpo_to_matrix(terms = phenos$hpo_id)
#> Constructing HPO gene x phenotype matrix.
#> Reading cached RDS file: phenotype_to_genes.txt
#> + Version: v2024-12-12
#> Annotating gene-disease associations with Evidence Score
#> Gathering data from GenCC.
#> Importing cached file.
#> Evidence scores for:
#> - 10514 diseases
#> - 5171 genes
#> + Version: 2024-12-19