Convert gene-phenotype associations from the Human Phenotype Ontology (HPO) into a gene x phenotype matrix. The returned matrix is sparse. Each annotated cell holds the value of value.var for that gene-phenotype association (by default "evidence_score_sum", averaged with fun.aggregate when a gene-phenotype pair has several rows). Cells without an HPO annotation are set to fill (0 by default). By default, HPO IDs are used as the column names (formula = "gene_symbol ~ hpo_id"). To use full phenotype names instead (e.g. "Abnormality of body height"), change the right-hand side of formula to the phenotype-name column of phenotype_to_genes. Phenotypes that are not present in the phenotype_to_genes annotations are omitted from the final matrix.
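As a minimal sketch of the casting step, assume a long table with one row per gene-phenotype annotation. The toy genes, HPO IDs and scores below are made up for illustration. Base R's tapply() then reproduces the effect of formula = "gene_symbol ~ hpo_id" with fun.aggregate = mean and fill = 0. This is not the package's implementation; the argument descriptions suggest that one is built on data.table casting.

```r
# Toy long-format annotations: one row per gene-phenotype pair (made-up values)
annot <- data.frame(
  gene_symbol = c("GENE1", "GENE1", "GENE2", "GENE3", "GENE3"),
  hpo_id      = c("HP:0000001", "HP:0000002", "HP:0000001",
                  "HP:0000002", "HP:0000002"),
  evidence_score_sum = c(2, 4, 6, 1, 3)
)

# Cast to gene x phenotype: rows follow the LHS (gene_symbol),
# columns follow the RHS (hpo_id), and duplicated cells are averaged.
X <- tapply(annot$evidence_score_sum,
            list(annot$gene_symbol, annot$hpo_id),
            mean)

# Gene-phenotype pairs with no annotation come back as NA; fill them with 0.
X[is.na(X)] <- 0
print(X)
```

GENE3 has two rows for HP:0000002, so its cell holds their mean (2). Wrapping the result in Matrix::Matrix(X, sparse = TRUE) would give a sparse representation comparable to as_sparse = TRUE.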

hpo_to_matrix(
  terms = NULL,
  phenotype_to_genes = load_phenotype_to_genes(),
  formula = "gene_symbol ~ hpo_id",
  fun.aggregate = mean,
  value.var = "evidence_score_sum",
  fill = 0,
  run_cor = FALSE,
  as_matrix = TRUE,
  as_sparse = TRUE,
  method = "pearson",
  verbose = TRUE
)

Arguments

terms

A subset of HPO IDs to include. Set to NULL (default) to include all terms.

phenotype_to_genes

Output of load_phenotype_to_genes mapping phenotypes to gene annotations.

formula

A formula of the form LHS ~ RHS to cast: rows follow the LHS variables and columns follow the RHS variables. See data.table::dcast for details.

fun.aggregate

Function used to aggregate the data before casting (mean by default). If the formula doesn't identify a single observation for each cell and no aggregation function is supplied, aggregation defaults to length with a warning of class 'dt_missing_fun_aggregate_warning'.

To use multiple aggregation functions, pass a list; see the examples in data.table::dcast.

value.var

Name of the column whose values are used to fill the cast matrix. If none is provided, data.table's guess() function tries to infer this column automatically.

Cast multiple value.var columns simultaneously by passing their names as a character vector; see the examples in data.table::dcast.

fill

Value with which to fill missing cells. If fill=NULL and missing cells are present, then fun.aggregate is used on a 0-length vector to obtain a fill value.

run_cor

If TRUE, return a phenotype x phenotype matrix of pairwise correlations between phenotypes, computed with method.

as_matrix

If TRUE, return the results as a matrix. Otherwise, return them as a data.table with an extra column "gene_symbol".

as_sparse

Convert the data to a sparse matrix. Only used when as_matrix=TRUE.

method

Correlation method (e.g. "pearson") used when run_cor=TRUE.

verbose

Print messages.
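The fill = NULL rule described under fill interacts with the default fun.aggregate = mean. Applying mean to a zero-length vector returns NaN rather than 0, so under that rule unannotated cells would presumably become NaN instead of 0, whereas length() would give 0. A quick base R check, assuming the fill value is derived exactly as the fill description states:

```r
# Fill values that fun.aggregate would produce on a zero-length vector,
# which is how the fill value is derived when fill = NULL.
fill_mean   <- mean(numeric(0))    # default fun.aggregate = mean -> NaN
fill_length <- length(numeric(0))  # length, the fallback aggregation -> 0
print(fill_mean)
print(fill_length)
```

Keeping the default fill = 0 therefore avoids NaN cells, which a sparse matrix would otherwise have to store explicitly.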

Value

A gene x phenotype matrix, or a phenotype x phenotype matrix if run_cor=TRUE.
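The phenotype x phenotype output can be pictured as a column-wise correlation of the gene x phenotype matrix. The following is a minimal sketch, assuming run_cor = TRUE amounts to correlating phenotype columns with the chosen method. The toy matrix and the "HP:A"-style column names are hypothetical placeholders, and the package's exact implementation may differ, for example in how sparse input is handled.

```r
# Toy gene x phenotype matrix (made-up values): rows are genes, columns phenotypes.
# matrix() fills column by column, so each line below is one phenotype column.
X <- matrix(c(1, 0, 2, 0,
              2, 0, 4, 0,
              0, 3, 0, 1),
            nrow = 4,
            dimnames = list(paste0("GENE", 1:4),
                            c("HP:A", "HP:B", "HP:C")))

# cor() correlates columns, giving a phenotype x phenotype matrix
C <- cor(X, method = "pearson")
print(round(C, 3))
```

Here "HP:B" is exactly twice "HP:A", so their correlation is 1, while "HP:C" is annotated to the genes the other two lack and correlates negatively with them.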

Examples

phenos <- example_phenos()
X <- hpo_to_matrix(terms = phenos$hpo_id)
#> Constructing HPO gene x phenotype matrix.
#> Reading cached RDS file: phenotype_to_genes.txt
#> + Version: v2024-12-12
#> Annotating gene-disease associations with Evidence Score
#> Gathering data from GenCC.
#> Importing cached file.
#> Evidence scores for: 
#>  - 10514 diseases 
#>  - 5171 genes
#> + Version: 2024-12-19