Test the “same genes for each phenotype” assumption

Follows conversation on GitHub here.

You can think of each phenotype-disease combination as its own “symptom”, each with semi-overlapping gene sets.

If the disease where the gene lists came from does not matter, the mean correlations within groups of symptoms (belonging to a particular phenotype) should be high. If they are low, it means the symptoms are very different from one another (despite all being associated with the same phenotype).

Count the number of phenotypes vs. the number of symptoms

Phenotypes are equal to: HPO_ID
Symptoms are the combinations of phenotype ID + disease ID: HPO_ID.HPO_ID.DatabaseID

Source

results <- MultiEWCE::load_example_results()

Message

## Registered S3 method overwritten by 'ggtree':
##   method         from     
##   fortify.igraph ggnetwork

Source

message("N phenotypes: ",formatC(length(unique(results$HPO_ID)),big.mark = ","))

Message

## N phenotypes: 2,831

Source

annot <- HPOExplorer::load_phenotype_to_genes(1)

Message

## Importing existing file: ... phenotype_to_genes.txt

Source

annot[,HPO_ID.DatabaseID:=paste(HPO_ID,LinkID,sep=".")]
message("N symptoms: ",formatC(length(unique(annot$HPO_ID.DatabaseID)),big.mark = ","))

Message

## N symptoms: 621,906

Filter symptoms

Get only symptoms with >4 genes each.

Source

gene_counts <- annot[,list(n=length(unique(Gene))),
                     by="HPO_ID.DatabaseID"]
gene_counts <- cbind(gene_counts,
      data.table::data.table(stringr::str_split(gene_counts$HPO_ID.DatabaseID,"\\.",
                                                simplify = TRUE)) |> `names<-`(c("HPO_ID","DatabaseID")))
gene_counts_valid <- gene_counts[n>=4]
annot <- annot[HPO_ID.DatabaseID %in% gene_counts_valid$HPO_ID.DatabaseID,]

message(round(nrow(gene_counts_valid)/nrow(gene_counts)*100,2),"%",
        " of HPO_IDs remain after filtering by symptom gene lists.")

Message

## 5.39% of HPO_IDs remain after filtering by symptom gene lists.

Create gene matrix

Source

X <- HPOExplorer::hpo_to_matrix(phenotype_to_genes = annot, 
                                formula =  "Gene ~ HPO_ID.DatabaseID")

Message

## Constructing HPO gene x phenotype matrix.

Compute correlations between symptoms

For each phenotype, compute the mean correlation between each of its symptoms.

Source

#### Parse names ####
name_map <- cbind(HPO_ID.DatabaseID=colnames(X),
      data.table::data.table(stringr::str_split(colnames(X),"\\.",
                                                simplify = TRUE)) |> `names<-`(c("HPO_ID","DatabaseID"))) |>
  data.table::setkeyv(cols = "HPO_ID.DatabaseID")
#### Aggregate corr/phenotype group ####
group_cor <- lapply(stats::setNames(unique(name_map$HPO_ID),
                                    unique(name_map$HPO_ID)),
                    function(id){
  idx <- name_map[HPO_ID==id]$HPO_ID.DatabaseID
  X_sub <- X[,idx, drop=FALSE]
  #### Get number of overlapping genes #####
  rs <- Matrix::rowSums(X_sub)
  X_sub <- X_sub[rs>0,]
  intersect_size <- sum(rs==ncol(X_sub))
  union_size <- sum(rs>0)
  max_overlap <- max(rs, na.rm = TRUE) 
  #### Compute corr ####
  X_cor <- WGCNA::cor(x = X_sub) 
  diag(X_cor) <- NA
  rm <- Matrix::rowMeans(X_cor, na.rm = TRUE)
  data.table::data.table(
    intersect_size=intersect_size,
    union_size=union_size,
    max_overlap=max_overlap,
    cor_mean=mean(rm, na.rm=TRUE),
    cor_sd=sd(rm, na.rm = TRUE)
  )
}) |> data.table::rbindlist(use.names = TRUE, idcol = "HPO_ID")

Message

##

Plot

Source

hist(group_cor$cor_mean, 50)

Summarise

I found that the symptom gene lists are not very strongly correlated with one another. Since the presence/absence of a gene is binary, correlation basically equates to “proportion of genes that overlap”.

That said, even given the low correlation between symptoms within a given phenotype, there are some drawbacks to this symptom-focused approach:

Each symptom potentially has a smaller set of genes that the phenotype. This means that more gene lists will be omitted from EWCE due to the >4 gene rule.
The number of gene lists tested increases from 6,170 phenotypes (our current results) to 33,542 symptoms (only 3,946 phenotypes across 335 diseases).

So theoretical debates aside, it seems that the benefit of aggregating gene lists to the level of phenotypes is that it increases our coverage of testable HPO terms (at least with EWCE) and decreases our multiple testing burden.

Source

summary(group_cor$cor_mean)

Output

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## -1.0000 -0.4964 -0.1983 -0.3421 -0.0500  0.6703    1460

Linking celltype-phenotype associations to diseases

Which diseases contributed the genes to a particular celltype-phenotype association?

To answer this, I computed a simply gene overlap-based enrichment test for all symptoms, defined as HPO_ID + DiseaseID. This meant: 747,584 symptoms x 77 celltypes = 57,563,968 tests.

gene_data <- HPOExplorer::load_phenotype_to_genes()
gene_data[,HPO_ID.LinkID:=paste(HPO_ID,LinkID,sep=".")]
symptoms <- gen_overlap(gene_data = gene_data,
                             list_name_column = "HPO_ID.LinkID",
                             cores=30)

Source

g <- HPOExplorer::load_phenotype_to_genes()

Message

## Importing existing file: ... phenotype_to_genes.txt

Source

res <- MultiEWCE::load_example_results("Descartes_All_Results_extras.rds")
res <- res[q<0.05]
symp <- MultiEWCE::load_example_results("gen_overlap.symptoms.rds")


report <- function(symp,
                   annot,
                   symp_var,
                   annot_var=symp_var){
  message(paste(formatC(nrow(symp),big.mark = ","),
                "results remain."))
  n1 <- length(unique(symp[[symp_var]]))
  n2 <- length(unique(annot[[annot_var]])) 
  message(paste(n1,"/",n2,symp_var,
                paste0("(",round((n1/n2)*100,2),"%)"),
                "remain."))
}

First, we remove all tests that had 0 overlap, since we can safely rule those out. This still leaves us with >10M results.

Source

symp <- symp[intersection_size>0]
#### Split IDs apart again ####
symp[,c("HPO_ID","LinkID"):=data.table::tstrsplit(HPO_ID.LinkID,"\\.")]
#### Sort by pval then odds.ratio ####
data.table::setorderv(symp,c("pval","odds.ratio"), c(1,-1))

report(symp, g, symp_var="HPO_ID")

Message

## 10,080,094 results remain.

Message

## 10598 / 9677 HPO_ID (109.52%) remain.

Source

report(symp, g, symp_var="LinkID")

Message

## 10,080,094 results remain.

Message

## 8119 / 7839 LinkID (103.57%) remain.

Next, we try filtering by nominal p-values, leaving 684,697 results.

Source

psig <- symp[pval<0.05,]
report(psig, g, symp_var="HPO_ID")

Message

## 684,697 results remain.

Message

## 8315 / 9677 HPO_ID (85.93%) remain.

Source

report(psig, g, symp_var="LinkID")

Message

## 684,697 results remain.

Message

## 3002 / 7839 LinkID (38.3%) remain.

Using multiple-testing correction is too stringent due to the huge number of tests, and only leaves 14 unique diseases (<1%).

Source

qsig <- symp[qval<0.05,]
report(qsig, g, symp_var="HPO_ID")

Message

## 8,312 results remain.

Message

## 864 / 9677 HPO_ID (8.93%) remain.

Source

report(qsig, g, symp_var="LinkID")

Message

## 8,312 results remain.

Message

## 14 / 7839 LinkID (0.18%) remain.

In either filtering scenario, we are left with less than a complete picture of the celltypes invovled in each symptom. So instead, let’s just take all of the ones with at least 1 overlapping genes. We can always filter again later.

Now we can merge back with our celltype-phenotype enrichment results. This gives us a 99.3% annotation rate for linking phenotypes back to diseases via celltype-specific mechanisms. It also gives us a new column “intersection” which lists the genes that are mediating that interaction.

Source

res$CellType <- EWCE::fix_celltype_names(res$CellType, make_unique = FALSE)
res2 <- data.table::merge.data.table(res,
                                     symp, 
                                     by=c("HPO_ID","CellType"))

report(res2, res, symp_var="HPO_ID")

Message

## 1,106,181 results remain.

Message

## 2812 / 2831 HPO_ID (99.33%) remain.

Source

report(res2, g, symp_var="HPO_ID")

Message

## 1,106,181 results remain.

Message

## 2812 / 9677 HPO_ID (29.06%) remain.

Source

report(res2, g, symp_var="LinkID")

Message

## 1,106,181 results remain.

Message

## 8024 / 7839 LinkID (102.36%) remain.

We’ll upload the subset of symptoms that were kept in the end to a new file for easy usage later.

symp_filt <- symp[HPO_ID.LinkID %in% unique(res2$HPO_ID.LinkID)]
tmp <- file.path(tempdir(),"gen_overlap.symptoms.filt.rds")
saveRDS(symp_filt, tmp)

piggyback::pb_upload(file = tmp, 
                     repo = "neurogenomics/MultiEWCE",
                     tag = "v0.0.1")

cols <- c("pval","odds.ratio","jaccard","qval")
data.table::setnames(symp,cols, paste0("symptoms.",gsub("\\.","_",cols)))
res2 <- data.table::merge.data.table(res,
                                     symp, 
                                     all.x = TRUE,
                                     by=c("HPO_ID","CellType"))
tmp <- file.path(tempdir(),"Descartes_All_Results_extras.symptoms.rds")
saveRDS(res2, tmp)

piggyback::pb_upload(file = tmp, 
                     repo = "neurogenomics/MultiEWCE",
                     tag = "v0.0.1")

Session info

Source

sessioninfo::session_info()

Output

## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.2.1 (2022-06-23)
##  os       macOS Big Sur ... 10.16
##  system   x86_64, darwin17.0
##  ui       X11
##  language (EN)
##  collate  en_US.UTF-8
##  ctype    en_US.UTF-8
##  tz       Europe/London
##  date     2023-03-22
##  pandoc   3.1 @ /usr/local/bin/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package                * version        date (UTC) lib source
##  abind                    1.4-5          2016-07-21 [1] CRAN (R 4.2.0)
##  AnnotationDbi            1.60.2         2023-03-10 [1] Bioconductor
##  AnnotationHub            3.6.0          2022-11-01 [1] Bioconductor
##  ape                      5.7-1          2023-03-13 [1] CRAN (R 4.2.0)
##  aplot                    0.1.10         2023-03-08 [1] CRAN (R 4.2.0)
##  babelgene                22.9           2022-09-29 [1] CRAN (R 4.2.0)
##  backports                1.4.1          2021-12-13 [1] CRAN (R 4.2.0)
##  base64enc                0.1-3          2015-07-28 [1] CRAN (R 4.2.0)
##  Biobase                  2.58.0         2022-11-01 [1] Bioconductor
##  BiocFileCache            2.6.1          2023-02-17 [1] Bioconductor
##  BiocGenerics             0.44.0         2022-11-01 [1] Bioconductor
##  BiocManager              1.30.20        2023-02-24 [1] CRAN (R 4.2.0)
##  BiocVersion              3.16.0         2022-05-05 [1] Bioconductor
##  Biostrings               2.66.0         2022-11-01 [1] Bioconductor
##  bit                      4.0.5          2022-11-15 [1] CRAN (R 4.2.0)
##  bit64                    4.0.5          2020-08-30 [1] CRAN (R 4.2.0)
##  bitops                   1.0-7          2021-04-24 [1] CRAN (R 4.2.0)
##  blob                     1.2.3          2022-04-10 [1] CRAN (R 4.2.0)
##  broom                    1.0.4          2023-03-11 [1] CRAN (R 4.2.1)
##  bslib                    0.4.2          2022-12-16 [1] CRAN (R 4.2.1)
##  cachem                   1.0.7          2023-02-24 [1] CRAN (R 4.2.0)
##  car                      3.1-1          2022-10-19 [1] CRAN (R 4.2.1)
##  carData                  3.0-5          2022-01-06 [1] CRAN (R 4.2.0)
##  caTools                  1.18.2         2021-03-28 [1] CRAN (R 4.2.0)
##  checkmate                2.1.0          2022-04-21 [1] CRAN (R 4.2.0)
##  cli                      3.6.0          2023-01-09 [1] CRAN (R 4.2.0)
##  cluster                  2.1.4          2022-08-22 [1] CRAN (R 4.2.0)
##  coda                     0.19-4         2020-09-30 [1] CRAN (R 4.2.0)
##  codetools                0.2-19         2023-02-01 [1] CRAN (R 4.2.0)
##  colorspace               2.1-0          2023-01-23 [1] CRAN (R 4.2.1)
##  crayon                   1.5.2          2022-09-29 [1] CRAN (R 4.2.0)
##  curl                     5.0.0          2023-01-12 [1] CRAN (R 4.2.0)
##  data.table               1.14.8         2023-02-17 [1] CRAN (R 4.2.0)
##  DBI                      1.1.3          2022-06-18 [1] CRAN (R 4.2.0)
##  dbplyr                   2.3.1          2023-02-24 [1] CRAN (R 4.2.0)
##  DelayedArray             0.24.0         2022-11-01 [1] Bioconductor
##  digest                   0.6.31         2022-12-11 [1] CRAN (R 4.2.0)
##  doParallel               1.0.17         2022-02-07 [1] CRAN (R 4.2.0)
##  dplyr                    1.1.0          2023-01-29 [1] CRAN (R 4.2.1)
##  dynamicTreeCut           1.63-1         2016-03-11 [1] CRAN (R 4.2.0)
##  ellipsis                 0.3.2          2021-04-29 [1] CRAN (R 4.2.0)
##  evaluate                 0.20           2023-01-17 [1] CRAN (R 4.2.1)
##  EWCE                     1.7.2          2023-03-10 [1] Bioconductor
##  ewceData                 1.7.1          2023-02-21 [1] Github (neurogenomics/ewceData@7303707)
##  ExperimentHub            2.6.0          2022-11-01 [1] Bioconductor
##  fansi                    1.0.4          2023-01-22 [1] CRAN (R 4.2.1)
##  fastcluster              1.2.3          2021-05-24 [1] CRAN (R 4.2.0)
##  fastmap                  1.1.1          2023-02-24 [1] CRAN (R 4.2.0)
##  filelock                 1.0.2          2018-10-05 [1] CRAN (R 4.2.0)
##  foreach                  1.5.2          2022-02-02 [1] CRAN (R 4.2.0)
##  foreign                  0.8-84         2022-12-06 [1] CRAN (R 4.2.0)
##  Formula                  1.2-5          2023-02-24 [1] CRAN (R 4.2.0)
##  GeneOverlap              1.34.0         2022-11-01 [1] Bioconductor
##  generics                 0.1.3          2022-07-05 [1] CRAN (R 4.2.0)
##  GenomeInfoDb             1.34.9         2023-02-02 [1] Bioconductor
##  GenomeInfoDbData         1.2.9          2022-10-02 [1] Bioconductor
##  GenomicRanges            1.50.2         2022-12-16 [1] Bioconductor
##  ggfun                    0.0.9          2022-11-21 [1] CRAN (R 4.2.0)
##  ggnetwork                0.5.12         2023-03-06 [1] CRAN (R 4.2.1)
##  ggplot2                  3.4.1          2023-02-10 [1] CRAN (R 4.2.0)
##  ggplotify                0.1.0          2021-09-02 [1] CRAN (R 4.2.0)
##  ggpubr                   0.6.0          2023-02-10 [1] CRAN (R 4.2.0)
##  ggsignif                 0.6.4          2022-10-13 [1] CRAN (R 4.2.0)
##  ggtree                   3.6.2          2022-11-10 [1] Bioconductor
##  glue                     1.6.2          2022-02-24 [1] CRAN (R 4.2.0)
##  GO.db                    3.16.0         2022-10-02 [1] Bioconductor
##  gplots                   3.1.3          2022-04-25 [1] CRAN (R 4.2.0)
##  gprofiler2               0.2.1          2021-08-23 [1] CRAN (R 4.2.0)
##  graph                    1.76.0         2022-11-01 [1] Bioconductor
##  gridExtra                2.3            2017-09-09 [1] CRAN (R 4.2.0)
##  gridGraphics             0.5-1          2020-12-13 [1] CRAN (R 4.2.0)
##  grr                      0.9.5          2016-08-26 [1] CRAN (R 4.2.0)
##  gtable                   0.3.1          2022-09-01 [1] CRAN (R 4.2.0)
##  gtools                   3.9.4          2022-11-27 [1] CRAN (R 4.2.0)
##  HGNChelper               0.8.1          2019-10-24 [1] CRAN (R 4.2.0)
##  highr                    0.10           2022-12-22 [1] CRAN (R 4.2.1)
##  Hmisc                    5.0-1          2023-03-08 [1] CRAN (R 4.2.0)
##  homologene               1.4.68.19.3.27 2019-03-28 [1] CRAN (R 4.2.0)
##  HPOExplorer              0.99.7         2023-03-22 [1] Bioconductor
##  htmlTable                2.4.1          2022-07-07 [1] CRAN (R 4.2.0)
##  htmltools                0.5.4          2022-12-07 [1] CRAN (R 4.2.0)
##  htmlwidgets              1.6.1          2023-01-07 [1] CRAN (R 4.2.0)
##  httpuv                   1.6.9          2023-02-14 [1] CRAN (R 4.2.1)
##  httr                     1.4.5          2023-02-24 [1] CRAN (R 4.2.0)
##  impute                   1.72.3         2023-01-19 [1] Bioconductor
##  interactiveDisplayBase   1.36.0         2022-11-01 [1] Bioconductor
##  IRanges                  2.32.0         2022-11-01 [1] Bioconductor
##  iterators                1.0.14         2022-02-05 [1] CRAN (R 4.2.0)
##  jquerylib                0.1.4          2021-04-26 [1] CRAN (R 4.2.0)
##  jsonlite                 1.8.4          2022-12-06 [1] CRAN (R 4.2.0)
##  KEGGREST                 1.38.0         2022-11-01 [1] Bioconductor
##  KernSmooth               2.23-20        2021-05-03 [1] CRAN (R 4.2.1)
##  knitr                    1.42           2023-01-25 [1] CRAN (R 4.2.1)
##  later                    1.3.0          2021-08-18 [1] CRAN (R 4.2.0)
##  lattice                  0.20-45        2021-09-22 [1] CRAN (R 4.2.1)
##  lazyeval                 0.2.2          2019-03-15 [1] CRAN (R 4.2.0)
##  lifecycle                1.0.3          2022-10-07 [1] CRAN (R 4.2.0)
##  limma                    3.54.2         2023-02-28 [1] Bioconductor
##  magrittr                 2.0.3          2022-03-30 [1] CRAN (R 4.2.0)
##  Matrix                   1.5-3          2022-11-11 [1] CRAN (R 4.2.1)
##  MatrixGenerics           1.10.0         2022-11-01 [1] Bioconductor
##  matrixStats              0.63.0         2022-11-18 [1] CRAN (R 4.2.0)
##  memoise                  2.0.1          2021-11-26 [1] CRAN (R 4.2.0)
##  mime                     0.12           2021-09-28 [1] CRAN (R 4.2.0)
##  minidown                 0.4.0          2022-02-08 [1] CRAN (R 4.2.0)
##  MultiEWCE                0.1.4          2023-03-22 [1] Bioconductor
##  munsell                  0.5.0          2018-06-12 [1] CRAN (R 4.2.0)
##  network                  1.18.1         2023-01-24 [1] CRAN (R 4.2.1)
##  nlme                     3.1-162        2023-01-31 [1] CRAN (R 4.2.0)
##  nnet                     7.3-18         2022-09-28 [1] CRAN (R 4.2.0)
##  ontologyIndex            2.10           2022-08-24 [1] CRAN (R 4.2.0)
##  ontologyPlot             1.6            2021-02-10 [1] CRAN (R 4.2.0)
##  orthogene                1.4.1          2022-11-13 [1] Bioconductor
##  paintmap                 1.0            2016-08-31 [1] CRAN (R 4.2.0)
##  patchwork                1.1.2          2022-08-19 [1] CRAN (R 4.2.0)
##  piggyback                0.1.4          2022-07-19 [1] CRAN (R 4.2.0)
##  pillar                   1.8.1          2022-08-19 [1] CRAN (R 4.2.0)
##  pkgconfig                2.0.3          2019-09-22 [1] CRAN (R 4.2.0)
##  plotly                   4.10.1         2022-11-07 [1] CRAN (R 4.2.1)
##  plyr                     1.8.8          2022-11-11 [1] CRAN (R 4.2.1)
##  png                      0.1-8          2022-11-29 [1] CRAN (R 4.2.1)
##  preprocessCore           1.60.2         2023-01-19 [1] Bioconductor
##  promises                 1.2.0.1        2021-02-11 [1] CRAN (R 4.2.0)
##  purrr                    1.0.1          2023-01-10 [1] CRAN (R 4.2.0)
##  R6                       2.5.1          2021-08-19 [1] CRAN (R 4.2.0)
##  rappdirs                 0.3.3          2021-01-31 [1] CRAN (R 4.2.0)
##  RColorBrewer             1.1-3          2022-04-03 [1] CRAN (R 4.2.0)
##  Rcpp                     1.0.10         2023-01-22 [1] CRAN (R 4.2.1)
##  RCurl                    1.98-1.10      2023-01-27 [1] CRAN (R 4.2.0)
##  reshape2                 1.4.4          2020-04-09 [1] CRAN (R 4.2.0)
##  Rgraphviz                2.42.0         2022-11-01 [1] Bioconductor
##  rlang                    1.1.0          2023-03-14 [1] CRAN (R 4.2.0)
##  rmarkdown                2.20.1         2023-02-16 [1] Github (rstudio/rmarkdown@a75dc37)
##  RNOmni                   1.0.1          2022-08-15 [1] CRAN (R 4.2.0)
##  rpart                    4.1.19         2022-10-21 [1] CRAN (R 4.2.0)
##  RSQLite                  2.3.0          2023-02-17 [1] CRAN (R 4.2.0)
##  rstatix                  0.7.2          2023-02-01 [1] CRAN (R 4.2.1)
##  rstudioapi               0.14           2022-08-22 [1] CRAN (R 4.2.0)
##  S4Vectors                0.36.2         2023-02-26 [1] Bioconductor
##  sass                     0.4.5          2023-01-24 [1] CRAN (R 4.2.0)
##  scales                   1.2.1          2022-08-20 [1] CRAN (R 4.2.0)
##  sessioninfo              1.2.2          2021-12-06 [1] CRAN (R 4.2.0)
##  shiny                    1.7.4          2022-12-15 [1] CRAN (R 4.2.1)
##  SingleCellExperiment     1.20.0         2022-11-01 [1] Bioconductor
##  statnet.common           4.8.0          2023-01-24 [1] CRAN (R 4.2.1)
##  stringi                  1.7.12         2023-01-11 [1] CRAN (R 4.2.0)
##  stringr                  1.5.0          2022-12-02 [1] CRAN (R 4.2.0)
##  SummarizedExperiment     1.28.0         2022-11-01 [1] Bioconductor
##  survival                 3.5-5          2023-03-12 [1] CRAN (R 4.2.0)
##  tibble                   3.2.0          2023-03-08 [1] CRAN (R 4.2.0)
##  tidyr                    1.3.0          2023-01-24 [1] CRAN (R 4.2.1)
##  tidyselect               1.2.0          2022-10-10 [1] CRAN (R 4.2.0)
##  tidytree                 0.4.2          2022-12-18 [1] CRAN (R 4.2.0)
##  treeio                   1.23.1         2023-03-16 [1] Github (GuangchuangYu/treeio@2f756cb)
##  utf8                     1.2.3          2023-01-31 [1] CRAN (R 4.2.1)
##  vctrs                    0.6.0          2023-03-16 [1] CRAN (R 4.2.1)
##  viridisLite              0.4.1          2022-08-22 [1] CRAN (R 4.2.0)
##  WGCNA                    1.72-1         2023-01-18 [1] CRAN (R 4.2.0)
##  xfun                     0.37           2023-01-31 [1] CRAN (R 4.2.1)
##  xtable                   1.8-4          2019-04-21 [1] CRAN (R 4.2.0)
##  XVector                  0.38.0         2022-11-01 [1] Bioconductor
##  yaml                     2.3.7          2023-01-23 [1] CRAN (R 4.2.1)
##  yulab.utils              0.0.6          2022-12-20 [1] CRAN (R 4.2.1)
##  zlibbioc                 1.44.0         2022-11-01 [1] Bioconductor
## 
##  [1] /Library/Frameworks/R.framework/Versions/4.2/Resources/library
## 
## ──────────────────────────────────────────────────────────────────────────────

Rare Disease Celltyping

Symptoms

Brian M. Schilder

Updated: Mar-22-2023

Test the “same genes for each phenotype” assumption

Count the number of phenotypes vs. the number of symptoms

Filter symptoms

Create gene matrix

Compute correlations between symptoms

Plot

Summarise

Linking celltype-phenotype associations to diseases

Session info