Plot congenital annotations — plot_congenital

Test whether the proportion of significantly associated fetal and non-fetal cell types differs between phenotypes with congenital onset and those without.

plot_congenital_annotations(
  results,
  gpt_annot = HPOExplorer::gpt_annot_codify(),
  hpo = HPOExplorer::get_hpo(),
  fetal_keywords = c("fetal", "fetus", "primordial", "hESC", "embryonic"),
  celltype_col = "author_celltype",
  x_var = c("fetal_celltype", "fetal_only"),
  remove_annotations = c("varies"),
  keep_descendants = NULL,
  by_branch = FALSE,
  q_threshold = 0.05,
  package = "palettetown",
  palette = "mewtwo",
  proportion.test = TRUE,
  add_baseline = TRUE,
  save_path = NULL,
  ...
)

Arguments

results

The cell type-phenotype enrichment results generated by gen_results and merged together with merge_results.

gpt_annot

A data.table of GPT annotations.

hpo

Human Phenotype Ontology object, loaded from get_ontology.

fetal_keywords

A character vector of keywords to identify fetal cell types.
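
As a rough illustration of how keywords like these could flag fetal cell types, the sketch below matches them against cell type names with base R. The case-insensitive substring matching and the example cell type names are assumptions made for illustration; plot_congenital_annotations may match them differently internally.

```r
# Default keywords taken from the usage above
fetal_keywords <- c("fetal", "fetus", "primordial", "hESC", "embryonic")

# Hypothetical cell type names, for illustration only
celltypes <- c("Fetal hepatoblasts", "Microglia", "hESC-derived neurons",
               "Primordial germ cells", "Adult cardiomyocytes")

# Combine the keywords into one alternation pattern and match it
# case-insensitively (the matching rule is an assumption)
pattern <- paste(fetal_keywords, collapse = "|")
is_fetal <- grepl(pattern, celltypes, ignore.case = TRUE)

data.frame(celltype = celltypes, fetal_celltype = is_fetal)
```

Adding or removing entries in fetal_keywords changes which cell types would be flagged under this kind of matching.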

celltype_col

The column name of the cell type.

x_var

X-axis variable to plot. The usage above lists "fetal_celltype" and "fetal_only" as options.

remove_annotations

A character vector of annotations to remove.

keep_descendants

Terms whose descendants should be kept (including themselves). Set to NULL (default) to skip this filtering step.

by_branch

Use HPO ancestors as the x-axis instead of the frequency of congenital onset.

q_threshold

The q value threshold to subset the results by.
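
A minimal sketch of what subsetting by a q value threshold looks like in base R. The column name q, the placeholder phenotype labels, and the strict inequality are all assumptions for illustration; the function's own filtering may differ in these details.

```r
# Hypothetical results table; column names and values are made up
res <- data.frame(
  phenotype = c("pheno_A", "pheno_B", "pheno_C"),
  q = c(0.001, 0.2, 0.04)
)

q_threshold <- 0.05

# Keep only associations below the threshold (strict "<" is an assumption)
sig <- res[res$q < q_threshold, ]
nrow(sig)
```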

package, palette

Name of the package from which the given palette is to be extracted. The available palettes and packages can be checked by running View(paletteer::palettes_d_names).

proportion.test

Whether to run a proportion test for the x variable at each level of y. Defaults to TRUE in this function (the inherited ggbarstats() documentation gives the default as results.subtitle). In ggbarstats(), only p-values from this test will be displayed.

add_baseline

Add a horizontal line showing the proportions expected by chance.

save_path

The path to save the plot.

...

Arguments passed on to ggstatsplot::ggbarstats

data

A data frame (or a tibble) from which variables specified are to be taken. Other data types (e.g., matrix, table, array, etc.) will not be accepted. Additionally, grouped data frames from {dplyr} should be ungrouped before they are entered as data.

x

The variable to use as the rows in the contingency table. Please note that if there are empty factor levels in your variable, they will be dropped.

y

The variable to use as the columns in the contingency table. Please note that if there are empty factor levels in your variable, they will be dropped. Default is NULL. If NULL, a one-sample proportion test (a goodness-of-fit test) will be run for the x variable. Otherwise an appropriate association test will be run. This argument cannot be NULL for ggbarstats().

counts

The variable in data containing counts, or NULL if each row represents a single observation.
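
The two data layouts this argument distinguishes can be sketched with base R. The example below builds the same contingency table from pre-aggregated counts and from one-row-per-observation data. All values and the column name n are made up for illustration.

```r
# Pre-aggregated layout: one row per combination, plus a counts column
agg <- data.frame(
  x = c("fetal", "fetal", "non-fetal", "non-fetal"),
  y = c("congenital", "not congenital", "congenital", "not congenital"),
  n = c(30, 10, 20, 40)
)
tab_from_counts <- xtabs(n ~ x + y, data = agg)

# Equivalent long layout: each row is a single observation
long <- agg[rep(seq_len(nrow(agg)), agg$n), c("x", "y")]
tab_from_rows <- table(long$x, long$y)

# Both layouts describe the same contingency table
all(tab_from_counts == tab_from_rows)
```

With the first layout the counts column would be named in counts; with the second, counts would stay NULL.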

type

A character specifying the type of statistical approach:

"parametric"
"nonparametric"
"robust"
"bayes"

You can specify just the initial letter.

paired

Logical indicating whether data came from a within-subjects or repeated measures design study (Default: FALSE).

results.subtitle

Decides whether the results of statistical tests are to be displayed as a subtitle (Default: TRUE). If set to FALSE, only the plot will be returned.

label

Character decides what information needs to be displayed on the label in each pie slice. Possible options are "percentage" (default), "counts", "both".

label.args

Additional aesthetic arguments that will be passed to ggplot2::geom_label().

sample.size.label.args

Additional aesthetic arguments that will be passed to ggplot2::geom_text().

digits

Number of digits for rounding or significant figures. May also be "signif" to return significant figures or "scientific" to return scientific notation. Control the number of digits by adding the value as suffix, e.g. digits = "scientific4" to have scientific notation with 4 decimal places, or digits = "signif5" for 5 significant figures (see also signif()).

digits.perc

Numeric that decides number of decimal places for percentage labels (Default: 0L).

bf.message

Logical that decides whether to display Bayes Factor in favor of the null hypothesis. This argument is relevant only for parametric test (Default: TRUE).

ratio

A vector of proportions: the expected proportions for the proportion test (should sum to 1). Default is NULL, which means the null is equal theoretical proportions across the levels of the nominal variable. E.g., ratio = c(0.5, 0.5) for two levels, ratio = c(0.25, 0.25, 0.25, 0.25) for four levels, etc.
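
To make the role of the expected proportions concrete, here is a small sketch using base R's chisq.test as a stand-in for a goodness-of-fit test. The counts are made up, and using chisq.test directly is an assumption for illustration rather than a description of how ggbarstats() computes its results.

```r
# Made-up counts for a two-level nominal variable
observed <- c(fetal = 30, non_fetal = 70)

# Null of equal proportions, analogous to leaving ratio = NULL
equal <- chisq.test(observed, p = c(0.5, 0.5))

# Null with custom expected proportions, analogous to ratio = c(0.3, 0.7)
custom <- chisq.test(observed, p = c(0.3, 0.7))

equal$p.value   # small: 30/70 departs from 50/50
custom$p.value  # large: 30/70 matches the expected 0.3/0.7 exactly
```

The same observed counts can look significant or unremarkable depending on the null proportions, so the choice of ratio should reflect the baseline that is actually of interest.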

conf.level

Scalar between 0 and 1 (default: 95% confidence/credible intervals, 0.95). If NULL, no confidence intervals will be computed.

sampling.plan

Character describing the sampling plan. Possible options:

"indepMulti" (independent multinomial; default)
"poisson"
"jointMulti" (joint multinomial)
"hypergeom" (hypergeometric)

For more, see BayesFactor::contingencyTableBF().

fixed.margin

For the independent multinomial sampling plan, which margin is fixed ("rows" or "cols"). Defaults to "rows".

prior.concentration

Specifies the prior concentration parameter, set to 1 by default. It indexes the expected deviation from the null hypothesis under the alternative, and corresponds to Gunel and Dickey's (1974) "a" parameter.

title

The text for the plot title.

subtitle

The text for the plot subtitle. Will work only if results.subtitle = FALSE.

caption

The text for the plot caption. This argument is relevant only if bf.message = FALSE.

legend.title

Title text for the legend.

xlab

Label for x axis variable. If NULL (default), variable name for x will be used.

ylab

Labels for y axis variable. If NULL (default), variable name for y will be used.

ggtheme

A {ggplot2} theme. Default value is theme_ggstatsplot(). Any of the {ggplot2} themes (e.g., ggplot2::theme_bw()), or themes from extension packages are allowed (e.g., ggthemes::theme_fivethirtyeight(), hrbrthemes::theme_ipsum_ps(), etc.). But note that sometimes these themes will remove some of the details that {ggstatsplot} plots typically contains. For example, if relevant, ggbetweenstats() shows details about multiple comparison test as a label on the secondary Y-axis. Some themes (e.g. ggthemes::theme_fivethirtyeight()) will remove the secondary Y-axis and thus the details as well.

ggplot.component

A ggplot component to be added to the plot prepared by {ggstatsplot}. This argument is primarily helpful for grouped_ variants of all primary functions. Default is NULL. The argument should be entered as a {ggplot2} function or a list of {ggplot2} functions.

Examples

results <- load_example_results()
results2 <- plot_congenital_annotations(results=results)
#> Loading required namespace: ggstatsplot
#> Translating ontology terms to ids.
#> Reading cached RDS file: phenotype_to_genes.txt
#> + Version: v2025-05-06
#> 151 phenotypes do not have matching HPO IDs.
#> Reading in GPT annotations for 16,982 phenotypes.
#> Mapping cell types to cell ontology terms.
#> Adding stage information.
#> Adding level-2 ancestor to each HPO ID.
#> Adding ancestor metadata.
#> Ancestor metadata already present. Use force_new=TRUE to overwrite.
#> 46,514 associations remain after filtering.