vignettes/MAGMA.Celltyping.Rmd
MAGMA.Celltyping.Rmd
MAGMA.Celltyping
is a software package that facilitates
conducting cell-type-specific enrichment tests on GWAS summary
statistics.
Specify where you want the large files to be downloaded to.
NOTE: Make sure you change storage_dir
to
somewhere other than tempdir()
if you want to make sure the
results aren’t deleted after this R session closes!
storage_dir <- tempdir()
get_example_gwas()
.Here we provide a pre-munged version of the above file.
Our lab have created MungeSumstats
,
a robust Bioconductor package for formatting multiple types of summary
statistics files. We highly recommend processing your GWAS summary
statistics with MungeSumstats
before continuing. See the
full_workflow vignette for more details.
The minimum info needed after munging is:
- “SNP”, “CHR”, and “BP” as first three columns. - It has at least one
of these columns: “Z”,“OR”,“BETA”,“LOG_ODDS”,“SIGNED_SUMSTAT”
path_formatted <- MAGMA.Celltyping::get_example_gwas(
trait = "prospective_memory")
Note you can input the genome build of your summary statistics for
this step or it can be inferred if left NULL
:
genesOutPath <- MAGMA.Celltyping::map_snps_to_genes(
path_formatted = path_formatted,
genome_build = "GRCh37")
Rather than preprocessing the GWAS yourself, you can instead use the
MAGMA_Files_Public
database we have created. It contains pre-computed MAGMA SNP-to-genes
mapping files for hundreds of GWAS.
You can browse which GWAS traits are available by looking at the provided metadata.csv file.
magma_dirs <- MAGMA.Celltyping::import_magma_files(ids = "ieu-a-298")
## Using built-in example files: ieu-a-298.tsv.gz.35UP.10DOWN
## Returning MAGMA directories.
ewceData
provides a number of CellTypeDatasets (CTD) to
be used a cell-type transcriptomic signature reference files.
If you want to create your own single-cell transcriptomic reference,
you’ll need to first convert it to CTD using the instructions found in
the EWCE
package documentation here.
ctd <- ewceData::ctd()
## see ?ewceData and browseVignettes('ewceData') for documentation
## loading from cache
Note that the cell type dataset loaded in the code above is the Karolinksa cortex/hippocampus data only. For the full Karolinska dataset with hypothalamus and midbrain instead use the following:
ctd <- MAGMA.Celltyping::get_ctd("ctd_allKI")
Or for the DRONC seq or AIBS datasets use:
ctd <- get_ctd("ctd_Tasic")
ctd <- get_ctd("ctd_DivSeq")
ctd <- get_ctd("ctd_AIBS")
ctd <- get_ctd("ctd_DRONC_human")
ctd <- get_ctd("ctd_DRONC_mouse")
ctd <- get_ctd("ctd_BlueLake2018_FrontalCortexOnly")
ctd <- get_ctd("ctd_BlueLake2018_VisualCortexOnly")
ctd <- get_ctd("ctd_Saunders")
MAGMA.Celltyping
offers a suite of functions for
conducting various types of cell-type-specific enrichment tests on GWAS
summary statistics.
The celltype_associations_pipeline
wraps several
functions that in previous versions of MAGMA.Celltyping
had
to be set up and run separately. These include:
calculate_celltype_associations(EnrichmentMode = "linear")
internally. Activated when run_linear=TRUE
.calculate_celltype_associations(EnrichmentMode = "Top 10%")
internally. Activated when run_top10=TRUE
.calculate_conditional_celltype_associations
internally.
Activated when run_conditional=TRUE
.Thus, celltype_associations_pipeline
is designed to make
these analyses easier to run.
MAGMA_results <- MAGMA.Celltyping::celltype_associations_pipeline(
magma_dirs = magma_dirs,
ctd = ctd,
ctd_species = "mouse",
ctd_name = "Zeisel2015",
run_linear = TRUE,
run_top10 = TRUE)
We’ve also saved a pre-computed version of these results as a dataset:
MAGMA_results <- MAGMA.Celltyping::enrichment_results
merge_results
imports each of the MAGMA enrichment
results files and merges them into one so that they can easily be
plotted and further analysed.
merged_results <- MAGMA.Celltyping::merge_results(
MAGMA_results = MAGMA_results)
## Saving full merged results to ==> /tmp/RtmpCnpa8x/MAGMA_celltyping./.lvl1.csv
knitr::kable(merged_results)
GWAS | Celltype | TYPE | OBS_GENES | BETA | BETA_STD | SE | P | log10p | level | Method | EnrichmentMode | GCOV_FILE | CONTROL | CONTROL_label | genesOutCOND | analysis_name | FDR | Celltype_id |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ieu-a-298.tsv.gz.35UP.10DOWN | o l i g o d e n d r o c y t e s | COVAR | 962 | 0.0016776 | 0.0199640 | 0.0024613 | 0.24785 | -0.6058111 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.867475 | oligodendrocytes |
ieu-a-298.tsv.gz.35UP.10DOWN | a s t r o c y t e s _ e p e n d y m a l | SET | 104 | 0.0671670 | 0.0178860 | 0.0853000 | 0.21562 | -0.6663110 | 1 | MAGMA | Top 10% | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_top10.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_top10 | 0.867475 | astrocytes_ependymal |
ieu-a-298.tsv.gz.35UP.10DOWN | e n d o t h e l i a l _ m u r a l | SET | 98 | 0.0831670 | 0.0215500 | 0.0962320 | 0.19384 | -0.7125566 | 1 | MAGMA | Top 10% | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_top10.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_top10 | 0.867475 | endothelial_mural |
ieu-a-298.tsv.gz.35UP.10DOWN | o l i g o d e n d r o c y t e s | SET | 94 | 0.1124000 | 0.0285690 | 0.1017200 | 0.13472 | -0.8705679 | 1 | MAGMA | Top 10% | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_top10.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_top10 | 0.867475 | oligodendrocytes |
ieu-a-298.tsv.gz.35UP.10DOWN | a s t r o c y t e s _ e p e n d y m a l | COVAR | 962 | -0.0003729 | -0.0045268 | 0.0022566 | 0.56561 | -0.2474829 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.981580 | astrocytes_ependymal |
ieu-a-298.tsv.gz.35UP.10DOWN | e n d o t h e l i a l _ m u r a l | COVAR | 962 | 0.0002391 | 0.0029144 | 0.0023811 | 0.46002 | -0.3372233 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.981580 | endothelial_mural |
ieu-a-298.tsv.gz.35UP.10DOWN | i n t e r n e u r o n s | COVAR | 962 | -0.0033333 | -0.0393200 | 0.0024009 | 0.91728 | -0.0374981 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.981580 | interneurons |
ieu-a-298.tsv.gz.35UP.10DOWN | m i c r o g l i a | COVAR | 962 | -0.0002839 | -0.0036131 | 0.0022446 | 0.55030 | -0.2594005 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.981580 | microglia |
ieu-a-298.tsv.gz.35UP.10DOWN | p y r a m i d a l _ C A 1 | COVAR | 962 | -0.0037493 | -0.0440450 | 0.0023802 | 0.94218 | -0.0258661 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.981580 | pyramidal_CA1 |
ieu-a-298.tsv.gz.35UP.10DOWN | p y r a m i d a l _ S S | COVAR | 962 | -0.0023657 | -0.0282810 | 0.0023306 | 0.84480 | -0.0732461 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.981580 | pyramidal_SS |
ieu-a-298.tsv.gz.35UP.10DOWN | i n t e r n e u r o n s | SET | 106 | -0.1554800 | -0.0417660 | 0.0911160 | 0.95587 | -0.0196012 | 1 | MAGMA | Top 10% | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_top10.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_top10 | 0.981580 | interneurons |
ieu-a-298.tsv.gz.35UP.10DOWN | m i c r o g l i a | SET | 91 | -0.0341170 | -0.0085425 | 0.1093000 | 0.62250 | -0.2058606 | 1 | MAGMA | Top 10% | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_top10.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_top10 | 0.981580 | microglia |
ieu-a-298.tsv.gz.35UP.10DOWN | p y r a m i d a l _ C A 1 | SET | 111 | -0.1930200 | -0.0529550 | 0.0923320 | 0.98158 | -0.0080743 | 1 | MAGMA | Top 10% | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_top10.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_top10 | 0.981580 | pyramidal_CA1 |
ieu-a-298.tsv.gz.35UP.10DOWN | p y r a m i d a l _ S S | SET | 98 | -0.0750110 | -0.0194370 | 0.0965900 | 0.78120 | -0.1072378 | 1 | MAGMA | Top 10% | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_top10.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_top10 | 0.981580 | pyramidal_SS |
Now we’ll construct a heatmap visualizing the enrichment results, such that each GWAS is shown on the y-axis and each cell-type is shown on the x-axis. Results can be further facetted by what kind of test was run (linear, top10%, and/or conditional).
heat <- MAGMA.Celltyping::results_heatmap(
merged_results = merged_results,
title = "Alzheimer's Disease (ieu-a-298) vs.\nnervous system cell-types (Zeisel2015)",
fdr_thresh = 1)
## 14 results @ FDR < 1
## Warning: The `facets` argument of `facet_grid()` is deprecated as of ggplot2 2.2.0.
## ℹ Please use the `rows` argument instead.
## ℹ The deprecated feature was likely used in the MAGMA.Celltyping package.
## Please report the issue at
## <https://github.com/neurogenomics/MAGMA_Celltyping/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
Now we can also plot the p-value results (-log10 p-values) based on a significance threshold (for either linear and top10%) together in a bar plot (sorted by top10% results).
barplot_ggplot <- MAGMA.Celltyping::results_barplot(
merged_results = merged_results,
title="Alzheimer's Disease (ieu-a-298) vs.\nnervous system cell-types (Zeisel2015)",
fdr_thresh = 1,
horz_line_p = .2
)
## 14 results @ FDR < 1
Get the phenotypes with the greatest number of significant cell-type enrichment results.
top_phenos <- merged_results %>%
dplyr::group_by(EnrichmentMode, GWAS) %>%
dplyr::summarise(Celltype=dplyr::n_distinct(Celltype)) %>%
dplyr::arrange(dplyr::desc(Celltype))
## `summarise()` has grouped output by 'EnrichmentMode'. You can override using
## the `.groups` argument.
knitr::kable(top_phenos)
EnrichmentMode | GWAS | Celltype |
---|---|---|
Linear | ieu-a-298.tsv.gz.35UP.10DOWN | 7 |
Top 10% | ieu-a-298.tsv.gz.35UP.10DOWN | 7 |
Get the phenotypes-celltype enrichment results with the most significant p-values (per phenotype).
top_enrich <- merged_results %>%
dplyr::group_by(EnrichmentMode, GWAS) %>%
dplyr::slice_min(FDR, n = 2)
knitr::kable(top_enrich)
GWAS | Celltype | TYPE | OBS_GENES | BETA | BETA_STD | SE | P | log10p | level | Method | EnrichmentMode | GCOV_FILE | CONTROL | CONTROL_label | genesOutCOND | analysis_name | FDR | Celltype_id |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ieu-a-298.tsv.gz.35UP.10DOWN | o l i g o d e n d r o c y t e s | COVAR | 962 | 0.0016776 | 0.0199640 | 0.0024613 | 0.24785 | -0.6058111 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.867475 | oligodendrocytes |
ieu-a-298.tsv.gz.35UP.10DOWN | a s t r o c y t e s _ e p e n d y m a l | COVAR | 962 | -0.0003729 | -0.0045268 | 0.0022566 | 0.56561 | -0.2474829 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.981580 | astrocytes_ependymal |
ieu-a-298.tsv.gz.35UP.10DOWN | e n d o t h e l i a l _ m u r a l | COVAR | 962 | 0.0002391 | 0.0029144 | 0.0023811 | 0.46002 | -0.3372233 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.981580 | endothelial_mural |
ieu-a-298.tsv.gz.35UP.10DOWN | i n t e r n e u r o n s | COVAR | 962 | -0.0033333 | -0.0393200 | 0.0024009 | 0.91728 | -0.0374981 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.981580 | interneurons |
ieu-a-298.tsv.gz.35UP.10DOWN | m i c r o g l i a | COVAR | 962 | -0.0002839 | -0.0036131 | 0.0022446 | 0.55030 | -0.2594005 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.981580 | microglia |
ieu-a-298.tsv.gz.35UP.10DOWN | p y r a m i d a l _ C A 1 | COVAR | 962 | -0.0037493 | -0.0440450 | 0.0023802 | 0.94218 | -0.0258661 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.981580 | pyramidal_CA1 |
ieu-a-298.tsv.gz.35UP.10DOWN | p y r a m i d a l _ S S | COVAR | 962 | -0.0023657 | -0.0282810 | 0.0023306 | 0.84480 | -0.0732461 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.981580 | pyramidal_SS |
ieu-a-298.tsv.gz.35UP.10DOWN | a s t r o c y t e s _ e p e n d y m a l | SET | 104 | 0.0671670 | 0.0178860 | 0.0853000 | 0.21562 | -0.6663110 | 1 | MAGMA | Top 10% | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_top10.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_top10 | 0.867475 | astrocytes_ependymal |
ieu-a-298.tsv.gz.35UP.10DOWN | e n d o t h e l i a l _ m u r a l | SET | 98 | 0.0831670 | 0.0215500 | 0.0962320 | 0.19384 | -0.7125566 | 1 | MAGMA | Top 10% | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_top10.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_top10 | 0.867475 | endothelial_mural |
ieu-a-298.tsv.gz.35UP.10DOWN | o l i g o d e n d r o c y t e s | SET | 94 | 0.1124000 | 0.0285690 | 0.1017200 | 0.13472 | -0.8705679 | 1 | MAGMA | Top 10% | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_top10.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_top10 | 0.867475 | oligodendrocytes |
utils::sessionInfo()
## R Under development (unstable) (2025-02-15 r87726)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ewceData_1.15.0 ExperimentHub_2.15.0 AnnotationHub_3.15.0
## [4] BiocFileCache_2.15.1 dbplyr_2.5.0 BiocGenerics_0.53.6
## [7] generics_0.1.3 dplyr_1.1.4 MAGMA.Celltyping_2.0.15
## [10] BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] splines_4.5.0 BiocIO_1.17.1
## [3] bitops_1.0-9 ggplotify_0.1.2
## [5] filelock_1.0.3 tibble_3.2.1
## [7] R.oo_1.27.0 XML_3.99-0.18
## [9] lifecycle_1.0.4 Rdpack_2.6.2
## [11] rstatix_0.7.2 lattice_0.22-6
## [13] MASS_7.3-64 backports_1.5.0
## [15] magrittr_2.0.3 limma_3.63.4
## [17] plotly_4.10.4 sass_0.4.9
## [19] rmarkdown_2.29 jquerylib_0.1.4
## [21] yaml_2.3.10 HGNChelper_0.8.15
## [23] DBI_1.2.3 minqa_1.2.8
## [25] abind_1.4-8 GenomicRanges_1.59.1
## [27] purrr_1.0.4 R.utils_2.13.0
## [29] RCurl_1.98-1.16 yulab.utils_0.2.0
## [31] VariantAnnotation_1.53.1 rappdirs_0.3.3
## [33] GenomeInfoDbData_1.2.13 IRanges_2.41.3
## [35] S4Vectors_0.45.4 tidytree_0.4.6
## [37] pkgdown_2.1.1 codetools_0.2-20
## [39] DelayedArray_0.33.6 tidyselect_1.2.1
## [41] aplot_0.2.4 UCSC.utils_1.3.1
## [43] farver_2.1.2 lme4_1.1-36
## [45] matrixStats_1.5.0 stats4_4.5.0
## [47] GenomicAlignments_1.43.0 jsonlite_1.9.0
## [49] Formula_1.2-5 systemfonts_1.2.1
## [51] tools_4.5.0 treeio_1.31.0
## [53] ragg_1.3.3 Rcpp_1.0.14
## [55] glue_1.8.0 SparseArray_1.7.6
## [57] xfun_0.51 MatrixGenerics_1.19.1
## [59] GenomeInfoDb_1.43.4 RNOmni_1.0.1.2
## [61] withr_3.0.2 BiocManager_1.30.25
## [63] fastmap_1.2.0 boot_1.3-31
## [65] digest_0.6.37 mime_0.12
## [67] R6_2.6.1 gridGraphics_0.5-1
## [69] textshaping_1.0.0 colorspace_2.1-1
## [71] RSQLite_2.3.9 R.methodsS3_1.8.2
## [73] tidyr_1.3.1 data.table_1.17.0
## [75] rtracklayer_1.67.1 httr_1.4.7
## [77] htmlwidgets_1.6.4 S4Arrays_1.7.3
## [79] pkgconfig_2.0.3 gtable_0.3.6
## [81] blob_1.2.4 SingleCellExperiment_1.29.1
## [83] XVector_0.47.2 htmltools_0.5.8.1
## [85] carData_3.0-5 bookdown_0.42
## [87] scales_1.3.0 Biobase_2.67.0
## [89] png_0.1-8 reformulas_0.4.0
## [91] ggfun_0.1.8 ggdendro_0.2.0
## [93] knitr_1.49 reshape2_1.4.4
## [95] rjson_0.2.23 nlme_3.1-167
## [97] curl_6.2.1 nloptr_2.1.1
## [99] cachem_1.1.0 stringr_1.5.1
## [101] BiocVersion_3.21.1 parallel_4.5.0
## [103] AnnotationDbi_1.69.0 restfulr_0.0.15
## [105] desc_1.4.3 pillar_1.10.1
## [107] grid_4.5.0 vctrs_0.6.5
## [109] ggpubr_0.6.0 car_3.1-3
## [111] evaluate_1.0.3 orthogene_1.13.0
## [113] GenomicFeatures_1.59.1 cli_3.6.4
## [115] compiler_4.5.0 Rsamtools_2.23.1
## [117] rlang_1.1.5 crayon_1.5.3
## [119] grr_0.9.5 ggsignif_0.6.4
## [121] labeling_0.4.3 gprofiler2_0.2.3
## [123] EWCE_1.15.1 ieugwasr_1.0.1
## [125] plyr_1.8.9 fs_1.6.5
## [127] stringi_1.8.4 viridisLite_0.4.2
## [129] BiocParallel_1.41.2 babelgene_22.9
## [131] munsell_0.5.1 Biostrings_2.75.4
## [133] lazyeval_0.2.2 gh_1.4.1
## [135] homologene_1.4.68.19.3.27 Matrix_1.7-2
## [137] MungeSumstats_1.15.12 BSgenome_1.75.1
## [139] patchwork_1.3.0 bit64_4.6.0-1
## [141] ggplot2_3.5.1 KEGGREST_1.47.0
## [143] statmod_1.5.0 SummarizedExperiment_1.37.0
## [145] rbibutils_2.3 broom_1.0.7
## [147] memoise_2.0.1 bslib_0.9.0
## [149] ggtree_3.15.0 bit_4.5.0.1
## [151] splitstackshape_1.4.8 ape_5.8-1