vignettes/MAGMA.Celltyping.Rmd
MAGMA.Celltyping is a software package that facilitates conducting cell-type-specific enrichment tests on GWAS summary statistics.
Specify where you want the large files to be downloaded to.
NOTE: Change storage_dir to somewhere other than tempdir() if you want the results to persist after this R session closes!
storage_dir <- tempdir()
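As a minimal sketch of the note above, the helper below creates a storage directory if it does not already exist and returns its path. `make_storage_dir` is a hypothetical helper written for illustration, not a MAGMA.Celltyping function, and the directory name is an assumption.

```r
# Hypothetical helper (not part of MAGMA.Celltyping): create a storage
# directory if needed and return its normalised path.
make_storage_dir <- function(path) {
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE, showWarnings = FALSE)
  }
  normalizePath(path, mustWork = TRUE)
}

# Demonstrated under tempdir() so the sketch is safe to run anywhere.
# For persistent results, swap in a location outside tempdir(), e.g.
# file.path(path.expand("~"), "MAGMA_celltyping_data") (an assumed path).
storage_dir <- make_storage_dir(
  file.path(tempdir(), "MAGMA_celltyping_data")
)
print(storage_dir)
```

Anything written under tempdir() is removed when the R session ends, which is why a path outside it is needed for results you want to keep.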
Example GWAS summary statistics can be obtained with get_example_gwas(). Here we provide a pre-munged version of that file.
Our lab has created MungeSumstats, a robust Bioconductor package for formatting multiple types of summary statistics files. We highly recommend processing your GWAS summary statistics with MungeSumstats before continuing. See the full_workflow vignette for more details.
The minimum info needed after munging is:
- “SNP”, “CHR”, and “BP” as the first three columns.
- At least one of these columns: “Z”, “OR”, “BETA”, “LOG_ODDS”, “SIGNED_SUMSTAT”.
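The two requirements above can be checked before running the pipeline. The following is a minimal base R sketch; `check_munged_columns` is a hypothetical helper for illustration (not part of MAGMA.Celltyping or MungeSumstats), and the toy data frame uses made-up values.

```r
# Hypothetical helper: check the minimum column requirements listed above.
check_munged_columns <- function(sumstats) {
  required_first <- c("SNP", "CHR", "BP")
  signed_cols <- c("Z", "OR", "BETA", "LOG_ODDS", "SIGNED_SUMSTAT")
  cols <- colnames(sumstats)

  # "SNP", "CHR", "BP" must be the first three columns, in that order.
  first_ok <- length(cols) >= 3 && identical(cols[1:3], required_first)
  # At least one signed statistic column must be present.
  signed_ok <- any(signed_cols %in% cols)

  list(first_three_ok = first_ok,
       signed_stat_ok = signed_ok,
       valid = first_ok && signed_ok)
}

# Toy summary statistics (made-up values, for illustration only).
toy <- data.frame(
  SNP = c("rs1", "rs2"),
  CHR = c(1, 2),
  BP = c(1000, 2000),
  P = c(0.5, 0.01),
  BETA = c(0.1, -0.2)
)

res <- check_munged_columns(toy)
print(res$valid)
```

A check like this only inspects column names and order; it does not validate the values themselves, which is the job of a dedicated munging step.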
path_formatted <- MAGMA.Celltyping::get_example_gwas(
    trait = "prospective_memory")
Note that you can specify the genome build of your summary statistics for this step; if left NULL, it will be inferred:
genesOutPath <- MAGMA.Celltyping::map_snps_to_genes(
    path_formatted = path_formatted,
    genome_build = "GRCh37")
Rather than preprocessing the GWAS yourself, you can instead use the MAGMA_Files_Public database we have created. It contains pre-computed MAGMA SNP-to-gene mapping files for hundreds of GWAS.
You can browse which GWAS traits are available by looking at the provided metadata.csv file.
magma_dirs <- MAGMA.Celltyping::import_magma_files(ids = "ieu-a-298")
## Using built-in example files: ieu-a-298.tsv.gz.35UP.10DOWN
## Returning MAGMA directories.
ewceData provides a number of CellTypeDatasets (CTD) to be used as cell-type transcriptomic signature reference files.
If you want to create your own single-cell transcriptomic reference, you’ll need to first convert it to CTD format using the instructions found in the EWCE package documentation.
ctd <- ewceData::ctd()
## see ?ewceData and browseVignettes('ewceData') for documentation
## loading from cache
Note that the cell type dataset loaded in the code above is the Karolinska cortex/hippocampus data only. For the full Karolinska dataset with hypothalamus and midbrain, use the following instead:
ctd <- MAGMA.Celltyping::get_ctd("ctd_allKI")
Or, to use another dataset such as DRONC-seq or AIBS, load one of the following:
ctd <- get_ctd("ctd_Tasic")
ctd <- get_ctd("ctd_DivSeq")
ctd <- get_ctd("ctd_AIBS")
ctd <- get_ctd("ctd_DRONC_human")
ctd <- get_ctd("ctd_DRONC_mouse")
ctd <- get_ctd("ctd_BlueLake2018_FrontalCortexOnly")
ctd <- get_ctd("ctd_BlueLake2018_VisualCortexOnly")
ctd <- get_ctd("ctd_Saunders")
MAGMA.Celltyping offers a suite of functions for conducting various types of cell-type-specific enrichment tests on GWAS summary statistics.
The celltype_associations_pipeline wraps several functions that in previous versions of MAGMA.Celltyping had to be set up and run separately. These include:
- calculate_celltype_associations(EnrichmentMode = "linear") internally. Activated when run_linear=TRUE.
- calculate_celltype_associations(EnrichmentMode = "Top 10%") internally. Activated when run_top10=TRUE.
- calculate_conditional_celltype_associations internally. Activated when run_conditional=TRUE.
Thus, celltype_associations_pipeline is designed to make these analyses easier to run.
MAGMA_results <- MAGMA.Celltyping::celltype_associations_pipeline(
    magma_dirs = magma_dirs,
    ctd = ctd,
    ctd_species = "mouse",
    ctd_name = "Zeisel2015",
    run_linear = TRUE,
    run_top10 = TRUE)
We’ve also saved a pre-computed version of these results as a dataset:
MAGMA_results <- MAGMA.Celltyping::enrichment_results
merge_results imports each of the MAGMA enrichment results files and merges them into one so that they can easily be plotted and further analysed.
merged_results <- MAGMA.Celltyping::merge_results(
    MAGMA_results = MAGMA_results)
## Saving full merged results to ==> /tmp/Rtmpj3yhpr/MAGMA_celltyping./.lvl1.csv
knitr::kable(merged_results)
GWAS | Celltype | TYPE | OBS_GENES | BETA | BETA_STD | SE | P | log10p | level | Method | EnrichmentMode | GCOV_FILE | CONTROL | CONTROL_label | genesOutCOND | analysis_name | FDR | Celltype_id |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ieu-a-298.tsv.gz.35UP.10DOWN | o l i g o d e n d r o c y t e s | COVAR | 962 | 0.0016776 | 0.0199640 | 0.0024613 | 0.24785 | -0.6058111 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.867475 | oligodendrocytes |
ieu-a-298.tsv.gz.35UP.10DOWN | a s t r o c y t e s _ e p e n d y m a l | SET | 104 | 0.0671670 | 0.0178860 | 0.0853000 | 0.21562 | -0.6663110 | 1 | MAGMA | Top 10% | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_top10.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_top10 | 0.867475 | astrocytes_ependymal |
ieu-a-298.tsv.gz.35UP.10DOWN | e n d o t h e l i a l _ m u r a l | SET | 98 | 0.0831670 | 0.0215500 | 0.0962320 | 0.19384 | -0.7125566 | 1 | MAGMA | Top 10% | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_top10.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_top10 | 0.867475 | endothelial_mural |
ieu-a-298.tsv.gz.35UP.10DOWN | o l i g o d e n d r o c y t e s | SET | 94 | 0.1124000 | 0.0285690 | 0.1017200 | 0.13472 | -0.8705679 | 1 | MAGMA | Top 10% | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_top10.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_top10 | 0.867475 | oligodendrocytes |
ieu-a-298.tsv.gz.35UP.10DOWN | a s t r o c y t e s _ e p e n d y m a l | COVAR | 962 | -0.0003729 | -0.0045268 | 0.0022566 | 0.56561 | -0.2474829 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.981580 | astrocytes_ependymal |
ieu-a-298.tsv.gz.35UP.10DOWN | e n d o t h e l i a l _ m u r a l | COVAR | 962 | 0.0002391 | 0.0029144 | 0.0023811 | 0.46002 | -0.3372233 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.981580 | endothelial_mural |
ieu-a-298.tsv.gz.35UP.10DOWN | i n t e r n e u r o n s | COVAR | 962 | -0.0033333 | -0.0393200 | 0.0024009 | 0.91728 | -0.0374981 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.981580 | interneurons |
ieu-a-298.tsv.gz.35UP.10DOWN | m i c r o g l i a | COVAR | 962 | -0.0002839 | -0.0036131 | 0.0022446 | 0.55030 | -0.2594005 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.981580 | microglia |
ieu-a-298.tsv.gz.35UP.10DOWN | p y r a m i d a l _ C A 1 | COVAR | 962 | -0.0037493 | -0.0440450 | 0.0023802 | 0.94218 | -0.0258661 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.981580 | pyramidal_CA1 |
ieu-a-298.tsv.gz.35UP.10DOWN | p y r a m i d a l _ S S | COVAR | 962 | -0.0023657 | -0.0282810 | 0.0023306 | 0.84480 | -0.0732461 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.981580 | pyramidal_SS |
ieu-a-298.tsv.gz.35UP.10DOWN | i n t e r n e u r o n s | SET | 106 | -0.1554800 | -0.0417660 | 0.0911160 | 0.95587 | -0.0196012 | 1 | MAGMA | Top 10% | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_top10.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_top10 | 0.981580 | interneurons |
ieu-a-298.tsv.gz.35UP.10DOWN | m i c r o g l i a | SET | 91 | -0.0341170 | -0.0085425 | 0.1093000 | 0.62250 | -0.2058606 | 1 | MAGMA | Top 10% | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_top10.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_top10 | 0.981580 | microglia |
ieu-a-298.tsv.gz.35UP.10DOWN | p y r a m i d a l _ C A 1 | SET | 111 | -0.1930200 | -0.0529550 | 0.0923320 | 0.98158 | -0.0080743 | 1 | MAGMA | Top 10% | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_top10.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_top10 | 0.981580 | pyramidal_CA1 |
ieu-a-298.tsv.gz.35UP.10DOWN | p y r a m i d a l _ S S | SET | 98 | -0.0750110 | -0.0194370 | 0.0965900 | 0.78120 | -0.1072378 | 1 | MAGMA | Top 10% | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_top10.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_top10 | 0.981580 | pyramidal_SS |
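To help read the FDR and log10p columns, the base R sketch below recomputes them from the P column of the table above. The FDR values shown are consistent with a Benjamini-Hochberg adjustment across all 14 rows; whether merge_results computes them exactly this way internally is an assumption, not something this vignette states.

```r
# P-values copied from the merged results table above (same row order).
p <- c(0.24785, 0.21562, 0.19384, 0.13472, 0.56561, 0.46002, 0.91728,
       0.55030, 0.94218, 0.84480, 0.95587, 0.62250, 0.98158, 0.78120)

# Benjamini-Hochberg adjustment across all 14 tests (stats::p.adjust).
fdr_bh <- p.adjust(p, method = "BH")

# The log10p column corresponds to log10 of the raw p-value.
log10p <- log10(p)

print(round(fdr_bh, 6))
print(round(log10p, 7))
```

Only two distinct adjusted values appear (0.867475 and 0.98158) because the Benjamini-Hochberg procedure enforces monotonicity, so several neighbouring p-values share the same adjusted value.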
Now we’ll construct a heatmap visualizing the enrichment results, such that each GWAS is shown on the y-axis and each cell type is shown on the x-axis. Results can be further faceted by the kind of test that was run (linear, top 10%, and/or conditional).
heat <- MAGMA.Celltyping::results_heatmap(
    merged_results = merged_results,
    title = "Alzheimer's Disease (ieu-a-298) vs. nervous system cell-types (Zeisel2015)",
    fdr_thresh = 1)
## 14 results @ FDR < 1
## Warning: The `facets` argument of `facet_grid()` is deprecated as of ggplot2 2.2.0.
## ℹ Please use the `rows` argument instead.
## ℹ The deprecated feature was likely used in the MAGMA.Celltyping package.
## Please report the issue at
## <https://github.com/neurogenomics/MAGMA_Celltyping/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
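Rank the phenotypes by their number of cell-type enrichment results. Note that the code below counts every tested cell type per GWAS and enrichment mode; to count only significant results, filter on FDR first.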
top_phenos <- merged_results %>%
    dplyr::group_by(EnrichmentMode, GWAS) %>%
    dplyr::summarise(Celltype=dplyr::n_distinct(Celltype)) %>%
    dplyr::arrange(dplyr::desc(Celltype))
## `summarise()` has grouped output by 'EnrichmentMode'. You can override using
## the `.groups` argument.
knitr::kable(top_phenos)
EnrichmentMode | GWAS | Celltype |
---|---|---|
Linear | ieu-a-298.tsv.gz.35UP.10DOWN | 7 |
Top 10% | ieu-a-298.tsv.gz.35UP.10DOWN | 7 |
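The counts above include every tested cell type, since no FDR filter is applied. As a minimal base R sketch of counting only results below a chosen FDR threshold, the example below reuses the EnrichmentMode and FDR values from the merged results table; `count_significant` is a hypothetical helper and the thresholds are illustrative assumptions.

```r
# EnrichmentMode and FDR copied from the merged results table (row order).
mode <- c("Linear", rep("Top 10%", 3), rep("Linear", 6), rep("Top 10%", 4))
fdr <- c(rep(0.867475, 4), rep(0.981580, 10))

# Hypothetical helper: number of results per enrichment mode with
# FDR below `thresh`. Modes with no hits are reported as 0.
count_significant <- function(fdr, mode, thresh) {
  tapply(fdr < thresh, mode, sum)
}

n_sig_strict <- count_significant(fdr, mode, thresh = 0.05)
n_sig_loose <- count_significant(fdr, mode, thresh = 0.9)

print(n_sig_strict)
print(n_sig_loose)
```

Using tapply() on the logical vector keeps groups with zero hits in the output, whereas filtering rows first and then counting would silently drop them.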
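Get the phenotype-cell-type enrichment results with the lowest FDR (top two per phenotype and enrichment mode). Because slice_min() keeps ties by default, more than two rows can be returned per group.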
top_enrich <- merged_results %>%
    dplyr::group_by(EnrichmentMode, GWAS) %>%
    dplyr::slice_min(FDR, n = 2)
knitr::kable(top_enrich)
GWAS | Celltype | TYPE | OBS_GENES | BETA | BETA_STD | SE | P | log10p | level | Method | EnrichmentMode | GCOV_FILE | CONTROL | CONTROL_label | genesOutCOND | analysis_name | FDR | Celltype_id |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ieu-a-298.tsv.gz.35UP.10DOWN | o l i g o d e n d r o c y t e s | COVAR | 962 | 0.0016776 | 0.0199640 | 0.0024613 | 0.24785 | -0.6058111 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.867475 | oligodendrocytes |
ieu-a-298.tsv.gz.35UP.10DOWN | a s t r o c y t e s _ e p e n d y m a l | COVAR | 962 | -0.0003729 | -0.0045268 | 0.0022566 | 0.56561 | -0.2474829 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.981580 | astrocytes_ependymal |
ieu-a-298.tsv.gz.35UP.10DOWN | e n d o t h e l i a l _ m u r a l | COVAR | 962 | 0.0002391 | 0.0029144 | 0.0023811 | 0.46002 | -0.3372233 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.981580 | endothelial_mural |
ieu-a-298.tsv.gz.35UP.10DOWN | i n t e r n e u r o n s | COVAR | 962 | -0.0033333 | -0.0393200 | 0.0024009 | 0.91728 | -0.0374981 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.981580 | interneurons |
ieu-a-298.tsv.gz.35UP.10DOWN | m i c r o g l i a | COVAR | 962 | -0.0002839 | -0.0036131 | 0.0022446 | 0.55030 | -0.2594005 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.981580 | microglia |
ieu-a-298.tsv.gz.35UP.10DOWN | p y r a m i d a l _ C A 1 | COVAR | 962 | -0.0037493 | -0.0440450 | 0.0023802 | 0.94218 | -0.0258661 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.981580 | pyramidal_CA1 |
ieu-a-298.tsv.gz.35UP.10DOWN | p y r a m i d a l _ S S | COVAR | 962 | -0.0023657 | -0.0282810 | 0.0023306 | 0.84480 | -0.0732461 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.981580 | pyramidal_SS |
ieu-a-298.tsv.gz.35UP.10DOWN | a s t r o c y t e s _ e p e n d y m a l | SET | 104 | 0.0671670 | 0.0178860 | 0.0853000 | 0.21562 | -0.6663110 | 1 | MAGMA | Top 10% | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_top10.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_top10 | 0.867475 | astrocytes_ependymal |
ieu-a-298.tsv.gz.35UP.10DOWN | e n d o t h e l i a l _ m u r a l | SET | 98 | 0.0831670 | 0.0215500 | 0.0962320 | 0.19384 | -0.7125566 | 1 | MAGMA | Top 10% | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_top10.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_top10 | 0.867475 | endothelial_mural |
ieu-a-298.tsv.gz.35UP.10DOWN | o l i g o d e n d r o c y t e s | SET | 94 | 0.1124000 | 0.0285690 | 0.1017200 | 0.13472 | -0.8705679 | 1 | MAGMA | Top 10% | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_top10.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_top10 | 0.867475 | oligodendrocytes |
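The table above contains more than two rows per group because tied FDR values are all retained. The base R sketch below emulates that behaviour with a minimum-rank rule on the FDR values from the merged results table; it is an illustration of the tie handling, not the dplyr implementation itself.

```r
# EnrichmentMode and FDR copied from the merged results table (row order).
res <- data.frame(
  EnrichmentMode = c("Linear", rep("Top 10%", 3),
                     rep("Linear", 6), rep("Top 10%", 4)),
  FDR = c(rep(0.867475, 4), rep(0.981580, 10))
)

# Within each enrichment mode, rank FDR with ties sharing the lowest rank,
# then keep every row whose rank is at most 2.
fdr_rank <- ave(res$FDR, res$EnrichmentMode,
                FUN = function(x) rank(x, ties.method = "min"))
top <- res[fdr_rank <= 2, ]

print(table(top$EnrichmentMode))
```

For the linear results, one row has rank 1 and six tied rows share rank 2, so all seven are kept; for the top 10% results, three rows tie at rank 1 and the remaining rows have rank 4, so only three are kept. If exactly two rows per group are required, slice_min() accepts with_ties = FALSE.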
utils::sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.5 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ewceData_1.13.0 ExperimentHub_2.13.1 AnnotationHub_3.13.3
## [4] BiocFileCache_2.13.0 dbplyr_2.5.0 BiocGenerics_0.51.3
## [7] dplyr_1.1.4 MAGMA.Celltyping_2.0.14 BiocStyle_2.33.1
##
## loaded via a namespace (and not attached):
## [1] splines_4.4.1 BiocIO_1.15.2
## [3] bitops_1.0-9 ggplotify_0.1.2
## [5] filelock_1.0.3 tibble_3.2.1
## [7] R.oo_1.26.0 XML_3.99-0.17
## [9] lifecycle_1.0.4 rstatix_0.7.2
## [11] lattice_0.22-6 MASS_7.3-61
## [13] backports_1.5.0 magrittr_2.0.3
## [15] limma_3.61.12 plotly_4.10.4
## [17] sass_0.4.9 rmarkdown_2.28
## [19] jquerylib_0.1.4 yaml_2.3.10
## [21] HGNChelper_0.8.14 minqa_1.2.8
## [23] DBI_1.2.3 abind_1.4-8
## [25] zlibbioc_1.51.1 GenomicRanges_1.57.1
## [27] purrr_1.0.2 R.utils_2.12.3
## [29] RCurl_1.98-1.16 yulab.utils_0.1.7
## [31] VariantAnnotation_1.51.1 rappdirs_0.3.3
## [33] GenomeInfoDbData_1.2.13 IRanges_2.39.2
## [35] S4Vectors_0.43.2 tidytree_0.4.6
## [37] pkgdown_2.1.1 codetools_0.2-20
## [39] DelayedArray_0.31.13 tidyselect_1.2.1
## [41] aplot_0.2.3 UCSC.utils_1.1.0
## [43] farver_2.1.2 lme4_1.1-35.5
## [45] matrixStats_1.4.1 stats4_4.4.1
## [47] GenomicAlignments_1.41.0 jsonlite_1.8.9
## [49] Formula_1.2-5 systemfonts_1.1.0
## [51] tools_4.4.1 treeio_1.29.1
## [53] ragg_1.3.3 Rcpp_1.0.13
## [55] glue_1.8.0 SparseArray_1.5.41
## [57] xfun_0.48 MatrixGenerics_1.17.0
## [59] GenomeInfoDb_1.41.2 RNOmni_1.0.1.2
## [61] withr_3.0.1 BiocManager_1.30.25
## [63] fastmap_1.2.0 boot_1.3-31
## [65] fansi_1.0.6 digest_0.6.37
## [67] mime_0.12 R6_2.5.1
## [69] gridGraphics_0.5-1 textshaping_0.4.0
## [71] colorspace_2.1-1 RSQLite_2.3.7
## [73] R.methodsS3_1.8.2 utf8_1.2.4
## [75] tidyr_1.3.1 generics_0.1.3
## [77] data.table_1.16.0 rtracklayer_1.65.0
## [79] httr_1.4.7 htmlwidgets_1.6.4
## [81] S4Arrays_1.5.10 pkgconfig_2.0.3
## [83] gtable_0.3.5 blob_1.2.4
## [85] SingleCellExperiment_1.27.2 XVector_0.45.0
## [87] htmltools_0.5.8.1 carData_3.0-5
## [89] bookdown_0.40 scales_1.3.0
## [91] Biobase_2.65.1 png_0.1-8
## [93] ggfun_0.1.6 ggdendro_0.2.0
## [95] knitr_1.48 reshape2_1.4.4
## [97] rjson_0.2.23 nloptr_2.1.1
## [99] nlme_3.1-166 curl_5.2.3
## [101] cachem_1.1.0 stringr_1.5.1
## [103] BiocVersion_3.20.0 parallel_4.4.1
## [105] AnnotationDbi_1.67.0 restfulr_0.0.15
## [107] desc_1.4.3 pillar_1.9.0
## [109] grid_4.4.1 vctrs_0.6.5
## [111] ggpubr_0.6.0 car_3.1-3
## [113] evaluate_1.0.0 orthogene_1.11.0
## [115] GenomicFeatures_1.57.1 cli_3.6.3
## [117] compiler_4.4.1 Rsamtools_2.21.2
## [119] rlang_1.1.4 crayon_1.5.3
## [121] grr_0.9.5 ggsignif_0.6.4
## [123] labeling_0.4.3 gprofiler2_0.2.3
## [125] EWCE_1.13.1 plyr_1.8.9
## [127] fs_1.6.4 stringi_1.8.4
## [129] viridisLite_0.4.2 BiocParallel_1.39.0
## [131] assertthat_0.2.1 babelgene_22.9
## [133] munsell_0.5.1 Biostrings_2.73.2
## [135] lazyeval_0.2.2 gh_1.4.1
## [137] homologene_1.4.68.19.3.27 Matrix_1.7-0
## [139] MungeSumstats_1.13.7 BSgenome_1.73.1
## [141] patchwork_1.3.0 bit64_4.5.2
## [143] ggplot2_3.5.1 KEGGREST_1.45.1
## [145] statmod_1.5.0 highr_0.11
## [147] SummarizedExperiment_1.35.3 googleAuthR_2.0.2
## [149] gargle_1.5.2 broom_1.0.7
## [151] memoise_2.0.1 bslib_0.8.0
## [153] ggtree_3.13.1 bit_4.5.0
## [155] splitstackshape_1.4.8 ape_5.8