Intro

MAGMA.Celltyping is a software package for running cell-type-specific enrichment tests on GWAS summary statistics.

Setup

Specify where you want the large files to be downloaded to.

NOTE: Change storage_dir to a location other than tempdir() if you want the results to persist after this R session closes!

storage_dir <- tempdir()
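If you do want a persistent location, one option is a per-user cache folder. The sketch below is our own assumption, not a MAGMA.Celltyping default: it uses base R's tools::R_user_dir() (available since R 4.0.0) to pick a folder that survives between sessions, and creates it if needed.

```r
# Assumption: any persistent, writable folder works; this is just a convenient choice.
# tools::R_user_dir() returns a per-user path, e.g. ~/.cache/R/MAGMA.Celltyping on Linux.
storage_dir <- tools::R_user_dir("MAGMA.Celltyping", which = "cache")

# Create the folder (and any missing parent folders) if it does not exist yet
dir.create(storage_dir, recursive = TRUE, showWarnings = FALSE)
storage_dir
```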

Prepare data

GWAS

  • We need to have a summary statistics file to analyse as input.
  • As an example, you can download UK Biobank summary statistics for traits such as ‘fluid_intelligence’ or ‘prospective_memory’ using get_example_gwas().

Here we use a pre-munged version of one of these files (trait = "prospective_memory").

Munging

Our lab have created MungeSumstats, a robust Bioconductor package for formatting multiple types of summary statistics files. We highly recommend processing your GWAS summary statistics with MungeSumstats before continuing. See the full_workflow vignette for more details.

The minimum info needed after munging is:

  • “SNP”, “CHR”, and “BP” as the first three columns.
  • At least one of these columns: “Z”, “OR”, “BETA”, “LOG_ODDS”, “SIGNED_SUMSTAT”.
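The two requirements above can be checked quickly before running the pipeline. The helper below, check_min_columns(), is hypothetical (it is not part of MAGMA.Celltyping or MungeSumstats) and works on any data.frame; the rows are made-up toy values for illustration only.

```r
# Hypothetical helper (not part of MAGMA.Celltyping): checks the minimum
# column requirements for munged summary statistics on a data.frame.
check_min_columns <- function(sumstats) {
  cols <- colnames(sumstats)
  # "SNP", "CHR", "BP" must be the first three columns, in that order
  first_three_ok <- length(cols) >= 3 && identical(cols[1:3], c("SNP", "CHR", "BP"))
  # At least one effect-size column must be present
  effect_cols <- c("Z", "OR", "BETA", "LOG_ODDS", "SIGNED_SUMSTAT")
  first_three_ok && any(effect_cols %in% cols)
}

# Toy rows for illustration only
ok <- data.frame(SNP = c("rs1", "rs2"), CHR = c(1, 2), BP = c(1000, 2000),
                 BETA = c(0.1, -0.2), P = c(0.01, 0.5))
bad_order <- ok[, c("CHR", "SNP", "BP", "BETA", "P")]
no_effect <- ok[, c("SNP", "CHR", "BP", "P")]

check_min_columns(ok)         # TRUE
check_min_columns(bad_order)  # FALSE
check_min_columns(no_effect)  # FALSE
```

On a real file, the same check should work on the first few rows read with, for example, data.table::fread(path_formatted, nrows = 5).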

path_formatted <- MAGMA.Celltyping::get_example_gwas(
  trait = "prospective_memory")

Map SNPs to Genes

Note that you can specify the genome build of your summary statistics for this step; if genome_build is left NULL, it will be inferred:

genesOutPath <- MAGMA.Celltyping::map_snps_to_genes(
  path_formatted = path_formatted,
  genome_build = "GRCh37")
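Conceptually, this step assigns each SNP to the genes whose surrounding window it falls in. The MAGMA directory names used later in this vignette end in 35UP.10DOWN, which we read as a window of 35 kb upstream and 10 kb downstream of each gene. The sketch below is a simplified, hypothetical illustration of such a strand-aware window assignment on toy coordinates; it is not a reimplementation of map_snps_to_genes(), and the function name map_snps_window() is our own.

```r
# Toy gene and SNP tables (hypothetical coordinates)
genes <- data.frame(gene = c("GENE_A", "GENE_B"), chr = c(1, 1),
                    start = c(100000, 500000), end = c(120000, 520000),
                    strand = c("+", "-"))
snps <- data.frame(SNP = c("rs1", "rs2", "rs3", "rs4"), CHR = c(1, 1, 1, 1),
                   BP = c(70000, 125000, 530000, 140000))

# Hypothetical helper: assign SNPs to genes using an asymmetric window (in kb)
map_snps_window <- function(snps, genes, up_kb = 35, down_kb = 10) {
  up <- up_kb * 1000
  down <- down_kb * 1000
  # Upstream lies before `start` on the "+" strand but after `end` on the "-" strand
  win_start <- ifelse(genes$strand == "+", genes$start - up, genes$start - down)
  win_end   <- ifelse(genes$strand == "+", genes$end + down, genes$end + up)
  hits <- lapply(seq_len(nrow(genes)), function(i) {
    in_win <- snps$CHR == genes$chr[i] & snps$BP >= win_start[i] & snps$BP <= win_end[i]
    if (!any(in_win)) return(NULL)
    data.frame(gene = genes$gene[i], SNP = snps$SNP[in_win])
  })
  do.call(rbind, hits)  # NULL entries (genes without SNPs) are dropped by rbind
}

map_snps_window(snps, genes)
```

Here rs4 sits 20 kb downstream of GENE_A, outside its 10 kb downstream flank, so it is not assigned to any gene.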

MAGMA_Files_Public

Rather than preprocessing the GWAS yourself, you can instead use the MAGMA_Files_Public database we have created. It contains pre-computed MAGMA SNP-to-gene mapping files for hundreds of GWAS.

You can browse which GWAS traits are available by looking at the provided metadata.csv file.

magma_dirs <- MAGMA.Celltyping::import_magma_files(ids = "ieu-a-298")
## Using built-in example files: ieu-a-298.tsv.gz.35UP.10DOWN
## Returning MAGMA directories.
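When working with many MAGMA directories, it can help to recover the gene window each one was built with from its name. The helper below, parse_magma_window(), is hypothetical and assumes names end in "<up>UP.<down>DOWN", as in the ieu-a-298.tsv.gz.35UP.10DOWN example above.

```r
# Hypothetical helper: extract the gene window (in kb) from a MAGMA directory name.
# Assumes the name ends in "<up>UP.<down>DOWN", e.g. "ieu-a-298.tsv.gz.35UP.10DOWN".
parse_magma_window <- function(dir_path) {
  dir_name <- basename(dir_path)
  m <- regmatches(dir_name, regexec("([0-9]+)UP\\.([0-9]+)DOWN$", dir_name))[[1]]
  if (length(m) == 0) stop("No '<up>UP.<down>DOWN' suffix in: ", dir_name)
  c(upstream_kb = as.numeric(m[2]), downstream_kb = as.numeric(m[3]))
}

# returns c(upstream_kb = 35, downstream_kb = 10)
parse_magma_window("ieu-a-298.tsv.gz.35UP.10DOWN")
```

If magma_dirs is a character vector of paths, as the message above suggests, sapply(magma_dirs, parse_magma_window) should tabulate the windows for all of them.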

CellTypeDataset

ewceData provides a number of CellTypeDatasets (CTDs) to be used as cell-type transcriptomic signature reference files.

If you want to create your own single-cell transcriptomic reference, you’ll first need to convert it to a CTD using the instructions in the EWCE package documentation.

ctd <- ewceData::ctd()
## see ?ewceData and browseVignettes('ewceData') for documentation
## loading from cache

Note that the cell type dataset loaded in the code above contains only the Karolinska cortex/hippocampus data. For the full Karolinska dataset, including hypothalamus and midbrain, use the following instead:

ctd <- MAGMA.Celltyping::get_ctd("ctd_allKI")

Or, for other datasets such as DroNc-seq or AIBS, use one of the following:

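# Each call overwrites `ctd`, so run only the line for the dataset you need
ctd <- MAGMA.Celltyping::get_ctd("ctd_Tasic")
ctd <- MAGMA.Celltyping::get_ctd("ctd_DivSeq")
ctd <- MAGMA.Celltyping::get_ctd("ctd_AIBS")
ctd <- MAGMA.Celltyping::get_ctd("ctd_DRONC_human")
ctd <- MAGMA.Celltyping::get_ctd("ctd_DRONC_mouse")
ctd <- MAGMA.Celltyping::get_ctd("ctd_BlueLake2018_FrontalCortexOnly")
ctd <- MAGMA.Celltyping::get_ctd("ctd_BlueLake2018_VisualCortexOnly")
ctd <- MAGMA.Celltyping::get_ctd("ctd_Saunders")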
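Each level of an EWCE CTD stores per-cell-type mean expression (mean_exp) and a derived specificity matrix, where specificity is the proportion of a gene's total mean expression that falls in each cell type. The sketch below computes that quantity on a toy, hypothetical matrix in base R; real CTDs are built with EWCE and contain further fields.

```r
# Toy mean-expression matrix (hypothetical values): rows = genes, columns = cell types
mean_exp <- matrix(
  c(10, 0, 0,
     5, 5, 0,
     1, 1, 8,
     2, 2, 2,
     0, 0, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("Gene", 1:5), c("microglia", "astrocytes", "neurons"))
)

# Drop genes with no expression anywhere (their specificity would be 0/0 = NaN)
mean_exp <- mean_exp[rowSums(mean_exp) > 0, , drop = FALSE]

# Specificity: share of each gene's total mean expression found in each cell type.
# Dividing a matrix by a vector of length nrow() divides each row by its own total.
specificity <- mean_exp / rowSums(mean_exp)
round(specificity, 2)
```

Gene1 is fully specific to microglia (specificity 1), whereas Gene4, expressed equally everywhere, gets 1/3 in every cell type.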

Run cell-type enrichment analyses

MAGMA.Celltyping offers a suite of functions for conducting various types of cell-type-specific enrichment tests on GWAS summary statistics.

The celltype_associations_pipeline wraps several functions that in previous versions of MAGMA.Celltyping had to be set up and run separately. These include:

  • Linear enrichment: Uses calculate_celltype_associations(EnrichmentMode = "Linear") internally. Activated when run_linear=TRUE.
  • Top 10% enrichment: Uses calculate_celltype_associations(EnrichmentMode = "Top 10%") internally. Activated when run_top10=TRUE.
  • Conditional enrichment: Uses calculate_conditional_celltype_associations internally. Activated when run_conditional=TRUE.

Thus, celltype_associations_pipeline is designed to make these analyses easier to run.

MAGMA_results <- MAGMA.Celltyping::celltype_associations_pipeline(
  magma_dirs = magma_dirs,
  ctd = ctd,
  ctd_species = "mouse", 
  ctd_name = "Zeisel2015", 
  run_linear = TRUE, 
  run_top10 = TRUE)

We’ve also saved a pre-computed version of these results as a dataset:

MAGMA_results <- MAGMA.Celltyping::enrichment_results
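The "Top 10%" mode tests, for each cell type, a gene set made of that cell type's most specific genes. The sketch below shows that selection under the simplest reading we can assume (the 10% of genes with the highest specificity, rounded up) on a hypothetical specificity matrix; the package's exact binning may differ, and top_genes() is our own name.

```r
# Hypothetical specificity matrix: 20 genes x 2 cell types
specificity <- cbind(
  microglia = seq(0.05, 1, by = 0.05),
  neurons   = rev(seq(0.05, 1, by = 0.05))
)
rownames(specificity) <- paste0("Gene", 1:20)

# For each cell type, keep the top `prop` fraction of genes ranked by specificity
top_genes <- function(spec, prop = 0.1) {
  k <- ceiling(prop * nrow(spec))
  lapply(
    setNames(colnames(spec), colnames(spec)),
    function(ct) rownames(spec)[head(order(spec[, ct], decreasing = TRUE), k)]
  )
}

gene_sets <- top_genes(specificity)
gene_sets
```

Each resulting gene set would then be tested as a MAGMA gene set, which matches the TYPE = SET rows for "Top 10%" in the merged results.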

Plot results

Merge results

merge_results imports each of the MAGMA enrichment results files and merges them into one so that they can easily be plotted and further analysed.

merged_results <- MAGMA.Celltyping::merge_results(
  MAGMA_results = MAGMA_results)
## Saving full merged results to ==> /tmp/RtmpfhgSUF/MAGMA_celltyping./.lvl1.csv
knitr::kable(merged_results)
| GWAS | Celltype | TYPE | OBS_GENES | BETA | BETA_STD | SE | P | log10p | level | Method | EnrichmentMode | GCOV_FILE | CONTROL | CONTROL_label | genesOutCOND | analysis_name | FDR | Celltype_id |
|:---|:---|:---|---:|---:|---:|---:|---:|---:|---:|:---|:---|:---|:---|:---|:---|:---|---:|:---|
| ieu-a-298.tsv.gz.35UP.10DOWN | oligodendrocytes | COVAR | 962 | 0.0016776 | 0.0199640 | 0.0024613 | 0.24785 | -0.6058111 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.867475 | oligodendrocytes |
| ieu-a-298.tsv.gz.35UP.10DOWN | astrocytes_ependymal | SET | 104 | 0.0671670 | 0.0178860 | 0.0853000 | 0.21562 | -0.6663110 | 1 | MAGMA | Top 10% | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_top10.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_top10 | 0.867475 | astrocytes_ependymal |
| ieu-a-298.tsv.gz.35UP.10DOWN | endothelial_mural | SET | 98 | 0.0831670 | 0.0215500 | 0.0962320 | 0.19384 | -0.7125566 | 1 | MAGMA | Top 10% | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_top10.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_top10 | 0.867475 | endothelial_mural |
| ieu-a-298.tsv.gz.35UP.10DOWN | oligodendrocytes | SET | 94 | 0.1124000 | 0.0285690 | 0.1017200 | 0.13472 | -0.8705679 | 1 | MAGMA | Top 10% | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_top10.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_top10 | 0.867475 | oligodendrocytes |
| ieu-a-298.tsv.gz.35UP.10DOWN | astrocytes_ependymal | COVAR | 962 | -0.0003729 | -0.0045268 | 0.0022566 | 0.56561 | -0.2474829 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.981580 | astrocytes_ependymal |
| ieu-a-298.tsv.gz.35UP.10DOWN | endothelial_mural | COVAR | 962 | 0.0002391 | 0.0029144 | 0.0023811 | 0.46002 | -0.3372233 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.981580 | endothelial_mural |
| ieu-a-298.tsv.gz.35UP.10DOWN | interneurons | COVAR | 962 | -0.0033333 | -0.0393200 | 0.0024009 | 0.91728 | -0.0374981 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.981580 | interneurons |
| ieu-a-298.tsv.gz.35UP.10DOWN | microglia | COVAR | 962 | -0.0002839 | -0.0036131 | 0.0022446 | 0.55030 | -0.2594005 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.981580 | microglia |
| ieu-a-298.tsv.gz.35UP.10DOWN | pyramidal_CA1 | COVAR | 962 | -0.0037493 | -0.0440450 | 0.0023802 | 0.94218 | -0.0258661 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.981580 | pyramidal_CA1 |
| ieu-a-298.tsv.gz.35UP.10DOWN | pyramidal_SS | COVAR | 962 | -0.0023657 | -0.0282810 | 0.0023306 | 0.84480 | -0.0732461 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.981580 | pyramidal_SS |
| ieu-a-298.tsv.gz.35UP.10DOWN | interneurons | SET | 106 | -0.1554800 | -0.0417660 | 0.0911160 | 0.95587 | -0.0196012 | 1 | MAGMA | Top 10% | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_top10.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_top10 | 0.981580 | interneurons |
| ieu-a-298.tsv.gz.35UP.10DOWN | microglia | SET | 91 | -0.0341170 | -0.0085425 | 0.1093000 | 0.62250 | -0.2058606 | 1 | MAGMA | Top 10% | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_top10.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_top10 | 0.981580 | microglia |
| ieu-a-298.tsv.gz.35UP.10DOWN | pyramidal_CA1 | SET | 111 | -0.1930200 | -0.0529550 | 0.0923320 | 0.98158 | -0.0080743 | 1 | MAGMA | Top 10% | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_top10.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_top10 | 0.981580 | pyramidal_CA1 |
| ieu-a-298.tsv.gz.35UP.10DOWN | pyramidal_SS | SET | 98 | -0.0750110 | -0.0194370 | 0.0965900 | 0.78120 | -0.1072378 | 1 | MAGMA | Top 10% | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_top10.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_top10 | 0.981580 | pyramidal_SS |
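The FDR column in this table is consistent with a Benjamini-Hochberg correction applied across all 14 tests at once, with both enrichment modes pooled. We assume that is what merge_results() does; the sketch below reproduces the column from the P values in the table using base R's p.adjust().

```r
# P values copied from the merged results table (one GWAS, 7 cell types x 2 modes)
res <- data.frame(
  Celltype = rep(c("astrocytes_ependymal", "endothelial_mural", "interneurons",
                   "microglia", "oligodendrocytes", "pyramidal_CA1", "pyramidal_SS"), 2),
  EnrichmentMode = rep(c("Linear", "Top 10%"), each = 7),
  P = c(0.56561, 0.46002, 0.91728, 0.55030, 0.24785, 0.94218, 0.84480,  # Linear
        0.21562, 0.19384, 0.95587, 0.62250, 0.13472, 0.98158, 0.78120)  # Top 10%
)

# Benjamini-Hochberg adjustment across all 14 tests pooled together
res$FDR_BH <- p.adjust(res$P, method = "BH")
res[order(res$FDR_BH), ]
```

Adjusting within each enrichment mode instead, for example with ave(res$P, res$EnrichmentMode, FUN = function(p) p.adjust(p, "BH")), gives different values, so it is worth checking which convention your downstream filtering assumes.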

Heatmap

Now we’ll construct a heatmap visualizing the enrichment results, such that each GWAS is shown on the y-axis and each cell-type is shown on the x-axis. Results can be further facetted by what kind of test was run (linear, top10%, and/or conditional).

heat <- MAGMA.Celltyping::results_heatmap(
  merged_results = merged_results, 
  title = "Alzheimer's Disease (ieu-a-298) vs. nervous system cell-types (Zeisel2015)",
  fdr_thresh = 1)
## 14 results @ FDR < 1
## Warning: The `facets` argument of `facet_grid()` is deprecated as of ggplot2 2.2.0.
##  Please use the `rows` argument instead.
##  The deprecated feature was likely used in the MAGMA.Celltyping package.
##   Please report the issue at
##   <https://github.com/neurogenomics/MAGMA_Celltyping/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
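With a single GWAS, the heatmap reduces to one row per enrichment mode (the facet) and one column per cell type (the x-axis). As a rough base-R stand-in, the sketch below arranges -log10(P) from the merged results into that layout. It is an assumption that -log10(P) is a useful fill value here; results_heatmap() may colour tiles by a different statistic.

```r
# P values copied from the merged results table (one GWAS, 7 cell types x 2 modes)
res <- data.frame(
  Celltype = rep(c("astrocytes_ependymal", "endothelial_mural", "interneurons",
                   "microglia", "oligodendrocytes", "pyramidal_CA1", "pyramidal_SS"), 2),
  EnrichmentMode = rep(c("Linear", "Top 10%"), each = 7),
  P = c(0.56561, 0.46002, 0.91728, 0.55030, 0.24785, 0.94218, 0.84480,
        0.21562, 0.19384, 0.95587, 0.62250, 0.13472, 0.98158, 0.78120)
)

# Rows: enrichment mode; columns: cell type; values: -log10(P)
mat <- tapply(-log10(res$P), list(res$EnrichmentMode, res$Celltype), identity)
round(mat, 2)

# To draw it without clustering, e.g.:
# stats::heatmap(mat, Rowv = NA, Colv = NA, scale = "none")
```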

Top results

Top phenotypes

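Count the cell-type enrichment results for each phenotype (per enrichment mode). Note that merged_results has not been filtered here, so this counts every tested cell type; filter on FDR first (e.g. dplyr::filter(FDR < 0.05)) to count only significant results.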

top_phenos <- merged_results %>% 
  dplyr::group_by(EnrichmentMode, GWAS) %>%
  dplyr::summarise(Celltype=dplyr::n_distinct(Celltype)) %>%
  dplyr::arrange(dplyr::desc(Celltype))
## `summarise()` has grouped output by 'EnrichmentMode'. You can override using
## the `.groups` argument.
knitr::kable(top_phenos)
| EnrichmentMode | GWAS | Celltype |
|:---|:---|---:|
| Linear | ieu-a-298.tsv.gz.35UP.10DOWN | 7 |
| Top 10% | ieu-a-298.tsv.gz.35UP.10DOWN | 7 |
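For readers not using dplyr, the same count can be sketched in base R. The helper count_celltypes() is hypothetical; it adds an fdr_thresh argument so that only results below a chosen FDR are counted, and uses the FDR values from the merged results table.

```r
# FDR values copied from the merged results table (one GWAS, 7 cell types x 2 modes)
res <- data.frame(
  GWAS = "ieu-a-298.tsv.gz.35UP.10DOWN",
  Celltype = rep(c("astrocytes_ependymal", "endothelial_mural", "interneurons",
                   "microglia", "oligodendrocytes", "pyramidal_CA1", "pyramidal_SS"), 2),
  EnrichmentMode = rep(c("Linear", "Top 10%"), each = 7),
  FDR = c(0.981580, 0.981580, 0.981580, 0.981580, 0.867475, 0.981580, 0.981580,
          0.867475, 0.867475, 0.981580, 0.981580, 0.867475, 0.981580, 0.981580)
)

# Hypothetical helper: number of distinct cell types with FDR < fdr_thresh,
# per enrichment mode and GWAS, sorted from most to fewest
count_celltypes <- function(res, fdr_thresh = 1) {
  sig <- res[res$FDR < fdr_thresh, , drop = FALSE]
  if (nrow(sig) == 0) {
    return(data.frame(EnrichmentMode = character(), GWAS = character(),
                      Celltype = integer()))
  }
  out <- aggregate(Celltype ~ EnrichmentMode + GWAS, data = sig,
                   FUN = function(x) length(unique(x)))
  out[order(-out$Celltype), ]
}

count_celltypes(res)                     # 7 cell types in each enrichment mode
count_celltypes(res, fdr_thresh = 0.05)  # no rows: nothing passes FDR < 0.05
```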

Top enrichments

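Get the phenotype-cell-type enrichment results with the lowest FDR (two per phenotype and enrichment mode). Because dplyr::slice_min() keeps ties by default, a group can return more than two rows, as happens here where several results share the same FDR.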

top_enrich <- merged_results %>% 
  dplyr::group_by(EnrichmentMode, GWAS) %>%
  dplyr::slice_min(FDR, n = 2)
knitr::kable(top_enrich) 
| GWAS | Celltype | TYPE | OBS_GENES | BETA | BETA_STD | SE | P | log10p | level | Method | EnrichmentMode | GCOV_FILE | CONTROL | CONTROL_label | genesOutCOND | analysis_name | FDR | Celltype_id |
|:---|:---|:---|---:|---:|---:|---:|---:|---:|---:|:---|:---|:---|:---|:---|:---|:---|---:|:---|
| ieu-a-298.tsv.gz.35UP.10DOWN | oligodendrocytes | COVAR | 962 | 0.0016776 | 0.0199640 | 0.0024613 | 0.24785 | -0.6058111 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.867475 | oligodendrocytes |
| ieu-a-298.tsv.gz.35UP.10DOWN | astrocytes_ependymal | COVAR | 962 | -0.0003729 | -0.0045268 | 0.0022566 | 0.56561 | -0.2474829 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.981580 | astrocytes_ependymal |
| ieu-a-298.tsv.gz.35UP.10DOWN | endothelial_mural | COVAR | 962 | 0.0002391 | 0.0029144 | 0.0023811 | 0.46002 | -0.3372233 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.981580 | endothelial_mural |
| ieu-a-298.tsv.gz.35UP.10DOWN | interneurons | COVAR | 962 | -0.0033333 | -0.0393200 | 0.0024009 | 0.91728 | -0.0374981 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.981580 | interneurons |
| ieu-a-298.tsv.gz.35UP.10DOWN | microglia | COVAR | 962 | -0.0002839 | -0.0036131 | 0.0022446 | 0.55030 | -0.2594005 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.981580 | microglia |
| ieu-a-298.tsv.gz.35UP.10DOWN | pyramidal_CA1 | COVAR | 962 | -0.0037493 | -0.0440450 | 0.0023802 | 0.94218 | -0.0258661 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.981580 | pyramidal_CA1 |
| ieu-a-298.tsv.gz.35UP.10DOWN | pyramidal_SS | COVAR | 962 | -0.0023657 | -0.0282810 | 0.0023306 | 0.84480 | -0.0732461 | 1 | MAGMA | Linear | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_linear.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_linear | 0.981580 | pyramidal_SS |
| ieu-a-298.tsv.gz.35UP.10DOWN | astrocytes_ependymal | SET | 104 | 0.0671670 | 0.0178860 | 0.0853000 | 0.21562 | -0.6663110 | 1 | MAGMA | Top 10% | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_top10.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_top10 | 0.867475 | astrocytes_ependymal |
| ieu-a-298.tsv.gz.35UP.10DOWN | endothelial_mural | SET | 98 | 0.0831670 | 0.0215500 | 0.0962320 | 0.19384 | -0.7125566 | 1 | MAGMA | Top 10% | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_top10.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_top10 | 0.867475 | endothelial_mural |
| ieu-a-298.tsv.gz.35UP.10DOWN | oligodendrocytes | SET | 94 | 0.1124000 | 0.0285690 | 0.1017200 | 0.13472 | -0.8705679 | 1 | MAGMA | Top 10% | ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2015_top10.gsa.out | BASELINE | BASELINE | NA | Zeisel2015_top10 | 0.867475 | oligodendrocytes |
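The 7 + 3 rows above follow from how ties are handled. We assume slice_min() with its default with_ties = TRUE keeps every row whose minimum rank is at most n. The base-R sketch below, with the hypothetical helper slice_min_with_ties(), reproduces that behaviour on the FDR values from the table: in the Linear group six results tie at rank 2, and in the Top 10% group three tie at rank 1.

```r
# FDR values copied from the merged results table (one GWAS, 7 cell types x 2 modes)
res <- data.frame(
  GWAS = "ieu-a-298.tsv.gz.35UP.10DOWN",
  Celltype = rep(c("astrocytes_ependymal", "endothelial_mural", "interneurons",
                   "microglia", "oligodendrocytes", "pyramidal_CA1", "pyramidal_SS"), 2),
  EnrichmentMode = rep(c("Linear", "Top 10%"), each = 7),
  FDR = c(0.981580, 0.981580, 0.981580, 0.981580, 0.867475, 0.981580, 0.981580,
          0.867475, 0.867475, 0.981580, 0.981580, 0.867475, 0.981580, 0.981580)
)

# Hypothetical helper: per group, keep rows whose "min" rank of FDR is <= n,
# so tied rows are kept together and a group may return more than n rows
slice_min_with_ties <- function(res, n = 2) {
  groups <- split(res, list(res$EnrichmentMode, res$GWAS), drop = TRUE)
  kept <- lapply(groups, function(g) {
    g[rank(g$FDR, ties.method = "min") <= n, , drop = FALSE]
  })
  do.call(rbind, kept)
}

top <- slice_min_with_ties(res, n = 2)
table(top$EnrichmentMode)  # Linear: 7, Top 10%: 3
```

In dplyr, passing with_ties = FALSE to slice_min() returns exactly n rows per group instead.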

Session Info

utils::sessionInfo()
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ewceData_1.9.0          ExperimentHub_2.9.1     AnnotationHub_3.9.1    
## [4] BiocFileCache_2.9.1     dbplyr_2.3.3            BiocGenerics_0.47.0    
## [7] dplyr_1.1.2             MAGMA.Celltyping_2.0.11 BiocStyle_2.29.1       
## 
## loaded via a namespace (and not attached):
##   [1] splines_4.3.1                 later_1.3.1                  
##   [3] BiocIO_1.11.0                 bitops_1.0-7                 
##   [5] ggplotify_0.1.2               filelock_1.0.2               
##   [7] tibble_3.2.1                  R.oo_1.25.0                  
##   [9] XML_3.99-0.14                 lifecycle_1.0.3              
##  [11] rstatix_0.7.2                 rprojroot_2.0.3              
##  [13] MASS_7.3-60                   lattice_0.21-8               
##  [15] backports_1.4.1               magrittr_2.0.3               
##  [17] limma_3.57.7                  plotly_4.10.2                
##  [19] sass_0.4.7                    rmarkdown_2.24               
##  [21] jquerylib_0.1.4               yaml_2.3.7                   
##  [23] httpuv_1.6.11                 HGNChelper_0.8.1             
##  [25] minqa_1.2.5                   DBI_1.1.3                    
##  [27] abind_1.4-5                   zlibbioc_1.47.0              
##  [29] GenomicRanges_1.53.1          purrr_1.0.2                  
##  [31] R.utils_2.12.2                RCurl_1.98-1.12              
##  [33] yulab.utils_0.0.7             VariantAnnotation_1.47.1     
##  [35] rappdirs_0.3.3                GenomeInfoDbData_1.2.10      
##  [37] IRanges_2.35.2                S4Vectors_0.39.1             
##  [39] tidytree_0.4.5                pkgdown_2.0.7                
##  [41] codetools_0.2-19              DelayedArray_0.27.10         
##  [43] xml2_1.3.5                    tidyselect_1.2.0             
##  [45] aplot_0.2.0                   farver_2.1.1                 
##  [47] lme4_1.1-34                   matrixStats_1.0.0            
##  [49] stats4_4.3.1                  GenomicAlignments_1.37.0     
##  [51] jsonlite_1.8.7                ellipsis_0.3.2               
##  [53] systemfonts_1.0.4             tools_4.3.1                  
##  [55] progress_1.2.2                treeio_1.25.3                
##  [57] ragg_1.2.5                    Rcpp_1.0.11                  
##  [59] glue_1.6.2                    SparseArray_1.1.11           
##  [61] xfun_0.40                     MatrixGenerics_1.13.1        
##  [63] GenomeInfoDb_1.37.2           RNOmni_1.0.1                 
##  [65] withr_2.5.0                   BiocManager_1.30.22          
##  [67] fastmap_1.1.1                 boot_1.3-28.1                
##  [69] fansi_1.0.4                   digest_0.6.33                
##  [71] R6_2.5.1                      mime_0.12                    
##  [73] gridGraphics_0.5-1            textshaping_0.3.6            
##  [75] colorspace_2.1-0              biomaRt_2.57.1               
##  [77] RSQLite_2.3.1                 R.methodsS3_1.8.2            
##  [79] utf8_1.2.3                    tidyr_1.3.0                  
##  [81] generics_0.1.3                data.table_1.14.8            
##  [83] rtracklayer_1.61.1            prettyunits_1.1.1            
##  [85] httr_1.4.7                    htmlwidgets_1.6.2            
##  [87] S4Arrays_1.1.5                pkgconfig_2.0.3              
##  [89] gtable_0.3.3                  blob_1.2.4                   
##  [91] SingleCellExperiment_1.23.0   XVector_0.41.1               
##  [93] htmltools_0.5.6               carData_3.0-5                
##  [95] bookdown_0.35                 scales_1.2.1                 
##  [97] Biobase_2.61.0                png_0.1-8                    
##  [99] ggdendro_0.1.23               ggfun_0.1.2                  
## [101] knitr_1.43                    reshape2_1.4.4               
## [103] rjson_0.2.21                  nloptr_2.0.3                 
## [105] nlme_3.1-163                  curl_5.0.2                   
## [107] cachem_1.0.8                  stringr_1.5.0                
## [109] BiocVersion_3.18.0            parallel_4.3.1               
## [111] AnnotationDbi_1.63.2          restfulr_0.0.15              
## [113] desc_1.4.2                    pillar_1.9.0                 
## [115] grid_4.3.1                    vctrs_0.6.3                  
## [117] promises_1.2.1                ggpubr_0.6.0                 
## [119] car_3.1-2                     xtable_1.8-4                 
## [121] evaluate_0.21                 orthogene_1.7.0              
## [123] GenomicFeatures_1.53.1        cli_3.6.1                    
## [125] compiler_4.3.1                Rsamtools_2.17.0             
## [127] rlang_1.1.1                   crayon_1.5.2                 
## [129] grr_0.9.5                     ggsignif_0.6.4               
## [131] labeling_0.4.2                gprofiler2_0.2.2             
## [133] EWCE_1.9.2                    plyr_1.8.8                   
## [135] fs_1.6.3                      stringi_1.7.12               
## [137] viridisLite_0.4.2             BiocParallel_1.35.4          
## [139] assertthat_0.2.1              babelgene_22.9               
## [141] munsell_0.5.0                 Biostrings_2.69.2            
## [143] gh_1.4.0                      lazyeval_0.2.2               
## [145] homologene_1.4.68.19.3.27     Matrix_1.6-1                 
## [147] MungeSumstats_1.9.15          BSgenome_1.69.0              
## [149] hms_1.1.3                     patchwork_1.1.3              
## [151] bit64_4.0.5                   ggplot2_3.4.3                
## [153] KEGGREST_1.41.0               statmod_1.5.0                
## [155] shiny_1.7.5                   highr_0.10                   
## [157] SummarizedExperiment_1.31.1   interactiveDisplayBase_1.39.0
## [159] googleAuthR_2.0.1             gargle_1.5.2                 
## [161] broom_1.0.5                   memoise_2.0.1                
## [163] bslib_0.5.1                   ggtree_3.9.1                 
## [165] bit_4.0.5                     ape_5.7-1