vignettes/full_workflow.Rmd
full_workflow.Rmd
## ===================== 🦠🌋🦠 Welcome to MAGMA.Celltyping 🦠🌋🦠 =====================
## This package uses MAGMA:
## https://ctg.cncr.nl/software/magma
##
## To cite MAGMA.Celltyping, please use:
## * Skene, N.G., Bryois, J., Bakken, T.E. et al. Genetic identification of
## brain cell types underlying schizophrenia. Nat Genet 50, 825-833 (2018).
## https://doi.org/10.1038/s41588-018-0129-5
## * de Leeuw CA, Mooij JM, Heskes T, Posthuma D (2015) MAGMA: Generalized
## Gene-Set Analysis of GWAS Data. PLOS Computational Biology 11(4): e1004219.
## https://doi.org/10.1371/journal.pcbi.1004219
##
## Please report any bugs or feature requests by filling out an Issues template:
## https://github.com/neurogenomics/MAGMA_Celltyping/issues
## ===================== 🦠🌋🦠 =========================== 🦠🌋🦠 =====================
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
MAGMA.Celltyping
is a software package that facilitates
conducting cell-type-specific enrichment tests on GWAS summary
statistics.
The following vignette shows the step-by-step version of the
MAGMA.Celltyping
pipeline. For the newer streamlined
version of the pipeline (that wraps many of these functions), please see
the Getting Started vignette.
get_example_gwas()
.
gwas_sumstats_path <- MAGMA.Celltyping::get_example_gwas(
trait = "fluid_intelligence",
munged = FALSE)
Our lab have created a robust Bioconductor package for formatting
multiple types of summary statistics files: MungeSumstats
(please cite the associated publication if you use this package):
MungeSumstats
here to reformat the summary
statistics for Fluid Intelligence. The minimum info needed
after munging is:Genome builds:
MungeSumstats
will infer this information by default
(simply set ref_genome=NULL
).
path_formatted <- MungeSumstats::format_sumstats(path=gwas_sumstats_path,
save_path = tempfile(fileext = ".formatted.tsv.gz"),
ref_genome ="GRCh37")
For the sake of this example, we also provide a pre-munged version of the above file. If you use this, you can skip the previous two steps.
path_formatted_intelligence <- MAGMA.Celltyping::get_example_gwas(
trait = "fluid_intelligence",
munged = TRUE)
Next, we must convert the GWAS summary statistics file to a gene-level signature so that we can compare it to a gene-level transcriptomic cell-type reference.
Note you can input the genome build of your summary statistics for
this step or it can be inferred if left NULL
:
genesOutPath_intelligence <- MAGMA.Celltyping::map_snps_to_genes(
path_formatted = gwas_sumstats_path,
genome_build = "GRCh37")
The second step in MAGMA.Celltyping
is to prepare your
Cell Type Dataset (CTD) reference, from which we will get gene
signatures for each cell-type in the reference.
The ewceData
package comes with a CTD which we use as an example.
If you want to import your own single cell RNA-seq dataset, then this
needs converting into CTD format; please see the EWCE
vingette for explanation of how to do this.
ctd <- ewceData::ctd()
Note that the cell type dataset loaded in the code above is the Karolinksa cortex/hippocampus data only. For the full Karolinska dataset with hypothalamus and midbrain instead use the following:
ctd <- MAGMA.Celltyping::get_ctd("ctd_allKI")
You can find a list of other datasets provided by this package using
?MAGMA.Celltyping::get_ctd
:
ctd <- get_ctd("ctd_Tasic")
ctd <- get_ctd("ctd_DivSeq")
ctd <- get_ctd("ctd_AIBS")
ctd <- get_ctd("ctd_DRONC_human")
ctd <- get_ctd("ctd_DRONC_mouse")
ctd <- get_ctd("ctd_BlueLake2018_FrontalCortexOnly")
ctd <- get_ctd("ctd_BlueLake2018_VisualCortexOnly")
ctd <- get_ctd("ctd_Saunders")
ctd <- get_ctd("ctd_Zeisel2018")
Preparing quantiles in the CTD is now something that is automatically
handled by by MAGMA.Celltyping
so the user doesn’t need to
do this themselves as a pre-step anymore.
However, we still export the function
prepare_quantile_groups
in case users want to have more
control.
prepare_quantile_groups
calculates the quantile groups
for each celltype within the single-cell dataset. If your single-cell
dataset is not from mouse, then change the ctd_species
argument.
ctd <- MAGMA.Celltyping::prepare_quantile_groups(ctd = ctd,
ctd_species="mouse")
# Examine how the quantile groups look
print(ctd[[1]]$specificity_quantiles[c("Gfap","Dlg4","Aif1"),])
print(table(ctd[[1]]$specificity_quantiles[,1]))
The analyses can be run in either linear or top10% enrichment modes. Each have their own advantages/disadvantages.
Let’s start with linear:
ctAssocsLinear <- MAGMA.Celltyping::calculate_celltype_associations(
ctd = ctd,
gwas_sumstats_path = path_formatted_intelligence,
ctd_species = "mouse")
FigsLinear <- MAGMA.Celltyping::plot_celltype_associations(
ctAssocs = ctAssocsLinear,
ctd = ctd)
Now let’s add the top 10% mode
ctAssocsTop <- MAGMA.Celltyping::calculate_celltype_associations(
ctd = ctd,
gwas_sumstats_path = path_formatted_intelligence,
EnrichmentMode = "Top 10%")
FigsTopDecile <- MAGMA.Celltyping::plot_celltype_associations(
ctAssocs = ctAssocsTop,
ctd = ctd)
Then plot linear together with the top 10% mode.
ctAssocMerged <- MAGMA.Celltyping::merge_magma_results(
ctAssoc1 = ctAssocsLinear,
ctAssoc2 = ctAssocsTop)
FigsMerged <- MAGMA.Celltyping::plot_celltype_associations(
ctAssocs = ctAssocMerged,
ctd = ctd)
Sometimes a single cell-type can dominate the signature in a given GWAS (e.g. pyramidal cells in Schizophrenia). This makes it difficult to disentangle whether other cell-types are truly associated with the GWAS trait, or whether the are simply enriched because they have an overlapping signature with the top cell-type. Conditional mode allows you to control for one (or several) cell-type(s) and see whether other cell-types still remain significantly enriched.
By default, it is assumed that you want to run the linear enrichment
analysis. There are two modes for conditional analyses, you can either
control for the top N cell types from the baseline analysis (in which
case, set controlTopNcells
) or control for specific
specified cell types (in which case, set
controlledCTs
).
ctCondAssocs <- MAGMA.Celltyping::calculate_conditional_celltype_associations(
ctd = ctd,
gwas_sumstats_path = path_formatted_intelligence,
analysis_name = "Conditional",
controlTopNcells= 2)
if(!is.null(ctCondAssocs)){
MAGMA.Celltyping::plot_celltype_associations(ctAssocs = ctCondAssocs,
ctd = ctd)
}
Let’s try as an alternative to control for expression of 3 neuronal subtypes at the same time:
ctCondAssocs <- MAGMA.Celltyping::calculate_conditional_celltype_associations(
ctd = ctd,
gwas_sumstats_path = path_formatted_intelligence,
analysis_name = "Conditional",
controlledCTs = c("pyramidal CA1","pyramidal SS","interneurons"),
controlledAnnotLevel = 1)
MAGMA.Celltyping::plot_celltype_associations(ctAssocs = ctCondAssocs,
ctd=ctd)
Note that Periventricular Microglia (PVM) go from totally non-significant to significant once the neurons are controlled for. Test if this change is significant as follows:
magma1 = ctCondAssocs[[2]]$results[ctCondAssocs[[2]]$results$CONTROL=="BASELINE",]
magma2 = ctCondAssocs[[2]]$results[ctCondAssocs[[2]]$results$CONTROL=="interneurons,pyramidal_CA1,pyramidal_SS",]
resCompared <- MAGMA.Celltyping:::compare_trait_enrichments(magma1=magma1,
magma2=magma2,
annotLevel=2,
ctd=ctd)
resCompared[1:3,]
Using this approach we can see that the increased enrichment is microglia in the controlled analysis is almost significantly increased relative to the baseline analysis.
Conditional analyses can also be performed with top 10% mode (although the conditioning is done in linear mode).
ctCondAssocsTopTen = MAGMA.Celltyping::calculate_conditional_celltype_associations(
ctd = ctd,
gwas_sumstats_path = path_formatted_intelligence,
analysis_name = "Conditional",
controlledCTs=c("pyramidal CA1","pyramidal SS","interneurons"),
controlledAnnotLevel=1,
EnrichmentMode = "Top 10%")
MAGMA.Celltyping::plot_celltype_associations(ctAssocs = ctCondAssocsTopTen,
ctd = ctd)
We now want to test enrichments that remain in a GWAS after we control for a second GWAS. So let’s download a second GWAS sumstats file and prepare it for analysis.
20016.assoc.tsv is the sumstats file for ‘Fluid Intelligence Score’ from the UK Biobank.
So let’s subtract genes associated with prospective memory from those involved in fluid intelligence.
#### Download pre-munged GWAS summary stats ####
gwas_sumstats_path_memory <- MAGMA.Celltyping::get_example_gwas(
trait = "prospective_memory",
munged = FALSE)
#### Format summary stats ####
## and infer the genome build from the data whilst doing so (see output messages)
path_formatted_memory <- MungeSumstats::format_sumstats(
path = gwas_sumstats_path_memory)
#### Can also skip ahead by importing pre-munged summary stats ####
# path_formatted_memory <- MAGMA.Celltyping::get_example_gwas(
# trait = "prospective_memory",
# munged = TRUE)
#### Map SNPs to genes ####
genesOutPath_memory <- MAGMA.Celltyping::map_snps_to_genes(
path_formatted = path_formatted_memory,
genome_build = "GRCH37")
ctAssocsLinear_memory <- MAGMA.Celltyping::calculate_celltype_associations(
ctd = ctd,
gwas_sumstats_path = path_formatted_memory,
ctd_species = "mouse")
ctAssocsLinear_intelligence <- MAGMA.Celltyping::calculate_celltype_associations(
ctd = ctd,
gwas_sumstats_path = path_formatted_intelligence,
ctd_species = "mouse")
MAGMA.Celltyping::plot_celltype_associations(ctAssocs = ctAssocsLinear_memory,
ctd = ctd)
ctAssocMerged_MemInt <- MAGMA.Celltyping::merge_magma_results(
ctAssoc1 = ctAssocsLinear_memory,
ctAssoc2 = ctAssocsLinear_intelligence)
FigsMerged_MemInt <- lapply(seq_len(length(ctd)),
function(lvl){
MAGMA.Celltyping::magma_tileplot(ctd=ctd,
results=ctAssocMerged_MemInt[[lvl]]$results,
annotLevel=lvl,
fileTag=paste0("Merged_MemInt_lvl",lvl))
})
# Set paths for GWAS sum stats + .genes.out file (with the z-scores)
ctAssocsLinear <- MAGMA.Celltyping::calculate_celltype_associations(
ctd = ctd,
ctd_species = "mouse",
gwas_sumstats_path = path_formatted_intelligence,
genesOutCOND = genesOutPath_memory,
analysis_name = "ControllingForPropMemory")
FigsLinear <- MAGMA.Celltyping::plot_celltype_associations(
ctAssocs = ctAssocsLinear,
ctd = ctd,
fileTag = "ControllingForPropMemory")
We find that after controlling for prospective memory, there is no significant enrichment left associated with fluid intelligence.
Instead of relying on the tool MAGMA (which requires an extra installation step), we can also compute cell-type enrichment scores in an R-native re-implementation of the MAGMA enrichment analyses using the following functions:
ctd <- ewceData::ctd()
magma_GenesOut_file <- MAGMA.Celltyping::import_magma_files(
file_types = "genes.out",
return_dir = FALSE)
magmaAdjZ <- MAGMA.Celltyping:: adjust_zstat_in_genesOut(
ctd = ctd,
ctd_species = "mouse",
magma_GenesOut_file = magma_GenesOut_file)
output <- MAGMA.Celltyping:: calculate_celltype_enrichment_limma(
magmaAdjZ = magmaAdjZ,
ctd = ctd,
thresh = 0.0001,
ctd_species = "mouse",
annotLevel = 2)
We can then get the probability of the celltype being enriched as follows
The results should closely resemble those obtained using MAGMA
To test whether a gene set is enriched using MAGMA the following commands can be used:
data("rbfox_binding")
geneset_res <- MAGMA.Celltyping::calculate_geneset_enrichment(
geneset = rbfox_binding,
gwas_sumstats_path = path_formatted_intelligence,
analysis_name = "Rbfox_20016",
geneset_species = "mouse")
print(geneset_res)
We can then test whether the gene set is still enriched after controlling for celltype enrichment:
ctd <- MAGMA.Celltyping::get_ctd("ctd_allKI")
analysis_name <- "Rbfox_16_pyrSS"
controlledCTs <- c("pyramidal SS")
cond_geneset_res_pyrSS = MAGMA.Celltyping::calculate_conditional_geneset_enrichment(
geneset = geneset,
ctd = ctd,
controlledAnnotLevel = 1,
controlledCTs = controlledCTs,
gwas_sumstats_path = gwas_sumstats_path,
analysis_name = analysis_name,
ctd_species = "mouse")
controlledCTs <- c("pyramidal CA1")
cond_geneset_res_pyrCA1 <- MAGMA.Celltyping::calculate_conditional_geneset_enrichment(
geneset = geneset,
ctd = ctd,
controlledAnnotLevel = 1,
controlledCTs = controlledCTs,
gwas_sumstats_path = gwas_sumstats_path,
analysis_name = analysis_name,
ctd_species = "mouse")
controlledCTs <- c("pyramidal CA1","pyramidal SS")
cond_geneset_res_pyr = MAGMA.Celltyping::calculate_conditional_geneset_enrichment(
geneset = geneset,
ctd = ctd,
controlledAnnotLevel = 1,
controlledCTs = controlledCTs,
gwas_sumstats_path = gwas_sumstats_path,
analysis_name = analysis_name,
ctd_species = "mouse")
controlledCTs <- c("Medium Spiny Neuron")
cond_geneset_res_MSN = MAGMA.Celltyping::calculate_conditional_geneset_enrichment(
geneset = geneset,
ctd = ctd,
controlledAnnotLevel = 1,
controlledCTs = controlledCTs,
gwas_sumstats_path = gwas_sumstats_path,
analysis_name = analysis_name,
ctd_species = "mouse")
controlledCTs <- c("Medium Spiny Neuron","pyramidal CA1","pyramidal SS","interneurons")
cond_geneset_res = MAGMA.Celltyping::calculate_conditional_geneset_enrichment(
geneset = geneset,
ctd = ctd,
controlledAnnotLevel = 1,
controlledCTs = controlledCTs,
gwas_sumstats_path = gwas_sumstats_path,
analysis_name = analysis_name,
ctd_species = "mouse")
utils::sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.5 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] dplyr_1.1.4 MAGMA.Celltyping_2.0.14 BiocStyle_2.33.1
##
## loaded via a namespace (and not attached):
## [1] splines_4.4.1 BiocIO_1.15.2
## [3] bitops_1.0-9 ggplotify_0.1.2
## [5] filelock_1.0.3 tibble_3.2.1
## [7] R.oo_1.26.0 XML_3.99-0.17
## [9] lifecycle_1.0.4 rstatix_0.7.2
## [11] lattice_0.22-6 MASS_7.3-61
## [13] backports_1.5.0 magrittr_2.0.3
## [15] limma_3.61.12 plotly_4.10.4
## [17] sass_0.4.9 rmarkdown_2.28
## [19] jquerylib_0.1.4 yaml_2.3.10
## [21] HGNChelper_0.8.14 minqa_1.2.8
## [23] DBI_1.2.3 abind_1.4-8
## [25] zlibbioc_1.51.1 GenomicRanges_1.57.1
## [27] purrr_1.0.2 R.utils_2.12.3
## [29] BiocGenerics_0.51.3 RCurl_1.98-1.16
## [31] yulab.utils_0.1.7 VariantAnnotation_1.51.1
## [33] rappdirs_0.3.3 GenomeInfoDbData_1.2.13
## [35] IRanges_2.39.2 S4Vectors_0.43.2
## [37] tidytree_0.4.6 pkgdown_2.1.1
## [39] codetools_0.2-20 DelayedArray_0.31.13
## [41] tidyselect_1.2.1 aplot_0.2.3
## [43] UCSC.utils_1.1.0 farver_2.1.2
## [45] lme4_1.1-35.5 matrixStats_1.4.1
## [47] stats4_4.4.1 BiocFileCache_2.13.0
## [49] GenomicAlignments_1.41.0 jsonlite_1.8.9
## [51] Formula_1.2-5 systemfonts_1.1.0
## [53] tools_4.4.1 treeio_1.29.1
## [55] ragg_1.3.3 Rcpp_1.0.13
## [57] glue_1.8.0 SparseArray_1.5.41
## [59] xfun_0.48 MatrixGenerics_1.17.0
## [61] GenomeInfoDb_1.41.2 RNOmni_1.0.1.2
## [63] BiocManager_1.30.25 fastmap_1.2.0
## [65] boot_1.3-31 fansi_1.0.6
## [67] digest_0.6.37 R6_2.5.1
## [69] gridGraphics_0.5-1 textshaping_0.4.0
## [71] colorspace_2.1-1 RSQLite_2.3.7
## [73] R.methodsS3_1.8.2 utf8_1.2.4
## [75] tidyr_1.3.1 generics_0.1.3
## [77] data.table_1.16.0 rtracklayer_1.65.0
## [79] httr_1.4.7 htmlwidgets_1.6.4
## [81] S4Arrays_1.5.10 pkgconfig_2.0.3
## [83] gtable_0.3.5 blob_1.2.4
## [85] SingleCellExperiment_1.27.2 XVector_0.45.0
## [87] htmltools_0.5.8.1 carData_3.0-5
## [89] bookdown_0.40 scales_1.3.0
## [91] Biobase_2.65.1 png_0.1-8
## [93] ggfun_0.1.6 ggdendro_0.2.0
## [95] knitr_1.48 reshape2_1.4.4
## [97] rjson_0.2.23 nloptr_2.1.1
## [99] nlme_3.1-166 curl_5.2.3
## [101] cachem_1.1.0 stringr_1.5.1
## [103] BiocVersion_3.20.0 parallel_4.4.1
## [105] AnnotationDbi_1.67.0 restfulr_0.0.15
## [107] desc_1.4.3 pillar_1.9.0
## [109] grid_4.4.1 vctrs_0.6.5
## [111] ggpubr_0.6.0 car_3.1-3
## [113] dbplyr_2.5.0 evaluate_1.0.0
## [115] orthogene_1.11.0 GenomicFeatures_1.57.1
## [117] cli_3.6.3 compiler_4.4.1
## [119] Rsamtools_2.21.2 rlang_1.1.4
## [121] crayon_1.5.3 grr_0.9.5
## [123] ggsignif_0.6.4 gprofiler2_0.2.3
## [125] EWCE_1.13.1 plyr_1.8.9
## [127] fs_1.6.4 stringi_1.8.4
## [129] viridisLite_0.4.2 ewceData_1.13.0
## [131] BiocParallel_1.39.0 assertthat_0.2.1
## [133] babelgene_22.9 munsell_0.5.1
## [135] Biostrings_2.73.2 lazyeval_0.2.2
## [137] gh_1.4.1 homologene_1.4.68.19.3.27
## [139] Matrix_1.7-0 ExperimentHub_2.13.1
## [141] MungeSumstats_1.13.7 BSgenome_1.73.1
## [143] patchwork_1.3.0 bit64_4.5.2
## [145] ggplot2_3.5.1 KEGGREST_1.45.1
## [147] statmod_1.5.0 SummarizedExperiment_1.35.3
## [149] AnnotationHub_3.13.3 googleAuthR_2.0.2
## [151] gargle_1.5.2 broom_1.0.7
## [153] memoise_2.0.1 bslib_0.8.0
## [155] ggtree_3.13.1 bit_4.5.0
## [157] splitstackshape_1.4.8 ape_5.8