Use MAGMA to test enrichment in a geneset
Source:R/calculate_conditional_geneset_enrichment.R
calculate_conditional_geneset_enrichment.RdRun gene set enrichment analysis with a given geneset
on a GWAS previously mapped to genes
(using map_snps_to_genes),
while conditioning on a given cell-type or list of cell-types
(controlledCTs), as defined by the "specificity_quantiles" matrix
in a given level (controlledAnnotLevel) of the provide
CellTypeDataset (ctd). This allows one to conduct gene set enrichment
analyses while controlling for strong cell-type-specific signatures.
Usage
calculate_conditional_geneset_enrichment(
geneset,
ctd,
ctd_species = infer_ctd_species(ctd),
controlledAnnotLevel = 1,
controlledCTs,
prepare_ctd = TRUE,
gwas_sumstats_path = NULL,
magma_dir = NULL,
analysis_name = "MainRun",
upstream_kb = 35,
downstream_kb = 10,
geneset_species = infer_geneset_species(geneset),
version = NULL,
verbose = TRUE
)Arguments
- geneset
Genes which are to be tested. Can be gene symbols (or other gene IDs) from any species listed in list_species.
- ctd
Cell type data structure containing
specificity_quantiles.- ctd_species
Species name relevant to the CellTypeDataset (
ctd). See list_species for all available species. Ifctd_species=NULL(default), thectdspecies will automatically be inferred using infer_species.- controlledAnnotLevel
Annotation level of the celltypes being controlled for.
- controlledCTs
Array of the celltype to be controlled for, e.g. c("Interneuron type 16","Medium Spiny Neuron).
- prepare_ctd
Whether to run prepare_quantile_groups on the
ctdfirst.- gwas_sumstats_path
File path of the summary statistics file.
- magma_dir
Path to folder containing the pre-computed MAGMA GWAS files (.gsa.rawand .gsa.out).
- analysis_name
Used in file names which area created.
- upstream_kb
How many kb upstream of the gene should SNPs be included?
- downstream_kb
How many kb downstream of the gene should SNPs be included?
- geneset_species
Species that the
genesetcame from. Can be any species listed in list_species. Ifgeneset_species!="human", thegenesetwill be converted to 1:1 human orthologs using convert_orthologs.- version
MAGMA version to use.
- verbose
Print messages.
Value
data.table of baseline (column name prefix "BASELINE_") and conditioned (column name prefix "COND_") gene set enrichment results.
Examples
#### Import data ####
ctd <- MAGMA.Celltyping::get_ctd("ctd_allKI")
#> Loading required namespace: piggyback
#> Downloading 1 CellTypeDataset file(s).
#> ℹ All local files already up-to-date!
magma_dir <- MAGMA.Celltyping::import_magma_files(ids = "ieu-a-298")
#> Using built-in example files: ieu-a-298.tsv.gz.35UP.10DOWN
#> Returning MAGMA directories.
geneset <- MAGMA.Celltyping::rbfox_binding
res <- MAGMA.Celltyping::calculate_conditional_geneset_enrichment(
geneset = geneset,
ctd = ctd,
controlledAnnotLevel = 1,
controlledCTs = "pyramidal SS",
magma_dir = magma_dir,
analysis_name = "Rbfox_16_pyrSS",
geneset_species = "mouse",
ctd_species = "mouse")
#> Installed MAGMA version: v1.10
#> Skipping MAGMA installation.
#> The desired_version of MAGMA is currently installed: v1.10
#> Using: magma_v1.10
#> ctd is already standardised. Returning original ctd.
#> Set force_standardise=TRUE to re-standardise.
#> Converting to sparse matrix.
#> Converting to sparse matrix.
#> Preparing gene_df.
#> character format detected.
#> Converting to data.frame
#> Extracting genes from input_gene.
#> 4,143 genes extracted.
#> Converting mouse ==> human orthologs using: homologene
#> Retrieving all organisms available in homologene.
#> Mapping species name: mouse
#> Common name mapping found for mouse
#> 1 organism identified from search: 10090
#> Retrieving all organisms available in homologene.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Checking for genes without orthologs in human.
#> Extracting genes from input_gene.
#> 3,769 genes extracted.
#> Extracting genes from ortholog_gene.
#> 3,769 genes extracted.
#> Checking for genes without 1:1 orthologs.
#> Dropping 13 genes that have multiple input_gene per ortholog_gene (many:1).
#> Dropping 2 genes that have multiple ortholog_gene per input_gene (1:many).
#> Filtering gene_df with gene_map
#> Adding input_gene col to gene_df.
#> Adding ortholog_gene col to gene_df.
#>
#> =========== REPORT SUMMARY ===========
#> Total genes dropped after convert_orthologs :
#> 398 / 4,143 (9.6%)
#> Total genes remaining after convert_orthologs :
#> 3,745 / 4,143 (90%)
#> Mapping gene symbols in specificity_quantiles matrix to entrez IDs.
#> Reading sets out info from .gsa.out file.