R/adjust_zstat_in_genesOut.r
adjust_zstat_in_genesOut.Rd
Use this function to analyse the gene-level Z-scores of a given GWAS directly, while correcting for known confounding variables such as:
NSNPS: Number of SNPs
NPARAM: Number of relevant parameters used in the MAGMA gene analysis model
GENELEN: Gene length
log***: The logged version of each of the above variables, using the default log function.
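This page does not spell out how the correction is computed. One common approach, assumed here purely for illustration, is to regress the Z-score on the raw and logged covariates and keep the residuals. The sketch below uses simulated data in base R; the column name ADJ_ZSTAT and all simulated values are hypothetical and are not taken from the package.

```r
# Toy stand-in for a MAGMA .genes.out table (simulated values, not real GWAS output).
set.seed(1)
n <- 200
genes <- data.frame(
  NSNPS   = sample(5:500, n, replace = TRUE),
  NPARAM  = sample(1:50, n, replace = TRUE),
  GENELEN = sample(1000:200000, n, replace = TRUE)
)
# Simulate a Z-score that is partly driven by the confounders.
genes$ZSTAT <- 0.3 * log(genes$NSNPS) + 0.1 * log(genes$GENELEN) + rnorm(n)

# Add the logged covariates using the default (natural) log.
for (v in c("NSNPS", "NPARAM", "GENELEN")) {
  genes[[paste0("log", v)]] <- log(genes[[v]])
}

# Regress ZSTAT on the raw and logged covariates (assumed adjustment model).
fit <- lm(
  ZSTAT ~ NSNPS + logNSNPS + NPARAM + logNPARAM + GENELEN + logGENELEN,
  data = genes
)

# Residuals plus the intercept give a covariate-adjusted Z-score.
# ADJ_ZSTAT is a hypothetical column name chosen for this sketch.
genes$ADJ_ZSTAT <- residuals(fit) + coef(fit)[["(Intercept)"]]
```

By construction, the adjusted score is uncorrelated with every covariate in the model, which is the property a confounder correction of this kind aims for.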
adjust_zstat_in_genesOut(
  magma_GenesOut_file,
  ctd = NULL,
  ctd_species = infer_ctd_species(ctd),
  prepare_ctd = TRUE,
  method = "bonferroni",
  verbose = TRUE,
  ...
)
magma_GenesOut_file
A MAGMA .genes.out file generated by map_snps_to_genes.
ctd
Cell type data structure containing specificity_quantiles.
ctd_species
Species name relevant to the CellTypeDataset (ctd). See list_species for all available species. If ctd_species=NULL, the ctd species will automatically be inferred using infer_species.
prepare_ctd
Whether to run prepare_quantile_groups on the ctd first.
method
R package to use for gene mapping:
"gprofiler": Slower but more species and genes.
"homologene": Faster but fewer species and genes.
"babelgene": Faster but fewer species and genes. Also gives consensus scores for each gene mapping based on several different data sources.
verbose
Print messages.
...
Arguments passed on to EWCE::standardise_ctd
dataset
CellTypeData name.
input_species
Which species the gene names in exp come from. See list_species for all available species.
output_species
Which species' gene names to convert exp to. See list_species for all available species.
sctSpecies_origin
Species that the sct_data originally came from, regardless of its current gene format (e.g. it was previously converted from mouse to human gene orthologs). This is used for computing an appropriate background.
non121_strategy
How to handle genes that don't have 1:1 mappings between input_species:output_species. Options include:
"drop_both_species" or "dbs" or 1: Drop genes that have duplicate mappings in either the input_species or output_species (DEFAULT).
"drop_input_species" or "dis" or 2: Only drop genes that have duplicate mappings in the input_species.
"drop_output_species" or "dos" or 3: Only drop genes that have duplicate mappings in the output_species.
"keep_both_species" or "kbs" or 4: Keep all genes regardless of whether they have duplicate mappings in either species.
"keep_popular" or "kp" or 5: Return only the most "popular" interspecies ortholog mappings. This procedure tends to yield a greater number of returned genes but at the cost of many of them not being true biological 1:1 orthologs.
"sum","mean","median","min" or "max": When gene_df is a matrix and gene_output="rownames", these options will aggregate many-to-one gene mappings (input_species-to-output_species) after dropping any duplicate genes in the output_species.
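To make the first four non121_strategy options concrete, here is a minimal base R sketch on a made-up ortholog table. The gene names, the map data frame and its column names are all hypothetical, and the filtering only approximates what the options describe; it is not the package's own implementation.

```r
# Hypothetical ortholog map: one row per input-to-output gene pairing.
map <- data.frame(
  input_gene  = c("Aaa1", "Bbb1", "Bbb1", "Ccc1", "Ddd1"),
  output_gene = c("AAA1", "BBB1", "BBB2", "CCC1", "CCC1"),
  stringsAsFactors = FALSE
)

# Flag genes that appear more than once on either side of the mapping.
dup_in  <- map$input_gene  %in% map$input_gene[duplicated(map$input_gene)]
dup_out <- map$output_gene %in% map$output_gene[duplicated(map$output_gene)]

drop_both   <- map[!dup_in & !dup_out, ]  # analogous to "drop_both_species"
drop_input  <- map[!dup_in, ]             # analogous to "drop_input_species"
drop_output <- map[!dup_out, ]            # analogous to "drop_output_species"
keep_both   <- map                        # analogous to "keep_both_species"
```

In this toy table only Aaa1/AAA1 is a strict 1:1 pair, so the strictest option keeps a single row, while the two one-sided options each keep three.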
force_new_quantiles
By default, quantile computation is skipped if the quantiles have already been computed. Set force_new_quantiles=TRUE to override this and generate new quantiles.
force_standardise
If ctd has already been standardised, whether to rerun standardisation anyway (Default: FALSE).
remove_unlabeled_clusters
Remove any samples that have numeric column names.
numberOfBins
Number of non-zero quantile bins.
keep_annot
Keep the column annotation data if provided.
keep_plots
Keep the dendrograms if provided.
as_sparse
Convert to sparse matrix.
as_DelayedArray
Convert to DelayedArray.
rename_columns
Remove replace_chars from column names.
make_columns_unique
Rename each column with the prefix dataset.species.celltype.
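The numberOfBins argument above is described as the number of non-zero quantile bins. One way to read this, assumed here for illustration only, is that zero values keep their own bin 0 while the non-zero values are split into numberOfBins equally sized groups. The helper bin_specificity below is hypothetical and written in base R; it is not a function from EWCE or MAGMA.Celltyping.

```r
# Hypothetical helper: assign values to quantile bins, reserving bin 0 for zeros.
bin_specificity <- function(x, numberOfBins = 10) {
  bins <- integer(length(x))   # every value starts in bin 0
  nz <- x > 0                  # only non-zero values are binned
  n_nz <- sum(nz)
  # Rank the non-zero values, then cut the ranks into equally sized groups.
  r <- rank(x[nz], ties.method = "first")
  bins[nz] <- as.integer(ceiling(r * numberOfBins / n_nz))
  bins
}

set.seed(42)
spec <- c(rep(0, 20), runif(80))   # simulated specificity scores for one cell type
q <- bin_specificity(spec, numberOfBins = 10)
table(q)
```

With 80 non-zero values and 10 bins, each non-zero bin receives 8 genes, and the 20 zero values stay in bin 0.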
# Import the built-in example MAGMA .genes.out file
myGenesOut <- MAGMA.Celltyping::import_magma_files(
    ids = c("ieu-a-298"),
    file_types = ".genes.out",
    return_dir = FALSE)
#> Using built-in example files: ieu-a-298.tsv.gz.35UP.10DOWN
#> Returning MAGMA gene.* file paths
ctd <- ewceData::ctd()
#> see ?ewceData and browseVignettes('ewceData') for documentation
#> loading from cache
# Adjust the gene-level Z-scores, using a CellTypeDataset of mouse origin
magmaGenesOut <- MAGMA.Celltyping::adjust_zstat_in_genesOut(
    ctd = ctd,
    magma_GenesOut_file = myGenesOut,
    ctd_species = "mouse"
)
#> Standardising CellTypeDataset
#> Found 5 matrix types across 2 CTD levels.
#> Processing level: 1
#> Processing level: 2
#> Importing genes.out file.
#> 4 genes without HGNC gene symbols were dropped.
#> 371 genes that are absent from the ctd were dropped.
#> Computing adjusted Z-statistic.