yaml header : - The arguments within --- ... ---
are called the yaml header. It includes various options to customize your knitted rmarkdown output file. This part is not visible in the knitted output, only in the rmarkdown file.
One especially useful feature is toc: true
which generates a floating table of contents (toc), which helps navigate large knitted rmarkdowns.
Also note that if you want to share a link to a specific (sub)section in the rmarkdown, simply go to that section, then hover over the header until you see a link symbol. You can then copy that link and share it with others, such that it will immediately take them to that part of the rmarkdown output (assuming you’ve already pushed it to GitHub Pages).
setup chunk:
Below, you can modify the default settings when you run chunks interactively ( knitr::opts_chunk$set(...)
) or when you are knitting the rmarkdown ( knitr::opts_knit$set(...)
). Because these are set in the first chunk, the modifications will be applied to all subsequent chunks.
If there’s code you want to make sure is always run first (before you run code in any other chunks) you can add it here, in the special {r setup}
chunk.
This is especially helpful when you don’t want to manually load the same packages every time you open the rmarkdown.
<- here::here()
root.dir ::opts_chunk$set(
knitrcollapse = TRUE,
comment = "#>",
root.dir = root.dir
# fig.height = 12,
# fig.width = 10
) ::opts_knit$set(root.dir = root.dir, dpi = 300)
knitr
library(dplyr)
library(ggplot2)
library(plotly)
library(uwot)
Headers:
Headers are extremely useful for hierarchically organizing your code.
You can make subsections within the higher-level section using progressively more # symbols.
For example:
# Analysis 1
## Analysis with subsample A
### Results table
### Plot
## Analysis with subsample B
### Results table
### Plot
# Analysis 2
## Analysis with subsample A
### Results table
### Plot
Once you’ve added your headers, you can quickly navigate across sections by clicking “Show document outline” button in the upper right of RStudio.
Chunk names:
- You can also choose to name chunks within the chunk options, e.g. {r Section 1}
. This helps to keep track of which step of your code is running when you go to knit it.
Let’s first create a named list of colors so that the datasets we’ll be using have consistent coloring.
<- c(TabulaMuris="turquoise", Zeisel2018="slateblue") color_dict
<- readRDS(url("https://neurogenomics.github.io/model_celltype_conservation/processed_data/EWCE/standardized_CTD/TabulaMuris.rds"))
tabulamuris <- tabulamuris$level_2$mean_exp mat1
<- uwot::umap(X = t(mat1),
umap_res1 # Return 3 dimensions for 3D plotting
n_components = 3,
# Get extra UMAP info (not just the embedding)
ret_extra = c("model","nn","fgraph"))
## Prepare embeddings dataframe
<- data.frame(umap_res1$embedding) %>%
embedding1 `colnames<-`(paste("UMAP",c(1:ncol(umap_res1$embedding)), sep = "_")) %>%
::mutate(cell_id=colnames(mat1)) %>%
dplyr::separate(col = "cell_id", sep = "[.]", into = c("species","dataset","celltype"), remove = F, extra = "merge") tidyr
<- ggplot(embedding1, aes(x=UMAP_1, y=UMAP_2, label=celltype, color=dataset)) +
gg_umap1 geom_point() +
scale_color_manual(values = color_dict)
print(gg_umap1)
Now let’s explore the data interactively using plotly.
Plotly commands: - Hover to see data info “tooltips”. - Click and drag to zoom in within a window.
- Double click to return to the full view.
- SHIFT + click and drag to pan (or simply click and drag the axes).
Show/hide groups: - Another feature of plotly to note is that in the legend, you can click any of the groups to make them invisible/visible.
::ggplotly(gg_umap1) plotly
Now let’s import another mouse scRNAseq dataset, this time from Zeisel 2018.
Note, you can’t have two chunks with the same name, we so add the suffix “-Zeisel2018” here.
<- readRDS(url("https://neurogenomics.github.io/model_celltype_conservation/processed_data/EWCE/standardized_CTD/Zeisel2018.rds"))
zeisel2018 <- 5
level <- zeisel2018[[level]]$mean_exp mat2
<- uwot::umap(X = t(mat2),
umap_res2 # Return 3 dimensions for 3D plotting
n_components = 3,
# Get extra UMAP info (not just the embedding)
ret_extra = c("model","nn","fgraph"))
## Prepare embeddings dataframe
<- data.frame(umap_res2$embedding) %>%
embedding2 `colnames<-`(paste("UMAP",c(1:ncol(umap_res2$embedding)), sep = "_")) %>%
::mutate(cell_id=colnames(zeisel2018[[level]]$mean_exp)) %>%
dplyr::separate(col = "cell_id", sep = "[.]", into = c("species","dataset","celltype"), remove = F, extra = "merge") tidyr
<- ggplot(embedding2, aes(x=UMAP_1, y=UMAP_2, label=celltype, color=dataset)) +
gg_umap2 geom_point() +
scale_color_manual(values = color_dict)
print(gg_umap2)
::ggplotly(gg_umap2) plotly
Let’s first make a composite plot of both sets of UMAP embeddings. Patchwork is a flexible and easy-to-use package to combine plots.
Note: If you try to give patchwork objects to ggplotly()
, it will only use the last plot.
library(patchwork)
<- gg_umap1 + gg_umap2 +
gg_multi ## Modify plot layout
::plot_layout(ncol = 1, heights = c(1,1)) +
patchwork## Annotate plot/subplots
::plot_annotation(title = "UMAP projections", tag_levels = letters)
patchworkprint(gg_multi)
Image formats:
Which format you save in can have a big impact on the quality of the image. This is especially important when preparing figures for publications. “pdf” and “svg” formats don’t degrade their resolution when you zoom in/out, making them better for publications.
ggsave()
will automatically detect the image type you want to save from the extension in the filename=
. In the example below, we’re saving the figure as a pdf.
You can also adjust dpi
to increase resolution when saving the figures, even when saving in other formats (e.g. “png” or “jpg”). Most journals require dpi ≥ 300 for figures, though you’ll want to check the journal-specific author guidelines to be sure.
Note: You’ll probably want to not use the dpi
argument when using the file format “svg”, because this can cause problems when saving in R.
::ggsave(filename = here::here("plots/gg_multi_umap.pdf"),
ggplot2height = 8,
width = 10,
dpi = 300)
Now let’s merge both datasets and rerun UMAP.
Merge on gene names (row names).
<- merge(mat1, mat2, by=0) %>%
mat ::column_to_rownames("Row.names") %>%
tibbleas.matrix()
<- uwot::umap(X = t(mat),
umap_res # Return 3 dimensions for 3D plotting
n_components = 3,
# Get extra UMAP info (not just the embedding)
ret_extra = c("model","nn","fgraph"))
## Prepare embeddings dataframe
<- data.frame(umap_res$embedding) %>%
embedding `colnames<-`(paste("UMAP",c(1:ncol(umap_res$embedding)), sep = "_")) %>%
::mutate(cell_id=colnames(mat)) %>%
dplyr::separate(col = "cell_id", sep = "[.]", into = c("species","dataset","celltype"), remove = F, extra = "merge") tidyr
<- ggplot(embedding, aes(x=UMAP_1, y=UMAP_2, label=celltype, color=dataset, shape=species)) +
gg_umap geom_point() +
scale_color_manual(values = color_dict)
::ggplotly(gg_umap) plotly
Here we plot the same UMAP embedding in 3D! In addition to the usual plotly commends, you can use the scroll wheel on your mouse to zoom in and out.
This allows us to explore another dimension of the data that could reveal aspects not visible previously. For example, sometimes clusters appear to be overlapping when just looking at dimensions 1 (x) and 2 (y), but in reality the clusters are far away from each other in the 3rd dimension (z).
This also lets us to flexibly zoom in and out of the data without have to regenerate the plot each time.
From this, plot, we can see the samples largely segregate according to which dataset they came from. This indicates we should probably do some data integration steps (using tools like LIGER, fastMNN, or CellBLAST) before analyzing the data further.
<- plotly::plot_ly(data = embedding,
gg_umap_3d x= ~UMAP_1, y= ~UMAP_2, z= ~UMAP_3,
color= ~dataset, symbol= ~species,
alpha = .8, colors = color_dict)
gg_umap_3d#> No trace type specified:
#> Based on info supplied, a 'scatter3d' trace seems appropriate.
#> Read more about this trace type -> https://plotly.com/r/reference/#scatter3d
#> No scatter3d mode specifed:
#> Setting the mode to markers
#> Read more about this attribute -> https://plotly.com/r/reference/#scatter-mode
Briefly, I’ll also mention that you can use other coding languages in the same rmarkdown, including python! This functionality uses the R package reticulate by default.
See a tutorial here.
You can even import objects created in R using the r.<object_name>
syntax.
import platform
print(platform.python_version())
#> 3.8.8
import pandas as pd
= r.embedding
embed = embed.groupby(["dataset"]).celltype.count()
cell_count print(cell_count)
#> dataset
#> TabulaMuris 120
#> Zeisel2018 231
#> Name: celltype, dtype: int64
It’s always a good idea to print your Session Info at the end of your Rmarkdown so that others can see which versions of R and R packages you used, making your work easier to reproduce later.
The Session Info output can be quite long, so we wrap it within the html syntax <details>...</details>
to collapse it.
Another trick is to add attr.output='style="max-height: 200px;"'
within the {r...}
chunk options. This means that if your output is over 200 pixels in height, it will put all the contents into a scrollable box.
::sessionInfo()
utils#> R version 4.0.4 (2021-02-15)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Big Sur 10.16
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] patchwork_1.1.1 uwot_0.1.10 Matrix_1.3-2 plotly_4.9.3
#> [5] ggplot2_3.3.3 dplyr_1.0.5
#>
#> loaded via a namespace (and not attached):
#> [1] reticulate_1.18 tidyselect_1.1.0 xfun_0.22 bslib_0.2.4
#> [5] purrr_0.3.4 lattice_0.20-41 colorspace_2.0-0 vctrs_0.3.6
#> [9] generics_0.1.0 htmltools_0.5.1.1 viridisLite_0.3.0 yaml_2.2.1
#> [13] utf8_1.2.1 rlang_0.4.10 jquerylib_0.1.3 pillar_1.5.1
#> [17] glue_1.4.2 withr_2.4.1 DBI_1.1.1 lifecycle_1.0.0
#> [21] stringr_1.4.0 munsell_0.5.0 gtable_0.3.0 htmlwidgets_1.5.3
#> [25] codetools_0.2-18 evaluate_0.14 labeling_0.4.2 knitr_1.31
#> [29] crosstalk_1.1.1 fansi_0.4.2 highr_0.8 Rcpp_1.0.6
#> [33] scales_1.1.1 jsonlite_1.7.2 farver_2.1.0 RSpectra_0.16-0
#> [37] digest_0.6.27 stringi_1.5.3 grid_4.0.4 rprojroot_2.0.2
#> [41] here_1.0.1 tools_4.0.4 magrittr_2.0.1 RcppAnnoy_0.0.18
#> [45] sass_0.3.1 lazyeval_0.2.2 tibble_3.1.0 crayon_1.4.1
#> [49] tidyr_1.1.3 pkgconfig_2.0.3 ellipsis_0.3.1 data.table_1.13.6
#> [53] assertthat_0.2.1 rmarkdown_2.7 httr_1.4.2 R6_2.5.0
#> [57] compiler_4.0.4