maxplanck-ie · adRn-s · Jun 19, 2024 · Jun 19, 2024 · Jun 19, 2024 · Jun 19, 2024
diff --git a/rmd/30_BatchEffects.Rmd b/rmd/30_BatchEffects.Rmd
@@ -72,7 +72,7 @@ if (!file.exists(fdest)) {
   # utils::download.file("https://zenodo.org/record/7891484/files/panc_sub_processed.RDS?download=1", destfile = fdest, method = "curl")
   file.copy("/scratch/local/rseurat/datasets/preprocessed_rds/panc_sub_processed.RDS", fdest, overwrite = FALSE)
 }
-panc_sub <- UpdateSeuratObject(readRDS(fdest))  ## Seurat v5 would require this
+panc_sub <- UpdateSeuratObject(readRDS(fdest)) ## Seurat v5 would require this
 ```
 
 > 🧭✨ Poll:
@@ -132,19 +132,36 @@ p1 + p2
 Let's now explore some options of mitigating batch effects:
 
 -   Seurat Integration
--   Seurat SCTransform
--   Conos
 -   Harmony
+-   SCTransform
+-   kBET
+-   Conos
 -   ComBat/SVA
 -   ...
 
+# Harmony
+
+This method is quite straightforward. It will account for batch effects by correcting our principal components, its functionality is conveniently wrapped under function `RunHarmony`. Just keep in mind that it will create a new 'harmony' reduction that's the one you want to use downstream...
+
+
+```{r, eval=FALSE}
+# ...
+harmony::RunHarmony(object, h.group.by.vars = "orig.ident")
+FindNeighbors(object, reduction = 'harmony')
+FindClusters(object)
+RunUMAP(object, reduction = 'harmony')
+# ...
+```
+
+For further details, [check their vignette](http://htmlpreview.github.io/?https://github.com/immunogenomics/harmony/blob/master/doc/Seurat.html)
+
 # Seurat Integration
 
-> There's a few more slides, [here](https://docs.google.com/presentation/d/1cdeiIY2I7lXwYYJOyUYViOZkv3BRQ2PTfrmwKYMD6Jc/edit?usp=sharing).
+> There's a few more slides, [here](https://docs.google.com/presentation/d/1cdeiIY2I7lXwYYJOyUYViOZkv3BRQ2PTfrmwKYMD6Jc/edit?usp=sharing). For further details, check the [original article](https://www.sciencedirect.com/science/article/pii/S0092867419305598).
 
 ## Seurat Integration: prep datasets
 
-Prior to the integration, we want to normalize each dataset to be integrated separately.
+Prior to the integration, we want to normalize each dataset separately to be integrated.
 
 ```{r seurat_integrate1}
 # split the dataset into a list of seurat objects
@@ -159,7 +176,6 @@ panc.list <- lapply(X = panc.list, FUN = function(x) {
 
 ## Seurat Integration
 
-In this paragraph, we're going to run the three functions key to Seurat Integration.
 
 ```{r seurat_integrate2}
 # select features that are repeatedly variable across datasets for integration
@@ -180,16 +196,14 @@ gc()
 
 Let's have a brief look at the panc.combined dataset - a new Assay has been created by the integration procedure.
 
-> 🧭✨ Poll: What is the new assay called?](<https://PollEv.com/multiple_choice_polls/4ArWqQjwtyVafBYCiaqbr/respond>) Hint: you can access assays of a Seurat object with `Assays()`.
+> 🧭✨ Poll: [What is the new assay called?](<https://PollEv.com/multiple_choice_polls/4ArWqQjwtyVafBYCiaqbr/respond>) Hint: you can access assays of a Seurat object with `Assays()`.
 
 ## Process the newly integrated dataset
 
 After the integration, data scaling on the new assay is necessary, as well as calculation of PCA and UMAP embeddings.
 
 ```{r process_integrated}
-# specify that we will perform downstream analysis on the corrected data note that the
-# original unmodified data still resides in the 'RNA' assay
-DefaultAssay(panc.combined) <- "integrated"
+stopifnot(DefaultAssay(panc.combined) == "integrated")
 
 # Run the standard workflow for visualization and clustering
 panc.combined <- ScaleData(panc.combined, verbose = FALSE)
@@ -249,11 +263,6 @@ p2 <- DimPlot(panc.combined, reduction = "umap", group.by = "celltype")
 p1 + p2
 ```
 
-## Outlook
-
--   Verify that expected marker genes are expressed per cell population
--   Note: Seurat v5 has additional modalities of integrating cells: (Harmony and scVI, bridge integration across modalities)
-
 ## SessionInfo
 
 ```{r sessionInfo}

diff --git a/rmd/31_SCTransform.Rmd b/rmd/31_SCTransform.Rmd
@@ -49,7 +49,7 @@ stim <- ifnb.list[["STIM"]]
 
 Let's remove some redundant objects from memory:
 
-```{r cleanup}
+```{r cleanup, eval=FALSE}
 rm(list = c("ifnb", "ifnb_sub", "ifnb.list"))
 gc()
 ```
@@ -101,19 +101,25 @@ We're going to use the SCTransform function with some more recent implementation
 library(sctransform)
 library(glmGamPoi)
 ctrl <- SCTransform(ctrl, method = "glmGamPoi", vst.flavor = "v2", verbose = FALSE)
+```
+
+We could've also regressed out covariates during this process.
+
+```{r, eval=FALSE}
 # store mitochondrial percentage in object meta data
-# ctrl <- PercentageFeatureSet(ctrl, pattern = "^MT-", col.name = "percent.mt")
-# ctrl <- SCTransform(ctrl, method = "glmGamPoi", vars.to.regress = "percent.mt", verbose = FALSE)
+ctrl <- PercentageFeatureSet(ctrl, pattern = "^MT-", col.name = "percent.mt")
+ctrl <- SCTransform(ctrl, method = "glmGamPoi", vars.to.regress = "percent.mt", verbose = FALSE)
 ```
 
+
 ## Recalculate Dimensional Reductions
 
 We have now gotten a new Assay added to the Seurat object.
 All we need to do now is to run PCA and UMAP for visualization in lowD.
 We're also going to save the elbow plot for comparison with the log-normed dataset.
 
 ```{r sct_umap_calc}
-DefaultAssay(ctrl) <- "SCT" # explain
+stopifnot(DefaultAssay(ctrl) == "SCT")
 ctrl <- RunPCA(ctrl, verbose = FALSE)
 ctrl_sct <- ElbowPlot(ctrl, ndims = "30")
 ```
@@ -127,7 +133,9 @@ ctrl_ln + ctrl_sct
 What has changed and what hasn't?
 
 > 🧭✨ Poll: [What amount of standard deviation does PC1 explain after sc-transform?](https://PollEv.com/multiple_choice_polls/AQwroH3oNpZ2uQRcly2eO/respond)
-> Hint: if you'd like to determine it numerically, you can use the `Stdev` accessor function to access the standard deviations for PC components in a Seurat object.
+> Hint: Function `Stdev(object)` returns the absolute values of standard deviations for principal components. It's the analogous to `object@reductions$pca@stdev`.
+
+
 
 ## Revisit UMAP
 
@@ -182,7 +190,7 @@ immune.combined.sct <- IntegrateData(anchorset = immune.anchors, normalization.m
 
 Let's cleanup again:
 
-```{r cleanup2}
+```{r cleanup2, eval=FALSE}
 rm(list = c("ctrl", "stim", "ifnb.list", "features", "all.genes", "immune.anchors"))
 gc()
 ```
@@ -235,7 +243,7 @@ head(b.interferon.response, n = 10)
 ```{r DE_plot,fig.height=4, fig.width=8}
 Idents(immune.combined.sct) <- "seurat_annotations"
 inf_genes <- rownames(b.interferon.response)
-DefaultAssay(immune.combined.sct) <- "SCT"
+DefaultAssay(immune.combined.sct) == "SCT"  # integrated
 FeaturePlot(immune.combined.sct,
   features = inf_genes[1], split.by = "stim",
   cols = c("grey", "red")

diff --git a/rmd/41_BrainIntegration.Rmd b/rmd/41_BrainIntegration.Rmd
@@ -151,7 +151,6 @@ By default, the column `seurat_clusters` in our `meta.data` slot will point to t
 # Inspect Metadata
 
 # Plot UMAP for different resolutions (hint: change group.by= ...)
-
 ```
 
 1.  Explore the different cluster solutions with respect to their sizes and pairwise relationships (hint: table).