Batch-corrected imputation using magicBatch

Introduction

Marcov Affinity-based Graph Imputation of Cells (MAGIC) is an algorithm described by David Van Dijk, et al that denoises high-dimensional data by smoothing over the manifold of the data and restoring the overall structure of the data. It has been extensively applied to single-cell RNA sequencing data where noise and dropout are a common problem.

Current implementations of MAGIC are well-suited for per batch analyses, such as imputation of gene expression values. However, for applications such as dimensional reduction of multi-batch datasets, there is a need for cross-batch imputation that is responsive to batch correction and other types of data correction methods (e.g. removal of cell cycle effects).

magicBatch is a modified version of the MAGIC algorithm that allows independent specification of the data used to compute the diffusion operator and the data to be imputed. This enables any low-dimensional representation of the data, including batch-corrected data, to be directly used in the powered Marcov affinity matrix computation and subsequently applied to the original gene expression data.

Installation

Install the magicBatch python package using one of two methods:

From within R (preferred):
```
python_path <- system("which python3", intern = TRUE)
system(paste(python_path, "-m pip install magicBatch"))
```
From the command line:
```
pip install magicBatch
which python
```
The output of which python is needed later to invoke the correct python runtime from within R.

Install the magicBatch R package:

devtools::install_github("kbrulois/magicBatch")

tSpace and SingleCellExperiment are also required for this demo:

devtools::install_github('hylasD/tSpace', 
                         build = TRUE, 
                         build_opts = c("--no-resave-data", "--no-manual"), 
                         force = T)

library(magicBatch)
library(tSpace)
library(SingleCellExperiment)

Load demo data

We will use a single cell experiment object containing data from our single cell survey of mouse lymph node endothelial cells. Here we load the object and extract the color scheme:

sce <- readRDS(url('https://stacks.stanford.edu/file/druid:cf352cg6610/PLN123_SCE.rds'))
subset_colors <- setNames(colData(sce)[["color.scheme"]][1:11],
                          colData(sce)[["color.scheme.key"]][1:11])

Motivation: MAGIC imputation enhances tSpace

First, we will illustrate the effect of MAGIC on the tSpace algorithm in the absence of batch effects. For this, we will use one of the three samples in the dataset and compute MAGIC using the classic implementation. For this, we use the magicBatch function with mar_mat_input = NULL (the default), which behaves identically to the original MAGIC algorithm.

sce_PLN1 <- sce[,sce$sample == "PLN1"]

lc_PLN1 <- as.matrix(t(logcounts(sce_PLN1)))

MAGIC_PLN1 <- magicBatch(data = lc_PLN1, 
                         t_param = 6, 
                         python_command = python_path)

Next, we compute tSpace using the imputed data (top 1000 variable genes) or a PCA of the non-imputed data for comparison. Results are visualized using the html_3dPlot function.

var_genes <- rowData(sce)[["var.genes"]]

tsp_out <- lapply(list(`Without MAGIC` = prcomp(t(lc_PLN1[,var_genes]), rank. = 20)[["rotation"]],
                       `With MAGIC` = MAGIC_PLN1[["imputed_data"]][["t6"]][,var_genes]), 
                  \(x) tSpace::tSpace(df = as.data.frame(x), 
                                      trajectories = 100, 
                                      D = 'pearson_correlation', 
                                      core_no = 10))
  
to_plot <- do.call(rbind, 
                   lapply(names(tsp_out), 
                          \(x) {
                            cbind(tsp_out[[x]][["ts_file"]][,2:4],
                                  data.frame(method = rep(x, nrow(tsp_out[[x]][["ts_file"]])),
                                             subsets = colData(sce_PLN1)[,4]))
                          }))

html_3dPlot(coordinates = to_plot[,1:3],  
            color = to_plot[,4:5], 
            discrete_colors_custom = list(subsets = subset_colors),
            wrap = "method",
            include_all_var = FALSE,
            selfcontained = TRUE)

MAGIC prior to tSpace provides a clearer visual representation the developmental branching structure.

magicBatch provides a way to apply batch-corrected imputation prior to tSpace

Now we will return to the full dataset containing 3 samples (PLN1, PLN2 and PLN3) and demonstrate the use of magicBatch to perform batch-corrected imputation.

MAGIC_wo_correction <- magicBatch(data = as.matrix(t(logcounts(sce))), 
                                  t_param = 6, 
                                  python_command = python_path)

MAGIC_w_correction <- magicBatch(data = as.matrix(t(logcounts(sce))), 
                                 mar_mat_input = reducedDim(sce, "MNN_correction"),
                                 t_param = 6, 
                                 python_command = python_path)


tsp_out <- lapply(list(`Without batch correction` = MAGIC_wo_correction[["imputed_data"]][["t6"]][,var_genes],
                       `With batch correction` = MAGIC_w_correction[["imputed_data"]][["t6"]][,var_genes]), 
                  \(x) tSpace::tSpace(df = as.data.frame(x), 
                                      trajectories = 100, 
                                      D = 'pearson_correlation', 
                                      core_no = 10))

to_plot2 <- do.call(rbind, 
                   lapply(names(tsp_out), 
                          \(x) {
                            cbind(tsp_out[[x]][["ts_file"]][,2:4],
                                  data.frame(method = rep(x, nrow(tsp_out[[x]][["ts_file"]]))),
                                  as.data.frame(colData(sce)[,c(3,4,5)]))
                          }))

html_3dPlot(coordinates = to_plot2[,1:3],  
            color = to_plot2[,4:7], 
            discrete_colors_custom = list(subsets = subset_colors),
            wrap = "method",
            include_all_var = FALSE,
            texts = NULL, 
            selfcontained = TRUE)

Without batch correction (original MAGIC), the 3 samples are poorly integrated, with PLN3 (the only sample derived from C57BL/6 mice) being the further separated compared to PLN1 and PLN2 (two BALB/c samples). magicBatch provides an effective way to apply batch corrected imputation.

Kevin Brulois

2024-05-29

Introduction

Installation

Load demo data

Motivation: MAGIC imputation enhances tSpace

magicBatch provides a way to apply batch-corrected imputation prior to tSpace