This is my research on a Deep-Learning-Based Pipeline for Benchmarking Preprocessing of Single-Cell Data.
Advances in single-cell transcriptomics have paved the way for the discovery of new cell types and a more comprehensive understanding of human diseases. However, a central challenge in handling scRNA-seq data is its substantial noise, which stems from several technical factors such as amplification bias, cell cycle effects, library size differences, and, notably, a low RNA capture rate. Preprocessing of the training and testing datasets is therefore a crucial first step in the analysis of scRNA-seq data. Moreover, different downstream tasks call for different methods at each preprocessing stage: quality control of datasets, normalization of count matrices, selection of highly variable genes, and dimensionality reduction. To give researchers a seamless and resource-efficient way to run computational experiments on scRNA-seq data, we propose an extension to our current Python toolkit, DANCE (A Deep Learning Library and Benchmark Platform for Single-Cell Analysis), which is aimed at supporting deep learning models for analyzing single-cell gene expression at scale. The pipeline, currently under development, collects the most widely used and important preprocessing functions and helps researchers select the combination of them that best suits their computational models. It would do so by iterating over the collected benchmark datasets and the available preprocessing methods, and evaluating the performance of the user's single-cell analysis model under each combination.
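The search over preprocessing combinations described above can be illustrated with a minimal, standard-library-only sketch. It is not the DANCE API: the function names, the `SEARCH_SPACE` layout, the toy count matrix, and the separation score standing in for a user's model are all hypothetical and chosen only for illustration. The sketch enumerates every combination of one option per stage, applies the steps in order, and ranks the combinations by score.

```python
import math
from itertools import product

# Hypothetical stand-ins for real preprocessing steps. Each one takes a count
# matrix (list of cells, each a list of per-gene values) and returns a new one.
def identity(matrix):
    return matrix

def library_size_normalize(matrix, target=1e4):
    # Scale every cell so that its counts sum to `target`.
    out = []
    for row in matrix:
        total = sum(row)
        scale = target / total if total else 0.0
        out.append([v * scale for v in row])
    return out

def log1p_transform(matrix):
    return [[math.log1p(v) for v in row] for row in matrix]

def top_variable_genes(n):
    # Build a step that keeps the n genes with the highest variance.
    def select(matrix):
        n_cells, n_genes = len(matrix), len(matrix[0])
        variances = []
        for g in range(n_genes):
            col = [row[g] for row in matrix]
            mean = sum(col) / n_cells
            variances.append(sum((v - mean) ** 2 for v in col) / n_cells)
        ranked = sorted(range(n_genes), key=lambda g: variances[g], reverse=True)
        keep = sorted(ranked[:n])  # preserve the original gene order
        return [[row[g] for g in keep] for row in matrix]
    return select

# One dict per stage, applied in insertion order (assumed layout).
SEARCH_SPACE = {
    "normalize": {"none": identity, "library_size": library_size_normalize},
    "transform": {"none": identity, "log1p": log1p_transform},
    "gene_selection": {"all": identity, "top2": top_variable_genes(2)},
}

def run_grid(matrix, score_fn, search_space=SEARCH_SPACE):
    # Try every combination of one option per stage and rank by score.
    stages = list(search_space)
    results = []
    for names in product(*(search_space[s] for s in stages)):
        data = matrix
        for stage, name in zip(stages, names):
            data = search_space[stage][name](data)
        results.append((dict(zip(stages, names)), score_fn(data)))
    return sorted(results, key=lambda r: r[1], reverse=True)

def toy_separation_score(labels):
    # Placeholder for training and evaluating a user's model: distance between
    # the two group centroids divided by the mean within-group spread.
    # Assumes exactly two distinct labels.
    def score(data):
        groups = {}
        for row, label in zip(data, labels):
            groups.setdefault(label, []).append(row)
        centroids = {
            k: [sum(col) / len(rows) for col in zip(*rows)]
            for k, rows in groups.items()
        }
        a, b = sorted(centroids)
        between = math.dist(centroids[a], centroids[b])
        within = sum(
            math.dist(row, centroids[label]) for row, label in zip(data, labels)
        ) / len(data)
        return between / (within + 1e-9)
    return score

# Toy data: 4 cells x 3 genes with two made-up cell types.
TOY_COUNTS = [[10, 0, 3], [12, 1, 2], [1, 9, 3], [0, 11, 4]]
TOY_LABELS = ["A", "A", "B", "B"]

for config, value in run_grid(TOY_COUNTS, toy_separation_score(TOY_LABELS)):
    print(config, round(value, 3))
```

In a real benchmark the score function would train and evaluate the user's model on a held-out split of each benchmark dataset, and the grid would likely be pruned or sampled, since the number of combinations grows multiplicatively with the number of options per stage.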
My Research at the Mid-SURE 2023 Conference: https://symposium.foragerone.com/mid-sure2023/presentations/58413
The DANCE library: https://github.com/OmicsML/dance