spatiAlign: An Unsupervised Contrastive Learning Model for Data Integration of Spatially Resolved Transcriptomics
Integrative analysis of spatially resolved transcriptomics datasets empowers a deeper understanding of complex biological systems. However, integrating multiple tissue sections makes batch effect removal challenging, particularly when the sections are measured by different technologies or collected at different times. Here, we propose spatiAlign, an unsupervised contrastive learning model that uses the expression of all measured genes together with the spatial locations of cells to integrate multiple tissue sections. It enables joint downstream analysis of multiple datasets not only in low-dimensional embeddings but also in the reconstructed full expression space. In benchmarking analyses, spatiAlign outperforms state-of-the-art methods in learning joint and discriminative representations of tissue sections, each potentially characterized by complex batch effects or distinct biological characteristics. Furthermore, we demonstrate the benefits of spatiAlign for the integrative analysis of time-series brain sections, including spatial clustering, differential expression analysis, and particularly trajectory inference, which requires a corrected gene expression matrix.
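spatiAlign is described as a contrastive learning model, and its `train()` method takes three arguments, `tau1`, `tau2` and `tau3`, that are presumably temperature coefficients. This README does not spell out the loss terms, so the sketch below only illustrates the generic temperature-scaled InfoNCE objective that contrastive methods commonly build on. `info_nce` and `cosine` are hypothetical helpers, not part of the spatialign package, and nothing here claims to reproduce how spatiAlign chooses its positive and negative pairs.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchors: List[Vector], positives: List[Vector], tau: float) -> float:
    """Mean InfoNCE loss over a batch (generic sketch, not spatiAlign's code).

    anchors[i] and positives[i] form a positive pair; every positives[j] with
    j != i acts as a negative for anchors[i]. A smaller tau sharpens the softmax.
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        # Similarity of anchor i to every candidate, scaled by the temperature.
        logits = [cosine(anchors[i], positives[j]) / tau for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the true pair
    return total / n
```

For two orthogonal embeddings paired with themselves, `info_nce(a, a, 0.5)` with `a = [[1.0, 0.0], [0.0, 1.0]]` equals log(1 + e^-2) ≈ 0.127; swapping the positives raises the loss to log(1 + e^2) ≈ 2.127, which is the behavior a contrastive objective is meant to penalize.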
- 🚀 [2024.07.19] spatiAlign is online at GigaScience. doi: https://doi.org/10.1093/gigascience/giae042
- [2023.08.13] spatiAlign is online at bioRxiv. doi: https://doi.org/10.1101/2023.08.08.552402
If you use spatiAlign in your work, please cite the publication as follows:
Zhang C, Liu L, Zhang Y, et al. spatiAlign: an unsupervised contrastive learning model for data integration of spatially resolved transcriptomics[J]. GigaScience, 2024, 13: giae042.
When installing, make sure that the versions of torch, torch_geometric, torch_cluster, torch_scatter and torch_sparse are compatible with one another.
- Install through PyPI:
```shell
pip install spatialign
```
- or clone the repository and install from source:
```shell
git clone https://github.com/STOmics/Spatialign.git
cd Spatialign
python setup.py install
```
- or pull the Docker image:
```shell
docker pull zhangchao162/spatialign
```
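Because the torch_geometric extension packages are built against specific torch releases, it can help to check which versions are actually installed before running spatiAlign. The sketch below uses only the standard-library `importlib.metadata` module; `installed_versions` is a hypothetical helper, not part of spatialign, and it only reports versions, it does not decide whether they are compatible.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Packages whose versions must match each other (from the installation note above).
TORCH_PACKAGES = ("torch", "torch_geometric", "torch_cluster", "torch_scatter", "torch_sparse")


def installed_versions(names: Iterable[str] = TORCH_PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is absent."""
    result: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None  # not installed in the current environment
    return result


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name:16s} {ver or 'not installed'}")
```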
```python
from spatialign import Spatialign

# Placeholder: replace these hypothetical file names with the datasets to integrate.
data_lists = ["section1.h5ad", "section2.h5ad"]  # dataset list
model = Spatialign(*data_lists,
                   min_genes=20,
                   min_cells=20,
                   batch_key='batch',
                   is_norm_log=True,
                   is_scale=False,
                   is_hvg=False,
                   is_reduce=False,
                   n_pcs=100,
                   n_hvg=2000,
                   n_neigh=15,
                   is_undirected=True,
                   latent_dims=100,
                   gpu=0,
                   save_path='./output')
model.train(tau1=0.05, tau2=0.01, tau3=0.1)  # train the model
model.alignment()  # remove batch effects and align the dataset distributions
```
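The `n_neigh=15` and `is_undirected=True` arguments suggest that spatiAlign connects each cell to its nearest spatial neighbors and can symmetrize those edges. That reading is an assumption; the plain-Python sketch below only illustrates what such a k-nearest-neighbor spatial graph looks like. `spatial_knn_graph` is a hypothetical helper, not spatialign's implementation, and it uses a brute-force O(n² log n) search that is fine for illustration but not for real tissue sections.

```python
import math
from typing import List, Set, Tuple

Coord = Tuple[float, float]


def spatial_knn_graph(coords: List[Coord], n_neigh: int, undirected: bool = True) -> List[Set[int]]:
    """Adjacency sets linking each cell to its n_neigh nearest cells (Euclidean distance)."""
    n = len(coords)
    k = min(n_neigh, n - 1)  # a cell cannot have more neighbors than other cells exist
    adj: List[Set[int]] = [set() for _ in range(n)]
    for i, (xi, yi) in enumerate(coords):
        # Sort the other cells by distance to cell i (ties broken by index) and keep the k closest.
        others = sorted(
            (math.hypot(xi - xj, yi - yj), j) for j, (xj, yj) in enumerate(coords) if j != i
        )
        for _, j in others[:k]:
            adj[i].add(j)
            if undirected:
                adj[j].add(i)  # symmetrize: an edge i -> j also implies j -> i
    return adj
```

With cells at x = 0, 1, 2 and 10 and `n_neigh=1`, the directed graph is `[{1}, {0}, {1}, {2}]`, while the undirected version becomes `[{1}, {0, 2}, {1, 3}, {2}]`. Symmetrizing can therefore give some cells more than `n_neigh` neighbors.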
- Stereo-seq datasets: the mouse olfactory bulb dataset has been deposited in the CNGB Sequence Archive (CNSA) of the China National GeneBank DataBase (CNGBdb) under accession number CNP001543, and the spatiotemporal dataset of the mouse embryonic brain is available at https://db.cngb.org/stomics/mosta.
- 10x Genomics Visium datasets: mouse olfactory bulb, https://www.10xgenomics.com/resources/datasets/adult-mouse-olfactory-bulb-1-standard; DLPFC, https://zenodo.org/record/6925603#.YuM5WXZBwuU
- Slide-seq datasets: mouse hippocampus, https://singlecell.broadinstitute.org/single_cell/study/SCP815/highly-sensitive-spatial-transcriptomics-at-near-cellular-resolution-with-slide-seqv2#study-summary, https://singlecell.broadinstitute.org/single_cell/study/SCP354/slide-seq-study#study-summary, and https://singlecell.broadinstitute.org/single_cell/study/SCP948/robust-decomposition-of-cell-type-mixtures-in-spatial-transcriptomics#study-summary
This is not an official product.