TemporalVAE: atlas-assisted temporal mapping of time-series single-cell transcriptomes during embryogenesis

Contact: Yuanhua Huang, Yijun Liu

Email: [email protected]

A user-oriented repo is at https://github.com/StatBiomed/TemporalVAE-release with more features to be added.

Introduction

TemporalVAE is a deep generative model in a dual-objective setting to infer the biological time of cells from a compressed latent space. We demonstrated its scalability to millions of cells in the mouse development atlas and its high accuracy in atlas-based cell staging on mouse organogenesis across platforms and during human peri-implantation between in vivo and in vitro conditions. Furthermore, we showed that our atlas-based time predictor can effectively support RNA velocity modeling over short-time cell differentiation, including hematopoiesis and neuronal development.
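To make the "dual-objective" idea concrete, here is a minimal per-cell sketch in plain Python. It assumes the two objectives are a standard VAE term (reconstruction error plus KL divergence) and a time-prediction term combined with a weight. The function name, the squared-error forms, and the `time_weight` parameter are illustrative assumptions, not the loss implemented in this repository.

```python
import math


def dual_objective_loss(x, x_hat, mu, log_var, t_true, t_pred, time_weight=1.0):
    """Toy per-cell loss: a VAE objective plus a weighted time-regression term.

    Every name and the exact functional form here are assumptions made
    for illustration only.
    """
    # Objective 1a: mean squared error between observed and decoded expression.
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    # Objective 1b: KL divergence between N(mu, exp(log_var)) and N(0, 1),
    # summed over latent dimensions.
    kl = -0.5 * sum(1 + lv - m ** 2 - math.exp(lv) for m, lv in zip(mu, log_var))
    # Objective 2: squared error between predicted and labelled time.
    time_err = (t_pred - t_true) ** 2
    return recon + kl + time_weight * time_err
```

With a perfect reconstruction and a latent code equal to the prior, only the time term contributes, so the weight directly controls how strongly the latent space is pushed to encode time.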

A preprint describing TemporalVAE's algorithms and results is available on bioRxiv.


Contents

Latest Updates

  • v0.1 (May, 2024): Initial release.

Installation

TemporalVAE requires Python 3.9. To install it, follow these steps:

  1. Install Miniconda3 if not already available.
  2. Clone this repository:
  git clone https://github.com/StatBiomed/TemporalVAE
  3. Navigate to the TemporalVAE directory:
  cd TemporalVAE
  4. Create a conda environment with the required dependencies (5-10 minutes):
  conda env create -f environment.yml
  5. Activate the TemporalVAE environment you just created:
  conda activate TemporalVAE
  6. Install PyTorch. Refer to the PyTorch installation instructions as needed. For example, to install a CPU-only build:
  conda install pytorch torchvision torchaudio cpuonly -c pytorch
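After activating the environment, a quick check like the sketch below can confirm that the interpreter matches the Python 3.9 requirement and that PyTorch is importable. The helper `check_environment` is a hypothetical convenience written for this note; it is not part of the repository.

```python
import importlib.util
import sys

REQUIRED_PYTHON = (3, 9)  # version stated in the installation notes


def check_environment(required=REQUIRED_PYTHON, packages=("torch",)):
    """Report whether the interpreter version and key packages look usable."""
    report = {"python_ok": sys.version_info[:2] == required}
    for name in packages:
        # find_spec returns None when a top-level package cannot be found.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'ok' if ok else 'missing or mismatched'}")
```

Pass additional package names through `packages` to check other dependencies listed in environment.yml.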

Reproduce the results in the manuscript

The code for each figure is in a folder named after the figure index.

Figure 2:

Compare TemporalVAE with baseline methods on three small datasets cited in the Psupertime manuscript.

  1. Preprocess the three datasets with the code described in preprocess_data_fromPsupertimeManuscript.md.
  2. Run the code for each benchmarking method, then run plotFig2_check_corr.py to generate Figure 2.
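The script name plotFig2_check_corr.py suggests that predicted times are compared with the known time labels through a correlation. Which coefficient the script reports is not stated here, so the sketch below uses Spearman rank correlation purely as an example, implemented in plain Python with average ranks for ties. All function names are illustrative.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(true_time, predicted_time):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(true_time), average_ranks(predicted_time))
```

A rank-based measure rewards any prediction that preserves the ordering of cells in time, even when the predicted scale differs from the labelled one.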

Figure 3:

  1. Preprocess the mouse atlas data and the mouse stereo-seq data:
python -u Fig3_mouse_data/preprocess_data_mouse_embryonic_development_combineData.py
python -u Fig3_mouse_data/preprocess_data_mouse_embryo_stereo.py
  2. Reproduce the results of Figure 3A&B and save them in the folder results/230827_trainOn_mouse_embryonic_development_kFold_testOnYZdata0809:
python -u Fig3_mouse_data/TemporalVAE_kFoldOn_mouseAtlas.py \
--result_save_path=230827_trainOn_mouse_embryonic_development_kFold_testOnYZdata0809 \
--vae_param_file=supervise_vae_regressionclfdecoder_mouse_stereo \
--file_path=/mouse_embryonic_development/preprocess_adata_JAX_dataset_combine_minGene100_minCell50_hvg1000 \
--time_standard_type=embryoneg5to5 \
--train_epoch_num=100 --kfold_test --train_whole_model \
> logs/log.log
  3. Plot Figure 3A&B from the results in results/230827_trainOn_mouse_embryonic_development_kFold_testOnYZdata0809; see Fig3_mouse_data/plot_figure3AB.ipynb.
  4. Figure 3C: compare TemporalVAE with LR, PCA, and RF on the mouse atlas data; see Fig3_mouse_data/LR_PCA_RF_kFoldOn_mouseAtlas.ipynb.
  5. Figure 3D&E: train the models on the mouse atlas data and predict on the mouse stereo-seq data; see Fig3_mouse_data/TemporalVAE_LR_PCA_RF_directlyPredictOn_mouseStereo.ipynb or run Fig3_mouse_data/TemporalVAE_LR_PCA_RF_directlyPredictOn_mouseStereo.py from the console.
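The k-fold training command for Figure 3A&B takes many arguments, so it can be convenient to assemble it as an argument list and launch it from Python. The sketch below only wraps the flags shown in the steps above. The helper names `build_kfold_command` and `run_and_log` are hypothetical and not part of the repository, and the code assumes it is run from the repository root.

```python
import subprocess
from pathlib import Path


def build_kfold_command(result_save_path, epochs=100):
    """Return the Figure 3A&B k-fold training command as an argument list."""
    return [
        "python", "-u", "Fig3_mouse_data/TemporalVAE_kFoldOn_mouseAtlas.py",
        f"--result_save_path={result_save_path}",
        "--vae_param_file=supervise_vae_regressionclfdecoder_mouse_stereo",
        "--file_path=/mouse_embryonic_development/"
        "preprocess_adata_JAX_dataset_combine_minGene100_minCell50_hvg1000",
        "--time_standard_type=embryoneg5to5",
        f"--train_epoch_num={epochs}",
        "--kfold_test",
        "--train_whole_model",
    ]


def run_and_log(command, log_path):
    """Run the command and write its standard output to log_path."""
    log_file = Path(log_path)
    # Create the log directory if it does not exist yet.
    log_file.parent.mkdir(parents=True, exist_ok=True)
    with log_file.open("w") as log:
        # check=True raises CalledProcessError if the script exits non-zero.
        return subprocess.run(command, stdout=log, check=True)
```

For example, `run_and_log(build_kfold_command("230827_trainOn_mouse_embryonic_development_kFold_testOnYZdata0809"), "logs/log.log")` mirrors the shell command in step 2. Passing a list rather than a single string avoids shell quoting issues.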

Figure 4:

  1. Preprocess the raw datasets:
python -u Fig4_human_data/preprocess_humanEmbryo_xiang2019data.py
python -u Fig4_human_data/preprocess_humanEmbryo_PLOS.py
python -u Fig4_human_data/preprocess_humanEmbryo_CS7_Tyser.py
  2. Figure 4A: k-fold test on the xiang19 dataset; see Fig4_human_data/vae_humanEmbryo_xiang19.ipynb or run from the console:
python -u Fig4_human_data/TemporalVAE_humanEmbryo_kFoldOn_xiang19.py --file_path=/240322Human_embryo/xiang2019/hvg500/
  3. Figure 4B: TemporalVAE trained on the xiang19 dataset and predicting on the Lv19 dataset; see Fig4_human_data/LR_PCA_RF_directlyPredictOn_humanEmbryo_PLOS.ipynb or run Fig4_human_data/LR_PCA_RF_directlyPredictOn_humanEmbryo_PLOS.py from the console.
  4. Figure 4C&D: train on four in vitro datasets and predict on one in vivo dataset; see Fig4_human_data/vae_humanEmbryo_Melania.ipynb or run from the console:
python -u Fig4_human_data/vae_humanEmbryo_Melania.py --file_path=/240405_preimplantation_Melania/

Figure 5:

  1. The data is from paper .
  2. Figure 5C&E uses the hematopoiesis cell data; see Fig5_RNA_velocity/VAE_mouse_fineTune_Train_on_U_pairs_S_hematopoiesis.ipynb or run from the console:
python -u Fig5_RNA_velocity/TemporalVAE_mouse_fineTune_Train_on_U_pairs_S.py --sc_file_name=240108mouse_embryogenesis/hematopoiesis --clf_weight=0.2
  3. Figure 5D&F uses the neuron cell data; see Fig5_RNA_velocity/VAE_mouse_fineTune_Train_on_U_pairs_S_neuron.ipynb or run from the console:
python -u Fig5_RNA_velocity/TemporalVAE_mouse_fineTune_Train_on_U_pairs_S.py --sc_file_name=240108mouse_embryogenesis/neuron --clf_weight=0.1
  4. The scVelo results in Figure 5E&F are based on the .ipynb code provided by the dataset's paper; see Fig5_RNA_velocity/scVelo_hematopoiesis.ipynb and Fig5_RNA_velocity/scVelo_neuron.ipynb.
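The two fine-tuning runs for Figure 5 differ only in the dataset name and the `--clf_weight` value, so they can be kept in a small table and expanded into commands. The sketch below is illustrative: the names `FIG5_RUNS` and `fig5_commands` are hypothetical, and only the script path and flags shown in the steps above are taken from this README.

```python
# Dataset name -> --clf_weight value, as given in the Figure 5 steps.
FIG5_RUNS = {
    "hematopoiesis": 0.2,  # Figure 5 C&E
    "neuron": 0.1,         # Figure 5 D&F
}

FIG5_SCRIPT = "Fig5_RNA_velocity/TemporalVAE_mouse_fineTune_Train_on_U_pairs_S.py"


def fig5_commands(data_dir="240108mouse_embryogenesis", runs=FIG5_RUNS):
    """Map each dataset name to its fine-tuning command as an argument list."""
    return {
        name: [
            "python", "-u", FIG5_SCRIPT,
            f"--sc_file_name={data_dir}/{name}",
            f"--clf_weight={weight}",
        ]
        for name, weight in runs.items()
    }
```

Each list can be passed to `subprocess.run` from the repository root, or turned into a shell string with `shlex.join` for copying into a terminal.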

Todo