This project provides a data processing pipeline that uses a collection of scripts and tools to download, process, and import data into LaminDB. The pipeline covers every step from data download through quality control to final data storage. The repository is organized as follows:
```
.
├── R
│   ├── convertAnn.R
│   └── scfetch
├── bash
│   ├── download_xlsx.sh
│   └── run_data_pipeline.sh
├── python
│   ├── 1-qc.py
│   └── 2-lamindb-aws.py
└── run_data_pipeline.sh
```
The pipeline depends on the following tools:

- Conda
- Docker
- scfetch
- LaminDB
- R
- Python
- ...
- Clone the repository:

  ```bash
  git clone https://github.com/Kang-chen/data_pipeline
  cd data_pipeline
  ```
- Install the necessary dependencies:

  TODO
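  Until this step is documented, a minimal sketch of one possible setup is shown below. The environment name, Python version, package list, and the scfetch GitHub location are assumptions, not the project's pinned requirements:

  ```bash
  # Assumed setup -- adjust names and versions to the project's actual requirements
  conda create -n data_pipeline python=3.10 -y
  conda activate data_pipeline
  pip install lamindb  # LaminDB client as published on PyPI
  # scfetch is an R package; installing from GitHub is one option
  Rscript -e 'install.packages("remotes"); remotes::install_github("showteeth/scfetch")'
  ```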
- Ensure the Docker and Conda environments are properly configured:

  - Docker
  - Conda
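  A quick check that both tools are installed and on your `PATH`:

  ```bash
  # Print versions to confirm Docker and Conda are available
  docker --version
  conda --version
  ```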
To execute the data processing pipeline, run one of the following commands:

```bash
bash -i ../run_data_pipeline.sh GSE161382
bash -i ../run_data_pipeline.sh GSE161382 3
```
`GSE161382` is the `source_id` parameter, identifying the dataset to be processed. `3` is the optional `start_step` parameter, telling the pipeline to start from step 3; if omitted, the pipeline starts from step 1 by default.
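The real `run_data_pipeline.sh` may differ, but the sketch below illustrates how a `source_id`/`start_step` interface like the one described above can be wired up. The step order and the per-step invocations are assumptions; only the script names are taken from the repository tree.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper illustrating the source_id/start_step interface;
# the real run_data_pipeline.sh may order and invoke steps differently.
set -euo pipefail

source_id="$1"        # e.g. GSE161382
start_step="${2:-1}"  # defaults to step 1 when omitted

# Assumed step order, built from the scripts in the repository tree
steps=(
  "bash bash/download_xlsx.sh ${source_id}"
  "Rscript R/convertAnn.R ${source_id}"
  "python python/1-qc.py ${source_id}"
  "python python/2-lamindb-aws.py ${source_id}"
)

for i in "${!steps[@]}"; do
  step=$((i + 1))
  if (( step >= start_step )); then
    echo "Running step ${step}: ${steps[$i]}"
    eval "${steps[$i]}"
  fi
done
```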
If an error occurs during execution, the script terminates and prints an error message to the terminal. Check the log files or the error output for more detailed information.
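Log file locations are not documented here; one generic way to capture the full output for later inspection (the log filename below is just an example):

```bash
# Save stdout and stderr to a file while still printing to the terminal
bash -i ../run_data_pipeline.sh GSE161382 2>&1 | tee pipeline_GSE161382.log
```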
Contributions are welcome! Feel free to submit pull requests or report issues to help improve the project.