data_pipeline

Project Overview

This project provides a data processing pipeline built from a set of R, Bash, and Python scripts that download data, process it, and import it into LaminDB. The pipeline covers every step from data download through quality control to final data storage.

Directory Structure

.
├── R
│   ├── convertAnn.R
│   └── scfetch
├── bash
│   ├── download_xlsx.sh
│   └── run_data_pipeline.sh
├── python
│   ├── 1-qc.py
│   └── 2-lamindb-aws.py
└── run_data_pipeline.sh
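
The top-level run_data_pipeline.sh is the entry point that chains the stage scripts together. As a rough sketch only (the step order, arguments, and step numbering below are inferred from the file names, not from the script's actual contents), the orchestration might look like:

    #!/usr/bin/env bash
    # Hypothetical orchestration sketch; inferred from file names, not the real script.
    set -euo pipefail

    source_id="$1"          # e.g. GSE161382
    start_step="${2:-1}"    # step to resume from; defaults to 1

    if [ "$start_step" -le 1 ]; then
        bash bash/download_xlsx.sh "$source_id"       # step 1: download
    fi
    if [ "$start_step" -le 2 ]; then
        Rscript R/convertAnn.R "$source_id"           # step 2: convert annotations (uses scfetch)
    fi
    if [ "$start_step" -le 3 ]; then
        python python/1-qc.py "$source_id"            # step 3: quality control
    fi
    if [ "$start_step" -le 4 ]; then
        python python/2-lamindb-aws.py "$source_id"   # step 4: import into LaminDB (AWS)
    fi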

Dependencies

  • Conda
  • Docker
  • scfetch
  • LaminDB
  • R
  • Python
  • ...
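
Until step 2 of the installation instructions below is filled in, a minimal environment setup might look like the following sketch; the environment name and the scfetch GitHub path are assumptions:

    # Hypothetical setup sketch; env name and scfetch source repo are assumed.
    conda create -n data_pipeline -y python=3.10 r-base
    conda activate data_pipeline
    pip install lamindb                                                 # LaminDB client
    Rscript -e 'install.packages("remotes", repos = "https://cloud.r-project.org")'
    Rscript -e 'remotes::install_github("showteeth/scfetch")'           # assumed source repo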

Installation Steps

  1. Clone the repository:

    git clone https://github.com/Kang-chen/data_pipeline
    cd data_pipeline
  2. Install necessary dependencies:

    TODO

  3. Ensure Docker and Conda environments are properly configured (see the verification sketch below):

    • Docker
    • Conda
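
A few standard commands can confirm that both tools are installed and working, for example:

    docker --version              # Docker CLI installed?
    docker run --rm hello-world   # daemon reachable and able to run containers
    conda --version               # Conda installed?
    conda info --envs             # available Conda environments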

Usage Instructions

To execute the data processing pipeline, run one of the following commands:

bash -i ../run_data_pipeline.sh GSE161382
bash -i ../run_data_pipeline.sh GSE161382 3

Option Explanation

GSE161382 is the source_id parameter, naming the dataset (a GEO series accession) to be processed. 3 is the optional start_step parameter, telling the pipeline to resume from step 3; if omitted, the pipeline starts from step 1 by default.
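
For example, to process several datasets back to back (the second accession below is only a placeholder):

    for gse in GSE161382 GSE000000; do      # GSE000000 is a placeholder accession
        bash -i ../run_data_pipeline.sh "$gse"
    done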

Error Handling

If an error occurs during execution, the script terminates and prints an error message to the terminal. Check the log files or the error output for more detailed information.
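
To keep a complete record of a run, one common pattern is to duplicate everything the pipeline prints into a log file; the logs/ directory below is just an example location:

    mkdir -p logs
    bash -i ../run_data_pipeline.sh GSE161382 2>&1 | tee logs/GSE161382.log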

Contributing

Contributions are welcome! Feel free to submit pull requests or report issues to help improve the project.
