Pro-mPSL/Datasets

The datasets are available at https://github.com/xinshuaiiii/FESC-PSL/tree/main/dataset.

Creating a Virtual Environment

To run the code, create a virtual environment with Anaconda and install the required dependencies. The commands are as follows:

conda create -n predict python=3.7.13
conda activate predict
git clone https://github.com/xinshuaiiii/Pro-mPSL.git
cd Pro-mPSL
pip install -r requirements.txt

ProtT5

You can download the ProtT5-XL-UniRef50 model from this [website], which includes a detailed tutorial on how to use the model. Once the model is downloaded locally, 'prott5.py' converts sequences to embeddings; you only need to modify the model path in the file. Place the pre-trained T5 model in the '../prot5' directory and put your protein sequences in the 'sequence.txt' file, then run the following command.

python prott5.py --model_path ../prot5 --filename sequence.txt --output_path embeddings.npy
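For readers who want to see roughly what such a conversion involves, here is a minimal sketch based on the public Hugging Face `transformers` API for ProtT5. It is not the code in this repository: the function names `prepare_sequence` and `embed_sequences` are hypothetical, and mean pooling over residues is an assumption (the actual script may keep per-residue embeddings or pool differently).

```python
import re
from typing import List


def prepare_sequence(seq: str) -> str:
    """Format one protein sequence the way the ProtT5 tokenizer expects."""
    seq = seq.strip().upper()
    # Rare or ambiguous residues (U, Z, O, B) are mapped to X.
    seq = re.sub(r"[UZOB]", "X", seq)
    # ProtT5 treats each amino acid as one token, so residues are space-separated.
    return " ".join(seq)


def embed_sequences(seqs: List[str], model_path: str = "../prot5"):
    """Return one mean-pooled embedding per sequence (assumed pooling strategy).

    torch and transformers are imported here so that prepare_sequence()
    can be used without them installed.
    """
    import torch
    from transformers import T5EncoderModel, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained(model_path, do_lower_case=False)
    model = T5EncoderModel.from_pretrained(model_path).eval()

    embeddings = []
    with torch.no_grad():
        for seq in seqs:
            batch = tokenizer(prepare_sequence(seq), return_tensors="pt")
            hidden = model(**batch).last_hidden_state[0]  # (length + 1, hidden_size)
            # Drop the trailing </s> token, then average over residues.
            embeddings.append(hidden[:-1].mean(dim=0).numpy())
    return embeddings
```

Calling `embed_sequences(["MKTAYIAK"])` would need the downloaded model in '../prot5'; the resulting list could then be stacked and written with `numpy.save`.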

Predict

The model for Gram-positive bacteria is defined in 'Pos/Gram_train_model.py'; its hyperparameters are set according to the parameters in the paper. The training script is 'Pos/Gram+_train.py'. Before training, convert the training set and validation set into embeddings with the ProtT5 model and assign them to 'train_data' and 'val_data'. The model for Gram-negative bacteria is set up in the same way.

A demo is provided for testing. 'demo/test-152-single-del-label' and 'demo/test-152-single-del-seq' form a simple dataset. Run prott5.py to convert 'test-152-single-del-seq' to embeddings.

cd /your/path
python prott5.py --model_path ../prot5 --filename test-152-single-del-seq --output_path embeddings.npy

Then split the embeddings into a training set and a validation set. Place the training and validation data where 'Pos/Gram+_train.py' can find them, and you can start training to obtain the trained model. For example, if your training and validation data files are located at data/train_data.pkl and data/val_data.pkl, and you want to save the model and logs to the results/ directory with the prefix experiment1, you can run:

cd /your/path
python Gram+_train.py --train_data_path data/train_data.pkl --val_data_path data/val_data.pkl --source_dir results --prefix experiment1 --total_epoch 50 --batch_size 64 --learning_rate 0.000766
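The data files passed to --train_data_path and --val_data_path have to be prepared beforehand. The sketch below shows one possible way to pair embeddings with labels, split them, and pickle the result. The helper names `split_dataset` and `save_pickle`, the 80/20 split, and the list-of-(embedding, label)-pairs layout are all assumptions; check the training script for the format it actually expects.

```python
import pickle
import random
from typing import List, Sequence, Tuple


def split_dataset(
    embeddings: Sequence,
    labels: Sequence,
    val_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[List[tuple], List[tuple]]:
    """Shuffle (embedding, label) pairs and split them into train and validation lists."""
    if len(embeddings) != len(labels):
        raise ValueError("embeddings and labels must have the same length")
    pairs = list(zip(embeddings, labels))
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(pairs)
    n_val = int(len(pairs) * val_fraction)
    return pairs[n_val:], pairs[:n_val]


def save_pickle(obj, path: str) -> None:
    """Write any picklable object to disk."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)
```

With real data, `embeddings` could come from `numpy.load("embeddings.npy")` and `labels` from the demo label file; the two returned lists would then be saved with `save_pickle(train, "data/train_data.pkl")` and `save_pickle(val, "data/val_data.pkl")`.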

Training for Gram-negative bacteria (Neg) works the same way:

cd /your/path
python Gram-_train.py --train_data_path data/train_data.pkl --val_data_path data/val_data.pkl --source_dir results --prefix experiment1 --total_epoch 50 --batch_size 64 --learning_rate 0.000766
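To make the role of each flag in the two training commands concrete, here is a hypothetical argument parser that accepts the same options. It is only an illustration: the types, help texts, and `required` settings are assumptions, and the actual training scripts may define their arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags used in the training commands."""
    parser = argparse.ArgumentParser(
        description="Train a localization model on ProtT5 embeddings"
    )
    parser.add_argument("--train_data_path", required=True, help="pickled training set")
    parser.add_argument("--val_data_path", required=True, help="pickled validation set")
    parser.add_argument("--source_dir", required=True, help="directory for the model and logs")
    parser.add_argument("--prefix", required=True, help="prefix for the output file names")
    parser.add_argument("--total_epoch", type=int, required=True, help="number of training epochs")
    parser.add_argument("--batch_size", type=int, required=True, help="samples per batch")
    parser.add_argument("--learning_rate", type=float, required=True, help="optimizer step size")
    return parser
```

Parsing the example command line with `build_parser().parse_args([...])` yields `total_epoch` and `batch_size` as integers and `learning_rate` as a float.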

For the construction and principles of the model, please refer to "Pro-mPSL: An Ensemble Prediction Model for Bacterial Protein Subcellular Localization Based on Pre-trained Protein Language Model and Bi-LSTM" (submission in progress).

If you have any questions, please contact: [email protected] or [email protected]
