How to Fine-Tune with PDBs Outside the Preprocessed Dataset? #42

Open
rcossio opened this issue Dec 26, 2024 · 3 comments

Comments

@rcossio

rcossio commented Dec 26, 2024

The training documentation mentions that fine-tuning requires a list of PDB files and the preprocessed wwPDB dataset (~1TB).

I would like to fine-tune the model using structures that are not included in this preprocessed dataset. Could you provide guidance on how to include external PDB files in the fine-tuning process?
I’m considering defining custom IDs, but do the structures need to be preprocessed before they can be added to the dataset?

Thank you in advance for your help!

@zhangyuxuann
Collaborator

@rcossio We are considering releasing the data pipeline in the future. This needs to go through some processes first. If there is any update, we will post it here.

@Linmj-Judy

Hi, yuanxuan, hope this email finds you well.

Your proposed algorithm has captured my keen research interest.

Following up on the previous inquiry about the data processing pipeline, I would also like to express my strong interest in your work.

The algorithm looks promising, and I am eager to see its implementation details.

Thank you for your time and consideration.

Warm regards,
Mujie Lin

@xinshi-chen
Collaborator

xinshi-chen commented Jan 19, 2025

@rcossio @Linmj-Judy Hey everyone, the data pipeline was released last week! You can find more details here:

https://github.com/bytedance/Protenix/blob/main/docs/prepare_training_data.md
