
cross entropy loss during training with xlora #26

Open · crossxxd opened this issue Apr 3, 2024 · 4 comments

Comments


crossxxd commented Apr 3, 2024

I saw discussions about training in other issues, and I have run the training and inference code successfully. The training code is mainly based on SFTTrainer, and I think only the next-token prediction loss is used. If I want to add the cross-entropy loss mentioned in the paper, what should I do?

EricLBuehler (Owner) commented

To use cross-entropy loss, configure the training loop to compute CE loss on the model's output; no special handling of the X-LoRA classifier is needed.
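
For concreteness, here is a minimal sketch of one such training step in plain PyTorch, assuming `model` is a causal LM already wrapped with X-LoRA and `batch` holds `input_ids` and `attention_mask` (these names are illustrative; SFTTrainer computes the same shifted next-token CE internally):

```python
import torch.nn.functional as F

# Minimal sketch of one training step; SFTTrainer does the equivalent
# internally. `model` is assumed to be a causal LM already wrapped with
# X-LoRA, `batch` a dict with "input_ids" and "attention_mask".
def training_step(model, batch, optimizer):
    outputs = model(input_ids=batch["input_ids"],
                    attention_mask=batch["attention_mask"])
    logits = outputs.logits  # (batch, seq_len, vocab_size)

    # Next-token prediction *is* token-level cross-entropy: shift the
    # logits left and the labels right, then apply CE over the vocab.
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = batch["input_ids"][:, 1:].contiguous()
    loss = F.cross_entropy(shift_logits.view(-1, shift_logits.size(-1)),
                           shift_labels.view(-1))

    optimizer.zero_grad()
    loss.backward()  # gradients also reach the X-LoRA classifier
    optimizer.step()
    return loss.item()
```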

crossxxd (Author) commented

I have rewritten the trainer from the transformers library and added a cross-entropy loss on the X-LoRA classifier's category output. There are no problems so far. Thanks for your reply!
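
A rough sketch of what such a subclass might look like, assuming the classifier's per-category logits can be read off the model after a forward pass; `get_classifier_logits` and the `expert_labels` field are hypothetical placeholders, not part of the xlora or transformers API:

```python
from transformers import Trainer
import torch.nn.functional as F

class XLoraCETrainer(Trainer):
    """Sketch of a Trainer subclass that adds a CE term on the X-LoRA
    classifier's category output to the usual LM loss. `expert_labels`
    and `get_classifier_logits` are hypothetical, not xlora APIs."""

    def compute_loss(self, model, inputs, return_outputs=False):
        expert_labels = inputs.pop("expert_labels", None)
        outputs = model(**inputs)  # `inputs` includes "labels",
        loss = outputs.loss        # so this is the next-token CE

        if expert_labels is not None:
            # Hypothetical hook: per-category logits from the classifier.
            cls_logits = get_classifier_logits(model)
            loss = loss + F.cross_entropy(cls_logits, expert_labels)

        return (loss, outputs) if return_outputs else loss
```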

EricLBuehler (Owner) commented

@crossxxd, in the paper we do not train on the X-LoRA classifier's scalings output, although you could try that. We just train the model as normal, with the CE loss on the output of the model. This works because the gradients propagate back to the X-LoRA classifier: the model's output depends on the classifier's scalings, so optimizing the output loss also trains the classifier.
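
One way to convince yourself of this is to check that a backward pass through the standard LM loss deposits gradients on the classifier's parameters. A sketch, where `model` and `inputs` are as in a normal training loop and the `"classifier"` name filter is an assumption about how xlora names its modules:

```python
# Sanity check: a backward pass through the standard LM loss should
# deposit gradients on the X-LoRA classifier's parameters.
loss = model(**inputs).loss  # "labels" included in `inputs`
loss.backward()
reached = any(p.grad is not None
              for name, p in model.named_parameters()
              if "classifier" in name and p.requires_grad)
print("classifier received gradients:", reached)
```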

crossxxd (Author) commented

It seems that I misunderstood the definition of the loss in the paper. For now, I am using the loss on the model's output combined with the loss on the X-LoRA classifier's scalings output for overall training. The total loss converges and the X-LoRA model works fine.
