LoRA + DeBERTa: loading model gives erratic, non-deterministic results #2171
Comments
Hi @jchook, I think I know what the issue is. I debugged your code, and it seems like reusing the inference_model variable twice is causing the problem. Can you try this? It works for me, lmk.
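Roughly, I mean something like this — a minimal sketch (names mirror your test cell) that keeps the base model and the PEFT model in separate variables, so inference_model is only assigned once:

```python
# Sketch: load the base model and the PEFT model into separate variables,
# so `inference_model` is only ever bound to the PEFT-wrapped model.
config = PeftConfig.from_pretrained(peft_model_id)
base_model = AutoModelForSequenceClassification.from_pretrained(
    config.base_model_name_or_path
)
inference_model = PeftModelForSequenceClassification.from_pretrained(
    base_model, peft_model_id
)
inference_model.to(device)
inference_model.eval()
```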
Thanks for the assistance. I don't believe this fixes the issue. Starting from a fresh Colab session and using your updated test loop, here is the output of my first 10 consecutive executions of the test cell:
I can reproduce this using normal .py scripts on my local machine, where the Python interpreter fully exits between executions, so I also don't believe it has to do with lingering Colab session state.
Hmmm @jchook, that's weird. Can you show me how you're running the evaluations for n executions? I get consistent results for each run; you can find the example here - https://colab.research.google.com/drive/1bwhOStf2PZ7CfYP7x9hrticJ_rmUWkdN?usp=sharing
See the issue here: huggingface/peft#2171

Output of first 15 runs:

```
{'accuracy': 1.0}
{'accuracy': 0.0}
{'accuracy': 1.0}
{'accuracy': 1.0}
{'accuracy': 0.02}
{'accuracy': 1.0}
{'accuracy': 0.0}
{'accuracy': 1.0}
{'accuracy': 1.0}
{'accuracy': 0.0}
{'accuracy': 0.0}
{'accuracy': 0.0}
{'accuracy': 1.0}
{'accuracy': 0.0}
{'accuracy': 0.0}
```
@JINO-ROHIT I very much appreciate the investigation.
The key is to ensure the models are reloaded on each execution. You can manually re-execute the entire test cell, or modify your code to load the LoRA and base model from within your run loop. See code changes here.

Faster execution for the sake of time

First, to make the test code run faster, I reduced the test set to only 100 examples in the dataloader config:

```diff
- dataset['test'] = dataset['test'].select(range(1000))
+ dataset['test'] = dataset['test'].select(range(100))
```

Ensure your run loop re-loads the LoRA from disk each time

The bug appears to have something to do with loading from disk and applying to the base model. Also, make sure you use a sufficiently large `n`:
```python
import torch
from peft import PeftModelForSequenceClassification, PeftConfig
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import evaluate
from tqdm import tqdm

n = 10  # Number of times to run the evaluation
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
peft_model_id = "./output/lora_epoch_1"

results = []

# Loop to run the evaluation `n` times, re-loading the base model and
# the LoRA adapter from disk on every iteration
for run in range(n):
    config = PeftConfig.from_pretrained(peft_model_id)
    base_model = AutoModelForSequenceClassification.from_pretrained(config.base_model_name_or_path)
    tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
    inference_model = PeftModelForSequenceClassification.from_pretrained(base_model, peft_model_id)
    inference_model.to(device)
    inference_model.eval()

    # Reset the metric for each run
    metric = evaluate.load("accuracy")

    print(f"Run {run + 1}/{n}")
    # `eval_dataloader` is defined in an earlier cell of the notebook
    for step, batch in enumerate(tqdm(eval_dataloader)):
        batch = {k: v.to(device) for k, v in batch.items()}
        with torch.no_grad():
            outputs = inference_model(**batch)
        predictions = outputs.logits.argmax(dim=-1)
        references = batch["labels"]
        metric.add_batch(
            predictions=predictions,
            references=references,
        )

    eval_metric = metric.compute()
    print(f"Evaluation result for run {run + 1}: {eval_metric}")
    results.append(eval_metric)

print("Final evaluation results:")
for result in results:
    print(result)
```

After ensuring that the LoRA is re-loaded on each execution and setting `n = 10`, I'm able to consistently reproduce this issue on Colab and in my local environment. Here is a reproduction of the issue using normal Python scripts, ensuring Python fully exits between each invocation. You can see the first 100 runs there.
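A minimal driver along these lines is enough to reproduce it — each run gets a fresh interpreter (the `test_lora.py` filename below is a placeholder for the repro script):

```python
# Hypothetical driver: invoke the evaluation script in a fresh Python
# process each time, so no interpreter state survives between runs.
# "test_lora.py" is a placeholder name for the repro script.
import subprocess

for i in range(15):
    subprocess.run(["python", "test_lora.py"], check=True)
```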
Ahhh I see, not sure why then; the other bits of the code seem mostly okay. Did pushing to the Hub and loading help? I'll try and see if something else works; meanwhile we can wait for @BenjaminBossan.
Apparently saving/loading from the Hub does not fix the issue either. See this repro on Colab.
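Roughly, the Hub round-trip looked like this (the repo id below is a placeholder):

```python
# Sketch of the Hub round-trip; "user/lora-deberta-imdb" is a placeholder repo id.
inference_model.push_to_hub("user/lora-deberta-imdb")

# Later, in a fresh session:
base_model = AutoModelForSequenceClassification.from_pretrained(
    config.base_model_name_or_path
)
inference_model = PeftModelForSequenceClassification.from_pretrained(
    base_model, "user/lora-deberta-imdb"
)
```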
However, changing the base model from DeBERTa V3 to RoBERTa seems to resolve the issue. So the problem may be DeBERTa-specific somehow.
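Concretely, the only change was the base checkpoint (assuming the standard Hub ids for each model):

```python
# Swapping the base checkpoint: with "microsoft/deberta-v3-base" (assumed
# checkpoint id) the reloaded adapter gives erratic accuracy, while with
# "roberta-base" the results are stable across reloads.
model_name = "roberta-base"  # instead of "microsoft/deberta-v3-base"
base_model = AutoModelForSequenceClassification.from_pretrained(model_name)
```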
System Info
A100 Colab
Issue
Here is a Colab repro to demonstrate my issue: https://colab.research.google.com/drive/1Z5FL1QDePY0j-8XL0o1V9o27_MbU-02E?usp=sharing
It's very possible that I am doing something fundamentally wrong or using the peft framework incorrectly. However, I tried to copy most of this code from the official LoRA sequence classification example.

On this toy IMDB sentiment classification example using LoRA + DeBERTa base, I get consistent ~100% test accuracy when I do any of the following:
However, I get erratic results when saving (per-epoch) and loading a trained LoRA adapter from disk.
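For reference, the per-epoch save is just save_pretrained on the PEFT model — a sketch, where the training helper is hypothetical but the path pattern matches the directory loaded in the test cell:

```python
# Per-epoch adapter save; the path pattern matches "./output/lora_epoch_1"
# loaded in the test cell. `train_one_epoch` stands in for the training loop.
for epoch in range(num_epochs):
    train_one_epoch(model)  # hypothetical training helper
    model.save_pretrained(f"./output/lora_epoch_{epoch + 1}")
```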
This occurs without any modifications to the code, in the same Python session, or across fresh Python sessions. So I have ruled out residual Python state in RAM, etc.
The results appear to shift randomly between the following outcomes:
Who can help?
@BenjaminBossan
Information
Tasks
An officially supported task in the examples folder

Reproduction
Here is a Colab repro to demonstrate my issue: https://colab.research.google.com/drive/1Z5FL1QDePY0j-8XL0o1V9o27_MbU-02E?usp=sharing
In the Colab, after running the other cells to install, load the imdb dataset, and train a LoRA classifier, I am running this test cell repeatedly and seeing erratic results.

I tried switching to PeftModelForSequenceClassification, but that did not resolve the issue.

Again, this issue also occurs for me outside of Colab, on my local machine, running Python scripts that fully exit after each execution (edit: here is a repro).
This issue also occurs when I try to fix the random seeds for numpy, torch, etc.
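For reference, the seeding I tried was roughly the standard recipe:

```python
# Standard seed-fixing recipe; fixing these did not make results deterministic.
import random
import numpy as np
import torch

seed = 42  # illustrative value
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
```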
Expected behavior
I would expect relatively similar test accuracy each time I run the test routine.