Hugging Face from_pretrained() using merged weights KeyError: 'base_model_name_or_path' #2224

Open · chg0901 opened this issue Jan 2, 2025 · 5 comments
Labels: bug (Something isn't working), triaged (This issue has been assigned an owner and appropriate label)

chg0901 commented Jan 2, 2025

Test code from https://pytorch.org/torchtune/stable/tutorials/e2e_flow.html#use-with-hugging-face-from-pretrained:


from transformers import AutoModelForCausalLM, AutoTokenizer
import transformers

print(transformers.__version__)

#TODO: update it to your chosen epoch
trained_model_path = "models/torchtune/llama3_2_3B/lora_single_device/epoch_1"
# trained_model_path = "/home/cine/Documents/tune/models/Llama-3.2-3B-Instruct"

model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path=trained_model_path,
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(trained_model_path, safetensors=True)


# Function to generate text
def generate_text(model, tokenizer, prompt, max_length=50):
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_length=max_length)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

prompt = "tell me a joke"
print("Base model output:", generate_text(model, tokenizer, prompt))

prompt = "Complete the sentence: 'Once upon a time..."
print("Base model output:", generate_text(model, tokenizer, prompt))

Error:

(base) cine@20211029-a04:~/Documents/tune$ /home/cine/miniconda3/envs/tune/bin/python /home/cine/Documents/tune/gen_from_merged_sft.py
Traceback (most recent call last):
  File "/home/cine/Documents/tune/gen_from_merged_sft.py", line 7, in <module>
    model = AutoModelForCausalLM.from_pretrained(
  File "/home/cine/miniconda3/envs/tune/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 514, in from_pretrained
    pretrained_model_name_or_path = adapter_config["base_model_name_or_path"]
KeyError: 'base_model_name_or_path'
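
The KeyError is raised because transformers finds an adapter_config.json inside the checkpoint directory and then expects it to contain a base_model_name_or_path entry, which the torchtune-written config apparently lacks. A minimal sketch of one possible workaround (not an official fix; it assumes the epoch_1 directory holds the torchtune adapter files, and base_model_path is an assumed local path) is to add the missing key before calling from_pretrained:

import json
import os

trained_model_path = "models/torchtune/llama3_2_3B/lora_single_device/epoch_1"
base_model_path = "/home/cine/Documents/tune/models/Llama-3.2-3B-Instruct"  # assumed local base model

# Add the key transformers expects; it then resolves the base model first
# and applies the adapter on top of it.
cfg_path = os.path.join(trained_model_path, "adapter_config.json")
with open(cfg_path) as f:
    adapter_config = json.load(f)

if "base_model_name_or_path" not in adapter_config:
    adapter_config["base_model_name_or_path"] = base_model_path
    with open(cfg_path, "w") as f:
        json.dump(adapter_config, f, indent=2)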

But I can use PEFT to load the fine-tuned (SFT) model with:

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

#TODO: update it to your chosen epoch
trained_model_path = "models/torchtune/llama3_2_3B/lora_single_device/epoch_1"

# Define the model and adapter paths.
# To avoid the error above, point directly to the local base model.
original_model_name = '/home/cine/Documents/tune/models/Llama-3.2-3B-Instruct'
model = AutoModelForCausalLM.from_pretrained(original_model_name)

# Hugging Face will look for adapter_model.safetensors and adapter_config.json
peft_model = PeftModel.from_pretrained(model, trained_model_path)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(original_model_name)

# Function to generate text
def generate_text(model, tokenizer, prompt, max_length=50):
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_length=max_length)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

prompt = "tell me a joke: '"
print("Base model output:", generate_text(peft_model, tokenizer, prompt))
joecummings added the triaged and bug labels on Jan 6, 2025
felipemello1 (Contributor) commented Jan 6, 2025

Hugging Face may be prioritizing reading from adapter_config.json instead of reading the model config. Maybe when I tested this, I tried it with full finetuning instead of LoRA.

One sanity check is to remove or move the adapter_model.safetensors and adapter_config.json files and see whether loading then defaults to the full merged model. I am on PTO this week, but I can look into it next week.
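
A rough sketch of that sanity check (the file names come from the adapter-loading path in the traceback above; the backup directory name is arbitrary):

# Move the adapter files out of the checkpoint directory so transformers
# falls back to loading the merged full weights directly.
import os
import shutil

trained_model_path = "models/torchtune/llama3_2_3B/lora_single_device/epoch_1"
backup_dir = os.path.join(trained_model_path, "adapter_backup")  # arbitrary name
os.makedirs(backup_dir, exist_ok=True)

for fname in ("adapter_config.json", "adapter_model.safetensors"):
    src = os.path.join(trained_model_path, fname)
    if os.path.exists(src):
        shutil.move(src, os.path.join(backup_dir, fname))

# Retry the original load:
# model = AutoModelForCausalLM.from_pretrained(trained_model_path)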

Ankur-singh (Contributor)

@chg0901 I'm not able to reproduce the error; for me it seems to work just fine. I might be missing something, so can you please help me reproduce it?
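
One way to provide the details needed for reproduction (a suggestion only; the snippet just reports the environment and the checkpoint directory layout):

# Collect the information that usually matters for this kind of loading issue:
# library versions and which files actually exist in the checkpoint directory.
import os
import transformers
import peft

print("transformers:", transformers.__version__)
print("peft:", peft.__version__)

trained_model_path = "models/torchtune/llama3_2_3B/lora_single_device/epoch_1"
print(sorted(os.listdir(trained_model_path)))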

chg0901 (Author) commented Jan 13, 2025 via email

Ankur-singh (Contributor)

Would it be possible to share a Colab notebook with all the code needed to reproduce the error?

chg0901 (Author) commented Jan 13, 2025 via email
