int4_weight_only API raises an error when saving transformers models (#1704)
Same error on CUDA; we cannot save the model when passing a layout to `TorchAoConfig`:

```python
import torch
from transformers import TorchAoConfig, AutoModelForCausalLM, AutoTokenizer
from torchao.dtypes import TensorCoreTiledLayout

model_name = "meta-llama/Llama-3.1-8B-Instruct"
device_map = "cuda:0"

# We support int4_weight_only, int8_weight_only and int8_dynamic_activation_int8_weight.
# More examples and documentation for the arguments can be found at
# https://github.com/pytorch/ao/tree/main/torchao/quantization#other-available-quantization-techniques
quantization_config = TorchAoConfig("int4_weight_only", group_size=128, layout=TensorCoreTiledLayout())
quantized_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map=device_map,
    quantization_config=quantization_config,
)
quantized_model.save_pretrained("./llama3-8B-ao-int4", safe_serialization=False)
```

`save_pretrained` fails with the same serialization error.
When I load an int4 CPU-quantized model and try to save it, I hit this error:

```
TypeError: Object of type Int4CPULayout is not JSON serializable
```
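For context, this `TypeError` comes from Python's `json` module, which cannot encode arbitrary objects. A minimal stand-in shows the failure mode (the class here is just a placeholder, not the real torchao type):

```python
import json

# Stand-in for the torchao layout object; not the real class.
class Int4CPULayout:
    pass

# save_pretrained serializes the quantization config to JSON; a raw
# Python object inside that config triggers the failure:
config = {"quant_method": "torchao", "layout": Int4CPULayout()}
try:
    json.dumps(config)
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # Object of type Int4CPULayout is not JSON serializable
```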
To reproduce it:
output:
I was wondering whether we could switch to a friendlier data structure for saving the layout data.
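One possible shape for such a structure, as a sketch only: represent the layout as a tagged dict of plain values that round-trips through JSON. The registry name and the `inner_k_tiles` field below are illustrative assumptions, not torchao's actual API.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical layout class standing in for torchao's TensorCoreTiledLayout.
@dataclass
class TensorCoreTiledLayout:
    inner_k_tiles: int = 8

# Hypothetical name-to-class registry for reconstructing layouts on load.
LAYOUT_REGISTRY = {"tensor_core_tiled": TensorCoreTiledLayout}

def layout_to_json(layout):
    # Store {"name": ..., "kwargs": {...}} instead of the raw object.
    name = next(k for k, v in LAYOUT_REGISTRY.items() if isinstance(layout, v))
    return {"name": name, "kwargs": asdict(layout)}

def layout_from_json(data):
    return LAYOUT_REGISTRY[data["name"]](**data["kwargs"])

original = TensorCoreTiledLayout(inner_k_tiles=4)
blob = json.dumps(layout_to_json(original))   # now JSON serializable
restored = layout_from_json(json.loads(blob))
print(restored == original)  # True
```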