AttributeError: 'NoneType' object has no attribute 'attn_bias' #1490
Comments
Same here, happy new year btw. Continued pretraining qwen2.5-coder-7b-bnb4bit.
Yeah, this issue is happening. The extra embed and lm_head layers are causing this. I don't think they are even needed for training if we are not using a different tokenizer or have not changed the tokenizer.
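For context, here is a minimal sketch of the adapter setup being discussed, following the unsloth continued-pretraining notebook pattern (the model name and parameter values below are that notebook's defaults, assumed rather than quoted from this thread). The last two `target_modules` entries are the layers the comment above refers to:

```python
from unsloth import FastLanguageModel

# Load the 4-bit Qwen2.5-Coder base model (name assumed from the comment above).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-Coder-7B-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Continued-pretraining adapter config. "embed_tokens" and "lm_head" are the
# extra layers under discussion; they are typically only needed when the
# tokenizer/vocabulary has been changed.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
        "embed_tokens", "lm_head",
    ],
    lora_alpha=16,
)
```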
Is there any solution for this?
It works on my local machine, and these are the versions I have (list truncated in the original):

```
Package                Version
absl-py                2.1.0
```

Locally I am using bf16 precision. On Colab I was following the same script for pretraining, but I got an error once I added the lm_head and embed layers: a data-type mismatch between float16 and float32, something to do with those two layers for the Qwen models.

```
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1

AssertionError                            Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/triton/language/core.py in wrapper(*args, **kwargs)

33 frames

/usr/local/lib/python3.10/dist-packages/triton/language/core.py in dot(input, other, acc, input_precision, allow_tf32, max_num_imprecise_acc, out_dtype, _builder)
/usr/local/lib/python3.10/dist-packages/triton/language/semantic.py in dot(lhs, rhs, acc, input_precision, max_num_imprecise_acc, out_dtype, builder)
/usr/local/lib/python3.10/dist-packages/triton/language/semantic.py in assert_dtypes_valid(lhs_dtype, rhs_dtype, options)

AssertionError: First input (fp16) and second input (fp32) must have the same dtype!

The above exception was the direct cause of the following exception:

CompilationError                          Traceback (most recent call last)
<ipython-input> in <cell line: 1>()
/usr/local/lib/python3.10/dist-packages/unsloth/tokenizer_utils.py in train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
/usr/local/lib/python3.10/dist-packages/unsloth/models/llama.py in _fast_inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
/usr/local/lib/python3.10/dist-packages/unsloth/models/_utils.py in _unsloth_training_step(self, model, inputs, num_items_in_batch)
/usr/local/lib/python3.10/dist-packages/unsloth/models/_utils.py in _unsloth_pre_compute_loss(self, model, inputs, *args, **kwargs)
/usr/local/lib/python3.10/dist-packages/transformers/trainer.py in compute_loss(self, model, inputs, return_outputs, num_items_in_batch)
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py in _wrapped_call_impl(self, *args, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py in _call_impl(self, *args, **kwargs)
/usr/local/lib/python3.10/dist-packages/accelerate/utils/operations.py in forward(*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/accelerate/utils/operations.py in __call__(self, *args, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/amp/autocast_mode.py in decorate_autocast(*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_compile.py in inner(*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py in _fn(*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/unsloth/models/llama.py in PeftModelForCausalLM_fast_forward(self, input_ids, causal_mask, attention_mask, inputs_embeds, labels, output_attentions, output_hidden_states, return_dict, task_ids, num_logits_to_keep, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py in _wrapped_call_impl(self, *args, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py in _call_impl(self, *args, **kwargs)
/usr/local/lib/python3.10/dist-packages/peft/tuners/tuners_utils.py in forward(self, *args, **kwargs)
/usr/local/lib/python3.10/dist-packages/unsloth/models/llama.py in _CausalLM_fast_forward(self, input_ids, causal_mask, attention_mask, position_ids, past_key_values, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict, num_logits_to_keep, *args, **kwargs)
/usr/local/lib/python3.10/dist-packages/unsloth_zoo/loss_utils.py in fused_linear_cross_entropy(hidden_states, lm_weight, labels, num_items_in_batch, ignore_index, reduction, logit_softcapping, accuracy_threshold)
/usr/local/lib/python3.10/dist-packages/cut_cross_entropy/linear_cross_entropy.py in linear_cross_entropy(e, c, targets, ignore_index, softcap, reduction, shift, filter_eps, impl)
/usr/local/lib/python3.10/dist-packages/cut_cross_entropy/cce.py in cce_linear_cross_entropy(e, c, targets, ignore_index, softcap, reduction, shift, filter_eps)
/usr/local/lib/python3.10/dist-packages/cut_cross_entropy/cce.py in linear_cross_entropy_apply(e, c, params)
/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py in apply(cls, *args, **kwargs)
/usr/local/lib/python3.10/dist-packages/cut_cross_entropy/cce.py in forward(ctx, e, c, params)
/usr/local/lib/python3.10/dist-packages/cut_cross_entropy/cce_lse_forward.py in cce_lse_forward_kernel(e, c, valids, softcap, return_logit_avg)
/usr/local/lib/python3.10/dist-packages/triton/runtime/jit.py in <lambda>(*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/triton/runtime/autotuner.py in run(self, *args, **kwargs)
/usr/local/lib/python3.10/dist-packages/triton/runtime/autotuner.py in run(self, *args, **kwargs)
/usr/local/lib/python3.10/dist-packages/triton/runtime/jit.py in run(self, grid, warmup, *args, **kwargs)
/usr/local/lib/python3.10/dist-packages/triton/compiler/compiler.py in compile(src, target, options)
/usr/local/lib/python3.10/dist-packages/triton/compiler/compiler.py in make_ir(self, options, codegen_fns, context)

CompilationError: at 60:16:
```
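The dtype assertion at the bottom of that trace can be reproduced in plain PyTorch. A minimal sketch (illustrative only, not unsloth's actual code): the fused loss path effectively computes `hidden_states @ lm_head.weight.T` inside a Triton kernel, and `tl.dot` requires both operands to share a dtype.

```python
import torch

# fp16 activations (what autocast produces on a Colab T4) against an fp32
# lm_head weight (the trainable head upcast for stability); shapes here are
# illustrative.
hidden_states = torch.randn(4, 8, dtype=torch.float16)
lm_head_weight = torch.randn(16, 8, dtype=torch.float32)

try:
    logits = hidden_states @ lm_head_weight.T  # mismatched dtypes, like tl.dot
except RuntimeError as err:
    print("mismatch:", err)

# Casting both operands to a common dtype avoids the assertion. bf16 (used on
# the local machine above) shares fp32's exponent range, which is one reason
# the same script trains cleanly there.
logits = hidden_states.float() @ lm_head_weight.T
print(logits.dtype)  # torch.float32
```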
I get the same error for the Llama 3.2 models as well. The default Mistral model (unsloth/mistral-7b-v0.3) in the pretraining notebook works fine.
On Colab I have to use float16; that is the only change, so maybe the error arises from that. Locally I am using bfloat16 and did not run into any errors. My notebook link: https://github.com/mosama1994/Unsloth-Pretraining/blob/main/Pretraining.ipynb
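This is the precision toggle in question. A minimal sketch, assuming the standard unsloth notebook pattern (Colab T4 GPUs lack bf16 support, so training falls back to fp16 there, while most local Ampere-or-newer cards take the bf16 branch):

```python
from unsloth import is_bfloat16_supported
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=2,
    max_steps=100,
    fp16=not is_bfloat16_supported(),  # True on a Colab T4
    bf16=is_bfloat16_supported(),      # True on most local RTX 30xx/40xx GPUs
)
```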
It's OK when I downgrade xformers to 0.0.27.post2 (the lowest xformers version that unsloth requires)...
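For reference, the downgrade described above would be (version pin taken from the comment; in Colab, restarting the runtime after reinstalling is a reasonable assumption):

```
pip install "xformers==0.0.27.post2"
```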
What is the point of the example notebooks if they don't work?
I got an xformers error, and my xformers is built for py311_cu12.1.0_pyt2.5.1. Why do I get this error?
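One quick first check (an editorial suggestion, not from this thread): xformers wheels are built against a specific torch/CUDA pair, so confirm the installed versions actually match the `py311_cu12.1.0_pyt2.5.1` build string above.

```python
import torch
import xformers

print(torch.__version__)     # expect something like 2.5.1+cu121
print(torch.version.cuda)    # expect 12.1
print(xformers.__version__)  # the installed xformers release
```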