Perplexity (ppl) Calculation of Local Sparse Model: NaN issue #853
Can you share the model and perhaps some text output from the model? Does the text look reasonable?
Hi @robertgshaw2-neuralmagic Robert, you were right to question this. I retested the original llama-7B sparse conversion example from llm-compressor today, along with a simple notebook that walks through: load the local sparse Llama-7B model, test the model output (as a reference), calculate perplexity, and observe the NaN result. I think the issue is now clearer: I believe the problem lies in how I load the local sparse model & tokenizer.

Also, I apologize for not providing the exact sparse model I used. After running it in the online RunPod environment, I didn't download the model. However, this process should be easy to replicate. Here are the steps I followed for testing:

Step 1: Execute the official llama-7B sparse conversion example from llm-compressor.
Step 2: Load the resulting local sparse model, test its output, and calculate perplexity, following the notebook sections above — this yields NaN.

The success case with the Llama3-3B online model goes through the same notebook sections (test model output, calculating perplexity, result) and produces a normal ppl value.

Summary: I want to correctly load the local sparse model and calculate its perplexity as an evaluation metric. However, it seems that I haven't used the correct method to load the model (through the SparseAutoModelForCausalLM class). My testing Jupyter notebook is attached. A minimal sketch of the loading step follows.
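For reference, a minimal sketch of that loading + sanity-check step (the local path, dtype, and prompt are illustrative, and it assumes an llm-compressor version that still exports SparseAutoModelForCausalLM):

```python
# Minimal sketch of the "load local sparse model" + "test model output" steps.
# MODEL_DIR, dtype, and the prompt are illustrative, not the exact values used.
import torch
from transformers import AutoTokenizer
from llmcompressor.transformers import SparseAutoModelForCausalLM

MODEL_DIR = "./llama7b_sparse_output"  # folder written by the conversion example

model = SparseAutoModelForCausalLM.from_pretrained(
    MODEL_DIR,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)

# Quick "Test Model Output" check: does generation look reasonable?
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```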
👋 Hello Neural Magic community developers,
I encountered an issue while calculating perplexity for a locally converted Llama3-8B sparse model using the llm-compressor library. I followed the sparse conversion example script and changed the model to meta-llama/Meta-Llama-3-8B-Instruct myself; the sparse conversion takes ~1.2 hours to finish. A rough sketch of this conversion step is included below for context.
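This sketch covers the sparsity stage only (the official example also runs fine-tuning and quantization stages); the dataset, sparsity settings, and output path are illustrative, and the imports assume the llm-compressor version current at the time:

```python
# Rough sketch of the sparsity stage of the conversion, adapted from the
# llm-compressor example. Dataset, sparsity settings, and output path are
# illustrative, not the exact values from the example script.
from llmcompressor.modifiers.obcq import SparseGPTModifier
from llmcompressor.transformers import oneshot

recipe = SparseGPTModifier(sparsity=0.5, mask_structure="2:4")

oneshot(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    dataset="open_platypus",          # calibration data for SparseGPT
    recipe=recipe,
    output_dir="./llama3_8b_sparse",
    max_seq_length=2048,
    num_calibration_samples=512,
)
```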
Here’s a detailed breakdown:
Describe the bug
While trying to compute WikiText2 perplexity for a Llama3-8B model that has been sparsified (loading the local model from disk), the resulting perplexity values always turn out to be NaN. I suspect that some configuration might not be properly set when using the custom SparseAutoModelForCausalLM class in combination with the compressed-tensors library.
Expected behavior
I expected the perplexity values to be reasonable and comparable to the official Hugging Face models. For example, when testing with the standard Llama-3.2-3B model from Hugging Face (without sparsification), I got a perplexity of ~8.8 with the following parameters:
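The exact parameter values were omitted above; for reference, the routine is the standard Hugging Face sliding-window perplexity recipe, sketched here with illustrative max_length/stride values:

```python
# Standard sliding-window perplexity over WikiText-2, following the common
# Hugging Face recipe. max_length and stride are illustrative values.
import torch
from datasets import load_dataset

def wikitext2_ppl(model, tokenizer, max_length=2048, stride=512):
    test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
    encodings = tokenizer("\n\n".join(test["text"]), return_tensors="pt")
    seq_len = encodings.input_ids.size(1)

    nlls, prev_end_loc = [], 0
    for begin_loc in range(0, seq_len, stride):
        end_loc = min(begin_loc + max_length, seq_len)
        trg_len = end_loc - prev_end_loc           # tokens scored in this window
        input_ids = encodings.input_ids[:, begin_loc:end_loc].to(model.device)
        target_ids = input_ids.clone()
        target_ids[:, :-trg_len] = -100            # mask the overlapping prefix

        with torch.no_grad():
            loss = model(input_ids, labels=target_ids).loss
        nlls.append(loss * trg_len)                # un-average the CE loss

        prev_end_loc = end_loc
        if end_loc == seq_len:
            break
    return torch.exp(torch.stack(nlls).sum() / end_loc)
```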
I expected similar results for the sparse model, not NaN values.
Environment
I use a RunPod online environment with 2× A100-80GB-SXM GPUs.
To Reproduce
Steps to reproduce the behavior:
Errors
Here’s the output I receive when running the perplexity calculation (see the attached image). The perplexity of the local Llama-8B model (loaded via the SparseAutoModelForCausalLM class) is always NaN, while testing the Llama-3B model (loaded via the AutoModelForCausalLM class) successfully produces a ppl value. A minimal NaN check is sketched after the two cases below.
Sparse Llama 8B (loaded via SparseAutoModelForCausalLM): ppl is always NaN
Online Llama 3B (loaded via AutoModelForCausalLM): ppl is computed successfully
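For illustration, a quick way to narrow down where the non-finite values first appear (a debugging sketch, not part of the original report; `model` and `tokenizer` are assumed to be loaded as above):

```python
# Debugging sketch: check whether NaNs already exist in the loaded weights,
# or first appear in the logits of a single forward pass.
import torch

bad = [name for name, p in model.named_parameters() if not torch.isfinite(p).all()]
print("parameters containing NaN/Inf:", bad or "none")

inputs = tokenizer("Hello world", return_tensors="pt").to(model.device)
with torch.no_grad():
    logits = model(**inputs).logits
print("all logits finite:", torch.isfinite(logits).all().item())
```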
Additional context
The same perplexity calculation process works perfectly when using the Hugging Face Llama-3.2-3B model without sparsification, which gives a perplexity of ~8.8. I believe the issue lies either in the custom sparse model class or in its integration with compressed-tensors. Maybe I'm missing some additional configuration/setting for the sparse model? 🧐
Any guidance on this would be appreciated! 🥰
Additional Question
How do I correctly load the final quantized model (i.e., the model saved in the stage_quantization folder)?
I'm also interested in the ppl of the final quantized model, but when I try to load it with SparseAutoModelForCausalLM it doesn't work 😢
It fails with a message along the lines of "... class not supported ...".
So how do I load the final quantized model correctly? Is there any documentation I can refer to? 🙏🏼
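One possible direction (an assumption, not confirmed anywhere in this thread): recent transformers releases ship a compressed-tensors integration, so the quantized checkpoint may load through the plain Auto classes rather than SparseAutoModelForCausalLM. A minimal sketch, with an illustrative path:

```python
# Assumption, not confirmed in this thread: with a transformers build that
# includes compressed-tensors support, the quantized checkpoint may load
# through the plain Auto classes. QUANT_DIR is illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

QUANT_DIR = "./output/stage_quantization"

model = AutoModelForCausalLM.from_pretrained(QUANT_DIR, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(QUANT_DIR)
```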