How to use UQFF File locally without sending requests to Hugging Face? #821
Comments
Hey, I'm not working with UQFF so I can't say for sure, but I think this might be possible. Here's the working command-line syntax that I used. I'm also completely new to this project and haven't worked with it in code yet, so forgive my naivety.
@Oracuda It seems I'm encountering a problem similar to these issues. Alternatively, my understanding of UQFF might be incorrect to begin with.
Thank you for the explanation. I also appreciate the work you're doing in #849 to generate all the necessary files when creating a UQFF model. However, I encountered the same error as described in this issue. For context, I was trying to enable Metal and use UQFF with the following command:
Interestingly, I was able to build without errors using the […]. Would it be possible to incorporate the changes from PR #846 into the […]?
@solaoi I merged both #846 and #849, so with #849 you can now load UQFF models without downloading the full weights! For example (https://huggingface.co/EricB/Llama-3.2-11B-Vision-Instruct-UQFF):
More models can be found here: https://huggingface.co/collections/EricB/uqff-670e4a49d56ecdd3f7f0fd4c.
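As a rough illustration (not the original example from this comment), the Rust builder API used later in this thread can load that UQFF model straight from the Hugging Face repo; the UQFF file name below is an assumption, so check the repository's file listing:

```rust
use anyhow::Result;
use mistralrs::{VisionLoaderType, VisionModelBuilder};

#[tokio::main]
async fn main() -> Result<()> {
    // Pointing the builder at the UQFF repo should fetch only the quantized
    // UQFF weights plus the small config/tokenizer files, not the full model.
    let _model = VisionModelBuilder::new(
        "EricB/Llama-3.2-11B-Vision-Instruct-UQFF",
        VisionLoaderType::VLlama,
    )
    .with_logging()
    // Assumed file name within the repo.
    .from_uqff("llama3.2-vision-instruct-q4k.uqff".into())
    .build()
    .await?;

    // The model can then be used for chat requests as in the full example below.
    Ok(())
}
```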
@EricLBuehler Hi, I have a question about the new UQFF API. After specifying local paths for the uqff file and tokenizer.json, I noticed that the program still tries to download config.json from Hugging Face. Is there a way to specify a local path for the config.json file?

2024-10-16T05:29:19.553770Z INFO mistralrs_core::pipeline::vision: Using tokenizer.json at `~/Downloads/tokenizer.json`
2024-10-16T05:29:19.553829Z INFO mistralrs_core::pipeline::vision: Loading `config.json` at `EricB/Llama-3.2-11B-Vision-Instruct-UQFF`

Here is my code:

```rust
use anyhow::Result;
use mistralrs::{IsqType, TextMessageRole, VisionLoaderType, VisionMessages, VisionModelBuilder};
const MODEL_ID: &str = "EricB/Llama-3.2-11B-Vision-Instruct-UQFF";
#[tokio::main]
async fn main() -> Result<()> {
let model = VisionModelBuilder::new(MODEL_ID, VisionLoaderType::VLlama)
.with_isq(IsqType::Q4K)
.with_logging()
.from_uqff("~/Downloads/llam3.2-vision-instruct-q4k.uqff".into())
.with_tokenizer_json("~/Downloads/tokenizer.json")
.build()
.await?;
let bytes = match reqwest::blocking::get(
"https://d2r55xnwy6nx47.cloudfront.net/uploads/2018/02/Ants_Lede1300.jpg",
) {
Ok(http_resp) => http_resp.bytes()?.to_vec(),
Err(e) => anyhow::bail!(e),
};
let image = image::load_from_memory(&bytes)?;
let messages = VisionMessages::new().add_vllama_image_message(
TextMessageRole::User,
"What is depicted here? Please describe the scene in detail.",
image,
);
let response = model.send_chat_request(messages).await?;
println!("{}", response.choices[0].message.content.as_ref().unwrap());
dbg!(
response.usage.avg_prompt_tok_per_sec,
response.usage.avg_compl_tok_per_sec
);
Ok(())
}
```
Hi @jiabochao! If you specify a Hugging Face model ID, it will always source the tokenizer from there. If you want to avoid downloading files, it would be best to download the model locally and then use a local model ID.
@EricLBuehler How can I use a local model ID? Could you please provide me with some examples?
@EricLBuehler I executed the following command:

However, after execution, I couldn't find the UQFF file. Could you please look into this?
@EricLBuehler
@solaoi thanks for catching that. On my Metal machine (M3 Max, macOS Sonoma), it worked during testing but now fails intermittently. This seems to be caused by something in our Candle backend and warrants further investigation! This was actually a regression from a recently merged PR (#857), but I just merged #861 which seems to work now. If you get any errors during the loading phase, please let me know.
@EricLBuehler The error message is indeed consistent with the one we discussed earlier: "A command encoder is already encoding to this command buffer". On a positive note, I'm glad to hear that when the build does succeed, we're able to load the generated files and use the model without any issues. That's encouraging, but we should still address the inconsistent build process.
@jiabochao
Once you've downloaded these files, place them in a directory (e.g., […]).
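To make that concrete, here is a minimal sketch of pointing the builder at a local directory and local files, based on the builder calls shown earlier in this thread; the directory name and file names are placeholders, not the exact paths from the original comment:

```rust
use anyhow::Result;
use mistralrs::{VisionLoaderType, VisionModelBuilder};

#[tokio::main]
async fn main() -> Result<()> {
    // Hypothetical local directory holding the files downloaded from the UQFF
    // repository (config.json, tokenizer.json, the .uqff file, etc.).
    // Using a local path as the model ID keeps mistral.rs from contacting
    // Hugging Face; the UQFF file and tokenizer are also read locally.
    let _model = VisionModelBuilder::new(
        "./Llama-3.2-11B-Vision-Instruct-UQFF",
        VisionLoaderType::VLlama,
    )
    .with_logging()
    // .with_isq is omitted here since the UQFF weights are already quantized.
    .from_uqff("./Llama-3.2-11B-Vision-Instruct-UQFF/llama3.2-vision-instruct-q4k.uqff".into())
    .with_tokenizer_json("./Llama-3.2-11B-Vision-Instruct-UQFF/tokenizer.json")
    .build()
    .await?;

    // The model can now be used exactly as in the earlier example.
    Ok(())
}
```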
@solaoi Thank you for your reply, it works! But another error has occurred.
@solaoi are you still running into intermittent errors?
@jiabochao the following works for me:

Perhaps you downloaded the wrong […].
@EricLBuehler hi, the issue occurs when specifying `with_isq` (marked below):

```rust
let model = VisionModelBuilder::new(MODEL_ID, VisionLoaderType::VLlama)
    .with_isq(IsqType::Q4K) // here
    .with_logging()
    .from_uqff("llama3.2-vision-instruct-q4k.uqff".into())
    .build()
    .await?;
```
@EricLBuehler |
Describe the bug
I'm trying to use a UQFF file in a local environment only, but my sample code is still sending requests to Hugging Face.
I would like to know how to prevent these external requests and use the UQFF file entirely locally.
Sample Code
I tried changing the model_id from `aixsatoshi/Honyaku-13b` to `./honyaku13B/Honyaku-13b-q4_0.uqff`, but this resulted in the following error:

Latest commit or version
329e0e8