Describe the bug
When initializing and dropping the Model repeatedly:
Memory usage continuously increases because GGUF models are not properly cleaned up
The channel is erroneously closed after the first iteration
Steps to Reproduce
Create a service that initializes and drops the model multiple times
Run the following code:
use anyhow::Result;
use mistralrs::{GgufModelBuilder, PagedAttentionMetaBuilder, TextMessageRole, TextMessages};
use std::time::Duration;
use tokio::time::sleep;
struct ChatService {
    model: Option<mistralrs::Model>,
}

impl ChatService {
    async fn new() -> Result<Self> {
        Ok(Self { model: None })
    }

    async fn initialize_model(&mut self) -> Result<()> {
        self.model = Some(
            GgufModelBuilder::new(
                "gguf_models/mistral_v0.1/",
                vec!["mistral-7b-instruct-v0.1.Q4_K_M.gguf"],
            )
            .with_chat_template("chat_templates/mistral.json")
            .with_paged_attn(|| PagedAttentionMetaBuilder::default().build())?
            .build()
            .await?,
        );
        Ok(())
    }

    async fn chat(&self, prompt: &str) -> Result<String> {
        let messages = TextMessages::new().add_message(TextMessageRole::User, prompt);
        let response = self
            .model
            .as_ref()
            .unwrap()
            .send_chat_request(messages)
            .await?;
        Ok(response.choices[0]
            .message
            .content
            .clone()
            .unwrap_or_default())
    }
}

#[tokio::main]
async fn main() -> Result<()> {
    for i in 0..3 {
        println!("Iteration {}", i);
        let mut service = ChatService::new().await?;
        service.initialize_model().await?;
        let response = service.chat("Write a short greeting").await?;
        println!("Response: {}", response);
        // Model is dropped here, but GGUF remains in memory
        drop(service);
        // Wait to make memory usage observable
        sleep(Duration::from_secs(5)).await;
    }
    Ok(())
}
The Cargo.toml is:
[package]
name = "memory_bug_mistral"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
tokio = { version = "1", features = ["full"] }
anyhow = "1.0"
mistralrs = { git = "https://github.com/EricLBuehler/mistral.rs.git", branch = "master", features = [
"metal",
] }
regex = "1.10.6"
Observed Behavior
Memory usage increases with each iteration, even after the explicit drop
After the first iteration, the following error is received:
Error: Channel was erroneously closed!
Expected Behavior
Memory should be properly freed when the model is dropped
The channel should remain functional for subsequent iterations