Re-using model weights after initial load? #2158
Replies: 2 comments 1 reply
-
The reason the model needs to be mutable is that it keeps internal state (the KV cache) between forward passes. However, if you are running multiple inference sessions at once, you should use an inference platform designed for this. If that is not necessary, you can just clear the KV cache after one session is done and repeat the process.
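A minimal sketch of the sequential pattern. This assumes a model type that exposes a cache-reset method; `candle_transformers::models::quantized_mistral::Model` is used here as an illustration, and the exact method name (`clear_kv_cache`) and forward signature should be checked against the model type you actually use:

```rust
use candle_core::{Device, Tensor};
use candle_transformers::models::quantized_mistral::Model;

// Run one prompt through `model`, then reset its per-session state so the
// same weights can serve the next, unrelated prompt.
fn run_session(model: &mut Model, tokens: &[u32], device: &Device) -> anyhow::Result<()> {
    for (pos, &token) in tokens.iter().enumerate() {
        // Shape the single token as a (batch = 1, seq_len = 1) tensor.
        let input = Tensor::new(&[token], device)?.unsqueeze(0)?;
        let _logits = model.forward(&input, pos)?;
        // ... sample the next token from the logits and feed it back in ...
    }
    // Drop the accumulated KV cache before the next session re-uses the model.
    model.clear_kv_cache();
    Ok(())
}
```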
-
The recommended solution for handling multiple sessions at once is to load the model and then, for each session, clone this "main" model (most models are cloneable). This ensures that each session uses a separate KV cache while the weights are shared between the different sessions. (You could re-use sessions by clearing the KV caches, but that is not really necessary.)
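A minimal sketch of that clone-per-session pattern, assuming a GGUF model loaded through `candle_transformers::models::quantized_llama::ModelWeights` (the same type the quantized-phi example uses). Whether a given model type implements `Clone` should be checked case by case, as noted above:

```rust
use candle_core::quantized::gguf_file;
use candle_core::Device;
use candle_transformers::models::quantized_llama::ModelWeights;

fn main() -> anyhow::Result<()> {
    let device = Device::Cpu;

    // Load the weights once from a GGUF file.
    let mut file = std::fs::File::open("model.gguf")?;
    let content = gguf_file::Content::read(&mut file)?;
    let base = ModelWeights::from_gguf(content, &mut file, &device)?;

    // One clone per inference session. Candle tensors are reference-counted,
    // so the clones share the underlying weight buffers; only the per-session
    // mutable state (the KV cache) is independent.
    let mut session_a = base.clone();
    let mut session_b = base.clone();

    // Each session can now run its own forward passes independently, e.g.:
    // let logits = session_a.forward(&input_tokens, position)?;
    let _ = (&mut session_a, &mut session_b);
    Ok(())
}
```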
-
A bit of a noob question, mostly because I don't yet have a full grasp of how transformers work, but I noticed that the `ModelWeights` loaded from file needs to be mutable in order to perform an inference: https://github.com/huggingface/candle/blob/main/candle-examples/examples/quantized-phi/main.rs#L189
I'm guessing this is because it needs to keep some internal state to accurately predict the next token.
I'd like to load a model from file and create many unrelated inference sessions with that same model. What's the best approach for this? Save off the initial model weights and clone them every time a new inference session needs to be created? Worried I might be barking up the wrong tree here. TIA