Eval bug: Slow model loading w/ mmap #10478
Comments
How much RAM do you have? Is the second run just as slow? If not, this is expected mmap behavior.
I have 36 GB of RAM (32 GB available to the GPU). I tested with multiple large models, such as Mixtral 8x7b q4_k_m. Yes, it is still slow on the second run.
Or just use a frontend such as Ollama and the Continue extension to set the useMmap option for each model: https://docs.continue.dev/reference
I already know how to disable it; I'm just saying I don't think loading should be this slow just because of mmap. With smaller models (roughly under 40B at q4) the difference is negligible.
This issue was closed because it has been inactive for 14 days since being marked as stale.
This was not completed!
Name and Version
all recent versions
Which operating systems do you know to be affected?
Mac
GGML backends
Metal
Hardware
Apple Silicon. M3 Max
Model
Any big model such as Mixtral 8x7b.
Steps to Reproduce
Load the model with mmap enabled and compare the load time against loading it with mmap disabled; the mmap load is much slower.
See original issue here: #9244 (comment)
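To put numbers on the comparison outside of llama.cpp, here is a minimal Python sketch that times touching every page of a file through a read-only mapping against streaming the same file with plain buffered reads. It is not llama.cpp's loader, only an illustration of the two access patterns; the helper names (`make_sample_file`, `read_with_mmap`, `read_with_read`, `compare`) are hypothetical, and the small sample file is a stand-in for a real model file, so the timings it produces say nothing about the Metal backend.

```python
import mmap
import os
import tempfile
import time


def make_sample_file(size_bytes):
    """Create a temporary file of the given size (a stand-in for a model file)."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(size_bytes))
    return path


def read_with_mmap(path, page_size=mmap.PAGESIZE):
    """Map the file read-only and touch one byte per page.

    Returns (bytes_mapped, elapsed_seconds).
    """
    start = time.perf_counter()
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            size = len(mm)
            checksum = 0
            for offset in range(0, size, page_size):
                checksum += mm[offset]  # reading a byte forces the page to be faulted in
    return size, time.perf_counter() - start


def read_with_read(path, chunk_size=1 << 20):
    """Stream the file with ordinary buffered reads.

    Returns (bytes_read, elapsed_seconds).
    """
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


def compare(path):
    """Run both access patterns once and collect sizes and timings."""
    bytes_mmap, sec_mmap = read_with_mmap(path)
    bytes_read, sec_read = read_with_read(path)
    return {
        "bytes_mmap": bytes_mmap,
        "bytes_read": bytes_read,
        "sec_mmap": sec_mmap,
        "sec_read": sec_read,
    }


if __name__ == "__main__":
    sample = make_sample_file(64 * 1024 * 1024)  # 64 MiB; point this at a real file instead
    try:
        for run in (1, 2):  # the second run usually benefits from the OS page cache
            print(run, compare(sample))
    finally:
        os.remove(sample)
```

Running it twice on the same file mirrors the "is the second run also slow?" question above: if the file fits in the page cache, the second pass would normally be faster for both methods.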
First Bad Commit
idk
Relevant log output
it just loads for a long time.