Problem
If GPU acceleration is enabled, Jan appears to follow an "all or nothing" strategy: if there is not enough VRAM to hold the model, loading fails completely.
Success Criteria
A much better approach would be graceful degradation: if the model cannot fit into VRAM, activate it on the CPU instead, perhaps with a UI warning notifying the user of what happened. That way the model would still respond, even if more slowly, and small models could be GPU-accelerated while larger ones remain usable.
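As a rough illustration of what the fallback could look like, here is a minimal TypeScript sketch. It assumes a hypothetical loader API: `loadModel`, `nGpuLayers`, and `notifyUser` are illustrative names, not Jan's actual functions or settings.

```ts
// Hypothetical loader API for illustration only; these are not Jan's
// real functions or option names.
declare function loadModel(opts: { modelPath: string; nGpuLayers: number }): Promise<void>;
declare function notifyUser(message: string): void;

async function loadWithCpuFallback(modelPath: string): Promise<void> {
  try {
    // First attempt: offload all layers to the GPU (-1 = "all" in this sketch).
    await loadModel({ modelPath, nGpuLayers: -1 });
  } catch {
    // GPU load failed (e.g. out of VRAM): warn the user and retry on CPU.
    notifyUser("Not enough VRAM for GPU acceleration; running on CPU instead.");
    await loadModel({ modelPath, nGpuLayers: 0 });
  }
}
```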
An ideal approach would be to implement partial model offloading: estimate how many layers can safely be offloaded to VRAM, so the model is accelerated as much as the available hardware allows.
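One plausible heuristic for that estimate is to divide the free VRAM by the approximate per-layer size of the model. The sketch below is an assumption-laden starting point, not a real implementation; `estimateGpuLayers` and the 20% headroom figure are made up for illustration, and a production version would also need to account for the KV cache, context length, and per-backend overhead.

```ts
// Crude estimate of how many layers fit in VRAM. The names and the 20%
// headroom factor are assumptions for illustration only.
function estimateGpuLayers(
  modelBytes: number,    // size of the model weights on disk
  layerCount: number,    // number of transformer layers in the model
  freeVramBytes: number  // free VRAM reported by the driver
): number {
  const bytesPerLayer = modelBytes / layerCount;
  const usable = freeVramBytes * 0.8; // keep ~20% headroom for buffers
  const layers = Math.floor(usable / bytesPerLayer);
  return Math.max(0, Math.min(layers, layerCount));
}

// Example: a ~4 GB quantized model with 32 layers and 3 GB of free VRAM
// gives Math.floor((3 * 0.8) / (4 / 32)) = 19 layers offloaded.
```

The two sketches compose naturally: the loader could first try the estimated layer count, then fall back to CPU-only if even that fails.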
Additional context
I think LMStudio and GPT4All implement partial model offloading, so it is clearly feasible. However, they simply put a slider in the UI and leave it to the user to figure out how many layers fit into VRAM.