fix: 4171 - Model loading gets stuck on stop #4177

Merged
louis-jan merged 1 commit into dev from fix/4171-model-loading-takes-extremely-long
Dec 2, 2024

Conversation


@louis-jan louis-jan commented Dec 2, 2024

Describe Your Changes

This PR updates the cortex.cpp version to address a couple of issues, including rounded float values and problems stopping a model after loading.

It also cancels the pending model start request on stop, since there is a known server-side issue where a model cannot be stopped mid-load.


I’ve noticed that enabling cont_batching and increasing the number of parallel operations can slightly improve the user experience, especially when the model is still generating in the background. (There’s an issue where the server can’t stop a running inference.) It therefore makes sense to expose these settings in the cortex extension for advanced users and LLM enthusiasts. More options and an improved settings UX will follow; this is the first step in mapping engine parameters into Jan Settings.

Fixes Issues

Changes made

The code changes include:

  1. Version Update:

    • The version in version.txt is updated from 1.0.4-rc4 to 1.0.4-rc5.
  2. Settings Update:

    • The default_settings.json has been modified to include new settings (cont_batching, caching_enabled, cache_type, use_mmap) and updated descriptions and default values for various settings. The placeholder and value fields are also adjusted.
  3. Refactoring Constants:

    • In rollup.config.ts and global.d.ts, DEFAULT_SETTINGS is renamed to SETTINGS.
  4. New Enum for Settings:

    • An enum Settings is created in index.ts to manage different available settings.
  5. Settings Handling:

    • New properties are added to the class JanInferenceCortexExtension to hold default settings values.
    • A new method, onSettingUpdate, handles updates to settings.
    • Settings are registered on load and used during model operations in loadModel.
  6. Abort Controller:

    • Abort controllers are introduced to manage the requests associated with each model and to clean up after operations.
  7. Enhancements in UI:

    • In ErrorMessage, LoadModelError, and TextMessage components, CSS classes are adjusted for capitalization and overflow behavior to improve UI consistency and presentation.
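
The settings handling described in items 4 and 5 could look roughly like the sketch below. It is an illustration only, not the code from this PR: the class name EngineSettings, the helper toLoadParams, the enum member names, and all default values are assumptions. Only the setting keys (cont_batching, caching_enabled, cache_type, use_mmap) and the method name onSettingUpdate come from the description above.

```typescript
// Setting keys as named in the PR description; the enum member names are assumptions.
enum Settings {
  contBatching = 'cont_batching',
  cachingEnabled = 'caching_enabled',
  cacheType = 'cache_type',
  useMmap = 'use_mmap',
}

// Hypothetical holder for engine settings (not the real JanInferenceCortexExtension).
class EngineSettings {
  // Default values below are illustrative assumptions, not the shipped defaults.
  contBatching = false
  cachingEnabled = true
  cacheType = 'f16'
  useMmap = true

  // Apply a single setting change; unknown keys and wrongly typed values are ignored.
  onSettingUpdate(key: string, value: unknown): void {
    switch (key) {
      case Settings.contBatching:
        if (typeof value === 'boolean') this.contBatching = value
        break
      case Settings.cachingEnabled:
        if (typeof value === 'boolean') this.cachingEnabled = value
        break
      case Settings.cacheType:
        if (typeof value === 'string') this.cacheType = value
        break
      case Settings.useMmap:
        if (typeof value === 'boolean') this.useMmap = value
        break
    }
  }

  // Build the parameter object that a load-model request could carry.
  toLoadParams(): Record<string, boolean | string> {
    return {
      [Settings.contBatching]: this.contBatching,
      [Settings.cachingEnabled]: this.cachingEnabled,
      [Settings.cacheType]: this.cacheType,
      [Settings.useMmap]: this.useMmap,
    }
  }
}
```

Keeping the keys in one enum means the settings file, the update handler, and the load request all refer to the same strings, so a renamed key fails at compile time rather than silently at runtime.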

These changes encompass enhancements in functionality (settings management, abort controllers) and user interface adjustments.
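
As a rough illustration of the abort-controller approach in item 6, the sketch below keeps one AbortController per loading model so that a stop can cancel the pending start request on the client side. The names PendingLoads, run, cancel, and fakeLoad are hypothetical and do not come from this PR; fakeLoad stands in for a real HTTP call, which would typically receive the signal through `fetch(url, { signal })`.

```typescript
// Hypothetical registry: one AbortController per model that is currently loading.
class PendingLoads {
  private controllers = new Map<string, AbortController>()

  // Run a load task for modelId, handing it a signal that cancel() can trigger.
  async run<T>(
    modelId: string,
    task: (signal: AbortSignal) => Promise<T>
  ): Promise<T> {
    // Abort any earlier load that is still pending for the same model.
    this.controllers.get(modelId)?.abort()
    const controller = new AbortController()
    this.controllers.set(modelId, controller)
    try {
      return await task(controller.signal)
    } finally {
      // Clean up, but only if a newer load has not replaced this controller.
      if (this.controllers.get(modelId) === controller) {
        this.controllers.delete(modelId)
      }
    }
  }

  // Cancel a pending load; returns false when nothing is pending for modelId.
  cancel(modelId: string): boolean {
    const controller = this.controllers.get(modelId)
    if (!controller) return false
    controller.abort()
    return true
  }

  // Number of loads currently pending.
  get size(): number {
    return this.controllers.size
  }
}

// Stand-in for a slow load request: it never resolves and rejects once aborted.
function fakeLoad(signal: AbortSignal): Promise<string> {
  return new Promise<string>((_resolve, reject) => {
    if (signal.aborted) {
      reject(new Error('aborted'))
      return
    }
    signal.addEventListener('abort', () => reject(new Error('aborted')), {
      once: true,
    })
  })
}
```

Aborting only cancels the client-side request. Because the server cannot stop a model mid-load, the load may still finish on the server, which is why the entry is cleaned up in `finally` rather than by waiting for a server confirmation.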

@louis-jan louis-jan requested a review from a team December 2, 2024 07:11
@github-actions github-actions bot added the type: bug Something isn't working label Dec 2, 2024

github-actions bot commented Dec 2, 2024

Barecheck - Code coverage report

Total: 69.32%

Your code coverage diff: 0.00% ▴

Uncovered files and lines

File: Lines
web/containers/ErrorMessage/index.tsx: 38, 40-41, 44
web/screens/Thread/ThreadCenterPanel/LoadModelError/index.tsx: 17-21, 23-24, 35-37, 40, 57
web/screens/Thread/ThreadCenterPanel/TextMessage/index.tsx: 28-31, 33-34, 36-37, 40-41

@louis-jan louis-jan merged commit 3118bba into dev Dec 2, 2024
11 checks passed
@louis-jan louis-jan deleted the fix/4171-model-loading-takes-extremely-long branch December 2, 2024 07:32
@github-actions github-actions bot added this to the v0.5.10 milestone Dec 2, 2024