fix: engine restart process for Deno used wrong status #33865

Open: wants to merge 8 commits into develop

Conversation

Member

@d-gubert d-gubert commented Nov 1, 2024

Proposed changes (including videos or screenshots)

Whenever the Deno subprocess of an app crashes, the Apps-Engine tries to restart it so we can keep the operation going. However, the app wasn't correctly reporting its status afterwards, and stayed at the constructed or initialized status, which the engine does not recognize as enabled.
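
As a rough illustration of the status problem described above, here is a minimal sketch. The status names, `isEnabled`, and `restartApp` are simplified, hypothetical stand-ins and not the Apps-Engine's actual types. It shows why an app left at initialized after a restart is not treated as enabled, and how reapplying the pre-crash status avoids that.

```typescript
// Hypothetical, simplified status model; the real Apps-Engine types may differ.
type Status = 'constructed' | 'initialized' | 'auto_enabled' | 'manually_enabled' | 'disabled';

// Only the explicit "enabled" statuses count as enabled.
function isEnabled(status: Status): boolean {
	return status === 'auto_enabled' || status === 'manually_enabled';
}

interface AppLike {
	status: Status;
}

// Simulates a subprocess restart: the fresh process starts over at 'initialized'.
// When restoreStatus is true, an enabled status held before the crash is reapplied.
function restartApp(app: AppLike, restoreStatus: boolean): void {
	const previous = app.status;
	app.status = 'initialized'; // state of a freshly spawned subprocess
	if (restoreStatus && isEnabled(previous)) {
		app.status = previous;
	}
}

// Without restoring, the app is stuck at 'initialized' after the restart.
const buggy: AppLike = { status: 'manually_enabled' };
restartApp(buggy, false);

// With restoring, the app goes back to the status it had before the crash.
const fixed: AppLike = { status: 'manually_enabled' };
restartApp(fixed, true);
```

In this sketch `isEnabled(buggy.status)` is false while `isEnabled(fixed.status)` is true, which mirrors the behavior the PR description reports and the behavior it aims for.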

Also, we're removing the limit for the number of times the Apps-Engine would restart an app. At this point, it is not possible to show the restart or process status information to the admin, so it would just cause confusion to see the app not working.
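
To make the effect of removing the limit concrete, the sketch below assumes the restart gate is a simple counter comparison. Only the option name `maxRestarts` appears in this PR's diff; `mayRestart` and the way the counter is compared are assumptions for illustration.

```typescript
// Hypothetical restart gate; only the option name maxRestarts comes from the PR diff.
interface RestartOptions {
	maxRestarts: number;
}

// Returns true while the number of restarts so far is below the configured limit.
function mayRestart(restartCount: number, options: RestartOptions): boolean {
	return restartCount < options.maxRestarts;
}

const limited: RestartOptions = { maxRestarts: 3 };
const unlimited: RestartOptions = { maxRestarts: Infinity };
```

Under this assumption, `mayRestart(3, limited)` is false, while `mayRestart(n, unlimited)` is true for any finite `n`, so the app is never left stopped because of the counter alone.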

Issue(s)

SUP-690

Steps to test or reproduce

Further comments

Contributor

dionisio-bot bot commented Nov 1, 2024

Looks like this PR is not ready to merge, because of the following issues:

  • This PR is missing the 'stat: QA assured' label

Please fix the issues and try again

If you have any trouble, please check the PR guidelines

changeset-bot bot commented Nov 1, 2024

🦋 Changeset detected

Latest commit: ef8469f

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 36 packages
Name Type
@rocket.chat/apps-engine Patch
@rocket.chat/meteor Patch
@rocket.chat/apps Patch
@rocket.chat/core-services Patch
@rocket.chat/core-typings Patch
@rocket.chat/fuselage-ui-kit Patch
@rocket.chat/rest-typings Patch
@rocket.chat/ddp-streamer Patch
@rocket.chat/presence Patch
rocketchat-services Patch
@rocket.chat/uikit-playground Patch
@rocket.chat/api-client Patch
@rocket.chat/cron Patch
@rocket.chat/ddp-client Patch
@rocket.chat/freeswitch Patch
@rocket.chat/gazzodown Patch
@rocket.chat/livechat Patch
@rocket.chat/model-typings Patch
@rocket.chat/ui-contexts Patch
@rocket.chat/account-service Patch
@rocket.chat/authorization-service Patch
@rocket.chat/omnichannel-transcript Patch
@rocket.chat/presence-service Patch
@rocket.chat/queue-worker Patch
@rocket.chat/stream-hub-service Patch
@rocket.chat/license Patch
@rocket.chat/omnichannel-services Patch
@rocket.chat/pdf-worker Patch
@rocket.chat/network-broker Patch
@rocket.chat/models Patch
@rocket.chat/ui-avatar Patch
@rocket.chat/ui-client Patch
@rocket.chat/ui-video-conf Patch
@rocket.chat/ui-voip Patch
@rocket.chat/web-ui-registration Patch
@rocket.chat/instance-status Patch

@d-gubert d-gubert added this to the 7.1.0 milestone Nov 1, 2024
Contributor

github-actions bot commented Nov 1, 2024

PR Preview Action v1.4.8
🚀 Deployed preview to https://RocketChat.github.io/Rocket.Chat/pr-preview/pr-33865/
on branch gh-pages at 2024-11-04 20:05 UTC

codecov bot commented Nov 1, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 75.42%. Comparing base (953c052) to head (ef8469f).
Report is 2 commits behind head on develop.

@@           Coverage Diff            @@
##           develop   #33865   +/-   ##
========================================
  Coverage    75.42%   75.42%           
========================================
  Files          493      493           
  Lines        21499    21499           
  Branches      5337     5337           
========================================
  Hits         16215    16215           
  Misses        4644     4644           
  Partials       640      640           
Flag Coverage Δ
unit 75.42% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

@d-gubert d-gubert marked this pull request as ready for review November 1, 2024 19:05
@d-gubert d-gubert requested a review from a team as a code owner November 1, 2024 19:05
tapiarafael previously approved these changes Nov 1, 2024
@@ -10,7 +10,7 @@ const defaultOptions: LivenessManager['options'] = {
 	pingRequestTimeout: 10000,
 	pingFrequencyInMS: 10000,
 	consecutiveTimeoutLimit: 4,
-	maxRestarts: 3,
+	maxRestarts: Infinity,
Member

Is this safe? If an app has restarted more than 10 times, should we keep trying to restart it?

Member Author

@d-gubert d-gubert Nov 1, 2024

It is safe, but you're right that after several restarts it may mean the app should not be running at all. However, until we have a place to show the restart information, a limit will just cause confusion as to why the app is not working properly, mainly because a simple "enable" call will not make the engine create a new subprocess for the app.

Member

Is there any chance of deno being "restarted" wrongly?

I ask because we could then end up with a large number of stale processes running.

Member

@debdutdeb debdutdeb Nov 2, 2024

Maybe after a small number of restarts the log gets pushed to the individual log of every installed app, or of the affected app? That way we don't need a way to expose it centrally, but it is still available through the apps.

Member Author

Is there any chance of deno being "restarted" wrongly?

You mean, like, unnecessarily? I guess it's possible, but the "restart" procedure attempts to kill the living process before spawning a new one, and if we fail to kill the process we just halt. So I'd think the chances are slim.

Maybe after a small amount of restarts the log gets pushed to all installed apps'/affected app's individual log?

I can add a new warn log if the restart call happens more than a threshold of 3 times.
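
For readers following the thread, the restart procedure described in this reply can be sketched roughly as below. This is not the Apps-Engine implementation: `ProcessHandle`, `RestartDeps`, `Restarter`, and `WARN_THRESHOLD` are hypothetical names, and the warn-once-past-three behavior is an assumption based on the proposal above.

```typescript
// Hypothetical interfaces: none of these names come from the Apps-Engine API.
interface ProcessHandle {
	kill(): boolean; // true when the old process was terminated
}

interface RestartDeps {
	spawn(): ProcessHandle; // creates a fresh subprocess
	warn(message: string): void; // receives warn-level log messages
}

const WARN_THRESHOLD = 3; // threshold mentioned in the comment above

class Restarter {
	private restartCount = 0;
	private current: ProcessHandle;
	private readonly deps: RestartDeps;

	constructor(initial: ProcessHandle, deps: RestartDeps) {
		this.current = initial;
		this.deps = deps;
	}

	// Returns true if a new process was spawned, false if the restart halted.
	restart(): boolean {
		// Kill the living process first; if that fails, halt instead of
		// spawning, so no stale process is left running next to a new one.
		if (!this.current.kill()) {
			return false;
		}
		this.restartCount++;
		if (this.restartCount > WARN_THRESHOLD) {
			this.deps.warn(`App restarted ${this.restartCount} times`);
		}
		this.current = this.deps.spawn();
		return true;
	}
}

// Usage with in-memory fakes standing in for real subprocesses.
const warnings: string[] = [];
let spawned = 0;
const deps: RestartDeps = {
	spawn: () => {
		spawned++;
		return { kill: () => true };
	},
	warn: (message) => {
		warnings.push(message);
	},
};
const restarter = new Restarter({ kill: () => true }, deps);
for (let i = 0; i < 4; i++) {
	restarter.restart();
}
```

Because the spawn only happens after a successful kill, a process that cannot be killed stops the loop rather than accumulating duplicates, which is the property the question about stale processes is concerned with.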

Contributor

What happens if a user attempts to "enable" an app while it's being restarted? Could it cause 2 attempts of start, and 2 loops of "app restarting"?

I think we can have a special "app restarting" log entry every time it happens, so that users (and we) can check on them. If an app restarts twice every so often, some actions may not work, but since we're not passing the threshold we won't see what's going on.

Also, I think these restart logs should (even if we don't show them in the UI) include all the information needed about the failing process, and the reason if possible. wdyt?

Member Author

What happens if a user attempts to "enable" an app while it's being restarted? Could it cause 2 attempts of start, and 2 loops of "app restarting"?

If a user tries to enable an app while it's being restarted, the call will fail because the process is not ready to take in requests.
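
A minimal sketch of that guard, assuming the engine tracks whether the subprocess is ready; `SubprocessController`, its state names, and the error message are hypothetical and only illustrate the behavior described in this reply.

```typescript
// Hypothetical controller; the state names and methods are illustrative only.
type ProcessState = 'ready' | 'restarting';

class SubprocessController {
	private state: ProcessState = 'ready';

	beginRestart(): void {
		this.state = 'restarting';
	}

	finishRestart(): void {
		this.state = 'ready';
	}

	// Rejects requests while the subprocess is not ready, so an "enable"
	// call made during a restart fails instead of starting a second sequence.
	enable(): string {
		if (this.state !== 'ready') {
			throw new Error('App subprocess is not ready to take requests');
		}
		return 'enabled';
	}
}
```

With this shape, calling `enable()` between `beginRestart()` and `finishRestart()` throws, and the same call succeeds once the restart has finished.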

Member Author

I've modified the logging a bit. @tapiarafael @debdutdeb @KevLehman, please check it out again.

4 participants