Improve diagnostics for kernel supervisor startup failures #5705

Merged: 4 commits, Dec 12, 2024

jmcphers (Collaborator)

This change improves diagnostics and logging for (hopefully rare) cases in which the kernel supervisor itself cannot start. This is distinct from the cases where R or Python can't start. So far we've seen just one of these in the wild, as a result of running on an unsupported OS.

The approach is to borrow the wrapper script technique from the Jupyter Adapter (formerly used to invoke the kernels themselves). The wrapper script acts as a sort of supervisor for the supervisor; it eats the output of the supervisor process and writes it to a file. If the supervisor exits unexpectedly at startup, the output file is written to the log channel, and the user is directed there to view errors.

As an additional benefit, this runs the supervisor under a bash process on Unix-alikes, so any environment variables or configuration set up in .bashrc (etc) will now be available to the supervisor.
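
The wrapper technique described above can be sketched roughly as follows. This is an illustration only, not the script shipped in this PR: the function name `run_supervised` and its argument order are hypothetical. Echoing the command line into the output file first is an assumption too, based on the logs in this thread showing the command line ahead of the supervisor's own output.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a "supervisor for the supervisor" wrapper.
# Usage: run_supervised <output-file> <command> [args...]
run_supervised() {
    local output_file="$1"
    shift

    # Record the command line first so the log shows what was launched.
    printf '%s\n' "$*" > "$output_file"

    # Run the command, capturing both stdout and stderr in the output file.
    "$@" >> "$output_file" 2>&1
    local exit_code=$?

    # Propagate the wrapped command's exit code to the caller.
    return "$exit_code"
}
```

A caller could invoke `run_supervised "$output_file" kcserver --port ...` and, if the returned code is nonzero during startup, copy the contents of the output file into the log channel.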

Addresses #5611.

May help us figure out #5337.

QA Notes

An easy way to test this is to replace your kcserver binary with a shell script that emits some nonsense and then exits immediately with a nonzero status code. If you're feeling ambitious, you could also test this on the OS named in #5611.

Also, did you know that the longest worm in the world can reach up to 55 meters? Crazy.

@jmcphers jmcphers requested a review from sharon-wang December 11, 2024 23:38

petetronic (Collaborator) left a comment

Confirmed on macOS by replacing the kcserver binary with:

echo "Foo"
exit 197

The output appeared in the logs, along with the new information notification directing you to open them.

2024-12-12 14:23:04.701 [info] Starting Kallichore server /Users/username/dev/git/posit/positron/extensions/positron-supervisor/resources/kallichore/kcserver on port 59681
Waiting for Kallichore server to start (attempt 1, 120ms)
Supervisor terminal closed with exit code 197; output:
/Users/username/dev/git/posit/positron/extensions/positron-supervisor/resources/kallichore/kcserver --port 59681 --token /var/folders/fz/p04t93s91vdd27ljtdr6rrg40000gp/T/kallichore-d071ecc4.token --log-level debug --log-file /var/folders/fz/p04t93s91vdd27ljtdr6rrg40000gp/T/kallichore-d071ecc4.log
Foo

Failed to start Kallichore server: Error: The supervisor process exited before the server was ready.

sharon-wang (Member) left a comment

LGTM on Windows 👍

I replaced the kcserver binary with a kcserver.bat:

@echo off
echo "worm"
exit /b 55

2024-12-12 15:17:22.454 [info] Positron Kernel Supervisor activated
2024-12-12 15:17:22.454 [info] Kallichore server PID 19200 is not running
2024-12-12 15:17:22.454 [info] Could not reconnect to Kallichore server at http://localhost:42305. Starting a new server
2024-12-12 15:17:22.454 [info] Starting Kallichore server c:\Users\sharon\positron\extensions\positron-supervisor\resources\kallichore\kcserver.bat on port 44281
Waiting for Kallichore server to start (attempt 1, 633ms)
Waiting for Kallichore server to start (attempt 6, 1660ms)
Waiting for Kallichore server to start (attempt 11, 1975ms)
Supervisor terminal closed with exit code 55; output:
c:\Users\sharon\positron\extensions\positron-supervisor\resources\kallichore\kcserver.bat --port 44281 --token C:\Users\sharon\AppData\Local\Temp\kallichore-9b27dbe3.token --log-level debug --log-file C:\Users\sharon\AppData\Local\Temp\kallichore-9b27dbe3.log 
"worm"

Failed to start Kallichore server: Error: The supervisor process exited before the server was ready.

@jmcphers jmcphers merged commit 6f5ea7e into main Dec 12, 2024
5 checks passed
@jmcphers jmcphers deleted the feature/kallichore-startup-logging branch December 12, 2024 22:11
@github-actions github-actions bot locked and limited conversation to collaborators Dec 12, 2024