Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add a startup field for checks, and start-checks and stop-checks actions #560

Draft
wants to merge 19 commits into
base: master
Choose a base branch
from

Conversation

tonyandrewmeyer
Copy link
Contributor

@tonyandrewmeyer tonyandrewmeyer commented Jan 16, 2025

This is not at all ready - the PR gives me a convenient place to put notes.

Add support for starting and stopping checks, in a very similar manner to starting and stopping services.

Layer

The layer specification gains a new startup field for checks, which must be either enabled or disabled, and defaults to enabled for backwards compatibility.

The impact of this field is that when the plan is updated (including when Pebble first starts) or a layer is added, new checks where startup is set to disabled will not be started. If a check's configuration changes (for example, the timeout or period), and it is set to startup: disabled but is currently running, then it will be restarted as previously. Any stopped checks with startup: enabled will also be started when the plan changes.

This field is implemented via the existing check manager PlanChanged function. That function is registered as a listener for when the plan changes, which includes new checks being added, and check fields (including startup) changing. The function is adjusted in this PR to only start new checks if startup is enabled. For replan write details here.

CLI

Minor change: since replan now impacts checks as well as services, it is moved in the help listing from the services section to the plan section.

Two new commands are added: start-checks and stop-checks. These take one or more check names as arguments, and start or stop the specified checks. Starting a running check or stopping a stopped check is a no-op.

The checks command output is altered to include a new "Startup" column, which is simply the value of the field from the combined plan for that check, and to show inactive in the "Status" column for any checks that have been stopped (these also get a - value for both "Change" and "Failures").

start-checks and stop-checks are very similar to start and stop, except that they are synchronous so do not offer wait. Checks currently have exactly one change (this PR changes that to zero or one), with the appropriate tasks contained within that change. Since start-check and stop-check can operate on more than one check, that means that they are operating on more than one change, so an interface where a single change ID is returned (and then can be waited on) does not fit.

We could only offer start-check and stop-check, but in the spec review we preferred to have the option to specify multiple checks (as with services). For services, a new change is created that contains the tasks to start or stop all of the services (in the appropriate order) - doing this for checks does not seem to make sense, given that there is already a change running for each check. We could have a new response type that can wait for multiple changes, but that seems like a lot just for this small feature.

API

The existing /v1/checks endpoint is extended to also support POST, requiring admin access. The structure for this endpoint is similar to posting to /v1/services, in that it requires a list of check names and an action to perform (currently either start or stop).

TODO

  • Docs.
  • Fix existing tests.
  • Add more tests for the new functionality.
  • Finalise what the result value should be.
  • Verify behaviour is correct when a stopped check has the config changed.
  • Errors when the check is not in the plan should be more consistent.
  • Do we actually want to panic when the check is in the plan and not in the manager?
  • The checks client has the level as well as check names in the options: we currently ignore the level, because it doesn't seem like you'd want to do "stop all ready checks", but it would probably be cleaner if it was actually used to filer.
  • Autostart: I had this originally because I thought that replan would basically be calling it, but it turns out that's not the case. Currently the API supports it but the CLI does not - it should be removed from one or added to the other.
  • Error message texts: this is a bit inconsistent at the moment (see comments in poc: add run-check to execute a check immediately #557) - that should be resolved in some way.
  • It feels a bit off for the change ID to be the empty string when there's no change. Is this definitely how it should be done? Maybe making it possible to be nil? Have an empty change that has an error message saying it's inactive?
  • Health endpoint needs to skip inactive checks
  • If there's a check that doesn't exist, we currently process the ones up to that point and then fail, so the behaviour is different depending on the order of the checks provided. Probably nicer to fail upfront instead.
  • Check (again?) that if there's a plain replan and there's a startup:enabled check that is stopped that it gets started: it does not, so need to explicitly hook this up.

Spec - OP052 (internal link only, sorry).

@tonyandrewmeyer
Copy link
Contributor Author

@benhoyt as discussed, if you could give this a "pre-review", that would be great, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant