feat: add a startup field for checks, and start-checks and stop-checks actions #560

Draft. Wants to merge 19 commits into base: master
25 changes: 25 additions & 0 deletions HACKING.md
@@ -226,6 +226,31 @@ To pull in the latest style and dependencies from the starter pack, clone the [C
- Under the `docs/` folder, run `python3 build_requirements.py`. This generates the latest `requirements.txt` under the `.sphinx/` folder.
- Under the `docs/` folder, run `tox -e docs-dep` to compile a pinned requirements file for tox environments.

## Updating the CLI reference documentation

To add a new CLI command, add it to the appropriate section of the list at the top of the [CLI reference doc](docs/reference/cli-commands.md), and then add a new section for the details **in alphabetical order**.

The section should look like:

````
(reference_pebble_{command name}_command)=
## {command name}

The `{command name}` command is used to {describe the command}.

<!-- START AUTOMATED OUTPUT FOR {command name} -->
```{terminal}
:input: pebble {command name} --help
```
<!-- END AUTOMATED OUTPUT FOR {command name} -->
````

Replace `{command name}` with the name of the command and `{describe the command}` with a suitable description.

In the `docs` directory, run `tox -e commands` to automatically update the CLI reference documentation.

A CI workflow will fail if the CLI reference documentation does not match the actual output from Pebble.

### Writing a great doc

- Use short sentences, ideally with one or two clauses.
85 changes: 78 additions & 7 deletions client/checks.go
@@ -15,19 +15,23 @@
package client

import (
"bytes"
"context"
"encoding/json"
"fmt"
"net/url"
)

type ChecksOptions struct {
// Level is the check level to query for. A check is included in the
// results if this field is not set, or if it is equal to the check's
-// level.
+// level. This field is ignored for start and stop actions.
Level CheckLevel

-// Names is the list of check names to query for. A check is included in
-// the results if this field is nil or empty slice, or if one of the
-// values in the slice is equal to the check's name.
+// Names is the list of check names to act on. For querying, a check is
+// included in the results if this field is nil or an empty slice. For
+// all actions, a check is included if one of the values in the slice is
+// equal to the check's name.
Names []string
}

@@ -44,8 +48,17 @@ const (
type CheckStatus string

const (
-CheckStatusUp CheckStatus = "up"
-CheckStatusDown CheckStatus = "down"
+CheckStatusUp CheckStatus = "up"
+CheckStatusDown CheckStatus = "down"
+CheckStatusInactive CheckStatus = "inactive"
)

// CheckStartup defines the different startup modes for a check.
type CheckStartup string

const (
CheckStartupEnabled CheckStartup = "enabled"
CheckStartupDisabled CheckStartup = "disabled"
)

// CheckInfo holds status information for a single health check.
@@ -56,8 +69,14 @@ type CheckInfo struct {
// Level is this check's level, from the layer configuration.
Level CheckLevel `json:"level"`

// Startup is the startup mode for the check. If it is "enabled", the check
// will be started in a Pebble replan and when Pebble starts. If it is
// "disabled", it must be started manually.
Startup CheckStartup `json:"startup"`

// Status is the status of this check: "up" if healthy, "down" if the
-// number of failures has reached the configured threshold.
+// number of failures has reached the configured threshold, "inactive" if
+// the check has been stopped or has not been started.
Status CheckStatus `json:"status"`

// Failures is the number of times in a row this check has failed. It is
@@ -100,3 +119,55 @@ func (client *Client) Checks(opts *ChecksOptions) ([]*CheckInfo, error) {
}
return checks, nil
}

// StartChecks starts the checks named in opts.Names. opts.Level is ignored
// for this action.
func (client *Client) StartChecks(opts *ChecksOptions) (response string, err error) {
response, err = client.doMultiCheckAction("start", opts.Names)
return response, err
}

// StopChecks stops the checks named in opts.Names. opts.Level is ignored
// for this action.
func (client *Client) StopChecks(opts *ChecksOptions) (response string, err error) {
response, err = client.doMultiCheckAction("stop", opts.Names)
return response, err
}

type multiCheckActionData struct {
Action string `json:"action"`
Checks []string `json:"checks"`
}

func (client *Client) doMultiCheckAction(actionName string, checks []string) (changeID string, err error) {
action := multiCheckActionData{
Action: actionName,
Checks: checks,
}
data, err := json.Marshal(&action)
if err != nil {
return "", fmt.Errorf("cannot marshal multi-check action: %w", err)
}
headers := map[string]string{
"Content-Type": "application/json",
}

resp, err := client.Requester().Do(context.Background(), &RequestOptions{
Type: SyncRequest,
Method: "POST",
Path: "/v1/checks",
Query: nil,
Headers: headers,
Body: bytes.NewBuffer(data),
})
if err != nil {
return "", err
}
var response string
err = resp.DecodeResult(&response)
if err != nil {
return "", err
}

return response, nil
}
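
The request body that `doMultiCheckAction` sends can be illustrated in isolation. The following is a minimal sketch, not the PR's code: `checkAction` mirrors the `multiCheckActionData` struct above, and `encodeCheckAction` is a hypothetical helper introduced only for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// checkAction mirrors the multiCheckActionData struct from the diff above.
type checkAction struct {
	Action string   `json:"action"`
	Checks []string `json:"checks"`
}

// encodeCheckAction is a hypothetical helper (not part of the PR) that
// builds the JSON body for a POST to /v1/checks.
func encodeCheckAction(action string, checks []string) (string, error) {
	data, err := json.Marshal(&checkAction{Action: action, Checks: checks})
	if err != nil {
		return "", fmt.Errorf("cannot marshal multi-check action: %w", err)
	}
	return string(data), nil
}

func main() {
	body, err := encodeCheckAction("start", []string{"chk1", "chk2"})
	if err != nil {
		panic(err)
	}
	fmt.Println(body) // {"action":"start","checks":["chk1","chk2"]}
}
```

Note that `encoding/json` marshals a nil slice as `null`, so a caller that passes no names would send `"checks":null` rather than an empty array.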
10 changes: 7 additions & 3 deletions client/checks_test.go
@@ -26,7 +26,8 @@ func (cs *clientSuite) TestChecksGet(c *check.C) {
cs.rsp = `{
"result": [
{"name": "chk1", "status": "up"},
-{"name": "chk3", "status": "down", "failures": 42}
+{"name": "chk3", "status": "down", "failures": 42},
+{"name": "chk5", "status": "inactive"}
],
"status": "OK",
"status-code": 200,
@@ -35,7 +36,7 @@ func (cs *clientSuite) TestChecksGet(c *check.C) {

opts := client.ChecksOptions{
Level: client.AliveLevel,
-Names: []string{"chk1", "chk3"},
+Names: []string{"chk1", "chk3", "chk5"},
}
checks, err := cs.cli.Checks(&opts)
c.Assert(err, check.IsNil)
@@ -47,11 +48,14 @@ func (cs *clientSuite) TestChecksGet(c *check.C) {
Name: "chk3",
Status: client.CheckStatusDown,
Failures: 42,
}, {
Name: "chk5",
Status: client.CheckStatusInactive,
}})
c.Assert(cs.req.Method, check.Equals, "GET")
c.Assert(cs.req.URL.Path, check.Equals, "/v1/checks")
c.Assert(cs.req.URL.Query(), check.DeepEquals, url.Values{
"level": {"alive"},
-"names": {"chk1", "chk3"},
+"names": {"chk1", "chk3", "chk5"},
})
}
63 changes: 57 additions & 6 deletions docs/reference/cli-commands.md
@@ -8,9 +8,9 @@ The `pebble` command has the following subcommands, organised into logical group

* Run: [run](#reference_pebble_run_command)
* Info: [help](#reference_pebble_help_command), [version](#reference_pebble_version_command)
-* Plan: [add](#reference_pebble_add_command), [plan](#reference_pebble_plan_command)
-* Services: [services](#reference_pebble_services_command), [logs](#reference_pebble_logs_command), [start](#reference_pebble_start_command), [restart](#reference_pebble_restart_command), [signal](#reference_pebble_signal_command), [stop](#reference_pebble_stop_command), [replan](#reference_pebble_replan_command)
-* Checks: [checks](#reference_pebble_checks_command), [health](#reference_pebble_health_command)
+* Plan: [add](#reference_pebble_add_command), [plan](#reference_pebble_plan_command), [replan](#reference_pebble_replan_command)
+* Services: [services](#reference_pebble_services_command), [logs](#reference_pebble_logs_command), [start](#reference_pebble_start_command), [restart](#reference_pebble_restart_command), [signal](#reference_pebble_signal_command), [stop](#reference_pebble_stop_command)
+* Checks: [checks](#reference_pebble_checks_command), [start-checks](#reference_pebble_start_checks_command), [stop-checks](#reference_pebble_stop_checks_command), [health](#reference_pebble_health_command)
* Files: [push](#reference_pebble_push_command), [pull](#reference_pebble_pull_command), [ls](#reference_pebble_ls_command), [mkdir](#reference_pebble_mkdir_command), [rm](#reference_pebble_rm_command), [exec](#reference_pebble_exec_command)
* Changes: [changes](#reference_pebble_changes_command), [tasks](#reference_pebble_tasks_command)
* Notices: [warnings](#reference_pebble_warnings_command), [okay](#reference_pebble_okay_command), [notices](#reference_pebble_notices_command), [notice](#reference_pebble_notice_command), [notify](#reference_pebble_notify_command)
@@ -236,9 +236,9 @@ Commands can be classified as follows:

Run: run
Info: help, version
-Plan: add, plan
-Services: services, logs, start, restart, signal, stop, replan
-Checks: checks, health
+Plan: add, plan, replan
+Services: services, logs, start, restart, signal, stop
+Checks: checks, start-checks, stop-checks, health
Files: push, pull, ls, mkdir, rm, exec
Changes: changes, tasks
Notices: warnings, okay, notices, notice, notify
@@ -1007,6 +1007,31 @@ pebble start srv1 srv2
```


(reference_pebble_start_checks_command)=
## start-checks

The `start-checks` command starts the checks with the provided names.

<!-- START AUTOMATED OUTPUT FOR start-checks -->
```{terminal}
:input: pebble start-checks --help
Usage:
pebble start-checks <check>...

The start-checks command starts the configured health checks provided as
positional arguments. For any checks that are already active, the command
has no effect.
```
<!-- END AUTOMATED OUTPUT FOR start-checks -->

### Examples

To start specific checks, run `pebble start-checks` followed by one or more check names. For example, to start two checks named "chk1" and "chk2", run:

```bash
pebble start-checks chk1 chk2
```

(reference_pebble_stop_command)=
## stop

@@ -1040,6 +1065,32 @@ pebble stop srv1
```


(reference_pebble_stop_checks_command)=
## stop-checks

The `stop-checks` command stops the checks with the provided names.

<!-- START AUTOMATED OUTPUT FOR stop-checks -->
```{terminal}
:input: pebble stop-checks --help
Usage:
pebble stop-checks <check>...

The stop-checks command stops the configured health checks provided as
positional arguments. For any checks that are inactive, the command has
no effect.
```
<!-- END AUTOMATED OUTPUT FOR stop-checks -->

### Examples

To stop specific checks, use `pebble stop-checks` followed by one or more check names. The following example stops one check named "chk1":

```bash
pebble stop-checks chk1
```


(reference_pebble_tasks_command)=
## tasks

28 changes: 23 additions & 5 deletions docs/reference/health-checks.md
@@ -15,6 +15,8 @@ checks:
# Optional
level: alive | ready
# Optional
startup: enabled | disabled
# Optional
period: <duration>
# Optional
timeout: <duration>
@@ -107,20 +109,22 @@ checks:

test:
override: replace
startup: disabled
http:
url: http://localhost:8080/test
```

## Checks command

-You can view check status using the `pebble checks` command. This reports the checks along with their status (`up` or `down`) and number of failures. For example:
+You can view check status using the `pebble checks` command. This reports the checks along with their status (`up`, `down`, or `inactive`) and number of failures. For example:

```{terminal}
:input: pebble checks
-Check   Level  Status  Failures  Change
-up      alive  up      0/1       10
-online  ready  down    1/3       13 (dial tcp 127.0.0.1:8000: connect: connection refused)
-test    -      down    42/3      14 (Get "http://localhost:8080/": dial t... run "pebble tasks 14" for more)
+Check   Level  Startup   Status    Failures  Change
+up      alive  enabled   up        0/1       10
+online  ready  enabled   down      1/3       13 (dial tcp 127.0.0.1:8000: connect: connection refused)
+test    -      disabled  down      42/3      14 (Get "http://localhost:8080/": dial t... run "pebble tasks 14" for more)
+extra   -      disabled  inactive  -         -
```

The "Failures" column shows the current number of failures since the check started failing, a slash, and the configured threshold.
@@ -132,10 +136,24 @@ Health checks are implemented using two change kinds:
* `perform-check`: drives the check while it's "up". The change finishes when the number of failures hits the threshold, at which point the change switches to Error status and a `recover-check` change is spawned. Each check failure records a task log.
* `recover-check`: drives the check while it's "down". The change finishes when the check starts succeeding again, at which point the change switches to Done status and a new `perform-check` change is spawned. Again, each check failure records a task log.

When a check is stopped, the active `perform-check` or `recover-check` change is aborted. When a stopped (inactive) check is started, a new `perform-check` change is created for the check.
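
The transitions between the two change kinds, plus stopping and starting, can be modelled as a small state machine. This is a simplified sketch under stated assumptions, not Pebble's implementation: the `checkState` type and its methods are hypothetical, changes are represented only by their kind string, and resetting the failure count on start is an assumption.

```go
package main

import "fmt"

type checkStatus string

const (
	statusUp       checkStatus = "up"
	statusDown     checkStatus = "down"
	statusInactive checkStatus = "inactive"
)

// checkState is a hypothetical, simplified model of a single check.
type checkState struct {
	status     checkStatus
	changeKind string // "perform-check", "recover-check", or "" when inactive
	failures   int    // consecutive failures
	threshold  int
}

// start creates a new perform-check change for an inactive check.
// Starting a check that is already active has no effect.
func (c *checkState) start() {
	if c.status != statusInactive {
		return
	}
	// Assumption: the failure count starts fresh.
	c.status, c.changeKind, c.failures = statusUp, "perform-check", 0
}

// stop aborts whichever change is active and marks the check inactive.
func (c *checkState) stop() {
	c.status, c.changeKind = statusInactive, ""
}

// record applies one check result. Inactive checks ignore results.
func (c *checkState) record(ok bool) {
	switch c.status {
	case statusUp:
		if ok {
			c.failures = 0
			return
		}
		c.failures++
		if c.failures >= c.threshold {
			// perform-check ends in error; a recover-check takes over.
			c.status, c.changeKind = statusDown, "recover-check"
		}
	case statusDown:
		if ok {
			// recover-check is done; a new perform-check takes over.
			c.status, c.changeKind, c.failures = statusUp, "perform-check", 0
			return
		}
		c.failures++
	}
}

func main() {
	c := &checkState{status: statusInactive, threshold: 3}
	c.start()
	for i := 0; i < 3; i++ {
		c.record(false)
	}
	fmt.Println(c.status, c.changeKind, c.failures) // down recover-check 3
	c.record(true)
	fmt.Println(c.status, c.changeKind, c.failures) // up perform-check 0
	c.stop()
	fmt.Println(c.status) // inactive
}
```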

## Start-checks and stop-checks commands

You can stop one or more checks using the `pebble stop-checks` command. A stopped check appears in the `pebble checks` output with "inactive" status, and it is not executed again until it is started. Stopped (inactive) checks still appear in check lists but do not contribute to any overall health calculations; for health purposes, they behave as if the check did not exist.

A stopped check that has `startup` set to `enabled` will be started in a `replan` operation and when the layer is first added. Stopped checks can also be manually started via the `pebble start-checks` command.

Checks that have `startup` set to `disabled` will be added in a stopped (inactive) state. These checks will only be started when instructed by a `pebble start-checks` command.

Including a check that is already running in a `start-checks` command, or including a check that is already stopped (inactive) in a `stop-checks` command, is always safe and has no effect on that check.
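
The interaction between the `startup` field, replan, and the idempotent start and stop operations can be sketched as follows. This is an illustrative model only: `checkConfig`, `checkSet`, and their methods are hypothetical names, and silently skipping unknown check names is an assumption rather than documented Pebble behaviour.

```go
package main

import "fmt"

// checkConfig is a hypothetical, trimmed-down view of a check in the plan.
type checkConfig struct {
	Name    string
	Startup string // "enabled", "disabled", or "" (defaults to "enabled")
}

// autoStart reports whether a check is started automatically.
func autoStart(cfg checkConfig) bool {
	return cfg.Startup == "" || cfg.Startup == "enabled"
}

// checkSet maps a check name to whether the check is currently active.
type checkSet map[string]bool

// newCheckSet adds every check from the plan; checks with startup
// disabled begin in the stopped (inactive) state.
func newCheckSet(plan []checkConfig) checkSet {
	s := make(checkSet, len(plan))
	for _, cfg := range plan {
		s[cfg.Name] = autoStart(cfg)
	}
	return s
}

// setActive starts (active=true) or stops (active=false) the named checks
// and returns only the names whose state actually changed. Checks already
// in the requested state are left alone. Assumption: unknown names are
// skipped.
func (s checkSet) setActive(active bool, names []string) []string {
	var changed []string
	for _, name := range names {
		current, known := s[name]
		if !known || current == active {
			continue
		}
		s[name] = active
		changed = append(changed, name)
	}
	return changed
}

// replan starts every stopped check whose startup mode is enabled.
func (s checkSet) replan(plan []checkConfig) []string {
	var names []string
	for _, cfg := range plan {
		if autoStart(cfg) {
			names = append(names, cfg.Name)
		}
	}
	return s.setActive(true, names)
}

func main() {
	plan := []checkConfig{{Name: "up"}, {Name: "test", Startup: "disabled"}}
	s := newCheckSet(plan)
	fmt.Println(s.setActive(true, []string{"up", "test"}))  // [test]
	fmt.Println(s.setActive(false, []string{"up", "test"})) // [up test]
	fmt.Println(s.replan(plan))                             // [up]
}
```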

## Health endpoint

If the `--http` option was given when starting `pebble run`, Pebble exposes a `/v1/health` HTTP endpoint that allows a user to query the health of configured checks, optionally filtered by check level with the query string `?level=<level>`. This endpoint returns an HTTP 200 status if the checks are healthy, HTTP 502 otherwise.

Stopped (inactive) checks are ignored for health calculations.
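
As a rough illustration of how inactive checks drop out of the health result, consider the sketch below. It is a simplification under stated assumptions: `checkInfo` and `healthStatusCode` are hypothetical names, level filtering is omitted, and the real endpoint's logic may differ.

```go
package main

import (
	"fmt"
	"net/http"
)

// checkInfo is a hypothetical, minimal view of a check's state.
type checkInfo struct {
	Name   string
	Status string // "up", "down", or "inactive"
}

// healthStatusCode returns 200 if no active check is unhealthy and 502
// otherwise. Inactive checks are skipped, as if they did not exist.
// Level filtering is deliberately left out of this sketch.
func healthStatusCode(checks []checkInfo) int {
	for _, chk := range checks {
		if chk.Status == "inactive" {
			continue
		}
		if chk.Status != "up" {
			return http.StatusBadGateway
		}
	}
	return http.StatusOK
}

func main() {
	checks := []checkInfo{
		{Name: "up", Status: "up"},
		{Name: "extra", Status: "inactive"},
	}
	fmt.Println(healthStatusCode(checks)) // 200

	checks = append(checks, checkInfo{Name: "online", Status: "down"})
	fmt.Println(healthStatusCode(checks)) // 502
}
```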

Each check can specify a `level` of "alive" or "ready". These have semantic meaning: "alive" means the check or the service it's connected to is up and running; "ready" means it's properly accepting network traffic. These correspond to [Kubernetes "liveness" and "readiness" probes](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/).

The tool running the Pebble server can make use of this. For example, under Kubernetes you could initialize its liveness and readiness probes to hit Pebble's `/v1/health` endpoint with `?level=alive` and `?level=ready` filters, respectively.
4 changes: 4 additions & 0 deletions docs/reference/layer-specification.md
@@ -149,6 +149,10 @@ checks:
# section in the docs for details.
level: alive | ready

# (Optional) Control whether the check is started automatically when
# Pebble starts or performs a 'replan' operation. Default is "enabled".
startup: enabled | disabled

# (Optional) Check is run every time this period (time interval)
# elapses. Must not be zero. Default is "10s".
period: <duration>
39 changes: 39 additions & 0 deletions docs/specs/openapi.yaml
@@ -311,6 +311,45 @@ paths:
}
]
}
post:
summary: Manage checks
description: Perform a check operation such as start or stop.
tags:
- checks
requestBody:
required: true
content:
application/json:
schema:
type: object
properties:
action:
type: string
description: The action to perform.
enum: ["start", "stop"]
checks:
type: array
description: |
A list of check names. Required.
items:
type: string
example:
{"action": "start", "checks": ["chk1"]}
responses:
"202":
description: Accepted - asynchronous operation started.
content:
application/json:
schema:
$ref: "#/components/schemas/PostServicesResponse"
example:
{
"type": "async",
"status-code": 202,
"status": "Accepted",
"change": "25",
"result": null
}
/v1/exec:
post:
summary: Execute a command