Add RFC - Custom Health Checks for Kustomization using Common Expression Language(CEL)
Signed-off-by: Soule BA <[email protected]>
Showing 1 changed file with 330 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,330 @@ | ||
# RFC-0000 Custom Health Checks for Kustomization using Common Expression Language (CEL) | ||
|
||
**Status:** provisional | ||
|
||
**Creation date:** 2024-01-05 | ||
|
||
**Last update:** 2024-01-05 | ||
|
||
## Summary | ||
|
||
This RFC proposes to support customization of the status readers used by `Kustomizations` | ||
during the `healthCheck` phase for custom resources. Users will be able to declare | ||
the `conditions` needed to compute the status of a custom resource. | ||
To provide flexibility, we propose to use `CEL` expressions to declare | ||
the expected conditions and their status. | ||
This will introduce a new field `customHealthChecksExprs` in the `Kustomization` CRD, | ||
which will be a list of `CustomHealthCheckExprs` objects. | ||
|
||
## Motivation | ||
|
||
Flux uses the `Kstatus` library during the `healthCheck` phase to compute the status | ||
of owned resources. This works for all standard resources and for custom resources | ||
that comply with the `Kstatus` interfaces. Resources that do not comply cannot be | ||
assessed correctly. | ||
|
||
In the current Kustomization implementation, we have addressed this problem for | ||
Kubernetes Jobs. We have implemented a `customJobStatusReader` that computes the | ||
status of a Job based on a defined set of conditions. This is a good solution for | ||
Jobs, but it is not generic and thus not applicable to other custom resources. | ||
|
||
Another use case is relying on non-standard `conditions` to compute the status of | ||
a custom resource. For example, we might want to compute the status of a custom | ||
resource based on a condition other than `Ready`. This is the case for resources | ||
that do intermediate patching, like `Certificate`, where you should look at the `Issuing` | ||
condition to know whether the certificate is still being issued before looking at the | ||
`Ready` condition. | ||
|
||
In order to provide a generic solution for custom resources, that would not imply | ||
writing a custom status reader for each new custom resource, we need to provide a | ||
way for the user to express the `conditions` that need to be met in order to compute | ||
the status of a given custom resource. And we need to do this in a way that is | ||
flexible enough to cover all possible use cases, without having to change `Flux` | ||
source code for each new use case. | ||
|
||
### Goals | ||
|
||
- provide a generic solution for users to customize the health check of custom resources | ||
- support non-standard resources in `kustomize-controller` | ||
|
||
### Non-Goals | ||
|
||
- We do not plan to support custom `healthChecks` for core resources. | ||
|
||
## Proposal | ||
|
||
### Introduce a new field `CustomHealthChecksExprs` in the `Kustomization` CRD | ||
|
||
The `CustomHealthChecksExprs` field will be a list of `CustomHealthCheckExprs` objects. | ||
Each `CustomHealthCheckExprs` object will have `apiVersion`, `kind`, `inProgress`, | ||
`failed` and `current` fields. | ||
|
||
To give an example, here is how we would declare a custom health check for a `Certificate` | ||
resource: | ||
|
||
```yaml | ||
--- | ||
apiVersion: cert-manager.io/v1 | ||
kind: Certificate | ||
metadata: | ||
  name: app-certificate | ||
  namespace: cert-manager | ||
spec: | ||
  commonName: cert-manager-tls | ||
  dnsNames: | ||
    - app.ns.svc.cluster.local | ||
  ipAddresses: | ||
    - x.x.x.x | ||
  isCA: true | ||
  issuerRef: | ||
    group: cert-manager.io | ||
    kind: ClusterIssuer | ||
    name: app-issuer | ||
  privateKey: | ||
    algorithm: RSA | ||
    encoding: PKCS1 | ||
    size: 2048 | ||
  secretName: app-tls-certs | ||
  subject: | ||
    organizations: | ||
      - example.com | ||
``` | ||
This `Certificate` resource will transition through the following `conditions`: | ||
`Issuing` and `Ready`. | ||
|
||
In order to compute the status of this resource, we need to look at both the `Issuing` | ||
and `Ready` conditions. | ||
|
||
The resulting `Kustomization` object will look like this: | ||
|
||
```yaml | ||
apiVersion: kustomize.toolkit.fluxcd.io/v1 | ||
kind: Kustomization | ||
metadata: | ||
  name: application-kustomization | ||
spec: | ||
  force: false | ||
  interval: 5m0s | ||
  path: ./overlays/application | ||
  prune: false | ||
  sourceRef: | ||
    kind: GitRepository | ||
    name: application-git | ||
  healthChecks: | ||
    - apiVersion: cert-manager.io/v1 | ||
      kind: Certificate | ||
      name: app-certificate | ||
      namespace: cert-manager | ||
    - apiVersion: apps/v1 | ||
      kind: Deployment | ||
      name: app | ||
      namespace: app | ||
  customHealthChecksExprs: | ||
    - apiVersion: cert-manager.io/v1 | ||
      kind: Certificate | ||
      inProgress: "status.conditions.filter(e, e.type == 'Issuing').all(e, e.observedGeneration == metadata.generation && e.status == 'True')" | ||
      failed: "status.conditions.filter(e, e.type == 'Ready').all(e, e.observedGeneration == metadata.generation && e.status == 'False')" | ||
      current: "status.conditions.filter(e, e.type == 'Ready').all(e, e.observedGeneration == metadata.generation && e.status == 'True')" | ||
``` | ||
|
||
The `HealthChecks` field still contains the objects that should be included in | ||
the health assessment. The `CustomHealthChecksExprs` field will be used to declare | ||
the `conditions` that need to be met in order to compute the status of the custom resource. | ||
|
||
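The three expressions in the example share the shape `filter(...).all(...)`. In CEL, `all` over an empty list evaluates to `true`, so such an expression is also satisfied when no condition of the requested type exists yet. The sketch below mimics that shape in plain Go to make the behavior visible. It does not use a CEL library, and `condition`, `filterAll` and `filterExists` are hypothetical names introduced only for this illustration.

```go
package main

import "fmt"

// condition is a hypothetical, simplified view of one entry in status.conditions.
type condition struct {
	Type               string
	Status             string
	ObservedGeneration int64
}

// filterAll mimics the shape used in the example expressions:
//
//	status.conditions.filter(e, e.type == T).all(e, e.observedGeneration == G && e.status == S)
//
// Like all() over an empty list, it returns true when no condition of type T exists.
func filterAll(conds []condition, condType, wantStatus string, generation int64) bool {
	for _, c := range conds {
		if c.Type != condType {
			continue // removed by filter()
		}
		if c.ObservedGeneration != generation || c.Status != wantStatus {
			return false
		}
	}
	return true
}

// filterExists is a stricter variant: it is true only when at least one
// condition matches the type, the generation and the status.
func filterExists(conds []condition, condType, wantStatus string, generation int64) bool {
	for _, c := range conds {
		if c.Type == condType && c.ObservedGeneration == generation && c.Status == wantStatus {
			return true
		}
	}
	return false
}

func main() {
	// A Certificate that is still being issued and has no Ready condition yet.
	conds := []condition{{Type: "Issuing", Status: "True", ObservedGeneration: 1}}

	fmt.Println("inProgress:", filterAll(conds, "Issuing", "True", 1))
	fmt.Println("failed:", filterAll(conds, "Ready", "False", 1))
	fmt.Println("current:", filterAll(conds, "Ready", "True", 1))
	fmt.Println("current (strict):", filterExists(conds, "Ready", "True", 1))
}
```

In this sketch all three `filterAll` calls return `true` for a resource that only reports `Issuing`, while the strict variant does not. This suggests that either a fixed precedence between the three expressions or a form that requires the condition to be present (for example CEL's `exists` macro) may be worth considering.
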
Note that entries for core resources in the `CustomHealthChecksExprs` field are ignored. | ||
|
||
|
||
#### Provide an evaluator for `CEL` expressions for users | ||
|
||
We will provide a CEL environment that can be used by the user to evaluate `CEL` | ||
expressions. Users will use it to test their expressions before applying them to | ||
their `Kustomization` object. | ||
|
||
```shell | ||
$ flux eval --api-version cert-manager.io/v1 --kind Certificate --in-progress "status.conditions.filter(e, e.type == 'Issuing').all(e, e.observedGeneration == metadata.generation && e.status == 'True')" --failed "status.conditions.filter(e, e.type == 'Ready').all(e, e.observedGeneration == metadata.generation && e.status == 'False')" --current "status.conditions.filter(e, e.type == 'Ready').all(e, e.observedGeneration == metadata.generation && e.status == 'True')" --file ./custom_resource.yaml | ||
``` | ||
|
||
### User Stories | ||
|
||
#### Configure custom health checks for a custom resource | ||
|
||
> As a user of Flux, I want to be able to specify custom health checks for my | ||
> custom resources, so that I can have more control over the status of my | ||
> resources. | ||
|
||
#### Enable health checks support in Flux for non-standard resources | ||
|
||
> As a user of Flux, I want to be able to use the health check feature for | ||
> non-standard resources, so that I can have more control over the status of my | ||
> resources. | ||
|
||
### Alternatives | ||
|
||
We need an expression language that is flexible enough to cover all possible use | ||
cases, without having to change `Flux` source code for each new use case. | ||
|
||
One alternative that has been considered is to use `cuelang` instead of `CEL`. | ||
`cuelang` is a more powerful expression language, but it is also more complex and | ||
requires more work to integrate with `Flux`. It also does not have any support in | ||
`Kubernetes` yet, while `CEL` is already used in `Kubernetes` and libraries are | ||
available to use it. | ||
|
||
## Design Details | ||
|
||
### Introduce a new field `CustomHealthChecksExprs` in the `Kustomization` CRD | ||
|
||
The `api/v1/kustomization_types.go` file will be updated to add the `CustomHealthChecksExprs` | ||
field to the `KustomizationSpec` struct. | ||
|
||
```go | ||
type KustomizationSpec struct { | ||
	// ... | ||
	// A list of resources to be included in the health assessment. | ||
	// +optional | ||
	HealthChecks []meta.NamespacedObjectKindReference `json:"healthChecks,omitempty"` | ||
|
||
	// A list of custom health checks expressed as CEL expressions. | ||
	// The CEL expression must evaluate to a boolean value. | ||
	// +optional | ||
	CustomHealthChecksExprs []CustomHealthCheckExprs `json:"customHealthChecksExprs,omitempty"` | ||
	// ... | ||
} | ||
|
||
// CustomHealthCheckExprs defines the CEL expressions for custom health checks. | ||
// The CEL expressions must evaluate to a boolean value. The expressions are used | ||
// to determine the status of the custom resource. | ||
type CustomHealthCheckExprs struct { | ||
	// APIVersion of the custom resource the expressions apply to. | ||
	// +required | ||
	APIVersion string `json:"apiVersion"` | ||
	// Kind of the custom resource the expressions apply to. | ||
	// +required | ||
	Kind string `json:"kind"` | ||
	// InProgress is the CEL expression that verifies that the status | ||
	// of the custom resource is in progress. | ||
	// +optional | ||
	InProgress string `json:"inProgress,omitempty"` | ||
	// Failed is the CEL expression that verifies that the status | ||
	// of the custom resource is failed. | ||
	// +optional | ||
	Failed string `json:"failed,omitempty"` | ||
	// Current is the CEL expression that verifies that the status | ||
	// of the custom resource is ready. | ||
	// +optional | ||
	Current string `json:"current,omitempty"` | ||
} | ||
``` | ||
|
||
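To make the wire format concrete, the following sketch decodes one `customHealthChecksExprs` entry with the standard library and applies a basic check. The struct repeats the fields described above with `omitempty` on the optional ones. `decode` and `validate` are hypothetical helpers, and the rule that at least one expression must be set is an assumption of this sketch, not something the RFC states.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// CustomHealthCheckExprs repeats the proposed API type for this sketch.
type CustomHealthCheckExprs struct {
	APIVersion string `json:"apiVersion"`
	Kind       string `json:"kind"`
	InProgress string `json:"inProgress,omitempty"`
	Failed     string `json:"failed,omitempty"`
	Current    string `json:"current,omitempty"`
}

// validate is a hypothetical helper. Requiring at least one expression is an
// assumption made for this sketch only.
func validate(c CustomHealthCheckExprs) error {
	if c.APIVersion == "" || c.Kind == "" {
		return errors.New("apiVersion and kind are required")
	}
	if c.InProgress == "" && c.Failed == "" && c.Current == "" {
		return errors.New("at least one expression must be set")
	}
	return nil
}

// decode parses a single JSON entry and validates it.
func decode(raw []byte) (CustomHealthCheckExprs, error) {
	var c CustomHealthCheckExprs
	if err := json.Unmarshal(raw, &c); err != nil {
		return CustomHealthCheckExprs{}, err
	}
	return c, validate(c)
}

func main() {
	raw := []byte(`{
		"apiVersion": "cert-manager.io/v1",
		"kind": "Certificate",
		"current": "status.conditions.exists(e, e.type == 'Ready')"
	}`)
	c, err := decode(raw)
	fmt.Println(c.Kind, err)

	// An entry without apiVersion is rejected by validate.
	_, err = decode([]byte(`{"kind": "Certificate"}`))
	fmt.Println(err)
}
```
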
### Introduce a generic custom status reader | ||
|
||
Introduce a generic custom status reader that will be able to compute the status of | ||
a custom resource based on a list of `conditions` that need to be met. | ||
|
||
```go | ||
import ( | ||
	"context" | ||
	"k8s.io/apimachinery/pkg/api/meta" | ||
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured" | ||
	"k8s.io/apimachinery/pkg/runtime/schema" | ||
	"sigs.k8s.io/cli-utils/pkg/kstatus/polling/engine" | ||
	"sigs.k8s.io/cli-utils/pkg/kstatus/polling/event" | ||
	kstatusreaders "sigs.k8s.io/cli-utils/pkg/kstatus/polling/statusreaders" | ||
	"sigs.k8s.io/cli-utils/pkg/object" | ||
) | ||
// customGenericStatusReader wraps the kstatus generic reader for a single kind. | ||
type customGenericStatusReader struct { | ||
	genericStatusReader engine.StatusReader | ||
	gvk                 schema.GroupVersionKind | ||
} | ||
|
||
func NewCustomGenericStatusReader(mapper meta.RESTMapper, gvk schema.GroupVersionKind, exprs map[string]string) engine.StatusReader { | ||
	genericStatusReader := kstatusreaders.NewGenericStatusReader(mapper, genericConditions(gvk.Kind, exprs)) | ||
	return &customGenericStatusReader{ | ||
		genericStatusReader: genericStatusReader, | ||
		gvk:                 gvk, | ||
	} | ||
} | ||
|
||
// Supports reports whether this reader handles the given group and kind. | ||
func (g *customGenericStatusReader) Supports(gk schema.GroupKind) bool { | ||
	return gk == g.gvk.GroupKind() | ||
} | ||
|
||
func (g *customGenericStatusReader) ReadStatus(ctx context.Context, reader engine.ClusterReader, resource object.ObjMetadata) (*event.ResourceStatus, error) { | ||
	return g.genericStatusReader.ReadStatus(ctx, reader, resource) | ||
} | ||
|
||
func (g *customGenericStatusReader) ReadStatusForObject(ctx context.Context, reader engine.ClusterReader, resource *unstructured.Unstructured) (*event.ResourceStatus, error) { | ||
	return g.genericStatusReader.ReadStatusForObject(ctx, reader, resource) | ||
} | ||
``` | ||
|
||
The `genericConditions` function takes a `kind` and a map of `CEL` expressions as parameters | ||
and returns a closure that takes an `Unstructured` object and returns a `status.Result` object. | ||
|
||
````go | ||
import ( | ||
	"sigs.k8s.io/cli-utils/pkg/kstatus/status" | ||
	"github.com/fluxcd/pkg/runtime/cel" | ||
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured" | ||
) | ||
|
||
func genericConditions(kind string, exprs map[string]string) func(u *unstructured.Unstructured) (*status.Result, error) { | ||
	return func(u *unstructured.Unstructured) (*status.Result, error) { | ||
		obj := u.UnstructuredContent() | ||
|
||
		for statusKey, expr := range exprs { | ||
			// Use CEL to evaluate the expression | ||
			result, err := cel.ProcessExpr(expr, obj) | ||
			if err != nil { | ||
				return nil, err | ||
			} | ||
			switch statusKey { | ||
			case status.CurrentStatus.String(): | ||
				// If the expression evaluates to true, we return the current status | ||
			case status.FailedStatus.String(): | ||
				// If the expression evaluates to true, we return the failed status | ||
			case status.InProgressStatus.String(): | ||
				// If the expression evaluates to true, we return the reconciling status | ||
			} | ||
		} | ||
	} | ||
} | ||
```` | ||
|
||
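Because Go randomizes map iteration order, ranging over `exprs` gives no fixed precedence when more than one expression evaluates to true. The sketch below shows one possible way to make the outcome deterministic. The order (failed, then in progress, then current) and the fallback to in progress when nothing matches are assumptions, not behavior defined by this RFC. `resolveStatus` and `evalFunc` are hypothetical names, and the CEL evaluation is injected as a function so that the sketch does not depend on any CEL library.

```go
package main

import "fmt"

// Status names assumed to match the kstatus string values. They are
// redeclared here so the sketch compiles without cli-utils.
const (
	Current    = "Current"
	Failed     = "Failed"
	InProgress = "InProgress"
)

// evalFunc is a hypothetical stand-in for the CEL evaluation step.
type evalFunc func(expr string, obj map[string]any) (bool, error)

// precedence is an assumed evaluation order: a failure wins over progress,
// and progress wins over readiness.
var precedence = []string{Failed, InProgress, Current}

// resolveStatus evaluates the expressions in a fixed order and returns the
// first status whose expression evaluates to true.
func resolveStatus(exprs map[string]string, obj map[string]any, eval evalFunc) (string, error) {
	for _, key := range precedence {
		expr, ok := exprs[key]
		if !ok || expr == "" {
			continue // expression not provided for this status
		}
		matched, err := eval(expr, obj)
		if err != nil {
			return "", fmt.Errorf("evaluating %s expression: %w", key, err)
		}
		if matched {
			return key, nil
		}
	}
	// Nothing matched: reporting InProgress is an assumption of this sketch.
	return InProgress, nil
}

func main() {
	// stubEval pretends that every expression listed in truthy evaluates to true.
	truthy := map[string]bool{"failedExpr": true, "currentExpr": true}
	stubEval := func(expr string, _ map[string]any) (bool, error) {
		return truthy[expr], nil
	}
	exprs := map[string]string{
		Current:    "currentExpr",
		Failed:     "failedExpr",
		InProgress: "inProgressExpr",
	}
	st, err := resolveStatus(exprs, nil, stubEval)
	fmt.Println(st, err)
}
```

With the stub above, both the failed and the current expressions are true, and the fixed order makes the result `Failed` on every run.
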
The generic status reader will be used by the `statusPoller` provided to the `reconciler` | ||
to compute the status of resources whose `kind` has a registered custom health check. | ||
|
||
We will provide a `CEL` environment that will use the Kubernetes CEL library to | ||
evaluate the `CEL` expressions. | ||
|
||
### StatusPoller configuration | ||
|
||
The `reconciler` holds a `statusPoller` that is used to compute the status of the | ||
resources during the `healthCheck` phase of the reconciliation. The `statusPoller` | ||
is configured with a list of `statusReaders` that are used to compute the status | ||
of the resources. | ||
|
||
The `statusPoller` is not configurable once instantiated. This means | ||
that we cannot add new `statusReaders` to the `statusPoller` once it is created. | ||
This is a problem for custom resources because we need to be able to add new | ||
`statusReaders` for each new custom resource that is declared in the `Kustomization` | ||
object's `customHealthChecksExprs` field. Fortunately, the `cli-utils` library has | ||
been forked in the `fluxcd` organization, so we can change the `statusPoller` | ||
to expose the `statusReaders` field and add new `statusReaders` to it. | ||
|
||
|
||
The `statusPoller` used by `kustomize-controller` will be updated for every reconciliation | ||
in order to add new polling options for custom resources that have a `CustomHealthChecksExprs` | ||
field defined in their `Kustomization` object. | ||
|
||
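The per-reconciliation step can be pictured as deriving, from the `customHealthChecksExprs` entries, the set of group/kind pairs for which a custom reader would be registered. The sketch below is a minimal, standard-library-only illustration. `splitAPIVersion` is a hypothetical stand-in for `schema.ParseGroupVersion` from `k8s.io/apimachinery`, `check` and `readersFor` are hypothetical names, and treating "core" as the empty API group is an assumption, since the RFC does not define the exact set of core resources.

```go
package main

import (
	"fmt"
	"strings"
)

// groupKind identifies the resources a custom reader would support.
type groupKind struct {
	Group string
	Kind  string
}

// check is a reduced, hypothetical form of a customHealthChecksExprs entry.
type check struct {
	APIVersion string
	Kind       string
}

// splitAPIVersion splits "group/version" into its parts. A bare version such
// as "v1" yields an empty group.
func splitAPIVersion(apiVersion string) (group, version string, err error) {
	parts := strings.Split(apiVersion, "/")
	switch {
	case apiVersion == "":
		return "", "", fmt.Errorf("empty apiVersion")
	case len(parts) == 1:
		return "", parts[0], nil
	case len(parts) == 2 && parts[0] != "" && parts[1] != "":
		return parts[0], parts[1], nil
	default:
		return "", "", fmt.Errorf("unexpected apiVersion %q", apiVersion)
	}
}

// readersFor returns the entries for which a custom reader would be
// registered, keyed by group and kind. Entries in the empty (core) group are
// skipped, which is this sketch's assumed reading of "core resources".
func readersFor(checks []check) (map[groupKind]check, error) {
	out := make(map[groupKind]check)
	for _, c := range checks {
		group, _, err := splitAPIVersion(c.APIVersion)
		if err != nil {
			return nil, err
		}
		if group == "" {
			continue // core group: ignored
		}
		out[groupKind{Group: group, Kind: c.Kind}] = c
	}
	return out, nil
}

func main() {
	checks := []check{
		{APIVersion: "cert-manager.io/v1", Kind: "Certificate"},
		{APIVersion: "v1", Kind: "ConfigMap"}, // core group, skipped
	}
	readers, err := readersFor(checks)
	if err != nil {
		panic(err)
	}
	for gk := range readers {
		fmt.Printf("register reader for %s/%s\n", gk.Group, gk.Kind)
	}
}
```
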
### K8s CEL Library | ||
|
||
The `K8s CEL Library` is a library that provides `CEL` functions to help in evaluating | ||
`CEL` expressions on `Kubernetes` objects. | ||
|
||
Unfortunately, depending on it means that we will need to follow the `K8s CEL Library` | ||
releases to make sure that we are using the same version of the `CEL` library as | ||
`Kubernetes`. At the time of writing this RFC, the `K8s CEL Library` is using the | ||
`v0.16.1` version of the `CEL` library, while the latest version of the `CEL` library | ||
is `v0.18.2`. This means that we will need to use the `v0.16.1` version of the `CEL` | ||
library in order to be able to use the `K8s CEL Library`. | ||
|
||
|
||
## Implementation History | ||
|
||
See current POC implementation under https://github.com/souleb/kustomize-controller/tree/cel-based-custom-health |