Open Policy Agent is an open source Cloud Native Foundation project which aims to implement policies enforcement for creating, updating and deleting Kubernetes objects operations. Open Policy Agent is based on Admission Controller which intercepts Kubernetes API calls to verify the requested objects against the created policies. OPA utilizes a declarative Rego language specifically designed for policies implementation that allows to iterate through and transform structured documents. It’s based on Datalog, a query language but Rego extends it.
Gatekeeper is a project that provides integration between OPA and Kubernetes, adding some additional features. It is a project that bases the integration on the use of Validation Webhooks that apply OPA policies through Gatekeeper’s own Custom Resource Definitions. And these webhooks fire every time a Kubernetes object is updated, created or deleted. On the other hand, Gatekeeper integrates an audit feature that registers all the events related to the status of a constraint by enabling periodic evaluation of resources against the policies. Audit configuration is defined when creating the Gatekeeper resource and according to this configuration will be based on auditing from replicated data in caché or via Kubernetes API.
Gatekeeper operator can be installed via Red Hat Marketplace or by creating a Subscription resource.
-
Console:
-
Navigate to Operators marketplace
-
Install "Gatekeeper Operator"
-
Create Gatekeeper resource to configure the audit and admission webhook features. This object is used by the operator for deploying the respective components with the specific configuration.
-
-
Subscription:
-
Create Subscription resource as in config/install-operator.yaml
-
Create Gatekeeper resource as in config/create-gatekeeper.yaml
-
Additionally in this repo you can find a script called "setup.sh" which creates all the namespaces requires for this demo lab plus the operator resources for installation and constraints.
./setup.sh
Apart from Openshift Gatekeeper operator, you can install this feature using helm charts available here.
A ConstraintTemplate is a Custom Resource Definition that describes the Rego rule to implement a constraint and the Kind of constraint to be instanced. This Rego rule defines the logic policy to be enforced by iterating through the json object requested while the instance of the constraint will define input parameters and namespaces to apply this constraint.
Here you can find information about the policy reference.
Constraints are instances of the template scheme definition. After creating a ConstraintTemplate, the Kind specified on the scheme can be instanced. When creating a constraint, on spec section are defined the enforcement mode, match namespaces and kinds and input parameters to evaluate the implemented policy.
Constraints define:
-
Parameters: threshold values.
-
Kinds: list of object to which the constraint will apply.
-
Scope: cluster-scoped or namespace-scope resources affected by the constraint. This works together with namespaces excluded by config file.
-
Namespace: apply the constraint to an specific namespace.
-
Excluded namespace: apply the constraint to a non listed namespace.
-
Label selector: apply constraint to these labeled resources.
-
Namespace selector: apply constraint to specific synced namespaces.
Gatekeeper is implemented as a set pg Kubernetes admission webhooks, one for checking a request against the installed constraints and another one for checking labels on namespace requests to bypass certain constraints.
The ValidationWebhookConfiguration object allows to modify admission webhooks on the fly, so after creating the Gatekeeper object a new ValidationWebhookConfiguration fully editable will be created. By default, this object forces to audit all objects created in Openshift.
It is possible to modify this object to limit the number of objects and namespaces audited in order to improve the Gatekeeper service performance.
Overriding this configuration is not covered on this repo but you can find the information here.
You can check current admission hook configuration with this command:
oc get ValidatingWebhookConfiguration gatekeeper-validating-webhook-configuration -o yaml
Config Gatekeeper object is an object to apply the general configuration to control specific cache sync, match and validation settings. ConfigSpec section at Gatekeeper object yaml defines the desired state of Config and its features and is not defined by default.
One of the main purposes of this resource is the possibility of excluding certaing namespaces for policy implementing. You can take a look to the documentation here here.
Open Policy Agent runs on the cluster as an instance while Audit feature is implemented as an Audit Controller which periodically queries the OPA audit endpoint, and evaluates watched object against rego policies, thus writing results to constraint resources. The Source of truth for objects to be audited can be via discovery client or from caché. Discovery client list all objects matching all kinds and once all are listed, exclude those whose namespaces are excluded at config level resource. Then every object is reviewed against the audit endpoint of OPA instance and response return values will be populated to constraint status. If the audit process is performed
Audit configuration values like memory consumption, scope or limits can be overrided to improve performance. Those are defined when creating the Gatekeeper resource. Audit feature must be properly configured to get a good performance and ensure availability of the service.
Some of these values are:
-
Constraint violations limit: default to 20.
-
Audit chunk size: default to infinite. To limit memory consumption of the auditing Pod.
-
Audit interval: default to 60 seconds.
-
Audit from cache: default to false.
Constraints must specify an enforcement action which is deny by default. Other option is dryrun mode which allows to test constraint without making actual changes while are registered as violations in the audit status section. Logs details are configured when creating the Gatekeeper resource. Log levels ranges between DEBUG, INFO, WARNING and ERROR.
Additionally in Config resource you can enable traces for some resources and a specific user. These traces will be logged to the stdout of the Gatekeeper controller.
Config resource defines a list of object to be synced by defining group, version and kind. Once this list of objects is synced, they can be accesed via data inventory document following this structure:
-
data.inventory.cluster-group-kind-name
-
data.inventory.namespace-group-kind-name
This feature is interesting not only for its potential to improve performance but it allows to implement rules which require access to other resources than the one observed directly by the rule.
Rego rules are written using variable assignations so that OPA searches for variable bindings that make all of the expressions true. The rule itself can be understood as: rule violation is "msg" if [body rule]. Msg variable will be evaluated according to the result of the rule while if this variable is omitted, it defaults to true.
Policies are tested both as unit testing and functional testing. Functional testing is verified on validation section Gatekeeper end to end bundle perform. For unit testing OPA provides a framework to write and execute unit tests. Those rego policies are decoupled from the constraint template file definition so constraints are assembled at runtime.
Unit tests are expressed as standard Rego rules with a convention that the rule is prefixed with test_. Each test will verify rules values so every case must be covered. Once every tests are defined in a .rego file format those can be executed using this command:
opa test -v rego/maxreplicas/maxreplicas.rego rego/maxreplicas/maxreplicas_test.rego
Test results can be FAIL if the test rule is undefined or generates a non true test result, ERROR if it encounters a runtime error, SKIPPED if it is marked as todo_ or PASS if variable binding is correct. Additionally we can verify test coverage to check the amount of lines covered:
opa test -v rego/maxreplicas/maxreplicas.rego rego/maxreplicas/maxreplicas_test.rego --coverage
For this demo lab you will find unit testing for one of the rules, but those can be developed for any kind of policy.
Additionally you can rely on the Rego Playground framework for developing, testing and debugging your policies. On top of that you can share your policies with your peer to ease the policies development. You can check the existing playground for the previous policy here.
Here you can find some basic examples about how to implement restrictions and how they work. If you run the ./setup.sh script you will deploy a list of resources that will be tested by creating good and bad resources to test positive and negative violation cases.
Here you can check webhook and audit configuration values as well as validation.
oc get Subscription gatekeeper-operator-product -n openshift-operators -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
name: gatekeeper-operator-product
namespace: openshift-operators
spec:
channel: stable
installPlanApproval: Automatic
name: gatekeeper-operator-product
source: redhat-operators
sourceNamespace: openshift-marketplace
startingCSV: gatekeeper-operator-product.v0.1.2
oc get gatekeeper gatekeeper -o yaml
apiVersion: operator.gatekeeper.sh/v1alpha1
kind: Gatekeeper
metadata:
name: gatekeeper
spec:
validatingWebhook: Enabled
webhook:
logLevel: DEBUG
replicas: 2
image:
image: >-
registry.redhat.io/rhacm2/gatekeeper-rhel8@sha256:5e66cd510a80ef5753c66c6b50137de0093fe75c0606f5f8ce4afce7d7bca050
audit:
logLevel: DEBUG
replicas: 1
oc get config.config.gatekeeper.sh/config -o yaml -n openshift-gatekeeper-system
apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
name: config
namespace: "openshift-gatekeeper-system"
spec:
sync:
syncOnly:
- group: ""
version: "v1"
kind: "ResourceQuota"
match:
- excludedNamespaces: ["gatekeeper-project-excluded"]
processes: ["*"]
Later on you will deploy a series of constraints and templates tested in the next steps.
With this rule you are limiting the amount of replicas for a deployment. This constraint is limited to namespace "gatekeeper-project" and resource "Deployment". Enforcement action is "Deny" and max replicas allowed is 3.
This means you won’t be able to create a deployment with more replicas than allowed and you will be prompted with error message "Deployment %v pods is higher than the maximum allowed of 3".
If you try to create a deployment in a different namespace (not excluded by Config) this constraint won’t apply.
oc apply -f examples/deployment-no-project.yaml
Expected result: Fail.
oc apply -f examples/deployment-no.yaml -n gatekeeper-system
Expected result: Ok.
In this case, constraint is limitating the resources a Pod can request (memory and cpu) within the whole cluster less excluded namespace "gatekeeper-project-excluded" namespace. As memory and cpu resources request can be measured in different units it would be useful to estandarize this calculation to be able to convert constraint limit unit to a different one.
oc apply -f examples/pod-no.yaml -n gatekeeper-project
Expected result: Fail.
As this namespace is excluded for this constraint, you should be able to create pod which exceed request parameters.
oc apply -f examples/pod-no.yaml -n gatekeeper-project-excluded
Expected result: Ok.
oc apply -f examples/pod-no.yaml -n gatekeeper-system
Expected result: Fail.
This deployment will create a ReplicaSet resource which won’t be able to scale as Pod doesn’t fulfill requirements. If you go to ReplicaSet events, you should be prompted with an error message as your deployment is trying to create Pods which request higher values than allowed.
oc apply -f examples/deployment-pod-no.yaml -n gatekeeper-project
Expected result: Fail.
In this example we are going to use Audit feature to access more resources synced in cache apart from the resource under test. This means that all the resources specified at Config (syncOnly) can be accessed via data.properties.
This constraint within "gatekeeper-resourcequota" namespace won’t allow to deploy a pod into a namespace without an existing resource quota. Constraint matches new pods, and the template defines the Rego rule to check existing resource quotas via data.properties.
To test audit feature, patch the existing constraints to dryrun mode and then recreate all the non valid resources. This time you will be able to create the objects but those will be audited.
oc patch k8smaxpods.constraints.gatekeeper.sh/deployment-max-pods -p '{"spec":{"enforcementAction":"dryrun"}}' --type merge
oc patch k8smaxrequests.constraints.gatekeeper.sh/pod-max-requests -p '{"spec":{"enforcementAction":"dryrun"}}' --type merge
oc patch k8sresourcequota.constraints.gatekeeper.sh/resourcequota -p '{"spec":{"enforcementAction":"dryrun"}}' --type merge
Expected result: Non compliant resources are created.