prometheus is missing container metrics from certain nodes #970
Comments
Internal related issue: https://github.com/defenseunicorns/uds-infrastructure/issues/573
Thanks to @rjferguson21's suggestion, we've been able to confirm that the
Would suggest to resolve this we build an Code links for current kubeapi logic:
Once this is added as a generated target, we can add it to Prometheus and make sure that the traffic works as expected.
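To make the idea of a generated target for node IPs concrete, here is a minimal sketch of the mapping involved: each node's internal IP becomes one `/32` `ipBlock` egress peer. This is an illustration only, not the operator's actual code; the function name `node_ips_to_peers` is hypothetical, and it assumes IPv4 addresses supplied one per line.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of uds-core): read node InternalIPs from stdin,
# one per line, and print a NetworkPolicy egress peer for each as a /32 ipBlock.
# Assumption: addresses are IPv4, so a single host is expressed as /32.
node_ips_to_peers() {
  local ip
  # The "|| [ -n ... ]" clause handles a final line without a trailing newline.
  while IFS= read -r ip || [ -n "$ip" ]; do
    # Skip blank lines.
    [ -z "$ip" ] && continue
    printf -- '- ipBlock:\n    cidr: %s/32\n' "$ip"
  done
}

# Possible usage against a live cluster (requires kubectl access):
#   kubectl get nodes \
#     -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}' \
#     | tr ' ' '\n' | node_ips_to_peers
```

A real generator would also need to watch for node additions and removals and regenerate the policy; this sketch covers only the IP-to-peer translation.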
## Description

Adds a new generator / target called `KubeNodes` that contains the internal IP addresses of nodes in the cluster.

**NOTE:** ~I have no idea (yet) where the `docs/reference/` file changes came from.~ They appear to be missing on `main`.

## Related Issue

Relates to #970. `Steps to Validate` include steps to verify #970 gets fixed.

## Type of change

- [x] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Other (security config, docs update, etc)

## Steps to Validate

<details>

### Setup and verify behavior of the target

Create a k3d cluster named `uds` (we use names later for adding nodes):

```bash
k3d cluster create uds
```

Deploy slim-dev:

```bash
uds run slim-dev
```

Create and deploy monitoring layer:

```bash
uds run -f ./tasks/create.yaml single-layer-callable --set LAYER=monitoring
uds run -f ./tasks/deploy.yaml single-layer-callable --set LAYER=monitoring
```

Create and deploy metrics-server layer:

```bash
uds run -f ./tasks/create.yaml single-layer-callable --set LAYER=metrics-server
uds run -f ./tasks/deploy.yaml single-layer-callable --set LAYER=metrics-server
```

Inspect the network policy for scraping of kube nodes:

```bash
kubectl describe networkpolicy allow-prometheus-stack-egress-metrics-scraping-of-kube-nodes -n monitoring
```

The `spec:` part is the relevant part, and should contain the IPs of the nodes:

```bash
Spec:
  PodSelector: app.kubernetes.io/name=prometheus
  Not affecting ingress traffic
  Allowing egress traffic:
    To Port: <any> (traffic allowed to all ports)
    To:
      IPBlock:
        CIDR: 172.28.0.2/32
        Except:
  Policy Types: Egress
```

Add a node:

```bash
k3d node create extra1 --cluster uds --wait --memory 500M
```

Verify the internal IP of the new node:

```bash
kubectl get nodes -o custom-columns="NAME:.metadata.name,INTERNAL-IP:.status.addresses[?(@.type=='InternalIP')].address"
```

Re-read the netpol to verify the new IP is in the `spec:` block:

```bash
kubectl describe networkpolicy allow-prometheus-stack-egress-metrics-scraping-of-kube-nodes -n monitoring
```

Should now be something like this:

```bash
Spec:
  PodSelector: app.kubernetes.io/name=prometheus
  Not affecting ingress traffic
  Allowing egress traffic:
    To Port: <any> (traffic allowed to all ports)
    To:
      IPBlock:
        CIDR: 172.28.0.2/32
        Except:
    To:
      IPBlock:
        CIDR: 172.28.0.4/32
        Except:
  Policy Types: Egress
```

### Verify Prometheus can read things

Connect directly to Prometheus:

```bash
kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090
```

Visit http://localhost:9090/

Execute this expression to see all node/cpu data:

```bash
node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate
```

To see just info from the `extra1` node:

```bash
node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{node=~"^k3d-extra.*"}
```

Add a new node:

```bash
k3d node create extra2 --cluster uds --wait --memory 500M
```

Verify the netpol updates:

```bash
kubectl describe networkpolicy allow-prometheus-stack-egress-metrics-scraping-of-kube-nodes -n monitoring
```

Re-execute the Prometheus query from above. It may take a few minutes for `extra2` to show up, though. Not sure why.

Delete a node and verify the spec updates again:

```bash
kubectl delete node k3d-extra1-0 && k3d node delete k3d-extra1-0
```

Re-reading the netpol should show the removal of that IP.

</details>

## Checklist before merging

- [x] Test, docs, adr added or updated as needed
- [x] [Contributor Guide](https://github.com/defenseunicorns/uds-template-capability/blob/main/CONTRIBUTING.md) followed

---------

Signed-off-by: catsby <[email protected]>
Co-authored-by: Micah Nagel <[email protected]>
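The manual comparison in the validation steps (node internal IPs versus the CIDRs listed in the network policy) could be scripted. The following is a rough sketch under stated assumptions: the helper name `uncovered_node_ips` is hypothetical, and it assumes the `kubectl describe networkpolicy` output lists each node as a `CIDR: <ip>/32` line, as in the sample output in this thread.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): print every node IP passed as an
# argument whose /32 CIDR does not appear in the `kubectl describe networkpolicy`
# text supplied on stdin. No output means every node is covered.
uncovered_node_ips() {
  local spec ip
  spec="$(cat)"
  for ip in "$@"; do
    # -F matches a fixed string, so dots in the IP are not regex wildcards.
    if ! printf '%s\n' "$spec" | grep -qF -- "CIDR: ${ip}/32"; then
      printf '%s\n' "$ip"
    fi
  done
}

# Possible usage against a live cluster (requires kubectl access):
#   kubectl describe networkpolicy \
#     allow-prometheus-stack-egress-metrics-scraping-of-kube-nodes -n monitoring \
#     | uncovered_node_ips 172.28.0.2 172.28.0.4
```

Matching on the rendered `describe` text is brittle; reading the policy as JSON would be sturdier, but the text form mirrors what the manual steps inspect.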
This was completed in the linked PR, thanks @catsby
Environment
Device and OS: darwin arm64
App version: v0.29.1-unicorn
Kubernetes distro being used: k3d with two nodes
Steps to reproduce
Expected result
Container metrics such as CPU and Memory utilization should be queryable
Actual Result
Prometheus only returns metrics from pods that are scheduled on control plane nodes
Visual Proof (screenshots, videos, text, etc)
Metrics returned for `container_cpu_usage_seconds`:
[image]
No metrics returned when filtering out control plane node:
[image]
Severity/Priority
Moderate
Additional Context
Removing all `NetworkPolicies` in the `monitoring` namespace allows Prometheus to pick up metrics from the missing nodes.