Reconciler error: "error":"getting logs for pod" #1031
Comments
Today I again have a "scan-vulnerabilityreport" pod and a corresponding job in status "Completed", but the starboard operator logs the following error:
The last part of the log:
What about the VulnerabilityReport? Is it created after all?

Yes, the VR for the specific image exists, but new VRs were not created.

I'm not sure I understood. What do you mean by "new VR"?

We use the "OPERATOR_VULNERABILITY_SCANNER_REPORT_TTL" (vulnerabilityScannerReportTTL) parameter, so that the VulnerabilityReports (VRs) are regenerated every 24h.
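For reference, the TTL mentioned above can be set through the Helm chart values. A minimal sketch of the relevant excerpt, assuming the chart exposes it as `operator.vulnerabilityScannerReportTTL` (check your chart version's values file, as the key name may differ):

```
# values.yaml (hypothetical excerpt)
# Maps to the OPERATOR_VULNERABILITY_SCANNER_REPORT_TTL env var;
# reports older than this are deleted and regenerated.
operator:
  vulnerabilityScannerReportTTL: "24h"
```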
I'm also getting the "error":"unexpected EOF" message on some scans. Looking at the log of the job, I see the JSON response ends at:

{
  ...,
  "Results": [
    ....
  ]

The interesting thing is that one of the images this happens on has already been scanned from another deployment and had no issues. The vulnerability report is NOT created when this happens.

Additional note:
Today it happened again: one of the scan jobs is completed, but the last block of its JSON output is invalid because it ends abruptly. Only after deleting the "hanging" scan job does starboard-operator start other scans.
I can confirm the issue that @albertschwarzkopf mentioned in #1031 (comment), with two different kinds of errors. I can probably avoid the errors with changes on my side, by fixing some security issues with the xdebug image (then the EOF error should disappear) and by fixing the version skew between the starboard-operator and trivy images that I use (I had not noticed that the public.ecr.aws/aquasecurity/starboard-operator image is not up to date, and that my trivy image is multiple versions ahead of what the starboard chart uses). Still, I wish starboard-operator were more fault-tolerant when this happens. Detailed versions, logs, and error messages follow.
Error: unexpected EOF

From the starboard-operator logs:

{
"level": "error",
"ts": 1652870248.4415047,
"logger": "controller.job",
"msg": "Reconciler error",
"reconciler group": "batch",
"reconciler kind": "Job",
"name": "scan-vulnerabilityreport-5c57c4d49",
"namespace": "starboard-operator",
"error": "unexpected EOF",
"stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227"
}

The pod has 3 containers, but only one of them contains invalid JSON:
I won't paste all 13638 log lines here, but it looks like valid JSON until the output stops in the middle:

{
"VulnerabilityID": "CVE-2018-25032",
"VendorIDs": [
"DSA-5111-1"
],
"PkgName": "zlib1g",
"InstalledVersion": "1:1.2.11.dfsg-2",
"FixedVersion": "1:1.2.11.dfsg-2+deb11u1",
"Layer": {
"Digest": "sha256:7d63c13d9b9b6ec5f05a2b07daadacaa9c610d01102a662ae9b1d082105f1ffa",
"DiffID": "sha256:e8b689711f21f9301c40bf2131ce1a1905c3aa09def1de5ec43cf0adf652576e"
},
"SeveritySource": "nvd",
"PrimaryURL": "https://avd.aquasec.com/nvd/cve-2018-25032",
"DataSource": {
"ID": "debian",
"Name": "Debian Security Tracker",
"URL": "https://salsa.debian.org/security-tracker-team/security-tracker"
},
"Title": "zlib: A flaw found in zlib when compressing (not decompressing) certain inputs",
"Description": "zlib before 1.2.12 allows memory corruption when deflating (i.e., when compressing) if the input has many distant matches.",
"Severity": "HIGH",
"CweIDs": [
"CWE-787"
],
"CVSS": {
"nvd": {
"V2Vector": "AV:N/AC:L/Au:N/C:N/I:N/A:P",
"V3Vector": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H",
"V2Score": 5,
"V3Score": 7.5
},
"redhat": {
"V3Vector": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:L/A:H",
"V3Score": 8.2
}
},
"References": [
"http://www.openwall.com/lists/oss-security/2022/03/25/2",
"http://www.openwall.com/lists/oss-security/2022/03/26/1",
"https://access.redhat.com/security/cve/CVE-2018-25032",
"https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2018-25032",
"https://github.com/madler/zlib/commit/5c44459c3b28a9bd3283aaceab7c615f8020c531",

Error: json: cannot unmarshal number into Go value of type trivy.ScanReport

From the starboard-operator logs:

{
"level": "error",
"ts": 1652869244.7950468,
"logger": "controller.job",
"msg": "Reconciler error",
"reconciler group": "batch",
"reconciler kind": "Job",
"name": "scan-vulnerabilityreport-54f9f659bb",
"namespace": "starboard-operator",
"error": "json: cannot unmarshal number into Go value of type trivy.ScanReport",
"stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227"
}

Command of the corresponding pod:

args:
- --quiet
- client
- --format
- json
- --remote
- http://trivy:4954
- quay.io/ceph/ceph:v16.2.7@sha256:00965b7e88c0cef116e6a47107051a4bfe952139e7b94c6cefd6607cf38a3f0f
command:
- trivy

Logs of the corresponding pod:
What steps did you take and what happened:
After a few days the starboard operator gets stuck with the following error:
"level":"error","ts":1647269605.345396,"logger":"controller.job","msg":"Reconciler error","reconciler group":"batch","reconciler kind":"Job","name":"scan-vulnerabilityreport-787ccf9b67","namespace":"starboard-system","error":"getting logs for pod \"starboard-system/scan-vulnerabilityreport-787ccf9b67\": getting pod controlled by job: \"starboard-system/scan-vulnerabilityreport-787ccf9b67\": pod not found","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227"}
I can see the finished job "scan-vulnerabilityreport-787ccf9b67" in status "Complete", but there is no pod for this job. Maybe the pod was deleted because the worker node was terminated (we use spot instances in AWS EKS). Is it possible that such completed jobs are deleted after X hours or days, e.g. via ttlSecondsAfterFinished for Kubernetes Jobs?
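For reference, `ttlSecondsAfterFinished` is a standard field on the Kubernetes Job spec that garbage-collects a finished Job together with its pods. A minimal sketch showing where it sits (names and args are hypothetical, not the operator's actual scan-job spec):

```
apiVersion: batch/v1
kind: Job
metadata:
  name: scan-vulnerabilityreport-example   # hypothetical name
spec:
  # Delete the Job and its pods one hour after it finishes.
  ttlSecondsAfterFinished: 3600
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: trivy
          image: aquasec/trivy:0.24.0
          args: ["--help"]   # placeholder args
```

Note this deletes the Job object as well, so a completed Job lingering without its pod (as described above) points more toward pod eviction or node termination than toward this TTL mechanism.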
Environment:
We use Starboard-Operator combined with trivy in client-server-mode. Deployed via Helm Charts:
Starboard Operator Helm-Chart-Version: 0.9.1 (app-version 0.14.1)
Trivy-Server Helm-Chart-Version: 0.4.10 (app-version: 0.24.0)
AWS EKS 1.21 (Bottlerocket OS and AmazonLinux 2)