Requirement
As a Jaeger operator,
I want to be able to limit the execution time of the `jaeger-spark-dependencies` Job,
so that I can ensure the Job is not running forever and blocking/wasting resources.
Problem
The spark-dependencies Spark jobs (the actual Spark jobs inside the JVM) often run into OutOfMemory errors.
The actual problem is that the container does not fail (exit), even though the Spark job has already failed.
To fix this permanently, I have opened jaegertracing/spark-dependencies#131 in the spark-dependencies repo. However, this repo does not seem to be maintained anymore (?), so it would be an improvement to at least be able to limit the execution time of the Pod using Kubernetes specifications. This is currently not possible for the user, since the CronJob is managed by the Jaeger Operator.
Proposal
Set `activeDeadlineSeconds` on the Pod spec to limit the execution time. If the specified amount of time runs out before the job finishes, the Pod will be deleted and a new Pod will be created.
Ideally this should be configurable within `jaeger.spec.storage.dependencies`. A high default value (8h or 1d) would also be fine, but would be a breaking change for Spark jobs that really do run that long.
This does not solve the problem entirely, but would at least be a mitigation.
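To make the proposal more concrete, here is a minimal Go sketch of how an optional deadline could be passed from the Jaeger CR into the Pod spec. It is an illustration under assumptions, not the operator's actual code: `DependenciesSpec`, `PodSpec`, and `buildPodSpec` are hypothetical stand-ins, and the `ActiveDeadlineSeconds` field on the dependencies section does not exist yet. Leaving the field unset keeps the current behaviour, which would avoid the breaking change mentioned above.

```go
package main

import (
	"errors"
	"fmt"
)

// DependenciesSpec is a hypothetical, trimmed-down stand-in for the
// jaeger.spec.storage.dependencies section of the Jaeger CR.
type DependenciesSpec struct {
	Schedule string
	// ActiveDeadlineSeconds is the proposed option (assumption: not part
	// of the CR today). nil means "no limit", i.e. the current behaviour.
	ActiveDeadlineSeconds *int64
}

// PodSpec is a hypothetical stand-in for the Kubernetes Pod spec;
// only the field relevant to this proposal is modelled.
type PodSpec struct {
	ActiveDeadlineSeconds *int64
}

// buildPodSpec copies the optional deadline from the CR into the Pod spec.
// Kubernetes requires activeDeadlineSeconds to be a positive integer,
// so non-positive values are rejected here.
func buildPodSpec(deps DependenciesSpec) (PodSpec, error) {
	if deps.ActiveDeadlineSeconds == nil {
		return PodSpec{}, nil // nothing configured: no deadline is set
	}
	if *deps.ActiveDeadlineSeconds <= 0 {
		return PodSpec{}, errors.New("activeDeadlineSeconds must be positive")
	}
	deadline := *deps.ActiveDeadlineSeconds // copy so the CR value is not aliased
	return PodSpec{ActiveDeadlineSeconds: &deadline}, nil
}

func main() {
	eightHours := int64(8 * 60 * 60)
	spec, err := buildPodSpec(DependenciesSpec{
		Schedule:              "55 23 * * *",
		ActiveDeadlineSeconds: &eightHours,
	})
	if err != nil {
		fmt.Println("invalid configuration:", err)
		return
	}
	fmt.Println("activeDeadlineSeconds:", *spec.ActiveDeadlineSeconds)
}
```

With an opt-in pointer field like this, users with genuinely long-running Spark jobs would see no change unless they set a value themselves.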
Open questions
Is jaegertracing/spark-dependencies still maintained?
-> If yes: it would be better to fix the Job itself (jaegertracing/spark-dependencies#131).
-> If no: I could open a PR to address this if the proposal sounds good to you.