Behaviour of ScaledJob minReplicaCount field
#4885
Unanswered
LewisJackson1 asked this question in General
Replies: 2 comments 6 replies
-
@tomkerkhove @zroubalik, please share your thoughts.
5 replies
-
I stumbled onto this thread while trying to wrap my head around using KEDA, and I feel like I'm now just more confused :-) If you're using a ScaledJob, the intent is for the Job to pull a message, process it, and terminate. If you start up the so-called warm instances, won't they just hit the timeout more often than not before a message lands on the queue, and terminate?
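(For what it's worth, the timeout I have in mind would come from the Job spec itself rather than from KEDA, something like the fragment below; the names and values are made up for illustration.)

```yaml
# Hypothetical fragment of a ScaledJob's jobTargetRef; names and values are placeholders.
jobTargetRef:
  activeDeadlineSeconds: 600   # Kubernetes terminates the Job if it runs longer than this
  backoffLimit: 2              # retries before the Job is marked as failed
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: example.com/queue-worker:latest   # placeholder image
          # the worker is expected to pull one message, process it, and exit
```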
1 reply
-
We've had a bit of discussion in the comments of issue #4554 (comment) concerning the behaviour of the minReplicaCount field of the ScaledJob object. I was asked to open a discussion here to get more opinions on the matter. Just to recap: as currently defined, minReplicaCount Jobs are kept running at all times, on top of any Jobs created in response to pending events.

In my use case, we are considering migrating a ScaledObject to a ScaledJob, and I expected the minReplicaCount of a ScaledJob to match the behaviour of a ScaledObject, i.e. to act as a floor on the total number of Jobs rather than an addition to it. The consequence of scaling out too many Jobs in our case is that we incur minimum charges for GPU nodes. We would like to keep two Jobs warm without the overprovisioning behaviour that is currently intended. I can understand that this behaviour may be useful for some users, but I'm not sure it's common enough to be the default.
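To make the difference concrete, here is a minimal ScaledJob sketch showing where minReplicaCount sits; the trigger, queue name, image, and numbers below are placeholders, not our real setup.

```yaml
# Hedged sketch of a ScaledJob; the trigger, queue name, and image are illustrative only.
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: gpu-worker
spec:
  minReplicaCount: 2           # current behaviour: two "warm" Jobs in addition to event-driven Jobs
  maxReplicaCount: 10
  pollingInterval: 30
  jobTargetRef:
    template:
      spec:
        restartPolicy: Never
        containers:
          - name: worker
            image: example.com/gpu-worker:latest   # placeholder
  triggers:
    - type: rabbitmq           # placeholder trigger for illustration
      metadata:
        queueName: inference-requests
        mode: QueueLength
        value: "1"
        hostFromEnv: RABBITMQ_HOST
```

As I understand the current semantics, a queue depth of three with this spec would give roughly 2 + 3 = 5 Jobs, whereas the ScaledObject-like behaviour I expected would give max(3, 2) = 3.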
The discussion in the issue above is worth reading, and all opinions are welcome! I can think of a couple of ways to work around this behaviour for our use case, but it would have been great to have it work out of the box.