fix: separate flags for compute & requester job selection #2930

frrist · 2023-10-17T19:16:50Z

Compute and Requester nodes each have their own JobSelectionPolicy. This change separates the flags for each configured value.
The Requester JobSelectionPolicy is currently unused and a TODO has been added (see code comment) to address it via: JobSelectionPolicy is un-used for the requester node #2929
closed Job selection probes are misconfigured for Compute and Requester Nodes #2898

- closed #2898

wdbaruni · 2023-10-18T05:06:04Z

cmd/util/flags/configflags/job_selection.go

+	//
+	// Compute Job Selection Policy
+	{
+		FlagName:     "compute-job-selection-data-locality",
+		ConfigPath:   types.NodeComputeJobSelectionLocality,
+		DefaultValue: Default.Node.Compute.JobSelection.Locality,
+		Description:  `Only accept jobs that reference data we have locally ("local") or anywhere ("anywhere").`,
+	},


this breaks backward compatibility and will need to update our documentation with the new flags. Also will be confusing for users when running hybrid nodes where defining compute-job-selection-accept-networked won't be enough as they will have to set requester-job-selection-accept-networked as well

The requirement here is to have a single flag that enables the job selection policy regardless if the node is a requester, compute or hybrid. I believe the quick fix for now as I understood during yesterday's meeting is to have the flag set compute node config instead of the requester, until we come up with a better solution of a single flag that sets multiple configs

Agreed, we should be the ones doing the hard work to use the flag in two places, not the user.

I'm guessing the worry about backwards compatibility isn't a deal-breaker here, since this isn't about changing the protocol, just the user experience. If compatibility is a showstopper, we might as well close this now. But if not, let's continue below:

Agreed that if we accept this change I'll need to update the documentation. I am happy to do that if this is approved.

I believe the quick fix for now as I understood during yesterday's meeting is to have the flag set compute node config instead of the requester

I believe that is what's happening here, there is a single flag (set of flags) to configure the compute nodes policy. There is a different set of flags to configure the requesters policy, but that latter flag is ineffectual due to #2929

Here's why I chose to use two separate flags:

It felt messy to use one flag to set up either my compute node, requester node, or both based on the type of node I had.

I'm not 100% sure, but there might come a time when we want a hybrid node with different config values for its job selection policies: what if the requester and compute nodes need different settings for probe-exec or probe-http to decide if they'll take a job? A single flag won’t cut it then.

Also, the data-locality feels more like it’s meant for the compute node. Maybe we'll phase it out from the requester node at some point?

At the moment, both compute and requester nodes have their unique config values. Using viper+cobra, there's no simple way to tie one flag to two different configs. Usually, one flag matches one config value.

But if we're set on using just one flag, we might consider creating a shared config that both node types can use. This would have common settings and flags for both nodes. I think there is a reasonable case for making these separate config values, but I'll implement whatever decision you decide on.

We should probably come up with an articulated approach for config - is flexibility our core motivator for all the flags we currently maintain? It feels inflexible to have one flag apply to two different components where we might want different behaviour. Assuming there's a use case for that in the case of job selection policy.

I agree it breaks backward compatibility, but if we never do that we'll be stuck with decisions made in different circumatances, or worse end up maintaining what might have been an oversight/mistake.

Probably best to wait for the reviewers to weight in, but I've a feeling that this is possibly the right move, but not sure it's high enough priority to justify the (admittedly minor) disruption to any users who may be using it.

There's a lot of "maybes" gone into the justificaiton, so I'm not compelled to agree. I really can't understand a situation where we would want a hybrid node to have different config for it's requester or compute parts. Maybe we will do these things, maybe we won't, but I'm not seeing any good evidence for why or when we will do all these "maybes". So it doesn't fly as justification.

Even casting backwards compat aside, I don't think there is a better user experience gained from having separate flags for these. From a user's POV, they are running a node and configuring it – even if they are running a hybrid node, their mental model is still of a single node that they are configuring. The fact that a hybrid Bacalhau node is internally composed of requester and compute parts is an internal detail.

The fact that cobra/viper don't support this well is unfortunate, but it's also partly because of our choice to use separate config structs for each node type in the first place, and we shouldn't be dictating the user experience based on the limitations of our code. If it's literally impossible to do one flag going into two places then that's a different conversation, but it's clearly not – I don't see a reason why serve can't pick stuff out of general config and manually put it into config for requester or compute nodes. Ugly maybe, but at least the ugliness is internal and not being exposed to the user.

Still on the side of retaining the existing UX.

wdbaruni · 2023-10-18T15:02:03Z

The issue was blocking my use case as I needed network access in my nodes. This fix seemed to work
#2936

frrist · 2023-10-27T18:16:51Z

Closing based on review comments.

fix: separate flags for compute & requester job selection

4c33c86

- closed #2898

frrist self-assigned this Oct 17, 2023

frrist requested review from rossjones, simonwo and wdbaruni October 17, 2023 19:17

wdbaruni reviewed Oct 18, 2023

View reviewed changes

frrist mentioned this pull request Oct 19, 2023

fix job selection policy flag #2936

Merged

frrist closed this Oct 27, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: separate flags for compute & requester job selection #2930

fix: separate flags for compute & requester job selection #2930

frrist commented Oct 17, 2023

wdbaruni Oct 18, 2023

simonwo Oct 18, 2023

frrist Oct 19, 2023

rossjones Oct 25, 2023

simonwo Oct 25, 2023

wdbaruni commented Oct 18, 2023

frrist commented Oct 27, 2023

fix: separate flags for compute & requester job selection #2930

fix: separate flags for compute & requester job selection #2930

Conversation

frrist commented Oct 17, 2023

wdbaruni Oct 18, 2023

Choose a reason for hiding this comment

simonwo Oct 18, 2023

Choose a reason for hiding this comment

frrist Oct 19, 2023

Choose a reason for hiding this comment

rossjones Oct 25, 2023

Choose a reason for hiding this comment

simonwo Oct 25, 2023

Choose a reason for hiding this comment

wdbaruni commented Oct 18, 2023

frrist commented Oct 27, 2023