Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: separate flags for compute & requester job selection #2930

Closed
wants to merge 1 commit into from

Conversation

frrist
Copy link
Member

@frrist frrist commented Oct 17, 2023

@frrist frrist self-assigned this Oct 17, 2023
Comment on lines +42 to +49
//
// Compute Job Selection Policy
{
FlagName: "compute-job-selection-data-locality",
ConfigPath: types.NodeComputeJobSelectionLocality,
DefaultValue: Default.Node.Compute.JobSelection.Locality,
Description: `Only accept jobs that reference data we have locally ("local") or anywhere ("anywhere").`,
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this breaks backward compatibility and will need to update our documentation with the new flags. Also will be confusing for users when running hybrid nodes where defining compute-job-selection-accept-networked won't be enough as they will have to set requester-job-selection-accept-networked as well

The requirement here is to have a single flag that enables the job selection policy regardless if the node is a requester, compute or hybrid. I believe the quick fix for now as I understood during yesterday's meeting is to have the flag set compute node config instead of the requester, until we come up with a better solution of a single flag that sets multiple configs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, we should be the ones doing the hard work to use the flag in two places, not the user.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing the worry about backwards compatibility isn't a deal-breaker here, since this isn't about changing the protocol, just the user experience. If compatibility is a showstopper, we might as well close this now. But if not, let's continue below:

Agreed that if we accept this change I'll need to update the documentation. I am happy to do that if this is approved.

I believe the quick fix for now as I understood during yesterday's meeting is to have the flag set compute node config instead of the requester

I believe that is what's happening here, there is a single flag (set of flags) to configure the compute nodes policy. There is a different set of flags to configure the requesters policy, but that latter flag is ineffectual due to #2929

Here's why I chose to use two separate flags:

  • It felt messy to use one flag to set up either my compute node, requester node, or both based on the type of node I had.
    • I'm not 100% sure, but there might come a time when we want a hybrid node with different config values for its job selection policies: what if the requester and compute nodes need different settings for probe-exec or probe-http to decide if they'll take a job? A single flag won’t cut it then.
    • Also, the data-locality feels more like it’s meant for the compute node. Maybe we'll phase it out from the requester node at some point?
  • At the moment, both compute and requester nodes have their unique config values. Using viper+cobra, there's no simple way to tie one flag to two different configs. Usually, one flag matches one config value.

But if we're set on using just one flag, we might consider creating a shared config that both node types can use. This would have common settings and flags for both nodes. I think there is a reasonable case for making these separate config values, but I'll implement whatever decision you decide on.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably come up with an articulated approach for config - is flexibility our core motivator for all the flags we currently maintain? It feels inflexible to have one flag apply to two different components where we might want different behaviour. Assuming there's a use case for that in the case of job selection policy.

I agree it breaks backward compatibility, but if we never do that we'll be stuck with decisions made in different circumatances, or worse end up maintaining what might have been an oversight/mistake.

Probably best to wait for the reviewers to weight in, but I've a feeling that this is possibly the right move, but not sure it's high enough priority to justify the (admittedly minor) disruption to any users who may be using it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot of "maybes" gone into the justificaiton, so I'm not compelled to agree. I really can't understand a situation where we would want a hybrid node to have different config for it's requester or compute parts. Maybe we will do these things, maybe we won't, but I'm not seeing any good evidence for why or when we will do all these "maybes". So it doesn't fly as justification.

Even casting backwards compat aside, I don't think there is a better user experience gained from having separate flags for these. From a user's POV, they are running a node and configuring it – even if they are running a hybrid node, their mental model is still of a single node that they are configuring. The fact that a hybrid Bacalhau node is internally composed of requester and compute parts is an internal detail.

The fact that cobra/viper don't support this well is unfortunate, but it's also partly because of our choice to use separate config structs for each node type in the first place, and we shouldn't be dictating the user experience based on the limitations of our code. If it's literally impossible to do one flag going into two places then that's a different conversation, but it's clearly not – I don't see a reason why serve can't pick stuff out of general config and manually put it into config for requester or compute nodes. Ugly maybe, but at least the ugliness is internal and not being exposed to the user.

Still on the side of retaining the existing UX.

@wdbaruni
Copy link
Member

The issue was blocking my use case as I needed network access in my nodes. This fix seemed to work
#2936

@frrist
Copy link
Member Author

frrist commented Oct 27, 2023

Closing based on review comments.

@frrist frrist closed this Oct 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Job selection probes are misconfigured for Compute and Requester Nodes
4 participants