-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: separate flags for compute & requester job selection #2930
Conversation
frrist
commented
Oct 17, 2023
- Compute and Requester nodes each have their own JobSelectionPolicy. This change separates the flags for each configured value.
- The Requester JobSelectionPolicy is currently unused and a TODO has been added (see code comment) to address it via: JobSelectionPolicy is un-used for the requester node #2929
- closed Job selection probes are misconfigured for Compute and Requester Nodes #2898
// | ||
// Compute Job Selection Policy | ||
{ | ||
FlagName: "compute-job-selection-data-locality", | ||
ConfigPath: types.NodeComputeJobSelectionLocality, | ||
DefaultValue: Default.Node.Compute.JobSelection.Locality, | ||
Description: `Only accept jobs that reference data we have locally ("local") or anywhere ("anywhere").`, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this breaks backward compatibility and will need to update our documentation with the new flags. Also will be confusing for users when running hybrid nodes where defining compute-job-selection-accept-networked
won't be enough as they will have to set requester-job-selection-accept-networked
as well
The requirement here is to have a single flag that enables the job selection policy regardless if the node is a requester, compute or hybrid. I believe the quick fix for now as I understood during yesterday's meeting is to have the flag set compute node config instead of the requester, until we come up with a better solution of a single flag that sets multiple configs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, we should be the ones doing the hard work to use the flag in two places, not the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing the worry about backwards compatibility isn't a deal-breaker here, since this isn't about changing the protocol, just the user experience. If compatibility is a showstopper, we might as well close this now. But if not, let's continue below:
Agreed that if we accept this change I'll need to update the documentation. I am happy to do that if this is approved.
I believe the quick fix for now as I understood during yesterday's meeting is to have the flag set compute node config instead of the requester
I believe that is what's happening here, there is a single flag (set of flags) to configure the compute nodes policy. There is a different set of flags to configure the requesters policy, but that latter flag is ineffectual due to #2929
Here's why I chose to use two separate flags:
- It felt messy to use one flag to set up either my compute node, requester node, or both based on the type of node I had.
- I'm not 100% sure, but there might come a time when we want a hybrid node with different config values for its job selection policies: what if the requester and compute nodes need different settings for
probe-exec
orprobe-http
to decide if they'll take a job? A single flag won’t cut it then. - Also, the
data-locality
feels more like it’s meant for the compute node. Maybe we'll phase it out from the requester node at some point?
- I'm not 100% sure, but there might come a time when we want a hybrid node with different config values for its job selection policies: what if the requester and compute nodes need different settings for
- At the moment, both compute and requester nodes have their unique config values. Using viper+cobra, there's no simple way to tie one flag to two different configs. Usually, one flag matches one config value.
But if we're set on using just one flag, we might consider creating a shared config that both node types can use. This would have common settings and flags for both nodes. I think there is a reasonable case for making these separate config values, but I'll implement whatever decision you decide on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably come up with an articulated approach for config - is flexibility our core motivator for all the flags we currently maintain? It feels inflexible to have one flag apply to two different components where we might want different behaviour. Assuming there's a use case for that in the case of job selection policy.
I agree it breaks backward compatibility, but if we never do that we'll be stuck with decisions made in different circumatances, or worse end up maintaining what might have been an oversight/mistake.
Probably best to wait for the reviewers to weight in, but I've a feeling that this is possibly the right move, but not sure it's high enough priority to justify the (admittedly minor) disruption to any users who may be using it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a lot of "maybes" gone into the justificaiton, so I'm not compelled to agree. I really can't understand a situation where we would want a hybrid node to have different config for it's requester or compute parts. Maybe we will do these things, maybe we won't, but I'm not seeing any good evidence for why or when we will do all these "maybes". So it doesn't fly as justification.
Even casting backwards compat aside, I don't think there is a better user experience gained from having separate flags for these. From a user's POV, they are running a node and configuring it – even if they are running a hybrid node, their mental model is still of a single node that they are configuring. The fact that a hybrid Bacalhau node is internally composed of requester and compute parts is an internal detail.
The fact that cobra/viper don't support this well is unfortunate, but it's also partly because of our choice to use separate config structs for each node type in the first place, and we shouldn't be dictating the user experience based on the limitations of our code. If it's literally impossible to do one flag going into two places then that's a different conversation, but it's clearly not – I don't see a reason why serve
can't pick stuff out of general config and manually put it into config for requester or compute nodes. Ugly maybe, but at least the ugliness is internal and not being exposed to the user.
Still on the side of retaining the existing UX.
The issue was blocking my use case as I needed network access in my nodes. This fix seemed to work |
Closing based on review comments. |