📖 Add designs/multi-cluster.md #2746
base: main
Conversation
Signed-off-by: Dr. Stefan Schimanski <[email protected]>
designs/multi-cluster.md
Outdated
}

// pkg/handler
type DeepCopyableEventHandler interface {
The eventhandlers are stateless, why do we need the deepcopy for them?
Looking at the prototype, I think this is because the EventHandler would then store the Cluster (it uses that info to set the ClusterName field in the request).
This is gone now in #2726.
Will update the design here.
With the BYO request/event handler changes in #3019, I brought this back after remembering it was mentioned in the proposal. The previous version of my prototype had a weird second layer of event handler that wrapped the actual event handler and used the event object to communicate the cluster name. That felt all kinds of weird.
Because we now have BYO EventHandlers, it's possible that they are not entirely stateless (as @sbueringer pointed out, some event handlers might have to store the cluster name in the absence of any information on the event object itself). So I think this approach is the cleanest, to be honest. It's entirely optional in #3019; existing EventHandlers don't need to be changed.
designs/multi-cluster.md
Outdated
	Disengage(context.Context, Cluster) error
}
```
In particular, controllers implement the `AwareRunnable` interface. They react
Rather than changing the controller type directly and requiring all its dependencies to know how to deepcopy themselves, how about having something like a `controllerConstructor` (name tbd) in between that is filled with a `[]watchConstructor{source func(Cluster) source.Source, handler func(Cluster) handler.Handler, predicate func(Cluster) []predicate.Predicate}`?
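A minimal sketch of that layering, with hypothetical names (`watchConstructor`, `controllerConstructor` are placeholders from the comment above, not an agreed API) on top of the existing `source.Source`, `handler.EventHandler` and `predicate.Predicate` interfaces:

```go
package multicluster

import (
	"sigs.k8s.io/controller-runtime/pkg/cluster"
	"sigs.k8s.io/controller-runtime/pkg/handler"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
	"sigs.k8s.io/controller-runtime/pkg/source"
)

// watchConstructor bundles the per-cluster factories for a single watch:
// given a cluster.Cluster, it produces the source, handler and predicates
// to register for that cluster.
type watchConstructor struct {
	Source     func(cluster.Cluster) source.Source
	Handler    func(cluster.Cluster) handler.EventHandler
	Predicates func(cluster.Cluster) []predicate.Predicate
}

// controllerConstructor (name tbd) sits on top of a regular controller and
// holds one entry per desired watch; it would replay them against the
// underlying controller for every cluster that appears.
type controllerConstructor struct {
	watches []watchConstructor
}
```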
I think this would require more invasive changes to our public API (the Controller interface)
No, you can call `Watch` on an existing controller. The idea is to not let the `Controller` or its dependencies have any knowledge about this, but instead have a thing on top of the `Controller` that is configured with constructors that take a `cluster.Cluster` and return a source/predicate/handler, and then uses those to call `Watch` when a new cluster appears. When one disappears, it would cancel the context on the `Source`.
The idea really is the opposite: I do not want the `Controller` to know how to extend itself like this, IMHO this is a higher-level abstraction.
Compare #2726 after the latest push. I have implemented @alvaroaleman's idea via a `MultiClusterController` wrapper implementing `cluster.AwareRunnable` and just calling `Watch` on the actual controller. All the deepcopy'ing is gone 🎉 Much nicer IMO. @alvaroaleman great intuition!
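A rough sketch of what such a wrapper shape could look like; all names and signatures here are assumptions for illustration (the actual implementation lives in #2726/#3019). The key point is that cluster engagement turns into `Watch` calls on one shared controller:

```go
package multicluster

import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/cluster"
	"sigs.k8s.io/controller-runtime/pkg/source"
)

// watcher is the narrow slice of controller.Controller the wrapper needs.
type watcher interface {
	Watch(src source.Source) error
}

// multiClusterController wraps one shared controller and translates cluster
// engagement into Watch calls; nothing needs to deep-copy itself.
type multiClusterController struct {
	ctrl watcher
	// sources produce a per-cluster source; the passed context is expected
	// to be cancelled when the cluster goes away, ending that source.
	sources []func(context.Context, cluster.Cluster) source.Source
}

// Engage is called when a new cluster appears and adds its watches to the
// shared controller.
func (c *multiClusterController) Engage(ctx context.Context, cl cluster.Cluster) error {
	for _, newSource := range c.sources {
		if err := c.ctrl.Watch(newSource(ctx, cl)); err != nil {
			return err
		}
	}
	return nil
}
```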
designs/multi-cluster.md
Outdated
// pkg/cluster
type Provider interface {
	Get(ctx context.Context, clusterName string, opts ...Option) (Cluster, error)
	List(ctx context.Context) ([]string, error)
Why return `[]string` here rather than `[]Cluster`?
+1 Would be good for consistency with the Get func
There is a misunderstanding of the interface. The getter is actually the constructor. The life-cycle of the returned clusters is owned by the manager (they are added as runnables). Hence, `List` returns names, not clusters. We should rather rename `Get` to `Create` or `Connect`.
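For illustration only, the renamed interface could look like the sketch below. `Connect` is merely the suggested name, not a decided API, and the `Option` type is a stand-in for whatever per-cluster options the proposal settles on:

```go
package provider

import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/cluster"
)

// Option is a placeholder for per-cluster construction options.
type Option func(*cluster.Options)

// Provider constructs clusters on demand. The manager owns the lifecycle of
// the returned clusters (it adds them as runnables), which is why List only
// returns names.
type Provider interface {
	// Connect constructs (or returns a previously constructed) cluster for
	// the given name.
	Connect(ctx context.Context, clusterName string, opts ...Option) (cluster.Cluster, error)
	// List returns the names of the currently known clusters.
	List(ctx context.Context) ([]string, error)
}
```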
designs/multi-cluster.md
Outdated
}
```

The `ctrl.Manager` will use the provider to watch clusters coming and going, and
I'll have to think about if and how this is doable, but ideally the "thing that comes and goes" wouldn't be typed to `cluster.Cluster` but can be anything, so this mechanism can also be used if folks have sources that are not kube watches.
Would this be mostly about a more generic name? (can't think of much that would work, maybe something like scope)
I think this is an interesting idea and I could see using it, I just have a question about some of the mechanics.
For context, I am investigating a cluster-api provider for karpenter and it would be nice to have the controllers discriminate between objects in the management cluster and objects in the workload clusters.
designs/multi-cluster.md
Outdated
### Examples

- Run a controller-runtime controller against a kubeconfig with arbitrarily many contexts, all being reconciled.
- Run a controller-runtime controller against cluster-managers like kind, Cluster-API, Open-Cluster-Manager or Hypershift.
Given the cluster-api example here, is the intention that controllers will be able to reconcile CRDs in clusters that they know about that may only exist in a subset of clusters (e.g. Machine objects in the management cluster but not in the workload cluster)?
Good point. I think that has to be possible. Otherwise we need all resources that we watch to exist in all clusters (especially a good point because today a controller crashes if a resource doesn't exist).
EDIT: Further down:
> For example, it can well be that every cluster has different REST mapping because installed CRDs are different. Without a context, we cannot return the right REST mapper.
Good point. Question is whether one would rather group them in managers such that every manager has a uniform set of clusters.
See my updated PR #2726. You can now opt into the provider and/or the default cluster per controller via options:

```go
// EngageWithDefaultCluster indicates whether the controller should engage
// with the default cluster of a manager. This defaults to false through the
// global controller options of the manager if a cluster provider is set,
// and to true otherwise. Here it can be overridden.
EngageWithDefaultCluster *bool

// EngageWithProvidedClusters indicates whether the controller should engage
// with the provided clusters of a manager. This defaults to true through the
// global controller options of the manager if a cluster provider is set,
// and to false otherwise. Here it can be overridden.
EngageWithProviderClusters *bool
```

There is no logic yet for a controller to decide whether to engage with a provider cluster or not; right now it engages with all of them. If the setup is more diverse, we might want such a functionality, e.g. some kind of pre-check: `ctrl.WantsToEngage(ctx, cluster) bool`.
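If such a pre-check were added, it could be as small as an optional interface that engage-able runnables may implement. This is purely hypothetical and not part of the current proposal:

```go
package multicluster

import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/cluster"
)

// engagementFilter is a hypothetical optional interface: a controller that
// implements it could veto engagement with a particular provider cluster
// before any watches are started for it.
type engagementFilter interface {
	WantsToEngage(ctx context.Context, cl cluster.Cluster) bool
}
```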
I'm still understanding the changes in #2726, but I think what you are saying here makes sense to me and would solve the issue.
> some kind of pre-check: `ctrl.WantsToEngage(ctx, cluster) bool`

+1, I think we definitely need some way for the client user to specify when it should check a specific cluster for a resource.
I somehow think it should be the author's and manager's responsibility (for now) to group them into groups which are working with the pattern. At this point, we don't know what we don't know. Once this is released, we can gather some feedback on edge cases and take it from there. I suspect the majority of use cases will still be single-cluster reconcile loops.
Maybe document this edge case and mark this feature overall as experimental? This way we're not committing to full production-level stability, and allow gathering more feedback.
For those reading, this is currently a little outdated. #2726 has a changed design proposed by @alvaroaleman. Will come back soon to both PRs.
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-triage-robot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

/reopen
We'd like to continue working on this, time is simply a bit scarce at the moment.

@embik: Reopened this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Hi, could you please share the future plans for this feature? Thank you!

/remove-lifecycle rotten
Signed-off-by: Marvin Beckers <[email protected]>
Signed-off-by: Marvin Beckers <[email protected]>
Signed-off-by: Marvin Beckers <[email protected]>
📖 Update multi-cluster proposal with new implementation details
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: sttts
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Signed-off-by: Dr. Stefan Schimanski <[email protected]>
Add notes about uniform controllers
Hi @alvaroaleman @sbueringer, we (@sttts and I) finally got around to refreshing this document. Could you please take a look and let us know what you think? The (new) implementation PR at #3019 is functional if you want to look at implementation details, but it's of course not finalised yet (e.g. missing tests).
// pkg/cluster
type Aware interface {
	// Engage gets called when the component should start operations for the given Cluster.
	// The given context is tied to the Cluster's lifecycle and will be cancelled when the
how would a context be selected from the kubeconfig passed to the controller?
Just to avoid any confusion, this description is talking about a `context.Context`, not a kubeconfig context.
But in general: this would be something to implement in a `Provider`; a kubeconfig provider could be a very simple implementation (although the focus is on dynamic providers, and a kubeconfig would probably be fairly static).
How that provider translates the `clusterName` parameter in the `Get` method (see the `Provider` interface above) to a kubeconfig context would be up to the implementation, but I could see the context name being the identifier here (since `Get` returns a `cluster.Cluster`, we need credentials for the embedded client, so a context makes a lot of sense here).
Does that make sense? 🤔
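To make that concrete, here is a minimal sketch of such a kubeconfig-backed provider, assuming the `Get`/`List` shape of the proposed `Provider` interface (options omitted for brevity); the struct and field names are illustrative, while the `clientcmd` and `cluster.New` calls are existing client-go/controller-runtime APIs:

```go
package kubeconfigprovider

import (
	"context"

	"k8s.io/client-go/tools/clientcmd"
	"sigs.k8s.io/controller-runtime/pkg/cluster"
)

// Provider treats every context in a kubeconfig file as a cluster; the
// clusterName passed to Get is interpreted as the kubeconfig context name.
type Provider struct {
	// Kubeconfig is the path to the kubeconfig file whose contexts become clusters.
	Kubeconfig string
}

// Get builds a cluster.Cluster for the kubeconfig context named clusterName.
func (p *Provider) Get(ctx context.Context, clusterName string) (cluster.Cluster, error) {
	cfg, err := clientcmd.NewNonInteractiveDeferredLoadingClientConfig(
		&clientcmd.ClientConfigLoadingRules{ExplicitPath: p.Kubeconfig},
		&clientcmd.ConfigOverrides{CurrentContext: clusterName},
	).ClientConfig()
	if err != nil {
		return nil, err
	}
	return cluster.New(cfg)
}

// List returns the context names found in the kubeconfig file.
func (p *Provider) List(ctx context.Context) ([]string, error) {
	raw, err := clientcmd.LoadFromFile(p.Kubeconfig)
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(raw.Contexts))
	for name := range raw.Contexts {
		names = append(names, name)
	}
	return names, nil
}
```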
Linked the updated implementation #3019.
+1 Highly appreciated in the name of @SAP & needed in the context of our open source efforts with our European partners.
+1 We are very excited about this and are looking forward to using it in our open source efforts with https://apeirora.eu/ @SAP
There are actually two subcases in here that we don't seem to be differentiating well:
a) Controllers whose `request` will always contain the cluster the event originated from
b) Controllers whose `request` may contain a cluster that is not the one the event originated from
I think you are mostly thinking about a) with this proposal. You write:
> We could deepcopy the builder instead of the sources and handlers. This would lead to one controller and one workqueue per cluster. For the reason outlined in the previous alternative, this is not desirable.

I am having trouble finding the "reasons outlined", and I do think instantiating a controller per cluster rather than adding/removing event sources is preferable whenever possible, because:
- Otherwise workqueue metrics get diluted; this is a big issue for ops, workqueue_depth is usually one of the main metrics people alert on
- It requires zero changes in existing reconcilers
I'd love to hear your thoughts on this and I do think we should clearly differentiate the two cases in this doc. What also seems to get lost a bit is that the main issue today is with doing all of this dynamically - pkg/cluster already exists; if the cluster set is static, this design adds very little.
Consequently, there is no multi-cluster controller ecosystem, but could and
should be.
- kcp maintains a [controller-runtime fork with multi-cluster support](https://github.com/kcp-dev/controller-runtime)
because adding support on-top leads to an inefficient controller design, and
> leads to an inefficient controller design

Could you elaborate on this point?
We explicitly don't want one controller (with its own workqueue) per cluster.
Example: forget about workspaces. Imagine controller-runtime only supported controllers per one (!) namespace, i.e. another controller with another namespace for every namespace you want to serve. Same argument here, just a level higher.
And independently you could imagine cases where the same is true e.g. for cluster-api cases where the workqueue should be shared. That's what this enhancement is enabling.
This is a decision that shouldn't be forced onto customers. I can see the case where a workqueue per cluster is desired as it provides some built in fairness
> We explicitly don't want one controller (with its own workqueue) per cluster.

Others might disagree. This question needs a proper evaluation with pro/cons of both approaches rather than jumping to conclusions.
I agree that both topologies can have their place. Am not even sure pro/con is helpful. We shouldn't be opinionated, but give the primitives for the developer to decide.
Copying my comment from here since this is the bigger thread:
Would you be comfortable with a way to tell this design to either use a shared queue (e.g. start sources) or to start controllers with a config switch [in the manager] or similar?
> I agree that both topologies can have their place. Am not even sure pro/con is helpful. We shouldn't be opinionated, but give the primitives for the developer to decide.

A stated goal of this doc is to "avoid divergence in the ecosystem"; writing that down while at the same time handwaving away comments about this not actually being a good approach and saying we shouldn't be opinionated is not particularly convincing.
Our goal is to make the majority use case simple and other use-cases possible. This is not possible if we refuse to even look into the question of what the majority use-case is and default to assuming that the use-case of the author of a design must be the majority use-case.
Fair enough. I didn't mean we shouldn't think about which workqueue topology is useful when. I meant that there are good reasons for a joint workqueue in some situations (like when I want to throttle all reconciles in a process because that throughput is limited), and independent ones in other situations (like when e.g. writes to a cluster are the limiting factor).
I played with a `TypedFair` queue:

```go
// Fair is a queue that ensures items are dequeued fairly across different
// fairness keys while maintaining FIFO order within each key.
type Fair TypedFair[any]

// FairnessKeyFunc is a function that returns a string key for a given item.
// Items with different keys are dequeued fairly.
type FairnessKeyFunc[T comparable] func(T) string

// NewFair creates a new Fair instance.
func NewFair(keyFunc FairnessKeyFunc[any]) *Fair {
	return (*Fair)(NewTypedFair[any](keyFunc))
}
```

that could be plugged in here, wrapped by throttling and delays.
object to these kind of changes.

Here we call a controller to be multi-cluster-compatible if the reconcilers get
reconcile requests in cluster `X` and do all reconciliation in cluster `X`. This
So "Start/Stop a controller for each cluster" is out of scope, this is purely about "Add/Remove sources to/from controller on cluster arrival/departure"?
Related to "one workqueue" I guess. Start/Stop means another workqueue, which we don't want.
```

`ctrl.Manager` will implement `cluster.Aware`. As specified in the `Provider` interface,
it is the cluster provider's responsibility to call `Engage` and `Disengage` on a `ctrl.Manager`
Doesn't that mean the current interface specification for `cluster.Provider` is insufficient, as this entails that the `cluster.Provider` needs a reference to the manager?
That's out of scope of the goal of the interface. Wiring the manager in will happen when starting the provider:

```
prov := NewProvider()
mgr := NewManager(..., Options: {provider: prov})
go mgr.Start(ctx)
go prov.Start(ctx, mgr)
```
	// The given context is tied to the Cluster's lifecycle and will be cancelled when the
	// Cluster is removed or an error occurs.
	//
	// Implementers should return an error if they cannot start operations for the given Cluster,
If it's non-blocking, how is the controller supposed to surface errors here?
The idea is that anything that needs to be done for starting operations on a cluster is "blocking" (and thus would return an error), but the operations on the engaged cluster themselves are not blocking.
Can you give an example?
I have some trouble understanding the difference between "anything that needs to be done for starting operations on a cluster" vs "operations on the engaged cluster", which both seem to be done or asynchronously triggered (maybe?) in the Engage func.
One of the things that happen in the prototype implementation (#3019) for the `typedMultiClusterController` is starting the new watches on the newly engaged cluster. So if that fails, `Engage` returns an error, but it's not blocking for processing items.
The multi-cluster controller implementation reacts to engaged clusters by starting
a new `TypedSyncingSource` that also wraps the context passed down from the call to `Engage`,
which _MUST_ be canceled by the cluster provider at the end of a cluster's lifecycle.
So you are saying the `ctx` passed by `cluster.Provider` when calling `Engage` on the manager needs to be stored by the manager and re-used when calling `Engage` on any multi-cluster runnable, which in turn needs to use it to control the lifecycle of the source? What is the point of having `Disengage` then?
I think you are right, and we can do without `Disengage`. I will try to change the prototype implementation to eliminate it.
This seems to imply that the context used to call Start() on a Source will be used to stop the Source?
I think stopping the Source is not possible by just canceling this context for our current source implementations (or at least some of them)
IIRC the only similar functionality that we have today is that cancelling the context passed into Cache.Start() will stop all informers.
I also wonder how this works when multiple controllers are sharing the same underlying informer. I think if a controller doesn't exclusively own an informer it also shouldn't just shut it down. Or is my assumption wrong that they usually would share informers? (like it works today when multiple controllers are sharing the same cache & underlying informers).
For additional context. If I apply what we currently do in Cluster API to this proposal, it would be the cluster provider that shuts down the cache and all underlying informers.
> I think stopping the Source is not possible by just canceling this context for our current source implementations (or at least some of them)

Am curious which you have in mind.
At least source.Kind. The ctx passed into Start can be used to cancel the start process (e.g. WaitForCacheSync) but not the informer
type TypedDeepCopyableEventHandler[object any, request comparable] interface {
	TypedEventHandler[object, request]
	DeepCopyFor(c cluster.Cluster) TypedDeepCopyableEventHandler[object, request]
Why teach handlers to copy themselves rather than just layering this?

```go
type HandlerConstructor[object any, request comparable] func(cluster.Cluster) TypedHandler[object, request]
```

(name likely needs improvement but you get the idea)
The reason for this was the attempt to keep existing function signatures stable while enabling them to be multi-cluster aware. A `HandlerConstructor` would probably need a new separate builder function to be passed as argument, so e.g. something `Watches` vs `ClusterAwareWatches` (or whatever).
I'm totally open to changing this if you prefer it.
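For reference, the layered alternative could look roughly like the sketch below. `HandlerConstructor` is the name floated above, and I use the existing `handler.TypedEventHandler` where the comment wrote `TypedHandler`, so treat the exact signature as an assumption rather than a settled API:

```go
package multicluster

import (
	"sigs.k8s.io/controller-runtime/pkg/cluster"
	"sigs.k8s.io/controller-runtime/pkg/handler"
)

// HandlerConstructor produces an event handler for a specific cluster. A
// cluster-aware builder method (e.g. a hypothetical ClusterAwareWatches)
// would invoke it once per engaged cluster, instead of deep-copying an
// existing handler via DeepCopyFor.
type HandlerConstructor[object any, request comparable] func(cluster.Cluster) handler.TypedEventHandler[object, request]
```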
The builder will choose the correct `EventHandler` implementation for both `For` and `Owns`
depending on the `request` type used.

With the described changes (use `GetCluster(ctx, req.ClusterName)`, making `reconciler`
a `TypedFunc[reconcile.ClusterAwareRequest]`) an existing controller will automatically act as
a *uniform multi-cluster controller* if a cluster provider is configured.
It will reconcile resources from cluster `X` in cluster `X`.
It is worth pointing out that this can also be achieved by instantiating a controller per target cluster rather than adding/removing sources to/from an existing controller.
IMHO if you ever actually want to operate the resulting component, you likely want "create/remove controller" rather than "create/remove source", because otherwise a single problematic cluster can completely mess up the workqueue metrics and on-calls can't tell if one cluster has an issue or all, which is going to be a big difference in terms of severity.
Would you be comfortable with a way to tell this design to either use a shared queue (e.g. start sources) or to start controllers with a config switch or similar?
	Owns(&v1.ReplicaSet{}).
	Complete(reconciler)
```
This doc is completely missing info on:
- The actual implementation of a multi-cluster controller (i.e. `Engage`/`Disengage` in the `Controller`) - we are not expecting users to do that, right?
- The same for `source.Source`, but arguably a subtask of the above
See implementation proposal #3019.
	EngageWithDefaultCluster:   ptr.To(true),
	EngageWithProviderClusters: ptr.To(false),
This double binary distinction is pretty much guaranteed to be too little. It would be better if we somehow tell the builder if this is a multi-cluster controller or not, and then the `cluster.Provider` calls `Engage` for all clusters that should be engaged, and it's up to the implementor of the provider if they want to include the cluster the manager has a kubeconfig for or not.
If this is insufficient, we need a way for the `cluster.Provider` to decide if a given `Aware` runnable should be engaged or not.
> it's up to the implementor of the provider

Hm, not sure if it should be up to the provider. Let's say I implement a cluster provider for Cluster API. I think it shouldn't be my call if all or none of the controllers that are used with this provider also watch the hub cluster or not.
I could make this a configuration option of the cluster provider, but this doesn't work because we only have one cluster provider per manager, and I think it's valid that only some controllers of a manager watch the hub cluster and others do not.
So I think it should be a per-controller decision.
Wondering if it would make sense to just always call Engage, and then the Engage function can just do nothing if it doesn't want to engage a Cluster. This seems the most flexible option.
(If necessary, Engage could have a bool return parameter signalling whether the controller actually engaged a cluster or not.)
## Alternatives

- Multi-cluster support could be built outside of core controller-runtime. This would
likely lead to a design with one manager per cluster. This has a number of problems:
No - this is precisely why `pkg/cluster` exists.
I'm curious about details why it is not possible to implement this outside of CR. (I got the points around adoption across the ecosystem, just wondering about the technical reasons)
We have implemented a component (called ClusterCache) in Cluster API that seems to come very close to what this design is trying to achieve (apart from that it is Cluster API specific of course). Especially since the generic support was added to CR.
Basically ClusterCache in CAPI:
- discovers Clusters
- maintains a Cache per Cluster
- allows retrieving Clients for a Cluster
- allows adding Watches (kind Sources) for a Cluster
- this also allows mapping events that we get from these sources back to the one controller with the one shared work queue
xref: https://github.com/kubernetes-sigs/cluster-api/tree/main/controllers/clustercache
P.S. we are not creating multiple Cluster objects, instead we have our own simplified version that only contains what we need (https://github.com/kubernetes-sigs/cluster-api/blob/main/controllers/clustercache/cluster_accessor.go#L85-L89)
P.S.2. I don't understand the last two points in this list
The main "blocker" for this is that the builder is pretty common in third-party controller code. If we do all of this outside of CR, this will very likely mean a fork of `pkg/builder`. I think everything else in the implementation PR could be done outside of CR.
- writing controllers for upper systems in a **portable** way is hard today.
Consequently, there is no multi-cluster controller ecosystem, but could and
should be.
- kcp maintains a [controller-runtime fork with multi-cluster support](https://github.com/kcp-dev/controller-runtime)
On a general note, for the purpose of this proposal we should focus on general controller-runtime users, while we can keep kcp as a reference along with other implementations. I'd rephrase the motivation at a high level: "set up controllers and watches across multiple Kubernetes clusters in a transparent way".
I think we tried to cover the high-level part with the first bullet point. The kcp controller-runtime fork is just mentioned to give personal motivation, but I don't think we have to mention it here if that is preferred.
- Run a controller-runtime controller against a kubeconfig with arbitrarily many contexts, all being reconciled.
- Run a controller-runtime controller against cluster managers like kind, Cluster API, Open-Cluster-Manager or Hypershift.
- Run a controller-runtime controller against a kcp shard with a wildcard watch.
Would it be possible to focus the first iteration of this proposal on how Kubernetes works today? Adding uncommon use cases at this point in time increases the overall complexity of the implementation. Other use cases should be pluggable imo.
We agree things should be highly pluggable, that's why this is just an (incomplete) list of things that you could eventually plug in. Agreed that kcp is an uncommon use case, but so far we've made (recent) design decisions with Kubernetes clusters in general in mind. The "kcp" provider we'd like to build is itself far from ready yet.
That wouldn't be helpful for us though. I don't think the design now is influenced much by the kcp requirements, maybe with the exception of the shared workqueue. Other than that, the fleet-namespace example (which kind of reflects the kcp requirements) shows that the kcp use-case can be covered by a pretty generic design.
out-of-tree subprojects that can individually evolve and be vendored by controller authors.
- Make controller-runtime controllers "binary pluggable".
- Manage one manager per cluster.
- Manage one controller per cluster with dedicated workqueues.
- Manage one controller per cluster with dedicated workqueues.
This should be a goal
type Provider interface {
	// Get returns a cluster for the given identifying cluster name. Get
	// returns an existing cluster if it has been created before.
	Get(ctx context.Context, clusterName string) (Cluster, error)
The second argument should probably be a typed reference, like we have for `ObjectReference`. Even if it contains a single `Name` field, it would help with expanding it later, wdyt?
Are you thinking of `logical.Name` here, or more of a struct?
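Purely as an illustration of the struct variant (neither the name nor the fields are settled anywhere in this proposal), such a typed reference could start out as small as this and still leave room to grow:

```go
package cluster

// Reference identifies a cluster known to a Provider. Starting with a single
// Name field keeps it close to today's plain string while allowing additional
// identifying fields to be added later without breaking signatures.
type Reference struct {
	// Name is the provider-scoped cluster name.
	Name string
}

// String makes the reference easy to log and to compare against plain names.
func (r Reference) String() string { return r.Name }
```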
The embedded `cluster.Cluster` corresponds to `GetCluster(ctx, "")`. We call the
clusters with non-empty name "provider clusters" or "engaged clusters", while
the embedded cluster of the manager is called the "default cluster" or "hub
cluster".
Let's go with being explicit here; one of the main issues is that an empty string is also a default value. We can set a `cluster.Reference` to a value that's very specific, which in turn is used across the entire codebase.
// pkg/reconcile
type ClusterAwareRequest struct {
	Request
	ClusterName string
Suggested change: `ClusterName string` → `Cluster cluster.Reference`
can be used as `request` type even for controllers that do not have an active cluster provider.
The cluster name will simply be an empty string, which is compatible with calls to `mgr.GetCluster`.

**Note:** controller-runtime must provide this cluster-aware request type to
Nit: we're saying `must` here but `SHOULD` in the text right underneath the "Cluster-Aware Request" title.
	Complete(reconciler)

// new
builder.TypedControllerManagedBy[reconcile.ClusterAwareRequest](mgr).
This is a bit too easy to mess up. What's stopping a `reconcile.ClusterAwareRequest` being used in the wrong place, or vice-versa?
In general: `reconcile.ClusterAwareRequest` could totally be used in a non-multi-cluster setup and it wouldn't change a thing, since it embeds a `reconcile.Request`. If the `ClusterName` field is empty, `req.ClusterName` would imply the "default cluster", which is the single-cluster use-case today (e.g. we changed the `fleet` example in #3019 to have a flag that lets you toggle multi-cluster vs single-cluster usage).
If you end up using `reconcile.Request`, you would quickly notice that you don't have the cluster name to pass to `mgr.GetCluster`.
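To illustrate the shape of such a reconciler: the request type and the manager's `GetCluster` method are the *proposed* APIs from this design and are stubbed as local placeholders below, so only the body of `Reconcile` is the point, not the names:

```go
package sketch

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	"sigs.k8s.io/controller-runtime/pkg/cluster"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// clusterAwareRequest mirrors the proposed reconcile.ClusterAwareRequest.
type clusterAwareRequest struct {
	reconcile.Request
	ClusterName string // empty string means the default ("hub") cluster
}

// clusterGetter mirrors the proposed manager method mgr.GetCluster.
type clusterGetter interface {
	GetCluster(ctx context.Context, clusterName string) (cluster.Cluster, error)
}

type reconciler struct {
	mgr clusterGetter
}

// Reconcile resolves the cluster named in the request and reconciles there,
// which is what the doc calls a "uniform multi-cluster controller".
func (r *reconciler) Reconcile(ctx context.Context, req clusterAwareRequest) (reconcile.Result, error) {
	cl, err := r.mgr.GetCluster(ctx, req.ClusterName)
	if err != nil {
		return reconcile.Result{}, err
	}
	deploy := &appsv1.Deployment{}
	if err := cl.GetClient().Get(ctx, req.NamespacedName, deploy); err != nil {
		return reconcile.Result{}, err
	}
	// ... reconcile deploy in the same cluster the event came from ...
	return reconcile.Result{}, nil
}
```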
For a manager with `cluster.Provider`, the builder _SHOULD_ create a controller
that sources events **ONLY** from the provider clusters that got engaged with
the controller.
Does this mean that we won't get events for the "default" cluster?
It can be configured per controller. We default to writing a uniform multi-cluster controller, i.e. one that only reacts to provider clusters. It is not common afaik to have the same semantics for both a local cluster (usually the hub) and provider clusters.
- Ship integration for different multi-cluster setups. This should become
out-of-tree subprojects that can individually evolve and be vendored by controller authors.
- Make controller-runtime controllers "binary pluggable".
What does "binary pluggable" mean in this context?
Something like https://pkg.go.dev/plugin to dynamically load providers.
### Controller Author without self-interest in multi-cluster, but open for adoption in multi-cluster setups

- Replace `mgr.GetClient()` and `mgr.GetCache()` with `mgr.GetCluster(req.ClusterName).GetClient()` and `mgr.GetCluster(req.ClusterName).GetCache()`.
- Make manager and controller plumbing vendorable to allow plugging in a multi-cluster provider and BYO request type.
Is the idea that something like cert-manager would decide on startup which cluster provider should be used and can then only work with one cluster provider at a time?
Phrased differently. Do we also want to support using multiple cluster providers at the same time?
- One provider per manager at a time.
- Low friction to make reconcilers uniform-multi-cluster capable (this basically means using the cluster-enabled request and calling `mgr.GetCluster(name)` instead of accessing cluster methods directly).
- If a controller project wants to add support for a number of providers in their repository, this is fine, but not necessarily the goal.
- Instead, it should be easy to instantiate the controllers from an alternative main.go with a provider of your choice.
I already discussed this with @sbueringer and others: I went ahead earlier this week to implement the design outside of controller-runtime: https://github.com/multicluster-runtime/multicluster-runtime. As written in the upper comment, everything but the builder is easy to implement by wrapping controller-runtime concepts. Thanks to the extensive work to get Go generics into CR, those wrappers are easy and natural. The builder consists of roughly 500 lines of code with mostly mechanical changes, which should be feasible to regularly rebase onto the latest state in controller-runtime.
Our plan for the moment is to use https://github.com/multicluster-runtime/multicluster-runtime to prove the design with non-trivial projects. Last but not least, it will help to iterate fast and learn on the way. It is not ruled out that parts of it should flow back into controller-runtime at some point. Definitely it can and should influence the design.
Controller-runtime today only allows writing controllers against a single cluster.
Multi-cluster use-cases require the creation of multiple managers and/or cluster
objects. This proposal is about adding native support for multi-cluster use-cases
to controller-runtime.
The proposed changes are prototyped in #3019.