Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xdsclient: reduce the number of ADS requests #7473

Open
easwars opened this issue Aug 1, 2024 · 1 comment
Open

xdsclient: reduce the number of ADS requests #7473

easwars opened this issue Aug 1, 2024 · 1 comment
Assignees
Labels
Area: xDS Includes everything xDS related, including LB policies used with xDS. Type: Feature New features or improvements in behavior

Comments

@easwars
Copy link
Contributor

easwars commented Aug 1, 2024

The xDS client watch API looks like this:

WatchResource(rType xdsresource.Type, resourceName string, watcher xdsresource.ResourceWatcher) (cancel func())

This means that if a user of the API wants to register watches on N resources, it has to invoke the API N times. This is a common scenario in many use-cases (examples include xDS resolver, CDS LB policy, xDS enabled gRPC server etc).

The watch API implementation inside the xDS client delegates the watch to the appropriate authority here:

func (a *authority) watchResource(rType xdsresource.Type, resourceName string, watcher xdsresource.ResourceWatcher) func() {

This results in the authority asking the underlying transport to send the request out:

a.transport.SendRequest(rType.TypeURL(), resourcesToRequest)

The implement in the Transport queues this request here:

func (t *Transport) SendRequest(url string, resources []string) {

This is eventually processed by the send goroutine and the request is sent out on the wire:

func (t *Transport) send(ctx context.Context) {

So, if we have a case where an entity watching an LDS resource receives a response that contains N RDS resources, it will end up making N watch API calls, one for each resource, and this will eventually result in N requests on the ADS stream. Each of these requests will contain one more resource compared to the previous one. The situation can be worsened in the case where the LDS resource pointed to K RDS resources and was changed to point to N new RDS resources. So, this would end up in K+N requests on the ADS stream.

While this behavior is suboptimal, it needs to be noted that:

  • there is no correctness issue here
  • this is still not violating the xDS transport protocol

There are a few things we can try to make this better though:

  • Support a transaction like semantics in the xDS client API. With this approach, a user would do the following:
    • Start transaction with a new API call.
    • Register a bunch of watches, and/or unregister a bunch of watches (with the existing APIs).
    • Stop the transaction with a new API call.
    • The xDS client will wait for the transaction to complete before sending the resources on the wire.
    • Cons of this approach:
      • This puts the onus on the caller to not perform any expensive operations including I/O during the time the transaction is open. It also puts the onus on the caller to close the transaction in time. Optionally, the xDS client could treat the transaction as advisory and if it is not closed within a pre-defined amount of time, could decide to send outstanding resource requests on the wire.
      • Increases the API surface of the xDS client.
      • The xDS client is shared between multiple entities within a single gRPC channel. It is also shared across multiple gRPC servers. One misbehaving entity can affect others in this approach.
  • Avoid any API changes, and limit the changes to the authority and transport packages. With this approach:
    • Instead of the authority sending the list of resources to the transport to send out, it will simply send a message to the transport saying something needs to be sent out.
      • The authority already maintains a list of resources requested by watchers
      • The transport already maintains a list of resources sent out on the ADS stream, i.e requested from the server
    • So, when the transport sees the message from the authority, it will query the authority to get the most recent list of resources to send out. If the most recent list matches what the transport has already sent out, the operation will be a no-op.
@easwars easwars added the Type: Feature New features or improvements in behavior label Aug 1, 2024
@dfawley dfawley added the P1 label Aug 6, 2024
@easwars easwars removed their assignment Aug 30, 2024
@arjan-bal arjan-bal added the Area: xDS Includes everything xDS related, including LB policies used with xDS. label Sep 4, 2024
@dfawley dfawley removed the P1 label Sep 10, 2024
@dfawley
Copy link
Member

dfawley commented Sep 10, 2024

This should get fixed by @easwars's upcoming refactoring.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: xDS Includes everything xDS related, including LB policies used with xDS. Type: Feature New features or improvements in behavior
Projects
None yet
Development

No branches or pull requests

3 participants