Zuul 3 Improvements #771

Closed
13 tasks
carl-mastrangelo opened this issue Apr 11, 2020 · 7 comments
Labels
3.x Zuul 3 API breaking Breaks public API Stale
Milestone

Comments

@carl-mastrangelo
Contributor

carl-mastrangelo commented Apr 11, 2020

Some API improvements I'd like to see in Zuul 3:

  • Remove Hard Dependency on Guice / Governator.
  • Remove Hard Dependency on Groovy, but provide adapter for loading Groovy filters
  • Support Dagger 2 injection.
  • Split zuul-core into zuul-core and zuul-api.
  • Use standardized error types (Canonical Status codes) internally, with sub error codes attached
  • Make Session Context type safe, using Key types for accessors
  • Drop RX as an API type. Use Netty Promises / Futures on the API.

Some stretch API goals:

  • Use Netty 5 API. While unstable, this could be confined to the zuul-core package which doesn't provide semantic version stability

Implementation goals for Zuul 3:

  • Make internals HTTP/2 stream oriented. Upgrade H1 connections rather than downgrading H2 conns.
  • Make Name resolver for origins pluggable. (Factor out the Eureka dependency.)
  • Make Load Balancer for origins pluggable. (Factor out Ribbon dependency, as it is no longer maintained.)
  • Move away from Servo to Spectator.
  • Move from Archaius 0.7 to Archaius 2 (inject oriented)

The API package will have semantic versioning guarantees. This will include the pluggable parts of Zuul, such as the Filters, Session Context, and Request/Response types.
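
To make the "type safe Session Context, using Key types for accessors" item concrete, here is a minimal sketch of the idea. It is not an agreed design: `ContextKey`, `TypedContext`, and the two example keys are hypothetical names invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical typed key. Equality is by identity, so two keys that share a name never collide. */
final class ContextKey<T> {
    private final String name;
    private final Class<T> type;

    private ContextKey(String name, Class<T> type) {
        this.name = Objects.requireNonNull(name, "name");
        this.type = Objects.requireNonNull(type, "type");
    }

    static <T> ContextKey<T> of(String name, Class<T> type) {
        return new ContextKey<>(name, type);
    }

    Class<T> type() {
        return type;
    }

    @Override
    public String toString() {
        return "ContextKey(" + name + ")";
    }
}

/** Hypothetical context whose accessors are checked by the compiler instead of cast by the caller. */
final class TypedContext {
    private final Map<ContextKey<?>, Object> values = new HashMap<>();

    <T> void put(ContextKey<T> key, T value) {
        values.put(key, Objects.requireNonNull(value, "value"));
    }

    /** Returns the stored value, or null if the key is absent. */
    <T> T get(ContextKey<T> key) {
        return key.type().cast(values.get(key));
    }
}

final class TypedContextDemo {
    static final ContextKey<Integer> RETRY_COUNT = ContextKey.of("retryCount", Integer.class);
    static final ContextKey<String> ROUTE_VIP = ContextKey.of("routeVip", String.class);

    public static void main(String[] args) {
        TypedContext ctx = new TypedContext();
        ctx.put(RETRY_COUNT, 2);
        ctx.put(ROUTE_VIP, "api-vip");
        int retries = ctx.get(RETRY_COUNT); // no cast at the call site
        String vip = ctx.get(ROUTE_VIP);
        System.out.println(retries + " " + vip);
        // ctx.put(RETRY_COUNT, "two");     // would be rejected by the compiler
    }
}
```

The point of identity-based keys is that a filter can only read a value if it holds the key object, which also makes it easier to trace who reads and writes each entry.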

@carl-mastrangelo carl-mastrangelo added API breaking Breaks public API 3.x Zuul 3 labels Apr 11, 2020
@carl-mastrangelo carl-mastrangelo added this to the Zuul 3 milestone Apr 11, 2020
@artgon
Contributor

artgon commented Apr 11, 2020

Additional things to think about:

  • Admin console replacements
  • Spring integration option (may solve the above issue)
  • Connection pool efficiency improvements
  • Declarative routing spec, i.e. make routing a first-class citizen
  • Propagating cancellations from client to origin channel

Features to migrate to OSS:

  • SNI on server
  • H2/gRPC proxying
  • Rate limiting

@artgon artgon pinned this issue Apr 11, 2020
@artgon
Contributor

artgon commented Apr 11, 2020

Slightly more controversial one:

  • Separate proxying logic from filter chain to make the filters optional/pluggable

@argha-c
Collaborator

argha-c commented Apr 15, 2020

Additional considerations

  • Support for pluggable load balancing implementation for origin selection
  • Support for pluggable load shedding implementation for the Zuul server
  • Pluggable service discovery

+1 to some of the above:

  • Drop RX as an API type. Use Netty Promises / Futures on the API
  • Make Session Context type safe, using Key types for accessors
  • Make internals HTTP/2 stream oriented. Upgrade H1 connections rather than downgrading H2 conns.
  • SNI on server
  • H2/gRPC proxying
  • Propagating cancellations from client to origin channel (would be really nice to have!)

@carl-mastrangelo
Contributor Author

One of the design decisions that needs to be revisited is how Name Resolution (NR) works. In the current architecture, Zuul creates a client-side load balancer (CSLB) object with the name of the service it wants to balance traffic to. This name is typically called the "vip", but may also be a DNS name. The CSLB then creates a Eureka/Discovery client, which resolves the IP addresses and metadata about the service and asynchronously feeds these into the LB. When Zuul wants to connect, it queries the CSLB for a backend (called a "Server") and creates the connection if it is absent. The traffic sent to this server is then tallied in a "ServerStats" object shared with the LB, which is used to pick the next server object for use.

There are several problems with this approach. The Name resolver is tightly coupled with the load balancer. There is no separation between the LB and the NR, so they cannot be exchanged. The load balancer is entirely in charge of the async updates to the server set, preventing any visibility into the name choices. When Zuul gets odd or seemingly impossible IP addresses back, we don't know if it's a bug in the Name resolver, the load balancer, or Zuul itself.

Another problem is the lack of name resolver flexibility. The Eureka client (NR) being used is unlikely to support other, more modern forms of name resolution. We would like to explore using EDS (of the xDS protocol family), but this is currently infeasible: because the NR and the LB have to be swapped out together, the amount of work is effectively doubled.
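
One possible shape for the separation, sketched under the assumption that the resolver pushes address sets to a listener and the load balancer only ever sees what the proxy hands it. All names here (`NameResolver`, `LoadBalancer`, `StaticNameResolver`, `RoundRobinLoadBalancer`, `OriginChannel`) are hypothetical and do not correspond to existing Zuul, Ribbon, or Eureka types.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Hypothetical: resolves a service name and pushes each new address set to a listener. */
interface NameResolver {
    void start(String serviceName, Consumer<List<String>> listener);
}

/** Hypothetical: picks one address from whatever set it was last given. */
interface LoadBalancer {
    void updateAddresses(List<String> addresses);

    String pick();
}

/** Stand-in for a Eureka, DNS, or EDS resolver: reports a fixed list once. */
final class StaticNameResolver implements NameResolver {
    private final List<String> addresses;

    StaticNameResolver(List<String> addresses) {
        this.addresses = new ArrayList<>(addresses);
    }

    @Override
    public void start(String serviceName, Consumer<List<String>> listener) {
        listener.accept(Collections.unmodifiableList(addresses));
    }
}

final class RoundRobinLoadBalancer implements LoadBalancer {
    private volatile List<String> addresses = Collections.emptyList();
    private final AtomicInteger next = new AtomicInteger();

    @Override
    public void updateAddresses(List<String> addresses) {
        this.addresses = Collections.unmodifiableList(new ArrayList<>(addresses));
    }

    @Override
    public String pick() {
        List<String> snapshot = addresses;
        if (snapshot.isEmpty()) {
            throw new IllegalStateException("no addresses resolved yet");
        }
        return snapshot.get(Math.floorMod(next.getAndIncrement(), snapshot.size()));
    }
}

/** The proxy owns the wiring, so it sees every resolver update before the LB does. */
final class OriginChannel {
    private final LoadBalancer loadBalancer;
    private final List<List<String>> observedUpdates = new CopyOnWriteArrayList<>();

    OriginChannel(String serviceName, NameResolver resolver, LoadBalancer loadBalancer) {
        this.loadBalancer = loadBalancer;
        resolver.start(serviceName, resolved -> {
            observedUpdates.add(resolved); // visibility into what the resolver returned
            loadBalancer.updateAddresses(resolved);
        });
    }

    String pickAddress() {
        return loadBalancer.pick();
    }

    List<List<String>> observedUpdates() {
        return observedUpdates;
    }
}
```

With this split, swapping Eureka for another resolver would mean writing one new `NameResolver` and leaving the load balancer untouched, and an odd address can be traced to either the recorded resolver update or the pick.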

The Eureka data objects have a number of problems too. They are oriented around the "IP" address as the identity of a server. Modern servers have many IP addresses, usually one IPv4 address and multiple IPv6 addresses. When Zuul needs to connect, it must pick one of these. However, since all bookkeeping about load balancing and health is oriented around a single IP address, Zuul needs to keep the canonical IP around. We claim we sent traffic to a particular IP for load balancing, for stats, and for monitoring, but in reality we connected elsewhere. This gets even more puzzling when logging happens, because we need to NOT use the canonical address then.

In a similar vein to the previous problem, modern servers have multiple ports, with multiple protocols. This greatly complicates connection logic, because the IP address and port are picked by the LB rather than Zuul. A recent endeavor to turn on SSL automatically reveals this challenge. Because we decide which address to use late in the load balancing phase, we don't know if SSL is possible until much later. We would prefer to pick addresses which advertise SSL, and only fall back to plaintext when necessary. This is not possible today. Any form of IP address selection or filtering happens too late in the connection logic; i.e. after the load balancing decision.
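
A rough sketch of what "prefer addresses which advertise SSL, fall back to plaintext" could look like if a server were identified by an opaque id and carried all of its endpoints, so that Zuul rather than the LB chooses the address. `Endpoint`, `Origin`, and `EndpointSelector` are hypothetical types made up for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

/** Hypothetical: one reachable socket address of a server and whether it advertises TLS. */
final class Endpoint {
    final String host;
    final int port;
    final boolean tls;

    Endpoint(String host, int port, boolean tls) {
        this.host = host;
        this.port = port;
        this.tls = tls;
    }
}

/** Hypothetical: a server identified by an opaque id instead of a single canonical IP. */
final class Origin {
    final String id;
    final List<Endpoint> endpoints;

    Origin(String id, List<Endpoint> endpoints) {
        this.id = id;
        this.endpoints = Collections.unmodifiableList(new ArrayList<>(endpoints));
    }
}

final class EndpointSelector {
    private EndpointSelector() {}

    /** Returns the first TLS endpoint, else the first plaintext one, else empty. */
    static Optional<Endpoint> choose(Origin origin) {
        Endpoint plaintextFallback = null;
        for (Endpoint candidate : origin.endpoints) {
            if (candidate.tls) {
                return Optional.of(candidate);
            }
            if (plaintextFallback == null) {
                plaintextFallback = candidate;
            }
        }
        return Optional.ofNullable(plaintextFallback);
    }
}
```

In this arrangement the load balancer would pick an `Origin` and keep its stats keyed by `id`, while the address actually dialed is known to the connection logic and can be logged as such.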

@carl-mastrangelo
Contributor Author

Another change to make: We need to avoid using IClientConfig as the means for configuration of origin connections. It is currently a map of concatenated strings, which can be unbounded in size. This makes it hard to trace where values are passed along, and what values may be in there. This was originally used to integrate with the Properties configuration system (in wide use at the time), but the need for this has slowly been declining. It would be better to have a well-defined set of configuration, which can be audited and validated before use. We can make adapters on API boundaries to make this transition.
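
For illustration only, a well-defined configuration object with an adapter at the boundary might look like the sketch below. The class name, the three settings, the string keys, and the default values are all assumptions made up for the example; none of them are real IClientConfig keys.

```java
import java.time.Duration;
import java.util.Map;

/** Hypothetical fixed set of origin connection settings, validated on construction. */
final class OriginConnectionConfig {
    final Duration connectTimeout;
    final int maxConnectionsPerHost;
    final boolean tlsEnabled;

    OriginConnectionConfig(Duration connectTimeout, int maxConnectionsPerHost, boolean tlsEnabled) {
        if (connectTimeout == null || connectTimeout.isNegative() || connectTimeout.isZero()) {
            throw new IllegalArgumentException("connectTimeout must be positive");
        }
        if (maxConnectionsPerHost < 1) {
            throw new IllegalArgumentException("maxConnectionsPerHost must be at least 1");
        }
        this.connectTimeout = connectTimeout;
        this.maxConnectionsPerHost = maxConnectionsPerHost;
        this.tlsEnabled = tlsEnabled;
    }

    /**
     * Adapter for the API boundary: reads a legacy string map once and fails fast on bad values.
     * The key names and defaults are invented for this sketch.
     */
    static OriginConnectionConfig fromLegacyProperties(Map<String, String> props) {
        return new OriginConnectionConfig(
                Duration.ofMillis(Long.parseLong(props.getOrDefault("connectTimeoutMs", "500"))),
                Integer.parseInt(props.getOrDefault("maxConnectionsPerHost", "50")),
                Boolean.parseBoolean(props.getOrDefault("tlsEnabled", "false")));
    }
}
```

Everything past the adapter then works with typed fields, so the set of possible settings is bounded and can be audited in one place.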


This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale label Sep 15, 2024

This issue was closed because it has been stalled for 7 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Sep 22, 2024