Connect to multiple instances (clusters) at the same time #314

Open
sagikazarmark opened this issue Mar 25, 2021 · 11 comments

Comments

@sagikazarmark

From time to time I see feature requests in software consuming LDAP to add cluster support. I assume it's useful in environments when there is no load balancer in front of the different LDAP servers?

Anyway, I was wondering if it would be doable at the LDAP client level, or if there is an existing implementation somewhere. A couple of searches in the repo and issues yielded no results.

Here is an example of those requests: dexidp/dex#1904

I'm not particularly a fan of the implementation in this PR and I'd like to find a better/lower-level solution if possible.

Any ideas? Thanks in advance!

@stefanmcshane
Contributor

Hi @sagikazarmark,
To my knowledge, we don't currently support anything like this, as it would effectively mean connecting to multiple LDAP servers.
Whilst it doesn't follow the LDAP spec, for ease of use of the library going forward I would be open to an extra function for connecting to multiple servers; however, this would also need to include a plan for failover for each connection, including retries and timeouts.

Have you any ideas on what you would like to see in this @sagikazarmark @johnweldon?

On an initial pass, I would think that we have a Cluster struct, which could contain either a slice of *Conn structs or a map that allows us to refer to a specific server in the cluster.
We would then need an internal retryOnNextConn method which could pass the request on to the next connection.
I'm open to suggestions, and working with you to get this implemented.
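A minimal sketch of that shape, assuming the hypothetical names from this comment (Cluster, retryOnNextConn) and a toy stand-in for the library's *ldap.Conn, since none of this exists in go-ldap today:

```go
package main

import (
	"errors"
	"fmt"
)

// Conn is a toy stand-in for *ldap.Conn; the alive flag simulates
// whether the underlying server is reachable.
type Conn struct {
	addr  string
	alive bool
}

func (c *Conn) search(query string) (string, error) {
	if !c.alive {
		return "", errors.New("connection down: " + c.addr)
	}
	return "result from " + c.addr + " for " + query, nil
}

// Cluster keeps an ordered slice of connections, as suggested above;
// a slice (rather than a map) preserves the failover order.
type Cluster struct {
	conns []*Conn
}

// retryOnNextConn tries each connection in order until one succeeds,
// returning the last error if every connection fails.
func (cl *Cluster) retryOnNextConn(query string) (string, error) {
	var lastErr error
	for _, c := range cl.conns {
		res, err := c.search(query)
		if err == nil {
			return res, nil
		}
		lastErr = err
	}
	return "", fmt.Errorf("all connections failed: %w", lastErr)
}

func main() {
	cl := &Cluster{conns: []*Conn{
		{addr: "ldap1.example.com:389", alive: false},
		{addr: "ldap2.example.com:389", alive: true},
	}}
	res, err := cl.retryOnNextConn("(uid=alice)")
	fmt.Println(res, err)
}
```

A real version would also need per-attempt timeouts and a decision about which errors are retryable, as noted above.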

@sagikazarmark
Author

@stefanmcshane glad to hear it. I think it would help consumers of this library facing similar use cases.

I'm not intimately familiar with the library, but here are a couple thoughts:

  • This is essentially client side load balancing, so I would treat it as such (load balancing algorithm, etc)
  • I probably wouldn't use a map, because it's unordered
  • when picking a connection (Conn, which is a slightly confusing name in my opinion, because it's more like a client than a connection), the algorithm should check whether the connection is actually alive (if that's even possible in this case) so that it always picks an available server; this is to make sure that a failover event doesn't cause cascading failures in applications

Those are my initial thoughts about the problem. We can probably discuss retries and timeouts in more detail (for example, when and how should requests be retried? Should the implementation move on to the next connection until it runs out of connections?)

@stefanmcshane
Contributor

Thanks @sagikazarmark for the suggestions.

The way I like to think of Conn vs Client is akin to how Postgres does it, in that there can be many connections for a given client. When implementing against this library, I tend to call the implementing package the client, as that's usually where I set up retries or the various connections. Granted, this is a personal opinion and wasn't necessarily the consideration when @johnweldon started the package.

  • This is essentially client side load balancing, so I would treat it as such (load balancing algorithm, etc)

I agree with this. Do you know of any libraries that implement something similar/implement this in a way that you think is seamless? If you want to suggest a draft PR on what you believe would be a desirable user-facing experience, that would also be helpful in the design decisions.

  • I probably wouldn't use a map, because it's unordered

Whilst this is true and would hurt retries (try-next, for example), it could be useful on the assumption that the user will want to change the primary connection for a given request type. An example here could be a globally deployed platform: they might want to add a new user, which is seen in the US first, without waiting on their replication strategy kicking in. I suspect that in a naive implementation we would have to set up a map of the given hosts to connect to, as well as a slice over the map keys to order them.

  • the algorithm should check if the connection is actually alive (if that's even possible in this case) to always pick an available server (that is to make sure that a failover event doesn't cause cascading failures in applications)

My initial thought would be to implement a basic round robin, but expose a way for the user to implement their own retry/timeout behaviour.

Let me know what you think. If you're up for collaborating on this one, I'd appreciate that also.

@sagikazarmark
Author

@stefanmcshane

I agree with this. Do you know of any libraries that implement something similar/implement this in a way that you think is
seamless?

I think a naive round robin can be implemented as just a counter (i.e. a uint32) that you keep increasing atomically; the counter modulo the number of connections chooses the next connection.
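The counter-modulo idea is a few lines of Go with sync/atomic (the function name nextIndex is just for illustration):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// nextIndex implements the atomic counter round robin described above:
// each call advances a shared uint32 counter and maps it onto one of
// n connections. Safe to call from multiple goroutines.
func nextIndex(ctr *uint32, n int) int {
	// AddUint32 returns the new value, so subtract 1 to start at index 0.
	return int((atomic.AddUint32(ctr, 1) - 1) % uint32(n))
}

func main() {
	var ctr uint32
	for i := 0; i < 5; i++ {
		fmt.Println(nextIndex(&ctr, 3)) // cycles 0, 1, 2, 0, 1
	}
}
```

Note the counter eventually wraps around at the uint32 boundary, which briefly skews the distribution unless n is a power of two; for a handful of LDAP servers that is harmless.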

Whilst this is true and would hurt retries as an example (try next), it could be useful on the assumption that the user will want to change the primary connection on a given request type. An example here could be on a globally deployed platform, they might want to add a new user, which is seen in US first, without waiting on their replication strategy kicking in.

As long as the load balancing algorithm is explicit and doesn't depend on the randomness of map iteration order, I think the internal data structure is less important.

BTW, that scenario sounds like a different type of retry. For example, instead of choosing a specific connection, I'd implement a weighted list, prioritizing the closest server first and the primary later; in that case, retrying the query happens when the closest server returns an empty result, not when an error is returned. I'm not sure that makes sense for LDAP, but it does make sense for tackling replication issues. But again, different story.
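The fall-through-on-empty-result idea could be sketched like this (server and searchPreferClosest are hypothetical names; a real version would issue LDAP searches instead of reading a field):

```go
package main

import "fmt"

// server is a toy stand-in for an LDAP replica; results simulates what
// a search against that replica would return.
type server struct {
	name    string
	results []string
}

// searchPreferClosest walks a closest-first ordered list and falls
// through to the next replica when the result set is empty (not only on
// error), covering the lagging-replication case described above.
func searchPreferClosest(ordered []*server) []string {
	for _, s := range ordered {
		if len(s.results) > 0 {
			return s.results
		}
	}
	return nil
}

func main() {
	ordered := []*server{
		{name: "eu-replica"}, // entry not replicated here yet
		{name: "us-primary", results: []string{"cn=alice"}},
	}
	fmt.Println(searchPreferClosest(ordered))
}
```

As the comment says, whether an empty LDAP result should trigger a fall-through is debatable; an empty result is often the correct answer.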

My initial thought at this would be to implement a basic round-robin, but expose a way that the user can implement their own retry/timeout.

I'd start with something stupid simple and add configuration later when some feedback arrives.

If you're up for collaborating on this one, I'd appreciate that also.

TBH I'm not very familiar with LDAP or this library, so I'm happy to discuss design, test the implementation in Dex or review code even, but I'd leave the implementation to someone more familiar with the library and LDAP.

@johnweldon
Member

I like the collaboration and discussion here; it sounds like it's heading in the right direction.

I'd like to clarify that I didn't actually start this project; I just moved it to a more canonical name and have tried to support it a little over the years. I believe the original author is @mmitton, and https://github.com/mmitton/ldap was the original repo.

@scaranoj

Hi All! 👋 Here's a use case: a large US furniture manufacturer is using Dex for Kubernetes authentication, but they're only able to connect to a single LDAP/Active Directory backend and would like to avoid a single fault domain by allowing two or more LDAP backends (adding this on their behalf).

@jarrettprosser

I can speak to our use case a bit - we are also using Dex for authentication in corporate environments. It's common for IT to provide us with several URLs for LDAP servers in different on-premise data centres. As @scaranoj mentioned it's a way of preventing a single fault domain for the directory, usually not using an (on-premise) load balancer because that would have to be hosted in one of the data centres and would reintroduce a single point of failure.

In some cases we can use Keycloak, which does support LDAP failover through the Java JNDI LDAP provider. I think the implementation there is effectively a round robin that attempts to connect to each server in turn until a connection is successful, then continues to use that server. From the docs:

If the list contains more than one URL, the provider should attempt to use each URL in turn until it is able to create a successful connection, and after creation, set the property to the successful URL.
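That quoted JNDI behaviour (try each URL in turn, then stick with the first one that connects) is straightforward to mimic. A sketch, with dial standing in for go-ldap's ldap.DialURL and a simulated outage so the example runs without a network:

```go
package main

import (
	"errors"
	"fmt"
)

// down simulates unreachable servers; a real client would just let the
// dial attempt fail.
var down = map[string]bool{"ldap://dc1.example.com": true}

// dial is a stand-in for ldap.DialURL; it returns the URL it
// "connected" to so the choice is visible.
func dial(url string) (string, error) {
	if down[url] {
		return "", errors.New("cannot reach " + url)
	}
	return url, nil
}

// firstReachable tries each URL in turn and returns the first
// successful connection; callers would then keep using that server,
// as the JNDI documentation describes.
func firstReachable(urls []string) (string, error) {
	for _, u := range urls {
		if conn, err := dial(u); err == nil {
			return conn, nil
		}
	}
	return "", errors.New("no server reachable")
}

func main() {
	u, err := firstReachable([]string{
		"ldap://dc1.example.com",
		"ldap://dc2.example.com",
	})
	fmt.Println(u, err)
}
```

The sticky part (caching the winning URL for later requests) is then just storing the returned value, with re-running the loop as the failover path when that connection later breaks.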

@mayrstefan

Just iterating through a list seems to be a very common pattern and could be the starting point or the default algorithm. It could be implemented first, and more sophisticated algorithms could follow (round-robin, weighted-connection, least-response-time). Those algorithms could then simply reorder the list before iterating through it.

The simple list approach can be found in

  • Java JNDI LDAP URLs (as already mentioned): space-separated list of hostnames
  • Apache httpd mod_authnz_ldap: space-separated list of hostnames
  • PostgreSQL JDBC Driver Connection Fail-Over: comma-separated list of hostnames
  • Almost all DNS based solutions:
    • CDNs return multiple records for the queried domain. E.g. cloudflare.com
    • DELL EMC Isilon (now PowerScale), a NAS/file server, is also a DNS server. It returns multiple entries and rotates the list on each query for load balancing, so there is no need to mess with the order on the client side
  • and many other products

Requests for service discovery like #329 need the same groundwork: you get a list of servers that has to be sorted by priority and weight, then you iterate through that list until you find a server you can connect to.

@mayrstefan

Another advantage of a simple list: it also works with only one element, which is what we use today without load balancing or HA.

@alexei-matveev

alexei-matveev commented Mar 14, 2023

At least in the AD context, it seems to be the responsibility of the client to choose the correct DC(s) and fail over as necessary:

https://serverfault.com/questions/734101/active-directory-multi-site-choose-nearest-dcs-into-linux-not-microsoft-applica

@mayrstefan
Copy link

@alexei-matveev for AD this boils down to the following:

  1. query DNS for SRV records, which will give you a weighted list of LDAP servers
  2. iterate through that list

This means the ability to go through a list of servers until you find a working one is essential core functionality.
