Q: Fault tolerant LDAP connection? #1548
Comments
Have you considered using a TCP load balancer for this?
Yes, but a load balancer adds complexity to the architecture, and many on-premise deployments simply do not have load balancers between the k8s and Active Directory domain controller networks. Also, LDAP fault tolerance is similar to DNS fault tolerance: handling the failure is on the client side by design.

Do you have an idea what will break if Dex cannot reach the specified LDAP server? Is there identity and group caching in Dex so that RBAC authentication / authorization continues to work while LDAP is not reachable? And do you have an idea about running a load balancer within the k8s deployment, so that we could configure Dex -> k8s load balancer -> Active Directory?
The current LDAP Go client in Dex is not capable of handling such scenarios, so we would have to implement some custom reconnection logic. There is no such cache in the LDAP connector; if the server is unreachable, it is not possible to log in. For a load balancer, I would put an HAProxy or NGINX sidecar next to Dex.
I agree with @Martin-Weiss. We all know how to load-balance services, but such a setup is not possible in every scenario. It would be really great if Dex supported multiple LDAP servers per connector.
In the meantime I did some research and found some ideas to work around this problem, but none of them seems to be production ready or to fulfill all requirements, e.g. regarding LDAP target health checks. Basically we could do

While a) seems to be the simplest one, all of the above can only do a TCP check, which unfortunately is not sufficient for LDAP connectivity fault tolerance when the LDAP port is open but the LDAP server behind it does not respond as expected (application-layer failure). The health check for the LDAP connection would need to do an "ldap bind" to check whether LDAP communication works at all, then an "ldap search" to check whether a result comes back or we get an error, and fail over if the bind or search fails or times out against the first given LDAP server. All of this also needs to be done through SSL, which makes load balancing, health checking and failover even more difficult.

So is there any chance to enhance the LDAP client to allow multiple LDAP servers to be configured, with a failover mechanism that uses a second / third one in case the first one for a given directory does not respond or returns an error during bind/search? It would then stay with the second one until that fails, and so on, with a final error in case none of the configured LDAP servers works as expected, maybe with some retries.
I am new to this code base, but from an initial look at the code, I think this request can be satisfied by implementing a new connector called 'ldap_cluster' (or similar) which reads its configuration into an array of config items. I can give it a try if you think this is acceptable.
I have also checked the LDAP client. Since it can't handle multiple addresses, I agree that we would need something like what @phiremande suggests. I'm okay with such a change!
Thanks @bonifaido. I will try to come up with the changes needed.
A workaround is to add multiple LDAP connectors by config.
I assume you mean configuring two identical connectors with different IDs, so that each user and group is available via two or three connectors?
This does NOT work, because you are presented with a choice of connectors when logging in. If one of the connector backends is not available, nothing changes: you still have to choose which connector to use during login, and if the first one does not work, you have to click the second one. So this is NOT a solution to the given problem.
@Martin-Weiss, I have been making slow progress on this. Please see https://github.com/phiremande/dex/tree/feature-ldapcluster and check https://github.com/phiremande/dex/blob/feature-ldapcluster/examples/config-ldapcluster.yaml for an example configuration.
@phiremande, wow, great to see you could find some time to work on this! I am not a developer, so I do not understand the code in detail, but I can see that we are able to specify multiple LDAP servers within a cluster, with separate filters and bind configurations. Great!! :-)

Could you give some background on how the logic for connecting and failover is built? I would assume that we use the same filters against all the LDAP servers in an LDAP cluster, as they should have identical content (e.g. replicated Active Directory domain controllers), so different filters might not be required.

For failover, it would be nice if we could switch between the configured LDAP servers when one returns an error during bind or search, but not fail over when the search merely returns an empty result. And IMO we should not fail back before another error happens. I am also not sure whether we might need configurable timeouts for the LDAP connect and failover, or some retries.

Again, thanks for the great progress and step forward :-).
@Martin-Weiss, thanks for your response. I am not expecting code review input at this stage, just feedback on how the initial code functions (if you build and run the dex binary with multiple LDAP servers). I probably should have given the background on the design when I asked for input, sorry for that. Below is the design that is incorporated currently.
So essentially, bind is round-robin, and any subsequent search goes only to the active server (the one against which the bind succeeded).
Thanks a lot for the details. Yes, this sounds like what we need! I believe we should have an ordered list, so that the first server is used whenever possible, and we might also need to fall back to it after some time. The reason: with LDAP / Active Directory we might have both a remote LDAP server and a local one, and we should always use the local one if possible, and only use the central / remote one when the local one is not available.
I just bumped into this issue. Is there any intention to implement this in a near-future release?
I think this might be relevant in a multisite AD environment: https://ldap.com/dns-srv-records-for-ldap/

I also noted that DialTLS() is deprecated (in comments) in the go-ldap library in favor of DialURL(): https://github.com/go-ldap/ldap/blob/master/v3/conn.go#L198. I do not immediately see whether DialURL() offers any advantages for fault tolerance.
Is it possible to configure Dex so that it connects to one LDAP server and, if that server is not reachable, fails over to a second LDAP server?
For example, in Active Directory environments failover is required when a domain controller gets updated / rebooted. During that period the LDAP/AD clients need to fail over to a second domain controller, which provides the same data for fault tolerance and scalability.