This issue addresses the scenario where an actor gets blacklisted while the registry is unavailable, leading to potential failures of subsequent requests for that actor. Currently, when the registry is down, there is no mechanism in place to handle the blacklisting of actors, which can result in continued routing of requests to blacklisted actors, leading to failures.
Proposed Solution
Evaluate the impact of blacklisting on actor availability and system stability during registry unavailability.
Design a strategy to handle the blacklisting of actors even when the registry is unavailable.
Implement a mechanism to mark blacklisted actors and prevent requests from being routed to them, regardless of the registry's availability.
Explore options for locally storing blacklisted actors and their corresponding server IDs during the registry downtime.
Enhance the routing logic to check the local cache of blacklisted actors and prevent requests from being forwarded to them.
Implement a mechanism to periodically synchronize the local cache with the registry once it becomes available again.
Consider the potential overhead and performance implications of maintaining a local cache for blacklisted actors.
Write tests to validate the behavior of blacklisted actors during registry unavailability and ensure the correctness of the implemented solution.
Evaluate the system's behavior and performance under various scenarios, including blacklisting during registry downtime and subsequent cache synchronization.
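The local-cache steps above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `LocalBlacklist`, its methods, the TTL used to bound memory overhead, and the `report` callable standing in for a registry call are all hypothetical names introduced here.

```python
import time
from dataclasses import dataclass, field


@dataclass
class LocalBlacklist:
    """Hypothetical in-memory record of (actor_id, server_id) pairs seen as blacklisted."""

    ttl_seconds: float = 60.0  # assumed bound so the cache cannot grow forever
    _entries: dict = field(default_factory=dict, init=False)  # key -> expiry time
    _pending: set = field(default_factory=set, init=False)    # keys not yet reported

    def mark(self, actor_id, server_id, now=None):
        """Record a blacklisting locally and queue it for the registry."""
        now = time.monotonic() if now is None else now
        key = (actor_id, server_id)
        self._entries[key] = now + self.ttl_seconds
        self._pending.add(key)

    def is_blacklisted(self, actor_id, server_id, now=None):
        """Return True while the local entry exists and has not expired."""
        now = time.monotonic() if now is None else now
        key = (actor_id, server_id)
        expiry = self._entries.get(key)
        if expiry is None:
            return False
        if expiry <= now:
            del self._entries[key]  # expired: stop filtering this pair
            return False
        return True

    def filter_servers(self, actor_id, server_ids, now=None):
        """Routing check: drop servers on which this actor is blacklisted."""
        return [s for s in server_ids if not self.is_blacklisted(actor_id, s, now)]

    def sync(self, report):
        """Report queued entries via report(actor_id, server_id).

        `report` is assumed to raise ConnectionError while the registry is down;
        unreported entries stay queued for the next attempt.
        """
        synced = 0
        for key in sorted(self._pending):
            try:
                report(*key)
            except ConnectionError:
                break
            self._pending.discard(key)
            synced += 1
        return synced
```

A router would call `filter_servers` before forwarding a request and call `sync` on a timer. Entries remain queued until the registry accepts them, so a blacklisting observed during downtime is still reported once the registry returns.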
Additional information
By addressing this issue, we aim to improve the handling of blacklisted actors during registry unavailability. The proposed solution will prevent requests from being routed to blacklisted actors, even when the registry is down, reducing the likelihood of failures and enhancing the overall stability and reliability of the system.
This issue serves as a reminder to investigate and implement the necessary changes to handle blacklisted actors when the registry is unavailable. It also provides an opportunity to evaluate the impact and effectiveness of the proposed solution in mitigating the potential failures associated with blacklisted actors during registry downtime.
@aratz-lasa maybe I’m missing something, but I think it may be as simple as:
replicas == 1 and we get a blacklisted error: try to refresh the registry synchronously. If the registry is down, there is nothing we can do because the actor was blacklisted anyway. It’s “expected” that blacklisting will cause temporary unavailability with RF=1.
replicas > 1 and we get a blacklisted error: remove the blacklisted reference from the cache, asynchronously notify the registry that the actor is blacklisted (via some new method we add to the registry), and then we’re pretty much done. Subsequent requests will use only the non-blacklisted replicas, the registry will place the actor on a new server once it’s notified of the blacklisting, and the servers will all eventually pick up the new placement as they asynchronously refresh their caches.
I think a lot of the complexity of the existing implementation we’re struggling with comes from the fact that I hijacked the ensureActivation method as a way to notify the registry of actors that have been blacklisted. If we just had a discrete pathway for that, it would be much simpler and cleaner, I think.
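For illustration, the two cases above might look something like the sketch below. Everything here is an assumption made for the example: `handle_blacklisted`, `RegistryUnavailable`, and the `refresh_sync` and `notify_async` callables are hypothetical stand-ins, not existing registry methods, and the cache is modeled as a plain dict from actor ID to a list of server IDs.

```python
class RegistryUnavailable(Exception):
    """Hypothetical error raised when the registry cannot be reached."""


def handle_blacklisted(actor_id, bad_server, cache, refresh_sync, notify_async):
    """React to a blacklisted error and return the servers still usable.

    cache:        dict mapping actor_id -> list of server IDs (assumed shape)
    refresh_sync: callable(actor_id) -> fresh list of server IDs; may raise
                  RegistryUnavailable (hypothetical)
    notify_async: callable(actor_id, server_id) that schedules the discrete
                  "actor is blacklisted" notification (hypothetical)
    """
    replicas = cache.get(actor_id, [])

    if len(replicas) <= 1:
        # RF=1: the only replica is blacklisted, so a synchronous refresh is
        # the only option. If the registry is down, nothing can be done yet.
        try:
            cache[actor_id] = refresh_sync(actor_id)
        except RegistryUnavailable:
            return []
        return cache[actor_id]

    # RF>1: drop the blacklisted reference locally, tell the registry in the
    # background, and keep serving from the remaining replicas.
    remaining = [s for s in replicas if s != bad_server]
    cache[actor_id] = remaining
    notify_async(actor_id, bad_server)
    return remaining
```

Keeping the notification as its own callable mirrors the idea of a discrete pathway: the routing code never needs to go through activation logic just to report a blacklisting.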