Problem: the server can be configured in a way that causes an indefinitely hanging job
The current FLARE controller allows setting the minimum number of required clients (min_clients) along with a server timeout. When min_clients is set to the total number of available clients and server_timeout=0 (no timeout), a single failed client causes the server workflow to hang indefinitely.
This behavior is useful in production, where the server workflow should be resilient to temporary interruptions in client communication and allow clients to fail temporarily and reconnect.
But when a client has failed and cannot recover, the server workflow should time out, independently of the controller workflow configuration. This would also allow a "development mode" in which any client failure causes the server workflow to terminate.
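To make the hang concrete, here is a minimal sketch of the wait condition as described above. It is not FLARE code: `should_stop_waiting` and `simulate` are hypothetical helpers, and treating a timeout of 0 as "no timeout" is an assumption based on the behavior reported in this issue. A fake clock stands in for real time.

```python
from typing import Optional


def should_stop_waiting(responses: int, min_clients: int, elapsed: float, timeout: float) -> bool:
    """Hypothetical wait condition (not FLARE's API); timeout == 0 means "no timeout"."""
    if responses >= min_clients:
        return True  # enough clients have responded
    if timeout > 0 and elapsed >= timeout:
        return True  # the configured timeout has expired
    return False  # keep waiting


def simulate(num_clients: int, failed: int, min_clients: int, timeout: float,
             max_ticks: int = 1000) -> Optional[int]:
    """Advance a fake clock one tick at a time.

    Returns the tick at which the server stops waiting, or None if it is
    still waiting after max_ticks (i.e. the job is effectively hung).
    """
    responses = num_clients - failed  # failed clients never respond
    for tick in range(max_ticks):
        if should_stop_waiting(responses, min_clients, float(tick), timeout):
            return tick
    return None


# min_clients == all clients, timeout 0, one failed client: never stops waiting.
print(simulate(num_clients=3, failed=1, min_clients=3, timeout=0))
```

With one client down, the response count can never reach `min_clients`, and a zero timeout never fires, so the only way out is the artificial `max_ticks` cap.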
Potential solution
A separate server timeout could be configured independently of the controller configuration (for example, in the server communication layer). It could take the form of a server job timeout, where:
- a timeout of 0 would trigger immediate failure as soon as a client fails (development mode)
- a timeout of -1 (infinite) would keep the current behavior (production mode)
- a positive timeout would fail the job once that much time has passed, with the value chosen according to your level of patience
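A minimal sketch of how the proposed job timeout values might be interpreted. `job_should_abort`, its parameters, and the choice to measure the timeout from the moment a client is detected as failed are all hypothetical; none of them are part of the FLARE API.

```python
from typing import Optional

DEV_MODE = 0       # fail as soon as any client fails
NO_TIMEOUT = -1    # wait forever (current behavior)


def job_should_abort(job_timeout: float, client_failed_at: Optional[float], now: float) -> bool:
    """Decide whether the server should abort the job (hypothetical helper).

    job_timeout: proposed server-level setting, independent of the controller.
    client_failed_at: time a client was detected as failed, or None if all are healthy.
    now: current time, in the same units as job_timeout.
    """
    if job_timeout < 0 and job_timeout != NO_TIMEOUT:
        raise ValueError("job_timeout must be -1, 0, or a positive value")
    if client_failed_at is None:
        return False  # no failed client, nothing to time out
    if job_timeout == NO_TIMEOUT:
        return False  # production mode: wait for the client to reconnect
    if job_timeout == DEV_MODE:
        return True  # development mode: fail immediately
    return now - client_failed_at >= job_timeout  # patience has run out


print(job_should_abort(DEV_MODE, client_failed_at=1.0, now=1.0))
print(job_should_abort(NO_TIMEOUT, client_failed_at=1.0, now=1e9))
```

Resetting `client_failed_at` to `None` when a client reconnects would preserve the production-mode resilience to temporary failures, while still bounding how long the server waits for a client that never comes back.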