Draft: Ensure traffic weighting still applies with session affinity. #1655
This is a first pass at changing how the Session Affinity cookies work alongside canary weighting. It is only implemented for Gateway API at this point.
As mentioned in this issue, with session affinity enabled, all traffic is eventually sent to the canary, even with the canary weight set to only 10%. Each request from a user who is not yet pinned has a 10% chance of being routed to the canary, after which that user stays on the canary until the canary run completes. The canary therefore approaches 100% of traffic after a short time (depending on how many requests each user makes).
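To make the drift concrete, here is a minimal sketch of the arithmetic, assuming every user sends the same number of requests and each un-pinned request is routed to the canary independently with probability equal to the weight. The function name `pinnedFraction` is made up for illustration and is not part of this PR. With a 10% weight, roughly 65% of users are already pinned to the canary after ten requests each.

```go
package main

import (
	"fmt"
	"math"
)

// pinnedFraction returns the expected share of users holding the canary
// cookie after each has sent n requests, assuming every request from an
// un-pinned user is routed to the canary with probability w.
// A user stays un-pinned only if all n requests miss the canary: (1-w)^n.
func pinnedFraction(w float64, n int) float64 {
	return 1 - math.Pow(1-w, float64(n))
}

func main() {
	const weight = 0.10 // canary weight of 10%, as in the scenario above
	for _, n := range []int{1, 5, 10, 20, 50} {
		fmt.Printf("requests per user=%2d  share pinned to canary=%.3f\n",
			n, pinnedFraction(weight, n))
	}
}
```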
The approach I've taken here is to add a second cookie that holds sessions on the primary until the next step of the analysis. At that point a new "primary cookie" is set, allowing a rebalance according to the next canary weight. This keeps the canary weight roughly accurate across the lifetime of the analysis, while still ensuring users don't switch back to the primary after hitting the canary version.
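The per-request decision described above can be modelled roughly as follows. This is a plain-Go sketch of the logic only, not the code in this PR and not the Gateway API route configuration; `route`, `Backend`, and the cookie names are hypothetical. It assumes the canary cookie name stays fixed for the whole canary run, while the primary cookie name changes at each analysis step.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Backend identifies which version serves a request (illustrative type).
type Backend string

const (
	Primary Backend = "primary"
	Canary  Backend = "canary"
)

// route picks a backend for one request and returns the cookie name to set
// on the response ("" when no new cookie is needed).
//   cookies       - cookie names present on the incoming request
//   canaryCookie  - assumed fixed for the whole canary run
//   primaryCookie - assumed to be regenerated at every analysis step
//   weight        - current canary weight in [0, 1]
//   rnd           - source of uniform random numbers in [0, 1)
func route(cookies map[string]bool, canaryCookie, primaryCookie string,
	weight float64, rnd func() float64) (Backend, string) {
	if cookies[canaryCookie] {
		return Canary, "" // already pinned to the canary: never switch back
	}
	if cookies[primaryCookie] {
		return Primary, "" // held on the primary for the current step
	}
	// No valid cookie for this step (new user, or a primary cookie from an
	// earlier step): apply the weighted split and pin the result.
	if rnd() < weight {
		return Canary, canaryCookie
	}
	return Primary, primaryCookie
}

func main() {
	// A user still carrying the primary cookie from step 1 arrives during
	// step 2, so they are re-rolled against the step 2 weight.
	cookies := map[string]bool{"primary-step1": true}
	backend, set := route(cookies, "canary-run", "primary-step2", 0.20, rand.Float64)
	fmt.Printf("backend=%s set-cookie=%q\n", backend, set)
}
```

Because a stale primary cookie simply fails to match, users are re-rolled once per step rather than once per request, which is what keeps the effective split close to the configured weight.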
I have not handled clearing the primary cookie, which would require storing each cookie defined (or some sort of prefix-based matching system). So far that seems to work fine, as the cookie gets overwritten by a new one in the next canary run, but if there are concerns with leaving a cookie set I can make adjustments to the clearing logic.
There's definitely some refactoring to do to make the code more concise, but I wanted to get feedback on the approach prior to investing too much time there.