Updating FlinkDeployment interpreter to display error status, improving health interpreter #6073
Codecov Report
All modified and coverable lines are covered by tests ✅
@@            Coverage Diff             @@
##           master    #6073      +/-   ##
==========================================
- Coverage   48.33%   48.11%   -0.23%
==========================================
  Files         666      668       +2
  Lines       54858    55163     +305
==========================================
+ Hits        26518    26541      +23
- Misses      26616    26896     +280
- Partials     1724     1726       +2
@yike21 @chaunceyjiang can you take a look?
Ok, I'll take a look at it ASAP :-)
Here is the FlinkDeployment reference where you can find the definition of the status:
...irdparty/resourcecustomizations/flink.apache.org/v1beta1/FlinkDeployment/customizations.yaml
f705b26 to e210ed8
…ng health interpreter Signed-off-by: mszacillo <[email protected]>
e210ed8 to f14e0f9
/lgtm
Looks great!
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: RainbowMango
What type of PR is this?
/kind feature
What this PR does / why we need it:
While load testing FlinkDeployment failover (which looks good overall, though a couple of edge cases may still need to be addressed), I found that the interpreter does not handle one of the ephemeral states that a FlinkDeployment can transition through.
Occasionally during failover, the FlinkDeployment transitions RECONCILING -> INITIALIZING -> CREATED before finally settling on RUNNING. In addition, we can use the status.error field to further improve the health interpretation.
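To make the idea concrete, here is a minimal Python sketch of a health classification that accounts for the ephemeral states and for status.error. It is only an illustration and not the Lua script in customizations.yaml: the function name `classify_health`, the three result labels, and the choice to treat any non-empty status.error as unhealthy are assumptions made for this example. Only the state names and the status.error field come from the description above; `status.jobStatus.state` is assumed to be where the job state is reported.

```python
# Hypothetical sketch, not the interpreter shipped in this PR.
# Ephemeral states named in the PR description.
TRANSITIONAL_STATES = {"RECONCILING", "INITIALIZING", "CREATED"}


def classify_health(observed: dict) -> str:
    """Classify a FlinkDeployment-like object as healthy, progressing, or unhealthy."""
    status = observed.get("status") or {}

    # Assumption: a non-empty status.error always means unhealthy,
    # regardless of the reported job state.
    if status.get("error"):
        return "unhealthy"

    # Assumption: the job state lives at status.jobStatus.state.
    state = (status.get("jobStatus") or {}).get("state")

    if state == "RUNNING":
        return "healthy"
    if state in TRANSITIONAL_STATES:
        # The job is still moving toward RUNNING, so do not flag it yet.
        return "progressing"
    # Missing or unrecognized state: treated as unhealthy in this sketch.
    return "unhealthy"
```

With this sketch, an object passing through RECONCILING, INITIALIZING, and CREATED would be reported as progressing at each step, and only a populated status.error or an unrecognized state would mark it unhealthy.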
In this PR I've added:
Which issue(s) this PR fixes:
Fixes #6023
Does this PR introduce a user-facing change?: