Enhance Build Pipeline with Debug and Core Analysis Support #784

Merged Dec 18, 2024 (3 commits)

@edespino (Contributor) commented Dec 16, 2024

Fixes: #765

Adds comprehensive debug build support and automated core dump analysis to the Cloudberry build pipeline. Key features:

  • Debug build capability with preserved symbols and debug-specific RPMs
  • Automated core dump detection and analysis during test execution
  • Core file correlation with test failures
  • Enhanced test result reporting with core dump status
  • Improved artifact management for debug builds

The changes enable better debugging of test failures and provide more detailed information about process crashes during testing.
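The detection and analysis steps can be sketched in Python. This sketch assumes that cores land in one directory with names starting with `core`; the real location depends on the host's `kernel.core_pattern`. It also assumes `gdb` is used for the backtrace. The helper names and the binary path are hypothetical and do not come from the PR.

```python
from pathlib import Path
import tempfile


def find_core_files(core_dir: Path) -> list[Path]:
    """Return regular files in core_dir whose names start with 'core' (e.g. core.12345)."""
    return sorted(p for p in core_dir.glob("core*") if p.is_file())


def gdb_backtrace_cmd(binary: Path, core: Path) -> list[str]:
    """Build a non-interactive gdb command that prints a backtrace for every thread."""
    return ["gdb", "--batch", "-ex", "thread apply all bt", str(binary), str(core)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        core_dir = Path(tmp)
        (core_dir / "core.12345").write_bytes(b"")  # empty stand-in for a real core dump
        (core_dir / "regression.diffs").write_text("not a core file")
        for core in find_core_files(core_dir):
            # Hypothetical binary path; a core must be paired with the binary that produced it.
            print(" ".join(gdb_backtrace_cmd(Path("bin/postgres"), core)))
```

The command is returned as a list so that it can be passed directly to `subprocess.run(cmd, capture_output=True, text=True)`. Backtraces are only useful when the binary keeps its symbols, which is why this pairs with the debug build.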

I expect the ic-good-opt-off test group to fail because the mirror_replay test suite dumps a core. I have filed an issue for it: Core Dump in mirror_replay Test Suite During Execution #782

After the checks have run and ic-good-opt-off has failed with the expected core dump, I plan to enable the skip ci option in this PR so that it can be committed once approved.

Benign edit to re-trigger the workflow.

@avamingli (Contributor)

If we do this with a skip CI tag, any later pull requests will fail and cannot be merged. I’m not optimistic that we can fix the core dump issue #782 in a short time.

How about we configure the system to allow the mirror_replay case for now? This way, we can separate the core dumps and report them independently. When someone eventually fixes the mirror_replay issue, we can update the configuration to set the expected core dump case to NULL, ensuring that nothing is blocked.

@edespino (Contributor, Author)

The best I can do is to disable (ignore) the mirror_replay test for now. Is this what you meant? I do not have a way to exclude the core checking for a specific test suite within a test schedule. Will disabling mirror_replay suffice for now?

@avamingli (Contributor)

This does address the issue for now, although it means we lose the coverage that test case provides. On the positive side, it will let us detect new core dumps promptly when they occur.

cc: @my-ship-it

@edespino (Contributor, Author)

There are a couple of options:

  1. I could place mirror_replay into its own schedule, add a makefile target for it, and enhance the core file check to allow a non-failure state at the matrix test group level. We could then unwind this once the issue is fixed.
  2. Instead of signaling a failure, I could flag core files (if found) as a warning rather than a fatal condition. I can add this as an annotation in the UI and a yellow warning symbol in the summary.

Personally, I prefer option 1 over 2. I do not know whether developers look at the workflow details only when a CI check fails. If they do, a core file warning could go unnoticed whenever the test case itself passes, as it does with mirror_replay.
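Option 2 maps onto GitHub Actions workflow commands. A step that prints `::warning ...::message` or `::error ...::message` to stdout produces an annotation in the Actions UI. The minimal Python sketch below shows the difference between the two options. The function name, the `fatal` switch and the message text are assumptions for illustration:

```python
def core_annotation(test_group: str, core_files: list[str], fatal: bool) -> str:
    """Format a GitHub Actions workflow command that annotates core files found in a test group.

    fatal=False gives option 2's warning; fatal=True gives an error annotation. Neither
    changes the job result on its own: the step must still exit non-zero to fail.
    This sketch does not escape '%', CR or LF in the message.
    """
    level = "error" if fatal else "warning"
    message = f"{len(core_files)} core file(s) found: {', '.join(core_files)}"
    return f"::{level} title=Core dumps in {test_group}::{message}"


if __name__ == "__main__":
    # Printing the command on stdout is what makes the runner create the annotation.
    print(core_annotation("ic-good-opt-off", ["core.12345"], fatal=False))
```

This sketch also illustrates the concern above. A warning annotation leaves the check green, so it is only seen by someone who opens the run.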

@edespino (Contributor, Author)

There are a couple of options:

  1. I could place mirror_replay into it's own schedule, add a makefile target for it, enhance the core file check to allow a non-failure state at the matrix test group level. We could then unwind when it is fixed.
  2. Instead of signaling a failure, I could simply flag core files (if found) as a warning and not a fatal scenario. I can add this as an annotation in the UI and a yellow warning symbol in the summary.

Personally, I prefer option 1 over 2. I do not know if developers tend to review the details of the workflow action only when there are CI check status failures. A core file warning condition could go unnoticed if the test case passes as is the case with mirror_replay.

@avamingli & @my-ship-it Any thoughts on which approach to take? I have other test suites I would like to enable that are blocked until this work is completed.

@my-ship-it (Contributor)

Hey Ed, thanks for making the enhancement.
I prefer option 1. In addition, it would be good to add a FIXME to remind us to fix this problem later.


The mirror_replay test is currently causing core dumps when run as part of
greenplum_schedule. To keep this from blocking other testing while we
investigate the root cause:

- Created new fixme_schedule containing only mirror_replay
- Removed mirror_replay from greenplum_schedule
- Added installcheck-fixme make target to run problematic tests in
  isolation

Issue: apache#782
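The schedule split can be sketched in Python. The sketch assumes the usual pg_regress schedule format, where each `test:` line lists tests that run in parallel and lines starting with `#` are comments. Only mirror_replay, greenplum_schedule and fixme_schedule come from the commit. The helper names and the sample schedule text are made up.

```python
def remove_test(schedule: str, test: str) -> str:
    """Drop `test` from every `test:` line; drop a line if no tests remain on it.

    Whitespace on rewritten `test:` lines is normalized to single spaces.
    """
    out = []
    for line in schedule.splitlines():
        if line.startswith("test:"):
            names = [n for n in line[len("test:"):].split() if n != test]
            if not names:
                continue  # this parallel group contained only the removed test
            line = "test: " + " ".join(names)
        out.append(line)
    return "\n".join(out) + "\n"


def single_test_schedule(test: str, note: str) -> str:
    """Build a one-test schedule with a FIXME comment explaining why it is isolated."""
    return f"# FIXME: {note}\ntest: {test}\n"


if __name__ == "__main__":
    greenplum = "# hypothetical excerpt\ntest: gp_a mirror_replay gp_b\ntest: mirror_replay\n"
    print(remove_test(greenplum, "mirror_replay"), end="")
    print(single_test_schedule("mirror_replay", "core dump, see apache#782"), end="")
```

The FIXME comment in the generated schedule follows the reviewer's suggestion to leave a reminder. Removing it, along with the separate schedule, is then the "unwind" step once the core dump is fixed.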

When enable_check_core is disabled, the test should proceed with a
warning rather than fail. Modified the core file check and summary to
mark mirror_replay with a warning status in that case.

This complements the previous isolation of this test into
fixme_schedule, allowing testing to proceed while we investigate the
underlying core dump issue.
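The warning behavior can be sketched as a three-state decision in Python. `enable_check_core` is the name used in the commit. Treating it as a per-group boolean, the `CoreStatus` names and the sample group data are assumptions:

```python
from enum import Enum


class CoreStatus(Enum):
    OK = "ok"            # no core files found
    WARNING = "warning"  # core files found, but enable_check_core is off for this group
    FAILURE = "failure"  # core files found and enable_check_core is on


def core_check_status(core_files: list[str], enable_check_core: bool) -> CoreStatus:
    """Classify a test group's core-file check result."""
    if not core_files:
        return CoreStatus.OK
    return CoreStatus.FAILURE if enable_check_core else CoreStatus.WARNING


def summary_row(test_group: str, status: CoreStatus) -> str:
    """Render one row of a plain-text test summary."""
    return f"{test_group}: core check {status.value}"


if __name__ == "__main__":
    groups = {
        # group name: (core files found, enable_check_core), hypothetical values
        "ic-good-opt-off": ([], True),
        "ic-fixme": (["core.12345"], False),
    }
    for name, (cores, enforce) in groups.items():
        print(summary_row(name, core_check_status(cores, enforce)))
```

Keeping WARNING separate from OK means a tolerated core dump still appears in the summary instead of disappearing.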
@edespino closed this Dec 17, 2024
@edespino deleted the core-file-support branch December 17, 2024 10:09
@edespino restored the core-file-support branch December 17, 2024 10:10
@edespino reopened this Dec 17, 2024
@edespino (Contributor, Author)

@avamingli & @my-ship-it Please check out the workflow. You will see a new test group, ic-fixme, which contains mirror_replay. The group succeeds: the core dump no longer causes the test to fail, and the resulting warnings are shown in the test summary. Let me know what you think.

@edespino merged commit 28eb91a into apache:main Dec 18, 2024
11 checks passed
@edespino deleted the core-file-support branch December 18, 2024 21:26
Development: successfully merging this pull request may close the issue "Issue with Core Dumps in Cloudberry Tests".