8342848: Shenandoah: Marking bitmap may not be completely cleared in generational mode #523

Closed
wants to merge 5 commits into from



@pengxiaolong pengxiaolong commented Oct 23, 2024

While investigating the crash I saw in PR #516, I happened to reproduce the crash on GenShen tip as well.

The crash was reproduced multiple times on both AWS r7g-4xlarge and r7i-4xlarge instances by running the test below repeatedly:

CONF=linux-aarch64-server-fastdebug  make clean test TEST=gc/stress/gcold/TestGCOldWithShenandoah.java#generational JTREG="REPEAT_COUNT=1000" 

Crash:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/home/xlpeng/repos/jdk-xlpeng/src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp:642), pid=24134, tid=24158
#  assert(_generation->is_bitmap_clear()) failed: need clear marking bitmap
#
# JRE version: OpenJDK Runtime Environment (24.0) (fastdebug build 24-internal-adhoc.xlpeng.jdk-xlpeng)
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 24-internal-adhoc.xlpeng.jdk-xlpeng, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, shenandoah gc, linux-aarch64)
# Problematic frame:
# V  [libjvm.so+0x15eadc4]  ShenandoahConcurrentGC::op_init_mark()+0x358
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /local/home/xlpeng/repos/jdk-xlpeng/build/linux-aarch64-server-fastdebug/test-support/jtreg_test_hotspot_jtreg_gc_stress_gcold_TestGCOldWithShenandoah_java_generational/scratch/0/hs_err_pid24134.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#

With logging/instrumentation, it seems to be caused by this one line of code: bool needs_reset = _generation->contains(region) || !region->is_affiliated(); Bitmap reset is a concurrent operation, so a mutator thread may change a region's affiliation from FREE to YOUNG while the reset is running. If the affiliation is still FREE when _generation->contains(region) is evaluated and the mutator updates it to YOUNG before !region->is_affiliated() is evaluated, both tests are false and the region's bitmap is not reset.
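To make the interleaving concrete, here is a minimal single-threaded sketch of the race. Everything in it is a hypothetical stand-in: Region, YoungGeneration, needs_reset_original and the between_reads callback are simplified models, not the real Shenandoah classes. The callback runs between the two tests to simulate a mutator that changes the affiliation at exactly that point.

```cpp
#include <atomic>
#include <cassert>
#include <functional>

// Hypothetical, simplified stand-ins for the HotSpot types (assumption, not real code).
enum class Affiliation { FREE, YOUNG, OLD };

struct Region {
  std::atomic<Affiliation> affiliation{Affiliation::FREE};
  bool is_affiliated() const { return affiliation.load() != Affiliation::FREE; }
};

struct YoungGeneration {
  // Models _generation->contains(region) for the young generation.
  bool contains(const Region* r) const {
    return r->affiliation.load() == Affiliation::YOUNG;
  }
};

// Original ordering: contains() first, then !is_affiliated().
// between_reads simulates a mutator running between the two tests.
bool needs_reset_original(const YoungGeneration& gen, Region* region,
                          const std::function<void()>& between_reads) {
  if (gen.contains(region)) {
    return true;                      // short-circuit, as with ||
  }
  between_reads();                    // mutator may change the affiliation here
  return !region->is_affiliated();
}

// Demonstrates the problematic interleaving.
void demo_race() {
  YoungGeneration gen;
  Region region;                      // starts out FREE
  bool needs_reset = needs_reset_original(gen, &region, [&region] {
    region.affiliation.store(Affiliation::YOUNG);   // FREE -> YOUNG
  });
  // The region is now YOUNG, but the reset was skipped.
  assert(!needs_reset);
  assert(region.affiliation.load() == Affiliation::YOUNG);
}
```

Calling demo_race() passes both assertions, which would match the "Not reseting bitmap for YOUNG region ... (affiliation before test: FREE)" line in the instrumentation log.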

Logs from instrumentation:

[32.793s][info][gc          ] GC(19) Not reseting bitmap for YOUNG region (0x0000ffff8c1a6100)(affiliation before test: FREE)

...

[32.807s][info][gc,task     ] GC(20) Using 8 of 8 workers for init marking
[32.808s][info][gc          ] GC(20) Region (0x0000ffff8c1a6100) doesn't have clear bitmap, [1, 1, 1]

The fix is simple: swap the two tests so that !region->is_affiliated() is evaluated before _generation->contains(region).
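As a sanity check on the swapped ordering, the sketch below walks through the three points at which a mutator could change a region from FREE to YOUNG relative to the two tests. It is a simplified model under stated assumptions: the types, FlipPoint, and needs_reset_swapped are hypothetical names, the generation being reset is assumed to be the young generation, and FREE to YOUNG is assumed to be the only concurrent transition.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical, simplified model; these are not the real Shenandoah classes.
enum class Affiliation { FREE, YOUNG };

struct Region {
  std::atomic<Affiliation> affiliation{Affiliation::FREE};
  bool is_affiliated() const { return affiliation.load() != Affiliation::FREE; }
};

// Stand-in for _generation->contains(region) when _generation is young.
inline bool young_contains(const Region& r) {
  return r.affiliation.load() == Affiliation::YOUNG;
}

// Points at which the simulated mutator changes the region from FREE to YOUNG.
enum class FlipPoint { BEFORE_FIRST_TEST, BETWEEN_TESTS, AFTER_BOTH_TESTS };

// Swapped ordering: the "unaffiliated" test runs first.
bool needs_reset_swapped(Region& region, FlipPoint flip) {
  auto mutator = [&region] { region.affiliation.store(Affiliation::YOUNG); };

  if (flip == FlipPoint::BEFORE_FIRST_TEST) mutator();
  if (!region.is_affiliated()) {
    return true;   // still FREE: reset; a later flip cannot undo this decision
  }
  if (flip == FlipPoint::BETWEEN_TESTS) mutator();
  return young_contains(region);   // under the assumptions, the region is YOUNG here
}

// Returns true if every modelled interleaving decides to reset the bitmap.
bool swapped_order_always_resets() {
  const FlipPoint points[] = {FlipPoint::BEFORE_FIRST_TEST,
                              FlipPoint::BETWEEN_TESTS,
                              FlipPoint::AFTER_BOTH_TESTS};
  for (FlipPoint p : points) {
    Region region;   // each scenario starts from FREE
    if (!needs_reset_swapped(region, p)) return false;
  }
  return true;
}

void demo_fix() {
  assert(swapped_order_always_resets());
}
```

In this model the swapped order works because affiliation only moves from FREE to YOUNG: a region seen as affiliated by the first test is still contained in the young generation when the second test runs.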


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (1 review required, with at least 1 Committer)

Issue

  • JDK-8342848: Shenandoah: Marking bitmap may not be completely cleared in generational mode (Bug - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/shenandoah.git pull/523/head:pull/523
$ git checkout pull/523

Update a local copy of the PR:
$ git checkout pull/523
$ git pull https://git.openjdk.org/shenandoah.git pull/523/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 523

View PR using the GUI difftool:
$ git pr show -t 523

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/shenandoah/pull/523.diff

Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Oct 23, 2024

👋 Welcome back xpeng! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Oct 23, 2024

@pengxiaolong This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8342848: Shenandoah: Marking bitmap may not be completely cleared in generational mode

Reviewed-by: wkemper

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 1 new commit pushed to the master branch:

  • 4970708: 8342580: GenShen: TestChurnNotifications fails executing in unintended test-id modes with ShenandoahGCMode=generational

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@earthling-amzn) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@pengxiaolong pengxiaolong marked this pull request as ready for review October 23, 2024 06:21
@openjdk openjdk bot added the rfr Pull request is ready for review label Oct 23, 2024
@mlbridge

mlbridge bot commented Oct 23, 2024

Webrevs

@pengxiaolong
Author

The test failure should not be caused by this change; I spotted the same failure in another open PR:

java.lang.RuntimeException: expected testPhantom1 to be cleared
	at gc.shenandoah.TestReferenceRefersToShenandoah.fail(TestReferenceRefersToShenandoah.java:155)
	at gc.shenandoah.TestReferenceRefersToShenandoah.expectCleared(TestReferenceRefersToShenandoah.java:166)
	at gc.shenandoah.TestReferenceRefersToShenandoah.testConcurrentCollection(TestReferenceRefersToShenandoah.java:243)
	at gc.shenandoah.TestReferenceRefersToShenandoah.main(TestReferenceRefersToShenandoah.java:340)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
	at java.base/java.lang.reflect.Method.invoke(Method.java:573)
	at com.sun.javatest.regtest.agent.MainWrapper$MainTask.run(MainWrapper.java:138)
	at java.base/java.lang.Thread.run(Thread.java:1576)

JavaTest Message: Test threw exception: java.lang.RuntimeException: expected testPhantom1 to be cleared
JavaTest Message: shutting down test


@earthling-amzn
Contributor

Yes, we have a ticket for this test failure: https://bugs.openjdk.org/browse/JDK-8342734.

@@ -26,6 +26,7 @@
#define SHARE_VM_GC_SHENANDOAH_SHENANDOAHGENERATION_HPP

#include "memory/allocation.hpp"
#include "gc/shenandoah/shenandoahAffiliation.hpp"
Contributor

Nit, but shenandoah/shenandoahAffiliation.hpp should come between heuristics/shenandoahSpaceInfo.hpp and shenandoah/shenandoahGenerationType.hpp.

Author

Fixed, thanks.

Contributor

@earthling-amzn earthling-amzn left a comment

Okay with me to /integrate after fixing the include order in shenandoahGeneration.hpp.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Oct 23, 2024
@pengxiaolong
Author

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Oct 23, 2024
@openjdk

openjdk bot commented Oct 23, 2024

@pengxiaolong
Your change (at version 302bf4f) is now ready to be sponsored by a Committer.

@earthling-amzn
Contributor

/sponsor

@openjdk

openjdk bot commented Oct 23, 2024

Going to push as commit 55c6f67.
Since your change was applied there has been 1 commit pushed to the master branch:

  • 4970708: 8342580: GenShen: TestChurnNotifications fails executing in unintended test-id modes with ShenandoahGCMode=generational

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Oct 23, 2024
@openjdk openjdk bot closed this Oct 23, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Oct 23, 2024
@openjdk

openjdk bot commented Oct 23, 2024

@earthling-amzn @pengxiaolong Pushed as commit 55c6f67.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels
integrated Pull request has been integrated

2 participants