-
Notifications
You must be signed in to change notification settings - Fork 138
Merge lworld #1688
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: lworld+vector
Are you sure you want to change the base?
Merge lworld #1688
Conversation
|
👋 Welcome back jbhateja! A progress list of the required criteria for merging this PR into |
|
❗ This change is not yet ready to be integrated. |
Reviewed-by: prr
Reviewed-by: prr
Reviewed-by: ayang, iwalulya, fandreuzzi
Reviewed-by: mullan
…otAndRemainder Reviewed-by: darcy
Reviewed-by: gli, iwalulya
…l.commonPool() Reviewed-by: vklang, alanb
…-512) Reviewed-by: kvn, roland
Reviewed-by: mdoerr
Reviewed-by: kvn, vlivanov
Reviewed-by: kvn, roland
Reviewed-by: naoto
…entire file at once Reviewed-by: erikj
Reviewed-by: rgiulietti
… since jdk-22+8 Reviewed-by: phubner, fparain
Reviewed-by: phubner, fparain
Reviewed-by: liach
Reviewed-by: shade, fandreuzzi
…tions Reviewed-by: alanb, iris
Reviewed-by: alanb
…rowthTriggers.java Reviewed-by: wkemper
Reviewed-by: kizune, dnguyen
Reviewed-by: almatvee
Reviewed-by: shade, apangin, dlong
… and idom lazy updating/forwarding of ctrl and idom via dead ctrl nodes Reviewed-by: chagedorn, thartmann
Reviewed-by: rriggs, liach, fparain
…ntext Reviewed-by: liach
…y available if it is exact Reviewed-by: thartmann
Reviewed-by: rriggs
…merge Reviewed-by: chagedorn
Reviewed-by: chagedorn
Reviewed-by: coleenp, heidinga, phubner
…ilure Reviewed-by: lfoltan
Reviewed-by: phubner, rriggs
…xpected IllegalArgumentException Reviewed-by: liach
…uring deoptimization Reviewed-by: thartmann
Co-authored-by: Frederic Parain <[email protected]> Reviewed-by: coleenp, thartmann
…': conversion from 'size_t' to 'int' Reviewed-by: thartmann
Reviewed-by: heidinga, coleenp
…by JDK-8371915 Reviewed-by: amenkov
…ut.java Reviewed-by: chagedorn
Reviewed-by: rriggs
Reviewed-by: dsimms
…e the extension space Reviewed-by: mhaessig, thartmann
…va fails with segfault in C1 compiled code on aarch64 Reviewed-by: thartmann
…mentWithGCBarrierTests.java fails with --enable-preview Reviewed-by: thartmann, chagedorn
…ment the offset when iterating over segments Reviewed-by: liach, rriggs
Reviewed-by: liach
…Oop.cpp Reviewed-by: fparain
Reviewed-by: liach
…th -XX:+UseAtomicValueFlattening Reviewed-by: liach
|
For the record copy pasting the comments from closed PR #1593 Hi @liach, For now, vectorAPI fallback does make heavy use of unsafe APIs for explicit larval transitions. https://cr.openjdk.org/~jrose/values/multi-field.html Hi @liach , Currently, Unsafe.put* APIs expect to operate on a mutable value, without Unsafe.makePrivateBuffer, there is no way to transition a value object to larval state.
Here is a typical update kernel for the nary operation fallback implementation.
Here are some relevant FAQs on the need for multifield annotation. Q. Why do we need @multifield annotated field in VectorPayloads and not just retain the array-backed backing storage? Since arrays are always allocated over the heap, they carry an identity, which is the distinctive heap address for each new backing storage array. This contradicts the philosophy of value type instances, which are identity-free; the compiler treats two values with the same contents as equivalent entities and is free to substitute one with another. By replacing existing array-backed storage with a @multifield annotated storage, we ensure that payload adheres to true value semantics, a @multifiled is seen as a bundle of fields, encapsulating payload is a value class, unlike an array, a multifield is never allocated an explicit heap storage. Here is an example code
Even though Payload is a value class, its two instances with the same backing storage are not equal, because arrays have identity. Q. Is there any alternative to @multifield?
Consider the above modified payload class ‘TrueValuePayload’, here we create a separate primitive type field for each lane of the vector, thus two vector values with the same contents are treated as equivalent entities. With @multifield we intend to create identity-free backing storage. Q. What are the complications with explicit fields for each lane in the container payload? A bigger concern is that the C2 compiler will see scalar fields as different inputs to InlineTypeNode. The compiler creates an InlineTypeNode for each instance of a value class and scalarizes the fields, with different scalar field we may observe multiple inputs to IR, one for each lane. Each scalar field will be of primitive Type, while the type of InlineTypeNode will be PayloadType, since an inline type node corresponds to a value object. We expect to deal in vector-type field, i.e., the one that carry an ideal type TypeVect. Vector IR is forwarded to its user every time we perform a vector_unbox operation over VectorPayload, this way, vector IR inputs are directly forwarded to the vector operation nodes. Keeping multiple scalar fields, one for each lane type in the VectorPayload class, while creating a vector IR for a bundle of lanes, will result in adding kludges in the code. Consider the following case
While the expected IR pallet should look like the following
Hi @liach , Here is a quick sync-up on the merge status. I started the merge yesterday, but it may take a few more days as there have been lots of changes in the code over the last 4 months. I am studying the code changes and documenting the new code model, in crystalline form. Following is our new model for larval objects. Core concept (New model):-
In the current context, we have the following options to perform vector updates in the fallback implementation:- Premise: Value objects are immutable, and two vector values with the same lane contents must be treated as equals. a/ Current update loop:- The lifetime of a larval is very restricted today, and by Java application compilation, we don't emit any bytecode b/w the new and its subsequent initializing constructor. However, using handcrafted bytecode, we can have a gap b/w a new ininitialized allocation and its initializing constructor, but the JVM runtime rules for bytecode verification catch any undefined behavior. Unsafe operations are outside the purview of JVM bytecode verification. Problem arises when we make the lifetime of the larval object being acted upon by an unsafe API span across a block or even loop of bytecode; in such cases, we may need to handle larval and non-larval merges OR emit an undefined behavior error. Q. How performant is the unsafe get / put operation? Consider the impact of long-lived larval res in the above update loop on intrinsicfication res, was made a larval quantity using makePrivateBuffer, but scalarization triggered at gen_checkcast, which observed a value type oop, which by the way is in larval state. As a result, finishPrivateBuffer receives an InlineTypeNode while it always expects to receive a larval value oop, and hence intrinsification fails, it has a side effect on auto-vectorization, which does not observe an IR expression for store and its associated address computation since we take the native implementation route. Hence, the performance of fall fallback implementation degrades. Q. What are the possible solutions here? b/ Other larval lifetime limiting options:- In the fallback implementation, we intend to operate at the lane level; it may turn out that the generated IR is auto-vectorizable, but it's purely a performance optimization by the Compiler, and was not the intent of the user in the first place. So we read the input lanes using Unsafe API, we need an Unsafe get here, since synthetic multifields are not part of user visible Java code and have no symbol or type information in the constant pool. Thus, with multi-field, we ought to use unsafe APIs. Unsafe. get does not mandate larval transitions of the input value vector. What needs to change, well, to limit the larval life time, we need to allocate a temporary buffer before the loop for storing the result of the computation loop. We need a new Unsafe API that accepts the temporary buffer and the result value object. This API will internally buffer the scalarized value, update its field from the temporary result storage, and then re-materialize the scalarized representation from the updated buffer. Q. Will this API again have both native and intrinsic implementation? Q. Will this be performant over the existing solution? So it's tempting to try out the new solution to limit the complexity due to long-lifespan larval objects. c/ Alternative approach : Create a new immutable value with each iteration of the update loop this is near to new code model . This also limits the lifetime of the larval value object, but on the hind side performs a buffering and scalarization with each iteration. When we buffer, we not only allocate a new temporary storage, but also emit field-level stores to dump the contents into temporary storage; this is very expensive if done for each lane update, given that we are only updating one lane of vector but are still paying the penalty to save and restore all unmodified lanes per iteration. While this does limit the larval life span, it has a huge performance penalty in comparison to both the existing approaches of larval life time across loop and single update with temporary storage allocation. In the vector API, the fallback implementation is wrapped within the intrinsic entry point. As long as intrinsification is successful, we are not hit by a fallback; a poor fallback may delay the warmup stage, but for long-running SIMD kernels, warmups are never an issue as long as intrinsifications are successful. I have created a prototype application [2] mimicking the vector API use case for you. Please give it a try with the existing lworld+vector branch after integrating the patch[1] While I am merging and understanding the implications of the new code mode, it will help if you can share your plan/test sample for the latest API to limit the scope of larval transition, keeping in view the above context. Some notes on the implications of patch[1] To circumvent this issue seen with the test case, we can disable scalarization of return values using -XX:-InlineTypeReturnedAsFields Hi @liach , I am working on refreshing lworld+vector branch and also thinking through some post-merge improvements. post merge improvements:- Code snippet:- Intermediate Representation for [1][2] Intermediate Representation for [3] Intermediate Representation for [4] New Transforms for boxing + unboxing optimizations :-
[Action Item] [Results] |






Merge from lworld in lworld+vector branch.
Apart from the usual merge conflict resolution, the patch also contains bug fixes for some special handling around multifields.
To segregate the code changes and maintain the history[1][2][3][4][5], these bug fixes were initially checked into a temporary private merge branch and are now part of the merge.
Validation: Status
[1] Special copy constructor used for initializing base and synthetic multifields
jatin-bhateja@0fe3153
[2] Fix ciField population for multifields
jatin-bhateja@4a26aff
[3] Multifield handling during type domain buildup
jatin-bhateja@74ca024
[4] prevent store forwarding across CPU memory barrier
jatin-bhateja@cfc71ac
[5] Fix to accommodate additional padding during offset computation of multifields
jatin-bhateja@75ec629
Progress
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/valhalla.git pull/1688/head:pull/1688$ git checkout pull/1688Update a local copy of the PR:
$ git checkout pull/1688$ git pull https://git.openjdk.org/valhalla.git pull/1688/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 1688View PR using the GUI difftool:
$ git pr show -t 1688Using diff file
Download this PR as a diff file:
https://git.openjdk.org/valhalla/pull/1688.diff