Change return type of getStringUTF8Length #21005

hzongaro · 2025-01-24T14:58:03Z

The return type of the JVM's getStringUTF8Length method has changed from IDATA to UDATA. This change adjusts the JIT compiler's uses of that method. In particular, the return type of TR_FrontEnd::getStringUTF8Length and its overriding methods change from intptr_t to int32_t. Similarly, the bufferSize argument of TR_FrontEnd::getStringUTF8 becomes uintptr_t where it was int32_t.

The motivation was that the length of a UTF-8 encoded String could be greater than 2³² bytes, so the length could overflow intptr_t on a 32-bit platform. All uses of getStringUTF8Length in the JIT involve signatures, descriptors, method names and class names, which will never be large enough to exceed the range of the int32_t type. This was something that @jdmpapin was good enough to verify. Just to be cautious, however, this change includes an assertion test that the computed length is in range for the int32_t type.

This change also introduces a method, getStringUTF8UnabbreviatedLength, that returns a length value of type uint64_t if the JIT compiler ever needs to determine the UTF-8 encoded length of an arbitrary String. The method is currently unused.
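As a rough illustration of why a 64-bit type is used for the unabbreviated length, the worst case can be worked out at compile time. This is only a sketch: the constant names are hypothetical, and it assumes a String holds at most INT32_MAX UTF-16 code units, each needing at most 3 bytes of UTF-8.

```cpp
#include <cstdint>
#include <limits>

// Assumption: a Java String holds at most INT32_MAX UTF-16 code units.
constexpr uint64_t kMaxCodeUnits =
   static_cast<uint64_t>(std::numeric_limits<int32_t>::max());

// Assumption: one UTF-16 code unit needs at most 3 bytes in UTF-8.
// (A surrogate pair is 2 code units and 4 bytes, i.e. 2 bytes per unit.)
constexpr uint64_t kMaxUTF8BytesPerCodeUnit = 3;

// Worst-case UTF-8 encoded length of a single String, in bytes.
constexpr uint64_t kWorstCaseUTF8Length = kMaxCodeUnits * kMaxUTF8BytesPerCodeUnit;

// 3 * (2^31 - 1) = 6442450941 bytes.
static_assert(kWorstCaseUTF8Length == 6442450941ULL,
              "worst case is 3 * INT32_MAX");

// The worst case does not fit in 32 unsigned bits, so a 32-bit uintptr_t
// cannot represent it, while uint64_t can.
static_assert(kWorstCaseUTF8Length > std::numeric_limits<uint32_t>::max(),
              "worst case exceeds the 32-bit unsigned range");
```

Under these assumptions, the worst case is roughly 1.5 times the 32-bit unsigned maximum, so it overflows pointer-sized integers on 32-bit platforms but fits comfortably in uint64_t.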

In addition, String Builder Transformer uses the result of getStringUTF8Length to estimate the StringBuilder buffer size needed to accommodate appending a constant String to a StringBuilder. That could overestimate the space required. This has been changed to use getStringLength instead, to use the actual lengths of constant String objects. A test has also been added to detect integer overflow of the capacity estimate, aborting the transformation, as StringBuilder.<init> will throw a NegativeArraySizeException if the specified capacity is negative.
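One way to detect overflow of a capacity estimate like this is to accumulate in a wider type and give up once the total leaves the int32_t range. The sketch below is not the code in StringBuilderTransformer.cpp; the helper name `estimateCapacity` and its vector argument are hypothetical, and it needs C++17 for `std::optional`.

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical helper, not the actual transformer code.
// Sums the estimated lengths of the appended pieces in a 64-bit accumulator
// and returns std::nullopt as soon as the total no longer fits in int32_t,
// which is the point at which the transformation would be abandoned.
std::optional<int32_t> estimateCapacity(const std::vector<int32_t> &pieceLengths)
   {
   int64_t capacity = 0;
   for (int32_t length : pieceLengths)
      {
      if (length < 0)
         return std::nullopt; // a negative length is not a valid estimate

      capacity += length; // cannot overflow int64_t for any realistic piece count

      if (capacity > std::numeric_limits<int32_t>::max())
         return std::nullopt; // estimate exceeds the int32_t range
      }
   return static_cast<int32_t>(capacity);
   }
```

A total of exactly INT32_MAX is still reported as a valid estimate here, because it has not yet left the signed 32-bit range.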

This pull request requires a coordinated merge with OMR pull request eclipse-omr/omr#7620

String Builder Transformer uses the result of getStringUTF8Length to estimate the StringBuilder buffer size needed to accommodate appending a constant String to a StringBuilder. That could overestimate the space required. This has been changed to use getStringLength instead, to use the actual lengths of constant String objects. A test has also been added to detect integer overflow of the capacity estimate, aborting the transformation, as StringBuilder.<init> will throw a NegativeArraySizeException if the specified capacity is negative. Signed-off-by: Henry Zongaro <[email protected]>

The return type of the JVM's getStringUTF8Length method has changed from IDATA to UDATA. This change adjusts the JIT compiler's uses of that method. In particular, the return type of TR_FrontEnd::getStringUTF8Length and its overriding methods changes from intptr_t to int32_t. Similarly, the bufferSize argument of TR_FrontEnd::getStringUTF8 becomes uintptr_t where it was int32_t. The motivation was that the length of a UTF-8 encoded String could be greater than 2^32 bytes, so the length could overflow on a 32-bit platform. All uses of getStringUTF8Length in the JIT involve signatures, descriptors, method names and class names, which will never be large enough to exceed the range of the int32_t type. Just to be cautious, however, this change includes an assertion test that the computed length is in range for the int32_t type. This change also introduces a method, getStringUTF8UnabbreviatedLength, that returns a length value of type uint64_t if the JIT compiler ever needs to determine the UTF-8 encoded length of an arbitrary String. The method is currently unused. Signed-off-by: Henry Zongaro <[email protected]>

hzongaro · 2025-01-24T14:59:03Z

@jdmpapin, thank you for your off-line review of an earlier version of these changes. May I ask you to review this updated version?

jdmpapin · 2025-01-24T15:40:20Z

runtime/compiler/optimizer/StringBuilderTransformer.cpp

+   // calculation has overflowed, so halt early.
+   //
+   for (TR_Pair<TR::Node*, TR::RecognizedMethod>* pair = iter.getFirst();
+        (pair != NULL) && (capacity < std::numeric_limits<int32_t>::max());


If capacity is exactly the max value, it hasn't overflowed the signed range yet. Not that it's important at all to keep looping in that case.

jdmpapin · 2025-01-24T15:43:24Z

runtime/compiler/net/MessageTypes.hpp

@@ -131,6 +131,7 @@ enum MessageType : uint16_t
   VM_getObjectClassInfoFromObjectReferenceLocation,
   VM_stackWalkerMaySkipFrames,
   VM_getStringUTF8Length,
+   VM_getStringUTF8UnabbreviatedLength,


This message type is never sent. I think it's a holdover from an earlier revision of this change that attempted to provide a working implementation of TR_J9ServerVM::getStringUTF8UnabbreviatedLength() (which, for onlookers, can't be done because the server can never hold direct pointers to objects)

Could you please also delete the existing VM_getStringUTF8Length message type?

jdmpapin · 2025-01-24T15:56:05Z

runtime/compiler/env/VMJ9.cpp

   }

 char *
-TR_J9VMBase::getStringUTF8(uintptr_t objectPointer, char *buffer, intptr_t bufferSize)
+TR_J9VMBase::getStringUTF8(uintptr_t objectPointer, char *buffer, int32_t bufferSize)


This is almost doing the opposite of what the commit message says:

> Similarly, the bufferSize argument of TR_FrontEnd::getStringUTF8 becomes uintptr_t where it was int32_t

I think uintptr_t is the right type, since it can express the size of any buffer. Not that I expect the compiler to ever actually copy over 2GB of data out of a string object.

jdmpapin · 2025-01-24T16:06:21Z

runtime/compiler/env/VMJ9.h

+   /**
+    * \brief Returns the number of UTF-8 encoded bytes needed to represent a Java String object.
+    *        The number of bytes needed to UTF-8 encode the String is representable as
+    *        a \c uintptr_t, in general, but this method returns a length of type \c int32_t.


uintptr_t might not be enough on 32-bit, so in general we'd need uint64_t, which I think is why you made that the return type of getStringUTF8UnabbreviatedLength()

jdmpapin · 2025-01-24T16:29:13Z

runtime/compiler/env/j9method.cpp

@@ -8409,7 +8409,7 @@ TR_J9ByteCodeIlGenerator::runFEMacro(TR::SymbolReference *symRef)

         uintptr_t methodHandle;
         uintptr_t methodDescriptorRef;
-         intptr_t methodDescriptorLength;
+         uintptr_t methodDescriptorLength;


This change, and the similar changes below and in handleServerMessage(), seem unnecessary.

If we're concerned about possible overflow when adding 1 for the NUL terminator to size a buffer, then uintptr_t can avoid that, but on 32-bit intptr_t will be the same as int32_t, so overflow would still be possible (edit: at the other locations, not here). However, I don't think we're in any danger of passing a string anywhere near long enough to cause that overflow. It might be better to just adjust the assertion in getStringUTF8Length() to make sure that the result is always strictly less than the max, which would guarantee that we can always add 1 without overflow
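The suggestion above, asserting that the length is strictly less than the maximum so that adding 1 is always safe, could be sketched roughly as follows. The function names are hypothetical, and plain `assert` stands in for whatever assertion macro the real getStringUTF8Length() uses.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the narrowing done inside getStringUTF8Length().
// The check is strict (<, not <=), so the result is at most INT32_MAX - 1.
int32_t narrowUTF8Length(uint64_t fullLength)
   {
   assert(fullLength < static_cast<uint64_t>(std::numeric_limits<int32_t>::max()));
   return static_cast<int32_t>(fullLength);
   }

// Hypothetical caller that sizes a buffer including the NUL terminator.
// Because the narrowed length is at most INT32_MAX - 1, the addition
// below cannot overflow int32_t.
int32_t bufferSizeFor(uint64_t fullLength)
   {
   return narrowUTF8Length(fullLength) + 1;
   }
```

With the strict bound in one place, callers can keep using a signed 32-bit length and still add 1 without a separate overflow check.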

hzongaro added 2 commits January 23, 2025 12:38

hzongaro added the comp:jit and depends:omr labels Jan 24, 2025

hzongaro requested a review from jdmpapin January 24, 2025 14:58

hzongaro requested a review from dsouzai as a code owner January 24, 2025 14:58

hzongaro mentioned this pull request Jan 24, 2025

Change return type of TR_FrontEnd::getStringUTF8Length to int32_t eclipse-omr/omr#7620

Open

jdmpapin reviewed Jan 24, 2025

jdmpapin changed the title ~~Change return type of~~ Change return type of getStringUTF8Length Jan 24, 2025
