Fix Bug with valueLength being overwritten after Trim #1338

olabusayoT · 2024-10-15T22:07:29Z

currently after trimming the value of the element, we set the valueLength, and then overwrite it after returning from the parse that does the trimming. This results in the wrong value for the value length. This fixes it by only setting it for non-choice complexTypes, since simpleTypes are handled elsewhere
we also incorrectly use valueLength for prefixed length calculations when we ought to be using content length per the spec
we also did not ensure valueLength wasn't being overwritten, so we add asserts to setAbsStartPos0bInBits and setAbsEndPos0bInBits to verify that
fix bug where padding is being added around prefixed length element (DAFFODIL-2943) by changing CaptureLengthRegion to wrap around contentlengthStart and padding
fix bug where we were missing return after PE for Out of Range Binary Integers (DAFFODIL-2942)
fix bug where we were using the main element's qname instead of the prefixed element qname in the Unparse Error message
refactor Prefixed parsers to use state's bitLimit to get the prefix length (PrefixedLengthParserMixin2) since the specifiedLengthPrefixedParser will take care of parsing the prefix length
refactored Prefixed unparsers to not try to unparse prefix length since that is taken care of by SpecifiedLengthPrefixedUnparser
refactored prefixed parsers and unparsers to remove unused prefixed length parser related members
add tests

DAFFODIL-2658
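
The overwrite problem and the new asserts can be pictured with a small model. This is only a sketch, not Daffodil's code: `LengthStateSketch`, its fields, and `trimmedCapture` are hypothetical, and only the setter names `setAbsStartPos0bInBits` and `setAbsEndPos0bInBits` come from the description above.

```scala
object ValueLengthSketch {
  // Hypothetical stand-in for the state that records where a value starts and ends.
  final class LengthStateSketch {
    private var startPos0bInBits: Option[Long] = None
    private var endPos0bInBits: Option[Long] = None

    def setAbsStartPos0bInBits(pos: Long): Unit = {
      // Fail loudly instead of silently replacing an earlier capture.
      assert(startPos0bInBits.isEmpty, "start position already set")
      startPos0bInBits = Some(pos)
    }

    def setAbsEndPos0bInBits(pos: Long): Unit = {
      assert(endPos0bInBits.isEmpty, "end position already set")
      endPos0bInBits = Some(pos)
    }

    // Defined only once both positions have been captured.
    def lengthInBits: Option[Long] =
      for (s <- startPos0bInBits; e <- endPos0bInBits) yield e - s
  }

  // Models the trimming parser capturing a trimmed value of 3 characters (8 bits each).
  def trimmedCapture(): LengthStateSketch = {
    val st = new LengthStateSketch
    st.setAbsStartPos0bInBits(0L)
    st.setAbsEndPos0bInBits(24L)
    st
  }
}
```

In this model, a later attempt by an enclosing parser to set the end position again (for example to the untrimmed 80 bits) trips the assert rather than changing the captured length.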

jadams-tresys

+1

daffodil-test/src/test/resources/org/apache/daffodil/section12/lengthKind/PrefixedTests.tdml

.../src/main/scala/org/apache/daffodil/runtime1/processors/parsers/SpecifiedLengthParsers.scala

daffodil-core/src/main/scala/org/apache/daffodil/core/runtime1/ElementBaseRuntime1Mixin.scala

stevedlawrence · 2024-10-21T15:31:31Z

daffodil-core/src/main/scala/org/apache/daffodil/core/runtime1/ElementBaseRuntime1Mixin.scala

-      (isSimpleType && (impliedRepresentation == Representation.Text || lengthKind == LengthKind.Delimited)) ||
+    val capturedByValueParsers =
+      (isSimpleType && (
+        primType == PrimType.String || lengthKind == LengthKind.Delimited)) ||


I'm wondering if this change is correct?

For example, say we have this element:

<xs:element name="foo" type="xs:int" dfdl:representation="text" dfdl:trimKind="padChar" dfdl:lengthKind="explicit" dfdl:length="10" ... />

So a fixed length text integer with padding. In this case I think what Daffodil does is it creates a String parser to parse the fixed length string and remove padding, and then creates another parser to convert that string to an actual integer.

So in that case, even though the primType is not String, I think the String parser will still be used to capture the value length after padding is removed. So I think impliedRepresentation == Text is still needed?
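
The two-stage behavior described here can be sketched as follows. This is an illustration only: `parseFixedLengthTextInt` and `Parsed` are hypothetical names, lengths are counted in characters, and the text is assumed to be right-justified so that padding is stripped from the left.

```scala
object PaddedTextIntSketch {
  // Result of the hypothetical two-stage parse.
  final case class Parsed(value: Int, contentLengthChars: Int, valueLengthChars: Int)

  // Stage 1: take the fixed-length field and strip pad characters from the left.
  // Stage 2: convert the remaining text to an Int.
  def parseFixedLengthTextInt(data: String, length: Int, padChar: Char): Parsed = {
    val content = data.take(length)               // the whole fixed-length region
    val trimmed = content.dropWhile(_ == padChar) // what the string stage hands on
    Parsed(trimmed.toInt, content.length, trimmed.length)
  }
}
```

The value length is known as soon as the string stage finishes trimming, which is why the element's primType being int does not by itself say which parser captures it.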

- currently after trimming the value of the element, we set the valueLength, and then overwrite it after returning from the parse that does the trimming. This results in the wrong value for value length. This fixes it by checking if the valueLength has already been set, and only setting it in SpecifiedLengthParserBase.parse if it hasn't
- add tests

DAFFODIL-2658

- update check to captureValueLength in base parser if it's a complex element
- update shouldCaptureParseValueLength to use PrimType.String instead of representation Text, since string valueLengths (and delimited valueLengths) are always handled by value parsers

DAFFODIL-2658

- add asserts to setAbsStartPos0bInBits and setAbsEndPos0bInBits to ensure they're not being overwritten
- undo change to capturedByValueParsers to put back impliedRepresentation == Text
- do not capture the value lengths of choices in specified length parser base
- fix bug where padding is being added around prefixed length element (DAFFODIL-2943) by changing CaptureLengthRegion to wrap around contentlengthStart and padding
- fix bug where we use the valuelength to calculate the prefix length; according to the spec it should be the content length
- fix bug where we were missing return after PE for Out of Range Binary Integers (DAFFODIL-2942)
- fix bug where we were using the main element's qname instead of the prefixed element qname in the Unparse Error message
- refactor Prefixed parsers to use state's bitLimit to get the prefix length (PrefixedLengthParserMixin2) since the specifiedLengthPrefixedParser will take care of parsing the prefix length
- refactored Prefixed unparsers to not try to unparse prefix length since that is taken care of by SpecifiedLengthPrefixedUnparser
- refactored prefixed parsers and unparsers to remove unused prefixed length parser related members

DAFFODIL-2658

stevedlawrence

Looks reasonable and the right approach for correctly implementing prefixed length. Using things like valueLength was clearly wrong. Just a few questions.

stevedlawrence · 2024-11-04T14:58:16Z

...l-core/src/main/scala/org/apache/daffodil/core/grammar/primitives/PrimitivesLengthKind.scala

@@ -186,21 +185,11 @@ case class HexBinaryEndOfBitLimit(e: ElementBase) extends Terminal(e, true) {
 case class HexBinaryLengthPrefixed(e: ElementBase) extends Terminal(e, true) {


This class is now exactly the same as HexBinaryEndOfBitLimit, suggest we just delete it and use that for prefixed hex binary.

stevedlawrence · 2024-11-04T15:31:39Z

daffodil-core/src/main/scala/org/apache/daffodil/core/grammar/ElementBaseGrammarMixin.scala

-        new SpecifiedLengthExplicit(this, body, bitsMultiplier)
+        if (isSimpleType && primType == PrimType.HexBinary) {
+          // hexBinary has some checks that need to be done that SpecifiedLengthExplicit
+          // gets in the way of


I think instead of this comment, we can say HexBinary has its own HexBinarySpecifiedLength parser that handles calculating the length, so we do not need the SpecifiedLengthExplicit parser?

In fact, do we need to exclude a number of other primitive types that do their own explicit length handling? Looking at the current code base, I think maybe only simple types that are strings and complex types use the SpecifiedLengthExplicit parser? I think all other primitives implement their own specified length handling?

So maybe this wants to be

if (isComplexType || primType == PrimType.String) {
  SpecifiedLengthExplicit(...)
} else {
  // non-string simple types have their own custom parsers/unparsers for handling explicit lengths
  body
}

In fact, I wonder if we eventually want to refactor all of this to completely get rid of all the custom explicit/implicit length parsers? We would just have various SpecifiedLength parsers that set a bit limit (based on a pattern, a prefix length, evaluating a length expression, etc.) and then a single parser that just reads all bits up to the current bit limit. Separation of concerns kind of thing. It would get rid of this condition and all these BinaryIntegerKnownLength/RuntimeLength/PrefixLength/etc. parsers. There would be just a single BinaryNumberParser, and it would just get the length from the bitLimit.

Maybe that generality would take a performance hit? I'm also not exactly sure how that would work with unparsing: the SpecifiedLengthUnparser would need to somehow pass the calculated length to the child unparser. I guess it could still use bitLimit, since that is a thing in UState?
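
A rough sketch of that separation of concerns on the parse side, under stated assumptions: `PStateSketch`, `ParserSketch`, `BinaryNumberFromBitLimit`, and `SpecifiedLengthSketch` are all hypothetical and far simpler than Daffodil's real state and parsers. The number is read as an unsigned, big-endian integer purely for illustration.

```scala
object BitLimitSketch {
  // Hypothetical, heavily simplified parse state: bits, a position, an optional limit.
  final class PStateSketch(val bits: Vector[Boolean]) {
    var bitPos0b: Long = 0L
    var bitLimit0b: Option[Long] = None
  }

  trait ParserSketch { def parse(st: PStateSketch): BigInt }

  // Knows nothing about how the length was determined: it reads every bit
  // from the current position up to the bit limit (or end of data).
  object BinaryNumberFromBitLimit extends ParserSketch {
    def parse(st: PStateSketch): BigInt = {
      val limit = st.bitLimit0b.getOrElse(st.bits.length.toLong)
      var acc = BigInt(0)
      while (st.bitPos0b < limit) {
        val bit = if (st.bits(st.bitPos0b.toInt)) BigInt(1) else BigInt(0)
        acc = (acc << 1) | bit
        st.bitPos0b += 1
      }
      acc
    }
  }

  // Only knows how long the element is; sets the limit, delegates the reading
  // to its child, and restores the previous limit afterwards.
  final class SpecifiedLengthSketch(lengthInBits: Long, child: ParserSketch)
      extends ParserSketch {
    def parse(st: PStateSketch): BigInt = {
      val saved = st.bitLimit0b
      st.bitLimit0b = Some(st.bitPos0b + lengthInBits)
      try child.parse(st)
      finally st.bitLimit0b = saved
    }
  }
}
```

The sketch says nothing about performance or unparsing, which are the open questions raised above.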

@mbeckerle, any thoughts on refactoring the code to have various specified length parsers as described above, and any idea how that would work with unparsing?

stevedlawrence · 2024-11-04T15:57:32Z

daffodil-core/src/main/scala/org/apache/daffodil/core/grammar/ElementBaseGrammarMixin.scala

+          body
+        } else {
+          new SpecifiedLengthExplicit(this, body, bitsMultiplier)
+        }
      case LengthKind.Explicit => {
        Assert.invariant(!knownEncodingIsFixedWidth)
        Assert.invariant(lengthUnits eq LengthUnits.Characters)


Below this we have cases for implicit lengths. Do we need to do anything special for non-string simple types? I think those primitives have custom parsers that handle the implicit length logic and don't need a SpecifiedLengthImplicit grammar? My concern is that we could be adding that grammar and it would do something like set a bit limit, but the child parser that actually parses the thing would just use its own calculation and wouldn't need the bit limit, so we would just be wasting effort.

stevedlawrence · 2024-11-04T16:02:41Z

daffodil-core/src/main/scala/org/apache/daffodil/core/grammar/primitives/PrimitivesBCD.scala

-    e.lengthUnits,
-    e.prefixedLengthAdjustmentInUnits
-  )
+  override lazy val parser = new BCDIntegerPrefixedLengthParser(e.elementRuntimeData)


I wonder if we want to rename these to something like BCDIntegerBitLimitLengthParser and BCDIntegerMinimumLengthUnparser, making it clear that they aren't really doing anything specifically with prefixed length, and better describing how they actually parse/unparse things?

And when we implement things like lengthKind endOfParent, we could probably just use the same BitLimitParser, for example.

stevedlawrence · 2024-11-04T16:19:38Z

...ime1/src/main/scala/org/apache/daffodil/runtime1/processors/parsers/BinaryNumberTraits.scala

+ * This mixin doesn't require parsing the prefix length element and just uses
+ * the state's bitLimit and position to get the bitLength instead
+ */
+trait PrefixedLengthParserMixin2 {


Need a different name for this. Numbers are not descriptive enough. Maybe we rename these to something like PrefixLengthFromParserMixin and PrefixLengthFromBitLimitMixin? Something to make it clearer how they differ without having to read the code.
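
To show what the bit-limit variant of the mixin amounts to, here is a minimal sketch. It borrows the suggested name `PrefixLengthFromBitLimitMixin`; the `StateSketch` class, the `getBitLength` method, and the error handling are assumptions for illustration and are not taken from the PR.

```scala
object BitLengthMixinSketch {
  // Hypothetical minimal view of the parse state.
  final case class StateSketch(bitPos0b: Long, bitLimit0b: Option[Long])

  trait PrefixLengthFromBitLimitMixin {
    // The enclosing specified-length parser has already parsed the prefix and
    // set the bit limit, so the length is just the distance to that limit.
    def getBitLength(st: StateSketch): Long = {
      val limit = st.bitLimit0b.getOrElse(
        throw new IllegalStateException("bitLimit must be set by the enclosing parser")
      )
      limit - st.bitPos0b
    }
  }

  // A parser would mix the trait in; an object is enough for the sketch.
  object ExampleParser extends PrefixLengthFromBitLimitMixin
}
```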

stevedlawrence · 2024-11-04T16:45:40Z

daffodil-core/src/main/scala/org/apache/daffodil/core/grammar/ElementBaseGrammarMixin.scala

+      case Prefixed => true
+      case _ => false
+    }
+  }


This can just be lengthKind eq LengthKind.Prefixed. Don't really need a match/case if we are just going to return true/false.
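
The two forms are equivalent, as the following sketch shows. The `LengthKind` here is a hypothetical stand-in with only three members, not Daffodil's generated enumeration; reference equality with `eq` works because each kind is a singleton object.

```scala
object IsPrefixedSketch {
  // Hypothetical stand-in for the LengthKind enumeration.
  sealed trait LengthKind
  object LengthKind {
    case object Explicit extends LengthKind
    case object Delimited extends LengthKind
    case object Prefixed extends LengthKind
  }

  // The match/case form from the diff.
  def isPrefixedWithMatch(lengthKind: LengthKind): Boolean = lengthKind match {
    case LengthKind.Prefixed => true
    case _ => false
  }

  // The suggested one-liner.
  def isPrefixed(lengthKind: LengthKind): Boolean =
    lengthKind eq LengthKind.Prefixed
}
```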

stevedlawrence · 2024-11-04T16:46:17Z

daffodil-core/src/main/scala/org/apache/daffodil/core/grammar/ElementBaseGrammarMixin.scala

    import LengthKind._
    lengthKind match {
      case Delimited =>
        true // don't test for hasDelimiters because it might not be our delimiter, but a surrounding group's separator, or it's terminator, etc.
      case Pattern => true
-      case Prefixed => true
+      case Prefixed => isPrefixed


I would suggest just returning true here, isPrefixed doesn't do anything except check lengthKind == Prefixed, which is exactly what this match case does.

… Trim
- rename custom prefixedlength parsers to *BitLengthParsers to more accurately reflect what they're doing
- rename custom prefixedlength unparsers to MinimumLengthUnparsers to more accurately reflect what they're doing
- rename PrefixedLengthParserMixin2 to BitLengthFromBitLimitMixin to more accurately reflect what it is
- got rid of HexBinaryLengthPrefixed since it is the same as HexBinaryEndOfBitLimit

DAFFODIL-2658

stevedlawrence

It's still not clear to me that we are correctly adding/not adding SpecifiedLength* parsers. Do we have any documentation or any easy way to tell that we are doing things correctly, and only adding those parsers when they are actually needed?

stevedlawrence · 2024-11-13T14:33:37Z

daffodil-codegen-c/src/main/scala/org/apache/daffodil/codegen/c/DaffodilCCodeGenerator.scala

@@ -289,7 +289,8 @@ object DaffodilCCodeGenerator
      case g: ElementParseAndUnspecifiedLength =>
        elementParseAndUnspecifiedLengthGenerateCode(g, cgState)
      case g: ElementUnused => noop(g)
-      case g: HexBinaryLengthPrefixed => hexBinaryLengthPrefixedGenerateCode(g.e, cgState)
+      case g: HexBinaryEndOfBitLimit if g.e.isPrefixed =>


Hmm, maybe my suggestion was wrong and the primitive still wants to be something like HexBinaryLengthPrefixed so that the grammar is obvious and things like code generators, etc. can use the obvious grammar names? The parsers generated don't necessarily have to match the names.

Hmm...do you mean the suggestion to remove HexBinaryLengthPrefixed or the suggestion to rename all PrefixedLength parsers/unparser to BitLimitLength/MinimumLength?

Sorry that wasn't clear. I'm suggesting that we should keep the old grammar name HexBinaryLengthPrefixed so this change isn't needed. This way the grammar is clearer for things like the code generator that examine it. But it's still fine to call the parsers BitLimitLength/MinimumLength etc. from that grammar.

stevedlawrence · 2024-11-13T14:54:18Z

daffodil-core/src/main/scala/org/apache/daffodil/core/grammar/ElementBaseGrammarMixin.scala

-          if isSimpleType && impliedRepresentation == Representation.Binary =>
+          if isSimpleType &&
+            impliedRepresentation == Representation.Binary &&
+            primType != PrimType.HexBinary =>
        new SpecifiedLengthImplicit(this, body, implicitBinaryLengthInBits)


This condition feels like it doesn't exclude enough things. For example, this matches all binary types except hex binary. But, for example, integers with length kind implicit use something like BinaryIntegerKnownLength primitive/parser, which doesn't rely on SpecifiedLengthImplicit parser. It doesn't necessarily hurt, but it's just unnecessary work and might slow things down, since the BinaryIntegerKnownLengthParser is going to do that work.

I'm wondering if it's just string types that really make use of the specified length stuff?

So it looks like nillable binary types need SpecifiedLengthImplicit; they are the only tests failing when I comment out that chunk of code.

I think that means we only need the SpecifiedLengthImplicit when we are trying to parse the nillable part of a number. For example, the way things currently work is we have something like a SimpleNilOrValueParser that has two child parsers: one is the nil parser (which sounds like it requires SpecifiedLengthImplicit) and the other is the binary parser (which probably does not need SpecifiedLengthImplicit).

I wonder if captureLengthRegion needs a new parameter (e.g. forNilContent: Boolean) which is passed into specifiedLength. This way specifiedLength can do different things depending on whether it's trying to represent nil content or non-nil content.
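
One way to picture the `forNilContent` idea, as a sketch only: the `Gram`, `Body`, and `SpecifiedLengthImplicit` case classes below are toy stand-ins, and the rule shown (wrap only the nil branch) is an assumption about implicit-length binary simple types drawn from this discussion, not the behavior of the real `specifiedLength`.

```scala
object NilContentSketch {
  // Toy grammar nodes, hypothetical and much simpler than Daffodil's Gram classes.
  sealed trait Gram
  final case class Body(name: String) extends Gram
  final case class SpecifiedLengthImplicit(body: Gram, lengthInBits: Long) extends Gram

  // Assumed rule: the nil parser needs the bit limit set for it, while the
  // binary value parser computes its own length and can be left unwrapped.
  def specifiedLength(body: Gram, implicitLengthInBits: Long, forNilContent: Boolean): Gram =
    if (forNilContent) SpecifiedLengthImplicit(body, implicitLengthInBits)
    else body
}
```

With this shape, the two children of something like SimpleNilOrValueParser would be built from the same function but with different flags.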

olabusayoT requested review from mbeckerle, stevedlawrence and jadams-tresys October 15, 2024 22:07

jadams-tresys approved these changes Oct 16, 2024

stevedlawrence reviewed Oct 16, 2024

stevedlawrence reviewed Oct 21, 2024

olabusayoT added 2 commits November 1, 2024 00:13

olabusayoT force-pushed the daf-2658-paddingNotRemoved branch 2 times, most recently from 8feac7f to 1a0b7cc Compare November 1, 2024 04:32

olabusayoT force-pushed the daf-2658-paddingNotRemoved branch from 1a0b7cc to 78050c8 Compare November 1, 2024 04:36

stevedlawrence reviewed Nov 4, 2024

stevedlawrence reviewed Nov 13, 2024
