Check for Unused Properties on Annotated Schema Components #1366

olabusayoT · 2024-11-07T01:56:23Z

currently we check unused properties on terms only, which means unsused properties on non-term schema components can be missed. This will fix that issue, by moving the checks to AnnotatedSchemaComponents instead.
fix bug with XercesSchemaFileLocation.equals assuming all incoming objects would be XercesSchemaFileLocations as well. We also get just plain SchemaFileLocations, which can break things, so instead we compare against SchemaFileLocation members
add tests

mbeckerle

This is going to be really interesting to see how many warnings this kicks out on existing schemas.

Requesting one change only which is to reuse the existing list of all schema components rather than having your own traversal.

mbeckerle · 2024-11-07T16:52:29Z

daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/AnnotatedSchemaComponent.scala

+      }
+    }
+
+    val descendantsForCheck = this match {


There are lists of all schema components elsewhere. You do not need to do your own recursive traversal to obtain them.

SchemaSet.allSchemaComponents is a Seq[SchemaComponent]

So I actually looked into that, but becauseit is a Seq[SchemaComponent], and not an AnnotatedSchemaComponent, it doesn't know what checkUsedProperties is. Attempts to move checkUsedProperties to SchemaComponent instead resulted in issues since things like formatAnnotations that we use to grab the properties only exist in AnnotatedSchemaComponent.

Can't you just call the method checkUsedProperties on the schema components that happen to be AnnotatedSchemaComponents? E.g.,

schemaSet.allSchemComponents.collect{ case asc: AnnotatedSchemaComponent => asc.checkUsedProperties }

Done. Also added propagating the elements down from referencer to referencee

And from simple type to element (if it has any used properties) and from element to simple type (but only for the properties the simple type carries)

jadams-tresys

+1 assuming there isn't a way to reuse the code that Mike was referring to, which it sounds like there isn't.

mbeckerle

Requesting some additional documentation/scaladoc and comments be added.

mbeckerle · 2024-11-14T17:49:12Z

daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/SchemaSet.scala

+          val std = ost.get.asInstanceOf[SimpleTypeDefBase]
+          if (std.propCache.nonEmpty) {
+            // if it has used properties transfer it to the element that's using it
+            // we don't need the check for what is carries since all properties on the type are shared


typo in comment "is carries" -> "it carries"

mbeckerle · 2024-11-14T17:52:11Z

daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/SchemaSet.scala

+      case _ => // do nothing
+    }
+  }
+
  private lazy val checkUnusedProperties = LV('hasUnusedProperties) {


Needs same scaladoc warning as is on the AnnotatedSchemaComponent val of the same name. This pass must happen after all other compilation has been done.

Note that the lazy val name is 'checkUnusedProperties' but the LV symbol is 'hasUnusedProperties'. They should match.

mbeckerle · 2024-11-14T17:57:48Z

daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/SchemaSet.scala

+  private lazy val propagateReferenceeAndTypeUsedProperties = {
+    root.allComponents.collect {
+      case ref: AbstractElementRef =>
+        val referent = ref.optReferredToComponent.get


Is a referent the one doing the referring, or the one referred to? Ambiguous to me. Explain val referent with a comment.

mbeckerle · 2024-11-14T18:00:16Z

daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/SchemaSet.scala

@@ -540,8 +540,51 @@ final class SchemaSet private (
    .flatMap(_.defineVariables)
    .union(predefinedVars)

+  // propagated to referenced element propCache


Please give real scaladoc to this method and explain the problem (which I think is that a simpleType could have properties that we only know are used due to an element decl that references that type.
Explain the algorithm being used to determine properties that are unused. The way you are using and populating the propCache to get this answer and how information moves from one object's propCache to another is needs explanation.

The code here is simpler than I expected. Nothing is chasing the simpleType base chains for example, and there is some reason why we don't have to do that. You have a case for element refs and one for element decls, but nothing for simple types?

stevedlawrence · 2024-11-14T20:20:14Z

daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/SchemaSet.scala

+        // after compilation is done, we want to walk through all our refs and
+        // if a property is in the cache of the referencer, put it into the
+        // cache of the referenced element
+        propagateReferenceeAndTypeUsedProperties


I wonder if it would be possible or benefiit do this propogation at the time of property lookup instead of a step at the end of compilation? For example, is it possible for findPropertyOption to add to more than just one cache? It could add to its current cache but also use this logic to find any element reference/simpletypes and add to their cache as well? That way all the cache stuff is in one spot?

Or maybe alternatively, could checkUnusedProperties be updated to inspect the referenced propCaches instead of just its, and then we don't have to propagate/copy things at all?

So I considered doing at the time of property lookup, but we do that in daffodil-lib which doesn't have access to the daffodil-lib classes like ElementRef that'd make it possible to find the element refs/simple types.

For the 2nd suggestion, we run into the same issues where 2 element refs share the element they are referencing but only the first element ref 's propCache show it as using the properties, not the referenced element and not the 2nd element ref. Which is why we opted for propagation

daffodl-core classes like ElementRef*

stevedlawrence · 2024-11-14T20:32:47Z

...dil-test/src/test/resources/org/apache/daffodil/section14/sequence_groups/SequenceGroup.tdml

@@ -1164,7 +1164,7 @@
      </xs:complexType>
    </xs:element>
    <xs:group name="g0">
-      <xs:sequence dfdl:separatorPosition="infix">
+      <xs:sequence>


Can group sequences really not have any properties? I thought that was legal?

They can have properties and those are combined with those on the group reference to form the combined set of properties for that sequence.

Then why did dfdl:separatorPosition need to be removed from these group sequences? I assumed it was removed because it was creating a unused property warning, but it seems like the property should be used?

It was indeed creating an unused property warning, but since the default separatorPosition is infix. I deemed them unnecessary and safe to remove.

Seems like that shouldn't create a warning. Even though it's the same as the default the property should still be used. It just means the default format won't be used. Maybe there's an additional code change needed?

Ok, I'll add them back in and investigate what's going on

- currently we check unused properties on terms only, which means unsused properties on non-term schema components can be missed. This will fix that issue, by moving the checks to AnnotatedSchemaComponents instead. - fix bug with XercesSchemaFileLocation.equals assuming all incoming objects would be XercesSchemaFileLocations as well. We also get just plain SchemaFileLocations, which can break things, so instead we compare against SchemaFileLocation members - add tests DAFFODIL-2798

- fix tests with unncessery properties - update TransitiveClosureSchemaComponents to include EnumerationDef - use root.allComponents to call checkUnusedProperties - use root.allComponents to propagate used properties of a referencer element to the referenced element DAFFODIL-2798

- update CheckUsedProperties comment DAFFODIL-2798

…Components - copy used properties from type to element and element to type - only copy properties if the element or it's simpletype have the property - update test DAFFODIL-2798

…Schema Components - move propagateUsedProperties to AnnotatedSchemaComponent - call function recursively for elements being referenced and simple types - add group refs property propagation to groups - add scaladoc DAFFODIL-2798

mbeckerle

I know this issue has taken a long time to fix. However, we really need to use the Root.refMap to walk the components more efficiently pulling the property-was-used knowledge as we walk from the the referenced ASCs recursively to their referencing ASCs so that we visit everything exactly once.

You're still going to read the propCache as a way of saying "this property was used" on the local element decls, element refs, and group refs, so this still requires that schema compilation is complete. I.e., this is a separate pass.

But I believe it will be an improvement creating a separate data structure that captures properties not in the local propCache, but used by all referencing ASCs. That eliminates clobbering the propCache. That means the propCache remains available for other kinds of analysis and post-processing of the schema.

Tests are excellent.

mbeckerle · 2024-11-21T16:29:34Z

daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/AnnotatedSchemaComponent.scala

+   * the properties its used, and the second element doesn't because of sharing. With
+   * this val, we copy used properties from element refs to elements, group refs to groups,
+   * from elements to their simple types, if the simple type is annotated with the property,
+   * and from simple types to their base types


This comment says what we do, but not why.

Start this comment by saying this is a central part of the algorithm by which we identify properties that are present, but not used.

The overall algorithm is expressed in multiple places. This part handles the propagation of properties from referencer to referenced object when property combining occurs. The rest of the algorithm is then just traversing every AnnocatedSchemaComponent to see what properties are present on them that do not appear in the propCache. This overall explaining of the algoirthm maybe doesn't go on this lazy val but elsewhere at the top level where this gets called perhaps.

It is also important to point out that this is clobbering the property caches, making them subsequently unusable for their initial property lookup uses. Hence, this must be done in a pass that happens only after all other schema compilation involving properties is complete.

mbeckerle · 2024-11-21T16:33:48Z

daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/AnnotatedSchemaComponent.scala

+   * (e.g. schema compilation has finished) to ensure there are no false
+   * positives.
+   */
+  final lazy val checkUnusedProperties: Unit = {


Add to comment that the propagateUsedProperties must be called on all AnnotatedSchemaComponents before this will work properly.

Add that this is not a recursive walk. It identifies and issues warnings about unaccessed properties on just one AnnotatedSchemaComponent.

mbeckerle · 2024-11-21T16:43:30Z

daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/SchemaSet.scala

@@ -540,8 +540,25 @@ final class SchemaSet private (
    .flatMap(_.defineVariables)
    .union(predefinedVars)

+  /**
+   * propagate used properties through AnnotatedSchemaComponent


Add that this is part of the algorithm for identifying properties that are present, but unused.

Add that not only must this be done after compilation is complete, but that it repurposes the propCaches which are no longer usable for property lookups after this is done.

mbeckerle · 2024-11-21T16:47:39Z

daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/AnnotatedSchemaComponent.scala

+      .map { _.formatAnnotation.justThisOneProperties }
+      .getOrElse(Map.empty)
+
+    val usedProperties = propCache


Not in this file, but in the files where the PropCache type is defined you need to add comments there that the PropCache plays two roles in different phases of the compilation. First for property lookups, second for determining which properties are present, but unused.

You need to explain, there, that these two uses are incompatible. The first use must be over. Then the caches are repurposed for the second algorithm.

mbeckerle · 2024-11-21T16:53:41Z

daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/AnnotatedSchemaComponent.scala

+   * Used to propagate used properties through all element/group references, element declarations
+   * and simple types. This is necessary because we have elements being referenced
+   * by multiple element ref, where the first element gets its propCache populated with
+   * the properties its used, and the second element doesn't because of sharing. With


Grammar "its" => "it has". Some rephrasing and splitting into paragraphs will make this clearer.

mbeckerle · 2024-11-21T18:25:49Z

daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/SchemaSet.scala

+   */
+  private lazy val propagateUsedProperties = {
+    root.allComponents.collect { case asc: AnnotatedSchemaComponent =>
+      asc.propagateUsedProperties


Unless you do something to walk this in a sensible order, this is a computationally expensive O(n^2) or worse algorithm.

See other comment above about how to do this in a "pull" style that visits each ASC exactly once.

mbeckerle · 2024-11-21T18:34:17Z

daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/AnnotatedSchemaComponent.scala

+      case ele: ElementDeclMixin =>
+        val ost = ele.optSimpleType
+        ost.collect { case std: SimpleTypeDefBase =>
+          // if the type has used properties transfer it to the element that's using it


I don't understand why you have to copy from the simple type back to the element. Please add comment explaining why this is needed.

I would think this would go only from element to type (to base type, etc.)

mbeckerle · 2024-11-21T18:39:49Z

daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/AnnotatedSchemaComponent.scala

+          // copy the used properties from the element to the type
+          ele.propCache.foreach(kv =>
+            if (std.formatAnnotation.justThisOneProperties.contains(kv._1)) {
+              // only add the used properties that the simple type carries, not any local element properties that aren't shared


The only properties I know on simple elements that can only be on the element, not the type, are dfdl:inputValueCalc and dfdl:outputValueCalc. Those can't be unused.

So when this comment says to only add the used properties that the simple type carries, not any local element properties that aren't "shared". What does "shared" mean? I also don't see any logic about removing any "uncarried" properties.

mbeckerle · 2024-11-21T19:15:23Z

daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/AnnotatedSchemaComponent.scala

+            }
+          )
+          // propagate through the element's simple type base chain
+          std.propagateUsedProperties


This propagation down the chain is the problematic part of this algorithm. One is potentially going to walk the length of this chain many more times than is necessary, and not by some small amount.

Consider trying this:

Take the list of all annotated schema comopnents ASCs, and divide it up into lists of

all global group defs

all global element defs

For all global group defs, process them by using Root.refMap to find all referencing global group refs and pull the propCache entries from there over onto the global group def.

For all global element decls, process them by using Root.refMap to find all element refs and pull the propCache entries from there over onto the global element decl.

For all global simple type defs, process them by using Root.refMap to find all elements that reference them and pull the propCache entries from there onto the globalSimpleType def.

For all local simple type defs, process them by pulling the propCache entries from their enclosing element onto them.

This leaves only global simple type defs with references to them from other simple type defs (which can be global or local).

These need to be processed in a specific order to avoid redundant processing. Create a table of "done" global simple type defs, initially empty.

Then start from all global simple type defs whose base is a primitive type (call these leaf simple type defs).
Process each of these as follows.

Use Root.refMap to find the other simple type defs which reference this one.

If they are found in the "done" table, pull their properties onto this simple type.

If they are not found in the "done" table, recursively process them, (they will become "done") then pull their properties onto this simple type.

When all the simple type defs which reference this one are "done" and their properties have been pulled onto this one, add this simple type def to the "done" table, and go on to the next in the list.

That is a walk which visits all referencers first before any referenced, so will propagate the propCache entries visiting each simple type def exactly once.

An additional thought: If you can walk in this "referencers before referenced" order by way of the Root.refMap, starting from the leaf simple types, the global group defs, and the global element defs (having complex type), then, for each ASC you can compute a separate value which is the set of all properties used.

That is, you don't need to be doing put() calls on the propCache. You can read the propCache, but fill in a separate structure of just the properties that are used so as to isolate from those that are unused. This could be a separate structure which holds only properties in the referencer's propCache that are NOT found in the referenced object's propCache, so that this new structure is not redundant with the contents of the propCache.

This would be cleaner and much more space efficient than reusing the propCache data structure in this way that reads it but also modifies it.

olabusayoT requested review from mbeckerle, stevedlawrence and jadams-tresys November 7, 2024 01:56

mbeckerle requested changes Nov 7, 2024

View reviewed changes

jadams-tresys approved these changes Nov 7, 2024

View reviewed changes

olabusayoT requested a review from mbeckerle November 14, 2024 16:58

mbeckerle requested changes Nov 14, 2024

View reviewed changes

stevedlawrence reviewed Nov 14, 2024

View reviewed changes

olabusayoT added 5 commits November 20, 2024 17:51

fixup! fixup! Check for Unused Properties on Annotated Schema Components

d1ae3b6

- update CheckUsedProperties comment DAFFODIL-2798

fixup! fixup! fixup! Check for Unused Properties on Annotated Schema …

76b164e

…Components - copy used properties from type to element and element to type - only copy properties if the element or it's simpletype have the property - update test DAFFODIL-2798

olabusayoT force-pushed the daf-2798-checkUnusedProperties branch from 15f32de to 19f0c94 Compare November 20, 2024 22:52

mbeckerle requested changes Nov 21, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Check for Unused Properties on Annotated Schema Components #1366

Check for Unused Properties on Annotated Schema Components #1366

olabusayoT commented Nov 7, 2024

mbeckerle left a comment

mbeckerle Nov 7, 2024

olabusayoT Nov 7, 2024

mbeckerle Nov 7, 2024

olabusayoT Nov 7, 2024

olabusayoT Nov 8, 2024

jadams-tresys left a comment

mbeckerle left a comment

mbeckerle Nov 14, 2024

mbeckerle Nov 14, 2024 •

edited

Loading

mbeckerle Nov 14, 2024

mbeckerle Nov 14, 2024

stevedlawrence Nov 14, 2024

olabusayoT Nov 15, 2024

olabusayoT Nov 15, 2024

stevedlawrence Nov 14, 2024

mbeckerle Nov 14, 2024

stevedlawrence Nov 15, 2024

olabusayoT Nov 15, 2024

stevedlawrence Nov 15, 2024

olabusayoT Nov 15, 2024

mbeckerle left a comment

mbeckerle Nov 21, 2024

mbeckerle Nov 21, 2024

mbeckerle Nov 21, 2024

mbeckerle Nov 21, 2024

mbeckerle Nov 21, 2024

mbeckerle Nov 21, 2024

mbeckerle Nov 21, 2024

mbeckerle Nov 21, 2024

mbeckerle Nov 21, 2024

mbeckerle Nov 21, 2024

Check for Unused Properties on Annotated Schema Components #1366

Are you sure you want to change the base?

Check for Unused Properties on Annotated Schema Components #1366

Conversation

olabusayoT commented Nov 7, 2024

mbeckerle left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jadams-tresys left a comment

Choose a reason for hiding this comment

mbeckerle left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mbeckerle Nov 14, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mbeckerle left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mbeckerle Nov 14, 2024 •

edited

Loading