-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check for Unused Properties on Annotated Schema Components #1366
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to be really interesting to see how many warnings this kicks out on existing schemas.
Requesting one change only which is to reuse the existing list of all schema components rather than having your own traversal.
} | ||
} | ||
|
||
val descendantsForCheck = this match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are lists of all schema components elsewhere. You do not need to do your own recursive traversal to obtain them.
SchemaSet.allSchemaComponents is a Seq[SchemaComponent]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I actually looked into that, but becauseit is a Seq[SchemaComponent], and not an AnnotatedSchemaComponent, it doesn't know what checkUsedProperties is. Attempts to move checkUsedProperties to SchemaComponent instead resulted in issues since things like formatAnnotations that we use to grab the properties only exist in AnnotatedSchemaComponent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't you just call the method checkUsedProperties on the schema components that happen to be AnnotatedSchemaComponents? E.g.,
schemaSet.allSchemComponents.collect{ case asc: AnnotatedSchemaComponent => asc.checkUsedProperties }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Also added propagating the elements down from referencer to referencee
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And from simple type to element (if it has any used properties) and from element to simple type (but only for the properties the simple type carries)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 assuming there isn't a way to reuse the code that Mike was referring to, which it sounds like there isn't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requesting some additional documentation/scaladoc and comments be added.
val std = ost.get.asInstanceOf[SimpleTypeDefBase] | ||
if (std.propCache.nonEmpty) { | ||
// if it has used properties transfer it to the element that's using it | ||
// we don't need the check for what is carries since all properties on the type are shared |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo in comment "is carries" -> "it carries"
case _ => // do nothing | ||
} | ||
} | ||
|
||
private lazy val checkUnusedProperties = LV('hasUnusedProperties) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs same scaladoc warning as is on the AnnotatedSchemaComponent val of the same name. This pass must happen after all other compilation has been done.
Note that the lazy val name is 'checkUnusedProperties' but the LV symbol is 'hasUnusedProperties'. They should match.
private lazy val propagateReferenceeAndTypeUsedProperties = { | ||
root.allComponents.collect { | ||
case ref: AbstractElementRef => | ||
val referent = ref.optReferredToComponent.get |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is a referent the one doing the referring, or the one referred to? Ambiguous to me. Explain val referent with a comment.
@@ -540,8 +540,51 @@ final class SchemaSet private ( | |||
.flatMap(_.defineVariables) | |||
.union(predefinedVars) | |||
|
|||
// propagated to referenced element propCache |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please give real scaladoc to this method and explain the problem (which I think is that a simpleType could have properties that we only know are used due to an element decl that references that type.
Explain the algorithm being used to determine properties that are unused. The way you are using and populating the propCache to get this answer and how information moves from one object's propCache to another is needs explanation.
The code here is simpler than I expected. Nothing is chasing the simpleType base chains for example, and there is some reason why we don't have to do that. You have a case for element refs and one for element decls, but nothing for simple types?
// after compilation is done, we want to walk through all our refs and | ||
// if a property is in the cache of the referencer, put it into the | ||
// cache of the referenced element | ||
propagateReferenceeAndTypeUsedProperties |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it would be possible or benefiit do this propogation at the time of property lookup instead of a step at the end of compilation? For example, is it possible for findPropertyOption
to add to more than just one cache? It could add to its current cache but also use this logic to find any element reference/simpletypes and add to their cache as well? That way all the cache stuff is in one spot?
Or maybe alternatively, could checkUnusedProperties
be updated to inspect the referenced propCaches instead of just its, and then we don't have to propagate/copy things at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I considered doing at the time of property lookup, but we do that in daffodil-lib which doesn't have access to the daffodil-lib classes like ElementRef that'd make it possible to find the element refs/simple types.
For the 2nd suggestion, we run into the same issues where 2 element refs share the element they are referencing but only the first element ref 's propCache show it as using the properties, not the referenced element and not the 2nd element ref. Which is why we opted for propagation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
daffodl-core classes like ElementRef*
@@ -1164,7 +1164,7 @@ | |||
</xs:complexType> | |||
</xs:element> | |||
<xs:group name="g0"> | |||
<xs:sequence dfdl:separatorPosition="infix"> | |||
<xs:sequence> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can group sequences really not have any properties? I thought that was legal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They can have properties and those are combined with those on the group reference to form the combined set of properties for that sequence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then why did dfdl:separatorPosition
need to be removed from these group sequences? I assumed it was removed because it was creating a unused property warning, but it seems like the property should be used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was indeed creating an unused property warning, but since the default separatorPosition is infix. I deemed them unnecessary and safe to remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like that shouldn't create a warning. Even though it's the same as the default the property should still be used. It just means the default format won't be used. Maybe there's an additional code change needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'll add them back in and investigate what's going on
- currently we check unused properties on terms only, which means unsused properties on non-term schema components can be missed. This will fix that issue, by moving the checks to AnnotatedSchemaComponents instead. - fix bug with XercesSchemaFileLocation.equals assuming all incoming objects would be XercesSchemaFileLocations as well. We also get just plain SchemaFileLocations, which can break things, so instead we compare against SchemaFileLocation members - add tests DAFFODIL-2798
- fix tests with unncessery properties - update TransitiveClosureSchemaComponents to include EnumerationDef - use root.allComponents to call checkUnusedProperties - use root.allComponents to propagate used properties of a referencer element to the referenced element DAFFODIL-2798
- update CheckUsedProperties comment DAFFODIL-2798
…Components - copy used properties from type to element and element to type - only copy properties if the element or it's simpletype have the property - update test DAFFODIL-2798
…Schema Components - move propagateUsedProperties to AnnotatedSchemaComponent - call function recursively for elements being referenced and simple types - add group refs property propagation to groups - add scaladoc DAFFODIL-2798
15f32de
to
19f0c94
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this issue has taken a long time to fix. However, we really need to use the Root.refMap to walk the components more efficiently pulling the property-was-used knowledge as we walk from the the referenced ASCs recursively to their referencing ASCs so that we visit everything exactly once.
You're still going to read the propCache as a way of saying "this property was used" on the local element decls, element refs, and group refs, so this still requires that schema compilation is complete. I.e., this is a separate pass.
But I believe it will be an improvement creating a separate data structure that captures properties not in the local propCache, but used by all referencing ASCs. That eliminates clobbering the propCache. That means the propCache remains available for other kinds of analysis and post-processing of the schema.
Tests are excellent.
* the properties its used, and the second element doesn't because of sharing. With | ||
* this val, we copy used properties from element refs to elements, group refs to groups, | ||
* from elements to their simple types, if the simple type is annotated with the property, | ||
* and from simple types to their base types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment says what we do, but not why.
Start this comment by saying this is a central part of the algorithm by which we identify properties that are present, but not used.
The overall algorithm is expressed in multiple places. This part handles the propagation of properties from referencer to referenced object when property combining occurs. The rest of the algorithm is then just traversing every AnnocatedSchemaComponent to see what properties are present on them that do not appear in the propCache. This overall explaining of the algoirthm maybe doesn't go on this lazy val but elsewhere at the top level where this gets called perhaps.
It is also important to point out that this is clobbering the property caches, making them subsequently unusable for their initial property lookup uses. Hence, this must be done in a pass that happens only after all other schema compilation involving properties is complete.
* (e.g. schema compilation has finished) to ensure there are no false | ||
* positives. | ||
*/ | ||
final lazy val checkUnusedProperties: Unit = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add to comment that the propagateUsedProperties must be called on all AnnotatedSchemaComponents before this will work properly.
Add that this is not a recursive walk. It identifies and issues warnings about unaccessed properties on just one AnnotatedSchemaComponent.
@@ -540,8 +540,25 @@ final class SchemaSet private ( | |||
.flatMap(_.defineVariables) | |||
.union(predefinedVars) | |||
|
|||
/** | |||
* propagate used properties through AnnotatedSchemaComponent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add that this is part of the algorithm for identifying properties that are present, but unused.
Add that not only must this be done after compilation is complete, but that it repurposes the propCaches which are no longer usable for property lookups after this is done.
.map { _.formatAnnotation.justThisOneProperties } | ||
.getOrElse(Map.empty) | ||
|
||
val usedProperties = propCache |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not in this file, but in the files where the PropCache type is defined you need to add comments there that the PropCache plays two roles in different phases of the compilation. First for property lookups, second for determining which properties are present, but unused.
You need to explain, there, that these two uses are incompatible. The first use must be over. Then the caches are repurposed for the second algorithm.
* Used to propagate used properties through all element/group references, element declarations | ||
* and simple types. This is necessary because we have elements being referenced | ||
* by multiple element ref, where the first element gets its propCache populated with | ||
* the properties its used, and the second element doesn't because of sharing. With |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Grammar "its" => "it has". Some rephrasing and splitting into paragraphs will make this clearer.
*/ | ||
private lazy val propagateUsedProperties = { | ||
root.allComponents.collect { case asc: AnnotatedSchemaComponent => | ||
asc.propagateUsedProperties |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless you do something to walk this in a sensible order, this is a computationally expensive O(n^2) or worse algorithm.
See other comment above about how to do this in a "pull" style that visits each ASC exactly once.
case ele: ElementDeclMixin => | ||
val ost = ele.optSimpleType | ||
ost.collect { case std: SimpleTypeDefBase => | ||
// if the type has used properties transfer it to the element that's using it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why you have to copy from the simple type back to the element. Please add comment explaining why this is needed.
I would think this would go only from element to type (to base type, etc.)
// copy the used properties from the element to the type | ||
ele.propCache.foreach(kv => | ||
if (std.formatAnnotation.justThisOneProperties.contains(kv._1)) { | ||
// only add the used properties that the simple type carries, not any local element properties that aren't shared |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only properties I know on simple elements that can only be on the element, not the type, are dfdl:inputValueCalc and dfdl:outputValueCalc. Those can't be unused.
So when this comment says to only add the used properties that the simple type carries, not any local element properties that aren't "shared". What does "shared" mean? I also don't see any logic about removing any "uncarried" properties.
} | ||
) | ||
// propagate through the element's simple type base chain | ||
std.propagateUsedProperties |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This propagation down the chain is the problematic part of this algorithm. One is potentially going to walk the length of this chain many more times than is necessary, and not by some small amount.
Consider trying this:
Take the list of all annotated schema comopnents ASCs, and divide it up into lists of
- all global group defs
- all global element defs
For all global group defs, process them by using Root.refMap to find all referencing global group refs and pull the propCache entries from there over onto the global group def.
For all global element decls, process them by using Root.refMap to find all element refs and pull the propCache entries from there over onto the global element decl.
For all global simple type defs, process them by using Root.refMap to find all elements that reference them and pull the propCache entries from there onto the globalSimpleType def.
For all local simple type defs, process them by pulling the propCache entries from their enclosing element onto them.
This leaves only global simple type defs with references to them from other simple type defs (which can be global or local).
These need to be processed in a specific order to avoid redundant processing. Create a table of "done" global simple type defs, initially empty.
Then start from all global simple type defs whose base is a primitive type (call these leaf simple type defs).
Process each of these as follows.
Use Root.refMap to find the other simple type defs which reference this one.
If they are found in the "done" table, pull their properties onto this simple type.
If they are not found in the "done" table, recursively process them, (they will become "done") then pull their properties onto this simple type.
When all the simple type defs which reference this one are "done" and their properties have been pulled onto this one, add this simple type def to the "done" table, and go on to the next in the list.
That is a walk which visits all referencers first before any referenced, so will propagate the propCache entries visiting each simple type def exactly once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An additional thought: If you can walk in this "referencers before referenced" order by way of the Root.refMap, starting from the leaf simple types, the global group defs, and the global element defs (having complex type), then, for each ASC you can compute a separate value which is the set of all properties used.
That is, you don't need to be doing put() calls on the propCache. You can read the propCache, but fill in a separate structure of just the properties that are used so as to isolate from those that are unused. This could be a separate structure which holds only properties in the referencer's propCache that are NOT found in the referenced object's propCache, so that this new structure is not redundant with the contents of the propCache.
This would be cleaner and much more space efficient than reusing the propCache data structure in this way that reads it but also modifies it.
DAFFODIL-2798