Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Support for Resource URI Scheme #1451

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

olabusayoT
Copy link
Contributor

  • open up check for scheme in resolveCommon to allow for file, jar and resource scheme
  • check if there's a systemId as part of XercesError in XercesSchemaFileLocation, use that if there is and it's not an absolute version of diagnosticFile, else use diagnosticFile

DAFFODIL-2973

- open up check for scheme in resolveCommon to allow for file, jar and resource scheme
- check if there's a systemId as part of XercesError in XercesSchemaFileLocation, use that if there is and it's not an absolute version of diagnosticFile, else use diagnosticFile

DAFFODIL-2973
// only set to systemId if it's different from the diagnosticFile path
!xercesError.getSystemId.endsWith(schemaFileLocation.diagnosticFile.getPath)
) new File(xercesError.getSystemId)
else schemaFileLocation.diagnosticFile),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So is this saying that there are cases were the file where Xerces finds an error isn't necessarily the same as the schema file location that daffodil provider to Xerces? Like the error is in an imported file or something?

Should we just always use the systemId from Xerces if it's not null? Or are there cases where it might be defined but diagnosticFile is the right thing to use?

Also, some windows tests are failing that look like they might be related to this, causing us to loose depersonlized paths in diagnostics?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, there are cases where systemId is defined but it's the absolute version of diagnosticFile causing us to loose the depersonalized paths. So the check is to sort of mitigate that

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the depersonalized one we have in schemaFileLocation isn't necessarily the correct file that the error is about?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, that's what I found while playing around with the quarkus stuff. I'll confirm the accuracy of that again tho. It was complaining about line 577 in a 34 line file iirc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep just confirmed that we're getting the right paths now when we weren't before

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Xerces Validation works without needing to do anything extra

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an aside, I think XercesSchemaFileLocation is only used when we run into error loading the schema. So we don't use it for validation

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an aside, I think XercesSchemaFileLocation is only used when we run into error loading the schema. So we don't use it for validation

You're talking about loading a DFDL schema right? I don't think it should be used for loading the XMLSchema_for_DFDL.xsd schema.

As such I don't think it represents an error in Daffodil.

It might not necessarily be a bug in Daffodil, it could be in Quarkus as you say, but it still represents a bug somewhere that broke functionality in Daffodil need to load the XML Schema for DFDL schemas. For example, if some tool modified daffodil jars and removed XMLSchema_for_DFDL.xsd schema, that wouldn't want to be reported as an SDE. That's a bug in the tool that broke an assertion that those schemas are available to Daffodil. Breaking that assertion should result in an some kind of exception that's not an SDE, so those tools or Daffodil can be fixed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, we use XersesSchemaFileLocation for all loader errors. Are you suggesting we add some special logic for just XMLSchema_for_DFDL.xsd?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kindof. I'm suggesting that when we load XMLSchema_for_DFDL.xsd, not when we validate with it, we need to detect that it wasn't loaded.

I was hoping we could wrap this line in a try/catch block:

https://github.com/apache/daffodil/blob/main/daffodil-lib/src/main/scala/org/apache/daffodil/lib/xml/DaffodilXMLLoader.scala#L737

And throw an assertion when it fails, but i'm not sure if that actually throws an exception on failure. Might need a different approach if that doesn't work.

@@ -216,6 +218,18 @@ object Misc {
}
}

def optResourceURI(contextURI: URI, relPath: String): Option[URI] = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest we call this optRelativeResourceURI to match the pattern of optRelativeJarFileURI, which looks like it does a similar thing but for jars contexts?

@@ -152,6 +152,8 @@ object Misc {
if (Paths.get(relURI).toFile.exists())
Some(relURI)
else None
} else if (contextURI.getScheme == "resource") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth doing a little refactoring to this getResourceRelativeOption function to make it more clear, and more correct. For example, the contextURI.isOpaque condition probably wasn't the right check, since the logic of that branch expects the context to be a "jar" URI. Any other opaque URI won't work and will probably lead to an weird error. I think this only works because the only opqaue URI we support is "jars", so isOpaque kindof implies it's a jar URI, but in theory we could add support for other opque URI's that aren't jar and this would break.

Since this fuction always expcts contextURI to have a scheme, what if we move the logic of the "file" scheme to a new optRelativeFileURI, and then this function just becomes:

contextURI.getSchema match {
  case "jar" => optRelativeJarFileURI(...)
  case "file" => optRelativeFileURI(...)
  case "resource" => optRelativeResourceURI(...)
  case _ => throw new IllegalArgumentException(...)
}

I think that makes this function more clear and leaves the functions it calls to figure out to correctly handle resolving the relative path.

@@ -216,6 +218,18 @@ object Misc {
}
}

def optResourceURI(contextURI: URI, relPath: String): Option[URI] = {
val relURI = contextURI.resolve(relPath)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think resolvedURI might be more clear. relURI makes me think this URI is relative, but it should be an absolute URI at this point.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this was just copied from getResourceRelativeOnlyOption, but we should rename relURI to resolvedURI here also. relURI really isn't a good description of what this variable represents.

val relURI = contextURI.resolve(relPath)
try {
relURI.toURL.openStream().close()
this.getClass.getResource(relURI.getPath)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like these two lines are doing the same thing--checking to see if the resolved URI actually exists. So I'm not sure we need both. I imagine calling getResource is more efficient since it doesn't need to actually open the file as a stream and deal with exceptions? So can we remove the try/catch and just do something like this:

if (this.getClass.getResource(relURI.getPath)) Some(relURI) else None

@@ -737,6 +751,10 @@ object Misc {
Paths.get(pathPart).toFile
}
case "file" => Paths.get(uri).toFile
case "resource" => {
val resourceFilePart = uri.getPath
new File(resourceFilePart)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rest of this function uses Paths.get(foo).toFile. I'm not sure why we did that, maybe Paths.get is doing something additinal? Should do that here for consistency?

- ensure the paths are normalized to the platform appropriate separators before comparing them
- refactor getResourceRelativeOnlyOption to be more clear and more accurate since .isOpaque can't be guaranteed to always be true for only jar files; also moved file logic into a function to make more concise
- use Paths.get in uriToDiagnosticFile to make more consistent with other schemes

DAFFODIL-2973
@olabusayoT
Copy link
Contributor Author

- throw invariantFailed assertion in the case that we're unable to load XMLSchema_for_DFDL.xsd so the user is aware something is broken outside of the schema rather than an SDE than indicates something is wrong with the schema

DAFFODIL-2973
Copy link
Member

@stevedlawrence stevedlawrence left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, one variable rename, and maybe the XerecesSchemaFileLocation change isn't needed anymore?

else
new File(sysIdNormalizedPath)
}
}),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this change be reverted? Now that failures to load the validation schema result in an assertion, I think XercesSchemaFileLocation will only be created where schemaFileLocation.diagnosticFile is correct.

@@ -216,6 +218,18 @@ object Misc {
}
}

def optResourceURI(contextURI: URI, relPath: String): Option[URI] = {
val relURI = contextURI.resolve(relPath)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this was just copied from getResourceRelativeOnlyOption, but we should rename relURI to resolvedURI here also. relURI really isn't a good description of what this variable represents.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants