Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mergeJson: Set merged tileset children uri to relative path to output directory #142

Closed
wants to merge 3 commits into from

Conversation

jo-chemla
Copy link
Contributor

@jo-chemla jo-chemla commented Jul 9, 2024

In the initial PR, I did not correctly set the children URI for the merged tileset. This PR makes it so that children URI is the relative path from the output directory to the respective input tilesets jsons filepaths.

@javagl
Copy link
Contributor

javagl commented Jul 9, 2024

I think that this was not really changed in the last PR. It was the behavior that the merge operation had.
(But I'll have to re-read whether this might be related to https://github.com/CesiumGS/3d-tiles-tools/pull/140/files#diff-41e63ef4640d275815adbe220f32436f140f03ff544b0804dc4bc58426d1fa46R153 ...?)

In any case, there certainly are different possible strategies for determining the URIs of the tilesets. And trying to set the URIs in the target based on the URIs of the input and the output directory (using the relativize function) could make sense in many cases. But it looks dangerous - insofar that there may be cases where this is not possible, does not make sense, or yields completely unexpected results. (I'm thinking about cases where someone uses C:/some/absolute/path/to/some/tileset.json as the input, or inputs and outputs that are in certain directory structures (relative to each other) where the relativize call bails out.

I'll have to carefully read the changes, think this through more thoroughly, and maybe do some further tests. For example, with absolute paths (even though it could be tricky to put that into unit tests...).

@jo-chemla
Copy link
Contributor Author

Thanks for getting back again.
The original PR was not working correctly for input tilesets described by paths pointing to the json files tilesetFolder/tilesetA.json. Children URI was appended tileset.json twice and resulted in paths that looked like tilesetFolder/tilesetA.json/tilesetA.json

This new implementation works as it should for all cases I could test it on - see table below. The difference between how merge and mergeJson should handle tilesets filepaths is that the simple merge seems to find the best suitable output folderpath within the output merged tileset directory, while mergeJson should find the path relative to the output mergedTileset.json that points towards the input tilesetN.json.

This new implementation was tested on multiple iterations of absolute and relative inputs and outputs. Children uris always result in the correct relative filepath - only case when it is an absolute path is when input and output are on different drive letters. Could not update the spec to handle test-cases with absolute paths, hence I only added the test for relative input and relative outputs filepaths.

inputs output Works? Children uri
absolute C:\tilesets\folderN\tilesetN.json absolute C:\tilesets\merged\tilesetMerged.json ../folderN/tilesetN.json
relative .\tilesets\folderN\tilesetN.json relative .\tilesets\merged\tilesetMerged.json ../folderN/tilesetN.json
relative .\tilesets\folderN\tilesetN.json absolute C:\tilesets\merged\tilesetMerged.json ../folderN/tilesetN.json
absolute C:\tilesets\folderN\tilesetN.json relative .\tilesets\merged\tilesetMerged.json ../folderN/tilesetN.json
absolute C:\tilesets\folderN\tilesetN.json absolute on other drive Z:\out\merged\tilesetMerged.json C:/tilesets/folderN/tilesetN.json

@javagl
Copy link
Contributor

javagl commented Jul 10, 2024

(Only a short comment - I still have to allocate more time for that:)

I had seen the different handling of the "identifiers" (already linked to above, here in the last PR) that depended on the jsonOnly flag, and ... in hindsight, I should have looked at that with more scrutiny, to better understand why these identifiers should be handled differently based on the jsonOnly flag.

Some context: These "identifiers" are just used for disambiguating multiple input tilesets that "have the same name". This was not handled in the initial implementation of the merge function (waaay back when this was still part of another repo). And this case is what is explicitly checked with the current spec data, with the duplicate name appearing as TilesetA and sub/TilesetA.

The problem:

The behavior of merge is still horribly underspecified. It was just taken from the old state (linked above), revived, and fixed for the case of duplicate path components. But right now, the description - neither the CLI docs nor the TSDoc - say anything about the technical details, like what the resulting content.uri will be.

For example, the current (single) merge spec uses

inputs = [
  "./specs/data/mergeTilesets/basicMerge/TilesetA"
  "./specs/data/mergeTilesets/basicMerge/sub/TilesetA"
]

as the inputs, and tests that the resulting URIs for the tilesets will be

  "TilesetA/tileset.json",
  "TilesetA-0/tileset.json",

These TilesetA and TilesetA-0 prefixes are totally unspecified. They could also be randomStringX and randomStringY, and usually, this would not matter, because these tilesets are just copied to the output directory, into subdirectories with the proper names.

Now... with jsonOnly, they are not copied. And this affects the expectations that people might have for the resulting content.uri. The merge function can no longer assign random names (and disambiguate them with some unspecified -0 suffix).

In the current state of the spec, you said

    const expectedContentUris = [
      "../../mergeTilesets/basicMerge/TilesetA/tileset.json",
      "../../mergeTilesets/basicMerge/sub/TilesetA/tileset.json",
    ];

Now, these content URIs are very long, very specific, and very dependent on the exact structure of the input and output. And it will make a crucial difference of whether a user has a directory structure like

input/TilesetA/tileset.json
input/TilesetB/tileset.json
->
output/tileset.json

or

inputTilesetA/tileset.json
inputTilesetB/tileset.json
->
output/tileset.json

From quickly looking over the table, it looks like the resulting content.uri do make sense. But I'll have to think this through (also in terms of disambiguation), and in how far the behavior here can be "specified" - maybe with some statement like

The merged tileset will refer to the input tilesets using relative URIs that reflect the structure of the input and output directories..

(you know, roughly like that...). The fact that this is so important (only) for the jsonOnly=true case evaded me in the first PR.


Sorry... so much about a "short comment" 😬 But I see these PRs as a chance to clarify some aspects of the behavior and describe it more strictly on a technical level. And I appreciate your feedback, contributions, and tests that appear to happen based on real-world examples, and not only one artificial example that is run as part of a unit test.

@jo-chemla
Copy link
Contributor Author

Thanks for the "short comment", very much appreciated! It is now clearer to me, both the history behind the implementation and the use for these "identifiers" within the standard merge utility.

I did understand indeed that these were made to disambiguate input tilesets directories named the same at two different locations, so they could be copied into output at the best location within output directory - as top-level as possible, with a suffix in case it is not the first children tileset directory named like that.

In case of jsonOnly=true, the probable use-case is for the user to reference the output merged json tileset into a scene defined with CesiumJS or CesiumForUnreal etc. Having the children-uris of the referenced merged tilesets referenced by a path relative to the output merged tileset means that the tileset loaded into the scene will immediately correctly reference all children tilesets, and load the merged children where they are. Hence the new implementation - which does not modify identifiers for jsonOnly=false, but uses relative paths for jsonOnly=true.

Hope this makes sense. In our workflow, this is the use-case we have, hence also testing it for all the different flavors of relative vs absolute input & output paths arguments to the CLI in the table above, to make sure all these cases are yielding expected results.

@javagl
Copy link
Contributor

javagl commented Jul 11, 2024

All this makes sense. I think that the most common usage pattern for the mergeJson functionality is that someone has a directory with a bunch of subdirectories...

./data/tileset0/tileset0.json
./data/tileset1/tileset1.json
./data/tileset2/tileset2.json

and then just wants to create a new, top-level tileset, ./data/tileset.json, that refers to the existing ones, using URIs like ./tilesetN/tilesetN.json, exactly for the reason that you mentioned: The newly created one should "Just Work" when it is fed into a runtime engine.

But now I realized that the differences in the internal handling of URIs for the merge vs mergeJson case are larger than I expected.

The solution for the dismbiguation, using the tilesetSourceIdentifiers, was ... pragmatic, and solved the issue, but ... was certainly not "nice" and "thought through". I'm mentioning this to emphasize: None of the following is "your fault"! You just tried to cope with that ... "half-baked pragmatic solution" ... that I came up with.

But looking at the resulting core snippet, i.e. the loop in the mergeOperation method:

    for (const tilesetSourceName of tilesetSourceNames) {
      // Determine the name of the file that contains the tileset JSON data
      const tilesetSourceJsonFileName =
        Tilesets.determineTilesetJsonFileName(tilesetSourceName);

      // Determine an "identifier" for the tileset source
      // (see `tilesetSourceIdentifiers` for details)
      let tilesetSourceDirectoryPath;
      if (Paths.isDirectory(tilesetSourceName)) {
        tilesetSourceDirectoryPath = tilesetSourceName;
      } else {
        tilesetSourceDirectoryPath = path.dirname(tilesetSourceName);
      }
      const tilesetSourceDirectoryName = path.basename(tilesetSourceName);

      const tilesetTargetDirectoryPath = Paths.isDirectory(tilesetTargetName)
        ? tilesetTargetName
        : path.dirname(tilesetTargetName);
      const tilesetSourceIdentifier = TilesetMerger.createIdentifier(
        tilesetSourceDirectoryName,
        this.tilesetSourceIdentifiers
      );
      const relativeTilesetIdentifier = path.relative(
        tilesetTargetDirectoryPath,
        tilesetSourceDirectoryPath
      );

      const tilesetSource = TilesetSources.createAndOpen(tilesetSourceName);
      this.tilesetSources.push(tilesetSource);
      this.tilesetSourceJsonFileNames.push(tilesetSourceJsonFileName);
      this.tilesetSourceIdentifiers.push(
        !jsonOnly ? tilesetSourceIdentifier : relativeTilesetIdentifier
      );
    }

and squinting a little bit, it becomes apparent that these "identifiers" caused some quirks here:

1.: They sometimes are "arbitrary strings", and sometimes "relative paths". Meh.

2.: They require completely different information for their construction.

This is somehow "hidden" by the !jsonOnly ? ... ternary operator. But that loop could also be written as

    for (const tilesetSourceName of tilesetSourceNames) {
      // Determine the name of the file that contains the tileset JSON data
      const tilesetSourceJsonFileName =
        Tilesets.determineTilesetJsonFileName(tilesetSourceName);

      const tilesetTargetDirectoryPath = Paths.isDirectory(tilesetTargetName)
        ? tilesetTargetName
        : path.dirname(tilesetTargetName);

      const tilesetSource = TilesetSources.createAndOpen(tilesetSourceName);
      this.tilesetSources.push(tilesetSource);
      this.tilesetSourceJsonFileNames.push(tilesetSourceJsonFileName);

      if (jsonOnly) {
        let tilesetSourceDirectoryPath;
        if (Paths.isDirectory(tilesetSourceName)) {
          tilesetSourceDirectoryPath = tilesetSourceName;
        } else {
          tilesetSourceDirectoryPath = path.dirname(tilesetSourceName);
        }
        const relativeTilesetIdentifier = path.relative(
          tilesetTargetDirectoryPath,
          tilesetSourceDirectoryPath
        );
        this.tilesetSourceIdentifiers.push(relativeTilesetIdentifier);
      } else {
        const tilesetSourceDirectoryName = path.basename(tilesetSourceName);
        const tilesetSourceIdentifier = TilesetMerger.createIdentifier(
          tilesetSourceDirectoryName,
          this.tilesetSourceIdentifiers
        );
        this.tilesetSourceIdentifiers.push(tilesetSourceIdentifier);
      }
    }

which more clearly shows their different construction. E.g when they are "arbitrary strings", there is no need for the path.relative call, and when they are paths, there is no need for the createIdenitfier call.

3. Most importantly: The previous PR, #140 , already did break the original merge functionality, because of a quirky use of these "identifiers" at another place. When running something like
npx ts-node ./src/cli/main.ts merge -i ./mergeTests/input0/TilesetA/tileset.json -i ./mergeTests/input0/sub/TilesetA/tileset.json -o ./mergeTests/output0c/tileset.json -f

(with the inputs all having the /tileset.json part at the end), then it will crash with

[13:52:59.279] ERROR (CLI): EEXIST: file already exists, mkdir 'mergeTests\output0c\tileset.json'
    err: {
      "type": "Error",
      "message": "EEXIST: file already exists, mkdir 'mergeTests\\output0c\\tileset.json'",
      "stack":
          Error: EEXIST: file already exists, mkdir 'mergeTests\output0c\tileset.json'
              at Object.mkdirSync (node:fs:1391:3)
              at TilesetTargetFs.addEntry (C:\Develop\CesiumGS\3d-tiles-tools\src\tilesets\tilesetData\TilesetTargetFs.ts:57:8)
              at TilesetMerger.copyResources (C:\Develop\CesiumGS\3d-tiles-tools\src\tools\tilesetProcessing\TilesetMerger.ts:279:30)
              at TilesetMerger.mergeInternal (C:\Develop\CesiumGS\3d-tiles-tools\src\tools\tilesetProcessing\TilesetMerger.ts:254:25)
...

The reason is that the copyResources function uses these tilesetSourceIdentifiers in a way that makes assumptions that simply no longer hold after the change: Adding a debug log like
console.log("Putting " + sourceKey + " from source " + i + " into target as " + targetKey);
there prints
Putting ll.b3dm from source 0 into target as tileset.json/ll.b3dm
showing that it uses tileset.json as an "identifier" there.


Again, all this boils down to questionable handling of the "identifiers". Apologies for not noticing the caveats here earlier. I'll try to create a fix for this ASAP. (This may be independent of this PR, but I'll certainly try to cover the use case and tests that you described).

@javagl
Copy link
Contributor

javagl commented Jul 11, 2024

When running the new spec, then it expects the content URIs to be

    const expectedContentUris = [
      "../../mergeTilesets/basicMerge/TilesetA/tileset.json",
      "../../mergeTilesets/basicMerge/sub/TilesetA/tileset.json",
    ];

However, the directory structure within specs is

// Inputs:
specs/data/mergeTilesets/basicMerge/TilesetA/tileset.json
specs/data/mergeTilesets/basicMerge/sub/TilesetA/tileset.json

// Outputs
specs/data/output/mergeTilesets/basicMerge/tileset.json

When I'm counting correctly, then there is one ../ missing in the URIs:

  • Starting at specs/data/output/mergeTilesets/basicMerge/
  • First ../ goes to specs/data/output/mergeTilesets/
  • Second ../ goes to specs/data/output/
  • There is no mergeTilesets directory - it would need ...
  • ... a third ../ to go to specs/data/
  • There, it would find the mergeTilesets directory that contains the sources

Also, I kept the output files and served them with a local server, and the message
A 3D tile failed to load: http://localhost:8003/output/mergeTilesets/basicMerge/sub/TilesetA/tileset.json
seems to confirm this.

The reason for that is probably that the path.relative function behaves differently for files and for directories (i.e. it does not "know" whether something is a file or directory). When specifying the inputs including the trailing /tileset.json, then it does insert the required third ../. (I'll try to fix that locally...)

@jo-chemla
Copy link
Contributor Author

Thanks for the detailed feedback, very much appreciated.

  • Completely agree that the second code-block looks way clearer, better differentiating between when jsonOnly is either true or false.
  • I now realize that tilesetTargetDirectoryPath could have been taken out of the scope of the for loop.
  • And indeed the third .. is missing from the spec. The table above, on which I tested the new mergeJson implementation, was indeed only looking at inputs and outputs specified as filepaths, including foldername\tilesetName.json. I did not test it on directories since none of our tilesets are simply named tileset.json - I should have specified it, sorry for that! You are probably on the right track with that node path.relative not differentiating between filepaths and directories.

TLDR: completely agree with the proposed changes, these look indeed better!

@javagl javagl mentioned this pull request Jul 12, 2024
@javagl
Copy link
Contributor

javagl commented Jul 12, 2024

@jo-chemla I tried to address some of the issues in #143 . I tried to straighten out these "identifiers", and add some (maybe too) elaborate comments. Eventually, this could/should make this PR obsolete 🤞

@jo-chemla
Copy link
Contributor Author

jo-chemla commented Jul 12, 2024

Thanks for the feedback, great to hear. @alouis-jpg from our team did test against the same 5 cases of the table with a mix of absolute and relative paths for inputs and outputs (only on tileset files since we have no tileset named tileset.json) and they all yielded the same children uris with your PR. So this new PR #143 should be good to go!

@javagl
Copy link
Contributor

javagl commented Jul 24, 2024

Based on your comments, this seems to now be resolved via #143

@javagl javagl closed this Jul 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants