
Add HTJ2K Compressor #1883

Open
wants to merge 4 commits into base: htj2k-beta

Conversation

@palemieux palemieux commented Oct 13, 2024

This patch fulfills my action item from the September 19 TSC meeting.

The patch proposes to add support for High-Throughput JPEG 2000 (HTJ2K) compression to OpenEXR -- HTJ2K is JPEG 2000 with the HT block coder standardized in Rec. ITU-T T.814 | ISO/IEC 15444-15. As detailed in Evaluating HTJ2K as a compressor option for OpenEXR, HTJ2K demonstrates significant improvements in both speed and file size over the other OpenEXR compressors. Furthermore, HTJ2K is a worldwide standard that benefits from a diversity of implementations, builds on JPEG 2000 (which is in broad use in studio workflows), is appropriate for long-term preservation, has both lossy and lossless modes, and supports integer and floating-point samples. Finally, the unique features of HTJ2K could be powerful additions to high-performance OpenEXR workflows where the full image may not always be viewed at its full resolution.

The patch defines an ht256 compressor, which uses 256-scanline chunks (not a limitation of HTJ2K, which supports any number of scanlines) and the open-source OpenJPH library.
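For illustration, here is a minimal usage sketch (not part of the patch) showing how an application could select the new compressor through the existing C++ API; it assumes the HT256_COMPRESSION enum value added by this PR, whose name is still under discussion below.

// Hypothetical sketch: write an RGBA image with the proposed ht256 compressor.
#include <ImfRgbaFile.h>
#include <ImfHeader.h>
#include <ImfCompression.h>

void
writeHT256 (const char* fileName, const Imf::Rgba* pixels, int width, int height)
{
    Imf::Header header (width, height);
    header.compression () = Imf::HT256_COMPRESSION; // enum value added by this patch

    Imf::RgbaOutputFile file (fileName, header, Imf::WRITE_RGBA);
    file.setFrameBuffer (pixels, 1, width);
    file.writePixels (height);
}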


linux-foundation-easycla bot commented Oct 13, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@cary-ilm
Member

The CI is failing in the validate step, which attempts to link a test program against the library. The error message indicates the link is failing with unresolved symbols, possibly because that cmake configuration is missing the new dependency.

However, this part of the CI has been rewritten in the PR I just merged. Can you rebase your branch onto the current main branch now? It may still fail, but hopefully it will be easier to resolve then at least.
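For reference, a typical rebase sequence for this would be something like the following; the remote names are an assumption (origin = your fork, upstream = AcademySoftwareFoundation/OpenEXR):

# fetch the latest main and replay the feature branch on top of it
git fetch upstream
git rebase upstream/main feature/add-ht-support-rebase
# rewriting history requires a force push to the fork
git push --force-with-lease origin feature/add-ht-support-rebase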

@palemieux palemieux force-pushed the feature/add-ht-support-rebase branch from 2a3f1c5 to 439cfe5 on December 13, 2024
@palemieux
Author

Can you rebase your branch onto the current main branch now? It may still fail, but hopefully it will be easier to resolve then at least.

@cary-ilm Done.

@cary-ilm
Member

The failure in the "Validate" step of the linux build is along the lines of what we discussed in the TSC meeting today: the check that makes sure "make install" installs just the right files. This output from the logs appears to indicate that the cmake configuration is causing the openjph library and headers to be installed, which shouldn't happen and will cause problems for downstream projects:

Error: The following files were installed but were not expected:
  bin/ojph_compress
  bin/ojph_expand
  include/OpenEXR/ImfHTCompressor.h
  include/OpenEXR/ImfHTKCompressor.h
  include/openjph/ojph_arch.h
  include/openjph/ojph_arg.h
  include/openjph/ojph_base.h
  include/openjph/ojph_codestream.h
  include/openjph/ojph_defs.h
  include/openjph/ojph_file.h
  include/openjph/ojph_mem.h
  include/openjph/ojph_message.h
  include/openjph/ojph_params.h
  include/openjph/ojph_version.h
  lib64/libopenjph.so
  lib64/libopenjph.so.0.18
  lib64/libopenjph.so.0.18.2
  lib64/pkgconfig/openjph.pc

Try to make sure your cmake "fetch" of openjph mirrors as closely as possible how the current setup handles libdeflate and Imath. I had a similar issue with them originally, but the current setup should handle the fetch properly.
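For illustration, a generic FetchContent sketch of the kind of fetch being suggested; this is not the PR's actual configuration (the real setup should mirror whatever the repository does for libdeflate and Imath), and it points at the add-export branch referenced elsewhere in this review rather than a tagged OpenJPH release:

# Sketch only: fetch OpenJPH without adding it to OpenEXR's install rules.
include(FetchContent)
FetchContent_Declare(openjph
  GIT_REPOSITORY https://github.com/palemieux/OpenJPH.git
  GIT_TAG add-export)
FetchContent_GetProperties(openjph)
if(NOT openjph_POPULATED)
  FetchContent_Populate(openjph)
  # EXCLUDE_FROM_ALL is intended to keep the openjph library, headers and the
  # ojph_compress/ojph_expand tools out of the parent project's default build
  # and install; whether that satisfies the validate check would need testing.
  add_subdirectory(${openjph_SOURCE_DIR} ${openjph_BINARY_DIR} EXCLUDE_FROM_ALL)
endif()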

@palemieux
Author

Thanks! What is the best way to run that validate step locally?

@palemieux
Author

@cary-ilm I remember now... I could never fix the following errors:

[cmake] CMake Error: install(EXPORT "OpenEXR" ...) includes target "Iex" which requires target "openjph" that is not in any export set.
[cmake] CMake Error: install(EXPORT "OpenEXR" ...) includes target "IlmThread" which requires target "openjph" that is not in any export set.
[cmake] CMake Error: install(EXPORT "OpenEXR" ...) includes target "OpenEXRCore" which requires target "openjph" that is not in any export set.
[cmake] CMake Error: install(EXPORT "OpenEXR" ...) includes target "OpenEXR" which requires target "openjph" that is not in any export set.
[cmake] CMake Error: install(EXPORT "OpenEXR" ...) includes target "OpenEXRUtil" which requires target "openjph" that is not in any export set.

The install(EXPORT) step seems to impose requirements on the openjph target that I could not find a way to satisfy.
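For what it's worth, one common workaround for this class of CMake error (sketched below, and not necessarily the right fix here) is to reference the in-tree openjph target only through a BUILD_INTERFACE generator expression, so the installed OpenEXR export files never mention it; the trade-off is that consumers of static OpenEXR libraries would then need to locate OpenJPH themselves:

# Sketch only: keep openjph out of the exported link interface so that
# install(EXPORT "OpenEXR" ...) does not require an exported openjph target.
target_link_libraries(OpenEXRCore PRIVATE "$<BUILD_INTERFACE:openjph>")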

@cary-ilm
Member

I'm not enough of a cmake expert to immediately know how to resolve this. Since this is a work in progress, you could just disable the CI "validate" step entirely for now: edit the workflow file on your branch and add an "if: false" line, or just delete the step. We'll need it resolved eventually before merging, but at least that will allow you to get on with other priorities.

I'm happy to help resolve this, but I won't have much time to look into it until mid-January.

@palemieux palemieux force-pushed the feature/add-ht-support-rebase branch 3 times, most recently from 2230cc5 to e28aaf5 on January 7, 2025

// make image filename
char input_filename[512] = { '\0' };
sprintf(input_filename, args->input_filename, frame_index + args->start_frame);

Check failure (Code scanning / CodeQL): Multiplication result converted to larger type (High, test). Multiplication result may overflow 'int' before it is converted to 'size_type'.

// make image filenames
char input_filename[512] = { '\0' };
char output_filename[512] = { '\0' };
sprintf(input_filename, args.input_filename, frame_index + args.start_frame);

Check failure (Code scanning / CodeQL): Multiplication result converted to larger type (High, test). Multiplication result may overflow 'int' before it is converted to 'size_type'.

char input_filename[512] = { '\0' };
char output_filename[512] = { '\0' };
sprintf(input_filename, args.input_filename, frame_index + args.start_frame);
sprintf(output_filename, args.output_filename, frame_index + args.start_frame);

Check failure (Code scanning / CodeQL): Multiplication result converted to larger type (High, test). Multiplication result may overflow 'int' before it is converted to 'size_type'.
@peterhillman
Contributor

Regarding exrperf and exrconv: OpenEXR now provides exrmetrics for performance analysis. exrmetrics can also do compression conversion (though that's not an obvious use from the name), so it can be used instead of exrconv. It looks like exrmetrics has more complete support for different channel and part types, but there are a couple of features that exrperf and exrconv offer that are missing from exrmetrics. Perhaps it would be better to add those as options to exrmetrics, probably as a separate PR, instead of adding new binaries?

@palemieux
Author

Perhaps it would be better to add those as options to exrmetrics, probably as a separate PR, instead of adding new binaries?

Yes. exrperf and exrconv will be removed before this is merged.

@palemieux palemieux marked this pull request as ready for review January 17, 2025 00:09
@palemieux
Author

@cary-ilm I believe this is ready for review and discussion at the upcoming TSC call.

@kmilos

kmilos commented Jan 21, 2025

Please either make this optional, and/or also support building with a system OpenJPH, like libdeflate. Bundling an (unreleased) feature branch/tag is not an ideal solution for long-term distribution...

@palemieux
Author

@kmilos I agree that referencing an unreleased branch/tag of a dependency is not ideal. I expect a release of OpenJPH to be referenced by the time this PR is merged.

@kmilos

kmilos commented Jan 21, 2025

Ok, makes sense, thanks @palemieux. In the meantime I still think it would also be good to prepare for detection of the system library, as is done for libdeflate...

@palemieux
Author

In the meantime I still think it would also be good to prepare for detection of the system library, as is done for libdeflate...

You mean use the system library if available (through find_package), and fall back to the GitHub dependency if the system library is not available?

@kmilos

kmilos commented Jan 21, 2025

You mean use the system library if available (through find_package), and fall back to the GitHub dependency if the system library is not available?

Exactly, this is how it is already done for libdeflate (one can skip the CONFIG detection branch as OpenJPH doesn't ship a CMake config, only a pkgconf one). And Imath as well. In addition there are options to force use of the "internal" (fetched from GitHub) ones...
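A rough sketch of that pattern is below; the option name is hypothetical, and OpenJPH is located through its pkg-config file (openjph.pc) since, as noted above, it does not ship a CMake config:

# Sketch: prefer a system OpenJPH, fall back to the fetched copy otherwise.
option(OPENEXR_FORCE_INTERNAL_OPENJPH "Always use the internal (fetched) OpenJPH" OFF)

if(NOT OPENEXR_FORCE_INTERNAL_OPENJPH)
  find_package(PkgConfig QUIET)
  if(PKG_CONFIG_FOUND)
    pkg_check_modules(OpenJPH QUIET IMPORTED_TARGET GLOBAL openjph)
  endif()
endif()

if(OpenJPH_FOUND AND NOT OPENEXR_FORCE_INTERNAL_OPENJPH)
  # use the system library through the imported pkg-config target
  add_library(openjph ALIAS PkgConfig::OpenJPH)
else()
  # fall back to FetchContent of the OpenJPH sources, as sketched earlier
endif()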

@palemieux
Author

Exactly, this is how it is already done for libdeflate (one can skip the CONFIG detection branch as OpenJPH doesn't ship a CMake config, only a pkgconf one). In addition there is an option to force use of the "internal" (fetched from GitHub) one...

Ok. No objection from me. Is that regularly tested?

@@ -192,6 +198,7 @@ static const std::map<std::string, Compression> CompressionNameToId = {
{"b44a", Compression::B44A_COMPRESSION},
{"dwaa", Compression::DWAA_COMPRESSION},
{"dwab", Compression::DWAB_COMPRESSION},
{"ht256", Compression::HT256_COMPRESSION},
Contributor


Suggested change
{"ht256", Compression::HT256_COMPRESSION},
{"ht", Compression::HT_COMPRESSION},

Rename ht256 to ht, dropping the 256-scanline chunk size from the compressor name.

Member


Does this still need to be resolved? "ht256" still appears in other places, such as in CompressionDesc on line 180 above; does that need to be consistent?

@TodicaIonut
Contributor

Consider exposing an HT compression quality level as a float: 0.0f for visually lossless, with larger values quantizing lossily (a floating-point base-error value). @palemieux

@palemieux palemieux force-pushed the feature/add-ht-support-rebase branch 2 times, most recently from 15fc4d8 to 2be7650 on February 2, 2025
@cary-ilm
Member

cary-ilm commented Feb 2, 2025

@palemieux, now that we're getting close with this, we do need to resolve the DCO and CLA; note those checks have been failing. The fix for the other failing checks should get merged to main shortly.

For the contributor license agreement you'll need to sign the form through the red "NOT COVERED" link above.

For the "Digital Certificate of Origin", each commit needs to be "signed" via the -s argument to git commit. This adds the Signed-off-by: <email> line to the commit message. You'll need to do this for every commit. You can do this retroactively to the existing commits via git commit -s --amend --no-edit. You'll then have to git push --force to overwrite the existing commits.

You could also squash everything into a single signed commit; the history of these commits will get lost when we merge anyway, since we generally "squash-and-merge" instead of "rebase-and-merge".

For the feature branch, we'll also need this PR to merge to the feature branch name, not to main as it stands now. Should we call that AcademySoftwareFoundation/htj2k?

I'll look this over in the next few days to see if anything further stands out.

Signed-off-by: Pierre-Anthony Lemieux <[email protected]>
@palemieux palemieux force-pushed the feature/add-ht-support-rebase branch from 2be7650 to c8a670e on February 2, 2025
@palemieux
Author

@cary-ilm CLA is executed, DCO is done.

Ok with AcademySoftwareFoundation/htj2k. Can you create the branch? I will then change the target branch of this PR and rebase the commit.

Member

@cary-ilm cary-ilm left a comment


Small nit: the white paper appears to use "High Throughput", with no hyphen, so better to stay consistent with that.

SUDO="sudo"
fi

git clone -b add-export https://github.com/palemieux/OpenJPH.git
Member


Should this be https://github.com/aous72/OpenJPH?

Author


Yes. It will be a tagged release of https://github.com/aous72/OpenJPH. I am waiting to ask the maintainer to tag the release until we have made sure that no further major changes to the OpenJPH build system are needed to accommodate OpenEXR. Let me know if we are at that point.

git clone -b add-export https://github.com/palemieux/OpenJPH.git
cd OpenJPH

# git checkout ${TAG}
Member


And uncomment this?

Author


Yes!

@cary-ilm
Member

cary-ilm commented Feb 3, 2025

@palemieux I created the branch and called it htj2k-beta.

@palemieux
Author

@cary-ilm The white paper is in error :(. The official name, per the ITU and ISO specification titles, is High-throughput JPEG 2000.

@palemieux palemieux changed the base branch from main to htj2k-beta February 3, 2025 04:08
@peterhillman
Contributor

Question about RGB channels: I see special handling of RGB in the code. I admit I don't fully understand what's happening there, but are RGB channels in all layers treated as RGB? Would an image with channels called diffuse.R, diffuse.G, diffuse.B compress as efficiently as one with plain R, G, B? What about an image part that has multiple sets of RGB channels? The channel mapping function seems to use the channel list to derive the mapping. Is that mapping serialized to the file to allow future changes in the mapping logic, or do both the encoder and decoder use the channel list?

@palemieux
Author

@peterhillman the compressor makes use of a JPEG 2000 (Part 1) feature that improves coding efficiency when the first three channels are RGB (or equivalent): in that scenario, a decorrelating transform (effectively an RGB-to-YUV transform) is applied to the channels before coding.
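For reference (this is background on the Part 1 feature, not code from the patch), the reversible decorrelating transform (RCT) that JPEG 2000 Part 1 applies to the first three components for lossless coding is

$$Y = \left\lfloor \frac{R + 2G + B}{4} \right\rfloor, \qquad C_b = B - G, \qquad C_r = R - G,$$

with the exact integer inverse

$$G = Y - \left\lfloor \frac{C_b + C_r}{4} \right\rfloor, \qquad R = C_r + G, \qquad B = C_b + G.$$

The lossy path uses the irreversible (floating-point) color transform instead; how either interacts with OpenEXR's half and float samples is up to the compressor implementation.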

Currently both the encoder and decoder determine the mapping at run time. The mapping could conceivably be serialized in the file, or the channel order could be modified by the encoder (if possible).

The algorithm could be modified to accommodate channels named something other than "R", "G" and "B". Does a naming convention exist?

How would multiple sets of RGB channels be signaled in the file? Is that a common use case?

More complex/flexible JPEG 2000 tools are available if needed.

@peterhillman
Contributor

Channel naming is defined in a couple of places:
https://openexr.com/en/latest/TechnicalIntroduction.html#channel-names
https://openexr.com/en/latest/InterpretingDeepPixels.html#channel-names-and-layers
You need to look for the "base name" - the part after the last dot, if there is one, or the whole name otherwise. (If the penultimate character in the channel name is a '.' then you can compare the last character against R, G or B.)
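A hypothetical helper illustrating that rule (illustrative only, not code from the PR):

// Sketch: split an OpenEXR channel name into layer and base name, and test
// whether three channels form one R/G/B triplet within the same layer.
#include <string>

static std::string
baseName (const std::string& channel)
{
    std::size_t dot = channel.rfind ('.');
    return dot == std::string::npos ? channel : channel.substr (dot + 1);
}

static std::string
layerName (const std::string& channel)
{
    std::size_t dot = channel.rfind ('.');
    return dot == std::string::npos ? std::string () : channel.substr (0, dot);
}

static bool
isRGBTriplet (const std::string& a, const std::string& b, const std::string& c)
{
    if (layerName (a) != layerName (b) || layerName (b) != layerName (c))
        return false;

    // the three base names must be exactly "R", "G" and "B" in some order
    std::string bases = baseName (a) + baseName (b) + baseName (c);
    return bases.size () == 3 && bases.find ('R') != std::string::npos &&
           bases.find ('G') != std::string::npos &&
           bases.find ('B') != std::string::npos;
}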

It's common for a multipart file to contain R G B A in one part and L1.R L1.G L1.B L1.A in another. Since the compression runs per-part it may only see L1.R L1.G L1.B L1.A, so that could be treated as an RGBA group.

Having multiple RGB layers within the same part is possible, e.g. six channels forming two layers: R G B L1.R L1.G L1.B. If the codec only supports one set of RGB channels it makes sense to use R G B if any of those exist, and fall back to the (alphabetically) first set of *.R *.G *.B channels found otherwise.

There would be a lot of compression efficiency to be gained from a codec that understood that RGB channels within each layer have similarity to each other, but you would also expect strong correlation between all the red components (say) of each layer. Perhaps a future OpenEXR codec could apply a more complex decorrelation transform itself before using JPEG2000.

Including the mapping may be useful, and it may even be useful to support custom mappings at write time. #1942 discusses this issue in lossy DWA files, though it's more critical there because it affects image quality not just file size.

@palemieux
Author

@peterhillman Thanks for the detailed information. Will review the code in light of the naming convention.

Would supporting custom mappings at write time require a change to the API? What kind of mappings do you have in mind?

There would be a lot of compression efficiency to be gained from a codec that understood that RGB channels within each layer have similarity to each other, but you would also expect strong correlation between all the red components (say) of each layer. Perhaps a future OpenEXR codec could apply a more complex decorrelation transform itself before using JPEG2000.

JPEG 2000 supports complex component transformations. The question is whether it is worth involving that machinery at this time. Is there a latent or existing use case for files with, say, multiple R channels within a part?

@peterhillman
Contributor

Would supporting custom mappings at write time require a change to the API? What kind of mappings do you have in mind?

Probably an API extension, yes, ideally one that's more general so it can be used across different codecs.

JPEG 2000 supports complex component transformations. The questions is whether it is worth involving that machinery at this time. Is there a latent or existing use case for files with, say, multiple R channels within a part?

Future work, I think, and probably a new compression method as far as OpenEXR is concerned.

Storing many channels within an OpenEXR file is relatively common, and those files get very large. Making compression work well in that case would be really beneficial. An example of multiple R channels is a render where different shader components (specular, diffuse, emission) or different CG lights are split out to allow interactive adjustments later without rerendering. The Beachball example sequence is a 3D stereo image and has separate RGB layers for left and right. It stores a single layer in each part. That is a common strategy, as it reduces the amount of data that needs to be loaded and decompressed before accessing a subset of all channels. However, if a codec is very efficient at compressing multiple RGB layers together, it would make sense to store all channels in a single part to reduce the total file size.

@cary-ilm
Member

cary-ilm commented Feb 3, 2025

@palemieux, well that's unfortunate for the grammar sticklers; even the Library of Congress document is a mix of "High Throughput", "High-Throughput" and "High-throughput". I do see the ISO document has "High-Throughput" in the title, so I'm fine with that. I removed the original suggestions but left a few where things should be updated.

@palemieux
Author

palemieux commented Feb 3, 2025

@peterhillman et al. Should the reordering currently performed automatically by the compressor be removed, and the application instead asked to provide the channels in RGB order if it wants decorrelation to be performed by the compressor? That would avoid baking the channel-reordering algorithm into the compressor.

palemieux and others added 3 commits February 3, 2025 09:18
Co-authored-by: Cary Phillips <[email protected]>
Signed-off-by: Pierre-Anthony Lemieux <[email protected]>
Co-authored-by: Cary Phillips <[email protected]>
Signed-off-by: Pierre-Anthony Lemieux <[email protected]>
Co-authored-by: Cary Phillips <[email protected]>
Signed-off-by: Pierre-Anthony Lemieux <[email protected]>
@peterhillman
Contributor

The C++ API doesn't track the order in which you add channels to the FrameBuffer or the ChannelList attribute; it maintains them in alphabetical order. That's also the order in which they are presented to the compressor, so making the codec aware of the ordering would be tricky.
There's something to be said for making the API as codec-agnostic as possible, avoiding codec-specific API functions or special access patterns that change behavior. There's often ambiguity about whether such settings should be presented to the user or handled by the software internally. It would be much simpler if the advice could be "name your channels according to the spec, and the codec will do the right thing".
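To illustrate the ordering point, a stand-alone example (not from the PR):

// ChannelList keeps channels sorted by name regardless of insertion order,
// so this prints B, G, R.
#include <ImfChannelList.h>
#include <iostream>

int
main ()
{
    Imf::ChannelList channels;
    channels.insert ("R", Imf::Channel (Imf::HALF));
    channels.insert ("B", Imf::Channel (Imf::HALF));
    channels.insert ("G", Imf::Channel (Imf::HALF));

    for (Imf::ChannelList::ConstIterator i = channels.begin (); i != channels.end (); ++i)
        std::cout << i.name () << "\n";
    return 0;
}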

@palemieux
Author

@peterhillman Ok. Thanks for the background. I am tempted to include the channel map in the codestream... in case the algorithm/naming convention changes in the future.
