-
Notifications
You must be signed in to change notification settings - Fork 58
Add zstd and xz support #2893
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Add zstd and xz support #2893
Conversation
…s) using the same syntax as for gzip, including for compressing tarballs of various flavours. It adds tests for zstd (46) and xz (47) that are clones of the gzip rose_arch test (32) and a slightly beefier test for zstd (48) that is a clone of (07).
add myself to Contributors
|
This PR introduces the ability to use external software, i.e. zstd and xz but it makes no attempt to check whether the program is present. I have not tried to handle that situation gracefully but personally I don't think think that it's necessary as I haven't made any changes to the default settings, if indeed there are any, as rose detects which compression utility to called based on file extension or explicit use of the compress keyword - either of which would expect the user to make a deliberate decision to use zstd or xz. I note that zstd is increasingly widely used now and even if it isn't installed by default on a particular linux distro (or mac os), it is almost certain to be present in the package manager and is apparently pretty likely to be installed as a dependency of another package (i.e. pygraphviz (via libtiff and others) in the case of rose) such is its ubiquity. Furthermore it has been approved for addition to the standard library in future versions of python so it is only likely to become more integrated in to python: https://discuss.python.org/t/pep-784-adding-zstandard-to-the-standard-library/87377/138 The same arguments can be made for xz (which uses liblzma) but additionally it does appear to be installed as standard, at least for RHEL 9.4. I'm less wedded to this though and would happily remove it if required. I just added it to check it was that easy... |
|
I also note that there's obviously a lot of very-near-duplication of code here and it would likely be better to implement something more generic (especially given the reliance on external utilities which effectively have identical command line arguments (by design, I think?)). But I am keen to get this in in some form initially and potentially revisit when I have more time (and more skills!) if it has the potential to be beneficial. |
Faithfully reproduce behaviour of gzip/xz with zstd by removing the original file upon successful compression. (add --rm to command line)
…d by a keyword compress-cores (default 1 implies off) updated test 46 where it is explicitly set to 1 core. added test 49-app-arch-zstd-mp where compress-cores=0 (auto) is tested and compress-cores=4 is tested.
Added myself to the .mailmap
oliver-sanders
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this PR (and special thanks for sorting out the tests and documentation too 👏)!
We'll try to get this in soon, but be aware, we've got a very heavy review workload right now, so apologies if it takes longer.
| self.app_runner.popen.run_simple(command, shell=True) | ||
| self.app_runner.fs_util.delete(tar_name) | ||
|
|
||
| if target.compress_scheme in self.ZSTD_EXTS: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if target.compress_scheme in self.ZSTD_EXTS: | |
| elif target.compress_scheme in self.ZSTD_EXTS: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed thanks!
| self.app_runner.popen.run_simple(command, shell=True) | ||
| self.app_runner.fs_util.delete(tar_name) | ||
|
|
||
| if target.compress_scheme in self.XZ_EXTS: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if target.compress_scheme in self.XZ_EXTS: | |
| elif target.compress_scheme in self.XZ_EXTS: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed thanks!
| ) | ||
| os.close(fdsec) | ||
| target.work_source_path = zst_name | ||
| command = f"zstd --rm -T{cores} -c '{tar_name}' >'{zst_name}'" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Due to the high likelyhood of this command being run directly on Cylc servers, we will have to be careful with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @oliver-sanders . I agree and I hope that the default being to not use this is a good trade-off between exposing useful functionality and maintaining order!
I have updated the documentation to note that this should be used with caution on shared resources.
9a73855 to
fc9cca8
Compare
replace compress-cores with compress-threads and update documentation to clarify how -T controls operation. Add guidance about how to use multi-threading
…o clarify how -T multi-processing works
|
The "test / docs" test seems to be failing for reasons unrelated to this change, please ignore. The wonderfully named "test / test" tests, however, seem to be failing for legitimate reasons. It looks like |
| handler = compress_manager.get_handler(target.compress_scheme) | ||
| handler.compress_sources(target, work_dir) | ||
| compress_args = {"threads": target.compress_threads} | ||
| handler.compress_sources(target, work_dir, **compress_args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried a very simple check
mode=rose_arch
[arch]
# rose-app.conf
command-format=cp %(sources)s %(target)s
target-prefix=/home/users/tim.pillinger/cylc-src/rose-apps/arch/archive/
source-prefix=/home/users/tim.pillinger/cylc-src/rose-apps/arch/source/
[arch:world.out]
source='world.out'
[arch:gunzipme.gz]
source='gunzipme.out'
[arch:targunzipme.tar.gz]
source='targunzipme.out'export CYLC_WORKFLOW_ID='hippo'
export CYLC_TASK_ID='task-run'
export CYLC_TASK_NAME='task-run'
export CYLC_TASK_CYCLE_POINT='task-run'
export CYLC_TASK_LOG_ROOT="${HERE}/log"
echo Running app from "${HERE}/app"
rose task-run --config="${HERE}/app"and got an error:
[FAIL] RoseArchGzip.compress_sources() got an unexpected keyword argument 'threads'
I think that you need to add the threads argument to rose_arch_gzip.py and possibly other items in that folder. You may want to consider emitting a warning if threads != 1 and program_is_single_thread:
This test works for me locally? Is that the most recent version of the code? |
|
Hmm, testing locally (cazldf...), this test passes on upstream/master but fails on david-rundle:feature/add-zstd-xz, so this does look like a genuine error. Do you have zstd and xz installed on your box / in your environment? Does the test need them to be installed? If the test does need these utils installed, let us know and we can work out where to install them in CI. The error can be found from this line onwards: https://github.com/metomi/rose/actions/runs/14859530104/job/41720858720#step:13:14 I've pulled out a couple of error messages that might be pertinent: |
| self.app_runner = app_runner | ||
|
|
||
| def compress_sources(self, target, work_dir): | ||
| def compress_sources(self, target, work_dir, threads="1"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO threads should be cast to int as soon as we parse the config.
|
I've written you a couple of integration tests at david-rundle#1 Tests running at https://github.com/wxtim/rose/actions/runs/14864492636 |
…ions about code - notably that a failure should appear if thread != 1 if the compression app doesn't support that refactor tests to provide nicer test architecture for other tests. flake8
* Changed a test based on this conversation: https://github.com/metomi/rose/pull/2286/files#r258325340 * Fixed a problem cased by a mutable default arg.
|
@david-rundle - Is there anything I can do to help you move this forward? |
Feature/add zstd xz
|
Just looking at this again now having merged your changes and am getting loads of errors like: and it's been a while so I don't really remember any of the workflow for running the tests so may need some help dusting this off! |
|
@david-rundle - I've fixed the conflict for you, and retriggered the tests. I have replicated the test failure locally, but I'm not clear on the causes. Anyway - to run the test locally I've tried running the tests and I'm getting the same failure - it looks like the workflow is failing because the task is failing, and you can look at the details of the failure at |
|
New developments... Python has now added support for zstd in version 3.14, this makes it much more attractive for use in Rose as we would no longer be relying on an external dependency (which may or may not be installed). Although this functionality would only be present when Python >= 3.14.0 is installed. |
This commit adds support in rose for zstd and xz (with default options) using the same syntax as for gzip, including for compressing tarballs of various flavours.
It adds tests for zstd (46) and xz (47) that are clones of the gzip rose_arch test (32) and a slightly beefier test for zstd (48) that is a clone of (07).