Conversation

@david-rundle

This commit adds support in rose for zstd and xz (with default options) using the same syntax as for gzip, including for compressing tarballs of various flavours.

It adds tests for zstd (46) and xz (47) that are clones of the gzip rose_arch test (32) and a slightly beefier test for zstd (48) that is a clone of (07).

@wxtim wxtim requested review from oliver-sanders and wxtim April 30, 2025 11:11
add myself to Contributors
@david-rundle
Author

This PR introduces the ability to use external software, i.e. zstd and xz, but it makes no attempt to check whether the program is present. I have not tried to handle that situation gracefully, but personally I don't think that it's necessary: I haven't made any changes to the default settings (if indeed there are any), and rose detects which compression utility to call based on the file extension or explicit use of the compress keyword. Either route requires the user to make a deliberate decision to use zstd or xz.
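
For reference, a minimal sketch of how a missing program could be reported up front, should that ever be wanted. The `REQUIRED_PROGRAMS` mapping and the `find_compressor` helper are hypothetical names for illustration only; neither is part of this PR or of rose.

```python
import shutil

# Hypothetical mapping from compression scheme to the external program it needs.
REQUIRED_PROGRAMS = {"gz": "gzip", "xz": "xz", "zst": "zstd"}


def find_compressor(scheme):
    """Return the full path of the program for ``scheme``.

    Raises FileNotFoundError if the program is not on PATH, so the
    failure is reported before any tarball is built.
    """
    program = REQUIRED_PROGRAMS[scheme]
    path = shutil.which(program)  # None when the program cannot be found
    if path is None:
        raise FileNotFoundError(
            f"{program}: command not found (needed for '{scheme}' targets)"
        )
    return path
```

Calling something like this before building the command would turn a late shell failure into an early, explicit error message.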

I note that zstd is increasingly widely used, and even if it isn't installed by default on a particular Linux distro (or macOS), it is almost certain to be available from the package manager. It is also apparently quite likely to be installed already as a dependency of another package (pygraphviz, via libtiff and others, in the case of rose), such is its ubiquity.

Furthermore, it has been approved for addition to the standard library in future versions of Python, so it is only likely to become more integrated into Python: https://discuss.python.org/t/pep-784-adding-zstandard-to-the-standard-library/87377/138

The same arguments can be made for xz (which uses liblzma) but additionally it does appear to be installed as standard, at least for RHEL 9.4. I'm less wedded to this though and would happily remove it if required. I just added it to check it was that easy...

@david-rundle
Author

I also note that there's obviously a lot of near-duplicated code here, and it would likely be better to implement something more generic, especially given the reliance on external utilities which have effectively identical command-line arguments (by design, I think?). But I am keen to get this in in some form initially and potentially revisit it when I have more time (and more skills!) if it has the potential to be beneficial.
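
As a rough illustration of what "something more generic" might look like, here is a sketch of a single command builder shared by the three tools. The `TOOLS` table and `build_compress_command` are hypothetical names, not the code in this PR; the only flags used are `-c` (write to stdout) and `-T` (threads, for xz and zstd), and gzip is assumed to be single-threaded.

```python
import shlex

# Hypothetical per-tool settings. gzip has no threads option, so None.
TOOLS = {
    "gzip": {"threads_flag": None},
    "xz": {"threads_flag": "-T"},
    "zstd": {"threads_flag": "-T"},
}


def build_compress_command(tool, source, dest, threads=1):
    """Build a shell command that compresses ``source`` into ``dest``."""
    flag = TOOLS[tool]["threads_flag"]
    args = [tool]
    if flag is not None:
        args.append(f"{flag}{int(threads)}")
    # Quote the paths so names containing spaces or quotes are safe.
    args += ["-c", shlex.quote(source)]
    return " ".join(args) + " >" + shlex.quote(dest)
```

Removing the source file after a successful run is left to the caller here, in the same way the existing code deletes the tarball separately.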

Faithfully reproduce behaviour of gzip/xz with zstd by removing the original file upon successful compression. (add --rm to command line)
…d by a keyword compress-cores (default 1 implies off)

updated test 46 where it is explicitly set to 1 core.
added test 49-app-arch-zstd-mp where compress-cores=0 (auto) is tested and compress-cores=4 is tested.
Member

@oliver-sanders oliver-sanders left a comment

Thanks for this PR (and special thanks for sorting out the tests and documentation too 👏)!

We'll try to get this in soon, but be aware, we've got a very heavy review workload right now, so apologies if it takes longer.

self.app_runner.popen.run_simple(command, shell=True)
self.app_runner.fs_util.delete(tar_name)

if target.compress_scheme in self.ZSTD_EXTS:
Member

Suggested change
if target.compress_scheme in self.ZSTD_EXTS:
elif target.compress_scheme in self.ZSTD_EXTS:

Author

fixed thanks!

self.app_runner.popen.run_simple(command, shell=True)
self.app_runner.fs_util.delete(tar_name)

if target.compress_scheme in self.XZ_EXTS:
Member

Suggested change
if target.compress_scheme in self.XZ_EXTS:
elif target.compress_scheme in self.XZ_EXTS:

Author

fixed thanks!

)
os.close(fdsec)
target.work_source_path = zst_name
command = f"zstd --rm -T{cores} -c '{tar_name}' >'{zst_name}'"
Member

Due to the high likelihood of this command being run directly on Cylc servers, we will have to be careful with this.

Author

Thanks @oliver-sanders. I agree, and I hope that leaving this off by default is a good trade-off between exposing useful functionality and maintaining order!

I have updated the documentation to note that this should be used with caution on shared resources.

@MetRonnie MetRonnie force-pushed the feature/add-zstd-xz branch from 9a73855 to fc9cca8 Compare May 6, 2025 11:07
replace compress-cores with compress-threads and update documentation to clarify how -T controls operation.

Add guidance about how to use multi-threading
@oliver-sanders
Member

oliver-sanders commented May 6, 2025

The "test / docs" test seems to be failing for reasons unrelated to this change, please ignore.

The wonderfully named "test / test" tests, however, seem to be failing for legitimate reasons. It looks like t/rose-task-run/07-app-arch.t has become broken?

handler = compress_manager.get_handler(target.compress_scheme)
handler.compress_sources(target, work_dir)
compress_args = {"threads": target.compress_threads}
handler.compress_sources(target, work_dir, **compress_args)
Contributor

Tried a very simple check

mode=rose_arch

[arch]
# rose-app.conf
command-format=cp %(sources)s %(target)s
target-prefix=/home/users/tim.pillinger/cylc-src/rose-apps/arch/archive/
source-prefix=/home/users/tim.pillinger/cylc-src/rose-apps/arch/source/

[arch:world.out]
source='world.out'

[arch:gunzipme.gz]
source='gunzipme.out'

[arch:targunzipme.tar.gz]
source='targunzipme.out'

export CYLC_WORKFLOW_ID='hippo'
export CYLC_TASK_ID='task-run'
export CYLC_TASK_NAME='task-run'
export CYLC_TASK_CYCLE_POINT='task-run'
export CYLC_TASK_LOG_ROOT="${HERE}/log"

echo Running app from "${HERE}/app"
rose task-run --config="${HERE}/app"

and got an error:

[FAIL] RoseArchGzip.compress_sources() got an unexpected keyword argument 'threads'

I think that you need to add the threads argument to rose_arch_gzip.py and possibly other items in that folder. You may want to consider emitting a warning if threads != 1 and program_is_single_thread:
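
A minimal sketch of that kind of check, assuming a hypothetical `effective_threads` helper and a `supports_threads` flag supplied by each handler (both names are illustrative, not existing rose API):

```python
import warnings


def effective_threads(threads, program, supports_threads):
    """Return the thread count to use, warning if the request is ignored."""
    threads = int(threads)
    if threads != 1 and not supports_threads:
        # The program cannot honour the request, so fall back to one thread.
        warnings.warn(
            f"{program} is single-threaded; ignoring threads={threads}"
        )
        return 1
    return threads
```

Raising an exception instead of warning would be the stricter alternative if a silently ignored setting is considered a configuration error.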

@david-rundle
Author

> The "test / docs" test seems to be failing for reasons unrelated to this change, please ignore.
>
> The wonderfully named "test / test" tests, however, seem to be failing for legitimate reasons. It looks like t/rose-task-run/07-app-arch.t has become broken?

This test works for me locally? Is that the most recent version of the code?

@oliver-sanders
Member

oliver-sanders commented May 6, 2025

Hmm, testing locally (cazldf...), this test passes on upstream/master but fails on david-rundle:feature/add-zstd-xz, so this does look like a genuine error. Do you have zstd and xz installed on your box / in your environment? Does the test need them to be installed?

If the test does need these utils installed, let us know and we can work out where to install them in CI.

The error can be found from this line onwards:

https://github.com/metomi/rose/actions/runs/14859530104/job/41720858720#step:13:14

I've pulled out a couple of error messages that might be pertinent:

tar: /home/runner/cylc-run/rtb.20250506T123046Z/07-app-arch/hLA3QX/foo/20130101T0000Z/hello/worlds/unknown/stuff.pax: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now
tar (child): /home/runner/cylc-run/rtb.20250506T123046Z/07-app-arch/hLA3QX/foo/20130101T1200Z/hello/worlds/planet-n.tar.gz: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now

self.app_runner = app_runner

def compress_sources(self, target, work_dir):
def compress_sources(self, target, work_dir, threads="1"):
Contributor

IMO threads should be cast to int as soon as we parse the config.
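
Something along these lines, as a sketch only. `parse_threads` is a hypothetical helper, and it assumes the raw `compress-threads` value arrives as a string (or is absent) and that 0 means "auto", as described earlier in this thread:

```python
def parse_threads(raw, default=1):
    """Convert the raw compress-threads setting to a non-negative int."""
    if raw is None or str(raw).strip() == "":
        return default  # setting not given: keep the single-threaded default
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(
            f"compress-threads must be an integer, got {raw!r}"
        ) from None
    if value < 0:
        raise ValueError(f"compress-threads must be >= 0, got {value}")
    return value
```

Doing the conversion once at parse time means the handlers can take `threads=1` as a plain int default and bad values fail before any compression starts.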

@wxtim
Contributor

wxtim commented May 6, 2025

I've written you a couple of integration tests at david-rundle#1

Tests running at https://github.com/wxtim/rose/actions/runs/14864492636

@david-rundle david-rundle marked this pull request as draft May 7, 2025 08:31
wxtim added 2 commits May 7, 2025 12:56
…ions about code - notably that a failure should appear if thread != 1 if the compression app doesn't support that

refactor tests to provide nicer test architecture for other tests.

flake8
* Changed a test based on this conversation:
  https://github.com/metomi/rose/pull/2286/files#r258325340
* Fixed a problem caused by a mutable default arg.
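
For anyone unfamiliar with the mutable default argument pitfall mentioned in that commit message, here is a generic illustration. It is not the code from this PR, and both function names are made up for the example.

```python
def add_target_shared(name, targets=[]):
    """Buggy: the default list is created once, when the function is defined."""
    targets.append(name)
    return targets


def add_target_fresh(name, targets=None):
    """Fixed: a new list is created on every call that omits ``targets``."""
    if targets is None:
        targets = []
    targets.append(name)
    return targets
```

Repeated calls to the first function without a `targets` argument keep appending to the same list, whereas the second starts from an empty list each time.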
@oliver-sanders oliver-sanders modified the milestones: 2.5.0, 2.6.0 Jul 9, 2025
@wxtim
Contributor

wxtim commented Jul 22, 2025

@david-rundle - Is there anything I can do to help you move this forward?

@david-rundle
Author

Just looking at this again now having merged your changes and am getting loads of errors like:


ok 1 - 46-app-arch-zstd-install
not ok 2 - 46-app-arch-zstd-play
ERROR - Incomplete tasks:
    * 1/archive did not complete the required outputs:
      ⨯ ┆  succeeded
DEBUG - stopping zmq replier...
DEBUG - ...stopped
DEBUG - stopping zmq publisher...
DEBUG - ...stopped
DEBUG - auth received API command b'TERMINATE'
DEBUG - Removing authentication keys from scheduler
INFO - DONE
not ok 3 - 46-app-arch-zstd-job.status-archive-01
t/rose-task-run/46-app-arch-zstd.t: line 51: cd: /home/users/david.rundle/cylc-run/rtb.20250731T115019Z/46-app-arch-zstd/uSHLqn/share/backup: No such file or directory
--- -	2025-07-31 12:50:30.909814205 +0100
+++ 46-app-arch-zstd-find.out	2025-07-31 12:50:30.905487529 +0100
@@ -1,2 +0,0 @@
-./archive.d/2016.txt.zst
-./archive.d/whatever.tar.zst
not ok 4 - 46-app-arch-zstd-find.out
Failed 3/4 subtests 

Test Summary Report
-------------------
t/rose-task-run/46-app-arch-zstd.t (Wstat: 0 Tests: 4 Failed: 3)
  Failed tests:  2-4
Files=1, Tests=4,  9 wallclock secs ( 0.02 usr  0.00 sys +  1.37 cusr  0.73 csys =  2.12 CPU)
Result: FAIL

and it's been a while so I don't really remember any of the workflow for running the tests so may need some help dusting this off!

@wxtim
Contributor

wxtim commented Jul 31, 2025

@david-rundle - I've fixed the conflict for you, and retriggered the tests.

I have replicated the test failure locally, but I'm not clear on the causes.

Anyway - to run the test locally

conda create --name cylc.dev python=3.9 pygraphviz pip        # Or use the same env you used before

cat > ~/conda-envs/cylc.dev/etc/conda/activate.d/cylc.sh <<'__HERE__'
#!/usr/bin/env bash
CYLC_ENV_NAME="$(basename "$CONDA_DEFAULT_ENV")"
export CYLC_ENV_NAME
__HERE__

cat > ~/conda-envs/cylc.dev/etc/conda/deactivate.d/cylc.sh <<__HERE__
#!/usr/bin/env bash
unset CYLC_ENV_NAME
__HERE__

conda install cylc-rose
cd ~/path/to/your/copy/of/rose

pip install -e .

./etc/bin/rose-test-battery t/rose-task-run/46-app-arch-zstd.t

I've tried running the tests and I'm getting the same failure - it looks like the workflow is failing because the task is failing, and you can look at the details of the failure at ~/cylc-run/rtb.<datestamp>/46-app-arch-zstd/p6oswd/log/job/1/archive/NN/job.err. Mine is failing with

/home/users/tim.pillinger/cylc-run/rtb.20250731T155430Z/46-app-arch-zstd/p6oswd/log/job/1/archive/01/job: line 42: zstd: command not found

@oliver-sanders oliver-sanders removed this from the 2.6.0 milestone Sep 17, 2025
@oliver-sanders
Member

New developments...

Python added support for zstd in version 3.14, which makes it much more attractive for use in Rose, as we would no longer be relying on an external dependency (which may or may not be installed). However, this functionality would only be present when Python >= 3.14.0 is installed.

https://docs.python.org/3/library/compression.zstd.html
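
A minimal sketch of how the optional dependency could be handled, assuming the `compression.zstd` module linked above. The `compress_bytes` helper is a hypothetical name, and returning `None` is just one possible way to signal that the caller should fall back to something else.

```python
from typing import Optional

try:
    from compression import zstd  # standard library from Python 3.14
except ImportError:
    zstd = None  # older Python: no built-in zstd support


def compress_bytes(data: bytes) -> Optional[bytes]:
    """Compress ``data`` with zstd, or return None if it is unavailable."""
    if zstd is None:
        return None
    return zstd.compress(data)
```

On older Python versions the function returns `None`, leaving the choice between the external zstd program and an error message to the caller.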
