fix: be more explicit about rsync args and ensure that symlinks are copied (avoiding that they point to non-existing files); to not copy permissions and ownership (thanks @landerini) #38
📝 Walkthrough
Adds StorageObject.local_footprint(), which reports zero for on-demand eligible objects, the recursive size of all contained files for directories, and the object size otherwise. Also replaces the compact rsync flags with explicit long options.
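The three cases described in the walkthrough can be sketched as a stand-alone function. This is a minimal illustration, not the plugin's actual method: the free function `local_footprint` and its `is_ondemand_eligible` parameter are assumptions made for the example (in the plugin this logic lives on the storage object and reads its own attributes).

```python
from pathlib import Path


def local_footprint(path: Path, is_ondemand_eligible: bool) -> int:
    """Hypothetical stand-alone version of the footprint logic.

    Returns the number of bytes the object would occupy locally.
    """
    if is_ondemand_eligible:
        # Nothing is transferred for on-demand objects, so no local space is used.
        return 0
    if path.is_dir():
        # Directory: sum the sizes of all files below it.
        return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())
    # Plain file: its own size.
    return path.stat().st_size
```

Note that `is_file()` and `stat()` both follow symlinks, so a symlinked file is counted with the size of its target in this sketch.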
Sequence Diagram(s)
sequenceDiagram
participant Caller
participant StorageObject
participant Rsync as "rsync (local/remote)"
Caller->>StorageObject: request retrieve_object(path)
alt on-demand eligible
StorageObject->>Caller: return (no local transfer)
else not on-demand
StorageObject->>Rsync: run rsync --recursive --times --copy-links
Rsync-->>StorageObject: transfer result
StorageObject-->>Caller: return result
end
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
snakemake_storage_plugin_fs/__init__.py(3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
⚙️ CodeRabbit configuration file
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the `self` argument of methods.
Do not suggest type annotation of the `cls` argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a `return` statement.
Files:
snakemake_storage_plugin_fs/__init__.py
🧠 Learnings (1)
📚 Learning: 2025-03-20T08:35:40.800Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-storage-plugin-fs PR: 32
File: snakemake_storage_plugin_fs/__init__.py:202-210
Timestamp: 2025-03-20T08:35:40.800Z
Learning: The `is_ondemand_eligible` attribute used in `snakemake_storage_plugin_fs` is defined and initialized in the base classes from `snakemake_interface_storage_plugins.storage_object` package, so it doesn't need to be explicitly initialized in the derived classes.
Applied to files:
snakemake_storage_plugin_fs/__init__.py
🔇 Additional comments (2)
snakemake_storage_plugin_fs/__init__.py (2)
225-237: LGTM! The explicit rsync options align with PR objectives.
The switch to explicit options, particularly `--copy-links`, ensures that symlinks are dereferenced and their content is copied, preventing broken symlinks at the destination. This directly addresses the PR's stated goal.
Note: The `-D` flag (preserve device and special files) from the original `-a` option is not included. This is likely intentional for safety and simplicity, but verify if device/special file preservation is needed for your use case.
253-264: LGTM! The explicit options correctly preserve destination permissions.
The explicit rsync options are well-chosen:
- `--copy-links` dereferences symlinks (aligns with PR objective)
- Omission of `--perms`, `--group`, `--owner` allows inheritance from destination (correctly implements the setgid behavior mentioned in the comment)
- `--recursive` and `--times` provide essential functionality
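To make the option set discussed above concrete, here is a minimal sketch of how such an rsync call could be assembled. The helper names `build_rsync_cmd` and `run_rsync` are hypothetical and not taken from the plugin; only the three long options come from the review. For reference, rsync's `-a` is shorthand for `-rlptgoD`, so the explicit form below keeps recursion and modification times, drops permissions, owner, group and device/special files, and replaces symlink preservation with `--copy-links`.

```python
import subprocess
from pathlib import Path


def build_rsync_cmd(src: Path, dst: Path) -> list[str]:
    """Hypothetical helper: assemble an rsync call with explicit long options."""
    return [
        "rsync",
        "--recursive",   # descend into directories
        "--times",       # preserve modification times
        "--copy-links",  # copy what symlinks point to instead of the links themselves
        str(src),
        str(dst),
    ]


def run_rsync(src: Path, dst: Path) -> None:
    """Run the command; raises CalledProcessError if rsync exits non-zero."""
    subprocess.run(build_rsync_cmd(src, dst), check=True)
```

Passing the arguments as a list avoids shell quoting issues for paths that contain spaces.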
if self.query_path.is_dir():
    # If the object is a directory, we have to count all files in it.
    return sum(
        f.stat().st_size for f in self.query_path.rglob("*") if f.is_file()
    )
Consider symlink handling in directory traversal.
The rglob("*") call follows symlinks by default, which could lead to:
- Counting files outside the intended directory tree
- Infinite loops if circular symlinks exist
- Inconsistent footprint calculations if symlinks point to large external directories
Since the PR explicitly addresses symlink handling, consider whether this method should follow symlinks or not. If symlinks should not be followed, apply this diff:
  if self.query_path.is_dir():
      # If the object is a directory, we have to count all files in it.
      return sum(
-         f.stat().st_size for f in self.query_path.rglob("*") if f.is_file()
+         f.stat(follow_symlinks=False).st_size
+         for f in self.query_path.rglob("*")
+         if f.is_file() and not f.is_symlink()
      )
Alternatively, if the intent is to follow symlinks but avoid infinite loops, consider using os.walk with followlinks=False or tracking visited inodes.
📝 Committable suggestion
  if self.query_path.is_dir():
      # If the object is a directory, we have to count all files in it.
      return sum(
-         f.stat().st_size for f in self.query_path.rglob("*") if f.is_file()
+         f.stat().st_size for f in self.query_path.rglob("*") if f.is_file() and not f.is_symlink()
      )
🤖 Prompt for AI Agents
In snakemake_storage_plugin_fs/__init__.py around lines 204 to 208, the
directory size calculation uses self.query_path.rglob("*") which follows
symlinks by default and can count files outside the tree or cause loops; update
the traversal to avoid following symlinks by skipping symlink entries when
summing sizes (e.g., only include entries where f.is_file() and not
f.is_symlink()), or replace rglob with os.walk(..., followlinks=False) and sum
regular file sizes, and ensure you also handle broken symlinks safely to prevent
exceptions.
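The `os.walk` alternative mentioned in the review can be sketched as follows. This is an illustration under stated assumptions, not the change that was merged: the function name `directory_size` is hypothetical, and the sketch assumes symlinks should be skipped entirely. Independently of how `rglob` treats symlinked directories in a given Python version, `Path.is_file()` and `Path.stat()` follow symlinks by default, which is why the sketch uses `os.lstat` instead.

```python
import os
import stat


def directory_size(root: str) -> int:
    """Hypothetical helper: sum the sizes of regular files under root, skipping symlinks."""
    total = 0
    # followlinks=False (the default) keeps os.walk from descending into
    # directories that are reached through a symlink.
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                # lstat inspects the entry itself, so a broken symlink does not raise.
                st = os.lstat(full)
            except OSError:
                # The entry vanished or is unreadable; ignore it.
                continue
            if stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total
```

Whether skipping symlinks is the right choice depends on intent: since the rsync call uses `--copy-links`, the copied data includes symlink targets, so a footprint meant to predict the size of the local copy may instead want to count them.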
🤖 I have created a release *beep* *boop*
---
## [1.1.3](v1.1.2...v1.1.3) (2025-10-31)
### Bug Fixes
* be more explicit about rsync args and ensure that symlinks are copied (avoiding that they point to non-existing files); to not copy permissions and ownership (thanks [@landerini](https://github.com/landerini)) ([#38](#38)) ([505d868](505d868))
---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
fixes #34