Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
25 changes: 23 additions & 2 deletions snakemake_storage_plugin_fs/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -197,6 +197,17 @@ def size(self) -> int:
# return the size in bytes
return self._stat().st_size

def local_footprint(self) -> int:
if self.is_ondemand_eligible:
# If the object is not downloaded, we don't have to count it.
return 0
if self.query_path.is_dir():
# If the object is a directory, we have to count all files in it.
return sum(
f.stat().st_size for f in self.query_path.rglob("*") if f.is_file()
)
Comment on lines +204 to +208
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Consider symlink handling in directory traversal.

The rglob("*") call follows symlinks by default, which could lead to:

  • Counting files outside the intended directory tree
  • Infinite loops if circular symlinks exist
  • Inconsistent footprint calculations if symlinks point to large external directories

Since the PR explicitly addresses symlink handling, consider whether this method should follow symlinks or not. If symlinks should not be followed, apply this diff:

         if self.query_path.is_dir():
             # If the object is a directory, we have to count all files in it.
             return sum(
-                f.stat().st_size for f in self.query_path.rglob("*") if f.is_file()
+                f.stat(follow_symlinks=False).st_size 
+                for f in self.query_path.rglob("*") 
+                if f.is_file() and not f.is_symlink()
             )

Alternatively, if the intent is to follow symlinks but avoid infinite loops, consider using os.walk with followlinks=False or tracking visited inodes.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if self.query_path.is_dir():
# If the object is a directory, we have to count all files in it.
return sum(
f.stat().st_size for f in self.query_path.rglob("*") if f.is_file()
)
if self.query_path.is_dir():
# If the object is a directory, we have to count all files in it.
return sum(
f.stat().st_size for f in self.query_path.rglob("*") if f.is_file() and not f.is_symlink()
)
🤖 Prompt for AI Agents
In snakemake_storage_plugin_fs/__init__.py around lines 204 to 208, the
directory size calculation uses self.query_path.rglob("*") which follows
symlinks by default and can count files outside the tree or cause loops; update
the traversal to avoid following symlinks by skipping symlink entries when
summing sizes (e.g., only include entries where f.is_file() and not
f.is_symlink()), or replace rglob with os.walk(..., followlinks=False) and sum
regular file sizes, and ensure you also handle broken symlinks safely to prevent
exceptions.

return self.size()

def retrieve_object(self):
# Ensure that the object is accessible locally under self.local_path()
if self.is_ondemand_eligible:
Expand All @@ -212,7 +223,13 @@ def retrieve_object(self):
)
else:
cmd = sysrsync.get_rsync_command(
str(self.query_path), str(self.local_path()), options=["-av"]
str(self.query_path),
str(self.local_path()),
options=[
"--recursive",
"--times",
"--copy-links",
],
)
self._run_cmd(cmd)

Expand All @@ -235,7 +252,11 @@ def store_object(self):
str(self.query_path),
# ensure that permissions and ownership are inherited from destination,
# e.g. setgid.
options=["-av", "--no-o", "--no-g", "--no-p"],
options=[
"--recursive",
"--times",
"--copy-links",
],
)
self._run_cmd(cmd)

Expand Down