Skip to content

Conversation

@phmoferring
Copy link
Contributor

@phmoferring phmoferring commented Oct 14, 2025

When using this plugin in conjunction with the snakemake-executor-plugin-aws-batch, and a directory output from a job executed on AWS Batch, I receive the following error:

OSError: Missing files after 5 seconds. This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait:
s3://bucket/default/storage/prefix/directory_batch (missing in storage)

However, the expected output objects within this prefix are created as expected from the job executed on batch. Increasing the --latency-wait parameter does not change the behavior.

I'm uncertain if this is really the "source" of the error, since it does not happen when running locally(still using the s3 storage plugin), but this change to recompute _is_dir of the StorageObject when it is Falsy instead of just None seems to resolve the issue. Someone more familiar with the internals of snakemake-interface-storage-plugins/snakemake-interface-executor-plugins/snakemake may be able to point to a more appropriate place to address this issue. Querying s3 more than necessary is obviously not the most elegant solution.

I'm including a Snakefile that recreates the issue, a working AWS Batch queue/compute environment and usage of a default s3 storage prefix is required to recreate the issue. I am using the latest(as of 14Oct2025) versions of snakemake and mentioned plugins, Python3.13, on a Ubuntu 24.04 based docker image.

rule all:
    input:
        "directory_batch",
        "directory_local",

# Works fine when running locally, produces error after job completion on AWS Batch
rule directory_output_batch:
    output:
        directory_batch = directory("directory_batch"),
    shell:
        """
        mkdir -p {output.directory_batch}
        touch {output.directory_batch}/output_file
        """

# Produces no error when running fully locally or on AWS Batch
rule directory_output_local:
    localrule: True
    output:
        directory_local = directory("directory_local"),
    shell:
        """
        mkdir -p {output.directory_local}
        touch {output.directory_local}/output_file
        """

Summary by CodeRabbit

  • Bug Fixes
    • Improved directory detection in the S3 storage plugin to ensure consistent boolean results from directory checks, preventing unexpected None values that could disrupt control flow.
    • Enhances reliability of operations that depend on directory status, such as listing contents, conditional processing, and workflow decisions in S3-backed paths.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 14, 2025

📝 Walkthrough

Walkthrough

Adjusted S3 path directory-detection cache initialization: replaced an explicit None check for _is_dir with a truthiness check, changing when the cache is populated and potentially returning None from is_dir() instead of a boolean.

Changes

Cohort / File(s) Summary
S3 directory flag caching
snakemake_storage_plugin_s3/__init__.py
Replaced if self._is_dir is None with if not self._is_dir before populating self._is_dir = any(self.get_subkeys()), altering cache initialization and possible return value of is_dir()

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant S3Path as S3Path
  participant S3 as S3 API

  Caller->>S3Path: is_dir()
  alt Cache check (new)
    note over S3Path: if not self._is_dir
    S3Path->>S3: get_subkeys()
    S3-->>S3Path: subkeys list
    S3Path->>S3Path: self._is_dir = any(subkeys)
  else Cache assumed truthy/False
    note over S3Path: Skips population if self._is_dir is None (falsy?) or False handling differs
  end
  S3Path-->>Caller: self._is_dir (may be None/True/False)
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

  • johanneskoester

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The title succinctly describes the primary change—fixing a spurious missing output error with the AWS Batch executor—and aligns directly with the problem outlined in the PR description.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d067f40 and 8ec8514.

📒 Files selected for processing (1)
  • snakemake_storage_plugin_s3/__init__.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • snakemake_storage_plugin_s3/__init__.py

Comment on lines +329 to 331
if not self._is_dir:
self._is_dir = any(self.get_subkeys())
return self._is_dir
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

Rechecking S3 on every call when directory doesn't exist creates performance overhead.

The change from if self._is_dir is None: to if not self._is_dir: solves the AWS Batch stale-cache problem by rechecking S3 whenever _is_dir is False. However, this means that for non-existent or empty directories, every subsequent is_dir() call will query S3 again, which is wasteful.

The root cause is that _is_dir is cached across remote state changes (e.g., when a Batch job creates files). Better alternatives might include:

  1. Invalidate the cache when appropriate: Reset _is_dir = None after job completion or when --latency-wait expires.
  2. Time-based cache expiration: Track when _is_dir was last populated and recheck after a timeout.
  3. Separate "checked" from "is directory" state: Use a separate flag to track whether the check has been performed vs. what the result was.

Additionally, mixing a None-check (uninitialized state) with a truthiness check (value comparison) makes the caching strategy unclear. Consider explicitly handling the three states: uninitialized (None), not a directory (False), and is a directory (True).

To verify the performance impact, run this script to identify how often is_dir() is called and whether the rechecking behavior is triggered frequently:


🏁 Script executed:

#!/bin/bash
# Description: Search for is_dir() calls and usage patterns to assess performance impact

# Find all calls to is_dir() in the codebase
echo "=== Locations calling is_dir() ==="
rg -n 'is_dir\(\)' -A 2 -B 2

# Check if there are any cache invalidation patterns
echo -e "\n=== Patterns that might invalidate _is_dir cache ==="
rg -n '_is_dir\s*=' -A 1 -B 1

# Look for any retry or latency-wait mechanisms
echo -e "\n=== Latency wait and retry mechanisms ==="
rg -n 'latency.wait|retry' -A 2 -B 2

Length of output: 4565


Separate cache initialization from value and prevent repeated S3 queries

Changing if self._is_dir is None to if not self._is_dir forces every is_dir() call on an empty or non-existent prefix (used in exists, mtime, size, retrieve_object, remove) to requery S3, causing substantial overhead. Instead:

  • Keep the three-state logic: uninitialized (None), directory (True), not-a-directory (False).
  • Invalidate or expire the cache explicitly (e.g., reset _is_dir = None after job completion or use a time-based TTL).
  • Or track a separate “checked” flag versus the actual boolean result.

Refactor the cache logic accordingly to avoid S3 thrashing.

🤖 Prompt for AI Agents
In snakemake_storage_plugin_s3/__init__.py around lines 329 to 331, the current
check uses "if not self._is_dir" which collapses the three-state cache into a
boolean and causes repeated S3 queries for empty/non-existent prefixes; change
the logic to preserve the uninitialized state by checking "if self._is_dir is
None" before calling self.get_subkeys(), set self._is_dir to True/False based on
the result, and implement explicit cache invalidation (reset self._is_dir =
None) after job completion or add a separate checked flag or a TTL timestamp to
avoid thrashing S3 while keeping correct cached values.

@gilbsgilbs
Copy link

I made another fix that is slightly better regarding the cache invalidation issue you mentioned: #60 .

@deto
Copy link

deto commented Nov 21, 2025

I was hit with the same issue and did some digging #61

More details in that issue thread, but basically the reason it works when executing locally is that snakemake just uses local file paths when checking for existence/missing when running locally, even if remote storage is used. This is because all files for remote storage are initially created locally. So it'll just check if the path exists like .snakemake/storage/s3/<bucket>/my_dir/, ignoring the storage plugin code completely.

would be nice if they merge in a fix soon! clearly this issue is snagging quite a few people.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants