Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wait files fully downloaded before complete task #1707

Merged
merged 1 commit into from
Feb 3, 2025

Conversation

wintonzheng
Copy link
Contributor

@wintonzheng wintonzheng commented Feb 3, 2025

Important

Ensure files are fully downloaded before completing tasks by introducing download waiting mechanisms and handling long download times.

  • Behavior:
    • Ensure files are fully downloaded before completing tasks in agent.py and handler.py.
    • Introduce wait_for_download_finished() in files.py to wait for file downloads to complete.
    • Handle long download times with DownloadFileMaxWaitingTime exception in exceptions.py.
  • Functions:
    • Add list_downloading_files_in_directory() and wait_for_download_finished() in files.py.
    • Update execute_step() in agent.py to wait for downloads.
    • Update handle_click_to_download_file_action() in handler.py to wait for downloads.
  • Constants:
    • Add BROWSER_DOWNLOADING_SUFFIX in constants.py for identifying downloading files.
  • Logging:
    • Add logging for download status and long download warnings in agent.py and handler.py.

This description was created by Ellipsis for 96fd507. It will automatically update as commits are pushed.

<!-- ELLIPSIS_HIDDEN -->

> [!IMPORTANT]
> Ensure files are fully downloaded before completing tasks by waiting for download completion and handling long download times.
>
>   - **Behavior**:
>     - Ensure files are fully downloaded before completing tasks in `agent.py` and `handler.py`.
>     - Introduce `wait_for_download_finished()` in `files.py` to wait for file downloads to complete.
>     - Handle long download times with `DownloadFileMaxWaitingTime` exception in `exceptions.py`.
>   - **Constants**:
>     - Add `BROWSER_DOWNLOADING_SUFFIX` to `constants.py` for identifying downloading files.
>   - **Functions**:
>     - Add `list_downloading_files_in_directory()` in `files.py` to list files still downloading.
>     - Modify `execute_step()` in `agent.py` to wait for downloads before task completion.
>     - Update `handle_click_to_download_file_action()` in `handler.py` to wait for downloads.
>   - **Exceptions**:
>     - Add `DownloadFileMaxWaitingTime` in `exceptions.py` for handling download timeouts.
>
> <sup>This description was created by </sup>[<img alt="Ellipsis" src="https://img.shields.io/badge/Ellipsis-blue?color=175173">](https://www.ellipsis.dev?ref=Skyvern-AI%2Fskyvern-cloud&utm_source=github&utm_medium=referral)<sup> for b5d4aa0829168bc704025e5df29168f0160b2868. It will automatically update as commits are pushed.</sup>

<!-- ELLIPSIS_HIDDEN -->
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 96fd507 in 1 minute and 15 seconds

More details
  • Looked at 229 lines of code in 5 files
  • Skipped 0 files when reviewing.
  • Skipped posting 11 drafted comments based on config settings.
1. skyvern/forge/sdk/api/files.py:97
  • Draft comment:
    Typo: use 'retrieved' instead of 'retreived'.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. skyvern/webeye/actions/handler.py:1411
  • Draft comment:
    Variable 'file' is declared with union type (list[str] | str) but used in context where a consistent type is expected. Consider using a consistent type (e.g. always a list) for file chooser file paths.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
3. skyvern/webeye/actions/handler.py:1429
  • Draft comment:
    Ensure that the file chooser listener (fc_func) is correctly captured and removed. Confirm that its scope covers all uses.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
4. skyvern/forge/sdk/api/files.py:27
  • Draft comment:
    In 'download_from_s3', using file_path.write(downloaded_bytes) on a temporary file wrapper may need explicit flush/close for reliable file operations. Consider ensuring the file is properly flushed before returning.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
5. skyvern/webeye/actions/handler.py:488
  • Draft comment:
    In handle_click_to_download_file_action, the loop that waits for file download based on file count may benefit from a shorter polling interval or additional checks (e.g. file size changes) to more reliably detect completion.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
6. skyvern/forge/sdk/api/files.py:94
  • Draft comment:
    Typo: 'retreived' should be 'retrieved' in the log message when no file extension is found from HTTP headers.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
7. skyvern/forge/sdk/api/files.py:174
  • Draft comment:
    Consider adding more debug logging in the wait loop in 'wait_for_download_finished' to log iteration progress or remaining files; also, optionally verify file suffix remains the downloading suffix.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The function is relatively simple and already has core error handling. While debug logging could be helpful for troubleshooting, it's not critical and the function works without it. The suffix verification suggestion seems unnecessary since the function is only checking file existence, not actually using the suffix. The code appears to work correctly as is.
    Debug logging could genuinely help with troubleshooting download issues in production. The suffix verification could catch edge cases where files are renamed during download.
    While logging could be helpful, it's not a critical issue that requires immediate fixing. The suffix verification is unnecessary since the downloading files list is built by checking suffixes upfront in list_downloading_files_in_directory().
    This comment makes nice-to-have suggestions but doesn't identify any real problems or necessary improvements. The code works correctly as written.
8. skyvern/forge/sdk/api/files.py:177
  • Draft comment:
    Usage of 'asyncio.timeout' requires Python 3.11 or later; please ensure compatibility or document the minimum Python version requirement.
  • Reason this comment was not posted:
    Comment did not seem useful.
9. skyvern/webeye/actions/handler.py:488
  • Draft comment:
    In 'handle_click_to_download_file_action', the loop that waits for an increased number of files in the download directory uses a fixed 1-second interval. Consider making the sleep interval configurable or using a more responsive backoff strategy.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
10. skyvern/webeye/actions/handler.py:1588
  • Draft comment:
    There are several arbitrary sleep durations (e.g., sleep(15) after uploading files) in the chain click and input actions. Consider exposing these delays via configuration in settings for easier tuning.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
11. skyvern/webeye/actions/handler.py:2700
  • Draft comment:
    The overall complexity and length of functions in this handler module may benefit from refactoring into smaller, testable units, improving maintainability.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_JO9llp1TEXyQmWFD


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Reviewed everything up to 96fd507 in 2 minutes and 7 seconds

More details
  • Looked at 229 lines of code in 5 files
  • Skipped 0 files when reviewing.
  • Skipped posting 12 drafted comments based on config settings.
1. skyvern/fuse/sdk/api/files.py:174
  • Draft comment:
    Consider checking not only file existence but also that the file is fully downloaded (e.g. by verifying the absence of the BROWSER_DOWNLOADING_SUFFIX) before returning. This may reduce false positives if a partial file remains.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. skyvern/fuse/sdk/api/files.py:158
  • Draft comment:
    Good practice: Using os.walk() with a non-recursive flag—ensure that recursive behavior is clearly documented in list_files_in_directory.
  • Reason this comment was not posted:
    Confidence changes required: 20%
    None
3. skyvern/webeye/actions/handler.py:488
  • Draft comment:
    In handle_click_to_download_file_action, the polling loop waits 1 second between checks. For improved responsiveness, consider reducing the sleep interval or checking file size stability.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
4. skyvern/webeye/actions/handler.py:525
  • Draft comment:
    Ensure that the warning log for long-time downloading files properly informs users; you might consider triggering a fallback mechanism if downloads exceed the max waiting time.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    None
5. skyvern/webeye/actions/handler.py:1405
  • Draft comment:
    The chain_click function is quite complex. Consider refactoring to modularize the different fallback strategies (label lookup, sibling, and parent approaches) to improve readability and maintainability.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
6. skyvern/webeye/actions/handler.py:2710
  • Draft comment:
    Methods that search for a tag within a given element (such as get_anchor_to_click) should handle cases when the target child element isn't found, possibly logging detailed context.
  • Reason this comment was not posted:
    Confidence changes required: 40%
    None
7. skyvern/webeye/actions/handler.py:1380
  • Draft comment:
    Consider explicitly handling potential exceptions when verifying secret parameters in get_actual_value_of_parameter_if_secret to avoid leaking secret placeholders.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
8. skyvern/forge/sdk/api/files.py:174
  • Draft comment:
    In wait_for_download_finished(), the logic checks only for file existence to decide if a download is complete. This assumes the temporary download file (with BROWSER_DOWNLOADING_SUFFIX) disappears on completion. Please confirm that this reliably indicates full download completion across all supported browsers.
  • Reason this comment was not posted:
    Comment did not seem useful.
9. skyvern/forge/sdk/api/files.py:162
  • Draft comment:
    In list_downloading_files_in_directory(), file extension matching uses 'path.suffix == downloading_suffix'. Consider whether this check is robust for all cases (e.g. if extra parameters exist in filename). A comment or additional validation might help.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The comment is speculative - it says "Consider whether" and doesn't point to any specific issue. Path.suffix is a standard Python pathlib method that reliably gets file extensions. The BROWSER_DOWNLOADING_SUFFIX is likely a controlled constant. The comment doesn't provide any concrete examples of problematic cases.
    I could be wrong about the reliability of Path.suffix - there might be edge cases I'm not considering. The comment might be trying to prevent future issues.
    While edge cases might exist, the comment is too speculative and doesn't provide actionable feedback. Path.suffix is a standard Python method that's generally reliable for this use case.
    Delete this comment as it's speculative, not actionable, and doesn't point to a concrete issue. It violates the rule against speculative "if X then Y" comments.
10. skyvern/webeye/actions/handler.py:525
  • Draft comment:
    After waiting for the download files to complete using wait_for_download_finished(), a TimeoutError triggers a warning and the function continues by returning success. Please confirm that treating a timeout as a success (with warning) is the intended behavior.
  • Reason this comment was not posted:
    Comment did not seem useful.
11. skyvern/webeye/actions/handler.py:488
  • Draft comment:
    The loop that checks for an increased file count (comparing list_files_before and list_files_after) is used to detect a new download. Consider potential race conditions if multiple downloads occur simultaneously or if existing files are modified. A brief comment documenting assumptions would improve clarity.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
12. skyvern/webeye/actions/handler.py:519
  • Draft comment:
    When logging the list of downloading files, ensure Path objects are converted to string representations for clearer logging output.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_4fKEhVhZxwUggQTP


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@@ -158,6 +159,34 @@ def list_files_in_directory(directory: Path, recursive: bool = False) -> list[st
return listed_files


def list_downloading_files_in_directory(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be beneficial to add inline docstrings or type annotations for the newly introduced functions (like wait_for_download_finished and list_downloading_files_in_directory) to clarify their intended use and assumptions.

@LawyZheng LawyZheng merged commit b43f1bf into main Feb 3, 2025
7 checks passed
@LawyZheng LawyZheng deleted the lawy/fix-download-suffix-issue branch February 3, 2025 15:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants