-
Notifications
You must be signed in to change notification settings - Fork 848
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
wait files fully downloaded before complete task #1707
Conversation
<!-- ELLIPSIS_HIDDEN --> > [!IMPORTANT] > Ensure files are fully downloaded before completing tasks by waiting for download completion and handling long download times. > > - **Behavior**: > - Ensure files are fully downloaded before completing tasks in `agent.py` and `handler.py`. > - Introduce `wait_for_download_finished()` in `files.py` to wait for file downloads to complete. > - Handle long download times with `DownloadFileMaxWaitingTime` exception in `exceptions.py`. > - **Constants**: > - Add `BROWSER_DOWNLOADING_SUFFIX` to `constants.py` for identifying downloading files. > - **Functions**: > - Add `list_downloading_files_in_directory()` in `files.py` to list files still downloading. > - Modify `execute_step()` in `agent.py` to wait for downloads before task completion. > - Update `handle_click_to_download_file_action()` in `handler.py` to wait for downloads. > - **Exceptions**: > - Add `DownloadFileMaxWaitingTime` in `exceptions.py` for handling download timeouts. > > <sup>This description was created by </sup>[<img alt="Ellipsis" src="https://img.shields.io/badge/Ellipsis-blue?color=175173">](https://www.ellipsis.dev?ref=Skyvern-AI%2Fskyvern-cloud&utm_source=github&utm_medium=referral)<sup> for b5d4aa0829168bc704025e5df29168f0160b2868. It will automatically update as commits are pushed.</sup> <!-- ELLIPSIS_HIDDEN -->
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on 96fd507 in 1 minute and 15 seconds
More details
- Looked at
229
lines of code in5
files - Skipped
0
files when reviewing. - Skipped posting
11
drafted comments based on config settings.
1. skyvern/forge/sdk/api/files.py:97
- Draft comment:
Typo: use 'retrieved' instead of 'retreived'. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. skyvern/webeye/actions/handler.py:1411
- Draft comment:
Variable 'file' is declared with union type (list[str] | str) but used in context where a consistent type is expected. Consider using a consistent type (e.g. always a list) for file chooser file paths. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
3. skyvern/webeye/actions/handler.py:1429
- Draft comment:
Ensure that the file chooser listener (fc_func) is correctly captured and removed. Confirm that its scope covers all uses. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
4. skyvern/forge/sdk/api/files.py:27
- Draft comment:
In 'download_from_s3', using file_path.write(downloaded_bytes) on a temporary file wrapper may need explicit flush/close for reliable file operations. Consider ensuring the file is properly flushed before returning. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
5. skyvern/webeye/actions/handler.py:488
- Draft comment:
In handle_click_to_download_file_action, the loop that waits for file download based on file count may benefit from a shorter polling interval or additional checks (e.g. file size changes) to more reliably detect completion. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
6. skyvern/forge/sdk/api/files.py:94
- Draft comment:
Typo: 'retreived' should be 'retrieved' in the log message when no file extension is found from HTTP headers. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
7. skyvern/forge/sdk/api/files.py:174
- Draft comment:
Consider adding more debug logging in the wait loop in 'wait_for_download_finished' to log iteration progress or remaining files; also, optionally verify file suffix remains the downloading suffix. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The function is relatively simple and already has core error handling. While debug logging could be helpful for troubleshooting, it's not critical and the function works without it. The suffix verification suggestion seems unnecessary since the function is only checking file existence, not actually using the suffix. The code appears to work correctly as is.
Debug logging could genuinely help with troubleshooting download issues in production. The suffix verification could catch edge cases where files are renamed during download.
While logging could be helpful, it's not a critical issue that requires immediate fixing. The suffix verification is unnecessary since the downloading files list is built by checking suffixes upfront in list_downloading_files_in_directory().
This comment makes nice-to-have suggestions but doesn't identify any real problems or necessary improvements. The code works correctly as written.
8. skyvern/forge/sdk/api/files.py:177
- Draft comment:
Usage of 'asyncio.timeout' requires Python 3.11 or later; please ensure compatibility or document the minimum Python version requirement. - Reason this comment was not posted:
Comment did not seem useful.
9. skyvern/webeye/actions/handler.py:488
- Draft comment:
In 'handle_click_to_download_file_action', the loop that waits for an increased number of files in the download directory uses a fixed 1-second interval. Consider making the sleep interval configurable or using a more responsive backoff strategy. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
10. skyvern/webeye/actions/handler.py:1588
- Draft comment:
There are several arbitrary sleep durations (e.g., sleep(15) after uploading files) in the chain click and input actions. Consider exposing these delays via configuration in settings for easier tuning. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
11. skyvern/webeye/actions/handler.py:2700
- Draft comment:
The overall complexity and length of functions in this handler module may benefit from refactoring into smaller, testable units, improving maintainability. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
Workflow ID: wflow_JO9llp1TEXyQmWFD
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ Changes requested. Reviewed everything up to 96fd507 in 2 minutes and 7 seconds
More details
- Looked at
229
lines of code in5
files - Skipped
0
files when reviewing. - Skipped posting
12
drafted comments based on config settings.
1. skyvern/fuse/sdk/api/files.py:174
- Draft comment:
Consider checking not only file existence but also that the file is fully downloaded (e.g. by verifying the absence of the BROWSER_DOWNLOADING_SUFFIX) before returning. This may reduce false positives if a partial file remains. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. skyvern/fuse/sdk/api/files.py:158
- Draft comment:
Good practice: Using os.walk() with a non-recursive flag—ensure that recursive behavior is clearly documented in list_files_in_directory. - Reason this comment was not posted:
Confidence changes required:20%
None
3. skyvern/webeye/actions/handler.py:488
- Draft comment:
In handle_click_to_download_file_action, the polling loop waits 1 second between checks. For improved responsiveness, consider reducing the sleep interval or checking file size stability. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
4. skyvern/webeye/actions/handler.py:525
- Draft comment:
Ensure that the warning log for long-time downloading files properly informs users; you might consider triggering a fallback mechanism if downloads exceed the max waiting time. - Reason this comment was not posted:
Confidence changes required:50%
None
5. skyvern/webeye/actions/handler.py:1405
- Draft comment:
The chain_click function is quite complex. Consider refactoring to modularize the different fallback strategies (label lookup, sibling, and parent approaches) to improve readability and maintainability. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
6. skyvern/webeye/actions/handler.py:2710
- Draft comment:
Methods that search for a tag within a given element (such as get_anchor_to_click) should handle cases when the target child element isn't found, possibly logging detailed context. - Reason this comment was not posted:
Confidence changes required:40%
None
7. skyvern/webeye/actions/handler.py:1380
- Draft comment:
Consider explicitly handling potential exceptions when verifying secret parameters in get_actual_value_of_parameter_if_secret to avoid leaking secret placeholders. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
8. skyvern/forge/sdk/api/files.py:174
- Draft comment:
In wait_for_download_finished(), the logic checks only for file existence to decide if a download is complete. This assumes the temporary download file (with BROWSER_DOWNLOADING_SUFFIX) disappears on completion. Please confirm that this reliably indicates full download completion across all supported browsers. - Reason this comment was not posted:
Comment did not seem useful.
9. skyvern/forge/sdk/api/files.py:162
- Draft comment:
In list_downloading_files_in_directory(), file extension matching uses 'path.suffix == downloading_suffix'. Consider whether this check is robust for all cases (e.g. if extra parameters exist in filename). A comment or additional validation might help. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment is speculative - it says "Consider whether" and doesn't point to any specific issue. Path.suffix is a standard Python pathlib method that reliably gets file extensions. The BROWSER_DOWNLOADING_SUFFIX is likely a controlled constant. The comment doesn't provide any concrete examples of problematic cases.
I could be wrong about the reliability of Path.suffix - there might be edge cases I'm not considering. The comment might be trying to prevent future issues.
While edge cases might exist, the comment is too speculative and doesn't provide actionable feedback. Path.suffix is a standard Python method that's generally reliable for this use case.
Delete this comment as it's speculative, not actionable, and doesn't point to a concrete issue. It violates the rule against speculative "if X then Y" comments.
10. skyvern/webeye/actions/handler.py:525
- Draft comment:
After waiting for the download files to complete using wait_for_download_finished(), a TimeoutError triggers a warning and the function continues by returning success. Please confirm that treating a timeout as a success (with warning) is the intended behavior. - Reason this comment was not posted:
Comment did not seem useful.
11. skyvern/webeye/actions/handler.py:488
- Draft comment:
The loop that checks for an increased file count (comparing list_files_before and list_files_after) is used to detect a new download. Consider potential race conditions if multiple downloads occur simultaneously or if existing files are modified. A brief comment documenting assumptions would improve clarity. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
12. skyvern/webeye/actions/handler.py:519
- Draft comment:
When logging the list of downloading files, ensure Path objects are converted to string representations for clearer logging output. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
Workflow ID: wflow_4fKEhVhZxwUggQTP
Want Ellipsis to fix these issues? Tag @ellipsis-dev
in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
@@ -158,6 +159,34 @@ def list_files_in_directory(directory: Path, recursive: bool = False) -> list[st | |||
return listed_files | |||
|
|||
|
|||
def list_downloading_files_in_directory( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be beneficial to add inline docstrings or type annotations for the newly introduced functions (like wait_for_download_finished and list_downloading_files_in_directory) to clarify their intended use and assumptions.
Important
Ensure files are fully downloaded before completing tasks by introducing download waiting mechanisms and handling long download times.
agent.py
andhandler.py
.wait_for_download_finished()
infiles.py
to wait for file downloads to complete.DownloadFileMaxWaitingTime
exception inexceptions.py
.list_downloading_files_in_directory()
andwait_for_download_finished()
infiles.py
.execute_step()
inagent.py
to wait for downloads.handle_click_to_download_file_action()
inhandler.py
to wait for downloads.BROWSER_DOWNLOADING_SUFFIX
inconstants.py
for identifying downloading files.agent.py
andhandler.py
.This description was created by for 96fd507. It will automatically update as commits are pushed.