Skip to content

Conversation

LawyZheng
Copy link
Collaborator

@LawyZheng LawyZheng commented Sep 2, 2025

Important

Enhance file download handling with browser session awareness and S3 support, updating several functions and classes for improved file management.

  • Behavior:
    • Enhance file download handling by integrating browser session awareness in agent.py and handler.py.
    • Support for listing and managing downloaded and downloading files in S3 in s3.py.
    • Modify DownloadFileMaxWaitingTime to accept list[str] instead of list[Path] in exceptions.py.
  • Functions:
    • Update execute_step() in agent.py to check and list files in browser sessions.
    • Add list_downloaded_files_in_browser_session() and list_downloading_files_in_browser_session() in base.py, local.py, and s3.py.
    • Modify download_file() in aws.py to handle S3 URIs and download files locally.
  • Misc:
    • Refactor list_downloading_files_in_directory() in files.py to return list[str].
    • Add delete_file() method in aws.py for S3 file deletion.
    • Update check_and_fix_state() in browser_factory.py to handle browser session pages.

This description was created by Ellipsis for 97568e4. You can customize this summary. It will automatically update as commits are pushed.



📁 This PR enhances browser session file management by adding comprehensive S3 file handling capabilities and session-specific file listing for downloaded files in browser automation workflows.

🔍 Detailed Analysis

Key Changes

  • Storage Interface Enhancement: Added abstract methods list_downloaded_files_in_browser_session() and list_downloading_files_in_browser_session() to the base storage class with implementations for both local and S3 storage
  • S3 File Operations: Implemented S3 file deletion capability and enhanced file handling to work with S3 URIs throughout the download process
  • Browser Session Integration: Modified agent execution and file download handlers to check and include files from browser session storage alongside local directory files
  • Type System Updates: Changed file path handling from Path objects to string paths to accommodate S3 URIs in the DownloadFileMaxWaitingTime exception

Technical Implementation

sequenceDiagram
    participant Agent as Agent/Handler
    participant Storage as Storage Layer
    participant S3 as AWS S3
    participant Local as Local Directory
    
    Agent->>Local: List local files
    Agent->>Storage: List browser session files
    Storage->>S3: Query S3 bucket for session files
    S3-->>Storage: Return S3 file URIs
    Storage-->>Agent: Return combined file list
    Agent->>Agent: Process downloads with S3 support
    Agent->>S3: Download S3 files to local directory
    Agent->>Storage: Handle file cleanup/deletion
Loading

Impact

  • Enhanced File Management: Browser sessions can now properly track and manage downloaded files stored in S3, providing better visibility and control over file operations
  • Improved Download Reliability: The system now handles both local and remote (S3) file downloads seamlessly, with proper waiting mechanisms for incomplete downloads
  • Better Session Persistence: Files downloaded during browser sessions are now accessible and manageable across different execution contexts, improving workflow continuity

Created with Palmier

<!-- ELLIPSIS_HIDDEN -->

> [!IMPORTANT]
> Enhance browser session file management by adding S3 file handling and session-specific file listing capabilities.
>
>   - **Behavior**:
>     - Add `need_date` parameter to `_watch_and_sync_directory()` in `browser_controller.py` to control date inclusion in S3 key.
>     - Handle file deletion in S3 when `Change.deleted` is detected in `browser_controller.py`.
>     - Update `start_artifact_sync()` in `browser_controller.py` to set `need_date=False`.
>   - **Session Management**:
>     - Add `browser_session_id` and `organization_id` parameters to `get_browser_state_from_network_info()` in `persistent_sessions_manager.py`.
>     - Add `browser_session_id` to various browser creation functions in `special_browsers.py`.
>   - **File Management**:
>     - Implement `list_downloaded_files_in_browser_session()` and `list_downloading_files_in_browser_session()` in `base.py`, `local.py`, and `s3.py`.
>     - Update `download_file()` and `wait_for_download_finished()` in `files.py` to handle S3 URIs.
>     - Add `delete_file()` method to `aws.py` for S3 file deletion.
>   - **Misc**:
>     - Change `DownloadFileMaxWaitingTime` exception to use `list[str]` instead of `list[Path]` in `exceptions.py`.
>     - Update `handle_click_to_download_file_action()` in `handler.py` to check files in browser session storage.
>
> <sup>This description was created by </sup>[<img alt="Ellipsis" src="https://img.shields.io/badge/Ellipsis-blue?color=175173">](https://www.ellipsis.dev?ref=Skyvern-AI%2Fskyvern-cloud&utm_source=github&utm_medium=referral)<sup> for 465124432b6e9e05df32437aacacd00cbe731378. You can [customize](https://app.ellipsis.dev/Skyvern-AI/settings/summaries) this summary. It will automatically update as commits are pushed.</sup>

<!-- ELLIPSIS_HIDDEN -->
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important

Looks good to me! 👍

Reviewed 97568e4 in 1 minute and 23 seconds. Click for details.
  • Reviewed 314 lines of code in 9 files
  • Skipped 0 files when reviewing.
  • Skipped posting 8 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. skyvern/forge/sdk/api/aws.py:117
  • Draft comment:
    Replace 'if storage_class not in S3StorageClass' with an explicit Enum type check (e.g. using isinstance or comparing storage_class.value) so the check works reliably.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. skyvern/forge/sdk/api/files.py:100
  • Draft comment:
    Consider refactoring the download_file function to separate HTTP request handling and file writing for improved readability and testability.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
3. skyvern/forge/sdk/artifact/storage/local.py:236
  • Draft comment:
    Abstract out repeated directory creation logic (e.g. _create_directories_if_not_exists) to reduce duplication.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
4. skyvern/forge/sdk/artifact/storage/s3.py:303
  • Draft comment:
    Ensure robust handling of S3 key formatting and presigned URL generation; add error-handling for edge cases in upload/download flows.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
5. skyvern/webeye/actions/handler.py:860
  • Draft comment:
    Action handler functions are deeply nested and catch broad exceptions; refactor into smaller helper functions and use more specific exception types for improved maintainability.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
6. skyvern/webeye/browser_factory.py:510
  • Draft comment:
    Subprocess.Popen usage to launch Chrome with a fixed sleep may conflict in containerized environments; make delays and port checks more robust and configurable.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
7. skyvern/forge/agent.py:526
  • Draft comment:
    Typo: In the warning message, "incompleted download file" should be updated to "incomplete download file" for clarity.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
8. skyvern/webeye/actions/handler.py:903
  • Draft comment:
    Typo: The comment text "check if there's any file is still downloading" is unclear. Consider rephrasing it to "check if there are any files still downloading".
  • Reason this comment was not posted:
    Comment was on unchanged code.

Workflow ID: wflow_AoGR7rVSKTfkoFjW

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important

Looks good to me! 👍

Reviewed everything up to 97568e4 in 1 minute and 23 seconds. Click for details.
  • Reviewed 314 lines of code in 9 files
  • Skipped 0 files when reviewing.
  • Skipped posting 9 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. skyvern/webeye/browser_factory.py:42
  • Draft comment:
    Consider abstracting the duplicate logic between _create_headless_chromium and _create_headful_chromium. Both functions share similar steps (e.g. setting up user data, download directories, and updating browser preferences) and could be refactored into a common helper to DRY up the code.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. skyvern/webeye/browser_factory.py:676
  • Draft comment:
    The navigate_to_url method uses a fixed sleep (5 seconds) after page.goto. Consider replacing this fixed delay with a dynamic wait (e.g., using page.wait_for_load_state('networkidle')) to make navigation more robust and reduce unnecessary waiting.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
3. skyvern/webeye/browser_factory.py:716
  • Draft comment:
    The get_working_page() method assumes the last page in browser_context.pages is the working page. This heuristic might be unsafe with multi-page scenarios. Consider a more robust selection mechanism or documenting this assumption clearly.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
4. skyvern/webeye/browser_factory.py:46
  • Draft comment:
    In set_browser_console_log, the console event listener is registered on the browser context but there is no removal logic when closing the context. This may lead to resource leaks if contexts are frequently recreated. Consider removing the listener on cleanup.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
5. skyvern/webeye/browser_factory.py:498
  • Draft comment:
    In _create_cdp_connection_browser, synchronous time.sleep(1) is used after starting Chrome via subprocess. Since this code runs in an async context, consider replacing time.sleep(1) with await asyncio.sleep(1) to avoid blocking the event loop.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
6. skyvern/webeye/browser_factory.py:240
  • Draft comment:
    The code relies on 'asyncio.timeout' for timeout handling. Ensure that the target Python runtime supports this (Python 3.11+) or provide a compatibility layer for earlier versions.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
7. skyvern/forge/agent.py:526
  • Draft comment:
    Typo: The log message uses 'incompleted'. Consider changing it to 'incomplete' for grammatical correctness.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
8. skyvern/forge/sdk/api/files.py:199
  • Draft comment:
    Typographical issue: The comment reads "check if there's any file is still downloading". Consider rephrasing it, e.g., "check if any file is still downloading", for clarity.
  • Reason this comment was not posted:
    Comment was on unchanged code.
9. skyvern/webeye/actions/handler.py:904
  • Draft comment:
    The comment 'check if there's any file is still downloading' is a bit unclear grammatically. Consider rephrasing it to something like 'check if any files are still downloading' for clarity.
  • Reason this comment was not posted:
    Comment was on unchanged code.

Workflow ID: wflow_miSJUM8K3aP3xMMU

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

@LawyZheng LawyZheng merged commit 275426e into main Sep 2, 2025
8 checks passed
@LawyZheng LawyZheng deleted the lawy/downloaded-file-debugger-accessible branch September 2, 2025 07:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant