
Conversation

@MODSetter (Owner) commented Apr 14, 2025

…t for upcoming SurfSense research agent

Summary by CodeRabbit

  • New Features

    • Launched an AI-assisted sub-section writer that enhances document analysis and answer generation with proper citation formatting.
    • Introduced a new module for custom graph definitions and a configuration class to streamline agent workflows.
    • Added asynchronous functions for fetching relevant documents and generating answers based on sub-section questions.
  • Bug Fixes

    • Improved type handling in search and connector functions to boost overall system reliability.
  • Chores

    • Updated project dependencies to support and stabilize the new workflow components.

vercel bot commented Apr 14, 2025

The latest updates on your projects.

Name Status Updated (UTC)
surf-sense-frontend ✅ Ready Apr 14, 2025 3:54am

coderabbitai bot commented Apr 14, 2025

Walkthrough

This update introduces a new submodule for the LangGraph Agent within the backend. A collection of files is added to define and execute a custom state graph workflow for writing sub-sections, including configuration, state handling, asynchronous nodes, and prompt definitions. Additionally, minor changes enforce type safety in routes, tasks, and connector services, and a LangGraph dependency has been added to the project.

Changes

File(s) and change summary:
  • surfsense_backend/app/agents/researcher/sub_section_writer/__init__.py, configuration.py, graph.py, nodes.py, prompts.py, state.py: New module added for the LangGraph Agent. Introduces a configuration class (Configuration), a state graph (graph) workflow with nodes for fetching documents and writing sub-sections, a custom prompt (citation_system_prompt), and a State class to manage runtime data.
  • surfsense_backend/app/routes/chats_routes.py: Modified to convert search_space_id explicitly to an integer before passing it to the search results function.
  • surfsense_backend/app/tasks/stream_connector_search_results.py: Updated the type of the user_id parameter from int to str in the function signature.
  • surfsense_backend/app/utils/connector_service.py: Changed the user_id parameter type from int to str in multiple methods and streamlined source processing by removing deduplication and appending directly to sources_list.
  • surfsense_backend/pyproject.toml: Added dependency "langgraph>=0.3.29".

Sequence Diagram(s)

sequenceDiagram
    participant Config as Configuration
    participant Graph as StateGraph
    participant FD as fetch_relevant_documents
    participant WS as write_sub_section
    participant State as ExecutionState

    Config->>Graph: Build workflow with configuration schema
    Graph->>FD: Invoke document fetching
    FD-->>State: Return fetched documents
    Graph->>WS: Invoke sub-section writing
    WS-->>State: Return final answer

Poem

I'm a rabbit, quick on my feet,
Hopping through code, ever so neat.
Fetching documents with a joyful leap,
Writing sub-sections, no error too deep.
In LangGraph fields, my code does prance –
A merry dance in a coding romance!
🐰✨

Tip

⚡💬 Agentic Chat (Pro Plan, General Availability)
  • We're introducing multi-step agentic chat in review comments and issue comments, within and outside of PRs. This feature enhances review and issue discussions with the CodeRabbit agentic chat by enabling advanced interactions, including the ability to create pull requests directly from comments and add commits to existing pull requests.

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0b93c9d and aaddd5c.

📒 Files selected for processing (1)
  • surfsense_backend/app/routes/chats_routes.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • surfsense_backend/app/routes/chats_routes.py

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai plan to trigger planning for file edits and PR creation.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

coderabbitai bot left a comment

Actionable comments posted: 1

🔭 Outside diff range comments (1)
surfsense_backend/app/utils/connector_service.py (1)

445-515: 🛠️ Refactor suggestion

Validate YouTube metadata fields and unique ID generation.

  1. Line 470 assigns self.source_id_counter to the chunk’s document ID. If concurrency is possible, consider a safer ID generation strategy.
  2. At lines 488–490, the code truncates the description to 100 characters. Confirm that the appended ellipsis (“...”) is consistent with the rest of your project’s user experience.
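For reference, a minimal sketch of an explicit truncation helper. Only the 100-character limit and the appended ellipsis come from the comment above; the function name and signature are illustrative:

```python
def truncate_description(description: str, limit: int = 100) -> str:
    """Truncate a long description, appending an ellipsis only when text was actually cut."""
    if len(description) <= limit:
        return description
    return description[:limit].rstrip() + "..."
```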
🧰 Tools
🪛 Ruff (0.8.2)

480-480: Local variable published_date is assigned to but never used

Remove assignment to unused variable published_date

(F841)

🧹 Nitpick comments (8)
surfsense_backend/app/agents/researcher/sub_section_writer/prompts.py (1)

1-2: Remove extraneous 'f' prefix from string.

The string is defined with an f-prefix but doesn't contain any placeholders.

-citation_system_prompt = f"""
+citation_system_prompt = """
🧰 Tools
🪛 Ruff (0.8.2)

1-82: f-string without any placeholders

Remove extraneous f prefix

(F541)

surfsense_backend/app/agents/researcher/sub_section_writer/nodes.py (2)

11-161: Consider stabilizing content deduplication and adding connector error handling.

  1. The built-in hash() used at lines 118–124 can vary between runs due to Python’s hash randomization. If consistency is required for deduplication across processes or sessions, consider using a stable hash function like md5 or sha256.
  2. When iterating over multiple connectors (lines 40–108), if a connector fails or is missing, the code silently proceeds. This might mask errors. Consider adding try/except blocks or skipping unavailable connectors with clear logging to improve resilience.
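As an illustration of the first point, a small sketch of deduplication keyed on a stable digest instead of the built-in hash(); the "content" key is an assumption about the document shape:

```python
import hashlib


def stable_content_key(content: str) -> str:
    # hashlib digests are deterministic across processes, unlike hash(),
    # which is randomized per interpreter run (PYTHONHASHSEED).
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def deduplicate_documents(documents: list[dict]) -> list[dict]:
    """Keep the first occurrence of each unique document body."""
    seen: set[str] = set()
    unique: list[dict] = []
    for doc in documents:
        key = stable_content_key(doc.get("content", ""))
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique
```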

164-245: Incorporate the sub-section title in the final LLM prompt and handle potential LLM failures.

  1. The code references configuration.sub_section_title (line 212) but it is commented out, resulting in no mention of the actual title in lines 217–229. If the sub-section title is important, reintroduce it into the prompt to provide full context to the LLM.
  2. At lines 238–239, there is no exception handling if the LLM invocation fails. Consider wrapping this call in a try/except block to handle potential network errors or LLM unavailability gracefully.
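A sketch of what the defensive call could look like, assuming a LangChain-style chat model with an async ainvoke interface; the helper name, message layout, and fallback text are illustrative, not the actual node implementation:

```python
import logging

from langchain_core.messages import HumanMessage, SystemMessage

logger = logging.getLogger(__name__)


async def write_sub_section_safely(llm, system_prompt: str, sub_section_title: str, question: str) -> str:
    # Reintroduce the sub-section title so the LLM sees the full context.
    messages = [
        SystemMessage(content=system_prompt),
        HumanMessage(content=f"Sub-section title: {sub_section_title}\n\n{question}"),
    ]
    try:
        response = await llm.ainvoke(messages)
        return response.content
    except Exception:
        # Network errors or provider outages should not crash the whole graph run.
        logger.exception("LLM invocation failed for sub-section %r", sub_section_title)
        return "Unable to generate this sub-section right now."
```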
surfsense_backend/app/utils/connector_service.py (5)

16-60: Evaluate concurrency risks with source_id_counter and improve logging for missing connectors.

  1. The method search_crawled_urls (lines 16–60) increments self.source_id_counter at line 48 in a simple loop. If this service is used concurrently, you risk race conditions. Consider making the counter thread-safe or generating IDs differently.
  2. No fallback or error handling is present if no results are found (returns an empty list). Logging or messages to guide the caller might be beneficial.
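One way to address the counter concern, sketched against a simplified stand-in for the service (class and method names are assumptions):

```python
import itertools


class SourceIdAllocator:
    """Hands out monotonically increasing source IDs without read-modify-write races."""

    def __init__(self, start: int = 1) -> None:
        # A single generator object replaces the `counter += 1` pattern; next() on
        # itertools.count is atomic under CPython's GIL and is also safe in
        # single-threaded asyncio code that awaits between allocations.
        self._ids = itertools.count(start=start)

    def next_id(self) -> int:
        return next(self._ids)
```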

61-105: Check concurrency for source_id_counter and maintain consistent behavior with file searches.

Just like in search_crawled_urls, source_id_counter is incremented at line 93. If concurrency is possible, address potential race conditions. If concurrency is not intended, consider documenting that limitation.


126-218: Guard against missing or malformed Tavily configuration and concurrency on source IDs.

  1. Line 139 fetches a connector but handles an empty result by returning an empty list. Consider logging that no Tavily connector was found to help debug misconfiguration.
  2. As with other methods, self.source_id_counter increments at line 197 in a loop; concurrency might create duplicate IDs.

219-282: Improve Slack-specific metadata usage and concurrency caution.

  1. The Slack metadata logic (lines 243–268) is helpful but consider adding fallback logs if the needed fields are missing (e.g., channel_name).
  2. The shared concurrency concern with source_id_counter (line 270) remains. If these methods are called in parallel, you might encounter duplicates.

283-354: Enhance code clarity for Notion page retrieval and concurrency handling.

  1. Lines 320–323 build a title from the Notion page’s metadata. Consider logging or skipping if critical metadata is missing.
  2. Incrementing self.source_id_counter at line 342 can create ID collisions if multiple coroutines access the same connector.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fa5dbb7 and 0b93c9d.

⛔ Files ignored due to path filters (1)
  • surfsense_backend/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
  • surfsense_backend/app/agents/researcher/sub_section_writer/__init__.py (1 hunks)
  • surfsense_backend/app/agents/researcher/sub_section_writer/configuration.py (1 hunks)
  • surfsense_backend/app/agents/researcher/sub_section_writer/graph.py (1 hunks)
  • surfsense_backend/app/agents/researcher/sub_section_writer/nodes.py (1 hunks)
  • surfsense_backend/app/agents/researcher/sub_section_writer/prompts.py (1 hunks)
  • surfsense_backend/app/agents/researcher/sub_section_writer/state.py (1 hunks)
  • surfsense_backend/app/routes/chats_routes.py (1 hunks)
  • surfsense_backend/app/tasks/stream_connector_search_results.py (1 hunks)
  • surfsense_backend/app/utils/connector_service.py (21 hunks)
  • surfsense_backend/pyproject.toml (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
surfsense_backend/app/agents/researcher/sub_section_writer/graph.py (3)
surfsense_backend/app/agents/researcher/sub_section_writer/state.py (1)
  • State (10-22)
surfsense_backend/app/agents/researcher/sub_section_writer/nodes.py (2)
  • fetch_relevant_documents (11-160)
  • write_sub_section (164-243)
surfsense_backend/app/agents/researcher/sub_section_writer/configuration.py (1)
  • Configuration (12-31)
surfsense_backend/app/agents/researcher/sub_section_writer/nodes.py (4)
surfsense_backend/app/agents/researcher/sub_section_writer/configuration.py (2)
  • Configuration (12-31)
  • from_runnable_config (25-31)
surfsense_backend/app/agents/researcher/sub_section_writer/state.py (1)
  • State (10-22)
surfsense_backend/app/utils/connector_service.py (6)
  • ConnectorService (10-515)
  • search_crawled_urls (16-59)
  • search_files (61-104)
  • search_tavily (126-217)
  • search_slack (219-281)
  • search_notion (283-353)
surfsense_backend/app/utils/reranker_service.py (2)
  • RerankerService (5-95)
  • rerank_documents (19-80)
🪛 Ruff (0.8.2)
surfsense_backend/app/agents/researcher/sub_section_writer/prompts.py

1-82: f-string without any placeholders

Remove extraneous f prefix

(F541)

🔇 Additional comments (10)
surfsense_backend/app/tasks/stream_connector_search_results.py (1)

17-17: Type change from int to str looks good

The parameter type change for user_id from int to str aligns with similar changes in the connector service. This change maintains type consistency across the application.

surfsense_backend/pyproject.toml (1)

16-16: Dependency addition for LangGraph looks good

The addition of langgraph dependency with a minimum version constraint is appropriate for supporting the new sub_section_writer agent functionality.

surfsense_backend/app/agents/researcher/sub_section_writer/__init__.py (1)

1-8: Well-structured module initialization

The new module file follows Python best practices with a clear docstring, appropriate imports, and explicit public API definition through __all__.
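The initializer is presumably along these lines; the exact exports are an assumption based on the file list and names elsewhere in this review:

```python
"""Sub-section writer agent for the SurfSense researcher, built on LangGraph."""

from .configuration import Configuration
from .graph import graph
from .state import State

__all__ = ["Configuration", "graph", "State"]
```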

surfsense_backend/app/agents/researcher/sub_section_writer/graph.py (1)

1-23: Well-structured graph definition for the sub-section writer workflow.

The implementation follows a clean pattern for defining a LangGraph workflow with a sequential flow:

  1. Start → Fetch relevant documents
  2. Fetch relevant documents → Write sub-section
  3. Write sub-section → End

This straightforward structure makes the workflow easy to understand and maintain.
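For readers unfamiliar with LangGraph, a minimal sketch of how such a two-node sequential workflow is typically wired; the relative imports mirror the files listed above, and the node names are assumptions:

```python
from langgraph.graph import END, START, StateGraph

from .configuration import Configuration
from .nodes import fetch_relevant_documents, write_sub_section
from .state import State

# START -> fetch_relevant_documents -> write_sub_section -> END
workflow = StateGraph(State, config_schema=Configuration)
workflow.add_node("fetch_relevant_documents", fetch_relevant_documents)
workflow.add_node("write_sub_section", write_sub_section)
workflow.add_edge(START, "fetch_relevant_documents")
workflow.add_edge("fetch_relevant_documents", "write_sub_section")
workflow.add_edge("write_sub_section", END)

graph = workflow.compile()
```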

surfsense_backend/app/agents/researcher/sub_section_writer/prompts.py (1)

2-82: Comprehensive citation system prompt that addresses the PR's citation issues.

The prompt provides detailed instructions for IEEE citation format usage, including:

  • Clear steps for analyzing documents and extracting information
  • Specific citation format requirements (square brackets with source_id)
  • Input/output examples demonstrating proper citation usage
  • Common incorrect citation formats to avoid

This well-structured prompt should effectively resolve the citation issues mentioned in the PR objectives.

🧰 Tools
🪛 Ruff (0.8.2)

1-82: f-string without any placeholders

Remove extraneous f prefix

(F541)

surfsense_backend/app/agents/researcher/sub_section_writer/state.py (1)

9-22: Clean state implementation with proper typing.

The State class properly defines:

  1. A runtime context with the database session
  2. Output fields to store the results from each node in the workflow

The types are well-defined, and the optional fields have appropriate default values of None.
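An approximate shape of such a state object, assuming a SQLAlchemy AsyncSession as the runtime context; the field names are illustrative:

```python
from dataclasses import dataclass
from typing import Any, Optional

from sqlalchemy.ext.asyncio import AsyncSession


@dataclass
class State:
    """Shared state passed between the workflow nodes."""

    db_session: AsyncSession                         # runtime context used by the nodes
    relevant_documents: Optional[list[Any]] = None   # filled by fetch_relevant_documents
    final_answer: Optional[str] = None               # filled by write_sub_section
```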

surfsense_backend/app/agents/researcher/sub_section_writer/configuration.py (2)

11-21: Well-defined configuration with appropriate types and defaults.

The Configuration class effectively captures all the input parameters needed for the sub-section writer agent. Using kw_only=True on the dataclass ensures a clean instantiation pattern.


24-31: Robust configuration factory method with proper field filtering.

The from_runnable_config class method safely:

  1. Handles the case where config might be None
  2. Extracts only the fields that exist in the Configuration class
  3. Returns a properly constructed instance

This approach prevents errors when passing in configurations with extra fields.
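A sketch of that pattern, assuming a LangChain RunnableConfig with a "configurable" mapping; the concrete fields shown here are placeholders rather than the real configuration:

```python
from dataclasses import dataclass, fields
from typing import Optional

from langchain_core.runnables import RunnableConfig


@dataclass(kw_only=True)
class Configuration:
    # Placeholder fields; the real class carries the sub-section writer inputs.
    sub_section_title: str = ""
    user_id: str = ""

    @classmethod
    def from_runnable_config(cls, config: Optional[RunnableConfig] = None) -> "Configuration":
        # Tolerate a missing config and drop any keys that are not Configuration fields.
        configurable = (config or {}).get("configurable", {})
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in configurable.items() if k in known})
```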

surfsense_backend/app/utils/connector_service.py (2)

106-125: Ensure connector is properly typed and validated.

The signature change to accept user_id: str (line 106) may expose potential mismatches if existing code still passes an integer. Verify that all call sites use a string user ID. Additionally, consider logging when a connector is not found for troubleshooting.


355-444: Check extension-specific metadata reliability.

  1. At lines 393–424, parsing extension data includes date/time handling and visit duration logic. Ensure each key is reliably present, or handle missing fields gracefully.
  2. source_id_counter concurrency concerns apply at line 432.
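A defensive-parsing sketch for the first point; the key names and formats here are hypothetical, not the actual extension schema:

```python
import logging
from datetime import datetime
from typing import Optional

logger = logging.getLogger(__name__)


def parse_visit_metadata(metadata: dict) -> tuple[Optional[datetime], Optional[float]]:
    """Return (visited_at, visit_duration_seconds), tolerating missing or malformed fields."""
    visited_at: Optional[datetime] = None
    duration_s: Optional[float] = None

    raw_time = metadata.get("visited_at")             # hypothetical key
    if raw_time:
        try:
            visited_at = datetime.fromisoformat(raw_time)
        except (TypeError, ValueError):
            logger.warning("Unparseable visit timestamp: %r", raw_time)

    raw_duration = metadata.get("visit_duration_ms")  # hypothetical key
    if raw_duration is not None:
        try:
            duration_s = float(raw_duration) / 1000.0
        except (TypeError, ValueError):
            logger.warning("Unparseable visit duration: %r", raw_duration)

    return visited_at, duration_s
```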
🧰 Tools
🪛 Ruff (0.8.2)

391-391: Local variable browsing_session_id is assigned to but never used

Remove assignment to unused variable browsing_session_id

(F841)


401-401: Do not use bare except

(E722)


421-421: Do not use bare except

(E722)

@MODSetter MODSetter merged commit afe7ed4 into main Apr 14, 2025
3 checks passed
AbdullahAlMousawi pushed a commit to AbdullahAlMousawi/SurfSense that referenced this pull request Jul 14, 2025
Fixed current agent citation issues and added sub_section_writer agen…
CREDO23 pushed a commit to CREDO23/SurfSense that referenced this pull request Jul 25, 2025
Fixed current agent citation issues and added sub_section_writer agen…
@coderabbitai coderabbitai bot mentioned this pull request Sep 10, 2025
