Skip to content

Conversation

@avelino
Copy link
Member

@avelino avelino commented Apr 8, 2025

This commit adds support for full-text search (FTS) in ChronDB using PostgreSQL's to_tsquery syntax. The implementation includes:

  • Support for @@ to_tsquery syntax in WHERE clauses
  • Integration with Lucene index for FTS operations
  • Mapping of PostgreSQL text search operators to Lucene queries
  • Proper handling of FTS conditions in query execution
  • Support for multiple FTS conditions with AND/OR operators

The changes allow users to perform full-text searches using familiar PostgreSQL syntax while leveraging Lucene's powerful search capabilities under the hood.

Key files modified:

  • src/chrondb/api/sql/parser/clauses.clj: Added FTS condition parsing
  • src/chrondb/api/sql/execution/query.clj: Added FTS query handling
  • src/chrondb/index/lucene.clj: Enhanced Lucene index for FTS
  • src/chrondb/index/protocol.clj: Updated index protocol for FTS

Summary by CodeRabbit

  • New Features

    • Integrated full-text search capabilities with accent normalization, smart wildcard adjustments, and improved filtering.
    • Enhanced Git storage management for branch operations and concurrent document handling.
  • Enhancements

    • Upgraded SQL query parsing and error handling for more reliable document retrieval.
    • Improved server startup flexibility with dynamic configuration and robust indexing.
    • Refined logging throughout the system to provide better operational transparency.
  • Documentation

    • Added examples for full-text search functionality using to_tsquery in the documentation.
  • Tests

    • Expanded test coverage ensures higher system stability and reliability, including new tests for memory index functionalities and Git storage operations.

This commit adds support for full-text search (FTS) in ChronDB using PostgreSQL's
to_tsquery syntax. The implementation includes:

- Support for @@ to_tsquery syntax in WHERE clauses
- Integration with Lucene index for FTS operations
- Mapping of PostgreSQL text search operators to Lucene queries
- Proper handling of FTS conditions in query execution
- Support for multiple FTS conditions with AND/OR operators

The changes allow users to perform full-text searches using familiar PostgreSQL
syntax while leveraging Lucene's powerful search capabilities under the hood.

Key files modified:
- src/chrondb/api/sql/parser/clauses.clj: Added FTS condition parsing
- src/chrondb/api/sql/execution/query.clj: Added FTS query handling
- src/chrondb/index/lucene.clj: Enhanced Lucene index for FTS
- src/chrondb/index/protocol.clj: Updated index protocol for FTS

Signed-off-by: Avelino <[email protected]>
@avelino avelino added the :enhancement New feature or request label Apr 8, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Apr 8, 2025

Caution

Review failed

The pull request is closed.

📝 Walkthrough

Walkthrough

This pull request integrates full-text search (FTS) support into the SQL execution module by introducing a function to detect FTS conditions and enhancing query handling logic. It updates the SQL parser, server startup, and API search functions to accommodate an additional index parameter and normalization of search terms. Furthermore, it overhauls the Lucene and memory index implementations to improve logging, error handling, and functionality, while also adding comprehensive tests for these components along with new tests for Git storage operations.

Changes

Files Change Summary
src/chrondb/api/sql/execution/query.clj Added fts-condition? to detect FTS conditions; updated handle-select, handle-select-case, and handle-query to accept an additional index parameter and process FTS-specific logic.
src/chrondb/api/sql/parser/clauses.clj, src/chrondb/api/sql/parser/statements.clj Enhanced parsing logic for WHERE clauses and table specifications with added logging, token checks, and normalization of input (e.g., removal of extraneous characters).
src/chrondb/api/sql/server.clj, src/chrondb/core.clj Modified server startup to handle an index-or-port parameter; replaced the memory index with a Lucene index and added logging for index creation.
src/chrondb/api/v1.clj, src/chrondb/api/v1/routes.clj, src/chrondb/example.clj Updated search functions to call index/search with additional fixed parameters (such as "name", "content", or "main") to alter the search context.
src/chrondb/index/lucene.clj, src/chrondb/index/memory.clj, src/chrondb/index/protocol.clj Overhauled indexing modules: renamed and updated document creation in Lucene, enhanced parameter signatures for searches (adding field and branch), improved logging and error handling, and added a new close method in the index protocol.
test/chrondb/api/sql/execution/query_test.clj, test/chrondb/api/sql/sql_query_test.clj Adjusted tests to pass nil for the second argument to handle-select, reflecting changes in the function’s signature.
test/chrondb/index/lucene_test.clj, test/chrondb/index/memory_test.clj Added new tests for Lucene and memory index functionalities covering full-text search capabilities, custom field handling, error conditions, and reader availability.
test/chrondb/storage/git_test.clj Introduced a new open-repo function using FileRepositoryBuilder and added extensive tests for Git storage operations including branch, query, file, concurrency, and repository setup.

Possibly related PRs

  • feat: Add full-text search support via PostgreSQL to_tsquery syntax #23: The changes in the main PR are directly related to those in the retrieved PR, as both introduce and enhance full-text search capabilities, specifically through the implementation of the fts-condition? function and modifications to the handle-select function in the same file.
  • feat: add SQL schema support for multi-tenant data isolation #21: The changes in the main PR, which introduce full-text search capabilities and modify the handle-select function, are related to the retrieved PR as both involve significant modifications to the src/chrondb/api/sql/execution/query.clj file, specifically altering the handle-select function to accommodate new parameters and functionalities.

Suggested labels

bug

Poem

Oh, I’m a happy little rabbit,
Hopping through the fields of code,
Searching with full-text magic,
Where indexes lighten the load.
With every bug bounced away,
I nibble on changes all day!
🥕🐇


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ab18ea7 and 0bbb031.

📒 Files selected for processing (2)
  • docs/SUMMARY.md (1 hunks)
  • docs/examples-postgresql.md (1 hunks)

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai plan to trigger planning for file edits and PR creation.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

avelino added 2 commits April 8, 2025 14:11
This commit adds support for PostgreSQL-style to_tsquery syntax in SQL WHERE clauses,
allowing users to write queries like:

  SELECT * FROM documents WHERE content @@ to_tsquery('query')

The implementation:
- Parses to_tsquery expressions in WHERE clauses
- Maps them to Lucene full-text search
- Maintains compatibility with existing FTS_MATCH syntax
- Updates API endpoints to use consistent field/branch parameters

This change improves SQL compatibility and provides a more familiar syntax
for PostgreSQL users.

fix: lint issues in clauses.clj

Signed-off-by: Avelino <[email protected]>
This commit updates the Lucene index tests to expect document IDs instead of full documents in search results. This change aligns with the index protocol specification and makes the tests more focused on the core indexing functionality.

The changes include:
- Modifying test assertions to check for document IDs instead of full documents
- Updating test cases for basic operations, document updates, and nil value handling
- Maintaining test coverage while improving test clarity

This refactoring helps ensure consistent behavior across different index implementations and makes the tests more maintainable.

Signed-off-by: Avelino <[email protected]>
@avelino avelino changed the title WIP: feat: Add full-text search support via PostgreSQL to_tsquery syntax feat: Add full-text search support via PostgreSQL to_tsquery syntax Apr 8, 2025
avelino added 3 commits April 8, 2025 14:28
This commit adds support for PostgreSQL's to_tsquery syntax in the Lucene index implementation. The changes include:

- Added to_tsquery parsing and conversion to Lucene query syntax
- Updated memory index tests to use consistent formatting
- Added test cases for to_tsquery functionality
- Improved error handling for invalid query syntax

The implementation allows users to use PostgreSQL-style text search queries while maintaining compatibility with existing Lucene search functionality.

Signed-off-by: Avelino <[email protected]>
The commit simplifies the test case for git commit errors by using the `_` wildcard for unused arguments instead of `args`. This makes the code more concise and follows Clojure conventions for ignoring unused parameters.

Signed-off-by: Avelino <[email protected]>
This commit adds comprehensive documentation for ChronDB's full-text search
capabilities using PostgreSQL's to_tsquery syntax. The documentation covers:

- Basic FTS syntax and examples
- How the search works internally
- Field optimization with _fts suffix
- Comparison with FTS_MATCH function

The documentation is added to examples-postgresql.md and linked in SUMMARY.md.

Signed-off-by: Avelino <[email protected]>
@avelino avelino merged commit f8daf8d into main Apr 8, 2025
8 checks passed
@avelino avelino deleted the avelino/sql-lucine-index-to_tsquery branch April 8, 2025 17:35
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (11)
src/chrondb/example.clj (1)

13-13: Improved code readability by removing extraneous blank lines

Removal of unnecessary blank lines improves code readability without affecting functionality.

Also applies to: 38-38, 48-48, 53-53, 57-57, 60-60, 65-65, 67-67

src/chrondb/api/v1/routes.clj (3)

68-71: The TODO comments should be considered for implementation

The TODO comments indicate plans to allow specifying search fields and branches via query parameters, which would make the API more flexible. Consider implementing these features as they align well with the full-text search capabilities being added.

Also, I notice the comments on lines 70-71 are in Portuguese while the rest of the codebase is in English. For consistency, consider translating these comments to English.


72-73: Fix indentation in the response call

There's a slight indentation issue in the response/response call that should be fixed for code readability and consistency.

  (response/response
-     {:results (index/search index default-field query default-branch)})))
+   {:results (index/search index default-field query default-branch)})))

61-73: Update function documentation to reflect new parameters

The function docstring doesn't mention the new parameters being passed to the index/search function. Consider updating it to mention the default field and branch being used, as this is important for users to understand the search behavior.

(defn handle-search
  "Handles document search requests.
   Parameters:
   - index: The index implementation
   - query: The search query string
+  
+  Internally, the function uses 'content' as the default search field and 'main' as the default branch.
   Returns: HTTP response with search results"
  [index query]
src/chrondb/api/sql/parser/statements.clj (1)

35-42: Consider moving table spec cleaning to a helper function

The table specification cleaning logic is duplicated across all four SQL statement parsers. To follow the DRY principle, consider extracting this logic into a helper function.

+ (defn- clean-table-spec
+   "Cleans a table specification by removing trailing semicolons and spaces.
+    Returns [schema table-name] tuple."
+   [raw-table-spec]
+   (when raw-table-spec
+     (let [cleaned-spec (-> raw-table-spec
+                            (str/replace #";$" "")
+                            (str/trim))
+           parts (str/split cleaned-spec #"\.")]
+       (if (= (count parts) 2)
+         [(first parts) (str/trim (second parts))]
+         [nil cleaned-spec]))))

  ;; Then in each parser:
  (let [raw-table-spec (when (and from-index (> from-index 0)) ...)
-       [schema table-name] (when raw-table-spec
-                            (let [cleaned-spec (-> raw-table-spec
-                                                   (str/replace #";$" "")
-                                                   (str/trim))
-                                  parts (str/split cleaned-spec #"\.")]
-                              (if (= (count parts) 2)
-                                [(first parts) (str/trim (second parts))]
-                                [nil cleaned-spec])))
+       [schema table-name] (clean-table-spec raw-table-spec)
        ...]

Also applies to: 71-78, 125-132, 176-183

src/chrondb/api/sql/server.clj (1)

10-11: Ensure the docstring clarifies argument types.
While the docstring mentions index-or-port, consider specifying that if it’s a numeric value, it’s treated as the port; otherwise, it’s an index implementation. This helps clarify function usage for new developers.

-  - index-or-port: Either an index implementation OR a port number (default: 5432)
+  - index-or-port: Either an index implementation (Lucene or custom) OR a port number (default: 5432)
src/chrondb/api/sql/parser/clauses.clj (1)

74-145: Robust FTS parsing with fallback warnings.

  • Parsing logic for FTS_MATCH(...) and field @@ to_tsquery(...) is well-structured, including logs for debugging.
  • Consider explicitly handling OR conditions in the future, as they’re currently skipped or assumed as AND.
src/chrondb/index/lucene.clj (4)

15-41: Be cautious about double-appending _fts.
When the field name itself already ends with _fts, the code may generate a second _fts suffix, resulting in verbose fields like title_fts_fts. If that’s unintended, you could skip adding _fts a second time.

Consider applying a safeguard like:

- (when (or (= field-name "name")
-           (= field-name "description")
-           (= field-name "content")
-           (= field-name "text")
-           (= field-name "location")
-           (str/ends-with? field-name "_fts"))
+ (when (or (and (#{ "name" "description" "content" "text" "location"} field-name))
+           (and (str/ends-with? field-name "_fts")
+                (not (str/ends-with? field-name "_fts_fts"))))

43-49: Remove or document the forgotten doc->map function.
Since this function is commented out, consider removing it entirely or adding comments explaining future use cases to reduce accidental clutter.


60-65: Check concurrency implications for reader-atom and searcher-atom.
Multiple threads might update these atoms simultaneously, risking transient read/write issues. Evaluate if additional synchronization or a single-threaded index writer policy is necessary.


92-152: Leading wildcard usage may cause performance challenges.
Allowing leading wildcards and adding * for short queries can significantly expand search complexity on large datasets. Consider restricting or optimizing such searches to avoid heavy performance overhead.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d55e504 and ab18ea7.

📒 Files selected for processing (16)
  • src/chrondb/api/sql/execution/query.clj (7 hunks)
  • src/chrondb/api/sql/parser/clauses.clj (2 hunks)
  • src/chrondb/api/sql/parser/statements.clj (4 hunks)
  • src/chrondb/api/sql/server.clj (1 hunks)
  • src/chrondb/api/v1.clj (1 hunks)
  • src/chrondb/api/v1/routes.clj (2 hunks)
  • src/chrondb/core.clj (2 hunks)
  • src/chrondb/example.clj (4 hunks)
  • src/chrondb/index/lucene.clj (1 hunks)
  • src/chrondb/index/memory.clj (2 hunks)
  • src/chrondb/index/protocol.clj (1 hunks)
  • test/chrondb/api/sql/execution/query_test.clj (12 hunks)
  • test/chrondb/api/sql/sql_query_test.clj (1 hunks)
  • test/chrondb/index/lucene_test.clj (2 hunks)
  • test/chrondb/index/memory_test.clj (1 hunks)
  • test/chrondb/storage/git_test.clj (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: test
  • GitHub Check: test
🔇 Additional comments (51)
src/chrondb/api/v1.clj (1)

25-27:

Details

✅ Verification successful

Updated search function signature to support full-text search

The handle-search function has been modified to call index/search with additional parameters, specifying the field to search ("name") and the branch to search in ("main"). This change aligns with the PR's objective of implementing full-text search using PostgreSQL's to_tsquery syntax.

Let's verify that this change is consistent with other search function implementations:


🏁 Script executed:

#!/bin/bash
# Searching for other index/search calls to ensure consistent parameter usage
rg -A 2 "index/search" --glob "!src/chrondb/api/v1.clj" --glob "!src/chrondb/example.clj"

Length of output: 5070


Verified: Consistent Full-Text Search Implementation

The handle-search function in src/chrondb/api/v1.clj now calls (index/search index "name" query "main"), which is fully consistent with other usages throughout the codebase (e.g., in test/chrondb/index/memory_test.clj and test/chrondb/index/lucene_test.clj). This modification meets the PR’s goal of leveraging PostgreSQL's to_tsquery syntax for full-text search.

  • Confirmed that similar search calls across tests use the fields "name" and branch "main".
  • No discrepancies were found in parameter consistency with other implementations.
  • The revised search signature aligns with the intended functionality and coding standards.
test/chrondb/api/sql/sql_query_test.clj (1)

106-106: Modified handle-select function call to support the FTS index parameter

The change from passing parsed to nil as the second argument in query/handle-select adapts the test to the updated function signature, which now likely accepts an optional index parameter for full-text search support.

src/chrondb/example.clj (1)

26-27: Updated search-users implementation to support full-text search

Similar to the change in src/chrondb/api/v1.clj, the index/search call now includes field and branch parameters. This ensures consistent implementation of the full-text search capability across the codebase.

test/chrondb/api/sql/execution/query_test.clj (1)

92-92:

Details

✅ Verification successful

Consistently updated all handle-select calls to support FTS capability

All calls to query/handle-select have been updated to pass nil as the second argument instead of the query variable. This consistent change across all test cases adapts the tests to the updated function signature, which now likely accepts an optional index parameter for full-text search functionality.

Let's verify the updated function signature of handle-select:

Also applies to: 105-105, 118-118, 131-131, 145-145, 157-157, 171-171, 199-199, 219-219, 236-236, 247-247, 258-258


🏁 Script executed:

#!/bin/bash
# Search for the handle-select function definition to confirm the new signature
rg -A 5 "defn handle-select" --glob "src/chrondb/api/sql/execution/*.clj"

Length of output: 1053


FTS Update Verification Complete: Consistent Parameter Usage Confirmed
All instances of query/handle-select have been updated to pass nil as the second argument, matching the newly defined function signature in src/chrondb/api/sql/execution/query.clj which now accepts an optional index parameter for full-text search (FTS). This update is correctly applied in all affected test cases (lines 92, 105, 118, 131, 145, 157, 171, 199, 219, 236, 247, 258).

src/chrondb/core.clj (2)

103-105: LGTM: Improved logging for index creation

Adding logging statements for the Lucene index creation process improves observability of the application startup sequence.


123-123: LGTM: Updated SQL server initialization to include index

Properly passing the index to the SQL server is essential for enabling the new full-text search capabilities.

src/chrondb/index/memory.clj (2)

9-18: Use of deprecated code marker for document-matches? function

Good practice to mark the unused function with #_ rather than removing it entirely, as it documents the evolution of the codebase and might be useful for reference.


32-32: Appropriate logging for MemoryIndex search

The logging statement clearly indicates that this implementation uses basic string matching rather than full-text search, which is important for users to understand the limitations of the memory index implementation.

src/chrondb/api/sql/parser/statements.clj (2)

36-39: Improved SQL parsing robustness

The addition of code to clean table specifications by removing trailing semicolons and trimming whitespace is a good improvement to make the SQL parser more robust. This change handles common syntax variations that users might include in their queries.


72-75: Consistent cleanup logic across all SQL statement parsers

I notice the same cleanup logic has been consistently applied across SELECT, INSERT, UPDATE, and DELETE statements. This ensures uniform handling of table specifications throughout the parser, which is a good practice.

Also applies to: 126-129, 177-180

src/chrondb/api/sql/server.clj (2)

4-4: Dependency update looks good.
Requiring [chrondb.index.lucene :as lucene-index] aligns with the new Lucene-based indexing feature, removing the dependency on any memory-based index.


14-22: Constructor logic is clear.
The fallback to lucene-index/create-lucene-index when index is nil is straightforward, ensuring the server starts gracefully on the specified port. No issues found.

test/chrondb/storage/git_test.clj (7)

46-52: Repository opener appears correct.
The open-repo function properly constructs and builds a repository from the provided path using FileRepositoryBuilder. Good job handling environment and Git directory discovery.


120-140: Branch operations test coverage.
Verifying that documents differ across branches is important for multi-branch scenarios. These tests provide good coverage of separate states in “main” and “dev.”


141-176: Query operations tests look comprehensive.
Testing prefix and table-based queries, plus multi-branch queries, ensures robust validation of Git-based storage retrieval logic.


177-190: File operation error handling.
Forcing commit errors and testing non-existent documents verify edge-case behaviors. Nicely done.


191-211: Concurrent operations handled effectively.
Gradual delays and verifying final saved documents validate concurrency mechanics. Code looks good.


212-221: Existing directory setup test.
Ensuring Git storage starts even if the directory already exists is a valuable scenario. Nice coverage.


222-236: Document path generation coverage.
These tests confirm flexible ID patterns and prefix retrieval. This helps ensure consistent pathing in Git storage.

src/chrondb/index/protocol.clj (1)

2-3: Protocol documentation improvements.
Simplified descriptions are concise, and adding close method is a welcome addition for resource management. No concerns.

Also applies to: 7-11

src/chrondb/api/sql/parser/clauses.clj (2)

5-6: Logging and utility references look good.
The addition of [chrondb.util.logging :as log] is consistent with the rest of the codebase and should help with debugging.


64-69: Docstring correctly outlines new FTS parsing.
Highlighting recognition of FTS_MATCH and @@ to_tsquery clarifies the function scope.

test/chrondb/index/lucene_test.clj (13)

27-29: LGTM: Properly adjusted the index/search call with new parameters.


34-35: LGTM: Consistent update to include the field and branch for searching.


37-38: LGTM: Correct deletion flow followed by a fresh search to confirm removal.


46-48: LGTM: Updated the search call in the test to reflect the new function signature.


53-55: LGTM: Searching by age field now includes the branch argument, which is appropriate.


60-61: LGTM: Verifies the engine handles empty search strings properly.


66-67: LGTM: Checks handling of non-existent terms, ensuring the search returns an empty result.


69-83: Comprehensive accent-handling test.
This test effectively checks diacritic folding and ensures accented and unaccented queries align.


84-103: Thorough coverage for wildcard and prefix searches.
This suite adequately tests infix and prefix matching scenarios.


104-120: Good job verifying FTS-optimized fields.
Ensures that custom fields tagged for full-text search are correctly indexed and retrieved.


122-135: Robust error handling tests.
Checking writer-closed scenarios helps confirm stability under failure conditions.


137-141: Verifies invalid directory handling.
Helps ensure the index creation gracefully returns nil without crashing.


143-150: Tests behavior with a nil index reader.
This scenario is critical to confirm the search operation fails gracefully.

test/chrondb/index/memory_test.clj (7)

1-5: Namespace and dependencies initialization look correct.
Sets up the test environment properly with references to memory index protocols.


6-11: Fixture creation is straightforward and well-structured.
Ensures each test runs against a fresh memory index instance.


14-20: Verifies successful indexing.
Checks both the returned result and storage data structure for consistency.


21-31: Tests removal of an indexed document.
Appropriately confirms the document can no longer be retrieved post-deletion.


32-48: Ensures field-based search logic works with multiple documents.
Validates the count and content of returned results.


49-62: Good coverage of searching via a content field.
Shows that custom search fields function as expected.


63-79: Checks case insensitivity in searches.
Verifies consistent result sets for different text cases.

src/chrondb/api/sql/execution/query.clj (4)

86-95: Neat helper to identify FTS conditions.
Clearly checks the :type field for :fts-match—simple and effective.


97-235: Implementation of combined FTS and standard conditions is comprehensive.
You’re only processing the first FTS condition (first-fts), so if multiple FTS conditions exist, only the first is used. Verify if that’s intended or if you need to handle multiple. Otherwise, the branching logic and standard condition filtering appear coherent, and the code is well-instrumented with logs.


275-355: Index parameter integration looks consistent.
Properly accounts for potential count and aggregate queries, preserving existing behaviors while extending the logic to handle FTS results.


549-567: Handles the new index parameter in handle-query gracefully.
The error logging and fallback for unknown queries are consistent with the rest of the flow.

src/chrondb/index/lucene.clj (5)

3-13: No issues with the newly imported namespaces and classes.
These imports align well with the subsequent functionality and appear correctly scoped.


78-90: Document removal is well-structured.
Good use of exception handling and a clear boolean return to signify success or failure. This pattern helps avoid silent errors.


154-160: Graceful resource teardown path.
Closing the writer, reader, and directory in this order mitigates risk of resource leaks or read/write conflicts. Implementation looks solid here.


162-187: Comprehensive index creation and fallback strategy.
Building or reopening an index with a graceful fallback for missing or corrupt indexes ensures reliability. The logging provides helpful diagnostic details.


67-77:

Details

❓ Verification inconclusive

Verify thread safety in index-document.
Because the index-document method resets the reader-atom and searcher-atom after committing, concurrent indexing could lead to inconsistent states if not properly synchronized.


🏁 Script executed:

#!/bin/bash
# Searching for concurrency references and usage patterns of `index-document` across the codebase
rg "index-document"

Length of output: 2685


Attention: Potential Concurrency Issue in Document Indexing

The current implementation in src/chrondb/index/lucene.clj (lines 67–77) resets the reader-atom and searcher-atom immediately after calling .commit on the writer. Given that this function is invoked from multiple parts of the codebase (as evidenced by its usage in several API endpoints and tests), concurrent calls to index-document can lead to race conditions and an inconsistent state between the writer, reader, and searcher.

  • Location: src/chrondb/index/lucene.clj, lines 67–77
  • Issue: Lack of synchronization when resetting shared atoms can cause conflicts if multiple threads index documents simultaneously.
  • Suggestion: Evaluate the thread safety of the indexing process. If concurrent indexing is expected, consider introducing synchronization (e.g., via locking mechanisms or atomic updates) around the operations that commit the writer and update the reader and searcher.

Comment on lines +31 to +44
(search [_ field query-string branch]
(log/log-warn (str "MemoryIndex search called (field: " field ", query: " query-string ", branch: " branch "). Basic string matching performed, not full FTS."))
(let [all-docs (seq (.values data))
query-lower (str/lower-case query-string)]
(filter
(fn [doc]
(some (fn [[k v]]
(when (string? v)
(cond
(= (name k) field) (str/includes? (str/lower-case v) query-lower)
(= field "content") (str/includes? (str/lower-case v) query-lower)
:else false)))
doc))
all-docs))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

The branch parameter is not used in filtering

While the search signature has been updated to accept the branch parameter, it's not actually used in the filtering logic. This could lead to inconsistencies with other index implementations that might use this parameter.

Consider updating the filter function to also check if the document's branch matches the specified branch parameter.

  (filter
   (fn [doc]
+    (and 
+     (or (nil? branch) (= (:branch doc) branch))
      (some (fn [[k v]]
             (when (string? v)
               (cond
                 (= (name k) field) (str/includes? (str/lower-case v) query-lower)
                 (= field "content") (str/includes? (str/lower-case v) query-lower)
                 :else false)))
-           doc))
+           doc)))
   all-docs))))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
(search [_ field query-string branch]
(log/log-warn (str "MemoryIndex search called (field: " field ", query: " query-string ", branch: " branch "). Basic string matching performed, not full FTS."))
(let [all-docs (seq (.values data))
query-lower (str/lower-case query-string)]
(filter
(fn [doc]
(some (fn [[k v]]
(when (string? v)
(cond
(= (name k) field) (str/includes? (str/lower-case v) query-lower)
(= field "content") (str/includes? (str/lower-case v) query-lower)
:else false)))
doc))
all-docs))))
(search [_ field query-string branch]
(log/log-warn (str "MemoryIndex search called (field: " field ", query: " query-string ", branch: " branch "). Basic string matching performed, not full FTS."))
(let [all-docs (seq (.values data))
query-lower (str/lower-case query-string)]
(filter
(fn [doc]
(and
(or (nil? branch) (= (:branch doc) branch))
(some (fn [[k v]]
(when (string? v)
(cond
(= (name k) field) (str/includes? (str/lower-case v) query-lower)
(= field "content") (str/includes? (str/lower-case v) query-lower)
:else false)))
doc)))
all-docs))))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants