Skip to content

Conversation

@avelino
Copy link
Member

@avelino avelino commented Oct 30, 2025

Introduce Abstract Syntax Tree (AST) query layer that enables consistent query parsing, planning, and execution for REST, Redis, SQL, and CLI protocols. The new chrondb.query.ast namespace defines composable query operators (FTS, range, sort, etc.) with cursor-based pagination and type safety. Deprecate bespoke query engines in favor of a Lucene-backed planner that ensures protocol parity and performance. Includes migration notes, updated documentation, and extensive integration tests.

  • Adds direct AST support to Redis and SQL engines (ast_converter.clj)
  • Refactors test suite to assert identical results across protocols for history, sorting, and paging
  • Background reindexing and commit order maintained via Lucene index hooks
  • Documentation added in docs/ast-queries.md and referenced in README, operations, and architecture docs

fixes #39

Summary by CodeRabbit

  • New Features

    • Apache Lucene-powered search with full-text, composite, geospatial queries, sorting, cursor pagination and cached query planner; unified AST-based query execution across REST, Redis, SQL, and CLI; new search endpoints and CLI command.
  • Documentation

    • Expanded architecture, AST query guide, Redis/REST/SQL/CLI examples, performance and operations (background reindexing) and README updates.
  • Tests

    • New unit and multi-protocol integration tests covering AST, SQL conversion, and Lucene behavior.
  • Maintenance / Chores

    • Background index population/maintenance tasks and native-image/CI runtime-initialization guidance.

Introduce Abstract Syntax Tree (AST) query layer that enables consistent query parsing, planning, and execution for REST, Redis, SQL, and CLI protocols. The new `chrondb.query.ast` namespace defines composable query operators (FTS, range, sort, etc.) with cursor-based pagination and type safety. Deprecate bespoke query engines in favor of a Lucene-backed planner that ensures protocol parity and performance. Includes migration notes, updated documentation, and extensive integration tests.

- Adds direct AST support to Redis and SQL engines (`ast_converter.clj`)
- Refactors test suite to assert identical results across protocols for history, sorting, and paging
- Background reindexing and commit order maintained via Lucene index hooks
- Documentation added in `docs/ast-queries.md` and referenced in README, operations, and architecture docs

fixes #39

Signed-off-by: Avelino <[email protected]>
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 30, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

Adds a Lucene-backed search overhaul: an AST query model, new search-query index protocol, Lucene and Memory index implementations, REST/Redis/SQL/CLI routing to AST-based search, background index maintenance and population, extensive documentation, and multi-protocol integration and unit tests.

Changes

Cohort / File(s) Summary
Documentation
AGENT.md, README.md, docs/architecture.md, docs/ast-queries.md, docs/examples-redis.md, docs/index.md, docs/operations.md, docs/performance.md
New and expanded docs describing Lucene-powered indexing, AST queries, examples, background reindexing, performance guidance, and reviewer/CI guidance for native-image/runtime init and lazy CLI namespaces.
Query AST Library
src/chrondb/query/ast.clj
New AST builder: clause constructors (term, fts, range, boolean), sort descriptors, query wrapper and helpers for pagination/branch/hints/search-after.
Index Protocol
src/chrondb/index/protocol.clj
Added public search-query protocol method; documented search as legacy.
Lucene Index Implementation
src/chrondb/index/lucene.clj
Major Lucene integration: field normalization, numeric fields, sort construction, ast->query multimethods, query execution (sort/limit/offset/search-after), maintenance tasks, index population, and search-query implementation.
Memory Index Implementation
src/chrondb/index/memory.clj
Added search-query implementation that supports AST-like clauses, matchers, content-field scanning, pagination and result shaping.
REST API & Routing
src/chrondb/api/v1.clj, src/chrondb/api/v1/routes.clj
Request parsing (structured queries, sort, after cursor), AST construction, call into index/search-query, fetch docs from storage, return total/next-cursor; GET "/search" wired to new flow.
Redis API
src/chrondb/api/redis/core.clj
Added SEARCH and FT.SEARCH handlers that parse options (LIMIT/OFFSET/SORT/BRANCH), build AST, call search-query, and return JSON results.
SQL Execution & Parser
src/chrondb/api/sql/execution/ast_converter.clj, src/chrondb/api/sql/execution/operators.clj, src/chrondb/api/sql/execution/query.clj, src/chrondb/api/sql/parser/clauses.clj, src/chrondb/api/sql/parser/statements.clj, src/chrondb/api/sql/protocol/messages/writer.clj
SQL WHERE→AST conversion, numeric-aware operator evaluation, SELECT routing through Lucene when index available (handle-select-case signature updated), multi-row INSERT support, parse-time skip branches for IN/BETWEEN/NULL, and safer message-type writing.
CLI & HTTP Client
src/chrondb/cli/core.clj, src/chrondb/cli/http.clj, src/chrondb/cli/server.clj
New search CLI command and search-documents HTTP helper; server startup kicks off background index population and periodic maintenance task.
Index Lifecycle & Maintenance Wiring
src/chrondb/cli/server.clj, src/chrondb/index/lucene.clj
Background index-population task, periodic maintenance/commit/refresh loop, ensure-searcher-updated!, start/stop maintenance helpers.
Examples & Tests
src/chrondb/example.clj, test/chrondb/api/ast_integration_test.clj, test/chrondb/api/sql/execution/query_test.clj, test/chrondb/api/sql/parser/clauses_test.clj, test/chrondb/index/lucene_test.clj, test/chrondb/index/memory_test.clj, test/chrondb/query/ast_test.clj, test/chrondb/api/sql/execution/ast_converter_test.clj
Example updated to search-query; extensive AST integration tests (REST/Redis/AST), SQL parser and ast-converter unit tests, and migrated index tests to search-query API.
Core CLI/Server Lazy Resolve
src/chrondb/core.clj
Replaced direct namespace requires with requiring-resolve / lazy var resolution for CLI/server command dispatch to defer runtime initialization.
Native Image Runtime
dev/chrondb/native_image.clj
Added Jackson classes to runtime-initialized class list and adjusted native-image arg composition.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant API as API Layer (REST/Redis/SQL/CLI)
    participant AST as chrondb.query.ast
    participant Index as Index (Lucene/Memory)
    participant Storage as Storage
    participant Indexer as Background Indexer

    Client->>API: search request (q/query, limit, sort, branch, after)
    API->>AST: parse/construct AST
    AST-->>API: AST query map
    API->>Index: search-query(AST, branch, opts)
    rect rgb(220,235,255)
      Note right of Index: ast->query → build Lucene Query\nexecute (sort/limit/offset/search-after)
    end
    Index-->>API: {ids, total, next-cursor}
    API->>Storage: fetch docs by ids
    Storage-->>API: documents
    API-->>Client: response (docs, total, next-cursor)

    par Background maintenance
      Indexer->>Storage: scan for new/changed docs
      Indexer->>Index: index documents, commit, refresh
      Index->>Index: update stats / warm caches
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Areas requiring extra attention:

  • src/chrondb/index/lucene.clj — ast->query multimethods, Lucene field typing, sort mapping, search-after correctness, threading of analyzers and options, and maintenance lifecycle.
  • src/chrondb/api/sql/execution/query.clj — handle-select-case signature change, Lucene routing vs storage fallback, JOIN and projection correctness.
  • test/chrondb/api/ast_integration_test.clj — integration stability across REST/Redis/AST, Redis lifecycle setup/teardown, and deterministic ordering/cursor behavior.
  • Protocol surface consistency — ensure all call sites migrated to search-query and backward-compat fallbacks handled.

Possibly related PRs

Suggested labels

enhancement

Poem

🐰 I hopped through indexes, fields aglow,
I stitched AST branches row by row,
Cursors tucked beneath my paw,
Lucene hummed and wakes the law,
Now searches dance — hooray, let's go! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The pull request title "feat(ast): unify query engine across all protocols with new AST system" directly and clearly describes the main change in the changeset. The PR introduces a new Abstract Syntax Tree (AST) query layer and refactors REST, Redis, SQL, and CLI protocols to use this unified query engine, which is precisely what the title conveys. The title is concise, specific, and accurately reflects the primary objective of the changeset without unnecessary noise.

[pre_merge_check_pass]

The PR code changes substantially implement the primary coding requirements from issue #39: it expands src/chrondb/index/lucene.clj with a new search-query method and multi-dispatch query builder (ast->query); extends src/chrondb/index/protocol.clj with the new search-query protocol method; implements a query planner infrastructure via AST construction and optimization; adds full-text search support with configurable analyzers; implements result caching and index lifecycle management (ensure-index-populated, maintenance tasks); and unifies query execution across REST, Redis, SQL, and CLI through the new AST layer. While some advanced objectives like geospatial and temporal indexes are mentioned in documentation, the core coding requirements for query planning, FTS, caching, and index management are addressed through this implementation. |
| Out of Scope Changes Check | ✅ Passed | All code changes in the PR are directly related to the stated objectives of unifying the query engine across protocols via the new AST system. Changes to SQL parsing (clauses, statements, operators, query execution), Redis API, REST API, CLI, and index implementations all serve the core goal of AST-based query unification. Supporting infrastructure changes to src/chrondb/core.clj for lazy CLI loading and dev/chrondb/native_image.clj for Jackson class registration align with the guidance mentioned in AGENT.md. Background reindexing additions to src/chrondb/cli/server.clj are explicitly part of the PR objectives. No changes appear to introduce unrelated functionality outside the scope of AST query engine unification or its supporting infrastructure. |

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch avelino/issue-39

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +147 to +154
(defmethod ast->query :range
[_ {:keys [field lower upper include-lower? include-upper? value-type]}]
(let [lower (when lower (str lower))
upper (when upper (str upper))]
(case value-type
:long (TermRangeQuery. field lower upper (boolean include-lower?) (boolean include-upper?))
:double (TermRangeQuery. field lower upper (boolean include-lower?) (boolean include-upper?))
(TermRangeQuery. field lower upper (boolean include-lower?) (boolean include-upper?)))))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Use numeric range queries for numeric AST clauses

The :range AST handler always builds TermRangeQuery regardless of value-type. Term ranges operate on lexicographic term order and will treat numbers as strings (e.g. 10 sorts before 2), while create-lucene-doc indexes numeric fields with LongPoint/DoublePoint. As a result range-long and range-double queries will either return incorrect results or skip documents entirely because the numeric point fields are never consulted. Consider using LongPoint/newRangeQuery and DoublePoint/newRangeQuery for numeric values so range filtering respects numeric ordering.

Useful? React with 👍 / 👎.

Comment on lines +156 to +164
(defmethod ast->query :exists
[_ {:keys [field]}]
(DocValuesFieldExistsQuery. field))

(defmethod ast->query :missing
[_ {:keys [field]}]
(let [builder (BooleanQuery$Builder.)]
(.add builder (MatchAllDocsQuery.) BooleanClause$Occur/MUST)
(.add builder (DocValuesFieldExistsQuery. field) BooleanClause$Occur/MUST_NOT)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge String fields lack doc values required by :exists and sorting

Both :exists/:missing clauses and sanitize-sort-field rely on doc values (DocValuesFieldExistsQuery, SortField$Type/STRING) to work with string fields, but create-lucene-doc only adds doc values for numeric fields. A string field indexed as a TextField will therefore never satisfy an exists query and attempts to sort on it will throw IllegalStateException: field was indexed without docvalues. Add SortedDocValuesField (or similar) for text fields when storing them so these AST operators behave correctly.

Useful? React with 👍 / 👎.

…ing, and SQL parser bridge

- Adds `fts-condition?` and FTS AST clause detection in SQL query execution, bridging SQL `WHERE x @@ to_tsquery(...)` to FTS search.
- Introduces central AST clause for FTS matching, standardized as `(ast/fts "field" "term")`, with normalization and wildcarding for broad/lucene-aligned search.
- Updates Lucene index search logic to recognize and optimize for FTS AST nodes, preserving analyzer selection, wildcard support, and leading wildcard configuration.
- Ensures protocol consistency and API-parity across SQL, REST, and direct invocation per protocol documentation.
- Backfills tests for FTS AST equation, wildcard and to_tsquery parsing, and SQL-engine roundtrip.
- Documents breaking change and migration: legacy FTS conditions are mapped to AST clause `(ast/fts ...)`; guides users to new query structure.

Signed-off-by: Avelino <[email protected]>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

🧹 Nitpick comments (10)
docs/ast-queries.md (1)

84-85: Consider showing cursor structure example.

The documentation mentions <base64-cursor> but doesn't show what a decoded cursor looks like. Adding a brief example of the cursor map structure ({:doc 123 :score 1.5}) would help developers understand how to construct or debug cursors.

src/chrondb/api/v1.clj (1)

63-64: Consider adding parameter validation.

The limit and offset parsing converts to string before parsing to int. If these are already integers in the params map, this is inefficient. Consider handling both string and numeric inputs:

limit (cond
        (integer? limit) limit
        (string? limit) (Integer/parseInt limit)
        :else nil)
src/chrondb/index/memory.clj (1)

116-126: Consider caching filtered results for total count.

The implementation filters the entire dataset twice: once for the paginated results and once for the total count (line 122). For large datasets, this is inefficient. Consider filtering once and deriving both values:

(let [filtered-docs (filter matches? (.values data))
      total (count filtered-docs)
      docs (->> filtered-docs
                (map :id)
                (drop offset)
                (take limit)
                vec)]
  {:ids docs
   :total total
   :limit limit
   :offset offset})
src/chrondb/api/sql/execution/query.clj (1)

101-156: Remove the unused FTS helper to satisfy lint

fts-get-matching-docs is no longer referenced, and clj-kondo is warning about it. Please delete the helper (or wire it back up) so the lint job runs clean.

src/chrondb/api/v1/routes.clj (5)

49-59: Consider logging EDN parsing errors.

The catch block silently returns nil on parsing failure. For debugging and monitoring, consider logging the error message.

Apply this diff:

       (cond
         (map? query-param) query-param
         (string? query-param) (edn/read-string query-param)
         :else nil)
-      (catch Exception _
+      (catch Exception e
+        (log/log-warn (str "Failed to parse structured query: " (.getMessage e)))
         nil))))

61-75: Consider logging cursor parsing errors.

Similar to parse-structured-query, the catch block silently returns nil. Adding logging would help troubleshoot invalid cursor values.

Apply this diff:

           (contains? parsed :doc)
           (contains? parsed :score))
           parsed))
-      (catch Exception _
+      (catch Exception e
+        (log/log-warn (str "Failed to parse searchAfter cursor: " (.getMessage e)))
         nil))))

103-104: Silent exception handling may hide parsing errors.

Failed integer parsing returns nil, which may lead to unexpected behavior downstream. Consider logging parsing failures or validating the input format earlier.

Apply this diff:

         ;; Parse limit and offset
-        limit-int (when limit (try (Integer/parseInt limit) (catch Exception _ nil)))
-        offset-int (when offset (try (Integer/parseInt offset) (catch Exception _ nil)))
+        limit-int (when limit 
+                   (try 
+                     (Integer/parseInt limit) 
+                     (catch Exception e
+                       (log/log-warn (str "Invalid limit parameter: " limit))
+                       nil)))
+        offset-int (when offset 
+                    (try 
+                      (Integer/parseInt offset) 
+                      (catch Exception e
+                        (log/log-warn (str "Invalid offset parameter: " offset))
+                        nil)))

164-167: Add type hints to avoid reflection on ScoreDoc fields.

Direct field access on ScoreDoc will trigger reflection warnings. Adding type hints improves performance.

Apply this diff:

                         (:next-cursor result)
                         (assoc :next-cursor
                                (let [encoder (java.util.Base64/getEncoder)
-                                     cursor-data {:doc (.-doc ^org.apache.lucene.search.ScoreDoc (:next-cursor result))
-                                                  :score (.-score ^org.apache.lucene.search.ScoreDoc (:next-cursor result))}
+                                     ^org.apache.lucene.search.ScoreDoc score-doc (:next-cursor result)
+                                     cursor-data {:doc (.-doc score-doc)
+                                                  :score (.-score score-doc)}
                                      encoded (String. (.encode encoder (.getBytes (pr-str cursor-data) "UTF-8")) "UTF-8")]
                                  encoded)))]

77-168: Consider extracting helper functions for better modularity.

The handle-search function is quite long (~90 lines) and handles multiple concerns: parameter parsing, query building, execution, and response formatting. Consider extracting helpers like parse-pagination-params, build-ast-from-params, and format-search-response to improve readability and testability.

src/chrondb/index/lucene.clj (1)

63-71: Consider catching NumberFormatException specifically.

Catching Exception is overly broad. Long/parseLong and Double/parseDouble throw NumberFormatException, which would be more precise.

Apply this diff:

 (defn- parse-long-safe [s]
   (try
     (Long/parseLong s)
-    (catch Exception _ nil)))
+    (catch NumberFormatException _ nil)))
 
 (defn- parse-double-safe [s]
   (try
     (Double/parseDouble s)
-    (catch Exception _ nil)))
+    (catch NumberFormatException _ nil)))
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fbafc0b and f50e3e2.

📒 Files selected for processing (30)
  • AGENT.md (2 hunks)
  • README.md (1 hunks)
  • docs/architecture.md (2 hunks)
  • docs/ast-queries.md (1 hunks)
  • docs/examples-redis.md (1 hunks)
  • docs/index.md (1 hunks)
  • docs/operations.md (1 hunks)
  • docs/performance.md (1 hunks)
  • src/chrondb/api/redis/core.clj (3 hunks)
  • src/chrondb/api/sql/execution/ast_converter.clj (1 hunks)
  • src/chrondb/api/sql/execution/operators.clj (2 hunks)
  • src/chrondb/api/sql/execution/query.clj (11 hunks)
  • src/chrondb/api/sql/parser/clauses.clj (1 hunks)
  • src/chrondb/api/sql/parser/statements.clj (1 hunks)
  • src/chrondb/api/sql/protocol/messages/writer.clj (1 hunks)
  • src/chrondb/api/v1.clj (1 hunks)
  • src/chrondb/api/v1/routes.clj (3 hunks)
  • src/chrondb/cli/core.clj (3 hunks)
  • src/chrondb/cli/http.clj (1 hunks)
  • src/chrondb/cli/server.clj (1 hunks)
  • src/chrondb/example.clj (1 hunks)
  • src/chrondb/index/lucene.clj (3 hunks)
  • src/chrondb/index/memory.clj (1 hunks)
  • src/chrondb/index/protocol.clj (1 hunks)
  • src/chrondb/query/ast.clj (1 hunks)
  • test/chrondb/api/ast_integration_test.clj (1 hunks)
  • test/chrondb/api/sql/execution/query_test.clj (1 hunks)
  • test/chrondb/api/sql/parser/clauses_test.clj (1 hunks)
  • test/chrondb/index/lucene_test.clj (5 hunks)
  • test/chrondb/index/memory_test.clj (3 hunks)
🧰 Additional context used
📓 Path-based instructions (9)
README.md

📄 CodeRabbit inference engine (AGENT.md)

Update README.md for major user-facing changes

Files:

  • README.md
docs/**/*.md

📄 CodeRabbit inference engine (AGENT.md)

Document new endpoints and wire-protocol behaviors in docs/

Files:

  • docs/operations.md
  • docs/architecture.md
  • docs/performance.md
  • docs/index.md
  • docs/ast-queries.md
  • docs/examples-redis.md
docs/operations.md

📄 CodeRabbit inference engine (AGENT.md)

When modifying backup/restore features, update docs/operations.md with CLI/REST examples and incremental bundle flow

Files:

  • docs/operations.md
docs/**

📄 CodeRabbit inference engine (AGENT.md)

Keep developer-focused instructions in AGENT.md and maintain docs/ for user-facing and protocol documentation

Files:

  • docs/operations.md
  • docs/architecture.md
  • docs/performance.md
  • docs/index.md
  • docs/ast-queries.md
  • docs/examples-redis.md
AGENT.md

📄 CodeRabbit inference engine (AGENT.md)

Maintain and update this AGENT.md with developer-focused instructions

Files:

  • AGENT.md
src/chrondb/**/*.{clj,cljc}

📄 CodeRabbit inference engine (AGENT.md)

Namespaces must use the chrondb.* prefix (e.g., ns chrondb.core) and idiomatic kebab-case var names

Files:

  • src/chrondb/index/protocol.clj
  • src/chrondb/api/sql/parser/clauses.clj
  • src/chrondb/cli/http.clj
  • src/chrondb/api/sql/parser/statements.clj
  • src/chrondb/api/sql/execution/operators.clj
  • src/chrondb/example.clj
  • src/chrondb/api/v1.clj
  • src/chrondb/index/lucene.clj
  • src/chrondb/cli/core.clj
  • src/chrondb/api/sql/protocol/messages/writer.clj
  • src/chrondb/api/redis/core.clj
  • src/chrondb/api/sql/execution/ast_converter.clj
  • src/chrondb/cli/server.clj
  • src/chrondb/query/ast.clj
  • src/chrondb/index/memory.clj
  • src/chrondb/api/v1/routes.clj
  • src/chrondb/api/sql/execution/query.clj
**/*.{clj,cljc}

📄 CodeRabbit inference engine (AGENT.md)

**/*.{clj,cljc}: Prefer pure functions and data transformations in Clojure code
Add docstrings for public vars and functions
Follow indentation as enforced by cljfmt
Use descriptive, chronological-operation-aware logging (structured where possible)
Guard shared mutable state; prefer atoms with swap! semantics
Leverage transduce, reduce, or pipelines for efficient data flows

Files:

  • src/chrondb/index/protocol.clj
  • src/chrondb/api/sql/parser/clauses.clj
  • src/chrondb/cli/http.clj
  • src/chrondb/api/sql/parser/statements.clj
  • src/chrondb/api/sql/execution/operators.clj
  • src/chrondb/example.clj
  • src/chrondb/api/v1.clj
  • src/chrondb/index/lucene.clj
  • test/chrondb/index/memory_test.clj
  • test/chrondb/api/sql/execution/query_test.clj
  • src/chrondb/cli/core.clj
  • src/chrondb/api/sql/protocol/messages/writer.clj
  • src/chrondb/api/redis/core.clj
  • test/chrondb/api/sql/parser/clauses_test.clj
  • src/chrondb/api/sql/execution/ast_converter.clj
  • src/chrondb/cli/server.clj
  • src/chrondb/query/ast.clj
  • test/chrondb/index/lucene_test.clj
  • src/chrondb/index/memory.clj
  • test/chrondb/api/ast_integration_test.clj
  • src/chrondb/api/v1/routes.clj
  • src/chrondb/api/sql/execution/query.clj
test/**/*.{clj,cljc}

📄 CodeRabbit inference engine (AGENT.md)

test/**/*.{clj,cljc}: Include integration tests for Git-backed storage, and add regression tests for concurrency and conflict resolution
Prefer property-based tests when sequence ordering matters

Files:

  • test/chrondb/index/memory_test.clj
  • test/chrondb/api/sql/execution/query_test.clj
  • test/chrondb/api/sql/parser/clauses_test.clj
  • test/chrondb/index/lucene_test.clj
  • test/chrondb/api/ast_integration_test.clj
docs/examples-*.md

📄 CodeRabbit inference engine (AGENT.md)

Provide runnable examples in docs/examples-*.md when possible

Files:

  • docs/examples-redis.md
🧠 Learnings (12)
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to README.md : Update README.md for major user-facing changes

Applied to files:

  • README.md
  • docs/operations.md
  • AGENT.md
  • docs/architecture.md
  • docs/index.md
  • docs/examples-redis.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to docs/operations.md : When modifying backup/restore features, update docs/operations.md with CLI/REST examples and incremental bundle flow

Applied to files:

  • docs/operations.md
  • AGENT.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to **/*.{clj,cljc} : Leverage transduce, reduce, or pipelines for efficient data flows

Applied to files:

  • AGENT.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to **/*.{clj,cljc} : Prefer pure functions and data transformations in Clojure code

Applied to files:

  • AGENT.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to AGENT.md : Maintain and update this AGENT.md with developer-focused instructions

Applied to files:

  • AGENT.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to docs/** : Keep developer-focused instructions in AGENT.md and maintain docs/ for user-facing and protocol documentation

Applied to files:

  • AGENT.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to docs/**/*.md : Document new endpoints and wire-protocol behaviors in docs/

Applied to files:

  • AGENT.md
  • docs/ast-queries.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to docs/examples-*.md : Provide runnable examples in docs/examples-*.md when possible

Applied to files:

  • AGENT.md
📚 Learning: 2025-10-12T09:40:13.442Z
Learnt from: avelino
PR: moclojer/chrondb#59
File: src/chrondb/core.clj:158-171
Timestamp: 2025-10-12T09:40:13.442Z
Learning: ChronDB's GitStorage only implements the internal protocol chrondb.storage.protocol/Storage and is not a Closeable. There is no simple way to use with-open or try-with-resources patterns without changing the storage's public API. The storage remains open and in use by the server throughout the process lifetime.

Applied to files:

  • docs/architecture.md
  • src/chrondb/api/redis/core.clj
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to src/chrondb/**/*.{clj,cljc} : Namespaces must use the chrondb.* prefix (e.g., ns chrondb.core) and idiomatic kebab-case var names

Applied to files:

  • src/chrondb/example.clj
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to test/**/*.{clj,cljc} : Include integration tests for Git-backed storage, and add regression tests for concurrency and conflict resolution

Applied to files:

  • test/chrondb/index/memory_test.clj
  • test/chrondb/api/sql/parser/clauses_test.clj
  • test/chrondb/api/ast_integration_test.clj
📚 Learning: 2025-10-12T09:40:13.442Z
Learnt from: avelino
PR: moclojer/chrondb#59
File: src/chrondb/core.clj:158-171
Timestamp: 2025-10-12T09:40:13.442Z
Learning: In ChronDB, the server mode (chrondb server) starts non-daemon threads for Jetty, Redis, and SQL servers that keep the process alive. Calling System/exit after server initialization would terminate the server immediately, breaking the main execution mode. The lifecycle is managed by clojure.main, which properly propagates exceptions with exit code ≠0.

Applied to files:

  • src/chrondb/api/redis/core.clj
  • src/chrondb/cli/server.clj
🪛 GitHub Actions: Coverage
src/chrondb/cli/server.clj

[error] 107-107: Syntax error compiling at (chrondb/cli/server.clj:107:4). No such var: lucene/start-index-maintenance-task

🪛 GitHub Actions: Lint project with clj-kondo
src/chrondb/index/lucene.clj

[warning] 8-8: Unused import SortedDocValuesField


[warning] 10-10: Unused import BooleanQuery


[warning] 12-12: Unused import BytesRef

test/chrondb/api/sql/parser/clauses_test.clj

[warning] 2-2: #'clojure.test/testing' is referred but never used

src/chrondb/cli/server.clj

[warning] 107-107: Unresolved var: lucene/start-index-maintenance-task


[warning] 117-117: Unresolved var: lucene/ensure-index-populated

src/chrondb/query/ast.clj

[warning] 34-34: range already refers to #'clojure.core/range


[warning] 92-92: boolean already refers to #'clojure.core/boolean


[warning] 121-121: and already refers to #'clojure.core/and


[warning] 130-130: or already refers to #'clojure.core/or


[warning] 139-139: not already refers to #'clojure.core/not


[warning] 145-145: sort-by already refers to #'clojure.core/sort-by

src/chrondb/api/sql/execution/query.clj

[warning] 101-101: Unused private var chrondb.api.sql.execution.query/fts-get-matching-docs

🪛 LanguageTool
AGENT.md

[grammar] ~74-~74: Please add a punctuation mark at the end of paragraph.
Context: ...es that bypass the centralized indexing layer ### When Reviewing Code 1. Confirm ti...

(PUNCTUATION_PARAGRAPH_END)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: test
  • GitHub Check: Run SQL Protocol Benchmark
  • GitHub Check: test
🔇 Additional comments (38)
docs/index.md (1)

19-19: Lucene capability highlight looks solid.

The new bullet clearly advertises the Lucene-backed search upgrades without overpromising. Nicely aligned with the rest of the docs.

docs/examples-redis.md (1)

116-139: Nice coverage of the new SEARCH verbs.

Showcasing limit/offset, sorting, branch selection, and the FT.SEARCH alias gives users immediate, runnable patterns—exactly what we need for the Lucene-backed flow.

docs/performance.md (1)

47-65: Great operational context for the Lucene migration.

The new section concisely explains why the overhaul matters and gives actionable tuning guidance (metrics, analyze tooling, async staging). This will help operators adopt the changes confidently.

docs/operations.md (1)

220-228: Background reindexing notes are spot on.

Clear explanation of the async Lucene catch-up cycle, metrics, and post-restore guidance—exactly the operational detail we needed to ship with the indexer changes.

AGENT.md (1)

75-99: Guidelines keep the team aligned.

The reminders to funnel search work through the Lucene planner and to sync the doc set when those pieces change will save us from future drift. Thanks for codifying it.

src/chrondb/api/sql/protocol/messages/writer.clj (1)

33-42: Protocol safety improvement looks good.

Normalizing the message type and skipping the leading byte when it’s genuinely absent avoids the old startup edge cases without touching length semantics.

docs/architecture.md (1)

35-35: LGTM! Clear documentation of the Lucene integration.

The updated indexing layer description and the new detailed section effectively communicate the shift to Lucene-backed indexing with secondary indexes, query planning, and caching capabilities.

Also applies to: 47-49

test/chrondb/api/sql/execution/query_test.clj (1)

37-47: LGTM! Mock storage updated to match revised data retrieval shape.

The mock implementation now returns a vector of document values instead of key-value pairs, aligning with the broader Lucene/AST indexing overhaul where the storage layer returns documents directly.

README.md (1)

27-29: LGTM! Well-written user-facing documentation.

The new Lucene Search Overhaul section clearly communicates the major architectural change and its benefits to users. This aligns with the coding guideline to update README.md for major user-facing changes.

src/chrondb/cli/http.clj (1)

120-133: LGTM! Well-structured search function.

The new search-documents function cleanly supports the AST-based search API with optional parameters. The use of cond-> to conditionally build query parameters is idiomatic and handles optional fields correctly.

src/chrondb/api/sql/parser/statements.clj (1)

157-157: No issues found—multi-row INSERT is properly supported.

The parser change from (first values) to values correctly enables multi-row INSERT support. The execution layer's handle-insert-case function (query.clj:581-684) already handles this correctly:

  • Detects multi-row input via (every? sequential? values)
  • Normalizes to a list of rows: values-rows
  • Processes each row with mapv, creating, saving, and indexing documents individually
  • Returns accurate inserted-count for all processed rows

The implementation is complete and correct.

src/chrondb/index/protocol.clj (1)

10-15: Protocol extension looks solid.

The addition of search-query with a clear deprecation notice for the legacy search method follows good API evolution practices. The docstring effectively documents the expected query-map structure and return shape.

src/chrondb/cli/core.clj (2)

61-70: Well-structured CLI command definition.

The search command specification is comprehensive and follows the established patterns in the file. The option set aligns well with the AST query capabilities described in the documentation.


222-231: Clean implementation following established patterns.

The run-search handler follows the same delegation pattern as other commands in the file, maintaining consistency in the codebase.

docs/ast-queries.md (3)

1-68: Comprehensive and well-structured documentation.

The documentation provides an excellent overview of the AST query system with clear examples. The organization from basic concepts to complex usage is logical and developer-friendly.

Based on learnings


156-169: Excellent pagination documentation.

The distinction between offset-based and cursor-based pagination is clearly explained, including when to use each approach. The next-cursor semantics are well-documented.


182-191: Strong protocol consistency guarantees.

The documentation effectively emphasizes that all protocols return identical results, which is a key benefit of the AST-based approach. This aligns well with the PR objectives.

src/chrondb/api/sql/execution/operators.clj (3)

6-19: Solid numeric parsing helper.

The implementation handles the common cases well, with appropriate fallback to nil for non-numeric values. The quote removal is essential for SQL string literals.


28-75: Robust comparison logic with numeric awareness.

The implementation correctly handles nil field values and attempts numeric comparisons when appropriate, falling back to string comparisons. The explicit operator cases for nil values prevent potential null pointer issues.


77-89: Clean simplification of filtering logic.

The refactored implementation is more readable while maintaining correct AND semantics across multiple conditions. The explicit empty check at the start is a good optimization.

test/chrondb/index/memory_test.clj (3)

43-48: Test properly adapted to new API.

The test correctly uses the new query map format and extracts IDs from the result. The assertions verify the expected behavior with string ID comparisons.


50-63: Content field search test looks good.

The test verifies the special "content" field behavior that searches across all document fields, which is an important feature to validate.


73-84: Thorough case-insensitivity testing.

The test validates case-insensitive search across multiple case variations, ensuring the feature works consistently.

src/chrondb/index/memory.clj (3)

97-115: Content field handling is well-implemented.

The special case for content field searching across all document fields is clearly separated and correctly excludes internal fields like :_table. The case-insensitive search aligns with user expectations.


64-65: Clean parameter validation.

The limit and offset validation ensures sensible bounds, preventing negative values and enforcing minimum limits.


47-62: <!-- [scratchpad_start]
[task review_file_1/1 final_analysis]
Complete analysis of wildcard conversion concern

[observations]

  • AST defines 3 distinct types: :term (exact), :wildcard (pattern), :prefix (wildcard convenience)
  • Legacy format conversion creates :wildcard with "" suffix: (str value "")
  • ast.clj has explicit prefix function that does: (wildcard field (str value "*"))
  • Memory test validates legacy format behavior:
    • test-search-by-field: {:field "name" :value "John"} expects 2 results
    • Expected results: "user:1" (John Doe) and "user:3" (John Smith)
    • This PROVES legacy format expected PREFIX matching, not exact match
    • Test would fail if changed to :term (exact match)

[critical_finding]

  • The review comment suggests changing to :term for "exact match"
  • This contradicts test expectations that show legacy format uses PREFIX matching
  • The test explicitly expects "John" to match "John Doe" AND "John Smith"
  • Changing to :term would break this test and existing caller code

[conclusion]
The review comment is factually incorrect about the legacy format semantics. The code correctly preserves backward compatibility. The suggested change would actually introduce a BREAKING CHANGE.

[tag]

Based on my investigation of the codebase and tests, I have found conclusive evidence about the legacy format behavior.

The review comment is incorrect.

The test file test/chrondb/index/memory_test.clj definitively proves that the legacy format {:field "name" :value "John"} was designed for prefix matching, not exact matching. Specifically, test-search-by-field expects the query {:field "name" :value "John"} to return 2 results: both "John Doe" and "John Smith". This confirms the legacy API uses wildcard/prefix semantics.

The current implementation correctly preserves this behavior by converting to {:type :wildcard :field field :value (str value "*")}. The AST module even has a dedicated prefix function that explicitly does the same thing: (wildcard field (str value "*")).

The suggested change to :term (exact match) would actually break backward compatibility and cause the existing tests to fail, as the test expects multiple prefix matches, not an exact match.

Likely an incorrect or invalid review comment.

src/chrondb/api/v1/routes.clj (2)

38-47: LGTM - Well-structured sort parameter parsing.

The function correctly handles the parsing of sort parameters with appropriate fallbacks for missing direction values.


204-205: LGTM - Route correctly updated for new signature.

The search route now passes the full request object, which is necessary for the enhanced handle-search implementation.

src/chrondb/index/lucene.clj (7)

15-21: LGTM - Clean field normalization logic.

The function correctly handles FTS field naming with appropriate guards against double-suffixing.


23-42: LGTM - Reasonable type detection heuristics.

The type inference logic based on field names (age, price, *_id, etc.) is practical for common use cases. The fallback to :string provides a safe default.


44-61: LGTM - Clean sort descriptor conversion.

The functions correctly map AST sort descriptors to Lucene's SortField with appropriate type handling.


73-105: LGTM - Comprehensive numeric field indexing.

The enhanced document creation correctly indexes numeric values using both LongPoint/DoublePoint (for range queries) and NumericDocValuesField/DoubleDocValuesField (for sorting). The FTS field normalization is also properly applied.


107-186: LGTM - Well-structured AST query system.

The multimethod dispatch and dynamic context binding are idiomatic Clojure. The FTS error handling appropriately logs before re-throwing. The boolean query recursion correctly composes nested queries.


188-250: LGTM - Robust query execution with comprehensive error handling.

The query execution logic correctly handles both cursor-based (searchAfter) and offset-based pagination. Error handling includes stack traces for debugging, and result formatting provides all necessary metadata including next-cursor.


363-387: LGTM - Well-integrated search-query implementation.

The protocol implementation correctly binds the Lucene context, constructs the query, and executes with appropriate error handling. The result mapping to IDs aligns with the storage-backed document retrieval pattern used in the routes.

src/chrondb/query/ast.clj (3)

89-90: LGTM - Nil-removing vectorize enables clean composition.

The vectorize helper that removes nils works well with the guard clauses in builder functions (like term, wildcard, fts), allowing optional clauses to naturally compose without cluttering the query structure.


92-119: LGTM - Well-optimized boolean clause composition.

The function correctly optimizes single-clause cases and properly wraps single must-not clauses with match-all to ensure correct boolean semantics. The vectorization of all clause types ensures consistent handling.


156-200: LGTM - Clean query wrapper and augmentation helpers.

The query function and accompanying with-* helpers provide a composable, idiomatic API for building queries. The use of vectorize for optional sort descriptors ensures consistent handling.

Comment on lines +328 to +338
(let [doc-ids (:ids lucene-result)
;; Deduplicate IDs before fetching documents
unique-doc-ids (distinct doc-ids)
docs (filter some? (map #(storage/get-document storage % branch-name) unique-doc-ids))]
(log/log-info (str "Lucene returned " (count doc-ids) " document IDs, " (count unique-doc-ids) " unique, fetched " (count docs) " documents"))
;; Deduplicate by ID to ensure no duplicates
(->> docs
(clojure.core/group-by :id)
(vals)
(map first)))
;; Fallback: JOIN or direct storage lookup
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Preserve Lucene ordering when deduplicating docs

The group-by/vals dedup loses any ordering that Lucene just applied, because hash-map iteration order is undefined. As soon as a client requests ORDER BY, the response sequence becomes effectively random. Please dedup while preserving order—for example, walk the returned ids once, track a seen set, and conj documents into a vector in discovery order.

-                         (let [doc-ids (:ids lucene-result)
-                               ;; Deduplicate IDs before fetching documents
-                               unique-doc-ids (distinct doc-ids)
-                               docs (filter some? (map #(storage/get-document storage % branch-name) unique-doc-ids))]
-                           (log/log-info (str "Lucene returned " (count doc-ids) " document IDs, " (count unique-doc-ids) " unique, fetched " (count docs) " documents"))
-                           ;; Deduplicate by ID to ensure no duplicates
-                           (->> docs
-                                (clojure.core/group-by :id)
-                                (vals)
-                                (map first)))
+                         (let [doc-ids (:ids lucene-result)
+                               [seen docs]
+                               (reduce (fn [[seen acc] id]
+                                         (if (seen id)
+                                           [seen acc]
+                                           (if-let [doc (storage/get-document storage id branch-name)]
+                                             [(conj seen id) (conj acc doc)]
+                                             [(conj seen id) acc])))
+                                       [#{} []]
+                                       doc-ids)]
+                           (log/log-info (str "Lucene returned " (count doc-ids) " document IDs, fetched " (count docs) " unique documents"))
+                           docs)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
(let [doc-ids (:ids lucene-result)
;; Deduplicate IDs before fetching documents
unique-doc-ids (distinct doc-ids)
docs (filter some? (map #(storage/get-document storage % branch-name) unique-doc-ids))]
(log/log-info (str "Lucene returned " (count doc-ids) " document IDs, " (count unique-doc-ids) " unique, fetched " (count docs) " documents"))
;; Deduplicate by ID to ensure no duplicates
(->> docs
(clojure.core/group-by :id)
(vals)
(map first)))
;; Fallback: JOIN or direct storage lookup
(let [doc-ids (:ids lucene-result)
[seen docs]
(reduce (fn [[seen acc] id]
(if (seen id)
[seen acc]
(if-let [doc (storage/get-document storage id branch-name)]
[(conj seen id) (conj acc doc)]
[(conj seen id) acc])))
[#{} []]
doc-ids)]
(log/log-info (str "Lucene returned " (count doc-ids) " document IDs, fetched " (count docs) " unique documents"))
docs)
;; Fallback: JOIN or direct storage lookup
🤖 Prompt for AI Agents
In src/chrondb/api/sql/execution/query.clj around lines 328-338, the current
dedup via group-by/vals loses Lucene ordering; instead, deduplicate while
preserving Lucene's order by iterating the returned ids (unique-doc-ids or
original doc-ids in their returned order), maintain a mutable or persistent
'seen' set, and build a vector by looking up each id in storage/get-document and
conj-ing the document only if its id hasn't been seen yet; replace the
group-by/vals/map-first sequence with this ordered walk so ordering from Lucene
is preserved while removing duplicates.

Comment on lines +129 to +134
;; Skip IN conditions (not implemented)
(and (>= (count remaining) 2)
(= (str/lower-case (second remaining)) "in"))
(do
(log/log-warn "IN condition parsing not yet implemented, skipping")
(recur (drop-while #(not (constants/LOGICAL_OPERATORS (str/lower-case %))) remaining) conditions))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Verify token advancement for trailing IN condition.

The drop-while advances until a logical operator is found, but if the IN condition is the last clause (no AND/OR follows), this could consume tokens beyond the WHERE clause boundary. Consider checking against the end-index or ensuring the caller's token boundary is respected.

🤖 Prompt for AI Agents
In src/chrondb/api/sql/parser/clauses.clj around lines 129 to 134, the
IN-condition skip uses drop-while which can overrun the WHERE token boundary if
the IN clause is trailing; change the token-advancement to respect the caller's
token boundary by stopping when you reach the provided end-index (or the end of
the tokens slice) — i.e. replace the drop-while with logic that advances until
either a logical operator is found OR the current index >= end-index, and ensure
you return the remaining tokens correctly (empty or slice starting at end-index)
so tokens after the WHERE boundary are not consumed.

Comment on lines +136 to +146
;; Skip BETWEEN conditions (not implemented)
(and (>= (count remaining) 2)
(= (str/lower-case (second remaining)) "between"))
(do
(log/log-warn "BETWEEN condition parsing not yet implemented, skipping")
(let [rest-tokens (drop 2 remaining)
after-between (drop-while (fn [tok]
(and tok
(not (constants/LOGICAL_OPERATORS (str/lower-case tok)))))
rest-tokens)]
(recur after-between conditions)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Similar edge case concern for trailing BETWEEN.

The token advancement logic mirrors the IN case and has the same potential issue with trailing conditions. The implementation correctly skips past the BETWEEN keyword and its arguments, but boundary checking against end-index would make this more robust.

🤖 Prompt for AI Agents
In src/chrondb/api/sql/parser/clauses.clj around lines 136 to 146, the BETWEEN
branch advances tokens like the IN case and can overrun when BETWEEN is
trailing; add explicit boundary checks when inspecting (second remaining) and
when computing rest-tokens/after-between so you never read past end-index (e.g.
verify remaining has at least two tokens before treating second as "between",
limit the drop/drop-while to end-index or use indexed iteration/take to stop at
the parser end), and return/recur safely when the token stream ends rather than
assuming more tokens exist.

Comment on lines +148 to +153
;; Skip NULL comparisons (not implemented)
(and (>= (count remaining) 3)
(= (str/lower-case (nth remaining 2)) "null"))
(do
(log/log-warn "NULL comparison parsing not yet implemented, skipping")
(recur (drop 3 remaining) conditions))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

NULL comparison handling may miss IS NOT NULL.

The code checks for "null" at the third position and drops 3 tokens, which works for patterns like "field IS NULL". However, "field IS NOT NULL" is a 4-token pattern that won't be caught by this check (NULL would be at position 3, not 2). This could lead to incorrect parsing of IS NOT NULL comparisons.

Consider updating the detection to check for both patterns:

-              ;; Skip NULL comparisons (not implemented)
-              (and (>= (count remaining) 3)
-                   (= (str/lower-case (nth remaining 2)) "null"))
+              ;; Skip NULL comparisons (not implemented)
+              (and (>= (count remaining) 3)
+                   (or (= (str/lower-case (nth remaining 2)) "null")
+                       (and (>= (count remaining) 4)
+                            (= (str/lower-case (nth remaining 2)) "not")
+                            (= (str/lower-case (nth remaining 3)) "null"))))
               (do
                 (log/log-warn "NULL comparison parsing not yet implemented, skipping")
-                (recur (drop 3 remaining) conditions))
+                (let [tokens-to-drop (if (= (str/lower-case (nth remaining 2)) "not") 4 3)]
+                  (recur (drop tokens-to-drop remaining) conditions)))
🤖 Prompt for AI Agents
In src/chrondb/api/sql/parser/clauses.clj around lines 148 to 153, the current
check only skips a 3-token "IS NULL" pattern and misses the 4-token "IS NOT
NULL" pattern; update the detection to handle both: inspect the next tokens
case-insensitively and if you see ["is" "null"] drop 3 tokens, and if you see
["is" "not" "null"] drop 4 tokens, logging the skip in either case and then
recur with the appropriately dropped remaining tokens so both "IS NULL" and "IS
NOT NULL" are skipped correctly.

Comment on lines +65 to +72
sort (cond
(string? sort) (map (fn [spec]
(let [[field dir] (-> spec str/trim (str/split #":"))]
{:field (str/trim field)
:direction (if (= "desc" (str/lower-case (or dir ""))) :desc :asc)}))
(str/split sort #","))
(sequential? sort) sort
:else nil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Sort parsing could be more robust.

The sort normalization splits on colons but doesn't validate the field names or handle malformed input (e.g., "field:" or ":asc"). Consider adding validation or error handling for invalid sort specifications.

🤖 Prompt for AI Agents
In src/chrondb/api/v1.clj around lines 65-72, the sort parsing accepts malformed
specs like "field:" or ":asc" without validation; update the parsing to validate
each spec: trim and split as now, require the field part to be non-empty and
match an allowed pattern (e.g. /^[A-Za-z0-9_.-]+$/), normalize the direction to
either :asc or :desc (default to :asc) only when dir is "asc" or "desc"
(case-insensitive), and if a spec is invalid either skip it or raise a clear
IllegalArgumentException including the original spec — implement one consistent
behavior across the function and ensure the resulting sort is either a filtered
vector of valid map entries or the function fails fast with a descriptive error
message.

Comment on lines 8 to 12
[org.apache.lucene.document Document StringField TextField Field$Store SortedDocValuesField LongPoint DoublePoint NumericDocValuesField DoubleDocValuesField]
[org.apache.lucene.index IndexWriter IndexWriterConfig IndexWriterConfig$OpenMode DirectoryReader Term IndexNotFoundException]
[org.apache.lucene.search IndexSearcher Query ScoreDoc TopDocs]
[org.apache.lucene.search IndexSearcher Query ScoreDoc TopDocs BooleanQuery BooleanQuery$Builder BooleanClause$Occur MatchAllDocsQuery TermQuery WildcardQuery TermRangeQuery Sort SortField SortField$Type TopFieldCollector TopScoreDocCollector DocValuesFieldExistsQuery]
[org.apache.lucene.queryparser.classic QueryParser ParseException]
[org.apache.lucene.util BytesRef]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Remove unused imports.

The linter flags SortedDocValuesField, BooleanQuery, and BytesRef as unused. Clean imports improve code maintainability.

Based on learnings

Apply this diff:

-           [org.apache.lucene.document Document StringField TextField Field$Store SortedDocValuesField LongPoint DoublePoint NumericDocValuesField DoubleDocValuesField]
+           [org.apache.lucene.document Document StringField TextField Field$Store LongPoint DoublePoint NumericDocValuesField DoubleDocValuesField]
            [org.apache.lucene.index IndexWriter IndexWriterConfig IndexWriterConfig$OpenMode DirectoryReader Term IndexNotFoundException]
-           [org.apache.lucene.search IndexSearcher Query ScoreDoc TopDocs BooleanQuery BooleanQuery$Builder BooleanClause$Occur MatchAllDocsQuery TermQuery WildcardQuery TermRangeQuery Sort SortField SortField$Type TopFieldCollector TopScoreDocCollector DocValuesFieldExistsQuery]
+           [org.apache.lucene.search IndexSearcher Query ScoreDoc TopDocs BooleanQuery$Builder BooleanClause$Occur MatchAllDocsQuery TermQuery WildcardQuery TermRangeQuery Sort SortField SortField$Type TopFieldCollector TopScoreDocCollector DocValuesFieldExistsQuery]
            [org.apache.lucene.queryparser.classic QueryParser ParseException]
-           [org.apache.lucene.util BytesRef]
            [java.nio.file Paths]))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
[org.apache.lucene.document Document StringField TextField Field$Store SortedDocValuesField LongPoint DoublePoint NumericDocValuesField DoubleDocValuesField]
[org.apache.lucene.index IndexWriter IndexWriterConfig IndexWriterConfig$OpenMode DirectoryReader Term IndexNotFoundException]
[org.apache.lucene.search IndexSearcher Query ScoreDoc TopDocs]
[org.apache.lucene.search IndexSearcher Query ScoreDoc TopDocs BooleanQuery BooleanQuery$Builder BooleanClause$Occur MatchAllDocsQuery TermQuery WildcardQuery TermRangeQuery Sort SortField SortField$Type TopFieldCollector TopScoreDocCollector DocValuesFieldExistsQuery]
[org.apache.lucene.queryparser.classic QueryParser ParseException]
[org.apache.lucene.util BytesRef]
[org.apache.lucene.document Document StringField TextField Field$Store LongPoint DoublePoint NumericDocValuesField DoubleDocValuesField]
[org.apache.lucene.index IndexWriter IndexWriterConfig IndexWriterConfig$OpenMode DirectoryReader Term IndexNotFoundException]
[org.apache.lucene.search IndexSearcher Query ScoreDoc TopDocs BooleanQuery$Builder BooleanClause$Occur MatchAllDocsQuery TermQuery WildcardQuery TermRangeQuery Sort SortField SortField$Type TopFieldCollector TopScoreDocCollector DocValuesFieldExistsQuery]
[org.apache.lucene.queryparser.classic QueryParser ParseException]
[java.nio.file Paths]))
🧰 Tools
🪛 GitHub Actions: Lint project with clj-kondo

[warning] 8-8: Unused import SortedDocValuesField


[warning] 10-10: Unused import BooleanQuery


[warning] 12-12: Unused import BytesRef

🤖 Prompt for AI Agents
In src/chrondb/index/lucene.clj around lines 8 to 12, the import lists include
unused symbols SortedDocValuesField, BooleanQuery, and BytesRef; remove those
three symbols from the respective import vectors so the file only imports
actually used Lucene classes, then run the linter to confirm no remaining
unused-import warnings.

Comment on lines +66 to +96
matcher (fn matcher [doc clause]
(let [{:keys [type field value lower upper include-lower? include-upper? must should must-not filter]} clause]
(case type
:match-all true
:term (= (get doc (keyword field)) value)
:wildcard (let [pattern (-> value str/lower-case (str/replace "*" ".*") (str/replace "?" "."))
regex (re-pattern pattern)
v (some-> (get doc (keyword field)) str str/lower-case)]
(boolean (and v (re-matches regex v))))
:range (let [doc-val (get doc (keyword field))]
(when doc-val
(let [comp (compare (str doc-val) (str (or lower doc-val)))
comp-upper (compare (str doc-val) (str (or upper doc-val)))]
(and (if lower
(if include-lower?
(>= comp 0)
(> comp 0))
true)
(if upper
(if include-upper?
(<= comp-upper 0)
(< comp-upper 0))
true)))))
:boolean (let [must (every? #(matcher doc %) must)
should (if (seq should)
(some #(matcher doc %) should)
true)
must-not (every? #(not (matcher doc %)) must-not)
filter-ok (every? #(matcher doc %) filter)]
(and must should must-not filter-ok))
false)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Range comparison and term matching inconsistencies.

Two concerns with the matcher implementation:

  1. Range comparisons use string comparison (line 77-78), which won't work correctly for numeric ranges. For example, "9" > "10" in string comparison. Consider checking if values are numeric and comparing numerically.

  2. Inconsistent case sensitivity: wildcard matching is case-insensitive (line 73 str/lower-case), but term matching is case-sensitive (line 70). This could lead to unexpected behavior where :term and :wildcard queries behave differently.

Consider this improvement for numeric range handling:

:range (let [doc-val (get doc (keyword field))]
         (when doc-val
           (let [doc-num (or (try-parse-number doc-val) doc-val)
                 lower-num (when lower (or (try-parse-number lower) lower))
                 upper-num (when upper (or (try-parse-number upper) upper))
                 comp (compare doc-num (or lower-num doc-num))
                 comp-upper (compare doc-num (or upper-num doc-num))]
             ...)))

(when value
(wildcard field (str value "*"))))

(defn range
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Rename functions to avoid shadowing clojure.core.

Six functions shadow core Clojure functions: range, boolean, and, or, not, and sort-by. This creates ambiguity and potential bugs, especially visible at line 143 where or is called within the not function.

Consider prefixing with q- or using -clause suffix (e.g., range-clause, q-and, q-or, q-not, bool-clause, sort-descriptor).

Based on learnings

Example renaming:

-(defn range
+(defn range-clause
   "Creates a range clause. Provide nil for open bounds.
    For inclusive/exclusive bounds use the options map.
    Supports value-type option: :long, :double, :string (default)."
   ([field lower upper]
-   (range field lower upper {:include-lower? true :include-upper? true}))
+   (range-clause field lower upper {:include-lower? true :include-upper? true}))
   ([field lower upper {:keys [include-lower? include-upper? type]
                        :or {include-lower? true
                             include-upper? true}}]
    (when field
      {:type :range
       :field (name field)
       :lower (when (some? lower) (str lower))
       :upper (when (some? upper) (str upper))
       :include-lower? (boolean include-lower?)
       :include-upper? (boolean include-upper?)
       :value-type type})))

-(defn boolean
+(defn bool-clause
   "Generic boolean node."
   [{:keys [must should must-not filter]}]
   ...)

-(defn and
+(defn q-and
   "Combines clauses using logical AND."
   [& clauses]
   ...)

-(defn or
+(defn q-or
   "Combines clauses using logical OR."
   [& clauses]
   ...)

-(defn not
+(defn q-not
   "Negates a clause."
   [clause]
-  (boolean {:must [(match-all)]
-            :must-not [(or clause (match-all))]}))
+  (bool-clause {:must [(match-all)]
+                :must-not [(q-or clause (match-all))]}))

-(defn sort-by
+(defn sort-descriptor
   "Creates a sort descriptor. Optionally specify :direction and :type (:string, :long, :double, :doc, :score)."
   ([field]
-   (sort-by field :asc))
+   (sort-descriptor field :asc))
   ([field direction]
-   (sort-by field direction :string))
+   (sort-descriptor field direction :string))
   ([field direction type]
    {:field (name field)
     :direction (keyword direction)
     :type (keyword type)}))

Also applies to: 92-92, 121-121, 130-130, 139-139, 145-145

🧰 Tools
🪛 GitHub Actions: Lint project with clj-kondo

[warning] 34-34: range already refers to #'clojure.core/range

@@ -0,0 +1,47 @@
(ns chrondb.api.sql.parser.clauses-test
(:require [clojure.test :refer [deftest is testing]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Remove unused import.

The testing macro is imported but never used in any of the test definitions, as flagged by the linter.

Apply this diff to remove the unused import:

-(ns chrondb.api.sql.parser.clauses-test
-  (:require [clojure.test :refer [deftest is testing]]
+(ns chrondb.api.sql.parser.clauses-test
+  (:require [clojure.test :refer [deftest is]]
             [chrondb.api.sql.parser.tokenizer :as tokenizer]
             [chrondb.api.sql.parser.clauses :as clauses]))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
(:require [clojure.test :refer [deftest is testing]]
(ns chrondb.api.sql.parser.clauses-test
(:require [clojure.test :refer [deftest is]]
[chrondb.api.sql.parser.tokenizer :as tokenizer]
[chrondb.api.sql.parser.clauses :as clauses]))
🧰 Tools
🪛 GitHub Actions: Lint project with clj-kondo

[warning] 2-2: #'clojure.test/testing' is referred but never used

🤖 Prompt for AI Agents
In test/chrondb/api/sql/parser/clauses_test.clj around line 2, the require
vector imports the testing macro but it is never used; remove testing from the
(:require [clojure.test :refer [...]]) list so only used symbols (deftest and
is) are referred, updating the require form accordingly to avoid the linter
warning about an unused import.

Comment on lines +27 to 38
(let [results (index/search-query *test-index* {:field "name" :value "John"} "main" {:limit 10})]
(is (= 1 (count results)))
(is (= "user:1" (first results))))))

(testing "Remove document"
(let [doc {:id "user:1" :name "John Doe" :age 30 :email "[email protected]"}]
(index/index-document *test-index* doc)
(let [before-remove (index/search *test-index* "name" "John" "main")]
(let [before-remove (index/search-query *test-index* {:field "name" :value "John"} "main" {:limit 10})]
(is (= 1 (count before-remove)))
(index/delete-document *test-index* (:id doc))
(let [after-remove (index/search *test-index* "name" "John" "main")]
(let [after-remove (index/search-query *test-index* {:field "name" :value "John"} "main" {:limit 10})]
(is (empty? after-remove)))))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Update tests for map-shaped search results

index/search-query now returns a result map (e.g. {:ids [...], :total ...}), but the tests still treat it as a plain seq. (count results) and (first results) will now operate on the map entries instead of the document ids, so these assertions will fail. Please destructure (:ids ...) (and adjust the rest of the file in the same way) before making assertions. For example:

-      (let [results (index/search-query *test-index* {:field "name" :value "John"} "main" {:limit 10})]
-        (is (= 1 (count results)))
-        (is (= "user:1" (first results)))))
+      (let [{:keys [ids]} (index/search-query *test-index* {:field "name" :value "John"} "main" {:limit 10})]
+        (is (= 1 (count ids)))
+        (is (= "user:1" (first ids))))))

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In test/chrondb/index/lucene_test.clj around lines 27-38, the tests treat the
return value of index/search-query as a seq but it now returns a map {:ids
[...], :total ...}; update the lets that capture results, before-remove and
after-remove to destructure the map (e.g. bind ids and total from the map), then
change assertions to operate on ids (use (count ids), (first ids), and (empty?
ids)) and optionally assert :total if needed so the assertions check document
ids rather than map entries.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 12

♻️ Duplicate comments (2)
src/chrondb/index/lucene.clj (2)

85-105: Add doc values for sortable/filterable string fields.

String clauses (:exists, sorting) still fail because create-lucene-doc never writes doc values for non-numeric fields. During sorting Lucene throws IllegalStateException: field was indexed without docvalues, and DocValuesFieldExistsQuery always reports “missing”, so :exists/:missing are broken. Please add a SortedDocValuesField (or equivalent) whenever the field isn’t indexed as numeric, importing org.apache.lucene.document.SortedDocValuesField and org.apache.lucene.util.BytesRef. Example fix:

-          (.add ldoc (TextField. field-name field-value Field$Store/YES))
-
-          (cond
+          (.add ldoc (TextField. field-name field-value Field$Store/YES))
+
+          (cond
             long-val
             (let [long-arr (long-array [long-val])]
               (.add ldoc (LongPoint. field-name long-arr))
               (.add ldoc (NumericDocValuesField. field-name long-val)))
 
             double-val
             (let [double-arr (double-array [double-val])]
               (.add ldoc (DoublePoint. field-name double-arr))
               (.add ldoc (DoubleDocValuesField. field-name double-val)))
+            :else
+            (.add ldoc (SortedDocValuesField. field-name (BytesRef. field-value))))

This restores consistent behaviour for :exists, :missing, and string sorts.


147-155: Use numeric range queries for numeric value types.

Numeric range clauses still build TermRangeQuery, so numbers are compared lexicographically (“10” < “2”) and point fields (LongPoint, DoublePoint) are ignored. :range-long/:range-double therefore return wrong or empty results. Please switch to the point range APIs (e.g. LongPoint/newRangeQuery, DoublePoint/newRangeQuery) using parse-long-safe/parse-double-safe and adjust for exclusive bounds via inc/dec / Math/nextUp / Math/nextDown. Fall back to TermRangeQuery only when parsing fails. Example sketch:

-    (case value-type
-      :long (TermRangeQuery. field lower upper (boolean include-lower?) (boolean include-upper?))
-      :double (TermRangeQuery. field lower upper (boolean include-lower?) (boolean include-upper?))
-      (TermRangeQuery. field lower upper (boolean include-lower?) (boolean include-upper?)))))
+    (case value-type
+      :long (let [lo (parse-long-safe lower)
+                  hi (parse-long-safe upper)]
+              (if (and (nil? lo) (nil? hi))
+                (TermRangeQuery. field lower upper (boolean include-lower?) (boolean include-upper?))
+                (LongPoint/newRangeQuery field
+                                         (if lo (if include-lower? lo (min Long/MAX_VALUE (inc lo))) Long/MIN_VALUE)
+                                         (if hi (if include-upper? hi (max Long/MIN_VALUE (dec hi))) Long/MAX_VALUE))))
+      :double (let [lo (parse-double-safe lower)
+                    hi (parse-double-safe upper)]
+                (if (and (nil? lo) (nil? hi))
+                  (TermRangeQuery. field lower upper (boolean include-lower?) (boolean include-upper?))
+                  (DoublePoint/newRangeQuery field
+                                             (if lo (if include-lower? lo (Math/nextUp lo)) Double/NEGATIVE_INFINITY)
+                                             (if hi (if include-upper? hi (Math/nextDown hi)) Double/POSITIVE_INFINITY))))
+      (TermRangeQuery. field lower upper (boolean include-lower?) (boolean include-upper?)))))

This aligns query behaviour with how documents are indexed.

🧹 Nitpick comments (6)
docs/operations.md (1)

220-228: Well-documented operational guidance for background reindexing.

The section clearly explains the non-blocking reindexing behavior and provides actionable observability guidance (specific log namespaces and metric prefixes). The recommendation to monitor after restores/imports is particularly valuable for production operators.

Minor suggestion: Consider documenting whether the hourly maintenance interval is configurable, or if it should be for production environments with varying load patterns.

README.md (1)

27-30: Consider adding direct documentation links.

The section clearly describes the Lucene upgrade motivation and benefits. To improve discoverability, consider adding direct links to the relevant documentation sections mentioned (e.g., docs/ast-queries.md, docs/architecture.md, or configuration guides).

Apply this diff to add helpful links:

-ChronDB's indexing layer now runs entirely on Apache Lucene, replacing the previous bespoke search structures. The upgrade brings production-grade analyzers, configurable secondary and composite indexes, geospatial capabilities, and a query planner with smart caching. The work was motivated by real-world workloads that demanded lower latency and richer search semantics powered by a proven search engine. Dive into the updated documentation to learn how to enable and tune the new indexes for your applications.
+ChronDB's indexing layer now runs entirely on Apache Lucene, replacing the previous bespoke search structures. The upgrade brings production-grade analyzers, configurable secondary and composite indexes, geospatial capabilities, and a query planner with smart caching. The work was motivated by real-world workloads that demanded lower latency and richer search semantics powered by a proven search engine. See [AST Queries](docs/ast-queries.md), [Architecture](docs/architecture.md), and [Performance](docs/performance.md) documentation to learn how to enable and tune the new indexes for your applications.
src/chrondb/cli/server.clj (1)

115-117: Add error handling for background index population.

Background index population may fail silently. While it's appropriate to run asynchronously, consider adding error handling or logging to track completion status.

Apply this diff to improve observability:

       _ (println "Lucene index created successfully: " (type index))
-      ;; Ensure index is populated with existing documents (in background)
-      _ (lucene/ensure-index-populated index storage "main")]
+      ;; Ensure index is populated with existing documents (in background)
+      _ (try
+          (lucene/ensure-index-populated index storage "main")
+          (println "Background index population initiated")
+          (catch Exception e
+            (println "Warning: Background index population failed:" (.getMessage e))))]
AGENT.md (1)

74-74: Add missing punctuation.

The line is missing a period at the end.

Apply this diff:

-6. Route new search features through the Lucene planner, cache, and index definitions—avoid bespoke query engines that bypass the centralized indexing layer
+6. Route new search features through the Lucene planner, cache, and index definitions—avoid bespoke query engines that bypass the centralized indexing layer.

Based on static analysis hints.

src/chrondb/index/protocol.clj (1)

10-15: Consider adding deprecation timeline.

The protocol evolution is well-designed: the legacy search method is retained with a deprecation notice while the new search-query method is introduced with comprehensive documentation. Consider adding a timeline or version number for when the legacy method will be removed to help downstream consumers plan their migrations.

docs/ast-queries.md (1)

88-89: Clarify URL encoding for structured queries.

The example shows an EDN map in a query parameter, but in practice this would need URL encoding. Consider adding a note or showing the encoded form to prevent confusion.

Apply this diff to add clarification:

 # Structured AST query (EDN)
-GET /api/v1/search?query={:clauses [{:type :fts :field "content" :value "Software"}]}
+GET /api/v1/search?query=%7B%3Aclauses%20%5B%7B%3Atype%20%3Afts%20%3Afield%20%22content%22%20%3Avalue%20%22Software%22%7D%5D%7D
+# (URL-encoded EDN map: {:clauses [{:type :fts :field "content" :value "Software"}]})
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fbafc0b and 8bd4c0f.

📒 Files selected for processing (30)
  • AGENT.md (2 hunks)
  • README.md (1 hunks)
  • docs/architecture.md (2 hunks)
  • docs/ast-queries.md (1 hunks)
  • docs/examples-redis.md (1 hunks)
  • docs/index.md (1 hunks)
  • docs/operations.md (1 hunks)
  • docs/performance.md (1 hunks)
  • src/chrondb/api/redis/core.clj (3 hunks)
  • src/chrondb/api/sql/execution/ast_converter.clj (1 hunks)
  • src/chrondb/api/sql/execution/operators.clj (2 hunks)
  • src/chrondb/api/sql/execution/query.clj (12 hunks)
  • src/chrondb/api/sql/parser/clauses.clj (1 hunks)
  • src/chrondb/api/sql/parser/statements.clj (1 hunks)
  • src/chrondb/api/sql/protocol/messages/writer.clj (1 hunks)
  • src/chrondb/api/v1.clj (1 hunks)
  • src/chrondb/api/v1/routes.clj (3 hunks)
  • src/chrondb/cli/core.clj (3 hunks)
  • src/chrondb/cli/http.clj (1 hunks)
  • src/chrondb/cli/server.clj (1 hunks)
  • src/chrondb/example.clj (1 hunks)
  • src/chrondb/index/lucene.clj (4 hunks)
  • src/chrondb/index/memory.clj (1 hunks)
  • src/chrondb/index/protocol.clj (1 hunks)
  • src/chrondb/query/ast.clj (1 hunks)
  • test/chrondb/api/ast_integration_test.clj (1 hunks)
  • test/chrondb/api/sql/execution/query_test.clj (1 hunks)
  • test/chrondb/api/sql/parser/clauses_test.clj (1 hunks)
  • test/chrondb/index/lucene_test.clj (5 hunks)
  • test/chrondb/index/memory_test.clj (3 hunks)
🧰 Additional context used
📓 Path-based instructions (9)
docs/**/*.md

📄 CodeRabbit inference engine (AGENT.md)

Document new endpoints and wire-protocol behaviors in docs/

Files:

  • docs/performance.md
  • docs/architecture.md
  • docs/examples-redis.md
  • docs/ast-queries.md
  • docs/index.md
  • docs/operations.md
docs/**

📄 CodeRabbit inference engine (AGENT.md)

Keep developer-focused instructions in AGENT.md and maintain docs/ for user-facing and protocol documentation

Files:

  • docs/performance.md
  • docs/architecture.md
  • docs/examples-redis.md
  • docs/ast-queries.md
  • docs/index.md
  • docs/operations.md
AGENT.md

📄 CodeRabbit inference engine (AGENT.md)

Maintain and update this AGENT.md with developer-focused instructions

Files:

  • AGENT.md
src/chrondb/**/*.{clj,cljc}

📄 CodeRabbit inference engine (AGENT.md)

Namespaces must use the chrondb.* prefix (e.g., ns chrondb.core) and idiomatic kebab-case var names

Files:

  • src/chrondb/api/sql/parser/clauses.clj
  • src/chrondb/api/sql/protocol/messages/writer.clj
  • src/chrondb/cli/http.clj
  • src/chrondb/index/memory.clj
  • src/chrondb/api/sql/execution/ast_converter.clj
  • src/chrondb/api/redis/core.clj
  • src/chrondb/api/v1.clj
  • src/chrondb/api/sql/execution/operators.clj
  • src/chrondb/example.clj
  • src/chrondb/index/lucene.clj
  • src/chrondb/cli/server.clj
  • src/chrondb/query/ast.clj
  • src/chrondb/cli/core.clj
  • src/chrondb/index/protocol.clj
  • src/chrondb/api/sql/parser/statements.clj
  • src/chrondb/api/v1/routes.clj
  • src/chrondb/api/sql/execution/query.clj
**/*.{clj,cljc}

📄 CodeRabbit inference engine (AGENT.md)

**/*.{clj,cljc}: Prefer pure functions and data transformations in Clojure code
Add docstrings for public vars and functions
Follow indentation as enforced by cljfmt
Use descriptive, chronological-operation-aware logging (structured where possible)
Guard shared mutable state; prefer atoms with swap! semantics
Leverage transduce, reduce, or pipelines for efficient data flows

Files:

  • src/chrondb/api/sql/parser/clauses.clj
  • src/chrondb/api/sql/protocol/messages/writer.clj
  • src/chrondb/cli/http.clj
  • src/chrondb/index/memory.clj
  • test/chrondb/index/lucene_test.clj
  • src/chrondb/api/sql/execution/ast_converter.clj
  • src/chrondb/api/redis/core.clj
  • src/chrondb/api/v1.clj
  • src/chrondb/api/sql/execution/operators.clj
  • src/chrondb/example.clj
  • test/chrondb/api/sql/parser/clauses_test.clj
  • test/chrondb/api/ast_integration_test.clj
  • src/chrondb/index/lucene.clj
  • test/chrondb/index/memory_test.clj
  • src/chrondb/cli/server.clj
  • test/chrondb/api/sql/execution/query_test.clj
  • src/chrondb/query/ast.clj
  • src/chrondb/cli/core.clj
  • src/chrondb/index/protocol.clj
  • src/chrondb/api/sql/parser/statements.clj
  • src/chrondb/api/v1/routes.clj
  • src/chrondb/api/sql/execution/query.clj
docs/examples-*.md

📄 CodeRabbit inference engine (AGENT.md)

Provide runnable examples in docs/examples-*.md when possible

Files:

  • docs/examples-redis.md
test/**/*.{clj,cljc}

📄 CodeRabbit inference engine (AGENT.md)

test/**/*.{clj,cljc}: Include integration tests for Git-backed storage, and add regression tests for concurrency and conflict resolution
Prefer property-based tests when sequence ordering matters

Files:

  • test/chrondb/index/lucene_test.clj
  • test/chrondb/api/sql/parser/clauses_test.clj
  • test/chrondb/api/ast_integration_test.clj
  • test/chrondb/index/memory_test.clj
  • test/chrondb/api/sql/execution/query_test.clj
README.md

📄 CodeRabbit inference engine (AGENT.md)

Update README.md for major user-facing changes

Files:

  • README.md
docs/operations.md

📄 CodeRabbit inference engine (AGENT.md)

When modifying backup/restore features, update docs/operations.md with CLI/REST examples and incremental bundle flow

Files:

  • docs/operations.md
🧠 Learnings (12)
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to README.md : Update README.md for major user-facing changes

Applied to files:

  • docs/performance.md
  • AGENT.md
  • docs/index.md
  • README.md
  • docs/operations.md
📚 Learning: 2025-10-12T09:40:13.442Z
Learnt from: avelino
PR: moclojer/chrondb#59
File: src/chrondb/core.clj:158-171
Timestamp: 2025-10-12T09:40:13.442Z
Learning: ChronDB's GitStorage only implements the internal protocol chrondb.storage.protocol/Storage and is not a Closeable. There is no simple way to use with-open or try-with-resources patterns without changing the storage's public API. The storage remains open and in use by the server throughout the process lifetime.

Applied to files:

  • docs/architecture.md
  • src/chrondb/api/redis/core.clj
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to **/*.{clj,cljc} : Leverage transduce, reduce, or pipelines for efficient data flows

Applied to files:

  • AGENT.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to **/*.{clj,cljc} : Prefer pure functions and data transformations in Clojure code

Applied to files:

  • AGENT.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to AGENT.md : Maintain and update this AGENT.md with developer-focused instructions

Applied to files:

  • AGENT.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to docs/** : Keep developer-focused instructions in AGENT.md and maintain docs/ for user-facing and protocol documentation

Applied to files:

  • AGENT.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to docs/operations.md : When modifying backup/restore features, update docs/operations.md with CLI/REST examples and incremental bundle flow

Applied to files:

  • AGENT.md
  • docs/operations.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to docs/**/*.md : Document new endpoints and wire-protocol behaviors in docs/

Applied to files:

  • AGENT.md
  • docs/ast-queries.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to docs/examples-*.md : Provide runnable examples in docs/examples-*.md when possible

Applied to files:

  • AGENT.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to src/chrondb/**/*.{clj,cljc} : Namespaces must use the chrondb.* prefix (e.g., ns chrondb.core) and idiomatic kebab-case var names

Applied to files:

  • src/chrondb/api/sql/execution/ast_converter.clj
  • test/chrondb/api/sql/execution/query_test.clj
📚 Learning: 2025-10-12T09:40:13.442Z
Learnt from: avelino
PR: moclojer/chrondb#59
File: src/chrondb/core.clj:158-171
Timestamp: 2025-10-12T09:40:13.442Z
Learning: In ChronDB, the server mode (chrondb server) starts non-daemon threads for Jetty, Redis, and SQL servers that keep the process alive. Calling System/exit after server initialization would terminate the server immediately, breaking the main execution mode. The lifecycle is managed by clojure.main, which properly propagates exceptions with exit code ≠0.

Applied to files:

  • src/chrondb/api/redis/core.clj
  • src/chrondb/cli/server.clj
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to test/**/*.{clj,cljc} : Include integration tests for Git-backed storage, and add regression tests for concurrency and conflict resolution

Applied to files:

  • test/chrondb/api/sql/parser/clauses_test.clj
  • test/chrondb/api/ast_integration_test.clj
  • test/chrondb/index/memory_test.clj
🪛 LanguageTool
AGENT.md

[grammar] ~74-~74: Please add a punctuation mark at the end of paragraph.
Context: ...es that bypass the centralized indexing layer ### When Reviewing Code 1. Confirm ti...

(PUNCTUATION_PARAGRAPH_END)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run SQL Protocol Benchmark
🔇 Additional comments (10)
src/chrondb/api/sql/parser/statements.clj (1)

157-157: LGTM! Multi-row INSERT support correctly implemented.

The change from (first values) to values correctly enables multi-row INSERT statements by returning all parsed value groups rather than just the first one. This aligns with the execution layer enhancements mentioned in the summary.

docs/examples-redis.md (1)

116-139: Excellent documentation addition for AST-based search.

The examples clearly demonstrate the new Lucene-backed search capabilities with progressive complexity (basic search → pagination → sorting → branch-specific → combined options). The inclusion of the FT.SEARCH alias for RediSearch compatibility is a nice touch for users migrating from Redis.

src/chrondb/api/sql/protocol/messages/writer.clj (1)

32-42: Robust type normalization for protocol messages.

The enhanced type handling correctly addresses multiple representations (nil, numbers, Bytes) and safely converts them to int where needed. The nil check at line 41 properly prevents writing a type byte for startup messages, which aligns with PostgreSQL protocol requirements.

The Byte-to-unsigned-int conversion using bit-and is standard practice for protocol implementations.

test/chrondb/api/sql/parser/clauses_test.clj (1)

1-47: Good test coverage for current parser capabilities.

The tests appropriately cover both implemented features (standard conditions, LIKE, FTS_MATCH, to_tsquery, multiple conditions) and document unimplemented features (BETWEEN, IN, NULL comparisons) by expecting empty results.

The helper function provides a clean abstraction, and the test cases validate both the structure and content of parsed conditions.

Note: These tests validate the current skip behavior for unimplemented SQL constructs. If the parser is enhanced to return unsupported-condition markers (as suggested in the clauses.clj review), these tests should be updated accordingly.

docs/index.md (1)

19-19: LGTM! Clear feature description.

The bullet point concisely describes the new Lucene-powered search capability and aligns with the detailed documentation added in other files (architecture.md, performance.md, examples-redis.md).

src/chrondb/example.clj (1)

26-27: LGTM!

The migration to index/search-query with map-based query parameters aligns with the new AST-based search architecture. The hardcoded limit of 100 and "main" branch are appropriate for example code.

AGENT.md (1)

98-98: LGTM!

Clear guidance for maintaining documentation consistency when modifying Lucene indexing, query planning, or analyzers. The list of documentation files to update is comprehensive.

src/chrondb/cli/http.clj (1)

120-133: LGTM!

The search-documents function follows the established pattern in this file: clear docstring, destructured parameters, and conditional query parameter building with cond->. The implementation is clean and consistent.

test/chrondb/api/sql/execution/query_test.clj (1)

38-47: LGTM!

The change to return a vector of documents instead of a filtered map is more intuitive for test assertions. The transformation is consistently applied across both arities and aligns with the existing get-documents-by-table pattern in the same mock storage.

docs/ast-queries.md (1)

1-220: Excellent documentation for the AST query system!

This documentation is comprehensive, well-structured, and provides practical examples for all protocols. The migration notes, pagination strategies, and type safety sections will be especially helpful for users adopting the new system. The content aligns well with the coding guideline to document new endpoints and wire-protocol behaviors.

Comment on lines +47 to +65
## Lucene Indexing Overhaul

ChronDB's search layer now leans entirely on Apache Lucene, bringing a substantial performance uplift compared to the earlier bespoke indexes. The rework was driven by production users that needed predictable latency under complex filters and full-text workloads. By adopting Lucene's optimized query execution engine, ChronDB now delegates scoring, boolean logic, and range evaluation to proven algorithms instead of reimplementing them in-house.

### Why it matters

- **Lower tail latencies**: Composite and secondary indexes can now be configured per collection, reducing random I/O and eliminating manual fan-out scans.
- **Specialized analyzers**: Tokenization, stemming, and language-specific analyzers are first-class Lucene features and drastically improve relevance quality for full-text search scenarios.
- **Query planner**: ChronDB inspects incoming queries to choose efficient index combinations, avoiding redundant segment scans and materializing only the minimal result set.
- **Result caching**: Frequently used queries populate an eviction-aware cache, trimming recurring response times and protecting the repository from repeated heavy traversals.
- **Geospatial acceleration**: Geohash and BKD tree indexes leverage Lucene's spatial extensions, enabling fast proximity queries without bespoke data structures.

### Operational guidance

- Configure index definitions alongside schema creation so that ChronDB can warm caches and statistics proactively.
- Monitor the new index metrics under `chrondb.index.*` to track cache hit rates, planner fallbacks, and segment merge costs.
- Use the `ANALYZE INDEX` tooling to recompute query statistics when workloads shift—ChronDB will use the new data to refine its execution plans.
- When adding batch ingestion jobs, stage writes behind the asynchronous indexer to prevent cache stampedes and leverage Lucene's bulk segment writers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Comprehensive performance documentation for Lucene migration.

The section effectively communicates the rationale and benefits of the Lucene overhaul with appropriate technical depth. The operational guidance is specific and actionable, particularly the cache warming and monitoring recommendations.

Verify the ANALYZE INDEX command.

Line 63 references "ANALYZE INDEX tooling" for recomputing query statistics. Please confirm this command is implemented in this PR or documented as future work, to avoid referencing unavailable features.


🏁 Script executed:

#!/bin/bash
# Search for ANALYZE INDEX implementation or documentation
rg -i "analyze\s+index" --type clj
rg -i "analyze-index" --type clj

Length of output: 55


🏁 Script executed:

#!/bin/bash
# Search for ANALYZE INDEX without type restrictions
echo "=== Searching for ANALYZE INDEX patterns ==="
rg -i "analyze.?index" --max-count=20

echo "=== Checking file types in repo ==="
fd -t f --extension='' | head -20

echo "=== Searching in docs/ for ANALYZE INDEX ==="
rg -i "analyze.?index" docs/

echo "=== Searching for command registrations ==="
rg -i "command|query.*analyze" -A2 -B2 | head -40

Length of output: 2939


🏁 Script executed:

#!/bin/bash
# Broader search for analyze patterns in source code
echo "=== Search for 'analyze' in source files ==="
rg -i "analyze" src/ -A2 -B2 | head -50

echo "=== Search for query stats/statistics features ==="
rg -i "query.*stat|statistics" src/ --type clj | head -20

echo "=== Search for TODOs or FIXMEs mentioning index or analyze ==="
rg -i "todo|fixme.*index|todo.*analyze" src/

Length of output: 3474


Remove or clarify the ANALYZE INDEX reference in operational guidance.

Line 63 references "ANALYZE INDEX tooling" to recompute query statistics, but this command is not implemented anywhere in the codebase. Either:

  • Remove this bullet point if the feature is not yet available, or
  • Document it as planned future work and clarify current workarounds for query statistics
🤖 Prompt for AI Agents
In docs/performance.md around lines 47 to 65, the "ANALYZE INDEX" reference is
inaccurate because the command isn't implemented; either remove that bullet or
explicitly mark it as planned work and provide current workarounds: replace the
sentence with a note that ANALYZE INDEX is planned (include an expected status
like "planned" or "TBD") and add a short sentence describing current options to
refresh query statistics such as reindexing the collection, allowing the indexer
to fully warm caches after schema changes, or running targeted rebuilds of
affected indexes until the tooling is available.

Comment on lines +546 to +549
doc-ids (:ids result)
docs (filter some? (map #(storage/get-document storage % branch-name) doc-ids))

;; Format results as array of JSON strings
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Fix storage/get-document invocation arity

Everywhere else we still call storage/get-document with two args (storage + id). Passing branch-name here makes this path throw an ArityException before any result is returned, so Redis SEARCH currently fails. Drop the third argument (or add a dedicated branch-aware helper if that’s intentional).

-            docs (filter some? (map #(storage/get-document storage % branch-name) doc-ids))
+            docs (filter some? (map #(storage/get-document storage %) doc-ids))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
doc-ids (:ids result)
docs (filter some? (map #(storage/get-document storage % branch-name) doc-ids))
;; Format results as array of JSON strings
doc-ids (:ids result)
docs (filter some? (map #(storage/get-document storage %) doc-ids))
;; Format results as array of JSON strings
🤖 Prompt for AI Agents
In src/chrondb/api/redis/core.clj around lines 546 to 549, the call to
storage/get-document is invoked with three arguments (storage, id, branch-name)
which causes an ArityException because the function is defined to accept only
two (storage, id); remove the third argument so the map uses
(storage/get-document storage id) OR, if branch-specific behavior is required,
add a new helper function in storage namespace (e.g., get-document-branch) that
accepts (storage id branch-name) and call that instead, ensuring calls in this
file match the storage API.

Comment on lines +43 to +57
"=" (ast/term field clean-val)
"!=" (ast/not (ast/term field clean-val))
"<>" (ast/not (ast/term field clean-val))
">" (let [num-info (parse-number clean-val)]
(if num-info
(if (= (:type num-info) :long)
(ast/range-long field (:value num-info) nil {:include-lower? false})
(ast/range-double field (:value num-info) nil {:include-lower? false}))
(ast/range field clean-val nil {:include-lower? false})))
">=" (let [num-info (parse-number clean-val)]
(if num-info
(if (= (:type num-info) :long)
(ast/range-long field (:value num-info) nil {:include-lower? true})
(ast/range-double field (:value num-info) nil {:include-lower? true}))
(ast/range field clean-val nil {:include-lower? true})))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Handle numeric equality/inequality with numeric clauses.

parse-number already detects numeric literals, but the "=", "!=", and "<>" branches ignore it and emit string term clauses. For numeric fields (e.g., age = 30), Lucene expects a numeric query; the string term will never match the indexed numeric value, so these comparisons silently fail.

Please reuse parse-number here and build the appropriate numeric clause (e.g., an inclusive range where lower==upper). For example:

-        "=" (ast/term field clean-val)
-        "!=" (ast/not (ast/term field clean-val))
-        "<>" (ast/not (ast/term field clean-val))
+        "=" (if-let [num-info (parse-number clean-val)]
+              (case (:type num-info)
+                :long (ast/range-long field (:value num-info) (:value num-info) {:include-lower? true :include-upper? true})
+                :double (ast/range-double field (:value num-info) (:value num-info) {:include-lower? true :include-upper? true}))
+              (ast/term field clean-val))
+        "!=" (if-let [num-info (parse-number clean-val)]
+               (ast/not
+                 (case (:type num-info)
+                   :long (ast/range-long field (:value num-info) (:value num-info) {:include-lower? true :include-upper? true})
+                   :double (ast/range-double field (:value num-info) (:value num-info) {:include-lower? true :include-upper? true})))
+               (ast/not (ast/term field clean-val)))
+        "<>" (if-let [num-info (parse-number clean-val)]
+               (ast/not
+                 (case (:type num-info)
+                   :long (ast/range-long field (:value num-info) (:value num-info) {:include-lower? true :include-upper? true})
+                   :double (ast/range-double field (:value num-info) (:value num-info) {:include-lower? true :include-upper? true})))
+               (ast/not (ast/term field clean-val)))

Comment on lines 14 to +33
cond-val (str/replace (:value condition) #"['\"]" "")
operator (str/lower-case (:op condition))]
(case operator
"=" (= (str field-val) cond-val)
"!=" (not= (str field-val) cond-val)
"<>" (not= (str field-val) cond-val)
">" (> (compare (str field-val) cond-val) 0)
"<" (< (compare (str field-val) cond-val) 0)
">=" (>= (compare (str field-val) cond-val) 0)
"<=" (<= (compare (str field-val) cond-val) 0)
"like" (re-find (re-pattern (str "(?i)" (str/replace cond-val #"%" ".*"))) (str field-val))
;; Default case
(do
(log/log-warn (str "Unsupported operator: " operator))
false))))
operator (str/lower-case (:op condition))
;; Try to parse as numbers for numeric comparisons
field-num (try-parse-number field-val)
cond-num (try-parse-number cond-val)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Guard nil and non-string condition values before normalization

str/replace on (:value condition) blows up when the condition carries nil (e.g. IS NULL) or a numeric literal that already arrives as a number. In practice this now triggers NPE/ClassCastException before we even reach the operator case, so IS NULL queries crash the SQL path. Please coerce safely before stripping quotes.

-        cond-val (str/replace (:value condition) #"['\"]" "")
+        raw-cond (:value condition)
+        cond-val (when (some? raw-cond)
+                   (-> (if (string? raw-cond)
+                         raw-cond
+                         (str raw-cond))
+                       (str/replace #"['\"]" "")))
🤖 Prompt for AI Agents
In src/chrondb/api/sql/execution/operators.clj around lines 29–33, the code
calls str/replace on (:value condition) and str/lower-case on (:op condition)
without guarding for nil or non-string values, which throws when condition.value
is nil or already a number; change cond-val to only strip quotes when the value
is a string (leave nil or numeric values as-is) and coerce operator safely by
lower-casing only when it is a string (or calling str first), e.g. compute
raw-val (:value condition) then set cond-val to (if (string? raw-val)
(str/replace raw-val #"['\"]" "") raw-val) and set operator similarly (if
(string? (:op condition)) (str/lower-case (:op condition)) (str/lower-case (str
(:op condition)))) so numeric and nil values no longer cause
NPE/ClassCastException.

Comment on lines +328 to +337
(let [doc-ids (:ids lucene-result)
;; Deduplicate IDs before fetching documents
unique-doc-ids (distinct doc-ids)
docs (filter some? (map #(storage/get-document storage % branch-name) unique-doc-ids))]
(log/log-info (str "Lucene returned " (count doc-ids) " document IDs, " (count unique-doc-ids) " unique, fetched " (count docs) " documents"))
;; Deduplicate by ID to ensure no duplicates
(->> docs
(clojure.core/group-by :id)
(vals)
(map first)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Preserve Lucene ordering when removing duplicates.

After fetching documents by ID you regroup by :id and then take (vals) on the hash-map, which reorders the sequence. Any Lucene sort (or pagination cursor) becomes meaningless because results come back in arbitrary hash order. You already deduplicated IDs with distinct, so drop the group-by step and keep the original order (e.g. just (vec docs)). Without this fix ORDER BY and cursor-based paging will return unstable sequences.

-                           ;; Deduplicate by ID to ensure no duplicates
-                           (->> docs
-                                (clojure.core/group-by :id)
-                                (vals)
-                                (map first)))
+                           ;; Already deduplicated by `distinct`; preserve Lucene order
+                           (vec docs))
🤖 Prompt for AI Agents
In src/chrondb/api/sql/execution/query.clj around lines 328-337, the post-fetch
regrouping via group-by and vals reorders results and destroys Lucene's
sort/cursor ordering; remove the group-by/vals/(map first) pipeline and instead
preserve Lucene ordering by returning the fetched documents in the original
fetch order (e.g. construct docs by mapping over unique-doc-ids and filtering
nil, or simply use the docs vector as-is after deduplication), ensuring no
additional reordering occurs.

Comment on lines +94 to +167
branch-name (or (not-empty branch) "main")
;; Parse sort descriptors
sort-descriptors (parse-sort-param sort)

;; Parse searchAfter cursor
search-after-cursor (or (parse-search-after after)
(parse-search-after cursor))

;; Parse limit and offset
limit-int (when limit (try (Integer/parseInt limit) (catch Exception _ nil)))
offset-int (when offset (try (Integer/parseInt offset) (catch Exception _ nil)))

;; Build AST query
ast-query (cond
;; Use structured query if provided
(parse-structured-query query)
(let [structured (parse-structured-query query)]
(ast/query (:clauses structured)
{:sort sort-descriptors
:limit limit-int
:offset offset-int
:branch branch-name
:after search-after-cursor}))

;; Convert simple query string to AST
(some? q)
(let [fts-clause (ast/fts "content" q)]
(ast/query [fts-clause]
{:sort sort-descriptors
:limit limit-int
:offset offset-int
:branch branch-name
:after search-after-cursor}))

;; Empty query
:else
(ast/query []
{:sort sort-descriptors
:limit limit-int
:offset offset-int
:branch branch-name
:after search-after-cursor}))

_ (log/log-info (str "AST query built: " ast-query))

;; Build options map
opts (cond-> {}
limit-int (assoc :limit limit-int)
offset-int (assoc :offset offset-int)
(seq sort-descriptors) (assoc :sort sort-descriptors)
search-after-cursor (assoc :after search-after-cursor))

;; Execute query
result (if (seq (:clauses ast-query))
(index/search-query index ast-query branch-name opts)
{:ids [] :total 0 :limit (or limit-int 100) :offset (or offset-int 0)})

;; Fetch full documents from storage using IDs
results (if (seq (:ids result))
(filter some? (map #(storage/get-document storage % branch-name) (:ids result)))
[])

;; Prepare response with next cursor if available
response-body (cond-> {:results (vec results)
:total (:total result)
:limit (:limit result)
:offset (:offset result)}
(:next-cursor result)
(assoc :next-cursor
(let [encoder (java.util.Base64/getEncoder)
cursor-data {:doc (.-doc ^org.apache.lucene.search.ScoreDoc (:next-cursor result))
:score (.-score ^org.apache.lucene.search.ScoreDoc (:next-cursor result))}
encoded (String. (.encode encoder (.getBytes (pr-str cursor-data) "UTF-8")) "UTF-8")]
encoded)))]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Respect the configured default branch.

We again fall back to "main" whenever the client omits branch. On instances whose configured default branch is "master" (or any other value), every search will point at a non-existent branch and return empty results. Please load the default branch from configuration (you may need to require [chrondb.config :as config]) and only fall back to "main" if the config is unset/blank.

-        {:keys [q query branch limit offset sort after cursor]} params
-        branch-name (or (not-empty branch) "main")
+        {:keys [q query branch limit offset sort after cursor]} params
+        cfg (config/load-config)
+        configured-branch (some-> (get-in cfg [:git :default-branch]) str/trim)
+        branch-name (if (str/blank? branch)
+                      (or (not (str/blank? configured-branch) configured-branch) "main")
+                      branch)

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/chrondb/api/v1/routes.clj around lines 94-167, the branch-name binding
always falls back to the hard-coded "main" when branch is missing; change it to
read the configured default branch from chrondb.config (e.g. require
[chrondb.config :as config]) and use that value when branch is blank, only
falling back to "main" if the config value is nil/blank; update the branch-name
expression to prefer the explicit branch param, then the config default
(checking non-empty), and finally "main", and add the necessary require at the
top of the file.

Comment on lines +105 to +107
(sql-server/start-server storage index sql-port))
;; Start index maintenance task (runs every 60 minutes by default)
(lucene/start-index-maintenance-task index 60))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Add error handling for index maintenance task.

Starting the index maintenance task lacks error handling. If lucene/start-index-maintenance-task throws an exception, the servers will be partially initialized without maintenance, and the error may be silently swallowed.

Apply this diff to add error handling:

   (when-not disable-sql
     (println "Starting SQL protocol server on port" sql-port)
     (sql-server/start-server storage index sql-port))
-  ;; Start index maintenance task (runs every 60 minutes by default)
-  (lucene/start-index-maintenance-task index 60))
+  ;; Start index maintenance task (runs every 60 minutes by default)
+  (try
+    (lucene/start-index-maintenance-task index 60)
+    (println "Index maintenance task started successfully")
+    (catch Exception e
+      (println "Warning: Failed to start index maintenance task:" (.getMessage e)))))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
(sql-server/start-server storage index sql-port))
;; Start index maintenance task (runs every 60 minutes by default)
(lucene/start-index-maintenance-task index 60))
(sql-server/start-server storage index sql-port))
;; Start index maintenance task (runs every 60 minutes by default)
(try
(lucene/start-index-maintenance-task index 60)
(println "Index maintenance task started successfully")
(catch Exception e
(println "Warning: Failed to start index maintenance task:" (.getMessage e)))))
🤖 Prompt for AI Agents
In src/chrondb/cli/server.clj around lines 105 to 107, the call to
lucene/start-index-maintenance-task lacks error handling and could throw,
leaving servers partially initialized; wrap the call in a try/catch (or
try/catch and finally) that catches Throwable, logs the error with context and
the exception stack (using the existing logger), and then either rethrow or call
a controlled shutdown/exit if you want to fail fast; ensure the catch does not
silently swallow the exception and that the log message clearly states the
maintenance task failed and includes the exception details.

Comment on lines +67 to +75
(let [{:keys [type field value lower upper include-lower? include-upper? must should must-not filter]} clause]
(case type
:match-all true
:term (= (get doc (keyword field)) value)
:wildcard (let [pattern (-> value str/lower-case (str/replace "*" ".*") (str/replace "?" "."))
regex (re-pattern pattern)
v (some-> (get doc (keyword field)) str str/lower-case)]
(boolean (and v (re-matches regex v))))
:range (let [doc-val (get doc (keyword field))]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Escape regex metacharacters when translating wildcards

The wildcard branch currently replaces */? but leaves every other regex metacharacter untouched. Searching for acme.com ends up matching acmexcom, [foo], etc., because ./[] are interpreted by the regex engine. Please escape the literal characters before expanding the wildcard tokens.

-                        :wildcard (let [pattern (-> value str/lower-case (str/replace "*" ".*") (str/replace "?" "."))
-                                        regex (re-pattern pattern)
+                        :wildcard (let [regex (->> (str/lower-case (str value))
+                                                   (map (fn [ch]
+                                                          (case ch
+                                                            \* ".*"
+                                                            \? "."
+                                                            (java.util.regex.Pattern/quote (str ch)))))
+                                                   (apply str)
+                                                   re-pattern)
                                         v (some-> (get doc (keyword field)) str str/lower-case)]
                                     (boolean (and v (re-matches regex v))))
🤖 Prompt for AI Agents
In src/chrondb/index/memory.clj around lines 67 to 75, the wildcard branch
builds a regex by only replacing '*' and '?' which leaves other regex
metacharacters unescaped (so e.g. '.' or '[]' are treated as regex syntax). Fix
by first escaping regex metacharacters in the input string (except '*' and '?')
and then expand '*' -> '.*' and '?' -> '.'; e.g. convert value to
string/lower-case, run a replace that escapes characters like . \ + ^ $ ( ) { }
[ ] | so they become literal (or use a targeted replace with a capture group to
prefix them with a backslash), then replace the escaped '*'/'?' tokens with ".*"
and "." respectively, build the re-pattern from that escaped-and-transformed
string, and use it for re-matches.

Comment on lines +75 to +88
:range (let [doc-val (get doc (keyword field))]
(when doc-val
(let [comp (compare (str doc-val) (str (or lower doc-val)))
comp-upper (compare (str doc-val) (str (or upper doc-val)))]
(and (if lower
(if include-lower?
(>= comp 0)
(> comp 0))
true)
(if upper
(if include-upper?
(<= comp-upper 0)
(< comp-upper 0))
true)))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Handle numeric range comparisons numerically

compare on stringified values breaks numeric ranges (:age 10 fails the > 2 test because "10" < "2" lexicographically). We need to coerce inputs to numbers when possible, falling back to case-insensitive string compare only when both sides are non-numeric.

-                        :range (let [doc-val (get doc (keyword field))]
-                                 (when doc-val
-                                   (let [comp (compare (str doc-val) (str (or lower doc-val)))
-                                         comp-upper (compare (str doc-val) (str (or upper doc-val)))]
-                                     (and (if lower
-                                            (if include-lower?
-                                              (>= comp 0)
-                                              (> comp 0))
-                                            true)
-                                          (if upper
-                                            (if include-upper?
-                                              (<= comp-upper 0)
-                                              (< comp-upper 0))
-                                            true)))))
+                        :range (let [doc-val (get doc (keyword field))
+                                     doc-num (parse-num doc-val)
+                                     lower-num (parse-num lower)
+                                     upper-num (parse-num upper)
+                                     doc-str (when (some? doc-val) (str/lower-case (str doc-val)))
+                                     lower-str (when (some? lower) (str/lower-case (str lower)))
+                                     upper-str (when (some? upper) (str/lower-case (str upper)))]
+                                 (when (some? doc-val)
+                                   (and (if (some? lower)
+                                          (if (and doc-num lower-num)
+                                            (if include-lower?
+                                              (>= doc-num lower-num)
+                                              (> doc-num lower-num))
+                                            (let [cmp (compare doc-str lower-str)]
+                                              (if include-lower? (>= cmp 0) (> cmp 0))))
+                                          true)
+                                        (if (some? upper)
+                                          (if (and doc-num upper-num)
+                                            (if include-upper?
+                                              (<= doc-num upper-num)
+                                              (< doc-num upper-num))
+                                            (let [cmp (compare doc-str upper-str)]
+                                              (if include-upper? (<= cmp 0) (< cmp 0))))
+                                          true))))

Add a local parse-num helper (as above) that returns a Double for numeric strings or numbers, otherwise nil.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/chrondb/index/memory.clj around lines 75 to 88, the current range
comparison stringifies values and uses compare which breaks numeric ranges (e.g.
10 vs 2). Add a local parse-num helper that returns a Double when given a
numeric value or numeric string and nil otherwise, then in the :range branch try
to parse both doc-val and bound (lower/upper) into numbers; if both sides are
numeric perform numeric comparisons (>, >=, <, <=) using those Doubles,
otherwise fall back to case-insensitive string compare as before; preserve the
existing include-lower?/include-upper? logic and short-circuit when lower/upper
are nil.

Comment on lines +27 to 38
(let [results (index/search-query *test-index* {:field "name" :value "John"} "main" {:limit 10})]
(is (= 1 (count results)))
(is (= "user:1" (first results))))))

(testing "Remove document"
(let [doc {:id "user:1" :name "John Doe" :age 30 :email "[email protected]"}]
(index/index-document *test-index* doc)
(let [before-remove (index/search *test-index* "name" "John" "main")]
(let [before-remove (index/search-query *test-index* {:field "name" :value "John"} "main" {:limit 10})]
(is (= 1 (count before-remove)))
(index/delete-document *test-index* (:id doc))
(let [after-remove (index/search *test-index* "name" "John" "main")]
(let [after-remove (index/search-query *test-index* {:field "name" :value "John"} "main" {:limit 10})]
(is (empty? after-remove)))))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Update expectations for the new search-query result shape.

index/search-query now returns a map (e.g., {:ids [...], :total ..., ...}) as used by the REST handler and the AST integration tests. Treating the return value as a plain vector (count results, first results) no longer works and these assertions will fail. Please destructure :ids (and any other fields you need) before asserting.

-      (let [results (index/search-query *test-index* {:field "name" :value "John"} "main" {:limit 10})]
-        (is (= 1 (count results)))
-        (is (= "user:1" (first results))))
+      (let [{:keys [ids]} (index/search-query *test-index* {:field "name" :value "John"} "main" {:limit 10})]
+        (is (= 1 (count ids)))
+        (is (= "user:1" (first ids))))

Please apply the same adjustment to the other assertions in this file.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
(let [results (index/search-query *test-index* {:field "name" :value "John"} "main" {:limit 10})]
(is (= 1 (count results)))
(is (= "user:1" (first results))))))
(testing "Remove document"
(let [doc {:id "user:1" :name "John Doe" :age 30 :email "[email protected]"}]
(index/index-document *test-index* doc)
(let [before-remove (index/search *test-index* "name" "John" "main")]
(let [before-remove (index/search-query *test-index* {:field "name" :value "John"} "main" {:limit 10})]
(is (= 1 (count before-remove)))
(index/delete-document *test-index* (:id doc))
(let [after-remove (index/search *test-index* "name" "John" "main")]
(let [after-remove (index/search-query *test-index* {:field "name" :value "John"} "main" {:limit 10})]
(is (empty? after-remove)))))))
(let [{:keys [ids]} (index/search-query *test-index* {:field "name" :value "John"} "main" {:limit 10})]
(is (= 1 (count ids)))
(is (= "user:1" (first ids))))))
(testing "Remove document"
(let [doc {:id "user:1" :name "John Doe" :age 30 :email "[email protected]"}]
(index/index-document *test-index* doc)
(let [before-remove (index/search-query *test-index* {:field "name" :value "John"} "main" {:limit 10})]
(is (= 1 (count before-remove)))
(index/delete-document *test-index* (:id doc))
(let [after-remove (index/search-query *test-index* {:field "name" :value "John"} "main" {:limit 10})]
(is (empty? after-remove)))))))
🤖 Prompt for AI Agents
In test/chrondb/index/lucene_test.clj around lines 27 to 38, the test assumes
index/search-query returns a plain vector but it now returns a map (e.g. {:ids
[...], :total ...}); update the assertions to destructure the map and operate on
:ids (and any other needed keys) instead — bind results (or {:keys [ids total]
:as results}) then use (count ids) and (first ids) for your existing checks, and
apply the same destructuring and replacement to the other assertions in this
file that treat the return as a vector.

…y ensuring Jackson classes are runtime-initialized

- Fixes native-image docker build failures caused by Jackson JSON classes with static initialization constraints.
- Registers `com.fasterxml.jackson.core.JsonFactory` and `com.fasterxml.jackson.core.io.SerializedString` in `base-run-time-classes` to explicitly force runtime initialization.
- Updates AGENT.md for developer guidance: all changes to JSON/Jackson usage must keep this runtime-classes list in sync.
- No functional API change; allows docker builds to succeed and ChronDB start up reliably in GraalVM images.

Signed-off-by: Avelino <[email protected]>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8bd4c0f and 5780a93.

📒 Files selected for processing (3)
  • AGENT.md (3 hunks)
  • dev/chrondb/native_image.clj (2 hunks)
  • src/chrondb/core.clj (2 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
src/chrondb/**/*.{clj,cljc}

📄 CodeRabbit inference engine (AGENT.md)

Namespaces must use the chrondb.* prefix (e.g., ns chrondb.core) and idiomatic kebab-case var names

Files:

  • src/chrondb/core.clj
**/*.{clj,cljc}

📄 CodeRabbit inference engine (AGENT.md)

**/*.{clj,cljc}: Prefer pure functions and data transformations in Clojure code
Add docstrings for public vars and functions
Follow indentation as enforced by cljfmt
Use descriptive, chronological-operation-aware logging (structured where possible)
Guard shared mutable state; prefer atoms with swap! semantics
Leverage transduce, reduce, or pipelines for efficient data flows

Files:

  • src/chrondb/core.clj
  • dev/chrondb/native_image.clj
dev/chrondb/native_image.clj

📄 CodeRabbit inference engine (AGENT.md)

Generate uberjar and GraalVM metadata via clojure -M:build -- --uberjar from the native_image helper module

Files:

  • dev/chrondb/native_image.clj
AGENT.md

📄 CodeRabbit inference engine (AGENT.md)

Maintain and update this AGENT.md with developer-focused instructions

Files:

  • AGENT.md
🧠 Learnings (18)
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to src/chrondb/**/*.{clj,cljc} : Namespaces must use the chrondb.* prefix (e.g., ns chrondb.core) and idiomatic kebab-case var names

Applied to files:

  • src/chrondb/core.clj
  • dev/chrondb/native_image.clj
  • AGENT.md
📚 Learning: 2025-10-12T09:40:13.442Z
Learnt from: avelino
PR: moclojer/chrondb#59
File: src/chrondb/core.clj:158-171
Timestamp: 2025-10-12T09:40:13.442Z
Learning: In ChronDB, the server mode (chrondb server) starts non-daemon threads for Jetty, Redis, and SQL servers that keep the process alive. Calling System/exit after server initialization would terminate the server immediately, breaking the main execution mode. The lifecycle is managed by clojure.main, which properly propagates exceptions with exit code ≠0.

Applied to files:

  • src/chrondb/core.clj
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to Dockerfile : Docker build steps: install Clojure, run clojure -M:build -- --uberjar to produce target/chrondb.jar, then native-image target/native-image-args -H:Name=chrondb

Applied to files:

  • src/chrondb/core.clj
  • dev/chrondb/native_image.clj
  • AGENT.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to dev/chrondb/native_image.clj : Generate uberjar and GraalVM metadata via clojure -M:build -- --uberjar from the native_image helper module

Applied to files:

  • src/chrondb/core.clj
  • dev/chrondb/native_image.clj
  • AGENT.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to target/native-image-args : Always include --features=clj_easy.graal_build_time.InitClojureClasses in native-image arguments

Applied to files:

  • dev/chrondb/native_image.clj
  • AGENT.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to target/native-config/** : When adding reflection or dynamic class loading, add explicit GraalVM configuration under target/native-config/

Applied to files:

  • dev/chrondb/native_image.clj
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to target/native-image-args : Do not include -H:+UnlockExperimentalVMOptions in native-image arguments (JGit classes already set for build-time init)

Applied to files:

  • dev/chrondb/native_image.clj
  • AGENT.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to README.md : Update README.md for major user-facing changes

Applied to files:

  • AGENT.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to **/*.{clj,cljc} : Leverage transduce, reduce, or pipelines for efficient data flows

Applied to files:

  • AGENT.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to **/*.{clj,cljc} : Prefer pure functions and data transformations in Clojure code

Applied to files:

  • AGENT.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to AGENT.md : Maintain and update this AGENT.md with developer-focused instructions

Applied to files:

  • AGENT.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to docs/** : Keep developer-focused instructions in AGENT.md and maintain docs/ for user-facing and protocol documentation

Applied to files:

  • AGENT.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to docs/operations.md : When modifying backup/restore features, update docs/operations.md with CLI/REST examples and incremental bundle flow

Applied to files:

  • AGENT.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to docs/**/*.md : Document new endpoints and wire-protocol behaviors in docs/

Applied to files:

  • AGENT.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to docs/examples-*.md : Provide runnable examples in docs/examples-*.md when possible

Applied to files:

  • AGENT.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to .github/workflows/build-native-image.yml : Use build-native-image.yml as the reference for native-image pipeline and smoke tests; keep it aligned with documented steps

Applied to files:

  • AGENT.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to Dockerfile : Dockerfile must use builder image ghcr.io/graalvm/native-image:ol9-java17-22.3.2 and runtime base debian:12-slim

Applied to files:

  • AGENT.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to test/**/*.{clj,cljc} : Include integration tests for Git-backed storage, and add regression tests for concurrency and conflict resolution

Applied to files:

  • AGENT.md
🪛 LanguageTool
AGENT.md

[grammar] ~74-~74: Please add a punctuation mark at the end of paragraph.
Context: ...es that bypass the centralized indexing layer ### When Reviewing Code 1. Confirm ti...

(PUNCTUATION_PARAGRAPH_END)


[grammar] ~117-~117: Please add a punctuation mark at the end of paragraph.
Context: ... runtime instead of during native image analysis ### Advanced Native Image Topics - **...

(PUNCTUATION_PARAGRAPH_END)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Run SQL Protocol Benchmark
  • GitHub Check: test
  • GitHub Check: test
🔇 Additional comments (4)
AGENT.md (1)

98-98: Comprehensive documentation guidance for Lucene changes.

The added instruction appropriately covers all relevant documentation files that need updates when modifying Lucene indexing capabilities.

dev/chrondb/native_image.clj (3)

108-110: Jackson runtime initialization properly configured.

The addition of JsonFactory and SerializedString to runtime classes ensures Jackson components initialize at runtime rather than build time, preventing potential issues with JSON processing in the native image. This aligns with the guidance added in AGENT.md.

Based on learnings.


274-274: Improved naming clarity.

Renaming trace-args to trace-object-args better reflects the actual flag being used (--trace-object-instantiation=).


276-280: Logical argument ordering.

The reordering separates trace-specific arguments from the general tail-args, improving the organization and readability of the native-image argument structure.

3. Leverage `transduce`, `reduce`, or pipelines for efficient data flows
4. Guard shared mutable state; prefer atoms with swap! semantics
5. Profile Git-heavy operations when adding loops or recursion
6. Route new search features through the Lucene planner, cache, and index definitions—avoid bespoke query engines that bypass the centralized indexing layer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add missing punctuation mark.

The sentence should end with a period per standard grammar conventions for list items.

Apply this diff:

-6. Route new search features through the Lucene planner, cache, and index definitions—avoid bespoke query engines that bypass the centralized indexing layer
+6. Route new search features through the Lucene planner, cache, and index definitions—avoid bespoke query engines that bypass the centralized indexing layer.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
6. Route new search features through the Lucene planner, cache, and index definitions—avoid bespoke query engines that bypass the centralized indexing layer
6. Route new search features through the Lucene planner, cache, and index definitions—avoid bespoke query engines that bypass the centralized indexing layer.
🧰 Tools
🪛 LanguageTool

[grammar] ~74-~74: Please add a punctuation mark at the end of paragraph.
Context: ...es that bypass the centralized indexing layer ### When Reviewing Code 1. Confirm ti...

(PUNCTUATION_PARAGRAPH_END)

🤖 Prompt for AI Agents
In AGENT.md around line 74, the list item lacks a terminating period; update the
line so it ends with a period (add a "." at the end of the sentence) to match
standard punctuation for list items.

Comment on lines +116 to +117
- Ensure Jackson factories stay runtime-initialized: `dev/chrondb/native_image.clj` registers `com.fasterxml.jackson.core.JsonFactory` and related classes under `base-run-time-classes`; keep this list in sync when touching JSON dependencies
- Keep CLI namespaces lazily loaded (use `requiring-resolve` as in `chrondb.core`) so `clj-http`/`cheshire` initialization happens at runtime instead of during native image analysis
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add missing punctuation mark.

Line 117 should end with a period per standard grammar conventions.

Apply this diff:

-- Keep CLI namespaces lazily loaded (use `requiring-resolve` as in `chrondb.core`) so `clj-http`/`cheshire` initialization happens at runtime instead of during native image analysis
+- Keep CLI namespaces lazily loaded (use `requiring-resolve` as in `chrondb.core`) so `clj-http`/`cheshire` initialization happens at runtime instead of during native image analysis.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- Ensure Jackson factories stay runtime-initialized: `dev/chrondb/native_image.clj` registers `com.fasterxml.jackson.core.JsonFactory` and related classes under `base-run-time-classes`; keep this list in sync when touching JSON dependencies
- Keep CLI namespaces lazily loaded (use `requiring-resolve` as in `chrondb.core`) so `clj-http`/`cheshire` initialization happens at runtime instead of during native image analysis
- Ensure Jackson factories stay runtime-initialized: `dev/chrondb/native_image.clj` registers `com.fasterxml.jackson.core.JsonFactory` and related classes under `base-run-time-classes`; keep this list in sync when touching JSON dependencies
- Keep CLI namespaces lazily loaded (use `requiring-resolve` as in `chrondb.core`) so `clj-http`/`cheshire` initialization happens at runtime instead of during native image analysis.
🧰 Tools
🪛 LanguageTool

[grammar] ~117-~117: Please add a punctuation mark at the end of paragraph.
Context: ... runtime instead of during native image analysis ### Advanced Native Image Topics - **...

(PUNCTUATION_PARAGRAPH_END)

🤖 Prompt for AI Agents
AGENT.md around lines 116-117: the sentence on line 117 is missing a terminal
period; add a period at the end of the second bullet so the line reads as a
complete sentence; update the punctuation only, preserving existing wording and
spacing.

Comment on lines 73 to 82
(println (server-usage))
(println)
(println (cli/usage)))
:cli (apply cli/-main args)
(let [{:keys [command args value]} (server/parse-command args)]
(println (cli-usage)))
:cli (apply cli-main args)
(let [{:keys [command args value]} (parse-server-command args)]
(if (= command :unknown)
(do
(println "Unknown command" value)
(println (server/usage)))
(server/dispatch! command args)))))
(println (server-usage)))
(dispatch-server! command args)))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Avoid printing nil after usage helpers

Both chrondb.cli.server/usage and chrondb.cli.core/usage already print their banners and return nil. Wrapping them in println now adds a literal nil line after the help output. Guard the println so we only emit when the helper actually returns text, while still invoking the helper for its side effects.

-    :help (do
-            (println (server-usage))
-            (println)
-            (println (cli-usage)))
+    :help (do
+            (when-let [banner (server-usage)]
+              (println banner))
+            (println)
+            (when-let [banner (cli-usage)]
+              (println banner)))
@@
-        (do
-          (println "Unknown command" value)
-          (println (server-usage)))
+        (do
+          (println "Unknown command" value)
+          (when-let [banner (server-usage)]
+            (println banner)))

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/chrondb/core.clj around lines 73 to 82, the usage helpers are being
wrapped in println which prints a trailing "nil" because
chrondb.cli.server/usage and chrondb.cli.core/usage print their own banners and
return nil; remove the println wrappers or guard them: call the usage helper for
its side effects and only println its return value if non-nil (e.g. capture the
return and (when result (println result))). Ensure the helpers are still invoked
so banners appear but avoid emitting a literal "nil" line.

…tests

- Implement `chrondb.api.sql.execution.ast-converter` to translate SQL WHERE condition maps to unified AST query clauses used by Lucene.
- Support FTS, standard operators (=, !=, <>, >, <, >=, <=), LIKE (wildcards), IS [NOT] NULL, and number parsing for range queries.
- Gracefully handle unknown types and operators, always preserving immutability and fast failure semantics.
- Add unit tests for all major conversion and edge cases to `test/chrondb/api/sql/execution/ast_converter_test.clj`.
- Document AST composition and query helper contract in `docs/ast-queries.md`.
- Cross-link property-based test location in AGENT.md for better developer guidance.

This supports full protocol-level parity and reproducible query planning across ChronDB HTTP, Redis, and SQL endpoints.

Signed-off-by: Avelino <[email protected]>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
src/chrondb/api/sql/execution/ast_converter.clj (1)

43-45: Handle numeric equality/inequality with numeric clauses.

The equality operators still use string term clauses, which won't match numeric field values indexed by Lucene. For example, age = 30 will fail because the indexed numeric value won't match a string term query. This was flagged in a previous review.

Reuse parse-number to detect numeric literals and build appropriate numeric range clauses (with lower==upper for equality):

-        "=" (ast/term field clean-val)
-        "!=" (ast/not (ast/term field clean-val))
-        "<>" (ast/not (ast/term field clean-val))
+        "=" (if-let [num-info (parse-number clean-val)]
+              (case (:type num-info)
+                :long (ast/range-long field (:value num-info) (:value num-info) {:include-lower? true :include-upper? true})
+                :double (ast/range-double field (:value num-info) (:value num-info) {:include-lower? true :include-upper? true}))
+              (ast/term field clean-val))
+        "!=" (if-let [num-info (parse-number clean-val)]
+               (ast/not
+                 (case (:type num-info)
+                   :long (ast/range-long field (:value num-info) (:value num-info) {:include-lower? true :include-upper? true})
+                   :double (ast/range-double field (:value num-info) (:value num-info) {:include-lower? true :include-upper? true})))
+               (ast/not (ast/term field clean-val)))
+        "<>" (if-let [num-info (parse-number clean-val)]
+               (ast/not
+                 (case (:type num-info)
+                   :long (ast/range-long field (:value num-info) (:value num-info) {:include-lower? true :include-upper? true})
+                   :double (ast/range-double field (:value num-info) (:value num-info) {:include-lower? true :include-upper? true})))
+               (ast/not (ast/term field clean-val)))
AGENT.md (2)

75-75: Add missing punctuation mark.

The sentence should end with a period per standard grammar conventions.

Apply this diff:

-6. Route new search features through the Lucene planner, cache, and index definitions—avoid bespoke query engines that bypass the centralized indexing layer
+6. Route new search features through the Lucene planner, cache, and index definitions—avoid bespoke query engines that bypass the centralized indexing layer.

118-118: Add missing punctuation mark.

The sentence should end with a period per standard grammar conventions.

Apply this diff:

-- Keep CLI namespaces lazily loaded (use `requiring-resolve` as in `chrondb.core`) so `clj-http`/`cheshire` initialization happens at runtime instead of during native image analysis
+- Keep CLI namespaces lazily loaded (use `requiring-resolve` as in `chrondb.core`) so `clj-http`/`cheshire` initialization happens at runtime instead of during native image analysis.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5780a93 and 8d9fa50.

📒 Files selected for processing (5)
  • AGENT.md (4 hunks)
  • docs/ast-queries.md (1 hunks)
  • src/chrondb/api/sql/execution/ast_converter.clj (1 hunks)
  • test/chrondb/api/sql/execution/ast_converter_test.clj (1 hunks)
  • test/chrondb/query/ast_test.clj (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • docs/ast-queries.md
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{clj,cljc}

📄 CodeRabbit inference engine (AGENT.md)

**/*.{clj,cljc}: Prefer pure functions and data transformations in Clojure code
Add docstrings for public vars and functions
Follow indentation as enforced by cljfmt
Use descriptive, chronological-operation-aware logging (structured where possible)
Guard shared mutable state; prefer atoms with swap! semantics
Leverage transduce, reduce, or pipelines for efficient data flows

Files:

  • test/chrondb/api/sql/execution/ast_converter_test.clj
  • test/chrondb/query/ast_test.clj
  • src/chrondb/api/sql/execution/ast_converter.clj
test/**/*.{clj,cljc}

📄 CodeRabbit inference engine (AGENT.md)

test/**/*.{clj,cljc}: Include integration tests for Git-backed storage, and add regression tests for concurrency and conflict resolution
Prefer property-based tests when sequence ordering matters

Files:

  • test/chrondb/api/sql/execution/ast_converter_test.clj
  • test/chrondb/query/ast_test.clj
src/chrondb/**/*.{clj,cljc}

📄 CodeRabbit inference engine (AGENT.md)

Namespaces must use the chrondb.* prefix (e.g., ns chrondb.core) and idiomatic kebab-case var names

Files:

  • src/chrondb/api/sql/execution/ast_converter.clj
AGENT.md

📄 CodeRabbit inference engine (AGENT.md)

Maintain and update this AGENT.md with developer-focused instructions

Files:

  • AGENT.md
🧠 Learnings (17)
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to test/**/*.{clj,cljc} : Include integration tests for Git-backed storage, and add regression tests for concurrency and conflict resolution

Applied to files:

  • test/chrondb/api/sql/execution/ast_converter_test.clj
  • test/chrondb/query/ast_test.clj
  • AGENT.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to test/**/*.{clj,cljc} : Prefer property-based tests when sequence ordering matters

Applied to files:

  • test/chrondb/query/ast_test.clj
  • AGENT.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to src/chrondb/**/*.{clj,cljc} : Namespaces must use the chrondb.* prefix (e.g., ns chrondb.core) and idiomatic kebab-case var names

Applied to files:

  • src/chrondb/api/sql/execution/ast_converter.clj
  • AGENT.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to README.md : Update README.md for major user-facing changes

Applied to files:

  • AGENT.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to **/*.{clj,cljc} : Leverage transduce, reduce, or pipelines for efficient data flows

Applied to files:

  • AGENT.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to **/*.{clj,cljc} : Prefer pure functions and data transformations in Clojure code

Applied to files:

  • AGENT.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to AGENT.md : Maintain and update this AGENT.md with developer-focused instructions

Applied to files:

  • AGENT.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to docs/** : Keep developer-focused instructions in AGENT.md and maintain docs/ for user-facing and protocol documentation

Applied to files:

  • AGENT.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to docs/operations.md : When modifying backup/restore features, update docs/operations.md with CLI/REST examples and incremental bundle flow

Applied to files:

  • AGENT.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to docs/**/*.md : Document new endpoints and wire-protocol behaviors in docs/

Applied to files:

  • AGENT.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to docs/examples-*.md : Provide runnable examples in docs/examples-*.md when possible

Applied to files:

  • AGENT.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to dev/chrondb/native_image.clj : Generate uberjar and GraalVM metadata via clojure -M:build -- --uberjar from the native_image helper module

Applied to files:

  • AGENT.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to Dockerfile : Docker build steps: install Clojure, run clojure -M:build -- --uberjar to produce target/chrondb.jar, then native-image target/native-image-args -H:Name=chrondb

Applied to files:

  • AGENT.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to .github/workflows/build-native-image.yml : Use build-native-image.yml as the reference for native-image pipeline and smoke tests; keep it aligned with documented steps

Applied to files:

  • AGENT.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to target/native-image-args : Do not include -H:+UnlockExperimentalVMOptions in native-image arguments (JGit classes already set for build-time init)

Applied to files:

  • AGENT.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to target/native-image-args : Always include --features=clj_easy.graal_build_time.InitClojureClasses in native-image arguments

Applied to files:

  • AGENT.md
📚 Learning: 2025-10-12T09:48:12.351Z
Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to Dockerfile : Dockerfile must use builder image ghcr.io/graalvm/native-image:ol9-java17-22.3.2 and runtime base debian:12-slim

Applied to files:

  • AGENT.md
🪛 LanguageTool
AGENT.md

[grammar] ~75-~75: Please add a punctuation mark at the end of paragraph.
Context: ...es that bypass the centralized indexing layer ### When Reviewing Code 1. Confirm ti...

(PUNCTUATION_PARAGRAPH_END)


[grammar] ~118-~118: Please add a punctuation mark at the end of paragraph.
Context: ... runtime instead of during native image analysis ### Advanced Native Image Topics - **...

(PUNCTUATION_PARAGRAPH_END)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Run SQL Protocol Benchmark
  • GitHub Check: test
  • GitHub Check: test
🔇 Additional comments (11)
src/chrondb/api/sql/execution/ast_converter.clj (3)

6-23: LGTM!

The helper functions correctly strip quotes and parse numeric values with appropriate fallback handling.


46-74: LGTM!

The comparison operators (>, >=, <, <=) correctly use parse-number to detect numeric values and create appropriate range clauses. The LIKE, IS NULL, and IS NOT NULL operators map correctly to their AST equivalents.


79-91: LGTM!

The function correctly uses keep to filter nil clauses and appropriately combines multiple conditions with AND logic, with proper handling of single-condition edge cases.

AGENT.md (2)

57-57: LGTM!

Good addition documenting the location of AST query test coverage, which aligns with the new AST-based query architecture introduced in this PR.


99-99: LGTM!

Appropriate guidance for maintainers to update architecture and performance documentation when modifying Lucene indexing or query planning components.

test/chrondb/api/sql/execution/ast_converter_test.clj (1)

6-12: LGTM!

The FTS test and multiple conditions aggregation tests are well-structured and cover the expected behavior correctly.

Also applies to: 100-116

test/chrondb/query/ast_test.clj (5)

5-39: LGTM!

Excellent coverage of basic clause constructors with proper nil handling and edge case validation.


41-81: LGTM!

Range clause tests comprehensively cover default behavior, open bounds, option overrides, and numeric type specialization.


83-117: LGTM!

Boolean node tests thoroughly validate collapsing logic, implicit match-all handling, and nil filtering behavior.


119-157: LGTM!

Logical combinator tests properly verify edge case collapsing and the interaction between AND, OR, and NOT operators.


159-225: LGTM!

Sort descriptor and query transformer tests comprehensively validate metadata handling, defaults, and vectorization behavior.

Comment on lines +14 to +98
(deftest condition->ast-clause-standard-test
(testing "Equality removes surrounding quotes"
(let [condition {:type :standard
:field "name"
:op "="
:value "'Alice'"}
expected (ast/term :name "Alice")]
(is (= expected (converter/condition->ast-clause condition)))))

(testing "Not equal delegates to ast/not"
(let [condition {:type :standard
:field "status"
:op "!="
:value "archived"}
expected (ast/not (ast/term :status "archived"))]
(is (= expected (converter/condition->ast-clause condition)))))

(testing "Alternate not equal operator <> behaves the same"
(let [condition {:type :standard
:field "status"
:op "<>"
:value "archived"}
expected (ast/not (ast/term :status "archived"))]
(is (= expected (converter/condition->ast-clause condition)))))

(testing "Greater than parses longs"
(let [condition {:type :standard
:field "age"
:op ">"
:value "30"}
expected (ast/range-long :age 30 nil {:include-lower? false})]
(is (= expected (converter/condition->ast-clause condition)))))

(testing "Greater than parses doubles"
(let [condition {:type :standard
:field "price"
:op ">"
:value "42.5"}
expected (ast/range-double :price 42.5 nil {:include-lower? false})]
(is (= expected (converter/condition->ast-clause condition)))))

(testing "Greater than falls back to string range when numeric parse fails"
(let [condition {:type :standard
:field "code"
:op ">"
:value "'A'"}
expected (ast/range :code "A" nil {:include-lower? false})]
(is (= expected (converter/condition->ast-clause condition)))))

(testing "Less-than-or-equal builds inclusive upper bound"
(let [condition {:type :standard
:field "age"
:op "<="
:value "65"}
expected (ast/range-long :age nil 65 {:include-upper? true})]
(is (= expected (converter/condition->ast-clause condition)))))

(testing "LIKE converts SQL wildcards to Lucene wildcards"
(let [condition {:type :standard
:field "title"
:op "LIKE"
:value "report-%Q1%"}
expected (ast/wildcard :title "report-*Q1*")]
(is (= expected (converter/condition->ast-clause condition)))))

(testing "IS NULL returns missing clause"
(let [condition {:type :standard
:field "deleted_at"
:op "IS NULL"}
expected (ast/missing :deleted_at)]
(is (= expected (converter/condition->ast-clause condition)))))

(testing "IS NOT NULL returns exists clause"
(let [condition {:type :standard
:field "deleted_at"
:op "IS NOT NULL"}
expected (ast/exists :deleted_at)]
(is (= expected (converter/condition->ast-clause condition)))))

(testing "Unknown condition type returns nil"
(is (nil? (converter/condition->ast-clause
{:type :unsupported
:field "name"
:op "="
:value "Alice"})))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Add test coverage for numeric equality comparisons.

The test suite doesn't verify that numeric values in equality operators (=, !=, <>) are handled correctly. This is a critical gap since the implementation currently has a bug where numeric equality uses string term clauses instead of numeric range clauses.

Add tests like these:

(testing "Equality with numeric long value"
  (let [condition {:type :standard
                   :field "age"
                   :op "="
                   :value "30"}
        expected (ast/range-long :age 30 30 {:include-lower? true :include-upper? true})]
    (is (= expected (converter/condition->ast-clause condition)))))

(testing "Equality with numeric double value"
  (let [condition {:type :standard
                   :field "price"
                   :op "="
                   :value "99.99"}
        expected (ast/range-double :price 99.99 99.99 {:include-lower? true :include-upper? true})]
    (is (= expected (converter/condition->ast-clause condition)))))

(testing "Not equal with numeric value"
  (let [condition {:type :standard
                   :field "count"
                   :op "!="
                   :value "0"}
        expected (ast/not (ast/range-long :count 0 0 {:include-lower? true :include-upper? true}))]
    (is (= expected (converter/condition->ast-clause condition)))))
🤖 Prompt for AI Agents
test/chrondb/api/sql/execution/ast_converter_test.clj lines 14-98: add tests
that verify numeric equality is converted to numeric range clauses rather than
string terms; specifically add an "Equality with numeric long value" test
asserting that field age with op "=" and value "30" yields ast/range-long with
lower and upper both 30 and include-lower? and include-upper? true, an "Equality
with numeric double value" test asserting that field price with op "=" and value
"99.99" yields ast/range-double with lower and upper both 99.99 and includes
true, and a "Not equal with numeric value" test asserting that field count with
op "!=" and value "0" yields ast/not wrapping an ast/range-long with lower and
upper 0 and includes true; place each test alongside the existing
condition->ast-clause-standard-test cases so they run with the same fixtures.

@avelino avelino merged commit 9c46bdc into main Oct 30, 2025
10 checks passed
@avelino avelino deleted the avelino/issue-39 branch October 30, 2025 21:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Advanced Query and Index Support with Lucene

2 participants