feat(ast): unify query engine across all protocols with new AST system #61

avelino · 2025-10-30T14:31:30Z

Introduce Abstract Syntax Tree (AST) query layer that enables consistent query parsing, planning, and execution for REST, Redis, SQL, and CLI protocols. The new chrondb.query.ast namespace defines composable query operators (FTS, range, sort, etc.) with cursor-based pagination and type safety. Deprecate bespoke query engines in favor of a Lucene-backed planner that ensures protocol parity and performance. Includes migration notes, updated documentation, and extensive integration tests.

Adds direct AST support to Redis and SQL engines (ast_converter.clj)
Refactors test suite to assert identical results across protocols for history, sorting, and paging
Background reindexing and commit order maintained via Lucene index hooks
Documentation added in docs/ast-queries.md and referenced in README, operations, and architecture docs

fixes #39

Summary by CodeRabbit

New Features
- Apache Lucene-powered search with full-text, composite, geospatial queries, sorting, cursor pagination and cached query planner; unified AST-based query execution across REST, Redis, SQL, and CLI; new search endpoints and CLI command.
Documentation
- Expanded architecture, AST query guide, Redis/REST/SQL/CLI examples, performance and operations (background reindexing) and README updates.
Tests
- New unit and multi-protocol integration tests covering AST, SQL conversion, and Lucene behavior.
Maintenance / Chores
- Background index population/maintenance tasks and native-image/CI runtime-initialization guidance.

Introduce Abstract Syntax Tree (AST) query layer that enables consistent query parsing, planning, and execution for REST, Redis, SQL, and CLI protocols. The new `chrondb.query.ast` namespace defines composable query operators (FTS, range, sort, etc.) with cursor-based pagination and type safety. Deprecate bespoke query engines in favor of a Lucene-backed planner that ensures protocol parity and performance. Includes migration notes, updated documentation, and extensive integration tests. - Adds direct AST support to Redis and SQL engines (`ast_converter.clj`) - Refactors test suite to assert identical results across protocols for history, sorting, and paging - Background reindexing and commit order maintained via Lucene index hooks - Documentation added in `docs/ast-queries.md` and referenced in README, operations, and architecture docs fixes #39 Signed-off-by: Avelino <[email protected]>

coderabbitai · 2025-10-30T14:31:40Z

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

Adds a Lucene-backed search overhaul: an AST query model, new search-query index protocol, Lucene and Memory index implementations, REST/Redis/SQL/CLI routing to AST-based search, background index maintenance and population, extensive documentation, and multi-protocol integration and unit tests.

Changes

Cohort / File(s)	Summary
Documentation `AGENT.md`, `README.md`, `docs/architecture.md`, `docs/ast-queries.md`, `docs/examples-redis.md`, `docs/index.md`, `docs/operations.md`, `docs/performance.md`	New and expanded docs describing Lucene-powered indexing, AST queries, examples, background reindexing, performance guidance, and reviewer/CI guidance for native-image/runtime init and lazy CLI namespaces.
Query AST Library `src/chrondb/query/ast.clj`	New AST builder: clause constructors (term, fts, range, boolean), sort descriptors, query wrapper and helpers for pagination/branch/hints/search-after.
Index Protocol `src/chrondb/index/protocol.clj`	Added public `search-query` protocol method; documented `search` as legacy.
Lucene Index Implementation `src/chrondb/index/lucene.clj`	Major Lucene integration: field normalization, numeric fields, sort construction, ast->query multimethods, query execution (sort/limit/offset/search-after), maintenance tasks, index population, and `search-query` implementation.
Memory Index Implementation `src/chrondb/index/memory.clj`	Added `search-query` implementation that supports AST-like clauses, matchers, content-field scanning, pagination and result shaping.
REST API & Routing `src/chrondb/api/v1.clj`, `src/chrondb/api/v1/routes.clj`	Request parsing (structured queries, sort, after cursor), AST construction, call into `index/search-query`, fetch docs from storage, return total/next-cursor; GET "/search" wired to new flow.
Redis API `src/chrondb/api/redis/core.clj`	Added `SEARCH` and `FT.SEARCH` handlers that parse options (LIMIT/OFFSET/SORT/BRANCH), build AST, call `search-query`, and return JSON results.
SQL Execution & Parser `src/chrondb/api/sql/execution/ast_converter.clj`, `src/chrondb/api/sql/execution/operators.clj`, `src/chrondb/api/sql/execution/query.clj`, `src/chrondb/api/sql/parser/clauses.clj`, `src/chrondb/api/sql/parser/statements.clj`, `src/chrondb/api/sql/protocol/messages/writer.clj`	SQL WHERE→AST conversion, numeric-aware operator evaluation, SELECT routing through Lucene when index available (handle-select-case signature updated), multi-row INSERT support, parse-time skip branches for IN/BETWEEN/NULL, and safer message-type writing.
CLI & HTTP Client `src/chrondb/cli/core.clj`, `src/chrondb/cli/http.clj`, `src/chrondb/cli/server.clj`	New `search` CLI command and `search-documents` HTTP helper; server startup kicks off background index population and periodic maintenance task.
Index Lifecycle & Maintenance Wiring `src/chrondb/cli/server.clj`, `src/chrondb/index/lucene.clj`	Background index-population task, periodic maintenance/commit/refresh loop, ensure-searcher-updated!, start/stop maintenance helpers.
Examples & Tests `src/chrondb/example.clj`, `test/chrondb/api/ast_integration_test.clj`, `test/chrondb/api/sql/execution/query_test.clj`, `test/chrondb/api/sql/parser/clauses_test.clj`, `test/chrondb/index/lucene_test.clj`, `test/chrondb/index/memory_test.clj`, `test/chrondb/query/ast_test.clj`, `test/chrondb/api/sql/execution/ast_converter_test.clj`	Example updated to `search-query`; extensive AST integration tests (REST/Redis/AST), SQL parser and ast-converter unit tests, and migrated index tests to `search-query` API.
Core CLI/Server Lazy Resolve `src/chrondb/core.clj`	Replaced direct namespace requires with requiring-resolve / lazy var resolution for CLI/server command dispatch to defer runtime initialization.
Native Image Runtime `dev/chrondb/native_image.clj`	Added Jackson classes to runtime-initialized class list and adjusted native-image arg composition.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant API as API Layer (REST/Redis/SQL/CLI)
    participant AST as chrondb.query.ast
    participant Index as Index (Lucene/Memory)
    participant Storage as Storage
    participant Indexer as Background Indexer

    Client->>API: search request (q/query, limit, sort, branch, after)
    API->>AST: parse/construct AST
    AST-->>API: AST query map
    API->>Index: search-query(AST, branch, opts)
    rect rgb(220,235,255)
      Note right of Index: ast->query → build Lucene Query\nexecute (sort/limit/offset/search-after)
    end
    Index-->>API: {ids, total, next-cursor}
    API->>Storage: fetch docs by ids
    Storage-->>API: documents
    API-->>Client: response (docs, total, next-cursor)

    par Background maintenance
      Indexer->>Storage: scan for new/changed docs
      Indexer->>Index: index documents, commit, refresh
      Index->>Index: update stats / warm caches
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Areas requiring extra attention:

src/chrondb/index/lucene.clj — ast->query multimethods, Lucene field typing, sort mapping, search-after correctness, threading of analyzers and options, and maintenance lifecycle.
src/chrondb/api/sql/execution/query.clj — handle-select-case signature change, Lucene routing vs storage fallback, JOIN and projection correctness.
test/chrondb/api/ast_integration_test.clj — integration stability across REST/Redis/AST, Redis lifecycle setup/teardown, and deterministic ordering/cursor behavior.
Protocol surface consistency — ensure all call sites migrated to search-query and backward-compat fallbacks handled.

Possibly related PRs

feat(cli): introduce ChronDB CLI utility for backup, restore, export, import and scheduling #60 — Modifies the same CLI modules (CLI search command, HTTP helper, server index maintenance); directly related to CLI/server changes.
feat: Add full-text search support via PostgreSQL to_tsquery syntax #23 — Overlaps SQL execution, WHERE parsing and FTS/Lucene integration; likely touches the same SQL→AST routing and parser branches.
feat(redis): Redis protocol server implementation #11 — Related to Redis protocol server changes and FTS handlers; overlaps SEARCH/FT.SEARCH additions.

Suggested labels

enhancement

Poem

🐰 I hopped through indexes, fields aglow,
I stitched AST branches row by row,
Cursors tucked beneath my paw,
Lucene hummed and wakes the law,
Now searches dance — hooray, let's go! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The pull request title "feat(ast): unify query engine across all protocols with new AST system" directly and clearly describes the main change in the changeset. The PR introduces a new Abstract Syntax Tree (AST) query layer and refactors REST, Redis, SQL, and CLI protocols to use this unified query engine, which is precisely what the title conveys. The title is concise, specific, and accurately reflects the primary objective of the changeset without unnecessary noise.

[pre_merge_check_pass]

The PR code changes substantially implement the primary coding requirements from issue #39: it expands src/chrondb/index/lucene.clj with a new search-query method and multi-dispatch query builder (ast->query); extends src/chrondb/index/protocol.clj with the new search-query protocol method; implements a query planner infrastructure via AST construction and optimization; adds full-text search support with configurable analyzers; implements result caching and index lifecycle management (ensure-index-populated, maintenance tasks); and unifies query execution across REST, Redis, SQL, and CLI through the new AST layer. While some advanced objectives like geospatial and temporal indexes are mentioned in documentation, the core coding requirements for query planning, FTS, caching, and index management are addressed through this implementation. |
| Out of Scope Changes Check | ✅ Passed | All code changes in the PR are directly related to the stated objectives of unifying the query engine across protocols via the new AST system. Changes to SQL parsing (clauses, statements, operators, query execution), Redis API, REST API, CLI, and index implementations all serve the core goal of AST-based query unification. Supporting infrastructure changes to src/chrondb/core.clj for lazy CLI loading and dev/chrondb/native_image.clj for Jackson class registration align with the guidance mentioned in AGENT.md. Background reindexing additions to src/chrondb/cli/server.clj are explicitly part of the PR objectives. No changes appear to introduce unrelated functionality outside the scope of AST query engine unification or its supporting infrastructure. |

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch avelino/issue-39

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2025-10-30T14:37:01Z

src/chrondb/index/lucene.clj

+(defmethod ast->query :range
+  [_ {:keys [field lower upper include-lower? include-upper? value-type]}]
+  (let [lower (when lower (str lower))
+        upper (when upper (str upper))]
+    (case value-type
+      :long (TermRangeQuery. field lower upper (boolean include-lower?) (boolean include-upper?))
+      :double (TermRangeQuery. field lower upper (boolean include-lower?) (boolean include-upper?))
+      (TermRangeQuery. field lower upper (boolean include-lower?) (boolean include-upper?)))))


Use numeric range queries for numeric AST clauses

The :range AST handler always builds TermRangeQuery regardless of value-type. Term ranges operate on lexicographic term order and will treat numbers as strings (e.g. 10 sorts before 2), while create-lucene-doc indexes numeric fields with LongPoint/DoublePoint. As a result range-long and range-double queries will either return incorrect results or skip documents entirely because the numeric point fields are never consulted. Consider using LongPoint/newRangeQuery and DoublePoint/newRangeQuery for numeric values so range filtering respects numeric ordering.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2025-10-30T14:37:01Z

src/chrondb/index/lucene.clj

+(defmethod ast->query :exists
+  [_ {:keys [field]}]
+  (DocValuesFieldExistsQuery. field))
+
+(defmethod ast->query :missing
+  [_ {:keys [field]}]
+  (let [builder (BooleanQuery$Builder.)]
+    (.add builder (MatchAllDocsQuery.) BooleanClause$Occur/MUST)
+    (.add builder (DocValuesFieldExistsQuery. field) BooleanClause$Occur/MUST_NOT)


String fields lack doc values required by :exists and sorting

Both :exists/:missing clauses and sanitize-sort-field rely on doc values (DocValuesFieldExistsQuery, SortField$Type/STRING) to work with string fields, but create-lucene-doc only adds doc values for numeric fields. A string field indexed as a TextField will therefore never satisfy an exists query and attempts to sort on it will throw IllegalStateException: field was indexed without docvalues. Add SortedDocValuesField (or similar) for text fields when storing them so these AST operators behave correctly.

Useful? React with 👍 / 👎.

…ing, and SQL parser bridge - Adds `fts-condition?` and FTS AST clause detection in SQL query execution, bridging SQL `WHERE x @@ to_tsquery(...)` to FTS search. - Introduces central AST clause for FTS matching, standardized as `(ast/fts "field" "term")`, with normalization and wildcarding for broad/lucene-aligned search. - Updates Lucene index search logic to recognize and optimize for FTS AST nodes, preserving analyzer selection, wildcard support, and leading wildcard configuration. - Ensures protocol consistency and API-parity across SQL, REST, and direct invocation per protocol documentation. - Backfills tests for FTS AST equation, wildcard and to_tsquery parsing, and SQL-engine roundtrip. - Documents breaking change and migration: legacy FTS conditions are mapped to AST clause `(ast/fts ...)`; guides users to new query structure. Signed-off-by: Avelino <[email protected]>

coderabbitai

Actionable comments posted: 13

🧹 Nitpick comments (10)

docs/ast-queries.md (1)

84-85: Consider showing cursor structure example.

The documentation mentions <base64-cursor> but doesn't show what a decoded cursor looks like. Adding a brief example of the cursor map structure ({:doc 123 :score 1.5}) would help developers understand how to construct or debug cursors.
src/chrondb/api/v1.clj (1)
63-64: Consider adding parameter validation.

The limit and offset parsing converts to string before parsing to int. If these are already integers in the params map, this is inefficient. Consider handling both string and numeric inputs:
limit (cond
        (integer? limit) limit
        (string? limit) (Integer/parseInt limit)
        :else nil)
src/chrondb/index/memory.clj (1)
116-126: Consider caching filtered results for total count.

The implementation filters the entire dataset twice: once for the paginated results and once for the total count (line 122). For large datasets, this is inefficient. Consider filtering once and deriving both values:
(let [filtered-docs (filter matches? (.values data))
      total (count filtered-docs)
      docs (->> filtered-docs
                (map :id)
                (drop offset)
                (take limit)
                vec)]
  {:ids docs
   :total total
   :limit limit
   :offset offset})
src/chrondb/api/sql/execution/query.clj (1)

101-156: Remove the unused FTS helper to satisfy lint

fts-get-matching-docs is no longer referenced, and clj-kondo is warning about it. Please delete the helper (or wire it back up) so the lint job runs clean.
src/chrondb/api/v1/routes.clj (5)
49-59: Consider logging EDN parsing errors.

The catch block silently returns nil on parsing failure. For debugging and monitoring, consider logging the error message.

Apply this diff:
       (cond
         (map? query-param) query-param
         (string? query-param) (edn/read-string query-param)
         :else nil)
-      (catch Exception _
+      (catch Exception e
+        (log/log-warn (str "Failed to parse structured query: " (.getMessage e)))
         nil))))
61-75: Consider logging cursor parsing errors.

Similar to parse-structured-query, the catch block silently returns nil. Adding logging would help troubleshoot invalid cursor values.

Apply this diff:
           (contains? parsed :doc)
           (contains? parsed :score))
           parsed))
-      (catch Exception _
+      (catch Exception e
+        (log/log-warn (str "Failed to parse searchAfter cursor: " (.getMessage e)))
         nil))))
103-104: Silent exception handling may hide parsing errors.

Failed integer parsing returns nil, which may lead to unexpected behavior downstream. Consider logging parsing failures or validating the input format earlier.

Apply this diff:
         ;; Parse limit and offset
-        limit-int (when limit (try (Integer/parseInt limit) (catch Exception _ nil)))
-        offset-int (when offset (try (Integer/parseInt offset) (catch Exception _ nil)))
+        limit-int (when limit 
+                   (try 
+                     (Integer/parseInt limit) 
+                     (catch Exception e
+                       (log/log-warn (str "Invalid limit parameter: " limit))
+                       nil)))
+        offset-int (when offset 
+                    (try 
+                      (Integer/parseInt offset) 
+                      (catch Exception e
+                        (log/log-warn (str "Invalid offset parameter: " offset))
+                        nil)))
164-167: Add type hints to avoid reflection on ScoreDoc fields.

Direct field access on ScoreDoc will trigger reflection warnings. Adding type hints improves performance.

Apply this diff:
                         (:next-cursor result)
                         (assoc :next-cursor
                                (let [encoder (java.util.Base64/getEncoder)
-                                     cursor-data {:doc (.-doc ^org.apache.lucene.search.ScoreDoc (:next-cursor result))
-                                                  :score (.-score ^org.apache.lucene.search.ScoreDoc (:next-cursor result))}
+                                     ^org.apache.lucene.search.ScoreDoc score-doc (:next-cursor result)
+                                     cursor-data {:doc (.-doc score-doc)
+                                                  :score (.-score score-doc)}
                                      encoded (String. (.encode encoder (.getBytes (pr-str cursor-data) "UTF-8")) "UTF-8")]
                                  encoded)))]
77-168: Consider extracting helper functions for better modularity.

The handle-search function is quite long (~90 lines) and handles multiple concerns: parameter parsing, query building, execution, and response formatting. Consider extracting helpers like parse-pagination-params, build-ast-from-params, and format-search-response to improve readability and testability.
src/chrondb/index/lucene.clj (1)
63-71: Consider catching NumberFormatException specifically.

Catching Exception is overly broad. Long/parseLong and Double/parseDouble throw NumberFormatException, which would be more precise.

Apply this diff:
 (defn- parse-long-safe [s]
   (try
     (Long/parseLong s)
-    (catch Exception _ nil)))
+    (catch NumberFormatException _ nil)))
 
 (defn- parse-double-safe [s]
   (try
     (Double/parseDouble s)
-    (catch Exception _ nil)))
+    (catch NumberFormatException _ nil)))

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fbafc0b and f50e3e2.

📒 Files selected for processing (30)

AGENT.md (2 hunks)
README.md (1 hunks)
docs/architecture.md (2 hunks)
docs/ast-queries.md (1 hunks)
docs/examples-redis.md (1 hunks)
docs/index.md (1 hunks)
docs/operations.md (1 hunks)
docs/performance.md (1 hunks)
src/chrondb/api/redis/core.clj (3 hunks)
src/chrondb/api/sql/execution/ast_converter.clj (1 hunks)
src/chrondb/api/sql/execution/operators.clj (2 hunks)
src/chrondb/api/sql/execution/query.clj (11 hunks)
src/chrondb/api/sql/parser/clauses.clj (1 hunks)
src/chrondb/api/sql/parser/statements.clj (1 hunks)
src/chrondb/api/sql/protocol/messages/writer.clj (1 hunks)
src/chrondb/api/v1.clj (1 hunks)
src/chrondb/api/v1/routes.clj (3 hunks)
src/chrondb/cli/core.clj (3 hunks)
src/chrondb/cli/http.clj (1 hunks)
src/chrondb/cli/server.clj (1 hunks)
src/chrondb/example.clj (1 hunks)
src/chrondb/index/lucene.clj (3 hunks)
src/chrondb/index/memory.clj (1 hunks)
src/chrondb/index/protocol.clj (1 hunks)
src/chrondb/query/ast.clj (1 hunks)
test/chrondb/api/ast_integration_test.clj (1 hunks)
test/chrondb/api/sql/execution/query_test.clj (1 hunks)
test/chrondb/api/sql/parser/clauses_test.clj (1 hunks)
test/chrondb/index/lucene_test.clj (5 hunks)
test/chrondb/index/memory_test.clj (3 hunks)

🧰 Additional context used

📓 Path-based instructions (9)

README.md

📄 CodeRabbit inference engine (AGENT.md)

Update README.md for major user-facing changes

Files:

README.md

docs/**/*.md

📄 CodeRabbit inference engine (AGENT.md)

Document new endpoints and wire-protocol behaviors in docs/

Files:

docs/operations.md
docs/architecture.md
docs/performance.md
docs/index.md
docs/ast-queries.md
docs/examples-redis.md

docs/operations.md

📄 CodeRabbit inference engine (AGENT.md)

When modifying backup/restore features, update docs/operations.md with CLI/REST examples and incremental bundle flow

Files:

docs/operations.md

docs/**

📄 CodeRabbit inference engine (AGENT.md)

Keep developer-focused instructions in AGENT.md and maintain docs/ for user-facing and protocol documentation

Files:

docs/operations.md
docs/architecture.md
docs/performance.md
docs/index.md
docs/ast-queries.md
docs/examples-redis.md

AGENT.md

📄 CodeRabbit inference engine (AGENT.md)

Maintain and update this AGENT.md with developer-focused instructions

Files:

AGENT.md

src/chrondb/**/*.{clj,cljc}

📄 CodeRabbit inference engine (AGENT.md)

Namespaces must use the chrondb.* prefix (e.g., ns chrondb.core) and idiomatic kebab-case var names

Files:

src/chrondb/index/protocol.clj
src/chrondb/api/sql/parser/clauses.clj
src/chrondb/cli/http.clj
src/chrondb/api/sql/parser/statements.clj
src/chrondb/api/sql/execution/operators.clj
src/chrondb/example.clj
src/chrondb/api/v1.clj
src/chrondb/index/lucene.clj
src/chrondb/cli/core.clj
src/chrondb/api/sql/protocol/messages/writer.clj
src/chrondb/api/redis/core.clj
src/chrondb/api/sql/execution/ast_converter.clj
src/chrondb/cli/server.clj
src/chrondb/query/ast.clj
src/chrondb/index/memory.clj
src/chrondb/api/v1/routes.clj
src/chrondb/api/sql/execution/query.clj

**/*.{clj,cljc}

📄 CodeRabbit inference engine (AGENT.md)

**/*.{clj,cljc}: Prefer pure functions and data transformations in Clojure code
Add docstrings for public vars and functions
Follow indentation as enforced by cljfmt
Use descriptive, chronological-operation-aware logging (structured where possible)
Guard shared mutable state; prefer atoms with swap! semantics
Leverage transduce, reduce, or pipelines for efficient data flows

Files:

src/chrondb/index/protocol.clj
src/chrondb/api/sql/parser/clauses.clj
src/chrondb/cli/http.clj
src/chrondb/api/sql/parser/statements.clj
src/chrondb/api/sql/execution/operators.clj
src/chrondb/example.clj
src/chrondb/api/v1.clj
src/chrondb/index/lucene.clj
test/chrondb/index/memory_test.clj
test/chrondb/api/sql/execution/query_test.clj
src/chrondb/cli/core.clj
src/chrondb/api/sql/protocol/messages/writer.clj
src/chrondb/api/redis/core.clj
test/chrondb/api/sql/parser/clauses_test.clj
src/chrondb/api/sql/execution/ast_converter.clj
src/chrondb/cli/server.clj
src/chrondb/query/ast.clj
test/chrondb/index/lucene_test.clj
src/chrondb/index/memory.clj
test/chrondb/api/ast_integration_test.clj
src/chrondb/api/v1/routes.clj
src/chrondb/api/sql/execution/query.clj

test/**/*.{clj,cljc}

📄 CodeRabbit inference engine (AGENT.md)

test/**/*.{clj,cljc}: Include integration tests for Git-backed storage, and add regression tests for concurrency and conflict resolution
Prefer property-based tests when sequence ordering matters

Files:

test/chrondb/index/memory_test.clj
test/chrondb/api/sql/execution/query_test.clj
test/chrondb/api/sql/parser/clauses_test.clj
test/chrondb/index/lucene_test.clj
test/chrondb/api/ast_integration_test.clj

docs/examples-*.md

📄 CodeRabbit inference engine (AGENT.md)

Provide runnable examples in docs/examples-*.md when possible

Files:

docs/examples-redis.md

🧠 Learnings (12)

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to README.md : Update README.md for major user-facing changes

Applied to files:

README.md
docs/operations.md
AGENT.md
docs/architecture.md
docs/index.md
docs/examples-redis.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to docs/operations.md : When modifying backup/restore features, update docs/operations.md with CLI/REST examples and incremental bundle flow

Applied to files:

docs/operations.md
AGENT.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to **/*.{clj,cljc} : Leverage transduce, reduce, or pipelines for efficient data flows

Applied to files:

AGENT.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to **/*.{clj,cljc} : Prefer pure functions and data transformations in Clojure code

Applied to files:

AGENT.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to AGENT.md : Maintain and update this AGENT.md with developer-focused instructions

Applied to files:

AGENT.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to docs/** : Keep developer-focused instructions in AGENT.md and maintain docs/ for user-facing and protocol documentation

Applied to files:

AGENT.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to docs/**/*.md : Document new endpoints and wire-protocol behaviors in docs/

Applied to files:

AGENT.md
docs/ast-queries.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to docs/examples-*.md : Provide runnable examples in docs/examples-*.md when possible

Applied to files:

AGENT.md

📚 Learning: 2025-10-12T09:40:13.442Z

Learnt from: avelino
PR: moclojer/chrondb#59
File: src/chrondb/core.clj:158-171
Timestamp: 2025-10-12T09:40:13.442Z
Learning: ChronDB's GitStorage only implements the internal protocol chrondb.storage.protocol/Storage and is not a Closeable. There is no simple way to use with-open or try-with-resources patterns without changing the storage's public API. The storage remains open and in use by the server throughout the process lifetime.

Applied to files:

docs/architecture.md
src/chrondb/api/redis/core.clj

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to src/chrondb/**/*.{clj,cljc} : Namespaces must use the chrondb.* prefix (e.g., ns chrondb.core) and idiomatic kebab-case var names

Applied to files:

src/chrondb/example.clj

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to test/**/*.{clj,cljc} : Include integration tests for Git-backed storage, and add regression tests for concurrency and conflict resolution

Applied to files:

test/chrondb/index/memory_test.clj
test/chrondb/api/sql/parser/clauses_test.clj
test/chrondb/api/ast_integration_test.clj

📚 Learning: 2025-10-12T09:40:13.442Z

Learnt from: avelino
PR: moclojer/chrondb#59
File: src/chrondb/core.clj:158-171
Timestamp: 2025-10-12T09:40:13.442Z
Learning: In ChronDB, the server mode (chrondb server) starts non-daemon threads for Jetty, Redis, and SQL servers that keep the process alive. Calling System/exit after server initialization would terminate the server immediately, breaking the main execution mode. The lifecycle is managed by clojure.main, which properly propagates exceptions with exit code ≠0.

Applied to files:

src/chrondb/api/redis/core.clj
src/chrondb/cli/server.clj

🪛 GitHub Actions: Coverage

src/chrondb/cli/server.clj

[error] 107-107: Syntax error compiling at (chrondb/cli/server.clj:107:4). No such var: lucene/start-index-maintenance-task

🪛 GitHub Actions: Lint project with clj-kondo

src/chrondb/index/lucene.clj

[warning] 8-8: Unused import SortedDocValuesField

[warning] 10-10: Unused import BooleanQuery

[warning] 12-12: Unused import BytesRef

test/chrondb/api/sql/parser/clauses_test.clj

[warning] 2-2: #'clojure.test/testing' is referred but never used

src/chrondb/cli/server.clj

[warning] 107-107: Unresolved var: lucene/start-index-maintenance-task

[warning] 117-117: Unresolved var: lucene/ensure-index-populated

src/chrondb/query/ast.clj

[warning] 34-34: range already refers to #'clojure.core/range

[warning] 92-92: boolean already refers to #'clojure.core/boolean

[warning] 121-121: and already refers to #'clojure.core/and

[warning] 130-130: or already refers to #'clojure.core/or

[warning] 139-139: not already refers to #'clojure.core/not

[warning] 145-145: sort-by already refers to #'clojure.core/sort-by

src/chrondb/api/sql/execution/query.clj

[warning] 101-101: Unused private var chrondb.api.sql.execution.query/fts-get-matching-docs

🪛 LanguageTool

AGENT.md

[grammar] ~74-~74: Please add a punctuation mark at the end of paragraph.
Context: ...es that bypass the centralized indexing layer ### When Reviewing Code 1. Confirm ti...

(PUNCTUATION_PARAGRAPH_END)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: test
GitHub Check: Run SQL Protocol Benchmark
GitHub Check: test

🔇 Additional comments (38)

docs/index.md (1)

19-19: Lucene capability highlight looks solid.

The new bullet clearly advertises the Lucene-backed search upgrades without overpromising. Nicely aligned with the rest of the docs.

docs/examples-redis.md (1)

116-139: Nice coverage of the new SEARCH verbs.

Showcasing limit/offset, sorting, branch selection, and the FT.SEARCH alias gives users immediate, runnable patterns—exactly what we need for the Lucene-backed flow.

docs/performance.md (1)

47-65: Great operational context for the Lucene migration.

The new section concisely explains why the overhaul matters and gives actionable tuning guidance (metrics, analyze tooling, async staging). This will help operators adopt the changes confidently.

docs/operations.md (1)

220-228: Background reindexing notes are spot on.

Clear explanation of the async Lucene catch-up cycle, metrics, and post-restore guidance—exactly the operational detail we needed to ship with the indexer changes.

AGENT.md (1)

75-99: Guidelines keep the team aligned.

The reminders to funnel search work through the Lucene planner and to sync the doc set when those pieces change will save us from future drift. Thanks for codifying it.

src/chrondb/api/sql/protocol/messages/writer.clj (1)

33-42: Protocol safety improvement looks good.

Normalizing the message type and skipping the leading byte when it’s genuinely absent avoids the old startup edge cases without touching length semantics.

docs/architecture.md (1)

35-35: LGTM! Clear documentation of the Lucene integration.

The updated indexing layer description and the new detailed section effectively communicate the shift to Lucene-backed indexing with secondary indexes, query planning, and caching capabilities.

Also applies to: 47-49

test/chrondb/api/sql/execution/query_test.clj (1)

37-47: LGTM! Mock storage updated to match revised data retrieval shape.

The mock implementation now returns a vector of document values instead of key-value pairs, aligning with the broader Lucene/AST indexing overhaul where the storage layer returns documents directly.

README.md (1)

27-29: LGTM! Well-written user-facing documentation.

The new Lucene Search Overhaul section clearly communicates the major architectural change and its benefits to users. This aligns with the coding guideline to update README.md for major user-facing changes.

src/chrondb/cli/http.clj (1)

120-133: LGTM! Well-structured search function.

The new search-documents function cleanly supports the AST-based search API with optional parameters. The use of cond-> to conditionally build query parameters is idiomatic and handles optional fields correctly.

src/chrondb/api/sql/parser/statements.clj (1)

157-157: No issues found—multi-row INSERT is properly supported.

The parser change from (first values) to values correctly enables multi-row INSERT support. The execution layer's handle-insert-case function (query.clj:581-684) already handles this correctly:

Detects multi-row input via (every? sequential? values)

Normalizes to a list of rows: values-rows

Processes each row with mapv, creating, saving, and indexing documents individually

Returns accurate inserted-count for all processed rows

The implementation is complete and correct.

src/chrondb/index/protocol.clj (1)

10-15: Protocol extension looks solid.

The addition of search-query with a clear deprecation notice for the legacy search method follows good API evolution practices. The docstring effectively documents the expected query-map structure and return shape.

src/chrondb/cli/core.clj (2)

61-70: Well-structured CLI command definition.

The search command specification is comprehensive and follows the established patterns in the file. The option set aligns well with the AST query capabilities described in the documentation.

222-231: Clean implementation following established patterns.

The run-search handler follows the same delegation pattern as other commands in the file, maintaining consistency in the codebase.

docs/ast-queries.md (3)

1-68: Comprehensive and well-structured documentation.

The documentation provides an excellent overview of the AST query system with clear examples. The organization from basic concepts to complex usage is logical and developer-friendly.

Based on learnings

156-169: Excellent pagination documentation.

The distinction between offset-based and cursor-based pagination is clearly explained, including when to use each approach. The next-cursor semantics are well-documented.

182-191: Strong protocol consistency guarantees.

The documentation effectively emphasizes that all protocols return identical results, which is a key benefit of the AST-based approach. This aligns well with the PR objectives.

src/chrondb/api/sql/execution/operators.clj (3)

6-19: Solid numeric parsing helper.

The implementation handles the common cases well, with appropriate fallback to nil for non-numeric values. The quote removal is essential for SQL string literals.

28-75: Robust comparison logic with numeric awareness.

The implementation correctly handles nil field values and attempts numeric comparisons when appropriate, falling back to string comparisons. The explicit operator cases for nil values prevent potential null pointer issues.

77-89: Clean simplification of filtering logic.

The refactored implementation is more readable while maintaining correct AND semantics across multiple conditions. The explicit empty check at the start is a good optimization.

test/chrondb/index/memory_test.clj (3)

43-48: Test properly adapted to new API.

The test correctly uses the new query map format and extracts IDs from the result. The assertions verify the expected behavior with string ID comparisons.

50-63: Content field search test looks good.

The test verifies the special "content" field behavior that searches across all document fields, which is an important feature to validate.

73-84: Thorough case-insensitivity testing.

The test validates case-insensitive search across multiple case variations, ensuring the feature works consistently.

src/chrondb/index/memory.clj (3)

97-115: Content field handling is well-implemented.

The special case for content field searching across all document fields is clearly separated and correctly excludes internal fields like :_table. The case-insensitive search aligns with user expectations.

64-65: Clean parameter validation.

The limit and offset validation ensures sensible bounds, preventing negative values and enforcing minimum limits.

47-62: <!-- [scratchpad_start]
[task review_file_1/1 final_analysis]
Complete analysis of wildcard conversion concern

[observations]

AST defines 3 distinct types: :term (exact), :wildcard (pattern), :prefix (wildcard convenience)

Legacy format conversion creates :wildcard with "" suffix: (str value "")

ast.clj has explicit prefix function that does: (wildcard field (str value "*"))

Memory test validates legacy format behavior:

test-search-by-field: {:field "name" :value "John"} expects 2 results

Expected results: "user:1" (John Doe) and "user:3" (John Smith)

This PROVES legacy format expected PREFIX matching, not exact match

Test would fail if changed to :term (exact match)

[critical_finding]

The review comment suggests changing to :term for "exact match"

This contradicts test expectations that show legacy format uses PREFIX matching

The test explicitly expects "John" to match "John Doe" AND "John Smith"

Changing to :term would break this test and existing caller code

[conclusion]
The review comment is factually incorrect about the legacy format semantics. The code correctly preserves backward compatibility. The suggested change would actually introduce a BREAKING CHANGE.

[tag]

Based on my investigation of the codebase and tests, I have found conclusive evidence about the legacy format behavior.

The review comment is incorrect.

The test file test/chrondb/index/memory_test.clj definitively proves that the legacy format {:field "name" :value "John"} was designed for prefix matching, not exact matching. Specifically, test-search-by-field expects the query {:field "name" :value "John"} to return 2 results: both "John Doe" and "John Smith". This confirms the legacy API uses wildcard/prefix semantics.

The current implementation correctly preserves this behavior by converting to {:type :wildcard :field field :value (str value "*")}. The AST module even has a dedicated prefix function that explicitly does the same thing: (wildcard field (str value "*")).

The suggested change to :term (exact match) would actually break backward compatibility and cause the existing tests to fail, as the test expects multiple prefix matches, not an exact match.

Likely an incorrect or invalid review comment.

src/chrondb/api/v1/routes.clj (2)

38-47: LGTM - Well-structured sort parameter parsing.

The function correctly handles the parsing of sort parameters with appropriate fallbacks for missing direction values.

204-205: LGTM - Route correctly updated for new signature.

The search route now passes the full request object, which is necessary for the enhanced handle-search implementation.

src/chrondb/index/lucene.clj (7)

15-21: LGTM - Clean field normalization logic.

The function correctly handles FTS field naming with appropriate guards against double-suffixing.

23-42: LGTM - Reasonable type detection heuristics.

The type inference logic based on field names (age, price, *_id, etc.) is practical for common use cases. The fallback to :string provides a safe default.

44-61: LGTM - Clean sort descriptor conversion.

The functions correctly map AST sort descriptors to Lucene's SortField with appropriate type handling.

73-105: LGTM - Comprehensive numeric field indexing.

The enhanced document creation correctly indexes numeric values using both LongPoint/DoublePoint (for range queries) and NumericDocValuesField/DoubleDocValuesField (for sorting). The FTS field normalization is also properly applied.

107-186: LGTM - Well-structured AST query system.

The multimethod dispatch and dynamic context binding are idiomatic Clojure. The FTS error handling appropriately logs before re-throwing. The boolean query recursion correctly composes nested queries.

188-250: LGTM - Robust query execution with comprehensive error handling.

The query execution logic correctly handles both cursor-based (searchAfter) and offset-based pagination. Error handling includes stack traces for debugging, and result formatting provides all necessary metadata including next-cursor.

363-387: LGTM - Well-integrated search-query implementation.

The protocol implementation correctly binds the Lucene context, constructs the query, and executes with appropriate error handling. The result mapping to IDs aligns with the storage-backed document retrieval pattern used in the routes.

src/chrondb/query/ast.clj (3)

89-90: LGTM - Nil-removing vectorize enables clean composition.

The vectorize helper that removes nils works well with the guard clauses in builder functions (like term, wildcard, fts), allowing optional clauses to naturally compose without cluttering the query structure.

92-119: LGTM - Well-optimized boolean clause composition.

The function correctly optimizes single-clause cases and properly wraps single must-not clauses with match-all to ensure correct boolean semantics. The vectorization of all clause types ensures consistent handling.

156-200: LGTM - Clean query wrapper and augmentation helpers.

The query function and accompanying with-* helpers provide a composable, idiomatic API for building queries. The use of vectorize for optional sort descriptors ensures consistent handling.

coderabbitai · 2025-10-30T14:48:00Z

src/chrondb/api/sql/execution/query.clj

+                         (let [doc-ids (:ids lucene-result)
+                               ;; Deduplicate IDs before fetching documents
+                               unique-doc-ids (distinct doc-ids)
+                               docs (filter some? (map #(storage/get-document storage % branch-name) unique-doc-ids))]
+                           (log/log-info (str "Lucene returned " (count doc-ids) " document IDs, " (count unique-doc-ids) " unique, fetched " (count docs) " documents"))
+                           ;; Deduplicate by ID to ensure no duplicates
+                           (->> docs
+                                (clojure.core/group-by :id)
+                                (vals)
+                                (map first)))
+                         ;; Fallback: JOIN or direct storage lookup


⚠️ Potential issue | 🟠 Major

Preserve Lucene ordering when deduplicating docs

The group-by/vals dedup loses any ordering that Lucene just applied, because hash-map iteration order is undefined. As soon as a client requests ORDER BY, the response sequence becomes effectively random. Please dedup while preserving order—for example, walk the returned ids once, track a seen set, and conj documents into a vector in discovery order.

- (let [doc-ids (:ids lucene-result) - ;; Deduplicate IDs before fetching documents - unique-doc-ids (distinct doc-ids) - docs (filter some? (map #(storage/get-document storage % branch-name) unique-doc-ids))] - (log/log-info (str "Lucene returned " (count doc-ids) " document IDs, " (count unique-doc-ids) " unique, fetched " (count docs) " documents")) - ;; Deduplicate by ID to ensure no duplicates - (->> docs - (clojure.core/group-by :id) - (vals) - (map first))) + (let [doc-ids (:ids lucene-result) + [seen docs] + (reduce (fn [[seen acc] id] + (if (seen id) + [seen acc] + (if-let [doc (storage/get-document storage id branch-name)] + [(conj seen id) (conj acc doc)] + [(conj seen id) acc]))) + [#{} []] + doc-ids)] + (log/log-info (str "Lucene returned " (count doc-ids) " document IDs, fetched " (count docs) " unique documents")) + docs)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

(let [doc-ids (:ids lucene-result)

;; Deduplicate IDs before fetching documents

unique-doc-ids (distinct doc-ids)

docs (filter some? (map #(storage/get-document storage % branch-name) unique-doc-ids))]

(log/log-info (str "Lucene returned " (count doc-ids) " document IDs, " (count unique-doc-ids) " unique, fetched " (count docs) " documents"))

;; Deduplicate by ID to ensure no duplicates

(->> docs

(clojure.core/group-by :id)

(vals)

(map first)))

;; Fallback: JOIN or direct storage lookup

(let [doc-ids (:ids lucene-result)

[seen docs]

(reduce (fn [[seen acc] id]

(if (seen id)

[seen acc]

(if-let [doc (storage/get-document storage id branch-name)]

[(conj seen id) (conj acc doc)]

[(conj seen id) acc])))

[#{} []]

doc-ids)]

(log/log-info (str "Lucene returned " (count doc-ids) " document IDs, fetched " (count docs) " unique documents"))

docs)

;; Fallback: JOIN or direct storage lookup

🤖 Prompt for AI Agents

In src/chrondb/api/sql/execution/query.clj around lines 328-338, the current dedup via group-by/vals loses Lucene ordering; instead, deduplicate while preserving Lucene's order by iterating the returned ids (unique-doc-ids or original doc-ids in their returned order), maintain a mutable or persistent 'seen' set, and build a vector by looking up each id in storage/get-document and conj-ing the document only if its id hasn't been seen yet; replace the group-by/vals/map-first sequence with this ordered walk so ordering from Lucene is preserved while removing duplicates.

coderabbitai · 2025-10-30T14:48:00Z

src/chrondb/api/sql/parser/clauses.clj

+              ;; Skip IN conditions (not implemented)
+              (and (>= (count remaining) 2)
+                   (= (str/lower-case (second remaining)) "in"))
+              (do
+                (log/log-warn "IN condition parsing not yet implemented, skipping")
+                (recur (drop-while #(not (constants/LOGICAL_OPERATORS (str/lower-case %))) remaining) conditions))


⚠️ Potential issue | 🟡 Minor

Verify token advancement for trailing IN condition.

The drop-while advances until a logical operator is found, but if the IN condition is the last clause (no AND/OR follows), this could consume tokens beyond the WHERE clause boundary. Consider checking against the end-index or ensuring the caller's token boundary is respected.

🤖 Prompt for AI Agents

In src/chrondb/api/sql/parser/clauses.clj around lines 129 to 134, the IN-condition skip uses drop-while which can overrun the WHERE token boundary if the IN clause is trailing; change the token-advancement to respect the caller's token boundary by stopping when you reach the provided end-index (or the end of the tokens slice) — i.e. replace the drop-while with logic that advances until either a logical operator is found OR the current index >= end-index, and ensure you return the remaining tokens correctly (empty or slice starting at end-index) so tokens after the WHERE boundary are not consumed.

coderabbitai · 2025-10-30T14:48:00Z

src/chrondb/api/sql/parser/clauses.clj

+              ;; Skip BETWEEN conditions (not implemented)
+              (and (>= (count remaining) 2)
+                   (= (str/lower-case (second remaining)) "between"))
+              (do
+                (log/log-warn "BETWEEN condition parsing not yet implemented, skipping")
+                (let [rest-tokens (drop 2 remaining)
+                      after-between (drop-while (fn [tok]
+                                                   (and tok
+                                                        (not (constants/LOGICAL_OPERATORS (str/lower-case tok)))))
+                                                 rest-tokens)]
+                  (recur after-between conditions)))


⚠️ Potential issue | 🟡 Minor

Similar edge case concern for trailing BETWEEN.

The token advancement logic mirrors the IN case and has the same potential issue with trailing conditions. The implementation correctly skips past the BETWEEN keyword and its arguments, but boundary checking against end-index would make this more robust.

🤖 Prompt for AI Agents

In src/chrondb/api/sql/parser/clauses.clj around lines 136 to 146, the BETWEEN branch advances tokens like the IN case and can overrun when BETWEEN is trailing; add explicit boundary checks when inspecting (second remaining) and when computing rest-tokens/after-between so you never read past end-index (e.g. verify remaining has at least two tokens before treating second as "between", limit the drop/drop-while to end-index or use indexed iteration/take to stop at the parser end), and return/recur safely when the token stream ends rather than assuming more tokens exist.

coderabbitai · 2025-10-30T14:48:00Z

src/chrondb/api/sql/parser/clauses.clj

+              ;; Skip NULL comparisons (not implemented)
+              (and (>= (count remaining) 3)
+                   (= (str/lower-case (nth remaining 2)) "null"))
+              (do
+                (log/log-warn "NULL comparison parsing not yet implemented, skipping")
+                (recur (drop 3 remaining) conditions))


⚠️ Potential issue | 🟠 Major

NULL comparison handling may miss IS NOT NULL.

The code checks for "null" at the third position and drops 3 tokens, which works for patterns like "field IS NULL". However, "field IS NOT NULL" is a 4-token pattern that won't be caught by this check (NULL would be at position 3, not 2). This could lead to incorrect parsing of IS NOT NULL comparisons.

Consider updating the detection to check for both patterns:

- ;; Skip NULL comparisons (not implemented) - (and (>= (count remaining) 3) - (= (str/lower-case (nth remaining 2)) "null")) + ;; Skip NULL comparisons (not implemented) + (and (>= (count remaining) 3) + (or (= (str/lower-case (nth remaining 2)) "null") + (and (>= (count remaining) 4) + (= (str/lower-case (nth remaining 2)) "not") + (= (str/lower-case (nth remaining 3)) "null")))) (do (log/log-warn "NULL comparison parsing not yet implemented, skipping") - (recur (drop 3 remaining) conditions)) + (let [tokens-to-drop (if (= (str/lower-case (nth remaining 2)) "not") 4 3)] + (recur (drop tokens-to-drop remaining) conditions)))

🤖 Prompt for AI Agents

In src/chrondb/api/sql/parser/clauses.clj around lines 148 to 153, the current check only skips a 3-token "IS NULL" pattern and misses the 4-token "IS NOT NULL" pattern; update the detection to handle both: inspect the next tokens case-insensitively and if you see ["is" "null"] drop 3 tokens, and if you see ["is" "not" "null"] drop 4 tokens, logging the skip in either case and then recur with the appropriately dropped remaining tokens so both "IS NULL" and "IS NOT NULL" are skipped correctly.

coderabbitai · 2025-10-30T14:48:00Z

src/chrondb/api/v1.clj

+        sort (cond
+               (string? sort) (map (fn [spec]
+                                     (let [[field dir] (-> spec str/trim (str/split #":"))]
+                                       {:field (str/trim field)
+                                        :direction (if (= "desc" (str/lower-case (or dir ""))) :desc :asc)}))
+                                   (str/split sort #","))
+               (sequential? sort) sort
+               :else nil)


⚠️ Potential issue | 🟡 Minor

Sort parsing could be more robust.

The sort normalization splits on colons but doesn't validate the field names or handle malformed input (e.g., "field:" or ":asc"). Consider adding validation or error handling for invalid sort specifications.

🤖 Prompt for AI Agents

In src/chrondb/api/v1.clj around lines 65-72, the sort parsing accepts malformed specs like "field:" or ":asc" without validation; update the parsing to validate each spec: trim and split as now, require the field part to be non-empty and match an allowed pattern (e.g. /^[A-Za-z0-9_.-]+$/), normalize the direction to either :asc or :desc (default to :asc) only when dir is "asc" or "desc" (case-insensitive), and if a spec is invalid either skip it or raise a clear IllegalArgumentException including the original spec — implement one consistent behavior across the function and ensure the resulting sort is either a filtered vector of valid map entries or the function fails fast with a descriptive error message.

coderabbitai · 2025-10-30T14:48:01Z

src/chrondb/index/lucene.clj

+           [org.apache.lucene.document Document StringField TextField Field$Store SortedDocValuesField LongPoint DoublePoint NumericDocValuesField DoubleDocValuesField]
           [org.apache.lucene.index IndexWriter IndexWriterConfig IndexWriterConfig$OpenMode DirectoryReader Term IndexNotFoundException]
-           [org.apache.lucene.search IndexSearcher Query ScoreDoc TopDocs]
+           [org.apache.lucene.search IndexSearcher Query ScoreDoc TopDocs BooleanQuery BooleanQuery$Builder BooleanClause$Occur MatchAllDocsQuery TermQuery WildcardQuery TermRangeQuery Sort SortField SortField$Type TopFieldCollector TopScoreDocCollector DocValuesFieldExistsQuery]
           [org.apache.lucene.queryparser.classic QueryParser ParseException]
+           [org.apache.lucene.util BytesRef]


🛠️ Refactor suggestion | 🟠 Major

Remove unused imports.

The linter flags SortedDocValuesField, BooleanQuery, and BytesRef as unused. Clean imports improve code maintainability.

Based on learnings

Apply this diff:

- [org.apache.lucene.document Document StringField TextField Field$Store SortedDocValuesField LongPoint DoublePoint NumericDocValuesField DoubleDocValuesField] + [org.apache.lucene.document Document StringField TextField Field$Store LongPoint DoublePoint NumericDocValuesField DoubleDocValuesField] [org.apache.lucene.index IndexWriter IndexWriterConfig IndexWriterConfig$OpenMode DirectoryReader Term IndexNotFoundException] - [org.apache.lucene.search IndexSearcher Query ScoreDoc TopDocs BooleanQuery BooleanQuery$Builder BooleanClause$Occur MatchAllDocsQuery TermQuery WildcardQuery TermRangeQuery Sort SortField SortField$Type TopFieldCollector TopScoreDocCollector DocValuesFieldExistsQuery] + [org.apache.lucene.search IndexSearcher Query ScoreDoc TopDocs BooleanQuery$Builder BooleanClause$Occur MatchAllDocsQuery TermQuery WildcardQuery TermRangeQuery Sort SortField SortField$Type TopFieldCollector TopScoreDocCollector DocValuesFieldExistsQuery] [org.apache.lucene.queryparser.classic QueryParser ParseException] - [org.apache.lucene.util BytesRef] [java.nio.file Paths]))

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

[org.apache.lucene.document Document StringField TextField Field$Store SortedDocValuesField LongPoint DoublePoint NumericDocValuesField DoubleDocValuesField]

[org.apache.lucene.index IndexWriter IndexWriterConfig IndexWriterConfig$OpenMode DirectoryReader Term IndexNotFoundException]

[org.apache.lucene.search IndexSearcher Query ScoreDoc TopDocs]

[org.apache.lucene.search IndexSearcher Query ScoreDoc TopDocs BooleanQuery BooleanQuery$Builder BooleanClause$Occur MatchAllDocsQuery TermQuery WildcardQuery TermRangeQuery Sort SortField SortField$Type TopFieldCollector TopScoreDocCollector DocValuesFieldExistsQuery]

[org.apache.lucene.queryparser.classic QueryParser ParseException]

[org.apache.lucene.util BytesRef]

[org.apache.lucene.document Document StringField TextField Field$Store LongPoint DoublePoint NumericDocValuesField DoubleDocValuesField]

[org.apache.lucene.index IndexWriter IndexWriterConfig IndexWriterConfig$OpenMode DirectoryReader Term IndexNotFoundException]

[org.apache.lucene.search IndexSearcher Query ScoreDoc TopDocs BooleanQuery$Builder BooleanClause$Occur MatchAllDocsQuery TermQuery WildcardQuery TermRangeQuery Sort SortField SortField$Type TopFieldCollector TopScoreDocCollector DocValuesFieldExistsQuery]

[org.apache.lucene.queryparser.classic QueryParser ParseException]

[java.nio.file Paths]))

🧰 Tools

🪛 GitHub Actions: Lint project with clj-kondo

[warning] 8-8: Unused import SortedDocValuesField

[warning] 10-10: Unused import BooleanQuery

[warning] 12-12: Unused import BytesRef

🤖 Prompt for AI Agents

In src/chrondb/index/lucene.clj around lines 8 to 12, the import lists include unused symbols SortedDocValuesField, BooleanQuery, and BytesRef; remove those three symbols from the respective import vectors so the file only imports actually used Lucene classes, then run the linter to confirm no remaining unused-import warnings.

coderabbitai · 2025-10-30T14:48:02Z

src/chrondb/index/memory.clj

+          matcher (fn matcher [doc clause]
+                    (let [{:keys [type field value lower upper include-lower? include-upper? must should must-not filter]} clause]
+                      (case type
+                        :match-all true
+                        :term (= (get doc (keyword field)) value)
+                        :wildcard (let [pattern (-> value str/lower-case (str/replace "*" ".*") (str/replace "?" "."))
+                                        regex (re-pattern pattern)
+                                        v (some-> (get doc (keyword field)) str str/lower-case)]
+                                    (boolean (and v (re-matches regex v))))
+                        :range (let [doc-val (get doc (keyword field))]
+                                 (when doc-val
+                                   (let [comp (compare (str doc-val) (str (or lower doc-val)))
+                                         comp-upper (compare (str doc-val) (str (or upper doc-val)))]
+                                     (and (if lower
+                                            (if include-lower?
+                                              (>= comp 0)
+                                              (> comp 0))
+                                            true)
+                                          (if upper
+                                            (if include-upper?
+                                              (<= comp-upper 0)
+                                              (< comp-upper 0))
+                                            true)))))
+                        :boolean (let [must (every? #(matcher doc %) must)
+                                       should (if (seq should)
+                                                (some #(matcher doc %) should)
+                                                true)
+                                       must-not (every? #(not (matcher doc %)) must-not)
+                                       filter-ok (every? #(matcher doc %) filter)]
+                                   (and must should must-not filter-ok))
+                        false)))


⚠️ Potential issue | 🟠 Major

Range comparison and term matching inconsistencies.

Two concerns with the matcher implementation:

Range comparisons use string comparison (line 77-78), which won't work correctly for numeric ranges. For example, "9" > "10" in string comparison. Consider checking if values are numeric and comparing numerically.

Inconsistent case sensitivity: wildcard matching is case-insensitive (line 73 str/lower-case), but term matching is case-sensitive (line 70). This could lead to unexpected behavior where :term and :wildcard queries behave differently.

Consider this improvement for numeric range handling:

:range (let [doc-val (get doc (keyword field))] (when doc-val (let [doc-num (or (try-parse-number doc-val) doc-val) lower-num (when lower (or (try-parse-number lower) lower)) upper-num (when upper (or (try-parse-number upper) upper)) comp (compare doc-num (or lower-num doc-num)) comp-upper (compare doc-num (or upper-num doc-num))] ...)))

coderabbitai · 2025-10-30T14:48:02Z

src/chrondb/query/ast.clj

+  (when value
+    (wildcard field (str value "*"))))
+
+(defn range


⚠️ Potential issue | 🟠 Major

Rename functions to avoid shadowing clojure.core.

Six functions shadow core Clojure functions: range, boolean, and, or, not, and sort-by. This creates ambiguity and potential bugs, especially visible at line 143 where or is called within the not function.

Consider prefixing with q- or using -clause suffix (e.g., range-clause, q-and, q-or, q-not, bool-clause, sort-descriptor).

Based on learnings

Example renaming:

-(defn range +(defn range-clause "Creates a range clause. Provide nil for open bounds. For inclusive/exclusive bounds use the options map. Supports value-type option: :long, :double, :string (default)." ([field lower upper] - (range field lower upper {:include-lower? true :include-upper? true})) + (range-clause field lower upper {:include-lower? true :include-upper? true})) ([field lower upper {:keys [include-lower? include-upper? type] :or {include-lower? true include-upper? true}}] (when field {:type :range :field (name field) :lower (when (some? lower) (str lower)) :upper (when (some? upper) (str upper)) :include-lower? (boolean include-lower?) :include-upper? (boolean include-upper?) :value-type type}))) -(defn boolean +(defn bool-clause "Generic boolean node." [{:keys [must should must-not filter]}] ...) -(defn and +(defn q-and "Combines clauses using logical AND." [& clauses] ...) -(defn or +(defn q-or "Combines clauses using logical OR." [& clauses] ...) -(defn not +(defn q-not "Negates a clause." [clause] - (boolean {:must [(match-all)] - :must-not [(or clause (match-all))]})) + (bool-clause {:must [(match-all)] + :must-not [(q-or clause (match-all))]})) -(defn sort-by +(defn sort-descriptor "Creates a sort descriptor. Optionally specify :direction and :type (:string, :long, :double, :doc, :score)." ([field] - (sort-by field :asc)) + (sort-descriptor field :asc)) ([field direction] - (sort-by field direction :string)) + (sort-descriptor field direction :string)) ([field direction type] {:field (name field) :direction (keyword direction) :type (keyword type)}))

Also applies to: 92-92, 121-121, 130-130, 139-139, 145-145

🧰 Tools

🪛 GitHub Actions: Lint project with clj-kondo

[warning] 34-34: range already refers to #'clojure.core/range

coderabbitai · 2025-10-30T14:48:02Z

test/chrondb/api/sql/parser/clauses_test.clj

@@ -0,0 +1,47 @@
+(ns chrondb.api.sql.parser.clauses-test
+  (:require [clojure.test :refer [deftest is testing]]


⚠️ Potential issue | 🟡 Minor

Remove unused import.

The testing macro is imported but never used in any of the test definitions, as flagged by the linter.

Apply this diff to remove the unused import:

-(ns chrondb.api.sql.parser.clauses-test - (:require [clojure.test :refer [deftest is testing]] +(ns chrondb.api.sql.parser.clauses-test + (:require [clojure.test :refer [deftest is]] [chrondb.api.sql.parser.tokenizer :as tokenizer] [chrondb.api.sql.parser.clauses :as clauses]))

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

(:require [clojure.test :refer [deftest is testing]]

(ns chrondb.api.sql.parser.clauses-test

(:require [clojure.test :refer [deftest is]]

[chrondb.api.sql.parser.tokenizer :as tokenizer]

[chrondb.api.sql.parser.clauses :as clauses]))

🧰 Tools

🪛 GitHub Actions: Lint project with clj-kondo

[warning] 2-2: #'clojure.test/testing' is referred but never used

🤖 Prompt for AI Agents

In test/chrondb/api/sql/parser/clauses_test.clj around line 2, the require vector imports the testing macro but it is never used; remove testing from the (:require [clojure.test :refer [...]]) list so only used symbols (deftest and is) are referred, updating the require form accordingly to avoid the linter warning about an unused import.

coderabbitai · 2025-10-30T14:48:02Z

test/chrondb/index/lucene_test.clj

+      (let [results (index/search-query *test-index* {:field "name" :value "John"} "main" {:limit 10})]
        (is (= 1 (count results)))
        (is (= "user:1" (first results))))))

  (testing "Remove document"
    (let [doc {:id "user:1" :name "John Doe" :age 30 :email "[email protected]"}]
      (index/index-document *test-index* doc)
-      (let [before-remove (index/search *test-index* "name" "John" "main")]
+      (let [before-remove (index/search-query *test-index* {:field "name" :value "John"} "main" {:limit 10})]
        (is (= 1 (count before-remove)))
        (index/delete-document *test-index* (:id doc))
-        (let [after-remove (index/search *test-index* "name" "John" "main")]
+        (let [after-remove (index/search-query *test-index* {:field "name" :value "John"} "main" {:limit 10})]
          (is (empty? after-remove)))))))


⚠️ Potential issue | 🟠 Major

Update tests for map-shaped search results

index/search-query now returns a result map (e.g. {:ids [...], :total ...}), but the tests still treat it as a plain seq. (count results) and (first results) will now operate on the map entries instead of the document ids, so these assertions will fail. Please destructure (:ids ...) (and adjust the rest of the file in the same way) before making assertions. For example:

- (let [results (index/search-query *test-index* {:field "name" :value "John"} "main" {:limit 10})] - (is (= 1 (count results))) - (is (= "user:1" (first results))))) + (let [{:keys [ids]} (index/search-query *test-index* {:field "name" :value "John"} "main" {:limit 10})] + (is (= 1 (count ids))) + (is (= "user:1" (first ids))))))

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents

In test/chrondb/index/lucene_test.clj around lines 27-38, the tests treat the return value of index/search-query as a seq but it now returns a map {:ids [...], :total ...}; update the lets that capture results, before-remove and after-remove to destructure the map (e.g. bind ids and total from the map), then change assertions to operate on ids (use (count ids), (first ids), and (empty? ids)) and optionally assert :total if needed so the assertions check document ids rather than map entries.

coderabbitai

Actionable comments posted: 12

♻️ Duplicate comments (2)

src/chrondb/index/lucene.clj (2)

85-105: Add doc values for sortable/filterable string fields.

String clauses (:exists, sorting) still fail because create-lucene-doc never writes doc values for non-numeric fields. During sorting Lucene throws IllegalStateException: field was indexed without docvalues, and DocValuesFieldExistsQuery always reports “missing”, so :exists/:missing are broken. Please add a SortedDocValuesField (or equivalent) whenever the field isn’t indexed as numeric, importing org.apache.lucene.document.SortedDocValuesField and org.apache.lucene.util.BytesRef. Example fix:

-          (.add ldoc (TextField. field-name field-value Field$Store/YES))
-
-          (cond
+          (.add ldoc (TextField. field-name field-value Field$Store/YES))
+
+          (cond
             long-val
             (let [long-arr (long-array [long-val])]
               (.add ldoc (LongPoint. field-name long-arr))
               (.add ldoc (NumericDocValuesField. field-name long-val)))
 
             double-val
             (let [double-arr (double-array [double-val])]
               (.add ldoc (DoublePoint. field-name double-arr))
               (.add ldoc (DoubleDocValuesField. field-name double-val)))
+            :else
+            (.add ldoc (SortedDocValuesField. field-name (BytesRef. field-value))))

This restores consistent behaviour for :exists, :missing, and string sorts.

147-155: Use numeric range queries for numeric value types.

Numeric range clauses still build TermRangeQuery, so numbers are compared lexicographically (“10” < “2”) and point fields (LongPoint, DoublePoint) are ignored. :range-long/:range-double therefore return wrong or empty results. Please switch to the point range APIs (e.g. LongPoint/newRangeQuery, DoublePoint/newRangeQuery) using parse-long-safe/parse-double-safe and adjust for exclusive bounds via inc/dec / Math/nextUp / Math/nextDown. Fall back to TermRangeQuery only when parsing fails. Example sketch:

-    (case value-type
-      :long (TermRangeQuery. field lower upper (boolean include-lower?) (boolean include-upper?))
-      :double (TermRangeQuery. field lower upper (boolean include-lower?) (boolean include-upper?))
-      (TermRangeQuery. field lower upper (boolean include-lower?) (boolean include-upper?)))))
+    (case value-type
+      :long (let [lo (parse-long-safe lower)
+                  hi (parse-long-safe upper)]
+              (if (and (nil? lo) (nil? hi))
+                (TermRangeQuery. field lower upper (boolean include-lower?) (boolean include-upper?))
+                (LongPoint/newRangeQuery field
+                                         (if lo (if include-lower? lo (min Long/MAX_VALUE (inc lo))) Long/MIN_VALUE)
+                                         (if hi (if include-upper? hi (max Long/MIN_VALUE (dec hi))) Long/MAX_VALUE))))
+      :double (let [lo (parse-double-safe lower)
+                    hi (parse-double-safe upper)]
+                (if (and (nil? lo) (nil? hi))
+                  (TermRangeQuery. field lower upper (boolean include-lower?) (boolean include-upper?))
+                  (DoublePoint/newRangeQuery field
+                                             (if lo (if include-lower? lo (Math/nextUp lo)) Double/NEGATIVE_INFINITY)
+                                             (if hi (if include-upper? hi (Math/nextDown hi)) Double/POSITIVE_INFINITY))))
+      (TermRangeQuery. field lower upper (boolean include-lower?) (boolean include-upper?)))))

This aligns query behaviour with how documents are indexed.

🧹 Nitpick comments (6)

docs/operations.md (1)

220-228: Well-documented operational guidance for background reindexing.

The section clearly explains the non-blocking reindexing behavior and provides actionable observability guidance (specific log namespaces and metric prefixes). The recommendation to monitor after restores/imports is particularly valuable for production operators.

Minor suggestion: Consider documenting whether the hourly maintenance interval is configurable, or if it should be for production environments with varying load patterns.
README.md (1)
27-30: Consider adding direct documentation links.

The section clearly describes the Lucene upgrade motivation and benefits. To improve discoverability, consider adding direct links to the relevant documentation sections mentioned (e.g., docs/ast-queries.md, docs/architecture.md, or configuration guides).

Apply this diff to add helpful links:
-ChronDB's indexing layer now runs entirely on Apache Lucene, replacing the previous bespoke search structures. The upgrade brings production-grade analyzers, configurable secondary and composite indexes, geospatial capabilities, and a query planner with smart caching. The work was motivated by real-world workloads that demanded lower latency and richer search semantics powered by a proven search engine. Dive into the updated documentation to learn how to enable and tune the new indexes for your applications.
+ChronDB's indexing layer now runs entirely on Apache Lucene, replacing the previous bespoke search structures. The upgrade brings production-grade analyzers, configurable secondary and composite indexes, geospatial capabilities, and a query planner with smart caching. The work was motivated by real-world workloads that demanded lower latency and richer search semantics powered by a proven search engine. See [AST Queries](docs/ast-queries.md), [Architecture](docs/architecture.md), and [Performance](docs/performance.md) documentation to learn how to enable and tune the new indexes for your applications.
src/chrondb/cli/server.clj (1)
115-117: Add error handling for background index population.

Background index population may fail silently. While it's appropriate to run asynchronously, consider adding error handling or logging to track completion status.

Apply this diff to improve observability:
       _ (println "Lucene index created successfully: " (type index))
-      ;; Ensure index is populated with existing documents (in background)
-      _ (lucene/ensure-index-populated index storage "main")]
+      ;; Ensure index is populated with existing documents (in background)
+      _ (try
+          (lucene/ensure-index-populated index storage "main")
+          (println "Background index population initiated")
+          (catch Exception e
+            (println "Warning: Background index population failed:" (.getMessage e))))]
AGENT.md (1)
74-74: Add missing punctuation.

The line is missing a period at the end.

Apply this diff:
-6. Route new search features through the Lucene planner, cache, and index definitions—avoid bespoke query engines that bypass the centralized indexing layer
+6. Route new search features through the Lucene planner, cache, and index definitions—avoid bespoke query engines that bypass the centralized indexing layer.
Based on static analysis hints.
src/chrondb/index/protocol.clj (1)

10-15: Consider adding deprecation timeline.

The protocol evolution is well-designed: the legacy search method is retained with a deprecation notice while the new search-query method is introduced with comprehensive documentation. Consider adding a timeline or version number for when the legacy method will be removed to help downstream consumers plan their migrations.
docs/ast-queries.md (1)
88-89: Clarify URL encoding for structured queries.

The example shows an EDN map in a query parameter, but in practice this would need URL encoding. Consider adding a note or showing the encoded form to prevent confusion.

Apply this diff to add clarification:
 # Structured AST query (EDN)
-GET /api/v1/search?query={:clauses [{:type :fts :field "content" :value "Software"}]}
+GET /api/v1/search?query=%7B%3Aclauses%20%5B%7B%3Atype%20%3Afts%20%3Afield%20%22content%22%20%3Avalue%20%22Software%22%7D%5D%7D
+# (URL-encoded EDN map: {:clauses [{:type :fts :field "content" :value "Software"}]})

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fbafc0b and 8bd4c0f.

📒 Files selected for processing (30)

AGENT.md (2 hunks)
README.md (1 hunks)
docs/architecture.md (2 hunks)
docs/ast-queries.md (1 hunks)
docs/examples-redis.md (1 hunks)
docs/index.md (1 hunks)
docs/operations.md (1 hunks)
docs/performance.md (1 hunks)
src/chrondb/api/redis/core.clj (3 hunks)
src/chrondb/api/sql/execution/ast_converter.clj (1 hunks)
src/chrondb/api/sql/execution/operators.clj (2 hunks)
src/chrondb/api/sql/execution/query.clj (12 hunks)
src/chrondb/api/sql/parser/clauses.clj (1 hunks)
src/chrondb/api/sql/parser/statements.clj (1 hunks)
src/chrondb/api/sql/protocol/messages/writer.clj (1 hunks)
src/chrondb/api/v1.clj (1 hunks)
src/chrondb/api/v1/routes.clj (3 hunks)
src/chrondb/cli/core.clj (3 hunks)
src/chrondb/cli/http.clj (1 hunks)
src/chrondb/cli/server.clj (1 hunks)
src/chrondb/example.clj (1 hunks)
src/chrondb/index/lucene.clj (4 hunks)
src/chrondb/index/memory.clj (1 hunks)
src/chrondb/index/protocol.clj (1 hunks)
src/chrondb/query/ast.clj (1 hunks)
test/chrondb/api/ast_integration_test.clj (1 hunks)
test/chrondb/api/sql/execution/query_test.clj (1 hunks)
test/chrondb/api/sql/parser/clauses_test.clj (1 hunks)
test/chrondb/index/lucene_test.clj (5 hunks)
test/chrondb/index/memory_test.clj (3 hunks)

🧰 Additional context used

📓 Path-based instructions (9)

docs/**/*.md

📄 CodeRabbit inference engine (AGENT.md)

Document new endpoints and wire-protocol behaviors in docs/

Files:

docs/performance.md
docs/architecture.md
docs/examples-redis.md
docs/ast-queries.md
docs/index.md
docs/operations.md

docs/**

📄 CodeRabbit inference engine (AGENT.md)

Keep developer-focused instructions in AGENT.md and maintain docs/ for user-facing and protocol documentation

Files:

docs/performance.md
docs/architecture.md
docs/examples-redis.md
docs/ast-queries.md
docs/index.md
docs/operations.md

AGENT.md

📄 CodeRabbit inference engine (AGENT.md)

Maintain and update this AGENT.md with developer-focused instructions

Files:

AGENT.md

src/chrondb/**/*.{clj,cljc}

📄 CodeRabbit inference engine (AGENT.md)

Namespaces must use the chrondb.* prefix (e.g., ns chrondb.core) and idiomatic kebab-case var names

Files:

src/chrondb/api/sql/parser/clauses.clj
src/chrondb/api/sql/protocol/messages/writer.clj
src/chrondb/cli/http.clj
src/chrondb/index/memory.clj
src/chrondb/api/sql/execution/ast_converter.clj
src/chrondb/api/redis/core.clj
src/chrondb/api/v1.clj
src/chrondb/api/sql/execution/operators.clj
src/chrondb/example.clj
src/chrondb/index/lucene.clj
src/chrondb/cli/server.clj
src/chrondb/query/ast.clj
src/chrondb/cli/core.clj
src/chrondb/index/protocol.clj
src/chrondb/api/sql/parser/statements.clj
src/chrondb/api/v1/routes.clj
src/chrondb/api/sql/execution/query.clj

**/*.{clj,cljc}

📄 CodeRabbit inference engine (AGENT.md)

**/*.{clj,cljc}: Prefer pure functions and data transformations in Clojure code
Add docstrings for public vars and functions
Follow indentation as enforced by cljfmt
Use descriptive, chronological-operation-aware logging (structured where possible)
Guard shared mutable state; prefer atoms with swap! semantics
Leverage transduce, reduce, or pipelines for efficient data flows

Files:

src/chrondb/api/sql/parser/clauses.clj
src/chrondb/api/sql/protocol/messages/writer.clj
src/chrondb/cli/http.clj
src/chrondb/index/memory.clj
test/chrondb/index/lucene_test.clj
src/chrondb/api/sql/execution/ast_converter.clj
src/chrondb/api/redis/core.clj
src/chrondb/api/v1.clj
src/chrondb/api/sql/execution/operators.clj
src/chrondb/example.clj
test/chrondb/api/sql/parser/clauses_test.clj
test/chrondb/api/ast_integration_test.clj
src/chrondb/index/lucene.clj
test/chrondb/index/memory_test.clj
src/chrondb/cli/server.clj
test/chrondb/api/sql/execution/query_test.clj
src/chrondb/query/ast.clj
src/chrondb/cli/core.clj
src/chrondb/index/protocol.clj
src/chrondb/api/sql/parser/statements.clj
src/chrondb/api/v1/routes.clj
src/chrondb/api/sql/execution/query.clj

docs/examples-*.md

📄 CodeRabbit inference engine (AGENT.md)

Provide runnable examples in docs/examples-*.md when possible

Files:

docs/examples-redis.md

test/**/*.{clj,cljc}

📄 CodeRabbit inference engine (AGENT.md)

test/**/*.{clj,cljc}: Include integration tests for Git-backed storage, and add regression tests for concurrency and conflict resolution
Prefer property-based tests when sequence ordering matters

Files:

test/chrondb/index/lucene_test.clj
test/chrondb/api/sql/parser/clauses_test.clj
test/chrondb/api/ast_integration_test.clj
test/chrondb/index/memory_test.clj
test/chrondb/api/sql/execution/query_test.clj

README.md

📄 CodeRabbit inference engine (AGENT.md)

Update README.md for major user-facing changes

Files:

README.md

docs/operations.md

📄 CodeRabbit inference engine (AGENT.md)

When modifying backup/restore features, update docs/operations.md with CLI/REST examples and incremental bundle flow

Files:

docs/operations.md

🧠 Learnings (12)

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to README.md : Update README.md for major user-facing changes

Applied to files:

docs/performance.md
AGENT.md
docs/index.md
README.md
docs/operations.md

📚 Learning: 2025-10-12T09:40:13.442Z

Learnt from: avelino
PR: moclojer/chrondb#59
File: src/chrondb/core.clj:158-171
Timestamp: 2025-10-12T09:40:13.442Z
Learning: ChronDB's GitStorage only implements the internal protocol chrondb.storage.protocol/Storage and is not a Closeable. There is no simple way to use with-open or try-with-resources patterns without changing the storage's public API. The storage remains open and in use by the server throughout the process lifetime.

Applied to files:

docs/architecture.md
src/chrondb/api/redis/core.clj

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to **/*.{clj,cljc} : Leverage transduce, reduce, or pipelines for efficient data flows

Applied to files:

AGENT.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to **/*.{clj,cljc} : Prefer pure functions and data transformations in Clojure code

Applied to files:

AGENT.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to AGENT.md : Maintain and update this AGENT.md with developer-focused instructions

Applied to files:

AGENT.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to docs/** : Keep developer-focused instructions in AGENT.md and maintain docs/ for user-facing and protocol documentation

Applied to files:

AGENT.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to docs/operations.md : When modifying backup/restore features, update docs/operations.md with CLI/REST examples and incremental bundle flow

Applied to files:

AGENT.md
docs/operations.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to docs/**/*.md : Document new endpoints and wire-protocol behaviors in docs/

Applied to files:

AGENT.md
docs/ast-queries.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to docs/examples-*.md : Provide runnable examples in docs/examples-*.md when possible

Applied to files:

AGENT.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to src/chrondb/**/*.{clj,cljc} : Namespaces must use the chrondb.* prefix (e.g., ns chrondb.core) and idiomatic kebab-case var names

Applied to files:

src/chrondb/api/sql/execution/ast_converter.clj
test/chrondb/api/sql/execution/query_test.clj

📚 Learning: 2025-10-12T09:40:13.442Z

Learnt from: avelino
PR: moclojer/chrondb#59
File: src/chrondb/core.clj:158-171
Timestamp: 2025-10-12T09:40:13.442Z
Learning: In ChronDB, the server mode (chrondb server) starts non-daemon threads for Jetty, Redis, and SQL servers that keep the process alive. Calling System/exit after server initialization would terminate the server immediately, breaking the main execution mode. The lifecycle is managed by clojure.main, which properly propagates exceptions with exit code ≠0.

Applied to files:

src/chrondb/api/redis/core.clj
src/chrondb/cli/server.clj

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to test/**/*.{clj,cljc} : Include integration tests for Git-backed storage, and add regression tests for concurrency and conflict resolution

Applied to files:

test/chrondb/api/sql/parser/clauses_test.clj
test/chrondb/api/ast_integration_test.clj
test/chrondb/index/memory_test.clj

🪛 LanguageTool

AGENT.md

[grammar] ~74-~74: Please add a punctuation mark at the end of paragraph.
Context: ...es that bypass the centralized indexing layer ### When Reviewing Code 1. Confirm ti...

(PUNCTUATION_PARAGRAPH_END)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Run SQL Protocol Benchmark

🔇 Additional comments (10)

src/chrondb/api/sql/parser/statements.clj (1)

157-157: LGTM! Multi-row INSERT support correctly implemented.

The change from (first values) to values correctly enables multi-row INSERT statements by returning all parsed value groups rather than just the first one. This aligns with the execution layer enhancements mentioned in the summary.

docs/examples-redis.md (1)

116-139: Excellent documentation addition for AST-based search.

The examples clearly demonstrate the new Lucene-backed search capabilities with progressive complexity (basic search → pagination → sorting → branch-specific → combined options). The inclusion of the FT.SEARCH alias for RediSearch compatibility is a nice touch for users migrating from Redis.

src/chrondb/api/sql/protocol/messages/writer.clj (1)

32-42: Robust type normalization for protocol messages.

The enhanced type handling correctly addresses multiple representations (nil, numbers, Bytes) and safely converts them to int where needed. The nil check at line 41 properly prevents writing a type byte for startup messages, which aligns with PostgreSQL protocol requirements.

The Byte-to-unsigned-int conversion using bit-and is standard practice for protocol implementations.

test/chrondb/api/sql/parser/clauses_test.clj (1)

1-47: Good test coverage for current parser capabilities.

The tests appropriately cover both implemented features (standard conditions, LIKE, FTS_MATCH, to_tsquery, multiple conditions) and document unimplemented features (BETWEEN, IN, NULL comparisons) by expecting empty results.

The helper function provides a clean abstraction, and the test cases validate both the structure and content of parsed conditions.

Note: These tests validate the current skip behavior for unimplemented SQL constructs. If the parser is enhanced to return unsupported-condition markers (as suggested in the clauses.clj review), these tests should be updated accordingly.

docs/index.md (1)

19-19: LGTM! Clear feature description.

The bullet point concisely describes the new Lucene-powered search capability and aligns with the detailed documentation added in other files (architecture.md, performance.md, examples-redis.md).

src/chrondb/example.clj (1)

26-27: LGTM!

The migration to index/search-query with map-based query parameters aligns with the new AST-based search architecture. The hardcoded limit of 100 and "main" branch are appropriate for example code.

AGENT.md (1)

98-98: LGTM!

Clear guidance for maintaining documentation consistency when modifying Lucene indexing, query planning, or analyzers. The list of documentation files to update is comprehensive.

src/chrondb/cli/http.clj (1)

120-133: LGTM!

The search-documents function follows the established pattern in this file: clear docstring, destructured parameters, and conditional query parameter building with cond->. The implementation is clean and consistent.

test/chrondb/api/sql/execution/query_test.clj (1)

38-47: LGTM!

The change to return a vector of documents instead of a filtered map is more intuitive for test assertions. The transformation is consistently applied across both arities and aligns with the existing get-documents-by-table pattern in the same mock storage.

docs/ast-queries.md (1)

1-220: Excellent documentation for the AST query system!

This documentation is comprehensive, well-structured, and provides practical examples for all protocols. The migration notes, pagination strategies, and type safety sections will be especially helpful for users adopting the new system. The content aligns well with the coding guideline to document new endpoints and wire-protocol behaviors.

coderabbitai · 2025-10-30T14:55:55Z

docs/performance.md

+## Lucene Indexing Overhaul
+
+ChronDB's search layer now leans entirely on Apache Lucene, bringing a substantial performance uplift compared to the earlier bespoke indexes. The rework was driven by production users that needed predictable latency under complex filters and full-text workloads. By adopting Lucene's optimized query execution engine, ChronDB now delegates scoring, boolean logic, and range evaluation to proven algorithms instead of reimplementing them in-house.
+
+### Why it matters
+
+- **Lower tail latencies**: Composite and secondary indexes can now be configured per collection, reducing random I/O and eliminating manual fan-out scans.
+- **Specialized analyzers**: Tokenization, stemming, and language-specific analyzers are first-class Lucene features and drastically improve relevance quality for full-text search scenarios.
+- **Query planner**: ChronDB inspects incoming queries to choose efficient index combinations, avoiding redundant segment scans and materializing only the minimal result set.
+- **Result caching**: Frequently used queries populate an eviction-aware cache, trimming recurring response times and protecting the repository from repeated heavy traversals.
+- **Geospatial acceleration**: Geohash and BKD tree indexes leverage Lucene's spatial extensions, enabling fast proximity queries without bespoke data structures.
+
+### Operational guidance
+
+- Configure index definitions alongside schema creation so that ChronDB can warm caches and statistics proactively.
+- Monitor the new index metrics under `chrondb.index.*` to track cache hit rates, planner fallbacks, and segment merge costs.
+- Use the `ANALYZE INDEX` tooling to recompute query statistics when workloads shift—ChronDB will use the new data to refine its execution plans.
+- When adding batch ingestion jobs, stage writes behind the asynchronous indexer to prevent cache stampedes and leverage Lucene's bulk segment writers.
+


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Comprehensive performance documentation for Lucene migration.

The section effectively communicates the rationale and benefits of the Lucene overhaul with appropriate technical depth. The operational guidance is specific and actionable, particularly the cache warming and monitoring recommendations.

Verify the ANALYZE INDEX command.

Line 63 references "ANALYZE INDEX tooling" for recomputing query statistics. Please confirm this command is implemented in this PR or documented as future work, to avoid referencing unavailable features.

🏁 Script executed:

#!/bin/bash # Search for ANALYZE INDEX implementation or documentation rg -i "analyze\s+index" --type clj rg -i "analyze-index" --type clj

Length of output: 55

🏁 Script executed:

#!/bin/bash # Search for ANALYZE INDEX without type restrictions echo "=== Searching for ANALYZE INDEX patterns ===" rg -i "analyze.?index" --max-count=20 echo "=== Checking file types in repo ===" fd -t f --extension='' | head -20 echo "=== Searching in docs/ for ANALYZE INDEX ===" rg -i "analyze.?index" docs/ echo "=== Searching for command registrations ===" rg -i "command|query.*analyze" -A2 -B2 | head -40

Length of output: 2939

🏁 Script executed:

#!/bin/bash # Broader search for analyze patterns in source code echo "=== Search for 'analyze' in source files ===" rg -i "analyze" src/ -A2 -B2 | head -50 echo "=== Search for query stats/statistics features ===" rg -i "query.*stat|statistics" src/ --type clj | head -20 echo "=== Search for TODOs or FIXMEs mentioning index or analyze ===" rg -i "todo|fixme.*index|todo.*analyze" src/

Length of output: 3474

Remove or clarify the ANALYZE INDEX reference in operational guidance.

Line 63 references "ANALYZE INDEX tooling" to recompute query statistics, but this command is not implemented anywhere in the codebase. Either:

Remove this bullet point if the feature is not yet available, or

Document it as planned future work and clarify current workarounds for query statistics

🤖 Prompt for AI Agents

In docs/performance.md around lines 47 to 65, the "ANALYZE INDEX" reference is inaccurate because the command isn't implemented; either remove that bullet or explicitly mark it as planned work and provide current workarounds: replace the sentence with a note that ANALYZE INDEX is planned (include an expected status like "planned" or "TBD") and add a short sentence describing current options to refresh query statistics such as reindexing the collection, allowing the indexer to fully warm caches after schema changes, or running targeted rebuilds of affected indexes until the tooling is available.

coderabbitai · 2025-10-30T14:55:55Z

src/chrondb/api/redis/core.clj

+            doc-ids (:ids result)
+            docs (filter some? (map #(storage/get-document storage % branch-name) doc-ids))
+
+            ;; Format results as array of JSON strings


⚠️ Potential issue | 🔴 Critical

Fix storage/get-document invocation arity

Everywhere else we still call storage/get-document with two args (storage + id). Passing branch-name here makes this path throw an ArityException before any result is returned, so Redis SEARCH currently fails. Drop the third argument (or add a dedicated branch-aware helper if that’s intentional).

- docs (filter some? (map #(storage/get-document storage % branch-name) doc-ids)) + docs (filter some? (map #(storage/get-document storage %) doc-ids))

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

doc-ids (:ids result)

docs (filter some? (map #(storage/get-document storage % branch-name) doc-ids))

;; Format results as array of JSON strings

doc-ids (:ids result)

docs (filter some? (map #(storage/get-document storage %) doc-ids))

;; Format results as array of JSON strings

🤖 Prompt for AI Agents

In src/chrondb/api/redis/core.clj around lines 546 to 549, the call to storage/get-document is invoked with three arguments (storage, id, branch-name) which causes an ArityException because the function is defined to accept only two (storage, id); remove the third argument so the map uses (storage/get-document storage id) OR, if branch-specific behavior is required, add a new helper function in storage namespace (e.g., get-document-branch) that accepts (storage id branch-name) and call that instead, ensuring calls in this file match the storage API.

coderabbitai · 2025-10-30T14:55:55Z

src/chrondb/api/sql/execution/ast_converter.clj

+        "=" (ast/term field clean-val)
+        "!=" (ast/not (ast/term field clean-val))
+        "<>" (ast/not (ast/term field clean-val))
+        ">" (let [num-info (parse-number clean-val)]
+              (if num-info
+                (if (= (:type num-info) :long)
+                  (ast/range-long field (:value num-info) nil {:include-lower? false})
+                  (ast/range-double field (:value num-info) nil {:include-lower? false}))
+                (ast/range field clean-val nil {:include-lower? false})))
+        ">=" (let [num-info (parse-number clean-val)]
+               (if num-info
+                 (if (= (:type num-info) :long)
+                   (ast/range-long field (:value num-info) nil {:include-lower? true})
+                   (ast/range-double field (:value num-info) nil {:include-lower? true}))
+                 (ast/range field clean-val nil {:include-lower? true})))


⚠️ Potential issue | 🔴 Critical

Handle numeric equality/inequality with numeric clauses.

parse-number already detects numeric literals, but the "=", "!=", and "<>" branches ignore it and emit string term clauses. For numeric fields (e.g., age = 30), Lucene expects a numeric query; the string term will never match the indexed numeric value, so these comparisons silently fail.

Please reuse parse-number here and build the appropriate numeric clause (e.g., an inclusive range where lower==upper). For example:

- "=" (ast/term field clean-val) - "!=" (ast/not (ast/term field clean-val)) - "<>" (ast/not (ast/term field clean-val)) + "=" (if-let [num-info (parse-number clean-val)] + (case (:type num-info) + :long (ast/range-long field (:value num-info) (:value num-info) {:include-lower? true :include-upper? true}) + :double (ast/range-double field (:value num-info) (:value num-info) {:include-lower? true :include-upper? true})) + (ast/term field clean-val)) + "!=" (if-let [num-info (parse-number clean-val)] + (ast/not + (case (:type num-info) + :long (ast/range-long field (:value num-info) (:value num-info) {:include-lower? true :include-upper? true}) + :double (ast/range-double field (:value num-info) (:value num-info) {:include-lower? true :include-upper? true}))) + (ast/not (ast/term field clean-val))) + "<>" (if-let [num-info (parse-number clean-val)] + (ast/not + (case (:type num-info) + :long (ast/range-long field (:value num-info) (:value num-info) {:include-lower? true :include-upper? true}) + :double (ast/range-double field (:value num-info) (:value num-info) {:include-lower? true :include-upper? true}))) + (ast/not (ast/term field clean-val)))

coderabbitai · 2025-10-30T14:55:56Z

src/chrondb/api/sql/execution/operators.clj

        cond-val (str/replace (:value condition) #"['\"]" "")
-        operator (str/lower-case (:op condition))]
-    (case operator
-      "=" (= (str field-val) cond-val)
-      "!=" (not= (str field-val) cond-val)
-      "<>" (not= (str field-val) cond-val)
-      ">" (> (compare (str field-val) cond-val) 0)
-      "<" (< (compare (str field-val) cond-val) 0)
-      ">=" (>= (compare (str field-val) cond-val) 0)
-      "<=" (<= (compare (str field-val) cond-val) 0)
-      "like" (re-find (re-pattern (str "(?i)" (str/replace cond-val #"%" ".*"))) (str field-val))
-      ;; Default case
-      (do
-        (log/log-warn (str "Unsupported operator: " operator))
-        false))))
+        operator (str/lower-case (:op condition))
+        ;; Try to parse as numbers for numeric comparisons
+        field-num (try-parse-number field-val)
+        cond-num (try-parse-number cond-val)]


⚠️ Potential issue | 🔴 Critical

Guard nil and non-string condition values before normalization

str/replace on (:value condition) blows up when the condition carries nil (e.g. IS NULL) or a numeric literal that already arrives as a number. In practice this now triggers NPE/ClassCastException before we even reach the operator case, so IS NULL queries crash the SQL path. Please coerce safely before stripping quotes.

- cond-val (str/replace (:value condition) #"['\"]" "") + raw-cond (:value condition) + cond-val (when (some? raw-cond) + (-> (if (string? raw-cond) + raw-cond + (str raw-cond)) + (str/replace #"['\"]" "")))

🤖 Prompt for AI Agents

In src/chrondb/api/sql/execution/operators.clj around lines 29–33, the code calls str/replace on (:value condition) and str/lower-case on (:op condition) without guarding for nil or non-string values, which throws when condition.value is nil or already a number; change cond-val to only strip quotes when the value is a string (leave nil or numeric values as-is) and coerce operator safely by lower-casing only when it is a string (or calling str first), e.g. compute raw-val (:value condition) then set cond-val to (if (string? raw-val) (str/replace raw-val #"['\"]" "") raw-val) and set operator similarly (if (string? (:op condition)) (str/lower-case (:op condition)) (str/lower-case (str (:op condition)))) so numeric and nil values no longer cause NPE/ClassCastException.

coderabbitai · 2025-10-30T14:55:56Z

src/chrondb/api/sql/execution/query.clj

+                         (let [doc-ids (:ids lucene-result)
+                               ;; Deduplicate IDs before fetching documents
+                               unique-doc-ids (distinct doc-ids)
+                               docs (filter some? (map #(storage/get-document storage % branch-name) unique-doc-ids))]
+                           (log/log-info (str "Lucene returned " (count doc-ids) " document IDs, " (count unique-doc-ids) " unique, fetched " (count docs) " documents"))
+                           ;; Deduplicate by ID to ensure no duplicates
+                           (->> docs
+                                (clojure.core/group-by :id)
+                                (vals)
+                                (map first)))


⚠️ Potential issue | 🟠 Major

Preserve Lucene ordering when removing duplicates.

After fetching documents by ID you regroup by :id and then take (vals) on the hash-map, which reorders the sequence. Any Lucene sort (or pagination cursor) becomes meaningless because results come back in arbitrary hash order. You already deduplicated IDs with distinct, so drop the group-by step and keep the original order (e.g. just (vec docs)). Without this fix ORDER BY and cursor-based paging will return unstable sequences.

- ;; Deduplicate by ID to ensure no duplicates - (->> docs - (clojure.core/group-by :id) - (vals) - (map first))) + ;; Already deduplicated by `distinct`; preserve Lucene order + (vec docs))

🤖 Prompt for AI Agents

In src/chrondb/api/sql/execution/query.clj around lines 328-337, the post-fetch regrouping via group-by and vals reorders results and destroys Lucene's sort/cursor ordering; remove the group-by/vals/(map first) pipeline and instead preserve Lucene ordering by returning the fetched documents in the original fetch order (e.g. construct docs by mapping over unique-doc-ids and filtering nil, or simply use the docs vector as-is after deduplication), ensuring no additional reordering occurs.

coderabbitai · 2025-10-30T14:55:57Z

src/chrondb/api/v1/routes.clj

+        branch-name (or (not-empty branch) "main")
+        ;; Parse sort descriptors
+        sort-descriptors (parse-sort-param sort)
+
+        ;; Parse searchAfter cursor
+        search-after-cursor (or (parse-search-after after)
+                               (parse-search-after cursor))
+
+        ;; Parse limit and offset
+        limit-int (when limit (try (Integer/parseInt limit) (catch Exception _ nil)))
+        offset-int (when offset (try (Integer/parseInt offset) (catch Exception _ nil)))
+
+        ;; Build AST query
+        ast-query (cond
+                   ;; Use structured query if provided
+                   (parse-structured-query query)
+                   (let [structured (parse-structured-query query)]
+                     (ast/query (:clauses structured)
+                               {:sort sort-descriptors
+                                :limit limit-int
+                                :offset offset-int
+                                :branch branch-name
+                                :after search-after-cursor}))
+
+                   ;; Convert simple query string to AST
+                   (some? q)
+                   (let [fts-clause (ast/fts "content" q)]
+                     (ast/query [fts-clause]
+                               {:sort sort-descriptors
+                                :limit limit-int
+                                :offset offset-int
+                                :branch branch-name
+                                :after search-after-cursor}))
+
+                   ;; Empty query
+                   :else
+                   (ast/query []
+                             {:sort sort-descriptors
+                              :limit limit-int
+                              :offset offset-int
+                              :branch branch-name
+                              :after search-after-cursor}))
+
+        _ (log/log-info (str "AST query built: " ast-query))
+
+        ;; Build options map
+        opts (cond-> {}
+               limit-int (assoc :limit limit-int)
+               offset-int (assoc :offset offset-int)
+               (seq sort-descriptors) (assoc :sort sort-descriptors)
+               search-after-cursor (assoc :after search-after-cursor))
+
+        ;; Execute query
+        result (if (seq (:clauses ast-query))
+                (index/search-query index ast-query branch-name opts)
+                {:ids [] :total 0 :limit (or limit-int 100) :offset (or offset-int 0)})
+
+        ;; Fetch full documents from storage using IDs
+        results (if (seq (:ids result))
+                 (filter some? (map #(storage/get-document storage % branch-name) (:ids result)))
+                 [])
+
+        ;; Prepare response with next cursor if available
+        response-body (cond-> {:results (vec results)
+                               :total (:total result)
+                               :limit (:limit result)
+                               :offset (:offset result)}
+                        (:next-cursor result)
+                        (assoc :next-cursor
+                               (let [encoder (java.util.Base64/getEncoder)
+                                     cursor-data {:doc (.-doc ^org.apache.lucene.search.ScoreDoc (:next-cursor result))
+                                                  :score (.-score ^org.apache.lucene.search.ScoreDoc (:next-cursor result))}
+                                     encoded (String. (.encode encoder (.getBytes (pr-str cursor-data) "UTF-8")) "UTF-8")]
+                                 encoded)))]


⚠️ Potential issue | 🔴 Critical

Respect the configured default branch.

We again fall back to "main" whenever the client omits branch. On instances whose configured default branch is "master" (or any other value), every search will point at a non-existent branch and return empty results. Please load the default branch from configuration (you may need to require [chrondb.config :as config]) and only fall back to "main" if the config is unset/blank.

- {:keys [q query branch limit offset sort after cursor]} params - branch-name (or (not-empty branch) "main") + {:keys [q query branch limit offset sort after cursor]} params + cfg (config/load-config) + configured-branch (some-> (get-in cfg [:git :default-branch]) str/trim) + branch-name (if (str/blank? branch) + (or (not (str/blank? configured-branch) configured-branch) "main") + branch)

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents

In src/chrondb/api/v1/routes.clj around lines 94-167, the branch-name binding always falls back to the hard-coded "main" when branch is missing; change it to read the configured default branch from chrondb.config (e.g. require [chrondb.config :as config]) and use that value when branch is blank, only falling back to "main" if the config value is nil/blank; update the branch-name expression to prefer the explicit branch param, then the config default (checking non-empty), and finally "main", and add the necessary require at the top of the file.

coderabbitai · 2025-10-30T14:55:57Z

src/chrondb/cli/server.clj

+    (sql-server/start-server storage index sql-port))
+  ;; Start index maintenance task (runs every 60 minutes by default)
+  (lucene/start-index-maintenance-task index 60))


🛠️ Refactor suggestion | 🟠 Major

Add error handling for index maintenance task.

Starting the index maintenance task lacks error handling. If lucene/start-index-maintenance-task throws an exception, the servers will be partially initialized without maintenance, and the error may be silently swallowed.

Apply this diff to add error handling:

(when-not disable-sql (println "Starting SQL protocol server on port" sql-port) (sql-server/start-server storage index sql-port)) - ;; Start index maintenance task (runs every 60 minutes by default) - (lucene/start-index-maintenance-task index 60)) + ;; Start index maintenance task (runs every 60 minutes by default) + (try + (lucene/start-index-maintenance-task index 60) + (println "Index maintenance task started successfully") + (catch Exception e + (println "Warning: Failed to start index maintenance task:" (.getMessage e)))))

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

(sql-server/start-server storage index sql-port))

;; Start index maintenance task (runs every 60 minutes by default)

(lucene/start-index-maintenance-task index 60))

(sql-server/start-server storage index sql-port))

;; Start index maintenance task (runs every 60 minutes by default)

(try

(lucene/start-index-maintenance-task index 60)

(println "Index maintenance task started successfully")

(catch Exception e

(println "Warning: Failed to start index maintenance task:" (.getMessage e)))))

🤖 Prompt for AI Agents

In src/chrondb/cli/server.clj around lines 105 to 107, the call to lucene/start-index-maintenance-task lacks error handling and could throw, leaving servers partially initialized; wrap the call in a try/catch (or try/catch and finally) that catches Throwable, logs the error with context and the exception stack (using the existing logger), and then either rethrow or call a controlled shutdown/exit if you want to fail fast; ensure the catch does not silently swallow the exception and that the log message clearly states the maintenance task failed and includes the exception details.

coderabbitai · 2025-10-30T14:55:57Z

src/chrondb/index/memory.clj

+                    (let [{:keys [type field value lower upper include-lower? include-upper? must should must-not filter]} clause]
+                      (case type
+                        :match-all true
+                        :term (= (get doc (keyword field)) value)
+                        :wildcard (let [pattern (-> value str/lower-case (str/replace "*" ".*") (str/replace "?" "."))
+                                        regex (re-pattern pattern)
+                                        v (some-> (get doc (keyword field)) str str/lower-case)]
+                                    (boolean (and v (re-matches regex v))))
+                        :range (let [doc-val (get doc (keyword field))]


⚠️ Potential issue | 🟠 Major

Escape regex metacharacters when translating wildcards

The wildcard branch currently replaces */? but leaves every other regex metacharacter untouched. Searching for acme.com ends up matching acmexcom, [foo], etc., because ./[] are interpreted by the regex engine. Please escape the literal characters before expanding the wildcard tokens.

- :wildcard (let [pattern (-> value str/lower-case (str/replace "*" ".*") (str/replace "?" ".")) - regex (re-pattern pattern) + :wildcard (let [regex (->> (str/lower-case (str value)) + (map (fn [ch] + (case ch + \* ".*" + \? "." + (java.util.regex.Pattern/quote (str ch))))) + (apply str) + re-pattern) v (some-> (get doc (keyword field)) str str/lower-case)] (boolean (and v (re-matches regex v))))

🤖 Prompt for AI Agents

In src/chrondb/index/memory.clj around lines 67 to 75, the wildcard branch builds a regex by only replacing '*' and '?' which leaves other regex metacharacters unescaped (so e.g. '.' or '[]' are treated as regex syntax). Fix by first escaping regex metacharacters in the input string (except '*' and '?') and then expand '*' -> '.*' and '?' -> '.'; e.g. convert value to string/lower-case, run a replace that escapes characters like . \ + ^ $ ( ) { } [ ] | so they become literal (or use a targeted replace with a capture group to prefix them with a backslash), then replace the escaped '*'/'?' tokens with ".*" and "." respectively, build the re-pattern from that escaped-and-transformed string, and use it for re-matches.

coderabbitai · 2025-10-30T14:55:57Z

src/chrondb/index/memory.clj

+                        :range (let [doc-val (get doc (keyword field))]
+                                 (when doc-val
+                                   (let [comp (compare (str doc-val) (str (or lower doc-val)))
+                                         comp-upper (compare (str doc-val) (str (or upper doc-val)))]
+                                     (and (if lower
+                                            (if include-lower?
+                                              (>= comp 0)
+                                              (> comp 0))
+                                            true)
+                                          (if upper
+                                            (if include-upper?
+                                              (<= comp-upper 0)
+                                              (< comp-upper 0))
+                                            true)))))


⚠️ Potential issue | 🟠 Major

Handle numeric range comparisons numerically

compare on stringified values breaks numeric ranges (:age 10 fails the > 2 test because "10" < "2" lexicographically). We need to coerce inputs to numbers when possible, falling back to case-insensitive string compare only when both sides are non-numeric.

- :range (let [doc-val (get doc (keyword field))] - (when doc-val - (let [comp (compare (str doc-val) (str (or lower doc-val))) - comp-upper (compare (str doc-val) (str (or upper doc-val)))] - (and (if lower - (if include-lower? - (>= comp 0) - (> comp 0)) - true) - (if upper - (if include-upper? - (<= comp-upper 0) - (< comp-upper 0)) - true))))) + :range (let [doc-val (get doc (keyword field)) + doc-num (parse-num doc-val) + lower-num (parse-num lower) + upper-num (parse-num upper) + doc-str (when (some? doc-val) (str/lower-case (str doc-val))) + lower-str (when (some? lower) (str/lower-case (str lower))) + upper-str (when (some? upper) (str/lower-case (str upper)))] + (when (some? doc-val) + (and (if (some? lower) + (if (and doc-num lower-num) + (if include-lower? + (>= doc-num lower-num) + (> doc-num lower-num)) + (let [cmp (compare doc-str lower-str)] + (if include-lower? (>= cmp 0) (> cmp 0)))) + true) + (if (some? upper) + (if (and doc-num upper-num) + (if include-upper? + (<= doc-num upper-num) + (< doc-num upper-num)) + (let [cmp (compare doc-str upper-str)] + (if include-upper? (<= cmp 0) (< cmp 0)))) + true))))

Add a local parse-num helper (as above) that returns a Double for numeric strings or numbers, otherwise nil.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents

In src/chrondb/index/memory.clj around lines 75 to 88, the current range comparison stringifies values and uses compare which breaks numeric ranges (e.g. 10 vs 2). Add a local parse-num helper that returns a Double when given a numeric value or numeric string and nil otherwise, then in the :range branch try to parse both doc-val and bound (lower/upper) into numbers; if both sides are numeric perform numeric comparisons (>, >=, <, <=) using those Doubles, otherwise fall back to case-insensitive string compare as before; preserve the existing include-lower?/include-upper? logic and short-circuit when lower/upper are nil.

coderabbitai · 2025-10-30T14:55:57Z

test/chrondb/index/lucene_test.clj

+      (let [results (index/search-query *test-index* {:field "name" :value "John"} "main" {:limit 10})]
        (is (= 1 (count results)))
        (is (= "user:1" (first results))))))

  (testing "Remove document"
    (let [doc {:id "user:1" :name "John Doe" :age 30 :email "[email protected]"}]
      (index/index-document *test-index* doc)
-      (let [before-remove (index/search *test-index* "name" "John" "main")]
+      (let [before-remove (index/search-query *test-index* {:field "name" :value "John"} "main" {:limit 10})]
        (is (= 1 (count before-remove)))
        (index/delete-document *test-index* (:id doc))
-        (let [after-remove (index/search *test-index* "name" "John" "main")]
+        (let [after-remove (index/search-query *test-index* {:field "name" :value "John"} "main" {:limit 10})]
          (is (empty? after-remove)))))))


⚠️ Potential issue | 🔴 Critical

Update expectations for the new search-query result shape.

index/search-query now returns a map (e.g., {:ids [...], :total ..., ...}) as used by the REST handler and the AST integration tests. Treating the return value as a plain vector (count results, first results) no longer works and these assertions will fail. Please destructure :ids (and any other fields you need) before asserting.

- (let [results (index/search-query *test-index* {:field "name" :value "John"} "main" {:limit 10})] - (is (= 1 (count results))) - (is (= "user:1" (first results)))) + (let [{:keys [ids]} (index/search-query *test-index* {:field "name" :value "John"} "main" {:limit 10})] + (is (= 1 (count ids))) + (is (= "user:1" (first ids))))

Please apply the same adjustment to the other assertions in this file.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

(let [results (index/search-query *test-index* {:field "name" :value "John"} "main" {:limit 10})]

(is (= 1 (count results)))

(is (= "user:1" (first results))))))

(testing "Remove document"

(let [doc {:id "user:1" :name "John Doe" :age 30 :email "[email protected]"}]

(index/index-document *test-index* doc)

(let [before-remove (index/search *test-index* "name" "John" "main")]

(let [before-remove (index/search-query *test-index* {:field "name" :value "John"} "main" {:limit 10})]

(is (= 1 (count before-remove)))

(index/delete-document *test-index* (:id doc))

(let [after-remove (index/search *test-index* "name" "John" "main")]

(let [after-remove (index/search-query *test-index* {:field "name" :value "John"} "main" {:limit 10})]

(is (empty? after-remove)))))))

(let [{:keys [ids]} (index/search-query *test-index* {:field "name" :value "John"} "main" {:limit 10})]

(is (= 1 (count ids)))

(is (= "user:1" (first ids))))))

(testing "Remove document"

(let [doc {:id "user:1" :name "John Doe" :age 30 :email "[email protected]"}]

(index/index-document *test-index* doc)

(let [before-remove (index/search-query *test-index* {:field "name" :value "John"} "main" {:limit 10})]

(is (= 1 (count before-remove)))

(index/delete-document *test-index* (:id doc))

(let [after-remove (index/search-query *test-index* {:field "name" :value "John"} "main" {:limit 10})]

(is (empty? after-remove)))))))

🤖 Prompt for AI Agents

In test/chrondb/index/lucene_test.clj around lines 27 to 38, the test assumes index/search-query returns a plain vector but it now returns a map (e.g. {:ids [...], :total ...}); update the assertions to destructure the map and operate on :ids (and any other needed keys) instead — bind results (or {:keys [ids total] :as results}) then use (count ids) and (first ids) for your existing checks, and apply the same destructuring and replacement to the other assertions in this file that treat the return as a vector.

…y ensuring Jackson classes are runtime-initialized - Fixes native-image docker build failures caused by Jackson JSON classes with static initialization constraints. - Registers `com.fasterxml.jackson.core.JsonFactory` and `com.fasterxml.jackson.core.io.SerializedString` in `base-run-time-classes` to explicitly force runtime initialization. - Updates AGENT.md for developer guidance: all changes to JSON/Jackson usage must keep this runtime-classes list in sync. - No functional API change; allows docker builds to succeed and ChronDB start up reliably in GraalVM images. Signed-off-by: Avelino <[email protected]>

coderabbitai

Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8bd4c0f and 5780a93.

📒 Files selected for processing (3)

AGENT.md (3 hunks)
dev/chrondb/native_image.clj (2 hunks)
src/chrondb/core.clj (2 hunks)

🧰 Additional context used

📓 Path-based instructions (4)

src/chrondb/**/*.{clj,cljc}

📄 CodeRabbit inference engine (AGENT.md)

Namespaces must use the chrondb.* prefix (e.g., ns chrondb.core) and idiomatic kebab-case var names

Files:

src/chrondb/core.clj

**/*.{clj,cljc}

📄 CodeRabbit inference engine (AGENT.md)

**/*.{clj,cljc}: Prefer pure functions and data transformations in Clojure code
Add docstrings for public vars and functions
Follow indentation as enforced by cljfmt
Use descriptive, chronological-operation-aware logging (structured where possible)
Guard shared mutable state; prefer atoms with swap! semantics
Leverage transduce, reduce, or pipelines for efficient data flows

Files:

src/chrondb/core.clj
dev/chrondb/native_image.clj

dev/chrondb/native_image.clj

📄 CodeRabbit inference engine (AGENT.md)

Generate uberjar and GraalVM metadata via clojure -M:build -- --uberjar from the native_image helper module

Files:

dev/chrondb/native_image.clj

AGENT.md

📄 CodeRabbit inference engine (AGENT.md)

Maintain and update this AGENT.md with developer-focused instructions

Files:

AGENT.md

🧠 Learnings (18)

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to src/chrondb/**/*.{clj,cljc} : Namespaces must use the chrondb.* prefix (e.g., ns chrondb.core) and idiomatic kebab-case var names

Applied to files:

src/chrondb/core.clj
dev/chrondb/native_image.clj
AGENT.md

📚 Learning: 2025-10-12T09:40:13.442Z

Learnt from: avelino
PR: moclojer/chrondb#59
File: src/chrondb/core.clj:158-171
Timestamp: 2025-10-12T09:40:13.442Z
Learning: In ChronDB, the server mode (chrondb server) starts non-daemon threads for Jetty, Redis, and SQL servers that keep the process alive. Calling System/exit after server initialization would terminate the server immediately, breaking the main execution mode. The lifecycle is managed by clojure.main, which properly propagates exceptions with exit code ≠0.

Applied to files:

src/chrondb/core.clj

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to Dockerfile : Docker build steps: install Clojure, run clojure -M:build -- --uberjar to produce target/chrondb.jar, then native-image target/native-image-args -H:Name=chrondb

Applied to files:

src/chrondb/core.clj
dev/chrondb/native_image.clj
AGENT.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to dev/chrondb/native_image.clj : Generate uberjar and GraalVM metadata via clojure -M:build -- --uberjar from the native_image helper module

Applied to files:

src/chrondb/core.clj
dev/chrondb/native_image.clj
AGENT.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to target/native-image-args : Always include --features=clj_easy.graal_build_time.InitClojureClasses in native-image arguments

Applied to files:

dev/chrondb/native_image.clj
AGENT.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to target/native-config/** : When adding reflection or dynamic class loading, add explicit GraalVM configuration under target/native-config/

Applied to files:

dev/chrondb/native_image.clj

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to target/native-image-args : Do not include -H:+UnlockExperimentalVMOptions in native-image arguments (JGit classes already set for build-time init)

Applied to files:

dev/chrondb/native_image.clj
AGENT.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to README.md : Update README.md for major user-facing changes

Applied to files:

AGENT.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to **/*.{clj,cljc} : Leverage transduce, reduce, or pipelines for efficient data flows

Applied to files:

AGENT.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to **/*.{clj,cljc} : Prefer pure functions and data transformations in Clojure code

Applied to files:

AGENT.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to AGENT.md : Maintain and update this AGENT.md with developer-focused instructions

Applied to files:

AGENT.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to docs/** : Keep developer-focused instructions in AGENT.md and maintain docs/ for user-facing and protocol documentation

Applied to files:

AGENT.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to docs/operations.md : When modifying backup/restore features, update docs/operations.md with CLI/REST examples and incremental bundle flow

Applied to files:

AGENT.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to docs/**/*.md : Document new endpoints and wire-protocol behaviors in docs/

Applied to files:

AGENT.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to docs/examples-*.md : Provide runnable examples in docs/examples-*.md when possible

Applied to files:

AGENT.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to .github/workflows/build-native-image.yml : Use build-native-image.yml as the reference for native-image pipeline and smoke tests; keep it aligned with documented steps

Applied to files:

AGENT.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to Dockerfile : Dockerfile must use builder image ghcr.io/graalvm/native-image:ol9-java17-22.3.2 and runtime base debian:12-slim

Applied to files:

AGENT.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to test/**/*.{clj,cljc} : Include integration tests for Git-backed storage, and add regression tests for concurrency and conflict resolution

Applied to files:

AGENT.md

🪛 LanguageTool

AGENT.md

[grammar] ~74-~74: Please add a punctuation mark at the end of paragraph.
Context: ...es that bypass the centralized indexing layer ### When Reviewing Code 1. Confirm ti...

(PUNCTUATION_PARAGRAPH_END)

[grammar] ~117-~117: Please add a punctuation mark at the end of paragraph.
Context: ... runtime instead of during native image analysis ### Advanced Native Image Topics - **...

(PUNCTUATION_PARAGRAPH_END)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: Run SQL Protocol Benchmark
GitHub Check: test
GitHub Check: test

🔇 Additional comments (4)

AGENT.md (1)

98-98: Comprehensive documentation guidance for Lucene changes.

The added instruction appropriately covers all relevant documentation files that need updates when modifying Lucene indexing capabilities.

dev/chrondb/native_image.clj (3)

108-110: Jackson runtime initialization properly configured.

The addition of JsonFactory and SerializedString to runtime classes ensures Jackson components initialize at runtime rather than build time, preventing potential issues with JSON processing in the native image. This aligns with the guidance added in AGENT.md.

Based on learnings.

274-274: Improved naming clarity.

Renaming trace-args to trace-object-args better reflects the actual flag being used (--trace-object-instantiation=).

276-280: Logical argument ordering.

The reordering separates trace-specific arguments from the general tail-args, improving the organization and readability of the native-image argument structure.

coderabbitai · 2025-10-30T15:29:29Z

AGENT.md

 3. Leverage `transduce`, `reduce`, or pipelines for efficient data flows
 4. Guard shared mutable state; prefer atoms with swap! semantics
 5. Profile Git-heavy operations when adding loops or recursion
+6. Route new search features through the Lucene planner, cache, and index definitions—avoid bespoke query engines that bypass the centralized indexing layer


⚠️ Potential issue | 🟡 Minor

Add missing punctuation mark.

The sentence should end with a period per standard grammar conventions for list items.

Apply this diff:

-6. Route new search features through the Lucene planner, cache, and index definitions—avoid bespoke query engines that bypass the centralized indexing layer +6. Route new search features through the Lucene planner, cache, and index definitions—avoid bespoke query engines that bypass the centralized indexing layer.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

6. Route new search features through the Lucene planner, cache, and index definitions—avoid bespoke query engines that bypass the centralized indexing layer

6. Route new search features through the Lucene planner, cache, and index definitions—avoid bespoke query engines that bypass the centralized indexing layer.

🧰 Tools

🪛 LanguageTool

[grammar] ~74-~74: Please add a punctuation mark at the end of paragraph.
Context: ...es that bypass the centralized indexing layer ### When Reviewing Code 1. Confirm ti...

(PUNCTUATION_PARAGRAPH_END)

🤖 Prompt for AI Agents

In AGENT.md around line 74, the list item lacks a terminating period; update the line so it ends with a period (add a "." at the end of the sentence) to match standard punctuation for list items.

coderabbitai · 2025-10-30T15:29:29Z

AGENT.md

+- Ensure Jackson factories stay runtime-initialized: `dev/chrondb/native_image.clj` registers `com.fasterxml.jackson.core.JsonFactory` and related classes under `base-run-time-classes`; keep this list in sync when touching JSON dependencies
+- Keep CLI namespaces lazily loaded (use `requiring-resolve` as in `chrondb.core`) so `clj-http`/`cheshire` initialization happens at runtime instead of during native image analysis


⚠️ Potential issue | 🟡 Minor

Add missing punctuation mark.

Line 117 should end with a period per standard grammar conventions.

Apply this diff:

-- Keep CLI namespaces lazily loaded (use `requiring-resolve` as in `chrondb.core`) so `clj-http`/`cheshire` initialization happens at runtime instead of during native image analysis +- Keep CLI namespaces lazily loaded (use `requiring-resolve` as in `chrondb.core`) so `clj-http`/`cheshire` initialization happens at runtime instead of during native image analysis.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

- Ensure Jackson factories stay runtime-initialized: `dev/chrondb/native_image.clj` registers `com.fasterxml.jackson.core.JsonFactory` and related classes under `base-run-time-classes`; keep this list in sync when touching JSON dependencies

- Keep CLI namespaces lazily loaded (use `requiring-resolve` as in `chrondb.core`) so `clj-http`/`cheshire` initialization happens at runtime instead of during native image analysis

- Ensure Jackson factories stay runtime-initialized: `dev/chrondb/native_image.clj` registers `com.fasterxml.jackson.core.JsonFactory` and related classes under `base-run-time-classes`; keep this list in sync when touching JSON dependencies

- Keep CLI namespaces lazily loaded (use `requiring-resolve` as in `chrondb.core`) so `clj-http`/`cheshire` initialization happens at runtime instead of during native image analysis.

🧰 Tools

🪛 LanguageTool

[grammar] ~117-~117: Please add a punctuation mark at the end of paragraph.
Context: ... runtime instead of during native image analysis ### Advanced Native Image Topics - **...

(PUNCTUATION_PARAGRAPH_END)

🤖 Prompt for AI Agents

AGENT.md around lines 116-117: the sentence on line 117 is missing a terminal period; add a period at the end of the second bullet so the line reads as a complete sentence; update the punctuation only, preserving existing wording and spacing.

coderabbitai · 2025-10-30T15:29:30Z

src/chrondb/core.clj

+            (println (server-usage))
            (println)
-            (println (cli/usage)))
-    :cli (apply cli/-main args)
-    (let [{:keys [command args value]} (server/parse-command args)]
+            (println (cli-usage)))
+    :cli (apply cli-main args)
+    (let [{:keys [command args value]} (parse-server-command args)]
      (if (= command :unknown)
        (do
          (println "Unknown command" value)
-          (println (server/usage)))
-        (server/dispatch! command args)))))
+          (println (server-usage)))
+        (dispatch-server! command args)))))


⚠️ Potential issue | 🟡 Minor

Avoid printing nil after usage helpers

Both chrondb.cli.server/usage and chrondb.cli.core/usage already print their banners and return nil. Wrapping them in println now adds a literal nil line after the help output. Guard the println so we only emit when the helper actually returns text, while still invoking the helper for its side effects.

- :help (do - (println (server-usage)) - (println) - (println (cli-usage))) + :help (do + (when-let [banner (server-usage)] + (println banner)) + (println) + (when-let [banner (cli-usage)] + (println banner))) @@ - (do - (println "Unknown command" value) - (println (server-usage))) + (do + (println "Unknown command" value) + (when-let [banner (server-usage)] + (println banner)))

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents

In src/chrondb/core.clj around lines 73 to 82, the usage helpers are being wrapped in println which prints a trailing "nil" because chrondb.cli.server/usage and chrondb.cli.core/usage print their own banners and return nil; remove the println wrappers or guard them: call the usage helper for its side effects and only println its return value if non-nil (e.g. capture the return and (when result (println result))). Ensure the helpers are still invoked so banners appear but avoid emitting a literal "nil" line.

…tests - Implement `chrondb.api.sql.execution.ast-converter` to translate SQL WHERE condition maps to unified AST query clauses used by Lucene. - Support FTS, standard operators (=, !=, <>, >, <, >=, <=), LIKE (wildcards), IS [NOT] NULL, and number parsing for range queries. - Gracefully handle unknown types and operators, always preserving immutability and fast failure semantics. - Add unit tests for all major conversion and edge cases to `test/chrondb/api/sql/execution/ast_converter_test.clj`. - Document AST composition and query helper contract in `docs/ast-queries.md`. - Cross-link property-based test location in AGENT.md for better developer guidance. This supports full protocol-level parity and reproducible query planning across ChronDB HTTP, Redis, and SQL endpoints. Signed-off-by: Avelino <[email protected]>

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (3)

src/chrondb/api/sql/execution/ast_converter.clj (1)

43-45: Handle numeric equality/inequality with numeric clauses.

The equality operators still use string term clauses, which won't match numeric field values indexed by Lucene. For example, age = 30 will fail because the indexed numeric value won't match a string term query. This was flagged in a previous review.

Reuse parse-number to detect numeric literals and build appropriate numeric range clauses (with lower==upper for equality):

-        "=" (ast/term field clean-val)
-        "!=" (ast/not (ast/term field clean-val))
-        "<>" (ast/not (ast/term field clean-val))
+        "=" (if-let [num-info (parse-number clean-val)]
+              (case (:type num-info)
+                :long (ast/range-long field (:value num-info) (:value num-info) {:include-lower? true :include-upper? true})
+                :double (ast/range-double field (:value num-info) (:value num-info) {:include-lower? true :include-upper? true}))
+              (ast/term field clean-val))
+        "!=" (if-let [num-info (parse-number clean-val)]
+               (ast/not
+                 (case (:type num-info)
+                   :long (ast/range-long field (:value num-info) (:value num-info) {:include-lower? true :include-upper? true})
+                   :double (ast/range-double field (:value num-info) (:value num-info) {:include-lower? true :include-upper? true})))
+               (ast/not (ast/term field clean-val)))
+        "<>" (if-let [num-info (parse-number clean-val)]
+               (ast/not
+                 (case (:type num-info)
+                   :long (ast/range-long field (:value num-info) (:value num-info) {:include-lower? true :include-upper? true})
+                   :double (ast/range-double field (:value num-info) (:value num-info) {:include-lower? true :include-upper? true})))
+               (ast/not (ast/term field clean-val)))

AGENT.md (2)

75-75: Add missing punctuation mark.

The sentence should end with a period per standard grammar conventions.

Apply this diff:

-6. Route new search features through the Lucene planner, cache, and index definitions—avoid bespoke query engines that bypass the centralized indexing layer
+6. Route new search features through the Lucene planner, cache, and index definitions—avoid bespoke query engines that bypass the centralized indexing layer.

118-118: Add missing punctuation mark.

The sentence should end with a period per standard grammar conventions.

Apply this diff:

-- Keep CLI namespaces lazily loaded (use `requiring-resolve` as in `chrondb.core`) so `clj-http`/`cheshire` initialization happens at runtime instead of during native image analysis
+- Keep CLI namespaces lazily loaded (use `requiring-resolve` as in `chrondb.core`) so `clj-http`/`cheshire` initialization happens at runtime instead of during native image analysis.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5780a93 and 8d9fa50.

📒 Files selected for processing (5)

AGENT.md (4 hunks)
docs/ast-queries.md (1 hunks)
src/chrondb/api/sql/execution/ast_converter.clj (1 hunks)
test/chrondb/api/sql/execution/ast_converter_test.clj (1 hunks)
test/chrondb/query/ast_test.clj (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

docs/ast-queries.md

🧰 Additional context used

📓 Path-based instructions (4)

**/*.{clj,cljc}

📄 CodeRabbit inference engine (AGENT.md)

**/*.{clj,cljc}: Prefer pure functions and data transformations in Clojure code
Add docstrings for public vars and functions
Follow indentation as enforced by cljfmt
Use descriptive, chronological-operation-aware logging (structured where possible)
Guard shared mutable state; prefer atoms with swap! semantics
Leverage transduce, reduce, or pipelines for efficient data flows

Files:

test/chrondb/api/sql/execution/ast_converter_test.clj
test/chrondb/query/ast_test.clj
src/chrondb/api/sql/execution/ast_converter.clj

test/**/*.{clj,cljc}

📄 CodeRabbit inference engine (AGENT.md)

test/**/*.{clj,cljc}: Include integration tests for Git-backed storage, and add regression tests for concurrency and conflict resolution
Prefer property-based tests when sequence ordering matters

Files:

test/chrondb/api/sql/execution/ast_converter_test.clj
test/chrondb/query/ast_test.clj

src/chrondb/**/*.{clj,cljc}

📄 CodeRabbit inference engine (AGENT.md)

Namespaces must use the chrondb.* prefix (e.g., ns chrondb.core) and idiomatic kebab-case var names

Files:

src/chrondb/api/sql/execution/ast_converter.clj

AGENT.md

📄 CodeRabbit inference engine (AGENT.md)

Maintain and update this AGENT.md with developer-focused instructions

Files:

AGENT.md

🧠 Learnings (17)

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to test/**/*.{clj,cljc} : Include integration tests for Git-backed storage, and add regression tests for concurrency and conflict resolution

Applied to files:

test/chrondb/api/sql/execution/ast_converter_test.clj
test/chrondb/query/ast_test.clj
AGENT.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to test/**/*.{clj,cljc} : Prefer property-based tests when sequence ordering matters

Applied to files:

test/chrondb/query/ast_test.clj
AGENT.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to src/chrondb/**/*.{clj,cljc} : Namespaces must use the chrondb.* prefix (e.g., ns chrondb.core) and idiomatic kebab-case var names

Applied to files:

src/chrondb/api/sql/execution/ast_converter.clj
AGENT.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to README.md : Update README.md for major user-facing changes

Applied to files:

AGENT.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to **/*.{clj,cljc} : Leverage transduce, reduce, or pipelines for efficient data flows

Applied to files:

AGENT.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to **/*.{clj,cljc} : Prefer pure functions and data transformations in Clojure code

Applied to files:

AGENT.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to AGENT.md : Maintain and update this AGENT.md with developer-focused instructions

Applied to files:

AGENT.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to docs/** : Keep developer-focused instructions in AGENT.md and maintain docs/ for user-facing and protocol documentation

Applied to files:

AGENT.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to docs/operations.md : When modifying backup/restore features, update docs/operations.md with CLI/REST examples and incremental bundle flow

Applied to files:

AGENT.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to docs/**/*.md : Document new endpoints and wire-protocol behaviors in docs/

Applied to files:

AGENT.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to docs/examples-*.md : Provide runnable examples in docs/examples-*.md when possible

Applied to files:

AGENT.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to dev/chrondb/native_image.clj : Generate uberjar and GraalVM metadata via clojure -M:build -- --uberjar from the native_image helper module

Applied to files:

AGENT.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to Dockerfile : Docker build steps: install Clojure, run clojure -M:build -- --uberjar to produce target/chrondb.jar, then native-image target/native-image-args -H:Name=chrondb

Applied to files:

AGENT.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to .github/workflows/build-native-image.yml : Use build-native-image.yml as the reference for native-image pipeline and smoke tests; keep it aligned with documented steps

Applied to files:

AGENT.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to target/native-image-args : Do not include -H:+UnlockExperimentalVMOptions in native-image arguments (JGit classes already set for build-time init)

Applied to files:

AGENT.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to target/native-image-args : Always include --features=clj_easy.graal_build_time.InitClojureClasses in native-image arguments

Applied to files:

AGENT.md

📚 Learning: 2025-10-12T09:48:12.351Z

Learnt from: CR
PR: moclojer/chrondb#0
File: AGENT.md:0-0
Timestamp: 2025-10-12T09:48:12.351Z
Learning: Applies to Dockerfile : Dockerfile must use builder image ghcr.io/graalvm/native-image:ol9-java17-22.3.2 and runtime base debian:12-slim

Applied to files:

AGENT.md

🪛 LanguageTool

AGENT.md

[grammar] ~75-~75: Please add a punctuation mark at the end of paragraph.
Context: ...es that bypass the centralized indexing layer ### When Reviewing Code 1. Confirm ti...

(PUNCTUATION_PARAGRAPH_END)

[grammar] ~118-~118: Please add a punctuation mark at the end of paragraph.
Context: ... runtime instead of during native image analysis ### Advanced Native Image Topics - **...

(PUNCTUATION_PARAGRAPH_END)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: Run SQL Protocol Benchmark
GitHub Check: test
GitHub Check: test

🔇 Additional comments (11)

src/chrondb/api/sql/execution/ast_converter.clj (3)

6-23: LGTM!

The helper functions correctly strip quotes and parse numeric values with appropriate fallback handling.

46-74: LGTM!

The comparison operators (>, >=, <, <=) correctly use parse-number to detect numeric values and create appropriate range clauses. The LIKE, IS NULL, and IS NOT NULL operators map correctly to their AST equivalents.

79-91: LGTM!

The function correctly uses keep to filter nil clauses and appropriately combines multiple conditions with AND logic, with proper handling of single-condition edge cases.

AGENT.md (2)

57-57: LGTM!

Good addition documenting the location of AST query test coverage, which aligns with the new AST-based query architecture introduced in this PR.

99-99: LGTM!

Appropriate guidance for maintainers to update architecture and performance documentation when modifying Lucene indexing or query planning components.

test/chrondb/api/sql/execution/ast_converter_test.clj (1)

6-12: LGTM!

The FTS test and multiple conditions aggregation tests are well-structured and cover the expected behavior correctly.

Also applies to: 100-116

test/chrondb/query/ast_test.clj (5)

5-39: LGTM!

Excellent coverage of basic clause constructors with proper nil handling and edge case validation.

41-81: LGTM!

Range clause tests comprehensively cover default behavior, open bounds, option overrides, and numeric type specialization.

83-117: LGTM!

Boolean node tests thoroughly validate collapsing logic, implicit match-all handling, and nil filtering behavior.

119-157: LGTM!

Logical combinator tests properly verify edge case collapsing and the interaction between AND, OR, and NOT operators.

159-225: LGTM!

Sort descriptor and query transformer tests comprehensively validate metadata handling, defaults, and vectorization behavior.

coderabbitai · 2025-10-30T21:22:12Z

test/chrondb/api/sql/execution/ast_converter_test.clj

+(deftest condition->ast-clause-standard-test
+  (testing "Equality removes surrounding quotes"
+    (let [condition {:type :standard
+                     :field "name"
+                     :op "="
+                     :value "'Alice'"}
+          expected (ast/term :name "Alice")]
+      (is (= expected (converter/condition->ast-clause condition)))))
+
+  (testing "Not equal delegates to ast/not"
+    (let [condition {:type :standard
+                     :field "status"
+                     :op "!="
+                     :value "archived"}
+          expected (ast/not (ast/term :status "archived"))]
+      (is (= expected (converter/condition->ast-clause condition)))))
+
+  (testing "Alternate not equal operator <> behaves the same"
+    (let [condition {:type :standard
+                     :field "status"
+                     :op "<>"
+                     :value "archived"}
+          expected (ast/not (ast/term :status "archived"))]
+      (is (= expected (converter/condition->ast-clause condition)))))
+
+  (testing "Greater than parses longs"
+    (let [condition {:type :standard
+                     :field "age"
+                     :op ">"
+                     :value "30"}
+          expected (ast/range-long :age 30 nil {:include-lower? false})]
+      (is (= expected (converter/condition->ast-clause condition)))))
+
+  (testing "Greater than parses doubles"
+    (let [condition {:type :standard
+                     :field "price"
+                     :op ">"
+                     :value "42.5"}
+          expected (ast/range-double :price 42.5 nil {:include-lower? false})]
+      (is (= expected (converter/condition->ast-clause condition)))))
+
+  (testing "Greater than falls back to string range when numeric parse fails"
+    (let [condition {:type :standard
+                     :field "code"
+                     :op ">"
+                     :value "'A'"}
+          expected (ast/range :code "A" nil {:include-lower? false})]
+      (is (= expected (converter/condition->ast-clause condition)))))
+
+  (testing "Less-than-or-equal builds inclusive upper bound"
+    (let [condition {:type :standard
+                     :field "age"
+                     :op "<="
+                     :value "65"}
+          expected (ast/range-long :age nil 65 {:include-upper? true})]
+      (is (= expected (converter/condition->ast-clause condition)))))
+
+  (testing "LIKE converts SQL wildcards to Lucene wildcards"
+    (let [condition {:type :standard
+                     :field "title"
+                     :op "LIKE"
+                     :value "report-%Q1%"}
+          expected (ast/wildcard :title "report-*Q1*")]
+      (is (= expected (converter/condition->ast-clause condition)))))
+
+  (testing "IS NULL returns missing clause"
+    (let [condition {:type :standard
+                     :field "deleted_at"
+                     :op "IS NULL"}
+          expected (ast/missing :deleted_at)]
+      (is (= expected (converter/condition->ast-clause condition)))))
+
+  (testing "IS NOT NULL returns exists clause"
+    (let [condition {:type :standard
+                     :field "deleted_at"
+                     :op "IS NOT NULL"}
+          expected (ast/exists :deleted_at)]
+      (is (= expected (converter/condition->ast-clause condition)))))
+
+  (testing "Unknown condition type returns nil"
+    (is (nil? (converter/condition->ast-clause
+               {:type :unsupported
+                :field "name"
+                :op "="
+                :value "Alice"})))))


🛠️ Refactor suggestion | 🟠 Major

Add test coverage for numeric equality comparisons.

The test suite doesn't verify that numeric values in equality operators (=, !=, <>) are handled correctly. This is a critical gap since the implementation currently has a bug where numeric equality uses string term clauses instead of numeric range clauses.

Add tests like these:

(testing "Equality with numeric long value" (let [condition {:type :standard :field "age" :op "=" :value "30"} expected (ast/range-long :age 30 30 {:include-lower? true :include-upper? true})] (is (= expected (converter/condition->ast-clause condition))))) (testing "Equality with numeric double value" (let [condition {:type :standard :field "price" :op "=" :value "99.99"} expected (ast/range-double :price 99.99 99.99 {:include-lower? true :include-upper? true})] (is (= expected (converter/condition->ast-clause condition))))) (testing "Not equal with numeric value" (let [condition {:type :standard :field "count" :op "!=" :value "0"} expected (ast/not (ast/range-long :count 0 0 {:include-lower? true :include-upper? true}))] (is (= expected (converter/condition->ast-clause condition)))))

🤖 Prompt for AI Agents

test/chrondb/api/sql/execution/ast_converter_test.clj lines 14-98: add tests that verify numeric equality is converted to numeric range clauses rather than string terms; specifically add an "Equality with numeric long value" test asserting that field age with op "=" and value "30" yields ast/range-long with lower and upper both 30 and include-lower? and include-upper? true, an "Equality with numeric double value" test asserting that field price with op "=" and value "99.99" yields ast/range-double with lower and upper both 99.99 and includes true, and a "Not equal with numeric value" test asserting that field count with op "!=" and value "0" yields ast/not wrapping an ast/range-long with lower and upper 0 and includes true; place each test alongside the existing condition->ast-clause-standard-test cases so they run with the same fixtures.

chatgpt-codex-connector bot reviewed Oct 30, 2025

View reviewed changes

coderabbitai bot suggested changes Oct 30, 2025

View reviewed changes

avelino merged commit 9c46bdc into main Oct 30, 2025
10 checks passed

avelino deleted the avelino/issue-39 branch October 30, 2025 21:25

		@@ -0,0 +1,47 @@
		(ns chrondb.api.sql.parser.clauses-test
		(:require [clojure.test :refer [deftest is testing]]

-  (:require [clojure.test :refer [deftest is testing]]
+(ns chrondb.api.sql.parser.clauses-test
+  (:require [clojure.test :refer [deftest is]]
+             [chrondb.api.sql.parser.tokenizer :as tokenizer]
+             [chrondb.api.sql.parser.clauses :as clauses]))

	6. Route new search features through the Lucene planner, cache, and index definitions—avoid bespoke query engines that bypass the centralized indexing layer
	6. Route new search features through the Lucene planner, cache, and index definitions—avoid bespoke query engines that bypass the centralized indexing layer.

		- Ensure Jackson factories stay runtime-initialized: `dev/chrondb/native_image.clj` registers `com.fasterxml.jackson.core.JsonFactory` and related classes under `base-run-time-classes`; keep this list in sync when touching JSON dependencies
		- Keep CLI namespaces lazily loaded (use `requiring-resolve` as in `chrondb.core`) so `clj-http`/`cheshire` initialization happens at runtime instead of during native image analysis

feat(ast): unify query engine across all protocols with new AST system #61

feat(ast): unify query engine across all protocols with new AST system #61

Uh oh!

Conversation

avelino commented Oct 30, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Oct 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Other AI code review bot(s) detected

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Poem

Pre-merge checks and finishing touches

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Oct 30, 2025

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot Oct 30, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 30, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 30, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 30, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 30, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 30, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 30, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 30, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 30, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 30, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 30, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 30, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 30, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 30, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 30, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 30, 2025

Choose a reason for hiding this comment

avelino commented Oct 30, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Oct 30, 2025 •

edited

Loading