Skip to content

Conversation

@avelino
Copy link
Member

@avelino avelino commented Apr 8, 2025

This commit introduces SQL schema support to enable multi-tenant data isolation through Git branches. Key changes include:

  • Add schema parameter support in SQL parser for SELECT, INSERT, UPDATE, DELETE
  • Implement branch-based storage operations in Git and Memory storage backends
  • Update storage protocol to handle schema/branch parameter in document operations
  • Add schema normalization to convert SQL schema names to Git branch names
  • Update query execution to pass schema/branch context to storage operations
  • Add test coverage for schema-based document operations

The changes allow applications to isolate tenant data by using SQL schemas, which are mapped to Git branches under the hood. This provides a clean abstraction for multi-tenant applications while leveraging Git's branching capabilities for data isolation.

Summary by CodeRabbit

  • New Features
    • Introduced multi-branch support across SQL operations, giving users enhanced control over document retrieval and manipulation.
    • Improved SQL parsing now supports table schemas in the "schema.table" format for naturally handling branch-specific actions.
    • Extended branch-aware capabilities to storage operations for more flexible document management.
  • Tests
    • Added comprehensive tests to validate branch mapping and default branch behavior, ensuring consistent performance across scenarios.

This commit introduces SQL schema support to enable multi-tenant data isolation
through Git branches. Key changes include:

- Add schema parameter support in SQL parser for SELECT, INSERT, UPDATE, DELETE
- Implement branch-based storage operations in Git and Memory storage backends
- Update storage protocol to handle schema/branch parameter in document operations
- Add schema normalization to convert SQL schema names to Git branch names
- Update query execution to pass schema/branch context to storage operations
- Add test coverage for schema-based document operations

The changes allow applications to isolate tenant data by using SQL schemas,
which are mapped to Git branches under the hood. This provides a clean
abstraction for multi-tenant applications while leveraging Git's branching
capabilities for data isolation.

Signed-off-by: Avelino <[email protected]>
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Apr 8, 2025

📝 Walkthrough

Walkthrough

This pull request extends the SQL execution and parsing logic to support branch-specific operations. The changes add an optional branch parameter to CRUD operations in SQL query execution, update the parser to extract schema information, and modify both Git and memory storage implementations (as well as the storage protocol) to handle branch parameters. Additionally, new helper functions and tests are added to verify the branch mapping behavior.

Changes

File Change Summary
src/chrondb/api/sql/execution/query.clj Updated SQL execution functions (SELECT, INSERT, UPDATE, DELETE) to include an optional branch parameter; commented out legacy document retrieval functions; added normalize-schema-to-branch for schema-to-branch conversion.
src/chrondb/api/sql/parser/…/statements.clj Enhanced SQL query parser to split table specifications into schema and table name; updated return maps in queries to include schema information.
src/chrondb/storage/git.clj Introduced a new get-table-path function; updated file/document path functions and methods (save-document, get-document, delete-document, etc.) to accept an optional branch parameter and improve logging.
src/chrondb/storage/memory.clj Modified storage methods to include an extra _branch parameter for potential branch-specific operations, preserving existing logic.
src/chrondb/storage/protocol.clj Updated Storage protocol method signatures to optionally include the branch parameter; reformatted and updated method documentation.
test/chrondb/api/sql/execution/query_test.clj Added new helper functions (create-test-resources, prepare-test-data); updated mock storage methods for branch support; included a new test case for schema-branch mapping.

Sequence Diagram(s)

sequenceDiagram
    participant C as Client
    participant Q as SQL Query Handler
    participant N as Schema-to-Branch Normalizer
    participant S as Storage Layer

    C->>Q: Send SQL query (e.g., SELECT with optional schema)
    Q->>N: Normalize schema to branch (if branch not provided)
    N-->>Q: Return branch name
    Q->>S: Execute operation (get, save, update, delete) using branch parameter
    S-->>Q: Return result
    Q-->>C: Deliver query response
Loading

Suggested labels

bug

Poem

I'm a rabbit in the code, so spry,
Hopping through branches with a twinkling eye.
SQL queries now know where to roam,
From main to public, every branch feels like home.
With carrots and code, we celebrate and sigh!
Happy hops in a world where bugs say goodbye!


🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai plan to trigger planning for file edits and PR creation.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🔭 Outside diff range comments (1)
src/chrondb/storage/memory.clj (1)

68-72: 🛠️ Refactor suggestion

Update factory function for branch isolation

If implementing branch-specific storage, the create-memory-storage function should also be updated to initialize the branch-data structure.

(defn create-memory-storage
  "Creates a new instance of MemoryStorage.
   Returns: A new MemoryStorage instance backed by a ConcurrentHashMap."
  []
-  (->MemoryStorage (ConcurrentHashMap.)))
+  (->MemoryStorage (atom {"main" (ConcurrentHashMap.)})))
♻️ Duplicate comments (4)
src/chrondb/storage/memory.clj (4)

52-53: Branch parameter is ignored in get-document

Similar to save-document, this implementation doesn't provide true branch isolation.


55-56: Branch parameter is ignored in delete-document

Similar to previous methods, this implementation doesn't provide true branch isolation.


58-59: Branch parameter is ignored in get-documents-by-prefix

Similar to previous methods, this implementation doesn't provide true branch isolation.


61-62: Branch parameter is ignored in get-documents-by-table

Similar to previous methods, this implementation doesn't provide true branch isolation.

🧹 Nitpick comments (9)
test/chrondb/api/sql/execution/query_test.clj (2)

8-21: Add tests for the helper functions

The create-test-resources and prepare-test-data helper functions are added but don't appear to be used in any of the tests. Either add tests that use these functions or consider removing them if they're unused.


34-35: Mock storage should verify branch parameter

The mock implementation ignores the branch parameter, which means it's not fully testing the branch isolation functionality. Consider enhancing the mock to track which branch is being accessed.

-      (get-document [_ id _branch]
-        (get @documents id))
+      (get-document [_ id branch]
+        (println (str "Getting document " id " from branch " branch))
+        (get @documents id))

Similar changes could be made to other methods to at least log which branch is being accessed.

src/chrondb/api/sql/parser/statements.clj (3)

28-39: Extract schema parsing logic to a helper function

The schema parsing logic is duplicated across all four query parsing functions. It would be cleaner to extract this into a helper function.

(defn- parse-schema-table
  "Parses a raw table specification into schema and table components.
   Returns [schema table-name]"
  [raw-table-spec]
  (when raw-table-spec
    (let [parts (str/split raw-table-spec #"\.")]
      (if (= (count parts) 2)
        [(first parts) (second parts)]
        [nil raw-table-spec]))))

Then you could use it in each query parsing function:

(let [...
      raw-table-spec (...)
      [schema table-name] (parse-schema-table raw-table-spec)
      ...]

65-72: Add support for quoted identifiers

The current schema parsing doesn't handle quoted identifiers like "my.schema"."table", which is standard SQL syntax for identifiers containing special characters. You might want to add support for this case.

        [schema table-name] (when raw-table-spec
-                             (let [parts (str/split raw-table-spec #"\.")]
+                             (let [parts (if (.startsWith raw-table-spec "\"")
+                                          (let [quoted-parts (re-seq #"\"([^\"]+)\"" raw-table-spec)]
+                                            (mapv second quoted-parts))
+                                          (str/split raw-table-spec #"\."))]
                               (if (= (count parts) 2)
                                 [(first parts) (second parts)]
                                 [nil raw-table-spec])))

This is a simplified version that doesn't handle all edge cases, but illustrates the approach.


167-171: Handle potential SQL injection in schema.table parsing

The schema.table parsing could be vulnerable to SQL injection if the schema or table names contain specially crafted strings. Consider using a more robust parsing approach.

        [schema table-name] (when raw-table-spec
-                             (let [parts (str/split raw-table-spec #"\.")]
+                             (let [parts (re-find #"^([^\.]+)\.([^\.]+)$" raw-table-spec)]
                               (if (= (count parts) 2)
-                                 [(first parts) (second parts)]
+                                 [(second parts) (third parts)]
                                 [nil raw-table-spec])))

This regex approach would be more restrictive about what constitutes a valid schema.table format.

src/chrondb/api/sql/execution/query.clj (4)

12-19: Commented-out function for ID-based document retrieval.

Keeping it commented suggests it may be retired or replaced. If it’s truly deprecated, consider removing it entirely to reduce code clutter unless you plan to revive it.


21-25: Commented-out function for retrieving all documents.

Same note as above: remove if it’s no longer relevant. If you expect to reintroduce it, maintain it in a feature branch to keep your main codebase clean.


27-33: Commented-out function for retrieving table documents.

Again, if fully deprecated, consider removing. Otherwise, keep track of reintroducing it when needed.


103-105: Enhanced logging for SELECT queries.

Providing the table name, schema, and computed branch name clarifies the query context for troubleshooting. Consider switching to a debug-level log if high-volume queries lead to noisy logs.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 15dd5dd and 5efaa0f.

📒 Files selected for processing (6)
  • src/chrondb/api/sql/execution/query.clj (9 hunks)
  • src/chrondb/api/sql/parser/statements.clj (7 hunks)
  • src/chrondb/storage/git.clj (6 hunks)
  • src/chrondb/storage/memory.clj (1 hunks)
  • src/chrondb/storage/protocol.clj (1 hunks)
  • test/chrondb/api/sql/execution/query_test.clj (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: test
  • GitHub Check: test
🔇 Additional comments (22)
src/chrondb/storage/protocol.clj (5)

10-17: LGTM: Protocol extension to support branching

The protocol has been cleanly extended to add branch-based operations to the Storage protocol. This aligns well with the PR objectives for multi-tenant data isolation.


19-26: LGTM: Consistent documentation update

Documentation is consistently updated to reflect the new branch parameter.


28-35: LGTM: Consistent parameter documentation

Good job maintaining a consistent approach to documenting the branch parameter across all methods.


37-44: LGTM: Document retrieval with branch parameter

The get-documents-by-prefix method is correctly extended to support branch-specific operations.


46-53: LGTM: Table-based document retrieval with branch

The get-documents-by-table method extension maintains consistency with other methods.

test/chrondb/api/sql/execution/query_test.clj (1)

224-259: LGTM: Comprehensive schema-branch mapping tests

Good job on adding tests for the schema-branch mapping feature. The tests cover the important scenarios:

  1. Queries without a schema (defaulting to 'main' branch)
  2. Queries with 'public' schema (mapping to 'main' branch)
  3. Queries with 'main' schema (using 'main' branch)

This ensures that the schema-to-branch mapping behavior is correctly implemented.

src/chrondb/api/sql/parser/statements.clj (1)

118-123:

Details

❓ Verification inconclusive

Validate schema names

There's no validation of schema names. Invalid characters in schema names could cause issues when converting to Git branch names (which have specific constraints).

Consider adding validation to ensure that schema names are valid for conversion to Git branch names:


🏁 Script executed:

#!/bin/bash
# Check for Git branch name validation logic in the codebase
rg -A 5 -B 5 "branch.?(name)?.?valid" --glob "*.clj"

Length of output: 54


Schema Name Validation: Manual Verification Required

After executing the search for Git branch name validation logic, no matches were found. This suggests that the codebase currently lacks explicit validation to ensure that schema names (which are later used for Git branch name conversion) do not contain invalid characters. Please manually verify if validation is implemented elsewhere or consider adding appropriate checks—especially in the context of the conversion logic in src/chrondb/api/sql/parser/statements.clj at lines 118–123—to ensure that schema names meet Git branch naming constraints.

src/chrondb/storage/git.clj (8)

220-224: New helper function for encoded table path looks good.

This function centralizes table name encoding logic, improving maintainability when referencing tables in the repository. Consider handling potential edge cases (e.g., nil or empty table-name) if they're possible in upstream usage.


230-238: Refactor to use the new get-table-path function.

Line 230 is just clarifying backward compatibility, and line 238 now calls get-table-path instead of inlined encode-path. This improves consistency and keeps logic cohesive.


249-249: Optional branch parameter in get-document-path.

Introducing an optional branch parameter here aligns with multi-branch logic. The fallback to the default branch in line 252 is clear and consistent with the rest of the code.

Also applies to: 252-252


276-278: Enhanced save-document method with branch support.

  • Lines 276-278 define a no-branch version that delegates to the new one, ensuring backward compatibility.
  • Lines 290-291 convert the document to JSON and retrieve or default to the configured branch.
  • Line 294 provides the branch to commit-virtual.

No major issues; the approach cleanly extends branch functionality.

Also applies to: 290-291, 294-294


306-308: Extended get-document with optional branch logic.

  • Lines 306-308 add a no-branch overload.
  • Lines 313-314 pick the default branch if not provided.
  • Line 318 checks for a valid path before proceeding.

This is a good pattern for preserving old usage while introducing branch awareness.

Also applies to: 313-314, 318-318


338-339: Updated get-documents-by-prefix to optionally specify a branch.

  • The fallback to the default branch (line 346) is consistent.
  • Logging statements (e.g., lines 350, 365) help trace multi-branch queries.
  • Filtering logic uses .json suffix to locate documents.

Implementation seems correct; watch for potential performance implications of scanning large repositories.

Also applies to: 341-341, 346-346, 350-350, 352-352, 358-361, 362-379, 384-385


387-388: Adjusted get-documents-by-table for branch-specific lookups.

  • Defaulting the branch name (line 395) mirrors the prefix-based approach.
  • The iterative JSON search resembles the prefix logic, providing table-level filtering.
  • Logging indicates document matches and mismatches.

All changes appear consistent and well-documented.

Also applies to: 390-390, 395-395, 397-397, 399-399, 401-401, 408-410, 411-429, 432-433


435-436: Branch-aware delete-document method.

  • Lines 435-436 define the no-branch variant.
  • Lines 438, 443-444 ensure the correct branch is used for retrieving/deleting the document.

This finalizes the CRUD suite with optional branch handling. Looks good.

Also applies to: 438-438, 443-444

src/chrondb/api/sql/execution/query.clj (7)

75-84: New normalize-schema-to-branch function.

Mapping the nil and "public" schemas to "main" is straightforward. Ensure that custom schema names do not collide with existing branch names or special references in Git. Otherwise, looks solid.


116-123: Refined logic to retrieve documents by ID or table/prefix.

This unified approach ensures minimal duplication. Good use of branch-name to route the call. Validate corner cases (e.g., empty where-condition or nil table) in upstream parsing.


129-132: WHERE condition filtering logic.

Conditional filtering with apply-where-conditions is straightforward. Logging the filtered count and IDs is helpful for debugging. Looks good.


188-190: handle-insert now supports branch parameter.

Storing documents directly into the specified branch is consistent with the rest of the multi-branch approach. No issues found.


197-207: Branch-aware handle-update.

  • Including the branch param in logs (line 201) clarifies update context.
  • Merging old doc with new fields before saving is standard.
  • The final log (line 205) confirms success.

Implementation is clear and consistent.


212-214: Branch parameter in handle-delete.

Passing branch to storage/delete-document ensures correct branch deletion. This finalizes branch support for all CRUD operations.


491-502: Refactored handle-query to accommodate all branch-aware statements.

Switching to if-else blocks for each SQL command type is workable. Ensure future expansions account for new statement types.

Comment on lines 49 to +50
(save-document [_ doc] (save-document-memory data doc))
(save-document [_ doc _branch] (save-document-memory data doc))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Implementation needs branch-specific storage

The memory storage implementation accepts the branch parameter but doesn't use it for isolation. For true multi-tenant isolation, the MemoryStorage class should maintain separate storage for each branch.

One way to implement this would be to replace the single ConcurrentHashMap with a map of branch names to ConcurrentHashMaps:

-defrecord MemoryStorage [^ConcurrentHashMap data]
+defrecord MemoryStorage [branch-data]

-  (save-document [_ doc] (save-document-memory data doc))
-  (save-document [_ doc _branch] (save-document-memory data doc))
+  (save-document [this doc] 
+    (save-document this doc "main"))
+  (save-document [_ doc branch] 
+    (let [branch-map (or (get @branch-data branch) 
+                         (let [new-map (ConcurrentHashMap.)]
+                           (swap! branch-data assoc branch new-map)
+                           new-map))]
+      (save-document-memory branch-map doc)))

Similar changes would be needed for other methods.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
(save-document [_ doc] (save-document-memory data doc))
(save-document [_ doc _branch] (save-document-memory data doc))
(defrecord MemoryStorage [branch-data]
(save-document [this doc]
(save-document this doc "main"))
(save-document [_ doc branch]
(let [branch-map (or (get @branch-data branch)
(let [new-map (ConcurrentHashMap.)]
(swap! branch-data assoc branch new-map)
new-map))]
(save-document-memory branch-map doc)))

Comment on lines +40 to +42
(get-documents-by-prefix [_ prefix _branch]
(filter (fn [[k _]] (.startsWith k prefix)) @documents))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Incorrect document filtering in get-documents-by-prefix

The get-documents-by-prefix method is returning key-value pairs instead of just the document values.

       (get-documents-by-prefix [_ prefix _branch]
-        (filter (fn [[k _]] (.startsWith k prefix)) @documents))
+        (->> @documents
+             (filter (fn [[k _]] (.startsWith k prefix)))
+             (map second)
+             (into [])))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
(get-documents-by-prefix [_ prefix _branch]
(filter (fn [[k _]] (.startsWith k prefix)) @documents))
(get-documents-by-prefix [_ prefix _branch]
(->> @documents
(filter (fn [[k _]] (.startsWith k prefix)))
(map second)
(into [])))

@avelino avelino merged commit 985c7bf into main Apr 8, 2025
9 checks passed
@avelino avelino deleted the avelino/sql-scheme-support branch April 8, 2025 02:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants