-
Notifications
You must be signed in to change notification settings - Fork 1
feat: add SQL schema support for multi-tenant data isolation #21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This commit introduces SQL schema support to enable multi-tenant data isolation through Git branches. Key changes include: - Add schema parameter support in SQL parser for SELECT, INSERT, UPDATE, DELETE - Implement branch-based storage operations in Git and Memory storage backends - Update storage protocol to handle schema/branch parameter in document operations - Add schema normalization to convert SQL schema names to Git branch names - Update query execution to pass schema/branch context to storage operations - Add test coverage for schema-based document operations The changes allow applications to isolate tenant data by using SQL schemas, which are mapped to Git branches under the hood. This provides a clean abstraction for multi-tenant applications while leveraging Git's branching capabilities for data isolation. Signed-off-by: Avelino <[email protected]>
📝 WalkthroughWalkthroughThis pull request extends the SQL execution and parsing logic to support branch-specific operations. The changes add an optional branch parameter to CRUD operations in SQL query execution, update the parser to extract schema information, and modify both Git and memory storage implementations (as well as the storage protocol) to handle branch parameters. Additionally, new helper functions and tests are added to verify the branch mapping behavior. Changes
Sequence Diagram(s)sequenceDiagram
participant C as Client
participant Q as SQL Query Handler
participant N as Schema-to-Branch Normalizer
participant S as Storage Layer
C->>Q: Send SQL query (e.g., SELECT with optional schema)
Q->>N: Normalize schema to branch (if branch not provided)
N-->>Q: Return branch name
Q->>S: Execute operation (get, save, update, delete) using branch parameter
S-->>Q: Return result
Q-->>C: Deliver query response
Suggested labels
Poem
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🔭 Outside diff range comments (1)
src/chrondb/storage/memory.clj (1)
68-72: 🛠️ Refactor suggestionUpdate factory function for branch isolation
If implementing branch-specific storage, the
create-memory-storagefunction should also be updated to initialize the branch-data structure.(defn create-memory-storage "Creates a new instance of MemoryStorage. Returns: A new MemoryStorage instance backed by a ConcurrentHashMap." [] - (->MemoryStorage (ConcurrentHashMap.))) + (->MemoryStorage (atom {"main" (ConcurrentHashMap.)})))
♻️ Duplicate comments (4)
src/chrondb/storage/memory.clj (4)
52-53: Branch parameter is ignored in get-documentSimilar to save-document, this implementation doesn't provide true branch isolation.
55-56: Branch parameter is ignored in delete-documentSimilar to previous methods, this implementation doesn't provide true branch isolation.
58-59: Branch parameter is ignored in get-documents-by-prefixSimilar to previous methods, this implementation doesn't provide true branch isolation.
61-62: Branch parameter is ignored in get-documents-by-tableSimilar to previous methods, this implementation doesn't provide true branch isolation.
🧹 Nitpick comments (9)
test/chrondb/api/sql/execution/query_test.clj (2)
8-21: Add tests for the helper functionsThe
create-test-resourcesandprepare-test-datahelper functions are added but don't appear to be used in any of the tests. Either add tests that use these functions or consider removing them if they're unused.
34-35: Mock storage should verify branch parameterThe mock implementation ignores the branch parameter, which means it's not fully testing the branch isolation functionality. Consider enhancing the mock to track which branch is being accessed.
- (get-document [_ id _branch] - (get @documents id)) + (get-document [_ id branch] + (println (str "Getting document " id " from branch " branch)) + (get @documents id))Similar changes could be made to other methods to at least log which branch is being accessed.
src/chrondb/api/sql/parser/statements.clj (3)
28-39: Extract schema parsing logic to a helper functionThe schema parsing logic is duplicated across all four query parsing functions. It would be cleaner to extract this into a helper function.
(defn- parse-schema-table "Parses a raw table specification into schema and table components. Returns [schema table-name]" [raw-table-spec] (when raw-table-spec (let [parts (str/split raw-table-spec #"\.")] (if (= (count parts) 2) [(first parts) (second parts)] [nil raw-table-spec]))))Then you could use it in each query parsing function:
(let [... raw-table-spec (...) [schema table-name] (parse-schema-table raw-table-spec) ...]
65-72: Add support for quoted identifiersThe current schema parsing doesn't handle quoted identifiers like "my.schema"."table", which is standard SQL syntax for identifiers containing special characters. You might want to add support for this case.
[schema table-name] (when raw-table-spec - (let [parts (str/split raw-table-spec #"\.")] + (let [parts (if (.startsWith raw-table-spec "\"") + (let [quoted-parts (re-seq #"\"([^\"]+)\"" raw-table-spec)] + (mapv second quoted-parts)) + (str/split raw-table-spec #"\."))] (if (= (count parts) 2) [(first parts) (second parts)] [nil raw-table-spec])))This is a simplified version that doesn't handle all edge cases, but illustrates the approach.
167-171: Handle potential SQL injection in schema.table parsingThe schema.table parsing could be vulnerable to SQL injection if the schema or table names contain specially crafted strings. Consider using a more robust parsing approach.
[schema table-name] (when raw-table-spec - (let [parts (str/split raw-table-spec #"\.")] + (let [parts (re-find #"^([^\.]+)\.([^\.]+)$" raw-table-spec)] (if (= (count parts) 2) - [(first parts) (second parts)] + [(second parts) (third parts)] [nil raw-table-spec])))This regex approach would be more restrictive about what constitutes a valid schema.table format.
src/chrondb/api/sql/execution/query.clj (4)
12-19: Commented-out function for ID-based document retrieval.Keeping it commented suggests it may be retired or replaced. If it’s truly deprecated, consider removing it entirely to reduce code clutter unless you plan to revive it.
21-25: Commented-out function for retrieving all documents.Same note as above: remove if it’s no longer relevant. If you expect to reintroduce it, maintain it in a feature branch to keep your main codebase clean.
27-33: Commented-out function for retrieving table documents.Again, if fully deprecated, consider removing. Otherwise, keep track of reintroducing it when needed.
103-105: Enhanced logging for SELECT queries.Providing the table name, schema, and computed branch name clarifies the query context for troubleshooting. Consider switching to a debug-level log if high-volume queries lead to noisy logs.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
src/chrondb/api/sql/execution/query.clj(9 hunks)src/chrondb/api/sql/parser/statements.clj(7 hunks)src/chrondb/storage/git.clj(6 hunks)src/chrondb/storage/memory.clj(1 hunks)src/chrondb/storage/protocol.clj(1 hunks)test/chrondb/api/sql/execution/query_test.clj(3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: test
- GitHub Check: test
🔇 Additional comments (22)
src/chrondb/storage/protocol.clj (5)
10-17: LGTM: Protocol extension to support branchingThe protocol has been cleanly extended to add branch-based operations to the Storage protocol. This aligns well with the PR objectives for multi-tenant data isolation.
19-26: LGTM: Consistent documentation updateDocumentation is consistently updated to reflect the new branch parameter.
28-35: LGTM: Consistent parameter documentationGood job maintaining a consistent approach to documenting the branch parameter across all methods.
37-44: LGTM: Document retrieval with branch parameterThe
get-documents-by-prefixmethod is correctly extended to support branch-specific operations.
46-53: LGTM: Table-based document retrieval with branchThe
get-documents-by-tablemethod extension maintains consistency with other methods.test/chrondb/api/sql/execution/query_test.clj (1)
224-259: LGTM: Comprehensive schema-branch mapping testsGood job on adding tests for the schema-branch mapping feature. The tests cover the important scenarios:
- Queries without a schema (defaulting to 'main' branch)
- Queries with 'public' schema (mapping to 'main' branch)
- Queries with 'main' schema (using 'main' branch)
This ensures that the schema-to-branch mapping behavior is correctly implemented.
src/chrondb/api/sql/parser/statements.clj (1)
118-123:Details
❓ Verification inconclusive
Validate schema names
There's no validation of schema names. Invalid characters in schema names could cause issues when converting to Git branch names (which have specific constraints).
Consider adding validation to ensure that schema names are valid for conversion to Git branch names:
🏁 Script executed:
#!/bin/bash # Check for Git branch name validation logic in the codebase rg -A 5 -B 5 "branch.?(name)?.?valid" --glob "*.clj"Length of output: 54
Schema Name Validation: Manual Verification Required
After executing the search for Git branch name validation logic, no matches were found. This suggests that the codebase currently lacks explicit validation to ensure that schema names (which are later used for Git branch name conversion) do not contain invalid characters. Please manually verify if validation is implemented elsewhere or consider adding appropriate checks—especially in the context of the conversion logic in
src/chrondb/api/sql/parser/statements.cljat lines 118–123—to ensure that schema names meet Git branch naming constraints.src/chrondb/storage/git.clj (8)
220-224: New helper function for encoded table path looks good.This function centralizes table name encoding logic, improving maintainability when referencing tables in the repository. Consider handling potential edge cases (e.g.,
nilor emptytable-name) if they're possible in upstream usage.
230-238: Refactor to use the newget-table-pathfunction.Line 230 is just clarifying backward compatibility, and line 238 now calls
get-table-pathinstead of inlinedencode-path. This improves consistency and keeps logic cohesive.
249-249: Optionalbranchparameter inget-document-path.Introducing an optional
branchparameter here aligns with multi-branch logic. The fallback to the default branch in line 252 is clear and consistent with the rest of the code.Also applies to: 252-252
276-278: Enhancedsave-documentmethod with branch support.
- Lines 276-278 define a no-branch version that delegates to the new one, ensuring backward compatibility.
- Lines 290-291 convert the document to JSON and retrieve or default to the configured branch.
- Line 294 provides the branch to
commit-virtual.No major issues; the approach cleanly extends branch functionality.
Also applies to: 290-291, 294-294
306-308: Extendedget-documentwith optional branch logic.
- Lines 306-308 add a no-branch overload.
- Lines 313-314 pick the default branch if not provided.
- Line 318 checks for a valid path before proceeding.
This is a good pattern for preserving old usage while introducing branch awareness.
Also applies to: 313-314, 318-318
338-339: Updatedget-documents-by-prefixto optionally specify a branch.
- The fallback to the default branch (line 346) is consistent.
- Logging statements (e.g., lines 350, 365) help trace multi-branch queries.
- Filtering logic uses
.jsonsuffix to locate documents.Implementation seems correct; watch for potential performance implications of scanning large repositories.
Also applies to: 341-341, 346-346, 350-350, 352-352, 358-361, 362-379, 384-385
387-388: Adjustedget-documents-by-tablefor branch-specific lookups.
- Defaulting the branch name (line 395) mirrors the prefix-based approach.
- The iterative JSON search resembles the prefix logic, providing table-level filtering.
- Logging indicates document matches and mismatches.
All changes appear consistent and well-documented.
Also applies to: 390-390, 395-395, 397-397, 399-399, 401-401, 408-410, 411-429, 432-433
435-436: Branch-awaredelete-documentmethod.
- Lines 435-436 define the no-branch variant.
- Lines 438, 443-444 ensure the correct branch is used for retrieving/deleting the document.
This finalizes the CRUD suite with optional branch handling. Looks good.
Also applies to: 438-438, 443-444
src/chrondb/api/sql/execution/query.clj (7)
75-84: Newnormalize-schema-to-branchfunction.Mapping the
niland"public"schemas to"main"is straightforward. Ensure that custom schema names do not collide with existing branch names or special references in Git. Otherwise, looks solid.
116-123: Refined logic to retrieve documents by ID or table/prefix.This unified approach ensures minimal duplication. Good use of
branch-nameto route the call. Validate corner cases (e.g., emptywhere-conditionorniltable) in upstream parsing.
129-132: WHERE condition filtering logic.Conditional filtering with
apply-where-conditionsis straightforward. Logging the filtered count and IDs is helpful for debugging. Looks good.
188-190:handle-insertnow supports branch parameter.Storing documents directly into the specified branch is consistent with the rest of the multi-branch approach. No issues found.
197-207: Branch-awarehandle-update.
- Including the
branchparam in logs (line 201) clarifies update context.- Merging old doc with new fields before saving is standard.
- The final log (line 205) confirms success.
Implementation is clear and consistent.
212-214: Branch parameter inhandle-delete.Passing
branchtostorage/delete-documentensures correct branch deletion. This finalizes branch support for all CRUD operations.
491-502: Refactoredhandle-queryto accommodate all branch-aware statements.Switching to if-else blocks for each SQL command type is workable. Ensure future expansions account for new statement types.
| (save-document [_ doc] (save-document-memory data doc)) | ||
| (save-document [_ doc _branch] (save-document-memory data doc)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Implementation needs branch-specific storage
The memory storage implementation accepts the branch parameter but doesn't use it for isolation. For true multi-tenant isolation, the MemoryStorage class should maintain separate storage for each branch.
One way to implement this would be to replace the single ConcurrentHashMap with a map of branch names to ConcurrentHashMaps:
-defrecord MemoryStorage [^ConcurrentHashMap data]
+defrecord MemoryStorage [branch-data]
- (save-document [_ doc] (save-document-memory data doc))
- (save-document [_ doc _branch] (save-document-memory data doc))
+ (save-document [this doc]
+ (save-document this doc "main"))
+ (save-document [_ doc branch]
+ (let [branch-map (or (get @branch-data branch)
+ (let [new-map (ConcurrentHashMap.)]
+ (swap! branch-data assoc branch new-map)
+ new-map))]
+ (save-document-memory branch-map doc)))Similar changes would be needed for other methods.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| (save-document [_ doc] (save-document-memory data doc)) | |
| (save-document [_ doc _branch] (save-document-memory data doc)) | |
| (defrecord MemoryStorage [branch-data] | |
| (save-document [this doc] | |
| (save-document this doc "main")) | |
| (save-document [_ doc branch] | |
| (let [branch-map (or (get @branch-data branch) | |
| (let [new-map (ConcurrentHashMap.)] | |
| (swap! branch-data assoc branch new-map) | |
| new-map))] | |
| (save-document-memory branch-map doc))) |
| (get-documents-by-prefix [_ prefix _branch] | ||
| (filter (fn [[k _]] (.startsWith k prefix)) @documents)) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorrect document filtering in get-documents-by-prefix
The get-documents-by-prefix method is returning key-value pairs instead of just the document values.
(get-documents-by-prefix [_ prefix _branch]
- (filter (fn [[k _]] (.startsWith k prefix)) @documents))
+ (->> @documents
+ (filter (fn [[k _]] (.startsWith k prefix)))
+ (map second)
+ (into [])))📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| (get-documents-by-prefix [_ prefix _branch] | |
| (filter (fn [[k _]] (.startsWith k prefix)) @documents)) | |
| (get-documents-by-prefix [_ prefix _branch] | |
| (->> @documents | |
| (filter (fn [[k _]] (.startsWith k prefix))) | |
| (map second) | |
| (into []))) |
This commit introduces SQL schema support to enable multi-tenant data isolation through Git branches. Key changes include:
The changes allow applications to isolate tenant data by using SQL schemas, which are mapped to Git branches under the hood. This provides a clean abstraction for multi-tenant applications while leveraging Git's branching capabilities for data isolation.
Summary by CodeRabbit