Refactor code structure and enhance documentation#1
Conversation
…adability and maintainability
…rity on testing guidelines, configuration structure, and programmatic API usage
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📝 WalkthroughWalkthroughRefactors the codebase to adopt dependency injection and a factory for RAG orchestration, introduces RAGConfig and factory wiring (RAGOrchestratorFactory), updates constructors to accept injected services, adds defaults and pytest integration marker, bumps project version to 2.0.0, and expands integration tests and contributing docs. Changes
Sequence DiagramsequenceDiagram
participant CLI as CLI/main()
participant Factory as RAGOrchestratorFactory
participant Emb as Embeddings
participant Store as ChromaStore
participant DRS as DocumentRetrievalService
participant LLM as LLMService
participant Orch as RAGOrchestrator
participant App as TapioAssistantApp
CLI->>Factory: create_orchestrator()
Factory->>Emb: create_embeddings()
Emb-->>Factory: embeddings instance
Factory->>Store: create_chroma_store(embeddings)
Store-->>Factory: chroma store
Factory->>DRS: create_document_retrieval_service(chroma_store)
DRS-->>Factory: document retrieval service
Factory->>LLM: create_llm_service()
LLM-->>Factory: LLM service
Factory->>Orch: create_orchestrator(drs, llm)
Orch-->>Factory: orchestrator
Factory-->>CLI: orchestrator
CLI->>App: TapioAssistantApp(rag_orchestrator)
App-->>CLI: app ready
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
PR Compliance Guide 🔍Below is a summary of compliance checks for this PR:
Compliance status legend🟢 - Fully Compliant🟡 - Partial Compliant 🔴 - Not Compliant ⚪ - Requires Further Human Verification 🏷️ - Compliance label |
|||||||||||||||||||||||||
PR Code Suggestions ✨Explore these optional code suggestions:
|
||||||||||||||||
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
CONTRIBUTING.md (1)
60-66:⚠️ Potential issue | 🟡 MinorAlign the architecture diagram file name with the actual module name.
The diagram references
gradio_app.py, but the Project Structure section listsapp.py. Please update one of them so contributors don’t chase a non-existent file.Also applies to: 321-323
🤖 Fix all issues with AI agents
In `@CONTRIBUTING.md`:
- Around line 456-459: Update the CONTRIBUTING.md entry for the fallback_to_body
option to hyphenate the compound adjective by changing the phrase "full body
content" to "full-body content" so the list item reads "... Use full-body
content if selectors fail (default: true)"; locate the line describing
`fallback_to_body` and make that single-word hyphenation change.
🧹 Nitpick comments (10)
tapio/vectorstore/chroma_store.py (1)
20-42: Good DI refactor; consider a more specific type hint forembeddings.The dependency injection pattern is correctly implemented. Using
Anyfor theembeddingsparameter is pragmatic given LangChain's various embedding interfaces, but you could consider usinglangchain_core.embeddings.Embeddingsbase class for better type safety and IDE support.💡 Optional type improvement
+from langchain_core.embeddings import Embeddings + class ChromaStore: ... def __init__( self, collection_name: str, - embeddings: Any, + embeddings: Embeddings, persist_directory: str = "chroma_db", ) -> None:tests/parser/test_relative_links.py (1)
205-216: Consider extractingConfigManagerimport to module level.The
ConfigManagerimport is repeated insidetest_domain_specific_url_handling. Since it's also used insetUp, consider moving the import to the top of the file with other imports for consistency.♻️ Suggested change
Add to imports at the top of the file:
from tapio.config.config_manager import ConfigManagerThen remove the local imports on lines 90-91 and 206-207.
tapio/config/config_models.py (1)
146-151: Consider using constants fromsettings.pyfor defaults.The AI summary mentions
tapio/config/settings.pydefinesDEFAULT_EMBEDDING_MODEL,DEFAULT_LLM_MODEL, etc. Using those constants here would centralize default values and follow the DRY principle.#!/bin/bash # Check if settings.py has these constants rg -n "DEFAULT_EMBEDDING_MODEL|DEFAULT_LLM_MODEL|DEFAULT_MAX_TOKENS|DEFAULT_NUM_RESULTS" tapio/config/settings.pytests/integration/test_rag_pipeline.py (2)
114-119: Consider usingunittest.mock.patchinstead of monkey-patching.Directly assigning a lambda to
factory.create_embeddingsworks but is less explicit than using proper mocking, which provides better cleanup and is more idiomatic for tests.♻️ Alternative using patch
+ from unittest.mock import patch + # Create factory factory = RAGOrchestratorFactory(config) - # Mock the create_embeddings to return our mock - factory.create_embeddings = lambda: mock_embeddings + # Mock the create_embeddings to return our mock + with patch.object(factory, 'create_embeddings', return_value=mock_embeddings): + # Create orchestrator + orchestrator = factory.create_orchestrator() - # Create orchestrator - orchestrator = factory.create_orchestrator()
163-164: Assertion could be more precise.
assert len(results) <= 2is a weak assertion. With 3 documents added andnum_results=2, we'd typically expect exactly 2 results. Consider usingassert len(results) == 2unless there's a reason fewer results might be returned (e.g., similarity threshold filtering).♻️ Suggested change
- assert len(results) <= 2 # Should respect num_results limit + assert len(results) == 2 # Should return exactly num_results documentstests/integration/test_vectorization_pipeline.py (1)
59-65: Consider hoisting theChromaimport to module level.The
from langchain_chroma import Chromaimport is repeated inside each test function. Moving it to the top of the file with other imports would improve readability, unless there's a specific reason to delay the import (e.g., avoiding import errors when running non-integration tests).Suggested refactor
from langchain_text_splitters import MarkdownTextSplitter # type: ignore[import-not-found] +from langchain_chroma import Chroma # type: ignore[import-not-found] from tapio.vectorstore.vectorizer import MarkdownVectorizerThen remove the inline imports at lines 59 and 102.
Also applies to: 102-108
tests/parser/test_parser.py (2)
143-148: Remove duplicate import ofConfigManager.
ConfigManageris already imported at line 11. The local import here is redundant.Suggested fix
- # Create config manager and get site config - from tapio.config.config_manager import ConfigManager - - config_manager = ConfigManager(self.config_path) + # Create config manager and get site config + config_manager = ConfigManager(self.config_path)
189-195: Remove duplicate import ofConfigManager.
ConfigManageris already imported at line 11. This local import at line 191 is redundant.Suggested fix
def test_init_with_invalid_site(self): """Test initialization with invalid site.""" - from tapio.config.config_manager import ConfigManager - config_manager = ConfigManager(self.config_path) with self.assertRaises(ValueError): config_manager.get_site_config("nonexistent")tests/vectorstore/test_chroma_store.py (1)
242-271: Add@patchdecorator to avoid real ChromaDB initialization.This test doesn't patch
Chroma, so it will attempt to create a real ChromaDB instance which may be slow or fail in some environments. Consider adding the patch for consistency with other tests.Suggested fix
+ `@patch`("tapio.vectorstore.chroma_store.Chroma") - def test_enhance_document_with_citation(self, mock_embeddings): + def test_enhance_document_with_citation(self, mock_chroma, mock_embeddings): """Test enhancing a document with citation information.""" # Initialize ChromaStore with injected embeddings store = ChromaStore( collection_name="test_collection", embeddings=mock_embeddings, )tests/conftest.py (1)
19-39: Avoid shared mutable embedding lists in the mock.
embed_query/embed_documentscurrently return the same list instance, which can cause cross-test coupling if any downstream code mutates it. Returning copies better mirrors real embedding behavior.♻️ Proposed adjustment
- embeddings.embed_query.return_value = dummy_embedding + embeddings.embed_query.side_effect = lambda _: dummy_embedding.copy() @@ - def embed_documents_mock(texts): - return [dummy_embedding for _ in texts] + def embed_documents_mock(texts): + return [dummy_embedding.copy() for _ in texts]
…s definition to streamline the codebase and enhance maintainability.
…r improved clarity
… for improved maintainability and consistency
…y in integration tests
…or improved clarity
…r improved mocking
… improved test reliability
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@tapio/factories.py`:
- Line 8: Replace the deprecated import of HuggingFaceEmbeddings in
tapio/factories.py: change the module import from langchain_community.embeddings
to the supported langchain_huggingface package so the symbol
HuggingFaceEmbeddings is imported from langchain_huggingface instead; locate the
import line that currently reads using HuggingFaceEmbeddings and update it to
import from langchain_huggingface to eliminate the deprecation warning and match
usage elsewhere (e.g., as done in cli.py).
In `@tests/integration/test_rag_pipeline.py`:
- Line 78: Remove the duplicate import of patch by deleting the redundant "from
unittest.mock import patch" statement (the one inside the test function) since
patch is already imported at the top; ensure no other references are affected
and run tests to confirm imports remain valid.
🧹 Nitpick comments (2)
tests/parser/test_parser.py (1)
231-247: Consider removing redundant import.The
DEFAULT_DIRSimport at line 235 is redundant since it's already imported insetUp()at line 39. While this works correctly, you could reference the already-importedDEFAULT_DIRSor store it as an instance variable insetUp()for reuse.♻️ Proposed refactor to remove redundant import
Store
DEFAULT_DIRSas an instance attribute insetUp():from tapio.config.settings import DEFAULT_DIRS + self.DEFAULT_DIRS = DEFAULT_DIRS self.input_dir = os.path.join(self.temp_dir, self.site_name, DEFAULT_DIRS["CRAWLED_DIR"])Then use
self.DEFAULT_DIRSin test methods:- from tapio.config.settings import DEFAULT_DIRS - - no_fallback_input_dir = os.path.join(self.temp_dir, self.no_fallback_site_name, DEFAULT_DIRS["CRAWLED_DIR"]) - no_fallback_output_dir = os.path.join(self.temp_dir, self.no_fallback_site_name, DEFAULT_DIRS["PARSED_DIR"]) + no_fallback_input_dir = os.path.join(self.temp_dir, self.no_fallback_site_name, self.DEFAULT_DIRS["CRAWLED_DIR"]) + no_fallback_output_dir = os.path.join(self.temp_dir, self.no_fallback_site_name, self.DEFAULT_DIRS["PARSED_DIR"])tapio/factories.py (1)
51-67: Consider using the broaderEmbeddingstype for the parameter.The type hint
HuggingFaceEmbeddings | Noneis more restrictive than necessary. SinceChromaStore.__init__acceptsEmbeddings(the base type), usingEmbeddings | Nonehere would allow injecting other embedding implementations (e.g., for testing with mocks).♻️ Suggested type broadening
+from langchain_core.embeddings import Embeddings + ... - def create_chroma_store(self, embeddings: HuggingFaceEmbeddings | None = None) -> ChromaStore: + def create_chroma_store(self, embeddings: Embeddings | None = None) -> ChromaStore:
…roved type consistency
User description
Refactor the code for improved readability and maintainability, implement dependency injection for better testing, and enhance documentation in CONTRIBUTING.md and README.md to clarify testing guidelines and configuration structure.
PR Type
Enhancement, Tests
Description
Implement dependency injection pattern across RAG system components
Create factory classes for simplified orchestrator instantiation
Add comprehensive test fixtures and integration test suite
Refactor Parser, ChromaStore, and Vectorizer with injected dependencies
Enhance documentation with programmatic API and configuration guides
Diagram Walkthrough
File Walkthrough
11 files
Export public API with factory classesRefactor with dependency injection for RAG orchestratorUpdate CLI to use factory pattern and dependency injectionAdd RAGConfig dataclass for centralized configurationAdd embedding and RAG configuration defaultsCreate factory classes for dependency injectionRefactor Parser with injected configuration and directoriesImplement dependency injection for vector storeRefactor with injected document and LLM servicesInject embeddings instance for flexibilityInject vector database and text splitter dependencies13 files
Add comprehensive mock fixtures for testingCreate integration test packageAdd integration tests for parser pipelineAdd integration tests for RAG system end-to-endAdd integration tests for vectorization pipelineUpdate tests to use dependency injection patternUpdate tests to use dependency injection patternSimplify tests with injected mock dependenciesSimplify tests with injected mock dependenciesUpdate CLI tests for dependency injection patternRefactor tests to use injected orchestratorUpdate tests to inject embeddings dependencySimplify tests with injected dependencies2 files
Enhance documentation with API and testing guidelinesMove configuration details to CONTRIBUTING.md2 files
Bump version to 2.0.0Add integration test marker configurationSummary by CodeRabbit
New Features
Documentation
Tests
Refactor / Chores