
Refactor code structure and enhance documentation #1

Merged
brylie merged 17 commits into main from improve-maintainability on Feb 2, 2026

Conversation

@brylie (Owner) commented Feb 2, 2026

User description

Refactor the code for improved readability and maintainability, implement dependency injection for better testing, and enhance documentation in CONTRIBUTING.md and README.md to clarify testing guidelines and configuration structure.


PR Type

Enhancement, Tests


Description

  • Implement dependency injection pattern across RAG system components

  • Create factory classes for simplified orchestrator instantiation

  • Add comprehensive test fixtures and integration test suite

  • Refactor Parser, ChromaStore, and Vectorizer with injected dependencies

  • Enhance documentation with programmatic API and configuration guides
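
As a rough illustration of the new wiring (a minimal sketch: the RAGConfig and RAGOrchestratorFactory names follow this PR's summary, but the bodies and default values below are simplified stand-ins, not the actual tapio code):

```python
# Simplified sketch of the DI + factory pattern this PR introduces.
# All constructor bodies and default values are illustrative placeholders.
from dataclasses import dataclass


@dataclass
class RAGConfig:
    # Defaults shown here mirror values mentioned elsewhere in this PR;
    # num_results is an assumed example value.
    embedding_model_name: str = "all-MiniLM-L6-v2"
    llm_model_name: str = "llama3.2:latest"
    max_tokens: int = 1024
    num_results: int = 3


class RAGOrchestrator:
    """Coordinates the injected retrieval and LLM services."""

    def __init__(self, doc_retrieval_service, llm_service):
        self.doc_retrieval_service = doc_retrieval_service
        self.llm_service = llm_service


class RAGOrchestratorFactory:
    """Wires up dependencies so callers only supply a RAGConfig."""

    def __init__(self, config: RAGConfig):
        self.config = config

    def create_orchestrator(self) -> RAGOrchestrator:
        # In the real factory these would be HuggingFaceEmbeddings,
        # ChromaStore, DocumentRetrievalService, and LLMService instances.
        doc_retrieval_service = object()
        llm_service = object()
        return RAGOrchestrator(doc_retrieval_service, llm_service)


orchestrator = RAGOrchestratorFactory(RAGConfig()).create_orchestrator()
```

The app then receives the fully wired orchestrator instead of constructing its own dependencies.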


Diagram Walkthrough

flowchart LR
  Config["RAGConfig<br/>Configuration"]
  Factory["RAGOrchestratorFactory<br/>Dependency Wiring"]
  Embeddings["HuggingFaceEmbeddings<br/>Injected"]
  ChromaStore["ChromaStore<br/>Injected"]
  DocService["DocumentRetrievalService<br/>Injected"]
  LLMService["LLMService<br/>Injected"]
  Orchestrator["RAGOrchestrator<br/>Coordinated"]
  App["TapioAssistantApp<br/>Simplified"]
  
  Config -- "creates" --> Factory
  Factory -- "creates" --> Embeddings
  Factory -- "creates" --> ChromaStore
  Factory -- "creates" --> DocService
  Factory -- "creates" --> LLMService
  Embeddings -- "injected into" --> ChromaStore
  ChromaStore -- "injected into" --> DocService
  DocService -- "injected into" --> Orchestrator
  LLMService -- "injected into" --> Orchestrator
  Orchestrator -- "injected into" --> App

File Walkthrough

Relevant files

Enhancement (11 files)
  • __init__.py: Export public API with factory classes (+9/-0)
  • app.py: Refactor with dependency injection for RAG orchestrator (+51/-88)
  • cli.py: Update CLI to use factory pattern and dependency injection (+63/-11)
  • config_models.py: Add RAGConfig dataclass for centralized configuration (+32/-0)
  • settings.py: Add embedding and RAG configuration defaults (+8/-0)
  • factories.py: Create factory classes for dependency injection (+183/-0)
  • parser.py: Refactor Parser with injected configuration and directories (+24/-16)
  • document_retrieval_service.py: Implement dependency injection for vector store (+19/-13)
  • rag_orchestrator.py: Refactor with injected document and LLM services (+27/-25)
  • chroma_store.py: Inject embeddings instance for flexibility (+23/-7)
  • vectorizer.py: Inject vector database and text splitter dependencies (+30/-39)

Tests (13 files)
  • conftest.py: Add comprehensive mock fixtures for testing (+129/-0)
  • __init__.py: Create integration test package (+1/-0)
  • test_parser_pipeline.py: Add integration tests for parser pipeline (+136/-0)
  • test_rag_pipeline.py: Add integration tests for RAG system end-to-end (+169/-0)
  • test_vectorization_pipeline.py: Add integration tests for vectorization pipeline (+129/-0)
  • test_parser.py: Update tests to use dependency injection pattern (+29/-25)
  • test_relative_links.py: Update tests to use dependency injection pattern (+19/-4)
  • test_document_retrieval_service.py: Simplify tests with injected mock dependencies (+7/-11)
  • test_rag_orchestrator.py: Simplify tests with injected mock dependencies (+11/-19)
  • test_cli.py: Update CLI tests for dependency injection pattern (+22/-136)
  • test_gradio_app.py: Refactor tests to use injected orchestrator (+20/-70)
  • test_chroma_store.py: Update tests to inject embeddings dependency (+59/-47)
  • test_vectorizer.py: Simplify tests with injected dependencies (+38/-132)

Documentation (2 files)
  • CONTRIBUTING.md: Enhance documentation with API and testing guidelines (+196/-2)
  • README.md: Move configuration details to CONTRIBUTING.md (+1/-86)

Configuration changes (2 files)
  • pyproject.toml: Bump version to 2.0.0 (+1/-1)
  • pytest.ini: Add integration test marker configuration (+4/-0)
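
For reference, an integration-marker registration of the kind described usually looks like this in pytest.ini (illustrative; the actual marker description used in this PR may differ):

```ini
[pytest]
markers =
    integration: marks tests as integration tests (deselect with '-m "not integration"')
```

Tests decorated with @pytest.mark.integration can then be selected with -m integration or skipped with -m "not integration".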

Summary by CodeRabbit

  • New Features

    • Public API exports: RAGConfig and RAGOrchestratorFactory for programmatic orchestration.
  • Documentation

    • Expanded CONTRIBUTING.md with programmatic API examples, site configuration guidance, default settings, and testing guidelines; README now redirects to CONTRIBUTING.md.
  • Tests

    • Added integration test suite and new pytest integration marker; many test fixtures and end-to-end tests added.
  • Refactor / Chores

    • Project refactored to dependency-injection patterns; bumped version to 2.0.0.

@coderabbitai bot commented Feb 2, 2026

Warning

Rate limit exceeded

@brylie has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 11 minutes and 52 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📝 Walkthrough

Walkthrough

Refactors the codebase to adopt dependency injection and a factory for RAG orchestration, introduces RAGConfig and factory wiring (RAGOrchestratorFactory), updates constructors to accept injected services, adds defaults and pytest integration marker, bumps project version to 2.0.0, and expands integration tests and contributing docs.

Changes

Cohort / File(s): Summary

  • Documentation & Project (CONTRIBUTING.md, README.md, pyproject.toml, pytest.ini): Expanded CONTRIBUTING.md with testing/programmatic API/site-config docs; removed site-config block from README (points to CONTRIBUTING.md); bumped version to 2.0.0; added pytest integration marker.
  • Package exports & config defaults (tapio/__init__.py, tapio/config/config_models.py, tapio/config/settings.py): Exported RAGConfig and RAGOrchestratorFactory from package; added RAGConfig dataclass and new default constants (DEFAULT_EMBEDDING_MODEL, DEFAULT_LLM_MODEL, DEFAULT_MAX_TOKENS, DEFAULT_NUM_RESULTS).
  • Factories / Wiring (tapio/factories.py): Added RAGOrchestratorFactory to build embeddings, chroma store, document retrieval service, LLM service, and assemble a RAGOrchestrator.
  • App & CLI changes (tapio/app.py, tapio/cli.py): TapioAssistantApp now accepts an injected RAGOrchestrator; added check_model_availability; main() and CLI now create/configure a RAGOrchestrator via factory and pass it into the app.
  • Core services (DI) (tapio/services/document_retrieval_service.py, tapio/services/rag_orchestrator.py): Switched services to accept injected dependencies: DocumentRetrievalService takes a vector_store; RAGOrchestrator takes doc_retrieval_service and llm_service.
  • Vector store & vectorizer (DI) (tapio/vectorstore/chroma_store.py, tapio/vectorstore/vectorizer.py): ChromaStore now accepts injected embeddings; MarkdownVectorizer now accepts vector_db and text_splitter (no internal construction).
  • Parser (DI) (tapio/parser/parser.py): Parser constructor changed to accept site_config, input_dir, and output_dir (no longer loads config via config_path internally).
  • Tests & fixtures (tests/conftest.py, tests/integration/*, tests/*, tests/vectorstore/*, tests/services/*, tests/parser/*): Added many fixtures (mocks for embeddings, chroma, LLM, doc retrieval, rag orchestrator), registered integration pytest marker, added integration tests for parser/RAG/vectorization, and updated unit tests to the DI-based constructors and factory wiring.
  • Other test scaffolding (tests/integration/__init__.py): Added package initializer for integration tests.
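
The constructor changes summarized above amount to the following shape (a sketch: parameter names follow the change summary, while the real classes carry more configuration and behavior):

```python
# Illustrative shape of the DI constructor change described above.
# Parameter names come from the change summary; bodies are placeholders.

class DocumentRetrievalService:
    def __init__(self, vector_store) -> None:
        # Previously the service would construct its own vector store;
        # now the store is injected by the factory (or a test mock).
        self.vector_store = vector_store


class RAGOrchestrator:
    def __init__(self, doc_retrieval_service, llm_service) -> None:
        # Both collaborators are injected, so tests can pass mocks directly.
        self.doc_retrieval_service = doc_retrieval_service
        self.llm_service = llm_service
```

Because nothing is constructed internally, unit tests can instantiate either class with plain mocks and no patching.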

Sequence Diagram

sequenceDiagram
    participant CLI as CLI/main()
    participant Factory as RAGOrchestratorFactory
    participant Emb as Embeddings
    participant Store as ChromaStore
    participant DRS as DocumentRetrievalService
    participant LLM as LLMService
    participant Orch as RAGOrchestrator
    participant App as TapioAssistantApp

    CLI->>Factory: create_orchestrator()
    Factory->>Emb: create_embeddings()
    Emb-->>Factory: embeddings instance
    Factory->>Store: create_chroma_store(embeddings)
    Store-->>Factory: chroma store
    Factory->>DRS: create_document_retrieval_service(chroma_store)
    DRS-->>Factory: document retrieval service
    Factory->>LLM: create_llm_service()
    LLM-->>Factory: LLM service
    Factory->>Orch: create_orchestrator(drs, llm)
    Orch-->>Factory: orchestrator
    Factory-->>CLI: orchestrator
    CLI->>App: TapioAssistantApp(rag_orchestrator)
    App-->>CLI: app ready

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 I hopped through factories, one by one,
Injected services, now neatly spun.
Version two springs into the glade,
Tests and docs in order made.
A rabbit cheers: dependencies, well-played! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)
  • Title check: ❓ Inconclusive. The title 'Refactor code structure and enhance documentation' is vague and generic, using non-descriptive language that fails to convey the main technical changes (dependency injection refactor, factory pattern implementation). Resolution: consider a more specific title that highlights the primary change, such as 'Implement dependency injection pattern across RAG components' or 'Refactor to factory-based orchestrator initialization and dependency injection'.

✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 98.72%, which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch improve-maintainability

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@qodo-code-review bot commented Feb 2, 2026

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance: 🟢
No security concerns identified. No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

🔴
Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: 🏷️
Unstructured logging: New log entries are plain-text messages rather than structured logs (e.g., JSON), which
reduces auditability and makes monitoring and parsing harder as required by the checklist.

Referred Code
logger.info(
    "Initialized RAG orchestrator",
)

Learn more about managing compliance generic rules or creating your own custom rules
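
A structured alternative to the flagged plain-text message could look like this (a sketch using only the standard library; the event name and field names are examples, not tapio conventions):

```python
# Emit one JSON object per log line so entries stay machine-parseable.
import json
import logging

logger = logging.getLogger("tapio")


def log_event(event: str, **fields) -> str:
    """Serialize an event as JSON, log it, and return the payload."""
    payload = json.dumps({"event": event, **fields})
    logger.info(payload)
    return payload


log_event("rag_orchestrator_initialized", llm_model="llama3.2:latest")
```

Downstream log processors can then filter and aggregate on the "event" key instead of parsing free text.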

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status: 🏷️
Missing audit context: The new app flow logs operational messages (e.g., model availability) but does not show
any audit-trail fields like user identity or action outcome for potentially critical
actions, and it is unclear from the diff whether such audit logging exists elsewhere.

Referred Code
def check_model_availability(self) -> None:
    """Check if the LLM model is available.

    Raises:
        SystemExit: If the model is not available
    """
    if not self.rag_orchestrator.check_model_availability():
        logger.error("Required LLM model is not available")
        raise SystemExit(1)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: 🏷️
Error exposure unclear: The diff introduces a hard exit on model unavailability and logging, but it is not fully
visible whether user-facing error paths (e.g., UI/CLI exception messages) avoid leaking
internal details across all execution paths.

Referred Code
def check_model_availability(self) -> None:
    """Check if the LLM model is available.

    Raises:
        SystemExit: If the model is not available
    """
    if not self.rag_orchestrator.check_model_availability():
        logger.error("Required LLM model is not available")
        raise SystemExit(1)

def generate_rag_response(
    self,
    query: str,
    history: list[dict[str, Any]] | None = None,
) -> tuple[str, str]:
    """Generate a response using RAG and return both the response and retrieved documents.

    Args:
        query: The user's query
        history: Chat history



 ... (clipped 15 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status: 🏷️
Path handling review: The CLI now constructs input_dir/output_dir paths from the site argument and default
directories, and while it appears constrained by available_sites, full
validation/sanitization of external inputs and filesystem safety cannot be confirmed from
the diff alone.

Referred Code
# Parse a specific site
if site in available_sites:
    typer.echo(f"🔧 Using configuration for site: {site}")

    # Get site config from config manager
    site_config = config_manager.get_site_config(site)

    # Determine input and output directories
    input_dir = os.path.join(DEFAULT_CONTENT_DIR, site, DEFAULT_DIRS["CRAWLED_DIR"])
    output_dir = os.path.join(DEFAULT_CONTENT_DIR, site, DEFAULT_DIRS["PARSED_DIR"])

    # Create parser with dependency injection
    parser = Parser(
        site_name=site,
        site_config=site_config,
        input_dir=input_dir,
        output_dir=output_dir,
    )
    results = parser.parse_all()

    # Output information


 ... (clipped 59 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Compliance status legend:
🟢 - Fully Compliant
🟡 - Partially Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review bot commented Feb 2, 2026

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
High-level
Refactor the new factory pattern implementation
Suggestion Impact: Instead of completing and using VectorizerFactory, the commit removed the incomplete/unused VectorizerFactory class (and its MarkdownTextSplitter import), eliminating the factory pattern implementation rather than refactoring/finishing it.

code diff:

# File: tapio/factories.py
@@ -6,7 +6,6 @@
 """
 
 from langchain_community.embeddings import HuggingFaceEmbeddings
-from langchain_text_splitters import MarkdownTextSplitter
 
 from tapio.config.config_models import RAGConfig
 from tapio.services.document_retrieval_service import DocumentRetrievalService
@@ -125,60 +124,3 @@
             llm_service=llm_service,
         )
 
-
-class VectorizerFactory:
-    """Factory for creating MarkdownVectorizer instances.
-
-    Handles creation of text splitters and vector database instances for
-    the vectorization pipeline.
-
-    Args:
-        collection_name: Name of the ChromaDB collection
-        persist_directory: Directory path for ChromaDB persistence
-        embedding_model_name: Name of the HuggingFace embedding model
-        chunk_size: Size of text chunks for splitting
-        chunk_overlap: Overlap between consecutive chunks
-    """
-
-    def __init__(
-        self,
-        collection_name: str,
-        persist_directory: str = "chroma_db",
-        embedding_model_name: str = "all-MiniLM-L6-v2",
-        chunk_size: int = 1000,
-        chunk_overlap: int = 200,
-    ) -> None:
-        """Initialize the vectorizer factory.
-
-        Args:
-            collection_name: Name of the ChromaDB collection
-            persist_directory: Directory for ChromaDB persistence
-            embedding_model_name: HuggingFace embedding model name
-            chunk_size: Text chunk size
-            chunk_overlap: Overlap between chunks
-        """
-        self.collection_name = collection_name
-        self.persist_directory = persist_directory
-        self.embedding_model_name = embedding_model_name
-        self.chunk_size = chunk_size
-        self.chunk_overlap = chunk_overlap
-
-    def create_embeddings(self) -> HuggingFaceEmbeddings:
-        """Create embeddings instance.
-
-        Returns:
-            Configured HuggingFaceEmbeddings instance
-        """
-        return HuggingFaceEmbeddings(model_name=self.embedding_model_name)
-
-    def create_text_splitter(self) -> MarkdownTextSplitter:
-        """Create text splitter for markdown.
-
-        Returns:
-            Configured MarkdownTextSplitter instance
-        """
-        return MarkdownTextSplitter(
-            chunk_size=self.chunk_size,
-            chunk_overlap=self.chunk_overlap,
-        )
-

The VectorizerFactory is incomplete and unused. It should be completed to create
the MarkdownVectorizer and its dependencies, and then used in the vectorize CLI
command to replace manual dependency creation and ensure consistency with the
factory pattern.

Examples:

tapio/factories.py [129-183]
class VectorizerFactory:
    """Factory for creating MarkdownVectorizer instances.

    Handles creation of text splitters and vector database instances for
    the vectorization pipeline.

    Args:
        collection_name: Name of the ChromaDB collection
        persist_directory: Directory path for ChromaDB persistence
        embedding_model_name: Name of the HuggingFace embedding model

 ... (clipped 45 lines)
tapio/cli.py [362-378]
        # Create dependencies
        embeddings = HuggingFaceEmbeddings(model_name=embedding_model)
        text_splitter = MarkdownTextSplitter(
            chunk_size=1000,
            chunk_overlap=200,
        )
        vector_db = Chroma(
            collection_name=collection_name,
            embedding_function=embeddings,
            persist_directory=db_dir,

 ... (clipped 7 lines)

Solution Walkthrough:

Before:

# In tapio/factories.py
class VectorizerFactory:
    # ... methods to create dependencies...
    def create_embeddings(self): ...
    def create_text_splitter(self): ...
    # Missing a method to create the actual vectorizer

# In tapio/cli.py
@app.command()
def vectorize(...):
    # Dependencies are created manually
    embeddings = HuggingFaceEmbeddings(...)
    text_splitter = MarkdownTextSplitter(...)
    vector_db = Chroma(...)

    # Vectorizer is instantiated with manually created dependencies
    vectorizer = MarkdownVectorizer(
        vector_db=vector_db,
        text_splitter=text_splitter,
    )
    vectorizer.process_directory(...)

After:

# In tapio/factories.py
class VectorizerFactory:
    # ... methods to create dependencies...
    def create_vector_db(self): ...
    def create_text_splitter(self): ...

    def create_vectorizer(self) -> MarkdownVectorizer:
        # Wires all dependencies together
        vector_db = self.create_vector_db()
        text_splitter = self.create_text_splitter()
        return MarkdownVectorizer(vector_db, text_splitter)

# In tapio/cli.py
@app.command()
def vectorize(...):
    # Use the factory to create the vectorizer
    factory = VectorizerFactory(...)
    vectorizer = factory.create_vectorizer()
    vectorizer.process_directory(...)
Suggestion importance[1-10]: 8


Why: The suggestion correctly identifies a significant architectural inconsistency where the new VectorizerFactory is incomplete and unused, while the vectorize CLI command manually creates dependencies, contradicting the PR's goal of using factories for DI.

Impact: Medium
General
Raise custom exception instead of SystemExit

In check_model_availability, replace the SystemExit(1) call with a custom
exception (e.g., ModelUnavailableError) to allow for more graceful error
handling by callers.

tapio/app.py [44-52]

+class ModelUnavailableError(Exception):
+    """Custom exception for when the LLM model is not available."""
+    pass
+
+...
+
 def check_model_availability(self) -> None:
     """Check if the LLM model is available.
 
     Raises:
-        SystemExit: If the model is not available
+        ModelUnavailableError: If the model is not available
     """
     if not self.rag_orchestrator.check_model_availability():
         logger.error("Required LLM model is not available")
-        raise SystemExit(1)
+        raise ModelUnavailableError("Required LLM model is not available")

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 7


Why: The suggestion correctly identifies that using SystemExit is poor practice in a class, as it hinders reusability and testability. Replacing it with a custom exception is a significant improvement to the code's design and robustness.

Impact: Medium
Expose chunking parameters in CLI

Expose chunk_size and chunk_overlap as command-line options in the vectorize
command to allow users to configure the text splitting process.

tapio/cli.py [301-325]

 @app.command()
 def vectorize(
     site: str | None = typer.Argument(
         None,
         help="Site to vectorize (e.g. 'migri'). If not provided, all sites are processed.",
     ),
     embedding_model: str = typer.Option(
         "all-MiniLM-L6-v2",
         "--model",
         "-m",
         help="Name of the sentence-transformers model to use",
+    ),
+    chunk_size: int = typer.Option(
+        1000,
+        "--chunk-size",
+        help="Size of text chunks for splitting",
+    ),
+    chunk_overlap: int = typer.Option(
+        200,
+        "--chunk-overlap",
+        help="Overlap between consecutive text chunks",
     ),
     batch_size: int = typer.Option(
         20,
         "--batch-size",
         "-b",
         help="Number of documents to process in each batch",
     ),
     verbose: bool = typer.Option(
         False,
         "--verbose",
         "-v",
         help="Enable verbose output",
     ),
 ) -> None:

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 6


Why: The suggestion correctly points out that chunk_size and chunk_overlap are hardcoded within the vectorize command. Exposing them as CLI options is a valuable enhancement for user flexibility and tuning the vectorization process.

Impact: Low
Expose RAG parameters in CLI

Add command-line options for embedding_model and num_results to the tapio_app
command to allow for greater configuration of the RAG pipeline.

tapio/cli.py [453-472]

 @app.command()
 def tapio_app(
     model_name: str = typer.Option(
         "llama3.2:latest",
         "--model-name",
         "-m",
         help="Ollama model to use for LLM inference",
+    ),
+    embedding_model: str = typer.Option(
+        DEFAULT_EMBEDDING_MODEL,
+        "--embedding-model",
+        help="HuggingFace embedding model to use",
+    ),
+    num_results: int = typer.Option(
+        DEFAULT_NUM_RESULTS,
+        "--num-results",
+        "-n",
+        help="Number of documents to retrieve for context",
     ),
     max_tokens: int = typer.Option(
         1024,
         "--max-tokens",
         "-t",
         help="Maximum number of tokens to generate",
     ),
     share: bool = typer.Option(
         False,
         "--share",
         help="Create a shareable link for the app",
     ),
 ) -> None:

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 6


Why: The suggestion correctly identifies that key RAG parameters like embedding_model_name and num_results are hardcoded in the tapio_app command. Adding them as CLI options significantly improves the command's flexibility and usability for tuning the RAG pipeline.

Impact: Low

@coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
CONTRIBUTING.md (1)

60-66: ⚠️ Potential issue | 🟡 Minor

Align the architecture diagram file name with the actual module name.

The diagram references gradio_app.py, but the Project Structure section lists app.py. Please update one of them so contributors don’t chase a non-existent file.

Also applies to: 321-323

🤖 Fix all issues with AI agents
In `@CONTRIBUTING.md`:
- Around line 456-459: Update the CONTRIBUTING.md entry for the fallback_to_body
option to hyphenate the compound adjective by changing the phrase "full body
content" to "full-body content" so the list item reads "... Use full-body
content if selectors fail (default: true)"; locate the line describing
`fallback_to_body` and make that single-word hyphenation change.
🧹 Nitpick comments (10)
tapio/vectorstore/chroma_store.py (1)

20-42: Good DI refactor; consider a more specific type hint for embeddings.

The dependency injection pattern is correctly implemented. Using Any for the embeddings parameter is pragmatic given LangChain's various embedding interfaces, but you could consider using langchain_core.embeddings.Embeddings base class for better type safety and IDE support.

💡 Optional type improvement
+from langchain_core.embeddings import Embeddings
+
 class ChromaStore:
     ...
     def __init__(
         self,
         collection_name: str,
-        embeddings: Any,
+        embeddings: Embeddings,
         persist_directory: str = "chroma_db",
     ) -> None:
tests/parser/test_relative_links.py (1)

205-216: Consider extracting ConfigManager import to module level.

The ConfigManager import is repeated inside test_domain_specific_url_handling. Since it's also used in setUp, consider moving the import to the top of the file with other imports for consistency.

♻️ Suggested change

Add to imports at the top of the file:

from tapio.config.config_manager import ConfigManager

Then remove the local imports on lines 90-91 and 206-207.

tapio/config/config_models.py (1)

146-151: Consider using constants from settings.py for defaults.

The AI summary mentions tapio/config/settings.py defines DEFAULT_EMBEDDING_MODEL, DEFAULT_LLM_MODEL, etc. Using those constants here would centralize default values and follow the DRY principle.

#!/bin/bash
# Check if settings.py has these constants
rg -n "DEFAULT_EMBEDDING_MODEL|DEFAULT_LLM_MODEL|DEFAULT_MAX_TOKENS|DEFAULT_NUM_RESULTS" tapio/config/settings.py
tests/integration/test_rag_pipeline.py (2)

114-119: Consider using unittest.mock.patch instead of monkey-patching.

Directly assigning a lambda to factory.create_embeddings works but is less explicit than using proper mocking, which provides better cleanup and is more idiomatic for tests.

♻️ Alternative using patch
+    from unittest.mock import patch
+
     # Create factory
     factory = RAGOrchestratorFactory(config)
 
-    # Mock the create_embeddings to return our mock
-    factory.create_embeddings = lambda: mock_embeddings
+    # Mock the create_embeddings to return our mock
+    with patch.object(factory, 'create_embeddings', return_value=mock_embeddings):
+        # Create orchestrator
+        orchestrator = factory.create_orchestrator()
 
-    # Create orchestrator
-    orchestrator = factory.create_orchestrator()

163-164: Assertion could be more precise.

assert len(results) <= 2 is a weak assertion. With 3 documents added and num_results=2, we'd typically expect exactly 2 results. Consider using assert len(results) == 2 unless there's a reason fewer results might be returned (e.g., similarity threshold filtering).

♻️ Suggested change
-    assert len(results) <= 2  # Should respect num_results limit
+    assert len(results) == 2  # Should return exactly num_results documents
tests/integration/test_vectorization_pipeline.py (1)

59-65: Consider hoisting the Chroma import to module level.

The from langchain_chroma import Chroma import is repeated inside each test function. Moving it to the top of the file with other imports would improve readability, unless there's a specific reason to delay the import (e.g., avoiding import errors when running non-integration tests).

Suggested refactor
 from langchain_text_splitters import MarkdownTextSplitter  # type: ignore[import-not-found]
+from langchain_chroma import Chroma  # type: ignore[import-not-found]
 
 from tapio.vectorstore.vectorizer import MarkdownVectorizer

Then remove the inline imports at lines 59 and 102.

Also applies to: 102-108

tests/parser/test_parser.py (2)

143-148: Remove duplicate import of ConfigManager.

ConfigManager is already imported at line 11. The local import here is redundant.

Suggested fix
-        # Create config manager and get site config
-        from tapio.config.config_manager import ConfigManager
-
-        config_manager = ConfigManager(self.config_path)
+        # Create config manager and get site config
+        config_manager = ConfigManager(self.config_path)

189-195: Remove duplicate import of ConfigManager.

ConfigManager is already imported at line 11. This local import at line 191 is redundant.

Suggested fix
     def test_init_with_invalid_site(self):
         """Test initialization with invalid site."""
-        from tapio.config.config_manager import ConfigManager
-
         config_manager = ConfigManager(self.config_path)
         with self.assertRaises(ValueError):
             config_manager.get_site_config("nonexistent")
tests/vectorstore/test_chroma_store.py (1)

242-271: Add @patch decorator to avoid real ChromaDB initialization.

This test doesn't patch Chroma, so it will attempt to create a real ChromaDB instance which may be slow or fail in some environments. Consider adding the patch for consistency with other tests.

Suggested fix
+    @patch("tapio.vectorstore.chroma_store.Chroma")
-    def test_enhance_document_with_citation(self, mock_embeddings):
+    def test_enhance_document_with_citation(self, mock_chroma, mock_embeddings):
         """Test enhancing a document with citation information."""
         # Initialize ChromaStore with injected embeddings
         store = ChromaStore(
             collection_name="test_collection",
             embeddings=mock_embeddings,
         )
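
Note the parameter order in the suggested fix: stacked @patch decorators apply bottom-up, so the decorator closest to the function supplies the first mock argument. A standard-library sketch of the same pattern (the patched os targets are illustrative stand-ins for the Chroma and embeddings patches):

```python
import os
import unittest
from unittest.mock import patch


class PatchOrderTests(unittest.TestCase):
    # Decorators apply bottom-up: @patch("os.getcwd") is innermost,
    # so mock_getcwd arrives first in the parameter list, mock_exists second.
    @patch("os.path.exists")
    @patch("os.getcwd")
    def test_argument_order(self, mock_getcwd, mock_exists):
        mock_getcwd.return_value = "/fake/dir"
        mock_exists.return_value = True
        self.assertEqual(os.getcwd(), "/fake/dir")
        self.assertTrue(os.path.exists("/anywhere"))
```

Mixing up the order is a common source of confusing failures, since each mock silently absorbs whatever configuration is applied to it.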
tests/conftest.py (1)

19-39: Avoid shared mutable embedding lists in the mock.

embed_query/embed_documents currently return the same list instance, which can cause cross-test coupling if any downstream code mutates it. Returning copies better mirrors real embedding behavior.

♻️ Proposed adjustment
-    embeddings.embed_query.return_value = dummy_embedding
+    embeddings.embed_query.side_effect = lambda _: dummy_embedding.copy()
@@
-    def embed_documents_mock(texts):
-        return [dummy_embedding for _ in texts]
+    def embed_documents_mock(texts):
+        return [dummy_embedding.copy() for _ in texts]
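The hazard is easy to reproduce with a plain MagicMock; a minimal standalone demonstration (variable names are illustrative):

```python
from unittest.mock import MagicMock

# Shared-instance mock: every call returns the SAME list object.
dummy = [0.1, 0.2, 0.3]
leaky = MagicMock()
leaky.embed_query.return_value = dummy

result = leaky.embed_query("first query")
result.append(999.0)                     # downstream code mutates the returned list...
second = leaky.embed_query("second query")
print(second)                            # [0.1, 0.2, 0.3, 999.0]: the mutation leaked

# Copy-per-call mock: each caller gets an independent list.
base = [0.1, 0.2, 0.3]
safe = MagicMock()
safe.embed_query.side_effect = lambda _: base.copy()

safe.embed_query("first query").append(999.0)
isolated = safe.embed_query("second query")
print(isolated)                          # [0.1, 0.2, 0.3]: unaffected
```

The side_effect version also better mirrors a real embedding model, which constructs a fresh vector on every call.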


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@tapio/factories.py`:
- Line 8: Replace the deprecated import of HuggingFaceEmbeddings in
tapio/factories.py: change the import from langchain_community.embeddings to the
supported langchain_huggingface package, so that HuggingFaceEmbeddings is
imported from langchain_huggingface instead. This eliminates the deprecation
warning and matches usage elsewhere in the codebase (e.g., cli.py).

In `@tests/integration/test_rag_pipeline.py`:
- Line 78: Remove the duplicate import of patch by deleting the redundant "from
unittest.mock import patch" statement inside the test function, since patch is
already imported at the top of the file. Confirm no other references are
affected and run the tests to verify imports remain valid.
🧹 Nitpick comments (2)
tests/parser/test_parser.py (1)

231-247: Consider removing redundant import.

The DEFAULT_DIRS import at line 235 is redundant since it's already imported in setUp() at line 39. While this works correctly, you could reference the already-imported DEFAULT_DIRS or store it as an instance variable in setUp() for reuse.

♻️ Proposed refactor to remove redundant import

Store DEFAULT_DIRS as an instance attribute in setUp():

         from tapio.config.settings import DEFAULT_DIRS
+        self.DEFAULT_DIRS = DEFAULT_DIRS

         self.input_dir = os.path.join(self.temp_dir, self.site_name, DEFAULT_DIRS["CRAWLED_DIR"])

Then use self.DEFAULT_DIRS in test methods:

-        from tapio.config.settings import DEFAULT_DIRS
-
-        no_fallback_input_dir = os.path.join(self.temp_dir, self.no_fallback_site_name, DEFAULT_DIRS["CRAWLED_DIR"])
-        no_fallback_output_dir = os.path.join(self.temp_dir, self.no_fallback_site_name, DEFAULT_DIRS["PARSED_DIR"])
+        no_fallback_input_dir = os.path.join(self.temp_dir, self.no_fallback_site_name, self.DEFAULT_DIRS["CRAWLED_DIR"])
+        no_fallback_output_dir = os.path.join(self.temp_dir, self.no_fallback_site_name, self.DEFAULT_DIRS["PARSED_DIR"])
tapio/factories.py (1)

51-67: Consider using the broader Embeddings type for the parameter.

The type hint HuggingFaceEmbeddings | None is more restrictive than necessary. Since ChromaStore.__init__ accepts Embeddings (the base type), using Embeddings | None here would allow injecting other embedding implementations (e.g., for testing with mocks).

♻️ Suggested type broadening
+from langchain_core.embeddings import Embeddings
+
 ...
 
-    def create_chroma_store(self, embeddings: HuggingFaceEmbeddings | None = None) -> ChromaStore:
+    def create_chroma_store(self, embeddings: Embeddings | None = None) -> ChromaStore:

@brylie brylie merged commit 78dfcee into main Feb 2, 2026
4 checks passed
@brylie brylie deleted the improve-maintainability branch February 2, 2026 10:52