
Athena: Add programming quality llm module#380

Open
konrad2002 wants to merge 31 commits into main from athena/feature/add-programming-quality-llm-module

Conversation


@konrad2002 konrad2002 commented Dec 18, 2025

Motivation and Context

Introduction of a new module_programming_quality_llm module that extends the programming feedback capabilities. While the existing module_programming_llm focuses on verifying code correctness, this new module analyzes code quality based on established criteria including readability, complexity, architectural design, and maintainability.

Description

New module: module_programming_quality_llm

This is a new module designed to provide LLM-based code quality analysis for student programming submissions. The module evaluates submissions against standardized quality criteria:

Evaluation Categories:

  1. Code Quality and Maintainability - Readability, complexity, code smells, modularity, and documentation
  2. Architectural Quality - Package structure, layering, error handling, and design patterns

Module Structure:

  • Core generators for quality analysis:
    • generate_graded_suggestions_by_file.py - Generate quality feedback with grades
    • generate_non_graded_suggestions_by_file.py - Generate quality feedback without grades
    • generate_summary_by_file.py - Summary of quality analysis per file
  • Supporting utilities for prompt management and processing

Steps for Testing

  1. Start the module_programming_quality_llm service and verify successful initialization
  2. Test the module's core functionality:
    • Generating graded quality suggestions for code submissions
    • Generating non-graded quality suggestions for code submissions
    • Creating quality analysis summaries per file
  3. Verify the module correctly evaluates submissions against the defined quality criteria:
    • Code readability and naming conventions
    • Complexity analysis (cyclomatic, method length, etc.)
    • Code smell detection (duplication, dead code, etc.)
    • Modularity and separation of concerns
    • Architectural patterns and error handling
  4. Validate prompt templates work correctly with the new model
  5. Check module logs for any initialization or inference errors

Testserver States

Note

These badges show the state of the test servers.
Green = Currently available, Red = Currently locked
Click on the badges to get to the test servers.


Screenshots

Summary by CodeRabbit

  • New Features

    • Added a new LLM-based programming quality assessment module that automatically generates feedback on code submissions.
    • Module provides both graded feedback and non-graded improvement suggestions analyzing code quality, maintainability, and architectural patterns.
  • Performance

    • Increased HTTP client timeout for async module requests from 600 to 800 seconds to improve reliability.



coderabbitai bot commented Dec 18, 2025

📝 Walkthrough

Walkthrough

This pull request introduces a new LLM-based programming quality assessment module that generates both graded and non-graded feedback suggestions. The module integrates with existing infrastructure via configuration files, includes Docker deployment setup, implements file-level analysis using LLM prompts, and adds utilities for repository diffing and content processing.

Changes

Cohort / File(s) Summary
Module Registration
athena/assessment_module_manager/modules.ini, athena/assessment_module_manager/modules.docker.ini, athena/assessment_module_manager/assessment_module_manager/module/request_to_module.py
New [module_programming_quality_llm] configuration added with HTTP endpoint, type set to programming, and feedback capability flags. HTTP client timeout increased from 600s to 800s for async requests.
Docker & Deployment
athena/docker-compose.yml, athena/docker-compose.prod.yml, athena/modules/programming/module_programming_quality_llm/Dockerfile
New service module_programming_quality_llm added with port 5007, dependency on athena and llm_core, and multi-stage Docker image build using Python 3.11 and Poetry.
Module Configuration & Initialization
athena/modules/programming/module_programming_quality_llm/module.conf, athena/modules/programming/module_programming_quality_llm/llm_config.yml, athena/modules/programming/module_programming_quality_llm/.env.example, athena/modules/programming/module_programming_quality_llm/module_programming_quality_llm/__init__.py
Module name, type, port declared. LLM models configured (Azure OpenAI GPT-4o variants). Environment variables defined for LLM credentials, database, and optional tracing. Package initialization loads .env for local development.
Core Configuration Model
athena/modules/programming/module_programming_quality_llm/module_programming_quality_llm/config.py
Comprehensive Pydantic model hierarchy defining prompt templates, approach configurations, and token limits for graded/non-graded feedback workflows. Includes Configuration class (208 lines) with nested models for split prompts, feedback generation, and file summarization.
Main Module Entry Point
athena/modules/programming/module_programming_quality_llm/module_programming_quality_llm/__main__.py
Four Athena-annotated handler functions: receive_submissions, select_submission, process_incoming_feedback, and suggest_feedback (routes to graded/non-graded generators). Preloads tiktoken encoding and delegates to specialized suggestion modules.
Graded Feedback Generation
athena/modules/programming/module_programming_quality_llm/module_programming_quality_llm/generate_graded_suggestions_by_file.py
Async function analyzing files with solution context, building prioritized file prompts, executing parallel LLM predictions (177 lines). Returns feedback with line ranges, titles, descriptions, and credits mapped to grading instructions.
Non-Graded Feedback Generation
athena/modules/programming/module_programming_quality_llm/module_programming_quality_llm/generate_non_graded_suggestions_by_file.py
Async function analyzing changed files via diffs, dynamically adjusting token limits, applying priority-based feedback filtering (352 lines). Categorizes suggestions as CRITICAL, MAJOR, MINOR, or NICE TO HAVE; caps output by category.
Solution Summary Generation
athena/modules/programming/module_programming_quality_llm/module_programming_quality_llm/generate_summary_by_file.py
Async function computing file-level summaries via diffs, concurrent LLM predictions, and aggregation into SolutionSummary (151 lines). Includes FileDescription and SolutionSummary models.
Problem & Grading Statement Splitting
athena/modules/programming/module_programming_quality_llm/module_programming_quality_llm/split_problem_statement_by_file.py, athena/modules/programming/module_programming_quality_llm/module_programming_quality_llm/split_grading_instructions_by_file.py
Async functions decomposing general instructions/statements into per-file variants via LLM with deduplication logic. Returns aggregated results grouped by file name (148 and 138 lines respectively).
Utility Functions
athena/modules/programming/module_programming_quality_llm/module_programming_quality_llm/helpers/utils.py
Repository operations: file loading, merging by filepath, formatting grading instructions, diff computation with optional remote handling, temporary remote context manager, programming language extension mapping, and line numbering (141 lines).
Prompt Templates
athena/modules/programming/module_programming_quality_llm/module_programming_quality_llm/prompts/*
Seven prompt modules defining system/human message pairs for: graded feedback generation, non-graded feedback generation, quality reviews, splitting grading instructions, splitting problem statements (with/without solutions), and file summarization. Total ~250 lines of prompt text.
LLM Core Enhancement
athena/llm_core/llm_core/core/predict_and_parse.py
Added _parse_with_null_check() for LM Studio JSON parsing, replacing dual-parser pattern with safer null-aware extraction. Preserves PydanticOutputParser for non-LM Studio providers.
Project Setup
athena/modules/programming/module_programming_quality_llm/pyproject.toml, athena/modules/programming/module_programming_quality_llm/poetry.toml, athena/modules/programming/module_programming_quality_llm/README.md
Poetry configuration with Python 3.11, dependencies on athena/llm_core (local paths), LangChain, GitPython, tiktoken, pytest tooling. In-project virtualenv enabled. README documents setup and usage.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 A module hopped into the codebase today,
With prompts and with feedback in every which way!
From graded to summaries, splitting with care,
LLMs analyzing code everywhere! 🚀✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): coverage is 40.91%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): the title directly describes the primary change, adding a new programming quality LLM module to the Athena system. It is concise, clear, and accurately summarizes the changeset.



Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Pylint (4.0.4)
athena/assessment_module_manager/assessment_module_manager/module/request_to_module.py
athena/llm_core/llm_core/core/predict_and_parse.py
athena/modules/programming/module_programming_quality_llm/module_programming_quality_llm/__init__.py
  • 15 others


@github-actions

Athena Test Results Summary

Athena Test Report: 10 ran, 10 passed ✅, 0 skipped, 0 failed

Failing Tests Summary

No test annotations available

@github-actions

📊 Detailed Coverage Table

Combining 3 coverage files...
Parsing test-results/programming_module_programming_llm_coverage.xml...
Parsing test-results/text_module_text_llm_coverage.xml...
Parsing test-results/modeling_module_modeling_llm_coverage.xml...
Combining duplicate packages...
Creating combined coverage file: test-results/combined_coverage.xml
✅ Combined coverage saved to test-results/combined_coverage.xml
📊 Combined 31 unique packages

📊 Combined Coverage Summary:

Package Line Rate Branch Rate Status
athena 37.8% 3.3%
athena.helpers 100.0% 100.0%
athena.helpers.programming 33.0% 0.0%
athena.helpers.text 0.0% 100.0%
athena.models 0.0% 0.0%
athena.schemas 76.5% 8.3%
athena.storage 21.1% 0.0%
llm_core 100.0% 100.0%
llm_core.core 26.0% 6.2%
llm_core.loaders 79.3% 37.5%
llm_core.loaders.model_loaders 68.5% 37.5%
llm_core.models 66.7% 35.7%
llm_core.models.providers 77.2% 56.2%
llm_core.utils 52.8% 18.5%
modeling.module_modeling_llm.module_modeling_llm 100.0% 100.0%
modeling.module_modeling_llm.module_modeling_llm.apollon_transformer 71.4% 50.0%
modeling.module_modeling_llm.module_modeling_llm.apollon_transformer.parser 79.2% 60.2%
modeling.module_modeling_llm.module_modeling_llm.core 88.9% 50.0%
modeling.module_modeling_llm.module_modeling_llm.models 100.0% 100.0%
modeling.module_modeling_llm.module_modeling_llm.prompts 100.0% 100.0%
modeling.module_modeling_llm.module_modeling_llm.utils 100.0% 50.0%
programming.module_programming_llm.module_programming_llm 100.0% 100.0%
programming.module_programming_llm.module_programming_llm.helpers 27.6% 0.0%
programming.module_programming_llm.module_programming_llm.prompts 100.0% 100.0%
text.module_text_llm.module_text_llm 72.7% 12.5%
text.module_text_llm.module_text_llm.default_approach 66.4% 36.1%
text.module_text_llm.module_text_llm.default_approach.prompts 100.0% 100.0%
text.module_text_llm.module_text_llm.default_approach.schemas 100.0% 100.0%
text.module_text_llm.module_text_llm.divide_and_conquer 34.0% 0.0%
text.module_text_llm.module_text_llm.helpers 55.4% 26.7%
text.module_text_llm.module_text_llm.self_consistency 46.2% 0.0%

Total packages: 31

Note: Coverage thresholds: ✅ (≥70%), ❌ (<70%)


@github-actions

github-actions bot commented Jan 1, 2026

There hasn't been any activity on this pull request recently. Therefore, this pull request has been automatically marked as stale and will be closed if no further activity occurs within seven days. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jan 1, 2026



@github-actions

📊 Detailed Coverage Table

Combining 3 coverage files...
Parsing test-results/programming_module_programming_llm_coverage.xml...
Parsing test-results/text_module_text_llm_coverage.xml...
Parsing test-results/modeling_module_modeling_llm_coverage.xml...
Combining duplicate packages...
Creating combined coverage file: test-results/combined_coverage.xml
✅ Combined coverage saved to test-results/combined_coverage.xml
📊 Combined 31 unique packages

📊 Combined Coverage Summary:

Package Line Rate Branch Rate Status
athena 37.8% 3.3%
athena.helpers 100.0% 100.0%
athena.helpers.programming 33.0% 0.0%
athena.helpers.text 0.0% 100.0%
athena.models 0.0% 0.0%
athena.schemas 76.5% 8.3%
athena.storage 21.1% 0.0%
llm_core 100.0% 100.0%
llm_core.core 24.4% 5.7%
llm_core.loaders 79.3% 37.5%
llm_core.loaders.model_loaders 68.5% 37.5%
llm_core.models 66.7% 35.7%
llm_core.models.providers 77.2% 56.2%
llm_core.utils 52.8% 18.5%
modeling.module_modeling_llm.module_modeling_llm 100.0% 100.0%
modeling.module_modeling_llm.module_modeling_llm.apollon_transformer 71.4% 50.0%
modeling.module_modeling_llm.module_modeling_llm.apollon_transformer.parser 79.2% 60.2%
modeling.module_modeling_llm.module_modeling_llm.core 88.9% 50.0%
modeling.module_modeling_llm.module_modeling_llm.models 100.0% 100.0%
modeling.module_modeling_llm.module_modeling_llm.prompts 100.0% 100.0%
modeling.module_modeling_llm.module_modeling_llm.utils 100.0% 50.0%
programming.module_programming_llm.module_programming_llm 100.0% 100.0%
programming.module_programming_llm.module_programming_llm.helpers 27.6% 0.0%
programming.module_programming_llm.module_programming_llm.prompts 100.0% 100.0%
text.module_text_llm.module_text_llm 72.7% 12.5%
text.module_text_llm.module_text_llm.default_approach 66.4% 36.1%
text.module_text_llm.module_text_llm.default_approach.prompts 100.0% 100.0%
text.module_text_llm.module_text_llm.default_approach.schemas 100.0% 100.0%
text.module_text_llm.module_text_llm.divide_and_conquer 34.0% 0.0%
text.module_text_llm.module_text_llm.helpers 55.4% 26.7%
text.module_text_llm.module_text_llm.self_consistency 46.2% 0.0%

Total packages: 31

Note: Coverage thresholds: ✅ (≥70%), ❌ (<70%)



@az108 az108 added the deploy:athena-test1 Athena Test Server 1 label Jan 22, 2026
@az108 az108 temporarily deployed to Athena - Test 1 January 22, 2026 18:43 — with GitHub Actions Inactive
@github-actions github-actions bot added lock:athena-test1 Is currently deployed to Athena Test Server 1 and removed deploy:athena-test1 Athena Test Server 1 labels Jan 22, 2026
@az108 az108 added deploy:athena-test1 Athena Test Server 1 and removed lock:athena-test1 Is currently deployed to Athena Test Server 1 labels Jan 22, 2026
@az108 az108 temporarily deployed to Athena - Test 1 January 22, 2026 21:42 — with GitHub Actions Inactive
@github-actions github-actions bot added lock:athena-test1 Is currently deployed to Athena Test Server 1 and removed deploy:athena-test1 Athena Test Server 1 labels Jan 22, 2026

@github-actions

📊 Detailed Coverage Table

Combining 3 coverage files...
Parsing test-results/programming_module_programming_llm_coverage.xml...
Parsing test-results/text_module_text_llm_coverage.xml...
Parsing test-results/modeling_module_modeling_llm_coverage.xml...
Combining duplicate packages...
Creating combined coverage file: test-results/combined_coverage.xml
✅ Combined coverage saved to test-results/combined_coverage.xml
📊 Combined 31 unique packages

📊 Combined Coverage Summary:

Package Line Rate Branch Rate Status
athena 37.8% 3.3%
athena.helpers 100.0% 100.0%
athena.helpers.programming 33.0% 0.0%
athena.helpers.text 0.0% 100.0%
athena.models 0.0% 0.0%
athena.schemas 76.5% 8.3%
athena.storage 21.1% 0.0%
llm_core 100.0% 100.0%
llm_core.core 23.6% 5.7%
llm_core.loaders 79.3% 37.5%
llm_core.loaders.model_loaders 68.5% 37.5%
llm_core.models 66.7% 35.7%
llm_core.models.providers 77.2% 56.2%
llm_core.utils 52.8% 18.5%
modeling.module_modeling_llm.module_modeling_llm 100.0% 100.0%
modeling.module_modeling_llm.module_modeling_llm.apollon_transformer 71.4% 50.0%
modeling.module_modeling_llm.module_modeling_llm.apollon_transformer.parser 79.2% 60.2%
modeling.module_modeling_llm.module_modeling_llm.core 88.9% 50.0%
modeling.module_modeling_llm.module_modeling_llm.models 100.0% 100.0%
modeling.module_modeling_llm.module_modeling_llm.prompts 100.0% 100.0%
modeling.module_modeling_llm.module_modeling_llm.utils 100.0% 50.0%
programming.module_programming_llm.module_programming_llm 100.0% 100.0%
programming.module_programming_llm.module_programming_llm.helpers 27.6% 0.0%
programming.module_programming_llm.module_programming_llm.prompts 100.0% 100.0%
text.module_text_llm.module_text_llm 72.7% 12.5%
text.module_text_llm.module_text_llm.default_approach 66.4% 36.1%
text.module_text_llm.module_text_llm.default_approach.prompts 100.0% 100.0%
text.module_text_llm.module_text_llm.default_approach.schemas 100.0% 100.0%
text.module_text_llm.module_text_llm.divide_and_conquer 34.0% 0.0%
text.module_text_llm.module_text_llm.helpers 55.4% 26.7%
text.module_text_llm.module_text_llm.self_consistency 46.2% 0.0%

Total packages: 31

Note: Coverage thresholds: ✅ (≥70%), ❌ (<70%)

@konrad2002 konrad2002 marked this pull request as ready for review January 28, 2026 16:06
@konrad2002 konrad2002 requested a review from a team as a code owner January 28, 2026 16:06

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 13

🤖 Fix all issues with AI agents
In `@athena/modules/programming/module_programming_quality_llm/.env.example`:
- Around line 4-7: The .env.example currently contains a concrete SECRET value
which should be replaced with a placeholder to avoid accidental insecure copies;
update the SECRET entry in .env.example to a placeholder like
SECRET=<your-secret-here> (or SECRET=generate_a_strong_random_value) and add a
short comment or README note instructing users to generate and set a strong,
unique secret (e.g., using a secure random generator) rather than copying the
example value.
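
The fix above asks users to generate their own strong secret instead of copying the example. One way to produce such a value with the standard library (a suggestion only; any cryptographically secure generator works):

```python
import secrets

# Generate a strong random value suitable for the SECRET env var.
# token_hex(32) yields 64 hexadecimal characters (256 bits of entropy).
strong_secret = secrets.token_hex(32)
```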

In
`@athena/modules/programming/module_programming_quality_llm/module_programming_quality_llm/__init__.py`:
- Around line 1-4: The module currently calls dotenv.load_dotenv(override=True)
at import time in the package __init__.py, which can silently overwrite existing
process env vars; remove or change that behavior by either (A) removing the
import-time call from __init__.py and moving dotenv.load_dotenv() to the
application entrypoint, or (B) gate the call behind an explicit flag and stop
forcing overrides (e.g., call dotenv.load_dotenv(override=False) only when a
runtime config or env var like LOAD_DOTENV_OVERRIDE is set). Locate the call to
dotenv.load_dotenv in the module_programming_quality_llm __init__.py (and apply
the same change to the other modules mentioned) and implement one of these two
approaches so imports no longer override environment variables by default.
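
A minimal sketch of option (B) from this note, gating the load behind an explicit flag and never overriding existing process variables (the flag name comes from the review comment; the helper name is an assumption):

```python
import os

def maybe_load_dotenv() -> bool:
    """Load .env only when LOAD_DOTENV_OVERRIDE is set, and never override
    variables already present in the process environment."""
    if not os.getenv("LOAD_DOTENV_OVERRIDE"):
        return False
    import dotenv  # imported lazily so the flag also gates the dependency
    dotenv.load_dotenv(override=False)
    return True
```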

In
`@athena/modules/programming/module_programming_quality_llm/module_programming_quality_llm/__main__.py`:
- Around line 49-52: The second return statement (calling
generate_non_graded_suggestions_by_file) is over-indented causing Flake8 E127;
fix the indentation so that the return aligns with the preceding return (both
return statements start at the same indentation level), keeping the same
arguments (exercise, submission, module_config.non_graded_approach,
module_config.debug) and preserving the existing conditional flow around
generate_graded_suggestions_by_file and generate_non_graded_suggestions_by_file.

In
`@athena/modules/programming/module_programming_quality_llm/module_programming_quality_llm/generate_graded_suggestions_by_file.py`:
- Around line 160-175: The Feedback objects are being constructed with
structured_grading_instruction_id=None while the source FeedbackModel exposes
grading_instruction_id; update the mapping so that when iterating
result.feedbacks you set
structured_grading_instruction_id=feedback.grading_instruction_id (or None if
absent). Locate the loop that builds Feedback instances (iterating
result.feedbacks) and replace the hardcoded None for
structured_grading_instruction_id with feedback.grading_instruction_id to
preserve the original grading_instruction_id value.
- Around line 62-70: The add_line_numbers utility currently uses
enumerate(lines) producing 0-based line numbers which leads to off-by-one errors
in reported line_start/line_end; update add_line_numbers (in helpers/utils.py)
to use enumerate(lines, start=1) so it emits 1-based line numbers, and verify
any consumers of add_line_numbers (e.g., generate_graded_suggestions_by_file.py
usage) expect 1-based indices or adjust their mappings accordingly.
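
The 1-based numbering fix described above can be sketched as follows (preserving the right-aligned formatting the review comment mentions; the exact helper signature in utils.py may differ):

```python
def add_line_numbers(content: str) -> str:
    """Prefix each line with a right-aligned, 1-based line number,
    as suggested in the review (enumerate(..., start=1))."""
    lines = content.splitlines()
    width = len(str(len(lines)))  # pad so numbers stay aligned
    return "\n".join(
        f"{str(n).rjust(width)} {line}" for n, line in enumerate(lines, start=1)
    )
```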

In
`@athena/modules/programming/module_programming_quality_llm/module_programming_quality_llm/generate_non_graded_suggestions_by_file.py`:
- Around line 210-228: The filtering loop can add duplicates because
filtered_prompt_inputs is built from prompt_inputs and then items are appended
again from prompt_inputs; to fix, after building filtered_prompt_inputs (which
filters by programming_language_extension) remove those same entries from
prompt_inputs (e.g., by file_path or a unique key) before the while loop so the
subsequent prompt_inputs.pop(0) cannot re-add already-included files, ensuring
prompt_inputs, filtered_prompt_inputs and the final prompt_inputs contain unique
file entries up to config.max_number_of_files.
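
The deduplication fix described above can be sketched like this (field names and the selection helper are assumptions based on the review comment, not the module's actual code):

```python
def select_files(prompt_inputs, preferred_ext, max_files):
    """Pick files matching the preferred extension first, then top up with
    the remaining files, without re-adding already-selected entries."""
    selected = [p for p in prompt_inputs if p["file_path"].endswith(preferred_ext)]
    chosen = {p["file_path"] for p in selected}
    # Remove already-selected entries so the top-up loop cannot duplicate them.
    remaining = [p for p in prompt_inputs if p["file_path"] not in chosen]
    while len(selected) < max_files and remaining:
        selected.append(remaining.pop(0))
    return selected[:max_files]
```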

In
`@athena/modules/programming/module_programming_quality_llm/module_programming_quality_llm/generate_summary_by_file.py`:
- Around line 145-150: The code uses the model-returned file_summary.file_name
when populating items_dict, which can be incorrect; instead, use the original
prompt input's file path for the map key. Update the loop that iterates over
results (variable results and local file_summary) to reference the corresponding
prompt input (the list of file inputs passed to the LLM, e.g., the same-indexed
item containing file_path) rather than file_summary.file_name, and add a safety
check if results and prompt inputs differ in length before assigning
items_dict[file_path] = file_summary.description.
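
A sketch of that mapping fix, keying by the original input's file path and checking alignment first (field names are assumptions taken from the review comment):

```python
def build_summary_map(prompt_inputs, results):
    """Key file summaries by the original prompt input's file_path rather
    than the model-returned file name, with a length safety check."""
    if len(prompt_inputs) != len(results):
        raise ValueError("LLM results do not align with prompt inputs")
    return {
        inp["file_path"]: summary.description
        for inp, summary in zip(prompt_inputs, results)
        if summary is not None  # skip failed predictions
    }
```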

In
`@athena/modules/programming/module_programming_quality_llm/module_programming_quality_llm/helpers/utils.py`:
- Around line 62-69: The add_line_numbers function produces 0-based line numbers
because it calls enumerate(lines) with the default start; change it to
enumerate(lines, start=1) so line numbering begins at 1, keep the existing
line_number_max_length calculation and the f-string formatting (still using
str(line_number).rjust(line_number_max_length) and the same join) to preserve
alignment and output formatting.
- Around lines 88-104: temporary_remote currently reuses an existing remote blindly. Verify the existing remote's URL before yielding: call repo.remote(remote_name), inspect its URLs (remote.urls or remote.url), and compare them to remote_url. If they match, yield that remote as now. If they differ, create a uniquely named temporary remote (e.g. remote_name + "_tmp" or a timestamped name) via repo.create_remote(remote_name, remote_url), call remote.fetch(), yield the temporary remote, and make sure to delete only that temporary remote with repo.delete_remote(temp_remote) after use.
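
The decision logic can be isolated from the GitPython calls (repo.remote, repo.create_remote, remote.fetch, repo.delete_remote stay in the context manager). resolve_remote_action is a hypothetical helper name; the "_tmp" suffix follows the comment's suggestion:

```python
def resolve_remote_action(existing_urls, remote_name, remote_url):
    """Decide how temporary_remote should treat an existing remote.

    Returns ("reuse", name) when the existing remote already points at
    remote_url, otherwise ("create_temp", unique_name) so the caller can
    create a throwaway remote and delete only that one afterwards.
    """
    if existing_urls is None:
        # No remote with this name yet: create it under the requested name.
        return ("create_temp", remote_name)
    if remote_url in existing_urls:
        return ("reuse", remote_name)
    # Name collision with a different URL: avoid clobbering the existing remote.
    return ("create_temp", remote_name + "_tmp")
```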

In
`@athena/modules/programming/module_programming_quality_llm/module_programming_quality_llm/prompts/generate_graded_suggestions_by_file.py`:
- Around lines 23-28: Update the "Output" instructions in the prompt to require a strict JSON schema instead of freeform text. Replace the ambiguous "Return an array of feedback items... or a positive note" guidance with a precise schema (e.g., an array of objects with keys title, description, and optional line_start and line_end), state that the output MUST be valid JSON, and specify that an empty result is an empty JSON array. With only the structured schema and no conflicting freeform alternative in the Output section, the parser receives predictable input.
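
One possible wording for the replacement Output section, paired with the corresponding parse step; the exact prompt text and key names here are illustrative:

```python
import json

# Hypothetical replacement for the prompt's "Output" section: one schema,
# no freeform alternative, so the parser never sees prose.
OUTPUT_INSTRUCTIONS = """\
Output MUST be valid JSON: an array of feedback objects, each with keys
"title" (string), "description" (string) and optional integer keys
"line_start" and "line_end". If there is nothing to report, output an
empty JSON array: []
"""

def parse_feedback(raw: str):
    items = json.loads(raw)
    if not isinstance(items, list):
        raise ValueError("Top level must be a JSON array")
    return items
```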

In
`@athena/modules/programming/module_programming_quality_llm/module_programming_quality_llm/split_problem_statement_by_file.py`:
- Line 65: The local variable model is assigned from config.model.get_model() but never used. Remove the dead assignment: either delete the line or, if the call is needed for side effects, invoke config.model.get_model() without assigning the result (or assign to _ to signal it is intentionally unused).
- Around lines 44-52: The docstring for split_problem_statement_by_file references GradedBasicApproachConfig, but the parameter is annotated as BasicApproachConfig. Make them consistent: update the docstring's type for config to BasicApproachConfig, or, if GradedBasicApproachConfig is the intended type, change the parameter annotation and imports accordingly so the documented type matches the actual signature.

In `@athena/modules/programming/module_programming_quality_llm/README.md`:
- Around lines 7-17: The fenced code blocks in README.md lack language identifiers (triggering MD040). Add a language tag to both fences around the shell commands (the `cp .env.example .env` block and the `poetry install` block), e.g. ```bash, so both blocks get proper syntax highlighting and pass linting.
🧹 Nitpick comments (10)
athena/llm_core/llm_core/core/predict_and_parse.py (1)

110-120: Remove unnecessary Pydantic v1 compatibility layer; focus on silent failure handling in LM Studio fallback.

model_validate is correctly used for Pydantic v2. The project explicitly requires Pydantic 2.11.7 in iris/pyproject.toml, so the suggested getattr/parse_obj fallback for v1 compatibility is unnecessary.

However, the silent None return on parse/validation errors (lines 119–120) does mask malformed output. Consider logging errors or raising exceptions instead of silently returning None, as this fallback path is critical for LM Studio handling and debugging parse failures is important.
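
A sketch of the suggested change, with a generic `validate` callable standing in for Pydantic's model_validate (names here are assumptions, not the module's actual API):

```python
import logging

logger = logging.getLogger(__name__)

def parse_llm_output(raw, validate):
    """Validate raw LLM output, logging instead of failing silently."""
    try:
        return validate(raw)
    except Exception:
        # Keep the None fallback for the LM Studio path, but leave a trace
        # so malformed output is debuggable instead of silently dropped.
        logger.exception("Failed to validate LM Studio output: %.200r", raw)
        return None
```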

athena/modules/programming/module_programming_quality_llm/module_programming_quality_llm/prompts/generate_non_graded_suggestions_by_file.py (1)

8-12: Clarify the “no solution suggestions” constraint to avoid conflicting guidance.

Line 11 bans “solution suggestions,” but the task is to provide improvement guidance. Consider disallowing full corrected code/complete solutions instead, while still allowing actionable advice.

✏️ Proposed wording tweak
```diff
-Create non graded improvement suggestions for a student\'s programming submission that a human tutor would recommend. \
+Create non-graded improvement suggestions for a student\'s programming submission that a human tutor would recommend. \
 ...
-Important: the answer you generate must not contain any solution suggestions or contain corrected errors.
+Important: do not provide full corrected code or complete solutions; focus on principles and guidance the student can apply.
```
athena/modules/programming/module_programming_quality_llm/module_programming_quality_llm/generate_graded_suggestions_by_file.py (1)

151-156: Consider adding strict=True to zip() calls for safety.

While prompt_inputs and results originate from the same iteration and should always have equal length, adding strict=True provides a defensive check that would surface bugs immediately if assumptions change.

Proposed fix
```diff
-                for prompt_input, result in zip(prompt_inputs, results)
+                for prompt_input, result in zip(prompt_inputs, results, strict=True)
-    for prompt_input, result in zip(prompt_inputs, results):
+    for prompt_input, result in zip(prompt_inputs, results, strict=True):
```
athena/modules/programming/module_programming_quality_llm/module_programming_quality_llm/split_grading_instructions_by_file.py (2)

31-49: Docstring is missing the debug parameter.

The function signature includes debug: bool but the docstring Args section doesn't document it.

Proposed fix
```diff
     Args:
         exercise (Exercise): Exercise to split the grading instructions for (respecting the changed files)
         submission (Submission): Submission to split the grading instructions for (respecting the changed files)
         prompt (ChatPromptTemplate): Prompt template to check for grading_instructions
         config (GradedBasicApproachConfig): Configuration
+        debug (bool): Whether to emit debug metadata

     Returns:
         Optional[SplitGradingInstructions]: Split grading instructions, None if it is too short or too long
```

127-136: Mutating items on a Pydantic model with Sequence type may cause validation issues.

Directly assigning a list to split_grading_instructions.items (which is typed as Sequence[FileGradingInstruction]) works but bypasses Pydantic validation. Consider creating a new SplitGradingInstructions instance instead.

Proposed fix
```diff
-    split_grading_instructions.items = [
+    deduplicated_items = [
         FileGradingInstruction(
             file_name=file_name,
             grading_instructions="\n".join(
                 file_grading_instruction.grading_instructions
                 for file_grading_instruction in file_grading_instructions
             ),
         )
         for file_name, file_grading_instructions in file_grading_instructions_by_file_name.items()
     ]

-    return split_grading_instructions
+    return SplitGradingInstructions(items=deduplicated_items)
```
athena/modules/programming/module_programming_quality_llm/module_programming_quality_llm/split_problem_statement_by_file.py (1)

137-146: Same concern: mutating Pydantic model's Sequence field directly.

Similar to split_grading_instructions_by_file.py, consider returning a new model instance.

Proposed fix
```diff
-    split_problem_statement.items = [
+    deduplicated_items = [
         FileProblemStatement(
             file_name=file_name,
             problem_statement="\n".join(
                 file_problem_statement.problem_statement
                 for file_problem_statement in file_problem_statements
             ),
         )
         for file_name, file_problem_statements in file_problem_statements_by_file_name.items()
     ]

-    return split_problem_statement
+    return SplitProblemStatement(items=deduplicated_items)
```
athena/modules/programming/module_programming_quality_llm/module_programming_quality_llm/generate_non_graded_suggestions_by_file.py (3)

91-102: Translate German comments to English for consistency, and verify the token limit logic.

The comments are in German. Additionally, the logic seems counterintuitive: smaller submissions (< 40000 chars) get a lower token limit (5000) while larger submissions get the full config limit. Typically, you'd want more tokens per file when there are fewer files.

Proposed translation and consideration
```diff
-    # Dynamische Token-Berechnung
+    # Dynamic token calculation
     total_content_size = sum(len(content) for content in changed_files.values())

-    if total_content_size < 40000:  # Kleine Submission
+    if total_content_size < 40000:  # Small submission
         effective_max_tokens = 5000
-    elif total_content_size < 100000:  # Mittlere Submission
+    elif total_content_size < 100000:  # Medium submission
         effective_max_tokens = 3000
-    else:  # Große Submission
+    else:  # Large submission
         effective_max_tokens = config.max_input_tokens

-    # Config-Objekt temporär überschreiben (Kopie erstellen um Original nicht zu verändern)
+    # Temporarily override config object (create copy to avoid modifying original)
     config = config.model_copy(update={"max_input_tokens": effective_max_tokens})
```

Please verify this token scaling is intentional—the current logic gives large submissions more tokens per file than small ones.


288-352: Priority filtering relies on fragile string prefix matching.

The _filter_feedbacks_by_priority function categorizes feedbacks by checking if description.startswith("CRITICAL"), etc. This creates tight coupling with LLM output format. If the LLM doesn't prefix descriptions exactly as expected, feedbacks fall into other and aren't prioritized correctly.

Consider documenting this contract clearly or using a more robust categorization mechanism (e.g., a dedicated priority or severity field in the model).
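
The more robust alternative can be sketched with an explicit severity field on the feedback items; Severity and the dict shape here are illustrative, not the module's actual model:

```python
from enum import Enum

class Severity(str, Enum):
    CRITICAL = "critical"
    MAJOR = "major"
    MINOR = "minor"

def filter_feedbacks_by_priority(feedbacks, limit):
    """Keep the `limit` highest-severity feedbacks, structured field first."""
    order = {Severity.CRITICAL: 0, Severity.MAJOR: 1, Severity.MINOR: 2}
    # Stable sort keeps the original order within each severity bucket,
    # and no string-prefix parsing of the description is needed.
    ranked = sorted(feedbacks, key=lambda f: order[f["severity"]])
    return ranked[:limit]
```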


258-263: Consider adding strict=True to zip() calls.

Same as in the graded suggestions file—adding strict=True provides defensive checking.

athena/modules/programming/module_programming_quality_llm/module_programming_quality_llm/helpers/utils.py (1)

107-141: Hardcoded branch="main" will fail when repositories use different default branches.

The parameter is used consistently across all callers without override, and external repositories (template, submission, solution) may use master or custom branch names. Consider detecting the active branch dynamically or making it configurable at the call site.
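
A small sketch of the dynamic fallback, as a pure helper over branch names (pick_default_branch is a hypothetical name; callers would pass the branch names discovered from the remote, e.g. via GitPython's remote refs):

```python
def pick_default_branch(available_branches, preferred=("main", "master")):
    """Pick a branch name instead of hardcoding "main".

    Tries the preferred names in order, then falls back to the first
    branch the repository actually advertises.
    """
    for name in preferred:
        if name in available_branches:
            return name
    if available_branches:
        return available_branches[0]
    raise ValueError("Repository has no branches to resolve a default from")
```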

@az108 az108 added deploy:athena-test1 Athena Test Server 1 and removed lock:athena-test1 Is currently deployed to Athena Test Server 1 labels Feb 2, 2026
@github-actions github-actions bot added lock:athena-test1 Is currently deployed to Athena Test Server 1 and removed deploy:athena-test1 Athena Test Server 1 labels Feb 2, 2026
@github-actions
github-actions bot commented Feb 5, 2026

Athena Test Results Summary

| Tests | Passed ✅ | Skipped | Failed |
| --- | --- | --- | --- |
| Athena Test Report 10 ran | 10 passed | 0 skipped | 0 failed |

Failing Tests Summary

| Test | Result |
| --- | --- |
| No test annotations available | |

@github-actions
github-actions bot commented Feb 5, 2026

📊 Detailed Coverage Table

Combining 3 coverage files...
Parsing test-results/programming_module_programming_llm_coverage.xml...
Parsing test-results/text_module_text_llm_coverage.xml...
Parsing test-results/modeling_module_modeling_llm_coverage.xml...
Combining duplicate packages...
Creating combined coverage file: test-results/combined_coverage.xml
✅ Combined coverage saved to test-results/combined_coverage.xml
📊 Combined 31 unique packages

📊 Combined Coverage Summary:

| Package | Line Rate | Branch Rate | Status |
| --- | --- | --- | --- |
| athena | 37.8% | 3.3% | |
| athena.helpers | 100.0% | 100.0% | |
| athena.helpers.programming | 33.0% | 0.0% | |
| athena.helpers.text | 0.0% | 100.0% | |
| athena.models | 0.0% | 0.0% | |
| athena.schemas | 76.5% | 8.3% | |
| athena.storage | 21.1% | 0.0% | |
| llm_core | 100.0% | 100.0% | |
| llm_core.core | 23.6% | 5.7% | |
| llm_core.loaders | 79.3% | 37.5% | |
| llm_core.loaders.model_loaders | 68.5% | 37.5% | |
| llm_core.models | 66.7% | 35.7% | |
| llm_core.models.providers | 77.2% | 56.2% | |
| llm_core.utils | 52.8% | 18.5% | |
| modeling.module_modeling_llm.module_modeling_llm | 100.0% | 100.0% | |
| modeling.module_modeling_llm.module_modeling_llm.apollon_transformer | 71.4% | 50.0% | |
| modeling.module_modeling_llm.module_modeling_llm.apollon_transformer.parser | 79.2% | 60.2% | |
| modeling.module_modeling_llm.module_modeling_llm.core | 88.9% | 50.0% | |
| modeling.module_modeling_llm.module_modeling_llm.models | 100.0% | 100.0% | |
| modeling.module_modeling_llm.module_modeling_llm.prompts | 100.0% | 100.0% | |
| modeling.module_modeling_llm.module_modeling_llm.utils | 100.0% | 50.0% | |
| programming.module_programming_llm.module_programming_llm | 100.0% | 100.0% | |
| programming.module_programming_llm.module_programming_llm.helpers | 27.6% | 0.0% | |
| programming.module_programming_llm.module_programming_llm.prompts | 100.0% | 100.0% | |
| text.module_text_llm.module_text_llm | 72.7% | 12.5% | |
| text.module_text_llm.module_text_llm.default_approach | 66.4% | 36.1% | |
| text.module_text_llm.module_text_llm.default_approach.prompts | 100.0% | 100.0% | |
| text.module_text_llm.module_text_llm.default_approach.schemas | 100.0% | 100.0% | |
| text.module_text_llm.module_text_llm.divide_and_conquer | 34.0% | 0.0% | |
| text.module_text_llm.module_text_llm.helpers | 55.4% | 26.7% | |
| text.module_text_llm.module_text_llm.self_consistency | 46.2% | 0.0% | |

Total packages: 31

Note: Coverage thresholds: ✅ (≥70%), ❌ (<70%)

Labels

athena lock:athena-test1 Is currently deployed to Athena Test Server 1 ready for review
