Skip to content

fix tentative unit test execution#3398

Open
robfrank wants to merge 20 commits intomainfrom
test-and-coverage-review
Open

fix tentative unit test execution#3398
robfrank wants to merge 20 commits intomainfrom
test-and-coverage-review

Conversation

@robfrank
Copy link
Collaborator

No description provided.

@gemini-code-assist
Copy link
Contributor

Note

Gemini is unable to generate a summary for this pull request due to the file types involved not being currently supported.

@claude
Copy link

claude bot commented Feb 10, 2026

Pull Request Review

Summary

This PR makes two changes to the GitHub Actions workflow:

  1. Changes the unit test command from mvn verify to mvn test
  2. Reorders test reporter parameters (list-tests before list-suites)

Analysis

Change 1: verifytest (line 96)

✅ This change is technically correct for a job named unit-tests:

  • mvn verify runs unit tests + integration tests + coverage
  • mvn test runs ONLY unit tests + coverage
  • Since integration tests run in a separate integration-tests job, using test here avoids redundant execution

⚠️ Issue: The comment on line 95 is now outdated and should be updated:

- name: Run Unit Tests with Coverage
  # test (not verify) to run only unit tests, excluding integration tests which run in separate job
  run: ./mvnw test -Pcoverage --batch-mode --errors --fail-never --show-version -pl !e2e,!e2e-perf -DexcludedGroups=slow -Dtest=!OpenCypherTCKTest

Change 2: Parameter reordering

The reordering of list-tests and list-suites is cosmetic and doesn't affect functionality.

Recommendations

  1. Update the outdated comment (line 95) to reflect the new command
  2. Add a PR description explaining what issue this fixes - it would help reviewers understand the motivation

Questions

  • Does this fix a specific CI failure or coverage reporting issue?
  • Have you verified that coverage reports are still generated correctly with test instead of verify?

Verdict

The code change is correct, but the comment needs updating. Otherwise, this looks like a good cleanup that makes the unit test job more focused on its actual purpose.


Review generated per CLAUDE.md guidelines

@claude
Copy link

claude bot commented Feb 10, 2026

PR Review: Fix tentative unit test execution

Summary

This PR addresses CI test execution issues by properly segregating benchmark tests from regular unit tests, preventing them from running unnecessarily in the main test suite. The changes introduce a new dedicated job for benchmark tests and apply proper JUnit 5 tagging.


✅ Positive Aspects

  1. Proper Test Segregation: The addition of @Tag("benchmark") to all benchmark test classes follows JUnit 5 best practices and aligns with the existing pattern for slow tests.

  2. Improved CI Efficiency: Separating benchmark tests into their own job (benchmark-unit-tests) prevents long-running benchmarks from slowing down regular unit test execution.

  3. Consistent Architecture: The new benchmark-unit-tests job mirrors the structure of slow-unit-tests, maintaining consistency in the workflow design.

  4. Coverage Completeness: Benchmark test coverage is properly collected and integrated into the overall coverage reporting pipeline.


🔍 Code Quality Observations

GitHub Workflow Changes (.github/workflows/mvn-test.yml)

Line 96: Changed from verify to test goal

-run: ./mvnw verify -Pcoverage ... -DexcludedGroups=slow -Dtest=!OpenCypherTCKTest
+run: ./mvnw test -Pcoverage ... -DexcludedGroups=slow,benchmark -Dtest=!OpenCypherTCKTest

Concern: Changing from verify to test phase may skip important checks. The verify phase runs additional validation beyond just tests. Was this intentional?

Recommendation: Verify that no critical validations are being skipped. If the change is purely to avoid running integration tests (which should already be skipped via other mechanisms), consider documenting this reasoning in the commit message or PR description.

Test Tagging

All 6 benchmark test classes now have @Tag("benchmark"):

  • ✅ Consistent application across all benchmarks
  • ✅ Follows existing pattern (@Tag("slow") precedent)
  • ✅ Enables flexible test execution via Maven/Gradle

⚠️ Potential Issues

  1. Job Scope Limitation (Line 182)

    run: ./mvnw verify -Pcoverage ... -pl engine -Dgroups=benchmark

    The benchmark-unit-tests job only runs on the engine module (-pl engine), but benchmark tests exist in both:

    • engine/src/test/java/performance/ (3 files)
    • gremlin/src/test/java/performance/ (2 files)

    Issue: Benchmarks in the gremlin module won't be executed or covered.

    Recommendation: Either:

    • Change to -pl engine,gremlin to include both modules
    • Or document why gremlin benchmarks are excluded
    • Or remove the -pl restriction entirely to catch any future benchmark tests in other modules
  2. Minor: Order Swap in test-reporter (Lines 106-107, 149-150)

    -list-suites: "failed"
     list-tests: "failed"
    +list-tests: "failed"
    +list-suites: "failed"

    This appears to be a cosmetic change with no functional impact. While it improves consistency, it's worth noting that this swap occurs in multiple places but not in the new benchmark-unit-tests job (lines 192-193), where the original order is maintained.

  3. Whitespace Change (Line 544)

    -files:  {{ coverage-files-output }}
    +files: {{ coverage-files-output }}

    Trivial formatting fix (extra space removal). Good catch.


🎯 Recommendations

  1. Critical: Update line 182 to include the gremlin module:

    run: ./mvnw verify -Pcoverage --batch-mode --errors --fail-never --show-version -pl engine,gremlin -Dgroups=benchmark
  2. Verify: Confirm that changing from verify to test on line 96 doesn't skip important validation steps.

  3. Consider: Adding a comment in the workflow explaining why benchmarks are separated:

    # Benchmark tests are isolated due to their long execution time
    # and to avoid impacting regular unit test performance metrics
    benchmark-unit-tests:
  4. Nice-to-have: Apply the list-tests/list-suites ordering consistently across all test reporter uses (including the new benchmark job).


📊 Test Coverage

  • ✅ All identified benchmark classes are properly tagged
  • ✅ Coverage collection configured for benchmark tests
  • ✅ Integration with codecov/codacy maintained
  • ⚠️ Missing gremlin module benchmarks in execution

🔒 Security & Performance

  • ✅ No security concerns identified
  • ✅ Performance improvement expected (faster CI runs)
  • ✅ No new dependencies introduced
  • ✅ Follows principle of test isolation

✨ Overall Assessment

This is a solid improvement to the CI pipeline that properly addresses test segregation. The main issue is the incomplete scope of the benchmark job (missing gremlin module). Once that's addressed, this PR will significantly improve CI efficiency.

Recommendation: Approve with minor changes requested (add gremlin module to benchmark job scope).


Review conducted using CLAUDE.md guidelines and ArcadeDB project conventions.

@codacy-production
Copy link

codacy-production bot commented Feb 10, 2026

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation Diff coverage
-10.81%
Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (5d1b7fb) 82996 57777 69.61%
Head commit (45ef40e) 112579 (+29583) 66201 (+8424) 58.80% (-10.81%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#3398) 0 0 ∅ (not applicable)

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings    Change summary preferences

@claude
Copy link

claude bot commented Feb 10, 2026

Comprehensive Pull Request Review

Overview

This PR addresses CI test execution by properly segregating benchmark tests from regular unit tests through JUnit 5 tagging and a dedicated CI job. The changes improve CI efficiency by preventing long-running benchmarks from slowing down standard test execution.


✅ Strengths

  1. Proper Test Classification: Adding @Tag("benchmark") to all benchmark test classes follows JUnit 5 best practices and maintains consistency with the existing @Tag("slow") pattern.

  2. Clean Separation of Concerns: The new benchmark-unit-tests job mirrors the structure of slow-unit-tests, maintaining architectural consistency.

  3. Coverage Integration: Benchmark test coverage is properly collected and integrated into the codecov/codacy pipeline.

  4. Minor Cleanup: Whitespace fix on line 544 (files: double-space → single-space).


⚠️ Issues Identified

1. CRITICAL: Incomplete Module Coverage in Benchmark Job

Location: .github/workflows/mvn-test.yml:182

run: ./mvnw verify -Pcoverage ... -pl engine -Dgroups=benchmark

Problem: The benchmark job only runs on the engine module, but benchmarks exist in:

  • engine/src/test/java/performance/ (3 tests)
  • gremlin/src/test/java/performance/ (2 tests) — NOT EXECUTED

Impact: Two benchmark tests (CypherEngineComparisonBenchmark, QueryLanguageWithGremlinAndCypherBenchmark) are tagged but never run, and their coverage is not collected.

Fix:

run: ./mvnw verify -Pcoverage --batch-mode --errors --fail-never --show-version -pl engine,gremlin -Dgroups=benchmark

Or remove -pl entirely to catch future benchmarks in any module.


2. POTENTIAL ISSUE: OpenCypherTCKTest Tagging Change

Location: engine/src/test/java/com/arcadedb/query/opencypher/tck/OpenCypherTCKTest.java:36

Change: @Tag("slow")@Tag("benchmark")

Analysis:

  • OpenCypherTCKTest is a comprehensive TCK (Technology Compatibility Kit) test suite, not a benchmark
  • TCK tests verify compliance with the OpenCypher specification
  • The previous @Tag("slow") classification seems more semantically correct

Questions:

  1. Is this test actually a benchmark measuring performance, or is it a functional compliance test that happens to be slow?
  2. If it's truly a compliance test, should it remain tagged as slow and run in the slow-unit-tests job instead?

The distinction matters because:

  • Benchmarks measure performance characteristics
  • Slow tests verify correctness but take a long time

Recommendation: Clarify the intent. If this is a compliance test (TCK), consider reverting to @Tag("slow").


3. QUESTIONABLE: SelectClasspathResource → SelectPackages

Location: OpenCypherTCKTest.java:33

-@SelectClasspathResource("opencypher/tck/features")
+@SelectPackages("opencypher.tck.features")

Problem: These are different annotations with different semantics:

  • @SelectClasspathResource: Selects resources from the classpath (e.g., .feature files in src/test/resources)
  • @SelectPackages: Selects test classes from Java packages

Verification needed:

# The features are in resources, not packages:
engine/src/test/resources/opencypher/tck/features/

Impact: This change may break the test discovery mechanism. Cucumber tests typically use @SelectClasspathResource to find .feature files.

Recommendation:

  1. Verify this test still works correctly after the change
  2. If it doesn't work, revert to @SelectClasspathResource
  3. Document why the change was necessary if it does work

4. MINOR: Maven Goal Change Lacks Justification

Location: .github/workflows/mvn-test.yml:96

-run: ./mvnw verify -Pcoverage ...
+run: ./mvnw test -Pcoverage ...

Analysis:

  • verify phase: runs tests + integration tests + additional verification plugins
  • test phase: runs only unit tests

Questions:

  1. Does this skip any important Maven plugins that run in the verify phase?
  2. Is this purely to avoid running ITs (which should already be skipped)?

Recommendation: The change is likely fine (ITs run in a separate job), but consider:

  1. Updating the now-outdated comment on line 95
  2. Documenting the rationale in the PR description

5. INCONSISTENCY: Test Reporter Parameter Ordering

Locations: Multiple test reporter blocks

The PR reorders list-tests/list-suites in some places but not the new benchmark-unit-tests job (lines 192-193). This is cosmetic but creates inconsistency.

Recommendation: Either apply the ordering consistently everywhere or revert the changes (they have no functional impact).


🔍 Detailed Code Review

GitHub Workflow Changes

Line Change Assessment
96 verifytest, add benchmark to exclusions ✅ Correct (with comment update needed)
106-107, 149-150 Reorder test reporter params ⚠️ Cosmetic, creates inconsistency
159-201 New benchmark-unit-tests job ✅ Good structure, ❌ missing gremlin module
511-518 Download benchmark coverage ✅ Correct
544 Whitespace cleanup ✅ Good

Test Tagging Changes

File Change Assessment
LSMVectorIndexStorageBenchmark Add @Tag("benchmark") ✅ Correct (it's a benchmark)
DuanSSSPBenchmark Add @Tag("benchmark") ✅ Correct (it's a benchmark)
ParserBenchmark Add @Tag("benchmark") ✅ Correct (it's a benchmark)
QueryLanguageBenchmark Add @Tag("benchmark") ✅ Correct (it's a benchmark)
CypherEngineComparisonBenchmark Add @Tag("benchmark") ✅ Correct (it's a benchmark)
QueryLanguageWithGremlinAndCypherBenchmark Add @Tag("benchmark") ✅ Correct (it's a benchmark)
OpenCypherTCKTest slowbenchmark ⚠️ Questionable (TCK ≠ benchmark)

🎯 Required Actions

Critical

  1. Add gremlin module to benchmark job scope (line 182)

High Priority

  1. Verify OpenCypherTCKTest change — confirm it's truly a benchmark or revert to @Tag("slow")
  2. Test SelectPackages change — verify Cucumber still finds feature files

Medium Priority

  1. Update comment on line 95 to reflect test vs verify
  2. Add PR description explaining what problem this solves

Nice-to-Have

  1. Apply parameter ordering consistently across all test reporter blocks

🔒 Security & Performance

  • ✅ No security concerns
  • ✅ Improves CI performance (faster test execution)
  • ✅ No new dependencies
  • ✅ Follows Apache 2.0 licensing

📊 Test Coverage

Per CLAUDE.md requirements:

  • ✅ All modified files are test-related (no production code changes)
  • ❌ Missing verification that the changes actually work (especially SelectPackages change)
  • ⚠️ Gremlin benchmarks excluded from execution

🎬 Recommendation

Status: Changes Requested

The concept is excellent, but there are critical issues:

  1. Gremlin module benchmarks are not executed
  2. OpenCypherTCKTest tag change needs justification/verification
  3. SelectPackages change may break test discovery

Please address the critical and high-priority items before merging.


Review conducted per CLAUDE.md guidelines and ArcadeDB conventions.

@codecov
Copy link

codecov bot commented Feb 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 60.29%. Comparing base (7c07315) to head (617af3c).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3398      +/-   ##
==========================================
- Coverage   63.33%   60.29%   -3.05%     
==========================================
  Files        1255     1255              
  Lines       84547    84547              
  Branches    17364    17364              
==========================================
- Hits        53552    50975    -2577     
- Misses      23513    26058    +2545     
- Partials     7482     7514      +32     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@mergify
Copy link
Contributor

mergify bot commented Feb 10, 2026

🧪 CI Insights

Here's what we observed from your CI run for 617af3c.

🟢 All jobs passed!

But CI Insights is watching 👀

@claude
Copy link

claude bot commented Feb 10, 2026

Pull Request Review: fix tentative unit test execution

Summary

This PR reorganizes the CI test execution strategy by introducing a new benchmark tag for JUnit tests and creating a separate CI job to run benchmark tests independently. The main goal appears to be preventing benchmark tests from running during normal unit test execution while ensuring they still get coverage tracking.

Positive Changes ✅

  1. Clear Test Categorization: The introduction of the @Tag("benchmark") annotation provides a clean separation between regular unit tests and performance benchmarks, which is a best practice.

  2. Consistent Tagging: All benchmark tests across the codebase are now consistently tagged:

    • LSMVectorIndexStorageBenchmark
    • TriggerBenchmark
    • DuanSSSPBenchmark
    • ParserBenchmark
    • QueryLanguageBenchmark
    • CypherEngineComparisonBenchmark
    • QueryLanguageWithGremlinAndCypherBenchmark
  3. Dedicated CI Job: The new benchmark-unit-tests job runs benchmarks separately with their own coverage reporting, which prevents them from slowing down the main test suite.

  4. Coverage Preservation: Benchmark tests still contribute to code coverage via the separate job and artifact upload.

  5. Code Cleanup: The pom.xml changes include minor formatting improvements and better property organization.

Issues & Concerns ⚠️

1. Critical Issue: OpenCypherTCKTest Tag Change

-@Tag("slow")
+@Tag("benchmark")

Problem: The OpenCypherTCKTest was changed from @Tag("slow") to @Tag("benchmark"). This is semantically incorrect because:

  • TCK (Technology Compatibility Kit) tests are conformance tests, not benchmarks
  • They verify OpenCypher specification compliance, not performance
  • The test class doesn't measure timing or performance metrics
  • This will exclude TCK tests from the slow-unit-tests job where they belong

Impact: TCK tests will now run in the benchmark job instead of the slow tests job, which may not be the intended behavior.

Recommendation:

  • If TCK tests are too slow for regular unit tests, keep @Tag("slow")
  • If they're also benchmarks, use multiple tags: @Tag("slow") @Tag("benchmark")
  • Or create a separate @Tag("tck") category if they need special handling

2. SelectClasspathResource → SelectPackages Change

-@SelectClasspathResource("opencypher/tck/features")
+@SelectPackages("opencypher.tck.features")

Problem: This changes how Cucumber features are discovered:

  • SelectClasspathResource looks for a directory path with / separators
  • SelectPackages looks for Java package names with . separators

Risk: This could cause the TCK tests to not find the feature files at all, depending on how they're organized in the resources directory.

Verification Needed:

  • Check if feature files exist at src/test/resources/opencypher/tck/features/
  • Verify the benchmark job successfully runs these tests
  • Ensure no TCK test scenarios are silently skipped

3. Incomplete Benchmark Job Configuration

The new benchmark-unit-tests job only runs tests from the engine module:

run: ./mvnw verify -Pcoverage ... -pl engine -Dgroups=benchmark

Problem: The gremlin module also has benchmark tests (CypherEngineComparisonBenchmark, QueryLanguageWithGremlinAndCypherBenchmark) that won't be executed.

Recommendation: Either:

  • Expand to -pl engine,gremlin
  • Or document why gremlin benchmarks are intentionally excluded
  • Or create a separate gremlin benchmark job

4. Minor: Property Duplication in pom.xml

The pom.xml has duplicate property definitions:

<maven-install-plugin.version>3.1.4</maven-install-plugin.version>
<!-- appears twice at lines 75 and 81 in the diff -->

Impact: Low - Maven uses the last definition, but it's confusing for maintainers.

5. Test Report Naming

All three jobs (unit, slow, benchmark) use the same report name:

name: Unit Tests Report

Problem: This makes it harder to distinguish which job produced which report in the GitHub Actions UI.

Recommendation: Use descriptive names like:

  • "Unit Tests Report" (for regular tests)
  • "Slow Unit Tests Report" (for slow tests)
  • "Benchmark Tests Report" (for benchmarks)

Questions for Author

  1. Was the OpenCypherTCKTest tag change intentional? Should it really be classified as a benchmark?
  2. Have you verified that the TCK tests still run successfully with the SelectPackages change?
  3. Should gremlin module benchmarks be included in the benchmark job?
  4. What's the expected CI time savings from this reorganization?

Testing Recommendations

Before merging, please verify:

  1. Run the benchmark job locally/in CI and confirm all expected benchmarks execute
  2. Verify TCK tests still discover and run all feature files
  3. Check that gremlin benchmarks are handled appropriately
  4. Confirm coverage reporting includes all benchmark tests
  5. Ensure no tests were accidentally excluded from all jobs

Code Style & Conventions

✅ Code adheres to project standards from CLAUDE.md:

  • No new dependencies added
  • Proper formatting maintained
  • Test-focused changes

Security & Performance

✅ No security concerns identified
✅ Performance impact: This change should improve CI performance by isolating slow benchmarks

Overall Assessment

Status: ⚠️ Needs Revision

The core idea is sound and follows best practices, but there are several issues that need attention:

  1. The OpenCypherTCKTest categorization appears incorrect
  2. The Cucumber test discovery change needs verification
  3. Gremlin benchmarks are not included in the new job

Recommendation: Address the concerns above before merging, particularly verifying the TCK test behavior and deciding on the correct categorization.

@claude
Copy link

claude bot commented Feb 10, 2026

Pull Request Review: Fix Tentative Unit Test Execution

Overview

This PR addresses test organization and execution issues by separating benchmark tests from regular unit tests and attempting to resolve OpenCypher TCK test execution problems.


Positive Changes

1. Benchmark Test Isolation

  • Good practice: Adding @Tag("benchmark") to performance/benchmark tests (LSMVectorIndexStorageBenchmark, TriggerBenchmark, DuanSSSPBenchmark, ParserBenchmark, QueryLanguageBenchmark, etc.)
  • Rationale: Separating slow-running benchmark tests from regular unit tests improves CI/CD pipeline efficiency
  • Implementation: New benchmark-unit-tests job in GitHub Actions runs benchmarks separately

2. Improved Console.Output Test Coverage

  • Enhancement: The consoleLogStatement() test now properly validates actual output instead of just checking that it doesn't throw
  • Implementation: Captures System.out and asserts the expected output matches
  • Quality: This follows the project's preference for proper assertions (using AssertJ)

3. CI/CD Workflow Improvements

  • Excludes benchmark group from regular unit tests via -DexcludedGroups=slow,benchmark
  • Separate job for benchmark tests with dedicated coverage reporting
  • Consistent test reporter configuration

⚠️ Critical Issues

1. Commented-Out Cucumber Dependencies Break OpenCypherTCKTest

Location: engine/pom.xml lines 210-232

Problem: The PR comments out all Cucumber dependencies, but OpenCypherTCKTest.java still imports and uses Cucumber classes:

import static io.cucumber.junit.platform.engine.Constants.GLUE_PROPERTY_NAME;
import static io.cucumber.junit.platform.engine.Constants.PLUGIN_PROPERTY_NAME;

@Suite
@IncludeEngines("cucumber")
@SelectPackages("opencypher.tck.features")
@ConfigurationParameter(key = GLUE_PROPERTY_NAME, value = "com.arcadedb.query.opencypher.tck")
@ConfigurationParameter(key = PLUGIN_PROPERTY_NAME, value = "pretty, html:target/tck-report.html, com.arcadedb.query.opencypher.tck.TCKReportPlugin")
@Tag("benchmark")
public class OpenCypherTCKTest {
}

Files affected:

  • engine/src/test/java/com/arcadedb/query/opencypher/tck/OpenCypherTCKTest.java
  • engine/src/test/java/com/arcadedb/query/opencypher/tck/TCKReportPlugin.java
  • engine/src/test/java/com/arcadedb/query/opencypher/tck/TCKStepDefinitions.java

Impact: The engine module will fail to compile in test scope. This violates the project's principle: "after every change in the backend (Java), compile the project and fix all the issues until the compilation passes"

Recommendation: Either:

  1. Keep Cucumber dependencies with <scope>test</scope> (preferred if TCK tests are still needed)
  2. OR remove/move the OpenCypher TCK test files entirely if they're being deprecated
  3. The @Tag("benchmark") annotation on OpenCypherTCKTest should probably be @Tag("slow") instead

2. Inconsistent Tag Usage

Location: OpenCypherTCKTest.java line 36

The PR changes the tag from @Tag("slow") to @Tag("benchmark"). However:

  • OpenCypher TCK is a compliance test suite, not a benchmark
  • Benchmarks measure performance; TCK tests verify specification compliance
  • This semantic mismatch could cause confusion

Recommendation: Keep @Tag("slow") or use a new @Tag("tck") tag instead

3. Missing SelectClasspathResource → SelectPackages Migration

Location: OpenCypherTCKTest.java line 25

Changed from:

@SelectClasspathResource("opencypher/tck/features")

to:

@SelectPackages("opencypher.tck.features")

Problem: @SelectPackages expects Java package names, but opencypher.tck.features is a resource path. This change may prevent Cucumber from finding feature files.

Recommendation: Either keep @SelectClasspathResource or verify that the feature files are accessible via the package name


📋 Minor Issues

1. Code Formatting Inconsistencies

Location: engine/src/test/java/com/arcadedb/query/sql/TriggerBenchmark.java lines 40-43

Changed from:

private static Database database;
private static final int WARMUP_ITERATIONS = 10000;
private static final int BENCHMARK_ITERATIONS = 100000;

to:

private static       Database database;
private static final int      WARMUP_ITERATIONS    = 10000;
private static final int      BENCHMARK_ITERATIONS = 100000;

Issue: Excessive whitespace for alignment is not standard Java style and doesn't match the existing codebase conventions

Recommendation: Revert to standard formatting without extra alignment spaces

2. Property Ordering in pom.xml

Location: Root pom.xml lines 50-98

Properties are reorganized but some duplicates remain:

  • maven-install-plugin.version appears twice (lines 72 and 77)
  • license-maven-plugin.version appears twice (lines 75 and 88 in original)

Recommendation: Clean up duplicate property definitions

3. Whitespace-Only Changes

Location: Multiple files in pom.xml

Many changes are purely whitespace (indentation corrections). While consistency is good, mixing functional changes with formatting changes makes reviews harder.

Recommendation: Consider separating formatting changes into a dedicated commit


🔍 Testing & Verification Concerns

1. No Verification of Changes

According to CLAUDE.md guidelines:

"Before writing any code: State how you will verify this change works"

Missing:

  • No evidence that the engine module compiles after commenting out Cucumber
  • No test execution results showing benchmarks run separately
  • No verification that OpenCypherTCKTest still works (or is intentionally broken)

2. Incomplete Cucumber Removal

If the intent is to remove Cucumber support, the following should also be addressed:

  • Remove TCK-related test files
  • Remove cucumber.version property from engine/pom.xml (line 48)
  • Update documentation if TCK compliance testing is being dropped

🎯 Recommendations

Before merging, please:

  1. Fix compilation: Either restore Cucumber dependencies or remove TCK test files
  2. Run verification: Execute mvn clean install to ensure everything compiles
  3. Run the tests: Verify that:
    • Regular unit tests exclude benchmarks: mvn test -DexcludedGroups=benchmark
    • Benchmark tests run separately: mvn test -Dgroups=benchmark
    • CI workflow completes successfully
  4. Clarify intent: Add a clear description to the PR explaining:
    • Why Cucumber is being removed (if that's the goal)
    • What happens to OpenCypher TCK compliance testing
  5. Fix semantic issues: Use appropriate tags (@Tag("slow") or @Tag("tck") instead of @Tag("benchmark") for OpenCypherTCKTest)
  6. Clean up formatting: Remove unnecessary whitespace alignment changes

📊 Security & Performance

  • ✅ No security concerns identified
  • ✅ No performance regressions expected (benchmark isolation should improve CI performance)
  • ✅ Test coverage improvements with better console output validation

📝 Summary

The benchmark test isolation strategy is excellent and aligns with best practices. However, the Cucumber dependency removal is incomplete and will break compilation. The PR needs additional work before it can be merged successfully.

Verdict: ❌ Changes requested - compilation issues must be resolved first

@codacy-production
Copy link

codacy-production bot commented Feb 10, 2026

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation Diff coverage
-30.25%
Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (5d1b7fb) 82996 57777 69.61%
Head commit (20707b7) 112605 (+29609) 44324 (-13453) 39.36% (-30.25%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#3398) 0 0 ∅ (not applicable)

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings    Change summary preferences

@claude
Copy link

claude bot commented Feb 10, 2026

Pull Request Review: Fix Tentative Unit Test Execution

Summary

This PR introduces a new test execution strategy by separating benchmark tests into their own CI job and making configuration adjustments to test dependencies. The changes improve test organization and CI pipeline clarity.


✅ Positive Aspects

  1. Better Test Organization: Separating benchmark tests from regular unit tests is a good practice. Benchmarks typically take longer and have different performance characteristics.

  2. Improved CI Parallelization: The new benchmark-unit-tests job runs in parallel with other test jobs, which can improve overall CI runtime.

  3. Consistent Tagging: Adding @Tag("benchmark") to benchmark test classes provides clear semantic meaning and enables selective test execution.

  4. Enhanced Test Coverage Tracking: The new job properly collects and uploads coverage reports for benchmark tests separately.


⚠️ Issues and Concerns

1. Breaking Change in OpenCypherTCKTest ⚠️

-@SelectClasspathResource("opencypher/tck/features")
+@SelectPackages("opencypher.tck.features")
-@Tag("slow")
+@Tag("benchmark")

Problem: Changing from @SelectClasspathResource to @SelectPackages fundamentally changes how Cucumber discovers test features:

  • @SelectClasspathResource looks for files in the classpath (e.g., .feature files in src/test/resources/opencypher/tck/features/)
  • @SelectPackages looks for Java packages (e.g., classes in com.arcadedb.query.opencypher.tck.features)

Since TCK tests are typically Cucumber .feature files (not Java classes), this change will likely break the OpenCypherTCKTest.

Recommendation: Revert to @SelectClasspathResource("opencypher/tck/features") and only change the tag from slow to benchmark.

2. Commented Out Dependencies Without Explanation ⚠️

<!--        <dependency>-->
<!--            <groupId>io.cucumber</groupId>-->
<!--            <artifactId>cucumber-junit-platform-engine</artifactId>-->
...
<!--        <dependency>-->
<!--            <groupId>org.junit.platform</groupId>-->
<!--            <artifactId>junit-platform-suite</artifactId>-->

Problems:

  • These dependencies are needed for the OpenCypherTCKTest to run (it uses @Suite annotation from junit-platform-suite)
  • Commenting them out will cause compilation failures or runtime errors for the TCK test
  • No explanation in the PR description or commit message about why these were removed

Recommendation: Either:

  1. Keep these dependencies if the TCK test is still needed, OR
  2. If removing the TCK test is intentional, remove the entire test class and explain why in the PR description

3. Tag Classification Inconsistency

The PR changes OpenCypherTCKTest from @Tag("slow") to @Tag("benchmark"). However:

  • TCK (Technology Compatibility Kit) tests are compliance/conformance tests, not benchmarks
  • They verify correctness against the OpenCypher specification, not measure performance
  • Tagging them as "benchmark" is semantically incorrect

Recommendation: Keep the TCK test tagged as slow since it's a long-running conformance test, not a performance benchmark.

4. Incomplete Test Execution Strategy

Looking at the workflow changes:

-DexcludedGroups=slow,benchmark

The regular unit tests now exclude BOTH slow and benchmark groups. However:

  • If OpenCypherTCKTest is tagged benchmark, it will run in the benchmark-unit-tests job
  • But that job only runs on the engine module: -pl engine -Dgroups=benchmark
  • This seems intentional, but needs verification that this is the desired behavior

🔍 Minor Issues

  1. Whitespace Changes in pom.xml: The PR includes formatting changes (tabs/spaces) in the pom.xml that are unrelated to the core functionality. While improving consistency is good, mixing formatting with functional changes makes review harder.

  2. Property Organization in pom.xml: Properties are reorganized but no functional change. Consider separating pure refactoring from functional changes in future PRs.

  3. Missing PR Description: The PR body is empty, making it hard to understand the motivation and expected behavior changes.


🧪 Test Coverage Assessment

Positive:

  • The SQLScriptAdditionalCoverageTest.consoleLogStatement() improvement is excellent - it now actually verifies the output instead of just checking that it doesn't throw
  • All benchmark tests are properly tagged

Concern:

  • No new tests added to verify the CI changes work as expected
  • The commented-out dependencies may break existing tests

🔒 Security Considerations

No security concerns identified. The changes are limited to test execution configuration.


🚀 Performance Considerations

Positive: Separating benchmarks into their own job prevents them from slowing down regular test feedback loops.


📋 Recommendations

Critical (Must Fix):

  1. ✅ Revert OpenCypherTCKTest to use @SelectClasspathResource instead of @SelectPackages
  2. ✅ Un-comment the Cucumber and JUnit Platform Suite dependencies in engine/pom.xml
  3. ✅ Keep OpenCypherTCKTest tagged as @Tag("slow") not benchmark

Suggested:

  1. Add a PR description explaining:
    • Why benchmark tests are being separated
    • Expected impact on CI runtime
    • Whether the TCK test should continue running
  2. Verify that the workflow correctly runs all tests (run a test build)
  3. Consider separating formatting changes from functional changes

✅ After Fixes

Once the critical issues are addressed, this PR will provide a good improvement to the test execution strategy. The core idea of separating benchmarks is sound and follows best practices.

@claude
Copy link

claude bot commented Feb 10, 2026

Pull Request Review: Fix Tentative Unit Test Execution

I've reviewed PR #3398 and have the following feedback:

Positive Changes

  1. Improved Test Categorization: Excellent separation of benchmark tests from regular unit tests using the @Tag("benchmark") annotation. This allows for more granular test execution control.

  2. Dedicated Benchmark Job: The new benchmark-unit-tests job in the CI workflow properly isolates performance-intensive tests, preventing them from slowing down the main test suite.

  3. Better Test Coverage Organization: Separating benchmark coverage reports ensures they can be tracked independently without skewing regular test metrics.

  4. Proper Console Output Testing: The improvement to SQLScriptAdditionalCoverageTest.consoleLogStatement() now actually verifies the output instead of just checking that it doesn't throw. This is a meaningful test quality improvement.

🔍 Issues & Concerns

1. Commented-Out Code (Critical)

<!--        <dependency>-->
<!--            <groupId>io.cucumber</groupId>-->
<!--            <artifactId>cucumber-junit-platform-engine</artifactId>-->

Location: engine/pom.xml:216-225

Issue: Commented-out code should be removed entirely, not left in the codebase. This violates the project's guidelines ("Avoid backwards-compatibility hacks like... adding // removed comments for removed code... If something is unused, delete it completely").

Recommendation: Either remove this dependency entirely or document WHY it's commented out if there's a temporary reason.

2. Inconsistent Tagging Strategy

Location: engine/src/test/java/com/arcadedb/query/opencypher/tck/OpenCypherTCKTest.java:36

-@Tag("slow")
+@Tag("benchmark")

Issue: The OpenCypherTCKTest was changed from @Tag("slow") to @Tag("benchmark"). However, the CI workflow still excludes slow tests. If there are other tests tagged with "slow", they will now run in the regular test suite, which may not be intended.

Questions:

  • Are there other tests with @Tag("slow")?
  • Should the workflow exclude both "slow" and "benchmark" tests, or are you replacing "slow" with "benchmark" globally?

3. Cucumber Configuration Change

Location: OpenCypherTCKTest.java:33

-@SelectClasspathResource("opencypher/tck/features")
+@SelectPackages("opencypher.tck.features")

Issue: This is a significant behavioral change. @SelectClasspathResource scans for files in the classpath, while @SelectPackages scans for packages. This might not find the Cucumber feature files if they're not structured as Java packages.

Recommendation: Verify that the TCK tests still execute correctly with this change. Run the benchmark tests locally to confirm.

4. Whitespace-Only Changes

Location: Multiple files (pom.xml, TriggerBenchmark.java)

Issue: The PR includes several whitespace-only changes (e.g., reordering property definitions, reformatting). While not harmful, these make the diff harder to review and don't add value.

Recommendation: Consider separating formatting changes into a dedicated PR or avoiding them when making functional changes.

5. Missing Verification

Issue: The PR description is empty, and there's no evidence that the benchmark tests were actually run to verify they work with the new tagging system.

Recommendation: Add a description explaining:

  • What problem this PR solves
  • How the changes were tested
  • Confirmation that benchmark tests run successfully in isolation

🎯 Testing Concerns

  1. New Job Configuration: The benchmark-unit-tests job uses:

    -Dgroups=benchmark -Dtest=!OpenCypherTCKTest

    But it only runs on the engine module (-pl engine). The gremlin module also has benchmark tests. Should those be included?

  2. Coverage Gaps: The workflow downloads benchmark-unit-coverage-reports but benchmark tests might not contribute meaningfully to coverage metrics since they focus on performance, not code paths.

📝 Minor Issues

  1. Typo in Commit Message: "tenttive" should be "tentative" in commit 706c0dd.

  2. Order Swap: The list-tests and list-suites parameters were swapped in several places. While this likely doesn't affect functionality, consistency would be better if there's no reason for the change.

🛠️ Recommendations

  1. Remove commented code in engine/pom.xml
  2. Verify Cucumber test execution with the new @SelectPackages annotation
  3. Document the tagging strategy: Add a comment or documentation explaining when to use @Tag("benchmark") vs @Tag("slow")
  4. Consider including gremlin benchmarks in the benchmark job or document why they're excluded
  5. Add PR description explaining the changes and verification performed
  6. Run the benchmark tests to ensure they work correctly with the new configuration

⚖️ Overall Assessment

This PR addresses a real issue with test organization and CI performance. The core idea of separating benchmark tests is sound and follows good practices. However, there are several implementation details that need attention:

  • Code Quality: Remove commented code
  • Testing: Verify the Cucumber configuration change
  • Documentation: Explain the changes and tagging strategy

Recommendation: Request changes to address the commented code and verify the Cucumber test configuration before merging.

@claude
Copy link

claude bot commented Feb 10, 2026

Pull Request Review: Fix Tentative Unit Test Execution

Summary

This PR separates benchmark tests from regular unit tests by introducing a new @Tag("benchmark") tag and creating a dedicated CI job to run them separately. This improves CI performance by excluding long-running benchmarks from the main test suite while still maintaining coverage.

Code Quality & Best Practices ✅

Positive aspects:

  • Clean separation of concerns: benchmark tests are now properly tagged and isolated
  • Follows existing test patterns and JUnit 5 tagging conventions
  • Consistent application of the @Tag("benchmark") annotation across all benchmark test classes
  • Good use of JUnit Platform's group filtering capabilities

Areas for improvement:

  1. Inconsistent indentation in engine/pom.xml (lines 216-227):

    • The cucumber-junit-platform-engine dependency has incorrect indentation (extra spaces)
    • The commented-out exclusion block should either be removed entirely or properly documented
    • Recommendation: Fix the indentation to match surrounding code and either remove the commented XML or add a clear comment explaining why it's kept
  2. Commented-out code without explanation (engine/pom.xml):

    • Lines 221-226 contain commented XML for exclusions
    • Per CLAUDE.md: "remove any System.out you used for debug when you have finished"
    • Recommendation: Either remove the commented code or add a clear comment explaining why the exclusion was removed and might need to be reconsidered
  3. OpenCypherTCKTest tag change (OpenCypherTCKTest.java:36):

    • Changed from @Tag("slow") to @Tag("benchmark")
    • This is a semantic change - is this test truly a benchmark, or just slow?
    • Also changed from @SelectClasspathResource to @SelectPackages - this could affect test discovery
    • Recommendation: Clarify if this is the intended behavior, as TCK tests are typically compliance tests, not benchmarks

Test Coverage ✅

Improvements:

  1. Enhanced test assertion in SQLScriptAdditionalCoverageTest.java (lines 287-301):
    • The consoleLogStatement() test now properly captures and verifies console output
    • Uses proper stream redirection with try-finally to ensure cleanup
    • Adds actual assertion instead of just checking it doesn't throw
    • Excellent improvement! This follows the guideline: "Prefer this syntax: assertThat(property.isMandatory()).isTrue();"

Coverage considerations:

  • The new benchmark-unit-tests job runs only on the engine module with -Dgroups=benchmark
  • Benchmark tests in the gremlin module (CypherEngineComparisonBenchmark.java, QueryLanguageWithGremlinAndCypherBenchmark.java) are tagged but won't run in the new CI job
  • Recommendation: Either extend the benchmark job to include gremlin module or document why only engine benchmarks are run in CI

Performance Considerations ✅

Benefits:

  • Main unit test suite will run faster by excluding benchmarks: -DexcludedGroups=slow,benchmark
  • Benchmarks run in parallel with other test jobs, maximizing CI efficiency
  • Proper separation allows different timeout/resource settings for benchmark tests

Potential issue:

  • Line 182: The benchmark job runs ./mvnw verify -Pcoverage ... -pl engine
  • This runs only the engine module, but benchmark tests exist in gremlin module too
  • Recommendation: Consider running -pl engine,gremlin or documenting why gremlin benchmarks are excluded

Security Concerns ✅

No security issues identified. Changes are limited to test organization and CI configuration.

Potential Bugs ⚠️

  1. Duplicate property definitions in root pom.xml (lines 72-77):

    • maven-install-plugin.version is defined twice (lines 72 and 77)
    • license-maven-plugin.version is defined twice (lines 75 and 88)
    • Impact: Low - Maven will use the last definition, but this is confusing
    • Recommendation: Remove the duplicate definitions
  2. Minor formatting inconsistencies in root pom.xml:

    • Lines 309-310: Multi-line license merge text might cause readability issues
    • Lines 126-138: Module list formatting is now consistent (good!)
    • Line 544: Extra space removed before files: (good cleanup)
  3. OpenCypherTCKTest package selection change:

    • Changed from @SelectClasspathResource("opencypher/tck/features") to @SelectPackages("opencypher.tck.features")
    • This changes how Cucumber discovers feature files
    • Risk: Medium - this could cause test discovery failures if the package doesn't exist or feature files aren't found
    • Recommendation: Verify this change was tested and feature files are still discovered correctly

CI/CD Workflow Changes 🔍

GitHub Actions modifications:

  1. New benchmark-unit-tests job (lines 162-203):

    • Properly depends on build-and-package
    • Uses correct JDK 21 setup
    • Runs with coverage profile
    • Issue: The job name in the test reporter is "Unit Tests Report" - should be "Benchmark Unit Tests Report" to avoid confusion
    • Issue: The benchmark job only runs -pl engine, missing gremlin benchmarks
  2. Updated coverage aggregation (lines 508-516):

    • Properly downloads benchmark coverage reports
    • Includes them in the codecov upload
    • Uses continue-on-error: true for artifact downloads (good practice)
  3. Minor reordering (lines 106-107, 146-147):

    • Swapped list-tests and list-suites order - no functional impact
    • Appears to be just formatting/style preference

Adherence to CLAUDE.md Guidelines

Followed:

  • Uses existing test patterns and frameworks
  • Maintains existing code style
  • Improved test assertions (SQLScriptAdditionalCoverageTest)
  • No new dependencies added

⚠️ Needs attention:

  • Commented-out code should be removed (engine/pom.xml lines 221-226)
  • Indentation issues in engine/pom.xml (line 216)
  • Duplicate property definitions in root pom.xml should be cleaned up

Recommendations Summary

High Priority:

  1. ✅ Remove or document the commented-out exclusions in engine/pom.xml
  2. ✅ Fix indentation in engine/pom.xml (line 216)
  3. ✅ Remove duplicate property definitions in root pom.xml
  4. ⚠️ Verify OpenCypherTCKTest still discovers feature files correctly after the @SelectPackages change

Medium Priority:
5. 📝 Update the test reporter name in the benchmark job to "Benchmark Unit Tests Report"
6. 🔍 Consider including gremlin module in the benchmark CI job or document why it's excluded
7. 📝 Clarify if OpenCypherTCKTest should be tagged as "benchmark" vs "slow"

Low Priority:
8. 🎨 Fix multi-line license merge formatting in root pom.xml (line 309-310)

Verification Needed

Per CLAUDE.md guidelines, please verify:

  1. Run the full test suite locally to ensure no tests are accidentally excluded
  2. Verify OpenCypherTCKTest still runs and discovers all feature files
  3. Check that benchmark tests in the gremlin module are intentionally excluded from CI or should be included
  4. Confirm the CI pipeline completes successfully with the new job

Overall Assessment: This is a solid improvement that will speed up CI by properly separating benchmark tests. The main concerns are the commented-out code, duplicate properties, and ensuring the OpenCypherTCKTest change doesn't break test discovery. With minor cleanup, this PR will be ready to merge.

@claude
Copy link

claude bot commented Feb 10, 2026

PR Review: Fix tentative unit test execution

Overview

This PR improves the CI/CD test execution strategy by separating benchmark tests into their own workflow job, ensuring better organization and more accurate test coverage reporting.


✅ Positive Changes

1. Improved Test Organization

  • Excellent separation of benchmark tests from regular unit tests using JUnit 5's @Tag("benchmark") annotation
  • New dedicated benchmark-unit-tests job in CI workflow allows for better isolation and potentially different resource allocation
  • Proper coverage aggregation in the coverage-report job now includes benchmark test coverage

2. Test Quality Enhancement

  • SQLScriptAdditionalCoverageTest.java:287-301: Great improvement to the consoleLogStatement() test - now actually verifies output instead of just checking it doesn't throw. This is proper test-driven development.
  • Proper cleanup with finally block ensures System.out is restored even if test fails

3. Workflow Improvements

  • Consistent test reporter configuration across all jobs
  • Proper dependency chain: coverage-report now depends on benchmark-unit-tests job
  • Good use of continue-on-error: true for artifact downloads to handle missing artifacts gracefully

4. Dependency Management

  • Centralized JMH version in engine/pom.xml properties (good practice)
  • Consistent formatting improvements in pom.xml

⚠️ Issues & Concerns

1. CRITICAL: Duplicate Property Definition (pom.xml:77-78)

<maven-install-plugin.version>3.1.4</maven-install-plugin.version>
<maven-enforcer-plugin.version>3.4.1</maven-enforcer-plugin.version>

Issue: maven-install-plugin.version is defined twice (lines 73 and 77)
Impact: While Maven takes the last definition, this is confusing and violates DRY principles
Recommendation: Remove the duplicate at line 77

2. CRITICAL: Duplicate Property Definition (pom.xml:92)

<mockito-core.version>5.21.0</mockito-core.version>

Issue: mockito-core.version is defined twice (lines 88 and 92) with different values (5.16.0 vs 5.21.0)
Impact: This creates ambiguity - version 5.21.0 will be used, but version 5.16.0 might be expected elsewhere
Recommendation: Consolidate to a single definition and verify which version is intended

3. Questionable Change: OpenCypherTCKTest Tag (engine/.../OpenCypherTCKTest.java:36)

-@Tag("slow")
+@Tag("benchmark")

Issue: Changed from slow to benchmark tag, but this is a TCK (Technology Compatibility Kit) compliance test, not a performance benchmark
Impact: May cause confusion about test purpose; TCK tests verify correctness, not performance
Recommendation: Consider whether slow tag was more semantically accurate, or if this test should have both tags

4. Code Style: Whitespace Changes (TriggerBenchmark.java:41-43)

-  private static Database database;
-  private static final int WARMUP_ITERATIONS = 10000;
+  private static       Database database;
+  private static final int      WARMUP_ITERATIONS    = 10000;

Issue: Added column alignment whitespace that doesn't match ArcadeDB's existing code style
Per CLAUDE.md: "adhere to the existing code"
Recommendation: Revert these whitespace changes - they're not necessary for the PR's purpose and violate project style guidelines

5. Commented-Out Code (engine/pom.xml:221-226)

<!--                    <exclusions>-->
<!--                        <exclusion>-->
<!--                            <groupId>org.junit.platform</groupId>-->
<!--                            <artifactId>junit-platform-engine</artifactId>-->
<!--                        </exclusion>-->
<!--                    </exclusions>-->

Issue: Commented-out exclusion block with no explanation
Per CLAUDE.md: Clean code practices suggest removing unused code rather than commenting it
Recommendation: Either remove entirely or add a comment explaining why it's disabled and might be needed later

6. Minor: Indentation Inconsistency (engine/pom.xml:216)

                <dependency>
                    <groupId>io.cucumber</groupId>

Issue: Inconsistent indentation (extra spaces before <dependency>)
Recommendation: Fix indentation to match surrounding code

7. Workflow: Missing Test Exclusion Clarification
The new benchmark job runs:

-Dgroups=benchmark -Dtest=!OpenCypherTCKTest

Question: Why exclude OpenCypherTCKTest if it's now tagged as benchmark? This seems contradictory.
Recommendation: Clarify the intent - either OpenCypherTCKTest should be benchmark or it should remain slow


🔍 Additional Observations

Missing Context

  • The PR description is empty - would benefit from explaining:
    • What issue was this fixing?
    • Why were benchmark tests failing or causing problems?
    • What was the "tentative" behavior being fixed?

Test Execution Strategy

  • Current approach: exclude benchmarks from main tests, run them separately
  • Consider: Should benchmarks run on every commit, or only on schedule/manual trigger? They may be time-intensive.

Coverage Considerations

  • Benchmark tests are now included in coverage, which is good
  • However, benchmark tests typically focus on performance, not coverage - ensure this doesn't skew coverage metrics

📋 Recommendations Summary

Must Fix:

  1. Remove duplicate maven-install-plugin.version property (pom.xml:77)
  2. Resolve conflicting mockito-core.version definitions (lines 88 vs 92)
  3. Revert unnecessary whitespace changes in TriggerBenchmark.java

Should Consider:
4. Re-evaluate OpenCypherTCKTest tag change (slow vs benchmark)
5. Remove or document commented-out exclusion in engine/pom.xml
6. Add PR description explaining what's being fixed
7. Clarify why OpenCypherTCKTest is excluded in benchmark job if it's tagged benchmark

Nice to Have:
8. Fix indentation in engine/pom.xml:216
9. Consider if benchmarks should run on every commit or on schedule


✅ Testing & Verification

The PR correctly:

  • ✅ Adds proper @Tag("benchmark") annotations to benchmark tests
  • ✅ Updates CI workflow to handle benchmark tests separately
  • ✅ Improves test assertion quality (console output test)
  • ✅ Maintains coverage reporting for all test types

Overall, this is a useful improvement to the testing infrastructure, but requires cleanup of duplicate properties and style violations before merging.


Review generated following CLAUDE.md guidelines for code quality, testing practices, and project conventions.

@claude
Copy link

claude bot commented Feb 10, 2026

PR Review: Fix tentative unit test execution

Summary

This PR reorganizes test execution to separate benchmark/performance tests from regular unit tests and improve CI efficiency. The changes include:

  • Adding a new benchmark JUnit tag for performance tests
  • Creating a dedicated CI job for benchmark tests
  • Changing Maven lifecycle phase from verify to package for unit tests
  • Improving test coverage for console output functionality
  • Minor POM cleanup and formatting improvements

✅ Strengths

1. Better Test Organization

  • Clear separation of concerns: benchmark tests now run in their own CI job
  • Excludes benchmark tag from regular unit tests (-DexcludedGroups=slow,benchmark)
  • Consistent tagging applied across all benchmark test classes

2. Improved CI Efficiency

  • Using package phase instead of verify prevents unnecessary integration test setup
  • Allows parallel execution of different test categories
  • Better isolation of slow/benchmark tests from regular unit tests

3. Enhanced Test Coverage

  • SQLScriptAdditionalCoverageTest.consoleLogStatement() now properly verifies console output instead of just checking it does not throw
  • Good use of ByteArrayOutputStream to capture and assert System.out content

⚠️ Issues & Concerns

1. Commented-out Code in POM (engine/pom.xml:195-201)

<dependency>
    <groupId>io.cucumber</groupId>
    <artifactId>cucumber-junit-platform-engine</artifactId>
    <version>${cucumber.version}</version>
    <scope>test</scope>
<!--                    <exclusions>-->
<!--                        <exclusion>-->
<!--                            <groupId>org.junit.platform</groupId>-->
<!--                            <artifactId>junit-platform-engine</artifactId>-->
<!--                        </exclusion>-->
<!--                    </exclusions>-->
</dependency>
  • Issue: Commented-out exclusions left in the code
  • Recommendation: Either remove the comments entirely or document WHY the exclusion was removed (commit message says "just remove junit bindings" but rationale unclear)
  • Risk: The exclusion might have been there to prevent dependency conflicts. Removing it could cause classpath issues.

2. OpenCypherTCKTest Tag Change (Line 246)

-@Tag("slow")
+@Tag("benchmark")
  • Issue: This changes the semantic meaning. TCK (Technology Compatibility Kit) tests are conformance tests, not benchmarks
  • Recommendation: Consider using both tags if applicable, or keep it as "slow" since TCK tests verify correctness, not measure performance
  • Impact: If someone runs tests with -Dgroups=slow, they will no longer run TCK tests

3. SelectClasspathResource → SelectPackages Change

-@SelectClasspathResource("opencypher/tck/features")
+@SelectPackages("opencypher.tck.features")
  • Issue: This fundamentally changes how Cucumber discovers feature files
    • SelectClasspathResource looks for files in the classpath (e.g., .feature files)
    • SelectPackages looks for Java packages
  • Risk: This could break TCK test discovery if feature files are not in a package named opencypher.tck.features
  • Recommendation: Verify this actually works. The commit message says "fix load of resources" but no explanation of why this was needed

4. Indentation-only Changes
Multiple files have whitespace/alignment changes (e.g., TriggerBenchmark.java:268-273, QueryLanguageBenchmark.java):

-  private static Database database;
-  private static final int WARMUP_ITERATIONS = 10000;
+  private static       Database database;
+  private static final int      WARMUP_ITERATIONS    = 10000;
  • Issue: These formatting changes make the diff harder to review and are not related to the PR's stated purpose
  • Recommendation: Per CLAUDE.md guidelines, avoid unrelated formatting changes. Consider reverting these or submitting them in a separate formatting-only PR

5. Property Organization (pom.xml)
The PR reorganizes property definitions in the root pom.xml. While this improves organization, ensure there are no duplicate definitions.


🔍 Performance Considerations

Positive:

  • Separating benchmark tests allows them to run with different resource constraints if needed
  • package phase is faster than verify for unit tests since it skips integration test setup

Consideration:

  • Benchmark tests now run in a separate job that depends on build-and-package, adding some overhead
  • Total CI time might increase if benchmark tests are slow, even though unit tests complete faster

🔒 Security Concerns

No security issues identified. Changes are primarily test organization and CI configuration.


🧪 Test Coverage

Positive:

  • New assertion in consoleLogStatement() test actually validates output
  • All benchmark tests properly tagged for categorization

Missing:

  • No evidence that the OpenCypherTCKTest changes were verified (does it still find and run TCK features?)
  • No mention of whether benchmark tests were actually run in the new CI job

📋 Recommendations

  1. Remove or explain commented code in engine/pom.xml
  2. Reconsider OpenCypherTCKTest tagging - should it be "slow" or "benchmark"?
  3. Verify SelectPackages change - ensure TCK tests still run correctly
  4. Revert formatting-only changes or move to separate PR
  5. Add PR description - currently blank, should explain:
    • Why the change from verify to package phase
    • Rationale for removing cucumber-junit-platform-engine exclusion
    • Expected impact on CI execution time
  6. Run full test suite to ensure nothing broke, especially TCK tests

Code Quality: 7/10

The core idea is solid (better test organization), but execution has some rough edges:

  • Mixing functional changes with formatting changes
  • Unclear rationale for some changes (commented code, SelectPackages)
  • Missing PR description to explain the "why"

Per CLAUDE.md project guidelines:

  • ✅ Tests are being improved
  • ✅ Build commands updated appropriately
  • ⚠️ Some changes seem incomplete (commented code)
  • ⚠️ Formatting changes should be avoided in functional PRs

Next Steps

Please address the issues above, particularly:

  1. The OpenCypherTCKTest resource loading change
  2. Commented-out exclusions in engine/pom.xml
  3. Add a PR description explaining the changes

Happy to re-review once these are addressed!

@codacy-production
Copy link

codacy-production bot commented Feb 10, 2026

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation Diff coverage
-12.52%
Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (5ee9ad4) 83715 60114 71.81%
Head commit (9c6c20e) 113273 (+29558) 67157 (+7043) 59.29% (-12.52%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#3398) 0 0 ∅ (not applicable)

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings    Change summary preferences

@claude
Copy link

claude bot commented Feb 10, 2026

PR Review: Fix Tentative Unit Test Execution

I've reviewed PR #3398 and have the following feedback:

Positive Changes

  1. Improved Test Categorization: The introduction of the benchmark tag is excellent for separating performance tests from regular unit tests. This aligns with the project's goal of faster CI pipelines.

  2. Dedicated Benchmark Job: The new benchmark-unit-tests job properly isolates benchmark tests, preventing them from slowing down regular test runs while still collecting coverage data.

  3. Phase Change Rationale: Switching from verify to package phase prevents unintended integration test execution during unit test runs. This is a smart fix.

  4. Better Test Coverage Verification: The consoleLogStatement() test now properly validates output instead of just checking for no exceptions - good improvement.

  5. Consistent Formatting: The pom.xml formatting cleanup (indentation, whitespace) improves readability.

⚠️ Issues and Concerns

1. OpenCypherTCKTest Tag Change (Critical)

File: engine/src/test/java/com/arcadedb/query/opencypher/tck/OpenCypherTCKTest.java:36

-@Tag("slow")
+@Tag("benchmark")

Issue: This changes the semantic meaning of the test. The OpenCypher TCK is a compliance test suite, not a benchmark. It's slow because it runs hundreds of feature scenarios, but it's fundamentally different from performance benchmarks.

Recommendation: Keep it as @Tag("slow"). If you want to exclude it from regular runs, add it to the -DexcludedGroups parameter instead:

-DexcludedGroups=slow,benchmark -Dtest=!OpenCypherTCKTest

2. Maven Surefire Configuration Mismatch

File: pom.xml:239

The failsafe plugin now excludes Cucumber engine:

<excludeJUnit5Engines>
    <excludeJUnit5Engine>cucumber</excludeJUnit5Engine>
</excludeJUnit5Engines>

Potential Issue: This might prevent the OpenCypherTCKTest from running in integration tests if that's where it was intended to run. The test uses @IncludeEngines("cucumber"), so excluding it globally could cause the test to be silently skipped.

Recommendation: Verify that OpenCypherTCKTest runs in the intended job (benchmark-unit-tests). If it should run separately, consider creating a dedicated TCK test job instead of mixing it with benchmarks.

3. System.out Capture Pattern

File: engine/src/test/java/com/arcadedb/query/sql/executor/SQLScriptAdditionalCoverageTest.java:287-301

The test captures System.out which is good, but:

Minor Issue: The pattern doesn't handle potential concurrent test execution well. If JUnit runs tests in parallel (which surefire can do), this could capture output from other tests.

Recommendation: Consider using a logging framework capture or ensuring this test class runs in single-threaded mode. Not critical for now, but worth noting.

4. Duplicate Property Definitions

File: pom.xml:76,78

<maven-install-plugin.version>3.1.4</maven-install-plugin.version>  <!-- line 72 -->
...
<maven-install-plugin.version>3.1.4</maven-install-plugin.version>  <!-- line 76 -->

Issue: The maven-install-plugin.version property is defined twice in the pom.xml properties section.

Recommendation: Remove one of the duplicate definitions.

📊 Test Coverage Considerations

The PR correctly collects coverage from all test categories (unit, slow, benchmark, integration) and merges them in the coverage-report job. This ensures no coverage data is lost.

However, verify that:

  • Benchmark tests actually contribute meaningful coverage (they might exercise different code paths)
  • The --fail-never flag doesn't hide real test failures that should block CI

🔍 Security & Performance

No security concerns identified. The changes are test infrastructure improvements.

Performance impact should be positive:

  • Regular CI runs will be faster (excludes slow and benchmark tests)
  • Parallel job execution maintains overall coverage quality
  • Benchmark tests run separately without blocking other checks

📋 Suggested Actions

  1. Revert OpenCypherTCKTest tag change - Keep it as slow rather than benchmark
  2. Remove duplicate maven-install-plugin.version property
  3. Verify OpenCypherTCKTest execution - Ensure it runs in the intended job and isn't silently excluded
  4. Consider test isolation for System.out capture test (low priority)

Overall Assessment

This is a solid improvement to the test execution strategy. The separation of benchmark tests is well-designed and follows best practices for CI optimization. The main concerns are semantic (TCK vs benchmark classification) and a potential duplicate property definition.

Recommendation: Address the critical issues (TCK tag, duplicate property) before merging. The other items are minor suggestions for future consideration.


Review conducted by Claude Code following CLAUDE.md guidelines

@claude
Copy link

claude bot commented Feb 10, 2026

Code Review - PR #3398: Fix Tentative Unit Test Execution

Summary

This PR refactors the CI test execution strategy by:

  1. Changing Maven lifecycle from verify to package for unit tests
  2. Introducing a new @Tag("benchmark") to separate benchmark tests
  3. Adding a dedicated benchmark-unit-tests job in the CI pipeline
  4. Improving test coverage for CONSOLE.OUTPUT statement
  5. Code formatting and dependency organization improvements

✅ Strengths

1. Improved Test Organization

  • Benchmark Separation: Properly tags benchmark tests with @Tag("benchmark") and runs them separately, preventing them from slowing down standard unit test runs
  • Clear Intent: The new CI job structure makes it explicit when benchmarks run vs regular tests

2. Correct Maven Lifecycle Usage

  • Phase Change: Moving from verifypackage prevents unintended integration test execution
  • Accurate Comment: The updated comment correctly explains that package phase runs surefire (test) and JaCoCo report (prepare-package) without reaching integration-test phase

3. Enhanced Test Coverage

  • SQLScriptAdditionalCoverageTest.java (line 263-283): The consoleLogStatement() test now properly verifies output instead of just ensuring no exceptions
    • Captures System.out and validates the actual output
    • Follows proper cleanup pattern with try-finally
    • Adheres to project's preferred assertion style: assertThat(output).isEqualTo("test log message")

4. Dependency Management

  • engine/pom.xml: Extracted JMH version to property ${jmh.version} (1.37) - good practice for version consistency
  • integration/pom.xml: Added proper exclusions for Cucumber dependencies in engine test-jar to prevent unwanted transitive dependencies

⚠️ Issues & Concerns

1. 🔴 Critical: OpenCypherTCKTest Tag Change

File: engine/src/test/java/com/arcadedb/query/opencypher/tck/OpenCypherTCKTest.java:216

-@Tag("slow")
+@Tag("benchmark")

Issue: This changes the semantic categorization of the OpenCypher TCK test suite from "slow" to "benchmark". TCK (Technology Compatibility Kit) tests are conformance tests, not benchmarks.

Impact:

  • TCK tests verify specification compliance and should run in the "slow" test category
  • Benchmark tests measure performance characteristics
  • This misclassifies the test's purpose

Recommendation: Revert this change. If TCK tests need to be excluded from regular runs, use multiple tags:

@Tag("slow")
@Tag("tck")
public class OpenCypherTCKTest {

2. 🟡 Medium: Inconsistent Annotation Import

File: engine/src/test/java/com/arcadedb/query/opencypher/tck/OpenCypherTCKTest.java:204

-import org.junit.platform.suite.api.SelectClasspathResource;
+import org.junit.platform.suite.api.SelectPackages;

Issue: The annotation change from @SelectClasspathResource("opencypher/tck/features") to @SelectPackages("opencypher.tck.features") may cause the test to fail finding feature files.

Concerns:

  • SelectClasspathResource looks for files/directories on the classpath (e.g., .feature files)
  • SelectPackages looks for Java packages with test classes
  • TCK feature files are resources, not Java packages

Recommendation: Verify this change doesn't break TCK test discovery. If feature files are in src/test/resources/opencypher/tck/features/, the original SelectClasspathResource is likely correct.

3. 🟡 Medium: Whitespace-Only Changes

Files:

  • engine/src/test/java/com/arcadedb/query/sql/TriggerBenchmark.java:237-243
  • engine/src/test/java/performance/QueryLanguageBenchmark.java:332-350,419-429

Issue: Multiple lines have alignment-only whitespace changes (e.g., variable declarations aligned to columns)

-  private static Database database;
+  private static       Database database;

Concerns:

  • Per CLAUDE.md: "adhere to the existing code" style
  • These changes add noise to the diff without functional benefit
  • Not consistently applied across the codebase

Recommendation: Remove cosmetic whitespace changes unless they're part of a dedicated formatting effort. Focus on functional changes only.

4. 🟡 Medium: Wildcard Import Replaced

File: engine/src/test/java/performance/QueryLanguageBenchmark.java:339-350

-import java.util.*;
+import java.util.ArrayList;
+import java.util.Arrays;
+// ... 8 more explicit imports

Issue: While explicit imports are generally preferred, this change is cosmetic and not mentioned in the PR description.

Observation: CLAUDE.md states "don't use fully qualified names if possible, always import the class" but doesn't mandate explicit vs wildcard imports.

Recommendation: This is fine if it's project standard, but consider applying consistently or deferring to a separate formatting PR.

5. 🟢 Minor: Duplicate Property Definition

File: pom.xml:522-523

<license-maven-plugin.version>5.0.0</license-maven-plugin.version>
<maven-install-plugin.version>3.1.4</maven-install-plugin.version>

Observation: These properties already exist earlier in the file (lines 519, 522 in the diff). The diff shows they're being reorganized/deduplicated, which is good. Verify no duplicates remain.

6. 🟢 Minor: CI YAML Formatting

File: .github/workflows/mvn-test.yml:20-22, 40-41

-          list-suites: "failed"
           list-tests: "failed"
+          list-suites: "failed"

Issue: Parameter order changed (cosmetic)

Impact: None - YAML doesn't care about parameter order


🔍 Testing Verification

Required Testing:

  1. Verify CI pipeline runs successfully with the new job structure
  2. ⚠️ Verify OpenCypherTCKTest still discovers and runs feature files after annotation change
  3. Verify benchmark tests are properly excluded from standard unit test runs with -DexcludedGroups=benchmark
  4. Verify consoleLogStatement() test passes with the new assertion

Commands to Verify Locally:

# Test the package phase runs unit tests but not ITs
mvn package -Pcoverage -DexcludedGroups=slow,benchmark -Dtest=!OpenCypherTCKTest

# Test slow tests run separately
mvn package -Pcoverage -pl engine -Dgroups=slow -Dtest=!OpenCypherTCKTest

# Test benchmark tests run separately
mvn package -Pcoverage -Dgroups=benchmark -Dtest=!OpenCypherTCKTest

# Verify OpenCypherTCKTest can still run
cd engine && mvn test -Dtest=OpenCypherTCKTest

🛡️ Security & Performance

Security: ✅ No security concerns identified

Performance:

  • Positive: Separating benchmarks will speed up regular CI runs
  • Positive: Using package instead of verify prevents accidental IT execution

📋 Recommendations

Must Fix:

  1. Revert OpenCypherTCKTest tag from benchmark back to slow (or use both tags appropriately)
  2. Verify SelectPackages change doesn't break TCK test discovery - test locally before merging

Should Consider:

  1. Remove cosmetic whitespace changes (variable alignment) to reduce diff noise
  2. Verify no duplicate properties remain in pom.xml

Nice to Have:

  1. Document the test categorization strategy (slow, benchmark) in CLAUDE.md for future contributors

📊 Adherence to CLAUDE.md

Followed:

  • Test coverage improvements (consoleLogStatement test)
  • Used assertThat(...).isTrue() style assertions
  • Proper dependency management
  • No System.out left in production code

⚠️ Violated:

  • "adhere to the existing code" - cosmetic whitespace changes deviate from surrounding code
  • Possibly semantic change to TCK test categorization without clear justification

Final Verdict

Approve with Changes Required 🟡

This PR makes valuable improvements to test organization and CI efficiency. However, the OpenCypherTCKTest categorization change requires verification/correction before merging. The TCK suite is a conformance test, not a benchmark, and changing its tag may cause it to be skipped inappropriately.

Once the TCK test issue is addressed, this PR will be ready to merge.


Reviewed per CLAUDE.md guidelines

@robfrank robfrank force-pushed the test-and-coverage-review branch 2 times, most recently from 247afe9 to 9c6c20e Compare February 12, 2026 12:31
@claude
Copy link

claude bot commented Feb 12, 2026

Pull Request Review: Fix Tentative Unit Test Execution

Summary

This PR improves the CI/CD test execution by separating benchmark tests from regular/slow unit tests and fixing the Maven lifecycle phase to avoid running integration tests during unit test phases. Overall, this is a well-structured improvement to the test infrastructure.


✅ Strengths

1. Proper Test Separation

  • Excellent use of JUnit 5 @Tag("benchmark") to categorize benchmark tests
  • Creates dedicated CI job (benchmark-unit-tests) for benchmark execution
  • Prevents benchmarks from slowing down regular unit test runs

2. Maven Lifecycle Fix

  • Critical fix: Changed from verify to package phase for unit tests
  • This correctly prevents integration tests from running during unit test execution
  • Comment in workflow clearly explains the rationale: "package phase runs surefire (test) and JaCoCo report (prepare-package) without reaching integration-test phase"

3. Dependency Management Improvements

  • Adds BOM (Bill of Materials) for JUnit and AssertJ at root level for version consistency
  • Adds Cucumber BOM in engine module
  • Removes hardcoded versions from individual dependencies (leveraging BOM versions)
  • Uses jmh.version property for JMH dependencies

4. Cucumber Configuration

  • Adds cucumber.junit-platform.naming-strategy=long to surefire to improve test reporting
  • Excludes cucumber engine from failsafe (integration tests) - prevents unwanted auto-discovery
  • Integration module properly excludes transitive cucumber dependencies it doesn't need

5. Code Quality

  • SQLScriptAdditionalCoverageTest.java: Improved test for CONSOLE.OUTPUT that actually verifies output
  • Removed obsolete OpenCypherTCKTest.java (functionality now in OpenCypherTCKSuite.java)

⚠️ Issues & Concerns

1. Potential Test Naming Conflict (Minor)

All three test job reporters use the same name:

name: Unit Tests Report

This applies to:

  • unit-tests job
  • slow-unit-tests job
  • benchmark-unit-tests job

Recommendation: Use distinct names like "Fast Unit Tests Report", "Slow Unit Tests Report", "Benchmark Tests Report" to avoid confusion.

2. Failsafe Cucumber Exclusion (Clarification Needed)

<excludeJUnit5Engines>
    <excludeJUnit5Engine>cucumber</excludeJUnit5Engine>
</excludeJUnit5Engines>

Question: Are there integration tests that legitimately use Cucumber? If so, this exclusion might prevent them from running. If not, this is fine.

3. OpenCypherTCKSuite Changes (Potential Issue)

-@SelectClasspathResource("opencypher/tck/features")
+@SelectPackages("opencypher.tck.features")

Concern: This changes from selecting a classpath resource (feature files) to selecting a package. Are the Cucumber .feature files located in a package opencypher.tck.features? Cucumber typically uses @SelectClasspathResource for feature files. This might break TCK test discovery.

Also removed: @Disabled and @Tag("slow") annotations - was this intentional? The suite is now enabled but there's no clear path to run it since OpenCypherTCKTest.java was deleted.

4. Integration Module Exclusions (Verify Necessity)

The integration module now excludes cucumber and junit-platform-suite from the engine test-jar dependency:

<exclusions>
    <exclusion>
        <groupId>io.cucumber</groupId>
        <artifactId>cucumber-java</artifactId>
    </exclusion>
    <!-- ... -->
</exclusions>

Verify: Does integration module actually need to avoid these transitive dependencies? If it doesn't use them, this is good cleanup. Otherwise, might cause ClassNotFoundExceptions.

5. Code Formatting Inconsistencies (Style)

Several files have formatting-only changes (e.g., QueryLanguageBenchmark.java, TriggerBenchmark.java):

  • Alignment changes for variable declarations
  • Import statement expansions (replacing * with specific imports)
  • Line wrapping changes

While these improve code style, they create noise in the diff. According to CLAUDE.md: "adhere to the existing code" - ensure these align with project formatting standards.

6. Missing Test Verification (Process)

Per CLAUDE.md requirements:

"Before writing any code: State how you will verify this change works"

Question: Have the following been verified?

  • Do benchmark tests now run in their dedicated job?
  • Do regular unit tests complete faster without benchmarks?
  • Does the TCK suite still work after the changes to OpenCypherTCKSuite.java?
  • Do all existing tests still pass?

🔍 Additional Observations

Properties Organization (Minor)

In root pom.xml, some properties were reordered. While this improves organization, the duplicate declaration was removed - good cleanup!

Module Order Change

 <modules>
+    <module>test-utils</module>
     <module>engine</module>

test-utils moved before engine. This is correct since engine likely depends on test-utils for testing.


🎯 Recommendations

Critical

  1. Verify TCK suite functionality - The changes to OpenCypherTCKSuite.java need testing
  2. Test the benchmark job - Ensure benchmarks run in the new dedicated CI job

Important

  1. Rename test reporter jobs - Use unique names for each test category
  2. Document the change - Consider updating README or contributing guide about the new test categories

Nice-to-have

  1. Verify integration exclusions - Confirm cucumber exclusions don't break anything
  2. Consider adding comments - In POM files explaining why cucumber is excluded from certain modules

🎓 Security & Performance

  • ✅ No security concerns identified
  • ✅ No new dependencies added (only reorganized existing ones)
  • ✅ Performance improvement: Benchmark tests separated from fast tests
  • ✅ All licenses remain Apache 2.0 compatible

Overall Assessment

Status: ✅ Approve with minor recommendations

This is a solid improvement to the test infrastructure. The main concern is ensuring the OpenCypher TCK changes don't break test discovery. Once the TCK suite is verified to work, this PR should be good to merge.

The separation of benchmark tests is excellent and will improve CI performance. The Maven lifecycle fix is technically correct and important.

Suggested next steps:

  1. Manually run the OpenCypher TCK suite to verify it still works
  2. Trigger a CI run to verify all three test jobs execute correctly
  3. Consider the naming suggestions for test reporters

@robfrank robfrank force-pushed the test-and-coverage-review branch from 9c6c20e to 5c9c26f Compare February 13, 2026 15:16
@claude
Copy link

claude bot commented Feb 13, 2026

Pull Request Review: Fix tentative unit test execution

Summary

This PR addresses test execution and coverage reporting issues by:

  1. Separating benchmark tests from regular unit tests using JUnit 5 tags
  2. Fixing Cucumber/OpenCypher TCK test execution
  3. Improving Maven surefire/failsafe configuration
  4. Adding proper dependency management for test frameworks

Code Quality & Best Practices ✅

Strengths:

  • Good separation of concerns: Benchmark tests are now properly tagged and run separately
  • Dependency management improvements: Adding JUnit and AssertJ BOMs ensures consistent versions
  • Cleaner CI workflow: The new benchmark-unit-tests job isolates performance tests
  • Better test coverage verification: SQLScriptAdditionalCoverageTest now verifies console output

Areas for Improvement:

1. Commented-out Code (engine/pom.xml:96-110)
The commented-out maven-surefire-plugin configuration should be removed entirely. Per CLAUDE.md: 'If something is unused, delete it completely.'

2. Whitespace-Only Changes
Several files have formatting changes unrelated to the fix (TriggerBenchmark.java, QueryLanguageBenchmark.java, pom.xml). While these improve consistency, they make the PR diff harder to review.

3. Cucumber Filter Configuration (pom.xml:202)
cucumber.filter.name='not .IT.' relies on naming conventions. Consider using Cucumber tags instead for more explicit control.

Potential Issues 🔍

1. Deleted OpenCypherTCKTest.java
How should developers run the OpenCypher TCK tests now? The OpenCypherTCKSuite.java is a Suite class, not a test entry point. Please document the new way to run TCK tests.

2. Maven Phase Inconsistency
The CI workflow comment says 'package phase runs surefire...' but the command uses 'verify' phase. This mismatch could cause confusion. Either update the comment or use package phase as stated.

3. Cucumber Engine Exclusion
The failsafe plugin now excludes the cucumber engine (pom.xml:253-255). Does this prevent any Cucumber-based integration tests from running?

Performance Considerations ✅

The separation of benchmark tests into their own CI job is excellent for faster feedback, better resource allocation, and improved visibility.

Test Coverage 📊

SQLScriptAdditionalCoverageTest enhancement is a real improvement - it now actually verifies console output instead of just checking for no exceptions.

Recommendations Summary

Critical:

  1. Document TCK test execution after removing OpenCypherTCKTest.java
  2. Fix comment vs implementation mismatch in CI workflow

High Priority:

  1. Remove commented code in engine/pom.xml lines 96-110
  2. Verify Cucumber exclusion doesn't break intended test scenarios

Nice to Have:

  1. Consider formatting-only commit for cleaner git history
  2. Use Cucumber tags instead of name-based filtering

Overall Assessment

This is a solid improvement to the test infrastructure. The main concerns are around documentation (TCK execution) and minor inconsistencies between comments and implementation.

Status: Approve with minor changes requested ✅

@claude
Copy link

claude bot commented Feb 13, 2026

PR Review: Fix tentative unit test execution

I've reviewed this PR which refactors test execution configuration and organization. Here's my detailed feedback:

Strengths

  1. Better Test Organization: Separating benchmark tests into a dedicated category with the @Tag("benchmark") annotation is a good architectural improvement
  2. Improved CI Workflow: Adding a dedicated benchmark-unit-tests job prevents slow benchmarks from blocking regular unit tests
  3. Cleaner Configuration: Removing the need for explicit test exclusions (-Dtest=!OpenCypherTCKTest) in favor of tag-based filtering is more maintainable
  4. Dependency Management: Properly using dependency management sections (BOMs) for JUnit and AssertJ promotes version consistency

🔍 Issues & Concerns

1. Commented-out Code (engine/pom.xml)

Lines 169-183 contain a large block of commented-out configuration for maven-surefire-plugin. This violates the project's coding standards:

<!--            <plugin>-->
<!--                <groupId>org.apache.maven.plugins</groupId>-->
<!--                <artifactId>maven-surefire-plugin</artifactId>-->

Recommendation: Remove commented code entirely. If needed for reference, it's in git history.

2. Deleted Test Class May Still Be Referenced

OpenCypherTCKTest.java was deleted, but the suite class (OpenCypherTCKSuite) still exists. The old test was a runner that instantiated the suite.
Questions:

  • How are TCK tests now executed? The suite is annotated with @Suite but has no @Disabled annotation removed
  • Is there a new execution mechanism? This needs clarification in the PR description

3. Inconsistent Resource Loading Change

In OpenCypherTCKSuite.java:

-@SelectClasspathResource("opencypher/tck/features")
+@SelectPackages("opencypher.tck.features")

This changes from classpath resource lookup to package selection. Without seeing the resource structure, this could break TCK feature discovery.
Recommendation: Verify that TCK features are still discovered correctly.

4. Formatting-only Changes

Multiple files have whitespace/alignment changes that don't affect functionality (e.g., QueryLanguageBenchmark.java, TriggerBenchmark.java). While consistency is good, these clutter the diff.

-  private static Database database;
+  private static       Database database;

Recommendation: Consider splitting formatting changes into a separate commit for clearer review.

5. Missing Test Verification

The SQLScriptAdditionalCoverageTest.consoleLogStatement() improvement is good - it actually verifies output now instead of just checking for no exceptions. However:

  • What happens if the test runs in an environment where System.out is already redirected?
  • Should use a more robust approach with try-with-resources if possible

6. GitHub Workflow Changes

The workflow changes have potential issues:

a) Maven Lifecycle Phase Mismatch

-        run: ./mvnw verify -Pcoverage ...
+        run: ./mvnw package -Pcoverage ...  # for slow-unit-tests job

The comment on line 11 says "package phase runs surefire (test) and JaCoCo report (prepare-package)", but the actual command still uses verify for the main job and package for slow tests. This inconsistency is confusing.

b) Benchmark Coverage Dependency

-    needs: [ unit-tests, integration-tests, slow-unit-tests ]
+    needs: [ unit-tests, integration-tests, slow-unit-tests, benchmark-unit-tests ]

This makes coverage reports wait for benchmarks, which could significantly delay feedback. Consider if benchmark coverage is critical for the main coverage report.

7. Duplicate Property Definitions (pom.xml)

Lines 638-639 duplicate the maven-install-plugin.version:

<maven-install-plugin.version>3.1.4</maven-install-plugin.version>  <!-- line 71 -->
<!-- ... -->
<maven-install-plugin.version>3.1.4</maven-install-plugin.version>  <!-- line 638 -->

Recommendation: Remove the duplicate.

8. Integration Module Cucumber Exclusions

The integration/pom.xml excludes Cucumber dependencies from the engine test-jar. This makes sense if integration tests don't need TCK tests, but should be documented.

🎯 Recommendations

  1. Remove all commented code - commit history preserves it
  2. Add PR description explaining:
    • Why TCK test execution changed
    • How TCK tests are now run
    • Why benchmark tests needed separation
  3. Verify TCK feature discovery works with @SelectPackages
  4. Fix duplicate properties in root pom.xml
  5. Consider splitting formatting changes into a separate commit
  6. Test the changes locally with:
    mvn clean verify -Pcoverage -DexcludedGroups=slow,benchmark
    mvn test -Dgroups=slow
    mvn test -Dgroups=benchmark

📊 Test Coverage Considerations

According to CLAUDE.md guidelines:

all new server-side code must be tested with a test case

The consoleLogStatement() test improvement is good, but ensure:

  • Benchmark tests still run somewhere (they do - in dedicated job ✓)
  • TCK tests are still executed (unclear - needs verification)

🔒 Security & Performance

  • No security concerns identified
  • Performance: Separating benchmarks from regular tests should improve CI performance ✓

Overall Assessment

This is a reasonable refactoring that improves test organization, but needs cleanup:

  • Remove commented code
  • Fix duplicate properties
  • Clarify TCK test execution
  • Add PR description

Status: Needs minor revisions before merging.

@codacy-production
Copy link

codacy-production bot commented Feb 13, 2026

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation Diff coverage
Report missing for b3c2d011
Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (b3c2d01) Report Missing Report Missing Report Missing
Head commit (1b7bd0f) 114105 66572 58.34%

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#3398) 0 0 ∅ (not applicable)

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings    Change summary preferences

Footnotes

  1. Codacy didn't receive coverage data for the commit, or there was an error processing the received data. Check your integration for errors and validate that your coverage setup is correct.

@robfrank robfrank force-pushed the test-and-coverage-review branch from 5fc3498 to 771db66 Compare February 13, 2026 16:22
@claude
Copy link

claude bot commented Feb 13, 2026

PR Review: Fix Unit Test Execution

Summary

This PR restructures the test execution strategy by:

  1. Introducing a new benchmark tag for performance tests
  2. Separating benchmark tests into a dedicated CI job
  3. Removing the problematic OpenCypherTCKTest wrapper class
  4. Cleaning up POM configurations and dependency management
  5. Improving test isolation and coverage reporting

Positive Changes ✅

1. Benchmark Test Separation

  • Good: Isolating benchmark tests (@Tag("benchmark")) into a separate CI job prevents them from slowing down regular unit tests
  • Files affected: 7 benchmark test classes now properly tagged
  • CI improvement: New benchmark-unit-tests job runs independently

2. Dependency Management Improvements

  • Good: Centralized Cucumber BOM in engine/pom.xml (lines 191-201)
  • Good: Added JUnit and AssertJ BOMs to parent POM for version consistency
  • Good: JMH version now centralized in property (jmh.version)
  • Good: Integration module excludes unnecessary Cucumber dependencies from test-jar

3. OpenCypherTCK Cleanup

  • Good: Removed problematic OpenCypherTCKTest wrapper class that was being excluded
  • Good: OpenCypherTCKSuite now uses @SelectPackages instead of @SelectClasspathResource
  • Good: Removed @Disabled annotation from TCK suite, making it runnable

4. Test Coverage Enhancement

  • Excellent: SQLScriptAdditionalCoverageTest.consoleLogStatement() now actually validates output instead of just verifying "it doesn't throw"
  • Proper assertion added: assertThat(output).isEqualTo("test log message")

5. CI Workflow Improvements

  • Changed from verify to package phase for slow tests (avoids running integration-test phase)
  • Proper exclusion of slow and benchmark tests from regular unit tests
  • Coverage reporting now includes benchmark test coverage

Concerns & Issues ⚠️

1. Commented-Out Code (Critical)

engine/pom.xml lines 169-183: Large block of commented-out XML configuration

  • Issue: Violates project guidelines ("remove any System.out you used for debug when you have finished")
  • Recommendation: Either remove completely or document why it's commented out
  • Action: Delete this block entirely - Maven Surefire configuration should be in parent POM
<!--            <plugin>-->
<!--                <groupId>org.apache.maven.plugins</groupId>-->
<!--                <artifactId>maven-surefire-plugin</artifactId>-->
...
<!--            </plugin>-->

2. Missing Cucumber Engine Exclusion Strategy

pom.xml lines 709-712: Added global excludeJUnit5Engine for cucumber in failsafe

  • Issue: This affects ALL integration tests, but only engine module uses Cucumber
  • Question: Why is this needed globally instead of just in engine module?
  • Risk: Could mask issues if other modules want to use Cucumber for integration tests

3. Inconsistent Maven Phase Usage

  • Regular unit tests: mvn verify (line 12)
  • Slow unit tests: mvn package (line 31)
  • Benchmark tests: mvn verify (line 69)

Question: Why does slow-unit-tests use package but benchmark-unit-tests uses verify?

  • The comment on line 11 says "package phase runs surefire (test) and JaCoCo report (prepare-package)", but this isn't applied consistently

4. Cucumber Configuration Concerns

pom.xml lines 695-702: New cucumber configuration parameters

<configurationParameters>
    cucumber.junit-platform.naming-strategy=long
    cucumber.filter.name="not .*IT.*"
</configurationParameters>
  • Issue: The filter "not .*IT.*" is a workaround for Surefire not disambiguating scenarios
  • Question: Is this robust enough? What if legitimate test names contain "IT"?
  • Recommendation: Consider using more specific Cucumber tags instead of name filtering

5. Module Order Change

pom.xml lines 668-676: test-utils moved before engine

  • Question: Is this intentional? Does engine depend on test-utils?
  • Risk: Build order matters for multi-module projects
  • Recommendation: Add comment explaining why this order is important

6. Code Style Inconsistency

engine/src/test/java/com/arcadedb/query/sql/TriggerBenchmark.java lines 357-360:

private static       Database database;
private static final int      WARMUP_ITERATIONS    = 10000;
private static final int      BENCHMARK_ITERATIONS = 100000;
  • Issue: Vertical alignment of variable names (column alignment with spaces)
  • Guideline Violation: Project guidelines don't mention this style, and it's not used elsewhere
  • Recommendation: Use standard formatting:
private static Database database;
private static final int WARMUP_ITERATIONS = 10000;
private static final int BENCHMARK_ITERATIONS = 100000;

7. Import Statement Explosion

performance/QueryLanguageBenchmark.java lines 457-467: Replaced import java.util.*; with explicit imports

  • Good: Explicit imports are generally better for clarity
  • Minor: This seems unrelated to the main PR goal
  • Question: Is this intentional or an IDE auto-format side effect?

Security Considerations 🔒

No security concerns identified. Changes are limited to test infrastructure.


Performance Considerations ⚡

Positive Impact:

  • Separating benchmark tests should speed up regular CI builds
  • Using package phase instead of verify for slow tests avoids unnecessary integration-test phase setup

Questions:

  1. How long do benchmark tests take? Should they run on every commit or be scheduled?
  2. Consider adding a timeout for benchmark jobs to prevent runaway tests

Test Coverage 🧪

Improved:

  • SQLScriptAdditionalCoverageTest.consoleLogStatement() now has proper assertions
  • Benchmark tests now contribute to overall coverage metrics

Missing:

  • No test validates the new Cucumber configuration works correctly
  • No validation that @Tag("benchmark") actually excludes tests from regular runs

Recommendation:

  • Add a simple test in CI that verifies benchmark-tagged tests don't run in regular unit test phase
  • Validate that TCK suite actually runs (or document if it's still disabled for another reason)

Build & Compilation ✓

Question: Has the project been compiled successfully with these changes?
Per project guidelines: "after every change in the backend (Java), compile the project and fix all the issues until the compilation passes"

Action Needed: Confirm mvn clean install succeeds


Recommendations Summary

Must Fix:

  1. ❌ Remove commented-out code in engine/pom.xml (lines 169-183)
  2. ❌ Fix code style inconsistency in TriggerBenchmark.java (vertical alignment)
  3. ❌ Document why test-utils moved before engine in module order

Should Consider:

  1. ⚠️ Make Maven phase usage consistent (all should use verify or explain differences)
  2. ⚠️ Move cucumber engine exclusion from parent POM to engine module only
  3. ⚠️ Add timeout configuration for benchmark job in CI
  4. ⚠️ Replace name-based cucumber filter with proper tag-based filtering

Nice to Have:

  1. 💡 Add a comment in CI workflow explaining why benchmarks are separate
  2. 💡 Document expected benchmark execution time
  3. 💡 Add regression test that validates test categorization works

Overall Assessment

This PR addresses real issues with test execution and improves CI organization. The core approach is sound, but several implementation details need refinement:

  • Scope: Appropriate - focuses on test infrastructure
  • Approach: Good - separating benchmark tests makes sense
  • Execution: Needs cleanup - remove commented code and address inconsistencies
  • Testing: Partially done - manual verification needed

Verdict: Approve with required changes before merge. The commented-out code and style inconsistencies must be addressed.


Estimated Review Confidence: High - I've reviewed all changes and understand the test infrastructure implications.

cc: @robfrank

@robfrank robfrank force-pushed the test-and-coverage-review branch from 771db66 to 1b7bd0f Compare February 13, 2026 17:22
@claude
Copy link

claude bot commented Feb 13, 2026

PR Review: Fix tentative unit test execution

I've reviewed this PR which refactors the test execution strategy and Maven configuration. Here are my findings:

✅ Positive Changes

  1. Better test organization: Introducing the @Tag("benchmark") annotation to separate benchmark tests from regular unit tests is excellent. This allows for more granular test execution control.

  2. Improved CI workflow: Creating a dedicated benchmark-unit-tests job ensures benchmarks run separately, preventing them from slowing down regular unit tests.

  3. Dependency management improvements:

    • Moving to BOM-based dependency management for JUnit and AssertJ in the root pom.xml (lines 802-818)
    • Using cucumber-bom in engine/pom.xml for better version consistency
    • Centralizing the JMH version in a property
  4. Test enhancement: The improvement to SQLScriptAdditionalCoverageTest.consoleLogStatement() (lines 381-400) now properly verifies the output instead of just checking that it doesn't throw.

  5. Cleaner TCK test structure: Removing OpenCypherTCKTest.java and simplifying OpenCypherTCKSuite.java removes unnecessary wrapper code.

⚠️ Concerns & Suggestions

1. Commented-out code in engine/pom.xml

Location: engine/pom.xml:169-183

The surefire plugin configuration is completely commented out. This should either be:

  • Removed entirely if no longer needed
  • Uncommented if it's still necessary

Recommendation: Remove the commented block or clarify why it needs to be preserved.

2. Inconsistent formatting changes

Locations: Multiple files (TriggerBenchmark.java:356-360, QueryLanguageBenchmark.java, etc.)

The PR includes many whitespace-only formatting changes (e.g., aligning variable declarations). While these improve readability, they:

  • Add noise to the diff
  • Make it harder to identify functional changes
  • Risk merge conflicts in active branches

Recommendation: Consider separating pure formatting changes into a dedicated formatting-only PR.

3. Missing exclusion in integration/pom.xml

Location: integration/pom.xml:598-611

While excluding Cucumber and JUnit Platform Suite dependencies from the engine test-jar is good, verify this doesn't break integration tests that might depend on these classes.

Recommendation: Run the full integration test suite to ensure nothing breaks.

4. Redundant property definitions

Location: pom.xml:623-627 and 89-91

The properties skipITs, skipTests, itCoverageAgent, and argLine are defined near the top (lines 623-627), but some like maven-install-plugin.version appear twice (lines 659 and 76).

Recommendation: Check for and remove any duplicate property definitions.

5. Cucumber engine exclusion in failsafe

Location: pom.xml:710-712

The failsafe plugin now explicitly excludes the cucumber engine. This is good for preventing unintended test discovery, but ensure this doesn't affect any legitimate integration tests using Cucumber.

6. CI workflow comment mismatch

Location: .github/workflows/mvn-test.yml:11

The comment says "package phase runs surefire" but the command uses verify phase, not package phase.

Recommendation: Update the comment to match reality: "verify phase runs surefire and generates coverage"

7. OpenCypherTCKSuite changes

Location: engine/src/test/java/.../OpenCypherTCKSuite.java:288-289

Changed from @SelectClasspathResource("opencypher/tck/features") to @SelectPackages("opencypher.tck.features"). Verify that:

  • The TCK feature files are still discovered correctly
  • The package selector works as expected with the cucumber engine

Also removed the @Disabled annotation - is this intentional? Will TCK tests now run in CI?

🔍 Testing Recommendations

  1. Verify benchmark tests run correctly:

    mvn test -Dgroups=benchmark
  2. Verify regular tests exclude benchmarks:

    mvn test -DexcludedGroups=benchmark
  3. Verify TCK suite runs (if @disabled removal was intentional):

    mvn test -Dtest=OpenCypherTCKSuite
  4. Full build verification:

    mvn clean install

📊 Summary

This PR makes meaningful improvements to test organization and CI efficiency. The main concerns are:

  • Cleanup of commented code
  • Separation of formatting changes from functional changes
  • Verification that all test categories still work correctly

The changes align with the project's goal of better test execution control, but need a few refinements before merging.

Risk Assessment: Medium

  • Test infrastructure changes can have cascading effects
  • CI workflow modifications should be verified through actual CI runs
  • Dependency exclusions need validation

Review generated with attention to CLAUDE.md guidelines

@claude
Copy link

claude bot commented Feb 13, 2026

Pull Request Review: Test Execution Refactoring

Positive Changes:

  • Excellent separation of benchmark tests using @tag("benchmark") annotation
  • Better CI/CD structure with dedicated benchmark-unit-tests job
  • OpenCypher TCK cleanup by removing wrapper class
  • Enhanced test coverage in consoleLogStatement() with actual output verification
  • Better dependency management with BOM for JUnit and AssertJ

Critical Issues:

  1. Commented-Out Code (engine/pom.xml lines 169-183)

    • Large block of surefire configuration is commented rather than removed
    • Creates maintenance debt - either remove or document with ticket reference
    • Per CLAUDE.md: commented debug code should be removed when finished
  2. Cucumber Engine Exclusion Inconsistency

    • Global exclusion added to failsafe plugin in root pom.xml
    • Previously in engine/pom.xml for surefire but now commented out
    • Clarify if Cucumber should be excluded globally or per-module
  3. OpenCypherTCKSuite Resource Loading Change

    • Changed from @SelectClasspathResource("opencypher/tck/features") to @SelectPackages("opencypher.tck.features")
    • @SelectPackages looks for Java packages, not resource files
    • MUST verify TCK tests execute successfully and find feature files
  4. Missing Build Verification

    • Per CLAUDE.md: "test all new and old components before considering job finished"
    • Need to run: mvn clean install and verify all test categories

Code Quality Issues:

  1. Formatting-Only Changes in QueryLanguageBenchmark.java

    • Mixed functional and style changes make review harder
    • Example: expanded import java.util.*; to individual imports
    • Separate formatting commits from functional changes
  2. Unnecessary Whitespace Changes

    • TriggerBenchmark.java line 359-361: alignment changes add noise
    • Per CLAUDE.md: "adhere to existing code"
  3. Missing PR Description

    • Empty PR body - add motivation and approach

Recommendations Before Merge:

  1. Remove or document commented code in engine/pom.xml
  2. Verify OpenCypher TCK tests run with @SelectPackages
  3. Run full build: mvn clean install
  4. Add PR description
  5. Revert unnecessary formatting changes
  6. Verify benchmark tests run in new CI job

Overall: Core idea is sound - separating benchmarks improves CI efficiency. However, critical issues need addressing first, particularly TCK test verification and commented code cleanup.

Recommendation: Request changes

Review conducted per ArcadeDB CLAUDE.md guidelines.

@claude
Copy link

claude bot commented Feb 13, 2026

Pull Request Review: Fix Tentative Unit Test Execution

Overview

This PR improves test organization by separating benchmark tests from regular unit tests and removes the problematic OpenCypherTCKTest wrapper. The changes enhance CI workflow efficiency and test maintainability.


Strengths

  1. Better Test Organization: Introducing the @tag("benchmark") annotation to separate performance benchmarks from regular tests is a solid architectural decision that will improve CI performance.

  2. Simplified TCK Testing: Removing the OpenCypherTCKTest wrapper and directly using OpenCypherTCKSuite is cleaner. The suite-based approach with JUnit Platform is the recommended pattern for Cucumber tests.

  3. Dependency Management Improvements:

    • Using BOMs for JUnit and AssertJ in the root POM ensures version consistency across modules
    • Moving Cucumber BOM to engine module dependency management is appropriate
    • Centralized jmh.version property reduces duplication
  4. Enhanced CI Workflow: Creating a dedicated benchmark-unit-tests job prevents benchmarks from slowing down regular test runs.

  5. Test Coverage Enhancement: The SQLScriptAdditionalCoverageTest.consoleLogStatement() improvement actually verifies output instead of just checking for no exceptions.


Issues and Concerns

1. Critical: Commented-Out Code Should Be Removed

Location: engine/pom.xml:154-183

The maven-surefire-plugin configuration is commented out rather than removed. Per CLAUDE.md guidelines: "Avoid backwards-compatibility hacks" and "If something is unused, delete it completely."

Recommendation: Remove the commented block entirely. The Cucumber exclusion is now properly handled in the root POMs surefire configuration.


2. Issue: Maven Phase Inconsistency

Location: .github/workflows/mvn-test.yml:31,69

  • Line 12: Uses mvn verify for regular unit tests
  • Line 31: Uses mvn package for slow unit tests
  • Line 69: Uses mvn verify for benchmark tests

Problem: The comment on line 11 says "package phase runs surefire... without reaching integration-test phase" but the command still uses verify. This is inconsistent with the slow-unit-tests job which correctly uses package.

Recommendation: For consistency and to match the comment, change line 12 to use mvn package instead of mvn verify, OR update the comment to explain why verify is still appropriate.

@claude
Copy link

claude bot commented Feb 13, 2026

3. Missing Test for Cucumber Filter

Location: pom.xml:668-675

The new Cucumber configuration includes a filter cucumber.filter.name="not .IT." to exclude integration tests from Surefire runs, but there is no verification that this actually works as intended.

Recommendation: Verify that the TCK suite runs correctly with this filter and does not inadvertently exclude valid scenarios. Consider adding a comment explaining why this specific filter pattern was chosen.


4. Code Style: Formatting Inconsistency

Location: engine/src/test/java/com/arcadedb/query/sql/TriggerBenchmark.java:356-358

The formatting change introduces inconsistent alignment. While this aligns the variable names, the extra spacing between static and Database looks unusual and is not consistent with the rest of the codebase.

Recommendation: Revert to standard formatting without the extra alignment spaces, which is more typical of the project style.


5. Import Organization Change

Location: engine/src/test/java/performance/QueryLanguageBenchmark.java:454-466

Wildcard imports were replaced with explicit imports. While this is generally good practice, the change from java.util.* to 13 individual imports increases verbosity without clear benefit for this test file.

Recommendation: This is acceptable but consider whether the verbosity adds value here, especially since this is a benchmark test file rather than production code.


Questions and Clarifications

  1. Cucumber Engine Exclusion: Why is the Cucumber engine now excluded from failsafe (line 684-686 in pom.xml) when it was previously only excluded from surefire? Should integration tests be able to run Cucumber scenarios?

  2. Coverage Aggregation: Does the coverage-report job correctly aggregate coverage from the new benchmark-unit-tests job? The dependency is added (line 110) but please verify the coverage files are properly merged.

  3. Benchmark Job Resource Usage: The benchmark tests might be resource-intensive. Have you considered:

    • Setting appropriate timeouts?
    • Using continue-on-error: true if benchmark failures should not block PRs?
    • Whether these should run on every PR or only on merge to main?

@claude
Copy link

claude bot commented Feb 13, 2026

Testing Recommendations

Before merging, please verify:

  1. Regular unit tests complete successfully without running benchmarks
  2. Slow unit tests run independently and produce coverage reports
  3. Benchmark tests execute successfully in the new job
  4. OpenCypherTCKSuite runs correctly without the wrapper class
  5. Coverage reports correctly aggregate from all test jobs
  6. The Cucumber filter does not exclude valid test scenarios
  7. No System.out statements remain from debugging (as per CLAUDE.md)

Security and Performance

  • No security concerns identified
  • Performance impact is positive - separating benchmarks will speed up regular CI runs
  • No new dependencies with licensing concerns

Minor Issues

  1. Whitespace change (line 133): Good catch on removing extra space
  2. Test report order (lines 21-22, 40-41, 79-80, 100-101): Swapping list-tests and list-suites order. Ensure this does not affect report parsing.

Summary

Overall Assessment: This is a well-structured PR that improves test organization and CI efficiency. The main concerns are:

  • Remove commented-out code (critical)
  • Fix Maven phase inconsistency in workflow
  • Address code formatting issues

Recommendation: Request changes to address the critical issues before merging. The architectural direction is sound.


Review based on ArcadeDB CLAUDE.md guidelines and best practices for multi-module Maven projects.

@codacy-production
Copy link

codacy-production bot commented Feb 13, 2026

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation Diff coverage
-13.46%
Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (b3c2d01) 84547 60703 71.80%
Head commit (617af3c) 114105 (+29558) 66563 (+5860) 58.33% (-13.46%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#3398) 0 0 ∅ (not applicable)

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings    Change summary preferences

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant