
Refactor the integration testing code #549

Open: wants to merge 4 commits into main

Conversation

@YehorBoiar (Contributor)

This PR addresses issue #523.

@YehorBoiar marked this pull request as draft December 7, 2024 19:36
@YehorBoiar marked this pull request as ready for review December 7, 2024 19:39
@YehorBoiar (Contributor, Author)

@ozgurakgun Please review

@niklasdewally (Collaborator)

Thank you for improving the docs as part of this :)

@niklasdewally (Collaborator) left a comment

LGTM, aside from some minor nitpicks!

3 resolved review comments on conjure_oxide/tests/generated_tests.rs (outdated)
@niklasdewally linked an issue Dec 9, 2024 that may be closed by this pull request

In PR #421, tests were unintentionally passing due to expected files
being overwritten regardless of whether the generated solutions matched
Conjure’s solutions.

This commit refactors the integration_test_inner function and related
file I/O utilities so that:

* When ACCEPT=false, tests still compare generated outputs (parse,
  rewrite, solutions, and traces) against the expected results as
  before.

* When ACCEPT=true, tests now first verify that the generated solutions
  match the Conjure solutions. Only if they match are the expected files
  updated. If they don’t match, the test fails without updating the
  expected files.
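
A rough sketch of this accept-gated update (hypothetical helper and field names, assuming the solution files are JSON handled with serde_json; not the actual tester code):

use std::path::Path;

fn check_solutions(
    accept: bool,
    generated: &serde_json::Value,
    conjure: &serde_json::Value,
    expected_file: &Path,
) -> Result<(), Box<dyn std::error::Error>> {
    if accept {
        // ACCEPT=true: only update the expected file if the generated solutions
        // agree with Conjure's solutions; otherwise fail without writing.
        if generated != conjure {
            return Err("generated solutions do not match Conjure's solutions".into());
        }
        std::fs::write(expected_file, serde_json::to_string_pretty(generated)?)?;
    } else {
        // ACCEPT=false: compare against the stored expected results, as before.
        let expected: serde_json::Value =
            serde_json::from_str(&std::fs::read_to_string(expected_file)?)?;
        assert_eq!(generated, &expected);
    }
    Ok(())
}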
github-actions bot (Contributor) commented Dec 9, 2024

Code and Documentation Coverage Report

Documentation Coverage

This PR:

conjure_core: 4% with examples, 49% documented -- 8/98/223
conjure_oxide: 0% with examples, 22% documented -- 0/7/35
minion_rs: 30% with examples, 82% documented -- 2/23/112
tree-morph: 17% with examples, 50% documented -- 1/4/10

Main:

conjure_core: 4% with examples, 49% documented -- 8/98/223
conjure_oxide: 0% with examples, 19% documented -- 0/4/35
minion_rs: 30% with examples, 82% documented -- 2/23/112
tree-morph: 17% with examples, 50% documented -- 1/4/10

View full documentation coverage for main, this PR

Code Coverage Summary

This PR: Detailed Report

  lines......: 76.9% (5131 of 6669 lines)
  functions..: 61.7% (448 of 726 functions)
  branches...: no data found

Main: Detailed Report

  lines......: 76.7% (5145 of 6704 lines)
  functions..: 61.7% (448 of 726 functions)
  branches...: no data found

Main & PR Coverage Change

Lines coverage changed by 0.20% and covered lines changed by -14
Functions coverage changed by 0.00% and covered lines changed by 0
Branches... coverage: No comparison data available

@ozgurakgun (Contributor)

I am not sure of this one yet @YehorBoiar - I feel we can tidy this up a bit more, but I need to spend a bit more time formulating my thoughts. Thanks for the work so far!

@niklasdewally (Collaborator) commented Dec 11, 2024

@ozgurakgun @YehorBoiar

Maybe we need to think deeper about how we customise each stage of the tester.

Currently we have the following steps

  1. Parse
  2. Rewrite
  3. Solve

(thanks Yehor for documenting this)

We could think about all the other stuff the tester does as stages too:

1a. Parse
1b. Check against other parser(s)
3a. Rewrite
3c. Check model properties (extra_asserts)
4a. Solve
4b. Check solutions against Conjure

These steps fall into two categories:

  • Optional steps, enabled by env var / configuration value: 1b, 3c, 4b
  • Required steps with multiple concrete implementations: e.g. native parser vs json parser, naive rewriter vs optimising one.

Thinking of everything as steps would make configuration simpler. For each optional step, we would parse a config.toml value enable_<step>. For each required step, we would keep a list of possible implementations, which could be changed in config by <step>_impl.

@niklasdewally (Collaborator) commented Dec 11, 2024

It would also be nice to have parser-only tests: e.g. for treesitter development, or for when I'm adding more complicated stuff to the AST. Also, no-solver tests so I can test normalisation rule PRs.

A possible approach for the above:

  1. Parse test configuration / What are we testing? At the start of testing, we read config.toml and the environment variables, and construct a list of steps to perform.
  2. Run tests: we run each step in the list, erroring if it fails.
  3. Update tests: if accept=true, we run some custom on_accept() function for each step in our list.

This is a vague, high level proposal, so I'm happy to give a more concrete example in Rust.
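
A very rough sketch of that shape (all names hypothetical; the step list itself would be assembled from config.toml and the environment in step 1):

struct TestContext {
    accept: bool,
    // parsed model, rewritten model, solutions, traces, ... would live here
}

struct Step {
    name: &'static str,
    // Run the step, returning an error if it fails.
    run: fn(&mut TestContext) -> Result<(), String>,
    // What this step does when ACCEPT=true (e.g. overwrite its expected files).
    on_accept: fn(&TestContext) -> Result<(), String>,
}

fn run_steps(steps: &[Step], ctx: &mut TestContext) -> Result<(), String> {
    // 2. Run tests: run each step in the list, erroring if it fails.
    for step in steps {
        (step.run)(ctx).map_err(|e| format!("{} failed: {e}", step.name))?;
    }
    // 3. Update tests: if accept=true, run each step's on_accept().
    if ctx.accept {
        for step in steps {
            (step.on_accept)(ctx)?;
        }
    }
    Ok(())
}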

@ozgurakgun (Contributor)

(1) The steps have a similar shape to each other, so maybe a step abstraction that takes in a function and implements the generate/check-against/accept logic once is closer to what I am thinking.

(2) We can also think about a global properties/assertions/warnings structure that is enabled/disabled depending on what we are looking for in a particular run. An example is the prop_ we added in the naive rewriter. A test run or an executable run of oxide should be able to enable/disable different properties at different levels. Consider it an error/warning/trace etc.

I have been looking into the existing tracing libraries for (2) but didn't manage to find one that does everything I think we need. I think we need named properties, registered centrally (maybe through codegen), and at the call site we decide how to interpret each property. (1) is this PR, (2) is a separate but related one...

@niklasdewally (Collaborator) commented Dec 13, 2024

(2) We can also think about a global properties/assertions/warnings structure that is enabled/disabled depending on what we are looking for in a particular run. An example is the prop_ we added in the naive rewriter. A test run or an executable run of oxide should be able to enable/disable different properties at different levels. Consider it an error/warning/trace etc.

To clarify, these need to be dynamic, not static? If build time is fine, feature flags and simple declarative macros could take us most of the way there...

Can we store these flags in the Context? Context already acts as a global variable store, accessible by the entire program, which is initialised by the caller of the Conjure Oxide API (be that the tester, CLI, or user crates): this sounds like it fits the bill?

As we have serde, it would be easy to set these flags through a toml file instead of the command line. And we already read toml for the tests anyway, so we could do:

# config.toml
[test]
# test settings here
rewriter = "naive"

[debug_flags]
# conjure oxide debug settings here
debug_check_only_one_applicable_rule = true

For example, we could parse the debug_flags section into a DebugFlags struct which is part of our Context.
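
A minimal sketch of that, assuming serde's derive feature and the toml crate (struct and field names are illustrative, not the real conjure_oxide types):

use serde::Deserialize;

#[derive(Debug, Clone, Default, Deserialize)]
#[serde(default)]
pub struct DebugFlags {
    pub debug_check_only_one_applicable_rule: bool,
}

#[derive(Debug, Clone, Default, Deserialize)]
#[serde(default)]
pub struct TestSettings {
    pub rewriter: Option<String>,
}

// Mirrors the [test] and [debug_flags] tables of config.toml; missing tables
// or keys fall back to the derived defaults.
#[derive(Debug, Clone, Default, Deserialize)]
#[serde(default)]
pub struct TestFileConfig {
    pub test: TestSettings,
    pub debug_flags: DebugFlags,
}

fn parse_config(contents: &str) -> Result<TestFileConfig, toml::de::Error> {
    toml::from_str(contents)
}

The DebugFlags value could then be copied into (or owned by) the Context when it is constructed.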

@niklasdewally (Collaborator) commented Dec 13, 2024

Moving our existing properties to the context and making them serialisable from the test toml should be possible immediately without too much effort.

Adding macros / a nicer API would hopefully then go on top of our existing stuff without much large-scale refactoring.

@ozgurakgun (Contributor)

dynamic as in runtime instead of compile time, yes.

I expect we won't change this during a run, but I guess it should be possible. A kind of debugging option where we increase the message levels during a run could be useful.

putting them in context is fine.

github-actions bot (Contributor)

The pull request is marked as stale because it has been inactive for 30 days. If there is no new activity it will be automatically closed in 5 days.

@github-actions bot added the Stale label Jan 13, 2025
@github-actions bot closed this Jan 19, 2025
@YehorBoiar reopened this Jan 23, 2025
@YehorBoiar added the kind::refactor and kind::testing labels and removed the Stale label Jan 23, 2025
@YehorBoiar (Contributor, Author)

  • naive rewriter vs optimising one

We have 2 rewriters @niklasdewally?

@niklasdewally (Collaborator)

@YehorBoiar see #528.

I thought I pinged you in this last semester - apologies if you didn't get it!

@YehorBoiar (Contributor, Author)

There are a lot of things that we want to change in generated_tests. I decided to break down the implementation of those changes into steps. I want you @ozgurakgun @niklasdewally to check whether these would cover everything we want to change.

1. Break our general stages 1, 2, 3 into substages.

1a. Parse (Convert the Essence model into an internal format.)
1b. Check against other parser(s) (Compare output from different parsing methods (e.g., native parser vs JSON parser).)
2a. Rewrite (Apply transformations to the parsed model (e.g., simplifying expressions, normalizing).)
2b. Check model properties (extra_asserts) (Verify additional model properties (e.g., ensure vector operators are evaluated).)
3a. Solve (Run the model through the solver to find solutions.)
3b. Check solutions against Conjure (Compare solutions with expected output from the Conjure solver.)

There are required stages: 1a, 2a, 3a - each has multiple implementations. We want to choose the implementation in config.toml, e.g.

parser_impl = "native"
rewriter_impl = "optimizing"

And non-required stages we would choose by writing enable_<step>, e.g.

enable_1b = true
enable_2b = true
enable_3b = false

All I need to do here is perhaps add more variables to TestConfig

struct TestConfig {
    extra_rewriter_asserts: Option<Vec<String>>,
    // skip_native_parser: Option<bool>,
    // use_naive_rewriter: Option<bool>,
    enable: Option<Vec<String>>, // Perhaps we can just specify all optional stages in a vector of strings instead of doing `enable_<step>`? @niklasdewally
    r#impl: Option<Vec<String>>, // Same here? (`impl` is a Rust keyword, so the field needs the raw identifier `r#impl` or a serde rename)
    // I think we wouldn't need skip_native_parser and use_naive_rewriter once we introduce `enable` and `impl` variables
}

As suggested by Nick, we will iterate through the enabled stages, executing only those steps, erroring if they fail, and rewriting the expected files if accept=true.

2. Step Abstraction

After ensuring that the implementation of step 1 works, we can make an abstraction for those 3 main steps. Each currently has the following structure:

  1. Generate something (parse model, rewrite model, solve).
  2. Check output against expected results (compare parsed model, rewritten model, or solutions).
  3. Accept new results if ACCEPT=true (update expected files).
    We can create a function that takes a function as a parameter - use that function to generate the result we need, check it against the expected results, and then check whether we want to rewrite (accept=true). A rough sketch follows below.
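
A sketch of such a helper (hypothetical names, assuming the expected files are JSON handled with serde_json):

use std::path::Path;

fn run_stage<T, F>(
    stage_name: &str,
    expected_file: &Path,
    accept: bool,
    generate: F,
) -> Result<(), Box<dyn std::error::Error>>
where
    T: serde::Serialize + serde::de::DeserializeOwned + PartialEq + std::fmt::Debug,
    F: FnOnce() -> Result<T, Box<dyn std::error::Error>>,
{
    // 1. Generate something (parse model, rewrite model, solve).
    let generated = generate()?;

    if accept {
        // 3. Accept new results if ACCEPT=true (update expected files).
        std::fs::write(expected_file, serde_json::to_string_pretty(&generated)?)?;
    } else {
        // 2. Check output against expected results.
        let expected: T = serde_json::from_str(&std::fs::read_to_string(expected_file)?)?;
        assert_eq!(generated, expected, "{stage_name}: output differs from expected");
    }
    Ok(())
}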

3. Global Properties/Assertions/Warnings

I don't understand what we want here... @ozgurakgun

(2) We can also think about a global properties/assertions/warnings structure that is enabled/disabled depending on what we are looking for in a particular run. An example is the prop_ we added in the naive rewriter. A test run or an executable run of oxide should be able to enable/disable different properties at different levels. Consider it an error/warning/trace etc.

I have been looking into the existing tracing libraries for (2) but didn't manage to find one that does everything I think we need. I think we need named properties, registered centrally (maybe through codegen), and at the call site we decide how to interpret each property. (1) is this PR, (2) is a separate but related one...

@niklasdewally (Collaborator) commented Jan 23, 2025

I don't understand what we want here...

An example is our check for multiple applicable rules. This is currently set using an argument passed into the rewriter, but when we have a lot of them, managing them in our context object / using a library might be better.

As mentioned above, it would be useful to turn these on and off for any given run (e.g. through the config file)

See #609 and #591.

I will read the rest later :)

@niklasdewally (Collaborator) commented Jan 23, 2025

@YehorBoiar overall, the plan lgtm.

In case you didn't see this - #523 (comment).

We could potentially benefit from writing our own custom test runner so that we don't have to keep hacking around Rust's built-in one. This would also allow us to support CLI arguments.
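
For example, the libtest-mimic crate lets a plain main() act as the harness while keeping the usual cargo test CLI (filters, --nocapture, ...); a rough sketch with hypothetical helper names:

use libtest_mimic::{Arguments, Failed, Trial};

fn main() {
    // Parse the standard libtest-style CLI arguments; our own flags could be
    // read from the environment or a config file alongside this.
    let args = Arguments::from_args();

    // One Trial per discovered integration test directory.
    let tests: Vec<Trial> = discover_test_dirs()
        .into_iter()
        .map(|dir| Trial::test(dir.clone(), move || run_integration_test(&dir)))
        .collect();

    libtest_mimic::run(&args, tests).exit();
}

// Placeholder: walk the integration test tree and return one name per test case.
fn discover_test_dirs() -> Vec<String> {
    vec!["basic/div".to_string()]
}

// Placeholder: the existing per-test logic (parse, rewrite, solve, compare).
fn run_integration_test(_dir: &str) -> Result<(), Failed> {
    Ok(())
}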

@YehorBoiar (Contributor, Author)

Should we have default stages that run when config.toml is not specified? If so, should we run only 1a, 2a, 3a?

1a: Parse (Convert the Essence model into an internal format.)
1b: Check against other parser(s) (Compare output from different parsing methods (e.g., native parser vs JSON parser).)
2a: Rewrite (Apply transformations to the parsed model (e.g., simplifying expressions, normalizing).)
2b: Check model properties (extra_asserts) (Verify additional model properties (e.g., ensure vector operators are evaluated).)
3a: Solve (Run the model through the solver to find solutions.)
3b: Check solutions against Conjure (Compare solutions with expected output from the Conjure solver.)
4a: Check that the generated rules match expected traces
    disable_1a: Option<bool>,
    enable_1b: Option<bool>,
    disable_2a: Option<bool>,
    enable_2b: Option<bool>,
    disable_3a: Option<bool>,
    enable_3b: Option<bool>,
    disable_4a: Option<bool>,
@YehorBoiar (Contributor, Author) left a review comment

@niklasdewally @ozgurakgun I made it so main stages have to be explicitly disabled and secondary stages explicitly enabled. Does that make sense, or would it be a bit confusing for others?

@niklasdewally (Collaborator) commented Jan 25, 2025 via email

It might be clearer to have names here instead of (or as well as) the numbers. Also, for consistency, having all settings be opt-in (e.g. run_x, enable_x) instead of a mixture of opt-in and opt-out would be better.

@YehorBoiar (Contributor, Author)

It might be clearer to have names here instead of (or as well as) the numbers. Also, for consistency, having all settings be opt-in (e.g. run_x, enable_x) instead of a mixture of opt-in and opt-out would be better.


Yeah, but then we would compare solutions against conjure by default, which is not very nice.

@niklasdewally (Collaborator)

It might be clearer to have names here instead of (or as well as) the numbers. Also, for consistency, having all settings be opt-in (e.g. run_x, enable_x) instead of a mixture of opt-in and opt-out would be better.


Yeah, but then we would compare solutions against conjure by default, which is not very nice

We can have fields default to false but still be worded as opt in.

@niklasdewally (Collaborator) commented Jan 25, 2025

Is this branch up to date with main? I cleaned the settings code up a bit since this PR was opened: primarily, default options handling and setting names.

Also, I changed the types of the settings to be bool, not Option<bool>.

@YehorBoiar (Contributor, Author) commented Jan 25, 2025

It might be clearer to have names here instead of (or as well as) the numbers. Also, for consistency, having all settings be opt-in (e.g. run_x, enable_x) instead of a mixture of opt-in and opt-out would be better.


Yeah, but then we would compare solutions against conjure by default, which is not very nice

We can have fields default to false but still be worded as opt in.

Like conjure_check: Option<bool> = false;?

@niklasdewally (Collaborator) commented Jan 25, 2025

I think verb_noun is clearest: e.g. enable_conjure_checks, check_solutions_against_conjure

Also see #560.
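
Putting the naming and default-value points together, a hypothetical sketch (field names illustrative, assuming serde's derive):

use serde::Deserialize;

// Opt-in flags worded as verb_noun, stored as plain bools that default to
// false when the key is absent from config.toml.
#[derive(Debug, Clone, Default, Deserialize)]
#[serde(default)]
pub struct TestConfig {
    pub check_solutions_against_conjure: bool,
    pub enable_native_parser_check: bool,
    pub run_extra_rewriter_asserts: bool,
}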

@YehorBoiar (Contributor, Author)

I see! I should probably rebase on main branch in this case

@niklasdewally (Collaborator) commented Jan 25, 2025

I see! I should probably rebase on main branch in this case

I'm inclined to think that we should add these changes as part of a larger rewrite of the integration tester to use our own test harness, as linked above.

There are a few other bugbears and desired features that will be hard to hack onto the existing system, and a rewrite might be needed.

See the issues linked above and the PR linked in them.

Might be worth holding back on this until we discuss it in a group meeting with Soph and Liz as well, and gather a more complete list of requirements for a refactor/rewrite.

Cc @ozgurakgun

@niklasdewally (Collaborator)

In particular, Oz wanted CLI arguments for each config argument, and that might make this PR redundant.

Labels
kind::refactor: Improvements to existing code (style, performance, clarity, ...)
kind::testing: Testing and Correctness
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refactor the integration testing code
3 participants