Collect and print test data statistics #42
I'm not sure what a good UI for this would look like. Take this example:

```elixir
check all list <- list_of(integer(), min_length: 1),
          elem <- member_of(list) do
  assert elem in list
end
```

How would you print statistics for this? Would you print what was the distribution of

Furthermore, would printing after the property executes be a really good visual interface? You'd have a bunch of dots for the tests you're running, then this blob of info about a property, then more dots, and so on. Thoughts?
---
The kind of relevant statistics usually depends on the properties or on the system under test. In your example, I would be interested in the length of the list:

```elixir
check all list <- list_of(integer(), min_length: 1),
          elem <- member_of(list) do
  assert elem in list
  aggregate length(list)
end
```

It shows the distribution of the list lengths, in order to answer the question whether enough and relevant data is generated for the tests. If we do the same for maps and we see that the generated maps have less than 32 elements, then this would indicate that relevant parts of the map implementation are not reached (if I remember right, maps are implemented as a 32-way trie or similar). It becomes even more relevant if we generate recursive data structures. Let us generate a tree with

Regarding the UI: from my perspective, the statistics are primarily required while constructing the properties and fixing the bugs. Often, you will only run one property at a time. In this case, it is more than OK to have the statistics printed after the property log. If you run the properties in a longer test run, they either become as silent as a test log output (I don't want to see all the dots) or they appear in the (CI) test run log output, where they do no harm. In PropEr you can configure your own statistics writer, which may help to do specific formatting or similar. It might even be useful to generate HTML or Markdown tables, as is usually done for code coverage reports. What I like in

Real-world examples and blog posts on these features are here:
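For illustration, the kind of report an `aggregate length(list)` could produce boils down to a frequency count over the collected values, printed as percentages, similar in spirit to PropEr's output. A minimal, framework-free sketch (the `StatsSketch` module and `report/1` helper are hypothetical names, not part of any library):

```elixir
# Hypothetical sketch: collect one measurement per property run,
# then report the distribution of values as percentages.
defmodule StatsSketch do
  # lengths: one entry per run, e.g. length(list)
  def report(lengths) do
    total = length(lengths)

    lengths
    |> Enum.frequencies()
    |> Enum.sort()
    |> Enum.map(fn {value, count} -> {value, 100 * count / total} end)
  end
end

# Example: 4 runs generated lists of lengths 1, 1, 2, 5
StatsSketch.report([1, 1, 2, 5])
#=> [{1, 50.0}, {2, 25.0}, {5, 25.0}]
```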
---
As I see it right now we have two routes to explore:
---
I prefer a functional approach, but without macros, if feasible. If you look at the F# examples (which are quite similar to PropCheck and PropEr), then we could enrich the last

```elixir
check_all(list_of(integer()), [initial_seed: :os.timestamp()], fn list ->
  assert length(list) >= 0
  |> aggregate("length of list", length(list))
  |> classify("trivial", length(list) == 0)
  |> ok()
end)
```

Along this road, I would go for this: In

The stateful approach looks nice at first glance, and your thoughts about "debugging" are more than true. But starting an explicit aggregation server for such a task feels alien (and imperative). Providing a global server would require identifying each (concurrent) property execution automatically, which could destroy the elegance of your example. What do you think?
---
To be honest, I am not convinced by any of these. My favorite approach would probably be:

```elixir
aggregate all list <- list_of(integer()) do
  %{list_length: length(list)}
end
```

but it is unclear how feasible it is in practice to replace the

Annotating the assertion would be too foreign for Elixir, unfortunately, since it requires the block to return the assertion, and that's generally not how ExUnit assertions work.
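For illustration, the per-run maps returned by an `aggregate all` block like the one above could be folded into per-key frequency counts by the runner. A minimal sketch under that assumption (the `AggregateSketch` module and `merge_stats/1` are hypothetical, not part of stream_data):

```elixir
# Hypothetical sketch: each run returns a map of measurements such as
# %{list_length: 3}; the runner folds them into per-key frequency counts.
defmodule AggregateSketch do
  def merge_stats(per_run_maps) do
    Enum.reduce(per_run_maps, %{}, fn run, acc ->
      Enum.reduce(run, acc, fn {key, value}, acc ->
        # Count how often each value occurred for each measurement key.
        Map.update(acc, key, %{value => 1}, fn counts ->
          Map.update(counts, value, 1, &(&1 + 1))
        end)
      end)
    end)
  end
end

AggregateSketch.merge_stats([
  %{list_length: 1},
  %{list_length: 1},
  %{list_length: 4}
])
#=> %{list_length: %{1 => 2, 4 => 1}}
```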
---
This would not work if you want to use several aggregations at the same time, which is usually the case when you classify the generated data, e.g. into buckets like

I do not understand your comment regarding annotating the assertion. What I meant is to take the result of the (executed) assertion (i.e.

```elixir
property "non negative list length" do
  check all list <- list_of(integer()) do
    assert length(list) >= 0
    |> aggregate("list length", length(list))
  end
end
```
---
I see, thanks!
---
@josevalim The aggregators could simply start from nothing and build a list of aggregations in their pipeline. Same approach, but one parameter less.
---
@alfert returning

That doesn't mean that if we can do something good we should avoid it just because other frameworks avoid it :) In general, I think your approach is a bit far from the Elixir language and too close to FsCheck (as far as I know FsCheck). It feels very foreign in Elixir to me.
---
@whatyouhide Properties are by their very nature boolean statements. But the

Did I get you right that you would prefer an approach like

```elixir
pipeline :browser do
  plug :accepts, ["html"]
  plug :fetch_session
  plug :fetch_flash
  plug :protect_from_forgery
  plug :put_secure_browser_headers
  plug HelloWeb.Plugs.Locale, "en"
end
```

Applying this idea, we could define a

```elixir
check_all(list_of(integer()), [initial_seed: :os.timestamp()], fn list ->
  assert length(list) >= 0
  stats aggregate("length of list", length(list))
  stats classify("trivial", length(list) == 0)
  stats classify("huge list", length(list) > 50)
end)
```

or perhaps even more in the

```elixir
check_all(list_of(integer()), [initial_seed: :os.timestamp()], fn list ->
  assert length(list) >= 0
  stats :aggregate, "length of list", length(list)
  stats :classify, "trivial", length(list) == 0
  stats :classify, "huge list", length(list) > 50
end)
```

Visually, we replaced
---
I meant to use an
---
@whatyouhide Could you please rewrite my last example in your way? I am not sure how I would have to formulate it properly. Perhaps I am thinking in a completely wrong direction. Thanks!
---
Sorry, I thought you were talking about

```elixir
check all list <- list_of(integer()) do
  aggregate "length of list", length(list)
  classify "trivial", length(list) == 0
  classify "huge list", length(list) > 50
  assert length(list) >= 0
end
```

or something of the sort.
---
OK, so if the property fails, we also have the stats for the failing property run. This is good! I think that
---
Yes, they have to be macros. For what it's worth,
---
Correct, but when assert raises, the following commands are not executed. So, unless you reorder the statements in the property as part of the implementation, their order is significant. But it is good style to collect the stats first and then do the checking. The classical implementation strategy, where stats collection is a wrapper around the property, does not apply here.
---
Right, I was missing that, sorry. So yeah, we might have to do the aggregations as early as possible in the test.
---
If you want to collect them as soon as possible, you can collect stats before the do block. Like this:

```elixir
property "something" do
  check all x <- gen1(), y <- gen2(), z <- gen3(),
            aggregate: [
              key1: stat1(x),
              key2: stat2(x),
              key3: stat3(y)
            ] do
    assert_something(x, y, z)
  end
end
```

You get a nice AST out of this, of course:

```elixir
{:property, [],
 ["something",
  [do: {:check, [],
    [{:all, [],
      [{:<-, [], [{:x, [], Elixir}, {:gen1, [], []}]},
       {:<-, [], [{:y, [], Elixir}, {:gen2, [], []}]},
       {:<-, [], [{:z, [], Elixir}, {:gen3, [], []}]},
       [aggregate: [key1: {:stat1, [], [{:x, [], Elixir}]},
                    key2: {:stat2, [], [{:x, [], Elixir}]},
                    key3: {:stat3, [], [{:y, [], Elixir}]}]]]},
     [do: {:assert_something, [],
       [{:x, [], Elixir}, {:y, [], Elixir}, {:z, [], Elixir}]}]]}]]}
```

The main problem here is that it requires you to gather the statistics on the "raw" values produced by the generators. You might be more interested in some intermediate values. That's what I believe to be the hardest problem: collecting statistics as late as possible. One way to gather statistics whenever you want is to gather them in the process dictionary and read them from there after the test case has run. If you use the process dictionary, you don't even need macros; functions would be good enough. It's probably better than gathering them in a different process, at least in this case. For raw generators (not properties), you can just tell the user to wrap it in a
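A minimal sketch of the process-dictionary idea described above: plain functions, no macros, with the runner reading the stats back after each run. The module and function names (`PdictStats`, `aggregate/2`, `flush/0`) are illustrative, not an actual stream_data API:

```elixir
# Hypothetical sketch: stash measurements in the process dictionary
# under a known key; the property runner collects them afterwards.
defmodule PdictStats do
  @key :stream_data_stats

  # Can be called anywhere in the property body, even on derived values.
  def aggregate(label, value) do
    entries = Process.get(@key, [])
    Process.put(@key, [{label, value} | entries])
    :ok
  end

  # Called by the runner after each run: returns the collected stats
  # in insertion order and resets the store.
  def flush do
    entries = Process.get(@key, [])
    Process.delete(@key)
    Enum.reverse(entries)
  end
end

PdictStats.aggregate("list length", 3)
PdictStats.aggregate("first elem", 42)
PdictStats.flush()
#=> [{"list length", 3}, {"first elem", 42}]
```

Because the process dictionary is per-process, concurrent property executions would not interfere with each other, which sidesteps the identification problem a global aggregation server would have.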
---
Is there some progress on this? If not, can I or someone else help to make this a thing?
---
In QuickCheck and PropEr, you can add functions to a property which collect statistical data about the generated data, or add labels for the classification of generated data. This information is printed on the screen after executing a property. It helps to assess the quality of the generators and is useful for analysing whether relevant test cases/branches are exercised by the generated data.

At least in PropEr, this is a feature that requires support from the property execution function, since the result of a property may contain the statistical data. To calculate the statistics, the result of a property must be fed into accumulation functions. The accumulator is printed after executing the property.