Some skeleton tests are extremely slow on Windows #331
I store the execution time of each generation in separate log files.
So here are some results (some tests never finished):
$ sort -n <(for f in $(ls *_*.log); do cat $f | grep -o -E '\([1-9]+.*\)' | tr -d '()' ; done) | column
1.0123708 1.2061186 1.398478 1.7313844 3.545979 6.2762644 19.1019572
1.0175466 1.2140428 1.3999757 1.7506811 3.6755543 6.320666 19.8422225
1.0207795 1.2177153 1.4003677 1.8222493 3.7104511 6.3240929 19.8937025
1.0279499 1.2271273 1.4299672 1.8373235 3.9307811 6.461851 20.5925374
1.0347854 1.2375833 1.4381695 1.871309 4.1462893 6.5952994 20.6457992
1.0506582 1.2478265 1.4416772 2.1121065 4.2338955 6.605284 20.8574713
1.0710235 1.2577638 1.4619963 2.1282506 4.9962177 6.8190486 21.0589518
1.092179 1.2784787 1.4866995 2.1299754 5.11254 7.4957975 21.3088674
1.0995379 1.2806526 1.5073277 2.2660121 5.2844923 7.6238291 21.6054515
1.1423733 1.2822474 1.5266376 2.3043965 5.3296963 7.642483 28.5312305
1.156332 1.2883214 1.5412814 2.3535283 5.4675259 7.8018155 132.8422803
1.162659 1.2978251 1.5423705 2.4174622 5.4926532 9.3340122 265.9861263
1.1639108 1.3222313 1.5683961 2.4445399 5.5172051 9.52987 491.276471
1.1711366 1.3363735 1.568921 2.7039793 5.5980776 9.7681643 492.4069617
1.1730053 1.3391551 1.5843521 2.7298563 5.6159203 9.9542416 3828.0381905
1.1743305 1.3530512 1.6159905 2.7907898 5.6598511 10.7178018 3838.0232887
1.1799193 1.3593102 1.6267786 3.1749009 5.9794209 13.0525341
1.1804136 1.3800682 1.6795056 3.2259422 6.0503126 13.2396031
1.1934251 1.3883636 1.7041505 3.3845808 6.1115797 15.0390006
1.2053042 1.3914672 1.7158329 3.5315845 6.2747898 19.0428656 |
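The pipeline above pulls every parenthesized duration that starts with a nonzero digit (one second or more) out of the logs and sorts the values. A rough Python equivalent is sketched below. The log line format is only inferred from the grep pattern, and the function name `slow_timings` and the sample lines are hypothetical.

```python
import re

# Matches "(1.23...)" where the first digit is 1-9, mirroring the grep
# pattern '\([1-9]+.*\)' from the shell pipeline. Unlike grep's greedy
# '.*', it requires the whole parenthesized text to be a number.
TIMING = re.compile(r"\(([1-9][0-9]*(?:\.[0-9]+)?)\)")

def slow_timings(lines):
    """Return all parenthesized durations >= 1 second, sorted ascending."""
    found = []
    for line in lines:
        for match in TIMING.finditer(line):
            found.append(float(match.group(1)))
    return sorted(found)

# Hypothetical log lines; the real log format is not shown in the thread.
sample = [
    "bug128.re ... OK (19.1019572)",
    "cond.re ... OK (0.5123)",
    "http_rfc7230.re ... OK (3828.0381905)",
]
print(slow_timings(sample))
```

Sub-second entries such as `(0.5123)` are skipped, which is why only the slow generations show up in the listing.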
This is very interesting! A few questions off the top of my head:
|
One more in that direction: skeleton-specific output differs from normal output in that it is binary ( |
As I said before, I'm currently developing skeleton validation support. Regular tests are not ready, so I cannot measure programmatically. Calling the following command by hand:
$Executable = "C:\src\re2c\cmake-build-debug-visual-studio-x64\Debug\re2c.exe"
$Options = "-ib -W --no-version --no-generation-date --skeleton -Werror-undefined-control-flow"
$IO = "bug1708378.re -o bug1708378.c"
$Command = "$Executable $IO $Options"
Remove-Item bug1708378.c* -Force
Measure-Command { Invoke-Expression $Command | Out-Null }
generates output for 13 seconds:
File sizes are (in bytes):
Note that this is a manual run without any wrappers: re2c is called directly from the terminal. Without --skeleton:
- $Options = "-ib -W --no-version --no-generation-date --skeleton -Werror-undefined-control-flow"
+ $Options = "-ib -W --no-version --no-generation-date -Werror-undefined-control-flow"
Only 2 seconds (6 times faster):
File sizes are (in bytes):
By the way, despite a 6x run time improvement, 2 seconds for |
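The manual measurement above wraps a single command in `Measure-Command`. For comparing the same run across platforms, a minimal Python sketch of that idea could look like the following. The helper name `time_command` is hypothetical, and the stand-in command only exists so the sketch runs anywhere; it is not part of re2c or its test suite.

```python
import subprocess
import sys
import time

def time_command(argv):
    """Run argv, discard its output, and return (exit_code, seconds)."""
    start = time.perf_counter()
    proc = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return proc.returncode, time.perf_counter() - start

# Stand-in command so the sketch is runnable; substitute the re2c
# executable and the options from the comment above to reproduce it.
code, seconds = time_command([sys.executable, "-c", "pass"])
print(f"exit code {code}, {seconds:.3f} s")
```

Passing the arguments as a list avoids shell quoting differences between PowerShell and POSIX shells.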
Ok, so the files are generated properly and have reasonable size.
Correct, this is way too much time for these tests.
I don't think that it has to do with the hardware. Judging from the tests, I think the slowdown is caused by the way re2c is writing the output. In case of skeleton tests, re2c makes many This is only a guess, but it is the first thing that I'd check. re2c needs a |
Yet another generation I managed to measure:
$Executable = "C:\src\re2c\cmake-build-debug-visual-studio-x64\Debug\re2c.exe"
$Options = "-i -W --no-version --no-generation-date -Werror-undefined-control-flow"
$IO = "c\submatch\http_rfc7230.re -o c\submatch\http_rfc7230.c"
$Command = "$Executable $IO $Options"
Remove-Item http_rfc7230_skeleton.c* -Force
Measure-Command { Invoke-Expression $Command | Out-Null }
Only 2 seconds:
The same but with skeleton:
- $Options = "-i -W --no-version --no-generation-date -Werror-undefined-control-flow"
+ $Options = "-i -W --no-version --no-generation-date --skeleton -Werror-undefined-control-flow"
It lasted forever:
File sizes are (in bytes):
|
Interesting, this one has approximately the same size of the largest skeleton file If you have spare cycles, can you make an experiment and try https://github.com/skvadrik/re2c/blob/master/benchmarks/http/rfc7230/http_rfc7230_notags.re instead of But it may be easier to use Windows profiling tools (I've heard of ETW). |
It is noteworthy that generation on Windows using Mingw is much faster than using Visual Studio. Without --skeleton:
$Executable = "C:\src\re2c\cmake-build-debug-mingw\re2c.exe"
$Options = "-i -W --no-version --no-generation-date -Werror-undefined-control-flow"
$IO = "http_rfc7230.re -o http_rfc7230_mingw.c"
$Command = "$Executable $IO $Options"
Measure-Command { Invoke-Expression $Command | Out-Null }
it takes 588 milliseconds (4.4x faster than using Visual Studio):
File sizes are (in bytes):
With --skeleton:
- $Options = "-i -W --no-version --no-generation-date -Werror-undefined-control-flow"
+ $Options = "-i -W --no-version --no-generation-date --skeleton -Werror-undefined-control-flow"
- $IO = "http_rfc7230.re -o http_rfc7230_mingw.c"
+ $IO = "http_rfc7230.re -o http_rfc7230_mingw_skeleton.c"
it takes about 187 seconds (20.8x faster than using Visual Studio):
File sizes are (in bytes):
re2c was built with g++ v8.1.0 and the following CMake options:
|
Mingw x64
Visual Studio 2019 x64
|
@skvadrik Does this measurement show tags impact for slowdown? |
@sergeyklay Without tags it is definitely faster (with skeleton: Mingw ~140x, VS ~470x; without skeleton: Mingw ~3x, VS: ~2x). This is aligned with the behaviour on Linux: without tags it takes 100ms, and with tags 21s (~210x). Perf tells me that 98% of time is spent in
So this is a non-optimized build. Perhaps this explains the slowdown, as well as the difference between VS and Mingw (at zero levels of optimizations they both generate inefficient code, but with different level of inefficiency). How does an optimized build behave? |
| | without --skeleton | with --skeleton |
|---|---|---|
| Generation Time | 146.4076 ms | 431.4192 ms |
| Size http_rfc7230_notags.c | 147943 B | 230746 B |
| Size http_rfc7230_notags.c.line162.input | 0 B | 28730721 B |
| Size http_rfc7230_notags.c.line162.keys | 0 B | 3607848 B |
Visual Studio 2019 x64
| | without --skeleton | with --skeleton |
|---|---|---|
| Generation Time | 1064.8614 ms | 8330.2958 ms |
| Size http_rfc7230_notags.c | 147943 B | 230746 B |
| Size http_rfc7230_notags.c.line162.input | 0 B | 28730721 B |
| Size http_rfc7230_notags.c.line162.keys | 0 B | 3607848 B |
http_rfc7230.re
https://github.com/skvadrik/re2c/blob/master/examples/c/submatch/http_rfc7230.re
Mingw x64
| | without --skeleton | with --skeleton |
|---|---|---|
| Generation Time | 210.038 ms | 49896.8644 ms |
| Size http_rfc7230.c | 198597 B | 313746 B |
| Size http_rfc7230.c.line162.input | 0 B | 27890920 B |
| Size http_rfc7230.c.line162.keys | 0 B | 3769927 B |
Visual Studio 2019 x64
| | without --skeleton | with --skeleton |
|---|---|---|
| Generation Time | 2451.7808 ms | 3821611.5262 ms |
| Size http_rfc7230.c | 198597 B | 313746 B |
| Size http_rfc7230.c.line162.input | 0 B | 27890920 B |
| Size http_rfc7230.c.line162.keys | 0 B | 3769927 B |
CMake flags
Mingw x64
-DCMAKE_BUILD_TYPE=Debug
-DCMAKE_C_FLAGS=-O2
-DCMAKE_CXX_FLAGS=-O2
-DRE2C_BUILD_RE2GO=on
-DCMAKE_INSTALL_PREFIX=C:\src\re2c\install-mingw
-G "CodeBlocks - MinGW Makefiles"
Visual Studio 2019 x64
-DRE2C_BUILD_RE2GO=on
-DCMAKE_INSTALL_PREFIX="C:\src\re2c\install-msvc"
-DCMAKE_C_FLAGS=/O2
-DCMAKE_CXX_FLAGS=/O2
-G "Visual Studio 16 2019"
-A x64
Ok, so there is not much difference with optimizations, and it doesn't bridge the gap between VS and Mingw. I've had a closer look at why I managed to get about 4x speedup (21s -> 5s) by reordering some loops and reducing matrix size to As to why it is particularly slow on VS, I suppose it can be caused by a different allocator. |
Hmm.. Very interesting! I hope we can get the best result for VS on this field. Feel free to ping me for any test. Thank you for the research. |
Previously tag values were stored in vectors (even though most of the tags are s-tags and have only one element in the history). For tests with many tags this representation took too much space, and too much time was spent on vector initialization. Now all tag values are stored as scalars (32-bit integers). For s-tags the value is the offset. For m-tags the value is an index in the tag trie (a prefix tree of tag histories that is encoded in an array). This representation is faster and takes less space: it saves the space that was previously used for storing intermediate tag values of tag variables as vectors. This partially fixes #331.
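To illustrate the idea of a prefix tree of tag histories encoded in an array, here is a minimal Python sketch. It is not re2c's actual code: the class `TagTrie`, the `ROOT` sentinel, and the method names are hypothetical, and Python integers stand in for the 32-bit scalars mentioned above.

```python
ROOT = -1  # sentinel index meaning "empty history"

class TagTrie:
    """Prefix tree of tag histories stored in two parallel lists."""

    def __init__(self):
        self.pred = []  # index of the parent node (the shorter history)
        self.elem = []  # offset appended at this node

    def push(self, history, offset):
        """Extend `history` (a node index) by `offset`; return the new index."""
        self.pred.append(history)
        self.elem.append(offset)
        return len(self.pred) - 1

    def unwind(self, history):
        """Recover the full list of offsets for a history index."""
        out = []
        while history != ROOT:
            out.append(self.elem[history])
            history = self.pred[history]
        return out[::-1]

trie = TagTrie()
a = trie.push(ROOT, 3)  # history [3]
b = trie.push(a, 7)     # history [3, 7]
c = trie.push(a, 9)     # history [3, 9], shares the prefix [3] with b
s_tag = 12              # an s-tag needs no trie: its value is the offset itself
print(trie.unwind(b), trie.unwind(c), s_tag)
```

Each m-tag value is a single integer index, so copying a tag value copies one scalar instead of a vector, and histories with a common prefix share their nodes.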
Wow! I'll check today and give you feedback. Thank you! |
Here are the benchmark results:
| | without --skeleton | with --skeleton |
|---|---|---|
| Generation Time | 180.0535 ms | 618.603 ms |
| Size http_rfc7230_notags.c | 147943 B | 230746 B |
| Size http_rfc7230_notags.c.line162.input | 0 B | 28730721 B |
| Size http_rfc7230_notags.c.line162.keys | 0 B | 3607848 B |
Visual Studio 2019 x64
| | without --skeleton | with --skeleton |
|---|---|---|
| Generation Time | 1042.8478 ms | 8132.859 ms |
| Size http_rfc7230_notags.c | 147943 B | 230746 B |
| Size http_rfc7230_notags.c.line162.input | 0 B | 28730721 B |
| Size http_rfc7230_notags.c.line162.keys | 0 B | 3607848 B |
http_rfc7230.re
https://github.com/skvadrik/re2c/blob/master/examples/c/submatch/http_rfc7230.re
Mingw x64
| | without --skeleton | with --skeleton |
|---|---|---|
| Generation Time | 226.9931 ms | 3098.5436 ms |
| Size http_rfc7230.c | 198597 B | 313746 B |
| Size http_rfc7230.c.line162.input | 0 B | 27890920 B |
| Size http_rfc7230.c.line162.keys | 0 B | 3769927 B |
Visual Studio 2019 x64
| | without --skeleton | with --skeleton |
|---|---|---|
| Generation Time | 2457.6249 ms | 27062.125 ms |
| Size http_rfc7230.c | 198597 B | 313746 B |
| Size http_rfc7230.c.line162.input | 0 B | 27890920 B |
| Size http_rfc7230.c.line162.keys | 0 B | 3769927 B |
CMake flags
Mingw x64
-DCMAKE_BUILD_TYPE=Debug
-DCMAKE_C_FLAGS=-O2
-DCMAKE_CXX_FLAGS=-O2
-DRE2C_BUILD_RE2GO=on
-DCMAKE_INSTALL_PREFIX=C:\src\re2c\install-mingw
-G "CodeBlocks - MinGW Makefiles"
Visual Studio 2019 x64
-DRE2C_BUILD_RE2GO=on
-DCMAKE_INSTALL_PREFIX="C:\src\re2c\install-msvc"
-DCMAKE_C_FLAGS=/O2
-DCMAKE_CXX_FLAGS=/O2
-G "Visual Studio 16 2019"
-A x64
Result
Before your changes re2c compiled with Visual Studio was 458 times slower than re2c compiled with Mingw.
After your changes re2c compiled with Visual Studio became 8.7 times slower than re2c compiled with Mingw.
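The ratios can be recomputed from the "with --skeleton" generation times in the tables. The sketch below uses only values copied from those tables; the variable names are illustrative. With the numbers as they appear here, the "after" ratio reproduces the 8.7 figure, while the "before" VS-to-Mingw ratio for http_rfc7230.re comes out near 76.6. A value close to 458 does appear when the VS time for http_rfc7230.re is divided by the VS time for http_rfc7230_notags.re, so that figure may come from a different pair of measurements.

```python
# Generation times with --skeleton for http_rfc7230.re, in milliseconds,
# copied from the tables above.
before = {"mingw": 49896.8644, "vs": 3821611.5262}
after = {"mingw": 3098.5436, "vs": 27062.125}
# VS, http_rfc7230_notags.re, with --skeleton, before the change.
vs_notags_before = 8330.2958

def ratio(slow, fast):
    """How many times slower `slow` is than `fast`."""
    return slow / fast

print(f"VS vs Mingw, before:     {ratio(before['vs'], before['mingw']):.1f}x")
print(f"VS vs Mingw, after:      {ratio(after['vs'], after['mingw']):.1f}x")
print(f"VS tags vs notags:       {ratio(before['vs'], vs_notags_before):.1f}x")
print(f"VS speedup from patch:   {ratio(before['vs'], after['vs']):.1f}x")
print(f"Mingw speedup from patch: {ratio(before['mingw'], after['mingw']):.1f}x")
```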
Good, then it makes things a bit faster! I pushed one more patch (4a47ded), but it makes no noticeable difference on my system. How slow is the full test run now? |
I'll test it ASAP and give you feedback. Good job! Thank you! |
I was able to do some minimal load testing. The timings improved a lot and in general the tests run significantly faster. However, some tests are still not fast enough to use in CI pipelines. Below I provide the slowest code generation with re2c compiled using commit 4a47ded. For Mingw I used the following command line:
The same command was used for a skeleton validation (with For Visual Studio I used the following command line:
The same command was used for a skeleton validation (with The measurements below were taken using the following file: https://github.com/skvadrik/re2c/blob/master/test/bug128.re There are benchmark results: Mingw x64
Visual Studio 2019 x64
Resultwith without Please note,
I have no idea why it wasn't generated. And I have not seen any warning or error messages.
CMake flags
Mingw x64
Visual Studio 2019 x64
|
In accordance with #331 (comment), here are the slowest tests (all tests finished):
$ sort -n <(for f in $(ls *_*.log); do cat $f | grep -o -E '\([1-9]+.*\)' | tr -d '()' ; done) | column
1.000551 1.049238 1.0985912 1.1610301 1.3881291 2.4166578 7.7322819
1.0026765 1.0521585 1.1069511 1.1654575 1.4155576 3.4287334 8.705518
1.0068045 1.0574971 1.1199378 1.1672974 1.4557006 3.7058299 9.1262585
1.006806 1.0678113 1.1217741 1.1721492 1.4865933 5.1395657 11.3622493
1.0095393 1.0693213 1.1249263 1.1745822 1.5288221 5.1630022 11.7143255
1.0138449 1.0774119 1.1285558 1.1829852 1.5455525 5.2170471 12.1913201
1.0157868 1.0786276 1.131142 1.2105879 1.5553722 5.2170917 28.6310193
1.0172572 1.0815964 1.1355961 1.2214821 1.5648215 5.2423219 35.6424253
1.0175864 1.0824848 1.1384727 1.2247057 1.5660845 5.2487696 242.034956
1.0217653 1.0858299 1.1394903 1.2426674 1.5718653 5.2702669
1.0258584 1.0898011 1.1481312 1.2909222 1.596262 5.8009955
1.0300305 1.0910497 1.1509711 1.3091419 1.6746581 6.5150299
1.0337268 1.0913256 1.1549397 1.3220114 1.9889176 6.5898417
1.0411128 1.0932286 1.155237 1.3313342 2.1939683 6.7280103
1.0451031 1.0933827 1.1594367 1.3709589 2.3234237 7.6762901 |
For the remaining slow tests, can you post test names in addition to timings (maybe just a few slow tests)? |
Let me check this tomorrow. I'll provide the Top 10 slowest tests with timings. Thanks for your efforts to make re2c faster on Windows! |
Here are the test timings: For Mingw I used the following command line:
For Visual Studio I used the following command line:
Below I provide the code generation results with re2c compiled using commit 70c45d1. The no-skeleton timings have changed within the margin of error, so I'm not providing them here. The timings were taken using the following file: https://github.com/skvadrik/re2c/blob/master/test/bug128.re
Note that the generation time was halved even for Mingw. Good job! Here are the slowest code-generations (except
|
Thanks for the timings @sergeyklay! I realized during my local experiments with This shouldn't have any impact on your experiments, as |
Anyway, I'm glad this issue helped so many things to be found that could be improved! |
|
@sergeyklay Maybe what we need for debug Visual Studio build is:
And for release build simply:
|
This sounds interesting. I'll need some time to investigate the solution. Thank you for the tip. Other issues that bother me:
|
Right. Fiddling with CMAKE_CXX_FLAGS directly is not modern CMake at all. We should always use check_cxx_compiler_flag. And this is why I'm not a big fan of setting CMAKE_CXX_FLAGS in the workflow configuration file. This rework was in my future plans. |
Does this also happen in optimized Visual Studio build? Unlike |
it looks like it's detaching |
@skvadrik Good job! |
Commit b8b107a should work around the infinite loop in the Visual Studio Debug build. I don't think that tweaking the code to work around such bugs is a good strategy, but this particular change makes sense in general. |
Let me check this. I'll provide result ASAP. |
Below I provide the skeleton generation results with re2c compiled using commit 745f6e3. The timings were obtained using the following file: https://github.com/skvadrik/re2c/blob/master/test/bug128.re
All necessary files were generated (including
As I can see now everything is OK! This is what PowerShell tests wrapper says:
|
\o/ Great! Total time It might be worth reporting the Visual Studio problem with the infinite loop in Debug build. |
I'm still working on this. Not all things are ready yet. But I'll provide results ASAP.
Yeah. But TBH, I don't fully understand the nature of the issue and can't provide a minimal PoC |
A minimal PoC will require effort, and I'm reluctant to spend the effort because I'm not sure the bug hasn't been fixed in newer or non-free versions. But I can provide exact instructions on reproducing this in a Cygwin environment (starting with checking out re2c from git and on to running the hanging test). I'll have a look if there is an easy way to send such a bug report. |
Please keep us informed on this issue. |
@skvadrik I'm sorry for the silence and lack of any activity in this direction. Increased activity in my main job due to the end of the year. In any case I'm always here :) |
@sergeyklay It's perfectly fine, I have the same problem with my day job. ;) I haven't merged anything into master for weeks. In part this is because I have little time, and in part because I'm working on a local experiment. And I assume that you also have good reasons. Thanks for letting me know anyway, much appreciated! |
@skvadrik The work is not complete yet and some things have yet to be implemented, but the general PoC looks like: https://github.com/sergeyklay/re2c/blob/feature/powershell-test-runner/run_tests.ps1 I'm not a Windows user, although I have a PC running this system. And looking back at the whole PowerShell-tests-runner journey, I understand that I would not want to become the main maintainer of this solution. From a strategic point of view, the issue is that I would not want to support it in the long run. Therefore, I would like to propose a universal solution that works equally well on all major systems: Python. What do you think about a test runner written in Python for Linux/UNIX as well as Windows systems? Would this be acceptable for the re2c project? |
@sergeyklay That's a lot of work! Of course it's understandable if you wouldn't want to maintain it in the long run. I myself run Windows in a VM, and it is extremely slow and inconvenient. But the main disadvantage is the necessity to maintain different scripts on different platforms; they would diverge over time.
Certainly, I think it is the best option. run_tests.sh is too complex for a bash script, and most of the time is spent on the test harness, not on running the tests themselves. And it's hard to maintain portability. |
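As a rough illustration of what a portable runner could look like, here is a minimal Python sketch that runs test commands in parallel and treats a hang as a failure. It is not the runner that was eventually written for re2c: `run_test`, `run_all`, the timeout value, and the stand-in commands are all hypothetical.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_test(name, argv, timeout=60):
    """Run one test command; report (name, passed)."""
    try:
        proc = subprocess.run(argv, capture_output=True, timeout=timeout)
        return name, proc.returncode == 0
    except subprocess.TimeoutExpired:
        return name, False  # a hanging test counts as a failure

def run_all(tests, jobs=4):
    """Run tests in parallel threads; each thread only waits on a subprocess."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return dict(pool.map(lambda t: run_test(*t), tests))

# Stand-in commands; a real runner would invoke re2c and diff its output.
tests = [
    ("passes", [sys.executable, "-c", "pass"]),
    ("fails", [sys.executable, "-c", "raise SystemExit(1)"]),
]
results = run_all(tests)
print(results)
```

Threads are sufficient here because the work happens in child processes, and the same script runs unchanged on Linux, macOS, and Windows.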
Fine! I'll start work on unified solution then! |
Great! \o/ |
Skeleton tests are a little slow, but they already pass too: Run as
|
Awesome, thanks for all your work! \o/ |
A few notes from my latest unsuccessful attempt to enable skeleton tests on windows (and check if they are still too slow):
|
Hello,
I'm currently developing a native PowerShell wrapper for running re2c tests on Windows without resorting to Mingw, Cygwin and so on. I managed to write a multi-threaded test runner using PowerShell only (right now for skeleton tests only).
I run tests on my 20-core Xeon in parallel and everything is going well except for some tests. They are super slow. After digging in, I realized that these tests are extremely slow due to the generation process. Below I provide measurements of some generations:
6.8190486 seconds
9.3340122 seconds
7.642483 seconds
13.0525341 seconds:
265.9861263 seconds: 🎉
Actually this is not a complete list. Some tests never finished. According to my quick look, slow tests are about 9%. The rest of the tests are as fast as, for example, on macOS or Linux. re2c was built with the following configuration
Manual run shows the same degradation:
I will publish additional comments during my research.