Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some skeleton tests are extremely slow on Windows #331

Open
sergeyklay opened this issue Oct 23, 2020 · 63 comments
Open

Some skeleton tests are extremely slow on Windows #331

sergeyklay opened this issue Oct 23, 2020 · 63 comments

Comments

@sergeyklay
Copy link
Collaborator

sergeyklay commented Oct 23, 2020

Hello,

I'm currently developing a native PowerShell wrapper for running re2c tests on Windows without resorting to Mingw, Cygwin and so on. I managed to write multi-threaded tests runner using PowerShell only (right now for skeleton tests only).

I run tests on my 20 cores Xeon in parallel and everything is going well except for some tests. They are super slow. After a dig into, I realized that these tests are extremely slow due to the generation process. Below I provide measurements of some generations:

6.8190486 seconds

& C:\src\re2c\cmake-build-debug-visual-studio-x64\Debug\re2c.exe  `
    "debug\closure_stats_gtop.re" `
    -o "debug\closure_stats_gtop.c" `
    -i `
    --posix-captures `
    --posix-closure gor1 `
    --dump-closure-stats `
    --fixed-tags toplevel `
    -W `
    --no-version `
    --no-generation-date `
    --skeleton `
    -Werror-undefined-control-flow

9.3340122 seconds

C:\src\re2c\cmake-build-debug-visual-studio-x64\Debug\re2c.exe  `
    "encodings\class3.re" `
    -o "encodings\class3.c" `
    -i8 `
    -W `
    --no-version `
    --no-generation-date `
    --skeleton `
    -Werror-undefined-control-flow

7.642483 seconds

C:\src\re2c\cmake-build-debug-visual-studio-x64\Debug\re2c.exe  `
    "debug\closure_stats_gor1.re" `
    -o "debug\closure_stats_gor1.c" `
    -i `
    --posix-captures `
    --posix-closure gtop `
    --dump-closure-stats 
    --fixed-tags toplevel `
    -W `
    --no-version `
    --no-generation-date `
    --skeleton `
    -Werror-undefined-control-flow

13.0525341 seconds:

& C:\src\re2c\cmake-build-debug-visual-studio-x64\Debug\re2c.exe  `
    "bug1708378.re" `
    -o "bug1708378.c" `
    -ib `
    -W `
    --no-version `
    --no-generation-date `
    --skeleton `
    -Werror-undefined-control-flow

265.9861263 seconds: 🎉

& C:\src\re2c\cmake-build-debug-visual-studio-x64\Debug\re2c.exe  `
    "bug128.re" `
    -o "bug128.c"  `
    -W `
    --no-version `
    --no-generation-date `
    --skeleton `
    -Werror-undefined-control-flow

Actually this is not a complete list. Some tests never finished. According to my quick look, slow tests are about 9%. The rest of the tests are as fast as, for example, in macOs or Linux. re2c using built with the following configuration

  • Compiler: Visual Studio 2019
  • Platform: x64
  • CPU: Xeon E5-2628L v3 (10 physical cores, 20 logical with HT)
  • CMake flags:
    -DCMAKE_BUILD_TYPE=Debug
    -DRE2C_BUILD_RE2GO=on
    -DCMAKE_INSTALL_PREFIX="C:\src\re2c\install"
    -G "Visual Studio 16 2019"
    -A x64
    

Manual run shows the same degradation:

PS C:\src\re2c\test_201023160030> Measure-Command { & C:\src\re2c\cmake-build-debug-visual-studio-x64\Debug\re2c.exe `
>> bug1708378.re `
>> -o bug1708378.c `
>> -ib `
>> -W `
>> --no-version `
>> --no-generation-date `
>> --skeleton `
>> -Werror-undefined-control-flow }

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 12
Milliseconds      : 923
Ticks             : 129239968
TotalDays         : 0.000149583296296296
TotalHours        : 0.00358999911111111
TotalMinutes      : 0.215399946666667
TotalSeconds      : 12.9239968
TotalMilliseconds : 12923.9968

I will publish additional comments during my research.

@sergeyklay
Copy link
Collaborator Author

sergeyklay commented Oct 23, 2020

I store the execution time of the each generation in separated *.log file in the following form:

<re2c executable> <re2c options> (<time>)

So here are some results (some tests never finished):

$ sort -n <(for f in $(ls *_*.log); do cat $f | grep -o -E '\([1-9]+.*\)' | tr -d '()' ; done) | column

1.0123708       1.2061186       1.398478        1.7313844       3.545979        6.2762644       19.1019572
1.0175466       1.2140428       1.3999757       1.7506811       3.6755543       6.320666        19.8422225
1.0207795       1.2177153       1.4003677       1.8222493       3.7104511       6.3240929       19.8937025
1.0279499       1.2271273       1.4299672       1.8373235       3.9307811       6.461851        20.5925374
1.0347854       1.2375833       1.4381695       1.871309        4.1462893       6.5952994       20.6457992
1.0506582       1.2478265       1.4416772       2.1121065       4.2338955       6.605284        20.8574713
1.0710235       1.2577638       1.4619963       2.1282506       4.9962177       6.8190486       21.0589518
1.092179        1.2784787       1.4866995       2.1299754       5.11254         7.4957975       21.3088674
1.0995379       1.2806526       1.5073277       2.2660121       5.2844923       7.6238291       21.6054515
1.1423733       1.2822474       1.5266376       2.3043965       5.3296963       7.642483        28.5312305
1.156332        1.2883214       1.5412814       2.3535283       5.4675259       7.8018155       132.8422803
1.162659        1.2978251       1.5423705       2.4174622       5.4926532       9.3340122       265.9861263
1.1639108       1.3222313       1.5683961       2.4445399       5.5172051       9.52987         491.276471
1.1711366       1.3363735       1.568921        2.7039793       5.5980776       9.7681643       492.4069617
1.1730053       1.3391551       1.5843521       2.7298563       5.6159203       9.9542416       3828.0381905
1.1743305       1.3530512       1.6159905       2.7907898       5.6598511       10.7178018      3838.0232887
1.1799193       1.3593102       1.6267786       3.1749009       5.9794209       13.0525341
1.1804136       1.3800682       1.6795056       3.2259422       6.0503126       13.2396031
1.1934251       1.3883636       1.7041505       3.3845808       6.1115797       15.0390006
1.2053042       1.3914672       1.7158329       3.5315845       6.2747898       19.0428656

@sergeyklay sergeyklay changed the title Some skeleton tests are extremely slow on WIndows Some skeleton tests are extremely slow on Windows Oct 23, 2020
@skvadrik
Copy link
Owner

This is very interesting! A few questions off the top of my head:

  • Are these tests slow only with --skeleton option? What happens if you drop that option?
  • If --skeleton is to blame, are tests generating output as they run? There should be *.keys and *.input files.
  • If the files are generated, how large are they? Are they under 1G? Are they about the same size as on Linux/macOS?
  • Could the slowness be caused by writing large unbuffered output? It should be easy to check by commenting out lines
    fwrite(buffer, sizeof(cunit_t), size, file);
    and
    fwrite(keys, sizeof(key_t), nkey, file);
    .

@skvadrik
Copy link
Owner

One more in that direction: skeleton-specific output differs from normal output in that it is binary (fopen is called with "wb").

@sergeyklay
Copy link
Collaborator Author

sergeyklay commented Oct 24, 2020

Are these tests slow only with --skeleton option? What happens if you drop that option?

As I sad before I'm currently developing skeleton validation support. Regular tests are not ready, so I cannot measure programmatically. Calling the following command by hand:

$Executable = "C:\src\re2c\cmake-build-debug-visual-studio-x64\Debug\re2c.exe"
$Options = "-ib -W --no-version --no-generation-date --skeleton -Werror-undefined-control-flow"
$IO = "bug1708378.re -o bug1708378.c"

$Command = "$Executable $IO $Options"

Remove-Item bug1708378.c* -Force
Measure-Command { Invoke-Expression $Command | Out-Null }

generates output for 13 seconds:

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 12
Milliseconds      : 905
Ticks             : 129053813
TotalDays         : 0.00014936783912037
TotalHours        : 0.00358482813888889
TotalMinutes      : 0.215089688333333
TotalSeconds      : 12.9053813
TotalMilliseconds : 12905.3813

File sizes are (in bytes):

PS C:\src\re2c\test_201023160030> ls bug1708378.*

    Directory: C:\src\re2c\test_201023160030

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a---          10/24/2020 12:08 PM         566983 bug1708378.c
-a---          10/24/2020 12:08 PM       28026493 bug1708378.c.line216.input
-a---          10/24/2020 12:08 PM        3475398 bug1708378.c.line216.keys
-a---          10/17/2020 12:23 PM          14989 bug1708378.re

Note, in this case, this is a manual command run, without any wrappers, just call re2c from the terminal.

Without --skeleton option generation time is notable better:

- $Options = "-ib -W --no-version --no-generation-date --skeleton -Werror-undefined-control-flow"
+ $Options = "-ib -W --no-version --no-generation-date -Werror-undefined-control-flow"

Only 2 seconds (in 6 times faster):

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 2
Milliseconds      : 808
Ticks             : 28083572
TotalDays         : 3.25041342592593E-05
TotalHours        : 0.000780099222222222
TotalMinutes      : 0.0468059533333333
TotalSeconds      : 2.8083572
TotalMilliseconds : 2808.3572

File sizes are (in bytes):

PS C:\src\re2c\test_201023160030> ls bug1708378.*

    Directory: C:\src\re2c\test_201023160030

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a---          10/24/2020 12:09 PM         365074 bug1708378.c
-a---          10/17/2020 12:23 PM          14989 bug1708378.re

By the way, despite a 6x run time improvement, 2 seconds for bug1708378.re is still not the best time, isn't? Just to get ahead of your question, I want to say that all tests are performed using the following V-NAND SSD: Samsung SSD 970 EVO Plus NVMe M.2

@skvadrik
Copy link
Owner

Ok, so the files are generated properly and have reasonable size.

2 seconds for bug1708378.re is still not the best time, isn't?

Correct, this is way too much time for this tests. re2c bug1708378.re -o bug1708378.c takes under 45 milliseconds on my Linux machine. And with skeleton option re2c bug1708378.re -S -o bug1708378.c takes about 600 milliseconds, which is also fast.

tests are performed using the following V-NAND SSD

I don't think that it has to do with the hardware. Judging from the tests, I think the slowdown is caused by the way re2c is writing the output. In case of skeleton tests, re2c makes many fwrite calls (each appending a small bit to the output file) and relies on the system libc to make it buffered, so that many small fwrite operations are joined into large and less frequent write system calls (or the Windows analogue of write). In case of non-skeleton output it's probably not the system calls, but the mixture of stdio and iostream operations (a different libc/lbstdc++ implementation on Windows).

This is only a guess, but it is the first thing that I'd check.

re2c needs a --time-profile option. :)

@sergeyklay
Copy link
Collaborator Author

sergeyklay commented Oct 24, 2020

Yet another generation I managed to measure:

$Executable = "C:\src\re2c\cmake-build-debug-visual-studio-x64\Debug\re2c.exe"
$Options = "-i -W --no-version --no-generation-date -Werror-undefined-control-flow"
$IO = "c\submatch\http_rfc7230.re -o c\submatch\http_rfc7230.c"

$Command = "$Executable $IO $Options"

Remove-Item http_rfc7230_skeleton.c* -Force
Measure-Command { Invoke-Expression $Command | Out-Null }

Only 2 seconds:

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 2
Milliseconds      : 588
Ticks             : 25884700
TotalDays         : 2.99591435185185E-05
TotalHours        : 0.000719019444444444
TotalMinutes      : 0.0431411666666667
TotalSeconds      : 2.58847
TotalMilliseconds : 2588.47

The same but with skeleton:

- $Options = "-i -W --no-version --no-generation-date -Werror-undefined-control-flow"
+ $Options = "-i -W --no-version --no-generation-date --skeleton -Werror-undefined-control-flow"

It lasted forever:

Days              : 0
Hours             : 1
Minutes           : 5
Seconds           : 1
Milliseconds      : 833
Ticks             : 39018332165
TotalDays         : 0.0451601066724537
TotalHours        : 1.08384256013889
TotalMinutes      : 65.0305536083333
TotalSeconds      : 3901.8332165
TotalMilliseconds : 3901833.2165

File sizes are (in bytes):

PS C:\src\re2c> ls .\test_201024170134\c\submatch\http_rfc7230*

    Directory: C:\src\re2c\test_201024170134\c\submatch

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----

-a---          10/24/2020  6:31 PM         313786 http_rfc7230.c
-a---          10/24/2020  6:31 PM       27890920 http_rfc7230.c.line282.input
-a---          10/24/2020  6:31 PM        3769927 http_rfc7230.c.line282.keys
-a---          10/17/2020 12:23 PM           9831 http_rfc7230.re

@skvadrik
Copy link
Owner

Interesting, this one has approximately the same size of the largest skeleton file http_rfc7230.c.line282.input (27 MB), but it took over an hour compared to bug1708378.c.line216.input that took 13 seconds. My output-generation hypothesis doesn't fit here.

If you have spare cycles, can you make an experiment and try https://github.com/skvadrik/re2c/blob/master/benchmarks/http/rfc7230/http_rfc7230_notags.re instead of http_rfc7230.re? It is the same regular expression stripped of all tags. It may give us insight whether tags have something to do with the slowdown.

But it may be easier to use Windows profiling tools (I've heard of ETW).

@sergeyklay
Copy link
Collaborator Author

sergeyklay commented Oct 25, 2020

It is noteworthy that the generation in Windows, using Mingw, is much faster than using Visual Studio.

Without --skeleton option:

$Executable = "C:\src\re2c\cmake-build-debug-mingw\re2c.exe"
$Options = "-i -W --no-version --no-generation-date -Werror-undefined-control-flow"
$IO = "http_rfc7230.re -o http_rfc7230_mingw.c"

$Command = "$Executable $IO $Options"

Measure-Command { Invoke-Expression $Command | Out-Null }

it takes 588 milliseconds (4.4x faster than using Visual Studio):

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 588
Ticks             : 5889669
TotalDays         : 6.81674652777778E-06
TotalHours        : 0.000163601916666667
TotalMinutes      : 0.009816115
TotalSeconds      : 0.5889669
TotalMilliseconds : 588.9669

File sizes are (in bytes):

PS C:\src\re2c\test_manual> ls *

    Directory: C:\src\re2c\test_manual

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a---          10/25/2020 12:07 PM         198597 http_rfc7230_mingw.c
-a---          10/17/2020 12:23 PM           9831 http_rfc7230.re

with --skeleton option

- $Options = "-i -W --no-version --no-generation-date -Werror-undefined-control-flow"
+ $Options = "-i -W --no-version --no-generation-date --skeleton -Werror-undefined-control-flow"


- $IO = "http_rfc7230.re -o http_rfc7230_mingw.c"
+ $IO = "http_rfc7230.re -o http_rfc7230_mingw_skeleton.c"

takes about 187 seconds (20.8x faster than using Visual Studio):

Days              : 0
Hours             : 0
Minutes           : 3
Seconds           : 7
Milliseconds      : 597
Ticks             : 1875970618
TotalDays         : 0.00217126228935185
TotalHours        : 0.0521102949444444
TotalMinutes      : 3.12661769666667
TotalSeconds      : 187.5970618
TotalMilliseconds : 187597.0618

File sizes are (in bytes):

PS C:\src\re2c\test_manual> ls *

    Directory: C:\src\re2c\test_manual

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a---          10/25/2020 12:19 PM         313776 http_rfc7230_mingw_skeleton.c
-a---          10/25/2020 12:19 PM       27890920 http_rfc7230_mingw_skeleton.c.line282.input
-a---          10/25/2020 12:19 PM        3769927 http_rfc7230_mingw_skeleton.c.line282.keys
-a---          10/17/2020 12:23 PM           9831 http_rfc7230.re

re2c was built with g++ v8.1.0 and the following CMake options:

-DCMAKE_BUILD_TYPE=Debug
-DRE2C_BUILD_RE2GO=on
-DCMAKE_INSTALL_PREFIX=C:\src\re2c\install-mingw
-G "CodeBlocks - MinGW Makefiles"

@sergeyklay
Copy link
Collaborator Author

sergeyklay commented Oct 25, 2020

Interesting, this one has approximately the same size of the largest skeleton file http_rfc7230.c.line282.input (27 MB), but it took over an hour compared to bug1708378.c.line216.input that took 13 seconds. My output-generation hypothesis doesn't fit here.

If you have spare cycles, can you make an experiment and try https://github.com/skvadrik/re2c/blob/master/benchmarks/http/rfc7230/http_rfc7230_notags.re instead of http_rfc7230.re? It is the same regular expression stripped of all tags. It may give us insight whether tags have something to do with the slowdown.

But it may be easier to use Windows profiling tools (I've heard of ETW).

Mingw x64

without --skeleton with --skeleton
Generation Time 191.7389 ms 1323.9391 ms
Size http_rfc7230_notags.c 147943 B 230776 B
Size http_rfc7230_notags.c.line162.input 0 B 28730721 B
Size http_rfc7230_notags.c.line162.keys 0 B 3607848 B

Visual Studio 2019 x64

without --skeleton with --skeleton
Generation Time 1226.1267 ms 8138.2939 ms
Size http_rfc7230_notags.c 147943 B 230774 B
Size http_rfc7230_notags.c.line162.input 0 B 28730721 B
Size http_rfc7230_notags.c.line162.keys 0 B 3607848 B

@sergeyklay
Copy link
Collaborator Author

@skvadrik Does this measurement show tags impact for slowdown?

@skvadrik
Copy link
Owner

@sergeyklay Without tags it is definitely faster (with skeleton: Mingw ~140x, VS ~470x; without skeleton: Mingw ~3x, VS: ~2x). This is aligned with the behaviour on Linux: without tags it takes 100ms, and with tags 21s (~210x). Perf tells me that 98% of time is spent in re2c::write_keys<unsigned char>, in allocating and deleting vectors. I'll have a look at it.

re2c was built with g++ v8.1.0 and the following CMake options:

-DCMAKE_BUILD_TYPE=Debug
-DRE2C_BUILD_RE2GO=on
-DCMAKE_INSTALL_PREFIX=C:\src\re2c\install-mingw
-G "CodeBlocks - MinGW Makefiles"

So this is a non-optimized build. Perhaps this explains the slowdown, as well as the difference between VS and Mingw (at zero levels of optimizations they both generate inefficient code, but with different level of inefficiency). How does an optimized build behave?

@sergeyklay
Copy link
Collaborator Author

How does an optimized build behave?

http_rfc7230_notags.re

https://github.com/skvadrik/re2c/blob/master/benchmarks/http/rfc7230/http_rfc7230_notags.re

Mingw x64
without --skeleton with --skeleton
Generation Time 146.4076 ms 431.4192 ms
Size http_rfc7230_notags.c 147943 B 230746 B
Size http_rfc7230_notags.c.line162.input 0 B 28730721 B
Size http_rfc7230_notags.c.line162.keys 0 B 3607848 B
Visual Studio 2019 x64
without --skeleton with --skeleton
Generation Time 1064.8614 ms 8330.2958 ms
Size http_rfc7230_notags.c 147943 B 230746 B
Size http_rfc7230_notags.c.line162.input 0 B 28730721 B
Size http_rfc7230_notags.c.line162.keys 0 B 3607848 B

http_rfc7230.re

https://github.com/skvadrik/re2c/blob/master/examples/c/submatch/http_rfc7230.re

Mingw x64
without --skeleton with --skeleton
Generation Time 210.038 ms 49896.8644 ms
Size http_rfc7230.c 198597 B 313746 B
Size http_rfc7230.c.line162.input 0 B 27890920 B
Size http_rfc7230.c.line162.keys 0 B 3769927 B
Visual Studio 2019 x64
without --skeleton with --skeleton
Generation Time 2451.7808 ms 3821611.5262 ms
Size http_rfc7230.c 198597 B 313746 B
Size http_rfc7230.c.line162.input 0 B 27890920 B
Size http_rfc7230.c.line162.keys 0 B 3769927 B

CMake flags

Mingw x64
-DCMAKE_BUILD_TYPE=Debug
-DCMAKE_C_FLAGS=-O2
-DCMAKE_CXX_FLAGS=-O2
-DRE2C_BUILD_RE2GO=on
-DCMAKE_INSTALL_PREFIX=C:\src\re2c\install-mingw
-G "CodeBlocks - MinGW Makefiles"
Visual Studio 2019 x64
-DRE2C_BUILD_RE2GO=on
-DCMAKE_INSTALL_PREFIX="C:\src\re2c\install-msvc"
-DCMAKE_C_FLAGS=/O2
-DCMAKE_CXX_FLAGS=/O2
-G "Visual Studio 16 2019"
-A x64

@skvadrik
Copy link
Owner

Ok, so there is not much difference with optimizations, and it doesn't bridge the gap between VS and Mingw.

I've had a closer look at why http_rfc7230.re is so slow. It is caused by a naive way of computing tag values for skeleton. Since skeleton has to work on non-optimized TDFA, it has many tag variables (N). It is normal before renaming, most of the variables are unused and they are reduced by the optimization phase, but the whole point of skeleton is to test the optimized code on data produced for non-optimized code. Skeleton keys are generated by simulating TDFA execution on a set of M strings, applying tag operations at each step. The intermediate results are stored in a large matrix of size NxM (over 1000000 elements in total), where elements of the matrix are vectors (some tags have a history, ie they need all their values not just the last one). Unsurprisingly the allocation and matrix population takes time. Much of it is unused, but that doesn't help.

I managed to get about 4x speedup (21s -> 5s) by reordering some loops and reducing matrix size to Nx1, but for the real fix we'll need a more efficient way of storing intermediate results (perhaps a trie for the tags that have history).

As to why it is particularly slow on VS, I suppose it can be caused by a different allocator.

@sergeyklay
Copy link
Collaborator Author

Hmm.. Very interesting! I hope we can get the best result for VS on this field. Feel free to ping me for any test. Thank you for the research.

skvadrik added a commit that referenced this issue Oct 30, 2020
Previously tag values were stored in vectors (even thought most of the
tags are s-tags and have only one element in the history). For tests
with many tags this represenation was taking too much space and time
spent on vector initialization.

Now all tag values are stored as scalars (32-bit integers). For s-tags
the value is the offset. For m-tags the value is an index in the tag
trie (a prefix tree of tag histories that is encoded in an array). This
representation is faster and takes less space.

This allows to save space that was previously used for storing
intermediate tag values of tag variables as vectors

This partially fixes #331.
@skvadrik
Copy link
Owner

GitHub auto-closed this because of commit message text, but I didn't intend to.

I made a few improvements to skeleton tests: e8cdf1d and 80f364e. On my Linux machine this results in ~2-3x speedup on run_tests.sh --skeleton and more than 20x speedup for http_rfc7230.re in particular.

@skvadrik skvadrik reopened this Oct 30, 2020
@sergeyklay
Copy link
Collaborator Author

Wow! I'll check today and give you a feedback. Thank you!

@sergeyklay
Copy link
Collaborator Author

sergeyklay commented Oct 30, 2020

I made a few improvements to skeleton tests: e8cdf1d and 80f364e.

There are benchmark results:

http_rfc7230_notags.re

https://github.com/skvadrik/re2c/blob/master/benchmarks/http/rfc7230/http_rfc7230_notags.re

Mingw x64
without --skeleton with --skeleton
Generation Time 180.0535 ms 618.603 ms
Size http_rfc7230_notags.c 147943 B 230746 B
Size http_rfc7230_notags.c.line162.input 0 B 28730721 B
Size http_rfc7230_notags.c.line162.keys 0 B 3607848 B
Visual Studio 2019 x64
without --skeleton with --skeleton
Generation Time 1042.8478 ms 8132.859 ms
Size http_rfc7230_notags.c 147943 B 230746 B
Size http_rfc7230_notags.c.line162.input 0 B 28730721 B
Size http_rfc7230_notags.c.line162.keys 0 B 3607848 B

http_rfc7230.re

https://github.com/skvadrik/re2c/blob/master/examples/c/submatch/http_rfc7230.re

Mingw x64
without --skeleton with --skeleton
Generation Time 226.9931 ms 3098.5436 ms
Size http_rfc7230.c 198597 B 313746 B
Size http_rfc7230.c.line162.input 0 B 27890920 B
Size http_rfc7230.c.line162.keys 0 B 3769927 B
Visual Studio 2019 x64
without --skeleton with --skeleton
Generation Time 2457.6249 ms 27062.125 ms
Size http_rfc7230.c 198597 B 313746 B
Size http_rfc7230.c.line162.input 0 B 27890920 B
Size http_rfc7230.c.line162.keys 0 B 3769927 B

CMake flags

Mingw x64
-DCMAKE_BUILD_TYPE=Debug
-DCMAKE_C_FLAGS=-O2
-DCMAKE_CXX_FLAGS=-O2
-DRE2C_BUILD_RE2GO=on
-DCMAKE_INSTALL_PREFIX=C:\src\re2c\install-mingw
-G "CodeBlocks - MinGW Makefiles"
Visual Studio 2019 x64
-DRE2C_BUILD_RE2GO=on
-DCMAKE_INSTALL_PREFIX="C:\src\re2c\install-msvc"
-DCMAKE_C_FLAGS=/O2
-DCMAKE_CXX_FLAGS=/O2
-G "Visual Studio 16 2019"
-A x64

Result

Before your changes re2c compiled with Visual Studio was 458 times slower than re2c compiled with Mingw.

After your changes re2c compiled with Visual Studio became 8.7 times slower than re2c compiled with Mingw.

@skvadrik
Copy link
Owner

Good, then it makes things a bit faster! I pushed one more patch (4a47ded), but it makes no noticeable difference on my system.

How slow is the full test run now?

@sergeyklay
Copy link
Collaborator Author

sergeyklay commented Oct 31, 2020

Good, then it makes things a bit faster! I pushed one more patch (4a47ded), but it makes no noticeable difference on my system.

How slow is the full test run now?

I'll test it ASAP and give you a feedback. Good job! Thank you!

@sergeyklay
Copy link
Collaborator Author

sergeyklay commented Oct 31, 2020

I was able to do a minimal load testing. The timings were improved a lot and in general the tests run significant faster. However, some tests are still not fast enough to use for CI pipelines. Below I provide the slowest code generation with re2c compiled using 4a47ded commit.

For Mingw I used the following command line:

C:\src\re2c\cmake-build-debug-mingw\re2c.exe `
    bug128.re -o bug128.c -W --no-version --no-generation-date -Werror-undefined-control-flow

The same command was used for a skeleton validation (with --skeleton).

For Visual Studio I used the following command line:

C:\src\re2c\cmake-build-debug-visual-studio-x64\Debug\re2c.exe `
    bug128.re -o bug128.c -W --no-version --no-generation-date -Werror-undefined-control-flow

The same command was used for a skeleton validation (with --skeleton).

The measurements below were taken using the following file: https://github.com/skvadrik/re2c/blob/master/test/bug128.re

There are benchmark results:

Mingw x64
without --skeleton with --skeleton
Generation Time 353.0869 ms 9863.7646 ms
Size of bug128.c 16270 B 29694 B
Size of bug128.c.line8.input - 324560902 B
Size of bug128.c.line8.keys - 62416974
Visual Studio 2019 x64
without --skeleton with --skeleton
Generation Time 5973.2428 ms 221058.2635 ms
Size of bug128.c 16270 B -
Size of bug128.c.line8.input - 324560902 B
Size of bug128.c.line8.keys - 62416974 B
Result

with --skeleton:
re2c compiled with Visual Studio is about 22.4 times slower than re2c compiled with Mingw.

without --skeleton:
re2c compiled with Visual Studio is about 16.9 times slower than re2c compiled with Mingw.

Please note, bug128.c file was not created for the following scenario:

  • re2c was compiled with Visual Studio
  • --skeleton option was used

I have no idea why it wasn't generated. And I have not seen any warning or error messages.

CMake flags

Mingw x64
-DCMAKE_BUILD_TYPE=Debug
-DCMAKE_C_FLAGS=-O2
-DCMAKE_CXX_FLAGS=-O2
-DRE2C_BUILD_RE2GO=on
-DCMAKE_INSTALL_PREFIX=C:\src\re2c\install-mingw
-G "CodeBlocks - MinGW Makefiles"
Visual Studio 2019 x64
-DRE2C_BUILD_RE2GO=on
-DCMAKE_INSTALL_PREFIX="C:\src\re2c\install-msvc"
-DCMAKE_C_FLAGS=/O2
-DCMAKE_CXX_FLAGS=/O2
-G "Visual Studio 16 2019"
-A x64

@sergeyklay
Copy link
Collaborator Author

sergeyklay commented Oct 31, 2020

In accordance with #331 (comment) here are slowest tests (all tests were finished):

$ sort -n <(for f in $(ls *_*.log); do cat $f | grep -o -E '\([1-9]+.*\)' | tr -d '()' ; done) | column

1.000551        1.049238        1.0985912       1.1610301       1.3881291       2.4166578       7.7322819
1.0026765       1.0521585       1.1069511       1.1654575       1.4155576       3.4287334       8.705518
1.0068045       1.0574971       1.1199378       1.1672974       1.4557006       3.7058299       9.1262585
1.006806        1.0678113       1.1217741       1.1721492       1.4865933       5.1395657       11.3622493
1.0095393       1.0693213       1.1249263       1.1745822       1.5288221       5.1630022       11.7143255
1.0138449       1.0774119       1.1285558       1.1829852       1.5455525       5.2170471       12.1913201
1.0157868       1.0786276       1.131142        1.2105879       1.5553722       5.2170917       28.6310193
1.0172572       1.0815964       1.1355961       1.2214821       1.5648215       5.2423219       35.6424253
1.0175864       1.0824848       1.1384727       1.2247057       1.5660845       5.2487696       242.034956
1.0217653       1.0858299       1.1394903       1.2426674       1.5718653       5.2702669
1.0258584       1.0898011       1.1481312       1.2909222       1.596262        5.8009955
1.0300305       1.0910497       1.1509711       1.3091419       1.6746581       6.5150299
1.0337268       1.0913256       1.1549397       1.3220114       1.9889176       6.5898417
1.0411128       1.0932286       1.155237        1.3313342       2.1939683       6.7280103
1.0451031       1.0933827       1.1594367       1.3709589       2.3234237       7.6762901

@skvadrik
Copy link
Owner

skvadrik commented Nov 2, 2020

bug128.re is difficult, it produces a large DFA that gets much smaller after minimization (it is the purpose of this test to ensure that re2c can minimize it and in reasonable time). I pushed f24f3c7 that should speed it up a bit (perhaps by 1/3), but I suspect it's not enough. Skeleton construction have to go before minimization (so that it can find errors in it), and so it has to work on a large DFA.

For the remaining slow tests, can you post test names in addition to timings (maybe just a few slow tests)?

@sergeyklay
Copy link
Collaborator Author

Let me check this tomorrow. I'll provide the Top 10 slowest tests with timings. Thanks for your efforts to make re2c faster on Windows!

@sergeyklay
Copy link
Collaborator Author

sergeyklay commented Nov 3, 2020

Here are the test timings:

For Mingw I used the following command line:

C:\src\re2c\.build_mingw\re2c.exe `
    bug128.re -o bug128.c `
    -W --no-version --skeleton --no-generation-date -Werror-undefined-control-flow

For Visual Studio I used the following command line:

C:\src\re2c\.build_msvc\Debug\re2c.exe `
    bug128.re -o bug128.c `
    -W --no-version --skeleton --no-generation-date -Werror-undefined-control-flow

Below I provide the code generation results with re2c compiled using 70c45d1 commit. The no-skeleton timings have changed within the margin of error, so I'm not providing them here. The timing were taken using the following file: https://github.com/skvadrik/re2c/blob/master/test/bug128.re

  • Mingw x64: 4016.8068 ms
  • Visual Studio 2019 x64: 103018.7321 ms

Note that the generation time was reduced twice even for Mingw. Good job!

Here are the slowest code-generations (except bug128.re since it was provided above). re2c was compiled using Visual Studio.
First column is the time in seconds.

 16.4251298  c\submatch\http_rfc7230.re
 14.3443707  c\submatch\uri_rfc3986.re
 12.5504493  encodings\code_points_8.re
 11.9489916  tags\bench_http_rfc7230.re
 11.7245659  bug1708378.re
 11.3771996  tags\bench_uri_rfc3986.re
 9.7639883   encodings\code_points_u.re
 9.3953288   encodings\code_points_w.re
 9.3702854   encodings\utf8_names.re
 9.3198274   encodings\code_points_x.re
 7.8328027   posix_captures\skeleton_01.re
 7.3967537   encodings\class3.re
 7.3233439   debug\closure_stats_gor1.re
 6.809638    debug\closure_stats_gtop.re
 6.1107775   encodings\code_points.re
 6.0995565   encodings\code_points_e.re
 4.9683203   real_world\php20150211_parse_date.re
 4.8522657   encodings\unicode_blocks_u_encoding_policy_fail.re
 4.7366564   real_world\parse_date_1_78.re
 4.534634    encodings\unicode_blocks_u_encoding_policy_ignore.re
 4.0589073   bug57_original.re
 3.8969453   real_world\parse_date_db.re
 3.5421027   real_world\parse_date_b.re
 3.5134215   real_world\parse_date_g.re
 3.4023533   real_world\parse_date_s.re
 3.3955802   real_world\parse_date.re
 3.3522276   encodings\unicode_blocks_u_encoding_policy_substitute.re
 3.083979    posix_captures\gor3.re
 2.4607906   encodings\unicode_blocks_8_encoding_policy_substitute.re
 2.3087235   encodings\class1.re
 2.1166286   encodings\class2.re
 1.9528867   dot\php20150211_zend_language_scanner.re
 1.9294394   encodings\unicode_blocks_8_encoding_policy_ignore.re
 1.9075661   encodings\unicode_blocks_8_encoding_policy_fail.re
 1.7798628   encodings\unicode_blocks_x_encoding_policy_fail.re
 1.7585333   encodings\class4.re
 1.7427518   real_world\cpp.re
 1.6056584   encodings\unicode_blocks_x_encoding_policy_ignore.re
 1.5418737   encodings\unicode_blocks_x_encoding_policy_substitute.re

@skvadrik
Copy link
Owner

skvadrik commented Nov 4, 2020

Thanks for the timings @sergeyklay!

I realized during my local experiments with bug128.re that 5s out of total 6s time taken by the test on my Linux system is the time taken by ext4 on truncating the output file: if the output file remains from the previous run, there is a 5s slowdown compared to the case when the output file does not exist. I'm talking about the *.input an *.keys files. @trofi debugged it further and provided a fix: 8090a1b.

This shouldn't have any impact on your experiments, as ./run_test.sh --skeleton runs in a clean directory, and also the filesystem on Windows is likely different. But it confused my local measurements (when the file got occasionally removed I observed a very fast run, which I could not reproduce on subsequent runs).

@sergeyklay
Copy link
Collaborator Author

Anyway, I'm glad this issue helped so many things to be found that could be improved!

@trofi
Copy link
Collaborator

trofi commented Nov 6, 2020

/O2 gets ignored by cmake because it's incompatible with default /RTC1 option (found the error in CMakeError.log): https://stackoverflow.com/questions/37007939/command-line-error-d8016-o2-and-rtc1-command-line-options-are-incompatibl

@skvadrik
Copy link
Owner

skvadrik commented Nov 6, 2020

@sergeyklay Maybe what we need for debug Visual Studio build is:

cmake --build . --config Release -DCMAKE_CXX_FLAGS="-UNDEBUG -DRE2C_DEBUG"

And for release build simply:

cmake --build . --config Release

@sergeyklay
Copy link
Collaborator Author

sergeyklay commented Nov 6, 2020

@sergeyklay Maybe what we need for debug Visual Studio build is:

cmake --build . --config Release -DCMAKE_CXX_FLAGS="-UNDEBUG -DRE2C_DEBUG"

And for release build simply:

cmake --build . --config Release

This sounds interesting. I'll need some time to investigate the solution. Thank you for the tip.

Another issues that bother me:

  1. In some cases, the *.c file is not created. The bug128.c case is not the only one

  2. As I said I'm currently developing a native PowerShell wrapper for running re2c tests on Windows. So, about 50% of tests or even more are failed (possible related to 1.). This is unexpected failure

@sergeyklay
Copy link
Collaborator Author

/O2 gets ignored by cmake because it's incompatible with default /RTC1 option (found the error in CMakeError.log): https://stackoverflow.com/questions/37007939/command-line-error-d8016-o2-and-rtc1-command-line-options-are-incompatibl

Right. Fiddling with CMAKE_CXX_FLAGS directly is not a modern CMake whatsoever. We should always use check_cxx_compiler_flag. And this why I'm not a big fan to set CMAKE_CXX_FLAGS in workflow configuration file. This rework was in my future plans.

@skvadrik
Copy link
Owner

skvadrik commented Nov 7, 2020

In some cases, the *.c file is not created

Does this also happen in optimized Visual Studio build?

Unlike *.keys and *.input files, with the output file re2c tries to create it atomically: is writes the generated output to a temporary file (https://github.com/skvadrik/re2c/blob/master/src/util/temp_file.cc#L46-L67) and then renames it to the output file.

@skvadrik
Copy link
Owner

skvadrik commented Nov 8, 2020

I investigated why Debug build takes infinite time. Apparently it enters an infinite loop in destruction of std::vector<range_t>. There is a function std::_Container_base12::_Orphan_all() that loops over a list and sets some entries to zero (I don't know what it's doing in a destructor of a vector, but the block of code containing the loop is guarded by #if _ITERATOR_DEBUG_LEVEL == 2, so it must be some debug checks). The loop should terminate when the next entry is NULL, but that never happens: the list contains a 3-cycle, and so the program loops over those 3 value infinitely (I left it overnight).

Here is a screenshot of debug session that shows the infinite loop: screenshot-20201108@112229

This can be either a bug in Visual Studio, or some STL edge case that is beyond my C++-fu (although the code that uses this std::vector<range_t> is quite simple and it's not doing any dirty memory tricks as far as I know). I don't think I can fix this aside from using different data structures for skeleton (e.g. replace vectors and wrap-iterators with circular lists).

What I still don't know is why some particular commit adds or removes this behavior. I see no connection, and it can be very complex, so I won't investigate this any further.

@codylico
Copy link

codylico commented Nov 8, 2020

it looks like it's detaching std::vector< >::iterator instances from the source std::vector< >, but that's just a guess.

@sergeyklay
Copy link
Collaborator Author

@skvadrik Good job!

@skvadrik
Copy link
Owner

This commit b8b107a should workaround the infinite loop in Visual Studio Debug build. I don't think that tweaking the code to workaround such bugs is a good strategy, but this particular change makes sense in general.

@sergeyklay
Copy link
Collaborator Author

Let me check this. I'll provide result ASAP.

@sergeyklay
Copy link
Collaborator Author

Below I provide the skeleton generation results with re2c compiled using commit 745f6e3 The timings were obtained using the following file: https://github.com/skvadrik/re2c/blob/master/test/bug128.re

  • Visual Studio 2019 x64 (Debug): 25678.2738 ms
  • Visual Studio 2019 x64 (Release): 1465.6405 ms
  • Mingw x64 (Debug): 1312.2025 ms

All necessary files were generated (including *.c one). This is the best result so far. Congratulations!

about 50% of tests or even more are failed

As I can see now everything is OK! This is what PowerShell tests wrapper says:

-----------------
All:         1591
Ran:         1552
Passed:      1347
Soft errors: 205
Hard errors: 0
Total time:  240.175295
-----------------
PASSED

@skvadrik
Copy link
Owner

\o/ Great!

Total time 240.175295 is 4 minutes, and it's the time required to run skeleton tests in Release Visual Studio build, right? It seems fast enough for CI. What is the total time for Visual Studio Debug build and for Mingw build?

It might be worth reporting the Visual Studio problem with the infinite loop in Debug build.

@sergeyklay
Copy link
Collaborator Author

sergeyklay commented Nov 11, 2020

Total time 240.175295 is 4 minutes, and it's the time required to run skeleton tests in Release Visual Studio build, right? It seems fast enough for CI. What is the total time for Visual Studio Debug build and for Mingw build?

I'm still working on this. Not all things are ready yet. But I'll provide results ASAP.

It might be worth reporting the Visual Studio problem with the infinite loop in Debug build.

Yeah. But TBH, I don't fully understand the nature of the issue and can't provide a minimal PoC

@skvadrik
Copy link
Owner

can't provide a minimal PoC

Minimal PoC will require effort, an I'm reluctant to spend the effort because I'm not sure the bug hasn't been fixed in newer or non-free versions. But I can provide exact instructions on reproducing this in Cygwin environment (starting with checking out re2c from git an on to running the hanging test). I'll have a look if there is an easy way to send such a bug report.

@sergeyklay
Copy link
Collaborator Author

Please keep us informed on this issue.

@sergeyklay
Copy link
Collaborator Author

@skvadrik I'm sorry for the silence and lack of any activity in this direction. Increased activity in my main job due to the end of the year. In any case I'm always here :)

@skvadrik
Copy link
Owner

@sergeyklay It's perfectly fine, I have the same problem with my day job. ;)

I haven't merged anything into master for weeks. In part this is because I have little time, and in part because I'm working on a local experiment. And I assume that you also have good reasons. Thanks for letting me know anyway, much appreciated!

@sergeyklay
Copy link
Collaborator Author

sergeyklay commented Jan 6, 2021

@skvadrik The work has not been completed yet and some things have yet to be implemented, but general PoC looks like: https://github.com/sergeyklay/re2c/blob/feature/powershell-test-runner/run_tests.ps1

I'm not a Windows user, although I have a PC running this system. And looking back at the whole PowerShell-tests-runner-journey, I understand that I would not want to become the main maintainer of this solution. The issue from strategic point of view is, in the long run, I would not want to support this solution. Therefore, I would like to propose a universal solution that works equally well in all major systems - Python. What do you think about a test runner written in Python for Linux/UNIX as well as Windows systems? Would this be an acceptable for re2c project?

@skvadrik
Copy link
Owner

skvadrik commented Jan 6, 2021

@sergeyklay That's a lot of work! Of course it's understandable if you wouldn't want to maintain it in the long run. I myself run Windows in a VM, and it is extremely slow and inconvenient. But the main disadvantage it the necessity to maintain different scripts on different platforms, they would diverge over time.

What do you think about a test runner written in Python for Linux/UNIX as well as Windows systems? Would this be an acceptable for re2c project?

Certainly, I think it is the best option. run_tests.sh is too complex for a bash script, and most of the time is spent on the test harness, not on running the tests themselves. And it's hard to maintain portability.

@sergeyklay
Copy link
Collaborator Author

Fine! I'll start work on unified solution then!

@skvadrik
Copy link
Owner

skvadrik commented Jan 6, 2021

Great! \o/

@sergeyklay
Copy link
Collaborator Author

sergeyklay commented Jun 15, 2021

Hello,

A small update about re2c test runner in Python. The work has not been completed yet, but regular tests are already passing:


asciicast

@sergeyklay
Copy link
Collaborator Author

Skeleton tests are a little slow, but and they already pass too:


asciicast


Run as time -v nice python run_tests.py --skeleton

Command being timed: "nice python run_tests.py --skeleton"
User time (seconds): 62.33
System time (seconds): 27.75
Percent of CPU this job got: 44%
Elapsed (wall clock) time (h:mm:ss or m:ss): 3:24.52
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 378976
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 1674
Minor (reclaiming a frame) page faults: 7844314
Voluntary context switches: 19387
Involuntary context switches: 97620
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 9
Page size (bytes): 16384
Exit status: 0

@skvadrik
Copy link
Owner

Awesome, thanks for all your work! \o/
(And I like this asciinema tool.)

@sergeyklay sergeyklay linked a pull request Jun 16, 2021 that will close this issue
@skvadrik
Copy link
Owner

skvadrik commented Jul 4, 2023

A few notes from my latest unsuccessful attempt to enable skeleton tests on windows (and check if they are still too slow):

  • First, we need to change at least some of windows presets in CMakePresets.json to contain the word "skeleton" in the name (this triggers .github/workflows/ci-cmake.yml to enable the step that runs skeleton tests).

  • Then we need to change "Get build dir" step in .github/workflows/ci-cmake.yml to correctly set working directory (there is a problem with forward slashes) using definition like BUILD_DIR="${{ github.workspace }}\.build\${{ matrix.name }}-full".

  • Then we need to change run_tests.py.in to use windows-specific way to run the compiler. Currently the script fails as https://github.com/skvadrik/re2c/actions/runs/5451650795/jobs/9918139684:

    Traceback (most recent call last):
      File "C:\hostedtoolcache\windows\Python\3.11.4\x64\Lib\multiprocessing\pool.py", line 125, in worker
        result = (True, func(*args, **kwds))
                     ^^^^^^^^^^^^^^^^^^^
      File "D:\a\re2c\re2c\.build\windows-msvc-debug-ootree-skeleton-full\run_tests.py", line 497, in run_test
        process = subprocess.Popen(
                  ^^^^^^^^^^^^^^^^^
      File "C:\hostedtoolcache\windows\Python\3.11.4\x64\Lib\subprocess.py", line 1026, in __init__
        self._execute_child(args, executable, preexec_fn, close_fds,
      File "C:\hostedtoolcache\windows\Python\3.11.4\x64\Lib\subprocess.py", line 1538, in _execute_child
        hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    FileNotFoundError: [WinError 2] The system cannot find the file specified
    

    Perhaps some file paths are wrong (a problem forward slashes?). The compiler invocation also needs to be adjusted for windows, as well as output file extension, etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants