Make scalar/field choice depend on C-detected __int128 availability #793
Conversation
Force-pushed from fe3e7eb to 89acf9e
Concept ACK. I like the naming here, which demonstrates that this is only about multiplications. That seems useful as a first step towards the idea mentioned in #711 (comment) and in the following comment.
Indeed, this could be extended with a --with-wide-multiplication=mul128 for MSVC, for example (well, MSVC platforms likely wouldn't use the configure script at all, but the equivalent macro config).
Maybe we could get rid of autoconf here too by checking
@real-or-random That would certainly simplify things further. One reason to not do this may be to keep the ability to explicitly select uint64_t based multiplication in a test. For example, I think we expect that all x86_64 Linux/gcc configurations would end up using the __int128 based multiplication, and without configurability, every such instance on Travis would do so. However, perhaps there are reasonable x86_64 Linux/gcc (or very similar) configurations out there that we don't know about, or can't test for, without __int128. If so, it may be better to keep it configurable, and have at least one explicit test with uint64_t based multiply.
We could still disable it in the tests using macros, e.g., by simply adding
I don't think it should be exposed to the user. It should be possible to override with ifdefs for testing/benchmarking by developers (see for example the safegcd PR, where testing 32-bit on x86_64 is both common and super useful), or for building for weirdo platforms where the type exists but doesn't work. This kind of thing can be documented in a development-focused readme. Exposing it to joe-user just begs him to set it in a dumb way and hurt his performance. :)

Another option, if there is some reservation around define-macro sniffing, is to have it in autotools but hide the setting or explicitly note that it's for testing. Also, a ./configure setting can trigger overrides of macro-sniffed stuff, so the question of detection via autotools vs. macros is largely orthogonal.

I think that in general, if there is a simple, portable, and reliable macro test, it should be preferred: autotools checking is slow, more likely to run into host vs. target issues, and depends on using autotools, which not all targets will do. This applies especially in this project, because the whole thing is one compilation unit and there are very few things that need to be detected (endianness, type sizes, clz instructions, not much else...), so you don't end up with the issue of a huge detection header that is redundantly run for every object.
I think I prefer a hidden configure flag, say "--with-test-override-widemul=X" or so. It feels a bit ugly and strangely roundabout to go undefine system defines to guide the autodetection you wrote yourself into the right thing for testing. I'd imagine it works by having USE_FORCE_WIDEMUL_X macros that are usually all undefined, but can be set by the configure script. There is no autodetection or verification in the script; it's just passed through.
Right, I wasn't thinking undefining system macros would be good. -DUSE_FORCE_FOO makes perfect sense. (And it could be applied to inject testing CLZs and stuff like that if you later feel it would be useful to do so.)
Concept ACK. FWIW there are quite a lot of users who just use 32-bit because they don't use autotools and for some reason decided it was easier to always use 32-bit code, which is a shame, and this could help them (if they'll update their libsecp...). Examples:
The undef thing was just a lazy example. |
It would be really good if configure still showed the configuration, and maybe there was some string that the test/bench utilities could print, so that some weird detection failure doesn't result in a user silently getting half the speed with no evidence of the cause. I've had cases before where I was accidentally linking the wrong library and was confused by the performance. E.g. "libsecp256k1-deadbeef gcc 10.2.1 le x86_64 s=64asm f=64asm clz ctz nobignum endo ..."
So far this has not been needed, as it's only used by the static precomputation which always builds with 32-bit fields. This prepares for the ability to have __int128 detected on the C side, breaking that restriction.
Force-pushed from 89acf9e to 2cd6ea3
Changed to make the C side detect the availability of __int128, with the ability to override through a define (and a hidden configure option). An unexpected side effect here is that
Force-pushed from 2cd6ea3 to da16068
Instead of supporting configuration of the field and scalar size independently, both are now controlled by the availability of a 64x64->128 bit multiplication (currently only through __int128). This is autodetected from the C code through __SIZEOF_INT128__, but can be overridden using configure's --with-test-override-wide-multiply, or by defining USE_FORCE_WIDEMUL_{INT64,INT128} manually.
Force-pushed from da16068 to 79f1f7a
ACK 79f1f7a diff looks good and tests pass
tACK 79f1f7a. They all managed to parse it (some very old ones required replacing …). Interesting finding: clang added support (x86-64) for 128-bit in 3.3, and gcc in 4.6.4 (from the versions available on godbolt; I did not check outside).
clang 3.3 was released in 2013, while gcc 4.6 was released in 2011.
…ailability

Summary:
```
This PR does two things:

It removes the ability to select the 5x52 field with a 8x32 scalar, or the 10x26 field with a 4x64 scalar. It's both 128-bit wide versions, or neither. The choice is made automatically by the C code, unless overridden by a USE_FORCE_WIDEMUL_INT{64,128} define (which is available through configure with a hidden option --with-test-override-wide-multiplication={auto,int64,int128}). This reduces the reliance on autoconf for this performance-critical configuration option, and also reduces the number of different combinations to test.

This removes one theoretically useful combination: if you had x86_64 asm but no __int128 support in your compiler, it was previously possible to use the 64-bit field but the 32-bit scalar. I think this doesn't matter, as all compilers/systems that support (our) x86_64 asm also support __int128. Furthermore, #767 will break this.

As an unexpected side effect, this also means the gen_context static precomputation tool will now use __int128 based implementations when available (which required an addition to the 5x52 field; see first commit).
```

Backport of secp256k1 [[bitcoin-core/secp256k1#793 | PR793]]. Depends on D7610.

Test Plan:
```
cmake -GNinja ..
ninja check-secp256k1
cmake -GNinja .. -DSECP256K1_TEST_OVERRIDE_WIDE_MULTIPLY=int64
ninja check-secp256k1
cmake -GNinja .. -DSECP256K1_TEST_OVERRIDE_WIDE_MULTIPLY=int128
ninja check-secp256k1
../configure
make -j4 check
../configure --with-test-override-wide-multiply=int64
make -j4 check
../configure --with-test-override-wide-multiply=int128
make -j4 check
```
Run the Travis build: https://travis-ci.org/github/Fabcien/secp256k1/builds/731196575

Reviewers: #bitcoin_abc, deadalnix
Reviewed By: #bitcoin_abc, deadalnix
Differential Revision: https://reviews.bitcoinabc.org/D7629
This PR does two things:

It removes the ability to select the 5x52 field with a 8x32 scalar, or the 10x26 field with a 4x64 scalar. It's both 128-bit wide versions, or neither. The choice is made automatically by the C code, unless overridden by a USE_FORCE_WIDEMUL_INT{64,128} define (which is available through configure with a hidden option --with-test-override-wide-multiplication={auto,int64,int128}). This reduces the reliance on autoconf for this performance-critical configuration option, and also reduces the number of different combinations to test.

This removes one theoretically useful combination: if you had x86_64 asm but no __int128 support in your compiler, it was previously possible to use the 64-bit field but the 32-bit scalar. I think this doesn't matter, as all compilers/systems that support (our) x86_64 asm also support __int128. Furthermore, #767 will break this.

As an unexpected side effect, this also means the gen_context static precomputation tool will now use __int128 based implementations when available (which required an addition to the 5x52 field; see first commit).