Make scalar/field choice depend on C-detected __int128 availability #793

sipa · 2020-08-09T18:00:58Z

This PR does two things:

It removes the ability to select the 5x52 field with a 8x32 scalar, or the 10x26 field with a 4x64 scalar. It's both 128-bit wide versions, or neither.
The choice is made automatically by the C code, unless overridden by a USE_FORCE_WIDEMUL_INT{64,128} define (which is available through configure with a hidden option --with-test-override-wide-multiplication={auto,int64,int128}).

This reduces the reliance on autoconf for this performance-critical configuration option, and also reduces the number of different combinations to test.

This removes one theoretically useful combination: if you had x86_64 asm but no __int128 support in your compiler, it was possible to use the 64-bit field before but the 32-bit scalar. I think this doesn't matter as all compilers/systems that support (our) x86_64 asm also support __int128. Furthermore, #767 will break this.

As an unexpected side effect, this also means the gen_context static precomputation tool will now use __int128 based implementations when available (which required an addition to the 5x52 field; see first commit).

real-or-random · 2020-08-09T22:13:24Z

Concept ACK

I like the naming here which demonstrates that this is only about multiplications. That seems useful as a first step towards the idea that has been mentioned in #711 (comment) and in the following comment.

sipa · 2020-08-09T22:18:08Z

Indeed, this could be extended with a --with-wide-multiplication=mul128 for MSVC for example (well, MSVC platforms likely wouldn't use the configure script at all, but the equivalent macro config).

real-or-random · 2020-08-09T22:52:34Z

Maybe we could get rid of autoconf here too by checking __SIZEOF_INT128__: https://stackoverflow.com/a/54815033

sipa · 2020-08-10T00:10:29Z

@real-or-random That would certainly simplify things further.

One reason to not do this may be to keep the ability to explicitly select uint64_t based multiplication in a test.

For example, I think we expect that all x86_64 Linux/gcc configurations would end up using the __int128 based multiplication, and without configurability, every such instance on Travis would do so. However, perhaps there are reasonable x86_64 Linux/gcc (or very similar) configurations out there that we don't know about, or can't test for, without __int128. If so, it may be better to keep it configurable, and have at least one explicit test with uint64_t based multiply.

real-or-random · 2020-08-10T00:39:33Z

We could still disable in the tests using macros, e.g., by simply adding -U__SIZEOF_INT128__ to the CPPFLAGS. I guess part of the question is whether we want to expose an easy interface for configuring to the user, because ./configure shines here. But I don't think we want that in that case. Forcing a narrower type is not very useful for users.

gmaxwell · 2020-08-10T02:47:19Z

I don't think it should be exposed to the user. It should be possible to override w/ ifdefs for testing/benchmarking by developers (see for example the safegcd PR where testing 32-bit on x86_64 is both common and super useful), or for building for weirdo platforms where the type exists but doesn't work. This kind of thing can be documented in a development focused readme.

Exposing it to joe-user just begs him to set it in a dumb way and hurt his performance. :)

Another option if there is some reservation around define macro sniffing is to have it in autotools but hide the setting or explicitly note that it's for testing. Also ./configure setting can trigger overrides of macro-sniffed stuff, so the question of detection via autotools vs macros is largely orthogonal.

I think that in general if there is a simple, portable, and reliable macro test it should be preferred-- autotools checking is slow, more likely to run into host vs target issues, and depends on using autotools, which not all targets will do. Especially in this project because the whole thing is one compilation unit and there are very few things that need to be detected (endianness, type sizes, clz instructions, not much else...) , so you don't end up with the issue of a huge detection header that is redundantly run for every object.

sipa · 2020-08-10T07:14:27Z

I think I prefer a hidden configure flag, say "--with-test-override-widemul=X" or so. It feels a bit ugly and strangely roundabout to go undefine system defines to guide the autodetection you wrote yourself into the right thing for testing.

I'd imagine it works by having USE_FORCE_WIDEMUL_X macros that are usually all undefined, but can be set by the configure script. There is no autodetection or verification in the script; it's just passed through.

gmaxwell · 2020-08-10T07:50:26Z

Right, I wasn't thinking undefining system macros would be good. -DUSE_FORCE_FOO makes perfect sense. (and could applied to inject testing CLZs and stuff like that if you later feel it would be useful to do so)

elichai · 2020-08-10T08:23:43Z

Concept ACK.

FWIW there are quite a lot of users who just use 32bit because they don't use autotools and for some reason decided it was easier to just always use 32bit code, which is a shame. and this could help them (if they'll update their libsecp...)

examples:
mimblewimble/rust-secp256k1-zkp#71
ethereum/go-ethereum#21203 (was fixed just 2 months ago)

real-or-random · 2020-08-10T09:09:33Z

Right, I wasn't thinking undefining system macros would be good. -DUSE_FORCE_FOO makes perfect sense. (and could applied to inject testing CLZs and stuff like that if you later feel it would be useful to do so)

The undef thing was just a lazy example. CPPFLAGS=-DUSE_FORCE_FOO is essentially a hidden flag that can be passed to autoconf on the command line, so I don't think there is a need to keep any autoconf code to pass the flag through (but I'm not opposed to that either).

src/basic-config.h

gmaxwell · 2020-08-10T12:21:39Z

It would be really good if configure still showed the configuration, and maybe there was some string that the test/bench utilities could print--- so otherwise some weird detection failure doesn't result in a user silently getting half the speed with no evidence of the cause. I've had cases before where I was accidentally linking the wrong library and was confused by the performance.

e.g.

"libsecp256k1-deadbeef gcc 10.2.1 le x86_64 s=64asm f=64asm clz ctz nobignum endo ..."

So far this has not been needed, as it's only used by the static precomputation which always builds with 32-bit fields. This prepares for the ability to have __int128 detected on the C side, breaking that restriction.

sipa · 2020-08-10T21:39:57Z

Changed to make the C side detect the availability of __int128, with the ability to override through a define (and a hidden configure option).

An unexpected side effect here is that gen_context will now start using the __int128 based implementations as well, if they are available. This means I had to add SECP256K1_FE_STORAGE_CONST_GET for the 5x52 field, which didn't exist as it was never needed!

Instead of supporting configuration of the field and scalar size independently, both are now controlled by the availability of a 64x64->128 bit multiplication (currently only through __int128). This is autodetected from the C code through __SIZEOF_INT128__, but can be overridden using configure's --with-test-override-wide-multiply, or by defining USE_FORCE_WIDEMUL_{INT64,INT128} manually.

real-or-random · 2020-08-11T09:04:26Z

ACK 79f1f7a diff looks good and tests pass

elichai · 2020-08-11T23:09:05Z

tACK 79f1f7a
I tested all the compilers available at godbolt(https://godbolt.org/z/8cqP1q) and:

A. they all managed to parse it(some very old ones required replacing uint64_t with unsigned long long) except cc64 and ppci 0.5.5.
B. None returned 0.
C. All the architectures I recognized returned the value I had expected(128 for 64bit and 64 for 32bit).
This also includes a few MSVC compilers there.

Interesting finding, clang added support(x86-64) for 128bit on 3.3. and gcc on 4.6.4 (from the versions available on godbolt, I did not check outside).

gmaxwell · 2020-08-11T23:27:11Z

clang 3.3. was in 2013 while gcc 4.6 was in 2011.

…ailability Summary: ``` This PR does two things: It removes the ability to select the 5x52 field with a 8x32 scalar, or the 10x26 field with a 4x64 scalar. It's both 128-bit wide versions, or neither. The choice is made automatically by the C code, unless overridden by a USE_FORCE_WIDEMUL_INT{64,128} define (which is available through configure with a hidden option --with-test-override-wide-multiplication={auto,int64,int128}). This reduces the reliance on autoconf for this performance-critical configuration option, and also reduces the number of different combinations to test. This removes one theoretically useful combination: if you had x86_64 asm but no __int128 support in your compiler, it was possible to use the 64-bit field before but the 32-bit scalar. I think this doesn't matter as all compilers/systems that support (our) x86_64 asm also support __int128. Furthermore, #767 will break this. As an unexpected side effect, this also means the gen_context static precomputation tool will now use __int128 based implementations when available (which required an addition to the 5x52 field; see first commit). ``` Backport of secp2561k [[bitcoin-core/secp256k1#793 | PR793]]. Depends on D7610. Test Plan: cmake -GNinja .. ninja check-secp256k1 cmake -GNinja .. -DSECP256K1_TEST_OVERRIDE_WIDE_MULTIPLY=int64 ninja check-secp256k1 cmake -GNinja .. -DSECP256K1_TEST_OVERRIDE_WIDE_MULTIPLY=int128 ninja check-secp256k1 ../configure make -j4 check ../configure --with-test-override-wide-multiply=int64 make -j4 check ../configure --with-test-override-wide-multiply=int128 make -j4 check Run the Travis build. https://travis-ci.org/github/Fabcien/secp256k1/builds/731196575 Reviewers: #bitcoin_abc, deadalnix Reviewed By: #bitcoin_abc, deadalnix Differential Revision: https://reviews.bitcoinabc.org/D7629

sipa force-pushed the 202008_mergeconfig branch from fe3e7eb to 89acf9e Compare August 9, 2020 18:05

real-or-random reviewed Aug 10, 2020

View reviewed changes

src/basic-config.h Outdated Show resolved Hide resolved

Add SECP256K1_FE_STORAGE_CONST_GET to 5x52 field

0d7727f

So far this has not been needed, as it's only used by the static precomputation which always builds with 32-bit fields. This prepares for the ability to have __int128 detected on the C side, breaking that restriction.

sipa force-pushed the 202008_mergeconfig branch from 89acf9e to 2cd6ea3 Compare August 10, 2020 21:37

sipa changed the title ~~Merge scalar/field config into widemul setting~~ Make scalar/field choice depend on C-detected __int128 availability Aug 10, 2020

sipa force-pushed the 202008_mergeconfig branch from 2cd6ea3 to da16068 Compare August 10, 2020 21:52

sipa force-pushed the 202008_mergeconfig branch from da16068 to 79f1f7a Compare August 10, 2020 21:56

sipa mentioned this pull request Aug 10, 2020

Report build configuration in test/bench output #796

Open

real-or-random approved these changes Aug 11, 2020

View reviewed changes

real-or-random merged commit 887bd1f into bitcoin-core:master Aug 12, 2020

elichai mentioned this pull request Mar 9, 2021

Define field/scalar impl in secp256k1-sys/build.rs rust-bitcoin/rust-secp256k1#285

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make scalar/field choice depend on C-detected __int128 availability #793

Make scalar/field choice depend on C-detected __int128 availability #793

sipa commented Aug 9, 2020 •

edited

Loading

real-or-random commented Aug 9, 2020 •

edited

Loading

sipa commented Aug 9, 2020

real-or-random commented Aug 9, 2020

sipa commented Aug 10, 2020

real-or-random commented Aug 10, 2020

gmaxwell commented Aug 10, 2020

sipa commented Aug 10, 2020

gmaxwell commented Aug 10, 2020

elichai commented Aug 10, 2020

real-or-random commented Aug 10, 2020 •

edited

Loading

gmaxwell commented Aug 10, 2020 •

edited

Loading

sipa commented Aug 10, 2020

real-or-random commented Aug 11, 2020 •

edited

Loading

elichai commented Aug 11, 2020

gmaxwell commented Aug 11, 2020

Make scalar/field choice depend on C-detected __int128 availability #793

Make scalar/field choice depend on C-detected __int128 availability #793

Conversation

sipa commented Aug 9, 2020 • edited Loading

real-or-random commented Aug 9, 2020 • edited Loading

sipa commented Aug 9, 2020

real-or-random commented Aug 9, 2020

sipa commented Aug 10, 2020

real-or-random commented Aug 10, 2020

gmaxwell commented Aug 10, 2020

sipa commented Aug 10, 2020

gmaxwell commented Aug 10, 2020

elichai commented Aug 10, 2020

real-or-random commented Aug 10, 2020 • edited Loading

gmaxwell commented Aug 10, 2020 • edited Loading

sipa commented Aug 10, 2020

real-or-random commented Aug 11, 2020 • edited Loading

elichai commented Aug 11, 2020

gmaxwell commented Aug 11, 2020

sipa commented Aug 9, 2020 •

edited

Loading

real-or-random commented Aug 9, 2020 •

edited

Loading

real-or-random commented Aug 10, 2020 •

edited

Loading

gmaxwell commented Aug 10, 2020 •

edited

Loading

real-or-random commented Aug 11, 2020 •

edited

Loading