Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make scalar/field choice depend on C-detected __int128 availability #793

Merged
merged 2 commits into from
Aug 12, 2020

Conversation

sipa
Copy link
Contributor

@sipa sipa commented Aug 9, 2020

This PR does two things:

  • It removes the ability to select the 5x52 field with a 8x32 scalar, or the 10x26 field with a 4x64 scalar. It's both 128-bit wide versions, or neither.
  • The choice is made automatically by the C code, unless overridden by a USE_FORCE_WIDEMUL_INT{64,128} define (which is available through configure with a hidden option --with-test-override-wide-multiplication={auto,int64,int128}).

This reduces the reliance on autoconf for this performance-critical configuration option, and also reduces the number of different combinations to test.

This removes one theoretically useful combination: if you had x86_64 asm but no __int128 support in your compiler, it was possible to use the 64-bit field before but the 32-bit scalar. I think this doesn't matter as all compilers/systems that support (our) x86_64 asm also support __int128. Furthermore, #767 will break this.

As an unexpected side effect, this also means the gen_context static precomputation tool will now use __int128 based implementations when available (which required an addition to the 5x52 field; see first commit).

@real-or-random
Copy link
Contributor

real-or-random commented Aug 9, 2020

Concept ACK

I like the naming here which demonstrates that this is only about multiplications. That seems useful as a first step towards the idea that has been mentioned in #711 (comment) and in the following comment.

@sipa
Copy link
Contributor Author

sipa commented Aug 9, 2020

Indeed, this could be extended with a --with-wide-multiplication=mul128 for MSVC for example (well, MSVC platforms likely wouldn't use the configure script at all, but the equivalent macro config).

@real-or-random
Copy link
Contributor

Maybe we could get rid of autoconf here too by checking __SIZEOF_INT128__: https://stackoverflow.com/a/54815033

@sipa
Copy link
Contributor Author

sipa commented Aug 10, 2020

@real-or-random That would certainly simplify things further.

One reason to not do this may be to keep the ability to explicitly select uint64_t based multiplication in a test.

For example, I think we expect that all x86_64 Linux/gcc configurations would end up using the __int128 based multiplication, and without configurability, every such instance on Travis would do so. However, perhaps there are reasonable x86_64 Linux/gcc (or very similar) configurations out there that we don't know about, or can't test for, without __int128. If so, it may be better to keep it configurable, and have at least one explicit test with uint64_t based multiply.

@real-or-random
Copy link
Contributor

We could still disable in the tests using macros, e.g., by simply adding -U__SIZEOF_INT128__ to the CPPFLAGS. I guess part of the question is whether we want to expose an easy interface for configuring to the user, because ./configure shines here. But I don't think we want that in that case. Forcing a narrower type is not very useful for users.

@gmaxwell
Copy link
Contributor

I don't think it should be exposed to the user. It should be possible to override w/ ifdefs for testing/benchmarking by developers (see for example the safegcd PR where testing 32-bit on x86_64 is both common and super useful), or for building for weirdo platforms where the type exists but doesn't work. This kind of thing can be documented in a development focused readme.

Exposing it to joe-user just begs him to set it in a dumb way and hurt his performance. :)

Another option if there is some reservation around define macro sniffing is to have it in autotools but hide the setting or explicitly note that it's for testing. Also ./configure setting can trigger overrides of macro-sniffed stuff, so the question of detection via autotools vs macros is largely orthogonal.

I think that in general if there is a simple, portable, and reliable macro test it should be preferred-- autotools checking is slow, more likely to run into host vs target issues, and depends on using autotools, which not all targets will do. Especially in this project because the whole thing is one compilation unit and there are very few things that need to be detected (endianness, type sizes, clz instructions, not much else...) , so you don't end up with the issue of a huge detection header that is redundantly run for every object.

@sipa
Copy link
Contributor Author

sipa commented Aug 10, 2020

I think I prefer a hidden configure flag, say "--with-test-override-widemul=X" or so. It feels a bit ugly and strangely roundabout to go undefine system defines to guide the autodetection you wrote yourself into the right thing for testing.

I'd imagine it works by having USE_FORCE_WIDEMUL_X macros that are usually all undefined, but can be set by the configure script. There is no autodetection or verification in the script; it's just passed through.

@gmaxwell
Copy link
Contributor

Right, I wasn't thinking undefining system macros would be good. -DUSE_FORCE_FOO makes perfect sense. (and could applied to inject testing CLZs and stuff like that if you later feel it would be useful to do so)

@elichai
Copy link
Contributor

elichai commented Aug 10, 2020

Concept ACK.

FWIW there are quite a lot of users who just use 32bit because they don't use autotools and for some reason decided it was easier to just always use 32bit code, which is a shame. and this could help them (if they'll update their libsecp...)

examples:
mimblewimble/rust-secp256k1-zkp#71
ethereum/go-ethereum#21203 (was fixed just 2 months ago)

@real-or-random
Copy link
Contributor

real-or-random commented Aug 10, 2020

Right, I wasn't thinking undefining system macros would be good. -DUSE_FORCE_FOO makes perfect sense. (and could applied to inject testing CLZs and stuff like that if you later feel it would be useful to do so)

The undef thing was just a lazy example. CPPFLAGS=-DUSE_FORCE_FOO is essentially a hidden flag that can be passed to autoconf on the command line, so I don't think there is a need to keep any autoconf code to pass the flag through (but I'm not opposed to that either).

src/basic-config.h Outdated Show resolved Hide resolved
@gmaxwell
Copy link
Contributor

gmaxwell commented Aug 10, 2020

It would be really good if configure still showed the configuration, and maybe there was some string that the test/bench utilities could print--- so otherwise some weird detection failure doesn't result in a user silently getting half the speed with no evidence of the cause. I've had cases before where I was accidentally linking the wrong library and was confused by the performance.

e.g.

"libsecp256k1-deadbeef gcc 10.2.1 le x86_64 s=64asm f=64asm clz ctz nobignum endo ..."

So far this has not been needed, as it's only used by the static precomputation
which always builds with 32-bit fields.

This prepares for the ability to have __int128 detected on the C side, breaking
that restriction.
@sipa
Copy link
Contributor Author

sipa commented Aug 10, 2020

Changed to make the C side detect the availability of __int128, with the ability to override through a define (and a hidden configure option).

An unexpected side effect here is that gen_context will now start using the __int128 based implementations as well, if they are available. This means I had to add SECP256K1_FE_STORAGE_CONST_GET for the 5x52 field, which didn't exist as it was never needed!

@sipa sipa changed the title Merge scalar/field config into widemul setting Make scalar/field choice depend on C-detected __int128 availability Aug 10, 2020
Instead of supporting configuration of the field and scalar size independently,
both are now controlled by the availability of a 64x64->128 bit multiplication
(currently only through __int128). This is autodetected from the C code through
__SIZEOF_INT128__, but can be overridden using configure's
--with-test-override-wide-multiply, or by defining
USE_FORCE_WIDEMUL_{INT64,INT128} manually.
@real-or-random
Copy link
Contributor

real-or-random commented Aug 11, 2020

ACK 79f1f7a diff looks good and tests pass

@elichai
Copy link
Contributor

elichai commented Aug 11, 2020

tACK 79f1f7a
I tested all the compilers available at godbolt(https://godbolt.org/z/8cqP1q) and:

A. they all managed to parse it(some very old ones required replacing uint64_t with unsigned long long) except cc64 and ppci 0.5.5.
B. None returned 0.
C. All the architectures I recognized returned the value I had expected(128 for 64bit and 64 for 32bit).
This also includes a few MSVC compilers there.

Interesting finding, clang added support(x86-64) for 128bit on 3.3. and gcc on 4.6.4 (from the versions available on godbolt, I did not check outside).

@gmaxwell
Copy link
Contributor

clang 3.3. was in 2013 while gcc 4.6 was in 2011.

@real-or-random real-or-random merged commit 887bd1f into bitcoin-core:master Aug 12, 2020
jasonbcox pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Sep 29, 2020
…ailability

Summary:
```
This PR does two things:

    It removes the ability to select the 5x52 field with a 8x32 scalar,
or the 10x26 field with a 4x64 scalar. It's both 128-bit wide versions,
or neither.
    The choice is made automatically by the C code, unless overridden by
a USE_FORCE_WIDEMUL_INT{64,128} define (which is available through
configure with a hidden option
--with-test-override-wide-multiplication={auto,int64,int128}).

This reduces the reliance on autoconf for this performance-critical
configuration option, and also reduces the number of different
combinations to test.

This removes one theoretically useful combination: if you had x86_64 asm
but no __int128 support in your compiler, it was possible to use the
64-bit field before but the 32-bit scalar. I think this doesn't matter
as all compilers/systems that support (our) x86_64 asm also support
__int128. Furthermore, #767 will break this.

As an unexpected side effect, this also means the gen_context static
precomputation tool will now use __int128 based implementations when
available (which required an addition to the 5x52 field; see first
commit).
```

Backport of secp2561k [[bitcoin-core/secp256k1#793 | PR793]].

Depends on D7610.

Test Plan:
  cmake -GNinja ..
  ninja check-secp256k1

  cmake -GNinja .. -DSECP256K1_TEST_OVERRIDE_WIDE_MULTIPLY=int64
  ninja check-secp256k1

  cmake -GNinja .. -DSECP256K1_TEST_OVERRIDE_WIDE_MULTIPLY=int128
  ninja check-secp256k1

  ../configure
  make -j4 check

  ../configure --with-test-override-wide-multiply=int64
  make -j4 check

  ../configure --with-test-override-wide-multiply=int128
  make -j4 check

Run the Travis build.
https://travis-ci.org/github/Fabcien/secp256k1/builds/731196575

Reviewers: #bitcoin_abc, deadalnix

Reviewed By: #bitcoin_abc, deadalnix

Differential Revision: https://reviews.bitcoinabc.org/D7629
deadalnix pushed a commit to Bitcoin-ABC/secp256k1 that referenced this pull request Sep 30, 2020
…ailability

Summary:
```
This PR does two things:

    It removes the ability to select the 5x52 field with a 8x32 scalar,
or the 10x26 field with a 4x64 scalar. It's both 128-bit wide versions,
or neither.
    The choice is made automatically by the C code, unless overridden by
a USE_FORCE_WIDEMUL_INT{64,128} define (which is available through
configure with a hidden option
--with-test-override-wide-multiplication={auto,int64,int128}).

This reduces the reliance on autoconf for this performance-critical
configuration option, and also reduces the number of different
combinations to test.

This removes one theoretically useful combination: if you had x86_64 asm
but no __int128 support in your compiler, it was possible to use the
64-bit field before but the 32-bit scalar. I think this doesn't matter
as all compilers/systems that support (our) x86_64 asm also support
__int128. Furthermore, #767 will break this.

As an unexpected side effect, this also means the gen_context static
precomputation tool will now use __int128 based implementations when
available (which required an addition to the 5x52 field; see first
commit).
```

Backport of secp2561k [[bitcoin-core/secp256k1#793 | PR793]].

Depends on D7610.

Test Plan:
  cmake -GNinja ..
  ninja check-secp256k1

  cmake -GNinja .. -DSECP256K1_TEST_OVERRIDE_WIDE_MULTIPLY=int64
  ninja check-secp256k1

  cmake -GNinja .. -DSECP256K1_TEST_OVERRIDE_WIDE_MULTIPLY=int128
  ninja check-secp256k1

  ../configure
  make -j4 check

  ../configure --with-test-override-wide-multiply=int64
  make -j4 check

  ../configure --with-test-override-wide-multiply=int128
  make -j4 check

Run the Travis build.
https://travis-ci.org/github/Fabcien/secp256k1/builds/731196575

Reviewers: #bitcoin_abc, deadalnix

Reviewed By: #bitcoin_abc, deadalnix

Differential Revision: https://reviews.bitcoinabc.org/D7629
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants