Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clear sensitive memory without getting optimized out (revival of #636) #1579

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

theStack
Copy link
Contributor

@theStack theStack commented Aug 6, 2024

This PR picks up #636 (which in turn picked up #448, so this is take number three) and is essentially a rebase on master.

Some changes to the original PR:

So far I haven't looked at any disassembly and possible performance implications yet (there were some concerns expressed in #636 (comment)), happy to go deeper there if this gets Concept ACKed.

The proposed method of using a memory barrier to prevent optimizating away the memset is still used in BoringSSL (where it was originally picked up from) and in the Linux Kernel, see e.g. https://github.com/google/boringssl/blob/5af122c3dfc163b5d1859f1f450756e8e320a142/crypto/mem.c#L335 and https://github.com/torvalds/linux/blob/d4560686726f7a357922f300fc81f5964be8df04/include/linux/string.h#L348 / https://github.com/torvalds/linux/blob/d4560686726f7a357922f300fc81f5964be8df04/include/linux/compiler.h#L102

Fixes #185.

Copy link
Contributor

@real-or-random real-or-random left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK (obviously)

Thanks for reviving this, I never had the time/motivation to come back to this PR, but it's important.

We should call SECP256K1_CHECKMEM_UNDEFINE (

* - SECP256K1_CHECKMEM_UNDEFINE(p, len):
) in secp256k1_memclear after clearing the memory. Reading cleared memory would clearly be a bug.

@theStack
Copy link
Contributor Author

We should call SECP256K1_CHECKMEM_UNDEFINE (

* - SECP256K1_CHECKMEM_UNDEFINE(p, len):

) in secp256k1_memclear after clearing the memory. Reading cleared memory would clearly be a bug.

Thanks, added that, and rebased on master.

@real-or-random
Copy link
Contributor

@theStack Can you rebase this on top of musig which has introduced a few more code locations that need clearing?

Personally, I'd love to have this in the next release.

@theStack
Copy link
Contributor Author

theStack commented Oct 8, 2024

@theStack Can you rebase this on top of musig which has introduced a few more code locations that need clearing?

Sure, done. With only five lines changed in the musig module, this needed less effort than expected, hope I didn't miss any instances (many of them are handled indirectly via the ..._{fe,scalar}_clear functions though).

Relevant excerpt of the range-diff (uncolored here)
$ git range-diff ac0e41...0b01d2
5:  6fcbae9 ! 15:  02ee811 Use secp256k1_memclear() to clear stack memory instead of memset()
    @@ src/modules/ellswift/main_impl.h: int secp256k1_ellswift_xdh(const secp256k1_con
          secp256k1_scalar_clear(&s);
      
     
    + ## src/modules/musig/session_impl.h ##
    +@@ src/modules/musig/session_impl.h: static void secp256k1_nonce_function_musig(secp256k1_scalar *k, const unsigned c
    +         secp256k1_scalar_set_b32(&k[i], buf, NULL);
    + 
    +         /* Attempt to erase secret data */
    +-        memset(buf, 0, sizeof(buf));
    +-        memset(&sha_tmp, 0, sizeof(sha_tmp));
    ++        secp256k1_memclear(buf, sizeof(buf));
    ++        secp256k1_memclear(&sha_tmp, sizeof(sha_tmp));
    +     }
    +-    memset(rand, 0, sizeof(rand));
    +-    memset(&sha, 0, sizeof(sha));
    ++    secp256k1_memclear(rand, sizeof(rand));
    ++    secp256k1_memclear(&sha, sizeof(sha));
    + }
    + 
    + int secp256k1_musig_nonce_gen_internal(const secp256k1_context* ctx, secp256k1_musig_secnonce *secnonce, secp256k1_musig_pubnonce *pubnonce, const unsigned char *input_nonce, const unsigned char *seckey, const secp256k1_pubkey *pubkey, const unsigned char *msg32, const secp256k1_musig_keyagg_cache *keyagg_cache, const unsigned char *extra_input32) {
    +@@ src/modules/musig/session_impl.h: int secp256k1_musig_nonce_gen_counter(const secp256k1_context* ctx, secp256k1_mu
    +     if (!secp256k1_musig_nonce_gen_internal(ctx, secnonce, pubnonce, buf, seckey, &pubkey, msg32, keyagg_cache, extra_input32)) {
    +         return 0;
    +     }
    +-    memset(seckey, 0, sizeof(seckey));
    ++    secp256k1_memclear(seckey, sizeof(seckey));
    +     return 1;
    + }
    + 
    +

@real-or-random
Copy link
Contributor

This one comes to my mind, too:

secp256k1/src/util.h

Lines 253 to 254 in a88aa93

/* acc may contain secret values. Try to explicitly clear it. */
acc = 0;

@theStack
Copy link
Contributor Author

theStack commented Oct 8, 2024

This one comes to my mind, too:

secp256k1/src/util.h

Lines 253 to 254 in a88aa93

/* acc may contain secret values. Try to explicitly clear it. */
acc = 0;

Ah good catch, missed that (only grepped for memset calls). Tackled now as well.

Copy link
Contributor

@real-or-random real-or-random left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want motivated, you could look at git grep secp256k1_ge_set_gej.*(. @sipa's recent comment in the MuSig PR really caught my attention. When I worked on the previous PR, I really didn't consider the possibility that a gej could leak secret data.


I wonder if it makes sense to have the _finalize functions in the hash module clear the state automatically. And then have "unsafe" funtions like _finalize_keep if the callers knows that data is public or if the caller wants to reuse the midstate. It sounds like a neat idea, but I'm not really convinced because it special-cases the hash module somewhat: we'll need to manually clear everything else including scalars, etc... So we have to be careful with this anyway when writing and reviewing code, and perhaps having yet another safety mechanism that works only for the hash module. (This could perhaps be a nice follow-up if we add the same mechanism to more modules, e.g., ge_set_gej could clear the gejs, unless you use a _keep variant.)

src/util.h Outdated Show resolved Hide resolved
for (i = 0; i <= counter; i++) {
secp256k1_rfc6979_hmac_sha256_generate(&rng, nonce32, 32);
}
secp256k1_rfc6979_hmac_sha256_finalize(&rng);

secp256k1_memclear(keydata, sizeof(keydata));
secp256k1_rfc6979_hmac_sha256_clear(&rng);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add similar calls in other code locations where we derive secrets via hash functions, e.g., in the schnorrsig and musig modules (and ellswift? now sure). The previous PR predates even the schnorrsig module, see the commit message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add similar calls in other code locations where we derive secrets via hash functions, e.g., in the schnorrsig and musig modules (and ellswift? now sure).

As far as I could see by greping for _finalize, all places where we derive secrets via hashing had corresponding hash object clearing calls already (probably I did that already in the first version of this PR, though I honestly don't remember). What I found though was that it was unnecessarily called in the schnorrsig challenge function, so I removed it again, as there are no secrets involved. For the musig module, the _memclear calls were replaced by _sha256_clear calls.

One place where I added a hash clearing call was in the public API function secp256k1_tagged_sha256. We don't use that internally, but it seems to make sense to clear the hash object, considering that we don't know if the user passes in secret data or not.

The previous PR predates even the schnorrsig module, see the commit message.

Right, removed that part of the commit message.

@theStack
Copy link
Contributor Author

If you want motivated, you could look at git grep secp256k1_ge_set_gej.*(. @sipa's recent comment in the MuSig PR really caught my attention. When I worked on the previous PR, I really didn't consider the possibility that a gej could leak secret data.

I found the following functions containing _gej instances resulting from point multiplication (secp256k1_ecmult_{gen,const}) with secret scalars:

Taking care of those could be probably go into an own PR, as its trivial to fix and review and hence has a significantly higher chance to land in the next release than this one? (and having a version where the compiler still might optimize it out seems still much better than not doing it at all)

Interestingly, the ECDSA signing function does clear out the nonce commitment for both the jacobi and affine points (though the latter wouldn't be needed according to #1479 (comment)).

I wonder if it makes sense to have the _finalize functions in the hash module clear the state automatically. And then have "unsafe" funtions like _finalize_keep if the callers knows that data is public or if the caller wants to reuse the midstate. It sounds like a neat idea, but I'm not really convinced because it special-cases the hash module somewhat: we'll need to manually clear everything else including scalars, etc... So we have to be careful with this anyway when writing and reviewing code, and perhaps having yet another safety mechanism that works only for the hash module. (This could perhaps be a nice follow-up if we add the same mechanism to more modules, e.g., ge_set_gej could clear the gejs, unless you use a _keep variant.)

Sounds like a good idea to me for a follow-up.

@real-or-random
Copy link
Contributor

real-or-random commented Oct 10, 2024

Taking care of those could be probably go into an own PR, as its trivial to fix and review and hence has a significantly higher chance to land in the next release than this one? (and having a version where the compiler still might optimize it out seems still much better than not doing it at all)

My guess is that what we do currently is useless on any modern compiler. I admit that I haven't looked at the compiler output, but I'd rather spend the time on resolving the problem.

I don't think that fixing these additional cases here will make it much more difficult to review the PR. And to be honest, while this is a great for defense in depth, we haven't deeply cared about this so far. It's not the end of the world if we need to wait a few more months. So I think it's good to add these cases to this PR here.

@theStack
Copy link
Contributor Author

My guess is that what we do currently is useless on any modern compiler. I admit that I haven't looked at the compiler output, but I'd rather spend the time on resolving the problem.

I don't think that fixing these additional cases here will make it much more difficult to review the PR. And to be honest, while this is a great for defense in depth, we haven't deeply cared about this so far. It's not the end of the world if we need to wait a few more months. So I think it's good to add these cases to this PR here.

Thanks for the feedback! Makes sense, added an extra commit at the end for the gej clearing after point multiplications, it's only four lines of code anyway. Also went over the necessary hash clearing places and made a few small changes (see #1579 (comment)).

@sipa
Copy link
Contributor

sipa commented Oct 21, 2024

utACK c921078. I have not reviewed this for exhaustiveness (as in, are there more places where clearing is useful/necessary), but the code changes look good.

void *(*volatile const volatile_memset)(void *, int, size_t) = memset;
volatile_memset(ptr, 0, len);
#endif
SECP256K1_CHECKMEM_UNDEFINE(ptr, len);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, it seems that in production code, we usually don't call any of the _CHECKMEM_{DEFINE,UNDEFINE} macros. There is one reachable code-path in secp256k1_declassify, but it would only hit if the context was created with the SECP256K1_CONTEXT_DECLASSIFY flag, where the API documentation explicitly says "Do not use". Should that _CHECKMEM_UNDEFINE call here be conditional by a preprocessor #if(def) or something alike, so it only applies to tests and can't slow down (and bloat) release builds?

I noticed that while looking at the disassembly of the .so file and wondering why there was so much extra code after the memory clearing, until I realized this must be valgrind's VALGRIND_MAKE_MEM_UNDEFINED. I'm building now explicitly with -DSECP256K1_VALGRIND=OFF -DSECP256K1_BUILD_CTIME_TESTS=OFF to avoid that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, great point! The reason why secp256k1_declassify uses a run-time flag (instead of compile-time flag) is that we want to run the constant-time tests on the real binary.

I don't know what the performance impact of these additional instructions is, but I doubt that having a compile-time flag is a concern in this case. That means that we could just wrap the SECP256K1_CHECKMEM_UNDEFINE in an #ifdef VERIFY. @sipa What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the reasoning is that we want the release binaries to be as close as possible to what we actually test in the ctime test, but disable the effect at runtime using SECP256K1_CONTEXT_DECLASSIFY to avoid a performance impact.

It wouldn't surprise me that that is overkill; the conditional to determine whether or not to declassify operations at runtime may have a higher cost than actually executing the nop instructions that declassification actually correspond to when not running instrumented by valgrind.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sorry, I think we're talking past each other. My suggestion, for which I'd like to have your opinion on, is to wrap SECP256K1_CHECKMEM_UNDEFINE in an #ifdef VERIFY block here in this particular case of clearing memory, i.e., inside secp256k1_memclear (and not in secp256k1_declassify. Do you think that's a reasonable thing to do?


But regarding what you said:

I think the reasoning is that we want the release binaries to be as close as possible to what we actually test in the ctime test, but disable the effect at runtime using SECP256K1_CONTEXT_DECLASSIFY to avoid a performance impact.

Yes, I agree. Perhaps an additional reason is to avoid any shenanigans that the valgrind syscalls may have. They should be noops, but it's certainly a bit safer not to run them at all.

It wouldn't surprise me that that is overkill; the conditional to determine whether or not to declassify operations at runtime may have a higher cost than actually executing the nop instructions that declassification actually correspond to when not running instrumented by valgrind.

I can imagine that the overhead of the noops is negligible, but at least checking the conditional should be negligible as well because we use EXCEPT to help the compiler predict the branch:

secp256k1/src/secp256k1.c

Lines 236 to 238 in 68b5520

static SECP256K1_INLINE void secp256k1_declassify(const secp256k1_context* ctx, const void *p, size_t len) {
if (EXPECT(ctx->declassify, 0)) SECP256K1_CHECKMEM_DEFINE(p, len);
}

Copy link
Contributor Author

@theStack theStack Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What bothered me in general (independent on how lightweight the code generated by the valgrind macros might be) is that when users build with the default CMake settings, they could currently end up with different release binaries, depending on whether valgrind is installed or not. Unrelated to this PR, but maybe the SECP256K1_VALGRIND build setting should only be auto-detected in the "dev-mode" preset, and OFF by default? (I only looked at the CMake build, don't know if the valgrind setting is also auto-detected on autotools).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. That's a valid point, which has been brought up before, see #813 (comment). (This thread is also very helpful to understand why we have run-time flag etc.) The conclusion was that the benefits outweigh the drawback that the build outputs depends on the availability of valgrind, but yeah, there's no perfect solution here. And yes, it's also auto-detected in autotools.

@theStack
Copy link
Contributor Author

theStack commented Oct 22, 2024

For a first check on whether this PR fulfills its promise, I looked at the generated assembler code of secp256k1_ec_seckey_verify, as this function is particularly short and should clear out the secret scalar at the end. The following commands were used to generate the disassembled output:

$ cmake --build build && objdump --no-addresses --no-show-raw-insn --disassemble=secp256k1_ec_seckey_verify ./build/lib/libsecp256k1.so > seckey_verify_disasm.txt
compiler output diff of `secp256k1_ec_seckey_verify` (master vs. PR branch)

running on Debian 12 stable (Bookworm) and gcc 12.2.0 on a x86-64 machine

--- master.txt  2024-10-22 16:09:38.985554751 +0200
+++ pr1579.txt  2024-10-22 16:12:37.324112394 +0200
@@ -11,31 +11,38 @@
 Disassembly of section .text:
 
 <secp256k1_ec_seckey_verify>:
-       sub    $0x38,%rsp
+       push   %rbx
+       sub    $0x30,%rsp
        test   %rsi,%rsi
-       je     <secp256k1_ec_seckey_verify+0x48>
+       je     <secp256k1_ec_seckey_verify+0x58>
+       lea    0x10(%rsp),%rbx
        lea    0xc(%rsp),%rdx
-       lea    0x10(%rsp),%rdi
+       pxor   %xmm0,%xmm0
+       mov    %rbx,%rdi
        call   <secp256k1_scalar_set_b32>
        mov    0xc(%rsp),%ecx
        mov    0x10(%rsp),%rax
        or     0x18(%rsp),%rax
        or     0x20(%rsp),%rax
+       movaps %xmm0,0x10(%rsp)
        or     0x28(%rsp),%rax
+       movaps %xmm0,0x20(%rsp)
        setne  %dl
        xor    %eax,%eax
        test   %ecx,%ecx
        sete   %al
-       add    $0x38,%rsp
        and    %edx,%eax
+       add    $0x30,%rsp
+       pop    %rbx
        ret
        nopl   0x0(%rax)
        mov    %rdi,%rax
        mov    0xb0(%rdi),%rsi
-       lea    0x63b9(%rip),%rdi        # <_fini+0x1196>
+       lea    0x6199(%rip),%rdi        # <_fini+0xf46>
        call   *0xa8(%rax)
+       add    $0x30,%rsp
        xor    %eax,%eax
-       add    $0x38,%rsp
+       pop    %rbx
        ret
There is a little noise in the diff due to different register allocations and jump offsets, but in the middle one can see quite clearly that the memory clearing happens using the 16-byte SSE register xmm0: it is earlier zeroed with a `pxor` instruction and then written with `movaps` instruction twice to memory locations `0x10(%rsp)` and `0x20(%rsp)`, which correspond to the 32-byte `secp256k1_scalar` instance (interestingly, the clearing is interleaved with the `mov`/`or` instructions of the `_scalar_is_zero` check). So this seems to work as expected. Might be interesting to also try it with a variety of other compiler versions (especially newer ones) and different architectures, like arm64.

Copy link
Contributor

@real-or-random real-or-random left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK mod my review comments

src/util.h Outdated Show resolved Hide resolved
Comment on lines +232 to +234
#if defined(_MSC_VER)
/* SecureZeroMemory is guaranteed not to be optimized out by MSVC. */
SecureZeroMemory(ptr, len);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at the same code in Core again, and found this PR, and commented there: bitcoin/bitcoin#26950 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. I left it as-is for now, happy to adapt once there is clarity that the variant in Bitcoin Core is preferred or needed.

Comment on lines +249 to +251
#else
void *(*volatile const volatile_memset)(void *, int, size_t) = memset;
volatile_memset(ptr, 0, len);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested this locally, but it would be nice to see a full CI run where this branch is forced (e.g., via #if 0 / #elif 0)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #1622 for that.

real-or-random and others added 7 commits October 22, 2024 18:24
This code is not supposed to handle secret data.
We rely on memset() and an __asm__ memory barrier where it's available or
on SecureZeroMemory() on Windows. The fallback implementation uses a
volatile function pointer to memset which the compiler is not clever
enough to optimize.
There are two uses of the secp256k1_fe_clear() function that are now separated
into these two functions in order to reflect the intent:

1) initializing the memory prior to being used -> converted to fe_set_int( . , 0 )
2) zeroing the memory after being used such that no sensitive data remains. ->
    remains as fe_clear()

In the latter case, 'magnitude' and 'normalized' need to be overwritten when
VERIFY is enabled.

Co-Authored-By: isle2983 <[email protected]>
All of the invocations of secp256k1_memclear() operate on stack
memory and happen after the function is done with the memory object.
This commit replaces existing memset() invocations and also adds
secp256k1_memclear() to code locations where clearing was missing;
there is no guarantee that this commit covers all code locations
where clearing is necessary.

Co-Authored-By: isle2983 <[email protected]>
real-or-random and others added 2 commits October 22, 2024 18:29
This gives the caller more control about whether the state should
be cleaned (= should be considered secret). Moreover, it gives the
caller the possibility to clean a hash struct without finalizing it.
Quoting sipa (see bitcoin-core#1479 (comment)):
"When performing an EC multiplication A = aG for secret a, the resulting
 _affine_ coordinates of A are presumed to not leak information about a (ECDLP),
  but the same is not necessarily true for the Jacobian coordinates that come
  out of our multiplication algorithm."

For the ECDH point multiplication result, the result in Jacobi coordinates should be
cleared not only to avoid leaking the scalar, but even more so as it's a representation
of the resulting shared secret.
@theStack
Copy link
Contributor Author

Fixed #1579 (comment) and rebased on master for easier comparison of branches (since #1553, the output location of the .so file changed from ./build/src to ./build/lib; not being aware of that, I was unintentionally comparing the library file generated from master with itself for quite some time, wondering why the PR doesn't change anything 🤦‍♂️ ).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Memory zeroization improvements
3 participants