-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
secp256k1_scalar_check_overflow is not constant time on S390 #784
Comments
My first thought was: Maybe first do the two ORs and only afterwards do the AND with the result and ps: It even saves an op (when compiled naively)! |
Thanks for the suggestion. I already tried that. :( Didn't help. Though it didn't occur to me that it might be better generally. Sounds like a fine change. :P |
Another With this fix applied, we have the same issue in On the way, I wondered why some of the short functions are |
Great find! ugh. maybe it's just me but casting to volatile just seems even more awful than having a volatile temporary that copies an input as we've done in some other places. Also-- there are integer comparisons all over the code base and AFAIK the next version of the compiler will just create this problem for other ones. I also feel like if I prompted GPT3 with a story about peppering our source code with volatile casts it would likely continue the story by saying and that's how we found 27 compiler bugs... I'm going to see if I can solicit some help from a s390x expert or gcc developer. Having a BE test in travis would be really good. But unlike some of the other constant time hacks in the codebase, I think this one is unlikely to help any other cpus/compilers or even silence the issue completely on s390x. |
Well yeah. The cast variant is in fact faster here (typically one instruction) because then only the read is volatile and not the store in the temporary. But I was somewhat surprised that this works here. In the existing instances
This may just be true, yeah.
Hehe, maybe. Let's also add a few more
This would be neat.
Yes, and we don't need the ct test for this anyway.
Agreed.
Hm, I believe it would. At least on that GCC config, the one in this edit: strike through totally wrong assumption, I misremembered this. |
Funnily enough, it's also enough to make |
I have rarely felt the need for a "puke" reaction emoji on Github. |
That's probably not helpful but I forgot to mention it in my comment yesterday: we don't even know if the Valgrind models |
I found docs that said it was variable time at least on one older cpu in that line. As in it said that it was x NS + bytes * y NS. You're right thought that it could be different on more recent chips-- this is where talking to an expert would help, but even still, we'd be left with a valgrind FP so that wouldn't help us very much. |
And it supports up to 255 byte long comparisons, which is unlikely to be running in constant time if it can bail out early. Of course, it could work in a quantized way, comparing groups of say 8 bytes at a time, and comparisons of up to 8 would be ct then. |
This is an odd architecture and is mostly interesting here just because its the only BE system available in the CI system.
It compiles the following simple code into variable time code:
The same thing happens in the analogous part of the 32-bit version of the function.
I tried all manner of compiler switches and could only make the situation worse-- with other similar comparisons becoming variable time.
The issue is that the architecture has an instruction which works functionally like memcmp, and the compiler will sometimes emit it for seemingly arbitrary comparisons.
I tried several different forms for the code but couldn't get it to stop. I don't expect anyone working on this repository to do anything about it now, but I'm planning on opening a GCC bug report and want something to point to.
The text was updated successfully, but these errors were encountered: