Syncvar Needs Atomic Loads and Stores #191
Comments
Okay, it looks like the syncvar code frequently uses the idiom of copying the whole |
Case in point: Lines 263 to 268 in d6ce514
Basically, the assumption described in that comment isn't true for ARM, and this idiom is (rightly) being flagged by the thread sanitizer.
Alright, I did some more reading on this topic this morning. Apparently, doing mixed-size atomics within the same block of memory is allowed by the x86 and ARM memory models as long as the outermost atomic is not bigger than 128 bits on x86 and 64 bits on ARM. There's a weird sticking point on ARM, though: 128-bit loads are sometimes still implemented in libatomic using locks for backcompat reasons. I haven't tracked down what the C/C++ memory model says about this yet, but it seems like it'd probably be fine.

Another interesting consequence of this idiom: the syncvar struct has an explicit lock anyway. That means if we speculatively load the whole thing as a 128-bit atomic, but then take the lock and use non-atomic accesses for the other members, we'd be doing mixed atomic and non-atomic reads and writes to the syncvar members other than the lock itself.

I suspect the fix is to use atomic reads and writes for the other members too, even when they're protected by the lock. At least in theory, that should not carry a significant performance penalty, since the thread that has acquired the lock has fresh access to the whole cache line.
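To make the suggested fix concrete, here is a minimal sketch in C11 of a lock-protected variable whose members are still accessed with (relaxed) atomics while the lock is held, so a lock-free speculative read never races with a plain store. Everything here is hypothetical: `demo_syncvar_t` and the `demo_*` functions are illustration names and not the qthreads layout or API, the spinlock on `atomic_flag` is a stand-in for whatever lock the real struct uses, and the 64-byte cache line is an assumption.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a syncvar; NOT the real qthreads struct. */
typedef struct {
    atomic_flag      lock;   /* explicit lock, held for full updates     */
    _Atomic uint32_t full;   /* 1 = full, 0 = empty                      */
    _Atomic uint64_t value;  /* payload, atomic even though lock-guarded */
} demo_syncvar_t;

/* Assumes a 64-byte cache line; adjust for the target. */
static_assert(sizeof(demo_syncvar_t) <= 64, "should fit in one cache line");

static void demo_init(demo_syncvar_t *v) {
    atomic_flag_clear(&v->lock);
    atomic_init(&v->full, 0);
    atomic_init(&v->value, 0);
}

static void demo_lock(demo_syncvar_t *v) {
    /* Acquire pairs with the release in demo_unlock. */
    while (atomic_flag_test_and_set_explicit(&v->lock, memory_order_acquire)) {
        /* spin */
    }
}

static void demo_unlock(demo_syncvar_t *v) {
    atomic_flag_clear_explicit(&v->lock, memory_order_release);
}

/* Writer: the lock orders the update, but the stores are still atomic,
 * so an unlocked peek below is not a data race. */
static void demo_fill(demo_syncvar_t *v, uint64_t x) {
    demo_lock(v);
    atomic_store_explicit(&v->value, x, memory_order_relaxed);
    atomic_store_explicit(&v->full, 1, memory_order_relaxed);
    demo_unlock(v);
}

/* Speculative check without taking the lock. The result is only a hint. */
static int demo_peek_full(demo_syncvar_t *v) {
    return (int)atomic_load_explicit(&v->full, memory_order_relaxed);
}

/* Authoritative read: re-check under the lock before using the value. */
static int demo_read_if_full(demo_syncvar_t *v, uint64_t *out) {
    int ok = 0;
    demo_lock(v);
    if (atomic_load_explicit(&v->full, memory_order_relaxed)) {
        *out = atomic_load_explicit(&v->value, memory_order_relaxed);
        ok = 1;
    }
    demo_unlock(v);
    return ok;
}
```

Relaxed ordering is used inside the critical section because the lock's acquire/release already provides the ordering; the atomics are there only to keep the unlocked peek well-defined. On common targets, relaxed loads and stores of naturally aligned 32- and 64-bit values compile to ordinary moves, which is consistent with the expectation above that the cost should be small.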
It is for situations such as this that we do try to keep most of our structs within one cache line in size.
Our syncvar implementation also needs to be updated to use explicit atomic reads and writes instead of just relying on the x86 memory consistency guarantees.
One example race:
qthreads/src/syncvar.c, line 175 in d6ce514
qthreads/src/syncvar.c, line 1359 in d6ce514
This shows up consistently with the thread sanitizer in the syncvar_prodcons test.
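For reference, this is roughly what "explicit atomic reads and writes" looks like for a producer/consumer handoff in C11. It is only a sketch of the general pattern, not a patch for the two lines referenced above: `demo_slot_t`, `demo_try_produce`, and `demo_try_consume` are hypothetical names, and the code assumes a single producer and a single consumer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical one-element mailbox; not taken from qthreads/src/syncvar.c. */
typedef struct {
    uint64_t    payload;  /* plain data, published through `full` */
    atomic_bool full;     /* true while payload holds an unread value */
} demo_slot_t;

/* Producer side. On x86 a plain store to `full` would often appear to
 * work, but the release store is what makes the payload write visible
 * to the consumer under the C11 model (and on ARM). */
static bool demo_try_produce(demo_slot_t *s, uint64_t x) {
    assert(s != NULL);
    if (atomic_load_explicit(&s->full, memory_order_acquire)) {
        return false;  /* previous value not consumed yet */
    }
    s->payload = x;
    atomic_store_explicit(&s->full, true, memory_order_release);
    return true;
}

/* Consumer side. The acquire load pairs with the producer's release
 * store; the release store on exit orders our payload read before the
 * producer's next write. */
static bool demo_try_consume(demo_slot_t *s, uint64_t *out) {
    assert(s != NULL && out != NULL);
    if (!atomic_load_explicit(&s->full, memory_order_acquire)) {
        return false;  /* nothing to read */
    }
    *out = s->payload;
    atomic_store_explicit(&s->full, false, memory_order_release);
    return true;
}
```

Building a test that calls these from two threads with `-fsanitize=thread` (GCC or Clang) should report no race on `payload`, whereas replacing the atomic accesses to `full` with plain ones would be expected to reproduce the kind of report described here.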