Possible crash consistency issue with truncate using multiple file descriptors #9
Hello, I think the bug works as follows:
While decreasing the tree height, the new height and new root block are written using one instruction. The underlying issue is that WineFS (like PMFS) uses cmpxchg16b to "atomically" store 16 bytes.
Our testing tool splits this store into two 8-byte stores (first the height, then the root block). One bug fix/workaround is to enclose this cmpxchg16b call in a transaction.
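To make the torn state concrete, here is a minimal sketch in plain user-space C of what the testing tool's split implies. The `tree_root` struct, its field names, and the helper functions are hypothetical and only model the height/root-block pair described above; they are not the WineFS inode layout or code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-byte pair modeling the description above: the tree
 * height in one 8-byte word and the root block in the other. This is
 * not the actual WineFS on-media layout. */
struct tree_root {
    uint64_t height;
    uint64_t root_block;
};

/* Model the 16-byte update as two 8-byte stores (height first, then root
 * block) and stop after `stores_done` of them to simulate a crash. */
static struct tree_root torn_update(struct tree_root old_val,
                                    struct tree_root new_val,
                                    int stores_done)
{
    assert(stores_done >= 0 && stores_done <= 2);
    struct tree_root persisted = old_val;
    if (stores_done >= 1)
        persisted.height = new_val.height;
    if (stores_done >= 2)
        persisted.root_block = new_val.root_block;
    return persisted;
}

static int same_root(struct tree_root a, struct tree_root b)
{
    return a.height == b.height && a.root_block == b.root_block;
}

/* The persisted pair is crash consistent only if it is exactly the old
 * pair or exactly the new pair. */
static int is_consistent(struct tree_root persisted,
                         struct tree_root old_val,
                         struct tree_root new_val)
{
    return same_root(persisted, old_val) || same_root(persisted, new_val);
}
```

Calling `torn_update(old_val, new_val, 1)` yields the new height paired with the old root block, which `is_consistent` rejects. That mixed state is the one a transaction around the update is meant to rule out.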
Hi Rohan,
I think I've found a new crash consistency issue in WineFS that occurs if multiple file descriptors are used to modify a file and we crash while increasing the file's size using truncate. The bug is a bit complicated and unfortunately I don't have a good way to replicate it outside of my testing infrastructure and don't fully understand the root cause yet, so let me know if anything doesn't make sense. I'm running WineFS in strict mode; I don't know if this occurs in relaxed mode as well.
Suppose we have a program that performs the following operations:
The workload is a bit strange (it was generated by our fuzzer) but I've minimized it as much as I can without removing the ability to reproduce the issue.
If we open a file descriptor on line 1 and use it for the rest of the operations, everything works as expected. However, suppose we open a new file descriptor for file0 between lines 2 and 3 and use it for the write and ftruncate on lines 3 and 4, then open another one between lines 4 and 5 and use it for the final two operations (as in this program: contenthash-test.zip).
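As a rough illustration of the descriptor pattern only (this is not the attached program), the sketch below opens a fresh descriptor before the write/ftruncate pair and another before the final operations. The function name, payload, and sizes are hypothetical, and the operations that are not spelled out in this report are left as comments rather than guessed.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical values: the real payload and sizes are not given here. */
#define SKETCH_WRITE_LEN 16
#define SKETCH_NEW_SIZE  4096

/* Returns 0 if every step succeeded, -1 otherwise. */
static int multi_fd_sketch(const char *path)
{
    const char payload[SKETCH_WRITE_LEN] = "0123456789abcde";
    int fd1 = -1, fd2 = -1, fd3 = -1;
    int rc = -1;

    /* First descriptor, opened at the start of the workload. */
    fd1 = open(path, O_CREAT | O_RDWR, 0644);
    if (fd1 < 0)
        goto out;
    /* ... operation on line 2 would use fd1 (not shown in the report) ... */

    /* Second descriptor, opened between lines 2 and 3. */
    fd2 = open(path, O_RDWR);
    if (fd2 < 0)
        goto out;
    if (write(fd2, payload, sizeof(payload)) != (ssize_t)sizeof(payload))
        goto out;
    if (ftruncate(fd2, SKETCH_NEW_SIZE) != 0)
        goto out;

    /* Third descriptor, opened between lines 4 and 5. */
    fd3 = open(path, O_RDWR);
    if (fd3 < 0)
        goto out;
    /* ... final two operations would use fd3 (not shown in the report) ... */

    rc = 0;
out:
    if (fd3 >= 0)
        close(fd3);
    if (fd2 >= 0)
        close(fd2);
    if (fd1 >= 0)
        close(fd1);
    return rc;
}
```

On a healthy file system, the file ends up `SKETCH_NEW_SIZE` bytes long with zeros after the payload. The sketch does not inject a crash; that part depends on the testing infrastructure.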
Now, let's say that we crash after the flush on line 2091 of inode.c (flushing the head of the truncate list in pmfs_truncate_add()) goes through and that data becomes durable when performing the second ftruncate(). When the file system is re-mounted and runs recovery, it will recover the truncate list and replay the truncate operation on the truncated inode. During normal/non-crashing execution, the truncate operation causes the extra bytes to be zeroed out, but it seems that this step is skipped here during recovery, and the recovered file can end up with some random garbage in it.
I haven't yet narrowed down a root cause for this; truncation during normal execution vs. recovery use mostly the same code, and I haven't had the opportunity to dig into where things differ. I suspect right now that the use of multiple file descriptors causes some weird regular-execution behavior where the persistent inode might not look the same as it would if we just used 1 file descriptor, so recovery goes a bit differently, but I'm not sure.
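For anyone who wants to check a recovered file for this symptom by hand, here is a small sketch of a checker. It relies on the POSIX guarantee that a region added by an extending ftruncate() reads back as zeros. The function name and return convention are my own, not part of WineFS or of the testing tool.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Scan bytes [start, end) of the file at `path`.
 * Returns the offset of the first non-zero byte, -1 if every byte in the
 * range is zero, or -2 on an I/O error or if the file is shorter than `end`. */
static off_t first_nonzero_byte(const char *path, off_t start, off_t end)
{
    assert(start >= 0 && start <= end);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -2;

    unsigned char buf[4096];
    off_t pos = start;
    off_t result = -1;

    while (pos < end) {
        size_t want = sizeof(buf);
        if ((off_t)want > end - pos)
            want = (size_t)(end - pos);

        ssize_t got = pread(fd, buf, want, pos);
        if (got <= 0) {          /* read error, or EOF before `end` */
            result = -2;
            break;
        }
        for (ssize_t i = 0; i < got; i++) {
            if (buf[i] != 0) {
                result = pos + i;
                goto out;
            }
        }
        pos += got;
    }
out:
    close(fd);
    return result;
}
```

After recovery, calling it with `start` set to the file size before the extending truncate and `end` set to the new size should return -1. Any non-negative result points at the first garbage byte.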
Let me know if this makes sense or you have any ideas about what might be going on. I'll keep looking into this issue as well.