[FileLocksmith]Move hanging operations to a distinct thread #22806
Summary of the Pull Request
File Locksmith hangs for some users. Analysis of several dumps showed system calls that never return, especially NtQueryObject and GetFileType. It's not clear under what conditions this happens, so the work was offloaded to a separate thread. If getting the information for a specific handle takes too long, that thread is terminated and the work resumes on a new thread.
Users in the issues have confirmed this solves the problem for them.
PR Checklist
Detailed Description of the Pull Request / Additional comments
NtQueryObject and GetFileType are hanging for some handles.
This PR offloads the faulty code to a separate thread. That thread is killed on a timeout, and the work is resumed on a new thread.
Unfortunately, no alternative APIs provide a timeout, so there is no way to recover from within the calling thread when these calls hang.
The solution presented here terminates the hanging threads, which is not advisable. The only alternative seems to be offloading the work to a new process and killing that process, which in my opinion would complicate the code too much.
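The supervision pattern described above can be sketched in portable C++. This is only an illustration under stated assumptions, not the PR's code: the names `scan_with_watchdog`, `worker`, `Shared`, `ScanResult` and `Query` are hypothetical, the blocking system call is replaced by an arbitrary callback, and a stuck worker is abandoned with `detach()` instead of being terminated, because standard C++ has no way to kill a thread.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for "query one handle"; it may block for a long time.
using Query = std::function<void(std::size_t)>;

// State shared by the supervisor and its workers. It is held through a
// shared_ptr so that an abandoned worker never touches freed memory.
struct Shared {
    std::mutex m;
    std::condition_variable cv;
    std::size_t next = 0;            // index of the next item to process
    unsigned generation = 0;         // bumped every time a worker is abandoned
    std::vector<std::size_t> done;   // indices that completed in time
};

struct ScanResult {
    std::vector<std::size_t> done;
    std::vector<std::size_t> skipped;
};

void worker(std::shared_ptr<Shared> s, std::size_t count, Query query, unsigned my_gen) {
    for (;;) {
        std::size_t i;
        {
            std::lock_guard<std::mutex> lock(s->m);
            if (s->generation != my_gen || s->next >= count) return;
            i = s->next;
        }
        query(i);  // the potentially hanging call runs without the lock held
        {
            std::lock_guard<std::mutex> lock(s->m);
            if (s->generation != my_gen) return;  // abandoned while blocked
            s->done.push_back(i);
            ++s->next;
        }
        s->cv.notify_all();
    }
}

ScanResult scan_with_watchdog(std::size_t count, Query query,
                              std::chrono::milliseconds timeout) {
    assert(timeout.count() > 0);
    auto s = std::make_shared<Shared>();
    ScanResult result;
    std::unique_lock<std::mutex> lock(s->m);
    while (s->next < count) {
        // Start a worker for the current generation and let it run on its own.
        std::thread(worker, s, count, query, s->generation).detach();
        for (;;) {
            const std::size_t seen = s->next;
            if (seen >= count) break;
            const bool progressed =
                s->cv.wait_for(lock, timeout, [&] { return s->next != seen; });
            if (!progressed) {
                // Item `seen` took too long: skip it and replace the worker.
                result.skipped.push_back(seen);
                ++s->generation;
                s->next = seen + 1;
                break;
            }
        }
    }
    result.done = s->done;
    return result;
}
```

Calling `scan_with_watchdog(5, query, std::chrono::milliseconds(200))` with a `query` that blocks on item 2 for longer than the timeout reports items 0, 1, 3 and 4 as done and item 2 as skipped. Using one long-lived worker keeps the per-item cost low; a new thread is created only after a timeout. In this sketch an abandoned worker keeps running until its call returns, which is the cost that terminating the thread avoids.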
There's also an additional check before calling GetFileType so that it's only called on handles for which the call makes sense, as a safeguard against invalid calls.
This PR also includes some tweaks that remove warnings in other parts of the code. (They showed up as errors during my attempts to change this project to use /CLR only on some specific files.)
I tried changing the project to use /CLR only on the interface code files that need it, so I could use the newer C++ synchronization headers. I tried futures with a timeout, but those need the thread to join at the end, which is not possible with hanging calls. I tried packaged_tasks with a separate thread for each handle, but that made the operation take several minutes instead of seconds (millions of handles exist). I ended up reverting these attempts.
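The limitation with futures can be reproduced in a few lines. The sketch below assumes the future comes from `std::async` with `std::launch::async` (the description does not say which facility was used) and replaces the slow system call with a sleep; `call_with_future_timeout` and `TimedOutCall` are hypothetical names. `wait_for` reports the timeout promptly, but the destructor of a future obtained from `std::async` blocks until the task finishes, so a call that never returns would block it forever.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

struct TimedOutCall {
    bool timed_out;                        // wait_for gave up before the task ended
    std::chrono::milliseconds total_cost;  // time until the future was destroyed
};

TimedOutCall call_with_future_timeout(std::chrono::milliseconds task_duration,
                                      std::chrono::milliseconds timeout) {
    const auto start = std::chrono::steady_clock::now();
    bool timed_out = false;
    {
        std::future<void> f = std::async(std::launch::async, [task_duration] {
            // Stand-in for a slow system call.
            std::this_thread::sleep_for(task_duration);
        });
        timed_out = f.wait_for(timeout) == std::future_status::timeout;
        // Leaving this scope destroys `f`. For a future created by std::async,
        // the destructor waits for the task, so the timeout saves nothing.
    }
    const auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - start);
    return {timed_out, elapsed};
}
```

With a 400 ms task and a 50 ms timeout, `timed_out` is true while `total_cost` is still at least the full task duration.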
Validation Steps Performed
Verified that FileLocksmith still works well in this build. Users confirmed with a test build that it solves the issue for them.