Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FileLocksmith]Move hanging operations to a distinct thread #22806

Merged
merged 3 commits into from
Dec 16, 2022

Conversation

jaimecbernardo
Copy link
Collaborator

Summary of the Pull Request

File Locksmith is hanging for some users. After analyzing some dumps, we've detected there are some hanging system calls, especially NtQueryObject and GetFileType. It's not clear in what conditions this happens, so the work was offloaded to a different thread, which is destroyed and resumed if getting the information for a specific handle takes too long.
Users in the issues have confirmed this solves the problem for them.

PR Checklist

Detailed Description of the Pull Request / Additional comments

NtQueryObject and GetFileType are hanging for some handles.
This PR offloads the faulty code to a distinct thread, which is killed on a timeout and the work is resumed with a new thread.
Unfortunately, there are no alternative APIs that provide a timeout, meaning it's not possible to recover when these calls hang.
The solution presented here destroys the hanging threads, which is not advisable, but unfortunately there doesn't seem to be an alternative here except by offloading to a new process and destroying that process, which would complicate the code too much in my opinion.
There's also an additional condition before calling GetFileType so that it's only called on handles that make sense, to safeguard wrong calls.
Some additional tweaks to remove some warnings on other parts of the code. (Got those as errors on my tries to upgrade this project to only be /CLR on some specific files)

I've tried changing the project to only use /CLR on the needed interface code files so I could use newer C++ synchronization headers. I've tried using futures with a timeout but those need the thread to join at the end, and that's not possible with hanging calls. I've tried using packaged_tasks with a different thread for each handle, but that makes it so that the operation takes several minutes instead of seconds (millions of handles exist). I ended up reverting these tries.

Validation Steps Performed

Verified FileLocksmith still works well in this build. Had users confirm with a build that it solves the issue for them.

@jaimecbernardo jaimecbernardo merged commit b17955c into main Dec 16, 2022
@crutkas crutkas deleted the dev/jaime/fix-file-locksmith-hanging branch December 28, 2022 18:44
@lb90
Copy link

lb90 commented Jan 30, 2023

As noted in the comments, TerminateThread is unsafe and may leave a few mutexes in the abandoned state.

I wonder if CancelSynchronousIo may help with unlocking the thread? According to https://community.osr.com/discussion/176861/handle-info-using-ntquerysysteminformation-how-to-differentiate-files-from-folders GetFileType / NtQueryObject is blocked in a read call.

@PolarGoose
Copy link

@lb90 ,

I have tried to call NtQueryObject(..., ObjectNameInformation, ...) from a separate thread and call CancelSynchronousIo if it hangs. Unfortunately, it doesn't work. The CancelSynchronousIo doesn't unlock the NtQueryObject call.

I have found a very similar bug report from Python psutil library: get_open_files might hang forever #340
This bug report contains a lot of information. They have tried a lot of workarounds, and in the end, they also ended up killing the thread.

@lb90
Copy link

lb90 commented Oct 2, 2023

Ok, thank you very much!

@PolarGoose
Copy link

There is a proper solution to prevent hangs associated with the NtQueryObject(OBJECT_INFORMATION_CLASS.ObjectNameInformation) function.
The ReOpenFile can be used to exclude all non-file-related handles that might cause NtQueryObject to hang.
This approach was suggested by @dmex in this GitHub discussion.

I have successfully implemented this workaround in my project: ShowWhatProcessLocksFile
I confirm that it works very well and eliminates hanging issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants