Fix infinite loop in SubprocessSet::DoWork() #2484
base: master
Technically, this changes the order in which OnPipeReady() and Done() are called at runtime, making it inconsistent with the !USE_PPOLL implementation. This may bite us in the future when refactoring this code.
Can you instead just add the missing increment in the if (fd < 0) continue statement (e.g. if (fd < 0) { ++i; continue; }), which will have the same effect with a simpler change?
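A minimal sketch of the suggested one-line fix, under stated assumptions: FakeSubprocess and DispatchReady() are hypothetical stand-ins, not ninja's real Subprocess or SubprocessSet, and the readable_ flag simulates an fd reported ready by ppoll()/select(). Because the loop advances the iterator by hand, the early continue has to increment it first; erase() already returns the next element.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical stand-in for a subprocess: fd_ < 0 means its pipe is closed.
struct FakeSubprocess {
  int fd_;
  bool readable_;  // simulates ppoll()/select() reporting the fd as ready
  void OnPipeReady() { fd_ = -1; }  // pretend we read EOF and closed the pipe
  bool Done() const { return fd_ < 0; }
};

// DoWork()-style loop with the one-line fix: the early continue for an
// invalid fd now advances the iterator before skipping the entry.
void DispatchReady(std::vector<FakeSubprocess*>& running,
                   std::queue<FakeSubprocess*>& finished) {
  for (std::vector<FakeSubprocess*>::iterator i = running.begin();
       i != running.end(); ) {
    int fd = (*i)->fd_;
    if (fd < 0) { ++i; continue; }  // without ++i this spins forever
    if ((*i)->readable_) {
      (*i)->OnPipeReady();
      if ((*i)->Done()) {
        finished.push(*i);
        i = running.erase(i);  // erase() returns the next element
        continue;
      }
    }
    ++i;
  }
}
```

An entry with a closed fd is simply skipped and stays where it is, so the call order of OnPipeReady() and Done() is unchanged for every other entry.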
If fd is invalid, the loop will be infinite because the iterator is not incremented. Make the loop consistent between both variants of SubprocessSet::DoWork() (with and without USE_PPOLL) by removing 'continue' and always advancing the iterator at the end of the loop body.
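One possible reading of this commit description, sketched with hypothetical stand-in types rather than ninja's actual code: the continue disappears, and every iteration ends by either erasing or advancing. Note that Done() is now consulted for every entry, including one whose fd is already closed.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical stand-in for a subprocess: fd_ < 0 means its pipe is closed.
struct FakeSubprocess {
  int fd_;
  bool readable_;  // simulates ppoll()/select() reporting the fd as ready
  void OnPipeReady() { fd_ = -1; }
  bool Done() const { return fd_ < 0; }
};

// No continue: each iteration ends by either erasing or advancing.
void DispatchReady(std::vector<FakeSubprocess*>& running,
                   std::queue<FakeSubprocess*>& finished) {
  for (std::vector<FakeSubprocess*>::iterator i = running.begin();
       i != running.end(); ) {
    if ((*i)->fd_ >= 0 && (*i)->readable_)
      (*i)->OnPipeReady();
    // Done() is checked for every entry, not only after OnPipeReady().
    if ((*i)->Done()) {
      finished.push(*i);
      i = running.erase(i);
    } else {
      ++i;
    }
  }
}
```

With an already-closed entry in the list, this version moves it to the finished queue, whereas the minimal fix leaves it in place. That is the behavior difference the review focuses on.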
Updated from ad18d33 to a6da182.
You are making the code more complicated for no good reason. Please simplify as suggested instead. @jhasse, wdyt?
The code should be clearer now, since there is no continue statement, and it is consistent between the two variants of DoWork(). I prepared it this way to simplify the subsequent patch for the stderr implementation, which adds handling of the stderr fd to the loops.
No. You are changing the behavior of the code by calling the Done() method in cases where it wasn't called before. This is wasteful and potentially dangerous. It is not what this patch is supposed to do, which is to fix a potential infinite loop, and that can be trivially done by adding the missing increment to one line of the source file. Please keep it simple.
To be honest, I do not see how this condition could be met in the first place:
I believe it is a leftover from some older implementation, and the correct fix would be to replace this condition with an assert. As long as a Subprocess is in the running_ list, its fd_ must always be valid. Please correct me if I am wrong, because I cannot see anywhere outside of OnPipeReady() where it might change. If the above is true, then the behavior was not changed, and the Done() call will return false when OnPipeReady() was not called. The only reason I moved Done() to the end of the loop is to prepare for the subsequent addition of a second (error) fd to handle. If you prefer not to do that within this patch, then I believe the correct fix would be to replace the condition above with:
To clarify: once the second (error) fd is added, a Subprocess will stay in the running_ list until both fds are cleared, and the runtime condition check above will make sense. With a single fd, however, the condition is always false. That is why you never see ninja freeze with the current implementation: the fd is always valid.
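To illustrate the two-fd case described above, here is a hypothetical sketch. The out_fd_/err_fd_ fields and the OnPipeReady(int*) signature are assumptions, not ninja's API. Each pipe can close independently, so a per-fd validity check is meaningful, and the subprocess only leaves the running list once both pipes are closed.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical subprocess with separate stdout and stderr pipes.
struct TwoFdSubprocess {
  int out_fd_;
  int err_fd_;
  bool out_ready_;  // simulated readiness of out_fd_
  bool err_ready_;  // simulated readiness of err_fd_
  void OnPipeReady(int* fd) { *fd = -1; }  // pretend EOF on that pipe
  bool Done() const { return out_fd_ < 0 && err_fd_ < 0; }
};

void DispatchReady(std::vector<TwoFdSubprocess*>& running,
                   std::queue<TwoFdSubprocess*>& finished) {
  for (std::vector<TwoFdSubprocess*>::iterator i = running.begin();
       i != running.end(); ) {
    TwoFdSubprocess* s = *i;
    // One pipe may already be closed while the other is still open,
    // so checking each fd for validity is meaningful here.
    if (s->out_fd_ >= 0 && s->out_ready_) s->OnPipeReady(&s->out_fd_);
    if (s->err_fd_ >= 0 && s->err_ready_) s->OnPipeReady(&s->err_fd_);
    if (s->Done()) {
      finished.push(s);
      i = running.erase(i);
    } else {
      ++i;
    }
  }
}
```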
First, do not sneak unrelated behavior-changing code into a patch that is supposed to address a trivial issue. Not only is this a very easy way to introduce subtle bugs, or worse, security issues, but you are also wasting your own time and the reviewers', as well as lowering the chances that your future contributions will be considered of appropriate quality. That being said, mistakes happen. Second, looking at the code, it looks like, given the current implementation, the
If you intend to modify the behavior of this function in a non-trivial way, I suggest you do that in a dedicated PR that clearly explains the rationale for it and, ideally, provides separate commits that address the changes in a logical order. And don't forget unit or regression tests for the new feature or behavior.
Regarding support for two separate event descriptors per subprocess, I would recommend first decoupling the file-event wait logic from the queue management logic (e.g. provide a PosixFileWatcher class that lets clients register callbacks to be invoked when an event is detected, with the ability to add or remove registrations within the callback itself for safety). That would make the code much easier to understand and maintain over time, while allowing unit tests for conditions that cannot be tested with the current code.
Apart from that, I recommend you either drop this PR completely or just fix the actual bug by adding the missing increment.
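A rough sketch of the PosixFileWatcher idea suggested above. The class name comes from the comment, but the interface (Add(), Remove(), Dispatch()) is an assumption. To keep it unit-testable, Dispatch() receives the set of ready fds instead of calling ppoll()/select() itself, and it walks a snapshot of the registrations so callbacks can add or remove entries safely.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical watcher: maps fds to callbacks, decoupled from queue logic.
class PosixFileWatcher {
 public:
  using Callback = std::function<void(int fd)>;

  void Add(int fd, Callback cb) { callbacks_[fd] = std::move(cb); }
  void Remove(int fd) { callbacks_.erase(fd); }
  bool Empty() const { return callbacks_.empty(); }

  // Invokes the callback of every registered fd found in `ready`.
  // Iterates over a snapshot of fds and re-checks registration before each
  // call, so a callback may Remove() itself or any other fd.
  void Dispatch(const std::set<int>& ready) {
    std::vector<int> fds;
    for (const auto& entry : callbacks_) fds.push_back(entry.first);
    for (int fd : fds) {
      auto it = callbacks_.find(fd);
      if (it == callbacks_.end() || ready.count(fd) == 0) continue;
      Callback cb = it->second;  // copy: the stored callback may be erased
      cb(fd);
    }
  }

 private:
  std::map<int, Callback> callbacks_;
};
```

In this sketch, registrations added during a dispatch take effect on the next round, and a real implementation would build the ready set from ppoll() or select().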
There are no behavior changes in this patch. I believe the biggest confusion comes from the title: it should say "Cleanup" rather than "Fix", because there was no runtime issue in the current code. I already put that description into the commit body, but I will also change the commit title to make it clear. The justification for this cleanup is to prepare the DoWork() implementation for handling two fds simultaneously. I believe the current implementation of DoWork() is fine with regard to handling events on multiple fds, and extending it with support for two fds is trivial; here is how such a change would look after this patch: f8ce671#diff-e70adfab80664f71832232d71b3055c78e054dd75fbeec69d019cb0e9dce6b2dL283
Since fd_ is supposed to always be valid inside the running_ list, assert() should be the correct way of handling it: Fatal() is for catching runtime issues, assert() is for source-code bugs. I will make the appropriate changes and rephrase this PR description.
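A sketch of the assert-based variant, using hypothetical stand-in types rather than ninja's actual code. It relies on the invariant discussed in the thread, that every entry in running_ has a valid fd. If the assert is compiled out with NDEBUG, the loop still terminates, because there is no longer an early continue that can skip the increment.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical stand-in for a subprocess: fd_ < 0 means its pipe is closed.
struct FakeSubprocess {
  int fd_;
  bool readable_;  // simulates ppoll()/select() reporting the fd as ready
  void OnPipeReady() { fd_ = -1; }
  bool Done() const { return fd_ < 0; }
};

void DispatchReady(std::vector<FakeSubprocess*>& running,
                   std::queue<FakeSubprocess*>& finished) {
  for (std::vector<FakeSubprocess*>::iterator i = running.begin();
       i != running.end(); ) {
    // Invariant: entries in `running` have an open pipe. A violation is a
    // source-code bug, hence assert() rather than a runtime Fatal().
    assert((*i)->fd_ >= 0);
    if ((*i)->readable_) {
      (*i)->OnPipeReady();
      if ((*i)->Done()) {
        finished.push(*i);
        i = running.erase(i);
        continue;  // erase() already advanced i
      }
    }
    ++i;
  }
}
```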
The explanation doesn't need to be so complex. The solution is to add checks that block error conditions: rejecting error conditions is preferred over accepting non-error conditions, because otherwise the rest of the function is subject to fall-through.
for (vector<Subprocess*>::iterator i = running_.begin();
     i != running_.end(); ++i) {
By the way, this one has the increment anyway.
if (fd >= 0 && FD_ISSET(fd, &set)) {
  (*i)->OnPipeReady();
This check already exists; the changes below have nothing to do with the fd value. If an infinite loop is still possible after the fd value is handled, then continue should cause an increment of i, as in a normally written for loop. Are we sure that omitting the increment from the loop header isn't the actual mistake, and that all you need to do is put it back?
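The point about continue can be shown in isolation: in a for loop, continue jumps to the increment expression, so an increment in the header cannot be skipped. CountValidFds() is a hypothetical helper, not ninja code.

```cpp
#include <cassert>
#include <vector>

// Returns the number of valid (non-negative) fds. Because the increment
// lives in the for header, the continue below cannot skip it.
int CountValidFds(const std::vector<int>& fds) {
  int valid = 0;
  for (std::vector<int>::const_iterator i = fds.begin(); i != fds.end(); ++i) {
    if (*i < 0) continue;  // jumps to ++i, then to the i != end() check
    ++valid;
  }
  return valid;
}
```

The catch is that a loop body which calls erase() must not also apply the header's ++i to the iterator erase() returns, which is presumably why these loops advance i by hand or move finished entries in a separate pass.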
In that loop, the code we are talking about is unreachable: This "fixup" patch was originally created in 2015. At that time I assumed there might be more conditions under which a Subprocess's fd_ is closed while the Subprocess itself is still running. That assumption also matched the feature I was working on: adding a second error_fd_ for tracking error output, which could still be open. Having checked the code now, I can claim that by design there is no condition under which the running_ list contains a Subprocess for which Done() == true. There is also a test case that validates this: SubprocessTest.SetWithMulti. Thus that check for the
It doesn't matter; the specific loop I'm referring to only reads from
That is likely true in real-life cases, but perhaps the fd check was put there to begin with for the crazy, unrealistic scenario of the system running out of available FDs. Without a supercomputer, it would take a patched kernel to verify that.
That condition is already validated in two places: the pipe() syscall in Subprocess::Start() and the SubprocessTest.SetWithMulti test. So it is still not possible to have a Subprocess that is already done in the running_ list.
Simply removing the fd checks from the loops is likely acceptable, then.
Phew, sorry, I just skimmed over most of it. When exactly is this infinite loop triggered?
It is never triggered with the current implementation, and as I understand it, that is by design that
If fd is invalid, the loop will be infinite because the iterator is not incremented.
Make the loop more obvious by putting a regular iterator increment into the for() header and moving each finished Subprocess from running_ into finished_ in a separate loop.
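A sketch of the two-pass structure this commit message describes, with hypothetical stand-in types rather than ninja's code. The first loop keeps a plain ++i in the header; the second pass moves every Done() entry to the finished queue while preserving the order of those still running. As in any variant that checks Done() for every entry, an entry whose fd was already closed would also be moved.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical stand-in for a subprocess: fd_ < 0 means its pipe is closed.
struct FakeSubprocess {
  int fd_;
  bool readable_;  // simulates ppoll()/select() reporting the fd as ready
  void OnPipeReady() { fd_ = -1; }
  bool Done() const { return fd_ < 0; }
};

void DispatchReady(std::vector<FakeSubprocess*>& running,
                   std::queue<FakeSubprocess*>& finished) {
  // Pass 1: plain iteration; the header increment always runs.
  for (std::vector<FakeSubprocess*>::iterator i = running.begin();
       i != running.end(); ++i) {
    if ((*i)->fd_ >= 0 && (*i)->readable_)
      (*i)->OnPipeReady();
  }
  // Pass 2: move finished subprocesses out, keeping the rest in order.
  std::vector<FakeSubprocess*> still_running;
  for (FakeSubprocess* s : running) {
    if (s->Done())
      finished.push(s);
    else
      still_running.push_back(s);
  }
  running.swap(still_running);
}
```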