Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Propagate stderr from pyre.bin #863

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

vthemelis
Copy link
Contributor

Pre-submission checklist

  • I've ran the linters locally and fixed lint errors related to the files I modified in this PR. You can install the linters by running pip install -r requirements-dev.txt && pre-commit install
  • pre-commit run

Summary

Make it easier to diagnose issues like #860

Test Plan

Ran locally and got the full error message when running pyre

# poetry run pyre
ƛ No watchman binary found. 
To enable pyre incremental, you can install watchman: https://facebook.github.io/watchman/docs/install
ƛ Defaulting to non-incremental check.
ƛ Path does not exist for search path: SubdirectoryElement(root='/', subdirectory='tests')
ƛ /root/.cache/pypoetry/virtualenvs/bug-pyre-il7asoJj-py3.11/bin/pyre.bin: /lib64/libm.so.6: version `GLIBC_2.29' not found (required by /root/.cache/pypoetry/virtualenvs/bug-pyre-il7asoJj-py3.11/bin/pyre.bin)
ƛ /root/.cache/pypoetry/virtualenvs/bug-pyre-il7asoJj-py3.11/bin/pyre.bin: /lib64/libc.so.6: version `GLIBC_2.29' not found (required by /root/.cache/pypoetry/virtualenvs/bug-pyre-il7asoJj-py3.11/bin/pyre.bin)
ƛ /root/.cache/pypoetry/virtualenvs/bug-pyre-il7asoJj-py3.11/bin/pyre.bin: /lib64/libc.so.6: version `GLIBC_2.33' not found (required by /root/.cache/pypoetry/virtualenvs/bug-pyre-il7asoJj-py3.11/bin/pyre.bin)
ƛ /root/.cache/pypoetry/virtualenvs/bug-pyre-il7asoJj-py3.11/bin/pyre.bin: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by /root/.cache/pypoetry/virtualenvs/bug-pyre-il7asoJj-py3.11/bin/pyre.bin)
ƛ /root/.cache/pypoetry/virtualenvs/bug-pyre-il7asoJj-py3.11/bin/pyre.bin: /lib64/libc.so.6: version `GLIBC_2.32' not found (required by /root/.cache/pypoetry/virtualenvs/bug-pyre-il7asoJj-py3.11/bin/pyre.bin)
ƛ Check command exited with non-zero return code: 1.

@@ -192,12 +192,17 @@ def _run_check_command(command: Sequence[str]) -> CheckResult:
result = subprocess.run(
command,
stdout=subprocess.PIPE,
stderr=log_file.file,
stderr=subprocess.PIPE,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it would make more sense to leave this as a file, but dump the file whenever the return code is nonzero.

This might be better if the binary were ever to dump a huge amount to stderr, plus it makes it possible to cat the file if you want to check for messages as the command is running, and it would be enough to fix the issue we just saw where we weren't seeing the glibc link error.

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about I stream the logs to both the file and the python wrapper STDERR? I think it's always worth propagating the stderr as it makes debugging stuff much easier.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait I think I see what's going on here.

Pyre is piping the stdout, which will only contain type errors, but it is using the log file for stderr which includes all the ocaml logging as well as this backtrace.

So we definitely do want the log file, and we probably don't want to pipe the stderr directly because the python side is doing some prettifying (especially in interactive mode).

I think the problem must be that sometimes not all the logs actually get spit out before we get to line 224 in the case where the binary logs a stack trace and exits nonzero. The problem might be with the assumptions in this code around start.background_logging and how eagerly it logs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@grievejia Do you remember how this works exactly?

I'm not an expert on context managers, but what we saw was that when the pyre binary crashed with a fairly low-level stack trace (a glibc version mismatch), Pyre was never displaying the stderr output.

I wonder if the problem might actually be in the client/log/log.py:

        if <big switch on log section from pyre>
            ...
        else:
            LOG.debug(line)

this may mean we are doing debug-level logging of stack traces and so they are invisible by default

@vthemelis and / or I can dig into this but wondering if you remember off the top of your head

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, indeed, that seems to be the case (which is why the same error comes up as Debug in this older issue).

Looks like if the log-line doesn't have the usual date-time SEVERITY message format, pyre will be using LOG.Debug to log this line or (I think) the severity of the previously logged line.

Even this though doesn't quite make sense as I can see DEBUG messages in the output provided in #860 yet the libc errors do not seem to have propagated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that thread joining is the underlying problem, but given that it's a context manager I think it may be that introducing a delay is the easiest fix. I'd be fine with adding a print or time.sleep statement anywhere that fixes the issue for now, with a comment noting the issue it's addressing (could include a link to this discussion).

I slightly prefer using a delay versus delete=False because we probably don't want the backend file to persist if we can avoid it, it's really only there to pipe logs to the frontend. But it's a temp file that will get cleaned up eventually anyway, so maybe my concern is misplaced here.

@samwgoldman might have more ideas on how to join the threads in a more principled way, I'm very much not a threading expert. But I'd like to get a short-term fix in soon and we can improve on it later rather than block this PR, anything that fixes the dropped logs issue is great work and a fix we want sooner than later!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd really prefer to avoid adding a sleep call. Can we confirm where the log is being dropped? Based on my reading of the code what should be happening:

  1. pyre.bin's stderr is redirected to a log file
  2. the log file is being "tail"ed by log.log_tailer and converted into an Iterable for each line
  3. the StreamLogger prefixes each line and writes to stderr via the LOG logger

I'm not super familiar with the code in (2) and (3), but honestly it looks overcomplicated. I think it's nice to have a separate thread to read from the file, but that thread should just readline on the file descriptor until EOF. When we join that thread, the join should wait for the thread to hit that EOF.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, thinking more about this, I think we should move this code to use the lower-level subprocess.Popen code, then read from stderr and stdout in two separate threads. This is what subprocess.communicate does (and what subprocess.run does in turn).

The thread reading from stderr can then output to the log file and transform+log to stderr. This is much nicer than redirecting to the file and then trying to also tail that file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a little more looking at this, I think the real issue isn't lack of a join, since we do join. I think the real issue is that StreamLogger has two modes it runs in, a best-effort tailing mode and a proper logging mode.

The tailing mode is used in some situations where we start a pyre daemon, e.g. start._run_in_background, and the logger is currently optimized for that case by having self._should_stop_reading_stream which gets set on __exit__, so that whenever the main thread decides we're done we stop reading logs.

But the proper logging mode is what we want for shell commands like pyre check, and when in that mode we really shouldn't be using _should_stop_reading_stream at all, we should run to completion.

This suggests either using two classes or an attribute to differentiate the cases

Copy link
Contributor

@stroxler stroxler Jun 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops I missed Sam's second comment.

Using Popen and two threads (or even just one thread, stdout is going to be used in the foreground process anyway) should work well here too, and then the existing file-based StreamLogger could be used only for deamonized logic like start._run_in_background.

We can't completely rid of the file-based approach because of the need to use a file in the daemon case; the existing code is trying to use the same approach for both which is indeed questionable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants