File opens performed without context managers #207

Open
Hitham2496 opened this issue Jan 9, 2025 · 1 comment

@Hitham2496

Dear development team,

I would like to raise an issue about files being opened outside of context managers in Python. This relates to some discussions between your team at UCSD and the COSMIC team at the Wellcome Sanger Institute (where I am also based).

Looking at this GitHub search (using version 1.2.31): https://github.com/search?q=repo%3AAlexandrovLab%2FSigProfilerMatrixGenerator+%22%3D+open%22&type=code

we noted several cases of unsafe file opens, e.g. assigning the handle directly with variable = open(<some_file>, ...), rather than opening the file within a context manager such as:

with open(<some_file>, ...) as filestream:
    # do something

Under normal running, and with small files, this rarely causes problems. However, unless files are opened in a managed context, nothing guarantees that they are flushed and closed at a well-defined point, so there is still a risk that exceptions from failed reads and writes are not caught and reported.
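As a minimal sketch of the difference (the function and file names below are hypothetical and not taken from MatrixGenerator), the following forces a failure part-way through a write and checks whether each handle ends up closed:

```python
import os
import tempfile


def write_rows(handle, rows):
    """Write each row as a line; raises TypeError if a row is not a str."""
    for row in rows:
        handle.write(row + "\n")


def compare_close_behaviour(directory):
    """Return (unmanaged_closed, managed_closed) after a failed write."""
    bad_rows = ["first row", None]  # the None entry forces a failure mid-write

    # Unmanaged: the exception skips the explicit close() call.
    unmanaged = open(os.path.join(directory, "unmanaged.txt"), "w")
    try:
        write_rows(unmanaged, bad_rows)
        unmanaged.close()
    except TypeError:
        pass
    unmanaged_closed = unmanaged.closed

    # Managed: the with statement closes the file even though the body raised.
    try:
        with open(os.path.join(directory, "managed.txt"), "w") as managed:
            write_rows(managed, bad_rows)
    except TypeError:
        pass
    managed_closed = managed.closed

    unmanaged.close()  # tidy up the handle left open above
    return unmanaged_closed, managed_closed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(compare_close_behaviour(tmp))  # (False, True)
```

The failure here is an artificial TypeError rather than a real I/O error, but the closing behaviour it illustrates is the same.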

This risk is heightened for users on HPC systems working with large files on networked filesystems. We have observed this on our systems at Sanger, where some users working with MatrixGenerator reported truncated output files.

The file write failures that caused the truncation were not specific to MatrixGenerator. The problem was that they were not reported to stderr, which means that users would not know they were working with incorrect results unless they manually inspected them.

Would it be possible to see if at least the most at-risk file reads and writes can be performed in a with statement as above, or with some logic that can catch the exceptions if they arise?
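One possible shape for such logic is sketched below. The helper name `write_lines_checked` and the error message format are assumptions made for illustration, not existing MatrixGenerator code, and the `os.fsync` call is optional since it may be slow on some filesystems:

```python
import os
import sys


def write_lines_checked(path, lines):
    """Hypothetical helper: write lines and report any I/O error on stderr."""
    try:
        with open(path, "w") as out:
            for line in lines:
                out.write(line)
            out.flush()              # push Python's buffer to the OS
            os.fsync(out.fileno())   # ask the OS to commit the data to storage
    except OSError as err:
        # Report the failure, then re-raise so the run does not continue silently.
        print(f"ERROR: failed to write {path}: {err}", file=sys.stderr)
        raise
```

Re-raising after printing means a calling pipeline still stops with a non-zero exit status unless it chooses to handle the error itself.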

Many thanks for your time and attention,
Hitham

@mdbarnesUCSD
Collaborator

Hi @Hitham2496,

Thanks for raising this concern. To better assess the issue, could you please clarify:

  • Which specific files were truncated? (e.g., log files or final output matrices)
  • What was missing from these files? (e.g., partial content, missing lines, only the last portion cut off)
  • Did the affected users check their log files for errors?
  • Was the truncation consistent or intermittent?

Best,
Mark
