Dear development team,

I wanted to raise an issue relating to file opening outside of context managers in Python; this relates to some discussions between your team at UCSD and the COSMIC team at the Wellcome Sanger Institute (where I am also based).

Looking at this GitHub search (using version 1.2.31): https://github.com/search?q=repo%3AAlexandrovLab%2FSigProfilerMatrixGenerator+%22%3D+open%22&type=code
we noted several cases of unsafe file opens, e.g. assigning directly variable = open(<some_file>, ...) rather than placing the open within a context manager such as:
with open(<some_file>, ...) as filestream:
# do something
Under normal running, and with small files, this does not pose much of an issue; however, there is still a risk that exceptions from failed reads and writes are neither caught nor reported unless file opens are performed within a managed context.
This risk is heightened for users on HPC systems working with large files on networked filesystems; we have observed this on our systems at Sanger, where some users working with MatrixGenerator reported truncated output files.
The file write failures causing the truncation were not a problem specific to MatrixGenerator; the problem was that the failures were not reported to stderr, which means users would not know they were working with incorrect results unless they inspected them manually.
Would it be possible to ensure that at least the most at-risk file reads and writes are performed in a with statement as above, or with some logic that catches and reports the exceptions if they arise?
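For illustration, here is a minimal sketch of the kind of pattern we have in mind; the file name and written content are hypothetical and not taken from MatrixGenerator itself:

import sys

output_path = "example_matrix.txt"  # hypothetical output file, for illustration only
try:
    # The context manager guarantees the file is flushed and closed,
    # and any failure during the write surfaces as an exception here.
    with open(output_path, "w") as filestream:
        filestream.write("chrom\tposition\tcount\n")
except OSError as err:
    # Report the failure explicitly so users on HPC/networked filesystems
    # are not left with silently truncated output.
    print(f"Failed to write {output_path}: {err}", file=sys.stderr)
    raise

Even just wrapping the existing writes in a with block would help, since close-time errors are then raised where they can be seen rather than being lost when the file object is garbage-collected.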
Many thanks for your time and attention,
Hitham