It works fine for a while, but as the number of processed files grows when running my script in parallel over thousands of records, the parallel job eventually breaks with the following error:
```shell
ubuntu@pycsw-prod:/mnt/csw/dev/py-mmd-tools/script$ python3 convert_all.py -i /mnt/csw/metadata/nbs -t /mnt/csw/dev/mmd/xslt/mmd-to-iso.xsl -o /mnt/csw/metadata/nbs_iso/
```
The input files are gathered with:

```python
os.walk("/mnt/csw/metadata/nbs")
```
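For context, a sketch of how such a walk can gather the records to convert (the helper name and the `.xml` filter are assumptions, not code from the original script):

```python
import os

def collect_xml_files(root):
    """Walk the metadata tree and return every .xml file path (hypothetical helper)."""
    xmlfiles = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".xml"):
                xmlfiles.append(os.path.join(dirpath, name))
    return xmlfiles
```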
```
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.8/site-packages/confuse/yaml_util.py", line 85, in load_yaml
OSError: [Errno 24] Too many open files: '/home/ubuntu/.config/mmdtool/config.yaml'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
  File "/usr/local/lib/python3.8/dist-packages/parmap/parmap.py", line 104, in _func_star_single
  File "convert_all.py", line 33, in writerecord
  File "/mnt/csw/dev/py-mmd-tools/py_mmd_tools/mmd_to_csw_iso.py", line 40, in mmd_to_iso
  File "/mnt/csw/dev/py-mmd-tools/py_mmd_tools/mmd_util.py", line 31, in setup_log
  File "/mnt/csw/dev/py-mmd-tools/py_mmd_tools/mmd_util.py", line 21, in get_logpath
  File "/home/ubuntu/.local/lib/python3.8/site-packages/confuse/core.py", line 558, in __init__
  File "/home/ubuntu/.local/lib/python3.8/site-packages/confuse/core.py", line 600, in read
  File "/home/ubuntu/.local/lib/python3.8/site-packages/confuse/core.py", line 574, in _add_user_source
  File "/home/ubuntu/.local/lib/python3.8/site-packages/confuse/yaml_util.py", line 88, in load_yaml
confuse.exceptions.ConfigReadError: file /home/ubuntu/.config/mmdtool/config.yaml could not be read: [Errno 24] Too many open files: '/home/ubuntu/.config/mmdtool/config.yaml'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "convert_all.py", line 56, in <module>
    main(metadata=args.input_dir, mmd2iso_xslt=args.input_xslt, outdir=args.output_dir)
  File "convert_all.py", line 42, in main
    y = parmap.map(writerecord, xmlfiles, mmd2iso_xslt=mmd2iso_xslt, outdir=outdir, pm_pbar=False)
  File "/usr/local/lib/python3.8/dist-packages/parmap/parmap.py", line 304, in map
    return _map_or_starmap(function, iterable, args, kwargs, "map")
  File "/usr/local/lib/python3.8/dist-packages/parmap/parmap.py", line 248, in _map_or_starmap
    output = result.get()
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 768, in get
    raise self._value
confuse.exceptions.ConfigReadError: file file /home/ubuntu/.config/mmdtool/config.yaml could not be read: [Errno 24] Too many open files: '/home/ubuntu/.config/mmdtool/config.yaml' could not be read
```
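Errno 24 (EMFILE) means the process has exhausted its per-process file-descriptor quota. On Unix, the current soft and hard limits can be inspected from Python with the standard-library `resource` module:

```python
import resource

# "Too many open files" fires once a process exceeds its soft descriptor
# limit; the soft limit can be raised up to the hard limit if needed.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit: {soft}, hard limit: {hard}")
```

Raising the limit only postpones the failure here, though; as the reply below explains, the real problem is how often the same file is being opened.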
We are in fact closing the file after reading it. You mentioned that you are running this program many times in parallel:

> when I run my script in parallel over thousands of records

So it seems likely to me that these thousands of parallel processes are simultaneously opening the same file, even if each of them will shortly close it again.
Any chance you can instead find a way to load your config once and share it across all the processes?
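One way to do that, sketched with the standard library only (the config contents and the worker body are placeholders, not the real py-mmd-tools code): load the config once in the parent and hand the parsed values to each worker via a pool initializer, so `config.yaml` is opened a handful of times instead of once per record.

```python
import multiprocessing

_CONFIG = None  # per-worker copy, filled in once by the initializer


def init_worker(config):
    # Runs once per worker process, not once per record.
    global _CONFIG
    _CONFIG = config


def writerecord(xmlfile):
    # Placeholder for the real conversion; it reads the already-parsed
    # config instead of constructing a new configuration object per call.
    return (xmlfile, _CONFIG["logpath"])


if __name__ == "__main__":
    # Parse the config exactly once, in the parent process.
    config = {"logpath": "/tmp/mmdtool.log"}
    with multiprocessing.Pool(
        processes=4, initializer=init_worker, initargs=(config,)
    ) as pool:
        print(pool.map(writerecord, ["a.xml", "b.xml"]))
```

With parmap, a similar effect can be had by loading the config before calling `parmap.map` and passing the parsed values through as an extra keyword argument, as the script already does for `mmd2iso_xslt` and `outdir`.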
Hi,

I may be misusing the `confuse` library, but I am running a function which uses a method like:
I tried to replace my code with:

in the hope of getting the config file closed, but that didn't work, as I got an

```
AttributeError: __enter__
```
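That error is what Python 3.8 raises when the object in a `with` statement does not implement the context-manager protocol, which the traceback suggests is the case for the configuration object here. A stand-in reproduction (the class is hypothetical, used only to trigger the same failure):

```python
class NoContextManager:
    """Stand-in for an object lacking __enter__/__exit__."""


try:
    with NoContextManager():
        pass
except (AttributeError, TypeError) as err:
    # Python 3.8 raises AttributeError: __enter__; newer versions raise
    # TypeError instead, but either way `with` cannot be used here.
    print(type(err).__name__)
```

As noted in the reply above, closing the descriptor is not the real issue anyway: the file is already closed after reading, and the exhaustion comes from how many processes open it at once.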