Resubmission with OOM doesn't seem to be kicked off #65
When I stick to the example in readthedocs (maybe just changing some of the default integers), I get more or less the same messages in the logs, with slight variations, and again nothing is resubmitted either.
After hammering this a bit more, I managed to get the resubmission working using:

filled the

Might be good to update the readthedocs slightly to reflect this difference.
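For anyone landing here from a search: OOM-triggered resubmission is ultimately driven by Galaxy's job configuration, where a destination can declare a `resubmit` rule. The following is a minimal, hypothetical sketch in the XML job conf form; the destination ids (`k8s_small`, `k8s_large`) are made up for illustration and are not the poster's actual config:

```xml
<!-- Hypothetical sketch: retry an OOM-killed job on a larger destination.
     Destination ids are invented for illustration. -->
<destinations default="k8s_small">
  <destination id="k8s_small" runner="k8s">
    <!-- When the job exceeds its memory limit, resubmit to k8s_large. -->
    <resubmit condition="memory_limit_reached" destination="k8s_large" />
  </destination>
  <destination id="k8s_large" runner="k8s" />
</destinations>
```

With TPV in the picture, the destination is normally resolved dynamically, which is exactly where the caching issue discussed later in this thread comes in.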
Ok, this is resubmitting, but only once...
I remember someone once having issues when trying to resubmit to the same destination every time. Does this work for you? In the past, all my resubmissions would go from destination A to destination B to destination C, never back to the same one... I think.
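The A → B → C rotation described above (never resubmitting to the same destination twice) can be expressed by chaining resubmit targets. A hypothetical sketch in Galaxy's XML job conf form, with invented destination ids:

```xml
<!-- Hypothetical sketch: each destination resubmits OOM failures to the
     next one in the chain, so no destination is ever retried. -->
<destinations default="dest_a">
  <destination id="dest_a" runner="k8s">
    <resubmit condition="memory_limit_reached" destination="dest_b" />
  </destination>
  <destination id="dest_b" runner="k8s">
    <resubmit condition="memory_limit_reached" destination="dest_c" />
  </destination>
  <destination id="dest_c" runner="k8s" />
</destinations>
```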
It seems that this is more of a Galaxy problem than a TPV one. After the first resubmission, the
So, on the first failure, the k8s destination does have a
I'm trying to get the resubmit integration tests working again (https://github.com/galaxyproject/total-perspective-vortex/blob/main/tests/test_mapper_resubmit.py) to see whether we can test this exact scenario. However, it does look like this line in the block you highlighted could cause issues in particular: https://github.com/galaxyproject/galaxy/blob/e67cb9b615d3f373fdf3d8534a6e5208f20e94b9/lib/galaxy/jobs/runners/state_handlers/resubmit.py#L87-L89 If I understood that comment correctly, it won't re-evaluate the dynamic code and instead uses the cached destination, which would explain this behaviour.
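If the cached-destination reading is right, the failure mode can be illustrated with a small, hypothetical Python sketch. This is not Galaxy's actual code; `Job`, `resolve_destination`, and `dynamic_rule` are invented names used only to show why caching the first resolved destination prevents dynamic rules from running again on resubmission:

```python
# Hypothetical sketch (NOT Galaxy's real implementation): if a resolved
# destination is cached on the job, the dynamic rule is evaluated only
# once, so a resubmission reuses the first destination instead of
# re-running the dynamic code.

class Job:
    def __init__(self):
        self.cached_destination = None

def resolve_destination(job, evaluate_dynamic):
    # Reuse the cached destination if present, skipping dynamic evaluation.
    if job.cached_destination is not None:
        return job.cached_destination
    dest = evaluate_dynamic()
    job.cached_destination = dest
    return dest

calls = []
def dynamic_rule():
    # Pretend this is TPV picking a (possibly different) destination.
    calls.append(1)
    return f"destination_try_{len(calls)}"

job = Job()
first = resolve_destination(job, dynamic_rule)   # initial dispatch
second = resolve_destination(job, dynamic_rule)  # resubmission
assert first == second == "destination_try_1"
assert len(calls) == 1  # the dynamic rule ran only once
```

Under this model, the resubmission never reaches TPV at all, which matches the "only resubmits once / same destination" symptoms reported above.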
On my k8s setup, having fixed the OOM labeling, I use the following TPV setup (following a bit one of the test cases here), on top of the one that ships by default with the Helm chart:

On run I see the following DEBUG outputs from TPV:

which looks fine, I guess, besides the fact that resubmit is empty or "{}" depending on where you look. But then I see a more worrying:

which I suspect is related. On Galaxy, the job shows the OOM message in the UI ("Tool failed due to insufficient memory. Try with more memory."). Any idea of what might be going wrong? Is the DEBUG output showing what one would expect? Thanks.
I also tried the setup in the readthedocs docs; it didn't work for me either. I'll post the results here as well.