How can I effectively debug BSON DocumentTooLarge errors? #5953
I have a workflow step that uses "with items", where each item outputs about 1MB of data (a CSV string). I join all of those strings together using YAQL and pass the result to the next step, which deletes duplicates, reorders the rows, and outputs a ~5MB string. Everything is fine up to this point: the output of that step shows ~20000 lines in the UI. However, the next step (upload) tries to ingest this string, along with some other small string variables, and I get a BSON DocumentTooLarge error in my parent workflow.
This upload step doesn't even start executing, so from what I can see it is failing because of the input vars. In my mind the large ~5MB object should take up most of the space, and I expect the BSON format and other metadata to increase it slightly, but I'm not sure what accounts for the extra ~15MB. I'm just looking for some guidance on how I can debug this. Can someone describe which documents are being saved and how they are structured? Is there a log file somewhere that can help? Can I view all currently saved documents anywhere? I need to understand this area better because I keep falling into this trap and refactoring my workflow. Then I increase my data set for testing and it fails again with document size errors like this one.
Replies: 1 comment
I guess it helps a lot to write out the problem! It prompted me to hunt down the mongo conf, dig around the collections, query for the execution id of the workflow, and decode parts of the document I found (which was 13.2MB by itself).
It looks like the workflow document stores all of the context inside itself, so I expect one of my workflows is expanding its context too much. I'm currently playing around with the idea of just referencing task outputs rather than storing them in context, and reusing/deleting context keys when I don't need them anymore.
Edit: Referencing the task outputs using task(...) meant these no longer needed to be in the context, which saved all of that document space.
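For anyone repeating the "decode parts of the document" step, here is a minimal sketch of how the per-key size breakdown could be done in Python. It assumes the workflow execution document has already been loaded as a plain dict (for example with pymongo's `find_one`, which is not shown because the database and collection names depend on your install). The helper names `approx_size` and `size_report` and the sample context keys are hypothetical, and the JSON-encoded length is only a rough stand-in for the real BSON size. The one hard number is MongoDB's limit of 16 MiB per document.

```python
import json

# MongoDB rejects any single BSON document larger than 16 MiB.
MAX_BSON_BYTES = 16 * 1024 * 1024


def approx_size(value):
    """Approximate the stored size of a value via its UTF-8 JSON encoding."""
    # default=str keeps non-JSON types (ObjectId, datetime) from raising.
    return len(json.dumps(value, default=str).encode("utf-8"))


def size_report(document):
    """Return (key, approx_bytes) pairs for top-level keys, largest first."""
    sizes = [(key, approx_size(value)) for key, value in document.items()]
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical context: the same CSV payload kept under several keys.
    csv_chunk = "a,b,c\n" * 1000
    context = {
        "chunks": [csv_chunk] * 5,      # per-item outputs
        "joined": csv_chunk * 5,        # joined string
        "deduped": csv_chunk * 5,       # cleaned-up copy
        "target": "s3://example-bucket",
    }
    for key, size in size_report(context):
        print(f"{key:10s} {size:>10d} bytes")
    total = approx_size(context)
    print(f"total ~{total} bytes ({total / MAX_BSON_BYTES:.1%} of the 16 MiB limit)")
```

Running the same report on a nested value (for example the context dict inside the document) narrows down which keys hold copies of the large string, which is what made the case for referencing task outputs instead of publishing them.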