Impact of the new feature
Workqueue and WMAgent I suspect, not sure about other parts
Is your feature request related to a problem? Please describe.
Yes. Currently a WQE (workqueue element) cannot start running unless its input is 100% available. There are workflows whose input is more than 99% available, yet they either cannot start at all (when they have only one WQE) or cannot produce enough statistics (because their partially available WQEs aren't picked up). The operations team then has to debug such transfers even though the workflows could finish with partial input, which is unnecessary manual work.
For example, the input of the following workflow is 99.8% available (see transfer_doc and WQE), but it is not picked up.
Describe the solution you'd like
As I understand it, this requires a lot of changes because of the design of the system. I'd like to understand which parts of the design make it hard to change and what we can do about it.
Describe alternatives you've considered
Don't touch the default system behavior, but be able to hack it for the affected workflows. Currently, the number of affected workflows is on the order of tens. If we could make only those workflows run with partial WQEs via a hacked agent, this would be an improvement. Any hints for this? We can use our P&R agents if this is possible.
Additional context
I suspect this issue requires @amaltaro's attention. Thanks a lot in advance.
Let me add my two cents. This is a small dataset (1.5TB), and workqueue decided to create only a single WQE for all the blocks in it, which makes the "partial_copy = 0.5" setting for the campaign irrelevant.
Maybe we do not need to change the agents to make them process WQEs with partially available inputs; changing how splitting works for small datasets might be sufficient.
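To illustrate the splitting point, here is a minimal sketch in Python. It is not WMCore code: `Block`, `split_into_elements` and `runnable_elements` are hypothetical names, and the only rule modeled is the one described in this issue, namely that an element runs only when all of its input is available.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    name: str
    available: bool  # True once the block is fully available (assumption)


def split_into_elements(blocks, blocks_per_element):
    """Group blocks into hypothetical workqueue elements of a fixed size."""
    return [blocks[i:i + blocks_per_element]
            for i in range(0, len(blocks), blocks_per_element)]


def runnable_elements(elements):
    """An element is runnable only if every one of its blocks is available."""
    return [elem for elem in elements if all(b.available for b in elem)]


# Ten blocks, one of which is still missing.
blocks = [Block(f"block{i}", available=(i != 7)) for i in range(10)]

single = split_into_elements(blocks, blocks_per_element=len(blocks))
per_block = split_into_elements(blocks, blocks_per_element=1)

print(len(runnable_elements(single)))     # 0
print(len(runnable_elements(per_block)))  # 9
```

With a single element, one missing block holds back everything; with finer splitting, only the element containing the missing block waits.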
Unlike workflows with partially available input data, there is no mechanism in WM that would accept a partially available workqueue element. This development faces many challenges, starting with the hierarchical queues in the system. In other words:
an agent would acquire a partially available GQ (global workqueue) element into the LQ (local workqueue) --> this one is okay in general.
then the agent would acquire a partially available LQ element into WMBS --> this is tricky! This is the moment when data discovery + location happens: all the metadata is collected and prepared to be inserted into the relational database.
once the data is in WMBS, JobCreator iterates over the subscriptions and creates the relevant job groups / jobs.
That means we would have to track which LQ elements were partially accepted and inserted into WMBS, keep looking for newly satisfied work, and organize it (data discovery + location) and add it to WMBS with all the previous associations.
Many things can go wrong and/or get stuck with this model. In short, my opinion is that this could potentially cause more problems than benefits.
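To give a rough idea of the extra bookkeeping described above, here is a minimal Python sketch. It is purely illustrative and not how WMCore is implemented: `PartialElement` and its methods are hypothetical, and block names stand in for all the metadata that would really have to be discovered and inserted into WMBS.

```python
from dataclasses import dataclass, field


@dataclass
class PartialElement:
    """Hypothetical record of a partially accepted LQ element."""
    element_id: str
    all_blocks: frozenset
    injected: set = field(default_factory=set)  # blocks already "in WMBS"

    def newly_available(self, available_now):
        """Blocks of this element that became available since the last pass."""
        return (self.all_blocks & set(available_now)) - self.injected

    def inject(self, available_now):
        """Record newly available blocks as inserted and return them."""
        new_blocks = self.newly_available(available_now)
        self.injected |= new_blocks
        return sorted(new_blocks)

    @property
    def complete(self):
        return self.injected == self.all_blocks


elem = PartialElement("wqe-1", frozenset({"a", "b", "c"}))
first = elem.inject({"a", "b"})        # first pass: two blocks available
second = elem.inject({"a", "b", "c"})  # later pass: the last block arrives
print(first, second, elem.complete)
```

Even in this toy form, every partially accepted element needs persistent state and repeated polling; in the real system each pass would also have to redo data discovery + location and keep the WMBS associations consistent.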
@hassan11196 @drkovalskyi FYI