There are two reasons we perform weighted optimization:

1. we receive weights from the selector (manual prioritization)
2. a downsampling strategy outputs weights (e.g. loss/gradnorm)
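For context, weighted optimization here means computing per-sample losses and scaling them by the weights before reducing. A minimal PyTorch sketch (the `weighted_loss` helper and the single `weights` tensor standing in for either weight source are illustrative, not the actual trainer code):

```python
import torch
import torch.nn.functional as F

def weighted_loss(output: torch.Tensor, target: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
    # per-sample losses require reduction="none" on the criterion
    per_sample_loss = F.cross_entropy(output, target, reduction="none")
    # scale each sample's loss by its weight, then reduce to a scalar
    return (per_sample_loss * weights).mean()
```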
However, I think some things in the current weight handling are a bit weird:
a) If we receive weights from the selector and then do downsampling, we lose the weights
b) We do the following logic:

```python
weighted_optimization = (
    retrieve_weights_from_dataloader or self._downsampling_mode == DownsamplingMode.BATCH_THEN_SAMPLE
)
```
and I don't know why we do this. First, do we never use weighted optimization in StB? Second, why do we always use weights in BtS? I think whether we should perform weighted optimization is a property of the downsampling strategy (if we don't receive weights from the selector). If we don't receive weights from the selector and e.g. use RHO-LOSS, we should not use weighted optimization. While we currently set the weights to 1, we still use the no-reduction loss function, which I think can have performance implications for training with downsampling. I think we should do the following (see the sketch after this list):
- if we receive weights from the selector and use no downsampling, use the weights from the selector
- if we don't receive weights from the selector and use no downsampling OR a downsampling strategy that does not output weights (we probably need to add a flag), use no weighted optimization
- if we don't receive weights from the selector and use a downsampling strategy that outputs weights (e.g. loss/gradnorm), use the downsampling weights in both StB and BtS
- if we receive weights from the selector and use a downsampler that outputs weights, I'm not sure. Either we use one of the two weight sets or we multiply them.
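A rough sketch of that decision logic could look as follows. Note that `outputs_weights` is the hypothetical flag proposed above, and all names are illustrative rather than the actual Modyn API:

```python
from enum import Enum

class WeightSource(Enum):
    NONE = 0          # plain (unweighted) optimization
    SELECTOR = 1      # weights from the selector
    DOWNSAMPLER = 2   # weights produced by the downsampling strategy

def resolve_weight_source(selector_provides_weights: bool, downsampler) -> WeightSource:
    # hypothetical flag on the strategy: does it output per-sample weights?
    downsampler_outputs_weights = downsampler is not None and downsampler.outputs_weights

    if downsampler_outputs_weights:
        # applies to both StB and BtS; if the selector also provided weights,
        # whether to multiply the two sets is the open question above
        return WeightSource.DOWNSAMPLER
    if selector_provides_weights:
        return WeightSource.SELECTOR
    # no downsampling, or a strategy like RHO-LOSS that selects but does not weight
    return WeightSource.NONE
```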
Right now, the weight handling uses the expensive no-reduction loss even when it's not necessary, I think.
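Concretely, the criterion could be chosen based on whether weighted optimization is actually needed; a sketch assuming a PyTorch-style criterion (as used in the trainer):

```python
import torch

if weighted_optimization:
    # per-sample losses, to be weighted and reduced manually
    criterion = torch.nn.CrossEntropyLoss(reduction="none")
else:
    # fused mean reduction inside the criterion; avoids materializing per-sample losses
    criterion = torch.nn.CrossEntropyLoss()
```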
@XianzheMa maybe, at your convenience, you could check this out, since it may have downsampling performance implications :) thank you! But not highest priority, of course.