
Weighted Optimization #543

Open
MaxiBoether opened this issue Jun 22, 2024 · 1 comment
@MaxiBoether
Contributor

There are two reasons why we perform weighted optimization:

  1. We receive weights from the selector (manual prioritization).

  2. A downsampling strategy outputs weights (e.g. loss/gradnorm).

However, I think some parts of the current weight handling are a bit odd:

a) If we receive weights from the selector and then perform downsampling, we lose the weights.

b) We apply the following logic:

        weighted_optimization = (
            retrieve_weights_from_dataloader or self._downsampling_mode == DownsamplingMode.BATCH_THEN_SAMPLE
        )

and I don't know why we do this. First, do we never use weighted optimization in StB? Second, why do we always use weights in BtS? I think whether we should perform weighted optimization is a property of the downsampling strategy (assuming we don't receive weights from the selector). For example, if we don't receive weights from the selector and use RHO-LOSS, we should not use weighted optimization. While we currently set the weights to 1 in that case, we still use the no-reduction loss function, which I think can have performance implications for training with downsampling. I think we should do the following (see the sketch after the list):

  1. If we receive weights from the selector and use no downsampling, use the selector weights.
  2. If we don't receive weights from the selector and use no downsampling OR a downsampling strategy that does not output weights (we probably need to add a flag for this), use no weighted optimization.
  3. If we don't receive weights from the selector and use a downsampling strategy that outputs weights (e.g. loss/gradnorm), use the downsampling weights in both StB and BtS.
  4. If we receive weights from the selector and use a downsampler that outputs weights, I am not sure; either we use one of the two sets of weights or we multiply them.
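
For concreteness, here is a minimal sketch of that decision logic. The function and the `downsampler_outputs_weights` flag are hypothetical (the flag is the one proposed in point 2); `retrieve_weights_from_dataloader` mirrors the snippet quoted above, but the surrounding structure is an assumption, not the actual trainer code:

    # Hedged sketch only: `downsampler_outputs_weights` is the per-strategy flag
    # proposed in point 2; the surrounding structure is assumed, not taken from
    # the actual trainer code.
    def should_use_weighted_optimization(
        retrieve_weights_from_dataloader: bool,  # selector sent manual weights
        downsampling_enabled: bool,
        downsampler_outputs_weights: bool,       # e.g. True for loss/gradnorm, False for RHO-LOSS
    ) -> bool:
        if downsampling_enabled and downsampler_outputs_weights:
            # Cases 3/4: weighted optimization in both StB and BtS
            # (case 4 leaves open how selector and downsampler weights combine).
            return True
        # Cases 1/2: weighted only if the selector provides weights;
        # otherwise fall back to the cheaper mean-reduction loss.
        return retrieve_weights_from_dataloader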

Right now, I think the weight handling uses the expensive no-reduction loss even when it is not necessary.
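
To illustrate that point: when there are no weights, there is no need for the per-sample (`reduction="none"`) loss plus a manual reduction; the standard fused mean reduction suffices. A minimal PyTorch illustration of the claim (not the actual trainer code):

    import torch

    criterion_mean = torch.nn.CrossEntropyLoss()                  # fused mean reduction
    criterion_none = torch.nn.CrossEntropyLoss(reduction="none")  # per-sample losses

    def compute_loss(outputs, targets, weights=None):
        # Only pay for the per-sample path when weights actually exist.
        if weights is None:
            return criterion_mean(outputs, targets)
        per_sample = criterion_none(outputs, targets)
        return (per_sample * weights).sum() / weights.sum()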

@MaxiBoether
Contributor Author

@XianzheMa maybe, at your convenience, you could check this out, since it may have downsampling performance implications :) Thank you! Not the highest priority, of course.
