Make Jepa loader more flexible #945
Conversation
src/fairseq2/models/jepa/loader.py
Outdated
if "encoder" not in state_dict: | ||
raise ValueError(f"`encoder` not found in state dict (available key: {state_dict.keys()})") | ||
|
||
return state_dict["encoder"] |
Although I understood the PR description, I am not sure I understand the change here. Is there any reason for not handling this check in convert_jepa_checkpoint? I mean, instead of

checkpoint = checkpoint["encoder"]

having:

checkpoint = checkpoint.get("encoder")
if checkpoint is None:
    raise ValueError(...)

What is the benefit of having this check in a tensor_loader?
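Spelled out, that alternative would sit inside the converter roughly like this (an illustrative sketch only; the error message wording and the JepaConfig import path are assumptions on my part):

from typing import Any

from fairseq2.models.jepa.factory import JepaConfig  # import path is an assumption


def convert_jepa_checkpoint(
    checkpoint: dict[str, Any], config: JepaConfig
) -> dict[str, Any]:
    # Select the encoder weights here rather than in a custom TensorLoader.
    encoder_checkpoint = checkpoint.get("encoder")
    if encoder_checkpoint is None:
        raise ValueError(
            f"`encoder` not found in checkpoint (available keys: {list(checkpoint.keys())})"
        )
    # ... the existing conversion of the encoder parameters would follow here.
    ...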
I had two thoughts when making this change, both of them opinionated:
- We should narrow the scope of the convert_jepa_checkpoint function to converting only the parameters related to the JEPA model; how we get to these parameters is handled separately (in a TensorLoader).
- With this, we do not have to list all possible checkpoint keys ("encoder", "target_encoder") and define their priority in convert_jepa_checkpoint. This lets us inject pretrained encoders from other "exotic" checkpoints (for example, jepa-llava, where the encoder is stored in vision_tower); see the sketch after this comment.
The drawback of this approach, though, is that we have to write a custom TensorLoader for each checkpoint, so it is a matter of opinion here...
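For concreteness, a custom loader along those lines could look like the sketch below (my own rough illustration, not the actual fairseq2 TensorLoader API; the function name and the use of plain torch.load are assumptions):

from typing import Any

import torch


def load_jepa_llava_encoder_tensors(path: str) -> dict[str, Any]:
    # Load the full jepa-llava checkpoint, then hand only the encoder sub-dict
    # to the standard JEPA converter, keeping key selection out of the converter.
    checkpoint = torch.load(path, map_location="cpu")
    if "vision_tower" not in checkpoint:
        raise ValueError(
            f"`vision_tower` not found in checkpoint (available keys: {list(checkpoint.keys())})"
        )
    return checkpoint["vision_tower"]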
How about doing something like:
# Handles different variants of JEPA checkpoints and delegates the actual conversion
# to the standard converter.
def convert_jepa_checkpoint(
    checkpoint: dict[str, Any], config: JepaConfig
) -> dict[str, Any]:
    if "vision_tower" in checkpoint:
        return convert_jepa_encoder_checkpoint(checkpoint["vision_tower"], config)

    if "target_encoder" in checkpoint:
        return convert_jepa_encoder_checkpoint(checkpoint["target_encoder"], config)

    if "encoder" in checkpoint:
        return convert_jepa_encoder_checkpoint(checkpoint["encoder"], config)

    raise ValueError("`encoder` not found in checkpoint.")


def convert_jepa_encoder_checkpoint(
    checkpoint: dict[str, Any], config: JepaConfig
) -> dict[str, Any]:
    # Contains the current implementation.
    ...
My worry with the TensorLoader approach is that we leak state-dict handling logic into tensor loading. Essentially, we want to "pre-process" the checkpoint before passing it to the converter, so a wrapper function might do the job as well. Let me know what you think.
Please make sure to run mypy/flake8/isort/black and fix any formatting errors before merging. Otherwise, looks good to me. Thanks!
What does this PR do? Please describe:
In the JEPA frozen-evaluation scenario, the pretrained encoder is loaded from an external pretrained checkpoint. Because the pretrained checkpoints can be updated separately from the attentive-pooling checkpoints, we need to make sure that the weights used together with the corresponding attentive pooling are saved in a dedicated place and will not be overridden in subsequent training epochs.
In JEPA, this is done via a special checkpoint key, "target_encoder", which is saved alongside "encoder" in a pretrained checkpoint and is reserved for evaluation reproducibility. References: JEPA evals configs (example here, loader here).
This PR updates the loader to allow loading a JEPA model for different purposes (training, frozen evaluation).
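To make the two loading modes concrete, the difference boils down to which top-level key of the pretrained checkpoint is used (a minimal sketch; load_jepa_encoder_state is a hypothetical helper, not part of this PR, and plain torch.load is assumed):

import torch


def load_jepa_encoder_state(path: str, frozen_eval: bool = False) -> dict:
    checkpoint = torch.load(path, map_location="cpu")
    # For frozen evaluation, prefer the snapshot reserved for reproducibility.
    key = "target_encoder" if frozen_eval else "encoder"
    if key not in checkpoint:
        raise ValueError(
            f"`{key}` not found in checkpoint (available keys: {list(checkpoint.keys())})"
        )
    return checkpoint[key]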
** Fixes #{issue number}
Does your PR introduce any breaking changes? If yes, please list them:
List of all backwards-incompatible changes.
Check list: