Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Colflor #104

Open
wants to merge 26 commits into
base: main
Choose a base branch
from
Open

Colflor #104

wants to merge 26 commits into from

Conversation

ManuelFay
Copy link
Collaborator

@ManuelFay ManuelFay commented Oct 23, 2024

@ManuelFay
Copy link
Collaborator Author

ManuelFay commented Oct 23, 2024

Changed from the OG repo:

  • casting the dtype to make loading in bfloat16 possible (pixel_values + full attention mask)
  • remove use_cache was put in the processor init
  • remove useless mock image
  • linting
  • WIP: train models with it to verify compatibility

todo:

  • clarify the full attention mask shenanigan
  • make the conversion code from the OG florence checkpoint available
  • clarify why default torch weight init in the projection layer is not okay ?
  • clarify why use_cache was put in the processor init
  • clarify why the fast tokenizer does not work ?

@AhmedMasryKU
Copy link

Very excited to see ColFlor being merged into the main repo. I wanna add a few comments and clarifications.

The attention mask of the backbone model, Florence-2, only covers the text embeddings. That's why we can not directly apply it on the final output embeddings (image+text). Instead, I added a new parameter "full attention mask" to cover both the image and text. I think we can refactor this part in a better way.

Regarding the initialization of the linear projection layer, we can use the default torch initialization and remove the extra code I added. I added this custom initialization when I was debugging the convergence issues, but forgot to remove it.

I also used the default tokenizer and processor from the original Florence-2-base checkpoint. (https://huggingface.co/microsoft/Florence-2-base).

@ManuelFay ManuelFay marked this pull request as ready for review November 5, 2024 17:20
@ManuelFay
Copy link
Collaborator Author

Probably mergeable when we clean up the weird "attention mask things" and we run trainings

Copy link
Collaborator

@tonywu71 tonywu71 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. One important remark: shouldn't we rename the model from colflor to colflor2? This will:
  • make it ISO with ColQwen2
  • make it easier if we ever want to use our ColVision architecture for an hypothetical Florence3 model → ColFlor3.

Personlly, I think it's worth doing the proper renaming. What do you think?

  1. Can you update the CHANGELOG too please?

Otherwise, LGTM! Thanks! 👌🏼

Comment on lines +1 to +14
# ruff: noqa
# coding=utf-8
# Copyright 2024 Microsoft and the HuggingFace Inc. team. All rights reserved.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks this code is adapted microsoft/Florence-2-large from the Hf Hub. If yes, I would add the original link on top of the file. Moreover, I'm not a big fan of # ruff: noqa so I think we could get rid of it.

You might need to apply the ruff linter after this change!

Suggested change
# ruff: noqa
# coding=utf-8
# Copyright 2024 Microsoft and the HuggingFace Inc. team. All rights reserved.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This code was copied and adapted from the "microsoft/Florence-2-large" model: https://huggingface.co/microsoft/Florence-2-large/blob/main/configuration_florence2.py
# coding=utf-8
# Copyright 2024 Microsoft and the HuggingFace Inc. team. All rights reserved.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

Comment on lines +15 to +17
import warnings

""" Florence-2 configuration"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
import warnings
""" Florence-2 configuration"""
"""Florence-2 configuration"""
import warnings

def __init__(self, config: Florence2Config):
super().__init__(config=config)

self.dim = 128
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it too late to allow the user to change the output dimension?

@@ -0,0 +1,3111 @@
# ruff: noqa
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as before. Might need to run ruff too!

Suggested change
# ruff: noqa
# This code was copied and adapted from the "microsoft/Florence-2-large" model: https://huggingface.co/microsoft/Florence-2-large/blob/main/modeling_florence2.py
# NOTE: Disable ruff to keep the original file style.
# ruff: noqa

@@ -0,0 +1,1087 @@
# ruff: noqa
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as before.

Suggested change
# ruff: noqa
# This code was copied and adapted from the "microsoft/Florence-2-large" model: https://huggingface.co/microsoft/Florence-2-large/blob/main/processing_florence2.py
# NOTE: Disable ruff to keep the original file style.
# ruff: noqa

Comment on lines +7 to +9
from colpali_engine.utils.processing_utils import BaseVisualRetrieverProcessor

from .processing_florence2 import Florence2Processor
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make the two imports absolute? Not super important, but i'd be great to be ISO with the rest of the codebase.

Suggested change
from colpali_engine.utils.processing_utils import BaseVisualRetrieverProcessor
from .processing_florence2 import Florence2Processor
from colpali_engine.models.florence2.colflor.processing_florence2 import Florence2Processor
from colpali_engine.utils.processing_utils import BaseVisualRetrieverProcessor


class ColFlorProcessor(BaseVisualRetrieverProcessor, Florence2Processor):
"""
Processor for ColPali.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Processor for ColPali.
Processor for ColFlor.

@tonywu71 tonywu71 added the new model New ColVision model label Nov 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new model New ColVision model
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants