-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Colflor #104
base: main
Are you sure you want to change the base?
Conversation
Changed from the OG repo:
todo:
|
Very excited to see ColFlor being merged into the main repo. I wanna add a few comments and clarifications. The attention mask of the backbone model, Florence-2, only covers the text embeddings. That's why we can not directly apply it on the final output embeddings (image+text). Instead, I added a new parameter "full attention mask" to cover both the image and text. I think we can refactor this part in a better way. Regarding the initialization of the linear projection layer, we can use the default torch initialization and remove the extra code I added. I added this custom initialization when I was debugging the convergence issues, but forgot to remove it. I also used the default tokenizer and processor from the original Florence-2-base checkpoint. (https://huggingface.co/microsoft/Florence-2-base). |
Probably mergeable when we clean up the weird "attention mask things" and we run trainings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- One important remark: shouldn't we rename the model from
colflor
tocolflor2
? This will:
- make it ISO with
ColQwen2
- make it easier if we ever want to use our ColVision architecture for an hypothetical
Florence3
model →ColFlor3
.
Personlly, I think it's worth doing the proper renaming. What do you think?
- Can you update the CHANGELOG too please?
Otherwise, LGTM! Thanks! 👌🏼
# ruff: noqa | ||
# coding=utf-8 | ||
# Copyright 2024 Microsoft and the HuggingFace Inc. team. All rights reserved. | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks this code is adapted microsoft/Florence-2-large
from the Hf Hub. If yes, I would add the original link on top of the file. Moreover, I'm not a big fan of # ruff: noqa
so I think we could get rid of it.
You might need to apply the ruff linter after this change!
# ruff: noqa | |
# coding=utf-8 | |
# Copyright 2024 Microsoft and the HuggingFace Inc. team. All rights reserved. | |
# Licensed under the Apache License, Version 2.0 (the "License"); | |
# you may not use this file except in compliance with the License. | |
# You may obtain a copy of the License at | |
# | |
# http://www.apache.org/licenses/LICENSE-2.0 | |
# | |
# Unless required by applicable law or agreed to in writing, software | |
# distributed under the License is distributed on an "AS IS" BASIS, | |
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |
# See the License for the specific language governing permissions and | |
# limitations under the License. | |
# This code was copied and adapted from the "microsoft/Florence-2-large" model: https://huggingface.co/microsoft/Florence-2-large/blob/main/configuration_florence2.py | |
# coding=utf-8 | |
# Copyright 2024 Microsoft and the HuggingFace Inc. team. All rights reserved. | |
# Licensed under the Apache License, Version 2.0 (the "License"); | |
# you may not use this file except in compliance with the License. | |
# You may obtain a copy of the License at | |
# | |
# http://www.apache.org/licenses/LICENSE-2.0 | |
# | |
# Unless required by applicable law or agreed to in writing, software | |
# distributed under the License is distributed on an "AS IS" BASIS, | |
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |
# See the License for the specific language governing permissions and | |
# limitations under the License. | |
import warnings | ||
|
||
""" Florence-2 configuration""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import warnings | |
""" Florence-2 configuration""" | |
"""Florence-2 configuration""" | |
import warnings |
def __init__(self, config: Florence2Config): | ||
super().__init__(config=config) | ||
|
||
self.dim = 128 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it too late to allow the user to change the output dimension?
@@ -0,0 +1,3111 @@ | |||
# ruff: noqa |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as before. Might need to run ruff too!
# ruff: noqa | |
# This code was copied and adapted from the "microsoft/Florence-2-large" model: https://huggingface.co/microsoft/Florence-2-large/blob/main/modeling_florence2.py | |
# NOTE: Disable ruff to keep the original file style. | |
# ruff: noqa |
@@ -0,0 +1,1087 @@ | |||
# ruff: noqa |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as before.
# ruff: noqa | |
# This code was copied and adapted from the "microsoft/Florence-2-large" model: https://huggingface.co/microsoft/Florence-2-large/blob/main/processing_florence2.py | |
# NOTE: Disable ruff to keep the original file style. | |
# ruff: noqa |
from colpali_engine.utils.processing_utils import BaseVisualRetrieverProcessor | ||
|
||
from .processing_florence2 import Florence2Processor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make the two imports absolute? Not super important, but i'd be great to be ISO with the rest of the codebase.
from colpali_engine.utils.processing_utils import BaseVisualRetrieverProcessor | |
from .processing_florence2 import Florence2Processor | |
from colpali_engine.models.florence2.colflor.processing_florence2 import Florence2Processor | |
from colpali_engine.utils.processing_utils import BaseVisualRetrieverProcessor |
|
||
class ColFlorProcessor(BaseVisualRetrieverProcessor, Florence2Processor): | ||
""" | ||
Processor for ColPali. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Processor for ColPali. | |
Processor for ColFlor. | |
From:
https://github.com/AhmedMasryKU/colflor