Colflor #104

ManuelFay · 2024-10-23T14:12:02Z

From:
https://github.com/AhmedMasryKU/colflor

ManuelFay · 2024-10-23T14:16:47Z

Changed from the OG repo:

casting the dtype to make loading in bfloat16 possible (pixel_values + full attention mask)
remove use_cache, which was put in the processor init
remove useless mock image
linting
WIP: train models with it to verify compatibility

todo:

clarify the full attention mask shenanigan
make the conversion code from the OG florence checkpoint available
clarify why default torch weight init in the projection layer is not okay
clarify why use_cache was put in the processor init
clarify why the fast tokenizer does not work

AhmedMasryKU · 2024-10-23T16:50:38Z

Very excited to see ColFlor being merged into the main repo. I wanna add a few comments and clarifications.

The attention mask of the backbone model, Florence-2, only covers the text embeddings. That's why we cannot apply it directly to the final output embeddings (image + text). Instead, I added a new parameter, "full attention mask", that covers both the image and the text. I think we can refactor this part in a better way.
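To make the idea concrete, here is a minimal sketch of how such a full mask could be built and applied. It is not the code from this PR: the function names are hypothetical, plain Python lists stand in for tensors, and it assumes the image tokens are placed before the text tokens and are always attended.

```python
def build_full_attention_mask(num_image_tokens, text_attention_mask):
    """Prepend an all-ones image mask to each row of the text-only mask.

    Assumption: image tokens come first in the sequence and are never padded.
    """
    return [[1] * num_image_tokens + list(row) for row in text_attention_mask]


def mask_embeddings(embeddings, full_attention_mask):
    """Zero out the embedding vectors that sit at padded positions."""
    return [
        [[value * keep for value in vector] for vector, keep in zip(seq, mask_row)]
        for seq, mask_row in zip(embeddings, full_attention_mask)
    ]


# Two inputs padded to 3 text tokens, each preceded by 2 image tokens.
text_mask = [[1, 1, 1], [1, 1, 0]]
full_mask = build_full_attention_mask(2, text_mask)

# Dummy output embeddings: batch of 2, sequence length 5, dimension 2.
embeddings = [[[1.0, 1.0] for _ in range(5)] for _ in range(2)]
masked = mask_embeddings(embeddings, full_mask)
```

With tensors, the same two steps would be a concatenation along the sequence axis followed by an element-wise multiply with the mask broadcast over the embedding dimension.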

Regarding the initialization of the linear projection layer, we can use the default torch initialization and remove the extra code I added. I added this custom initialization when I was debugging the convergence issues, but forgot to remove it.

I also used the default tokenizer and processor from the original Florence-2-base checkpoint. (https://huggingface.co/microsoft/Florence-2-base).

ManuelFay · 2024-11-05T17:49:56Z

Probably mergeable when we clean up the weird "attention mask things" and we run trainings

tonywu71

One important remark: shouldn't we rename the model from colflor to colflor2? This will:

make it ISO with ColQwen2
make it easier if we ever want to use our ColVision architecture for a hypothetical Florence3 model → ColFlor3.

Personally, I think it's worth doing the proper renaming. What do you think?

Can you update the CHANGELOG too please?

Otherwise, LGTM! Thanks! 👌🏼

tonywu71 · 2024-11-05T21:10:56Z

colpali_engine/models/florence2/colflor/configuration_florence2.py

+# ruff: noqa
+# coding=utf-8
+# Copyright 2024 Microsoft and the HuggingFace Inc. team. All rights reserved.
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.


Looks like this code is adapted from microsoft/Florence-2-large on the HF Hub. If so, I would add the original link at the top of the file. Moreover, I'm not a big fan of # ruff: noqa, so I think we could get rid of it.

You might need to apply the ruff linter after this change!

Suggested change

-# ruff: noqa
-# coding=utf-8
-# Copyright 2024 Microsoft and the HuggingFace Inc. team. All rights reserved.
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
+# This code was copied and adapted from the "microsoft/Florence-2-large" model: https://huggingface.co/microsoft/Florence-2-large/blob/main/configuration_florence2.py
+# coding=utf-8
+# Copyright 2024 Microsoft and the HuggingFace Inc. team. All rights reserved.
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

tonywu71 · 2024-11-05T21:11:44Z

colpali_engine/models/florence2/colflor/configuration_florence2.py

+import warnings
+
+""" Florence-2 configuration"""


Suggested change

-import warnings
-
-""" Florence-2 configuration"""
+"""Florence-2 configuration"""
+
+import warnings

tonywu71 · 2024-11-05T21:13:57Z

colpali_engine/models/florence2/colflor/modeling_colflor.py

+    def __init__(self, config: Florence2Config):
+        super().__init__(config=config)
+
+        self.dim = 128


Is it too late to allow the user to change the output dimension?
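One possible shape for this, sketched under assumptions: the field name `embedding_dim`, the `ExampleConfig` stand-in, and the helper are all hypothetical and not part of the PR. The idea is to read the dimension from the config and fall back to the current hard-coded 128 when it is absent, so existing checkpoints keep working.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_EMBEDDING_DIM = 128  # the value currently hard-coded in __init__


@dataclass
class ExampleConfig:
    """Stand-in for the model config; `embedding_dim` is a hypothetical field."""

    embedding_dim: Optional[int] = None


def resolve_embedding_dim(config) -> int:
    """Return the configured output dimension, defaulting to 128 if unset."""
    dim = getattr(config, "embedding_dim", None)
    if dim is None:
        return DEFAULT_EMBEDDING_DIM
    if dim <= 0:
        raise ValueError(f"embedding_dim must be positive, got {dim}")
    return dim


default_dim = resolve_embedding_dim(ExampleConfig())
custom_dim = resolve_embedding_dim(ExampleConfig(embedding_dim=64))
```

In the model, `self.dim = 128` would then become a call to such a helper. Note that a checkpoint trained with one projection size cannot be loaded with another.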

tonywu71 · 2024-11-05T21:15:18Z

colpali_engine/models/florence2/colflor/modeling_florence2.py

@@ -0,0 +1,3111 @@
+# ruff: noqa


Same as before. Might need to run ruff too!

Suggested change

-# ruff: noqa
+# This code was copied and adapted from the "microsoft/Florence-2-large" model: https://huggingface.co/microsoft/Florence-2-large/blob/main/modeling_florence2.py
+# NOTE: Disable ruff to keep the original file style.
+# ruff: noqa

tonywu71 · 2024-11-05T21:19:26Z

colpali_engine/models/florence2/colflor/processing_florence2.py

@@ -0,0 +1,1087 @@
+# ruff: noqa


Same as before.

Suggested change

-# ruff: noqa
+# This code was copied and adapted from the "microsoft/Florence-2-large" model: https://huggingface.co/microsoft/Florence-2-large/blob/main/processing_florence2.py
+# NOTE: Disable ruff to keep the original file style.
+# ruff: noqa

tonywu71 · 2024-11-05T21:25:07Z

colpali_engine/models/florence2/colflor/processing_colflor.py

+from colpali_engine.utils.processing_utils import BaseVisualRetrieverProcessor
+
+from .processing_florence2 import Florence2Processor


Can we make the two imports absolute? Not super important, but it'd be great to be ISO with the rest of the codebase.

Suggested change

-from colpali_engine.utils.processing_utils import BaseVisualRetrieverProcessor
-
-from .processing_florence2 import Florence2Processor
+from colpali_engine.models.florence2.colflor.processing_florence2 import Florence2Processor
+from colpali_engine.utils.processing_utils import BaseVisualRetrieverProcessor

tonywu71 · 2024-11-05T21:25:36Z

colpali_engine/models/florence2/colflor/processing_colflor.py

+
+class ColFlorProcessor(BaseVisualRetrieverProcessor, Florence2Processor):
+    """
+    Processor for ColPali.


Suggested change

-    Processor for ColPali.
+    Processor for ColFlor.

ManuelFay added 15 commits October 23, 2024 12:23

train colflor

5842c91

lint

89390d9

fix

cd999f9

fux

cbb5bce

fux

639ed19

ff

0a00dce

ff

e36d674

test

4b14cf3

train colflor

12bb5bb

fix

6a73851

ff

88597ac

letsgo

e6a1d1a

bfloat

abc69f7

remove bp

375a1a8

gp

fc976fc

ManuelFay added 8 commits October 23, 2024 16:19

debug breakpoint

ab391b2

bfloat16

7ead7e7

test

fc218f2

go

a81ffd7

fix

01256e2

little fix

2744d6d

mini debug training

e4bc4f9

colflor

12bfbe3

timm

4bea623

ManuelFay marked this pull request as ready for review November 5, 2024 17:20

ManuelFay requested a review from tonywu71 November 5, 2024 17:20

remove proj init

b39dd8d

tonywu71 assigned ManuelFay Nov 5, 2024

tonywu71 requested changes Nov 5, 2024

tonywu71 added the new model New ColVision model label Nov 5, 2024

query

e04af42
