Is it possible to convert the pretrained model into onnx format? #45
Conversion to ONNX is possible and has been done, but inference in other languages is not trivial, mainly because HuggingFace's `generate` method does a lot of work (the whole decoding loop, beam search, etc.) that has to be reimplemented on the other side.
An automated conversion of the model to ONNX can be found here: https://huggingface.co/kha-white/manga-ocr-base/blob/refs%2Fpr%2F3/model.onnx
@kha-white Thanks for your reply. I have converted an ONNX model with the following input and output shapes:
@Mar2ck Wow, I didn't know there is even a bot that turns models into ONNX automatically! But I see it is a little different from mine: its inputs and outputs are not the same as in my export.
Ok, so I don't know exactly how to do the inference in ONNX (I played around with it a little bit, but it seemed rather tricky, so I abandoned/postponed it), so I'll just tell you what I know.

This model has an encoder-decoder architecture. The encoder gets an image as input and outputs a feature vector (the encoder hidden states). The decoder takes those hidden states together with the sequence of tokens generated so far and predicts the next token. The tricky part is replicating the beam search (although it could be replaced with a simpler greedy search at the cost of some accuracy drop) and getting all the little details right when passing the tensors around.

BTW, I suppose there is something wrong with both your export and that bot's. I think there should be separate ONNX files for the encoder and the decoder, since the encoder is run only once per inference, while the decoder is run iteratively until the end of the sequence is reached.
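To illustrate that loop, here is a minimal greedy-decoding sketch. It assumes the encoder and decoder have been exported to separate ONNX files (`encoder.onnx` / `decoder.onnx`) with the input/output names used below; those file and tensor names are placeholders and depend on how the export is done.

```python
import numpy as np
from onnxruntime import InferenceSession

# Hypothetical file names; adjust to match your own export.
encoder = InferenceSession("encoder.onnx")
decoder = InferenceSession("decoder.onnx")


def greedy_decode(image: np.ndarray, bos_id: int = 2, eos_id: int = 3, max_len: int = 300) -> list[int]:
    # The encoder runs once per image and produces the hidden states.
    [hidden_states] = encoder.run(None, {"pixel_values": image})
    token_ids = [bos_id]
    # The decoder runs once per generated token until EOS (or max_len) is reached.
    for _ in range(max_len):
        [logits] = decoder.run(
            None,
            {
                "input_ids": np.array([token_ids], dtype=np.int64),
                "encoder_hidden_states": hidden_states,
            },
        )
        # Greedy search: pick the most likely next token.
        next_id = int(logits[0, -1].argmax())
        token_ids.append(next_id)
        if next_id == eos_id:
            break
    return token_ids
```

Beam search would keep several candidate sequences per step instead of just the argmax, which is the part the comment above describes as tricky to replicate.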
@kha-white Thank you for your reply; I'm starting to understand the whole workflow. I am currently looking for a solution for the decoding part.
@LancerComet Hi, would you like to share your method for exporting the pre-trained model to ONNX format? I am getting the errors below when exporting with optimum-cli:

```
$ optimum-cli export onnx --model kha-white/manga-ocr-base bin/
Framework not specified. Using pt to export to ONNX.
Automatic task detection to image-to-text-with-past.
Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.
/Users/mayo/miniconda3/lib/python3.11/site-packages/transformers/models/vit/feature_extraction_vit.py:28: FutureWarning: The class ViTFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use ViTImageProcessor instead.
  warnings.warn(
Traceback (most recent call last):
  File "/Users/mayo/miniconda3/bin/optimum-cli", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/mayo/miniconda3/lib/python3.11/site-packages/optimum/commands/optimum_cli.py", line 163, in main
    service.run()
  File "/Users/mayo/miniconda3/lib/python3.11/site-packages/optimum/commands/export/onnx.py", line 232, in run
    main_export(
  File "/Users/mayo/miniconda3/lib/python3.11/site-packages/optimum/exporters/onnx/__main__.py", line 399, in main_export
    onnx_config, models_and_onnx_configs = _get_submodels_and_onnx_configs(
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mayo/miniconda3/lib/python3.11/site-packages/optimum/exporters/onnx/__main__.py", line 82, in _get_submodels_and_onnx_configs
    onnx_config = onnx_config_constructor(
                  ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mayo/miniconda3/lib/python3.11/site-packages/optimum/exporters/onnx/base.py", line 623, in with_past
    return cls(
           ^^^^
  File "/Users/mayo/miniconda3/lib/python3.11/site-packages/optimum/exporters/onnx/model_configs.py", line 1231, in __init__
    super().__init__(
  File "/Users/mayo/miniconda3/lib/python3.11/site-packages/optimum/exporters/onnx/config.py", line 322, in __init__
    raise ValueError(
ValueError: The decoder part of the encoder-decoder model is bert which does not need past key values.
```

Update: got the export to succeed.
Seems like there's a Rust framework from Hugging Face called Candle as well. There's also an example for TrOCR; not sure how hard it is to convert this model to Candle or to use candle-onnx, but it seems promising for a standalone binary/WASM.

Update: I modified the TrOCR example to work with manga-ocr, and the image processor and encoder seem fine (not exact tensor outputs), but I am having trouble with the final output of the decoder:

```rust
let output_projection = candle_nn::linear_no_bias(
    decoder_cfg.d_model,
    decoder_cfg.vocab_size,
    vb.pp("decoder.cls.predictions.decoder"),
)?;
```

I am getting incorrect results from this projection.
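One thing worth checking (a guess, since the exact error isn't shown above): in a BERT-style LM head, the final projection usually has a bias and its weight is typically tied to the word embeddings, so `linear_no_bias` with that tensor path may not match what is actually stored in the checkpoint. A quick way to see the real tensor names and shapes from Python:

```python
# Sanity check (the string filter is just a guess at the relevant names):
# print the tensors belonging to the decoder's LM head and the word embeddings,
# so the Candle code can be made to load exactly those names and shapes.
from transformers import VisionEncoderDecoderModel

model = VisionEncoderDecoderModel.from_pretrained("kha-white/manga-ocr-base")
for name, tensor in model.state_dict().items():
    if "cls.predictions" in name or "word_embeddings" in name:
        print(name, tuple(tensor.shape))
```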
Hope it helps.

```python
import re

import jaconv
import numpy as np
from onnxruntime import InferenceSession
from PIL import Image


class MangaOCR:
    def __init__(self, model_path: str, vocab_path: str):
        self.session = InferenceSession(model_path)
        self.vocab = self._load_vocab(vocab_path)

    def __call__(self, image: Image.Image) -> str:
        image = self._preprocess(image)
        token_ids = self._generate(image)
        text = self._decode(token_ids)
        text = self._postprocess(text)
        return text

    def _load_vocab(self, vocab_file: str) -> list[str]:
        with open(vocab_file, "r") as f:
            vocab = f.read().splitlines()
        return vocab

    def _preprocess(self, image: Image.Image) -> np.ndarray:
        # convert to grayscale
        image = image.convert("L").convert("RGB")
        # resize (resample=2 is bilinear)
        image = image.resize((224, 224), resample=2)
        # rescale
        image = np.array(image, dtype=np.float32)
        image /= 255
        # normalize
        image = (image - 0.5) / 0.5
        # reshape from (224, 224, 3) to (3, 224, 224)
        image = image.transpose((2, 0, 1))
        # add batch size
        image = image[None]
        return image

    def _generate(self, image: np.ndarray) -> list[int]:
        token_ids = [2]  # start with the BOS token
        for _ in range(300):
            [logits] = self.session.run(
                output_names=["logits"],
                input_feed={
                    "image": image,
                    "token_ids": np.array([token_ids]),
                },
            )
            # greedy search: take the most likely next token
            token_id = logits[0, -1, :].argmax()
            token_ids.append(int(token_id))
            if token_id == 3:  # stop at the EOS token
                break
        return token_ids

    def _decode(self, token_ids: list[int]) -> str:
        text = ""
        for token_id in token_ids:
            if token_id < 5:  # skip special tokens
                continue
            text += self.vocab[token_id]
        return text

    def _postprocess(self, text: str) -> str:
        text = "".join(text.split())
        text = text.replace("…", "...")
        text = re.sub("[・.]{2,}", lambda x: (x.end() - x.start()) * ".", text)
        text = jaconv.h2z(text, ascii=True, digit=True)
        return text
```

You can implement beam search; I was just lazy. But I've tested it and it works, though it's slower than HF's generate method. For completeness, this is how I exported the model:

```python
import torch
from transformers import VisionEncoderDecoderModel

model = VisionEncoderDecoderModel.from_pretrained("kha-white/manga-ocr-base")
model.eval()

# Dummy input for the model
dummy_image = torch.randn(1, 3, 224, 224)
dummy_token_ids = torch.tensor([[2]])

# Export the model
torch.onnx.export(
    model,
    (dummy_image, dummy_token_ids),
    "ocr/model.onnx",
    input_names=["image", "token_ids"],
    output_names=["logits"],
    dynamic_axes={
        "image": {
            0: "batch_size",
        },
        "token_ids": {
            0: "batch_size",
            1: "sequence_length",
        },
        "logits": {
            0: "batch_size",
            1: "sequence_length",
        },
    },
    opset_version=11,
)
```
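For context, a minimal usage sketch of the class above; the paths are placeholders (`ocr/model.onnx` is the export shown above, and `vocab.txt` is the vocabulary file from the kha-white/manga-ocr-base repository on the Hugging Face Hub):

```python
from PIL import Image

# placeholder paths: model.onnx from the export above, vocab.txt from the
# kha-white/manga-ocr-base repository
ocr = MangaOCR("ocr/model.onnx", "ocr/vocab.txt")
print(ocr(Image.open("panel_crop.jpg")))  # prints the recognized Japanese text
```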
Hi, after reading the code, it seems that the pretrained weights need to be used in conjunction with a tokenizer and some other libraries.
This makes it seem like there's no straightforward way to convert the model to the ONNX format for use in other languages. Do you have any thoughts on how to achieve this? I don't know too much about it; thank you very much!
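For reference, this is roughly the stack being referred to, as one might wire it up directly with transformers (a sketch; manga_ocr's own code may differ in details such as grayscale handling and post-processing):

```python
from PIL import Image
from transformers import AutoTokenizer, ViTImageProcessor, VisionEncoderDecoderModel

# the image processor and tokenizer are the "other libraries" around the weights
processor = ViTImageProcessor.from_pretrained("kha-white/manga-ocr-base")
tokenizer = AutoTokenizer.from_pretrained("kha-white/manga-ocr-base")
model = VisionEncoderDecoderModel.from_pretrained("kha-white/manga-ocr-base")

image = Image.open("panel_crop.jpg").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values
ids = model.generate(pixel_values, max_length=300)[0]  # generate handles the decoding loop
print(tokenizer.decode(ids, skip_special_tokens=True))
```

These are exactly the pieces (preprocessing, the decoding loop, and token-to-text decoding) that have to be reimplemented once the model itself is exported to ONNX, as discussed in the comments above.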