Device allocation issue when running constrained generation #1713
Unanswered · GitHubOfAndrew asked this question in Q&A
Replies: 1 comment
---
Hi @GitHubOfAndrew! We've fixed a bug related to device location in Outlines.
---
For context, I am facing the above issue when supplying text and/or image inputs to a multimodal LLM (Llama 3.2 11B Vision Instruct). I am running this on a Vertex AI Workbench in GCP, configured with a single NVIDIA A100 GPU, using `outlines==1.2.1`. Beyond this, I have looked at discussion #1708, which was seemingly an identical issue; however, the resolution reached there did not work for me (I got a CUDA-level error). It does not seem that they were using multimodal functionality, so I am opening my own issue (please let me know if this is not appropriate).
This is the code snippet that is giving me errors:
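(The snippet block itself failed to load on this page; below is a minimal sketch of roughly the shape of such a call, assuming outlines 1.x's `from_transformers` wrapper and its `outlines.inputs.Image` input type, with a placeholder schema and image path rather than the original code.)

```python
# Minimal sketch (assumed, not the original snippet): multimodal constrained
# generation with outlines 1.x wrapping Llama 3.2 11B Vision Instruct.
import torch
from PIL import Image as PILImage
from pydantic import BaseModel
from transformers import AutoProcessor, MllamaForConditionalGeneration

import outlines
from outlines.inputs import Image

MODEL_ID = "meta-llama/Llama-3.2-11B-Vision-Instruct"

# Wrap the HF model and processor in an outlines multimodal model.
model = outlines.from_transformers(
    MllamaForConditionalGeneration.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    ),
    AutoProcessor.from_pretrained(MODEL_ID),
)

class Caption(BaseModel):
    # Placeholder output schema for JSON-constrained generation.
    description: str

image = PILImage.open("example.jpg")  # placeholder image

# Constrained (JSON-schema) generation over a text+image prompt; this is
# the call that raises the device error.
result = model(
    ["<|image|>Describe this image as JSON.", Image(image)],
    Caption,
    max_new_tokens=128,
)
print(result)
```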
The exception I receive is a device allocation error raised during generation.
This is not just an issue with images; text-only inputs to this model hit the same error. I think this may be a bug in the `TransformersMultiModal` class during the logit-masking step. I have verified that unconstrained inference and the rest of the pipeline work; only inference with constrained generation fails. I would appreciate any pointers.
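Concretely, the contrast between the paths looks like this (a sketch reusing the placeholder objects `model`, `Caption`, and `image` from above, not verified output):

```python
# Unconstrained generation over the same multimodal prompt succeeds:
unconstrained = model(
    ["<|image|>Describe this image.", Image(image)], max_new_tokens=128
)

# The same call with an output type attached (constrained generation) is
# the only path that fails, raising the device allocation error:
constrained = model(
    ["<|image|>Describe this image as JSON.", Image(image)],
    Caption,
    max_new_tokens=128,
)
```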
Edit: I've tried the simplest example in this documentation, and it also returns wonky outputs that are not valid JSON. Has the multimodal constrained generation capability of outlines been validated? Are there any notebooks or scripts that can reproduce consistently clean JSON?