Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add transformers vision cookbook with atomic caption flow #1216

Merged
merged 3 commits into from
Oct 21, 2024

Conversation

fearnworks
Copy link
Contributor

Request received in discord to add an example for the new transformers vision capability.

Vision-Language Models with Outlines

This guide demonstrates how to use Outlines with vision-language models, leveraging the new transformers_vision module. Vision-language models can process both text and images, allowing for tasks like image captioning, visual question answering, and more.

We will be using the Pixtral-12B model from Mistral to take advantage of some of its visual reasoning capabilities and a workflow to generate a multistage atomic caption.

@rlouf
Copy link
Member

rlouf commented Oct 20, 2024

It's awesome! We'll need to link to it from mkdocs.yml and from the cookbooks' index page :)

@fearnworks
Copy link
Contributor Author

fearnworks commented Oct 20, 2024

It's awesome! We'll need to link to it from mkdocs.yml and from the cookbooks' index page :)

Updated!

@rlouf rlouf changed the title transformers vision cookbook with atomic caption flow Add transformers vision cookbook with atomic caption flow Oct 21, 2024
@rlouf rlouf merged commit a2fa1e0 into dottxt-ai:main Oct 21, 2024
6 checks passed
@rlouf
Copy link
Member

rlouf commented Oct 21, 2024

Thank you so much for your contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants