Separate LLMClient for Vision #430

Closed
reustle opened this issue Jan 24, 2025 · 2 comments

Comments

@reustle

reustle commented Jan 24, 2025

#352 is a fantastic addition. I'm currently using deepseek-r1 via Ollama to run everything locally / offline. That said, afaik ds-r1 doesn't support vision.

I'd like to pass two LLMClients when initializing Stagehand: one for vision, and the other for everything else. This would allow me to try out something like LLaVA locally.

My guess is that the interface would look something like:

  const stagehand = new Stagehand({
    ...StagehandConfig,
    llmClient: new OllamaClient({
      modelName: "MFDoom/deepseek-r1-tool-calling:14b",
    }),
    llmVisionClient: new OllamaClient({
      modelName: "llava:13b",
    }),
  });
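
A minimal sketch of how routing between the two clients might work, assuming a fallback to the default client when no vision client is configured. The `ChatClient` and `ClientOptions` types and the `pickClient` helper are hypothetical names used for illustration only; they are not Stagehand's actual `LLMClient` API, and `llmVisionClient` is the option proposed above, not an existing one.

```typescript
// Hypothetical minimal client shape; NOT Stagehand's actual LLMClient class.
interface ChatClient {
  modelName: string;
}

// Hypothetical options mirroring the two constructor fields proposed above.
interface ClientOptions {
  llmClient: ChatClient;
  llmVisionClient?: ChatClient;
}

// Use the vision client only when the request carries an image and a
// vision client was configured; otherwise fall back to the default client.
function pickClient(options: ClientOptions, hasImage: boolean): ChatClient {
  if (hasImage && options.llmVisionClient !== undefined) {
    return options.llmVisionClient;
  }
  return options.llmClient;
}

const options: ClientOptions = {
  llmClient: { modelName: "MFDoom/deepseek-r1-tool-calling:14b" },
  llmVisionClient: { modelName: "llava:13b" },
};

console.log(pickClient(options, true).modelName);
console.log(pickClient(options, false).modelName);
```

Keeping `llmVisionClient` optional would mean existing single-client setups keep working unchanged.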

Has this been discussed internally? Is vision going to be kept longer term in Stagehand? Would a PR for this be welcome?

Thanks! 🤙

(related: #184)

@seanmcguire12
Collaborator

Hey! Thanks for the snippet and the detailed issue description. I love this idea!

Vision is definitely in the long-term plans for Stagehand. We are actually in the process of removing the current implementation and completely revamping it with a different approach, which we hope to add in the very near future.

I think after we get that in, a PR for this issue would be greatly appreciated. Gonna share this with the team and get their thoughts.

@reustle
Author

reustle commented Jan 24, 2025

Ah, I see the fresh PR on this, sounds good. I think it's best to close this for now and revisit once vision v2 comes along. Thanks Sean.

@reustle reustle closed this as completed Jan 24, 2025

2 participants