Replies: 4 comments 1 reply
-
I have some idea about the UI. Chat elements of Streamlit might be a good place to start. It allows us to focus on Python instead of HTML/CSS/JS. The following elements available out of the box may also be helpful for Multimodal use cases. gradio (oobabooga is a gradio web UI) seems to be a potential option as well, Streamlit seems to be more flexible: Hugging Face Acquires Gradio |
Beta Was this translation helpful? Give feedback.
-
Wow from a quick look I'm almost already sold on Streamlit. I'll dive into that as one option for sure. I'm really glad you made that suggestion. I agree that we don't want to waste much time on the UI. At the same time I think it would be a good thing for this project to have a more useful application built, so that people have something better to look at and tinker with right out of the box, so I'm ready to put in some effort, just want to be careful not to get carried away. In general I like this idea of picking a UI framework for the app rather than keeping it barebones the way it is now. |
Beta Was this translation helpful? Give feedback.
-
Code Interpreter might be an interesting use case: let the Agent draw data charts. |
Beta Was this translation helpful? Give feedback.
-
Just mentioning that I added an issue to create a new "starter" application to replace the demo. That'll be a first step towards enabling multimodal features. I'm eager to get something better built but I'll need some time to experiment. I'll post updates as soon as I have them. |
Beta Was this translation helpful? Give feedback.
-
Hello everyone!
Now that kombu is in place for messaging, I'm excited to get started on exploring multimodal support!
Here's the thing though - What does that even mean?
I have some ideas for the use cases I'd like to ensure are supported. For example:
I think these cover a lot of ground between basic image and audio based use cases. I'd like to hear what you all have in mind regarding multimedia support so that I make sure to address the use cases you want to implement.
I didn't include video above because I don't yet have an idea of how I'd like that to work. If you're interested in video support please let me know what you're thinking!
Another thing that should be noted: Multimodal support implies UI work for the web app. I'm not sure how the UI will have to change. A chat interface is not all there is! I can't say that I'll have the time to implement a killer UI on the demo application, but I want to make sure that the demo is proof enough that many use cases are possible.
So if the above use cases are missing something or if you'd like to add some thoughts on what you'd like to see as we move into multimedia, please let me know!
Beta Was this translation helpful? Give feedback.
All reactions