[WIP] Image Support (OpenAI multipart messages) #99
base: main
Conversation
I can't accept this in its current form. It looks very overloaded.
- The purpose of the preprocessing is unclear: the context already arrives in OpenAI format, so simply forwarding it should be enough.
- The moves are unclear. Why do we need to download/convert the images at all? Do any models actually require this logic?
- It also looks like the synchronous httpx client was used.
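If the synchronous client really is the problem, the fix is to keep the event loop unblocked while downloading. A minimal dependency-free sketch of that idea (not the PR's code; with httpx available you would reach for `httpx.AsyncClient` instead), also mirroring the PR's 5 MB cap:

```python
import asyncio

MAX_IMAGE_BYTES = 5 * 1024 * 1024  # mirrors the PR's 5 MB cap

async def download_image(url: str, fetch=None) -> bytes:
    """Download an image without blocking the event loop.

    `fetch` is injectable for testing; the default offloads a blocking
    stdlib urllib call to a worker thread via asyncio.to_thread.
    All names here are illustrative, not the PR's actual helpers.
    """
    if fetch is None:
        from urllib.request import urlopen

        def fetch(u):
            # Read one byte past the cap so oversized images are detected.
            with urlopen(u) as resp:
                return resp.read(MAX_IMAGE_BYTES + 1)

    data = await asyncio.to_thread(fetch, url)
    if len(data) > MAX_IMAGE_BYTES:
        raise ValueError("image exceeds the 5 MB limit")
    return data
```

The same shape works with `httpx.AsyncClient` by awaiting the request directly instead of offloading to a thread.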
user_message = _get_last_user_message(request.messages)
user_content = _preprocess_message_content(user_message.content, user_message.images)
I'd suggest using the ready-made OpenAI type ChatCompletionMessageParam. That would remove the need to preprocess it.
And as a natural continuation of that rework: take the entire incoming context, not just the last message.
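A sketch of that suggestion: accept OpenAI-shaped messages (the `ChatCompletionMessageParam` shape, modeled here as plain dicts to stay dependency-free) and forward the whole context after validation, with no per-message preprocessing. Names are illustrative, not the PR's code:

```python
VALID_ROLES = {"system", "user", "assistant", "tool"}

def validate_openai_messages(messages: list[dict]) -> list[dict]:
    """Validate OpenAI-format messages without transforming them.

    The whole history (not just the last user turn) is forwarded as-is;
    content may be a plain string or a list of text/image_url parts.
    """
    for i, m in enumerate(messages):
        if m.get("role") not in VALID_ROLES:
            raise ValueError(f"message {i}: bad role {m.get('role')!r}")
        content = m.get("content")
        if isinstance(content, str):
            continue
        if isinstance(content, list):
            for part in content:
                if part.get("type") not in {"text", "image_url"}:
                    raise ValueError(
                        f"message {i}: unsupported part {part.get('type')!r}"
                    )
            continue
        raise ValueError(f"message {i}: content must be str or a list of parts")
    return messages
```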
        return message.content
    # Combine text parts into a single string for task extraction
    if isinstance(message.content, str):
        return message.content
    if isinstance(message.content, list):
        text_parts = [p.get("text") for p in message.content if isinstance(p, dict) and p.get("type") == "text"]
        return " ".join(filter(None, text_parts)) or "Image-only request"
This looks odd. What is it for? Combining such content into a single string would break the downstream transfer protocol.
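For contrast, the normalization that `core/base_agent.py` is described as doing goes the other way: it wraps bare strings into parts rather than collapsing parts into a string, so multipart content survives intact. A hedged sketch (not the PR's actual code):

```python
def normalize_content(content):
    """Ensure message content is in OpenAI part form.

    Plain strings are wrapped into a single text part; lists of parts
    (including image_url parts) are passed through unchanged, so nothing
    multimodal is flattened. Function name is illustrative.
    """
    if isinstance(content, str):
        return [{"type": "text", "text": content}]
    return content
```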
Summary
`image_url` parts with base64 payloads (LM Studio, llama.cpp server, vLLM, or any OpenAI-style endpoint that accepts images).

What changed
- `core/utils/images.py`: new helper that normalizes image references, downloads HTTP(S) images (<= 5 MB), infers the mime type, and emits the `data:<mime>;base64,...` URLs required by vision backends.
- `api/models.py`: `ChatMessage.content` now accepts either text or content parts; the optional `images` field is mapped into `image_url` parts.
- `api/endpoints.py`: user-message preprocessing converts `content` + `images` into OpenAI-format parts and seeds the agent conversation with the multimodal user turn. Task extraction still derives text from the user parts.
- `core/base_agent.py`: message normalization wraps plain strings into content parts to keep OpenAI streaming calls happy with mixed modalities.

How it works (data flow)
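Before the step-by-step flow, the `data:<mime>;base64,...` emission described for `core/utils/images.py` can be sketched as follows (the helper name follows the PR description; the actual signature may differ):

```python
import base64
import mimetypes

MAX_IMAGE_BYTES = 5 * 1024 * 1024  # mirrors the PR's 5 MB cap

def to_image_part(data: bytes, filename: str = "image.png") -> dict:
    """Build an OpenAI-style image_url content part from raw bytes.

    Infers the mime type from the filename, base64-encodes the bytes,
    and emits the data URL form that vision backends expect.
    """
    if len(data) > MAX_IMAGE_BYTES:
        raise ValueError("image exceeds the 5 MB limit")
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    b64 = base64.b64encode(data).decode("ascii")
    return {"type": "image_url",
            "image_url": {"url": f"data:{mime};base64,{b64}"}}
```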
- `/v1/chat/completions` with `messages[*].content` (string or parts) and/or `messages[*].images` (paths/URLs/base64/data URLs).
- `to_image_part`, which base64-encodes local files or downloaded URLs.
- `_prepare_context` normalizes all content into OpenAI parts before calling `chat.completions.stream`.

Notes for vision backends
- `image_url.url`; handled automatically by `to_image_part`.
- `image_url` base64 form.

Manual check (suggested)
- `/v1/chat/completions` with a user message containing `images: ["./cat.png"]` or an HTTPS image URL; expect the model to receive base64 in `image_url.url`.

The only thing that confuses me about this code is that a separate user message is added. ChatGPT Codex says this could be changed to insert everything into `initial_user_request`, but that would change the current UX.
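The suggested manual check can be scripted. A hedged sketch that builds the request body using the PR's optional `images` field (the model name and prompt are placeholders, not values from the PR):

```python
import json

def manual_check_body(image_ref: str,
                      prompt: str = "What is in this image?") -> str:
    """Build a /v1/chat/completions request body for the manual check.

    `image_ref` may be a local path like "./cat.png" or an HTTPS URL;
    the server side is expected to expand it into an image_url part.
    """
    body = {
        "model": "local-model",  # placeholder model name
        "messages": [
            {"role": "user", "content": prompt, "images": [image_ref]},
        ],
        "stream": True,
    }
    return json.dumps(body)
```

POST the returned JSON to the endpoint and confirm the backend receives a base64 data URL in `image_url.url`.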