
Conversation

@hijera (Contributor) commented Dec 6, 2025

Summary

  • Added multimodal support for user messages: text plus image parts.
  • Images can be provided as local file paths, HTTP/HTTPS URLs, data URLs, or raw base64. All forms are converted to OpenAI-compatible image_url parts with base64 payloads, so they work with LM Studio, the llama.cpp server, vLLM, or any OpenAI-style endpoint that accepts images.
  • Conversation normalization now preserves multimodal content through the agent pipeline without altering tool/streaming logic.

What changed

  • core/utils/images.py: new helper that normalizes image references, downloads HTTP(S) images (<=5MB), infers the MIME type, and emits the data:<mime>;base64,... URLs required by vision backends (sketched below).
  • api/models.py: ChatMessage.content now accepts either text or a list of content parts; the optional images field is mapped into image_url parts (see the model sketch below).
  • api/endpoints.py: user message preprocessing converts content + images into OpenAI-format parts and seeds the agent conversation with the multimodal user turn. Task extraction still derives text from the user parts.
  • core/base_agent.py: message normalization wraps plain strings into content parts to keep OpenAI streaming calls happy with mixed modalities.
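
A minimal sketch of the helper's flow, for review purposes. Only to_image_part, the data-URL output, and the 5MB cap come from this PR; every other name and the exact control flow here are assumptions:

```python
# Hypothetical sketch of core/utils/images.py; not the actual implementation.
import base64
import mimetypes
from pathlib import Path

import httpx

MAX_REMOTE_BYTES = 5 * 1024 * 1024  # 5MB cap on downloaded images

def to_image_part(ref: str) -> dict:
    """Normalize an image reference into an OpenAI-style image_url part."""
    if ref.startswith("data:"):
        url = ref  # already a data URL; pass through unchanged
    elif ref.startswith(("http://", "https://")):
        resp = httpx.get(ref, follow_redirects=True)
        resp.raise_for_status()
        if len(resp.content) > MAX_REMOTE_BYTES:
            raise ValueError(f"Remote image exceeds 5MB: {ref}")
        mime = resp.headers.get("content-type", "image/png").split(";")[0]
        url = f"data:{mime};base64,{base64.b64encode(resp.content).decode()}"
    elif Path(ref).is_file():
        mime = mimetypes.guess_type(ref)[0] or "image/png"
        url = f"data:{mime};base64,{base64.b64encode(Path(ref).read_bytes()).decode()}"
    else:
        # Treat anything else as raw base64 without a prefix; default the MIME type
        url = f"data:image/png;base64,{ref}"
    return {"type": "image_url", "image_url": {"url": url}}
```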
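
And a rough sketch of the api/models.py change, assuming the request models are Pydantic (field names follow the description above; exact types and defaults are guesses):

```python
# Hypothetical sketch of the ChatMessage change in api/models.py.
from typing import Optional, Union

from pydantic import BaseModel

class ChatMessage(BaseModel):
    role: str
    content: Union[str, list[dict]]      # plain text or OpenAI-style content parts
    images: Optional[list[str]] = None   # paths/URLs/base64; mapped to image_url parts
```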

How it works (data flow)

  1. Client sends /v1/chat/completions with messages[*].content (string or parts) and/or messages[*].images (paths/URLs/base64/data URLs); an example request follows this list.
  2. Endpoint builds a multimodal user message; images are run through to_image_part, which base64-encodes local files or downloaded URLs.
  3. The agent conversation includes this prepared message; _prepare_context normalizes all content into OpenAI parts before calling chat.completions.stream.
  4. Downstream agents/tools are unchanged; streaming remains intact.
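
For illustration, a request that exercises this flow (the port, model name, and response handling are placeholders, not part of the PR):

```python
# Example client call; assumes the server is running locally on port 8000.
import httpx

payload = {
    "model": "local-model",
    "stream": False,
    "messages": [
        {
            "role": "user",
            "content": "What is in this picture?",
            "images": ["./cat.png"],  # converted server-side into a base64 data URL
        }
    ],
}
resp = httpx.post("http://localhost:8000/v1/chat/completions", json=payload, timeout=60)
print(resp.json()["choices"][0]["message"]["content"])
```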

Notes for vision backends

  • LM Studio: requires base64 in image_url.url; handled automatically by to_image_part.
  • llama.cpp server / vLLM with OpenAI-compatible vision: should accept the same image_url base64 form.
  • Remote HTTP images are downloaded and size-limited to 5MB; oversize inputs fail fast with a clear error.
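
Concretely, every backend listed above receives parts of this shape (payload truncated):

```python
# The image_url part form emitted by to_image_part.
{
    "type": "image_url",
    "image_url": {"url": "data:image/png;base64,iVBORw0KGg..."},
}
```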

Manual check (suggested)

  • Call /v1/chat/completions with a user message containing images: ["./cat.png"] or an HTTPS image URL; expect the model to receive base64 in image_url.url.

The only thing that still confuses me about this code is that a separate user message is added. ChatGPT Codex says everything could instead be folded into initial_user_request, but that would change the current UX.

@virrius (Member) left a comment


I can't accept this in its current form. It looks very overloaded.

  1. The purpose of the preprocessing is unclear: if the context already arrives in OpenAI format, forwarding it as-is should be enough.
  2. The moves around images are unclear: why do we need to download/convert them? Is this logic actually required for some models?
  3. It looks like the synchronous httpx client was used.

Comment on lines +176 to +177

```python
user_message = _get_last_user_message(request.messages)
user_content = _preprocess_message_content(user_message.content, user_message.images)
```
@virrius (Member):

I'd suggest using the ready-made OpenAI type ChatCompletionMessageParam. That would remove the need to preprocess it.

And as a natural continuation of that rework: take not only the last message, but the entire context that the client sends.
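
For illustration, the suggestion could look roughly like this (a sketch assuming the official openai package is already a dependency; the function name is invented):

```python
# Sketch: accept OpenAI's own message type and forward the full context.
from openai import OpenAI
from openai.types.chat import ChatCompletionMessageParam

def seed_conversation(client: OpenAI, model: str, messages: list[ChatCompletionMessageParam]):
    # No preprocessing step: the incoming OpenAI-format context is forwarded
    # unchanged, and the whole history is used rather than only the last user turn.
    return client.chat.completions.create(model=model, messages=messages, stream=True)
```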

Comment on lines -81 to +114

```diff
-    return message.content
+    # Combine text parts into a single string for task extraction
+    if isinstance(message.content, str):
+        return message.content
+    if isinstance(message.content, list):
+        text_parts = [p.get("text") for p in message.content if isinstance(p, dict) and p.get("type") == "text"]
+        return " ".join(filter(None, text_parts)) or "Image-only request"
```
@virrius (Member) commented Dec 10, 2025

This looks strange. What is it for? Joining this kind of content into a single string will break the downstream transfer protocol.
