Tags: LLM, AGENTS, CHAT

Related spaces: https://huggingface.co/spaces/gradio/agent_chatbot, https://huggingface.co/spaces/gradio/langchain-agent

The Gradio Chatbot can natively display intermediate thoughts and tool usage. This makes it perfect for creating UIs for LLM agents and chain-of-thought (CoT) demos. This guide will show you how.
## The metadata key
In addition to the `content` and `role` keys, the messages dictionary accepts a `metadata` key. At present, the `metadata` key accepts a dictionary with a single key called `title`.
If you specify a `title` for the message, it will be displayed in a collapsible box.
Here is an example, where we display the agent's thought to use a weather API tool to answer the user query.
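A minimal sketch of such a message, using the plain-dictionary form described above (the weather question and tool title are illustrative):

```python
messages = [
    {"role": "user", "content": "What is the weather in San Francisco?"},
    {
        "role": "assistant",
        "content": "I should use the Weather API tool to answer this question.",
        # The `title` key renders this message inside a collapsible box
        "metadata": {"title": "🛠️ Used tool Weather API"},
    },
]
```

Passing a list like this as the `value` of a `gr.Chatbot(type="messages")` component displays the second message under a collapsible "Used tool Weather API" header.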
### A real example using langchain agents

We'll create a UI for a langchain agent that has access to a search engine.
We'll begin with imports and setting up the langchain agent. Note that you'll need a `.env` file with the following environment variables set:
```
SERPAPI_API_KEY=
```
That's it! See our finished langchain demo [here](https://huggingface.co/spaces/gradio/langchain-agent).
## Building with Visibly Thinking LLMs
The Gradio Chatbot can natively display the intermediate thoughts of a _thinking_ LLM. This makes it perfect for creating UIs that show how an AI model "thinks" while generating responses. The guide below will show you how to build a chatbot that displays Gemini AI's thought process in real-time.
### A real example using Gemini 2.0 Flash Thinking API
Let's create a complete chatbot that shows its thoughts and responses in real-time. We'll use Google's Gemini API for accessing Gemini 2.0 Flash Thinking LLM and Gradio for the UI.
We'll begin with imports and setting up the Gemini client. Note that you'll need to [acquire a Google Gemini API key](https://aistudio.google.com/apikey) first:

```python
import gradio as gr
from gradio import ChatMessage
from typing import Iterator
import google.generativeai as genai

genai.configure(api_key="your-gemini-api-key")
model = genai.GenerativeModel("gemini-2.0-flash-thinking-exp-1219")
```
First, let's set up our streaming function that handles the model's output:

```python
def stream_gemini_response(user_message: str, messages: list) -> Iterator[list]:
    """Streams the model's thoughts, then its final response, into the chat."""
    # Start a streaming generation request
    response = model.generate_content(user_message, stream=True)

    # Buffers and a flag to separate thoughts from the final answer
    thought_buffer = ""
    response_buffer = ""
    thinking_complete = False

    # Add an initial, empty "thinking" message
    messages.append(
        ChatMessage(
            role="assistant",
            content="",
            metadata={"title": "⏳Thinking: *The thoughts produced by the Gemini2.0 Flash model are experimental"}
        )
    )

    for chunk in response:
        parts = chunk.candidates[0].content.parts
        current_chunk = parts[0].text

        if len(parts) == 2 and not thinking_complete:
            # Complete thought and start response
            thought_buffer += current_chunk
            messages[-1] = ChatMessage(
                role="assistant",
                content=thought_buffer,
                metadata={"title": "⏳Thinking: *The thoughts produced by the Gemini2.0 Flash model are experimental"}
            )

            # Add response message
            response_buffer = parts[1].text
            messages.append(
                ChatMessage(
                    role="assistant",
                    content=response_buffer
                )
            )
            thinking_complete = True

        elif thinking_complete:
            # Continue streaming response
            response_buffer += current_chunk
            messages[-1] = ChatMessage(
                role="assistant",
                content=response_buffer
            )

        else:
            # Continue streaming thoughts
            thought_buffer += current_chunk
            messages[-1] = ChatMessage(
                role="assistant",
                content=thought_buffer,
                metadata={"title": "⏳Thinking: *The thoughts produced by the Gemini2.0 Flash model are experimental"}
            )

        yield messages
```
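The buffer-routing logic above can be exercised without calling the API by faking chunk parts (the `Part` class and sample texts below are stand-ins, not part of the Gemini SDK):

```python
from dataclasses import dataclass

@dataclass
class Part:
    text: str

def route_chunks(chunks):
    """Applies the same routing rule as stream_gemini_response:
    parts[0] extends the current buffer; a second part signals the
    switch from streaming thoughts to streaming the response."""
    thought, response = "", ""
    thinking_complete = False
    for parts in chunks:
        current = parts[0].text
        if len(parts) == 2 and not thinking_complete:
            thought += current          # finish the thought
            response = parts[1].text    # response starts in the second part
            thinking_complete = True
        elif thinking_complete:
            response += current         # keep streaming the response
        else:
            thought += current          # keep streaming the thought
    return thought, response

thought, response = route_chunks([
    [Part("I should check ")],
    [Part("the weather API. "), Part("It is ")],
    [Part("sunny in SF.")],
])
print(thought)   # I should check the weather API. 
print(response)  # It is sunny in SF.
```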
Then, let's create the Gradio interface:
```python
def user_message(msg: str, history: list) -> tuple[str, list]:
    """Adds the user's message to the chat history."""
    history.append(ChatMessage(role="user", content=msg))
    return "", history

with gr.Blocks() as demo:
    gr.Markdown("# Chat with Gemini 2.0 Flash and See its Thoughts 💭")

    chatbot = gr.Chatbot(
        type="messages",
        label="Gemini2.0 'Thinking' Chatbot",
        render_markdown=True,
    )

    input_box = gr.Textbox(
        lines=1,
        label="Chat Message",
        placeholder="Type your message here and press Enter..."
    )

    # Set up event handlers
    msg_store = gr.State("")  # Store for preserving user message

    input_box.submit(
        lambda msg: (msg, msg, ""),  # Store message and clear input
        inputs=[input_box],
        outputs=[msg_store, input_box, input_box],
        queue=False
    ).then(
        user_message,  # Add user message to chat
        inputs=[msg_store, chatbot],
        outputs=[input_box, chatbot],
        queue=False
    ).then(
        stream_gemini_response,  # Generate and stream response
        inputs=[msg_store, chatbot],
        outputs=chatbot
    )

demo.launch()
```
This creates a chatbot that:
- Displays the model's thoughts in a collapsible section
- Streams the thoughts and final response in real-time
- Maintains a clean chat history
That's it! You now have a chatbot that not only responds to users but also shows its thinking process, creating a more transparent and engaging interaction. See our finished Gemini 2.0 Flash Thinking demo [here](https://huggingface.co/spaces/ysharma/Gemini2-Flash-Thinking).