
Commit 02caf94

Merge branch 'main' into chat-history
2 parents a1e0bd1 + aa2eec2 commit 02caf94

File tree

1 file changed: guides/05_chatbots/03_agents-and-tool-usage.md (+142 −6 lines)
@@ -3,14 +3,14 @@
Tags: LLM, AGENTS, CHAT
Related spaces: https://huggingface.co/spaces/gradio/agent_chatbot, https://huggingface.co/spaces/gradio/langchain-agent

-The Gradio Chatbot can natively display intermediate thoughts and tool usage. This makes it perfect for creating UIs for LLM agents. This guide will show you how.
+The Gradio Chatbot can natively display intermediate thoughts and tool usage. This makes it perfect for creating UIs for LLM agents and chain-of-thought (CoT) demos. This guide will show you how.

## The metadata key

In addition to the `content` and `role` keys, the messages dictionary accepts a `metadata` key. At present, the `metadata` key accepts a dictionary with a single key called `title`.
If you specify a `title` for the message, it will be displayed in a collapsible box.

-Here is an example, were we display the agent's thought to use a weather API tool to answer the user query.
+Here is an example, where we display the agent's thought to use a weather API tool to answer the user query.

```python
with gr.Blocks() as demo:

@@ -24,7 +24,9 @@ with gr.Blocks() as demo:
![simple-metadat-chatbot](https://github.com/freddyaboulton/freddyboulton/assets/41651716/3941783f-6835-4e5e-89a6-03f850d9abde)


-## A real example using transformers.agents
+## Building with Agents
+
+### A real example using transformers.agents

We'll create a Gradio application for a simple agent that has access to a text-to-image tool.
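
The collapsible-title mechanism from "The metadata key" section above can be sketched with plain message dicts (the history content here is illustrative); passed to `gr.Chatbot(type="messages")`, the message carrying a `metadata` title renders in a collapsible box:

```python
# Illustrative chat history for a gr.Chatbot(type="messages") component.
# The assistant message with a `metadata` "title" renders inside a
# collapsible box; the other messages render as normal chat bubbles.
history = [
    {"role": "user", "content": "What is the weather in San Francisco?"},
    {
        "role": "assistant",
        "content": "I need to use the weather API tool",
        "metadata": {"title": "🛠️ Used tool Weather API"},
    },
    {"role": "assistant", "content": "It is 20 degrees Celsius in San Francisco."},
]

# Only messages that carry a metadata title get the collapsible treatment:
titled = [m for m in history if "metadata" in m]
print(len(titled))  # 1
```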


@@ -87,12 +89,11 @@ You can see the full demo code [here](https://huggingface.co/spaces/gradio/agent
![transformers_agent_code](https://github.com/freddyaboulton/freddyboulton/assets/41651716/c8d21336-e0e6-4878-88ea-e6fcfef3552d)


-## A real example using langchain agents
+### A real example using langchain agents

We'll create a UI for a langchain agent that has access to a search engine.

-We'll begin with imports and setting up the langchain agent. Note that you'll need an .env file with
-the following environment variables set -
+We'll begin with imports and setting up the langchain agent. Note that you'll need a `.env` file with the following environment variables set -

```
SERPAPI_API_KEY=

@@ -166,4 +167,139 @@ demo.launch()
That's it! See our finished langchain demo [here](https://huggingface.co/spaces/gradio/langchain-agent).


## Building with Visibly Thinking LLMs

The Gradio Chatbot can natively display the intermediate thoughts of a _thinking_ LLM. This makes it perfect for creating UIs that show how an AI model "thinks" while generating responses. The guide below will show you how to build a chatbot that displays Gemini's thought process in real time.

### A real example using Gemini 2.0 Flash Thinking API

Let's create a complete chatbot that shows its thoughts and responses in real time. We'll use Google's Gemini API to access the Gemini 2.0 Flash Thinking model and Gradio for the UI.

We'll begin with imports and setting up the Gemini client. Note that you'll need to [acquire a Google Gemini API key](https://aistudio.google.com/apikey) first -

```python
import gradio as gr
from gradio import ChatMessage
from typing import Iterator
import google.generativeai as genai

genai.configure(api_key="your-gemini-api-key")
model = genai.GenerativeModel("gemini-2.0-flash-thinking-exp-1219")
```

First, let's set up our streaming function that handles the model's output:

```python
def stream_gemini_response(user_message: str, messages: list) -> Iterator[list]:
    """
    Streams both thoughts and responses from the Gemini model.
    """
    # Initialize response from Gemini
    response = model.generate_content(user_message, stream=True)

    # Initialize buffers
    thought_buffer = ""
    response_buffer = ""
    thinking_complete = False

    # Add initial thinking message
    messages.append(
        ChatMessage(
            role="assistant",
            content="",
            metadata={"title": "⏳Thinking: *The thoughts produced by the Gemini2.0 Flash model are experimental"}
        )
    )

    for chunk in response:
        parts = chunk.candidates[0].content.parts
        current_chunk = parts[0].text

        if len(parts) == 2 and not thinking_complete:
            # Complete thought and start response
            thought_buffer += current_chunk
            messages[-1] = ChatMessage(
                role="assistant",
                content=thought_buffer,
                metadata={"title": "⏳Thinking: *The thoughts produced by the Gemini2.0 Flash model are experimental"}
            )

            # Seed the response buffer with the first response text,
            # then add the response message
            response_buffer = parts[1].text
            messages.append(
                ChatMessage(
                    role="assistant",
                    content=response_buffer
                )
            )
            thinking_complete = True

        elif thinking_complete:
            # Continue streaming response
            response_buffer += current_chunk
            messages[-1] = ChatMessage(
                role="assistant",
                content=response_buffer
            )

        else:
            # Continue streaming thoughts
            thought_buffer += current_chunk
            messages[-1] = ChatMessage(
                role="assistant",
                content=thought_buffer,
                metadata={"title": "⏳Thinking: *The thoughts produced by the Gemini2.0 Flash model are experimental"}
            )

        yield messages
```

Then, let's create the Gradio interface:

```python
with gr.Blocks() as demo:
    gr.Markdown("# Chat with Gemini 2.0 Flash and See its Thoughts 💭")

    chatbot = gr.Chatbot(
        type="messages",
        label="Gemini2.0 'Thinking' Chatbot",
        render_markdown=True,
    )

    input_box = gr.Textbox(
        lines=1,
        label="Chat Message",
        placeholder="Type your message here and press Enter..."
    )

    # Set up event handlers
    msg_store = gr.State("")  # Store for preserving user message

    input_box.submit(
        lambda msg: (msg, msg, ""),  # Store message and clear input
        inputs=[input_box],
        outputs=[msg_store, input_box, input_box],
        queue=False
    ).then(
        user_message,  # Add user message to chat
        inputs=[msg_store, chatbot],
        outputs=[input_box, chatbot],
        queue=False
    ).then(
        stream_gemini_response,  # Generate and stream response
        inputs=[msg_store, chatbot],
        outputs=chatbot
    )

demo.launch()
```

This creates a chatbot that:

- Displays the model's thoughts in a collapsible section
- Streams the thoughts and final response in real-time
- Maintains a clean chat history
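
The thought-then-answer state machine inside `stream_gemini_response` can be exercised without calling the API, using stand-in chunks (all names here are made up for illustration):

```python
from dataclasses import dataclass

@dataclass
class FakePart:
    text: str

# Stand-in for the streamed chunks: thought text arrives as single-part
# chunks; the first two-part chunk marks the switch to the final answer.
chunk_parts = [
    [FakePart("Let me ")],
    [FakePart("reason step by step.")],
    [FakePart(" Done thinking."), FakePart("The answer is 42.")],
    [FakePart(" Really.")],
]

thought_buffer, response_buffer, thinking_complete = "", "", False
for parts in chunk_parts:
    current_chunk = parts[0].text
    if len(parts) == 2 and not thinking_complete:
        # Finish the thought and seed the response
        thought_buffer += current_chunk
        response_buffer = parts[1].text
        thinking_complete = True
    elif thinking_complete:
        # Keep streaming the response
        response_buffer += current_chunk
    else:
        # Keep streaming the thought
        thought_buffer += current_chunk

print(thought_buffer)   # Let me reason step by step. Done thinking.
print(response_buffer)  # The answer is 42. Really.
```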
That's it! You now have a chatbot that not only responds to users but also shows its thinking process, creating a more transparent and engaging interaction. See our finished Gemini 2.0 Flash Thinking demo [here](https://huggingface.co/spaces/ysharma/Gemini2-Flash-Thinking).
