
Commit 02caf94

Merge branch 'main' into chat-history
2 parents a1e0bd1 + aa2eec2 commit 02caf94

File tree

1 file changed: guides/05_chatbots/03_agents-and-tool-usage.md (+142 −6 lines)
@@ -3,14 +3,14 @@
Tags: LLM, AGENTS, CHAT
Related spaces: https://huggingface.co/spaces/gradio/agent_chatbot, https://huggingface.co/spaces/gradio/langchain-agent

-The Gradio Chatbot can natively display intermediate thoughts and tool usage. This makes it perfect for creating UIs for LLM agents. This guide will show you how.
+The Gradio Chatbot can natively display intermediate thoughts and tool usage. This makes it perfect for creating UIs for LLM agents and chain-of-thought (CoT) demos. This guide will show you how.

## The metadata key

In addition to the `content` and `role` keys, the messages dictionary accepts a `metadata` key. At present, the `metadata` key accepts a dictionary with a single key called `title`.
If you specify a `title` for the message, it will be displayed in a collapsible box.

-Here is an example, were we display the agent's thought to use a weather API tool to answer the user query.
+Here is an example, where we display the agent's thought to use a weather API tool to answer the user query.

```python
with gr.Blocks() as demo:

@@ -24,7 +24,9 @@ with gr.Blocks() as demo:
![simple-metadat-chatbot](https://github.com/freddyaboulton/freddyboulton/assets/41651716/3941783f-6835-4e5e-89a6-03f850d9abde)


-## A real example using transformers.agents
+## Building with Agents
+
+### A real example using transformers.agents

We'll create a Gradio application for a simple agent that has access to a text-to-image tool.
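
The collapsible-title mechanism from "The metadata key" section above can be sketched with plain message dicts (the history content here is illustrative); passed to `gr.Chatbot(type="messages")`, the message carrying a `metadata` title renders in a collapsible box:

```python
# Illustrative chat history for a gr.Chatbot(type="messages") component.
# The assistant message with a `metadata` "title" renders inside a
# collapsible box; the other messages render as normal chat bubbles.
history = [
    {"role": "user", "content": "What is the weather in San Francisco?"},
    {
        "role": "assistant",
        "content": "I need to use the weather API tool",
        "metadata": {"title": "🛠️ Used tool Weather API"},
    },
    {"role": "assistant", "content": "It is 20 degrees Celsius in San Francisco."},
]

# Only messages that carry a metadata title get the collapsible treatment:
titled = [m for m in history if "metadata" in m]
print(len(titled))  # 1
```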


@@ -87,12 +89,11 @@ You can see the full demo code [here](https://huggingface.co/spaces/gradio/agent
![transformers_agent_code](https://github.com/freddyaboulton/freddyboulton/assets/41651716/c8d21336-e0e6-4878-88ea-e6fcfef3552d)


-## A real example using langchain agents
+### A real example using langchain agents

We'll create a UI for a langchain agent that has access to a search engine.

-We'll begin with imports and setting up the langchain agent. Note that you'll need an .env file with
-the following environment variables set -
+We'll begin with imports and setting up the langchain agent. Note that you'll need a `.env` file with the following environment variables set -

```
SERPAPI_API_KEY=

@@ -166,4 +167,139 @@ demo.launch()
That's it! See our finished langchain demo [here](https://huggingface.co/spaces/gradio/langchain-agent).


## Building with Visibly Thinking LLMs

The Gradio Chatbot can natively display the intermediate thoughts of a _thinking_ LLM. This makes it perfect for creating UIs that show how an AI model "thinks" while generating responses. The guide below will show you how to build a chatbot that displays Gemini's thought process in real time.

### A real example using Gemini 2.0 Flash Thinking API

Let's create a complete chatbot that shows its thoughts and responses in real time. We'll use Google's Gemini API to access the Gemini 2.0 Flash Thinking model and Gradio for the UI.

We'll begin with imports and setting up the Gemini client. Note that you'll need to [acquire a Google Gemini API key](https://aistudio.google.com/apikey) first -

```python
import gradio as gr
from gradio import ChatMessage
from typing import Iterator
import google.generativeai as genai

genai.configure(api_key="your-gemini-api-key")
model = genai.GenerativeModel("gemini-2.0-flash-thinking-exp-1219")
```

First, let's set up our streaming function that handles the model's output:

```python
def stream_gemini_response(user_message: str, messages: list) -> Iterator[list]:
    """
    Streams both thoughts and responses from the Gemini model.
    """
    # Initialize response from Gemini
    response = model.generate_content(user_message, stream=True)

    # Initialize buffers
    thought_buffer = ""
    response_buffer = ""
    thinking_complete = False

    # Add initial thinking message
    messages.append(
        ChatMessage(
            role="assistant",
            content="",
            metadata={"title": "⏳Thinking: *The thoughts produced by the Gemini2.0 Flash model are experimental"}
        )
    )

    for chunk in response:
        parts = chunk.candidates[0].content.parts
        current_chunk = parts[0].text

        if len(parts) == 2 and not thinking_complete:
            # Complete thought and start response
            thought_buffer += current_chunk
            messages[-1] = ChatMessage(
                role="assistant",
                content=thought_buffer,
                metadata={"title": "⏳Thinking: *The thoughts produced by the Gemini2.0 Flash model are experimental"}
            )

            # Seed the response buffer with the first response text,
            # then add the response message
            response_buffer = parts[1].text
            messages.append(
                ChatMessage(
                    role="assistant",
                    content=response_buffer
                )
            )
            thinking_complete = True

        elif thinking_complete:
            # Continue streaming response
            response_buffer += current_chunk
            messages[-1] = ChatMessage(
                role="assistant",
                content=response_buffer
            )

        else:
            # Continue streaming thoughts
            thought_buffer += current_chunk
            messages[-1] = ChatMessage(
                role="assistant",
                content=thought_buffer,
                metadata={"title": "⏳Thinking: *The thoughts produced by the Gemini2.0 Flash model are experimental"}
            )

        yield messages
```

Then, let's create the Gradio interface:

```python
with gr.Blocks() as demo:
    gr.Markdown("# Chat with Gemini 2.0 Flash and See its Thoughts 💭")

    chatbot = gr.Chatbot(
        type="messages",
        label="Gemini2.0 'Thinking' Chatbot",
        render_markdown=True,
    )

    input_box = gr.Textbox(
        lines=1,
        label="Chat Message",
        placeholder="Type your message here and press Enter..."
    )

    # Set up event handlers
    msg_store = gr.State("")  # Store for preserving user message

    input_box.submit(
        lambda msg: (msg, msg, ""),  # Store message and clear input
        inputs=[input_box],
        outputs=[msg_store, input_box, input_box],
        queue=False
    ).then(
        user_message,  # Add user message to chat
        inputs=[msg_store, chatbot],
        outputs=[input_box, chatbot],
        queue=False
    ).then(
        stream_gemini_response,  # Generate and stream response
        inputs=[msg_store, chatbot],
        outputs=chatbot
    )

demo.launch()
```

This creates a chatbot that:

- Displays the model's thoughts in a collapsible section
- Streams the thoughts and final response in real-time
- Maintains a clean chat history
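
The thought-then-answer state machine inside `stream_gemini_response` can be exercised without calling the API, using stand-in chunks (all names here are made up for illustration):

```python
from dataclasses import dataclass

@dataclass
class FakePart:
    text: str

# Stand-in for the streamed chunks: thought text arrives as single-part
# chunks; the first two-part chunk marks the switch to the final answer.
chunk_parts = [
    [FakePart("Let me ")],
    [FakePart("reason step by step.")],
    [FakePart(" Done thinking."), FakePart("The answer is 42.")],
    [FakePart(" Really.")],
]

thought_buffer, response_buffer, thinking_complete = "", "", False
for parts in chunk_parts:
    current_chunk = parts[0].text
    if len(parts) == 2 and not thinking_complete:
        # Finish the thought and seed the response
        thought_buffer += current_chunk
        response_buffer = parts[1].text
        thinking_complete = True
    elif thinking_complete:
        # Keep streaming the response
        response_buffer += current_chunk
    else:
        # Keep streaming the thought
        thought_buffer += current_chunk

print(thought_buffer)   # Let me reason step by step. Done thinking.
print(response_buffer)  # The answer is 42. Really.
```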
That's it! You now have a chatbot that not only responds to users but also shows its thinking process, creating a more transparent and engaging interaction. See our finished Gemini 2.0 Flash Thinking demo [here](https://huggingface.co/spaces/ysharma/Gemini2-Flash-Thinking).
