mbodied agents v2 (#23)
nqyy authored Jun 16, 2024
1 parent 7a37ce4 commit 8218d28
Showing 46 changed files with 906 additions and 2,423 deletions.
35 changes: 25 additions & 10 deletions README.md
@@ -5,10 +5,12 @@
[![Ubuntu](https://github.com/MbodiAI/opensource/actions/workflows/ubuntu.yml/badge.svg)](https://github.com/MbodiAI/opensource/actions/workflows/ubuntu.yml)
[![PyPI Version](https://img.shields.io/pypi/v/mbodied-agents.svg)](https://pypi.python.org/pypi/mbodied-agents)
[![Documentation Status](https://readthedocs.com/projects/mbodi-ai-mbodied-agents/badge/?version=latest)](https://mbodi-ai-mbodied-agents.readthedocs-hosted.com/en/latest/?badge=latest)
[![Example Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1DAQkuuEYj8demiuJS1_10FIyTI78Yzh4?usp=sharing)


Documentation: [mbodied agents docs](https://mbodi-ai-mbodied-agents.readthedocs-hosted.com/en)

Example colab: [![Example Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/16liQspSIzRazWb_qa_6Z0MRKmMTr2s1s?usp=sharing)

# mbodied agents
Welcome to **mbodied agents**, a toolkit for integrating state-of-the-art transformers into robotics systems. The goal of this repo is to minimize the ambiguity, heterogeneity, and data scarcity currently holding generative AI back from widespread adoption in robotics. It provides strong type hints for the various kinds of robot actions and a unified interface for:

@@ -17,7 +19,7 @@ Welcome to **mbodied agents**, a toolkit for integrating state-of-the-art transf
- Automatically recording observations and actions to HDF5
- Exporting to the most popular ML formats such as [Gym Spaces](https://gymnasium.farama.org/index.html) and [Huggingface Datasets](https://huggingface.co/docs/datasets/en/index)

And most importantly, the entire library is __100% configurable to any observation and action space__. That's right. With **mbodied agents**, the days of wasting precious engineering time on tedious formatting and post-processing are over. Jump to [Getting Started](#getting-started) to get up and running on [real hardware](https://colab.research.google.com/drive/1DAQkuuEYj8demiuJS1_10FIyTI78Yzh4?usp=sharing) or a [mujoco simulation](https://colab.research.google.com/drive/1sZtVLv17g9Lin1O2DyecBItWXwzUVUeH)
And most importantly, the entire library is __100% configurable to any observation and action space__. That's right. With **mbodied agents**, the days of wasting precious engineering time on tedious formatting and post-processing are over. Jump to [Getting Started](#getting-started) to get up and running on [real hardware](https://colab.research.google.com/drive/16liQspSIzRazWb_qa_6Z0MRKmMTr2s1s?usp=sharing) or a [mujoco simulation](https://colab.research.google.com/drive/1sZtVLv17g9Lin1O2DyecBItWXwzUVUeH)
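
To make this concrete, here is a minimal sketch of the unified interface, assuming the module layout shown in the `examples/simple_robot_agent.py` diff further down; the `act` call and its signature are illustrative assumptions, not a confirmed API:

```python
# A minimal sketch, assuming the module layout from examples/simple_robot_agent.py.
from mbodied_agents.agents.language import LanguageAgent
from mbodied_agents.hardware.sim_interface import SimInterface

# Initialize a language-driven agent and a simulated hardware interface.
agent = LanguageAgent(context="You are a robot with vision capabilities.", api_service="openai")
robot = SimInterface()

# Hypothetical call: ask the agent to produce a response for an instruction.
response = agent.act("Wave hello.")
```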


## Updates
@@ -38,18 +40,11 @@ Please join our [Discord](https://discord.gg/RNzf3RCxRJ) for interesting discuss

- [Mbodied Agents](#mbodied-agents)
- [Overview](#overview)
- [Support Matrix](#support-matrix)
- [Installation](#installation)
- [Dev Environment Setup](#dev-environment-setup)
- [Getting Started](#getting-started)
- [Glossary](#glossary)
- [Building Blocks](#building-blocks)
- [The Sample class](#the-sample-class)
- [Message](#message)
- [Backend](#backend)
- [Cognitive Agent](#cognitive-agent)
- [Controls](#controls)
- [Hardware Interface](#hardware-interface)
- [Recorder](#recorder)
- [Directory Structure](#directory-structure)
- [Contributing](#contributing)

@@ -85,6 +80,26 @@ If you would like to integrate a new backend, sense, or motion control, it is ve

`pip install mbodied-agents`

## Dev Environment Setup

1. Clone this repo:

```console
git clone https://github.com/MbodiAI/mbodied-agents.git
```

2. Install system dependencies:

```console
source install.bash
```

3. Then for each new terminal, run:

```console
hatch shell
```
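
As an optional sanity check (an added suggestion, not part of the original setup), confirm the package imports inside the hatch shell:

```console
python -c "import mbodied_agents"
```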

## Getting Started

### Real Robot Hardware
Binary file modified assets/architecture.jpg
700 changes: 0 additions & 700 deletions examples/simple_robot_agent.ipynb

This file was deleted.

33 changes: 18 additions & 15 deletions examples/simple_robot_agent.py
@@ -1,11 +1,11 @@
# Copyright 2024 Mbodi AI
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -18,14 +18,14 @@
import click
from pydantic import BaseModel, Field
from pydantic_core import from_json
from gym import spaces
from gymnasium import spaces

from mbodied_agents.agents.language import CognitiveAgent
from mbodied_agents.agents.sense.audio_handler import AudioHandler
from mbodied_agents.agents.language import LanguageAgent
from mbodied_agents.agents.sense.audio.audio_handler import AudioHandler
from mbodied_agents.base.sample import Sample
from mbodied_agents.hardware.sim_interface import SimInterface
from mbodied_agents.types.controls import HandControl
from mbodied_agents.types.vision import Image
from mbodied_agents.types.sense.vision import Image
from mbodied_agents.data.recording import Recorder


@@ -53,9 +53,9 @@ class AnswerAndActionsList(Sample):
)


# This prompt is used to provide context to the CognitiveAgent.
# This prompt is used to provide context to the LanguageAgent.
SYSTEM_PROMPT = f"""
You are a robot with vision capabilities.
For each task given, you respond in JSON format. Here's the JSON schema:
{AnswerAndActionsList.model_json_schema()}
"""
@@ -69,32 +69,34 @@ def main(backend: str, disable_audio: bool, record_dataset: bool) -> None:
"""Main function to initialize and run the robot interaction.
Args:
backend: The backend to use for the CognitiveAgent (e.g., "openai").
backend: The backend to use for the LanguageAgent (e.g., "openai").
disable_audio: If True, disables audio input/output.
record_dataset: If True, enables recording of the interaction data for training.
Example:
To run the script with OpenAI backend and disable audio:
python script.py --backend openai --disable_audio
"""
# Initialize the intelligent Robot Agent.
robot_agent = CognitiveAgent(context=SYSTEM_PROMPT, api_service=backend)
# Initialize the intelligent Robot Agent with language interface.
robot_agent = LanguageAgent(context=SYSTEM_PROMPT, api_service=backend)

# Use a mock robot interface for movement visualization.
robot_interface = SimInterface()

# Enable or disable audio input/output capabilities.
if disable_audio:
os.environ["NO_AUDIO"] = "1"
audio = AudioHandler(use_pyaudio=False) # Prefer to use use_pyaudio=False for MAC.
# Prefer to use use_pyaudio=False for MAC.
audio = AudioHandler(use_pyaudio=False)

# Data recorder for every conversation and action.
if record_dataset:
observation_space = spaces.Dict({
'image': Image(size=(224, 224)).space(),
'instruction': spaces.Text(1000)
})
action_space = AnswerAndActionsList(actions=[HandControl()] * 6).space()
action_space = AnswerAndActionsList(
actions=[HandControl()] * 6).space()
recorder = Recorder(
'example_recorder',
out_dir='saved_datasets',
@@ -116,7 +118,8 @@ def main(backend: str, disable_audio: bool, record_dataset: bool) -> None:
print("Response:", response)

# Validate the response to the pydantic object.
answer_actions = AnswerAndActionsList.model_validate(from_json(response))
answer_actions = AnswerAndActionsList.model_validate(
from_json(response))

# Let the robot speak.
if answer_actions.answer:
