ROS2 package for deploying and fine-tuning multi-modal generalist agent models. This package provides inference servers, exposed as ROS2 action servers, for the most popular generalist multimodal robotics models (see Available Models below). It depends on robo_transformers to access and run inference with these models. It currently also depends on dgl_ros for the agent servers, though a refactor is needed to remove that dependency.
- ✅ Ubuntu 22.04 + ROS2 Humble
| Model Type | Variants | Observation Space | Action Space | Author |
|---|---|---|---|---|
| RT-1 | rt1main, rt1multirobot, rt1simreal | text + head camera | end effector pose delta | Google Research, 2022 |
| RT-1-X | rt1x | text + head camera | end effector pose delta | Google Research et al., 2023 |
| Octo | octo-base, octo-small | text + head camera + Optional[wrist camera] | end effector pose delta | Octo Model Team et al., 2023 |
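These models can also be exercised directly from Python via robo_transformers before wiring them into a ROS graph. The snippet below is a hedged sketch, not the definitive API: it assumes robo_transformers exposes an `InferenceServer` keyed by the same `model_type` and `weights_key` values used as ROS parameters later in this README; consult the robo_transformers documentation for the exact interface.

```python
# Hypothetical sketch -- the exact robo_transformers entry point and call
# signature are assumptions; check its README before relying on this.
import numpy as np
from robo_transformers.inference_server import InferenceServer  # assumed module path

# model_type / weights_key mirror the ROS parameters used further below.
agent = InferenceServer(model_type="rt1", weights_key="rt1main")

frame = np.zeros((256, 320, 3), dtype=np.uint8)  # stand-in for a head-camera image
action = agent(instruction="pick coke can", image=frame)
print(action)  # an end effector pose delta, per the action space column above
```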
Follow the installation instructions for dgl_ros.

Clone this repo into the `src` directory of your ROS workspace:

```bash
git clone https://github.com/sebbyjp/ros2_transformers.git
```
Install ROS dependencies:

```bash
rosdep install --from-paths src/ros2_transformers --ignore-src --rosdistro ${ROS_DISTRO} -y
```

Build:

```bash
colcon build --symlink-install --base-paths src/ros2_transformers --cmake-args -DCMAKE_BUILD_TYPE=RelWithDebInfo
```

Source:

```bash
source install/setup.bash
```

Install robo_transformers:

```bash
python3 -m pip install robo-transformers
```
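A quick import check confirms the dependency landed in the same Python environment your ROS nodes use (the PyPI package `robo-transformers` installs the `robo_transformers` module):

```python
# Run with the same python3 that ROS2 uses; an ImportError here means the
# pip install above targeted a different environment.
import robo_transformers
print("robo_transformers imported from", robo_transformers.__file__)
```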
In a terminal, run the demo app:

```bash
ros2 launch ros2_transformers task_launch.py
```

You can change the task with the `task_name` parameter in `config/app_config.yaml`, and change which objects are spawned for a task by editing its YAML file in `tasks/`.
In another terminal, run one of the following:

- Octo:

  ```bash
  ros2 run ros2_transformers octo --ros-args -p use_sim_time:=true -p src_topic0:=$YOUR_MAIN_CAMERA_TOPIC -p src_topic1:=$YOUR_WRIST_CAMERA_TOPIC -p action_topic:=vla -p model_type:=octo -p weights_key:=octo-small -p default_instruction:="pick up the coke can off the table"
  ```

- RT-1/X:

  ```bash
  ros2 run ros2_transformers rt1 --ros-args -p use_sim_time:=true -p src_topic0:=$YOUR_MAIN_CAMERA_TOPIC -p action_topic:=vla -p model_type:=rt1 -p weights_key:=rt1x -p default_instruction:="pick coke can"
  ```
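Both servers consume camera frames from the topics given by `src_topic0` (and, for Octo, `src_topic1`). Assuming the subscription type is `sensor_msgs/msg/Image` (verify with `ros2 node info`), a throwaway publisher like the sketch below can smoke-test the pipeline when no camera is running; the topic name and frame size are placeholders.

```python
# Minimal test publisher for the camera topic the agent subscribes to.
# Topic name and frame size are placeholders; match them to src_topic0.
import numpy as np
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image

class DummyCamera(Node):
    def __init__(self):
        super().__init__("dummy_camera")
        self.pub = self.create_publisher(Image, "/head_camera/image_raw", 10)
        self.timer = self.create_timer(0.2, self.tick)  # publish at 5 Hz

    def tick(self):
        h, w = 256, 320
        msg = Image()
        msg.header.stamp = self.get_clock().now().to_msg()
        msg.height, msg.width = h, w
        msg.encoding = "rgb8"
        msg.step = w * 3
        msg.data = np.random.randint(0, 256, (h, w, 3), dtype=np.uint8).tobytes()
        self.pub.publish(msg)

def main():
    rclpy.init()
    rclpy.spin(DummyCamera())

if __name__ == "__main__":
    main()
```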
- `config`: Configuration files for the application, the ros_gz bridge, and the RViz GUI.
- `moveit`: MoveIt configuration files that are not specific to a robot.
- `robots`: Robot-specific files such as URDFs, meshes, ros_controllers configurations, and robot-specific MoveIt configurations.
- `tasks`: Task specifications (location and properties of objects in the environment) in YAML format.
- `sim`: Simulation assets for tasks and Gazebo world files.
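Since task specifications are plain YAML, they are easy to inspect or generate programmatically. For example (the file name below is a hypothetical placeholder; substitute any file under `tasks/`):

```python
# Pretty-print a task specification; the path is a hypothetical example.
import yaml

with open("tasks/pick_coke_can.yaml") as f:
    task = yaml.safe_load(f)
print(yaml.dump(task, sort_keys=False))
```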
- See `src/demo_app.cpp` and `launch/task_launch.py` for an example of how to use this package.
- Agent Inference Server (C++ or Python) (see dgl_ros).
  - See `include/rt1.cpp` and `include/octo.cpp`.
- TensorFlow or PyTorch for inference and training (Python) (see robo_transformers)
- ONNX or OpenVINO for high-performance inference and training (C++)
The following layers are usually part of the user's robot stack, but we include them to support robots out-of-the-box for users who only have actuator driver or ROS control APIs from the manufacturer (note that ROS control makes calls to the driver APIs). These layers are only required because current foundation models for robotics output to action spaces such as the position or velocity of a robot's arms and feet. Once foundation models begin outputting to the input spaces of each of these layers (first joint angles, then motor torques), the layers become redundant; see the sketch after the list below.
- MoveIt Inverse Kinematics (C++)
- Open Motion Planning Library (C++)
- ros2_control (C++)
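To make the gap concrete: the models above emit an end effector pose delta, which must be composed with the current end effector pose and then converted to joint angles by inverse kinematics before ros2_control can track it. The sketch below shows only the composition step, under assumed conventions (delta expressed in the base frame, xyzw quaternions); a real stack would use tf2 and MoveIt for this.

```python
# Compose a model-predicted end-effector pose delta with the current pose.
# Conventions here (base-frame delta, xyzw quaternions) are assumptions for
# illustration; a real stack would use tf2 / MoveIt instead.
import numpy as np
from scipy.spatial.transform import Rotation as R

def apply_delta(position, quat_xyzw, d_position, d_rpy):
    """Return the target pose obtained by applying a translation + RPY delta."""
    new_position = np.asarray(position) + np.asarray(d_position)
    new_rotation = R.from_euler("xyz", d_rpy) * R.from_quat(quat_xyzw)
    return new_position, new_rotation.as_quat()

# Example: current pose plus a small predicted delta.
pos, quat = apply_delta(
    position=[0.4, 0.0, 0.3],          # meters, in the base frame
    quat_xyzw=[0.0, 0.0, 0.0, 1.0],
    d_position=[0.01, 0.0, -0.02],     # meters
    d_rpy=[0.0, 0.0, 0.05],            # radians
)
print(pos, quat)  # target pose to hand to an IK solver such as MoveIt
```

The resulting target pose is what MoveIt's IK turns into joint angles, which OMPL plans through and ros2_control finally executes.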