Skip to content

feat: add ManiSkill3 environment integration#29

Closed
Shr1ftyy wants to merge 6 commits intomainfrom
feat/maniskill-env
Closed

feat: add ManiSkill3 environment integration#29
Shr1ftyy wants to merge 6 commits intomainfrom
feat/maniskill-env

Conversation

@Shr1ftyy
Copy link
Collaborator

@Shr1ftyy Shr1ftyy commented Feb 5, 2026

Summary

Add ManiSkill3 as a new environment family for GPU-accelerated dexterous manipulation tasks using SAPIEN physics engine.

New Environments

  • maniskill3/pick-cube-v1 - Pick up a cube
  • maniskill3/stack-cube-v1 - Stack one cube on another
  • maniskill3/peg-insertion-v1 - Insert peg into hole
  • maniskill3/push-cube-v1 - Push cube to target
  • maniskill3/pull-cube-v1 - Pull cube to target
  • maniskill3/lift-cube-v1 - Lift cube to height
  • maniskill3/turn-faucet-v1 - Turn a faucet handle
  • maniskill3/open-cabinet-door-v1 - Open a cabinet door

Changes

  • Add ManiSkill3Environment class implementing RoboticsEnvironment with canonical observation/action mapping
  • Add Docker container (environments/maniskill3/) with SAPIEN/Vulkan dependencies
  • Register all 8 environments in registry
  • Add maniskill3 to CLI available families

Implementation Details

  • Extracts TCP pose from ManiSkill3's state_dict observation format
  • Converts quaternion from ManiSkill3's wxyz to canonical xyzw format
  • Handles torch tensor returns (batched data with shape [1, ...])
  • Supports both GPU (Vulkan) and CPU rendering fallback

Note on Dependencies

ManiSkill3 dependencies are container-only due to gymnasium version conflict:

  • mani-skill requires gymnasium==0.29.1
  • kinitro requires gymnasium>=1.1

The container has its own requirements.txt with compatible versions.

Test Plan

  • Environment initialization and property access
  • Task generation with procedural randomization
  • Reset with correct TCP pose extraction
  • Step execution with action conversion
  • Camera observation capture (256x256 RGB)
  • Multiple environments tested (pick-cube, push-cube, stack-cube)
  • Registry integration verified

Summary by CodeRabbit

  • New Features

    • Added ManiSkill3 robotic manipulation environment support with 8 manipulation tasks (pick, stack, peg-insertion, push, pull, lift, turn-faucet, open-cabinet)
    • Introduced multi-worktree development infrastructure with isolated configurations and automatic port management
    • Made Docker Compose service ports configurable via environment variables
  • Documentation

    • Added End-to-End testing guide for multi-worktree development setup

Add scripts and documentation for running multiple worktrees in parallel
without service collisions.

- scripts/worktree-env.sh: generates .env and docker-compose.override.yml
  with deterministic port offsets based on worktree name
- scripts/start-services.sh: manages service lifecycle per worktree
- docs/e2e-testing-guide.md: comprehensive E2E testing documentation
- .gitignore: exclude docker-compose.override.yml (generated file)
- docker-compose.yml: Use env variables for ports to avoid merge
  conflicts with override files
- worktree-env.sh: Simplify override file to only set container names
  (ports now come from .env)
- start-services.sh: Fix pkill patterns to reliably stop processes
- AGENTS.md: Fix $DB -> $POSTGRES_DB in troubleshooting command
- worktree-env.sh: Add macOS compatibility (md5 fallback for md5sum)
- start-services.sh: Use bash array for executor command (safer quoting)
- start-services.sh: Implement documented --api-only/--scheduler/--executor flags
- start-services.sh: Validate WORKTREE_NAME after sourcing .env
- start-services.sh: Use _ for unused loop variable
Add ManiSkill3 as a new environment family for GPU-accelerated
dexterous manipulation tasks using SAPIEN physics.

New environments:
- maniskill3/pick-cube-v1
- maniskill3/stack-cube-v1
- maniskill3/peg-insertion-v1
- maniskill3/push-cube-v1
- maniskill3/pull-cube-v1
- maniskill3/lift-cube-v1
- maniskill3/turn-faucet-v1
- maniskill3/open-cabinet-door-v1

Changes:
- Add ManiSkill3Environment class with canonical observation/action mapping
- Add Docker container with SAPIEN/Vulkan dependencies
- Register environments in registry
- Add maniskill3 to CLI available families

Note: ManiSkill3 dependencies are container-only due to gymnasium
version conflict (mani-skill requires 0.29.1, kinitro requires >=1.1)
@coderabbitai
Copy link

coderabbitai bot commented Feb 5, 2026

📝 Walkthrough

Walkthrough

Adds a new ManiSkill3 robotics evaluation environment with eight registered tasks, implements per-worktree development isolation via dynamic port configuration and service lifecycle scripts, and updates docker-compose to use environment-driven port bindings while removing legacy mani-skill2 dependencies.

Changes

Cohort / File(s) Summary
ManiSkill3 Environment Implementation
environments/maniskill3/*, kinitro/environments/maniskill3_env.py
Introduces new ManiSkill3 environment family with Dockerfile, Python Actor for Affinetes integration, dependencies manifest, metadata, and kinitro environment wrapper class supporting task generation, observation canonicalization, action translation, and image capture.
Environment Registry and CLI
kinitro/cli/env/commands.py, kinitro/environments/registry.py
Registers eight new ManiSkill3 task variants (pick-cube, stack-cube, peg-insertion, push-cube, pull-cube, lift-cube, turn-faucet, open-cabinet-door) and adds "maniskill3" to supported environment families.
Per-Worktree Development Infrastructure
scripts/worktree-env.sh, scripts/start-services.sh, .gitignore
Introduces deterministic port offset generation per worktree, service lifecycle management (start/stop/status/logs), and gitignore entry for per-worktree docker-compose overrides.
Docker and Configuration
docker-compose.yml, pyproject.toml
Replaces static host ports with environment variable bindings across all services (Postgres, API, Prometheus, Grafana) and removes legacy mani-skill2 optional dependencies with note on maniskill3 containerization.
Documentation
AGENTS.md
Adds End-to-End Testing sections detailing multi-worktree development, quick start, environment image building, logs, and troubleshooting across two locations.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

Suggested reviewers

  • rayokamoto

Poem

🐰 A new ManiSkill to master,
Per-worktree ports move faster,
Robots twist and gripper squeeze,
While services start with ease.
Infrastructure hops along,
Making multi-dev less wrong! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 73.68% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title accurately summarizes the main change: adding ManiSkill3 as a new environment integration. The title is concise, clear, and directly reflects the primary objective of the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/maniskill-env

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@environments/maniskill3/Dockerfile`:
- Around line 60-65: The Dockerfile currently forces software rendering via the
ENV LIBGL_ALWAYS_SOFTWARE=1 which conflicts with the GPU-accelerated goal;
remove that hardcoded ENV or make it configurable (e.g., replace the fixed ENV
with a build ARG / optional ENV so the value can be left unset or set to 0 in
GPU builds), keep DISPLAY, MESA_GL_VERSION_OVERRIDE and VK_ICD_FILENAMES as-is
to support GL/Vulkan, and ensure any CI or runtime docs are updated to instruct
how to set LIBGL_ALWAYS_SOFTWARE when a software fallback is explicitly desired.

In `@environments/maniskill3/env.py`:
- Around line 152-153: Replace the direct raise of ValueError for the missing
base_url with the same structured error return used elsewhere: call
_build_error_result with an appropriate error message and status (matching how
invalid env_id is handled) instead of raising; update the base_url validation
branch (the check that currently does "if base_url is None") to return
_build_error_result("base_url (miner endpoint) is required", ...) so callers
receive a consistent error result format.

In `@kinitro/environments/maniskill3_env.py`:
- Around line 70-74: The docstring for camera_names is wrong — it claims the
default is ['base_camera', 'hand_camera'] while the code uses self.CAMERA_NAMES
(which is ['base_camera']). Update the docstring in the Maniskill3Env/__init__
(or surrounding class docstring) to either state the actual default
(['base_camera']) or state that it defaults to self.CAMERA_NAMES, and ensure the
symbol camera_names and self.CAMERA_NAMES are referenced so readers know the
source of the default.
- Around line 249-263: _extract_ee_velocity currently treats joint velocities as
Cartesian EE velocity by scaling qvel slices, which is incorrect; change it to
compute EE linear (and angular if available) velocity from the position-delta
used in _build_observation instead: read the current EE position from obs (e.g.,
obs["ee_pos"] or agent obs field used in _build_observation), subtract a stored
previous EE position (e.g., self._prev_ee_pos or obs["prev_ee_pos"]) and divide
by the environment timestep (e.g., self.control_timestep or self.dt) to produce
ee_lin_vel (and similarly for orientation/ee_ang_vel if previous orientation is
available), default to zeros if previous data or dt is missing, and remove the
qvel[:3]/qvel[3:6] 0.1 scaling; if you prefer higher fidelity later, replace
this with a Jacobian-based conversion when ManiSkill3 exposes the Jacobian API.
🧹 Nitpick comments (7)
environments/maniskill3/env.py (1)

43-43: Logger instantiated but never used.

The structlog logger is created but no logging statements exist in the file. Key events like evaluation start/completion, miner call failures, and timeouts would benefit from structured logging for observability and debugging.

kinitro/cli/env/commands.py (1)

57-67: Consider updating docstring examples to include maniskill3.

The docstring's "Environment families" section and examples only mention metaworld and procthor. Adding maniskill3 would keep documentation consistent with the updated AVAILABLE_ENV_FAMILIES constant.

📝 Proposed docstring update
     Environment families:
       - metaworld: MuJoCo-based manipulation tasks (~400MB image)
       - procthor: AI2-THOR procedural house tasks (~1.5GB image, x86_64 Linux only)
+      - maniskill3: SAPIEN/ManiSkill3 dexterous manipulation (~1GB image, requires Vulkan)

     Examples:
         # Build MetaWorld environment
         kinitro env build metaworld --tag kinitro/metaworld:v1

         # Build ProcTHOR environment
         kinitro env build procthor --tag kinitro/procthor:v1

+        # Build ManiSkill3 environment
+        kinitro env build maniskill3 --tag kinitro/maniskill3:v1
+
         # Build and push to registry
         kinitro env build metaworld --push --registry docker.io/myuser
scripts/worktree-env.sh (1)

41-44: Port collision risk with common services.

The calculated ports (base + 0-999 offset) could collide with other common services:

  • PostgreSQL: 5432-6431 (overlaps with X11 6000+)
  • API: 8000-8999 (common dev server range)
  • Prometheus: 9090-10089 (various monitoring tools)
  • Grafana: 3000-3999 (common dev tool range)

Consider using a higher base offset or validating port availability before generating the config.

scripts/start-services.sh (1)

71-80: Consider making --no-auth configurable.

The --no-auth flag is hardcoded, which disables API authentication. While acceptable for local development, consider making this configurable via an environment variable for scenarios where auth testing is needed:

Suggested improvement
 start_api() {
     echo "Starting API on port $API_PORT..."
+    local auth_flag=""
+    if [[ "${DISABLE_AUTH:-true}" == "true" ]]; then
+        auth_flag="--no-auth"
+    fi
     nohup uv run kinitro api \
         --database-url "$DATABASE_URL" \
         --port "$API_PORT" \
-        --no-auth \
+        $auth_flag \
         --log-level INFO \
         &> "$LOG_DIR/api.log" &
     echo "API started (PID: $!)"
 }
kinitro/environments/maniskill3_env.py (3)

393-418: Consider extracting tensor-to-scalar conversion helper.

The same tensor conversion pattern is repeated four times for reward, terminated, truncated, and success. A helper would reduce duplication:

Optional refactor
def _to_scalar(self, value: Any, as_bool: bool = False) -> float | bool:
    """Convert ManiSkill3 batched tensor/array to Python scalar."""
    if hasattr(value, "item"):
        result = value.item()
    elif hasattr(value, "cpu"):
        result = value.cpu().numpy().flat[0]
    else:
        result = value
    return bool(result) if as_bool else float(result)

Then usage:

reward = self._to_scalar(reward)
terminated = self._to_scalar(terminated, as_bool=True)
truncated = self._to_scalar(truncated, as_bool=True)
success = self._to_scalar(info.get("success", False), as_bool=True)

313-323: Image serialization may be inefficient.

Converting camera images to nested Python lists via img.tolist() (line 319) for 128×128×3 images creates ~49K element lists per camera. If performance becomes an issue, consider using base64-encoded bytes or keeping numpy arrays until serialization is needed.


238-246: Clarify quaternion format comment.

The comment "ManiSkill3 uses [x, y, z, qw, qx, qy, qz]" describes the full tcp_pose array (position + quaternion), which could be confusing. Consider clarifying:

-                # ManiSkill3 uses [x, y, z, qw, qx, qy, qz], convert to xyzw
+                # tcp_pose format: [pos_x, pos_y, pos_z, qw, qx, qy, qz]
+                # Convert quaternion from wxyz to xyzw order

Comment on lines +60 to +65
ENV DISPLAY=:0
ENV LIBGL_ALWAYS_SOFTWARE=1
ENV MESA_GL_VERSION_OVERRIDE=3.3

# For headless Vulkan rendering
ENV VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/lvp_icd.x86_64.json
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Software rendering enabled contradicts GPU acceleration goal.

LIBGL_ALWAYS_SOFTWARE=1 forces Mesa to use software rendering, which contradicts the PR's goal of "GPU-accelerated dexterous manipulation." This will significantly degrade rendering performance.

Consider making this configurable or removing it to allow GPU rendering when available:

Proposed fix
 # Set rendering environment variables
 # SAPIEN can use Vulkan (preferred) or EGL
 ENV DISPLAY=:0
-ENV LIBGL_ALWAYS_SOFTWARE=1
+# Note: Set LIBGL_ALWAYS_SOFTWARE=1 at runtime for CPU-only fallback
 ENV MESA_GL_VERSION_OVERRIDE=3.3
🤖 Prompt for AI Agents
In `@environments/maniskill3/Dockerfile` around lines 60 - 65, The Dockerfile
currently forces software rendering via the ENV LIBGL_ALWAYS_SOFTWARE=1 which
conflicts with the GPU-accelerated goal; remove that hardcoded ENV or make it
configurable (e.g., replace the fixed ENV with a build ARG / optional ENV so the
value can be left unset or set to 0 in GPU builds), keep DISPLAY,
MESA_GL_VERSION_OVERRIDE and VK_ICD_FILENAMES as-is to support GL/Vulkan, and
ensure any CI or runtime docs are updated to instruct how to set
LIBGL_ALWAYS_SOFTWARE when a software fallback is explicitly desired.

Comment on lines +152 to +153
if base_url is None:
raise ValueError("base_url (miner endpoint) is required")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Inconsistent error handling: ValueError raised instead of returning error result.

Line 153 raises ValueError directly while all other validation errors (like invalid env_id on lines 142-150) return a structured error result via _build_error_result. This inconsistency means callers receive different error types depending on which validation fails.

🛠️ Proposed fix for consistent error handling
         if base_url is None:
-            raise ValueError("base_url (miner endpoint) is required")
+            return self._build_error_result(
+                env_id=env_id,
+                task_id=task_id,
+                seed=seed or task_id,
+                start_time=time.time(),
+                error="base_url (miner endpoint) is required",
+            )
🤖 Prompt for AI Agents
In `@environments/maniskill3/env.py` around lines 152 - 153, Replace the direct
raise of ValueError for the missing base_url with the same structured error
return used elsewhere: call _build_error_result with an appropriate error
message and status (matching how invalid env_id is handled) instead of raising;
update the base_url validation branch (the check that currently does "if
base_url is None") to return _build_error_result("base_url (miner endpoint) is
required", ...) so callers receive a consistent error result format.

Comment on lines +70 to +74
Args:
task_name: Name of the task (e.g., 'pick-cube-v1')
use_camera: Whether to include camera observations
camera_names: Which cameras to use (default: ['base_camera', 'hand_camera'])
image_size: (width, height) for camera images
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Docstring inconsistent with default value.

The docstring says camera_names defaults to ['base_camera', 'hand_camera'], but the actual default is self.CAMERA_NAMES which only contains ['base_camera'].

Fix docstring
-            camera_names: Which cameras to use (default: ['base_camera', 'hand_camera'])
+            camera_names: Which cameras to use (default: ['base_camera'])
🤖 Prompt for AI Agents
In `@kinitro/environments/maniskill3_env.py` around lines 70 - 74, The docstring
for camera_names is wrong — it claims the default is ['base_camera',
'hand_camera'] while the code uses self.CAMERA_NAMES (which is ['base_camera']).
Update the docstring in the Maniskill3Env/__init__ (or surrounding class
docstring) to either state the actual default (['base_camera']) or state that it
defaults to self.CAMERA_NAMES, and ensure the symbol camera_names and
self.CAMERA_NAMES are referenced so readers know the source of the default.

Comment on lines +249 to +263
def _extract_ee_velocity(self, obs: dict[str, Any]) -> tuple[np.ndarray, np.ndarray]:
"""Extract end-effector velocity from observation."""
ee_lin_vel = np.zeros(3, dtype=np.float32)
ee_ang_vel = np.zeros(3, dtype=np.float32)

agent_obs = obs.get("agent", {})
if isinstance(agent_obs, dict) and "qvel" in agent_obs:
qvel = self._to_numpy(agent_obs["qvel"])
# Approximate ee velocity from joint velocities
# This is a simplification - proper FK would be needed for accuracy
if qvel.size >= 6:
ee_lin_vel = qvel[:3] * 0.1 # Scale factor approximation
ee_ang_vel = qvel[3:6] * 0.1

return ee_lin_vel, ee_ang_vel
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Velocity approximation is physically incorrect.

Joint velocities (qvel[:3], qvel[3:6]) are in joint space, not Cartesian end-effector space. Scaling them by 0.1 doesn't produce meaningful EE velocity. The fallback position-delta computation in _build_observation (lines 305-307) is more accurate.

Consider either:

  1. Always using position-delta for velocity (simpler, consistent)
  2. Computing proper Jacobian-based velocity if ManiSkill3 exposes it
Option 1: Always use position delta
 def _extract_ee_velocity(self, obs: dict[str, Any]) -> tuple[np.ndarray, np.ndarray]:
     """Extract end-effector velocity from observation."""
-    ee_lin_vel = np.zeros(3, dtype=np.float32)
-    ee_ang_vel = np.zeros(3, dtype=np.float32)
-
-    agent_obs = obs.get("agent", {})
-    if isinstance(agent_obs, dict) and "qvel" in agent_obs:
-        qvel = self._to_numpy(agent_obs["qvel"])
-        # Approximate ee velocity from joint velocities
-        # This is a simplification - proper FK would be needed for accuracy
-        if qvel.size >= 6:
-            ee_lin_vel = qvel[:3] * 0.1  # Scale factor approximation
-            ee_ang_vel = qvel[3:6] * 0.1
-
-    return ee_lin_vel, ee_ang_vel
+    # Return zeros; velocity will be computed from position delta in _build_observation
+    return np.zeros(3, dtype=np.float32), np.zeros(3, dtype=np.float32)
🤖 Prompt for AI Agents
In `@kinitro/environments/maniskill3_env.py` around lines 249 - 263,
_extract_ee_velocity currently treats joint velocities as Cartesian EE velocity
by scaling qvel slices, which is incorrect; change it to compute EE linear (and
angular if available) velocity from the position-delta used in
_build_observation instead: read the current EE position from obs (e.g.,
obs["ee_pos"] or agent obs field used in _build_observation), subtract a stored
previous EE position (e.g., self._prev_ee_pos or obs["prev_ee_pos"]) and divide
by the environment timestep (e.g., self.control_timestep or self.dt) to produce
ee_lin_vel (and similarly for orientation/ee_ang_vel if previous orientation is
available), default to zeros if previous data or dt is missing, and remove the
qvel[:3]/qvel[3:6] 0.1 scaling; if you prefer higher fidelity later, replace
this with a Jacobian-based conversion when ManiSkill3 exposes the Jacobian API.

@Shr1ftyy
Copy link
Collaborator Author

Shr1ftyy commented Feb 5, 2026

shelving for now. prioritising #30 and already existing envs.

@Shr1ftyy Shr1ftyy closed this Feb 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant