feat(genesis): add Genesis physics simulation environment family #33

Open
Shr1ftyy wants to merge 15 commits into main from feat/genesis

Conversation

@Shr1ftyy (Collaborator) commented Feb 7, 2026

Summary

  • Add Genesis physics simulation environment family with a framework-based architecture: GenesisBaseEnvironment handles engine init, scene building, first-person ego camera, and PD control pipeline; thin subclasses define per-robot behavior
  • First concrete environment: Unitree G1 humanoid (43 actuated DOFs) with procedural scene generation (flat ground, 3-6 primitive objects) and four task types (navigate, pickup, place, push)
  • Simplify scene generator to flat-only terrain, removing unused hilly/stepped code
  • Add CLI --viewer and --seed flags to kinitro env test for Genesis environments
  • Wire show_viewer through base env, G1, registry, and CLI; update registry factories to accept **kwargs
  • 72 unit tests covering task types, scene generation, task generation, G1 reward/success, robot config, and registry integration

Test plan

  • 72 unit tests pass (pytest tests/unit/test_genesis.py)
  • ruff check clean
  • ty check clean (only pre-existing genesis unresolved-import)
  • Manual: uv run kinitro env list shows genesis/g1-v0
  • Manual: uv run kinitro env test genesis/g1-v0 --episodes 1 runs without error (requires Genesis + Menagerie)

Summary by CodeRabbit

  • New Features

    • Added Genesis physics support: public Genesis environment APIs, Unitree G1 humanoid, four task types (navigate, pickup, place, push), procedural scene & task generation, registry and CLI integration (genesis family, genesis/g1-v0).
    • Added a containerized evaluation image with build-time kernel warm-up and headless-rendering runtime configuration.
  • Tests

    • Added comprehensive unit tests for data models, scene/task generation, robot config, rewards/success logic, and registry/CLI.

Adds a new environment family powered by the Genesis simulation engine,
with a framework-based architecture where GenesisBaseEnvironment handles
all boilerplate (engine init, scene building, first-person ego camera,
PD control pipeline) and thin subclasses define per-robot behavior.

First concrete environment: Unitree G1 humanoid (43 actuated DOFs)
with procedural scene generation (flat ground, 3-6 primitive objects)
and four task types (navigate, pickup, place, push).

Includes CLI --viewer and --seed flags for `kinitro env test`, registry
**kwargs passthrough for env factory options, and 72 unit tests covering
all pure-logic modules.
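A rough sketch of the subclass contract this architecture implies (class, field, and method names below are assumptions drawn from the description above, not the exact API):

class MyRobotEnvironment(GenesisBaseEnvironment):
    # Thin per-robot layer: static config plus task-specific logic.
    robot_config = MY_ROBOT_CONFIG  # MJCF path, joint names, default DOF pose, action scale

    def _compute_reward(self, task, robot_state, object_states) -> float:
        # Robot/task-specific shaping; the base class owns stepping, camera, and PD control.
        ...

    def _check_success(self, task, robot_state, object_states) -> bool:
        ...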
@coderabbitai bot commented Feb 7, 2026

🚥 Pre-merge checks (3 passed)
  • Description Check: Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: Passed. The title 'feat(genesis): add Genesis physics simulation environment family' clearly summarizes the main change, is concise, follows conventional commit format, and accurately represents the PR's primary objective.
  • Docstring Coverage: Passed. Docstring coverage is 88.31%, above the required threshold of 80.00%.


No actionable comments were generated in the recent review. 🎉

🧹 Recent nitpick comments
kinitro/environments/genesis/base.py (3)

55-108: EGL display not terminated on error path.

If eglInitialize succeeds (line 72) but eglChooseConfig fails or returns 0 configs (line 94), the RuntimeError is caught at line 102 without calling eglTerminate(display). The EGL display leaks for the remainder of the process.

This is low-impact since the probe runs once and the process falls back to OSMesa, but it's not clean practice.

Suggested fix: add a `finally` or restructure with try/finally
+        display_initialized = False
         major, minor = ctypes.c_int(), ctypes.c_int()
         ...
         if not egl.eglInitialize(display, ctypes.byref(major), ctypes.byref(minor)):
             raise RuntimeError("eglInitialize failed")
+        display_initialized = True
 
         ...
 
-        egl.eglTerminate(display)
-
         os.environ["PYOPENGL_PLATFORM"] = "egl"
         logger.info("render_platform_detected", platform="egl")
     except Exception as exc:
         os.environ["PYOPENGL_PLATFORM"] = "osmesa"
         logger.info(...)
+    finally:
+        if display_initialized:
+            egl.eglTerminate(display)

615-618: Accessing private _attached_link attribute of Genesis camera.

self._camera._attached_link is an internal attribute that could change or be removed in future Genesis releases without notice. If there's a public API to query attachment status, prefer that. Otherwise, a defensive hasattr guard would at least prevent an AttributeError crash.

Defensive alternative
-            if self._camera._attached_link is not None:
+            if getattr(self._camera, "_attached_link", None) is not None:
                 self._camera.move_to_attach()

324-368: Full scene rebuild on every reset may be a training bottleneck.

Each reset() destroys and recreates the entire Genesis scene (including robot loading, camera setup, and validation render). For RL training loops with thousands of episodes, this overhead could dominate wall-clock time. If Genesis supports resetting entity positions without full scene recreation, that would significantly reduce per-episode latency. Consider caching and reusing the scene when consecutive episodes use compatible terrain/object layouts.
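A minimal sketch of such reuse, assuming Genesis entities can be repositioned in place (a set_pos-style call, not verified here) and using the SceneObjectConfig fields from this PR; all names are illustrative:

def _reset_scene(self, scene_config) -> None:
    # Cheap signature of the layout: same object types and roles means the scene is reusable.
    sig = tuple((o.object_type, o.pickupable) for o in scene_config.objects)
    if self._scene is not None and sig == getattr(self, "_scene_sig", None):
        # Same terrain and object set: move existing entities instead of rebuilding.
        for entity, obj in zip(self._object_entities, scene_config.objects):
            entity.set_pos(obj.position)  # assumed Genesis entity API
    else:
        self._build_scene(scene_config)
        self._scene_sig = sig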



@coderabbitai bot left a comment

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
kinitro/cli/env/commands.py (1)

51-69: ⚠️ Potential issue | 🟡 Minor

Docstring doesn't mention genesis as a supported family.

The build_env docstring lists only metaworld and procthor under "Environment families", but genesis was added to AVAILABLE_ENV_FAMILIES on Line 24.

🤖 Fix all issues with AI agents
In `@environments/genesis/Dockerfile`:
- Line 21: The Dockerfile currently uses the root user (FROM python:3.11-slim)
which Trivy flags; add a non-root user and group before the final
COPY/ENTRYPOINT steps, create the user (e.g., appuser), chown the application
directories/files to that user, and switch to it with USER appuser to run the
container non-root; ensure any runtime directories (logs, tmp, venv path) have
appropriate ownership so the processes started from the existing COPY/ENTRYPOINT
commands can write to them.

In `@environments/genesis/env.py`:
- Around line 80-85: The connect timeout is hard-coded to 5.0s via
httpx.Timeout(timeout, connect=5.0) which can exceed the per-request action
timeout (variable timeout) and block the evaluation; update the call that
creates httpx.AsyncClient to compute a connect timeout based on the request
timeout (e.g., connect_timeout = min(timeout, max(0.1, timeout * 0.5))) and pass
that into httpx.Timeout, or alternatively create and reuse a single
httpx.AsyncClient per evaluation run (instead of creating one inside the request
call) so TCP setup is amortized; adjust the code around httpx.AsyncClient and
httpx.Timeout(timeout, connect=...) (the block that posts and returns
resp.json() after resp.raise_for_status) to use the new connect timeout or
shared client.

In `@kinitro/environments/genesis/base.py`:
- Around line 78-82: Add a brief comment above the lazy-initialized attributes
_scene, _robot, _camera, and _object_entities explaining why Any is used (these
are Genesis-specific types that can't be statically imported because the genesis
package is optional/conditional at runtime), so the use of Any is intentional
for runtime-only resolution; place the comment immediately above the
declarations for _scene, _robot, _camera, and _object_entities and mention that
they will hold Genesis scene/robot/camera/object-entity instances once
initialized.
- Around line 280-298: The code currently lets fallen (from _check_fallen)
overwrite a success reward, producing a conflicting signal when both success and
fallen occur; update the logic so success takes priority: after computing
_episode_success via _check_success and reward via _compute_reward, only apply
the fallen penalty (setting reward = -1.0 and/or clearing _episode_success) if
not _episode_success — i.e., change the fallen override so it checks
_episode_success before modifying reward (or explicitly set _episode_success =
False when fallen if you prefer the alternative). Ensure done calculation (which
uses _episode_success, fallen, and _episode_steps) remains consistent with this
change.

In `@kinitro/environments/genesis/envs/g1_humanoid.py`:
- Around line 14-22: Update the G1Environment class docstring to correct the
number of actuated DOFs from "37" to "43" to match G1_CONFIG.num_actuated_dofs;
locate the docstring in the G1Environment class and replace the phrase "37
actuated DOFs" with "43 actuated DOFs" so the documentation aligns with the
robot config.

In `@kinitro/environments/genesis/scene_generator.py`:
- Around line 100-140: The computation of num_pickupable in _generate_objects
can equal num_objects, leaving no landmark; change the logic in
SceneGenerator._generate_objects to enforce 1 <= num_pickupable <= num_objects-1
(or raise if num_objects < 2). Concretely, after computing the random count,
clamp it with num_pickupable = min(num_pickupable, max(1, num_objects - 1)) (or
validate and raise ValueError when num_objects < 2) so the loop always creates
at least one pickupable and one landmark.
🧹 Nitpick comments (16)
environments/genesis/requirements.txt (1)

1-8: Pin dependency versions for reproducible Docker builds.

Most dependencies are unpinned, which makes Docker image builds non-deterministic. torch in particular is version-sensitive (CUDA compatibility, API changes) and very large — an unexpected major version bump could silently break the image or balloon its size.

Consider pinning at least torch, numpy, and genesis-world to known-good versions (or use >=X,<Y bounds).

kinitro/environments/genesis/robot_config.py (1)

8-35: Consider adding __post_init__ validation for array length consistency.

If joint_names, default_dof_pos, or action_scale don't match num_actuated_dofs, the mismatch will surface as a cryptic indexing error deep in the simulation loop. A quick post-init check would catch misconfiguration early.

🛡️ Proposed validation
 @dataclass
 class RobotConfig:
     ...
     supported_task_types: list[str] = field(default_factory=list)
+
+    def __post_init__(self) -> None:
+        n = self.num_actuated_dofs
+        if len(self.joint_names) != n:
+            raise ValueError(
+                f"joint_names has {len(self.joint_names)} entries, expected {n}"
+            )
+        if len(self.default_dof_pos) != n:
+            raise ValueError(
+                f"default_dof_pos has {len(self.default_dof_pos)} entries, expected {n}"
+            )
+        if len(self.action_scale) != n:
+            raise ValueError(
+                f"action_scale has {len(self.action_scale)} entries, expected {n}"
+            )
kinitro/environments/genesis/scene_generator.py (2)

219-220: Silent skip of unknown object types — add a warning log.

If an unrecognized object_type appears (e.g., from a future OBJECT_TYPES expansion), build_scene silently drops it. This will be confusing to debug. The logger is already imported.

Proposed fix
             else:
-                continue
+                logger.warning(
+                    "unknown_object_type_skipped",
+                    object_type=obj_config.object_type,
+                    object_id=obj_config.object_id,
+                )
+                continue

168-178: Add a comment explaining the Any type for gs_scene.

gs_scene uses Any because genesis is a runtime-only dependency (deferred import on Line 178). As per coding guidelines, "Avoid Any type hints unless required; if used, explain why in code comments."

Proposed fix
-    def build_scene(self, gs_scene: Any, scene_config: SceneConfig) -> list[Any]:
+    def build_scene(self, gs_scene: Any, scene_config: SceneConfig) -> list[Any]:  # Any: genesis types are runtime-only
environments/genesis/Dockerfile (1)

58-62: Pin the Menagerie clone to a specific commit SHA for reproducible builds.

Cloning HEAD of main without a commit pin means MJCF assets can change between builds, potentially breaking the G1 joint ordering or DOF count that robot_config.py relies on.

Proposed fix
-RUN git clone --depth 1 --filter=blob:none --sparse \
-    https://github.com/google-deepmind/mujoco_menagerie.git /opt/menagerie && \
+ARG MENAGERIE_COMMIT=<pin-to-known-good-sha>
+RUN git clone --filter=blob:none --sparse \
+    https://github.com/google-deepmind/mujoco_menagerie.git /opt/menagerie && \
     cd /opt/menagerie && \
+    git checkout ${MENAGERIE_COMMIT} && \
     git sparse-checkout set unitree_g1 && \
     echo "Menagerie assets installed"
kinitro/cli/env/commands.py (1)

316-333: String-based env family dispatch for action construction.

Branching on env_id.startswith("genesis/") to decide which action keys to use couples this command to naming conventions. Consider introspecting the environment (e.g., checking supported ActionKeys or the action shape) to make this more extensible as new families are added.

That said, this works fine for now and the test command is a developer tool, so not critical.
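One possible shape for that introspection, sketched with hypothetical helper names; only ActionKeys.JOINT_POS_TARGET and num_actuated_dofs are taken from this PR:

import numpy as np
# ActionKeys import omitted; it is the same enum used elsewhere in this PR.

def build_test_action(env):
    robot_config = getattr(env, "robot_config", None)
    if robot_config is not None:
        # Joint-space robots (the Genesis family): zero PD targets of the right width.
        n = robot_config.num_actuated_dofs
        return {ActionKeys.JOINT_POS_TARGET: np.zeros(n, dtype=np.float32)}
    # Anything else keeps the family's existing default action construction.
    return None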

environments/genesis/env.py (2)

47-48: Missing return type annotations on several methods.

__init__, _get_env, and cleanup lack return type annotations. As per coding guidelines, public functions and methods should have return types.

Proposed fix
-    def __init__(self):
+    def __init__(self) -> None:
-    def _get_env(self, env_id: str):
+    def _get_env(self, env_id: str) -> RoboticsEnvironment:

(This requires importing RoboticsEnvironment from kinitro.environments.base.)

-    async def cleanup(self):
+    async def cleanup(self) -> None:

As per coding guidelines: "Add return types to public functions and methods."

Also applies to: 54-58, 338-348


343-347: Silent exception swallowing during cleanup.

except Exception: pass hides cleanup failures. Consider logging at debug or warning level to aid troubleshooting in container environments.

Proposed fix
             try:
                 env.close()
-            except Exception:
-                pass
+            except Exception as e:
+                logger.debug("env_close_error", env_id=env_id, error=str(e))

(Note: you'd need to iterate with the env_id key as well.)

kinitro/environments/genesis/task_generator.py (2)

66-68: _get_landmark_objects is defined but never called.

_generate_place_task (Line 231) duplicates the same filter logic inline: [obj for obj in objects if not obj.pickupable]. Either use the helper or remove it.

Proposed fix: use the helper in _generate_place_task

In _generate_place_task at Line 231:

-        destinations = [obj for obj in objects if not obj.pickupable]
+        destinations = _get_landmark_objects(objects)

120-131: Retry loop always re-randomizes the task type on failure.

After a failed attempt, Line 131 picks a new random task type, which may select the same type again. This is fine probabilistically over 20 attempts, but if the scene only makes one task type infeasible, the retry wastes attempts on it. Not a bug, just slightly suboptimal convergence.
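One way to avoid re-sampling a type that has already failed is to shrink the candidate pool on failure; a sketch using the names above (all hypothetical), relevant only when the caller did not request a specific type:

candidates = list(available_types)
for _ in range(self._max_attempts):
    task_type = _random_choice(candidates, rng)
    task = self._try_generate_task(task_type, objects, rng)
    if task is not None:
        return task
    if len(candidates) > 1:
        # Assumes a failure means the type is infeasible for this scene, which
        # is not guaranteed when placement is randomized.
        candidates.remove(task_type)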

kinitro/environments/genesis/base.py (5)

33-38: GPU fallback: if gs.init(gpu) partially initializes and then throws, is it safe to call gs.init(cpu) next?

Some engines don't support re-initialization after a partial failure. If Genesis doesn't handle this gracefully, the process could be left in a corrupted state. Consider logging the GPU failure exception for easier debugging.

💡 Suggested improvement
     try:
         gs.init(backend=getattr(gs, "gpu"))
         logger.info("genesis_initialized", backend="gpu")
-    except Exception:
+    except Exception as e:
+        logger.info("genesis_gpu_init_failed", error=str(e))
         gs.init(backend=getattr(gs, "cpu"))
         logger.info("genesis_initialized", backend="cpu")

224-228: SceneObjectConfig could be imported at module level alongside SceneConfig.

SceneConfig is already a top-level import (line 14). Only SceneObjectConfig is new here. Moving it to the top-level import block would be more consistent and would avoid the inline import.


337-341: Silent exception swallowing during scene teardown.

Unlike close() (line 322) which logs the error, _build_scene silently swallows the destroy failure. During development, this hides useful diagnostics.

💡 Suggested improvement
         if self._scene is not None:
             try:
                 self._scene.destroy()
-            except Exception:
-                pass
+            except Exception as e:
+                logger.debug("scene_destroy_on_rebuild", error=str(e))

459-470: Silent exception swallowing hides broken object tracking.

If entity.get_pos() fails, the object is silently dropped from states. At minimum, log a debug message so issues with entity tracking are diagnosable.

💡 Suggested improvement
                 try:
                     pos = entity.get_pos().cpu().numpy().flatten()
                     states[obj_id] = pos
-                except Exception:
-                    pass
+                except Exception as e:
+                    logger.debug("object_state_read_failed", object_id=obj_id, error=str(e))

546-550: Repeated np.array allocation on every step call.

default_pos and action_scale are re-created from Python lists every control step. Since _robot_config is immutable within an episode, these could be cached once (e.g., in _build_scene or __init__).

💡 Suggested improvement

Cache in __init__:

+        # Pre-compute for action pipeline
+        self._default_dof_pos = np.array(robot_config.default_dof_pos, dtype=np.float32)
+        self._action_scale = np.array(robot_config.action_scale, dtype=np.float32)

Then in _apply_action:

-        default_pos = np.array(self._robot_config.default_dof_pos, dtype=np.float32)
-        action_scale = np.array(self._robot_config.action_scale, dtype=np.float32)
-
-        target_pos = default_pos + joint_action * action_scale
+        target_pos = self._default_dof_pos + joint_action * self._action_scale
tests/unit/test_genesis.py (1)

609-614: object.__new__() bypass is fragile if _compute_reward ever accesses self attributes.

Using object.__new__(G1Environment) skips __init__, so no instance attributes are set. This works currently because _compute_reward only uses its parameters, but it'll break silently if the implementation changes to reference self. Consider adding a brief comment noting this assumption.

- Fix success/fallen reward conflict: success now takes priority over
  fallen penalty to avoid contradictory RL signal
- Scale connect timeout with action timeout instead of hardcoding 5s
- Fix G1 docstring DOF count (37 → 43) to match robot config
- Clamp num_pickupable to ensure at least one landmark object
- Add non-root container user for security
- Pin Menagerie clone to specific commit for reproducible builds
- Add explanatory comments for Any type hints (runtime-only genesis dep)
- Replace silent except-pass with debug logging in scene teardown,
  object state reads, and env cleanup
- Cache default_dof_pos and action_scale arrays to avoid per-step allocation
- Use _get_landmark_objects helper instead of inline filter in place task
- Add RobotConfig __post_init__ validation for array length consistency
- Add return type annotations to Actor methods in env.py
- Pin dependency versions in requirements.txt for reproducible builds
- Add genesis to build_env docstring and examples
- Log warning for unknown object types in scene builder
@coderabbitai bot left a comment

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In `@environments/genesis/env.py`:
- Line 83: The long line creating the AsyncClient exceeds the formatter's max
length; in the function where httpx.AsyncClient is used (the line with "async
with httpx.AsyncClient(timeout=httpx.Timeout(timeout, connect=min(timeout * 2,
5.0))) as client:"), break the call across multiple lines so each argument is
under the line-length limit—e.g., put the timeout=... and its nested
httpx.Timeout(...) on separate indented lines and move the connect=min(...)
argument to its own line—so Ruff will accept the file and readability is
improved.
- Around line 135-149: The code currently returns an error dict for an invalid
env_id but raises a ValueError for a missing base_url, causing inconsistent
behavior for callers like affinetes; update the base_url validation to mirror
the env_id path by returning the same structured error dict via
_build_error_result (use the same fields: env_id, task_id, seed or task_id,
start_time and an error string mentioning missing base_url/miner endpoint) so
both validation failures consistently return a result object rather than
raising, or alternatively move the base_url check inside the existing try/except
that handles other errors — modify the check near the base_url validation to use
_build_error_result (referencing env_id, base_url, seed, task_id and
_build_error_result) to ensure uniform error handling.
- Around line 55-59: The _get_env cache returns a shared RoboticsEnvironment
from _env_cache causing concurrent evaluate() calls to race on environment
methods reset() and step(); fix by adding a per-env-id asyncio.Lock (e.g.,
_env_locks dict keyed by env_id) and acquire/release the corresponding lock
around any shared environment mutations in evaluate() (or alternatively stop
caching and return a fresh environment from get_environment(env_id) each
evaluate call); update _get_env and evaluate() to use the new locking strategy
(referencing _get_env, _env_cache, evaluate(), reset(), step(), and the per-env
asyncio.Lock) so two concurrent calls for the same env_id are serialized.

In `@kinitro/environments/genesis/robot_config.py`:
- Around line 1-175: CI failure: file needs ruff formatting. Run `ruff format
kinitro/environments/genesis/robot_config.py` (or run ruff format on the repo),
review changes to ensure RobotConfig, G1_JOINT_NAMES, G1_DEFAULT_DOF_POS,
G1_ACTION_SCALE and G1_CONFIG formatting is preserved, then add and commit the
reformatted file to the PR.

In `@kinitro/environments/genesis/task_generator.py`:
- Around line 247-258: The initial_state is storing a direct mutable reference
to target.position (a list) in the TaskSpec returned by the PLACE generator (and
likewise in the PUSH generator), so mutate of the scene later will change the
"initial" snapshot; fix by copying the list when constructing initial_state
(e.g., use a shallow copy of target.position) so initial_target_pos is an
independent snapshot in the TaskSpec returned by the PLACE and PUSH generators
(look for places that set initial_state={"initial_target_pos": target.position}
and replace with a copy).
🧹 Nitpick comments (12)
kinitro/environments/genesis/scene_generator.py (1)

148-167: Silent fallback position when position generation exhausts retries.

If the 50-iteration loop fails to find a valid position, the fallback at Line 167 is returned silently. This could mask placement issues (e.g., an arena that's too small or a min_dist_from_center that's too large relative to the arena). A debug log would help diagnose scene generation anomalies.

Suggested fix
         # Fallback position
+        logger.debug(
+            "object_position_fallback",
+            pickupable=pickupable,
+            arena_size=self._arena_size,
+        )
         return [1.5, 0.0, 0.05]
kinitro/environments/genesis/robot_config.py (1)

23-25: Mutable list defaults shared with module-level constants.

joint_names, default_dof_pos, and action_scale are plain list fields. The dataclass stores a direct reference to the lists passed in (e.g., G1_JOINT_NAMES). If anything mutates those lists after construction, both the module constant and the config instance are affected. Low risk since these are treated as constants, but worth noting if configs are ever cloned or modified at runtime.

kinitro/environments/genesis/task_generator.py (1)

182-189: Hardcoded robot start position [0.0, 0.0, 0.75].

Line 188 hardcodes the G1's spawn height. This value should come from robot_config.init_pos (which is (0.0, 0.0, 0.75) for G1) to stay consistent if another robot is added. However, robot_config isn't passed into _generate_navigate_task. Consider threading it through or removing this field if it's unused downstream.

kinitro/environments/genesis/base.py (4)

482-486: Accessing private _attached_link on the Genesis camera object.

self._camera._attached_link relies on an internal attribute of the Genesis camera. If the Genesis library refactors this, it will silently break. Consider wrapping this in a hasattr check or caching the attached state locally.

Suggested defensive approach
-            if self._camera._attached_link is not None:
+            if getattr(self._camera, "_attached_link", None) is not None:

229-233: Redundant local re-import of SceneConfig.

SceneConfig is already imported at the top of the file (Line 14). Only SceneObjectConfig needs the local import here.

Suggested fix
         from kinitro.environments.genesis.scene_generator import (  # noqa: PLC0415
-            SceneConfig,
             SceneObjectConfig,
         )

535-556: actuated_dof_idx is recomputed every step.

list(range(6, 6 + n)) at Line 555 allocates a new list on every _apply_action call. Since n is fixed per environment instance, cache it alongside _default_dof_pos and _action_scale in __init__.

Suggested fix
 # In __init__:
         self._default_dof_pos = np.array(robot_config.default_dof_pos, dtype=np.float32)
         self._action_scale = np.array(robot_config.action_scale, dtype=np.float32)
+        self._actuated_dof_idx = list(range(6, 6 + robot_config.num_actuated_dofs))

Then use self._actuated_dof_idx in both _apply_action (Line 555) and _build_scene (Line 389).


33-40: Broad except Exception swallows GPU initialization errors indiscriminately.

If gs.init(backend=gs.gpu) fails for a non-trivial reason (e.g., driver incompatibility, OOM), the code silently falls back to CPU without logging the GPU error. At minimum, log the exception at debug/warning level so users can diagnose unexpected CPU fallback.

Suggested fix
     try:
         gs.init(backend=getattr(gs, "gpu"))
         logger.info("genesis_initialized", backend="gpu")
-    except Exception:
+    except Exception as e:
+        logger.info("genesis_gpu_init_failed_falling_back_to_cpu", error=str(e))
         gs.init(backend=getattr(gs, "cpu"))
         logger.info("genesis_initialized", backend="cpu")
environments/genesis/env.py (2)

166-173: Full traceback included in error result — consider truncating for production.

traceback.format_exc() on Line 172 embeds the full Python stack trace into the result dict. If results are persisted or shared externally, this leaks internal file paths and code structure. Consider logging the full traceback server-side and returning only the exception summary in the result.

Proposed fix
         except Exception as e:
+            logger.error(
+                "evaluation_failed",
+                env_id=env_id,
+                task_id=task_id,
+                error=str(e),
+                exc_info=True,
+            )
             return self._build_error_result(
                 env_id=env_id,
                 task_id=task_id,
                 seed=seed,
                 start_time=start_time,
-                error=f"{type(e).__name__}: {str(e)}\n{traceback.format_exc()}",
+                error=f"{type(e).__name__}: {str(e)}",
             )

92-104: Consider a TypedDict for the evaluation result shape.

evaluate, _run_evaluation, and _build_error_result all return bare dict. The result has a well-defined schema (task_name, score, success, time_taken, extra, optional error). A TypedDict would document the contract and enable static type checking for callers.
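A sketch of that TypedDict, with field names taken from the schema listed above (total=False because error is only present on failure); the real schema may differ:

from typing import Any, TypedDict

class EvalResult(TypedDict, total=False):
    task_name: str
    score: float
    success: bool
    time_taken: float
    extra: dict[str, Any]
    error: str  # set only on the error path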

environments/genesis/Dockerfile (3)

44-50: LIBGL_ALWAYS_SOFTWARE=1 disables GPU acceleration even when a GPU is available.

NVIDIA_DRIVER_CAPABILITIES=all (Line 46) exposes GPU capabilities to the container, but LIBGL_ALWAYS_SOFTWARE=1 (Line 49) unconditionally forces software rendering, negating any GPU benefit. If you intend to use GPU when available and fall back to software rendering otherwise, remove LIBGL_ALWAYS_SOFTWARE and let Mesa auto-detect the driver. If software-only is intentional for reproducibility, document that and consider removing the NVIDIA env var to avoid confusion.


56-65: Consider removing the .git directory after checkout to reduce image size.

The sparse clone leaves .git metadata in /opt/menagerie. Since the commit is already pinned and you only need the asset files, removing it saves image space.

Proposed fix
 RUN git clone --filter=blob:none --sparse \
     https://github.com/google-deepmind/mujoco_menagerie.git /opt/menagerie && \
     cd /opt/menagerie && \
     git checkout ${MENAGERIE_COMMIT} && \
     git sparse-checkout set unitree_g1 && \
-    echo "Menagerie assets installed"
+    rm -rf .git && \
+    echo "Menagerie assets installed (commit: ${MENAGERIE_COMMIT})"

21-21: Consider pinning the base image digest for fully reproducible builds.

python:3.11-slim is a floating tag that changes with patch releases. For deterministic builds, you could pin the image digest. However, this trades off auto-receiving security patches, so it's a judgment call.

@coderabbitai bot left a comment

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In `@environments/genesis/env.py`:
- Around line 140-160: The error-result branches call self._build_error_result
and currently pass seed=seed or task_id which treats seed=0 as falsy; change
those arguments to use an explicit None check (e.g., seed=(seed if seed is not
None else task_id)) so a valid 0 seed is preserved in both _build_error_result
calls in the genesis env validation block (the two calls that return when env_id
doesn't start with "genesis/" and when base_url is None).

In `@kinitro/environments/genesis/scene_generator.py`:
- Around line 100-146: The docstring for _generate_objects promises distinct
colors but the implementation uses i % len(OBJECT_COLORS) so colors can repeat;
fix by enforcing distinct colors: in _generate_objects validate that num_objects
<= len(OBJECT_COLORS) (raise ValueError if not), update the docstring to state
the max objects equals number of available colors, and then iterate using
available_colors[i] (no modulo) when constructing SceneObjectConfig so each
object gets a unique color from OBJECT_COLORS.
- Around line 193-235: The loop in build_scene that iterates
scene_config.objects can produce a shorter entities list when an unknown type is
skipped, which breaks positional mapping used by _read_object_states; instead of
logging and continue, fail-fast by raising a clear exception (e.g., ValueError)
when obj_config.object_type is unrecognized (include object_type and object_id
in the message) so the mismatch cannot silently occur; update the block handling
the default/else case (the logger.warning + continue) to raise the error, or
alternatively change the function to append explicit (entity, obj_config) pairs
to entities so callers can use the explicit mapping between created entities and
scene_config.objects.

In `@kinitro/environments/genesis/task_generator.py`:
- Around line 182-189: The TaskSpec construction is storing mutable references
to SceneObject.position via the target_position and destination_position fields;
update all task generators that build TaskSpec (the NAVIGATE, MOVE, PICK, and
PLACE generators where TaskSpec(...) is returned) to defensively copy positions
by replacing uses of target.position or destination.position with
list(target.position) (similar to the earlier fix for initial_state) so
TaskSpec.target_position and TaskSpec.destination_position hold independent
lists and won't change if SceneObject.position mutates during simulation.
- Around line 89-138: The retry loop in generate_task currently overwrites a
caller-specified task_type on subsequent attempts; preserve the caller's intent
by capturing the initial_task_type (e.g., initial_task_type = task_type) or a
boolean flag (e.g., requested_specific = task_type is not None) before the loop
and only re-randomize task_type inside the for _ in range(self._max_attempts)
loop when the caller did not request a specific type (i.e., if not
requested_specific then task_type = _random_choice(available_types, rng));
ensure the first attempt still uses the provided task_type and keep references
to generate_task, _try_generate_task, available_types, and _random_choice when
making the change.
🧹 Nitpick comments (3)
environments/genesis/env.py (1)

238-306: Synchronous physics calls (env.reset / env.step) block the event loop.

reset() and step() run Genesis physics synchronously inside an async method. While the per-env lock serializes same-env evaluations, these blocking calls still stall the entire event loop—preventing concurrent evaluations on different env_ids, health-check responses, and timeout enforcement for other coroutines.

Consider offloading to a thread:

obs = await asyncio.to_thread(env.reset, task_config)
# ...
obs, reward, done, info = await asyncio.to_thread(env.step, action)
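A combined sketch of the per-env lock from the earlier round and the thread offload suggested here; attribute and method names follow this PR's env.py, but the surrounding flow is simplified and illustrative only (get_next_action is a hypothetical callback standing in for the miner /act call):

import asyncio

async def _run_episode(self, env_id, task_config, get_next_action):
    lock = self._env_locks.setdefault(env_id, asyncio.Lock())
    async with lock:  # serialize same-env access across concurrent evaluations
        env = self._get_env(env_id)
        # Offload synchronous Genesis physics so the event loop stays responsive.
        obs = await asyncio.to_thread(env.reset, task_config)
        done = False
        while not done:
            action = await get_next_action(obs)
            obs, reward, done, info = await asyncio.to_thread(env.step, action)
    return obs, reward, done, info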
kinitro/environments/genesis/scene_generator.py (1)

148-172: Fallback position is deterministic — may cause object stacking.

If multiple objects fail the 50-iteration placement loop (e.g., with a very small arena), they all land on the same fallback position [1.5, 0.0, 0.05]. With default arena_size=5.0 this is unlikely, but the silent fallback could cause overlapping objects in edge cases.

A minor improvement would be to log a warning when falling back, consistent with the structured logging approach used elsewhere in this module.

🔧 Suggested improvement
         # Fallback position
+        logger.warning(
+            "object_position_fallback",
+            arena_size=self._arena_size,
+            pickupable=pickupable,
+        )
         return [1.5, 0.0, 0.05]
kinitro/environments/genesis/robot_config.py (1)

35-35: Consider using list[TaskType] or Literal instead of list[str] for supported_task_types.

Currently, supported_task_types is list[str] and is compared against TaskType.value strings in check_task_feasibility and TaskGenerator. This works but provides no compile-time safety against typos (e.g., "navgate" would silently be ignored). Using list[TaskType] would catch mismatches earlier. This would require importing TaskType from task_types, which is a reasonable intra-package dependency.
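A sketch of the typed variant; the TaskType enum already lives in task_types per the comment above and is shown inline here only to keep the example self-contained. The four members mirror the task types named in this PR:

from dataclasses import dataclass, field
from enum import Enum

class TaskType(str, Enum):
    NAVIGATE = "navigate"
    PICKUP = "pickup"
    PLACE = "place"
    PUSH = "push"

@dataclass
class RobotConfig:
    # ... other fields elided ...
    supported_task_types: list[TaskType] = field(default_factory=list)

A typo such as TaskType.NAVGATE now raises an AttributeError at the point of use instead of being silently ignored at runtime.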

Add per-env-id asyncio.Lock to prevent concurrent evaluation races,
consistent error-dict returns for base_url validation, defensive copy
of mutable position in initial_state, num_objects<2 guard in scene
generator, and ruff formatting fixes.
Fix seed=0 falsy bug in error paths (use `is not None` check), cap
num_objects to available colors for distinct-color guarantee, raise on
unknown object types in build_scene to prevent entity/object index
misalignment, preserve caller-specified task_type across retries, and
defensively copy target_position/destination_position in all task
generators.
@coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@environments/genesis/env.py`:
- Around line 164-169: The per-env lock creation has a race where two coroutines
can both see env_id missing and create separate asyncio.Lock objects; change the
creation to use an atomic upsert like self._env_locks.setdefault(env_id,
asyncio.Lock()) before acquiring the lock so both coroutines get the same lock
object; update the code around self._env_locks, env_id, and asyncio.Lock
creation to call setdefault once and then use async with on the returned lock.
- Around line 220-238: Update the docstring (currently describing steps at lines
~113–117) to match the actual implementation order: first call the miner's
/reset endpoint via self._call_miner (sending task metadata), then reset the
simulation environment with env.reset(task_config); mention that the miner
receives only task metadata on reset and that the initial observation is
provided to the miner on the first /act call. Ensure references to env.reset,
self._call_miner, and the miner /reset behavior are clear in the docstring.
🧹 Nitpick comments (4)
environments/genesis/env.py (3)

97-109: **kwargs is accepted but silently discarded.

evaluate accepts **kwargs (line 108) but never forwards them anywhere—not to _run_evaluation, _get_env, or _call_miner. This silently swallows typos or unsupported parameters from callers. If it's intentional (forward-compat), add a comment; otherwise remove it to surface caller mistakes.


49-61: Add type annotations for instance attributes.

_env_cache is a bare dict. Adding type hints improves readability and tooling support.

Proposed fix
     def __init__(self) -> None:
         """Initialize the evaluation actor."""
-        self._env_cache = {}
-        self._env_locks: dict[str, asyncio.Lock] = {}
+        self._env_cache: dict[str, RoboticsEnvironment] = {}
+        self._env_locks: dict[str, asyncio.Lock] = {}

356-366: cleanup doesn't clear _env_locks.

After clearing _env_cache, the stale locks in _env_locks remain. If the actor is reused after cleanup, new environments would still share old lock objects—not a correctness bug since Lock is stateless when not held, but clearing both keeps the state consistent.

Proposed fix
         self._env_cache.clear()
+        self._env_locks.clear()
kinitro/environments/genesis/scene_generator.py (1)

100-150: Modulo on line 125 is now dead code after the cap on line 116–117.

Since num_objects is capped to len(available_colors), the i % len(available_colors) can never wrap. Not harmful, but the modulo is misleading—it suggests wrapping is possible when it isn't. Consider simplifying to available_colors[i].

Use dict.setdefault for per-env asyncio.Lock creation to prevent
a race where two coroutines both create separate locks. Fix evaluate()
docstring to reflect actual execution order: miner /reset is called
before env.reset(), not after.
Add genesis to allowed-unresolved-imports in [tool.ty.analysis] since
it is a runtime-only dependency installed in the Docker container but
not available in the CI environment.
Add libosmesa6 to the Docker image for CPU-only software rendering
fallback. Remove LIBGL_ALWAYS_SOFTWARE=1 which caused segfaults when
/dev/dri device was passed through to the container.
@Shr1ftyy Shr1ftyy requested a review from rayokamoto February 7, 2026 14:52
RUN groupadd --gid 1000 genesis && \
useradd --uid 1000 --gid genesis --create-home genesis

# Copy the kinitro environments package (copied by env build command)

Contributor commented:

What is this comment for? So is this copy line redundant?

Contributor commented:

Can't you set build flags to build for GPU vs CPU rendering? Necessary dependencies and env vars can be installed and set accordingly without unnecessarily bloating the image

return [x, y, 0.05] # Small offset above flat ground

# Fallback position
return [1.5, 0.0, 0.05]

Contributor commented:

x=1.5 so it's not at the origin where the robot is?

Returns:
List of Genesis entity references for object tracking
"""
import genesis as gs # noqa: PLC0415

Contributor commented:

Again, why?

@coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@environments/genesis/warmup_kernel_cache.py`:
- Around line 75-86: The current broad except around scene.build(n_envs=1)
captures all exceptions and treats them as expected EGL visualizer init
failures; change this to only swallow EGL-related initialization errors by
inspecting the caught exception (variable e) for EGL-related keywords (e.g.,
"EGL", "egl", "NoDisplay", "Could not initialize GLX", etc.) and only perform
the warm-up success prints and return in that case; for any other exception
message, re-raise the exception (or log and raise) so real build errors (missing
MJCF, invalid scene) are not silently ignored; keep references to scene.build,
t0, build_elapsed, and e when implementing the conditional handling.

In `@scripts/test-genesis-affinetes.py`:
- Around line 164-169: The import of the traceback module is inside the except
block causing PLC0415; move "import traceback" to the top-level imports near the
other imports in the file and remove the inline import from the except Exception
as e handler so the except block only prints the error and calls
traceback.print_exc() (now available from the top-level import) before
sys.exit(1).
🧹 Nitpick comments (4)
environments/genesis/warmup_kernel_cache.py (1)

26-26: Add a brief comment explaining the getattr indirection.

gs.cpu would be more readable. If this is working around a type-checker limitation (e.g., ty or mypy not resolving the attribute on the genesis module), a short comment will prevent someone from "fixing" it later.

-    gs.init(backend=getattr(gs, "cpu"))
+    gs.init(backend=getattr(gs, "cpu"))  # getattr avoids unresolved-attribute from type checker
environments/genesis/Dockerfile (1)

58-64: Consider removing the .git directory after checkout for a smaller image layer.

After sparse checkout and pinning to a specific commit, the git history is unnecessary. You could save image size by removing the .git directory:

     git sparse-checkout set unitree_g1 && \
-    echo "Menagerie assets installed"
+    rm -rf .git && \
+    echo "Menagerie assets installed"
scripts/test-genesis-affinetes.py (2)

29-35: Add return type annotations to cleanup() and main().

Both are public functions missing -> None.

-def cleanup():
+def cleanup() -> None:
-def main():
+def main() -> None:

As per coding guidelines, "Add return types to public functions and methods."


93-110: call_evaluate return type dict could be more specific.

Minor: dict without type parameters is less informative. Consider dict[str, Any] to match the JSON structure, though for a test script this is low priority.

Comment on lines 75 to 86

try:
    scene.build(n_envs=1)
except Exception as e:
    # Genesis compiles physics kernels before building the visualizer.
    # Without EGL (typical in docker build), the visualizer init fails,
    # but the physics kernels are already compiled and will be cached
    # by Genesis's atexit handler.
    build_elapsed = time.perf_counter() - t0
    print(f"[warmup] scene.build() physics kernels compiled in {build_elapsed:.1f}s")
    print(f"[warmup] Visualizer init failed (expected without EGL): {type(e).__name__}")
    print("[warmup] Kernel cache warm-up complete (cached on process exit)")
    return

⚠️ Potential issue | 🟠 Major

Broad Exception catch may silently swallow non-EGL build errors.

If the MJCF file is missing or the scene definition is invalid, this handler will print a misleading "Kernel cache warm-up complete" message and return success. Consider narrowing the catch or at least checking the exception message for EGL-related keywords before treating it as expected.

Proposed fix
     try:
         scene.build(n_envs=1)
     except Exception as e:
-        # Genesis compiles physics kernels before building the visualizer.
-        # Without EGL (typical in docker build), the visualizer init fails,
-        # but the physics kernels are already compiled and will be cached
-        # by Genesis's atexit handler.
         build_elapsed = time.perf_counter() - t0
-        print(f"[warmup] scene.build() physics kernels compiled in {build_elapsed:.1f}s")
-        print(f"[warmup] Visualizer init failed (expected without EGL): {type(e).__name__}")
-        print("[warmup] Kernel cache warm-up complete (cached on process exit)")
-        return
+        err_msg = str(e).lower()
+        if "egl" in err_msg or "display" in err_msg or "visualizer" in err_msg:
+            print(f"[warmup] scene.build() physics kernels compiled in {build_elapsed:.1f}s")
+            print(f"[warmup] Visualizer init failed (expected without EGL): {type(e).__name__}")
+            print("[warmup] Kernel cache warm-up complete (cached on process exit)")
+            return
+        # Unexpected error — re-raise so the Docker build fails visibly
+        raise

Comment on lines 164 to 169

except Exception as e:
    print(f"\nERROR: {type(e).__name__}: {e}", file=sys.stderr)
    import traceback

    traceback.print_exc()
    sys.exit(1)

⚠️ Potential issue | 🔴 Critical

Move import traceback to the top of the file — pipeline failure.

The ruff check (PLC0415) flags this import inside the exception handler. Move it to the top-level imports to fix the CI failure.

Proposed fix

At the top of the file:

 import json
 import subprocess
 import sys
 import time
+import traceback

In the exception handler:

     except Exception as e:
         print(f"\nERROR: {type(e).__name__}: {e}", file=sys.stderr)
-        import traceback
-
         traceback.print_exc()
         sys.exit(1)
🧰 Tools
🪛 GitHub Actions: ci

[error] 166-166: PLC0415: import should be at the top-level of a file. The import of traceback is occurring inside an exception handler.


[error] 166-166: ruff check failed due to PLC0415: import should be at the top-level of a file.


Bake compiled physics kernels into the Docker image layer to avoid
~36s of JIT compilation on every fresh container startup. A warmup
script builds a minimal headless scene (ground plane, G1 robot, all
primitive morph types) during `docker build`, triggering kernel
compilation. The cache layer is placed before app code copies to
maximize Docker layer caching.
Auto-detect EGL availability at runtime via ctypes probe and fall back
to OSMesa when no GPU is present (e.g. Docker Desktop, CPU-only hosts).
Use OSMesa during docker build warmup so both physics and rendering
kernels get fully cached.
The warmup_kernel_cache.py file uses genesis APIs that are only
available inside the Docker container. Exclude it from ty checking
to fix unresolved-attribute errors in CI.
@coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@kinitro/environments/genesis/base.py`:
- Around line 601-612: The code slices joint_action to n but doesn't check for
undersized arrays, which will cause a NumPy broadcast error when computing
target_pos; update the handling in the block that uses
action.get_continuous(ActionKeys.JOINT_POS_TARGET) to validate that joint_action
has at least n elements (where n = self._robot_config.num_actuated_dofs) before
slicing, and if it's shorter either raise a descriptive ValueError (including
the received length and expected n) or explicitly pad the array to length n
(e.g., with zeros) before np.clip and the computation of target_pos =
self._default_dof_pos + joint_action * self._action_scale so broadcasting is
always valid.
🧹 Nitpick comments (5)
environments/genesis/warmup_kernel_cache.py (2)

29-29: getattr(gs, "cpu") works around ty checker limitations but obscures intent.

This is a reasonable workaround for the ty check unresolved-attribute errors on the genesis module (which uses dynamic attribute injection). A brief comment explaining why getattr is used would help future readers.

Suggested clarification
-    gs.init(backend=getattr(gs, "cpu"))
+    # Use getattr to avoid false-positive from ty check (genesis uses dynamic attrs)
+    gs.init(backend=getattr(gs, "cpu"))

25-25: Extract the G1 MJCF path to a shared constant to prevent divergence.

The string "unitree_g1/g1_with_hands.xml" is defined in two places: as MENAGERIE_ROBOT in warmup_kernel_cache.py (line 25) and hardcoded in G1_CONFIG.mjcf_path in robot_config.py (line 157). If the path changes in one location but not the other, the warmup script will compile kernels for the wrong robot model at runtime. Consider extracting this to a shared constant module or importing it from a single source of truth.

environments/genesis/Dockerfile (1)

46-49: NVIDIA_DRIVER_CAPABILITIES=all is broader than needed.

For headless rendering, NVIDIA_DRIVER_CAPABILITIES=graphics,compute,utility would be sufficient and follows the principle of least privilege. all exposes video encode/decode and other capabilities unnecessarily.

Suggested change
-ENV NVIDIA_DRIVER_CAPABILITIES=all
+ENV NVIDIA_DRIVER_CAPABILITIES=graphics,compute,utility
kinitro/environments/genesis/base.py (2)

290-293: Redundant local import — SceneConfig and SceneObjectConfig are already imported at module level (line 17).

Both symbols are unconditionally imported at the top of the file, so this inner import is unnecessary.

Suggested fix
-        # Reconstruct SceneConfig
-        from kinitro.environments.genesis.scene_generator import (  # noqa: PLC0415
-            SceneConfig,
-            SceneObjectConfig,
-        )
-
-        obj_configs = [
+        obj_configs = [

544-545: Accessing private _attached_link on the Genesis camera is fragile.

This relies on an internal implementation detail of the Genesis Camera class. If Genesis renames or restructures this attribute, this code breaks silently (the hasattr check would just skip the move_to_attach call, degrading camera tracking without any error).

Consider either: (a) always calling move_to_attach() unconditionally (it should be a no-op when no link is attached), or (b) tracking the attachment state locally via a boolean flag set in _attach_ego_camera.

Option (b): track attachment state locally

Add a field in __init__:

self._camera_attached: bool = False

Set it in _attach_ego_camera after successful attachment:

self._camera_attached = True

Then here:

-            if self._camera._attached_link is not None:
+            if self._camera_attached:
                 self._camera.move_to_attach()

Comment on lines +601 to +612

        joint_action = action.get_continuous(ActionKeys.JOINT_POS_TARGET)
        if joint_action is None:
            return

        n = self._robot_config.num_actuated_dofs
        joint_action = joint_action[:n]

        # Clip to [-1, 1]
        joint_action = np.clip(joint_action, -1.0, 1.0)

        # Map to PD targets: target = default_pos + action * action_scale
        target_pos = self._default_dof_pos + joint_action * self._action_scale

⚠️ Potential issue | 🟡 Minor

No validation that the action array matches the expected DOF count.

Line 606 truncates to n elements but doesn't guard against an undersized array. If joint_action has fewer than n elements, the arithmetic on line 612 will raise a NumPy broadcast error because _default_dof_pos (shape (n,)) and joint_action (shape (<n,)) are incompatible.

Suggested defensive check
         joint_action = joint_action[:n]
+        if joint_action.shape[0] != n:
+            logger.warning(
+                "action_shape_mismatch",
+                expected=n,
+                got=joint_action.shape[0],
+            )
+            padded = np.zeros(n, dtype=np.float32)
+            padded[: joint_action.shape[0]] = joint_action
+            joint_action = padded

- Dockerfile: trim header, shallow clone menagerie, simplify comments
- env.py: make base_url required, rename endpoint->path, add EvalResult
  TypedDict, fix placeholder, clarify binary scoring in docstring
- metadata.json: pretty-print
- base.py: narrow GPU fallback to RuntimeError, remove section headers,
  remove redundant inline import, add deferred import comments
- scene_generator.py: make min_dist_from_center a constructor param,
  derive fallback position from it, add deferred import comment
- task_types.py: convert if/elif to match/case, add RGBA TODO
camera.render() was failing silently because Genesis pyrender hardcodes
OSMesaPlatform.supports_framebuffers() = False, sending rendering down
a broken non-FBO code path where _forward_pass() returns None.

- Dockerfile: libosmesa6 -> libosmesa6-dev for GL 3.3 core profile
- base.py: patch OSMesaPlatform.supports_framebuffers to return True
  when using OSMesa, add _validate_camera() for early detection, add
  _camera_available flag to avoid duplicate warnings, improve logging
- warmup: add camera render validation to fail-fast during docker build
@coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@kinitro/environments/genesis/base.py`:
- Around line 99-102: Remove the unused type suppression on the dynamic
assignment to OSMesaPlatform.supports_framebuffers: replace the line
"OSMesaPlatform.supports_framebuffers = lambda self: True  # type:
ignore[assignment]" with "OSMesaPlatform.supports_framebuffers = lambda self:
True" (no type: ignore comment) so the CI `ty check` no longer flags an unused
suppression; leave the surrounding import and
logger.info("osmesa_framebuffer_patch_applied") unchanged.
🧹 Nitpick comments (5)
environments/genesis/warmup_kernel_cache.py (2)

33-33: Line exceeds target length (~113 chars).

Move the comment to a separate line above the constant.

Suggested fix
+# Small image size to minimize OSMesa rendering time during warm-up,
+# while still triggering all relevant kernels.
-IMAGE_SIZE = 84  # Small image size to minimize OSMesa rendering time during warm-up, while still triggering all relevant kernels.
+IMAGE_SIZE = 84

As per coding guidelines, "Target line length of 100 characters (Ruff ignores E501 but keep lines reasonable)."


37-37: Prefer direct attribute access over getattr without a default.

getattr(gs, "cpu") with no default behaves identically to gs.cpu (raises AttributeError if missing) but obscures the intent and defeats static analysis / IDE autocomplete. If this is to silence a type checker, a # type: ignore comment is more conventional.

Suggested fix
-    gs.init(backend=getattr(gs, "cpu"))
+    gs.init(backend=gs.cpu)  # type: ignore[attr-defined]
environments/genesis/Dockerfile (1)

39-47: Consider removing the .git directory after checkout to reduce image size.

After the sparse shallow clone completes, the .git metadata is no longer needed at runtime. Removing it trims the layer.

Suggested fix
     git fetch --depth 1 origin ${MENAGERIE_COMMIT} && \
-    git checkout FETCH_HEAD
+    git checkout FETCH_HEAD && \
+    rm -rf /opt/menagerie/.git
kinitro/environments/genesis/base.py (2)

121-127: getattr(gs, "gpu") obscures intent — consider a comment or # type: ignore.

Using getattr(gs, "gpu") instead of gs.gpu is presumably to silence the type checker for the unresolvable genesis module. A brief inline comment explaining this (or a targeted # type: ignore[attr-defined] on direct access) would be clearer for future readers.


590-593: Accessing private _attached_link is fragile.

Line 592 reads self._camera._attached_link, a private attribute of the Genesis camera. This could break silently on a Genesis version upgrade. Consider tracking the attachment state internally (e.g., a self._camera_attached flag set in _attach_ego_camera) instead of relying on the library's private API.

The direct import of genesis.ext.pyrender is unresolvable in CI where
genesis is not installed. Use importlib.import_module + setattr to keep
the patch opaque to the static type checker.
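A sketch of that opaque patch; the exact submodule holding OSMesaPlatform is a guess based on pyrender's layout, so verify it against the installed Genesis version:

import importlib

def _patch_osmesa_framebuffers() -> None:
    try:
        osmesa = importlib.import_module("genesis.ext.pyrender.platforms.osmesa")
    except ImportError:
        return  # genesis not installed (e.g. CI); nothing to patch
    platform_cls = getattr(osmesa, "OSMesaPlatform", None)
    if platform_cls is not None:
        # pyrender hardcodes supports_framebuffers() = False for OSMesa, which
        # routes rendering down a non-FBO path that returns None frames.
        setattr(platform_cls, "supports_framebuffers", lambda self: True)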
The previous EGL probe only checked eglGetDisplay + eglInitialize, which
succeed on Mesa's software EGL even without a GPU. Genesis then crashes
on eglCreateContext with EGL_BAD_MATCH. Now the probe also calls
eglChooseConfig to verify renderable configs exist before committing to
EGL, falling back to OSMesa immediately when no GPU is available.