feat(genesis): add Genesis physics simulation environment family#33
Conversation
Adds a new environment family powered by the Genesis simulation engine, with a framework-based architecture where GenesisBaseEnvironment handles all boilerplate (engine init, scene building, first-person ego camera, PD control pipeline) and thin subclasses define per-robot behavior. First concrete environment: Unitree G1 humanoid (43 actuated DOFs) with procedural scene generation (flat ground, 3-6 primitive objects) and four task types (navigate, pickup, place, push). Includes CLI --viewer and --seed flags for `kinitro env test`, registry **kwargs passthrough for env factory options, and 72 unit tests covering all pure-logic modules.
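As a rough illustration of that framework split, a thin per-robot subclass might look like the sketch below (hypothetical class and constructor signature; the real G1 environment lives in kinitro/environments/genesis/envs/g1_humanoid.py and its hooks may differ):

```python
from kinitro.environments.genesis.base import GenesisBaseEnvironment
from kinitro.environments.genesis.robot_config import G1_CONFIG


class MyRobotEnvironment(GenesisBaseEnvironment):
    """Thin per-robot subclass: robot config plus task hooks, no engine boilerplate."""

    def __init__(self, **kwargs) -> None:
        # Constructor signature is an assumption for illustration only.
        super().__init__(robot_config=G1_CONFIG, **kwargs)

    def _check_success(self, task, obs) -> bool:
        # e.g. robot base within some tolerance of the navigation target
        raise NotImplementedError

    def _compute_reward(self, task, obs) -> float:
        # dense shaping toward the task goal
        raise NotImplementedError
```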
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
kinitro/cli/env/commands.py (1)
51-69: ⚠️ Potential issue | 🟡 Minor — Docstring doesn't mention `genesis` as a supported family.
The `build_env` docstring lists only `metaworld` and `procthor` under "Environment families", but `genesis` was added to `AVAILABLE_ENV_FAMILIES` on Line 24.
🤖 Fix all issues with AI agents
In `@environments/genesis/Dockerfile`:
- Line 21: The Dockerfile currently uses the root user (FROM python:3.11-slim)
which Trivy flags; add a non-root user and group before the final
COPY/ENTRYPOINT steps, create the user (e.g., appuser), chown the application
directories/files to that user, and switch to it with USER appuser to run the
container non-root; ensure any runtime directories (logs, tmp, venv path) have
appropriate ownership so the processes started from the existing COPY/ENTRYPOINT
commands can write to them.
In `@environments/genesis/env.py`:
- Around line 80-85: The connect timeout is hard-coded to 5.0s via
httpx.Timeout(timeout, connect=5.0) which can exceed the per-request action
timeout (variable timeout) and block the evaluation; update the call that
creates httpx.AsyncClient to compute a connect timeout based on the request
timeout (e.g., connect_timeout = min(timeout, max(0.1, timeout * 0.5))) and pass
that into httpx.Timeout, or alternatively create and reuse a single
httpx.AsyncClient per evaluation run (instead of creating one inside the request
call) so TCP setup is amortized; adjust the code around httpx.AsyncClient and
httpx.Timeout(timeout, connect=...) (the block that posts and returns
resp.json() after resp.raise_for_status) to use the new connect timeout or
shared client.
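A minimal sketch of the first option, assuming the miner call looks roughly like the block described above (the function name and /act path come from this PR's discussion; everything else is illustrative):

```python
import httpx


async def _call_miner(base_url: str, payload: dict, timeout: float) -> dict:
    # Derive the connect timeout from the per-request action timeout instead of a
    # hard-coded 5.0s, so TCP setup can never exceed the overall request budget.
    connect_timeout = min(timeout, max(0.1, timeout * 0.5))
    async with httpx.AsyncClient(
        timeout=httpx.Timeout(timeout, connect=connect_timeout)
    ) as client:
        resp = await client.post(f"{base_url}/act", json=payload)
        resp.raise_for_status()
        return resp.json()
```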
In `@kinitro/environments/genesis/base.py`:
- Around line 78-82: Add a brief comment above the lazy-initialized attributes
_scene, _robot, _camera, and _object_entities explaining why Any is used (these
are Genesis-specific types that can't be statically imported because the genesis
package is optional/conditional at runtime), so the use of Any is intentional
for runtime-only resolution; place the comment immediately above the
declarations for _scene, _robot, _camera, and _object_entities and mention that
they will hold Genesis scene/robot/camera/object-entity instances once
initialized.
- Around line 280-298: The code currently lets fallen (from _check_fallen)
overwrite a success reward, producing a conflicting signal when both success and
fallen occur; update the logic so success takes priority: after computing
_episode_success via _check_success and reward via _compute_reward, only apply
the fallen penalty (setting reward = -1.0 and/or clearing _episode_success) if
not _episode_success — i.e., change the fallen override so it checks
_episode_success before modifying reward (or explicitly set _episode_success =
False when fallen if you prefer the alternative). Ensure done calculation (which
uses _episode_success, fallen, and _episode_steps) remains consistent with this
change.
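A sketch of the suggested ordering, using attribute names from the review (`_max_steps` is an assumed step-limit attribute):

```python
def _evaluate_step_outcome(self) -> tuple[float, bool]:
    """Hypothetical helper illustrating the priority: success beats the fallen penalty."""
    self._episode_success = self._check_success()
    reward = self._compute_reward()
    fallen = self._check_fallen()

    if fallen and not self._episode_success:
        reward = -1.0  # fall penalty only when the episode has not already succeeded

    # done stays consistent: success, a fall, or the step limit ends the episode
    done = self._episode_success or fallen or self._episode_steps >= self._max_steps
    return reward, done
```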
In `@kinitro/environments/genesis/envs/g1_humanoid.py`:
- Around line 14-22: Update the G1Environment class docstring to correct the
number of actuated DOFs from "37" to "43" to match G1_CONFIG.num_actuated_dofs;
locate the docstring in the G1Environment class and replace the phrase "37
actuated DOFs" with "43 actuated DOFs" so the documentation aligns with the
robot config.
In `@kinitro/environments/genesis/scene_generator.py`:
- Around line 100-140: The computation of num_pickupable in _generate_objects
can equal num_objects, leaving no landmark; change the logic in
SceneGenerator._generate_objects to enforce 1 <= num_pickupable <= num_objects-1
(or raise if num_objects < 2). Concretely, after computing the random count,
clamp it with num_pickupable = min(num_pickupable, max(1, num_objects - 1)) (or
validate and raise ValueError when num_objects < 2) so the loop always creates
at least one pickupable and one landmark.
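A small sketch of the clamped split (hypothetical helper name; the real logic lives inside `SceneGenerator._generate_objects`):

```python
import numpy as np


def split_object_counts(num_objects: int, rng: np.random.Generator) -> tuple[int, int]:
    """Guarantee at least one pickupable and one landmark object."""
    if num_objects < 2:
        raise ValueError("need at least 2 objects: one pickupable and one landmark")
    num_pickupable = int(rng.integers(1, num_objects))             # 1 .. num_objects - 1
    num_pickupable = min(num_pickupable, max(1, num_objects - 1))  # defensive clamp from the review
    return num_pickupable, num_objects - num_pickupable
```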
🧹 Nitpick comments (16)
environments/genesis/requirements.txt (1)
1-8: Pin dependency versions for reproducible Docker builds.
Most dependencies are unpinned, which makes Docker image builds non-deterministic. `torch` in particular is version-sensitive (CUDA compatibility, API changes) and very large — an unexpected major version bump could silently break the image or balloon its size. Consider pinning at least `torch`, `numpy`, and `genesis-world` to known-good versions (or use `>=X,<Y` bounds).
kinitro/environments/genesis/robot_config.py (1)
8-35: Consider adding `__post_init__` validation for array length consistency.
If `joint_names`, `default_dof_pos`, or `action_scale` don't match `num_actuated_dofs`, the mismatch will surface as a cryptic indexing error deep in the simulation loop. A quick post-init check would catch misconfiguration early.
🛡️ Proposed validation
 @dataclass
 class RobotConfig:
     ...
     supported_task_types: list[str] = field(default_factory=list)
+
+    def __post_init__(self) -> None:
+        n = self.num_actuated_dofs
+        if len(self.joint_names) != n:
+            raise ValueError(f"joint_names has {len(self.joint_names)} entries, expected {n}")
+        if len(self.default_dof_pos) != n:
+            raise ValueError(f"default_dof_pos has {len(self.default_dof_pos)} entries, expected {n}")
+        if len(self.action_scale) != n:
+            raise ValueError(f"action_scale has {len(self.action_scale)} entries, expected {n}")
kinitro/environments/genesis/scene_generator.py (2)
219-220: Silent skip of unknown object types — add a warning log.
If an unrecognized `object_type` appears (e.g., from a future `OBJECT_TYPES` expansion), `build_scene` silently drops it. This will be confusing to debug. The `logger` is already imported.
Proposed fix
 else:
-    continue
+    logger.warning(
+        "unknown_object_type_skipped",
+        object_type=obj_config.object_type,
+        object_id=obj_config.object_id,
+    )
+    continue
168-178: Add a comment explaining the `Any` type for `gs_scene`.
`gs_scene` uses `Any` because `genesis` is a runtime-only dependency (deferred import on Line 178). As per coding guidelines, "Avoid `Any` type hints unless required; if used, explain why in code comments."
Proposed fix
-    def build_scene(self, gs_scene: Any, scene_config: SceneConfig) -> list[Any]:
+    def build_scene(self, gs_scene: Any, scene_config: SceneConfig) -> list[Any]:  # Any: genesis types are runtime-only
environments/genesis/Dockerfile (1)
58-62: Pin the Menagerie clone to a specific commit SHA for reproducible builds.
Cloning `HEAD` of `main` without a commit pin means MJCF assets can change between builds, potentially breaking the G1 joint ordering or DOF count that `robot_config.py` relies on.
Proposed fix
-RUN git clone --depth 1 --filter=blob:none --sparse \
-    https://github.com/google-deepmind/mujoco_menagerie.git /opt/menagerie && \
+ARG MENAGERIE_COMMIT=<pin-to-known-good-sha>
+RUN git clone --filter=blob:none --sparse \
+    https://github.com/google-deepmind/mujoco_menagerie.git /opt/menagerie && \
     cd /opt/menagerie && \
+    git checkout ${MENAGERIE_COMMIT} && \
     git sparse-checkout set unitree_g1 && \
     echo "Menagerie assets installed"
kinitro/cli/env/commands.py (1)
316-333: String-based env family dispatch for action construction.
Branching on `env_id.startswith("genesis/")` to decide which action keys to use couples this command to naming conventions. Consider introspecting the environment (e.g., checking supported `ActionKeys` or the action shape) to make this more extensible as new families are added.
That said, this works fine for now and the test command is a developer tool, so not critical.
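A rough sketch of the capability-based alternative; `supported_action_keys()` and `num_actuated_dofs` are hypothetical accessors, not the actual kinitro API:

```python
def build_test_action(env) -> dict:
    # Dispatch on what the environment actually accepts rather than on the
    # "genesis/" prefix of env_id.
    keys = getattr(env, "supported_action_keys", lambda: set())()
    if "joint_pos_target" in keys:
        return {"joint_pos_target": [0.0] * getattr(env, "num_actuated_dofs", 43)}
    return {"continuous": [0.0, 0.0]}  # fall back to the existing non-genesis default
```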
environments/genesis/env.py (2)
47-48: Missing return type annotations on several methods.
`__init__`, `_get_env`, and `cleanup` lack return type annotations. As per coding guidelines, public functions and methods should have return types.
Proposed fix
-    def __init__(self):
+    def __init__(self) -> None:
-    def _get_env(self, env_id: str):
+    def _get_env(self, env_id: str) -> RoboticsEnvironment:
(This requires importing `RoboticsEnvironment` from `kinitro.environments.base`.)
-    async def cleanup(self):
+    async def cleanup(self) -> None:
As per coding guidelines: "Add return types to public functions and methods."
Also applies to: 54-58, 338-348
343-347: Silent exception swallowing during cleanup.
`except Exception: pass` hides cleanup failures. Consider logging at `debug` or `warning` level to aid troubleshooting in container environments.
Proposed fix
     try:
         env.close()
-    except Exception:
-        pass
+    except Exception as e:
+        logger.debug("env_close_error", env_id=env_id, error=str(e))
(Note: you'd need to iterate with the env_id key as well.)
kinitro/environments/genesis/task_generator.py (2)
66-68: `_get_landmark_objects` is defined but never called.
`_generate_place_task` (Line 231) duplicates the same filter logic inline: `[obj for obj in objects if not obj.pickupable]`. Either use the helper or remove it.
Proposed fix: use the helper in `_generate_place_task` at Line 231:
-    destinations = [obj for obj in objects if not obj.pickupable]
+    destinations = _get_landmark_objects(objects)
120-131: Retry loop always re-randomizes the task type on failure.
After a failed attempt, Line 131 picks a new random task type, which may select the same type again. This is fine probabilistically over 20 attempts, but if the scene only makes one task type infeasible, the retry wastes attempts on it. Not a bug, just slightly suboptimal convergence.
kinitro/environments/genesis/base.py (5)
33-38: GPU fallback: if `gs.init(gpu)` partially initializes and then throws, is it safe to call `gs.init(cpu)` next?
Some engines don't support re-initialization after a partial failure. If Genesis doesn't handle this gracefully, the process could be left in a corrupted state. Consider logging the GPU failure exception for easier debugging.
💡 Suggested improvement
 try:
     gs.init(backend=getattr(gs, "gpu"))
     logger.info("genesis_initialized", backend="gpu")
-except Exception:
+except Exception as e:
+    logger.info("genesis_gpu_init_failed", error=str(e))
     gs.init(backend=getattr(gs, "cpu"))
     logger.info("genesis_initialized", backend="cpu")
224-228: `SceneObjectConfig` could be imported at module level alongside `SceneConfig`.
`SceneConfig` is already a top-level import (line 14). Only `SceneObjectConfig` is new here. Moving it to the top-level import block would be more consistent and avoids the inline import.
337-341: Silent exception swallowing during scene teardown.
Unlike `close()` (line 322) which logs the error, `_build_scene` silently swallows the destroy failure. During development, this hides useful diagnostics.
💡 Suggested improvement
 if self._scene is not None:
     try:
         self._scene.destroy()
-    except Exception:
-        pass
+    except Exception as e:
+        logger.debug("scene_destroy_on_rebuild", error=str(e))
459-470: Silent exception swallowing hides broken object tracking.
If `entity.get_pos()` fails, the object is silently dropped from `states`. At minimum, log a debug message so issues with entity tracking are diagnosable.
💡 Suggested improvement
 try:
     pos = entity.get_pos().cpu().numpy().flatten()
     states[obj_id] = pos
-except Exception:
-    pass
+except Exception as e:
+    logger.debug("object_state_read_failed", object_id=obj_id, error=str(e))
546-550: Repeated `np.array` allocation on every step call.
`default_pos` and `action_scale` are re-created from Python lists every control step. Since `_robot_config` is immutable within an episode, these could be cached once (e.g., in `_build_scene` or `__init__`).
💡 Suggested improvement
Cache in `__init__`:
+    # Pre-compute for action pipeline
+    self._default_dof_pos = np.array(robot_config.default_dof_pos, dtype=np.float32)
+    self._action_scale = np.array(robot_config.action_scale, dtype=np.float32)
Then in `_apply_action`:
-    default_pos = np.array(self._robot_config.default_dof_pos, dtype=np.float32)
-    action_scale = np.array(self._robot_config.action_scale, dtype=np.float32)
-
-    target_pos = default_pos + joint_action * action_scale
+    target_pos = self._default_dof_pos + joint_action * self._action_scale
tests/unit/test_genesis.py (1)
609-614: `object.__new__()` bypass is fragile if `_compute_reward` ever accesses `self` attributes.
Using `object.__new__(G1Environment)` skips `__init__`, so no instance attributes are set. This works currently because `_compute_reward` only uses its parameters, but it'll break silently if the implementation changes to reference `self`. Consider adding a brief comment noting this assumption.
- Fix success/fallen reward conflict: success now takes priority over fallen penalty to avoid contradictory RL signal
- Scale connect timeout with action timeout instead of hardcoding 5s
- Fix G1 docstring DOF count (37 → 43) to match robot config
- Clamp num_pickupable to ensure at least one landmark object
- Add non-root container user for security
- Pin Menagerie clone to specific commit for reproducible builds
- Add explanatory comments for Any type hints (runtime-only genesis dep)
- Replace silent except-pass with debug logging in scene teardown, object state reads, and env cleanup
- Cache default_dof_pos and action_scale arrays to avoid per-step allocation
- Use _get_landmark_objects helper instead of inline filter in place task
- Add RobotConfig __post_init__ validation for array length consistency
- Add return type annotations to Actor methods in env.py
- Pin dependency versions in requirements.txt for reproducible builds
- Add genesis to build_env docstring and examples
- Log warning for unknown object types in scene builder
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@environments/genesis/env.py`:
- Line 83: The long line creating the AsyncClient exceeds the formatter's max
length; in the function where httpx.AsyncClient is used (the line with "async
with httpx.AsyncClient(timeout=httpx.Timeout(timeout, connect=min(timeout * 2,
5.0))) as client:"), break the call across multiple lines so each argument is
under the line-length limit—e.g., put the timeout=... and its nested
httpx.Timeout(...) on separate indented lines and move the connect=min(...)
argument to its own line—so Ruff will accept the file and readability is
improved.
- Around line 135-149: The code currently returns an error dict for an invalid
env_id but raises a ValueError for a missing base_url, causing inconsistent
behavior for callers like affinetes; update the base_url validation to mirror
the env_id path by returning the same structured error dict via
_build_error_result (use the same fields: env_id, task_id, seed or task_id,
start_time and an error string mentioning missing base_url/miner endpoint) so
both validation failures consistently return a result object rather than
raising, or alternatively move the base_url check inside the existing try/except
that handles other errors — modify the check near the base_url validation to use
_build_error_result (referencing env_id, base_url, seed, task_id and
_build_error_result) to ensure uniform error handling.
- Around line 55-59: The _get_env cache returns a shared RoboticsEnvironment
from _env_cache causing concurrent evaluate() calls to race on environment
methods reset() and step(); fix by adding a per-env-id asyncio.Lock (e.g.,
_env_locks dict keyed by env_id) and acquire/release the corresponding lock
around any shared environment mutations in evaluate() (or alternatively stop
caching and return a fresh environment from get_environment(env_id) each
evaluate call); update _get_env and evaluate() to use the new locking strategy
(referencing _get_env, _env_cache, evaluate(), reset(), step(), and the per-env
asyncio.Lock) so two concurrent calls for the same env_id are serialized.
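A minimal sketch of the per-env-id locking strategy (the class and method names mirror the review's description; the evaluation body itself is elided):

```python
import asyncio


class EvaluationActor:
    """Partial sketch; only the locking strategy from the review is shown."""

    def __init__(self) -> None:
        self._env_cache: dict[str, object] = {}
        self._env_locks: dict[str, asyncio.Lock] = {}

    async def evaluate(self, env_id: str, task_config: dict) -> dict:
        # setdefault is an atomic upsert here, so two coroutines racing on the
        # same env_id always end up awaiting the same Lock instance.
        lock = self._env_locks.setdefault(env_id, asyncio.Lock())
        async with lock:
            # the cached environment for env_id is now used serially:
            # env.reset(task_config) ... env.step(action) ...
            return {"env_id": env_id}
```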
In `@kinitro/environments/genesis/robot_config.py`:
- Around line 1-175: CI failure: file needs ruff formatting. Run `ruff format
kinitro/environments/genesis/robot_config.py` (or run ruff format on the repo),
review changes to ensure RobotConfig, G1_JOINT_NAMES, G1_DEFAULT_DOF_POS,
G1_ACTION_SCALE and G1_CONFIG formatting is preserved, then add and commit the
reformatted file to the PR.
In `@kinitro/environments/genesis/task_generator.py`:
- Around line 247-258: The initial_state is storing a direct mutable reference
to target.position (a list) in the TaskSpec returned by the PLACE generator (and
likewise in the PUSH generator), so mutate of the scene later will change the
"initial" snapshot; fix by copying the list when constructing initial_state
(e.g., use a shallow copy of target.position) so initial_target_pos is an
independent snapshot in the TaskSpec returned by the PLACE and PUSH generators
(look for places that set initial_state={"initial_target_pos": target.position}
and replace with a copy).
🧹 Nitpick comments (12)
kinitro/environments/genesis/scene_generator.py (1)
148-167: Silent fallback position when position generation exhausts retries.
If the 50-iteration loop fails to find a valid position, the fallback at Line 167 is returned silently. This could mask placement issues (e.g., an arena that's too small or a `min_dist_from_center` that's too large relative to the arena). A debug log would help diagnose scene generation anomalies.
Suggested fix
     # Fallback position
+    logger.debug(
+        "object_position_fallback",
+        pickupable=pickupable,
+        arena_size=self._arena_size,
+    )
     return [1.5, 0.0, 0.05]
kinitro/environments/genesis/robot_config.py (1)
23-25: Mutable list defaults shared with module-level constants.
`joint_names`, `default_dof_pos`, and `action_scale` are plain `list` fields. The dataclass stores a direct reference to the lists passed in (e.g., `G1_JOINT_NAMES`). If anything mutates those lists after construction, both the module constant and the config instance are affected. Low risk since these are treated as constants, but worth noting if configs are ever cloned or modified at runtime.
kinitro/environments/genesis/task_generator.py (1)
182-189: Hardcoded robot start position `[0.0, 0.0, 0.75]`.
Line 188 hardcodes the G1's spawn height. This value should come from `robot_config.init_pos` (which is `(0.0, 0.0, 0.75)` for G1) to stay consistent if another robot is added. However, `robot_config` isn't passed into `_generate_navigate_task`. Consider threading it through or removing this field if it's unused downstream.
kinitro/environments/genesis/base.py (4)
482-486: Accessing private `_attached_link` on the Genesis camera object.
`self._camera._attached_link` relies on an internal attribute of the Genesis camera. If the Genesis library refactors this, it will silently break. Consider wrapping this in a `hasattr` check or caching the attached state locally.
Suggested defensive approach
-        if self._camera._attached_link is not None:
+        if getattr(self._camera, "_attached_link", None) is not None:
229-233: Redundant local re-import of `SceneConfig`.
`SceneConfig` is already imported at the top of the file (Line 14). Only `SceneObjectConfig` needs the local import here.
Suggested fix
 from kinitro.environments.genesis.scene_generator import (  # noqa: PLC0415
-    SceneConfig,
     SceneObjectConfig,
 )
535-556: `actuated_dof_idx` is recomputed every step.
`list(range(6, 6 + n))` at Line 555 allocates a new list on every `_apply_action` call. Since `n` is fixed per environment instance, cache it alongside `_default_dof_pos` and `_action_scale` in `__init__`.
Suggested fix
 # In __init__:
 self._default_dof_pos = np.array(robot_config.default_dof_pos, dtype=np.float32)
 self._action_scale = np.array(robot_config.action_scale, dtype=np.float32)
+self._actuated_dof_idx = list(range(6, 6 + robot_config.num_actuated_dofs))
Then use `self._actuated_dof_idx` in both `_apply_action` (Line 555) and `_build_scene` (Line 389).
33-40: Broad `except Exception` swallows GPU initialization errors indiscriminately.
If `gs.init(backend=gs.gpu)` fails for a non-trivial reason (e.g., driver incompatibility, OOM), the code silently falls back to CPU without logging the GPU error. At minimum, log the exception at debug/warning level so users can diagnose unexpected CPU fallback.
Suggested fix
 try:
     gs.init(backend=getattr(gs, "gpu"))
     logger.info("genesis_initialized", backend="gpu")
-except Exception:
+except Exception as e:
+    logger.info("genesis_gpu_init_failed_falling_back_to_cpu", error=str(e))
     gs.init(backend=getattr(gs, "cpu"))
     logger.info("genesis_initialized", backend="cpu")
environments/genesis/env.py (2)
166-173: Full traceback included in error result — consider truncating for production.
`traceback.format_exc()` on Line 172 embeds the full Python stack trace into the result dict. If results are persisted or shared externally, this leaks internal file paths and code structure. Consider logging the full traceback server-side and returning only the exception summary in the result.
Proposed fix
 except Exception as e:
+    logger.error(
+        "evaluation_failed",
+        env_id=env_id,
+        task_id=task_id,
+        error=str(e),
+        exc_info=True,
+    )
     return self._build_error_result(
         env_id=env_id,
         task_id=task_id,
         seed=seed,
         start_time=start_time,
-        error=f"{type(e).__name__}: {str(e)}\n{traceback.format_exc()}",
+        error=f"{type(e).__name__}: {str(e)}",
     )
92-104: Consider a `TypedDict` for the evaluation result shape.
`evaluate`, `_run_evaluation`, and `_build_error_result` all return bare `dict`. The result has a well-defined schema (`task_name`, `score`, `success`, `time_taken`, `extra`, optional `error`). A `TypedDict` would document the contract and enable static type checking for callers.
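A sketch of such a TypedDict, using the field names listed above (exact optionality is an assumption):

```python
from typing import Any, TypedDict


class EvalResult(TypedDict, total=False):
    task_name: str
    score: float
    success: bool
    time_taken: float
    extra: dict[str, Any]
    error: str  # only present on failure
```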
environments/genesis/Dockerfile (3)
44-50: `LIBGL_ALWAYS_SOFTWARE=1` disables GPU acceleration even when a GPU is available.
`NVIDIA_DRIVER_CAPABILITIES=all` (Line 46) exposes GPU capabilities to the container, but `LIBGL_ALWAYS_SOFTWARE=1` (Line 49) unconditionally forces software rendering, negating any GPU benefit. If you intend to use GPU when available and fall back to software rendering otherwise, remove `LIBGL_ALWAYS_SOFTWARE` and let Mesa auto-detect the driver. If software-only is intentional for reproducibility, document that and consider removing the NVIDIA env var to avoid confusion.
56-65: Consider removing the `.git` directory after checkout to reduce image size.
The sparse clone leaves `.git` metadata in `/opt/menagerie`. Since the commit is already pinned and you only need the asset files, removing it saves image space.
Proposed fix
 RUN git clone --filter=blob:none --sparse \
     https://github.com/google-deepmind/mujoco_menagerie.git /opt/menagerie && \
     cd /opt/menagerie && \
     git checkout ${MENAGERIE_COMMIT} && \
     git sparse-checkout set unitree_g1 && \
-    echo "Menagerie assets installed"
+    rm -rf .git && \
+    echo "Menagerie assets installed (commit: ${MENAGERIE_COMMIT})"
21-21: Consider pinning the base image digest for fully reproducible builds.
`python:3.11-slim` is a floating tag that changes with patch releases. For deterministic builds, you could pin the image digest. However, this trades off auto-receiving security patches, so it's a judgment call.
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@environments/genesis/env.py`:
- Around line 140-160: The error-result branches call self._build_error_result
and currently pass seed=seed or task_id which treats seed=0 as falsy; change
those arguments to use an explicit None check (e.g., seed=(seed if seed is not
None else task_id)) so a valid 0 seed is preserved in both _build_error_result
calls in the genesis env validation block (the two calls that return when env_id
doesn't start with "genesis/" and when base_url is None).
In `@kinitro/environments/genesis/scene_generator.py`:
- Around line 100-146: The docstring for _generate_objects promises distinct
colors but the implementation uses i % len(OBJECT_COLORS) so colors can repeat;
fix by enforcing distinct colors: in _generate_objects validate that num_objects
<= len(OBJECT_COLORS) (raise ValueError if not), update the docstring to state
the max objects equals number of available colors, and then iterate using
available_colors[i] (no modulo) when constructing SceneObjectConfig so each
object gets a unique color from OBJECT_COLORS.
- Around line 193-235: The loop in build_scene that iterates
scene_config.objects can produce a shorter entities list when an unknown type is
skipped, which breaks positional mapping used by _read_object_states; instead of
logging and continue, fail-fast by raising a clear exception (e.g., ValueError)
when obj_config.object_type is unrecognized (include object_type and object_id
in the message) so the mismatch cannot silently occur; update the block handling
the default/else case (the logger.warning + continue) to raise the error, or
alternatively change the function to append explicit (entity, obj_config) pairs
to entities so callers can use the explicit mapping between created entities and
scene_config.objects.
In `@kinitro/environments/genesis/task_generator.py`:
- Around line 182-189: The TaskSpec construction is storing mutable references
to SceneObject.position via the target_position and destination_position fields;
update all task generators that build TaskSpec (the NAVIGATE, MOVE, PICK, and
PLACE generators where TaskSpec(...) is returned) to defensively copy positions
by replacing uses of target.position or destination.position with
list(target.position) (similar to the earlier fix for initial_state) so
TaskSpec.target_position and TaskSpec.destination_position hold independent
lists and won't change if SceneObject.position mutates during simulation.
- Around line 89-138: The retry loop in generate_task currently overwrites a
caller-specified task_type on subsequent attempts; preserve the caller's intent
by capturing the initial_task_type (e.g., initial_task_type = task_type) or a
boolean flag (e.g., requested_specific = task_type is not None) before the loop
and only re-randomize task_type inside the for _ in range(self._max_attempts)
loop when the caller did not request a specific type (i.e., if not
requested_specific then task_type = _random_choice(available_types, rng));
ensure the first attempt still uses the provided task_type and keep references
to generate_task, _try_generate_task, available_types, and _random_choice when
making the change.
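A simplified sketch of that retry loop (the signature and the RuntimeError on exhaustion are illustrative assumptions; method and helper names come from the prompt above):

```python
def generate_task(self, objects, available_types, rng, task_type=None):
    """Keep a caller-specified task_type stable across retry attempts."""
    requested_specific = task_type is not None
    for _ in range(self._max_attempts):
        if not requested_specific:
            # only re-roll the type when the caller did not pin one
            task_type = _random_choice(available_types, rng)
        task = self._try_generate_task(task_type, objects, rng)
        if task is not None:
            return task
    raise RuntimeError("no feasible task found within max attempts")
```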
🧹 Nitpick comments (3)
environments/genesis/env.py (1)
238-306: Synchronous physics calls (env.reset/env.step) block the event loop.
`reset()` and `step()` run Genesis physics synchronously inside an `async` method. While the per-env lock serializes same-env evaluations, these blocking calls still stall the entire event loop—preventing concurrent evaluations on different env_ids, health-check responses, and timeout enforcement for other coroutines.
Consider offloading to a thread:
 obs = await asyncio.to_thread(env.reset, task_config)
 # ...
 obs, reward, done, info = await asyncio.to_thread(env.step, action)
kinitro/environments/genesis/scene_generator.py (1)
148-172: Fallback position is deterministic — may cause object stacking.
If multiple objects fail the 50-iteration placement loop (e.g., with a very small arena), they all land on the same fallback position `[1.5, 0.0, 0.05]`. With default `arena_size=5.0` this is unlikely, but the silent fallback could cause overlapping objects in edge cases.
A minor improvement would be to log a warning when falling back, consistent with the structured logging approach used elsewhere in this module.
🔧 Suggested improvement
     # Fallback position
+    logger.warning(
+        "object_position_fallback",
+        arena_size=self._arena_size,
+        pickupable=pickupable,
+    )
     return [1.5, 0.0, 0.05]
kinitro/environments/genesis/robot_config.py (1)
35-35: Consider using `list[TaskType]` or `Literal` instead of `list[str]` for `supported_task_types`.
Currently, `supported_task_types` is `list[str]` and is compared against `TaskType.value` strings in `check_task_feasibility` and `TaskGenerator`. This works but provides no compile-time safety against typos (e.g., `"navgate"` would silently be ignored). Using `list[TaskType]` would catch mismatches earlier. This would require importing `TaskType` from `task_types`, which is a reasonable intra-package dependency.
Add per-env-id asyncio.Lock to prevent concurrent evaluation races, consistent error-dict returns for base_url validation, defensive copy of mutable position in initial_state, num_objects<2 guard in scene generator, and ruff formatting fixes.
Force-pushed from 9ac168b to c03c6b2 (Compare)
Fix seed=0 falsy bug in error paths (use `is not None` check), cap num_objects to available colors for distinct-color guarantee, raise on unknown object types in build_scene to prevent entity/object index misalignment, preserve caller-specified task_type across retries, and defensively copy target_position/destination_position in all task generators.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@environments/genesis/env.py`:
- Around line 164-169: The per-env lock creation has a race where two coroutines
can both see env_id missing and create separate asyncio.Lock objects; change the
creation to use an atomic upsert like self._env_locks.setdefault(env_id,
asyncio.Lock()) before acquiring the lock so both coroutines get the same lock
object; update the code around self._env_locks, env_id, and asyncio.Lock
creation to call setdefault once and then use async with on the returned lock.
- Around line 220-238: Update the docstring (currently describing steps at lines
~113–117) to match the actual implementation order: first call the miner's
/reset endpoint via self._call_miner (sending task metadata), then reset the
simulation environment with env.reset(task_config); mention that the miner
receives only task metadata on reset and that the initial observation is
provided to the miner on the first /act call. Ensure references to env.reset,
self._call_miner, and the miner /reset behavior are clear in the docstring.
🧹 Nitpick comments (4)
environments/genesis/env.py (3)
97-109: `**kwargs` is accepted but silently discarded.
`evaluate` accepts `**kwargs` (line 108) but never forwards them anywhere—not to `_run_evaluation`, `_get_env`, or `_call_miner`. This silently swallows typos or unsupported parameters from callers. If it's intentional (forward-compat), add a comment; otherwise remove it to surface caller mistakes.
49-61: Add type annotations for instance attributes.
`_env_cache` is a bare `dict`. Adding type hints improves readability and tooling support.
Proposed fix
 def __init__(self) -> None:
     """Initialize the evaluation actor."""
-    self._env_cache = {}
-    self._env_locks: dict[str, asyncio.Lock] = {}
+    self._env_cache: dict[str, RoboticsEnvironment] = {}
+    self._env_locks: dict[str, asyncio.Lock] = {}
356-366: `cleanup` doesn't clear `_env_locks`.
After clearing `_env_cache`, the stale locks in `_env_locks` remain. If the actor is reused after cleanup, new environments would still share old lock objects—not a correctness bug since `Lock` is stateless when not held, but clearing both keeps the state consistent.
Proposed fix
     self._env_cache.clear()
+    self._env_locks.clear()
kinitro/environments/genesis/scene_generator.py (1)
100-150: Modulo on line 125 is now dead code after the cap on lines 116–117.
Since `num_objects` is capped to `len(available_colors)`, the `i % len(available_colors)` can never wrap. Not harmful, but the modulo is misleading—it suggests wrapping is possible when it isn't. Consider simplifying to `available_colors[i]`.
Use dict.setdefault for per-env asyncio.Lock creation to prevent a race where two coroutines both create separate locks. Fix evaluate() docstring to reflect actual execution order: miner /reset is called before env.reset(), not after.
Add genesis to allowed-unresolved-imports in [tool.ty.analysis] since it is a runtime-only dependency installed in the Docker container but not available in the CI environment.
Add libosmesa6 to the Docker image for CPU-only software rendering fallback. Remove LIBGL_ALWAYS_SOFTWARE=1 which caused segfaults when /dev/dri device was passed through to the container.
environments/genesis/Dockerfile
Outdated
RUN groupadd --gid 1000 genesis && \
    useradd --uid 1000 --gid genesis --create-home genesis

# Copy the kinitro environments package (copied by env build command)
What is this comment for? So is this copy line redundant?
Can't you set build flags to build for GPU vs CPU rendering? Necessary dependencies and env vars can be installed and set accordingly without unnecessarily bloating the image
        return [x, y, 0.05]  # Small offset above flat ground

    # Fallback position
    return [1.5, 0.0, 0.05]
x=1.5 so it's not at the origin where the robot is?
    Returns:
        List of Genesis entity references for object tracking
    """
    import genesis as gs  # noqa: PLC0415
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@environments/genesis/warmup_kernel_cache.py`:
- Around line 75-86: The current broad except around scene.build(n_envs=1)
captures all exceptions and treats them as expected EGL visualizer init
failures; change this to only swallow EGL-related initialization errors by
inspecting the caught exception (variable e) for EGL-related keywords (e.g.,
"EGL", "egl", "NoDisplay", "Could not initialize GLX", etc.) and only perform
the warm-up success prints and return in that case; for any other exception
message, re-raise the exception (or log and raise) so real build errors (missing
MJCF, invalid scene) are not silently ignored; keep references to scene.build,
t0, build_elapsed, and e when implementing the conditional handling.
In `@scripts/test-genesis-affinetes.py`:
- Around line 164-169: The import of the traceback module is inside the except
block causing PLC0415; move "import traceback" to the top-level imports near the
other imports in the file and remove the inline import from the except Exception
as e handler so the except block only prints the error and calls
traceback.print_exc() (now available from the top-level import) before
sys.exit(1).
🧹 Nitpick comments (4)
environments/genesis/warmup_kernel_cache.py (1)
26-26: Add a brief comment explaining the `getattr` indirection.
`gs.cpu` would be more readable. If this is working around a type-checker limitation (e.g., `ty` or mypy not resolving the attribute on the `genesis` module), a short comment will prevent someone from "fixing" it later.
-    gs.init(backend=getattr(gs, "cpu"))
+    gs.init(backend=getattr(gs, "cpu"))  # getattr avoids unresolved-attribute from type checker
environments/genesis/Dockerfile (1)
58-64: Consider adding `--depth 1` after checkout for a smaller image layer.
After sparse checkout and pinning to a specific commit, the git history is unnecessary. You could save image size by removing the `.git` directory:
     git sparse-checkout set unitree_g1 && \
-    echo "Menagerie assets installed"
+    rm -rf .git && \
+    echo "Menagerie assets installed"
scripts/test-genesis-affinetes.py (2)
29-35: Add return type annotations to `cleanup()` and `main()`.
Both are public functions missing `-> None`.
-def cleanup():
+def cleanup() -> None:
-def main():
+def main() -> None:
As per coding guidelines, "Add return types to public functions and methods."
93-110: `call_evaluate` return type `dict` could be more specific.
Minor: `dict` without type parameters is less informative. Consider `dict[str, Any]` to match the JSON structure, though for a test script this is low priority.
    try:
        scene.build(n_envs=1)
    except Exception as e:
        # Genesis compiles physics kernels before building the visualizer.
        # Without EGL (typical in docker build), the visualizer init fails,
        # but the physics kernels are already compiled and will be cached
        # by Genesis's atexit handler.
        build_elapsed = time.perf_counter() - t0
        print(f"[warmup] scene.build() physics kernels compiled in {build_elapsed:.1f}s")
        print(f"[warmup] Visualizer init failed (expected without EGL): {type(e).__name__}")
        print("[warmup] Kernel cache warm-up complete (cached on process exit)")
        return
Broad Exception catch may silently swallow non-EGL build errors.
If the MJCF file is missing or the scene definition is invalid, this handler will print a misleading "Kernel cache warm-up complete" message and return success. Consider narrowing the catch or at least checking the exception message for EGL-related keywords before treating it as expected.
Proposed fix
try:
scene.build(n_envs=1)
except Exception as e:
- # Genesis compiles physics kernels before building the visualizer.
- # Without EGL (typical in docker build), the visualizer init fails,
- # but the physics kernels are already compiled and will be cached
- # by Genesis's atexit handler.
build_elapsed = time.perf_counter() - t0
- print(f"[warmup] scene.build() physics kernels compiled in {build_elapsed:.1f}s")
- print(f"[warmup] Visualizer init failed (expected without EGL): {type(e).__name__}")
- print("[warmup] Kernel cache warm-up complete (cached on process exit)")
- return
+ err_msg = str(e).lower()
+ if "egl" in err_msg or "display" in err_msg or "visualizer" in err_msg:
+ print(f"[warmup] scene.build() physics kernels compiled in {build_elapsed:.1f}s")
+ print(f"[warmup] Visualizer init failed (expected without EGL): {type(e).__name__}")
+ print("[warmup] Kernel cache warm-up complete (cached on process exit)")
+ return
+ # Unexpected error — re-raise so the Docker build fails visibly
+    raise
🤖 Prompt for AI Agents
In `@environments/genesis/warmup_kernel_cache.py` around lines 75 - 86, The
current broad except around scene.build(n_envs=1) captures all exceptions and
treats them as expected EGL visualizer init failures; change this to only
swallow EGL-related initialization errors by inspecting the caught exception
(variable e) for EGL-related keywords (e.g., "EGL", "egl", "NoDisplay", "Could
not initialize GLX", etc.) and only perform the warm-up success prints and
return in that case; for any other exception message, re-raise the exception (or
log and raise) so real build errors (missing MJCF, invalid scene) are not
silently ignored; keep references to scene.build, t0, build_elapsed, and e when
implementing the conditional handling.
scripts/test-genesis-affinetes.py
Outdated
    except Exception as e:
        print(f"\nERROR: {type(e).__name__}: {e}", file=sys.stderr)
        import traceback

        traceback.print_exc()
        sys.exit(1)
Move import traceback to the top of the file — pipeline failure.
The ruff check (PLC0415) flags this import inside the exception handler. Move it to the top-level imports to fix the CI failure.
Proposed fix
At the top of the file:
import json
import subprocess
import sys
import time
+import traceback
In the exception handler:
except Exception as e:
print(f"\nERROR: {type(e).__name__}: {e}", file=sys.stderr)
- import traceback
-
traceback.print_exc()
     sys.exit(1)
🧰 Tools
🪛 GitHub Actions: ci
[error] 166-166: PLC0415: import should be at the top-level of a file. The import of traceback is occurring inside an exception handler.
[error] 166-166: ruff check failed due to PLC0415: import should be at the top-level of a file.
🤖 Prompt for AI Agents
In `@scripts/test-genesis-affinetes.py` around lines 164 - 169, The import of the
traceback module is inside the except block causing PLC0415; move "import
traceback" to the top-level imports near the other imports in the file and
remove the inline import from the except Exception as e handler so the except
block only prints the error and calls traceback.print_exc() (now available from
the top-level import) before sys.exit(1).
Bake compiled physics kernels into the Docker image layer to avoid ~36s of JIT compilation on every fresh container startup. A warmup script builds a minimal headless scene (ground plane, G1 robot, all primitive morph types) during `docker build`, triggering kernel compilation. The cache layer is placed before app code copies to maximize Docker layer caching.
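The warm-up has roughly the following shape, a sketch assuming the genesis-world API as used elsewhere in this PR (the real script's morph list, image settings, and paths may differ):

```python
import genesis as gs

# Same Menagerie asset path that robot_config.py points at (see review discussion above).
MENAGERIE_ROBOT = "unitree_g1/g1_with_hands.xml"

gs.init(backend=gs.cpu)  # CPU backend so kernels compile the same way as at runtime
scene = gs.Scene(show_viewer=False)
scene.add_entity(gs.morphs.Plane())                                          # ground plane
scene.add_entity(gs.morphs.MJCF(file=f"/opt/menagerie/{MENAGERIE_ROBOT}"))   # G1 robot
scene.add_entity(gs.morphs.Box(size=(0.05, 0.05, 0.05)))                     # primitive morph
scene.build(n_envs=1)  # triggers kernel JIT; Genesis caches the result on process exit
```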
Auto-detect EGL availability at runtime via ctypes probe and fall back to OSMesa when no GPU is present (e.g. Docker Desktop, CPU-only hosts). Use OSMesa during docker build warmup so both physics and rendering kernels get fully cached.
Force-pushed from d248665 to 466d399 (Compare)
The warmup_kernel_cache.py file uses genesis APIs that are only available inside the Docker container. Exclude it from ty checking to fix unresolved-attribute errors in CI.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@kinitro/environments/genesis/base.py`:
- Around line 601-612: The code slices joint_action to n but doesn't check for
undersized arrays, which will cause a NumPy broadcast error when computing
target_pos; update the handling in the block that uses
action.get_continuous(ActionKeys.JOINT_POS_TARGET) to validate that joint_action
has at least n elements (where n = self._robot_config.num_actuated_dofs) before
slicing, and if it's shorter either raise a descriptive ValueError (including
the received length and expected n) or explicitly pad the array to length n
(e.g., with zeros) before np.clip and the computation of target_pos =
self._default_dof_pos + joint_action * self._action_scale so broadcasting is
always valid.
🧹 Nitpick comments (5)
environments/genesis/warmup_kernel_cache.py (2)
29-29: `getattr(gs, "cpu")` works around `ty` checker limitations but obscures intent.
This is a reasonable workaround for the `ty check` unresolved-attribute errors on the `genesis` module (which uses dynamic attribute injection). A brief comment explaining why `getattr` is used would help future readers.
Suggested clarification
-    gs.init(backend=getattr(gs, "cpu"))
+    # Use getattr to avoid false-positive from ty check (genesis uses dynamic attrs)
+    gs.init(backend=getattr(gs, "cpu"))
25-25: Extract the G1 MJCF path to a shared constant to prevent divergence.
The string `"unitree_g1/g1_with_hands.xml"` is defined in two places: as `MENAGERIE_ROBOT` in `warmup_kernel_cache.py` (line 25) and hardcoded in `G1_CONFIG.mjcf_path` in `robot_config.py` (line 157). If the path changes in one location but not the other, the warmup script will compile kernels for the wrong robot model at runtime. Consider extracting this to a shared constant module or importing it from a single source of truth.
environments/genesis/Dockerfile (1)
46-49: `NVIDIA_DRIVER_CAPABILITIES=all` is broader than needed.
For headless rendering, `NVIDIA_DRIVER_CAPABILITIES=graphics,compute,utility` would be sufficient and follows the principle of least privilege. `all` exposes video encode/decode and other capabilities unnecessarily.
Suggested change
-ENV NVIDIA_DRIVER_CAPABILITIES=all
+ENV NVIDIA_DRIVER_CAPABILITIES=graphics,compute,utility
kinitro/environments/genesis/base.py (2)
290-293: Redundant local import — `SceneConfig` and `SceneObjectConfig` are already imported at module level (line 17).
Both symbols are unconditionally imported at the top of the file, so this inner import is unnecessary.
Suggested fix
-    # Reconstruct SceneConfig
-    from kinitro.environments.genesis.scene_generator import (  # noqa: PLC0415
-        SceneConfig,
-        SceneObjectConfig,
-    )
-
-    obj_configs = [
+    obj_configs = [
544-545: Accessing private `_attached_link` on the Genesis camera is fragile.
This relies on an internal implementation detail of the Genesis `Camera` class. If Genesis renames or restructures this attribute, this code breaks silently (the `hasattr` check would just skip the `move_to_attach` call, degrading camera tracking without any error).
Consider either: (a) always calling `move_to_attach()` unconditionally (it should be a no-op when no link is attached), or (b) tracking the attachment state locally via a boolean flag set in `_attach_ego_camera`.
Option (b): track attachment state locally
Add a field in `__init__`: `self._camera_attached: bool = False`
Set it in `_attach_ego_camera` after successful attachment: `self._camera_attached = True`
Then here:
-        if self._camera._attached_link is not None:
+        if self._camera_attached:
             self._camera.move_to_attach()
| joint_action = action.get_continuous(ActionKeys.JOINT_POS_TARGET) | ||
| if joint_action is None: | ||
| return | ||
|
|
||
| n = self._robot_config.num_actuated_dofs | ||
| joint_action = joint_action[:n] | ||
|
|
||
| # Clip to [-1, 1] | ||
| joint_action = np.clip(joint_action, -1.0, 1.0) | ||
|
|
||
| # Map to PD targets: target = default_pos + action * action_scale | ||
| target_pos = self._default_dof_pos + joint_action * self._action_scale |
There was a problem hiding this comment.
No validation that the action array matches the expected DOF count.
Line 606 truncates to n elements but doesn't guard against an undersized array. If joint_action has fewer than n elements, the arithmetic on line 612 will raise a NumPy broadcast error because _default_dof_pos (shape (n,)) and joint_action (shape (<n,)) are incompatible.
Suggested defensive check
joint_action = joint_action[:n]
+ if joint_action.shape[0] != n:
+ logger.warning(
+ "action_shape_mismatch",
+ expected=n,
+ got=joint_action.shape[0],
+ )
+ padded = np.zeros(n, dtype=np.float32)
+ padded[: joint_action.shape[0]] = joint_action
+ joint_action = padded🤖 Prompt for AI Agents
In `@kinitro/environments/genesis/base.py` around lines 601 - 612, The code slices
joint_action to n but doesn't check for undersized arrays, which will cause a
NumPy broadcast error when computing target_pos; update the handling in the
block that uses action.get_continuous(ActionKeys.JOINT_POS_TARGET) to validate
that joint_action has at least n elements (where n =
self._robot_config.num_actuated_dofs) before slicing, and if it's shorter either
raise a descriptive ValueError (including the received length and expected n) or
explicitly pad the array to length n (e.g., with zeros) before np.clip and the
computation of target_pos = self._default_dof_pos + joint_action *
self._action_scale so broadcasting is always valid.
- Dockerfile: trim header, shallow clone menagerie, simplify comments
- env.py: make base_url required, rename endpoint->path, add EvalResult TypedDict, fix placeholder, clarify binary scoring in docstring
- metadata.json: pretty-print
- base.py: narrow GPU fallback to RuntimeError, remove section headers, remove redundant inline import, add deferred import comments
- scene_generator.py: make min_dist_from_center a constructor param, derive fallback position from it, add deferred import comment
- task_types.py: convert if/elif to match/case, add RGBA TODO
Force-pushed from 8670e9e to 588b8bb (Compare)
camera.render() was failing silently because Genesis pyrender hardcodes OSMesaPlatform.supports_framebuffers() = False, sending rendering down a broken non-FBO code path where _forward_pass() returns None.
- Dockerfile: libosmesa6 -> libosmesa6-dev for GL 3.3 core profile
- base.py: patch OSMesaPlatform.supports_framebuffers to return True when using OSMesa, add _validate_camera() for early detection, add _camera_available flag to avoid duplicate warnings, improve logging
- warmup: add camera render validation to fail-fast during docker build
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@kinitro/environments/genesis/base.py`:
- Around line 99-102: Remove the unused type suppression on the dynamic
assignment to OSMesaPlatform.supports_framebuffers: replace the line
"OSMesaPlatform.supports_framebuffers = lambda self: True # type:
ignore[assignment]" with "OSMesaPlatform.supports_framebuffers = lambda self:
True" (no type: ignore comment) so the CI `ty check` no longer flags an unused
suppression; leave the surrounding import and
logger.info("osmesa_framebuffer_patch_applied") unchanged.
🧹 Nitpick comments (5)
environments/genesis/warmup_kernel_cache.py (2)
33-33: Line exceeds target length (~113 chars).
Move the comment to a separate line above the constant.
Suggested fix
+# Small image size to minimize OSMesa rendering time during warm-up,
+# while still triggering all relevant kernels.
-IMAGE_SIZE = 84  # Small image size to minimize OSMesa rendering time during warm-up, while still triggering all relevant kernels.
+IMAGE_SIZE = 84
As per coding guidelines, "Target line length of 100 characters (Ruff ignores E501 but keep lines reasonable)."
37-37: Prefer direct attribute access over `getattr` without a default.
`getattr(gs, "cpu")` with no default behaves identically to `gs.cpu` (raises `AttributeError` if missing) but obscures the intent and defeats static analysis / IDE autocomplete. If this is to silence a type checker, a `# type: ignore` comment is more conventional.
Suggested fix
-    gs.init(backend=getattr(gs, "cpu"))
+    gs.init(backend=gs.cpu)  # type: ignore[attr-defined]
environments/genesis/Dockerfile (1)
39-47: Consider removing the `.git` directory after checkout to reduce image size.
After the sparse shallow clone completes, the `.git` metadata is no longer needed at runtime. Removing it trims the layer.
Suggested fix
     git fetch --depth 1 origin ${MENAGERIE_COMMIT} && \
-    git checkout FETCH_HEAD
+    git checkout FETCH_HEAD && \
+    rm -rf /opt/menagerie/.git
kinitro/environments/genesis/base.py (2)
121-127: `getattr(gs, "gpu")` obscures intent — consider a comment or `# type: ignore`.
Using `getattr(gs, "gpu")` instead of `gs.gpu` is presumably to silence the type checker for the unresolvable `genesis` module. A brief inline comment explaining this (or a targeted `# type: ignore[attr-defined]` on direct access) would be clearer for future readers.
590-593: Accessing private `_attached_link` is fragile.
Line 592 reads `self._camera._attached_link`, a private attribute of the Genesis camera. This could break silently on a Genesis version upgrade. Consider tracking the attachment state internally (e.g., a `self._camera_attached` flag set in `_attach_ego_camera`) instead of relying on the library's private API.
The direct import of genesis.ext.pyrender is unresolvable in CI where genesis is not installed. Use importlib.import_module + setattr to keep the patch opaque to the static type checker.
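A sketch of that opaque patch; the module path is an assumption based on Genesis vendoring pyrender under genesis.ext.pyrender, and the real code may resolve it differently:

```python
import importlib

# Resolving the module via importlib keeps the attribute access invisible to the
# static checker, since genesis is only installed inside the Docker image.
osmesa = importlib.import_module("genesis.ext.pyrender.platforms.osmesa")
setattr(osmesa.OSMesaPlatform, "supports_framebuffers", lambda self: True)
```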
The previous EGL probe only checked eglGetDisplay + eglInitialize, which succeed on Mesa's software EGL even without a GPU. Genesis then crashes on eglCreateContext with EGL_BAD_MATCH. Now the probe also calls eglChooseConfig to verify renderable configs exist before committing to EGL, falling back to OSMesa immediately when no GPU is available.
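Roughly, the strengthened probe looks like the sketch below; the constants are the standard EGL values, but the PR's actual implementation may differ in structure:

```python
import ctypes


def egl_usable() -> bool:
    """Trust EGL only if eglChooseConfig finds a renderable config; otherwise use OSMesa."""
    try:
        egl = ctypes.CDLL("libEGL.so.1")
    except OSError:
        return False
    egl.eglGetDisplay.restype = ctypes.c_void_p
    display = egl.eglGetDisplay(ctypes.c_void_p(0))  # EGL_DEFAULT_DISPLAY
    if not display:
        return False
    major, minor = ctypes.c_int(), ctypes.c_int()
    if not egl.eglInitialize(ctypes.c_void_p(display), ctypes.byref(major), ctypes.byref(minor)):
        return False
    # EGL_SURFACE_TYPE=0x3033, EGL_PBUFFER_BIT=0x0001, EGL_NONE=0x3038
    attribs = (ctypes.c_int * 3)(0x3033, 0x0001, 0x3038)
    configs = (ctypes.c_void_p * 1)()
    num_configs = ctypes.c_int(0)
    ok = egl.eglChooseConfig(
        ctypes.c_void_p(display), attribs, configs, 1, ctypes.byref(num_configs)
    )
    return bool(ok) and num_configs.value > 0
```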
Summary
- `GenesisBaseEnvironment` handles engine init, scene building, first-person ego camera, and PD control pipeline; thin subclasses define per-robot behavior
- `--viewer` and `--seed` flags for `kinitro env test` for Genesis environments
- Pass `show_viewer` through base env, G1, registry, and CLI; update registry factories to accept `**kwargs`
Test plan
- Unit tests pass (`pytest tests/unit/test_genesis.py`)
- `ruff check` clean
- `ty check` clean (only pre-existing `genesis` unresolved-import)
- `uv run kinitro env list` shows genesis/g1-v0
- `uv run kinitro env test genesis/g1-v0 --episodes 1` runs without error (requires Genesis + Menagerie)
Summary by CodeRabbit
New Features
Tests