feat(rl_interface): extensible dict-based Observation/Action + E2E docs#30
feat(rl_interface): extensible dict-based Observation/Action + E2E docs#30rayokamoto merged 14 commits intomainfrom
Conversation
Replace fixed-field CanonicalObservation/CanonicalAction with extensible dict-based Observation/Action classes: - Observation: proprio dict with ProprioKeys (EE_POS, EE_QUAT, GRIPPER, etc.) - Action: continuous/discrete dicts with ActionKeys (EE_TWIST, GRIPPER, etc.) - Base64 image encoding/decoding for efficient serialization - coerce_action() helper for backward compatibility with array inputs Remove duplicate root-level rl_interface.py; single source of truth is now kinitro/rl_interface.py (copied to build context by env build command).
- Fix API endpoint paths (use /v1/ prefix, not /api/v1/) - Add Basilica deployment testing section with deploy/delete workflow - Add cleanup commands for containers and services - Add database schema reference with query examples - Document Basilica CLI installation - Add --mock-miner flag to services.sh for E2E testing
📝 WalkthroughWalkthroughMigrates the RL public API from CanonicalObservation/CanonicalAction to extensible Observation/Action schemas (adds ProprioKeys, ActionKeys, image encode/decode, coercion helpers), updates rl_interface, environments, miner templates, CLI/testing, scripts, docs, and tests to the new payload shapes. Changes
Sequence Diagram(s)sequenceDiagram
rect rgba(220,230,255,0.5)
participant CLI as CLI / Orchestrator
participant Miner as Miner (policy server)
participant Env as Simulation Env
participant RL as kinitro.rl_interface
end
CLI->>Env: reset() → Observation
Env->>RL: build Observation (rgb, proprio via ProprioKeys)
RL-->>CLI: Observation payload (to_payload / model_dump)
CLI->>Miner: POST /act (ActRequest obs: Observation)
Miner->>RL: Action.model_validate / coerce_action
RL-->>Miner: validated Action
Miner-->>CLI: ActResponse(action: Action)
CLI->>Env: step(action: Action)
Env-->>CLI: Observation, reward, done, info
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
docs/miner-guide.md (1)
120-148:⚠️ Potential issue | 🟡 MinorAlign the surrounding docs with the new Observation/Action schema.
The example now usesObservationandAction(continuous=...), but earlier sections and the API request/response examples still show legacy keys likeee_pos_m,twist_ee_norm, and top-levelobsfields. Please update those sections to showobs: {proprio: {...}, rgb: {...}}andaction: {continuous: {...}}so readers don’t mix formats.kinitro/environments/metaworld_env.py (1)
390-407:⚠️ Potential issue | 🔴 CriticalFix CI type error: serialize
rgbpayload before buildingObservation.CI reports
invalid-argument-typebecausecamera_viewsisdict[str, np.ndarray]whileObservation.rgbis typed as serializable list/dict. Convert images to lists (or widenObservation.rgbtyping) before construction.🐛 Suggested fix (serialize camera views)
- return Observation( - rgb=camera_views, # Will be auto-encoded if numpy arrays - proprio={ - ProprioKeys.EE_POS: ee_pos.tolist(), - ProprioKeys.EE_QUAT: ee_quat.tolist(), - ProprioKeys.EE_VEL_LIN: ee_lin_vel.tolist(), - ProprioKeys.EE_VEL_ANG: ee_ang_vel.tolist(), - ProprioKeys.GRIPPER: [gripper_norm], - }, - ) + rgb_payload = {name: img.tolist() for name, img in camera_views.items()} + return Observation( + rgb=rgb_payload, + proprio={ + ProprioKeys.EE_POS: ee_pos.tolist(), + ProprioKeys.EE_QUAT: ee_quat.tolist(), + ProprioKeys.EE_VEL_LIN: ee_lin_vel.tolist(), + ProprioKeys.EE_VEL_ANG: ee_ang_vel.tolist(), + ProprioKeys.GRIPPER: [gripper_norm], + }, + )
🤖 Fix all issues with AI agents
In `@AGENTS.md`:
- Around line 178-235: The /act example in AGENTS.md uses legacy observation
keys (ee_pos_m, ee_quat_xyzw, gripper_01) and must be updated to the new
Observation schema that nests sensor data under obs.proprio using the
ProprioKeys names; update the mock-miner example curl to send obs.proprio with
the current ProprioKeys (replace ee_pos_m, ee_quat_xyzw, ee_lin_vel_mps,
ee_ang_vel_rps, gripper_01) when calling the /act endpoint of the mock-miner
(mock-miner, /act, and the uv run kinitro mock-miner example) so the documented
payload matches the migrated schema.
In `@kinitro/environments/metaworld_env.py`:
- Around line 518-533: The step method currently assumes coerce_action returns
arrays of expected lengths and directly indexes twist_arr and gripper_arr; add
explicit validation for ActionKeys.EE_TWIST and ActionKeys.GRIPPER sizes before
indexing: coerce the arrays to numpy (e.g., np.asarray), check twist_arr is not
None and twist_arr.size >= 6 (otherwise raise ValueError with a clear message or
fill a default zero-6 array), and check gripper_arr is not None and
gripper_arr.size >= 1 (otherwise raise ValueError or default to 0.0); then
compute twist = np.clip(first-6 values, -1.0, 1.0) and gripper =
float(np.clip(gripper_arr[0], 0.0, 1.0)) using the validated arrays, updating
variables twist_arr, gripper_arr, twist, and gripper in the step function (where
coerce_action, ActionKeys.EE_TWIST, and ActionKeys.GRIPPER are referenced).
In `@kinitro/rl_interface.py`:
- Around line 312-327: The from_array method should validate that the flattened
input length matches the sum of schema sizes to avoid silent under/overflow:
compute expected = sum(schema.values()), convert array to arr =
np.asarray(array, dtype=np.float32).flatten(), then if arr.size != expected
raise a ValueError (include expected and actual sizes in the message) before
proceeding to slice into continuous; reference symbols: from_array, schema, arr,
continuous, idx, cls.
In `@scripts/services.sh`:
- Line 138: The pkill invocation ("pkill -f \"kinitro mock-miner\"") in
scripts/services.sh is too broad and will kill mock-miner processes across all
worktrees; narrow the match to the worktree-specific port by matching the
mock-miner command including the HTTP port variable (e.g. include "--http-port
${MOCK_MINER_HTTP_PORT}" or the equivalent port flag used when launching
mock-miner) so only the instance for the current worktree is targeted; update
the pkill pattern accordingly in the same scripts/services.sh pkill line.
🧹 Nitpick comments (5)
scripts/services.sh (1)
155-155: Status filter could show mock-miners from other worktrees.The literal
mock-minerin the grep pattern will match any mock-miner process. For consistency with the worktree isolation design, consider filtering by the mock-miner port instead.♻️ Suggested improvement
- ps aux | grep -E "kinitro.*(api|scheduler|executor|mock-miner)" | grep -v grep | grep -E "($WORKTREE_NAME|$API_PORT|$DATABASE_URL|mock-miner)" || echo " None running" + ps aux | grep -E "kinitro.*(api|scheduler|executor|mock-miner)" | grep -v grep | grep -E "($WORKTREE_NAME|$API_PORT|$DATABASE_URL|${MOCK_MINER_PORT:-8001})" || echo " None running"environments/README.md (2)
101-108: Add language specifier to fenced code block.The code block showing directory structure should have a language specifier for consistency. Use
textorplaintextfor directory trees.📝 Suggested fix
-``` +```text environments/ └── myrobot/ ├── Dockerfile # Container build spec ├── env.py # Actor class for affinetes ├── requirements.txt # Python dependencies └── metadata.json # Display info</details> --- `721-734`: **Add language specifier to fenced code block.** Same as above—use `text` or `plaintext` for directory structure blocks. <details> <summary>📝 Suggested fix</summary> ```diff -``` +```text environments/robomanip/ ├── Dockerfile ├── env.py ├── requirements.txt └── metadata.json kinitro/environments/robomanip/ ├── __init__.py ├── environment.py ├── task_types.py ├── task_generator.py └── scene_generator.py</details> </blockquote></details> <details> <summary>kinitro/miner/template/env.py (1)</summary><blockquote> `120-127`: **Consider using ProprioKeys constants for consistency.** The code uses string literals (`"ee_pos"`, `"ee_quat"`, etc.) to read proprio data, but uses `ActionKeys` constants when constructing the Action. For consistency and to reduce typo risk, consider importing and using `ProprioKeys` constants. <details> <summary>♻️ Suggested improvement</summary> ```diff # Import from local rl_interface (self-contained for Basilica deployment) -from rl_interface import Action, ActionKeys +from rl_interface import Action, ActionKeys, ProprioKeys ... # Extract proprioceptive data from new format proprio = observation.get("proprio", {}) - ee_pos = np.array(proprio.get("ee_pos", [0, 0, 0]), dtype=np.float32) - ee_quat = np.array(proprio.get("ee_quat", [0, 0, 0, 1]), dtype=np.float32) - ee_vel_lin = np.array(proprio.get("ee_vel_lin", [0, 0, 0]), dtype=np.float32) - ee_vel_ang = np.array(proprio.get("ee_vel_ang", [0, 0, 0]), dtype=np.float32) - gripper_state = proprio.get("gripper", [0.0])[0] + ee_pos = np.array(proprio.get(ProprioKeys.EE_POS, [0, 0, 0]), dtype=np.float32) + ee_quat = np.array(proprio.get(ProprioKeys.EE_QUAT, [0, 0, 0, 1]), dtype=np.float32) + ee_vel_lin = np.array(proprio.get(ProprioKeys.EE_VEL_LIN, [0, 0, 0]), dtype=np.float32) + ee_vel_ang = np.array(proprio.get(ProprioKeys.EE_VEL_ANG, [0, 0, 0]), dtype=np.float32) + gripper_state = proprio.get(ProprioKeys.GRIPPER, [0.0])[0]kinitro/miner/template/policy.py (1)
49-75: Add return type annotation toact()method.Per coding guidelines, public functions and methods should have return types. The method returns
Action.📝 Suggested fix
- async def act(self, observation: Observation): + async def act(self, observation: Observation) -> Action: """ Get action for current observation.
The coerce_action() function was providing false backward compatibility by silently accepting arrays and converting them to Action objects. This masked API changes and added complexity. Changes: - Remove coerce_action() from kinitro/rl_interface.py - Update all step() methods to accept only Action objects - Update Actor classes to construct Action objects explicitly - Update CLI test command to construct Action directly - Update documentation examples
c72a401 to
171cc4b
Compare
- Update AGENTS.md /act example to use `proprio` dict with ProprioKeys instead of legacy flat keys (ee_pos_m, ee_quat_xyzw, etc.) - Fix services.sh mock-miner pkill to use MOCK_MINER_PORT for worktree isolation instead of matching all mock-miners globally
171cc4b to
0a84b8b
Compare
Address PR feedback to make environments/README.md human-first instead of AI-first by moving code boilerplate into a working template. - Create environments/_template/ with Dockerfile, env.py, requirements.txt, and metadata.json containing inline TODOs - Trim README from 1210 lines to 114 lines - Keep existing environment docs, add concise "Adding New Environments" section
Remove defensive multi-format action parsing. Miners must now return a valid Action schema or receive an error. This makes the API contract clear and simplifies the code.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@environments/_template/env.py`:
- Around line 102-110: The error-return branch that builds the failure result
for invalid env_id uses "seed or task_id", which treats 0 as falsy and can
override a valid seed=0; update the call to _build_error_result to pass the seed
explicitly using an explicit None check (e.g., seed if seed is not None else
task_id) instead of "seed or task_id" so an intentional seed of 0 is preserved
when building the error result for env_id, task_id, seed, start_time, and error.
In `@environments/README.md`:
- Around line 101-104: The README uses the lowercase "affinetes" in the
description for Actor.evaluate(); update the text to consistently use the proper
noun "Affinetes" (e.g., change "called by affinetes" to "called by Affinetes")
and scan nearby references (lines mentioning Actor.evaluate(), TaskConfig,
Observation, Action) to ensure all occurrences match the capitalized form for
brand consistency.
🧹 Nitpick comments (1)
environments/_template/env.py (1)
235-242: Add explicit return type to cleanup().♻️ Proposed refactor
- async def cleanup(self): + async def cleanup(self) -> None:As per coding guidelines: Add return types to public functions and methods.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@environments/_template/env.py`:
- Around line 226-233: The cleanup method lacks a return type and silently
swallows exceptions; update async def cleanup to include the return type
annotation "-> None" and replace the bare except/pass with structured logging of
the exception (use the module's structlog logger, e.g. logger.error or
logger.exception) including context like the env key or env.__class__.__name__
and the operation ("closing env"), then continue closing remaining envs and
finally clear self._env_cache; reference the async cleanup method,
self._env_cache and env.close() when making the change.
🧹 Nitpick comments (2)
environments/_template/env.py (2)
35-38: Add an explicit return type on__init__.It’s a public method and should be annotated for consistency.
As per coding guidelines, add return types to public functions and methods.Suggested change
- def __init__(self): + def __init__(self) -> None:
199-206: Log exceptions and avoid returning full tracebacks to callers.Log the detailed traceback using
logger.exception(), and return only a concise error message in the result payload. This follows the guideline to provide actionable error messages and log details with structlog.Suggested change
- except Exception as e: - return self._build_error_result( - env_id=env_id, - task_id=task_id, - seed=seed, - start_time=start_time, - error=f"{type(e).__name__}: {e}\n{traceback.format_exc()}", - ) + except Exception as e: + logger.exception( + "Evaluation failed", + env_id=env_id, + task_id=task_id, + seed=seed, + ) + return self._build_error_result( + env_id=env_id, + task_id=task_id, + seed=seed, + start_time=start_time, + error=f"{type(e).__name__}: {e}", + )
Update observation and action field names to match the actual rl_interface schema: - Observation: ee_pos, ee_quat, ee_vel_lin, ee_vel_ang, gripper - Action: continuous.ee_twist, continuous.gripper Also update all curl examples to use the correct nested format.
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
docs/miner-guide.md (4)
85-95:⚠️ Potential issue | 🟡 MinorClarify
grippershape in proprio to match examples.The text says
gripperis in[0, 1](scalar), but all examples use a list[1.0]. Pick one representation and make it consistent across prose and payloads.📝 Suggested doc tweak (if list is intended)
- - `gripper`: Gripper state in [0, 1] + - `gripper`: Gripper state as a list, e.g. [1.0], in [0, 1]
119-148:⚠️ Potential issue | 🟡 MinorExample uses
uuidwithout import.
reset()returnsuuid.uuid4().hexbutuuidisn’t imported, so the snippet won’t run as-is.🔧 Proposed fix
-from kinitro.rl_interface import Action, ActionKeys, Observation +import uuid +from kinitro.rl_interface import Action, ActionKeys, Observation
475-501:⚠️ Potential issue | 🟡 MinorAPI spec mixes scalar vs list in text vs examples.
The spec example uses list
[1.0]forgripper, but the earlier prose implies scalar. Please make the spec and prose unambiguous.
576-599:⚠️ Potential issue | 🟡 MinorLocal/remote curl examples repeat the
grippershape mismatch.Both examples use list
gripper; please align the earlier description to avoid client-side parsing errors.
🤖 Fix all issues with AI agents
In `@docs/miner-guide.md`:
- Around line 106-113: The documentation currently mismatches the gripper shape
between the Action continuous dict and the proprio section; pick a single
canonical shape (preferably a scalar float in [0,1] to match the proprio
description) and update the Action docs accordingly: change the `gripper`
example and type note in the Action Space block (where `Action` continuous dict,
`ee_twist`, and `gripper` are described) to describe `gripper` as a single float
scalar in [0,1] and adjust the example from `[1.0]` to `1.0`; ensure any other
mentions of `gripper` elsewhere use the same scalar convention (or conversely
choose list semantics everywhere if you prefer lists).
- Line 186: The example payload uses "gripper": [1.0] while the earlier prose
says “gripper in [0,1]”; make them consistent by either changing the example
payload to use a scalar "gripper": 1.0 if the expected shape is a single value,
or update the prose to state that gripper is a one-element list (e.g., “gripper
is a list containing a value in [0,1]”). Locate the example payload string
containing '{"obs": {"proprio": {..., "gripper": [1.0]}, ...}}' and the prose
line that says “gripper in [0,1]” and edit one of them so both describe the same
shape and range.
🧹 Nitpick comments (1)
docs/miner-guide.md (1)
85-86: “Canonical observation” wording is misleading after API change.The new API is dict-based
Observation; calling it “canonical observation” can be read as the legacy fixed-field schema. Consider rephrasing to “Observation” or “standardized observation payload.”
- Remove _coerce_images, _coerce_proprio, _coerce_continuous validators - Make encode_image/decode_image public for explicit image serialization - Update metaworld and procthor environments to use encode_image explicitly - Type hints now accurately reflect expected input types
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@kinitro/environments/procthor/environment.py`:
- Around line 561-571: The step method can IndexError when twist_arr or
gripper_arr are shorter than expected; update the handling in step (variables
twist_arr, gripper_arr, and keys ActionKeys.EE_TWIST / ActionKeys.GRIPPER) to
defensively pad or truncate incoming arrays: for EE_TWIST build a length-6 array
filled with zeros and copy up to the first min(len(twist_arr),6) values before
clipping, and for GRIPPER ensure you read the first element only after creating
a length-1 padded array (or default 0.0) so indexing never fails; then apply the
existing np.clip and float conversion as before.
Move verbose E2E testing section from AGENTS.md into: - docs/e2e-testing.md: detailed testing guide - scripts/test-api.sh: test API endpoints - scripts/test-mock-miner.sh: test mock miner endpoints - scripts/test-e2e.sh: full pipeline test Also fix `kinitro mock-miner` -> `kinitro miner mock` in services.sh. AGENTS.md now contains a quick reference pointing to the scripts.
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
scripts/services.sh (1)
118-156:⚠️ Potential issue | 🟠 MajorStop/status still target the old
mock-minercommand name.Line 122 starts
kinitro miner mock, but Line 139 and Line 156 grep formock-miner, so the process won’t be stopped or shown. Please align the patterns.🔧 Align stop/status patterns with the actual command
- pkill -f "kinitro mock-miner.*--port $miner_port" 2>/dev/null || true + pkill -f "kinitro miner mock.*--port $miner_port" 2>/dev/null || true @@ - ps aux | grep -E "kinitro.*(api|scheduler|executor|mock-miner)" | grep -v grep | grep -E "($WORKTREE_NAME|$API_PORT|$DATABASE_URL|mock-miner)" || echo " None running" + ps aux | grep -E "kinitro.*(api|scheduler|executor|miner mock)" | grep -v grep | grep -E "($WORKTREE_NAME|$API_PORT|$DATABASE_URL|miner mock)" || echo " None running"
🤖 Fix all issues with AI agents
In `@scripts/test-api.sh`:
- Around line 65-75: The script currently uses set -e which aborts on the first
failing test; change the test invocation pattern so every test runs and results
are aggregated: either remove/temporarily disable set -e around the block and
implement a small wrapper (e.g., create a new run_test function) that calls
test_endpoint "<name>" "<path>" "<expected>" and captures its exit code,
appending the name to PASSED or FAILED arrays and returning success without
exiting the script; alternatively call test_endpoint in a subshell or with "||
true" and record $? into counters. Update all invocations (the lines using
test_endpoint for "health", "miners list", "environments list", "latest scores",
"latest weights", and "task stats") to use the wrapper so the full suite
completes and the final summary can report aggregated PASSED/FAILED counts.
In `@scripts/test-e2e.sh`:
- Around line 137-153: The script currently does a single check for evaluations
using SCORES_RESPONSE/SCORES_BODY/SCORES_CODE and ignores the TIMEOUT option;
change this to poll until the TIMEOUT expires: record a start timestamp, loop
making the same curl request (using API_URL to build the URL and capturing
SCORES_RESPONSE/SCORES_BODY/SCORES_CODE the same way), break from the loop if
SCORES_CODE == "200", otherwise sleep a short interval (e.g., 5s) and re-check
until elapsed time >= TIMEOUT; after the loop preserve the existing success
output when 200 and the existing "No completed evaluations yet" message when
timed out, and consider introducing a SLEEP_INTERVAL variable for clarity.
In `@scripts/test-mock-miner.sh`:
- Around line 1-4: The startup hint in scripts/test-mock-miner.sh still
references the old CLI subcommand "kinitro mock-miner"; update all occurrences
of that string (including the other block later in the file) to the new CLI form
"kinitro miner mock" so the help text and examples match the current
services.sh/service CLI usage.
- Around line 65-88: The script currently uses set -e which aborts on the first
failing test so the aggregated PASSED/FAILED summary never runs; modify the test
invocation pattern so each call to test_endpoint is guarded (e.g., call
test_endpoint ... || true) or temporarily disable errexit around the test block
(set +e before the tests and set -e after) so all tests run and the existing
PASSED/FAILED counters (the variables updated by test_endpoint) are still
incremented and the final summary prints; locate references to set -e and the
calls to test_endpoint to apply this change.
- Fix pkill/ps patterns to match `miner mock` (not `mock-miner`) so stop/status commands work correctly for mock miner processes - Update CLI hints in test scripts to show correct command - Disable set -e during test execution so all tests run and the summary shows accurate pass/fail counts - Implement polling loop in test-e2e.sh to honor --timeout flag
dcbc05a to
307da68
Compare
All continuous channels use list format for uniformity, including single-value channels like gripper.
Replace hard exits on missing .env in services.sh and test-e2e.sh with informational notes and inline defaults, matching the pattern already used by test-api.sh and test-mock-miner.sh. Update AGENTS.md and e2e-testing.md to reflect that worktree-env.sh is optional.
Summary
CanonicalObservation/CanonicalActionto extensible dict-basedObservation/ActionclassesProprioKeysandActionKeysfor conventional channel namingrl_interface.py(single source of truth is nowkinitro/rl_interface.py)Changes
RL Interface (
kinitro/rl_interface.py)Observation:propriodict with ProprioKeys (EE_POS, EE_QUAT, GRIPPER, BASE_POS, etc.)Action:continuous/discretedicts with ActionKeys (EE_TWIST, GRIPPER, BASE_VEL, etc.)get_proprio(),get_continuous(),proprio_array(),continuous_array()get_image(),imagesproperty for base64 decodeEnvironment Updates
Documentation
environments/README.mdwith new interface examplesAGENTS.mdwith:/v1/prefix)--mock-minerflag for services.shTest plan
ruff check .)Summary by CodeRabbit
New Features
Documentation
Bug Fixes
Tests