[Question] Expert score for maze2d environment may be wrong #215
Comments
Hi, I think you're right. I trained a Decision Transformer in the maze2d-medium-dense-v1 environment and calculated the normalized score with this command:
Hi, I'm also attempting to calculate the normalized score with this command:
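Since both comments concern computing normalized scores, here is a minimal sketch of D4RL's normalization formula, which maps a raw episode return onto a 0–100 scale between a random and an expert policy. The reference values come from the `REF_MIN_SCORE`/`REF_MAX_SCORE` tables in `d4rl/infos.py`; in practice you would call `env.get_normalized_score(raw_return)` on a d4rl environment rather than compute it by hand. The numbers below are hypothetical, for illustration only:

```python
def normalized_score(raw_return, random_score, expert_score):
    """D4RL convention: 0 corresponds to a random policy, 100 to the expert.

    random_score / expert_score are the per-environment reference returns
    (REF_MIN_SCORE / REF_MAX_SCORE in d4rl/infos.py).
    """
    return 100.0 * (raw_return - random_score) / (expert_score - random_score)

# Hypothetical reference values, for illustration only:
print(normalized_score(75.0, 0.0, 150.0))  # 50.0
```

Note that if the expert reference score itself is wrong (the subject of this issue), every normalized score computed this way is skewed accordingly.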
Summary
Description:
Environment: maze2d
If you use the provided script (scripts/reference_scores/maze2d_controller.py) to calculate the score of the expert policy, it may yield inaccurate results. The WaypointController policy (the expert policy) behaves correctly only in the first episode; in subsequent episodes it is likely to fail to reach the goal in the maze2d environment.
Why does this happen?
The issue arises from the expert policy implemented in the
d4rl/pointmaze/waypoint_controller.py
file. Specifically, the get_action function serves as the action-selection mechanism for the expert policy, and its target check implies that the waypoints will only be recalculated when the goal position changes.
Taking the code in
scripts/reference_scores/maze2d_controller.py
into consideration, it appears that the self._new_target()
function is executed only at the beginning of the first episode, because env.reset()
does not change the goal position. Consequently, in subsequent episodes the waypoints are not recalculated; instead, the waypoints from the initial trajectory are reused, and the expert policy fails to reach the goal.
Experiment
After adding
env.render()
to the scripts/reference_scores/maze2d_controller.py
script, I observed that the expert policy indeed fails to reach the target point. The video has been uploaded to Google Drive. After modifying the code, I re-evaluated the expert policy across different environments. The results are presented below: