SlateQ agent implementation #698

rahul-zomato · 2022-11-15T13:27:58Z

Is the use of next_state deliberate in the next_q_values calculation in the SlateQ trainer? See https://github.com/facebookresearch/ReAgent/blob/main/reagent/training/slate_q_trainer.py#L230

The SlateQ agent implemented by the SlateQ paper authors in RecSim uses state instead of next_state from the replay buffer to compute next_q_values (see google-research/recsim#26).
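
To make the question concrete, here is a minimal sketch of the two variants being contrasted. It is a toy tabular example, not code from ReAgent or RecSim: the Q-table, `td_target`, and the transition values are all hypothetical, and the slate decomposition (summing per-item Q-values weighted by choice probabilities) is left out for brevity.

```python
# Toy illustration only: not ReAgent or RecSim code. All names are hypothetical.
GAMMA = 0.5

# Hypothetical tabular Q-values keyed by (state, action).
Q = {
    ("s0", "a0"): 1.0,
    ("s0", "a1"): 2.0,
    ("s1", "a0"): 5.0,
    ("s1", "a1"): 3.0,
}


def td_target(reward, bootstrap_state, next_action, q=Q, gamma=GAMMA):
    """One-step SARSA-style target: r + gamma * Q(bootstrap_state, next_action)."""
    return reward + gamma * q[(bootstrap_state, next_action)]


# One replay-buffer transition (s, a, r, s', a').
state, action, reward, next_state, next_action = "s0", "a0", 1.0, "s1", "a1"

# Variant 1: bootstrap from next_state, as in the usual one-step TD target.
target_from_next_state = td_target(reward, next_state, next_action)

# Variant 2: bootstrap from state, which is what this issue is asking about.
target_from_state = td_target(reward, state, next_action)

print(target_from_next_state, target_from_state)
```

The two targets differ whenever Q(s, a') and Q(s', a') differ, so the choice changes what the network is trained toward.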
