Hi there, I'd like to ask whether anyone else has run into this problem. I trained an agent for a couple of days using the high-quality settings (higher self-play and simulation parameters, etc.). When I play test games against it, I notice that when I fail to block its win (it already has 3-in-a-row on a diagonal) and play somewhere else, it sometimes ignores its own win and also plays elsewhere. This can go on for more than a couple of moves, and it never takes the win. I am loading the weights from my model and using act() to get the agent's moves, with tau set to 0 so that it acts deterministically.
Is this a problem with the code, or can it be explained in terms of exploitation vs. exploration? My guess is that the agent is confused in these positions because it has never explored them: during self-play it always blocks a 3-in-a-row when given the opportunity, so it never sees a position where the win is left open. Is there any way to discourage this behavior apart from hard-coding a 'win-lose check' that prioritizes completing 3-in-a-rows first?
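For reference, this is roughly what I mean by a hard-coded 'win-lose check'. It is only a minimal sketch, assuming a Connect-Four-style 6x7 board stored as a list of rows (row 0 at the top, 0 for empty, 1 and 2 for the players). All of the names here (`drop_row`, `wins_at`, `forced_move`, `choose_move`) are hypothetical and not taken from the repository, and `agent_move` stands in for whatever column act() returns.

```python
from typing import List, Optional

# Assumed board geometry (hypothetical, not read from the repository).
ROWS, COLS, CONNECT = 6, 7, 4
EMPTY = 0

Board = List[List[int]]  # board[row][col], row 0 is the top row


def drop_row(board: Board, col: int) -> Optional[int]:
    """Return the row a piece dropped in `col` would land in, or None if full."""
    for row in range(ROWS - 1, -1, -1):
        if board[row][col] == EMPTY:
            return row
    return None


def wins_at(board: Board, row: int, col: int, player: int) -> bool:
    """True if a `player` piece at (row, col) would complete a line of CONNECT."""
    for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
        count = 1  # the piece being placed
        for sign in (1, -1):  # walk both ways along the direction
            r, c = row + sign * dr, col + sign * dc
            while 0 <= r < ROWS and 0 <= c < COLS and board[r][c] == player:
                count += 1
                r += sign * dr
                c += sign * dc
        if count >= CONNECT:
            return True
    return False


def forced_move(board: Board, player: int, opponent: int) -> Optional[int]:
    """Return a column that wins immediately, else one that blocks an
    immediate loss, else None."""
    for who in (player, opponent):  # own wins take priority over blocks
        for col in range(COLS):
            row = drop_row(board, col)
            if row is not None and wins_at(board, row, col, who):
                return col
    return None


def choose_move(board: Board, player: int, opponent: int, agent_move: int) -> int:
    """Override the agent's move only when a win or a block is available."""
    forced = forced_move(board, player, opponent)
    return forced if forced is not None else agent_move
```

I would call `choose_move` with the column returned by act() and play the result instead. This only papers over the symptom at play time and does not change what the network has learned, which is why I am asking whether there is a better fix.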