I have been reading through the code and I have a few questions regarding the CPE functionality.
In particular, with the DQN model you have separate Q-networks, both normal and target, dedicated just for CPE that I would like to better understand. At present their purpose is not really clear to me. In particular, what is the purpose of these additional networks over and above the standard Q-networks of the DQN model?
In the function `_calculate_cpes`, which is part of the `RLTrainer` class, it seems that the networks `q_network_cpe` and `q_network_cpe_target` are updated to model not only the reward, but also any additional metrics that could be of interest in CPE.
Am I right in thinking that performing CPE on these additional metrics is the main reason for these additional networks? Put another way, if one were only interested in performing CPE on the reward itself, would using the standard Q-networks of the DQN model suffice?
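To make my understanding concrete, here is how I imagine the per-metric target computation works (a minimal sketch of my own, not ReAgent's actual code; the function name `cpe_td_targets` and its signature are hypothetical):

```python
def cpe_td_targets(metric_rewards, next_q_cpe, next_action, gamma):
    """Hypothetical per-metric TD targets for the CPE Q-networks.

    Each tracked metric m gets its own target:
        target[m] = r_m + gamma * Q_cpe_target[m][a']
    i.e. the reward is just one of several metrics being regressed.

    metric_rewards: one observed value per tracked metric (reward first)
    next_q_cpe:     next_q_cpe[m][a] = target-network estimate of
                    metric m's return when taking action a next
    next_action:    index a' of the next action (e.g. argmax of the
                    ordinary Q-network, for an off-policy target)
    gamma:          discount factor
    """
    return [
        r + gamma * next_q_cpe[m][next_action]
        for m, r in enumerate(metric_rewards)
    ]
```

If that is roughly right, it would explain why the ordinary Q-networks are not enough: they only model the (discounted) reward, whereas CPE over arbitrary metrics needs one value head per metric.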
Thanks
tfurmston changed the title from "CPE Functionaility" to "CPE Functionaility - Purpose of additional CPE Q-Networks?" on Apr 19, 2021