Critic function learning #34

yesiam-png · 2021-03-01T06:01:04Z

Hi Shariq,
In your implementation and in the MAAC paper, you learn the state-action Q function from the expected discounted return, e.g., Eq. (2) and (7), rather than from the maximum of Q(s, a) over actions a. Could you explain this choice or point me to a reference?
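To make the contrast concrete, here is a minimal sketch of the two bootstrap targets I have in mind. It is only an illustration, not the repository's code: it assumes a single agent with a small discrete action set, and the names `expected_target`, `max_target`, `next_q`, `next_probs`, and `alpha` are hypothetical. The first target averages the next-state Q values under the current policy (with an optional entropy term, as I understand Eq. (7)); the second takes the greedy maximum, as in Q-learning.

```python
import math


def expected_target(reward, gamma, next_q, next_probs, alpha=0.0):
    """Expectation-style target (assumed form):
    r + gamma * E_{a' ~ pi}[Q(s', a') - alpha * log pi(a' | s')].
    """
    soft_value = 0.0
    for q, p in zip(next_q, next_probs):
        # Skip zero-probability actions so log(0) is never evaluated.
        if p > 0.0:
            soft_value += p * (q - alpha * math.log(p))
    return reward + gamma * soft_value


def max_target(reward, gamma, next_q):
    """Q-learning-style target: r + gamma * max_{a'} Q(s', a')."""
    return reward + gamma * max(next_q)


if __name__ == "__main__":
    next_q = [1.0, 3.0]        # hypothetical Q(s', a') for two actions
    next_probs = [0.5, 0.5]    # hypothetical policy pi(a' | s')
    print(expected_target(1.0, 0.9, next_q, next_probs))
    print(max_target(1.0, 0.9, next_q))
```

In this sketch the two targets coincide only when the policy puts all of its probability on the greedy action and `alpha` is zero; otherwise the expectation-based target evaluates the current stochastic policy rather than the greedy one.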
Best,
Yesiam
