From 23e06a36722a1355f61f3316ccb252053eb16f60 Mon Sep 17 00:00:00 2001
From: Patrick Helm <91552435+PatrickHelm@users.noreply.github.com>
Date: Tue, 21 Nov 2023 11:20:03 +0100
Subject: [PATCH] Fix entropy equation

---
 docs/algorithms/sac.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/algorithms/sac.rst b/docs/algorithms/sac.rst
index 6df7ff501..ff677f824 100644
--- a/docs/algorithms/sac.rst
+++ b/docs/algorithms/sac.rst
@@ -35,11 +35,11 @@ Entropy-Regularized Reinforcement Learning
 Entropy is a quantity which, roughly speaking, says how random a random variable is. If a coin is weighted so that it almost always comes up heads, it has low entropy; if it's evenly weighted and has a half chance of either outcome, it has high entropy.
 
-Let :math:`x` be a random variable with probability mass or density function :math:`P`. The entropy :math:`H` of :math:`x` is computed from its distribution :math:`P` according to
+Let :math:`x` be a random variable with probability mass or density function :math:`p`. The entropy :math:`H` of :math:`x` is computed from its distribution :math:`P` according to
 
 .. math::
 
-    H(P) = \underE{x \sim P}{-\log P(x)}.
+    H(P) = \underE{x \sim P}{-\log p(x)}.
 
 In entropy-regularized reinforcement learning, the agent gets a bonus reward at each time step proportional to the entropy of the policy at that timestep. This changes `the RL problem`_ to:
 
@@ -318,4 +318,4 @@ Other Public Implementations
 .. _`SAC release repo`: https://github.com/haarnoja/sac
 .. _`Softlearning repo`: https://github.com/rail-berkeley/softlearning
 
-.. _`Yarats and Kostrikov repo`: https://github.com/denisyarats/pytorch_sac
\ No newline at end of file
+.. _`Yarats and Kostrikov repo`: https://github.com/denisyarats/pytorch_sac
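As a quick numerical check of the corrected equation and the coin example in the patched docs (a minimal Python sketch, not part of the patch; the ``entropy`` helper is purely illustrative and not an API from the repo):

.. code-block:: python

    import math

    def entropy(probs):
        """H(P) = E_{x ~ P}[-log p(x)] = -sum_x p(x) * log p(x), in nats."""
        return -sum(p * math.log(p) for p in probs if p > 0)

    # Evenly weighted coin: maximum entropy for two outcomes, log(2) nats.
    print(entropy([0.5, 0.5]))    # ~0.693

    # Coin weighted to almost always come up heads: low entropy.
    print(entropy([0.99, 0.01]))  # ~0.056

Note that the patch is purely notational: it distinguishes the distribution :math:`P` from its mass or density function :math:`p`, so the computed values are unchanged.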