Is it possible to use two separate optimizers and learning rates for the policy and value models without directly changing the current PPO implementation? I looked through the source and it appears that I could change it directly, but I am not sure of the best way to go about it. I am trying to replicate results from the DeepMimic environment, where separate learning rates are used in the TensorFlow 1 implementation. The goal is to have additional flexibility and to use IsaacLab with skrl.

Hi @hiraz01, currently it is necessary to modify the source code to support different learning rates for the involved optimizers.
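One way to keep that modification local, rather than editing the library files, might be to subclass the PPO agent and rebuild its optimizer with PyTorch parameter groups, so the policy and value models get different learning rates while the update loop stays untouched. This is only a minimal sketch: it assumes the current skrl PPO exposes `self.policy`, `self.value`, `self.optimizer` and `self.checkpoint_modules` as in the present source, and the `lr_policy` / `lr_value` arguments below are hypothetical additions, not an existing skrl option.

```python
import torch

from skrl.agents.torch.ppo import PPO


class PPOSeparateLR(PPO):
    """PPO variant that gives the policy and value models different learning
    rates by rebuilding the optimizer with two parameter groups.

    Sketch only: assumes ``self.policy``, ``self.value``, ``self.optimizer``
    and ``self.checkpoint_modules`` exist as in the current skrl source, and
    that policy and value are separate models (not a shared network).
    """

    def __init__(self, models, memory=None, observation_space=None,
                 action_space=None, device=None, cfg=None,
                 lr_policy=3e-4, lr_value=1e-3):  # hypothetical extra arguments
        super().__init__(models=models, memory=memory,
                         observation_space=observation_space,
                         action_space=action_space, device=device, cfg=cfg)

        # Replace the single-LR optimizer built by the parent class with one
        # that keeps the policy and value parameters in separate groups.
        if self.policy is not None and self.value is not None \
                and self.policy is not self.value:
            self.optimizer = torch.optim.Adam([
                {"params": self.policy.parameters(), "lr": lr_policy},
                {"params": self.value.parameters(), "lr": lr_value},
            ])
            # Keep checkpoints pointing at the optimizer actually in use.
            self.checkpoint_modules["optimizer"] = self.optimizer
            # Caveat: if a learning_rate_scheduler is set in cfg, the parent
            # class already attached it to the original optimizer, so it would
            # need to be recreated here as well.
```

If this assumption about the agent's attributes holds, the rest of the PPO update loop steps whatever optimizer the agent holds, so in principle no other change is needed; the agent would be instantiated like the stock `PPO` class, with the two extra learning-rate arguments.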