The work is in progress in https://github.com/pytorch/torchtitan/pull/1630. We have two general questions:
1. Do we want to use semi-orthogonal initialization instead of the default one? See https://arxiv.org/abs/2310.17813 and https://arxiv.org/abs/2405.14813.
2. Do we want to log training dynamics from the optimizer's side?
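For question 1, here is a minimal sketch of what a semi-orthogonal init could look like. It assumes the target from the spectral-condition line of work: every singular value of a weight of shape `(fan_out, fan_in)` should equal `sqrt(fan_out / fan_in)`. The helper name `semi_orthogonal_` is hypothetical and is not an existing torchtitan or PyTorch API. It only builds on `torch.nn.init.orthogonal_`.

```python
import math

import torch


def semi_orthogonal_(weight: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: in-place semi-orthogonal init for a 2D weight.

    Assumption: the target spectral scale is sqrt(fan_out / fan_in), so that
    every singular value of the (fan_out, fan_in) matrix equals that value.
    """
    if weight.dim() != 2:
        raise ValueError("expected a 2D weight of shape (fan_out, fan_in)")
    fan_out, fan_in = weight.shape
    # Rows (or columns, whichever is fewer) become orthonormal: all singular values are 1.
    torch.nn.init.orthogonal_(weight)
    # Rescale in place; no_grad is needed because parameters require grad.
    with torch.no_grad():
        weight.mul_(math.sqrt(fan_out / fan_in))
    return weight


# Usage: an nn.Linear(256, 64) stores its weight with shape (64, 256).
layer = torch.nn.Linear(256, 64, bias=False)
semi_orthogonal_(layer.weight)
```

Compared with the default (Kaiming-uniform style) init, every direction in this init has the same gain, rather than only matching the variance on average. Whether that is worth the extra QR at init time is part of the question above.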