A customer looked at the recommended training settings for DeepSeek-V3 on GB200 in NeMo and noticed that mcore FSDP is not used. They would like to know why. Can mcore FSDP be used for DeepSeek-V3, or is its performance lower than 3D parallelism on GB200? https://github.com/NVIDIA/NeMo/blob/main/scripts/performance/recommended_model_configs/model_configs_gb200.csv#L22 Thanks.