Joint Training of Adapter Modules for Multiple Properties #150
-
Hello Team, I’ve been reading through the fine-tuning approach and it seems to me that for fine-tuning the model to multiple properties, you train the corresponding adapter modules jointly rather than combining individually trained adapters at inference time. I was wondering:
Would love to hear your thoughts, or if you could point me to any information I missed (: Thanks,
-
Hi @luisbro,
very insightful questions, thanks for asking.
What we're interested in during sampling is the conditional score $\nabla \log p(x_t | c_1, c_2)$, which we obtain from $p(x | c_1, c_2) \propto p(c_1, c_2 | x) p(x)$ (see derivation). Training the adapters separately amounts to making the assumption that the classifier distribution factorizes, i.e., $p(c_1, c_2 | x) \approx p(c_1 | x) p(c_2 | x)$. The extent to which this assumption is violated depends on the particular pair of properties, of course. In our case, looking at the scatterplot of HHI score and magnetic density, the properties actually do appear to be correlated, so going for the joint distribution $p(c_1, c_2 | x)$ a…