Joint Training of Adapter Modules for Multiple Properties #150
-
Hello Team, I’ve been reading through the fine-tuning approach and it seems to me that for fine-tuning the model to multiple properties, you train the corresponding adapter modules jointly rather than combining individually trained adapters at inference time. I was wondering:
Would love to hear your thoughts, or if you could point me to any information I missed (: Thanks,
-
Hi @luisbro,
very insightful questions, thanks for asking.
What we're interested in during sampling is the conditional score $\nabla \log p(x_t | c_1, c_2)$, which we obtain from $p(x | c_1, c_2) \propto p(c_1, c_2 | x) p(x)$ (see derivation). Training the adapters separately amounts to making the assumption that the classifier distribution factorizes, i.e., $p(c_1, c_2 | x) \approx p(c_1 | x) p(c_2 | x)$. The extent to which this assumption is violated depends on the particular pair of properties, of course. In our case, looking at the scatterplot of HHI score and magnetic density, the properties actually do appear to be correlated, so going for the joint distribution $p(c_1, c_2 | x)$ a…