Why pad to same length in Ch07-04, Preference Tuning with DPO #476
In the data-loading code for this notebook, it looks like the "chosen" and "rejected" sequences in a given batch are both padded to the same length. My understanding is that this is usually done when inputs are processed together in the same batch. However, the "chosen" and "rejected" sequences appear to be processed separately here.

May I ask whether padding them to the same length is just for implementation convenience, or whether it is actually necessary? Thank you!
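To make the two padding strategies concrete, here is a minimal pure-Python sketch. The `pad_to` helper and the token ids are made up for illustration; the notebook's actual data loader may differ in details:

```python
def pad_to(seq, length, pad_id=0):
    # Right-pad a token-id list with pad_id up to the given length
    # (hypothetical helper; pad_id 0 is a placeholder)
    return seq + [pad_id] * (length - len(seq))

# Toy token-id batches (made-up ids)
chosen = [[1, 2, 3], [4, 5]]
rejected = [[6, 7, 8, 9], [10]]

# Shared padding: one max length across both fields,
# so chosen and rejected tensors end up the same shape
shared_max = max(len(s) for s in chosen + rejected)
chosen_padded = [pad_to(s, shared_max) for s in chosen]
rejected_padded = [pad_to(s, shared_max) for s in rejected]

# Separate padding: each field padded only to its own max length
chosen_only = [pad_to(s, max(len(s) for s in chosen)) for s in chosen]
rejected_only = [pad_to(s, max(len(s) for s in rejected)) for s in rejected]
```

With a shared length, one mask shape and one set of helpers works for both tensors, which keeps the collate function simple; padding each field separately would also work, as long as the mask is built per field.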
That's a good question. It's been quite some time since I implemented this notebook, but if I remember correctly, this was mainly for convenience in the data-loading utilities. The padding tokens also shouldn't have any effect here, since we ignore them in the loss computation:

```python
if selection_mask is not None:
    mask = selection_mask[:, 1:].clone()

    # Apply the mask to filter out padding tokens
    selected_log_probs = selected_log_probs * mask

    # Calculate the average log probability excluding padding tokens,
    # averaging over the token dimension, so the shape is (batch_size,)
    avg_log_prob = selected_log_probs.sum(-1) / mask.sum(-1)
```
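To see why the padding drops out, here is a minimal pure-Python sketch of the same masked averaging (`masked_avg_log_prob` is a hypothetical stand-in, operating on plain lists instead of tensors): padded positions are zeroed by the mask and excluded from the denominator, so appending them leaves the average unchanged.

```python
def masked_avg_log_prob(log_probs, mask):
    # Zero out padded positions, then average over real tokens only
    total = sum(lp * m for lp, m in zip(log_probs, mask))
    count = sum(mask)
    return total / count

# Two real tokens followed by two padding positions
with_padding = masked_avg_log_prob([-0.5, -1.5, -9.0, -9.0], [1, 1, 0, 0])

# The same two real tokens with no padding at all
without_padding = masked_avg_log_prob([-0.5, -1.5], [1, 1])

# Both give the same average log probability: -1.0
```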