Seemingly missing reshaping operation in point prompt encoding in PromptEncoder #786

lppllppl920 · 2024-10-23T19:58:20Z

https://github.com/facebookresearch/segment-anything/blob/main/segment_anything/modeling/prompt_encoder.py#L81-L85

If I read it correctly, the points input has shape [B, N, 2], where B is the batch size and N is the number of points per image. The padding ensures that the point prompt also contains the 2D coordinates of two points, to make it compatible with the box prompt. Without a reshaping operation before the torch.cat call, wouldn't the shape become [B, N + 1, 2] after the padding? This doesn't feel right. Since this PromptEncoder is also used in SAM2, it seems to affect both models.
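To make the shape question concrete, here is a minimal sketch of the shape arithmetic only. It assumes that the linked lines build one padding point of shape [B, 1, 2] and concatenate it to the points along dimension 1; that is my reading rather than something verified here. The helper `cat_shape` and the values of B and N are hypothetical and exist only for illustration. The helper follows the shape rule of torch.cat (all dimensions except the concatenation dimension must match, and sizes along that dimension add up) without depending on PyTorch.

```python
def cat_shape(shapes, dim):
    """Return the shape produced by concatenating tensors along `dim`.

    Hypothetical helper: it follows the torch.cat shape rule, where every
    dimension except `dim` must match and the sizes along `dim` add up.
    """
    first = shapes[0]
    for shape in shapes[1:]:
        if len(shape) != len(first):
            raise ValueError("all inputs must have the same number of dimensions")
        for i, (a, b) in enumerate(zip(first, shape)):
            if i != dim and a != b:
                raise ValueError(f"sizes differ at non-concatenated dimension {i}")
    out = list(first)
    out[dim] = sum(shape[dim] for shape in shapes)
    return tuple(out)


B, N = 4, 3                 # hypothetical batch size and points per image
points_shape = (B, N, 2)    # point prompts as described above
padding_shape = (B, 1, 2)   # assumed: one padding point per batch element

padded = cat_shape([points_shape, padding_shape], dim=1)
print(padded)  # (4, 4, 2), i.e. [B, N + 1, 2]
```

Under these assumptions the result is [B, N + 1, 2], which matches the shape I describe above. The sketch does not show whether that shape is intended.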

Please correct me if I have misunderstood any part of this.

Thank you!
