Checkered Predictions (Checkered Artifacts?) #778
The image you posted seems to have about 64 'tiles' horizontally. By default, the image encoder outputs a 64x64 token 'image' that the mask decoder processes and eventually upscales. So it seems likely that the upscaling isn't 'mixing' the pixels together enough, and therefore the original 64x64 grid is still visible in the result. There can also be other, much more subtle, lower-resolution grid artifacts caused by the windowed attention the model uses.
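The 64-tile reasoning can be checked by tracing the resolutions. This is a minimal sketch that assumes vanilla SAM defaults: a 1024x1024 padded input, ViT patch size 16, and a mask decoder that upscales by 4x with two transposed convolutions using `kernel_size=2, stride=2`. The helper names are hypothetical. Because kernel size equals stride, each pixel of the upscaled embedding is derived from a single token, so the upscaling step itself never blends neighboring tokens.

```python
# Hypothetical helpers; the defaults assume vanilla SAM (1024x1024 padded
# input, ViT patch size 16, 4x upscaling inside the mask decoder).

def sam_grid_geometry(input_size=1024, patch_size=16, decoder_upscale=4):
    """Return the resolutions involved in going from tokens back to pixels."""
    tokens_per_side = input_size // patch_size              # image-encoder grid
    lowres_mask_side = tokens_per_side * decoder_upscale    # mask-decoder output
    return {
        "tokens_per_side": tokens_per_side,
        "lowres_mask_side": lowres_mask_side,
        "pixels_per_token": input_size // tokens_per_side,   # tile width at input res
        "bilinear_factor": input_size // lowres_mask_side,   # final interpolation step
    }


def source_token(decoder_pixel, decoder_upscale=4):
    """Index of the token a low-res decoder pixel is derived from.

    Two stacked transposed convs with kernel_size == stride == 2 map output
    index o to input index o // 2 at each stage, so after both stages the
    pixel depends on token o // 4 only; neighboring tokens never overlap.
    """
    return decoder_pixel // decoder_upscale


print(sam_grid_geometry())
# {'tokens_per_side': 64, 'lowres_mask_side': 256, 'pixels_per_token': 16, 'bilinear_factor': 4}
print([source_token(i) for i in range(10)])
# [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
```

At the 1024 input resolution each token therefore becomes a 16x16 tile, which is consistent with the roughly 64 tiles visible across the prediction.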
Wow! It does seem to have 64 tiles. If this is the reason, there could be an issue with the upsampling method.
Since the interpolation is bilinear, it probably shouldn't introduce any artifacts other than blurring as the smaller pattern is scaled up. However, the fact that the model doesn't upscale all the way back to the original input size may be part of the problem (there was some discussion of this on the SAM 2 issue board): it gives the model less chance to process the original tokens, and any artifacts get interpolated up and become more visible.
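To illustrate the "blurring but still visible" point, here is a pure-Python sketch of 1-D linear upsampling with half-pixel centers, assuming the `align_corners=False` convention that SAM uses when it interpolates its low-res masks up to full size. The function name is hypothetical, and the alternating input is a synthetic stand-in for a checkered low-res mask.

```python
import math


def upsample_linear_1d(x, scale):
    """Linear upsampling by an integer factor with half-pixel centers
    (the align_corners=False convention); positions are clamped at both ends."""
    n = len(x)
    out = []
    for dst in range(n * scale):
        src = (dst + 0.5) / scale - 0.5   # map output center back to input coords
        src = max(src, 0.0)               # clamp at the left border
        i0 = int(math.floor(src))
        i1 = min(i0 + 1, n - 1)           # clamp at the right border
        w = src - i0
        out.append((1.0 - w) * x[i0] + w * x[i1])
    return out


# Alternating per-position values: a 1-D stand-in for a checkered low-res mask.
lowres = [0.0, 1.0] * 8
up = upsample_linear_1d(lowres, 4)
interior = up[2:-2]
print(min(interior), max(interior))   # 0.125 0.875
```

The contrast drops from 1.0 to 0.75, but the pattern's period (2 input samples, 8 output samples) survives intact, so interpolation softens a grid without removing it.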
I've tried to make the decoder's upscaling smoother by adding some extra layers (512 and 1024), but I think something else is still affecting the resolution.
That's interesting! Maybe the decoder model is just too small/simple to avoid these kinds of artifacts entirely. It's probably hard to improve it without breaking the original 'real-time on CPU' design constraint. Maybe a few regular convolutions between the upscaling steps could help blend things better spatially?
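A toy illustration of why a convolution between upscaling steps helps: non-overlapping upscaling leaves hard jumps at every block boundary, while even a small smoothing kernel spreads each jump over neighboring pixels. This is a pure-Python sketch; the fixed `[0.25, 0.5, 0.25]` kernel is an assumed stand-in for what would be learned weights in a real decoder layer, and the helper names are hypothetical.

```python
def upsample_repeat(x, factor):
    """Non-overlapping upscaling: every input value fills its own block."""
    return [v for v in x for _ in range(factor)]


def conv1d_same(x, kernel):
    """'Same'-length cross-correlation with edge-replicated padding."""
    r = len(kernel) // 2
    n = len(x)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - r, 0), n - 1)   # replicate edge values
            acc += w * x[j]
        out.append(acc)
    return out


def max_step(x):
    """Largest jump between neighboring samples."""
    return max(abs(b - a) for a, b in zip(x, x[1:]))


tokens = [0.0, 1.0, 0.0, 1.0]
blocky = upsample_repeat(tokens, 4)
smooth = conv1d_same(blocky, [0.25, 0.5, 0.25])
print(max_step(blocky), max_step(smooth))   # 1.0 0.5
```

A single 3-tap filter only softens the pixels right next to each boundary. Applied at a lower resolution (between upscaling steps, as suggested), each tap covers more output pixels, which is why placement matters as much as the kernel itself.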
That is what I tried, so as not to invalidate the pretrained checkpoint for the original part of the decoder. We probably need a better design for these extra layers. Otherwise, you are left with strictly interpolating from https://github.com/facebookresearch/sam2/blob/main/sam2%2Fmodeling%2Fsam2_base.py#L373
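One common way to add layers without invalidating a pretrained checkpoint (not necessarily what was tried here) is to make each new layer start as an exact identity, for example a residual branch whose convolution weights are zero-initialized (in PyTorch, e.g. via `torch.nn.init.zeros_`). The sketch below is a pure-Python stand-in with hypothetical names; the 3x3 "logits" are made-up values.

```python
def conv2d_same(img, kernel):
    """'Same'-size 2-D cross-correlation with zero padding (odd-sized kernel)."""
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    ry, rx = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(kh):
                for dx in range(kw):
                    yy, xx = y + dy - ry, x + dx - rx
                    if 0 <= yy < h and 0 <= xx < w:   # zero padding outside
                        acc += kernel[dy][dx] * img[yy][xx]
            out[y][x] = acc
    return out


def residual_refine(img, delta_kernel):
    """Hypothetical extra layer: img + conv(img, delta_kernel).

    If delta_kernel starts at all zeros, the layer is an exact identity, so
    the pretrained decoder's outputs are unchanged before fine-tuning begins.
    """
    delta = conv2d_same(img, delta_kernel)
    return [[a + b for a, b in zip(row, drow)] for row, drow in zip(img, delta)]


lowres_logits = [[0.5, -1.0, 2.0], [0.0, 3.0, -0.5], [1.5, 1.0, 0.25]]
zero_kernel = [[0.0] * 3 for _ in range(3)]
print(residual_refine(lowres_logits, zero_kernel) == lowres_logits)  # True
```

Training then moves the kernel away from zero only as far as the loss requires, so the extra layer can learn to blend across tile boundaries instead of relying solely on fixed interpolation.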
Hi,
I have been modifying the vanilla SAM scripts, mainly to come up with my own training script.
I was fairly successful at that: training runs and the loss is gradually decreasing. But I noticed something: when I save the model's predictions at every epoch, there are checkered lines all over them.
For example, in the image below, the left one is the prediction from epoch 1 and the right one is from epoch 117. Although the grid is fading, it is still clearly visible.
Does anyone know what is causing this? Or is it just because the model hasn't been trained for enough epochs?
For context, I'm using just 15 image and mask pairs for training (using the image encoder weights, but training the prompt encoder and mask decoder from scratch).
I would be grateful if someone could give me a clue, if not a proper solution. Thanks in advance!
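To put a number on "the grid is fading", one option is to compare how sharply the mask changes across tile boundaries versus inside tiles. The function below is a hypothetical diagnostic (not part of SAM), written in pure Python for clarity. The tile size is an assumption that depends on the resolution the predictions are saved at: 16 for 1024x1024 outputs from a 64x64 token grid, or 4 for the 256x256 low-res masks.

```python
def grid_artifact_ratio(mask, tile):
    """Hypothetical diagnostic for a 2-D mask (list of lists of floats).

    Returns mean |step| between neighboring pixels that straddle a tile
    boundary divided by mean |step| for all other neighboring pairs.
    About 1.0 means no grid; values well above 1.0 mean edges line up
    with the tile grid.
    """
    h, w = len(mask), len(mask[0])
    if not 2 <= tile < min(h, w):
        raise ValueError("tile must be at least 2 and smaller than the mask")
    on, off = [], []
    for y in range(h):
        for x in range(1, w):                      # horizontal neighbors
            d = abs(mask[y][x] - mask[y][x - 1])
            (on if x % tile == 0 else off).append(d)
    for y in range(1, h):
        for x in range(w):                         # vertical neighbors
            d = abs(mask[y][x] - mask[y - 1][x])
            (on if y % tile == 0 else off).append(d)
    mean_on = sum(on) / len(on)
    mean_off = sum(off) / len(off)
    if mean_off == 0:
        return float("inf") if mean_on > 0 else 1.0
    return mean_on / mean_off


# Synthetic 8x8 example: a smooth ramp, and the same ramp plus a checkered
# offset on each 4x4 tile.
ramp = [[float(x) for x in range(8)] for _ in range(8)]
tiled = [[float(x) + 10.0 * ((x // 4 + y // 4) % 2) for x in range(8)] for y in range(8)]
print(grid_artifact_ratio(ramp, 4), grid_artifact_ratio(tiled, 4))  # 1.0 20.0
```

Computing this ratio on the saved prediction from each epoch (e.g. epoch 1 vs epoch 117) would show whether the grid is shrinking relative to the mask's normal edges or persisting at the same strength.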