Add Optional Key-Point Visibility to `draw_keypoints()` (#8203)

Comments
Thank you for this greatly detailed feature proposal @bmmtstb! I think it makes sense. Instead of an additional argument, we could let the `keypoints` tensor carry the visibility as a third coordinate.
I thought about that too, but for me a separate argument would be easier to understand. Because there are pros and cons to both approaches, I started a list of all the things I just thought of. In my opinion the extra argument is more flexible, while demanding only a little more effort from the user.
If we follow your proposal, a few more questions arise:

- Will the third dimension be "force"-cast to a bool? In the end, the visibility tensor is easily extractable from the output tensor:

  ```python
  output = torch.ones((21, 3))
  kp, vis = output.split([2, 1], dim=-1)
  ```

- Even though we can't draw them yet, what about "real" 3D key points?
- Specific Type vs. Key-Point Type:
Thanks for your feedback @bmmtstb
That makes sense: it should be up to users to decide what the threshold should be, so let's go ahead with the separate `visibility` argument.
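Leaving the threshold to the user keeps the API simple; for example (a sketch, assuming per-key-point confidence scores from some upstream model):

```python
import torch

# Sketch: with a separate `visibility` argument, the user converts any
# per-key-point confidence score into a bool mask with a threshold of
# their own choosing. `scores` is an assumed [num_instances, K] tensor.
scores = torch.tensor([[0.9, 0.2, 0.7]])
visibility = scores > 0.5  # user-chosen threshold
print(visibility.tolist())  # [[True, False, True]]
```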
I will do a PR, but I can't promise how fast I can finish it. Fingers crossed for the weekend 🤞🏼

There's no rush at all - thanks for doing it!
🚀 The feature

I propose an optional key-point visibility flag for `torchvision.utils.draw_keypoints()`, to be able to draw human skeletons with key points that are not set / visible. If a key point is marked as invisible, the "dot" and the corresponding connection(s) in the `connectivity` / skeleton will not be drawn for this person on this image. Other people on the same image (`num_instances` > 1) can still have the full skeleton or another set of visible joints.

The `visibility` input can either be a `torch.BoolTensor` (or any other `Callable[bool]`) with the same number of entries `K` as the `keypoints` tensor, describing the visibility of each respective key point. If `num_instances` is bigger than one, there should either be a tensor of shape `[num_instances, K]` describing every person individually, or one of shape `[K]` describing all instances within this image at once. The `visibility` should be optional and therefore default to `True` / `torch.ones((num_instances, K))`.

Motivation, pitch
The current issue arises when key point coordinates are set to the origin, e.g., if they are not visible, not found, or otherwise not available.
Let's have a look at the example showing the possibilities of `draw_keypoints()` over at the docs. Given the image of the skateboarder, let some (other) model predict the key-point or joint coordinates as `(x, y, visibility)`, obtaining the following result: this is the result of the example, just that the `left_eye`, `left_ear`, and `left_hip` are annotated as "not visible", with key point coordinates of `(0, 0)`.

Plotting this result shows three lines connecting the skateboarder with the origin, which doesn't look good. On the left is the original image, on the right the one using `new_keypoints`, which has invisible key points.

Now imagine how that looks for other skeleton structures, like Halpe-FullBody (136 key points) or COCO-WholeBody (133 key points)...
Alternatives
It is possible to remove the "invisible" key points from the skeleton by updating the skeleton for every image and using something like:
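A hypothetical version of such a per-image pruning step (plain Python, illustrative names, not torchvision API): drop every connection whose endpoints include an invisible key point, then call `draw_keypoints()` per instance with the pruned connectivity.

```python
# Hypothetical workaround helper: prune the connectivity list so that
# no edge touches an "invisible" key point.
def prune_connectivity(connectivity, visible):
    """connectivity: list of (start_idx, end_idx); visible: list of bools."""
    return [(a, b) for a, b in connectivity if visible[a] and visible[b]]

coco_edges = [(0, 1), (0, 2), (1, 3), (2, 4)]  # excerpt of a COCO skeleton
visible = [True, False, True, True, True]       # key point 1 not visible
print(prune_connectivity(coco_edges, visible))  # [(0, 2), (2, 4)]
```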
But the "dots" are still printed in the upper left corner (see below), and the whole process is fairly cumbersome: the skeleton of every human has to be analysed and drawn separately, because the skeletons of different persons might have different exclusions and `draw_keypoints()` only accepts one connectivity for all instances.

Therefore, a second alternative would be to allow passing multiple connectivities, one for each instance. But that still doesn't solve the drawn "dots" problem and feels less intuitive than the proposed approach.
Additional context
This image is taken from the PoseTrack21 dataset and shows how a full-body skeleton fails when only the upper body gets detected by the bounding box. These are the original annotated key points within the annotated bounding box of the dataset. (Image source, not publicly available: PoseTrack21/images/val/010516_mpii_test/000048.jpg)