Facial Recognition Functionality #1989
Closed
LackesLab
started this conversation in Show and tell
Replies: 2 comments
-
Pretty good summary. But I wonder if it uses only the information of this single image. Faces can change, for example with make-up or lighting. And what about kids? My kids look totally different at 1, 2 and 4 years old. Will we be able to add additional faces to a person?
-
Thanks for kicking this off! Facial recognition has been implemented in #2180 and further enhancements are being discussed in #2472.
-
Hey there, with this thread I want to give some insights into the current development status. Discussion takes place on the Discord.
At the current stage, not all functionality is in a final state. I am open to suggestions and improvements.
How does facial recognition work?
In the first stage, a face detection neural network scans an incoming image for faces. All detected faces are then cut out and resized to a standard size (depending on the model, e.g. 128 px x 128 px).
The cropped faces are then aligned so that the eyes and nose sit at the same image coordinates in every image.
These “faces” are then fed into a neural network that generates so-called embeddings. These embeddings have certain characteristics so that different faces are mapped to different points in a 512-dimensional space.
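To make the pipeline concrete, here is a minimal sketch of the detect-crop-align-embed flow. The `detect_faces`, `align_face`, and `embed_face` functions are hypothetical placeholders for the actual neural networks; only the 128 px crop size and the 512-dimensional embedding come from the description above.

```python
import numpy as np
from PIL import Image

CROP_SIZE = (128, 128)  # standard crop size mentioned above (model-dependent)

def detect_faces(image: Image.Image) -> list[tuple[int, int, int, int]]:
    """Hypothetical face detector: returns (left, top, right, bottom) boxes."""
    raise NotImplementedError("stand-in for the face detection network")

def align_face(crop: Image.Image) -> Image.Image:
    """Hypothetical alignment step: warps the crop so that the eyes and
    nose land on the same image coordinates in every crop."""
    raise NotImplementedError("stand-in for the landmark-based alignment")

def embed_face(face: Image.Image) -> np.ndarray:
    """Hypothetical embedding network: maps an aligned crop to a 512-d vector."""
    raise NotImplementedError("stand-in for the embedding network")

def extract_embeddings(image: Image.Image) -> list[np.ndarray]:
    """Run the full detect -> crop -> align -> embed pipeline on one image."""
    embeddings = []
    for box in detect_faces(image):
        crop = image.crop(box).resize(CROP_SIZE)  # cut out and resize the face
        aligned = align_face(crop)                # normalize eye/nose positions
        embeddings.append(embed_face(aligned))    # 512-d embedding
    return embeddings
```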
The distance between them is determined using cosine distance.
If a new image is now sent to the facial recognition service, the extracted face embeddings are compared to all currently stored embeddings. If the distance between two embeddings is below a certain threshold, it is considered a match.
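The matching step itself can be sketched with plain NumPy. The threshold value below is an arbitrary placeholder, not the value the service actually uses:

```python
import numpy as np

MATCH_THRESHOLD = 0.5  # placeholder; the real threshold is model-specific

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine distance = 1 - cosine similarity of two embeddings."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def find_match(new_embedding: np.ndarray,
               stored: dict[str, np.ndarray]) -> str | None:
    """Compare a new face embedding against all stored embeddings and
    return the closest person below the threshold, or None for no match."""
    best_person, best_dist = None, MATCH_THRESHOLD
    for person, embedding in stored.items():
        dist = cosine_distance(new_embedding, embedding)
        if dist < best_dist:
            best_person, best_dist = person, dist
    return best_person
```

Keeping the store as a simple person-to-embedding mapping mirrors the one-embedding-per-person approach described below.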
Current dependencies
python:3.10
What does the facial recognition service do?
In the first version, the API offers the following endpoints:
Why Re-Initialize?
Storing only one embedding per person is currently the easiest way to perform facial recognition. However, an image is not guaranteed to be taken under optimal conditions (lighting, head angle, etc.).
Example:
Mike is a friend of mine and I have a lot of photos of him. When I queue all images for the first time, the algorithm detects Mike at a football game. The image was taken from the side, so we can only see half of his face. But because it is the first appearance of Mike, this image is taken as the reference for ALL following images. That's not optimal.
As a workaround, an endpoint is exposed where embeddings/faces/persons can be re-initialized using a “perfect” photo.
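A minimal sketch of what such an endpoint could look like, assuming a FastAPI service and an in-memory store (the route name, payload shape, and store are all illustrative assumptions, not the real API):

```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()

# In-memory store: person id -> reference embedding (illustrative only)
embedding_store: dict[str, list[float]] = {}

class ReinitRequest(BaseModel):
    person_id: str
    embedding: list[float]  # embedding extracted from the "perfect" photo

@app.post("/persons/reinitialize")  # hypothetical route name
def reinitialize_person(req: ReinitRequest):
    """Replace a person's stored reference embedding with one computed
    from a hand-picked, well-lit frontal photo."""
    if len(req.embedding) != 512:
        raise HTTPException(status_code=400, detail="expected a 512-d embedding")
    embedding_store[req.person_id] = req.embedding
    return {"person_id": req.person_id, "status": "reinitialized"}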
Improve Rescan by storing information (TBD)
My idea is that, for future rescans of the data collection, I want to store meta information for each scanned image. This meta information contains the positions of the found faces as well as the resulting embeddings. This is important because of the possible re-initialization of an embedding.
By storing the generated embeddings, I can simply recalculate the distances for the updated persons and provide the information back to the main application, which stores the “image tagging information”.
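As a sketch, the stored meta information could be modeled like this (the dataclass layout and field names are illustrative assumptions):

```python
from dataclasses import dataclass

import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

@dataclass
class DetectedFace:
    box: tuple[int, int, int, int]  # position of the face in the image
    embedding: np.ndarray           # 512-d embedding computed at scan time

@dataclass
class ImageScanResult:
    image_id: str
    faces: list[DetectedFace]

def rescan(results: list[ImageScanResult],
           new_reference: np.ndarray,
           threshold: float = 0.5) -> list[str]:
    """After a person's reference embedding was re-initialized, re-check the
    stored embeddings against it without re-running the neural networks."""
    matched_images = []
    for result in results:
        if any(cosine_distance(f.embedding, new_reference) < threshold
               for f in result.faces):
            matched_images.append(result.image_id)
    return matched_images
```

Because the embeddings are persisted alongside the face positions, a rescan only recomputes distances; the expensive detection and embedding networks never have to run again.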