Home
We present a framework for estimating race from faces in movie data. For a discussion of prior studies on recognizing race from faces, please refer to the detailed survey “Learning race from face: A survey”, Fu S. et al., 2014. The definitions and taxonomy of race from a computer vision perspective are borrowed from this survey. The list of sources we used can be found here.
- Data preparation
- Sample size
- Model architecture
- Performance evaluation
  i. Confusion matrix and accuracy
  ii. ROC curves
- What is the network learning?
We compiled face data from open-source databases that define their race labels clearly and are consistent with our nomenclature. Additionally, we annotated race for IMDb identities as described in Ramakrishna et al., 2017, ACL. Sources that required institutional EULAs were excluded because of the processing delays involved.
- All face images were resized to 100x100 pixels and converted to grayscale, since a good number of images in our database were already grayscale
- Skin color has been shown to be one of the least important factors in race prediction (See survey)
- All images were aligned by in-plane rotation so that the eye and nose landmarks line up across images
- Left-right flipping of the images was performed for data augmentation
- Test images for evaluation were chosen so that no person's identity overlaps with the training data, and so that they vary in pose, illumination, background and occlusion
- Due to the lack of data for the nativeamerican/pacificislander class, we only considered the other five classes in our prediction model
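
The preprocessing steps above can be sketched in a few lines. This is a minimal, dependency-free illustration and not the pipeline actually used: the luma weights, the nearest-neighbour resize, and all function names (`to_grayscale`, `resize_nearest`, `flip_left_right`, `split_by_identity`) are assumptions made for the example, and the landmark-based alignment step is omitted.

```python
def to_grayscale(rgb):
    """Convert an H x W image of (r, g, b) tuples to H x W luminance values."""
    # ITU-R BT.601 luma weights (assumed; the source does not name a conversion).
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]


def resize_nearest(img, size=100):
    """Nearest-neighbour resize of a 2-D grayscale image to size x size."""
    h, w = len(img), len(img[0])
    return [[img[y * h // size][x * w // size] for x in range(size)]
            for y in range(size)]


def flip_left_right(img):
    """Mirror an image horizontally (used for data augmentation)."""
    return [row[::-1] for row in img]


def split_by_identity(samples, test_identities):
    """Split (identity, image) pairs so that no identity appears on both sides."""
    train, test = [], []
    for identity, image in samples:
        (test if identity in test_identities else train).append((identity, image))
    return train, test


# Toy 2 x 2 RGB image standing in for a face crop.
rgb = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]]
gray = resize_nearest(to_grayscale(rgb))
augmented = [gray, flip_left_right(gray)]  # original plus mirrored copy
```

In practice an image library would handle resizing with proper interpolation; the point here is only the order of operations and the identity-disjoint split.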
race | num_faces | num_images |
---|---|---|
eastasian | 14472 | 28133 |
caucasian | 267478 | 529741 |
latino/hispanic | 10912 | 21374 |
asian-indian | 28132 | 55473 |
african/african-american | 22646 | 44627 |
nativeamerican/pacificislander | 580 | 1069 |
Total | 344220 | 680417 |
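
To make the class imbalance in the table concrete, the short sketch below recomputes the total and the share of each class among the five classes that were kept. The counts are copied from the `num_images` column; the helper name `class_shares` is hypothetical.

```python
# num_images per class, copied from the table above.
num_images = {
    "eastasian": 28133,
    "caucasian": 529741,
    "latino/hispanic": 21374,
    "asian-indian": 55473,
    "african/african-american": 44627,
    "nativeamerican/pacificislander": 1069,
}


def class_shares(counts, exclude=()):
    """Return each class's fraction of the total, ignoring excluded classes."""
    kept = {k: v for k, v in counts.items() if k not in exclude}
    total = sum(kept.values())
    return {k: v / total for k, v in kept.items()}


# The sixth class was dropped from the prediction model for lack of data.
shares = class_shares(num_images, exclude=("nativeamerican/pacificislander",))
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:28s} {share:.1%}")
```

The caucasian class accounts for well over three quarters of the images, which is worth keeping in mind when reading the per-class accuracies.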
A simple modified VGG-like architecture was adopted for a 5-class race classification model. The architecture is as shown below.
- Class-wise system performance is tabulated below. Overall accuracy was 82.6% with a mini-batch size of 32 and 50 training epochs.
Race | african | asianindian | caucasian | eastasian | latino |
---|---|---|---|---|---|
accuracy (%) | 83.2 | 85.9 | 93.2 | 87.4 | 65.6 |
Confusion matrix
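
As an illustration of how class-wise and overall accuracy relate to a confusion matrix, here is a minimal sketch on toy labels. It assumes that class-wise accuracy means the fraction of each true class that is predicted correctly (the diagonal entry divided by its row sum); the source does not spell out the definition, and the labels below are made up for the example, not taken from the evaluation.

```python
CLASSES = ["african", "asianindian", "caucasian", "eastasian", "latino"]


def confusion_matrix(y_true, y_pred, n_classes):
    """m[t][p] counts samples whose true class is t and predicted class is p."""
    m = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m


def per_class_accuracy(m):
    """Diagonal entry over row sum for every class (NaN for empty rows)."""
    return [row[i] / sum(row) if sum(row) else float("nan")
            for i, row in enumerate(m)]


def overall_accuracy(m):
    """Correct predictions over all predictions."""
    correct = sum(m[i][i] for i in range(len(m)))
    total = sum(sum(row) for row in m)
    return correct / total


# Toy labels (indices into CLASSES), purely illustrative.
y_true = [0, 0, 1, 1, 2, 2, 2, 3, 4, 4]
y_pred = [0, 1, 1, 1, 2, 2, 0, 3, 4, 2]
m = confusion_matrix(y_true, y_pred, len(CLASSES))
for name, acc in zip(CLASSES, per_class_accuracy(m)):
    print(f"{name:12s} {acc:.2f}")
print("overall", overall_accuracy(m))
```

Overall accuracy weights each class by its number of test samples, so it generally differs from the plain average of the per-class values.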
Inspired by the visualization of filter activations in a CNN as described here, we extended the concept to find the input (or distorted input) that minimizes the output loss for each class. Specifically, we initialize the input with an average face, keep the trained network fixed, and minimize the categorical cross-entropy loss for one target class at a time with respect to the input pixels. We then visualize the changes to the input that achieve this minimization. The resulting modified inputs are shown below.
Not surprisingly, the modified inputs resemble the per-class average faces and are quite distinct from one another. The results were similar when the input was initialized with the average face of a specific class rather than the average across all images.
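
The idea of optimizing the input rather than the weights can be sketched as follows. This is a toy illustration under strong assumptions: a tiny linear-softmax classifier (`W`, `b`) stands in for the trained CNN, the gradient is written out by hand instead of coming from a deep learning framework, and clipping pixels to [0, 1] is an added choice. None of the names or values come from the original code.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def class_loss_and_grad(W, b, x, target):
    """Cross-entropy for `target` and its gradient with respect to the input x."""
    logits = [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]
    p = softmax(logits)
    loss = -math.log(p[target])
    # dL/dlogits = p - onehot(target); chain rule through logits = W x + b.
    dlogits = [pi - (1.0 if i == target else 0.0) for i, pi in enumerate(p)]
    grad_x = [sum(dlogits[i] * W[i][j] for i in range(len(W)))
              for j in range(len(x))]
    return loss, grad_x


def optimize_input(W, b, x0, target, lr=0.1, steps=200):
    """Gradient descent on the input; the classifier weights stay fixed."""
    x, losses = list(x0), []
    for _ in range(steps):
        loss, g = class_loss_and_grad(W, b, x, target)
        losses.append(loss)
        # Clip to a valid pixel range (an assumption of this sketch).
        x = [min(1.0, max(0.0, xj - lr * gj)) for xj, gj in zip(x, g)]
    return x, losses


# Hypothetical 2-class, 3-pixel "model" and a flat "average face" as the start.
W = [[1.0, -1.0, 0.5], [-0.5, 1.0, 0.0]]
b = [0.0, 0.0]
x_opt, losses = optimize_input(W, b, [0.5, 0.5, 0.5], target=1)
print(losses[0], losses[-1])
```

With a real network the same loop applies, with the hand-written gradient replaced by the framework's automatic differentiation with respect to the input tensor.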
Media Informatics and Content Analysis, Signal Analysis and Interpretation Laboratory, USC