This project monitors people violating social distancing norms in video footage from CCTV cameras. It uses YOLOv3 along with DBSCAN clustering to detect potential violators. A Face Mask Classifier model (ResNet50) is trained and deployed for identifying people not wearing a face mask. To aid the training process, augmented masked faces are generated (using facial landmarks) and blurring effects (frequently found in video frames) are also imitated.
A detailed description of this project along with the results can be found here.
Running this project on your local system requires the following packages to be installed:
- numpy
- matplotlib
- sklearn
- PIL
- cv2
- keras
- face_detection
- face_recognition
- tqdm
They can be installed from the Python Package Index using pip as follows:
pip install numpy
pip install matplotlib
pip install scikit-learn
pip install Pillow
pip install opencv-python
pip install Keras
pip install face-detection
pip install face-recognition
pip install tqdm
You can also use Google Colab in a Web Browser with most of the libraries preinstalled.
This project is implemented using interactive Jupyter Notebooks. You just need to open the notebook on your local system or on Google Colab and execute the code cells in sequential order. The function of each code cell is properly explained with the help of comments.
Please download the following files (from the given links) and place them in the Models folder in the root directory:
- YOLOv3 weights : https://pjreddie.com/media/files/yolov3.weights
- Face Mask Classifier ResNet50 Keras Model : https://drive.google.com/drive/folders/1Q59338kd463UqUESwgt7aF_W46Fj5OJd?usp=sharing
Also, before starting, you need to make sure that the paths to various files and folders in the notebook are updated according to your working environment. If you are using Google Colab, then:
- Mount Google Drive using:
  from google.colab import drive
  drive.mount("drive")
- Update file/folder locations as "drive/path_to_file_or_folder".
- NumPy : Used for storing and manipulating high dimensional arrays.
- Matplotlib : Used for plotting.
- Scikit-Learn : Used for DBSCAN clustering.
- PIL : Used for manipulating images.
- OpenCV : Used for manipulating images and video streams.
- Keras : Used for designing and training the Face_Mask_Classifier model.
- face-detection : Used for detecting faces with Dual Shot Face Detector.
- face-recognition : Used for detecting facial landmarks.
- tqdm : Used for showing progress bars.
- Google Colab : Used as the development environment for executing high-end computations on its backend GPUs/TPUs and for editing Jupyter Notebooks.
You are welcome to contribute:
- Fork it (https://github.com/rohanrao619/Social_Distancing_with_AI/fork)
- Create a new branch:
  git checkout -b new_feature
- Commit your changes:
  git commit -am 'Added new_feature'
- Push to the branch:
  git push origin new_feature
- Submit a pull request!
This project is licensed under the MIT License; see the LICENSE file for details.
YOLO (You Only Look Once) is a state-of-the-art, real-time object detection system. Its version 3 (pretrained on the COCO dataset), with a resolution of 416x416, is used in this project for obtaining the bounding boxes of individual persons in a video frame. For faster processing, a resolution of 320x320 can be used (at lowered accuracy). Similarly, a resolution of 512x512 or 608x608 can be used for even better detection accuracy (at lowered speed). You can change the resolution by modifying the height and width parameters in the yolov3.cfg file and in the blobFromImage function in the Monitor notebook. YOLOv3-tiny can also be used for speed optimization; however, it will result in decreased detection accuracy.
Dual Shot Face Detector (DSFD) is used throughout the project for detecting faces. Common face detectors such as Haar cascades or MTCNN are not efficient in this particular use case, as they are not able to detect faces that are covered or of low resolution. DSFD is also good at detecting faces in a wide range of orientations. It is a bit heavy on the pipeline, but produces accurate results.
A slightly modified ResNet50 model (with base layers pretrained on ImageNet) is used for classifying whether a face is properly masked or not. A combination of AveragePooling2D and Dense (with dropout) layers, ending with a Sigmoid or Softmax classifier, is appended on top of the base layers. Different architectures can be used for this purpose; however, complex ones should be avoided to reduce overfitting.
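A minimal sketch of such a classifier head in Keras (layer sizes and dropout rate are illustrative assumptions, not the exact architecture used in this project):

```python
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import AveragePooling2D, Flatten, Dense, Dropout
from tensorflow.keras.models import Model

# Base layers; weights="imagenet" would load the pretrained weights
# (weights=None is used here only to avoid the download).
base = ResNet50(weights=None, include_top=False, input_shape=(224, 224, 3))

# Simple head: pooling + dense with dropout + softmax over {mask, no_mask}.
x = AveragePooling2D(pool_size=(7, 7))(base.output)
x = Flatten()(x)
x = Dense(128, activation="relu")(x)
x = Dropout(0.5)(x)
out = Dense(2, activation="softmax")(x)

model = Model(inputs=base.input, outputs=out)
print(model.output_shape)  # (None, 2)
```

During training, the pretrained base layers would typically be frozen at first so only the new head is updated.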
For this classifier to work properly in all conditions, we need a diverse dataset that contains faces in various orientations and lighting conditions. For better results, our dataset should also cover people of different ages and gender. Finding such wide range of pictures having people wearing a face mask becomes a challenging task. Thus we need to apply numerous kinds of Augmentation before we start the training process.
It may seem a little awkward, but with the power of deep learning in our hands, impossible is nothing!
Original | Facial Landmarks |
The picture on the left is the original image, and the one on the right shows the points (green dots) needed to artificially put a face mask on it. These points are found by doing some manipulation on facial landmarks, namely nose_bridge and chin. The top point is near the 1st and 2nd points of the detected nose_bridge; the left, right and bottom points are near the first, last and middle points of the detected chin respectively.
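A sketch of how the four reference points might be derived, assuming the nose_bridge and chin landmarks have already been detected (the exact interpolation used in the project may differ; the helper and weights here are illustrative):

```python
def mask_anchor_points(nose_bridge, chin):
    """Derive top/left/right/bottom anchors for pasting a mask image.

    nose_bridge, chin: lists of (x, y) landmark tuples, as returned by
    a facial-landmark detector. The choices below are illustrative.
    """
    def midpoint(p, q):
        return ((p[0] + q[0]) // 2, (p[1] + q[1]) // 2)

    top = midpoint(nose_bridge[0], nose_bridge[1])  # near 1st/2nd nose points
    left = chin[0]                                  # first chin point
    right = chin[-1]                                # last chin point
    bottom = chin[len(chin) // 2]                   # middle chin point
    return top, left, right, bottom

# Example with dummy landmarks:
nose = [(50, 30), (50, 36), (50, 42), (50, 48)]
chin = [(20, 40), (25, 60), (35, 75), (50, 80), (65, 75), (75, 60), (80, 40)]
print(mask_anchor_points(nose, chin))
# ((50, 33), (20, 40), (80, 40), (50, 80))
```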
Now we just need to resize and rotate the image of mask according to these 4 reference points and paste it on the original one. Implementation details can be found in this notebook. Trying with different mask images we get this :
Default | White | Blue | Black |
These augmented pics do not seem very real, but having them is better than overfitting on a smaller dataset. We just need to take care that the original versions of these augmented samples do not appear in the training data, otherwise the model won't generalize well to new faces. This augmentation does not work very well for faces that are not fully visible or whose landmarks are not detected properly, but it still manages to produce decent results for various facial structures and orientations. Some other samples are shown below.
As we are working on video frames, it's highly probable that we will encounter blurred faces, and it's damn sure DSFD won't miss any one of those! This blurriness could be due to rapid movement, the face being out of focus, or random noise during capture. So we need to randomly add some kind of blurring effect to part of our training data. 3 types of effects are used :
- Motion Blur (Mimics Rapid Movement)
- Average Blur (Mimics Out of Focus)
- Gaussian Blur (Mimics Random Noise)
Implementation details can be found in this notebook. An example of average and Gaussian blur, each using a kernel of size (5,5), is shown below.
Original | Average Blur | Gaussian Blur |
On the same picture, motion blur with a kernel of size (7,7), applied in the 4 principal directions, looks something like this :
Horizontal | Vertical | Main Diagonal | Anti Diagonal |
It's training time! Using Keras' ImageDataGenerator, we apply augmentation on the fly, so the training data keeps changing after each epoch. Rotation, horizontal flip and brightness shift are some of the augmentations that can be applied. We also incorporate our blurring augmentation, with some associated probabilities, during training. Implementation details can be found in this notebook.
The model needs to be trained on tons of relevant data before we can apply it in real time and expect it to work. That takes a lot of computational power, and I mean a lot! We could try models trained on a small dataset on our local machines, but they would not produce desirable results. So, for now, we use a pretrained open-source model, the one trained by the team of Thang Pham. It is basically a ResNet50 model with a modified top.
Implementation details can be found in this notebook. Some optimizations can be made in the form of vectorization. There are various approaches for getting the position of a person. One, used in this project, is to simply use the centers of the bounding boxes. Another is to use OpenCV's perspective transform to get a bird's-eye view of the positions, but that needs fairly accurate frame-of-reference points and slightly increases the complexity of the system. However, if implemented correctly, it will no doubt produce better results. For now we stick to the first approach. Remember, there's always scope for improvement!
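The centers-plus-DBSCAN idea can be sketched as follows (the eps threshold in pixels and the sample boxes are illustrative assumptions; in practice the threshold would be calibrated to the scene):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Person bounding boxes as (x, y, w, h), e.g. from YOLOv3 detections.
boxes = np.array([
    [100, 200, 40, 120],
    [130, 210, 42, 118],   # close to the first person
    [600, 220, 38, 115],   # far away from everyone
])

# Position of each person = center of their bounding box.
centers = boxes[:, :2] + boxes[:, 2:] / 2

# People within `eps` pixels of each other form a "too close" cluster;
# label -1 marks people at a safe distance from everyone else.
labels = DBSCAN(eps=80, min_samples=2).fit_predict(centers)
print(labels)  # [0, 0, -1]: first two violate distancing, third is safe
```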
As we can see, the system identifies the people in the frame and their faces (if visible), and draws green or red bounding boxes depending on whether they are safe or unsafe respectively. Status is shown in a bar at the top with all the details. These are pretty decent results considering the complexity of the problem at hand. But we can't tell whether the Face Mask Classifier is working properly or not. Let's take a look at another example.
Yes! There we have it. It produces very neat results, detecting masked faces properly. These are just GIFs for embedding in this README; the original full-resolution video results can be seen here.
Artificial Intelligence has sure come a long way in solving such real world problems. Hoping that it helps the world overcome the COVID-19 situation, so we can party again, hold our glasses high, and say Cheers!
Thanks for going through this Repository! Have a nice day.
S Rohan Rao
Contact me : [email protected]