Face Detection using MTCNN

Face detection is a computer vision problem that involves finding faces in photos. 

It is a trivial problem for humans to solve and has been solved reasonably well by classical feature-based techniques, such as the cascade classifier. More recently, deep learning methods have achieved state-of-the-art results on standard benchmark face detection datasets. One example is the Multi-task Cascaded Convolutional Neural Network, or MTCNN for short.

More precisely, face detection is the task of locating and localizing one or more faces in a photograph.

Locating a face refers to finding the coordinates of the face in the image, whereas localizing it refers to demarcating the extent of the face, often with a bounding box around it.

MTCNN Installation

$ pip install mtcnn 

This implementation requires OpenCV>=4.1 and Keras>=2.0.0 (any TensorFlow version supported by Keras is supported by this MTCNN package). If this is your first time using TensorFlow, you will probably need to install it on your system.
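
Before running the detector, it can help to confirm that the dependencies are importable. The sketch below is only a convenience check and is not part of the MTCNN package: the helper name missing_modules is hypothetical, and the import names "cv2" and "tensorflow" are assumed to correspond to the OpenCV and TensorFlow installations mentioned above. It does not check version numbers.

```python
from importlib.util import find_spec

# Import names assumed for the packages mentioned in the text:
# "mtcnn" (this package), "cv2" (OpenCV), "tensorflow" (TensorFlow).
REQUIRED_MODULES = ("mtcnn", "cv2", "tensorflow")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names from `names` that cannot be found."""
    # find_spec returns None when a top-level module is not installed;
    # it does not import the module itself.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```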

The following example illustrates the ease of use of this package: 

>>> from mtcnn import MTCNN
>>> import cv2
>>> # OpenCV loads images in BGR order, so convert to RGB before detection
>>> img = cv2.cvtColor(cv2.imread("ivan.jpg"), cv2.COLOR_BGR2RGB)
>>> detector = MTCNN()
>>> detector.detect_faces(img)
[{'box': [277, 90, 48, 63],
  'keypoints': {'nose': (303, 131), 'mouth_right': (313, 141), 'right_eye': (314, 114), 'left_eye': (291, 117), 'mouth_left': (296, 143)},
  'confidence': 0.99851983785629272}]

The detector returns a list of JSON-like objects (Python dictionaries), one per detected face. Each object contains three main keys: 'box', 'confidence' and 'keypoints':

  • The bounding box is formatted as [x, y, width, height] under the key 'box'.
  • The confidence, under the key 'confidence', is the estimated probability that the bounding box contains a face.
  • The keypoints are formatted as a JSON-like object with the keys 'left_eye', 'right_eye', 'nose', 'mouth_left' and 'mouth_right'. Each keypoint is identified by a pixel position (x, y).
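
Since the box is stored as [x, y, width, height], cropping or drawing usually needs the opposite corner as well. The following is a minimal sketch of that conversion, applied to the detection from the example above. The helper name box_to_corners is hypothetical, the 640×480 image size is a placeholder (the size of ivan.jpg is not given here), and the clamping is a defensive assumption in case a box extends past the image border.

```python
def box_to_corners(box, image_width, image_height):
    """Convert [x, y, width, height] to (x1, y1, x2, y2), clamped to the image."""
    x, y, w, h = box
    x1 = max(0, x)
    y1 = max(0, y)
    x2 = min(image_width, x + w)
    y2 = min(image_height, y + h)
    return x1, y1, x2, y2


# The detection returned in the example above.
detection = {
    "box": [277, 90, 48, 63],
    "keypoints": {
        "nose": (303, 131),
        "mouth_right": (313, 141),
        "right_eye": (314, 114),
        "left_eye": (291, 117),
        "mouth_left": (296, 143),
    },
    "confidence": 0.99851983785629272,
}

# Placeholder image size, assumed for illustration only.
x1, y1, x2, y2 = box_to_corners(detection["box"], image_width=640, image_height=480)
print((x1, y1, x2, y2))  # (277, 90, 325, 153)

# Every keypoint of this detection falls inside the converted box.
inside = all(x1 <= px <= x2 and y1 <= py <= y2
             for px, py in detection["keypoints"].values())
print(inside)  # True
```

With a NumPy image such as the img array loaded by OpenCV, the face region could then be taken as img[y1:y2, x1:x2], since rows correspond to y and columns to x.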

MTCNN produces strong results in practice, including when multiple faces appear in a single frame.

MTCNN can serve as a base for applications such as face detection, face recognition, and emotion detection.
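
When the detector output feeds a downstream step such as recognition, a common first move is to discard low-confidence boxes and process the largest faces first. The sketch below illustrates that idea on the result format described earlier. The function name select_faces and the 0.9 threshold are assumptions chosen for illustration, and the second and third sample detections are made up rather than real detector output.

```python
def select_faces(detections, min_confidence=0.9):
    """Keep detections at or above min_confidence, sorted by box area, largest first."""
    kept = [d for d in detections if d["confidence"] >= min_confidence]
    # box is [x, y, width, height], so the area is width * height
    return sorted(kept, key=lambda d: d["box"][2] * d["box"][3], reverse=True)


# Illustrative input: only the first box comes from the example in this post.
sample = [
    {"box": [277, 90, 48, 63], "confidence": 0.9985},
    {"box": [10, 20, 30, 30], "confidence": 0.62},   # made-up, low confidence
    {"box": [100, 40, 80, 90], "confidence": 0.97},  # made-up, larger face
]

for face in select_faces(sample):
    print(face["box"], face["confidence"])
```

The right threshold depends on the application: a higher value reduces false positives at the cost of missing some faces.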

The following tables show benchmarks of this MTCNN implementation running on an Intel i7-3612QM CPU @ 2.10GHz with a CPU-based TensorFlow 1.4.1.

  • Pictures containing a single frontal face:

    Image size    Total pixels    Process time (s)    FPS
    460×259       119,140         0.118               8.5
    561×561       314,721         0.227               4.5
    667×1000      667,000         0.456               2.2
    1920×1200     2,304,000       1.093               0.9
    4799×3599     17,271,601      8.798               0.1

  • Pictures containing 10 frontal faces:

    Image size    Total pixels    Process time (s)    FPS
    474×224       106,176         0.185               5.4
    736×348       256,128         0.290               3.4
    2100×994      2,087,400       1.286               0.7
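
The FPS column appears to be the reciprocal of the process time, so the figures can be rechecked, and a pixels-per-second throughput derived, with a few lines of arithmetic. This is a small sketch using the single-face rows above. The function names are hypothetical, and the printed values may differ from the table in the last digit because of rounding.

```python
def frames_per_second(process_time_s):
    """Frames per second implied by a per-image processing time in seconds."""
    return 1.0 / process_time_s


def pixels_per_second(width, height, process_time_s):
    """Throughput in pixels per second for one image of the given size."""
    return (width * height) / process_time_s


# (width, height, process time in seconds) from the single-face table
single_face = [
    (460, 259, 0.118),
    (561, 561, 0.227),
    (667, 1000, 0.456),
    (1920, 1200, 1.093),
    (4799, 3599, 8.798),
]

for w, h, t in single_face:
    fps = frames_per_second(t)
    mpx = pixels_per_second(w, h, t) / 1e6
    print(f"{w}x{h}: {fps:.1f} FPS, {mpx:.2f} megapixels/s")
```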
