Face Detection Project

The goal of this project was to detect and locate human faces in a color image. A set of seven training images was provided for this purpose. The objective was to design and implement a face detector in MATLAB that detects human faces in an image similar to the training images.

The problem of face detection has been studied extensively. A wide spectrum of techniques has been used, including color analysis, template matching, neural networks, support vector machines (SVM), maximal rejection classification and model based detection. However, it is difficult to design algorithms that work for all illuminations, face colors, sizes and geometries, and image backgrounds. As a result, face detection remains as much an art as a science.

Our method uses rejection based classification. The face detector consists of a set of weak classifiers that sequentially reject non-face regions. First, the non-skin color regions are rejected using color segmentation. A set of morphological operations is then applied to filter the clutter resulting from the previous step. The remaining connected regions are then classified based on their geometry and the number of holes. Finally, template matching is used to detect zero or more faces in each connected region. A block diagram of the detector is shown in Figure 1.

Figure 1: Block Diagram of Face Detector.


The goal of skin color segmentation is to reject non-skin color regions from the input image. It is based on the fact that the color of the human face across all races agrees closely in its chrominance value and varies mainly in its luminance value.

We chose the HSV (Hue, Saturation, Value) color space for segmentation since it decouples the chrominance information from the luminance information. Thus we can focus only on the hue and saturation components.

During the execution of the detector, segmentation is performed as follows:

  • The input image is subsampled at 2:1 to improve computational efficiency
  • The resulting image is converted to HSV color space
  • All pixels that fall outside the H and S thresholds are rejected (marked black).
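
The three steps above can be sketched as follows. This is a minimal illustration in Python rather than the project's MATLAB code, and the threshold values H_RANGE and S_RANGE are hypothetical placeholders, since the actual H and S limits are not given here.

```python
import colorsys

# Hypothetical thresholds: the text does not state the H and S ranges used.
H_RANGE = (0.0, 0.14)   # hue as a fraction of the color wheel (0..1)
S_RANGE = (0.15, 0.70)  # saturation (0..1)


def subsample(image, step=2):
    """Keep every `step`-th pixel in both directions (2:1 subsampling)."""
    return [row[::step] for row in image[::step]]


def is_skin(rgb):
    """Test one (R, G, B) pixel with 0..255 channels against the H and S ranges."""
    r, g, b = (channel / 255.0 for channel in rgb)
    h, s, _v = colorsys.rgb_to_hsv(r, g, b)  # V (luminance) is ignored
    return H_RANGE[0] <= h <= H_RANGE[1] and S_RANGE[0] <= s <= S_RANGE[1]


def segment_skin(image):
    """Return the subsampled image with non-skin pixels marked black."""
    small = subsample(image)
    return [[px if is_skin(px) else (0, 0, 0) for px in row] for row in small]
```

Here an image is a list of rows of (R, G, B) tuples. Ignoring the V component is what makes the test insensitive to luminance.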

The result is shown in Figure 3.

Figure 3: Image After Histogram Thresholding (Training Image 1).


Figure 3 shows that skin color segmentation did a good job of rejecting non-skin colors from the input image. However, the resulting image has quite a bit of noise and clutter. A series of morphological operations is performed to clean up the image, as shown in Figure 4. The goal is to end up with a mask image that can be applied to the input image to yield skin color regions without noise and clutter.
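
The exact sequence of operations is not listed here. As one plausible illustration, the Python sketch below applies a binary opening (erosion followed by dilation) with a 3x3 square structuring element, which removes isolated specks while keeping larger blobs. The choice of operation and element size is an assumption.

```python
def _window(mask, r, c):
    """Yield the 3x3 neighborhood of (r, c); pixels outside the mask count as 0."""
    rows, cols = len(mask), len(mask[0])
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            yield mask[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else 0


def erode(mask):
    """Keep a pixel only if its whole 3x3 neighborhood is set."""
    return [[int(all(_window(mask, r, c))) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def dilate(mask):
    """Set a pixel if any pixel in its 3x3 neighborhood is set."""
    return [[int(any(_window(mask, r, c))) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def open_mask(mask):
    """Binary opening: erosion removes small specks, dilation restores the rest."""
    return dilate(erode(mask))
```

The mask is a list of rows of 0/1 values. Multiplying the cleaned mask with the input image pixel by pixel would then leave only the retained skin regions.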

Figure 4: Morphological Processing on the Color Segmented Image.


The image output by morphological processing still contains quite a few non-face regions. Most of these are hands, arms, regions of dress that match skin color and some portions of background. In connected region analysis, image statistics from the training set are used to classify each connected region in the image.

Rejection Based on Geometry:

We defined four classes of regions that have a very high probability of being non-faces based on their bounding box:

narrow            Regions that have a small width
short             Regions that have a small height
narrow and tall   Regions that have a small width but large height
wide and short    Regions that have a large width but small height

Flow Chart for Rejection Based on Region Geometry.
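
A rough Python sketch of this kind of bounding-box test is given below. The four limits are hypothetical values chosen for illustration, and expressing "narrow and tall" and "wide and short" as aspect-ratio tests is an assumption; the real limits come from training-set statistics that are not reproduced here.

```python
# Hypothetical limits; the source derives its own from the training images.
MIN_WIDTH = 20     # pixels, below this the region is "narrow"
MIN_HEIGHT = 20    # pixels, below this the region is "short"
MAX_ASPECT = 2.5   # height / width above this is "narrow and tall"
MIN_ASPECT = 0.4   # height / width below this is "wide and short"


def classify_region(width, height):
    """Return the rejection class of a bounding box, or "candidate" if it survives."""
    if width < MIN_WIDTH:
        return "narrow"
    if height < MIN_HEIGHT:
        return "short"
    aspect = height / width
    if aspect > MAX_ASPECT:
        return "narrow and tall"
    if aspect < MIN_ASPECT:
        return "wide and short"
    return "candidate"
```

For example, with these placeholder limits a 30 x 100 pixel region (width x height) would be rejected as "narrow and tall", while a 40 x 50 region would be kept as a face candidate.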

Rejection Based on Euler Number:

The Euler number of an image is defined as the number of objects in the image minus the total number of holes in those objects. Euler number analysis is based on the fact that the regions of the eyes, nose and lips are distinctly darker than other face regions and show up as holes after proper thresholding of the intensity level.
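
The definition above can be illustrated with a small Python sketch that counts objects and holes by flood fill. It assumes a binary mask (1 = region pixel) surrounded by at least one pixel of background, uses 8-connectivity for objects and 4-connectivity for holes, and treats any background component touching the image border as outside rather than as a hole. These conventions are assumptions, not details taken from the project.

```python
N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def _count_components(grid, value, neighbors, skip_border):
    """Count connected components of `value`; optionally ignore border-touching ones."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if grid[r0][c0] != value or seen[r0][c0]:
                continue
            stack = [(r0, c0)]
            seen[r0][c0] = True
            touches_border = False
            while stack:  # iterative flood fill of one component
                r, c = stack.pop()
                if r in (0, rows - 1) or c in (0, cols - 1):
                    touches_border = True
                for dr, dc in neighbors:
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < rows and 0 <= cc < cols
                            and grid[rr][cc] == value and not seen[rr][cc]):
                        seen[rr][cc] = True
                        stack.append((rr, cc))
            if not (skip_border and touches_border):
                count += 1
    return count


def euler_number(mask):
    """Number of objects minus number of holes in a binary mask."""
    objects = _count_components(mask, 1, N8, skip_border=False)
    holes = _count_components(mask, 0, N4, skip_border=True)
    return objects - holes
```

A solid blob has Euler number 1, while a region with holes for two eyes and a mouth has Euler number 1 - 3 = -2, so a low Euler number is a hint that a region may contain a face.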

The Output of Connected Region Analysis for Training Image 1.


The basic idea of template matching is to convolve the image with another image (template) that is representative of faces. Finding an appropriate template is a challenge since ideally the template (or group of templates) should match any given face irrespective of the size and exact features.
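
As an illustration of the idea, the Python sketch below slides a template over a grayscale image and scores each position. It uses normalized cross-correlation as a stand-in for the convolution mentioned above, which is an assumption, and it handles a single template at a single scale only.

```python
import math


def _zero_mean(values):
    """Subtract the mean so the score ignores overall brightness."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def ncc_map(image, template):
    """Normalized cross-correlation score for every valid template position."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    t = _zero_mean([v for row in template for v in row])
    t_norm = math.sqrt(sum(v * v for v in t))
    scores = []
    for r in range(ih - th + 1):
        row_scores = []
        for c in range(iw - tw + 1):
            patch = _zero_mean([image[r + i][c + j]
                                for i in range(th) for j in range(tw)])
            p_norm = math.sqrt(sum(v * v for v in patch))
            denom = p_norm * t_norm
            # A flat patch has zero norm; score it as 0 instead of dividing by zero.
            score = sum(a * b for a, b in zip(patch, t)) / denom if denom else 0.0
            row_scores.append(score)
        scores.append(row_scores)
    return scores


def best_match(image, template):
    """Return ((row, col), score) of the top-left corner with the highest score."""
    scores = ncc_map(image, template)
    score, r, c = max((s, r, c) for r, row in enumerate(scores)
                      for c, s in enumerate(row))
    return (r, c), score
```

Images and templates are lists of rows of gray levels. To find several faces in one connected region, one option would be to take the peak, blank out the matched area, and repeat until the peak score falls below a chosen limit.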

Template Generation:

The template was originally generated by cropping out all the faces in the training set using the ground truth data and averaging over them. We observed that the intensity image obtained after color segmentation contained faces with a neck region, so we decided to modify our template to include the neck region.
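
The averaging step can be sketched in Python as a pixel-wise mean over face crops. The sketch assumes the crops are grayscale and have already been resized to a common size, a step that is not described here.

```python
def average_template(crops):
    """Pixel-wise mean of equally sized grayscale crops (lists of rows)."""
    if not crops:
        raise ValueError("at least one crop is required")
    height, width = len(crops[0]), len(crops[0][0])
    for crop in crops:
        if len(crop) != height or any(len(row) != width for row in crop):
            raise ValueError("all crops must have the same size")
    n = len(crops)
    return [[sum(crop[r][c] for crop in crops) / n for c in range(width)]
            for r in range(height)]
```

Averaging smooths away individual features, which is why the resulting template responds to the overall face shape rather than to any one person.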

The Templates Used: (a) Without Neck Region, (b) With Neck Region.


Identifying the female faces was probably the trickiest part of the project. As noted in the previous section, template matching was not an option here since it is insensitive to facial features. Instead, we decided to use some heuristics. One observation was that one of the females was wearing a white scarf on her head.

So if we drew a box extending from the coordinates of each face up to the top of the head and counted the number of white pixels in it, this female would have the maximum count. Using this heuristic, we spotted the female in four out of five training images. For the other females, we decided to look for long hair. This was done by cropping a box starting below the face coordinates and extending to the chin, and counting the number of black pixels.
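
The white-scarf heuristic can be sketched in Python as below. The box dimensions and the WHITE_THRESHOLD gray level are hypothetical parameters, and the face coordinate is assumed to be a (row, column) position in a grayscale image.

```python
WHITE_THRESHOLD = 200  # hypothetical gray level (0..255) counted as "white"


def count_white_above(gray, face_row, face_col, box_height, box_half_width):
    """Count white pixels in a box extending upward from a face coordinate."""
    rows, cols = len(gray), len(gray[0])
    top = max(0, face_row - box_height)
    bottom = min(face_row, rows)
    left = max(0, face_col - box_half_width)
    right = min(cols, face_col + box_half_width + 1)
    return sum(1 for r in range(top, bottom) for c in range(left, right)
               if gray[r][c] >= WHITE_THRESHOLD)


def pick_scarf_face(gray, faces, box_height, box_half_width):
    """Return the face coordinate with the most white pixels above it."""
    return max(faces, key=lambda f: count_white_above(
        gray, f[0], f[1], box_height, box_half_width))
```

The long-hair test could follow the same pattern, with the box placed below the face coordinate and dark pixels counted instead of white ones.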


The final result for training image 1 is shown in Figure 14.

Figure 14: Final Result for Training Image 1.

Our face detector finds 160 of the 164 faces in the seven training images, with one false positive. This results in an accuracy of about 97%. The average running time on a 1.8 GHz Pentium 4 PC was 35 seconds.


We have presented a face detector with reasonably good accuracy and running time. However, many aspects of the design are tuned for the constrained scene conditions of the training images provided, which hurts its robustness. This is a reasonable trade-off given the scope and requirements of the project. Our algorithm depends on the color information in the image and will not work for a grayscale image.

We feel that detecting connected faces was the hardest part of the project. A great deal of time was spent coming up with a template matching scheme that adapts well to connected faces, including those that are partly visible.

Source: Stanford University
Authors: Waqar Mohsin | Noman Ahmed | Chung-Tse Mar
