Deep Learning Classification of Face Images with varying Illumination Conditions

In face recognition systems, recognition accuracy is greatly affected by the varying degree of illumination on both the probe and testing faces. In particular, changes in the direction and intensity of illumination are the two major contributors to varying illumination. Different approaches have been proposed to overcome these challenges. The study presented in this paper proposes a novel approach that uses deep learning, in a MATLAB environment, to classify face images under varying illumination conditions. The one thousand one hundred (1100) face images employed were obtained from the Yale B Extended database. The face images were divided into ten (10) folders, and each folder was further divided into seven (7) subsets based on the azimuthal angle of illumination. The images were filtered using a combination of linear filters and an anisotropic diffusion filter. The filtered images were then segmented into light and dark zones with respect to the azimuthal and elevation angles of illumination. Eighty percent (80%) of the images in each subset, which form the training set, were used to train the deep learning network, while the remaining twenty percent (20%), which form the testing set, were used to test the classification accuracy of the trained network. Over three successive iterations, the performance evaluation results showed that the classification accuracy varies from 81.82% to 100.00%.


I. INTRODUCTION
Face recognition technology, which is one of the most successful applications of computer vision and image processing, has over the last two decades gained recognition within the computer vision community. This is attributable to its wide range of applications in several fields. One such application is the identification of crime suspects by law enforcement and security agencies. Although the development of face recognition systems has reached a certain level of maturity, their success is limited by the conditions imposed by many real applications. Some of these conditions are due to variation in pose, illumination, expression and age [1]-[2]. For example, recognition of face images acquired in a real environment with changes in light intensity and direction remains a largely unsolved problem [3]. As a result, many techniques have been developed to tackle this problem. One of these techniques is the discrete wavelet transform (DWT), a multi-resolution image analysis technique that extracts the low and high frequency components of an image at different scales of operation. The technique has been successfully utilized in different face recognition systems as a tool to extract the facial features of a given face image [4]-[5]. Another technique employed is the gradient face method [6]. According to [6], the gradient face method is an illumination-insensitive measure that is quite robust to different illumination intensities, including uncontrolled natural lighting. This method extracts gradient faces from the gradient domain of a face image in order to discover the underlying features that are peculiar to that face image. Two other techniques that researchers have developed to tackle the problem of illumination variation are the histogram [7] and Gabor filters [8].
Published on May 1, 2019. C. G. Olebu, J. J. Popoola, M. R. Adu, Y. O. Olasoji and S. A. Oyetunji are with the Department of Electrical and Electronics Engineering, School of Engineering and Engineering Technology, Federal University of Technology, P.M.B. 704, Akure, Nigeria (e-mail: cgolebu@futa.edu.ng; jjpopoola@futa.edu.ng; mradu@futa.edu.ng; yoolasoji@futa.edu.ng; saoyetunji@futa.edu.ng).
One of the major challenges of the above techniques is that they perform differently under different illumination conditions. This implies that the normalization of images using different normalization methods is significantly influenced by the azimuthal angle and angle of elevation of the illumination of the face image. An example of how variation in illumination affects the normalization of face images was presented in [9], where an improved retinex algorithm was used. Experiments on the Yale B face database showed that the normalization algorithm performed differently on the five subsets into which the face images were classified based on the angles of illumination. Furthermore, the study presented in [10] combined the difference of Gaussian (DOG) and contrast equalization methods to normalize face images with varying illumination conditions. An experiment on the Yale B face database also showed that the normalization performance of the proposed method differed across the classes of faces in the database and varied with illumination intensity. A wavelet-based illumination normalization technique was presented in [11]. In an experiment on the Yale B face database, the recognition rates for subsets 1 and 2 were greater than those for subsets 4, 5 and 6. The results in [11] also revealed different recognition rates, with varying accuracy, for images with varying illumination in the extended Yale B face database.
Thus, in the study presented in this paper, a new approach is proposed to classify face images under uncontrolled illumination in order to facilitate adaptive illumination normalization before the actual face recognition process. Detailed information on the new approach is presented in Section III.

II. LITERATURE REVIEW
An illumination-aware method that uses image quality indices to adaptively select the fusion of match scores from the high and low frequency components of a represented face image for wavelet-based multi-stream face recognition has been shown to improve face recognition under different lighting conditions [12]. This technique employs a luminance distortion factor to measure the regional or global illumination quality of images and normalizes an image only if its quality is below a predefined threshold. A method that hybridizes homomorphic filtering, histogram equalization and a combination of the two was proposed in [13]. The proposed method overcomes the problem of images with poor contrast and/or illumination variations. The method was evaluated using face images obtained from the Yale B face database, which were divided into four classes based on the quality of contrast and illumination.
Similarly, in [14], an angle classification technique that utilizes the similarity discrete cosine transform to obtain the light-estimated face image was proposed. In [14], principal component analysis (PCA), two-dimensional linear discriminant analysis (2DLDA) and image similarity methods were used to classify the light source direction. Likewise, a quality-aware technique that selectively denoises only noisy images was presented by Joshi and Prakash [15]. The authors used image quality estimation techniques based on noise detection in order to avoid blind enhancement of images that require little or no enhancement. In the proposed technique, which combines quality estimation and image enhancement modules, an image is enhanced only if its quality value is below a predetermined threshold. Another study, which considered image quality an important factor in face image classification, is presented in [16]. That study surveyed different concepts and interpretations of biometric quality and extensively considered the factors that cause different types of biometric sample degradation, including those attributable to the degradation of face images.

III. MATERIALS AND METHOD
This section is divided into two subsections. The first presents the material used and its source. The second gives detailed information on the activities involved in carrying out the study.

A. Data Collection
The face images employed in this study were retrieved from the Cropped Yale-B Extended Face Database. The retrieved face images were preprocessed before being divided into training and testing sets. The database consists of 39 characters, each with 64 instances/images, which are categorized by the angle of elevation and azimuthal angle of the light source incident on the face. The variation of the face images with light is consistent across all characters in the database. All images in each character folder were divided into 7 subsets based on specific ranges of the azimuthal angle of illumination, as described in [14]. Table I shows the relationship between the ranges of azimuthal angle of illumination and the subsets considered, as well as the division of all ten (10) characters into subsets. Furthermore, the azimuthal angle and angle of elevation of illumination for the first character, after classification into the various subsets, are plotted in Fig. 1. The figure shows that the seven subsets are visually separable based on the arrangement of points on the graph, although the distribution of light on a face image can be inconsistent because of differences in the makeup of human faces. Fig. 1 also shows that a relationship exists between the subsets' azimuthal angles of illumination and their angles of elevation. For characters yaleB01 to yaleB10, the seven subsets were created for each character and later combined into a whole. The number of face images in each subset differed after division. In order to have the same number of face images in each subset, complementary face images were randomly chosen from other characters (between yaleB11 and yaleB39), using the same selection criteria as before, until all subsets had the same number (110) of face images. Fig. 2 shows the subset-divided face images for the yaleB01 character.
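The subset assignment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the file-name pattern (`A` followed by the azimuth and `E` followed by the elevation, as in `yaleB01_P00A-035E+15.pgm`) is assumed from the usual naming of the cropped database, and the bin edges are hypothetical placeholders, since the actual ranges are those listed in Table I. The ordering of subset numbers with respect to the sign of the azimuth is likewise an arbitrary choice here.

```python
import re

# Assumed file-name pattern, e.g. "yaleB01_P00A-035E+15.pgm"
# (A = azimuth, E = elevation, both in degrees).
NAME_PATTERN = re.compile(r"A([+-]\d{3})E([+-]\d{2})")

# HYPOTHETICAL azimuth bin edges in degrees; the real ranges are in Table I.
HYPOTHETICAL_EDGES = [-60, -35, -10, 10, 35, 60]


def parse_angles(filename):
    """Return (azimuth, elevation) in degrees parsed from a file name."""
    match = NAME_PATTERN.search(filename)
    if match is None:
        raise ValueError(f"no illumination angles in {filename!r}")
    return int(match.group(1)), int(match.group(2))


def assign_subset(azimuth, edges=HYPOTHETICAL_EDGES):
    """Map an azimuth to a subset index between 1 and len(edges) + 1."""
    subset = 1
    for edge in edges:
        if azimuth > edge:
            subset += 1
    return subset


azimuth, elevation = parse_angles("yaleB01_P00A-035E+15.pgm")
subset = assign_subset(azimuth)
```

With six edges the function yields seven subsets, and a frontal light source (azimuth 0) falls in the middle subset, which matches the central position of Subset 4 in the paper only by construction of the placeholder edges.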

B. Method
Classifying the prepared face images under varying degrees of illumination involved several steps. Firstly, the luminance of each face image in the prepared dataset was determined, followed by illumination normalization using anisotropic diffusion filtering. Secondly, the resulting image was segmented. Finally, a deep convolutional neural network was trained on the images, and the generated network was used to classify the illumination angle of face images in the testing set. The sequential steps involved are elaborated in the subsequent subsections.

Luminance Determination
The first step in determining the luminance of the face image is filtering with a Max-Min filter using a rectangular kernel. A max filter outputs the maximal pixel value from its rectangular window, as shown in Fig. 3, while a min filter outputs the minimal pixel value from its rectangular window. A straightforward implementation has a second-order time complexity of $O(r^2)$ [17] for each image pixel within the neighbourhood of concern [18].
Fig. 3. Rectangular neighbourhood [19]
In each operation (max and min filtering), the central image pixel is replaced with the maximum or minimum of all the pixels in the North, South, East, West, North-East, South-West, South-East and North-West directions. The max filter, according to [20], is mathematically expressed as:
$I_{\max}(x, y) = \max_{(i, j) \in W(x, y)} I(i, j)$ (1)
where $I_{\max}(x, y)$ is the resulting pixel of the maximum operation over the local neighbourhood $W(x, y)$. The function $\max$ calculates the maximum value of all pixels in $W(x, y)$. Furthermore, the min filter can be mathematically expressed as:
$I_{\min}(x, y) = \min_{(i, j) \in W(x, y)} I(i, j)$ (2)
where $I_{\min}(x, y)$ is the resulting pixel of the minimum operation within $W(x, y)$. The function $\min$ calculates the minimum value of all pixels in $W(x, y)$. After obtaining $I_{\max}$ and $I_{\min}$, an illumination fusion operation is implemented on these image components by finding their average, pixel by pixel, for each corresponding location. Using the method proposed in [21], the light shielding edges and other regions were distinguished. The images were then segmented, and the fused illumination estimation result was defined using equations (3)-(5), which involve a constant varying between 0 and 1. In the study presented in this paper, this constant is chosen to be 0.6, following the value adopted in [21].
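The max filter, the min filter and their pixel-by-pixel average can be sketched as follows. This is a minimal illustration, not the authors' MATLAB code: the 3 x 3 window (`radius=1`), the clipping of the window at the image border and the function names are assumptions, and the further weighting of equations (3)-(5) with the constant 0.6 is not reproduced because those equations are not available here.

```python
def window_filter(image, reduce_fn, radius=1):
    """Apply reduce_fn (max or min) over a square window clipped at the borders."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            neighbours = [
                image[j][i]
                for j in range(max(0, y - radius), min(rows, y + radius + 1))
                for i in range(max(0, x - radius), min(cols, x + radius + 1))
            ]
            out[y][x] = reduce_fn(neighbours)
    return out


def fuse_max_min(image, radius=1):
    """Average the max-filtered and min-filtered images pixel by pixel."""
    i_max = window_filter(image, max, radius)
    i_min = window_filter(image, min, radius)
    return [
        [(a + b) / 2 for a, b in zip(row_max, row_min)]
        for row_max, row_min in zip(i_max, i_min)
    ]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
fused = fuse_max_min(image)
```

Each output pixel inspects every pixel of its window, which is where the quadratic cost per pixel in the window width comes from.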

Anisotropic Diffusion Filtering
The fused estimation result obtained above was filtered using an anisotropic filtering technique. An anisotropic filter is a non-linear, partial differential equation-based diffusion process [22]. The filter was applied to reduce the diffusion effects of linear smoothing on images, which cause blurring and dislocation of the core edges of the image. The anisotropic diffusion filter used in this study is expressed mathematically in (6) and (7). In (6), the parameters are the rate of diffusion across edges, the number of neighbours over all cardinal points, and the pixel position in the discrete 2-D grid. In (7), the parameters are the diffusion gradient threshold, the gradient (considered for all cardinal points), and the conductance function, determined in all four directions. In order to reduce diffusion across edges, the edge stopping function (conductance function) used by [23] was employed; it is given in (8).
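As an illustration of the kind of update this describes, the sketch below performs one explicit step of a standard discrete anisotropic diffusion scheme over the four cardinal neighbours. It is an assumption-laden sketch rather than the filter of equations (6)-(8): the exponential edge stopping function, the parameter names `k` (gradient threshold) and `lam` (diffusion rate), and their default values are all chosen for illustration only.

```python
import math


def conductance(gradient, k):
    """Assumed exponential edge stopping function: near 1 in flat areas, near 0 at edges."""
    return math.exp(-((gradient / k) ** 2))


def diffuse_step(image, k=15.0, lam=0.2):
    """One explicit diffusion update using the North, South, West and East neighbours."""
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(rows):
        for x in range(cols):
            centre = image[y][x]
            flows = []
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < rows and 0 <= nx < cols:
                    grad = image[ny][nx] - centre
                    flows.append(conductance(grad, k) * grad)
            if flows:
                # Scale by the diffusion rate and the number of neighbours used.
                out[y][x] = centre + lam * sum(flows) / len(flows)
    return out


spike = [[0.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 0.0]]
smoothed = diffuse_step(spike)
edge = [[0.0, 0.0, 100.0, 100.0]]
preserved = diffuse_step(edge)
```

Small gradients (the spike of height 10) are smoothed, while a gradient far above the threshold (the step of height 100) receives a conductance close to zero and is left almost untouched, which is the edge-preserving behaviour the text relies on.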

Image Segmentation
The fused estimation result obtained previously was segmented into binary values of either 0 or 1. An appropriate threshold value was chosen to facilitate better binarization. This process helped separate the light-dominant region from the dark-dominant region of the face image. The process is represented mathematically in (9) and (10).

C. Classifying the Subsets
Deep learning, which is an aspect of machine learning, permits computational models composed of multiple processing layers to learn representations of data with multiple levels of abstraction. This technique discovers intricate structure in large datasets by using the back-propagation algorithm to indicate how a machine should change its parameters using the relationship between layers [24]. At this stage, we evaluate the applicability of a deep convolutional neural network (deep learning) to the classification problem described in this work. The deep learning architecture utilized in this paper consists of an Image Input Layer, a 2-D Convolutional Layer, a Rectified Linear Unit, a Max-Pooling Layer, a Fully Connected Layer, a Soft-Max Layer and a Classification Layer. These layers and their functions are explained in the following subsections.
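To make the layer stack concrete, the sketch below walks the feature-map size through the layers named above. Every number in it is an assumption made for illustration: a 192 x 168 single-channel input (the usual size of the cropped database images, which may differ after preprocessing), a 2 x 2 convolution mask with 25 filters and a 3 x 3 pooling window (one reading of the third parameter set reported with the results), a stride of 1, no padding, and seven output classes.

```python
def window_output(height, width, size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    out_h = (height + 2 * padding - size) // stride + 1
    out_w = (width + 2 * padding - size) // stride + 1
    return out_h, out_w


# Assumed values, see the paragraph above.
height, width = 192, 168
num_filters, num_classes = 25, 7

conv_h, conv_w = window_output(height, width, size=2)
pool_h, pool_w = window_output(conv_h, conv_w, size=3)
fc_inputs = pool_h * pool_w * num_filters

layer_shapes = [
    ("image input", (height, width, 1)),
    ("2-D convolution + ReLU", (conv_h, conv_w, num_filters)),
    ("max pooling", (pool_h, pool_w, num_filters)),
    ("fully connected + soft-max", (num_classes,)),
]
```

Under these assumptions the fully connected layer receives `fc_inputs` values per image and maps them to seven class scores; a larger pooling stride would shrink that number considerably.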

Image Input Layer
In this layer, the image data was acquired and the operations described above were applied to produce the image information to be processed. 80% of the images were used to train the neural network, while the remaining 20% were used to test the classification ability of the network.
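The 80/20 division can be sketched per subset as below. The random shuffle, the fixed seed and the placeholder image identifiers are assumptions for illustration; the paper does not state how images were assigned to the two sets. The subset size of 110 images is the one given in the Data Collection subsection.

```python
import random


def split_subset(items, train_fraction=0.8, seed=0):
    """Shuffle one subset and split it into (training, testing) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Seven subsets of 110 placeholder image identifiers each.
subsets = {s: [f"subset{s}_img{i:03d}" for i in range(110)] for s in range(1, 8)}
splits = {s: split_subset(images) for s, images in subsets.items()}

train_total = sum(len(train) for train, _ in splits.values())
test_total = sum(len(test) for _, test in splits.values())
```

Splitting inside each subset keeps the classes balanced: every subset contributes 88 training and 22 testing images.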

2-D Convolutional Layer
Following the image input layer is the 2-D convolutional layer, in which a convolution mask was used as in [25]. Using the mask, the convolution operation was implemented by summing the products of each element of the mask and the corresponding element in the local neighborhood, as shown in Fig. 4. In this study, the mask was used with 25 neurons connected to the same region of the input.
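The multiply-and-sum operation can be written out directly. The sketch below is illustrative only: it assumes no padding and a stride of 1, uses a made-up 2 x 2 mask, and, as is usual for convolutional layers, does not flip the mask before sliding it.

```python
def convolve2d_valid(image, mask):
    """Slide the mask over the image and sum the element-wise products."""
    m_rows, m_cols = len(mask), len(mask[0])
    out_rows = len(image) - m_rows + 1
    out_cols = len(image[0]) - m_cols + 1
    return [
        [
            sum(
                mask[j][i] * image[y + j][x + i]
                for j in range(m_rows)
                for i in range(m_cols)
            )
            for x in range(out_cols)
        ]
        for y in range(out_rows)
    ]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
mask = [[1, 0], [0, 1]]  # hypothetical mask weights
feature_map = convolve2d_valid(image, mask)
```

In a trained layer the mask weights are learned, and each of the filters produces its own feature map from the same input regions.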

Rectified Linear Unit
This layer serves as an activation of the output of the convolutional layer. For each element in the output of the convolutional layer, if the value of the element is greater than or equal to a threshold value of 0, the output equals the input. However, if the element value is below the threshold, the output is zero [26]. Mathematically, the function of the rectified linear unit (ReLU) layer is expressed as:
$f(x) = \begin{cases} x, & x \geq 0 \\ 0, & x < 0 \end{cases}$ (11)

Max-Pooling Layer
This layer further reduces the dimension of the image layer by finding the maximum of all the elements within a local neighborhood. The max-pooling layer carries out a non-linear down-sampling operation after the output of the convolutional layer is passed through the ReLU activation function [27]. In this study, a fixed-size filter was used for the max-pooling layer.
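Both operations are short enough to sketch together. The pooling window of 2 and stride of 2 used here are illustrative defaults, not the values used in the study.

```python
def relu(feature_map):
    """Replace every negative element with zero and keep the rest."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2, stride=2):
    """Keep the maximum of each size x size window, moving by stride."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[y + j][x + i]
                for j in range(size)
                for i in range(size)
            )
            for x in range(0, cols - size + 1, stride)
        ]
        for y in range(0, rows - size + 1, stride)
    ]


feature_map = [
    [-1, 2, -3, 4],
    [5, -6, 7, -8],
    [-9, 10, -11, 12],
    [13, -14, 15, -16],
]
activated = relu(feature_map)
pooled = max_pool(activated)
```

Here a 4 x 4 feature map is reduced to 2 x 2, which shows the down-sampling effect described above.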

Fully-Connected Layer
The fully-connected layer outputs a column vector of dimensions $k \times 1$, where $k$ is the number of possible classes predictable by the network. This vector consists of the probabilities for each class of any image being classified. In this research, all the neurons were interconnected to form the single vector that was used by the trained network for prediction.

Soft-Max Layer
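Following the fully-connected layer is the soft-max layer, which provides the soft-max activation function for a multi-class classification problem. The soft-max activation function is also referred to as the normalized exponential function. According to [28], the probability of choosing a particular class $c_r$ is given as:
$P(c_r \mid x, \theta) = \frac{P(x, \theta \mid c_r) P(c_r)}{\sum_{j=1}^{k} P(x, \theta \mid c_j) P(c_j)} = \frac{\exp(a_r(x, \theta))}{\sum_{j=1}^{k} \exp(a_j(x, \theta))}$
where $0 \leq P(c_r \mid x, \theta) \leq 1$, $\sum_{j=1}^{k} P(c_j \mid x, \theta) = 1$ and $a_r = \ln(P(x, \theta \mid c_r) P(c_r))$.
$P(x, \theta \mid c_r)$ is the conditional probability of the sample given class $r$, and $P(c_r)$ is the class prior probability.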
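The normalized exponential can be computed as in the short sketch below. Subtracting the largest score before exponentiating is a common numerical safeguard added here for illustration; it does not change the result. The seven example scores are made up.

```python
import math


def softmax(scores):
    """Turn a list of class scores into probabilities that sum to one."""
    shift = max(scores)  # guards against overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [0.5, 1.0, 3.0, 2.5, 0.0, -1.0, -2.0]  # hypothetical scores for 7 subsets
probabilities = softmax(scores)
```

The largest score always receives the largest probability, and every probability lies between 0 and 1.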

Classification Layer
The final layer is the classification layer. This layer uses the probabilities returned by the soft-max activation function for each input to assign the input to one of the mutually exclusive classes. Fig. 5 summarizes the main idea of the deep neural network, giving a cursory view of the layers involved in the architecture.
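At prediction time, assigning an input to one of the mutually exclusive classes amounts to picking the class with the highest probability. The sketch below assumes that subsets are labelled 1 to 7 in the order of the probability vector, which is an illustrative convention; the probabilities are made up.

```python
def classify(probabilities):
    """Return the 1-based label of the class with the highest probability."""
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return best_index + 1


probabilities = [0.05, 0.10, 0.60, 0.10, 0.05, 0.05, 0.05]
predicted_subset = classify(probabilities)
```

If two classes tie, this sketch returns the one with the lower label.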

IV. RESULTS AND DISCUSSIONS
The study was implemented in MATLAB. The results for each stage of the research were recorded, and the accuracy of the approach adopted was analyzed. The MATLAB code was run on a single Intel® Pentium® CPU with a speed of 2.16 GHz. The deep learning algorithm was implemented using the Neural Network Toolbox. To show the degree to which the research objectives have been achieved, the results are presented in the order of the methodology.

A. Results of Luminance Determination and Anisotropic Diffusion Filtering
The results of the first two steps of the methodology, luminance determination and anisotropic diffusion filtering, are shown in Fig. 6(a) and Fig. 6(b). In Fig. 6(a), the luminance of the different face images has been determined, and the directions of both the azimuthal and elevation angles are clearly shown. The white region of the figure shows the distribution of light on the face images and the direction of incidence of the light, while the dark region shows the regions of the face with poor illumination. The direction of light changes gradually from right to left from Subset 1 to Subset 7. For Subset 4, the light occupies a central location on the face and the shape of the light region is almost perfectly spherical. For the other subsets, excluding Subsets 6 and 2, the shape of the light region is partially spherical, with one corner slightly more curved than the adjacent edge, depending on the azimuthal angle and angle of elevation. However, there are no sharp edges between the light and dark regions in Fig. 6(a). There is diffusion between edges, which causes blurring along the edges. In order to enhance and sharpen the edges, the blurred images have to be binarized, which helps make each class linearly separable during the training process.
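The binarization mentioned above is a plain thresholding operation, sketched below. The threshold of 0.5 and the assumption that pixel values are scaled to the range 0 to 1 are illustrative; the threshold actually chosen in the study is not stated in the text.

```python
def binarize(image, threshold):
    """Label each pixel 1 (light) if it reaches the threshold, else 0 (dark)."""
    return [[1 if value >= threshold else 0 for value in row] for row in image]


def light_fraction(binary):
    """Fraction of pixels labelled as light."""
    total = sum(len(row) for row in binary)
    return sum(sum(row) for row in binary) / total


blurred = [[0.1, 0.7], [0.5, 0.4]]  # hypothetical filtered image in [0, 1]
binary = binarize(blurred, threshold=0.5)
fraction = light_fraction(binary)
```

After thresholding, the soft transition between light and dark becomes a hard boundary, so the position and shape of the light region carry the information about the illumination angle.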

B. Result for Image Segmentation
The results of segmentation (binarization) are shown in Fig. 7. The blurred images with high diffusion effects have been enhanced and sharpened by the segmentation process, which completely separates the light region from the dark region with an abrupt transition between the two. The transition from the light region to the dark region is similar to the result of normalization, except that the edges are sharper and binarized. The shape of the light region of the segmented training face images was inconsistent, which is attributable to differences in the structure of the faces in the training set. The mesh plot of each subset for a normalized face image from the training set is shown in Fig. 8(a) to Fig. 8(g). In the mesh plots, the regions with spikes and yellow coloring are regions of the normalized face with a great deal of exposure to light. The valley regions show a gradual decrease in exposure to light; these dark regions are not exposed to light. From subset 1 to subset 7, the orientation of the hilly part of the mesh plot changes from right to left, which reflects the change in the azimuthal angle and angle of elevation.
The simulation was done at a constant learning rate of 0.0001. Fig. 9 shows the real-time graph drawn during the simulation process, and Table III shows the simulation results for the first training on the training dataset. From the table, the 13th epoch had a mini-batch accuracy of 82.81%. At the 15th epoch, the simulation attained a mini-batch accuracy of 70.31%, still using a constant learning rate of 0.0001. During the simulation process, the training accuracy graph was drawn in real time, and the final graph is shown in Fig. 10. Table IV shows the simulation results for the second training on the training dataset. From the table, a batch accuracy of 99.22% was attained at the 25th and 26th epochs at a base learning rate of 0.0001.
From Table VI, the first training of the deep convolutional neural network resulted in poor accuracy: only 22.73% of subset 2 was accurately predicted, while 95.45% of another subset was accurately predicted. With the network produced in the second simulation, subsets 6 and 7 had the lowest prediction accuracy of 86.36%, while the first subset had a prediction accuracy of 97%. Furthermore, in the second simulation, a maximum prediction accuracy of 100% was achieved for both subset 3 and subset 4. In general, subsets with a negative azimuthal angle of illumination were predicted better than those with a positive azimuthal angle of illumination. The third simulation predicted all subsets more accurately.
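Several of the per-subset accuracies quoted above correspond to whole numbers of correctly classified test images. The sketch below assumes 22 test images per subset, that is, 20% of the 110 images in each subset; the counts of correct predictions are inferred for illustration and are not taken from Table VI.

```python
TEST_IMAGES_PER_SUBSET = 22  # 20% of the 110 images in each subset


def accuracy_percent(correct, total=TEST_IMAGES_PER_SUBSET):
    """Prediction accuracy in percent, rounded to two decimal places."""
    return round(100.0 * correct / total, 2)


accuracies = {correct: accuracy_percent(correct) for correct in (5, 18, 19, 21, 22)}
```

With a test set of this size, one additional misclassified image changes a subset's accuracy by about 4.5 percentage points, which is worth keeping in mind when comparing subsets.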
The study presented in [14] was able to classify the illumination angle of face images using the similarity approach and also achieved a good level of classification in terms of accuracy of prediction for all the subsets.The degree of accuracy achieved in this study surpasses those achieved in the predictions of subsets 1, 3 and 4 in the study in [14].In addition, the study conducted and reported in [13] also achieved a good accuracy value when tested on Yale Database.However, the method adopted by [13] involved the division of the database into four subsets which made the segregation of the azimuthal angle sharper and more separable than dividing into seven subsets.This accounts for the differences in results obtained when the datasets are divided into seven different subsets.

V. CONCLUSION
In this study, deep learning was successfully used to classify face images obtained from the Yale B Extended Face Database into subsets according to their azimuthal angle of illumination. A classification accuracy of 100% was achieved for subsets 3 and 4, while accuracies between 81.82% and 94.76% were achieved for subsets 1, 2, 5, 6 and 7. In future research, this approach can be integrated into a face recognition system that selectively normalizes face images according to their subset class in order to achieve maximum recognition accuracy.

Fig. 1. Graph of azimuthal angle and angle of elevation for each subset in the yaleB01 character of the dataset

Fig. 2. Face images for subsets 1 to 7 based on the azimuthal angle of illumination

Fig. 6. (a) Results for luminance determination per image subset and (b) results for anisotropic diffusion filtering per image subset.

Fig. 9. Graph of training accuracy for the first simulation.
Chinedu G. Olebu, Jide J. Popoola, Michael R. Adu, Yekeen O. Olasoji, and Samson A. Oyetunji
A brief literature review is presented in Section II, and the new approach is detailed in Section III. The results obtained are presented and discussed in Section IV. The paper is concluded in Section V.

TABLE I. SUBSETS AND THEIR CORRESPONDING AZIMUTHAL ANGLE

TABLE III. SIMULATION RESULT FOR FIRST TRAINING

TABLE IV. SIMULATION RESULT FOR SECOND TRAINING
Finally, after using the third set of values specified in Table II (convolution mask of 2, number of neurons of 25, number of maximum pooling layer of 3, maximum epoch of 26 and a constant learning rate of 0.0001), the result presented in Table V was obtained. A satisfactory mini-batch accuracy of 100% was obtained at both the 25th and 26th epochs, with a minimal accuracy of 0.0369. The deep convolutional network obtained after training in each simulation was tested using the testing dataset, which forms 20% of the dataset.

Table VI also shows the accuracy of prediction for each subset for the three simulations.