
Video facial expression recognition in the wild
Abstract
For facial expression recognition in the wild, irregular illumination and varying head pose make traditional shallow feature extractors and classifiers fragile and prone to losing discriminative information; moreover, for recognition from video sequences, the changes in facial regions along the time axis can hardly be captured by static descriptors and models designed for still images. This thesis uses an LRCN model that combines a Convolutional Neural Network (CNN) with a Long Short-Term Memory network (LSTM): the CNN extracts the spatial features of each frame, and the LSTM models those features over the temporal sequence. By means of transfer learning, the CNN part uses a VGG-16 network loaded with weights pre-trained on the large-scale face recognition dataset VGG-Face and fine-tunes it, which mitigates the problem of training on a small-scale dataset. In addition, a Support Vector Machine (SVM) trained on LBP-TOP features of the videos serves as a baseline classifier; comparing the results of the two classifiers shows the difference between the models.
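The CNN-then-LSTM pipeline described above can be sketched as follows. This is an illustrative sketch only: the thesis uses VGG-16 with VGG-Face weights as the spatial feature extractor, whereas a tiny stand-in CNN is used here for brevity, and the class count, feature size, and hidden size are assumed values.

```python
import torch
import torch.nn as nn

class LRCN(nn.Module):
    """Illustrative LRCN: a small CNN extracts per-frame spatial features,
    and an LSTM models their evolution over time. (The thesis uses a
    pre-trained VGG-16 as the CNN; a tiny CNN stands in here.)"""
    def __init__(self, num_classes=7, feat_dim=128, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, clips):
        # clips: (batch, time, channels, height, width)
        b, t, c, h, w = clips.shape
        # fold time into the batch so the CNN sees each frame independently
        feats = self.cnn(clips.reshape(b * t, c, h, w)).reshape(b, t, -1)
        out, _ = self.lstm(feats)      # temporal modelling of frame features
        return self.fc(out[:, -1])    # classify from the last time step

model = LRCN()
logits = model(torch.randn(2, 16, 3, 64, 64))  # 2 clips of 16 frames each
print(logits.shape)  # torch.Size([2, 7])
```

Folding the time dimension into the batch is the standard way to apply one shared CNN to every frame before handing the per-frame feature vectors to the LSTM.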
On the MMI facial expression database, the LRCN achieved an accuracy of 76%; on the AFEW (Acted Facial Expressions in the Wild) database, it achieved 42.10% on the test set and 39.89% on the validation set, while the SVM classifier trained on LBP-TOP features achieved 33.96%. These results demonstrate the effectiveness of the LRCN model.
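For reference, the LBP-TOP feature used by the SVM baseline can be sketched in a few lines of NumPy. This is a simplified sketch: full LBP-TOP averages histograms over all XY, XT, and YT planes (often over spatial blocks), whereas this version takes only the central plane of each orientation, and the 8-neighbour, 256-bin LBP variant is an assumed choice.

```python
import numpy as np

def lbp_codes(plane):
    """Basic 8-neighbour LBP codes for one 2-D plane (interior pixels only)."""
    c = plane[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        # neighbour plane shifted by (dy, dx), same shape as the centre crop
        nb = plane[1 + dy:plane.shape[0] - 1 + dy,
                   1 + dx:plane.shape[1] - 1 + dx]
        codes |= ((nb >= c).astype(np.uint8) << bit)
    return codes

def lbp_top(volume):
    """LBP-TOP sketch: concatenated LBP histograms from the XY, XT and YT
    planes through the centre of a (T, H, W) video volume."""
    t, h, w = volume.shape
    planes = [volume[t // 2],        # XY: one spatial frame
              volume[:, h // 2, :],  # XT: one row traced over time
              volume[:, :, w // 2]]  # YT: one column traced over time
    hists = [np.bincount(lbp_codes(p).ravel(), minlength=256) for p in planes]
    return np.concatenate(hists).astype(np.float64)

clip = np.random.default_rng(0).integers(0, 256, size=(16, 32, 32))
feat = lbp_top(clip)
print(feat.shape)  # (768,)
```

The resulting fixed-length histogram vector is what a classifier such as an SVM would be trained on; the two temporal planes (XT, YT) are what let the descriptor encode motion that a per-frame LBP cannot.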
KEYWORDS: Emotion Recognition; Convolutional Neural Network; Long Short-Term Memory Network; LBP-TOP; Support Vector Machine
If you are interested in the details of my thesis, please contact me by email (see the email address on the About page).