An Expression Recognition Algorithm Built on a Parallel Convolutional Neural Network

Xu Linlin, Zhang Shumei, Zhao Junli

Abstract
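Objective Facial expression recognition has broad application prospects in commerce, security, medicine, and other fields, and the ability to recognize facial expressions quickly and accurately is important for both research and application. Traditional machine learning methods require hand-crafted feature extraction, and their accuracy is difficult to guarantee. In recent years, convolutional neural networks have been widely used because of their good self-learning and generalization abilities, but problems remain, such as the difficulty of extracting expression features and excessively long training times. To address these problems, this paper proposes an expression recognition method based on a parallel convolutional neural network. Method Facial expression images are first preprocessed by face localization, gray-level normalization, and angle adjustment, which removes the influence of complex backgrounds, illumination, and pose and yields an accurate face region. A convolutional neural network with two parallel convolution-pooling units is then designed for expression images so that subtle expression details can be extracted. The parallel structure has three different channels, which extract different image features and fuse them; the fused features are finally fed to a SoftMax layer for classification. Result The proposed parallel convolutional neural network was evaluated with ten-fold cross-validation on the CK+ and FER2013 expression datasets, and the final result is the average over the ten runs. It achieved an accuracy of 94.03% on CK+ and 65.6% on FER2013, with a time per iteration of 0.185 s and 0.101 s, respectively. Conclusion This paper offers a new approach to designing convolutional neural networks that extends network width while controlling depth, allowing more expression features to be extracted. The experiments show that the model achieves high recognition rates and shortens training time on expression datasets that differ considerably in sample count, resolution, and image size.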
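The ten-fold evaluation protocol mentioned in the Result can be illustrated with a minimal, dependency-free sketch. It is not the authors' code: the function names, the fixed shuffling seed, and the `train_and_score` callback (a stand-in for training the network on the training indices and returning the test accuracy) are assumptions made for illustration.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding by k gives folds whose sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def cross_validate(n_samples, train_and_score, k=10, seed=0):
    """Hold out each fold once, train on the rest, and average the k scores.

    train_and_score(train_idx, test_idx) is a hypothetical callback that
    trains a model and returns its accuracy on the held-out fold.
    """
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for held_out in range(k):
        test_idx = folds[held_out]
        train_idx = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        scores.append(train_and_score(train_idx, test_idx))
    return mean(scores), scores
```

With a real model, `train_and_score` would fit the network on `train_idx` and evaluate it on `test_idx`; the first return value then corresponds to the averaged accuracy reported for each dataset.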
Keywords
An Expression Recognition Algorithm Based on a Parallel Convolutional Neural Network

Xu Linlin, Zhang Shumei, Zhao Junli (College of Data Science and Software Engineering, Qingdao University, Qingdao 266071)

Abstract
Objective Facial expression recognition has a wide range of application prospects in commerce, security, medicine, and other fields. The ability to identify facial expressions quickly and accurately is of great significance for both research and application. Traditional machine learning methods, such as the Support Vector Machine (SVM), Principal Component Analysis (PCA), and Local Binary Pattern (LBP), require features to be extracted manually. Because this process involves considerable human intervention, some features are hidden or deliberately exaggerated, which makes accuracy difficult to guarantee. In recent years, Convolutional Neural Networks (CNNs) have been widely used in image recognition because of their good self-learning and generalization capabilities. However, problems remain, such as the difficulty of extracting facial expression features and the long training time of the network. To address these problems, this paper proposes an expression recognition method based on a parallel convolutional neural network. Method First, a series of preprocessing operations is performed on the facial expression images. The original image is passed through an Adaboost cascade classifier to remove the complex background and obtain the face region. Illumination compensation is then applied to the face image: histogram equalization nonlinearly stretches the image and redistributes its pixel values. Finally, an affine transformation is used to align the face. This preprocessing removes the influence of complex backgrounds, compensates for lighting, and corrects the face angle, yielding a more accurate face region than the original image. This paper then designs a convolutional neural network with two parallel convolution-pooling units for facial expression images, which can extract subtle expression details.
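The histogram-equalization step of the preprocessing described above can be sketched in plain Python. This is an illustrative sketch, not the authors' implementation: it assumes an 8-bit grayscale image stored as a list of rows of integers, and it uses the common CDF-based remapping. The function name `equalize_histogram` is hypothetical.

```python
def equalize_histogram(image, levels=256):
    """Histogram-equalize a grayscale image given as rows of ints in [0, levels)."""
    # Count how many pixels take each gray level.
    hist = [0] * levels
    for row in image:
        for value in row:
            hist[value] += 1

    # Cumulative distribution of gray levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    total = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)  # CDF at the darkest occupied level
    if total == cdf_min:
        # A constant image has nothing to stretch; return an unchanged copy.
        return [list(row) for row in image]

    # Nonlinear remapping: spread the occupied levels over the full range.
    scale = (levels - 1) / (total - cdf_min)
    lut = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [[lut[value] for value in row] for row in image]
```

The lookup table maps the darkest occupied level to 0 and the brightest to `levels - 1`, which is the "stretching and reallocation of pixel values" the abstract refers to.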
The parallel unit is the core of the network. It consists of convolutional layers, pooling layers, and the ReLU activation function. The parallel structure has three different channels, each with a different number of convolutional, pooling, and ReLU layers, which extract different image features; the extracted features are then fused. The second parallel unit applies further convolution and pooling to the features extracted by the first, which further reduces the dimension of the image and shortens the training time of the network. Finally, the fused features are sent to the SoftMax layer for expression classification. Result The preprocessed and augmented CK+ and FER2013 expression datasets were each divided into ten equal parts. Training and testing were performed across the ten parts, and the final accuracy is the average of the ten results. The experimental results show that, compared with traditional machine learning methods such as support vector machines, principal component analysis, local binary patterns, or their combinations, and with classical convolutional neural networks such as AlexNet and GoogLeNet, accuracy increased and training time decreased significantly. The method achieved accuracies of 94.03% on CK+ and 65.6% on FER2013, with iteration times of 0.185 s and 0.101 s, respectively. Conclusion This paper proposes a new parallel convolutional neural network structure that extracts features from facial expression images through three different convolution and pooling paths. The three paths have different combinations of convolutional and pooling layers, so they extract different image features. The extracted features are then combined and sent to the next layer for processing.
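The idea of three paths with different convolution and pooling combinations whose outputs are fused can be illustrated with a small, dependency-free sketch. The abstract does not give the layer configurations, kernel sizes, or the fusion rule, so everything specific here is an assumption: the branch layouts, the fixed averaging kernel (real kernels are learned), and fusion by concatenating flattened outputs.

```python
def conv2d(x, kernel):
    """Valid 2-D cross-correlation of a 2-D list with a square kernel."""
    k = len(kernel)
    out_h, out_w = len(x) - k + 1, len(x[0]) - k + 1
    return [[sum(x[i + a][j + b] * kernel[a][b] for a in range(k) for b in range(k))
             for j in range(out_w)] for i in range(out_h)]


def relu(x):
    """Element-wise ReLU activation."""
    return [[max(0.0, v) for v in row] for row in x]


def max_pool(x, size=2):
    """Non-overlapping max pooling; rows or columns that do not fill a window are dropped."""
    out_h, out_w = len(x) // size, len(x[0]) // size
    return [[max(x[i * size + a][j * size + b] for a in range(size) for b in range(size))
             for j in range(out_w)] for i in range(out_h)]


def run_branch(x, layers):
    """Apply ('conv', kernel) and ('pool', size) steps in order; ReLU follows each conv."""
    for kind, arg in layers:
        if kind == "conv":
            x = relu(conv2d(x, arg))
        elif kind == "pool":
            x = max_pool(x, arg)
        else:
            raise ValueError(f"unknown layer type: {kind}")
    return x


def parallel_unit(x, branches):
    """Run every branch on the same input and concatenate the flattened outputs."""
    fused = []
    for layers in branches:
        out = run_branch(x, layers)
        fused.extend(v for row in out for v in row)
    return fused


# Hypothetical configuration: three branches of different depth on an 8x8 input.
avg3 = [[1.0 / 9.0] * 3 for _ in range(3)]  # placeholder for a learned 3x3 kernel
example_branches = [
    [("conv", avg3), ("pool", 2)],                  # 8x8 -> 6x6 -> 3x3
    [("conv", avg3), ("conv", avg3), ("pool", 2)],  # 8x8 -> 6x6 -> 4x4 -> 2x2
    [("pool", 2)],                                  # 8x8 -> 4x4
]
example_input = [[float(i + j) for j in range(8)] for i in range(8)]
fused_features = parallel_unit(example_input, example_branches)  # 9 + 4 + 16 values
```

Because each branch sees the same input but applies a different stack of layers, the fused vector mixes features computed at different depths, which is the sense in which such a design widens the network without deepening it.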
This provides a new idea for the design of convolutional neural networks: the breadth of the network can be extended while its depth is controlled, which makes it possible to extract more expression features that would otherwise be ignored or are difficult to extract. The CK+ and FER2013 expression datasets differ greatly in sample count, image size, and resolution. Experiments on both datasets show that the model can extract subtle features from facial expression images and greatly shortens training time while maintaining the recognition rate.
Keywords