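Objective Pedestrian detection in complex thermal infrared surveillance scenes is an important research topic in computer vision and a fundamental task in practical applications such as public security, disaster relief, and smart cities. Most current thermal infrared pedestrian detection algorithms rely on the assumption that human targets have higher gray values than the surrounding scene, so their detection rate is low when rising ambient temperature causes gray value inversion in the thermal infrared image. To improve the robustness of pedestrian detection systems across different scenes and to raise the pedestrian detection rate, a fully convolutional network pedestrian detection algorithm based on frequency domain saliency detection is proposed for thermal infrared surveillance scenes. Method The algorithm first applies frequency domain saliency detection to the thermal infrared image to generate a saliency map that fully covers the pedestrian targets. It then builds a fully convolutional network that takes as input a region of interest map, generated by combining the saliency map with the original thermal infrared image, and outputs a pedestrian target probability map. Finally, the thermal infrared pedestrian detection system is trained end to end, and the pedestrian target probability map output by the network is used to detect pedestrian targets. Result The algorithm is validated on the OSU thermal infrared pedestrian database from OTCBVS, the infrared video dataset established by the Ohio State University. Experimental results show that the proposed algorithm accurately detects pedestrian targets in a variety of scenes, achieves a higher detection rate, and is more robust to gray value inversion in thermal infrared images. Conclusion A fully convolutional network pedestrian detection algorithm based on frequency domain saliency detection is proposed for thermal infrared surveillance scenes. While allowing the detection algorithm to be trained end to end, it improves both robustness to various complex scenes and the pedestrian detection rate, thereby improving pedestrian detection performance in thermal infrared surveillance systems.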
Objective Pedestrian detection in complex thermal infrared surveillance scenes is an important research topic in computer vision. It is a crucial task in many practical applications, such as public security management, disaster relief, and intelligent surveillance. Existing thermal infrared pedestrian detection algorithms generally consist of two steps. In the first step, several regions of interest (ROIs) suspected of containing human targets are generated from the thermal infrared image. In the second step, each ROI is verified as a human target or rejected. Verification can be performed by extracting features from the ROIs and passing them to a classifier, or feature extraction and classification can be combined by adopting a deep learning method. However, in their first step most existing algorithms rely heavily on the assumption that the gray value of the human target is higher than that of the environment, which makes them less effective at higher ambient temperatures. When the ambient temperature increases, gray value inversion tends to occur: the gray value of the environment in the thermal infrared image becomes higher than that of the human target, which reduces the accuracy of the pedestrian detection algorithm. To address this problem, a fully convolutional network pedestrian detection algorithm based on frequency domain saliency detection is proposed, with the aim of improving the robustness of pedestrian detection systems in thermal infrared surveillance scenes and achieving better detection accuracy. Method The algorithm first applies frequency domain saliency detection to generate a saliency map that covers all pedestrian targets in the original thermal infrared image.
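The frequency domain saliency step can be illustrated with a minimal sketch. The abstract does not state which frequency domain method is used, so the sketch assumes the well-known spectral residual approach as a stand-in; the function names `box_blur` and `spectral_residual_saliency` and the filter sizes are hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np


def box_blur(a, k=3):
    """Mean filter with a k x k window and edge padding (NumPy only)."""
    pad = k // 2
    padded = np.pad(a, pad, mode="edge")
    out = np.zeros(a.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out / (k * k)


def spectral_residual_saliency(img):
    """Return a saliency map in [0, 1] with the same shape as img.

    Assumption: spectral residual saliency stands in for the paper's
    unspecified frequency domain method.
    """
    img = np.asarray(img, dtype=float)
    spectrum = np.fft.fft2(img)
    log_amplitude = np.log(np.abs(spectrum) + 1e-8)
    phase = np.angle(spectrum)
    # The residual is what remains after removing the smoothed log spectrum.
    residual = log_amplitude - box_blur(log_amplitude, 3)
    # Rebuild the image from the residual amplitude and the original phase.
    recon = np.fft.ifft2(np.exp(residual + 1j * phase))
    saliency = box_blur(np.abs(recon) ** 2, 5)
    saliency -= saliency.min()
    peak = saliency.max()
    return saliency / peak if peak > 0 else saliency
```

The map is computed from the spectrum of the whole frame, so a single full-size saliency map is produced rather than a set of cropped sub-regions.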
What distinguishes the saliency detection based method from existing methods is that detection depends on the saliency of human targets rather than on their gray value. The subsequent ROI map is therefore not restricted by the assumption that the human target has the higher gray value, which avoids the detection errors that occur when this assumption fails at high ambient temperatures. In addition, the algorithm generates one full-size saliency map instead of several sub-regions. Next, a fully convolutional network is constructed. Its input is the region of interest map generated jointly from the saliency map and the original thermal infrared image, and its output is the pedestrian target probability map. The fully convolutional network constructed in this paper consists of two parts. The first part, which mainly follows the AlexNet and VGG network structures, can be regarded as the feature extraction module. The second part is the probability generation module, which consists of three deconvolution layers with kernels of two sizes. The last layer uses the sigmoid activation function to generate the probability map of the pedestrian targets, and all other layers use the ReLU activation function. Finally, the proposed thermal infrared pedestrian detection algorithm is trained end to end to obtain the pedestrian probability map, from which pedestrian targets are detected. Result The OSU thermal infrared pedestrian database in the OTCBVS infrared video dataset, established by the Ohio State University, is used to verify the algorithm, and the proposed algorithm is compared with five established algorithms.
The database contains 10 sequences captured from a single surveillance viewpoint under several weather conditions, including sunny, cloudy, and rainy days, which allows a more comprehensive test of pedestrian detection algorithms. In addition to methods that are not based on convolutional neural networks, the performance of a region based convolutional neural network is also plotted. The results show that the proposed algorithm can accurately detect pedestrian targets in a variety of environmental conditions. Sample detection results from the different pedestrian detection methods are shown as well. Taking the MR-FR (miss rate-false positive) indicator as the basis for comparison, the proposed algorithm achieves an average miss rate of 7%, outperforming both the existing thermal infrared pedestrian detection methods and basic deep learning based object detection methods. The proposed algorithm achieves a higher detection rate and is more robust to gray value inversion in thermal infrared images. During detection, the proposed algorithm not only removes non-pedestrian targets but also detects most pedestrians in the thermal images, especially when the scene is complex, for example when other heat sources (street lights) are present or in the daytime. Conclusion A fully convolutional network pedestrian detection algorithm based on frequency domain saliency detection for thermal infrared surveillance scenes is proposed in this paper. In the first step, a saliency detection method that is robust to gray value inversion at higher ambient temperatures, such as in hot summer or in the daytime, is used to generate a full-size ROI map. A fully convolutional network is then used to output a probability map of the pedestrian targets.
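As an illustration of how a pedestrian probability map might be turned into individual detections, the following minimal sketch thresholds the map and groups the remaining pixels into connected regions. The abstract does not describe this post-processing, so the function name `boxes_from_probability_map`, the threshold of 0.5, the minimum area, and the use of 4-connectivity are all assumptions made for illustration.

```python
from collections import deque

import numpy as np


def boxes_from_probability_map(prob, threshold=0.5, min_area=4):
    """Return (x_min, y_min, x_max, y_max) boxes for connected regions.

    Assumption: a fixed threshold and 4-connected flood fill; the paper's
    actual post-processing is not specified in the abstract.
    """
    mask = np.asarray(prob) >= threshold
    height, width = mask.shape
    visited = np.zeros(mask.shape, dtype=bool)
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0, x0] or visited[y0, x0]:
                continue
            # Breadth-first flood fill over one connected region.
            queue = deque([(y0, x0)])
            visited[y0, x0] = True
            ys, xs = [], []
            while queue:
                y, x = queue.popleft()
                ys.append(y)
                xs.append(x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny, nx] and not visited[ny, nx]):
                        visited[ny, nx] = True
                        queue.append((ny, nx))
            # Discard regions too small to be a pedestrian candidate.
            if len(ys) >= min_area:
                boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes
```

Boxes are returned in raster-scan order of the first pixel found in each region, and isolated high-probability pixels below `min_area` are dropped.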
The proposed algorithm can be trained end to end and avoids generating too many sub-regions, which makes it more efficient because it requires no redundant computation or storage. Experimental results show that the proposed method improves the robustness of pedestrian detection systems in a variety of complex scenes and achieves a higher pedestrian detection rate, enabling better detection of pedestrian targets in thermal infrared surveillance systems.