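Xu Ming, Yu Xiaosheng, Chen Dongyue, Wu Chengdong, Jia Tong, Ru Jingyu (College of Information Science and Engineering, Northeastern University, Shenyang 110819; Faculty of Robot Science and Engineering, Northeastern University, Shenyang 110819)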
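Objective Pedestrian detection in complex thermal infrared surveillance scenes is an important research topic in computer vision and a fundamental task in practical applications such as public security, disaster relief, and smart cities. Most current thermal infrared pedestrian detection algorithms rely on the assumption that human targets have higher gray values than the surrounding scene, so the detection rate is low when the ambient temperature rises and the gray values in the thermal infrared image are inverted. To improve the robustness of pedestrian detection systems across different scenes and to raise the pedestrian detection rate, a fully convolutional network pedestrian detection algorithm based on frequency-domain saliency detection is proposed for thermal infrared surveillance scenes. Method The algorithm first performs frequency-domain saliency detection on the thermal infrared image to generate a saliency map that fully covers the pedestrian targets. It then combines the saliency map with the original thermal infrared image to produce a region-of-interest map, which serves as the input to a fully convolutional network whose output is a pedestrian target probability map. Finally, the thermal infrared pedestrian detection system is trained end to end, and the probability map output by the network is used to detect pedestrian targets. Result The algorithm is validated on the OSU thermal infrared pedestrian database from OTCBVS, an infrared video dataset established by the Ohio State University, and is compared with five relatively mature existing algorithms. The experimental results show that the proposed algorithm accurately detects pedestrian targets in a variety of scenes. Using MR-FP (miss rate versus false positive rate) as the basis for comparison, the proposed algorithm's average miss rate of 7% is lower than that of the other algorithms, which indicates a higher detection rate and better robustness to gray value inversion in thermal infrared images. Conclusion This paper proposes a fully convolutional network pedestrian detection algorithm based on frequency-domain saliency detection for thermal infrared surveillance scenes. While enabling end-to-end training of the detection algorithm, it improves robustness to a variety of complex scenes, raises the pedestrian detection rate, and improves pedestrian detection performance in thermal infrared surveillance systems.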
Pedestrian detection in complex thermal infrared surveillance scene
Xu Ming, Yu Xiaosheng, Chen Dongyue, Wu Chengdong, Jia Tong, Ru Jingyu (College of Information Science and Engineering, Northeastern University, Shenyang 110819, China; Faculty of Robot Science and Engineering, Northeastern University, Shenyang 110819, China)
Objective Pedestrian detection in complex thermal infrared surveillance scenes is an important research topic in computer vision and a crucial task in practical applications such as public security management, disaster relief, and intelligent surveillance. Existing thermal infrared pedestrian detection algorithms generally consist of two steps. The first step generates several regions of interest (ROIs) in the thermal infrared image that are suspected of containing human targets. The second step verifies whether each ROI is a human target, either by extracting features from the ROI and passing them to a classifier or by adopting a deep learning method that combines feature extraction and classification. However, in their first step most existing algorithms rely heavily on the assumption that the gray value of the human target is higher than that of the environment, which makes them ineffective at high ambient temperatures. As the ambient temperature increases, gray value inversion occurs: the gray value of the environment in the thermal infrared image becomes higher than that of the human target, which reduces detection accuracy. To address this problem, a fully convolutional network pedestrian detection algorithm based on frequency-domain saliency detection is proposed, which aims to improve the robustness of pedestrian detection systems in thermal infrared surveillance scenes and to achieve higher detection accuracy. Method The algorithm first applies frequency-domain saliency detection to generate a saliency map that covers all pedestrian targets in the original thermal infrared image.
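The frequency-domain saliency step described above can be illustrated with a minimal sketch. The abstract does not name the specific frequency-domain method, so the code below assumes the well-known spectral residual approach as a stand-in, and it uses a simple box filter for both smoothing steps. The function names `box_blur` and `spectral_residual_saliency` and the window size `k` are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def box_blur(img, k=3):
    """Mean filter with a k x k window, using edge padding."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)


def spectral_residual_saliency(img, k=3):
    """Return a saliency map scaled to [0, 1] with the same shape as img.

    Assumed stand-in method: spectral residual saliency.
    """
    img = np.asarray(img, dtype=float)
    spectrum = np.fft.fft2(img)
    log_amp = np.log(np.abs(spectrum) + 1e-8)
    phase = np.angle(spectrum)
    # Spectral residual: log amplitude minus its local average.
    residual = log_amp - box_blur(log_amp, k)
    # Return to the spatial domain, keeping the original phase.
    recon = np.fft.ifft2(np.exp(residual + 1j * phase))
    sal = box_blur(np.abs(recon) ** 2, k)
    sal -= sal.min()
    peak = sal.max()
    return sal / peak if peak > 0 else sal


# Usage on a synthetic frame (random values stand in for a thermal image).
frame = np.random.default_rng(0).random((32, 48))
saliency = spectral_residual_saliency(frame)
print(saliency.shape)
```

In this sketch the map is driven by irregularities in the amplitude spectrum rather than by absolute brightness, so it yields one full-size map per frame without assuming that the target is brighter than the background.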
Unlike existing methods, the saliency-based method detects human targets by their saliency rather than by their gray value. The subsequent generation of the ROI map is therefore not bound by the assumption that human targets have high gray values, which avoids the detection errors that arise when this assumption fails at high ambient temperatures. In addition, the algorithm generates one full-size saliency map rather than several sub-regions. A fully convolutional network is then constructed, with the ROI map generated from the saliency map and the original thermal infrared image as the network input and the pedestrian target probability map as the network output. The network consists of two parts. The first part, which draws mainly on the AlexNet and VGG network structures, can be regarded as a feature extraction module. The second part is a probability generation module that consists of three deconvolution layers with kernels of two sizes. A sigmoid activation function is used in the last layer to generate the probability map of pedestrian targets, and the remaining layers use the ReLU activation function. The proposed algorithm is trained end to end to obtain the pedestrian probability map, from which pedestrian targets are detected. Result The algorithm is verified on the Ohio State University (OSU) thermal infrared pedestrian database, which is part of the OTCBVS infrared video dataset also established by OSU, and is compared with five existing mature algorithms.
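As a rough illustration of the two-part network described in the Method, the sketch below tracks the spatial size of a feature map through a feature extraction stage and then through three deconvolution (transposed convolution) layers with kernels of two sizes, checking that the probability map comes out at the input resolution. Every layer setting here (kernel sizes, strides, padding, number of layers in the first part, and the 240 x 320 input) is a hypothetical assumption chosen for illustration; the abstract does not give the actual configuration.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Output length of a convolution or pooling layer along one axis."""
    return (size + 2 * pad - kernel) // stride + 1


def deconv_out(size, kernel, stride=1, pad=0):
    """Output length of a transposed convolution along one axis."""
    return (size - 1) * stride - 2 * pad + kernel


# Hypothetical layers as (kind, kernel, stride, pad); not the paper's values.
FEATURE_EXTRACTION = [
    ("conv", 3, 1, 1), ("pool", 2, 2, 0),
    ("conv", 3, 1, 1), ("pool", 2, 2, 0),
    ("conv", 3, 1, 1), ("pool", 2, 2, 0),
]
PROBABILITY_GENERATION = [
    ("deconv", 4, 2, 1),  # followed by ReLU
    ("deconv", 4, 2, 1),  # followed by ReLU
    ("deconv", 2, 2, 0),  # followed by sigmoid: per-pixel probabilities
]


def trace_shapes(height, width, layers):
    """Return the (height, width) after each layer, starting with the input."""
    shapes = [(height, width)]
    for kind, kernel, stride, pad in layers:
        step = deconv_out if kind == "deconv" else conv_out
        height = step(height, kernel, stride, pad)
        width = step(width, kernel, stride, pad)
        shapes.append((height, width))
    return shapes


shapes = trace_shapes(240, 320, FEATURE_EXTRACTION + PROBABILITY_GENERATION)
print(shapes[6], shapes[-1])  # (30, 40) (240, 320)
```

With these assumed settings, three poolings reduce the map by a factor of eight and the three deconvolution layers each double it, so the output probability map aligns pixel for pixel with the input ROI map.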
The database contains a total of 10 sequences captured from a single surveillance viewpoint under several weather conditions, such as sunny, cloudy, and rainy days, which enables a comprehensive test of pedestrian detection algorithms. In addition to the methods that are not based on convolutional neural networks, the performance of a region-based convolutional neural network is plotted. The results show that the proposed algorithm accurately detects pedestrian targets in various environmental conditions. Several sample pedestrian detection results are also shown. Taking the miss rate-false positive indicator as the basis for comparison, the proposed algorithm achieves an average miss rate of 7% and outperforms both existing thermal infrared pedestrian detection methods and basic deep learning-based object detection methods. It achieves a high detection rate and is more robust to gray value inversion in thermal infrared images. During detection, the proposed algorithm removes non-pedestrian targets and detects most pedestrians in thermal images, especially in complex scenes, for example in the presence of other heat sources (such as street lights) or in the daytime. Conclusion A fully convolutional network pedestrian detection algorithm based on frequency-domain saliency detection is proposed for thermal infrared surveillance scenes. In the first step, a saliency detection method that is robust to the gray value inversion occurring at high ambient temperatures, such as in hot summer or in the daytime, is employed to generate a full-size ROI map. A fully convolutional network is then used to output the probability map of pedestrian targets. The proposed algorithm can be trained end to end and avoids generating many sub-regions, which makes it efficient and spares redundant computation and storage.
The experimental results show that the proposed method improves the robustness of pedestrian detection in various complex scenes, achieves a high pedestrian detection rate, and enhances pedestrian detection performance in thermal infrared surveillance systems.
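The miss rate-false positive comparison used in the results can be made concrete with a small sketch for a single frame. The matching rule below (greedy one-to-one matching with an intersection-over-union threshold of 0.5), the box format, and the function names `iou` and `miss_rate_and_fp` are assumptions for illustration; the abstract does not state how detections are matched to ground truth or how false positives are normalized.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def miss_rate_and_fp(detections, ground_truth, thresh=0.5):
    """Greedy one-to-one matching; returns (miss_rate, false_positives)."""
    unmatched = list(ground_truth)
    false_pos = 0
    for det in detections:
        best = max(unmatched, key=lambda gt: iou(det, gt), default=None)
        if best is not None and iou(det, best) >= thresh:
            unmatched.remove(best)  # this pedestrian is now accounted for
        else:
            false_pos += 1  # detection matches no remaining pedestrian
    total = len(ground_truth)
    miss_rate = len(unmatched) / total if total else 0.0
    return miss_rate, false_pos


# Three annotated pedestrians; two are found, plus one spurious detection.
gt_boxes = [(10, 10, 30, 50), (60, 10, 80, 50), (100, 10, 120, 50)]
det_boxes = [(12, 12, 31, 52), (61, 9, 79, 49), (200, 200, 220, 240)]
miss_rate, false_pos = miss_rate_and_fp(det_boxes, gt_boxes)
print(round(miss_rate, 3), false_pos)  # 0.333 1
```

Averaging the miss rate over frames or sequences, and sweeping a detection threshold to trade misses against false positives, would give a curve of the kind used for comparison, under the same assumptions.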