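Xiong Changzhen, Che Manqiang, Wang Runling (Beijing Key Laboratory of Urban Road Traffic Intelligent Control Technology, Beijing 100144, China; College of Sciences, North China University of Technology, Beijing 100144, China)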
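Objective Correlation filter trackers based on deep convolutional features are slow because of their high feature dimensionality, and they can fail when the target undergoes deformation or occlusion. To address these problems, a real-time tracking algorithm based on adaptive convolutional feature selection is proposed. Method The algorithm first analyzes how correlation filter trackers that use deep convolutional features localize the target. It then proposes using the feature mean ratio between the target region and the search region to evaluate the convolution operations, and selects the convolutional layer with the largest number of feature channels whose mean ratio exceeds a threshold, which reduces both the number of layers and the dimensionality of the convolutional features. The effective convolutional features of the selected layer are extracted to train the correlation filter classifier. Finally, a sparse model update strategy is adopted to increase the tracking speed. Result The algorithm is tested on the OTB-100 benchmark dataset. It achieves an average distance precision of 86.4% and an average tracking speed of 29.9 frames/s. Compared with the hierarchical convolutional correlation filter tracking algorithm, its average distance precision is 2.7 percentage points higher and it is nearly 3 times faster. The experimental results show that the adaptive feature selection effectively increases the tracking speed while maintaining tracking accuracy, and that it outperforms the current approach of reducing dimensionality with principal component analysis. Compared with existing state-of-the-art trackers, the overall performance of the proposed algorithm is better than that of the 9 algorithms compared in the experiments. Conclusion By adaptively selecting convolutional channels and layers, the algorithm effectively reduces the number of convolutional layers and the feature dimensionality, which lowers the model complexity and increases the tracking speed. The sparse model update strategy further increases the tracking speed and reduces model drift. In complex scenes involving fast motion, occlusion, or illumination change, the algorithm still tracks the target in real time and shows strong robustness and adaptability.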
Adaptive convolutional feature selection for real-time visual tracking
Xiong Changzhen, Che Manqiang, Wang Runling (Beijing Key Laboratory of Urban Intelligent Control, Beijing 100144, China; College of Sciences, North China University of Technology, Beijing 100144, China)
Objective In object tracking, the main difficulty is that the object can move to a different degree in every video frame. Different types of movement produce complex scenes involving non-rigid deformation of the object, background clutter, occlusion, fast motion, and so on, which make tracking more difficult. Although considerable progress has been made in improving both the accuracy and the speed of tracking, balancing high speed against high accuracy remains challenging. Recently, discriminative correlation filter methods have been widely and successfully applied to visual tracking. The standard correlation filter method obtains a large number of training samples through cyclic shifts and trains its filters with the fast Fourier transform, which ensures real-time performance and robustness. However, the accuracy of correlation filter trackers based on traditional hand-crafted features is limited by those features. Correlation filter trackers based on convolutional features have therefore been proposed and developed. Despite their high accuracy, correlation filter trackers based on deep convolutional features are slow because of their high feature dimensionality, and they can fail when the object undergoes deformation or occlusion. A real-time tracking algorithm based on adaptive convolutional feature selection is proposed to solve these problems. Method First, the proposed method analyzes the characteristics of the convolutional features extracted from a convolutional network model trained on a classification data set and preselects the multilayer convolutional features suitable for object tracking. It also analyzes how correlation filter trackers based on deep convolutional features predict the object location. 
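The cyclic-shift training mentioned above can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: it assumes a single-channel 1-D signal, a Gaussian desired response, and a hypothetical regularization value `lam`, and it uses a naive DFT from the standard library in place of an FFT. All function names are illustrative.

```python
import cmath
import math


def dft(signal, inverse=False):
    """Naive O(n^2) discrete Fourier transform, used here in place of an FFT."""
    n = len(signal)
    sign = 1 if inverse else -1
    result = []
    for k in range(n):
        total = sum(signal[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                    for t in range(n))
        result.append(total / n if inverse else total)
    return result


def gaussian_label(n, sigma=1.0):
    """Desired response: a Gaussian peak at index 0 with cyclic wrap-around."""
    return [math.exp(-min(t, n - t) ** 2 / (2 * sigma ** 2)) for t in range(n)]


def train_filter(sample, label, lam=1e-2):
    """Closed-form ridge regression in the Fourier domain.

    Solving per frequency is equivalent to regressing over every cyclic
    shift of `sample`. `lam` is an assumed regularization weight.
    """
    X, Y = dft(sample), dft(label)
    return [Y[k] * X[k].conjugate() / (abs(X[k]) ** 2 + lam)
            for k in range(len(X))]


def respond(filter_hat, patch):
    """Response map of the trained filter on a new patch."""
    Z = dft(patch)
    response = dft([h * z for h, z in zip(filter_hat, Z)], inverse=True)
    return [value.real for value in response]


def cyclic_shift(signal, shift):
    """Shift a signal to the right by `shift` positions with wrap-around."""
    n = len(signal)
    return [signal[(t - shift) % n] for t in range(n)]


def locate(filter_hat, patch):
    """Estimated displacement: the index of the response peak."""
    response = respond(filter_hat, patch)
    return max(range(len(response)), key=response.__getitem__)
```

Training on a template and calling `locate` on a cyclically shifted copy returns the shift, which is how the response peak is read as the object displacement. A real tracker works on 2-D multichannel features and uses an FFT.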
The analysis shows that a larger feature mean ratio between the object and search regions indicates a better convolution operator. This study therefore proposes using the feature mean ratio between the object and search regions to evaluate the convolution operator of each channel in every convolutional layer. Next, a feature selection strategy chooses, from the preselected convolutional layers, the layer with the most channels whose feature mean ratio is larger than the threshold. This strategy effectively reduces the number of layers of convolutional features. It also reduces the dimensionality of the selected layer by removing the channels whose feature mean ratio is not larger than the threshold. The correlation filter classifier is then trained on the remaining effective convolutional features of the selected layer, and the trained classifier is used to predict the position of the object. Finally, a sparse model update strategy is adopted to prevent the correlation filter classifier from overfitting and to improve the tracking speed. Result The proposed approach is evaluated on the 100 sequences of the Object Tracking Benchmark (OTB-100), which covers 11 challenges that may be encountered in object tracking (e.g., variation, background clutter, and low resolution), and is compared with 9 other state-of-the-art tracking methods. Center location error, distance precision, overlap precision, and one-pass evaluation are used to evaluate the tracking algorithms. The experiments are divided into two parts. The first part analyzes the tracking results for the different preselected convolutional layers, covering three settings: no dimension reduction, dimension reduction using principal component analysis, and our adaptive feature selection using the feature mean ratio. 
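Two of the metrics named above, center location error and distance precision, can be sketched as follows. This is an illustrative sketch rather than the benchmark's own code. It assumes centers are given as (x, y) pixel coordinates and uses a 20-pixel threshold, a common choice in OTB evaluations that this abstract does not itself state.

```python
import math


def center_location_errors(predicted, ground_truth):
    """Per-frame Euclidean distance between predicted and true box centers."""
    return [math.hypot(px - gx, py - gy)
            for (px, py), (gx, gy) in zip(predicted, ground_truth)]


def distance_precision(predicted, ground_truth, threshold=20.0):
    """Fraction of frames whose center error is within `threshold` pixels.

    The default threshold of 20 pixels is an assumption, not a value
    taken from this abstract.
    """
    errors = center_location_errors(predicted, ground_truth)
    return sum(error <= threshold for error in errors) / len(errors)
```

For example, three frames with center errors of 5, 30, and 10 pixels give a distance precision of 2/3 at the 20-pixel threshold.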
The average distance precision of our adaptive feature selection method is 86.4%, which is higher than that of the other methods. The experimental results show that the method effectively improves the tracking speed and outperforms the current trackers that use principal component analysis to reduce the feature dimensions. The second part compares our method with existing mainstream object tracking methods, including the original hierarchical convolutional features tracker and other correlation filter tracking algorithms that use convolutional features or traditional hand-crafted features. The average distance precision of our algorithm is 86.4%, which is 2.7 percentage points higher than that of the original hierarchical convolutional features tracker. The average success rate of the proposed approach is 68.4%, which is 2.9 percentage points higher than that of the original hierarchical convolutional features tracker. The average tracking speed is 29.9 frames/s, nearly three times faster than the original hierarchical convolutional features tracker. The experimental results show that the adaptive feature selection method effectively improves the tracking speed while maintaining tracking accuracy. Its overall performance is superior to that of the nine other state-of-the-art tracking methods in the experiment. Conclusion The feature mean ratio between the object and search regions is used to evaluate the convolution operator. The convolutional layer with the largest number of channels that satisfy the feature mean ratio threshold is selected, and the effective convolutional features of that layer are extracted to train the correlation filter classifier. By adaptively selecting convolutional channels and layers, the method effectively reduces the number of convolutional layers and the feature dimensions, which lowers the complexity of the model and improves the tracking speed. 
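The channel and layer selection summarized above can be sketched in a few lines. This is a hedged illustration, not the authors' code. It assumes the ratio is the mean activation inside the object box divided by the mean over the whole search region, that activations are non-negative, and that all layers have been resized to one spatial resolution so a single box applies. The layer names and the threshold value are hypothetical.

```python
def region_mean(channel, box=None):
    """Mean of a 2-D channel, optionally inside box = (top, left, height, width)."""
    if box is None:
        rows = channel
    else:
        top, left, height, width = box
        rows = [row[left:left + width] for row in channel[top:top + height]]
    values = [value for row in rows for value in row]
    return sum(values) / len(values)


def mean_ratio(channel, target_box, eps=1e-12):
    """Assumed definition: object-region mean over search-region mean."""
    return region_mean(channel, target_box) / (region_mean(channel) + eps)


def select_layer(layers, target_box, threshold):
    """Pick the layer with the most channels whose ratio exceeds the threshold.

    `layers` maps a layer name to a list of 2-D channels. Returns the chosen
    layer name and the indices of the channels kept in that layer.
    """
    best_name, best_kept = None, []
    for name, channels in layers.items():
        kept = [index for index, channel in enumerate(channels)
                if mean_ratio(channel, target_box) > threshold]
        if len(kept) > len(best_kept):
            best_name, best_kept = name, kept
    return best_name, best_kept
```

Only the kept channels of the chosen layer would then be passed to the correlation filter, which is where the reduction in layers and dimensions comes from.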
In addition, a sparse model update strategy is used to further increase the tracking speed and reduce model drift. The proposed algorithm shows strong robustness and adaptability in complex scenes involving occlusion, illumination change, and fast motion.
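One possible reading of the sparse model update is sketched below. The abstract does not give the update rule, so everything here is an assumption: the model is refreshed only every `interval` frames, and the refresh is the linear interpolation commonly used in correlation filter trackers. The names `interval` and `learning_rate` and their default values are hypothetical.

```python
def sparse_update(model, new_estimate, frame_index, interval=5, learning_rate=0.01):
    """Blend the new estimate into the model only on every `interval`-th frame.

    On skipped frames the model is returned unchanged, which saves the cost
    of retraining and limits the influence of occluded or blurred frames.
    """
    if frame_index % interval != 0:
        return model
    return [(1 - learning_rate) * old + learning_rate * new
            for old, new in zip(model, new_estimate)]
```

With `interval=5`, frames 1 to 4 leave the model untouched and frame 5 moves it toward the new estimate by the learning rate.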