

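Abstract
Objective: This paper proposes an effective algorithm for locating the scale and region used in image matching. By matching the feature points of the current screen image against the feature points in a subset of regions at the corresponding scale of the template image, it lets the camera track the template image in real time and addresses the matching accuracy and efficiency problems of three-dimensional tracking algorithms. Method: In the preprocessing stage, the algorithm builds a multi-scale representation of the template image and partitions the image at each scale into regions; within each region, feature points are extracted and descriptors generated with the ORB method, which yields a management scheme for image feature points organized by level and region. In the real-time tracking stage, for the image currently captured by the camera, the algorithm first locates the scale range corresponding to that image, then identifies, within that scale range, the image regions that overlap strongly with the current image, then matches the current image against the feature point sets of the corresponding scales and regions of the template image, and finally computes the camera pose from the matched point pairs. Results: Experiments were run on template images of different resolutions from a public image database (Stanford Mobile Visual Search Dataset) and on additional images. The results show that the new algorithm performs stably, with a registration error of about 1 pixel; the system frame rate is generally stable at 20~30 frames per second. Conclusion: Compared with several classic algorithms, the new method locates the image matching scale and region better; this local feature point matching approach clearly improves registration accuracy and computational efficiency over existing methods, performs better still when the template image resolution is high, and is particularly well suited to mobile augmented reality applications.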
Real-time Camera Pose tracking with locating image patching scales and regions

Miao Jinghua, Sun Yankui (Tsinghua University)

Objective: In a conventional augmented reality system, a multi-scale representation of the template image is constructed first; feature key points are then extracted at each scale and pooled into a single template feature set, which is matched against the feature points extracted from camera images. The template feature set becomes large when the representation has many scales. Yet a camera image corresponds only to the template layers whose scale is close to its own, and it probably overlaps those layers in only some of their regions. Conventional feature matching therefore performs a great deal of useless computation, which lowers matching speed and reduces registration accuracy at the same time. To solve this problem, this paper proposes an effective method for locating image matching scales and regions in camera pose tracking. By matching the features of the current camera image only against the features of the corresponding scales and regions of the template image pyramid, it computes the camera pose in real time from the matched feature pairs, addressing the matching accuracy and efficiency problems of traditional three-dimensional tracking methods. Method: In the preprocessing stage, the scale-space layers of the template image are constructed first. Concretely, the original image is down-sampled by a factor of 1.5 to give the second layer. The remaining layers are formed by progressively half-sampling the original image and the second-layer image and interleaving the two resulting sequences, stopping once the resolution of the top (coarsest) layer is just below that of the specified screen image. Second, a key frame structure is built for each layer image. Specifically, each layer image is partitioned into rectangular regions of the same size, which may overlap when necessary.
The rectangle size is set to the size of the layer image at the maximum scale in the scale-space layers. In each region, feature points are extracted and binary descriptors are generated using the ORB (oriented FAST and rotated BRIEF) algorithm. Each rectangle's position, its sub-image, and the feature points within it are stored together as a key frame structure. In this way, the feature descriptors of the image pyramid are managed by scale and region. In the real-time tracking stage, for any camera image, its scale range within the image pyramid is located first, and the image regions it covers within this scale range are then found using a defined overlapping-degree rule. This greatly narrows the scope of feature matching between the current camera image and the template image pyramid, and local feature matching can thereby improve both matching accuracy and efficiency. 1) Locating the scale range. Since a camera image is captured at some distance from the template image, it essentially corresponds to a scale range in the template's image pyramid and overlaps some image regions within that range. This paper proposes a method to locate the scale range. It first predicts the current camera pose in two ways: by reusing the camera pose of the last frame and by Kalman filtering. The four vertices of the original image are then projected onto the screen image with the estimated camera pose. Finally, the size of the projected area is computed and compared with the layer image sizes in the image pyramid to determine the scale range. 2) Calculating the region overlapping degree. For the layer images within the scale range, all of their key frame regions are projected onto the screen image with the estimated camera pose to compute the areas of the overlapped regions, from which the region overlapping degree is calculated by our method. 3) Local feature extraction and matching.
For a camera image, a number of key frames with large region overlapping degrees are obtained using the last-frame camera pose as the estimate; further key frames are obtained in the same way using the pose estimate from Kalman filtering. We take the union of the two key frame sets, match all of their feature points against those extracted from the camera image by the ORB algorithm, and compute the camera pose from the resulting matching pairs. Result: The new algorithm was implemented and run on a smartphone, and tested on images of different resolutions from an open image database (Stanford Mobile Visual Search Dataset) as well as on additional template images. It was compared with four state-of-the-art algorithms: FLISA (Fast Locating of Image Scale and Area), ORB (oriented FAST and rotated BRIEF), FREAK (fast retina keypoint), and BRISK (binary robust invariant scalable keypoints). In the experiments, videos covering camera translation, rotation, and scaling relative to the template image were recorded for every test template image; the optimal parameters of the ORB, FREAK, and BRISK algorithms were selected by analysis and testing; and registration error and frame rate were measured both before and after integrating our feature matching algorithm with an optical flow algorithm. The experimental results show that the new algorithm is robust, achieves a registration error of about one pixel, and performs real-time 3D tracking at 20~30 frames per second. Conclusion: The algorithm locates image scale and region better than previous methods. Feature matching accuracy and speed between the current camera image and the template image improve markedly over several classic algorithms, especially when the template image resolution is high. The method is suitable for tracking natural images on mobile platforms.
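As an illustration of the layer construction and scale-locating steps described under Method, the following Python sketch may help. It is not the authors' implementation and rests on several assumptions that the abstract does not state: "resolution" is compared by pixel area, the predicted camera pose is represented by a hypothetical 3x3 template-to-screen homography `H`, and the scale range is taken to be the pair of adjacent layers that bracket the projected area. The function names are illustrative only.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Matrix3 = List[List[float]]  # 3x3 homography, row-major (assumed pose stand-in)


def pyramid_factors(img_w: int, img_h: int,
                    screen_w: int, screen_h: int) -> List[float]:
    """Down-sampling factors 1, 1.5, 2, 3, 4, 6, ... for the template pyramid.

    Two half-sampling sequences (starting at 1 and at 1.5) are interleaved.
    Construction stops at the first layer whose pixel area falls below the
    screen's pixel area (comparing by area is an assumption).
    """
    factors: List[float] = []
    a, b = 1.0, 1.5
    while True:
        for f in (a, b):
            factors.append(f)
            if (img_w / f) * (img_h / f) < screen_w * screen_h:
                return factors
        a, b = a * 2.0, b * 2.0


def project(H: Matrix3, p: Point) -> Point:
    """Apply a homography to a 2D point (perspective division included)."""
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


def polygon_area(pts: List[Point]) -> float:
    """Area of a simple polygon by the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def locate_scale_range(H: Matrix3, img_w: int, img_h: int,
                       factors: List[float]) -> Tuple[int, int]:
    """Indices of the two adjacent layers bracketing the projected area.

    The four template corners are projected onto the screen with the
    predicted pose, and the projected area is compared with each layer's
    area. Returning a bracketing pair is an assumed reading of "scale range".
    """
    corners = [(0.0, 0.0), (float(img_w), 0.0),
               (float(img_w), float(img_h)), (0.0, float(img_h))]
    area = polygon_area([project(H, c) for c in corners])
    for i, f in enumerate(factors):
        # Layers are ordered from largest to smallest area.
        if (img_w / f) * (img_h / f) <= area:
            return (max(i - 1, 0), i)
    last = len(factors) - 1
    return (last, last)
```

For a 3000x2000 template and a 640x480 screen, `pyramid_factors` yields the factors 1, 1.5, 2, 3, 4, 6. With a homography that simply scales the template by 0.4, `locate_scale_range` returns layers 2 and 3 (factors 2 and 3), which bracket the effective down-sampling factor of 2.5. Feature matching would then be restricted to the key frames of those layers.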