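Wu Shaogen, Nie Weiqing, Lu Lijun, Liu Yaqin (Guangdong Industry Technical College, Guangzhou 510300; Guangzhou Radio Group, Guangzhou 510656; School of Biomedical Engineering, Southern Medical University, Guangzhou 510515)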
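Objective Shape representation and matching are important problems in computer vision and pattern recognition. A number of representative region-based shape representation methods have emerged, including Hu moment invariants (Hu method), the angular radial transform (ART method), the generic Fourier descriptor (GFD method), the histogram of Radon transform (HRT method), and the multi-scale integral invariant (MSII method). Because these methods were proposed over a long period and earlier comparative studies examined only a single dimension, a comprehensive comparison and analysis of their overall performance is needed to provide direction and guidance for subsequent theoretical research and practical application. Method Three benchmark shape databases are used: a simple geometric shape database, the MPEG-7 shape database, and a vehicle trademark shape database. The representative region-based shape representation methods are compared along three dimensions, namely retrieval score, retrieval stability, and computational complexity, and a weighted comprehensive evaluation model is used to assess the overall performance of each method. Result In overall performance, the GFD method is best, followed by the ART method. Because the HRT method has high time complexity in the matching stage, its performance degrades when matching against large shape databases. The experimental results of the Hu and MSII methods are both unsatisfactory. The comparison also shows that orthogonally projecting a shape onto orthogonal basis functions is an effective way to extract its visual features. We further conjecture that orthogonally projecting an image onto orthogonal basis functions is likewise an effective way to extract image visual features. Therefore, finding ideal orthogonal basis functions is an important future research direction for extracting the visual features of shapes and even of images. Conclusion Among the five methods compared, the GFD and ART methods perform better overall than the HRT, Hu, and MSII methods, and finding ideal orthogonal basis functions is an important future research direction for shape representation.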
Comparative study of classic region-based shape descriptors
Wu Shaogen, Nie Weiqing, Lu Lijun, Liu Yaqin (Guangdong Industry Technical College, Guangzhou 510300, China; Guangzhou Radio Group, Guangzhou 510656, China; Southern Medical University, Guangzhou 510515, China)
Objective Shape representation and shape matching are basic tasks in computer vision and pattern recognition. Several classic region-based methods are available, including Hu moment invariants (Hu method), the angular radial transform (ART method), the generic Fourier descriptor (GFD method), the histogram of Radon transform (HRT method), and the multi-scale integral invariant (MSII method). Because these methods were proposed over a long time span and past studies have typically compared only one factor (i.e., retrieval accuracy), a comprehensive comparative study of these methods is needed to support engineering applications and future research. Method To compare different aspects of the five shape descriptors, we use three shape databases. The first database, a set of simple geometric shapes that we modified, starts from ten seed shapes. From each of these ten seed shapes, we construct three more shapes through non-rigid deformation, with the deformation increasing from the first to the third, which gives 40 basic shapes in total. For each of these 40 basic shapes, we then obtain three more similar shapes by random scaling, random rotation, and random translation. Consequently, the first shape database contains 160 shapes. For the retrieval test on the first database, we define a new scoring rule that takes into account not only which shapes are retrieved but also the order in which they are retrieved. This scoring rule can therefore probe the intrinsic representation and retrieval ability of a shape descriptor. The second shape database used in our comparative experiments is MPEG-7, which consists of 70 shape categories of 20 shapes each, where the shapes within a category differ by various rigid and non-rigid deformations; it is the standard database for evaluating shape descriptors and shape retrieval. 
The experiments are performed on its 1400 shape images. The test score for the MPEG-7 shape database is the bullseye score, which counts the number of shapes from the query's category (20 shapes in this case) among the 40 best matching shapes. The third shape database used in our experiments is a collection of vehicle trademark shapes. We collect 100 common vehicle trademarks, such as those of Benz, BMW, and Toyota. For each of these 100 trademark shapes, we construct three additional shapes by random scaling, random rotation, and random translation, and another three shapes using random perspective parameters. Thus, we obtain a total of 700 vehicle trademark shapes. The test score for this third database is also the bullseye score, which here counts the number of shapes from the query's category (seven shapes in this case) among the 14 best matching shapes. In the retrieval experiments on all three shape databases, we compute not only the test scores but also the retrieval stability, measured by the standard deviation of the retrieval scores. We also analyze and verify the computational complexity of the compared shape descriptors. After obtaining the test scores, retrieval stability, and computational complexity, we define a weighted formula that combines all three factors to measure the comprehensive performance of each method. Result In the retrieval experiments on the first (simple geometry) shape database, the HRT method achieves the best test score and the lowest standard deviation, followed by the GFD method, which indicates that HRT is less sensitive to noise than the other methods. In the retrieval experiments on the second shape database, the ART and GFD methods obtain almost the same retrieval scores. In the retrieval experiments on the third shape database, the GFD, ART, and HRT methods achieve almost the same retrieval score. 
In all of the retrieval experiments, the five compared methods are invariant to scale, rotation, and translation, which is a fundamental requirement for a shape descriptor. We analyze and verify the computational complexity of the five methods and find that, in the feature creation stage, the Hu method has the lowest computational complexity, and in the feature matching stage, all methods except HRT have low computational complexity. The comprehensive performance of the five methods is measured by a weighted formula; the GFD method performs best, followed by ART. The performance of the HRT method can degrade on large shape databases, because HRT has higher complexity in the matching phase than the other methods. The performance of the Hu and MSII methods is unsatisfactory in all our experiments. We also find that projecting a shape onto a basis of orthogonal functions is a practical way to capture its visual features, and we conjecture that the visual features of an image can be captured by the same projection method. Conclusion Among the evaluated region-based methods, the GFD and ART methods have the best performance, and we suggest that they can be employed in engineering applications. Finding new bases of orthogonal functions may be a fruitful research direction for shape visual feature extraction, as well as for image visual feature extraction, because projecting a shape onto orthogonal basis functions can capture its intrinsic visual features.
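The bullseye score and the stability measure described in the Method part can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the toy distance matrix, the normalization by category size, and the use of the population standard deviation over per-query scores are all assumptions made for illustration; the abstract does not state these details.

```python
from statistics import mean, pstdev


def bullseye_scores(dist, labels, class_size, top_k=None):
    """Per-query bullseye score (hypothetical helper, not from the paper).

    dist[q][j] is the descriptor distance between query q and shape j.
    The query itself is kept in the ranking, so a perfect score is 1.0.
    """
    if top_k is None:
        top_k = 2 * class_size  # e.g. 40 for MPEG-7, 14 for the trademark set
    scores = []
    for q, row in enumerate(dist):
        # Indices of the top_k best matches (smallest distances first).
        ranked = sorted(range(len(row)), key=lambda j: row[j])[:top_k]
        hits = sum(1 for j in ranked if labels[j] == labels[q])
        scores.append(hits / class_size)  # assumed normalization
    return scores


def summarize(scores):
    """Mean retrieval score and its standard deviation (assumed stability measure)."""
    return mean(scores), pstdev(scores)


# Toy example: 4 shapes in 2 categories of 2, with top_k set to 2 by hand.
toy_dist = [
    [0.0, 1.0, 2.0, 3.0],
    [1.0, 0.0, 0.5, 3.0],
    [2.0, 0.5, 0.0, 1.0],
    [3.0, 3.0, 1.0, 0.0],
]
toy_labels = [0, 0, 1, 1]
toy_scores = bullseye_scores(toy_dist, toy_labels, class_size=2, top_k=2)
toy_mean, toy_std = summarize(toy_scores)
```

Under these assumptions, a descriptor with a higher mean score and a lower standard deviation would count as both more accurate and more stable, which is how the two measures are used in the comparison above.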