Fast classification of natural scene images and born-digital images (NCIG2016)


Abstract
With the rapid development of modern communication and sensing technology, multimedia data on the Internet are growing constantly, which brings convenience to people's lives but also poses challenges to the effective use of information. To fully mine the rich information contained in web images, and considering the diversity of image types on the Web and that different image types require different processing methods, this paper designs a hierarchical fast classification algorithm for the two major image types on today's Internet: natural scene images and born-digital images. Method The algorithm consists of two stages. The first stage extracts global features based on the differences the two image types exhibit in color, saturation, and edge contrast, and performs a preliminary classification with a support vector machine (SVM); images classified with low confidence in the first stage are passed to the second stage. In the second stage, the system encodes the texture information of different types of local image regions under the bag-of-words model to obtain local features, and a second SVM classifier completes the final classification. For this hierarchical framework, two strategies are proposed to fuse the two classifiers: classifier-result fusion and global-local feature fusion. To test the practicality of the algorithm, a database containing more than 30,000 images was collected and released. Result The global and local features designed in this paper are highly discriminative for the two image types. On a single-core Intel Xeon(R) (2.50 GHz) CPU, the classification accuracy reaches 98.26% at a speed of over 40 frames per second. Comparative experiments with convolutional neural network based methods show that the proposed algorithm matches the performance of shallow networks while consuming fewer computational resources. Conclusion Based on the differences between natural scene images and born-digital images in color, saturation, edge contrast, and local texture, this paper designs fast and effective global and local features and, combined with a hierarchical classification framework, accomplishes the fast classification of the two image types. The algorithm balances classification accuracy and speed, and can be applied in practical projects with high real-time requirements, such as image retrieval and data mining.
Fast classification of natural scene and born-digital images

Liu Guoshuai, Zhong Weifeng, Yin Fei, Liu Chenglin (Institute of Automation, Chinese Academy of Sciences)

Objective The rapid development of the Internet, smartphones, and sensing and communication technology has resulted in a rapid increase of multimedia data on the Internet, such as texts, images, and videos, which provides rich information and great convenience in our life. At the same time, it becomes increasingly difficult to exploit the information embedded in such heterogeneous data. To effectively extract the contents embedded in web images, it is beneficial to first classify the images into different types so that the contents can be fed to different procedures for detailed analysis. In this paper, a hierarchical algorithm is proposed for the fast genre classification of natural scene images and born-digital images, which are the most prevalent image types on the Web. Method Our algorithm consists of two stages. The first stage extracts three global features: the coherence of highly saturated pixels, the average contrast of edge pixels, and a color histogram. All three measures are designed around distinct differences between natural scene images and born-digital images. Specifically, the coherence of highly saturated pixels measures the different patterns of pixel-to-pixel color transition in the two types of images. Natural scene images often depict objects of the real world, and neither regions of constant color nor coherent regions of highly saturated pixels are common in this type of image, owing to the natural texture of objects, noise, and the diversity of illumination conditions. By contrast, born-digital images tend to have larger regions of constant color and more blocks consisting of highly saturated pixels. The second measure describes edge contrast. Typically, edges in born-digital images occur between regions of constant color and the transitions are very steep, whereas their counterparts in natural scene images often correspond to boundaries between real-world objects and are much smoother because of light variations and shading.
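The three global measures above can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the saturation threshold, edge threshold, and histogram bin count are assumed values, and simple finite differences stand in for whatever edge detector the authors actually used.

```python
import numpy as np

def global_features(rgb, sat_thresh=0.6, edge_thresh=0.2, bins=8):
    """Sketch of the three global measures: coherence of highly
    saturated pixels, mean edge contrast, and a coarse color
    histogram. All thresholds are illustrative assumptions."""
    rgb = rgb.astype(np.float64) / 255.0
    mx, mn = rgb.max(axis=2), rgb.min(axis=2)
    sat = np.where(mx > 0, (mx - mn) / np.maximum(mx, 1e-9), 0.0)

    # 1) fraction of pixels that are highly saturated AND whose
    #    4-neighbours are also highly saturated (coherent blocks)
    hs = sat > sat_thresh
    coherent = (hs[1:-1, 1:-1] & hs[:-2, 1:-1] & hs[2:, 1:-1]
                & hs[1:-1, :-2] & hs[1:-1, 2:])
    coherence = coherent.mean()

    # 2) mean gradient magnitude over edge pixels
    #    (simple finite differences as a stand-in edge detector)
    gray = rgb.mean(axis=2)
    gx = np.abs(np.diff(gray, axis=1))[:-1, :]
    gy = np.abs(np.diff(gray, axis=0))[:, :-1]
    grad = np.hypot(gx, gy)
    edges = grad > edge_thresh
    edge_contrast = grad[edges].mean() if edges.any() else 0.0

    # 3) coarse per-channel color histogram, normalised to sum to 1
    hist = np.concatenate([
        np.histogram(rgb[..., c], bins=bins, range=(0, 1))[0]
        for c in range(3)
    ]).astype(np.float64)
    hist /= hist.sum()

    return np.concatenate([[coherence, edge_contrast], hist])
```

A fully saturated constant-color image (as born-digital logos often are) scores maximal coherence and zero edge contrast, whereas a noisy photograph scores low coherence, matching the differences described above.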
We introduce the color histogram as the third global measure, considering that certain colors are much more likely to appear in born-digital images than in natural scene images. After extraction, the global features are fed into a support vector machine (SVM) classifier. Global features succeed in capturing the difference in appearance between most common natural scene images and born-digital images, but they are not discriminative enough to separate confusing images. To this end, we introduce the second stage. Images assigned low confidence by the first-stage classifier are processed by the second stage, which extracts local texture features represented in the bag-of-words framework and uses another SVM classifier for final classification. In this stage, three types of local patches are exploited, namely, local smooth patches, local edge patches, and local random patches, and corresponding texture features are extracted. The local binary pattern feature describes the first two types of local patches, and a reduced color-index histogram describes local random patches. Compared with the global features, these local descriptors represent the different patterns of color transitions and the properties of edges in the two types of images in a more detailed way. All local descriptor vectors are quantized using the locality-constrained linear coding (LLC) algorithm. A two-step clustering process is adopted to obtain a more discriminative codebook: the k-means algorithm first finds a number of sub-centers for each image in the training set, and then all sub-centers are gathered and clustered again to generate the final codebook. We build a codebook for each type of local descriptor individually, generate three representation vectors for each image in the bag-of-words framework, and concatenate them into a final local feature vector for the second classifier.
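The two-step codebook construction can be sketched as below. The tiny Lloyd's-algorithm k-means is an illustrative stand-in for a library implementation, and the numbers of sub-centers and final codewords (`sub_k`, `final_k`) are assumptions, not the paper's settings.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means (Lloyd's algorithm), used for both
    clustering steps. Illustrative, not optimised."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(np.float64)
    for _ in range(iters):
        # assign each descriptor to its nearest center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # move each center to the mean of its assigned descriptors
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(0)
    return centers

def build_codebook(per_image_descriptors, sub_k=4, final_k=16):
    """Two-step clustering: first find sub-centers per training
    image, then cluster all sub-centers into the final codebook."""
    sub_centers = [kmeans(D, sub_k, seed=i)
                   for i, D in enumerate(per_image_descriptors)]
    return kmeans(np.vstack(sub_centers), final_k)
```

Clustering per image first keeps any single image with many near-duplicate patches from dominating the codebook, which is one plausible reason the two-step scheme yields a more discriminative codebook.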
In addition, two strategies are designed to train the second classifier and generate the final label in the second stage, depending on how the local features are used. The first strategy trains the second classifier in the joint global-and-local feature space with all training samples and directly uses its prediction as the final classification result. In the second strategy, we use only local features to train the second classifier and then fuse the posterior probability vectors of the two classifiers with different weight coefficients; the image is assigned to the class with higher confidence according to the fused result. A database containing more than 30,000 images from various sources is also built to experimentally validate the effectiveness of the proposed method. Result The discriminative global and local features proposed in this paper are effective and efficient for classifying natural scene images and born-digital images. An overall accuracy of 98.26% and a processing speed of over 40 frames per second are obtained on our test set on an Intel Xeon(R) (2.50 GHz) CPU. In our experiments, the proposed hierarchical framework achieves accuracy comparable to direct classification using both global and local features, but at a faster speed. We also compare our method with a convolutional neural network (CNN) based method, which has recently become popular in image classification; the selected CNN architecture is the classic LeNet-5. Experimental results show that our method is comparable to the CNN-based method in classification accuracy but consumes much less memory, which means that our algorithm is faster when computational resources are limited. Moreover, the CNN suffers from heavy computation in both training and testing and is usually implemented on a GPU for parallel computation.
It is therefore not well suited for the fast genre classification of web images, which exist in huge numbers. Conclusion In this paper, a fast classification algorithm is proposed to classify web images into two major types, namely, natural scene images and born-digital images. We also contribute a database containing over 30,000 images for future research. The hierarchical classification algorithm consists of two stages and achieves a good tradeoff between classification accuracy and processing speed. It can be used as an effective pre-filter in large-scale, real-time image retrieval systems and other practical data-mining applications.
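The second fusion strategy described in the Method section, together with the confidence-based routing between the two stages, can be sketched as follows. The fusion weight `w` and the stage-1 confidence threshold `reject_margin` are illustrative assumptions; the paper tunes its own coefficients.

```python
import numpy as np

def fuse_predictions(p_global, p_local, w=0.4, reject_margin=0.9):
    """Sketch of the second fusion strategy: if the first-stage
    classifier is already confident, accept its label; otherwise
    fuse the two posterior vectors with weight coefficients and
    pick the class with higher fused confidence."""
    p_global, p_local = np.asarray(p_global), np.asarray(p_local)
    if p_global.max() >= reject_margin:       # stage 1 is confident: stop early
        return int(p_global.argmax())
    fused = w * p_global + (1 - w) * p_local  # weighted posterior fusion
    return int(fused.argmax())
```

The early exit is what makes the cascade fast: most images never reach the more expensive local-feature stage, which is consistent with the reported 40 fps throughput.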