
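Yang Yun, Zhang Haiyu, Zhu Yu, Zhang Yanning (College of Electrical & Information Engineering, Shaanxi University of Science and Technology, Xi'an 710021; School of Computer Science, Northwestern Polytechnical University, Xi'an 710129)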

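Abstract
Objective The super-resolution model based on generative adversarial networks (SRGAN) uses a perceptual loss function as its optimization objective, which effectively resolves the blurring of reconstructed images caused by traditional loss functions based on the mean square error (MSE). However, the perceptual loss of SRGAN contains no explicit indicator that instructs the model to generate the corresponding features, so the model cannot accurately map specific dimensions of the data to semantic features. Because of this limitation, the model represents the feature information of generated images inadequately, the features of the reconstructed results are indistinct, and subsequent recognition and processing become difficult. To address these problems, a super-resolution model based on a class-information generative adversarial network (class-info SRGAN) is proposed on the basis of SRGAN. Method A class classifier is added to the SRGAN model, and a class-loss term is added to the generator loss. The network weights are then updated by back-propagation during training so that the model is supplied with feature class information and ultimately produces reconstructed images with recognizable features. The novelty and advantage lie in introducing feature class information into the loss function, which improves the optimization objective of the super-resolution model and makes the feature representation of the reconstruction results more prominent. Result Tests on the CelebA dataset show that the gender recognition rate of images generated by class-info SRGAN with a gender classifier is higher overall (58% to 97%), and that the glasses frames in images generated by class-info SRGAN with a glasses classifier are sharper. In addition, results on the Fashion-mnist and Cifar-10 datasets likewise indicate better reconstruction quality than SRGAN. Conclusion The experimental results verify the advantages and effectiveness of the proposed method in super-resolution reconstruction. They also show that although class-info SRGAN is better suited to images with simple, concrete attribute features, it is on the whole an effective super-resolution model.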
Class-information generative adversarial network for single image super-resolution

Yang Yun,Zhang Haiyu,Zhu Yu,Zhang Yanning(College of Electrical & Information Engineering, Shaanxi University of Science and Technology, Xi'an 710021, China;School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China)

Objective Image super-resolution reconstruction uses a set of low-quality, low-resolution images (or motion sequences) to produce the corresponding high-quality, high-resolution ones. The technology has a wide range of applications in fields such as the military, medicine, public safety, and computer vision. In computer vision, super-resolution reconstruction lets an image advance from the detection level to the recognition level, and even to the identification level. In other words, it can enhance image recognition capability and identification accuracy. It also supports dedicated analysis of a target: a comparatively high spatial resolution image of the region of interest is obtained, instead of directly computing the configuration of a full high spatial resolution image from large amounts of data. Conventional super-resolution approaches include example-based models, bicubic interpolation, and sparse coding methods, among others. Deep learning has been applied to many related subjects in recent years, and substantial research achievements have been realized, including in super-resolution reconstruction. Convolutional neural networks (CNNs) and generative adversarial networks (GANs) have produced numerous breakthroughs in image super-resolution reconstruction. Examples include super-resolution reconstruction with CNN (SRCNN), super-resolution reconstruction with very deep convolutional networks (VDSR), and super-resolution reconstruction with a generative adversarial network (SRGAN).
With SRGAN in particular, single-image super-resolution has made remarkable progress, because the perceptual loss function replaces the traditional loss function based on the mean square error (MSE) as the optimization goal. An MSE-based loss can yield a relatively high peak signal-to-noise ratio (PSNR), but it tends to produce blurry reconstructions; the perceptual loss effectively resolves this blurring. However, even though super-resolution reconstruction can markedly improve image quality, a common open problem is how to highlight the feature representation of reconstructed images and thereby improve their reconstruction quality. Super-resolution reconstruction is inherently an ill-posed problem; that is, images lose a certain amount of information during down-sampling. A reconstructed high-resolution image must therefore recover parts or characteristics that the corresponding low-resolution image has lost, and this inevitably leads to generative deviation. In addition, SRGAN does not add auxiliary indicator information to the loss function (i.e., information that explicitly instructs the model to generate the corresponding features), so the model may fail to accurately match specific dimensions of the data to semantic features. This lack of controllability prevents the model from sufficiently representing the feature information of generated images, which limits the quality of the reconstructed images and poses difficulties for their subsequent identification and processing. To solve these problems, a super-resolution model based on the class-information generative adversarial network (class-info SRGAN) is proposed, building on the advantages of the SRGAN method.
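To make the link between the MSE objective and PSNR concrete, the following minimal Python sketch computes both for flat lists of pixel values. It is only an illustration and not code from the paper: the function names `mse` and `psnr` and the default peak value of 255 (8-bit images) are assumptions. It relies on the standard definition PSNR = 10 log10(MAX^2 / MSE), so lowering the MSE always raises the PSNR, whether or not the result looks sharp.

```python
import math


def mse(reference, reconstructed):
    """Mean squared error between two equally sized flat pixel sequences."""
    if len(reference) != len(reconstructed):
        raise ValueError("images must have the same number of pixels")
    squared = [(r - x) ** 2 for r, x in zip(reference, reconstructed)]
    return sum(squared) / len(squared)


def psnr(reference, reconstructed, max_value=255.0):
    """PSNR in dB, defined as 10 * log10(max_value**2 / MSE).

    max_value=255.0 assumes 8-bit pixel values (an assumption of this sketch).
    """
    error = mse(reference, reconstructed)
    if error == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / error)
```

Because PSNR depends on the images only through the MSE, a model trained to minimize MSE is implicitly trained to maximize PSNR, which is why the two are discussed together here.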
Class-info SRGAN uses additional information variables to restrict the solution space of super-resolution reconstruction and to help the model accurately complete the reconstruction task, particularly with regard to the semantic features of the data.
Method A class classifier is added to the original SRGAN model, and a class-loss term is integrated into the generative network loss. Back-propagation is then employed during training to update the network weights and provide feature class information to the model. Finally, reconstructed images with the corresponding features are produced. Compared with the original objective function, the novelty of the proposed model lies in introducing feature class information, which improves the optimization objective of the super-resolution model. This in turn optimizes the network training process and makes the feature representation of the reconstruction results more prominent.
Result In the CelebA experiments, the class-loss term enables the SRGAN model to make minor changes that improve the output. SRGAN was compared with the model trained with gender-class information, and the differences were inconclusive, i.e., it is hard to conclude that the model has a significant effect even though some improvement was achieved. The overall gender recognition rate of images generated by class-info SRGAN ranges from 58% to 97%, which is higher overall than that of images generated by SRGAN (8% to 98%). With glasses-class information, however, the model became better at forming well-shaped glasses.
The results on the Fashion-mnist and Cifar-10 datasets also show that the model has a significant effect, although the final results on Cifar-10 were not as prominent as those of the previous experiments. In summary, the reconstruction quality of images generated by class-info SRGAN is better than that of the original SRGAN model.
Conclusion Class information works well when the attributes are clear and the model has learned them as fully as possible. The experimental results verify the superiority and effectiveness of the proposed model in the super-resolution reconstruction task. For concrete and simple feature attributes, class-info SRGAN is a promising super-resolution model. To advance its application, however, several goals must be addressed, e.g., how to develop a general class-info SRGAN that can be used for various super-resolution reconstruction tasks, how to train class-info SRGAN with multiple attributes simultaneously, and how to integrate auxiliary class information into the architecture of class-info SRGAN efficiently and conveniently. These considerations can serve as references for achieving better-performing super-resolution reconstruction in the future.
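The Method summary (a class classifier added to SRGAN, with a class-loss term integrated into the generator loss) can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions, not the authors' implementation: the abstract does not give the form of the class loss or any weights, so the binary cross-entropy class loss, the function names, and the `adv_weight` and `class_weight` values below are all hypothetical.

```python
import math


def binary_cross_entropy(prob, label, eps=1e-12):
    """Cross-entropy for one predicted probability and a 0/1 label."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def generator_loss(content_loss, disc_prob_real, class_prob, class_label,
                   adv_weight=1e-3, class_weight=1.0):
    """Content loss plus weighted adversarial and class-loss terms.

    content_loss:   precomputed perceptual/content loss of the generated image
    disc_prob_real: discriminator's probability that the generated image is real
    class_prob:     classifier's probability of class 1 (e.g. "wears glasses")
    class_label:    ground-truth 0/1 attribute of the high-resolution image
    adv_weight and class_weight are hypothetical values chosen for this sketch.
    """
    adversarial = -math.log(max(disc_prob_real, 1e-12))
    class_loss = binary_cross_entropy(class_prob, class_label)
    return content_loss + adv_weight * adversarial + class_weight * class_loss
```

Setting `class_weight=0.0` reduces the sketch to a perceptual-plus-adversarial objective without class information, so the extra term is the only difference being illustrated. In a real training loop the three inputs would come from the networks, and the gradient of this scalar would be back-propagated to update the generator.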