A survey of generative adversarial networks and their applications in computer vision

Cao Yangjie, Jia Lili, Chen Yongxia, Lin Nan, Li Xuexiang (Zhengzhou University)

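Abstract
Objective The emergence of generative adversarial networks (GAN) has provided new techniques and tools for computer vision applications. With its distinctive ideas of a zero-sum game and adversarial training, GAN generates high-quality samples and has stronger feature learning and feature representation capabilities than traditional machine learning algorithms. It has achieved remarkable success in machine vision, especially in sample generation, and is one of the current research hotspots. Method Taking the different GAN models and their applications in computer vision as the object of study, and drawing on an extensive literature survey, especially of the latest developments in GAN, together with comparative experiments on different models, this paper analyzes the basic idea, characteristics, and usage scenarios of each method and summarizes the strengths and weaknesses of GAN. It describes the current state of GAN research and its range of applications in computer vision, and summarizes the research status and development trends of GAN in several computer vision areas, including high-quality image generation, style transfer and image translation, mutual generation between text and images, and image restoration and inpainting. For each application, the theoretical improvements, advantages, limitations, and usage scenarios are summarized, and possible future directions are discussed. Result Different GAN models each have their own strengths and weaknesses in sample quality and performance. Current GAN models have achieved considerable success in image processing and can generate samples that pass for real ones, but they also suffer from non-convergence, a tendency to collapse, and a generation process that is too free to control. Conclusion As a new generative model, GAN has high research and application value, but some theoretical bottlenecks urgently need to be overcome; on the application side, generating high-quality samples and realistic scenes is a direction worth studying.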
Keywords
Review of computer vision based on generative adversarial networks

Cao Yangjie, Jia Lili, Chen Yongxia, Lin Nan, Li Xuexiang (Zhengzhou University)

Abstract
Objective The emergence of generative adversarial networks (GAN) provides a new approach and framework for computer vision applications. GAN generates high-quality samples through its distinctive zero-sum game and adversarial training concepts, and is therefore more powerful in both feature learning and feature representation than traditional machine learning algorithms. GAN has achieved remarkable results in computer vision, especially in sample generation, which is one of the most active topics in current research. Method Different GAN models and their applications in computer vision are reviewed on the basis of a broad literature survey, with particular attention to the latest results. Typical GAN methods are introduced, categorized, and compared in experiments on generated samples to show their performance. The research status and development trends are summarized for several areas of computer vision, such as high-quality image generation, style transfer and image translation, text-image mutual generation, and image inpainting and restoration. Finally, the major open research problems are summarized and discussed, and potential future research directions are outlined. Result Since the emergence of GAN, many variants have been proposed for different fields, with improvements in structure, developments in theory, or innovations in application. The different GAN models each have advantages and disadvantages in sample generation. They have made great achievements in many fields, especially computer vision, and can generate samples that look real, but they also exhibit characteristic problems such as non-convergence, mode collapse, and uncontrollability caused by a high degree of freedom. The original GAN makes hardly any prior assumptions about the data; its ultimate goal is to have unlimited modeling power and to fit any distribution.
In addition, the design of the GAN model is simple: there is no need to design a complex function model in advance, and the generator and the discriminator can both be trained with the backpropagation algorithm. GAN provides a powerful method for unsupervised deep learning. It overturns the way traditional artificial intelligence algorithms constrain the machine with human thinking; instead, it lets machines interact with machines and, through their continual confrontation and given enough training data, learn the inherent regularities of the real world. However, a series of problems hide behind the goal of unlimited modeling ability. The generation process is so free that the stability and convergence of training cannot be guaranteed; mode collapse is prone to occur, after which further training cannot make progress. The original GAN suffers from vanishing gradients, training difficulty, generator and discriminator losses that do not indicate training progress, a lack of diversity in the generated samples, and a tendency to overfit. It is also difficult for GAN to generate discrete distributions. To address these problems, many researchers have proposed improvements. Several landmark models are introduced here: DCGAN, CGAN, WGAN, WGAN-GP, EBGAN, BEGAN, InfoGAN, and LSGAN. DCGAN combines GAN with convolutional neural networks (CNN) and performs well in computer vision. It imposes a series of constraints on the CNN architecture so that it can be trained stably, and it uses the learned feature representations to generate samples and classify images. CGAN feeds a conditional variable c, together with the random variable z and the real data x, into the model to guide the data generation process. The conditional variable c can be a category label, text, or a target to generate. This straightforward improvement proves to be very effective and is widely used in subsequent work.
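The conditioning step described for CGAN can be illustrated with a minimal Python sketch. It assumes that the condition c is a class label encoded as a one-hot vector and that it is simply concatenated with the noise z (for the generator) or with the sample x (for the discriminator). The function names and dimensions are hypothetical and are not taken from the paper.

```python
import random


def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot list (assumed form of c)."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def generator_input(z, label, num_classes):
    """Build the generator input by concatenating noise z with condition c."""
    return list(z) + one_hot(label, num_classes)


def discriminator_input(x, label, num_classes):
    """Build the discriminator input by concatenating sample x with condition c."""
    return list(x) + one_hot(label, num_classes)


# Hypothetical sizes: 3-dimensional noise, 4 classes, 5-dimensional sample.
random.seed(0)
z = [random.gauss(0.0, 1.0) for _ in range(3)]
x = [0.5] * 5
g_in = generator_input(z, label=2, num_classes=4)
d_in = discriminator_input(x, label=2, num_classes=4)
print(len(g_in), len(d_in))
```

Because both networks see the same c, the discriminator can reject a sample that looks realistic but does not match its condition, which is what steers the generator toward the requested class.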
WGAN uses the Wasserstein distance instead of the JS divergence to measure the distance between the real and generated distributions. The Wasserstein distance has the following advantages: it can measure distance even when the two distributions do not overlap, and it has good smoothness properties, which alleviates the vanishing gradient problem to some degree. In addition, WGAN addresses the instability of training and makes the generated samples more diverse, and there is no need to carefully balance the training of G and D. WGAN-GP replaces the weight clipping in WGAN with a gradient penalty to enforce the Lipschitz constraint. Experiments show that the samples generated by WGAN-GP are of higher quality than those of WGAN; it trains stably with little hyperparameter tuning and succeeds on a variety of generation tasks. However, WGAN-GP converges more slowly and takes more time to converge on the same data set. EBGAN interprets GAN from the viewpoint of energy. It can learn the probability distribution of images, but its convergence is also slow. When other models can already roughly express the outlines of objects, the images BEGAN produces are still disorganized. Nevertheless, the images generated by BEGAN have the sharpest edges and rich diversity in the experiments. The discriminator of BEGAN borrows from EBGAN, and its generator loss follows that of WGAN. BEGAN also introduces a hyperparameter that measures the diversity of the generated samples, which is used to balance D and G and stabilize training. The images generated by InfoGAN have poor internal texture; moreover, the shapes of the generated objects resemble one another. In addition to the input noise z, the InfoGAN generator takes a controllable variable c, which carries interpretable information about the data and controls the generated results; this results in poor diversity.
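The claim that the Wasserstein distance remains informative when two distributions do not overlap can be checked on a toy example. The sketch below is an illustration only, not the paper's experiment: it assumes two point masses on a unit-spaced one-dimensional grid, and the helper names are hypothetical. For such non-overlapping distributions the JS divergence stays at log 2 regardless of how far apart they are, while the Wasserstein-1 distance grows with the separation.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    def kl(a, b):
        # Terms with a_i == 0 contribute nothing to the KL sum.
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def wasserstein_1d(p, q):
    """Wasserstein-1 distance on a unit-spaced 1-D grid: sum of |CDF_p - CDF_q|."""
    total, cdf_p, cdf_q = 0.0, 0.0, 0.0
    for pi, qi in zip(p, q):
        cdf_p += pi
        cdf_q += qi
        total += abs(cdf_p - cdf_q)
    return total


def point_mass(position, size=10):
    """Distribution that puts all probability on one grid cell."""
    return [1.0 if i == position else 0.0 for i in range(size)]


real = point_mass(0)
results = []
for shift in (1, 3, 6):
    fake = point_mass(shift)
    results.append((shift, js_divergence(real, fake), wasserstein_1d(real, fake)))

for shift, js, w in results:
    print(f"shift={shift}  JS={js:.4f}  W1={w:.1f}")
```

Since the JS value does not change as the generated point mass moves toward the real one, it gives the generator no direction to improve in this setting, whereas the Wasserstein-1 value decreases steadily as the two distributions approach each other.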
LSGAN can generate high-quality samples because its objective function replaces the cross-entropy loss with a least squares loss, which partly addresses two shortcomings: low sample quality and an unstable training process. Conclusion As a new generative model, GAN has great value in both theory and practice. It provides a good solution to the problems of insufficient samples, poor generation quality, and difficulty in extracting features. GAN is an inclusive framework that can be combined with most deep learning algorithms to solve problems that traditional machine learning algorithms cannot. However, some theoretical problems urgently need to be solved, and how to generate high-quality samples and realistic scenes is worth studying. In the future, GAN is expected to make further progress in the following areas: theoretical breakthroughs, algorithm development, evaluation systems, specialized systems, and integration with industry.
Keywords