ProGAN: How NVIDIA Generated Images of Unprecedented Quality

It was even more efficient (in terms of training time) than previous GANs.

Growing GANs

Instead of attempting to train all layers of the generator and discriminator at once, as is normally done, the team gradually grew their GAN, one layer at a time, to handle progressively higher resolution versions of the images.

The ProGAN starts out generating very low resolution images. When training stabilizes, a new layer is added and the resolution is doubled. This continues until the output reaches the desired resolution. By progressively growing the networks in this fashion, high-level structure is learned first, and training is stabilized.

To do this, they first artificially shrunk their training images to a very small starting resolution (only 4×4 pixels). They created a generator with just a few layers to synthesize images at this low resolution, and a corresponding discriminator of mirrored architecture. Because these networks were so small, they trained relatively quickly and learned only the large-scale structures visible in the heavily blurred images.

When the first layers completed training, they then added another layer to G and D, doubling the output resolution to 8×8. The trained weights in the earlier layers were kept, but not locked, and the new layer was faded in gradually to help stabilize the transition (more on that later). Training resumed until the GAN was once again synthesizing convincing images, this time at the new 8×8 resolution.

In this way, they continued to add layers, double the resolution, and train until the desired output size was reached.

The Effectiveness of Growing GANs

By increasing the resolution gradually, we are continuously asking the networks to learn a much simpler piece of the overall problem. The incremental learning process greatly stabilizes training. This, in combination with some training details we’ll discuss below, reduces the chance of mode collapse.

The low-to-high resolution trend also forces the progressively grown networks to focus on high-level structure first (patterns discernible in the most blurred versions of the image) and fill in the details later. This improves the quality of the final image by reducing the likelihood that the network will get some high-level structure drastically wrong.

Increasing the network size gradually is also more computationally efficient than the more traditional approach of initializing all the layers at once. Fewer layers are faster to train, as there are simply fewer parameters in them.
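To make the growth schedule concrete, here is a minimal PyTorch sketch of the outer loop: train at one resolution, then append a new upsampling block and double the resolution. This is illustrative code, not NVIDIA's; the ToyGenerator class, its grow() method, and the layer sizes are all assumptions, and a real implementation would grow the discriminator symmetrically and vary channel counts per stage.

```python
# Minimal sketch (not the authors' code) of progressive growing:
# train at 4x4, then repeatedly add a layer, double the resolution, and keep training.
import torch
import torch.nn as nn

class ToyGenerator(nn.Module):
    def __init__(self, latent_dim=64):
        super().__init__()
        self.latent_dim = latent_dim
        # First block maps the latent vector to a 4x4 feature map.
        self.blocks = nn.ModuleList([nn.ConvTranspose2d(latent_dim, 32, 4)])
        self.to_rgb = nn.Conv2d(32, 3, 1)  # 1x1 conv to image channels

    def grow(self):
        # Add one upsampling block: doubles the output resolution.
        self.blocks.append(nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(32, 32, 3, padding=1),
            nn.LeakyReLU(0.2),
        ))

    def forward(self, z):
        x = z.view(-1, self.latent_dim, 1, 1)
        for block in self.blocks:
            x = block(x)
        return torch.tanh(self.to_rgb(x))

gen = ToyGenerator()
resolution = 4
target_resolution = 64
while True:
    # ... train G and the (mirrored) discriminator at the current resolution until stable ...
    if resolution == target_resolution:
        break
    gen.grow()          # new layer added; earlier weights are kept, not frozen
    resolution *= 2
    print(f"now training at {resolution}x{resolution}")

imgs = gen(torch.randn(2, gen.latent_dim))
print(imgs.shape)       # torch.Size([2, 3, 64, 64])
```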
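The fade-in mentioned above can be sketched as a simple linear blend: while a new layer is being introduced, the image it produces is mixed with an upsampled copy of the previous top layer's output, with a weight alpha that ramps from 0 to 1 over the transition. The function and variable names below are hypothetical; only the blending rule itself follows the paper's description.

```python
# Minimal sketch of fading in a newly added generator layer.
import torch
import torch.nn.functional as F

def blended_output(old_rgb, new_rgb, alpha):
    """Blend the upsampled old top-layer image with the new layer's image.

    alpha ramps from 0 to 1 during the transition, so the new layer starts
    as a small perturbation and gradually takes over.
    """
    upsampled_old = F.interpolate(old_rgb, scale_factor=2, mode="nearest")
    return (1.0 - alpha) * upsampled_old + alpha * new_rgb

# Example: during the 4x4 -> 8x8 transition
old_rgb = torch.randn(1, 3, 4, 4)  # image from the previous (4x4) top layer
new_rgb = torch.randn(1, 3, 8, 8)  # image from the newly added (8x8) layer
for step, total_steps in [(0, 1000), (500, 1000), (1000, 1000)]:
    alpha = step / total_steps
    img = blended_output(old_rgb, new_rgb, alpha)
    print(step, img.shape)          # torch.Size([1, 3, 8, 8]) throughout
```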
