
An Intuitive Introduction to Generative Adversarial Networks (GANs)

Lakshmi Ajay
Published in TDS Archive · 7 min read · Feb 23, 2022

Generator-Discriminator Intuition (Image by Author)

Consider an art forger trying to create fake Leonardo da Vinci paintings, and a detective trying to identify whether each painting is real or fake.

Initially, the forger may not be great at creating replicas of da Vinci’s paintings, making it easy for the detective to spot the fakes.

With time, the forger learns which features the detective checks to decide whether a painting is real or fake. Similarly, as the detective sees more real and fake paintings, the detective learns the finer details needed to differentiate between the two.

Eventually, the forger learns the trick so well that the detective is fooled into believing that the fake paintings are original da Vincis. At this point we can say that the forger’s training is complete.

This is exactly how a GAN works.

Introducing GANs

Generative Adversarial Networks, or GANs, are one of the remarkable innovations of the past decade and have led to many state-of-the-art applications in recent times. The GAN was first introduced in 2014 by Ian Goodfellow et al. in the paper Generative Adversarial Networks. Since its inception, several variants of the GAN have been developed to serve different needs.

A Generative Adversarial Network (GAN) is a deep-learning-based generative model that discovers the underlying patterns in the input data and generates new samples from them. Don’t be intimidated by the name or the definition; once you reach the end of this article, you will be awed by its simplicity.

The basic GAN architecture has two neural networks, the generator and the discriminator. The generator is the forger generating fake data, and the discriminator is the detective whose main role is to classify the input data as real or fake. Once the generator is able to create data that fools the discriminator into thinking it is real, the model training is complete and the generator can produce ‘good’ fake data.

The Deep Convolutional Generative Adversarial Network, or DCGAN, is one of the popular GAN variants, used specifically for image data. The DCGAN generator uses transposed (fractionally-strided) convolutional layers to turn the input noise vector into an output image, while the discriminator uses convolutional layers to classify the actual and generated images as real or fake.

GAN — Supervised or Unsupervised Learning?

A GAN uses two neural networks internally, the generator and the discriminator.

  • The generator network generates an image from a latent vector (a noise vector in the case of DCGAN); taken on its own, this is a form of unsupervised learning.
  • The discriminator network classifies an image as real or fake; this is a form of supervised learning, since labels (real/fake) are used to train it.

Now, moving to the working of the GAN: during training, the GAN takes in two inputs, random noise and input data that is not labelled. Using these two inputs, it generates data that resembles the input data. Since all the inputs to the GAN are unlabelled, the GAN as a whole is a form of unsupervised machine learning.

GAN Model Training

Internally, a GAN has two neural networks that compete with each other. The goal of the generator network is to deceive the discriminator network, and the goal of the discriminator network is to correctly identify whether its input is real or fake.

Training the Generator

In our forger-detective story, this is how the forger learns with every iteration:

Generator learns from its mistakes (Image by Author)

The generator is a neural network that takes in the input noise vector and converts it into an output image (in the case of DCGAN).

Here is an example of the generator architecture taken from the DCGAN paper. As seen in the architecture below, the input to the generator is a noise vector z of size 100. This is first projected and reshaped into a 1024×4×4 tensor, followed by transposed convolution layers producing feature maps of size 512×8×8, 256×16×16 and 128×32×32. Finally, this is fed into the output layer to generate a 64×64 RGB image.

Example DCGAN generator (Source: DCGAN paper)
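
For readers who prefer code, here is a rough sketch of this generator in PyTorch. The framework, the kernel sizes and strides, and the BatchNorm/ReLU placement are assumptions borrowed from common DCGAN implementations, not details given in this article, so treat it as an illustration rather than a reference implementation.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """DCGAN-style generator: 100-d noise vector -> 64x64 RGB image."""
    def __init__(self, z_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            # Project the noise vector to a 1024x4x4 feature map
            nn.ConvTranspose2d(z_dim, 1024, kernel_size=4, stride=1, padding=0),
            nn.BatchNorm2d(1024), nn.ReLU(True),
            # 1024x4x4 -> 512x8x8
            nn.ConvTranspose2d(1024, 512, 4, 2, 1), nn.BatchNorm2d(512), nn.ReLU(True),
            # 512x8x8 -> 256x16x16
            nn.ConvTranspose2d(512, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.ReLU(True),
            # 256x16x16 -> 128x32x32
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(True),
            # 128x32x32 -> 3x64x64 image with pixel values in [-1, 1]
            nn.ConvTranspose2d(128, 3, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, z):
        # z: (batch, z_dim) -> reshape to (batch, z_dim, 1, 1) for the conv layers
        return self.net(z.view(z.size(0), -1, 1, 1))
```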

Training the Discriminator

Here is how the detective learns with every iteration:

Discriminator learns from previous predictions (Image by Author)

The discriminator is a simple binary classifier that labels its input data as real or fake. In the case of DCGAN, the discriminator is a convolutional neural network (CNN) performing binary classification: its input is an image, and it must classify that image as real or fake.
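
As an illustration, a matching discriminator for 64×64 RGB images could look like the sketch below (PyTorch again; the channel sizes and the LeakyReLU/BatchNorm choices mirror common DCGAN practice and are assumptions, not specifics from the article).

```python
import torch.nn as nn

class Discriminator(nn.Module):
    """DCGAN-style discriminator: 64x64 RGB image -> single real/fake logit."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            # 3x64x64 -> 128x32x32
            nn.Conv2d(3, 128, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),
            # 128x32x32 -> 256x16x16
            nn.Conv2d(128, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.LeakyReLU(0.2, inplace=True),
            # 256x16x16 -> 512x8x8
            nn.Conv2d(256, 512, 4, 2, 1), nn.BatchNorm2d(512), nn.LeakyReLU(0.2, inplace=True),
            # 512x8x8 -> 1024x4x4
            nn.Conv2d(512, 1024, 4, 2, 1), nn.BatchNorm2d(1024), nn.LeakyReLU(0.2, inplace=True),
            # 1024x4x4 -> one logit per image
            nn.Conv2d(1024, 1, 4, 1, 0),
        )

    def forward(self, x):
        # Returns raw logits; apply a sigmoid to interpret them as P(image is real)
        return self.net(x).view(-1)
```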

GAN Loss — Intuition

As discussed earlier, there are two neural networks involved in the GAN architecture, and hence there are two losses that need to be back-propagated during the model training step.
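
For reference, both of these losses come from the single minimax objective proposed in the original paper by Goodfellow et al., where D(x) is the discriminator’s estimate that x is real and G(z) is the image the generator produces from noise z:

$$\min_G \max_D \; V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]$$

The discriminator tries to maximise this value, while the generator tries to minimise it.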

Discriminator Loss

The discriminator classifies both the fake data from the generator and the actual data.

Discriminator Loss (Image by Author)

When it classifies the generator’s output as fake or the actual data as real, it is doing a good job; otherwise it is misclassifying the input and the model needs to be corrected for that input.

Hence when the discriminator classifies a fake image as real or a real image as fake, it is penalised with the discriminator loss and this loss is back-propagated through the discriminator network to update the weights and biases.

Note that the generator weights and biases will be frozen while the discriminator is being trained.
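
Putting this into code, one discriminator update could look like the sketch below, assuming the PyTorch modules sketched earlier and the binary cross-entropy loss used in the original GAN formulation (the function name and optimiser handling are illustrative, not part of the article).

```python
import torch
import torch.nn.functional as F

def discriminator_step(generator, discriminator, real_images, d_optimizer, z_dim=100):
    """One training step for the discriminator; the generator is not updated here."""
    batch_size = real_images.size(0)
    z = torch.randn(batch_size, z_dim)
    # detach() blocks gradients from reaching the generator, so its weights stay frozen
    fake_images = generator(z).detach()

    real_logits = discriminator(real_images)
    fake_logits = discriminator(fake_images)

    # Penalise calling real images fake (target 1) and fake images real (target 0)
    loss_real = F.binary_cross_entropy_with_logits(real_logits, torch.ones(batch_size))
    loss_fake = F.binary_cross_entropy_with_logits(fake_logits, torch.zeros(batch_size))
    d_loss = loss_real + loss_fake

    d_optimizer.zero_grad()
    d_loss.backward()      # back-propagate only through the discriminator
    d_optimizer.step()
    return d_loss.item()
```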

Generator Loss

The role of the generator is to create an image from the noise input, and it does not have any labelled information about the type of output expected. Hence the generator uses the output of the discriminator to update its network parameters.

Generator Loss (Image by Author)

If the discriminator misclassifies the generated image as real, the generator is doing its job; but if the discriminator can identify that the generated image is a fake, the generator needs to be updated.

Hence, when the discriminator identifies a generated image and classifies it as fake, the generator loss is computed and back-propagated through the generator network to update its weights and biases.

Note that the discriminator weights and biases will be frozen while the generator is being trained.
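
The matching generator update, again as an illustrative sketch rather than a reference implementation: the discriminator is kept frozen simply by stepping only the generator’s optimiser, and the generator is rewarded when its fakes are scored as real.

```python
import torch
import torch.nn.functional as F

def generator_step(generator, discriminator, batch_size, g_optimizer, z_dim=100):
    """One training step for the generator; the discriminator is not updated here."""
    z = torch.randn(batch_size, z_dim)
    fake_images = generator(z)

    # Gradients flow through the discriminator into the generator,
    # but g_optimizer only holds the generator's parameters
    fake_logits = discriminator(fake_images)

    # The generator "wants" its fakes to be classified as real (target 1)
    g_loss = F.binary_cross_entropy_with_logits(fake_logits, torch.ones(batch_size))

    g_optimizer.zero_grad()
    g_loss.backward()
    g_optimizer.step()
    return g_loss.item()
```

In practice the two steps are typically alternated on every mini-batch until the generated images are good enough to fool the discriminator.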

To get a basic understanding of the Math behind GANs, refer to this follow-up article: Decoding the Basic Math in GANs

What’s cool about GANs?

Here is a brief look at some GANs in action.

Image to Image Translation — Unpaired Approach

Here, image-to-image translation is done from one domain to another without requiring any paired relationship between the input image and the output image. Several types of GANs implement this approach, such as CycleGAN, DualGAN and DiscoGAN.

Examples:

  • Changing from horse domain to zebra domain
  • Changing a summer image to a winter image
Source: Cycle-Consistent Adversarial Networks

Image to Image Translation — Paired Approach

Here there is a natural mapping between the input and the output, i.e. there is a corresponding output image for each input image. Conditional adversarial networks use this approach.

Examples:

  • Changing a street-view map to satellite-view map
  • Black & White to colour image
  • Sketch to actual image
Source: Conditional Adversarial Networks

Creative Art Generation

The Creative Adversarial Network (CAN) is an extension of the GAN where the model learns the style of an artist and creates images of new paintings that match that style. Here the discriminator is trained on a large set of real artwork from painters so that it can accurately classify an image as real or fake. In addition, the discriminator also classifies the time period to which the artwork belongs. The generator uses this additional time-period signal, along with the real/fake classification, to train itself.

Source: Creative Adversarial Networks

And a few more…

  • Generating 3D models from 2D images
  • Converting text captions to images
  • Generating high-resolution images from low-resolution images
  • Generating realistic images of people who don’t exist
  • Converting photos to emojis
  • Translating medical images from one domain to another

Conclusion

This is just a small sample of applications demonstrating the power of GANs. With thousands of research papers and several variants of GANs, it is clear that this technology is under active development and holds tremendous potential for the future.

As we reach the end of the article, I hope you have gained a basic understanding of how GANs work and that it helps you dive deeper into this maGANical world. Good luck!!
