Developing an image editing application that edits images to have a desired attribute, using Generative Adversarial Networks (DCGAN and CycleGAN)
“I wonder how I would have looked in this picture if I had a hat on…”, “If only I were smiling in this one…” Going through our phone galleries, we have all had similar thoughts every now and then. But you can’t change a clicked picture, right?
Yes, we have all been clicking a lot of photographs, and though today's image editing applications offer a plethora of features, it is time to imagine what the 'next generation of image editing tools' would look like. These new tools would let users edit an image to have a desired attribute or feature, such as 'add a hat' or 'change hair color', to create new images.
To make this possible, we need an algorithm which can learn the different features of an image, edit them as required and then generate a new image. Generative Adversarial Networks, or GANs, developed by Ian Goodfellow [1], do a pretty good job of generating new images and have been used to develop such a next-generation image editing tool.
GANs, a class of deep learning models, consist of a generator and a discriminator which are pitted against each other. The generator is tasked with producing images similar to those in the database, while the discriminator tries to distinguish the generated images from real images in the database. This adversarial interplay eventually trains the GAN until the discriminator is fooled into treating the generated images as if they came from the database. For a simplified tutorial on GANs, refer to this.
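To make this interplay concrete, here is a minimal sketch of a GAN training loop in PyTorch. The tiny fully-connected generator and discriminator and the stand-in data loader below are illustrative placeholders, not the models used in this post.

```python
# Minimal sketch of adversarial training: the discriminator learns to tell
# real from fake, the generator learns to fool it. All sizes are illustrative.
import torch
import torch.nn as nn

z_dim = 100
generator = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                          nn.Linear(256, 64 * 64 * 3), nn.Tanh())
discriminator = nn.Sequential(nn.Linear(64 * 64 * 3, 256), nn.LeakyReLU(0.2),
                              nn.Linear(256, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce = nn.BCELoss()

# Stand-in for real image batches (flattened, scaled to [-1, 1]).
dataloader = [torch.randn(32, 64 * 64 * 3) for _ in range(10)]

for real in dataloader:
    batch = real.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # 1) Discriminator step: separate real images from generated ones.
    z = torch.randn(batch, z_dim)
    fake = generator(z).detach()
    d_loss = bce(discriminator(real), real_labels) + bce(discriminator(fake), fake_labels)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) Generator step: try to make the discriminator call fakes "real".
    z = torch.randn(batch, z_dim)
    g_loss = bce(discriminator(generator(z)), real_labels)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```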
In this post, I will walk through the process of using GANs for the image editing task.
The Deep Convolutional Generative Adversarial Network (DCGAN) proposed by Radford et al. [2] introduces architectural guidelines that make GANs easier to train. In this work, I have implemented the DCGAN model for this task, using the CelebA face database [3], which has 200,000+ images of celebrities annotated with 40 attributes such as smiling, wavy hair, moustache, etc.
The generator of the DCGAN model takes a 100-dimensional vector, known as the z-vector, and converts it into an image similar to the images present in the database. During training, the generator learns to map this z-vector to a facial image, accounting for all the attributes. Some images sampled from the generator are shown below. Follow this link for the DCGAN code.
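For intuition, below is a sketch of what a DCGAN-style generator looks like in PyTorch, following the guidelines of Radford et al.: the 100-dimensional z-vector is upsampled through transposed convolutions with batch normalization and ReLU into a 64x64 RGB image. The layer sizes and output resolution are illustrative assumptions, not necessarily the exact configuration in my implementation.

```python
# DCGAN-style generator sketch: 100-d z-vector -> 64x64 RGB image.
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_dim=100, feat=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, feat * 8, 4, 1, 0, bias=False),     # 1x1  -> 4x4
            nn.BatchNorm2d(feat * 8), nn.ReLU(True),
            nn.ConvTranspose2d(feat * 8, feat * 4, 4, 2, 1, bias=False),  # 4x4  -> 8x8
            nn.BatchNorm2d(feat * 4), nn.ReLU(True),
            nn.ConvTranspose2d(feat * 4, feat * 2, 4, 2, 1, bias=False),  # 8x8  -> 16x16
            nn.BatchNorm2d(feat * 2), nn.ReLU(True),
            nn.ConvTranspose2d(feat * 2, feat, 4, 2, 1, bias=False),      # 16x16 -> 32x32
            nn.BatchNorm2d(feat), nn.ReLU(True),
            nn.ConvTranspose2d(feat, 3, 4, 2, 1, bias=False),             # 32x32 -> 64x64
            nn.Tanh(),
        )

    def forward(self, z):
        # z: (batch, 100), reshaped to (batch, 100, 1, 1) for the conv stack
        return self.net(z.view(z.size(0), z.size(1), 1, 1))

# Sampling: draw z from a standard normal and decode it into a face-like image.
z = torch.randn(16, 100)
images = Generator()(z)   # (16, 3, 64, 64), values in [-1, 1]
```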
The DCGAN model is unconditional, i.e. the image attributes are not used to train it. Instead, the attributes are used to manipulate images via an encoder-generator pair.
As the name suggests, an encoder encodes an image into a small representation. In this case, it converts an image into a 100-dimensional z-vector, which is then fed to the generator of the DCGAN model so that the generator produces an image similar to the encoder's input.
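A convolutional encoder that mirrors the generator might look roughly like the sketch below; the exact DeepConv encoder architecture used in this work may differ, so treat the layer configuration as an assumption.

```python
# Encoder sketch: 64x64 RGB image -> 100-d z-vector (mirror of the generator).
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, z_dim=100, feat=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, feat, 4, 2, 1), nn.LeakyReLU(0.2, True),             # 64 -> 32
            nn.Conv2d(feat, feat * 2, 4, 2, 1), nn.LeakyReLU(0.2, True),      # 32 -> 16
            nn.Conv2d(feat * 2, feat * 4, 4, 2, 1), nn.LeakyReLU(0.2, True),  # 16 -> 8
            nn.Conv2d(feat * 4, feat * 8, 4, 2, 1), nn.LeakyReLU(0.2, True),  # 8  -> 4
            nn.Conv2d(feat * 8, z_dim, 4, 1, 0),                              # 4  -> 1
        )

    def forward(self, x):
        return self.net(x).view(x.size(0), -1)   # (batch, 100)

# Training idea (sketch): with the trained generator held fixed, minimize a
# pixel-wise reconstruction loss between generator(encoder(image)) and image.
```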
I have used the DeepConv encoder model as proposed in this paper. Some samples generated by this encoder-generator pair are shown below. For the implementation of the encoder model, follow this link.
(Figure: input images alongside the corresponding encoder-generator outputs.)
The z-vector has an interesting property: it can be manipulated with simple arithmetic operations, which makes it possible to manipulate images and create new ones.
The encoder produces an encoding, or z-vector, of the input image. This latent representation is then combined with the representation of the desired attribute. The 'edited' z-vector is passed as input to the generator, which produces an image similar to the input image but with the desired attribute.
To find the representation of each attribute, all database images are passed through the encoder. The average encoding of the images that don't have a particular attribute is then subtracted from the average encoding of the ones that do. For example, to compute the representation of 'smiling', all images are first encoded. Let the average encoding of images with the attribute 'smiling' be s, and the average encoding of images without it be ns. The z-vector for smiling is then z = s − ns.
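The whole editing pipeline then boils down to a few lines of vector arithmetic. The sketch below assumes a trained `encoder` and `generator` as above and a boolean label tensor marking which images have the attribute; the function names and the `alpha` scaling factor are illustrative.

```python
# Attribute-vector arithmetic sketch: z_attr = mean(z with attr) - mean(z without),
# then edit an image by moving its encoding along that direction.
import torch

@torch.no_grad()
def attribute_vector(encoder, images, labels):
    """labels: bool tensor, True where the attribute is present."""
    z = encoder(images)            # (N, 100) encodings of all database images
    s = z[labels].mean(dim=0)      # average encoding of images with the attribute
    ns = z[~labels].mean(dim=0)    # average encoding of images without it
    return s - ns                  # z-vector representing the attribute

@torch.no_grad()
def edit(encoder, generator, image, attr_vec, alpha=1.0):
    z = encoder(image.unsqueeze(0))      # encode the input image
    z_edited = z + alpha * attr_vec      # move along the attribute direction
    return generator(z_edited)           # decode back into an edited image

# Usage sketch (hypothetical names):
# smiling_vec = attribute_vector(encoder, all_images, smiling_labels)
# edited = edit(encoder, generator, my_photo, smiling_vec, alpha=1.0)
```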
(Figure: original image, its encoded reconstruction, and edits adding the attributes Smiling, Bushy Eyebrows, Mustache and Wavy Hair.)
(Figure: original image, its encoded reconstruction, and edits adding the attributes Male, High Cheekbones, Blond Hair and Bangs.)
Cycle-Consistent Generative Adversarial Networks (CycleGANs) [4] were developed to solve the problem of unpaired image-to-image translation, mapping images from one domain to another in an unsupervised manner.
The CycleGAN model is made up of two GANs that train in a fashion similar to DCGAN, but with an extra 'cyclic loss' term that enforces cycle consistency between the input image and the generated sample.
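In PyTorch-flavoured pseudocode, the cyclic term looks roughly like the sketch below, where `G` maps domain X to Y and `F` maps Y back to X; the names and the weight `lambda_cyc` follow common CycleGAN conventions rather than my exact implementation.

```python
# Cycle-consistency sketch: F(G(x)) should recover x, and G(F(y)) should recover y.
import torch
import torch.nn as nn

l1 = nn.L1Loss()
lambda_cyc = 10.0   # weight on the cycle term; 10 is the value used in the CycleGAN paper

def cycle_loss(G, F, x, y):
    # x: batch of images from domain X, y: batch of images from domain Y
    forward_cycle = l1(F(G(x)), x)    # X -> Y -> X should reproduce x
    backward_cycle = l1(G(F(y)), y)   # Y -> X -> Y should reproduce y
    return lambda_cyc * (forward_cycle + backward_cycle)

# Total generator objective (sketch): adversarial loss for G on domain Y
# + adversarial loss for F on domain X + cycle_loss(G, F, x, y)
```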
An image from one domain is passed to the model, which converts it into an image of the other domain. This generated sample is then converted back to the first domain to maintain the cyclic nature. My implementation of the CycleGAN model is available here. For a tutorial on CycleGAN, refer to this. Below are some samples generated using CycleGAN.
(Figure: CycleGAN samples, each input image shown alongside its translated output.)
Thanks for reading.