Words to Birds

Modifying AttnGAN to use Image Captioning and BERT

CS194-26 Final Project,
Lizhi (Gary) Yang, Alex Zhou

Paper | GitHub Repo | Video

    1. Overview

Our final project tackles text-to-image generation by leveraging recent advances in natural language processing (NLP). Text-to-image generation is an interesting problem with great potential in art and design. Recent approaches use GANs: a text encoder maps the description to a feature representation, and a generator and a discriminator are trained adversarially, conditioned on that representation, so that the generator learns to produce realistic images that match the text. A natural approach is to encode the entire description into a single global sentence vector and use it as the condition for the GAN. However, this ignores fine-grained information at the word level. AttnGAN addresses this problem by using word-level features in addition to the sentence feature, and by using the Deep Attentional Multimodal Similarity Model (DAMSM) to compute a fine-grained image-text matching loss that is added to the GAN objective.

Meanwhile, increasingly powerful tools keep becoming available in NLP, and BERT is a prime example. We take advantage of these tools and modify AttnGAN to use them.
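To make the word-level idea concrete, here is a minimal NumPy sketch of the DAMSM word-level matching score and loss as we understand them from the original AttnGAN paper: each word attends over image subregion features, each word is compared with its attended region context by cosine similarity, and the per-word scores are pooled into one image-caption score that feeds a symmetric softmax loss over the batch. It is not the modified DAMSM used in this project; the function names, the gamma defaults, and the random toy features are assumptions for illustration, and a real model would get the word and region features from trained text and image encoders.

```python
import numpy as np


def log_softmax(x: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along one axis."""
    z = x - x.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))


def damsm_word_score(words: np.ndarray, regions: np.ndarray,
                     gamma1: float = 4.0, gamma2: float = 5.0) -> float:
    """Word-level matching score R(Q, D) between one image and one caption.

    words:   (D, T) word features, one column per word.
    regions: (D, N) image subregion features, one column per subregion.
    The gamma defaults are illustrative assumptions, not tuned values.
    """
    # Dot-product similarity of every word with every subregion: (T, N).
    s = words.T @ regions
    # Normalize over words for each subregion (softmax down each column).
    s_bar = np.exp(log_softmax(s, axis=0))
    # Each word's attention over subregions, sharpened by gamma1: (T, N).
    alpha = np.exp(log_softmax(gamma1 * s_bar, axis=1))
    # Region-context vector c_i for every word: (D, T).
    context = regions @ alpha.T
    # Cosine relevance R(c_i, e_i) between each word and its context vector.
    dot = (context * words).sum(axis=0)
    norms = np.linalg.norm(context, axis=0) * np.linalg.norm(words, axis=0)
    relevance = dot / (norms + 1e-8)
    # Smooth maximum over words: (1 / gamma2) * log(sum_i exp(gamma2 * R_i)).
    return float(np.log(np.exp(gamma2 * relevance).sum()) / gamma2)


def damsm_word_loss(batch_words, batch_regions, gamma3: float = 10.0) -> float:
    """Symmetric word-level loss over a batch of matching image/caption pairs."""
    n = len(batch_words)
    # scores[i, j] = R(Q_i, D_j): image i scored against caption j.
    scores = np.array([[damsm_word_score(batch_words[j], batch_regions[i])
                        for j in range(n)] for i in range(n)])
    log_p_caption = log_softmax(gamma3 * scores, axis=1)  # log P(D_i | Q_i) on the diagonal
    log_p_image = log_softmax(gamma3 * scores, axis=0)    # log P(Q_i | D_i) on the diagonal
    return float(-(np.trace(log_p_caption) + np.trace(log_p_image)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy batch: 4 captions of 7 words, 4 images with a 17x17 grid (289 subregions).
    words = [rng.normal(size=(16, 7)) for _ in range(4)]
    regions = [rng.normal(size=(16, 289)) for _ in range(4)]
    print(damsm_word_score(words[0], regions[0]))
    print(damsm_word_loss(words, regions))
```

The loss rewards a batch where each image scores higher with its own caption than with the other captions, which is what lets the word-level signal guide the generator.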

    2. Method

We implement AttnGAN and replace two of its modules, the text encoder and the DAMSM, with pre-trained image captioning networks and BERT. Below are the modified AttnGAN architecture and some results generated by AttnGAN from the following captions. More details on the internal workings of AttnGAN and on how we modified it to take advantage of the pre-trained modules can be found in the paper.


This bird is brown and yellow in color with a stubby beak.

This bird has wings that are brown and a white belly.

This bird has a bill and a large black eye with a yellow throat and a grey breast.

A brown colored bird with a long tail and a very small bill in comparison to its body.
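One way to picture the text-encoder swap described above: AttnGAN's generator and DAMSM consume a matrix of per-word features and a single sentence feature, so a BERT-based encoder has to produce the same two outputs. The sketch below is a hypothetical adapter for illustration, not taken from the project code: it averages BERT's sub-word token states back into one vector per word, takes the [CLS] token state as the sentence feature, and applies a projection matrix `proj` (assumed to be learned) to reach the feature size the rest of the model expects. The hidden states are random stand-ins for real BERT outputs, and the `word_ids` list is an assumed input format mapping each token to its word.

```python
import numpy as np


def bert_to_attngan_features(hidden, word_ids, proj):
    """Turn BERT token states for one caption into AttnGAN-style text features.

    hidden:   (L, H) token hidden states; row 0 is assumed to be the [CLS] token.
    word_ids: length-L list mapping each token to its word index, or None for
              special tokens such as [CLS] and [SEP] (hypothetical input format).
    proj:     (D, H) projection to the text-feature size D (assumed to be learned).
    Returns (words, sentence) with shapes (D, T) and (D,), T = number of words.
    """
    num_words = max(w for w in word_ids if w is not None) + 1
    pooled = np.zeros((num_words, hidden.shape[1]))
    counts = np.zeros(num_words)
    for state, w in zip(hidden, word_ids):
        if w is None:
            continue  # special tokens do not belong to a single word
        pooled[w] += state
        counts[w] += 1
    # Average the sub-word pieces of each word into one vector: (T, H).
    pooled /= counts[:, None]
    words = proj @ pooled.T       # (D, T): one column per word
    sentence = proj @ hidden[0]   # (D,): taken from the [CLS] state
    return words, sentence


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, D = 768, 256  # BERT-base hidden size; D is an assumed feature size
    # "this bird has a stubby beak", assuming "stubby" splits into two pieces.
    word_ids = [None, 0, 1, 2, 3, 4, 4, 5, None]
    hidden = rng.normal(size=(len(word_ids), H))  # stand-in for BERT output
    proj = rng.normal(size=(D, H)) / np.sqrt(H)
    words, sentence = bert_to_attngan_features(hidden, word_ids, proj)
    print(words.shape, sentence.shape)  # (256, 6) (256,)
```

Taking the [CLS] state as the sentence feature is one common choice; mean-pooling the word vectors is another.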

    3. Video and Paper