AutoEncoder (AE) and Variational AutoEncoder (VAE)


Rahul S


The Autoencoder (AE) and the Variational Autoencoder (VAE) are networks trained end to end to compress input data: they map the data from a higher-dimensional space to a lower-dimensional one.

An autoencoder is used to learn efficient embeddings of unlabeled data for a given network configuration. It comprises two parts: an encoder and a decoder. The encoder compresses the data from a higher-dimensional space to a lower-dimensional space (called the latent space), while the decoder does the opposite, i.e., it converts the latent representation back to the higher-dimensional space.

The idea is to ensure that the latent space captures most of the information in the dataset. So we use the encoder's input as the target for the decoder's output and, with a suitable loss function and backpropagation, arrive at the right weights.
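To make the training idea concrete, here is a minimal sketch in NumPy, assuming the simplest possible setup: a purely linear encoder and decoder, hand-derived gradients, and synthetic data. The toy dataset, the names `W_enc` and `W_dec`, the learning rate, and the step count are all illustrative assumptions, not something prescribed by this article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (an assumption): 200 points in 4-D that actually lie on a 2-D plane,
# so a 2-D latent space can represent them well.
basis = rng.normal(size=(2, 4))
X = rng.normal(size=(200, 2)) @ basis

# Linear encoder (4 -> 2) and linear decoder (2 -> 4), small random init.
W_enc = rng.normal(scale=0.1, size=(4, 2))
W_dec = rng.normal(scale=0.1, size=(2, 4))

def reconstruction_loss(X, W_enc, W_dec):
    """Mean squared error between the input and its reconstruction."""
    X_hat = (X @ W_enc) @ W_dec
    return np.mean((X - X_hat) ** 2)

initial_loss = reconstruction_loss(X, W_enc, W_dec)

learning_rate = 0.01
for step in range(2000):
    Z = X @ W_enc          # encode: input -> latent
    X_hat = Z @ W_dec      # decode: latent -> reconstruction
    error = X_hat - X      # the target is the input itself

    # Gradients of the MSE loss, derived by hand (backpropagation).
    grad_out = 2.0 * error / error.size
    grad_W_dec = Z.T @ grad_out
    grad_W_enc = X.T @ (grad_out @ W_dec.T)

    # Plain gradient-descent update of both weight matrices.
    W_enc -= learning_rate * grad_W_enc
    W_dec -= learning_rate * grad_W_dec

final_loss = reconstruction_loss(X, W_enc, W_dec)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

The only supervision signal is the input itself, which is why no labels are needed. A real autoencoder would use non-linear layers and a deep learning framework's automatic differentiation instead of the manual gradients above.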

For a little mathematics, one can check the following diagram and equations.


  • Input data x is fed to the encoder function e_theta(x).
  • x is passed through a series of layers (parametrised by the variable theta) which reduce its dimension to produce a compressed latent vector z.
  • The number of layers, type and size of the layers, and the latent space dimension are hyperparameters.
  • The idea is to get rid of redundant features and keep the most important ones describing the data.
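The encoder described above can be sketched as a stack of fully connected layers. This is a hedged illustration only: the layer sizes, the ReLU activation, and the names `layer_sizes` and `encoder` are assumptions chosen for the example, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hyperparameters (illustrative values, not from the article):
# input dimension, hidden layer widths, and latent dimension.
layer_sizes = [784, 128, 32, 8]   # x has 784 features, z has 8

# theta: one (weights, bias) pair per layer.
theta = [
    (rng.normal(scale=0.05, size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

def encoder(x, theta):
    """e_theta(x): pass x through each layer, shrinking its dimension."""
    h = x
    for i, (W, b) in enumerate(theta):
        h = h @ W + b
        if i < len(theta) - 1:    # ReLU on hidden layers, linear latent layer
            h = np.maximum(h, 0.0)
    return h

x = rng.normal(size=(5, 784))     # a batch of 5 hypothetical inputs
z = encoder(x, theta)
print(x.shape, "->", z.shape)
```

Changing `layer_sizes` changes the number of layers, their widths, and the latent dimension at once, which is what makes them hyperparameters rather than learned quantities.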


  • The decoder d_phi(z) comprises near-complement layers of the layers used in the encoder, but in reverse order.
  • A near-complement layer of a layer is one that can undo (to some extent) the operation of the original layer: a transposed conv layer for a conv layer, unpooling for pooling, fully connected for fully connected, etc. U-Net is a famous architecture with this kind of mirrored encoder-decoder layout.
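The "undo to some extent" idea behind a near-complement layer can be seen in a small sketch of a pooling/unpooling pair. Note the assumption: the unpooling here is plain nearest-neighbour upsampling, used as a simple stand-in; other unpooling variants (for example, ones that remember where each maximum came from) exist. The function names are hypothetical.

```python
import numpy as np

def max_pool_2x2(x):
    """Downsample an (H, W) array by taking the max of each 2x2 block."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def unpool_2x2(x):
    """Upsample by repeating each value over a 2x2 block (nearest neighbour)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

x = np.arange(16, dtype=float).reshape(4, 4)
pooled = max_pool_2x2(x)        # shape (2, 2): the encoder-side operation
restored = unpool_2x2(pooled)   # shape (4, 4): the decoder-side near-complement

print(pooled)
print(restored)
print("same shape:", restored.shape == x.shape)
print("same values:", np.array_equal(restored, x))
```

The shape is recovered exactly, but the values are not: pooling discarded information that unpooling cannot bring back. This is why the decoder's layers are only near-complements, and why the rest of the reconstruction has to be learned.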


The loss function, as usual, quantifies the difference between the input and the output. Here it is taken to be the mean squared error. Minimizing loss means we want the…