An Intuition of Neural Style Transfer

Part 1 on Neural Style Transfer

Rahul S
3 min readAug 11, 2023


Neural Style Transfer deals with two sets of images: Content image and Style image. It recreates the content image in the style of the Style image.

Here are the required inputs to the model for image style transfer:

  1. A Content Image –an image to which we want to transfer style to
  2. A Style Image — the style we want to transfer to the content image
  3. An Generated Image— the final blend of content and style image

NST employs a pre-trained Convolutional Neural Network with added loss functions to transfer style from one image to another and synthesize a newly generated image with the features we want to add.

With deep CNN, we meticulously segregate the representations of content and style. In this context, the VGG network emerges as a prominent player due to its remarkable ability in constructing robust semantic representations.

It is our feature extractor.

To extract the essence of content representation, we execute the following steps:

  1. We employ diverse images as input through VGG and selectively choose feature maps from a designated layer.
  2. These feature maps intricately capture the elements constituting the content.

These feature maps, akin to image-like representations, hold distinctive characteristics contingent on the layer’s depth within the network. This results in a spectrum ranging from low-level details like edges to high-level intricate nuances.

We know higher layers of the model focus more on the features present in the image i.e. overall content of the image. Thus, images having the same content should also have similar activations in the higher layers. So we extract its intermediate layers of our chosen feature maps and use them to describe the content and style of the input images.

Taking a step towards the creation of stylized imagery, we introduce a Gaussian noise image into the equation. Through the VGG network, we generate an initial style representation, albeit rudimentary. The objective is to align this representation with the content representation.