Encoders and Decoders in Transformers
This is a crisp introduction to, and revision of, two concepts essential for understanding transformers.
ENCODER
The encoder in a transformer converts the tokens of a sentence into equivalent vectors, also called hidden states or the context. These vectors capture the semantics of the tokens and the relationships between them, using techniques such as token embeddings, positional encoding, and attention.
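To make this concrete, here is a minimal sketch using PyTorch's built-in encoder modules. All sizes and token IDs are illustrative assumptions, and positional encoding is deferred to the steps below:

```python
# Minimal sketch: token IDs in, hidden states (the context) out.
# Vocabulary size, model width, and token IDs are illustrative assumptions.
import torch
import torch.nn as nn

vocab_size, d_model = 10_000, 512
embedding = nn.Embedding(vocab_size, d_model)
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=6)

token_ids = torch.tensor([[5, 42, 7, 901]])    # one sentence, four tokens
hidden_states = encoder(embedding(token_ids))  # one context vector per token
print(hidden_states.shape)                     # torch.Size([1, 4, 512])
```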
The encoder has a layered architecture made of several building blocks. Let’s review that architecture now.
SINGLE ENCODER LAYER
We will start off with the architecture of a single encoder layer.
- The input tokens of the sentence are converted into their equivalent embeddings.
- Positional encoding vectors are created for each token position in the sequence.
- The positional encoding vectors are then added element-wise to the token embeddings (not concatenated), so each token’s vector carries both its meaning and its position.
- The resulting sum is then fed into the first encoder layer, as sketched below.
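Here is a minimal sketch of these four steps, assuming the sinusoidal positional encoding from "Attention Is All You Need" and illustrative dimensions (the variable names are our own):

```python
# Sketch: embed tokens, build sinusoidal positional encodings,
# add them element-wise, and pass the sum to the first encoder layer.
import math
import torch
import torch.nn as nn

vocab_size, d_model, seq_len = 10_000, 512, 4

embedding = nn.Embedding(vocab_size, d_model)
token_ids = torch.tensor([[5, 42, 7, 901]])  # (batch=1, seq_len=4)
x = embedding(token_ids)                     # (1, 4, 512)

# Sinusoidal positional encoding:
# PE[pos, 2i]   = sin(pos / 10000^(2i/d_model))
# PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))
pos = torch.arange(seq_len).unsqueeze(1)                               # (4, 1)
div = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10_000.0) / d_model))
pe = torch.zeros(seq_len, d_model)
pe[:, 0::2] = torch.sin(pos * div)
pe[:, 1::2] = torch.cos(pos * div)

x = x + pe  # element-wise addition, not concatenation

first_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
out = first_layer(x)   # output feeds the rest of the encoder stack
print(out.shape)       # torch.Size([1, 4, 512])
```

Note that adding (rather than concatenating) the positional encodings keeps the vectors at the same width d_model, so every layer in the stack sees inputs of the same size.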