Natural Language Processing: A Comprehensive Tutorial

This article explores various aspects of natural language processing (NLP). It highlights the evolution of machine learning techniques, the significance of transformer architectures, and the role of attention mechanisms, and it emphasizes the importance of data labeling, tokenization, and vectorization. After a non-mathematical overview of transformers, it turns to the emergence of pre-trained models and the role of Hugging Face as a platform for accessing and using these models effectively.

Rahul S
30 min read · Jul 9, 2023



  1. Branches of NLP
  2. Machine Learning Pipeline in NLP
  3. Data Labeling
  4. Tokenization
  5. Vectorization: Bag of Words, TF-IDF & Embedding Matrix
  6. Transformers
  7. Positional Encoding
  8. Attention Mechanism
  9. Encoder
  10. LLM
  11. BERT
  12. GPTs
  13. Hugging Face

Natural language processing deals with the ability of computers to process (preprocessing), understand (NLU), and generate text (NLG). This includes both spoken and written human language and enables the automation of analytics, self-service actions, and human-machine interactions.


There are multiple branches of NLP, starting with natural language understanding (NLU).

NLU: Natural language understanding interprets words, sentences, semantics, and context in text. Popular NLU applications include sentiment analysis and text summarization.
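To make the sentiment-analysis task concrete, here is a minimal lexicon-based scorer. This is a toy sketch of the idea, not how production NLU systems work (those typically use trained models); the word lists are illustrative assumptions.

```python
# Toy lexicon-based sentiment classifier -- an illustrative sketch only.
# The lexicons below are assumed for the example, not a standard resource.
POSITIVE = {"good", "great", "excellent", "love", "happy"}
NEGATIVE = {"bad", "terrible", "awful", "hate", "sad"}

def sentiment(text: str) -> str:
    """Classify text as positive/negative/neutral by counting lexicon hits."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this great product"))  # -> positive
print(sentiment("This is terrible and sad"))   # -> negative
```

Real systems replace the hand-built lexicon with a model that learns these associations from labeled data, which is exactly where the pipeline and labeling steps discussed later come in.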

Information extraction is the earliest branch of NLP. It deals with extracting structured information from a body of text. Tasks in information extraction include named entity recognition (NER) and text search.
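As a small sketch of information extraction, the snippet below pulls structured fields out of free text with regular expressions. The entity labels and patterns are assumptions chosen for the example; a real NER system would use a trained model rather than hand-written rules.

```python
import re

# Toy rule-based entity extraction -- an illustrative sketch of information
# extraction, not a trained NER model. Patterns here are example assumptions.
PATTERNS = {
    "EMAIL": r"\b[\w.+-]+@[\w-]+\.[\w.]+\b",
    "DATE":  r"\b\d{4}-\d{2}-\d{2}\b",
    "MONEY": r"\$\d+(?:,\d{3})*(?:\.\d{2})?",
}

def extract_entities(text: str) -> list[tuple[str, str]]:
    """Return (label, matched span) pairs for every pattern match."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in re.finditer(pattern, text):
            found.append((label, match.group()))
    return found

text = "Contact sales@example.com by 2023-07-09 about the $1,200.00 invoice."
print(extract_entities(text))
```

The output is a list of labeled spans, i.e. structured data recovered from unstructured text, which is the essence of the information-extraction task.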