A new age of the AI revolution began when OpenAI put the power of AI into the common person's hands with ChatGPT. Before that, AI was synonymous with post-apocalyptic robots running wild, like in Arnold Schwarzenegger films. But how did this sudden change become reality?
It wasn't actually that sudden. The ball started rolling when researchers at Google published the paper “Attention Is All You Need”, which laid out the building blocks that make the Transformer such a potent and powerful architecture in the neural network domain.
Overall, a Transformer NN architecture (Transformer Neural Network architecture) consists of many elements that together let it understand the meaning of text and generate an appropriate response. At the highest level, though, the transformer-based architecture can be divided into two parts:
- Encoder
- Decoder
In the original paper, the Transformer model is designed with a stack of 6 Encoder blocks and 6 Decoder blocks, but this number is adjustable.

The Encoder is good at understanding the given input text, while the Decoder is good at generating text, which is why most GPT models use a decoder-only architecture.
Then there is the Encoder-Decoder architecture, used by models like T5, which is good at both understanding the prompt and generating text.
Transformer Architecture Explained in Detail
In this section, we break down the Transformer architecture into its two key components: the Encoder and the Decoder. Each follows a structured, multi-stage process that transforms input data into meaningful outputs. Let’s start by exploring the Encoder, its steps, and how it builds context-rich representations, before moving on to the Decoder, where these representations are used to generate the final output.
Encoder
Now let’s focus on the Encoder part, which processes the input sequence through embeddings, self-attention, and feed-forward layers to generate a set of context-rich representations. To better understand this part of the transformer model architecture, the simplest version of the flow is shown in the following visual.

I. Input Embedding
Although we are used to typing natural language on our computers, did you know that a computer cannot understand those words unless they are converted into a numerical format?
So how does an LLM understand the text we insert as a prompt?
We need to convert that text into a numerical format via vectorization.

Converting the text into numbers takes two steps:
1. Tokenization: Converting a whole sentence into a list of tokens. Note that one word does not always equal one token.
2. Embedding: Based on a learned model, each token is assigned a vector that places it at a suitable position in the vector space relative to the other words.
E.g.: Man and Woman will be separated by roughly the same distance as Male and Female, and Man and Male will sit closer together than Man and Female (see the sketch below).
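To make these two steps concrete, here is a minimal sketch in Python using PyTorch’s nn.Embedding. The toy vocabulary and sentence are hypothetical stand-ins; real LLMs use learned subword tokenizers (such as BPE), which is why one word can map to several tokens.

```python
import torch
import torch.nn as nn

# Hypothetical toy vocabulary; real models use learned subword
# tokenizers (e.g. BPE), where one word may map to several tokens.
vocab = {"<pad>": 0, "the": 1, "man": 2, "woman": 3, "walks": 4}

def tokenize(sentence: str) -> list[int]:
    # Step 1: split the sentence into tokens and map them to integer IDs.
    return [vocab[w] for w in sentence.lower().split()]

token_ids = torch.tensor([tokenize("the man walks")])  # shape: (1, 3)

# Step 2: a learned lookup table that maps each token ID to a dense
# vector; during training these vectors shift so that related words
# (man/male, woman/female) end up close together in the vector space.
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=8)
word_vectors = embedding(token_ids)                    # shape: (1, 3, 8)
print(word_vectors.shape)
```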
At this point, we have the words and their meanings processed in parallel. However, this representation still lacks the positional information of the words, which is crucial in natural language understanding. Therefore, the next step in the Transformer architecture is to incorporate positional information.
II. Positional Encoding
A straightforward way to incorporate positional information is to assign numerical indexes like 1, 2, 3, and so on to the tokens. However, as the sequence length increases, these raw indexes grow without bound and introduce additional complexity. To address this, a positional encoding vector is generated and added to the word vector obtained in the previous step.
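In the original paper, this vector is built from fixed sinusoidal functions. Below is a minimal NumPy sketch of that scheme; the small dimensions are purely illustrative.

```python
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal positional encoding from the original paper:
    PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
    PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
    """
    positions = np.arange(seq_len)[:, np.newaxis]      # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[np.newaxis, :]     # (1, d_model / 2)
    angles = positions / np.power(10000, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions
    pe[:, 1::2] = np.cos(angles)   # odd dimensions
    return pe

# The encoding is simply added to the word vectors from the previous step.
pe = positional_encoding(seq_len=3, d_model=8)
print(pe.shape)  # (3, 8)
```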


III. Attention
Attention is a mechanism that determines the importance of different parts of the input data by assigning numerical values to them. This helps the transformer NN architecture model focus on the most relevant parts of a sentence when processing information, by dynamically assigning weights according to relative importance or relevance.

The process shown above is for a single attention head. If the transformer architecture has, say, 8 heads, the same process runs 8 times in parallel and the results are concatenated for further processing.
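Concretely, each token is projected into Query, Key, and Value vectors, and the paper defines attention as softmax(QKᵀ/√d_k)·V. Here is a minimal PyTorch sketch of one head plus the multi-head concatenation; the random tensors stand in for the projected embeddings, and the shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # relevance of every token to every other
    weights = F.softmax(scores, dim=-1)            # each row sums to 1
    return weights @ v

# One head: Q, K, V are linear projections of the position-encoded embeddings.
seq_len, d_k = 3, 8
q = k = v = torch.randn(seq_len, d_k)
single_head = scaled_dot_product_attention(q, k, v)   # (3, 8)

# With 8 heads the same computation runs 8 times on separate projections,
# and the results are concatenated before a final linear layer.
heads = [scaled_dot_product_attention(torch.randn(seq_len, d_k),
                                      torch.randn(seq_len, d_k),
                                      torch.randn(seq_len, d_k))
         for _ in range(8)]
multi_head = torch.cat(heads, dim=-1)                 # (3, 64)
print(single_head.shape, multi_head.shape)
```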
IV. Add & Norm
In the attention mechanism, as the model learns the relationships between words, important words might be overlooked. To address this, we add the vector that entered the attention layer to the attention layer’s output (a residual, or skip, connection) and then normalize the result with layer normalization.
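A minimal sketch of this “Add & Norm” step, with random tensors standing in for the real attention input and output:

```python
import torch
import torch.nn as nn

d_model = 8
norm = nn.LayerNorm(d_model)
attention_input = torch.randn(3, d_model)   # vectors entering the attention layer
attention_output = torch.randn(3, d_model)  # stand-in for the attention result

# Residual (skip) connection: the input is added back to the output, so
# information from before the attention layer cannot be lost entirely.
# Layer normalization then keeps the values in a stable range.
out = norm(attention_input + attention_output)
print(out.shape)  # (3, 8)
```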
V. Feedforward Neural Network

Each Encoder block then finishes with a position-wise feed-forward network: two linear layers with a non-linearity in between, applied to every token’s vector independently, which further refines the representations. In this way, we obtain learned embeddings that are more accurate; the embeddings retrieved from BERT are of this nature.
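A minimal PyTorch sketch of such a feed-forward network; the paper uses a model size of 512 and an inner size of 2048, while the small sizes here are purely illustrative.

```python
import torch
import torch.nn as nn

d_model, d_ff = 8, 32  # illustrative; the paper uses 512 and 2048

# Position-wise feed-forward network: two linear layers with a
# non-linearity in between, applied to each token independently.
ffn = nn.Sequential(
    nn.Linear(d_model, d_ff),
    nn.ReLU(),
    nn.Linear(d_ff, d_model),
)

tokens = torch.randn(3, d_model)  # output of the Add & Norm step
print(ffn(tokens).shape)          # (3, 8): same shape, refined representations
```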
The output from the Encoders is then passed into the Decoder section, where it is processed to generate the final output.
Decoder

Now, like Lego blocks, several of these mechanisms are repeated or reused in the Decoder as well. The main difference: on the Encoder side, all of the inserted text is fully visible to the model, while on the Decoder side part of it is hidden by Masked Attention, so that the model learns to generate the next best word. The Decoder also has multiple inputs: one comes from the Encoder, which in essence provides the context or meaning of the given input to the Decoder, and the other is the generated text itself, which is fed back in a loop that keeps producing the next best word.
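To make this interplay concrete, here is a minimal sketch of the two attention steps inside one decoder layer, built with PyTorch’s nn.MultiheadAttention. The tensors and sizes are illustrative stand-ins, and a real decoder layer also includes Add & Norm steps and a feed-forward network.

```python
import torch
import torch.nn as nn

d_model, n_heads, seq_len = 8, 2, 4

self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

decoder_tokens = torch.randn(1, seq_len, d_model)   # text generated so far
encoder_output = torch.randn(1, seq_len, d_model)   # context from the Encoder

# Causal mask: position i may only attend to positions <= i, so the
# model cannot peek at the words it is supposed to predict.
causal_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

# 1. Masked self-attention over the decoder's own (partial) output.
x, _ = self_attn(decoder_tokens, decoder_tokens, decoder_tokens, attn_mask=causal_mask)

# 2. Cross-attention: queries come from the decoder, keys and values
#    from the encoder output, injecting the meaning of the input prompt.
x, _ = cross_attn(x, encoder_output, encoder_output)
print(x.shape)  # (1, 4, 8)
```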
Let’s look in detail at how this key part works and what differentiates the Decoder from the Encoder.
I. Masked Attention

Here, as shown in the gif above, each word is predicted one at a time. At each step, the entire available context (the segment in green) is used as input, and the next best word is generated as output. This cycle continues until the end-of-sequence (EOS) token is encountered.
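A toy greedy decoding loop that mirrors this cycle; `model`, the token IDs, and `MAX_LEN` are hypothetical stand-ins for a real trained Transformer and its vocabulary.

```python
import torch

EOS_ID, MAX_LEN = 2, 20  # hypothetical end-of-sequence token ID and length cap

def greedy_decode(model, encoder_output, start_id=1):
    # `model` is a stand-in: given the tokens generated so far and the
    # encoder output, it returns logits over the vocabulary per position.
    generated = [start_id]
    for _ in range(MAX_LEN):
        logits = model(torch.tensor([generated]), encoder_output)
        next_id = int(logits[0, -1].argmax())  # greedily pick the most likely next token
        generated.append(next_id)
        if next_id == EOS_ID:                  # stop once the EOS token appears
            break
    return generated
```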
Thus, this is how the generated output is produced by our beloved ChatGPT, a generative language model based on the transformer architecture.
Conclusion
Here we have tried to reveal the inner workings of LLMs: whether it is ChatGPT, Gemini, or any other LLM, the Transformer architecture is the basic building block behind it.
Hence, it is advisable to check the output of these generative models, as it is produced through complex calculations and can be heavily biased by the data on which the models were trained.
At Triveni Global Software Services, we specialize in leveraging the power of Generative AI and Transformer architecture to build innovative Gen AI applications. From custom solutions to cutting-edge advancements in AI, we offer tailored app development services that empower businesses to harness the full potential of AI-driven technologies. Whether you’re looking to integrate GPT-based models or create unique AI-powered experiences, we are here to turn your vision into reality with Gen AI App Development.