The AI Glossary for Data Analytics: LLMs, Context Window, ReAct, Few-Shot, and CoT
Every day, we encounter new keywords around AI & LLMs that make us more confused than the day before.
Hi there! Alejandro here 😊
Subscribe if you like to read about technical data & AI learnings and deep dives!
Enjoy the reading and let me know in the comments what you think about it 👨🏻💻
I really like demystifying concepts. Internet gurus or LinkedIn engagement seekers love to speak with buzzwords because it gives them status.
I am not talking about real technical people, such as data professionals speaking to an audience of data professionals. I am talking about those particular individuals who are “[HYPE] Experts” depending on the current trend.
Now, we are going to dedicate some time to making those characters look a little less fancy, because we will know the substance behind the technical speech.
If you like this topic, you can check out another article on it here:
Table of Contents
LLM (Large Language Model)
Context Window
Prompt Techniques (ReAct, CoT, Few-Shot)
🧠 LLM (Large Language Model)
Before we start, GPT stands for Generative Pretrained Transformer.
It’s amazing how few people know that. I personally learned that from a book on building agents 😂
We are talking about a very large model trained on the whole internet to predict and generate human-like language.
That’s the most important part: prediction. They predict tokens.
For simplicity, tokens are pieces of words.
Since tokens are generated one after the other, the LLMs you use the most are known as autoregressive: each step depends on what came before.
These models are good at generating text because they calculate probabilities for the next likely tokens.
e.g., given “The capital city of Spain is,” a model trained on the whole internet will predict “Madrid,” not “Paris.”
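To make the “predict the next token” idea concrete, here is a deliberately tiny Python sketch with made-up probabilities; a real LLM learns these numbers from data and works over a vocabulary of tens of thousands of tokens.

```python
# Toy illustration of autoregressive next-token prediction.
# The probabilities below are invented for the example; a real LLM
# learns them during training and considers a huge vocabulary.
next_token_probs = {
    "The capital city of Spain is": {"Madrid": 0.92, "Barcelona": 0.05, "Paris": 0.03},
}

def predict_next_token(prompt: str) -> str:
    """Pick the most likely next token given everything written so far."""
    candidates = next_token_probs[prompt]
    return max(candidates, key=candidates.get)

print(predict_next_token("The capital city of Spain is"))  # -> Madrid
```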
This is the reason why, when you ask how many “r”s are in “strawberry,” it guesses or fails spectacularly: it sees tokens rather than individual letters, and its ability relies on predicting the next token, not on counting.
Bear in mind that much of the information on the internet is wrong, inaccurate, or fake, and tools like ChatGPT were trained on such data.
These models generate text based on what already exists, so their creativity is biased to some extent.
Think about it: they are trained on internet data, and all the content creators are using it to generate more content for the internet, just to start the loop all over again.
🏞️ Context Window
When you read something like, e.g., “Google Gemini 2.5 Flash supports 1M tokens in the context window,” this refers to the short-term or contextual memory of the model: basically, how much it can take into account when generating a single response.
Every document, image, or text you add to a message in ChatGPT or Claude adds tokens to that window.
At some point, you will get an error saying the limit was exceeded, or the model will simply ignore the extra content.
Note that this applies to both input and output. The metrics behind this are the context window size and the max output tokens.
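If you want to check how close you are to that limit before sending anything, a rough sketch with OpenAI's tiktoken tokenizer looks like this (the 128K figure is an assumed placeholder, not the real limit of any specific model; check your model's documentation):

```python
# Count tokens before sending a prompt so you don't exceed the context
# window. Requires `pip install tiktoken`.
import tiktoken

CONTEXT_WINDOW = 128_000  # assumed placeholder; varies per model

encoding = tiktoken.get_encoding("cl100k_base")
prompt = "Summarize the attached sales report..."  # your full input here

n_tokens = len(encoding.encode(prompt))
print(f"Prompt uses {n_tokens} tokens")

if n_tokens > CONTEXT_WINDOW:
    print("Too long: trim the input or split it into chunks")
```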
Source: OpenAI Model Comparison
⌨️ Prompt Techniques
Gurus tried to turn prompt engineering into a job position, but it went nowhere.
It is one of many skills, such as knowing programming languages, Excel, or how to write a data pipeline.
These techniques are nothing but contextual instructions to make the model behave or execute a task in a particular way.
At the end of the day, it’s about precision and avoiding vague or contextually poor instructions. The more precise you are, the better the outputs.
The same as when working with… humans ;)
I am not particularly talking about the classical role-playing approach of “act as [x] and prepare a report of the [y] industry”; this goes beyond that.
There are many of these techniques; we are covering the most common ones below:
💥 ReAct (Reason + Act)
You provide this reasoning workflow pattern to the LLM:
Reason: The LLM thinks step by step about what to do.
Act: The LLM then takes an action—like running code, looking up a document, or calling a function.
Observe: It analyzes the output of that process to validate if it is correct.
For example:
Question: What was the month with the highest sales in the following data?
Month Sales
Jan 100
Feb 150
Mar 120
Reasoning: Let's look at each month's sales and find the maximum value.
Action: Compare Jan (100), Feb (150), Mar (120).
Reasoning: Feb has the highest sales (150).
Action: Return "February".
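To see the same loop in code, here is a minimal, hand-wired sketch. In a real agent, the “Reason” step and the choice of tool would come from an LLM call, not from hard-coded strings; the function name here is made up for illustration.

```python
# Minimal sketch of the ReAct loop: Reason -> Act -> Observe.
sales = {"Jan": 100, "Feb": 150, "Mar": 120}

def tool_max_sales(data: dict) -> str:
    """The 'Act' step: a tool the agent calls instead of guessing."""
    return max(data, key=data.get)

# Reason: decide that the right move is to compare the monthly values.
thought = "I should compare each month's sales and return the largest."

# Act: call the tool.
best_month = tool_max_sales(sales)

# Observe: check the tool output before giving the final answer.
assert sales[best_month] == max(sales.values())
print(f"Thought: {thought}\nAnswer: {best_month}")  # -> Feb
```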
If you use Cursor or any other AI-powered IDE, you will constantly see this approach.
Source: Prompting Guide - ReAct
⛓ CoT (Chain of Thought)
You ask the LLM to follow a step-by-step flow so it breaks the problem down into a sequence of intermediate steps.
For example:
Question: Given the sales numbers [100, 150, 120], which value is the highest? Please explain your reasoning step by step.
Step 1: First, look at all the sales values: 100, 150, and 120.
Step 2: Compare 100 and 150. 150 is greater.
Step 3: Compare 150 and 120. 150 is still greater.
Step 4: Therefore, the highest sales number is 150.
This is perfect for multi-step reasoning or anything relating to coding or mathematical thinking.
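As a sketch, this is what that prompt looks like when sent from Python with the OpenAI client; the model name is just an assumption for illustration, so swap in whatever you actually use (requires `pip install openai` and an API key).

```python
# Chain of Thought is mostly about the instruction: explicitly ask the
# model to show intermediate steps before the final answer.
from openai import OpenAI

cot_prompt = (
    "Given the sales numbers [100, 150, 120], which value is the highest?\n"
    "Think step by step and explain your reasoning before the final answer."
)

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name for illustration
    messages=[{"role": "user", "content": cot_prompt}],
)
print(response.choices[0].message.content)
```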
Source: Prompting Guide - CoT
👨🏻🏫 Few-Shot
Instead of training your own model, which can cost much more, you provide pairs of inputs and outputs that show the model how to perform the task.
For example:
Example 1:
Data: [80, 90, 120]
Q: What is the maximum value?
A: 120
Example 2:
Data: [200, 150, 175]
Q: What is the maximum value?
A: 200
Now, try this one:
Data: [100, 150, 120]
Q: What is the maximum value?
A:
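Assembling that prompt programmatically is trivial. Here is a minimal sketch that builds the few-shot block from (input, answer) pairs and leaves the last answer blank for the model to fill in:

```python
# Build a few-shot prompt from example pairs plus one unanswered query.
examples = [
    ([80, 90, 120], 120),
    ([200, 150, 175], 200),
]
query = [100, 150, 120]

blocks = [
    f"Data: {data}\nQ: What is the maximum value?\nA: {answer}"
    for data, answer in examples
]
blocks.append(f"Data: {query}\nQ: What is the maximum value?\nA:")

few_shot_prompt = "\n\n".join(blocks)
print(few_shot_prompt)  # send this to any chat model
```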
It works fantastically for enforcing a format or tweaking a writing style.
Source: Prompting Guide - Few Shot
📝 TL;DR
🧠 LLMs are giant language prediction machines.
🏞️ Context window is the model’s short-term memory.
⌨️ Prompt techniques are about giving the right instructions to get better outputs.
💥 ReAct: The AI thinks, then acts (reason + act, not just chat).
⛓ CoT: Chain of Thought—model “shows its work,” step by step.
👨🏻🏫 Few-Shot: Guide the model with a handful of examples.
If you enjoyed the content, hit the like ❤️ button, share, comment, repost, and all those nice things people do when they like stuff these days. Glad to know you made it to this part!
Hi, I am Alejandro Aboy. I am currently working as a Data Engineer. I started in digital marketing at 19, gaining experience in website tracking, advertising, and analytics, and I also founded my own agency. In 2021, I found my passion for data engineering, so I shifted my career focus despite lacking a CS degree. I'm now pursuing this path, leveraging my diverse experience and willingness to learn.