Lingua-e

Developer English guide

AI Vocabulary Every Developer Should Know

AI has introduced a wave of new terminology that now appears in job descriptions, technical interviews, architecture discussions, and everyday team conversations. This guide explains the most important terms clearly, with real examples from software development work.

Understanding models

The foundational terms you need to discuss what AI models are, how they are built, and what makes them different from one another.

LLM (Large Language Model)

A type of AI model trained on large amounts of text data to understand and generate language. GPT-4, Claude, and Gemini are all LLMs.

We're integrating an LLM into the support tool so it can draft replies from past ticket data.

token

The basic unit an LLM processes. A token is roughly a word or word fragment. Models have limits on how many tokens they can handle at once.

The response was cut off because we hit the token limit. We need to shorten the prompt.
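Exact token counts depend on each model's tokenizer, but a common rule of thumb for English text is roughly four characters per token. A quick sketch of that heuristic (the function name is ours, not a library API):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text.

    Real counts depend on the model's tokenizer; use the provider's
    tokenizer library when you need exact numbers.
    """
    return max(1, len(text) // 4)

prompt = "Summarize the following support ticket in two sentences."
rough_count = estimate_tokens(prompt)
```

Useful for a quick sanity check before a request, but never for billing or hard limits.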

context window

The maximum amount of text (measured in tokens) a model can read and remember during a single interaction. Text outside the window is not seen by the model.

The document is too long to fit in the context window. We'll need to chunk it before sending.
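Chunking like this is usually done on natural boundaries such as paragraphs. A minimal sketch, using the same rough four-characters-per-token heuristic from above:

```python
def chunk_text(text: str, max_tokens: int = 500, chars_per_token: int = 4) -> list[str]:
    """Split text into chunks that should fit within a token budget.

    Splits on paragraph boundaries so chunks stay readable. A single
    paragraph longer than the budget becomes its own oversized chunk.
    """
    budget = max_tokens * chars_per_token
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        if current and len(current) + len(paragraph) > budget:
            chunks.append(current.strip())
            current = ""
        current += paragraph + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks

document = "\n\n".join(["word " * 100] * 5)
chunks = chunk_text(document, max_tokens=150)
```

Each chunk can then be sent to the model separately, or retrieved individually as in RAG (covered below).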

parameter

A learned numerical value inside a model. More parameters generally mean more capacity to learn, though not always better performance. Model sizes like '7B' or '70B' refer to billions of parameters.

We're experimenting with a smaller 7B parameter model for the on-device use case — latency matters more than accuracy here.

training

The process of teaching a model by exposing it to large amounts of data and adjusting its internal parameters to minimize errors.

Pre-training a model from scratch is expensive. Most teams start from an existing checkpoint instead.

fine-tuning

Further training a pre-trained model on a smaller, specific dataset to improve its performance on a particular task or domain.

We fine-tuned the base model on three months of customer support tickets. The responses are much more on-brand now.

foundation model

A large model trained on broad, general data that can be adapted for many downstream tasks. Most commercial LLMs are foundation models.

We're building on top of a foundation model rather than training from scratch — that would take months and millions of dollars.

multimodal

Describes a model that can process more than one type of input or output, such as text, images, audio, or video.

We switched to a multimodal model so users can upload a screenshot and ask questions about it.

Prompting and generation

Vocabulary for working with a model's input and output: how to instruct it, what can go wrong, and how to control its behavior.

prompt

The input you send to a model — a question, instruction, or piece of text — to get a response.

The output was way off. I think the prompt needs to be more specific about the expected format.

system prompt

Instructions given to a model before the conversation starts, used to set its behavior, tone, or constraints. Users usually do not see it.

We set a system prompt that tells the model to always respond in the user's language and never discuss competitors.
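Most chat APIs accept the system prompt as the first entry in a list of role-tagged messages. A sketch of that common shape (field names vary by provider):

```python
# The message-list shape used by most chat APIs (exact field names
# vary by provider). The system message sets behavior; users only
# ever see their own turns and the assistant's replies.
messages = [
    {
        "role": "system",
        "content": (
            "Always respond in the user's language. "
            "Never discuss competitors."
        ),
    },
    {"role": "user", "content": "¿Cómo restablezco mi contraseña?"},
]

system_turns = [m for m in messages if m["role"] == "system"]
```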

temperature

A sampling setting, typically between 0 and 1 (some APIs allow values up to 2), that controls how random or creative a model's output is. Low temperature produces more predictable output; high temperature produces more varied output.

For the code generation feature we're using temperature 0 — we want consistent, deterministic output.

hallucination

When a model generates text that is confident and fluent but factually incorrect or completely made up.

The model hallucinated a function that doesn't exist in the SDK. Always verify AI-generated code before shipping.

few-shot prompting

A technique where you include a few examples of the desired input-output format in the prompt to guide the model's response.

I used few-shot prompting with three example summaries and the output quality improved significantly.
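The mechanics are just string assembly: worked examples go in front of the real input, all in the same format. A sketch with a made-up ticket-classification task:

```python
def build_few_shot_prompt(examples: list[tuple[str, str]], new_input: str) -> str:
    """Assemble a prompt with worked examples before the real input."""
    parts = ["Classify each support ticket as 'bug', 'billing', or 'question'.\n"]
    for ticket, label in examples:
        parts.append(f"Ticket: {ticket}\nCategory: {label}\n")
    # End with the unanswered case so the model completes the pattern.
    parts.append(f"Ticket: {new_input}\nCategory:")
    return "\n".join(parts)

examples = [
    ("The app crashes when I open settings.", "bug"),
    ("Why was I charged twice this month?", "billing"),
]
prompt = build_few_shot_prompt(examples, "Can I export my data as CSV?")
```

The model tends to continue the pattern, answering in the same format as the examples.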

zero-shot

Asking a model to perform a task with no examples in the prompt, relying entirely on its pre-trained knowledge.

Surprisingly, the zero-shot classification worked well enough that we didn't need to fine-tune.

inference

The process of running a trained model to generate output from new input. This is what happens every time you call the API.

Inference latency is our biggest bottleneck right now. We're caching common responses to reduce API calls.

grounding

Connecting a model's output to verified, real-world data to reduce hallucinations and improve accuracy.

We added grounding by pulling relevant docs from our database before every request. Hallucinations dropped noticeably.

Building with AI

The technical concepts that come up when integrating AI capabilities into a product or workflow.

RAG (Retrieval-Augmented Generation)

A technique that combines a search step with generation: relevant documents are retrieved from a database and added to the prompt so the model can answer based on real data.

We implemented RAG so the assistant can answer questions about our internal documentation without retraining the model.
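The two steps can be sketched in a few lines. This toy version ranks documents by word overlap; production systems rank by embedding similarity instead (see the next entries):

```python
def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Toy retrieval: rank documents by word overlap with the query."""
    query_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(query_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_rag_prompt(query: str, documents: list[str]) -> str:
    """Add retrieved context to the prompt so the model answers from real data."""
    context = retrieve(query, documents)
    return (
        "Answer using only the context below.\n\n"
        + "\n".join(f"- {doc}" for doc in context)
        + f"\n\nQuestion: {query}"
    )

docs = [
    "Deploys run automatically when a tag is pushed to main.",
    "The on-call rotation changes every Monday at 09:00 UTC.",
    "Password resets are handled by the auth service.",
]
prompt = build_rag_prompt("How do deploys get triggered?", docs)
```

The model sees only the retrieved snippets plus the question, so its answer stays anchored to your data.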

embedding

A numerical representation of text (or images or audio) as a list of numbers, called a vector. Similar content produces similar embeddings, making them useful for search and comparison.

We generate embeddings for every support ticket and store them so we can find semantically similar past issues.
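"Similar embeddings" is usually measured with cosine similarity. The toy vectors below are hand-picked for illustration; real models produce hundreds or thousands of dimensions per text:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Similarity between two embeddings: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made 3-dimensional "embeddings" for illustration only.
login_bug = [0.9, 0.1, 0.2]
signin_issue = [0.85, 0.15, 0.25]
invoice_question = [0.1, 0.9, 0.3]

similar = cosine_similarity(login_bug, signin_issue)        # close to 1
dissimilar = cosine_similarity(login_bug, invoice_question)  # much lower
```

Two tickets about the same problem score close to 1 even when they share no exact keywords, which is what makes semantic search work.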

vector database

A database optimized for storing and searching embeddings by similarity rather than exact matches.

We're using a vector database to power the semantic search — users can find docs even if they don't use the exact keyword.
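The core interface is small enough to sketch. This brute-force, in-memory stand-in shows the idea; real vector databases use approximate-nearest-neighbor indexes to search millions of embeddings quickly:

```python
import math

class TinyVectorStore:
    """A brute-force, in-memory stand-in for a vector database."""

    def __init__(self):
        self._items: list[tuple[str, list[float]]] = []

    def add(self, text: str, embedding: list[float]) -> None:
        self._items.append((text, embedding))

    def search(self, query_embedding: list[float], top_k: int = 1) -> list[str]:
        """Return the texts whose embeddings are most similar to the query."""
        def similarity(vec: list[float]) -> float:
            dot = sum(x * y for x, y in zip(query_embedding, vec))
            norms = (
                math.sqrt(sum(x * x for x in query_embedding))
                * math.sqrt(sum(x * x for x in vec))
            )
            return dot / norms

        ranked = sorted(self._items, key=lambda item: similarity(item[1]), reverse=True)
        return [text for text, _ in ranked[:top_k]]

store = TinyVectorStore()
store.add("How to reset your password", [0.9, 0.1])
store.add("Understanding your invoice", [0.1, 0.9])
results = store.search([0.8, 0.2])  # embedding of a password-related query
```

The password article ranks first even though the query never contained the word "password" — only its embedding did.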

agent

An AI system that can take a sequence of actions — such as calling tools, browsing the web, or writing files — to complete a goal, rather than just generating a single response.

We built an agent that reads the error log, searches the codebase, and opens a draft PR with a fix.
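At its core, an agent is a loop: decide on an action, execute it, feed the result back, repeat until done. In this sketch a hard-coded decide() stands in for the model, and both tools return canned strings:

```python
# Toy tools with canned results; real tools would hit logs, search, etc.
def read_error_log() -> str:
    return "NullPointerException in PaymentService.charge"

def search_codebase(query: str) -> str:
    return "PaymentService.charge is defined in payments/service.py"

def decide(history: list[str]) -> tuple[str, str]:
    """Pick the next action. In a real agent, the model makes this choice
    based on the conversation so far and the available tools."""
    if not history:
        return ("read_error_log", "")
    if len(history) == 1:
        return ("search_codebase", history[0])
    return ("finish", "Draft PR: add a null check in PaymentService.charge")

history: list[str] = []
while True:
    action, argument = decide(history)
    if action == "finish":
        result = argument
        break
    elif action == "read_error_log":
        history.append(read_error_log())
    elif action == "search_codebase":
        history.append(search_codebase(argument))
```

The loop structure is the point: each tool result goes back into the history that drives the next decision.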

tool use (function calling)

A model capability that allows it to call external functions or APIs during generation, such as looking up data or running code.

We enabled tool use so the assistant can query our database directly instead of relying on information from the prompt.
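In practice the model does not execute anything itself: it emits the name and arguments of the tool it wants, and your code runs the call and sends the result back. A sketch of that dispatch (the JSON shape varies by provider):

```python
import json

# Tool implementations the model is allowed to call, with canned data.
def get_open_tickets(status: str) -> list[str]:
    tickets = {"open": ["#101 login fails", "#102 slow export"], "closed": ["#95 typo"]}
    return tickets.get(status, [])

TOOLS = {"get_open_tickets": get_open_tickets}

# A tool call as the model might emit it (exact shape varies by provider).
model_output = json.dumps({"tool": "get_open_tickets", "arguments": {"status": "open"}})

call = json.loads(model_output)
result = TOOLS[call["tool"]](**call["arguments"])
# `result` is sent back to the model so it can write its final answer.
```

Keeping the dispatch table explicit also acts as an allowlist: the model can only invoke tools you registered.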

guardrails

Rules, filters, or checks applied to model input or output to prevent harmful, off-topic, or policy-violating responses.

We added guardrails to block any response that mentions a competitor's product by name.
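A simple output-side guardrail is just a check between the model and the user. The patterns below are illustrative (the competitor name is made up):

```python
import re

BLOCKED_PATTERNS = [
    re.compile(r"\bcompetitorx\b", re.IGNORECASE),  # hypothetical competitor name
    re.compile(r"\b\d{16}\b"),                      # looks like a card number
]

FALLBACK = "I can't help with that, but I'm happy to answer other questions."

def apply_guardrails(response: str) -> str:
    """Output-side guardrail: replace responses that match a blocked pattern."""
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(response):
            return FALLBACK
    return response

safe = apply_guardrails("You can export reports from the dashboard.")
blocked = apply_guardrails("CompetitorX has a similar feature.")
```

Production guardrails are usually layered: pattern checks like this, plus classifier models and input-side filtering.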

latency

The time it takes for a model to produce a response after receiving a request. A key performance metric for AI-powered features.

Latency is too high for a real-time chat feature. We're looking at smaller models and streaming responses.
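Measuring latency per call is the first step toward fixing it. A minimal timing wrapper, with a sleep standing in for the real API round trip:

```python
import time

def timed_call(fn, *args):
    """Run a model call and measure how long it takes, in milliseconds."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms

def fake_model_call(prompt: str) -> str:
    time.sleep(0.05)  # stand-in for a real API round trip
    return "response"

result, latency_ms = timed_call(fake_model_call, "hello")
```

Logging these numbers per request makes it easy to see whether caching, streaming, or a smaller model is actually helping.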

agentic

Describes AI systems or workflows where the model operates autonomously over multiple steps, making decisions and taking actions without a human approving each one.

The agentic pipeline handles the whole release notes workflow: it reads the diff, groups changes, and drafts the document.

Ready to practice your English at work?

Lingua-e has interactive exercises built around real developer conversations: standups, code reviews, retrospectives, and more. Practice until it comes naturally.

Try Lingua-e for free