Q: What does hallucination mean in AI?

In AI, hallucination means the model generates text that sounds confident and fluent but is factually incorrect or completely made up. It is a key risk when using LLMs for tasks that require accuracy. For example, a model might invent a function that does not exist in an SDK. Always verify AI-generated code before shipping.

Q: What is a context window in an LLM?

A context window is the maximum amount of text, measured in tokens, that a model can take into account in a single request. It typically covers both the prompt and the generated response. Text outside the context window is not seen by the model. If a document is too long to fit in the context window, developers chunk it into smaller pieces before sending it to the model.

Q: What is the difference between fine-tuning and RAG?

Fine-tuning means further training a pre-trained model on a specific dataset to change its behavior or style. RAG means retrieving external data at inference time and adding it to the prompt. Fine-tuning changes the model's weights permanently; RAG provides context on each request without modifying the model. RAG is usually cheaper and easier to update; fine-tuning is better suited to changing tone or format.

English Guide for Developers

AI Vocabulary Every Developer Should Know

Q: What is an LLM?

LLM stands for Large Language Model. It is a type of AI model trained on large amounts of text data to understand and generate language. GPT-4, Claude, and Gemini are all examples of LLMs. Developers use LLMs to build features like chatbots, code assistants, and document summarizers.

Q: What is RAG in the context of AI development?

RAG stands for Retrieval-Augmented Generation. It is a technique that combines a search step with generation: relevant documents are retrieved from a database and added to the prompt so the model can answer based on real data. RAG can reduce hallucinations and avoids the need to retrain the model on new information.

April 21, 2026

AI has introduced a wave of new terminology that now shows up in job descriptions, technical interviews, architecture discussions, and everyday team conversations. This guide explains the most important terms clearly, with real examples from software development work.

Key takeaways

LLM (Large Language Model) is the category that includes GPT-4, Claude, and Gemini.
Hallucination means the model generates content that sounds confident but is factually incorrect. Always verify AI-generated code.
RAG (Retrieval-Augmented Generation) reduces hallucinations by grounding answers in real documents.
The context window is the maximum amount of text a model can read at once. Text outside it is invisible to the model.
Fine-tuning changes the model permanently; RAG adds context at run time without modifying the model.

Understanding models

The fundamental terms you need to talk about what AI models are, how they are built, and what sets them apart from one another.

LLM (Large Language Model)

A type of AI model trained on large amounts of text data to understand and generate language. GPT-4, Claude, and Gemini are all LLMs.

“We're integrating an LLM into the support tool so it can draft replies from past ticket data.”

token

The basic unit an LLM processes. A token is roughly a word or word fragment. Models have limits on how many tokens they can handle at once.

“The response was cut off because we hit the token limit. We need to shorten the prompt.”

context window

The maximum amount of text (measured in tokens) a model can take into account in a single request, typically covering both the prompt and the generated response. Text outside the window is not seen by the model.

“The document is too long to fit in the context window. We'll need to chunk it before sending.”
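
The chunking mentioned in the example sentence can be sketched as follows. This is a minimal sketch, assuming that whitespace-separated words are a rough stand-in for tokens. A real tokenizer counts differently, and `chunk_text` and `max_tokens` are illustrative names, not part of any library.

```python
def chunk_text(text: str, max_tokens: int) -> list[str]:
    """Split text into chunks of at most max_tokens words.

    Assumption: one whitespace-separated word counts as one token,
    which only approximates how a real tokenizer counts.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]


document = "one two three four five six seven"
chunks = chunk_text(document, max_tokens=3)
print(chunks)
```

In practice, teams often split on paragraph or sentence boundaries instead of a fixed word count, so that each chunk stays readable on its own.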

parameter

A learned numerical value inside a model. More parameters generally means more capacity to learn, though not always better performance. Model sizes like '7B' or '70B' refer to billions of parameters.

“We're experimenting with a smaller 7B parameter model for the on-device use case — latency matters more than accuracy here.”
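
To see why parameter count matters for an on-device use case, it may help to estimate how much memory the weights alone need. The sketch below assumes 16-bit weights (2 bytes per parameter) and ignores everything else a running model needs, so treat the numbers as a lower bound. `weight_memory_gb` is an illustrative helper, not a library function.

```python
def weight_memory_gb(num_parameters: float, bytes_per_parameter: int = 2) -> float:
    """Estimate memory for the weights alone, in gigabytes (10**9 bytes).

    Assumption: every parameter is stored at the same precision.
    The default of 2 bytes corresponds to 16-bit weights.
    """
    return num_parameters * bytes_per_parameter / 1e9


for label, count in [("7B", 7e9), ("70B", 70e9)]:
    print(f"{label}: about {weight_memory_gb(count):.0f} GB at 16-bit precision")
```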

training

The process of teaching a model by exposing it to large amounts of data and adjusting its internal parameters to minimize errors.

“Pre-training a model from scratch is expensive. Most teams start from an existing checkpoint instead.”

fine-tuning

Further training a pre-trained model on a smaller, specific dataset to improve its performance on a particular task or domain.

“We fine-tuned the base model on three months of customer support tickets. The responses are much more on-brand now.”

foundation model

A large model trained on broad, general data that can be adapted for many downstream tasks. Most commercial LLMs are foundation models.

“We're building on top of a foundation model rather than training from scratch — that would take months and millions of dollars.”

multimodal

Describes a model that can process more than one type of input or output, such as text, images, audio, or video.

“We switched to a multimodal model so users can upload a screenshot and ask questions about it.”

Prompting and generation

Vocabulary for working with a model's input and output: how to instruct it, what can go wrong, and how to control its behavior.

prompt

The input you send to a model — a question, instruction, or piece of text — to get a response.

“The output was way off. I think the prompt needs to be more specific about the expected format.”

system prompt

Instructions given to a model before the conversation starts, used to set its behavior, tone, or constraints. Users usually do not see it.

“We set a system prompt that tells the model to always respond in the user's language and never discuss competitors.”

temperature

A sampling setting, commonly ranging from 0 to 1 (up to 2 in some APIs), that controls how random or creative a model's output is. Low temperature produces more predictable output; high temperature produces more varied output.

“For the code generation feature we're using temperature 0 because we want consistent, near-deterministic output.”
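
The effect of temperature can be illustrated with a small calculation. In the usual formulation, the model's raw scores (logits) are divided by the temperature before softmax turns them into probabilities. The sketch below uses made-up logits for three candidate tokens; how a given provider implements sampling internally is not something this example claims to show.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after scaling them by 1/temperature."""
    if temperature <= 0:
        # Temperature 0 is usually treated as "always pick the top token"
        # rather than computed with this formula.
        raise ValueError("temperature must be greater than 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
low = softmax_with_temperature(logits, 0.2)
high = softmax_with_temperature(logits, 1.5)
print("temperature 0.2:", [round(p, 3) for p in low])
print("temperature 1.5:", [round(p, 3) for p in high])
```

At the lower temperature almost all of the probability goes to the top-scoring token; at the higher temperature the less likely tokens get a larger share, which is why the output becomes more varied.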

hallucination

When a model generates text that is confident and fluent but factually incorrect or completely made up.

“The model hallucinated a function that doesn't exist in the SDK. Always verify AI-generated code before shipping.”
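
One cheap first check for this kind of hallucination is to confirm that a function the model mentions actually exists. The sketch below uses Python's standard `json` module as a stand-in for an SDK; `function_exists` is an illustrative helper and `load_fast` is a deliberately made-up name. Passing this check only shows that the name exists, not that the generated code uses it correctly.

```python
import importlib


def function_exists(module_name: str, function_name: str) -> bool:
    """Return True if the module imports and exposes a callable with that name."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return callable(getattr(module, function_name, None))


print(function_exists("json", "loads"))      # a real function
print(function_exists("json", "load_fast"))  # made-up name for illustration
```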

few-shot prompting

A technique where you include a few examples of the desired input-output format in the prompt to guide the model's response.

“I used few-shot prompting with three example summaries and the output quality improved significantly.”
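
A few-shot prompt like the one in the example sentence can be assembled with plain string handling. This is a minimal sketch: the `Input:`/`Output:` labels, the example texts, and `build_few_shot_prompt` are all assumptions made for illustration, and any consistent format would work.

```python
def build_few_shot_prompt(
    instruction: str,
    examples: list[tuple[str, str]],
    new_input: str,
) -> str:
    """Build a prompt that shows worked examples before the real input."""
    parts = [instruction, ""]
    for source, expected in examples:
        parts.append(f"Input: {source}")
        parts.append(f"Output: {expected}")
        parts.append("")
    parts.append(f"Input: {new_input}")
    parts.append("Output:")  # left open for the model to complete
    return "\n".join(parts)


# Made-up examples for illustration.
examples = [
    ("Fixed a crash when the cart is empty.", "Bug fix: empty cart crash"),
    ("Added dark mode to the settings page.", "Feature: dark mode"),
    ("Upgraded the HTTP client library.", "Maintenance: dependency upgrade"),
]
prompt = build_few_shot_prompt(
    "Summarize each change in a short label.",
    examples,
    "Search results now load twice as fast.",
)
print(prompt)
```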

zero-shot

Asking a model to perform a task with no examples in the prompt, relying entirely on its pre-trained knowledge.

“Surprisingly, the zero-shot classification worked well enough that we didn't need to fine-tune.”

inference

The process of running a trained model to generate output from new input. This is what happens every time you call the API.

“Inference latency is our biggest bottleneck right now. We're caching common responses to reduce API calls.”
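
The caching idea in the example sentence can be sketched with Python's built-in `functools.lru_cache`. Here `call_model` is a hypothetical stand-in for a slow, billable API call, and the cache only helps when the exact same prompt is sent again.

```python
from functools import lru_cache

calls: list[str] = []  # records every prompt that reaches the "model"


def call_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model API call."""
    calls.append(prompt)
    return f"response to: {prompt}"


@lru_cache(maxsize=1024)
def cached_completion(prompt: str) -> str:
    """Return a cached response for prompts that were already answered."""
    return call_model(prompt)


cached_completion("How do I reset my password?")
cached_completion("How do I reset my password?")  # served from the cache
print(len(calls))
```

Exact-match caching like this fits best when responses are expected to be the same for the same prompt, for example at low temperature.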

grounding

Connecting a model's output to verified, real-world data to reduce hallucinations and improve accuracy.

“We added grounding by pulling relevant docs from our database before every request. Hallucinations dropped noticeably.”

Building with AI

The technical concepts that come up when integrating AI capabilities into a product or workflow.

RAG (Retrieval-Augmented Generation)

A technique that combines a search step with generation: relevant documents are retrieved from a database and added to the prompt so the model can answer based on real data.

“We implemented RAG so the assistant can answer questions about our internal documentation without retraining the model.”
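
The two steps in the definition, retrieve then add to the prompt, can be sketched in a few lines. This is a toy version: it assumes simple word overlap as the search step, where a production system would more likely use embeddings and a vector database. The documents and the names `retrieve` and `build_rag_prompt` are made up for illustration.

```python
import re


def words(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Return up to top_k documents that share the most words with the question."""
    question_words = words(question)
    scored = [(len(question_words & words(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]


def build_rag_prompt(question: str, documents: list[str]) -> str:
    """Put the retrieved documents into the prompt ahead of the question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


docs = [
    "Deploys run every weekday at 10:00 from the main branch.",
    "The staging database is refreshed every Sunday night.",
    "Expense reports are due on the last Friday of the month.",
]
prompt = build_rag_prompt("When do deploys run?", docs)
print(prompt)
```

The model itself is never retrained here: updating the answers only requires updating the documents.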

embedding

A numerical representation of text (or images or audio) as a list of numbers, called a vector. Similar content produces similar embeddings, making them useful for search and comparison.

“We generate embeddings for every support ticket and store them so we can find semantically similar past issues.”
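
"Similar content produces similar embeddings" is usually measured with cosine similarity. The sketch below uses hand-written three-dimensional vectors purely for illustration; real embeddings come from a model and typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up toy vectors, not the output of a real embedding model.
tickets = {
    "Login page returns a 500 error": [0.9, 0.1, 0.0],
    "Cannot sign in, server error": [0.8, 0.2, 0.1],
    "Invoice PDF shows the wrong currency": [0.0, 0.2, 0.9],
}
query = tickets["Login page returns a 500 error"]
for text, vector in tickets.items():
    print(f"{cosine_similarity(query, vector):.2f}  {text}")
```

The two sign-in tickets score close to 1 against each other, while the invoice ticket scores close to 0, even though the sign-in tickets share almost no exact words.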

vector database

A database optimized for storing and searching embeddings by similarity rather than exact matches.

“We're using a vector database to power the semantic search — users can find docs even if they don't use the exact keyword.”

agent

An AI system that can take a sequence of actions — such as calling tools, browsing the web, or writing files — to complete a goal, rather than just generating a single response.

“We built an agent that reads the error log, searches the codebase, and opens a draft PR with a fix.”

tool use (function calling)

A model capability that lets it request calls to external functions or APIs during generation, such as looking up data or running code. The application executes the call and passes the result back to the model.

“We enabled tool use so the assistant can query our database directly instead of relying on information from the prompt.”
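
On the application side, tool use usually comes down to reading a structured request from the model and running the matching function. The sketch below assumes the model returns JSON with `name` and `arguments` fields. That shape is an assumption for illustration, since each provider defines its own format, and `get_order_status` is a hypothetical tool.

```python
import json


def get_order_status(order_id: str) -> str:
    """Hypothetical tool: a real one would query a database."""
    fake_orders = {"A100": "shipped", "A101": "processing"}
    return fake_orders.get(order_id, "unknown")


# Only functions listed here can be called on the model's behalf.
TOOLS = {"get_order_status": get_order_status}


def run_tool_call(tool_call_json: str) -> str:
    """Parse a tool request and run the matching registered function."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"model requested an unknown tool: {name}")
    return TOOLS[name](**call["arguments"])


# Assumed shape of a tool request coming back from the model.
model_output = '{"name": "get_order_status", "arguments": {"order_id": "A100"}}'
result = run_tool_call(model_output)
print(result)
```

Keeping an explicit registry of allowed tools means a hallucinated tool name fails loudly instead of running arbitrary code.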

guardrails

Rules, filters, or checks applied to model input or output to prevent harmful, off-topic, or policy-violating responses.

“We added guardrails to block any response that mentions a competitor's product by name.”
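
A guardrail like the one in the example sentence can be as simple as a check on the model's output before it reaches the user. This is a minimal sketch: the competitor names and the fallback message are made up, and real guardrails often combine several checks on both input and output.

```python
import re

BLOCKED_NAMES = ["Acme Chat", "Globex"]  # hypothetical competitor names
FALLBACK = "Sorry, I can't help with that."


def apply_guardrail(response: str) -> str:
    """Return the response, or a fallback if it mentions a blocked name."""
    for name in BLOCKED_NAMES:
        pattern = rf"\b{re.escape(name)}\b"
        if re.search(pattern, response, flags=re.IGNORECASE):
            return FALLBACK
    return response


print(apply_guardrail("Our export feature supports CSV and JSON."))
print(apply_guardrail("You could try Globex instead."))
```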

latency

The time it takes for a model to produce a response after receiving a request. A key performance metric for AI-powered features.

“Latency is too high for a real-time chat feature. We're looking at smaller models and streaming responses.”

agentic

Describes AI systems or workflows where the model operates autonomously over multiple steps, making decisions and taking actions without a human approving each one.

“The agentic pipeline handles the whole release notes workflow: it reads the diff, groups changes, and drafts the document.”

Written by

Roxana Lafuente

Founder of Lingua-e

Roxana Lafuente is a software engineer with more than 8 years of experience. Early in her career, even though she had already passed the First Certificate in English, she froze every time she had to speak in the daily standup. It was a problem nobody was solving. After more than 2,000 standups, she discovered what really builds fluency: practicing situations that resemble your real work. She created Lingua-e so that other developers would not have to take the long road to feeling confident working in an international development environment.