Lingua-e

Developer English guide

AI Vocabulary Every Developer Should Know

AI has introduced a wave of new terminology that now appears in job descriptions, technical interviews, architecture discussions, and everyday team conversations. This guide explains the most important terms clearly, with real examples from software development work.

Understanding models

The foundational terms you need to discuss what AI models are, how they are built, and what makes them different from one another.

LLM (Large Language Model)

A type of AI model trained on large amounts of text data to understand and generate language. GPT-4, Claude, and Gemini are all LLMs.

We're integrating an LLM into the support tool so it can draft replies from past ticket data.

token

The basic unit an LLM processes. A token is roughly a word or word fragment. Models have limits on how many tokens they can handle at once.

The response was cut off because we hit the token limit. We need to shorten the prompt.
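Exact token counts depend on each model's tokenizer, but a common rule of thumb for English text is roughly four characters per token. A quick sketch of that heuristic (the function name is ours, not a library API):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text.

    Real counts depend on the model's tokenizer; use the provider's
    tokenizer library when you need exact numbers.
    """
    return max(1, len(text) // 4)

prompt = "Summarize the following support ticket in two sentences."
rough_count = estimate_tokens(prompt)
```

Useful for a quick sanity check before a request, but never for billing or hard limits.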

context window

The maximum amount of text (measured in tokens) a model can read and remember during a single interaction. Text outside the window is not seen by the model.

The document is too long to fit in the context window. We'll need to chunk it before sending.
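Chunking like this is usually done on natural boundaries such as paragraphs. A minimal sketch, using the same rough four-characters-per-token heuristic from above:

```python
def chunk_text(text: str, max_tokens: int = 500, chars_per_token: int = 4) -> list[str]:
    """Split text into chunks that should fit within a token budget.

    Splits on paragraph boundaries so chunks stay readable. A single
    paragraph longer than the budget becomes its own oversized chunk.
    """
    budget = max_tokens * chars_per_token
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        if current and len(current) + len(paragraph) > budget:
            chunks.append(current.strip())
            current = ""
        current += paragraph + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks

document = "\n\n".join(["word " * 100] * 5)
chunks = chunk_text(document, max_tokens=150)
```

Each chunk can then be sent to the model separately, or retrieved individually as in RAG (covered below).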

parameter

A learned numerical value inside a model. More parameters generally mean more capacity to learn, though not always better performance. Model sizes like '7B' or '70B' refer to billions of parameters.

We're experimenting with a smaller 7B parameter model for the on-device use case — latency matters more than accuracy here.

training

The process of teaching a model by exposing it to large amounts of data and adjusting its internal parameters to minimize errors.

Pre-training a model from scratch is expensive. Most teams start from an existing checkpoint instead.

fine-tuning

Further training a pre-trained model on a smaller, specific dataset to improve its performance on a particular task or domain.

We fine-tuned the base model on three months of customer support tickets. The responses are much more on-brand now.

foundation model

A large model trained on broad, general data that can be adapted for many downstream tasks. Most commercial LLMs are foundation models.

We're building on top of a foundation model rather than training from scratch — that would take months and millions of dollars.

multimodal

Describes a model that can process more than one type of input or output, such as text, images, audio, or video.

We switched to a multimodal model so users can upload a screenshot and ask questions about it.

Prompting and generation

Vocabulary for working with a model's input and output: how to instruct it, what can go wrong, and how to control its behavior.

prompt

The input you send to a model — a question, instruction, or piece of text — to get a response.

The output was way off. I think the prompt needs to be more specific about the expected format.

system prompt

Instructions given to a model before the conversation starts, used to set its behavior, tone, or constraints. Users usually do not see it.

We set a system prompt that tells the model to always respond in the user's language and never discuss competitors.
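Most chat APIs accept the system prompt as the first entry in a list of role-tagged messages. A sketch of that common shape (field names vary by provider):

```python
# The message-list shape used by most chat APIs (exact field names
# vary by provider). The system message sets behavior; users only
# ever see their own turns and the assistant's replies.
messages = [
    {
        "role": "system",
        "content": (
            "Always respond in the user's language. "
            "Never discuss competitors."
        ),
    },
    {"role": "user", "content": "¿Cómo restablezco mi contraseña?"},
]

system_turns = [m for m in messages if m["role"] == "system"]
```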

temperature

A sampling setting, typically between 0 and 1 (some APIs allow values up to 2), that controls how random or creative a model's output is. Low temperature produces more predictable output; high temperature produces more varied output.

For the code generation feature we're using temperature 0 — we want consistent, deterministic output.

hallucination

When a model generates text that is confident and fluent but factually incorrect or completely made up.

The model hallucinated a function that doesn't exist in the SDK. Always verify AI-generated code before shipping.

few-shot prompting

A technique where you include a few examples of the desired input-output format in the prompt to guide the model's response.

I used few-shot prompting with three example summaries and the output quality improved significantly.
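The mechanics are just string assembly: worked examples go in front of the real input, all in the same format. A sketch with a made-up ticket-classification task:

```python
def build_few_shot_prompt(examples: list[tuple[str, str]], new_input: str) -> str:
    """Assemble a prompt with worked examples before the real input."""
    parts = ["Classify each support ticket as 'bug', 'billing', or 'question'.\n"]
    for ticket, label in examples:
        parts.append(f"Ticket: {ticket}\nCategory: {label}\n")
    # End with the unanswered case so the model completes the pattern.
    parts.append(f"Ticket: {new_input}\nCategory:")
    return "\n".join(parts)

examples = [
    ("The app crashes when I open settings.", "bug"),
    ("Why was I charged twice this month?", "billing"),
]
prompt = build_few_shot_prompt(examples, "Can I export my data as CSV?")
```

The model tends to continue the pattern, answering in the same format as the examples.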

zero-shot

Asking a model to perform a task with no examples in the prompt, relying entirely on its pre-trained knowledge.

Surprisingly, the zero-shot classification worked well enough that we didn't need to fine-tune.

inference

The process of running a trained model to generate output from new input. This is what happens every time you call the API.

Inference latency is our biggest bottleneck right now. We're caching common responses to reduce API calls.

grounding

Connecting a model's output to verified, real-world data to reduce hallucinations and improve accuracy.

We added grounding by pulling relevant docs from our database before every request. Hallucinations dropped noticeably.

Building with AI

The technical concepts that come up when integrating AI capabilities into a product or workflow.

RAG (Retrieval-Augmented Generation)

A technique that combines a search step with generation: relevant documents are retrieved from a database and added to the prompt so the model can answer based on real data.

We implemented RAG so the assistant can answer questions about our internal documentation without retraining the model.
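The two steps can be sketched in a few lines. This toy version ranks documents by word overlap; production systems rank by embedding similarity instead (see the next entries):

```python
def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Toy retrieval: rank documents by word overlap with the query."""
    query_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(query_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_rag_prompt(query: str, documents: list[str]) -> str:
    """Add retrieved context to the prompt so the model answers from real data."""
    context = retrieve(query, documents)
    return (
        "Answer using only the context below.\n\n"
        + "\n".join(f"- {doc}" for doc in context)
        + f"\n\nQuestion: {query}"
    )

docs = [
    "Deploys run automatically when a tag is pushed to main.",
    "The on-call rotation changes every Monday at 09:00 UTC.",
    "Password resets are handled by the auth service.",
]
prompt = build_rag_prompt("How do deploys get triggered?", docs)
```

The model sees only the retrieved snippets plus the question, so its answer stays anchored to your data.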

embedding

A numerical representation of text (or images or audio) as a list of numbers, called a vector. Similar content produces similar embeddings, making them useful for search and comparison.

We generate embeddings for every support ticket and store them so we can find semantically similar past issues.
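"Similar embeddings" is usually measured with cosine similarity. The toy vectors below are hand-picked for illustration; real models produce hundreds or thousands of dimensions per text:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Similarity between two embeddings: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made 3-dimensional "embeddings" for illustration only.
login_bug = [0.9, 0.1, 0.2]
signin_issue = [0.85, 0.15, 0.25]
invoice_question = [0.1, 0.9, 0.3]

similar = cosine_similarity(login_bug, signin_issue)        # close to 1
dissimilar = cosine_similarity(login_bug, invoice_question)  # much lower
```

Two tickets about the same problem score close to 1 even when they share no exact keywords, which is what makes semantic search work.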

vector database

A database optimized for storing and searching embeddings by similarity rather than exact matches.

We're using a vector database to power the semantic search — users can find docs even if they don't use the exact keyword.
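The core interface is small enough to sketch. This brute-force, in-memory stand-in shows the idea; real vector databases use approximate-nearest-neighbor indexes to search millions of embeddings quickly:

```python
import math

class TinyVectorStore:
    """A brute-force, in-memory stand-in for a vector database."""

    def __init__(self):
        self._items: list[tuple[str, list[float]]] = []

    def add(self, text: str, embedding: list[float]) -> None:
        self._items.append((text, embedding))

    def search(self, query_embedding: list[float], top_k: int = 1) -> list[str]:
        """Return the texts whose embeddings are most similar to the query."""
        def similarity(vec: list[float]) -> float:
            dot = sum(x * y for x, y in zip(query_embedding, vec))
            norms = (
                math.sqrt(sum(x * x for x in query_embedding))
                * math.sqrt(sum(x * x for x in vec))
            )
            return dot / norms

        ranked = sorted(self._items, key=lambda item: similarity(item[1]), reverse=True)
        return [text for text, _ in ranked[:top_k]]

store = TinyVectorStore()
store.add("How to reset your password", [0.9, 0.1])
store.add("Understanding your invoice", [0.1, 0.9])
results = store.search([0.8, 0.2])  # embedding of a password-related query
```

The password article ranks first even though the query never contained the word "password" — only its embedding did.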

agent

An AI system that can take a sequence of actions — such as calling tools, browsing the web, or writing files — to complete a goal, rather than just generating a single response.

We built an agent that reads the error log, searches the codebase, and opens a draft PR with a fix.
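At its core, an agent is a loop: decide on an action, execute it, feed the result back, repeat until done. In this sketch a hard-coded decide() stands in for the model, and both tools return canned strings:

```python
# Toy tools with canned results; real tools would hit logs, search, etc.
def read_error_log() -> str:
    return "NullPointerException in PaymentService.charge"

def search_codebase(query: str) -> str:
    return "PaymentService.charge is defined in payments/service.py"

def decide(history: list[str]) -> tuple[str, str]:
    """Pick the next action. In a real agent, the model makes this choice
    based on the conversation so far and the available tools."""
    if not history:
        return ("read_error_log", "")
    if len(history) == 1:
        return ("search_codebase", history[0])
    return ("finish", "Draft PR: add a null check in PaymentService.charge")

history: list[str] = []
while True:
    action, argument = decide(history)
    if action == "finish":
        result = argument
        break
    elif action == "read_error_log":
        history.append(read_error_log())
    elif action == "search_codebase":
        history.append(search_codebase(argument))
```

The loop structure is the point: each tool result goes back into the history that drives the next decision.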

tool use (function calling)

A model capability that allows it to call external functions or APIs during generation, such as looking up data or running code.

We enabled tool use so the assistant can query our database directly instead of relying on information from the prompt.
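In practice the model does not execute anything itself: it emits the name and arguments of the tool it wants, and your code runs the call and sends the result back. A sketch of that dispatch (the JSON shape varies by provider):

```python
import json

# Tool implementations the model is allowed to call, with canned data.
def get_open_tickets(status: str) -> list[str]:
    tickets = {"open": ["#101 login fails", "#102 slow export"], "closed": ["#95 typo"]}
    return tickets.get(status, [])

TOOLS = {"get_open_tickets": get_open_tickets}

# A tool call as the model might emit it (exact shape varies by provider).
model_output = json.dumps({"tool": "get_open_tickets", "arguments": {"status": "open"}})

call = json.loads(model_output)
result = TOOLS[call["tool"]](**call["arguments"])
# `result` is sent back to the model so it can write its final answer.
```

Keeping the dispatch table explicit also acts as an allowlist: the model can only invoke tools you registered.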

guardrails

Rules, filters, or checks applied to model input or output to prevent harmful, off-topic, or policy-violating responses.

We added guardrails to block any response that mentions a competitor's product by name.
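A simple output-side guardrail is just a check between the model and the user. The patterns below are illustrative (the competitor name is made up):

```python
import re

BLOCKED_PATTERNS = [
    re.compile(r"\bcompetitorx\b", re.IGNORECASE),  # hypothetical competitor name
    re.compile(r"\b\d{16}\b"),                      # looks like a card number
]

FALLBACK = "I can't help with that, but I'm happy to answer other questions."

def apply_guardrails(response: str) -> str:
    """Output-side guardrail: replace responses that match a blocked pattern."""
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(response):
            return FALLBACK
    return response

safe = apply_guardrails("You can export reports from the dashboard.")
blocked = apply_guardrails("CompetitorX has a similar feature.")
```

Production guardrails are usually layered: pattern checks like this, plus classifier models and input-side filtering.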

latency

The time it takes for a model to produce a response after receiving a request. A key performance metric for AI-powered features.

Latency is too high for a real-time chat feature. We're looking at smaller models and streaming responses.
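Measuring latency per call is the first step toward fixing it. A minimal timing wrapper, with a sleep standing in for the real API round trip:

```python
import time

def timed_call(fn, *args):
    """Run a model call and measure how long it takes, in milliseconds."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms

def fake_model_call(prompt: str) -> str:
    time.sleep(0.05)  # stand-in for a real API round trip
    return "response"

result, latency_ms = timed_call(fake_model_call, "hello")
```

Logging these numbers per request makes it easy to see whether caching, streaming, or a smaller model is actually helping.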

agentic

Describes AI systems or workflows where the model operates autonomously over multiple steps, making decisions and taking actions without a human approving each one.

The agentic pipeline handles the whole release notes workflow: it reads the diff, groups changes, and drafts the document.

Ready to practice your English at work?

Lingua-e has interactive exercises built around real developer conversations: standups, code reviews, retrospectives, and more. Practice until it comes naturally.

Try Lingua-e for free