Reference

AI glossary

AI terms explained plainly: no needless jargon, no marketing spin. 43 clear, verifiable definitions to help you follow Joute's comparisons without getting lost.

Defined terms

Agentic engineer

An agentic engineer designs and steers AI agents rather than writing every line of code: they define tasks, tools and guardrails, then verify the output. The job shifts from typing code to architecting and reviewing what the AI produces.

AI agent

An AI agent is a system built around an LLM that can plan and act: it calls tools, runs multi-step tasks and adjusts based on the results, instead of just answering. Coding agents, web-browsing agents and assistants fall into this category. Their weak point remains reliability over long chains of steps.

Usage & practice

AI IDE

An AI IDE is a code editor with deeply integrated AI: inline completion, chat about the codebase and agents that edit multiple files. Cursor, Windsurf and the like belong here. It is where most developers feel AI productivity gains first.

Models & architecture

Attention mechanism

Attention lets a model, when producing each word, weigh how relevant every other word in the context is. It captures long-range dependencies that earlier architectures missed. Its compute cost grows with the square of context length, which is why very long contexts stay expensive.
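
To make the mechanism concrete, here is a minimal sketch of scaled dot-product attention, the form used in Transformers, in plain Python. It assumes the query, key and value vectors are already given; a real model computes them with learned projections and runs many attention heads in parallel. The function names are illustrative.

```python
import math


def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention on plain lists.

    queries and keys are n x d, values is n x d_v; returns n x d_v.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        # One score per position: n queries x n keys = n^2 scores in total,
        # which is where the quadratic cost in context length comes from.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)  # relevance weights, summing to 1
        # Output = relevance-weighted average of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs
```

When every key is equally relevant, the output is simply the average of the value vectors; a query that matches one key more closely pulls the output toward that key's value.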

Usage & practice

Benchmark

A benchmark is a standardized test used to compare models on tasks like reasoning, code or knowledge. Useful as a signal, but to be read with caution: scores can be gamed, and a benchmark rarely matches your real use case.

Usage & practice

Chain-of-thought

Chain-of-thought asks a model to spell out its reasoning step by step before concluding, which improves logic and math tasks. Reasoning models use it internally and in a structured way. Note: the displayed reasoning is not always the real path the model took.

Models & architecture

Context

The context is everything a model has in front of it at a given moment: your prompt, the conversation history and any documents provided. The model has no memory beyond it. Anything outside the context window is simply ignored.

Models & architecture

Context window

The context window is the maximum amount of text, measured in tokens, a model can handle at once, prompt and answer included. It ranges from a few thousand to over a million tokens. It is not memory: anything that leaves it is forgotten, and quality often degrades in the middle of very long inputs.
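
One practical consequence is that a chat application has to decide what to drop when a conversation outgrows the window. This minimal sketch keeps the most recent messages that fit while reserving room for the answer. It assumes a crude 4-characters-per-token estimate instead of the model's real tokenizer, and `estimate_tokens` and `fit_to_window` are hypothetical helpers:

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic (assumption): about 4 characters per token in English.
    return max(1, (len(text) + 3) // 4)


def fit_to_window(messages, window_tokens, reserved_for_answer):
    """Keep the newest messages that fit in the window, leaving room for the answer."""
    budget = window_tokens - reserved_for_answer
    kept = []
    used = 0
    for msg in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break  # this message and everything older is dropped, i.e. "forgotten"
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order
```

Dropping the oldest turns is only one policy; summarizing them before they fall out of the window is a common alternative.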

Models & architecture

Diffusion model

A diffusion model generates an image or video by starting from random noise and denoising it step by step until it matches the description. It has been the dominant approach to image generation since Stable Diffusion. It offers strong control but remains compute-heavy at high resolution.

Models & architecture

Distillation

Distillation trains a small model (the student) to imitate the outputs of a large one (the teacher). The result is a lighter, faster model that keeps part of the big one's ability. It is one reason recent small models rival older, larger ones.
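
Concretely, the student is usually trained to match the teacher's output probabilities, softened with a temperature so that the teacher's "second choices" carry information too. This is a minimal sketch of such a loss in plain Python, one common formulation rather than the only one, with illustrative function names:

```python
import math


def softmax(logits, temperature=1.0):
    # Dividing by a temperature > 1 flattens the distribution ("soft targets").
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's.

    Multiplying by temperature**2 keeps the loss on a comparable scale
    across temperatures.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(p_teacher, p_student) if p > 0)
    return kl * temperature ** 2
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as they diverge; training the student means minimizing it over many inputs, often alongside the usual loss on the true labels.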

Embedding

An embedding turns a text into a vector of numbers that captures its meaning, so that similar texts end up close together. It powers semantic search, recommendation and RAG. It is the bridge between language and the math a machine can compare.
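
Closeness between embeddings is typically measured with cosine similarity, the cosine of the angle between two vectors. In this minimal sketch, the three-dimensional vectors are made up for illustration; real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """1.0 = same direction (same meaning), around 0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors, invented for illustration only.
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.85, 0.15, 0.05],
    "invoice": [0.0, 0.2, 0.95],
}

print(cosine_similarity(embeddings["cat"], embeddings["kitten"]))   # high
print(cosine_similarity(embeddings["cat"], embeddings["invoice"]))  # low
```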

Usage & practice

Few-shot (and zero-shot)

Few-shot means slipping a few examples of the task into the prompt to guide the model, without retraining it. Zero-shot asks for the task directly, with no example. Two or three good examples often improve quality sharply, for far less effort than fine-tuning.
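
This minimal sketch shows how a few-shot prompt can be assembled. The `Input:`/`Output:` layout is an arbitrary convention chosen for illustration, not a format any model requires.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Instruction, then worked (input, output) pairs, then the new input.

    An empty examples list produces the zero-shot version of the same prompt.
    """
    parts = [instruction, ""]
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}")
        parts.append(f"Output: {example_output}")
        parts.append("")
    parts.append(f"Input: {query}")
    parts.append("Output:")  # the model continues from here
    return "\n".join(parts)


prompt = build_few_shot_prompt(
    "Classify the sentiment as positive or negative.",
    [("I love it", "positive"), ("Awful experience", "negative")],
    "Not bad at all",
)
print(prompt)
```

Ending the prompt on `Output:` nudges the model to answer in the same short form as the examples.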

Models & architecture

Fine-tuning

Fine-tuning continues the training of an existing model on a targeted dataset to specialize it in a style, domain or task. It is lighter than training from scratch but still requires quality data. For factual knowledge, RAG is often cheaper and more flexible.

Usage & practice

Function calling

Function calling lets a model request an external tool (web search, calculation, an API query) by producing a structured call, then fold the result into its answer. It is the basic mechanism of agents: it links language to real actions. MCP standardizes this connection to tools.
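
On the application side, the loop looks roughly like this: parse the structured call the model produced, run the matching function, and send the result back to the model. In this minimal sketch, the JSON shape `{"name": ..., "arguments": {...}}` and the tools `get_weather` and `add` are hypothetical, since each vendor defines its own format:

```python
import json


def get_weather(city: str) -> str:
    # Hypothetical tool; a real one would query a weather API.
    return f"Sunny in {city}"


def add(a: float, b: float) -> float:
    return a + b


# Registry of the tools the model is allowed to call.
TOOLS = {"get_weather": get_weather, "add": add}


def execute_tool_call(raw_call: str) -> str:
    """Run the tool named in a model-emitted call and return a JSON result.

    The {"name": ..., "arguments": {...}} shape is an assumption for this sketch.
    """
    call = json.loads(raw_call)
    tool = TOOLS.get(call["name"])
    if tool is None:
        # Unknown tools are refused rather than guessed.
        return json.dumps({"error": f"unknown tool {call['name']}"})
    result = tool(**call["arguments"])
    # This string goes back to the model, which folds it into its answer.
    return json.dumps({"result": result})
```

Keeping an explicit registry means the model can only trigger functions the developer chose to expose.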

Models & architecture

GAN (generative adversarial network)

A GAN pits two networks against each other: a generator that fabricates images and a discriminator that tries to tell real from fake. Each pushes the other to improve. GANs dominated image generation before diffusion models and are still used for tasks like upscaling or synthetic faces.

Infrastructure

GPU

A GPU is a processor built for massively parallel math, which makes it far faster than a CPU at the computations behind AI. The scarcity and price of GPUs (Nvidia leads the market) directly shape which models can be trained and run. They are the oil of the current AI boom.

Usage & practice

Guardrails

Guardrails are the filters and rules that frame what a model will produce: refusing dangerous content, validating outputs, limiting an agent's actions. They are necessary in production but imperfect: they can be bypassed via jailbreak or prompt injection, and they are sometimes so zealous they block legitimate uses.

Concepts

Hallucination

A hallucination is a plausible but false statement produced by a model with full confidence. It stems from how LLMs work: they predict likely text, not verified truth. This is the main reason to fact-check any factual output.

Usage & practice

Image generation

Image generation creates visuals from a text description, usually via diffusion models that start from random noise and denoise it step by step. Midjourney, Flux and Ideogram are examples. Sticking points remain rendering text inside images, keeping characters consistent across images, and copyright over training data.

Inference

Inference is the act of running a model to get an answer, as opposed to training it. It is where the per-use cost and latency sit. Optimizing inference (quantization, caching, smaller models) is central to running AI at scale.

Concepts

Jailbreak

A jailbreak is a prompt manipulation that bypasses a model's guardrails to make it produce normally blocked content. Techniques often rely on role-play or contradictory instructions. Vendors patch these continuously, but LLM security remains an open problem rather than a solved one.

Infrastructure

Latency and throughput

Latency is the delay before a model returns its first token; throughput is the number of tokens generated per second. Together they determine how responsive an assistant feels and what it costs at scale. A larger, smarter but slower model is not always the right pick for a real-time task.
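
A back-of-the-envelope model makes the trade-off visible: the total time for a streamed answer is roughly the latency plus the output length divided by the throughput. This is a minimal sketch, and the figures are illustrative, not measurements of any real model:

```python
def response_time(latency_s: float, throughput_tps: float, output_tokens: int) -> float:
    """Approximate wall-clock seconds for a streamed answer.

    Simplifying assumption: wait for the first token, then generate
    the rest at a steady rate.
    """
    return latency_s + output_tokens / throughput_tps


# Illustrative numbers only.
fast_small = response_time(latency_s=0.3, throughput_tps=150, output_tokens=300)
slow_large = response_time(latency_s=1.5, throughput_tps=40, output_tokens=300)
print(f"small model: {fast_small:.1f} s, large model: {slow_large:.1f} s")
```

For a 300-token answer, the hypothetical small model finishes in about 2.3 s and the large one in about 9 s, a gap users notice in a live chat.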

Models & architecture

LLM (large language model)

An LLM is a neural network trained to predict the next word over massive amounts of text. From this simple task emerge writing, translation and partial reasoning abilities. GPT, Claude and Gemini are LLMs. An LLM does not query a live knowledge base: it returns what its parameters encoded during training.

Models & architecture

LoRA (lightweight fine-tuning)

LoRA is a fine-tuning method that adjusts only a small set of added parameters instead of retraining the whole model. Adapting a model to a style or domain becomes fast and cheap, without data-center GPUs. It is the standard way to customize open-source image models.
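
In LoRA, the original weight matrix W stays frozen and two small matrices B and A are trained instead; their product is a low-rank update added to W, usually multiplied by a scaling factor. This minimal sketch in plain Python shows the merge and why the number of trained values is so small. The names and the 4096 x 4096, rank-8 example are illustrative.

```python
def matmul(a, b):
    # Plain-list matrix product: (n x m) @ (m x p) -> n x p.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_merge(W, B, A, scale=1.0):
    """Return W + scale * (B @ A).

    W: d_out x d_in, frozen. B: d_out x r and A: r x d_in, trained, with r small.
    """
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]


def trainable_params(d_out, d_in, rank):
    """Values trained by full fine-tuning of one matrix vs. by LoRA."""
    full = d_out * d_in
    lora = rank * (d_in + d_out)
    return full, lora


print(trainable_params(4096, 4096, 8))
```

For a 4096 x 4096 layer at rank 8, that is 65,536 trained values instead of 16,777,216, under 0.4% of the full matrix.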

Infrastructure

MCP (Model Context Protocol)

MCP is an open standard that connects AI models to external tools and data through a common interface. Instead of a custom integration per tool, a model speaks MCP to any compatible server. It has become the de facto plumbing for agents.

Mixture of Experts (MoE)

An MoE model is split into specialized sub-networks, the experts, only a few of which activate for each token. You get the capacity of a very large model at an inference cost close to that of a much smaller one. Mixtral and several recent models use this approach.
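
The routing step can be sketched as follows: a gating network scores every expert for the current token, only the top-k experts run, and their outputs are blended by weight. This minimal sketch assumes the gate scores are already computed and uses toy functions as experts; all names are hypothetical.

```python
import math


def route(gate_logits, k=2):
    """Pick the top-k experts for one token; softmax their scores into weights."""
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    m = max(gate_logits[i] for i in top)
    exps = {i: math.exp(gate_logits[i] - m) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_forward(x, experts, gate_logits, k=2):
    """Only the selected experts are evaluated; the others cost nothing for this token."""
    weights = route(gate_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())


# Toy experts standing in for feed-forward sub-networks.
experts = [lambda x: x * 1, lambda x: x * 10, lambda x: x * 100, lambda x: x * 1000]
print(moe_forward(1.0, experts, gate_logits=[0.1, 2.0, 2.0, -1.0], k=2))
```

Here experts 1 and 2 tie for the top scores, so each gets half the weight and experts 0 and 3 are never called.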

Models & architecture

Multimodal

A multimodal model handles several types of input or output, such as text, images, audio and video, within a single model. It can describe a photo, read a chart or generate an image from a sentence. It is the norm for the latest flagship models.

Usage & practice

No-code & AI app builders

No-code lets you build software without writing code, through visual interfaces. AI app builders like Lovable, Bolt or v0 take it further: you describe the app in plain language and get a working interface. The limit is the same as vibe coding: past the prototype, mastering the code matters again.

Obsolescence risk

A Joute score that rates how fast a tool may become irrelevant, swallowed by a model's native features, a pricing shift or a stronger rival. The higher it is, the more cautious you should be about depending on the tool long term.

Models & architecture

Open source (open weights)

An open-weights model has freely downloadable parameters you can run, fine-tune and self-host. It offers control and privacy that closed APIs do not. Licenses vary, though: downloadable weights do not always mean the model is free for commercial use.

Models & architecture

Parameters

Parameters are a model's internal values, tuned during training, that encode what it knows. They are counted in billions (7B, 70B, 405B). More parameters usually means more capability but heavier compute. Parameter count alone is not enough to judge a model: training data and training quality matter as much.
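
Parameter count translates directly into memory: at 16-bit precision, each parameter takes 2 bytes. This minimal sketch of the arithmetic counts the weights only; activations and runtime overhead come on top.

```python
def weights_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Gigabytes (10**9 bytes) needed just to hold the weights."""
    return num_params * bits_per_param / 8 / 1e9


for size in (7e9, 70e9, 405e9):
    print(f"{size / 1e9:.0f}B parameters at 16 bits: {weights_memory_gb(size, 16):.0f} GB")
```

A 7B model therefore needs about 14 GB at 16 bits and a 70B model about 140 GB, which is why the largest models run on multiple data-center GPUs.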

Usage & practice

Prompt

A prompt is the instruction you give a model. Its formulation heavily shapes the quality of the answer: a precise, contextualized prompt yields far more than a vague one. Prompt engineering is the practice of refining these instructions.

Usage & practice

Prompt injection

Prompt injection slips malicious instructions into content a model will read (a web page, a document, an email) to hijack its behavior. It is the main security flaw of agents that browse and read external sources. No complete fix exists today, only mitigations.

Infrastructure

Quantization

Quantization lowers the numerical precision of a model's parameters (say from 16 to 4 bits) to shrink its memory footprint and speed up inference. It lets large models run on modest hardware at a small quality cost. It is what makes running an LLM locally on an ordinary PC feasible.
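
A minimal sketch of the idea, assuming the simplest scheme: symmetric quantization with one scale for the whole tensor. Production methods are more elaborate, for example using a separate scale for each small group of weights.

```python
def quantize(values, bits=4):
    """Map floats to integers in [-qmax, qmax] plus one shared float scale."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4 bits, 127 for 8 bits
    scale = max(abs(v) for v in values) / qmax
    if scale == 0:
        scale = 1.0  # all-zero tensor: any scale works
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Approximate reconstruction of the original floats."""
    return [x * scale for x in q]


weights = [0.7, -0.35, 0.12, 0.0, -0.69]
q, scale = quantize(weights, bits=4)
print(q, dequantize(q, scale))
```

Each weight is now stored as a small integer plus one shared scale. The rounding error per weight is at most half the scale, which is where the small quality cost comes from.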

Concepts

RAG (retrieval-augmented generation)

RAG feeds a model relevant documents retrieved at query time so it answers from your data rather than from memory alone. It cuts hallucinations and lets you cite sources. Its quality depends entirely on the retrieval step: bad retrieval, bad answer.
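
The retrieve-then-generate flow fits in a few lines. This minimal sketch assumes a simple word-overlap score as a stand-in for embedding similarity; the documents and function names are made up for illustration.

```python
def score(query, passage):
    # Stand-in retriever (assumption): count shared lowercase words.
    # A real RAG system would compare embeddings instead.
    return len(set(query.lower().split()) & set(passage.lower().split()))


def build_rag_prompt(query, documents, top_k=2):
    """Retrieve the top_k passages and place them, with ids, ahead of the question."""
    ranked = sorted(documents.items(), key=lambda item: score(query, item[1]), reverse=True)
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in ranked[:top_k])
    return (
        "Answer using only the sources below and cite their ids.\n"
        f"{context}\n\nQuestion: {query}"
    )


# Made-up knowledge base.
docs = {
    "refund-policy": "refunds are issued within 14 days of purchase",
    "shipping": "orders ship within 2 business days",
    "careers": "we are hiring engineers",
}
print(build_rag_prompt("how many days for refunds", docs, top_k=1))
```

The source ids in the prompt are what let the model cite its sources, and if the retriever ranks the wrong passage first, the model never sees the right one.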

Models & architecture

Reasoning (reasoning models)

Reasoning models spend extra compute working through a problem step by step before answering, which improves math, logic and code. They are slower and pricier, and overkill for simple tasks. Their displayed reasoning is not always the real path taken.

Models & architecture

RLHF (reinforcement learning from human feedback)

RLHF aligns a model with human preferences: annotators rank answers, and the model is tuned to produce the ones judged better. It is the step that turns a raw, capable model into a helpful, polite assistant. It also bakes in the biases of whoever does the ranking.

Usage & practice

Sampling (top-p, top-k)

At each generation step, an LLM produces a probability distribution over possible next tokens; sampling decides which one to pick. Top-k limits the choice to the k likeliest tokens, top-p (nucleus sampling) to the smallest set of likeliest tokens whose combined probability reaches a threshold p. Combined with temperature, these settings balance reliability against variety.
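
This minimal sketch implements both filters in plain Python and applies them to a made-up next-token distribution. After filtering, the remaining probabilities are renormalized and one token is drawn at random. The function names are illustrative.

```python
import random


def top_k_filter(probs, k):
    """Keep the k likeliest tokens and renormalize."""
    kept = dict(sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k])
    total = sum(kept.values())
    return {t: p / total for t, p in kept.items()}


def top_p_filter(probs, p):
    """Keep the smallest set of likeliest tokens whose total probability reaches p."""
    kept, cumulative = {}, 0.0
    for token, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {t: q / total for t, q in kept.items()}


def sample(probs, rng=random):
    """Draw one token according to its (renormalized) probability."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


# Made-up next-token distribution.
next_token = {"cat": 0.5, "dog": 0.3, "car": 0.15, "banana": 0.05}
print(top_k_filter(next_token, 2))
print(top_p_filter(next_token, 0.9))
print(sample(top_p_filter(next_token, 0.9)))
```

With p = 0.9, "cat", "dog" and "car" survive (0.95 cumulative) while the unlikely "banana" can never be picked.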

Temperature

Temperature controls how random a model's answers are. Low values make outputs more predictable, close to deterministic, which suits code or facts. High values favor variety and creativity at the risk of errors. It is the simplest knob for tuning an LLM's behavior.
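
Under the hood, temperature usually divides the model's raw scores (logits) before they are turned into probabilities. This minimal sketch uses made-up logits:

```python
import math


def softmax_with_temperature(logits, temperature):
    """T < 1 sharpens the distribution, T > 1 flattens it."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 (T -> 0 approaches always picking the top token)")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

With these logits, the top token gets about 0.66 of the probability at T = 1, close to 1 at T = 0.2 and about 0.5 at T = 2.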

Models & architecture

Token

A token is the unit of text a model handles: roughly a word fragment of a few characters. Pricing and context limits are counted in tokens, not words. In English, one token averages about 4 characters.
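
The rule of thumb turns into a quick estimate of size and cost. This is a minimal sketch: the price per million tokens is a hypothetical placeholder, and exact counts require the model's own tokenizer.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token count from the ~4 characters per token rule of thumb (English)."""
    return math.ceil(len(text) / chars_per_token)


def estimate_cost(num_tokens: int, price_per_million: float) -> float:
    # price_per_million is a placeholder; check your provider's price list.
    return num_tokens / 1_000_000 * price_per_million


text = "a" * 4000  # stands in for a roughly 4,000-character document
tokens = estimate_tokens(text)
print(tokens, estimate_cost(tokens, price_per_million=3.0))
```

Other languages and code often need more tokens per character, so the estimate is only a starting point.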

Models & architecture

Transformer

The Transformer is the neural-network architecture behind almost every LLM since 2017. Its breakthrough was to rely on the attention mechanism, which lets each word weigh the importance of all the others, instead of reading text one word at a time. Processing all words in parallel made large-scale training and long contexts possible. The T in GPT stands for Transformer.

Infrastructure

Vector database

A vector database stores texts as embeddings and retrieves the ones closest to a query by similarity. It is the search engine behind RAG: you index documents so you can later feed an LLM the relevant passages. Pinecone, Weaviate and pgvector are examples.
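
Stripped to its core, a vector database answers one question: which stored vectors are closest to this one? This minimal in-memory sketch uses brute-force cosine search; the class name and vectors are illustrative. Real systems typically add approximate-nearest-neighbor indexes to stay fast at millions of vectors.

```python
import math


class InMemoryVectorStore:
    """Toy vector store: keeps (id, vector) pairs and ranks them by cosine similarity."""

    def __init__(self):
        self.ids = []
        self.vectors = []

    def add(self, doc_id, vector):
        self.ids.append(doc_id)
        self.vectors.append(vector)

    def query(self, vector, top_k=1):
        """Return the ids of the top_k stored vectors most similar to `vector`."""
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

        # Brute force: compare against every stored vector.
        scored = [(cosine(vector, v), doc_id) for doc_id, v in zip(self.ids, self.vectors)]
        scored.sort(reverse=True)
        return [doc_id for _, doc_id in scored[:top_k]]


store = InMemoryVectorStore()
store.add("cats", [0.9, 0.1])   # made-up 2-D "embeddings"
store.add("taxes", [0.1, 0.9])
print(store.query([0.8, 0.2]))
```

In a RAG pipeline, the returned ids point to the passages that get pasted into the model's prompt.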

Concepts

Vibe coding

Vibe coding means building software by describing what you want in natural language and letting the AI write the code, with little manual review. Great for prototypes and demos. Past that point, understanding the generated code becomes necessary again.