AI GLOSSARY

Key concepts of artificial intelligence

Clear, practical definitions of the AI concepts that show up in any project: AI agents, RAG, chatbots, LLMs, MCP, function calling and more. If you have questions about how to apply AI to your business, visit our AI services page or tell us about your case.

AI Agent

Autonomous LLM-based system that makes decisions, executes actions on external systems and completes complex tasks without human intervention at each step.

An AI agent combines a language model (GPT-4, Claude, Gemini) with a set of tools (function calling) to execute tasks. Unlike a chatbot, which answers questions, an agent acts: queries databases, sends emails, modifies a CRM, calls APIs, runs code and plans multiple steps to solve a task. Example: a support agent that receives an incident, looks up information in your knowledge base (RAG), checks the customer's history in the CRM, proposes a solution, executes it if authorised and logs the interaction.
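The loop behind an agent like this can be sketched in a few lines. This is a toy illustration, not a real API: `fake_llm` stands in for the language model, and the tool names (`lookup_kb`, `log_interaction`) are hypothetical.

```python
# Minimal sketch of an agent loop: the "model" picks a tool, the runtime
# executes it, and the result is fed back until the task is done.
# fake_llm and both tools are illustrative stand-ins, not a real API.

def lookup_kb(query):          # hypothetical knowledge-base (RAG) lookup
    return "Reset the router and re-pair the device."

def log_interaction(summary):  # hypothetical CRM write
    return "logged"

TOOLS = {"lookup_kb": lookup_kb, "log_interaction": log_interaction}

def fake_llm(history):
    """Stand-in for a real model: decides the next action from history."""
    if not any(step[0] == "lookup_kb" for step in history):
        return ("lookup_kb", "device keeps disconnecting")
    if not any(step[0] == "log_interaction" for step in history):
        return ("log_interaction", "proposed router reset")
    return ("finish", "Suggested a router reset and logged the ticket.")

def run_agent():
    history = []
    while True:
        action, arg = fake_llm(history)
        if action == "finish":
            return arg
        history.append((action, TOOLS[action](arg)))  # execute and record

answer = run_agent()
```

In a production agent, `fake_llm` is replaced by a real LLM call with function calling enabled, and the loop adds guardrails: step limits, authorisation checks before destructive actions, and logging.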

LLM (Large Language Model)

AI model trained on large amounts of text to understand and generate natural language. Examples: GPT-4, Claude, Gemini, Llama.

A Large Language Model is an AI model trained on hundreds of billions of words (books, web, code) that learns language patterns and world knowledge. The most widely used in production are GPT-4 / GPT-4o (OpenAI), Claude Sonnet / Opus (Anthropic), Gemini (Google) and open-source models such as Llama (Meta) or Mistral. Capabilities: generating text, translating, summarising, coding, reasoning and, when connected to tools, executing complex tasks as agents.

RAG (Retrieval Augmented Generation)

Technique that combines an LLM with your own knowledge base so answers are based on your data instead of only what the model learned during training.

RAG works in two steps: (1) when a question arrives, the system retrieves the most relevant fragments of your documents using vector search over embeddings; (2) those fragments are passed as context to the LLM, which writes the answer based on them. Benefits: accurate and citable answers, always-up-to-date data (just re-index) and a drastic reduction of hallucinations. Typical use cases: support chatbots over corporate documentation, sales assistants with up-to-date catalogue and pricing, semantic search across knowledge bases.
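The two steps can be sketched with toy 3-dimensional "embeddings". In a real system the vectors come from an embedding model and live in a vector database; here the chunks and their vectors are hard-coded for illustration.

```python
import math

# Step 1: retrieve the chunks closest to the query embedding.
# Step 2: pass them as context to the LLM.

CHUNKS = [
    ("Refunds are processed within 14 days.",   [0.9, 0.1, 0.0]),
    ("Our office is open Monday to Friday.",    [0.1, 0.9, 0.0]),
    ("Premium plans include priority support.", [0.0, 0.2, 0.9]),
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(query_vec, k=2):
    """Step 1: pick the k chunks most similar to the query embedding."""
    ranked = sorted(CHUNKS, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, query_vec):
    """Step 2: hand the retrieved fragments to the LLM as context."""
    context = "\n".join(retrieve(query_vec))
    return f"Answer using ONLY this context:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("How long do refunds take?", [0.85, 0.15, 0.05])
```

The LLM then answers from the supplied context, which is what makes the response citable and grounded in your data.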

Chatbot

Conversational system that answers questions in natural language. Modern ones use LLMs and RAG to give precise answers based on private data.

A traditional chatbot follows predefined flows (decision trees) and answers with templates. Modern chatbots use LLMs and understand natural language without a script. Combined with RAG, they answer based on your documentation. Add function calling and tools and they become AI agents that can execute actions. Common channels: web, WhatsApp Business, Telegram, Messenger, Microsoft Teams, Slack.
Related: AI Agent · RAG

AI Automation

Automated processes combining AI with existing systems (CRM, ERP, email, etc.) to execute business tasks without human intervention.

An AI automation goes beyond a traditional workflow (Zapier, Make, n8n): it includes an LLM or agent to make contextual decisions, write responses, classify tickets, extract data from unstructured documents, etc. Typical examples: automatic classification of incoming emails, lead response generation, data extraction from scanned invoices, daily report generation from sales data, automated support during off-hours.

Embedding

Numerical representation (vector) of a text that captures its meaning. Enables semantic search: finding similar texts in meaning, not just in words.

An embedding is a vector of hundreds or thousands of dimensions generated by a model (OpenAI text-embedding-3, Cohere, etc.) that represents the semantic meaning of a text. Two texts similar in meaning have close embeddings in vector space, even if they don't share words. Example: "book a table" and "make a reservation at the restaurant" have very similar embeddings. It's the foundation of RAG, semantic search, support ticket clustering and duplicate detection.
Related: Vector database · RAG
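The "close in meaning, close in space" idea can be shown numerically. The 3-dimensional vectors below are made up for illustration; real embeddings have hundreds or thousands of dimensions.

```python
import math

# Toy illustration: paraphrases get nearby vectors, unrelated text does not.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

book_table  = [0.8, 0.5, 0.1]   # "book a table"
reservation = [0.7, 0.6, 0.1]   # "make a reservation at the restaurant"
weather     = [0.1, 0.1, 0.9]   # "what's the weather tomorrow?"

para_sim  = cosine(book_table, reservation)  # high: same meaning
other_sim = cosine(book_table, weather)      # low: unrelated meaning
```

Cosine similarity near 1 means nearly identical meaning; this comparison is what semantic search, clustering and duplicate detection are built on.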

Vector database

Database specialised in storing and searching embeddings efficiently. Essential for RAG and semantic search.

A vector database stores vectors (embeddings) and finds the ones closest to a given query in milliseconds. The most used: Pinecone, Weaviate, Qdrant, Milvus, Chroma and the pgvector extension for PostgreSQL. For small cases (up to ~100k vectors), pgvector on top of your existing Postgres is more than enough. For larger scale (millions), Pinecone or Qdrant are the typical choices.
Related: RAG · Embedding
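At toy scale, what a vector database does is just "store vectors, return the k nearest to a query". The sketch below does this with a brute-force scan; real engines (pgvector, Qdrant, etc.) add index structures such as HNSW or IVF to answer the same query over millions of vectors in milliseconds.

```python
import heapq
import math

class TinyVectorStore:
    """Brute-force stand-in for a vector database (illustration only)."""

    def __init__(self):
        self.rows = []                     # (id, vector) pairs

    def add(self, doc_id, vec):
        self.rows.append((doc_id, vec))

    def search(self, query, k=3):
        # Return the k rows with the smallest Euclidean distance to the query.
        return heapq.nsmallest(k, self.rows, key=lambda r: math.dist(query, r[1]))

store = TinyVectorStore()
store.add("faq-1", [0.1, 0.9])
store.add("faq-2", [0.8, 0.2])
store.add("faq-3", [0.7, 0.3])

nearest = store.search([0.78, 0.22], k=2)  # ids of the 2 closest documents
```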

Prompt engineering

Discipline of designing clear and effective instructions for LLMs. Determines the quality of the responses and the model's behaviour.

Prompt engineering is about how to phrase instructions (system prompt, user prompt, few-shot examples) so an LLM produces useful, consistent and safe answers. Best practices: be specific about the output format, give examples, set explicit constraints, use chain-of-thought ("think step by step") in reasoning tasks. In serious projects prompts are versioned and tested against an evaluation dataset.
Related: LLM
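The best practices above look like this in a real prompt. The message structure mirrors the chat format most LLM APIs accept (exact endpoint details vary by provider); the ticket-classifier task is a made-up example.

```python
import json

# A prompt with an explicit role, a fixed output format, constraints,
# and one few-shot example showing the exact expected output.

messages = [
    {"role": "system", "content": (
        "You are a support-ticket classifier. "
        'Reply ONLY with JSON: {"category": ..., "urgency": ...}. '
        "Allowed categories: billing, technical, other. "
        'If unsure, use "other"; never invent a category.'
    )},
    # Few-shot example: input plus the exact answer we want
    {"role": "user", "content": "I was charged twice this month."},
    {"role": "assistant", "content": '{"category": "billing", "urgency": "high"}'},
    # The real input to classify
    {"role": "user", "content": "The app crashes when I open settings."},
]

example_output = json.loads(messages[2]["content"])  # the few-shot answer parses
```

In a serious project this prompt would live in version control and be run against an evaluation dataset before every change.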

Fine-tuning

Retraining an LLM with your own examples so it learns a specific style, format or domain knowledge.

Fine-tuning takes a pretrained model (GPT-4o-mini, Llama, Mistral) and retrains it on your own dataset (typically from 50 to several thousand question-answer pairs) to specialise it. Useful when you need a very specific response style, guaranteed structured JSON output or niche knowledge. Most projects do NOT need fine-tuning: good prompt engineering and RAG are usually enough. Cost: typically €50–€300 for a fine-tune that ships to production.
Related: LLM · RAG
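A fine-tuning dataset is usually a JSONL file: one training example per line, each a full conversation ending with the ideal assistant reply. The sketch below builds two such lines in the chat format used by OpenAI-style fine-tuning APIs; the legalese examples are invented.

```python
import json

examples = [
    {"messages": [
        {"role": "system", "content": "Answer in formal Spanish legalese."},
        {"role": "user", "content": "Can I cancel my contract?"},
        {"role": "assistant", "content": "Conforme a la cláusula 4, ..."},
    ]},
    {"messages": [
        {"role": "system", "content": "Answer in formal Spanish legalese."},
        {"role": "user", "content": "What is the notice period?"},
        {"role": "assistant", "content": "El plazo de preaviso es de 30 días."},
    ]},
]

# JSONL: one JSON object per line, ready to upload as training data
jsonl = "\n".join(json.dumps(e, ensure_ascii=False) for e in examples)
```

The model learns to imitate the assistant turns, which is why the quality and consistency of those target answers matter far more than dataset size.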

MCP (Model Context Protocol)

Open protocol from Anthropic (2024) to connect AI models with tools and data sources in a standard way. It's like USB for LLMs.

MCP defines a JSON-RPC standard for models to access tools (functions), resources (data) and predefined prompts. Instead of building custom integrations every time, you expose an MCP server and any compatible client (Claude Desktop, Cursor IDE, your own agents) can use it. Examples: MCP server for your CRM, for a file system, for a database, for Google Workspace. We implement it when your data/systems need to be available both for external assistants and for in-house AI agents.
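On the wire, an MCP exchange is plain JSON-RPC 2.0. The method name `tools/call` follows the MCP spec; the tool itself (`crm_lookup`) and its arguments are a hypothetical server-side function for illustration.

```python
import json

# Client asks an MCP server to invoke one of its tools
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "crm_lookup",                       # hypothetical tool
        "arguments": {"customer_id": "C-1042"},
    },
}

# Shape of the server's reply: content blocks the model can read
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "content": [{"type": "text", "text": "Customer C-1042: premium plan"}],
    },
}

wire = json.dumps(request)  # what actually travels between client and server
```

Because both sides speak this one protocol, the same CRM server works unchanged from Claude Desktop, an IDE or your own agents.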

Function calling / Tool use

Capability of an LLM to invoke external functions/tools you define. It's the mechanism that turns a chatbot into an AI agent.

Function calling lets the model decide which function in your system to execute (query_customers, send_email, search_product) and with what arguments, based on the conversation. The runtime executes the function, returns the result to the model and the model writes the final answer. Supported by GPT-4, Claude, Gemini and open-source models. It's the foundation on which AI agents and MCP are built.
Related: AI Agent · MCP
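The round trip described above can be sketched on the runtime side. `model_tool_call` stands in for what a real API would return (typically a function name plus JSON-encoded arguments); `search_product` is a hypothetical tool.

```python
import json

def search_product(name):                 # a tool exposed to the model
    return {"name": name, "price_eur": 49.0, "in_stock": True}

TOOLS = {"search_product": search_product}

# What a model might emit when asked "how much is the X200?"
model_tool_call = {"name": "search_product",
                   "arguments": json.dumps({"name": "X200"})}

# Runtime side: look up the function, decode the arguments, execute
fn = TOOLS[model_tool_call["name"]]
args = json.loads(model_tool_call["arguments"])
tool_result = fn(**args)
# tool_result is sent back to the model, which writes the final answer.
```

Note that the model never executes anything itself: it only proposes the call, and your runtime decides whether and how to run it, which is where permissions and validation belong.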

Multimodal

LLMs that natively process text, images, audio and/or video. Examples: GPT-4o, Claude, Gemini.

A multimodal model understands and generates more than one type of content. It can analyse a PDF invoice + photo, transcribe audio, describe a video or generate text from a diagram. Typical enterprise uses: extracting data from scanned documents, automatic product photo analysis, meeting transcription and summarisation, customer service via WhatsApp voice notes. Models: GPT-4o (text+image+audio), Claude Sonnet (text+image), Gemini Pro (text+image+video+audio).
Related: LLM
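A multimodal request simply mixes content types inside one message. The structure below follows the content-parts shape used by OpenAI-style chat APIs (field names vary slightly between providers); the invoice URL is a placeholder.

```python
# One user message combining text and an image reference
message = {
    "role": "user",
    "content": [
        {"type": "text",
         "text": "Extract the total amount from this invoice."},
        {"type": "image_url",
         "image_url": {"url": "https://example.com/invoice.png"}},
    ],
}

kinds = [part["type"] for part in message["content"]]  # content types sent
```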

Hallucination

When an LLM invents information that looks plausible but is false. It's the main risk in serious applications.

Hallucinations happen when the model generates plausible but invented data (names, figures, references to laws). Mitigations: (1) RAG with mandatory source citations, (2) prompts that require "don't make things up, say I don't know", (3) downstream validation with tools (function calling to query real data), (4) human in the loop for critical decisions. In production projects we treat hallucinations as bugs and build automatic evaluations to detect them.
Related: RAG

Tokens & context window

Tokens are the units in which an LLM processes text (~0.75 words in English). The context window is the maximum number of tokens the model can read/write.

A token is typically a short word or part of a word. "Hello, how are you?" is ~6 tokens. Models charge per token (input and output) and have a maximum context window: GPT-4o has 128k, Claude Sonnet 200k, Gemini 1.5 Pro up to 2M. For long documents you need to split into chunks or use long-context models. Typical cost: 1M input tokens costs between $0.15 (mini) and $15 (top model).
Related: LLM
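The figures above make back-of-the-envelope costing easy. The 4-characters-per-token rule of thumb and the $0.15 / $15 per-million-token prices are the rough ranges quoted above, not exact numbers for any specific model.

```python
def rough_tokens(text):
    return max(1, len(text) // 4)          # ~4 characters per token in English

doc_chars = 300_000                        # roughly a 50k-word document
doc_tokens = doc_chars // 4                # 75_000 tokens: fits a 128k window

def cost_usd(input_tokens, price_per_million):
    return input_tokens / 1_000_000 * price_per_million

cheap = cost_usd(doc_tokens, 0.15)         # small model: about one cent
top   = cost_usd(doc_tokens, 15.0)         # top model: about $1.13
```

The same arithmetic is why chunking matters: sending only the 3–5 relevant chunks via RAG instead of the whole document cuts input cost by orders of magnitude.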

Multi-agent system

Architecture where several specialised AI agents collaborate to solve complex tasks. Each agent has a different role and tools.

Instead of a single generalist agent, a multi-agent system splits the tasks: a "receptionist agent" understands the request and routes it, a "researcher agent" looks up information, a "writer agent" prepares the answer and a "reviewer agent" validates. Useful for complex flows where a single prompt would be too long or ambiguous. Frameworks: LangGraph, CrewAI, AutoGen, custom systems with bespoke orchestration.
Related: AI Agent
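The split described above can be sketched with plain functions, each standing in for a separately prompted LLM agent; the routing logic and role names are illustrative.

```python
def researcher(task):
    """Specialist 1: look up information (stand-in for an LLM + RAG)."""
    return f"findings on: {task}"

def writer(findings):
    """Specialist 2: prepare the answer from the findings."""
    return f"draft based on {findings}"

def reviewer(draft):
    """Specialist 3: validate before anything is sent."""
    return draft + " [approved]"

def route(request):
    """Receptionist: orchestrate the pipeline for this request."""
    findings = researcher(request)
    draft = writer(findings)
    return reviewer(draft)

result = route("refund policy question")
```

Frameworks like LangGraph or CrewAI replace these function calls with graph edges or crew definitions, but the underlying idea of narrow roles passing work along is the same.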

Open source vs closed models

Open source (Llama, Mistral, Qwen): weights are public, you can self-host. Closed (GPT-4, Claude, Gemini): only accessible via API.

Closed models are usually more capable, but you send your data to a third party (mitigated by zero-data-retention clauses in enterprise plans). Open-source models can be self-hosted on your own infrastructure: maximum privacy and no per-token cost, but they require GPUs and maintenance. Typical decision: start with a closed API (fast and cheap to test) and migrate to open source when volume or privacy justifies it. Llama 3.3 70B and Qwen 2.5 72B come close to GPT-4 on many tasks.
Related: LLM

Reasoning model

LLM designed to reason step by step before answering. Examples: o1, o3 (OpenAI), Claude Opus with extended thinking, DeepSeek R1.

Reasoning models generate explicit chains of thought before the final answer, significantly improving on maths, logic, programming and complex planning. They are slower and more expensive than fast models (GPT-4o, Claude Haiku) but indispensable for tasks that require deep analysis: process optimisation, hard debugging, legal or strategic analysis. Typical use: combine a reasoner for planning with a fast model for execution.
Related: LLM

DO YOU NEED TO BRING AI INTO YOUR BUSINESS?

We help you identify which processes in your company can be improved by AI and design a tailored solution.

Request a free consultation