A

Agent / AI Agent
An AI system that can autonomously take actions — browsing the web, writing code, calling APIs — to complete a goal, rather than just responding to a single prompt. Agents typically combine an LLM with tools and memory.
Alignment
The challenge of ensuring AI systems behave in ways that are consistent with human values and intentions. A major area of AI safety research.
Attention Mechanism
A core component of transformer models that allows the model to weigh the relevance of different words in a sequence when generating output. The basis of modern LLMs.
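The core computation can be sketched in a few lines. This is a minimal single-head scaled dot-product attention in NumPy, not a full transformer layer (no masking, no learned projections, no multiple heads):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Toy single-head attention: mixes values weighted by query-key similarity."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    # Softmax over each row (numerically stabilized)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V               # weighted mix of the value vectors

# Three 4-dimensional token representations attending to each other
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(X, X, X)
print(out.shape)  # (3, 4): one output vector per input token
```

Each output row is a blend of all value vectors, with the blend weights determined by how strongly each token "attends" to the others.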

C

Context Window
The maximum amount of text (measured in tokens) a model can process at once — both input and output combined. Larger context windows allow models to handle longer documents and conversations.
Chain-of-Thought (CoT)
A prompting technique where a model is encouraged to reason step-by-step before giving a final answer, improving accuracy on complex tasks.
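The difference between a direct prompt and a chain-of-thought prompt is just phrasing; this sketch contrasts the two (the question and wording are illustrative, not from any specific benchmark):

```python
question = "A store sells pens at 3 for $2. How much do 12 pens cost?"

# Direct prompt: the model is expected to answer immediately.
direct_prompt = f"{question}\nAnswer:"

# Chain-of-thought prompt: the model is nudged to reason before answering.
cot_prompt = f"{question}\nLet's think step by step, then give the final answer."

print(cot_prompt)
```

On multi-step problems like this one, the added instruction tends to make the model write out intermediate steps (12 pens = 4 groups of 3, 4 x $2 = $8) before committing to an answer.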
CUDA
NVIDIA's parallel computing platform, widely used to accelerate AI model training and inference on GPUs. CUDA's dominance is a key reason NVIDIA leads the AI hardware market.

E

Embedding
A numerical representation of text (or images, audio, etc.) as a vector in high-dimensional space. Similar concepts have similar vectors, enabling semantic search and retrieval.
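The "similar concepts have similar vectors" idea is usually measured with cosine similarity. A toy sketch with made-up 4-dimensional vectors (real embeddings have hundreds or thousands of dimensions and come from an embedding model):

```python
import numpy as np

def cosine_similarity(a, b):
    """1.0 = same direction (similar meaning), near 0.0 = unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings, hand-crafted so related words point the same way
cat    = np.array([0.90, 0.80, 0.10, 0.00])
kitten = np.array([0.85, 0.90, 0.15, 0.05])
car    = np.array([0.10, 0.00, 0.90, 0.80])

print(cosine_similarity(cat, kitten))  # high: related concepts
print(cosine_similarity(cat, car))     # low: unrelated concepts
```

Semantic search is just this comparison done at scale: embed the query, then rank stored vectors by similarity.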
Emergent Behavior
Capabilities that appear in large models that were not explicitly trained for and were not present in smaller versions. Examples include multi-step reasoning and code generation.

F

Fine-tuning
Further training a pre-trained model on a specific dataset to adapt it for a particular task or domain. Less expensive than training from scratch.
Foundation Model
A large model trained on broad data that can be adapted to many downstream tasks. GPT-4, Claude, and Gemini are examples. Also called a base model.

G

Generative AI
AI systems that produce new content — text, images, code, audio, video — rather than just classifying or predicting from existing data.
GPU (Graphics Processing Unit)
Hardware originally designed for rendering graphics, now essential for training and running AI models due to its ability to perform many computations in parallel.
Guardrails
Constraints applied to AI systems to prevent harmful, biased, or off-topic outputs. Can be implemented via training, prompting, or separate filtering layers.

H

Hallucination
When an AI model generates plausible-sounding but factually incorrect or fabricated information. A known limitation of current LLMs.

I

Inference
Running a trained model to generate outputs from new inputs. Distinct from training. Inference cost and speed are key factors for deploying AI in production.
In-Context Learning
The ability of a model to adapt its behavior based on examples or instructions provided directly in the prompt, without any weight updates.
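Few-shot prompting is the most common form of in-context learning: the examples live inside the prompt itself, and no weights change. An illustrative sketch:

```python
# The two labeled examples teach the task format inside the prompt;
# the model is expected to continue the pattern for the third review.
few_shot_prompt = """Classify the sentiment as positive or negative.

Review: The battery lasts all day. -> positive
Review: It broke after a week. -> negative
Review: Setup was quick and painless. ->"""

print(few_shot_prompt)
```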

L

LLM (Large Language Model)
A neural network trained on large amounts of text data to understand and generate human language. The technology behind ChatGPT, Claude, Gemini, and others.
LoRA (Low-Rank Adaptation)
A parameter-efficient fine-tuning technique that trains only a small set of additional weights, making fine-tuning much cheaper than full model training.
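The core trick is representing the weight update as a product of two thin matrices. A NumPy sketch of the shapes and the parameter savings (no actual training loop, and dimensions are illustrative):

```python
import numpy as np

d, rank = 1024, 8
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d))             # frozen pretrained weight (never updated)
A = rng.normal(size=(rank, d)) * 0.01   # small trainable matrix
B = np.zeros((d, rank))                 # starts at zero, so the update starts at zero

W_adapted = W + B @ A                   # effective weight used during fine-tuning

full_params = W.size
lora_params = A.size + B.size
print(f"trainable: {lora_params:,} vs full: {full_params:,} "
      f"({100 * lora_params / full_params:.2f}%)")
```

Only A and B are trained; here that is 16,384 parameters instead of about a million for the full matrix, and the savings grow with model size.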

M

Mixture of Experts (MoE)
A model architecture in which only a subset of the model's parameters is activated for any given input, allowing total parameter counts to grow very large while keeping per-token inference cost low.
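A minimal sketch of the routing idea: a small router scores the experts and only the top-k run. Real MoE layers use learned neural experts and train the router jointly; here the "experts" are just linear maps for illustration:

```python
import numpy as np

def moe_forward(x, experts, router_w, top_k=2):
    """Route input x through only the top_k highest-scoring experts."""
    scores = router_w @ x                 # one score per expert
    chosen = np.argsort(scores)[-top_k:]  # indices of the best-scoring experts
    # Softmax over just the chosen experts' scores
    w = np.exp(scores[chosen])
    w /= w.sum()
    # Only top_k experts run; the rest are skipped entirely
    return sum(wi * experts[i](x) for wi, i in zip(w, chosen))

rng = np.random.default_rng(0)
d, n_experts = 4, 8
mats = [rng.normal(size=(d, d)) for _ in range(n_experts)]
experts = [lambda x, M=M: M @ x for M in mats]   # toy linear "experts"
router_w = rng.normal(size=(n_experts, d))

y = moe_forward(rng.normal(size=d), experts, router_w)
print(y.shape)  # (4,)
```

With 8 experts and top_k=2, only a quarter of the expert compute runs per input, which is the efficiency the definition refers to.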
Multimodal
AI models that can process and generate multiple types of data — text, images, audio, video — rather than just one modality.

P

Parameters
The numerical weights inside a neural network that are learned during training. Model size is often described by parameter count (e.g., 70B = 70 billion parameters).
Pre-training
The initial phase of training a foundation model on a large, general dataset — typically text from the internet and books — before any task-specific fine-tuning.
Prompt Engineering
The practice of crafting inputs to an AI model to elicit better, more accurate, or more useful outputs. A key skill for working with LLMs.

Q

Quantization
Reducing the numerical precision of model weights (e.g., from 32-bit to 4-bit) to shrink model size and speed up inference, with a small trade-off in accuracy.

R

RAG (Retrieval-Augmented Generation)
A technique that combines a retrieval system (fetching relevant documents from a knowledge base) with a generative model to produce more accurate, grounded responses.
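The pipeline can be sketched end to end in a toy form: embed the query, rank stored documents by similarity, and paste the best match into the prompt. The documents and 3-dimensional "embeddings" here are hand-made placeholders; a real system would use an embedding model and a vector database:

```python
import numpy as np

docs = ["Returns are accepted within 30 days.",
        "Shipping takes 3-5 business days.",
        "Support is available 24/7 via chat."]
# Hypothetical embeddings, one row per document
doc_vecs = np.array([[0.9, 0.1, 0.0],
                     [0.1, 0.9, 0.1],
                     [0.0, 0.1, 0.9]])

def retrieve(query_vec, k=1):
    """Return the k documents whose embeddings best match the query."""
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec))
    return [docs[i] for i in np.argsort(sims)[-k:]]

query_vec = np.array([0.85, 0.15, 0.05])  # pretend embedding of a refund question
context = "\n".join(retrieve(query_vec))

# The retrieved text is placed in the prompt so the model answers from it,
# grounding the response instead of relying on memorized training data.
prompt = (f"Answer using only this context:\n{context}\n\n"
          f"Question: What is the refund policy?")
print(prompt)
```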
Reasoning Model
A class of LLMs optimized for multi-step logical reasoning, often by generating an internal "thinking" process before producing a final answer. Examples: OpenAI o1, DeepSeek R1.
RLHF (Reinforcement Learning from Human Feedback)
A training technique where human raters score model outputs and those scores are used to fine-tune the model to produce more preferred responses. Used heavily to make LLMs more helpful and safe.

S

System Prompt
Instructions given to an LLM at the start of a conversation (typically by the application developer, not the end user) to set behavior, tone, and constraints.
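Most chat APIs express this with a role-based message list; the system message comes first and is supplied by the application. An illustrative sketch (the company and instructions are made up):

```python
# Role-based message format common to most chat-model APIs.
# The "system" message is set by the developer, not typed by the end user.
messages = [
    {"role": "system",
     "content": ("You are a support assistant for Acme Corp. "   # hypothetical app
                 "Be concise, stay on topic, and never give legal advice.")},
    {"role": "user", "content": "How do I reset my password?"},
]
print(messages[0]["content"])
```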
Synthetic Data
Data generated by AI models rather than collected from the real world, used to train or fine-tune other models — especially when real data is scarce or sensitive.

T

Token
The basic unit of text that LLMs process. A token is roughly 3–4 characters or 0.75 words in English. Model pricing and context windows are measured in tokens.
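The characters-per-token ratio gives a quick back-of-envelope estimate of prompt size; real counts require the model's own tokenizer, since different models split text differently:

```python
def estimate_tokens(text):
    """Rough rule of thumb for English text: about 4 characters per token."""
    return max(1, len(text) // 4)

text = "Large language models process text as tokens, not characters."
print(estimate_tokens(text))
```

Estimates like this are handy for budgeting against a context window or pricing tier before making an API call.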
Transformer
The neural network architecture that underpins modern LLMs, introduced in the 2017 paper "Attention Is All You Need." Uses attention mechanisms to process sequences in parallel.

V

Vector Database
A database optimized for storing and querying embeddings. Used in RAG systems to quickly find the most semantically similar documents to a query. Examples: Pinecone, Weaviate, pgvector.
Vibe Coding
An informal term for using AI coding assistants (like Claude or GitHub Copilot) to write code through natural language, often without deep technical knowledge of the implementation details.