A

Agent / AI Agent
An AI system that can autonomously take actions — browsing the web, writing code, calling APIs — to complete a goal, rather than just responding to a single prompt. Agents typically combine an LLM with tools and memory.
Alignment
The challenge of ensuring AI systems behave in ways that are consistent with human values and intentions. A major area of AI safety research.
Attention Mechanism
A core component of transformer models that allows the model to weigh the relevance of different words in a sequence when generating output. The basis of modern LLMs.
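The core computation can be sketched in a few lines. This is a minimal single-head scaled dot-product attention in NumPy, not a full transformer layer (no masking, no learned projections, no multiple heads):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Toy single-head attention: mixes values weighted by query-key similarity."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    # Softmax over each row (numerically stabilized)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V               # weighted mix of the value vectors

# Three 4-dimensional token representations attending to each other
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(X, X, X)
print(out.shape)  # (3, 4): one output vector per input token
```

Each output row is a blend of all value vectors, with the blend weights determined by how strongly each token "attends" to the others.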

C

Context Window
The maximum amount of text (measured in tokens) a model can process at once — both input and output combined. Larger context windows allow models to handle longer documents and conversations.
Chain-of-Thought (CoT)
A prompting technique where a model is encouraged to reason step-by-step before giving a final answer, improving accuracy on complex tasks.
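The difference between a direct prompt and a chain-of-thought prompt is just phrasing; this sketch contrasts the two (the question and wording are illustrative, not from any specific benchmark):

```python
question = "A store sells pens at 3 for $2. How much do 12 pens cost?"

# Direct prompt: the model is expected to answer immediately.
direct_prompt = f"{question}\nAnswer:"

# Chain-of-thought prompt: the model is nudged to reason before answering.
cot_prompt = f"{question}\nLet's think step by step, then give the final answer."

print(cot_prompt)
```

On multi-step problems like this one, the added instruction tends to make the model write out intermediate steps (12 pens = 4 groups of 3, 4 x $2 = $8) before committing to an answer.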
CUDA
NVIDIA's parallel computing platform, widely used to accelerate AI model training and inference on GPUs. CUDA's dominance is a key reason NVIDIA leads the AI hardware market.

E

Embedding
A numerical representation of text (or images, audio, etc.) as a vector in high-dimensional space. Similar concepts have similar vectors, enabling semantic search and retrieval.
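The "similar concepts have similar vectors" idea is usually measured with cosine similarity. A toy sketch with made-up 4-dimensional vectors (real embeddings have hundreds or thousands of dimensions and come from an embedding model):

```python
import numpy as np

def cosine_similarity(a, b):
    """1.0 = same direction (similar meaning), near 0.0 = unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings, hand-crafted so related words point the same way
cat    = np.array([0.90, 0.80, 0.10, 0.00])
kitten = np.array([0.85, 0.90, 0.15, 0.05])
car    = np.array([0.10, 0.00, 0.90, 0.80])

print(cosine_similarity(cat, kitten))  # high: related concepts
print(cosine_similarity(cat, car))     # low: unrelated concepts
```

Semantic search is just this comparison done at scale: embed the query, then rank stored vectors by similarity.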
Emergent Behavior
Capabilities that appear in large models that were not explicitly trained for and were not present in smaller versions. Examples include multi-step reasoning and code generation.

F

Fine-tuning
Further training a pre-trained model on a specific dataset to adapt it for a particular task or domain. Less expensive than training from scratch.
Foundation Model
A large model trained on broad data that can be adapted to many downstream tasks. GPT-4, Claude, and Gemini are examples. Also called a base model.

G

Generative AI
AI systems that produce new content — text, images, code, audio, video — rather than just classifying or predicting from existing data.
GPU (Graphics Processing Unit)
Hardware originally designed for rendering graphics, now essential for training and running AI models due to its ability to perform many computations in parallel.
Guardrails
Constraints applied to AI systems to prevent harmful, biased, or off-topic outputs. Can be implemented via training, prompting, or separate filtering layers.

H

Hallucination
When an AI model generates plausible-sounding but factually incorrect or fabricated information. A known limitation of current LLMs.

I

Inference
Running a trained model to generate outputs from new inputs. Distinct from training. Inference cost and speed are key factors for deploying AI in production.
In-Context Learning
The ability of a model to adapt its behavior based on examples or instructions provided directly in the prompt, without any weight updates.
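Few-shot prompting is the most common form of in-context learning: the examples live inside the prompt itself, and no weights change. An illustrative sketch:

```python
# The two labeled examples teach the task format inside the prompt;
# the model is expected to continue the pattern for the third review.
few_shot_prompt = """Classify the sentiment as positive or negative.

Review: The battery lasts all day. -> positive
Review: It broke after a week. -> negative
Review: Setup was quick and painless. ->"""

print(few_shot_prompt)
```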

L

LLM (Large Language Model)
A neural network trained on large amounts of text data to understand and generate human language. The technology behind ChatGPT, Claude, Gemini, and others.
LoRA (Low-Rank Adaptation)
A parameter-efficient fine-tuning technique that trains only a small set of additional weights, making fine-tuning much cheaper than full model training.
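The core trick is representing the weight update as a product of two thin matrices. A NumPy sketch of the shapes and the parameter savings (no actual training loop, and dimensions are illustrative):

```python
import numpy as np

d, rank = 1024, 8
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d))             # frozen pretrained weight (never updated)
A = rng.normal(size=(rank, d)) * 0.01   # small trainable matrix
B = np.zeros((d, rank))                 # starts at zero, so the update starts at zero

W_adapted = W + B @ A                   # effective weight used during fine-tuning

full_params = W.size
lora_params = A.size + B.size
print(f"trainable: {lora_params:,} vs full: {full_params:,} "
      f"({100 * lora_params / full_params:.2f}%)")
```

Only A and B are trained; here that is 16,384 parameters instead of about a million for the full matrix, and the savings grow with model size.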

M

Mixture of Experts (MoE)
A model architecture in which only a subset of the model's parameters is activated for any given input, allowing total parameter counts to grow very large while keeping per-token inference cost low.
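A minimal sketch of the routing idea: a small router scores the experts and only the top-k run. Real MoE layers use learned neural experts and train the router jointly; here the "experts" are just linear maps for illustration:

```python
import numpy as np

def moe_forward(x, experts, router_w, top_k=2):
    """Route input x through only the top_k highest-scoring experts."""
    scores = router_w @ x                 # one score per expert
    chosen = np.argsort(scores)[-top_k:]  # indices of the best-scoring experts
    # Softmax over just the chosen experts' scores
    w = np.exp(scores[chosen])
    w /= w.sum()
    # Only top_k experts run; the rest are skipped entirely
    return sum(wi * experts[i](x) for wi, i in zip(w, chosen))

rng = np.random.default_rng(0)
d, n_experts = 4, 8
mats = [rng.normal(size=(d, d)) for _ in range(n_experts)]
experts = [lambda x, M=M: M @ x for M in mats]   # toy linear "experts"
router_w = rng.normal(size=(n_experts, d))

y = moe_forward(rng.normal(size=d), experts, router_w)
print(y.shape)  # (4,)
```

With 8 experts and top_k=2, only a quarter of the expert compute runs per input, which is the efficiency the definition refers to.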
Multimodal
AI models that can process and generate multiple types of data — text, images, audio, video — rather than just one modality.

P

Parameters
The numerical weights inside a neural network that are learned during training. Model size is often described by parameter count (e.g., 70B = 70 billion parameters).
Pre-training
The initial phase of training a foundation model on a large, general dataset — typically text from the internet and books — before any task-specific fine-tuning.
Prompt Engineering
The practice of crafting inputs to an AI model to elicit better, more accurate, or more useful outputs. A key skill for working with LLMs.

Q

Quantization
Reducing the numerical precision of model weights (e.g., from 32-bit to 4-bit) to shrink model size and speed up inference, with a small trade-off in accuracy.

R

RAG (Retrieval-Augmented Generation)
A technique that combines a retrieval system (fetching relevant documents from a knowledge base) with a generative model to produce more accurate, grounded responses.
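The pipeline can be sketched end to end in a toy form: embed the query, rank stored documents by similarity, and paste the best match into the prompt. The documents and 3-dimensional "embeddings" here are hand-made placeholders; a real system would use an embedding model and a vector database:

```python
import numpy as np

docs = ["Returns are accepted within 30 days.",
        "Shipping takes 3-5 business days.",
        "Support is available 24/7 via chat."]
# Hypothetical embeddings, one row per document
doc_vecs = np.array([[0.9, 0.1, 0.0],
                     [0.1, 0.9, 0.1],
                     [0.0, 0.1, 0.9]])

def retrieve(query_vec, k=1):
    """Return the k documents whose embeddings best match the query."""
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec))
    return [docs[i] for i in np.argsort(sims)[-k:]]

query_vec = np.array([0.85, 0.15, 0.05])  # pretend embedding of a refund question
context = "\n".join(retrieve(query_vec))

# The retrieved text is placed in the prompt so the model answers from it,
# grounding the response instead of relying on memorized training data.
prompt = (f"Answer using only this context:\n{context}\n\n"
          f"Question: What is the refund policy?")
print(prompt)
```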
Reasoning Model
A class of LLMs optimized for multi-step logical reasoning, often by generating an internal "thinking" process before producing a final answer. Examples: OpenAI o1, DeepSeek R1.
RLHF (Reinforcement Learning from Human Feedback)
A training technique where human raters score model outputs and those scores are used to fine-tune the model to produce more preferred responses. Used heavily to make LLMs more helpful and safe.

S

System Prompt
Instructions given to an LLM at the start of a conversation (typically by the application developer, not the end user) to set behavior, tone, and constraints.
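Most chat APIs express this with a role-based message list; the system message comes first and is supplied by the application. An illustrative sketch (the company and instructions are made up):

```python
# Role-based message format common to most chat-model APIs.
# The "system" message is set by the developer, not typed by the end user.
messages = [
    {"role": "system",
     "content": ("You are a support assistant for Acme Corp. "   # hypothetical app
                 "Be concise, stay on topic, and never give legal advice.")},
    {"role": "user", "content": "How do I reset my password?"},
]
print(messages[0]["content"])
```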
Synthetic Data
Data generated by AI models rather than collected from the real world, used to train or fine-tune other models — especially when real data is scarce or sensitive.

T

Token
The basic unit of text that LLMs process. A token is roughly 3–4 characters or 0.75 words in English. Model pricing and context windows are measured in tokens.
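The characters-per-token ratio gives a quick back-of-envelope estimate of prompt size; real counts require the model's own tokenizer, since different models split text differently:

```python
def estimate_tokens(text):
    """Rough rule of thumb for English text: about 4 characters per token."""
    return max(1, len(text) // 4)

text = "Large language models process text as tokens, not characters."
print(estimate_tokens(text))
```

Estimates like this are handy for budgeting against a context window or pricing tier before making an API call.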
Transformer
The neural network architecture that underpins modern LLMs, introduced in the 2017 paper "Attention Is All You Need." Uses attention mechanisms to process sequences in parallel.

V

Vector Database
A database optimized for storing and querying embeddings. Used in RAG systems to quickly find the most semantically similar documents to a query. Examples: Pinecone, Weaviate, pgvector.
Vibe Coding
An informal term for using AI coding assistants (like Claude or GitHub Copilot) to write code through natural language, often without deep technical knowledge of the implementation details.