A
- Agent / AI Agent
- An AI system that can autonomously take actions — browsing the web, writing code, calling APIs — to complete a goal, rather than just responding to a single prompt. Agents typically combine an LLM with tools and memory.
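As a toy sketch of that loop, the code below alternates between choosing an action and executing it. A stubbed decision function stands in for the LLM, and the two arithmetic "tools" are made up for illustration; a real agent would call a model API and parse its tool-call output.

```python
# Hypothetical tool registry; real agents expose web search, code execution, etc.
TOOLS = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
}

def stub_llm(goal, history):
    # Stand-in for the model: decides the next action from the goal and
    # prior tool results. A real LLM would emit this as structured output.
    if not history:
        return ("call", "add", (2, 3))
    if len(history) == 1:
        return ("call", "multiply", (history[0], 10))
    return ("finish", history[-1])

def run_agent(goal):
    history = []  # the agent's "memory" of tool results so far
    while True:
        action = stub_llm(goal, history)
        if action[0] == "finish":
            return action[1]
        _, tool, args = action
        history.append(TOOLS[tool](*args))
```

Running `run_agent("compute (2+3)*10")` walks the loop twice (add, then multiply) before finishing with 50 — the key point being that the model drives the control flow, not a fixed script.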
- Alignment
- The challenge of ensuring AI systems behave in ways that are consistent with human values and intentions. A major area of AI safety research.
- Attention Mechanism
- A core component of transformer models that allows the model to weigh the relevance of different words in a sequence when generating output. The basis of modern LLMs.
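The mechanism can be sketched in plain Python as scaled dot-product attention, softmax(QK^T / sqrt(d)) · V. The tiny 2-dimensional vectors here are illustrative; real models use learned projections and many attention heads.

```python
import math

def softmax(xs):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    # For each query, score every key, normalize the scores into weights,
    # and return the weighted average of the values.
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

Because the weights sum to 1, each output row is a blend of the value vectors, tilted toward the keys most similar to the query.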
C
- Context Window
- The maximum amount of text (measured in tokens) a model can process at once — both input and output combined. Larger context windows allow models to handle longer documents and conversations.
- Chain-of-Thought (CoT)
- A prompting technique where a model is encouraged to reason step-by-step before giving a final answer, improving accuracy on complex tasks.
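A minimal illustration of the difference — the question and wording below are made up, and real CoT prompts often include worked examples as well:

```python
question = "A store sells pens at 3 for $2. How much do 12 pens cost?"

# Direct prompt: the model must answer in one shot.
direct = f"{question}\nAnswer:"

# Chain-of-thought prompt: ask the model to show its reasoning first.
cot = f"{question}\nLet's think step by step, then give the final answer."
```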
- CUDA
- NVIDIA's parallel computing platform, widely used to accelerate AI model training and inference on GPUs. CUDA's dominance is a key reason NVIDIA leads the AI hardware market.
E
- Embedding
- A numerical representation of text (or images, audio, etc.) as a vector in high-dimensional space. Similar concepts have similar vectors, enabling semantic search and retrieval.
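A sketch of why "similar concepts have similar vectors" matters: semantic search boils down to comparing vectors, typically with cosine similarity. The 3-dimensional vectors below are hand-picked toys; real embeddings have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 means same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up embeddings: "cat" and "kitten" point in similar directions.
cat = [0.9, 0.8, 0.1]
kitten = [0.85, 0.75, 0.2]
car = [0.1, 0.2, 0.9]
```

Here `cosine_similarity(cat, kitten)` comes out far higher than `cosine_similarity(cat, car)`, which is exactly the property retrieval systems exploit.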
- Emergent Behavior
- Capabilities that appear in large models that were not explicitly trained for and were not present in smaller versions. Examples include multi-step reasoning and code generation.
F
- Fine-tuning
- Further training a pre-trained model on a specific dataset to adapt it for a particular task or domain. Less expensive than training from scratch.
- Foundation Model
- A large model trained on broad data that can be adapted to many downstream tasks. GPT-4, Claude, and Gemini are examples. Also called a base model.
G
- Generative AI
- AI systems that produce new content — text, images, code, audio, video — rather than just classifying or predicting from existing data.
- GPU (Graphics Processing Unit)
- Hardware originally designed for rendering graphics, now essential for training and running AI models due to its ability to perform many parallel computations simultaneously.
- Guardrails
- Constraints applied to AI systems to prevent harmful, biased, or off-topic outputs. Can be implemented via training, prompting, or separate filtering layers.
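A separate filtering layer can be as simple as a post-generation check. The topic list below is a hypothetical policy for illustration, not any vendor's actual guardrail:

```python
# Hypothetical policy: topics this application refuses to discuss.
BLOCKED_TOPICS = {"medical diagnosis", "legal advice"}

def apply_guardrail(model_output: str) -> str:
    # Post-generation filter: replace outputs touching disallowed topics.
    lowered = model_output.lower()
    for topic in BLOCKED_TOPICS:
        if topic in lowered:
            return "I can't help with that topic."
    return model_output
```

Production guardrails are usually more sophisticated (classifier models, not keyword matches), but the layered structure is the same: the filter sits between the model and the user.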
H
- Hallucination
- When an AI model generates plausible-sounding but factually incorrect or fabricated information. A known limitation of current LLMs.
I
- Inference
- Running a trained model to generate outputs from new inputs. Distinct from training. Inference cost and speed are key factors for deploying AI in production.
- In-Context Learning
- The ability of a model to adapt its behavior based on examples or instructions provided directly in the prompt, without any weight updates.
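A sketch of few-shot in-context learning: the task (sentiment labeling, with made-up reviews) is conveyed entirely through examples in the prompt, with no training involved.

```python
examples = [
    ("I loved this movie!", "positive"),
    ("Terrible service, never again.", "negative"),
]

def few_shot_prompt(examples, query):
    # Build a few-shot prompt: the model infers the task from the
    # example pattern and completes the final "Sentiment:" line.
    lines = [f"Review: {text}\nSentiment: {label}" for text, label in examples]
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)
```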
L
- LLM (Large Language Model)
- A neural network trained on large amounts of text data to understand and generate human language. The technology behind ChatGPT, Claude, Gemini, and others.
- LoRA (Low-Rank Adaptation)
- A parameter-efficient fine-tuning technique that trains only a small set of additional weights, making fine-tuning much cheaper than full model training.
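The savings are easy to see by counting trainable weights. For a frozen weight matrix W, LoRA learns a low-rank update B·A; the dimensions and rank below are illustrative.

```python
def full_finetune_params(d_in, d_out):
    # Full fine-tuning updates the entire weight matrix W of shape (d_out, d_in).
    return d_in * d_out

def lora_params(d_in, d_out, rank):
    # LoRA trains two small matrices, A (rank x d_in) and B (d_out x rank);
    # the adapted weight is W + B @ A, and W itself stays frozen.
    return rank * d_in + d_out * rank
```

For a 4096x4096 layer at rank 8, that is 65,536 trainable weights instead of about 16.8 million — a reduction of more than 250x for that layer.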
M
- Mixture of Experts (MoE)
- A model architecture where only a subset of the model's parameters are activated for any given input, allowing for very large models that are efficient at inference.
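A sketch of the routing step: a gating network scores all experts, and only the top-k run for a given token. The scores below are made up.

```python
def route(gate_scores, top_k=2):
    # Pick the top-k experts by gate score; only those are activated
    # for this token, so most parameters sit idle at inference time.
    ranked = sorted(range(len(gate_scores)),
                    key=lambda i: gate_scores[i], reverse=True)
    return sorted(ranked[:top_k])

# 8 experts, but only 2 are activated per token.
scores = [0.1, 0.05, 0.7, 0.02, 0.4, 0.3, 0.01, 0.2]
```

With these scores, `route(scores)` activates experts 2 and 4 — 2 of 8 experts, so roughly a quarter of the expert parameters do any work for this token.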
- Multimodal
- AI models that can process and generate multiple types of data — text, images, audio, video — rather than just one modality.
P
- Parameters
- The numerical weights inside a neural network that are learned during training. Model size is often described by parameter count (e.g., 70B = 70 billion parameters).
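Parameter count translates directly into memory via bytes per parameter. A rough back-of-the-envelope helper (weights only; activations, KV cache, and optimizer state add more on top):

```python
def model_memory_gb(n_params, bytes_per_param=2.0):
    # Approximate memory to hold the weights alone.
    # fp16 = 2 bytes/param, int8 = 1, 4-bit = 0.5.
    return n_params * bytes_per_param / 1e9
```

So a 70B-parameter model needs about 140 GB just for fp16 weights, or about 35 GB when quantized to 4 bits — which is why quantization matters for running models on consumer hardware.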
- Prompt Engineering
- The practice of crafting inputs to an AI model to elicit better, more accurate, or more useful outputs. A key skill for working with LLMs.
- Pre-training
- The initial phase of training a foundation model on a large, general dataset — typically text from the internet and books — before any task-specific fine-tuning.
Q
- Quantization
- Reducing the numerical precision of model weights (e.g., from 32-bit to 4-bit) to shrink model size and speed up inference, with a small trade-off in accuracy.
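A sketch of symmetric quantization and the small reconstruction error it introduces, using a toy weight list:

```python
def quantize(weights, bits=8):
    # Symmetric quantization: scale floats so the largest magnitude
    # maps to the largest representable integer, then round.
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Reconstruct approximate floats; the rounding error is at most ~scale.
    return [qi * scale for qi in q]
```

Storing the integers plus one scale factor is what shrinks the model; the rounding step is the accuracy trade-off the definition mentions, and it grows as the bit width drops.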
R
- RAG (Retrieval-Augmented Generation)
- A technique that combines a retrieval system (fetching relevant documents from a knowledge base) with a generative model to produce more accurate, grounded responses.
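A toy sketch of the retrieve-then-generate flow, using word overlap in place of real embeddings (the documents are made up):

```python
docs = [
    "The Eiffel Tower is in Paris and was completed in 1889.",
    "Python is a programming language created by Guido van Rossum.",
    "The Great Wall of China is over 13,000 miles long.",
]

def retrieve(query, docs):
    # Toy retrieval by word overlap; real systems use embeddings
    # plus a vector database for semantic matching at scale.
    q_words = set(query.lower().split())
    return max(docs, key=lambda d: len(q_words & set(d.lower().split())))

def build_prompt(query, docs):
    # Ground the model's answer in the retrieved document.
    context = retrieve(query, docs)
    return f"Context: {context}\n\nQuestion: {query}\nAnswer using only the context."
```

The generative model then answers from the supplied context rather than from memory alone, which is what makes RAG responses more grounded.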
- Reasoning Model
- A class of LLMs optimized for multi-step logical reasoning, often by generating an internal "thinking" process before producing a final answer. Examples: OpenAI o1, DeepSeek R1.
- RLHF (Reinforcement Learning from Human Feedback)
- A training technique where human raters score model outputs and those scores are used to fine-tune the model to produce more preferred responses. Used heavily to make LLMs more helpful and safe.
S
- System Prompt
- Instructions given to an LLM at the start of a conversation (typically by the application developer, not the end user) to set behavior, tone, and constraints.
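In chat-style APIs this is commonly expressed as a message with the `system` role placed ahead of the user's turns; the content below is a made-up example of that common shape, not any specific vendor's API.

```python
messages = [
    # Set by the application developer; typically invisible to the end user.
    {"role": "system", "content": "You are a support bot for Acme Inc. "
                                  "Be concise and never discuss competitors."},
    # The end user's turn.
    {"role": "user", "content": "How do I reset my password?"},
]
```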
- Synthetic Data
- Data generated by AI models rather than collected from the real world, used to train or fine-tune other models — especially when real data is scarce or sensitive.
T
- Token
- The basic unit of text that LLMs process. A token is roughly 3–4 characters or 0.75 words in English. Model pricing and context windows are measured in tokens.
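The rule of thumb translates into a quick estimator, useful for rough cost and context-window budgeting. This is approximate only; real BPE tokenizers split text differently, so use the model's own tokenizer for exact counts.

```python
def estimate_tokens(text: str) -> int:
    # Rough rule of thumb for English: ~4 characters per token.
    return max(1, round(len(text) / 4))
```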
- Transformer
- The neural network architecture that underpins modern LLMs, introduced in the 2017 paper "Attention Is All You Need." Uses attention mechanisms to process sequences in parallel.
V
- Vector Database
- A database optimized for storing and querying embeddings. Used in RAG systems to quickly find the most semantically similar documents to a query. Examples: Pinecone, Weaviate, pgvector.
- Vibe Coding
- An informal term for using AI coding assistants (like Claude or GitHub Copilot) to write code through natural language, often without deep technical knowledge of the implementation details.