AIMaks

AI and ML Demystified for Product Managers

30 min readvideoAI Literacy for PMs
1 of 18AI for Product Managers

AI and ML Demystified for Product Managers

Let's start with the unglamorous truth. In 2026, "AI" is a marketing word stretched over a very specific kind of software: functions, fit to data, that produce outputs with statistical guarantees instead of deterministic ones. A neural network is a function. A large language model (LLM) is a function that, given some text, predicts the next token — and does it well enough at scale that the output looks like reasoning. That's it. Once you internalize this, every PM decision about AI features — scoping, eval, pricing, risk — becomes clearer. This lesson strips away the mystique so you can talk to your engineers and your buyers without bluffing.

1. The Term Hierarchy You Will Hear Every Day

When a vendor pitches you "AI", they usually mean one of six things. Knowing which is the difference between a realistic roadmap and a hallucinated one.

TermWhat it actually is2026 example
AIUmbrella marketing term for "software that does something humans used to do"Anything in a press release
ML (Machine Learning)Algorithms that learn patterns from data instead of being explicitly programmedSpotify recommendations, Stripe Radar fraud scoring
Deep LearningML using multi-layer neural networks; the dominant approach since ~2015iPhone Face ID, Tesla Autopilot vision
LLMA neural network trained to predict the next token over enormous text corporaGPT-5, Claude Sonnet 4.5, Gemini 2.5
GenAILLM-style models that generate output (text, images, audio, video) rather than classify itChatGPT, Midjourney, ElevenLabs, Sora
AgentAn LLM wrapped in a loop that calls tools, makes plans, and acts on the worldCursor's agent mode, Claude Code, Devin, Linear's auto-triage

2. The Three Eras (and Why It Matters Now)

Every AI feature in your product traces back to one of three paradigms. Each is still in use; each has a different cost shape and reliability profile.

EraApproachPM-relevant exampleReliability
1. RulesHand-coded if/then logic1990s spam filters, regex-based input validationDeterministic, brittle
2. Classical MLStatistical models on hand-crafted featuresNetflix recommender (pre-2015), credit scoring, ad CTRProbabilistic, well-calibrated, narrow
3. Deep Learning / GenAINeural networks; features learned end-to-endChatGPT, GitHub Copilot, image searchProbabilistic, broad, sometimes wrong in surprising ways

The shift from era 2 to era 3 between roughly 2017 and 2023 is why your roadmap looks different than it did five years ago. Era-3 models can ship features that were science fiction before — but they fail in non-obvious ways that era-2 models did not.

3. What "Training a Model" Actually Means

Training is the most expensive, least understood phrase in the AI vocabulary. Strip the jargon and it's three things:

  1. You have data. Pairs of (input, desired output), or just raw text the model is supposed to complete.
  2. You have a function with billions of knobs. A neural network. Each knob is a number called a parameter or weight.
  3. You adjust the knobs to make the function's output match the data better. The math for "adjust the knobs" is called gradient descent; the score it tries to improve is called the loss function.

That's the whole game. Training GPT-5 means turning ~$500M of GPUs, electricity, and human-curated text into a fixed set of ~2-5 trillion numbers. Once trained, those numbers are the model. Copy them and you have copied the model. They are an asset on a balance sheet.

code
data + compute + labels --[gradient descent]--> trained weights

                                                 (the "model")

4. Inference vs Training: The Cost Asymmetry

Training is a one-time (or periodic) bill. Inference — running the model to serve a user — is a recurring bill that scales with usage. The two have very different shapes and very different optimization levers.

TrainingInference
FrequencyOnce per model versionEvery user request
2026 frontier cost500M per top-tier model15 per 1M output tokens
Scales withModel size, dataset sizeActive users × tokens per user
PM leverBuild vs buy vs fine-tuneModel tier, prompt length, caching
LatencyHidden from userVisible — sub-second matters

This asymmetry is why "use a frontier model from a vendor" beats "train your own" for almost every product team in 2026. OpenAI / Anthropic / Google amortize training across millions of customers; you'd be paying $500M to ship one feature.

5. The 2026 Model Landscape

Three categories matter. You'll mix them.

CategoryExamples (2026)Cost / 1M tokensLatencyControl
Frontier closedGPT-5, Claude Opus 4, Gemini 2.5 Ultra75 outMed-HighLow (API only)
Mid-tier closedGPT-4o-mini, Claude Haiku 4, Gemini Flash3 outLowLow
Open weightsLlama 4, DeepSeek V3, Qwen 3, Mistral Large 3~2 hosted; near-zero self-hostedVariableHigh (you run it)

Three obvious tactical implications for your roadmap:

  • Default to closed mid-tier. 80% of product features (classification, summarization, extraction, drafting) work on Claude Haiku / GPT-4o-mini at <$1/1M tokens. Reserve frontier models for the hardest 20%.
  • Reach for frontier when quality is the bottleneck. Coding agents, complex reasoning, legal/medical drafting — pay the 75 per million.
  • Reach for open weights when control is the bottleneck. Healthcare data residency, on-prem enterprise, latency-sensitive on-device, predictable cost at extreme scale.

6. The Vocabulary Trap

PMs lose deals and credibility by saying "AI" when they mean one specific capability. Train yourself to translate.

What people sayWhat they usually mean
"AI-powered search"Vector embeddings + semantic ranking, often plus an LLM rewrite
"AI assistant"An LLM with a system prompt, sometimes tools, sometimes RAG over your docs
"AI agent"An LLM in a tool-calling loop with a stopping condition
"AI summary"An LLM call with a "summarize this:" prompt
"Personalization with AI"Could be a recommender (classical ML) or an LLM — these are different bills
"AI copilot"An LLM-driven UI that suggests next actions, like Cursor or Microsoft Copilot

7. Four Common PM Misconceptions

8. The Mental Model to Take Away

AI in 2026 is software with three differences from the software you grew up with:

  1. Statistical, not deterministic. Same input, slightly different output — possibly very different output if temperature is non-zero. Build for this from day one with retries, evals, and human-in-the-loop where it matters.
  2. Compute is the COGS. Every feature has a $/request unit cost that scales with prompt + output length. Pricing, packaging, and roadmap all need to account for it.
  3. Capabilities move under your feet. The model you scoped against in Q1 is not the model you ship against in Q4. Build in version pinning, regression evals, and a "capability creep" review cadence.
Up next · Key AI Concepts Every PM Should Know