AI and ML Demystified for Product Managers

30 min readvideoAI Literacy for PMs

1 of 18AI for Product Managers

AI and ML Demystified for Product Managers

Let's start with the unglamorous truth. In 2026, "AI" is a marketing word stretched over a very specific kind of software: functions, fit to data, that produce outputs with statistical guarantees instead of deterministic ones. A neural network is a function. A large language model (LLM) is a function that, given some text, predicts the next token — and does it well enough at scale that the output looks like reasoning. That's it. Once you internalize this, every PM decision about AI features — scoping, eval, pricing, risk — becomes clearer. This lesson strips away the mystique so you can talk to your engineers and your buyers without bluffing.

1. The Term Hierarchy You Will Hear Every Day

When a vendor pitches you "AI", they usually mean one of six things. Knowing which is the difference between a realistic roadmap and a hallucinated one.

Term	What it actually is	2026 example
AI	Umbrella marketing term for "software that does something humans used to do"	Anything in a press release
ML (Machine Learning)	Algorithms that learn patterns from data instead of being explicitly programmed	Spotify recommendations, Stripe Radar fraud scoring
Deep Learning	ML using multi-layer neural networks; the dominant approach since ~2015	iPhone Face ID, Tesla Autopilot vision
LLM	A neural network trained to predict the next token over enormous text corpora	GPT-5, Claude Sonnet 4.5, Gemini 2.5
GenAI	LLM-style models that generate output (text, images, audio, video) rather than classify it	ChatGPT, Midjourney, ElevenLabs, Sora
Agent	An LLM wrapped in a loop that calls tools, makes plans, and acts on the world	Cursor's agent mode, Claude Code, Devin, Linear's auto-triage

2. The Three Eras (and Why It Matters Now)

Every AI feature in your product traces back to one of three paradigms. Each is still in use; each has a different cost shape and reliability profile.

Era	Approach	PM-relevant example	Reliability
1. Rules	Hand-coded if/then logic	1990s spam filters, regex-based input validation	Deterministic, brittle
2. Classical ML	Statistical models on hand-crafted features	Netflix recommender (pre-2015), credit scoring, ad CTR	Probabilistic, well-calibrated, narrow
3. Deep Learning / GenAI	Neural networks; features learned end-to-end	ChatGPT, GitHub Copilot, image search	Probabilistic, broad, sometimes wrong in surprising ways

The shift from era 2 to era 3 between roughly 2017 and 2023 is why your roadmap looks different than it did five years ago. Era-3 models can ship features that were science fiction before — but they fail in non-obvious ways that era-2 models did not.

3. What "Training a Model" Actually Means

Training is the most expensive, least understood phrase in the AI vocabulary. Strip the jargon and it's three things:

You have data. Pairs of (input, desired output), or just raw text the model is supposed to complete.
You have a function with billions of knobs. A neural network. Each knob is a number called a parameter or weight.
You adjust the knobs to make the function's output match the data better. The math for "adjust the knobs" is called gradient descent; the score it tries to improve is called the loss function.

That's the whole game. Training GPT-5 means turning ~$500M of GPUs, electricity, and human-curated text into a fixed set of ~2-5 trillion numbers. Once trained, those numbers are the model. Copy them and you have copied the model. They are an asset on a balance sheet.

code

data + compute + labels --[gradient descent]--> trained weights
                                                       ↓
                                                 (the "model")

4. Inference vs Training: The Cost Asymmetry

Training is a one-time (or periodic) bill. Inference — running the model to serve a user — is a recurring bill that scales with usage. The two have very different shapes and very different optimization levers.

	Training	Inference
Frequency	Once per model version	Every user request
2026 frontier cost	$50 M -$ 500M per top-tier model	$0.50 -$ 15 per 1M output tokens
Scales with	Model size, dataset size	Active users × tokens per user
PM lever	Build vs buy vs fine-tune	Model tier, prompt length, caching
Latency	Hidden from user	Visible — sub-second matters

This asymmetry is why "use a frontier model from a vendor" beats "train your own" for almost every product team in 2026. OpenAI / Anthropic / Google amortize training across millions of customers; you'd be paying $500M to ship one feature.

5. The 2026 Model Landscape

Three categories matter. You'll mix them.

Category	Examples (2026)	Cost / 1M tokens	Latency	Control
Frontier closed	GPT-5, Claude Opus 4, Gemini 2.5 Ultra	$10 -$ 75 out	Med-High	Low (API only)
Mid-tier closed	GPT-4o-mini, Claude Haiku 4, Gemini Flash	$0.50 -$ 3 out	Low	Low
Open weights	Llama 4, DeepSeek V3, Qwen 3, Mistral Large 3	~ $0.30 -$ 2 hosted; near-zero self-hosted	Variable	High (you run it)

Three obvious tactical implications for your roadmap:

Default to closed mid-tier. 80% of product features (classification, summarization, extraction, drafting) work on Claude Haiku / GPT-4o-mini at <$1/1M tokens. Reserve frontier models for the hardest 20%.
Reach for frontier when quality is the bottleneck. Coding agents, complex reasoning, legal/medical drafting — pay the $15 -$ 75 per million.
Reach for open weights when control is the bottleneck. Healthcare data residency, on-prem enterprise, latency-sensitive on-device, predictable cost at extreme scale.

6. The Vocabulary Trap

PMs lose deals and credibility by saying "AI" when they mean one specific capability. Train yourself to translate.

What people say	What they usually mean
"AI-powered search"	Vector embeddings + semantic ranking, often plus an LLM rewrite
"AI assistant"	An LLM with a system prompt, sometimes tools, sometimes RAG over your docs
"AI agent"	An LLM in a tool-calling loop with a stopping condition
"AI summary"	An LLM call with a "summarize this:" prompt
"Personalization with AI"	Could be a recommender (classical ML) or an LLM — these are different bills
"AI copilot"	An LLM-driven UI that suggests next actions, like Cursor or Microsoft Copilot

7. Four Common PM Misconceptions

8. The Mental Model to Take Away

AI in 2026 is software with three differences from the software you grew up with:

Statistical, not deterministic. Same input, slightly different output — possibly very different output if temperature is non-zero. Build for this from day one with retries, evals, and human-in-the-loop where it matters.
Compute is the COGS. Every feature has a $/request unit cost that scales with prompt + output length. Pricing, packaging, and roadmap all need to account for it.
Capabilities move under your feet. The model you scoped against in Q1 is not the model you ship against in Q4. Build in version pinning, regression evals, and a "capability creep" review cadence.

Up next · Key AI Concepts Every PM Should Know