Types of ML Systems

Machine Learning Fundamentals, 2 of 40, 25 min read

"Machine learning" is an umbrella over a dozen distinct problem shapes that each have their own algorithms, evaluation, and pitfalls. Knowing the taxonomy is the difference between picking the right tool in five minutes and burning a week chasing the wrong one. This lesson is the catalogue: the four big learning paradigms, the practical task types within each, and a decision tree for picking the right shape for a new problem.

1. The Four Learning Paradigms

Paradigm	Training data	Goal	Examples
Supervised	(input, label) pairs	Predict label from new input	Spam detection, price prediction
Unsupervised	Inputs only	Discover structure	Clustering customers, anomaly detection
Self-supervised	Inputs only, but the task is to predict missing parts	Learn general representations	BERT (predict masked words), MAE (predict masked patches)
Reinforcement	State, action, reward over time	Maximise long-term reward	Game-playing, robotics, ad serving

Most production ML in 2026 is supervised. Self-supervised learning drives the foundation models (LLMs, vision encoders) that we fine-tune, unsupervised methods cover most of the remainder, and RL is a smaller but high-impact niche.
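To make the self-supervised row concrete, here is a minimal sketch of how a training pair can be derived from unlabelled text by hiding one word. The `[MASK]` string and the `make_masked_example` helper are illustrative assumptions; real masked-language-model preprocessing (BERT's included) is more involved.

```python
MASK = "[MASK]"  # placeholder token; the exact string is an assumption


def make_masked_example(tokens, position):
    """Turn an unlabelled token list into an (input, target) pair by hiding one token."""
    if not 0 <= position < len(tokens):
        raise IndexError("position out of range")
    masked = list(tokens)        # copy so the caller's list is left untouched
    target = masked[position]    # the hidden word becomes the label
    masked[position] = MASK
    return masked, target


sentence = ["the", "cat", "sat", "on", "the", "mat"]
inputs, label = make_masked_example(sentence, 2)
print(inputs)  # ['the', 'cat', '[MASK]', 'on', 'the', 'mat']
print(label)   # sat
```

No human wrote the label: it comes from the input itself, which is what separates self-supervised from supervised training data.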

2. Supervised: Regression vs Classification

Property	Regression	Classification
Output	A real number (price, temperature, ETA)	A discrete label (spam / not spam, dog breed)
Loss	MSE, MAE, Huber	Cross-entropy, hinge, focal
Metric	R², RMSE, MAE	Accuracy, F1, AUC, log loss
Decision	Use the predicted value	Apply a threshold to predicted probability

Sometimes the same problem can be framed either way: "predict click probability" is technically regression on [0, 1] but is usually framed as classification with cross-entropy loss. The framing affects which losses, metrics, and calibration techniques apply.
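As a rough illustration of why the framing matters, the sketch below scores the same click-probability predictions with a regression loss (squared error) and a classification loss (cross-entropy). The function names and the example probabilities are made up for this sketch.

```python
import math


def squared_error(y_true, p):
    """Regression-style loss on a single prediction."""
    return (y_true - p) ** 2


def log_loss(y_true, p, eps=1e-12):
    """Binary cross-entropy on a single prediction; p is clipped to avoid log(0)."""
    p = min(max(p, eps), 1 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


# A click happened (y = 1). Compare a mildly wrong and a confidently wrong prediction.
for p in (0.4, 0.01):
    print(p, round(squared_error(1, p), 4), round(log_loss(1, p), 4))
```

Squared error is bounded by 1 here, while cross-entropy keeps growing as the model becomes more confidently wrong, so the two framings push a model in different directions on the same data.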

3. Classification Sub-Types

Binary — one of two classes (spam / not, fraud / legit). The default and simplest case.
Multi-class — one of K mutually exclusive classes (dog breed). Use softmax + cross-entropy.
Multi-label — N independent yes/no decisions (this image has both a dog and a frisbee). Use sigmoid per class + BCE.
Imbalanced binary — one class dominates (fraud is 0.1% of transactions). Reach for class weighting, focal loss, or threshold tuning.
Open-set — at inference, an example may belong to none of the training classes. Adds an "unknown" threshold or out-of-distribution detector.
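The multi-class, multi-label, and open-set cases above differ mainly in how raw scores are turned into decisions. The following is a minimal sketch under assumed values: the logits, the 0.5 label cutoff, and the 0.9 confidence cutoff are all illustrative, and thresholding the top softmax probability is only one simple way to approximate an "unknown" class.

```python
import math


def softmax(logits):
    """Multi-class: probabilities compete and sum to 1."""
    m = max(logits)                              # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Multi-label: each class gets its own independent probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_open_set(probs, min_confidence=0.9):
    """Return the best class index, or None ("unknown") if the model is not confident."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best if probs[best] >= min_confidence else None


logits = [2.0, 1.0, -1.0]                        # made-up scores for three classes

multi_class = softmax(logits)
multi_label = [sigmoid(z) for z in logits]

predicted_class = max(range(len(logits)), key=lambda i: multi_class[i])
predicted_labels = [i for i, p in enumerate(multi_label) if p >= 0.5]

print(predicted_class)                # a single winning class
print(predicted_labels)               # possibly several labels at once
print(predict_open_set(multi_class))  # None when no class is confident enough
```

The same logits yield one class under softmax but two labels under per-class sigmoids, and the open-set wrapper declines to answer because no softmax probability reaches the cutoff.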

4. Unsupervised: Three Common Goals

Goal	Methods	Use case
Clustering	K-Means, DBSCAN, Gaussian mixtures, hierarchical	Customer segmentation, document grouping
Dimensionality reduction	PCA, t-SNE, UMAP	Visualisation, denoising, compression
Anomaly detection	Isolation forest, one-class SVM, autoencoder reconstruction	Fraud / fault / intrusion detection

Sections 5 and 7 of the course cover all three hands-on.
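As a small taste of the clustering row, here is a sketch of K-Means (Lloyd's algorithm) on one-dimensional data. The `kmeans_1d` helper, the spend values, and the starting centroids are all assumptions chosen to keep the example deterministic; production code would normally use a library implementation with smarter initialisation.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Lloyd's algorithm on 1-D data with caller-supplied initial centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for x in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
            clusters[nearest].append(x)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]   # an empty cluster keeps its centroid
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:               # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, clusters


spend = [12, 15, 14, 90, 95, 88, 13, 92]             # made-up monthly spend values
centroids, clusters = kmeans_1d(spend, centroids=[0.0, 100.0])
print(centroids)  # [13.5, 91.25]
print(clusters)   # [[12, 15, 14, 13], [90, 95, 88, 92]]
```

No labels are supplied anywhere: the two customer segments emerge from the structure of the inputs alone.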

5. Online vs Batch Learning

Property	Batch	Online
Training	Uses the full dataset, often retrained periodically	One example or mini-batch at a time, continuously
Adaptation	Slow; retrain to incorporate new data	Fast; the model updates as data arrives
Catastrophic forgetting	Not an issue	A real risk; old patterns can be overwritten
Examples	Most production ML	Real-time recsys, ad bidding, IoT anomaly detection

Default to batch training. Reach for online only when freshness requirements or scale rule out batch retraining.
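The trade-off in the table can be sketched with a one-parameter model `y = w * x`. Everything here is an assumption made for illustration: the helper names, the learning rate, and the synthetic drift from `y = 2x` to `y = 3x`.

```python
def fit_batch(xs, ys):
    """Least-squares slope for y = w * x, computed from the full dataset at once."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def update_online(w, x, y, lr=0.1):
    """One SGD step on squared error for a single (x, y) example."""
    return w + lr * (y - w * x) * x


xs = [1.0, 2.0, 3.0] * 20
old_ys = [2.0 * x for x in xs]   # original relationship: y = 2x
new_ys = [3.0 * x for x in xs]   # after drift:           y = 3x

w_batch = fit_batch(xs, old_ys)  # trained once, then frozen until the next retrain

w_online = 0.0
for x, y in zip(xs, old_ys):     # learns the old pattern first
    w_online = update_online(w_online, x, y)
for x, y in zip(xs, new_ys):     # then keeps updating as the data drifts
    w_online = update_online(w_online, x, y)

print(round(w_batch, 3))   # still reflects the old data
print(round(w_online, 3))  # has moved to the new relationship
```

The online model adapts without a retrain, but it also no longer remembers the old slope, which is the forgetting risk named in the table.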

6. Instance-Based vs Model-Based

Property	Instance-based	Model-based
Training	Memorise the dataset	Fit parameters that summarise the dataset
Prediction	Look up similar examples	Apply the learned function
Examples	k-NN, kernel methods	Linear models, tree-based models, neural networks
Inference cost	Grows with training-set size	Fixed by model size; independent of training-set size

Model-based dominates production. Instance-based methods (especially k-NN) remain strong baselines and surface again in modern retrieval systems (vector search ≈ k-NN over embeddings).
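The contrast can be sketched on a toy regression problem. The helpers `knn_predict` and `fit_line` and the five training points are assumptions for illustration; the k-NN version does a brute-force scan, whereas real vector search uses approximate indexes.

```python
def knn_predict(train, query, k=3):
    """Instance-based: keep every (x, y) pair and average the k nearest at query time."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - query))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def fit_line(train):
    """Model-based: compress the data into two numbers (slope, intercept)."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in train)
    variance = sum((x - mean_x) ** 2 for x, _ in train)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


train = [(1, 3), (2, 5), (3, 7), (4, 9), (5, 11)]   # made-up points on y = 2x + 1

slope, intercept = fit_line(train)                  # training happens once, up front

print(knn_predict(train, 3.2))      # needs the whole training set at prediction time
print(slope * 3.2 + intercept)      # needs only the two fitted numbers
```

After `fit_line` runs, the training data could be discarded; `knn_predict` cannot work without it, which is why its inference cost scales with the size of the stored set.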

7. Parametric vs Non-Parametric

Parametric — fixed number of parameters regardless of dataset size (linear regression, logistic regression, fixed-architecture neural networks). Inference cost independent of data size.
Non-parametric — model complexity grows with data (decision trees that grow until pure leaves, k-NN, kernel methods, Gaussian processes). More flexible; slower; needs more regularisation.

"Non-parametric" is misleading — these models often have more parameters than parametric ones. The distinction is whether the parameter count is fixed or grows with data.
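A back-of-the-envelope count makes the "fixed versus growing" distinction visible. This sketch assumes a linear model with one weight per feature plus a bias, and a k-NN model that stores every training row with its target; both helper names are hypothetical.

```python
def linear_param_count(n_features):
    """Weights plus one bias; does not depend on how many rows were seen."""
    return n_features + 1


def knn_stored_values(n_rows, n_features):
    """k-NN keeps every training row: its features plus its target."""
    return n_rows * (n_features + 1)


for n_rows in (100, 10_000, 1_000_000):
    print(n_rows, linear_param_count(10), knn_stored_values(n_rows, 10))
# 100 11 1100
# 10000 11 110000
# 1000000 11 11000000
```

The linear model stays at 11 numbers however much data arrives, while the k-NN model's stored state grows in step with the dataset.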
