AIMaks

Types of ML Systems


"Machine learning" is an umbrella over a dozen distinct problem shapes that each have their own algorithms, evaluation, and pitfalls. Knowing the taxonomy is the difference between picking the right tool in five minutes and burning a week chasing the wrong one. This lesson is the catalogue: the four big learning paradigms, the practical task types within each, and a decision tree for picking the right shape for a new problem.

1. The Four Learning Paradigms

| Paradigm | Training data | Goal | Examples |
| --- | --- | --- | --- |
| Supervised | (input, label) pairs | Predict the label for a new input | Spam detection, price prediction |
| Unsupervised | Inputs only | Discover structure | Clustering customers, anomaly detection |
| Self-supervised | Inputs only; the task is to predict missing parts of the input | Learn general representations | BERT (predict masked words), MAE (masked autoencoder: predict masked image patches) |
| Reinforcement | State, action, reward over time | Maximise long-term reward | Game-playing, robotics, ad serving |

Most production ML in 2026 is supervised. Self-supervised learning drives the foundation models (LLMs, vision encoders) that we fine-tune, and unsupervised methods cover most of the remainder. RL is a smaller but high-impact niche.
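To make the self-supervised row concrete: the labels are manufactured from the inputs themselves. The sketch below is illustrative only. It assumes a whitespace tokenizer and a hypothetical `make_masked_examples` helper, and it masks every position once; real systems such as BERT use subword tokens and random masking.

```python
MASK = "[MASK]"

def make_masked_examples(sentence):
    """Turn one unlabeled sentence into (masked_input, target) pairs.

    Hypothetical helper for illustration: each token is hidden once,
    and the hidden token becomes the label to predict.
    """
    tokens = sentence.split()
    examples = []
    for i, target in enumerate(tokens):
        masked = tokens[:i] + [MASK] + tokens[i + 1:]
        examples.append((" ".join(masked), target))
    return examples

pairs = make_masked_examples("the cat sat")
for masked_input, target in pairs:
    print(f"{masked_input!r} -> {target!r}")
```

No human labelled anything here, yet the result is an ordinary supervised dataset of (input, label) pairs.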

2. Supervised: Regression vs Classification

| Property | Regression | Classification |
| --- | --- | --- |
| Output | A real number (price, temperature, ETA) | A discrete label (spam / not spam, dog breed) |
| Loss | MSE, MAE, Huber | Cross-entropy, hinge, focal |
| Metric | R², RMSE, MAE | Accuracy, F1, AUC, log loss |
| Decision | Use the predicted value | Apply a threshold to the predicted probability |

Sometimes the same problem can be framed either way: "predict click probability" is technically regression on [0, 1] but is usually framed as classification with cross-entropy loss. The framing affects which losses, metrics, and calibration techniques apply.
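As a rough illustration of how the framing changes the numbers you optimise and report, the sketch below scores the same click-probability predictions with a regression loss (MSE) and a classification loss (log loss, i.e. binary cross-entropy), then applies a decision threshold. The labels, predictions, and the 0.5 threshold are made up for the example.

```python
import math

def mse(y_true, p_pred):
    """Regression view: mean squared error between label and prediction."""
    return sum((y - p) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)

def log_loss(y_true, p_pred, eps=1e-12):
    """Classification view: binary cross-entropy, clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

clicks = [1, 0, 0, 1]            # made-up labels: 1 = clicked
p_click = [0.9, 0.2, 0.6, 0.4]   # made-up predicted click probabilities

print("MSE     :", round(mse(clicks, p_click), 4))
print("log loss:", round(log_loss(clicks, p_click), 4))

# Classification also needs a decision rule on top of the probability.
threshold = 0.5
decisions = [int(p >= threshold) for p in p_click]
print("decisions:", decisions)
```

Log loss penalises confident mistakes much more heavily than MSE does, which is one reason the classification framing is the usual choice when calibrated probabilities matter.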

3. Classification Sub-Types

  • Binary — one of two classes (spam / not, fraud / legit). The default and simplest case.
  • Multi-class — one of K mutually exclusive classes (dog breed). Use softmax + cross-entropy.
  • Multi-label — N independent yes/no decisions (this image has both a dog and a frisbee). Use sigmoid per class + BCE.
  • Imbalanced binary — one class dominates (fraud is 0.1% of transactions). Reach for class weighting, focal loss, or threshold tuning.
  • Open-set — at inference, an example may belong to none of the training classes. Adds an "unknown" threshold or out-of-distribution detector.
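The multi-class and multi-label bullets differ only in the output layer, which a small sketch can make visible. The logits and class names below are made up; in practice they would come from a model's final layer.

```python
import math

def softmax(logits):
    """Multi-class: scores compete, probabilities sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Multi-label: each class gets its own independent probability."""
    return 1.0 / (1.0 + math.exp(-z))

labels = ["dog", "frisbee", "cat"]
logits = [2.0, 1.0, -1.0]  # made-up scores, one per class

# Multi-class: pick the single most probable class.
probs_mc = softmax(logits)
predicted_class = labels[probs_mc.index(max(probs_mc))]

# Multi-label: keep every class whose own probability clears 0.5.
probs_ml = [sigmoid(z) for z in logits]
predicted_labels = [name for name, p in zip(labels, probs_ml) if p >= 0.5]

print("multi-class:", predicted_class)
print("multi-label:", predicted_labels)
```

With the same logits, softmax forces a single winner while per-class sigmoids let "dog" and "frisbee" both be present. The 0.5 cut-off is an assumption; for imbalanced problems it is usually tuned per class.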

4. Unsupervised: Three Common Goals

| Goal | Methods | Use case |
| --- | --- | --- |
| Clustering | K-Means, DBSCAN, Gaussian mixtures, hierarchical | Customer segmentation, document grouping |
| Dimensionality reduction | PCA, t-SNE, UMAP | Visualisation, denoising, compression |
| Anomaly detection | Isolation forest, one-class SVM, autoencoder reconstruction | Fraud / fault / intrusion detection |

Sections 5 and 7 of the course cover all three, hands-on.
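To show what "discover structure" means when there are no labels, here is a minimal sketch of K-Means (Lloyd's algorithm) in plain Python. It assumes 2-D points and caller-supplied initial centroids; the data is made up, and a real project would normally use a library implementation with better initialisation.

```python
def kmeans(points, centroids, n_iters=10):
    """Alternate assignment and update steps until the centroids stop moving."""
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = []
        for c, members in zip(centroids, clusters):
            if members:
                new_centroids.append((
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                ))
            else:
                new_centroids.append(c)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(1.0, 1.0), (8.0, 8.0)])
print("centroids:", centroids)
print("cluster sizes:", [len(c) for c in clusters])
```

The algorithm never sees a label; the two groups emerge purely from distances between inputs.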

5. Online vs Batch Learning

| Property | Batch | Online |
| --- | --- | --- |
| Training | Uses the full dataset, often retrained periodically | One example or mini-batch at a time, continuously |
| Adaptation | Slow; retrain to incorporate new data | Fast; the model updates as data arrives |
| Catastrophic forgetting | Not an issue | A real risk; old patterns can be overwritten |
| Examples | Most production ML | Real-time recsys, ad bidding, IoT anomaly detection |

Default to batch training. Reach for online only when freshness requirements or scale rule out batch retraining.
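The trade-off in the table can be seen on a toy problem: estimating a level from a stream whose value shifts halfway through. This is a sketch under made-up data; `OnlineMean` is a hypothetical class that uses a fixed step size `lr`, which is what lets it adapt and also what makes it forget.

```python
def batch_fit(values):
    """Batch: recompute the estimate from the full dataset."""
    return sum(values) / len(values)

class OnlineMean:
    """Online: nudge the estimate towards each new example as it arrives."""

    def __init__(self, lr=0.1):
        self.lr = lr          # fixed step size (an assumption for this sketch)
        self.estimate = 0.0

    def update(self, x):
        self.estimate += self.lr * (x - self.estimate)
        return self.estimate

# Made-up stream: the underlying level jumps from 10 to 20 halfway through.
stream = [10.0] * 50 + [20.0] * 50

online = OnlineMean(lr=0.1)
for x in stream:
    online.update(x)

print("batch estimate :", batch_fit(stream))
print("online estimate:", round(online.estimate, 2))
```

The batch estimate averages old and new data, so it lags the shift until the next retrain on fresher data. The online estimate tracks the new level closely but retains almost nothing of the first fifty examples, a small-scale version of the forgetting risk noted in the table.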

6. Instance-Based vs Model-Based

| Property | Instance-based | Model-based |
| --- | --- | --- |
| Training | Memorise the dataset | Fit parameters that summarise the dataset |
| Prediction | Look up similar examples | Apply the learned function |
| Examples | k-NN, kernel methods | Linear, tree-based, neural networks |
| Inference cost | Grows with training-set size | Constant per example |

Model-based dominates production. Instance-based methods (especially k-NN) remain strong baselines and surface again in modern retrieval systems (vector search ≈ k-NN over embeddings).
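A side-by-side sketch of the two styles on a one-feature binary problem. Both functions are hypothetical illustrations on made-up data: `knn_predict` keeps every training pair and searches them at prediction time, while `fit_threshold` compresses the training set into a single number.

```python
def knn_predict(train, query, k=3):
    """Instance-based: vote among the k stored examples nearest to the query."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - query))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

def fit_threshold(train):
    """Model-based: summarise the data as the midpoint between class means."""
    xs0 = [x for x, y in train if y == 0]
    xs1 = [x for x, y in train if y == 1]
    return (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2

def threshold_predict(threshold, query):
    """Apply the learned function; the training data is no longer needed."""
    return int(query > threshold)

train = [(1.0, 0), (1.5, 0), (2.0, 0), (6.0, 1), (6.5, 1), (7.0, 1)]
threshold = fit_threshold(train)

for query in (2.5, 5.5):
    print(query, "k-NN:", knn_predict(train, query),
          "threshold model:", threshold_predict(threshold, query))
```

Each `knn_predict` call scans the whole training set, so its cost grows with the data, whereas `threshold_predict` is a single comparison regardless of how many examples were used to fit it.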

7. Parametric vs Non-Parametric

  • Parametric — fixed number of parameters regardless of dataset size (linear regression, logistic regression, fixed-architecture neural networks). Inference cost independent of data size.
  • Non-parametric — model complexity grows with data (decision trees that grow until pure leaves, k-NN, kernel methods, Gaussian processes). More flexible; slower; needs more regularisation.

"Non-parametric" is misleading — these models often have more parameters than parametric ones. The distinction is whether the parameter count is fixed or grows with data.
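One way to see the fixed-versus-growing distinction is to count how many numbers each fitted model has to store as the dataset grows. The sketch below uses made-up noiseless data and two hypothetical helpers: an ordinary least-squares line and a 1-nearest-neighbour regressor whose "model" is the stored training set.

```python
def fit_line(xs, ys):
    """Parametric: least squares for y = a*x + b. Always two parameters."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return (a, mean_y - a * mean_x)

def fit_nearest_neighbour(xs, ys):
    """Non-parametric: the fitted model is the training set itself."""
    return list(zip(xs, ys))

for n in (10, 1000):
    xs = [float(i) for i in range(n)]
    ys = [2.0 * x + 1.0 for x in xs]   # made-up data on the line y = 2x + 1
    line = fit_line(xs, ys)
    memory = fit_nearest_neighbour(xs, ys)
    print(f"n={n}: line stores {len(line)} numbers, "
          f"1-NN stores {2 * len(memory)} numbers")
```

The line needs two numbers whether it sees ten points or a thousand; the nearest-neighbour model's storage scales with the data, which is the sense in which it is "non-parametric".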

8. The Decision Tree

9. Common Mistakes in Picking a Shape

10. The Mental Model
