Animations · the math, in motion

Step-through animations, made with Manim.

Each one is a small film: a single idea, frame by frame. Most have a step-through viewer so you can stop, read the caption, and move on.

All animations · 6
Gradient Descent · Finding the minimum step by step · Click next to see the algorithm in action
Step-through
optimization · beginner

Gradient Descent Visualization

Watch how gradient descent iteratively finds the minimum of a loss function by following the steepest downhill direction.

45s
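The update rule behind this animation can be sketched in a few lines. This is a minimal illustration, not the code used to render the film: the quadratic loss, the starting point, the learning rate, and the function name `gradient_descent` are all assumptions chosen for the example.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=50):
    """Follow the negative gradient from x0 and return every visited point."""
    path = [x0]
    x = x0
    for _ in range(steps):
        # Move against the gradient, i.e. in the steepest downhill direction.
        x = x - learning_rate * grad(x)
        path.append(x)
    return path


# Hypothetical loss f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
# Its minimum sits at x = 3.
path = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
final_x = path[-1]  # ends very close to 3
```

Each entry in `path` corresponds to one step of the kind the viewer lets you click through. With a learning rate that is too large for the loss, the same loop would overshoot instead of settling.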
Neural Network · Forward Pass Visualization · Step through to see data flow through each layer
Step-through
neural nets · beginner

Neural Network Forward Pass

See how input data flows through a neural network layer by layer, with activations lighting up as data is transformed.

40s
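A forward pass is a loop over layers, each applying a weighted sum and an activation. The sketch below assumes a tiny two-layer network with made-up weights; the helper names `dense` and `forward` are hypothetical and not taken from the animation.

```python
def relu(values):
    """Element-wise ReLU on a list of numbers."""
    return [max(0.0, v) for v in values]


def identity(values):
    """No activation: pass values through unchanged."""
    return values


def dense(weights, bias, inputs):
    """One fully connected layer: weights @ inputs + bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, bias)
    ]


def forward(layers, inputs):
    """Run inputs through every layer and keep each layer's activations."""
    activations = [inputs]
    for weights, bias, activation in layers:
        inputs = activation(dense(weights, bias, inputs))
        activations.append(inputs)
    return activations


# Illustrative network: 2 inputs -> 2 hidden units (ReLU) -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),
    ([[1.0, 1.0]], [0.5], identity),
]
activations = forward(layers, [2.0, 1.0])
# activations[1] is the hidden layer, activations[2] is the output.
```

Keeping every intermediate list in `activations` mirrors what a step-through view shows: the values at each layer, not just the final output.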
Matrix Transformations · What matrices really do to space · Navigate to see how matrices warp the coordinate grid
Step-through
linear algebra · beginner

Matrix Transformations

Visualize how matrices transform 2D space — stretching, rotating, and shearing the coordinate grid.

35s
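The three effects named above each correspond to a simple 2×2 matrix. The following sketch applies one of each to a single point; the specific matrices and the sample point are assumptions picked for illustration.

```python
import math


def apply(matrix, point):
    """Multiply a 2x2 matrix by a 2D point."""
    (a, b), (c, d) = matrix
    x, y = point
    return (a * x + b * y, c * x + d * y)


def rotation(theta):
    """Counter-clockwise rotation by theta radians."""
    return (
        (math.cos(theta), -math.sin(theta)),
        (math.sin(theta), math.cos(theta)),
    )


stretch = ((2.0, 0.0), (0.0, 1.0))  # doubles x, leaves y alone
shear = ((1.0, 1.0), (0.0, 1.0))    # slides x by the value of y
quarter_turn = rotation(math.pi / 2)

point = (1.0, 2.0)
stretched = apply(stretch, point)     # (2.0, 2.0)
sheared = apply(shear, point)         # (3.0, 2.0)
rotated = apply(quarter_turn, point)  # approximately (-2.0, 1.0)
```

Applying the same matrix to every grid intersection, rather than to one point, gives the warped grid the animation draws.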
Backpropagation · The chain rule in action · Computation graph: x × w + b → L · Step through to see gradients flow backward
Step-through
backpropagation · intermediate

Backpropagation Explained

The chain rule in action — watch gradients flow backward through a computation graph to update weights during training.

50s
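For a graph as small as x × w + b → L, the backward pass can be written out by hand. The sketch below assumes a squared-error loss against a target value; the thumbnail only labels the loss as L, so that choice, the input values, and the learning rate are illustrative assumptions.

```python
def forward_backward(x, w, b, target):
    """Forward pass, then the chain rule applied node by node in reverse."""
    # Forward: multiply node, add node, then the (assumed) squared-error loss.
    product = x * w
    y = product + b
    loss = (y - target) ** 2

    # Backward: start at the loss and multiply local derivatives together.
    dloss_dy = 2 * (y - target)
    dloss_dproduct = dloss_dy * 1.0  # d(product + b) / d(product) = 1
    dloss_db = dloss_dy * 1.0        # d(product + b) / d(b) = 1
    dloss_dw = dloss_dproduct * x    # d(x * w) / d(w) = x
    return loss, dloss_dw, dloss_db


loss, grad_w, grad_b = forward_backward(x=2.0, w=0.5, b=1.0, target=4.0)

# One gradient-descent update using the gradients from the backward pass.
learning_rate = 0.1
new_w = 0.5 - learning_rate * grad_w
new_b = 1.0 - learning_rate * grad_b
```

Here the prediction is 2.0 against a target of 4.0, so both gradients are negative and the update raises `w` and `b`, which moves the prediction toward the target.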
Activation Functions · The key to neural network power · Without activation: Linear + Linear = still Linear, since W₂(W₁x) = (W₂W₁)x is a single matrix multiply · With activation: σ(W₂ · σ(W₁x)) is non-linear and, given enough hidden units, can approximate any continuous function on a bounded domain (Universal Approximation Theorem)
Step-through
activations · beginner

Activation Functions Compared

Compare ReLU, Sigmoid, Tanh, and GELU — understand why non-linearity is essential for neural networks.

35s
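The four functions compared here have short closed forms. The sketch below uses the exact erf-based definition of GELU (some libraries use a tanh approximation instead) and adds a scalar version of the "linear + linear is still linear" argument; the weights `w1` and `w2` are arbitrary values chosen for illustration.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    return math.tanh(x)


def gelu(x):
    # Exact form: x times the standard normal CDF evaluated at x.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


ACTIVATIONS = {"ReLU": relu, "Sigmoid": sigmoid, "Tanh": tanh, "GELU": gelu}
samples = (-2.0, 0.0, 2.0)
table = {name: [fn(x) for x in samples] for name, fn in ACTIVATIONS.items()}

# Scalar version of the collapse argument, with arbitrary weights.
w1, w2, x = 3.0, -2.0, -1.5
stacked_linear = w2 * (w1 * x)       # two linear layers
single_linear = (w2 * w1) * x        # one equivalent linear layer
with_activation = w2 * relu(w1 * x)  # ReLU in between breaks the equivalence
```

`stacked_linear` and `single_linear` are equal for every input, while `with_activation` differs whenever the ReLU clips a negative value.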
Self-Attention Mechanism · How transformers understand context · Sentence: “The model learns context” · Tokens: The, model, learns, context · Highlight: strongest attention
Step-through
transformers · advanced

Self-Attention in Transformers

A detailed walkthrough of scaled dot-product self-attention, including Q/K/V projections, score scaling, softmax weights, and multi-head intuition.

120s
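Scaled dot-product self-attention can be sketched for a single head in plain Python. The token embeddings below are made up, and the Q/K/V projection matrices are set to the identity purely to keep the example small; a trained model would learn them.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_transposed = [list(col) for col in zip(*k)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_transposed)]
    weights = [softmax(row) for row in scores]
    return weights, matmul(weights, v)


# Four tokens with hypothetical 2D embeddings.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # placeholder projections, not learned
weights, output = self_attention(x, identity, identity, identity)
```

Each row of `weights` sums to 1 and says how much one token attends to every token, including itself. Multi-head attention runs several such heads with different projections and concatenates their outputs.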