Each one is a small film: a single idea, frame by frame. Most have a step-through viewer so you can stop, read the caption, and move on.
Watch gradient descent iteratively approach a minimum of a loss function by stepping in the direction of steepest descent, the negative gradient.
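For readers who want the update rule in code, here is a minimal sketch, not the code behind the animation. The function name `gradient_descent`, the learning rate, the step count, and the quadratic loss f(x) = (x - 3)^2 are all assumptions chosen for illustration.

```python
def gradient_descent(grad, x0, lr=0.1, steps=50):
    """Return every iterate x_0, x_1, ..., x_steps (one per animation frame)."""
    xs = [x0]
    x = x0
    for _ in range(steps):
        # Step against the gradient: the direction of steepest descent.
        x = x - lr * grad(x)
        xs.append(x)
    return xs


# Hypothetical loss f(x) = (x - 3)^2 with gradient 2 * (x - 3); its minimum is at x = 3.
path = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(path[0], path[1], path[-1])
```

Each step shrinks the distance to the minimum by a constant factor here because the loss is quadratic; on a non-convex loss the same loop may settle in a local minimum instead.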
See how input data flows through a neural network layer by layer, with activations lighting up as data is transformed.
Visualize how matrices transform 2D space — stretching, rotating, and shearing the coordinate grid.
The chain rule in action: watch gradients flow backward through a computation graph, giving each weight the derivative used to update it during training.
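The backward pass can be sketched by hand on a tiny graph. This is an illustrative example under assumed names: a single linear unit y = w * x + b with squared-error loss, where `forward_backward` and the input values are hypothetical and not taken from the animation.

```python
def forward_backward(w, b, x, t):
    """One forward and one backward pass through loss = (w * x + b - t) ** 2."""
    # Forward pass: compute each node of the graph in order.
    y = w * x + b
    loss = (y - t) ** 2

    # Backward pass: apply the chain rule from the loss back to the parameters.
    dloss_dy = 2.0 * (y - t)   # d(loss)/dy
    dloss_dw = dloss_dy * x    # dy/dw = x
    dloss_db = dloss_dy * 1.0  # dy/db = 1
    return loss, dloss_dw, dloss_db


loss, dw, db = forward_backward(w=0.5, b=0.1, x=2.0, t=1.0)
print(loss, dw, db)
```

A training step would then subtract a small multiple of `dw` and `db` from `w` and `b`; that update belongs to the optimizer rather than to backpropagation itself.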
Compare ReLU, Sigmoid, Tanh, and GELU — understand why non-linearity is essential for neural networks.
A detailed walkthrough of scaled dot-product self-attention, including Q/K/V projections, score scaling, softmax weights, and multi-head intuition.
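The core computation, softmax(Q K^T / sqrt(d_k)) V, can be sketched in plain Python. This is a simplified illustration under stated assumptions: it takes Q, K, and V as already-projected lists of rows, handles a single head, and omits masking; the helper names `softmax` and `attention` and the toy inputs are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Single-head scaled dot-product attention on lists of row vectors."""
    d_k = len(K[0])
    outputs, weights = [], []
    for q in Q:
        # Score each key against this query, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of the value rows.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))])
    return outputs, weights


# Toy inputs (assumed): two tokens with 2-dimensional queries, keys, and values.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out, attn = attention(Q, K, V)
print(attn)
print(out)
```

Multi-head attention would run this computation once per head on separately projected Q, K, and V, then concatenate the per-head outputs.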