Activation Functions Compared

Compare ReLU, Sigmoid, Tanh, and GELU, and see why non-linearity is essential for neural networks.
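
Each of the four functions fits in a few lines. Below is a minimal sketch in plain Python, assuming scalar float inputs. The function names are illustrative and do not belong to any particular library's API. GELU appears in its exact form x·Φ(x), where Φ is the standard normal CDF. Some libraries also offer a faster tanh-based approximation.

```python
import math

def relu(x: float) -> float:
    # Rectified Linear Unit: passes positive inputs through, zeroes out negatives.
    return max(0.0, x)

def sigmoid(x: float) -> float:
    # Logistic sigmoid, squashes inputs into (0, 1). Branching on the sign
    # avoids overflow in math.exp for large-magnitude negative inputs.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def tanh(x: float) -> float:
    # Hyperbolic tangent, squashes inputs into (-1, 1) and is zero-centred.
    return math.tanh(x)

def gelu(x: float) -> float:
    # Gaussian Error Linear Unit (exact form): x * Phi(x), with the standard
    # normal CDF Phi written via the error function.
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

if __name__ == "__main__":
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"x={x:+.1f}  relu={relu(x):.4f}  sigmoid={sigmoid(x):.4f}  "
              f"tanh={tanh(x):.4f}  gelu={gelu(x):.4f}")
```

The script prints each function at a few inputs. ReLU and GELU grow without bound for positive inputs, while Sigmoid and Tanh saturate toward fixed limits.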

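Activation Functions: the key to neural network power

Without an activation, stacking linear layers still gives a linear map: W₂(W₁x) = (W₂W₁)x. That is a single matrix multiply, so the extra layers add no expressive power.

With a non-linear activation σ between layers, as in σ(W₂ · σ(W₁x)), the network can represent non-linear functions. The Universal Approximation Theorem makes this precise. A network with one hidden layer, enough hidden units, and a suitable non-linear activation can approximate any continuous function on a closed, bounded input region to arbitrary accuracy. The theorem guarantees that such weights exist, not that training will find them.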
\text{Without: } f(x) = W_2(W_1 x) = (W_2 W_1)x = W'x
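
The collapse can be checked numerically. Here is a minimal sketch in plain Python, without NumPy so that it stays self-contained. All weights are arbitrary illustrative values that do not come from the original.

- Part 1 checks W₂(W₁x) = (W₂W₁)x for random matrices.
- Part 2 uses the hand-picked weights W₁ = [[1], [-1]] and W₂ = [[1, 1]]. Without an activation they collapse to W' = [[0]], the zero map. With a ReLU between the layers they compute relu(x) + relu(-x) = |x|. No single matrix can produce that, because it sends both 1 and -1 to 1.

```python
import math
import random

def matvec(W, x):
    # Matrix-vector product; W is a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    # Matrix-matrix product A @ B, with A of shape m x k and B of shape k x n.
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def relu_vec(v):
    # Element-wise ReLU.
    return [max(0.0, vi) for vi in v]

def linear_net(W1, W2, x):
    # Two stacked linear layers with no activation: W2 (W1 x).
    return matvec(W2, matvec(W1, x))

def relu_net(W1, W2, x):
    # Same weights, with a ReLU between the layers: W2 relu(W1 x).
    return matvec(W2, relu_vec(matvec(W1, x)))

# 1) General identity: W2 (W1 x) equals (W2 W1) x, up to float rounding.
random.seed(0)
W1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]  # 4x3
W2 = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(2)]  # 2x4
x = [0.5, -1.2, 2.0]
W_prime = matmul(W2, W1)  # W' = W2 W1, a single 2x3 matrix
stacked = linear_net(W1, W2, x)
collapsed = matvec(W_prime, x)
print("W2(W1 x):", stacked)
print("W' x    :", collapsed)
print("equal (up to rounding):",
      all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(stacked, collapsed)))

# 2) Hand-picked weights: without an activation the net collapses to W' = [[0]];
#    with a ReLU in between it computes |x| = relu(x) + relu(-x).
A1 = [[1.0], [-1.0]]  # 2x1
A2 = [[1.0, 1.0]]     # 1x2
print("W' =", matmul(A2, A1))
for t in (-2.0, -0.5, 0.0, 1.5):
    print(f"x={t:+.1f}  linear={linear_net(A1, A2, [t])[0]:+.1f}  "
          f"relu={relu_net(A1, A2, [t])[0]:+.1f}")
```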

Why Activation Functions?