A detailed walkthrough of scaled dot-product self-attention, including Q/K/V projections, score scaling, softmax weights, and multi-head intuition.
Why Self-Attention?