AIMaks · an atelier for engineers

Self-Attention in Transformers

A detailed walkthrough of scaled dot-product self-attention, including Q/K/V projections, score scaling, softmax weights, and multi-head intuition.
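As a rough orientation before the frames, the pieces named above (Q/K/V projections, score scaling, softmax weights) can be sketched in a few lines. This is a minimal single-head illustration, not the walkthrough's own code: the names `self_attention`, `w_q`, `w_k`, and `w_v` are hypothetical, the projection matrices are set to the identity purely for readability, and plain Python lists are used to keep the example dependency-free. A real implementation would use a tensor library and learned weights, and multi-head attention is left out.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    x is n_tokens x d_model; w_q, w_k, w_v are d_model x d_k.
    Returns (output, weights).
    """
    # Project the same input into queries, keys, and values.
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    v = matmul(x, w_v)

    # Score every query against every key, scaled by sqrt(d_k).
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]

    # Turn each row of scores into weights that sum to 1.
    weights = [softmax(row) for row in scores]

    # Each output row is a weighted mix of the value vectors.
    return matmul(weights, v), weights


# Toy input (assumed for illustration): 3 tokens, d_model = 2.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
output, weights = self_attention(x, identity, identity, identity)
```

Each row of `weights` describes how strongly one token attends to every token in the sequence, itself included, and the matching row of `output` is that token's contextualized representation.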

advanced · 120s · 8 frames

Frame 1 of 8

Goal: contextualize each token using all other tokens

Why Self-Attention?