2026-05-21·reading

Attention Is All You Need

The Transformer paper that made attention-only sequence modeling the default substrate for modern language models.

paper · queued

Ashish Vaswani et al.

arXiv:1706.03762

A baseline paper for understanding the architecture stack beneath modern LLMs: self-attention, multi-head attention, positional encodings, residual streams, and the parallelizable, recurrence-free compute shape that made large-scale sequence modeling practical.
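As a reminder of what the core operation looks like, here is a minimal sketch of the paper's scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, in dependency-free Python. The function names and the toy input are illustrative assumptions, not taken from the paper's code; the sketch covers a single head with no masking, no learned projections, and no batching.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V.

    Q, K, V are lists of row vectors (lists of floats). K and V must
    have the same number of rows; Q and K must share the dimension d_k.
    """
    d_k = len(K[0])
    scale = math.sqrt(d_k)
    d_v = len(V[0])
    output = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in K]
        # Attention weights over positions sum to 1.
        weights = softmax(scores)
        # Weighted average of the value vectors.
        output.append(
            [sum(w * v[j] for w, v in zip(weights, V)) for j in range(d_v)]
        )
    return output


# Toy self-attention: 3 tokens, d_k = d_v = 2 (values made up for illustration).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = scaled_dot_product_attention(X, X, X)
```

Each output row is a convex combination of the value rows, and each query row is handled independently of the others. That independence is what lets the whole sequence be computed as a few matrix multiplications rather than a step-by-step recurrence.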
