
Attention Is All You Need

The Transformer paper that made attention-only sequence modeling the default substrate for modern language models.

paper · queued
Ashish Vaswani et al.
arXiv:1706.03762

A baseline paper for understanding the architecture beneath modern LLMs: self-attention, multi-head attention, positional encodings, residual connections, and the parallelizable computation across sequence positions that made large-scale sequence modeling practical.
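To make two of those building blocks concrete, here is a minimal sketch of scaled dot-product attention and the sinusoidal positional encoding, following the formulas in the paper. It uses plain Python lists rather than a tensor library, handles a single head with no masking or learned projections, and the function names are illustrative assumptions, not taken from any reference implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for one head, no masking.

    Q, K, V are lists of row vectors (lists of floats).
    K and V must have the same number of rows.
    """
    d_k = len(K[0])
    scale = math.sqrt(d_k)
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in K]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out.append([
            sum(w * v[j] for w, v in zip(weights, V))
            for j in range(len(V[0]))
        ])
    return out


def sinusoidal_position_encoding(pos, d_model):
    """Encoding for one position: sin on even dimensions, cos on odd ones."""
    pe = []
    for j in range(d_model):
        # Dimensions 2i and 2i+1 share the frequency 1 / 10000^(2i / d_model).
        angle = pos / (10000 ** ((2 * (j // 2)) / d_model))
        pe.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
    return pe
```

Each output row is a convex combination of the rows of `V`, weighted by how strongly the query matches each key. Multi-head attention, as described in the paper, runs several such attention functions in parallel on learned projections of the inputs and concatenates the results.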

Neighborhood

Related

Language Models are Few-Shot Learners
AI systems engineering
Full Stack Artificial Intelligence
Full-Stack Artificial Intelligence
The API
Pretrained Transformers as Universal Computation Engines