Attention Is All You Need
The Transformer paper that made attention-only sequence modeling the default substrate for modern language models.
A baseline paper for understanding the architecture stack beneath modern LLMs: self-attention, multi-head attention, position encodings, residual streams, and the parallel, matrix-multiply-heavy computation pattern that made large-scale sequence modeling practical.
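As a rough illustration of the self-attention piece listed above, here is a minimal pure-Python sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, as the paper defines it. It is deliberately stripped down: the learned query/key/value projections, multiple heads, masking, and position encodings are all left out, and the function names and toy inputs are illustrative assumptions rather than anything taken from a reference implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of row vectors.

    Simplification (assumption for this sketch): no learned projections,
    no masking, and a single head.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Toy "sequence" of three 2-dimensional token vectors (made-up values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Self-attention: queries, keys, and values all come from the same sequence.
attended = scaled_dot_product_attention(tokens, tokens, tokens)
```

Every output row is a weighted average of the value vectors, so each position can draw on every other position in a single step. Because all positions are processed independently of one another, the same computation can be expressed as a few large matrix multiplications, which is where the parallelism comes from.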