
Broadening and building beyond classical reinforcement learning

A survey-flavored note on what we lose by treating RL as scalar-reward maximization, and what alternatives a more honest framing suggests.

Jacob Valdez
arXiv (preprint)
cite key: valdez2022broadening

Premise

Classical RL inherits a particular philosophical commitment from its origin in operant conditioning: that goal-directed behavior reduces to maximizing a scalar signal. This commitment is convenient mathematically and false empirically.
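For reference, here is the scalar commitment in its most common form. This is a sketch that assumes the usual discounted-return setting; the symbols below are standard notation and are not taken from the note. A policy π is scored by the expected discounted sum of a single real-valued reward r_t, with discount factor γ:

```latex
J(\pi) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t} r_t\right],
\qquad r_t \in \mathbb{R},\quad \gamma \in [0, 1)
```

Everything the agent is meant to care about has to be folded into r_t before learning begins. The alternatives below relax exactly this step.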

The note surveys the main alternatives (multi-objective formulations, preference-based learning, intrinsic motivation, and free-energy minimization) and asks what taking them seriously would change about how we structure agents.
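One concrete loss from scalarization shows up in the multi-objective case. The sketch below is a textbook illustration, not the note's own construction. It assumes three hypothetical policies with made-up expected-return vectors on two objectives. Policy C is Pareto-optimal because no other policy is at least as good on both objectives. Even so, no nonnegative weighting (w, 1 - w) of the objectives ever selects it, because linear scalarization can only reach points on the convex hull of the Pareto front. The function names are illustrative only.

```python
def dominates(u, v):
    """True if return vector u Pareto-dominates v: >= on every objective, > on at least one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_front(returns):
    """Names of policies whose return vectors are not dominated by any other policy."""
    return sorted(
        name
        for name, v in returns.items()
        if not any(dominates(u, v) for other, u in returns.items() if other != name)
    )


def scalarized_choice(returns, weights):
    """Policy chosen by maximizing a weighted sum of objectives (linear scalarization)."""
    return max(returns, key=lambda name: sum(w * r for w, r in zip(weights, returns[name])))


def reachable_by_scalarization(returns, steps=100):
    """Policies selected for some weight vector (w, 1 - w), with w on a uniform grid over [0, 1]."""
    chosen = set()
    for i in range(steps + 1):
        w = i / steps
        chosen.add(scalarized_choice(returns, (w, 1.0 - w)))
    return sorted(chosen)


# Hypothetical expected-return vectors for three policies on two objectives.
returns = {"A": (1.0, 0.0), "B": (0.0, 1.0), "C": (0.4, 0.4)}

print(pareto_front(returns))                # ['A', 'B', 'C']
print(reachable_by_scalarization(returns))  # ['A', 'B']
```

For any weight w, A scores w and B scores 1 - w, so the larger of the two is at least 0.5. C always scores 0.4, so it never wins. A designer who fixes a scalar reward up front has already ruled out this trade-off without ever seeing it.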

The honest answer is "almost everything, in ways we haven't built tooling for yet."

Status

Long working draft; sections circulate informally.


Related

LLMs are the Update Rules of Intelligent Fractals: Escaping the Context Window with Iterative, Structured Local Updates
Multi-graph former — a transformer over heterogeneous graph structure
Research and technical writing
multigraph-nn
The Node Neural Network (NNN)
Broaden and Build Conference 2021