·research·paper
Broadening and building beyond classical reinforcement learning
A survey-flavored note on what we lose by treating RL as scalar-reward maximization, and what alternatives a more honest framing suggests.
Jacob Valdez
arXiv (preprint)
cite key:
valdez2022broadening
Premise
Classical RL inherits a particular philosophical commitment from its origin in operant conditioning: that goal-directed behavior reduces to maximizing a scalar signal. This commitment is convenient mathematically and false empirically.
The note collects the alternatives — multi-objective formulations, preference-based learning, intrinsic motivation, free-energy minimization — and asks what taking them seriously would change about how we structure agents.
The honest answer is "almost everything, in ways we haven't built tooling for yet."
Status
Long working draft; sections circulate informally.
Neighborhood