2022-08-15·paper

Broadening and building beyond classical reinforcement learning

A survey-flavored note on what we lose by treating RL as scalar-reward maximization, and what alternatives a more honest framing suggests.

Jacob Valdez

arXiv (preprint)

cite key: valdez2022broadening

PDF ↗

Premise

Classical RL inherits a particular philosophical commitment from its origin in operant conditioning: that goal-directed behavior reduces to maximizing a scalar signal. This commitment is convenient mathematically and false empirically.

The note collects the alternatives (multi-objective formulations, preference-based learning, intrinsic motivation, free-energy minimization) and asks what taking them seriously would change about how we structure agents.

The honest answer is "almost everything, in ways we haven't built tooling for yet."

Status

Long working draft; sections circulate informally.
