2021-10-01·project

The Multi-Agent Network (MAN)

A modular framework for fusing pretrained foundation models with self-organising learning agents on a shared, typed workspace.

github ↗

Problem

We have strong modality-specific architectures (BERT, ViT, WaveNet) and a handful of foundation models, but no obvious way to combine them. There are foundation perception models, but where are the foundation policies? And once you want to wire a vision encoder into a language model and then into a control loop, the glue code is bespoke every time: each pipeline reinvents how data moves from dataset to model to environment to reward.

Solution

The Multi-Agent Network (MAN) is an iterative-update deep-learning architecture built around a shared, typed workspace. Pretrained models become specialized agents with labelled ports; learning modules become common agents that grow, prune, split, and die on top of the same workspace. An executor schedules updates each timestep, a reward agent routes credit, and a compute-energy estimator caps how much work the network can do per tick — so the whole thing behaves like one composable function rather than a hand-stitched pipeline.

How

Runtime: Python, built on top of salina's workspace + agent abstraction. Every variable carries an owner, local name, type, value, and (optionally) a gradient.
Specialized agents: thin wrappers around pretrained models (man.agents.specialized.bert, .gpt, .vit, .wavenet, .xlm), plus triple_graph (a DBpedia / Wikidata-backed knowledge graph) and IO adapters: gym_interface, dataset_interface, video, audio, webcam.
Common agents: PredAgent, SORNAgent, SOMPAgent, … share a sparse representation language where 0 ≡ None, name-scope their variables, and expose bottom · side · top ports. They keep their own parent/child lists and search the agent pool for new connections.
Sparse self-gating: energy is the only signal that must be present at every timestep; everything else, including input, may be absent. Agents whose output is None are simply not applied to the workspace, so the graph naturally runs sparse-MoE-style.
Credit and budget: a RewardAgent distributes a reward-typed variable downstream; a compute-energy estimator gives the executor a per-step ceiling that agents have to respect.
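
To make the workspace and the self-gating rule concrete, here is a minimal sketch. It is not the MAN or salina API: Variable, Workspace, threshold_agent, and the "webcam:brightness" key are hypothetical names invented for illustration. It only shows the two ideas from the list above, a variable that carries owner, local name, type, value, and an optional gradient, and a write that is skipped whenever an agent outputs None.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Variable:
    """One workspace entry: owner, local name, type, value, optional gradient."""
    owner: str
    name: str
    type: str
    value: Any
    grad: Optional[Any] = None

    @property
    def key(self) -> str:
        # Name-scoped key, e.g. "gate:output".
        return f"{self.owner}:{self.name}"


class Workspace:
    """Hypothetical stand-in for the shared workspace (not salina's class)."""

    def __init__(self) -> None:
        self.vars: Dict[str, Variable] = {}

    def read(self, key: str) -> Optional[Any]:
        var = self.vars.get(key)
        return None if var is None else var.value

    def write(self, var: Optional[Variable]) -> bool:
        # Sparse self-gating: a None output is simply not applied.
        if var is None or var.value is None:
            return False
        self.vars[var.key] = var
        return True


def threshold_agent(ws: Workspace) -> Optional[Variable]:
    """Toy agent: stays silent unless its input is present and above 0.5."""
    x = ws.read("webcam:brightness")
    if x is None or x <= 0.5:
        return None
    return Variable(owner="gate", name="output", type="float", value=x)


ws = Workspace()
applied_without_input = ws.write(threshold_agent(ws))  # input absent: agent is silent
ws.write(Variable("webcam", "brightness", "float", 0.9))
applied_with_input = ws.write(threshold_agent(ws))     # input present: agent writes
```

The agent never checks a "was I scheduled" flag; absence of input is an ordinary value it reads from the workspace, which is what lets the graph run sparsely without special cases.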

The architecture diagram above traces a single timestep — specialized agents on the left read and write task-specific variables, common agents on the right self-organise against the same workspace, and the reward / energy channels close the loop back across both sides.
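
A single timestep with an energy ceiling can be sketched roughly as follows. This is an illustration under stated assumptions, not the framework's executor: run_timestep, the (cost, step_fn) agent tuples, the plain-dict workspace, and the fixed per-agent costs are all hypothetical. The real compute-energy estimator presumably derives costs rather than taking them as constants.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical agent shape: (energy cost, step function). The step function
# reads the workspace and returns a dict of writes, or None to stay silent.
StepFn = Callable[[Dict[str, float]], Optional[Dict[str, float]]]
Agent = Tuple[float, StepFn]


def run_timestep(
    agents: Dict[str, Agent],
    workspace: Dict[str, float],
    energy_budget: float,
) -> List[str]:
    """Run agents in insertion order, skipping any that would exceed the budget."""
    spent = 0.0
    ran: List[str] = []
    for name, (cost, step_fn) in agents.items():
        if spent + cost > energy_budget:
            continue  # over the ceiling: this agent sits out the current tick
        spent += cost
        ran.append(name)
        out = step_fn(workspace)
        if out is not None:
            workspace.update(out)  # None outputs are never applied
    workspace["energy:spent"] = spent
    return ran


agents: Dict[str, Agent] = {
    "encoder": (3.0, lambda ws: {"encoder:features": ws["input:x"] * 2.0}),
    "big_model": (10.0, lambda ws: {"big_model:out": 1.0}),
    "reward": (1.0, lambda ws: {"reward:value": ws.get("encoder:features", 0.0)}),
}
workspace: Dict[str, float] = {"input:x": 0.5}
ran = run_timestep(agents, workspace, energy_budget=5.0)
```

With a budget of 5.0 the expensive agent is skipped while the cheap reward agent still runs, which is the behaviour the per-tick cap is meant to produce.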

The developer surface is meant to feel like a normal ML library:

import man           # assumed: the framework's top-level package
from man import MAN  # assumed import path for the network class

# `env` (a gym environment), `ds` (a dataset), `executor`, and `trainer`
# are assumed to be created elsewhere.
# The instance is named `net` so it does not shadow the `man` package.
net = MAN(agents=dict(
    bert=man.agents.specialized.bert,
    gpt=man.agents.specialized.gpt,
    webcam=man.agents.specialized.webcam,
    reward=man.agents.specialized.gym_interface(env),
    somp0=man.agents.common.somp.SOMP(),
    somp1=man.agents.common.somp.SOMP(),
    energy=man.agents.common.compute_energy.ComputeEnergyEstimator(),
), connections=[
    ("gpt:input", "somp1:input"),
    # ...
], grow=True)

# RL loop
traj = executor.run(net, env)
trainer.train(net, env, traj)

# or a supervised pipeline
net.fit(ds, epochs=10)

# add a new modality without rebuilding
net.add_agent(man.agents.specialized.audio.Audio(), "audio")
net.save("my_man.pkl")

Results

The MAN is an active research framework rather than a benchmarked product — the artefact is the architecture, the type system around the workspace, and the small zoo of agent base classes on GitHub. It seeded much of how I think about composing agents in later projects (TensaCode, Computatrum, ROAM).

Lessons

A typed workspace beats ad-hoc tensors. Once variables had owners, types, and optional gradients, "wire model A into model B" stopped being a code change and started being a connection edit.
Sparse-by-default is the right primitive. Treating absent signals as the norm — not the exception — made open-ended growth tractable; the system only spends energy where there is something to say.
Self-organising connection topology is hard to evaluate. Letting common agents grow and prune their own neighbours is elegant on paper, but without a tight reward signal the graph drifts. Future work would put a much stricter evaluation harness around topology learning before scaling up the agent zoo.
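
The first lesson, wiring as a connection edit, can be illustrated with a small sketch. It reuses the "agent:port" string convention from the example in the How section, but everything else is an assumption: PORT_TYPES, its entries, and the connect helper are hypothetical and do not reflect how MAN actually stores or validates connections.

```python
from typing import Dict, List, Tuple

# Hypothetical port table: "agent:port" -> declared variable type.
PORT_TYPES: Dict[str, str] = {
    "vit:output": "embedding",
    "somp0:bottom": "embedding",
    "somp0:top": "embedding",
    "gpt:input": "text",
}

Connection = Tuple[str, str]


def connect(connections: List[Connection], src: str, dst: str) -> List[Connection]:
    """Return a new connection list with (src, dst) added if the port types match."""
    for port in (src, dst):
        if port not in PORT_TYPES:
            raise KeyError(f"unknown port: {port}")
    if PORT_TYPES[src] != PORT_TYPES[dst]:
        raise TypeError(f"{src} is {PORT_TYPES[src]}, {dst} is {PORT_TYPES[dst]}")
    return connections + [(src, dst)]


# Wiring a vision encoder into a common agent is one data edit.
connections: List[Connection] = []
connections = connect(connections, "vit:output", "somp0:bottom")

# A type mismatch is caught at wiring time rather than deep inside a forward pass.
try:
    connect(connections, "somp0:top", "gpt:input")
    mismatch_rejected = False
except TypeError:
    mismatch_rejected = True
```

Because connections are plain data checked against declared types, adding or removing an edge does not touch either agent's code.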
