Recursive Multi-Agent Systems: Latent State Transfer for LLMs

Learn about RecursiveMAS, a framework enabling efficient multi-agent collaboration through latent state transfer, significantly reducing token usage and improving performance in LLM applications. This technical guide covers its architecture, implementation, and key advantages.


Introduction

RecursiveMAS is a framework that lets multiple AI agents collaborate by exchanging latent state representations instead of natural language. In the benchmarks reported below, this cut token usage by 75.6% on average relative to a text-based baseline while improving accuracy. That makes complex multi-agent tasks more feasible and cost-effective for LLM-based applications.

Configuration Checklist

Element	Version / Link
Language / Runtime	Python 3.10 (as used in the Conda setup below)
Main library	RecursiveMAS (via `pip install -r requirements.txt`)
Required APIs	Not specified in the source. Because agents exchange hidden states, the models must expose their internals (e.g., open-weight models run locally), which text-only hosted chat APIs generally do not
Keys / credentials needed	Not specified in the source

Step-by-Step Guide

Step 1 — Setting up the Environment

To begin, create a new Conda environment for RecursiveMAS to manage dependencies effectively. This isolates the project's requirements from other Python projects.

conda create -n recursivemas python=3.10 -y
conda activate recursivemas

Step 2 — Installing Dependencies

From the root of your local copy of the RecursiveMAS repository, install the required Python packages with pip and the provided requirements.txt file. This ensures all libraries needed to run RecursiveMAS are in place.

pip install -r requirements.txt

[Editor's note: The requirements.txt file content is not provided in the video. Refer to the official GitHub repository for the exact dependencies.]

Step 3 — Understanding Agent Communication (Text-based)

In traditional multi-agent systems, agents communicate in natural language (text). In the example below, a Planner, a Critic, and a Solver agent each read the previous agent's text and produce new text.

# Example of text-based communication flow (conceptual)

# Planner Agent's output
# 1. substitute x = tanθ
# 2. symmetry on [0, π/4]
# 3. collapse the integral

# Critic Agent's output (based on Planner's text)
# "Plan is sound. Bounds map cleanly. Symmetry is the key move."

# Solver Agent's output (based on Critic's text)
# I = integral from 0 to π/4 of ln(1 + tanθ) dθ
# 2I = (π/4) ln 2
# I = (π/8) ln 2

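This approach is intuitive for humans but expensive for LLMs. Each agent must generate its reasoning as tokens, and the next agent must read and re-encode that text, so token-processing overhead accumulates with every hand-off.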
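Below is a minimal sketch of the text-based flow. It assumes each agent is a plain Python function standing in for an LLM call. The names `planner`, `critic`, `solver`, `TextChannel` and `run_text_pipeline` are hypothetical and not part of RecursiveMAS. Whitespace-separated words serve as a rough proxy for tokens, since real tokenizers count differently. The point is structural: every hand-off is a string that the sender must generate and the receiver must read again.

```python
from dataclasses import dataclass, field


@dataclass
class TextChannel:
    """Carries messages between agents and tallies a rough token count."""
    tokens_sent: int = 0
    log: list[str] = field(default_factory=list)

    def send(self, message: str) -> str:
        # Whitespace split is only a proxy for a real tokenizer's count.
        self.tokens_sent += len(message.split())
        self.log.append(message)
        return message


# Hypothetical stub agents: in a real system each would be an LLM call
# whose output is decoded token by token. These ignore their input.
def planner(task: str) -> str:
    return "1. substitute x = tanθ 2. symmetry on [0, π/4] 3. collapse the integral"


def critic(plan: str) -> str:
    return "Plan is sound. Bounds map cleanly. Symmetry is the key move."


def solver(review: str) -> str:
    return "I = (π/8) ln 2"


def run_text_pipeline(task: str) -> tuple[str, TextChannel]:
    channel = TextChannel()
    plan = channel.send(planner(task))     # Planner writes text
    review = channel.send(critic(plan))    # Critic reads it and writes more text
    answer = channel.send(solver(review))  # Solver reads the review and answers
    return answer, channel


answer, channel = run_text_pipeline("solve the integral")
print(answer, "|", channel.tokens_sent, "tokens passed as text")
```

In a real system, the count grows with every extra agent and every extra round, because each message must be fully decoded before the next agent can use it.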

Step 4 — Implementing Latent State Transfer

RecursiveMAS instead links agents' internal states directly. Rather than text, agents pass raw, undecoded numerical representations (latent states). The framework does this through its "InnerLink" and "OuterLink" modules, which makes communication more efficient and cheaper.

# Conceptual representation of latent state transfer
# Agent A1 generates latent thoughts (last-layer embeddings)
latent_thoughts_A1 = agent_A1.generate_latent_thoughts(input_context)

# These latent thoughts are passed directly to Agent A2 via InnerLink/OuterLink
# Agent A2 processes these raw numerical signals
input_embeds_A2 = agent_A2.align_input_embeds(latent_thoughts_A1, contexts_A2)

# Agent A2 then generates its own latent thoughts or decodes for output
latent_thoughts_A2 = agent_A2.generate_latent_thoughts(input_embeds_A2)

# The A1 -> A2 -> ... -> A_n chain can repeat for several recursion rounds;
# after the last round, the final agent A_n decodes its latent thoughts into text
final_output = agent_An.decode_for_outputs(latent_thoughts_An)

Intermediate agents never decode their reasoning into tokens, so only the final agent pays the cost of generating text. That is where most of the efficiency gains come from.
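The toy sketch below contrasts with the text pipeline by keeping every hand-off numeric. It is not the RecursiveMAS implementation, and it rests on three assumptions:

- `ToyAgent` is a hypothetical one-layer recurrent cell standing in for an LLM.
- Latent thoughts are produced by feeding each last hidden state back in as the next input.
- The cross-agent link is a plain linear projection between hidden sizes. It is random here; in any real system it would be trained.

Only the final agent's last latent state would ever be decoded to text.

```python
import math
import random

Vector = list[float]
Matrix = list[list[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    # Scale by 1/sqrt(cols) so activations stay in a reasonable range.
    scale = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class ToyAgent:
    """Hypothetical stand-in for an LLM: one recurrent layer, no tokenizer."""

    def __init__(self, dim: int, seed: int) -> None:
        self.dim = dim
        self.w = random_matrix(dim, dim, random.Random(seed))

    def generate_latent_thoughts(self, h: Vector, steps: int) -> list[Vector]:
        """Feed each hidden state straight back in as the next input,
        instead of sampling a token and re-embedding it."""
        if steps < 1:
            raise ValueError("steps must be >= 1")
        thoughts = []
        for _ in range(steps):
            h = [math.tanh(x) for x in matvec(self.w, h)]
            thoughts.append(h)
        return thoughts


def make_link(src_dim: int, dst_dim: int, seed: int) -> Matrix:
    """Hypothetical cross-agent link: a linear map from one agent's hidden
    size to the next agent's input size (untrained here)."""
    return random_matrix(dst_dim, src_dim, random.Random(seed))


def run_recursive(agents: list[ToyAgent], links: list[Matrix],
                  x: Vector, rounds: int, steps: int) -> Vector:
    """Pass latent states A1 -> A2 -> ... -> A_n, repeating for `rounds`."""
    h, last = x, x
    for _ in range(rounds):
        for agent, link in zip(agents, links):
            last = agent.generate_latent_thoughts(h, steps)[-1]
            h = matvec(link, last)  # map into the next agent's input space
    return last  # final latent state of A_n; only this would be decoded


agents = [ToyAgent(8, seed=1), ToyAgent(12, seed=2), ToyAgent(6, seed=3)]
# links[i] maps agent i's hidden size to agent i+1's (the last wraps to A1).
links = [make_link(8, 12, seed=4), make_link(12, 6, seed=5), make_link(6, 8, seed=6)]
final_state = run_recursive(agents, links, [0.1] * 8, rounds=3, steps=4)
print(len(final_state), "numbers handed to the decoder; 0 tokens exchanged")
```

Here `rounds` plays the role of recursion rounds, and `steps` plays the role of the latent thought length.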

Step 5 — Patching Code Vulnerabilities with Agents

AI agents can be trained to identify and patch vulnerabilities in codebases by understanding the context and applying appropriate sanitization.

# Original vulnerable code snippet
# (db, run, render and sanitize are placeholders from the example, not a real library)
def handle_request(req):
    token = req.get('auth')
    user = db.find(token)
    run(req['cmd'])  # unsanitized input: a command-injection vulnerability
    return render(user.page)

# Patched code snippet using an agent's suggestion
def handle_request(req):
    token = req.get('auth')
    user = db.find(token)
    run(sanitize(req['cmd']))  # patched: the agent wrapped the input in sanitize()
    return render(user.page)

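This illustrates how an agent can locate an injection point and propose a targeted patch. Whether a generic `sanitize` call is sufficient depends on how `run` executes the command.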
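The original leaves `sanitize` undefined. One way to fill it in for command execution, assuming `run` ultimately launches a process, is an allowlist rather than escaping:

- Parse the string with `shlex.split`.
- Reject any program not explicitly permitted.
- Execute the argument list without a shell, so metacharacters such as `;` or `|` are never interpreted.

The names `ALLOWED_COMMANDS`, `sanitize_command` and `run_safely` are hypothetical.

```python
import shlex
import subprocess

# Hypothetical allowlist: only these programs may be launched.
ALLOWED_COMMANDS = frozenset({"ls", "whoami", "date"})


def sanitize_command(cmd: str) -> list[str]:
    """Parse a command string into argv and enforce the allowlist.

    Raises ValueError for empty input, unbalanced quotes, or a program
    that is not explicitly allowed.
    """
    argv = shlex.split(cmd)  # raises ValueError on unbalanced quotes
    if not argv:
        raise ValueError("empty command")
    if argv[0] not in ALLOWED_COMMANDS:
        raise ValueError(f"command not allowed: {argv[0]!r}")
    return argv


def run_safely(cmd: str) -> str:
    # Passing a list (shell=False, the default) means no shell parses the
    # arguments, so ';', '|' and '$(...)' stay literal text.
    result = subprocess.run(sanitize_command(cmd), capture_output=True,
                            text=True, check=True, timeout=10)
    return result.stdout
```

A patched handler would then call `run_safely(req['cmd'])` in place of `run(sanitize(req['cmd']))`. Some programs also accept dangerous options, so restricting arguments as well may be necessary.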

Comparison Tables

Token Usage Comparison (Recursion Round 3)

Benchmark	Recursive-TextMAS (baseline)	RecursiveMAS (times fewer tokens than baseline)
Math500	1x	5.1x
AIME25	1x	4.7x
AIME26	1x	3.7x
GPQA-D	1x	4.2x
MedQA	1x	4.1x
Code Gen	1x	3.4x
Average: 75.6% fewer tokens with RecursiveMAS
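The table reports reductions as ratios ("5.1x fewer"), while the summary line uses a percentage. A ratio of r times fewer tokens corresponds to a reduction of (1 - 1/r) x 100%. The short calculation below applies that conversion to the table's rounded values. A simple mean of the converted values comes out at about 75.7%, close to the reported 75.6%; the ratios are rounded, so an exact match is not expected. The largest reduction, on Math500, is about 80%.

```python
# Token-reduction ratios from the table above (RecursiveMAS vs. Recursive-TextMAS, round 3).
RATIOS = {
    "Math500": 5.1,
    "AIME25": 4.7,
    "AIME26": 3.7,
    "GPQA-D": 4.2,
    "MedQA": 4.1,
    "Code Gen": 3.4,
}


def percent_fewer(ratio: float) -> float:
    """Convert 'ratio x fewer tokens' into a percentage reduction."""
    return (1.0 - 1.0 / ratio) * 100.0


reductions = {name: percent_fewer(r) for name, r in RATIOS.items()}
mean_reduction = sum(reductions.values()) / len(reductions)
for name, pct in reductions.items():
    print(f"{name:9s} {pct:5.1f}% fewer tokens")
print(f"mean      {mean_reduction:5.1f}%")
```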

Performance Comparison (AIME2026 Accuracy)

Method	Recursive Round r=1	Recursive Round r=2	Recursive Round r=3
Recursive-TextMAS	73.3%	73.3%	73.3%
RecursiveMAS	75.8%	77.3%	86.0%
RecursiveMAS accuracy rises from 75.8% to 86.0% as recursion rounds increase from 1 to 3, while Recursive-TextMAS stays flat at 73.3%.

Small vs. Big Agent Performance (Difficult Math Problems)

Agent Type	Accuracy
3 Small Agents (RecursiveMAS)	92.4%
1 Big Agent	94.1%
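Three small agents linked by RecursiveMAS come within 1.7 percentage points of a single big agent (92.4% vs. 94.1%). Smaller, cheaper models can therefore approach the performance of much larger, more expensive ones.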

⚠️ Common Mistakes & Pitfalls

Assuming direct plug-and-play: RecursiveMAS is still research-grade. Do not expect immediate production readiness without significant adaptation and testing.
- Fix: Treat it as a research project. Thoroughly test and adapt the framework to your specific use case, understanding its current limitations and early-stage development.
Ignoring optimal latent thought length: The efficiency gains are tied to an optimal latent thought length (around 80 steps). Exceeding this limit does not yield significant additional value and can waste computation.
- Fix: Monitor and tune the latent_thoughts_length parameter. Use evaluation metrics to find the sweet spot for your specific task, typically around 80 steps, to maximize efficiency without sacrificing performance.
Overlooking the "teacher" effect: The performance gains might be partly due to effective knowledge distillation from a larger "teacher" model used during training, rather than solely the brain-linking mechanism.
- Fix: Conduct controlled experiments to differentiate between the benefits of the RecursiveMAS architecture itself and the quality of the initial training data or teacher model. This helps in understanding the true source of performance improvement.
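One way to act on the latent-length fix is a simple sweep. Evaluate the pipeline at several lengths, then keep the shortest one whose score is within a small tolerance of the best. This sketch rests on two assumptions. First, `evaluate` is a hypothetical callable you supply, for example accuracy on a held-out set at a given latent thought length. Second, the tolerance is a judgment call, not a value from the source. The function names are likewise illustrative.

```python
from typing import Callable, Iterable


def pick_latent_length(scores: dict[int, float], tolerance: float = 0.005) -> int:
    """Return the shortest length whose score is within `tolerance` of the best.

    `scores` maps latent thought length -> evaluation metric (higher is better).
    """
    if not scores:
        raise ValueError("scores must not be empty")
    best = max(scores.values())
    return min(n for n, s in scores.items() if s >= best - tolerance)


def sweep_latent_lengths(evaluate: Callable[[int], float],
                         lengths: Iterable[int],
                         tolerance: float = 0.005) -> tuple[int, dict[int, float]]:
    """Score each candidate length with `evaluate`, then pick the cheapest good one."""
    scores = {n: evaluate(n) for n in lengths}
    return pick_latent_length(scores, tolerance), scores
```

Preferring the shortest near-best length reflects the pitfall above: past the sweet spot, extra latent steps cost compute without adding accuracy.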

Glossary

Latent State Transfer: The direct exchange of raw, numerical representations (embeddings or hidden states) between AI agents, bypassing natural language tokenization and decoding.
Prompt Injection: A vulnerability where malicious input manipulates an AI model's behavior, causing it to deviate from its intended function or reveal sensitive information.
Hallucination: A phenomenon in AI models, especially LLMs, where the model generates plausible but factually incorrect or nonsensical information.

Key Takeaways

The number of AI agents is rapidly increasing, but their coordination and reliability remain significant challenges.
Traditional text-based communication between agents is inefficient and lets errors compound across steps.
RecursiveMAS introduces "cross-agent latent state transfer," allowing agents to communicate via raw numerical representations (embeddings or hidden states) instead of natural language.
This latent state transfer significantly reduces token usage and computational cost. Across the reported benchmarks it used 75.6% fewer tokens on average, and 5.1x fewer on Math500.
RecursiveMAS improves accuracy on complex tasks, enabling smaller models to achieve performance comparable to much larger, more expensive models.
The training cost for RecursiveMAS can be remarkably low (e.g., $4 for certain benchmarks).
The framework shows potential for a new scaling law, where more recursion rounds lead to better results.
While promising, RecursiveMAS is still in early research stages and requires further development for broader application.

Resources

Recursive Multi-Agent Systems Project Page: https://recursivemas.github.io/
Weights & Biases - Weave: wandb.me/papers
Paper on AI Agents May Always Fall for Prompt Injections: https://arxiv.org/pdf/2605.17634 (Note: The year 2026 is a placeholder in the video, actual publication date may vary.)
Brain-to-Text Communication Research (Willett et al. 2020): [Editor's note: Specific link not provided in video, search for "Willett et al. 2020 brain to text communication" for relevant papers.]
