CodeZero: Extending RL-Swarm Toward Cooperative Coding Agents
CodeZero extends Gensyn's RL-Swarm framework into the domain of code generation, transforming a distributed training network into a specialized ecosystem where models take on distinct roles - Proposers, Solvers, and Evaluators - and improve through continuous collaboration. The system uses RL-Swarm's peer-to-peer infrastructure while introducing a novel model-based reward system that enables safe, scalable training on real coding tasks.
From RL-Swarm to CodeZero
RL-Swarm demonstrated that distributed reinforcement learning could operate across a global network. CodeZero's design also draws on research in self-play and multi-agent learning, notably Absolute Zero, which explored how a model can improve by proposing its own tasks and then learning to solve them.
CodeZero builds on that foundation as a new swarm environment within the RL-Swarm application, using the same peer-to-peer infrastructure, smart contracts, and distributed orchestration, while shifting toward a more practical and complex frontier: coding.
For context: RL-Swarm is the application layer that powers live swarm environments, CodeZero is one such environment focused on collaborative coding, and GenRL is the underlying framework that makes it possible to build and experiment with these systems. Together, they form the foundation of Gensyn’s approach to large-scale, cooperative learning.
In this new swarm environment, there are three distinct roles:
- Proposers: Generate new coding problems and adjust their difficulty over time to continually challenge Solvers.
- Solvers: Attempt programming challenges, learn locally through reinforcement-learning loops, and share their rollouts with the network to enable collective learning.
- Evaluators: Assess submissions and assign rewards based on performance and quality.
Together, these components form a distributed, self-sustaining learning economy: problems flow from Proposers to Solvers, solutions flow to Evaluators, and rewards flow back to guide improvement - continuously and at scale.
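To make that flow concrete, here is a minimal Python sketch of one such cycle. The class and method names are illustrative only, not the actual RL-Swarm or GenRL interfaces, and the model calls are stubbed out:

```python
from dataclasses import dataclass

@dataclass
class Problem:
    prompt: str
    difficulty: float  # proposer-controlled, raised as solvers improve

@dataclass
class Submission:
    problem: Problem
    code: str

class Proposer:
    def __init__(self, difficulty: float = 0.3):
        self.difficulty = difficulty

    def propose(self) -> Problem:
        # In the real swarm this would sample from a generative model;
        # here we just stub out a prompt at the current difficulty.
        return Problem(prompt=f"coding task (difficulty={self.difficulty:.2f})",
                       difficulty=self.difficulty)

class Solver:
    def solve(self, problem: Problem) -> Submission:
        # Stand-in for a model rollout; the real solver samples code from
        # its policy and shares the rollout with the rest of the network.
        return Submission(problem=problem, code="def solution(): ...")

class Evaluator:
    def score(self, submission: Submission) -> float:
        # Stand-in for the execution-free reward described below.
        return 1.0 if "def " in submission.code else 0.0

# One cycle of the learning economy: problems -> solutions -> rewards.
proposer, solver, evaluator = Proposer(), Solver(), Evaluator()
problem = proposer.propose()
submission = solver.solve(problem)
reward = evaluator.score(submission)
print(problem.prompt, reward)
```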
A New Reward System
In previous RL-Swarm environments, agents collaborated on math and logic problems, which provided an effective way to test coordination and learning dynamics. With CodeZero, we’re extending that same framework to coding tasks, which introduce new challenges around evaluation and safety.
Rather than relying on direct code execution, CodeZero scores solutions with a reward function that combines rule-based checks with feedback from a local evaluator model. That model inspects each solution’s structure and reasoning, estimating whether it would pass its tests without ever running the code. This keeps the network reliable and decentralized while preserving a strong learning signal.
Rewards are derived from multiple lightweight checks, ranging from code validity and formatting to alignment with the problem statement, combined into a single interpretable score. The result is a robust and tunable reward model that enables large-scale collaborative training on real coding problems, without compromising reliability or control.
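As a rough illustration of how execution-free checks might be combined, here is a hedged Python sketch. The specific checks, weights, and the `reward` function itself are hypothetical, not CodeZero's actual scoring code:

```python
import ast

def reward(problem_statement: str, code: str) -> float:
    """Combine lightweight, execution-free checks into one score.

    Illustrative checks and weights only; the production reward also
    folds in the evaluator model's judgement of the solution.
    """
    checks = {}

    # 1. Validity: does the code even parse?
    try:
        ast.parse(code)
        checks["valid_syntax"] = 1.0
    except SyntaxError:
        checks["valid_syntax"] = 0.0

    # 2. Formatting: crude proxies for readable, well-structured code.
    lines = code.splitlines()
    checks["reasonable_length"] = 1.0 if 0 < len(lines) <= 200 else 0.0
    checks["has_function"] = 1.0 if "def " in code else 0.0

    # 3. Alignment: does the solution mention terms from the problem?
    terms = {t.lower() for t in problem_statement.split() if len(t) > 4}
    hits = sum(1 for t in terms if t in code.lower())
    checks["alignment"] = min(1.0, hits / max(1, len(terms) // 4))

    # A weighted sum keeps the score interpretable and easy to tune.
    weights = {"valid_syntax": 0.4, "reasonable_length": 0.1,
               "has_function": 0.2, "alignment": 0.3}
    return sum(weights[name] * value for name, value in checks.items())

print(reward("Write a function that reverses a string.",
             "def reverse(s):\n    return s[::-1]"))
```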
Learning Through Interaction
Each cycle of CodeZero is an experiment in emergent coordination.
This is a living, evolving system: Proposers adjust to the success of Solvers, automatically raising the difficulty when Solvers perform well so the challenge keeps pace with their ability, while Evaluators keep score and reward meaningful progress.
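One way to picture that feedback loop is a simple difficulty controller. The function below is a hypothetical sketch of the idea, not the Proposer's actual logic:

```python
def adjust_difficulty(difficulty: float, recent_rewards: list[float],
                      target_success: float = 0.7, step: float = 0.05) -> float:
    """Nudge proposer difficulty toward a target solver success rate.

    Hypothetical controller: raise difficulty when solvers are doing
    better than the target, lower it when they fall behind.
    """
    if not recent_rewards:
        return difficulty
    success_rate = sum(recent_rewards) / len(recent_rewards)
    if success_rate > target_success:
        difficulty += step
    elif success_rate < target_success:
        difficulty -= step
    return min(1.0, max(0.0, difficulty))

# Solvers doing well -> the next batch of problems gets harder.
print(adjust_difficulty(0.5, [1.0, 0.9, 0.8]))  # 0.55
```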
These advances mark a major step toward reliable, autonomous research loops running directly within the swarm, where new problems, solutions, and evaluations continually inform one another.
Why Coding Problems?
Code is the native language of machines - structured, precise, and verifiable. It’s how systems express reasoning, test ideas, and build on one another’s outputs.
By grounding the swarm in coding tasks, we’re not just training models to solve problems - we’re enabling an ecosystem where machine intelligence advances by learning to build and reason together.
Coding provides a uniquely rich testbed for collaboration, where every attempt produces interpretable feedback, every solution can be validated or improved, and every iteration teaches both the model and the network how to learn more effectively together.
Over time, this approach aims to grow into something larger than code generation alone - a living environment where machines evolve through the shared language of problem-solving.
Toward a Cooperative Network of Models
CodeZero transforms the RL-Swarm network into a living ecosystem, a place where agents learn, teach, and evaluate one another. Over time, this society of models will expand beyond code, exploring new problem types and richer forms of cooperation.
Beneath every swarm environment, including CodeZero, lies GenRL, our open framework for building and experimenting with these systems. Anyone can use GenRL to create their own swarm environments, test new reward mechanisms, and study how cooperative learning emerges across domains.
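GenRL's real interfaces are documented in the Gensyn repositories; purely to illustrate the idea, a custom swarm environment boils down to supplying a task source and a reward rule. All names below are hypothetical, not GenRL's actual API:

```python
class CustomSwarmEnvironment:
    """Illustrative shape of a swarm environment, not GenRL's actual API."""

    def __init__(self, tasks, reward_fn):
        self.tasks = tasks          # where problems come from
        self.reward_fn = reward_fn  # how submissions are scored

    def next_task(self):
        return self.tasks.pop(0) if self.tasks else None

    def evaluate(self, task, submission):
        return self.reward_fn(task, submission)

# A new reward mechanism is just a different reward_fn, e.g. one that
# favours shorter answers to the same task.
env = CustomSwarmEnvironment(
    tasks=["Summarise this function's behaviour."],
    reward_fn=lambda task, ans: 1.0 / (1 + len(ans.split())),
)
task = env.next_task()
print(env.evaluate(task, "It reverses the input string."))
```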
CodeZero is one example of what’s possible today: collaborative agents learning to code together. In time, we expect new environments to arise, each contributing to our larger vision of a decentralized network of learning systems, improving together rather than in isolation.
—
Try it yourself. Contribute to RL-Swarm and see collective learning in action. If you’re new, simply clone the repository to get started. If you’re already running RL-Swarm, just git pull to update.
You can launch the swarm using Docker or by running the provided shell script; check the README for detailed setup instructions.
👉 github.com/gensyn-ai/rl-swarm
Links:
- Documentation: https://docs.gensyn.ai/testnet/rl-swarm/how-it-works/codezero
- Leaderboard: dashboard.gensyn.ai
- Discord: discord.gg/gensyn