Research Verde Verification System In Production In this blog post, we dive into the landscape of verification methods, discuss their advantages and drawbacks, and explain our method, Verde.
Research From Bundles to Time: A Theory of Decentralised Compute Markets We present a decentralised two-sided market design that treats compute as a time‑bound asset, enabled by reproducibility, verification, and checkpointing, yielding dynamic pricing and simple matching without combinatorial auctions.
Research Hail to the Thief: Exploring Attacks and Defenses in Decentralized GRPO Our paper, “Hail to the Thief: Exploring Attacks and Defenses in Decentralized GRPO”, is the first systematic study of both the attack vectors and defense strategies in decentralised reinforcement learning for Large Language Models (LLMs).
Product CodeZero: Extending RL-Swarm Toward Cooperative Coding Agents CodeZero extends Gensyn's RL-Swarm framework into the domain of cooperative coding agents.
Product Introducing CodeAssist Today, we're introducing CodeAssist, an AI coding assistant that trains on your local machine. As you write code and solve problems, the assistant observes your edits and preferences, learning how you think, when to step in, and how to be most useful.
Research SAPO, Efficient LM Post-Training with Collective RL This is an academic paper describing SAPO, a meta-algorithm that wraps around your preferred policy gradient algorithm.
Product Introducing Judge Judge brings cryptographically verifiable AI evaluation at scale. Built on Verde, Judge ensures independent verification, eliminating reliance on opaque APIs.
Product Introducing BlockAssist BlockAssist is an AI Minecraft assistant that learns from your in-game actions, enabling reinforcement learning research in an interactive environment.
Article Introducing RL Swarm’s new backend: GenRL GenRL is a new framework designed from the ground up to simplify and accelerate the creation of advanced RL environments, particularly those involving multiple agents.
Research CheckFree: fault-tolerant training without checkpoints This is an academic paper describing CheckFree, a novel recovery method for failures in distributed training that requires neither checkpointing nor redundant computation.
Research NoLoCo: training large models with no all-reduce This is an academic paper describing NoLoCo, a novel optimisation method for distributed training that replaces the global synchronisation step with a gossip method.
Research Diverse Expert Ensembles: embarrassingly parallel LLMs from diverse experts This is an academic paper that finds benefits to heterogeneity (different model sizes and number of training steps) when training embarrassingly-parallel ensembles of expert models.