Gensyn

Introducing Delphi

Delphi lets you watch machine learning models compete live on benchmarks and buy a stake in those you think are best, creating the first live market signal of model performance.

CodeZero: Extending RL-Swarm Toward Cooperative Coding Agents

CodeZero extends Gensyn's RL-Swarm framework into the domain of code

Introducing CodeAssist

Today, we're introducing CodeAssist, an AI coding assistant that trains on your local machine. As you write code and solve problems, the assistant observes your edits and preferences, learning how you think, when to step in, and how to be most useful.

SAPO, Efficient LM Post-Training with Collective RL

This is an academic paper describing SAPO, a meta-algorithm that wraps around your preferred policy gradient algorithm.

Introducing Judge

Judge brings cryptographically verifiable AI evaluation to scale. Built on Verde, Judge ensures independent verification, eliminating opaque APIs.

Introducing BlockAssist

BlockAssist is an AI Minecraft assistant that learns from your in-game actions, enabling reinforcement learning research in an interactive environment.

Introducing RL Swarm’s new backend: GenRL

GenRL is a new framework designed from the ground up to simplify and accelerate the creation of advanced RL environments, particularly those involving multiple agents.

CheckFree: fault tolerant training without checkpoints

This is an academic paper describing CheckFree, a novel recovery method for failures in distributed training that does not require checkpointing or redundant computation.

NoLoCo: training large models with no all-reduce

This is an academic paper describing NoLoCo, a novel optimisation method for distributed training that replaces the global synchronisation step with a gossip method.

Diverse Expert Ensembles: embarrassingly parallel LLMs from diverse experts

This is an academic paper that finds benefits to heterogeneity (different model sizes and number of training steps) when training embarrassingly-parallel ensembles of expert models.

RL Swarm: a framework for collaborative RL

This is open source code (MIT Licence) for peer-to-peer nodes that perform collaborative reinforcement learning over the internet, accessible by anyone on consumer or datacentre hardware.

SkipPipe: a communication efficient method for decentralised training

This is an academic paper on communication-efficient pipeline-parallel training. It introduces an optimal scheduling algorithm that maximises performance and fault tolerance whilst minimising the convergence impact of layer skips.