Gensyn (Page 2)

Introducing RL Swarm’s new backend: GenRL

GenRL is a new framework designed from the ground up to simplify and accelerate the creation of advanced RL environments, particularly those involving multiple agents.

CheckFree: fault tolerant training without checkpoints

This is an academic paper describing CheckFree, a novel recovery method for failures in distributed training that does not require checkpointing or redundant computation.

NoLoCo: training large models with no all-reduce

This is an academic paper describing NoLoCo, a novel optimisation method for distributed training that replaces the global synchronisation step with a gossip method.

Diverse Expert Ensembles: embarrassingly parallel LLMs from diverse experts

This is an academic paper that finds benefits to heterogeneity (different model sizes and numbers of training steps) when training embarrassingly parallel ensembles of expert models.

RL Swarm: a framework for collaborative RL

This is open-source code (MIT Licence) for peer-to-peer nodes that perform collaborative reinforcement learning over the internet, and that anyone can run on consumer or datacentre hardware.

SkipPipe: a communication efficient method for decentralised training

This is an academic paper on communication-efficient pipeline-parallel training. It introduces an optimal scheduling algorithm that maximises performance and fault tolerance whilst minimising the convergence impact of layer skips.

Verde: a verification system for machine learning over untrusted nodes

This is an academic paper describing Verde, a verification protocol for machine learning programs, as well as the underlying Reproducible Operators (RepOps) system that enables it.

GPT@home: Why the Future of Training is Decentralized

AI training costs are hitting $100B per run. Gensyn's decentralized infrastructure enables efficient training across edge devices at massive scale, making model development collaborative and accessible.