This is an academic paper describing SAPO, a meta-algorithm that wraps around your preferred policy gradient algorithm; models generate rollouts on a local batch of data, share them with a swarm, sample rollouts…
Evaluating AI model performance is a necessity: it drives model selection, informs research, and allows us to reason about the frontier of machine intelligence. It’s also hard. Traditional approaches rely on human…
BlockAssist is an AI Minecraft assistant that learns from your in-game actions.
Today we are introducing BlockAssist, an AI assistant that learns from its user’s actions in Minecraft. The assistant appears in-game…
Flexible, decentralised multi-agent RL environments
Reinforcement Learning (RL) continues to prove its power in solving complex problems, from optimising systems to training intelligent agents. As we push the boundaries, especially in scenarios involving…
This is an academic paper describing CheckFree, a novel recovery method for failures in distributed training that does not require checkpointing or redundant computation, enabling efficient training in the presence of frequent failures.