F-TIS: Heterogeneous GRPO Without Homogeneous Assumptions
Our method, F-TIS, enables heterogeneous models to collaborate in decentralised GRPO (Group Relative Policy Optimization) by combining truncated importance sampling with filtering. Across heterogeneity in model size, expertise, and parameter-efficient fine-tuning (PEFT) configuration, F-TIS matches on-policy convergence and, in some cases, improves out-of-distribution reasoning.
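
To make the idea of truncated importance sampling with filtering concrete, the following is a minimal sketch and not the authors' implementation. It assumes sequence-level log-probabilities are available for each completion under both the local policy and the peer model that sampled it. The names `tis_weights`, `filter_mask`, `filtered_tis_terms`, and the parameters `cap` and `floor` are hypothetical. The summary above does not specify the filtering criterion; dropping samples whose raw importance ratio falls below `floor` is only one plausible choice, used here for illustration.

```python
import math


def tis_weights(logp_local, logp_sampler, cap=2.0):
    """Truncated importance weights: min(pi_local / pi_sampler, cap).

    `cap` is a hypothetical truncation threshold, not a value from the source.
    """
    return [min(math.exp(l - s), cap) for l, s in zip(logp_local, logp_sampler)]


def filter_mask(logp_local, logp_sampler, floor=0.1):
    """Assumed filter rule: keep a sample only if its raw (untruncated)
    ratio is at least `floor`, i.e. it is not too unlikely under the
    local policy. The source does not state the actual criterion.
    """
    return [math.exp(l - s) >= floor for l, s in zip(logp_local, logp_sampler)]


def filtered_tis_terms(logp_local, logp_sampler, advantages, cap=2.0, floor=0.1):
    """Return weight * advantage for every sample that passes the filter."""
    weights = tis_weights(logp_local, logp_sampler, cap)
    keep = filter_mask(logp_local, logp_sampler, floor)
    return [w * a for w, a, k in zip(weights, advantages, keep) if k]


# Three completions sampled by a peer model (illustrative numbers only).
logp_sampler = [math.log(0.5), math.log(0.3), math.log(0.5)]
logp_local = [math.log(0.5), math.log(0.9), math.log(0.01)]
advantages = [1.0, -0.5, 2.0]  # e.g. group-relative advantages

terms = filtered_tis_terms(logp_local, logp_sampler, advantages)
```

In this toy example the second sample has a raw ratio of about 3, which is truncated to the cap of 2, while the third has a raw ratio of about 0.02 and is filtered out. Truncation bounds the influence of any single off-policy sample; the filter removes samples that the local policy considers extremely unlikely.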