The Role of Feedback Alignment in Self-Distillation
We propose step-aligned feedback for self-distillation: feedback that follows the solver's reasoning trace step by step. Because it anchors training on reasoning rather than on stylistic tokens, step-aligned feedback resolves a central bottleneck of self-distillation and outperforms the misaligned context types used in prior work.