Diverse Expert Ensembles: embarrassingly parallel LLMs from diverse experts
This is an academic paper that finds benefits to heterogeneity (experts with different model sizes and different numbers of training steps) when training embarrassingly parallel ensembles of expert models, that is, ensembles whose experts are trained independently of one another.