Stop Multitask Training. Just DUME.
Dynamic Upcycling MoE (DUME) reuses dense experts trained on different domains to build a single multi-domain mixture-of-experts (MoE) model. DUME retains the knowledge of the original dense experts without any additional training, making it a cost-effective and scalable alternative to multitask training.
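To make the idea of upcycling concrete, here is a minimal toy sketch in plain Python. It is not DUME's actual implementation: the text above does not say how DUME routes inputs, so the dot-product router, the `UpcycledMoE` class, the top-1 selection, and the two toy experts are all hypothetical stand-ins. The sketch only illustrates the general pattern of copying the weights of separately trained dense experts into one MoE layer without updating them.

```python
import math


def ffn(x, w1, w2):
    """Toy feed-forward block: ReLU over the hidden units, then a linear map.

    w1 and w2 are lists of weight columns (one column per output unit).
    """
    hidden = [max(0.0, sum(xi * w for xi, w in zip(x, col))) for col in w1]
    return [sum(h * w for h, w in zip(hidden, col)) for col in w2]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class UpcycledMoE:
    """Hypothetical MoE layer assembled from already-trained dense experts."""

    def __init__(self, dense_experts, router_rows):
        # Expert weights are copied as-is; nothing is retrained here.
        self.experts = [dict(e) for e in dense_experts]
        # ASSUMPTION: one router row per expert, scored by dot product.
        # How DUME actually obtains its router is not described above.
        self.router_rows = router_rows

    def route(self, x):
        scores = [sum(xi * r for xi, r in zip(x, row)) for row in self.router_rows]
        return softmax(scores)

    def forward(self, x):
        probs = self.route(x)
        # Top-1 routing: send the input to the highest-scoring expert.
        k = max(range(len(probs)), key=probs.__getitem__)
        expert = self.experts[k]
        return k, ffn(x, expert["w1"], expert["w2"])


# Two toy "domain experts" with made-up weights (illustrative only).
code_expert = {"w1": [[1.0, 0.0], [0.0, 1.0]], "w2": [[1.0, 1.0]]}
math_expert = {"w1": [[2.0, 0.0], [0.0, 1.0]], "w2": [[0.5, 0.5]]}

moe = UpcycledMoE([code_expert, math_expert], router_rows=[[1.0, 0.0], [0.0, 1.0]])

x = [3.0, 1.0]
chosen, y = moe.forward(x)
# The MoE output matches the selected dense expert exactly.
assert y == ffn(x, code_expert["w1"], code_expert["w2"])
```

Because the expert weights are copied unchanged in this sketch, whenever the router picks expert `k` the layer reproduces that dense expert's output, which is the sense in which the original knowledge is retained. The quality of the merged model then depends on the router, which is the part this sketch leaves as a placeholder.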