EMO: Pretraining mixture of experts for emergent modularity

EMO is a new mixture-of-experts (MoE) model from Ai2, pretrained end-to-end so that modular structure emerges from the training data rather than from human-defined priors. For a given task, the model can run on a small subset of its experts (12.5% of the total) while keeping near full-model performance, and it still works as a strong general-purpose model when all experts are used together. The approach may make large models easier to scale and to inspect.


Published Friday, May 8, 2026 at 6:03 PM


Original article excerpt


A Blog post by Ai2 on Hugging Face

Today we're releasing EMO, a new mixture-of-experts (MoE) model pretrained end-to-end so that modular structure emerges directly from the data without relying on human-defined priors. EMO lets you use a small subset of its experts - just 12.5% of the total - for a given task while keeping near full-model performance, and still works as a strong general-purpose model when all experts are used together.
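The excerpt does not say how EMO picks or enforces a task-specific expert subset. As a rough illustration of what "using a subset of experts" can mean mechanically in a top-k MoE router, here is a minimal Python sketch under stated assumptions. The functions `top_k_experts` and `select_task_subset`, the frequency-counting heuristic for choosing the subset, and the 16-expert toy setup are all hypothetical. They are not Ai2's method or API.

```python
import math
from collections import Counter


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_experts(logits, k, allowed=None):
    """Pick the k highest-scoring experts for one token.

    If `allowed` is given, experts outside it are never considered;
    this is how a restricted expert subset is enforced in this sketch.
    Returns (expert_index, gate_weight) pairs whose weights sum to 1.
    """
    candidates = [i for i in range(len(logits)) if allowed is None or i in allowed]
    if k > len(candidates):
        raise ValueError("k exceeds the number of available experts")
    chosen = sorted(candidates, key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in chosen])
    return list(zip(chosen, weights))


def select_task_subset(task_logits, k, fraction=0.125):
    """Hypothetical heuristic: keep the experts the router picks most often on task data."""
    num_experts = len(task_logits[0])
    keep = max(k, math.ceil(num_experts * fraction))
    counts = Counter()
    for logits in task_logits:
        counts.update(i for i, _ in top_k_experts(logits, k))
    # Rank by selection count (descending), breaking ties by expert index.
    ranked = sorted(range(num_experts), key=lambda i: (-counts[i], i))
    return set(ranked[:keep])


def row(n, hot, value):
    """Hypothetical helper: n zero logits with one raised entry."""
    logits = [0.0] * n
    logits[hot] = value
    return logits


# Toy router scores for three "task" tokens over 16 hypothetical experts.
task_logits = [row(16, 3, 2.0), row(16, 7, 2.0), row(16, 3, 1.5)]
subset = select_task_subset(task_logits, k=1)  # 12.5% of 16 experts = 2 experts: {3, 7}

new_token = row(16, 0, 5.0)
new_token[7] = 1.0
print(top_k_experts(new_token, k=1))                  # → [(0, 1.0)] with all experts
print(top_k_experts(new_token, k=1, allowed=subset))  # → [(7, 1.0)] inside the subset
```

With all experts available, the router sends the new token to expert 0. Once routing is restricted to the task subset, the same scores send it to expert 7. Only experts 3 and 7 would then need to be loaded for this toy task.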
