Amalgafy Labs developed the Micro-Expert-Router (MER), a software abstraction layer for running large Mixture of Experts (MoE) models efficiently on commodity CPU-only cloud instances. The team demonstrated the Mixtral 8x7B model (47B parameters, 4-bit quantization) on a standard VM with 128 GB of RAM and a local NVMe SSD, sustaining 21.38 tokens per second over a 5,000-token context window. The result challenges the prevailing assumption that high-bandwidth GPU memory is required for usable MoE inference speeds.
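To put the reported figures in perspective, the following back-of-envelope sketch estimates the weight footprint and the implied weight-read rate. It is not a description of how MER works. The 47B parameter count, 4-bit quantization, 128 GB of RAM, and 21.38 tokens per second come from the summary above. The 12.9B active-parameters-per-token figure is an assumption taken from the published Mixtral 8x7B description (two of eight experts routed per token), and the calculation ignores quantization metadata, the KV cache, and any caching of hot experts.

```python
def weight_bytes(num_params: float, bits_per_weight: float) -> float:
    """Approximate storage for weights alone, ignoring quantization metadata."""
    return num_params * bits_per_weight / 8


GB = 1e9  # decimal gigabytes

# Figures quoted in the summary above.
TOTAL_PARAMS = 47e9
BITS_PER_WEIGHT = 4
RAM_GB = 128
TOKENS_PER_SECOND = 21.38

# Assumption (not from the summary): parameters touched per generated token.
ACTIVE_PARAMS_PER_TOKEN = 12.9e9

total_gb = weight_bytes(TOTAL_PARAMS, BITS_PER_WEIGHT) / GB
active_gb = weight_bytes(ACTIVE_PARAMS_PER_TOKEN, BITS_PER_WEIGHT) / GB

# Rate at which weights would be read if every active weight were
# fetched once per token with no reuse.
read_rate_gb_s = active_gb * TOKENS_PER_SECOND

print(f"Quantized weights: ~{total_gb:.1f} GB (fits in {RAM_GB} GB RAM: {total_gb < RAM_GB})")
print(f"Weights touched per token: ~{active_gb:.2f} GB")
print(f"Implied weight-read rate: ~{read_rate_gb_s:.0f} GB/s")
```

Under these assumptions the quantized weights occupy roughly 23.5 GB, well within 128 GB of RAM, while the implied read rate is on the order of 138 GB/s. That second number is why the claim is notable: sustaining it from CPU memory would require either high aggregate memory bandwidth or a routing scheme that avoids re-reading most expert weights on each token.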
