A Blog post by IBM Research on Hugging Face
Recent advances in coding agents have sparked excitement around AI-assisted modernization. But an important question remains:
Existing software engineering benchmarks have demonstrated impressive progress in bug fixing and code generation, but framework migration presents a fundamentally different challenge. Success requires not only translating code, but also preserving behavior, adapting build systems, and navigating runtime dependencies.
To address this gap, we introduce ScarfBench (Self-Contained Application Refactoring Benchmark), an open benchmark for evaluating AI agents on cross-framework migration tasks in Enterprise Java.
Unlike traditional benchmarks that compare generated code against reference implementations, ScarfBench evaluates whether migrated applications actually build, deploy, and preserve behavior.
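The build, deploy, and behavior checks described above can be pictured as an ordered gate, where a later stage only matters if the earlier ones pass. The following is a minimal sketch of that idea, not ScarfBench's actual harness: the class `MigrationCheck`, the method `firstFailure`, and the stage names are all hypothetical, and each stage is stubbed with a fixed boolean instead of a real build or deployment.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: names and structure are assumptions, not ScarfBench code.
class MigrationCheck {

    /**
     * Runs the stages in insertion order and returns the name of the first
     * stage that fails, or "none" if every stage passes. Later stages are
     * not run once an earlier one has failed.
     */
    static String firstFailure(Map<String, BooleanSupplier> stages) {
        for (Map.Entry<String, BooleanSupplier> stage : stages.entrySet()) {
            if (!stage.getValue().getAsBoolean()) {
                return stage.getKey();
            }
        }
        return "none";
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the build -> deploy -> behavior order.
        Map<String, BooleanSupplier> stages = new LinkedHashMap<>();
        stages.put("build", () -> true);     // stub: the project compiles
        stages.put("deploy", () -> true);    // stub: the application starts
        stages.put("behavior", () -> false); // stub: a behavioral test fails

        System.out.println("first failing stage: " + firstFailure(stages));
        // prints "first failing stage: behavior"
    }
}
```

In this framing, a migration that compiles and deploys but changes observable behavior is still reported as a failure, which is the distinction from comparing generated code against a reference implementation.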
Even a simple repository migration can require changes across dependency injection, persistence configuration, queries, and framework descriptors. A small mistake in any one of these can prevent successful deployment.
Framework migration requires translating framework semantics, not just source code.
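To make the gap between source translation and semantic translation concrete, here is a small sketch that assumes, purely for illustration, a Spring-to-Jakarta EE migration (the excerpt does not name specific frameworks). The class `AnnotationMapper`, the method `needsReview`, and the lookup table are hypothetical and deliberately incomplete. The table records only rough annotation counterparts; it says nothing about scopes, lifecycles, or configuration, which is where the semantic work lies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the table is an illustrative assumption, not a complete
// or authoritative mapping between the two frameworks.
class AnnotationMapper {

    // Rough counterparts only. For example, a Spring @GetMapping carries its
    // URL path, while JAX-RS splits that across @GET and @Path.
    static final Map<String, String> SPRING_TO_JAKARTA = Map.of(
            "@Autowired", "@Inject",
            "@Service", "@ApplicationScoped",
            "@GetMapping", "@GET");

    /** Returns the annotations with no entry in the table, in input order. */
    static List<String> needsReview(List<String> annotations) {
        List<String> unmapped = new ArrayList<>();
        for (String annotation : annotations) {
            if (!SPRING_TO_JAKARTA.containsKey(annotation)) {
                unmapped.add(annotation);
            }
        }
        return unmapped;
    }

    public static void main(String[] args) {
        List<String> found = List.of("@Autowired", "@Service", "@Query");
        System.out.println(needsReview(found)); // prints [@Query]
    }
}
```

Even the annotations that do have an entry are not guaranteed to behave identically after a textual swap, so a lookup like this can only flag where to look; whether the migrated application builds, deploys, and behaves the same still has to be checked by running it.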
