A team moved a coding AI agent from the Top 30 to the Top 5 on Terminal Bench 2.0 by redesigning the agent's harness (context engineering), without changing the underlying model. They made four changes: optimizing the system prompts; dynamically selecting only the relevant tools, which cut token overhead by ~60%; continuously compacting context with LLM summarization so the agent could handle longer tasks; and injecting backpressure signals from linters and test runners, which reduced errors by ~80%. Overall, the approach reduced token consumption by 40%, saving approximately $109,000/year at scale, while improving output quality and reliability.
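
To make three of these harness techniques concrete, here is a minimal Python sketch of tool selection, context compaction, and backpressure. It is an illustration under stated assumptions, not the team's implementation: every name (`Tool`, `select_tools`, `compact`, `backpressure`) is hypothetical, the keyword-overlap scoring and the four-characters-per-token estimate are stand-in heuristics, and `summarize` is a placeholder for whatever LLM summarization call a real harness would make.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    """Hypothetical tool descriptor; real harnesses carry full schemas."""
    name: str
    description: str
    keywords: frozenset


def select_tools(task: str, tools: list[Tool], limit: int = 3) -> list[Tool]:
    """Expose only tools whose keywords overlap the task text.

    Assumption: plain keyword overlap stands in for whatever relevance
    signal a production harness would use.
    """
    words = set(task.lower().split())
    scored = [(len(words & t.keywords), t) for t in tools]
    relevant = [pair for pair in scored if pair[0] > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)  # stable sort
    return [tool for _, tool in relevant[:limit]]


def estimate_tokens(text: str) -> int:
    """Rough estimate, assuming about four characters per token."""
    return max(1, len(text) // 4)


def compact(
    messages: list[str],
    budget: int,
    summarize: Callable[[list[str]], str],
    keep_recent: int = 2,
) -> list[str]:
    """Replace older messages with one summary once the budget is exceeded.

    `summarize` is a placeholder for an LLM summarization call.
    """
    total = sum(estimate_tokens(m) for m in messages)
    if total <= budget or len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [summarize(old)] + recent


def backpressure(checks: dict[str, tuple[bool, str]]) -> str | None:
    """Turn failed linter or test results into a message for the agent.

    `checks` maps a check name to (passed, output). Returns None when
    every check passed, so nothing extra enters the context.
    """
    failures = [
        f"[{name}] {output}"
        for name, (passed, output) in checks.items()
        if not passed
    ]
    if not failures:
        return None
    return "Checks failed; fix before continuing:\n" + "\n".join(failures)
```

In a harness loop built along these lines, `select_tools` would run before each model call, `compact` whenever the history grows, and `backpressure` after each edit, with its message appended to the context so the agent sees failures immediately instead of discovering them at the end of the task.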
Use Case
