Evaluate your Amazon Nova Sonic voice agent at scale, no microphone required

Amazon has released the Nova Sonic Test Harness, an open source tool for evaluating Amazon Nova Sonic voice agents without a microphone. It automates multi-turn conversations and uses LLMs as judges to assess voice agent quality at scale. The tool can also detect mismatches between the model's audio and text outputs, flagging so-called audio hallucinations.
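
One way to catch the audio/text mismatches described above is to transcribe the spoken output with a speech recognizer and compare that transcript to the text output using word error rate (WER). The Python sketch below illustrates that general technique and is not the harness's actual method. It assumes an ASR transcript already exists (the transcription step is not shown), and the 0.1 threshold and the names `normalize`, `word_error_rate`, and `is_audio_mismatch` are hypothetical.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation, and split into words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        # WER is undefined for an empty reference; treat any output as a full mismatch.
        return 0.0 if not hyp else 1.0
    # prev[j] = edit distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def is_audio_mismatch(text_output: str, audio_transcript: str,
                      threshold: float = 0.1) -> bool:
    """Flag a turn when the spoken audio drifts too far from the text output.

    The default threshold is a hypothetical value chosen for illustration.
    """
    return word_error_rate(text_output, audio_transcript) > threshold


text = "Your appointment is confirmed for Tuesday at 3 PM."
heard = "Your appointment is confirmed for Thursday at 3 PM."
print(round(word_error_rate(text, heard), 3))  # 0.111
print(is_audio_mismatch(text, heard))          # True
```

A single swapped word in a nine-word sentence yields a WER of about 0.11, which this threshold flags. Because WER weights every word equally, a real check might also compare high-stakes tokens such as dates, times, and amounts directly.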

Original article excerpt

Voice agents are transforming how businesses interact with customers, handling appointment bookings, order inquiries, account management, and more through natural spoken conversation. But as these agents grow more capable, a fundamental challenge emerges: how do you test them?

In this post, we walk you through the Nova Sonic Test Harness, an open source framework that we built to solve both problems. It serves as a rapid iteration tool for tuning system prompts and tool configurations (run a conversation, see results, adjust, repeat) and as a comprehensive evaluation framework for validating voice agent quality at scale. It runs complete multi-turn conversations with Amazon Nova Sonic automatically, evaluates them using LLM-as-judge techniques, and can even detect cases where the model’s audio output doesn’t match its text output (audio hallucinations). No microphone required.
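
The excerpt says the harness evaluates conversations with LLM-as-judge techniques but does not show how. The following is a minimal Python sketch of that general pattern, not the harness's implementation. The rubric, the 1-to-5 scale, the pass rule, and every name (`RUBRIC`, `Verdict`, `build_judge_prompt`, `parse_verdict`, `judge_conversation`) are hypothetical. The judge model call is left as a callable you supply; here a canned stand-in replaces a real LLM.

```python
import json
from dataclasses import dataclass
from typing import Callable

# Hypothetical rubric; the harness's actual criteria may differ.
RUBRIC = {
    "task_completion": "Did the agent accomplish the user's goal?",
    "accuracy": "Were the facts and tool results the agent reported correct?",
    "conversational_quality": "Was the agent concise, polite, and natural for speech?",
}


@dataclass
class Verdict:
    scores: dict[str, int]  # criterion -> score from 1 to 5
    rationale: str

    @property
    def passed(self) -> bool:
        # Hypothetical pass rule: every criterion must score at least 4.
        return all(score >= 4 for score in self.scores.values())


def build_judge_prompt(turns: list[tuple[str, str]]) -> str:
    """Render a multi-turn transcript and the rubric into one judge prompt."""
    transcript = "\n".join(f"{speaker.upper()}: {text}" for speaker, text in turns)
    criteria = "\n".join(f"- {name}: {question}" for name, question in RUBRIC.items())
    return (
        "You are evaluating a voice agent conversation.\n\n"
        f"Transcript:\n{transcript}\n\n"
        f"Score each criterion from 1 (poor) to 5 (excellent):\n{criteria}\n\n"
        'Reply with JSON only: {"scores": {<criterion>: <int>}, "rationale": <string>}'
    )


def parse_verdict(reply: str) -> Verdict:
    """Parse the judge's JSON reply; missing criteria raise KeyError, bad scores ValueError."""
    data = json.loads(reply)
    scores = {name: int(data["scores"][name]) for name in RUBRIC}
    for name, score in scores.items():
        if not 1 <= score <= 5:
            raise ValueError(f"score for {name} out of range: {score}")
    return Verdict(scores=scores, rationale=str(data.get("rationale", "")))


def judge_conversation(turns: list[tuple[str, str]],
                       call_judge: Callable[[str], str]) -> Verdict:
    """call_judge wraps whatever LLM you use as the judge (not shown here)."""
    return parse_verdict(call_judge(build_judge_prompt(turns)))


def fake_judge(prompt: str) -> str:
    # Stand-in for a real LLM call; always returns the same canned verdict.
    return json.dumps({
        "scores": {"task_completion": 5, "accuracy": 4, "conversational_quality": 3},
        "rationale": "Booked correctly but too verbose for voice.",
    })


turns = [("user", "Book me in for Tuesday at 3 PM."),
         ("agent", "You're booked for Tuesday at 3 PM.")]
verdict = judge_conversation(turns, fake_judge)
print(verdict.passed)  # False
```

Asking the judge for structured JSON and validating it strictly keeps scores comparable across many runs, which matters when the same rubric is applied to conversations at scale.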