Trading inference-time compute for adversarial robustness

OpenAI explores improving adversarial robustness by increasing inference-time compute. The approach trades response speed for stronger defenses against adversarial attacks. Greater robustness matters for deploying AI systems safely in real-world applications.

Published: Wednesday, January 22, 2025 at 11:00 AM

Original article excerpt

Initial evidence that reasoning models such as o1 become more robust to adversarial attacks as they think for longer.

Robustness to adversarial attacks has been one of the thorns in AI’s side for more than a decade. In 2014, researchers showed that imperceptible perturbations (subtle alterations undetectable to the human eye) can cause models to misclassify images, illustrating one example of a model’s vulnerability to adversarial attacks. Addressing this weakness has become more urgent as models are being used for high-stakes applications and acting as agents that can browse the web and take actions on behalf of their users.
