Original article excerpt
Server-side extracted preview paragraphs from the original source.
Trading Inference-Time Compute for Adversarial Robustness
Initial evidence that reasoning models such as o1 become more robust to adversarial attacks as they think for longer.
Robustness to adversarial attacks(opens in a new window) has been one of the thorns in AI’s side for more than a decade. In 2014, researchers showed(opens in a new window) that imperceptible perturbations—subtle alterations undetectable to the human eye—can cause models to misclassify images, illustrating one example of a model’s vulnerability to adversarial attacks. Addressing this weakness has become more urgent as models are being used for high stakes applications and acting as agents that can browse the web and take actions on behalf of their users.