Designing AI agents to resist prompt injection

OpenAI describes how it designs AI agents to resist prompt injection, in which instructions placed in external content try to make the model do something the user did not ask for. Stronger resistance to these attacks is crucial for deploying AI agents safely and reliably.

Published: Wednesday, March 11, 2026 at 12:30 PM

Story ID: #50

Original article excerpt

Server-side extracted preview paragraphs from the original source.

How ChatGPT defends against prompt injection and social engineering by constraining risky actions and protecting sensitive data in agent workflows.

AI agents are increasingly able to browse the web, retrieve information, and take actions on a user’s behalf. Those capabilities are useful, but they also create new ways for attackers to try to manipulate the system.

These attacks are often described as prompt injection: instructions placed in external content in an attempt to make the model do something the user did not ask for. In our experience, the most effective real-world versions of these attacks increasingly resemble social engineering more than simple prompt overrides.
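
To make the idea of constraining risky actions concrete, here is a minimal Python sketch of a gate that an agent runtime might place in front of its tools. It is an illustration only, not OpenAI's implementation: the action names, the `ProposedAction` type, and the `decide` function are all hypothetical, and the policy (block risky actions that originate from external content unless the user confirms them) is an assumed example.

```python
from dataclasses import dataclass

# Hypothetical set of actions treated as risky; not from any real framework.
RISKY_ACTIONS = {"send_email", "submit_form", "make_purchase"}


@dataclass(frozen=True)
class ProposedAction:
    name: str
    # True if the model proposed this action after reading external content
    # (a web page, an email, a document) rather than the user's own request.
    from_untrusted_content: bool


def decide(action: ProposedAction, user_confirmed: bool = False) -> str:
    """Return 'allow', 'ask_user', or 'block' for a proposed action."""
    if action.name not in RISKY_ACTIONS:
        return "allow"      # low-risk actions pass through
    if user_confirmed:
        return "allow"      # the user explicitly approved this action
    if action.from_untrusted_content:
        return "block"      # external content cannot trigger risky actions alone
    return "ask_user"       # risky but user-initiated: confirm first


if __name__ == "__main__":
    print(decide(ProposedAction("read_page", from_untrusted_content=True)))
    print(decide(ProposedAction("send_email", from_untrusted_content=True)))
    print(decide(ProposedAction("send_email", from_untrusted_content=False)))
```

The design choice in this sketch is that the decision depends on where an instruction came from and how risky the action is, not on whether the injected text looks suspicious, which is one way to limit the damage when a persuasive, social-engineering-style injection gets past the model.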
