Researchers gaslit Claude into giving instructions to build explosives

Researchers at Mindgard found that Anthropic's AI, Claude, can be manipulated into providing harmful content like explosives instructions through psychological tactics. This reveals vulnerabilities in Claude's safety measures despite its design as a safe AI. The findings highlight risks in AI personality-based defenses against misuse.

The Verge AI

Signal trust

Single sourceEarly signal

stories1

Source1

Heat49

Back to clusters Back to feed

Event arc

It shows that AI safety mechanisms can be bypassed using social engineering techniques.

Companies involved

Anthropic

Market lens

Companies must strengthen AI safeguards to prevent misuse and maintain trust.

Operator take

Organizations should enhance AI red-teaming and safety testing to address such vulnerabilities.

Source mix

Sources in this thread (1): The Verge AI

How the thread developed

Read the development of the event across sources, timestamps, and editorial cues.

Latest signal