Andy Zou, a researcher at Gray Swan AI, an IT security firm, says AI agents have only become prevalent in the past few months, as the focus has shifted from “just talking to the chatbot” to giving it tools that can take real-world actions—dramatically increasing the risks. The concern, he notes, is that agentic AI resembles the Hollywood caricature of George of the Jungle: ready to believe anything, no matter the consequences. “We found you can essentially manipulate the AI [to] override its programming,” Zou says.
In a new study, Zou and colleagues tested 22 leading mainstream AI agents with 1.8 million prompt injection attacks, around 60,000 of which successfully pushed the agents off their guardrails to grant unauthorized data access, conduct illicit financial transactions, and bypass regulatory compliance.
An earlier study showed even weaker defenses, with AI assistants fooled nearly 70% of the time into wiring money to fraudsters via buried “fine print” instructions. And just this week, browser developer Brave alleged that a similar website-based attack could manipulate Perplexity’s Comet browser. (Perplexity has since patched the flaw, though Brave contends the fix is incomplete.)
The lesson is clear: Even modest success rates at this scale translate into dangerous vulnerabilities. Before handing these bots the keys, they’ll need sharper critical thinking.
This isn’t merely hypothetical. One cryptocurrency user lost $50,000 when an AI agent was tricked into sending funds to the wrong wallet through malicious, agent-only instructions. As adoption grows—eight in 10 corporations now use some form of agentic AI, according to PricewaterhouseCoopers—the risks multiply.
Read more | FAST COMPANY