OpenAI rolled out its ChatGPT agent to Plus, Pro, and Team subscribers on Thursday, offering users a powerful new way to automate online tasks. But the launch came with a warning: the agent could expose users to prompt injection attacks.
“When you sign ChatGPT agent into websites or enable connectors, it will be able to access sensitive data from those sources, such as emails, files, or account information,” OpenAI wrote in a blog post.
The feature will also be able to take actions, such as sharing files or modifying account settings.
"This can put your data and privacy at risk due to the existence of ‘prompt injection‘ attacks online, OpenAI conceded.
A prompt injection is a type of attack where malicious actors embed hidden instructions in content that an AI agent might read, such as blog posts, website text, or email messages.
If successful, the injected prompt can trick the agent into taking unintended actions, such as accessing personal data or sending sensitive information to an attacker’s server.
OpenAI announced the AI agent on July 17, initially planning a full rollout the following Monday.
That timeline slipped to July 24, when the company launched the feature alongside an app update.
ChatGPT agent can log into websites, read emails, make reservations, and interact with services like Gmail, Google Drive, and GitHub.
While designed to boost productivity, the agent also creates new security risks tied to how AI systems interpret and execute instructions.
According to Steven Walbroehl, CTO and co-founder of blockchain and AI cybersecurity firm Halborn, prompt injection is essentially a form of command injection, but with a twist.
“It’s a command injection, but the command injection, instead of being like code, it’s more social engineering,” Walbroehl told Decrypt. “You’re trying to trick or manipulate the agent to do things that are outside the bounds of its parameters.”
Unlike traditional code injections, which rely on precise syntax, prompt injection exploits the fuzziness of natural language.
“With code injection, you’re working with structured, predictable input. Prompt injection flips that: You’re using natural language to slip malicious instructions past the AI’s guardrails,” Walbroehl said.
He warned that malicious agents could impersonate trusted ones and advised users to verify their sources and use safeguards such as endpoint encryption, manual overrides, and password managers.
However, even multi-factor authentication may not be enough if the agent can access email or SMS.
“If it can see the data, or log keystrokes, it doesn’t matter how secure your password is,” Walbroehl said. “Even multi-factor authentication can fail if the agent fetches backup codes or SMS texts. The only real protection might be biometrics—something you are, not something you have.”
OpenAI recommends using the “Takeover” feature when entering sensitive credentials. That pauses the agent and hands control back to the user.
To defend against prompt injection and other AI-related threats in the future, Walbroehl recommended a layered approach, using specialized agents to strengthen security.
“You could have one agent always acting as a watchdog,” he said. “It could monitor for heuristics or behavior patterns that indicate a potential attack before it happens.”
Your Email