AISecurity.lol

January 2026 Recap

We’re only a little over a month into 2026, but software security failures in AI systems have already given us some interesting headlines that basically includes many of the major players (Google, Microsoft, OpenAI, Anthropic and IBM). And the first week of February isn’t quite over yet, and I already have one on the list for the February recap… (and that’s completely ignoring the dumpster fire that is ClawdBot/MoltBot/OpenClaw).

Let’s dive in!

Google Gemini Calendar Data Exfiltration

Link to original research: https://www.miggo.io/post/weaponizing-calendar-invites-a-semantic-attack-on-google-gemini
Summary: A threat actor can send a calendar invite to a victim, containing a prompt injection in the invite’s description. When the victim asks Gemini any benign question related to their calendar that causes Gemini to read the invite, the prompt injection takes effect. The researchers crafted a prompt injection that asks Gemini to create a new calendar invite containing any data related to the calendar (for example, summaries of the list of the user’s meetings). This new calendar invite includes the threat actor making the data accessible.

My take: There’s two things I really love about this attack. One is that the prompt injection entry point, the trigger and the data exfiltration mechanism are all in the same components (calendar). This makes everything semantically relevant which can make it hard to defend using classifiers like relevancy. The other thing I love here is creating a calendar invite to exfiltrate the data.

Microsoft Copilot Single-Click Data Exfiltration

Link to original research: https://www.varonis.com/blog/reprompt
Summary: This attack chain has three component. First, Copilot allowed links with a ?q= parameter in the URL to start a new conversation. A threat actor sends a Copilot link to a victim, containing a prompt injection in the URL parameter. Second, the researchers found that Copilot’s safeguards around making web requests only applied to the first time in a series, so the prompt injection instructs copilot to execute an action twice. And third, the web request’s result contains a follow-up instruction. The initial prompt injection instructs Copilot to continue executing, resulting in a chain that can result in a very dynamic data exfiltration attack.

My take: These are the classic software issues that I love to rant about. The fact that the second execution does not perform the same checks as the first is unbelievable. Second, that the result of a web request contains a prompt injection that isn’t filtered out AND causes more actions to be taken, also seems like beginner mistakes. Web requests should be about the highest on the list of untrusted data, and allowing it to result in follow-up actions means there’s no plan/task checking in place in the orchestration layer.

Claude Code ignores .claudeignore file

Link to news article: https://www.theregister.com/2026/01/28/claude_code_ai_secrets_files/
Link to one of several GitHub issues: https://github.com/anthropics/claude-code/issues/16704
Summary: When Claude Code works against a codebase, developers can add filenames and paths to a .claudeignore file to avoid those being read by the AI, similar to how a .gitignore file works. There could be many good reasons to do this, but a crucial scenario would be to avoid it reading secrets on disk, etc. Unfortunately, Claude seems to be able to access the files anyway, putting it at risk to exfiltrate this data if triggered through another vulnerability like a prompt injection.

My take: This feels like a vary basic implementation flaw. The code that reads and writes files should at a fundamental level not allow access for files in the ignore list. From what I understand from the articles and GitHub issue status, this issue has not been fixed yet?

Claude Cowork Data Exfiltration

Link to original research: https://www.promptarmor.com/resources/claude-cowork-exfiltrates-files
Summary: Building on a known prompt injection vulnerability from last year. The prompt injection abuses the code execution environment, which is restricting network access but allows Anthropic’s own APIs. The threat actor in this case provides an API key to an Anthropic account they control to exfil the data.

My take: Code execution environments are difficult, they sort of by definition provide RCE. But I’ve seen many of those sandboxes not be actual sandboxes but just use filtering, rudimentary limitations (block DNS) etc. instead of hard network blocks.

Anthropic MCP Git Server RCE

Link to original research: https://cyata.ai/blog/cyata-research-breaking-anthropics-official-mcp-server/
Summary: Three CVEs for the price of one. A flag that was supposed to restrict the server to a specific path didn’t validate paths in subsequent tool calls. The second CVE allowed any directory to be turned into a Git repository and subsequent Git operations (git init). And lastly, two functions passed arguments directly from the user to the underlying library without sanitization, allowing a malicious user to inject extra parameters. The threat actor in this case could create script files, then create git hooks to have git execute said script files.

My take: Again these seem like basic security vulnerabilities. And I guess they are, since an LLM wasn’t involved here (but obviously could be if you hooked this MCP server up somewhere and your AI system gets prompt injected to execute this attack chain).

ChatGPT ShadowLeak leaks again with ZombieAgent

Link to original research: https://www.radware.com/blog/threat-intelligence/zombieagent/
Summary: The fixes OpenAI made for the ShadowLeak vulnerability apparently blocked the agent from dynamically crafting URLs, stopping that exfiltration path. However, the researchers just made static URLs for the alphabet and had ChatGPT exfiltrate the data one character at a time… Additionally, the researchers gained persistence by manipulating memory. OpenAI made some changes to prevent memory manipulation and connectors (to external services like Gmail etc) from being used in the same chat session. But of course with memory being persistent, the session is somewhat irrelevant.

My take: Classic prompt injection stuff and I feel memory is a severe weakpoint in many systems today. The nature of memory is that the semantic meaning of what goes in it may vary wildly, making it potentially difficult to check. But of course once memory is compromised, the consequences can be pretty severe. Memory entries should ideally not be allowed contain instructions, but how does one semantically verify that? Personally, I always turn off memory in any AI system I use. Beyond the security of it I find the isolation of chat sessions useful, and memory breaks that feature.

IBM Bob Runs Malware

Link to original research: https://www.promptarmor.com/resources/ibm-ai-(-bob-)-downloads-and-executes-malware
Summary: A prompt injection to execute command line gets caught in Bob’s human-in-the-loop feature. However, the injection executes several benign commands causing multiple requests to the user to allow the agent to continue. The user is also presented with a “Always Allow”, which of course is the goal of multiple benign request to annoy the user and get them to allow all. Additionally, the researchers found that chained commands were shown to the user only when semi-colons were used to separate them. They could use the redirect operator (>) to bypass the allow-list feature, causing the “always allow” to also apply to their chained commands (as the subsequent command was not detected).

My take: TIL that Microsoft isn’t the only big tech company that’s had a product named “Bob”. Anyway, this research shows how classic cybersecurity techniques are 100% applicable in the AI world. User prompt fatigue, attackers evading simple filtering logic, etc. Human-in-the-loop features are needed and are useful, but they clearly have their limitations.