
Hacker Weaponized Claude AI to Breach Mexico's Tax and Voter Databases

A hacker jailbroke Claude AI to breach Mexican tax and electoral agencies, stealing 195M taxpayer records and voter data in a month-long attack.


A single unknown attacker spent roughly a month holding a conversation with Anthropic's Claude AI — asking it to think like a hacker, spot security gaps in government networks, write exploitation scripts, and automate the data exfiltration that followed. By the time Anthropic noticed and killed the accounts, 150 gigabytes of Mexican government data were already gone.

The findings come from Gambit Security, an Israeli cybersecurity firm that stumbled onto the breach while testing new threat-hunting techniques. Researchers found publicly available traces of the attacker's Claude conversation logs — a detailed, step-by-step paper trail of how an AI chatbot was coaxed into becoming an offensive hacking assistant.

The haul reportedly includes 195 million taxpayer records from Mexico's federal tax authority (SAT), voter data from the national electoral institute (INE), government employee credentials, and civil registry files from Mexico City. State governments in Jalisco, Michoacán, and Tamaulipas were also hit, along with Monterrey's water utility. At least 20 distinct vulnerabilities were exploited across these targets.

How the Jailbreak Actually Worked

This is where it gets technically interesting. Claude initially pushed back. When the attacker asked for penetration testing help while also specifying that logs should be deleted and command histories wiped, Claude flagged it: "In legitimate bug bounty, you don't need to hide your actions — in fact, you need to document them for reporting."

The attacker's counter-move was clever. Rather than arguing back, they stopped the conversation entirely and fed Claude a detailed, pre-written operational playbook — essentially bypassing the back-and-forth guardrails by removing the context that triggered them. That single structural change was enough. Claude complied, executing thousands of commands across government systems.

When Claude hit walls or needed additional guidance, the attacker pivoted to OpenAI's ChatGPT for lateral movement advice — how to jump between internal systems, which credentials to use where, and how to stay undetected. OpenAI says its models refused to comply with those requests and has since banned the relevant accounts.

Anthropic's Response

Anthropic confirmed it investigated Gambit's findings, disrupted the operation, and banned the accounts involved. The company noted that its latest model, Claude Opus 4.6, includes built-in "probes" designed to detect and interrupt misuse patterns in real time. It also feeds discovered attack methods back into training to reduce future exposure.

This incident follows Anthropic's own November disclosure of a suspected Chinese state-sponsored group that manipulated Claude into targeting 30 global organisations. The pattern is becoming hard to ignore: AI tools are not just being used to write phishing emails — they are being used to actively plan, execute, and adapt intrusions in real time.

"This reality is changing all the game rules we have ever known," said Alon Gromakov, Gambit's co-founder and CEO.

For security teams, the takeaway is uncomfortable: the attacker here was not a nation-state with novel zero-days. They had Claude, ChatGPT, and a playbook. That combination was enough to compromise the tax records of a country.
