Key Highlights
- Anthropic has identified three AI labs—DeepSeek, Moonshot, and MiniMax—conducting illicit distillation attacks on Claude.
- The attacks involved over 16 million exchanges conducted through roughly 24,000 fraudulent accounts, aimed at extracting powerful model capabilities.
- The threat of distillation attacks extends beyond any single company or region and requires coordinated action among industry players, policymakers, and the global AI community.
- Anticipating this threat, Anthropic has invested in detection systems and developed countermeasures to protect their models from illicit use.
Detected Illicit Distillation Attacks
Anthropic recently announced that they have uncovered industrial-scale distillation attacks by three prominent AI labs: DeepSeek, Moonshot, and MiniMax. These attacks involved an astonishing 16 million exchanges with Claude through approximately 24,000 fraudulent accounts, a clear violation of their terms of service.
The technique at issue, “distillation,” is itself a legitimate training method in which a less capable model is trained on the outputs of a more powerful one. But the same technique can be turned against a competitor: by harvesting another lab’s model outputs at scale, an attacker can replicate expensive capabilities at a fraction of the cost and development time.
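To make the mechanics concrete, here is a minimal sketch of the benign form of distillation described above: supervised fine-tuning of a student model on completions produced by a stronger teacher. It assumes a Hugging Face-style causal language model and tokenizer; the function and its arguments are illustrative, not drawn from any lab’s actual pipeline.

```python
import torch.nn.functional as F

def distillation_step(student, tokenizer, prompt, completion, optimizer):
    """One supervised fine-tuning step on a teacher-generated completion.

    `student` is any Hugging Face-style causal LM; `completion` is text
    previously sampled from the (stronger) teacher model.
    """
    ids = tokenizer(prompt + completion, return_tensors="pt").input_ids
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]

    logits = student(ids).logits          # (1, seq_len, vocab)
    shift_logits = logits[:, :-1, :]      # position t predicts token t+1
    shift_labels = ids[:, 1:].clone()
    shift_labels[:, : prompt_len - 1] = -100  # learn only from the completion

    loss = F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        ignore_index=-100,               # masked prompt tokens are skipped
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

An attacker runs this same loop at industrial scale over harvested outputs, which is why sheer API exchange volume is itself a detection signal.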
DeepSeek’s Distillation Campaign
DeepSeek’s campaign comprised over 150,000 exchanges targeting Claude’s reasoning across diverse tasks. To evade detection, they employed a sophisticated “load balancing” strategy that spread request volume; at the same time, they systematically asked Claude to articulate its internal reasoning in step-by-step detail, effectively generating valuable chain-of-thought training data.
Among their targets were censorship-safe alternatives to politically sensitive queries, indicating an intent to train their own models to avoid such topics. By analyzing request metadata and IP addresses, Anthropic traced the accounts back to specific researchers at DeepSeek.
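Attribution of this kind typically rests on correlating shared infrastructure across nominally independent accounts. The sketch below illustrates the general idea, grouping accounts by network fingerprint and flagging coordinated clusters; the field names and thresholds are hypothetical, not Anthropic’s actual pipeline.

```python
from collections import defaultdict

def cluster_by_infrastructure(requests, min_accounts=10, min_volume=100_000):
    """Group API requests by shared network fingerprint and flag clusters
    of nominally independent accounts behaving as one coordinated operation.

    `requests` is an iterable of dicts with hypothetical fields:
    'account_id', 'ip_subnet', and 'asn'.
    """
    clusters = defaultdict(lambda: {"accounts": set(), "volume": 0})
    for r in requests:
        key = (r["ip_subnet"], r["asn"])  # shared-infrastructure fingerprint
        clusters[key]["accounts"].add(r["account_id"])
        clusters[key]["volume"] += 1

    # Many accounts funneling heavy traffic through one network footprint
    # is the signature of a coordinated fraudulent-account campaign.
    return [
        {"infra": key, "accounts": sorted(c["accounts"]), "volume": c["volume"]}
        for key, c in clusters.items()
        if len(c["accounts"]) >= min_accounts and c["volume"] >= min_volume
    ]
```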
Moonshot AI’s Targeted Approach
Moonshot’s campaign involved over 3.4 million exchanges, focusing on agentic reasoning and tool use. They spread hundreds of fraudulent accounts across multiple access pathways, diversifying their traffic to make detection more challenging. In later phases they attempted to extract Claude’s reasoning traces.
By matching the public profiles of senior Moonshot staff against request metadata, Anthropic was able to attribute the campaign accurately. This underscores the sophistication of these distillation operations, which are engineered to slip past standard abuse controls.
MiniMax’s Active Campaign
MiniMax’s operation targeted agentic coding and tool use with over 13 million exchanges. The campaign was detected while still active, before MiniMax released their model, giving Anthropic unprecedented visibility into the distillation process from data generation to launch. They also pivoted quickly after Anthropic released a new model, demonstrating how fast such operations can adapt.
MiniMax’s campaign was attributed through request metadata and infrastructure indicators. The speed with which they shifted focus highlights the need for continuous vigilance in defending against such threats.
The Threat Landscape
The threat of distillation attacks is not limited to any single company or region; it is global and demands a coordinated response from the entire AI industry, including cloud providers and policymakers. The ability to extract powerful model capabilities at scale poses significant national security risks, particularly when the extracted capabilities flow to foreign labs with close ties to authoritarian governments.
Anticipating this threat, Anthropic has invested heavily in detection systems that identify distillation attack patterns. They have also developed product-level safeguards designed to make model outputs less useful for illicit distillation without degrading the experience of legitimate customers.
Response and Countermeasures
In response to these threats, Anthropic continues to strengthen their defenses against distillation attacks. Key measures include:
- Detection: Building classifiers and behavioral fingerprinting systems to identify attack patterns in API traffic (a toy illustration follows this list).
- Intelligence Sharing: Collaborating with other AI labs, cloud providers, and relevant authorities to gain a more holistic view of the distillation landscape.
- Access Controls: Enhancing verification processes for educational accounts, security research programs, and startup organizations that are commonly exploited by fraudulent users.
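As a toy illustration of the behavioral fingerprinting mentioned in the Detection bullet, the sketch below scores an account on traffic features that a bulk distillation campaign would tend to exhibit. The features, weights, and thresholds are invented for illustration and are not Anthropic’s actual signals.

```python
from dataclasses import dataclass

@dataclass
class AccountStats:
    requests_per_day: float
    distinct_prompt_templates: int  # scripted campaigns reuse few templates
    reasoning_trace_ratio: float    # share of requests demanding step-by-step reasoning
    output_tokens_per_request: float

def distillation_risk_score(s: AccountStats) -> float:
    """Heuristic score: higher means traffic looks more like bulk
    training-data harvesting than like ordinary product usage."""
    score = 0.0
    if s.requests_per_day > 5_000:
        score += 0.4   # sustained industrial volume
    if s.distinct_prompt_templates < 20:
        score += 0.2   # templated, scripted prompting
    if s.reasoning_trace_ratio > 0.8:
        score += 0.3   # systematic chain-of-thought extraction
    if s.output_tokens_per_request > 2_000:
        score += 0.1   # maximizing training data per call
    return score

# An account hammering the API with templated step-by-step requests:
suspect = AccountStats(20_000, 5, 0.95, 4_000)
print(round(distillation_risk_score(suspect), 2))  # 1.0 -> escalate for review
```

A production system would combine many such signals in a learned classifier rather than hand-tuned rules, but the intuition is the same: coordinated harvesting leaves a statistical fingerprint that ordinary usage does not.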
These countermeasures aim to make such attacks harder to execute and easier to identify. However, no single company can solve this problem alone; a coordinated, industry-wide response is essential to address the growing threat.
The stakes are high in today’s AI arms race. Companies like Anthropic must remain vigilant and proactive in defending their intellectual property while also pushing for broader industry standards and regulations. The future of AI security hinges on these efforts, and the clock is ticking.