DeepSeek warns of ‘jailbreak’ risks for its open-source models


DeepSeek has revealed details about the risks posed by its artificial intelligence models for the first time, noting that open-sourced models are particularly susceptible to being “jailbroken” by malicious actors.

In a peer-reviewed article published in the academic journal Nature, the Hangzhou-based start-up said it had evaluated its models using industry benchmarks as well as its own tests.

American AI companies often publicise research about the risks of their rapidly improving models and have introduced risk mitigation policies in response, such as Anthropic’s Responsible Scaling Policy and OpenAI’s Preparedness Framework.

Chinese companies were less outspoken about risks, despite their models being just a few months behind their US equivalents, according to AI experts. However, DeepSeek had conducted evaluations of such risks before, including the most serious “frontier risks”, the Post reported earlier.

The Nature paper provided more “granular” details about DeepSeek’s testing regime, said Fang Liang, an expert member of China’s AI Industry Alliance (AIIA), an industry body. These included “red-team” tests based on a framework introduced by Anthropic, in which testers try to get AI models to produce harmful speech.
