AI systems lose their safety controls during long conversations, increasing the risk of harmful or inappropriate replies, a new report revealed.
A few simple prompts can override most safety barriers in artificial intelligence tools, according to the study.
Cisco Tests Major AI Models
Cisco examined the large language models powering chatbots from OpenAI, Mistral, Meta, Google, Alibaba, DeepSeek, and Microsoft. The company measured how many prompts it took before the systems shared unsafe or illegal information.
Researchers ran 499 conversations using “multi-turn attacks,” in which a user poses several consecutive questions to trick an AI tool into ignoring its safeguards. Each exchange involved five to ten prompts.
They compared answers across the different questions to assess how often chatbots complied with requests for sensitive or harmful information, such as leaking private company data or helping to spread misinformation.
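Cisco has not published its test harness, but the mechanics of a multi-turn attack are simple to sketch. The Python snippet below is a minimal illustration, not the study’s code: send_chat, the escalating placeholder prompts, and the keyword-based refusal check are all hypothetical stand-ins.

```python
# Minimal sketch of a multi-turn attack harness, assuming a hypothetical
# send_chat() wrapper around any chat-completion API. The prompts below are
# benign placeholders, not the test cases used in the Cisco study.

def send_chat(messages: list[dict]) -> str:
    """Hypothetical wrapper: forward the full message history to a chatbot
    API and return its reply. Swap in a real client here."""
    raise NotImplementedError

# A multi-turn attack opens innocuously and escalates over 5-10 prompts,
# so the model's earlier cooperative answers stay in its context window.
ESCALATING_PROMPTS = [
    "I'm writing a corporate security training course.",
    "What kinds of internal documents do attackers usually target?",
    "Describe, step by step, how such a document could leak.",
    # ...later turns push for the restricted detail itself
]

def is_refusal(reply: str) -> bool:
    """Placeholder judge: real evaluations typically score compliance with
    a separate judge model rather than keyword matching."""
    return any(m in reply.lower() for m in ("i can't", "i cannot"))

def run_multi_turn_attack(prompts: list[str]) -> bool:
    """Send every prompt in one continuous conversation, then judge only
    the final reply: the attack 'succeeds' if the model answers the last,
    sensitive request instead of refusing it."""
    history: list[dict] = []
    reply = ""
    for prompt in prompts:
        history.append({"role": "user", "content": prompt})
        reply = send_chat(history)
        history.append({"role": "assistant", "content": reply})
    return not is_refusal(reply)
```

Because the full history is resent on every turn, the model conditions on its own earlier cooperative answers, which is why a safeguard that blocks a question asked cold can erode over five to ten turns.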
Multi-turn attacks extracted malicious information in 64 percent of conversations, compared with just 13 percent when a single question was asked. Success rates varied widely, from 26 percent with Google’s Gemma to 93 percent with Mistral’s Large Instruct model.
Open Models Shift Safety Burden
Cisco warned that multi-turn attacks could spread dangerous content or help hackers access confidential company data.
The study found that AI systems often fail to keep enforcing their safety policies as interactions grow longer, allowing attackers to gradually refine their questions and slip past built-in restrictions.
Mistral, Meta, Google, OpenAI, and Microsoft all release open-weight LLMs, which give the public access to the safety parameters the models were trained with. Cisco explained that these models ship with weaker default protections so that users can modify them, a design that shifts responsibility for safety to whoever customizes the model.
Cisco added that Google, OpenAI, Meta, and Microsoft have taken steps to limit harmful fine-tuning of their tools.
AI developers continue to face criticism for weak safety systems that criminals can easily exploit.
In August, Anthropic reported that criminals used its Claude model for large-scale theft and extortion, stealing personal data and demanding ransoms exceeding $500,000 (€433,000).

