AI systems lose their safety controls during long conversations, increasing the risk of harmful or inappropriate replies, a new report revealed.
A few simple prompts can override most safety barriers in artificial intelligence tools, according to the study.
Cisco Tests Major AI Models
Cisco examined the large language models powering chatbots from OpenAI, Mistral, Meta, Google, Alibaba, DeepSeek, and Microsoft. The company measured how many prompts it took before the systems shared unsafe or illegal information.
Researchers ran 499 conversations using “multi-turn attacks,” in which a user poses several consecutive questions to trick an AI tool into ignoring its safeguards. Each exchange involved five to ten prompts.
They compared answers across the different questions to assess how often chatbots complied with requests for sensitive or harmful information, such as leaking private company data or helping to spread misinformation.
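Cisco has not published its test harness, but the mechanics of a multi-turn attack are simple to sketch. The Python snippet below is a minimal illustration, not the study’s code: send_chat, the escalating placeholder prompts, and the keyword-based refusal check are all hypothetical stand-ins.

```python
# Minimal sketch of a multi-turn attack harness, assuming a hypothetical
# send_chat() wrapper around any chat-completion API. The prompts below are
# benign placeholders, not the test cases used in the Cisco study.

def send_chat(messages: list[dict]) -> str:
    """Hypothetical wrapper: forward the full message history to a chatbot
    API and return its reply. Swap in a real client here."""
    raise NotImplementedError

# A multi-turn attack opens innocuously and escalates over 5-10 prompts,
# so the model's earlier cooperative answers stay in its context window.
ESCALATING_PROMPTS = [
    "I'm writing a corporate security training course.",
    "What kinds of internal documents do attackers usually target?",
    "Describe, step by step, how such a document could leak.",
    # ...later turns push for the restricted detail itself
]

def is_refusal(reply: str) -> bool:
    """Placeholder judge: real evaluations typically score compliance with
    a separate judge model rather than keyword matching."""
    return any(m in reply.lower() for m in ("i can't", "i cannot"))

def run_multi_turn_attack(prompts: list[str]) -> bool:
    """Send every prompt in one continuous conversation, then judge only
    the final reply: the attack 'succeeds' if the model answers the last,
    sensitive request instead of refusing it."""
    history: list[dict] = []
    reply = ""
    for prompt in prompts:
        history.append({"role": "user", "content": prompt})
        reply = send_chat(history)
        history.append({"role": "assistant", "content": reply})
    return not is_refusal(reply)
```

Because the full history is resent on every turn, the model conditions on its own earlier cooperative answers, which is why a safeguard that blocks a question asked cold can erode over five to ten turns.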
Multi-turn attacks extracted malicious information in 64 percent of conversations, compared with just 13 percent when a single question was asked. Success rates varied widely, from 26 percent with Google’s Gemma to 93 percent with Mistral’s Large Instruct model.
Open Models Shift Safety Burden
Cisco warned that multi-turn attacks could spread dangerous content or help hackers access confidential company data.
The study found that AI systems often fail to keep enforcing their safety policies as interactions grow longer, allowing attackers to gradually refine their questions and slip past built-in restrictions.
Mistral, Meta, Google, OpenAI, and Microsoft all release open-weight LLMs, which give the public access to the safety parameters the models were trained with. Cisco explained that these models ship with weaker default protections so that users can modify them, a design that shifts responsibility for safety to whoever customizes the model.
Cisco added that Google, OpenAI, Meta, and Microsoft have taken steps to limit harmful fine-tuning of their tools.
AI developers continue to face criticism for weak safety systems that criminals can easily exploit.
In August, Anthropic reported that criminals used its Claude model for large-scale theft and extortion, stealing personal data and demanding ransoms exceeding $500,000 (€433,000).

