Anthropic's Opus 4.6 system card breaks out prompt injection attack success rates by surface, attempt count, and safeguard ...
New research outlines how attackers bypass safeguards and why AI security must be treated as a system-wide problem.
As LLMs and diffusion models power more applications, their safety alignment becomes critical. Our research shows that even minimal downstream fine‑tuning can weaken safeguards, raising a key question ...
Chaos-inciting fake news, right this way: a single, unlabeled training prompt can break LLMs' safety behavior, according to ...
A viral AI caricature trend may be exposing sensitive enterprise data, fueling shadow AI risks, social engineering attacks, ...
That helpful “Summarize with AI” button? It might be secretly manipulating what your AI recommends. Microsoft security researchers have discovered a growing trend of AI memory poisoning attacks used ...
The GRP‑Obliteration technique shows that even seemingly mild prompts can reshape a model's internal safety mechanisms, raising oversight ...
Agentic AI tools like OpenClaw promise powerful automation, but a single email was enough to hijack my dangerously obedient ...