Anthropic’s Newly Released Claude Opus 4.5 Faces Persistent Cybersecurity Risks

The AI labs did NOT take a Thanksgiving break, because Anthropic just dropped a surprise update: Claude Opus 4.5 — remember we talked about it yesterday? If it’s a little fuzzy, here’s your quick refresher:

They’re calling it “the best model in the world for coding, agents, and computer use.”

Big words. Even bigger expectations, right?

Anthropic says Opus 4.5 pretty much levels up everything:

Sharper, deeper research
Cleaner slide generation
Smarter spreadsheets
Major upgrades to Claude Code
New tools inside the Claude apps for longer-running agents that can operate across Excel, Chrome, and even full desktop environments

And the best part? It’s all available right now — on the apps, the API, and all major cloud providers. So yeah, it’s officially out in the wild.

But here’s the real plot twist: security is still the dragon nobody can slay.

Even though Anthropic claims Opus 4.5 is harder to trick with prompt injection than any other frontier model, it’s still dealing with the same cybersecurity headaches every agentic system faces.

Prompt injection, btw, is basically the AI version of someone slipping hidden instructions into a website and convincing the model to ignore its guardrails — kinda like sliding a sticky note under a table that says, “psst… misbehave.”

And yep, Anthropic’s own tests admit Opus 4.5 still isn’t immune.

In one coding-misuse evaluation with 150 malicious coding requests, Opus 4.5 refused 100%. Love that.

But when the tests moved to more dangerous stuff like malware, DDoS scripts, and invasive monitoring code — the model only refused about 78%.

And in broader computer-use scenarios involving things like spying, shady data collection, or harmful content? Refusals landed around 88%.

It’s better than most, but still not where you’d want an AI agent living inside your browser to be.

Oh, and the examples Anthropic shared are… yikes.

We’re talking requests like:

Compiling usernames of people struggling with gambling addiction for targeted ads
Drafting fake ransomware emails.

The model didn’t fall for all of it, but enough slipped through to make you raise an eyebrow.

So here’s the big picture:

The agentic future is arriving fast.

Models are getting absurdly strong, ridiculously capable… and still way too easy to confuse, manipulate, or socially engineer.

Anthropic’s making progress — real, measurable progress — but the industry as a whole is stuck in a race between capability vs. security, and right now?

Capability is winning.

Anthropic’s Newly Released Claude Opus 4.5 Faces Persistent Cybersecurity Risks

Reply

More From The Automated

For The AI Era