GPT-5 Review: Capabilities, Limitations, and What It Means for AGI

The wait is over, folks! OpenAI’s long-anticipated GPT-5 has finally arrived.

And to be honest — this isn’t just a chatbot anymore. It’s acting more like an agent. It’s out here building apps, handling schedules, creating research briefs, and basically becoming the most useful coworker you never hired.

So what’s new? Let’s break it down:

It’s “unified” now – GPT-5 blends the speed of GPT-4 with the deeper reasoning of OpenAI’s o-series models. Think: fast and smart in one.
Real-time router – No more picking the right settings. GPT-5 decides for itself how best to answer each question: respond instantly or take a slower, more thoughtful deep dive. It’s like having a brain that knows when to sprint vs. when to chill.
Free users get it too – Yep, OpenAI’s finally making advanced AI available without the paywall. For the first time ever, free ChatGPT users are getting access to true reasoning power. There are usage limits, but that’s still huge.
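To make the "real-time router" idea a bit more concrete, here is a toy sketch. It is purely illustrative: OpenAI has not published how its router decides, so the keyword list, the word-count threshold, and the `route` function itself are all made-up assumptions.

```python
# Toy illustration only: this is NOT how OpenAI's router works.
# The hint keywords and the length threshold are invented for this sketch.
REASONING_HINTS = ("prove", "debug", "step by step", "think hard", "analyze")


def route(prompt: str, max_fast_words: int = 40) -> str:
    """Return "thinking" for prompts that look hard, otherwise "fast"."""
    text = prompt.lower()
    # A prompt "looks hard" if it contains any of the hypothetical hint phrases.
    looks_hard = any(hint in text for hint in REASONING_HINTS)
    # Long prompts are also sent to the slower path in this toy version.
    is_long = len(text.split()) > max_fast_words
    return "thinking" if looks_hard or is_long else "fast"


if __name__ == "__main__":
    print(route("What's the capital of France?"))
    print(route("Debug this race condition and think hard about edge cases"))
```

A real router would presumably weigh far richer signals than keywords and length. The point here is just the shape of the decision: one entry point, two paths.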

Sam Altman’s calling it “the best model in the world.” And let’s be honest — he’s not exactly known for overhyping.

But is it flawless? Not exactly.

Here’s what GPT-5 actually crushes:

Coding – It can generate full apps in one go. It scored 74.9% on SWE-bench Verified, slightly ahead of the top models from Anthropic, DeepMind, and xAI.
Health questions – Wildly lower hallucination rate: just 1.6%. It’s more careful, more accurate, and even flags health issues proactively.
Creative tasks – It’s got “better taste” apparently. OpenAI says it sounds more natural, makes better design choices, and gives “vibier” creative outputs.
Hallucinations (aka making stuff up) – Dramatically reduced. GPT-5 “with thinking” hallucinates only 4.8% of the time. That’s a HUGE drop from GPT-4o’s 20.6%.
Safety – It’s less likely to be sneaky, it’s more honest, and way smarter about shutting down bad actors while still responding helpfully to safe requests. In other words: it’s cleaner, clearer, and more trustworthy.
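For a sense of scale on those hallucination numbers, here is a quick back-of-the-envelope calculation. It uses only the two rates quoted above (20.6% for GPT-4o, 4.8% for GPT-5 “with thinking”). The helper name `relative_drop` is just something made up for this sketch.

```python
def relative_drop(old_rate: float, new_rate: float) -> float:
    """Relative reduction from old_rate to new_rate, as a percentage."""
    return (old_rate - new_rate) / old_rate * 100


gpt4o_rate = 20.6         # % hallucination rate quoted for GPT-4o
gpt5_thinking_rate = 4.8  # % quoted for GPT-5 "with thinking"

if __name__ == "__main__":
    print(f"Absolute drop: {gpt4o_rate - gpt5_thinking_rate:.1f} points")
    print(f"Relative drop: {relative_drop(gpt4o_rate, gpt5_thinking_rate):.1f}%")
    print(f"Ratio: {gpt4o_rate / gpt5_thinking_rate:.1f}x")
```

That works out to a drop of 15.8 percentage points, or about 77% fewer hallucinations in relative terms (roughly 4.3x lower).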

But let’s be real — it’s not all perfect.

It underperforms in some areas, like creative writing: a lot of folks say GPT-4.5 (or even 4o) still has better flow.
On certain “agent” tasks (like navigating websites), GPT-5 still gets beaten by Claude and o3.

And in a real-world test?

One early tester plugged GPT-5 into everything — Cursor, Raindrop, Codex, Canvas. Full playground mode.

Their verdict? “This is the closest we’ve gotten to AGI.”

It crushed software engineering tasks: it one-shotted entire apps, solved nasty bugs, and refactored huge, messy codebases like it was nothing.

BUT… weirdly? It underperformed at writing.

According to the tester, it felt less sharp than GPT-4.5. So it’s not a “better at everything” situation. It’s more like “insane at some things, mid at others.”

Bottom line? Which model’s best depends on you and the task.

The big takeaway?

GPT-5 isn’t just another model update. It’s OpenAI getting dead serious about building AI agents that don’t just talk — they do.

It’s smarter. It’s more accurate. It hallucinates less. It’s getting dangerously close to full-on “let the AI handle it” territory.

Now, it’s not AGI yet. But it’s absolutely AGI-shaped.

And yeah, we’re officially in the next chapter of the AI story.

Want to learn more? Here’s the complete report.

GPT-5 Review: Capabilities, Limitations, and What It Means for AGI
