OpenAI’s new AIs are in their chaotic era?!

Plus - maximize your social media reach with AI-driven co-branding

Hello and welcome to The Automated, your AI tour guide.

OpenAI’s latest AI models, o3 and o4-mini, have been making waves with their impressive capabilities — but there’s a lot of drama lurking beneath the surface.

While these models are breaking records in some areas, they also have a wild side: hallucinations that could rival a sci-fi plot.

Curious about why OpenAI’s newest AIs are anything but predictable?

Here's what we have for you today:

🤯 Inside OpenAI’s o3 and o4-mini Hallucination Drama

Remember how just last week OpenAI’s new o3 and o4-mini models freaked everyone out because they were somehow pinpointing exact real-world locations just from photos?

Yeah, well... plot twist: turns out these same models also have a pretty wild imagination.

Like, hallucination-level wild.

And not just "oops, a little mistake" — we're talking way more hallucinations than even OpenAI’s older models, including GPT-4o.

Here’s the breakdown:

  • OpenAI openly admitted that o3 and o4-mini hallucinate more than o1, o1-mini, o3-mini, and even non-reasoning models like GPT-4o. 

  • In fact, o3 hallucinated on 33% of questions in PersonQA, OpenAI’s internal benchmark about people, compared to just 14–16% from previous models like o1 and o3-mini.

  • And o4-mini? A whopping 48% hallucination rate. I mean, that’s not just vibes; that’s full-on delusion.

  • Third-party researchers (shoutout to Transluce) even caught o3 making up fake actions — like claiming it ran code on a 2021 MacBook Pro and pasted the results... outside of ChatGPT. Like, bruh. No you didn’t.

Oh, and sometimes it confidently drops broken website links that lead nowhere!

And here’s the kicker: OpenAI doesn’t even know why it’s happening. Literally. In their own words: “more research is needed.” 

So what’s actually going on here?

Apparently, these new "reasoning models" perform better in areas like coding and math (big win), but the tradeoff is that they make more claims overall. And, surprise, surprise, more claims means more chances to hallucinate.

  • One theory is that the reinforcement learning method OpenAI used may actually be amplifying the hallucinations instead of reducing them.

Now, to be fair, hallucinations aren’t always bad. According to some studies, they can spark creativity; they’re part of how models come up with fresh, interesting ideas.

But in high-stakes settings like law, finance, or serious business deals, creativity isn’t what you want. You want facts. Cold, hard, boringly accurate facts. Not AI writing you some fan fiction in a legal contract.

The possible fix? Let models tap into live web search.

GPT-4o with web access already hit 90% accuracy on SimpleQA, another of OpenAI’s benchmarks. That suggests search can definitely help rein in the nonsense, at least when you’re okay with exposing your queries to a third-party search engine.
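If you want to see what that looks like in code, here’s a minimal sketch using the official openai Python SDK and its Responses API web-search tool. The tool name has shifted between releases, so treat "web_search_preview" as illustrative and check the current docs:

```python
# Minimal sketch: grounding answers with live web search, via the
# official `openai` Python SDK (pip install openai). The web-search
# tool name has changed between releases, so treat "web_search_preview"
# as illustrative rather than gospel.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from your environment

resp = client.responses.create(
    model="gpt-4o",
    tools=[{"type": "web_search_preview"}],  # let the model consult the live web
    input="Who won the 2024 Nobel Prize in Physics? Cite your sources.",
)

print(resp.output_text)  # answer grounded in (and citing) search results
```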

But here’s the bottom line:

OpenAI’s push into "reasoning models" is exciting, but if more reasoning means more hallucinations, then Houston, we’ve got a whole new problem.

And after last week's location drama, it’s very clear: These models are powerful, but still a little too chaotic to be trusted blindly.

Oh — and just to add a little extra spice:

Turns out OpenAI’s shiny new o3 model also scored lower on the FrontierMath benchmark than the company initially implied.

Maybe that’s why our expectations keep getting lowkey let down.

Seriously, that one’s worth digging into.

Thinking about starting your own newsletter? We recommend using Beehiiv — it's what we’ve used for the past 2 years and love.

If you want to support us, use our affiliate link below 👇

🔧 AI Troubleshooter: When ChatGPT Butchers Your Translations.

Hi everyone, we’ve got another fix for you!

One of our readers hit us up, confused and slightly horrified after ChatGPT turned their carefully crafted text into… well, something totally off in another language.

If you’ve ever tried translating with ChatGPT and ended up with a result that sounds like it went through five different languages and back, this one's for you.

Here’s how to get cleaner, more accurate translations (tinkerers: there’s a scripted version of these tips right after the list):

  • Be Super Clear: Don’t just say “Translate this.” Instead, spell it out like: “Translate this text from English to Spanish.” The more specific you are, the better the result.

  • Keep It Short and Sweet: Dumping a huge wall of text into ChatGPT is not the best move. Break it into smaller sections so it can focus and stay accurate.

  • Double-Check Critical Stuff: For critical content (like legal docs, client emails, or anything that can’t afford a mistranslation), run it through a second tool like DeepL or Google Translate to compare. A quick sanity check never hurts.

  • Ask ChatGPT to Double-Check Itself: Follow up with something like: "Review the translation for accuracy compared to the original." Sometimes it catches its own slip-ups!

  • Pro Tip: If you’re using the voice or image input features, make sure your audio is clear or your images are sharp and well-lit. Blurry inputs = blurry outputs.
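And for the tinkerers: here’s a minimal sketch of tips 1, 2, and 4 as a script, assuming the official openai Python SDK. The model name, chunk size, and prompt wording are illustrative, so adjust to taste:

```python
# Minimal sketch: chunked translation with a self-review pass.
# Assumes the official `openai` Python SDK and an OPENAI_API_KEY in your
# environment; the model name and chunk size below are illustrative.
from openai import OpenAI

client = OpenAI()

def translate(text: str, source: str = "English", target: str = "Spanish") -> str:
    # Tip 2: break big walls of text into smaller sections.
    # (Naive character chunking can split mid-sentence; split on
    # paragraphs or sentences in real use.)
    size = 1500
    chunks = [text[i:i + size] for i in range(0, len(text), size)]

    draft_parts = []
    for chunk in chunks:
        # Tip 1: spell out the source and target languages explicitly.
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "user",
                "content": f"Translate this text from {source} to {target}. "
                           f"Return only the translation:\n\n{chunk}",
            }],
        )
        draft_parts.append(resp.choices[0].message.content)
    draft = "\n".join(draft_parts)

    # Tip 4: ask the model to double-check its own work, then hand back
    # a corrected final version.
    review = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": "Review the translation for accuracy compared to the "
                       "original, then return only the corrected translation."
                       f"\n\nOriginal:\n{text}\n\nTranslation:\n{draft}",
        }],
    )
    return review.choices[0].message.content

print(translate("The spreadsheet is due Friday; please flag any blockers early."))
```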

That’s it for this round!

If you still don’t get the results you need, fill out this quick Google Form and we’ll help you troubleshoot it one-on-one.

PS: Premium subscribers get priority help!

Now onto the next big thing in the AI space…

🤖 ChatGPT's New Feature That’s Got People Talking

So, OpenAI just took ChatGPT’s memory game up a notch with a newly added feature called “Memory with Search.”

Now, this might sound like a small tweak, but trust me, it’s actually a pretty cool step forward.

Here’s the scoop:

Memory with Search lets ChatGPT tap into your past convos (like remembering you’re vegan or that you live in San Francisco) to make your web searches way more relevant.

Here’s how it works:

If ChatGPT knows from memory that you’re vegan and live in San Francisco, and you ask, “What are some restaurants near me?” — it might quietly rewrite your question to something more helpful, like: “What are some good vegan restaurants in San Francisco?”
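OpenAI hasn’t said exactly how that rewrite happens under the hood. But the general idea is easy to picture; here’s a purely hypothetical sketch (our illustration, not OpenAI’s code) of folding remembered facts into a search query:

```python
# Hypothetical sketch of memory-informed query rewriting. This is NOT
# OpenAI's implementation -- just the general idea: stored facts about
# the user get folded into the search query before it's sent off.
from openai import OpenAI

client = OpenAI()

# Facts a memory feature might have saved from earlier chats (made up).
memory = ["The user is vegan.", "The user lives in San Francisco."]

def rewrite_query(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": "Rewrite this search query so it reflects what we know "
                       "about the user. Return only the rewritten query.\n\n"
                       f"Known facts: {' '.join(memory)}\n"
                       f"Query: {question}",
        }],
    )
    return resp.choices[0].message.content

print(rewrite_query("What are some restaurants near me?"))
# e.g. -> "good vegan restaurants in San Francisco"
```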

And the best part? If you’re not feeling the whole memory vibe, you can just turn it off in settings.

Now, some folks started seeing this feature pop up as of last weekend, so it’s a bit unclear right now who has it and who doesn’t. But we’re definitely keeping an eye on it.

So yeah, if this feels like OpenAI’s not-so-secret attempt to make ChatGPT even more useful and personalized (without making it feel creepy), you’re not wrong. They’re definitely pushing for that “it knows me so well” feeling.

But here’s the question: is this a good thing? Should we be worried about AI remembering personal stuff, or is it a game-changer?

I mean, just last weekend some ChatGPT users noticed something a little… odd.

Apparently, the bot started calling them by name during conversations—even when they never told it what to call them. Yikes, right?

The reactions have been mixed. While some find it unnecessary and creepy, others are still trying to wrap their heads around it.

One user even compared it to a teacher who keeps calling their name in class. 🤦‍♂️ Not exactly the vibe ChatGPT was going for, right?

It’s unclear whether this is tied to the new memory feature; some users say it kept happening even after they disabled their memory settings.

It’s also unclear when exactly this change took place, and OpenAI hasn’t commented on it yet.

But if you ask me, this calling-people-by-name thing could be walking a fine line. As one expert put it, using a name can be a powerful way to build relationships, but overdoing it can come across as fake or intrusive.

Maybe it’s just a clumsy attempt at making ChatGPT feel more human, but honestly, most people don’t want a chatbot calling them by name like it’s their new best friend.

So, yeah, this Memory with Search thing? It’s cool in theory, but as ChatGPT gets more personalized, some folks might be wondering if it’s crossing the line.

But what do you think? Should ChatGPT stick to knowing your preferences and skip the first-name basis?

🧱 Around The AI Block

🤖 ChatGPT Prompt Of The Day: Co-branding on Facebook or Instagram.

Co-branding on Facebook or Instagram is a powerful strategy: it lets businesses leverage each other’s audiences and credibility.

The payoff is increased brand exposure, customer engagement, and trust.

Think of this as an AI WoD (Workout of the Day)!

As a marketing manager at [company], you’ve been tasked with running a co-branded campaign on Instagram for [product]. Your partner is [influencer name], who has a large following in your target demographic. How would you plan and execute this campaign to maximize reach, engagement, and sales? What are some potential challenges you might face and how would you address them? What metrics would you use to measure the success of the campaign?

That's all we've got for you today.

Did you like today's content? We'd love to hear from you! Please share your thoughts on our content below👇

What'd you think of today's email?


Your feedback means a lot to us and helps improve the quality of our newsletter.

🚀 Want your daily AI workout?

Premium members get daily video prompts, a premium newsletter, an ad-free experience, and more!

Already a paying subscriber? Sign In.

Premium members get:

  • 👨🏻‍🏫 A 30% discount on the AI Education Library (a $600 value - and counting!)
  • 📽️ The daily AI WoD (a $29.99 value!)
  • ✅ Priority help with AI Troubleshooter
  • ✅ The Thursday premium newsletter
  • ✅ An ad-free experience
  • ✅ And more....
