Did xAI twist the truth about Grok 3’s performance?

🚀 Big News: A Major Upgrade is Coming...

Hello and welcome to the Automated, your AI tour guide.

Grok 3, which was expected to be a game-changer, is now embroiled in controversy, from accusations of manipulated benchmarks to awkward AI responses and a growing list of unanswered questions.

Let’s just say things are quickly escalating into a real mess over there.

Here’s what we have for you today:

  • 🤯 xAI Got Caught Inflating Grok 3’s Scores?

  • 💻 OpenAI Expands Its AI Agent, Operator, to More Countries.

  • 🤝 How to make generative AI a partner in your daily workflow.

  • 📈 How to use NotebookLM to boost your productivity.

  • 🤖 ChatGPT Prompt Of The Day: Email newsletters.

🤯 xAI Got Caught Inflating Grok 3’s Scores?

Elon Musk’s AI company, xAI, is neck-deep in AI drama.

Their latest model, Grok 3, was supposed to be a big win—but instead, it’s landed them in a controversy sandwich.

First, OpenAI employees accused xAI of fudging benchmark results to make Grok 3 look better than it actually is.

Then, Grok 3 went off the rails, suggesting both Trump and Musk deserved the death penalty.

And to top it off? It briefly censored unflattering mentions of them.

Let’s start with the benchmark fiasco.

When xAI released Grok 3, they proudly posted a graph showing it outperforming OpenAI’s best available model, o3-mini-high, on a math test called AIME 2025.

But there was a catch: xAI’s graph left out o3-mini-high’s score at “cons@64” (short for “consensus@64”), a setting that gives a model 64 attempts at each problem and takes the most common answer as its final one.

With that metric included, OpenAI’s model actually performed better.
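If you're curious how much a consensus setting can move a score, here's a rough Python sketch. Everything in it is made up for illustration: the three toy problems, the sampled answers, and the function names are our assumptions, and we use 5 attempts per problem instead of 64 to keep it readable. It is not xAI's or OpenAI's evaluation code, just the general idea of "first answer" versus "most common answer."

```python
from collections import Counter

def score_at_1(samples, answers):
    """Share of problems where the FIRST sampled answer is correct."""
    hits = sum(attempts[0] == truth for attempts, truth in zip(samples, answers))
    return hits / len(answers)

def score_cons_at_k(samples, answers, k):
    """Share of problems where the most common of the first k answers is correct."""
    hits = 0
    for attempts, truth in zip(samples, answers):
        # Majority vote; on a tie, Counter keeps the answer that appeared first.
        majority, _count = Counter(attempts[:k]).most_common(1)[0]
        hits += majority == truth
    return hits / len(answers)

# Hypothetical data: 3 problems, 5 sampled answers each (not real AIME results).
answers = ["204", "33", "7"]
samples = [
    ["113", "204", "204", "204", "70"],  # first try wrong, majority right
    ["33", "33", "12", "33", "33"],      # first try right, majority right
    ["9", "7", "7", "1", "7"],           # first try wrong, majority right
]

at_1 = score_at_1(samples, answers)
cons = score_cons_at_k(samples, answers, k=5)
print(f"@1: {at_1:.2f}")      # @1: 0.33
print(f"cons@5: {cons:.2f}")  # cons@5: 1.00
```

Same model, same problems, very different headline number. That's why a chart comparing scores needs to say which setting each bar uses.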

In short, xAI’s graph was like bragging about winning a race without mentioning the other guy was running uphill.

xAI’s co-founder, Igor Babuschkin, defended the results, arguing that OpenAI has pulled similar moves before.

But then, a neutral third party stepped in, posted a more accurate graph—and surprise, surprise, it told a very different story.

Then, as if xAI didn’t have enough on its plate, Grok 3 went rogue—handing out death penalty suggestions like a malfunctioning dystopian judge.

When asked who in the U.S. most deserved capital punishment, it first named Jeffrey Epstein. But when reminded Epstein was dead, it pivoted to Donald Trump.

And when the question was tweaked slightly? It dropped another bombshell: Elon Musk. Yikes.

Naturally, xAI scrambled to contain the damage, calling it a “really terrible and bad failure.”

Grok has since been reprogrammed to dodge such questions, now responding with a much safer, “As an AI, I am not allowed to make that choice.”

Oh, and just to add another twist—Grok 3 also briefly censored negative mentions of both Trump and Musk.

All of this only fuels growing suspicion that AI companies aren’t just tweaking benchmarks to make their models look better—they’re also fine-tuning how they handle certain topics.

The real takeaway? 

AI benchmarks are messy—like Instagram filters. What you see isn’t always the full picture, and companies love to spin them in their favor.

And if AI companies keep bending the numbers (and tweaking the censorship dials), the real question isn’t which model is smarter—it’s which one is best at bending the truth.

Grok 3 may be marketed as the “world’s smartest AI,” but it clearly still has a few… quirks.

[Check out the full story here.]

🚀 Big News: A Major Upgrade is Coming…

We’ve got something exciting in the works—and we wanted you to be the first to know. 🚀

For nearly 2 years, we’ve been sharing AI insights, tools, and deep dives straight to your inbox. Now, we’re taking things to the next level.

💡 Introducing The Lo Down Premium Experience.
It’s more than a newsletter—it’s your shortcut to understanding the latest in AI, delivered in bite-sized, actionable insights.

Here’s a quick peek at what’s coming:
✅ Exclusive Weekly AI Deep Dives: Actionable insights you won’t find elsewhere.
📖 The Automated’s AI Insider Toolkit ($9.99 value): Your guide to must-have AI tools.
🎁 1:1 Call Bonus (valued at $500): First 10 annual subscribers get a private strategy session.
💰 Affordable Launch Price: Just $3.99/month or $39/year (2 months free).

This is for those who want more signal, less noise in the rapidly evolving world of AI.

👉 The countdown begins now. Launching in 2 days!

Stay tuned!

💻 OpenAI Expands Its AI Agent, Operator, to More Countries.

OpenAI is rolling out Operator, its AI-powered agent that performs tasks on behalf of users, to ChatGPT Pro subscribers in multiple countries, including Australia, Canada, India, Japan, and the U.K.

However, the service remains unavailable in the EU, Switzerland, and a few other regions.

Initially launched in January in the U.S., Operator allows users to automate actions like booking tickets, making restaurant reservations, filing expense reports, and shopping online.

Unlike standard chatbots, it operates in a separate browser window, where users can take control at any time.

For now, Operator is exclusive to the $200-per-month ChatGPT Pro plan and accessible only through a dedicated web page.

OpenAI has confirmed plans to integrate it across all ChatGPT clients in the future.

The AI agent market is heating up, with Google, Anthropic, and Rabbit developing similar tools.

However, Google’s AI agent is still on a waitlist, Anthropic provides access via API, and Rabbit’s model is tied to its proprietary hardware.

With Operator expanding its reach, AI-powered task automation is becoming more accessible than ever.

Will it revolutionize productivity, or is it just another AI experiment?

Unlock the full potential of your workday with cutting-edge AI strategies and actionable insights, empowering you to achieve unparalleled excellence in the future of work. Download the free guide today!

✍️ Editor’s Corner

One of my seed investments, Fasal, just got mentioned by Satya Nadella, CEO of Microsoft. Fasal is on a mission to elevate the quality and quantity of agriculture in India and, one day, the world.

I first invested in Fasal around 9 years ago. Do you know how many other investors clamored to follow my investment?

Zero. That’s right, 0.

Do you know how many sleepless nights I spent figuring out how they could continue? Or explaining to LPs why they hadn’t grown as quickly as “some of their other investments did”? Not to mention the founders’ worries.

The lesson: good things take time. Patience. What can you be patient for?

Cheers,

Tak Lo (Editor at The Automated, AI entrepreneur and thought leader. More at thetaklo.com)

🧱 Around The AI Block

  • Synthesia: AI-powered tool for creating realistic AI avatars and voiceovers for videos.

  • QuickVid: AI-generated short-form video creator for YouTube Shorts, TikTok, and Reels.

  • Illustroke: AI-powered tool that converts text prompts into vector illustrations.

  • AI Picasso: AI art generator that transforms sketches and text into stunning digital artwork.

  • Voicemod AI: Real-time AI voice changer with custom sound effects for gamers and streamers.

🤖 ChatGPT Prompt Of The Day: Email newsletters.

Struggling to keep your email newsletters engaging, consistent, and on-brand? ChatGPT has you covered!

With the right prompt, you can generate a compelling email series that builds trust, delivers value, and keeps your audience coming back for more.

Here's how you can craft a high-impact email campaign with ease:

I'm creating an email marketing campaign to engage our subscribers and establish our expertise on [topic]. Act as an email marketing specialist with knowledge in [topic]. Write a series of five email newsletters, each providing valuable tips and tricks for [topic]. Each email should be around 300 words, have a catchy subject line, and include a clear call-to-action. Ensure the content is engaging, practical, and tailored to our audience of [describe target audience].

We've Compiled a List of Over 100 ChatGPT Power Prompts.

This should help streamline your interactions with ChatGPT and get the results you need more efficiently.

Best of all, it's free!

That's all we've got for you today.

Did you like today's content? We'd love to hear from you! Please share your thoughts on our content below👇


Your feedback means a lot to us and helps improve the quality of our newsletter.
