
Welcome Automaters 👋
So here’s a wild one for the weekend: Remember how Wikipedia editors have been fighting an uphill battle against AI-generated slop since ChatGPT dropped in 2022? Well, someone just pulled off the ultimate plot twist. It turns out that when you publish a "How to Catch a Robot" manual, you’re actually just giving the robots a "How to Hide" cheat sheet.
The Drama: Since late 2023, the volunteer legends over at WikiProject AI Cleanup have been doing the Lord’s work. They’ve spent thousands of hours meticulously cataloging 24 specific patterns that scream "this was written by a bot."
We’re talking about the classic AI "tells" that drive us all a little crazy:
Significance Inflation: Turning a boring 1989 tax office into "a pivotal moment in the evolution of regional statistics."
Weasel Words: Phrases like "some experts suggest" or "widely regarded as" that sound smart but say nothing.
The -ing Obsession: Tacking on vague phrases like "reflecting broader trends" or "symbolizing ongoing challenges."
The Em Dash Obsession: Using dramatic dashes like they’re getting paid per—well—dash.
The "Uno-Reverse" Moment: This past Saturday (Jan 18), tech entrepreneur Siqi Chen dropped an open-source plugin called "Humanizer" for Claude Code.
Chen basically looked at Wikipedia’s comprehensive "signs of AI writing" list and said, "Cool, let's just tell the LLM to do the exact opposite." The plugin feeds all 24 of those specific Wikipedia red flags into Claude and gives one simple instruction: DON'T do these things.
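Curious what "tell the LLM to do the opposite" might look like in practice? Here's a rough sketch. To be clear, this is not Chen's actual plugin code. The four patterns are just the ones listed above, and the function name and prompt wording are made up for illustration.

```python
# Illustrative only: NOT the real Humanizer plugin.
# The patterns are the four "tells" named earlier in this issue;
# the function name and the instruction wording are assumptions.
AI_TELLS = [
    "Significance inflation (calling routine events 'pivotal moments')",
    "Weasel words ('some experts suggest', 'widely regarded as')",
    "Vague -ing phrases ('reflecting broader trends')",
    "Overused em dashes",
]


def build_avoid_prompt(tells):
    """Turn a list of writing patterns into one 'avoid these' instruction."""
    lines = ["Rewrite the text below. Do NOT use any of these patterns:"]
    for number, tell in enumerate(tells, start=1):
        lines.append(f"{number}. {tell}")
    return "\n".join(lines)


print(build_avoid_prompt(AI_TELLS))
```

Swap in the full list of 24 patterns and you have the general idea: a checklist for catching bots doubles as a checklist for coaching them.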
The Result: It’s working. The GitHub repo absolutely blew up over the weekend, racking up 2,300+ stars. As Chen put it on X, it’s "really handy" that Wikipedia went and collated a detailed list of what not to do.
The Catch (and it’s a big one): This is purely a "vibes" update. The plugin makes AI sound WAY more natural and conversational, but it doesn't make it any smarter. It’ll still hallucinate facts or make up fake history; it just does it with the confidence and smoothness of a person you’d actually grab a beer with.
Why This Is a Headache: This raises some pretty spicy questions about the future of truth online. The same guide designed to protect one of the internet’s most trusted sources is now being weaponized to help AI hide in plain sight. It’s a classic arms race where one side accidentally published its entire strategy manual for the enemy to read.
Our take: The only real way to spot AI-written content now is to use AI so much that your brain develops a sixth sense. At this point, experience is the detector.
So yeah, if you come across a Wikipedia article today that sounds perfectly human and doesn't use a single "pivotal moment" or "triple adjective," stay on your toes. The bots are officially learning to blend in.
Here's what we have for you today
🤖 New Mercor Benchmark Tests AI Agents on Real-World Work—and the Results Are Humbling

Y’all, we need to talk about the most awkward disconnect happening in corporate America right now. It is like watching two people describe completely different movies after seeing the same film.
On one side, you have the "AI agents are replacing everyone" hype. On the other side, we finally have some cold, hard data that says... maybe don't fire your legal team just yet.
The Reality Bomb: Training-data giant Mercor just dropped a benchmark called APEX-Agents. Think of it as the SATs for AI agents trying to do actual white-collar work. They had investment bankers, lawyers, and consultants create real, soul-crushing tasks they do every day. Then, they let the AI loose in a "digital world" filled with Slack messages, Google Drive files, and messy spreadsheets.
The Verdict: Even the "god-tier" models like Google’s Gemini 3 Flash and OpenAI’s GPT-5.2 only got about 24% of the tasks right. Imagine hiring an intern who screws up three out of every four assignments.
But here's where it gets REALLY interesting. While AI is objectively failing most workplace tasks, there's a massive perception gap between the corner office and everyone else.
A new survey by research firm Section dropped some jaw-dropping numbers that perfectly illustrate this AI reality distortion field. They surveyed 5,000 white-collar workers at companies with 1,000+ employees, and the divide is almost comical.
It turns out CEOs are living their best life:
70%+ are "excited" about AI.
19% say it saves them more than 12 hours a week.
Only 2% say it saves them zero time.
However, actual workers are in the trenches:
Nearly 70% feel "anxious or overwhelmed."
40% say it saves them zero time per week.
Just 2% are saving 12+ hours.
Let that sink in for a second. Nineteen percent of executives are saving a full day of work every week, while 40% of workers are saving literally nothing. You could not design a more perfect illustration of the AI hype bubble if you tried.
So what's going on here?
According to Mercor CEO Brendan Foody (the 22-year-old billionaire college dropout), the problem is something called "multi-domain reasoning."
Here's the thing: AI can handle ONE task in ONE place pretty well. But the moment you need it to connect information from Slack AND Google Drive AND your email AND that PDF someone sent you three weeks ago? It completely falls apart. And guess what? That's literally how all work happens.
The APEX-Agents benchmark simulates this exact scenario.
Instead of asking trivia questions or testing general knowledge like most benchmarks do, APEX-Agents creates an entire fake workplace. We're talking realistic project scenarios with emails, Slack messages, Google Drive files, PDFs, spreadsheets, calendars—the whole nine yards. Then they give the AI tasks like "analyze this company's data export and tell me if it violates Article 49 under their own policies."
The leaderboard is fascinating (in a slightly depressing way for AI hype folks):
Gemini 3 Flash: 24.0% accuracy
GPT-5.2: 23.0% accuracy
Claude Opus 4.5: 18.4% accuracy
Gemini 3 Pro: 18.4% accuracy
GPT-5: 18.3% accuracy
Grok 4: 15.2% accuracy
For context, these tasks take a human pro with 7+ years of experience about 3.5 hours to finish.
But here's the plot twist: This isn't necessarily BAD news. It's actually GOOD that someone finally created a realistic test instead of everyone just vibing off hype. And honestly? The progress is pretty wild when you zoom out. Foody pointed out that last year, AI was getting these tasks right 5-10% of the time. Now it's at 24%. That's more than doubling year-over-year.
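If you want to sanity-check that jump yourself, here's the back-of-the-napkin math. It uses only the figures quoted above (5-10% last year, 24% now); the variable names are ours.

```python
# Scores quoted in this issue: roughly 5-10% last year, 24% for this year's top model.
last_year_low, last_year_high = 5.0, 10.0
this_year = 24.0

# The improvement multiple depends on which end of last year's range you start from.
multiple_from_high = this_year / last_year_high  # conservative case
multiple_from_low = this_year / last_year_low    # optimistic case

print(f"{multiple_from_high:.1f}x to {multiple_from_low:.1f}x")  # prints "2.4x to 4.8x"
```

So depending on where in that 5-10% range last year's models really sat, the gain is somewhere between roughly 2.4x and 4.8x.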
So why are CEOs having such a different experience than their employees?
Turns out, when you're a CEO, you use AI for high-level tasks like summarizing a report or drafting a "great job team" email. If the AI is 80% right, that is good enough because someone else will catch the errors.
But for employees? They use AI for precise technical work where errors have real consequences. A Workday survey found that while 85% of workers save some time with AI, most of those gains are eaten up by "The AI Tax." This is the time spent correcting, clarifying, or completely redoing the low-quality content the AI generates.
So what does this all mean?
The APEX-Agents benchmark gives us the hard data: AI agents aren't ready for primetime yet. They're getting better fast, but "better than last year" doesn't mean "ready to do your job."
The good news? The benchmark is now open source. Mercor released all 480 tasks on Hugging Face along with their entire evaluation infrastructure, called Archipelago. Now every AI lab in the world knows exactly what they need to fix to make these agents actually useful.
So what about you? Is AI actually clearing your plate, or are you just the robot's full-time editor now?
Hit reply and let us know!
Introducing the first AI-native CRM
Connect your email, and you’ll instantly get a CRM with enriched customer insights and a platform that grows with your business.
With AI at the core, Attio lets you:
Prospect and route leads with research agents
Get real-time insights during customer calls
Build powerful automations for your complex workflows
Join industry leaders like Granola, Taskrabbit, Flatfile and more.
🧱 Around The AI Block
😬 A new study links heavy AI use to depression and anxiety.
🩺 Amazon enters the agentic AI health race with an AI medical assistant.
📝 Google now offers free SAT practice exams, powered by Gemini.
🎶 Spotify wants you to prompt your playlists with AI.
📚 This AI-powered learning app is gaining attention in kids’ education.
🔥 ElevenLabs made an AI album to plug its music generator.
🛠️ Trending Tools
For the Smart Schedulers: Reclaim.ai is an AI calendar that doesn't just block off meetings; it automatically finds the best time for your habits (like gym or reading) and breaks, reshuffling them instantly if a high-priority task drops in. It’s the end of the "scheduling crisis."
For the Curious Readers: ExplainPaper turns scary, brain-melting PDFs into “ohhh, that makes sense” explanations. You highlight the paragraph that broke you, and the AI rewrites it in plain English. Seriously, it’s perfect for mastering new topics in half the time.
For the Vibe Seekers: Endel uses AI to generate real-time, personalized soundscapes that adapt to your heart rate, the weather, and even the time of day. Whether you need "Deep Work" focus or "Wind Down" sleep tracks, it builds the audio specifically for your body’s current state.
For the Memory Keepers: GFP-GAN is a state-of-the-art restoration tool that uses its "Generative Facial Prior" AI to clear up low-res, grainy, or damaged faces in seconds, making old memories look like they were taken on a modern iPhone.
For the Instant Designers: Microsoft Designer is a lightweight, purely AI-driven design tool that lets you describe what you want (e.g., "A cozy invitation for a backyard BBQ with a rustic feel") and generates multiple complete layouts with images and text that you can tweak in seconds.
Hope these tools help you reclaim some of your time today!
🤖 AI Workout Of The Day: How To Use AI Interior Design Tools To Visualize Your Ideas
So remember when redesigning your bedroom meant flipping through magazines for hours and praying your new furniture wouldn't look like a total disaster? Yeah, those days are officially over.
AI interior design tools are absolutely blowing up right now. It’s kind of wild: you upload a photo of your messy room, pick a vibe (Modern, "Japandi," or even "Dopamine Decor"), and BAM. The AI shows you exactly what your space could look like without you ever touching a measuring tape.
Here is the tea on how to use it:
Before you start clicking, you need a good base photo.
Lighting is everything: Use natural daylight. Yup, open those curtains!
Angle check: Stand in the corner of the room to get a wide shot.
Level up: Hold your phone straight, not tilted.
Clean the lens: Seriously, wipe that camera lens first.
Stay sharp: Use a clear photo, not a blurry one.
The AI Tools:
There are dozens of these popping up, but here are the heavy hitters we are watching:
Planner 5D: The “everything included” pick. You get 8,000+ items (in paid plans) to play with, an ‘AI Smart Wizard’ that basically holds your hand, VR walkthroughs, and crispy 4K renders all for about $4.55 a month. If you’re building a whole floor plan from scratch, this is the one.
Spacely AI: The “pro choice.” This one’s for designers, architects, and SketchUp folks who like to move fast and skip the babysitting. You get quick visual iterations, a SketchUp extension, and image-to-3D magic for when you just want the idea out of your head and on the screen.
Interior AI: Great for real estate vibes and virtual staging. It can take an empty room and make it look lived-in in about 15 seconds.
Decor8 AI: The "color specialist." If you just want to see how Sherwin-Williams "Navy" looks on your wall without buying a sample, this is the one.
RoomGPT: Transform any room with just one photo.
💡 Pro Tips for Best Results
Try a general style first like "Scandinavian" or "Minimalist."
Use detailed text prompts.
Don't be afraid to try new things.
The first result is rarely perfect. Generate 5 versions and pick the best one.
The Bottom Line: You don’t need an expensive designer or a huge budget anymore. Most of these tools cost less than a Netflix subscription. It’s basically the tech world saying "we’ve got you" so you don’t end up with a couch blocking your door (the ultimate rookie move).
Ready to transform your space? Pick a tool and go wild. Just remember: if the AI suggests a neon green ceiling, maybe ask for a second opinion.
✍️ Prompt Writing Tip Of The Day:
If you’re ready to graduate from the “basic” presets, you’ve gotta start talking to the AI like you actually know what you want.
The cheat code is simple:
[Style] + [Specific items] + [Colors] + [Mood]
A good example:
[STYLE] [ROOM TYPE] with a [LAYOUT DESCRIPTION], featuring [KEY FURNITURE PIECES + MATERIALS]. Use a [PRIMARY COLOR PALETTE] with [ACCENT COLORS]. Lighting should be [LIGHTING TYPE + TIME OF DAY]. Add [DECOR ELEMENTS] and keep/replace [EXISTING ELEMENTS]. Overall mood should feel [MOOD WORDS], with a [REFERENCE VIBE].
Also: If you don’t like the first result, don’t start over; just stack the prompt. Add one more detail at a time (lighting, materials, time of day) and let the AI dial it in. That’s how you get from “meh” to “wow.”
Here’s a Prompt you can try:
Modern minimalist living room with an open layout, featuring a low-profile cream fabric sectional sofa, walnut wood media console, and round stone coffee table. Use a neutral palette with warm beige and soft gray accents. Large floor-to-ceiling windows with sheer white curtains; lighting should be natural afternoon light with soft shadows. Add minimal decor like ceramic vases, stacked art books, and one large indoor plant. Overall mood should feel calm, cozy, and high-end, like a modern design magazine shoot.
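If you'd rather not retype the whole thing every time, here's a minimal sketch of the fill-in-the-blanks idea in Python. The helper name, the field names, and the simplified template wording are our own inventions (they loosely mirror the bracketed slots above), and the "brushed brass floor lamp" add-on is just a made-up example of stacking one more detail.

```python
# Hypothetical helper: field names loosely mirror the bracketed slots in the template above.
TEMPLATE = (
    "{style} {room_type} with {layout}, featuring {furniture}. "
    "Use a {palette} with {accents}. "
    "Lighting should be {lighting}. "
    "Add {decor}. "
    "Overall mood should feel {mood}, like {reference}."
)


def build_room_prompt(**fields):
    """Fill in the template; raises KeyError if a slot is left out."""
    return TEMPLATE.format(**fields)


prompt = build_room_prompt(
    style="Modern minimalist",
    room_type="living room",
    layout="an open layout",
    furniture="a low-profile cream fabric sectional sofa and a walnut wood media console",
    palette="neutral palette",
    accents="warm beige and soft gray accents",
    lighting="natural afternoon light with soft shadows",
    decor="minimal decor like ceramic vases and one large indoor plant",
    mood="calm, cozy, and high-end",
    reference="a modern design magazine shoot",
)
print(prompt)

# "Stacking": append one extra detail instead of starting over.
prompt_v2 = prompt + " Add a brushed brass floor lamp."
print(prompt_v2)
```

Paste the printed text into whichever design tool you picked, then keep stacking one detail at a time until the render clicks.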
Is this your AI Workout of the Week (WoW)? Cast your vote!
That's all we've got for you today.
Did you like today's content? We'd love to hear from you! Please share your thoughts on our content below👇
What'd you think of today's email?
Your feedback means a lot to us and helps improve the quality of our newsletter.


