AI Out-Diagnosed Two Real Doctors in the ER

The ER just got a very unexpected new colleague, folks.

Researchers at Harvard Medical School and Beth Israel Deaconess Medical Center just published a study in Science (yes, the big-deal peer-reviewed journal), and the results are honestly a little jaw-dropping. They put OpenAI's o1 model head-to-head against two attending physicians in a real-world emergency room setting.

The goal? See who could actually figure out what was wrong with the patients first.

Here’s how the "Medical Smackdown" went down

The setup was simple but brutal: 76 patients arrived at the Beth Israel ER. Two internal medicine doctors, OpenAI’s o1, and the older 4o model were all handed the exact same electronic health records. We’re talking raw vitals, demographic info, and those brief, messy nurses' notes.

To keep things fair, two other doctors graded the diagnoses without knowing who, or what, wrote them.

The results? Honestly, it wasn't even a fair fight:

OpenAI o1: Nailed the correct diagnosis 67% of the time.
Doctor #1: Hit 55%.
Doctor #2: Hit 50%.
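
For the number-crunchers, here is a quick back-of-the-envelope sketch of what those figures mean as gaps. It uses only the three percentages reported above; the function name and dictionary layout are made up for illustration and are not taken from the study.

```python
# Accuracy figures as reported in the article (percent of correct diagnoses).
reported_accuracy = {"OpenAI o1": 67, "Doctor #1": 55, "Doctor #2": 50}


def gap_vs_model(accuracy, model="OpenAI o1"):
    """Return how far the model leads each other entry, in percentage points."""
    return {
        name: accuracy[model] - pct
        for name, pct in accuracy.items()
        if name != model
    }


gaps = gap_vs_model(reported_accuracy)
for name, points in gaps.items():
    print(f"o1 vs {name}: +{points} percentage points")
```

Note that these are percentage points, not relative improvements: a 12-point lead over a 55% baseline is a bigger relative jump than the raw number suggests.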

That gap was widest right at triage, which, as we all know, is arguably the highest-stakes moment in emergency care.

Here’s the cherry on top: the researchers confirmed they did not "clean up" or pre-process the data for the AI. The model had to dig through the same messy, real-world records the human doctors had. So yeah, it wasn't fed a perfect script; it had to deal with the chaos of a real hospital.

So, are doctors out of a job?

Not exactly. While experts are calling this progress "real," the path to actually using this in a hospital is still a giant question mark. As one expert from Mount Sinai put it: "The open question is how the heck do you introduce it into clinical workflows in ways that actually improve care?"

It’s a fair point. Even the Harvard researchers are being very careful to say there is zero evidence that AI should replace doctors. This is about AI being the ultimate "super-assistant"—the kind of partner that catches what a tired, overworked human might miss at 3:00 AM.

But here’s another question: Would you trust a chatbot to triage you in the ER, or do you still need to see a human in a white coat to feel safe?

Are we ready for "Doctor GPT," or is this getting a little too close for comfort?

