At the University of Pennsylvania Health System, an AI-powered algorithm once served as a subtle but powerful nudge: it prompted oncologists to initiate difficult conversations with cancer patients, not just about treatment plans but about end-of-life choices. Then something went wrong.
During the COVID-19 pandemic, the tool's predictive performance dropped by seven percentage points, according to a 2022 study led by Dr. Ravi Parikh, now an oncologist at Emory University. That seemingly small slip may have had devastating effects: conversations that never happened, aggressive treatments that continued too long, and lost opportunities to prioritize quality of life over costly and painful interventions.
“This is not unique to one system,” Parikh noted. “Many health institutions are deploying AI tools without continuously checking whether they’re still working the way they should.”
And that’s the catch: AI doesn’t fix itself. In fact, it can degrade in silence — until the impact is felt in the real world.
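What "degrading in silence" looks like in practice is often mundane: a performance number slides, and nobody is recomputing it. For readers who want a flavor of what continuous checking involves, here is a minimal Python sketch. Everything in it (the metric, the baseline, the alert threshold) is an illustrative assumption, not a detail of Penn's system:

```python
# Illustrative sketch of a periodic performance check for a deployed clinical model.
# All names, numbers, and thresholds are hypothetical; a real system also needs
# clinical governance around what happens when the alert fires.
from sklearn.metrics import roc_auc_score

BASELINE_AUC = 0.79   # performance measured when the model was validated and deployed
ALERT_DROP = 0.05     # flag anything worse than a five-point slide

def check_for_drift(y_true, y_scores):
    """Score the model on recent labeled cases and compare to the deployment baseline."""
    current_auc = roc_auc_score(y_true, y_scores)
    drop = BASELINE_AUC - current_auc
    if drop >= ALERT_DROP:
        print(f"ALERT: AUC fell from {BASELINE_AUC:.2f} to {current_auc:.2f} "
              f"(a {drop:.2f} drop). Review the model before relying on its flags.")
    else:
        print(f"OK: current AUC {current_auc:.2f} is within tolerance.")
    return current_auc

# Example: check_for_drift(recent_outcomes, recent_model_scores)
```

The arithmetic is the easy part. As the rest of this story makes clear, the expensive part is staffing people to collect fresh labeled cases and to act when the alert fires.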
AI Needs Maintenance — And a Whole Lot of Humans
Tech companies may advertise AI as sleek, smart, and self-improving. But hospitals are discovering a hard truth: keeping AI systems in top shape demands serious manpower, rigorous oversight, and ongoing financial investment.
“You don’t just plug it in and walk away,” said Dr. Nigam Shah, chief data scientist at Stanford Health Care. “Everyone thinks AI will solve our capacity problems, but if it makes care 20% more expensive, is it really helping?”
The irony? Hospitals often turn to AI to reduce workloads. But in practice, they might need more people to manage the tech than the tech can replace.
Even the U.S. Food and Drug Administration is sounding alarms. At a recent agency roundtable, FDA Commissioner Robert Califf openly admitted, “I don’t believe there’s a single health system in the U.S. today that can fully validate a clinical AI algorithm in-house.”
The Algorithms Are Here — Ready or Not
AI is already embedded in much of American healthcare. It’s used to:
- Flag patients at risk of sudden deterioration
- Approve or deny insurance claims
- Transcribe and summarize doctor visits
- Suggest potential diagnoses
Investors are pouring money into the space. Rock Health tracked over $350 million in funding last year alone for AI tools that automate clinical documentation. The FDA has approved nearly 1,000 AI-driven medical products.
But performance isn’t consistent — and that’s the problem.
One study at Yale analyzed six different hospital “early warning” AI systems — the kind that are supposed to sound the alarm when a patient’s condition starts to spiral. The results? Some worked well, others didn’t — and the differences weren’t obvious unless you had access to a supercomputer and days of data crunching.
“There’s no Yelp for algorithms,” said Dr. Jesse Ehrenfeld, former president of the American Medical Association. “There’s no industry-wide guidebook or standard that tells you how to monitor or audit these tools once they’re live.”
When AI Makes Medical Errors
Even the most common AI tools aren’t immune. Large language models — like the kind behind ChatGPT — are being tested to write summaries of patient visits. At Stanford, researchers tried this and found that even in the best-case scenario, the AI made errors 35% of the time.
Miss one detail like “shortness of breath” or “family history of cancer” in a medical summary, and you’re not just looking at a typo — you’re risking lives.
Some failures are easy to explain: an algorithm trained on data from one lab may flounder if the hospital switches to a different provider with different equipment. Other bugs seem bizarre.
In Boston, a genetic counseling tool built to recommend scientific literature started spitting out different results when asked the same question repeatedly. The developers called it “nondeterminism.” Translation: the AI was being unpredictably weird — which is not great in a high-stakes field like medicine.
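"Nondeterminism" sounds exotic, but the underlying mechanics are mundane: if a system samples from a ranked list instead of always returning the single top answer, the same question can yield different results each time. A toy Python sketch, with invented papers and weights rather than the Boston tool's actual logic, shows the effect:

```python
# Toy illustration of nondeterministic output: same query, different answers,
# because the system samples instead of always returning the top-ranked result.
# The candidate papers and weights are invented for the example.
import random

CANDIDATES = ["Paper A", "Paper B", "Paper C"]
WEIGHTS = [0.5, 0.3, 0.2]   # relevance scores treated as sampling probabilities

def recommend():
    return random.choices(CANDIDATES, weights=WEIGHTS, k=1)[0]

print([recommend() for _ in range(5)])   # e.g. ['Paper A', 'Paper C', 'Paper A', ...]

# Pinning a seed (random.seed(42)) or always taking the highest-weighted option
# makes the output reproducible, which is usually what clinicians reviewing it expect.
```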
The Cost of Making AI Safe
So what can hospitals do to protect patients?
The answer seems to be throwing a lot of resources at the problem. Stanford’s Shah says it took his team 8 to 10 months and over 100 hours of labor just to evaluate two AI models for fairness and reliability.
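Much of that labor goes into questions that are conceptually simple but operationally expensive, such as whether a model performs as well for one group of patients as for another. Here is a minimal sketch of that kind of subgroup check; the column names and numbers are made up for illustration and are not Stanford's evaluation pipeline:

```python
# Minimal sketch of a subgroup performance audit; the data and columns are hypothetical.
import pandas as pd
from sklearn.metrics import roc_auc_score

df = pd.DataFrame({
    "group": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "label": [1, 0, 1, 0, 1, 0, 1, 0],        # observed outcomes
    "score": [0.9, 0.2, 0.8, 0.4, 0.3, 0.5, 0.7, 0.4],  # model risk scores
})

# Compute the model's AUC separately for each patient subgroup and look for gaps.
per_group = {
    name: roc_auc_score(g["label"], g["score"])
    for name, g in df.groupby("group")
}
print(per_group)
print("Largest gap between subgroups:", max(per_group.values()) - min(per_group.values()))
```

Scaling that idea from eight toy rows to millions of real records, with clinicians adjudicating the labels, is where the months and the hours go.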
Some experts suggest creating “AI that monitors AI” — using algorithms to audit the behavior of other algorithms. But that solution is far from free, and ironically, it adds more complexity to a system already tangled in data, privacy laws, and regulatory red tape.
“It’s like melting a glacier to build a dam to control the melt,” Shah quipped. “Is that really the direction we want to go?”
The Road Ahead: Smarter AI or Smarter Oversight?
AI in medicine holds enormous promise — it could streamline diagnoses, reduce burnout, and personalize care. But for now, trustworthy AI isn’t a plug-and-play solution. It’s a full-time project, demanding engineers, clinicians, ethicists, and IT staff to keep watch.
Healthcare leaders are asking the hard questions now: How do we ensure AI helps more than it harms? How do we hold it accountable? And what happens when it breaks — quietly, invisibly — while no one is watching?
As the healthcare industry marches toward a high-tech future, one thing is clear: the machines won’t save us unless we’re paying close attention to them — and paying the people who do.