At the University of Pennsylvania Health System, doctors rely on more than their training to guide difficult conversations with cancer patients — they also lean on artificial intelligence. An algorithm predicts a patient’s likelihood of dying and nudges oncologists to bring up treatment goals and end-of-life care.
But during the COVID-19 pandemic, that predictive model silently lost its edge.
A routine audit found the AI tool’s accuracy dropped by 7 percentage points, a failure that potentially left hundreds of patients without critical conversations that could have prevented aggressive — and perhaps unwanted — treatments like last-ditch chemotherapy.
AI Can Help — Until It Doesn’t
Dr. Ravi Parikh, now at Emory University and lead author of the 2022 study on the Penn algorithm, believes many AI systems embedded in clinical workflows degraded during the pandemic — and most hospitals likely haven’t noticed.
“Many institutions aren’t tracking whether these tools continue to work after deployment,” he warned.
AI’s success in healthcare isn’t just about code and data — it’s about constant human oversight. Algorithms are dynamic; they learn, adapt, and sometimes break down. Yet despite AI’s promise to enhance care and ease workloads, the reality is more complicated: every tool still needs regular checkups and a support team.
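A routine checkup does not have to be elaborate. The minimal sketch below shows the general idea: recompute a deployed risk model's discrimination (AUROC) on recent outcomes each quarter and raise a flag when it slips past a tolerance. The data, column names, baseline value, and alert threshold here are illustrative assumptions, not details from the Penn system.

```python
# Minimal sketch of a post-deployment "checkup": recompute a deployed model's
# discrimination (AUROC) on recent outcomes and flag degradation.
# All data, column names, and thresholds are illustrative, not from the Penn study.
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic audit log: one row per patient, with the model's risk score at
# prediction time and the eventual observed outcome (1 = died within 6 months).
audit = pd.DataFrame({
    "quarter": rng.choice(["2019Q4", "2020Q2", "2020Q4"], size=3000),
    "risk_score": rng.uniform(0, 1, size=3000),
})
# Make outcomes track the score less closely in later (pandemic-era) quarters
# to mimic silent drift.
noise = audit["quarter"].map({"2019Q4": 0.10, "2020Q2": 0.35, "2020Q4": 0.50})
audit["died"] = (audit["risk_score"] + rng.normal(0, noise) > 0.5).astype(int)

BASELINE_AUROC = 0.80   # validation performance at deployment (assumed)
ALERT_DROP = 0.05       # alert if AUROC falls this far below baseline (assumed)

for quarter, grp in audit.groupby("quarter"):
    auroc = roc_auc_score(grp["died"], grp["risk_score"])
    flag = "DEGRADED" if auroc < BASELINE_AUROC - ALERT_DROP else "ok"
    print(f"{quarter}: AUROC={auroc:.3f} [{flag}]")
```

Run quarterly against real outcomes, a report like this is the difference between catching drift in months and discovering it, as Penn did, in a retrospective study.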
Automation Without Oversight Is a Recipe for Risk
“AI is sold as a way to increase efficiency and reduce costs,” said Dr. Nigam Shah, Chief Data Scientist at Stanford Health Care. “But if maintaining AI ends up inflating costs by 20%, is it really sustainable?”
It’s a question health systems are beginning to wrestle with. AI is already embedded in dozens of medical decisions: from triaging patients in emergency rooms to processing insurance claims to transcribing doctor visits. In 2024 alone, the FDA greenlit nearly 1,000 AI-driven medical tools.
And while the startup world is betting big — some health-focused AI firms are projected to generate $10 million or more annually — few health systems have the resources to regularly test and recalibrate these tools.
FDA Commissioner Robert Califf recently remarked that no U.S. health system currently has the infrastructure to fully validate an AI tool once it’s live in clinical settings.
Where’s the AI Consumer Guide?
Hospitals struggle to compare and evaluate AI products. There’s no central body like Consumer Reports for algorithms, and the average clinician doesn’t have access to a supercomputer to run performance tests.
A recent study by Yale Medicine, for instance, examined six “early warning” AI systems used to detect patient deterioration. While the analysis uncovered massive performance differences among the products, running that comparison took days of computing time — an impossible task for most hospitals.
“We have no standards,” said Dr. Jesse Ehrenfeld, former president of the American Medical Association. “There’s nothing that tells you how to validate or benchmark an AI model once it’s deployed.”
That lack of standardization is especially troubling when even small mistakes can lead to big consequences. Stanford researchers found that generative AI tools used to summarize patient histories had a 35% error rate — even under ideal conditions. Leaving out a single word like “fever” or “diabetic” in a clinical summary could be life-threatening.
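One narrow, automatable safeguard is an omission check that compares a generated summary against the source note for a short list of high-stakes terms. The sketch below is purely illustrative; the term list, function name, and example notes are hypothetical and are not drawn from the Stanford study.

```python
# Illustrative omission check (not from the Stanford study): flag summaries
# that drop critical terms present in the source note.
CRITICAL_TERMS = {"fever", "diabetic", "anticoagulant", "allergy"}  # hypothetical list

def missing_critical_terms(source_note: str, summary: str) -> set[str]:
    """Return critical terms that appear in the source note but not in the summary."""
    src = source_note.lower()
    summ = summary.lower()
    return {term for term in CRITICAL_TERMS if term in src and term not in summ}

note = "Diabetic patient presented with fever and cough; started on antibiotics."
summary = "Patient presented with cough; started on antibiotics."
print(missing_critical_terms(note, summary))  # flags 'diabetic' and 'fever'
```

A check like this catches only one failure mode, dropped keywords, which is exactly why clinicians are asking for shared benchmarks rather than ad hoc scripts.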
Why Algorithms Go Haywire
Sometimes algorithm errors have understandable roots. A change in a hospital’s lab testing vendor, for example, might alter data formats and confuse a model. Other times, problems arise for no clear reason.
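The first kind of failure can at least be caught at the door. An input-validation gate that compares incoming lab values against the ranges seen at training time will surface a vendor or unit change as an alert instead of a quietly skewed prediction. The feature names and bounds in this sketch are illustrative assumptions.

```python
# Sketch of an input-validation gate: before a model scores new lab data,
# compare incoming values against ranges observed at training time so a
# silent change (new vendor, new units) raises an alert rather than
# skewing predictions. Feature names and bounds are illustrative.
EXPECTED_PROFILE = {
    # feature: (min, max) observed in training data
    "creatinine_mg_dl": (0.2, 15.0),
    "hemoglobin_g_dl": (3.0, 20.0),
}

def validate_row(row: dict[str, float]) -> list[str]:
    """List the ways a single incoming record deviates from the training profile."""
    problems = []
    for feature, (lo, hi) in EXPECTED_PROFILE.items():
        if feature not in row:
            problems.append(f"missing feature: {feature}")
        elif not (lo <= row[feature] <= hi):
            problems.append(f"{feature}={row[feature]} outside training range [{lo}, {hi}]")
    return problems

# A new lab vendor reporting creatinine in umol/L instead of mg/dL shows up
# here as an out-of-range value rather than a quietly wrong risk score.
print(validate_row({"creatinine_mg_dl": 97.0, "hemoglobin_g_dl": 13.5}))
```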
At Mass General Brigham, a tool used to help genetic counselors find relevant DNA literature began producing different answers to the same question. “We were seeing nondeterministic behavior,” said Sandy Aronson, a tech leader there — meaning the tool’s results varied unpredictably.
AI can be promising in this space, especially in genomics, but Aronson notes: “The technology still has a long way to go.”
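Nondeterminism, at least, is cheap to measure: ask the tool the same question repeatedly and count how many distinct answers come back. The sketch below uses a hypothetical stand-in for the tool being audited; the Mass General Brigham system itself is not public.

```python
# Minimal consistency probe for nondeterministic behavior: re-ask the same
# question several times and tally the distinct answers returned.
from collections import Counter
import random

def consistency_report(ask_tool, question: str, trials: int = 10) -> Counter:
    """Ask the same question repeatedly and tally the distinct answers."""
    return Counter(ask_tool(question) for _ in range(trials))

# Hypothetical stand-in for the tool under audit, deliberately flaky so the
# report shows more than one answer for an identical query.
def flaky_tool(question: str) -> str:
    return random.choice(["PMID:111", "PMID:111", "PMID:222"])

print(consistency_report(flaky_tool, "Which papers discuss BRCA1 exon 11 variants?"))
```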
Who Watches the Watchers?
The solution, many experts agree, is a bigger investment in ongoing monitoring. Stanford’s Shah said his team spent eight to ten months auditing just two models for fairness and reliability — a resource drain few hospitals can afford.
Some researchers suggest that AI should monitor other AI, catching bugs or bias before they become dangerous. But even that solution would require staffing up: hiring data scientists, ethicists, and engineers — roles that are already in short supply.
“It’s kind of like melting icebergs to monitor other melting icebergs,” Shah quipped. “At what point are we just building a whole new bureaucracy of machines and people to babysit machines?”
Is AI Still the Future of Healthcare?
Yes — but with caveats. Automation has already proven helpful in reducing doctor burnout by transcribing patient visits and speeding up documentation. Startups focused on “ambient documentation” received over $350 million in funding last year alone.
But without standardized benchmarks or rigorous oversight, doctors worry these tools are being adopted faster than the healthcare system can vet them.
Ultimately, AI isn’t a plug-and-play solution. It’s a new type of workforce, one that needs training, testing, recalibration — and supervision. And that means hospitals must prepare not just to adopt AI, but to manage it, long-term.
After all, even the smartest machines can make dangerous mistakes when left unsupervised.