Physicians Must Lead the Assessment Procedure of Artificial Intelligence

Walking out of the cath lab one evening after treating a patient with an acute myocardial infarction, I was struck by a thought. In cardiology, we would never launch a new device without thorough evaluation. Before a stent ever reaches a coronary artery, it goes through bench testing, animal research, and human trials to establish its safety and efficacy. That process generally spans years, and in some instances, decades. Data and design choices are validated and critically examined. Yet many of the artificial intelligence systems influencing medical decisions today receive no such scrutiny. From assessing chest pain in the ED to analyzing echocardiograms, from drafting clinical notes to forecasting readmission risk, AI now touches almost every aspect of medicine. Tools like ambient documentation and diagnostic support systems are becoming more prevalent. But while the technology has advanced rapidly, our validation frameworks remain outdated or nonexistent. The result is a widening gap between innovation and trust, and that gap is precisely where physicians need to take the lead.

Why evaluating AI matters now more than ever

Earlier this year, I addressed the American College of Cardiology’s Board of Governors meeting about the essential need for structured evaluation of AI in clinical practice. Because here’s the reality: Evaluation doesn’t impede innovation; it ensures innovation is safe, ethical, and reproducible. Without it, enthusiasm may outstrip evidence. We ought to assess AI with the same rigor we apply to any clinical instrument. What does it mean to assess AI with clinical rigor, and what should that framework include?

Utility: Does it genuinely improve outcomes or workflow, or is it merely technology in search of a problem?
Technical robustness: Is it both accurate and precise? Does it show reliability across varied populations and conditions, or does it falter at the edges?
Ethical integrity: Are we proactively evaluating for bias before implementation?
Regulatory transparency: Do we comprehend its logic sufficiently to explain it to a patient, or a jury?
Every new model must answer these questions before it is integrated into clinical care. AI may be able to detect patterns beyond human perception, but it should still meet the same evidentiary standards as any medical device or medication.

The evolving role of the clinician

I view AI as enhancing the clinician’s role rather than diminishing it. Clinicians are ideally positioned to ensure the integrity and relevance of the insights these systems generate. That requires us to shift from passive end-users to active clinical stewards of technology. When physicians engage early in dataset design, bias assessment, and post-market monitoring, we not only protect patients but also help build better AI. Clinical context is the critical ingredient that many tech firms underestimate. We must ask vendors and developers:

What data informed this model, and does it represent my patient demographic?
How does it perform on populations like mine, beyond just averages?
What is its false-positive rate, and how can I validate its outputs?
When it fails, how will I be informed?
If we cannot answer these questions with confidence, we should not use the tool. Physicians are the final barrier between an algorithm’s confidence and the consequences for the patient.

From cath lab to courtroom: Implementing medical rigor universally

The same principles for vetting clinical AI extend well beyond hospital walls. In my role developing AI systems for high-stakes decision-making, our team of physicians, engineers, and legal specialists confronts these challenges daily. The difficulties are not solely technical; they are also ethical. When an AI system organizes thousands of pages of medical records for a malpractice lawsuit or synthesizes evidence for peer review, accuracy isn’t optional. It’s fundamental to fairness. We build our platforms on the same core principles we apply in medicine: traceability, validation, and human oversight. Every output connects back to its source document, every finding can be audited, and every user retains discretion over what is recorded. We’ve learned that the same disciplined reasoning and transparent provenance we demand in clinical medicine should apply anywhere AI intersects with human judgment. Whether it’s a diagnostic decision in the ICU or a case evaluation at a law firm, the principle is the same: Trust derives from verification.

The true danger isn’t AI; it’s unexamined AI

AI will make errors. So do we. The answer isn’t fear; it’s accountability. That means ongoing validation, bias detection, and human oversight by design. It means insisting that companies meet the same rigorous standards we have always held ourselves to: sensitivity, specificity, positive predictive value, and negative predictive value, just as we expect from any diagnostic test. The most dangerous errors arise when neither the clinician nor the developer understands why the system failed. A biased dataset or a poorly generalized model can do harm faster than any human could. The “black box” mentality must go. If an algorithm’s reasoning cannot be explained to a colleague, we should question whether it belongs in patient care.
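
To see what those standards mean in practice, consider a purely hypothetical example (the numbers are illustrative, not drawn from any real product): an AI triage tool is evaluated on 1,000 patients, 100 of whom truly have the condition. Suppose it correctly flags 90 of those 100 and incorrectly flags 90 of the 900 who do not.

Sensitivity: 90/100 = 90 percent
Specificity: 810/900 = 90 percent
Positive predictive value: 90/180 = 50 percent
Negative predictive value: 810/820 ≈ 99 percent

A tool that sounds impressive at “90 percent accurate” would still produce one false alarm for every true alert at this prevalence. Those are exactly the trade-offs we would scrutinize in any new assay or imaging test before trusting it at the bedside.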

Leading with clinical integrity