Raquel Bernal Calmarza*
Pediatrician, Tarazona Health Center, Spain
*Corresponding author: Raquel Bernal Calmarza, Pediatrician, Tarazona Health Center, Spain
Submission: February 09, 2026; Published: February 25, 2026
ISSN: 2689-2707 Volume 6 Issue 3
Background: The period between 2024 and 2026 represents a transformative era for Artificial Intelligence
(AI) in medicine, characterized by the shift from narrow, task-specific algorithms to Multimodal Large
Language Models (M-LLMs) and Generalist Medical AI (GMAI). While technical capabilities have advanced
exponentially, the clinical integration of these tools remains complex and unevenly distributed.
Objective: This literature review synthesizes the latest research (2024-2026) to evaluate the state of AI in
clinical practice, focusing on diagnostic accuracy, administrative efficiency through ambient intelligence,
and the ethical-legal frameworks governing autonomous systems.
Methods: A literature review of high-impact articles published since January 2024 was conducted.
Sources included PubMed, JMIR, and medRxiv, with a focus on bibliometric analyses, clinical trials of
generative AI, and regulatory policy documents.
Results: Current literature indicates that generative AI has achieved “specialist-level” performance
in written clinical reasoning and diagnostic imaging across multiple specialties, including radiology
and pathology. “Ambient AI scribes” have demonstrated a significant reduction in physician burnout
(approximately 40%) by automating EHR documentation. However, a significant “maturity gap” persists:
despite the volume of publications, fewer than 1% of models have reached the stage of randomized
controlled trials. Emerging research highlights critical concerns regarding “automation bias,” algorithmic
transparency (the “black box” problem), and the potential for AI to exacerbate health inequities in low-resource settings.
Discussion: The discourse has evolved from pure technical validation to implementation science. Key
themes include the necessity of “Human-in-the-Loop” systems, the impact of the EU AI Act on medical
software development, and the urgent need for a radical restructuring of medical education to include
AI literacy.
Conclusions: AI has transitioned from an experimental adjunct to a foundational infrastructure in
modern healthcare. Future success depends not on increasing model parameters, but on achieving
seamless interoperability, ensuring algorithmic fairness and establishing clear liability frameworks for
autonomous medical decision-making.
Keywords: Artificial intelligence; Genomics; Clinical data; Well-being; Healthcare
The integration of Artificial Intelligence (AI) into medicine has entered its “Third Wave.” If the first wave (the 2000s) was characterized by expert systems and logic-based rules, and the second wave (2010s) by deep learning and pattern recognition in imaging, the third wave-beginning in earnest around 2024-is defined by Generalist Medical AI (GMAI). This era is marked by the transition from “narrow” AI, which could only perform one task (such as identifying a specific type of skin lesion), to multimodal systems capable of reasoning across text, images, genomics and real-time clinical data. As we move through 2025 and into 2026, the primary driver of this evolution has been the maturity of Multimodal Large Language Models (M-LLMs). Unlike their predecessors, these models do not require isolated datasets for every new clinical question. Instead, they leverage foundation models trained on massive, diverse datasets, allowing them to adapt to rare diseases and complex co-morbidities with minimal additional training [1,2]. This shift addresses one of the most significant bottlenecks in medical AI: the scarcity of high-quality, labeled medical data for niche conditions.
The context of this “Third Wave” is also defined by the Ambient Intelligence movement. In 2024, the medical community saw a surge in “AI scribes”-tools that use Natural Language Processing (NLP) to listen to patient-provider encounters and automatically generate structured clinical notes in Electronic Health Records (EHR). Research suggests that these tools have done more for physician well-being in two years than any previous technological intervention, directly tackling the “administrative tax” that has plagued modern medicine [3]. However, the rapid deployment of these technologies has outpaced the development of traditional clinical trial frameworks. In 2025, the literature began to grapple with a paradox: while AI models demonstrate “superhuman” performance on standardized medical exams and retrospective benchmarks, their “at-the-bedside” utility is often hindered by technical debt, lack of interoperability, and the “human factor” of clinician distrust [3,4]. The current research landscape is therefore not just about building better models, but about building better integration frameworks.
Furthermore, the democratization of AI tools has led to a shift in patient behavior. Patients are now using generative AI for self-diagnosis and triaging at a scale never seen before. This has forced the medical establishment to reconsider the “gatekeeper” model of healthcare. The literature of 2024-2026 reflects a growing urgency to validate these consumer-facing tools to prevent misinformation while harnessing their potential to alleviate the burden on primary care systems [5,6]. In summary, the introduction of AI into medicine is no longer a futuristic promise; it is an infrastructure project. The goal is no longer just “accuracy,” but “alignment”-ensuring that AI behaviors match clinical safety standards, ethical norms and the practical realities of a high-pressure hospital environment.
Multimodal diagnostic superiority
The most pivotal research finding of 2025 is the measurable superiority of multimodal AI over traditional unimodal systems. While earlier AI was limited to analyzing a single data type (e.g., just an X-ray), recent studies have demonstrated that models integrating heterogeneous data-such as medical imaging, Electronic Health Record (EHR) text, and genomic markers-provide a “holistic” diagnostic accuracy that matches or exceeds senior specialists [7].
Cardiovascular health: A landmark 2025 meta-analysis published in JMIR showed that deep learning models combining radiomics with clinical features achieved an Area Under the Curve (AUC) of 0.93 for detecting carotid plaques, significantly outperforming models that used imaging data alone [8].
Gastroenterology: Research in 2025 reported that multimodal AI systems integrating endoscopic video with pathological and molecular data increased the detection rate of early gastric cancer to 92.7% [9]. Furthermore, these models were able to predict tumor recurrence with an AUC of 0.91, allowing for more aggressive early intervention.
Oncology: In colorectal cancer research, the fusion of CT radiomics with circulating tumor DNA (ctDNA) methylation profiles improved the accuracy of lymph node metastasis prediction by 23% compared to standard clinical assessments [10].
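The studies above all compare unimodal models against multimodal fusion using AUC. A minimal way to see what that comparison means is a late-fusion sketch: average the risk scores from two per-modality models and check discrimination. All labels and scores below are synthetic and purely illustrative (they are not drawn from the cited studies), and real pipelines learn the fusion weights rather than averaging.

```python
# Late-fusion sketch: fuse per-modality risk scores and compare AUC.
# All data are synthetic and purely illustrative.

def auc(labels, scores):
    """Mann-Whitney estimate of the area under the ROC curve."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical per-patient outputs from two unimodal models
# (e.g., an imaging model and an EHR-text model), plus true labels.
y       = [1, 1, 1, 1, 0, 0, 0, 0]
imaging = [0.80, 0.40, 0.70, 0.55, 0.60, 0.35, 0.50, 0.20]
ehr     = [0.65, 0.75, 0.45, 0.70, 0.30, 0.55, 0.40, 0.25]

# Simple late fusion: average the two modality scores per patient.
fused = [(a + b) / 2 for a, b in zip(imaging, ehr)]

print(f"imaging-only AUC: {auc(y, imaging):.2f}")
print(f"EHR-only AUC:     {auc(y, ehr):.2f}")
print(f"fused AUC:        {auc(y, fused):.2f}")
```

In this toy setting the two modalities make errors on different patients, so the averaged score separates cases from controls better than either score alone, which is the intuition behind the reported multimodal gains.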
Ambient Clinical Intelligence (ACI) and burnout
Perhaps the most “human-centric” research breakthrough of 2024-2025 involves Ambient AI Scribes. Unlike traditional speech-to-text, these systems use generative AI to understand the context of a patient-doctor conversation and convert it into a structured medical note.
Reduction in documentation burden: A randomized trial conducted by the University of Wisconsin (published in NEJM AI in late 2025) found that ambient AI tools reduced documentation time by 30 minutes per provider per day [11].
Clinician Well-being: At the Cleveland Clinic, a pilot involving 250 physicians across 80 specialties showed that 76% of clinicians chose to use the AI scribe for all office visits, reporting a significant decrease in “pajama time” (work done after hours) and a lower cognitive load during patient interactions [12].
Accuracy and “Note Bloat”: Interestingly, a 2026 study on medRxiv revealed that while AI-generated notes are highly accurate, with 15.6% of notes being signed by doctors with zero edits, there is a growing concern regarding “note bloat.” AI-drafted sections tend to be longer than human-written ones, raising questions about whether the increased length might dilute clinically vital information [13].
AI-driven drug discovery and clinical trials
By 2026, AI has moved from the periphery to the core of the pharmaceutical pipeline. The research shift here is from “identifying targets” to “designing molecules” and “simulating trials.”
AlphaFold 3 and beyond: Building on the Nobel Prize-winning work of 2024, the release of AlphaFold 3 allowed researchers to predict interactions between proteins and DNA, RNA and ligands with unprecedented precision. This has led to the first AI-designed antimicrobial peptides for drug-resistant bacteria entering Phase I trials in early 2026 [14].
Digital twins in trials: A major 2025 trend was the use of “Digital Twins”-AI models of patients based on longitudinal data. These allowed researchers to simulate control groups, potentially reducing the number of human subjects needed for early-stage trials [15].
Pancreatic cancer breakthrough: In late 2025, researchers at the Italian Institute of Technology used AI to design a molecule that specifically targets the resistance mechanisms of pancreatic cancer cells, making them significantly more susceptible to standard chemotherapy [16].
The “real-world” performance gap
Crucially, 2025-2026 research has also highlighted a “sobering” reality: AI performance often declines when moved from controlled datasets to diverse, real-world populations.
The generalizability gap: A narrative review published by the NIH in 2025 found that AI models frequently suffer from “distributional shift.” For example, a sepsis-prediction model that performed at 95% accuracy in a high-resource hospital dropped to 72% when deployed in a rural setting with different patient demographics and data-recording habits [17].
Detection of underdiagnosis: Studies released in 2024 and 2025 showed that AI algorithms for chest radiographs often have higher rates of underdiagnosis among underserved groups, including women and racial minorities, if the training data is not meticulously balanced [18].
FDA oversight: This has led to a surge in FDA activity. By 2026, the number of approved AI-enabled medical devices exceeded 300, with new “Algorithm Change Protocols” requiring companies to prove that their models remain accurate as real-world data “drifts” over time [19].
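The findings above imply two concrete monitoring tasks for deployed models: auditing false-negative rates per subgroup (underdiagnosis) and flagging distributional shift between the training and deployment populations. The sketch below illustrates both with synthetic records, a hypothetical feature, and an arbitrary threshold; it is not a description of any cited system or FDA protocol.

```python
# Monitoring sketch for two deployment risks named in the literature:
# (1) subgroup underdiagnosis, via per-group false-negative rate, and
# (2) distributional shift between training and live cohorts.
# All records, features, and thresholds are synthetic and illustrative.

def false_negative_rate(records, group):
    """FNR = missed true cases / all true cases, within one subgroup."""
    cases = [r for r in records if r["group"] == group and r["label"] == 1]
    missed = [r for r in cases if r["pred"] == 0]
    return len(missed) / len(cases) if cases else float("nan")

# Hypothetical model outputs: label = ground truth, pred = model's call.
records = [
    {"group": "A", "label": 1, "pred": 1},
    {"group": "A", "label": 1, "pred": 1},
    {"group": "A", "label": 1, "pred": 0},
    {"group": "B", "label": 1, "pred": 0},
    {"group": "B", "label": 1, "pred": 0},
    {"group": "B", "label": 1, "pred": 1},
]

for g in ("A", "B"):
    print(f"group {g} FNR: {false_negative_rate(records, g):.2f}")

# Crude drift check: compare the mean of one input feature between the
# training cohort and the live cohort, and flag a large absolute change.
train_ages = [54, 61, 58, 66, 70]
live_ages  = [34, 29, 41, 38, 45]   # deployment site skews younger
drift = abs(sum(live_ages) / len(live_ages)
            - sum(train_ages) / len(train_ages))
print("drift flag:", drift > 10)    # threshold chosen arbitrarily here
```

A gap between the subgroup FNRs (group B misses twice as many true cases as group A in this toy data) is exactly the underdiagnosis pattern the 2024-2025 studies report; production systems would use proper statistical tests rather than a fixed mean-difference threshold.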
The implementation gap and the “black box”
One of the most persistent themes in recent literature is the “Maturity Gap.” As of late 2025, it is estimated that while thousands of AI models are published monthly, fewer than 1% reach the level of a Randomized Controlled Trial (RCT) or receive regulatory clearance as Software as a Medical Device (SaMD) [20]. The discussion now centers on why these models fail in the “wild.” The “Black Box” problem has evolved; it is no longer just about not knowing how a model reached a decision, but about distributional shift-where a model trained in a prestigious academic center in the US fails when applied to a rural clinic in the Global South [21].
The ethics of autonomy and liability
A significant portion of the 2024-2026 literature is dedicated to the legal “No Man’s Land” of autonomous AI. If an AI system suggests a dosage that leads to an adverse event, who is liable? Current legal frameworks in the EU and the US are trending toward “Human-in-the-Loop” requirements, but as AI becomes more sophisticated, “Automation Bias”-the tendency for humans to over-rely on automated suggestions-makes the “human-in-the-loop” a potential legal fiction [22,23]. The discussion is moving toward Algorithmic Accountability, where the developers, not just the clinicians, share the burden of malpractice.
Bias, equity and the “digital divide”
The discussion has also taken a sharp turn toward social justice. Research in 2025 has exposed that many LLMs inherit the structural biases of their training data, leading to poorer diagnostic accuracy for underrepresented minorities [24]. The “Third Wave” of AI risks widening the health equity gap if the tools are only accessible to wealthy health systems or if the models are not trained on diverse genomic and phenotypic data. “Data Sovereignty” and “Fairness-by-Design” have become the new benchmarks for ethical AI research [25].
The future of medical education
Finally, we must address the “De-skilling” concern. As AI takes over diagnostic and administrative tasks, how do we train the next generation of doctors? Literature in 2025 suggests a radical overhaul of medical curricula, moving away from rote memorization and toward “AI Literacy” and “Critical Appraisal” [26,27]. The physician of 2026 is envisioned not just as a healer, but as an “AI Orchestrator,” capable of auditing AI outputs and managing the human-machine interface. In conclusion, the current state of medical AI is a delicate balance between unprecedented potential and systemic risk. The success of AI in medicine will not be measured by the complexity of its code, but by its ability to disappear into the background of clinical practice-becoming as reliable and unremarkable as the stethoscope.
Artificial Intelligence in medicine has matured from a promising experimental tool to a core component of modern healthcare infrastructure. The primary trend for 2026 and beyond is the transition toward multimodal autonomy, where AI manages both the diagnostic reasoning and the administrative burden of care. To ensure sustainable adoption, future research must prioritize validated clinical impact over laboratory accuracy and address the legal and ethical frameworks necessary for autonomous medical intervention.
© 2026 Raquel Bernal Calmarza. This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and building upon the work non-commercially. Based on a work at www.crimsonpublishers.com.