A multi-institutional research team led by Weill Cornell Medicine, Cornell Tech, the Cornell Bowers College of Computing and Information Science, Columbia University, and NewYork-Presbyterian has developed an artificial intelligence model that can identify patients with advanced heart failure using data routinely collected during standard hospital visits, with no specialized testing required. The study was published in npj Digital Medicine, a Nature portfolio journal focused on digital health research.

The system works by analyzing cardiac ultrasound images alongside electronic health records to predict a key physiological measurement that doctors currently obtain only through a demanding exercise test available at a small number of large medical centers. In a validation study across four hospital campuses, the model correctly distinguished high-risk patients approximately 85 percent of the time.

The Diagnostic Bottleneck Slowing Advanced Heart Failure Care

To understand why this matters, you need to know how advanced heart failure is currently diagnosed, and why that process fails so many patients.

Advanced heart failure is a distinct, severe stage of the disease in which the heart can no longer pump enough blood to meet the body's demands even at rest. An estimated 200,000 people in the United States live with the condition, but only a small fraction ever receive care calibrated to their actual disease severity. The reason is largely logistical: the definitive diagnostic test is a cardiopulmonary exercise test (CPET), which measures peak oxygen consumption, or peak VO2, while the patient exercises at maximal effort.

A low peak VO2 is a hallmark of advanced heart failure and one of the primary criteria used to determine whether a patient is a candidate for heart transplantation or a mechanical heart pump called a left ventricular assist device (LVAD). The problem is that CPET requires specialized equipment, trained technicians, and a clinical environment equipped to handle high-risk patients who may not tolerate vigorous exercise. That infrastructure exists primarily at large academic medical centers. For the vast majority of cardiology practices, and for patients in underserved or rural areas, CPET is simply not available. Patients who never receive the test often never receive the escalated care they need.

Predicting What a Treadmill Test Measures, Without the Treadmill

The Weill Cornell-led team approached this as a data problem. If peak VO2 is the number that matters, can a machine learning model estimate that number from information doctors already have?

The answer, the study suggests, is yes, at least with enough accuracy to flag the highest-risk patients for follow-up evaluation.

The team built what they describe as a multi-modal, multi-instance machine learning model. Breaking that phrase down is worth the effort. "Multi-modal" means the model draws from several distinct types of data simultaneously, rather than relying on a single input. "Multi-instance" refers to a technique that allows the model to process variable numbers of data samples for each patient, for example multiple ultrasound clips taken over time, and aggregate them into a single prediction rather than requiring a fixed, uniform input.

The model draws on three data sources: standard cardiac ultrasound images (echocardiograms), waveform imagery capturing the dynamics of heart valve movement, and structured data from the electronic health record (EHR), including diagnoses, medications, lab results, and demographic information.

Think of it this way: an echocardiogram is essentially a video of the heart beating. The ultrasound probe records dozens of frames per second from multiple angles. From those frames, a trained cardiologist can assess how well the heart muscle contracts, how the valves open and close, and whether the heart's chambers are enlarged, all of which correlate with disease severity. The waveform imagery adds another layer, capturing the velocity of blood flow across valves as a time-series signal. The electronic health record fills in the clinical history: how long the patient has had heart failure, what medications they are taking, whether their kidneys are showing signs of strain from reduced cardiac output.

No single one of these data sources is sufficient on its own. A severely weakened heart on echocardiogram doesn't always translate to a severely impaired peak VO2. The model's value is in synthesizing all three streams simultaneously, finding patterns across the combination that no individual test reveals cleanly.
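To make the synthesis idea concrete, here is a minimal sketch of late fusion, one common way multi-modal models combine inputs. Nothing here comes from the study: the variable names, embedding sizes, and random weights are all illustrative assumptions, standing in for learned per-modality encoders and a learned classifier head.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-modality feature vectors for one patient.
# Dimensions are illustrative, not taken from the study.
echo_embedding = rng.normal(size=128)      # pooled echocardiogram video features
waveform_embedding = rng.normal(size=64)   # valve-dynamics waveform features
ehr_features = rng.normal(size=32)         # structured EHR: labs, meds, diagnoses

# Late fusion: concatenate the modality embeddings, then apply a linear
# classifier head. A real model learns these weights; here they are random.
fused = np.concatenate([echo_embedding, waveform_embedding, ehr_features])
weights = rng.normal(size=fused.shape[0])
bias = 0.0

logit = fused @ weights + bias
prob_high_risk = 1.0 / (1.0 + np.exp(-logit))  # sigmoid -> risk probability
print(round(float(prob_high_risk), 3))
```

Because the classifier sees all three embeddings at once, it can weight evidence from one modality against the others, which is exactly the cross-source pattern-finding the paragraph above describes.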

Training Data and Validation Results

The model was trained on data from approximately 1,000 patients evaluated at NewYork-Presbyterian and Columbia University facilities. It was then tested on a separate set of 127 patients drawn from three additional NewYork-Presbyterian campuses. That external test is a meaningful design choice: evaluating the model on patients from different clinical sites helps confirm that it is learning genuine physiological signals rather than quirks of a particular institution's imaging equipment or data entry practices.

Across that external test set, the model achieved roughly 85 percent accuracy in classifying patients as high-risk versus lower-risk, where high-risk corresponds to the peak VO2 threshold clinically associated with advanced heart failure and eligibility for advanced therapies. The full methodology is detailed in the npj Digital Medicine study.
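As a toy illustration of what that classification accuracy means, the sketch below binarizes predicted and measured peak VO2 values at an assumed cutoff. The threshold of 14 mL/kg/min and every number in the arrays are hypothetical; the study's actual cutoff and patient data are not reproduced here.

```python
import numpy as np

# Hypothetical cutoff: peak VO2 below THRESHOLD counts as high-risk.
THRESHOLD = 14.0  # mL/kg/min, illustrative only

# Made-up predicted vs. CPET-measured peak VO2 for six patients.
predicted_vo2 = np.array([10.2, 16.8, 12.5, 21.0, 13.9, 18.4])
measured_vo2 = np.array([11.0, 15.5, 14.6, 20.1, 12.8, 17.9])

# Binarize both at the cutoff, then score agreement.
pred_high = predicted_vo2 < THRESHOLD
true_high = measured_vo2 < THRESHOLD

accuracy = float((pred_high == true_high).mean())
print(round(accuracy, 3))  # -> 0.833: five of six patients classified correctly
```

Note that the model can miss the exact peak VO2 value (patient three is off by about 2 mL/kg/min) and still land on the wrong side of the cutoff only occasionally, which is why a regression-style prediction can support a binary high-risk screen.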

Dr. Fei Wang, the study's senior author and associate dean for artificial intelligence and data science at Weill Cornell Medicine, where he also holds the Frances and John L. Loeb Professorship of Medical Informatics, described the significance in terms of what already exists in the clinic.

"This opens up a promising pathway for more efficient assessment of patients with advanced heart failure using data sources that are already embedded in routine care," Wang said.

Dr. Fei Wang, Senior Author and Associate Dean for AI, Weill Cornell Medicine

That phrase, "already embedded in routine care," is the key. Echocardiograms are performed on virtually every heart failure patient. Electronic health records exist for every hospitalized patient. No additional equipment purchase, no additional test order, no additional patient burden is required for the AI model to generate its prediction. The infrastructure for deployment already exists wherever cardiology is practiced.

What Clinicians Stand to Gain

Dr. Nir Uriel, director of the Advanced Heart Failure Program at NewYork-Presbyterian, was direct about the clinical stakes.

"If we can use this approach to identify many advanced heart failure patients who would not be identified otherwise, then this will change our clinical practice and significantly improve patient outcomes and quality of life," he said.

Dr. Nir Uriel, Director, Advanced Heart Failure Program, NewYork-Presbyterian

The phrase "would not be identified otherwise" points to a documented gap in advanced heart failure care. Studies have repeatedly shown that the condition is underdiagnosed in community hospital settings, in part because CPET access is limited and in part because general cardiologists may not recognize when a patient's disease has progressed to the advanced stage. The consequence is patients remaining on medications appropriate for moderate heart failure while their condition deteriorates, without ever being referred to the specialized centers that could offer transplant evaluation or device therapy.

A screening tool that runs automatically on data the hospital already possesses could flag those patients for review by an advanced heart failure specialist, functioning less like a replacement for physician judgment and more like a first-pass filter that ensures fewer patients fall through the cracks.

This is a familiar pattern in AI-assisted diagnostics: the model's value is not in replacing the specialist but in expanding the funnel of patients who reach the specialist in the first place. The same logic has driven AI applications in diabetic retinopathy screening, lung nodule detection on CT scans, and sepsis prediction in emergency departments.

Medicine Directing the AI, Not the Reverse

One of the more intellectually interesting aspects of this project is how the clinical problem shaped the technical approach, rather than the other way around.

Dr. Deborah Estrin, associate dean for impact at Cornell Tech and a collaborator on the project, put it plainly:

"This was a case of medicine shaping the future of AI, not just AI shaping the future of medicine."

Dr. Deborah Estrin, Associate Dean for Impact, Cornell Tech

The multi-instance learning framework, for example, was adopted specifically because echocardiograms don't come in uniform packages. A patient might have two ultrasound views or twenty, depending on the clinical protocol. Standard supervised learning models require fixed-dimension inputs, which would have forced the team to either discard data or normalize it in ways that could introduce error. The multi-instance approach treats the variable collection of images as a single "bag" and learns to make a prediction from whatever is in the bag, preserving the richness of each patient's actual imaging data.
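The bag idea can be sketched in a few lines. This is not the study's architecture; it is a generic attention-pooling form of multi-instance learning, with made-up embedding sizes and random weights standing in for trained parameters, shown only to illustrate how one model handles bags of different sizes.

```python
import numpy as np

rng = np.random.default_rng(42)

def bag_prediction(instance_embeddings, attn_w, clf_w):
    """Attention-based multi-instance pooling: score each instance,
    softmax the scores into weights, pool, then classify the bag."""
    scores = instance_embeddings @ attn_w        # one score per ultrasound clip
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax over instances
    pooled = weights @ instance_embeddings       # weighted average embedding
    logit = pooled @ clf_w
    return 1.0 / (1.0 + np.exp(-logit))          # sigmoid -> probability

dim = 16
attn_w = rng.normal(size=dim)  # stand-ins for learned parameters
clf_w = rng.normal(size=dim)

# Two patients with different bag sizes: 3 clips vs. 20 clips,
# each clip represented by an illustrative random embedding.
patient_a = rng.normal(size=(3, dim))
patient_b = rng.normal(size=(20, dim))

# The same model handles both, with no padding and no discarded data.
for bag in (patient_a, patient_b):
    p = bag_prediction(bag, attn_w, clf_w)
    print(bag.shape[0], round(float(p), 3))
```

Because the pooling step reduces any number of instances to one fixed-size vector, the downstream classifier never needs to know how many clips a patient had, which is the property the paragraph above describes.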

Similarly, the decision to include waveform imagery as a separate modality, rather than only the raw ultrasound video frames, came from cardiologists on the team who recognized that valve dynamics carry clinical information not fully captured in the static anatomy visible on a standard echocardiogram. The AI architecture followed the clinical insight, not the other way around.

This distinction matters because a recurring criticism of medical AI is that it optimizes for whatever signal is easiest to model rather than whatever signal is most clinically relevant. The Weill Cornell team's approach, with active clinical leadership from both the cardiology and data science sides, is an attempt to close that gap.

The Cardiovascular AI Initiative and What Comes Next

The study is part of the broader Cardiovascular AI Initiative (CVAI), a collaboration between Cornell University, Columbia University, and NewYork-Presbyterian designed to accelerate the translation of machine learning research into clinical cardiology applications. The partnership combines Cornell's computing and data science infrastructure with Columbia's clinical research depth and NewYork-Presbyterian's patient population and multi-campus reach. More detail on the initiative's structure is available from the Weill Cornell Medicine press release and the Cornell Cardiovascular AI Initiative page.

The next phase, the team has indicated, involves designing prospective clinical studies, meaning studies in which the AI tool is used on real patients in real time, with outcomes tracked forward, rather than retrospectively analyzing existing records. That shift from retrospective to prospective evaluation is a critical scientific step and also a regulatory one.

To reach clinical deployment at scale, the model would require clearance from the U.S. Food and Drug Administration (FDA) under its Software as a Medical Device pathway. The FDA has been actively developing frameworks for AI-based clinical decision support tools, and several dozen such tools have already received clearance in cardiology, radiology, and pathology. The agency's primary requirements center on demonstrating performance across diverse patient populations, including differences in age, sex, race, and ethnicity, and establishing that the model's outputs are interpretable enough for clinicians to act on responsibly.

The validation study's test set of 127 patients, while promising, is small by regulatory standards. The prospective clinical studies the team is planning will need to enroll substantially larger and more diverse cohorts to satisfy those requirements. They will also need to demonstrate that the tool's predictions lead to measurable changes in patient management: that flagged patients are actually referred and evaluated, and that earlier identification translates to better outcomes, not just better predictions.

A Familiar Problem, a New Tool

Advanced heart failure is, in some ways, a solved problem at the individual patient level. The therapies, including transplantation, left ventricular assist devices, and optimized medical management, are well established and often highly effective. The unsolved problem is reaching the patients who need them before their condition becomes irreversible.

AI-assisted screening won't fix the shortage of transplant-capable centers, the waiting list for donor hearts, or the insurance barriers that limit access to advanced heart failure programs. Those are structural problems that tools like this one cannot address. What the Weill Cornell model targets is narrower but still meaningful: the diagnostic gap between the approximately 200,000 Americans with advanced heart failure and the far smaller number who are ever formally evaluated as such.

The 85 percent accuracy figure is not perfection, and the team has not claimed it is. What they have demonstrated is that the gap between "we need a CPET machine and a specialized lab" and "we can identify high-risk patients" may be considerably smaller than previously assumed, and that the data to close that gap already exists in hospital systems across the country, waiting to be interpreted.

Whether the clinical study results, regulatory review, and real-world implementation bear out that promise is the question the next few years will answer. The pathway from a published model to a deployed diagnostic tool is long, and most candidates don't complete it. But the signal from this study is strong enough, and the unmet clinical need acute enough, that the answer to that question carries real weight for a large population of patients who currently receive no answer at all.

Sources

  1. AI Model for Advanced Heart Failure Diagnosis - Weill Cornell Medicine Press Release
  2. Multi-Modal AI Model for Peak VO2 Prediction - npj Digital Medicine
  3. Cornell Cardiovascular AI Initiative