An artificial intelligence pipeline built at the University of Warwick has validated 118 new planets and identified more than 2,000 high-quality planet candidates by processing data from 2.2 million stars observed by NASA's Transiting Exoplanet Survey Satellite (TESS) during the mission's first four years of operation. The team describes the result as one of the best-characterized samples of close-orbiting planets assembled from the mission. The findings, published in the Monthly Notices of the Royal Astronomical Society, offer both a new catalogue of validated worlds and a precise statistical picture of just how rare certain extreme planetary environments actually are.

What the Pipeline Found and Why the Scale Matters

The numbers alone are striking: 118 newly validated planets, over 2,000 high-quality candidates, and nearly 1,000 of those candidates identified for the first time. But the significance runs deeper than a head count. To understand why, you need to know what "validated" means in planet-hunting. A candidate is any signal that looks like a planet passing in front of its star. A validated planet is one where researchers have statistically ruled out (to a high degree of confidence) every plausible alternative explanation for that signal. Validation is slow, labor-intensive work, and it has typically been carried out as a separate analysis for each candidate.

The Warwick team's approach changes that calculus. Their pipeline, named RAVEN, handles the entire process in a single automated workflow. Dr. Andreas Hadjigeorghiou, who led the pipeline's development, described the design intent plainly:

"RAVEN is designed to handle the whole process in one go, from detecting the signal, to vetting it with machine learning and statistically validating it."

Dr. Andreas Hadjigeorghiou, Pipeline Lead, University of Warwick

The result is a pipeline that does in hours what previously took weeks of individual human analysis, and does it with consistent, reproducible standards across every single candidate in the dataset.

How TESS Finds Planets in the First Place

Before examining what RAVEN does, it helps to understand the raw data it is working with. TESS, launched by NASA in 2018, stares at large patches of sky and measures the brightness of stars with extreme precision. When a planet orbits in front of its star from our line of sight, it blocks a small fraction of that starlight. The star appears to dim slightly, then return to full brightness once the planet has moved on. That dip in brightness, called a transit, is the telltale signature TESS hunts for.

Think of it like standing in a football stadium at night and watching someone walk in front of a floodlight on the far side of the pitch. You can't see the person directly, but you notice the light dim for a moment as they pass. That's the transit method: detecting the shadow of an unseen world as it crosses a distant sun.
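To give a sense of how small that dimming is, here is a minimal sketch of the standard first-order relation, in which the fractional drop in brightness equals the square of the planet-to-star radius ratio. The function name and the rounded radii are illustrative choices, not values from the Warwick study, and the estimate ignores limb darkening and grazing transits.

```python
# Approximate mean radii in kilometres (rounded reference values).
R_SUN_KM = 695_700.0
R_JUPITER_KM = 69_911.0
R_NEPTUNE_KM = 24_622.0
R_EARTH_KM = 6_371.0


def transit_depth(planet_radius_km: float, star_radius_km: float = R_SUN_KM) -> float:
    """Fractional dimming for a central transit: (R_planet / R_star) ** 2."""
    if planet_radius_km <= 0 or star_radius_km <= 0:
        raise ValueError("radii must be positive")
    return (planet_radius_km / star_radius_km) ** 2


if __name__ == "__main__":
    planets = [("Jupiter", R_JUPITER_KM), ("Neptune", R_NEPTUNE_KM), ("Earth", R_EARTH_KM)]
    for name, radius in planets:
        depth = transit_depth(radius)
        print(f"{name}-sized planet, Sun-like star: {depth * 100:.4f}% dip")
```

Under these assumptions a Jupiter-sized planet dims a Sun-like star by about 1 percent, a Neptune-sized planet by roughly a tenth of a percent, and an Earth-sized planet by less than a hundredth of a percent, which is why measurement precision matters so much.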

The problem is that TESS generates enormous volumes of data, with brightness measurements for millions of stars sampled repeatedly over months. Not every dip in a star's brightness is a planet. Starspots, binary star systems, instrumental noise, and background eclipsing binaries can all mimic a planetary transit. Sorting signal from noise at scale is where automated pipelines become essential, and where machine learning has begun to outperform traditional human-coded filters.

Inside RAVEN: Detection, Vetting, Validation in One Pass

RAVEN integrates three distinct stages that were previously handled by separate tools, or by human astronomers working individually through candidate lists.

The first stage is automated signal detection: scanning each star's brightness time series to identify periodic dips that match the mathematical profile of a transiting planet. The second stage is machine learning vetting, where a trained classifier examines each candidate's signal characteristics and flags those most likely to be genuine planets rather than false positives. The third stage is statistical validation, where the pipeline estimates the probability that the remaining candidates are actual planets rather than any alternative astrophysical explanation.

The analogy here is a three-stage sorting facility. In the first room, workers separate packages by rough size and shape. In the second room, a computer vision system reads labels and rejects misrouted items. In the third room, quality control checks each package against the manifest before it goes out the door. RAVEN runs all three rooms as a continuous process, applied to 2.2 million addresses simultaneously.
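The three stages can be illustrated with a toy example. This is not RAVEN's code and does not reproduce its methods: the light curve is synthetic, the detection step is a simple phase-folding box search, the vetting step is a pair of hand-written cuts standing in for a trained classifier, and the validation step is a bare application of Bayes' rule with made-up inputs and an assumed 99 percent threshold. Every function name and number below is hypothetical.

```python
import math
import random
from statistics import mean, pstdev


def make_light_curve(n_points=2000, cadence=0.02, period=2.5, duration=0.1,
                     depth=0.002, noise=0.0005, seed=0):
    """Synthetic star: constant brightness, box-shaped periodic dips, Gaussian noise."""
    rng = random.Random(seed)
    times = [i * cadence for i in range(n_points)]
    flux = [1.0 - (depth if t % period < duration else 0.0) + rng.gauss(0.0, noise)
            for t in times]
    return times, flux


def detect(times, flux, trial_periods, duration):
    """Stage 1 (toy box search): fold on each trial period, score the dimmest phase bin."""
    best = {"period": None, "depth": 0.0, "snr": 0.0}
    for period in trial_periods:
        n_bins = max(2, round(period / duration))
        bins = [[] for _ in range(n_bins)]
        for t, f in zip(times, flux):
            index = min(int((t % period) / period * n_bins), n_bins - 1)
            bins[index].append(f)
        filled = [b for b in bins if b]
        dimmest = min(filled, key=mean)
        rest = [f for b in filled if b is not dimmest for f in b]
        depth = mean(rest) - mean(dimmest)
        snr = depth / (pstdev(rest) / math.sqrt(len(dimmest)))
        if snr > best["snr"]:
            best = {"period": period, "depth": depth, "snr": snr}
    return best


def vet(candidate, min_snr=7.0, max_depth=0.05):
    """Stage 2 placeholder: hand-written cuts standing in for a trained classifier."""
    return candidate["snr"] >= min_snr and candidate["depth"] <= max_depth


def validate(prior_planet, likelihood_planet, likelihood_false_positive, threshold=0.99):
    """Stage 3 placeholder: Bayes' rule for the probability that the signal is a planet."""
    evidence = (prior_planet * likelihood_planet
                + (1.0 - prior_planet) * likelihood_false_positive)
    posterior = prior_planet * likelihood_planet / evidence
    return posterior, posterior >= threshold


if __name__ == "__main__":
    times, flux = make_light_curve()
    trial_periods = [p / 10 for p in range(10, 41)]  # 1.0 to 4.0 days in 0.1-day steps
    candidate = detect(times, flux, trial_periods, duration=0.1)
    print("detected:", candidate)
    if vet(candidate):
        # Prior and likelihoods are invented purely to exercise the function.
        probability, validated = validate(0.5, 0.9, 0.005)
        print(f"planet probability {probability:.4f}, validated: {validated}")
```

The point of the sketch is the hand-off: each stage consumes the previous stage's output without human intervention, so the same rules apply to every star.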

Dr. Marina Lafarga Magro, the lead author of the study and a postdoctoral researcher at Warwick, described the output:

"Using our newly developed RAVEN pipeline, we were able to validate 118 new planets, and over 2,000 high-quality planet candidates, nearly 1,000 of them entirely new. This represents one of the best characterised samples of close-in planets and will help us identify the most promising systems for future study."

Dr. Marina Lafarga Magro, Lead Author, University of Warwick

The phrase "best characterised" carries weight. Prior candidate lists from TESS were often assembled through separate pipelines with different assumptions, making comparisons difficult. RAVEN's uniform methodology means the entire catalogue was produced under consistent standards, a significant advantage for any researcher trying to draw population-level conclusions. The full study results are published in Monthly Notices of the Royal Astronomical Society.

The Neptunian Desert: Mapping a Gap in the Galaxy

One of the more striking outputs from this work is not about individual planets but about a pattern in their absence. Astronomers have known for years about a strange feature in the distribution of planets around Sun-like stars: a zone of orbital distance so close to the host star that Neptune-sized planets are almost never found there. This region has been called the Neptunian desert.

The reason for the desert is still debated, but the leading explanation involves atmospheric stripping. Planets that orbit very close to their stars are bathed in intense radiation. Gas giants have enough mass to hold their atmospheres even at close range. Rocky, Earth-sized planets have very little atmosphere to lose. But Neptune-sized planets (which sit in the middle) appear to lose their hydrogen and helium envelopes to stellar radiation over time, shrinking down to rocky cores. The result is a population gap: you find large gas giants close in, and you find small rocky worlds close in, but Neptune-sized planets in that zone are rare to the point of near-absence.

RAVEN's large, uniformly characterized sample allowed the Warwick team to put a precise frequency on this rarity for the first time. Their population analysis found that Neptunian desert planets occur around roughly 0.08 percent of Sun-like stars. Dr. Kaiming Cui, who led the population study, noted the significance:

"For the first time, we can put a precise number on just how empty this 'desert' is."

Dr. Kaiming Cui, Population Study Lead, University of Warwick

That precision matters because population statistics drive the theoretical models astronomers use to understand how planetary systems form and evolve. A vague description of the desert as "rare" can accommodate a wide range of formation theories. A hard number at 0.08 percent constrains the models considerably: some theories now fit less well, others fit better, and the field narrows its disagreements.

Ultra-Short Orbits and the 24-Hour Threshold

The Warwick study's focus on close-in planets also shines a light on one of the more alien categories in planetary science: ultra-short-period planets, which complete a full orbit in less than 24 hours. For perspective, Mercury, the fastest planet in our solar system, takes 88 days to orbit the Sun. A planet with a 20-hour orbital period is hugging its star so tightly that its year passes before Earth's day ends.
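How tight is a 20-hour orbit? A rough answer follows from Kepler's third law. The sketch below assumes a circular orbit, a star of exactly one solar mass, and a planet of negligible mass; the function name is illustrative and the constants are standard reference values, not figures from the study.

```python
import math

GM_SUN = 1.32712440018e20  # gravitational parameter of the Sun, m^3 / s^2
AU_M = 1.495978707e11      # one astronomical unit in metres
R_SUN_M = 6.957e8          # nominal solar radius in metres


def orbital_distance_m(period_hours: float, gm: float = GM_SUN) -> float:
    """Semi-major axis from Kepler's third law: a = (GM * P^2 / (4 * pi^2)) ** (1/3)."""
    if period_hours <= 0:
        raise ValueError("period must be positive")
    period_s = period_hours * 3600.0
    return (gm * period_s ** 2 / (4.0 * math.pi ** 2)) ** (1.0 / 3.0)


if __name__ == "__main__":
    for label, hours in [("20-hour planet", 20.0), ("Mercury (88 days)", 88.0 * 24.0)]:
        a = orbital_distance_m(hours)
        print(f"{label}: {a / AU_M:.4f} au, {a / R_SUN_M:.1f} stellar radii")
```

Under these assumptions the 20-hour planet orbits at under 0.02 au, fewer than four stellar radii from the star's centre, compared with roughly 0.39 au for Mercury.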

RAVEN's sample includes a substantial number of these objects, helping to refine the statistics on how common they are in the broader population. The pipeline found that approximately 9 to 10 percent of Sun-like stars host close-orbiting planets of some variety, a figure that reflects the overall population RAVEN is characterizing, not just the extreme ultra-short-period cases.

Understanding ultra-short-period planets at population scale matters for several reasons. They are among the easiest planets to detect (their short orbits mean frequent transits) and among the easiest to study atmospherically with instruments like the James Webb Space Telescope. They also represent a stress test for planetary formation theories, since current models struggle to explain how planets end up in orbits so tight they were almost certainly not formed there. Migration, tidal interactions, and resonance chains all play potential roles, and better population data helps weigh those possibilities.
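The link between short orbits and frequent transits is simple division. The sketch below assumes an uninterrupted observing window of about 27 days, the approximate length of one TESS sector, and ignores data gaps; the function name is illustrative.

```python
TESS_SECTOR_DAYS = 27.0  # approximate length of one TESS observing sector


def expected_transits(period_days: float, window_days: float = TESS_SECTOR_DAYS) -> float:
    """Average number of transits falling inside a continuous observing window."""
    if period_days <= 0:
        raise ValueError("period must be positive")
    return window_days / period_days


if __name__ == "__main__":
    for label, period in [("20-hour orbit", 20.0 / 24.0), ("10-day orbit", 10.0)]:
        print(f"{label}: about {expected_transits(period):.1f} transits per sector")
```

A 20-hour planet transits about 32 times in one sector, while a planet on a 10-day orbit transits two or three times, so the short-period signal can be stacked far more often.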

Why Automation Was the Only Viable Path

The scale of the Warwick analysis, covering 2.2 million stars, makes clear why a manual approach was not an option. To put that number in context: if a human astronomer spent one minute examining each star's brightness record, working eight hours a day, it would take roughly 4,600 working days to get through the full dataset. That is more than 12 years without a single day off, or closer to 18 years on a five-day working week, and it is before any vetting or validation work even begins.
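The back-of-envelope estimate can be reproduced in a few lines. This sketch takes the one minute per star and eight-hour day from the text; the number of working days per year is an extra assumption, and the answer is quite sensitive to it. The function and parameter names are illustrative.

```python
def review_years(n_stars: float, minutes_per_star: float = 1.0,
                 hours_per_day: float = 8.0, days_per_year: float = 365.0) -> float:
    """Years needed for one person to inspect every star's brightness record once."""
    total_hours = n_stars * minutes_per_star / 60.0
    working_days = total_hours / hours_per_day
    return working_days / days_per_year


if __name__ == "__main__":
    n_stars = 2_200_000
    for label, days in [("no days off", 365.0), ("five-day week", 260.0), ("200 days a year", 200.0)]:
        print(f"{label}: {review_years(n_stars, days_per_year=days):.1f} years")
```

The three scenarios come out at roughly 12.6, 17.6 and 22.9 years respectively, so any reasonable working pattern puts a manual review at well over a decade.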

Machine learning has been applied to TESS data before, but typically in fragmented ways: one algorithm for detection, a separate tool for vetting, human review for validation decisions. The contribution of RAVEN is not that it applies AI to any single stage, but that it integrates all three stages into a workflow that can be run systematically across an entire mission archive.

This integration also enables reproducibility in a way that ad hoc pipelines do not. Because every candidate in the Warwick catalogue went through identical processing steps with identical parameters, researchers who want to cross-compare results or update the analysis with new data can do so reliably. Science benefits from consistent methodology at least as much as it benefits from any individual clever algorithm.

The timing is also notable. TESS continues to operate and continues to accumulate data. The first four years of observations were the input for this study. Future years of data, including repeat observations of the same fields which improve signal quality, will feed naturally into the same pipeline framework.

Looking Ahead: ESA's PLATO and the Next Generation of Planet Hunting

The Warwick team's work arrives at an inflection point for the field. ESA's PLATO telescope is scheduled to launch in 2026, with a mission explicitly designed to find Earth-like planets in the habitable zones of Sun-like stars, and to characterize those planetary systems with far greater precision than TESS was designed to provide. Further details on the mission are available from the ESA PLATO mission page.

PLATO will generate a data volume comparable to TESS, but its science goals demand even more rigorous planet validation. The kinds of pipeline architectures demonstrated by RAVEN are exactly what PLATO's science teams will need to process that data efficiently. The Warwick study functions, in part, as a proof of concept: automated end-to-end pipelines can produce high-quality, statistically robust planet catalogues at the scale that upcoming missions like PLATO will require.

The 118 validated planets in the new catalogue also provide a well-characterized sample for follow-up observation. Each validated planet now becomes a target for atmospheric characterization with existing telescopes, a benchmark for testing formation models, and potentially a stepping stone toward identifying which close-in systems might also host more distant, habitable companions. The 2,000-plus candidates represent the next queue for that follow-up work. The Astronomy Now report on this research provides additional context on the pipeline's significance for the field.

What the Warwick study ultimately demonstrates is that the bottleneck in planet discovery has shifted. Detecting planetary signals in starlight data is no longer the hard part: TESS has been generating those signals by the millions for years. The hard part is processing them fast enough to keep pace. RAVEN addresses that bottleneck directly, and the 0.08 percent figure for Neptunian desert occurrence is only the first statistical result to emerge from a catalogue that researchers will mine for years.

Whether future pipelines will refine that number further, or whether PLATO's higher-precision data will shift it entirely, is the kind of question that makes this work a starting point rather than an endpoint. The galaxy has at least 100 billion stars. Planet hunters have measured the brightness of 2.2 million of them in detail. The fraction of the galaxy's stars that host close-orbiting worlds, and what conditions those worlds actually experience, remains one of astronomy's most data-hungry open questions.

Sources

  1. AI Pipeline Validates 118 New Exoplanets - Astronomy Now
  2. RAVEN Pipeline Study - Monthly Notices of the Royal Astronomical Society
  3. University of Warwick Press Release - Exoplanet Discovery
  4. NASA TESS Mission Overview