Human-AI teaming to improve accuracy and efficiency of eligibility criteria prescreening for oncology trials: a randomized evaluation trial using retrospective electronic health records.

2026-02-03
Nature Communications 17(1)
- Ravi B Parikh
- Likhitha Kolla
- Elizabeth A Beothy
- William J Ferrell
- Brenda Laventure
- Matthew Guido
- Anthony Girard
- Yang Li
- Khaled Essam Mahmoud Dosoky
- Karim Tarabishy
- Parth S Patel
- Ayana Andalcio
- Kristin Maloney
- Jose Ulises Mena
- Wael Salloum
- Jinbo Chen
- Ezekiel J Emanuel

PubMed: 41634037
DOI: 10.1038/s41467-026-68873-8

Study Design

Type: Clinical Trial
Sample size: n = 355
Population: 355 patients with non-small cell lung or colorectal cancer
Methods: randomized noninferiority trial using retrospectively collected clinical charts comparing prescreening by trained research staff alone vs. augmented with a pre-trained language model


Few adult patients with cancer enroll in oncology clinical trials. A rate-limiting step to trial enrollment is prescreening, in which clinical research staff manually abstract unstructured health records to identify patients who meet eligibility criteria. Prescreening is time-consuming, labor-intensive, and prone to human error, resulting in under-identification of eligible patients. Neurosymbolic AI language models may approximate or improve the accuracy of prescreening through automated abstraction of enrollment criteria from longitudinal unstructured patient charts. We conduct a randomized noninferiority trial using retrospectively collected clinical charts to compare the accuracy and efficiency of prescreening by trained research staff alone (Human-alone) vs. augmented with a pre-trained language model (Human+AI), among a cohort of 355 patients with non-small cell lung or colorectal cancer. Sample size is determined from analyses of a preliminary dataset as well as a prespecified interim dataset of 74 charts. Chart-level accuracy, the primary endpoint, is noninferior and superior for Human+AI prescreening compared with Human-alone (76.5% vs. 71.1%). However, efficiency, the secondary endpoint, is unchanged, with similar average time per chart review (37.4 vs. 37.8 min). AI-assisted abstraction most improves accuracy for biomarker, staging, and response criteria. Performance is limited in some domains due to automation bias. Although improvements are modest, this large randomized trial evaluating a human-AI framework for oncology prescreening shows that AI language models can approximate and augment human-driven prescreening to enhance identification of trial-eligible patients, potentially increasing enrollment. The trial is registered on ClinicalTrials.gov (NCT06561217).
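
To make the "noninferior and superior" wording concrete, the following is a minimal sketch of how a noninferiority comparison of two accuracy rates is commonly framed. It is not the trial's analysis: it treats accuracy as a simple proportion and uses a Wald confidence interval, and the function name, group sizes, and noninferiority margin are all hypothetical, since the abstract reports none of them. Only the two accuracy values (76.5% and 71.1%) come from the abstract.

```python
import math


def noninferiority_check(p_new, n_new, p_ref, n_ref, margin, z=1.96):
    """Compare two proportions using a Wald confidence interval.

    Returns the difference (new - ref), the interval bounds, and two flags:
    noninferior if the lower bound lies above -margin, superior if it lies
    above zero. With z=1.96 the interval is a two-sided 95% interval.
    """
    diff = p_new - p_ref
    # Standard error of a difference between two independent proportions.
    se = math.sqrt(p_new * (1 - p_new) / n_new + p_ref * (1 - p_ref) / n_ref)
    lower = diff - z * se
    upper = diff + z * se
    return {
        "diff": diff,
        "lower": lower,
        "upper": upper,
        "noninferior": lower > -margin,
        "superior": lower > 0,
    }


# Accuracy values are taken from the abstract. The group sizes (1000 each)
# and the 5-percentage-point margin are HYPOTHETICAL placeholders.
result = noninferiority_check(0.765, 1000, 0.711, 1000, margin=0.05)
print(result)
```

Under this framing, the same observed difference can pass or fail the noninferiority check depending on the number of observations per arm, which is why the sample size is fixed in advance from preliminary and interim data.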
