Human-AI teaming to improve accuracy and efficiency of eligibility criteria prescreening for oncology trials: a randomized evaluation trial using retrospective electronic health records.
- 2026-02-03
- Nature communications 17(1)
- Ravi B Parikh
- Likhitha Kolla
- Elizabeth A Beothy
- William J Ferrell
- Brenda Laventure
- Matthew Guido
- Anthony Girard
- Yang Li
- Khaled Essam Mahmoud Dosoky
- Karim Tarabishy
- Parth S Patel
- Ayana Andalcio
- Kristin Maloney
- Jose Ulises Mena
- Wael Salloum
- Jinbo Chen
- Ezekiel J Emanuel
- PubMed: 41634037
- DOI: 10.1038/s41467-026-68873-8
Study Design
- Type
- Clinical Trial
- Sample size
- n = 355
- Population
- 355 patients with non-small cell lung or colorectal cancer
- Methods
- randomized noninferiority trial using retrospectively collected clinical charts comparing prescreening by trained research staff alone vs. augmented with a pre-trained language model
- Large Human Trial
- Rigorous Journal
Few adult patients with cancer enroll in oncology clinical trials. A rate-limiting step to trial enrollment is prescreening, in which clinical research staff manually abstract unstructured health records to identify patients who meet eligibility criteria. Prescreening is time-consuming, labor-intensive, and prone to human error, resulting in under-identification of eligible patients. Neurosymbolic AI language models may approximate or improve the accuracy of prescreening through automated abstraction of enrollment criteria from longitudinal unstructured patient charts. We conduct a randomized noninferiority trial using retrospectively collected clinical charts to compare the accuracy and efficiency of prescreening by trained research staff alone (Human-alone) vs. augmented with a pre-trained language model (Human+AI), among a cohort of 355 patients with non-small cell lung or colorectal cancer. Sample size is determined from analyses of a preliminary dataset as well as a prespecified interim dataset of 74 charts. For the primary endpoint, chart-level accuracy, Human+AI prescreening is both noninferior and superior to Human-alone (76.5% vs. 71.1%). However, efficiency is unchanged: average time per chart review, the secondary endpoint, is similar (37.4 vs. 37.8 min). AI-assisted abstraction most improves accuracy for biomarker, staging, and response criteria. Performance is limited in some domains due to automation bias. Although improvements are modest, this large randomized trial evaluating a human-AI framework for oncology prescreening shows that AI language models can approximate and augment human-driven prescreening to enhance identification of trial-eligible patients, potentially increasing enrollment. The trial is registered on ClinicalTrials.gov (NCT06561217).
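To illustrate how a result can be both noninferior and superior, the following is a minimal sketch of a noninferiority check on a difference in proportions, using a Wald confidence interval. It is not the trial's statistical analysis. The abstract states neither the noninferiority margin nor the number of charts per arm, so `HYPOTHETICAL_MARGIN` and `HYPOTHETICAL_N_PER_ARM` are placeholder assumptions. Treating chart-level accuracy as a simple binomial proportion is also an assumption. Only the two accuracy values (76.5% and 71.1%) come from the abstract.

```python
from math import sqrt
from statistics import NormalDist


def noninferiority_check(p_new, n_new, p_ref, n_ref, margin, alpha=0.05):
    """Compare two proportions with a two-sided Wald confidence interval.

    Returns (difference, ci_lower, ci_upper, noninferior, superior), where
    noninferior means ci_lower > -margin and superior means ci_lower > 0.
    """
    diff = p_new - p_ref
    # Standard error of the difference of two independent proportions.
    se = sqrt(p_new * (1 - p_new) / n_new + p_ref * (1 - p_ref) / n_ref)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    ci_lower = diff - z * se
    ci_upper = diff + z * se
    noninferior = ci_lower > -margin
    superior = ci_lower > 0
    return diff, ci_lower, ci_upper, noninferior, superior


if __name__ == "__main__":
    # Assumptions for illustration only; neither value is given in the abstract.
    HYPOTHETICAL_N_PER_ARM = 200
    HYPOTHETICAL_MARGIN = 0.10

    diff, lo, hi, noninf, sup = noninferiority_check(
        p_new=0.765,  # Human+AI chart-level accuracy (from the abstract)
        n_new=HYPOTHETICAL_N_PER_ARM,
        p_ref=0.711,  # Human-alone chart-level accuracy (from the abstract)
        n_ref=HYPOTHETICAL_N_PER_ARM,
        margin=HYPOTHETICAL_MARGIN,
    )
    print(f"difference={diff:.3f}, CI=({lo:.3f}, {hi:.3f})")
    print(f"noninferior={noninf}, superior={sup}")
```

Under this framing, noninferiority requires only that the lower confidence bound stay above the negative margin, while superiority requires it to stay above zero, so a superior result is also noninferior. The conclusion depends on the assumed sample size and margin, so the output of this sketch should not be read as reproducing the trial's findings.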