Use of Machine Learning to Identify Follow-Up Recommendations in Radiology Reports.

Carrodeguas E(1), Lacson R(2), Swanson W(3), Khorasani R(2).

Author information

1: Harvard Medical School, Boston, Massachusetts; Center for Evidence-Based Imaging, Department of Radiology, Brigham and Women's Hospital, Brookline, Massachusetts. Electronic address: emmanuel_carrodeguas@hms.harvard.edu.
2: Harvard Medical School, Boston, Massachusetts; Center for Evidence-Based Imaging, Department of Radiology, Brigham and Women's Hospital, Brookline, Massachusetts.
3: Center for Evidence-Based Imaging, Department of Radiology, Brigham and Women's Hospital, Brookline, Massachusetts.

Abstract

PURPOSE:

The aims of this study were to assess how often follow-up recommendations appear in radiology reports, to develop and evaluate traditional machine learning (TML) and deep learning (DL) models for identifying follow-up recommendations, and to benchmark these models against a natural language processing (NLP) system.

METHODS:

This HIPAA-compliant, institutional review board-approved study was performed at an academic medical center generating >500,000 radiology reports annually. One thousand randomly selected ultrasound, radiography, CT, and MRI reports generated in 2016 were manually reviewed and annotated for follow-up recommendations. TML (support vector machines, random forest, logistic regression) and DL (recurrent neural nets) algorithms were constructed and trained on 850 reports (training data), with subsequent optimization of model architectures and parameters. Precision, recall, and F1 score were calculated on the remaining 150 reports (test data). A previously developed and validated NLP system (iSCOUT) was also applied to the test data, with equivalent metrics calculated.
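The evaluation described above can be illustrated with a minimal sketch, which is not the study's code. It assumes binary labels (1 = follow-up recommendation present, 0 = absent) and uses hypothetical helper names (`split_reports`, `precision_recall_f1`), a hypothetical random seed, and made-up toy labels. It shows an 850/150 split of 1,000 report identifiers and how precision, recall, and F1 score are derived from true positives, false positives, and false negatives.

```python
import random


def split_reports(report_ids, n_train=850, seed=0):
    """Shuffle report identifiers and split them into training and test sets.

    The seed and the shuffling strategy are assumptions for illustration;
    the abstract does not describe how the split was made.
    """
    ids = list(report_ids)
    random.Random(seed).shuffle(ids)
    return ids[:n_train], ids[n_train:]


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# 1,000 report identifiers split into 850 training and 150 test reports.
train_ids, test_ids = split_reports(range(1000))

# Toy labels (hypothetical, not study data): 1 = follow-up recommended.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 0, 0, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Because follow-up recommendations are a minority class, F1 score on the positive class is more informative here than raw accuracy, which a model could inflate by predicting "no follow-up" for every report.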

RESULTS:

Follow-up recommendations were present in 12.7% of reports. The TML algorithms achieved F1 scores of 0.75 (random forest), 0.83 (logistic regression), and 0.85 (support vector machine) on the test data. The DL recurrent neural networks achieved an F1 score of 0.71, as did iSCOUT. As measured by F1 score, the performance of both TML and DL methods appeared to plateau once the training set reached 500 to 700 reports.

CONCLUSIONS:

TML and DL are feasible methods for identifying follow-up recommendations in radiology reports. These methods could enable near real-time monitoring of such recommendations.