An automated detection method can help clinicians identify overlooked actionable findings in radiology reports and avoid costly delays to patient care. Therefore, one University of Tokyo research team recently assessed four methods for flagging “actionable” results from free-text in reports from their radiology department.
Their analysis compared the performance of bidirectional encoder representations from transformers (BERT) to logistic regression (LR), gradient boosting decision tree (GBDT), and bidirectional long short-term memory (LSTM). These analyses included evaluating whether adding order information can improve performance. The study’s findings were reported in BMC Medical Informatics and Decision Making on 11 September.
Algorithm performance was determined on 90,923 reports, of which 788 (0.87%) contained “actionable” findings. Area under the precision-recall curve (AUPRC) was used as a performance proxy, which plots the algorithm’s ability to accurately identify all “actionable” examples without any false positives. Higher AUPRCs indicate better precision and specifically refers to how the algorithms accurately handled the reports with actionable findings (0.87% of the total).
BERT is a natural language processing (NLP) architecture that accounts for each word’s context, using bidirectional training. This differs from previous NLP approaches that train by reading text from left-to-right or from combining left-to-right and right-to-left reading. BERT outperformed other machine learning models in identifying “actionable” findings (AUPRC: 0.51 vs. 0.46 - 0.48). Adding order information slightly increased performance, but not greatly (AUPRC: 0.52).
The team speculates that BERT’s high effectiveness may result from actionable abnormalities being often emphasised in reports and because the system identifies explicitly highlighted recommendations for follow-up care.
The researchers state: “The results showed that our method based on BERT is more useful for distinguishing various actionable radiology reports from non-actionable ones than models based on other deep learning methods or statistical machine learning.” These results are promising, but larger data pools from multiple organisations are needed to refine the methodology further.