Oxford researchers have released new reporting guidelines, jointly published in Nature Medicine and the BMJ, for evaluating artificial intelligence (AI) systems used in patient care.
AI-supported clinical decision-support systems are increasingly progressing from development to implementation. Before large-scale trials, however, an AI system's actual clinical performance should be assessed at a small scale to establish its safety and to evaluate the human factors surrounding its use.
Currently, there is a lack of high-quality clinical study evidence showing that these systems improve clinician performance or patient outcomes, evidence that is important to support their use. Underlying reasons for this gap may include insufficient expertise in translating a tool into practice, insufficient funding for translation, underappreciation of clinical research as a translation mechanism, and disregard for the value of early-stage clinical evaluation and the analysis of human factors.
To address this, Oxford researchers convened an international, multistakeholder group of experts to develop a new reporting guideline, DECIDE-AI, which provides guidance on reporting human factors and early-stage clinical evaluation. Using the Delphi methodology, experts were identified and recruited through literature searches or recommendations across 20 stakeholder groups; 151 experts from 18 countries participated in two rounds of Delphi exercises. DECIDE-AI aims to improve reporting on small-scale proof of clinical utility, safety, human factors evaluation, and preparation for larger-scale summative trials.
The following recommendations are made:
- Users should be considered as study participants.
- Comparator groups should be considered optional, because the emphasis at this stage is on issues other than clinical efficacy, and small studies are likely to be underpowered to make such determinations.
- Human factors should be evaluated.
- Using references, online supplementary materials, and open-access repositories is recommended to facilitate sharing and connecting all required information within one main published evaluation report.
The following issues were considered but ultimately excluded from the recommendations:
- Output interpretability, although important, was not included because an AI system's clinical value may be independent of its interpretability.
- User trust in the AI system was not included because there is no commonly accepted way to measure trust in a clinical AI.
- Although iterative design evaluation offers advantages, modifying the AI system (the intervention) during the evaluation drew mixed opinions, because doing so may interfere with the interpretation of results.