HTA Considerations for Large Language Models in Healthcare
Leonard C, Unsworth H, Warttig S, Gildea L, Mordin M, Ling C.
Caoimhe Leonard, MSc
Research Associate, Value and Access
RTI Health Solutions
Regulators and health technology assessment (HTA) bodies have struggled to keep pace with the rapidly evolving technology of large language models. These generative AI models are trained on extremely large datasets to perform natural language processing tasks. The latest generation of these models can perform healthcare-related tasks, including text analysis and summarisation, diagnostic assistance, answering medical queries, and image captioning. They have the potential to improve data handling, process automation, service quality, and care personalisation, and to shorten time to diagnosis.
NICE first published the Evidence Standards Framework for the evaluation of digital health technologies in 2019 and updated it in 2022 to cover newer types of artificial intelligence. The Framework provides a set of standards intended to inform purchasing decisions in the NHS and to guide developers in generating evidence for their technologies. It was designed for digital health technologies that use AI and are data driven or use fixed or adaptive machine learning algorithms, such as AI image analysis, AI decision support, and health-related chatbots. However, large language models with healthcare applications are not covered by the Framework, as they were not available when it was last updated in 2022.
In our research, we consider the complexity of evaluating large language models and the associated evidence requirements, and we suggest updates to the Evidence Standards Framework to help HTA bodies and developers of digital health technologies meet the standards needed to successfully approach HTA.