Developing an ICD-10 Coding Assistant: Pilot Study Using RoBERTa and GPT-4 for Term Extraction and Description-Based Code Selection

Puts, Sander; Zegers, Catharina M. L.; Dekker, Andre; BERMEJO DELGADO, Inigo

Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/45888

Title:	Developing an ICD-10 Coding Assistant: Pilot Study Using RoBERTa and GPT-4 for Term Extraction and Description-Based Code Selection
Authors:	Puts, Sander; Zegers, Catharina M. L.; Dekker, Andre; BERMEJO DELGADO, Inigo
Issue Date:	2025
Publisher:	JMIR PUBLICATIONS, INC
Source:	JMIR Formative Research, 9 (Art N° e60095)
Abstract:	Background: The International Classification of Diseases (ICD), developed by the World Health Organization, standardizes health condition coding to support health care policy, research, and billing, but artificial intelligence automation, while promising, still underperforms compared with human accuracy and lacks the explainability needed for adoption in medical settings. Objective: The potential of large language models for assisting medical coders in the ICD-10 coding was explored through the development of a computer-assisted coding system. This study aimed to augment human coding by initially identifying lead terms and using retrieval-augmented generation (RAG)-based methods for computer-assisted coding enhancement. Methods: The explainability dataset from the CodiEsp challenge (CodiEsp-X) was used, featuring 1000 Spanish clinical cases annotated with ICD-10 codes. A new dataset, CodiEsp-X-lead, was generated using GPT-4 to replace full-textual evidence annotations with lead term annotations. A Robustly Optimized BERT (Bidirectional Encoder Representations from Transformers) Pretraining Approach transformer model was fine-tuned for named entity recognition to extract lead terms. GPT-4 was subsequently employed to generate code descriptions from the extracted textual evidence. Using a RAG approach, ICD codes were assigned to the lead terms by querying a vector database of ICD code descriptions with OpenAI's text-embedding-ada-002 model. Results: The fine-tuned Robustly Optimized BERT Pretraining Approach achieved an overall F1-score of 0.80 for ICD lead term extraction on the new CodiEsp-X-lead dataset. GPT-4-generated code descriptions reduced retrieval failures in the RAG approach by approximately 5% for both diagnoses and procedures. However, the overall explainability F1-score for the CodiEsp-X task was limited to 0.305, significantly lower than the state-of-the-art F1-score of 0.633. The diminished performance was partly due to the reliance on code descriptions, as some ICD codes lacked descriptions, and the approach did not fully align with the medical coder's workflow. Conclusions: While lead term extraction showed promising results, the subsequent RAG-based code assignment using GPT-4 and code descriptions was less effective. Future research should focus on refining the approach to more closely mimic the medical coder's workflow, potentially integrating the alphabetic index and official coding guidelines, rather than relying solely on code descriptions. This alignment may enhance system accuracy and better support medical coders in practice.
Notes:	Puts, S (corresponding author), Maastricht Univ, GROW Res Inst Oncol & Reprod, Med Ctr, Dept Radiat Oncol Maastro, POB 616, NL-6200 MD Maastricht, Netherlands. putssander@gmail.com
Keywords:	International Classification of Diseases;ICD-10;computer-assisted-coding;GPT-4;coding;term extraction;code analysis;computer assisted coding;transformer model;artificial intelligence;AI automation;retrieval-augmented generation;RAG;large language model;LLM;Bidirectional Encoder Representations from
Document URI:	http://hdl.handle.net/1942/45888
e-ISSN:	2561-326X
DOI:	10.2196/60095
ISI #:	001454741600019
Rights:	Sander Puts, Catharina M L Zegers, Andre Dekker, Iñigo Bermejo. Originally published in JMIR Formative Research (https://formative.jmir.org), 11.02.2025. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Formative Research, is properly cited. The complete bibliographic information, a link to the original publication on https://formative.jmir.org, as well as this copyright and license information must be included.
Category:	A1
Type:	Journal Contribution
Appears in Collections:	Research publications
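
Illustration (not part of the published record): the description-based retrieval step mentioned in the abstract, where a generated code description is matched against a store of ICD code descriptions, might look roughly like the sketch below. Everything here is an assumption made for illustration: the three-entry `CODE_DESCRIPTIONS` table, the helper names `embed`, `cosine`, and `retrieve`, and above all the toy bag-of-words `embed` function, which stands in for the text-embedding-ada-002 model and vector database used in the study. It is not the authors' implementation.

```python
import math
import re
from collections import Counter

# Illustrative subset of ICD-10-CM code descriptions (hypothetical stand-in
# for the study's vector database of code descriptions).
CODE_DESCRIPTIONS = {
    "E11.9": "Type 2 diabetes mellitus without complications",
    "I10": "Essential (primary) hypertension",
    "J18.9": "Pneumonia, unspecified organism",
}


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, k=1):
    """Return the k codes whose descriptions are most similar to the query."""
    q = embed(query)
    scored = [
        (cosine(q, embed(description)), code)
        for code, description in CODE_DESCRIPTIONS.items()
    ]
    scored.sort(reverse=True)
    return [code for _, code in scored[:k]]


if __name__ == "__main__":
    # A description generated for an extracted lead term would be the query.
    print(retrieve("type 2 diabetes mellitus"))
```

In a setup closer to the one the abstract describes, `embed` would call a real embedding model and the linear scan in `retrieve` would be replaced by a vector-database query. A code with no stored description cannot be retrieved this way, which matches the limitation the abstract reports.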

Files in This Item:

File	Description	Size	Format
xx.pdf	Published version	782.16 kB	Adobe PDF

SCOPUS citations: 3 (checked on Feb 24, 2026)
WEB OF SCIENCE citations: 2 (checked on Feb 26, 2026)
