Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/45888
Full metadata record
DC Field | Value | Language
dc.contributor.author | Puts, Sander | -
dc.contributor.author | Zegers, Catharina M. L. | -
dc.contributor.author | Dekker, Andre | -
dc.contributor.author | BERMEJO DELGADO, Inigo | -
dc.date.accessioned | 2025-04-22T10:05:03Z | -
dc.date.available | 2025-04-22T10:05:03Z | -
dc.date.issued | 2025 | -
dc.date.submitted | 2025-04-18T09:57:38Z | -
dc.identifier.citation | JMIR Formative Research, 9 (Art N° e60095) | -
dc.identifier.uri | http://hdl.handle.net/1942/45888 | -
dc.description.abstract | Background: The International Classification of Diseases (ICD), developed by the World Health Organization, standardizes health condition coding to support health care policy, research, and billing, but artificial intelligence automation, while promising, still underperforms compared with human accuracy and lacks the explainability needed for adoption in medical settings. Objective: The potential of large language models for assisting medical coders in the ICD-10 coding was explored through the development of a computer-assisted coding system. This study aimed to augment human coding by initially identifying lead terms and using retrieval-augmented generation (RAG)-based methods for computer-assisted coding enhancement. Methods: The explainability dataset from the CodiEsp challenge (CodiEsp-X) was used, featuring 1000 Spanish clinical cases annotated with ICD-10 codes. A new dataset, CodiEsp-X-lead, was generated using GPT-4 to replace full-textual evidence annotations with lead term annotations. A Robustly Optimized BERT (Bidirectional Encoder Representations from Transformers) Pretraining Approach transformer model was fine-tuned for named entity recognition to extract lead terms. GPT-4 was subsequently employed to generate code descriptions from the extracted textual evidence. Using a RAG approach, ICD codes were assigned to the lead terms by querying a vector database of ICD code descriptions with OpenAI's text-embedding-ada-002 model. Results: The fine-tuned Robustly Optimized BERT Pretraining Approach achieved an overall F1-score of 0.80 for ICD lead term extraction on the new CodiEsp-X-lead dataset. GPT-4-generated code descriptions reduced retrieval failures in the RAG approach by approximately 5% for both diagnoses and procedures. However, the overall explainability F1-score for the CodiEsp-X task was limited to 0.305, significantly lower than the state-of-the-art F1-score of 0.633. The diminished performance was partly due to the reliance on code descriptions, as some ICD codes lacked descriptions, and the approach did not fully align with the medical coder's workflow. Conclusions: While lead term extraction showed promising results, the subsequent RAG-based code assignment using GPT-4 and code descriptions was less effective. Future research should focus on refining the approach to more closely mimic the medical coder's workflow, potentially integrating the alphabetic index and official coding guidelines, rather than relying solely on code descriptions. This alignment may enhance system accuracy and better support medical coders in practice. | -
dc.description.sponsorship | The authors acknowledge the use of ChatGPT (GPT-4 and GPT-4o, OpenAI, 2023/2024) to assist in improving the language and readability of this manuscript. ChatGPT was specifically employed to rephrase sentences for clarity and enhance the overall articulation of the text. The authors carefully reviewed and refined the rephrased content to ensure that the final text accurately reflected their intended meaning and original ideas. | -
dc.language.iso | en | -
dc.publisher | JMIR PUBLICATIONS, INC | -
dc.rights | Sander Puts, Catharina M L Zegers, Andre Dekker, Iñigo Bermejo. Originally published in JMIR Formative Research (https://formative.jmir.org), 11.02.2025. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Formative Research, is properly cited. The complete bibliographic information, a link to the original publication on https://formative.jmir.org, as well as this copyright and license information must be included. | -
dc.subject.other | International Classification of Diseases | -
dc.subject.other | ICD-10 | -
dc.subject.other | computer-assisted-coding | -
dc.subject.other | GPT-4 | -
dc.subject.other | coding | -
dc.subject.other | term extraction | -
dc.subject.other | code analysis | -
dc.subject.other | computer assisted coding | -
dc.subject.other | transformer model | -
dc.subject.other | artificial intelligence | -
dc.subject.other | AI automation | -
dc.subject.other | retrieval-augmented generation | -
dc.subject.other | RAG | -
dc.subject.other | large language model | -
dc.subject.other | LLM | -
dc.subject.other | Bidirectional Encoder Representations from | -
dc.title | Developing an ICD-10 Coding Assistant: Pilot Study Using RoBERTa and GPT-4 for Term Extraction and Description-Based Code Selection | -
dc.type | Journal Contribution | -
dc.identifier.volume | 9 | -
local.format.pages | 12 | -
local.bibliographicCitation.jcat | A1 | -
dc.description.notes | Puts, S (corresponding author), Maastricht Univ, GROW Res Inst Oncol & Reprod, Med Ctr, Dept Radiat Oncol Maastro, POB 616, NL-6200 MD Maastricht, Netherlands. | -
dc.description.notes | putssander@gmail.com | -
local.publisher.place | 130 QUEENS QUAY East, Unit 1100, TORONTO, ON M5A 0P6, CANADA | -
local.type.refereed | Refereed | -
local.type.specified | Article | -
local.bibliographicCitation.artnr | e60095 | -
dc.identifier.doi | 10.2196/60095 | -
dc.identifier.pmid | 39935026 | -
dc.identifier.isi | 001454741600019 | -
dc.contributor.orcid | Dekker, Andre/0000-0002-0422-7996 | -
local.provider.type | wosris | -
local.description.affiliation | [Puts, Sander; Zegers, Catharina M. L.; Dekker, Andre; Bermejo, Inigo] Maastricht Univ, GROW Res Inst Oncol & Reprod, Med Ctr, Dept Radiat Oncol Maastro, POB 616, NL-6200 MD Maastricht, Netherlands. | -
local.description.affiliation | [Bermejo, Inigo] Hasselt Univ, Data Sci Inst DSI, Hasselt, Belgium. | -
local.uhasselt.international | yes | -
item.fulltext | With Fulltext | -
item.contributor | Puts, Sander | -
item.contributor | Zegers, Catharina M. L. | -
item.contributor | Dekker, Andre | -
item.contributor | BERMEJO DELGADO, Inigo | -
item.fullcitation | Puts, Sander; Zegers, Catharina M. L.; Dekker, Andre & BERMEJO DELGADO, Inigo (2025) Developing an ICD-10 Coding Assistant: Pilot Study Using RoBERTa and GPT-4 for Term Extraction and Description-Based Code Selection. In: JMIR Formative Research, 9 (Art N° e60095). | -
item.accessRights | Open Access | -
crisitem.journal.eissn | 2561-326X | -
Appears in Collections: Research publications
Files in This Item:
File | Description | Size | Format
xx.pdf | Published version | 782.16 kB | Adobe PDF