Please use this identifier to cite or link to this item:
http://hdl.handle.net/1942/48356

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | CARRILERO MARDONES, Mikel | - |
| dc.contributor.author | Perez-Martin, Jorge | - |
| dc.contributor.author | Diez, Francisco Javier | - |
| dc.contributor.author | BERMEJO DELGADO, Inigo | - |
| dc.date.accessioned | 2026-02-02T13:47:27Z | - |
| dc.date.available | 2026-02-02T13:47:27Z | - |
| dc.date.issued | 2026 | - |
| dc.date.submitted | 2026-02-02T12:03:29Z | - |
| dc.identifier.citation | Frontiers in digital health, 7 (Art N° 1718330) | - |
| dc.identifier.issn | - | |
| dc.identifier.uri | http://hdl.handle.net/1942/48356 | - |
| dc.description.abstract | Background and objective: Structured clinical data is essential for research and informed decision-making, yet medical reports are frequently stored as unstructured free text. This study compared the performance of BERT-based and generative language models in converting unstructured breast imaging reports into structured, tabular data suitable for clinical and research applications. Methods: A dataset of 286 anonymised breast imaging reports in Spanish was translated into English and used to evaluate five transformer-based models pre-trained in medical data: BlueBERT, BioBERT, BioMedBERT, BioGPT and ClinicalT5. Two natural language processing approaches were explored: classification of 19 categorical variables (e.g. diagnostic technique, report type, family history, BI-RADS category, tumour shape and margin) and extractive question answering of four entities (patient age, patient history, parenchymal distortion or asymmetries, and tumour size). Multiple fine-tuning strategies and input configurations were tested for each model, and performance was evaluated using accuracy and macro F1 scores. Results: BioGPT demonstrated the best performance in classification tasks, achieving an overall accuracy of 96.10% and a macro F1 score of 90.30%. This was significantly better than BERT-based models (p=0.012 for accuracy and p=0.017 for F1), particularly in underrepresented categories such as tumour descriptors. In extractive question answering tasks, BioGPT achieved an average accuracy of 93.24%, which is slightly lower than that of BioMedBERT and ClinicalT5, but not significantly so. Notably, BioGPT could perform classification and extractive question answering simultaneously, which is a capability unavailable in BERT-like models. Conclusions: Generative models, particularly BioGPT, offer a robust and scalable approach to automating the extraction of structured information from unstructured breast imaging reports. Their superior performance, combined with their ability to handle multiple tasks concurrently, highlights their potential to reduce the manual effort required for clinical data curation and to enable the efficient integration of imaging data into research and clinical workflows. | - |
| dc.description.sponsorship | Funding: The author(s) declared that financial support was received for this work and/or its publication. This work has been supported by grants PID2019-110686RB-I00 and PID2023-150515OB-I00 from the Spanish Government. The corresponding author was also supported by two UNED–Santander predoctoral researcher grants, a one-year contract and a doctoral research stay at Hasselt University (EIDUNED mobility program 2024). Acknowledgments: We would like to thank the HM Montepríncipe and HM Velázquez hospitals in Madrid for their collaboration and support in providing the anonymised medical reports used in this study. We also want to thank other members of our project: A. Delgado and A. Arellano, our medical advisors, and B. Fernández de Toro, A. Goñi, and M. García, for technical support. | - |
| dc.language.iso | en | - |
| dc.publisher | FRONTIERS MEDIA SA | - |
| dc.rights | 2026 Carrilero-Mardones, Pérez-Martín, Díez and Bermejo Delgado. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. | - |
| dc.subject.other | BI-RADS | - |
| dc.subject.other | breast cancer | - |
| dc.subject.other | generative models | - |
| dc.subject.other | BERT models | - |
| dc.subject.other | breast imaging | - |
| dc.subject.other | classification | - |
| dc.subject.other | extractive question answering | - |
| dc.subject.other | structured reporting | - |
| dc.title | Extracting structured data from unstructured breast imaging reports with transformer-based models | - |
| dc.type | Journal Contribution | - |
| dc.identifier.volume | 7 | - |
| local.format.pages | 15 | - |
| local.bibliographicCitation.jcat | A1 | - |
| dc.description.notes | Carrilero-Mardones, M (corresponding author), Univ Nacl Educ Distancia UNED, Dept Artificial Intelligence, Madrid, Spain. | - |
| dc.description.notes | mcarrilero@dia.uned.es | - |
| local.publisher.place | AVENUE DU TRIBUNAL FEDERAL 34, LAUSANNE, CH-1015, SWITZERLAND | - |
| local.type.refereed | Refereed | - |
| local.type.specified | Article | - |
| local.bibliographicCitation.artnr | 1718330 | - |
| dc.identifier.doi | 10.3389/fdgth.2025.1718330 | - |
| dc.identifier.pmid | 41586210 | - |
| dc.identifier.isi | 001667593700001 | - |
| local.provider.type | wosris | - |
| local.description.affiliation | [Carrilero-Mardones, Mikel; Perez-Martin, Jorge; Diez, Francisco Javier] Univ Nacl Educ Distancia UNED, Dept Artificial Intelligence, Madrid, Spain. | - |
| local.description.affiliation | [Bermejo Delgado, Inigo] Hasselt Univ, Data Sci Inst, Hasselt, Belgium. | - |
| local.uhasselt.international | yes | - |
| item.fullcitation | CARRILERO MARDONES, Mikel; Perez-Martin, Jorge; Diez, Francisco Javier & BERMEJO DELGADO, Inigo (2026) Extracting structured data from unstructured breast imaging reports with transformer-based models. In: Frontiers in digital health, 7 (Art N° 1718330). | - |
| item.contributor | CARRILERO MARDONES, Mikel | - |
| item.contributor | Perez-Martin, Jorge | - |
| item.contributor | Diez, Francisco Javier | - |
| item.contributor | BERMEJO DELGADO, Inigo | - |
| item.fulltext | With Fulltext | - |
| item.accessRights | Open Access | - |
| crisitem.journal.eissn | 2673-253X | - |
| Appears in Collections: | Research publications | |
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| fdgth-07-1718330.pdf | Published version | 925.69 kB | Adobe PDF | View/Open |