Publication
Mar. 25, 2025
Title
Diagnostic Accuracy and Clinical Value of a Domain-specific Multimodal Generative AI Model for Chest Radiograph Report Generation
Author
Eun Kyoung Hong1, Jiyeon Ham2, Byungseok Roh2, Jawook Gu3, Beomhee Park2, Sunghun Kang2, Kihyun You3, Juhwan Eom2, Byeonguk Bae2, Jae-Bock Jo3, Ok Kyu Song2, Woong Bae3, Ro Woon Lee4, Chong Hyun Suh5, Chan Ho Park6, Seong Jun Choi6, Jai Soung Park6, Jae-Hyeong Park7, Hyun Jeong Jeon8, Jeong-Ho Hong9, Dosang Cho10, Han Seok Choi11, Tae Hee Kim12
1Department of Radiology, Brigham & Women’s Hospital, 75 Francis St, Boston, MA 02115
2Kakao, Seoul, South Korea
3Soombit.ai, Seoul, South Korea
4Inha University, Incheon, South Korea
5Asan Medical Center, Seoul, South Korea
6College of Medicine, Soonchunhyang University, Cheonan, South Korea
7College of Medicine, Chungnam National University, Daejeon, South Korea
8College of Medicine, Chungbuk National University, Cheongju, South Korea
9School of Medicine, Keimyung University, Daegu, South Korea
10College of Medicine, Ewha Womans University, Seoul, South Korea
11College of Medicine, Dongguk University, Goyang, South Korea
12School of Medicine, Ajou University, Suwon, South Korea
Summary
Use of a multimodal generative artificial intelligence model increased the efficiency and quality of chest radiograph interpretations by reducing reading times and increasing report accuracy and agreement.
Background
Multimodal generative artificial intelligence (AI) technologies can produce preliminary radiology reports, and validation with reader studies is crucial for understanding the clinical value of these technologies.
Purpose
To assess the clinical value of the use of a domain-specific multimodal generative AI tool for chest radiograph interpretation by means of a reader study.
Materials and Methods
A retrospective, sequential, multireader, multicase reader study was conducted using 758 chest radiographs from a publicly available dataset collected from 2009 to 2017. Five radiologists interpreted the chest radiographs in two sessions: without AI-generated reports and with AI-generated reports provided as preliminary reports. Reading times, report agreement (RADPEER score), and report quality (five-point scale) were evaluated by two experienced thoracic radiologists and compared between the two sessions, which were conducted from October to December 2023. Reading times, report agreement, and quality scores were analyzed using a generalized linear mixed model. Additionally, a subset of 258 chest radiographs was used to assess the factual correctness of the reports, and sensitivities and specificities were compared between the reports from the first and second sessions with use of the McNemar test.
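As a minimal sketch of the paired comparison described above, the McNemar test contrasts two readings of the same cases by counting discordant pairs (cases detected in one session but not the other). The 2 × 2 table and counts below are hypothetical, not the study's data; the analysis uses the statsmodels implementation of the test.

```python
# Illustrative McNemar test for paired sensitivity comparison
# (hypothetical counts, not the study's data).
from statsmodels.stats.contingency_tables import mcnemar

# Rows: detected without AI (yes/no); columns: detected with AI (yes/no).
# Off-diagonal cells are the discordant pairs that drive the test:
#   5 cases detected only without AI, 15 detected only with AI.
table = [
    [50, 5],
    [15, 30],
]

# exact=True uses the binomial distribution on the discordant pairs,
# appropriate when discordant counts are small.
result = mcnemar(table, exact=True)
print(f"McNemar P value: {result.pvalue:.4f}")

# Per-session sensitivity on these hypothetical counts
# (true-positive cases out of all 100 abnormal cases).
sens_without_ai = (50 + 5) / 100
sens_with_ai = (50 + 15) / 100
print(f"Sensitivity without AI: {sens_without_ai:.1%}")
print(f"Sensitivity with AI: {sens_with_ai:.1%}")
```

A P value below .05 here would indicate that the change in sensitivity between sessions is unlikely to be due to chance, which is the form of comparison the subset analysis reports.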
Results
The introduction of AI-generated reports significantly reduced mean reading times, from 34.2 seconds ± 20.4 to 19.8 seconds ± 12.5 (P < .001). Median report agreement scores were 5.0 (IQR, 4.0–5.0) without AI reports and 5.0 (IQR, 4.5–5.0) with AI reports (P < .001); median report quality scores were 4.5 (IQR, 4.0–5.0) and 4.5 (IQR, 4.5–5.0), respectively (P < .001). In the subset analysis of factual correctness, sensitivity for detecting several abnormalities increased significantly, including widened mediastinal silhouettes (84.3% to 90.8%; P < .001) and pleural lesions (77.7% to 87.4%; P < .001). While overall diagnostic performance improved, variability among individual radiologists was noted.
Conclusion
A domain-specific multimodal generative AI model demonstrated potential for high diagnostic accuracy and clinical value in providing preliminary interpretations of chest radiographs for radiologists.