A Chinese university team has evaluated the performance of multimodal large language models (MLLMs) – specifically Gemini- and GPT-series models – across various medical tasks.
The team, led by Dr Xin Zhang from Northwestern Polytechnical University in Xi’an, China, said that advances in MLLMs have increasingly demonstrated their potential in medical data mining. However, the diversity and heterogeneous nature of medical images and radiology reports can pose significant challenges to the universality of data mining methods.
“Our study encompasses 14 diverse medical datasets, spanning dermatology, radiology, dentistry, ophthalmology, and endoscopy image categories, as well as radiology report datasets,” said Dr Zhang. “The tasks evaluated include disease classification, lesion segmentation, anatomical localisation, disease diagnosis, and report generation.”
The results reveal that the Gemini series excelled in report generation and lesion detection, while the GPT series demonstrated strengths in lesion segmentation and anatomical localisation.
“The study highlights the promise of these multimodal models in alleviating the burden on clinicians and fostering the integration of AI into clinical practice, potentially mitigating healthcare resource constraints,” Dr Zhang said. “Nonetheless, further optimisation and rigorous validation are required before clinical deployment.”
The team published its findings in Meta-Radiology.¹
Reference
- Zhang Y, Pan Y, Zhong T, et al. Potential of multimodal large language models for data mining of medical images and free-text reports. Meta-Radiology. 2024;2(4):100103. doi: 10.1016/j.metrad.2024.100103.