Assessing the role of large language models in adolescent idiopathic scoliosis care: a comparison between ChatGPT and Google Gemini

Objective: To evaluate the accuracy, applicability, comprehensiveness, and communication quality of responses generated by ChatGPT and Google Gemini in adolescent idiopathic scoliosis (AIS)-related scenarios, with the aim of assessing their potential utility as tools in patient management.

Methods: Six case-based questions reflecting common patient concerns related to adolescent idiopathic scoliosis were developed by orthopedic specialists. Responses generated by ChatGPT and Google Gemini were independently evaluated by 61 orthopedic surgeons using a standardized rubric assessing accuracy, applicability, comprehensiveness, and communication clarity, each rated on a 1-5 Likert scale. Comparative analyses between platforms were performed using the Mann-Whitney U and Wilcoxon signed-rank tests. Additionally, open-ended feedback was collected to explore participants’ perspectives on the potential and limitations of AI-based consultations.
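As an illustration of the kind of between-group comparison named above, the following is a minimal sketch of a two-sided Mann-Whitney U test on 1-5 Likert ratings, written with the Python standard library only. It assumes the normal approximation with tie correction and no continuity correction, which may differ from the software settings the authors used. The function names and the example ratings are hypothetical and are not study data.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for x, approximate p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Tie correction: sum of (t^3 - t) over groups of tied values.
    tie_term = sum(t**3 - t for t in Counter(combined).values()) / (n * (n - 1))
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term))
    if sigma == 0:  # every rating identical, so there is no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / sigma
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u1, p_value


# Hypothetical accuracy ratings from two independent rater groups.
specialist_ratings = [4, 5, 4, 3, 5, 4]
resident_ratings = [3, 3, 4, 2, 3, 4]
u_stat, p = mann_whitney_u(specialist_ratings, resident_ratings)
print(f"U = {u_stat}, p = {p:.3f}")
```

The Mann-Whitney U test suits independent groups of raters, whereas ratings given by the same rater to both platforms are paired and would call for the Wilcoxon signed-rank test instead. With samples this small an exact test would usually be preferred over the normal approximation.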

Results: ChatGPT outperformed Google Gemini in accuracy (P = .013) in postoperative care scenarios. The differences between the two platforms in applicability (P = .119), comprehensiveness (P = .619), and communication (P = .240) were not statistically significant. Orthopedic specialists rated both AI models significantly higher than residents did in accuracy, applicability, and comprehensiveness. Most evaluators acknowledged the potential of AI to reduce physician workload and support patient guidance; however, concerns were raised regarding reliability, ethical implications, and the current limitations of AI in ensuring patient safety.

Conclusion: ChatGPT and Google Gemini demonstrated moderate accuracy and communication quality in adolescent idiopathic scoliosis–related scenarios, with ChatGPT showing a modest advantage. Although both models show promising results as supportive tools for patient education and preliminary consultations, their current limitations in accuracy and comprehensiveness restrict their clinical reliability. Multidisciplinary collaboration is crucial to ensure effective applications of AI in orthopedic practice.

Level of Evidence: Level III, Diagnostic Study.

Cite this article as: Ya! S, Yapar D, Yapar A, et al. Assessing the role of large language models in adolescent idiopathic scoliosis care: a comparison between ChatGPT and Google Gemini. Acta Orthop Traumatol Turc., 2025;59(4):222-229.