The article investigates racial bias in psychiatric diagnosis and treatment recommendations across four large language models (LLMs): Claude, ChatGPT, Gemini, and NewMes-15. The study evaluates the models' responses to ten psychiatric cases spanning five diagnoses (depression, anxiety, schizophrenia, eating disorders, and ADHD) under three conditions: race-neutral, race-implied, and race explicitly stated (African American).
Key findings include:
1) Bias in Treatment Recommendations: The LLMs often proposed inferior or divergent treatments when race was explicitly stated or implied, particularly in the schizophrenia and anxiety cases; diagnostic decisions, by contrast, showed minimal bias.
2) Model Performance: NewMes-15 exhibited the highest degree of racial bias, while Gemini demonstrated the least bias across conditions.
3) Statistical Analysis: A Kruskal–Wallis H-test revealed significant differences in bias among the LLMs, with Gemini significantly less biased than ChatGPT and NewMes-15 (a sketch of how such a test is run follows this list).
4) Challenges in AI Development: The study highlights that LLMs trained on biased datasets may perpetuate racial disparities in psychiatric care, even when specialized medical training data is used. Local LLMs, despite their cost and privacy advantages, proved more susceptible to bias than larger, online models.
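For readers curious about the mechanics of the test mentioned in finding 3, here is a minimal sketch of how such a comparison could be run in Python with SciPy. The per-case bias scores and the scoring scheme below are hypothetical placeholders for illustration only, not the study's data.

```python
# Minimal sketch of a Kruskal-Wallis comparison across models, assuming each
# model receives a per-case bias score (higher = more biased). All numbers
# below are hypothetical placeholders, not the study's data.
from scipy.stats import kruskal

bias_scores = {
    "Claude":    [1, 0, 2, 1, 0, 1, 2, 1, 0, 1],
    "ChatGPT":   [2, 1, 2, 2, 1, 3, 2, 2, 1, 2],
    "Gemini":    [0, 0, 1, 0, 0, 1, 0, 0, 0, 1],
    "NewMes-15": [3, 2, 3, 2, 3, 3, 2, 3, 2, 3],
}

# The Kruskal-Wallis H-test is a nonparametric analogue of one-way ANOVA:
# it tests whether the samples come from the same distribution without
# assuming normality, which suits ordinal bias ratings like these.
h_stat, p_value = kruskal(*bias_scores.values())
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")

# A significant result (e.g., p < 0.05) indicates at least one model differs;
# pairwise post-hoc comparisons would then identify which ones.
```

A nonparametric test is the natural choice in this setting because bias ratings are ordinal and the number of cases per model is small.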
Learn more about this study here: https://doi.org/10.1038/s41746-025-01746-4
Reference
Bouguettaya, A., Stuart, E.M. & Aboujaoude, E. Racial bias in AI-mediated psychiatric diagnosis and treatment: a qualitative comparison of four large language models. npj Digit. Med. 8, 332 (2025).
