Tag: AEQUITAS

  • Assessing Algorithmic Bias in Language-Based Depression Detection: A Comparison of DNN and LLM Approaches

    Assessing Algorithmic Bias in Language-Based Depression Detection: A Comparison of DNN and LLM Approaches

    A study found that large language models (LLMs) outperform traditional deep neural network (DNN) embeddings in automated depression detection and show reduced gender bias, though racial disparities remain. Among the DNN fairness-mitigation techniques tested, a worst-group loss provided the best balance between overall accuracy and demographic fairness, while a fairness-regularized loss underperformed.

    The identified biases affect the fairness and diagnostic reliability of AI systems for mental health assessment, particularly by disadvantaging underrepresented racial and gender groups; in this study, Hispanic participants were the most affected. Such disparities risk perpetuating inequities in automated mental health screening and could undermine trust and validity in clinical or public health applications.
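
    As a rough illustration of the worst-group idea mentioned above (not the study's actual implementation), the sketch below shows a GroupDRO-style objective in PyTorch that backpropagates the loss of the demographic group faring worst in each batch; the model, group labels, and dimensions are assumed for illustration.

    ```python
    # Illustrative sketch only: optimize the worst-performing demographic group.
    import torch
    import torch.nn.functional as F

    def worst_group_loss(logits, labels, groups):
        """Return the highest per-group mean cross-entropy in the batch."""
        per_sample = F.cross_entropy(logits, labels, reduction="none")
        group_losses = [per_sample[groups == g].mean() for g in torch.unique(groups)]
        return torch.stack(group_losses).max()

    # Hypothetical training step (model, optimizer, and batch are assumed):
    # loss = worst_group_loss(model(features), targets, group_ids)
    # loss.backward(); optimizer.step()
    ```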

    Learn more about the study here: https://doi.org/10.48550/arXiv.2509.25795


    Reference

    Junias, O., Kini, P., & Chaspari, T. (2025). Assessing Algorithmic Bias in Language-Based Depression Detection: A Comparison of DNN and LLM Approaches. 2025 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI), 1-7.

  • Developing personalized algorithms for sensing mental health symptoms in daily life

    Developing personalized algorithms for sensing mental health symptoms in daily life

    This study investigates algorithmic bias in AI tools that predict depression risk using smartphone-sensed behavioral data.

    It finds that these tools underperform in larger, more diverse populations because the behavioral patterns used to predict depression are inconsistent across demographic and socioeconomic subgroups.

    Specifically, the AI models often misclassify individuals from certain groups—such as older adults or those from particular racial or gender groups—as being at lower risk than they actually are. The authors emphasize the need for tailored, subgroup-aware approaches to improve reliability and fairness in mental health prediction tools. This work highlights the importance of addressing demographic bias to ensure equitable AI deployment in mental healthcare.

    Learn more about this study here: https://doi.org/10.1038/s44184-025-00147-5


    Reference

    Timmons, A.C., Tutul, A.A., Avramidis, K. et al. Developing personalized algorithms for sensing mental health symptoms in daily life. npj Mental Health Res 4, 34 (2025).

  • Racial bias in AI-mediated psychiatric diagnosis and treatment: a qualitative comparison of four large language models

    Racial bias in AI-mediated psychiatric diagnosis and treatment: a qualitative comparison of four large language models

    The article investigates racial bias in psychiatric diagnosis and treatment recommendations across four large language models (LLMs): Claude, ChatGPT, Gemini, and NewMes-15. The study evaluates the models’ responses to ten psychiatric cases representing five diagnoses (depression, anxiety, schizophrenia, eating disorders, and ADHD) under three conditions: race-neutral, race-implied, and race explicitly stated (African American).

    Key findings include:

    1) Bias in Treatment Recommendations: LLMs often proposed inferior or divergent treatments when racial characteristics were explicitly or implicitly indicated, particularly for schizophrenia and anxiety cases. Diagnostic decisions showed minimal bias overall.

    2) Model Performance: NewMes-15 exhibited the highest degree of racial bias, while Gemini demonstrated the least bias across conditions.

    3) Statistical Analysis: A Kruskal–Wallis H-test revealed significant differences in bias among the LLMs, with Gemini being significantly less biased than ChatGPT and NewMes-15 (a minimal example of this kind of test appears after this list).

    4) Challenges in AI Development: The study highlights that LLMs trained on biased datasets may perpetuate racial disparities in psychiatric care, even when specialized medical training data is used. Local LLMs, despite their cost and privacy advantages, showed higher susceptibility to bias compared to larger, online models.
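
    For readers less familiar with the test mentioned in point 3, the sketch below shows how such a comparison could be run with SciPy's `kruskal` function; the per-case bias ratings are placeholder numbers for illustration only, not the study's data.

    ```python
    # Illustrative only: Kruskal-Wallis H-test comparing hypothetical bias
    # ratings (0 = none, higher = more biased) across four models.
    from scipy.stats import kruskal

    gemini    = [0, 0, 1, 0, 0, 1, 0, 0, 0, 1]   # placeholder ratings
    chatgpt   = [1, 2, 1, 0, 2, 1, 1, 0, 2, 1]
    newmes_15 = [2, 2, 1, 2, 3, 2, 2, 1, 3, 2]
    claude    = [1, 1, 0, 1, 2, 1, 0, 1, 1, 1]

    h_stat, p_value = kruskal(gemini, chatgpt, newmes_15, claude)
    print(f"H = {h_stat:.2f}, p = {p_value:.4f}")
    if p_value < 0.05:
        print("At least one model's bias distribution differs significantly.")
    ```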

    Learn more about this study here: https://doi.org/10.1038/s41746-025-01746-4


    Reference

    Bouguettaya, A., Stuart, E.M. & Aboujaoude, E. Racial bias in AI-mediated psychiatric diagnosis and treatment: a qualitative comparison of four large language models. npj Digit. Med. 8, 332 (2025).

  • Domain Adversarial Training for Mitigating Gender Bias in Speech-based Mental Health Detection

    Domain Adversarial Training for Mitigating Gender Bias in Speech-based Mental Health Detection

    A study developed a domain adversarial training (DAT) approach to reduce gender bias in AI models for depression and PTSD detection from speech data (the E-DAIC dataset).

    DAT improved F1-scores by up to 13% and narrowed gender gaps in detection accuracy, improving generalization across male and female participants and especially addressing the effects of female participants’ underrepresentation.
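
    As a generic illustration of how domain adversarial training works (not the study's actual architecture or code), the sketch below uses a gradient reversal layer so that a shared speech encoder is rewarded for detecting depression but penalized whenever a gender discriminator can exploit its features; all layer sizes and names are assumed.

    ```python
    # Minimal, illustrative DAT sketch in PyTorch; dimensions are arbitrary.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GradReverse(torch.autograd.Function):
        """Identity on the forward pass; flips the gradient sign on the backward pass."""
        @staticmethod
        def forward(ctx, x, lambd):
            ctx.lambd = lambd
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            return -ctx.lambd * grad_output, None

    encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU())  # shared speech features
    depression_head = nn.Linear(64, 2)                      # main task
    gender_head = nn.Linear(64, 2)                          # adversary

    def dat_loss(x, y_depression, y_gender, lambd=1.0):
        z = encoder(x)
        task_loss = F.cross_entropy(depression_head(z), y_depression)
        adv_loss = F.cross_entropy(gender_head(GradReverse.apply(z, lambd)), y_gender)
        return task_loss + adv_loss  # encoder is pushed toward gender-invariant features
    ```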

    Learn more about this study here: https://doi.org/10.48550/arXiv.2505.03359


    Reference

    Kim, J., Yoon, H., Oh, W., Jung, D., Yoon, S., Kim, D., Lee, D., Lee, S., & Yang, C. (2025). Domain Adversarial Training for Mitigating Gender Bias in Speech-based Mental Health Detection. 2025 47th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 1-7.

  • Minding the Gaps: Neuroethics, AI, and Depression

    Minding the Gaps: Neuroethics, AI, and Depression

    In this article, the author highlights the benefits and potential issues of using AI in depression diagnosis and treatment, focusing on prevalent gender, racial, and ethnic biases.

    The author argues that, given the historical biases inherent in society generally and in healthcare specifically, AI-driven advancements will not serve minority groups as a matter of course: unless they are tailored to represent and serve all communities equally, they will exacerbate existing biases and disparities.

    Learn more about this article here: https://nonprofitquarterly.org/minding-the-gaps-neuroethics-ai-and-depression/


    Reference

    Boothroyd, Gemma (2024), “Minding the Gaps: Neuroethics, AI, and Depression”, in Nonprofit Quarterly Magazine, winter 2024, “Health Justice in the Digital Age: Can We Harness AI for Good?”

  • Bias and Fairness in AI-Based Mental Health Models

    Bias and Fairness in AI-Based Mental Health Models

    The paper examines bias and fairness issues in AI-based mental health applications, including diagnostic tools, chatbots, and suicide risk prediction models. It reports how unrepresentative datasets lead to misdiagnosis and unequal outcomes across socioeconomic, gender, and racial groups (notably for women, local ethnic minorities, and non-Western populations), and presents mitigation strategies such as diverse datasets, fairness metrics, and human-in-the-loop approaches.

    Learn more about this paper here: https://www.researchgate.net/publication/389214235_Bias_and_Fairness_in_AI-Based_Mental_Health_Models


    Reference

    Barnty, Barnabas & Joseph, Oloyede & Ok, Emmanuel. (2025). Bias and Fairness in AI-Based Mental Health Models.

  • AI and Mental Healthcare – ethical and regulatory considerations

    AI and Mental Healthcare – ethical and regulatory considerations

    This governmental report discusses the ethical and regulatory considerations of using artificial intelligence in mental healthcare in the UK.

    Bias in AI tools (algorithmic bias) can stem from various sources, including tools being trained on biased datasets and producing discriminatory outcomes, or developers making biased decisions in the design or training of such tools. For example, mental health electronic health record (EHR) data is susceptible to cohort and label bias: culture-bound presentations of mental disorders, combined with a lack of transcultural literacy among clinicians, often lead to both over- and under-diagnosis. People can also exhibit bias when using AI tools, such as over-relying on, or mistrusting, AI outputs. All these biases can be conscious or unconscious.

    Learn more about the report here: https://doi.org/10.58248/PN738


    Reference

    Gardiner, Hannah and Natasha Mutebi (2025), AI and Mental Healthcare – ethical and regulatory considerations, UK Parliament – POST, POSTnote 738, 31 January 2025

  • A Data-Centric Approach to Detecting and Mitigating Demographic Bias in Pediatric Mental Health Text: A Case Study in Anxiety Detection

    A Data-Centric Approach to Detecting and Mitigating Demographic Bias in Pediatric Mental Health Text: A Case Study in Anxiety Detection

    This study examines classification parity across sex and finds that mental health disorders in female adolescents are systematically under-diagnosed: the model’s accuracy for female patients was about 4% lower and its false negative rate about 9% higher than for male patients. The bias originates in the textual data itself: notes for male patients were on average around 500 words longer and showed distinct word usage. To mitigate this, the authors introduce a de-biasing method based on neutralizing biased terms (gendered words and pronouns) and reducing sentences to essential clinical information. After this correction, diagnostic bias is reduced by up to 27%.
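
    As a rough illustration of this kind of text-level de-biasing (not the authors' actual pipeline), gendered words and pronouns in a note can be replaced with neutral placeholders before classification; the substitution list below is a small, hypothetical example.

    ```python
    # Illustrative sketch: neutralizing gendered terms in a clinical note.
    # The word list is hypothetical; grammatical agreement is not handled here.
    import re

    GENDER_NEUTRAL = {
        r"\b(she|he)\b": "they",
        r"\b(her|his)\b": "their",
        r"\b(herself|himself)\b": "themselves",
        r"\b(girl|boy)\b": "adolescent",
        r"\b(mother|father)\b": "parent",
    }

    def neutralize(note: str) -> str:
        for pattern, replacement in GENDER_NEUTRAL.items():
            note = re.sub(pattern, replacement, note, flags=re.IGNORECASE)
        return note

    print(neutralize("She reports her mother noticed the girl's anxiety."))
    # -> "they reports their parent noticed the adolescent's anxiety."
    ```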

    This emphasizes how linguistically transmitted bias—arising from word choice and gendered language—consistently leads to the under-diagnosis of mental health disorders among female adolescents, critically undermining the impartiality of medical diagnosis and treatment.

    Learn more about this study here: https://doi.org/10.48550/arXiv.2501.00129


    Reference

    Ive, J., Bondaronek, P., Yadav, V., Santel, D., Glauser, T., Cheng, T., Strawn, J.R., Agasthya, G., Tschida, J., Choo, S., Chandrashekar, M., Kapadia, A.J., & Pestian, J.P. (2024). A Data-Centric Approach to Detecting and Mitigating Demographic Bias in Pediatric Mental Health Text: A Case Study in Anxiety Detection. 

  • The Role of Gender: Gender Fairness in the Detection of Depression Symptoms on Social Media

    The Role of Gender: Gender Fairness in the Detection of Depression Symptoms on Social Media

    The study found that the BDI-Sen dataset used for depression symptom detection on social media exhibits gender bias, with machine learning models such as mentalBERT showing predictive disparities that generally favour male users. Although bias mitigation techniques like data augmentation reduced the bias, they did not eliminate it completely.
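
    One common way to quantify this kind of predictive disparity is to compare error rates, such as false negative rates, between gender groups; the sketch below is a generic example with placeholder labels and predictions, not the thesis's data or metrics.

    ```python
    # Illustrative sketch: gender gap in false negative rates for a
    # depression-symptom classifier. All arrays are placeholders.
    import numpy as np
    from sklearn.metrics import confusion_matrix

    def false_negative_rate(y_true, y_pred):
        tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
        return fn / (fn + tp) if (fn + tp) else 0.0

    y_true = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 0])
    y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 0])
    gender = np.array(["f", "f", "f", "f", "f", "m", "m", "m", "m", "m"])

    for g in ("f", "m"):
        mask = gender == g
        print(g, false_negative_rate(y_true[mask], y_pred[mask]))
    # The difference between the two rates is one simple fairness indicator.
    ```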

    The existence of this bias affects the fairness and reliability of AI systems in detecting depression symptoms, leading to unequal predictive performance across genders. This can result in under- or over-identification of depression symptoms in certain groups, thereby compromising the validity of such systems for clinical or mental health monitoring.

    Learn more about this study here: https://studenttheses.uu.nl/handle/20.500.12932/47734


    Reference

    Gierschmann, Lara (2024), The Role of Gender: Gender Fairness in the Detection of Depression Symptoms on Social Media, Utrecht University, unpublished Master Thesis

  • AEQUITAS

    AEQUITAS

    Proposal: Preparation of CSOs and the public healthcare sector to address gender and racial biases that might arise from the wide usage of AI, in order to protect and promote fundamental rights

    Implementation: 2025 to 2027

    Call: CERV-2024-CHAR-LITI – Promote civil society organizations’ awareness of, capacity building and implementation of the EU Charter of Fundamental Rights

    Topic: CERV-2024-CHAR-LITI-CHARTER

    Type of Action: CERV-PJG – CERV Project Grants

    Proposed Budget: 1 061 761,00 €

    Keywords: gender and racial biases, biomedical AI, EU Charter of Fundamental Rights

    Objective: The topic of the AEQUITAS project is the gender and racial biases that have been reported in biomedical AI, which can lead to misdiagnosis and mistreatment and which pose a threat to the fundamental rights protected by the EU Charter. The project aims:

    – to increase the capacity of the CSOs and human rights organizations in educating the public, monitoring the biases and advocating for the protection of the fundamental rights especially regarding the biomedical AI;

    – to increase the knowledge of healthcare staff from public hospitals on the biases that biomedical AI can present due to the biased data from which they were fed and to help them approach and consult these AI systems with a critical mindset;

    – to develop an AI Regulatory Model that will be used by CSOs and public hospitals in their practices;

    – to develop a European network of CSOs and public hospitals that will collaborate and support each other in the effort to raise public awareness of the gender and racial biases of biomedical AI and of the applications of the EU Charter;

    – to raise awareness of the EU Charter of fundamental rights and its application in the AI era;

    – to develop and distribute policy recommendations in order to advocate for the need to regulate biomedical AI.

    Partners:

    • Innovation Hive – Kypseli Kainotomias
    • Kentro Ginaikeion Meleton Kai Ereyvnon Astiki Mi K
    • Universitat Zu Koln
    • Center for the Study of Democracy
    • C.I.P. Citizens in Power
    • Moterų Informacijos Centras Asociacija Mic
    • Lobby Europeo de Mujeres en Espana LEM España
    • TIA Formazione Internazionale Associazione APS
    • Health Citizens – European Institute
    • Technologiko Panepistimio Kyprou
    • Cyens Centre of Excellence
    • Edex – Educational Excellence Corporation Limited
    • Rite Research Institute for Technological Evolution 
    • Vucable
    • TotalEU Production

    Project Website: under development

  • Gender Bias in AI’s Perception of Cardiovascular Risk

    Gender Bias in AI’s Perception of Cardiovascular Risk

    The study investigated gender bias in GPT-4’s assessment of coronary artery disease (CAD) risk and showed a substantial shift in perceived risk between men and women when a psychiatric comorbidity was added to the vignette, even though they presented identical complaints.

    As a result, women with a concurrent psychiatric condition were assessed as having a lower risk of CAD.

    Learn more about this study here: https://www.jmir.org/2024/1/e54242


    Reference

    Achtari M, Salihu A, Muller O, Abbé E, Clair C, Schwarz J, Fournier S
    Gender Bias in AI’s Perception of Cardiovascular Risk
    J Med Internet Res 2024;26:e54242
    DOI: 10.2196/54242

  • Fairness in AI-Based Mental Health: Clinician Perspectives and Bias Mitigation

    Fairness in AI-Based Mental Health: Clinician Perspectives and Bias Mitigation

    Given the limited research on fairness in automated decision-making systems in the clinical domain, particularly in mental health, this study explores clinicians’ perceptions of AI fairness through two distinct scenarios: violence risk assessment and depression phenotype recognition using textual clinical notes.

    The authors conducted semi-structured interviews with clinicians to understand their fairness perceptions and to identify appropriate quantitative fairness objectives for these scenarios. They then compared a set of bias mitigation strategies, each developed to improve at least one of the four selected fairness objectives. The findings underscore the importance of carefully selecting fairness measures, as prioritizing less relevant measures can have a detrimental rather than beneficial effect on model behavior in real-world clinical use.

    Learn more about the article here: https://doi.org/10.1609/aies.v7i1.31732


    Reference

    Sogancioglu, G., Mosteiro, P., Salah, A. A., Scheepers, F., & Kaya, H. (2024). Fairness in AI-Based Mental Health: Clinician Perspectives and Bias Mitigation. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 7(1), 1390-1400.

  • Multimodal Fusion of EEG and Audio Spectrogram for Major Depressive Disorder Recognition Using Modified DenseNet121

    Multimodal Fusion of EEG and Audio Spectrogram for Major Depressive Disorder Recognition Using Modified DenseNet121

    Depression and anxiety are common, often co-occurring mental health disorders that complicate diagnosis due to overlapping symptoms and reliance on subjective assessments.

    Standard diagnostic tools are widely used but can introduce bias, as they depend on self-reported symptoms and clinician interpretation, which vary across individuals. These methods also fail to account for neurobiological factors such as neurotransmitter imbalances and altered brain connectivity.

    Similarly, clinical AI/ML models used in healthcare often lack demographic diversity in their training data, with most studies failing to report race and gender, leading to biased outputs and reduced fairness. EEG offers a promising, objective approach to monitoring brain activity, potentially improving diagnostic accuracy and helping address biases in mental health assessment, as this study found.

    Learn more about it here: https://doi.org/10.3390/brainsci14101018


    Reference

    Yousufi, M., Damaševičius, R., & Maskeliūnas, R. (2024). Multimodal Fusion of EEG and Audio Spectrogram for Major Depressive Disorder Recognition Using Modified DenseNet121. Brain Sciences, 14(10), 1018.

  • Deconstructing demographic bias in speech-based machine learning models for digital health

    Deconstructing demographic bias in speech-based machine learning models for digital health

    This study investigates demographic bias in speech-based machine learning models used to predict depression risk.

    It finds that the model underperforms across several demographic subgroups, including gender, race, age, and socioeconomic status, often misclassifying individuals with depression as low-risk. For example, older adults and Black or low-income individuals were frequently ranked lower in risk than healthier younger or White individuals.

    These biases stem from inconsistent relationships between sensed behaviors and depression across groups. The authors emphasized the need for subgroup-specific modeling to improve fairness and reliability in mental health AI tools.

    Learn more about this study here: https://doi.org/10.3389/fdgth.2024.1351637


    Reference

    Yang M, El-Attar AA and Chaspari T (2024) Deconstructing demographic bias in speech-based machine learning models for digital health. Front. Digit. Health 6: 1351637. 

  • Fairness and bias correction in machine learning for depression prediction across four study populations

    Fairness and bias correction in machine learning for depression prediction across four study populations

    A study found that standard machine learning approaches often exhibit biased behaviour when predicting depression across different populations. It also demonstrated that both standard and novel post-hoc bias mitigation techniques can effectively reduce unfair bias, though no single model achieves equality of outcomes.
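
    As a generic illustration of what a post-hoc correction can look like (not the specific techniques evaluated in the paper), one simple approach is to pick a separate decision threshold for each demographic group on a validation set so that error rates are better matched; everything below is placeholder data.

    ```python
    # Illustrative sketch: per-group decision thresholds as a post-hoc correction.
    # Scores, labels, and group assignments are synthetic placeholders.
    import numpy as np

    def pick_threshold(scores, labels, target_tpr=0.8):
        """Highest threshold whose true positive rate still reaches the target."""
        for t in sorted(np.unique(scores), reverse=True):
            tpr = (scores[labels == 1] >= t).mean()
            if tpr >= target_tpr:
                return t
        return scores.min()

    rng = np.random.default_rng(0)
    scores = rng.random(200)                          # predicted depression risk
    labels = (scores + rng.normal(0, 0.3, 200) > 0.6).astype(int)
    groups = rng.choice(["a", "b"], size=200)         # demographic group labels

    thresholds = {g: pick_threshold(scores[groups == g], labels[groups == g])
                  for g in ("a", "b")}
    print(thresholds)  # group-specific cut-offs applied at prediction time
    ```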

    The biases that were identified risk reinforcing structural inequalities in mental healthcare, particularly affecting underserved populations. This underscores the importance of analyzing fairness during model selection and transparently reporting the impact of debiasing interventions to ensure equitable healthcare applications.

    Learn more about this study here: https://doi.org/10.1038/s41598-024-58427-7


    Reference

    Dang, V.N., Cascarano, A., Mulder, R.H. et al. Fairness and bias correction in machine learning for depression prediction across four study populations. Sci Rep 14, 7848 (2024).

  • Key language markers of depression on social media depend on race

    Key language markers of depression on social media depend on race

    A recent U.S. study published in PNAS found that artificial intelligence models analyzing social media posts can detect signs of depression in white Americans but are far less accurate for Black Americans, underscoring the dangers of using AI trained on non-diverse data in healthcare.

    According to co-author Sharath Chandra Guntuku from Penn Medicine, these differences suggest that prior AI models and language-based assessments have largely overlooked racial diversity. While the researchers noted that social media analysis should not be used for diagnosis, it may still help assess risk or monitor mental health trends in communities.

    Learn more about the study here: https://www.pnas.org/doi/10.1073/pnas.2319837121


    Reference

    Rai, S., et al. (2024). Key language markers of depression on social media depend on race. Proc. Natl. Acad. Sci. U.S.A., 121(14).

  • Systematic review and meta-analysis of performance of wearable artificial intelligence in detecting and predicting depression

    Systematic review and meta-analysis of performance of wearable artificial intelligence in detecting and predicting depression

    The systematic review and meta-analysis found that wearable AI systems show promising performance in detecting and predicting depression. However, performance varied substantially across algorithms and devices.

    These disparities suggest that certain demographic groups may be underrepresented or inadequately served by current wearable AI systems, underscoring the need for further research to enhance the generalizability and fairness of these technologies in clinical practice.

    Learn more about this review here: https://doi.org/10.1038/s41746-023-00828-5


    Reference

    Abd-Alrazaq, A., AlSaad, R., Shuweihdi, F. et al. Systematic review and meta-analysis of performance of wearable artificial intelligence in detecting and predicting depression. npj Digit. Med. 6, 84 (2023).

  • Bias Discovery in Machine Learning Models for Mental Health

    Bias Discovery in Machine Learning Models for Mental Health

    This article examines how AI can unintentionally reproduce social and demographic biases when applied to mental health prediction. Using benzodiazepine prescriptions as a proxy for conditions such as depression and anxiety, the authors analyzed machine learning models trained on patient data to identify systematic disparities.

    The study found that women are more frequently predicted to receive such treatments, reflecting gender bias, and that the models perform less accurately for minority ethnic groups, indicating representation and evaluation bias. The AI models here are not used to prescribe drugs but rather to predict treatment likelihoods, revealing how bias in healthcare data can lead to inequitable AI performance in the context of depression-related care.

    Learn more about the article here: https://doi.org/10.3390/info13050237


    Reference

    Mosteiro, P.J., Kuiper, J., Masthoff, J., Scheepers, F., & Spruit, M. (2022). Bias Discovery in Machine Learning Models for Mental Health. Inf., 13, 237.

  • Digital health tools for the passive monitoring of depression: a systematic review of methods

    Digital health tools for the passive monitoring of depression: a systematic review of methods

    This systematic review examines studies linking passive data from smartphones and wearables to depression, identifying key methodological flaws and threats to reproducibility. It highlights biases such as representation, measurement, and evaluation bias, stemming from small, homogenous samples and inconsistent feature construction.

    Although gender and race are not explicitly discussed, the lack of diversity in study populations suggests potential demographic bias. The review calls for improved reporting standards and broader sample inclusion to enhance generalizability and clinical relevance. These improvements are essential for ensuring that digital mental health tools are equitable and reliable across diverse populations.

    Learn more about this review here: https://doi.org/10.1038/s41746-021-00548-8


    Reference

    De Angel, V., Lewis, S., White, K., Oetzmann, C., Leightley, D., Oprea, E., Lavelle, G., Matcham, F., Pace, A., Mohr, D. C., Dobson, R., & Hotopf, M. (2022). Digital health tools for the passive monitoring of depression: a systematic review of methods. NPJ Digital Medicine, 5(1), 3.

  • Artificial Intelligence in mental health and the biases of language based models

    Artificial Intelligence in mental health and the biases of language based models

    In this literature review of the uses of Natural Language Processing (NLP) models in psychiatry, an approach that “systematically evaluates each stage of model development to explore how biases arise from a clinical, data science and linguistic perspective” was employed to find existing patterns.

    The review found significant biases with respect to religion, race, gender, nationality, sexuality, and age.

    Learn more about this review here: https://doi.org/10.1371/journal.pone.0240376


    Reference

    Straw, I., & Callison-Burch, C. (2020). Artificial Intelligence in mental health and the biases of language based models. PLoS ONE, 15(12), e0240376.