Tag: Representation Bias

Systematic review and meta-analysis of performance of wearable artificial intelligence in detecting and predicting depression

The systematic review and meta-analysis found that wearable AI systems show promising performance in detecting and predicting depression. However, performance varies substantially across algorithms and devices.

The disparities identified across algorithms and devices suggest that certain demographic groups may be underrepresented in, or inadequately served by, current wearable AI systems. This variability underscores the need for further research to improve the generalizability and fairness of these technologies in clinical practice.

Learn more about this review here: https://doi.org/10.1038/s41746-023-00828-5

Reference

Abd-Alrazaq, A., AlSaad, R., Shuweihdi, F. et al. Systematic review and meta-analysis of performance of wearable artificial intelligence in detecting and predicting depression. npj Digit. Med. 6, 84 (2023).
Digital health tools for the passive monitoring of depression: a systematic review of methods

This systematic review examines studies linking passive data from smartphones and wearables to depression, identifying key methodological flaws and threats to reproducibility. It highlights biases such as representation, measurement, and evaluation bias, stemming from small, homogeneous samples and inconsistent feature construction.

Although gender and race are not explicitly discussed, the lack of diversity in study populations suggests potential demographic bias. The review calls for improved reporting standards and broader sample inclusion to enhance generalizability and clinical relevance. These improvements are essential for ensuring that digital mental health tools are equitable and reliable across diverse populations.

Learn more about this review here: https://doi.org/10.1038/s41746-021-00548-8

Reference

De Angel, V., Lewis, S., White, K., Oetzmann, C., Leightley, D., Oprea, E., Lavelle, G., Matcham, F., Pace, A., Mohr, D. C., Dobson, R., & Hotopf, M. (2022). Digital health tools for the passive monitoring of depression: a systematic review of methods. npj Digital Medicine, 5(1), 3.
Bias Discovery in Machine Learning Models for Mental Health

This article examined how AI can unintentionally reproduce social and demographic biases when applied to mental health prediction. Using benzodiazepine prescriptions as a proxy for conditions such as depression and anxiety, the authors analyzed machine learning models trained on patient data to identify systematic disparities.

It found that women are more frequently predicted to receive such treatments, reflecting gender bias, while the models perform less accurately for minority ethnic groups, indicating representation and evaluation bias. The AI models here are not used to prescribe drugs but rather to predict treatment likelihoods, revealing how bias in healthcare data can lead to inequitable AI performance in the context of depression-related care.
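The two disparities described above correspond to standard group-wise checks. The first is how often each group receives a positive prediction (selection rate). The second is how accurate the model is within each group. The article's own analysis is not reproduced here. What follows is a minimal sketch that assumes binary labels and predictions. The helper `group_audit` and the toy data are hypothetical.

```python
from collections import defaultdict


def group_audit(y_true, y_pred, groups):
    """Return per-group selection rate and accuracy for binary predictions.

    Hypothetical helper for illustration; y_true and y_pred hold 0/1 values,
    groups holds a demographic label for each sample.
    """
    stats = defaultdict(lambda: {"n": 0, "predicted_pos": 0, "correct": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        s = stats[group]
        s["n"] += 1
        s["predicted_pos"] += pred          # predicted positives, e.g. "will receive treatment"
        s["correct"] += int(truth == pred)  # correct predictions
    return {
        group: {
            "selection_rate": s["predicted_pos"] / s["n"],
            "accuracy": s["correct"] / s["n"],
        }
        for group, s in stats.items()
    }


if __name__ == "__main__":
    # Made-up toy data, not taken from the study.
    y_true = [1, 0, 1, 0, 1, 0, 1, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
    groups = ["women"] * 4 + ["men"] * 4
    for group, metrics in group_audit(y_true, y_pred, groups).items():
        print(group, metrics)
```

On this toy data, women have a selection rate and accuracy of 0.75, while men score 0.5 on both. A selection-rate gap is only a signal. It has to be read alongside each group's true prevalence before it is called bias.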

Learn more about the article here: https://doi.org/10.3390/info13050237

Reference

Mosteiro, P.J., Kuiper, J., Masthoff, J., Scheepers, F., & Spruit, M. (2022). Bias Discovery in Machine Learning Models for Mental Health. Information, 13(5), 237.
Assessing Algorithmic Bias in Language-Based Depression Detection: A Comparison of DNN and LLM Approaches

A study found that large language models (LLMs) outperform traditional deep neural network (DNN) embeddings in automated depression detection and show reduced gender bias, though racial disparities remain. Among DNN fairness-mitigation techniques, the worst-group loss provided the best balance between overall accuracy and demographic fairness, while fairness-regularized loss underperformed.
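Worst-group loss is commonly defined as the largest of the per-group average losses. Minimizing it therefore targets the group the model serves worst, rather than the average across all samples. The study's exact loss and grouping are not given here. The sketch below assumes binary cross-entropy on predicted probabilities and uses a hypothetical function `worst_group_loss`.

```python
import math
from collections import defaultdict


def binary_cross_entropy(prob, label, eps=1e-12):
    """Loss for one sample; prob is clipped to avoid log(0)."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def worst_group_loss(probs, labels, groups):
    """Return (worst loss, worst group, per-group mean losses).

    Hypothetical helper: a training loop would minimize the first value
    instead of the plain average loss over all samples.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for prob, label, group in zip(probs, labels, groups):
        totals[group] += binary_cross_entropy(prob, label)
        counts[group] += 1
    per_group = {g: totals[g] / counts[g] for g in totals}
    worst = max(per_group, key=per_group.get)
    return per_group[worst], worst, per_group


if __name__ == "__main__":
    # Toy predictions: confident and correct for group "A",
    # less confident for group "B".
    probs = [0.9, 0.1, 0.6, 0.4]
    labels = [1, 0, 1, 0]
    groups = ["A", "A", "B", "B"]
    loss, worst, per_group = worst_group_loss(probs, labels, groups)
    print(f"worst group: {worst}, loss: {loss:.3f}")
```

In this toy case the average loss over all four samples is about 0.31. The worst-group value is about 0.51, for group B. The optimizer is therefore steered toward the group that the average would hide.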

The identified biases affect the fairness and diagnostic reliability of AI systems for mental health assessment by disadvantaging underrepresented racial and gender groups, most notably Hispanic participants in this study. Such disparities risk perpetuating inequities in automated mental health screening and could erode trust in, and the validity of, clinical or public health applications.

Learn more about the study here: https://doi.org/10.48550/arXiv.2509.25795

Reference

Junias, O., Kini, P., & Chaspari, T. (2025). Assessing Algorithmic Bias in Language-Based Depression Detection: A Comparison of DNN and LLM Approaches. 2025 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI), 1-7.
Developing personalized algorithms for sensing mental health symptoms in daily life

This study investigates algorithmic bias in AI tools that predict depression risk using smartphone-sensed behavioral data.

It finds that these tools underperform in larger, more diverse populations because the behavioral patterns used to predict depression are inconsistent across demographic and socioeconomic subgroups.

Specifically, the AI models often misclassify individuals from certain groups, such as older adults or people of particular racial or gender backgrounds, as being at lower risk than they actually are. The authors emphasize the need for tailored, subgroup-aware approaches to improve reliability and fairness in mental health prediction tools. This work highlights the importance of addressing demographic bias to ensure equitable AI deployment in mental healthcare.
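Classifying someone as lower risk than they actually are is a false negative. This kind of disparity can therefore be checked by comparing false negative rates (FNR) across subgroups. What follows is a minimal sketch that assumes binary risk labels. The function names and the toy age groups are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def false_negative_rates(y_true, y_pred, groups):
    """FNR per group: share of truly at-risk people (label 1) predicted as 0.

    Groups with no at-risk members are left out, since their FNR is undefined.
    """
    positives = defaultdict(int)
    misses = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            positives[group] += 1
            if pred == 0:
                misses[group] += 1
    return {g: misses[g] / positives[g] for g in positives}


def max_fnr_gap(fnr_by_group):
    """Largest difference in FNR between any two groups."""
    return max(fnr_by_group.values()) - min(fnr_by_group.values())


if __name__ == "__main__":
    # Made-up toy data: the model misses every at-risk older adult.
    y_true = [1, 1, 1, 1, 0, 1, 1, 0]
    y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
    groups = ["younger"] * 4 + ["older"] * 4
    fnr = false_negative_rates(y_true, y_pred, groups)
    print(fnr, "gap:", max_fnr_gap(fnr))
```

In this toy data the FNR is 0.25 for the younger group and 1.0 for the older group, a gap of 0.75. The overall accuracy of such a model could still look acceptable, which is why per-subgroup error rates matter.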

Learn more about this study here: https://doi.org/10.1038/s44184-025-00147-5

Reference

Timmons, A.C., Tutul, A.A., Avramidis, K. et al. Developing personalized algorithms for sensing mental health symptoms in daily life. npj Mental Health Res 4, 34 (2025).
Domain Adversarial Training for Mitigating Gender Bias in Speech-based Mental Health Detection

This study developed a domain adversarial training (DAT) approach to reduce gender bias in AI models that detect depression and PTSD from speech, using the E-DAIC dataset.

DAT improved F1-scores by up to 13% and narrowed the gender gap in detection accuracy. This improved generalization across male and female participants and, in particular, mitigated the effects of female participants' underrepresentation in the data.
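The central mechanism in DAT is gradient reversal. A gender classifier is trained on the shared representation. The encoder then receives that classifier's gradient with its sign flipped, which pushes it toward features that predict the condition but not gender. Deep learning frameworks commonly implement this as a gradient reversal layer. The study's speech model is not reproduced here. What follows is a minimal NumPy sketch that assumes a toy linear encoder with logistic task and gender heads. `dat_step`, its parameters and the weighting `lam` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def dat_step(X, y, g, W, a, b, lr=0.1, lam=1.0):
    """One domain adversarial training step on a toy linear model (hypothetical).

    X: (n, d) input features      y: (n,) condition labels, 0/1
    g: (n,) gender labels, 0/1     W: (d, k) shared encoder
    a: (k,) condition head         b: (k,) gender (domain) head
    """
    n = X.shape[0]
    Z = X @ W                     # shared representation
    p = sigmoid(Z @ a)            # predicted probability of the condition
    q = sigmoid(Z @ b)            # predicted probability of gender label 1
    d_task = (p - y) / n          # gradient of mean BCE w.r.t. task logits
    d_dom = (q - g) / n           # gradient of mean BCE w.r.t. domain logits

    grad_a = Z.T @ d_task
    grad_b = Z.T @ d_dom
    grad_W_task = X.T @ np.outer(d_task, a)
    grad_W_dom = X.T @ np.outer(d_dom, b)

    # Both heads descend their own loss. The encoder descends the task loss
    # but ascends the domain loss (the reversed, lam-scaled domain gradient).
    W_new = W - lr * (grad_W_task - lam * grad_W_dom)
    a_new = a - lr * grad_a
    b_new = b - lr * grad_b
    return W_new, a_new, b_new


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(64, 8))
    y = (rng.random(64) < 0.5).astype(float)
    g = (rng.random(64) < 0.3).astype(float)  # toy minority gender, about 30%
    W = rng.normal(scale=0.1, size=(8, 4))
    a = rng.normal(size=4)
    b = rng.normal(size=4)
    for _ in range(200):
        W, a, b = dat_step(X, y, g, W, a, b)
    print("encoder weights shape:", W.shape)
```

Setting `lam=0` reduces this to ordinary training that ignores gender. Larger values trade some task fit for a representation from which gender is harder to recover.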

Learn more about this study here: https://doi.org/10.48550/arXiv.2505.03359

Reference

Kim, J., Yoon, H., Oh, W., Jung, D., Yoon, S., Kim, D., Lee, D., Lee, S., & Yang, C. (2025). Domain Adversarial Training for Mitigating Gender Bias in Speech-based Mental Health Detection. 2025 47th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 1-7.
Bias and Fairness in AI-Based Mental Health Models

The paper examines bias and fairness issues in AI-based mental health applications, including diagnostic tools, chatbots, and suicide risk prediction models. It reports how unrepresentative datasets lead to misdiagnosis and unequal outcomes across socioeconomic, gender and racial groups, notably affecting women, local ethnic minorities and non-Western societies. It also presents mitigation strategies such as diverse datasets, fairness metrics, and human-in-the-loop approaches.

Learn more about this paper here: https://www.researchgate.net/publication/389214235_Bias_and_Fairness_in_AI-Based_Mental_Health_Models

Reference

Barnty, B., Joseph, O., & Ok, E. (2025). Bias and Fairness in AI-Based Mental Health Models.
AI and Mental Healthcare – ethical and regulatory considerations

This UK parliamentary briefing (POSTnote) discusses the ethical and regulatory considerations of using artificial intelligence in mental healthcare in the UK.

Bias in AI tools (algorithmic bias) can stem from several sources. These include training on biased datasets, which leads to discriminatory outputs, and biased decisions made by developers during the design or training of such tools. For example, mental health electronic health record (EHR) data is susceptible to cohort and label bias. This can occur because culture-bound presentations of mental disorders, combined with a lack of transcultural literacy among clinicians, often lead to both over- and under-diagnosis. People can also exhibit bias when using AI tools, such as over-relying on, or mistrusting, AI outputs. All these biases can be conscious or unconscious.

Learn more about the report here: https://doi.org/10.58248/PN738

Reference

Gardiner, Hannah and Natasha Mutebi (2025), AI and Mental Healthcare – ethical and regulatory considerations, UK Parliament – POST, POSTnote 738, 31 January 2025
A Data-Centric Approach to Detecting and Mitigating Demographic Bias in Pediatric Mental Health Text: A Case Study in Anxiety Detection

This study examines classification parity across sex and finds that mental health disorders are systematically under-diagnosed in female adolescents. For female patients, the model's accuracy was about 4% lower and its false negative rate about 9% higher than for male patients. The bias originates in the textual data. Notes for male patients were on average 500 words longer and used distinct wording. To mitigate this, the authors introduce a de-biasing method that neutralizes biased terms (gendered words and pronouns) and reduces sentences to essential clinical information. After de-biasing, diagnostic bias was reduced by up to 27%.

This highlights how bias transmitted through language, through word choice and gendered terms, leads to systematic under-diagnosis of mental health disorders among female adolescents. This critically undermines the impartiality of medical diagnosis and treatment.
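The term-neutralization step can be sketched as a word-level substitution. The study's actual term list and its sentence-reduction step are not reproduced here. `NEUTRAL_TERMS` is a small hypothetical mapping and `neutralize` is a hypothetical helper. Plain substitution ignores grammar: "her" can be an object or a possessive, and a fuller pipeline would need to handle that.

```python
import re

# Hypothetical mapping from gendered terms to neutral ones (illustrative only).
NEUTRAL_TERMS = {
    "he": "they", "she": "they",
    "him": "them",
    "her": "their",  # ambiguous (object vs possessive); kept simple here
    "his": "their", "hers": "theirs",
    "himself": "themselves", "herself": "themselves",
    "boy": "child", "girl": "child",
    "son": "child", "daughter": "child",
    "mother": "parent", "father": "parent",
    "mom": "parent", "dad": "parent",
}

# Whole-word, case-insensitive match of any listed term.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, NEUTRAL_TERMS)) + r")\b",
    re.IGNORECASE,
)


def neutralize(text):
    """Replace gendered words with neutral ones, keeping an initial capital."""
    def replace(match):
        word = match.group(0)
        neutral = NEUTRAL_TERMS[word.lower()]
        return neutral.capitalize() if word[0].isupper() else neutral

    return _PATTERN.sub(replace, text)


if __name__ == "__main__":
    note = "Patient says her mother worries about him. He missed school."
    print(neutralize(note))
    # Patient says their parent worries about them. They missed school.
```

The word-boundary anchors (`\b`) prevent partial matches. For example, the "he" inside "the" or "themes" is left alone.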

Learn more about this study here: https://doi.org/10.48550/arXiv.2501.00129

Reference

Ive, J., Bondaronek, P., Yadav, V., Santel, D., Glauser, T., Cheng, T., Strawn, J.R., Agasthya, G., Tschida, J., Choo, S., Chandrashekar, M., Kapadia, A.J., & Pestian, J.P. (2024). A Data-Centric Approach to Detecting and Mitigating Demographic Bias in Pediatric Mental Health Text: A Case Study in Anxiety Detection. arXiv preprint arXiv:2501.00129.
The Role of Gender: Gender Fairness in the Detection of Depression Symptoms on Social Media

The study found that the BDI-Sen dataset used for depression symptom detection on social media exhibits gender bias, with machine learning models such as MentalBERT showing predictive disparities that generally favour male users. Although bias mitigation techniques like data augmentation reduced the bias, they did not eliminate it completely.

The existence of this bias affects the fairness and reliability of AI systems in detecting depression symptoms, leading to unequal predictive performance across genders. This can result in under- or over-identification of depression symptoms in certain groups, thereby compromising the validity of such systems for clinical or mental health monitoring.

Learn more about this study here: https://studenttheses.uu.nl/handle/20.500.12932/47734

Reference

Gierschmann, Lara (2024), The Role of Gender: Gender Fairness in the Detection of Depression Symptoms on Social Media, Utrecht University, unpublished Master Thesis