Natural language processing of narrative writing for depression screening in adolescents

A machine-learning approach to predicting mental status from a single composition

Psychology has long held that the words and content a person chooses to express are not random but reflect the hidden state of that person’s mental world. The famous Freudian slip is a verbal or memory mistake believed to reveal an unconscious belief, thought, or motive. The Thematic Apperception Test (TAT) rests on a similar assumption, analyzing a person’s responses to ambiguous or open-ended stimuli.

Left, Freudian Slip. Right, Thematic Apperception Test.

While those methods have deep roots in psychology, they are often criticized for poor inter-rater reliability and doubtful validity. Questionnaires, on the other hand, are considered the standard for clinical diagnosis, but for particular cohorts (in our case, adolescents) self-report measures may be biased by factors such as limited language skills, limited life experience, cognitive impairments, and social desirability bias. This study aims to find a more ecologically valid, age-appropriate, and psychologically appropriate depression detection method for adolescents.

We used students’ compositions written in the classroom to detect depression among adolescents.

For feature extraction, we employed both a theory-based and a data-driven approach:

  • LIWC (Linguistic Inquiry and Word Count)
  • Word2Vec
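As a rough illustration of the two feature families listed above, the sketch below computes LIWC-style category proportions and an averaged word embedding for one short text. It is a minimal sketch under stated assumptions, not the pipeline used in the study: `TOY_LEXICON` and `TOY_VECTORS` are hypothetical stand-ins (the real LIWC dictionary is proprietary and much larger, and real Word2Vec vectors are learned from a corpus), and the English regex tokenizer is assumed purely for readability.

```python
import re

# Hypothetical toy lexicon; NOT the real LIWC dictionary.
TOY_LEXICON = {
    "negative_emotion": {"sad", "lonely", "tired", "afraid"},
    "positive_emotion": {"happy", "glad", "fun"},
    "first_person": {"i", "me", "my"},
}

# Hypothetical 2-dimensional vectors standing in for trained Word2Vec embeddings.
TOY_VECTORS = {
    "sad": [0.9, 0.1],
    "happy": [0.1, 0.9],
    "school": [0.5, 0.5],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def category_proportions(tokens, lexicon=TOY_LEXICON):
    """Theory-based features: share of tokens that fall in each word category."""
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {
        category: sum(token in words for token in tokens) / total
        for category, words in lexicon.items()
    }


def mean_embedding(tokens, vectors=TOY_VECTORS, dim=2):
    """Data-driven features: average the vectors of all in-vocabulary tokens."""
    known = [vectors[token] for token in tokens if token in vectors]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


tokens = tokenize("I feel sad and tired at school")
print(category_proportions(tokens))
print(mean_embedding(tokens))
```

Either feature vector (or their concatenation) could then be passed to a classifier; averaging word vectors is only one common way to turn word-level embeddings into a document-level representation.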

For classifiers, we used:

  • classic machine learning approaches
    • Logistic Regression
    • SVM
  • deep neural networks
    • TextCNN
    • TextRNN

Schematic illustration of the computational models.
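To make the simplest of the classifiers listed above concrete, here is a minimal sketch of logistic regression trained by batch gradient descent in plain Python. Everything in it is an assumption for illustration: the two-dimensional toy features, the labels, the learning rate, and the number of epochs are hypothetical and are not taken from the study, which would more plausibly rely on an established library implementation.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(x, w, b):
    """Linear score w.x + b for one feature vector."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(score(xi, w, b)) - yi  # derivative of log loss w.r.t. score
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b, threshold=0.5):
    """Return 1 where the predicted probability reaches the threshold, else 0."""
    return [int(sigmoid(score(xi, w, b)) >= threshold) for xi in X]


# Hypothetical toy data: two arbitrary features per text, label 1 = positive class.
X_train = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
y_train = [1, 1, 0, 0]

w, b = train_logistic_regression(X_train, y_train)
print(predict(X_train, w, b))
```

The neural models (TextCNN, TextRNN) replace the fixed feature vector with representations learned from the token sequence, but the final step is the same: a probability compared against a decision threshold.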

The TextRNN model achieved the best performance, with an F-measure of 0.74.
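For readers unfamiliar with the metric, the F-measure combines precision and recall into a single score. The sketch below assumes the balanced variant (F1, the harmonic mean of precision and recall); the summary above does not say which variant was reported, and the labels in the example are made up purely to show the arithmetic.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up labels: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
print(precision_recall_f1(y_true, y_pred))
```

Because F1 ignores true negatives, it is often preferred over plain accuracy when the positive class is relatively rare, as is typically the case in screening settings.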

Performance of TextCNN and TextRNN

Our research highlights the potential of natural language processing techniques in detecting depressive tendencies in young populations at school.

See our work at PsyArXiv