Researchers at the University of São Paulo (USP) are using artificial intelligence and one of the largest social media platforms in the world, Twitter, to try to build prediction models for anxiety and depression that may, in the future, detect signs of these disorders before clinical diagnosis.
The construction of the database, called SetembroBR, was the first step and is described in an article published in the scientific journal Language Resources and Evaluation. The name is a tribute to the Yellow September movement, an annual suicide prevention campaign, and also reflects the fact that data collection began in September.
In the second stage of the work, still under development, the scientists have obtained some preliminary results. One of them indicates that it is possible to detect whether a person is at greater risk of developing depression based only on their network of friends and followers, that is, without taking into account the posts made by the individual.
The database created by the group includes information on the text (in Portuguese) and the network of connections of 3,900 Twitter users who, after the survey, reported diagnosis or treatment of a mental disorder. The corpus (a collection of information on a given topic) includes all public tweets written by these users themselves (without retweets), totaling around 47 million of these short texts.
“Initially we collected the timelines by hand, analyzing texts from around 19,000 Twitter users, which is almost the population of a small town. We then used two sets: one of users actually diagnosed with mental disorders and a random one, which served as a control. We wanted to differentiate between people with depression and the general population,” explains Ivandre Paraboni, professor at USP’s School of Arts, Sciences and Humanities (EACH) and corresponding author of the article.
In addition to the users themselves, the survey collected texts from their network of friends and followers. That’s because it’s common for a person who has some kind of mental disorder to follow certain accounts, such as discussion forums or a celebrity who has publicly admitted to being depressed. “These people are attracted because they have common interests,” adds Paraboni, who is an associate researcher at the Center for Artificial Intelligence (C4AI), an Engineering Research Center (CPE) established by FAPESP and IBM Brazil at USP.
The Foundation also supports the study through the project “Analysis of language in social networks for early detection of mental health disorders”, led by Paraboni.
Mental health disorders, including depression and anxiety, have been identified by the World Health Organization (WHO) as a growing concern worldwide. The agency estimates that about 3.8% of the population, or 280 million people, are affected by depression, according to data from 2021.
During the Covid-19 pandemic, the period in which the researchers collected the Twitter texts, there was a 25% increase in the global prevalence of anxiety and depression.
In Brazil, a recent study by the Ministry of Health involving 784,000 participants revealed that 11.3% of Brazilians have already been diagnosed with depression, the majority being women.
Previous research has shown that mental disorders are often reflected in the language used by individuals who suffer from these conditions, which has led to a considerable number of studies involving Natural Language Processing (NLP), focusing on depression, anxiety and bipolar disorder, among others. However, most of them dealt with English-language text, not always reflecting the Brazilian profile.
Models
To carry out the study, the USP group subjected the textual corpus to pre-processing and data cleaning procedures to remove hashtags, URLs, emoticons and non-standard characters, while maintaining the original writing.
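The cleaning step described above can be illustrated with a minimal sketch. It is not the USP group's actual pipeline: the regular expressions, the set of characters treated as standard and the function name clean_tweet are all assumptions made for illustration.

```python
import re

# Illustrative patterns only; the article does not describe the real rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#\w+")
# Simple text emoticons such as :) ;-( =D (assumed definition of "emoticon")
EMOTICON_RE = re.compile(r"(?<!\w)[:;=][\-o*']?[)\](\[dDpP/\\|](?!\w)")
# Anything that is not a letter, digit, whitespace or basic punctuation
NONSTANDARD_RE = re.compile(r"[^\w\s.,;:!?'\"()@-]")


def clean_tweet(text):
    """Strip URLs, hashtags, emoticons and unusual symbols from one tweet.

    Spelling, casing and accents are left untouched, so the original
    writing is preserved.
    """
    text = URL_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = EMOTICON_RE.sub(" ", text)
    text = NONSTANDARD_RE.sub(" ", text)
    # Collapse the gaps left behind by the removals
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(clean_tweet("Hoje não estou bem :( #triste https://t.co/abc"))
```

URLs are removed first so that fragments inside a link are not mistaken for emoticons or hashtags.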
Deep learning methods were used to create four text classifiers and individualized or context-dependent word embeddings using BERT-like transformer models. These models are neural networks that learn context and meaning by tracking relationships in sequential data, such as the words in a sentence.
As input, a sample of 200 randomly selected tweets from each user was used. The parameters were set by performing five-fold cross-validation on the training data and averaging the results.
The research found that the BERT-type transformer models performed best on the depression and anxiety disorder prediction tasks. The difference between them and the second-best alternative, logistic regression (LogReg), was statistically significant.
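The sampling and validation procedure described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the researchers' code: the function names, the fixed random seed and the fit_and_score callback (which stands in for training and scoring any classifier) are hypothetical.

```python
import random
from statistics import mean


def sample_tweets(tweets, k=200, seed=0):
    """Return up to k randomly chosen tweets for one user."""
    if len(tweets) <= k:
        return list(tweets)
    return random.Random(seed).sample(tweets, k)


def k_fold_indices(n_items, n_folds=5, seed=0):
    """Yield (train_indices, validation_indices) for each fold."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, held_out in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, held_out


def cross_validate(items, labels, fit_and_score, n_folds=5, seed=0):
    """Average the score returned by fit_and_score over all folds.

    fit_and_score(train_x, train_y, val_x, val_y) is a hypothetical
    callback that trains a model and returns one validation score.
    """
    scores = []
    for train_idx, val_idx in k_fold_indices(len(items), n_folds, seed):
        scores.append(fit_and_score(
            [items[i] for i in train_idx], [labels[i] for i in train_idx],
            [items[i] for i in val_idx], [labels[i] for i in val_idx],
        ))
    return mean(scores)
```

Each candidate parameter setting would be passed through cross_validate once, and the setting with the best average score kept.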
As the models analyze sequences of words or entire sentences, it was observed that individuals with depression, for example, tend to talk about subjects related to themselves, using expressions and verbs in the first person, and topics such as death, crisis and psychologist.
“The indications of depression that appear in the office are not necessarily the same as those on the social network. For example, we noticed very strong use on the network of first-person pronouns, such as ‘I’ and ‘me’, which in psychology is a classic indicator of depression. But we also found among depressive users a high incidence of the little heart symbol, the emoji of affection, which perhaps is not yet characterized in psychology,” says Paraboni.
The professor points out that the texts were fully anonymized when collected. “We don’t release any tweets or usernames. We take care that not even the students involved in the project have access to user data, to protect people’s identity,” he says.
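To make the first-person signal concrete, the sketch below measures how often first-person pronouns appear in a user's tweets. It is only an illustration: the Portuguese pronoun list and the function name first_person_rate are assumptions, and the study itself relied on learned models rather than a hand-written word count like this one.

```python
import re

# Assumed, non-exhaustive list of Portuguese first-person singular forms
FIRST_PERSON = {"eu", "me", "mim", "comigo", "meu", "minha", "meus", "minhas"}
# Runs of letters only (accented letters included, digits and "_" excluded)
TOKEN_RE = re.compile(r"[^\W\d_]+")


def first_person_rate(tweets):
    """Share of word tokens that are first-person pronouns."""
    tokens = [tok.lower() for tweet in tweets for tok in TOKEN_RE.findall(tweet)]
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in FIRST_PERSON)
    return hits / len(tokens)


if __name__ == "__main__":
    print(first_person_rate(["Eu me sinto cansado", "Hoje choveu"]))
```

A rate like this could be compared between the diagnosed and control groups, but on its own it is a descriptive statistic, not a diagnostic tool.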
Now, in addition to expanding the database, the researchers are working to refine the computational technique used and improve the initial models, aiming, in the future, at a tool that may be applied in practice. It could help both in the initial screening of people showing signs of these disorders and in supporting parents, family members and friends of young people at risk of depression and anxiety.
Brazil ranks third in the world in social media consumption, according to a survey released in early March by Comscore, behind India and Indonesia and ahead of the United States, Mexico and Argentina.
There are 131.5 million users connected in the country for 46 hours a month, on average, which represents almost two full days. The networks most accessed by Brazilians are YouTube, Facebook, Instagram, TikTok, Kwai and Twitter, which recently changed its rules and began charging for some types of services.