Researchers at the São Carlos Institute of Mathematical Sciences at the University of São Paulo (USP) have developed an algorithm that, according to experiments, can detect fake news with 96% accuracy.
The tool, which will be available on the website www.fakenewsbr.com, will be calibrated and undergo further tests over the next few months, especially during the Covid-19 pandemic and the elections scheduled for October 2022.
Statistician Francisco Louzada, one of the project’s coordinators, says that the aim is to bring an objective analysis, carried out by artificial intelligence, to the subjective assessment that people make when judging the veracity of a journalistic text.
“We combine several mathematical models that are able to identify whether a news item corresponds to the facts or not”, explains the researcher, who is also the director of technology transfer at the Center for Mathematical Sciences Applied to Industry (Cepi-Cemeai), which brings together several institutions and is supported by the São Paulo Research Foundation (Fapesp).
“We had the models analyze more than 100,000 news items published in the last five years. Then we tested the platform against a database of texts containing false or true information”, he continues.
“In the database analyzed, the accuracy rate is around 96%”, says Louzada.
Once the first tests are completed, the platform will need to undergo constant updates and improvements, not least because fake news adapts and changes over time, predicts the expert.
In search of answers to real problems
Louzada explains that the idea of creating an algorithm that identifies fake news came from the Professional Master’s Program in Mathematics, Statistics and Computing Applied to Industry at USP in São Carlos.
“We have students who work in industry and bring real problems, which can be solved during the master’s program”, he explains.
“After a meeting about what problems we were going to tackle, we chose to do an investigation into fake news and, from there, generate a product that could help people”, says the expert.
As mentioned above, the platform brings together a series of mathematical models that, through artificial intelligence and machine learning, estimate the probability that a news item is false or true.
“The models analyzed more than 100,000 texts to find vocabulary patterns, sentence construction and syntax that are commonly used in fake news”, says Louzada.
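To give a sense of how a model can learn vocabulary patterns from labeled texts, here is a minimal sketch of a bag-of-words naive Bayes classifier in Python. It is not the USP team's method: the article says only that several mathematical models are combined, without naming them. The toy texts, the labels and the function names (tokenize, train, prob_fake) are all hypothetical.

```python
import math
from collections import Counter

LABELS = ("fake", "true")

def tokenize(text):
    # Lowercase and split on whitespace; a real system would use richer features.
    return text.lower().split()

def train(examples):
    """Count words per label. `examples` is a list of (text, label) pairs.

    Assumes every label in LABELS appears at least once in `examples`.
    """
    word_counts = {label: Counter() for label in LABELS}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set().union(*word_counts.values())
    return word_counts, doc_counts, vocab

def prob_fake(text, model):
    """Return the estimated probability (0 to 1) that `text` is fake."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in LABELS:
        # Log prior plus log likelihood with add-one (Laplace) smoothing.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # Words never seen in training carry no evidence.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        log_scores[label] = score
    # Turn the two log scores into a normalized probability.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    return weights["fake"] / sum(weights.values())

# Hypothetical toy training data, invented for illustration only.
TRAIN = [
    ("shocking miracle cure doctors hate", "fake"),
    ("secret plot revealed share before deleted", "fake"),
    ("ministry publishes vaccination schedule", "true"),
    ("court announces election date", "true"),
]

model = train(TRAIN)
p_suspicious = prob_fake("shocking secret cure", model)
p_plain = prob_fake("court publishes schedule", model)
print(f"suspicious text: {p_suspicious:.2f}, plain text: {p_plain:.2f}")
```

With only four training texts the numbers mean little; the point is that the output is a probability, not a verdict, which matches how Louzada describes the platform.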
After “learning” the typical structure of fake news, the algorithm went through a new phase: the direct analysis of a database of texts already classified as true or false.
And it was precisely in this second stage of testing that the researchers observed that the platform was able to identify fake news with 96% accuracy.
Louzada cautions that this rate of 96% applies only to the database evaluated in this experimental study, and that the number may vary in a broader scenario, outside the controlled research environment.
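A rate like this is measured by comparing the algorithm's verdicts with labels that are already known. The short Python sketch below shows the arithmetic under stated assumptions: the five labels and predictions are invented, and the function names are hypothetical. It also computes precision, a related but different measure (of the texts flagged as fake, how many really were fake).

```python
def accuracy(predicted, actual):
    """Share of texts whose predicted label matches the known label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

def precision(predicted, actual, positive="fake"):
    """Of the texts flagged as `positive`, the share that truly are."""
    flagged = [a for p, a in zip(predicted, actual) if p == positive]
    if not flagged:
        return 0.0  # Nothing was flagged, so there is nothing to be right about.
    return sum(a == positive for a in flagged) / len(flagged)

# Hypothetical labeled test base and model output, for illustration only.
actual = ["fake", "fake", "true", "true", "true"]
predicted = ["fake", "true", "true", "true", "fake"]

print(accuracy(predicted, actual))   # 3 of 5 verdicts are correct
print(precision(predicted, actual))  # 1 of the 2 flagged texts is really fake
```

Because the result depends entirely on which texts are in the test base, the same model can score differently on another collection, which is the caveat Louzada raises.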
A job that never ends
The USP São Carlos group also bears in mind that, in order to continue working, the algorithm needs to undergo several updates over time.
“The process of mathematical modeling keeps evolving and needs refinement all the time”, points out Louzada, who describes this constant battle as “a cat-and-mouse game”.
“We need to expose the platform to new vocabularies and sentence constructions, not least because fake news adapts according to the new barriers that are imposed”, he says.
The statistician reports that the team that maintains the algorithm is growing and plans to move the data to a more secure internet server, one that can withstand hacker attacks.
“And we need to be extra careful, because nothing guarantees that the model will not be used by the creators of fake news themselves, to see whether the content they created passes our sieve or not”, he adds.
How to combine the best of both worlds
Louzada also believes that computerized platforms that distinguish what is true from what is false are not meant to replace fact-checking agencies, which have professionals capable of investigating the origins of each piece of news.
“I imagine that the future will have a structure of interaction between humans and machines”, he predicts.
“That way, we can unite the best of both worlds: the objectivity of artificial intelligence with the subjectivity and thoughtfulness of the human being”, he says.
The statistician also points out another limitation of the platform: for now, it only accepts the full text of an article published on a website, and it cannot analyze posts from social networks or messaging groups, such as WhatsApp or Telegram.
A long way ahead
Computer scientist Fabrício Benevenuto, from the Federal University of Minas Gerais (UFMG), who was not directly involved with the work at USP in São Carlos, believes that this area of research is still at a very early stage.
“I would say that we are still in an exploratory stage, especially because the dataset that distinguishes false and true news is still very limited”, he evaluates.
The researcher, who also coordinates the Eleições Sem Fake project, one of the initiatives to combat disinformation created by the Superior Electoral Court (TSE), says that it is still very difficult to know whether an algorithm tested on one subject (elections, for example) will also work for a completely different topic.
“It seems to me that there is still a long way to go before these solutions are available and implemented in practice”, he believes.
Benevenuto argues that there are other paths that can be explored, which go far beyond analyzing the veracity of each news item individually.
“You can take into account the geographic location of that domain or how long a particular site has been registered and online”, he says, by way of example.
“It is also necessary to distinguish what is a news story from what is just an opinion piece or a meme”, continues the computer scientist.
“Often, disinformation is not in a text, but in a digitally altered image or in a chain message spread through WhatsApp or Telegram”, he adds.
Despite all the limitations, Louzada believes that the platform can serve as an “additional tool” for the population to stay well informed and separate the wheat from the chaff.
“Statistical models give a probability that a news item is true or false, which can be weighed against the work done by fact-checking agencies, which track down the origin of that information and seek the opinion of experts on the subject”, he says.
“I imagine that we will find the most appropriate way to combat fake news somewhere between these two efforts, which complement each other”, he stresses.