One of the strategies used to study the functioning of a biological system – be it a simple cell, a group of organs in an organism, or even a group of species in an ecosystem – involves representing the system as a complex network of interconnected variables.
This strategy can be used, for example, to compare the networks of healthy and diseased tissues, identify the differences between them, and determine which variables should be changed to rebalance the system.
However, comparisons of such complexity require computational tools capable of simultaneously analyzing millions of data points, typically obtained from analyses of the genome (complete DNA sequences), the transcriptome (all messenger RNA expressed by the genome), the proteome (the entire set of proteins expressed by the genome) and the metabolome (all metabolites resulting from cellular processes).
As part of a project supported by FAPESP and Microsoft Research, scientists at the University of São Paulo (USP) in Brazil have developed a new software package called BioNetStat that can be used to analyze more than two complex networks at the same time and identify biologically relevant patterns.
According to the authors, the tool has several potential applications in the biological sciences and beyond, including the study of disease mechanisms, plant physiology and the structural analysis of large cities using census data.
The study, conducted by scientists at the University of São Paulo Bioscience Institute (IB-USP) and the Mathematics and Statistics Institute (IME-USP), is described in an article published in Frontiers in Genetics.
“Most existing programs can compare two complex networks, such as gene expression in cancerous and noncancerous tissues. BioNetStat enables the user to analyze up to ten networks concurrently using statistical methods and to determine, for example, that networks 1 and 5 are different from each other while all other networks are the same,” Marcos Buckeridge, Full Professor at IB-USP and principal investigator for the project, told Agência FAPESP.
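The kind of multi-network comparison Buckeridge describes can be illustrated with a permutation test, a common approach in differential network analysis. The sketch below is a simplified, hypothetical illustration, not BioNetStat's implementation (BioNetStat is an R package with its own tests): it builds one correlation network per group by linking variables whose absolute Pearson correlation is at least an assumed threshold of 0.7, measures how much the groups' degree vectors differ, and estimates a p-value by repeatedly shuffling the sample labels. The statistic, threshold, function names and synthetic data are all assumptions chosen for clarity.

```python
import math
import random

# Illustrative sketch only: NOT BioNetStat's implementation.
# Assumptions: edges are |Pearson r| >= threshold; the test statistic
# (spread of per-group degree vectors) is a hypothetical choice.


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def degree_vector(samples, threshold):
    """Degree of each variable in a network whose edges are |r| >= threshold."""
    n_vars = len(samples[0])
    columns = [[s[j] for s in samples] for j in range(n_vars)]
    degrees = [0] * n_vars
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            if abs(pearson(columns[i], columns[j])) >= threshold:
                degrees[i] += 1
                degrees[j] += 1
    return degrees


def between_group_statistic(samples, labels, threshold):
    """Spread of per-group degree vectors around their mean (0 if all networks match)."""
    vectors = []
    for group in sorted(set(labels)):
        group_samples = [s for s, lab in zip(samples, labels) if lab == group]
        vectors.append(degree_vector(group_samples, threshold))
    n_vars = len(vectors[0])
    mean = [sum(v[j] for v in vectors) / len(vectors) for j in range(n_vars)]
    return sum((v[j] - mean[j]) ** 2 for v in vectors for j in range(n_vars))


def permutation_test(samples, labels, threshold=0.7, n_perm=200, seed=0):
    """Return (observed statistic, permutation p-value) by shuffling sample labels."""
    rng = random.Random(seed)
    observed = between_group_statistic(samples, labels, threshold)
    shuffled = list(labels)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if between_group_statistic(samples, shuffled, threshold) >= observed:
            count += 1
    # Add-one correction keeps the p-value strictly above zero.
    return observed, (count + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(42)
    samples, labels = [], []
    for group in ("net1", "net2", "net3"):
        for _ in range(15):
            base = rng.gauss(0, 1)
            # Hypothetical data: variables 0 and 1 co-vary only in "net1".
            if group == "net1":
                second = base + rng.gauss(0, 0.1)
            else:
                second = rng.gauss(0, 1)
            samples.append([base, second, rng.gauss(0, 1), rng.gauss(0, 1)])
            labels.append(group)
    stat, p = permutation_test(samples, labels)
    print(f"statistic={stat:.3f}, p={p:.3f}")
```

Running the same test on two groups at a time is one way to ask which specific networks differ, as in the example of networks 1 and 5; in practice, such pairwise tests would also need a correction for multiple comparisons.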
The program can be downloaded free of charge from bioconductor.org/packages/release/bioc/html/BioNetStat.html.
“The analysis is based on complex network theory parameters called centralities,” Buckeridge explained. “Centralities measure the importance of each of the variables, or nodes, in the network and determine their positions in the hierarchy of the system.”
“There are various kinds of centralities, according to the angle from which you look at a network. The best-known centralities are degree, betweenness and closeness,” he said.
Degree centrality is the simplest of these measures: it counts how many neighbors, or correlations, a node has. The more connections a variable has to other nodes in the network, the greater its degree centrality, so the node with the most connections is the most central by this measure. Betweenness centrality counts how often a node lies on the shortest paths between other pairs of nodes; a node with high betweenness acts as a bridge, for example by connecting a highly connected node to the rest of the network. Closeness centrality measures how close a node is to all other nodes, based on the lengths of the shortest paths from it to every other node in the network.
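The three measures can be illustrated with a minimal sketch in plain Python. The six-node network below is hypothetical, with variables as nodes and strong correlations as edges, and the code is not taken from BioNetStat, which is an R package distributed through Bioconductor. Degree counts neighbors, closeness is computed from breadth-first-search distances as (number of reachable nodes minus 1) divided by the sum of those distances, and betweenness uses Brandes' algorithm for unweighted, undirected graphs.

```python
from collections import deque

# Hypothetical toy network: nodes are variables, edges are strong correlations.
EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E"), ("E", "F")]


def build_graph(edges):
    """Return an undirected adjacency list (dict of sets)."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def degree_centrality(graph):
    """Number of neighbors of each node."""
    return {node: len(nbrs) for node, nbrs in graph.items()}


def shortest_path_lengths(graph, source):
    """Breadth-first-search distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness_centrality(graph):
    """(reachable nodes - 1) / sum of distances; higher means closer to all others."""
    result = {}
    for node in graph:
        dist = shortest_path_lengths(graph, node)
        total = sum(dist.values())
        result[node] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


def betweenness_centrality(graph):
    """Brandes' algorithm for unweighted, undirected graphs (unnormalized)."""
    bc = {node: 0.0 for node in graph}
    for s in graph:
        stack = []
        preds = {node: [] for node in graph}
        sigma = {node: 0 for node in graph}  # number of shortest paths from s
        sigma[s] = 1
        dist = {node: -1 for node in graph}
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {node: 0.0 for node in graph}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {node: value / 2 for node, value in bc.items()}


if __name__ == "__main__":
    graph = build_graph(EDGES)
    for name, scores in [("degree", degree_centrality(graph)),
                         ("closeness", closeness_centrality(graph)),
                         ("betweenness", betweenness_centrality(graph))]:
        ranking = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        print(name, ranking)
```

In this toy network, node A has both the most neighbors and the highest betweenness, while D, which links A to E and F, ranks second on betweenness even though it has only two neighbors.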
In one of the experiments conducted to validate the tool, the researchers reanalyzed data from a previous study that compared the metabolism of sorghum (Sorghum bicolor), the fifth most produced cereal crop worldwide, under three conditions: normal (control), drought, and drought with high levels of carbon dioxide (CO2).
The aim of that earlier study was to assess the possible impacts of climate change on the crop.
Using the new software package, the researchers compared the metabolic networks of five organs: leaves, culms, roots, prop roots and grains.
“Thanks to the program, we discovered the high degree centrality of α-ketoglutarate [AKG], a compound overlooked in the previous study, and on that basis, we began formulating a series of new hypotheses,” Buckeridge said.
In another validation experiment, the scientists compared genomic data for four different types of brain tumors – astrocytoma, oligodendroglioma, oligoastrocytoma and glioblastoma – available from a public database, The Cancer Genome Atlas (TCGA).
“These four tumors are not equally aggressive. We used the software to identify the promoters of genes with the highest degree centrality. This assessment helps identify key genes for gauging the aggressiveness of a tumor, for example,” Buckeridge said.
Buckeridge added that his group is now looking for partners with expertise in artificial intelligence techniques, such as machine learning, so that BioNetStat can be used for predictive modeling.
“This software would be interesting for use in plant engineering, epidemiological studies of diseases, such as dengue and zika, drug development, and even town planning,” he said. “BioNetStat would enable you to determine, for example, which variable has the most impact on the quality of life for the population of a city and what would happen as a result of intervention in that variable.”
The researchers are also working on an upgrade to make the program faster and able to analyze a larger number of networks. A second program is being developed to analyze different levels of organization simultaneously across the networks being compared.
“In the case of a biological system, we will be able to study transcriptomic, proteomic and metabolomic networks under different conditions, and determine how these networks all relate to physiology,” Buckeridge said.
“Focusing on a single key variable is still the usual way to observe phenomena relating to complex systems in economics, politics or biology,” Buckeridge added. “If we can gradually change this culture, we may be able to solve problems more efficiently and with fewer unforeseen consequences.”
The article “BioNetStat: a tool for biological networks differential analysis” by Vinícius Carvalho Jardim, Suzana de Siqueira Santos, Andre Fujita and Marcos Silveira Buckeridge can be read at: www.frontiersin.org/articles/10.3389/fgene.2019.00594/full.