Earth is estimated to be home to between 10 million and 15 million species of eukaryotes: plants, animals, fungi and other organisms whose cells keep their chromosomal DNA inside a membrane-bound nucleus. Only 14% of these species (2.3 million) have been described, and under 0.1% (15,000) have partially or completely sequenced genomes.
Knowledge of this small fraction of Earth's biodiversity has facilitated enormous advances in agriculture, medicine and biotechnology-based industries, as well as enhanced strategies for the conservation of endangered species, according to researchers in the field.
To fill this huge knowledge gap and realize the scientific, economic, social and environmental potential of Earth's eukaryotic biodiversity, an international consortium plans to sequence, catalogue and characterize the genomes of every eukaryotic species on Earth within ten years.
The initiative is called the Earth BioGenome Project (EBP). Its goals and challenges are set out in a paper published in late April in the journal Proceedings of the National Academy of Sciences of the United States of America (PNAS).
The Sao Paulo Research Foundation (FAPESP) will participate in the project via its Research Programs on Biodiversity Characterization, Conservation, Restoration and Sustainable Use (BIOTA-FAPESP) and on eScience and Data Science.
"FAPESP's participation in the EBP offers researchers in Sao Paulo State an opportunity to take part in one of the boldest research projects in the world today. Furthermore, because Brazil is one of the most biodiverse countries in the world, the project's goals can contribute very significantly to the advancement of our nation," said FAPESP Scientific Director Carlos Henrique de Brito Cruz.
Considered one of the most ambitious undertakings in the history of biology, the project has only now become feasible, according to its coordinators, thanks to advances in genome sequencing technology, high-performance computing, data storage and bioinformatics, as well as the rapidly falling cost of genome sequencing and the growth of biobanks: repositories of catalogued biodiversity such as museums, herbaria and crop collection centers.
According to the paper's authors, sequencing an average vertebrate-sized genome currently costs about US$1,000, and this cost is falling rapidly. At these prices, it will be possible to sequence the genomes of some 1.5 million known eukaryotes and up to 100,000 new eukaryotic species for approximately US$4.7 billion, along with a large number of single-celled organisms, insects and small marine animals sampled at collection sites in biodiversity hotspots.
The cost estimate includes sequencing instruments, sample collection, data storage, analysis, visualization and dissemination, and project management. The authors note that this total is less than the cost of the Human Genome Project (HGP), which began in 1990 and was completed in 2003 for an investment of US$4.8 billion at current prices.
That investment has had a profound effect not just on human medicine but also on veterinary medicine, agricultural bioscience, industrial biotechnology, environmental protection, renewable energy and forensic science. A 2013 report by the Battelle Memorial Institute estimated the HGP's economic impact on the United States at about US$1 trillion.
After completion of the HGP, the genomes of many biomedically, agriculturally and industrially important organisms were sequenced. In 2015, a group of researchers affiliated with the University of California, Davis, the University of Illinois and the Smithsonian Institution in the US organized a meeting with representatives from academia, government and science funding agencies. The meeting fleshed out the EBP and introduced an even more ambitious goal: sequencing the DNA of all complex life on Earth.
According to Harris Lewin, Professor of Evolution and Ecology at UC Davis and co-chair of the EBP working group, which authored the paper, the project's economic impact could match or even exceed that of the HGP. This time, he noted, the benefits will be distributed globally and will be particularly significant for developing countries like Brazil that hold much of the world's biodiversity.
"The EBP will lay the scientific foundation for a new bioeconomy that has the potential to bring innovative solutions to health, environmental, economic and social problems to people across the globe, especially in underdeveloped countries that have significant biodiversity assets," Lewin said in a press release.
In August 2017, FAPESP and the Brazilian Academy of Sciences (ABC) held a Biodiversity and Biobank Workshop with the aim of engaging the Brazilian scientific community in the project. The event took place at FAPESP and was attended by Lewin as well as other researchers from Brazil and the US.
Other participants included curators of several Brazilian biological collections, who later met with representatives of the EBP and the Global Genome Biodiversity Network (GGBN) to discuss the requirements for, and obstacles to, Brazilian participation in these initiatives.
According to the EBP working group, sequencing the whole genomes of all known eukaryotes will facilitate the development of better tools for conserving endangered species and ecosystems, especially those affected by climate change, and for protecting and enhancing ecosystem services.
The Living Planet Index, which measures the state of the world's biological diversity, reported a 58% decline in vertebrate populations between 1970 and 2012, and the International Union for Conservation of Nature (IUCN) estimates that some 23,000 of the 80,000 species it has surveyed are approaching extinction, according to the PNAS paper's authors.
By 2050, up to 50% of existing species may become extinct, they note, mainly as a result of industries that make intensive use of natural resources.
"The Earth BioGenome Project will give us insight into the history and diversity of life and help us better understand how to conserve it," said Gene Robinson, director of the Carl R. Woese Institute for Genomic Biology at the University of Illinois and co-chair of the EBP working group.
The PNAS paper also stresses the project's essential role in the development of new drugs for infectious and inherited diseases, as well as new synthetic biofuels, biomaterials and food sources for a human population that is expected to reach 9.6 billion by 2050.
"We are in the midst of the sixth great extinction event of life on our planet, which not only threatens wildlife species but also imperils the global food supply," the paper adds.
To help achieve its goal of sequencing all eukaryotic life on Earth and making the information available in an open digital repository, the EBP is building partnerships with groups of scientists who work on different organisms and with an array of communities and projects. These include the Global Genome Biodiversity Network, the Vertebrate Genomes Project, the 10,000 Plant Genomes Project and the 5000 Arthropod Genomes Initiative (i5K), among many others.
Among the challenges to be addressed by the project will be coordinating these ongoing genome sequencing initiatives, developing a global strategy to collect and properly preserve specimens so as to enable the production of high-quality genome assemblies, and creating new computational tools to maximize understanding and use of the masses of genomic data generated. Above all, access for the scientific community and benefit sharing across society will have to be guaranteed in an organized manner.
"Sequencing the genomes of organisms may perhaps be the easiest part of the project. The most daunting challenges are acquiring samples with the necessary quality and developing the tools to interpret the huge amount of data that will be generated," said Marie-Anne Van Sluys, Full Professor at the University of Sao Paulo's Bioscience Institute (IB-USP) and a member of the EBP working group.
The completed project is expected to require approximately 1 exabyte (1 billion gigabytes) of digital storage capacity. It will create challenges and opportunities for the development of computer algorithms and other tools with which to visualize and compare genome sequences and to understand how they relate to the evolution of phenotypes, organisms and ecosystems.
It will also stimulate the development and deployment of new technologies for sample collection, such as aerial, terrestrial and aquatic drones or autonomous vehicles equipped with high-resolution cameras.
"The project represents an opportunity for us researchers in Brazil to create and innovate not only in genome sequencing but also in data analysis and visualization, sample collection and preservation, and other areas," Van Sluys said.