EukRef
Phylogenetically informed curation of Eukaryotic 18S rDNA
The diversity of eukaryotes extends far beyond the familiar plants, animals, and fungi. In fact, the vast majority of eukaryotic lineages are microbial. Eukaryotic microbes (protists) are important players in ecological processes and also directly influence the biology and health of animals and plants as parasites, commensals and symbionts. However, the extent of their diversity is still largely unknown because most eukaryotes have not yet been or cannot be cultured.
High-throughput environmental sequencing (HTES) has greatly expanded our understanding of microbial biodiversity and its ecological role. HTES enables characterization of microbial communities rapidly and from hundreds of samples at the same time. The depth of sampling has also revealed novel diversity in all ecosystems examined to date. However, the value of HTES data for cataloging the extent and distribution of protistan biodiversity is critically dependent on the quality of the reference databases used to annotate these sequences.
There is a growing and urgent need for well-curated reference databases to annotate the flood of incoming environmental sequences. Because no automated method reliably cleans up mislabeled sequences, manual curation by experts remains necessary. Ribosomal DNA is the marker most frequently used to characterize diversity because it is universally present and has been sequenced for the most comprehensive array of known taxa (microscopically identified and/or cultured organisms). Curated eukaryotic databases of ribosomal DNA have greatly improved analysis capacity in recent years. However, they struggle to keep pace with rapidly changing views on eukaryotic taxonomy, the influx of new data, and the computational challenges of assembling the high-quality alignments and trees needed to characterize lineage diversity accurately. As new environmental sequence data continue to reveal novel lineages, these data should ideally inform refinements in taxonomy and be incorporated into reference databases. This rarely happens in practice because 1) the communities building taxonomic frameworks for eukaryotes are distinct from those conducting environmental sequence analysis, and 2) curating the vast amounts of existing data in a phylogenetic framework is beyond the scope of individual research groups. Investing now in a curated reference database with high-quality alignments and phylogenetic trees will pay dividends both immediately, through the diversity research it enables, and in the future, because it will make maintenance easier and more reliable and allow automated growth that keeps pace with developments in the field.
Generating a phylogenetically and taxonomically informed reference database in and of itself leads to novel insights into microbial diversity and ecology, in addition to producing a community resource that adds value to further investigations. Many sequences across the eukaryotic tree of life are poorly annotated or misannotated and have not been assembled into a comprehensive and coherent phylogenetic framework. As a result, the curation process can uncover additional novel lineages, refine understanding of the relationships among environmental clades and previously described lineages, and offer new glimpses into the diversity contained within these clades.