S20 - Session O3 - Datamining, a powerful tool to magnify an infinite source of information hitherto put aside
Authors: Jean-Michel Hily *
With the dawn of high-throughput sequencing (HTS), the deposition and accumulation of genetic information in digital form within dedicated databases (metadata) is massive and ever growing. Datamining, i.e. the process of collecting, searching, extracting and discovering usable information within such large amounts of data, is therefore becoming an important and powerful tool for identifying possible new pathogens, as well as new viruses or new variants of known viruses, for example from the now well-known Coronaviridae family ( https://virological.org/t/serratus-the-ultra-deep-search-to-discover-novel-coronaviruses/516 ). Grapevine Pinot gris virus (GPGV) is a recently described virus (Giampetruzzi et al. 2012) that infects grapevine and has now been detected in most, if not all, grape-growing countries where it has been sought. While its presence is sometimes associated with severe mottling and deformation symptoms, the virus is generally detected in asymptomatic vines. Prior to this work, knowledge of the genetic diversity of GPGV was mostly limited to biased and partial genomic sequences obtained from PCR-based analyses. By performing a systematic datamining effort on over 500 samples, using publicly available SRA (Sequence Read Archive) files as well as in-house data, in association with specific bioinformatic tools, we uncovered invaluable information regarding GPGV. The knowledge gained from this work is relevant at different levels, providing (1) the varieties and countries in which the virus was detected, (2) precise epidemiological data linked to specific locations around the world, (3) a large number of unbiased complete GPGV genomic sequences, and (4) a previously undescribed genetic diversity, which ultimately allowed (5) the unraveling of the worldwide evolutionary history of the virus (Hily et al. 2020; Hily et al. 2021a; Hily et al. 2021b).
Drawing on these proof-of-concept studies, some advantages and pitfalls of datamining will be discussed.
Giampetruzzi, A. et al. 2012. Virus Research.
Hily, J.-M. et al. 2020. Phytobiomes Journal.
Hily, J.-M. et al. 2021a. Phytobiomes Journal.
Hily, J.-M. et al. 2021b. European Journal of Plant Pathology.