S20 - Session O3 - Datamining, a powerful tool to magnify an infinite source of information hitherto put aside

Friday, August 19, 2022 11:00 AM to 11:15 AM · 15 min. (Europe/Paris)
Angers University
S20 International symposium on the vitivinicultural sector: which tools to face current challenges?

Information

Authors: Jean-Michel Hily *

With the dawn of high-throughput sequencing (HTS), the deposit and accumulation of genetic information in digital form within dedicated databases (metadata) is massive and ever growing. Datamining, i.e. the process of collecting, searching, extracting and discovering usable information within such large amounts of data, is therefore becoming an important and powerful tool for identifying possible new pathogens, as well as new viruses or new variants of known viruses, for example within the now well-known Coronaviridae family ( https://virological.org/t/serratus-the-ultra-deep-search-to-discover-novel-coronaviruses/516 ). Grapevine Pinot gris virus (GPGV) is a newly described virus (Giampetruzzi et al. 2012) that infects grapevine and has now been detected in most, if not all, grape-growing countries where it has been sought. While its presence is sometimes associated with severe mottling and deformation symptoms, the virus is generally detected in asymptomatic vines. Prior to this work, knowledge of the genetic diversity of GPGV was mostly limited to biased and partial genomic sequences obtained from PCR analyses. By performing a systematic datamining effort on over 500 samples, using publicly available Sequence Read Archive (SRA) files as well as in-house datasets, in association with specific bioinformatic tools, we uncovered invaluable information regarding GPGV. The knowledge gained from this work is relevant at different levels, providing (1) the varieties and countries in which the virus was detected, (2) precise epidemiological data linked to specific locations around the world, (3) a large number of unbiased complete GPGV genomic sequences, (4) a previously undescribed genetic diversity, which ultimately allowed (5) the unraveling of the worldwide evolutionary history of the virus (Hily et al. 2021b; Hily et al. 2021a; Hily et al. 2020).
Drawing on these proof-of-concept studies, some advantages and pitfalls of datamining will be discussed.

Giampetruzzi, A. et al. 2012. Virus Research.
Hily, J.-M. et al. 2020. Phytobiomes Journal.
Hily, J.-M. et al. 2021a. Phytobiomes Journal.
Hily, J.-M. et al. 2021b. European Journal of Plant Pathology.

Type of sessions
Oral Presentations
Type of broadcast
In Replay (after IHC), In person, In remote
Keywords
Datamining, epidemiology, Grapevine, HTS, virology
Room
Amphitheatre Volney
