S02 - Session O6 - Construction of a semantic distance for inferring structure of the variability among the 19th century Rosa varieties

Friday, August 19, 2022 11:45 AM to 12:00 PM · 15 min. (Europe/Paris)
Angers Congress Centre
S02 International symposium on conservation and sustainable use of horticultural genetic resources

Information

Authors: Alix Pernet *, Rayan Eid, Claudine Landès, Emmanuel Benoît, Pierre Santagostini, Jordan Marie-Magdelaine, Jérémy Clotault, Angelina El Ghaziri, Julie Bourbeillon

Maintaining the diversity of genetic resources is critical to maintaining the capacity of agriculture and horticulture to evolve. Many efforts are made to preserve and characterize genetic resources at different levels, especially phenotypically and genetically, with molecular markers or DNA sequencing. Statistical tools to describe the different types of variables (qualitative, semi-ordered, quantitative) are quite limited, especially when data are missing. The aims of this study are to (i) integrate different types of variables in a single statistical analysis by using the concept of ontology and defining a new distance measure based, if necessary, on expert knowledge; and (ii) use statistical techniques to better investigate the reasons underlying the revealed structure and to characterize each cluster. Our new semantic distance, based on various types of data (passport, phenotypic), was used to estimate pairwise distances among 1400 individuals of Rosa. The resulting distance matrix was projected into a new coordinate space using metric Multi-Dimensional Scaling, and the projection was then used as input for clustering algorithms. To evaluate the contribution of each rosebush characteristic to the observed structure, the modalities of the variables were projected into the coordinate space and their proportions estimated for each cluster. A similar analysis was carried out using the Gower distance. Data points representing individuals are widely spread in the coordinate space, and the projection of the modalities of the variables shows a stronger structuring when using the semantic distance. Our distance better represents reality, and the stronger structuring of the modalities of the variables leads to more precise biological questions. This semantic distance seems useful for investigating variability between genetic resource accessions and should be tested on more datasets.
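The analysis chain described above (pairwise distances, metric MDS, clustering, proportion of each modality per cluster) can be illustrated with a minimal Python sketch. It uses the Gower distance, which the abstract names as the comparison baseline; the semantic distance itself is not specified here and is not reproduced. Everything else is an assumption made for illustration: the toy data (petal count, plant height, flower colour), the function names, the use of classical (Torgerson) scaling as the metric MDS variant, and k-means as the clustering algorithm.

```python
import numpy as np

def gower_distance(quant, qual):
    """Gower distance on mixed data; NaN (quantitative) and None (qualitative) are missing."""
    quant = np.asarray(quant, dtype=float)
    qual = np.asarray(qual, dtype=object)
    n = quant.shape[0]
    ranges = np.nanmax(quant, axis=0) - np.nanmin(quant, axis=0)
    ranges[ranges == 0] = 1.0  # avoid division by zero for constant variables
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            total, count = 0.0, 0
            for k in range(quant.shape[1]):
                a, b = quant[i, k], quant[j, k]
                if not (np.isnan(a) or np.isnan(b)):
                    total += abs(a - b) / ranges[k]  # range-normalised difference
                    count += 1
            for k in range(qual.shape[1]):
                a, b = qual[i, k], qual[j, k]
                if a is not None and b is not None:
                    total += 0.0 if a == b else 1.0  # simple mismatch
                    count += 1
            # Average over the variables observed for both individuals.
            dist[i, j] = dist[j, i] = total / count if count else np.nan
    return dist

def classical_mds(dist, n_components=2):
    """Classical (Torgerson) scaling: eigendecomposition of the double-centred matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Negative eigenvalues can occur for non-Euclidean distances; clip them to zero.
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))

def kmeans(points, k, n_iter=50):
    """Small k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        gaps = np.linalg.norm(points[:, None] - centers[None], axis=2)
        labels = np.argmin(gaps, axis=1)
        updated = np.array([points[labels == c].mean(axis=0) if np.any(labels == c)
                            else centers[c] for c in range(k)])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels

def modality_proportions(labels, modalities):
    """Share of each observed modality of one qualitative variable within each cluster."""
    result = {}
    for c in sorted(set(labels.tolist())):
        members = [m for m, l in zip(modalities, labels) if l == c and m is not None]
        result[c] = {m: members.count(m) / len(members) for m in set(members)}
    return result

# Hypothetical toy data: petal count, plant height (m), flower colour.
quant = [[5, 0.8], [6, 0.9], [np.nan, 0.85], [40, 1.5], [45, 1.6], [42, 1.4]]
colour = ["pink", "pink", "pink", "white", "white", None]

dist = gower_distance(quant, [[c] for c in colour])
coords = classical_mds(dist, n_components=2)
labels = kmeans(coords, k=2)
proportions = modality_proportions(labels, colour)
print(labels, proportions)
```

In this sketch a pair of individuals with no variable observed in common would get an undefined distance, which is one of the situations where a distance informed by an ontology or by expert knowledge could behave differently from Gower.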

Type of sessions
Oral Presentations
Type of broadcast
In Replay (after IHC), In person, In remote
Keywords
Ontology, phenotypic and genetic diversity, Rosa, semantic distance
Room
Grand Angle Room A - Screen 1
