A new method for evaluating the impacts of semantic similarity measures on the annotation of gene sets.

MOTIVATION: The recent revolution in sequencing technologies, as part of the continuous process of adopting innovative protocols, has strongly impacted the interpretation of relations between phenotype and genotype. Thus, understanding the resulting gene sets has become a bottleneck that nee...


Bibliographic Details
Main Authors:	Aarón Ayllón-Benítez, Fleur Mougin, Julien Allali, Rodolphe Thiébaut, Patricia Thébault
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2018-01-01
Series:	PLoS ONE
Online Access:	http://europepmc.org/articles/PMC6258551?pdf=render

id	doaj-e00d1e07eb384babb23c504c2de16f76
record_format	Article
spelling	doaj-e00d1e07eb384babb23c504c2de16f76 2020-11-25T01:30:50Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2018-01-01 13 11 e0208037 10.1371/journal.pone.0208037
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Aarón Ayllón-Benítez Fleur Mougin Julien Allali Rodolphe Thiébaut Patricia Thébault
author_sort	Aarón Ayllón-Benítez
title	A new method for evaluating the impacts of semantic similarity measures on the annotation of gene sets.
publisher	Public Library of Science (PLoS)
series	PLoS ONE
issn	1932-6203
publishDate	2018-01-01
description	MOTIVATION: The recent revolution in sequencing technologies, as part of the continuous process of adopting innovative protocols, has strongly impacted the interpretation of relations between phenotype and genotype. Thus, understanding the resulting gene sets has become a bottleneck that needs to be addressed. Automatic methods have been proposed to facilitate the interpretation of gene sets. While statistical functional enrichment analyses are currently well known, they tend to focus on well-known genes and to ignore new information from less-studied genes. To address such issues, applying semantic similarity measures is logical if the knowledge source used to annotate the gene sets is hierarchically structured. In this work, we propose a new method for analyzing the impact of different semantic similarity measures on gene set annotations. RESULTS: We evaluated the impact of each measure by considering the following two features, which correspond to relevant criteria for a "good" synthetic gene set annotation: (i) the number of annotation terms has to be drastically reduced while the representative terms are retained, and (ii) the number of genes described by the selected terms should be as large as possible. Thus, we analyzed nine semantic similarity measures to identify the best possible compromise between both features while maintaining a sufficient level of detail. Using Gene Ontology to annotate the gene sets, we obtained better results with node-based measures, which use the terms' characteristics, than with edge-based measures, which rely on the edges that link the terms. The annotations of the gene sets achieved with the node-based measures did not exhibit major differences, regardless of which characteristics of the terms were used.
url	http://europepmc.org/articles/PMC6258551?pdf=render
_version_	1725089447433732096
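
The description contrasts node-based measures (built on characteristics of the terms) with edge-based measures (built on the links between terms) and names gene coverage as one of its two criteria. The following is a minimal Python sketch of that distinction, not the authors' method: the toy ontology, the gene annotations, and all function names are hypothetical, and the two measures shown (Resnik's information-content similarity and an inverse shortest-path similarity) are common textbook examples rather than a confirmed subset of the nine measures studied in the article.

```python
import math

# Hypothetical toy ontology: term -> set of parent terms (not real GO terms).
PARENTS = {
    "root": set(),
    "metabolism": {"root"},
    "signaling": {"root"},
    "lipid_metabolism": {"metabolism"},
    "sugar_metabolism": {"metabolism"},
}

# Hypothetical direct annotations: gene -> set of terms.
ANNOTATIONS = {
    "g1": {"lipid_metabolism"},
    "g2": {"sugar_metabolism"},
    "g3": {"sugar_metabolism"},
    "g4": {"signaling"},
}


def ancestors(term):
    """Map the term and each of its ancestors to the minimum edge distance."""
    dist = {term: 0}
    frontier = [term]
    while frontier:
        next_frontier = []
        for current in frontier:
            for parent in PARENTS[current]:
                if parent not in dist:
                    dist[parent] = dist[current] + 1
                    next_frontier.append(parent)
        frontier = next_frontier
    return dist


def gene_terms(gene):
    """All terms describing a gene: direct annotations plus their ancestors."""
    terms = set()
    for term in ANNOTATIONS[gene]:
        terms.update(ancestors(term))
    return terms


def information_content(term):
    """Node characteristic: -log of the fraction of genes the term describes."""
    count = sum(1 for gene in ANNOTATIONS if term in gene_terms(gene))
    if count == 0:
        return math.inf
    return -math.log(count / len(ANNOTATIONS))


def resnik(a, b):
    """Node-based: information content of the most informative common ancestor."""
    common = ancestors(a).keys() & ancestors(b).keys()
    return max(information_content(term) for term in common)


def edge_similarity(a, b):
    """Edge-based: inverse of the shortest path through a common ancestor."""
    dist_a, dist_b = ancestors(a), ancestors(b)
    common = dist_a.keys() & dist_b.keys()
    shortest = min(dist_a[term] + dist_b[term] for term in common)
    return 1.0 / (1.0 + shortest)


def coverage(selected_terms):
    """Criterion (ii): fraction of genes described by at least one selected term."""
    covered = sum(1 for gene in ANNOTATIONS if gene_terms(gene) & set(selected_terms))
    return covered / len(ANNOTATIONS)


if __name__ == "__main__":
    print(resnik("lipid_metabolism", "sugar_metabolism"))
    print(edge_similarity("lipid_metabolism", "sugar_metabolism"))
    print(coverage({"metabolism"}))
```

In this toy setting the node-based score depends on how many genes a shared ancestor describes, whereas the edge-based score depends only on the number of links separating the two terms.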


Similar Items