Reference : Unification of functional annotation descriptions using text mining
Scientific journals : Article
Life sciences : Biochemistry, biophysics & molecular biology
Systems Biomedicine
http://hdl.handle.net/10993/47454
Language: English
Queirós, Pedro
Novikova, Polina [University of Luxembourg > CRC > Communication Department; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB)]
Wilmes, Paul [University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Systems Ecology]
May, Patrick [University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core]
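Publication date: 13-May-2021
Journal: Biological Chemistry
Publisher: Walter de Gruyter
Peer reviewed: Yes
Audience: International
ISSN: 1431-6730
e-ISSN: 1437-4315
Place of publication: Berlin, Germany
Keywords: Protein annotation; Functional annotation; Text mining; Natural language processing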
Abstract: A common approach to genome annotation involves the use of homology-based tools to predict the functional role of proteins. The quality of functional annotations depends on the reference data used; as such, choosing the appropriate sources is crucial. Unfortunately, no single reference data source can be universally considered the gold standard; thus, using multiple references could potentially increase annotation quality and coverage. However, this comes with challenges, particularly the introduction of redundant and exclusive annotations. Through text mining, it is possible to identify highly similar functional descriptions, thus strengthening the confidence of the final protein functional annotation and providing a redundancy-free output. Here we present UniFunc, a text mining approach that is able to detect similar functional descriptions with high precision. UniFunc was built as a small module and can be used independently or integrated into protein function annotation pipelines. By removing the need to individually analyse and compare annotation results, UniFunc streamlines the complementary use of multiple reference datasets.
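The abstract describes using text mining to detect highly similar functional descriptions so that redundant annotations from multiple reference sources can be merged. The following is a minimal sketch of that general idea and is not UniFunc's implementation: it swaps in a simple bag-of-words cosine similarity, and assumes a hypothetical stopword list and a hypothetical similarity threshold of 0.8. The function names `tokenize`, `cosine` and `merge_descriptions` are illustrative only.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list (assumption); a real pipeline would use a curated, domain-aware one.
STOPWORDS = {"a", "an", "the", "of", "and", "in", "to", "for", "with", "protein", "putative"}


def tokenize(description: str) -> list[str]:
    """Lowercase the description, split on non-alphanumeric characters and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", description.lower())
    return [t for t in tokens if t not in STOPWORDS]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def merge_descriptions(descriptions: list[str], threshold: float = 0.8) -> list[list[str]]:
    """Greedily group descriptions whose similarity to a group's first member meets the threshold.

    The 0.8 default is a hypothetical value chosen for illustration.
    """
    groups: list[tuple[Counter, list[str]]] = []
    for desc in descriptions:
        vec = Counter(tokenize(desc))
        for rep_vec, members in groups:
            if cosine(vec, rep_vec) >= threshold:
                members.append(desc)
                break
        else:
            # No sufficiently similar group exists, so this description starts a new one.
            groups.append((vec, [desc]))
    return [members for _, members in groups]
```

For example, `merge_descriptions(["DNA polymerase III subunit alpha", "DNA polymerase III, alpha subunit", "ATP synthase subunit beta"])` places the two polymerase descriptions in one group, since they share the same tokens in a different order, and leaves the ATP synthase description on its own. Greedy grouping against each group's first member keeps the sketch short, but the result can depend on input order.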
Luxembourg Centre for Systems Biomedicine (LCSB): Bioinformatics Core (R. Schneider Group) ; Luxembourg Centre for Systems Biomedicine (LCSB): Eco-Systems Biology (Wilmes Group)
FNR PRIDE17/11823097
10.1515/hsz-2021-0125
https://doi.org/10.1515/hsz-2021-0125
FnR ; FNR11823097 > Paul Wilmes > MICROH-DTU > Microbiomes In One Health > 01/09/2018 > 28/02/2025 > 2017

Full-text file(s) associated with this reference:
10.1515_hsz-2021-0125.pdf (Publisher postprint, 736.56 kB, Open access)


All documents in ORBilu are protected by a user license.